From d091fafdc64521111e204ffa9efb81e7154f49bf Mon Sep 17 00:00:00 2001 From: Toon Claes Date: Fri, 1 Aug 2025 10:24:35 +0200 Subject: [PATCH 1/5] Fix archiving some corner-case files into zip MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit At $DAYJOB, we've add a customer report an issue where they failed to download a zip archive from a repository. The error they saw come from git-archive(1) is: fatal: deflate error (0) My friendly colleague Justin Tobler was able to reproduce this issue[1]. We've diagnosed this error happens on some files that exceed core.bigFileThreshold. To reproduce the issue, you can run: git clone --depth=1 https://github.com/chromium/chromium.git cd chromium git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \ chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js (originally he mentioned another file, but that didn't trigger the bug for me) And a patch to fix the issue was presented that message. I have tested the fix, and I can confirm this fixes the issue. But I'm concerned this doesn't fix all issues. Another way one could trigger the issue, is by initializing `unsigned char compressed` with length `STREAM_BUFFER_SIZE / 2` (so half the length of the input buffer, instead of double). With Justin's fix, you see the error doesn't happen no more. But it seems, the resulting zip archive isn't valid. When I try to unzip it, I see: inflating: chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js bad CRC 3ba68a86 (should be b09a04a2) And when the length is set to `STREAM_BUFFER_SIZE` (so equal length to input buffer), the decompress goes well, but the data seems to be mangled. This is because only the final call of git_deflate() is being wrapped in a loop for the current chunk of input data. We can see in various other callsites in the Git codebase, git_deflate() is usually called in a `while` loop (even when the `flush` parameter is set to `0` = Z_NO_FLUSH). For the record, I want to give all the credit to Justin for diagnosing this bug and to determine a solution. Where he aims to provide a fix that is minimal, I wanted to present an alternative solution that implements zlib usuage according to the official usage example[2], but the changes are more substantial. I'm on the fence which of two is the better approach. Because the ZIP format has a End Of Central Directory record (EOCD) at the end, it's far more likely *only* the final git_deflate() call suffers from unprocessed input data, so the final Justin provides probably Just Works. I'm gonna leave it up to the community to decide what is "better"? [1]: https://lore.kernel.org/git/20250802220803.95137-1-jltobler@gmail.com/ [2]: https://zlib.net/zlib_how.html [3]: https://en.wikipedia.org/wiki/ZIP_(file_format)#End_of_central_directory_record_(EOCD) -- Cheers, Toon Cc: Justin Tobler Cc: "René Scharfe" --- Changes in v2: - EDITME: describe what is new in this series revision. - EDITME: use bulletpoints and terse descriptions. - Link to v1: https://lore.kernel.org/r/20250804-toon-archive-zip-fix-v1-0-ca89858e5eaa@iotcl.com --- b4-submit-tracking --- # This section is used internally by b4 prep for tracking purposes. { "series": { "revision": 2, "change-id": "20250801-toon-archive-zip-fix-2deac42d5aa3", "prefixes": [], "history": { "v1": [ "20250804-toon-archive-zip-fix-v1-0-ca89858e5eaa@iotcl.com" ] } } } -- GitLab From 110fe84cb56a4a9be72dbb92ecd1bb7d4a57bc06 Mon Sep 17 00:00:00 2001 From: Toon Claes Date: Mon, 4 Aug 2025 16:11:44 +0200 Subject: [PATCH 2/5] archive-zip: deduplicate code setting output buffer in write_zip_entry() There were two callsites setting the size and address of the output buffer. Instead of setting them outside the loop and in the loop after calling git_deflate(). Set them once in the loop, right before the git_deflate() call. Co-authored-by: Justin Tobler Signed-off-by: Toon Claes --- archive-zip.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/archive-zip.c b/archive-zip.c index df8866d5bae..cc6d0cadd9a 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -458,8 +458,6 @@ static int write_zip_entry(struct archiver_args *args, git_deflate_init_raw(&zstream, args->compression_level); compressed_size = 0; - zstream.next_out = compressed; - zstream.avail_out = sizeof(compressed); for (;;) { readlen = read_istream(stream, buf, sizeof(buf)); @@ -473,6 +471,8 @@ static int write_zip_entry(struct archiver_args *args, zstream.next_in = buf; zstream.avail_in = readlen; + zstream.next_out = compressed; + zstream.avail_out = sizeof(compressed); result = git_deflate(&zstream, 0); if (result != Z_OK) die(_("deflate error (%d)"), result); @@ -481,8 +481,6 @@ static int write_zip_entry(struct archiver_args *args, if (out_len > 0) { write_or_die(1, compressed, out_len); compressed_size += out_len; - zstream.next_out = compressed; - zstream.avail_out = sizeof(compressed); } } -- GitLab From 212d4e575a1d824b31e8b5534b1a40174645cb69 Mon Sep 17 00:00:00 2001 From: Toon Claes Date: Mon, 4 Aug 2025 16:19:30 +0200 Subject: [PATCH 3/5] archive-zip: remove unneccesarry condition in write_zip_entry() The function write_or_die() can handle a length that's zero, thus we can remove the condition that checks the value of `out_len` that surrounds this call. The value shall never be negative as this would have caused data being omitted in the deflated output. Co-authored-by: Justin Tobler Signed-off-by: Toon Claes --- archive-zip.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/archive-zip.c b/archive-zip.c index cc6d0cadd9a..d41a12de5f5 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -478,11 +478,8 @@ static int write_zip_entry(struct archiver_args *args, die(_("deflate error (%d)"), result); out_len = zstream.next_out - compressed; - if (out_len > 0) { - write_or_die(1, compressed, out_len); - compressed_size += out_len; - } - + write_or_die(1, compressed, out_len); + compressed_size += out_len; } close_istream(stream); if (readlen) -- GitLab From d89564c24aed4932328576eb203df764177cb28f Mon Sep 17 00:00:00 2001 From: Toon Claes Date: Mon, 4 Aug 2025 16:29:39 +0200 Subject: [PATCH 4/5] archive-zip: in write_zip_entry() call git_deflate() in a loop The function git_deflate() might not complete to deflate all the input data in one go. While the function is already being called in a loop, every loop fresh data is read from the stream. This is not correct, because input data might get lost. As we see in many other callsites, git_deflate() should be called in a loop on the existing input to make it process all the input data. Add in a nested loop around git_deflate() to process the input buffer completely, before continuing the parent loop that reads from more data from the input stream. Co-authored-by: Justin Tobler Signed-off-by: Toon Claes --- archive-zip.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/archive-zip.c b/archive-zip.c index d41a12de5f5..25a02241309 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -471,15 +471,17 @@ static int write_zip_entry(struct archiver_args *args, zstream.next_in = buf; zstream.avail_in = readlen; - zstream.next_out = compressed; - zstream.avail_out = sizeof(compressed); - result = git_deflate(&zstream, 0); - if (result != Z_OK) - die(_("deflate error (%d)"), result); - out_len = zstream.next_out - compressed; - - write_or_die(1, compressed, out_len); - compressed_size += out_len; + do { + zstream.next_out = compressed; + zstream.avail_out = sizeof(compressed); + result = git_deflate(&zstream, 0); + if (result != Z_OK) + die(_("deflate error (%d)"), result); + out_len = zstream.next_out - compressed; + + write_or_die(1, compressed, out_len); + compressed_size += out_len; + } while (zstream.avail_out == 0); } close_istream(stream); if (readlen) -- GitLab From 24e32b63de6e86cad095a38475f3f1eed4469e8d Mon Sep 17 00:00:00 2001 From: Toon Claes Date: Mon, 4 Aug 2025 17:04:00 +0200 Subject: [PATCH 5/5] archive-zip: move git_deflate() with Z_FINISH into the loop Instead duplicating code to do the final deflate (with `flush` value Z_FINISH), bring this call inside the loop that's deflate parts of the input stream. This causes also this final deflate to be wrapped in a loop to ensure the whole input is taken care of. This change makes crc32() to be called without checking if the `readlen` is greater than zero, but looking at the zlib manual[1] should be allowed. This patch concluded some refactoring, making the code more similar to the example usage of the official zlib docs[2]. [1]: https://zlib.net/manual.html [2]: https://zlib.net/zlib_how.html Co-authored-by: Justin Tobler Signed-off-by: Toon Claes --- archive-zip.c | 25 +++++++++---------------- 1 file changed, 9 insertions(+), 16 deletions(-) diff --git a/archive-zip.c b/archive-zip.c index 25a02241309..559ed267be4 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -451,7 +451,7 @@ static int write_zip_entry(struct archiver_args *args, unsigned char buf[STREAM_BUFFER_SIZE]; ssize_t readlen; git_zstream zstream; - int result; + int result, flush; size_t out_len; unsigned char compressed[STREAM_BUFFER_SIZE * 2]; @@ -459,44 +459,37 @@ static int write_zip_entry(struct archiver_args *args, compressed_size = 0; - for (;;) { + do { readlen = read_istream(stream, buf, sizeof(buf)); - if (readlen <= 0) + if (readlen < 0) break; crc = crc32(crc, buf, readlen); - if (is_binary == -1) + if ((is_binary == -1) && readlen) is_binary = entry_is_binary(args->repo->index, path_without_prefix, buf, readlen); + flush = readlen ? Z_NO_FLUSH : Z_FINISH; zstream.next_in = buf; zstream.avail_in = readlen; do { zstream.next_out = compressed; zstream.avail_out = sizeof(compressed); - result = git_deflate(&zstream, 0); - if (result != Z_OK) + result = git_deflate(&zstream, flush); + if ((result != Z_OK) && (result != Z_STREAM_END)) die(_("deflate error (%d)"), result); out_len = zstream.next_out - compressed; write_or_die(1, compressed, out_len); compressed_size += out_len; } while (zstream.avail_out == 0); - } + } while (flush != Z_FINISH); + close_istream(stream); if (readlen) return readlen; - zstream.next_in = buf; - zstream.avail_in = 0; - result = git_deflate(&zstream, Z_FINISH); - if (result != Z_STREAM_END) - die("deflate error (%d)", result); - git_deflate_end(&zstream); - out_len = zstream.next_out - compressed; - write_or_die(1, compressed, out_len); - compressed_size += out_len; zip_offset += compressed_size; write_zip_data_desc(size, compressed_size, crc); -- GitLab