Fix archiving some corner-case files into zip

At $DAYJOB, we've had a customer report an issue where they failed to download a zip archive from a repository. The error they saw coming from git-archive(1) is:

    fatal: deflate error (0)

My friendly colleague Justin Tobler was able to reproduce this issue[1]. We've determined that this error happens on some files that exceed core.bigFileThreshold. To reproduce the issue, you can run:

    git clone --depth=1 https://github.com/chromium/chromium.git
    cd chromium
    git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
            chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js

(He originally mentioned another file, but that one didn't trigger the bug for me.)

He also presented a patch to fix the issue in that message.

I have tested the fix and can confirm it resolves the issue. However, I'm concerned it doesn't cover all cases.

Another way to trigger the issue is to declare the unsigned char compressed buffer with length STREAM_BUFFER_SIZE / 2 (half the length of the input buffer, instead of double).

With Justin's fix applied, the error no longer occurs, but the resulting zip archive seems to be invalid. When I try to unzip it, I see:

    inflating: chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js   bad CRC 3ba68a86  (should be b09a04a2)

And when the length is set to STREAM_BUFFER_SIZE (equal to the length of the input buffer), decompression succeeds, but the data appears to be mangled.

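This is because only the final call to git_deflate() is wrapped in a loop that processes the current chunk of input data. deflate() returns as soon as its output buffer is full, so a single call may leave part of the input unconsumed (avail_in > 0), and that input is then silently dropped. Elsewhere in the Git codebase, git_deflate() is usually called in a while loop (even when the flush parameter is 0, i.e. Z_NO_FLUSH).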

For the record, all the credit goes to Justin for diagnosing this bug and determining a solution. Whereas he aimed for a minimal fix, I wanted to present an alternative solution that uses zlib according to the official usage example[2], at the cost of more substantial changes.

I'm on the fence about which of the two is the better approach. Because the ZIP format has an End Of Central Directory record (EOCD) at the end, it's far more likely that only the final git_deflate() call suffers from unprocessed input data, so the fix Justin provides probably Just Works. I'll leave it up to the community to decide which is "better".

-- Cheers, Toon

Cc: Justin Tobler <jltobler@gmail.com>
Cc: "René Scharfe" <l.s.r@web.de>


Changes in v2:

--- b4-submit-tracking ---

This section is used internally by b4 prep for tracking purposes.

{ "series": { "revision": 2, "change-id": "20250801-toon-archive-zip-fix-2deac42d5aa3", "prefixes": [], "history": { "v1": [ "20250804-toon-archive-zip-fix-v1-0-ca89858e5eaa@iotcl.com" ] } } }

Edited by Toon Claes
