Fix archiving some corner-case files into zip
At $DAYJOB, we've had a customer report an issue where they failed to download a zip archive from a repository. The error they saw coming from git-archive(1) is:
fatal: deflate error (0)
My friendly colleague Justin Tobler was able to reproduce this issue [1]. We've determined that this error happens on some files that exceed core.bigFileThreshold. To reproduce the issue, you can run:
git clone --depth=1 https://github.com/chromium/chromium.git
cd chromium
git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js
(originally he mentioned another file, but that didn't trigger the bug for me)
And a patch to fix the issue was presented in that message.
I have tested the fix, and I can confirm this fixes the issue. But I'm concerned this doesn't fix all issues.
Another way one could trigger the issue is by initializing unsigned char compressed with length STREAM_BUFFER_SIZE / 2 (so half the length of the input buffer, instead of double).
With Justin's fix, the error no longer happens. But it seems the resulting zip archive isn't valid. When I try to unzip it, I see:
inflating: chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js bad CRC 3ba68a86 (should be b09a04a2)
And when the length is set to STREAM_BUFFER_SIZE (so equal in length to the input buffer), decompression succeeds, but the data seems to be mangled.
This is because only the final call of git_deflate() is being wrapped in a loop for the current chunk of input data. We can see in various other callsites in the Git codebase that git_deflate() is usually called in a while loop (even when the flush parameter is set to 0, i.e. Z_NO_FLUSH).
For the record, I want to give all the credit to Justin for diagnosing this bug and determining a solution. Where he aims to provide a fix that is minimal, I wanted to present an alternative solution that implements zlib usage according to the official usage example [2], but the changes are more substantial.
I'm on the fence about which of the two is the better approach. Because the ZIP format has an End Of Central Directory record (EOCD) at the end, it's far more likely that only the final git_deflate() call suffers from unprocessed input data, so the fix Justin provides probably Just Works. I'm gonna leave it up to the community to decide what is "better".
-- Cheers, Toon
Cc: Justin Tobler <jltobler@gmail.com>
Cc: "René Scharfe" <l.s.r@web.de>
Changes in v2:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v1: https://lore.kernel.org/r/20250804-toon-archive-zip-fix-v1-0-ca89858e5eaa@iotcl.com
--- b4-submit-tracking ---
This section is used internally by b4 prep for tracking purposes.
{ "series": { "revision": 2, "change-id": "20250801-toon-archive-zip-fix-2deac42d5aa3", "prefixes": [], "history": { "v1": [ "20250804-toon-archive-zip-fix-v1-0-ca89858e5eaa@iotcl.com" ] } } }