author: Taylor Blau <me@ttaylorr.com> 2026-02-24 14:00:13 -0500
committer: Junio C Hamano <gitster@pobox.com> 2026-02-24 11:16:33 -0800
commit: b2ec8e90c201d2395b67f89f3bf38bb472e43460
tree: c86297263d2deb2234118d553f2cd02e1cdedf9b /Documentation/gitformat-pack.adoc
parent: 82c905ea6bd79cd2045fea91b67f4c858379bff1
midx: do not require packs to be sorted in lexicographic order

The MIDX file format currently requires that pack files be identified by
the lexicographic ordering of their names (that is, a pack having a
checksum beginning with "abc" would have a numeric pack_int_id which is
smaller than the same value for a pack beginning with "bcd").

As a result, it is impossible to combine adjacent MIDX layers together
without permuting bits from bitmaps that are in more recent layer(s).

To see why, consider the following example:

            | packs       | preferred pack
    --------+-------------+---------------
    MIDX #0 | { X, Y, Z } | Y
    MIDX #1 | { A, B, C } | B
    MIDX #2 | { D, E, F } | D

, where MIDX #2's base MIDX is MIDX #1, and so on.

Suppose that we want to combine MIDX layers #0 and #1, to create a new
layer #0' containing the packs from both layers. With the original three
MIDX layers, objects are laid out in the bitmap in the order they appear
in their source pack, and the packs themselves are arranged according to
the pseudo-pack order. In this case, that ordering is Y, X, Z, B, A, C.

But recall that the pseudo-pack ordering is defined by the order that
packs appear in the MIDX, with the exception of the preferred pack,
which sorts ahead of all other packs regardless of its position within
the MIDX. In the above example, that means that pack 'Y' could be placed
anywhere (so long as it is designated as preferred), however, all other
packs must be placed in the location listed above.

Because that ordering isn't sorted lexicographically, it is impossible
to compact MIDX layers in the above configuration without permuting the
object-to-bit-position mapping. Changing this mapping would affect all
bitmaps belonging to newer layers, rendering the bitmaps associated with
MIDX #2 unreadable.

One of the goals of MIDX compaction is that we are able to shrink the
length of the MIDX chain *without* invalidating bitmaps that belong to
newer layers, and the lexicographic ordering constraint is at odds with
this goal.
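
The pseudo-pack ordering argument above can be checked with a small toy model. This is only a sketch, not Git's implementation: `toy_layer`, `toy_pseudo_pack_order()` and `toy_chain_order()` are hypothetical names, and each pack is reduced to a single-character name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical, not a Git structure): one character per pack. */
struct toy_layer {
	const char *packs;	/* pack names in MIDX order, e.g. "XYZ" */
	size_t preferred;	/* index of the preferred pack within `packs` */
};

/*
 * Append one layer's packs to `out` in pseudo-pack order: the preferred
 * pack first, then every other pack in the order the MIDX lists it.
 * Returns the number of names written.
 */
static size_t toy_pseudo_pack_order(const struct toy_layer *layer, char *out)
{
	size_t nr = strlen(layer->packs), n = 0, i;

	assert(layer->preferred < nr);
	out[n++] = layer->packs[layer->preferred];
	for (i = 0; i < nr; i++)
		if (i != layer->preferred)
			out[n++] = layer->packs[i];
	return n;
}

/* Concatenate the layers from oldest to newest and NUL-terminate `out`. */
static void toy_chain_order(const struct toy_layer *layers, size_t nr_layers,
			    char *out)
{
	size_t i, n = 0;

	for (i = 0; i < nr_layers; i++)
		n += toy_pseudo_pack_order(&layers[i], out + n);
	out[n] = '\0';
}
```

In this model, layers `{"XYZ", 1}` and `{"ABC", 1}` yield `YXZBAC`. A single compacted layer that lists its packs lexicographically, `{"ABCXYZ", 4}`, yields `YABCXZ`, which moves every pack except Y. A compacted layer that is free to list them as `{"YXZBAC", 0}` reproduces `YXZBAC`, leaving the existing bit positions where they were.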

However, packs do not *need* to be lexicographically ordered within the
MIDX. As far as I can gather, the only reason they are sorted lexically
is to make it possible to perform a binary search over the pack names in
a MIDX, necessary to make `midx_contains_pack()`'s performance
logarithmic in the number of packs rather than linear.

Relax this constraint by allowing MIDX writes to proceed with packs that
are not arranged in lexicographic order. `midx_contains_pack()` will
lazily instantiate a `pack_names_sorted` array on the MIDX, which will
be used to implement the binary search over pack names.

This change produces MIDXs which may not be correctly read with external
tools or older versions of Git. Though older versions of Git know how to
gracefully degrade and ignore any MIDX(s) they consider corrupt,
external tools may not be as robust. To avoid unintentionally breaking
any such tools, guard this change behind a version bump in the MIDX's
on-disk format.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

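
The lazily built sorted array described in the message could look roughly like the following. It is a minimal sketch under stated assumptions: `toy_midx` and `toy_midx_contains_pack()` are hypothetical stand-ins, names are compared with plain `strcmp()`, and Git's real `midx_contains_pack()` may compare names differently and manage memory through its own helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the in-memory MIDX; not Git's real struct. */
struct toy_midx {
	const char **pack_names;	/* on-disk order, possibly unsorted */
	size_t nr_packs;
	const char **pack_names_sorted;	/* NULL until the first lookup */
};

/* Compare two array elements, each of which is a `const char *`. */
static int toy_cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Returns 1 if `name` is one of the MIDX's packs, 0 otherwise. */
static int toy_midx_contains_pack(struct toy_midx *m, const char *name)
{
	size_t elem = sizeof(*m->pack_names_sorted);

	if (!m->nr_packs)
		return 0;

	if (!m->pack_names_sorted) {
		/* First lookup: sort a copy, leaving the on-disk order alone. */
		m->pack_names_sorted = malloc(m->nr_packs * elem);
		assert(m->pack_names_sorted);
		memcpy(m->pack_names_sorted, m->pack_names, m->nr_packs * elem);
		qsort(m->pack_names_sorted, m->nr_packs, elem, toy_cmp_names);
	}

	/* Every lookup is a binary search over the sorted copy. */
	return bsearch(&name, m->pack_names_sorted, m->nr_packs, elem,
		       toy_cmp_names) != NULL;
}
```

The sort cost is paid once, on the first lookup, so a MIDX that is never queried by pack name pays nothing. The on-disk order in `pack_names` is never modified, which is what keeps the bit positions stable. The caller is responsible for freeing `pack_names_sorted` when the toy MIDX is discarded.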
Diffstat (limited to 'Documentation/gitformat-pack.adoc')
-rw-r--r-- Documentation/gitformat-pack.adoc | 8
1 files changed, 6 insertions, 2 deletions
diff --git a/Documentation/gitformat-pack.adoc b/Documentation/gitformat-pack.adoc
index 1b4db4aa61..3416edceab 100644
--- a/Documentation/gitformat-pack.adoc
+++ b/Documentation/gitformat-pack.adoc
@@ -374,7 +374,9 @@ HEADER:
The signature is: {'M', 'I', 'D', 'X'}
1-byte version number:
- Git only writes or recognizes version 1.
+ Git writes the version specified by the "midx.version"
+ configuration option, which defaults to 2. It recognizes
+ both versions 1 and 2.
1-byte Object Id Version
We infer the length of object IDs (OIDs) from this value:
@@ -413,7 +415,9 @@ CHUNK DATA:
strings. There is no extra padding between the filenames,
and they are listed in lexicographic order. The chunk itself
is padded at the end with between 0 and 3 NUL bytes to make the
- chunk size a multiple of 4 bytes.
+ chunk size a multiple of 4 bytes. Version 1 MIDXs are required to
+ list their packs in lexicographic order, but version 2 MIDXs may
+ list their packs in any arbitrary order.
Bitmapped Packfiles (ID: {'B', 'T', 'M', 'P'})
Stores a table of two 4-byte unsigned integers in network order.