aboutsummaryrefslogtreecommitdiff
path: root/t/t7703-repack-geometric.sh
AgeCommit message (Collapse)Author
2026-01-14builtin/repack: handle promisor packs with geometric repackingPatrick Steinhardt
When performing a fetch with an object filter, we mark the resulting packfile as a promisor pack. An object part of such a pack may miss any of its referenced objects, and Git knows to handle this case by fetching any such missing objects from the promisor remote. The "promisor" property needs to be retained going forward. So every time we pack a promisor object, the resulting pack must be marked as a promisor pack. git-repack(1) does this already: when a repository has a promisor remote, it knows to pass "--exclude-promisor-objects" to the git-pack-objects(1) child process. Promisor packs are written separately when doing an all-into-one repack via `repack_promisor_objects()`. But we don't support promisor objects when doing a geometric repack yet. Promisor packs do not get any special treatment there, as we simply merge promisor and non-promisor packs. The resulting pack is not even marked as a promisor pack, which essentially corrupts the repository. This corruption couldn't happen in the real world though: we pass both "--exclude-promisor-objects" and "--stdin-packs" to git-pack-objects(1) if a repository has a promisor remote, but as those options are mutually exclusive we always end up dying. And while we made those flags compatible with one another in a preceding commit, we still end up dying in case git-pack-objects(1) is asked to repack a promisor pack. There's multiple ways to fix this: - We can exclude promisor packs from the geometric progression altogether. This would have the consequence that we never repack promisor packs at all. But in a partial clone it is quite likely that the user generates a bunch of promisor packs over time, as every backfill fetch would create another one. So this doesn't really feel like a sensible option. - We can adapt git-pack-objects(1) to support repacking promisor packs and include them in the normal geometric progression. But this would mean that the set of promisor objects expands over time as the packs are merged with normal packs. - We can use a separate geometric progression to repack promisor packs. The first two options both have significant downsides, so they aren't really feasible. But the third option fixes both of these downsides: we make sure that promisor packs get merged, and at the same time we never expand the set of promisor objects beyond the set of objects that are already marked as promisor objects. Implement this strategy so that geometric repacking works in partial clones. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-11midx-write: skip rewriting MIDX with `--stdin-packs` unless neededPatrick Steinhardt
In `write_midx_internal()` we know to skip rewriting the multi-pack index in case the existing one already covers all packs. This logic does not know to handle `git multi-pack-index write --stdin-packs` though, so we end up always rewriting the MIDX in this case even if the MIDX would not change. With our default maintenance strategy this isn't really much of a problem, as git-gc(1) does not use the "--stdin-packs" option. But that is changing with geometric repacking, where "--stdin-packs" is used to explicitly select the packfiles part of the geometric sequence. This issue can be demonstrated trivially with a benchmark in the Git repository: executing `git repack --geometric=2 --write-midx -d` in the Git repository takes more than 3 seconds only to end up with the same multi-pack index as we already had before. The logic that decides if we need to rewrite the MIDX only checks whether the number of packfiles covered will change. That check is of course too lenient for "--stdin-packs", as it could happen that we want to cover a different-but-same-size set of packfiles. But there is no inherent reason why we cannot handle "--stdin-packs". Improve the logic to not only check for the number of packs, but to also verify that we are asked to generate a MIDX for the _same_ packs. This allows us to also skip no-op rewrites for "--stdin-packs". Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21t: remove TEST_PASSES_SANITIZE_LEAK annotationsPatrick Steinhardt
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there is no longer a need to have that variable declared in all of our tests. Drop it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05builtin/repack: fix leaking keep-pack listPatrick Steinhardt
The list of packs to keep is populated via a command line option but never free'd. Plug this memory leak. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-02tests: teach callers of test_i18ngrep to use test_grepJunio C Hamano
They are equivalents and the former still exists, so as long as the only change this commit makes are to rewrite test_i18ngrep to test_grep, there won't be any new bug, even if there still are callers of test_i18ngrep remaining in the tree, or when merged to other topics that add new uses of test_i18ngrep. This patch was produced more or less with git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-*.sh' | xargs perl -p -i -e 's/test_i18ngrep /test_grep /' and a good way to sanity check the result yourself is to run the above in a checkout of c4603c1c (test framework: further deprecate test_i18ngrep, 2023-10-31) and compare the resulting working tree contents with the result of applying this patch to the same commit. You'll see that test_i18ngrep in a few t/lib-*.sh files corrected, in addition to the manual reproduction. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: disable writing bitmaps when doing a local repackPatrick Steinhardt
In order to write a bitmap, we need to have full coverage of all objects that are about to be packed. In the traditional non-multi-pack-index world this meant we need to do a full repack of all objects into a single packfile. But in the new multi-pack-index world we can get away with writing bitmaps when we have multiple packfiles as long as the multi-pack-index covers all objects. This is not always the case though. When asked to perform a repack of local objects, only, then we cannot guarantee to have full coverage of all objects regardless of whether we do a full repack or a repack with a multi-pack-index. The end result is that writing the bitmap will fail in both worlds: $ git multi-pack-index write --stdin-packs --bitmap <packfiles warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing) error: could not write multi-pack bitmap Now there are two different ways to fix this. The first one would be to amend git-multi-pack-index(1) to disable writing bitmaps when we notice that we don't have full object coverage. - We don't have enough information in git-multi-pack-index(1) in order to tell whether the local repository _should_ have full coverage. Because even when connected to an alternate object directory, it may be the case that we still have all objects around in the main object database. - git-multi-pack-index(1) is quite a low-level tool. Automatically disabling functionality that it was asked to provide does not feel like the right thing to do. We can easily fix it at a higher level in git-repack(1) though. When asked to only include local objects via `-l` and when connected to an alternate object directory then we will override the user's ask and disable writing bitmaps with a warning. This is similar to what we do in git-pack-objects(1), where we also disable writing bitmaps in case we omit an object from the pack. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: honor `-l` when calculating pack geometryPatrick Steinhardt
When the user passes `-l` to git-repack(1), then they essentially ask us to only repack objects part of the local object database while ignoring any packfiles part of an alternate object database. And we in fact honor this bit when doing a geometric repack as the resulting packfile will only ever contain local objects. What we're missing though is that we don't take locality of packfiles into account when computing whether the geometric sequence is intact or not. So even though we would only ever roll up local packfiles anyway, we could end up trying to repack because of non-local packfiles. This does not make much sense, and in the worst case it can cause us to try and do the geometric repack over and over again because we're never able to restore the geometric sequence. Fix this bug by honoring whether the user has passed `-l`. If so, we skip adding any non-local packfiles to the pack geometry. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14pack-objects: fix error when packing same pack twicePatrick Steinhardt
When passed the same packfile twice via `--stdin-packs` we return an error that the packfile supposedly was not found. This is because when reading packs into the list of included or excluded packfiles, we will happily re-add packfiles even if they are part of the lists already. And while the list can now contain duplicates, we will only set the `util` pointer of the first list entry to the `packed_git` structure. We notice that at a later point when checking that all list entries have their `util` pointer set and die with an error. While this is kind of a nonsensical request, this scenario can be hit when doing geometric repacks. When a repository is connected to an alternate object directory and both have the exact same packfile then both would get added to the geometric sequence. And when we then decide to perform the repack, we will invoke git-pack-objects(1) with the same packfile twice. Fix this bug by removing any duplicates from both the included and excluded packs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: fix generating multi-pack-index with only non-local packsPatrick Steinhardt
When writing the multi-pack-index with geometric repacking we will add all packfiles to the index that are part of the geometric sequence. This can potentially also include packfiles borrowed from an alternate object directory. But given that a multi-pack-index can only ever include packs that are part of the main object database this does not make much sense whatsoever. In the edge case where all packfiles are contained in the alternate object database and the local repository has none itself this bug can cause us to invoke git-multi-pack-index(1) with only non-local packfiles that it ultimately cannot find. This causes it to return an error and thus causes the geometric repack to fail. Fix the code to skip non-local packfiles. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14repack: fix trying to use preferred pack in alternatesPatrick Steinhardt
When doing a geometric repack with multi-pack-indices, then we ask git-multi-pack-index(1) to use the largest packfile as the preferred pack. It can happen though that the largest packfile is not part of the main object database, but instead part of an alternate object database. The result is that git-multi-pack-index(1) will not be able to find the preferred pack and print a warning. It then falls back to use the first packfile that the multi-pack-index shall reference. Fix this bug by only considering packfiles as preferred pack that are local. This is the right thing to do given that a multi-pack-index should never reference packfiles borrowed from an alternate. While at it, rename the function `get_largest_active_packfile()` to `get_preferred_pack()` to better document its intent. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17repack: don't remove .keep packs with `--pack-kept-objects`Taylor Blau
`git repack` supports a `--pack-kept-objects` flag which more or less translates to whether or not we pass `--honor-pack-keep` down to `git pack-objects` when assembling a new pack. This behavior has existed since ee34a2bead (repack: add `repack.packKeptObjects` config var, 2014-03-03). In that commit, the documentation was extended to say: [...] Note that we still do not delete `.keep` packs after `pack-objects` finishes. Unfortunately, this is not the case when `--pack-kept-objects` is combined with a `--geometric` repack. When doing a geometric repack, we include `.keep` packs when enumerating available packs only when `pack_kept_objects` is set. So this all works fine when `--no-pack-kept-objects` (or similar) is given. Kept packs are excluded from the geometric roll-up, so when we go to delete redundant packs (with `-d`), no `.keep` packs appear "below the split" in our geometric progression. But when `--pack-kept-objects` is given, things can go awry. Namely, when a kept pack is included in the list of packs tracked by the `pack_geometry` struct *and* part of the pack roll-up, we will delete the `.keep` pack when we shouldn't. Note that this *doesn't* result in object corruption, since the `.keep` pack's objects are still present in the new pack. But the `.keep` pack itself is removed, which violates our promise from back in ee34a2bead. But there's more. Because `repack` computes the geometric roll-up independently from selecting which packs belong in a MIDX (with `--write-midx`), this can lead to odd behavior. Consider when a `.keep` pack appears below the geometric split (ie., its objects will be part of the new pack we generate). We'll write a MIDX containing the new pack along with the existing `.keep` pack. But because the `.keep` pack appears below the geometric split line, we'll (incorrectly) try to remove it. While this doesn't corrupt the repository, it does cause us to remove the MIDX we just wrote, since removing that pack would invalidate the new MIDX. Funny enough, this behavior became far less noticeable after e4d0c11c04 (repack: respect kept objects with '--write-midx -b', 2021-12-20), which made `pack_kept_objects` be enabled by default only when we were writing a non-MIDX bitmap. But e4d0c11c04 didn't resolve this bug, it just made it harder to notice unless callers explicitly passed `--pack-kept-objects`. The solution is to avoid trying to remove `.keep` packs during `--geometric` repacks, even when they appear below the geometric split line, which is the approach this patch implements. Co-authored-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20builtin/repack.c: ensure that `names` is sortedTaylor Blau
The previous patch demonstrates a scenario where the list of packs written by `pack-objects` (and stored in the `names` string_list) is out-of-order, and can thus cause us to delete packs we shouldn't. This patch resolves that bug by ensuring that `names` is sorted in all cases, not just when delete_redundant && pack_everything & ALL_INTO_ONE is true. Because we did sort `names` in that case (which, prior to `--geometric` repacks, was the only time we would actually delete packs, this is only a bug for `--geometric` repacks. It would be sufficient to only sort `names` when `delete_redundant` is set to a non-zero value. But sorting a small list of strings is cheap, and it is defensive against future calls to `string_list_has_string()` on this list. Co-discovered-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20t7703: demonstrate object corruption with pack.packSizeLimitTaylor Blau
When doing a `--geometric=<d>` repack, `git repack` determines a splitting point among packs ordered by their object count such that: - each pack above the split has at least `<d>` times as many objects as the next-largest pack by object count, and - the first pack above the split has at least `<d>` times as many object as the sum of all packs below the split line combined `git repack` then creates a pack containing all of the objects contained in packs below the split line by running `git pack-objects --stdin-packs` underneath. Once packs are moved into place, then any packs below the split line are removed, since their objects were just combined into a new pack. But `git repack` tries to be careful to avoid removing a pack that it just wrote, by checking: struct packed_git *p = geometry->pack[i]; if (string_list_has_string(&names, hash_to_hex(p->hash))) continue; in the `delete_redundant` and `geometric` conditional towards the end of `cmd_repack`. But it's possible to trick `git repack` into not recognizing a pack that it just wrote when `names` is out-of-order (which violates `string_list_has_string()`'s assumption that the list is sorted and thus binary search-able). When this happens in just the right circumstances, it is possible to remove a pack that we just wrote, leading to object corruption. Luckily, this is quite difficult to provoke in practice (for a couple of reasons): - we ordinarily write just one pack, so `names` usually contains just one entry, and is thus sorted - when we do write more than one pack (e.g., due to `--max-pack-size`) we have to: (a) write a pack identical to one that already exists, (b) have that pack be below the split line, and (c) have the set of packs written by `pack-objects` occur in an order which tricks `string_list_has_string()`. Demonstrate the above scenario in a failing test, which causes `git repack --geometric` to write a pack which occurs below the split line, _and_ fail to recognize that it wrote that pack. The following patch will fix this bug. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20repack: respect --keep-pack with geometric repackVictoria Dye
Update 'repack' to ignore packs named on the command line with the '--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat command line-kept packs the same way it treats packs with an on-disk '.keep' file (that is, skip the pack and do not include it in the 'geometry' structure). Without this handling, a '--keep-pack' pack would be included in the 'geometry' structure. If the pack is *before* the geometry split line (with at least one other pack and/or loose objects present), 'repack' assumes the pack's contents are "rolled up" into another pack via 'pack-objects'. However, because the internally-invoked 'pack-objects' properly excludes '--keep-pack' objects, any new pack it creates will not contain the kept objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since it assumes 'pack-objects' created a new pack with its contents), resulting in possible object loss and repository corruption. Add a test ensuring that '--keep-pack' packs are now appropriately handled. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28builtin/repack.c: make largest pack preferredTaylor Blau
When repacking into a geometric series and writing a multi-pack bitmap, it is beneficial to have the largest resulting pack be the preferred object source in the bitmap's MIDX, since selecting the large packs can lead to fewer broken delta chains and better compression. Teach 'git repack' to identify this pack and pass it to the MIDX write machinery in order to mark it as preferred. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28builtin/repack.c: support writing a MIDX while repackingTaylor Blau
Teach `git repack` a new `--write-midx` option for callers that wish to persist a multi-pack index in their repository while repacking. There are two existing alternatives to this new flag, but they don't cover our particular use-case. These alternatives are: - Call 'git multi-pack-index write' after running 'git repack', or - Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running 'git repack'. The former works, but introduces a gap in bitmap coverage between repacking and writing a new MIDX (since the repack may have deleted a pack included in the existing MIDX, invalidating it altogether). Setting the 'GIT_TEST_' environment variable is obviously unsupported. In fact, even if it were supported officially, it still wouldn't work, because it generates the MIDX *after* redundant packs have been dropped, leading to the same issue as above. Introduce a new option which eliminates this race by teaching `git repack` to generate the MIDX at the critical point: after the new packs have been written and moved into place, but before the redundant packs have been removed. This option is compatible with `git repack`'s '--bitmap' option (it changes the interpretation to be: "write a bitmap corresponding to the MIDX after one has been generated"). There is a little bit of additional noise in the patch below to avoid repeating ourselves when selecting which packs to delete. Instead of a single loop as before (where we iterate over 'existing_packs', decide if a pack is worth deleting, and if so, delete it), we have two loops (the first where we decide which ones are worth deleting, and the second where we actually do the deleting). This makes it so we have a single check we can make consistently when (1) telling the MIDX which packs we want to exclude, and (2) actually unlinking the redundant packs. There is also a tiny change to short-circuit the body of write_midx_included_packs() when no packs remain in the case of an empty repository. The MIDX code does not handle this, so avoid trying to generate a MIDX covering zero packs in the first place. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05t7703: test --geometric repack with loose objectsTaylor Blau
We don't currently have a test that demonstrates the non-idempotent behavior of 'git repack --geometric' with loose objects, so add one here to make sure we don't regress in this area. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05builtin/repack.c: do not repack single packs with --geometricTaylor Blau
In 0fabafd0b9 (builtin/repack.c: add '--geometric' option, 2021-02-22), the 'git repack --geometric' code aborts early when there is zero or one pack. When there are no packs, this code does the right thing by placing the split at "0". But when there is exactly one pack, the split is placed at "1", which means that "git repack --geometric" (with any factor) repacks all of the objects in a single pack. This is wasteful, and the remaining code in split_pack_geometry() does the right thing (not repacking the objects in a single pack) even when only one pack is present. Loosen the guard to only stop when there aren't any packs, and let the rest of the code do the right thing. Add a test to ensure that this is the case. Noticed-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22builtin/repack.c: add '--geometric' optionTaylor Blau
Often it is useful to both: - have relatively few packfiles in a repository, and - avoid having so few packfiles in a repository that we repack its entire contents regularly This patch implements a '--geometric=<n>' option in 'git repack'. This allows the caller to specify that they would like each pack to be at least a factor times as large as the previous largest pack (by object count). Concretely, say that a repository has 'n' packfiles, labeled P1, P2, ..., up to Pn. Each packfile has an object count equal to 'objects(Pn)'. With a geometric factor of 'r', it should be that: objects(Pi) > r*objects(P(i-1)) for all i in [1, n], where the packs are sorted by objects(P1) <= objects(P2) <= ... <= objects(Pn). Since finding a true optimal repacking is NP-hard, we approximate it along two directions: 1. We assume that there is a cutoff of packs _before starting the repack_ where everything to the right of that cut-off already forms a geometric progression (or no cutoff exists and everything must be repacked). 2. We assume that everything smaller than the cutoff count must be repacked. This forms our base assumption, but it can also cause even the "heavy" packs to get repacked, for e.g., if we have 6 packs containing the following number of objects: 1, 1, 1, 2, 4, 32 then we would place the cutoff between '1, 1' and '1, 2, 4, 32', rolling up the first two packs into a pack with 2 objects. That breaks our progression and leaves us: 2, 1, 2, 4, 32 ^ (where the '^' indicates the position of our split). To restore a progression, we move the split forward (towards larger packs) joining each pack into our new pack until a geometric progression is restored. Here, that looks like: 2, 1, 2, 4, 32 ~> 3, 2, 4, 32 ~> 5, 4, 32 ~> ... ~> 9, 32 ^ ^ ^ ^ This has the advantage of not repacking the heavy-side of packs too often while also only creating one new pack at a time. Another wrinkle is that we assume that loose, indexed, and reflog'd objects are insignificant, and lump them into any new pack that we create. This can lead to non-idempotent results. Suggested-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>