From 91badeba32d31d4dcc695a8888e5b697b4c3d90c Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Mon, 24 Oct 2022 14:43:12 -0400 Subject: builtin/repack.c: implement `--expire-to` for storing pruned objects When pruning objects with `--cruft`, `git repack` offers some flexibility when selecting the set of which objects are pruned via the `--cruft-expiration` option. This is useful for expiring objects which are older than the grace period, making races where to-be-pruned objects become reachable and then ancestors of freshly pushed objects, leaving the repository in a corrupt state after pruning substantially less likely [1]. But in practice, such races are impossible to avoid entirely, no matter how long the grace period is. To prevent this race, it is often advisable to temporarily put a repository into a read-only state. But in practice, this is not always practical, and so some middle ground would be nice. This patch introduces a new option, `--expire-to`, which teaches `git repack` to write an additional cruft pack containing just the objects which were pruned from the repository. The caller can specify a directory outside of the current repository as the destination for this second cruft pack. This makes it possible to prune objects from a repository, while still holding onto a supplemental copy of them outside of the original repository. Having this copy on-disk makes it substantially easier to recover objects when the aforementioned race is encountered. `--expire-to` is implemented in a somewhat convoluted manner, which is to take advantage of the fact that the first time `write_cruft_pack()` is called, it adds the name of the cruft pack to the `names` string list. That means the second time we call `write_cruft_pack()`, objects in the previously-written cruft pack will be excluded. As long as the caller ensures that no objects are expired during the second pass, this is sufficient to generate a cruft pack containing all objects which don't appear in any of the new packs written by `git repack`, including the cruft pack. In other words, all of the objects which are about to be pruned from the repository. It is important to note that the destination in `--expire-to` does not necessarily need to be a Git repository (though it can be) Notably, the expired packs do not contain all ancestors of expired objects. So if the source repository contains something like: / C1 --- C2 \ refs/heads/master where C2 is unreachable, but has a parent (C1) which is reachable, and C2 would be pruned, then the expiry pack will contain only C2, not C1. [1]: https://lore.kernel.org/git/20190319001829.GL29661@sigill.intra.peff.net/ Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- builtin/repack.c | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) (limited to 'builtin/repack.c') diff --git a/builtin/repack.c b/builtin/repack.c index 432fa3f677..a70962ee03 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -702,6 +702,10 @@ static int write_cruft_pack(const struct pack_objects_args *args, * By the time it is read here, it contains only the pack(s) * that were just written, which is exactly the set of packs we * want to consider kept. + * + * If `--expire-to` is given, the double-use served by `names` + * ensures that the pack written to `--expire-to` excludes any + * objects contained in the cruft pack. */ in = xfdopen(cmd.in, "w"); for_each_string_list_item(item, names) @@ -755,6 +759,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) int geometric_factor = 0; int write_midx = 0; const char *cruft_expiration = NULL; + const char *expire_to = NULL; struct option builtin_repack_options[] = { OPT_BIT('a', NULL, &pack_everything, @@ -804,6 +809,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) N_("find a geometric progression with factor ")), OPT_BOOL('m', "write-midx", &write_midx, N_("write a multi-pack index of the resulting packs")), + OPT_STRING(0, "expire-to", &expire_to, N_("dir"), + N_("pack prefix to store a pack containing pruned objects")), OPT_END() }; @@ -1000,6 +1007,39 @@ int cmd_repack(int argc, const char **argv, const char *prefix) &existing_kept_packs); if (ret) return ret; + + if (delete_redundant && expire_to) { + /* + * If `--expire-to` is given with `-d`, it's possible + * that we're about to prune some objects. With cruft + * packs, pruning is implicit: any objects from existing + * packs that weren't picked up by new packs are removed + * when their packs are deleted. + * + * Generate an additional cruft pack, with one twist: + * `names` now includes the name of the cruft pack + * written in the previous step. So the contents of + * _this_ cruft pack exclude everything contained in the + * existing cruft pack (that is, all of the unreachable + * objects which are no older than + * `--cruft-expiration`). + * + * To make this work, cruft_expiration must become NULL + * so that this cruft pack doesn't actually prune any + * objects. If it were non-NULL, this call would always + * generate an empty pack (since every object not in the + * cruft pack generated above will have an mtime older + * than the expiration). + */ + ret = write_cruft_pack(&cruft_po_args, expire_to, + pack_prefix, + NULL, + &names, + &existing_nonkept_packs, + &existing_kept_packs); + if (ret) + return ret; + } } string_list_sort(&names); -- cgit v1.3-5-g9baa