From 3d89a8c11801af1f7aae9d009240fd43cf322845 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Fri, 20 May 2022 19:17:32 -0400 Subject: Documentation/technical: add cruft-packs.txt Create a technical document to explain cruft packs. It contains a brief overview of the problem, some background, details on the implementation, and a couple of alternative approaches not considered here. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/technical/cruft-packs.txt | 123 ++++++++++++++++++++++++++++++++ 1 file changed, 123 insertions(+) create mode 100644 Documentation/technical/cruft-packs.txt (limited to 'Documentation/technical') diff --git a/Documentation/technical/cruft-packs.txt b/Documentation/technical/cruft-packs.txt new file mode 100644 index 0000000000..c0f583cd48 --- /dev/null +++ b/Documentation/technical/cruft-packs.txt @@ -0,0 +1,123 @@ += Cruft packs + +The cruft packs feature offers an alternative to Git's traditional mechanism of +removing unreachable objects. This document provides an overview of Git's +pruning mechanism, and how a cruft pack can be used instead to accomplish the +same. + +== Background + +To remove unreachable objects from your repository, Git offers `git repack -Ad` +(see linkgit:git-repack[1]). Quoting from the documentation: + +[quote] +[...] unreachable objects in a previous pack become loose, unpacked objects, +instead of being left in the old pack. [...] loose unreachable objects will be +pruned according to normal expiry rules with the next 'git gc' invocation. + +Unreachable objects aren't removed immediately, since doing so could race with +an incoming push which may reference an object which is about to be deleted. +Instead, those unreachable objects are stored as loose object and stay that way +until they are older than the expiration window, at which point they are removed +by linkgit:git-prune[1]. + +Git must store these unreachable objects loose in order to keep track of their +per-object mtimes. 
If these unreachable objects were written into one big pack, +then either freshening that pack (because an object contained within it was +re-written) or creating a new pack of unreachable objects would cause the pack's +mtime to get updated, and the objects within it would never leave the expiration +window. Instead, objects are stored loose in order to keep track of the +individual object mtimes and avoid a situation where all cruft objects are +freshened at once. + +This can lead to undesirable situations when a repository contains many +unreachable objects which have not yet left the grace period. Having large +directories in the shards of `.git/objects` can lead to decreased performance in +the repository. Given enough unreachable objects, it can even lead to inode +starvation and degrade the performance of the whole system. Because those +objects can never be packed, these repositories often take up a large amount of +disk space: the objects can only be zlib-compressed, not stored in delta +chains. + +== Cruft packs + +A cruft pack eliminates the need for storing unreachable objects in a loose +state by including the per-object mtimes in a separate file alongside a single +pack containing all unreachable objects. + +A cruft pack is written by `git repack --cruft` when generating a new pack, +using linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft` +is a classic all-into-one repack, meaning that everything in the resulting pack is +reachable, and everything else is unreachable. Once written, the `--cruft` +option instructs `git repack` to generate another pack containing only objects +not packed in the previous step (which equates to packing all unreachable +objects together). This progresses as follows: + + 1. Enumerate every object, marking as a traversal tip any object which (a) is + not contained in a kept-pack, and (b) has an mtime within the grace + period. + + 2. 
Perform a reachability traversal based on the tips gathered in the previous + step, adding every object along the way to the pack. + + 3. Write the pack out, along with a `.mtimes` file that records the per-object + timestamps. + +This mode is invoked internally by linkgit:git-repack[1] when instructed to +write a cruft pack. Crucially, the set of in-core kept packs is exactly the set +of packs which will not be deleted by the repack; in other words, they contain +all of the repository's reachable objects. + +When a repository already has a cruft pack, `git repack --cruft` typically only +adds objects to it. An exception to this is when `git repack` is given the +`--cruft-expiration` option, which allows the generated cruft pack to omit +expired objects instead of waiting for linkgit:git-gc[1] to expire those objects +later on. + +It is linkgit:git-gc[1] that is typically responsible for removing expired +unreachable objects. + +== Caution for mixed-version environments + +Repositories that have cruft packs in them will continue to work with any older +version of Git. Note, however, that previous versions of Git which do not +understand the `.mtimes` file will use the cruft pack's mtime as the mtime for +all of the objects in it. In other words, do not expect older (pre-cruft pack) +versions of Git to interpret or even read the contents of the `.mtimes` file. + +Note that having mixed versions of Git GC-ing the same repository can lead to +unreachable objects never being completely pruned. This can happen under the +following circumstances: + + - An older version of Git running GC explodes the contents of an existing + cruft pack loose, using the cruft pack's mtime. + - A newer version running GC collects those loose objects into a cruft pack, + where the `.mtimes` file reflects the loose objects' actual mtimes, but the + cruft pack's mtime is "now". 
+ +Repeating this process will lead to unreachable objects not getting pruned as a +result of repeatedly resetting the objects' mtimes to the present time. + +If you are GC-ing repositories in a mixed-version environment, consider omitting +the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and +leaving the `gc.cruftPacks` configuration unset until all writers understand +cruft packs. + +== Alternatives + +Notable alternatives to this design include: + + - The location of the per-object mtime data, and + - Storing unreachable objects in multiple cruft packs. + +On the location of mtime data, a new auxiliary file tied to the pack was chosen +to avoid complicating the `.idx` format. If the `.idx` format were ever to gain +support for optional chunks of data, it may make sense to consolidate the +`.mtimes` format into the `.idx` itself. + +Storing unreachable objects among multiple cruft packs (e.g., creating a new +cruft pack during each repacking operation including only unreachable objects +which aren't already stored in an earlier cruft pack) is significantly more +complicated to construct, and so isn't pursued here. The obvious drawback to +the current implementation is that the entire cruft pack must be re-written from +scratch. -- cgit v1.3 From 94cd775a6c52a99caeb1278c3d8044ee109e2d3e Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Fri, 20 May 2022 19:17:35 -0400 Subject: pack-mtimes: support reading .mtimes files To store the individual mtimes of objects in a cruft pack, introduce a new `.mtimes` format that can optionally accompany a single pack in the repository. The format is defined in Documentation/technical/pack-format.txt, and stores a 4-byte network order timestamp for each object in name (index) order. This patch prepares for cruft packs by defining the `.mtimes` format, and introducing a basic API that callers can use to read out individual mtimes. 
Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/technical/pack-format.txt | 19 +++++ Makefile | 1 + builtin/repack.c | 1 + object-store.h | 10 ++- pack-mtimes.c | 129 ++++++++++++++++++++++++++++++++ pack-mtimes.h | 26 +++++++ packfile.c | 19 ++++- 7 files changed, 202 insertions(+), 3 deletions(-) create mode 100644 pack-mtimes.c create mode 100644 pack-mtimes.h (limited to 'Documentation/technical') diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 6d3efb7d16..b520aa9c45 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -294,6 +294,25 @@ Pack file entry: <+ All 4-byte numbers are in network order. +== pack-*.mtimes files have the format: + +All 4-byte numbers are in network byte order. + + - A 4-byte magic number '0x4d544d45' ('MTME'). + + - A 4-byte version identifier (= 1). + + - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256). + + - A table of 4-byte unsigned integers. The ith value is the + modification time (mtime) of the ith object in the corresponding + pack by lexicographic (index) order. The mtimes count standard + epoch seconds. + + - A trailer, containing a checksum of the corresponding packfile, + and a checksum of all of the above (each having length according + to the specified hash function). + == multi-pack-index (MIDX) files have the following format: The multi-pack-index files refer to multiple pack-files and loose objects. 
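Reviewer note: the `.mtimes` layout described in the hunk above can be exercised with a small standalone reader. This is a minimal sketch under stated assumptions, not the code from this series: `read_be32`, `mtimes_expected_size` and `mtimes_lookup` are hypothetical names, the file is assumed to be fully loaded into memory already, and the trailer checksums are only counted towards the size, not verified. Unlike the patch, it does not guard the size arithmetic against overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTIMES_MAGIC 0x4d544d45u /* 'MTME', from the format description */
#define MTIMES_HEADER_BYTES 12   /* magic + version + hash id, 4 bytes each */

/* Decode one 4-byte network-order (big-endian) integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Size a well-formed file must have for a pack with nr_objects objects:
 * header, one 4-byte mtime per object, and a trailer of two checksums.
 * Returns 0 for an unknown hash id. (Hypothetical helper; no overflow
 * checks, which real code would need.)
 */
static size_t mtimes_expected_size(uint32_t hash_id, uint32_t nr_objects)
{
	size_t rawsz;

	if (hash_id == 1)
		rawsz = 20; /* SHA-1 */
	else if (hash_id == 2)
		rawsz = 32; /* SHA-256 */
	else
		return 0;
	return MTIMES_HEADER_BYTES + 4 * (size_t)nr_objects + 2 * rawsz;
}

/*
 * Validate the header and size of an in-memory .mtimes image, then read
 * the mtime of the object at index position pos. Returns 0 on success
 * and -1 on any validation failure.
 */
static int mtimes_lookup(const unsigned char *buf, size_t len,
			 uint32_t nr_objects, uint32_t pos, uint32_t *mtime)
{
	size_t expected;

	assert(buf && mtime);
	if (len < MTIMES_HEADER_BYTES)
		return -1;
	if (read_be32(buf) != MTIMES_MAGIC || read_be32(buf + 4) != 1)
		return -1;
	expected = mtimes_expected_size(read_be32(buf + 8), nr_objects);
	if (!expected || len != expected)
		return -1;
	if (pos >= nr_objects)
		return -1;
	*mtime = read_be32(buf + MTIMES_HEADER_BYTES + 4 * (size_t)pos);
	return 0;
}
```

The table starts at byte 12, so entry `pos` lives at byte offset `12 + 4 * pos`. That is the same position the patch reaches with `p->mtimes_map + pos + 3`, where the pointer arithmetic is in 4-byte words.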
diff --git a/Makefile b/Makefile index f8bccfab5e..e59328ab7d 100644 --- a/Makefile +++ b/Makefile @@ -993,6 +993,7 @@ LIB_OBJS += oidtree.o LIB_OBJS += pack-bitmap-write.o LIB_OBJS += pack-bitmap.o LIB_OBJS += pack-check.o +LIB_OBJS += pack-mtimes.o LIB_OBJS += pack-objects.o LIB_OBJS += pack-revindex.o LIB_OBJS += pack-write.o diff --git a/builtin/repack.c b/builtin/repack.c index d1a563d5b6..e7a3920c6d 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -217,6 +217,7 @@ static struct { } exts[] = { {".pack"}, {".rev", 1}, + {".mtimes", 1}, {".bitmap", 1}, {".promisor", 1}, {".idx"}, diff --git a/object-store.h b/object-store.h index bd2322ed8c..3c98028ce6 100644 --- a/object-store.h +++ b/object-store.h @@ -115,12 +115,20 @@ struct packed_git { freshened:1, do_not_close:1, pack_promisor:1, - multi_pack_index:1; + multi_pack_index:1, + is_cruft:1; unsigned char hash[GIT_MAX_RAWSZ]; struct revindex_entry *revindex; const uint32_t *revindex_data; const uint32_t *revindex_map; size_t revindex_size; + /* + * mtimes_map points at the beginning of the memory mapped region of + * this pack's corresponding .mtimes file, and mtimes_size is the size + * of that .mtimes file + */ + const uint32_t *mtimes_map; + size_t mtimes_size; /* something like ".git/objects/pack/xxxxx.pack" */ char pack_name[FLEX_ARRAY]; /* more */ }; diff --git a/pack-mtimes.c b/pack-mtimes.c new file mode 100644 index 0000000000..0e0aafdcb0 --- /dev/null +++ b/pack-mtimes.c @@ -0,0 +1,129 @@ +#include "git-compat-util.h" +#include "pack-mtimes.h" +#include "object-store.h" +#include "packfile.h" + +static char *pack_mtimes_filename(struct packed_git *p) +{ + size_t len; + if (!strip_suffix(p->pack_name, ".pack", &len)) + BUG("pack_name does not end in .pack"); + return xstrfmt("%.*s.mtimes", (int)len, p->pack_name); +} + +#define MTIMES_HEADER_SIZE (12) + +struct mtimes_header { + uint32_t signature; + uint32_t version; + uint32_t hash_id; +}; + +static int load_pack_mtimes_file(char 
*mtimes_file, + uint32_t num_objects, + const uint32_t **data_p, size_t *len_p) +{ + int fd, ret = 0; + struct stat st; + uint32_t *data = NULL; + size_t mtimes_size, expected_size; + struct mtimes_header header; + + fd = git_open(mtimes_file); + + if (fd < 0) { + ret = -1; + goto cleanup; + } + if (fstat(fd, &st)) { + ret = error_errno(_("failed to read %s"), mtimes_file); + goto cleanup; + } + + mtimes_size = xsize_t(st.st_size); + + if (mtimes_size < MTIMES_HEADER_SIZE) { + ret = error(_("mtimes file %s is too small"), mtimes_file); + goto cleanup; + } + + data = xmmap(NULL, mtimes_size, PROT_READ, MAP_PRIVATE, fd, 0); + + header.signature = ntohl(data[0]); + header.version = ntohl(data[1]); + header.hash_id = ntohl(data[2]); + + if (header.signature != MTIMES_SIGNATURE) { + ret = error(_("mtimes file %s has unknown signature"), mtimes_file); + goto cleanup; + } + + if (header.version != 1) { + ret = error(_("mtimes file %s has unsupported version %"PRIu32), + mtimes_file, header.version); + goto cleanup; + } + + if (!(header.hash_id == 1 || header.hash_id == 2)) { + ret = error(_("mtimes file %s has unsupported hash id %"PRIu32), + mtimes_file, header.hash_id); + goto cleanup; + } + + + expected_size = MTIMES_HEADER_SIZE; + expected_size = st_add(expected_size, st_mult(sizeof(uint32_t), num_objects)); + expected_size = st_add(expected_size, 2 * (header.hash_id == 1 ? 
GIT_SHA1_RAWSZ : GIT_SHA256_RAWSZ)); + + if (mtimes_size != expected_size) { + ret = error(_("mtimes file %s is corrupt"), mtimes_file); + goto cleanup; + } + +cleanup: + if (ret) { + if (data) + munmap(data, mtimes_size); + } else { + *len_p = mtimes_size; + *data_p = data; + } + + close(fd); + return ret; +} + +int load_pack_mtimes(struct packed_git *p) +{ + char *mtimes_name = NULL; + int ret = 0; + + if (!p->is_cruft) + return ret; /* not a cruft pack */ + if (p->mtimes_map) + return ret; /* already loaded */ + + ret = open_pack_index(p); + if (ret < 0) + goto cleanup; + + mtimes_name = pack_mtimes_filename(p); + ret = load_pack_mtimes_file(mtimes_name, + p->num_objects, + &p->mtimes_map, + &p->mtimes_size); +cleanup: + free(mtimes_name); + return ret; +} + +uint32_t nth_packed_mtime(struct packed_git *p, uint32_t pos) +{ + if (!p->mtimes_map) + BUG("pack .mtimes file not loaded for %s", p->pack_name); + if (p->num_objects <= pos) + BUG("pack .mtimes out-of-bounds (%"PRIu32" vs %"PRIu32")", + pos, p->num_objects); + + return get_be32(p->mtimes_map + pos + 3); +} diff --git a/pack-mtimes.h b/pack-mtimes.h new file mode 100644 index 0000000000..cc957b3e85 --- /dev/null +++ b/pack-mtimes.h @@ -0,0 +1,26 @@ +#ifndef PACK_MTIMES_H +#define PACK_MTIMES_H + +#include "git-compat-util.h" + +#define MTIMES_SIGNATURE 0x4d544d45 /* "MTME" */ +#define MTIMES_VERSION 1 + +struct packed_git; + +/* + * Loads the .mtimes file corresponding to "p", if any, returning zero + * on success. + */ +int load_pack_mtimes(struct packed_git *p); + +/* Returns the mtime associated with the object at position "pos" (in + * lexicographic/index order) in pack "p". 
+ * + * Note that it is a BUG() to call this function if either (a) "p" does + * not have a corresponding .mtimes file, or (b) it does, but it hasn't + * been loaded + */ +uint32_t nth_packed_mtime(struct packed_git *p, uint32_t pos); + +#endif diff --git a/packfile.c b/packfile.c index 835b2d2716..fc0245fbab 100644 --- a/packfile.c +++ b/packfile.c @@ -334,12 +334,22 @@ static void close_pack_revindex(struct packed_git *p) p->revindex_data = NULL; } +static void close_pack_mtimes(struct packed_git *p) +{ + if (!p->mtimes_map) + return; + + munmap((void *)p->mtimes_map, p->mtimes_size); + p->mtimes_map = NULL; +} + void close_pack(struct packed_git *p) { close_pack_windows(p); close_pack_fd(p); close_pack_index(p); close_pack_revindex(p); + close_pack_mtimes(p); oidset_clear(&p->bad_objects); } @@ -363,7 +373,7 @@ void close_object_store(struct raw_object_store *o) void unlink_pack_path(const char *pack_name, int force_delete) { - static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor"}; + static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor", ".mtimes"}; int i; struct strbuf buf = STRBUF_INIT; size_t plen; @@ -718,6 +728,10 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local) if (!access(p->pack_name, F_OK)) p->pack_promisor = 1; + xsnprintf(p->pack_name + path_len, alloc - path_len, ".mtimes"); + if (!access(p->pack_name, F_OK)) + p->is_cruft = 1; + xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack"); if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) { free(p); @@ -869,7 +883,8 @@ static void prepare_pack(const char *full_name, size_t full_name_len, ends_with(file_name, ".pack") || ends_with(file_name, ".bitmap") || ends_with(file_name, ".keep") || - ends_with(file_name, ".promisor")) + ends_with(file_name, ".promisor") || + ends_with(file_name, ".mtimes")) string_list_append(data->garbage, full_name); else report_garbage(PACKDIR_FILE_GARBAGE, full_name); -- 
cgit v1.3 From f9825d1cf752b8d04a3e9193ff6fdb54d09e28a3 Mon Sep 17 00:00:00 2001 From: Taylor Blau Date: Fri, 20 May 2022 19:18:03 -0400 Subject: builtin/repack.c: support generating a cruft pack Expose a way to split the contents of a repository into a main and cruft pack when doing an all-into-one repack with `git repack --cruft -d`, and a complementary configuration variable. Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano --- Documentation/git-repack.txt | 11 ++ Documentation/technical/cruft-packs.txt | 2 +- builtin/repack.c | 105 +++++++++++++++- t/t5329-pack-objects-cruft.sh | 207 ++++++++++++++++++++++++++++++++ 4 files changed, 319 insertions(+), 6 deletions(-) (limited to 'Documentation/technical') diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index ee30edc178..0bf13893d8 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -63,6 +63,17 @@ to the new separate pack will be written. Also run 'git prune-packed' to remove redundant loose object files. +--cruft:: + Same as `-a`, unless `-d` is used. Then any unreachable objects + are packed into a separate cruft pack. Unreachable objects can + be pruned using the normal expiry rules with the next `git gc` + invocation (see linkgit:git-gc[1]). Incompatible with `-k`. + +--cruft-expiration=<approxidate>:: + Expire unreachable objects older than `<approxidate>` + immediately instead of waiting for the next `git gc` invocation. + Only useful with `--cruft -d`. + -l:: Pass the `--local` option to 'git pack-objects'. See linkgit:git-pack-objects[1]. diff --git a/Documentation/technical/cruft-packs.txt b/Documentation/technical/cruft-packs.txt index c0f583cd48..d81f3a8982 100644 --- a/Documentation/technical/cruft-packs.txt +++ b/Documentation/technical/cruft-packs.txt @@ -17,7 +17,7 @@ pruned according to normal expiry rules with the next 'git gc' invocation. 
Unreachable objects aren't removed immediately, since doing so could race with an incoming push which may reference an object which is about to be deleted. -Instead, those unreachable objects are stored as loose object and stay that way +Instead, those unreachable objects are stored as loose objects and stay that way until they are older than the expiration window, at which point they are removed by linkgit:git-prune[1]. diff --git a/builtin/repack.c b/builtin/repack.c index e7a3920c6d..593c18d4e8 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -18,12 +18,18 @@ #include "pack-bitmap.h" #include "refs.h" +#define ALL_INTO_ONE 1 +#define LOOSEN_UNREACHABLE 2 +#define PACK_CRUFT 4 + +static int pack_everything; static int delta_base_offset = 1; static int pack_kept_objects = -1; static int write_bitmaps = -1; static int use_delta_islands; static int run_update_server_info = 1; static char *packdir, *packtmp_name, *packtmp; +static char *cruft_expiration; static const char *const git_repack_usage[] = { N_("git repack []"), @@ -305,9 +311,6 @@ static void repack_promisor_objects(const struct pack_objects_args *args, die(_("could not finish pack-objects to repack promisor objects")); } -#define ALL_INTO_ONE 1 -#define LOOSEN_UNREACHABLE 2 - struct pack_geometry { struct packed_git **pack; uint32_t pack_nr, pack_alloc; @@ -344,6 +347,8 @@ static void init_pack_geometry(struct pack_geometry **geometry_p) for (p = get_all_packs(the_repository); p; p = p->next) { if (!pack_kept_objects && p->pack_keep) continue; + if (p->is_cruft) + continue; ALLOC_GROW(geometry->pack, geometry->pack_nr + 1, @@ -605,6 +610,67 @@ static int write_midx_included_packs(struct string_list *include, return finish_command(&cmd); } +static int write_cruft_pack(const struct pack_objects_args *args, + const char *pack_prefix, + struct string_list *names, + struct string_list *existing_packs, + struct string_list *existing_kept_packs) +{ + struct child_process cmd = CHILD_PROCESS_INIT; + struct 
strbuf line = STRBUF_INIT; + struct string_list_item *item; + FILE *in, *out; + int ret; + + prepare_pack_objects(&cmd, args); + + strvec_push(&cmd.args, "--cruft"); + if (cruft_expiration) + strvec_pushf(&cmd.args, "--cruft-expiration=%s", + cruft_expiration); + + strvec_push(&cmd.args, "--honor-pack-keep"); + strvec_push(&cmd.args, "--non-empty"); + strvec_push(&cmd.args, "--max-pack-size=0"); + + cmd.in = -1; + + ret = start_command(&cmd); + if (ret) + return ret; + + /* + * names has a confusing double use: it both provides the list + * of just-written new packs, and accepts the name of the cruft + * pack we are writing. + * + * By the time it is read here, it contains only the pack(s) + * that were just written, which is exactly the set of packs we + * want to consider kept. + */ + in = xfdopen(cmd.in, "w"); + for_each_string_list_item(item, names) + fprintf(in, "%s-%s.pack\n", pack_prefix, item->string); + for_each_string_list_item(item, existing_packs) + fprintf(in, "-%s.pack\n", item->string); + for_each_string_list_item(item, existing_kept_packs) + fprintf(in, "%s.pack\n", item->string); + fclose(in); + + out = xfdopen(cmd.out, "r"); + while (strbuf_getline_lf(&line, out) != EOF) { + if (line.len != the_hash_algo->hexsz) + die(_("repack: Expecting full hex object ID lines only " + "from pack-objects.")); + string_list_append(names, line.buf); + } + fclose(out); + + strbuf_release(&line); + + return finish_command(&cmd); +} + int cmd_repack(int argc, const char **argv, const char *prefix) { struct child_process cmd = CHILD_PROCESS_INIT; @@ -621,7 +687,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix) int show_progress; /* variables to be filled by option parsing */ - int pack_everything = 0; int delete_redundant = 0; const char *unpack_unreachable = NULL; int keep_unreachable = 0; @@ -636,6 +701,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix) OPT_BIT('A', NULL, &pack_everything, N_("same as -a, and turn unreachable 
objects loose"), LOOSEN_UNREACHABLE | ALL_INTO_ONE), + OPT_BIT(0, "cruft", &pack_everything, + N_("same as -a, pack unreachable cruft objects separately"), + PACK_CRUFT), + OPT_STRING(0, "cruft-expiration", &cruft_expiration, N_("approxidate"), + N_("with --cruft, expire objects older than this")), OPT_BOOL('d', NULL, &delete_redundant, N_("remove redundant packs, and run git-prune-packed")), OPT_BOOL('f', NULL, &po_args.no_reuse_delta, @@ -688,6 +758,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix) (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE))) die(_("options '%s' and '%s' cannot be used together"), "--keep-unreachable", "-A"); + if (pack_everything & PACK_CRUFT) { + pack_everything |= ALL_INTO_ONE; + + if (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)) + die(_("options '%s' and '%s' cannot be used together"), "--cruft", "-A"); + if (keep_unreachable) + die(_("options '%s' and '%s' cannot be used together"), "--cruft", "-k"); + } + if (write_bitmaps < 0) { if (!write_midx && (!(pack_everything & ALL_INTO_ONE) || !is_bare_repository())) @@ -771,7 +850,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) if (pack_everything & ALL_INTO_ONE) { repack_promisor_objects(&po_args, &names); - if (existing_nonkept_packs.nr && delete_redundant) { + if (existing_nonkept_packs.nr && delete_redundant && + !(pack_everything & PACK_CRUFT)) { for_each_string_list_item(item, &names) { strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack", packtmp_name, item->string); @@ -833,6 +913,21 @@ int cmd_repack(int argc, const char **argv, const char *prefix) if (!names.nr && !po_args.quiet) printf_ln(_("Nothing new to pack.")); + if (pack_everything & PACK_CRUFT) { + const char *pack_prefix; + if (!skip_prefix(packtmp, packdir, &pack_prefix)) + die(_("pack prefix %s does not begin with objdir %s"), + packtmp, packdir); + if (*pack_prefix == '/') + pack_prefix++; + + ret = write_cruft_pack(&po_args, pack_prefix, &names, + 
&existing_nonkept_packs, + &existing_kept_packs); + if (ret) + return ret; + } + for_each_string_list_item(item, &names) { item->util = (void *)(uintptr_t)populate_pack_exts(item->string); } diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh index 939cdc297a..067c50af38 100755 --- a/t/t5329-pack-objects-cruft.sh +++ b/t/t5329-pack-objects-cruft.sh @@ -358,4 +358,211 @@ test_expect_success 'expired objects are pruned' ' ) ' +test_expect_success 'repack --cruft generates a cruft pack' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit reachable && + git branch -M main && + git checkout --orphan other && + test_commit unreachable && + + git checkout main && + git branch -D other && + git tag -d unreachable && + # objects are not cruft if they are contained in the reflogs + git reflog expire --all --expire=all && + + git rev-list --objects --all --no-object-names >reachable.raw && + git cat-file --batch-all-objects --batch-check="%(objectname)" >objects && + sort <reachable.raw >reachable && + comm -13 reachable objects >unreachable && + + git repack --cruft -d && + + cruft=$(basename $(ls $packdir/pack-*.mtimes) .mtimes) && + pack=$(basename $(ls $packdir/pack-*.pack | grep -v $cruft) .pack) && + + git show-index <$packdir/$pack.idx >actual.raw && + cut -f2 -d" " actual.raw | sort >actual && + test_cmp reachable actual && + + git show-index <$packdir/$cruft.idx >actual.raw && + cut -f2 -d" " actual.raw | sort >actual && + test_cmp unreachable actual + ) +' + +test_expect_success 'loose objects mtimes upsert others' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit reachable && + git repack -Ad && + git branch -M main && + + git checkout --orphan other && + test_commit cruft && + # incremental repack, leaving existing objects loose (so + # they can be "freshened") + git repack && + + tip="$(git rev-parse cruft)" && + path="$objdir/$(test_oid_to_path "$tip")" && + test-tool 
chmtime --get +1000 "$path" >expect && + + git checkout main && + git branch -D other && + git tag -d cruft && + git reflog expire --all --expire=all && + + git repack --cruft -d && + + mtimes="$(basename $(ls $packdir/pack-*.mtimes))" && + test-tool pack-mtimes "$mtimes" >actual.raw && + grep "$tip" actual.raw | cut -d" " -f2 >actual && + test_cmp expect actual + ) +' + +test_expect_success 'cruft packs are not included in geometric repack' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit reachable && + git repack -Ad && + git branch -M main && + + git checkout --orphan other && + test_commit cruft && + git repack -d && + + git checkout main && + git branch -D other && + git tag -d cruft && + git reflog expire --all --expire=all && + + git repack --cruft && + + find $packdir -type f | sort >before && + git repack --geometric=2 -d && + find $packdir -type f | sort >after && + + test_cmp before after + ) +' + +test_expect_success 'repack --geometric collects once-cruft objects' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit reachable && + git repack -Ad && + git branch -M main && + + git checkout --orphan other && + git rm -rf . && + test_commit --no-tag cruft && + cruft="$(git rev-parse HEAD)" && + + git checkout main && + git branch -D other && + git reflog expire --all --expire=all && + + # Pack the objects created in the previous step into a cruft + # pack. Intentionally leave loose copies of those objects + # around so we can pick them up in a subsequent --geometric + # repack. + git repack --cruft && + + # Now make those objects reachable, and ensure that they are + # packed into the new pack created via a --geometric repack. + git update-ref refs/heads/other $cruft && + + # Without this object, the set of unpacked objects is exactly + # the set of objects already in the cruft pack. Tweak that set + # to ensure we do not overwrite the cruft pack entirely. 
+ test_commit reachable2 && + + find $packdir -name "pack-*.idx" | sort >before && + git repack --geometric=2 -d && + find $packdir -name "pack-*.idx" | sort >after && + + { + git rev-list --objects --no-object-names $cruft && + git rev-list --objects --no-object-names reachable..reachable2 + } >want.raw && + sort want.raw >want && + + pack=$(comm -13 before after) && + git show-index <$pack >objects.raw && + + cut -d" " -f2 objects.raw | sort >got && + + test_cmp want got + ) +' + +test_expect_success 'cruft repack with no reachable objects' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit base && + git repack -ad && + + base="$(git rev-parse base)" && + + git for-each-ref --format="delete %(refname)" >in && + git update-ref --stdin cruft && + test_line_count = 1 cruft && + test-tool pack-mtimes "$(basename "$(cat cruft)")" >objects && + test_line_count = 2 objects + ) +' + +test_expect_success 'cruft repack ignores pack.packSizeLimit' ' + ( + cd max-pack-size && + # repack everything back together to remove the existing cruft + # pack (but to keep its objects) + git repack -adk && + git -c pack.packSizeLimit=1M repack --cruft && + # ensure the same post condition is met when --max-pack-size + # would otherwise be inferred from the configuration + find $packdir -name "*.mtimes" >cruft && + test_line_count = 1 cruft && + test-tool pack-mtimes "$(basename "$(cat cruft)")" >objects && + test_line_count = 2 objects + ) +' + test_done -- cgit v1.3
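Reviewer note: the option handling that the builtin/repack.c patch adds to `cmd_repack()` (`--cruft` implies an all-into-one repack and conflicts with `-A` and `-k`) can be isolated into a small function for illustration. This is a sketch under stated assumptions: `resolve_cruft_flags` is a hypothetical helper that does not exist in Git, it returns the conflicting option name where the patch calls `die()`, and `unpack_unreachable` is simplified from a string to an int.

```c
#include <assert.h>
#include <stddef.h>

/* Same bit values as the defines in the patch. */
#define ALL_INTO_ONE 1
#define LOOSEN_UNREACHABLE 2
#define PACK_CRUFT 4

/*
 * Hypothetical helper mirroring the checks added to cmd_repack().
 * Returns NULL when the combination is acceptable, otherwise the name
 * of the option that conflicts with --cruft. As in the patch,
 * ALL_INTO_ONE is set before the conflicts are checked.
 */
static const char *resolve_cruft_flags(int *pack_everything,
				       int unpack_unreachable,
				       int keep_unreachable)
{
	assert(pack_everything);
	if (!(*pack_everything & PACK_CRUFT))
		return NULL; /* nothing cruft-specific to validate */

	*pack_everything |= ALL_INTO_ONE;

	if (unpack_unreachable || (*pack_everything & LOOSEN_UNREACHABLE))
		return "-A";
	if (keep_unreachable)
		return "-k";
	return NULL;
}
```

With only `PACK_CRUFT` set on entry, the flags come back as `PACK_CRUFT | ALL_INTO_ONE`, which is why the later `if (pack_everything & ALL_INTO_ONE)` branch in the patch also runs for `git repack --cruft`.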