diff options
| author | Taylor Blau <me@ttaylorr.com> | 2026-02-24 14:00:41 -0500 |
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2026-02-24 11:16:34 -0800 |
| commit | 9df44a97f165bbdd8d591c6e99c8894ae036b5bd (patch) | |
| tree | f0fdbc701c8e8e47966817eda9c74cfdcd6e3360 /builtin | |
| parent | dedf71f0b1b9a64d5aa7558ef45b26166d8d66fc (diff) | |
| download | git-9df44a97f165bbdd8d591c6e99c8894ae036b5bd.tar.xz | |
midx: implement MIDX compaction
When managing a MIDX chain with many layers, it is convenient to combine
a sequence of adjacent layers into a single layer to prevent the chain
from growing too long.
While it is conceptually possible to "compact" a sequence of MIDX layers
together by running "git multi-pack-index write --stdin-packs", there
are a few drawbacks that make this less than desirable:
- Preserving the MIDX chain is impossible, since there is no way to
write a MIDX layer that contains objects or packs found in an earlier
MIDX layer already part of the chain. So callers would have to write
an entirely new (non-incremental) MIDX containing only the compacted
layers, discarding all other objects/packs from the MIDX.
- There is (currently) no way to write a MIDX layer outside of the MIDX
chain to work around the above, such that the MIDX chain could be
reassembled substituting the compacted layers with the MIDX that was
written.
- The `--stdin-packs` command-line option does not allow us to specify
the order of packs as they appear in the MIDX. Therefore, even if
there were workarounds for the previous two challenges, any bitmaps
belonging to layers which come after the compacted layer(s) would no
longer be valid.
This commit introduces a way to compact a sequence of adjacent MIDX
layers into a single layer while preserving the MIDX chain, as well as
any bitmap(s) in layers which are newer than the compacted ones.
Implementing MIDX compaction does not require a significant number of
changes to how MIDX layers are written. The main changes are as follows:
- Instead of calling `fill_packs_from_midx()`, we call a new function
`fill_packs_from_midx_range()`, which walks backwards along the
portion of the MIDX chain which we are compacting, and adds packs one
layer a time.
In order to preserve the pseudo-pack order, the concatenated pack
order is preserved, with the exception of preferred packs which are
always added first.
- After adding entries from the set of packs in the compaction range,
`compute_sorted_entries()` must adjust the `pack_int_id`'s for all
objects added in each fanout layer to match their original
`pack_int_id`'s (as opposed to the index at which each pack appears
in `ctx.info`).
Note that we cannot reuse `midx_fanout_add_midx_fanout()` directly
here, as it unconditionally recurs through the `->base_midx`. Factor
out a `_1()` variant that operates on a single layer, reimplement
the existing function in terms of it, and use the new variant from
`midx_fanout_add_compact()`.
Since we are sorting the list of objects ourselves, the order we add
them in does not matter.
- When writing out the new 'multi-pack-index-chain' file, discard any
layers in the compaction range, replacing them with the newly written
layer, instead of keeping them and placing the new layer at the end
of the chain.
This ends up being sufficient to implement MIDX compaction in such a way
that preserves bitmaps corresponding to more recent layers in the MIDX
chain.
The tests for MIDX compaction are so far fairly spartan, since the main
interesting behavior here is ensuring that the right packs/objects are
selected from each layer, and that the pack order is preserved despite
whether or not they are sorted in lexicographic order in the original
MIDX chain.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'builtin')
| -rw-r--r-- | builtin/multi-pack-index.c | 74 |
1 files changed, 74 insertions, 0 deletions
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index c0c6c1760c..043ee8c478 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -17,6 +17,10 @@ " [--[no-]bitmap] [--[no-]incremental] [--[no-]stdin-packs]\n" \ " [--refs-snapshot=<path>]") +#define BUILTIN_MIDX_COMPACT_USAGE \ + N_("git multi-pack-index [<options>] compact [--[no-]incremental]\n" \ + " <from> <to>") + #define BUILTIN_MIDX_VERIFY_USAGE \ N_("git multi-pack-index [<options>] verify") @@ -30,6 +34,10 @@ static char const * const builtin_multi_pack_index_write_usage[] = { BUILTIN_MIDX_WRITE_USAGE, NULL }; +static char const * const builtin_multi_pack_index_compact_usage[] = { + BUILTIN_MIDX_COMPACT_USAGE, + NULL +}; static char const * const builtin_multi_pack_index_verify_usage[] = { BUILTIN_MIDX_VERIFY_USAGE, NULL @@ -44,6 +52,7 @@ static char const * const builtin_multi_pack_index_repack_usage[] = { }; static char const * const builtin_multi_pack_index_usage[] = { BUILTIN_MIDX_WRITE_USAGE, + BUILTIN_MIDX_COMPACT_USAGE, BUILTIN_MIDX_VERIFY_USAGE, BUILTIN_MIDX_EXPIRE_USAGE, BUILTIN_MIDX_REPACK_USAGE, @@ -195,6 +204,70 @@ static int cmd_multi_pack_index_write(int argc, const char **argv, return ret; } +static int cmd_multi_pack_index_compact(int argc, const char **argv, + const char *prefix, + struct repository *repo) +{ + struct multi_pack_index *m, *cur; + struct multi_pack_index *from_midx = NULL; + struct multi_pack_index *to_midx = NULL; + struct odb_source *source; + int ret; + + struct option *options; + static struct option builtin_multi_pack_index_compact_options[] = { + OPT_BIT(0, "incremental", &opts.flags, + N_("write a new incremental MIDX"), MIDX_WRITE_INCREMENTAL), + OPT_END(), + }; + + repo_config(repo, git_multi_pack_index_write_config, NULL); + + options = add_common_options(builtin_multi_pack_index_compact_options); + + trace2_cmd_mode(argv[0]); + + if (isatty(2)) + opts.flags |= MIDX_PROGRESS; + argc = parse_options(argc, argv, prefix, + options, builtin_multi_pack_index_compact_usage, + 0); + + if (argc != 2) + usage_with_options(builtin_multi_pack_index_compact_usage, + options); + source = handle_object_dir_option(the_repository); + + FREE_AND_NULL(options); + + m = get_multi_pack_index(source); + + for (cur = m; cur && !(from_midx && to_midx); cur = cur->base_midx) { + const char *midx_csum = midx_get_checksum_hex(cur); + + if (!from_midx && !strcmp(midx_csum, argv[0])) + from_midx = cur; + if (!to_midx && !strcmp(midx_csum, argv[1])) + to_midx = cur; + } + + if (!from_midx) + die(_("could not find MIDX: %s"), argv[0]); + if (!to_midx) + die(_("could not find MIDX: %s"), argv[1]); + if (from_midx == to_midx) + die(_("MIDX compaction endpoints must be unique")); + + for (m = from_midx; m; m = m->base_midx) { + if (m == to_midx) + die(_("MIDX %s must be an ancestor of %s"), argv[0], argv[1]); + } + + ret = write_midx_file_compact(source, from_midx, to_midx, opts.flags); + + return ret; +} + static int cmd_multi_pack_index_verify(int argc, const char **argv, const char *prefix, struct repository *repo UNUSED) @@ -295,6 +368,7 @@ int cmd_multi_pack_index(int argc, struct option builtin_multi_pack_index_options[] = { OPT_SUBCOMMAND("repack", &fn, cmd_multi_pack_index_repack), OPT_SUBCOMMAND("write", &fn, cmd_multi_pack_index_write), + OPT_SUBCOMMAND("compact", &fn, cmd_multi_pack_index_compact), OPT_SUBCOMMAND("verify", &fn, cmd_multi_pack_index_verify), OPT_SUBCOMMAND("expire", &fn, cmd_multi_pack_index_expire), OPT_END(), |
