From 70664d2865ccd21da4a182fed6d11da5a615cd2b Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 16 May 2025 18:11:52 +0000 Subject: pack-objects: add --path-walk option In order to more easily compute delta bases among objects that appear at the exact same path, add a --path-walk option to 'git pack-objects'. This option will use the path-walk API instead of the object walk given by the revision machinery. Since objects will be provided in batches representing a common path, those objects can be tested for delta bases immediately instead of waiting for a sort of the full object list by name-hash. This has multiple benefits, including avoiding collisions by name-hash. The objects marked as UNINTERESTING are included in these batches, so we are guaranteeing some locality to find good delta bases. After the individual passes are done on a per-path basis, the default name-hash is used to find other opportunistic delta bases that did not match exactly by the full path name. The current implementation performs delta calculations while walking objects, which is not ideal for a few reasons. First, this will cause the "Enumerating objects" phase to be much longer than usual. Second, it does not take advantage of threading during the path-scoped delta calculations. Even with this lack of threading, the path-walk option is sometimes faster than the usual approach. Future changes will refactor this code to allow for threading, but that complexity is deferred until later to keep this patch as simple as possible. This new walk is incompatible with some features and is ignored by others: * Object filters are not currently integrated with the path-walk API, such as sparse-checkout or tree depth. A blobless packfile could be integrated easily, but that is deferred for later. * Server-focused features such as delta islands, shallow packs, and using a bitmap index are incompatible with the path-walk API. * The path walk API is only compatible with the --revs option, not taking object lists or pack lists over stdin. These alternative ways to specify the objects currently ignores the --path-walk option without even a warning. Future changes will create performance tests that demonstrate the power of this approach. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/api-path-walk.adoc | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation/technical') diff --git a/Documentation/technical/api-path-walk.adoc b/Documentation/technical/api-path-walk.adoc index 3e089211fb..e522695dd9 100644 --- a/Documentation/technical/api-path-walk.adoc +++ b/Documentation/technical/api-path-walk.adoc @@ -69,4 +69,5 @@ Examples See example usages in: `t/helper/test-path-walk.c`, + `builtin/pack-objects.c`, `builtin/backfill.c` -- cgit v1.3-6-g1900 From 4705889c3dfbb38f14c3569b489b4b2a00b4186a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Fri, 16 May 2025 18:12:02 +0000 Subject: path-walk: add new 'edge_aggressive' option In preparation for allowing both the --shallow and --path-walk options in the 'git pack-objects' builtin, create a new 'edge_aggressive' option in the path-walk API. This option will help walk the boundary more thoroughly and help avoid sending extra objects during fetches and pushes. The only use of the 'edge_hint_aggressive' option in the revision API is within mark_edges_uninteresting(), which is usually called before between prepare_revision_walk() and before visiting commits with get_revision(). In prepare_revision_walk(), the UNINTERESTING commits are walked until a boundary is found. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/api-path-walk.adoc | 8 ++++++++ path-walk.c | 6 +++++- path-walk.h | 7 +++++++ t/helper/test-path-walk.c | 2 ++ t/t6601-path-walk.sh | 20 ++++++++++++++++++++ 5 files changed, 42 insertions(+), 1 deletion(-) (limited to 'Documentation/technical') diff --git a/Documentation/technical/api-path-walk.adoc b/Documentation/technical/api-path-walk.adoc index e522695dd9..34c905eb9c 100644 --- a/Documentation/technical/api-path-walk.adoc +++ b/Documentation/technical/api-path-walk.adoc @@ -56,6 +56,14 @@ better off using the revision walk API instead. the revision walk so that the walk emits commits marked with the `UNINTERESTING` flag. +`edge_aggressive`:: + For performance reasons, usually only the boundary commits are + explored to find UNINTERESTING objects. However, in the case of + shallow clones it can be helpful to mark all trees and blobs + reachable from UNINTERESTING tip commits as UNINTERESTING. This + matches the behavior of `--objects-edge-aggressive` in the + revision API. + `pl`:: This pattern list pointer allows focusing the path-walk search to a set of patterns, only emitting paths that match the given diff --git a/path-walk.c b/path-walk.c index 341bdd2ba4..2d4ddbadd5 100644 --- a/path-walk.c +++ b/path-walk.c @@ -503,7 +503,11 @@ int walk_objects_by_path(struct path_walk_info *info) if (prepare_revision_walk(info->revs)) die(_("failed to setup revision walk")); - /* Walk trees to mark them as UNINTERESTING. */ + /* + * Walk trees to mark them as UNINTERESTING. + * This is particularly important when 'edge_aggressive' is set. + */ + info->revs->edge_hint_aggressive = info->edge_aggressive; edge_repo = info->revs->repo; edge_tree_list = root_tree_list; mark_edges_uninteresting(info->revs, show_edge, diff --git a/path-walk.h b/path-walk.h index 473ee9d361..5ef5a8440e 100644 --- a/path-walk.h +++ b/path-walk.h @@ -50,6 +50,13 @@ struct path_walk_info { */ int prune_all_uninteresting; + /** + * When 'edge_aggressive' is set, then the revision walk will use + * the '--object-edge-aggressive' option to mark even more objects + * as uninteresting. + */ + int edge_aggressive; + /** * Specify a sparse-checkout definition to match our paths to. Do not * walk outside of this sparse definition. If the patterns are in diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c index 61e845e5ec..fe63002c2b 100644 --- a/t/helper/test-path-walk.c +++ b/t/helper/test-path-walk.c @@ -82,6 +82,8 @@ int cmd__path_walk(int argc, const char **argv) N_("toggle inclusion of tree objects")), OPT_BOOL(0, "prune", &info.prune_all_uninteresting, N_("toggle pruning of uninteresting paths")), + OPT_BOOL(0, "edge-aggressive", &info.edge_aggressive, + N_("toggle aggressive edge walk")), OPT_BOOL(0, "stdin-pl", &stdin_pl, N_("read a pattern list over stdin")), OPT_END(), diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh index c89b0f1e19..785c2f2237 100755 --- a/t/t6601-path-walk.sh +++ b/t/t6601-path-walk.sh @@ -378,6 +378,26 @@ test_expect_success 'topic, not base, boundary with pruning' ' test_cmp_sorted expect out ' +test_expect_success 'topic, not base, --edge-aggressive with pruning' ' + test-tool path-walk --prune --edge-aggressive -- topic --not base >out && + + cat >expect <<-EOF && + 0:commit::$(git rev-parse topic) + 1:tree::$(git rev-parse topic^{tree}) + 1:tree::$(git rev-parse base^{tree}):UNINTERESTING + 2:tree:right/:$(git rev-parse topic:right) + 2:tree:right/:$(git rev-parse base:right):UNINTERESTING + 3:blob:right/c:$(git rev-parse base:right/c):UNINTERESTING + 3:blob:right/c:$(git rev-parse topic:right/c) + blobs:2 + commits:1 + tags:0 + trees:4 + EOF + + test_cmp_sorted expect out +' + test_expect_success 'trees are reported exactly once' ' test_when_finished "rm -rf unique-trees" && test_create_repo unique-trees && -- cgit v1.3-6-g1900