From f23ac77a4325fc776ebb38044a34f5e9629e4f67 Mon Sep 17 00:00:00 2001
From: Patrick Steinhardt
Date: Mon, 16 Feb 2026 16:38:01 +0100
Subject: commit: avoid parsing non-commits in `lookup_commit_reference_gently()`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The function `lookup_commit_reference_gently()` can be used to look up a
committish by object ID. As such, the function knows to peel for example
tag objects so that we eventually end up with the commit.

The function is used quite a lot throughout our tree. One such user is
"shallow.c" via `assign_shallow_commits_to_refs()`. The intent of this
function is to figure out whether a shallow push is missing any objects
that are required to satisfy the ref updates, and if so, which of the
ref updates is missing objects.

This is done by painting the tree with `UNINTERESTING`. We start
painting by calling `refs_for_each_ref()` so that we can mark all
existing referenced objects as the boundary of objects that we already
have, and which are supposed to be fully connected. The reference tips
are then parsed via `lookup_commit_reference_gently()`, and the commit
is then marked as uninteresting.

But references may not necessarily point to a committish, and if a lot
of them aren't then this step takes a lot of time. This is mostly due to
the way that `lookup_commit_reference_gently()` is implemented: before
we learn about the type of the object we already call `parse_object()`
on the object ID. This has two consequences:

  - We parse all objects, including trees and blobs, even though we
    don't even need the contents of them.

  - More importantly though, `parse_object()` will cause us to check
    whether the object ID matches its contents.

Combined this means that we deflate and hash every non-committish
object, and that of course ends up being both CPU- and memory-intensive.

Improve the logic so that we first use `peel_object()`.
This function won't parse the object for us, and thus it allows us to
learn about the object's type before we parse and return it.

The following benchmark pushes a single object from a shallow clone into
a repository that has 100,000 refs. These refs were created by listing
all objects via `git rev-list(1) --objects --all` and creating refs for
a subset of them, so lots of those refs will cover non-commit objects.

    Benchmark 1: git-receive-pack (rev = HEAD~)
      Time (mean ± σ):     62.571 s ±  0.413 s    [User: 58.331 s, System: 4.053 s]
      Range (min … max):   62.191 s … 63.010 s    3 runs

    Benchmark 2: git-receive-pack (rev = HEAD)
      Time (mean ± σ):     38.339 s ±  0.192 s    [User: 36.220 s, System: 1.992 s]
      Range (min … max):   38.176 s … 38.551 s    3 runs

    Summary
      git-receive-pack .

Signed-off-by: Junio C Hamano
---
 commit.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/commit.c b/commit.c
index 28bb5ce029..fb1b01b9c3 100644
--- a/commit.c
+++ b/commit.c
@@ -43,13 +43,35 @@ const char *commit_type = "commit";
 struct commit *lookup_commit_reference_gently(struct repository *r,
 		const struct object_id *oid, int quiet)
 {
-	struct object *obj = deref_tag(r,
-				       parse_object(r, oid),
-				       NULL, 0);
+	const struct object_id *maybe_peeled;
+	struct object_id peeled_oid;
+	struct object *object;
+	enum object_type type;
 
-	if (!obj)
+	switch (peel_object_ext(r, oid, &peeled_oid, 0, &type)) {
+	case PEEL_NON_TAG:
+		maybe_peeled = oid;
+		break;
+	case PEEL_PEELED:
+		maybe_peeled = &peeled_oid;
+		break;
+	default:
 		return NULL;
-	return object_as_type(obj, OBJ_COMMIT, quiet);
+	}
+
+	if (type != OBJ_COMMIT) {
+		if (!quiet)
+			error(_("object %s is a %s, not a %s"),
+			      oid_to_hex(oid), type_name(type),
+			      type_name(OBJ_COMMIT));
+		return NULL;
+	}
+
+	object = parse_object(r, maybe_peeled);
+	if (!object)
+		return NULL;
+
+	return object_as_type(object, OBJ_COMMIT, quiet);
 }
 
 struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid)
-- 
cgit v1.3
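
The idea behind the change can be illustrated with a small standalone
model. This is only a sketch and does not use Git's real data structures
or API: the `toy_*` names, the array-based object store and the parse
counter are all hypothetical stand-ins. `toy_parse()` plays the role of
an expensive `parse_object()` call, while reading `type` directly stands
in for a cheap type lookup that does not inflate or hash the object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in Git. */
enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT, TOY_TAG };

struct toy_object {
	enum toy_type type;
	int target; /* index of the tagged object; only used for TOY_TAG */
};

/* Six "ref tips": a commit, a tree, a blob, two tags and another blob. */
static const struct toy_object toy_store[] = {
	{ TOY_COMMIT, -1 },
	{ TOY_TREE, -1 },
	{ TOY_BLOB, -1 },
	{ TOY_TAG, 0 },  /* tag pointing at the commit */
	{ TOY_TAG, 2 },  /* tag pointing at a blob */
	{ TOY_BLOB, -1 },
};
#define TOY_STORE_LEN ((int)(sizeof(toy_store) / sizeof(toy_store[0])))

/* Counts simulated "inflate and hash" operations. */
static int expensive_parses;

/* Stand-in for an expensive full parse of one object. */
static const struct toy_object *toy_parse(const struct toy_object *store, int idx)
{
	assert(idx >= 0);
	expensive_parses++;
	return &store[idx];
}

/* Eager variant: parse first, peel tags by parsing, check the type last. */
static int lookup_commit_eager(const struct toy_object *store, int idx)
{
	const struct toy_object *obj = toy_parse(store, idx);

	while (obj->type == TOY_TAG) {
		idx = obj->target;
		obj = toy_parse(store, idx);
	}
	return obj->type == TOY_COMMIT ? idx : -1;
}

/* Deferred variant: peel using cheap type info, parse only real commits. */
static int lookup_commit_lazy(const struct toy_object *store, int idx)
{
	while (store[idx].type == TOY_TAG)
		idx = store[idx].target;
	if (store[idx].type != TOY_COMMIT)
		return -1; /* non-commit: bail out without parsing */
	toy_parse(store, idx);
	return idx;
}

/* Run one lookup variant over every entry and report the parse count. */
static int count_parses(int (*lookup)(const struct toy_object *, int))
{
	expensive_parses = 0;
	for (int i = 0; i < TOY_STORE_LEN; i++)
		lookup(toy_store, i);
	return expensive_parses;
}
```

Calling `count_parses(lookup_commit_eager)` on this six-entry store
yields 8 simulated parses, whereas `count_parses(lookup_commit_lazy)`
yields 2, because only the two entries that resolve to a commit are
parsed at all. Both variants return the same lookup results.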