From d0e52c1728bdab27f7ca61ee0d4ff91055646bae Mon Sep 17 00:00:00 2001 From: Jeff King Date: Wed, 6 Nov 2024 16:16:58 -0500 Subject: t6120: demonstrate weakness in disjoint-root handling Commit 30b1c7ad9d (describe: don't abort too early when searching tags, 2020-02-26) tried to fix a problem that happens when there are disjoint histories: to accurately compare the counts for different tags, we need to keep walking the history longer in order to find a common base. But its fix misses a case: we may still bail early if we hit the max_candidates limit, producing suboptimal output. You can see this in action by adding "--candidates=2" to the tests; we'll stop traversing as soon as we see the second tag and will produce the wrong answer. I hit this in practice while trying to teach git-describe not to keep looking for candidates after we've seen all tags in the repo (effectively adding --candidates=2, since these toy repos have only two tags each). This is probably fixable by continuing to walk after hitting the max-candidates limit, all the way down to a common ancestor of all candidates. But it's not clear in practice what the preformance implications would be (it would depend on how long the branches that hold the candidates are). So I'm punting on that for now, but I'd like to adjust the tests to be more resilient, and to document the findings. So this patch: 1. Adds an extra tag at the bottom of history. This shouldn't change the output, but does mean we are more resilient to low values of --candidates (e.g., if we start reducing it to the total number of tags). This is arguably closer to the real world anyway, where you're not going to have just 2 tags, but an arbitrarily long history going back in time, possibly with multiple irrelevant tags in it (I called the new tag "H" here for "history"). 2. Run the same tests with --candidates=2, which shows that even with the current code they can fail if we end the traversal early. That leaves a trail for anybody interested in trying to improve the behavior. Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/t6120-describe.sh | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) (limited to 't') diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh index 05ed2510d9..69689d2f36 100755 --- a/t/t6120-describe.sh +++ b/t/t6120-describe.sh @@ -19,6 +19,7 @@ TEST_PASSES_SANITIZE_LEAK=true check_describe () { indir= && + outcome=success && while test $# != 0 do case "$1" in @@ -26,6 +27,9 @@ check_describe () { indir="$2" shift ;; + --expect-failure) + outcome=failure + ;; *) break ;; @@ -36,7 +40,7 @@ check_describe () { expect="$1" shift describe_opts="$@" - test_expect_success "describe $describe_opts" ' + test_expect_${outcome} "describe $describe_opts" ' git ${indir:+ -C "$indir"} describe $describe_opts >raw && sed -e "s/-g[0-9a-f]*\$/-gHASH/" actual && echo "$expect" >expect && @@ -617,7 +621,7 @@ test_expect_success 'name-rev --annotate-stdin works with commitGraph' ' # B # o -# \ +# H \ # o-----o---o----x # A # @@ -627,6 +631,7 @@ test_expect_success 'setup: describe commits with disjoint bases' ' cd disjoint1 && echo o >> file && git add file && git commit -m o && + git tag H -a -m H && echo A >> file && git add file && git commit -m A && git tag A -a -m A && echo o >> file && git add file && git commit -m o && @@ -639,8 +644,9 @@ test_expect_success 'setup: describe commits with disjoint bases' ' ' check_describe -C disjoint1 "A-3-gHASH" HEAD +check_describe -C disjoint1 --expect-failure "A-3-gHASH" --candidates=2 HEAD -# B +# H B # o---o---o------------. # \ # o---o---x @@ -658,6 +664,7 @@ test_expect_success 'setup: describe commits with disjoint bases 2' ' git checkout --orphan branch && echo o >> file2 && git add file2 && GIT_COMMITTER_DATE="2020-01-01 15:00" git commit -m o && echo o >> file2 && git add file2 && GIT_COMMITTER_DATE="2020-01-01 15:01" git commit -m o && + git tag H -a -m H && echo B >> file2 && git add file2 && GIT_COMMITTER_DATE="2020-01-01 15:02" git commit -m B && git tag B -a -m B && git merge --no-ff --allow-unrelated-histories main -m x @@ -665,6 +672,7 @@ test_expect_success 'setup: describe commits with disjoint bases 2' ' ' check_describe -C disjoint2 "B-3-gHASH" HEAD +check_describe -C disjoint2 --expect-failure "B-3-gHASH" --candidates=2 HEAD test_expect_success 'setup misleading taggerdates' ' GIT_COMMITTER_DATE="2006-12-12 12:31" git tag -a -m "another tag" newer-tag-older-commit unique-file~1 -- cgit v1.3 From bb0830c6820bf25cdf4722c63a3ff06470e18b0e Mon Sep 17 00:00:00 2001 From: Jeff King Date: Wed, 6 Nov 2024 16:17:08 -0500 Subject: t/perf: add tests for git-describe We don't have a perf script for git-describe, despite it often being accused of slowness. Let's add a few simple tests to start with. Rather than use the existing tags from our test repo, we'll make our own so that we have a known quantity and position. We'll add a "new" tag near the tip of HEAD, and an "old" one that is at the very bottom. And then our tests are: 1. Describing HEAD naively requires walking all the way down to the old tag as we collect candidates. This gives us a baseline for what "slow" looks like. 2. Doing the same with --candidates=1 can potentially be fast, because we can quie after finding "new". But we don't, and it's also slow. 3. Likewise we should be able to quit when there are no more tags to find. This can happen naturally if a repo has few tags, but also if you restrict the set of tags with --match. Here are the results running against linux.git. Note that I have a commit-graph built for the repo, so "slow" here is ~700ms. Without a commit graph it's more like 9s! Test HEAD -------------------------------------------------------------- 6100.2: describe HEAD 0.70(0.66+0.04) 6100.3: describe HEAD with one max candidate 0.70(0.66+0.04) 6100.4: describe HEAD with one tag 0.70(0.64+0.06) Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- t/perf/p6100-describe.sh | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) create mode 100755 t/perf/p6100-describe.sh (limited to 't') diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh new file mode 100755 index 0000000000..069f91ce49 --- /dev/null +++ b/t/perf/p6100-describe.sh @@ -0,0 +1,30 @@ +#!/bin/sh + +test_description='performance of git-describe' +. ./perf-lib.sh + +test_perf_default_repo + +# clear out old tags and give us a known state +test_expect_success 'set up tags' ' + git for-each-ref --format="delete %(refname)" refs/tags >to-delete && + git update-ref --stdin Date: Fri, 6 Dec 2024 00:42:18 -0500 Subject: describe: split "found all tags" and max_candidates logic Commit a30154187a (describe: stop traversing when we run out of names, 2024-10-31) taught git-describe to automatically reduce the max_candidates setting to match the total number of possible names. This lets us break out of the traversal rather than fruitlessly searching for more candidates when there are no more to be found. However, setting max_candidates to 0 (e.g., if the repo has no tags) overlaps with the --exact-match option, which explicitly uses the same value. And this causes a regression with --always, which is ignored in exact-match mode. We used to get this in a repo with no tags: $ git describe --always HEAD b2f0a7f and now we get: $ git describe --always HEAD fatal: no tag exactly matches 'b2f0a7f47f5f2aebe1e7fceff19a57de20a78c06' The reason is that we bail early in describe_commit() when max_candidates is set to 0. This logic goes all the way back to 2c33f75754 (Teach git-describe --exact-match to avoid expensive tag searches, 2008-02-24). We should obviously fix this regression, but there are two paths, depending on what you think: $ git describe --always --exact-match and $ git describe --always --candidates=0 should do. Since the "--always" option was added, it has always been ignored in --exact-match (or --candidates=0) mode. I.e., we treat --exact-match as a true exact match of a tag, and never fall back to using --always, even if it was requested. If we think that's a bug (or at least a misfeature), then the right solution is to fix it by removing the early bail-out from 2c33f75754, letting the noop algorithm run and then hitting the --always fallback output. And then our regression naturally goes away, because it follows the same path. If we think that the current "--exact-match --always" behavior is the right thing, then we have to differentiate the case where we automatically reduced max_candidates to 0 from the case where the user asked for it specifically. That's possible to do with a flag, but we can also just reimplement the logic from a30154187a to explicitly break out of the traversal when we run out of candidates (rather than relying on the existing max_candidates check). My gut feeling is along the lines of option 1 (it's a bug, and people would be happy for "--exact-match --always" to give the fallback rather than ignoring "--always"). But the documentation can be interpreted in the other direction, and we've certainly lived with the existing behavior for many years. So it's possible that changing it now is the wrong thing. So this patch fixes the regression by taking the second option, retaining the "--exact-match" behavior as-is. There are two new tests. The first shows that the regression is fixed (we don't even need a new repo without tags; a restrictive --match is enough to create the situation that there are no candidate names). The second test confirms that the "--exact-match --always" behavior remains unchanged and continues to die when there is no tag pointing at the specified commit. It's possible we may reconsider this in the future, but this shows that the approach described above is implemented faithfully. We can also run the perf tests in p6100 to see that we've retained the speedup that a30154187a was going for: Test HEAD^ HEAD -------------------------------------------------------------------------------------- 6100.2: describe HEAD 0.72(0.64+0.07) 0.72(0.66+0.06) +0.0% 6100.3: describe HEAD with one max candidate 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0% 6100.4: describe HEAD with one tag 0.01(0.01+0.00) 0.01(0.01+0.00) +0.0% Reported-by: Josh Steadmon Signed-off-by: Jeff King Signed-off-by: Junio C Hamano --- builtin/describe.c | 5 ++--- t/t6120-describe.sh | 10 ++++++++++ 2 files changed, 12 insertions(+), 3 deletions(-) (limited to 't') diff --git a/builtin/describe.c b/builtin/describe.c index 8ec3be87df..a6ef8af32a 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -367,7 +367,8 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst) seen_commits++; - if (match_cnt == max_candidates) { + if (match_cnt == max_candidates || + match_cnt == hashmap_get_size(&names)) { gave_up_on = c; break; } @@ -667,8 +668,6 @@ int cmd_describe(int argc, NULL); if (!hashmap_get_size(&names) && !always) die(_("No names found, cannot describe anything.")); - if (hashmap_get_size(&names) < max_candidates) - max_candidates = hashmap_get_size(&names); if (argc == 0) { if (broken) { diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh index 69689d2f36..8e8bfed05b 100755 --- a/t/t6120-describe.sh +++ b/t/t6120-describe.sh @@ -716,4 +716,14 @@ test_expect_success 'describe --broken --dirty with a file with changed stat' ' ) ' +test_expect_success '--always with no refs falls back to commit hash' ' + git rev-parse HEAD >expect && + git describe --no-abbrev --always --match=no-such-tag >actual && + test_cmp expect actual +' + +test_expect_success '--exact-match does not show --always fallback' ' + test_must_fail git describe --exact-match --always +' + test_done -- cgit v1.3