aboutsummaryrefslogtreecommitdiff
path: root/ref-filter.c
AgeCommit message (Collapse)Author
2023-07-17ref-filter: avoid parsing tagged objects in match_points_at()Jeff King
When we peel tags to check if they match a --points-at oid, we recursively parse the tagged object to see if it is also a tag. But since the tag itself tells us the type of the object it points to (and even gives us the appropriate object struct via its "tagged" member), we can use that directly. We do still have to make sure to call parse_tag() before looking at each tag. This is redundant for the outermost tag (since we did call parse_object() to find its type), but that's OK; parse_tag() is smart enough to make this a noop when the tag has already been parsed. In my clone of linux.git, with 782 tags (and only 3 non-tags), this yields a significant speedup (bringing us back where we were before the commit before this one started recursively dereferencing tags): Benchmark 1: ./git.old for-each-ref --points-at=HEAD --format="%(refname)" Time (mean ± σ): 20.3 ms ± 0.5 ms [User: 11.1 ms, System: 9.1 ms] Range (min … max): 19.6 ms … 21.5 ms 141 runs Benchmark 2: ./git.new for-each-ref --points-at=HEAD --format="%(refname)" Time (mean ± σ): 11.4 ms ± 0.2 ms [User: 6.3 ms, System: 5.0 ms] Range (min … max): 11.0 ms … 12.2 ms 250 runs Summary './git.new for-each-ref --points-at=HEAD --format="%(refname)"' ran 1.79 ± 0.05 times faster than './git.old for-each-ref --points-at=HEAD --format="%(refname)"' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17ref-filter: handle nested tags in --points-at optionJan Klötzke
Tags are dereferenced until reaching a different object type to handle nested tags, e.g. on checkout. In contrast, "git tag --points-at=..." fails to list such nested tags because only one level of indirection is obtained in filter_refs(). Implement the recursive dereferencing for the "--points-at" option when filtering refs to unify the behaviour. Signed-off-by: Jan Klötzke <jan@kloetzke.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17Merge branch 'cw/compat-util-header-cleanup'Junio C Hamano
Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header
2023-07-14Merge branch 'ks/ref-filter-signature'Junio C Hamano
The "git for-each-ref" family of commands learned placeholders related to GPG signature verification. * ks/ref-filter-signature: ref-filter: add new "signature" atom t/lib-gpg: introduce new prereq GPG2
2023-07-10refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)Taylor Blau
When iterating through the `packed-refs` file in order to answer a query like: $ git for-each-ref --exclude=refs/__hidden__ it would be useful to avoid walking over all of the entries in `refs/__hidden__/*` when possible, since we know that the ref-filter code is going to throw them away anyways. In certain circumstances, doing so is possible. The algorithm for doing so is as follows: - For each excluded pattern, find the first record that matches it, and the first record that *doesn't* match it (i.e. the location you'd next want to consider when excluding that pattern). - Sort the set of excluded regions from the previous step in ascending order of the first location within the `packed-refs` file that matches. - Clean up the results from the previous step: discard empty regions, and combine adjacent regions. The set of regions which remains is referred to as the "jump list", and never contains any references which should be included in the result set. Then when iterating through the `packed-refs` file, if `iter->pos` is ever contained in one of the regions from the previous steps, advance `iter->pos` past the end of that region, and continue enumeration. Note that we only perform this optimization when none of the excluded pattern(s) have special meta-characters in them. For a pattern like "refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and everything underneath them) are not connected. A future implementation that handles this case may split the character class (pretending as if two patterns were excluded: "refs/fooa", and "refs/fooc"). There are a few other gotchas worth considering. First, note that the jump list is sorted, so once we jump past a region, we can avoid considering it (or any regions preceding it) again. The member `jump_pos` is used to track the first next-possible region to jump through. Second, note that the jump list is best-effort, since we do not handle loose references, and because of the meta-character issue above. The jump list may not skip past all references which won't appear in the results, but will never skip over a reference which does appear in the result set. In repositories with a large number of hidden references, the speed-up can be significant. Tests here are done with a copy of linux.git with a reference "refs/pull/N" pointing at every commit, as in: $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' | git update-ref --stdin $ git pack-refs --all , it is significantly faster to have `for-each-ref` jump over the excluded references, as opposed to filtering them out after the fact: $ hyperfine \ 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \ 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/" Time (mean ± σ): 798.1 ms ± 3.3 ms [User: 687.6 ms, System: 146.4 ms] Range (min … max): 794.5 ms … 805.5 ms 10 runs Benchmark 2: git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull" Time (mean ± σ): 98.9 ms ± 1.4 ms [User: 93.1 ms, System: 5.7 ms] Range (min … max): 97.0 ms … 104.0 ms 29 runs Benchmark 3: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull" Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 0.7 ms, System: 3.8 ms] Range (min … max): 4.1 ms … 5.8 ms 524 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran 21.87 ± 1.05 times faster than 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' 176.52 ± 8.19 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' (Comparing stock git and this patch isn't quite fair, since an earlier commit in this series adds a naive implementation of the `--exclude` option. `git.prev` is built from the previous commit and includes this naive implementation). Using the jump list is fairly straightforward (see the changes to `refs/packed-backend.c::next_record()`), but constructing the list is not. To ensure that the construction is correct, add a new suite of tests in t1419 covering various corner cases (overlapping regions, partially overlapping regions, adjacent regions, etc.). Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs: plumb `exclude_patterns` argument throughoutTaylor Blau
The subsequent patch will want to access an optional `excluded_patterns` array within `refs/packed-backend.c` that will cull out certain references matching any of the given patterns on a best-effort basis. To do so, the refs subsystem needs to be updated to pass this value across a number of different locations. Prepare for a future patch by introducing this plumbing now, passing NULLs at top-level APIs in order to make that patch less noisy and more easily readable. Signed-off-by: Taylor Blau <me@ttaylorr.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10builtin/for-each-ref.c: add `--exclude` optionTaylor Blau
When using `for-each-ref`, it is sometimes convenient for the caller to be able to exclude certain parts of the references. For example, if there are many `refs/__hidden__/*` references, the caller may want to emit all references *except* the hidden ones. Currently, the only way to do this is to post-process the output, like: $ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/' Which is do-able, but requires processing a potentially large quantity of references. Teach `git for-each-ref` a new `--exclude=<pattern>` option, which excludes references from the results if they match one or more excluded patterns. This patch provides a naive implementation where the `ref_filter` still sees all references (including ones that it will discard) and is left to check whether each reference matches any excluded pattern(s) before emitting them. By culling out references we know the caller doesn't care about, we can avoid allocating memory for their storage, as well as spending time sorting the output (among other things). Even the naive implementation provides a significant speed-up on a modified copy of linux.git (that has a hidden ref pointing at each commit): $ hyperfine \ 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/" Time (mean ± σ): 820.1 ms ± 2.0 ms [User: 703.7 ms, System: 152.0 ms] Range (min … max): 817.7 ms … 823.3 ms 10 runs Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/ Time (mean ± σ): 106.6 ms ± 1.1 ms [User: 99.4 ms, System: 7.1 ms] Range (min … max): 104.7 ms … 109.1 ms 27 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran 7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' Subsequent patches will improve on this by avoiding visiting excluded sections of the `packed-refs` file in certain cases. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10ref-filter.c: parameterize match functions over patternsJeff King
`match_pattern()` and `match_name_as_path()` both take a `struct ref_filter *`, and then store a stack variable `patterns` pointing at `filter->patterns`. The subsequent patch will add a new array of patterns to match over (the excluded patterns, via a new `git for-each-ref --exclude` option), treating the return value of these functions differently depending on which patterns are being used to match. Tweak `match_pattern()` and `match_name_as_path()` to take an array of patterns to prepare for passing either in. Once we start passing either in, `match_pattern()` will have little to do with a particular `struct ref_filter *` instance. To clarify this, drop it from the argument list, and replace it with the only bit of the `ref_filter` that we care about (`filter->ignore_case`). Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10ref-filter: add `ref_filter_clear()`Jeff King
We did not bother to clean up at all in `git branch` or `git tag`, and `git for-each-ref` only cleans up a couple of members. Add and call `ref_filter_clear()` when cleaning up a `struct ref_filter`. Running this patch (without any test changes) indicates a couple of now leak-free tests. This was found by running: $ make SANITIZE=leak $ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate (Note that the `reachable_from` and `unreachable_from` lists should be cleaned as they are used. So this is just covering any case where we might bail before running the reachability check.) Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10ref-filter: clear reachable list pointers after freeingJeff King
In `reach_filter()`, we pop all commits from the reachable lists, leaving them empty. But because we're operating on a list pointer that was passed by value, the original `filter.reachable_from` and `filter.unreachable_from` pointers are left dangling. As is the case with the previous commit, nobody touches either of these fields after calling `reach_filter()`, so leaving them dangling is OK. But this future proofs against dangerous situations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-06ref-filter: add new "signature" atomKousik Sanagavarapu
Duplicate the code for outputting the signature and its other parameters for commits and tags in ref-filter from pretty. In the future, this will help in getting rid of the current duplicate implementations of such logic everywhere, when ref-filter can do everything that pretty is doing. The new atom "signature" and its friends are equivalent to the existing pretty formats as follows: %(signature) = %GG %(signature:grade) = %G? %(siganture:signer) = %GS %(signature:key) = %GK %(signature:fingerprint) = %GF %(signature:primarykeyfingerprint) = %GP %(signature:trustlevel) = %GT Co-authored-by: Hariom Verma <hariom18599@gmail.com> Co-authored-by: Jaydeep Das <jaydeepjd.8914@gmail.com> Co-authored-by: Nsengiyumva Wilberforce <nsengiyumvawilberforce@gmail.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-04-25Merge branch 'en/header-split-cache-h'Junio C Hamano
Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
2023-04-24commit.h: reduce unnecessary includesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24treewide: remove cache.h inclusion due to previous changesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24versioncmp.h: move declarations for versioncmp.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-name.h: move declarations for object-name.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on oid-array.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'ow/ref-format-remove-unused-member'Junio C Hamano
Code clean-up. * ow/ref-format-remove-unused-member: ref-filter: remove unused ref_format member
2023-04-06Merge branch 'en/header-split-cleanup'Junio C Hamano
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano
Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-06Merge branch 'ds/ahead-behind'Junio C Hamano
"git for-each-ref" learns '%(ahead-behind:<base>)' that computes the distances from a single reference point in the history with bunch of commits in bulk. * ds/ahead-behind: commit-reach: add tips_reachable_from_bases() for-each-ref: add ahead-behind format atom commit-reach: implement ahead_behind() logic commit-graph: introduce `ensure_generations_valid()` commit-graph: return generation from memory commit-graph: simplify compute_generation_numbers() commit-graph: refactor compute_topological_levels() for-each-ref: explicitly test no matches for-each-ref: add --stdin option
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-30ref-filter: remove unused ref_format memberØystein Walle
use_rest was added in b9dee075eb (ref-filter: add %(rest) atom, 2021-07-26) but was never used. As far as I can tell it was used in a later patch that was submitted to the mailing list but never applied. Signed-off-by: Øystein Walle <oystwa@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "cache.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason
Apply the part of "the_repository.pending.cocci" pertaining to "cache.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: remove unnecessary cache.h inclusion from a few headersElijah Newren
Ever since a64215b6cd ("object.h: stop depending on cache.h; make cache.h depend on object.h", 2023-02-24), we have a few headers that could have replaced their include of cache.h with an include of object.h. Make that change now. Some C files had to start including cache.h after this change (or some smaller header it had brought in), because the C files were depending on things from cache.h but were only formerly implicitly getting cache.h through one of these headers being modified in this patch. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20commit-reach: add tips_reachable_from_bases()Derrick Stolee
Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use the ref-filter machinery to select references or branches (respectively) that are reachable from a set of commits presented by one or more --merged arguments. This happens within reach_filter(), which uses the revision-walk machinery to walk history in a standard way. However, the commit-reach.c file is full of custom searches that are more efficient, especially for reachability queries that can terminate early when reachability is discovered. Add a new tips_reachable_from_bases() method to commit-reach.c and call it from within reach_filter() in ref-filter.c. This affects both 'git branch' and 'git for-each-ref' as tested in p1500-graph-walks.sh. For the Linux kernel repository, we take an already-fast algorithm and make it even faster: Test HEAD~1 HEAD ------------------------------------------------------------------- 1500.5: contains: git for-each-ref --merged 0.13 0.02 -84.6% 1500.6: contains: git branch --merged 0.14 0.02 -85.7% 1500.7: contains: git tag --merged 0.15 0.03 -80.0% (Note that we remove the iterative 'git rev-list' test from p1500 because it no longer makes sense as a comparison to 'git for-each-ref' and would just waste time running it for these comparisons.) The algorithm is implemented in commit-reach.c in the method tips_reachable_from_base(). This method takes a string_list of tips and assigns the 'util' for each item with the value 1 if the base commit can reach those tips. Like other reachability queries in commit-reach.c, the fastest way to search for "can A reach B?" is to do a depth-first search up to the generation number of B, preferring to explore first parents before later parents. While we must walk all reachable commits up to that generation number when the answer is "no", the depth-first search can answer "yes" much faster than other approaches in most cases. This search becomes trickier when there are multiple targets for the depth-first search. The commits with lower generation number are more likely to be within the history of the start commit, but we don't want to waste time searching commits of low generation number if the commit target with lowest generation number has already been found. The trick here is to take the input commits and sort them by generation number in ascending order. Track the index within this order as min_generation_index. When we find a commit, if its index in the list is equal to min_generation_index, then we can increase the generation number boundary of our search to the next-lowest value in the list. With this mechanism, the number of commits to search is minimized with respect to the depth-first search heuristic. We will walk all commits up to the minimum generation number of a commit that is _not_ reachable from the start, but we will walk only the necessary portion of the depth-first search for the reachable commits of lower generation. Add extra tests for this behavior in t6600-test-reach.sh as the interesting data shape of that repository can sometimes demonstrate corner case bugs. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20for-each-ref: add ahead-behind format atomDerrick Stolee
The previous change implemented the ahead_behind() method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases. Now, integrate that algorithm as part of 'git for-each-ref' hidden behind a new format atom, ahead-behind. This naturally extends to 'git branch' and 'git tag' builtins, as well. This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases. For this reason, failing to read a reference provided from these atoms results in an error. In order to translate the ahead_behind() method information to the format output code in ref-filter.c, we must populate arrays of ahead_behind_count structs. In struct ref_array, we store the full array that will be passed to ahead_behind(). In struct ref_array_item, we store an array of pointers that point to the relvant items within the full array. In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item. It also ensures the lifetime of the ahead_behind_count structs matches the time that the array is being used. Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as it has an interesting repository shape. In particular, its merging strategy and its use of different commit-graphs would demonstrate over- counting if the ahead_behind() method did not already account for that possibility. Also add tests for the specific for-each-ref, branch, and tag builtins. In the case of 'git tag', there are intersting cases that happen when some of the selected tips are not commits. This requires careful logic around commits_nr in the second loop of filter_ahead_behind(). Also, the test in t7004 is carefully located to avoid being dependent on the GPG prereq. It also avoids using the test_commit helper, as that will add ticks to the time and disrupt the expected timestamps in later tag tests. Also add performance tests in a new p1300-graph-walks.sh script. This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref' to the naive implementation by running 'git rev-list --count' processes for each input. For the Git source code repository, the improvement is already obvious: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.07(0.07+0.00) 1500.3: ahead-behind counts: git branch 0.07(0.06+0.00) 1500.4: ahead-behind counts: git tag 0.07(0.06+0.00) 1500.5: ahead-behind counts: git rev-list 1.32(1.04+0.27) But the standard performance benchmark is the Linux kernel repository, which demosntrates a significant improvement: Test this tree --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.27(0.24+0.02) 1500.3: ahead-behind counts: git branch 0.27(0.24+0.03) 1500.4: ahead-behind counts: git tag 0.28(0.27+0.01) 1500.5: ahead-behind counts: git rev-list 4.57(4.03+0.54) The 'git rev-list' test exists in this change as a demonstration, but it will be removed in the next change to avoid wasting time on this comparison. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24ref-filter: mark unused callback parametersJeff King
The ref-filter code uses virtual functions to handle specific atoms, but many of the functions ignore some parameters: - most atom parsers do not need the ref_format itself, unless they are looking at centralized options like use_color, quote_style, etc. - meta-atom handlers like append_atom(), align_atom_handler(), etc, can't generate errors, so ignore their "err" parameter - likewise, the handlers for then/else/end do not even need to look at their atom_value, as the "if" handler put everything they need into the ref_formatting_state stack Since these functions all have to conform to virtual function interfaces, we can't just drop the unused parameters, but must mark them as UNUSED (to appease -Wunused-parameter). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24ref-filter: drop unused atom parameter from get_worktree_path()Jeff King
The get_worktree_path() function is used to populate the %(worktreepath) value, but it has never used its "atom" parameter since it was added in 2582083fa1 (ref-filter: add worktreepath atom, 2019-04-28). Normally we'd use the atom struct to cache any work we do, but in this case there's a global hashmap that does that caching already. So we can just drop the unused parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-08convert trivial uses of strncmp() to starts_with()Jeff King
It's more readable to use starts_with() instead of strncmp() to match a prefix, as the latter requires a manually-computed length, and has the funny "matching is zero" return value common to cmp functions. This patch converts several cases which were found with: git grep 'strncmp(.*, [0-9]*)' But note that it doesn't convert all such cases. There are several where the magic length number is repeated elsewhere in the code, like: /* handle "buf" which isn't NUL-terminated and might be too small */ if (len >= 3 && !strncmp(buf, "foo", 3)) or: /* exact match for "foo", but within a larger string */ if (end - buf == 3 && !strncmp(buf, "foo", 3)) While it would not produce the wrong outcome to use starts_with() in these cases, we'd still be left with one instance of "3". We're better to leave them for now, as the repeated "3" makes it clear that the two are linked (there may be other refactorings that handle both, but they're out of scope for this patch). A few things to note while reading the patch: - all cases but one are trying to match, and so lose the extra "!". The case in the first hunk of urlmatch.c is not-matching, and hence gains a "!". - the case in remote-fd.c is matching the beginning of "connect foo", but we never look at str+8 to parse the "foo" part (which would make this a candidate for skip_prefix(), not starts_with()). This seems at first glance like a bug, but is a limitation of how remote-fd works. - the second hunk in urlmatch.c shows some cases adjacent to other strncmp() calls that are left. These are of the "exact match within a larger string" type, as described above. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-26Merge branch 'jk/ref-filter-error-reporting-fix'Junio C Hamano
Clean-ups in error messages produced by "git for-each-ref" and friends. * jk/ref-filter-error-reporting-fix: ref-filter: convert email atom parser to use err_bad_arg() ref-filter: truncate atom names in error messages ref-filter: factor out "unrecognized %(foo) arg" errors ref-filter: factor out "%(foo) does not take arguments" errors ref-filter: reject arguments to %(HEAD)
2022-12-26Merge branch 'jk/unused-post-2.39'Junio C Hamano
Code clean-up around unused function parameters. * jk/unused-post-2.39: userdiff: mark unused parameter in internal callback list-objects-filter: mark unused parameters in virtual functions diff: mark unused parameters in callbacks xdiff: mark unused parameter in xdl_call_hunk_func() xdiff: drop unused parameter in def_ff() ws: drop unused parameter from ws_blank_line() list-objects: drop process_gitlink() function blob: drop unused parts of parse_blob_buffer() ls-refs: use repository parameter to iterate refs
2022-12-15ref-filter: convert email atom parser to use err_bad_arg()Jeff King
The error message for a bogus argument to %(authoremail), etc, is: $ git for-each-ref --format='%(authoremail:foo)' fatal: unrecognized email option: foo Saying just "email" is a little vague; most of the other atom parsers would use the full name "%(authoremail)", but we can't do that here because the same function also handles %(taggeremail), etc. Until recently, passing atom->name was a bad idea, because it erroneously included the arguments in the atom name. But since the previous commit taught err_bad_arg() to handle this, we can now do so and get: fatal: unrecognized %(authoremail) argument: foo which is consistent with other atoms. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15ref-filter: truncate atom names in error messagesJeff King
If you pass a bogus argument to %(refname), you may end up with a message like this: $ git for-each-ref --format='%(refname:foo)' fatal: unrecognized %(refname:foo) argument: foo which is confusing. It should just say: fatal: unrecognized %(refname) argument: foo which is clearer, and is consistent with most other atom parsers. Those other parsers do not have the same problem because they pass the atom name from a string literal in the parser function. But because the parser for %(refname) also handles %(upstream) and %(push), it instead uses atom->name, which includes the arguments. The oid atom parser which handles %(tree), %(parent), etc suffers from the same problem. It seems like the cleanest fix would be for atom->name to be _just_ the name, since there's already a separate "args" field. But since that field is also used for other things, we can't change it easily (e.g., it's how we find things in the used_atoms array, and clearly %(refname) and %(refname:short) are not the same thing). Instead, we'll teach our error_bad_arg() function to stop at the first ":". This is a little hacky, as we're effectively re-parsing the name, but the format is simple enough to do this as a one-liner, and this localizes the change to the error-reporting code. We'll give the same treatment to err_no_arg(). None of its callers use this atom->name trick, but it's worth future-proofing it while we're here. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15ref-filter: factor out "unrecognized %(foo) arg" errorsJeff King
Atom parsers that take arguments generally have a catch-all for "this arg is not recognized". Most of them use the same printf template, which is good, because it makes life easier for translators. Let's pull this template into a helper function, which makes the code in the parsers shorter and avoids any possibility of differences. As with the previous commit, we'll pick an arbitrary atom to make sure the test suite covers this code. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15ref-filter: factor out "%(foo) does not take arguments" errorsJeff King
Many atom parsers give the same error message, differing only in the name of the atom. If we use "%s does not take arguments", that should make life easier for translators, as they only need to translate one string. And in doing so, we can easily pull it into a helper function to make sure they are all using the exact same string. I've added a basic test here for %(HEAD), just to make sure this code is exercised at all in the test suite. We could cover each such atom, but the effort-to-reward ratio of trying to maintain an exhaustive list doesn't seem worth it. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15ref-filter: reject arguments to %(HEAD)Jeff King
The %(HEAD) atom doesn't take any arguments, but unlike other atoms in the same boat (objecttype, deltabase, etc), it does not detect this situation and complain. Let's make it consistent with the others. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-14Merge branch 'ab/various-leak-fixes'Junio C Hamano
Various leak fixes. * ab/various-leak-fixes: built-ins: use free() not UNLEAK() if trivial, rm dead code revert: fix parse_options_concat() leak cherry-pick: free "struct replay_opts" members rebase: don't leak on "--abort" connected.c: free the "struct packed_git" sequencer.c: fix "opts->strategy" leak in read_strategy_opts() ls-files: fix a --with-tree memory leak revision API: call graph_clear() in release_revisions() unpack-file: fix ancient leak in create_temp_file() built-ins & libs & helpers: add/move destructors, fix leaks dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache" read-cache.c: clear and free "sparse_checkout_patterns" commit: discard partial cache before (re-)reading it {reset,merge}: call discard_index() before returning tests: mark tests as passing with SANITIZE=leak
2022-12-13ls-refs: use repository parameter to iterate refsJeff King
The ls_refs() function (for the v2 protocol command of the same name) takes a repository parameter (like all v2 commands), but ignores it. It should use it to access the refs. This isn't a bug in practice, since we only call this function when serving upload-pack from the main repository. But it's an awkward gotcha, and it causes -Wunused-parameter to complain. The main reason we don't use the repository parameter is that the ref iteration interface we call doesn't have a "refs_" variant that takes a ref_store. However we can easily add one. In fact, since there is only one other caller (in ref-filter.c), there is no need to maintain the non-repository wrapper; that caller can just use the_repository. It's still a long way from consistently using a repository object, but it's one small step in the right direction. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21built-ins & libs & helpers: add/move destructors, fix leaksÆvar Arnfjörð Bjarmason
Fix various leaks in built-ins, libraries and a test helper here we were missing a call to strbuf_release(), string_list_clear() etc, or were calling them after a potential "return". Comments on individual changes: - builtin/checkout.c: Fix a memory leak that was introduced in [1]. A sibling leak introduced in [2] was recently fixed in [3]. As with [3] we should be using the wt_status_state_free_buffers() API introduced in [4]. - builtin/repack.c: Fix a leak that's been here since this use of "strbuf_release()" was added in a1bbc6c0176 (repack: rewrite the shell script in C, 2013-09-15). We don't use the variable for anything except this loop, so we can instead free it right afterwards. - builtin/rev-parse: Fix a leak that's been here since this code was added in 21d47835386 (Add a parseopt mode to git-rev-parse to bring parse-options to shell scripts., 2007-11-04). - builtin/stash.c: Fix a couple of leaks that have been here since this code was added in d4788af875c (stash: convert create to builtin, 2019-02-25), we strbuf_release()'d only some of the "struct strbuf" we allocated earlier in the function, let's release all of them. - ref-filter.c: Fix a leak in 482c1191869 (gpg-interface: improve interface for parsing tags, 2021-02-11), we don't use the "payload" variable that we ask parse_signature() to populate for us, so let's free it. - t/helper/test-fake-ssh.c: Fix a leak that's been here since this code was added in 3064d5a38c7 (mingw: fix t5601-clone.sh, 2016-01-27). Let's free the "struct strbuf" as soon as we don't need it anymore. 1. c45f0f525de (switch: reject if some operation is in progress, 2019-03-29) 2. 2708ce62d21 (branch: sort detached HEAD based on a flag, 2021-01-07) 3. abcac2e19fa (ref-filter.c: fix a leak in get_head_description, 2022-09-25) 4. 962dd7ebc3e (wt-status: introduce wt_status_state_free_buffers(), 2020-09-27). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-02ref-filter: fix parsing of signatures with CRLF and no bodyJeff King
This commit fixes a bug when parsing tags that have CRLF line endings, a signature, and no body, like this (the "^M" are marking the CRs): this is the subject^M -----BEGIN PGP SIGNATURE-----^M ^M ...some stuff...^M -----END PGP SIGNATURE-----^M When trying to find the start of the body, we look for a blank line separating the subject and body. In this case, there isn't one. But we search for it using strstr(), which will find the blank line in the signature. In the non-CRLF code path, we check whether the line we found is past the start of the signature, and if so, put the body pointer at the start of the signature (effectively making the body empty). But the CRLF code path doesn't catch the same case, and we end up with the body pointer in the middle of the signature field. This has two visible problems: - printing %(contents:subject) will show part of the signature, too, since the subject length is computed as (body - subject) - the length of the body is (sig - body), which makes it negative. Asking for %(contents:body) causes us to cast this to a very large size_t when we feed it to xmemdupz(), which then complains about trying to allocate too much memory. These are essentially the same bugs fixed in the previous commit, except that they happen when there is a CRLF blank line in the signature, rather than no blank line at all. Both are caused by the refactoring in 9f75ce3d8f (ref-filter: handle CRLF at end-of-line more gracefully, 2020-10-29). We can fix this by doing the same "sigstart" check that we do in the non-CRLF case. And rather than repeat ourselves, we can just use short-circuiting OR to collapse both cases into a single conditional. I.e., rather than: if (strstr("\n\n")) ...found blank, check if it's in signature... else if (strstr("\r\n\r\n")) ...found blank, check if it's in signature... else ...no blank line found... we can collapse this to: if (strstr("\n\n")) || strstr("\r\n\r\n"))) ...found blank, check if it's in signature... else ...no blank line found... The tests show the problem and the fix. Though it wasn't broken, I included contents:signature here to make sure it still behaves as expected, but note the shell hackery needed to make it work. A less-clever option would be to skip using test_atom and just "append_cr >expected" ourselves. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-02ref-filter: fix parsing of signatures without blank linesJeff King
When ref-filter is asked to show %(content:subject), etc, we end up in find_subpos() to parse out the three major parts: the subject, the body, and the signature (if any). When searching for the blank line between the subject and body, if we don't find anything, we try to treat the whole message as the subject, with no body. But our idea of "the whole message" needs to take into account the signature, too. Since 9f75ce3d8f (ref-filter: handle CRLF at end-of-line more gracefully, 2020-10-29), the code instead goes all the way to the end of the buffer, which produces confusing output. Here's an example. If we have a tag message like this: this is the subject -----BEGIN SSH SIGNATURE----- ...some stuff... -----END SSH SIGNATURE----- then the current parser will put the start of the body at the end of the whole buffer. This produces two buggy outcomes: - since the subject length is computed as (body - subject), showing %(contents:subject) will print both the subject and the signature, rather than just the single line - since the body length is computed as (sig - body), and the body now starts _after_ the signature, we end up with a negative length! Fortunately we never access out-of-bounds memory, because the negative length is fed to xmemdupz(), which casts it to a size_t, and xmalloc() bails trying to allocate an absurdly large value. In theory it would be possible for somebody making a malicious tag to wrap it around to a more reasonable value, but it would require a tag on the order of 2^63 bytes. And even if they did, all they get is an out of bounds string read. So the security implications are probably not interesting. We can fix both by correctly putting the start of the body at the same index as the start of the signature (effectively making the body empty). Note that this is a real issue with signatures generated with gpg.format set to "ssh", which would look like the example above. In the new tests here I use a hard-coded tag message, for a few reasons: - regardless of what the ssh-signing code produces now or in the future, we should be testing this particular case - skipping the actual signature makes the tests simpler to write (and allows them to run on more systems) - t6300 has helpers for working with gpg signatures; for the purposes of this bug, "BEGIN PGP" is just as good a demonstration, and this simplifies the tests Curiously, the same issue doesn't happen with real gpg signatures (and there are even existing tests in t6300 with cover this). Those have a blank line between the header and the content, like: this is the subject -----BEGIN PGP SIGNATURE----- ...some stuff... -----END PGP SIGNATURE----- Because we search for the subject/body separator line with a strstr(), we find the blank line in the signature, even though it's outside of what we'd consider the body. But that puts us unto a separate code path, which realizes that we're now in the signature and adjusts the line back to "sigstart". So this patch is basically just making the "no line found at all" case match that. And note that "sigstart" is always defined (if there is no signature, it points to the end of the buffer as you'd expect). Reported-by: Martin Englund <martin@englund.nu> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>