aboutsummaryrefslogtreecommitdiff
path: root/revision.c
AgeCommit message (Collapse)Author
8 daysMerge branch 'mm/line-log-use-standard-diff-output'Junio C Hamano
The way the "git log -L<range>:<file>" feature is bolted onto the log/diff machinery is being reworked a bit to make the feature compatible with more diff options, like -S/G. * mm/line-log-use-standard-diff-output: doc: note that -L supports patch formatting and pickaxe options t4211: add tests for -L with standard diff options line-log: route -L output through the standard diff pipeline line-log: fix crash when combined with pickaxe options
2026-03-26revision: avoid writing to const string for parent marksJeff King
We take in a "const char *", but may write a NUL into it when parsing parent marks like "foo^-", since we want to isolate "foo" as a string for further parsing. This is usually OK, as our "const" strings are often actually argv strings which are technically writeable, but we'd segfault with a string literal like: handle_revision_arg("HEAD^-", &revs, 0, 0); Similar to how we handled dotdot in a previous commit, we can avoid this by making a temporary copy of the left-hand side of the string. The cost should negligible compared to the rest of the parsing (like actually parsing commits to create their parent linked-lists). There is one slightly tricky thing, though. We parse some of the marks progressively, so that if we see "foo^!" for example, we'll strip that down to "foo" not just for calling add_parents_only(), but also for the rest of the function. That makes sense since we eventually want to pass "foo" to get_oid_with_context(). But it also means that we'll keep looking for other marks. In particular, "foo^-^!" is valid, though oddly "foo^!^-" would ignore the "^-". I'm not sure if this is a weird historical artifact of the implementation, or if there are important corner cases. So I've left the behavior unchanged. Each mark we find allocates a string with the mark stripped, which means we could allocate multiple times (and carry a free-able pointer for each to the end). But in practice we won't, because of the three marks, "^@" jumps immediately to the end without further parsing, and "^-^!" is nonsense that nobody would pass. So you'd get one allocation in general, and never more than two. Another obvious option would be to just copy "arg" up front and be OK with munging it. But that means we pay the cost even when we find no marks. We could make a single copy upon finding a mark and then munge, but that adds extra code to each site (checking whether somebody else allocated, and if not, adjusting our "mark" pointer to be relative to the copied string). I aimed for something that was clear and obvious, if a bit verbose. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-26revision: make handle_dotdot() interface less confusingJeff King
There are two very subtle bits to the way we parse ".." (and "...") range operators: 1. In handle_dotdot_1(), we assume that the incoming arguments "dotdot" and "arg" are part of the same string, with the first digit of the range-operator blanked to a NUL. Then when we want the full name (e.g., to report an error), we replace the NUL with a dot to restore the original string. 2. In handle_dotdot(), we take in a const string, but then we modify it by overwriting the range operator with a NUL. This has worked OK in practice since we tend to pass in buffers that are actually writeable (including argv), but segfaults with something like: handle_revision_arg("..HEAD", &revs, 0, 0); On top of that, building with recent versions of glibc causes the compiler to complain, because it notices when we use strchr() or strstr() to launder away constness (basically detecting the possibility of the segfault above via the type system). Instead of munging the buffer, let's instead make a temporary copy of the left-hand side of the range operator. That avoids any const violations, and lets us pass around the parsed elements independently: the left-hand side, the right-hand side, the number of dots (via the "symmetric" flag), and the original full string for error messages. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16line-log: route -L output through the standard diff pipelineMichael Montalbo
`git log -L` has always bypassed the standard diff pipeline. `dump_diff_hacky()` in line-log.c hand-rolls its own diff headers and hunk output, which means most diff formatting options are silently ignored. A NEEDSWORK comment has acknowledged this since the feature was introduced: /* * NEEDSWORK: manually building a diff here is not the Right * Thing(tm). log -L should be built into the diff pipeline. */ Remove `dump_diff_hacky()` and its helpers and route -L output through `builtin_diff()` / `fn_out_consume()`, the same path used by `git diff` and `git log -p`. The mechanism is a pair of callback wrappers that sit between `xdi_diff_outf()` and `fn_out_consume()`, filtering xdiff's output to only the tracked line ranges. To ensure xdiff emits all lines within each range as context, the context length is inflated to span the largest range. Wire up the `-L` implies `--patch` default in revision setup rather than forcing it at output time, so `line_log_print()` is just `diffcore_std()` + `diff_flush()` with no format save/restore. Rename detection is a no-op since pairs are already resolved during the history walk in `queue_diffs()`, but running `diffcore_std()` means `-S`/`-G` (pickaxe), `--orderfile`, and `--diff-filter` now work with `-L`, and `diff_resolve_rename_copy()` sets pair statuses correctly without manual assignment. Switch `diff_filepair_dup()` from `xmalloc` to `xcalloc` so that new fields (including `line_ranges`) are zero-initialized by default. As a result, diff formatting options that were previously silently ignored (e.g. --word-diff, --no-prefix, -w, --color-moved) now work with -L, and output gains `index` lines, `new file mode` headers, and funcname context in `@@` headers. This is a user-visible output change: tools that parse -L output may need to handle the additional header lines. The context-length inflation means xdiff may process more output than needed for very wide line ranges, but benchmarks on files up to 7800 lines show no measurable regression. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-09Merge branch 'ps/refs-for-each'Junio C Hamano
Code refactoring around refs-for-each-* API functions. * ps/refs-for-each: refs: replace `refs_for_each_fullref_in()` refs: replace `refs_for_each_namespaced_ref()` refs: replace `refs_for_each_glob_ref()` refs: replace `refs_for_each_glob_ref_in()` refs: replace `refs_for_each_rawref_in()` refs: replace `refs_for_each_rawref()` refs: replace `refs_for_each_ref_in()` refs: improve verification for-each-ref options refs: generalize `refs_for_each_fullref_in_prefixes()` refs: generalize `refs_for_each_namespaced_ref()` refs: speed up `refs_for_each_glob_ref_in()` refs: introduce `refs_for_each_ref_ext` refs: rename `each_ref_fn` refs: rename `do_for_each_ref_flags` refs: move `do_for_each_ref_flags` further up refs: move `refs_head_ref_namespaced()` refs: remove unused `refs_for_each_include_root_ref()`
2026-03-04Merge branch 'pw/no-more-NULL-means-current-worktree'Junio C Hamano
API clean-up for the worktree subsystem. * pw/no-more-NULL-means-current-worktree: path: remove repository argument from worktree_git_path() wt-status: avoid passing NULL worktree
2026-03-02Merge branch 'ps/odb-for-each-object'Junio C Hamano
Revamp object enumeration API around odb. * ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags
2026-02-25Merge branch 'ds/revision-maximal-only'Junio C Hamano
"git rev-list" and friends learn "--maximal-only" to show only the commits that are not reachable by other commits. * ds/revision-maximal-only: revision: add --maximal-only option
2026-02-23refs: replace `refs_for_each_fullref_in()`Patrick Steinhardt
Replace calls to `refs_for_each_fullref_in()` with the newly introduced `refs_for_each_ref_ext()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23refs: replace `refs_for_each_glob_ref()`Patrick Steinhardt
Replace calls to `refs_for_each_glob_ref()` with the newly introduced `refs_for_each_ref_ext()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23refs: replace `refs_for_each_glob_ref_in()`Patrick Steinhardt
Replace calls to `refs_for_each_glob_ref_in()` with the newly introduced `refs_for_each_ref_ext()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23refs: rename `each_ref_fn`Patrick Steinhardt
Similar to the preceding commit, rename `each_ref_fn` to better match our current best practices around how we name things. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-19path: remove repository argument from worktree_git_path()Phillip Wood
worktree_git_path() takes a struct repository and a struct worktree which also contains a struct repository. The repository argument was added by a973f60dc7c (path: stop relying on `the_repository` in `worktree_git_path()`, 2024-08-13) and exists because the worktree argument is optional. Having two ways of passing a repository is a potential foot-gun as if the the worktree argument is present the repository argument must match the worktree's repository member. Since the last commit there are no callers that pass a NULL worktree so lets remove the repository argument. This removes the potential confusion and lets us delete a number of uses of "the_repository". worktree_git_path() has the following callers: - builtin/worktree.c:validate_no_submodules() which is called from check_clean_worktree() and move_worktree(), both of which supply a non-NULL worktree. - builtin/fsck.c:cmd_fsck() which loops over all worktrees. - revision.c:add_index_objects_to_pending() which loops over all worktrees. - worktree.c:worktree_lock_reason() which dereferences wt before calling worktree_git_path(). - wt-status.c:wt_status_check_bisect() and wt_status_check_rebase() which are always called with a non-NULL worktree after the last commit. - wt-status.c:git_branch() which is only called by wt_status_check_bisect() and wt_status_check_rebase(). Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-13Merge branch 'ps/commit-list-functions-renamed'Junio C Hamano
Rename three functions around the commit_list data structure. * ps/commit-list-functions-renamed: commit: rename `free_commit_list()` to conform to coding guidelines commit: rename `reverse_commit_list()` to conform to coding guidelines commit: rename `copy_commit_list()` to conform to coding guidelines
2026-01-26treewide: enumerate promisor objects via `odb_for_each_object()`Patrick Steinhardt
We have multiple callsites where we enumerate all promisor objects in the object database via `for_each_packed_object()`. This is done by passing the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY` flag, which causes us to skip over all non-promisor objects. These callsites can be trivially converted to `odb_for_each_object()` as we know to skip enumeration of loose objects in case the `PROMISOR_ONLY` flag was passed by the caller. Refactor the sites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26odb: rename `FOR_EACH_OBJECT_*` flagsPatrick Steinhardt
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This prepares us for a new upcoming `odb_for_each_object()` function and ensures that both the function and its flags have the same prefix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-22revision: add --maximal-only optionDerrick Stolee
When inspecting a range of commits from some set of starting references, it is sometimes useful to learn which commits are not reachable from any other commits in the selected range. One such application is in the creation of a sequence of bundles for the bundle URI feature. Creating a stack of bundles representing different slices of time includes defining which references to include. If all references are used, then this may be overwhelming or redundant. Instead, selecting commits that are maximal to the range could help defining a smaller reference set to use in the bundle header. Add a new '--maximal-only' option to restrict the output of a revision range to be only the commits that are not reachable from any other commit in the range, based on the reachability definition of the walk. This is accomplished by adding a new 28th bit flag, CHILD_VISITED, that is set as we walk. This does extend the bit range in object.h, but using an earlier bit may collide with another feature. The tests demonstrate the behavior of the feature with a positive-only range, ranges with negative references, and walk-modifying flags like --first-parent and --exclude-first-parent-only. Since the --boundary option would not increase any results when used with the --maximal-only option, mark them as incompatible. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-21Merge branch 'rs/tree-wo-the-repository'Junio C Hamano
Remove implicit reliance on the_repository global in the APIs around tree objects and make it explicit which repository to work in. * rs/tree-wo-the-repository: cocci: remove obsolete the_repository rules cocci: convert parse_tree functions to repo_ variants tree: stop using the_repository tree: use repo_parse_tree() path-walk: use repo_parse_tree_gently() pack-bitmap-write: use repo_parse_tree() delta-islands: use repo_parse_tree() bloom: use repo_parse_tree() add-interactive: use repo_parse_tree_indirect() tree: add repo_parse_tree*() environment: move access to core.maxTreeDepth into repo settings
2026-01-21Merge branch 'ps/packfile-store-in-odb-source'Junio C Hamano
The packfile_store data structure is moved from object store to odb source. * ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source
2026-01-15Merge branch 'ps/packfile-store-in-odb-source' into ps/odb-for-each-objectJunio C Hamano
* ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source odb: properly close sources before freeing them builtin/gc: fix condition for whether to write commit graphs
2026-01-15commit: rename `free_commit_list()` to conform to coding guidelinesPatrick Steinhardt
Our coding guidelines say that: Functions that operate on `struct S` are named `S_<verb>()` and should generally receive a pointer to `struct S` as first parameter. While most of the functions related to `struct commit_list` already follow that naming schema, `free_commit_list()` doesn't. Rename the function to address this and adjust all of its callers. Add a compatibility wrapper for the old function name to ease the transition and avoid any semantic conflicts with in-flight patch series. This wrapper will be removed once Git 2.53 has been released. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-15commit: rename `copy_commit_list()` to conform to coding guidelinesPatrick Steinhardt
Our coding guidelines say that: Functions that operate on `struct S` are named `S_<verb>()` and should generally receive a pointer to `struct S` as first parameter. While most of the functions related to `struct commit_list` already follow that naming schema, `copy_commit_list()` doesn't. Rename the function to address this and adjust all of its callers. Add a compatibility wrapper for the old function name to ease the transition and avoid any semantic conflicts with in-flight patch series. This wrapper will be removed once Git 2.53 has been released. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09cocci: convert parse_tree functions to repo_ variantsRené Scharfe
Add and apply a semantic patch to convert calls to parse_tree() and friends to the corresponding variant that takes a repository argument, to allow the functions that implicitly use the_repository to be retired once all potential in-flight topics are settled and converted as well. The changes in .c files were generated by Coccinelle, but I fixed a whitespace bug it would have introduced to builtin/commit.c. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: refactor kept-pack cache to work with packfile storesPatrick Steinhardt
The kept pack cache is a cache of packfiles that are marked as kept either via an accompanying ".kept" file or via an in-memory flag. The cache can be retrieved via `kept_pack_cache()`, where one needs to pass in a repository. Ultimately though the kept-pack cache is a property of the packfile store, and this causes problems in a subsequent commit where we want to move down the packfile store to be a per-object-source entity. Prepare for this and refactor the kept-pack cache to work on top of a packfile store instead. While at it, rename both the function and flags specific to the kept-pack cache so that they can be properly attributed to the respective subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-25revision: export commit_stackRené Scharfe
Dynamic arrays of commit pointers are used in several places. Some of them use a custom struct to hold array, item count and capacity, others have them as separate variables linked by a common name part. Pick one succinct, clean implementation -- commit_stack -- and convert the different variants to it to reduce code duplication. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: introduce wrapper struct for `each_ref_fn`Patrick Steinhardt
The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through reference. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-13Merge branch 'ps/commit-graph-per-object-source'Junio C Hamano
Code clean-up around commit-graph. * ps/commit-graph-per-object-source: commit-graph: pass graphs that are to be merged as parameter commit-graph: return commit graph from `repo_find_commit_pos_in_graph()` commit-graph: return the prepared commit graph from `prepare_commit_graph()` revision: drop explicit check for commit graph blame: drop explicit check for commit graph
2025-09-29Merge branch 'jk/setup-revisions-freefix'Junio C Hamano
There are double frees and leaks around setup_revisions() API used in "git stash show", which has been fixed, and setup_revisions() API gained a wrapper to make it more ergonomic when using it with strvec-manged argc/argv pairs. * jk/setup-revisions-freefix: revision: retain argv NULL invariant in setup_revisions() treewide: pass strvecs around for setup_revisions_from_strvec() treewide: use setup_revisions_from_strvec() when we have a strvec revision: add wrapper to setup_revisions() from a strvec revision: manage memory ownership of argv in setup_revisions() stash: tell setup_revisions() to free our allocated strings
2025-09-22revision: retain argv NULL invariant in setup_revisions()Jeff King
In an argc/argv pair, the entry for argv[argc] is generally NULL. You can iterate by counting up to argc, or by looking for the NULL entry in argv. When we pass such a pair to setup_revisions(), it shrinks argc to account for the options we consumed and returns the result to the caller. But it doesn't touch the entries after the reduced argc. So argv[argc] will be left pointing at some arbitrary entry rather than NULL. This isn't the source of any known bugs, since all callers are aware of the limitation and act accordingly. But it's a possible gotcha that may be easy to miss. Let's set the new argv[argc] to NULL, taking care to free it if the caller asked us to do so. It is tempting to do likewise for all of the entries afterwards, too, as some of them may also need to be freed (e.g., if coming from a strvec). But doing so isn't entirely trivial, as we munge argc in the function (e.g., when we find "--" and move all of the entries after it into the prune_data list). It would be possible with some light refactoring, but it's probably not worth it. Nobody should ever look at them (they are beyond the revised argc and past the NULL argv entry) outside of strvec cleanup, and setup_revisions_from_strvec() already handles this case. There's one other interesting gotcha: many callers which do not want to provide arguments just pass 0/NULL for argc/argv. We need to check for this case before assigning the final NULL. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-22revision: add wrapper to setup_revisions() from a strvecJeff King
The setup_revisions() function was designed to take the argc/argv pair from the operating system. But we sometimes construct our own argv using a strvec and pass that in. There are a few gotchas that callers need to deal with here: 1. You should always pass the free_removed_argv_elements option via setup_revision_opt. Otherwise, entries may be leaked if setup_revisions() re-shuffles options. 2. After setup_revisions() returns, the strvec state is odd. We get a reduced argc from setup_revisions() telling us how many unknown options were left in place. Entries after that in argv may be retained, or may be NULL (depending on how the reshuffling happened). But the strvec's "nr" field still represents the original value, and some of the entries it thinks it is still storing may be NULL. Callers must be careful with how they access it. Some callers deal with (1), but not all. In practice they are OK because they do not pass any options that would cause setup_revisions() to re-shuffle (namely unknown options which may be relayed from the user, and the use of the "--" separator). But it's probably a good idea to consistently pass this option anyway to future-proof ourselves against the details of setup_revisions() changing. No callers address (2), though I don't think there any visible bugs. Most of them simply call strvec_clear() and never otherwise look at the result. And in fact, if they naively set foo.nr to the argc returned by setup_revisions(), that would cause leaks! Because setup_revisions() does not free consumed options[1], we have to leave the "nr" field of the strvec at its original value to find and free them during strvec_clear(). So I don't think there are any bugs to fix here, but we can make things safer and simpler for callers. Let's introduce a helper function that sets the free_removed_argv_elements automatically and shrinks the strvec to represent the retained options afterwards (taking care to free the now-obsolete entries). We'll start by converting all of the call-sites which use the free_removed_argv_elements option. There should be no behavior change for them, except that their "shrunken" entries are cleaned up immediately, rather than waiting for a strvec_clear() call. [1] Arguably setup_revisions() should be doing this step for us if we told it to free removed options, but there are many existing callers which will be broken if it did. Introducing this helper is a possible first step towards that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-22revision: manage memory ownership of argv in setup_revisions()Jeff King
The setup_revisions() function takes an argc/argv pair and consumes arguments from it, returning a reduced argc count to the caller. But it may also overwrite entries within the argv array, as it shifts unknown options to the front of argv (so they can be found in the range of 0..argc-1 after we return). For a normal argc/argv coming from the operating system, this is OK. We don't need to worry about memory ownership of the strings in those entries. But some callers pass in allocated strings from a strvec, and we do need to care about those. We faced a similar issue in f92dbdbc6a (revisions API: don't leak memory on argv elements that need free()-ing, 2022-08-02), which added an option for callers to tell us that elements need to be freed. But the implementation within setup_revisions() was incomplete. It only covered the case of dropping "--", but not the movement of unknown options. When we shift argv entries around, we should free the elements we are about to overwrite, so they are not leaked. For example, in: git stash show -p --invalid we will pass this to setup_revisions(): argc = 3, argv[] = { "show", "-p", "--invalid", NULL } which will then return: argc = 2, argv[] = { "show", "--invalid", "--invalid", NULL } overwriting the "-p" entry, which is leaked unless we free it at that moment. You can see in the output above another potential problem. We now have two copies of the "--invalid" string. If the caller does not respect the new argc when free-ing the strings via strvec_clear(), we'll get a double-free. And git-stash suffers from this, and will crash with the above command. So it seems at first glance that the solution is to just assign the reduced argc to the strvec.nr field in the caller. Then it would stop after freeing only any copied entries. But that's not always right either! Remember that we are reducing "argc" to account for elements we've consumed. So if there isn't an invalid option, we'd turn: argc = 2, argv[] = { "show", "-p", NULL } into: argc = 1, argv[] = { "show", "-p", NULL } In that case strvec_clear() must keep looking past the shortened argc we return to find the original "-p" to free. It needs to use the original argc to do that. We can solve this by turning our argv writes into strict moves, not copies. When we shuffle an unknown option to the front, we'll overwrite its old position with NULL. That leaves an argv array that may have NULL "holes" in it. So in the "--invalid" example above we get: argc = 2, argv[] = { "show", "--invalid", NULL, NULL } but something like "git stash -p --invalid -p" would yield: argc = 3, argv[] = { "show", "--invalid", NULL, "-p", NULL } because we move "--invalid" to overwrite the first "-p", but the second one is quietly consumed. But strvec_clear() can handle that fine (it iterates over the "nr" field, and passing NULL to free() is OK). To ease the implementation, I've introduced a helper function. It's a little hacky because it must take a double-pointer to set the old position to NULL. Which in turn means we cannot pass "&arg", our local alias for the current entry we're parsing, but instead "&argv[i]", the pointer in the original array. And to make it even more confusing, we delegate some of this work to handle_revision_opt(), which is passed a subset of the argv array, so is always working on "&argv[0]". Likewise, because handle_revision_opt() only receives the part of argv left to parse, it receives the array to accumulate unknown options as a separate unkc/unkv pair. But we're always working on the same argv array, so our strategy works fine. I suspect this would be a bit more obvious (and avoid some pointer cleverness) if all functions saw the full argv array and worked with positions within it (and our new helper would take two positions, a src and dst). But that would involve refactoring handle_revision_opt(). I punted on that, as what's here is not too ugly and is all contained within revision.c itself. The new test demonstrates that "git stash show -p --invalid" no longer crashes with a double-free (because we move instead of copy). And it passes with SANITIZE=leak because we free "-p" before overwriting. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-04revision: drop explicit check for commit graphPatrick Steinhardt
When filtering down revisions by paths we know to use bloom filters from the commit graph, if we have any. The entry point for this is in `check_maybe_different_in_bloom_filter()`, where we first verify that: - We do have a commit graph. - That the commit is contained therein by checking that we have a proper generation number. - And that the graph contains a bloom filter. The first check is somewhat redundant though: if we don't have a commit graph, then the second check would already tell us that we don't have a generation number for the specific commit. In theory this could be seen as a performance optimization to short-circuit for scenarios where there is no commit graph. But in practice this shouldn't matter: if there is no commit graph, then the commit graph data slab would also be unpopulated and thus a lookup of the commit should happen in constant time. Drop the unnecessary check. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-21Merge branch 'ly/changed-path-traversal-with-magic-pathspec'Junio C Hamano
Revision traversal limited with pathspec, like "git log dir/*", used to ignore changed-paths Bloom filter when the pathspec contained wildcards; now they take advantage of the filter when they can. * ly/changed-path-traversal-with-magic-pathspec: bloom: enable bloom filter with wildcard pathspec in revision traversal
2025-08-21Merge branch 'ps/remote-rename-fix'Junio C Hamano
"git remote rename origin upstream" failed to move origin/HEAD to upstream/HEAD when origin/HEAD is unborn and performed other renames extremely inefficiently, which has been corrected. * ps/remote-rename-fix: builtin/remote: only iterate through refs that are to be renamed builtin/remote: rework how remote refs get renamed builtin/remote: determine whether refs need renaming early on builtin/remote: fix sign comparison warnings refs: simplify logic when migrating reflog entries refs: pass refname when invoking reflog entry callback
2025-08-11bloom: enable bloom filter with wildcard pathspec in revision traversalLidong Yan
When traversing commits, a pathspec item can be used to limit the traversal to commits that modify the specified paths. And the commit-graph includes a Bloom filter to exclude commits that definitely did not modify a given pathspec item. During commit traversal, the Bloom filter can significantly improve performance. However, it is disabled if the specified pathspec item contains wildcard characters or magic signatures. For performance reason, enable Bloom filter even if a pathspec item contains wildcard characters by filtering only the non-wildcard part of the pathspec item. The function of pathspec magic signature is generally to narrow down the path specified by the pathspecs. So, enable Bloom filter when the magic signature is "top", "glob", "attr", "--depth" or "literal". "exclude" is used to select paths other than the specified path, rather than serving as a filtering function, so it cannot be used together with the Bloom filter. Since Bloom filter is not case insensitive even in case insensitive system (e.g. MacOS), it cannot be used together with "icase" magic. With this optimization, we get some improvements for pathspecs with wildcards or magic signatures. First, in the Git repository we see these modest results: git log -100 -- "t/*" Benchmark 1: new Time (mean ± σ): 20.4 ms ± 0.6 ms Range (min … max): 19.3 ms … 24.4 ms Benchmark 2: old Time (mean ± σ): 23.4 ms ± 0.5 ms Range (min … max): 22.5 ms … 24.7 ms git log -100 -- ":(top)t" Benchmark 1: new Time (mean ± σ): 16.2 ms ± 0.4 ms Range (min … max): 15.3 ms … 17.2 ms Benchmark 2: old Time (mean ± σ): 18.6 ms ± 0.5 ms Range (min … max): 17.6 ms … 20.4 ms But in a larger repo, such as the LLVM project repo below, we get even better results: git log -100 -- "libc/*" Benchmark 1: new Time (mean ± σ): 16.0 ms ± 0.6 ms Range (min … max): 14.7 ms … 17.8 ms Benchmark 2: old Time (mean ± σ): 26.7 ms ± 0.5 ms Range (min … max): 25.4 ms … 27.8 ms git log -100 -- ":(top)libc" Benchmark 1: new Time (mean ± σ): 15.6 ms ± 0.6 ms Range (min … max): 14.4 ms … 17.7 ms Benchmark 2: old Time (mean ± σ): 19.6 ms ± 0.5 ms Range (min … max): 18.6 ms … 20.6 ms Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-06refs: pass refname when invoking reflog entry callbackPatrick Steinhardt
With `refs_for_each_reflog_ent()` callers can iterate through all the reflog entries for a given reference. The callback that is being invoked for each such entry does not receive the name of the reference that we are currently iterating through. This isn't really a limiting factor, as callers can simply pass the name via the callback data. But this layout sometimes does make for a bit of an awkward calling pattern. One example: when iterating through all reflogs, and for each reflog we iterate through all refnames, we have to do some extra book keeping to track which reference name we are currently yielding reflog entries for. Change the signature of the callback function so that the reference name of the reflog gets passed through to it. Adapt callers accordingly and start using the new parameter in trivial cases. The next commit will refactor the reference migration logic to make use of this parameter so that we can simplify its logic a bit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-04Merge branch 'ps/config-wo-the-repository'Junio C Hamano
The config API had a set of convenience wrapper functions that implicitly use the_repository instance; they have been removed and inlined at the calling sites. * ps/config-wo-the-repository: (21 commits) config: fix sign comparison warnings config: move Git config parsing into "environment.c" config: remove unused `the_repository` wrappers config: drop `git_config_set_multivar()` wrapper config: drop `git_config_get_multivar_gently()` wrapper config: drop `git_config_set_multivar_in_file_gently()` wrapper config: drop `git_config_set_in_file_gently()` wrapper config: drop `git_config_set()` wrapper config: drop `git_config_set_gently()` wrapper config: drop `git_config_set_in_file()` wrapper config: drop `git_config_get_bool()` wrapper config: drop `git_config_get_ulong()` wrapper config: drop `git_config_get_int()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string_multi()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get()` wrapper config: drop `git_config_clear()` wrapper ...
2025-08-01Merge branch 'jk/revision-no-early-output'Junio C Hamano
Remove unsupported, unused, and unsupportable old option from "git log". * jk/revision-no-early-output: revision: drop early output option
2025-07-23Merge branch 'ly/changed-paths-traversal'Junio C Hamano
Lift the limitation to use changed-path filter in "git log" so that it can be used for a pathspec with multiple literal paths. * ly/changed-paths-traversal: bloom: optimize multiple pathspec items in revision revision: make helper for pathspec to bloom keyvec bloom: replace struct bloom_key * with struct bloom_keyvec bloom: rename function operates on bloom_key bloom: add test helper to return murmur3 hash
2025-07-23config: drop `git_config()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-21revision: drop early output optionJeff King
We added the --early-output feature long ago in cdcefbc971 (Add "--early-output" log flag for interactive GUI use, 2007-11-03). The idea was that GUIs could use it to progressively render a history view, showing something quick-and-inaccurate at first and then enhancing it later. But we never documented it, and it appears never to have been used, even by the projects which initially expressed interest. There was an RFC patch for gitk to use it: http://public-inbox.org/git/18221.2285.259487.655684@cargo.ozlabs.ibm.com/ but it was never merged. Likewise QGit had a patch in: https://lore.kernel.org/git/e5bfff550711040225ne67c907r2023b1354c35f35@mail.gmail.com/ but it was never fully merged (to this day, QGit has a commented-out line to add "--early-output" to the "log" invocation). Searching for other mentions on the web or forges like github.com turns up nothing. Meanwhile, the feature has been broken off and on over the years without anybody noticing (and naturally, there are no tests, either). From 2011 to 2017 the option didn't even turn on via "--early-output"; this was fixed in e35b6ac56f (revision.h: turn rev_info.early_output back into an unsigned int, 2017-06-10). It worked for a while then, but it does not interact well at all with commit-graphs (which are turned on by default these days). The main logic to count early commits is triggered by limit_list(), which we traditionally invoked when showing output in topo-order (and --early-output always enables --topo-order). But that changed in f0d9cc4196 (revision.c: begin refactoring --topo-order logic, 2018-11-01). Now when we have generation numbers, we skip limit_list() entirely, and the early-output code shows no commits, and just the final header "Final output: 1 done". Which is syntactically OK, but semantically wrong: that message should give the total number of commits we're about to show. So let's drop the feature. It is extra code that is untested and undocumented, and makes working on the revision machinery more brittle. Given the history above, it seems unlikely that anybody is using it (or has used it), and we can drop it without the usual deprecation period. A gentler option might be to "soft" drop it: keep accepting the option, have it imply --topo-order as it does now, print "Final output: 1 done", and then do our regular traversal. That would keep any hypothetical caller working. But it doesn't seem worth the hassle to me. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15Merge branch 'ps/object-store'Junio C Hamano
Code clean-up around object access API. * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`
2025-07-15bloom: optimize multiple pathspec items in revisionLidong Yan
To enable optimize multiple pathspec items in revision traversal, return 0 if all pathspec item is literal in forbid_bloom_filters(). Add for loops to initialize and check each pathspec item's bloom_keyvec when optimization is possible. Add new test cases in t/t4216-log-bloom.sh to ensure - consistent results between the optimization for multiple pathspec items using bloom filter and the case without bloom filter optimization. - does not use bloom filter if any pathspec item is not literal. With these optimizations, we get some improvements for multi-pathspec runs of 'git log'. First, in the Git repository we see these modest results: Benchmark 1: old Time (mean ± σ): 73.1 ms ± 2.9 ms Range (min … max): 69.9 ms … 84.5 ms 42 runs Benchmark 2: new Time (mean ± σ): 55.1 ms ± 2.9 ms Range (min … max): 51.1 ms … 61.2 ms 52 runs Summary 'new' ran 1.33 ± 0.09 times faster than 'old' But in a larger repo, such as the LLVM project repo below, we get even better results: Benchmark 1: old Time (mean ± σ): 1.974 s ± 0.006 s Range (min … max): 1.960 s … 1.983 s 10 runs Benchmark 2: new Time (mean ± σ): 262.9 ms ± 2.4 ms Range (min … max): 257.7 ms … 266.2 ms 11 runs Summary 'new' ran 7.51 ± 0.07 times faster than 'old' Signed-off-by: Derrick Stolee <stolee@gmail.com> [ly: rename convert_pathspec_to_filter() to convert_pathspec_to_bloom_keyvec()] Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-14Merge branch 'jk/all-negative-diff-filter-fix'Junio C Hamano
A diff-filter with negative-only specification like "git log --diff-filter=d" did not trigger correctly, which has been fixed. * jk/all-negative-diff-filter-fix: setup_revisions(): turn on diffs for all-negative diff filter
2025-07-14revision: make helper for pathspec to bloom keyvecLidong Yan
When preparing to use bloom filters in a revision walk, Git populates a boom_keyvec with an array of bloom keys for the components of a path. Before we create the ability to map multiple pathspecs to multiple bloom_keyvecs, extract the conversion from a pathspec to a bloom_keyvec into its own helper method. This simplifies the state that persists in prepare_to_use_bloom_filter() as well as makes the future change much simpler. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-14bloom: replace struct bloom_key * with struct bloom_keyvecLidong Yan
Previously, we stored bloom keys in a flat array and marked a commit as NOT TREESAME if any key reported "definitely not changed". To support multiple pathspec items, we now require that for each pathspec item, there exists a bloom key reporting "definitely not changed". This "for every" condition makes a flat array insufficient, so we introduce a new structure to group keys by a single pathspec item. `struct bloom_keyvec` is introduced to replace `struct bloom_key *` and `bloom_key_nr`. And because we want to support multiple pathspec items, we added a bloom_keyvec * and a bloom_keyvec_nr field to `struct rev_info` to represent an array of bloom_keyvecs. This commit still optimize only one pathspec item, thus bloom_keyvec_nr can only be 0 or 1. New bloom_keyvec_* functions are added to create and destroy a keyvec. bloom_filter_contains_vec() is added to check if all key in keyvec is contained in a bloom filter. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-14bloom: rename function operates on bloom_keyLidong Yan
git code style requires that functions operating on a struct S should be named in the form S_verb. However, the functions operating on struct bloom_key do not follow this convention. Therefore, fill_bloom_key() and clear_bloom_key() are renamed to bloom_key_fill() and bloom_key_clear(), respectively. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-07setup_revisions(): turn on diffs for all-negative diff filterJeff King
When the user gives us a diff filter like --diff-filter=D, we need to do a tree diff even if we're not planning to show the diff result itself, in order to decide whether to show the commit at all. So there's an explicit check of revs->diffopt.filter in setup_revisions(), and we set revs->diff if any bits are set. Originally that "filter" field covered both positive capital-letter filters (like "D") and also negative lowercase filters (like "d"), so it was sufficient for both cases. But later, 75408ca949 (diff-filter: be more careful when looking for negative bits, 2022-01-28) split the negative bits out into a "filter_not" field. We eventually fold those into "filter", but not until diff_setup_done() is called, which happens after our explicit check. As a result, a purely negative filter like: git log --diff-filter=d failed to turn on diffs at all. But rather than fail to filter by diff, because the filter variable is eventually set, we mistakenly show no commits at all, thinking that the empty diffs were cases where nothing passed through the filter. The smallest fix here is to just have our check look for any bits in either "filter" or "filter_not". I suspect it would also be OK to reorder the function a bit to call diff_setup_done() earlier, but that risks violating some other subtle ordering dependency. So I went with the simple and safe solution here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: get rid of `the_repository` in `for_each()` functionsPatrick Steinhardt
There are a couple of iterator-style functions that execute a callback for each instance of a given set, all of which currently depend on `the_repository`. Refactor them to instead take an object database as parameter so that we can get rid of this dependency. Rename the functions accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>