aboutsummaryrefslogtreecommitdiff
path: root/line-log.c
AgeCommit message (Collapse)Author
2026-03-16line-log: route -L output through the standard diff pipelineMichael Montalbo
`git log -L` has always bypassed the standard diff pipeline. `dump_diff_hacky()` in line-log.c hand-rolls its own diff headers and hunk output, which means most diff formatting options are silently ignored. A NEEDSWORK comment has acknowledged this since the feature was introduced: /* * NEEDSWORK: manually building a diff here is not the Right * Thing(tm). log -L should be built into the diff pipeline. */ Remove `dump_diff_hacky()` and its helpers and route -L output through `builtin_diff()` / `fn_out_consume()`, the same path used by `git diff` and `git log -p`. The mechanism is a pair of callback wrappers that sit between `xdi_diff_outf()` and `fn_out_consume()`, filtering xdiff's output to only the tracked line ranges. To ensure xdiff emits all lines within each range as context, the context length is inflated to span the largest range. Wire up the `-L` implies `--patch` default in revision setup rather than forcing it at output time, so `line_log_print()` is just `diffcore_std()` + `diff_flush()` with no format save/restore. Rename detection is a no-op since pairs are already resolved during the history walk in `queue_diffs()`, but running `diffcore_std()` means `-S`/`-G` (pickaxe), `--orderfile`, and `--diff-filter` now work with `-L`, and `diff_resolve_rename_copy()` sets pair statuses correctly without manual assignment. Switch `diff_filepair_dup()` from `xmalloc` to `xcalloc` so that new fields (including `line_ranges`) are zero-initialized by default. As a result, diff formatting options that were previously silently ignored (e.g. --word-diff, --no-prefix, -w, --color-moved) now work with -L, and output gains `index` lines, `new file mode` headers, and funcname context in `@@` headers. This is a user-visible output change: tools that parse -L output may need to handle the additional header lines. The context-length inflation means xdiff may process more output than needed for very wide line ranges, but benchmarks on files up to 7800 lines show no measurable regression. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16line-log: fix crash when combined with pickaxe optionsMichael Montalbo
queue_diffs() passes the caller's diff_options, which may carry user-specified pickaxe state, to diff_tree_oid() and diffcore_std() when detecting renames for line-level history tracking. When pickaxe options are present on the command line (-G and -S to filter by text pattern, --find-object to filter by object identity), diffcore_std() also runs diffcore_pickaxe(), which may discard diff pairs that are relevant for rename detection. Losing those pairs breaks rename following. Before a2bb801f6a (line-log: avoid unnecessary full tree diffs, 2019-08-21), this silently truncated history at rename boundaries. That commit moved filter_diffs_for_paths() inside the rename- detection block, so it only runs when diff_might_be_rename() returns true. When pickaxe discards a rename pair, the rename goes undetected, and a deletion pair at a subsequent commit passes through uncleaned, reaching process_diff_filepair() with an invalid filespec and triggering an assertion failure. Fix this by building a private diff_options for the rename-detection path inside queue_diffs(), following the same pattern used by blame's find_rename(). This isolates the rename machinery from unrelated user-specified options. Reported-by: Matthew Hughes <matthewhughes934@gmail.com> Signed-off-by: Michael Montalbo <mmontalbo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-15commit: rename `free_commit_list()` to conform to coding guidelinesPatrick Steinhardt
Our coding guidelines say that: Functions that operate on `struct S` are named `S_<verb>()` and should generally receive a pointer to `struct S` as first parameter. While most of the functions related to `struct commit_list` already follow that naming schema, `free_commit_list()` doesn't. Rename the function to address this and adjust all of its callers. Add a compatibility wrapper for the old function name to ease the transition and avoid any semantic conflicts with in-flight patch series. This wrapper will be removed once Git 2.53 has been released. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-18Merge branch 'sg/line-log-boundary-fixes'Junio C Hamano
A corner case bug in "git log -L..." has been corrected. * sg/line-log-boundary-fixes: line-log: show all line ranges touched by the same diff range line-log: fix assertion error
2025-08-25line-log: simplify condition checking for merge commitsSZEDER Gábor
In process_ranges_arbitrary_commit() the condition deciding whether the given commit is not a merge, i.e. that it doesn't have more than one parent, is head-scratchingly backwards, flip it. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-25line-log: initialize diff queue in process_ranges_ordinary_commit()SZEDER Gábor
process_ranges_ordinary_commit() uses a local diff queue variable, which it leaves uninitialized before passing its address to queue_diffs(). This is not an issue, because at the end of that function the contents of an other diff queue is moved into it by simply overwriting whatever is in there, i.e. without reading any uninitialized memory. Still, seeing the uninitialized diff queue being passed around scared me more than once, so out of caution let's make sure that it's initialized. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-25line-log: get rid of the parents array in process_ranges_merge_commit()SZEDER Gábor
We can easily iterate through the parents of a merge commit without turning the list of parents into a dynamically allocated array of parents, so let's do so. This way we can avoid a memory allocation for each processed merge commit, though its effect on runtime seems to be unmeasurable. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-25line-log: avoid unnecessary tree diffs when processing merge commitsSZEDER Gábor
In process_ranges_merge_commit(), the line-level log first creates an array of diff queues by iterating over all parents of a merge commit and computing a tree diff for each. Then in a second loop it iterates over those diff queues, and if it finds that none of the interesting paths were modified in one of them, then it will return early. This means that when none of the interesting paths were modified between a merge and its first parent, then the tree diff between the merge and its second (Nth...) parent was computed in vain. Unify these two loops, so when it iterates over all parents of a merge commit, then it first computes the tree diff between the merge and that particular parent and then processes the resulting diff queue right away. This way we can spare some tree diff computing, thereby speeding up line-level log in repositories with mergy history: # git.git, 25.8% of commits are merges: Benchmark 1: ./git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0 Time (mean ± σ): 1.001 s ± 0.009 s [User: 0.906 s, System: 0.095 s] Range (min … max): 0.991 s … 1.023 s 10 runs Benchmark 2: ./git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0 Time (mean ± σ): 445.5 ms ± 3.4 ms [User: 358.8 ms, System: 84.3 ms] Range (min … max): 440.1 ms … 450.3 ms 10 runs Summary './git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0' ran 2.25 ± 0.03 times faster than './git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0' # linux.git, 7.5% of commits are merges: Benchmark 1: ./git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16 Time (mean ± σ): 3.246 s ± 0.007 s [User: 2.835 s, System: 0.409 s] Range (min … max): 3.232 s … 3.255 s 10 runs Benchmark 2: ./git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16 Time (mean ± σ): 2.467 s ± 0.014 s [User: 2.113 s, System: 0.353 s] Range (min … max): 2.455 s … 2.505 s 10 runs Summary './git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16' ran 1.32 ± 0.01 times faster than './git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16' And since now each iteration computes a tree diff and processes its result, there is no reason to store the diff queues for each merge parent anymore, so replace that diff queue array with a loop-local diff queue variable. With this change the static free_diffqueues() helper function in 'line-log.c' has no more callers left, remove it. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-20line-log: show all line ranges touched by the same diff rangeSZEDER Gábor
When line-level log is invoked with more than one disjoint line range in the same file, and one of the commits happens to change that file such that one diff range modifies more than one line range, then changes to all modified line ranges should be shown, but only the changes in the first modified line range are: $ git log --oneline -p 80ca903 (HEAD -> master) Initial diff --git a/file b/file new file mode 100644 index 0000000..00935f1 --- /dev/null +++ b/file @@ -0,0 +1,10 @@ +Line 1 +Line 2 +Line 3 +Line 4 +Line 5 +Line 6 +Line 7 +Line 8 +Line 9 +Line 10 $ git log --oneline -L1,2:file -L4,5:file -L7,8:file 80ca903 (HEAD -> master) Initial diff --git a/file b/file --- /dev/null +++ b/file @@ -0,0 +1,2 @@ +Line 1 +Line 2 The line-log-specific diff printer is already clever enough to handle the case when one line range covers multiple diff ranges, but the possibility of one diff range touching multiple disjoint line ranges was apparently overlooked. Add the necessary condition to dump_diff_hacky_one() to handle this case as well, and show all modified line ranges: $ git log --oneline -L1,2:file -L4,5:file -L7,8:file 0f9a5b4 (HEAD -> master) Initial diff --git a/file b/file --- /dev/null +++ b/file @@ -0,0 +1,2 @@ +Line 1 +Line 2 @@ -0,0 +4,2 @@ +Line 4 +Line 5 @@ -0,0 +7,2 @@ +Line 7 +Line 8 This bug was already present in the initial line-log implementation added in 2da1d1f6f (Implement line-history search (git log -L), 2013-03-28). Interestingly, that commit already contained a canned test case covering a similar scenario: "-L '/long f/',/^}/:a.c -L /main/,/^}/:a.c simple" This test case looks for two line ranges in the same file, and both trace back disjointly to the test repository's inital commit, therefore changes to both line ranges should have been shown for the initial commit, but only changes for the first line range are shown. So this test case should have failed from the very beginning, but it never did, because, unfortunately, the canned expected result is incorrect, as it doesn't include changes for the second line range. A similar test with a similarly incorrect canned expected result was added later in 209618860c (log -L: fix overlapping input ranges, 2013-04-05). Correct these two canned expected results to contain the changes for the second line range for the initial commit as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-20line-log: fix assertion errorSZEDER Gábor
When line-level log is invoked with more than one disjoint line range in the same file, and one of the commits happens to change that file such that: - the last line of a line range R(n) immediately preceeds the first line modified or added by a hunk H, and - subtracting the number of lines added by hunk H from the start and end of the subsequent line range R(n+1) would result in a range overlapping with line range R(n), then git aborts with an assertion error, because those overlapping line ranges violate the invariants: $ git log --oneline -p 73e4e2f (HEAD -> master) Add lines 6 7 8 9 10 diff --git a/file b/file index 572d5d9..00935f1 100644 --- a/file +++ b/file @@ -3,3 +3,8 @@ Line 2 Line 3 Line 4 Line 5 +Line 6 +Line 7 +Line 8 +Line 9 +Line 10 66e3561 Add lines 1 2 3 4 5 diff --git a/file b/file new file mode 100644 index 0000000..572d5d9 --- /dev/null +++ b/file @@ -0,0 +1,5 @@ +Line 1 +Line 2 +Line 3 +Line 4 +Line 5 $ git log --oneline -L3,5:file -L7,8:file git: line-log.c:73: range_set_append: Assertion `rs->nr == 0 || rs->ranges[rs->nr-1].end <= a' failed. Aborted (core dumped) The line-log machinery encodes line and diff ranges internally as [start, end) pairs, i.e. include 'start' but exclude 'end', and line numbering starts at 0 (as opposed to the -LX,Y option, where it starts at 1, IOW the parameter -L3,5 is represented internally as { start = 2, end = 5 }). The reason for this assertion error and some related issues is that there are a couple of places where 'end' is mistakenly considered to be part of the range: - When a commit modifies an interesting path, the line-log machinery first checks which diff range (i.e. hunk) modify any line ranges. This is done in diff_ranges_filter_touched(), where the outer loop iterates over the diff ranges, and in each iteration the inner loop advances the line ranges supposedly until the current line range ends at or after the current diff range starts, and then the current diff and line ranges are checked for overlap. For HEAD in the above example the first line range [2, 5) ends just before the diff range [5, 10) starts, so the inner loop should advance, and then the second line range [6, 8) and the diff range should be checked for overlap. Unfortunately, the condition of the inner loop mistakenly considers 'end' as part of the line range, and, seeing the diff range starting at 5 and the line range ending at 5, it doesn't skip the first range. Consequently, the diff range and the first line range are checked for overlap, and after that the outer loop runs out of diff ranges, and then the processing goes on in the false belief that this commit didn't touch any of the interesting line ranges. The line-log machinery later shifts the line ranges to account for any added/removed lines in the diff ranges preceeding each line range. This leaves the first line range intact, but attempts to shift the second line range [6, 8) by 5 lines towards the beginning of the file, resulting in [1, 3), triggering the assertion error, because the two overlapping line ranges violate the invariants. Fix that loop condition in diff_ranges_filter_touched() to not treat 'end' as part of the line range. - With the above fix the assertion error is gone... but, alas, we now get stuck in an endless loop! This happens in range_set_difference(), where a couple of nested loops iterate over the line and diff ranges, and a condition is supposed to break the middle loop when the current line range ends before the current diff range, so processing could continue with the next line range. For HEAD in the above example the first line range [2, 5) ends just before the diff range [5, 10) starts, so this condition should trigger and break the middle loop. Unfortunately, just like in the case of the assertion error, this conditions mistakenly considers 'end' as part of the line range, and, seeing the line range ending at 5 and the diff range starting at 5, it doesn't break the loop, which will then go on and on. Fix this condition in range_set_difference() to not treat 'end' as part of the line range. - With the above fix the endless loop is gone... but, alas, the output is now wrong, as it shows both line ranges for HEAD, even though the first line range is not modified by that commit: $ git log --oneline -L3,5:file -L7,8:file 73e4e2f (HEAD -> master) Add lines 6 7 8 9 10 diff --git a/file b/file --- a/file +++ b/file @@ -3,3 +3,3 @@ Line 3 Line 4 Line 5 @@ -6,0 +7,2 @@ +Line 7 +Line 8 66e3561 Add lines 1 2 3 4 5 diff --git a/file b/file --- /dev/null +++ b/file @@ -0,0 +3,3 @@ +Line 3 +Line 4 +Line 5 In dump_diff_hacky_one() a couple of nested loops are responsible for finding and printing the modified line ranges: the big outer loop iterates over all line ranges, and the first inner loop skips over the diff ranges that end before the start of the current line range. This is followed by a condition checking whether the current diff range starts after the end of the current line range, which, when fulfilled, continues and advances the outer loop to the next line range. For HEAD in the above example the first line range [2, 5) ends just before the diff range [5, 10), so this condition should trigger, and the outer loop should advance to the second line range. Unfortunately, just like in the previous cases, this condition mistakenly considers 'end' as part of the line range, and, seeing the first line range ending at 5 and the diff range starting at 5, it doesn't continue to advance the outher loop, but goes on to show the (unmodified) first line range. Fix this condition to not treat 'end' as part of the line range, just like in the previous cases. After all this the command in the above example finally finishes and produces the right output: $ git log --oneline -L3,5:file -L7,8:file 73e4e2f (HEAD -> master) Add lines 6 7 8 9 10 diff --git a/file b/file --- a/file +++ b/file @@ -6,0 +7,2 @@ +Line 7 +Line 8 66e3561 Add lines 1 2 3 4 5 diff --git a/file b/file --- /dev/null +++ b/file @@ -0,0 +3,3 @@ +Line 3 +Line 4 +Line 5 Add a canned test similar to the above example, with the line ranges adjusted to the test repository's history. Reported-by: Evgeni Chasnovski <evgeni.chasnovski@gmail.com> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-14bloom: rename function operates on bloom_keyLidong Yan
git code style requires that functions operating on a struct S should be named in the form S_verb. However, the functions operating on struct bloom_key do not follow this convention. Therefore, fill_bloom_key() and clear_bloom_key() are renamed to bloom_key_fill() and bloom_key_clear(), respectively. Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21line-log: fix leak when rewriting commit parentsPatrick Steinhardt
In `process_ranges_merge_commit()` we try to figure out which of the parents can be blamed for the given line changes. When we figure out that none of the files in the line-log have changed we assign the complete blame to that commit and rewrite the parents of the current commit to only use that single parent. This is done via `commit_list_append()`, which is misleadingly _not_ appending to the list of parents. Instead, we overwrite the parents with the blamed parent. This makes us lose track of the old pointers, creating a memory leak. Fix this issue by freeing the parents before we overwrite them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-10Merge branch 'jk/output-prefix-cleanup'Junio C Hamano
Code clean-up. * jk/output-prefix-cleanup: diff: store graph prefix buf in git_graph struct diff: return line_prefix directly when possible diff: return const char from output_prefix callback diff: drop line_prefix_length field line-log: use diff_line_prefix() instead of custom helper
2024-10-10Merge branch 'ps/leakfixes-part-8'Junio C Hamano
More leakfixes. * ps/leakfixes-part-8: (23 commits) builtin/send-pack: fix leaking list of push options remote: fix leaking push reports t/helper: fix leaks in proc-receive helper pack-write: fix return parameter of `write_rev_file_order()` revision: fix leaking saved parents revision: fix memory leaks when rewriting parents midx-write: fix leaking buffer pack-bitmap-write: fix leaking OID array pseudo-merge: fix leaking strmap keys pseudo-merge: fix various memory leaks line-log: fix several memory leaks diff: improve lifecycle management of diff queues builtin/revert: fix leaking `gpg_sign` and `strategy` config t/helper: fix leaking repository in partial-clone helper builtin/clone: fix leaking repo state when cloning with bundle URIs builtin/pack-redundant: fix various memory leaks builtin/stash: fix leaking `pathspec_from_file` submodule: fix leaking submodule entry list wt-status: fix leaking buffer with sparse directories shell: fix leaking strings ...
2024-10-03line-log: use diff_line_prefix() instead of custom helperJeff King
Our local output_prefix() is exactly the same as the public diff_line_prefix() function. Let's just use that one, saving us a little bit of code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-03line-log: protect inner strbuf from freeDerrick Stolee
The output_prefix() method in line-log.c may call a function pointer via the diff_options struct. This function pointer returns a strbuf struct and then its buffer is passed back. However, that implies that the consumer is responsible to free the string. This is especially true because the default behavior is to duplicate the empty string. The existing functions used in the output_prefix pointer include: 1. idiff_prefix_cb() in diff-lib.c. This returns the data pointer, so the value exists across multiple calls. 2. diff_output_prefix_callback() in graph.c. This uses a static strbuf struct, so it reuses buffers across calls. These should not be freed. 3. output_prefix_cb() in range-diff.c. This is similar to the diff-lib.c case. In each case, we should not be freeing this buffer. We can convert the output_prefix() function to return a const char pointer and stop freeing the result. This choice is essentially the opposite of what was done in 394affd46d (line-log: always allocate the output prefix, 2024-06-07). This was discovered via 'valgrind' while investigating a public report of a bug in 'git log --graph -L' [1]. [1] https://github.com/git-for-windows/git/issues/5185 This issue would have been caught by the new test, when Git is compiled with ASan to catch these double frees. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30line-log: fix several memory leaksPatrick Steinhardt
As described in "line-log.c" itself, the code is "leaking like a sieve". These leaks are all of rather trivial nature, so this commit plugs them without going too much into details for each of those leaks. The leaks are hit by t4211, but plugging them alone does not make the full test suite pass. The remaining leaks are unrelated to the line-log subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30diff: improve lifecycle management of diff queuesPatrick Steinhardt
The lifecycle management of diff queues is somewhat confusing: - For most of the part this can be attributed to `DIFF_QUEUE_CLEAR()`, which does not release any memory but rather initializes the queue, only. This is in contrast to our common naming schema, where "clearing" means that we release underlying memory and then re-initialize the data structure such that it is ready to use. - A second offender is `diff_free_queue()`, which does not free the queue structure itself. It is rather a release-style function. Refactor the code to make things less confusing. `DIFF_QUEUE_CLEAR()` is replaced by `DIFF_QUEUE_INIT` and `diff_queue_init()`, while `diff_free_queue()` is replaced by `diff_queue_release()`. While on it, adapt callsites where we call `DIFF_QUEUE_CLEAR()` with the intent to release underlying memory to instead call `diff_queue_clear()` to fix memory leaks. This memory leak is exposed by t4211, but plugging it alone does not make the whole test suite pass. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07line-log: always allocate the output prefixPatrick Steinhardt
The returned string by `output_prefix()` is sometimes a string constant and sometimes an allocated string. This has been fine until now because we always leak the allocated strings, and thus we never tried to free the string constant. Fix the code to always return an allocated string and free the returned value at all callsites. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07line-log: stop assigning string constant to file parent bufferPatrick Steinhardt
Stop assigning a string constant to the file parent buffer and instead assign an allocated string. While the code is fine in practice, it will break once we compile with `-Wwrite-strings`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26line-log.h: remove unnecessary includeElijah Newren
The unnecessary include in the header transitively pulled in some other headers actually needed by source files, though. Have those source files explicitly include the headers they need. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-05revision: clear decoration structs during release_revisions()Jeff King
The point of release_revisions() is to free memory associated with the rev_info struct, but we have several "struct decoration" members that are left untouched. Since the previous commit introduced a function to do that, we can just call it. We do have to provide some specialized callbacks to map the void pointers onto real ones (the alternative would be casting the existing function pointers; this generally works because "void *" is usually interchangeable with a struct pointer, but it is technically forbidden by the standard). Since the line-log code does not expose the type it stores in the decoration (nor of course the function to free it), I put this behind a generic line_log_free() entry point. It's possible we may need to add more line-log specific bits anyway (running t4211 shows a number of other leaks in the line-log code). While this doubtless cleans up many leaks triggered by the test suite, the only script which becomes leak-free is t4217, as it does very little beyond a simple traversal (its existing leak was from the use of --children, which is now fixed). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21diff.h: remove unnecessary include of oidset.hElijah Newren
This also made it clear that several .c files depended upon various things that oidset included, but had omitted the direct #include for those headers. Add those now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24diff.h: reduce unnecessary includesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: remove cache.h inclusion due to setup.h changesElijah Newren
By moving several declarations to setup.h, the previous patch made it possible to remove the include of cache.h in several source files. Do so. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21setup.h: move declarations for setup.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24rewrite_parents(): mark unused callback parameterJeff King
The rewrite_parents() function takes a callback, but not every callback needs the "rev" parameter. Mark the unused one so -Wunused-parameter will be happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-02line-log: free the diff queues' arrays when processing merge commitsSZEDER Gábor
When processing merge commits, the line-level log first creates an array of diff queues, each comparing the merge commit with one of its parents, to check whether any of the files in the given line ranges were modified. Alas, when freeing these queues it only frees the filepairs in the queues, but not the queues' internal arrays holding pointers to those filepairs. Use the diff_free_queue() helper function introduced in the previous commit to free the diff queues' internal arrays as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-02line-log: free diff queue when processing non-merge commitsSZEDER Gábor
When processing a non-merge commit, the line-level log first asks the tree-diff machinery whether any of the files in the given line ranges were modified between the current commit and its parent, and if some of them were, then it loads the contents of those files from both commits to see whether their line ranges were modified and/or need to be adjusted. Alas, it doesn't free() the diff queue holding the results of that query and the contents of those files once its done. This can add up to a substantial amount of leaked memory, especially when the file in question is big and is frequently modified: a user reported "Out of memory, malloc failed" errors with a 2MB text file that was modified ~2800 times [1] (I estimate the leak would use up almost 11GB memory in that case). Free that diff queue to plug this memory leak. However, instead of simply open-coding the necessary three lines, add them as a helper function to the diff API, because it will be useful elsewhere as well. [1] https://public-inbox.org/git/CAFOPqVXz2XwzX8vGU7wLuqb2ZuwTuOFAzBLRM_QPk+NJa=eC-g@mail.gmail.com/ Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2021-03-13use CALLOC_ARRAYRené Scharfe
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-12line-log: handle deref_tag() returning NULLRené Scharfe
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-29Merge branch 'tb/bloom-improvements'Junio C Hamano
"git commit-graph write" learned to limit the number of bloom filters that are computed from scratch with the --max-new-filters option. * tb/bloom-improvements: commit-graph: introduce 'commitGraph.maxNewFilters' builtin/commit-graph.c: introduce '--max-new-filters=<n>' commit-graph: rename 'split_commit_graph_opts' bloom: encode out-of-bounds filters as non-empty bloom/diff: properly short-circuit on max_changes bloom: use provided 'struct bloom_filter_settings' bloom: split 'get_bloom_filter()' in two commit-graph.c: store maximum changed paths commit-graph: respect 'commitGraph.readChangedPaths' t/helper/test-read-graph.c: prepare repo settings commit-graph: pass a 'struct repository *' in more places t4216: use an '&&'-chain commit-graph: introduce 'get_bloom_filter_settings()'
2020-09-17bloom: split 'get_bloom_filter()' in twoTaylor Blau
'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with counters in the 'write_commit_graph_context' struct which store the number of filters that we did and didn't compute, as well as filters that were truncated. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: convert more callers away from argv_array nameJeff King
We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts remaining files from the first half of the alphabet, to keep the diff to a manageable size. The conversion was done purely mechanically with: git ls-files '*.c' '*.h' | xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' and then selectively staging files with "git add '[abcdefghjkl]*'". We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: rename files from argv-array to strvecJeff King
This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '*.c' '*.h' | xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08Merge branch 'ds/line-log-on-bloom'Junio C Hamano
"git log -L..." now takes advantage of the "which paths are touched by this commit?" info stored in the commit-graph system. * ds/line-log-on-bloom: line-log: integrate with changed-path Bloom filters line-log: try to use generation number-based topo-ordering line-log: more responsive, incremental 'git log -L' t4211-line-log: add tests for parent oids line-log: remove unused fields from 'struct line_log_data'
2020-05-11line-log: integrate with changed-path Bloom filtersDerrick Stolee
The previous changes to the line-log machinery focused on making the first result appear faster. This was achieved by no longer walking the entire commit history before returning the early results. There is still another way to improve the performance: walk most commits much faster. Let's use the changed-path Bloom filters to reduce time spent computing diffs. Since the line-log computation requires opening blobs and checking the content-diff, there is still a lot of necessary computation that cannot be replaced with changed-path Bloom filters. The part that we can reduce is most effective when checking the history of a file that is deep in several directories and those directories are modified frequently. In this case, the computation to check if a commit is TREESAME to its first parent takes a large fraction of the time. That is ripe for improvement with changed-path Bloom filters. We must ensure that prepare_to_use_bloom_filters() is called in revision.c so that the bloom_filter_settings are loaded into the struct rev_info from the commit-graph. Of course, some cases are still forbidden, but in the line-log case the pathspec is provided in a different way than normal. Since multiple paths and segments could be requested, we compute the struct bloom_key data dynamically during the commit walk. This could likely be improved, but adds code complexity that is not valuable at this time. There are two cases to care about: merge commits and "ordinary" commits. Merge commits have multiple parents, but if we are TREESAME to our first parent in every range, then pass the blame for all ranges to the first parent. Ordinary commits have the same condition, but each is done slightly differently in the process_ranges_[merge|ordinary]_commit() methods. By checking if the changed-path Bloom filter can guarantee TREESAME, we can avoid that tree-diff cost. If the filter says "probably changed", then we need to run the tree-diff and then the blob-diff if there was a real edit. The Linux kernel repository is a good testing ground for the performance improvements claimed here. There are two different cases to test. The first is the "entire history" case, where we output the entire history to /dev/null to see how long it would take to compute the full line-log history. The second is the "first result" case, where we find how long it takes to show the first value, which is an indicator of how quickly a user would see responses when waiting at a terminal. To test, I selected the paths that were changed most frequently in the top 10,000 commits using this command (stolen from StackOverflow [1]): git log --pretty=format: --name-only -n 10000 | sort | \ uniq -c | sort -rg | head -10 which results in 121 MAINTAINERS 63 fs/namei.c 60 arch/x86/kvm/cpuid.c 59 fs/io_uring.c 58 arch/x86/kvm/vmx/vmx.c 51 arch/x86/kvm/x86.c 45 arch/x86/kvm/svm.c 42 fs/btrfs/disk-io.c 42 Documentation/scsi/index.rst (along with a bogus first result). It appears that the path arch/x86/kvm/svm.c was renamed, so we ignore that entry. This leaves the following results for the real command time: | | Entire History | First Result | | Path | Before | After | Before | After | |------------------------------|--------|--------|--------|--------| | MAINTAINERS | 4.26 s | 3.87 s | 0.41 s | 0.39 s | | fs/namei.c | 1.99 s | 0.99 s | 0.42 s | 0.21 s | | arch/x86/kvm/cpuid.c | 5.28 s | 1.12 s | 0.16 s | 0.09 s | | fs/io_uring.c | 4.34 s | 0.99 s | 0.94 s | 0.27 s | | arch/x86/kvm/vmx/vmx.c | 5.01 s | 1.34 s | 0.21 s | 0.12 s | | arch/x86/kvm/x86.c | 2.24 s | 1.18 s | 0.21 s | 0.14 s | | fs/btrfs/disk-io.c | 1.82 s | 1.01 s | 0.06 s | 0.05 s | | Documentation/scsi/index.rst | 3.30 s | 0.89 s | 1.46 s | 0.03 s | It is worth noting that the least speedup comes for the MAINTAINERS file which is * edited frequently, * low in the directory heirarchy, and * quite a large file. All of those points lead to spending more time doing the blob diff and less time doing the tree diff. Still, we see some improvement in that case and significant improvement in other cases. A 2-4x speedup is likely the more typical case as opposed to the small 5% change for that file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-11line-log: more responsive, incremental 'git log -L'SZEDER Gábor
The current line-level log implementation performs a preprocessing step in prepare_revision_walk(), during which the line_log_filter() function filters and rewrites history to keep only commits modifying the given line range. This preprocessing affects both responsiveness and correctness: - Git doesn't produce any output during this preprocessing step. Checking whether a commit modified the given line range is somewhat expensive, so depending on the size of the given revision range this preprocessing can result in a significant delay before the first commit is shown. - Limiting the number of displayed commits (e.g. 'git log -3 -L...') doesn't limit the amount of work during preprocessing, because that limit is applied during history traversal. Alas, by that point this expensive preprocessing step has already churned through the whole revision range to find all commits modifying the revision range, even though only a few of them need to be shown. - It rewrites parents, with no way to turn it off. Without the user explicitly requesting parent rewriting any parent object ID shown should be that of the immediate parent, just like in case of a pathspec-limited history traversal without parent rewriting. However, after that preprocessing step rewrote history, the subsequent "regular" history traversal (i.e. get_revision() in a loop) only sees commits modifying the given line range. Consequently, it can only show the object ID of the last ancestor that modified the given line range (which might happen to be the immediate parent, but many-many times it isn't). This patch addresses both the correctness and, at least for the common case, the responsiveness issues by integrating line-level log filtering into the regular revision walking machinery: - Make process_ranges_arbitrary_commit(), the static function in 'line-log.c' deciding whether a commit modifies the given line range, public by removing the static keyword and adding the 'line_log_' prefix, so it can be called from other parts of the revision walking machinery. - If the user didn't explicitly ask for parent rewriting (which, I believe, is the most common case): - Call this now-public function during regular history traversal, namely from get_commit_action() to ignore any commits not modifying the given line range. Note that while this check is relatively expensive, it must be performed before other, much cheaper conditions, because the tracked line range must be adjusted even when the commit will end up being ignored by other conditions. - Skip the line_log_filter() call, i.e. the expensive preprocessing step, in prepare_revision_walk(), because, thanks to the above points, the revision walking machinery is now able to filter out commits not modifying the given line range while traversing history. This way the regular history traversal sees the unmodified history, and is therefore able to print the object ids of the immediate parents of the listed commits. The eliminated preprocessing step can greatly reduce the delay before the first commit is shown, see the numbers below. - However, if the user did explicitly ask for parent rewriting via '--parents' or a similar option, then stick with the current implementation for now, i.e. perform that expensive filtering and history rewriting in the preprocessing step just like we did before, leaving the initial delay as long as it was. I tried to integrate line-level log filtering with parent rewriting into the regular history traversal, but, unfortunately, several subtleties resisted... :) Maybe someday we'll figure out how to do that, but until then at least the simple and common (i.e. without parent rewriting) 'git log -L:func:file' commands can benefit from the reduced delay. This change makes the failing 'parent oids without parent rewriting' test in 't4211-line-log.sh' succeed. The reduced delay is most noticable when there's a commit modifying the line range near the tip of a large-ish revision range: # no parent rewriting requested, no commit-graph present $ time git --no-pager log -L:read_alternate_refs:sha1-file.c -1 v2.23.0 Before: real 0m9.570s user 0m9.494s sys 0m0.076s After: real 0m0.718s user 0m0.674s sys 0m0.044s A significant part of the remaining delay is spent reading and parsing commit objects in limit_list(). With the help of the commit-graph we can eliminate most of that reading and parsing overhead, so here are the timing results of the same command as above, but this time using the commit-graph: Before: real 0m8.874s user 0m8.816s sys 0m0.057s After: real 0m0.107s user 0m0.091s sys 0m0.013s The next patch will further reduce the remaining delay. To be clear: this patch doesn't actually optimize the line-level log, but merely moves most of the work from the preprocessing step to the history traversal, so the commits modifying the line range can be shown as soon as they are processed, and the traversal can be terminated as soon as the given number of commits are shown. Consequently, listing the full history of a line range, potentially all the way to the root commit, will take the same time as before (but at least the user might start reading the output earlier). Furthermore, if the most recent commit modifying the line range is far away from the starting revision, then that initial delay will still be significant. Additional testing by Derrick Stolee: In the Linux kernel repository, the MAINTAINERS file was changed ~3,500 times across the ~915,000 commits. In addition to that edit frequency, the file itself is quite large (~18,700 lines). This means that a significant portion of the computation is taken up by computing the patch-diff of the file. This patch improves the real time it takes to output the first result quite a bit: Command: git log -L 100,200:MAINTAINERS -n 1 >/dev/null Before: 3.88 s After: 0.71 s If we drop the "-n 1" in the command, then there is no change in end-to-end process time. This is because the command still needs to walk the entire commit history, which negates the point of this patch. This is expected. As a note for future reference, the ~4.3 seconds in the old code spends ~2.6 seconds computing the patch-diffs, and the rest of the time is spent walking commits and computing diffs for which paths changed at each commit. The changed-path Bloom filters could improve the end-to-end computation time (i.e. no "-n 1" in the command). Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-07diff: make diff_populate_filespec_options structJonathan Tan
The behavior of diff_populate_filespec() currently can be customized through a bitflag, but a subsequent patch requires it to support a non-boolean option. Replace the bitflag with an options struct. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-18Merge branch 'sg/line-log-tree-diff-optim'Junio C Hamano
Optimize unnecessary full-tree diff away from "git log -L" machinery. * sg/line-log-tree-diff-optim: line-log: avoid unnecessary full tree diffs line-log: extract pathspec parsing from line ranges into a helper function
2019-08-21line-log: avoid unnecessary full tree diffsSZEDER Gábor
With rename detection enabled the line-level log is able to trace the evolution of line ranges across whole-file renames [1]. Alas, to achieve that it uses the diff machinery very inefficiently, making the operation very slow [2]. And since rename detection is enabled by default, the line-level log is very slow by default. When the line-level log processes a commit with rename detection enabled, it currently does the following (see queue_diffs()): 1. Computes a full tree diff between the commit and (one of) its parent(s), i.e. invokes diff_tree_oid() with an empty 'diffopt->pathspec'. 2. Checks whether any paths in the line ranges were modified. 3. Checks whether any modified paths in the line ranges are missing in the parent commit's tree. 4. If there is such a missing path, then calls diffcore_std() to figure out whether the path was indeed renamed based on the previously computed full tree diff. 5. Continues doing stuff that are unrelated to the slowness. So basically the line-level log computes a full tree diff for each commit-parent pair in step (1) to be used for rename detection in step (4) in the off chance that an interesting path is missing from the parent. Avoid these expensive and mostly unnecessary full tree diffs by limiting the diffs to paths in the line ranges. This is much cheaper, and makes step (2) unnecessary. If it turns out that an interesting path is missing from the parent, then fall back and compute a full tree diff, so the rename detection will still work. Care must be taken when to update the pathspec used to limit the diff in case of renames. A path might be renamed on one branch and modified on several parallel running branches, and while processing commits on these branches the line-level log might have to alternate between looking at a path's new and old name. However, at any one time there is only a single 'diffopt->pathspec'. So add a step (0) to the above to ensure that the paths in the pathspec match the paths in the line ranges associated with the currently processed commit, and re-parse the pathspec from the paths in the line ranges if they differ. The new test cases include a specially crafted piece of history with two merged branches and two files, where each branch modifies both files, renames on of them, and then modifies both again. Then two separate 'git log -L' invocations check the line-level log of each of those two files, which ensures that at least one of those invocations have to do that back-and-forth between the file's old and new name (no matter which branch is traversed first). 't/t4211-line-log.sh' already contains two tests involving renames, they don't don't trigger this back-and-forth. Avoiding these unnecessary full tree diffs can have huge impact on performance, especially in big repositories with big trees and mergy history. Tracing the evolution of a function through the whole history: # git.git $ time git --no-pager log -L:read_alternate_refs:sha1-file.c v2.23.0 Before: real 0m8.874s user 0m8.816s sys 0m0.057s After: real 0m2.516s user 0m2.456s sys 0m0.060s # linux.git $ time ~/src/git/git --no-pager log \ -L:build_restore_work_registers:arch/mips/mm/tlbex.c v5.2 Before: real 3m50.033s user 3m48.041s sys 0m0.300s After: real 0m2.599s user 0m2.466s sys 0m0.157s That's just over 88x speedup. [1] Line-level log's rename following is quite similar to 'git log --follow path', with the notable differences that it does handle multiple paths at once as well, and that it doesn't show the commit performing the rename if it's an exact rename. [2] This slowness might not have been apparent initially, because back when the line-level log feature was introduced rename detection was not yet enabled by default; 12da1d1f6f (Implement line-history search (git log -L), 2013-03-28) and 5404c116aa (diff: activate diff.renames by default, 2016-02-25). Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-21line-log: extract pathspec parsing from line ranges into a helper functionSZEDER Gábor
A helper function to parse the paths involved in the line ranges and to turn them into a pathspec will be useful in the next patch. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-27tree-walk.c: remove the_repo from get_tree_entry()Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>