<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/range-diff.c, branch main</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=main</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=main'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2026-03-17T18:08:24Z</updated>
<entry>
<title>apply: report input location in header parsing errors</title>
<updated>2026-03-17T18:08:24Z</updated>
<author>
<name>Jialong Wang</name>
<email>jerrywang183@yahoo.com</email>
</author>
<published>2026-03-17T16:23:20Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=838c0ba5004b37009d8d2649ed1be7b77770c2f0'/>
<id>urn:sha1:838c0ba5004b37009d8d2649ed1be7b77770c2f0</id>
<content type='text'>
Several header parsing errors in apply.c still report only line
numbers. When applying more than one input, that does not tell the
user which input the line belongs to.

Report the patch input location for these header parsing errors, and
update the related tests.

While touching parse_git_diff_header(), update the helper state to use
the current header line when reporting these errors.

Signed-off-by: Jialong Wang &lt;jerrywang183@yahoo.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>range-diff: rename other_arg to log_arg</title>
<updated>2025-09-25T18:34:11Z</updated>
<author>
<name>Kristoffer Haugsbakk</name>
<email>code@khaugsbakk.name</email>
</author>
<published>2025-09-25T17:07:34Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=71fd6c695cd9fc9cc0a829d1579c7584c2ad9e18'/>
<id>urn:sha1:71fd6c695cd9fc9cc0a829d1579c7584c2ad9e18</id>
<content type='text'>
Rename `other_arg` to `log_arg` in `range_diff_options` and
related places.

“Other argument” comes from bd361918 (range-diff: pass through --notes
to `git log`, 2019-11-20) which introduced Git notes handling to
git-range-diff(1) by passing that option on to git-log(1). And that kind
of name might be fine in a local context. However, it was initially
spread among multiple files, and is now[1] part of the
`range_diff_options` struct. It is, prima facie, difficult to guess what
“other” means, especially when just looking at the struct.

But with a little reading we find out that it is used for `--[no-]notes`
and `--diff-merges`, which are both passed on to git-log(1). We should
just rename it to reflect this role; `log_arg` suggests, along with the
`strvec` type, that it is used to pass extra arguments to git-log(1).

† 1: since f1ce6c19 (range-diff: combine all options in a single data
     structure, 2021-02-05)

Suggested-by: Junio C Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: Kristoffer Haugsbakk &lt;code@khaugsbakk.name&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>range-diff: add configurable memory limit for cost matrix</title>
<updated>2025-08-29T16:46:07Z</updated>
<author>
<name>Paulo Casaretto</name>
<email>pcasaretto@gmail.com</email>
</author>
<published>2025-08-29T16:02:54Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=00727249ec8404c68391ec58e9c9f0d8a88d5ca0'/>
<id>urn:sha1:00727249ec8404c68391ec58e9c9f0d8a88d5ca0</id>
<content type='text'>
When comparing large commit ranges (e.g., 250,000+ commits), range-diff
attempts to allocate an n×n cost matrix that can exhaust available
memory. For example, with 256,784 commits (n = 513,568), the matrix
would require approximately 256GB of memory (513,568² × 4 bytes),
causing either immediate segmentation faults due to integer overflow or
system hangs.

Add a memory limit check in get_correspondences() before allocating the
cost matrix. This check uses the total size in bytes (n² × sizeof(int))
and compares it against a configurable maximum, preventing both
excessive memory usage and integer overflow issues.

The limit is configurable via a new --max-memory option that accepts
human-readable sizes (e.g., "1G", "500M"). The default is 4GB for 64 bit
systems and 2GB for 32 bit systems. This allows comparing ranges of
approximately 32,000 (16,000) commits - generous for real-world use cases
while preventing impractical operations.

When the limit is exceeded, range-diff now displays a clear error
message showing both the requested memory size and the maximum allowed,
formatted in human-readable units for better user experience.

Example usage:
  git range-diff --max-memory=1G branch1...branch2
  git range-diff --max-memory=500M base..topic1 base..topic2

This approach was chosen over alternatives:
- Pre-counting commits: Would require spawning additional git processes
  and reading all commits twice
- Limiting by commit count: Less precise than actual memory usage
- Streaming approach: Would require significant refactoring of the
  current algorithm

This issue was previously discussed in:
https://lore.kernel.org/git/RFC-cover-v2-0.5-00000000000-20211210T122901Z-avarab@gmail.com/

Acked-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Paulo Casaretto &lt;pcasaretto@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: stop depending on `the_repository` in `null_oid()`</title>
<updated>2025-03-10T20:16:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:31Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e'/>
<id>urn:sha1:7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e</id>
<content type='text'>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository-&gt;hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'js/range-diff-diff-merges'</title>
<updated>2024-12-23T17:32:17Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-12-23T17:32:17Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=88e59f8027ed0260584ccc0abd6fe435031614eb'/>
<id>urn:sha1:88e59f8027ed0260584ccc0abd6fe435031614eb</id>
<content type='text'>
"git range-diff" learned to optionally show and compare merge
commits in the ranges being compared, with the --diff-merges
option.

* js/range-diff-diff-merges:
  range-diff: introduce the convenience option `--remerge-diff`
  range-diff: optionally include merge commits' diffs in the analysis
</content>
</entry>
<entry>
<title>range-diff: optionally include merge commits' diffs in the analysis</title>
<updated>2024-12-16T16:45:48Z</updated>
<author>
<name>Johannes Schindelin</name>
<email>johannes.schindelin@gmx.de</email>
</author>
<published>2024-12-16T14:11:21Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=f8043236c6c9cb9e943a87ab2e55e8e394796727'/>
<id>urn:sha1:f8043236c6c9cb9e943a87ab2e55e8e394796727</id>
<content type='text'>
The `git log` command already offers support for including diffs for
merges, via the `--diff-merges=&lt;format&gt;` option.

Let's add corresponding support for `git range-diff`, too. This makes it
more convenient to spot differences between commit ranges that contain
merges.

This is especially true in scenarios with non-trivial merges, i.e.
merges introducing changes other than, or in addition to, what merge ORT
would have produced. Merging a topic branch that changes a function
signature into a branch that added a caller of that function, for
example, would require the merge commit itself to adjust that caller to
the modified signature.

In my code reviews, I found the `--diff-merges=remerge` option
particularly useful.

Signed-off-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>global: mark code units that generate warnings with `-Wsign-compare`</title>
<updated>2024-12-06T11:20:02Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-12-06T10:27:19Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=41f43b8243f42b9df2e98be8460646d4c0100ad3'/>
<id>urn:sha1:41f43b8243f42b9df2e98be8460646d4c0100ad3</id>
<content type='text'>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/output-prefix-cleanup'</title>
<updated>2024-10-10T21:22:30Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2024-10-10T21:22:29Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=3eb4cc451ed97123ff76e183a5be8a7dc164d1f6'/>
<id>urn:sha1:3eb4cc451ed97123ff76e183a5be8a7dc164d1f6</id>
<content type='text'>
Code clean-up.

* jk/output-prefix-cleanup:
  diff: store graph prefix buf in git_graph struct
  diff: return line_prefix directly when possible
  diff: return const char from output_prefix callback
  diff: drop line_prefix_length field
  line-log: use diff_line_prefix() instead of custom helper
</content>
</entry>
<entry>
<title>diff: return const char from output_prefix callback</title>
<updated>2024-10-03T21:22:22Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2024-10-03T21:09:24Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=436728fe9d75d05fa2439f867ca2039012b86e69'/>
<id>urn:sha1:436728fe9d75d05fa2439f867ca2039012b86e69</id>
<content type='text'>
The diff_options structure has an output_prefix callback for returning a
prefix string, but it does so by returning a pointer to a strbuf.

This makes the interface awkward. There's no reason the callback should
need to use a strbuf, and it creates questions about whether the
ownership of the resulting buffer should be transferred to the caller
(it should not be, but a recent attempt to clean up this code led to a
double-free in some cases).

The one advantage we get is that the strbuf contains a ptr/len pair, so
we could in theory have a prefix with embedded NULs. But we can observe
that none of the existing callbacks would ever produce such a NUL (they
are usually just indentation or graph symbols, and even the
"--line-prefix" option takes a NUL-terminated string).

And anyway, only one caller (the one in log_tree_diff_flush) actually
looks at the strbuf length. In every other case we use a helper function
which discards the length and just returns the NUL-terminated string.

So let's just have the callback return a "const char *" pointer. It's up
to the callbacks themselves if they want to use a strbuf under the hood.
And now the caller in log_tree_diff_flush() can just use the helper
function along with everybody else. That lets us even simplify out the
function pointer check, since the helper returns an empty string
(technically this does mean we'll sometimes issue an empty fputs() call,
but I don't think this code path is hot enough to care about that).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>userdiff: fix leaking memory for configured diff drivers</title>
<updated>2024-08-14T17:08:01Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-08-14T06:52:50Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=38678e5df55b77e98b47e7847faf83451eb161de'/>
<id>urn:sha1:38678e5df55b77e98b47e7847faf83451eb161de</id>
<content type='text'>
The userdiff structures may be initialized either statically on the
stack or dynamically via configuration keys. In the latter case we end
up leaking memory because we didn't have any infrastructure to discern
those strings which have been allocated statically and those which have
been allocated dynamically.

Refactor the code such that we have two pointers for each of these
strings: one that holds the value as accessed by other subsystems, and
one that points to the same string in case it has been allocated. Like
this, we can safely free the second pointer and thus plug those memory
leaks.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
