<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/diff-lib.c, branch main</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=main</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=main'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2026-02-13T21:39:25Z</updated>
<entry>
<title>Merge branch 'ps/commit-list-functions-renamed'</title>
<updated>2026-02-13T21:39:25Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-02-13T21:39:25Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=528820243334c55e015047477e720a14bc9cf25f'/>
<id>urn:sha1:528820243334c55e015047477e720a14bc9cf25f</id>
<content type='text'>
Rename three functions around the commit_list data structure.

* ps/commit-list-functions-renamed:
  commit: rename `free_commit_list()` to conform to coding guidelines
  commit: rename `reverse_commit_list()` to conform to coding guidelines
  commit: rename `copy_commit_list()` to conform to coding guidelines
</content>
</entry>
<entry>
<title>commit: rename `free_commit_list()` to conform to coding guidelines</title>
<updated>2026-01-15T13:32:31Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-01-15T09:35:34Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=9f18d089c51fba2776fe1fece877a359c47417f7'/>
<id>urn:sha1:9f18d089c51fba2776fe1fece877a359c47417f7</id>
<content type='text'>
Our coding guidelines say that:

  Functions that operate on `struct S` are named `S_&lt;verb&gt;()` and should
  generally receive a pointer to `struct S` as first parameter.

While most of the functions related to `struct commit_list` already
follow that naming schema, `free_commit_list()` doesn't.

Rename the function to address this and adjust all of its callers. Add a
compatibility wrapper for the old function name to ease the transition
and avoid any semantic conflicts with in-flight patch series. This
wrapper will be removed once Git 2.53 has been released.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>cocci: convert parse_tree functions to repo_ variants</title>
<updated>2026-01-10T02:36:18Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2026-01-09T21:30:21Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=ec7a16b14551fed736ecfe0a9d4f6d6f9e03be79'/>
<id>urn:sha1:ec7a16b14551fed736ecfe0a9d4f6d6f9e03be79</id>
<content type='text'>
Add and apply a semantic patch to convert calls to parse_tree() and
friends to the corresponding variant that takes a repository argument,
to allow the functions that implicitly use the_repository to be retired
once all potential in-flight topics are settled and converted as well.

The changes in .c files were generated by Coccinelle, but I fixed a
whitespace bug it would have introduced to builtin/commit.c.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff-files: fix copy detection</title>
<updated>2025-12-16T01:23:26Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2025-12-14T15:57:06Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=f293bdcc29f91e3e56c478473a85a8e13e6fd87c'/>
<id>urn:sha1:f293bdcc29f91e3e56c478473a85a8e13e6fd87c</id>
<content type='text'>
Copy detection cannot work when comparing the index to the working tree
because Git ignores files that it is not explicitly told to track.  It
should work in the other direction, though, i.e. for a reverse diff of
the deletion of a copy from the index.

d1f2d7e8ca (Make run_diff_index() use unpack_trees(), not read_tree(),
2008-01-19) broke it with a seemingly stray change to run_diff_files().

We didn't notice because there's no test for that.  But even if we had
one, it might have gone unnoticed because the breakage only happens
with index preloading, which requires at least 1000 entries (more than
most test repos have) and is racy because it runs in parallel with the
actual command.

Fix copy detection by queuing up-to-date and skip-worktree entries using
diff_same().

While at it, use diff_same() also for queuing unchanged files not
flagged as up-to-date, i.e. clean submodules and entries where
preloading was not done at all or not quickly enough.  It uses less
memory than diff_change() and doesn't unnecessarily set the diff flag
has_changes.

Add two tests to cover running both without and with preloading.  The
first one passes reliably with the original code.  The second one
enables preloading and thus is racy.  It has a good chance to pass even
without the fix, but fails within seconds when running the test script
with --stress.  With the fix it runs fine for several minutes, until
my patience runs out.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff-index: don't queue unchanged filepairs with diff_change()</title>
<updated>2025-11-30T17:58:53Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2025-11-30T11:47:17Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=38f88051dae6ddb2f1cdb9c7415d4ba6caef04af'/>
<id>urn:sha1:38f88051dae6ddb2f1cdb9c7415d4ba6caef04af</id>
<content type='text'>
diff_cache() queues unchanged filepairs if the flag find_copies_harder
is set, and uses diff_change() for that.  This function allocates a
filespec for each side, does a few other things that are unnecessary for
unchanged filepairs and always sets the diff_flag has_changes, which is
simply misleading in this case.

Add a new streamlined function for queuing unchanged filepairs and
use it in show_modified(), which is called by diff_cache() via
oneway_diff() and do_oneway_diff().  It allocates only a single filespec
for each filepair and uses it twice with reference counting.  This has a
measurable effect if there are a lot of them, like in the Linux repo:

Benchmark 1: ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder
  Time (mean ± σ):      31.8 ms ±   0.2 ms    [User: 24.2 ms, System: 6.3 ms]
  Range (min … max):    31.5 ms …  32.3 ms    85 runs

Benchmark 2: ./git -C ../linux diff --cached --find-copies-harder
  Time (mean ± σ):      23.9 ms ±   0.2 ms    [User: 18.1 ms, System: 4.6 ms]
  Range (min … max):    23.5 ms …  24.4 ms    111 runs

Summary
  ./git -C ../linux diff --cached --find-copies-harder ran
    1.33 ± 0.01 times faster than ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: teach tree-diff a max-depth parameter</title>
<updated>2025-08-07T22:29:35Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-08-07T20:52:58Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=a1dfa5448d583bbfd1ec45642a4495ad499970c9'/>
<id>urn:sha1:a1dfa5448d583bbfd1ec45642a4495ad499970c9</id>
<content type='text'>
When you are doing a tree-diff, there are basically two options: do not
recurse into subtrees at all, or recurse indefinitely. While most
callers would want to always recurse and see full pathnames, some may
want the efficiency of looking only at a particular level of the tree.
This is currently easy to do for the top-level (just turn off
recursion), but you cannot say "show me what changed in subdir/, but do
not recurse".

This patch adds a max-depth parameter which is measured from the closest
pathspec match, so that you can do:

  git log --raw --max-depth=1 -- a/b/c

and see the raw output for a/b/c/, but not those of a/b/c/d/
(instead of the raw output you would see for a/b/c/d).

Co-authored-by: Toon Claes &lt;toon@iotcl.com&gt;
Signed-off-by: Toon Claes &lt;toon@iotcl.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: stop depending on `the_repository` in `null_oid()`</title>
<updated>2025-03-10T20:16:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2025-03-10T07:13:31Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e'/>
<id>urn:sha1:7d70b29c4f0b2fd3c6698956d9fb4026632d9c6e</id>
<content type='text'>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:

  - "builtin/grep.c"
  - "grep.c"
  - "refs/debug.c"

There are also two non-trivial exceptions:

  - "diff-no-index.c": Here we know that we may not have a repository
    initialized at all, so we cannot rely on `the_repository`. Instead,
    we adapt `diff_no_index()` to get a `struct git_hash_algo` as
    parameter. The only caller is located in "builtin/diff.c", where we
    know to call `repo_set_hash_algo()` in case we're running outside of
    a Git repository. Consequently, it is fine to continue passing
    `the_repository-&gt;hash_algo` even in this case.

  - "builtin/ls-files.c": There is an in-flight patch series that drops
    `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
    conflict because we use `null_oid()` in `show_submodule()`. The
    value is passed to `repo_submodule_init()`, which may use the object
    ID to resolve a tree-ish in the superproject from which we want to
    read the submodule config. As such, the object ID should refer to an
    object in the superproject, and consequently we need to use its hash
    algorithm.

    This means that we could in theory just not bother about this edge
    case at all and just use `the_repository` in "diff-no-index.c". But
    doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>run_diff_files(): de-mystify the size of combine_diff_path struct</title>
<updated>2025-01-09T20:24:24Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-01-09T08:44:21Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=ca3abe41d71c4789ca00cba0ca2b6c22d67f08a3'/>
<id>urn:sha1:ca3abe41d71c4789ca00cba0ca2b6c22d67f08a3</id>
<content type='text'>
We allocate a combine_diff_path struct with space for 5 parents. Why 5?

The history is not particularly enlightening. The allocation comes from
b4b1550315 (Don't instantiate structures with FAMs., 2006-06-18), which
just switched to xmalloc from a stack struct with 5 elements. That
struct changed to 5 from 4 in 2454c962fb (combine-diff: show mode
changes as well., 2006-02-06), when we also moved from storing raw sha1
bytes to the combine_diff_parent struct. But no explanation is given.
That 4 comes from the earliest code in ea726d02e9 (diff-files: -c and
--cc options., 2006-01-28).

One might guess it is for the 4 stages we can store in the index. But
this code path only ever diffs the current state against stages 2 and 3.
So we only need two slots.

And it's easy to see this is still the case. We fill the parent slots by
subtracting 2 from the ce_stage() values, ignoring values below 2. And
since ce_stage() is only 2 bits, there are 4 values, and thus we need 2
slots.

Let's use the correct value (saving a tiny bit of memory) and add a
comment explaining what's going on (saving a tiny bit of programmer
brain power).

Arguably we could use:

  1 + (STAGEMASK &gt;&gt; STAGESHIFT) - 2

which lets the compiler enforce that we will not go out-of-bounds if we
see an unexpected value from ce_stage(). But that is more confusing to
explain, and the constant "2" is baked into other parts of the function.
It is a fundamental constant, not something where somebody might bump a
macro and forget to update this code.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>combine-diff: add combine_diff_path_new()</title>
<updated>2025-01-09T17:57:44Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-01-09T08:32:36Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=706779344155823518745a19515601905877c41f'/>
<id>urn:sha1:706779344155823518745a19515601905877c41f</id>
<content type='text'>
The combine_diff_path struct has variable size, since it embeds both the
memory allocation for the path field as well as a variable-sized parent
array. This makes allocating one a bit tricky.

We have a helper to compute the required size, but it's up to individual
sites to actually initialize all of the fields. Let's provide a
constructor function to make that a little nicer. Besides being shorter,
it also hides away tricky bits like the computation of the "path"
pointer (which is right after the "parent" flex array).

As a bonus, using the same constructor everywhere means that we'll
consistently initialize all parts of the struct. A few code paths left
the parent array unitialized. This didn't cause any bugs, but we'll be
able to simplify some code in the next few patches knowing that the
parent fields have all been zero'd.

This also gets rid of some questionable uses of "int" to store buffer
lengths. Though we do use them to allocate, I don't think there are any
integer overflow vulnerabilities here (the allocation helper promotes
them to size_t and checks arithmetic for overflow, and the actual memcpy
of the bytes is done using the possibly-truncated "int" value).

Sadly we can't use the FLEX_* macros to simplify the allocation here,
because there are two variable-sized parts to the struct (and those
macros only handle one).

Nor can we get stop publicly declaring combine_diff_path_size(). This
patch does not touch the code in path_appendnew() at all, which is not
ready to be moved to our new constructor for a few reasons:

  - path_appendnew() has a memory-reuse optimization where it tries to
    reuse combine_diff_path structs rather than freeing and
    reallocating.

  - path_appendnew() does not create the struct from a single path
    string, but rather allocates and copies into the buffer from
    multiple sources.

These can be addressed by some refactoring, but let's leave it as-is for
now.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>run_diff_files(): delay allocation of combine_diff_path</title>
<updated>2025-01-09T17:56:28Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2025-01-09T08:28:18Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=949bb8f74f6db7405d6ad8bbf02ebc42a947801d'/>
<id>urn:sha1:949bb8f74f6db7405d6ad8bbf02ebc42a947801d</id>
<content type='text'>
While looping over the index entries, when we see a higher level stage
the first thing we do is allocate a combine_diff_path struct for it. But
this can leak; if check_removed() returns an error, we'll continue to
the next iteration of the loop without cleaning up.

We can fix this by just delaying the allocation by a few lines.

I don't think this leak is triggered in the test suite, but it's pretty
easy to see by inspection. My ulterior motive here is that the delayed
allocation means we have all of the data needed to initialize "dpath" at
the time of malloc, making it easier to factor out a constructor
function.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
