<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/diffcore.h, branch main</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=main</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=main'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2026-03-17T04:05:42Z</updated>
<entry>
<title>line-log: route -L output through the standard diff pipeline</title>
<updated>2026-03-17T04:05:42Z</updated>
<author>
<name>Michael Montalbo</name>
<email>mmontalbo@gmail.com</email>
</author>
<published>2026-03-17T02:21:33Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=86e986f166d207e1f4b80062c2befb4f94c191c4'/>
<id>urn:sha1:86e986f166d207e1f4b80062c2befb4f94c191c4</id>
<content type='text'>
`git log -L` has always bypassed the standard diff pipeline.
`dump_diff_hacky()` in line-log.c hand-rolls its own diff headers and
hunk output, which means most diff formatting options are silently
ignored.  A NEEDSWORK comment has acknowledged this since the feature
was introduced:

    /*
     * NEEDSWORK: manually building a diff here is not the Right
     * Thing(tm).  log -L should be built into the diff pipeline.
     */

Remove `dump_diff_hacky()` and its helpers and route -L output through
`builtin_diff()` / `fn_out_consume()`, the same path used by `git diff`
and `git log -p`.  The mechanism is a pair of callback wrappers that sit
between `xdi_diff_outf()` and `fn_out_consume()`, filtering xdiff's
output to only the tracked line ranges.  To ensure xdiff emits all lines
within each range as context, the context length is inflated to span the
largest range.

Wire up the `-L` implies `--patch` default in revision setup rather
than forcing it at output time, so `line_log_print()` is just
`diffcore_std()` + `diff_flush()` with no format save/restore.
Rename detection is a no-op since pairs are already resolved during
the history walk in `queue_diffs()`, but running `diffcore_std()`
means `-S`/`-G` (pickaxe), `--orderfile`, and `--diff-filter` now
work with `-L`, and `diff_resolve_rename_copy()` sets pair statuses
correctly without manual assignment.

Switch `diff_filepair_dup()` from `xmalloc` to `xcalloc` so that new
fields (including `line_ranges`) are zero-initialized by default.

As a result, diff formatting options that were previously silently
ignored (e.g. --word-diff, --no-prefix, -w, --color-moved) now work
with -L, and output gains `index` lines, `new file mode` headers, and
funcname context in `@@` headers.  This is a user-visible output change:
tools that parse -L output may need to handle the additional header
lines.

The context-length inflation means xdiff may process more output than
needed for very wide line ranges, but benchmarks on files up to 7800
lines show no measurable regression.

Signed-off-by: Michael Montalbo &lt;mmontalbo@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore.h: *.txt -&gt; *.adoc fixes</title>
<updated>2025-03-03T21:49:23Z</updated>
<author>
<name>Todd Zullinger</name>
<email>tmz@pobox.com</email>
</author>
<published>2025-03-03T20:44:16Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=3936e95a7f056877425c6faa6994d9f3712a6fb0'/>
<id>urn:sha1:3936e95a7f056877425c6faa6994d9f3712a6fb0</id>
<content type='text'>
Signed-off-by: Todd Zullinger &lt;tmz@pobox.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: improve lifecycle management of diff queues</title>
<updated>2024-09-30T18:23:05Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-09-30T09:13:45Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=a5aecb2cdc8c5f2c1501bdbe30c02959948d8442'/>
<id>urn:sha1:a5aecb2cdc8c5f2c1501bdbe30c02959948d8442</id>
<content type='text'>
The lifecycle management of diff queues is somewhat confusing:

  - For most of the part this can be attributed to `DIFF_QUEUE_CLEAR()`,
    which does not release any memory but rather initializes the queue,
    only. This is in contrast to our common naming schema, where
    "clearing" means that we release underlying memory and then
    re-initialize the data structure such that it is ready to use.

  - A second offender is `diff_free_queue()`, which does not free the
    queue structure itself. It is rather a release-style function.

Refactor the code to make things less confusing. `DIFF_QUEUE_CLEAR()` is
replaced by `DIFF_QUEUE_INIT` and `diff_queue_init()`, while
`diff_free_queue()` is replaced by `diff_queue_release()`. While on it,
adapt callsites where we call `DIFF_QUEUE_CLEAR()` with the intent to
release underlying memory to instead call `diff_queue_clear()` to fix
memory leaks.

This memory leak is exposed by t4211, but plugging it alone does not
make the whole test suite pass.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash-ll: merge with "hash.h"</title>
<updated>2024-06-14T17:26:33Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2024-06-14T06:50:32Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=8a676bdc5c3a9377322834326605b3804c7c54bf'/>
<id>urn:sha1:8a676bdc5c3a9377322834326605b3804c7c54bf</id>
<content type='text'>
The "hash-ll.h" header was introduced via d1cbe1e6d8 (hash-ll.h: split
out of hash.h to remove dependency on repository.h, 2023-04-22) to make
explicit the split between hash-related functions that rely on the
global `the_repository`, and those that don't. This split is no longer
necessary now that we we have removed the reliance on `the_repository`.

Merge "hash-ll.h" back into "hash.h". This causes some code units to not
include "repository.h" anymore, which requires us to add some forward
declarations.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash-ll.h: split out of hash.h to remove dependency on repository.h</title>
<updated>2023-04-24T19:47:32Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-04-22T20:17:20Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=d1cbe1e6d8a9cab2b4ffe8a17d34db214dce1e49'/>
<id>urn:sha1:d1cbe1e6d8a9cab2b4ffe8a17d34db214dce1e49</id>
<content type='text'>
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository-&gt;hash_algo).  However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id.  Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers.  It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash.h: move some oid-related declarations from cache.h</title>
<updated>2023-02-24T01:25:28Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-02-24T00:09:25Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=41227cb138c91fbd369ac6ee4877f253b39260cc'/>
<id>urn:sha1:41227cb138c91fbd369ac6ee4877f253b39260cc</id>
<content type='text'>
These defines and enum are all oid-related and as such seem to make
more sense being included in hash.h.  Further, moving them there
allows us to remove some includes of cache.h in other files.

The change to line-log.h might look unrelated, but line-log.h includes
diffcore.h, which previously included cache.h, which included the
kitchen sink.  Since this patch makes diffcore.h no longer include
cache.h, the compiler complains about the 'struct string_list *'
function parameter.  Add a forward declaration for struct string_list to
address this.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>line-log: free diff queue when processing non-merge commits</title>
<updated>2022-11-03T00:16:34Z</updated>
<author>
<name>SZEDER Gábor</name>
<email>szeder.dev@gmail.com</email>
</author>
<published>2022-11-02T22:01:40Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=04ae00062d2ec301828e8df9931817be4b2f8653'/>
<id>urn:sha1:04ae00062d2ec301828e8df9931817be4b2f8653</id>
<content type='text'>
When processing a non-merge commit, the line-level log first asks the
tree-diff machinery whether any of the files in the given line ranges
were modified between the current commit and its parent, and if some
of them were, then it loads the contents of those files from both
commits to see whether their line ranges were modified and/or need to
be adjusted.  Alas, it doesn't free() the diff queue holding the
results of that query and the contents of those files once its done.
This can add up to a substantial amount of leaked memory, especially
when the file in question is big and is frequently modified: a user
reported "Out of memory, malloc failed" errors with a 2MB text file
that was modified ~2800 times [1] (I estimate the leak would use up
almost 11GB memory in that case).

Free that diff queue to plug this memory leak.  However, instead of
simply open-coding the necessary three lines, add them as a helper
function to the diff API, because it will be useful elsewhere as well.

[1] https://public-inbox.org/git/CAFOPqVXz2XwzX8vGU7wLuqb2ZuwTuOFAzBLRM_QPk+NJa=eC-g@mail.gmail.com/

Signed-off-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
</content>
</entry>
<entry>
<title>merge-ort: store filepairs and filespecs in our mem_pool</title>
<updated>2021-07-30T16:01:19Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-07-30T11:47:42Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=f239fff4c183b231134115bc3b38b4f8e3982d3e'/>
<id>urn:sha1:f239fff4c183b231134115bc3b38b4f8e3982d3e</id>
<content type='text'>
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.1 ms ±  2.6 ms     198.5 ms ±  3.4 ms
    mega-renames:     715.8 ms ±  4.0 ms     679.1 ms ±  5.6 ms
    just-one-mega:    276.8 ms ±  4.2 ms     271.9 ms ±  2.8 ms

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc</title>
<updated>2021-07-30T16:01:19Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2021-07-30T11:47:41Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=a8791ef6492bf702ba70352009cbd28fb581c6e2'/>
<id>urn:sha1:a8791ef6492bf702ba70352009cbd28fb581c6e2</id>
<content type='text'>
We want to be able to allocate filespecs and filepairs using a mem_pool.
However, filespec data will still remain outside the pool (perhaps in
the future we could plumb the pool through the various diff APIs to
allocate the filespec data too, but for now we are limiting the scope).
Add some extra functions to allocate these appropriately based on the
non-NULL-ness of opt-&gt;priv-&gt;pool, as well as some extra functions to
handle correctly deallocating the relevant parts of them.  A future
commit will make use of these new functions.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/ort-perf-batch-11'</title>
<updated>2021-06-14T04:33:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-06-14T04:33:26Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=169914ede2aa205c5500c7be0501889a8962dc24'/>
<id>urn:sha1:169914ede2aa205c5500c7be0501889a8962dc24</id>
<content type='text'>
Optimize out repeated rename detection in a sequence of mergy
operations.

* en/ort-perf-batch-11:
  merge-ort, diffcore-rename: employ cached renames when possible
  merge-ort: handle interactions of caching and rename/rename(1to1) cases
  merge-ort: add helper functions for using cached renames
  merge-ort: preserve cached renames for the appropriate side
  merge-ort: avoid accidental API mis-use
  merge-ort: add code to check for whether cached renames can be reused
  merge-ort: populate caches of rename detection results
  merge-ort: add data structures for in-memory caching of rename detection
  t6429: testcases for remembering renames
  fast-rebase: write conflict state to working tree, index, and HEAD
  fast-rebase: change assert() to BUG()
  Documentation/technical: describe remembering renames optimization
  t6423: rename file within directory that other side renamed
</content>
</entry>
</feed>
