<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/tree-diff.c, branch gitk-resize-error</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=gitk-resize-error</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=gitk-resize-error'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2021-12-07T20:31:17Z</updated>
<entry>
<title>pathspec: use BUG(...) not die("BUG:%s:%d....", &lt;file&gt;, &lt;line&gt;)</title>
<updated>2021-12-07T20:31:17Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2021-12-07T11:05:53Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=a78537a0f26615b69b680845cfbb765cc665c808'/>
<id>urn:sha1:a78537a0f26615b69b680845cfbb765cc665c808</id>
<content type='text'>
Change code that was added in 8f4f8f4579f (guard against new pathspec
magic in pathspec matching code, 2013-07-14) to use the BUG() macro
instead of emitting a "fatal" message with the "__FILE__"-name and
"__LINE__"-numbers.

The original code predated the existence of the BUG() function, which
was added in d8193743e08 (usage.c: add BUG() function, 2017-05-12).

Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tree-diff: fix leak when not HAVE_ALLOCA_H</title>
<updated>2021-09-16T20:43:42Z</updated>
<author>
<name>Carlo Marcelo Arenas Belón</name>
<email>carenas@gmail.com</email>
</author>
<published>2021-09-16T08:55:22Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=637799bf0ab72a509e1f2b29ee6ab3367eefbff9'/>
<id>urn:sha1:637799bf0ab72a509e1f2b29ee6ab3367eefbff9</id>
<content type='text'>
b8ba412bf7 (tree-diff: avoid alloca for large allocations, 2016-06-07)
adds a way to route some bigger allocations out of the stack and free
them through the addition of two conveniently named macros, but leaves
the calls to free the xalloca part, which could be also in the heap,
if the system doesn't HAVE_ALLOCA_H (ex: macOS and other BSD).

Add the missing free call, xalloca_free(), which is a noop if we
allocated memory in the stack frame, but a real free() if we
allocated in the heap instead, and while at it, change the expression
to match in both macros for ease of readability.

This avoids a leak reported by LSAN while running t0000 but that
wouldn't fail the test (which is fixed in the next patch):

  SUMMARY: LeakSanitizer: 1034 byte(s) leaked in 15 allocation(s).

Signed-off-by: Carlo Marcelo Arenas Belón &lt;carenas@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash: provide per-algorithm null OIDs</title>
<updated>2021-04-27T07:31:39Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2021-04-26T01:02:56Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=14228447c9ce664a4e9c31ba10344ec5e4ea4ba5'/>
<id>urn:sha1:14228447c9ce664a4e9c31ba10344ec5e4ea4ba5</id>
<content type='text'>
Up until recently, object IDs did not have an algorithm member, only a
hash.  Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms.  Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.

Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bloom/diff: properly short-circuit on max_changes</title>
<updated>2020-09-17T16:31:25Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-09-16T18:07:52Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=b16a8277644e4b1d21c08d97de757105039dc7ae'/>
<id>urn:sha1:b16a8277644e4b1d21c08d97de757105039dc7ae</id>
<content type='text'>
Commit e3696980 (diff: halt tree-diff early after max_changes,
2020-03-30) intended to create a mechanism to short-circuit a diff
calculation after a certain number of paths were modified. By
incrementing a "num_changes" counter throughout the recursive
ll_diff_tree_paths(), this was supposed to match the number of changes
that would be written into the changed-path Bloom filters.
Unfortunately, this was not implemented correctly and instead misses
simple cases like file modifications. This then does not stop very
large changed-path filters from being written (unless they add or remove
many files).

To start, change the implementation in ll_diff_tree_paths() to instead
use the global diff_queue_diff struct's 'nr' member as the count. This
is a way to simplify the logic instead of making more mistakes in the
complicated diff code.

This has a drawback: the diff_queue_diff struct only lists the paths
corresponding to blob changes, not their leading directories. Thus,
get_or_compute_bloom_filter() needs an additional check to see if the
hashmap with the leading directories becomes too large.

One reason why this was not caught by test cases was that the test in
t4216-log-bloom.sh that was supposed to check this "too many changes"
condition only checked this on the initial commit of a repository. The
old logic counted these values correctly. Update this test in a few
ways:

1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit,
   allowing smaller commits to engage with this logic.

2. Create several interesting cases of edits, adds, removes, and mode
   changes (in the second commit). By testing both sides of the
   inequality with the *_MAX_CHANGED_PATHS variable, we can see that
   the count is exactly correct, so none of these changes are missed
   or over-counted.

3. Use the trace2 data value filter_found_large to verify that these
   commits are on the correct side of the limit.

Another way to verify the behavior is correct is through performance
tests. By testing on my local copies of the Git repository and the Linux
kernel repository, I could measure the effect of these short-circuits
when computing a fresh commit-graph file with changed-path Bloom filters
using the command

  GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \
    git commit-graph write --reachable --changed-paths

and reporting the wall time and resulting commit-graph size.

For Git, the results are

|        |      N=1       |       N=10     |      N=512     |
|--------|----------------|----------------|----------------|
| HEAD~1 | 10.90s  9.18MB | 11.11s  9.34MB | 11.31s  9.35MB |
| HEAD   |  9.21s  8.62MB | 11.11s  9.29MB | 11.29s  9.34MB |

For Linux, the results are

|        |       N=1      |     N=20      |     N=512     |
|--------|----------------|---------------|---------------|
| HEAD~1 | 61.28s  64.3MB | 76.9s  72.6MB | 77.6s  72.6MB |
| HEAD   | 49.44s  56.3MB | 68.7s  65.9MB | 69.2s  65.9MB |

Naturally, the improvement becomes much less as the limit grows, as
fewer commits satisfy the short-circuit.

Reported-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff.h: drop diff_tree_oid() &amp; friends' return value</title>
<updated>2020-06-08T19:28:49Z</updated>
<author>
<name>SZEDER Gábor</name>
<email>szeder.dev@gmail.com</email>
</author>
<published>2020-06-05T13:00:27Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=0ee3cb888dc4da900b2e701ef8106205d8963ffb'/>
<id>urn:sha1:0ee3cb888dc4da900b2e701ef8106205d8963ffb</id>
<content type='text'>
ll_diff_tree_oid() has only ever returned 0 [1], so it's return value
is basically useless.  It's only caller diff_tree_oid() has only ever
returned the return value of ll_diff_tree_oid() as-is [2], so its
return value is just as useless.  Most of diff_tree_oid()'s callers
simply ignore its return value, except:

  - diff_root_tree_oid() is a thin wrapper around diff_tree_oid() and
    returns with its return value, but all of diff_root_tree_oid()'s
    callers ignore its return value.

  - rev_compare_tree() and rev_same_tree_as_empty() do look at the
    return value in a condition, but, since the return value is always
    0, the former's &lt; 0 condition is never fulfilled, while the
    latter's &gt;= 0 condition is always fulfilled.

So let's drop the return value of ll_diff_tree_oid(), diff_tree_oid()
and diff_root_tree_oid(), and drop those conditions from
rev_compare_tree() and rev_same_tree_as_empty() as well.

[1] ll_diff_tree_oid() and its ancestors have been returning only 0
    ever since it was introduced as diff_tree() in 9174026cfe (Add
    "diff-tree" program to show which files have changed between two
    trees., 2005-04-09).
[2] diff_tree_oid() traces back to diff-tree.c:main() in 9174026cfe as
    well.

Signed-off-by: SZEDER Gábor &lt;szeder.dev@gmail.com&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>diff: halt tree-diff early after max_changes</title>
<updated>2020-03-30T16:59:53Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-03-30T00:31:27Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=e3696980163bdbd3bc56e5ffc69e8770015f366f'/>
<id>urn:sha1:e3696980163bdbd3bc56e5ffc69e8770015f366f</id>
<content type='text'>
When computing the changed-paths bloom filters for the commit-graph,
we limit the size of the filter by restricting the number of paths
in the diff. Instead of computing a large diff and then ignoring the
result, it is better to halt the diff computation early.

Create a new "max_changes" option in struct diff_options. If non-zero,
then halt the diff computation after discovering strictly more changed
paths. This includes paths corresponding to trees that change.

Use this max_changes option in the bloom filter calculations. This
reduces the time taken to compute the filters for the Linux kernel
repo from 2m50s to 2m35s. On a large internal repository with ~500
commits that perform tree-wide changes, the time reduced from
6m15s to 3m48s.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Garima Singh &lt;garima.singh@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tree-walk.c: remove the_repo from fill_tree_descriptor()</title>
<updated>2019-06-27T19:45:17Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2019-06-27T09:28:48Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=5e575807337d15dbecf88767e709df08304e3b94'/>
<id>urn:sha1:5e575807337d15dbecf88767e709df08304e3b94</id>
<content type='text'>
While at there, clean up the_repo usage in builtin/merge-tree.c a tiny
bit.

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Use 'unsigned short' for mode, like diff_filespec does</title>
<updated>2019-04-08T07:02:07Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2019-04-05T15:00:12Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=5ec1e72823735b5682389589b6bee774ae70fa49'/>
<id>urn:sha1:5ec1e72823735b5682389589b6bee774ae70fa49</id>
<content type='text'>
struct diff_filespec defines mode to be an 'unsigned short'.  Several
other places in the API which we'd like to interact with using a
diff_filespec used a plain unsigned (or unsigned int).  This caused
problems when taking addresses, so switch to unsigned short.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'bc/tree-walk-oid'</title>
<updated>2019-01-29T20:47:56Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-01-29T20:47:56Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=371820d5f1bb3c3e691ad21cee652c02c36ea758'/>
<id>urn:sha1:371820d5f1bb3c3e691ad21cee652c02c36ea758</id>
<content type='text'>
The code to walk tree objects has been taught that we may be
working with object names that are not computed with SHA-1.

* bc/tree-walk-oid:
  cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
  tree-walk: store object_id in a separate member
  match-trees: use hashcpy to splice trees
  match-trees: compute buffer offset correctly when splicing
  tree-walk: copy object ID before use
</content>
</entry>
<entry>
<title>tree-walk: store object_id in a separate member</title>
<updated>2019-01-15T17:57:41Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2019-01-15T00:39:44Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=ea82b2a0857e3e0449bdce4e3987dee6adbc51ae'/>
<id>urn:sha1:ea82b2a0857e3e0449bdce4e3987dee6adbc51ae</id>
<content type='text'>
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.

Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
