git/tree-diff.c, branch gitk-resize-error

pathspec: use BUG(...) not die("BUG:%s:%d....", , )

2021-12-07T20:31:17Z

Change code that was added in 8f4f8f4579f (guard against new pathspec magic in pathspec matching code, 2013-07-14) to use the BUG() macro instead of emitting a "fatal" message with the "__FILE__"-name and "__LINE__"-numbers. The original code predated the existence of the BUG() function, which was added in d8193743e08 (usage.c: add BUG() function, 2017-05-12). Signed-off-by: Junio C Hamano Signed-off-by: Ævar Arnfjörð Bjarmason Signed-off-by: Junio C Hamano

tree-diff: fix leak when not HAVE_ALLOCA_H

2021-09-16T20:43:42Z

b8ba412bf7 (tree-diff: avoid alloca for large allocations, 2016-06-07) adds a way to route some bigger allocations out of the stack and free them through the addition of two conveniently named macros, but leaves the calls to free the xalloca part, which could be also in the heap, if the system doesn't HAVE_ALLOCA_H (ex: macOS and other BSD). Add the missing free call, xalloca_free(), which is a noop if we allocated memory in the stack frame, but a real free() if we allocated in the heap instead, and while at it, change the expression to match in both macros for ease of readability. This avoids a leak reported by LSAN while running t0000 but that wouldn't fail the test (which is fixed in the next patch): SUMMARY: LeakSanitizer: 1034 byte(s) leaked in 15 allocation(s). Signed-off-by: Carlo Marcelo Arenas Belón Signed-off-by: Junio C Hamano

hash: provide per-algorithm null OIDs

2021-04-27T07:31:39Z

Up until recently, object IDs did not have an algorithm member, only a hash. Consequently, it was possible to share one null (all-zeros) object ID among all hash algorithms. Now that we're going to be handling objects from multiple hash algorithms, it's important to make sure that all object IDs have a correct algorithm field. Introduce a per-algorithm null OID, and add it to struct hash_algo. Introduce a wrapper function as well, and use it everywhere we used to use the null_oid constant. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano

bloom/diff: properly short-circuit on max_changes

2020-09-17T16:31:25Z

Commit e3696980 (diff: halt tree-diff early after max_changes, 2020-03-30) intended to create a mechanism to short-circuit a diff calculation after a certain number of paths were modified. By incrementing a "num_changes" counter throughout the recursive ll_diff_tree_paths(), this was supposed to match the number of changes that would be written into the changed-path Bloom filters. Unfortunately, this was not implemented correctly and instead misses simple cases like file modifications. This then does not stop very large changed-path filters from being written (unless they add or remove many files). To start, change the implementation in ll_diff_tree_paths() to instead use the global diff_queue_diff struct's 'nr' member as the count. This is a way to simplify the logic instead of making more mistakes in the complicated diff code. This has a drawback: the diff_queue_diff struct only lists the paths corresponding to blob changes, not their leading directories. Thus, get_or_compute_bloom_filter() needs an additional check to see if the hashmap with the leading directories becomes too large. One reason why this was not caught by test cases was that the test in t4216-log-bloom.sh that was supposed to check this "too many changes" condition only checked this on the initial commit of a repository. The old logic counted these values correctly. Update this test in a few ways: 1. Use GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS to reduce the limit, allowing smaller commits to engage with this logic. 2. Create several interesting cases of edits, adds, removes, and mode changes (in the second commit). By testing both sides of the inequality with the *_MAX_CHANGED_PATHS variable, we can see that the count is exactly correct, so none of these changes are missed or over-counted. 3. Use the trace2 data value filter_found_large to verify that these commits are on the correct side of the limit. Another way to verify the behavior is correct is through performance tests. By testing on my local copies of the Git repository and the Linux kernel repository, I could measure the effect of these short-circuits when computing a fresh commit-graph file with changed-path Bloom filters using the command GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS=N time \ git commit-graph write --reachable --changed-paths and reporting the wall time and resulting commit-graph size. For Git, the results are | | N=1 | N=10 | N=512 | |--------|----------------|----------------|----------------| | HEAD~1 | 10.90s 9.18MB | 11.11s 9.34MB | 11.31s 9.35MB | | HEAD | 9.21s 8.62MB | 11.11s 9.29MB | 11.29s 9.34MB | For Linux, the results are | | N=1 | N=20 | N=512 | |--------|----------------|---------------|---------------| | HEAD~1 | 61.28s 64.3MB | 76.9s 72.6MB | 77.6s 72.6MB | | HEAD | 49.44s 56.3MB | 68.7s 65.9MB | 69.2s 65.9MB | Naturally, the improvement becomes much less as the limit grows, as fewer commits satisfy the short-circuit. Reported-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau Signed-off-by: Junio C Hamano

diff.h: drop diff_tree_oid() & friends' return value

2020-06-08T19:28:49Z

ll_diff_tree_oid() has only ever returned 0 [1], so it's return value is basically useless. It's only caller diff_tree_oid() has only ever returned the return value of ll_diff_tree_oid() as-is [2], so its return value is just as useless. Most of diff_tree_oid()'s callers simply ignore its return value, except: - diff_root_tree_oid() is a thin wrapper around diff_tree_oid() and returns with its return value, but all of diff_root_tree_oid()'s callers ignore its return value. - rev_compare_tree() and rev_same_tree_as_empty() do look at the return value in a condition, but, since the return value is always 0, the former's < 0 condition is never fulfilled, while the latter's >= 0 condition is always fulfilled. So let's drop the return value of ll_diff_tree_oid(), diff_tree_oid() and diff_root_tree_oid(), and drop those conditions from rev_compare_tree() and rev_same_tree_as_empty() as well. [1] ll_diff_tree_oid() and its ancestors have been returning only 0 ever since it was introduced as diff_tree() in 9174026cfe (Add "diff-tree" program to show which files have changed between two trees., 2005-04-09). [2] diff_tree_oid() traces back to diff-tree.c:main() in 9174026cfe as well. Signed-off-by: SZEDER Gábor Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano

diff: halt tree-diff early after max_changes

2020-03-30T16:59:53Z

When computing the changed-paths bloom filters for the commit-graph, we limit the size of the filter by restricting the number of paths in the diff. Instead of computing a large diff and then ignoring the result, it is better to halt the diff computation early. Create a new "max_changes" option in struct diff_options. If non-zero, then halt the diff computation after discovering strictly more changed paths. This includes paths corresponding to trees that change. Use this max_changes option in the bloom filter calculations. This reduces the time taken to compute the filters for the Linux kernel repo from 2m50s to 2m35s. On a large internal repository with ~500 commits that perform tree-wide changes, the time reduced from 6m15s to 3m48s. Signed-off-by: Derrick Stolee Signed-off-by: Garima Singh Signed-off-by: Junio C Hamano

tree-walk.c: remove the_repo from fill_tree_descriptor()

2019-06-27T19:45:17Z

While at there, clean up the_repo usage in builtin/merge-tree.c a tiny bit. Signed-off-by: Nguyễn Thái Ngọc Duy Signed-off-by: Junio C Hamano

Use 'unsigned short' for mode, like diff_filespec does

2019-04-08T07:02:07Z

struct diff_filespec defines mode to be an 'unsigned short'. Several other places in the API which we'd like to interact with using a diff_filespec used a plain unsigned (or unsigned int). This caused problems when taking addresses, so switch to unsigned short. Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano

Merge branch 'bc/tree-walk-oid'

2019-01-29T20:47:56Z

The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use

tree-walk: store object_id in a separate member

2019-01-15T17:57:41Z

When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano