<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/commit-reach.c, branch gitk-resize-error</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=gitk-resize-error</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=gitk-resize-error'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2021-03-14T00:00:09Z</updated>
<entry>
<title>use CALLOC_ARRAY</title>
<updated>2021-03-14T00:00:09Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2021-03-13T16:17:22Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=ca56dadb4b65ccaeab809d80db80a312dc00941a'/>
<id>urn:sha1:ca56dadb4b65ccaeab809d80db80a312dc00941a</id>
<content type='text'>
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead.  It shortens the code and infers the
element size automatically.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: stale commits may prune generation further</title>
<updated>2021-02-22T21:34:34Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-19T12:34:10Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=41f3c9949fc384f6a122e7d23bc36b7d9bb96ce2'/>
<id>urn:sha1:41f3c9949fc384f6a122e7d23bc36b7d9bb96ce2</id>
<content type='text'>
The remove_redundant_with_gen() algorithm performs a depth-first-search
to find commits in the 'array' list, starting at the parents of each
commit in 'array'. The result is that commits in 'array' are marked
STALE when they are reachable from another commit in 'array'.

This depth-first-search is fast when commits lie on or near the
first-parent history of the higher commits. The search terminates early
if all but one commit becomes marked STALE.

However, it is possible that there are two independent commits with high
generation number. In that case, the depth-first-search might languish
by searching in lower generations due to the fixed min_generation used
throughout the method.

With the expectation that commits with lower generation are expected to
become STALE more often, we can optimize further by increasing that
min_generation boundary upon discovery of the commit with minimum
generation.

We must first sort the commits in 'array' by generation. We cannot sort
'array' itself since it must preserve relative order among the returned
results (see revision.c:mark_redundant_parents() for an example).

This simplifies the initialization of min_generation, but it also allows
us to increase the new min_generation when we find the commit with
smallest generation remaining.

This requires more than two commits in order to test, so I used the
Linux kernel repository with a few commits that are slightly off of the
first-parent history. I timed the following command:

  git merge-base --independent 2ecedd756908 d2360a398f0b \
	1253935ad801 160bab43419e 0e2209629fec 1d0e16ac1a9e

The first two commits have similar generation and are near the v5.10
tag. Commit 160bab43419e is off of the first-parent history behind v5.5,
while the others are scattered somewhere reachable from v5.9. This is
designed to demonstrate the optimization, as that commit within v5.5
would normally cause a lot of extra commit walking.

Since remove_redundant_with_alg() is called only when at least one of
the input commits has a finite generation number, this algorithm is
tested with a commit-graph generated starting at a number of different
tags, the earliest being v5.5.

commit-graph at v5.5:

 | Method                | Time  |
 |-----------------------+-------|
 | *_no_gen()            | 864ms |
 | *_with_gen() (before) | 858ms |
 | *_with_gen() (after)  | 810ms |

commit-graph at v5.7:

 | Method                | Time  |
 |-----------------------+-------|
 | *_no_gen()            | 625ms |
 | *_with_gen() (before) | 572ms |
 | *_with_gen() (after)  | 517ms |

commit-graph at v5.9:

 | Method                | Time  |
 |-----------------------+-------|
 | *_no_gen()            | 268ms |
 | *_with_gen() (before) | 224ms |
 | *_with_gen() (after)  | 202ms |

commit-graph at v5.10:

 | Method                | Time  |
 |-----------------------+-------|
 | *_no_gen()            |  72ms |
 | *_with_gen() (before) |  37ms |
 | *_with_gen() (after)  |   9ms |

Note that these are only modest improvements for the case where the two
independent commits are not in the commit-graph (not until v5.10). All
algorithms get faster as more commits are indexed, which is not a
surprise. However, the cost of walking extra commits is more and more
prevalent in relative terms as more commits are indexed. Finally, the
last case allows us to jump to the minimum generation between the last
two commits (that are actually independent) so we greatly reduce the
cost in that case.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: use heuristic in remove_redundant()</title>
<updated>2021-02-22T21:34:34Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-19T12:34:09Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=36777733713afeee31e7cf2dbb6d6a3dac186a51'/>
<id>urn:sha1:36777733713afeee31e7cf2dbb6d6a3dac186a51</id>
<content type='text'>
Reachability algorithms in commit-reach.c frequently benefit from using
the first-parent history as a heuristic for satisfying reachability
queries. The most obvious example was implemented in 4fbcca4e
(commit-reach: make can_all_from_reach... linear, 2018-07-20).

Update the walk in remove_redundant() to use this same heuristic. Here,
we are walking starting at the parents of the input commits. Sort those
parents and walk from the highest generation to lower. Each time, use
the heuristic of searching the first parent history before continuing to
expand the walk.

The order in which we explore the commits matters, so update
compare_commits_by_gen to break generation number ties with commit date.
This has no effect when the commits are in a commit-graph file with
corrected commit dates computed, but it will assist when the commits are
in the region "above" the commit-graph with "infinite" generation
number. Note that we cannot shift to use
compare_commits_by_gen_then_commit_date as the method prototype is
different. We use compare_commits_by_gen for QSORT() as opposed to as a
priority function.

The important piece is to ensure we short-circuit the walk when we find
that there is a single non-redundant commit. This happens frequently
when looking for merge-bases or comparing several tags with 'git
merge-base --independent'. Use a new count 'count_still_independent' and
if that hits 1 we can stop walking.

To update 'count_still_independent' properly, we add use of the RESULT
flag on the input commits. Then we can detect when we reach one of these
commits and decrease the count. We need to remove the RESULT flag at
that moment because we might re-visit that commit when popping the
stack.

We use the STALE flag to mark parents that have been added to the new
walk_start list, but we need to clear that flag before we start walking
so those flags don't halt our depth-first-search walk.

On my copy of the Linux kernel repository, the performance of 'git
merge-base --independent &lt;all-tags&gt;' goes from 1.1 seconds to 0.11
seconds.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: move compare_commits_by_gen</title>
<updated>2021-02-22T21:34:34Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-19T12:34:08Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=c8d693e1e6d3bd883224559cf4c1b07d9d58e0cf'/>
<id>urn:sha1:c8d693e1e6d3bd883224559cf4c1b07d9d58e0cf</id>
<content type='text'>
Move this earlier in the file so it can be used by more methods.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: use one walk in remove_redundant()</title>
<updated>2021-02-22T21:34:34Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-19T12:34:07Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=fbc21e3fbba982c50a3c2a273038770a249ed510'/>
<id>urn:sha1:fbc21e3fbba982c50a3c2a273038770a249ed510</id>
<content type='text'>
The current implementation of remove_redundant() uses several calls to
paint_down_to_common() to determine that commits are independent of each
other. This leads to quadratic behavior when many inputs are passed to
commands such as 'git merge-base'.

For example, in the Linux kernel repository, I tested the performance
by passing all tags:

 git merge-base --independent $(git for-each-ref refs/tags --format="$(refname)")

(Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do
not point to commits.)

Here is the performance improvement introduced by this change:

 Before: 16.4s
  After:  1.1s

This performance improvement requires the commit-graph file to be
present. We keep the old algorithm around as remove_redundant_no_gen()
and use it when generation_numbers_enabled() is false. This is similar
to other algorithms within commit-reach.c. The new algorithm is
implemented in remove_redundant_with_gen().

The basic approach is to do one commit walk instead of many. First, scan
all commits in the list and mark their _parents_ with the STALE flag.
This flag will indicate commits that are reachable from one of the
inputs, except not including themselves. Then, walk commits until
covering all commits up to the minimum generation number pushing the
STALE flag throughout.

At the end, we need to clear the STALE bit from all of the commits
we walked. We move the non-stale commits in 'array' to the beginning of
the list, and this might overwrite stale commits. However, we store an
array of commits that started the walk, and use clear_commit_marks() on
each of those starting commits. That method will walk the reachable
commits with the STALE bit and clear them all. This makes the algorithm
safe for re-entry or for other uses of those commits after this walk.

This logic is covered by tests in t6600-test-reach.sh, so the behavior
does not change. This is tested both in the case with a commit-graph and
without.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: reduce requirements for remove_redundant()</title>
<updated>2021-02-01T19:50:33Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-01T12:47:23Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=0fac15652322e607ef3cb9f59e46ca2b168e933a'/>
<id>urn:sha1:0fac15652322e607ef3cb9f59e46ca2b168e933a</id>
<content type='text'>
Remove a comment at the beggining of remove_redundant() that mentions a
reordering of the input array to have the initial segment be the
independent commits and the final segment be the redundant commits.
While this behavior is followed in remove_redundant(), no callers rely
on that behavior.

Remove the final loop that copies this final segment and update the
comment to match the new behavior.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: use corrected commit dates in paint_down_to_common()</title>
<updated>2021-01-19T00:21:18Z</updated>
<author>
<name>Abhishek Kumar</name>
<email>abhishekkumar8222@gmail.com</email>
</author>
<published>2021-01-16T18:11:17Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=8d00d7c3df8b7947c9154873116b5153c1a84dbf'/>
<id>urn:sha1:8d00d7c3df8b7947c9154873116b5153c1a84dbf</id>
<content type='text'>
091f4cf (commit: don't use generation numbers if not needed,
2018-08-30) changed paint_down_to_common() to use commit dates instead
of generation numbers v1 (topological levels) as the performance
regressed on certain topologies. With generation number v2 (corrected
commit dates) implemented, we no longer have to rely on commit dates and
can use generation numbers.

For example, the command `git merge-base v4.8 v4.9` on the Linux
repository walks 167468 commits, taking 0.135s for committer date and
167496 commits, taking 0.157s for corrected committer date respectively.

While using corrected commit dates, Git walks nearly the same number of
commits as commit date, the process is slower as for each comparision we
have to access a commit-slab (for corrected committer date) instead of
accessing struct member (for committer date).

This change incidentally broke the fragile t6404-recursive-merge test.
t6404-recursive-merge sets up a unique repository where all commits have
the same committer date without a well-defined merge-base.

While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer
date as a heuristic in paint_down_to_common(). 6404.1 'combined merge
conflicts' merges commits in the order:
- Merge C with B to form an intermediate commit.
- Merge the intermediate commit with A.

With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently
use the corrected committer date, which changes the order in which
commits are merged:
- Merge A with B to form an intermediate commit.
- Merge the intermediate commit with C.

While resulting repositories are equivalent, 6404.4 'virtual trees were
processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting
different merge-bases and thus have different object ids for the
intermediate commits.

As this has already causes problems (as noted in 859fdc0 (commit-graph:
define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph
within t6404-recursive-merge.

Signed-off-by: Abhishek Kumar &lt;abhishekkumar8222@gmail.com&gt;
Reviewed-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Reviewed-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: return 64-bit generation number</title>
<updated>2021-01-19T00:21:18Z</updated>
<author>
<name>Abhishek Kumar</name>
<email>abhishekkumar8222@gmail.com</email>
</author>
<published>2021-01-16T18:11:13Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=d7f92784c65b143c5ec9f435484146b6098c2f85'/>
<id>urn:sha1:d7f92784c65b143c5ec9f435484146b6098c2f85</id>
<content type='text'>
In a preparatory step for introducing corrected commit dates, let's
return timestamp_t values from commit_graph_generation(), use
timestamp_t for local variables and define GENERATION_NUMBER_INFINITY
as (2 ^ 63 - 1) instead.

We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to
represent the largest topological level we can store in the commit data
chunk.

With corrected commit dates implemented, we will have two such *_MAX
variables to denote the largest offset and largest topological level
that can be stored.

Signed-off-by: Abhishek Kumar &lt;abhishekkumar8222@gmail.com&gt;
Reviewed-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Reviewed-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: fix in_merge_bases_many bug</title>
<updated>2020-10-02T17:26:31Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-10-02T14:58:56Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=8791bf18414a37205127e184c04cad53a43aeff1'/>
<id>urn:sha1:8791bf18414a37205127e184c04cad53a43aeff1</id>
<content type='text'>
Way back in f9b8908b (commit.c: use generation numbers for
in_merge_bases(), 2018-05-01), a heuristic was used to short-circuit
the in_merge_bases() walk. This works just fine as long as the
caller is checking only two commits, but when there are multiple,
there is a possibility that this heuristic is _very wrong_.

Some code moves since then has changed this method to
repo_in_merge_bases_many() inside commit-reach.c. The heuristic
computes the minimum generation number of the "reference" list, then
compares this number to the generation number of the "commit".

In a recent topic, a test was added that used in_merge_bases_many()
to test if a commit was reachable from a number of commits pulled
from a reflog. However, this highlighted the problem: if any of the
reference commits have a smaller generation number than the given
commit, then the walk is skipped _even if there exist some with
higher generation number_.

This heuristic is wrong! It must check the MAXIMUM generation number
of the reference commits, not the MINIMUM.

This highlights a testing gap. t6600-test-reach.sh covers many
methods in commit-reach.c, including in_merge_bases() and
get_merge_bases_many(), but since these methods either restrict to
two input commits or actually look for the full list of merge bases,
they don't check this heuristic!

Add a possible input to "test-tool reach" that tests
in_merge_bases_many() and add tests to t6600-test-reach.sh that
cover this heuristic. This includes cases for the reference commits
having generation above and below the generation of the input commit,
but also having maximum generation below the generation of the input
commit.

The fix itself is to swap min_generation with a max_generation in
repo_in_merge_bases_many().

Reported-by: Srinidhi Kaushik &lt;shrinidhi.kaushik@gmail.com&gt;
Helped-by: Johannes Schindelin &lt;johannes.schindelin@gmx.de&gt;
Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'cb/is-descendant-of'</title>
<updated>2020-07-07T05:09:16Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-07-07T05:09:15Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=0258ed1e08e7973f1d6829db8d33109851067a91'/>
<id>urn:sha1:0258ed1e08e7973f1d6829db8d33109851067a91</id>
<content type='text'>
Code clean-up.

* cb/is-descendant-of:
  commit-reach: avoid is_descendant_of() shim
</content>
</entry>
</feed>
