<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/commit-graph.h, branch v2.41.2</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=v2.41.2</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=v2.41.2'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2023-04-06T20:38:21Z</updated>
<entry>
<title>Merge branch 'ds/ahead-behind'</title>
<updated>2023-04-06T20:38:21Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2023-04-06T20:38:21Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=7727da99dfab82148c5b77eaf334b305fb835956'/>
<id>urn:sha1:7727da99dfab82148c5b77eaf334b305fb835956</id>
<content type='text'>
"git for-each-ref" learns '%(ahead-behind:&lt;base&gt;)' that computes the
distances from a single reference point in the history to a bunch
of commits in bulk.

* ds/ahead-behind:
  commit-reach: add tips_reachable_from_bases()
  for-each-ref: add ahead-behind format atom
  commit-reach: implement ahead_behind() logic
  commit-graph: introduce `ensure_generations_valid()`
  commit-graph: return generation from memory
  commit-graph: simplify compute_generation_numbers()
  commit-graph: refactor compute_topological_levels()
  for-each-ref: explicitly test no matches
  for-each-ref: add --stdin option
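
As a rough illustration of what an ahead/behind pair means, the sketch
below counts commits reachable from a tip but not from the base (ahead)
and from the base but not from the tip (behind). The toy history and the
function names are hypothetical, and the naive set difference is not the
bulk walk that commit-reach implements.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy history: commit name -> list of parent names.
PARENTS: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["B"],  # plays the role of the base
    "D": ["B"],
    "E": ["D"],  # a topic tip that forked from B
}


def reachable(tip: str) -> Set[str]:
    """Return every commit reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(PARENTS[commit])
    return seen


def ahead_behind(tip: str, base: str) -> Tuple[int, int]:
    """Count commits only in tip's history (ahead) and only in base's (behind)."""
    from_tip = reachable(tip)
    from_base = reachable(base)
    return len(from_tip - from_base), len(from_base - from_tip)


if __name__ == "__main__":
    for tip in ("C", "E", "A"):
        print(tip, ahead_behind(tip, "C"))
```

Computing a full reachable set per tip is what makes this sketch
expensive on large histories; it is only meant to pin down the
definition of the two counts.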
</content>
</entry>
<entry>
<title>commit-graph: introduce `ensure_generations_valid()`</title>
<updated>2023-03-20T19:17:33Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2023-03-20T11:26:52Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=c08645b353514fe14dbd62cf52afd49d0e88146b'/>
<id>urn:sha1:c08645b353514fe14dbd62cf52afd49d0e88146b</id>
<content type='text'>
Use the just-introduced compute_reachable_generation_numbers_1() to
implement a function which dynamically computes topological levels (or
corrected commit dates) for out-of-graph commits.

This will be useful for the ahead-behind algorithm we are about to
introduce, which needs accurate topological levels on _all_ commits
reachable from the tips in order to avoid over-counting.

Co-authored-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: remove unnecessary git-compat-util.h includes in headers</title>
<updated>2023-02-24T01:25:28Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-02-24T00:09:21Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=f332121e75d3aa2b0ce7efd120ac3ede19e9a733'/>
<id>urn:sha1:f332121e75d3aa2b0ce7efd120ac3ede19e9a733</id>
<content type='text'>
For sanity, we should probably do one of the following:

(a) make C and header files both depend upon everything they need
(b) consistently exclude git-compat-util.h from headers and require it
    be the first include in C files

Currently, we have some of the headers following (a) and others
following (b), which makes things messy.  In the past I was pushed
towards (b), as per [1] and [2].  Further, during this series I
discovered that this mixture empirically will mean that we end up with C
files that do not directly include git-compat-util.h, and do include
headers that don't include git-compat-util.h, with the result that we
likely have headers included before an indirect inclusion of
git-compat-util.h.  Since git-compat-util.h has tricky platform-specific
stuff that is meant to be included before everything else, this state of
affairs is risky and may lead to things breaking in subtle ways (and
only on some platforms) as per [1] and [2].

Since including git-compat-util.h in existing header files makes it
harder for us to catch C files that are missing that include, let's
switch to (b) to make the enforcement of this rule easier.  Remove the
inclusion of git-compat-util.h from header files other than the ones
that have been approved as alternate first includes.

[1] https://lore.kernel.org/git/20180811173406.GA9119@sigill.intra.peff.net/
[2] https://lore.kernel.org/git/20180811174301.GA9287@sigill.intra.peff.net/

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/commit-graph-genv2-upgrade-fix'</title>
<updated>2022-08-03T20:36:08Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-08-03T20:36:08Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=37e4bdd5ee5d6a7e09feaf5857299aac8fd56aeb'/>
<id>urn:sha1:37e4bdd5ee5d6a7e09feaf5857299aac8fd56aeb</id>
<content type='text'>
There was a bug in the codepath to upgrade generation information
in commit-graph from v1 to v2 format, which has been corrected.

* tb/commit-graph-genv2-upgrade-fix:
  commit-graph: fix corrupt upgrade from generation v1 to v2
  commit-graph: introduce `repo_find_commit_pos_in_graph()`
  t5318: demonstrate commit-graph generation v2 corruption
</content>
</entry>
<entry>
<title>commit-graph: introduce `repo_find_commit_pos_in_graph()`</title>
<updated>2022-07-15T23:51:39Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2022-07-12T23:10:31Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=7805360b7a3be02057385bc9d17aa493120b9538'/>
<id>urn:sha1:7805360b7a3be02057385bc9d17aa493120b9538</id>
<content type='text'>
Low-level callers in systems that are adjacent to the commit-graph (like
the changed-path Bloom filter code) could benefit from being able to
call a function like `parse_commit_in_graph()` without modifying the
corresponding commit slab data.

This is useful in contexts where that slab data is being used to prepare
for an upcoming commit-graph write, where Git must be careful to avoid
clobbering any of that data during a read operation.

Introduce a low-level variant of `parse_commit_in_graph()` which returns
the graph position of a given commit only, without modifying any of the
slab data.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: pass repo_settings instead of repository</title>
<updated>2022-07-14T22:42:17Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2022-07-14T21:43:06Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=a92d8523cef66d46e24cd5ef2f01ef97dc4ab239'/>
<id>urn:sha1:a92d8523cef66d46e24cd5ef2f01ef97dc4ab239</id>
<content type='text'>
The parse_commit_graph() function takes a 'struct repository *' pointer,
but it only ever accesses config settings (either directly or through
the .settings field of the repo struct). Move all relevant config
settings into the repo_settings struct, and update parse_commit_graph()
and its existing callers so that it takes 'struct repo_settings *'
instead.

Callers of parse_commit_graph() will now need to call
prepare_repo_settings() themselves, or initialize a 'struct
repo_settings' directly.

Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more
places, 2020-09-09), parsing a commit-graph was a pure function
depending only on the contents of the commit-graph itself. Commit
ab14d0676c introduced a dependency on a `struct repository` pointer, and
later commits such as b66d84756f (commit-graph: respect
'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config
settings, which were accessed through the `settings` field of the
repository pointer. This field was initialized via a call to
`prepare_repo_settings()`.

Additionally, this fixes an issue in fuzz-commit-graph: In 44c7e62
(2021-12-06, repo-settings:prepare_repo_settings only in git repos),
prepare_repo_settings was changed to issue a BUG() if it is called by a
process whose CWD is not a Git repository.

The combination of commits mentioned above broke fuzz-commit-graph,
which attempts to parse arbitrary fuzzing-engine-provided bytes as a
commit graph file. Prior to this change, parse_commit_graph() called
prepare_repo_settings(), but since we run the fuzz tests without a valid
repository, we are hitting the BUG() from 44c7e62 for every test case.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Josh Steadmon &lt;steadmon@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: fix memory leak in misused string_list API</title>
<updated>2022-03-04T21:24:18Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-03-04T18:32:12Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=4a0479086a9bea5d31c4588b07bd45ae92a12b71'/>
<id>urn:sha1:4a0479086a9bea5d31c4588b07bd45ae92a12b71</id>
<content type='text'>
When this code was migrated to the string_list API in
d88b14b3fd6 (commit-graph: use string-list API for input, 2018-06-27)
it was made to use both STRING_LIST_INIT_NODUP and a
strbuf_detach() pattern.

Those should not be used together if string_list_clear() is expected
to free the memory. Instead, we need to either use STRING_LIST_INIT_DUP
with a string_list_append_nodup(), or a STRING_LIST_INIT_NODUP and
manually fiddle with the "strdup_strings" member before calling
string_list_clear(). Let's do the former.

Since "strdup_strings = 1" is now set, other code might be broken by
relying on "pack_indexes" not to duplicate its strings, but that
doesn't happen. When we pass this down to write_commit_graph() that
code uses the "struct string_list" without modifying it. Let's add a
"const" to the variable to have the compiler enforce that assumption.

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>revision: avoid hitting packfiles when commits are in commit-graph</title>
<updated>2021-08-09T16:51:12Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-08-09T08:12:03Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=f559d6d45e7e58ae1f922213948723de77ea77bd'/>
<id>urn:sha1:f559d6d45e7e58ae1f922213948723de77ea77bd</id>
<content type='text'>
When queueing references in git-rev-list(1), we try to optimize parsing
of commits via the commit-graph. To do so, we first look up the object's
type, and if it is a commit we call `repo_parse_commit()` instead of
`parse_object()`. This is quite inefficient though given that we're
always uncompressing the object header in order to determine the type.
Instead, we can opportunistically search the commit-graph for the object
ID: in case it's found, we know it's a commit and can directly fill in
the commit object without having to uncompress the object header.
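
The opportunistic lookup described above can be sketched roughly as
follows. Everything here is a hypothetical stand-in (two dictionaries
for the commit-graph and the object store, and made-up function names),
not the C code touched by this change; it only shows the order of the
two lookups.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical commit-graph: only commits are indexed here.
COMMIT_GRAPH: Dict[str, dict] = {
    "c0ffee": {"parents": []},
}

# Hypothetical object store: object ID -> (type, parsed payload).
OBJECT_STORE: Dict[str, Tuple[str, Optional[dict]]] = {
    "c0ffee": ("commit", {"parents": []}),
    "facade": ("commit", {"parents": ["c0ffee"]}),
    "b10b": ("blob", None),
}

# Records every slow-path type lookup so the saving is observable.
HEADER_READS: List[str] = []


def read_object_type(oid: str) -> str:
    """Slow path: stands in for uncompressing the object header."""
    HEADER_READS.append(oid)
    return OBJECT_STORE[oid][0]


def load_reference(oid: str) -> Optional[dict]:
    """Return commit data for oid, or None if oid is not a commit."""
    commit = COMMIT_GRAPH.get(oid)
    if commit is not None:
        # Found in the graph: it must be a commit, no header read needed.
        return commit
    if read_object_type(oid) == "commit":
        return OBJECT_STORE[oid][1]
    return None


if __name__ == "__main__":
    for oid in ("c0ffee", "facade", "b10b"):
        print(oid, load_reference(oid))
    print("header reads:", HEADER_READS)
```

Only object IDs that miss the graph pay for the type lookup, which is
why the gain grows with the share of references that point to commits
already in the commit-graph.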

Expose a new function `lookup_commit_in_graph()`, which tries to find a
commit in the commit-graph by ID, and convert `get_reference()` to use
this function. This provides a big performance win in cases where we
load references in a repository with lots of references pointing to
commits. The following has been executed in a real-world repository with
about 2.2 million refs:

    Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.458 s ±  0.044 s    [User: 4.115 s, System: 0.342 s]
      Range (min … max):    4.409 s …  4.534 s    10 runs

    Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      3.089 s ±  0.015 s    [User: 2.768 s, System: 0.321 s]
      Range (min … max):    3.061 s …  3.105 s    10 runs

    Summary
      'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
        1.44 ± 0.02 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-graph: use config to specify generation type</title>
<updated>2021-02-25T23:10:41Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2021-02-25T18:19:43Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=702110aac63556b4572d9c7b65c9123ec8038ebf'/>
<id>urn:sha1:702110aac63556b4572d9c7b65c9123ec8038ebf</id>
<content type='text'>
We have two established generation number versions:

 1: topological levels
 2: corrected commit dates

The corrected commit dates are enabled by default, but they also write
extra data in the GDAT and GDOV chunks. Services that host Git data
might want to have more control over when this feature rolls out than
just updating the Git binaries.

Add a new "commitGraph.generationVersion" config option that specifies
the intended generation number version. If this value is less than 2,
then the GDAT chunk is never written _or read_ from an existing file.

This can replace our use of the GIT_TEST_COMMIT_GRAPH_NO_GDAT
environment variable in the test suite. Remove it.
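
For reference, the two generation number versions listed above can be
sketched on a toy history as shown below. The commit names and dates
are made up, and the sketch assumes commits are listed parents-first;
it illustrates the definitions and is not the commit-graph writer.

```python
from typing import Dict, List, Tuple

# Hypothetical history: commit -> (committer date, parents), parents-first.
COMMITS: Dict[str, Tuple[int, List[str]]] = {
    "A": (100, []),
    "B": (200, ["A"]),
    "C": (150, ["B"]),  # clock skew: dated earlier than its parent
    "D": (300, ["B", "C"]),
}


def topological_levels() -> Dict[str, int]:
    """Version 1: one more than the largest parent level; roots get 1."""
    levels: Dict[str, int] = {}
    for name, (_date, parents) in COMMITS.items():
        levels[name] = 1 + max((levels[p] for p in parents), default=0)
    return levels


def corrected_commit_dates() -> Dict[str, int]:
    """Version 2: the commit date, raised to exceed every parent's value."""
    corrected: Dict[str, int] = {}
    for name, (date, parents) in COMMITS.items():
        floor = 1 + max((corrected[p] for p in parents), default=-1)
        corrected[name] = max(date, floor)
    return corrected


if __name__ == "__main__":
    print(topological_levels())
    print(corrected_commit_dates())
```

In this toy history the skewed commit C is the only one whose corrected
date differs from its committer date, so both versions guarantee that a
commit's value is strictly larger than that of each of its parents.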

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit-reach: use corrected commit dates in paint_down_to_common()</title>
<updated>2021-01-19T00:21:18Z</updated>
<author>
<name>Abhishek Kumar</name>
<email>abhishekkumar8222@gmail.com</email>
</author>
<published>2021-01-16T18:11:17Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=8d00d7c3df8b7947c9154873116b5153c1a84dbf'/>
<id>urn:sha1:8d00d7c3df8b7947c9154873116b5153c1a84dbf</id>
<content type='text'>
091f4cf (commit: don't use generation numbers if not needed,
2018-08-30) changed paint_down_to_common() to use commit dates instead
of generation numbers v1 (topological levels) as the performance
regressed on certain topologies. With generation number v2 (corrected
commit dates) implemented, we no longer have to rely on commit dates and
can use generation numbers.

For example, the command `git merge-base v4.8 v4.9` on the Linux
repository walks 167468 commits, taking 0.135s for committer date and
167496 commits, taking 0.157s for corrected committer date respectively.

Although Git walks nearly the same number of commits with corrected
commit dates as with commit dates, the process is slower because each
comparison has to access a commit-slab (for the corrected committer
date) instead of a struct member (for the committer date).

This change incidentally broke the fragile t6404-recursive-merge test.
t6404-recursive-merge sets up a unique repository where all commits have
the same committer date without a well-defined merge-base.

While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer
date as a heuristic in paint_down_to_common(). 6404.1 'combined merge
conflicts' merges commits in the order:
- Merge C with B to form an intermediate commit.
- Merge the intermediate commit with A.

With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently
use the corrected committer date, which changes the order in which
commits are merged:
- Merge A with B to form an intermediate commit.
- Merge the intermediate commit with C.

While resulting repositories are equivalent, 6404.4 'virtual trees were
processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting
different merge-bases and thus have different object ids for the
intermediate commits.

As this has already caused problems (as noted in 859fdc0 (commit-graph:
define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph
within t6404-recursive-merge.

Signed-off-by: Abhishek Kumar &lt;abhishekkumar8222@gmail.com&gt;
Reviewed-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Reviewed-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
