<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/list-objects.c, branch gitk-resize-error</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=gitk-resize-error</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=gitk-resize-error'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2021-08-12T20:08:30Z</updated>
<entry>
<title>list-objects.c: rename "traverse_trees_and_blobs" to "traverse_non_commits"</title>
<updated>2021-08-12T20:08:30Z</updated>
<author>
<name>Teng Long</name>
<email>dyroneteng@gmail.com</email>
</author>
<published>2021-08-12T08:59:31Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=b3e36df0d4daa7ede26e0c7252564f43662778d6'/>
<id>urn:sha1:b3e36df0d4daa7ede26e0c7252564f43662778d6</id>
<content type='text'>
Function `traverse_trees_and_blobs` not only works on trees and blobs,
but also on tags, the function name is somewhat misleading. This commit
rename it to `traverse_non_commits`.

Signed-off-by: Teng Long &lt;dyroneteng@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>bitmaps: don't recurse into trees already in the bitmap</title>
<updated>2021-06-15T02:13:11Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-06-14T12:05:44Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=aa9ad6fee54898b9965f4fd26b3035fdd7b20f37'/>
<id>urn:sha1:aa9ad6fee54898b9965f4fd26b3035fdd7b20f37</id>
<content type='text'>
If an object is already mentioned in a reachability bitmap we are
building, then by definition so are all of the objects it can reach. We
have an optimization to stop traversing commits when we see they are
already in the bitmap, but we don't do the same for trees.

It's generally unavoidable to recurse into trees for commits not yet
covered by bitmaps (since most commits generally do have unique
top-level trees). But they usually have subtrees that are shared with
other commits (i.e., all of the subtrees the commit _didn't_ touch). And
some of those commits (and their trees) may be covered by the bitmap.

Usually this isn't _too_ big a deal, because we'll visit those subtrees
only once in total for the whole walk. But if you have a large number of
unbitmapped commits, and if your tree is big, then you may end up
opening a lot of sub-trees for no good reason.

We can use the same optimization we do for commits here: when we are
about to open a tree, see if it's in the bitmap (either the one we are
building, or the "seen" bitmap which covers the UNINTERESTING side of
the bitmap when doing a set-difference).

This works especially well because we'll visit all commits before
hitting any trees. So even in a history like:

  A -- B

if "A" has a bitmap on disk but "B" doesn't, we'll already have OR-ed in
the results from A before looking at B's tree (so we really will only
look at trees touched by B).

For most repositories, the timings produced by p5310 are unspectacular.
Here's linux.git:

  Test                         HEAD^             HEAD
  --------------------------------------------------------------------
  5310.4: simulated clone      6.00(5.90+0.10)   5.98(5.90+0.08) -0.3%
  5310.5: simulated fetch      2.98(5.45+0.18)   2.85(5.31+0.18) -4.4%
  5310.7: rev-list (commits)   0.32(0.29+0.03)   0.33(0.30+0.03) +3.1%
  5310.8: rev-list (objects)   1.48(1.44+0.03)   1.49(1.44+0.05) +0.7%

Any improvement there is within the noise (the +3.1% on test 7 has to be
noise, since we are not recursing into trees, and thus the new code
isn't even run). The results for git.git are likewise uninteresting.

But here are numbers from some other real-world repositories (that are
not public). This one's tree is comparable in size to linux.git, but has
~16k refs (and so less complete bitmap coverage):

  Test                         HEAD^               HEAD
  -------------------------------------------------------------------------
  5310.4: simulated clone      38.34(39.86+0.74)   33.95(35.53+0.76) -11.5%
  5310.5: simulated fetch      2.29(6.31+0.35)     2.20(5.97+0.41) -3.9%
  5310.7: rev-list (commits)   0.99(0.86+0.13)     0.96(0.85+0.11) -3.0%
  5310.8: rev-list (objects)   11.32(11.04+0.27)   6.59(6.37+0.21) -41.8%

And here's another with a very large tree (~340k entries), and a fairly
large number of refs (~10k):

  Test                         HEAD^               HEAD
  -------------------------------------------------------------------------
  5310.3: simulated clone      53.83(54.71+1.54)   39.77(40.76+1.50) -26.1%
  5310.4: simulated fetch      19.91(20.11+0.56)   19.79(19.98+0.67) -0.6%
  5310.6: rev-list (commits)   0.54(0.44+0.11)     0.51(0.43+0.07) -5.6%
  5310.7: rev-list (objects)   24.32(23.59+0.73)   9.85(9.49+0.36) -59.5%

This patch provides substantial improvements in these larger cases, and
have any drawbacks for smaller ones (the cost of the bitmap check is
quite small compared to an actual tree traversal).

Note that we have to add a version of revision.c's include_check
callback which handles non-commits. We could possibly consolidate this
into a single callback for all objects types, as there's only one user
of the feature which would need converted (pack-bitmap.c:should_include).
That would in theory let us avoid duplicating any logic. But when I
tried it, the code ended up much worse to read, with lots of repeated
"if it's a commit do this, otherwise do that". Having two separate
callbacks splits that naturally, and matches the existing split of
show_commit/show_object callbacks.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects: support filtering by tag and commit</title>
<updated>2021-04-12T16:35:50Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-04-12T13:37:35Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=9a2a4f95448890d138a800c8a55c5d5dcfe16082'/>
<id>urn:sha1:9a2a4f95448890d138a800c8a55c5d5dcfe16082</id>
<content type='text'>
Object filters currently only support filtering blobs or trees based on
some criteria. This commit lays the foundation to also allow filtering
of tags and commits.

No change in behaviour is expected from this commit given that there are
no filters yet for those object types.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects: move tag processing into its own function</title>
<updated>2021-04-11T06:03:20Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2021-04-09T11:28:02Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=628d81be6c007de5aa0fa352b912b89f74a5cfa7'/>
<id>urn:sha1:628d81be6c007de5aa0fa352b912b89f74a5cfa7</id>
<content type='text'>
Move processing of tags into its own function to make the logic easier
to extend when we're going to implement filtering for tags. No change in
behaviour is expected from this commit.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/list-objects-optim-wo-trees'</title>
<updated>2019-10-07T02:32:56Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2019-10-07T02:32:56Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=bbfe5f2241f8baf6019c6cb7428aed9fb1353799'/>
<id>urn:sha1:bbfe5f2241f8baf6019c6cb7428aed9fb1353799</id>
<content type='text'>
The object traversal machinery has been optimized not to load tree
objects when we are only interested in commit history.

* jk/list-objects-optim-wo-trees:
  list-objects: don't queue root trees unless revs-&gt;tree_objects is set
</content>
</entry>
<entry>
<title>list-objects: don't queue root trees unless revs-&gt;tree_objects is set</title>
<updated>2019-09-12T18:47:30Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-09-12T01:11:37Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=72ed80c784e65770c4d4944b0ea54b08e098d7e3'/>
<id>urn:sha1:72ed80c784e65770c4d4944b0ea54b08e098d7e3</id>
<content type='text'>
When traverse_commit_list() processes each commit, it queues the
commit's root tree in the pending array. Then, after all commits are
processed, it calls traverse_trees_and_blobs() to walk over the pending
list, calling process_tree() on each. But if revs-&gt;tree_objects is not
set, process_tree() just exists immediately!

We can save ourselves some work by not even bothering to queue these
trees in the first place. There are a few subtle points to make:

  - we also detect commits with a NULL tree pointer here. But this isn't
    an interesting check for broken commits, since the lookup_tree()
    we'd have done during commit parsing doesn't actually check that we
    have the tree on disk. So we're not losing any robustness.

  - besides queueing, we also set the NOT_USER_GIVEN flag on the tree
    object. This is used by the traverse_commit_list_filtered() variant.
    But if we're not exploring trees, then we won't actually care about
    this flag, which is used only inside process_tree() code-paths.

  - queueing trees eventually leads to us queueing blobs, too. But we
    don't need to check revs-&gt;blob_objects here. Even in the current
    code, we still wouldn't find those blobs, because we'd never open up
    the tree objects to list their contents.

  - the user-visible impact to the caller is minimal. The pending trees
    are all cleared by the time the function returns anyway, by
    traverse_trees_and_blobs(). We do call a show_commit() callback,
    which technically could be looking at revs-&gt;pending during the
    callback. But it seems like a rather unlikely thing to do (if you
    want the tree of the current commit, then accessing the tree struct
    member is a lot simpler).

So this should be safe to do. Let's look at the benefits:

  [before]
  Benchmark #1: git -C linux rev-list HEAD &gt;/dev/null
    Time (mean ± σ):      7.651 s ±  0.021 s    [User: 7.399 s, System: 0.252 s]
    Range (min … max):    7.607 s …  7.683 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD &gt;/dev/null
    Time (mean ± σ):      7.593 s ±  0.023 s    [User: 7.329 s, System: 0.264 s]
    Range (min … max):    7.565 s …  7.634 s    10 runs

Not too impressive, but then we're really just avoiding sticking a
pointer into a growable array. But still, I'll take a free 0.75%
speedup.

Let's try it after running "git commit-graph write":

  [before]
  Benchmark #1: git -C linux rev-list HEAD &gt;/dev/null
    Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
    Range (min … max):    1.447 s …  1.481 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD &gt;/dev/null
    Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
    Range (min … max):    1.106 s …  1.181 s    10 runs

Now that's more like it. We saved over 22% of the total time. Part of
that is because the runtime is shorter overall, but the absolute
improvement is also much larger. What's going on?

When we fill in a commit struct using the commit graph, we don't bother
to set the tree pointer, and instead lazy-load it when somebody calls
get_commit_tree(). So we're not only skipping the pointer write to the
pending queue, but we're skipping the lazy-load of the tree entirely.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects-filter: encapsulate filter components</title>
<updated>2019-06-28T15:41:53Z</updated>
<author>
<name>Matthew DeVore</name>
<email>matvore@google.com</email>
</author>
<published>2019-06-27T22:54:05Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=9430147ca0aab0189d7e52df97b95a0985fc0c8a'/>
<id>urn:sha1:9430147ca0aab0189d7e52df97b95a0985fc0c8a</id>
<content type='text'>
Encapsulate filter_fn, filter_free_fn, and filter_data into their own
opaque struct.

Due to opaqueness, filter_fn and filter_free_fn can no longer be
accessed directly by users. Currently, all usages of filter_fn are
guarded by a necessary check:

	(obj-&gt;flags &amp; NOT_USER_GIVEN) &amp;&amp; filter_fn

Take the opportunity to include this check into the new function
list_objects_filter__filter_object(), so that we no longer need to write
this check at every caller of the filter function.

Also, the init functions in list-objects-filter.c no longer need to
confusingly return the filter constituents in various places (filter_fn
and filter_free_fn as out parameters, and filter_data as the function's
return value); they can just initialize the "struct filter" passed in.

Helped-by: Jeff Hostetler &lt;git@jeffhostetler.com&gt;
Helped-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Helped-by: Junio C Hamano &lt;gitster@pobox.com&gt;
Signed-off-by: Matthew DeVore &lt;matvore@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>rev-list: detect broken root trees</title>
<updated>2019-04-10T03:59:39Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2019-04-10T02:13:25Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=97dd512af7ce4afb4f638ef73b4770921c8ca3aa'/>
<id>urn:sha1:97dd512af7ce4afb4f638ef73b4770921c8ca3aa</id>
<content type='text'>
When the traversal machinery sees a commit without a root tree, it
assumes that the tree was part of a BOUNDARY commit, and quietly ignores
the tree. But it could also be caused by a commit whose root tree is
broken or missing.

Instead, let's die() when we see a NULL root tree. We can differentiate
it from the BOUNDARY case by seeing if the commit was actually parsed.
This covers that case, plus future-proofs us against any others where we
might try to show an unparsed commit.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects.c: handle unexpected non-tree entries</title>
<updated>2019-04-10T03:59:39Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2019-04-10T02:13:19Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=b49e74eac480d167c3af8f1286fe520c3d7ce9e1'/>
<id>urn:sha1:b49e74eac480d167c3af8f1286fe520c3d7ce9e1</id>
<content type='text'>
Apply similar treatment as the previous commit for non-tree entries,
too.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects.c: handle unexpected non-blob entries</title>
<updated>2019-04-10T03:59:39Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2019-04-10T02:13:17Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=23c204455bf2198806e8c7b0cd86b20a50a379d0'/>
<id>urn:sha1:23c204455bf2198806e8c7b0cd86b20a50a379d0</id>
<content type='text'>
Fix one of the cases described in the previous commit where a tree-entry
that is promised to a blob is in fact a non-blob.

When 'lookup_blob()' returns NULL, it is because Git has cached the
requested object as a non-blob. In this case, prevent a SIGSEGV by
'die()'-ing immediately before attempting to dereference the result.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
