<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/Documentation/git-pack-objects.txt, branch gitk-resize-error</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=gitk-resize-error</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=gitk-resize-error'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2021-11-09T17:39:11Z</updated>
<entry>
<title>doc: express grammar placeholders between angle brackets</title>
<updated>2021-11-09T17:39:11Z</updated>
<author>
<name>Jean-Noël Avila</name>
<email>jn.avila@free.fr</email>
</author>
<published>2021-11-06T18:48:51Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=49cbad0edd4dcf53e373e9fd27a9c36a41fb044e'/>
<id>urn:sha1:49cbad0edd4dcf53e373e9fd27a9c36a41fb044e</id>
<content type='text'>
This discerns user inputs from verbatim options in the synopsis.

Signed-off-by: Jean-Noël Avila &lt;jn.avila@free.fr&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jk/doc-max-pack-size'</title>
<updated>2021-07-08T20:15:03Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-07-08T20:15:03Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=18b49be4927bb327449624c37eebd13a5b9c1582'/>
<id>urn:sha1:18b49be4927bb327449624c37eebd13a5b9c1582</id>
<content type='text'>
Doc update.

* jk/doc-max-pack-size:
  doc: warn people against --max-pack-size
</content>
</entry>
<entry>
<title>doc: warn people against --max-pack-size</title>
<updated>2021-06-08T23:56:09Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-06-08T07:24:48Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=6fb9195f6c6df047949efcdb8a1fb9eaed0c925e'/>
<id>urn:sha1:6fb9195f6c6df047949efcdb8a1fb9eaed0c925e</id>
<content type='text'>
This option is almost never a good idea, as the resulting repository is
larger and slower (see the new explanations in the docs).

I outlined the potential problems. We could go further and make the
option harder to find (or at least, make the command-line option
descriptions a much more terse "you probably don't want this; see
pack.packsizeLimit for details"). But this seems like a minimal change
that may prevent people from thinking it's more useful than it is.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/geometric-repack'</title>
<updated>2021-03-24T21:36:27Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2021-03-24T21:36:27Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=2744383cbda9bbbe4219bd3532757ae6d28460e1'/>
<id>urn:sha1:2744383cbda9bbbe4219bd3532757ae6d28460e1</id>
<content type='text'>
"git repack" so far has been only capable of repacking everything
under the sun into a single pack (or split by size).  A cleverer
strategy to reduce the cost of repacking a repository has been
introduced.

* tb/geometric-repack:
  builtin/pack-objects.c: ignore missing links with --stdin-packs
  builtin/repack.c: reword comment around pack-objects flags
  builtin/repack.c: be more conservative with unsigned overflows
  builtin/repack.c: assign pack split later
  t7703: test --geometric repack with loose objects
  builtin/repack.c: do not repack single packs with --geometric
  builtin/repack.c: add '--geometric' option
  packfile: add kept-pack cache for find_kept_pack_entry()
  builtin/pack-objects.c: rewrite honor-pack-keep logic
  p5303: measure time to repack with keep
  p5303: add missing &amp;&amp;-chains
  builtin/pack-objects.c: add '--stdin-packs' option
  revision: learn '--no-kept-objects'
  packfile: introduce 'find_kept_pack_entry()'
</content>
</entry>
<entry>
<title>builtin/pack-objects.c: add '--stdin-packs' option</title>
<updated>2021-02-23T07:30:52Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2021-02-23T02:25:10Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=339bce27f4f2a6f3bfab3e708429c810f4030c43'/>
<id>urn:sha1:339bce27f4f2a6f3bfab3e708429c810f4030c43</id>
<content type='text'>
In an upcoming commit, 'git repack' will want to create a pack comprised
of all of the objects in some packs (the included packs) excluding any
objects in some other packs (the excluded packs).

This caller could iterate those packs themselves and feed the objects it
finds to 'git pack-objects' directly over stdin, but this approach has a
few downsides:

  - It requires every caller that wants to drive 'git pack-objects' in
    this way to implement pack iteration themselves. This forces the
    caller to think about details like what order objects are fed to
    pack-objects, which callers would likely rather not do.

  - If the set of objects in included packs is large, it requires
    sending a lot of data over a pipe, which is inefficient.

  - The caller is forced to keep track of the excluded objects, too, and
    make sure that it doesn't send any objects that appear in both
    included and excluded packs.

But the biggest downside is the lack of a reachability traversal.
Because the caller passes in a list of objects directly, those objects
don't get a namehash assigned to them, which can have a negative impact
on the delta selection process, causing 'git pack-objects' to fail to
find good deltas even when they exist.

The caller could formulate a reachability traversal themselves, but the
only way to drive 'git pack-objects' in this way is to do a full
traversal, and then remove objects in the excluded packs after the
traversal is complete. This can be detrimental to callers who care
about performance, especially in repositories with many objects.

Introduce 'git pack-objects --stdin-packs' which remedies these four
concerns.

'git pack-objects --stdin-packs' expects a list of pack names on stdin,
where 'pack-xyz.pack' denotes that pack as included, and
'^pack-xyz.pack' denotes it as excluded. The resulting pack includes all
objects that are present in at least one included pack, and aren't
present in any excluded pack.
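
The selection rule above can be sketched in a few lines of Python. This
is only an illustration of the set logic, not how pack-objects is
implemented: select_objects() and the pack_contents mapping are
hypothetical names, and real git reads object lists from pack index
files rather than from an in-memory dict.

```python
def select_objects(stdin_lines, pack_contents):
    """Return ids found in some included pack and in no excluded pack.

    pack_contents is a hypothetical dict mapping a pack name to the set
    of object ids it holds (an assumption made for this sketch).
    """
    included = set()
    excluded = set()
    for line in stdin_lines:
        name = line.strip()
        if not name:
            continue
        if name.startswith("^"):
            # A leading caret marks the pack as excluded.
            excluded |= pack_contents[name[1:]]
        else:
            included |= pack_contents[name]
    return included - excluded


packs = {
    "pack-aaa.pack": {"o1", "o2", "o3"},
    "pack-bbb.pack": {"o3", "o4"},
    "pack-ccc.pack": {"o2", "o4", "o5"},
}
lines = ["pack-aaa.pack", "pack-bbb.pack", "^pack-ccc.pack"]
print(sorted(select_objects(lines, packs)))
```

With this toy input, only the objects that appear in pack-aaa or
pack-bbb and are absent from pack-ccc remain.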

To address the delta selection problem, 'git pack-objects --stdin-packs'
works as follows. First, it assembles a list of objects that it is going
to pack, as above. Then, a reachability traversal is started, whose tips
are any commits mentioned in included packs. Upon visiting an object, we
find its corresponding object_entry in the to_pack list, and set its
namehash parameter appropriately.

To avoid the traversal visiting more objects than it needs to, the
traversal is halted upon encountering an object which can be found in an
excluded pack (by marking the excluded packs as kept in-core, and
passing --no-kept-objects=in-core to the revision machinery).

This can cause the traversal to halt early, for example if an object in
an included pack is an ancestor of ones in excluded packs. But stopping
early is OK, since filling in the namehash fields of objects in the
to_pack list is only additive (i.e., having it helps the delta selection
process, but leaving it blank doesn't impact the correctness of the
resulting pack).

Even still, it is unlikely that this hurts us much in practice, since
the 'git repack --geometric' caller (which is introduced in a later
commit) marks small packs as included, and large ones as excluded.
During ordinary use, the small packs usually represent pushes after a
large repack, and so are unlikely to be ancestors of objects that
already exist in the repository.

(I found it convenient while developing this patch to have 'git
pack-objects' report the number of objects which were visited and got
their namehash fields filled in during traversal. This is also included
in the below patch via trace2 data lines).

Suggested-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Reviewed-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>doc: mention bigFileThreshold for packing</title>
<updated>2021-02-22T21:18:30Z</updated>
<author>
<name>Christian Walther</name>
<email>cwalther@gmx.ch</email>
</author>
<published>2021-02-21T13:23:57Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=3a837b58e33ee67f21a2ca737c6a12d74cee9e5e'/>
<id>urn:sha1:3a837b58e33ee67f21a2ca737c6a12d74cee9e5e</id>
<content type='text'>
Knowing about the core.bigFileThreshold configuration variable is
helpful when examining pack file size differences between repositories.
Add a reference to it to the manpages a user is likely to read in this
situation.

Capitalize CONFIGURATION for consistency with other pages having such a
section.

Signed-off-by: Christian Walther &lt;cwalther@gmx.ch&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: no fetch when allow-{any,promisor}</title>
<updated>2020-08-06T20:01:03Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2020-08-05T23:06:51Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=ee47243d7636d3d54b727ad24027a9167b68ebb1'/>
<id>urn:sha1:ee47243d7636d3d54b727ad24027a9167b68ebb1</id>
<content type='text'>
The options --missing=allow-{any,promisor} were introduced in caf3827e2f
("rev-list: add list-objects filtering support", 2017-11-22) with the
following note in the commit message:

    This patch introduces handling of missing objects to help
    debugging and development of the "partial clone" mechanism,
    and once the mechanism is implemented, for a power user to
    perform operations that are missing-object aware without
    incurring the cost of checking if a missing link is expected.

The idea that these options are missing-object aware (and thus do not
need to lazily fetch objects, unlike unaware commands that assume that
all objects are present) is assumed in later commits such as 07ef3c6604
("fetch test: use more robust test for filtered objects", 2020-01-15).

However, the current implementations of these options use
has_object_file(), which indeed lazily fetches missing objects. Teach
these implementations not to do so. Also, update the documentation of
these options to be clearer.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>config: set pack.useSparse=true by default</title>
<updated>2020-03-20T21:22:31Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2020-03-20T12:38:09Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=de3a864114291632c05e67bec4a316257c7ff97d'/>
<id>urn:sha1:de3a864114291632c05e67bec4a316257c7ff97d</id>
<content type='text'>
The pack.useSparse config option was introduced by 3d036eb0
(pack-objects: create pack.useSparse setting, 2019-01-19) and was
first available in v2.21.0. When enabled, the pack-objects process
during 'git push' will use a sparse tree walk when deciding which
trees and blobs to send to the remote. The algorithm was introduced
by d5d2e93 (revision: implement sparse algorithm, 2019-01-16) and
has been in production use by VFS for Git since around that time.
The features.experimental config option also enabled pack.useSparse,
so hopefully that has also increased exposure.

It is worth noting that pack.useSparse can send more objects
across a push, but only when there is a special arrangement
of exact _copies_ across directories. There is a test
in t5322-pack-objects-sparse.sh that demonstrates this possibility.
This test uses the --sparse option to "git pack-objects" but we
can make it implied by the config value to demonstrate that the
default value has changed.

While updating that test, I noticed that the documentation did not
include an option for --no-sparse, which is now more important than
it was before.

Since the downside is unlikely but the upside is significant, set
the default value of pack.useSparse to true. Remove it from the
set of options implied by features.experimental.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>doc: fix repeated words</title>
<updated>2019-08-12T00:40:07Z</updated>
<author>
<name>Mark Rushakoff</name>
<email>mark.rushakoff@gmail.com</email>
</author>
<published>2019-08-10T05:59:14Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=24966cd9820a6b0d4e348807b07cb9af8ba84fc7'/>
<id>urn:sha1:24966cd9820a6b0d4e348807b07cb9af8ba84fc7</id>
<content type='text'>
Inspired by 21416f0a07 ("restore: fix typo in docs", 2019-08-03), I ran
"git grep -E '(\b[a-zA-Z]+) \1\b' -- Documentation/" to find other cases
where words were duplicated, e.g. "the the", and in most cases removed
one of the repeated words.
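
As a rough illustration of what that pattern matches, here is the same
regular expression applied with Python's re module. This is a sketch
only: repeated_words() is a hypothetical helper, and Python's regex
dialect is assumed to treat this particular pattern the same way as
"git grep -E" does.

```python
import re

# Same pattern as in the grep command: a word, one space, the same word.
REPEATED = re.compile(r"(\b[a-zA-Z]+) \1\b")


def repeated_words(text):
    """Return each word that is immediately repeated in text."""
    return [match.group(1) for match in REPEATED.finditer(text)]


print(repeated_words("see the the docs"))
print(repeated_words("see the theory"))
```

The second call finds nothing, because the trailing \b refuses a
match that would end in the middle of "theory".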

There were many false positives from this grep command, including
deliberate repeated words like "really really" or valid uses of "that
that", which I left alone, of course.

I also did not correct any of the legitimate, accidentally repeated
words in old RelNotes.

Signed-off-by: Mark Rushakoff &lt;mark.rushakoff@gmail.com&gt;
Acked-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>list-objects: consume sparse tree walk</title>
<updated>2019-01-17T21:44:39Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2019-01-16T18:25:58Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=4f6d26b16703e59e009fe5dde923b87793c5f561'/>
<id>urn:sha1:4f6d26b16703e59e009fe5dde923b87793c5f561</id>
<content type='text'>
When creating a pack-file using 'git pack-objects --revs' we provide
a list of interesting and uninteresting commits. For example, a push
operation would mark the local topic branch as interesting and the
known remote refs as uninteresting. We want to discover the set of
new objects to send to the server as a thin pack.

We walk these commits until we discover a frontier of commits such
that every commit walk starting at interesting commits ends in a root
commit or an uninteresting commit. We then need to discover which
non-commit objects are reachable from uninteresting commits. This
commit walk is not changing during this series.

The mark_edges_uninteresting() method in list-objects.c iterates on
the commit list and does the following:

* If the commit is UNINTERESTING, then mark its root tree and every
  object it can reach as UNINTERESTING.

* If the commit is interesting, then mark the root tree of every
  UNINTERESTING parent (and all objects that tree can reach) as
  UNINTERESTING.

At the very end, we repeat the process on every commit directly
given to the revision walk from stdin. This helps ensure we properly
cover shallow commits that otherwise were not included in the
frontier.

The logic to recursively follow trees is in the
mark_tree_uninteresting() method in revision.c. The algorithm avoids
duplicate work by not recursing into trees that are already marked
UNINTERESTING.
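
The short-circuit described above can be sketched on a toy object
graph. This is not the C code in revision.c: the function below, the
children mapping and the object ids are all hypothetical, and a Python
set stands in for the UNINTERESTING flag bit.

```python
def mark_tree_uninteresting(tree_id, children, uninteresting):
    """Mark tree_id and everything it reaches, skipping marked trees.

    children is a hypothetical dict mapping a tree id to the ids of its
    entries; blobs simply have no entry in the dict.
    """
    if tree_id in uninteresting:
        # Already marked, so everything below it was handled earlier.
        return
    uninteresting.add(tree_id)
    for child in children.get(tree_id, ()):
        mark_tree_uninteresting(child, children, uninteresting)


children = {
    "root1": ["dirA", "blob1"],
    "root2": ["dirA", "blob2"],
    "dirA": ["blob3"],
}
marked = set()
mark_tree_uninteresting("root1", children, marked)
mark_tree_uninteresting("root2", children, marked)
print(sorted(marked))
```

The second call does not descend into dirA again, since the first
call has already marked it.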

Add a new 'sparse' option to the mark_edges_uninteresting() method
that performs this logic in a slightly different way. As we iterate
over the commits, we add all of the root trees to an oidset. Then,
call mark_trees_uninteresting_sparse() on that oidset. Note that we
include interesting trees in this process. The current implementation
of mark_trees_uninteresting_sparse() will walk the same trees as
the old logic, but this will be replaced in a later change.

Add a '--sparse' flag in 'git pack-objects' to call this new logic.
Add a new test script t/t5322-pack-objects-sparse.sh that tests this
option. The tests currently demonstrate that the resulting object
list is the same as the old algorithm. This includes a case where
both algorithms pack an object that is not needed by a remote due to
limits on the explored set of trees. When the sparse algorithm is
changed in a later commit, we will add a test that demonstrates a
change of behavior in some cases.

Signed-off-by: Derrick Stolee &lt;dstolee@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
