<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/builtin/pack-objects.c, branch main</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=main</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=main'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2026-04-06T22:42:49Z</updated>
<entry>
<title>Merge branch 'tb/stdin-packs-excluded-but-open'</title>
<updated>2026-04-06T22:42:49Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-04-06T22:42:49Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=03311dca7f91f69e9e0c532fce1c1e3c0a9fa34d'/>
<id>urn:sha1:03311dca7f91f69e9e0c532fce1c1e3c0a9fa34d</id>
<content type='text'>
pack-objects's --stdin-packs=follow mode learns to handle
excluded-but-open packs.

* tb/stdin-packs-excluded-but-open:
  repack: mark non-MIDX packs above the split as excluded-open
  pack-objects: support excluded-open packs with --stdin-packs
  t7704: demonstrate failure with once-cruft objects above the geometric split
  pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
  pack-objects: plug leak in `read_stdin_packs()`
</content>
</entry>
<entry>
<title>Merge branch 'ps/odb-generic-object-name-handling'</title>
<updated>2026-04-06T22:42:49Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-04-06T22:42:48Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=d75badf83bc3fc8e47413970874bac681eeb5bbe'/>
<id>urn:sha1:d75badf83bc3fc8e47413970874bac681eeb5bbe</id>
<content type='text'>
Object name handling (disambiguation and abbreviation) has been
refactored to be backend-generic, moving logic into the respective
object database backends.

* ps/odb-generic-object-name-handling:
  odb: introduce generic `odb_find_abbrev_len()`
  object-file: move logic to compute packed abbreviation length
  object-name: move logic to compute loose abbreviation length
  object-name: simplify computing common prefixes
  object-name: abbreviate loose object names without `disambiguate_state`
  object-name: merge `update_candidates()` and `match_prefix()`
  object-name: backend-generic `get_short_oid()`
  object-name: backend-generic `repo_collect_ambiguous()`
  object-name: extract function to parse object ID prefixes
  object-name: move logic to iterate through packed prefixed objects
  object-name: move logic to iterate through loose prefixed objects
  odb: introduce `struct odb_for_each_object_options`
  oidtree: extend iteration to allow for arbitrary return codes
  oidtree: modernize the code a bit
  object-file: fix sparse 'plain integer as NULL pointer' error
</content>
</entry>
<entry>
<title>pack-objects: support excluded-open packs with --stdin-packs</title>
<updated>2026-03-27T20:40:40Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2026-03-27T20:06:51Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=3f7c0e722e2733aede32b1e531caf83e7043d1bd'/>
<id>urn:sha1:3f7c0e722e2733aede32b1e531caf83e7043d1bd</id>
<content type='text'>
In cd846bacc7d (pack-objects: introduce '--stdin-packs=follow',
2025-06-23), pack-objects learned to traverse through commits in
included packs when using '--stdin-packs=follow', rescuing reachable
objects from unlisted packs into the output.

When we encounter a commit in an excluded pack during this rescuing
phase we will traverse through its parents. But because we set
`revs.no_kept_objects = 1`, commit simplification will prevent us from
showing it via `get_revision()`. (In practice, `--stdin-packs=follow`
walks commits down to the roots, but only opens up trees for ones that
do not appear in an excluded pack.)

But there are certain cases where we *do* need to see the parents of an
object in an excluded pack. Namely, if an object is rescue-able, but
only reachable from object(s) which appear in excluded packs, then
commit simplification will exclude those commits from the object
traversal, and we will never see a copy of that object, and thus not
rescue it.

This is what causes the failure in the previous commit during repacking.
When performing a geometric repack, packs above the geometric split that
weren't part of the previous MIDX (e.g., packs pushed directly into
`$GIT_DIR/objects/pack`) may not have full object closure.  When those
packs are listed as excluded via the '^' marker, the reachability
traversal encounters the sequence described above, and may miss objects
which we expect to rescue with `--stdin-packs=follow`.

Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed
packs, objects from '!'-prefixed packs are excluded from the resulting
pack. But unlike '^', commits in '!'-prefixed packs *are* used as
starting points for the follow traversal, and the traversal does not
treat them as a closure boundary.

In order to distinguish excluded-closed from excluded-open packs during
the traversal, introduce a new `pack_keep_in_core_open` bit on
`struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN`
flag for the kept-pack cache.

In `add_object_entry_from_pack()`, move the `want_object_in_pack()`
check to *after* `add_pending_oid()`. This is necessary so that commits
from excluded-open packs are added as traversal tips even though their
objects won't appear in the output. As a consequence, the caller
`for_each_object_in_pack()` will always provide a non-NULL 'p', hence we
are able to drop the "if (p)" conditional.

The `include_check` and `include_check_obj` callbacks on `rev_info` are
used to halt the walk at closed-excluded packs, since objects behind a
'^' boundary are guaranteed to have closure and need not be rescued.

The following commit will make use of this new functionality within the
repack layer to resolve the test failure demonstrated in the previous
commit.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`</title>
<updated>2026-03-27T20:40:39Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2026-03-27T20:06:46Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=d31d1f2e06942046b5220942c77245725d7df2c1'/>
<id>urn:sha1:d31d1f2e06942046b5220942c77245725d7df2c1</id>
<content type='text'>
The '--stdin-packs' mode of pack-objects maintains two separate
string_lists: one for included packs, and one for excluded packs. Each
list stores the pack basename as a string and the corresponding
`packed_git` pointer in its `-&gt;util` field.

This works, but makes it awkward to extend the set of pack "kinds" that
pack-objects can accept via stdin, since each new kind would need its
own string_list and duplicated handling. A future commit will want to do
just this, so prepare for that change by handling the various "kinds" of
packs specified over stdin in a more generic fashion.

Namely, replace the two `string_list`s with a single `strmap` keyed on
the pack basename, with values pointing to a new `struct
stdin_pack_info`. This struct tracks both the `packed_git` pointer and a
`kind` bitfield indicating whether the pack was specified as included or
excluded.

Extract the logic for sorting packs by mtime and adding their objects
into a separate `stdin_packs_add_pack_entries()` helper.

While we could have used a `string_list`, we must handle the case where
the same pack is specified more than once. With a `string_list` only, we
would have to pay a quadratic cost to either (a) insert elements into
their sorted positions, or (b) a repeated linear search, which is
accidentally quadratic. For that reason, use a strmap instead.

This patch does not include any functional changes.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>pack-objects: plug leak in `read_stdin_packs()`</title>
<updated>2026-03-27T20:40:39Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2026-03-27T20:06:43Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=81e29064371c4b4599b171ac71e73d1e21475a63'/>
<id>urn:sha1:81e29064371c4b4599b171ac71e73d1e21475a63</id>
<content type='text'>
The `read_stdin_packs()` function added originally via 339bce27f4f
(builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22)
declares a `rev_info` struct but neglects to call `release_revisions()`
on it before returning, creating the potential for a leak.

The related change in 97ec43247c0 (pack-objects: declare 'rev_info' for
'--stdin-packs' earlier, 2025-06-23) carried forward this oversight and
did not address it.

Ensure that we call `release_revisions()` appropriately to prevent a
potential leak from this function. Note that in practice our `rev_info`
here does not have a present leak, hence t5331 passes cleanly before
this commit, even when built with SANITIZE=leak.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/upload-pack-buffer-more-writes'</title>
<updated>2026-03-24T19:31:34Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-03-24T19:31:34Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=8023abc632ea45a9a1b7f78b13db2cf541849775'/>
<id>urn:sha1:8023abc632ea45a9a1b7f78b13db2cf541849775</id>
<content type='text'>
Reduce system overhead "git upload-pack" spends on relaying "git
pack-objects" output to the "git fetch" running on the other end of
the connection.

* ps/upload-pack-buffer-more-writes:
  builtin/pack-objects: reduce lock contention when writing packfile data
  csum-file: drop `hashfd_throughput()`
  csum-file: introduce `hashfd_ext()`
  sideband: use writev(3p) to send pktlines
  wrapper: introduce writev(3p) wrappers
  compat/posix: introduce writev(3p) wrapper
  upload-pack: reduce lock contention when writing packfile data
  upload-pack: prefer flushing data over sending keepalive
  upload-pack: adapt keepalives based on buffering
  upload-pack: fix debug statement when flushing packfile data
</content>
</entry>
<entry>
<title>odb: introduce `struct odb_for_each_object_options`</title>
<updated>2026-03-20T20:16:41Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-03-20T07:07:29Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=cfd575f0a9730712107e4ee6799a37665bcd8204'/>
<id>urn:sha1:cfd575f0a9730712107e4ee6799a37665bcd8204</id>
<content type='text'>
The `odb_for_each_object()` function only accepts a bitset of flags. In
a subsequent commit we'll want to change object iteration to also
support iterating over only those objects that have a specific prefix.
While we could of course add the prefix to the function signature, or
alternatively introduce a new function, both of these options don't
really seem to be that sensible.

Instead, introduce a new `struct odb_for_each_object_options` that can
be passed to a new `odb_for_each_object_ext()` function. Splice through
the options structure into the respective object database sources.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>builtin/pack-objects: reduce lock contention when writing packfile data</title>
<updated>2026-03-13T15:54:15Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-03-13T06:45:21Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=835e0aaf6f0e07e9f9a393ed0e456db7c1be33ef'/>
<id>urn:sha1:835e0aaf6f0e07e9f9a393ed0e456db7c1be33ef</id>
<content type='text'>
When running `git pack-objects --stdout` we feed the data through
`hashfd_ext()` with a progress meter and a smaller-than-usual buffer
length of 8kB so that we can track throughput more granularly. But as
packfiles tend to be on the larger side, this small buffer size may
cause a ton of write(3p) syscalls.

Originally, the buffer we used in `hashfd()` was 8kB for all use cases.
This was changed though in 2ca245f8be (csum-file.h: increase hashfile
buffer size, 2021-05-18) because we noticed that the number of writes
can have an impact on performance. So the buffer size was increased to
128kB, which improved performance a bit for some use cases.

But the commit didn't touch the buffer size for `hashd_throughput()`.
The reasoning here was that callers expect the progress indicator to
update frequently, and a larger buffer size would of course reduce the
update frequency especially on slow networks.

While that is of course true, there was (and still is, even though it's
now a call to `hashfd_ext()`) only a single caller of this function in
git-pack-objects(1). This command is responsible for writing packfiles,
and those packfiles are often on the bigger side. So arguably:

  - The user won't care about increments of 8kB when packfiles tend to
    be megabytes or even gigabytes in size.

  - Reducing the number of syscalls would be even more valuable here
    than it would be for multi-pack indices, which was the benchmark
    done in the mentioned commit, as MIDXs are typically significantly
    smaller than packfiles.

  - Nowadays, many internet connections should be able to transfer data
    at a rate significantly higher than 8kB per second.

Update the buffer to instead have a size of `LARGE_PACKET_DATA_MAX - 1`,
which translates to ~64kB. This limit was chosen because `git
pack-objects --stdout` is most often used when sending packfiles via
git-upload-pack(1), where packfile data is chunked into pktlines when
using the sideband. Furthermore, most internet connections should have a
bandwidth signifcantly higher than 64kB/s, so we'd still be able to
observe progress updates at a rate of at least once per second.

This change significantly reduces the number of write(3p) syscalls from
355,000 to 44,000 when packing the Linux repository. While this results
in a small performance improvement on an otherwise-unused system, this
improvement is mostly negligible. More importantly though, it will
reduce lock contention in the kernel on an extremely busy system where
we have many processes writing data at once.

Suggested-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>csum-file: drop `hashfd_throughput()`</title>
<updated>2026-03-13T15:54:15Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2026-03-13T06:45:20Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=2bf8f36ddb308b084912f8265ad6fd60df34a036'/>
<id>urn:sha1:2bf8f36ddb308b084912f8265ad6fd60df34a036</id>
<content type='text'>
The `hashfd_throughput()` function is used by a single callsite in
git-pack-objects(1). In contrast to `hashfd()`, this function uses a
progress meter to measure throughput and a smaller buffer length so that
the progress meter can provide more granular metrics.

We're going to change that caller in the next commit to be a bit more
specific to packing objects. As such, `hashfd_throughput()` will be a
somewhat unfitting mechanism for any potential new callers.

Drop the function and replace it with a call to `hashfd_ext()`.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ps/odb-sources'</title>
<updated>2026-03-12T21:09:07Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2026-03-12T21:09:06Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=c89a495ce4c508db4dd798050b9115ad103711c7'/>
<id>urn:sha1:c89a495ce4c508db4dd798050b9115ad103711c7</id>
<content type='text'>
The object source API is getting restructured to allow plugging new
backends.

* ps/odb-sources:
  odb/source: make `begin_transaction()` function pluggable
  odb/source: make `write_alternate()` function pluggable
  odb/source: make `read_alternates()` function pluggable
  odb/source: make `write_object_stream()` function pluggable
  odb/source: make `write_object()` function pluggable
  odb/source: make `freshen_object()` function pluggable
  odb/source: make `for_each_object()` function pluggable
  odb/source: make `read_object_stream()` function pluggable
  odb/source: make `read_object_info()` function pluggable
  odb/source: make `close()` function pluggable
  odb/source: make `reprepare()` function pluggable
  odb/source: make `free()` function pluggable
  odb/source: introduce source type for robustness
  odb: move reparenting logic into respective subsystems
  odb: embed base source in the "files" backend
  odb: introduce "files" source
  odb: split `struct odb_source` into separate header
</content>
</entry>
</feed>
