aboutsummaryrefslogtreecommitdiff
path: root/builtin/pack-objects.c
AgeCommit message (Collapse)Author
9 daysMerge branch 'tb/stdin-packs-excluded-but-open'Junio C Hamano
pack-objects's --stdin-packs=follow mode learns to handle excluded-but-open packs. * tb/stdin-packs-excluded-but-open: repack: mark non-MIDX packs above the split as excluded-open pack-objects: support excluded-open packs with --stdin-packs t7704: demonstrate failure with once-cruft objects above the geometric split pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap` pack-objects: plug leak in `read_stdin_packs()`
9 daysMerge branch 'ps/odb-generic-object-name-handling'Junio C Hamano
Object name handling (disambiguation and abbreviation) has been refactored to be backend-generic, moving logic into the respective object database backends. * ps/odb-generic-object-name-handling: odb: introduce generic `odb_find_abbrev_len()` object-file: move logic to compute packed abbreviation length object-name: move logic to compute loose abbreviation length object-name: simplify computing common prefixes object-name: abbreviate loose object names without `disambiguate_state` object-name: merge `update_candidates()` and `match_prefix()` object-name: backend-generic `get_short_oid()` object-name: backend-generic `repo_collect_ambiguous()` object-name: extract function to parse object ID prefixes object-name: move logic to iterate through packed prefixed objects object-name: move logic to iterate through loose prefixed objects odb: introduce `struct odb_for_each_object_options` oidtree: extend iteration to allow for arbitrary return codes oidtree: modernize the code a bit object-file: fix sparse 'plain integer as NULL pointer' error
2026-03-27pack-objects: support excluded-open packs with --stdin-packsTaylor Blau
In cd846bacc7d (pack-objects: introduce '--stdin-packs=follow', 2025-06-23), pack-objects learned to traverse through commits in included packs when using '--stdin-packs=follow', rescuing reachable objects from unlisted packs into the output. When we encounter a commit in an excluded pack during this rescuing phase we will traverse through its parents. But because we set `revs.no_kept_objects = 1`, commit simplification will prevent us from showing it via `get_revision()`. (In practice, `--stdin-packs=follow` walks commits down to the roots, but only opens up trees for ones that do not appear in an excluded pack.) But there are certain cases where we *do* need to see the parents of an object in an excluded pack. Namely, if an object is rescue-able, but only reachable from object(s) which appear in excluded packs, then commit simplification will exclude those commits from the object traversal, and we will never see a copy of that object, and thus not rescue it. This is what causes the failure in the previous commit during repacking. When performing a geometric repack, packs above the geometric split that weren't part of the previous MIDX (e.g., packs pushed directly into `$GIT_DIR/objects/pack`) may not have full object closure. When those packs are listed as excluded via the '^' marker, the reachability traversal encounters the sequence described above, and may miss objects which we expect to rescue with `--stdin-packs=follow`. Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed packs, objects from '!'-prefixed packs are excluded from the resulting pack. But unlike '^', commits in '!'-prefixed packs *are* used as starting points for the follow traversal, and the traversal does not treat them as a closure boundary. In order to distinguish excluded-closed from excluded-open packs during the traversal, introduce a new `pack_keep_in_core_open` bit on `struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN` flag for the kept-pack cache. In `add_object_entry_from_pack()`, move the `want_object_in_pack()` check to *after* `add_pending_oid()`. This is necessary so that commits from excluded-open packs are added as traversal tips even though their objects won't appear in the output. As a consequence, the caller `for_each_object_in_pack()` will always provide a non-NULL 'p', hence we are able to drop the "if (p)" conditional. The `include_check` and `include_check_obj` callbacks on `rev_info` are used to halt the walk at closed-excluded packs, since objects behind a '^' boundary are guaranteed to have closure and need not be rescued. The following commit will make use of this new functionality within the repack layer to resolve the test failure demonstrated in the previous commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-27pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`Taylor Blau
The '--stdin-packs' mode of pack-objects maintains two separate string_lists: one for included packs, and one for excluded packs. Each list stores the pack basename as a string and the corresponding `packed_git` pointer in its `->util` field. This works, but makes it awkward to extend the set of pack "kinds" that pack-objects can accept via stdin, since each new kind would need its own string_list and duplicated handling. A future commit will want to do just this, so prepare for that change by handling the various "kinds" of packs specified over stdin in a more generic fashion. Namely, replace the two `string_list`s with a single `strmap` keyed on the pack basename, with values pointing to a new `struct stdin_pack_info`. This struct tracks both the `packed_git` pointer and a `kind` bitfield indicating whether the pack was specified as included or excluded. Extract the logic for sorting packs by mtime and adding their objects into a separate `stdin_packs_add_pack_entries()` helper. While we could have used a `string_list`, we must handle the case where the same pack is specified more than once. With a `string_list` only, we would have to pay a quadratic cost to either (a) insert elements into their sorted positions, or (b) a repeated linear search, which is accidentally quadratic. For that reason, use a strmap instead. This patch does not include any functional changes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-27pack-objects: plug leak in `read_stdin_packs()`Taylor Blau
The `read_stdin_packs()` function added originally via 339bce27f4f (builtin/pack-objects.c: add '--stdin-packs' option, 2021-02-22) declares a `rev_info` struct but neglects to call `release_revisions()` on it before returning, creating the potential for a leak. The related change in 97ec43247c0 (pack-objects: declare 'rev_info' for '--stdin-packs' earlier, 2025-06-23) carried forward this oversight and did not address it. Ensure that we call `release_revisions()` appropriately to prevent a potential leak from this function. Note that in practice our `rev_info` here does not have a present leak, hence t5331 passes cleanly before this commit, even when built with SANITIZE=leak. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-24Merge branch 'ps/upload-pack-buffer-more-writes'Junio C Hamano
Reduce system overhead "git upload-pack" spends on relaying "git pack-objects" output to the "git fetch" running on the other end of the connection. * ps/upload-pack-buffer-more-writes: builtin/pack-objects: reduce lock contention when writing packfile data csum-file: drop `hashfd_throughput()` csum-file: introduce `hashfd_ext()` sideband: use writev(3p) to send pktlines wrapper: introduce writev(3p) wrappers compat/posix: introduce writev(3p) wrapper upload-pack: reduce lock contention when writing packfile data upload-pack: prefer flushing data over sending keepalive upload-pack: adapt keepalives based on buffering upload-pack: fix debug statement when flushing packfile data
2026-03-20odb: introduce `struct odb_for_each_object_options`Patrick Steinhardt
The `odb_for_each_object()` function only accepts a bitset of flags. In a subsequent commit we'll want to change object iteration to also support iterating over only those objects that have a specific prefix. While we could of course add the prefix to the function signature, or alternatively introduce a new function, both of these options don't really seem to be that sensible. Instead, introduce a new `struct odb_for_each_object_options` that can be passed to a new `odb_for_each_object_ext()` function. Splice through the options structure into the respective object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-13builtin/pack-objects: reduce lock contention when writing packfile dataPatrick Steinhardt
When running `git pack-objects --stdout` we feed the data through `hashfd_ext()` with a progress meter and a smaller-than-usual buffer length of 8kB so that we can track throughput more granularly. But as packfiles tend to be on the larger side, this small buffer size may cause a ton of write(3p) syscalls. Originally, the buffer we used in `hashfd()` was 8kB for all use cases. This was changed though in 2ca245f8be (csum-file.h: increase hashfile buffer size, 2021-05-18) because we noticed that the number of writes can have an impact on performance. So the buffer size was increased to 128kB, which improved performance a bit for some use cases. But the commit didn't touch the buffer size for `hashd_throughput()`. The reasoning here was that callers expect the progress indicator to update frequently, and a larger buffer size would of course reduce the update frequency especially on slow networks. While that is of course true, there was (and still is, even though it's now a call to `hashfd_ext()`) only a single caller of this function in git-pack-objects(1). This command is responsible for writing packfiles, and those packfiles are often on the bigger side. So arguably: - The user won't care about increments of 8kB when packfiles tend to be megabytes or even gigabytes in size. - Reducing the number of syscalls would be even more valuable here than it would be for multi-pack indices, which was the benchmark done in the mentioned commit, as MIDXs are typically significantly smaller than packfiles. - Nowadays, many internet connections should be able to transfer data at a rate significantly higher than 8kB per second. Update the buffer to instead have a size of `LARGE_PACKET_DATA_MAX - 1`, which translates to ~64kB. This limit was chosen because `git pack-objects --stdout` is most often used when sending packfiles via git-upload-pack(1), where packfile data is chunked into pktlines when using the sideband. Furthermore, most internet connections should have a bandwidth signifcantly higher than 64kB/s, so we'd still be able to observe progress updates at a rate of at least once per second. This change significantly reduces the number of write(3p) syscalls from 355,000 to 44,000 when packing the Linux repository. While this results in a small performance improvement on an otherwise-unused system, this improvement is mostly negligible. More importantly though, it will reduce lock contention in the kernel on an extremely busy system where we have many processes writing data at once. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-13csum-file: drop `hashfd_throughput()`Patrick Steinhardt
The `hashfd_throughput()` function is used by a single callsite in git-pack-objects(1). In contrast to `hashfd()`, this function uses a progress meter to measure throughput and a smaller buffer length so that the progress meter can provide more granular metrics. We're going to change that caller in the next commit to be a bit more specific to packing objects. As such, `hashfd_throughput()` will be a somewhat unfitting mechanism for any potential new callers. Drop the function and replace it with a call to `hashfd_ext()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-12Merge branch 'ps/odb-sources'Junio C Hamano
The object source API is getting restructured to allow plugging new backends. * ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header
2026-03-10Merge branch 'rs/parse-options-duplicated-long-options'Junio C Hamano
The parse-options API learned to notice an options[] array with duplicated long options. * rs/parse-options-duplicated-long-options: parseopt: check for duplicate long names and numerical options pack-objects: remove duplicate --stdin-packs definition
2026-03-05odb: embed base source in the "files" backendPatrick Steinhardt
The "files" backend is implemented as a pointer in the `struct odb_source`. This contradicts our typical pattern for pluggable backends like we use it for example in the ref store or for object database streams, where we typically embed the generic base structure in the specialized implementation. This pattern has a couple of small benefits: - We avoid an extra allocation. - We hide implementation details in the generic structure. - We can easily downcast from a generic backend to the specialized structure and vice versa because the offsets are known at compile time. - It becomes trivial to identify locations where we depend on backend specific logic because the cast needs to be explicit. Refactor our "files" object database source to do the same and embed the `struct odb_source` in the `struct odb_source_files`. There are still a bunch of sites in our code base where we do have to access internals of the "files" backend. The intent is that those will go away over time, but this will certainly take a while. Meanwhile, provide a `odb_source_files_downcast()` function that can convert a generic source into a "files" source. As we only have a single source the downcast succeeds unconditionally for now. Eventually though the intent is to make the cast `BUG()` in case the caller requests to downcast a non-"files" backend to a "files" backend. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05odb: introduce "files" sourcePatrick Steinhardt
Introduce a new "files" object database source. This source encapsulates access to both loose object files and the packfile store, similar to how the "files" backend for refs encapsulates access to loose refs and the packed-refs file. Note that for now the "files" source is still a direct member of a `struct odb_source`. This architecture will be reversed in the next commit so that the files source contains a `struct odb_source`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-02Merge branch 'ps/odb-for-each-object'Junio C Hamano
Revamp object enumeration API around odb. * ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags
2026-02-27Merge branch 'ps/for-each-ref-in-fixes'Junio C Hamano
A handful of places used refs_for_each_ref_in() API incorrectly, which has been corrected. * ps/for-each-ref-in-fixes: bisect: simplify string_list memory handling bisect: fix misuse of `refs_for_each_ref_in()` pack-bitmap: fix bug with exact ref match in "pack.preferBitmapTips" pack-bitmap: deduplicate logic to iterate over preferred bitmap tips
2026-02-27pack-objects: remove duplicate --stdin-packs definitionRené Scharfe
cd846bacc7 (pack-objects: introduce '--stdin-packs=follow', 2025-06-23) added a new definition of the option --stdin-packs that accepts an argument. It kept the old definition, which still shows up in the short help, but is shadowed by the new one. Remove it. Hinted-at-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-25Merge branch 'ps/pack-concat-wo-backfill'Junio C Hamano
"git pack-objects --stdin-packs" with "--exclude-promisor-objects" fetched objects that are promised, which was not wanted. This has been fixed. * ps/pack-concat-wo-backfill: builtin/pack-objects: don't fetch objects when merging packs
2026-02-23Merge branch 'ps/odb-for-each-object' into ps/odb-sourcesJunio C Hamano
* ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags
2026-02-19pack-bitmap: deduplicate logic to iterate over preferred bitmap tipsPatrick Steinhardt
We have two locations that iterate over the preferred bitmap tips as configured by the user via "pack.preferBitmapTips". Both of these callsites are subtly wrong: when the preferred bitmap tips contain an exact refname match, then we will hit a `BUG()`. Prepare for the fix by unifying the two callsites into a new `for_each_preferred_bitmap_tip()` function. This removes the last callsite of `bitmap_preferred_tips()` outside of "pack-bitmap.c". As such, convert the function to be local to that file only. Note that the function is still used by a second caller, so we cannot just inline it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-17Merge branch 'rs/clean-includes'Junio C Hamano
Clean up redundant includes of header files. * rs/clean-includes: remove duplicate includes
2026-02-11builtin/pack-objects: don't fetch objects when merging packsPatrick Steinhardt
The "--stdin-packs" option can be used to merge objects from multiple packfiles given via stdin into a new packfile. One big upside of this option is that we don't have to perform a complete rev walk to enumerate objects. Instead, we can simply enumerate all objects that are part of the specified packfiles, which can be significantly faster in very large repositories. There is one downside though: when we don't perform a rev walk we also don't have a good way to learn about the respective object's names. As a consequence, we cannot use the name hashes as a heuristic to get better delta selection. We try to offset this downside though by performing a localized rev walk: we queue all objects that we're about to repack as interesting, and all objects from excluded packfiles as uninteresting. We then perform a best-effort rev walk that allows us to fill in object names. There is one gotcha here though: when "--exclude-promisor-objects" has not been given we will perform backfill fetches for any promised objects that are missing. This used to not be an issue though as this option was mutually exclusive with "--stdin-packs". But that has changed recently, and starting with dcc9c7ef47 (builtin/repack: handle promisor packs with geometric repacking, 2026-01-05) we will now repack promisor packs during geometric compaction. The consequence is that a geometric repack may now perform a bunch of backfill fetches. We of course cannot pass "--exclude-promisor-objects" to fix this issue -- after all, the whole intent is to repack objects part of a promisor pack. But arguably we don't have to: the rev walk is intended as best effort, and we already configure it to ignore missing links to other objects. So we can adapt the walk to unconditionally disable fetching any missing objects. Do so and add a test that verifies we don't backfill any objects. Reported-by: Lukas Wanko <lwanko@gitlab.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-08remove duplicate includesRené Scharfe
The following command reports that some header files are included twice: $ git grep '#include' '*.c' | sort | uniq -cd Remove the second #include line in each case, as it has no effect. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-29string-list: add string_list_sort_u() that mimics "sort -u"Amisha Chhajed
Many callsites of string_list_remove_duplicates() call it immdediately after calling string_list_sort(), understandably as the former requires string-list to be sorted, it is clear that these places are sorting only to remove duplicates and for no other reason. Introduce a helper function string_list_sort_u that combines these two calls that often appear together, to simplify these callsites. Replace the current calls of those methods with string_list_sort_u(). Signed-off-by: Amisha Chhajed <amishhhaaaa@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26builtin/pack-objects: use `packfile_store_for_each_object()`Patrick Steinhardt
When enumerating objects that are supposed to be stored in a new cruft pack we use `for_each_packed_object()` and then derive each object's mtime individually. Refactor this logic to instead use the new `packfile_store_for_each_object()` function with an object info request that asks for the respective mtimes. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26odb: rename `FOR_EACH_OBJECT_*` flagsPatrick Steinhardt
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This prepares us for a new upcoming `odb_for_each_object()` function and ensures that both the function and its flags have the same prefix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-21Merge branch 'ps/geometric-repacking-with-promisor-remotes'Junio C Hamano
"git repack --geometric" did not work with promisor packs, which has been corrected. * ps/geometric-repacking-with-promisor-remotes: builtin/repack: handle promisor packs with geometric repacking repack-promisor: extract function to remove redundant packs repack-promisor: extract function to finalize repacking repack-geometry: extract function to compute repacking split builtin/pack-objects: exclude promisor objects with "--stdin-packs"
2026-01-21Merge branch 'ps/read-object-info-improvements'Junio C Hamano
The object-info API has been cleaned up. * ps/read-object-info-improvements: packfile: drop repository parameter from `packed_object_info()` packfile: skip unpacking object header for disk size requests packfile: disentangle return value of `packed_object_info()` packfile: always populate pack-specific info when reading object info packfile: extend `is_delta` field to allow for "unknown" state packfile: always declare object info to be OI_PACKED object-file: always set OI_LOOSE when reading object info
2026-01-21Merge branch 'ps/packfile-store-in-odb-source'Junio C Hamano
The packfile_store data structure is moved from object store to odb source. * ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source
2026-01-15Merge branch 'ps/packfile-store-in-odb-source' into ps/odb-for-each-objectJunio C Hamano
* ps/packfile-store-in-odb-source: packfile: move MIDX into packfile store packfile: refactor `find_pack_entry()` to work on the packfile store packfile: inline `find_kept_pack_entry()` packfile: only prepare owning store in `packfile_store_prepare()` packfile: only prepare owning store in `packfile_store_get_packs()` packfile: move packfile store into object source packfile: refactor misleading code when unusing pack windows packfile: refactor kept-pack cache to work with packfile stores packfile: pass source to `prepare_pack()` packfile: create store via its owning source odb: properly close sources before freeing them builtin/gc: fix condition for whether to write commit graphs
2026-01-15Merge branch 'ps/read-object-info-improvements' into ps/odb-for-each-objectJunio C Hamano
* ps/read-object-info-improvements: packfile: drop repository parameter from `packed_object_info()` packfile: skip unpacking object header for disk size requests packfile: disentangle return value of `packed_object_info()` packfile: always populate pack-specific info when reading object info packfile: extend `is_delta` field to allow for "unknown" state packfile: always declare object info to be OI_PACKED object-file: always set OI_LOOSE when reading object info
2026-01-14builtin/pack-objects: exclude promisor objects with "--stdin-packs"Patrick Steinhardt
It is currently not possible to combine "--exclude-promisor-objects" with "--stdin-packs" because both flags want to set up a revision walk to enumerate the objects to pack. In a subsequent commit though we want to extend geometric repacks to support promisor objects, and for that we need to handle the combination of both flags. There are two cases we have to think about here: - "--stdin-packs" asks us to pack exactly the objects part of the specified packfiles. It is somewhat questionable what to do in the case where the user asks us to exclude promisor objects, but at the same time explicitly passes a promisor pack to us. For now, we simply abort the request as it is self-contradicting. As we have also been dying before this commit there is no regression here. - "--stdin-packs=follow" does the same as the first flag, but it also asks us to include all objects transitively reachable from any object in the packs we are about to repack. This is done by doing the revision walk mentioned further up. Luckily, fixing this case is trivial: we only need to modify the revision walk to also set the `exclude_promisor_objects` field. Note that we do not support the "--exclude-promisor-objects-best-effort" flag for now as we don't need it to support geometric repacking with promisor objects. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12packfile: drop repository parameter from `packed_object_info()`Patrick Steinhardt
The function `packed_object_info()` takes a packfile and offset and returns the object info for the corresponding object. Despite these two parameters though it also takes a repository pointer. This is redundant information though, as `struct packed_git` already has a repository pointer that is always populated. Drop the redundant parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: move packfile store into object sourcePatrick Steinhardt
The packfile store is a member of `struct object_database`, which means that we have a single store per database. This doesn't really make much sense though: each source connected to the database has its own set of packfiles, so there is a conceptual mismatch here. This hasn't really caused much of a problem in the past, but with the advent of pluggable object databases this is becoming more of a problem because some of the sources may not even use packfiles in the first place. Move the packfile store down by one level from the object database into the object database source. This ensures that each source now has its own packfile store, and we can eventually start to abstract it away entirely so that the caller doesn't even know what kind of store it uses. Note that we only need to adjust a relatively small number of callers, way less than one might expect. This is because most callers are using `repo_for_each_pack()`, which handles enumeration of all packfiles that exist in the repository. So for now, none of these callers need to be adapted. The remaining callers that iterate through the packfiles directly and that need adjustment are those that are a bit more tangled with packfiles. These will be adjusted over time. Note that this patch only moves the packfile store, and there is still a bunch of functions that seemingly operate on a packfile store but that end up iterating over all sources. These will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: refactor kept-pack cache to work with packfile storesPatrick Steinhardt
The kept pack cache is a cache of packfiles that are marked as kept either via an accompanying ".kept" file or via an in-memory flag. The cache can be retrieved via `kept_pack_cache()`, where one needs to pass in a repository. Ultimately though the kept-pack cache is a property of the packfile store, and this causes problems in a subsequent commit where we want to move down the packfile store to be a per-object-source entity. Prepare for this and refactor the kept-pack cache to work on top of a packfile store instead. While at it, rename both the function and flags specific to the kept-pack cache so that they can be properly attributed to the respective subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-29tag: support arbitrary repositories in parse_tag()René Scharfe
Allow callers of parse_tag() pass in the repository to use. Let most of them pass in the_repository to get the same result as before. One of them has stopped using the_repository in ef9b0370da (sha1-name.c: store and use repo in struct disambiguate_state, 2019-04-16); let it pass in its stored repository. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-16Merge branch 'ps/object-read-stream'Junio C Hamano
The "git_istream" abstraction has been revamped to make it easier to interface with pluggable object database design. * ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`
2025-12-15Merge branch 'ps/object-read-stream' into ps/packfile-store-in-odb-sourceJunio C Hamano
* ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`
2025-11-24Merge branch 'ps/object-source-loose'Junio C Hamano
A part of code paths that deals with loose objects has been cleaned up. * ps/object-source-loose: object-file: refactor writing objects via a stream object-file: rename `write_object_file()` object-file: refactor freshening of objects object-file: rename `has_loose_object()` object-file: read objects via the loose object source object-file: move loose object map into loose source object-file: hide internals when we need to reprepare loose sources object-file: move loose object cache into loose source object-file: introduce `struct odb_source_loose` object-file: move `fetch_if_missing` odb: adjust naming to free object sources odb: introduce `odb_source_new()` odb: fix subtle logic to check whether an alternate is usable
2025-11-23streaming: drop redundant type and size pointersPatrick Steinhardt
In the preceding commits we have turned `struct odb_read_stream` into a publicly visible structure. Furthermore, this structure now contains the type and size of the object that we are about to stream. Consequently, the out-pointers that we used before to propagate the type and size of the streamed object are now somewhat redundant with the data contained in the structure itself. Drop these out-pointers and adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23streaming: move into object database subsystemPatrick Steinhardt
The "streaming" terminology is somewhat generic, so it may not be immediately obvious that "streaming.{c,h}" is specific to the object database. Rectify this by moving it into the "odb/" directory so that it can be immediately attributed to the object subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23streaming: refactor interface to be object-database-centricPatrick Steinhardt
Refactor the streaming interface to be centered around object databases instead of centered around the repository. Rename the functions accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23streaming: rename `git_istream` into `odb_read_stream`Patrick Steinhardt
In the following patches we are about to make the `git_istream` more generic so that it becomes fully controlled by the specific object source that wants to create it. As part of these refactorings we'll fully move the structure into the object database subsystem. Prepare for this change by renaming the structure from `git_istream` to `odb_read_stream`. This mirrors the `odb_write_stream` structure that we already have. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-19Merge branch 'ps/ref-peeled-tags'Junio C Hamano
Some ref backend storage can hold not just the object name of an annotated tag, but the object name of the object the tag points at. The code to handle this information has been streamlined. * ps/ref-peeled-tags: t7004: do not chdir around in the main process ref-filter: fix stale parsed objects ref-filter: parse objects on demand ref-filter: detect broken tags when dereferencing them refs: don't store peeled object IDs for invalid tags object: add flag to `peel_object()` to verify object type refs: drop infrastructure to peel via iterators refs: drop `current_ref_iter` hack builtin/show-ref: convert to use `reference_get_peeled_oid()` ref-filter: propagate peeled object ID upload-pack: convert to use `reference_get_peeled_oid()` refs: expose peeled object ID via the iterator refs: refactor reference status flags refs: fully reset `struct ref_iterator::ref` on iteration refs: introduce `.ref` field for the base iterator refs: introduce wrapper struct for `each_ref_fn`
2025-11-04refs: expose peeled object ID via the iteratorPatrick Steinhardt
Both the "files" and "reftable" backend are able to store peeled values for tags in the respective formats. This allows for a more efficient lookup of the target object of such a tag without having to manually peel via the object database. The infrastructure to access these peeled object IDs is somewhat funky though. When iterating through objects, we store a pointer reference to the current iterator in a global variable. The callbacks invoked by that iterator are then expected to call `peel_iterated_oid()`, which checks whether the globally-stored iterator's current reference refers to the one handed into that function. If so, we ask the iterator to peel the object, otherwise we manually peel the object via the object database. Depending on global state like this is somewhat weird and also quite fragile. Introduce a new `struct reference::peeled_oid` field that can be populated by the reference backends. This field can be accessed via a new function `reference_get_peeled_oid()` that either uses that value, if set, or alternatively peels via the ODB. With this change we don't have to rely on global state anymore, but make the peeled object ID available to the callback functions directly. Adjust trivial callers that already have a `struct reference` available. Remaining callers will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: introduce wrapper struct for `each_ref_fn`Patrick Steinhardt
The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through reference. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03object-file: rename `has_loose_object()`Patrick Steinhardt
Rename `has_loose_object()` to `odb_source_loose_has_object()` so that it becomes clear that this is tied to a specific loose object source. This matches our modern naming schema for functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30Merge branch 'ps/remove-packfile-store-get-packs'Junio C Hamano
Two slightly different ways to get at "all the packfiles" in API has been cleaned up. * ps/remove-packfile-store-get-packs: packfile: rename `packfile_store_get_all_packs()` packfile: introduce macro to iterate through packs packfile: drop `packfile_store_get_packs()` builtin/grep: simplify how we preload packs builtin/gc: convert to use `packfile_store_get_all_packs()` object-name: convert to use `packfile_store_get_all_packs()`
2025-10-30packfile: track packs via the MRU list exclusivelyPatrick Steinhardt
We track packfiles via two different lists: - `struct packfile_store::packs` is a list that sorts local packs first. In addition, these packs are sorted so that younger packs are sorted towards the front. - `struct packfile_store::mru` is a list that sorts packs so that most-recently used packs are at the front. The reasoning behind the ordering in the `packs` list is that younger objects stored in the local object store tend to be accessed more frequently, and that is certainly true for some cases. But there are going to be lots of cases where that isn't true. Especially when traversing history it is likely that one needs to access many older objects, and due to our housekeeping it is very likely that almost all of those older objects will be contained in one large pack that is oldest. So whether or not the ordering makes sense really depends on the use case at hand. A flexible approach like our MRU list addresses that need, as it will sort packs towards the front that are accessed all the time. Intuitively, this approach is thus able to satisfy more use cases more efficiently. This reasoning casts some doubt on whether or not it really makes sense to track packs via two different lists. It causes confusion, and it is not clear whether there are use cases where the `packs` list really is such an obvious choice. Merge these two lists into one most-recently-used list. Note that there is one important edge case: `for_each_packed_object()` uses the MRU list to iterate through packs, and then it lists each object in those packs. This would have the effect that we now sort the current pack towards the front, thus modifying the list of packfiles we are iterating over, with the consequence that we'll see an infinite loop. This edge case is worked around by introducing a new field that allows us to skip updating the MRU. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30builtin/pack-objects: simplify logic to find kept or nonlocal objectsPatrick Steinhardt
The function `has_sha1_pack_kept_or_nonlocal()` takes an object ID and then searches through packed objects to figure out whether the object exists in a kept or non-local pack. As a performance optimization we remember the packfile that contains a given object ID so that the next call to the function first checks that same packfile again. The way this is written is rather hard to follow though, as the caching mechanism is intertwined with the loop that iterates through the packs. Consequently, we need to do some gymnastics to re-start the iteration if the cached pack does not contain the objects. Refactor this so that we check the cached packfile at the beginning. We don't have to re-verify whether the packfile meets the properties as we have already verified those when storing the pack in `last_found` in the first place. So all we need to do is to use `find_pack_entry_one()` to check whether the pack contains the object ID, and to skip the cached pack in the loop so that we don't search it twice. Furthermore, stop using the `(void *)1` sentinel value and instead use a simple `NULL` pointer to indicate that we don't have a last-found pack yet. This refactoring significantly simplifies the logic and makes it much easier to follow. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30packfile: move the MRU list into the packfile storePatrick Steinhardt
Packfiles have two lists associated to them: - A list that keeps track of packfiles in the order that they were added to a packfile store. - A list that keeps track of packfiles in most-recently-used order so that packfiles that are more likely to contain a specific object are ordered towards the front. Both of these lists are hosted by `struct packed_git` itself, So to identify all packfiles in a repository you simply need to grab the first packfile and then iterate the `->next` pointers or the MRU list. This pattern has the problem that all packfiles are part of the same list, regardless of whether or not they belong to the same object source. With the upcoming pluggable object database effort this needs to change: packfiles should be contained by a single object source, and reading an object from any such packfile should use that source to look up the object. Consequently, we need to break up the global lists of packfiles into per-object-source lists. A first step towards this goal is to move those lists out of `struct packed_git` and into the packfile store. While the packfile store is currently sitting on the `struct object_database` level, the intent is to push it down one level into the `struct odb_source` in a subsequent patch series. Introduce a new `struct packfile_list` that is used to manage lists of packfiles and use it to store the list of most-recently-used packfiles in `struct packfile_store`. For now, the new list type is only used in a single spot, but we'll expand its usage in subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>