path: root/object-file.c
Age | Commit message | Author
7 days | Merge branch 'jt/index-fd-wo-repo-regression-fix-maint' | Junio C Hamano
During Git 2.52 timeframe, we broke streaming computation of object hash outside a repository, which has been corrected. * jt/index-fd-wo-repo-regression-fix-maint: object-file: avoid ODB transaction when not writing objects
7 days | Merge branch 'ps/odb-cleanup' | Junio C Hamano
Various code clean-up around odb subsystem. * ps/odb-cleanup: odb: drop unneeded headers and forward decls odb: rename `odb_has_object()` flags odb: use enum for `odb_write_object` flags odb: rename `odb_write_object()` flags treewide: use enum for `odb_for_each_object()` flags CodingGuidelines: document our style for flags
7 days | object-file: avoid ODB transaction when not writing objects | Justin Tobler
In ce1661f9da (odb: add transaction interface, 2025-09-16), existing ODB transaction logic is adapted to create a transaction interface at the ODB layer. The intent here is for the ODB transaction interface to eventually provide an object source agnostic means to manage transactions. An unintended consequence of this change though is that `object-file.c:index_fd()` may enter the ODB transaction path even when no object write is requested. In non-repository contexts, this can result in a NULL dereference and segfault. One such case occurs when running git-diff(1) outside of a repository with "core.bigFileThreshold" forcing the streaming path in `index_fd()`: $ echo foo >foo $ echo bar >bar $ git -c core.bigFileThreshold=1 diff -- foo bar In this scenario, the caller only needs to compute the object ID. Object hashing does not require an ODB, so starting a transaction is both unnecessary and invalid. Fix the bug by avoiding the use of ODB transactions in `index_fd()` when callers are only interested in computing the object hash. Reported-by: Luca Stefani <luca.stefani.ge1@gmail.com> Signed-off-by: Justin Tobler <jltobler@gmail.com> [jc: adjusted to fd13909e (Merge branch 'jt/odb-transaction', 2025-10-02)] Signed-off-by: Junio C Hamano <gitster@pobox.com>
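The shape of the fix described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `index_fd()` code: `struct sketch_odb`, `SKETCH_INDEX_WRITE_OBJECT`, `sketch_index_stream()` and the toy FNV-1a checksum are all hypothetical stand-ins. The point it shows is that the transaction is entered only when a write was requested, so a hash-only caller can pass no object database at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; none of these are Git's real types. */
struct sketch_odb {
	int transactions_begun;
	int transactions_committed;
};

enum sketch_index_flags {
	SKETCH_INDEX_WRITE_OBJECT = 1 << 0,
};

/* Toy FNV-1a checksum standing in for the real object hash. */
static uint32_t sketch_hash(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Hash `buf`, and only touch the object database when the caller asked
 * for the object to be written. `odb` may be NULL for hash-only callers,
 * e.g. when running outside of a repository.
 */
int sketch_index_stream(struct sketch_odb *odb, const void *buf, size_t len,
			unsigned flags, uint32_t *out)
{
	int write_object = !!(flags & SKETCH_INDEX_WRITE_OBJECT);

	if (write_object) {
		assert(odb); /* writing without an ODB would be a caller bug */
		odb->transactions_begun++;
	}

	*out = sketch_hash(buf, len);

	if (write_object)
		odb->transactions_committed++;
	return 0;
}
```

Calling `sketch_index_stream(NULL, "foo", 3, 0, &h)` computes the checksum without ever dereferencing the missing database, which mirrors the hash-only case from the commit message.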
7 days | Merge branch 'ps/fsck-wo-the-repository' | Junio C Hamano
Internals of "git fsck" have been refactored to not depend on the global `the_repository` variable. * ps/fsck-wo-the-repository: builtin/fsck: stop using `the_repository` in error reporting builtin/fsck: stop using `the_repository` when marking objects builtin/fsck: stop using `the_repository` when checking packed objects builtin/fsck: stop using `the_repository` with loose objects builtin/fsck: stop using `the_repository` when checking reflogs builtin/fsck: stop using `the_repository` when checking refs builtin/fsck: stop using `the_repository` when snapshotting refs builtin/fsck: fix trivial dependence on `the_repository` fsck: drop USE_THE_REPOSITORY fsck: store repository in fsck options fsck: initialize fsck options via a function fetch-pack: move fsck options into function scope
2026-03-31 | odb: rename `odb_has_object()` flags | Patrick Steinhardt
Rename `odb_has_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-31 | odb: use enum for `odb_write_object` flags | Patrick Steinhardt
We've got a couple of functions that accept `odb_write_object()` flags, but all of them accept the flags as an `unsigned` integer. In fact, we don't even have an `enum` for the flags field. Introduce this `enum` and adapt the functions to use it, in line with our coding style. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-31 | odb: rename `odb_write_object()` flags | Patrick Steinhardt
Rename `odb_write_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23 | fsck: store repository in fsck options | Patrick Steinhardt
The fsck subsystem relies on `the_repository` quite a bit. While we could of course explicitly pass a repository down the callchain, we already have a `struct fsck_options` that we pass to almost all functions. Extend the options to also store the repository to make it readily available. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23 | fsck: initialize fsck options via a function | Patrick Steinhardt
We initialize the `struct fsck_options` via a set of macros, often in global scope. In the next commit though we're about to introduce a new repository field to the options that must be initialized, and naturally we don't have a repo other than `the_repository` available in this scope. Refactor the code to instead introduce a new `fsck_options_init()` function that initializes the options for us and move initialization into function scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-20 | object-name: move logic to compute loose abbreviation length | Patrick Steinhardt
The function `repo_find_unique_abbrev_r()` takes as input an object ID as well as a minimum object ID length and returns the minimum required prefix to make the object ID unique. The logic that computes the abbreviation length for loose objects is deeply tied to the loose object storage format. As such, it would fail in case a different object storage format was used. Prepare for making this logic generic to the backend by moving the logic into a new `odb_source_loose_find_abbrev_len()` function that is part of "object-file.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-20 | object-name: move logic to iterate through loose prefixed objects | Patrick Steinhardt
The logic to iterate through loose objects that have a certain prefix is currently hosted in "object-name.c". This logic reaches into specifics of the loose object source, so it breaks once a different backend is used for the object storage. Move the logic to iterate through loose objects with a prefix into "object-file.c". This is done by extending the for-each-object options to support an optional prefix that is then honored by the loose source. Naturally, we'll also have this support in the packfile store. This is done in the next commit. Furthermore, there are no users of the loose cache outside of "object-file.c" anymore. As such, convert `odb_source_loose_cache()` to have file scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-20 | odb: introduce `struct odb_for_each_object_options` | Patrick Steinhardt
The `odb_for_each_object()` function only accepts a bitset of flags. In a subsequent commit we'll want to change object iteration to also support iterating over only those objects that have a specific prefix. While we could of course add the prefix to the function signature, or alternatively introduce a new function, both of these options don't really seem to be that sensible. Instead, introduce a new `struct odb_for_each_object_options` that can be passed to a new `odb_for_each_object_ext()` function. Splice through the options structure into the respective object database sources. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-20 | Merge branch 'ps/object-counting' into ps/odb-generic-object-name-handling | Junio C Hamano
* ps/object-counting: object-file: fix sparse 'plain integer as NULL pointer' error odb: introduce generic object counting odb/source: introduce generic object counting object-file: generalize counting objects object-file: extract logic to approximate object count packfile: extract logic to count number of objects odb: stop including "odb/source.h"
2026-03-19 | object-file: fix sparse 'plain integer as NULL pointer' error | Ramsay Jones
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-16 | Merge branch 'jk/unleak-mmap' | Junio C Hamano
Plug a few leaks where mmap'ed memory regions are not unmapped. * jk/unleak-mmap: meson: turn on NO_MMAP when building with LSan Makefile: turn on NO_MMAP when building with LSan object-file: fix mmap() leak in odb_source_loose_read_object_stream() pack-revindex: avoid double-loading .rev files check_connected(): fix leak of pack-index mmap check_connected(): delay opening new_pack
2026-03-12 | Merge branch 'ps/odb-sources' | Junio C Hamano
The object source API is getting restructured to allow plugging new backends. * ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header
2026-03-12 | object-file: generalize counting objects | Patrick Steinhardt
Generalize the function introduced in the preceding commit to not only be able to approximate the number of loose objects, but to also provide an accurate count. The behaviour can be toggled via a new flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-12 | object-file: extract logic to approximate object count | Patrick Steinhardt
In "builtin/gc.c" we have some logic that checks whether we need to repack objects. This is done by counting the number of objects that we have and checking whether it exceeds a certain threshold. We don't really need an accurate object count though, which is why we only open a single object directory shard and then extrapolate from there. Extract this logic into a new function that is owned by the loose object database source. This is done to prepare for a subsequent change, where we'll introduce object counting on the object database source level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
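The extrapolation idea can be sketched as below, assuming the usual layout of 256 loose object shards (one per leading hex byte). The in-memory `shard_counts` array, the flag name `SKETCH_COUNT_APPROXIMATE` and the choice of sampled shard are hypothetical; real code would read a directory instead of an array.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_SHARDS 256
#define SKETCH_SAMPLE_SHARD 0x17 /* arbitrary shard picked for this sketch */

enum sketch_count_flags {
	SKETCH_COUNT_APPROXIMATE = 1 << 0,
};

/*
 * Count loose objects given the number of entries in each shard.
 * With SKETCH_COUNT_APPROXIMATE only one shard is inspected and the
 * result is extrapolated, on the assumption that object IDs are evenly
 * spread across shards. Without the flag every shard is summed.
 */
unsigned long sketch_count_loose(const unsigned long *shard_counts,
				 unsigned flags)
{
	unsigned long total = 0;

	assert(shard_counts);

	if (flags & SKETCH_COUNT_APPROXIMATE)
		return shard_counts[SKETCH_SAMPLE_SHARD] * SKETCH_NUM_SHARDS;

	for (size_t i = 0; i < SKETCH_NUM_SHARDS; i++)
		total += shard_counts[i];
	return total;
}
```

The approximate path is cheap but can be far off for small repositories, which is acceptable for a "should we repack?" threshold check and is why an exact mode is useful to other callers.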
2026-03-10 | Merge branch 'ps/odb-sources' into ps/object-counting | Junio C Hamano
* ps/odb-sources: odb/source: make `begin_transaction()` function pluggable odb/source: make `write_alternate()` function pluggable odb/source: make `read_alternates()` function pluggable odb/source: make `write_object_stream()` function pluggable odb/source: make `write_object()` function pluggable odb/source: make `freshen_object()` function pluggable odb/source: make `for_each_object()` function pluggable odb/source: make `read_object_stream()` function pluggable odb/source: make `read_object_info()` function pluggable odb/source: make `close()` function pluggable odb/source: make `reprepare()` function pluggable odb/source: make `free()` function pluggable odb/source: introduce source type for robustness odb: move reparenting logic into respective subsystems odb: embed base source in the "files" backend odb: introduce "files" source odb: split `struct odb_source` into separate header
2026-03-06 | object-file: fix mmap() leak in odb_source_loose_read_object_stream() | Jeff King
We mmap() a loose object file, storing the result in the local variable "mapped", which is eventually assigned into our stream struct as "st.mapped". If we hit an error, we jump to an error label which does: munmap(st.mapped, st.mapsize); to clean up. But this is wrong; we don't assign st.mapped until the end of the function, after all of the "goto error" jumps. So this munmap() is never cleaning up anything (st.mapped is always NULL, because we initialize the struct with calloc). Instead, we should feed the local variable to munmap(). This leak is due to 595296e124 (streaming: allocate stream inside the backend-specific logic, 2025-11-23), which introduced the local variable. Before that, we assigned the mmap result directly into st.mapped. It was probably switched there so that we do not have to allocate/free the struct when the map operation fails (e.g., because we don't have the loose object). Before that commit, the struct was passed in from the caller, so there was no allocation at all. You can see the leak in the test suite by building with: make SANITIZE=leak NO_MMAP=1 CC=clang and running t1060. We need NO_MMAP so that the mmap() is backed by an actual malloc(), which allows LSan to detect it. And the leak seems not to be detected when compiling with gcc, probably due to some internal compiler decisions about how the stack memory is written. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
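The corrected cleanup pattern can be sketched like this. It is an illustration only: `sketch_map()`/`sketch_unmap()` are malloc-backed stand-ins for mmap()/munmap() (similar in spirit to a NO_MMAP build), and `sketch_open_stream()` with its `fail_header` switch is a hypothetical function, not Git's. A counter of live mappings makes the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

int sketch_live_maps; /* number of mappings not yet released */

static void *sketch_map(size_t size)
{
	void *p = malloc(size);
	if (p)
		sketch_live_maps++;
	return p;
}

static void sketch_unmap(void *p)
{
	if (p) {
		free(p);
		sketch_live_maps--;
	}
}

struct sketch_stream {
	void *mapped;
	size_t mapsize;
};

/* `fail_header` simulates an error after mapping, e.g. a bad header. */
struct sketch_stream *sketch_open_stream(size_t size, int fail_header)
{
	struct sketch_stream *st;
	void *mapped = sketch_map(size);

	if (!mapped)
		return NULL;
	st = calloc(1, sizeof(*st));
	if (!st)
		goto error;
	if (fail_header)
		goto error;

	/* Ownership moves into the struct only on success. */
	st->mapped = mapped;
	st->mapsize = size;
	return st;

error:
	/* Release the local; st->mapped is still NULL on every error path. */
	sketch_unmap(mapped);
	free(st);
	return NULL;
}

void sketch_close_stream(struct sketch_stream *st)
{
	assert(st->mapped); /* a returned stream always owns a mapping */
	sketch_unmap(st->mapped);
	free(st);
}
```

Had the error label released `st->mapped` instead of the local `mapped`, the counter would stay at 1 after a failed open, which is the leak the commit describes.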
2026-03-05 | odb/source: make `read_object_info()` function pluggable | Patrick Steinhardt
Introduce a new callback function in `struct odb_source` to make the function pluggable. Note that this function is a bit less straight-forward to convert compared to the other functions. The reason here is that the logic to read an object is: 1. We try to read the object. If it exists we return it. 2. If the object does not exist we reprepare the object database source. 3. We then try reading the object info a second time in case the reprepare caused it to appear. The second read is only supposed to happen for the packfile store though, as reading loose objects is not impacted by repreparing the object database. Ideally, we'd just move this whole logic into the ODB source. But that's not easily possible because we try to avoid the reprepare unless really required, which is after we have found out that no other ODB source contains the object, either. So the logic spans across multiple ODB sources, and consequently we cannot move it into an individual source. Instead, introduce a new flag `OBJECT_INFO_SECOND_READ` that tells the backend that we already tried to look up the object once, and that this time around the ODB source should try to find any new objects that may have surfaced due to an on-disk change. With this flag, the "files" backend can trivially skip trying to re-read the object as a loose object. Furthermore, as we know that we only try the second read via the packfile store, we can skip repreparing loose objects and only reprepare the packfile store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
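The two-pass lookup described above can be sketched as follows. All names here are hypothetical (`SKETCH_INFO_SECOND_READ` merely mimics the role of `OBJECT_INFO_SECOND_READ`), and each source is reduced to a few booleans. The sketch assumes the backend itself reprepares its packfile store when it sees the flag, and that it skips the loose lookup on that second pass.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_read_flags {
	SKETCH_INFO_SECOND_READ = 1 << 0,
};

struct sketch_source {
	int has_loose;       /* object exists as a loose file */
	int has_packed;      /* object is in an already-known pack */
	int pack_on_disk;    /* a new pack that a reprepare would discover */
	int loose_lookups;   /* how often the loose store was consulted */
	int pack_reprepares; /* how often the packfile store was reprepared */
};

/* Per-source read: returns 0 when the object was found, -1 otherwise. */
int sketch_source_read(struct sketch_source *src, unsigned flags)
{
	if (flags & SKETCH_INFO_SECOND_READ) {
		/* Second pass: only packs can have changed, so rescan those. */
		src->pack_reprepares++;
		if (src->pack_on_disk) {
			src->has_packed = 1;
			src->pack_on_disk = 0;
		}
	} else {
		src->loose_lookups++;
		if (src->has_loose)
			return 0;
	}
	return src->has_packed ? 0 : -1;
}

/* The retry spans all sources, so it lives above the individual source. */
int sketch_odb_read(struct sketch_source *sources, size_t nr)
{
	assert(sources || !nr);

	for (size_t i = 0; i < nr; i++)
		if (!sketch_source_read(&sources[i], 0))
			return 0;
	for (size_t i = 0; i < nr; i++)
		if (!sketch_source_read(&sources[i], SKETCH_INFO_SECOND_READ))
			return 0;
	return -1;
}
```

The expensive reprepare only happens once every source has reported a miss, which is the reason the retry cannot simply be moved into a single source.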
2026-03-05 | odb: embed base source in the "files" backend | Patrick Steinhardt
The "files" backend is implemented as a pointer in the `struct odb_source`. This contradicts our typical pattern for pluggable backends like we use it for example in the ref store or for object database streams, where we typically embed the generic base structure in the specialized implementation. This pattern has a couple of small benefits: - We avoid an extra allocation. - We hide implementation details in the generic structure. - We can easily downcast from a generic backend to the specialized structure and vice versa because the offsets are known at compile time. - It becomes trivial to identify locations where we depend on backend specific logic because the cast needs to be explicit. Refactor our "files" object database source to do the same and embed the `struct odb_source` in the `struct odb_source_files`. There are still a bunch of sites in our code base where we do have to access internals of the "files" backend. The intent is that those will go away over time, but this will certainly take a while. Meanwhile, provide a `odb_source_files_downcast()` function that can convert a generic source into a "files" source. As we only have a single source the downcast succeeds unconditionally for now. Eventually though the intent is to make the cast `BUG()` in case the caller requests to downcast a non-"files" backend to a "files" backend. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
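The embedding pattern can be sketched as below. The `sketch_*` names and the type tag are hypothetical, and the downcast returns NULL on a type mismatch purely so the sketch is easy to exercise; the commit message instead describes an unconditional cast for now and a `BUG()` as the eventual behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum sketch_source_type {
	SKETCH_SOURCE_FILES,
	SKETCH_SOURCE_OTHER,
};

/* Generic base structure that all backends share. */
struct sketch_source {
	enum sketch_source_type type;
};

/* Specialized backend that embeds the base instead of pointing to it. */
struct sketch_source_files {
	struct sketch_source base;
	int loose_objects;
};

struct sketch_source_files *sketch_source_files_downcast(struct sketch_source *source)
{
	assert(source);
	if (source->type != SKETCH_SOURCE_FILES)
		return NULL; /* stand-in for the eventual BUG() */
	return sketch_container_of(source, struct sketch_source_files, base);
}
```

Because the base is embedded, one allocation covers both structures, and every place that needs backend internals has to go through the explicit downcast, which makes such call sites easy to find.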
2026-03-05 | odb: introduce "files" source | Patrick Steinhardt
Introduce a new "files" object database source. This source encapsulates access to both loose object files and the packfile store, similar to how the "files" backend for refs encapsulates access to loose refs and the packed-refs file. Note that for now the "files" source is still a direct member of a `struct odb_source`. This architecture will be reversed in the next commit so that the files source contains a `struct odb_source`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05 | Merge branch 'ps/fsck-stream-from-the-right-object-instance' | Junio C Hamano
"fsck" iterates over packfiles and its access to pack data caused the list to be permuted, which caused it to loop forever; the code to access pack data by "fsck" has been updated to avoid this. * ps/fsck-stream-from-the-right-object-instance: pack-check: fix verification of large objects packfile: expose function to read object stream for an offset object-file: adapt `stream_object_signature()` to take a stream t/helper: improve "genrandom" test helper
2026-03-02 | Merge branch 'jt/object-file-use-container-of' | Junio C Hamano
Code clean-up. * jt/object-file-use-container-of: object-file.c: avoid container_of() of a NULL container object-file: use `container_of()` to convert from base types
2026-03-02 | Merge branch 'ps/object-info-bits-cleanup' | Junio C Hamano
A couple of bugs in the use of flag bits around the odb API have been corrected, and the flag bits reordered. * ps/object-info-bits-cleanup: odb: convert `odb_has_object()` flags into an enum odb: convert object info flags into an enum odb: drop gaps in object info flag values builtin/fsck: fix flags passed to `odb_has_object()` builtin/backfill: fix flags passed to `odb_has_object()`
2026-03-02 | Merge branch 'ps/odb-for-each-object' | Junio C Hamano
Revamp object enumeration API around odb. * ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags
2026-02-23 | Merge branch 'ps/object-info-bits-cleanup' into ps/odb-sources | Junio C Hamano
* ps/object-info-bits-cleanup: odb: convert `odb_has_object()` flags into an enum odb: convert object info flags into an enum odb: drop gaps in object info flag values builtin/fsck: fix flags passed to `odb_has_object()` builtin/backfill: fix flags passed to `odb_has_object()`
2026-02-23 | Merge branch 'ps/odb-for-each-object' into ps/odb-sources | Junio C Hamano
* ps/odb-for-each-object: odb: drop unused `for_each_{loose,packed}_object()` functions reachable: convert to use `odb_for_each_object()` builtin/pack-objects: use `packfile_store_for_each_object()` odb: introduce mtime fields for object info requests treewide: drop uses of `for_each_{loose,packed}_object()` treewide: enumerate promisor objects via `odb_for_each_object()` builtin/fsck: refactor to use `odb_for_each_object()` odb: introduce `odb_for_each_object()` packfile: introduce function to iterate through objects packfile: extract function to iterate through objects of a store object-file: introduce function to iterate through objects object-file: extract function to read object info from path odb: fix flags parameter to be unsigned odb: rename `FOR_EACH_OBJECT_*` flags
2026-02-23 | object-file: adapt `stream_object_signature()` to take a stream | Patrick Steinhardt
The function `stream_object_signature()` is responsible for verifying whether the given object ID matches the actual hash of the object's contents. In contrast to `check_object_signature()` it does so in a streaming fashion so that we don't have to load the full object into memory. In a subsequent commit we'll want to adapt one of its callsites to pass a preconstructed stream. Prepare for this by accepting a stream as input that the caller needs to assemble. While at it, improve the error reporting in `parse_object_with_flags()` to tell apart the two failure modes. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-22 | object-file.c: avoid container_of() of a NULL container | Junio C Hamano
Even though the "struct odb_transaction" member is at the beginning of the containing "struct odb_transaction_files", i.e., at offset 0, using container_of() to add offset 0 to a NULL pointer gets flagged as a bad behaviour under SANITIZE=undefined. Use container_of_or_null() to work around this issue. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
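A NULL-tolerant variant of the macro can be sketched as follows. The names are hypothetical and this is not a copy of Git's `container_of_or_null()`; it only illustrates the technique of checking for NULL before doing any pointer arithmetic, which matters even when the member offset is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Only subtract the offset when there is a real pointer to adjust. */
static inline void *sketch_container_or_null(void *ptr, size_t offset)
{
	return ptr ? (char *)ptr - offset : NULL;
}

#define sketch_container_of_or_null(ptr, type, member) \
	((type *)sketch_container_or_null((ptr), offsetof(type, member)))

struct sketch_transaction {
	int pending;
};

struct sketch_transaction_files {
	struct sketch_transaction base; /* first member, i.e. offset 0 */
	int packs_written;
};

struct sketch_transaction_files *sketch_to_files(struct sketch_transaction *t)
{
	/* The scenario from the commit: the base sits at offset 0. */
	assert(offsetof(struct sketch_transaction_files, base) == 0);
	return sketch_container_of_or_null(t, struct sketch_transaction_files, base);
}
```

With this helper, `sketch_to_files(NULL)` yields NULL without performing arithmetic on a null pointer, so a sanitizer has nothing to flag.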
2026-02-19 | object-file: use `container_of()` to convert from base types | Justin Tobler
To improve code hygiene, replace direct casts from `struct odb_transaction` and `struct odb_read_stream` to their concrete implementations with `container_of()`. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-12 | odb: convert object info flags into an enum | Patrick Steinhardt
Convert the object info flags into an enum and adapt all functions that receive these flags as parameters to use the enum instead of an integer. This serves two purposes: - The function signatures become more self-documenting, as callers don't have to wonder which flags they expect. - The compiler can warn when a wrong flag type is passed. Note that the second benefit is somewhat limited. For example, when or-ing multiple enum flags together the result will be an integer, and the compiler will not warn about such use cases. But where it does help is when a single flag of the wrong type is passed, as the compiler would generate a warning in that case. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-02 | odb: transparently handle common transaction behavior | Justin Tobler
A new ODB transaction is created and returned via `odb_transaction_begin()` and stored in the ODB. Only a single transaction may be pending at a time. If the ODB already has a transaction, the function is expected to return NULL. Similarly, when committing a transaction via `odb_transaction_commit()` the transaction being committed must match the pending transaction and upon commit reset the ODB transaction to NULL. These behaviors apply regardless of the ODB transaction implementation. Move the corresponding logic into `odb_transaction_{begin,commit}()` accordingly. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
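The single-pending-transaction rule can be sketched as below. The `sketch_*` types and functions are hypothetical simplifications, and treating a NULL transaction as a no-op on commit is an assumption of this sketch that lets callers commit unconditionally after a nested begin returned NULL.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_odb;

struct sketch_transaction {
	struct sketch_odb *odb;
};

struct sketch_odb {
	struct sketch_transaction *transaction; /* pending one, or NULL */
};

/* Returns NULL when a transaction is already pending on this ODB. */
struct sketch_transaction *sketch_transaction_begin(struct sketch_odb *odb)
{
	struct sketch_transaction *t;

	if (odb->transaction)
		return NULL;
	t = calloc(1, sizeof(*t));
	if (!t)
		return NULL;
	t->odb = odb;
	odb->transaction = t;
	return t;
}

/* Commits the pending transaction and resets the ODB's slot to NULL. */
void sketch_transaction_commit(struct sketch_transaction *t)
{
	if (!t)
		return;
	/* Committing anything but the pending transaction is a caller bug. */
	assert(t->odb->transaction == t);
	t->odb->transaction = NULL;
	free(t);
}
```

Keeping these checks in the generic begin/commit wrappers means a backend-specific implementation only has to supply the storage logic itself.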
2026-02-02 | odb: prepare `struct odb_transaction` to become generic | Justin Tobler
An ODB transaction handles how objects are stored temporarily and eventually committed. Due to object storage being implemented differently for a given ODB source, the ODB transactions must be implemented in a manner specific to the source the objects are being written to. To provide generic transactions, `struct odb_transaction` is updated to store a commit callback that can be configured to support a specific ODB source. For now `struct odb_transaction_files` is the only transaction type and what is always returned when starting a transaction. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-02 | object-file: rename transaction functions | Justin Tobler
In a subsequent commit, ODB transactions are made more generic to facilitate each ODB source providing its own transaction handling. Rename `object_file_transaction_{begin,commit}()` to `odb_transaction_files_{begin,commit}()` to better match the future source specific transaction implementation. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-02 | odb: store ODB source in `struct odb_transaction` | Justin Tobler
Each `struct odb_transaction` currently stores a reference to the `struct object_database`. Since transactions are handled per object source, instead store a reference to the source. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 | odb: drop unused `for_each_{loose,packed}_object()` functions | Patrick Steinhardt
We have converted all callers of `for_each_loose_object()` and `for_each_packed_object()` to use their new replacement functions instead. We can thus remove them now. Do so and inline `packfile_store_for_each_object_internal()` now that it only has a single callsite again. This makes it a bit easier to follow the callback indirection that is happening there. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 | odb: introduce mtime fields for object info requests | Patrick Steinhardt
There are some use cases where we need to figure out the mtime for objects. Most importantly, this is the case when we want to prune unreachable objects. But getting at that data requires users to manually derive the info either via the loose object's mtime, the packfiles' mtime or via the ".mtimes" file. Introduce a new `struct object_info::mtimep` pointer that allows callers to request an object's mtime. This new field will be used in a subsequent commit. Note that the concept of "mtime" is ambiguous: given an object, it may be stored multiple times in the object database, and each of these instances may have a different mtime. Disambiguating these mtimes is nothing that can happen on the generic ODB layer: the caller may search for the oldest object, the newest object, or even the relation of object mtimes depending on the specific source they are located in. As such, it is the responsibility of the caller to disambiguate mtimes. A consequence of this is that it's most likely incorrect to look up the mtime via `odb_read_object_info()`, as this interface does not give us enough information to disambiguate the mtime. Document this accordingly and tell users to use `odb_for_each_object()` instead. Even with this gotcha though it's sensible to have this request as part of the object info, as the mtime is a property of the object storage format. If we for example had a "black-box" storage backend, we'd still need to be able to query it for the mtime info in a generic way. We could introduce a safety mechanism that for example calls `BUG()` in case we look up the mtime outside of `odb_for_each_object()`. But that feels somewhat heavy-handed. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 | object-file: introduce function to iterate through objects | Patrick Steinhardt
We have multiple divergent interfaces to iterate through objects of a specific backend: - `for_each_loose_object()` yields all loose objects. - `for_each_packed_object()` (somewhat obviously) yields all packed objects. These functions have different function signatures, which makes it hard to create a common abstraction layer that covers both of these. Introduce a new function `odb_source_loose_for_each_object()` to plug this gap. This function doesn't take any data specific to loose objects, but instead it accepts a `struct object_info` that will be populated the exact same as if `odb_source_loose_read_object()` was called. The benefit of this new interface is that we can continue to pass backend-specific data, as `struct object_info` contains a union for these exact use cases. This will allow us to unify how we iterate through objects across both loose and packed objects in a subsequent commit. The `for_each_loose_object()` function continues to exist for now, but it will be removed at the end of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 | object-file: extract function to read object info from path | Patrick Steinhardt
Extract a new function that allows us to read object info for a specific loose object via a user-supplied path. This function will be used in a subsequent commit. Note that this also allows us to drop `stat_loose_object()`, which is a simple wrapper around `odb_loose_path()` plus lstat(3p). Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 | odb: fix flags parameter to be unsigned | Patrick Steinhardt
The `flags` parameter accepted by various `for_each_object()` functions is a bitfield of multiple flags. Such parameters are typically unsigned in the Git codebase, but we use `enum odb_for_each_object_flags` in some places. Adapt these function signatures to use the correct type. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26 | odb: rename `FOR_EACH_OBJECT_*` flags | Patrick Steinhardt
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This prepares us for a new upcoming `odb_for_each_object()` function and ensures that both the function and its flags have the same prefix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12 | object-file: always set OI_LOOSE when reading object info | Patrick Steinhardt
There are some early returns in `odb_source_loose_read_object_info()` in cases where we don't have to open the loose object. These return paths do not set `struct object_info::whence` to `OI_LOOSE` though, so it becomes impossible for the caller to tell the format of such an object. The root cause of this really is that we have so many different return paths in the function. As a consequence, it's harder than necessary to make sure that all successful exit paths set up the `whence` field as expected. Address this by refactoring the function to have a single exit path. Like this, we can trivially set up the `whence` field when we exit successfully from the function. Note that we also: - Rename `status` to `ret` to match our usual coding style, but also to show that the old `status` variable is now always getting the expected value. Furthermore, the value is not initialized anymore, which has the consequence that most compilers will warn for exit paths where we forgot to set it. - Move the setup of scratch pointers closer to `parse_loose_header()` to show where it's needed. - Guard a couple of variables on cleanup so that they only get released in case they have been set up. - Reset `oi->delta_base_oid` towards the end of the function, together with all the other object info pointers. Overall, all these changes result in a diff that is somewhat hard to read. But the end result is significantly easier to read and reason about, so I'd argue this one-time churn is worth it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-18 | Merge branch 'jc/object-read-stream-fix' into ps/read-object-info-improvements | Junio C Hamano
* jc/object-read-stream-fix: odb: do not use "blank" substitute for NULL
2025-12-18 | odb: do not use "blank" substitute for NULL | Junio C Hamano
When various *object_info() functions are given an extended object info structure as NULL by a caller that does not want any details, the code uses a file-scope static blank_oi and passes it down to the helper functions they use, to avoid handling NULL specifically. The ps/object-read-stream topic, which recently graduated to 'master', had a bug that assumed that two identically named file-scope static variables in two functions are the same, which of course is not the case. This made "git commit" go from taking 0.38 seconds to taking 1508 seconds in some cases, as reported by Aaron Plattner here: https://lore.kernel.org/git/f4ba7e89-4717-4b36-921f-56537131fd69@nvidia.com/ We _could_ move the blank_oi variable to the global scope in common section to fix this regression, but explicitly handling the NULL is a much safer fix. It also reduces the chance that somebody accidentally writes into blank_oi, making its contents dirty, which could make subsequent calls into the function misbehave. By explicitly handling NULL input, we no longer have to worry about it. Reported-by: Aaron Plattner <aplattner@nvidia.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
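The pointer-identity pitfall behind this regression can be reproduced in a few lines. This is a sketch only: it uses function-local statics to get two same-named objects into one file, all `sketch_*` names are hypothetical, and it does not model how the real code paths interacted.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_info {
	unsigned long *sizep;
};

/* Each function gets its own object, despite the identical name. */
const struct sketch_info *sketch_blank_a(void)
{
	static struct sketch_info blank_oi;
	return &blank_oi;
}

const struct sketch_info *sketch_blank_b(void)
{
	static struct sketch_info blank_oi;
	return &blank_oi;
}

/* The two placeholders are distinct objects, so their addresses differ. */
void sketch_check_distinct(void)
{
	assert(sketch_blank_a() != sketch_blank_b());
}

/* Fragile: only recognizes the placeholder owned by sketch_blank_a(). */
int sketch_wants_details_fragile(const struct sketch_info *oi)
{
	return oi != sketch_blank_a();
}

/* Robust: callers pass NULL when they do not want any details. */
int sketch_wants_details(const struct sketch_info *oi)
{
	return oi != NULL;
}
```

A placeholder created elsewhere is misread by the fragile check as a request for full details, whereas the explicit NULL check has no shared state that could go out of sync or be written to by accident.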
2025-12-16Merge branch 'ps/object-read-stream'Junio C Hamano
The "git_istream" abstraction has been revamped to make it easier to interface with pluggable object database design. * ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`
2025-11-24Merge branch 'ps/object-source-loose'Junio C Hamano
A part of code paths that deals with loose objects has been cleaned up. * ps/object-source-loose: object-file: refactor writing objects via a stream object-file: rename `write_object_file()` object-file: refactor freshening of objects object-file: rename `has_loose_object()` object-file: read objects via the loose object source object-file: move loose object map into loose source object-file: hide internals when we need to reprepare loose sources object-file: move loose object cache into loose source object-file: introduce `struct odb_source_loose` object-file: move `fetch_if_missing` odb: adjust naming to free object sources odb: introduce `odb_source_new()` odb: fix subtle logic to check whether an alternate is usable
2025-11-24Merge branch 'bc/submodule-force-same-hash'Junio C Hamano
Adding a repository that uses a different hash function is a no-no, but "git submodule add" did not prevent it, which has been corrected. * bc/submodule-force-same-hash: read-cache: drop submodule check from add_to_cache() object-file: disallow adding submodules of different hash algo
2025-11-23streaming: drop redundant type and size pointersPatrick Steinhardt
In the preceding commits we have turned `struct odb_read_stream` into a publicly visible structure. Furthermore, this structure now contains the type and size of the object that we are about to stream. Consequently, the out-pointers that we used before to propagate the type and size of the streamed object are now somewhat redundant with the data contained in the structure itself. Drop these out-pointers and adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
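The redundancy described above can be illustrated with a small sketch. The names (`struct stream`, `open_stream_old()`, `open_stream_new()`, the fixed type and size values) are hypothetical and do not reflect the real `odb_read_stream` API; the point is only that once the structure itself carries the type and size, separate out-pointers repeat the same information.

```c
/* Hypothetical stand-ins; not the real odb_read_stream definitions. */
enum obj_type { TYPE_NONE = 0, TYPE_BLOB = 3 };

struct stream {
	enum obj_type type; /* type of the object being streamed */
	unsigned long size; /* size of the object being streamed */
};

/* Before: type and size are reported twice, in the struct and via out-pointers. */
int open_stream_old(struct stream *st, enum obj_type *type, unsigned long *size)
{
	st->type = TYPE_BLOB; /* fixed values stand in for a real lookup */
	st->size = 12;
	if (type)
		*type = st->type;
	if (size)
		*size = st->size;
	return 0;
}

/* After: callers read the fields from the stream structure directly. */
int open_stream_new(struct stream *st)
{
	st->type = TYPE_BLOB;
	st->size = 12;
	return 0;
}
```

A caller of the second form would use `st.type` and `st.size` after a successful open, so there is a single source of truth for both values.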