summaryrefslogtreecommitdiff
path: root/packfile.h
AgeCommit message (Collapse)Author
2026-01-21Merge branch 'ps/read-object-info-improvements'Junio C Hamano
The object-info API has been cleaned up. * ps/read-object-info-improvements: packfile: drop repository parameter from `packed_object_info()` packfile: skip unpacking object header for disk size requests packfile: disentangle return value of `packed_object_info()` packfile: always populate pack-specific info when reading object info packfile: extend `is_delta` field to allow for "unknown" state packfile: always declare object info to be OI_PACKED object-file: always set OI_LOOSE when reading object info
2026-01-12packfile: drop repository parameter from `packed_object_info()`Patrick Steinhardt
The function `packed_object_info()` takes a packfile and offset and returns the object info for the corresponding object. Despite these two parameters though it also takes a repository pointer. This is redundant information though, as `struct packed_git` already has a repository pointer that is always populated. Drop the redundant parameter. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12packfile: disentangle return value of `packed_object_info()`Patrick Steinhardt
The `packed_object_info()` function returns the type of the packed object. While we use an `enum object_type` to store the return value, this type is not to be confused with the actual object type. It _may_ contain the object type, but it may just as well encode that the given packed object is stored as a delta. We have removed the only caller that relied on this returned object type in the preceding commit, so let's simplify semantics and return either 0 on success or a negative error code otherwise. This unblocks a small optimization where we can skip reading the object type altogether. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: move MIDX into packfile storePatrick Steinhardt
The multi-pack index still is tracked as a member of the object database source, but ultimately the MIDX is always tied to one specific packfile store. Move the structure into `struct packfile_store` accordingly. This ensures that the packfile store now keeps track of all data related to packfiles. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: inline `find_kept_pack_entry()`Patrick Steinhardt
The `find_kept_pack_entry()` function is only used in `has_object_kept_pack()`, which is only a trivial wrapper itself. Inline the latter into the former. Furthermore, reorder the code so that we can drop the declaration of the function in "packfile.h". This allows us to make the function file-local. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: move packfile store into object sourcePatrick Steinhardt
The packfile store is a member of `struct object_database`, which means that we have a single store per database. This doesn't really make much sense though: each source connected to the database has its own set of packfiles, so there is a conceptual mismatch here. This hasn't really caused much of a problem in the past, but with the advent of pluggable object databases this is becoming more of a problem because some of the sources may not even use packfiles in the first place. Move the packfile store down by one level from the object database into the object database source. This ensures that each source now has its own packfile store, and we can eventually start to abstract it away entirely so that the caller doesn't even know what kind of store it uses. Note that we only need to adjust a relatively small number of callers, way less than one might expect. This is because most callers are using `repo_for_each_pack()`, which handles enumeration of all packfiles that exist in the repository. So for now, none of these callers need to be adapted. The remaining callers that iterate through the packfiles directly and that need adjustment are those that are a bit more tangled with packfiles. These will be adjusted over time. Note that this patch only moves the packfile store, and there is still a bunch of functions that seemingly operate on a packfile store but that end up iterating over all sources. These will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: refactor kept-pack cache to work with packfile storesPatrick Steinhardt
The kept pack cache is a cache of packfiles that are marked as kept either via an accompanying ".kept" file or via an in-memory flag. The cache can be retrieved via `kept_pack_cache()`, where one needs to pass in a repository. Ultimately though the kept-pack cache is a property of the packfile store, and this causes problems in a subsequent commit where we want to move down the packfile store to be a per-object-source entity. Prepare for this and refactor the kept-pack cache to work on top of a packfile store instead. While at it, rename both the function and flags specific to the kept-pack cache so that they can be properly attributed to the respective subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: create store via its owning sourcePatrick Steinhardt
In subsequent patches we're about to move the packfile store from the object database layer into the object database source layer. Once done, we'll have one packfile store per source, where the source is owning the store. Prepare for this future and refactor `packfile_store_new()` to be initialized via an object database source instead of via the object database itself. This refactoring leads to a weird in-between state where the store is owned by the object database but created via the source. But this makes subsequent refactorings easier because we can now start to access the owning source of a given store. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-16Merge branch 'ps/object-read-stream'Junio C Hamano
The "git_istream" abstraction has been revamped to make it easier to interface with pluggable object database design. * ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`
2025-12-15Merge branch 'ps/object-read-stream' into ps/packfile-store-in-odb-sourceJunio C Hamano
* ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`
2025-12-05Merge branch 'ps/object-source-management'Junio C Hamano
Code refactoring around object database sources. * ps/object-source-management: odb: handle recreation of quarantine directories odb: handle changing a repository's commondir chdir-notify: add function to unregister listeners odb: handle initialization of sources in `odb_new()` http-push: stop setting up `the_repository` for each reference t/helper: stop setting up `the_repository` repeatedly builtin/index-pack: fix deferred fsck outside repos oidset: introduce `oidset_equal()` odb: move logic to disable ref updates into repo odb: refactor `odb_clear()` to `odb_free()` odb: adopt logic to close object databases setup: convert `set_git_dir()` to have file scope path: move `enter_repo()` into "setup.c"
2025-11-24Merge branch 'ps/object-source-loose'Junio C Hamano
A part of code paths that deals with loose objects has been cleaned up. * ps/object-source-loose: object-file: refactor writing objects via a stream object-file: rename `write_object_file()` object-file: refactor freshening of objects object-file: rename `has_loose_object()` object-file: read objects via the loose object source object-file: move loose object map into loose source object-file: hide internals when we need to reprepare loose sources object-file: move loose object cache into loose source object-file: introduce `struct odb_source_loose` object-file: move `fetch_if_missing` odb: adjust naming to free object sources odb: introduce `odb_source_new()` odb: fix subtle logic to check whether an alternate is usable
2025-11-23streaming: move logic to read packed objects streams into backendPatrick Steinhardt
Move the logic to read packed object streams into the respective subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23packfile: introduce function to read object info from a storePatrick Steinhardt
Extract the logic to read object info for a packed object from `do_oid_object_into_extended()` into a standalone function that operates on the packfile store. This function will be used in a subsequent commit. Note that this change allows us to make `find_pack_entry()` an internal implementation detail. As a consequence though we have to move around `packfile_store_freshen_object()` so that it is defined after that function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-19odb: adopt logic to close object databasesPatrick Steinhardt
The logic to close an object database is currently contained in the packfile subsystem. That choice is somewhat relatable, as most of the logic really is to close resources associated with the packfile store itself. But we also end up handling object sources and commit graphs, which certainly is not related to packfiles. Move the function into the object database subsystem and rename it to `odb_close()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03object-file: refactor freshening of objectsPatrick Steinhardt
When writing an object that already exists in our object database we skip the write and instead only update mtimes of the object, either in its packed or loose object format. This logic is wholly contained in "object-file.c", but that file is really only concerned with loose objects. So it does not really make sense that it also contains the logic to freshen a packed object. Introduce a new `odb_freshen_object()` function that sits on the object database level and two functions `packfile_store_freshen_object()` and `odb_source_loose_freshen_object()`. Like this, the format-specific functions can be part of their respective subsystems, while the backend agnostic function to freshen an object sits at the object database layer. Note that this change also moves the logic that iterates through object sources from the object source layer into the object database layer. This change is intentional: object sources should ideally only have to worry about themselves, and coordination of different sources should be handled on the object database level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30packfile: track packs via the MRU list exclusivelyPatrick Steinhardt
We track packfiles via two different lists: - `struct packfile_store::packs` is a list that sorts local packs first. In addition, these packs are sorted so that younger packs are sorted towards the front. - `struct packfile_store::mru` is a list that sorts packs so that most-recently used packs are at the front. The reasoning behind the ordering in the `packs` list is that younger objects stored in the local object store tend to be accessed more frequently, and that is certainly true for some cases. But there are going to be lots of cases where that isn't true. Especially when traversing history it is likely that one needs to access many older objects, and due to our housekeeping it is very likely that almost all of those older objects will be contained in one large pack that is oldest. So whether or not the ordering makes sense really depends on the use case at hand. A flexible approach like our MRU list addresses that need, as it will sort packs towards the front that are accessed all the time. Intuitively, this approach is thus able to satisfy more use cases more efficiently. This reasoning casts some doubt on whether or not it really makes sense to track packs via two different lists. It causes confusion, and it is not clear whether there are use cases where the `packs` list really is such an obvious choice. Merge these two lists into one most-recently-used list. Note that there is one important edge case: `for_each_packed_object()` uses the MRU list to iterate through packs, and then it lists each object in those packs. This would have the effect that we now sort the current pack towards the front, thus modifying the list of packfiles we are iterating over, with the consequence that we'll see an infinite loop. This edge case is worked around by introducing a new field that allows us to skip updating the MRU. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30packfile: move list of packs into the packfile storePatrick Steinhardt
Move the list of packs into the packfile store. This follows the same logic as in a previous commit, where we moved the most-recently-used list of packs, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30http: refactor subsystem to use `packfile_list`sPatrick Steinhardt
The dumb HTTP protocol directly fetches packfiles from the remote server and temporarily stores them in a list of packfiles. Those packfiles are not yet added to the repository's packfile store until we finalize the whole fetch. Refactor the code to instead use a `struct packfile_list` to store those packs. This prepares us for a subsequent change where the `->next` pointer of `struct packed_git` will go away. Note that this refactoring creates some temporary duplication of code, as we now have both `packfile_list_find_oid()` and `find_oid_pack()`. The latter function will be removed in a subsequent commit though. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30packfile: move the MRU list into the packfile storePatrick Steinhardt
Packfiles have two lists associated to them: - A list that keeps track of packfiles in the order that they were added to a packfile store. - A list that keeps track of packfiles in most-recently-used order so that packfiles that are more likely to contain a specific object are ordered towards the front. Both of these lists are hosted by `struct packed_git` itself, So to identify all packfiles in a repository you simply need to grab the first packfile and then iterate the `->next` pointers or the MRU list. This pattern has the problem that all packfiles are part of the same list, regardless of whether or not they belong to the same object source. With the upcoming pluggable object database effort this needs to change: packfiles should be contained by a single object source, and reading an object from any such packfile should use that source to look up the object. Consequently, we need to break up the global lists of packfiles into per-object-source lists. A first step towards this goal is to move those lists out of `struct packed_git` and into the packfile store. While the packfile store is currently sitting on the `struct object_database` level, the intent is to push it down one level into the `struct odb_source` in a subsequent patch series. Introduce a new `struct packfile_list` that is used to manage lists of packfiles and use it to store the list of most-recently-used packfiles in `struct packfile_store`. For now, the new list type is only used in a single spot, but we'll expand its usage in subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30packfile: use a `strmap` to store packs by namePatrick Steinhardt
To allow fast lookups of a packfile by name we use a hashmap that has the packfile name as key and the pack itself as value. But while this is the perfect use case for a `strmap`, we instead use `struct hashmap` and store the hashmap entry in the packfile itself. Simplify the code by using a `strmap` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16packfile: rename `packfile_store_get_all_packs()`Patrick Steinhardt
In a preceding commit we have removed `packfile_store_get_packs()`. With this function removed it's somewhat useless to still have the "all" infix in `packfile_store_get_all_packs()`. Rename the latter to drop that infix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16packfile: introduce macro to iterate through packsPatrick Steinhardt
We have a bunch of different sites that want to iterate through all packs of a given `struct packfile_store`. This pattern is somewhat verbose and repetitive, which makes it somewhat cumbersome. Introduce a new macro `repo_for_each_pack()` that removes some of the boilerplate. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-16packfile: drop `packfile_store_get_packs()`Patrick Steinhardt
In the preceding commits we have removed all remaining callers of `packfile_store_get_packs()`, the function is thus unused now. Remove it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: refactor `get_packed_git_mru()` to work on packfile storePatrick Steinhardt
The `get_packed_git_mru()` function prepares the packfile store and then returns its packfiles in most-recently-used order. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: refactor `get_all_packs()` to work on packfile storePatrick Steinhardt
The `get_all_packs()` function prepares the packfile store and then returns its packfiles. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: refactor `get_packed_git()` to work on packfile storePatrick Steinhardt
The `get_packed_git()` function prepares the packfile store and then returns its packfiles. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: move `get_multi_pack_index()` into "midx.c"Patrick Steinhardt
The `get_multi_pack_index()` function is declared and implemented in the packfile subsystem, even though it really belongs into the multi-pack index subsystem. The reason for this is likely that it needs to call `packfile_store_prepare()`, which is not exposed by the packfile system. In a subsequent commit we're about to add another caller outside of the packfile system though, so we'll have to expose the function anyway. Do so now already and move `get_multi_pack_index()` into the MIDX subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: introduce function to load and add packfilesPatrick Steinhardt
We have a recurring pattern where we essentially perform an upsert of a packfile in case it isn't yet known by the packfile store. The logic to do so is non-trivial as we have to reconstruct the packfile's key, check the map of packfiles, then create the new packfile and finally add it to the store. Introduce a new function that does this dance for us. Refactor callsites to use it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: refactor `install_packed_git()` to work on packfile storePatrick Steinhardt
The `install_packed_git()` functions adds a packfile to a specific object store. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: split up responsibilities of `reprepare_packed_git()`Patrick Steinhardt
In `reprepare_packed_git()` we perform a couple of operations: - We reload alternate object directories. - We clear the loose object cache. - We reprepare packfiles. While the logic is hosted in "packfile.c", it clearly reaches into other subsystems that aren't related to packfiles. Split up the responsibility and introduce `odb_reprepare()` which now becomes responsible for repreparing the whole object database. The existing `reprepare_packed_git()` function is refactored accordingly and only cares about reloading the packfile store now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24odb: move kept cache into `struct packfile_store`Patrick Steinhardt
The object database tracks a cache of "kept" packfiles, which is used by git-pack-objects(1) to handle cruft objects. With the introduction of the `struct packfile_store` we have a better place to host this cache though. Move the cache accordingly. This moves the last bit of packfile-related state from the object database into the packfile store. Adapt the comment for the `packfiles` pointer in `struct object_database` to reflect this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24odb: move MRU list of packfiles into `struct packfile_store`Patrick Steinhardt
The object database tracks the list of packfiles in most-recently-used order, which is mostly used to favor reading from packfiles that contain most of the objects that we're currently accessing. With the introduction of the `struct packfile_store` we have a better place to host this list though. Move the list accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24odb: move packfile map into `struct packfile_store`Patrick Steinhardt
The object database tracks a map of packfiles by their respective paths, which is used to figure out whether a given packfile has already been loaded. With the introduction of the `struct packfile_store` we have a better place to host this list though. Move the map accordingly. `pack_map_entry_cmp()` isn't used anywhere but in "packfile.c" anymore after this change, so we convert it to a static function, as well. Note that we also drop the `inline` hint: the function is used as a callback function exclusively, and callbacks cannot be inlined. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24odb: move initialization bit into `struct packfile_store`Patrick Steinhardt
The object database knows to skip re-initializing the list of packfiles in case it's already been initialized. Whether or not that is the case is tracked via a separate `initialized` bit that is stored in the object database. With the introduction of the `struct packfile_store` we have a better place to host this bit though. Move it accordingly. While at it, convert the field into a boolean now that we're allowed to use them in our code base. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24odb: move list of packfiles into `struct packfile_store`Patrick Steinhardt
The object database tracks the list of packfiles it currently knows about. With the introduction of the `struct packfile_store` we have a better place to host this list though. Move the list accordingly. Extract the logic from `odb_clear()` that knows to close all such packfiles and move it into the new subsystem, as well. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: introduce a new `struct packfile_store`Patrick Steinhardt
Information about an object database's packfiles is currently distributed across two different structures: - `struct packed_git` contains the `next` pointer as well as the `mru_head`, both of which serve to store the list of packfiles. - `struct object_database` contains several fields that relate to the packfiles. So we don't really have a central data structure that tracks our packfiles, and consequently responsibilities aren't always clear cut. A consequence for the upcoming pluggable object databases is that this makes it very hard to move management of packfiles from the object database level down into the object database source. Introduce a new `struct packfile_store` which is about to become the single source of truth for managing packfiles. Right now this data structure doesn't yet contain anything, but in subsequent patches we will move all data structures that relate to packfiles and that are currently contained in `struct object_database` into this new home. Note that this is only a first step: most importantly, we won't (yet) move the `struct packed_git::next` pointer around. This will happen in a subsequent patch series though so that `struct packed_git` will really only host information about the specific packfile it represents. Further note that the new structure still sits at the wrong level at the end of this patch series: as mentioned, it should eventually sit at the level of the object database source, not at the object database level. But introducing the packfile store now already makes it way easier to eventually push down the now-selfcontained data structure by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15packfile: refactor `get_multi_pack_index()` to work on sourcesPatrick Steinhardt
The function `get_multi_pack_index()` loads multi-pack indices via `prepare_packed_git()` and then returns the linked list of multi-pack indices that is stored in `struct object_database`. That list is in the process of being removed though in favor of storing the MIDX as part of the object database source it belongs to. Refactor `get_multi_pack_index()` so that it returns the multi-pack index for a single object source. Callers are now expected to call this function for each source they are interested in. This requires them to iterate through alternates, so we have to prepare alternate object sources before doing so. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename `raw_object_store` to `object_database`Patrick Steinhardt
The `raw_object_store` structure is the central entry point for reading and writing objects in a repository. The main purpose of this structure is to manage object directories and provide an interface to access and write objects in those object directories. Right now, many of the functions associated with the raw object store implicitly rely on `the_repository` to get access to its `objects` pointer, which is the `raw_object_store`. As we want to generally get rid of using `the_repository` across our codebase we will have to convert this implicit dependency on this global variable into an explicit parameter. This conversion can be done by simply passing in an explicit pointer to a repository and then using its `->objects` pointer. But there is a second effort underway, which is to make the object subsystem more selfcontained so that we can eventually have pluggable object backends. As such, passing in a repository wouldn't make a ton of sense, and the goal is to convert the object store interfaces such that we always pass in a reference to the `raw_object_store` instead. This will expose the `raw_object_store` type to a lot more callers though, which surfaces that this type is named somewhat awkwardly. The "raw_" prefix makes readers wonder whether there is a non-raw variant of the object store, but there isn't. Furthermore, we nowadays want to name functions in a way that they can be clearly attributed to a specific subsystem, but calling them e.g. `raw_object_store_has_object()` is just too unwieldy, even when dropping the "raw_" prefix. Instead, rename the structure to `object_database`. This term is already used a lot throughout our codebase, and it cannot easily be mistaken for "object directories", either. Furthermore, its acronym ODB is already well-known and works well as part of a function's name, like for example `odb_has_object()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29object-store: move function declarations to their respective subsystemsPatrick Steinhardt
We carry declarations for a couple of functions in "object-store.h" that are not defined in "object-store.c", but in a different subsystem. Move these declarations to the respective headers whose matching code files carry the corresponding definition. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29object-store: move `struct packed_git` into "packfile.h"Patrick Steinhardt
The "object-store.h" header contains the definition of `struct packed_git`. As this structure hosts all kind of information about a specific packfile it is arguably a bit out of place in a generic place like "object-store.h". Move the structure as well as `pack_map_entry_cmp()` into "packfile.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-13builtin/pack-objects.c: freshen objects from existing cruft packsTaylor Blau
Once an object is written into a cruft pack, we can only freshen it by writing a new loose or packed copy of that object with a more recent mtime. Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size` with `--cruft`, 2023-08-28), we typically had at most one cruft pack in a repository at any given time. So freshening unreachable objects was straightforward when already rewriting the cruft pack (and its *.mtimes file). But 61568efa95 changes things: 'pack-objects' now supports writing multiple cruft packs when invoked with `--cruft` and the `--max-pack-size` flag. Cruft packs are rewritten until they reach some size threshold, at which point they are considered "frozen", and will only be modified in a pruning GC, or if the threshold itself is adjusted. Prior to this patch, however, this process breaks down when we attempt to freshen an object packed in an earlier cruft pack, and that cruft pack is larger than the threshold and thus will survive the repack. When this is the case, it is impossible to freshen objects in cruft pack(s) when those cruft packs are larger than the threshold. This is because we would avoid writing them in the new cruft pack entirely, for a couple of reasons. 1. When enumerating packed objects via 'add_objects_in_unpacked_packs()' we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping over the packs we're going to retain (which are marked as kept in-core by 'read_cruft_objects()'). This means that we will avoid enumerating additional packed copies of objects found in any cruft packs which are larger than the given size threshold. Thus there is no opportunity to call 'create_object_entry()' whatsoever. 2. We likewise will discard the loose copy (if one exists) of any unreachable object packed in a cruft pack that is larger than the threshold. Here our call path is 'add_unreachable_loose_objects()', which uses the 'add_loose_object()' callback. That function will eventually land us in 'want_object_in_pack()' (via 'add_cruft_object_entry()'), and we'll discard the object as it appears in one of the packs which we marked as kept in-core. This means in effect that it is impossible to freshen an unreachable object once it appears in a cruft pack larger than the given threshold. Instead, we should pack an additional copy of an unreachable object we want to freshen even if it appears in a cruft pack, provided that the cruft copy has an mtime which is before the mtime of the copy we are trying to pack/freshen. This is sub-optimal in the sense that it requires keeping an additional copy of unreachable objects upon freshening, but we don't have a better alternative without the ability to make in-place modifications to existing *.mtimes files. In order to implement this, we have to adjust the behavior of 'want_found_object()'. When 'pack-objects' is told that we're *not* going to retain any cruft packs (i.e. the set of packs marked as kept in-core does not contain a cruft pack), the behavior is unchanged. But when there *is* at least one cruft pack that we're holding onto, it is no longer sufficient to reject a copy of an object found in that cruft pack for that reason alone. In this case, we only want to reject a candidate object when copies of that object either: - exists in a non-cruft pack that we are retaining, regardless of that pack's mtime, or - exists in a cruft pack with an mtime at least as recent as the copy we are debating whether or not to pack, in which case freshening would be redundant. To do this, keep track of whether or not we have any cruft packs in our in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft' flag. When we end up in this new special case, we replace a call to 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject objects when we have a copy in an existing cruft pack with at least as recent an mtime as our candidate (in which case "freshening" would be redundant). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28Merge branch 'jk/pack-header-parse-alignment-fix'Junio C Hamano
It was possible for "git unpack-objects" and "git index-pack" to make an unaligned access, which has been corrected. * jk/pack-header-parse-alignment-fix: index-pack, unpack-objects: use skip_prefix to avoid magic number index-pack, unpack-objects: use get_be32() for reading pack header parse_pack_header_option(): avoid unaligned memory writes packfile: factor out --pack_header argument parsing bswap.h: squelch potential sparse -Wcast-truncate warnings
2025-01-21packfile: factor out --pack_header argument parsingJeff King
Both index-pack and unpack-objects accept a --pack_header argument. This is an undocumented internal argument used by receive-pack and fetch to pass along information about the header of the pack, which they've already read from the incoming stream. In preparation for a bugfix, let's factor the duplicated code into a common helper. The callers are still responsible for identifying the option. While this could likewise be factored out, it is more flexible this way (e.g., if they ever started using parse-options and wanted to handle both the stuck and unstuck forms). Likewise, the callers are responsible for reporting errors, though they both just call die(). I've tweaked unpack-objects to match index-pack in marking the error for translation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04config: make `packed_git_(limit|window_size)` non-global variablesKarthik Nayak
The variables `packed_git_window_size` and `packed_git_limit` are global config variables used in the `packfile.c` file. Since it is only used in this file, let's change it from being a global config variable to a local variable for the subsystem. With this, we rid `packfile.c` from all global variable usage and this means we can also remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04packfile: pass down repository to `for_each_packed_object`Karthik Nayak
The function `for_each_packed_object` currently relies on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from this function and closely related function `is_promisor_object`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04packfile: pass down repository to `has_object[_kept]_pack`Karthik Nayak
The functions `has_object[_kept]_pack` currently rely on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from these functions and any related ones. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04packfile: pass down repository to `odb_pack_name`Karthik Nayak
The function `odb_pack_name` currently relies on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04packfile: add repository to struct `packed_git`Karthik Nayak
The struct `packed_git` holds information regarding a packed object file. Let's add the repository variable to this object, to represent the repository that this packfile belongs to. This helps remove dependency on the global `the_repository` object in `packfile.c` by simply using repository information now readily available in the struct. We do need to consider that a packfile could be part of the alternates of a repository, but considering that we only have one repository struct and also that we currently anyways use 'the_repository', we should be OK with this change. We also modify `alloc_packed_git` to ensure that the repository is added to newly created `packed_git` structs. This requires modifying the function and all its callee to pass the repository object down the levels. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>