aboutsummaryrefslogtreecommitdiff
path: root/builtin/index-pack.c
AgeCommit message (Collapse)Author
7 daysMerge branch 'ps/odb-cleanup'Junio C Hamano
Various code clean-up around odb subsystem. * ps/odb-cleanup: odb: drop unneeded headers and forward decls odb: rename `odb_has_object()` flags odb: use enum for `odb_write_object` flags odb: rename `odb_write_object()` flags treewide: use enum for `odb_for_each_object()` flags CodingGuidelines: document our style for flags
2026-03-31odb: rename `odb_has_object()` flagsPatrick Steinhardt
Rename `odb_has_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23fsck: store repository in fsck optionsPatrick Steinhardt
The fsck subsystem relies on `the_repository` quite a bit. While we could of course explicitly pass a repository down the callchain, we already have a `struct fsck_options` that we pass to almost all functions. Extend the options to also store the repository to make it readily available. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23fsck: initialize fsck options via a functionPatrick Steinhardt
We initialize the `struct fsck_options` via a set of macros, often in global scope. In the next commit though we're about to introduce a new repository field to the options that must be initialized, and naturally we don't have a repo other than `the_repository` available in this scope. Refactor the code to instead intrdouce a new `fsck_options_init()` function that initializes the options for us and move initialization into function scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05odb: embed base source in the "files" backendPatrick Steinhardt
The "files" backend is implemented as a pointer in the `struct odb_source`. This contradicts our typical pattern for pluggable backends like we use it for example in the ref store or for object database streams, where we typically embed the generic base structure in the specialized implementation. This pattern has a couple of small benefits: - We avoid an extra allocation. - We hide implementation details in the generic structure. - We can easily downcast from a generic backend to the specialized structure and vice versa because the offsets are known at compile time. - It becomes trivial to identify locations where we depend on backend specific logic because the cast needs to be explicit. Refactor our "files" object database source to do the same and embed the `struct odb_source` in the `struct odb_source_files`. There are still a bunch of sites in our code base where we do have to access internals of the "files" backend. The intent is that those will go away over time, but this will certainly take a while. Meanwhile, provide a `odb_source_files_downcast()` function that can convert a generic source into a "files" source. As we only have a single source the downcast succeeds unconditionally for now. Eventually though the intent is to make the cast `BUG()` in case the caller requests to downcast a non-"files" backend to a "files" backend. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-05odb: introduce "files" sourcePatrick Steinhardt
Introduce a new "files" object database source. This source encapsulates access to both loose object files and the packfile store, similar to how the "files" backend for refs encapsulates access to loose refs and the packed-refs file. Note that for now the "files" source is still a direct member of a `struct odb_source`. This architecture will be reversed in the next commit so that the files source contains a `struct odb_source`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: move packfile store into object sourcePatrick Steinhardt
The packfile store is a member of `struct object_database`, which means that we have a single store per database. This doesn't really make much sense though: each source connected to the database has its own set of packfiles, so there is a conceptual mismatch here. This hasn't really caused much of a problem in the past, but with the advent of pluggable object databases this is becoming more of a problem because some of the sources may not even use packfiles in the first place. Move the packfile store down by one level from the object database into the object database source. This ensures that each source now has its own packfile store, and we can eventually start to abstract it away entirely so that the caller doesn't even know what kind of store it uses. Note that we only need to adjust a relatively small number of callers, way less than one might expect. This is because most callers are using `repo_for_each_pack()`, which handles enumeration of all packfiles that exist in the repository. So for now, none of these callers need to be adapted. The remaining callers that iterate through the packfiles directly and that need adjustment are those that are a bit more tangled with packfiles. These will be adjusted over time. Note that this patch only moves the packfile store, and there is still a bunch of functions that seemingly operate on a packfile store but that end up iterating over all sources. These will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-15Merge branch 'ps/object-read-stream' into ps/packfile-store-in-odb-sourceJunio C Hamano
* ps/object-read-stream: streaming: drop redundant type and size pointers streaming: move into object database subsystem streaming: refactor interface to be object-database-centric streaming: move logic to read packed objects streams into backend streaming: move logic to read loose objects streams into backend streaming: make the `odb_read_stream` definition public streaming: get rid of `the_repository` streaming: rely on object sources to create object stream packfile: introduce function to read object info from a store streaming: move zlib stream into backends streaming: create structure for filtered object streams streaming: create structure for packed object streams streaming: create structure for loose object streams streaming: create structure for in-core object streams streaming: allocate stream inside the backend-specific logic streaming: explicitly pass packfile info when streaming a packed object streaming: propagate final object type via the stream streaming: drop the `open()` callback function streaming: rename `git_istream` into `odb_read_stream`
2025-11-25builtin/index-pack: fix deferred fsck outside reposPatrick Steinhardt
When asked to perform object consistency checks via the `--fsck-objects` flag we verify that each object part of the pack is valid. In general, this check can even be performed outside of a Git repository: we don't need an initialized object database as we simply read the object from the packfile directly. But there's one exception: a subset of the object checks may be deferred to a later point in time. For now, this only concerns ".gitmodules" and ".gitattributes" files: whenever we see a tree referencing these files we queue them for a deferred check. This is done because we need to do some extra checks for those files to ensure that they are well-formed, and these checks need to be done regardless of whether the corresponding blobs are part of the packfile or not. This works inside a repository, but unfortunately the logic leads to a segfault when running outside of one. This is because we eventually call `odb_read_object()`, which will crash because the object database has not been initialized. There's multiple options here: - We could in theory create a purely in-memory database with only a packfile store that contains the single packfile. We don't really have the infrastructure for this yet though, and it would end up being quite hacky. - We could refuse to perform consistency checks outside of a repository. But most of the checks work alright, so this would be a regression. - We can skip the finalizing consistency checks when running outside of a repository. This is not as invasive as skipping all checks, but it's not great to randomly skip a subset of tests, either. None of these options really feel perfect. The first one would be the obvious choice if easily possible. There's another option though: instead of skipping the final object checks, we can die if there are any queued object checks. With this change we now die exactly if and only if we would have previously segfaulted. Like this we ensure that objects that _may_ fail the consistency checks won't be silently skipped, and at the same time we give users a much better error message. Refactor the code accordingly and add a test that would have triggered the segfault. Note that we also move down the logic to add the packfile to the store. There is no point doing this any earlier than right before we execute `fsck_finish()`, and it ensures that the logic to set up and perform the consistency check is self-contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23streaming: drop redundant type and size pointersPatrick Steinhardt
In the preceding commits we have turned `struct odb_read_stream` into a publicly visible structure. Furthermore, this structure now contains the type and size of the object that we are about to stream. Consequently, the out-pointers that we used before to propagate the type and size of the streamed object are now somewhat redundant with the data contained in the structure itself. Drop these out-pointers and adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23streaming: move into object database subsystemPatrick Steinhardt
The "streaming" terminology is somewhat generic, so it may not be immediately obvious that "streaming.{c,h}" is specific to the object database. Rectify this by moving it into the "odb/" directory so that it can be immediately attributed to the object subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23streaming: refactor interface to be object-database-centricPatrick Steinhardt
Refactor the streaming interface to be centered around object databases instead of centered around the repository. Rename the functions accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-23streaming: rename `git_istream` into `odb_read_stream`Patrick Steinhardt
In the following patches we are about to make the `git_istream` more generic so that it becomes fully controlled by the specific object source that wants to create it. As part of these refactorings we'll fully move the structure into the object database subsystem. Prepare for this change by renaming the structure from `git_istream` to `odb_read_stream`. This mirrors the `odb_write_stream` structure that we already have. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: introduce function to load and add packfilesPatrick Steinhardt
We have a recurring pattern where we essentially perform an upsert of a packfile in case it isn't yet known by the packfile store. The logic to do so is non-trivial as we have to reconstruct the packfile's key, check the map of packfiles, then create the new packfile and finally add it to the store. Introduce a new function that does this dance for us. Refactor callsites to use it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: refactor `install_packed_git()` to work on packfile storePatrick Steinhardt
The `install_packed_git()` functions adds a packfile to a specific object store. Refactor it to accept a packfile store instead of a repository to clarify its scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-05Merge branch 'ps/object-file-wo-the-repository'Junio C Hamano
Reduce implicit assumption and dependence on the_repository in the object-file subsystem. * ps/object-file-wo-the-repository: object-file: get rid of `the_repository` in index-related functions object-file: get rid of `the_repository` in `force_object_loose()` object-file: get rid of `the_repository` in `read_loose_object()` object-file: get rid of `the_repository` in loose object iterators object-file: remove declaration for `for_each_file_in_obj_subdir()` object-file: inline `for_each_loose_file_in_objdir_buf()` object-file: get rid of `the_repository` when writing objects odb: introduce `odb_write_object()` loose: write loose objects map via their source object-file: get rid of `the_repository` in `finalize_object_file()` object-file: get rid of `the_repository` in `loose_object_info()` object-file: get rid of `the_repository` when freshening objects object-file: inline `check_and_freshen()` functions object-file: get rid of `the_repository` in `has_loose_object()` object-file: stop using `the_hash_algo` object-file: fix -Wsign-compare warnings
2025-07-23config: drop `git_config()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-17Merge branch 'bc/use-sha256-by-default-in-3.0' into ps/config-wo-the-repositoryJunio C Hamano
* bc/use-sha256-by-default-in-3.0: Enable SHA-256 by default in breaking changes mode help: add a build option for default hash t5300: choose the built-in hash outside of a repo t4042: choose the built-in hash outside of a repo t1007: choose the built-in hash outside of a repo t: default to compile-time default hash if not set setup: use the default algorithm to initialize repo format Use legacy hash for legacy formats builtin: use default hash when outside a repository hash: add a constant for the legacy hash algorithm hash: add a constant for the default hash algorithm
2025-07-16object-file: get rid of `the_repository` in `finalize_object_file()`Patrick Steinhardt
We implicitly depend on `the_repository` when moving an object file into place in `finalize_object_file()`. Get rid of this global dependency by passing in a repository. Note that one might be pressed to inject an object database instead of a repository. But the function doesn't really care about the ODB at all. All it does is to move a file into place while checking whether there is any collision. As such, the functionality it provides is independent of the object database and only needs the repository as parameter so that it can adjust permissions of the file we are about to finalize. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01builtin: use default hash when outside a repositorybrian m. carlson
We have some commands that can operate inside or outside a repository. If we're operating outside a repository, we clearly cannot use the repository's hash algorithm as a default since it doesn't exist, so instead, let's pick the default instead of specifically SHA-1. Right now this results in no functional change since the default is SHA-1, but that may change in the future. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `has_object()`Patrick Steinhardt
Rename `has_object()` to `odb_has_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `repo_read_object_file()`Patrick Steinhardt
Rename `repo_read_object_file()` to `odb_read_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `oid_object_info()`Patrick Steinhardt
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: get rid of `the_repository` in `odb_mkstemp()`Patrick Steinhardt
Get rid of our dependency on `the_repository` in `odb_mkstemp()` by passing in the object database as a parameter and adjusting all callers. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12Merge branch 'ds/fix-thin-fix'Junio C Hamano
"git index-pack --fix-thin" used to abort to prevent a cycle in delta chains from forming in a corner case even when there is no such cycle. * ds/fix-thin-fix: index-pack: allow revisiting REF_DELTA chains t5309: create failing test for 'git index-pack' test-tool: add pack-deltas helper
2025-04-29treewide: convert users of `repo_has_object_file()` to `has_object()`Patrick Steinhardt
As the comment of `repo_has_object_file()` and its `_with_flags()` variant tells us, these functions are considered to be deprecated in favor of `has_object()`. There are a couple of slight benefits in favor of the replacement: - The new function has a short-and-sweet name. - More explicit defaults: `has_object()` doesn't fetch missing objects via promisor remotes, and neither does it reload packfiles if an object wasn't found by default. This ensures that it becomes immediately obvious when a simple object existence check may result in expensive actions. Most importantly though, it is confusing that we have two sets of functions that ultimately do the same thing, but with different defaults. Start sunsetting `repo_has_object_file()` and its `_with_flags()` sibling by replacing all callsites with `has_object()`: - `repo_has_object_file(...)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., 0)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)` is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`. The replacements should be functionally equivalent. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29object-store: move and rename `odb_pack_keep()`Patrick Steinhardt
The function `odb_pack_keep()` creates a file at the passed-in path. If this fails, then the function re-tries by first creating any potentially missing leading directories and then trying to create the file once again. As such, this function doesn't host any kind of logic that is specific to the object store, but is rather a generic helper function. Rename the function to `safe_create_file_with_leading_directories()` and move it into "path.c". While at it, refactor it so that it loses its dependency on `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-28index-pack: allow revisiting REF_DELTA chainsDerrick Stolee
As detailed in the previous changes to t5309-pack-delta-cycles.sh, the logic within 'git index-pack' to analyze an incoming thin packfile with REF_DELTAs is suspect. The algorithm is overly cautious around delta cycles, and that leads in fact to failing even when there is no cycle. This change adjusts the algorithm to no longer fail in these cases. In fact, these cycle cases will no longer fail but more importantly the valid cases will no longer fail, either. The resulting packfile from the --fix-thin operation will not have cycles either since REF_DELTAs are forbidden from the on-disk format and OFS_DELTAs are impossible to write as a cycle. The crux of the matter is how the algorithm works when the REF_DELTAs point to base objects that exist in the local repository. When reading the thin packfile, the object IDs for the delta objects are unknown so we do not have the delta chain structure automatically. Instead, we need to start somewhere by selecting a delta whose base is inside our current object database. Consider the case where the packfile has two REF_DELTA objects, A and B, and the delta chain looks like "A depends on B" and "B depends on C" for some third object C, where C is already in the current repository. The algorithm _should_ start with all objects that depend on C, finding B, and then moving on to all objects depending on B, finding A. However, if the repository also already has object B, then the delta chain can be analyzed in a different order. The deltas with base B can be analyzed first, finding A, and then the deltas with base C are analyzed, finding B. The algorithm currently continues to look for objects that depend on B, finding A again. This fails due to A's 'real_type' member already being overwritten from OBJ_REF_DELTA to the correct object type. This scenario is possible in a typical 'git fetch' where the client does not advertise B as a 'have' but requests A as a 'want' (and C is noticed as a common object based on other 'have's). The reason this isn't typically seen is that most Git servers use OFS_DELTAs to represent deltas within a packfile. However, if a server uses only REF_DELTAs, then this kind of issue can occur. There is nothing in the explicit packfile format that states this use of inter-pack REF_DELTA is incorrect, only that REF_DELTAs should not be used in the on-disk representation to avoid cycles. This die() was introduced in ab791dd138 (index-pack: fix race condition with duplicate bases, 2014-08-29). Several refactors have adjusted the error message and the surrounding logic, but this issue has existed for a longer time as that was only a conversion from an assert(). The tests in t5309 originated in 3b910d0c5e (add tests for indexing packs with delta cycles, 2013-08-23) and b2ef3d9ebb (test index-pack on packs with recoverable delta cycles, 2013-08-23). These changes make note that the current behavior of handling "resolvable" cycles is mostly a documentation-only test, not that this behavior is the best way for Git to handle the situation. The fix here is somewhat complicated due to the amount of state being adjusted by the loop within threaded_second_pass(). Instead of trying to resume the start of the loop while adjusting the necessary context, I chose to scan the REF_DELTAs depending on the current 'parent' and skip any that have already been processed. This necessarily leaves us in a state where 'child' and 'child_obj' could be left as NULL and that must be handled later. There is also some careful handling around skipping REF_DELTAs when there are also OFS_DELTAs depending on that parent. There may be value in extending 'test-tool pack-deltas' to allow writing OFS_DELTAs in order to exercise this logic across the delta types. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15Merge branch 'ps/object-wo-the-repository'Junio C Hamano
The object layer has been updated to take an explicit repository instance as a parameter in more code paths. * ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08Merge branch 'ps/object-wo-the-repository' into ps/object-file-cleanupJunio C Hamano
* ps/object-wo-the-repository: hash: stop depending on `the_repository` in `null_oid()` hash: fix "-Wsign-compare" warnings object-file: split out logic regarding hash algorithms delta-islands: stop depending on `the_repository` object-file-convert: stop depending on `the_repository` pack-bitmap-write: stop depending on `the_repository` pack-revindex: stop depending on `the_repository` pack-check: stop depending on `the_repository` environment: move access to "core.bigFileThreshold" into repo settings pack-write: stop depending on `the_repository` and `the_hash_algo` object: stop depending on `the_repository` csum-file: stop depending on `the_repository`
2025-04-07Merge branch 'jh/hash-init-fixes'Junio C Hamano
An earlier code refactoring of the hash machinery missed a few required calls to init_fn. * jh/hash-init-fixes: index-pack, unpack-objects: restore missing ->init_fn
2025-03-18index-pack, unpack-objects: restore missing ->init_fnJensen Huang
Commit 0578f1e66a ("global: adapt callers to use generic hash context helpers") accidentally removed `->init_fn`, which is required for OpenSSL 3+ SHA1. This fixes the following error on fetch: fatal: fetch-pack: invalid index-pack output Signed-off-by: Jensen Huang <hmz007@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10environment: move access to "core.bigFileThreshold" into repo settingsPatrick Steinhardt
The "core.bigFileThreshold" setting is stored in a global variable and populated via `git_default_core_config()`. This may cause issues in the case where one is handling multiple different repositories in a single process with different values for that config key, as we may or may not see the correct value in that case. Furthermore, global state blocks our path towards libification. Refactor the code so that we instead store the value in `struct repo_settings`, where the value is computed as-needed and cached. Note that this change requires us to adapt one test in t1050 that verifies that we die when parsing an invalid "core.bigFileThreshold" value. The exercised Git command doesn't use the value at all, and thus it won't hit the new code path that parses the value. This is addressed by using git-hash-object(1) instead, which does read the value. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10pack-write: stop depending on `the_repository` and `the_hash_algo`Patrick Steinhardt
There are a couple of functions in "pack-write.c" that implicitly depend on `the_repository` or `the_hash_algo`. Remove this dependency by injecting the repository via a parameter and adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10object: stop depending on `the_repository`Patrick Steinhardt
There are a couple of functions exposed by "object.c" that implicitly depend on `the_repository`. Remove this dependency by injecting the repository via a parameter. Adapt callers accordingly by simply using `the_repository`, except in cases where the subsystem is already free of the repository. In that case, we instead pass the repository provided by the caller's context. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10csum-file: stop depending on `the_repository`Patrick Steinhardt
There are multiple sites in "csum-file.c" where we use the global `the_repository` variable, either explicitly or implicitly by using `the_hash_algo`. Refactor the code to stop using `the_repository` by adapting functions to receive required data as parameters. Adapt callsites accordingly by either using `the_repository->hash_algo`, or by using a context-provided hash algorithm in case the subsystem already got rid of its dependency on `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-10Merge branch 'ps/hash-cleanup'Junio C Hamano
Further code clean-up on the use of hash functions. Now the context object knows what hash function it is working with. * ps/hash-cleanup: global: adapt callers to use generic hash context helpers hash: provide generic wrappers to update hash contexts hash: stop typedeffing the hash context hash: convert hashing context to a structure
2025-02-03Merge branch 'kn/pack-write-with-reduced-globals'Junio C Hamano
Code clean-up. * kn/pack-write-with-reduced-globals: pack-write: pass hash_algo to internal functions pack-write: pass hash_algo to `write_rev_file()` pack-write: pass hash_algo to `write_idx_file()` pack-write: pass repository to `index_pack_lockfile()` pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-01-31global: adapt callers to use generic hash context helpersPatrick Steinhardt
Adapt callers to use generic hash context helpers instead of using the hash algorithm to update them. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. And as a nice side effect this also gets rid of a bunch of users of `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31hash: stop typedeffing the hash contextPatrick Steinhardt
We generally avoid using `typedef` in the Git codebase. One exception though is the `git_hash_ctx`, likely because it used to be a union rather than a struct until the preceding commit refactored it. But now that it is a normal `struct` there isn't really a need for a typedef anymore. Drop the typedef and adapt all callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28Merge branch 'jk/pack-header-parse-alignment-fix'Junio C Hamano
It was possible for "git unpack-objects" and "git index-pack" to make an unaligned access, which has been corrected. * jk/pack-header-parse-alignment-fix: index-pack, unpack-objects: use skip_prefix to avoid magic number index-pack, unpack-objects: use get_be32() for reading pack header parse_pack_header_option(): avoid unaligned memory writes packfile: factor out --pack_header argument parsing bswap.h: squelch potential sparse -Wcast-truncate warnings
2025-01-28Merge branch 'jc/show-usage-help'Junio C Hamano
The help text from "git $cmd -h" appear on the standard output for some $cmd and the standard error for others. The built-in commands have been fixed to show them on the standard output consistently. * jc/show-usage-help: builtin: send usage() help text to standard output oddballs: send usage() help text to standard output builtins: send usage_with_options() help text to standard output usage: add show_usage_if_asked() parse-options: add show_usage_with_options_if_asked() t0012: optionally check that "-h" output goes to stdout
2025-01-21pack-write: pass hash_algo to `write_rev_file()`Karthik Nayak
The `write_rev_file()` function uses the global `the_hash_algo` variable to access the repository's hash_algo. To avoid global variable usage, pass a hash_algo from the layers above. Also modify children functions `write_rev_file_order()` and `write_rev_header()` to accept 'the_hash_algo'. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. However, in `midx-write.c`, since all usage of global variables is removed, don't reintroduce them and instead use the `repo` available in the context. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21pack-write: pass hash_algo to `write_idx_file()`Karthik Nayak
The `write_idx_file()` function uses the global `the_hash_algo` variable to access the repository's hash_algo. To avoid global variable usage, pass a hash_algo from the layers above. Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls `write_idx_file()`, update it to accept a `struct git_hash_algo` as a parameter and pass it through to the callee. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21pack-write: pass hash_algo to `fixup_pack_header_footer()`Karthik Nayak
The `fixup_pack_header_footer()` function uses the global `the_hash_algo` variable to access the repository's hash function. To avoid global variable usage, pass a hash_algo from the layers above. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21Merge branch 'ps/the-repository'Junio C Hamano
More code paths have a repository passed through the callchain, instead of assuming the primary the_repository object. * ps/the-repository: match-trees: stop using `the_repository` graph: stop using `the_repository` add-interactive: stop using `the_repository` tmp-objdir: stop using `the_repository` resolve-undo: stop using `the_repository` credential: stop using `the_repository` mailinfo: stop using `the_repository` diagnose: stop using `the_repository` server-info: stop using `the_repository` send-pack: stop using `the_repository` serve: stop using `the_repository` trace: stop using `the_repository` pager: stop using `the_repository` progress: stop using `the_repository`
2025-01-21index-pack, unpack-objects: use skip_prefix to avoid magic numberJeff King
When parsing --pack_header=, we manually skip 14 bytes to the data. Let's use skip_prefix() to do this automatically. Note that we overwrite our pointer to the front of the string, so we have to add more context to the error message. We could avoid this by declaring an extra pointer to hold the value, but I think the modified message is actually preferable; it should give translators a bit more context. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21index-pack, unpack-objects: use get_be32() for reading pack headerJeff King
Both of these commands read the incoming pack into a static unsigned char buffer in BSS, and then parse it by casting the start of the buffer to a struct pack_header. This can result in SIGBUS on some platforms if the compiler doesn't place the buffer in a position that is properly aligned for 4-byte integers. This reportedly happens with unpack-objects (but not index-pack) on sparc64 when compiled with clang (but not gcc). But we are definitely in the wrong in both spots; since the buffer's type is unsigned char, we can't depend on larger alignment. When it works it is only because we are lucky. We'll fix this by switching to get_be32() to read the headers (just like the last few commits similarly switched us to put_be32() for writing into the same buffer). It would be nice to factor this out into a common helper function, but the interface ends up quite awkward. Either the caller needs to hardcode how many bytes we'll need, or it needs to pass us its fill()/use() functions as pointers. So I've just fixed both spots in the same way; this is not code that is likely to be repeated a third time (most of the pack reading code uses an mmap'd buffer, which should be properly aligned). I did make one tweak to the shared code: our pack_version_ok() macro expects us to pass the big-endian value we'd get by casting. We can introduce a "native" variant which uses the host integer ordering. Reported-by: Koakuma <koachan@protonmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>