aboutsummaryrefslogtreecommitdiff
path: root/fetch-pack.c
AgeCommit message (Collapse)Author
7 daysMerge branch 'ps/odb-cleanup'Junio C Hamano
Various code clean-up around odb subsystem. * ps/odb-cleanup: odb: drop unneeded headers and forward decls odb: rename `odb_has_object()` flags odb: use enum for `odb_write_object` flags odb: rename `odb_write_object()` flags treewide: use enum for `odb_for_each_object()` flags CodingGuidelines: document our style for flags
7 daysMerge branch 'ps/fsck-wo-the-repository'Junio C Hamano
Internals of "git fsck" have been refactored to not depend on the global `the_repository` variable. * ps/fsck-wo-the-repository: builtin/fsck: stop using `the_repository` in error reporting builtin/fsck: stop using `the_repository` when marking objects builtin/fsck: stop using `the_repository` when checking packed objects builtin/fsck: stop using `the_repository` with loose objects builtin/fsck: stop using `the_repository` when checking reflogs builtin/fsck: stop using `the_repository` when checking refs builtin/fsck: stop using `the_repository` when snapshotting refs builtin/fsck: fix trivial dependence on `the_repository` fsck: drop USE_THE_REPOSITORY fsck: store repository in fsck options fsck: initialize fsck options via a function fetch-pack: move fsck options into function scope
2026-03-31odb: rename `odb_has_object()` flagsPatrick Steinhardt
Rename `odb_has_object()` flags to be properly prefixed with the function name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-24use strvec_pushv() to add another strvecJunio C Hamano
Add and apply a semantic patch that simplifies the code by letting strvec_pushv() append the items of a second strvec instead of pushing them one by one. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23fsck: store repository in fsck optionsPatrick Steinhardt
The fsck subsystem relies on `the_repository` quite a bit. While we could of course explicitly pass a repository down the callchain, we already have a `struct fsck_options` that we pass to almost all functions. Extend the options to also store the repository to make it readily available. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23fsck: initialize fsck options via a functionPatrick Steinhardt
We initialize the `struct fsck_options` via a set of macros, often in global scope. In the next commit though we're about to introduce a new repository field to the options that must be initialized, and naturally we don't have a repo other than `the_repository` available in this scope. Refactor the code to instead intrdouce a new `fsck_options_init()` function that initializes the options for us and move initialization into function scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23fetch-pack: move fsck options into function scopePatrick Steinhardt
When fetching a packfile, we optionally verify received objects via the fsck subsystem. The options for those consistency checks are declared in global scope without a good reason, and they are never cleaned up. So in case the options are reused, they may accumulate more state over time. Furthermore, in subsequent changes we'll introduce a repository pointer into the structure. Obviously though, we don't have a repository available at static time, except for `the_repository`, which we don't want to use here. Refactor the code to move the options into the respective functions and properly manage their lifecycle. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-09Merge branch 'ps/refs-for-each'Junio C Hamano
Code refactoring around refs-for-each-* API functions. * ps/refs-for-each: refs: replace `refs_for_each_fullref_in()` refs: replace `refs_for_each_namespaced_ref()` refs: replace `refs_for_each_glob_ref()` refs: replace `refs_for_each_glob_ref_in()` refs: replace `refs_for_each_rawref_in()` refs: replace `refs_for_each_rawref()` refs: replace `refs_for_each_ref_in()` refs: improve verification for-each-ref options refs: generalize `refs_for_each_fullref_in_prefixes()` refs: generalize `refs_for_each_namespaced_ref()` refs: speed up `refs_for_each_glob_ref_in()` refs: introduce `refs_for_each_ref_ext` refs: rename `each_ref_fn` refs: rename `do_for_each_ref_flags` refs: move `do_for_each_ref_flags` further up refs: move `refs_head_ref_namespaced()` refs: remove unused `refs_for_each_include_root_ref()`
2026-02-23refs: replace `refs_for_each_rawref()`Patrick Steinhardt
Replace calls to `refs_for_each_rawref()` with the newly introduced `refs_for_each_ref_ext()` function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-17fetch-pack: wire up and enable auto filter logicChristian Couder
Previous commits have set up an infrastructure for `--filter=auto` to automatically prepare a partial clone filter based on what the server advertised and the client accepted. Using that infrastructure, let's now enable the `--filter=auto` option in `git clone` and `git fetch` by setting `allow_auto_filter` to 1. Note that these small changes mean that when `git clone --filter=auto` or `git fetch --filter=auto` are used, "auto" is automatically saved as the partial clone filter for the server on the client. Therefore subsequent calls to `git fetch` on the client will automatically use this "auto" mode even without `--filter=auto`. Let's also set `allow_auto_filter` to 1 in `transport.c`, as the transport layer must be able to accept the "auto" filter spec even if the invoking command hasn't fully parsed it yet. When an "auto" filter is requested, let's have the "fetch-pack.c" code in `do_fetch_pack_v2()` compute a filter and send it to the server. In `do_fetch_pack_v2()` the logic also needs to check for the "promisor-remote" capability and call `promisor_remote_reply()` to parse advertised remotes and populate the list of those accepted (and their filters). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-05Merge branch 'jc/optional-path'Junio C Hamano
"git config get --path" segfaulted on an ":(optional)path" that does not exist, which has been corrected. * jc/optional-path: config: really treat missing optional path as not configured config: really pretend missing :(optional) value is not there config: mark otherwise unused function as file-scope static
2025-11-24config: really treat missing optional path as not configuredJunio C Hamano
These callers expect that git_config_pathname() that returns 0 is a signal that the variable they passed has a string they need to act on. But with the introduction of ":(optional)path" earlier, that is no longer the case. If the path specified by the configuration variable is missing, their variable will get a NULL in it, and they need to act on it (often, just refraining from copying it elsewhere). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: introduce wrapper struct for `each_ref_fn`Patrick Steinhardt
The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through reference. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-24packfile: split up responsibilities of `reprepare_packed_git()`Patrick Steinhardt
In `reprepare_packed_git()` we perform a couple of operations: - We reload alternate object directories. - We clear the loose object cache. - We reprepare packfiles. While the logic is hosted in "packfile.c", it clearly reaches into other subsystems that aren't related to packfiles. Split up the responsibility and introduce `odb_reprepare()` which now becomes responsible for repreparing the whole object database. The existing `reprepare_packed_git()` function is refactored accordingly and only cares about reloading the packfile store now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-28Merge branch 'jk/fetch-check-graph-objects-fix'Junio C Hamano
Under a race against another process that is repacking the repository, especially a partially cloned one, "git fetch" may mistakenly think some objects we do have are missing, which has been corrected. * jk/fetch-check-graph-objects-fix: fetch-pack: re-scan when double-checking graph objects
2025-08-25fetch-pack: re-scan when double-checking graph objectsJeff King
The fetch code tries to avoid asking the remote side for an object we already have. It does this by traversing recent commits reachable from our refs looking for matches. Commit 5d4cc78f72 (fetch-pack: die if in commit graph but not obj db, 2024-11-05) introduced an extra check there: if we think we have an object because it's in the commit graph, we double-check that we actually have it in our object database with a call to odb_has_object(). But that call does not pass any flags, and so the function won't call reprepared_packed_git() if it does not find the object. That opens us up to the usual race against some other process repacking the odb: 1. We scan the list of packs in objects/pack but haven't yet opened them. 2. Somebody else packs the object into a new pack (which we don't know about), and deletes the old pack it was in. 3. Our odb_has_object() calls tries to open that old pack, but finds it is gone. We declare that we don't have the object. And this causes us to erroneously complain and abort the fetch, thinking our commit-graph and object database are out of sync. Instead, we should pass HAS_OBJECT_RECHECK_PACKED, which will add a new step: 4. We re-scan the pack directory again, find the new pack, and locate the object. Often the fetch code tries to avoid these kinds of re-scans if it's likely that we won't have the object. If the other side has told us about object X and we want to know if we have it, we'll skip the re-scan (to avoid spending a lot of effort when there are many such objects). We can accept the racy false negative in that case because the worst case is that we ask the other side to send us the object. But this is not one of those cases. These are objects which are accessible from _our_ refs, and which we already found in the commit graph file. We should have them, and if we don't, we'll die() immediately. So the performance impact is negligible, and getting the right answer is important. There's no test here because it's inherently racy. In fact, I had trouble even developing a minimal test. The problem seen in the wild can be produced like this: # Any git.git mirror which supports partial clones; I think this # should work with any repo that contains submodules, but note that # $obj below is specific to this repo url=https://github.com/git/git.git # This is a commit that is not at the tip of any branches (so after # we have it, we'll still have some commits to fetch). obj=cf6f63ea6bf35173e02e18bdc6a4ba41288acff9 git init git fetch --filter=tree:0 $url $obj:refs/heads/foo git checkout foo git commit-graph write --reachable git fetch $url What happens here is that the initial fetch grabs that older commit (and its ancestors) but no trees or blobs, and the subsequent checkout grabs the necessary trees and blobs just for that commit. The final fetch spawns a long sequence of child fetches due to fetch_submodules(), which wants to check whether there have been any gitlink modifications which should trigger a fetch of the related submodule (we'll leave aside the irony that we did not even check out any submodules yet). That series of fetches causes us to accumulate packs, which eventually triggers background maintenance to run. That repacks all-into-one, and the pack containing $obj goes away in favor of a new pack. And then the fetch eventually fails with: fatal: You are attempting to fetch cf6f63ea6bf35173e02e18bdc6a4ba41288acff9, which is in the commit graph file but not in the object database. In the scenario above, the race becomes likely because of the long series of quick fetches. But I _think_ the bug is independent of partial clones entirely, and you could run into the same thing with a single fetch, some other process running "git repack" simultaneously, and a bit of bad luck. I haven't been able to reproduce, though. I'm not sure if that's because there's some mis-analysis above, or if the race window is just small enough that it's hard to trigger. At any rate, re-scanning here seems like an obviously correct thing to do with no downside, and it does fix the partial-clone case shown above. Reported-by: Дилян Палаузов <dilyan.palauzov@aegee.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-21Merge branch 'jc/string-list-split'Junio C Hamano
string_list_split*() family of functions have been extended to simplify common use cases. * jc/string-list-split: string-list: split-then-remove-empty can be done while splitting string-list: optionally omit empty string pieces in string_list_split*() diff: simplify parsing of diff.colormovedws string-list: optionally trim string pieces split by string_list_split*() string-list: unify string_list_split* functions string-list: align string_list_split() with its _in_place() counterpart string-list: report programming error with BUG
2025-08-04Merge branch 'ps/config-wo-the-repository'Junio C Hamano
The config API had a set of convenience wrapper functions that implicitly use the_repository instance; they have been removed and inlined at the calling sites. * ps/config-wo-the-repository: (21 commits) config: fix sign comparison warnings config: move Git config parsing into "environment.c" config: remove unused `the_repository` wrappers config: drop `git_config_set_multivar()` wrapper config: drop `git_config_get_multivar_gently()` wrapper config: drop `git_config_set_multivar_in_file_gently()` wrapper config: drop `git_config_set_in_file_gently()` wrapper config: drop `git_config_set()` wrapper config: drop `git_config_set_gently()` wrapper config: drop `git_config_set_in_file()` wrapper config: drop `git_config_get_bool()` wrapper config: drop `git_config_get_ulong()` wrapper config: drop `git_config_get_int()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string_multi()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get()` wrapper config: drop `git_config_clear()` wrapper ...
2025-08-02string-list: align string_list_split() with its _in_place() counterpartJunio C Hamano
The string_list_split_in_place() function was updated by 52acddf3 (string-list: multi-delimiter `string_list_split_in_place()`, 2023-04-24) to take more than one delimiter characters, hoping that we can later use it to replace our uses of strtok(). We however did not make a matching change to the string_list_split() function, which is very similar. Before giving both functions more features in future commits, allow string_list_split() to also take more than one delimiter characters to make them closer to each other. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-28Merge branch 'rs/pop-recent-commit-with-prio-queue'Junio C Hamano
The pop_most_recent_commit() function can have quite expensive worst case performance characteristics, which has been optimized by using prio-queue data structure. * rs/pop-recent-commit-with-prio-queue: commit: use prio_queue_replace() in pop_most_recent_commit() prio-queue: add prio_queue_replace() commit: convert pop_most_recent_commit() to prio_queue
2025-07-23config: drop `git_config_get_bool()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_bool()`. All callsites are adjusted so that they use `repo_config_get_bool(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_int()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_int()`. All callsites are adjusted so that they use `repo_config_get_int(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config_get_string()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_string()`. All callsites are adjusted so that they use `repo_config_get_string(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-22commit: convert pop_most_recent_commit() to prio_queueRené Scharfe
pop_most_recent_commit() calls commit_list_insert_by_date() for parent commits, which is itself called in a loop. This can lead to quadratic complexity if there are many merges. Replace the commit_list with a prio_queue to ensure logarithmic worst case complexity and convert all three users. Add a performance test that exercises one of them using a pathological history that consists of 50% merges and 50% root commits to demonstrate the speedup: Test v2.50.1 HEAD ---------------------------------------------------------------------- 1501.2: rev-parse ':/65535' 2.48(2.47+0.00) 0.20(0.19+0.00) -91.9% Alas, sane histories don't benefit from the conversion much, and traversing Git's own history takes a 1% performance hit on my machine: $ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision' Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s] Range (min … max): 1.067 s … 1.078 s 10 runs Benchmark 2: ./git rev-parse :/^Initial.revision Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s] Range (min … max): 1.074 s … 1.083 s 10 runs Summary ./git_2.50.1 rev-parse :/^Initial.revision ran 1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-21Merge branch 'bc/use-sha256-by-default-in-3.0'Junio C Hamano
Prepare to flip the default hash function to SHA-256. * bc/use-sha256-by-default-in-3.0: Enable SHA-256 by default in breaking changes mode help: add a build option for default hash t5300: choose the built-in hash outside of a repo t4042: choose the built-in hash outside of a repo t1007: choose the built-in hash outside of a repo t: default to compile-time default hash if not set setup: use the default algorithm to initialize repo format Use legacy hash for legacy formats builtin: use default hash when outside a repository hash: add a constant for the legacy hash algorithm hash: add a constant for the default hash algorithm
2025-07-17Merge branch 'bc/use-sha256-by-default-in-3.0' into ps/config-wo-the-repositoryJunio C Hamano
* bc/use-sha256-by-default-in-3.0: Enable SHA-256 by default in breaking changes mode help: add a build option for default hash t5300: choose the built-in hash outside of a repo t4042: choose the built-in hash outside of a repo t1007: choose the built-in hash outside of a repo t: default to compile-time default hash if not set setup: use the default algorithm to initialize repo format Use legacy hash for legacy formats builtin: use default hash when outside a repository hash: add a constant for the legacy hash algorithm hash: add a constant for the default hash algorithm
2025-07-01Use legacy hash for legacy formatsbrian m. carlson
We have a large variety of data formats and protocols where no hash algorithm was defined and the default was assumed to always be SHA-1. Instead of explicitly stating SHA-1, let's use the constant to represent the legacy hash algorithm (which is still SHA-1) so that it's clear for documentary purposes that it's a legacy fallback option and not an intentional choice to use SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `has_object()`Patrick Steinhardt
Rename `has_object()` to `odb_has_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `oid_object_info()`Patrick Steinhardt
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: get rid of `the_repository` in `for_each()` functionsPatrick Steinhardt
There are a couple of iterator-style functions that execute a callback for each instance of a given set, all of which currently depend on `the_repository`. Refactor them to instead take an object database as parameter so that we can get rid of this dependency. Rename the functions accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29treewide: convert users of `repo_has_object_file()` to `has_object()`Patrick Steinhardt
As the comment of `repo_has_object_file()` and its `_with_flags()` variant tells us, these functions are considered to be deprecated in favor of `has_object()`. There are a couple of slight benefits in favor of the replacement: - The new function has a short-and-sweet name. - More explicit defaults: `has_object()` doesn't fetch missing objects via promisor remotes, and neither does it reload packfiles if an object wasn't found by default. This ensures that it becomes immediately obvious when a simple object existence check may result in expensive actions. Most importantly though, it is confusing that we have two sets of functions that ultimately do the same thing, but with different defaults. Start sunsetting `repo_has_object_file()` and its `_with_flags()` sibling by replacing all callsites with `has_object()`: - `repo_has_object_file(...)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., 0)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`. - `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)` is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`. The replacements should be functionally equivalent. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'kn/pack-write-with-reduced-globals'Junio C Hamano
Code clean-up. * kn/pack-write-with-reduced-globals: pack-write: pass hash_algo to internal functions pack-write: pass hash_algo to `write_rev_file()` pack-write: pass hash_algo to `write_idx_file()` pack-write: pass repository to `index_pack_lockfile()` pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-01-21pack-write: pass repository to `index_pack_lockfile()`Karthik Nayak
The `index_pack_lockfile()` function uses the global `the_repository` variable to access the repository. To avoid global variable usage, pass the repository from the layers above. Altough the layers above could have access to the repository internally, simply pass in `the_repository`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-07fsck: reject misconfigured fsck.skipListJustin Tobler
In Git, fsck operations can ignore known broken objects via the `fsck.skipList` configuration. This option expects a path to a file with the list of object names. When the configuration is specified without a path, an error message is printed, but the command continues as if the configuration was not set. Configuring `fsck.skipList` without a value is a misconfiguration so config parsing should be more strict and reject it. Update `git_fsck_config()` to no longer ignore misconfiguration of `fsck.skipList`. The same behavior is also present for `fetch.fsck.skipList` and `receive.fsck.skipList` so the configuration parsers for these are updated to ensure the related operations remain consistent. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-28fetch-pack: split out fsck config parsingJustin Tobler
When `fetch_pack_config()` is invoked, fetch-pack configuration is parsed from the config. As part of this operation, fsck message severity configuration is assigned to the `fsck_msg_types` global variable. This is optionally used to configure the downstream git-index-pack(1) when the `--strict` option is specified. The same parsed fsck message severity configuration is also needed outside of fetch-pack. Instead of exposing/relying on the existing global state, split out the fsck config parsing logic into `fetch_pack_fsck_config()` and expose it. In a subsequent commit, this is used to provide fsck configuration when invoking `unbundle()`. For `fetch_pack_fsck_config()` to discern between errors and unhandled config variables, the return code when `git_config_path()` errors is changed to a different value also indicating success. This frees up the previous return code to now indicate the provided config variable was unhandled. The behavior remains functionally the same. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-05fetch-pack: die if in commit graph but not obj dbJonathan Tan
When fetching, there is a step in which sought objects are first checked against the local repository; only objects that are not in the local repository are then fetched. This check first looks up the commit graph file, and returns "present" if the object is in there. However, the action of first looking up the commit graph file is not done everywhere in Git, especially if the type of the object at the time of lookup is not known. This means that in a repo corruption situation, a user may encounter an "object missing" error, attempt to fetch it, and still encounter the same error later when they reattempt their original action, because the object is present in the commit graph file but not in the object DB. Therefore, make it a fatal error when this occurs. (Note that we cannot proceed to include this object in the list of objects to be fetched without changing at least the fetch negotiation code: what would happen is that the client will send "want X" and "have X" and when I tested at $DAYJOB with a work server that uses JGit, the server reasonably returned an empty packfile. And changing the fetch negotiation code to only use the object DB when deciding what to report as "have" would be an unnecessary slowdown, I think.) This was discovered when a lazy fetch of a missing commit completed with nothing actually fetched, and the writing of the commit graph file after every fetch then attempted to read said missing commit, triggering a lazy fetch of said missing commit, resulting in an infinite loop with no user-visible indication (until they check the list of processes running on their computer). With this fix, there is no infinite loop. Note that although the repo corruption we discovered was caused by a bug in GC in a partial clone, the behavior that this patch teaches Git to warn about applies to any repo with commit graph enabled and with a missing commit, whether it is a partial clone or not. t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in lookup_commit_in_graph(), 2022-07-01), tests that an interaction between fetch and the commit graph does not cause an infinite loop. This patch changes the exit code in that situation, so that test had to be changed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-05Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"Jonathan Tan
This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb. This revert simplifies the next patch in this patch set. The commit message of that commit mentions that the new function "will be used for the bundle-uri client in a subsequent commit", but it seems that eventually it wasn't used. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-23Merge branch 'ps/environ-wo-the-repository'Junio C Hamano
Code clean-up. * ps/environ-wo-the-repository: (21 commits) environment: stop storing "core.notesRef" globally environment: stop storing "core.warnAmbiguousRefs" globally environment: stop storing "core.preferSymlinkRefs" globally environment: stop storing "core.logAllRefUpdates" globally refs: stop modifying global `log_all_ref_updates` variable branch: stop modifying `log_all_ref_updates` variable repo-settings: track defaults close to `struct repo_settings` repo-settings: split out declarations into a standalone header environment: guard state depending on a repository environment: reorder header to split out `the_repository`-free section environment: move `set_git_dir()` and related into setup layer environment: make `get_git_namespace()` self-contained environment: move object database functions into object layer config: make dependency on repo in `read_early_config()` explicit config: document `read_early_config()` and `read_very_early_config()` environment: make `get_git_work_tree()` accept a repository environment: make `get_graft_file()` accept a repository environment: make `get_index_file()` accept a repository environment: make `get_object_directory()` accept a repository environment: make `get_git_common_dir()` accept a repository ...
2024-09-20Merge branch 'ps/leakfixes-part-6'Junio C Hamano
More leakfixes. * ps/leakfixes-part-6: (22 commits) builtin/repack: fix leaking keep-pack list merge-ort: fix two leaks when handling directory rename modifications match-trees: fix leaking prefixes in `shift_tree()` builtin/fmt-merge-msg: fix leaking buffers builtin/grep: fix leaking object context builtin/pack-objects: plug leaking list of keep-packs builtin/repack: fix leaking line buffer when packing promisors negotiator/skipping: fix leaking commit entries shallow: fix leaking members of `struct shallow_info` shallow: free grafts when unregistering them object: clear grafts when clearing parsed object pool gpg-interface: fix misdesigned signing key interfaces send-pack: fix leaking push cert nonce remote: fix leak in reachability check of a remote-tracking ref remote: fix leaking tracking refs builtin/submodule--helper: fix leaking refs on push-check submodule: fix leaking fetch task data upload-pack: fix leaking child process data on reachability checks builtin/push: fix leaking refspec query result send-pack: fix leaking common object IDs ...
2024-09-12environment: make `get_object_directory()` accept a repositoryPatrick Steinhardt
The `get_object_directory()` function retrieves the path to the object directory for `the_repository`. Make it accept a `struct repository` such that it can work on arbitrary repositories and make it part of the repository subsystem. This reduces our reliance on `the_repository` and clarifies scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05drop trailing newline from warning/error/die messagesJeff King
Our error reporting routines append a trailing newline, and the strings we pass to them should not include them (otherwise we get an extra blank line after the message). These cases were all found by looking at the results of: git grep -P '[^_](error|error_errno|warning|die|die_errno)\(.*\\n"[,)]' '*.c' Note that we _do_ sometimes include a newline in the middle of such messages, to create multiline output (hence our grep matching "," or ")" after we see the newline, so we know we're at the end of the string). It's possible that one or more of these cases could intentionally be including a blank line at the end, but having looked at them all manually, I think these are all just mistakes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05fetch-pack: fix memory leaks on fetch negotiationPatrick Steinhardt
We leak both the `nt_object_array` and `negotiator` structures in `negotiate_using_fetch()`. Plug both of these leaks. These leaks were exposed by t5516, but fixing them is not sufficient to make the whole test suite leak free. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09refs: add referent to each_ref_fnJohn Cai
Add a parameter to each_ref_fn so that callers to the ref APIs that use this function as a callback can have acess to the unresolved value of a symbolic ref. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-08Merge branch 'xx/bundie-uri-fixes'Junio C Hamano
When bundleURI interface fetches multiple bundles, Git failed to take full advantage of all bundles and ended up slurping duplicated objects. * xx/bundie-uri-fixes: unbundle: extend object verification for fetches fetch-pack: expose fsckObjects configuration logic bundle-uri: verify oid before writing refs
2024-07-02Merge branch 'ps/use-the-repository'Junio C Hamano
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions