aboutsummaryrefslogtreecommitdiff
path: root/refs/packed-backend.c
AgeCommit message (Collapse)Author
2026-03-09Merge branch 'ps/refs-for-each'Junio C Hamano
Code refactoring around refs-for-each-* API functions. * ps/refs-for-each: refs: replace `refs_for_each_fullref_in()` refs: replace `refs_for_each_namespaced_ref()` refs: replace `refs_for_each_glob_ref()` refs: replace `refs_for_each_glob_ref_in()` refs: replace `refs_for_each_rawref_in()` refs: replace `refs_for_each_rawref()` refs: replace `refs_for_each_ref_in()` refs: improve verification for-each-ref options refs: generalize `refs_for_each_fullref_in_prefixes()` refs: generalize `refs_for_each_namespaced_ref()` refs: speed up `refs_for_each_glob_ref_in()` refs: introduce `refs_for_each_ref_ext` refs: rename `each_ref_fn` refs: rename `do_for_each_ref_flags` refs: move `do_for_each_ref_flags` further up refs: move `refs_head_ref_namespaced()` refs: remove unused `refs_for_each_include_root_ref()`
2026-03-04Merge branch 'kn/ref-location'Junio C Hamano
Allow the directory in which reference backends store their data to be specified. * kn/ref-location: refs: add GIT_REFERENCE_BACKEND to specify reference backend refs: allow reference location in refstorage config refs: receive and use the reference storage payload refs: move out stub modification to generic layer refs: extract out `refs_create_refdir_stubs()` setup: don't modify repo in `create_reference_database()`
2026-02-25refs: receive and use the reference storage payloadKarthik Nayak
An upcoming commit will add support for providing an URI via the 'extensions.refStorage' config. The URI will contain the reference backend and a corresponding payload. The payload can be then used for providing an alternate locations for the reference backend. To prepare for this, modify the existing backends to accept such an argument when initializing via the 'init()' function. Both the files and reftable backends will parse the information to be filesystem paths to store references. Given that no callers pass any payload yet this is essentially a no-op change for now. To enable this, provide a 'refs_compute_filesystem_location()' function which will parse the current 'gitdir' and the 'payload' to provide the final reference directory and common reference directory (if working in a linked worktree). The documentation and tests will be added alongside the extension of the config variable. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23refs: rename `do_for_each_ref_flags`Patrick Steinhardt
The enum `do_for_each_ref_flags` and its individual values don't match to our current best practices when it comes to naming things. Rename it to `refs_for_each_flag`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-25refs: skip to next ref when current ref is rejectedKarthik Nayak
In `refs_verify_refnames_available()` we have two nested loops: the outer loop iterates over all references to check, while the inner loop checks for filesystem conflicts for a given ref by breaking down its path. With batched updates, when we detect a filesystem conflict, we mark the update as rejected and execute 'continue'. However, this only skips to the next iteration of the inner loop, not the outer loop as intended. This causes the same reference to be repeatedly rejected. Fix this by using a goto statement to skip to the next reference in the outer loop. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-21Merge branch 'kn/maintenance-is-needed'Junio C Hamano
"git maintenance" command learned "is-needed" subcommand to tell if it is necessary to perform various maintenance tasks. * kn/maintenance-is-needed: maintenance: add 'is-needed' subcommand maintenance: add checking logic in `pack_refs_condition()` refs: add a `optimize_required` field to `struct ref_storage_be` reftable/stack: add function to check if optimization is required reftable/stack: return stack segments directly
2025-11-19Merge branch 'ps/ref-peeled-tags-fixes'Junio C Hamano
Another fix-up to "peeled-tags" topic. * ps/ref-peeled-tags-fixes: object: fix performance regression when peeling tags
2025-11-10refs: add a `optimize_required` field to `struct ref_storage_be`Karthik Nayak
To allow users of the refs namespace to check if the reference backend requires optimization, add a new field `optimize_required` field to `struct ref_storage_be`. This field is of type `optimize_required_fn` which is also introduced in this commit. Modify the debug, files, packed and reftable backend to implement this field. A following commit will expose this via 'git pack-refs' and 'git refs optimize'. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-06object: fix performance regression when peeling tagsPatrick Steinhardt
Our Bencher dashboards [1] have recently alerted us about a bunch of performance regressions when writing references, specifically with the reftable backend. There is a 3x regression when writing many refs with preexisting refs in the reftable format, and a 10x regression when migrating refs between backends in either of the formats. Bisecting the issue lands us at 6ec4c0b45b (refs: don't store peeled object IDs for invalid tags, 2025-10-23). The gist of the commit is that we may end up storing peeled objects in both reftables and packed-refs for corrupted tags, where the claimed tagged object type is different than the actual tagged object type. This will then cause us to create the `struct object *` with a wrong type, as well, and obviously nothing good comes out of that. The fix for this issue was to introduce a new flag to `peel_object()` that causes us to verify the tagged object's type before writing it into the refdb -- if the tag is corrupt, we skip writing the peeled value. To verify whether the peeled value is correct we have to look up the object type via the ODB and compare the actual type with the claimed type, and that additional object lookup is costly. This also explains why we see the regression only when writing refs with the reftable backend, but we see the regression with both backends when migrating refs: - The reftable backend knows to store peeled values in the new table immediately, so it has to try and peel each ref it's about to write to the transaction. So the performance regression is visible for all writes. - The files backend only stores peeled values when writing the packed-refs file, so it wouldn't hit the performance regression for normal writes. But on ref migrations we know to write all new values into the packed-refs file immediately, and that's why we see the regression for both backends there. Taking a step back though reveals an oddity in the new verification logic: we not only verify the _tagged_ object's type, but we also verify the type of the tag itself. But this isn't really needed, as we wouldn't hit the bug in such a case anyway, as we only hit the issue with corrupt tags claiming an invalid type for the tagged object. The consequence of this is that we now started to look up the target object of every single reference we're about to write, regardless of whether it even is a tag or not. And that is of course quite costly. Fix the issue by only verifying the type of the tagged objects. This means that we of course still have a performance hit for actual tags. But this only happens for writes anyway, and I'd claim it's preferable to not store corrupted data in the refdb than to be fast here. Rename the flag accordingly to clarify that we only verify the tagged object's type. This fix brings performance back to previous levels: Benchmark 1: baseline Time (mean ± σ): 46.0 ms ± 0.4 ms [User: 40.0 ms, System: 5.7 ms] Range (min … max): 45.0 ms … 47.1 ms 54 runs Benchmark 2: regression Time (mean ± σ): 140.2 ms ± 1.3 ms [User: 77.5 ms, System: 60.5 ms] Range (min … max): 138.0 ms … 142.7 ms 20 runs Benchmark 3: fix Time (mean ± σ): 46.2 ms ± 0.4 ms [User: 40.2 ms, System: 5.7 ms] Range (min … max): 45.0 ms … 47.3 ms 55 runs Summary update-ref: baseline 1.00 ± 0.01 times faster than fix 3.05 ± 0.04 times faster than regression [1]: https://bencher.dev/perf/git/plots Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: rename 'pack_refs_opts' to 'refs_optimize_opts'Karthik Nayak
The previous commit removed all references to 'pack_refs()' within the refs subsystem. Continue this cleanup by also renaming 'pack_refs_opts' to 'refs_optimize_opts' and the respective flags accordingly. Keeping the naming consistent will make the code easier to maintain. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: move to using the '.optimize' functionsKarthik Nayak
The `struct ref_store` variable exposes two ways to optimize a reftable backend: 1. pack_refs 2. optimize The former was specific to the 'files' + 'packed' refs backend. The latter is more generic and covers all backends. While the naming is different, both of these functions perform the same functionality. Consolidate this code to only maintain the 'optimize' functions. Do this by modifying the backends so that they exclusively implement the `optimize` callback, only. All users of the refs subsystem already use the 'optimize' function so there is no changes needed on the callee side. Finally, cleanup all references to the 'pack_refs' field of the structure and code around it. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: don't store peeled object IDs for invalid tagsPatrick Steinhardt
Both the "files" and "reftable" backend store peeled object IDs for references that point to tags: - The "files" backend stores the value when packing refs, where each peeled object ID is prefixed with "^". - The "reftable" backend stores the value whenever writing a new reference that points to a tag via a special ref record type. Both of these backends use `peel_object()` to find the peeled object ID. But as explained in the preceding commit, that function does not detect the case where the tag's tagged object and its claimed type mismatch. The consequence of storing these bogus peeled object IDs is that we're less likely to detect such corruption in other parts of Git. git-for-each-ref(1) for example does not notice anymore that the tag is broken when using "--format=%(*objectname)" to dereference tags. One could claim that this is good, because it still allows us to mostly use the tag as intended. But the biggest problem here is that we now have different behaviour for such a broken tag depending on whether or not we have its peeled value in the refdb. Fix the issue by verifying the object type when peeling the object. If that verification fails we simply skip storing the peeled value in either of the reference formats. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04object: add flag to `peel_object()` to verify object typePatrick Steinhardt
When peeling a tag to a non-tag object we repeatedly call `parse_object()` on the tagged object until we find the first object that isn't a tag. While this feels sensible at first, there is a big catch here: `parse_object()` doesn't actually verify the type of the tagged object. The relevant code path here eventually ends up in `parse_tag_buffer()`. Here, we parse the various fields of the tag, including the "type". Once we've figured out the type and the tagged object ID, we call one of the `lookup_${type}()` functions for whatever type we have found. There is two possible outcomes in the successful case: 1. The object is already part of our cached objects. In that case we double-check whether the type we're trying to look up matches the type that was cached. 2. The object is _not_ part of our cached objects. In that case, we simply create a new object with the expected type, but we don't parse that object. In the first case we might notice type mismatches, but only in the case where our cache has the object with the correct type. In the second case, we'll blindly assume that the type is correct and then go with it. We'll only notice that the type might be wrong when we try to parse the object at a later point. Now arguably, we could change `parse_tag_buffer()` to verify the tagged object's type for us. But that would have the effect that such a tag cannot be parsed at all anymore, and we have a small bunch of tests for exactly this case that assert we still can open such tags. So this change does not feel like something we can retroactively tighten, even though one shouldn't ever hit such corrupted tags. Instead, add a new `flags` field to `peel_object()` that allows the caller to opt in to strict object verification. This will be wired up at a subset of callsites over the next few commits. Note that this change also inlines `deref_tag_noverify()`. There's only been two callsites of that function, the one we're changing and one in our test helpers. The latter callsite can trivially use `deref_tag()` instead, so by inlining the function we avoid having to pass down the flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: drop infrastructure to peel via iteratorsPatrick Steinhardt
Now that the peeled object ID gets propagated via the `struct reference` there is no need anymore to call into the reference iterator itself to dereference an object. Remove this infrastructure. Most of the changes are straight-forward deletions of code. There is one exception though in `refs/packed-backend.c::write_with_updates()`. Here we stop peeling the iterator and instead just pass the peeled object ID of that iterator directly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: expose peeled object ID via the iteratorPatrick Steinhardt
Both the "files" and "reftable" backend are able to store peeled values for tags in the respective formats. This allows for a more efficient lookup of the target object of such a tag without having to manually peel via the object database. The infrastructure to access these peeled object IDs is somewhat funky though. When iterating through objects, we store a pointer reference to the current iterator in a global variable. The callbacks invoked by that iterator are then expected to call `peel_iterated_oid()`, which checks whether the globally-stored iterator's current reference refers to the one handed into that function. If so, we ask the iterator to peel the object, otherwise we manually peel the object via the object database. Depending on global state like this is somewhat weird and also quite fragile. Introduce a new `struct reference::peeled_oid` field that can be populated by the reference backends. This field can be accessed via a new function `reference_get_peeled_oid()` that either uses that value, if set, or alternatively peels via the ODB. With this change we don't have to rely on global state anymore, but make the peeled object ID available to the callback functions directly. Adjust trivial callers that already have a `struct reference` available. Remaining callers will be adjusted in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: fully reset `struct ref_iterator::ref` on iterationPatrick Steinhardt
With the introduction of the `struct ref_iterator::ref` field it now is a whole lot easier to introduce new fields that become accessible to the caller without having to adapt every single callsite. But there's a downside: when a new field is introduced we always have to adapt all backends to set that field. This isn't something we can avoid in the general case: when the new field is expected to be populated by all backends we of course cannot avoid doing so. But new fields may be entirely optional, in which case we'd still have such churn. And furthermore, it is very easy right now to leak state from a previous iteration into the next iteration. Address this issue by ensuring that the reference backends all fully reset the field on every single iteration. This ensures that no state from previous iterations can leak into the next one. And it ensures that any newly introduced fields will be zeroed out by default. Note that we don't have to explicitly adapt the "files" backend, as it uses the `cache_ref_iterator` internally. Furthermore, other "wrapping" iterators like for example the `prefix_ref_iterator` copy around the whole reference, so these don't need to be adapted either. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: introduce `.ref` field for the base iteratorPatrick Steinhardt
The base iterator has a couple of fields that tracks the name, target, object ID and flags for the current reference. Due to this design we have to create a new `struct reference` whenever we want to hand over that reference to the callback function, which is tedious and not very efficient. Convert the structure to instead contain a `struct reference` as member. This member is expected to be populated by the implementations of the iterator and is handed over to the callback directly. While at it, simplify `should_pack_ref()` to take a `struct reference` directly instead of passing its respective fields. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-04Merge branch 'ps/config-wo-the-repository'Junio C Hamano
The config API had a set of convenience wrapper functions that implicitly use the_repository instance; they have been removed and inlined at the calling sites. * ps/config-wo-the-repository: (21 commits) config: fix sign comparison warnings config: move Git config parsing into "environment.c" config: remove unused `the_repository` wrappers config: drop `git_config_set_multivar()` wrapper config: drop `git_config_get_multivar_gently()` wrapper config: drop `git_config_set_multivar_in_file_gently()` wrapper config: drop `git_config_set_in_file_gently()` wrapper config: drop `git_config_set()` wrapper config: drop `git_config_set_gently()` wrapper config: drop `git_config_set_in_file()` wrapper config: drop `git_config_get_bool()` wrapper config: drop `git_config_get_ulong()` wrapper config: drop `git_config_get_int()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string()` wrapper config: drop `git_config_get_string_multi()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get_value()` wrapper config: drop `git_config_get()` wrapper config: drop `git_config_clear()` wrapper ...
2025-07-23config: drop `git_config_get_int()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_int()`. All callsites are adjusted so that they use `repo_config_get_int(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15refs: selectively set prefix in the seek functionsKarthik Nayak
The ref iterator exposes a `ref_iterator_seek()` function. The name suggests that this would seek the iterator to a specific reference in some ways similar to how `fseek()` works for the filesystem. However, the function actually sets the prefix for refs iteration. So further iteration would only yield references which match the particular prefix. This is a bit confusing. Let's add a 'flags' field to the function, which when set with the 'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the iteration in-line with the existing behavior. Otherwise, the reference backends will simply seek to the specified reference and clears any previously set prefix. This allows users to start iteration from a specific reference. In the packed and reftable backend, since references are available in a sorted list, the changes are simply setting the prefix if needed. The changes on the files-backend are a little more involved, since the files backend uses the 'ref-cache' mechanism. We move out the existing logic within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()` which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We then parse the provided seek string and set the required levels and their indexes to ensure that seeking is possible. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-14packed-backend: mmap large "packed-refs" file during fsckshejialuo
During fsck, we use "strbuf_read" to read the content of "packed-refs" without using mmap mechanism. This is a bad practice which would consume more memory than using mmap mechanism. Besides, as all code paths in "packed-backend.c" use this way, we should make "fsck" align with the current codebase. As we have introduced the helper function "allocate_snapshot_buffer", we can simply use this function to use mmap mechanism. Suggested-by: Jeff King <peff@peff.net> Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-14packed-backend: extract snapshot allocation in `load_contents`shejialuo
"load_contents" would choose which way to load the content of the "packed-refs". However, we cannot directly use this function when checking the consistency due to we don't want to open the file. And we also need to reuse the logic to avoid causing repetition. Let's create a new helper function "allocate_snapshot_buffer" to extract the snapshot allocation logic in "load_contents" and update the "load_contents" to align with the behavior. Suggested-by: Jeff King <peff@peff.net> Suggested-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-14packed-backend: fsck should warn when "packed-refs" file is emptyshejialuo
We assume the "packed-refs" won't be empty and instead has at least one line in it (even when there are no refs packed, there is the file header line). Because there is no terminating LF in the empty file, we will report "packedRefEntryNotTerminated(ERROR)" to the user. However, the runtime code paths would accept an empty "packed-refs" file, for example, "create_snapshot" would simply return the "snapshot" without checking the content of "packed-refs". So, we should skip checking the content of "packed-refs" when it is empty during fsck. After 694b7a1999 (repack_without_ref(): write peeled refs in the rewritten file, 2013-04-22), we would always write a header into the "packed-refs" file. So, versions of Git that are not too ancient never write such an empty "packed-refs" file. As an empty file often indicates a sign of a filesystem-level issue, the way we want to resolve this inconsistency is not make everybody totally silent but notice and report the anomaly. Let's create a "FSCK_INFO" message id "EMPTY_PACKED_REFS_FILE" to report to the users that "packed-refs" is empty. Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-17Merge branch 'ps/refname-avail-check-optim'Junio C Hamano
Incorrect sorting of refs with bytes with high-bit set on platforms with signed char led to a BUG, which has been corrected. * ps/refname-avail-check-optim: refs/packed: fix BUG when seeking refs with UTF-8 characters
2025-04-16Merge branch 'kn/non-transactional-batch-updates'Junio C Hamano
Updating multiple references have only been possible in all-or-none fashion with transactions, but it can be more efficient to batch multiple updates even when some of them are allowed to fail in a best-effort manner. A new "best effort batches of updates" mode has been introduced. * kn/non-transactional-batch-updates: update-ref: add --batch-updates flag for stdin mode refs: support rejection in batch updates during F/D checks refs: implement batch reference update support refs: introduce enum-based transaction error types refs/reftable: extract code from the transaction preparation refs/files: remove duplicate duplicates check refs: move duplicate refname update check to generic layer refs/files: remove redundant check in split_symref_update()
2025-04-09refs/packed: fix BUG when seeking refs with UTF-8 charactersPatrick Steinhardt
It was reported that using git-pull(1) in a repository whose remote contains branches with emojis leads to the following bug: $ git pull remote: Enumerating objects: 161255, done. remote: Counting objects: 100% (55884/55884), done. remote: Compressing objects: 100% (5518/5518), done. remote: Total 161255 (delta 54253), reused 50509 (delta 50364), pack-reused 105371 (from 4) Receiving objects: 100% (161255/161255), 309.90 MiB | 16.87 MiB/s, done. Resolving deltas: 100% (118048/118048), completed with 13416 local objects. From github.com:github/github 97ab7ae3f3745..8fb2f9fa180ed master -> origin/master [...snip many screenfuls of updates to origin remotes...] BUG: refs/packed-backend.c:984: packed-refs backend yielded reference preceding its prefix error: fetch died of signal 6 This issue bisects to 22600c04529 (refs/iterator: implement seeking for packed-ref iterators, 2025-03-12) where we have implemented seeking for the packed-ref iterator. As part of that change we introduced a check that verifies that the iterator only returns refnames bigger than the prefix. In theory, this check should always hold: when a prefix is set we know that we would've seeked that prefix first, so we should never see a reference sorting before that prefix. But in practice the check itself is misbehaving when handling unicode characters. The particular issue triggered with a branch that got the "shaved ice" unicode character in its name, which is composed of the bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname "refs/heads/<shaved-ice>" to something like "refs/heads/z", and it specifically hits when comparing the first byte, "0xEE". The root cause is that the most-significant bit of 0xEE is set. The `refname` and `prefix` pointers that we use to compare bytes with one another are both pointers to signed characters. As such, when we dereference the 0xEE byte the result is a _negative_ value, and this value will of course compare smaller than "z". We can see that this issue is avoided in `cmp_packed_refname()`, where we explicitly cast each byte to its unsigned form. Fix the bug by doing the same in `packed_ref_iterator_advance()`. Reported-by: Elijah Newren <newren@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: implement batch reference update supportKarthik Nayak
Git supports making reference updates with or without transactions. Updates with transactions are generally better optimized. But transactions are all or nothing. This means, if a user wants to batch updates to take advantage of the optimizations without the hard requirement that all updates must succeed, there is no way currently to do so. Particularly with the reftable backend where batching multiple reference updates is more efficient than performing them sequentially. Introduce batched update support with a new flag, 'REF_TRANSACTION_ALLOW_FAILURE'. Batched updates while different from transactions, use the transaction infrastructure under the hood. When enabled, this flag allows individual reference updates that would typically cause the entire transaction to fail due to non-system-related errors to be marked as rejected while permitting other updates to proceed. System errors referred by 'REF_TRANSACTION_ERROR_GENERIC' continue to result in the entire transaction failing. This approach enhances flexibility while preserving transactional integrity where necessary. The implementation introduces several key components: - Add 'rejection_err' field to struct `ref_update` to track failed updates with failure reason. - Add a new struct `ref_transaction_rejections` and a field within `ref_transaction` to this struct to allow quick iteration over rejected updates. - Modify reference backends (files, packed, reftable) to handle partial transactions by using `ref_transaction_set_rejected()` instead of failing the entire transaction when `REF_TRANSACTION_ALLOW_FAILURE` is set. - Add `ref_transaction_for_each_rejected_update()` to let callers examine which updates were rejected and why. This foundational change enables batched update support throughout the reference subsystem. A following commit will expose this capability to users by adding a `--batch-updates` flag to 'git-update-ref(1)', providing both a user-facing feature and a testable implementation. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: introduce enum-based transaction error typesKarthik Nayak
Replace preprocessor-defined transaction errors with a strongly-typed enum `ref_transaction_error`. This change: - Improves type safety and function signature clarity. - Makes error handling more explicit and discoverable. - Maintains existing error cases, while adding new error cases for common scenarios. This refactoring paves the way for more comprehensive error handling which we will utilize in the upcoming commits to add batch reference update support. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08refs: move duplicate refname update check to generic layerKarthik Nayak
Move the tracking of refnames in `affected_refnames` from individual backends into the generic layer in 'refs.c'. This centralizes the duplicate refname detection that was previously handled separately by each backend. Make some changes to accommodate this move: - Add a `string_list` field `refnames` to `ref_transaction` to contain all the references in a transaction. This field is updated whenever a new update is added via `ref_transaction_add_update`, so manual additions in reference backends are dropped. - Modify the backends to use this field internally as needed. The backends need to check if an update for refname already exists when splitting symrefs or adding an update for 'HEAD'. - In the reftable backend, within `reftable_be_transaction_prepare()`, move the `string_list_has_string()` check above `ref_transaction_add_update()`. Since `ref_transaction_add_update()` automatically adds the refname to `transaction->refnames`, performing the check after will always return true, so we perform the check before adding the update. This helps reduce duplication of functionality between the backends and makes it easier to make changes in a more centralized manner. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-29Merge branch 'ps/refname-avail-check-optim'Junio C Hamano
The code paths to check whether a refname X is available (by seeing if another ref X/Y exists, etc.) have been optimized. * ps/refname-avail-check-optim: refs: reuse iterators when determining refname availability refs/iterator: implement seeking for files iterators refs/iterator: implement seeking for packed-ref iterators refs/iterator: implement seeking for ref-cache iterators refs/iterator: implement seeking for reftable iterators refs/iterator: implement seeking for merged iterators refs/iterator: provide infrastructure to re-seek iterators refs/iterator: separate lifecycle from iteration refs: stop re-verifying common prefixes for availability refs/files: batch refname availability checks for initial transactions refs/files: batch refname availability checks for normal transactions refs/reftable: batch refname availability checks refs: introduce function to batch refname availability checks builtin/update-ref: skip ambiguity checks when parsing object IDs object-name: allow skipping ambiguity checks in `get_oid()` family object-name: introduce `repo_get_oid_with_flags()`
2025-03-12refs/iterator: implement seeking for packed-ref iteratorsPatrick Steinhardt
Implement seeking of `packed-ref` iterators. The implementation is again straight forward, except that we cannot continue to use the prefix iterator as we would otherwise not be able to reseek the iterator anymore in case one first asks for an empty and then for a non-empty prefix. Instead, we open-code the logic to in `advance()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-12refs/iterator: separate lifecycle from iterationPatrick Steinhardt
The ref and reflog iterators have their lifecycle attached to iteration: once the iterator reaches its end, it is automatically released and the caller doesn't have to care about that anymore. When the iterator should be released before it has been exhausted, callers must explicitly abort the iterator via `ref_iterator_abort()`. This lifecycle is somewhat unusual in the Git codebase and creates two problems: - Callsites need to be very careful about when exactly they call `ref_iterator_abort()`, as calling the function is only valid when the iterator itself still is. This leads to somewhat awkward calling patterns in some situations. - It is impossible to reuse iterators and re-seek them to a different prefix. This feature isn't supported by any iterator implementation except for the reftable iterators anyway, but if it was implemented it would allow us to optimize cases where we need to search for specific references repeatedly by reusing internal state. Detangle the lifecycle from iteration so that we don't deallocate the iterator anymore once it is exhausted. Instead, callers are now expected to always call a newly introduce `ref_iterator_free()` function that deallocates the iterator and its internal state. Note that the `dir_iterator` is somewhat special because it does not implement the `ref_iterator` interface, but is only used to implement other iterators. Consequently, we have to provide `dir_iterator_free()` instead of `dir_iterator_release()` as the allocated structure itself is managed by the `dir_iterator` interfaces, as well, and not freed by `ref_iterator_free()` like in all the other cases. While at it, drop the return value of `ref_iterator_abort()`, which wasn't really required by any of the iterator implementations anyway. Furthermore, stop calling `base_ref_iterator_free()` in any of the backends, but instead call it in `ref_iterator_free()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: check whether the "packed-refs" is sortedshejialuo
When there is a "sorted" trait in the header of the "packed-refs" file, it means that each entry is sorted increasingly by comparing the refname. We should add checks to verify whether the "packed-refs" is sorted in this case. Update the "packed_fsck_ref_header" to know whether there is a "sorted" trail in the header. It may seem that we could record all refnames during the parsing process and then compare later. However, this is not a good design due to the following reasons: 1. Because we need to store the state across the whole checking lifetime, we would consume a lot of memory if there are many entries in the "packed-refs" file. 2. We cannot reuse the existing compare function "cmp_packed_ref_records" which cause repetition. Because "cmp_packed_ref_records" needs an extra parameter "struct snaphost", extract the common part into a new function "cmp_packed_ref_records" to reuse this function to compare. Then, create a new function "packed_fsck_ref_sorted" to parse the file again and user the new fsck message "packedRefUnsorted(ERROR)" to report to the user if the file is not sorted. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: add "packed-refs" entry consistency checkshejialuo
"packed-backend.c::next_record" will parse the ref entry to check the consistency. This function has already checked the following things: 1. Parse the main line of the ref entry to inspect whether the oid is not correct. Then, check whether the next character is oid. Then check the refname. 2. If the next line starts with '^', it would continue to parse the peeled oid and check whether the last character is '\n'. As we decide to implement the ref consistency check for "packed-refs", let's port these two checks and update the test to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: check whether the refname contains NUL charactersshejialuo
"packed-backend.c::next_record" will use "check_refname_format" to check the consistency of the refname. If it is not OK, the program will die. However, it is reported in [1], we cannot catch some corruption. But we already have the code path and we must miss out something. We use the following code to get the refname: strbuf_add(&iter->refname_buf, p, eol - p); iter->base.refname = iter->refname_buf.buf In the above code, `p` is the start pointer of the refname and `eol` is the next newline pointer. We calculate the length of the refname by subtracting the two pointers. Then we add the memory range between `p` and `eol` to get the refname. However, if there are some NUL characters in the memory range between `p` and `eol`, we will see the refname as a valid ref name as long as the memory range between `p` and first occurred NUL character is valid. In order to catch above corruption, create a new function "refname_contains_nul" by searching the first NUL character. If it is not at the end of the string, there must be some NUL characters in the refname. Use this function in "next_record" function to die the program if "refname_contains_nul" returns true. [1] https://lore.kernel.org/git/6cfee0e4-3285-4f18-91ff-d097da9de737@rd10.de/ Reported-by: R. Diez <rdiez-temp3@rd10.de> Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: add "packed-refs" header consistency checkshejialuo
In "packed-backend.c::create_snapshot", if there is a header (the line which starts with '#'), we will check whether the line starts with "# pack-refs with: ". However, we need to consider other situations and discuss whether we need to add checks. 1. If the header does not exist, we should not report an error to the user. This is because in older Git version, we never write header in the "packed-refs" file. Also, we do allow no header in "packed-refs" in runtime. 2. If the header content does not start with "# packed-ref with: ", we should report an error just like what "create_snapshot" does. So, create a new fsck message "badPackedRefHeader(ERROR)" for this. 3. If the header content is not the same as the constant string "PACKED_REFS_HEADER". This is expected because we make it extensible intentionally and runtime "create_snapshot" won't complain about unknown traits. In order to align with the runtime behavior. There is no need to report. As we have analyzed, we only need to check the case 2 in the above. In order to do this, use "open_nofollow" function to get the file descriptor and then read the "packed-refs" file via "strbuf_read". Like what "create_snapshot" and other functions do, we could split the line by finding the next newline in the buffer. When we cannot find a newline, we could report an error. So, create a function "packed_fsck_ref_next_line" to find the next newline and if there is no such newline, use "packedRefEntryNotTerminated(ERROR)" to report an error to the user. Then, parse the first line to apply the checks. Update the test to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: check if header starts with "# pack-refs with: "shejialuo
We always write a space after "# pack-refs with:" but we don't align with this rule in the "create_snapshot" method where we would check whether header starts with "# pack-refs with:". It might seem that we should undoubtedly tighten this rule, however, we don't have any technical documentation about this and there is a possibility that we would break the compatibility for other third-party libraries. By investigating influential third-party libraries, we could conclude how these libraries handle the header of "packed-refs" file: 1. libgit2 is fine and always writes the space. It also expects the whitespace to exist. 2. JGit does not expect th header to have a trailing space, but expects the "peeled" capability to have a leading space, which is mostly equivalent because that capability is typically the first one we write. It always writes the space. 3. gitoxide expects the space t exist and writes it. 4. go-git doesn't create the header by default. As many third-party libraries expect a single space after "# pack-refs with:", if we forget to write the space after the colon, "create_snapshot" won't catch this. And we would break other re-implementations. So, we'd better tighten the rule by checking whether the header starts with "# pack-refs with: ". Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: check whether the "packed-refs" is regular fileshejialuo
Although "git-fsck(1)" and "packed-backend.c" will check some consistency and correctness of "packed-refs" file, they never check the filetype of the "packed-refs". Let's verify that the "packed-refs" has the expected filetype, confirming it is created by "git pack-refs" command. We could use "open_nofollow" wrapper to open the raw "packed-refs" file. If the returned "fd" value is less than 0, we could check whether the "errno" is "ELOOP" to report an error to the user. And then we use "fstat" to check whether the "packed-refs" file is a regular file. Reuse "FSCK_MSG_BAD_REF_FILETYPE" fsck message id to report the error to the user if "packed-refs" is not a regular file. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04Merge branch 'sj/ref-contents-check'Junio C Hamano
"git fsck" learned to issue warnings on "curiously formatted" ref contents that have always been taken valid but something Git wouldn't have written itself (e.g., missing terminating end-of-line after the full object name). * sj/ref-contents-check: ref: add symlink ref content check for files backend ref: check whether the target of the symref is a ref ref: add basic symref content check for files backend ref: add more strict checks for regular refs ref: port git-fsck(1) regular refs check for files backend ref: support multiple worktrees check for refs ref: initialize ref name outside of check functions ref: check the full refname instead of basename ref: initialize "fsck_ref_report" with zero
2024-11-21ref: support multiple worktrees check for refsshejialuo
We have already set up the infrastructure to check the consistency for refs, but we do not support multiple worktrees. However, "git-fsck(1)" will check the refs of worktrees. As we decide to get feature parity with "git-fsck(1)", we need to set up support for multiple worktrees. Because each worktree has its own specific refs, instead of just showing the users "refs/worktree/foo", we need to display the full name such as "worktrees/<id>/refs/worktree/foo". So we should know the id of the worktree to get the full name. Add a new parameter "struct worktree *" for "refs-internal.h::fsck_fn". Then change the related functions to follow this new interface. The "packed-refs" only exists in the main worktree, so we should only check "packed-refs" in the main worktree. Use "is_main_worktree" method to skip checking "packed-refs" in "packed_fsck" function. Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add "worktree/<id>/" prefix when we are not in the main worktree. Last, add a new test to check the refname when there are multiple worktrees to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21refs: introduce "initial" transaction flagPatrick Steinhardt
There are two different ways to commit a transaction: - `ref_transaction_commit()` can be used to commit a regular transaction and is what almost every caller wants. - `initial_ref_transaction_commit()` can be used when it is known that the ref store that the transaction is committed for is empty and when there are no concurrent processes. This is used when cloning a new repository. Implementing this via two separate functions has a couple of downsides. First, every reference backend needs to implement a separate callback even in the case where they don't special-case the initial transaction. Second, backends are basically forced to reimplement the whole logic for how to commit the transaction like the "files" backend does, even though backends may wish to only tweak certain behaviour of a "normal" commit. Third, it is awkward that callers must never prepare the transaction as this is somewhat different than how a transaction typically works. Refactor the code such that we instead mark initial transactions via a separate flag when starting the transaction. This addresses all of the mentioned painpoints, where the most important part is that it will allow backends to have way more leeway in how exactly they want to handle the initial transaction. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12Merge branch 'ps/pack-refs-auto-heuristics'Junio C Hamano
"git pack-refs --auto" for the files backend was too aggressive, which has been a bit tamed. * ps/pack-refs-auto-heuristics: refs/files: use heuristic to decide whether to repack with `--auto` t0601: merge tests for auto-packing of refs wrapper: introduce `log2u()`
2024-09-04refs/files: use heuristic to decide whether to repack with `--auto`Patrick Steinhardt
The `--auto` flag for git-pack-refs(1) allows the ref backend to decide whether or not a repack is in order. This switch has been introduced mostly with the "reftable" backend in mind, which already knows to auto-compact its tables during normal operations. When the flag is set, then it will use the same auto-compaction mechanism and thus end up doing nothing in most cases. The "files" backend does not have any such heuristic yet and instead packs any loose references unconditionally. So we rewrite the complete "packed-refs" file even if there's only a single loose reference to be packed. Even worse, starting with 9f6714ab3e (builtin/gc: pack refs when using `git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is unconditionally executed via our auto maintenance, so we end up repacking references every single time auto maintenance kicks in. And while that commit already mentioned that the "files" backend unconditionally packs refs now, the author obviously didn't quite think about the consequences thereof. So while the idea was sound, we really should have added a heuristic to the "files" backend before implementing it. Introduce a heuristic that decides whether or not it is worth to pack loose references. The important factors to decide here are the number of loose references in comparison to the overall size of the "packed-refs" file. The bigger the "packed-refs" file, the longer it takes to rewrite it and thus we scale up the limit of allowed loose references before we repack. As is the nature of heuristics, this mechansim isn't obviously "correct", but should rather be seen as a tradeoff between how much resources we spend packing refs and how inefficient the ref store becomes. For all I can say, we have successfully been using the exact same heuristic in Gitaly for several years by now. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-26Merge branch 'jk/mark-unused-parameters'Junio C Hamano
Mark unused parameters as UNUSED to squelch -Wunused warnings. * jk/mark-unused-parameters: t-hashmap: stop calling setup() for t_intern() test scalar: mark unused parameters in dummy function daemon: mark unused parameters in non-posix fallbacks setup: mark unused parameter in config callback test-mergesort: mark unused parameters in trivial callback t-hashmap: mark unused parameters in callback function reftable: mark unused parameters in virtual functions reftable: drop obsolete test function declarations reftable: ignore unused argc/argv in test functions unit-tests: ignore unused argc/argv t/helper: mark more unused argv/argc arguments oss-fuzz: mark unused argv/argc argument refs: mark unused parameters in do_for_each_reflog_helper() refs: mark unused parameters in ref_store fsck callbacks update-ref: mark more unused parameters in parser callbacks imap-send: mark unused parameter in ssl_socket_connect() fallback
2024-08-23Merge branch 'ps/config-wo-the-repository'Junio C Hamano
Use of API functions that implicitly depend on the_repository object in the config subsystem has been rewritten to pass a repository object through the callchain. * ps/config-wo-the-repository: config: hide functions using `the_repository` by default global: prepare for hiding away repo-less config functions config: don't depend on `the_repository` with branch conditions config: don't have setters depend on `the_repository` config: pass repo to functions that rename or copy sections config: pass repo to `git_die_config()` config: pass repo to `git_config_get_expiry_in_days()` config: pass repo to `git_config_get_expiry()` config: pass repo to `git_config_get_max_percent_split_change()` config: pass repo to `git_config_get_split_index()` config: pass repo to `git_config_get_index_threads()` config: expose `repo_config_clear()` config: introduce missing setters that take repo as parameter path: hide functions using `the_repository` by default path: stop relying on `the_repository` in `worktree_git_path()` path: stop relying on `the_repository` when reporting garbage hooks: remove implicit dependency on `the_repository` editor: do not rely on `the_repository` for interactive edits path: expose `do_git_common_path()` as `repo_common_pathv()` path: expose `do_git_path()` as `repo_git_pathv()`
2024-08-17refs: mark unused parameters in ref_store fsck callbacksJeff King
Commit ab6f79d8df (refs: set up ref consistency check infrastructure, 2024-08-08) added virtual functions to the ref store for doing fsck checks. But the packed and reftable backends do not yet do anything. Let's annotate them to silence -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-16Merge branch 'sj/ref-fsck'Junio C Hamano
"git fsck" infrastructure has been taught to also check the sanity of the ref database, in addition to the object database. * sj/ref-fsck: fsck: add ref name check for files backend files-backend: add unified interface for refs scanning builtin/refs: add verify subcommand refs: set up ref consistency check infrastructure fsck: add refs report function fsck: add a unified interface for reporting fsck messages fsck: make "fsck_error" callback generic fsck: rename objects-related fsck error functions fsck: rename "skiplist" to "skip_oids"
2024-08-13global: prepare for hiding away repo-less config functionsPatrick Steinhardt
We're about to hide config functions that implicitly depend on `the_repository` behind the `USE_THE_REPOSITORY_VARIABLE` macro. This will uncover a bunch of dependents that transitively relied on the global variable, but didn't define the macro yet. Adapt them such that we define the macro to prepare for this change. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08refs: set up ref consistency check infrastructureshejialuo
The "struct ref_store" is the base class which contains the "be" pointer which provides backend-specific functions whose interfaces are defined in the "ref_storage_be". We could reuse this polymorphism to define only one interface. For every backend, we need to provide its own function pointer. The interfaces defined in the `ref_storage_be` are carefully structured in semantic. It's organized as the five parts: 1. The name and the initialization interfaces. 2. The ref transaction interfaces. 3. The ref internal interfaces (pack, rename and copy). 4. The ref filesystem interfaces. 5. The reflog related interfaces. To keep consistent with the git-fsck(1), add a new interface named "fsck_refs_fn" to the end of "ref_storage_be". This semantic cannot be grouped into any above five categories. Explicitly add blank line to make it different from others. Last, implement placeholder functions for each ref backends. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>