aboutsummaryrefslogtreecommitdiff
path: root/bundle-uri.c
AgeCommit message (Collapse)Author
2025-12-20bundle-uri: validate that bundle entries have a uriSam Bostock
When a bundle list config file has a typo like 'url' instead of 'uri', or simply omits the uri field, the bundle entry is created but bundle->uri remains NULL. This causes a segfault when copy_uri_to_file() passes the NULL to starts_with(). Signed-off-by: Sam Bostock <sam@sambostock.ca> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15Merge branch 'ps/object-store'Junio C Hamano
Code clean-up around object access API. * ps/object-store: odb: rename `read_object_with_reference()` odb: rename `pretend_object_file()` odb: rename `has_object()` odb: rename `repo_read_object_file()` odb: rename `oid_object_info()` odb: trivial refactorings to get rid of `the_repository` odb: get rid of `the_repository` when handling submodule sources odb: get rid of `the_repository` when handling the primary source odb: get rid of `the_repository` in `for_each()` functions odb: get rid of `the_repository` when handling alternates odb: get rid of `the_repository` in `odb_mkstemp()` odb: get rid of `the_repository` in `assert_oid_type()` odb: get rid of `the_repository` in `find_odb()` odb: introduce parent pointers object-store: rename files to "odb.{c,h}" object-store: rename `object_directory` to `odb_source` object-store: rename `raw_object_store` to `object_database`
2025-07-07Sync with Git 2.50.1Junio C Hamano
2025-07-01odb: get rid of `the_repository` in `odb_mkstemp()`Patrick Steinhardt
Get rid of our dependency on `the_repository` in `odb_mkstemp()` by passing in the object database as a parameter and adjusting all callers. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01object-store: rename files to "odb.{c,h}"Patrick Steinhardt
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-18Merge branch 'jm/bundle-uri-debug-output-to-fp'Junio C Hamano
Code clean-up. * jm/bundle-uri-debug-output-to-fp: bundle-uri: send debug output to given FILE * stream
2025-06-15Sync with 2.49.1Junio C Hamano
2025-06-04bundle-uri: send debug output to given FILE * streamJan Mazur
d796cedb (bundle-uri: unit test "key=value" parsing, 2022-10-12) introduced the print_bundle_list() function, which takes a "FILE *fp" to write the output to. Later with c93c3d2f (bundle-uri: parse bundle.heuristic=creationToken, 2023-01-31) the function started showing additional information, which is always written to the standard output stream. It does not look like a deliberate decision to do so, and it does not hurt, as all callers of the function passes stdout to it. We could change the function not to take fp and always write to the standard output to simplify, but let's use the FILE *fp provided by the caller consistently to write out output. Signed-off-by: Jan Mazur <mzr@meta.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-28Sync with 2.47.3Taylor Blau
* maint-2.47: Git 2.47.3 Git 2.46.4 Git 2.45.4 Git 2.44.4 Git 2.43.7 wincred: avoid buffer overflow in wcsncat() bundle-uri: fix arbitrary file writes via parameter injection config: quote values containing CR character git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls git-gui: do not mistake command arguments as redirection operators git-gui: introduce function git_redir for git calls with redirections git-gui: pass redirections as separate argument to git_read git-gui: pass redirections as separate argument to _open_stdout_stderr git-gui: convert git_read*, git_write to be non-variadic git-gui: override exec and open only on Windows gitk: sanitize 'open' arguments: revisit recently updated 'open' calls git-gui: use git_read in githook_read git-gui: sanitize $PATH on all platforms git-gui: break out a separate function git_read_nice git-gui: assure PATH has only absolute elements. git-gui: remove option --stderr from git_read git-gui: cleanup git-bash menu item git-gui: sanitize 'exec' arguments: background git-gui: avoid auto_execok in do_windows_shortcut git-gui: sanitize 'exec' arguments: simple cases git-gui: avoid auto_execok for git-bash menu item git-gui: treat file names beginning with "|" as relative paths git-gui: remove unused proc is_shellscript git-gui: remove git config --list handling for git < 1.5.3 git-gui: remove special treatment of Windows from open_cmd_pipe git-gui: remove HEAD detachment implementation for git < 1.5.3 git-gui: use only the configured shell git-gui: remove Tcl 8.4 workaround on 2>@1 redirection git-gui: make _shellpath usable on startup git-gui: use [is_Windows], not bad _shellpath git-gui: _which, only add .exe suffix if not present gitk: encode arguments correctly with "open" gitk: sanitize 'open' arguments: command pipeline gitk: collect construction of blameargs into a single conditional gitk: sanitize 'open' arguments: simple commands, readable and writable gitk: sanitize 'open' arguments: simple commands with redirections gitk: sanitize 'open' arguments: simple commands gitk: sanitize 'exec' arguments: redirect to process gitk: sanitize 'exec' arguments: redirections and background gitk: sanitize 'exec' arguments: redirections gitk: sanitize 'exec' arguments: 'eval exec' gitk: sanitize 'exec' arguments: simple cases gitk: have callers of diffcmd supply pipe symbol when necessary gitk: treat file names beginning with "|" as relative paths
2025-05-28Sync with 2.46.4Taylor Blau
* maint-2.46: Git 2.46.4 Git 2.45.4 Git 2.44.4 Git 2.43.7 wincred: avoid buffer overflow in wcsncat() bundle-uri: fix arbitrary file writes via parameter injection config: quote values containing CR character git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls git-gui: do not mistake command arguments as redirection operators git-gui: introduce function git_redir for git calls with redirections git-gui: pass redirections as separate argument to git_read git-gui: pass redirections as separate argument to _open_stdout_stderr git-gui: convert git_read*, git_write to be non-variadic git-gui: override exec and open only on Windows gitk: sanitize 'open' arguments: revisit recently updated 'open' calls git-gui: use git_read in githook_read git-gui: sanitize $PATH on all platforms git-gui: break out a separate function git_read_nice git-gui: assure PATH has only absolute elements. git-gui: remove option --stderr from git_read git-gui: cleanup git-bash menu item git-gui: sanitize 'exec' arguments: background git-gui: avoid auto_execok in do_windows_shortcut git-gui: sanitize 'exec' arguments: simple cases git-gui: avoid auto_execok for git-bash menu item git-gui: treat file names beginning with "|" as relative paths git-gui: remove unused proc is_shellscript git-gui: remove git config --list handling for git < 1.5.3 git-gui: remove special treatment of Windows from open_cmd_pipe git-gui: remove HEAD detachment implementation for git < 1.5.3 git-gui: use only the configured shell git-gui: remove Tcl 8.4 workaround on 2>@1 redirection git-gui: make _shellpath usable on startup git-gui: use [is_Windows], not bad _shellpath git-gui: _which, only add .exe suffix if not present gitk: encode arguments correctly with "open" gitk: sanitize 'open' arguments: command pipeline gitk: collect construction of blameargs into a single conditional gitk: sanitize 'open' arguments: simple commands, readable and writable gitk: sanitize 'open' arguments: simple commands with redirections gitk: sanitize 'open' arguments: simple commands gitk: sanitize 'exec' arguments: redirect to process gitk: sanitize 'exec' arguments: redirections and background gitk: sanitize 'exec' arguments: redirections gitk: sanitize 'exec' arguments: 'eval exec' gitk: sanitize 'exec' arguments: simple cases gitk: have callers of diffcmd supply pipe symbol when necessary gitk: treat file names beginning with "|" as relative paths Signed-off-by: Taylor Blau <me@ttaylorr.com>
2025-05-28Sync with 2.45.4Taylor Blau
* maint-2.45: Git 2.45.4 Git 2.44.4 Git 2.43.7 wincred: avoid buffer overflow in wcsncat() bundle-uri: fix arbitrary file writes via parameter injection config: quote values containing CR character git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls git-gui: do not mistake command arguments as redirection operators git-gui: introduce function git_redir for git calls with redirections git-gui: pass redirections as separate argument to git_read git-gui: pass redirections as separate argument to _open_stdout_stderr git-gui: convert git_read*, git_write to be non-variadic git-gui: override exec and open only on Windows gitk: sanitize 'open' arguments: revisit recently updated 'open' calls git-gui: use git_read in githook_read git-gui: sanitize $PATH on all platforms git-gui: break out a separate function git_read_nice git-gui: assure PATH has only absolute elements. git-gui: remove option --stderr from git_read git-gui: cleanup git-bash menu item git-gui: sanitize 'exec' arguments: background git-gui: avoid auto_execok in do_windows_shortcut git-gui: sanitize 'exec' arguments: simple cases git-gui: avoid auto_execok for git-bash menu item git-gui: treat file names beginning with "|" as relative paths git-gui: remove unused proc is_shellscript git-gui: remove git config --list handling for git < 1.5.3 git-gui: remove special treatment of Windows from open_cmd_pipe git-gui: remove HEAD detachment implementation for git < 1.5.3 git-gui: use only the configured shell git-gui: remove Tcl 8.4 workaround on 2>@1 redirection git-gui: make _shellpath usable on startup git-gui: use [is_Windows], not bad _shellpath git-gui: _which, only add .exe suffix if not present gitk: encode arguments correctly with "open" gitk: sanitize 'open' arguments: command pipeline gitk: collect construction of blameargs into a single conditional gitk: sanitize 'open' arguments: simple commands, readable and writable gitk: sanitize 'open' arguments: simple commands with redirections gitk: sanitize 'open' arguments: simple commands gitk: sanitize 'exec' arguments: redirect to process gitk: sanitize 'exec' arguments: redirections and background gitk: sanitize 'exec' arguments: redirections gitk: sanitize 'exec' arguments: 'eval exec' gitk: sanitize 'exec' arguments: simple cases gitk: have callers of diffcmd supply pipe symbol when necessary gitk: treat file names beginning with "|" as relative paths Signed-off-by: Taylor Blau <me@ttaylorr.com>
2025-05-27Merge branch 'js/misc-fixes'Junio C Hamano
Assorted fixes for issues found with CodeQL. * js/misc-fixes: sequencer: stop pretending that an assignment is a condition bundle-uri: avoid using undefined output of `sscanf()` commit-graph: avoid using stale stack addresses trace2: avoid "futile conditional" Avoid redundant conditions fetch: avoid unnecessary work when there is no current branch has_dir_name(): make code more obvious upload-pack: rename `enum` to reflect the operation commit-graph: avoid malloc'ing a local variable fetch: carefully clear local variable's address after use commit: simplify code
2025-05-23bundle-uri: fix arbitrary file writes via parameter injectionPatrick Steinhardt' via Git Security
We fetch bundle URIs via `download_https_uri_to_file()`. The logic to fetch those bundles is not handled in-process, but we instead use a separate git-remote-https(1) process that performs the fetch for us. The information about which file should be downloaded and where that file should be put gets communicated via stdin of that process via a "get" request. This "get" request has the form "get $uri $file\n\n". As may be obvious to the reader, this will cause git-remote-https(1) to download the URI "$uri" and put it into "$file". The fact that we are using plain spaces and newlines as separators for the request arguments means that we have to be extra careful with the respective vaules of these arguments: - If "$uri" contained a space we would interpret this as both URI and target location. - If either "$uri" or "$file" contained a newline we would interpret this as a new command. But we neither quote the arguments such that any characters with special meaning would be escaped, nor do we verify that none of these special characters are contained. If either the URI or file contains a newline character, we are open to protocol injection attacks. Likewise, if the URI itself contains a space, then an attacker-controlled URI can lead to partially-controlled file writes. Note that the attacker-controlled URIs do not permit completely arbitrary file writes, but instead allows an attacker to control the path in which we will write a temporary (e.g., "tmp_uri_XXXXXX") file. The result is twofold: - By adding a space in "$uri" we can control where exactly a file will be written to, including out-of-repository writes. The final location is not completely arbitrary, as the injected string will be concatenated with the original "$file" path. Furthermore, the name of the bundle will be "tmp_uri_XXXXXX", further restricting what an adversary would be able to write. Also note that is not possible for the URI to contain a newline because we end up in `credential_from_url_1()` before we try to issue any requests using that URI. As such, it is not possible to inject arbitrary commands via the URI. - By adding a newline to "$file" we can inject arbitrary commands. This gives us full control over where a specific file will be written to. Potential attack vectors would be to overwrite hooks, but if an adversary were to guess where the user's home directory is located they might also easily write e.g. a "~/.profile" file and thus cause arbitrary code execution. This injection can only become possible when the adversary has full control over the target path where a bundle will be downloaded to. While this feels unlikely, it is possible to control this path when users perform a recursive clone with a ".gitmodules" file that is controlled by the adversary. Luckily though, the use of bundle URIs is not enabled by default in Git clients (yet): they have to be enabled by setting the `bundle.heuristic` config key explicitly. As such, the blast radius of this parameter injection should overall be quite contained. Fix the issue by rejecting spaces in the URI and newlines in both the URI and the file. As explained, it shouldn't be required to also restrict the use of newlines in the URI, as we would eventually die anyway in `credential_from_url_1()`. But given that we're only one small step away from arbitrary code execution, let's rather be safe and restrict newlines in URIs, as well. Eventually we should probably refactor the way that Git talks with the git-remote-https(1) subprocess so that it is less fragile. Until then, these two restrictions should plug the issue. Reported-by: David Leadbeater <dgl@dgl.cx> Based-on-patch-by: David Leadbeater <dgl@dgl.cx> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2025-05-19Merge branch 'sc/bundle-uri-use-all-refs-in-bundle'Junio C Hamano
Bundle-URI feature did not use refs recorded in the bundle other than normal branches as anchoring points to optimize the follow-up fetch during "git clone"; now it is told to utilize all. * sc/bundle-uri-use-all-refs-in-bundle: bundle-uri: add test for bundle-uri clones with tags bundle-uri: copy all bundle references ino the refs/bundle space
2025-05-15bundle-uri: avoid using undefined output of `sscanf()`Johannes Schindelin
In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31) code was introduced that assumes that an `sscanf()` call leaves its output variables unchanged unless the return value indicates success. However, the POSIX documentation makes no such guarantee: https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html So let's make sure that the output variable `maxCreationToken` is always well-defined. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-25bundle-uri: copy all bundle references ino the refs/bundle spaceScott Chacon
When downloading bundles via the bundle-uri functionality, we only copy the references from refs/heads into the refs/bundle space. I'm not sure why this refspec is hardcoded to be so limited, but it makes the ref negotiation on the subsequent fetch suboptimal, since it won't use objects that are referenced outside of the current heads of the bundled repository. This change to copy everything in refs/ in the bundle to refs/bundles/ significantly helps the subsequent fetch, since nearly all the references are now included in the negotiation. The update to the bundle-uri unbundling refspec puts all the heads from a bundle file into refs/bundle/heads instead of directly into refs/bundle/ so the tests also need to be updated to look in the new heirarchy. Signed-off-by: Scott Chacon <schacon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-28bundle: add bundle verification options typeJustin Tobler
When `unbundle()` is invoked, fsck verification may be configured by passing the `VERIFY_BUNDLE_FSCK` flag. This mechanism allows fsck checks on the bundle to be enabled or disabled entirely. To facilitate more fine-grained fsck configuration, additional context must be provided to `unbundle()`. Introduce the `unbundle_opts` type, which wraps the existing `verify_bundle_flags`, to facilitate future extension of `unbundle()` configuration. Also update `unbundle()` and its call sites to accept this new options type instead of the flags directly. The end behavior is functionally the same, but allows for the set of configurable options to be extended. This is leveraged in a subsequent commit to enable fsck message severity configuration. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-10-10bundle-uri: plug leak in unbundle_from_file()Toon Claes
The function `unbundle_from_file()` has two memory leaks: - We do not release the `struct bundle_header header` when hitting errors because we return early without any cleanup. - We do not release the `struct strbuf bundle_ref` at all. Plug these leaks by creating a common exit path where both of these variables are released. While at it, refactor the code such that the variable assignments do not happen inside the conditional statement itself according to our coding style. Signed-off-by: Toon Claes <toon@iotcl.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-23Merge branch 'ps/environ-wo-the-repository'Junio C Hamano
Code clean-up. * ps/environ-wo-the-repository: (21 commits) environment: stop storing "core.notesRef" globally environment: stop storing "core.warnAmbiguousRefs" globally environment: stop storing "core.preferSymlinkRefs" globally environment: stop storing "core.logAllRefUpdates" globally refs: stop modifying global `log_all_ref_updates` variable branch: stop modifying `log_all_ref_updates` variable repo-settings: track defaults close to `struct repo_settings` repo-settings: split out declarations into a standalone header environment: guard state depending on a repository environment: reorder header to split out `the_repository`-free section environment: move `set_git_dir()` and related into setup layer environment: make `get_git_namespace()` self-contained environment: move object database functions into object layer config: make dependency on repo in `read_early_config()` explicit config: document `read_early_config()` and `read_very_early_config()` environment: make `get_git_work_tree()` accept a repository environment: make `get_graft_file()` accept a repository environment: make `get_index_file()` accept a repository environment: make `get_object_directory()` accept a repository environment: make `get_git_common_dir()` accept a repository ...
2024-09-12environment: move object database functions into object layerPatrick Steinhardt
The `odb_mkstemp()` and `odb_pack_keep()` functions are quite clearly tied to the object store, but regardless of that they are located in "environment.c". Move them over, which also helps to get rid of dependencies on `the_repository` in the environment subsystem. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-22fetch: add top-level trace2 regionsJosh Steadmon
At $DAYJOB we experienced some slow fetch operations and needed some additional data to help diagnose the issue. Add top-level trace2 regions for the various modes of operation of `git-fetch`. None of these regions are in recursive code, so any enclosed trace messages should only see their nesting level increase by one. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-08Merge branch 'xx/bundie-uri-fixes'Junio C Hamano
When bundleURI interface fetches multiple bundles, Git failed to take full advantage of all bundles and ended up slurping duplicated objects. * xx/bundie-uri-fixes: unbundle: extend object verification for fetches fetch-pack: expose fsckObjects configuration logic bundle-uri: verify oid before writing refs
2024-06-20unbundle: extend object verification for fetchesXing Xin
The existing fetch.fsckObjects and transfer.fsckObjects configurations were not fully applied to bundle-involved fetches, including direct bundle fetches and bundle-uri enabled fetches. Furthermore, there was no object verification support for unbundle. This commit extends object verification support in `bundle.c:unbundle` by adding the `VERIFY_BUNDLE_FSCK` option to `verify_bundle_flags`. When this option is enabled, we append the `--fsck-objects` flag to `git-index-pack`. The `VERIFY_BUNDLE_FSCK` option is now used by bundle-involved fetches, where we use `fetch-pack.c:fetch_pack_fsck_objects` to determine whether to enable this option for `bundle.c:unbundle`, specifically in: - `transport.c:fetch_refs_from_bundle` for direct bundle fetches. - `bundle-uri.c:unbundle_from_file` for bundle-uri enabled fetches. This addition ensures a consistent logic for object verification during fetches. Tests have been added to confirm functionality in the scenarios mentioned above. Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-20bundle-uri: verify oid before writing refsXing Xin
When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why the "refs/bundles/main" is not discovered? After some digging I found that: 1. Bundles in bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` to by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbunding "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For files backend, it is `files-backend.c:files_transaction_prepare`, and for reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak <karthik.188@gmail.com> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07cocci: apply rules to rewrite callers of "refs" interfacesPatrick Steinhardt
Apply the rules that rewrite callers of "refs" interfaces to explicitly pass `struct ref_store`. The resulting patch has been applied with the `--whitespace=fix` option. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-29bundle-uri: mark unused parameters in callbacksJeff King
The first hunk is similar to 02c3c59e62 (hashmap: mark unused callback parameters, 2022-08-19), but was added after that commit. The other two are used with for_all_bundles_in_list(), but don't use their void data pointer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-06Merge branch 'gc/config-context'Junio C Hamano
Reduce reliance on a global state in the config reading API. * gc/config-context: config: pass source to config_parser_event_fn_t config: add kvi.path, use it to evaluate includes config.c: remove config_reader from configsets config: pass kvi to die_bad_number() trace2: plumb config kvi config.c: pass ctx with CLI config config: pass ctx with config files config.c: pass ctx in configsets config: add ctx arg to config_fn_t urlmatch.h: use config_fn_t type config: inline git_color_default_config
2023-06-28config: pass ctx with config filesGlen Choo
Pass config_context to config_callbacks when parsing config files. To provide the .kvi member, refactor out the configset logic that caches "struct config_source" and "enum config_scope" as a "struct key_value_info". Make the "enum config_scope" available to the config file machinery by plumbing an additional arg through git_config_from_file_with_options(). We do not exercise ctx yet because the remaining current_config_*() callers may be used with config_with_options(), which may read config from parameters, but parameters don't pass ctx yet. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28config: add ctx arg to config_fn_tGlen Choo
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-04-24treewide: remove cache.h inclusion due to previous changesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24copy.h: move declarations for copy.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24fetch_bundle_uri(): drop pointless NULL checkJeff King
We check if "uri" is NULL, but it cannot be since we'd have segfaulted earlier in the function when we unconditionally called xstrdup() on it. In theory we might want to soften that xstrdup() to handle this case, but even before the code which added it via c23f592117 (bundle-uri: fetch a list of bundles, 2022-10-12), we'd have fed NULL to fetch_bundle_uri_internal(), which would also segfault. The extra check isn't hurting anything, but it does cause Coverity to complain, and it may mislead somebody reading the code into thinking that a NULL uri is something we're prepared to handle. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'ds/fetch-bundle-uri-with-all'Junio C Hamano
"git fetch --all" does not have to download and handle the same bundleURI over and over, which has been corrected. * ds/fetch-bundle-uri-with-all: fetch: download bundles once, even with --all
2023-03-31fetch: download bundles once, even with --allDerrick Stolee
When fetch.bundleURI is set, 'git fetch' downloads bundles from the given bundle URI before fetching from the specified remote. However, when using non-file remotes, 'git fetch --all' will launch 'git fetch' subprocesses which then read fetch.bundleURI and fetch the bundle list again. We do not expect the bundle list to have new information during these multiple runs, so avoid these extra calls by un-setting fetch.bundleURI in the subprocess arguments. Be careful to skip fetching bundles for the empty bundle string. Fetching bundles from the empty list presents some interesting test failures. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24serve: use repository pointer to get configJeff King
A few of the v2 "serve" callbacks ignore their repository parameter and read config using the_repository (either directly or implicitly by calling wrapper functions). This isn't a bug since the server code only handles a single main repository anyway (and indeed, if you look at the callers, these repository parameters will always be the_repository). But in the long run we want to get rid of the_repository, so let's take a tiny step in that direction. As a bonus, this silences some -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15Merge branch 'ds/bundle-uri-5'Junio C Hamano
The bundle-URI subsystem adds support for creation-token heuristics to help incremental fetches. * ds/bundle-uri-5: bundle-uri: test missing bundles with heuristic bundle-uri: store fetch.bundleCreationToken fetch: fetch from an external bundle URI bundle-uri: drop bundle.flag from design doc clone: set fetch.bundleURI if appropriate bundle-uri: download in creationToken order bundle-uri: parse bundle.<id>.creationToken values bundle-uri: parse bundle.heuristic=creationToken t5558: add tests for creationToken heuristic bundle: verify using check_connected() bundle: test unbundling with incomplete history
2023-01-31bundle-uri: store fetch.bundleCreationTokenDerrick Stolee
When a bundle list specifies the "creationToken" heuristic, the Git client downloads the list and then starts downloading bundles in descending creationToken order. This process stops as soon as all downloaded bundles can be applied to the repository (because all required commits are present in the repository or in the downloaded bundles). When checking the same bundle list twice, this strategy requires downloading the bundle with the maximum creationToken again, which is wasteful. The creationToken heuristic promises that the client will not have a use for that bundle if its creationToken value is at most the previous creationToken value. To prevent these wasteful downloads, create a fetch.bundleCreationToken config setting that the Git client sets after downloading bundles. This value allows skipping that maximum bundle download when this config value is the same value (or larger). To test that this works correctly, we can insert some "duplicate" fetches into existing tests and demonstrate that only the bundle list is downloaded. The previous logic for downloading bundles by creationToken worked even if the bundle list was empty, but now we have logic that depends on the first entry of the list. Terminate early in the (non-sensical) case of an empty bundle list. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31clone: set fetch.bundleURI if appropriateDerrick Stolee
Bundle providers may organize their bundle lists in a way that is intended to improve incremental fetches, not just initial clones. However, they do need to state that they have organized with that in mind, or else the client will not expect to save time by downloading bundles after the initial clone. This is done by specifying a bundle.heuristic value. There are two types of bundle lists: those at a static URI and those that are advertised from a Git remote over protocol v2. The new fetch.bundleURI config value applies for static bundle URIs that are not advertised over protocol v2. If the user specifies a static URI via 'git clone --bundle-uri', then Git can set this config as a reminder for future 'git fetch' operations to check the bundle list before connecting to the remote(s). For lists provided over protocol v2, we will want to take a different approach and create a property of the remote itself by creating a remote.<id>.* type config key. That is not implemented in this change. Later changes will update 'git fetch' to consume this option. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31bundle-uri: download in creationToken orderDerrick Stolee
The creationToken heuristic provides an ordering on the bundles advertised by a bundle list. Teach the Git client to download bundles differently when this heuristic is advertised. The bundles in the list are sorted by their advertised creationToken values, then downloaded in decreasing order. This avoids the previous strategy of downloading bundles in an arbitrary order and attempting to apply them (likely failing in the case of required commits) until discovering the order through attempted unbundling. During a fresh 'git clone', it may make sense to download the bundles in increasing order, since that would prevent the need to attempt unbundling a bundle with required commits that do not exist in our empty object store. The cost of testing an unbundle is quite low, and instead the chosen order is optimizing for a future bundle download during a 'git fetch' operation with a non-empty object store. Since the Git client continues fetching from the Git remote after downloading and unbundling bundles, the client's object store can be ahead of the bundle provider's object store. The next time it attempts to download from the bundle list, it makes most sense to download only the most-recent bundles until all tips successfully unbundle. The strategy implemented here provides that short-circuit where the client downloads a minimal set of bundles. However, we are not satisfied by the naive approach of downloading bundles until one successfully unbundles, expecting the earlier bundles to successfully unbundle now. The example repository in t5558 demonstrates this well: ---------------- bundle-4 4 / \ ----|---|------- bundle-3 | | | 3 | | ----|---|------- bundle-2 | | 2 | | | ----|---|------- bundle-1 \ / 1 | (previous commits) In this repository, if we already have the objects for bundle-1 and then try to fetch from this list, the naive approach will fail. bundle-4 requires both bundle-3 and bundle-2, though bundle-3 will successfully unbundle without bundle-2. Thus, the algorithm needs to keep this in mind. A later implementation detail will store the maximum creationToken seen during such a bundle download, and the client will avoid downloading a bundle unless its creationToken is strictly greater than that stored value. For now, if the client seeks to download from an identical bundle list since its previous download, it will download the most-recent bundle then stop since its required commits are already in the object store. Add tests that exercise this behavior, but we will expand upon these tests when incremental downloads during 'git fetch' make use of creationToken values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31bundle-uri: parse bundle.<id>.creationToken valuesDerrick Stolee
The previous change taught Git to parse the bundle.heuristic value, especially when its value is "creationToken". Now, teach Git to parse the bundle.<id>.creationToken values on each bundle in a bundle list. Before implementing any logic based on creationToken values for the creationToken heuristic, parse and print these values for testing purposes. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31bundle-uri: parse bundle.heuristic=creationTokenDerrick Stolee
The bundle.heuristic value communicates that the bundle list is organized to make use of the bundle.<id>.creationToken values that may be provided in the bundle list. Those values will create a total order on the bundles, allowing the Git client to download them in a specific order and even remember previously-downloaded bundles by storing the maximum creation token value. Before implementing any logic that parses or uses the bundle.<id>.creationToken values, teach Git to parse the bundle.heuristic value from a bundle list. We can use 'test-tool bundle-uri' to print the heuristic value and verify that the parsing works correctly. As an extra precaution, create the internal 'heuristics' array to be a list of (enum, string) pairs so we can iterate through the array entries carefully, regardless of the enum values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>