path: root/http.c
Age | Commit message | Author
2025-07-23 | config: move Git config parsing into "environment.c" | Patrick Steinhardt
In "config.c" we host both the business logic to read and write config files as well as the logic to parse specific Git-related variables. On the one hand this is mixing concerns, but even more importantly it means that we cannot easily remove the dependency on `the_repository` in our config parsing logic. Move the logic into "environment.c". This file is a grab bag of all kinds of global state already, so it is quite a good fit. Furthermore, it also hosts most of the global variables that we're parsing the config values into, making this an even better fit. Note that there is one hidden change: in `parse_fsync_components()` we use an `int` to iterate through `ARRAY_SIZE(fsync_component_names)`. But as -Wsign-compare warnings are enabled in this file this causes a compiler warning. The issue is fixed by using a `size_t` instead. This change allows us to drop the `USE_THE_REPOSITORY_VARIABLE` declaration. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
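The `size_t` fix in `parse_fsync_components()` can be illustrated with a minimal sketch. The table below uses illustrative entries, not git's actual `fsync_component_names`, and `find_component()` is a hypothetical helper. `ARRAY_SIZE()` evaluates to a `size_t`, so comparing it against an `int` loop counter is what `-Wsign-compare` flags.

```c
#include <stddef.h>
#include <string.h>

/* Same idea as git's ARRAY_SIZE(): element count of a true array (a size_t). */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Illustrative entries only; not git's real fsync component table. */
static const char *component_names[] = { "loose-object", "pack", "index" };

/* Return the index of NAME in the table, or -1 if it is unknown.
 * The counter is a size_t, so "i < ARRAY_SIZE(...)" compares two unsigned
 * values and does not trigger -Wsign-compare. */
static long find_component(const char *name)
{
	for (size_t i = 0; i < ARRAY_SIZE(component_names); i++)
		if (!strcmp(component_names[i], name))
			return (long)i;
	return -1;
}
```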
2025-07-23 | config: drop `git_config()` wrapper | Patrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those functions have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-15 | Merge branch 'ps/object-store' | Junio C Hamano
Code clean-up around object access API.
* ps/object-store:
  odb: rename `read_object_with_reference()`
  odb: rename `pretend_object_file()`
  odb: rename `has_object()`
  odb: rename `repo_read_object_file()`
  odb: rename `oid_object_info()`
  odb: trivial refactorings to get rid of `the_repository`
  odb: get rid of `the_repository` when handling submodule sources
  odb: get rid of `the_repository` when handling the primary source
  odb: get rid of `the_repository` in `for_each()` functions
  odb: get rid of `the_repository` when handling alternates
  odb: get rid of `the_repository` in `odb_mkstemp()`
  odb: get rid of `the_repository` in `assert_oid_type()`
  odb: get rid of `the_repository` in `find_odb()`
  odb: introduce parent pointers
  object-store: rename files to "odb.{c,h}"
  object-store: rename `object_directory` to `odb_source`
  object-store: rename `raw_object_store` to `object_database`
2025-07-01 | object-store: rename files to "odb.{c,h}" | Patrick Steinhardt
In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_source`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 | object-store: rename `object_directory` to `odb_source` | Patrick Steinhardt
The `object_directory` structure is used as an access point for a single object directory like ".git/objects". While the structure isn't yet fully self-contained, the intent is for it to eventually contain all information required to access objects in one specific location. While the name "object directory" is a good fit for now, this will change over time as we continue with the agenda to make pluggable object databases a thing. Eventually, objects may not be accessed via any kind of directory at all anymore, but they could instead be backed by any kind of durable storage mechanism. While it seems quite far-fetched for now, it is conceivable that this might eventually even be some form of a database, for example. As such, the current name of this structure will become worse over time as we evolve into the direction of pluggable ODBs. Immediate next steps will start to carve out proper self-contained object directories, which requires us to pass in these object directories as parameters. Based on our modern naming scheme, this means that those functions should then be named after their subsystem, which means that we would start to bake the current name into the codebase more and more. Let's preempt this by renaming the structure. There have been a couple of alternatives that were discussed:
  - `odb_backend` was discarded because it led to the association that one object database has a single backend, but the model is that one alternate has one backend. Furthermore, "backend" is more about the actual backing implementation and less about the high-level concept.
  - `odb_alternate` was discarded because it is a bit of a stretch to also call the main object directory an "alternate".
Instead, pick `odb_source` as the new name. It makes it sufficiently clear that there can be multiple sources and does not cause confusion when mixed with the already-existing "alternate" terminology.
In the future, this change allows us to easily introduce, for example, an `odb_files_source` and other format-specific implementations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-06 | curl: pass `long` values where expected | Johannes Schindelin
As of Homebrew's update to cURL v8.14.0, there are new compile errors to be observed in the `osx-gcc` job of Git's CI builds:
    In file included from http.h:8,
                     from imap-send.c:36:
    In function 'setup_curl',
        inlined from 'curl_append_msgs_to_imap' at imap-send.c:1460:9,
        inlined from 'cmd_main' at imap-send.c:1581:9:
    /usr/local/Cellar/curl/8.14.0/include/curl/typecheck-gcc.h:50:15: error: call to '_curl_easy_setopt_err_long' declared with attribute warning: curl_easy_setopt expects a long argument [-Werror=attribute-warning]
       50 |   _curl_easy_setopt_err_long(); \
          |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /usr/local/Cellar/curl/8.14.0/include/curl/curl.h:54:7: note: in definition of macro 'CURL_IGNORE_DEPRECATION'
       54 |   statements \
          |   ^~~~~~~~~~
    imap-send.c:1423:9: note: in expansion of macro 'curl_easy_setopt'
     1423 |   curl_easy_setopt(curl, CURLOPT_PORT, srvc->port);
          |   ^~~~~~~~~~~~~~~~
    [... many more instances of nearly identical warnings...]
See for example this CI workflow run: https://github.com/git/git/actions/runs/15454602308/job/43504278284#step:4:307 The most likely explanation is the entry "typecheck-gcc.h: fix the typechecks" in cURL's release notes (https://curl.se/ch/8.14.0.html). Nearly identical compile errors afflicted recently-updated Debian setups, which have been addressed by `jk/curl-easy-setopt-typefix`. However, on macOS Git is built with different build options, which uncovered more instances of `int` values that need to be cast to `long` and that were not covered by 6f11c42e8edc (curl: fix integer constant typechecks with curl_easy_setopt(), 2025-06-04). Let's explicitly convert even those remaining `int` values in `curl_easy_setopt()` calls to `long` parameters.
In addition to looking at the compile errors of the `osx-gcc` job, I verified that there are no other instances of the same issue that need to be handled in this manner (and that might not be caught by our CI builds because of yet other build options that might skip those code parts): I ran the following command and inspected all 23 results manually to ensure that the fix is now actually complete:
    git grep -n curl_easy_setopt | grep -ve ',.*, *[A-Za-z_"&]' \
        -e ',.*, *[-0-9]*L)' \
        -e ',.*,.* (long)'
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-04 | curl: fix symbolic constant typechecks with curl_easy_setopt() | Jeff King
As with the previous two commits, we should be passing long integers, not regular ones, to curl_easy_setopt(), and compiling against curl 8.14 loudly complains if we don't. This patch catches the remaining cases, which are ones where we pass curl's own symbolic constants. We'll cast them to long manually in each call. It seems kind of weird to me that curl doesn't define these constants as longs, since the point of them is to pass to curl_easy_setopt(). But in the curl documentation and examples, they clearly show casting them as part of the setopt calls. It may be that there is some reason not to push the type into the macro, like backwards compatibility. I didn't dig, as it doesn't really matter: we have to follow what existing curl versions ask for anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-06-04 | curl: fix integer constant typechecks with curl_easy_setopt() | Jeff King
The curl documentation specifies that curl_easy_setopt() takes either:
    ...a long, a function pointer, an object pointer or a curl_off_t, depending on what the specific option expects.
But when we pass an integer constant like "0", it will by default be a regular non-long int. This has always been wrong, but seemed to work in practice (I didn't dig into curl's implementation to see whether this might actually be triggering undefined behavior, but it seems likely and regardless we should do what the docs say). This is especially important since curl has a type-checking macro that causes building against curl 8.14 to produce many warnings. The warnings come from their commit 79b4e56b3 (typecheck-gcc.h: fix the typechecks, 2025-04-22). Curiously, they only seem to trigger when compiled with -O2 for me. We can fix it by just marking the constants with a long "L" suffix. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
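Why the `L` suffix and the `(long)` casts on symbolic constants matter can be shown with a self-contained stand-in for a variadic setter. This is a minimal sketch, not curl's code: `demo_setopt()`, the `DEMO_*` option codes and the enum values are all hypothetical, and the sketch assumes the receiving side reads the argument with `va_arg(ap, long)`, as a function documented to take a `long` would. Reading a `long` where an `int` was passed is undefined behavior in C, even on platforms where both types happen to be the same size.

```c
#include <stdarg.h>

/* Hypothetical option codes and enum constants; not curl's definitions. */
enum demo_option { DEMO_OPT_VERBOSE = 1, DEMO_OPT_HTTP_VERSION = 2 };
enum demo_http_version { DEMO_HTTP_VERSION_1_1 = 2, DEMO_HTTP_VERSION_2 = 3 };

static long last_value; /* records what the setter actually read */

/* Stand-in for a setopt-style function whose "long" options are fetched
 * with va_arg(ap, long). Callers must therefore pass a real long. */
static void demo_setopt(int option, ...)
{
	va_list ap;

	va_start(ap, option);
	last_value = va_arg(ap, long);
	va_end(ap);
}

static void configure(void)
{
	/* Integer literal: the L suffix makes the argument a long. */
	demo_setopt(DEMO_OPT_VERBOSE, 0L);
	/* Enum constants have type int, so cast them explicitly. */
	demo_setopt(DEMO_OPT_HTTP_VERSION, (long)DEMO_HTTP_VERSION_2);
}
```

Writing `demo_setopt(DEMO_OPT_VERBOSE, 0)` would compile silently here; curl's `typecheck-gcc.h` exists precisely to turn that mistake into a diagnostic for `curl_easy_setopt()`.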
2025-04-29 | object-store: drop `loose_object_path()` | Patrick Steinhardt
The function `loose_object_path()` is a trivial wrapper around `odb_loose_path()`, with the only exception that it always uses the primary object database of the given repository. This doesn't really add a ton of value though, so let's drop the function and inline it at every callsite. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24 | Merge branch 'ps/object-file-cleanup' into ps/object-store-cleanup | Junio C Hamano
* ps/object-file-cleanup:
  object-store: merge "object-store-ll.h" and "object-store.h"
  object-store: remove global array of cached objects
  object: split out functions relating to object store subsystem
  object-file: drop `index_blob_stream()`
  object-file: split up concerns of `HASH_*` flags
  object-file: split out functions relating to object store subsystem
  object-file: move `xmmap()` into "wrapper.c"
  object-file: move `git_open_cloexec()` to "compat/open.c"
  object-file: move `safe_create_leading_directories()` into "path.c"
  object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-15 | object-store: merge "object-store-ll.h" and "object-store.h" | Patrick Steinhardt
The "object-store-ll.h" header has been introduced to keep transitive header dependencies and compile times at bay. Now that we have created a new "object-store.c" file, though, we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative, we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 | http.c: allow custom TCP keepalive behavior via config | Taylor Blau
curl supports a few options to control when and how often it should instruct the OS to send TCP keepalives, like KEEPIDLE, KEEPINTVL, and KEEPCNT. Until this point, there hasn't been a way for users to change what values are used for these options, forcing them to rely on curl's defaults. But we do unconditionally enable TCP keepalives without giving users an ability to tweak any fine-grained parameters. Ordinarily this isn't a problem, particularly for users that have fast-enough connections, and/or are talking to a server that has generous or nonexistent thresholds for killing a connection it hasn't heard from in a while. But it can present a problem when one or both of those assumptions fail. For instance, I can reliably get an in-progress clone to be killed from the remote end when cloning from some forges while using trickle to limit my clone's bandwidth. For those users and others who wish to more finely tune the OS's keepalive behavior, expose configuration and environment variables which allow setting curl's KEEPIDLE, KEEPINTVL, and KEEPCNT options. Note that while KEEPIDLE and KEEPINTVL were added in curl 7.25.0, KEEPCNT was added much more recently in curl 8.9.0. Per f7c094060c (git-curl-compat: remove check for curl 7.25.0, 2024-10-23), both KEEPIDLE and KEEPINTVL are set unconditionally. But since we may be compiled with a curl that isn't as new as 8.9.0, only set KEEPCNT when we have CURLOPT_TCP_KEEPCNT to begin with. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
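At the OS level, the three knobs this commit exposes roughly correspond (on Linux, at least) to per-socket options that libcurl sets when TCP keepalive is enabled. The sketch below is not git's or curl's code: `tune_keepalive()` is a hypothetical helper using plain POSIX sockets, and it assumes a platform whose headers may or may not define `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT`. It mirrors the commit's approach of only touching the count option when the headers provide it.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable keepalive on FD and tune idle time (seconds before the first probe),
 * probe interval (seconds) and probe count. Returns 0 on success, -1 if any
 * setsockopt() call fails. Options missing from the headers are skipped. */
static int tune_keepalive(int fd, int idle, int interval, int count)
{
	int on = 1;

	(void)idle; (void)interval; (void)count; /* may be unused below */

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
#ifdef TCP_KEEPIDLE
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
#endif
#ifdef TCP_KEEPINTVL
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) < 0)
		return -1;
#endif
#ifdef TCP_KEEPCNT
	/* Like the commit's CURLOPT_TCP_KEEPCNT guard: only set the probe
	 * count when the platform actually defines the option. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
		return -1;
#endif
	return 0;
}
```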
2025-03-21 | http.c: inline `set_curl_keepalive()` | Taylor Blau
At the end of `get_curl_handle()` we call `set_curl_keepalive()` to enable TCP keepalive probes on our CURL handle. `set_curl_keepalive()` dates back to 47ce115370 (http: use curl's tcp keepalive if available, 2013-10-14), which conditionally compiled different variants of `set_curl_keepalive()` depending on what version of curl we were compiled with[^1]. As of f7c094060c (git-curl-compat: remove check for curl 7.25.0, 2024-10-23), we no longer conditionally compile `set_curl_keepalive()` since we no longer support pre-7.25.0 versions of curl. But the version of that function that we kept is really just a thin wrapper around setting the TCP_KEEPALIVE option, so there's no reason to keep it in its own function. Inline the definition of `set_curl_keepalive()` to within `get_curl_handle()` so that the setup of our CURL handle is self-contained. [1]: The details are spelled out in 47ce115370, but the gist is curl 7.25.0 and newer use CURLOPT_TCP_KEEPALIVE, older versions use CURLOPT_SOCKOPTFUNCTION with a custom callback, and older versions that predate even that option do nothing. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-21 | http.c: introduce `set_long_from_env()` for convenience | Taylor Blau
In 7059cd99fc (http_init(): Fix config file parsing, 2009-03-09), http.c gained a new "set_from_env()" function as a convenience function around conditionally assigning an environment variable to some variable if and only if the environment variable was set to begin with. But even prior to 7059cd99fc, there were two spots which needed to first strtol() whatever is set in the environment before assigning it to a long pointer. Both instances stored the result of getenv() in a temporary variable, and conditionally strtol() it depending on whether or not getenv() returned NULL. Replace those two instances with a new cousin of 'set_from_env()' called 'set_long_from_env()', which does what its name suggests. This allows us to remove the temporary variables and clean up some minor code duplication while also adding more robust error handling. More importantly, however, it prepares us for a future commit which will introduce more instances of assigning an environment variable to a long. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
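A helper of this shape can be sketched with standard C alone. This is a hypothetical approximation, not the function from the commit: `set_long_from_env_sketch()` and the environment variable name used in the usage note are made up, and warnings go to stderr instead of git's own warning machinery. The point is the "more robust error handling": the value is only assigned when the whole string parses as a base-10 `long`.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* If ENVNAME is set and holds a complete base-10 long, store it in *var.
 * On overflow, empty input or trailing garbage, warn and leave *var as is.
 * If ENVNAME is unset, do nothing, so an earlier (e.g. config) value wins. */
static void set_long_from_env_sketch(long *var, const char *envname)
{
	const char *val = getenv(envname);
	char *end;
	long parsed;
	int saved_errno;

	if (!val)
		return;

	saved_errno = errno;
	errno = 0;
	parsed = strtol(val, &end, 10);
	if (errno || end == val || *end)
		fprintf(stderr, "warning: failed to parse %s\n", envname);
	else
		*var = parsed;
	errno = saved_errno; /* do not leak strtol's errno to the caller */
}
```

For example, with a hypothetical `DEMO_LOW_SPEED_LIMIT=42` in the environment, a `long` initialized from config is overwritten with 42, while `DEMO_LOW_SPEED_LIMIT=12abc` leaves it untouched and prints a warning.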
2025-03-21 | http.c: remove unnecessary casts to long | Taylor Blau
When parsing 'http.lowSpeedLimit' and 'http.lowSpeedTime', we explicitly cast the result of 'git_config_int()' to a long before assignment. This cast has been in place since all the way back in 58e60dd203 (Add support for pushing to a remote repository using HTTP/DAV, 2005-11-02). But that cast has always been unnecessary, since long is guaranteed to be at least as wide as int. Let's drop the cast accordingly. Noticed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-24 | http: allow using netrc for WebDAV-based HTTP protocol | brian m. carlson
For an extended period of time, we've enabled libcurl's netrc functionality, which will read credentials from the netrc file if none are provided. Unfortunately, we have also not documented this fact or written any tests for it, but people have come to rely on it. In 610cbc1dfb ("http: allow authenticating proactively", 2024-07-10), we accidentally broke the ability of users to use the netrc file for the WebDAV-based HTTP protocol. Notably, it works on the initial request but does not work on subsequent requests, which causes failures because that version of the protocol will necessarily make multiple requests. This happens because curl_empty_auth_enabled never returns -1, only 0 or 1, and so if http.proactiveAuth is not enabled, the username and password are always set to empty credentials, which prevents libcurl's fallback to netrc from working. However, in other cases, the server continues to get a 401 response and the credential helper is invoked, which is the normal behavior, so this was not noticed earlier. To fix this, change the condition to check for enabling empty auth and also not having proactive auth enabled, which should result in the username and password not being set to a single colon in the typical case, and thus the netrc file being used. Reported-by: Peter Georg <peter.georg@physik.uni-regensburg.de> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-31 | global: adapt callers to use generic hash context helpers | Patrick Steinhardt
Adapt callers to use generic hash context helpers instead of using the hash algorithm to update them. This makes the callsites easier to reason about and removes the possibility that the wrong hash algorithm is used to update the hash context's state. And as a nice side effect this also gets rid of a bunch of users of `the_hash_algo`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 | credential: stop using `the_repository` | Patrick Steinhardt
Stop using `the_repository` in the "credential" subsystem by passing in a repository when filling, approving or rejecting credentials. Adjust callers accordingly by using `the_repository`. While there may be some callers that have a repository available in their context, this trivial conversion allows for easier verification and bubbles up the use of `the_repository` by one level. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-18 | Merge branch 'ps/build-sign-compare' into ps/the-repository | Junio C Hamano
* ps/build-sign-compare:
  t/helper: don't depend on implicit wraparound
  scalar: address -Wsign-compare warnings
  builtin/patch-id: fix type of `get_one_patchid()`
  builtin/blame: fix type of `length` variable when emitting object ID
  gpg-interface: address -Wsign-comparison warnings
  daemon: fix type of `max_connections`
  daemon: fix loops that have mismatching integer types
  global: trivial conversions to fix `-Wsign-compare` warnings
  pkt-line: fix -Wsign-compare warning on 32 bit platform
  csum-file: fix -Wsign-compare warning on 32-bit platform
  diff.h: fix index used to loop through unsigned integer
  config.mak.dev: drop `-Wno-sign-compare`
  global: mark code units that generate warnings with `-Wsign-compare`
  compat/win32: fix -Wsign-compare warning in "wWinMain()"
  compat/regex: explicitly ignore "-Wsign-compare" warnings
  git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-13 | Merge branch 'kn/midx-wo-the-repository' | Junio C Hamano
Yet another "pass the repository through the callchain" topic.
* kn/midx-wo-the-repository:
  midx: inline the `MIDX_MIN_SIZE` definition
  midx: pass down `hash_algo` to functions using global variables
  midx: pass `repository` to `load_multi_pack_index`
  midx: cleanup internal usage of `the_repository` and `the_hash_algo`
  midx-write: pass down repository to `write_midx_file[_only]`
  write-midx: add repository field to `write_midx_context`
  midx-write: use `revs->repo` inside `read_refs_snapshot`
  midx-write: pass down repository to static functions
  packfile.c: remove unnecessary prepare_packed_git() call
  midx: add repository to `multi_pack_index` struct
  config: make `packed_git_(limit|window_size)` non-global variables
  config: make `delta_base_cache_limit` a non-global variable
  packfile: pass down repository to `for_each_packed_object`
  packfile: pass down repository to `has_object[_kept]_pack`
  packfile: pass down repository to `odb_pack_name`
  packfile: pass `repository` to static function in the file
  packfile: use `repository` from `packed_git` directly
  packfile: add repository to struct `packed_git`
2024-12-06 | global: mark code units that generate warnings with `-Wsign-compare` | Patrick Steinhardt
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 | Merge branch 'bc/drop-ancient-libcurl-and-perl' | Junio C Hamano
Drop support for older libcURL and Perl.
* bc/drop-ancient-libcurl-and-perl:
  gitweb: make use of s///r
  Require Perl 5.26.0
  INSTALL: document requirement for libcurl 7.61.0
  git-curl-compat: remove check for curl 7.56.0
  git-curl-compat: remove check for curl 7.53.0
  git-curl-compat: remove check for curl 7.52.0
  git-curl-compat: remove check for curl 7.44.0
  git-curl-compat: remove check for curl 7.43.0
  git-curl-compat: remove check for curl 7.39.0
  git-curl-compat: remove check for curl 7.34.0
  git-curl-compat: remove check for curl 7.25.0
  git-curl-compat: remove check for curl 7.21.5
2024-12-04 | packfile: pass down repository to `odb_pack_name` | Karthik Nayak
The function `odb_pack_name` currently relies on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04 | packfile: add repository to struct `packed_git` | Karthik Nayak
The struct `packed_git` holds information regarding a packed object file. Let's add the repository variable to this object, to represent the repository that this packfile belongs to. This helps remove dependency on the global `the_repository` object in `packfile.c` by simply using repository information now readily available in the struct. We do need to consider that a packfile could be part of the alternates of a repository, but considering that we only have one repository struct and also that we currently use 'the_repository' anyway, we should be OK with this change. We also modify `alloc_packed_git` to ensure that the repository is added to newly created `packed_git` structs. This requires modifying the function and all its callers to pass the repository object down the levels. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-01 | Merge branch 'jk/dumb-http-finalize' | Taylor Blau
The dumb-http code regressed when the result of re-indexing a pack yielded an *.idx file that differs in content from the *.idx file it downloaded from the remote. This has been corrected by no longer relying on the *.idx file we got from the remote.
* jk/dumb-http-finalize:
  packfile: use oidread() instead of hashcpy() to fill object_id
  packfile: use object_id in find_pack_entry_one()
  packfile: convert find_sha1_pack() to use object_id
  http-walker: use object_id instead of bare hash
  packfile: warn people away from parse_packed_git()
  packfile: drop sha1_pack_index_name()
  packfile: drop sha1_pack_name()
  packfile: drop has_pack_index()
  dumb-http: store downloaded pack idx as tempfile
  t5550: count fetches in "previously-fetched .idx" test
  midx: avoid duplicate packed_git entries
2024-10-25 | packfile: drop sha1_pack_name() | Jeff King
The sha1_pack_name() function has a few ugly bits:
  - it writes into a static strbuf (and not even a ring buffer of them), which can lead to subtle invalidation problems
  - it uses the term "sha1", but it's really using the_hash_algo, which could be sha256
There's only one caller of it left. And in fact that caller is better off using the underlying odb_pack_name() function itself, since it's just copying the result into its own strbuf anyway. Converting that caller lets us get rid of this now-obsolete function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
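The "subtle invalidation problems" of a single static buffer are easy to reproduce. In this minimal sketch, `pack_name_static()` is a hypothetical function written in the style described above, and `pack_name_into()` is a hypothetical caller-owned alternative, roughly analogous to writing into the caller's strbuf; neither is git's actual code, and the path layout is only illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Formats into ONE static buffer and returns it: every call overwrites the
 * result of the previous call, so earlier pointers silently change. */
static const char *pack_name_static(const char *hex)
{
	static char buf[128];

	snprintf(buf, sizeof(buf), "objects/pack/pack-%s.pack", hex);
	return buf;
}

/* Caller-owned alternative: the result lives in storage the caller controls,
 * so later calls cannot clobber it. */
static char *pack_name_into(char *out, size_t len, const char *hex)
{
	snprintf(out, len, "objects/pack/pack-%s.pack", hex);
	return out;
}
```

With the static version, `a = pack_name_static("aaaa"); b = pack_name_static("bbbb");` leaves `a` and `b` pointing at the same buffer, which now holds the "bbbb" name.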
2024-10-25 | packfile: drop has_pack_index() | Jeff King
The has_pack_index() function has several oddities that may make it surprising if you are trying to find out if we have a pack with some $hash:
  - it is not looking for a valid pack that we found while searching object directories. It just looks for any pack-$hash.idx file in the pack directory.
  - it only looks in the local directory, not any alternates
  - it takes a bare "unsigned char" hash, which we try to avoid these days
The only caller it has is in the dumb http code; it wants to know if we already have the pack idx in question. This can happen if we downloaded the pack (and generated its index) during a previous fetch. Before the previous patch ("dumb-http: store downloaded pack idx as tempfile"), it could also happen if we downloaded the .idx from the remote but didn't get the matching .pack. But since that patch, we don't hold on to those .idx files. So there's no need to look for the .idx file in the filesystem; we can just scan through the packed_git list to see if we have it. That lets us simplify the dumb http code a bit, as we know that if we have the .idx we have the matching .pack already. And it lets us get rid of this odd function that is unlikely to be needed again. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
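Replacing a filesystem probe with a scan of the in-memory pack list amounts to a linked-list lookup. The sketch below uses a heavily simplified, hypothetical `struct pack_entry` instead of git's real `struct packed_git`, and `find_loaded_pack()` is a made-up name. The modeling assumption is that the list only contains packs that were actually loaded, so a hit implies the pack itself is present, unlike a bare check for a pack-$hash.idx file on disk.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an in-memory list of loaded packs. */
struct pack_entry {
	const char *hash_hex;          /* the $hash in pack-$hash.{pack,idx} */
	struct pack_entry *next;
};

/* Return the loaded pack whose name hash matches HASH_HEX, or NULL.
 * No filesystem access: only packs already in the list can match. */
static struct pack_entry *find_loaded_pack(struct pack_entry *head,
					   const char *hash_hex)
{
	for (struct pack_entry *p = head; p; p = p->next)
		if (!strcmp(p->hash_hex, hash_hex))
			return p;
	return NULL;
}
```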
2024-10-25 | dumb-http: store downloaded pack idx as tempfile | Jeff King
This patch fixes a regression in b1b8dfde69 (finalize_object_file(): implement collision check, 2024-09-26) where fetching a v1 pack idx file over the dumb-http protocol would cause the fetch to fail. The core of the issue is that dumb-http stores the idx we fetch from the remote at the same path that will eventually hold the idx we generate from "index-pack --stdin". The sequence is something like this:
  0. We realize we need some object X, which we don't have locally, and nor does the other side have it as a loose object.
  1. We download the list of remote packs from objects/info/packs.
  2. For each entry in that file, we download each pack index and store it locally in .git/objects/pack/pack-$hash.idx (the $hash is not something we can verify yet and is given to us by the remote).
  3. We check each pack index we got to see if it has object X. When we find a match, we download the matching .pack file from the remote to a tempfile. We feed that to "index-pack --stdin", which reindexes the pack, rather than trusting that it has what the other side claims it does. In most cases, this will end up generating the exact same (byte-for-byte) pack index which we'll store at the same pack-$hash.idx path, because the index generation and $hash id are computed based on what's in the packfile. But:
     a. The other side might have used other options to generate the index. For instance we use index v2 by default, but long ago it was v1 (and you can still ask for v1 explicitly).
     b. The other side might even use a different mechanism to determine $hash. E.g., long ago it was based on the sorted list of objects in the packfile, but we switched to using the pack checksum in 1190a1acf8 (pack-objects: name pack files after trailer hash, 2013-12-05).
The regression we saw in the real world was (3a). A recent client fetching from a server with a v1 index downloaded that index, then complained about trying to overwrite it with its own v2 index.
This collision is otherwise harmless; we know we want to replace the remote version with our local one, but the collision check doesn't realize that. There are a few options to fix it:
  - we could teach index-pack a command-line option to ignore only pack idx collisions, and use it when the dumb-http code invokes index-pack. This would be an awkward thing to expose users to and would involve a lot of boilerplate to get the option down to the collision code.
  - we could delete the remote .idx file right before running index-pack. It should be redundant at that point (since we've just downloaded the matching pack). But it feels risky to delete something from our own .git/objects based on what the other side has said. I'm not entirely positive that a malicious server couldn't lie about which pack-$hash.idx it has and get us to delete something precious.
  - we can stop commingling the downloaded idx files in our local objects directory. This is a slightly bigger change but I think fixes the root of the problem more directly.
This patch implements the third option. The big design questions are: where do we store the downloaded files, and how do we manage their lifetimes? There are some additional quirks to the dumb-http system we should consider. Remember that in step 2 we downloaded every pack index, but in step 3 we may only download some of the matching packs. What happens to those other idx files now? They sit in the .git/objects/pack directory, possibly waiting to be used at a later date. That may save bandwidth for a subsequent fetch, but it also creates a lot of weird corner cases:
  - our local object directory now has semi-untrusted .idx files sitting around, without their matching .pack
  - in case 3b, we noted that we might not generate the same hash as the other side. In that case even if we download the matching pack, our index-pack invocation will store it in a different pack-$hash.idx file. And the unmatched .idx will sit there forever.
  - if the server repacks, it may delete the old packs. Now we have these orphaned .idx files sitting around locally that will never be used (nor deleted).
  - if we repack locally we may delete our local version of the server's pack index and not realize we have it. So we'll download it again, even though we have all of the objects it mentions.
I think the right solution here is probably some more complex cache management system: download the remote .idx files to their own storage directory, mark them as "seen" when we get their matching pack (to avoid re-downloading even if we repack), and then delete them when the server's objects/info/packs no longer mentions them. But since the dumb http protocol is so ancient and so inferior to the smart http protocol, I don't think it's worth spending a lot of time creating such a system. For this patch I'm just downloading the idx files to .git/objects/tmp_pack_*, and marking them as tempfiles to be deleted when we exit (and due to the name, any we miss due to a crash, etc, should eventually be removed by "git gc" runs based on timestamps). That is slightly worse for one case: if we download an idx but not the matching pack, we won't retain that idx for subsequent runs. But the flip side is that we're making other cases better (we never hold on to useless idx files forever). I suspect that worse case does not even come up often, since it implies that the packs are generated to match distinct parts of history (i.e., in practice even in a repo with many packs you're going to end up grabbing all of those packs to do a clone). If somebody really cares about that, I think the right path forward is a managed cache directory as above, and this patch is providing the first step in that direction anyway (by moving things out of the objects/pack/ directory). There are two test changes.
One demonstrates the broken v1 index case (it double-checks the resulting clone with fsck to be careful, but prior to this patch it actually fails at the clone step). The other tweaks the expectation for a test that covers the "slightly worse" case to accommodate the extra index download. The code changes are fairly simple. We stop using finalize_object_file() to copy the remote's index file into place, and leave it as a tempfile. We give the tempfile a real ".idx" name, since the packfile code expects that, and we make sure it is outside the usual pack/ directory (so we'd never mistake it for a real local .idx). We also have to change parse_pack_index(), which creates a temporary packed_git to access our index (we need this because all of the pack idx code assumes we have that struct). It reads the index data from the tempfile, but prior to this patch would speculatively write the finalized name into the packed_git struct using the pack-$hash we expect to use. I was mildly surprised that this worked at all, since we call verify_pack_index() on the packed_git which mentions the final name before moving the file into place! But it works because parse_pack_index() leaves the mmap-ed data in the struct, so the lazy-open in verify_pack_index() never triggers, and we read from the tempfile, ignoring the filename in the struct completely. Hacky, but it works. After this patch, parse_pack_index() uses the index filename we pass in to derive a matching .pack name. This is OK to change because there are only two callers, both in the dumb http code (the other caller passes in an existing pack-$hash.idx name, so the derived name is going to be pack-$hash.pack, which is what we were using anyway). I'll follow up with some more cleanups in that area, but this patch is sufficient to fix the regression. Reported-by: fox <fox.gbr@townlong-yak.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
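The idea of deriving the .pack name from whatever .idx filename the caller passes in (instead of assuming pack-$hash in objects/pack) can be pictured with a small standalone helper. This is a sketch only: `idx_to_pack_name()` is a hypothetical function, not Git's code, and it assumes the input ends in ".idx".

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: turn ".../pack-1234.idx" into a newly allocated
 * ".../pack-1234.pack" in the same directory, so a tempfile index gets a
 * sibling pack name rather than a name under objects/pack/.
 * Returns NULL if the ".idx" suffix is missing or allocation fails.
 * The caller frees the result.
 */
char *idx_to_pack_name(const char *idx_name)
{
	static const char idx_suffix[] = ".idx";
	static const char pack_suffix[] = ".pack";
	size_t len = strlen(idx_name);
	size_t slen = sizeof(idx_suffix) - 1;
	size_t base_len;
	char *out;

	if (len < slen || strcmp(idx_name + len - slen, idx_suffix))
		return NULL;
	base_len = len - slen;

	/* sizeof(pack_suffix) already counts the terminating NUL */
	out = malloc(base_len + sizeof(pack_suffix));
	if (!out)
		return NULL;
	memcpy(out, idx_name, base_len);
	memcpy(out + base_len, pack_suffix, sizeof(pack_suffix));
	return out;
}
```

For a caller that already passes pack-$hash.idx, this yields pack-$hash.pack, which matches the commit's point that the second caller's behavior is unchanged.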
2024-10-23git-curl-compat: remove check for curl 7.56.0brian m. carlson
libcurl 7.56.0 was released in October 2017, which is over seven years ago, and no major operating system vendor is still providing security support for it. Debian 10, which is out of mainstream security support, has supported a newer version, and Ubuntu 20.04 and RHEL 8, which are still in support, also have a newer version. Remove the check for this version and use this functionality unconditionally. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-23git-curl-compat: remove check for curl 7.53.0brian m. carlson
libcurl 7.53.0 was released in February 2017, which is over seven years ago, and no major operating system vendor is still providing security support for it. Debian 10 and Ubuntu 18.04, both of which are out of mainstream security support, have supported a newer version, and RHEL 8, which is still in support, also has a newer version. Remove the check for this version and use this functionality unconditionally. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-23git-curl-compat: remove check for curl 7.52.0brian m. carlson
libcurl 7.52.0 was released in December 2016, which is over seven years ago, and no major operating system vendor is still providing security support for it. Debian 9 and Ubuntu 18.04, both of which are out of mainstream security support, have supported a newer version, and RHEL 8, which is still in support, also has a newer version. Remove the check for this version and use this functionality unconditionally. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-23git-curl-compat: remove check for curl 7.44.0brian m. carlson
libcurl 7.44.0 was released in August 2015, which is over nine years ago, and no major operating system vendor is still providing security support for it. Debian 9 and Ubuntu 16.04, both of which are out of mainstream security support, have supported a newer version, and RHEL 8, which is still in support, also has a newer version. Remove the check for this version and use this functionality unconditionally. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-23git-curl-compat: remove check for curl 7.43.0brian m. carlson
libcurl 7.43.0 was released in June 2015, which is over nine years ago, and no major operating system vendor is still providing security support for it. Debian 9 and Ubuntu 16.04, both of which are out of mainstream security support, have supported a newer version, and RHEL 8, which is still in support, also has a newer version. Remove the check for this version and use this functionality unconditionally. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-23git-curl-compat: remove check for curl 7.39.0brian m. carlson
libcurl 7.39.0 was released in November 2014, which is almost ten years ago, and no major operating system vendor is still providing security support for it. Debian 9 and Ubuntu 16.04, both of which are out of mainstream security support, have supported a newer version, and RHEL 8, which is still in support, also has a newer version. Remove the check for this version and use this functionality unconditionally. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-23git-curl-compat: remove check for curl 7.34.0brian m. carlson
libcurl 7.34.0 was released in December 2013, which is well over ten years ago, and no major operating system vendor is still providing security support for it. Debian 8 and Ubuntu 14.04, both of which are out of mainstream security support, have supported a newer version, and RHEL 8, which is still in support, also has a newer version. Remove the check for this version and use this functionality unconditionally. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2024-10-23git-curl-compat: remove check for curl 7.25.0brian m. carlson
libcurl 7.25.0 was released in March 2012, which is well over ten years ago, and no major operating system vendor is still providing security support for it. Debian 8, RHEL 7, and Ubuntu 12.10, all of which are out of mainstream security support, have supported a newer version. Remove the check for this version and use this functionality unconditionally. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
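Each of the git-curl-compat commits above deletes a compile-time version gate. Such gates compare against libcurl's LIBCURL_VERSION_NUM, which packs major, minor and patch into one byte each (0xXXYYZZ). The sketch below reproduces only that arithmetic; `CURL_VERSION_HEX()` and `have_min_version()` are hypothetical names, and the feature macro in the comment is illustrative rather than one of Git's actual macros.

```c
/*
 * A gate of the kind these commits remove looks roughly like:
 *
 *   #if LIBCURL_VERSION_NUM >= 0x073800      (7.56.0)
 *   #define GIT_CURL_HAVE_SOME_FEATURE 1     (illustrative name)
 *   #endif
 *
 * Once the minimum supported libcurl is newer than the threshold, the
 * condition is always true, so the gate and its #else branch can go.
 */

/* Hypothetical helper: build a 0xXXYYZZ value from its three parts. */
#define CURL_VERSION_HEX(major, minor, patch)          \
	(((unsigned long)(major) << 16) |              \
	 ((unsigned long)(minor) << 8) |               \
	 (unsigned long)(patch))

/* Hypothetical helper: is the running version at least the required one? */
int have_min_version(unsigned long running, unsigned long required)
{
	return running >= required;
}
```

Because each component occupies its own byte, a plain integer comparison orders versions correctly as long as no component exceeds 255.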
2024-10-16http: fix build error on FreeBSDPatrick Steinhardt
The `result` parameter passed to `http_request_reauth()` may either point to a `struct strbuf` or a `FILE *`, where the `target` parameter tells us which of the two it actually is. To accommodate both types, the pointer is a `void *`, which we then pass directly to functions without doing a cast. This is fine on most platforms, but it breaks on FreeBSD because `fileno()` is implemented as a macro that tries to directly access the `FILE *` structure. Fix this issue by storing the `FILE *` in a local variable before we pass it on to other functions. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
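The shape of the fix can be sketched as follows. The names (`reset_result()`, `struct buffer`, the `TARGET_*` enum) are hypothetical stand-ins for the strbuf/FILE pair, and the assumption that the file case is being emptied before a retry is ours, not taken from the commit message.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() and ftruncate() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical selector telling us what `result` points at. */
enum request_target { TARGET_BUFFER, TARGET_FILE };

/* Hypothetical fixed-size stand-in for a strbuf. */
struct buffer {
	char data[256];
	size_t len;
};

/* Empty whatever `result` points at so the request can be retried. */
int reset_result(void *result, enum request_target target)
{
	if (target == TARGET_FILE) {
		/*
		 * Convert the void * to a typed FILE * first. Passing
		 * `result` straight to fileno() fails to compile where
		 * fileno() is a macro that dereferences the FILE structure.
		 */
		FILE *fp = result;

		if (fflush(fp))
			return -1;
		rewind(fp);
		return ftruncate(fileno(fp), 0) < 0 ? -1 : 0;
	} else {
		struct buffer *buf = result;

		buf->len = 0;
		buf->data[0] = '\0';
		return 0;
	}
}
```

The typed local costs nothing at runtime; it only gives macro implementations of stdio functions a real `FILE *` to work with.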
2024-10-02Merge branch 'jk/http-leakfixes'Junio C Hamano
Leakfixes. * jk/http-leakfixes: (28 commits) http-push: clean up local_refs at exit http-push: clean up loose request when falling back to packed http-push: clean up objects list http-push: free xml_ctx.cdata after use http-push: free remote_ls_ctx.dentry_name http-push: free transfer_request strbuf http-push: free transfer_request dest field http-push: free curl header lists http-push: free repo->url string http-push: clear refspecs before exiting http-walker: free fake packed_git list remote-curl: free HEAD ref with free_one_ref() http: stop leaking buffer in http_get_info_packs() http: call git_inflate_end() when releasing http_object_request http: fix leak of http_object_request struct http: fix leak when redacting cookies from curl trace transport-helper: fix leak of dummy refs_list fetch-pack: clear pack lockfiles list fetch: free "raw" string when shrinking refspec transport-helper: fix strbuf leak in push_refs_with_push() ...
2024-09-25Merge branch 'ak/typofix-2.46-maint'Junio C Hamano
Typofix. * ak/typofix-2.46-maint: upload-pack: fix a typo sideband: fix a typo setup: fix a typo run-command: fix a typo revision: fix a typo refs: fix typos rebase: fix a typo read-cache-ll: fix a typo pretty: fix a typo object-file: fix a typo merge-ort: fix typos merge-ll: fix a typo http: fix a typo gpg-interface: fix a typo git-p4: fix typos git-instaweb: fix a typo fsmonitor-settings: fix a typo diffcore-rename: fix typos config.mak.dev: fix a typo
2024-09-25http: stop leaking buffer in http_get_info_packs()Jeff King
We use http_get_strbuf() to fetch the remote info/packs content into a strbuf, but never free it, causing a leak. There's no need to hold onto it, as we've already parsed it completely. This lets us mark t5619 as leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-25http: call git_inflate_end() when releasing http_object_requestJeff King
In new_http_object_request(), we initialize the zlib stream with git_inflate_init(). We must have a matching git_inflate_end() to avoid leaking any memory allocated by zlib. In most cases this happens in finish_http_object_request(), but we don't always get there. If we abort a request mid-stream, then we may clean it up without hitting that function. We can't just add a git_inflate_end() call to the release function, though. That would double-free the cases that did actually finish. Instead, we'll move the call from the finish function to the release function. This does delay it for the cases that do finish, but I don't think it matters. We should have already reached Z_STREAM_END (and complain if we didn't), and we do not record any status code from git_inflate_end(). This leak is triggered by t5550 at least (and probably other dumb-http tests). I did find one other related spot of interest. If we try to read a previously downloaded file and fail, we reset the stream by calling memset() followed by a fresh git_inflate_init(). I don't think this case is triggered in the test suite, but it seemed like an obvious leak, so I added the appropriate git_inflate_end() before the memset() there. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
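The ownership rule this commit settles on (initialize in the constructor, end exactly once in the release function, whether or not the request finished) can be sketched with a mock stream. `stream_init()` and `stream_end()` stand in for git_inflate_init() and git_inflate_end(); they, the struct, and the request functions are hypothetical, and no real zlib state is involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock stream: `live` models resources allocated by the init call. */
struct stream {
	int live;
};

static int live_streams; /* streams initialized but not yet ended */

static void stream_init(struct stream *s)
{
	s->live = 1;
	live_streams++;
}

static void stream_end(struct stream *s)
{
	assert(s->live); /* ending twice models the double-free risk */
	s->live = 0;
	live_streams--;
}

struct object_request {
	struct stream stream;
	int finished;
};

struct object_request *new_request(void)
{
	struct object_request *req = calloc(1, sizeof(*req));

	if (req)
		stream_init(&req->stream);
	return req;
}

/* Finishing only records completion; it no longer ends the stream. */
void finish_request(struct object_request *req)
{
	req->finished = 1;
}

/* Release runs for finished and aborted requests alike, so the
 * single stream_end() call lives here. */
void release_request(struct object_request *req)
{
	if (!req)
		return;
	stream_end(&req->stream);
	free(req);
}
```

Putting the end call in the one function every path reaches avoids both the leak (aborted requests) and the double end (finished requests).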
2024-09-25http: fix leak of http_object_request structJeff King
The new_http_object_request() function allocates a struct on the heap, along with some fields inside the struct. But the matching function to clean it up, release_http_object_request(), only frees the interior fields without freeing the struct itself, causing a leak. The related http_pack_request new/release pair gets this right, and at first glance we should be able to do the same thing and just add a single free() call. But there's a catch. These http_object_request structs are typically embedded in the object_request struct of http-walker.c. And when we clean up that parent struct, it sanity-checks the embedded struct to make sure we are not leaking descriptors. That would mean a use-after-free if we simply free() the embedded struct. I have no idea how valuable that sanity-check is, or whether it can simply be deleted. This all goes back to 5424bc557f (http*: add helper methods for fetching objects (loose), 2009-06-06). But the obvious way to make it all work is to be sure we set the pointer to NULL after freeing it (and our freeing process closes the descriptor, so we know there is no leak). To make sure we do that consistently, we'll switch the pointer we take in release_http_object_request() to a pointer-to-pointer, and we'll set it to NULL ourselves. And then the compiler can help us find each caller which needs to be updated. Most cases will just pass "&obj_req->req", which will obviously do the right thing. In a few cases, like http-push's finish_request(), we are working with a copy of the pointer, so we don't NULL the original. But it's OK because the next step is to free the struct containing the original pointer anyway. This lets us mark t5551 as leak-free. Ironically this is the "smart" http test, and the leak here only affects dumb http. But there's a single dumb-http invocation in there. The full dumb tests are in t5550, which still has some more leaks. This also makes t5559 leak-free, as it's just an HTTP/2 variant of t5551.
But we don't need to mark it as such, since it inherits the flag from t5551. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
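A minimal sketch of the pointer-to-pointer release pattern follows. The struct layouts and the names `release_obj_req()` and `object_request_has_open_fd()` are hypothetical and pared down from the description; they mirror release_http_object_request() and the http-walker.c sanity check only in outline.

```c
#include <stdlib.h>

/* Hypothetical, pared-down request: -1 means no descriptor is open. */
struct http_object_request {
	int localfile;
};

/* Hypothetical parent holding a pointer to the request (may be NULL). */
struct object_request {
	struct http_object_request *req;
};

/*
 * Taking a pointer-to-pointer lets the release function clear the
 * caller's pointer, so a later sanity check sees NULL instead of
 * freed memory. Calling it again on the cleared pointer is a no-op.
 */
void release_obj_req(struct http_object_request **reqp)
{
	struct http_object_request *req = *reqp;

	if (!req)
		return;
	/* Real code would close the descriptor and free interior fields. */
	free(req);
	*reqp = NULL;
}

/* Parent-side check: safe because the pointer is either live or NULL. */
int object_request_has_open_fd(const struct object_request *obj)
{
	return obj->req && obj->req->localfile >= 0;
}
```

Changing the parameter type is what lets the compiler flag every caller that still passes a plain pointer, as the commit message notes.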
2024-09-25http: fix leak when redacting cookies from curl traceJeff King
When redacting headers for GIT_TRACE_CURL, we build up a redacted cookie header in a local strbuf, and then copy it into the output. But we forget to release the temporary strbuf, leaking it for every cookie header we show. The other redacted headers don't run into this problem, since they're able to work in-place in the output buffer. But the cookie parsing is too complicated for that, since we redact the cookies individually. This leak is triggered by the cookie tests in t5551. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
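The temporary-buffer pattern can be sketched in plain C. `redact_cookies()` is hypothetical: it masks every cookie value with a placeholder and uses a malloc'd buffer in place of a strbuf, so Git's actual redaction rules and output may differ. The point being illustrated is the release of the temporary buffer on every exit path.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: rewrite a Cookie header value such as "a=1; b=2"
 * into "a=<redacted>; b=<redacted>" in `out`, working in a temporary
 * heap buffer that is freed on every path. Returns 0 on success, -1 on
 * allocation failure or if `out` is too small.
 */
int redact_cookies(const char *value, char *out, size_t outsz)
{
	static const char mask[] = "<redacted>";
	const size_t mask_len = sizeof(mask) - 1;
	size_t ncookies = 1, len;
	const char *p;
	char *tmp, *t;

	for (p = value; *p; p++)
		if (*p == ';')
			ncookies++;

	/* Each cookie emits at most its own bytes plus the mask. */
	tmp = malloc(strlen(value) + ncookies * mask_len + 1);
	if (!tmp)
		return -1;

	t = tmp;
	p = value;
	while (*p) {
		const char *end = strchr(p, ';');
		size_t seglen = end ? (size_t)(end - p) : strlen(p);
		const char *eq = memchr(p, '=', seglen);

		if (eq) {
			size_t keep = (size_t)(eq - p) + 1; /* name plus '=' */
			memcpy(t, p, keep);
			t += keep;
			memcpy(t, mask, mask_len);
			t += mask_len;
		} else {
			memcpy(t, p, seglen); /* no value to hide */
			t += seglen;
		}
		p += seglen;
		if (*p == ';')
			*t++ = *p++;
	}
	*t = '\0';

	len = (size_t)(t - tmp);
	if (len + 1 > outsz) {
		free(tmp);
		return -1;
	}
	memcpy(out, tmp, len + 1);
	free(tmp); /* analogous to the strbuf_release() that was missing */
	return 0;
}
```

Cookies are redacted one by one, which is why the work cannot be done in place in the output buffer the way simpler headers are.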
2024-09-19http: fix a typoAndrew Kreimer
Fix a typo in comments. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-02http: do not ignore proxy pathRyan Hendrickson
The documentation for `http.proxy` describes that option, and the environment variables it overrides, as supporting "the syntax understood by curl". curl allows SOCKS proxies to use a path to a Unix domain socket, like `socks5h://localhost/path/to/socket.sock`. Git should therefore include, if present, the path part of the proxy URL in what it passes to libcurl. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
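"The path part of the proxy URL" is everything from the first '/' after the host and optional port. The helper below is a hypothetical sketch of that split, not how Git parses the URL; it ignores userinfo containing '/', IPv6 literals, and query strings.

```c
#include <string.h>

/*
 * Hypothetical helper: return the path component of a proxy URL of the
 * form "scheme://host[:port][/path]", or "" when there is none.
 */
const char *proxy_url_path(const char *url)
{
	const char *host = strstr(url, "://");
	const char *slash;

	/* Skip the scheme if present; otherwise treat the whole string as host. */
	host = host ? host + 3 : url;
	slash = strchr(host, '/');
	return slash ? slash : "";
}
```

For `socks5h://localhost/path/to/socket.sock` the helper returns `/path/to/socket.sock`, which is the part that previously went missing on the way to libcurl.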
2024-07-17Merge branch 'jc/http-cookiefile'Junio C Hamano
The http.cookieFile and http.saveCookies configuration variables have a few values that need to be avoided, which are now ignored with warning messages. * jc/http-cookiefile: http.c: cookie file tightening
2024-07-16Merge branch 'bc/http-proactive-auth'Junio C Hamano
The http transport can now be told to send requests with authentication material without first getting a 401 response. * bc/http-proactive-auth: http: allow authenticating proactively
2024-07-09http.c: cookie file tighteningJunio C Hamano
The http.cookiefile configuration variable is used to call curl_easy_setopt() to set CURLOPT_COOKIEFILE, and if http.savecookies is set, the same value is used for CURLOPT_COOKIEJAR. The former is used only to read cookies at startup, the latter is used to write cookies at the end. The manual pages https://curl.se/libcurl/c/CURLOPT_COOKIEFILE.html and https://curl.se/libcurl/c/CURLOPT_COOKIEJAR.html talk about two interesting special values. * "" (an empty string) given to CURLOPT_COOKIEFILE means not to read cookies from any file upon startup. * It is not specified what "" (an empty string) given to CURLOPT_COOKIEJAR does; presumably open a file whose name is an empty string and write cookies to it? In any case, that is not what we want to see happen, ever. * "-" (a dash) given to CURLOPT_COOKIEFILE makes cURL read cookies from the standard input, and given to CURLOPT_COOKIEJAR makes cURL write cookies to the standard output. We never want either of these to happen. So, let's make sure we avoid these nonsense cases. Specifically, when http.cookiefile is set to "-", ignore it with a warning, and when it is set to "" and http.savecookies is set, ignore http.savecookies with a warning. Signed-off-by: Junio C Hamano <gitster@pobox.com>
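The two rules in the final paragraph can be expressed as a small decision function. This is a sketch that assumes both settings have already been read from config; `sanitize_cookie_config()` and its warning texts are hypothetical, not Git's.

```c
#include <stdio.h>
#include <string.h>

/*
 * `cookie_file` is the http.cookiefile value (NULL if unset) and
 * `save_cookies` the http.savecookies flag. On return, *use_file is what
 * would be handed to CURLOPT_COOKIEFILE (NULL for nothing) and *save says
 * whether the same value should also go to CURLOPT_COOKIEJAR.
 */
void sanitize_cookie_config(const char *cookie_file, int save_cookies,
			    const char **use_file, int *save)
{
	*use_file = NULL;
	*save = 0;

	if (!cookie_file)
		return;

	if (!strcmp(cookie_file, "-")) {
		/* "-" would mean stdin for reading and stdout for writing */
		fprintf(stderr, "warning: ignoring http.cookiefile \"-\"\n");
		return;
	}

	/* "" is fine for reading: it means "read no cookie file" */
	*use_file = cookie_file;

	if (save_cookies) {
		if (!*cookie_file)
			fprintf(stderr, "warning: ignoring http.savecookies "
				"with empty http.cookiefile\n");
		else
			*save = 1;
	}
}
```

Keeping "" for reading while refusing it as a jar matches the distinction drawn above: the empty string has a documented meaning for CURLOPT_COOKIEFILE but not for CURLOPT_COOKIEJAR.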
2024-07-09http: allow authenticating proactivelybrian m. carlson
When making a request over HTTP(S), Git only sends authentication if it receives a 401 response. Thus, if a repository is open to the public for reading, Git will typically never ask for authentication for fetches and clones. However, there may be times when a user would like to authenticate nevertheless. For example, a forge may give higher rate limits to users who authenticate because they are easier to contact in case of excessive use. Or it may be useful for a known heavy user, such as an internal service, to proactively authenticate so its use can be monitored and, if necessary, throttled. Let's make this possible with a new option, "http.proactiveAuth". This option specifies a type of authentication which can be used to authenticate against the host in question. This is necessary because we lack the WWW-Authenticate header to provide us details; similarly, we cannot accept certain types of authentication because we require information from the server, such as a nonce or challenge, to successfully authenticate. If we're in auto mode and we got a username and password, set the authentication scheme to Basic. libcurl will not send authentication proactively unless there's a single choice of allowed authentication, and we know in this case we didn't get an authtype entry telling us what scheme to use, or we would have taken a different codepath and written the header ourselves. In any event, of the other schemes that libcurl supports, Digest and NTLM require a nonce or challenge, which means that they cannot work with proactive auth, and GSSAPI does not use a username and password at all, so Basic is the only logical choice among the built-in options. Note that the existing http_proactive_auth variable signifies proactive auth if there are already credentials, which is different from the functionality we're adding, which always seeks credentials even if none are provided. 
Nonetheless, t5540 tests the existing behavior for WebDAV-based pushes to an open repository without credentials, so we preserve it. While at first this may seem an insecure and bizarre decision, it may be that authentication is done with TLS certificates, in which case it might actually provide a quite high level of security. Expand the variable to use an enum to handle the additional cases and a helper function to distinguish our new cases from the old ones. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
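The enum-plus-helper arrangement described above might look like the sketch below. The enumerator names, `always_auth_proactively()`, and `proactive_scheme()` are hypothetical illustrations of the described behavior, not Git's declarations.

```c
/* Hypothetical modes for the proactive-auth setting. */
enum proactive_auth {
	PROACTIVE_AUTH_NONE = 0,       /* only authenticate after a 401 */
	PROACTIVE_AUTH_IF_CREDENTIALS, /* old behavior: send if we already have credentials */
	PROACTIVE_AUTH_AUTO,           /* new: always seek credentials */
	PROACTIVE_AUTH_BASIC           /* new: always seek credentials, use Basic */
};

/* Hypothetical: which scheme to request before any server challenge. */
enum auth_scheme {
	SCHEME_UNCHANGED, /* leave the choice to the existing code paths */
	SCHEME_BASIC
};

/* Distinguish the new "always ask" modes from the old ones. */
int always_auth_proactively(enum proactive_auth mode)
{
	return mode != PROACTIVE_AUTH_NONE &&
	       mode != PROACTIVE_AUTH_IF_CREDENTIALS;
}

/*
 * Without a WWW-Authenticate header there is no nonce or challenge, so
 * Digest and NTLM are out, and GSSAPI takes no username/password. With a
 * username and password in auto mode, Basic is the only sensible choice.
 */
enum auth_scheme proactive_scheme(enum proactive_auth mode,
				  int have_user_and_pass)
{
	if (mode == PROACTIVE_AUTH_BASIC)
		return SCHEME_BASIC;
	if (mode == PROACTIVE_AUTH_AUTO && have_user_and_pass)
		return SCHEME_BASIC;
	return SCHEME_UNCHANGED;
}
```

Keeping PROACTIVE_AUTH_IF_CREDENTIALS as its own value preserves the old behavior that t5540 relies on while the helper gates the new one.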
2024-07-02Merge branch 'ps/use-the-repository'Junio C Hamano
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions