path: root/upload-pack.c
2026-03-24  Merge branch 'ps/upload-pack-buffer-more-writes'  (Junio C Hamano)

Reduce system overhead "git upload-pack" spends on relaying "git pack-objects" output to the "git fetch" running on the other end of the connection.

* ps/upload-pack-buffer-more-writes:
  builtin/pack-objects: reduce lock contention when writing packfile data
  csum-file: drop `hashfd_throughput()`
  csum-file: introduce `hashfd_ext()`
  sideband: use writev(3p) to send pktlines
  wrapper: introduce writev(3p) wrappers
  compat/posix: introduce writev(3p) wrapper
  upload-pack: reduce lock contention when writing packfile data
  upload-pack: prefer flushing data over sending keepalive
  upload-pack: adapt keepalives based on buffering
  upload-pack: fix debug statement when flushing packfile data
2026-03-13  upload-pack: reduce lock contention when writing packfile data  (Patrick Steinhardt)

In our production systems we have recently observed write contention in git-upload-pack(1). The system in question was consistently streaming packfiles at a rate of dozens of gigabits per second, but curiously the system was not bottlenecked on CPU, memory, or IOPS.

We eventually discovered that Git was spending 80% of its time in `pipe_write()`, out of which almost all of the time was spent in the `ep_poll_callback` function in the kernel. Quoting the reporter:

    This infrastructure is part of an event notification queue designed
    to allow for multiple producers to emit events, but that concurrency
    safety is guarded by 3 layers of locking. The layer we're hitting
    contention in uses a simple reader/writer lock mode (a.k.a. shared
    versus exclusive mode), where producers need shared-mode (read mode),
    and various other actions use exclusive (write) mode.

The system in question generates workloads where we have hundreds of git-upload-pack(1) processes active at the same point in time. These processes end up contending around those locks, and the consequence is that the Git processes stall.

Now git-upload-pack(1) already has the infrastructure in place to buffer some of the data it reads from git-pack-objects(1) before actually sending it out. We only use this infrastructure in very limited ways though, so we generally end up matching one read(3p) call with one write(3p) call. Even worse, when the sideband is enabled we end up matching one read with _two_ writes: one for the pkt-line length, and one for the packfile data.

Extend our use of the buffering infrastructure so that we soak up bytes until the buffer is filled up to at least 2/3rds of its capacity. The change is relatively simple to implement as we already know to flush the buffer in `create_pack_file()` after git-pack-objects(1) has finished. This significantly reduces the number of write(3p) syscalls we need to do. Before this change, cloning the Linux repository resulted in around 400,000 write(3p) syscalls. With the buffering in place we only do around 130,000 syscalls.

Now we could of course go even further and make sure that we always fill up the whole buffer. But this might cause an increase in read(3p) syscalls, and some tests show that this only reduces the number of write(3p) syscalls from 130,000 to 100,000. So overall this doesn't seem worth it.

Note that the issue could also be fixed by adapting the write buffer that we use in the downstream git-pack-objects(1) command, and such a change would have roughly the same result. But the command that generates the packfile data may not always be git-pack-objects(1) as it can be changed via "uploadpack.packObjectsHook", so such a fix would only help in _some_ cases. Regardless of that, we'll also adapt the write buffer size of git-pack-objects(1) in a subsequent commit.

Helped-by: Matt Smiley <msmiley@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
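[Editor's note: a minimal standalone sketch of the buffering strategy the commit describes, not the actual upload-pack code; the buffer size and the 2/3 threshold constant are stand-ins.]

    #include <stddef.h>
    #include <unistd.h>

    #define RELAY_BUF_SIZE 65536
    #define FLUSH_THRESHOLD (RELAY_BUF_SIZE * 2 / 3)

    static int write_all(int fd, const char *buf, size_t len)
    {
    	while (len) {
    		ssize_t w = write(fd, buf, len);
    		if (w < 0)
    			return -1;
    		buf += w;
    		len -= (size_t)w;
    	}
    	return 0;
    }

    int relay(int in_fd, int out_fd)
    {
    	static char buf[RELAY_BUF_SIZE];
    	size_t used = 0;

    	for (;;) {
    		ssize_t n = read(in_fd, buf + used, sizeof(buf) - used);
    		if (n < 0)
    			return -1;
    		if (!n)
    			break; /* EOF: the pack generator is done */
    		used += (size_t)n;

    		/* soak up input; one big write instead of one per read */
    		if (used >= FLUSH_THRESHOLD) {
    			if (write_all(out_fd, buf, used) < 0)
    				return -1;
    			used = 0;
    		}
    	}

    	/* flush leftovers, as create_pack_file() does once pack-objects exits */
    	return used ? write_all(out_fd, buf, used) : 0;
    }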
2026-03-13  upload-pack: prefer flushing data over sending keepalive  (Patrick Steinhardt)

When using the sideband in git-upload-pack(1) we know to send out keepalive packets in case generating the pack takes too long. These keepalives take the form of a simple empty pktline.

In the preceding commit we have adapted git-upload-pack(1) to buffer data more aggressively before sending it to the client. This creates an obvious optimization opportunity: when we hit the keepalive timeout while we still hold on to some buffered data, then it makes more sense to flush out the data instead of sending the empty keepalive packet.

This is overall not going to be a significant win. Most keepalives will come before the pack data starts, and once pack-objects starts producing data, it tends to do so pretty consistently. And of course we can't send data before we see the PACK header, because the whole point is to buffer the early bit waiting for packfile URIs.

But the optimization is easy enough to realize. Do so and flush out data instead of sending an empty pktline.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
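[Editor's note: a sketch of the decision this commit describes; the state and helper names below are hypothetical, not upload-pack's actual identifiers, and are stubbed only so the fragment is self-contained.]

    #include <stddef.h>

    static size_t out_buf_len;                     /* hypothetical buffered-byte count */
    static void flush_buffered_data(void);         /* hypothetical */
    static void send_empty_keepalive_pktline(void);/* hypothetical */

    static void on_keepalive_due(void)
    {
    	if (out_buf_len)
    		flush_buffered_data();          /* real bytes also prove liveness */
    	else
    		send_empty_keepalive_pktline(); /* the old behavior */
    }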
2026-03-13  upload-pack: adapt keepalives based on buffering  (Patrick Steinhardt)

The function `create_pack_file()` is responsible for sending the packfile data to the client of git-upload-pack(1). As generating the bytes may take significant computing resources we also have a mechanism in place that optionally sends keepalive pktlines in case we haven't sent out any data.

The keepalive logic is purely based on poll(3p): we pass a timeout to that syscall, and if the call times out we send out the keepalive pktline.

While reasonable, this logic isn't entirely sufficient: even if the call to poll(3p) ends because we have received data on any of the file descriptors, we may not necessarily send data to the client. The most important edge case here happens in `relay_pack_data()`. When we haven't seen the initial "PACK" signature from git-pack-objects(1) yet we buffer incoming data. So in the worst case, if each of the bytes of that signature arrives shortly before the configured keepalive timeout, then we may not send out any data for a time period that is (almost) four times as long as the configured timeout.

This edge case is rather unlikely to matter in practice. But in a subsequent commit we're going to adapt our buffering mechanism to become more aggressive, which makes it more likely that we don't send any data for an extended amount of time.

Adapt the logic so that instead of using a fixed timeout on every call to poll(3p), we instead figure out how much time has passed since the last-sent data.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
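[Editor's note: a sketch of the adapted timeout computation, with a hypothetical helper rather than the patch's actual code. The returned value would be handed to poll(3p); 0 means the keepalive (or a flush) is already due.]

    #include <time.h>

    static int keepalive_timeout_ms(time_t last_sent, int keepalive_secs)
    {
    	time_t elapsed = time(NULL) - last_sent;

    	if (elapsed >= keepalive_secs)
    		return 0; /* keepalive already due */
    	/* remaining budget since the last write, not a fixed window */
    	return (int)(keepalive_secs - elapsed) * 1000;
    }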
2026-03-13  upload-pack: fix debug statement when flushing packfile data  (Patrick Steinhardt)

When git-upload-pack(1) writes packfile data to the client we have some logic in place that buffers some partial lines. When that buffer still contains data after git-pack-objects(1) has finished we flush the buffer so that all remaining bytes are sent out.

Curiously, when we do so we also print the string "flushed." to stderr. This statement has been introduced in b1c71b7281 (upload-pack: avoid sending an incomplete pack upon failure, 2006-06-20), so quite a while ago. What's interesting though is that stderr is typically spliced through to the client side, and consequently the client would see this message. Munging the way we do the caching indeed confirms this:

    $ git clone file:///home/pks/Development/linux/
    Cloning into bare repository 'linux.git'...
    remote: Enumerating objects: 12980346, done.
    remote: Counting objects: 100% (131820/131820), done.
    remote: Compressing objects: 100% (50290/50290), done.
    remote: Total 12980346 (delta 96319), reused 104500 (delta 81217), pack-reused 12848526 (from 1)
    Receiving objects: 100% (12980346/12980346), 3.23 GiB | 57.44 MiB/s, done.
    flushed.
    Resolving deltas: 100% (10676718/10676718), done.

It's quite clear that this string shouldn't ever be visible to the client, so it rather feels like this is a left-over debug statement. The mentioned commit doesn't mention this line, either.

Remove the debug output to prepare for a change in how we do the buffering in the next commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-09  Merge branch 'ps/refs-for-each'  (Junio C Hamano)

Code refactoring around refs-for-each-* API functions.

* ps/refs-for-each:
  refs: replace `refs_for_each_fullref_in()`
  refs: replace `refs_for_each_namespaced_ref()`
  refs: replace `refs_for_each_glob_ref()`
  refs: replace `refs_for_each_glob_ref_in()`
  refs: replace `refs_for_each_rawref_in()`
  refs: replace `refs_for_each_rawref()`
  refs: replace `refs_for_each_ref_in()`
  refs: improve verification for-each-ref options
  refs: generalize `refs_for_each_fullref_in_prefixes()`
  refs: generalize `refs_for_each_namespaced_ref()`
  refs: speed up `refs_for_each_glob_ref_in()`
  refs: introduce `refs_for_each_ref_ext`
  refs: rename `each_ref_fn`
  refs: rename `do_for_each_ref_flags`
  refs: move `do_for_each_ref_flags` further up
  refs: move `refs_head_ref_namespaced()`
  refs: remove unused `refs_for_each_include_root_ref()`
2026-02-23  refs: replace `refs_for_each_namespaced_ref()`  (Patrick Steinhardt)

Replace calls to `refs_for_each_namespaced_ref()` with the newly introduced `refs_for_each_ref_ext()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-23  refs: rename `each_ref_fn`  (Patrick Steinhardt)

Similar to the preceding commit, rename `each_ref_fn` to better match our current best practices around how we name things.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-17  shallow: handling fetch relative-deepen  (Samo Pogačnik)

When a shallowed repository gets deepened beyond the beginning of a merged branch, we may end up with some shallows that are hidden behind the reachable shallow commits. The added test 'fetching deepen beyond merged branch' exposes that behaviour.

An example showing the problem based on the added test:

0. Whole initial git repo to be cloned from

   Graph:
   * 033585d (HEAD -> main) Merge branch 'branch'
   |\
   | * 984f8b1 (branch) five
   | * ecb578a four
   |/
   * 0cb5d20 three
   * 2b4e70d two
   * 61ba98b one

1. Initial shallow clone --depth=3 (all good)

   Shallows:
   2b4e70da2a10e1d3231a0ae2df396024735601f1
   ecb578a3cf37198d122ae5df7efed9abaca17144

   Graph:
   * 033585d (HEAD -> main) Merge branch 'branch'
   |\
   | * 984f8b1 five
   | * ecb578a (grafted) four
   * 0cb5d20 three
   * 2b4e70d (grafted) two

2. Deepen shallow clone with fetch --deepen=1 (NOT OK)

   Shallows:
   0cb5d204f4ef96ed241feb0f2088c9f4794ba758
   61ba98be443fd51c542eb66585a1f6d7e15fcdae

   Graph:
   * 033585d (HEAD -> main) Merge branch 'branch'
   |\
   | * 984f8b1 five
   | * ecb578a four
   |/
   * 0cb5d20 (grafted) three

Note that the second shallow commit 61ba98be443fd51c542eb66585a1f6d7e15fcdae is not reachable.

On the other hand, it seems that equivalent absolute-depth-driven fetches result in all the correct shallows. That led to this proposal, which unifies absolute and relative deepening in a way that the same get_shallow_commits() call is used in both cases. The difference is only that depth is adapted for relative deepening by measuring the equivalent depth of current local shallow commits in the current remote repo. Thus a new function get_shallows_depth() has been added and the function get_reachable_list() became redundant and was removed.

Same example showing the corrected second step:

2. Deepen shallow clone with fetch --deepen=1 (all good)

   Shallow:
   61ba98be443fd51c542eb66585a1f6d7e15fcdae

   Graph:
   * 033585d (HEAD -> main) Merge branch 'branch'
   |\
   | * 984f8b1 five
   | * ecb578a four
   |/
   * 0cb5d20 three
   * 2b4e70d two
   * 61ba98b (grafted) one

The get_shallows_depth() function also shares the logic of the get_shallow_commits() function, but it focuses on counting the depth of each existing shallow commit. The minimum result is stored as 'data->deepen_relative', which is set not to be zero for relative deepening anyway. That way we can always sum the 'data->deepen_relative' and 'depth' values, because 'data->deepen_relative' is always 0 in absolute deepening. To avoid duplicating logic between get_shallows_depth() and get_shallow_commits(), get_shallow_commits() was modified so that it is used by get_shallows_depth().

Signed-off-by: Samo Pogačnik <samo_pogacnik@t-2.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  upload-pack: convert to use `reference_get_peeled_oid()`  (Patrick Steinhardt)

The `write_v0_ref()` callback is invoked from two callsites:

- Once via `send_ref()`, which is a callback passed to `for_each_namespaced_ref_1()` and `refs_head_ref_namespaced()`.

- Once manually to announce capabilities.

When sending references to the client we also send the peeled value of tags. As we don't have a `struct reference` available in the second case, we cannot easily peel by calling `reference_get_peeled_oid()`, but instead have to depend on global state via `peel_iterated_oid()`.

We do have a reference available in the first case though; it's only the second case that keeps us from using `reference_get_peeled_oid()`. But that second case only announces capabilities anyway, so we're not really handling a reference at all here.

Adapt that case to construct a reference manually and pass that to `write_v0_ref()`. Start to use `reference_get_peeled_oid()` now that we always have a `struct reference` available.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04  refs: introduce wrapper struct for `each_ref_fn`  (Patrick Steinhardt)

The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through references. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files.

It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to change the callback type more easily afterwards by just extending the wrapper structure.

One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c", as well as very few sites that invoke the iterator callback functions directly.

Introduce this wrapper structure so that we can adapt the iterator interfaces more readily.

[1]: <ZmarVcF5JjsZx0dl@tanuki>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
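[Editor's note: a sketch of the wrapper-struct pattern described above; the type and field names here are illustrative, not the ones the actual patch introduces.]

    struct object_id; /* opaque for this sketch */

    struct ref_iteration_info {
    	const char *refname;
    	const struct object_id *oid;
    	const char *referent;  /* unresolved symref target, or NULL */
    	unsigned int flags;
    };

    /* Adding a field to the struct no longer changes this signature,
     * so the dozens of callback implementations stay untouched: */
    typedef int each_ref_fn(const struct ref_iteration_info *info,
    			void *cb_data);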
2025-09-29  Merge branch 'jk/setup-revisions-freefix'  (Junio C Hamano)

There are double frees and leaks around the setup_revisions() API used in "git stash show", which has been fixed, and the setup_revisions() API gained a wrapper to make it more ergonomic when using it with strvec-managed argc/argv pairs.

* jk/setup-revisions-freefix:
  revision: retain argv NULL invariant in setup_revisions()
  treewide: pass strvecs around for setup_revisions_from_strvec()
  treewide: use setup_revisions_from_strvec() when we have a strvec
  revision: add wrapper to setup_revisions() from a strvec
  revision: manage memory ownership of argv in setup_revisions()
  stash: tell setup_revisions() to free our allocated strings
2025-09-22  treewide: pass strvecs around for setup_revisions_from_strvec()  (Jeff King)

The previous commit converted callers of setup_revisions() with a strvec to use the safer and easier _from_strvec() variant. Let's now convert spots that don't directly have a strvec, but receive an argc/argv pair that eventually comes from one. We'll instead pass the strvec down to the point where we call setup_revisions().

That makes these functions slightly less flexible if they were to grow other callers that don't use strvecs, but this rigidity is buying us some safety. It is only safe to pass the free_removed_argv_elements option to setup_revisions() if we know the elements of argv/argc are allocated on the heap. That isn't communicated in the type system when we are passed the bare elements. But if we get a strvec, we know that the elements are allocated strings.

And at any rate, each of these modified functions has only a single caller (that has a strvec), so the loss of flexibility is unlikely to ever matter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-15  Merge branch 'ps/upload-pack-oom-protection'  (Junio C Hamano)

A broken or malicious "git fetch" can claim to have the same object many, many times, and the upload-pack serving it can exhaust memory storing them redundantly, which has been corrected.

* ps/upload-pack-oom-protection:
  upload-pack: don't ACK non-commits repeatedly in protocol v2
  t5530: modernize tests
2025-09-05  upload-pack: don't ACK non-commits repeatedly in protocol v2  (Patrick Steinhardt)

When a client performs a fetch or clone they can optionally send "have" lines to tell the server which objects they already have available locally. These object IDs are stored by the server in an object array so that it can remember any objects it doesn't have to include in the pack sent to the client.

While there isn't any reason to do so, clients are free to send the same "have" line repeatedly. git-upload-pack(1) already knows to handle this well: every commit it has seen via a "have" line gets marked with the `THEY_HAVE` flag, and if such a commit is seen repeatedly we know not to process it another time. This also has the effect that we store the object ID only once in the `have_obj` array.

There is an edge case though: if the client sends an object ID that does not refer to a commit, we neither store nor check the `THEY_HAVE` flag. This means that we repeatedly store the same object ID in our `have_obj` array, with two consequences:

- In protocol v2 we deduplicate ACKs for commits, but not for any other objects, as we send ACKs for every object ID in the `have_obj` array.

- The `have_obj` array can grow in size indefinitely with both protocols.

The potentially-more-serious issue is the second one, as we basically have a way for an adversary to allocate arbitrarily large buffers now. Ultimately, this doesn't seem to be all that serious though: on my machine, the growth of that array is at around 4MB/s, and after roughly five minutes I was only at 1GB RSS. So this is concerning, but only mildly so.

Fix this bug by storing the `THEY_HAVE` flag independent of the object type so that we don't store duplicate object IDs in `have_obj` anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
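[Editor's note: a minimal standalone sketch of the fix's idea, not the actual patch; the flag value and helper name are made up. The point is that the flag check happens before anything is appended to `have_obj`, regardless of object type.]

    #define THEY_HAVE (1u << 11) /* flag value is illustrative */

    struct object { unsigned flags; };

    /* returns 1 on first sight (store + ACK), 0 for a repeated "have" */
    static int remember_have(struct object *obj)
    {
    	if (obj->flags & THEY_HAVE)
    		return 0; /* duplicate: don't store or ACK again */
    	obj->flags |= THEY_HAVE;
    	return 1;
    }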
2025-08-02  string-list: align string_list_split() with its _in_place() counterpart  (Junio C Hamano)

The string_list_split_in_place() function was updated by 52acddf3 (string-list: multi-delimiter `string_list_split_in_place()`, 2023-04-24) to take more than one delimiter character, hoping that we can later use it to replace our uses of strtok(). We however did not make a matching change to the string_list_split() function, which is very similar.

Before giving both functions more features in future commits, allow string_list_split() to also take more than one delimiter character to make them closer to each other.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
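[Editor's note: assuming the aligned signature mirrors the `_in_place()` variant (a string of delimiter characters instead of a single `int`), usage would look roughly like this sketch against Git's internal string-list API.]

    #include "string-list.h" /* Git-internal header */

    static void split_example(void)
    {
    	struct string_list parts = STRING_LIST_INIT_DUP;

    	/* either ',' or ';' now terminates an item; -1 = no split limit */
    	string_list_split(&parts, "a,b;c", ",;", -1);
    	/* parts now holds "a", "b", "c" */

    	string_list_clear(&parts, 0);
    }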
2025-07-01  odb: rename `has_object()`  (Patrick Steinhardt)

Rename `has_object()` to `odb_has_object()` to match other functions related to the object database and our modern coding guidelines. Introduce a compatibility wrapper so that any in-flight topics will continue to compile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
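[Editor's note: a sketch of the kind of compatibility wrapper the commit describes; the exact signature lives in Git's headers, and the includes below are assumptions.]

    #include "repository.h" /* Git-internal header; sketch only */
    #include "odb.h"        /* declares odb_has_object() post-rename */

    /* keeps in-flight callers of the old name compiling while they
     * migrate to odb_has_object() */
    static inline int has_object(struct repository *r,
    			     const struct object_id *oid,
    			     unsigned flags)
    {
    	return odb_has_object(r->objects, oid, flags);
    }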
2025-07-01  object-store: rename files to "odb.{c,h}"  (Patrick Steinhardt)

In the preceding commits we have renamed the structures contained in "object-store.h" to `struct object_database` and `struct odb_backend`. As such, the code files "object-store.{c,h}" are confusingly named now. Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-15  upload-pack: rename `enum` to reflect the operation  (Johannes Schindelin)

While 3145ea957d (upload-pack: introduce fetch server command, 2018-03-15) added support for the `fetch` command, from the server's point of view it is an upload, and hence the `enum` should really be called `upload_state` instead of `fetch_state`. Likewise, rename its values.

This also helps unconfuse CodeQL, which would otherwise be at sixes and sevens about having _two_ non-local definitions of the same `enum` with the same values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29  treewide: convert users of `repo_has_object_file()` to `has_object()`  (Patrick Steinhardt)

As the comment of `repo_has_object_file()` and its `_with_flags()` variant tells us, these functions are considered to be deprecated in favor of `has_object()`. There are a couple of slight benefits in favor of the replacement:

- The new function has a short-and-sweet name.

- More explicit defaults: `has_object()` doesn't fetch missing objects via promisor remotes, and neither does it reload packfiles if an object wasn't found by default. This ensures that it becomes immediately obvious when a simple object existence check may result in expensive actions.

Most importantly though, it is confusing that we have two sets of functions that ultimately do the same thing, but with different defaults.

Start sunsetting `repo_has_object_file()` and its `_with_flags()` sibling by replacing all callsites with `has_object()`:

- `repo_has_object_file(...)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`.

- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., 0)`.

- `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)` is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`.

- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)` is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`.

The replacements should be functionally equivalent.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
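[Editor's note: a worked example of the first equivalence from the list above; the header name and the `process()` stub are assumptions made so the sketch is self-contained.]

    #include "object-store.h" /* header name at the time of this commit */

    static void process(const struct object_id *oid); /* illustrative stub */

    static void check_both_ways(struct repository *repo,
    			    const struct object_id *oid)
    {
    	/* the deprecated spelling... */
    	if (repo_has_object_file(repo, oid))
    		process(oid);

    	/* ...and its stated equivalent, defaults made explicit */
    	if (has_object(repo, oid,
    		       HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR))
    		process(oid);
    }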
2025-04-15  object-store: merge "object-store-ll.h" and "object-store.h"  (Patrick Steinhardt)

The "object-store-ll.h" header has been introduced to keep transitive header dependencies and compile times at bay. Now that we have created a new "object-store.c" file though, we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so.

As the "object-store.h" header is now equivalent to its low-level alternative, we drop the latter and inline it into the former.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10  hash: stop depending on `the_repository` in `null_oid()`  (Patrick Steinhardt)

The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID.

This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer.

There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context:

- "builtin/grep.c"
- "grep.c"
- "refs/debug.c"

There are also two non-trivial exceptions:

- "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case.

- "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm.

This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned.

Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
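[Editor's note: a sketch of the adapted callsites; the includes and the opt-in define are assumptions, but the call shape follows the commit's description of passing the hash algorithm explicitly.]

    #define USE_THE_REPOSITORY_VARIABLE /* needed to touch the global */

    #include "hash.h"       /* Git-internal header; sketch only */
    #include "repository.h"

    static void null_oid_example(struct repository *repo)
    {
    	/* most callsites bubble the global up by one layer: */
    	const struct object_id *a = null_oid(the_repository->hash_algo);

    	/* subsystems that already carry a repository use theirs: */
    	const struct object_id *b = null_oid(repo->hash_algo);

    	(void)a;
    	(void)b;
    }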
2025-03-10  object: stop depending on `the_repository`  (Patrick Steinhardt)

There are a couple of functions exposed by "object.c" that implicitly depend on `the_repository`. Remove this dependency by injecting the repository via a parameter. Adapt callers accordingly by simply using `the_repository`, except in cases where the subsystem is already free of the repository. In that case, we instead pass the repository provided by the caller's context.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18  Add 'promisor-remote' capability to protocol v2  (Christian Couder)

When a server S knows that some objects from a repository are available from a promisor remote X, S might want to suggest to a client C cloning or fetching the repo from S that C may use X directly instead of S for these objects.

Note that this could happen both in the case S itself doesn't have the objects and borrows them from X, and in the case S has the objects but knows that X is better connected to the world (e.g., it is in a $LARGEINTERNETCOMPANY datacenter with petabit/s backbone connections) than S. Implementation of the latter case, which would require S to omit in its response the objects available on X, is left for future improvement though.

Then C might, or might not, want to get the objects from X. If S and C can agree on C using X directly, S can then omit objects that can be obtained from X when answering C's request.

To allow S and C to agree and let each other know about C using X or not, let's introduce a new "promisor-remote" capability in the protocol v2, as well as a few new configuration variables:

- "promisor.advertise" on the server side, and:
- "promisor.acceptFromServer" on the client side.

By default, or if "promisor.advertise" is set to 'false', a server S will not advertise the "promisor-remote" capability. If S doesn't advertise the "promisor-remote" capability, then a client C replying to S shouldn't advertise the "promisor-remote" capability either.

If "promisor.advertise" is set to 'true', S will advertise its promisor remotes with a string like:

    promisor-remote=<pr-info>[;<pr-info>]...

where each <pr-info> element contains information about a single promisor remote in the form:

    name=<pr-name>[,url=<pr-url>]

where <pr-name> is the urlencoded name of a promisor remote and <pr-url> is the urlencoded URL of the promisor remote named <pr-name>.

For now, the URL is passed in addition to the name. In the future, it might be possible to pass other information like a filter-spec that the client may use when cloning from S, or a token that the client may use when retrieving objects from X.

It is C's responsibility to arrange how it can reach X though, so pieces of information that are usually outside Git's concern, like proxy configuration, must not be distributed over this protocol.

It might also be possible in the future for "promisor.advertise" to have other values. For example, a value like "onlyName" could prevent S from advertising URLs, which could help in case C should use a different URL for X than the URL S is using. (The URL S is using might be an internal one on the server side, for example.)

By default, or if "promisor.acceptFromServer" is set to "None", C will not accept to use the promisor remotes that might have been advertised by S. In this case, C will not advertise any "promisor-remote" capability in its reply to S.

If "promisor.acceptFromServer" is set to "All" and S advertised some promisor remotes, then on the contrary, C will accept to use all the promisor remotes that S advertised and C will reply with a string like:

    promisor-remote=<pr-name>[;<pr-name>]...

where the <pr-name> elements are the urlencoded names of all the promisor remotes S advertised.

In a following commit, other values for "promisor.acceptFromServer" will be implemented, so that C will be able to decide the promisor remotes it accepts depending on the name and URL it received from S. So even if that name and URL information is not used much right now, it will be needed soon.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06  global: mark code units that generate warnings with `-Wsign-compare`  (Patrick Steinhardt)

Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-13  Merge branch 'en/shallow-exclude-takes-a-ref-fix'  (Junio C Hamano)

The "--shallow-exclude=<ref>" option to various history transfer commands takes a ref, not an arbitrary revision.

* en/shallow-exclude-takes-a-ref-fix:
  doc: correct misleading descriptions for --shallow-exclude
  upload-pack: fix ambiguous error message
2024-11-04  upload-pack: fix leaking URI protocols  (Patrick Steinhardt)

We don't clear `struct upload_pack::uri_protocols`, which causes a memory leak. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-04  upload-pack: fix ambiguous error message  (Elijah Newren)

upload-pack.c takes any --shallow-exclude argument(s) from clone/fetch/etc. and passes them through expand_ref(). If it does not get back exactly one ref from the call to expand_ref(), it will die with the following error:

    fatal: git upload-pack: ambiguous deepen-not: %s

Given that the documentation suggests to users that --shallow-exclude accepts a revision rather than a ref (which will be corrected in a subsequent commit), users may try to pass a revision. In such a case, expand_ref() will return 0 matches, but the error message we print will be misleading since "ambiguous" suggests there are multiple matches. Provide a clearer error message for such a case.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-25  Merge branch 'ak/typofix-2.46-maint'  (Junio C Hamano)

Typofix.

* ak/typofix-2.46-maint:
  upload-pack: fix a typo
  sideband: fix a typo
  setup: fix a typo
  run-command: fix a typo
  revision: fix a typo
  refs: fix typos
  rebase: fix a typo
  read-cache-ll: fix a typo
  pretty: fix a typo
  object-file: fix a typo
  merge-ort: fix typos
  merge-ll: fix a typo
  http: fix a typo
  gpg-interface: fix a typo
  git-p4: fix typos
  git-instaweb: fix a typo
  fsmonitor-settings: fix a typo
  diffcore-rename: fix typos
  config.mak.dev: fix a typo
2024-09-19  upload-pack: fix a typo  (Andrew Kreimer)

Fix a typo in comments.

Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-05  upload-pack: fix leaking child process data on reachability checks  (Patrick Steinhardt)

We spawn a git-rev-list(1) command to perform reachability checks in "upload-pack.c". We do not release memory associated with the process in error cases though, thus leaking memory. Fix these leaks by calling `child_process_clear()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09  refs: add referent to each_ref_fn  (John Cai)

Add a parameter to each_ref_fn so that callers to the ref APIs that use this function as a callback can have access to the unresolved value of a symbolic ref.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14  global: introduce `USE_THE_REPOSITORY_VARIABLE` macro  (Patrick Steinhardt)

Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from gaining new dependencies on it in future changes, be they explicit or implicit.

For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "builtin.h" to keep the required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
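[Editor's note: a sketch of how a code unit opts in; the builtin name is illustrative. The key detail is that the define must come before any Git header is included, otherwise the guarded symbols stay hidden.]

    #define USE_THE_REPOSITORY_VARIABLE

    #include "builtin.h"

    int cmd_example(int argc, const char **argv, const char *prefix)
    {
    	/* `the_repository` and `the_hash_algo` may be used here;
    	 * without the define above, this unit would fail to see them. */
    	return 0;
    }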
2024-06-06  Merge branch 'ps/leakfixes'  (Junio C Hamano)

Leakfixes.

* ps/leakfixes:
  builtin/mv: fix leaks for submodule gitfile paths
  builtin/mv: refactor to use `struct strvec`
  builtin/mv: duplicate string list memory
  builtin/mv: refactor `add_slash()` to always return allocated strings
  strvec: add functions to replace and remove strings
  submodule: fix leaking memory for submodule entries
  commit-reach: fix memory leak in `ahead_behind()`
  builtin/credential: clear credential before exit
  config: plug various memory leaks
  config: clarify memory ownership in `git_config_string()`
  builtin/log: stop using globals for format config
  builtin/log: stop using globals for log config
  convert: refactor code to clarify ownership of check_roundtrip_encoding
  diff: refactor code to clarify memory ownership of prefixes
  config: clarify memory ownership in `git_config_pathname()`
  http: refactor code to clarify memory ownership
  checkout: clarify memory ownership in `unique_tracking_name()`
  strbuf: fix leak when `appendwholeline()` fails with EOF
  transport-helper: fix leaking helper name
2024-05-27  config: clarify memory ownership in `git_config_string()`  (Patrick Steinhardt)

The out parameter of `git_config_string()` is a `const char **` even though we transfer ownership of memory to the caller. This is quite misleading and has led to many memory leaks all over the place. Adapt the parameter to instead be `char **`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
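[Editor's note: an illustrative callsite under the new `char **` signature; the config key, variable name, and callback are hypothetical. The point is that ownership transfer is now visible in the type, and the caller is responsible for freeing.]

    #include "git-compat-util.h" /* Git-internal headers; sketch only */
    #include "config.h"

    static char *hook_path; /* was `const char *` before this change */

    static int example_config(const char *var, const char *value,
    			  const struct config_context *ctx, void *cb)
    {
    	if (!strcmp(var, "example.hookpath"))
    		/* allocates; the caller must eventually free(hook_path) */
    		return git_config_string(&hook_path, var, value);
    	return 0;
    }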
2024-05-17  refs: pass repo when peeling objects  (Patrick Steinhardt)

Both `peel_object()` and `peel_iterated_oid()` implicitly rely on `the_repository` to look up objects. Despite the fact that we want to get rid of `the_repository`, it also leads to some restrictions in our ref iterators when trying to retrieve the peeled value for a repository other than `the_repository`.

Refactor these functions such that both take a repository as argument and remove the now-unnecessary restrictions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07  cocci: apply rules to rewrite callers of "refs" interfaces  (Patrick Steinhardt)

Apply the rules that rewrite callers of "refs" interfaces to explicitly pass `struct ref_store`. The resulting patch has been applied with the `--whitespace=fix` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-07  Merge branch 'jk/upload-pack-v2-capability-cleanup'  (Junio C Hamano)

The upload-pack program, when talking over v2, accepted the packfile-uris protocol extension from the client, even if it did not advertise the capability, which has been corrected.

* jk/upload-pack-v2-capability-cleanup:
  upload-pack: only accept packfile-uris if we advertised it
  upload-pack: use existing config mechanism for advertisement
  upload-pack: centralize setup of sideband-all config
  upload-pack: use repository struct to get config
2024-03-07  Merge branch 'jk/upload-pack-bounded-resources'  (Junio C Hamano)

Various parts of upload-pack have been updated to bound the resource consumption relative to the size of the repository to protect from abusive clients.

* jk/upload-pack-bounded-resources:
  upload-pack: free tree buffers after parsing
  upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
  upload-pack: always turn off save_commit_buffer
  upload-pack: disallow object-info capability by default
  upload-pack: accept only a single packfile-uri line
  upload-pack: use a strmap for want-ref lines
  upload-pack: use oidset for deepen_not list
  upload-pack: switch deepen-not list to an oid_array
  upload-pack: drop separate v2 "haves" array
2024-02-29  upload-pack: only accept packfile-uris if we advertised it  (Jeff King)

Clients are only supposed to request particular capabilities or features if the server advertised them. For the "packfile-uris" feature, we only advertise it if uploadpack.blobpackfileuri is set, but we always accept a request from the client regardless.

In practice this doesn't really hurt anything, as we'd pass the client's protocol list on to pack-objects, which ends up ignoring it. But we should try to follow the protocol spec, and tightening this up may catch buggy or misbehaving clients more easily.

Thanks to recent refactoring, we can hoist the config check from upload_pack_advertise() into upload_pack_config(). Note the subtle handling of a value-less bool (which does not count for triggering an advertisement).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
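[Editor's note: a sketch of the "value-less bool" subtlety the message alludes to; the struct and flag names are hypothetical. A bare `[uploadpack] blobpackfileuri` reads as true for ordinary boolean config, but it cannot name any URI protocol, so it must not enable the advertisement.]

    #include <string.h>

    struct upload_pack_flags { int allow_packfile_uris; }; /* hypothetical */

    static void handle_config(struct upload_pack_flags *data,
    			  const char *var, const char *value)
    {
    	if (!strcmp("uploadpack.blobpackfileuri", var)) {
    		/* a value-less bool arrives here as value == NULL;
    		 * it carries no protocol list, so leave the flag off */
    		if (value)
    			data->allow_packfile_uris = 1;
    	}
    }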
2024-02-28  upload-pack: use existing config mechanism for advertisement  (Jeff King)

When serving a v2 capabilities request, we call upload_pack_advertise() to tell us the set of features we can advertise to the client. That involves looking at various config options, all of which need to be kept in sync with the rules we use in upload_pack_config to set flags like allow_filter, allow_sideband_all, and so on. If these two pieces of code get out of sync then we may refuse to respect a capability we advertised, or vice versa accept one that we should not.

Instead, let's call the same config helper that we'll use for processing the actual client request, and then just pick the values out of the resulting struct. This is only a little bit shorter than the current code, but we don't repeat any policy logic (e.g., we don't have to worry about the magic sideband-all environment variable here anymore).

And this reveals a gap in the existing code: there is no struct flag for the packfile-uris capability (we accept it even if it is not advertised, which we should not). We'll leave the advertisement code for now and deal with it in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: centralize setup of sideband-all config  (Jeff King)

We read uploadpack.allowsidebandall to set a matching flag in our upload_pack_data struct. But for our tests, we also respect GIT_TEST_SIDEBAND_ALL from the environment, and anybody looking at the flag in the struct needs to remember to check both. There's only one such piece of code now, but we're about to add another.

So let's have the config step actually fold the environment value into the struct, letting the rest of the code use the flag in the obvious way.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: use repository struct to get config  (Jeff King)

Our upload_pack_v2() function gets a repository struct, but we ignore it totally. In practice this doesn't cause any problems, as it will never differ from the_repository. But in the spirit of taking a small step towards getting rid of the_repository, let's at least start using it to grab config. There are probably other spots that could benefit, but it's a start.

Note that we don't need to pass the repo for protected_config(); the whole point there is that we are not looking at repo config, so there is no repo-specific version of the function.

For the v0 version of the protocol, we're not passed a repository struct, so we'll continue to use the_repository there.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: free tree buffers after parsing  (Jeff King)

When a client sends us a "want" or "have" line, we call parse_object() to get an object struct. If the object is a tree, then the parsed state means that tree->buffer points to the uncompressed contents of the tree. But we don't really care about it. We only really need to parse commits and tags; for trees and blobs, the important output is just a "struct object" with the correct type.

But much worse, we do not ever free that tree buffer. It's not leaked in the traditional sense, in that we still have a pointer to it from the global object hash. But if the client requests many trees, we'll hold all of their contents in memory at the same time.

Nobody really noticed because it's rare for clients to directly request a tree. It might happen for a lightweight tag pointing straight at a tree, or it might happen for a "tree:depth" partial clone filling in missing trees.

But it's also possible for a malicious client to request a lot of trees, causing upload-pack's memory to balloon. For example, without this patch, requesting every tree in git.git like:

    pktline() {
      local msg="$*"
      printf "%04x%s\n" $((1+4+${#msg})) "$msg"
    }

    want_trees() {
      pktline command=fetch
      printf 0001
      git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      while read oid type; do
        test "$type" = "tree" || continue
        pktline want $oid
      done
      pktline done
      printf 0000
    }

    want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null

shows a peak heap usage of ~3.7GB. Which is just about the sum of the sizes of all of the uncompressed trees. For linux.git, it's closer to 17GB.

So the obvious thing to do is to call free_tree_buffer() after we realize that we've parsed a tree. We know that upload-pack won't need it later. But let's push the logic into parse_object_with_flags(), telling it to discard the tree buffer immediately. There are two reasons for this.

One, all of the relevant call-sites already call the with_options variant to pass the SKIP_HASH flag. So it actually ends up as less code than manually free-ing in each spot. And two, it enables an extra optimization that I'll discuss below.

I've touched all of the sites that currently use SKIP_HASH in upload-pack. That drops the peak heap of the upload-pack invocation above from 3.7GB to ~24MB.

I've also modified the caller in get_reference(); a partial clone benefits from its use in pack-objects for the reasons given in 0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects, 2022-09-06), where we were measuring blob requests. But note that the results of get_reference() are used for traversing, as well; so we really would _eventually_ use the tree contents. That makes this at first glance a space/time tradeoff: we won't hold all of the trees in memory at once, but we'll have to reload them each when it comes time to traverse.

And here's where our extra optimization comes in. If the caller is not going to immediately look at the tree contents, and it doesn't care about checking the hash, then parse_object() can simply skip loading the tree entirely, just like we do for blobs! And now it's not a space/time tradeoff in get_reference() anymore. It's just a lazy-load: we're delaying reading the tree contents until it's time to actually traverse them one by one.

And of course for upload-pack, this optimization means we never load the trees at all, saving lots of CPU time.

Timing the "every tree from git.git" request above shows upload-pack dropping from 32 seconds of CPU to 19 (the remainder is mostly due to pack-objects actually sending the pack; timing just the upload-pack portion shows we go from 13s to ~0.28s).

These are all highly gamed numbers, of course. For real-world partial-clone requests we're saving only a small bit of time in practice. But it does help harden upload-pack against malicious denial-of-service attacks.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places  (Jeff King)

In commit 0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects, 2022-09-06), we optimized the parse_object() calls for v2 "want" lines from the client so that they avoided parsing blobs, and so that they used the commit-graph rather than parsing commit objects from scratch.

We should extend that to two other spots:

  1. We parse "have" objects in the got_oid() function. These won't generally be non-commits (unlike "want" lines from a partial clone). But we still benefit from the use of the commit-graph.

  2. For v0, the "want" lines are parsed in receive_needs(). These are also less likely to be non-commits because by default they have to be ref tips. There are config options you might set to allow non-tip objects, but you'd mostly do so to support partial clones, and clients recent enough to support partial clone will generally speak v2 anyway.

So I don't expect this change to improve performance much for day-to-day operations. But both are possible denial-of-service vectors, where an attacker can waste our time by sending over a large number of objects to parse (of course we may waste even more time serving a pack to them, but we try as much as possible to optimize that in pack-objects; we should do what we can here in upload-pack, too).

With this patch, running p5600 with GIT_TEST_PROTOCOL_VERSION=0 shows similar results to what we saw in 0bc2557951 (which ran with the v2 protocol by default). Here are the numbers for linux.git:

    Test                        HEAD^               HEAD
    -----------------------------------------------------------------------------
    5600.3: checkout of result  50.91(87.95+2.93)   41.75(79.00+3.18) -18.0%

Or for a more extreme (and malicious) case, we can claim to "have" every blob in git.git over the v0 protocol:

    $ {
        echo "0032want $(git rev-parse HEAD)"
        printf 0000
        git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
        perl -alne 'print "0032have $F[0]" if $F[1] eq "blob"'
      } >input

    $ time ./git.old upload-pack . <input >/dev/null
    real	0m52.951s
    user	0m51.633s
    sys	0m1.304s

    $ time ./git.new upload-pack . <input >/dev/null
    real	0m0.261s
    user	0m0.156s
    sys	0m0.105s

(Note that these don't actually compute a pack because of the hacky protocol usage, so those numbers are representing the raw blob-parsing effort done by upload-pack.)

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: always turn off save_commit_buffer  (Jeff King)

When the client sends us "want $oid" lines, we call parse_object($oid) to get an object struct. It's important to parse the commits because we need to traverse them in the negotiation phase. But of course we don't need to hold on to the commit messages for each one.

We've turned off the save_commit_buffer flag in get_common_commits() for a long time, since f0243f26f6 (git-upload-pack: More efficient usage of the has_sha1 array, 2005-10-28). That helps with the commits we see while actually traversing. But:

  1. That function is only used by the v0 protocol. I think the v2 protocol's code path leaves the flag on (and thus pays the extra memory penalty), though I didn't measure it specifically.

  2. If the client sends us a bunch of "want" lines, that happens before the negotiation phase. So we'll hold on to all of those commit messages. Generally the number of "want" lines scales with the refs, not with the number of objects in the repo. But a malicious client could send a lot in order to waste memory.

As an example of (2), if I generate a request to fetch all commits in git.git like this:

    pktline() {
      local msg="$*"
      printf "%04x%s\n" $((1+4+${#msg})) "$msg"
    }

    want_commits() {
      pktline command=fetch
      printf 0001
      git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      while read oid type; do
        test "$type" = "commit" || continue
        pktline want $oid
      done
      pktline done
      printf 0000
    }

    want_commits | GIT_PROTOCOL=version=2 valgrind --tool=massif git-upload-pack . >/dev/null

before this patch upload-pack peaks at ~125MB, and after at ~35MB. The difference is not coincidentally about the same as the sum of all commit object sizes as computed by:

    git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectsize)' |
    perl -alne '$v += $F[1] if $F[0] eq "commit"; END { print $v }'

In a larger repository like linux.git, that number is ~1GB.

In a repository with a full commit-graph file this will have no impact (and the commit graph would save us from parsing at all, so is a much better solution!). But it's easy to do, might help a little in real-world cases (where even if you have a commit graph it might not be fully up to date), and helps a lot for a worst-case malicious request.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: accept only a single packfile-uri line  (Jeff King)

When we see a packfile-uri line from the client, we use string_list_split() to split it on commas and store the result in a string_list. A single packfile-uri line is therefore limited to storing ~64kb, the size of a pkt-line.

But we'll happily accept multiple such lines, and each line appends to the string list, growing without bound.

In theory this could be useful, making:

    0017packfile-uris http
    0018packfile-uris https

equivalent to:

    001dpackfile-uris http,https

But the protocol documentation doesn't indicate that this should work (and indeed, refers to this in the singular as "the following argument can be included in the client's request"). And the client-side implementation in fetch-pack has always sent a single line (JGit appears to understand the line on the server side but has no client-side implementation, and libgit2 understands neither).

If we were worried about compatibility, we could instead just put a limit on the maximum number of values we'd accept. The current client implementation limits itself to only two values: "http" and "https", so something like "256" would be more than enough. But accepting only a single line seems more in line with the protocol documentation, and matches other parts of the protocol (e.g., we will not accept a second "filter" line).

We'll also make this more explicit in the protocol documentation; as above, I think this was always the intent, but there's no harm in making it clear.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: use a strmap for want-ref lines  (Jeff King)

When the "ref-in-want" capability is advertised (which it is not by default), then upload-pack processes a "want-ref" line from the client by checking that the name is a valid ref and recording it in a string-list.

In theory this list should grow no larger than the number of refs in the server-side repository. But since we don't do any de-duplication, a client which sends "want-ref refs/heads/foo" over and over will cause the array to grow without bound.

We can fix this by switching to strmap, which efficiently detects duplicates. There are two client-visible changes here:

  1. The "wanted-refs" response will now be in an apparently-random order (based on iterating the hashmap) rather than the order given by the client. The protocol documentation is quiet on ordering here. The current fetch-pack implementation is happy with any order, as it looks up each returned ref using a binary search in its local sorted list. JGit seems to implement want-ref on the server side, but has no client-side support. libgit2 doesn't support either side. It would obviously be possible to record the original order or to use the strmap as an auxiliary data structure. But if the client doesn't care, we may as well do the simplest thing.

  2. We'll now reject duplicates explicitly as a protocol error. The client should never send them (and our current implementation, even when asked to "git fetch master:one master:two" will de-dup on the client side). If we wanted to be more forgiving, we could perhaps just throw away the duplicates. But then our "wanted-refs" response back to the client would omit the duplicates, and it's hard to say what a client that accidentally sent a duplicate would do with that. So I think we're better off to complain loudly before anybody accidentally writes such a client.

Let's also add a note to the protocol documentation clarifying that duplicates are forbidden. As discussed above, this was already the intent, but it's not very explicit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28  upload-pack: use oidset for deepen_not list  (Jeff King)

We record the oid of every deepen-not line the client sends to us. For a well-behaved client, the resulting array should be bounded by the number of unique refs we have. But because there's no de-duplication, a malicious client can cause the array to grow unbounded by just sending the same "refs/heads/foo" over and over (assuming such a ref exists).

Since the deepen-not list is just being fed to a "rev-list --not" traversal, the order of items doesn't matter. So we can replace the oid_array with an oidset which notices and skips duplicates.

That bounds the memory in malicious cases to be linear in the number of unique refs. And even in non-malicious cases, there may be a slight improvement in memory usage if multiple refs point to the same oid (though in practice this list is probably pretty tiny anyway, as it comes from the user specifying "--shallow-exclude" on the client fetch).

Note that in the trace2 output we'll now output the number of de-duplicated objects, rather than the total number of "deepen-not" lines we received. This is arguably a more useful value for tracing / debugging anyway.

Reported-by: Benjamin Flesch <benjaminflesch@icloud.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
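[Editor's note: a sketch of the de-duplicating insert with Git's internal oidset API; `oidset_insert()` reports whether the oid was already present, which is what makes skipping duplicates a one-liner.]

    #include "oidset.h" /* Git-internal header; sketch only */

    static void record_deepen_not(struct oidset *deepen_not,
    			      const struct object_id *oid)
    {
    	if (oidset_insert(deepen_not, oid))
    		return; /* duplicate "deepen-not" target: already recorded */
    	/* first occurrence: the set now holds it, nothing more to do */
    }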
2024-02-28  upload-pack: switch deepen-not list to an oid_array  (Jeff King)

When we see a "deepen-not" line from the client, we verify that the given name can be resolved as a ref, and then add it to a string list to be passed later to an internal "rev-list --not" traversal. We record the actual refname in the string list (so the traversal resolves it again later), but we'd be better off recording the resolved oid:

  1. There's a tiny bit of wasted work in resolving it twice.

  2. There's a small race condition with simultaneous updates; the later traversal may resolve to a different value (or not at all). This shouldn't cause any bad behavior (we do not care about the value in this first resolution, so whatever value rev-list gets is OK) but it could mean a confusing error message (if upload-pack fails to resolve the ref it produces a useful message, but a failing traversal later results in just "revision walk setup failed").

  3. It makes it simpler to de-duplicate the results. We don't de-dup at all right now, but we will in the next patch.

From the client's perspective the behavior should be the same.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>