aboutsummaryrefslogtreecommitdiff
path: root/builtin/clone.c
AgeCommit message (Collapse)Author
2023-02-15Merge branch 'ds/bundle-uri-5'Junio C Hamano
The bundle-URI subsystem adds support for creation-token heuristics to help incremental fetches. * ds/bundle-uri-5: bundle-uri: test missing bundles with heuristic bundle-uri: store fetch.bundleCreationToken fetch: fetch from an external bundle URI bundle-uri: drop bundle.flag from design doc clone: set fetch.bundleURI if appropriate bundle-uri: download in creationToken order bundle-uri: parse bundle.<id>.creationToken values bundle-uri: parse bundle.heuristic=creationToken t5558: add tests for creationToken heuristic bundle: verify using check_connected() bundle: test unbundling with incomplete history
2023-02-13Sync with Git 2.39.2Junio C Hamano
2023-02-06clone: use free() instead of UNLEAK()Ævar Arnfjörð Bjarmason
Change an UNLEAK() added in 0c4542738e6 (clone: free or UNLEAK further pointers when finished, 2021-03-14) to use a "to_free" pattern instead. In this case the "repo" can be either this absolute_pathdup() value, or in the "else if" branch seen in the context the the "argv[0]" argument to "main()". We can only free() the value in the former case, hence the "to_free" pattern. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06Sync with 2.38.4Johannes Schindelin
* maint-2.38: Git 2.38.4 Git 2.37.6 Git 2.36.5 Git 2.35.7 Git 2.34.7 http: support CURLOPT_PROTOCOLS_STR http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT Git 2.33.7 Git 2.32.6 Git 2.31.7 Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport
2023-02-06Sync with 2.37.6Johannes Schindelin
* maint-2.37: Git 2.37.6 Git 2.36.5 Git 2.35.7 Git 2.34.7 http: support CURLOPT_PROTOCOLS_STR http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT Git 2.33.7 Git 2.32.6 Git 2.31.7 Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport
2023-02-06Sync with 2.36.5Johannes Schindelin
* maint-2.36: Git 2.36.5 Git 2.35.7 Git 2.34.7 http: support CURLOPT_PROTOCOLS_STR http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT Git 2.33.7 Git 2.32.6 Git 2.31.7 Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport
2023-02-06Sync with 2.35.7Johannes Schindelin
* maint-2.35: Git 2.35.7 Git 2.34.7 http: support CURLOPT_PROTOCOLS_STR http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT Git 2.33.7 Git 2.32.6 Git 2.31.7 Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport
2023-02-06Sync with 2.34.7Johannes Schindelin
* maint-2.34: Git 2.34.7 http: support CURLOPT_PROTOCOLS_STR http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT Git 2.33.7 Git 2.32.6 Git 2.31.7 Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport
2023-02-06Sync with 2.33.7Johannes Schindelin
* maint-2.33: Git 2.33.7 Git 2.32.6 Git 2.31.7 Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport
2023-02-06Sync with 2.32.6Johannes Schindelin
* maint-2.32: Git 2.32.6 Git 2.31.7 Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport
2023-02-06Sync with 2.31.7Johannes Schindelin
* maint-2.31: Git 2.31.7 Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport
2023-02-06Sync with 2.30.8Johannes Schindelin
* maint-2.30: Git 2.30.8 apply: fix writing behind newly created symbolic links dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS clone: delay picking a transport until after get_repo_path() t5619: demonstrate clone_local() with ambiguous transport
2023-01-31clone: set fetch.bundleURI if appropriateDerrick Stolee
Bundle providers may organize their bundle lists in a way that is intended to improve incremental fetches, not just initial clones. However, they do need to state that they have organized with that in mind, or else the client will not expect to save time by downloading bundles after the initial clone. This is done by specifying a bundle.heuristic value. There are two types of bundle lists: those at a static URI and those that are advertised from a Git remote over protocol v2. The new fetch.bundleURI config value applies for static bundle URIs that are not advertised over protocol v2. If the user specifies a static URI via 'git clone --bundle-uri', then Git can set this config as a reminder for future 'git fetch' operations to check the bundle list before connecting to the remote(s). For lists provided over protocol v2, we will want to take a different approach and create a property of the remote itself by creating a remote.<id>.* type config key. That is not implemented in this change. Later changes will update 'git fetch' to consume this option. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-24clone: delay picking a transport until after get_repo_path()Taylor Blau
In the previous commit, t5619 demonstrates an issue where two calls to `get_repo_path()` could trick Git into using its local clone mechanism in conjunction with a non-local transport. That sequence is: - the starting state is that the local path https:/example.com/foo is a symlink that points to ../../../.git/modules/foo. So it's dangling. - get_repo_path() sees that no such path exists (because it's dangling), and thus we do not canonicalize it into an absolute path - because we're using --separate-git-dir, we create .git/modules/foo. Now our symlink is no longer dangling! - we pass the url to transport_get(), which sees it as an https URL. - we call get_repo_path() again, on the url. This second call was introduced by f38aa83f9a (use local cloning if insteadOf makes a local URL, 2014-07-17). The idea is that we want to pull the url fresh from the remote.c API, because it will apply any aliases. And of course now it sees that there is a local file, which is a mismatch with the transport we already selected. The issue in the above sequence is calling `transport_get()` before deciding whether or not the repository is indeed local, and not passing in an absolute path if it is local. This is reminiscent of a similar bug report in [1], where it was suggested to perform the `insteadOf` lookup earlier. Taking that approach may not be as straightforward, since the intent is to store the original URL in the config, but to actually fetch from the insteadOf one, so conflating the two early on is a non-starter. Note: we pass the path returned by `get_repo_path(remote->url[0])`, which should be the same as `repo_name` (aside from any `insteadOf` rewrites). We *could* pass `absolute_pathdup()` of the same argument, which 86521acaca (Bring local clone's origin URL in line with that of a remote clone, 2008-09-01) indicates may differ depending on the presence of ".git/" for a non-bare repo. That matters for forming relative submodule paths, but doesn't matter for the second call, since we're just feeding it to the transport code, which is fine either way. [1]: https://lore.kernel.org/git/CAMoD=Bi41mB3QRn3JdZL-FGHs4w3C2jGpnJB-CqSndO7FMtfzA@mail.gmail.com/ Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-02Merge branch 'ds/bundle-uri-4'Junio C Hamano
Bundle URIs part 4. * ds/bundle-uri-4: clone: unbundle the advertised bundles bundle-uri: download bundles from an advertised list bundle-uri: allow relative URLs in bundle lists strbuf: introduce strbuf_strip_file_from_path() bundle-uri: serve bundle.* keys from config bundle-uri client: add helper for testing server transport: rename got_remote_heads bundle-uri client: add boolean transfer.bundleURI setting clone: request the 'bundle-uri' command when available t: create test harness for 'bundle-uri' command protocol v2: add server-side "bundle-uri" skeleton
2022-12-25clone: unbundle the advertised bundlesDerrick Stolee
A previous change introduced the transport methods to acquire a bundle list from the 'bundle-uri' protocol v2 command, when advertised _and_ when the client has chosen to enable the feature. Teach Git to download and unbundle the data advertised by those bundles during 'git clone'. This takes place between the ref advertisement and the object data download, and stateful connections will linger while the client downloads bundles. In the future, we should consider closing the remote connection during this process. Also, since the --bundle-uri option exists, we do not want to mix the advertised bundles with the user-specified bundles. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25clone: request the 'bundle-uri' command when availableÆvar Arnfjörð Bjarmason
Set up all the needed client parts of the 'bundle-uri' protocol v2 command, without actually doing anything with the bundle URIs. If the server says it supports 'bundle-uri' teach Git to issue the 'bundle-uri' command after the 'ls-refs' during 'git clone'. The returned key=value pairs are passed to the bundle list code which is tested using a different ingest mechanism in t5750-bundle-uri-parse.sh. At this point, Git does nothing with that bundle list. It will not download any of the bundles. That will come in a later change after these protocol bits are finalized. The no-op client is initially used only by 'git clone' to test the basic functionality, and eventually will bootstrap the initial download of Git objects during a fresh clone. The bundle URI client will not be integrated into other fetches until a mechanism is created to select a subset of bundles for download. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-21cocci: apply "pending" index-compatibility to some "builtin/*.c"Ævar Arnfjörð Bjarmason
Apply "index-compatibility.pending.cocci" rule to "builtin/*", but exclude those where we conflict with in-flight changes. As a result some of them end up using only "the_index", so let's have them use the more narrow "USE_THE_INDEX_VARIABLE" rather than "USE_THE_INDEX_COMPATIBILITY_MACROS". Manual changes not made by coccinelle, that were squashed in: * Whitespace-wrap argument lists for repo_hold_locked_index(), repo_read_index_preload() and repo_refresh_and_write_index(), in cases where the line became too long after the transformation. * Change "refresh_cache()" to "refresh_index()" in a comment in "builtin/update-index.c". * For those whose call was followed by perror("<macro-name>"), change it to perror("<function-name>"), referring to the new function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-30replace and remove run_command_v_opt()René Scharfe
Replace the remaining calls of run_command_v_opt() with run_command() calls and explict struct child_process variables. This is more verbose, but not by much overall. The code becomes more flexible, e.g. it's easy to extend to conditionally add a new argument. Then remove the now unused function and its own flag names, simplifying the run-command API. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-30use child_process members "args" and "env" directlyRené Scharfe
Build argument list and environment of child processes by using struct child_process and populating its members "args" and "env" directly instead of maintaining separate strvecs and letting run_command_v_opt() and friends populate these members. This is simpler, shorter and slightly more efficient. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-25Merge branch 'jk/clone-allow-bare-and-o-together' into maint-2.38Junio C Hamano
"git clone" did not like to see the "--bare" and the "--origin" options used together without a good reason. * jk/clone-allow-bare-and-o-together: clone: allow "--bare" with "-o"
2022-10-17Sync with v2.38.1Junio C Hamano
2022-10-10Merge branch 'jk/clone-allow-bare-and-o-together'Junio C Hamano
"git clone" did not like to see the "--bare" and the "--origin" options used together without a good reason. * jk/clone-allow-bare-and-o-together: clone: allow "--bare" with "-o"
2022-10-06Sync with 2.37.4Taylor Blau
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.36.3Taylor Blau
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.35.5Taylor Blau
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.34.5Taylor Blau
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.33.5Taylor Blau
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.32.4Taylor Blau
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.31.5Taylor Blau
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06Sync with 2.30.6Taylor Blau
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-01builtin/clone.c: disallow `--local` clones with symlinksTaylor Blau
When cloning a repository with `--local`, Git relies on either making a hardlink or copy to every file in the "objects" directory of the source repository. This is done through the callpath `cmd_clone()` -> `clone_local()` -> `copy_or_link_directory()`. The way this optimization works is by enumerating every file and directory recursively in the source repository's `$GIT_DIR/objects` directory, and then either making a copy or hardlink of each file. The only exception to this rule is when copying the "alternates" file, in which case paths are rewritten to be absolute before writing a new "alternates" file in the destination repo. One quirk of this implementation is that it dereferences symlinks when cloning. This behavior was most recently modified in 36596fd2df (clone: better handle symlinked files at .git/objects/, 2019-07-10), which attempted to support `--local` clones of repositories with symlinks in their objects directory in a platform-independent way. Unfortunately, this behavior of dereferencing symlinks (that is, creating a hardlink or copy of the source's link target in the destination repository) can be used as a component in attacking a victim by inadvertently exposing the contents of file stored outside of the repository. Take, for example, a repository that stores a Dockerfile and is used to build Docker images. When building an image, Docker copies the directory contents into the VM, and then instructs the VM to execute the Dockerfile at the root of the copied directory. This protects against directory traversal attacks by copying symbolic links as-is without dereferencing them. That is, if a user has a symlink pointing at their private key material (where the symlink is present in the same directory as the Dockerfile, but the key itself is present outside of that directory), the key is unreadable to a Docker image, since the link will appear broken from the container's point of view. This behavior enables an attack whereby a victim is convinced to clone a repository containing an embedded submodule (with a URL like "file:///proc/self/cwd/path/to/submodule") which has a symlink pointing at a path containing sensitive information on the victim's machine. If a user is tricked into doing this, the contents at the destination of those symbolic links are exposed to the Docker image at runtime. One approach to preventing this behavior is to recreate symlinks in the destination repository. But this is problematic, since symlinking the objects directory are not well-supported. (One potential problem is that when sharing, e.g. a "pack" directory via symlinks, different writers performing garbage collection may consider different sets of objects to be reachable, enabling a situation whereby garbage collecting one repository may remove reachable objects in another repository). Instead, prohibit the local clone optimization when any symlinks are present in the `$GIT_DIR/objects` directory of the source repository. Users may clone the repository again by prepending the "file://" scheme to their clone URL, or by adding the `--no-local` option to their `git clone` invocation. The directory iterator used by `copy_or_link_directory()` must no longer dereference symlinks (i.e., it *must* call `lstat()` instead of `stat()` in order to discover whether or not there are symlinks present). This has no bearing on the overall behavior, since we will immediately `die()` on encounter a symlink. Note that t5604.33 suggests that we do support local clones with symbolic links in the source repository's objects directory, but this was likely unintentional, or at least did not take into consideration the problem with sharing parts of the objects directory with symbolic links at the time. Update this test to reflect which options are and aren't supported. Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-09-22clone: allow "--bare" with "-o"Jeff King
We explicitly forbid the combination of "--bare" with "-o", but there doesn't seem to be any good reason to do so. The original logic came as part of e6489a1bdf (clone: do not accept more than one -o option., 2006-01-22), but that commit does not give any reason. Furthermore, the equivalent combination via config is allowed: git -c clone.defaultRemoteName=foo clone ... and works as expected. It may be that this combination was considered useless, because a bare clone does not set remote.origin.fetch (and hence there is no refs/remotes/origin hierarchy). But it does set remote.origin.url, and that name is visible to the user via "git fetch origin", etc. Let's allow the options to be used together, and switch the "forbid" test in t5606 to check that we use the requested name. That test came much later in 349cff76de (clone: add tests for --template and some disallowed option pairs, 2020-09-29), and does not offer any logic beyond "let's test what the code currently does". Reported-by: John A. Leuenhagen <john@zlima12.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-19Merge branch 'jk/list-objects-filter-cleanup'Junio C Hamano
A couple of bugfixes with code clean-up. * jk/list-objects-filter-cleanup: list-objects-filter: convert filter_spec to a strbuf list-objects-filter: add and use initializers list-objects-filter: handle null default filter spec list-objects-filter: don't memset after releasing filter struct
2022-09-12list-objects-filter: add and use initializersJeff King
In 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec strings, 2022-09-08), we noted that the filter_spec string_list was inconsistent in how it handled memory ownership of strings stored in the list. The fix there was a bit of a band-aid to set the "strdup_strings" variable right before adding anything. That works OK, and it lets the users of the API continue to zero-initialize the struct. But it makes the code a bit hard to follow and accident-prone, as any other spots appending the filter_spec need to think about whether to set the strdup_strings value, too (there's one such spot in partial_clone_get_default_filter_spec(), which is probably a possible memory leak). So let's do that full cleanup now. We'll introduce a LIST_OBJECTS_FILTER_INIT macro and matching function, and use them as appropriate (though it is for the "_options" struct, this matches the corresponding list_objects_filter_release() function). This is harder than it seems! Many other structs, like git_transport_data, embed the filter struct. So they need to initialize it themselves even if the rest of the enclosing struct is OK with zero-initialization. I found all of the relevant spots by grepping manually for declarations of list_objects_filter_options. And then doing so recursively for structs which embed it, and ones which embed those, and so on. I'm pretty sure I got everything, but there's no change that would alert the compiler if any topics in flight added new declarations. To catch this case, we now double-check in the parsing function that things were initialized as expected and BUG() if appropriate. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-24clone: warn on failure to repo_init()Derrick Stolee
The --bundle-uri option was added in 55568919616 (clone: add --bundle-uri option, 2022-08-09), but this also introduced a call to repo_init() whose return value was ignored. Fix that ignored value by warning that the bundle URI process could not continue if it failed. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10clone: --bundle-uri cannot be combined with --depthDerrick Stolee
A previous change added the '--bundle-uri' option, but did not check if the --depth parameter was included. Since bundles are not compatible with shallow clones, provide an error message to the user who is attempting this combination. I am leaving this as its own change, separate from the one that implements '--bundle-uri', because this is more of an advisory for the user. There is nothing wrong with bootstrapping with bundles and then fetching a shallow clone. However, that is likely going to involve too much work for the client _and_ the server. The client will download all of this bundle information containing the full history of the repository only to ignore most of it. The server will get a shallow fetch request, but with a list of haves that might cause a more painful computation of that shallow pack-file. Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10clone: add --bundle-uri optionDerrick Stolee
Cloning a remote repository is one of the most expensive operations in Git. The server can spend a lot of CPU time generating a pack-file for the client's request. The amount of data can clog the network for a long time, and the Git protocol is not resumable. For users with poor network connections or are located far away from the origin server, this can be especially painful. Add a new '--bundle-uri' option to 'git clone' to bootstrap a clone from a bundle. If the user is aware of a bundle server, then they can tell Git to bootstrap the new repository with these bundles before fetching the remaining objects from the origin server. Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-05Merge branch 'jk/clone-unborn-confusion' into maintJunio C Hamano
"git clone" from a repository with some ref whose HEAD is unborn did not set the HEAD in the resulting repository correctly, which has been corrected. source: <YsdyLS4UFzj0j/wB@coredump.intra.peff.net> * jk/clone-unborn-confusion: clone: move unborn head creation to update_head() clone: use remote branch if it matches default HEAD clone: propagate empty remote HEAD even with other branches clone: drop extra newline from warning message
2022-07-19Merge branch 'jk/clone-unborn-confusion'Junio C Hamano
"git clone" from a repository with some ref whose HEAD is unborn did not set the HEAD in the resulting repository correctly, which has been corrected. * jk/clone-unborn-confusion: clone: move unborn head creation to update_head() clone: use remote branch if it matches default HEAD clone: propagate empty remote HEAD even with other branches clone: drop extra newline from warning message
2022-07-11clone: move unborn head creation to update_head()Jeff King
Prior to 4f37d45706 (clone: respect remote unborn HEAD, 2021-02-05), creation of the local HEAD was always done in update_head(). That commit added code to handle an unborn head in an empty repository, and just did all symref creation and config setup there. This makes the code flow a little bit confusing, especially as new corner cases have been covered (like the previous commit to match our default branch name to a non-HEAD remote branch). Let's move the creation of the unborn symref into update_head(). This matches the other HEAD-creation cases, and now the logic is consistently separated: the main cmd_clone() function only examines the situation and sets variables based on what it finds, and update_head() actually performs the update. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-07clone: use remote branch if it matches default HEADJeff King
Usually clone tries to use the same local HEAD as the remote (unless the user has given --branch explicitly). Even if the remote HEAD is detached or unborn, we can detect those situations with modern versions of Git. If the remote is too old to support the "unborn" extension (or it has been disabled via config), then we can't know the name of the remote's unborn HEAD, and we fall back whatever the local default branch name is configured to be. But that leads to one weird corner case. It's rare because it needs a number of factors: - the remote has an unborn HEAD - the remote is too old to support "unborn", or has disabled it - the remote has another branch "foo" - the local default branch name is "foo" In that case you end up with a local clone on an unborn "foo" branch, disconnected completely from the remote's "foo". This is rare in practice, but the result is quite confusing. When choosing "foo", we can double check whether the remote has such a name, and if so, start our local "foo" at the same spot, rather than making it unborn. Note that this causes a test failure in t5605, which is cloning from a bundle that doesn't contain HEAD (so it behaves like a remote that doesn't support "unborn"), but has a single "main" branch. That test expects that we end up in the weird "unborn main" case, where we don't actually check out the remote branch of the same name. Even though we have to update the test, this seems like an argument in favor of this patch: checking out main is what I'd expect from such a bundle. So this patch updates the test for the new behavior and adds an adjacent one that checks what the original was going for: if there's no HEAD and the bundle _doesn't_ have a branch that matches our local default name, then we end up with nothing checked out. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-07clone: propagate empty remote HEAD even with other branchesJeff King
Unless "--branch" was given, clone generally tries to match the local HEAD to the remote one. For most repositories, this is easy: the remote tells us which branch HEAD was pointing to, and we call our local checkout() function on that branch. When cloning an empty repository, it's a little more tricky: we have special code that checks the transport's "unborn" extension, or falls back to our local idea of what the default branch should be. In either case, we point the new HEAD to that, and set up the branch.* config. But that leaves one case unhandled: when the remote repository _isn't_ empty, but its HEAD is unborn. The checkout() function is smart enough to realize we didn't fetch the remote HEAD and it bails with a warning. But we'll have ignored any information the remote gave us via the unborn extension. This leads to nonsense outcomes: - If the remote has its HEAD pointing to an unborn "foo" and contains another branch "bar", cloning will get branch "bar" but leave the local HEAD pointing at "master" (or whatever our local default is), which is useless. The project does not use "master" as a branch. - Worse, if the other branch "bar" is instead called "master" (but again, the remote HEAD is not pointing to it), then we end up with a local unborn branch "master", which is not connected to the remote "master" (it shares no history, and there's no branch.* config). Instead, we should try to use the remote's HEAD, even if its unborn, to be consistent with the other cases. The reason this case was missed is that cmd_clone() handles empty and non-empty repositories on two different sides of a conditional: if (we have any refs) { fetch refs; check for --branch; otherwise, try to point our head at remote head; otherwise, our head is NULL; } else { check for --branch; otherwise, try to use "unborn" extension; otherwise, fall back to our default name name; } So the smallest change would be to repeat the "unborn" logic at the end of the first block. But we can note some other overlaps and inconsistencies: - both sides have to handle --branch (though note that it's always an error for the empty repo case, since an empty repo by definition does not have a matching branch) - the fall back to the default name is much more explicit in the empty-repo case. The non-empty case eventually ends up bailing from checkout() with a warning, which produces a similar result, but fails to set up the branch config we do in the empty case. So let's pull the HEAD setup out of this conditional entirely. This de-duplicates some of the code and the result is easy to follow, because helper functions like find_ref_by_name() do the right thing even in the empty-repo case (i.e., by returning NULL). There are two subtleties: - for a remote with a detached HEAD, it will advertise an oid for HEAD (which we store in our "remote_head" variable), but we won't find a matching refname (so our "remote_head_points_at" is NULL). In this case we make a local detached HEAD to match. Right now this happens implicitly by reaching update_head() with a non-NULL remote_head (since we skip all of the unborn-fallback). We'll now need to account for it explicitly before doing the fallback. - for an empty repo, we issue a warning to the user that they've cloned an empty repo. The text of that warning doesn't make sense for a non-empty repo with an unborn HEAD, so we'll have to differentiate the two cases there. We could just use different text, but instead let's allow the code to continue down to checkout(), which will issue an appropriate warning, like: remote HEAD refers to nonexistent ref, unable to checkout Continuing down to checkout() will make it easier to do more fixes on top (see below). Note that this patch fixes the case where the other side reports an unborn head to us using the protocol extension. It _doesn't_ fix the case where the other side doesn't tell us, we locally guess "master", and the other side happens to have a "master" which its HEAD doesn't point. But it doesn't make anything worse there, and it should actually make it easier to fix that problem on top. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-07clone: drop extra newline from warning messageJeff King
We don't need to put a "\n" in calls to warning(), since it adds one itself (and the user sees an extra blank line). Drop it, and while we're here, drop the full-stop from the message, which goes against our guidelines. This bug dates all the way back to 8434c2f1af (Build in clone, 2008-04-27), but presumably nobody noticed because it's hard to trigger: you have to clone a repository whose HEAD is unborn, but which is not otherwise empty. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-01clone: fix memory leak in wanted_peer_refs()Ævar Arnfjörð Bjarmason
Fix a memory leak added in 0ec4b1650cc (clone: fix ref selection in --single-branch --branch=xxx, 2012-06-22). Whether we get our "remote_head" from copy_ref() directly, or with a call to guess_remote_head() it'll be the result of a copy_ref() in either case, as guess_remote_head() is a wrapper for copy_ref() (or it returns NULL). We can't mark any tests passing passing with SANITIZE=leak using "TEST_PASSES_SANITIZE_LEAK=true" as a result of this change, but e.g. "t/t1500-rev-parse.sh" now gets closer to passing. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20Merge branch 'ep/maint-equals-null-cocci'Junio C Hamano
Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02Merge branch 'ep/maint-equals-null-cocci' for maint-2.35Junio C Hamano
* ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02tree-wide: apply equals-null.cocciJunio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-30clone: plug a miniscule leakJunio C Hamano
The remote_name variable is first assigned a copy of the value of the "clone.defaultremotename" configuration variable and then by the value of the "--origin" command line option. The former is prepared to see multiple instances of the configuration variable by freeing the current value of the variable before a copy of the newly discovered value gets assigned to it. The latter however blindly assigned a copy of the new value to the variable, thereby leaking the value read from the configuration variable. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-21Merge branch 'ds/partial-bundles'Junio C Hamano
Bundle file format gets extended to allow a partial bundle, filtered by similar criteria you would give when making a partial/lazy clone. * ds/partial-bundles: clone: fail gracefully when cloning filtered bundle bundle: unbundle promisor packs bundle: create filtered bundles rev-list: move --filter parsing into revision.c bundle: parse filter capability list-objects: handle NULL function pointers MyFirstObjectWalk: update recommended usage list-objects: consolidate traverse_commit_list[_filtered] pack-bitmap: drop filter in prepare_bitmap_walk() pack-objects: use rev.filter when possible revision: put object filter into struct rev_info list-objects-filter-options: create copy helper index-pack: document and test the --promisor option