From 2101341484b093a7324ef8cd408af52c73cc5bea Mon Sep 17 00:00:00 2001 From: "brian m. carlson" Date: Tue, 9 Jul 2024 23:37:43 +0000 Subject: gitfaq: add documentation on proxies Many corporate environments and local systems have proxies in use. Note the situations in which proxies can be used and how to configure them. At the same time, note what standards a proxy must follow to work with Git. Explicitly call out certain classes that are known to routinely have problems reported various places online, including in the Git for Windows issue tracker and on Stack Overflow, and recommend against the use of such software, noting that they are associated with myriad security problems (including, for example, breaking sandboxing and image integrity[0], and, for TLS middleboxes, the use of insecure protocols and ciphers and lack of certificate verification[1]). Don't mention the specific nature of these security problems in the FAQ entry because they are extremely numerous and varied and we wish to keep the FAQ entry relatively brief. [0] https://issues.chromium.org/issues/40285192 [1] https://faculty.cc.gatech.edu/~mbailey/publications/ndss17_interception.pdf Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano --- Documentation/gitfaq.txt | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt index 8c1f2d5675..e4125b1178 100644 --- a/Documentation/gitfaq.txt +++ b/Documentation/gitfaq.txt @@ -241,6 +241,42 @@ How do I know if I want to do a fetch or a pull?:: ignore the upstream changes. A pull consists of a fetch followed immediately by either a merge or rebase. See linkgit:git-pull[1]. +[[proxy]] +Can I use a proxy with Git?:: + Yes, Git supports the use of proxies. Git honors the standard `http_proxy`, + `https_proxy`, and `no_proxy` environment variables commonly used on Unix, and + it also can be configured with `http.proxy` and similar options for HTTPS (see + linkgit:git-config[1]). The `http.proxy` and related options can be + customized on a per-URL pattern basis. In addition, Git can in theory + function normally with transparent proxies that exist on the network. ++ +For SSH, Git can support a proxy using OpenSSH's `ProxyCommand`. Commonly used +tools include `netcat` and `socat`. However, they must be configured not to +exit when seeing EOF on standard input, which usually means that `netcat` will +require `-q` and `socat` will require a timeout with something like `-t 10`. +This is required because the way the Git SSH server knows that no more requests +will be made is an EOF on standard input, but when that happens, the server may +not have yet processed the final request, so dropping the connection at that +point would interrupt that request. ++ +An example configuration entry in `~/.ssh/config` with an HTTP proxy might look +like this: ++ +---- +Host git.example.org + User git + ProxyCommand socat -t 10 - PROXY:proxy.example.org:%h:%p,proxyport=8080 +---- ++ +Note that in all cases, for Git to work properly, the proxy must be completely +transparent. The proxy cannot modify, tamper with, or buffer the connection in +any way, or Git will almost certainly fail to work. Note that many proxies, +including many TLS middleboxes, Windows antivirus and firewall programs other +than Windows Defender and Windows Firewall, and filtering proxies fail to meet +this standard, and as a result end up breaking Git. Because of the many +reports of problems and their poor security history, we recommend against the +use of these classes of software and devices. + Merging and Rebasing -------------------- -- cgit v1.3 From c98f78b806c93560c2a6c7eadda5449ac2a5432b Mon Sep 17 00:00:00 2001 From: "brian m. carlson" Date: Tue, 9 Jul 2024 23:37:44 +0000 Subject: gitfaq: give advice on using eol attribute in gitattributes In the FAQ, we tell people how to use the text attribute, but we fail to explain what to do with the eol attribute. As we ourselves have noticed, most shell implementations do not care for carriage returns, and as such, people will practically always want them to use LF endings. Similar things can be said for batch files on Windows, except with CRLF endings. Since these are common things to have in a repository, let's help users make a good decision by recommending that they use the gitattributes file to correctly check out the endings. In addition, let's correct the cross-reference to this question, which originally referred to "the following entry", even though a new entry has been inserted in between. The cross-reference notation should prevent this from occurring and provide a link in formats, such as HTML, which support that. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano --- Documentation/gitfaq.txt | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt index e4125b1178..058ef32a97 100644 --- a/Documentation/gitfaq.txt +++ b/Documentation/gitfaq.txt @@ -393,8 +393,9 @@ I'm on Windows and git diff shows my files as having a `^M` at the end.:: + You can store the files in the repository with Unix line endings and convert them automatically to your platform's line endings. To do that, set the -configuration option `core.eol` to `native` and see the following entry for -information about how to configure files as text or binary. +configuration option `core.eol` to `native` and see +<> +for information about how to configure files as text or binary. + You can also control this behavior with the `core.whitespace` setting if you don't wish to remove the carriage returns from your line endings. @@ -456,14 +457,26 @@ references, URLs, and hashes stored in the repository. + We also recommend setting a linkgit:gitattributes[5] file to explicitly mark which files are text and which are binary. If you want Git to guess, you can -set the attribute `text=auto`. For example, the following might be appropriate -in some projects: +set the attribute `text=auto`. ++ +With text files, Git will generally ensure that LF endings are used in the +repository. The `core.autocrlf` and `core.eol` configuration variables specify +what line-ending convention is followed when any text file is checked out. You +can also use the `eol` attribute (e.g., `eol=crlf`) to override which files get +what line-ending treatment. ++ +For example, generally shell files must have LF endings and batch files must +have CRLF endings, so the following might be appropriate in some projects: + ---- # By default, guess. * text=auto # Mark all C files as text. *.c text +# Ensure all shell files have LF endings and all batch files have CRLF +# endings in the working tree and both have LF in the repo. +*.sh text eol=lf +*.bat text eol=crlf # Mark all JPEG files as binary. *.jpg binary ---- -- cgit v1.3 From 804ecbcfd1057794f7881fdea071a2069a005a64 Mon Sep 17 00:00:00 2001 From: "brian m. carlson" Date: Tue, 9 Jul 2024 23:37:45 +0000 Subject: gitfaq: add entry about syncing working trees Users very commonly want to sync their working tree with uncommitted changes across machines, often to carry across in-progress work or stashes. Despite this not being a recommended approach, users want to do it and are not dissuaded by suggestions not to, so let's recommend a sensible technique. The technique that many users are using is their preferred cloud syncing service, which is a bad idea. Users have reported problems where they end up with duplicate files that won't go away (with names like "file.c 2"), broken references, oddly named references that have date stamps appended to them, missing objects, and general corruption and data loss. That's because almost all of these tools sync file by file, which is a great technique if your project is a single word processing document or spreadsheet, but is utterly abysmal for Git repositories because they don't necessarily snapshot the entire repository correctly. They also tend to sync the files immediately instead of when the repository is quiescent, so writing multiple files, as occurs during a commit or a gc, can confuse the tools and lead to corruption. We know that the old standby, rsync, is up to the task, provided that the repository is quiescent, so let's suggest that and dissuade people from using cloud syncing tools. Let's tell people about common things they should be aware of before doing this and that this is still potentially risky. Additionally, let's tell people that Git's security model does not permit sharing working trees across users in case they planned to do that. While we'd still prefer users didn't try to do this, hopefully this will lead them in a safer direction. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano --- Documentation/gitfaq.txt | 52 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt index 058ef32a97..f2917d142c 100644 --- a/Documentation/gitfaq.txt +++ b/Documentation/gitfaq.txt @@ -185,6 +185,58 @@ Then, you can adjust your push URL to use `git@example_author` or `git@example_committer` instead of `git@example.org` (e.g., `git remote set-url git@example_author:org1/project1.git`). +Transfers +--------- + +[[sync-working-tree]] +How do I sync a working tree across systems?:: + First, decide whether you want to do this at all. Git works best when you + push or pull your work using the typical `git push` and `git fetch` commands + and isn't designed to share a working tree across systems. This is + potentially risky and in some cases can cause repository corruption or data + loss. ++ +Usually, doing so will cause `git status` to need to re-read every file in the +working tree. Additionally, Git's security model does not permit sharing a +working tree across untrusted users, so it is only safe to sync a working tree +if it will only be used by a single user across all machines. ++ +It is important not to use a cloud syncing service to sync any portion of a Git +repository, since this can cause corruption, such as missing objects, changed +or added files, broken refs, and a wide variety of other problems. These +services tend to sync file by file on a continuous basis and don't understand +the structure of a Git repository. This is especially bad if they sync the +repository in the middle of it being updated, since that is very likely to +cause incomplete or partial updates and therefore data loss. ++ +An example of the kind of corruption that can occur is conflicts over the state +of refs, such that both sides end up with different commits on a branch that +the other doesn't have. This can result in important objects becoming +unreferenced and possibly pruned by `git gc`, causing data loss. ++ +Therefore, it's better to push your work to either the other system or a central +server using the normal push and pull mechanism. However, this doesn't always +preserve important data, like stashes, so some people prefer to share a working +tree across systems. ++ +If you do this, the recommended approach is to use `rsync -a --delete-after` +(ideally with an encrypted connection such as with `ssh`) on the root of +repository. You should ensure several things when you do this: ++ +* If you have additional worktrees or a separate Git directory, they must be + synced at the same time as the main working tree and repository. +* You are comfortable with the destination directory being an exact copy of the + source directory, _deleting any data that is already there_. +* The repository (including all worktrees and the Git directory) is in a + quiescent state for the duration of the transfer (that is, no operations of + any sort are taking place on it, including background operations like `git + gc` and operations invoked by your editor). ++ +Be aware that even with these recommendations, syncing in this way has some risk +since it bypasses Git's normal integrity checking for repositories, so having +backups is advised. You may also wish to do a `git fsck` to verify the +integrity of your data on the destination system after syncing. + Common Issues ------------- -- cgit v1.3 From 70405acf60ca20de1aaf0fe67440de93ee2318e6 Mon Sep 17 00:00:00 2001 From: "brian m. carlson" Date: Tue, 9 Jul 2024 23:37:46 +0000 Subject: doc: mention that proxies must be completely transparent We already document in the FAQ that proxies must be completely transparent and not modify the request or response in any way, but add similar documentation to the http.proxy entry. We know that while the FAQ is very useful, users sometimes are less likely to read in favor of the documentation specific to an option or command, so adding it in both places will help users be adequately informed. Signed-off-by: brian m. carlson Signed-off-by: Junio C Hamano --- Documentation/config/http.txt | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/Documentation/config/http.txt b/Documentation/config/http.txt index 2d4e0c9b86..a9c7480f6a 100644 --- a/Documentation/config/http.txt +++ b/Documentation/config/http.txt @@ -7,6 +7,11 @@ http.proxy:: linkgit:gitcredentials[7] for more information. The syntax thus is '[protocol://][user[:password]@]proxyhost[:port]'. This can be overridden on a per-remote basis; see remote..proxy ++ +Any proxy, however configured, must be completely transparent and must not +modify, transform, or buffer the request or response in any way. Proxies which +are not completely transparent are known to cause various forms of breakage +with Git. http.proxyAuthMethod:: Set the method with which to authenticate against the HTTP proxy. This -- cgit v1.3