aboutsummaryrefslogtreecommitdiff
path: root/fsck.h
AgeCommit message (Collapse)Author
2026-03-23fsck: store repository in fsck optionsPatrick Steinhardt
The fsck subsystem relies on `the_repository` quite a bit. While we could of course explicitly pass a repository down the callchain, we already have a `struct fsck_options` that we pass to almost all functions. Extend the options to also store the repository to make it readily available. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-03-23fsck: initialize fsck options via a functionPatrick Steinhardt
We initialize the `struct fsck_options` via a set of macros, often in global scope. In the next commit though we're about to introduce a new repository field to the options that must be initialized, and naturally we don't have a repo other than `the_repository` available in this scope. Refactor the code to instead intrdouce a new `fsck_options_init()` function that initializes the options for us and move initialization into function scope. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12builtin/fsck: move generic HEAD check into `refs_fsck()`Patrick Steinhardt
Move the check that detects "HEAD" refs that do not point at a branch into `refs_fsck()`. This follows the same motivation as the preceding commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12builtin/fsck: move generic object ID checks into `refs_fsck()`Patrick Steinhardt
While most of the logic that verifies the consistency of refs is driven by `refs_fsck()`, we still have a small handful of checks in `fsck_head_link()`. These checks don't use the git-fsck(1) reporting infrastructure, and as such it's impossible to for example disable some of those checks. One such check detects refs that point to the all-zeroes object ID. Extract this check into the generic `refs_fsck_ref()` function that is used by both the "files" and "reftable" backends. Note that this will cause us to not return an error code from `fsck_head_link()` anymore in case this error was detected. This is fine though: the only caller of this function does not check the error code anyway. To demonstrate this, adapt the function to drop its return value altogether. The function will be removed in a subsequent commit anyway. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-12fsck: drop unused fields from `struct fsck_ref_report`Patrick Steinhardt
The `struct fsck_ref_report` has a couple fields that are intended to improve the error reporting for broken ref reports by showing which object ID or target reference the ref points to. These fields are never set though and are thus essentially unused. Remove them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-25builtin/index-pack: fix deferred fsck outside reposPatrick Steinhardt
When asked to perform object consistency checks via the `--fsck-objects` flag we verify that each object part of the pack is valid. In general, this check can even be performed outside of a Git repository: we don't need an initialized object database as we simply read the object from the packfile directly. But there's one exception: a subset of the object checks may be deferred to a later point in time. For now, this only concerns ".gitmodules" and ".gitattributes" files: whenever we see a tree referencing these files we queue them for a deferred check. This is done because we need to do some extra checks for those files to ensure that they are well-formed, and these checks need to be done regardless of whether the corresponding blobs are part of the packfile or not. This works inside a repository, but unfortunately the logic leads to a segfault when running outside of one. This is because we eventually call `odb_read_object()`, which will crash because the object database has not been initialized. There's multiple options here: - We could in theory create a purely in-memory database with only a packfile store that contains the single packfile. We don't really have the infrastructure for this yet though, and it would end up being quite hacky. - We could refuse to perform consistency checks outside of a repository. But most of the checks work alright, so this would be a regression. - We can skip the finalizing consistency checks when running outside of a repository. This is not as invasive as skipping all checks, but it's not great to randomly skip a subset of tests, either. None of these options really feel perfect. The first one would be the obvious choice if easily possible. There's another option though: instead of skipping the final object checks, we can die if there are any queued object checks. With this change we now die exactly if and only if we would have previously segfaulted. Like this we ensure that objects that _may_ fail the consistency checks won't be silently skipped, and at the same time we give users a much better error message. Refactor the code accordingly and add a test that would have triggered the segfault. Note that we also move down the logic to add the packfile to the store. There is no point doing this any earlier than right before we execute `fsck_finish()`, and it ensures that the logic to set up and perform the consistency check is self-contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-22Merge branch 'bc/sha1-256-interop-01'Junio C Hamano
The beginning of SHA1-SHA256 interoperability work. * bc/sha1-256-interop-01: t1010: use BROKEN_OBJECTS prerequisite t: allow specifying compatibility hash fsck: consider gpgsig headers expected in tags rev-parse: allow printing compatibility hash docs: add documentation for loose objects docs: improve ambiguous areas of pack format documentation docs: reflect actual double signature for tags docs: update offset order for pack index v3 docs: update pack index v3 format
2025-10-09fsck: consider gpgsig headers expected in tagsbrian m. carlson
When we're creating a tag, we want to make sure that gpgsig and gpgsig-sha256 headers are allowed for the commit. The default fsck behavior is to ignore the fact that they're left over, but some of our tests enable strict checking which flags them nonetheless. Add improved checking for these headers as well as documentation and several tests. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-07refs/reftable: add fsck check for checking the table nameKarthik Nayak
Add glue code in 'refs/reftable-backend.c' which calls the reftable library to perform the fsck checks. Here we also map the reftable errors to Git' fsck errors. Introduce a check to validate table names for a given reftable stack. Also add 'badReftableTableName' as a corresponding error within Git. The reftable specification mentions: It suggested to use ${min_update_index}-${max_update_index}-${random}.ref as a naming convention. So treat non-conformant file names as warnings. While adding the fsck header to 'refs/reftable-backend.c', modify the list to maintain lexicographical ordering. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-07fsck: order 'fsck_msg_type' alphabeticallyKarthik Nayak
The list of 'fsck_msg_type' seem to be alphabetically ordered, but there are a few small misses. Fix this by sorting the sub-sections of the list to maintain alphabetical ordering. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-23config: drop `git_config()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config()`. All callsites are adjusted so that they use `repo_config(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-14packed-backend: fsck should warn when "packed-refs" file is emptyshejialuo
We assume the "packed-refs" won't be empty and instead has at least one line in it (even when there are no refs packed, there is the file header line). Because there is no terminating LF in the empty file, we will report "packedRefEntryNotTerminated(ERROR)" to the user. However, the runtime code paths would accept an empty "packed-refs" file, for example, "create_snapshot" would simply return the "snapshot" without checking the content of "packed-refs". So, we should skip checking the content of "packed-refs" when it is empty during fsck. After 694b7a1999 (repack_without_ref(): write peeled refs in the rewritten file, 2013-04-22), we would always write a header into the "packed-refs" file. So, versions of Git that are not too ancient never write such an empty "packed-refs" file. As an empty file often indicates a sign of a filesystem-level issue, the way we want to resolve this inconsistency is not make everybody totally silent but notice and report the anomaly. Let's create a "FSCK_INFO" message id "EMPTY_PACKED_REFS_FILE" to report to the users that "packed-refs" is empty. Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-26Merge branch 'sj/ref-consistency-checks-more'Junio C Hamano
"git fsck" becomes more careful when checking the refs. * sj/ref-consistency-checks-more: builtin/fsck: add `git refs verify` child process packed-backend: check whether the "packed-refs" is sorted packed-backend: add "packed-refs" entry consistency check packed-backend: check whether the refname contains NUL characters packed-backend: add "packed-refs" header consistency check packed-backend: check if header starts with "# pack-refs with: " packed-backend: check whether the "packed-refs" is regular file builtin/refs: get worktrees without reading head information t0602: use subshell to ensure working directory unchanged
2025-03-03fsck.h: *.txt -> *.adoc fixesTodd Zullinger
Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: check whether the "packed-refs" is sortedshejialuo
When there is a "sorted" trait in the header of the "packed-refs" file, it means that each entry is sorted increasingly by comparing the refname. We should add checks to verify whether the "packed-refs" is sorted in this case. Update the "packed_fsck_ref_header" to know whether there is a "sorted" trail in the header. It may seem that we could record all refnames during the parsing process and then compare later. However, this is not a good design due to the following reasons: 1. Because we need to store the state across the whole checking lifetime, we would consume a lot of memory if there are many entries in the "packed-refs" file. 2. We cannot reuse the existing compare function "cmp_packed_ref_records" which cause repetition. Because "cmp_packed_ref_records" needs an extra parameter "struct snaphost", extract the common part into a new function "cmp_packed_ref_records" to reuse this function to compare. Then, create a new function "packed_fsck_ref_sorted" to parse the file again and user the new fsck message "packedRefUnsorted(ERROR)" to report to the user if the file is not sorted. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: add "packed-refs" entry consistency checkshejialuo
"packed-backend.c::next_record" will parse the ref entry to check the consistency. This function has already checked the following things: 1. Parse the main line of the ref entry to inspect whether the oid is not correct. Then, check whether the next character is oid. Then check the refname. 2. If the next line starts with '^', it would continue to parse the peeled oid and check whether the last character is '\n'. As we decide to implement the ref consistency check for "packed-refs", let's port these two checks and update the test to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-27packed-backend: add "packed-refs" header consistency checkshejialuo
In "packed-backend.c::create_snapshot", if there is a header (the line which starts with '#'), we will check whether the line starts with "# pack-refs with: ". However, we need to consider other situations and discuss whether we need to add checks. 1. If the header does not exist, we should not report an error to the user. This is because in older Git version, we never write header in the "packed-refs" file. Also, we do allow no header in "packed-refs" in runtime. 2. If the header content does not start with "# packed-ref with: ", we should report an error just like what "create_snapshot" does. So, create a new fsck message "badPackedRefHeader(ERROR)" for this. 3. If the header content is not the same as the constant string "PACKED_REFS_HEADER". This is expected because we make it extensible intentionally and runtime "create_snapshot" won't complain about unknown traits. In order to align with the runtime behavior. There is no need to report. As we have analyzed, we only need to check the case 2 in the above. In order to do this, use "open_nofollow" function to get the file descriptor and then read the "packed-refs" file via "strbuf_read". Like what "create_snapshot" and other functions do, we could split the line by finding the next newline in the buffer. When we cannot find a newline, we could report an error. So, create a function "packed_fsck_ref_next_line" to find the next newline and if there is no such newline, use "packedRefEntryNotTerminated(ERROR)" to report an error to the user. Then, parse the first line to apply the checks. Update the test to exercise the code. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: add symlink ref content check for files backendshejialuo
Besides the textual symref, we also allow symbolic links as the symref. So, we should also provide the consistency check as what we have done for textual symref. And also we consider deprecating writing the symbolic links. We first need to access whether symbolic links still be used. So, add a new fsck message "symlinkRef(INFO)" to tell the user be aware of this information. We have already introduced "files_fsck_symref_target". We should reuse this function to handle the symrefs which use legacy symbolic links. We should not check the trailing garbage for symbolic refs. Add a new parameter "symbolic_link" to disable some checks which should only be executed for textual symrefs. And we need to also generate the "referent" parameter for reusing "files_fsck_symref_target" by the following steps: 1. Use "strbuf_add_real_path" to resolve the symlink and get the absolute path "ref_content" which the symlink ref points to. 2. Generate the absolute path "abs_gitdir" of "gitdir" and combine "ref_content" and "abs_gitdir" to extract the relative path "relative_referent_path". 3. If "ref_content" is outside of "gitdir", we just set "referent" with "ref_content". Instead, we set "referent" with "relative_referent_path". Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: check whether the target of the symref is a refshejialuo
Ideally, we want to the users use "git symbolic-ref" to create symrefs instead of writing raw contents into the filesystem. However, "git symbolic-ref" is strict with the refname but not strict with the referent. For example, we can make the "referent" located at the "$(gitdir)/logs/aaa" and manually write the content into this where we can still successfully parse this symref by using "git rev-parse". $ git init repo && cd repo && git commit --allow-empty -mx $ git symbolic-ref refs/heads/test logs/aaa $ echo $(git rev-parse HEAD) > .git/logs/aaa $ git rev-parse test We may need to add some restrictions for "referent" parameter when using "git symbolic-ref" to create symrefs because ideally all the nonpseudo-refs should be located under the "refs" directory and we may tighten this in the future. In order to tell the user we may tighten the above situation, create a new fsck message "symrefTargetIsNotARef" to notify the user that this may become an error in the future. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: add basic symref content check for files backendshejialuo
We have code that checks regular ref contents, but we do not yet check the contents of symbolic refs. By using "parse_loose_ref_content" for symbolic refs, we will get the information of the "referent". We do not need to check the "referent" by opening the file. This is because if "referent" exists in the file system, we will eventually check its correctness by inspecting every file in the "refs" directory. If the "referent" does not exist in the filesystem, this is OK as it is seen as the dangling symref. So we just need to check the "referent" string content. A regular ref could be accepted as a textual symref if it begins with "ref:", followed by zero or more whitespaces, followed by the full refname, followed only by whitespace characters. However, we always write a single SP after "ref:" and a single LF after the refname. It may seem that we should report a fsck error message when the "referent" does not apply above rules and we should not be so aggressive because third-party reimplementations of Git may have taken advantage of the looser syntax. Put it more specific, we accept the following contents: 1. "ref: refs/heads/master " 2. "ref: refs/heads/master \n \n" 3. "ref: refs/heads/master\n\n" When introducing the regular ref content checks, we created two fsck infos "refMissingNewline" and "trailingRefContent" which exactly represents above situations. So we will reuse these two fsck messages to write checks to info the user about these situations. But we do not allow any other trailing garbage. The followings are bad symref contents which will be reported as fsck error by "git-fsck(1)". 1. "ref: refs/heads/master garbage\n" 2. "ref: refs/heads/master \n\n\n garbage " And we introduce a new "badReferentName(ERROR)" fsck message to report above errors by using "is_root_ref" and "check_refname_format" to check the "referent". Since both "is_root_ref" and "check_refname_format" don't work with whitespaces, we use the trimmed version of "referent" with these functions. In order to add checks, we will do the following things: 1. Record the untrimmed length "orig_len" and untrimmed last byte "orig_last_byte". 2. Use "strbuf_rtrim" to trim the whitespaces or newlines to make sure "is_root_ref" and "check_refname_format" won't be failed by them. 3. Use "orig_len" and "orig_last_byte" to check whether the "referent" misses '\n' at the end or it has trailing whitespaces or newlines. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: add more strict checks for regular refsshejialuo
We have already used "parse_loose_ref_contents" function to check whether the ref content is valid in files backend. However, by using "parse_loose_ref_contents", we allow the ref's content to end with garbage or without a newline. Even though we never create such loose refs ourselves, we have accepted such loose refs. So, it is entirely possible that some third-party tools may rely on such loose refs being valid. We should not report an error fsck message at current. We should notify the users about such "curiously formatted" loose refs so that adequate care is taken before we decide to tighten the rules in the future. And it's not suitable either to report a warn fsck message to the user. We don't yet want the "--strict" flag that controls this bit to end up generating errors for such weirdly-formatted reference contents, as we first want to assess whether this retroactive tightening will cause issues for any tools out there. It may cause compatibility issues which may break the repository. So, we add the following two fsck infos to represent the situation where the ref content ends without newline or has trailing garbages: 1. refMissingNewline(INFO): A loose ref that does not end with newline(LF). 2. trailingRefContent(INFO): A loose ref has trailing content. It might appear that we can't provide the user with any warnings by using FSCK_INFO. However, in "fsck.c::fsck_vreport", we will convert FSCK_INFO to FSCK_WARN and we can still warn the user about these situations when using "git refs verify" without introducing compatibility issues. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21ref: port git-fsck(1) regular refs check for files backendshejialuo
"git-fsck(1)" implicitly checks the ref content by passing the callback "fsck_handle_ref" to the "refs.c::refs_for_each_rawref". Then, it will check whether the ref content (eventually "oid") is valid. If not, it will report the following error to the user. error: refs/heads/main: invalid sha1 pointer 0000... And it will also report above errors when there are dangling symrefs in the repository wrongly. This does not align with the behavior of the "git symbolic-ref" command which allows users to create dangling symrefs. As we have already introduced the "git refs verify" command, we'd better check the ref content explicitly in the "git refs verify" command thus later we could remove these checks in "git-fsck(1)" and launch a subprocess to call "git refs verify" in "git-fsck(1)" to make the "git-fsck(1)" more clean. Following what "git-fsck(1)" does, add a similar check to "git refs verify". Then add a new fsck error message "badRefContent(ERROR)" to represent that a ref has an invalid content. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08fsck: add ref name check for files backendshejialuo
The git-fsck(1) only implicitly checks the reference, it does not fully check refs with bad format name such as standalone "@". However, a file ending with ".lock" should not be marked as having a bad ref name. It is expected that concurrent writers may have such lock files. We currently ignore this situation. But for bare ".lock" file, we will report it as error. In order to provide such checks, add a new fsck message id "badRefName" with default ERROR type. Use existing "check_refname_format" to explicit check the ref name. And add a new unit test to verify the functionality. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08files-backend: add unified interface for refs scanningshejialuo
For refs and reflogs, we need to scan its corresponding directories to check every regular file or symbolic link which shares the same pattern. Introduce a unified interface for scanning directories for files-backend. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08builtin/refs: add verify subcommandshejialuo
Introduce a new subcommand "verify" in git-refs(1) to allow the user to check the reference database consistency and also this subcommand will be used as the entry point of checking refs for "git-fsck(1)". Add "verbose" field into "fsck_options" to indicate whether we should print verbose messages when checking refs and objects consistency. Remove bit-field for "strict" field, this is because we cannot take address of a bit-field which makes it unhandy to set member variables when parsing the command line options. The "git-fsck(1)" declares "fsck_options" variable with "static" identifier which avoids complaint by the leak-checker. However, in "git-refs verify", we need to do memory clean manually. Thus add "fsck_options_clear" function in "fsck.c" to provide memory clean operation. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08fsck: add refs report functionshejialuo
Introduce a new struct "fsck_ref_report" to contain the information we need when reporting refs-related messages. With the new "fsck_vreport" function, add a new function "fsck_report_ref" to report refs-related fsck error message. Unlike "report" function uses the exact parameters, we simply pass "struct fsck_ref_report *report" as the parameter. This is because at current we don't know exactly how many fields we need. By passing this parameter, we don't need to change this function prototype when we want to add more information into "fsck_ref_report". We have introduced "fsck_report_ref" function to report the error message for refs. We still need to add the corresponding callback function. Create refs-specific "error_func" callback "fsck_refs_error_function". Last, add "FSCK_REFS_OPTIONS_DEFAULT" macro to create default options when checking ref consistency. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08fsck: make "fsck_error" callback genericshejialuo
The "fsck_error" callback is designed to report the objects-related error messages. It accepts two parameter "oid" and "object_type" which is not generic. In order to provide a unified callback which can report either objects or refs, remove the objects-related parameters and add the generic parameter "void *fsck_report". Create a new "fsck_object_report" structure which incorporates the removed parameters "oid" and "object_type". Then change the corresponding references to adapt to new "fsck_error" callback. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08fsck: rename objects-related fsck error functionsshejialuo
The names of objects-related fsck error functions are generic. It's OK when there is only object database check. However, we are going to introduce refs database check report function. To avoid ambiguity, rename object-related fsck error functions to explicitly indicate these functions are used to report objects-related messages. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-08fsck: rename "skiplist" to "skip_oids"shejialuo
The "skiplist" field in "fsck_options" is related to objects. Because we are going to introduce ref consistency check, the "skiplist" name is too general which will make the caller think "skiplist" is related to both the refs and objects. It may seem that for both refs and objects, we should provide a general "skiplist" here. However, the type for "skiplist" is `struct oidset` which is totally unsuitable for refs. To avoid above ambiguity, rename "skiplist" to "skip_oids". Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-17Merge branch 'ps/no-writable-strings'Junio C Hamano
Building with "-Werror -Wwrite-strings" is now supported. * ps/no-writable-strings: (27 commits) config.mak.dev: enable `-Wwrite-strings` warning builtin/merge: always store allocated strings in `pull_twohead` builtin/rebase: always store allocated string in `options.strategy` builtin/rebase: do not assign default backend to non-constant field imap-send: fix leaking memory in `imap_server_conf` imap-send: drop global `imap_server_conf` variable mailmap: always store allocated strings in mailmap blob revision: always store allocated strings in output encoding remote-curl: avoid assigning string constant to non-const variable send-pack: always allocate receive status parse-options: cast long name for OPTION_ALIAS http: do not assign string constant to non-const field compat/win32: fix const-correctness with string constants pretty: add casts for decoration option pointers object-file: make `buf` parameter of `index_mem()` a constant object-file: mark cached object buffers as const ident: add casts for fallback name and GECOS entry: refactor how we remove items for delayed checkouts line-log: always allocate the output prefix line-log: stop assigning string constant to file parent buffer ...
2024-06-07global: improve const correctness when assigning string constantsPatrick Steinhardt
We're about to enable `-Wwrite-strings`, which changes the type of string constants to `const char[]`. Fix various sites where we assign such constants to non-const variables. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-24Merge branch 'fixes/2.45.1/2.42' into fixes/2.45.1/2.43Junio C Hamano
* fixes/2.45.1/2.42: Revert "fsck: warn about symlink pointing inside a gitdir" Revert "Add a helper function to compare file contents" clone: drop the protections where hooks aren't run tests: verify that `clone -c core.hooksPath=/dev/null` works again Revert "core.hooksPath: add some protection while cloning" init: use the correct path of the templates directory again hook: plug a new memory leak ci: stop installing "gcc-13" for osx-gcc ci: avoid bare "gcc" for osx-gcc job ci: drop mention of BREW_INSTALL_PACKAGES variable send-email: avoid creating more than one Term::ReadLine object send-email: drop FakeTerm hack
2024-05-24Merge branch 'fixes/2.45.1/2.41' into fixes/2.45.1/2.42Junio C Hamano
* fixes/2.45.1/2.41: Revert "fsck: warn about symlink pointing inside a gitdir" Revert "Add a helper function to compare file contents" clone: drop the protections where hooks aren't run tests: verify that `clone -c core.hooksPath=/dev/null` works again Revert "core.hooksPath: add some protection while cloning" init: use the correct path of the templates directory again hook: plug a new memory leak ci: stop installing "gcc-13" for osx-gcc ci: avoid bare "gcc" for osx-gcc job ci: drop mention of BREW_INSTALL_PACKAGES variable send-email: avoid creating more than one Term::ReadLine object send-email: drop FakeTerm hack
2024-05-24Merge branch 'fixes/2.45.1/2.40' into fixes/2.45.1/2.41Junio C Hamano
* fixes/2.45.1/2.40: Revert "fsck: warn about symlink pointing inside a gitdir" Revert "Add a helper function to compare file contents" clone: drop the protections where hooks aren't run tests: verify that `clone -c core.hooksPath=/dev/null` works again Revert "core.hooksPath: add some protection while cloning" init: use the correct path of the templates directory again hook: plug a new memory leak ci: stop installing "gcc-13" for osx-gcc ci: avoid bare "gcc" for osx-gcc job ci: drop mention of BREW_INSTALL_PACKAGES variable send-email: avoid creating more than one Term::ReadLine object send-email: drop FakeTerm hack
2024-05-24Merge branch 'jc/fix-2.45.1-and-friends-for-2.39' into fixes/2.45.1/2.40Junio C Hamano
Revert overly aggressive "layered defence" that went into 2.45.1 and friends, which broke "git-lfs", "git-annex", and other use cases, so that we can rebuild necessary counterparts in the open. * jc/fix-2.45.1-and-friends-for-2.39: Revert "fsck: warn about symlink pointing inside a gitdir" Revert "Add a helper function to compare file contents" clone: drop the protections where hooks aren't run tests: verify that `clone -c core.hooksPath=/dev/null` works again Revert "core.hooksPath: add some protection while cloning" init: use the correct path of the templates directory again hook: plug a new memory leak ci: stop installing "gcc-13" for osx-gcc ci: avoid bare "gcc" for osx-gcc job ci: drop mention of BREW_INSTALL_PACKAGES variable send-email: avoid creating more than one Term::ReadLine object send-email: drop FakeTerm hack
2024-05-22Revert "fsck: warn about symlink pointing inside a gitdir"Junio C Hamano
This reverts commit a33fea08 (fsck: warn about symlink pointing inside a gitdir, 2024-04-10), which warns against symbolic links commonly created by git-annex.
2024-04-19Sync with 2.42.2Johannes Schindelin
* maint-2.42: (39 commits) Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' ...
2024-04-19Sync with 2.41.1Johannes Schindelin
* maint-2.41: (38 commits) Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs ...
2024-04-19Sync with 2.40.2Johannes Schindelin
* maint-2.40: (39 commits) Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default ...
2024-04-19Sync with 2.39.4Johannes Schindelin
* maint-2.39: (38 commits) Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default fetch/clone: detect dubious ownership of local repositories ...
2024-04-19fsck: warn about symlink pointing inside a gitdirJohannes Schindelin
In the wake of fixing a vulnerability where `git clone` mistakenly followed a symbolic link that it had just written while checking out files, writing into a gitdir, let's add some defense-in-depth by teaching `git fsck` to report symbolic links stored in its trees that point inside `.git/`. Even though the Git project never made any promises about the exact shape of the `.git/` directory's contents, there are likely repositories out there containing symbolic links that point inside the gitdir. For that reason, let's only report these as warnings, not as errors. Security-conscious users are encouraged to configure `fsck.symlinkPointsToGitDir = error`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2023-08-31fsck: detect very large tree pathnamesJeff King
In general, Git tries not to arbitrarily limit what it will store, and there are currently no limits at all on the size of the path we find in a tree. In theory you could have one that is gigabytes long. But in practice this freedom is not really helping anybody, and is potentially harmful: 1. Most operating systems have much lower limits for the size of a single pathname component (e.g., on Linux you'll generally get ENAMETOOLONG for anything over 255 bytes). And while you _can_ use Git in a way that never touches the filesystem (manipulating the index and trees directly), it's still probably not a good idea to have gigantic tree names. Many operations load and traverse them, so any clever Git-as-a-database scheme is likely to perform poorly in that case. 2. We still have a lot of code which assumes strings are reasonably sized, and I won't be at all surprised if you can trigger some interesting integer overflows with gigantic pathnames. Stopping malicious trees from entering the repository provides an extra line of defense, protecting downstream code. This patch implements an fsck check so that such trees can be rejected by transfer.fsckObjects. I've picked a reasonably high maximum depth here (4096) that hopefully should not bother anybody in practice. I've also made it configurable, as an escape hatch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28config: add ctx arg to config_fn_tGlen Choo
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23object.h: stop depending on cache.h; make cache.h depend on object.hElijah Newren
Things should be able to depend on object.h without pulling in all of cache.h. Move an enum to allow this. Note that a couple files previously depended on things brought in through cache.h indirectly (revision.h -> commit.h -> object.h -> cache.h). As such, this change requires making existing dependencies more explicit in half a dozen files. The inclusion of strbuf.h in some headers if of particular note: these headers directly embedded a strbuf in some new structs, meaning they should have been including strbuf.h all along but were indirectly getting the necessary definitions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-18fsck: provide a function to fsck buffer without object structJeff King
The fsck code has been slowly moving away from requiring an object struct in commits like 103fb6d43b (fsck: accept an oid instead of a "struct tag" for fsck_tag(), 2019-10-18), c5b4269b57 (fsck: accept an oid instead of a "struct commit" for fsck_commit(), 2019-10-18), etc. However, the only external interface that fsck.c provides is fsck_object(), which requires an object struct, then promptly discards everything except its oid and type. Let's factor out the post-discard part of that function as fsck_buffer(), leaving fsck_object() as a thin wrapper around it. That will provide more flexibility for callers which may not have a struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13Sync with 2.38.3Junio C Hamano
2022-12-13Merge branch 'maint-2.36' into maint-2.37Junio C Hamano
2022-12-09fsck: implement checks for gitattributesPatrick Steinhardt
Recently, a vulnerability was reported that can lead to an out-of-bounds write when reading an unreasonably large gitattributes file. The root cause of this error are multiple integer overflows in different parts of the code when there are either too many lines, when paths are too long, when attribute names are too long, or when there are too many attributes declared for a pattern. As all of these are related to size, it seems reasonable to restrict the size of the gitattributes file via git-fsck(1). This allows us to both stop distributing known-vulnerable objects via common hosting platforms that have fsck enabled, and users to protect themselves by enabling the `fetch.fsckObjects` config. There are basically two checks: 1. We verify that size of the gitattributes file is smaller than 100MB. 2. We verify that the maximum line length does not exceed 2048 bytes. With the preceding commits, both of these conditions would cause us to either ignore the complete gitattributes file or blob in the first case, or the specific line in the second case. Now with these consistency checks added, we also grow the ability to stop distributing such files in the first place when `receive.fsckObjects` is enabled. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25fsck: document msg-idJohn Cai
The documentation lacks mention of specific <msg-id> that are supported. While git-help --config will display a list of these options, often developers' first instinct is to consult the git docs to find valid config values. Add a list of fsck error messages, and link to it from the git-fsck documentation. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25fsck: remove the unused MISSING_TREE_OBJECTJunio C Hamano
This error type has never been used since it was introduced at 159e7b08 (fsck: detect gitmodules files, 2018-05-02). Signed-off-by: Junio C Hamano <gitster@pobox.com>