path: root/pack-bitmap.c
Age | Commit message | Author
2023-05-08pack-bitmap.c: extract `fill_in_bitmap()`Taylor Blau
To prepare for the boundary-based bitmap walk to perform a fill-in traversal using the boundary of either side as the tips, extract the routine used by `find_objects()` to perform the fill-in traversal so that it can be used in both places. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-02fsck: verify checksums of all .bitmap filesDerrick Stolee
If a filesystem-level corruption occurs in a .bitmap file, Git can react poorly. This could take the form of a run-time error due to failing to parse an EWAH bitmap or be more subtle such as returning the wrong set of objects to a fetch or clone. A natural first response to either of these kinds of errors is to run 'git fsck' to see if any files are corrupt. This currently ignores all .bitmap files. Add checks to 'git fsck' for all .bitmap files that are currently associated with a multi-pack-index or pack file. Verify their checksums using the hashfile API. We iterate through all multi-pack-indexes and pack-files to be sure to check all .bitmap files, not just the one that would be read by the process. For example, a multi-pack-index bitmap overrules a pack-bitmap. However, if the multi-pack-index is removed, the pack-bitmap may be selected instead. Be thorough to include every file that could become active in such a way. This includes checking files in alternates. There is potential that we could extend this effort to check the structure of the reachability bitmaps themselves, but it is very expensive to do so. At minimum, it's as expensive as generating the bitmaps in the first place, and that's assuming that we don't use the trivial algorithm of verifying each bitmap individually. The trivial algorithm will result in quadratic behavior (number of objects times number of bitmapped commits) while the bitmap building operation constructs a lattice of commits to build bitmaps incrementally and then generate the final bitmaps from a subset of those commits. If we were to extend 'git fsck' to check .bitmap file contents more closely like this, then we would likely want to hide it behind an option that signals the user is more willing to do expensive operations such as this. For testing, set up a repository with a pack-bitmap _and_ a multi-pack-index bitmap. 
This requires some file movement to avoid deleting the pack-bitmap during the repack that creates the multi-pack-index bitmap. We can then verify that 'git fsck' is checking all files, not just the "active" bitmap. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27Merge branch 'ds/fsck-pack-revindex'Junio C Hamano
"git fsck" learned to validate the on-disk pack reverse index files. * ds/fsck-pack-revindex: fsck: validate .rev file header fsck: check rev-index position values fsck: check rev-index checksums fsck: create scaffolding for rev-index checks
2023-04-27Merge branch 'tb/pack-revindex-on-disk'Junio C Hamano
The on-disk reverse index that allows mapping from the pack offset to the object name for the object stored at the offset has been enabled by default. * tb/pack-revindex-on-disk: t: invert `GIT_TEST_WRITE_REV_INDEX` config: enable `pack.writeReverseIndex` by default pack-revindex: introduce `pack.readReverseIndex` pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK pack-revindex: make `load_pack_revindex` take a repository t5325: mark as leak-free pack-write.c: plug a leak in stage_tmp_packfiles()
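As a rough illustration of what a reverse index provides, the sketch below builds an offset-sorted permutation of index positions and then finds the object at a given pack offset by binary search. The struct and function names are hypothetical, the "pack index" is modeled as a plain array of offsets, and this is not Git's on-disk format or its implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical in-memory stand-in for a pack index: offsets[i] is the
 * pack offset of the object at index position i.
 */
struct revindex_sketch {
	const uint64_t *offsets; /* borrowed, indexed by index position */
	uint32_t *by_offset;     /* index positions sorted by pack offset */
	size_t nr;
};

/* Comparison context for qsort(); makes the build step non-reentrant. */
static const uint64_t *sort_key;

static int cmp_by_offset(const void *a, const void *b)
{
	uint64_t oa = sort_key[*(const uint32_t *)a];
	uint64_t ob = sort_key[*(const uint32_t *)b];

	return (oa > ob) - (oa < ob);
}

/* Build the reverse mapping; returns 0 on success, -1 on allocation failure. */
int revindex_sketch_build(struct revindex_sketch *r,
			  const uint64_t *offsets, size_t nr)
{
	size_t i;

	r->offsets = offsets;
	r->nr = nr;
	r->by_offset = malloc(nr ? nr * sizeof(*r->by_offset) : 1);
	if (!r->by_offset)
		return -1;
	for (i = 0; i < nr; i++)
		r->by_offset[i] = (uint32_t)i;
	sort_key = offsets;
	qsort(r->by_offset, nr, sizeof(*r->by_offset), cmp_by_offset);
	return 0;
}

/* Return the index position of the object stored at 'offset', or -1. */
long revindex_sketch_lookup(const struct revindex_sketch *r, uint64_t offset)
{
	size_t lo = 0, hi = r->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		uint64_t got = r->offsets[r->by_offset[mid]];

		if (got == offset)
			return (long)r->by_offset[mid];
		if (got < offset)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```

Storing the sorted permutation on disk, rather than rebuilding it on every run, saves the sort at load time; the caller here owns `by_offset` and releases it with `free()`.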
2023-04-25Merge branch 'en/header-split-cache-h'Junio C Hamano
Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
2023-04-17fsck: validate .rev file headerDerrick Stolee
While parsing a .rev file, we check the header information to be sure it makes sense. This happens before doing any additional validation such as a checksum or value check. In order to differentiate between a bad header and a non-existent file, we need to update the API for loading a reverse index. Make load_pack_revindex_from_disk() non-static and specify that a positive value means "the file does not exist" while other errors during parsing are negative values. Since an invalid header prevents setting up the structures we would use for further validations, we can stop at that point. The place where we can distinguish between a missing file and a corrupt file is inside load_revindex_from_disk(), which is used both by pack rev-indexes and multi-pack-index rev-indexes. Some tests in t5326 demonstrate that it is critical to take some conditions to allow positive error signals. Add tests that check the three header values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
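The return-value convention described above (positive for "file does not exist", negative for a parse error, zero for success) can be sketched as follows. The function name is hypothetical, a NULL buffer stands in for a missing file, and the header layout (4-byte "RIDX" signature, 4-byte version, 4-byte hash id) is assumed from the pack format documentation rather than taken from this commit.

```c
#include <stddef.h>
#include <stdint.h>

#define RIDX_SIGNATURE_SKETCH 0x52494458u /* "RIDX", assumed signature */

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical header check. 'buf' holds the file contents, or is NULL
 * when the file does not exist.
 *
 * Returns  1 if the file is missing (not an error by itself),
 *          0 if the three header values look valid,
 *         -1 if the header is truncated or has an unexpected value.
 */
int check_rev_header_sketch(const unsigned char *buf, size_t len)
{
	uint32_t hash_id;

	if (!buf)
		return 1;
	if (len < 12)
		return -1;
	if (read_be32(buf) != RIDX_SIGNATURE_SKETCH)
		return -1;
	if (read_be32(buf + 4) != 1) /* assumed: only version 1 exists */
		return -1;
	hash_id = read_be32(buf + 8);
	if (hash_id != 1 && hash_id != 2) /* assumed: 1 = SHA-1, 2 = SHA-256 */
		return -1;
	return 0;
}
```

A caller such as an fsck check can then treat a positive result as "nothing to verify" and a negative result as corruption, without inspecting errno.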
2023-04-13pack-revindex: make `load_pack_revindex` take a repositoryTaylor Blau
In a future commit, we will introduce a `pack.readReverseIndex` configuration, which forces Git to generate the reverse index from scratch instead of loading it from disk. In order to avoid reading this configuration value more than once, we'll use the `repo_settings` struct to lazily load this value. In order to access the `struct repo_settings`, add a repository argument to `load_pack_revindex`, and update all callers to pass the correct instance (in all cases, `the_repository`). In certain instances, a new function-local variable is introduced to take the place of a `struct repository *` argument to the function itself to avoid propagating the new parameter even further throughout the tree. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: remove cache.h inclusion due to object-file.h changesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-file.h: move declarations for object-file.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on trace.h & trace2.hElijah Newren
Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'en/header-split-cleanup'Junio C Hamano
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06Merge branch 'ab/config-multi-and-nonbool'Junio C Hamano
Assorted config API updates. * ab/config-multi-and-nonbool: for-each-repo: with bad config, don't conflate <path> and <cmd> config API: add "string" version of *_value_multi(), fix segfaults config API users: test for *_get_value_multi() segfaults for-each-repo: error on bad --config config API: have *_multi() return an "int" and take a "dest" versioncmp.c: refactor config reading next commit config API: add and use a "git_config_get()" family of functions config tests: add "NULL" tests for *_get_value_multi() config tests: cover blind spots in git_die_config() tests
2023-03-28config API: add "string" version of *_value_multi(), fix segfaultsÆvar Arnfjörð Bjarmason
Fix numerous and mostly long-standing segfaults in consumers of the *_config_*value_multi() API. As discussed in the preceding commit an empty key in the config syntax yields a "NULL" string, which these users would give to strcmp() (or similar), resulting in segfaults. As this change shows, most users of the *_config_*value_multi() API didn't really want such an unsafe and low-level API, so let's give them something with the safety of git_config_get_string() instead. This fix is similar to what the *_string() functions and others acquired in [1] and [2]. Namely introducing and using a safer "*_get_string_multi()" variant of the low-level "_*value_multi()" function. This fixes segfaults in code introduced in:
- d811c8e17c6 (versionsort: support reorder prerelease suffixes, 2015-02-26)
- c026557a373 (versioncmp: generalize version sort suffix reordering, 2016-12-08)
- a086f921a72 (submodule: decouple url and submodule interest, 2017-03-17)
- a6be5e6764a (log: add log.excludeDecoration config option, 2020-04-16)
- 92156291ca8 (log: add default decoration filter, 2022-08-05)
- 50a044f1e40 (gc: replace config subprocesses with API calls, 2022-09-27)
There are now two users of the low-level API:
- One in "builtin/for-each-repo.c", which we'll convert in a subsequent commit.
- The "t/helper/test-config.c" code added in [3].
As seen in the preceding commit we need to give the "t/helper/test-config.c" caller these "NULL" entries. We could also alter the underlying git_configset_get_value_multi() function to be "string safe", but doing so would leave no room for other variants of "*_get_value_multi()" that coerce to other types. Such coercion can't be built on the string version, since as we've established "NULL" is a true value in the boolean context, but if we coerced it to "" for use in a list of strings it'll be subsequently coerced to "false" as a boolean. The callback pattern being used here will make it easy to introduce e.g.
a "multi" variant which coerces its values to "bool", "int", "path" etc. 1. 40ea4ed9032 (Add config_error_nonbool() helper function, 2008-02-11) 2. 6c47d0e8f39 (config.c: guard config parser from value=NULL, 2008-02-11). 3. 4c715ebb96a (test-config: add tests for the config_set API, 2014-07-28) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
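To make the failure mode concrete, here is a minimal sketch of a low-level multi-value getter that can hand back NULL items, next to a string-safe wrapper that refuses to expose them. All type and function names are hypothetical stand-ins, and the return values (1 for "key not found", -1 for "bad value") are assumptions for illustration, not the config API's documented contract.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a multi-valued config entry. A NULL item
 * models a key that appears in the config without any value.
 */
struct value_list_sketch {
	const char **items;
	size_t nr;
};

/* Low-level getter: hands back the raw list, NULL items included. */
int get_value_multi_sketch(const struct value_list_sketch *src,
			   const struct value_list_sketch **dest)
{
	if (!src)
		return 1; /* key not found; *dest is left untouched */
	*dest = src;
	return 0;
}

/*
 * String-safe getter: fails up front instead of letting a NULL item
 * reach strcmp() or similar in the caller.
 */
int get_string_multi_sketch(const struct value_list_sketch *src,
			    const struct value_list_sketch **dest)
{
	size_t i;
	int ret = get_value_multi_sketch(src, dest);

	if (ret)
		return ret;
	for (i = 0; i < (*dest)->nr; i++) {
		if (!(*dest)->items[i]) {
			fprintf(stderr, "error: missing value for key\n");
			*dest = NULL;
			return -1;
		}
	}
	return 0;
}
```

Because the result is an "int" plus a "dest" pointer, the caller can tell "key not found" apart from "key found but unusable", which a bare pointer return cannot express.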
2023-03-28config API: have *_multi() return an "int" and take a "dest"Ævar Arnfjörð Bjarmason
Have the "git_configset_get_value_multi()" function and its siblings return an "int" and populate a "**dest" parameter like every other "git_configset_get_*()" in the API. As we'll take advantage of in subsequent commits, this fixes a blind spot in the API where it wasn't possible to distinguish an empty list from a missing config key. For now we don't make use of those new return values, but faithfully convert existing API users. Most of this is straightforward; commentary on cases that stand out:
- To ensure that we'll properly use the return values of this function in the future we're using the "RESULT_MUST_BE_USED" macro introduced in [1]. As git_die_config() now has to handle this return value let's have it BUG() if it can't find the config entry. As tested for in a preceding commit we can rely on getting the config list in git_die_config().
- The loops after getting the "list" value in "builtin/gc.c" could also make use of "unsorted_string_list_has_string()" instead of using that loop, but let's leave that for now.
- In "versioncmp.c" we now use the return value of the functions, instead of checking if the lists are still non-NULL.
1. 1e8697b5c4e (submodule--helper: check repo{_submodule,}_init() return values, 2022-09-01) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21csum-file.h: remove unnecessary inclusion of cache.hElijah Newren
With the change in the last commit to move several functions to write-or-die.h, csum-file.h no longer needs to include cache.h. However, removing that include forces several other C files, which directly or indirectly depended upon csum-file.h's inclusion of cache.h, to now be more explicit about their dependencies. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24list-objects: mark unused callback parametersJeff King
Our graph-traversal functions take callbacks for showing commits and objects, but not all callbacks need each parameter. Likewise for the similar traverse_bitmap_commit_list(), which has a different interface but serves the same purpose. And the include_check mechanism, which passes along a void pointer which is not always used. Mark the unused ones to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
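A small sketch of the idea: a callback must match a shared signature even when it ignores some arguments, so the ignored parameters are annotated. `UNUSED_SKETCH` is a hypothetical macro defined here for illustration (it relies on the GCC/Clang `unused` attribute) and is not claimed to be the macro Git uses.

```c
#include <stddef.h>

/* Hypothetical annotation; expands to nothing on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define UNUSED_SKETCH __attribute__((unused))
#else
#define UNUSED_SKETCH
#endif

/* Callback signature shared by every visitor in this sketch. */
typedef void (*show_object_fn)(const char *name, size_t size, void *data);

/*
 * This visitor only counts objects, so 'name' and 'size' are ignored.
 * Annotating them keeps -Wunused-parameter quiet without changing the
 * signature the walker expects.
 */
void count_objects(const char *name UNUSED_SKETCH,
		   size_t size UNUSED_SKETCH,
		   void *data)
{
	size_t *count = data;

	(*count)++;
}

/* Hypothetical walker that invokes the callback once per object. */
void walk_sketch(const char **names, const size_t *sizes, size_t nr,
		 show_object_fn show, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++)
		show(names[i], sizes[i], data);
}
```

Annotating each parameter, rather than disabling the warning globally, keeps the warning useful for parameters that are unused by accident.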
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
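For readers unfamiliar with the pattern, the sketch below shows a grow-on-demand array macro in the spirit of ALLOC_GROW(). The macro and struct names are hypothetical, and the growth formula (roughly 1.5x plus a small constant) is written from memory as an assumption; consult alloc.h for the real definition.

```c
#include <stdlib.h>

/* Assumed growth policy: roughly 1.5x, with a small head start. */
#define alloc_nr_sketch(x) (((x) + 16) * 3 / 2)

/*
 * Ensure array 'x' has room for at least 'nr' elements, updating the
 * capacity variable 'alloc'. Aborts on allocation failure to keep the
 * sketch short.
 */
#define ALLOC_GROW_SKETCH(x, nr, alloc) \
	do { \
		if ((nr) > (alloc)) { \
			if (alloc_nr_sketch(alloc) < (nr)) \
				(alloc) = (nr); \
			else \
				(alloc) = alloc_nr_sketch(alloc); \
			(x) = realloc((x), (alloc) * sizeof(*(x))); \
			if (!(x)) \
				abort(); \
		} \
	} while (0)

struct int_vec_sketch {
	int *items;
	size_t nr, alloc;
};

/* Append one value, growing the backing array only when it is full. */
void int_vec_push(struct int_vec_sketch *v, int value)
{
	ALLOC_GROW_SKETCH(v->items, v->nr + 1, v->alloc);
	v->items[v->nr++] = value;
}
```

Geometric growth keeps the amortized cost of each append constant, which is why callers pass both the element count and the capacity.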
2022-11-29pack-bitmap.c: trace bitmap ignore logs when midx-bitmap is foundJeff King
When we find a midx bitmap, we do not bother checking for pack bitmaps, since we can use only one. But since we will warn of unused bitmaps via trace2, let's continue looking for pack bitmaps when tracing is enabled. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-29pack-bitmap.c: break out of the bitmap loop early if not tracingJeff King
After opening a bitmap successfully, we try opening others only because we want to report that other bitmap files are ignored in the trace2 log. When trace2 is not enabled, we do not have to do any of that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
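The control flow described here can be sketched as a loop that keeps the first bitmap it finds and continues only when there is a reason to report the extras. The struct, the `tracing_enabled` flag, and the function name are hypothetical; the real code consults trace2 and opens actual files.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a pack in the repository's pack list. */
struct pack_sketch {
	const char *name;
	int has_bitmap;
	struct pack_sketch *next;
};

/*
 * Return the first pack that has a bitmap, or NULL. '*ignored' receives
 * the number of extra bitmaps that were reported; it stays 0 when
 * tracing is off because the loop stops at the first hit.
 */
const struct pack_sketch *open_first_bitmap_sketch(
	const struct pack_sketch *packs, int tracing_enabled, int *ignored)
{
	const struct pack_sketch *opened = NULL, *p;

	*ignored = 0;
	for (p = packs; p; p = p->next) {
		if (!p->has_bitmap)
			continue;
		if (!opened) {
			opened = p;
			/* Nobody will read the report, so stop looking. */
			if (!tracing_enabled)
				break;
			continue;
		}
		fprintf(stderr, "ignoring extra bitmap file: %s\n", p->name);
		(*ignored)++;
	}
	return opened;
}
```

With tracing off, the cost is proportional to the position of the first bitmapped pack rather than to the total number of packs.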
2022-11-14pack-bitmap.c: avoid exposing absolute pathsTeng Long
In "open_midx_bitmap_1()" and "open_pack_bitmap_1()", when we find that there are multiple bitmaps, we open only the first one and emit warnings about the remaining ones. Those warnings contain the absolute path of the repository, for example in an alternates usage scenario. So let's hide this kind of potentially sensitive information in this commit. Found-by: XingXin <moweng.xx@antgroup.com> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-14pack-bitmap.c: remove unnecessary "open_pack_index()" callsTeng Long
When trying to open a pack bitmap, we call open_pack_bitmap_1() in a loop, during which it tries to open up the pack index corresponding with each available pack. It's likely that we'll end up relying on objects in that pack later in the process (in which case we're doing the work of opening the pack index optimistically), but not guaranteed. For instance, consider a repository with a large number of small packs, and one large pack with a bitmap. If we see that bitmap pack last in our loop which calls open_pack_bitmap_1(), the current code will have opened *all* pack index files in the repository. If the request can be served out of the bitmapped pack alone, then the time spent opening these idx files was wasted. Since open_pack_bitmap_1() calls is_pack_valid() later on (which in turn calls open_pack_index() itself), we can just drop the earlier call altogether. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-09-26Merge branch 'ds/bitmap-lookup-remove-tracing'Junio C Hamano
Perf-fix. * ds/bitmap-lookup-remove-tracing: pack-bitmap: remove trace2 region from hot path
2022-09-26pack-bitmap: remove trace2 region from hot pathDerrick Stolee
The trace2 region around the call to lazy_bitmap_for_commit() in bitmap_for_commit() was added in 28cd730680d (pack-bitmap: prepare to read lookup table extension, 2022-08-14). While adding trace2 regions is typically helpful for tracking performance, this method is called possibly thousands of times as a commit walk explores commit history looking for a matching bitmap. When trace2 output is enabled, this region is emitted many times and performance is throttled by that output. For now, remove these regions entirely. This is a critical path, and it would be valuable to measure that the time spent in bitmap_for_commit() does not increase when using the commit lookup table. The best way to do that would be to use a mechanism that sums the time spent in a region and reports a single value at the end of the process. This technique was introduced but not merged by [1] so maybe this example presents some justification to revisit that approach. [1] https://lore.kernel.org/git/pull.1099.v2.git.1640720202.gitgitgadget@gmail.com/ To help with the 'git blame' output in this region, add a comment that warns against adding a trace2 region. Delete a test from t5310 that used that trace output to check that this lookup optimization was activated. To create this kind of test again in the future, the stopwatch traces mentioned earlier could be used as a signal that we activated this code path. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
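The "sum the time and report once" mechanism mentioned above might look roughly like the following accumulator. This is a hypothetical sketch, not the unmerged trace2 proposal: the names are invented, and it uses the standard `clock()` function, which measures CPU time rather than wall-clock time.

```c
#include <stdio.h>
#include <time.h>

/* Accumulates time over many short intervals instead of logging each. */
struct stopwatch_sketch {
	clock_t total;       /* summed CPU time over all intervals */
	clock_t started;     /* start of the interval in progress */
	unsigned long count; /* number of completed intervals */
};

void stopwatch_start(struct stopwatch_sketch *sw)
{
	sw->started = clock();
}

void stopwatch_stop(struct stopwatch_sketch *sw)
{
	sw->total += clock() - sw->started;
	sw->count++;
}

/* Emit a single summary line, e.g. once at process exit. */
void stopwatch_report(const struct stopwatch_sketch *sw, const char *label)
{
	fprintf(stderr, "%s: %lu intervals, %.6f s total\n", label,
		sw->count, (double)sw->total / CLOCKS_PER_SEC);
}
```

The hot path then pays only for two clock reads and an addition per call, while the output cost is a single line regardless of how many times the region runs.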
2022-09-23pack-bitmap: improve grammar of "xor chain" error messageAlex Henrie
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-26pack-bitmap: prepare to read lookup table extensionAbhradeep Chakraborty
An earlier change taught Git to write the bitmap lookup table, but Git does not yet know how to parse it. Teach Git to parse an existing bitmap lookup table. Older versions of Git are not affected: they ignore the lookup table. Mentored-by: Taylor Blau <me@ttaylorr.com> Co-Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com> Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-27Merge branch 'tl/pack-bitmap-error-messages'Junio C Hamano
Tweak various messages that come from the pack-bitmap codepaths. * tl/pack-bitmap-error-messages: pack-bitmap.c: continue looping when first MIDX bitmap is found pack-bitmap.c: using error() instead of silently returning -1 pack-bitmap.c: do not ignore error when opening a bitmap file pack-bitmap.c: rename "idx_name" to "bitmap_name" pack-bitmap.c: mark more strings for translations pack-bitmap.c: fix formatting of error messages
2022-07-18pack-bitmap.c: continue looping when first MIDX bitmap is foundTeng Long
In "open_midx_bitmap()", we loop over the MIDX(es) in the repository and, as soon as the first one is found, leave the loop with a direct "return". But it is better to continue the loop until we have visited the MIDX in our repository as well as any alternates (along with _their_ alternates, recursively). The reason is that more than one MIDX file may exist in a repository. The "multi_pack_index" struct is designed as a singly linked list, and once a MIDX file has been opened successfully, the other MIDX files are skipped and the warning "ignoring extra bitmap file." is emitted. Community discussion: https://public-inbox.org/git/YjzCTLLDCby+kJrZ@nand.local/ Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18pack-bitmap.c: using error() instead of silently returning -1Teng Long
In "open_pack_bitmap_1()" and "open_midx_bitmap_1()", it's better to return error() instead of "-1" when some unexpected error occurs like "stat bitmap file failed", "bitmap header is invalid" or "checksum mismatch", etc. There are places where we do not replace, such as when the bitmap does not exist (no bitmap in repository is allowed) or when another bitmap has already been opened (in which case it should be a warning rather than an error). Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18pack-bitmap.c: do not ignore error when opening a bitmap fileTeng Long
Calls to git_open() to open the pack bitmap file and multi-pack bitmap file do not report any error when they fail. These files are optional and it is not an error if open failed due to ENOENT, but we shouldn't be ignoring other kinds of errors. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
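The distinction drawn here (a missing optional file is fine, any other failure should be reported) can be sketched with plain POSIX open(). The function names are hypothetical, and the sketch prints to stderr directly instead of using Git's warning helpers.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* A missing optional file is expected; anything else deserves a report. */
int open_error_is_reportable_sketch(int err)
{
	return err != ENOENT;
}

/*
 * Open an optional file read-only. Returns the descriptor on success or
 * -1 on failure; '*warned' is set to 1 only when the failure was
 * something other than "no such file".
 */
int open_optional_file_sketch(const char *path, int *warned)
{
	int fd = open(path, O_RDONLY);

	*warned = 0;
	if (fd < 0) {
		int err = errno; /* save before another call can clobber it */

		if (open_error_is_reportable_sketch(err)) {
			fprintf(stderr, "warning: cannot open '%s': %s\n",
				path, strerror(err));
			*warned = 1;
		}
		return -1;
	}
	return fd;
}
```

Both failure kinds still return -1, so callers that only care whether a bitmap is usable need no changes; the difference is purely in what gets reported.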
2022-07-18pack-bitmap.c: rename "idx_name" to "bitmap_name"Teng Long
In "open_pack_bitmap_1()" and "open_midx_bitmap_1()" we use a variable named "idx_name" to hold the bitmap filename, which is computed by "midx_bitmap_filename()" or "pack_bitmap_filename()" before we open it. The name "idx_name" may cause confusion because it suggests ".idx" or "multi-pack-index" files. Although a bitmap can essentially be understood as a kind of index, let's make the name a little more accurate here. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18pack-bitmap.c: mark more strings for translationsTeng Long
In pack-bitmap.c, some printed texts are translated, some are not. Let's support the translations of the bitmap related output. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18pack-bitmap.c: fix formatting of error messagesTeng Long
There are some text output issues in 'pack-bitmap.c'; they exist in die(), error(), etc. They include issues with capitalization of the first letter, newlines, use of error() instead of BUG(), and substitutions that don't have quotes around them. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-03Merge branch 'tb/midx-race-in-pack-objects'Junio C Hamano
The multi-pack-index code did not protect the packfile it is going to depend on from getting removed while in use, which has been corrected. * tb/midx-race-in-pack-objects: builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects builtin/pack-objects.c: ensure included `--stdin-packs` exist builtin/pack-objects.c: avoid redundant NULL check pack-bitmap.c: check preferred pack validity when opening MIDX bitmap
2022-05-24pack-bitmap.c: check preferred pack validity when opening MIDX bitmapTaylor Blau
When pack-objects adds an entry to its packing list, it marks the packfile and offset containing the object, which we may later use during verbatim reuse (c.f., `write_reused_pack_verbatim()`). If the packfile in question is deleted in the background (e.g., due to a concurrent `git repack`), we'll die() as a result of calling use_pack(), unless we have an open file descriptor on the pack itself. 4c08018204 (pack-objects: protect against disappearing packs, 2011-10-14) worked around this by opening the pack ahead of time before recording it as a valid source for reuse. 4c08018204's treatment meant that we could tolerate disappearing packs, since it ensures we always have an open file descriptor on any pack that we mark as a valid source for reuse. This tightens the race to only happen when we need to close an open pack's file descriptor (c.f., the caller of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted, in which case we'll complain that a pack could not be accessed and die(). The pack bitmap code does this, too, since prior to dc1daacdcc (pack-bitmap: check pack validity when opening bitmap, 2021-07-23) it was vulnerable to the same race. The MIDX bitmap code does not do this, and is vulnerable to the same race. Apply the same treatment as dc1daacdcc to the routine responsible for opening the multi-pack bitmap's preferred pack to close this race. This patch handles the "preferred" pack (c.f., the section "multi-pack-index reverse indexes" in Documentation/technical/pack-format.txt) specially, since pack-objects depends on reusing exact chunks of that pack verbatim in reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded, the utility of a bitmap is significantly diminished. Similar to dc1daacdcc, we could technically just add this check in reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX .bitmap without needing to open any of its packs. 
But it's simpler to do the check as early as possible, covering all direct uses of the preferred pack. Note that doing this check early requires us to call prepare_midx_pack() early, too, so move the relevant part of that loop from load_reverse_index() into open_midx_bitmap_1(). Subsequent patches handle the non-preferred packs in a slightly different fashion. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20Merge branch 'ep/maint-equals-null-cocci'Junio C Hamano
Introduce and apply a Coccinelle rule that discourages explicit comparisons between a pointer and NULL, and apply the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
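For illustration, the kind of rewrite such a rule encourages looks like this in C. The two functions are hypothetical and behave identically; the exact patterns matched by equals-null.cocci are not shown in this log, so treat the before/after pair as an assumption about its intent.

```c
#include <stddef.h>

struct node_sketch {
	struct node_sketch *next;
};

/* Before: explicit comparisons against NULL. */
size_t list_length_before(const struct node_sketch *head)
{
	size_t n = 0;

	if (head == NULL)
		return 0;
	while (head != NULL) {
		n++;
		head = head->next;
	}
	return n;
}

/* After: the pointer's truth value is tested directly. */
size_t list_length_after(const struct node_sketch *head)
{
	size_t n = 0;

	if (!head)
		return 0;
	while (head) {
		n++;
		head = head->next;
	}
	return n;
}
```

The change is purely stylistic; both forms compile to the same checks.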
2022-05-02Merge branch 'ep/maint-equals-null-cocci' for maint-2.35Junio C Hamano
* ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02tree-wide: apply equals-null.cocciJunio C Hamano
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09list-objects: consolidate traverse_commit_list[_filtered]Derrick Stolee
Now that all consumers of traverse_commit_list_filtered() populate the 'filter' member of 'struct rev_info', we can drop that parameter from the method prototype to simplify things. In addition, the only thing different now between traverse_commit_list_filtered() and traverse_commit_list() is the presence of the 'omitted' parameter, which is only non-NULL for one caller. We can consolidate these two methods by having one call the other and use the simpler form everywhere the 'omitted' parameter would be NULL. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09pack-bitmap: drop filter in prepare_bitmap_walk()Derrick Stolee
Now that all consumers of prepare_bitmap_walk() have populated the 'filter' member of 'struct rev_info', we can drop that extra parameter from the method and access it directly from the 'struct rev_info'. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27pack-bitmap.c: gracefully fallback after opening pack/MIDXTaylor Blau
When opening a MIDX/pack-bitmap, we call open_midx_bitmap_1() or open_pack_bitmap_1() respectively in a loop over the set of MIDXs/packs. By design, these functions are supposed to be called over every pack and MIDX, since only one of them should have a valid bitmap. Ordinarily we return '0' from these two functions in order to indicate that we successfully loaded a bitmap. To signal that we couldn't load a bitmap corresponding to the MIDX/pack (either because one doesn't exist, or because there was an error with loading it), we can return '-1'. In either case, the callers each enumerate all MIDXs/packs to ensure that at most one bitmap per-kind is present. But when we fail to load a bitmap that does exist (for example, loading a MIDX bitmap without finding a corresponding reverse index), we'll return -1 but leave the 'midx' field non-NULL. So when we fall back to loading a pack bitmap, we'll complain that the bitmap we're trying to populate already is "opened", even though it isn't. Rectify this by setting the '->pack' and '->midx' field back to NULL as appropriate. Two tests are added: one to ensure that the MIDX-to-pack bitmap fallback works, and another to ensure we still complain when there are multiple pack bitmaps in a repository. Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
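The bug and its fix can be modeled in a few lines: an opener that claims a field and then fails must release the claim, or the fallback opener will wrongly conclude that a bitmap is already open. The struct, the `has_revindex` flag, and both function names are hypothetical simplifications of the real code.

```c
#include <stddef.h>

/* Hypothetical stand-in: at most one of the two fields is ever set. */
struct bitmap_index_sketch {
	const char *midx; /* non-NULL once a MIDX bitmap is opened */
	const char *pack; /* non-NULL once a pack bitmap is opened */
};

/* Returns 0 on success, -1 if no MIDX bitmap could be loaded. */
int open_midx_bitmap_sketch(struct bitmap_index_sketch *b,
			    const char *midx, int has_revindex)
{
	if (b->midx || b->pack)
		return -1; /* a bitmap is already opened */
	b->midx = midx;
	if (!has_revindex) {
		/*
		 * Loading failed after the field was claimed. Undo the
		 * claim so a later fallback does not see a stale value.
		 */
		b->midx = NULL;
		return -1;
	}
	return 0;
}

/* Returns 0 on success, -1 if a bitmap is already opened. */
int open_pack_bitmap_sketch(struct bitmap_index_sketch *b, const char *pack)
{
	if (b->midx || b->pack)
		return -1;
	b->pack = pack;
	return 0;
}
```

Without the reset in the failure branch, the first call to open_pack_bitmap_sketch() after a failed MIDX attempt would return -1 even though nothing is actually open.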
2021-12-10Merge branch 'jk/test-bitmap-fix'Junio C Hamano
Tighten code for testing pack-bitmap. * jk/test-bitmap-fix: test_bitmap_hashes(): handle repository without bitmaps
2021-11-05test_bitmap_hashes(): handle repository without bitmapsJeff King
If prepare_bitmap_git() returns NULL (one easy-to-trigger cause being that the repository does not have bitmaps at all), then we'll segfault accessing bitmap_git->hashes:

    $ t/helper/test-tool bitmap dump-hashes
    Segmentation fault

We should treat this the same as a repository with bitmaps but no name-hashes, and quietly produce an empty output. The later call to free_bitmap_index() in the cleanup label is OK, as it treats a NULL pointer as a noop. This isn't a big deal in practice, as this function is intended for and used only by test-tool. It's probably worth fixing to avoid confusion, but not worth adding coverage for this to the test suite. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
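A minimal sketch of the guard, assuming a simplified index type: `struct fake_bitmap_index`, `prepare_sketch()`, `free_sketch()` and `dump_hashes_sketch()` are hypothetical names for illustration and do not mirror Git's real structures. It treats a NULL index the same way as an index without name-hashes and relies on a NULL-tolerant free in the cleanup path.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for the bitmap index. */
struct fake_bitmap_index {
	unsigned *hashes;
	size_t nr;
};

/* Returns NULL when the "repository" has no bitmaps at all. */
struct fake_bitmap_index *prepare_sketch(int have_bitmaps)
{
	struct fake_bitmap_index *b;

	if (!have_bitmaps)
		return NULL;
	b = calloc(1, sizeof(*b));
	if (!b)
		abort();
	b->nr = 2;
	b->hashes = calloc(b->nr, sizeof(*b->hashes));
	if (!b->hashes)
		abort();
	b->hashes[0] = 0xdeadbeefu;
	b->hashes[1] = 0x0badcafeu;
	return b;
}

/* Like the cleanup described above: a NULL pointer is a no-op. */
void free_sketch(struct fake_bitmap_index *b)
{
	if (!b)
		return;
	free(b->hashes);
	free(b);
}

/* Prints one hash per line; returns how many lines were printed. */
int dump_hashes_sketch(int have_bitmaps, FILE *out)
{
	struct fake_bitmap_index *bitmap_git = prepare_sketch(have_bitmaps);
	int printed = 0;
	size_t i;

	/* No bitmaps, or bitmaps without hashes: quietly print nothing. */
	if (!bitmap_git || !bitmap_git->hashes)
		goto cleanup;

	for (i = 0; i < bitmap_git->nr; i++) {
		fprintf(out, "%08x\n", bitmap_git->hashes[i]);
		printed++;
	}

cleanup:
	free_sketch(bitmap_git);
	return printed;
}
```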
2021-10-28pack-bitmap.c: more aggressively free in free_bitmap_index()Taylor Blau
The function free_bitmap_index() is somewhat lax in what it frees. There are two notable examples:

  - While it does call kh_destroy_oid_map on the "bitmaps" map, which maps commit OIDs to their corresponding bitmaps, the bitmaps themselves are not freed. Note here that we recycle already-freed ewah_bitmaps into a pool, but these are handled correctly by ewah_pool_free().

  - We never bother to free the extended index's "positions" map, which we always allocate in load_bitmap().

Fix both of these. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-28pack-bitmap.c: don't leak type-level bitmapsTaylor Blau
test_bitmap_walk() is used to implement `git rev-list --test-bitmap`, which compares the result of the on-disk bitmaps with ones generated on-the-fly during a revision walk. In fa95666a40 (pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps, 2021-08-24), we hardened those tests to also check the four special type-level bitmaps, but never freed those bitmaps. We should have, since each required an allocation when we EWAH-decompressed them. Free those, plugging that leak, and also free the base (the scratch-pad bitmap), too. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-28midx.c: write MIDX filenames to strbufTaylor Blau
To ask for the name of a MIDX and its corresponding .rev file, callers invoke get_midx_filename() and get_midx_rev_filename(), respectively. These both invoke xstrfmt(), allocating a chunk of memory which must be freed later on. This makes callers in pack-bitmap.c somewhat awkward. Specifically, midx_bitmap_filename(), which is implemented like:

    return xstrfmt("%s-%s.bitmap",
                   get_midx_filename(midx->object_dir),
                   hash_to_hex(get_midx_checksum(midx)));

this leaks the second argument to xstrfmt(), which itself was allocated with xstrfmt(). This caller could assign both the result of get_midx_filename() and the outer xstrfmt() to a temporary variable, remembering to free() the former before returning. But that involves a wasteful copy. Instead, get_midx_filename() and get_midx_rev_filename() take a strbuf as an output parameter. This way midx_bitmap_filename() can manipulate and pass around a temporary buffer which it detaches back to its caller. That allows us to implement the function without copying or open-coding get_midx_filename() in a way that doesn't leak. Update the other callers of get_midx_filename() and get_midx_rev_filename() accordingly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
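The output-parameter pattern can be sketched outside of Git as follows. Git's real strbuf API is not available in a stand-alone program, so this uses a tiny hypothetical growable buffer (`struct sbuf`, `sbuf_addf()`, `sbuf_detach()`) as a stand-in. The `*_sketch` functions and the exact path layout are likewise assumptions made for illustration, not the code from the commit.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical minimal stand-in for Git's strbuf. */
struct sbuf {
	char *buf;
	size_t len;
	size_t alloc;
};

/* Append formatted text, growing the buffer as needed. */
void sbuf_addf(struct sbuf *sb, const char *fmt, ...)
{
	va_list ap, cp;
	int need;

	va_start(ap, fmt);
	va_copy(cp, ap);
	need = vsnprintf(NULL, 0, fmt, cp);	/* measure first */
	va_end(cp);
	if (need < 0) {
		va_end(ap);
		return;
	}
	if (sb->len + need + 1 > sb->alloc) {
		size_t alloc = sb->len + need + 1;
		char *p = realloc(sb->buf, alloc);
		if (!p)
			abort();
		sb->buf = p;
		sb->alloc = alloc;
	}
	vsnprintf(sb->buf + sb->len, (size_t)need + 1, fmt, ap);
	va_end(ap);
	sb->len += need;
}

/* Hand ownership of the string to the caller and reset the buffer. */
char *sbuf_detach(struct sbuf *sb)
{
	char *ret = sb->buf;

	if (!ret) {
		ret = calloc(1, 1);	/* empty string */
		if (!ret)
			abort();
	}
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
	return ret;
}

/* Output-parameter style: writes into 'out' instead of allocating. */
void midx_filename_sketch(struct sbuf *out, const char *object_dir)
{
	/* Path layout is an assumption for this example. */
	sbuf_addf(out, "%s/pack/multi-pack-index", object_dir);
}

/* One buffer, one allocation chain, nothing leaked, no extra copy. */
char *midx_bitmap_filename_sketch(const char *object_dir, const char *hex)
{
	struct sbuf sb = { NULL, 0, 0 };

	midx_filename_sketch(&sb, object_dir);
	sbuf_addf(&sb, "-%s.bitmap", hex);
	return sbuf_detach(&sb);
}
```

The caller of `midx_bitmap_filename_sketch()` owns the returned string and releases it with a single free(); no intermediate allocation is left behind.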
2021-10-18Merge branch 'tb/repack-write-midx'Junio C Hamano
"git repack" has been taught to generate multi-pack reachability bitmaps. * tb/repack-write-midx: test-read-midx: fix leak of bitmap_index struct builtin/repack.c: pass `--refs-snapshot` when writing bitmaps builtin/repack.c: make largest pack preferred builtin/repack.c: support writing a MIDX while repacking builtin/repack.c: extract showing progress to a variable builtin/repack.c: rename variables that deal with non-kept packs builtin/repack.c: keep track of existing packs unconditionally midx: preliminary support for `--refs-snapshot` builtin/multi-pack-index.c: support `--stdin-packs` mode midx: expose `write_midx_file_only()` publicly
2021-09-28builtin/repack.c: make largest pack preferredTaylor Blau
When repacking into a geometric series and writing a multi-pack bitmap, it is beneficial to have the largest resulting pack be the preferred object source in the bitmap's MIDX, since selecting the large packs can lead to fewer broken delta chains and better compression. Teach 'git repack' to identify this pack and pass it to the MIDX write machinery in order to mark it as preferred. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
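Picking the pack to mark as preferred could look roughly like the sketch below. This is only an illustration: `struct pack_info` and `largest_pack_sketch()` are hypothetical, and measuring "largest" by a single `size` field is an assumption; the commit message does not say which metric 'git repack' uses or how the result is passed on to the MIDX writer.

```c
#include <stddef.h>

/* Hypothetical description of a pack resulting from the repack. */
struct pack_info {
	const char *name;
	unsigned long long size;	/* assumed size metric */
};

/*
 * Return the largest pack, or NULL if there are none.
 * Ties keep the earliest entry.
 */
const struct pack_info *largest_pack_sketch(const struct pack_info *packs,
					    size_t nr)
{
	const struct pack_info *best = NULL;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!best || packs[i].size > best->size)
			best = &packs[i];
	}
	return best;
}
```

The name of the returned entry is what a caller would then hand over as the preferred pack.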