aboutsummaryrefslogtreecommitdiff
path: root/reachable.c
AgeCommit message (Collapse)Author
2026-01-26reachable: convert to use `odb_for_each_object()`Patrick Steinhardt
To figure out which objects expired objects we enumerate all loose and packed objects individually so that we can figure out their respective mtimes. Refactor the code to instead use `odb_for_each_object()` with a request that ask for the object mtime instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-26odb: rename `FOR_EACH_OBJECT_*` flagsPatrick Steinhardt
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This prepares us for a new upcoming `odb_for_each_object()` function and ensures that both the function and its flags have the same prefix. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-09packfile: refactor kept-pack cache to work with packfile storesPatrick Steinhardt
The kept pack cache is a cache of packfiles that are marked as kept either via an accompanying ".kept" file or via an in-memory flag. The cache can be retrieved via `kept_pack_cache()`, where one needs to pass in a repository. Ultimately though the kept-pack cache is a property of the packfile store, and this causes problems in a subsequent commit where we want to move down the packfile store to be a per-object-source entity. Prepare for this and refactor the kept-pack cache to work on top of a packfile store instead. While at it, rename both the function and flags specific to the kept-pack cache so that they can be properly attributed to the respective subsystems. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-04refs: introduce wrapper struct for `each_ref_fn`Patrick Steinhardt
The `each_ref_fn` callback function type is used across our code base for several different functions that iterate through reference. There's a bunch of callbacks implementing this type, which makes any changes to the callback signature extremely noisy. An example of the required churn is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a single argument required us to change 48 files. It was already proposed back then [1] that we might want to introduce a wrapper structure to alleviate the pain going forward. While this of course requires the same kind of global refactoring as just introducing a new parameter, it at least allows us to more change the callback type afterwards by just extending the wrapper structure. One counterargument to this refactoring is that it makes the structure more opaque. While it is obvious which callsites need to be fixed up when we change the function type, it's not obvious anymore once we use a structure. That being said, we only have a handful of sites that actually need to populate this wrapper structure: our ref backends, "refs/iterator.c" as well as very few sites that invoke the iterator callback functions directly. Introduce this wrapper structure so that we can adapt the iterator interfaces more readily. [1]: <ZmarVcF5JjsZx0dl@tanuki> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-08-05Merge branch 'ps/object-file-wo-the-repository'Junio C Hamano
Reduce implicit assumption and dependence on the_repository in the object-file subsystem. * ps/object-file-wo-the-repository: object-file: get rid of `the_repository` in index-related functions object-file: get rid of `the_repository` in `force_object_loose()` object-file: get rid of `the_repository` in `read_loose_object()` object-file: get rid of `the_repository` in loose object iterators object-file: remove declaration for `for_each_file_in_obj_subdir()` object-file: inline `for_each_loose_file_in_objdir_buf()` object-file: get rid of `the_repository` when writing objects odb: introduce `odb_write_object()` loose: write loose objects map via their source object-file: get rid of `the_repository` in `finalize_object_file()` object-file: get rid of `the_repository` in `loose_object_info()` object-file: get rid of `the_repository` when freshening objects object-file: inline `check_and_freshen()` functions object-file: get rid of `the_repository` in `has_loose_object()` object-file: stop using `the_hash_algo` object-file: fix -Wsign-compare warnings
2025-07-23config: drop `git_config_get_string_multi()` wrapperPatrick Steinhardt
In 036876a1067 (config: hide functions using `the_repository` by default, 2024-08-13) we have moved around a bunch of functions in the config subsystem that depend on `the_repository`. Those function have been converted into mere wrappers around their equivalent function that takes in a repository as parameter, and the intent was that we'll eventually remove those wrappers to make the dependency on the global repository variable explicit at the callsite. Follow through with that intent and remove `git_config_get_string_multi()`. All callsites are adjusted so that they use `repo_config_get_string_multi(the_repository, ...)` instead. While some callsites might already have a repository available, this mechanical conversion is the exact same as the current situation and thus cannot cause any regression. Those sites should eventually be cleaned up in a later patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-16object-file: get rid of `the_repository` in loose object iteratorsPatrick Steinhardt
The iterators for loose objects still rely on `the_repository`. Refactor them: - `for_each_loose_file_in_objdir()` is refactored so that the caller is now expected to pass an `odb_source` as parameter instead of the path to that source. Furthermore, it is renamed accordingly to `for_each_loose_file_in_source()`. - `for_each_loose_object()` is refactored to take in an object database now and calls the above function in a loop. This allows us to get rid of the global dependency. Adjust callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01odb: rename `oid_object_info()`Patrick Steinhardt
Rename `oid_object_info()` to `odb_read_object_info()` as well as their `_extended()` variant to match other functions related to the object database and our modern coding guidelines. Introduce compatibility wrappers so that any in-flight topics will continue to compile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-29object-store: move function declarations to their respective subsystemsPatrick Steinhardt
We carry declarations for a couple of functions in "object-store.h" that are not defined in "object-store.c", but in a different subsystem. Move these declarations to the respective headers whose matching code files carry the corresponding definition. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-24Merge branch 'ps/object-file-cleanup' into ps/object-store-cleanupJunio C Hamano
* ps/object-file-cleanup: object-store: merge "object-store-ll.h" and "object-store.h" object-store: remove global array of cached objects object: split out functions relating to object store subsystem object-file: drop `index_blob_stream()` object-file: split up concerns of `HASH_*` flags object-file: split out functions relating to object store subsystem object-file: move `xmmap()` into "wrapper.c" object-file: move `git_open_cloexec()` to "compat/open.c" object-file: move `safe_create_leading_directories()` into "path.c" object-file: move `mkdir_in_gitdir()` into "path.c"
2025-04-16Merge branch 'ps/cat-file-filter-batch'Junio C Hamano
"git cat-file --batch" and friends learned to allow "--filter=" to omit certain objects, just like the transport layer does. * ps/cat-file-filter-batch: builtin/cat-file: use bitmaps to efficiently filter by object type builtin/cat-file: deduplicate logic to iterate over all objects pack-bitmap: introduce function to check whether a pack is bitmapped pack-bitmap: add function to iterate over filtered bitmapped objects pack-bitmap: allow passing payloads to `show_reachable_fn()` builtin/cat-file: support "object:type=" objects filter builtin/cat-file: support "blob:limit=" objects filter builtin/cat-file: support "blob:none" objects filter builtin/cat-file: wire up an option to filter objects builtin/cat-file: introduce function to report object status builtin/cat-file: rename variable that tracks usage
2025-04-15object-store: merge "object-store-ll.h" and "object-store.h"Patrick Steinhardt
The "object-store-ll.h" header has been introduced to keep transitive header dependendcies and compile times at bay. Now that we have created a new "object-store.c" file though we can easily move the last remaining additional bit of "object-store.h", the `odb_path_map`, out of the header. Do so. As the "object-store.h" header is now equivalent to its low-level alternative we drop the latter and inline it into the former. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07pack-bitmap: allow passing payloads to `show_reachable_fn()`Patrick Steinhardt
The `show_reachable_fn` callback is used by a couple of functions to present reachable objects to the caller. The function does not provide a way for the caller to pass a payload though, which is functionality that we'll require in a subsequent commit. Change the callback type to accept a payload and adapt all callsites accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10object: stop depending on `the_repository`Patrick Steinhardt
There are a couple of functions exposed by "object.c" that implicitly depend on `the_repository`. Remove this dependency by injecting the repository via a parameter. Adapt callers accordingly by simply using `the_repository`, except in cases where the subsystem is already free of the repository. In that case, we instead pass the repository provided by the caller's context. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-07worktree: return allocated string from `get_worktree_git_dir()`Patrick Steinhardt
The `get_worktree_git_dir()` function returns a string constant that does not need to be free'd by the caller. This string is computed for three different cases: - If we don't have a worktree we return a path into the Git directory. The returned string is owned by `the_repository`, so there is no need for the caller to free it. - If we have a worktree, but no worktree ID then the caller requests the main worktree. In this case we return a path into the common directory, which again is owned by `the_repository` and thus does not need to be free'd. - In the third case, where we have an actual worktree, we compute the path relative to "$GIT_COMMON_DIR/worktrees/". This string does not need to be released either, even though `git_common_path()` ends up allocating memory. But this doesn't result in a memory leak either because we write into a buffer returned by `get_pathname()`, which returns one out of four static buffers. We're about to drop `git_common_path()` in favor of `repo_common_path()`, which doesn't use the same mechanism but instead returns an allocated string owned by the caller. While we could adapt `get_worktree_git_dir()` to also use `get_pathname()` and print the derived common path into that buffer, the whole schema feels a lot like premature optimization in this context. There are some callsites where we call `get_worktree_git_dir()` in a loop that iterates through all worktrees. But none of these loops seem to be even remotely in the hot path, so saving a single allocation there does not feel worth it. Refactor the function to instead consistently return an allocated path so that we can start using `repo_common_path()` in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04packfile: pass down repository to `for_each_packed_object`Karthik Nayak
The function `for_each_packed_object` currently relies on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from this function and closely related function `is_promisor_object`. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04packfile: pass down repository to `has_object[_kept]_pack`Karthik Nayak
The functions `has_object[_kept]_pack` currently rely on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from these functions and any related ones. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09refs: add referent to each_ref_fnJohn Cai
Add a parameter to each_ref_fn so that callers to the ref APIs that use this function as a callback can have acess to the unresolved value of a symbolic ref. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt
Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07cocci: apply rules to rewrite callers of "refs" interfacesPatrick Steinhardt
Apply the rules that rewrite callers of "refs" interfaces to explicitly pass `struct ref_store`. The resulting patch has been applied with the `--whitespace=fix` option. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-09prune: mark rebase autostash and orig-head as reachablePhillip Wood
Rebase records the oid of HEAD before rebasing and the commit created by "--autostash" in files in the rebase state directory. This means that the autostash commit is never reachable from any ref or reflog and when rebasing a detached HEAD the original HEAD can become unreachable if the user expires HEAD's the reflog while the rebase is running. Fix this by reading the relevant files when marking reachable commits. Note that it is possible for the commit recorded in .git/rebase-merge/amend to be unreachable but pruning that object does not affect the operation of "git rebase --continue" as we're only interested in the object id, not in the object itself. Reported-by: Orgad Shaneh <orgads@gmail.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-29Merge branch 'en/header-split-cache-h-part-3'Junio C Hamano
Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12gc: introduce `gc.recentObjectsHook`Taylor Blau
This patch introduces a new multi-valued configuration option, `gc.recentObjectsHook` as a means to mark certain objects as recent (and thus exempt from garbage collection), regardless of their age. When performing a garbage collection operation on a repository with unreachable objects, Git makes its decision on what to do with those object(s) based on how recent the objects are or not. Generally speaking, unreachable-but-recent objects stay in the repository, and older objects are discarded. However, we have no convenient way to keep certain precious, unreachable objects around in the repository, even if they have aged out and would be pruned. Our options today consist of: - Point references at the reachability tips of any objects you consider precious, which may be undesirable or infeasible if there are many such objects. - Track them via the reflog, which may be undesirable since the reflog's lifetime is limited to that of the reference it's tracking (and callers may want to keep those unreachable objects around for longer). - Extend the grace period, which may keep around other objects that the caller *does* want to discard. - Manually modify the mtimes of objects you want to keep. If those objects are already loose, this is easy enough to do (you can just enumerate and `touch -m` each one). But if they are packed, you will either end up modifying the mtimes of *all* objects in that pack, or be forced to write out a loose copy of that object, both of which may be undesirable. Even worse, if they are in a cruft pack, that requires modifying its `*.mtimes` file by hand, since there is no exposed plumbing for this. - Force the caller to construct the pack of objects they want to keep themselves, and then mark the pack as kept by adding a ".keep" file. This works, but is burdensome for the caller, and having extra packs is awkward as you roll forward your cruft pack. This patch introduces a new option to the above list via the `gc.recentObjectsHook` configuration, which allows the caller to specify a program (or set of programs) whose output is treated as a set of objects to treat as recent, regardless of their true age. The implementation is straightforward. Git enumerates recent objects via `add_unseen_recent_objects_to_traversal()`, which enumerates loose and packed objects, and eventually calls add_recent_object() on any objects for which `want_recent_object()`'s conditions are met. This patch modifies the recency condition from simply "is the mtime of this object more recent than the cutoff?" to "[...] or, is this object mentioned by at least one `gc.recentObjectsHook`?". Depending on whether or not we are generating a cruft pack, this allows the caller to do one of two things: - If generating a cruft pack, the caller is able to retain additional objects via the cruft pack, even if they would have otherwise been pruned due to their age. - If not generating a cruft pack, the caller is likewise able to retain additional objects as loose. A potential alternative here is to introduce a new mode to alter the contents of the reachable pack instead of the cruft one. One could imagine a new option to `pack-objects`, say `--extra-reachable-tips` that does the same thing as above, adding the visited set of objects along the traversal to the pack. But this has the unfortunate side-effect of altering the reachability closure of that pack. If parts of the unreachable object graph mentioned by one or more of the "extra reachable tips" programs is not closed, then the resulting pack won't be either. This makes it impossible in the general case to write out reachability bitmaps for that pack, since closure is a requirement there. Instead, keep these unreachable objects in the cruft pack (or set of unreachable, loose objects) instead, to ensure that we can continue to have a pack containing just reachable objects, which is always safe to write a bitmap over. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12reachable.c: extract `obj_is_recent()`Taylor Blau
When enumerating objects in order to add recent ones (i.e. those whose mtime is strictly newer than the cutoff) as tips of a reachability traversal, `add_recent_object()` discards objects which do not meet the recency criteria. The subsequent commit will make checking whether or not an object is recent also consult the list of hooks in `pack.recentHook`. Isolate this check in its own function to keep the additional complexity outside of `add_recent_object()`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: remove cache.h inclusion due to environment.h changesElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24for_each_object: mark unused callback parametersJeff King
The for_each_{loose,packed}_object interface uses callback functions, but not every callback needs all of the parameters. Mark the unused ones to satisfy -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24list-objects: mark unused callback parametersJeff King
Our graph-traversal functions take callbacks for showing commits and objects, but not all callbacks need each parameter. Likewise for the similar traverse_bitmap_commit_list(), which has a different interface but serves the same purpose. And the include_check mechanism, which passes along a void pointer which is not always used. Mark the unused ones to to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26reachable: report precise timestamps from objects in cruft packsTaylor Blau
When generating a cruft pack, the caller within pack-objects will want to know the precise timestamps of cruft objects (i.e., their corresponding values in the .mtimes table) rather than the mtime of the cruft pack itself. Teach add_recent_packed() to lookup each object's precise mtime from the .mtimes file if one exists (indicated by the is_cruft bit on the packed_git structure). A couple of small things worth noting here: - load_pack_mtimes() needs to be called before asking for nth_packed_mtime(), and that call is done lazily here. That function exits early if the .mtimes file has already been opened and parsed, so only the first call is slow. - Checking the is_cruft bit can be done without any extra work on the caller's behalf, since it is set up for us automatically as a side-effect of calling add_packed_git() (just like the 'pack_keep' and 'pack_promisor' bits). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26reachable: add options to add_unseen_recent_objects_to_traversalTaylor Blau
This function behaves very similarly to what we will need in pack-objects in order to implement cruft packs with expiration. But it is lacking a couple of things. Namely, it needs: - a mechanism to communicate the timestamps of individual recent objects to some external caller - and, in the case of packed objects, our future caller will also want to know the originating pack, as well as the offset within that pack at which the object can be found - finally, it needs a way to skip over packs which are marked as kept in-core. To address the first two, add a callback interface in this patch which reports the time of each recent object, as well as a (packed_git, off_t) pair for packed objects. Likewise, add a new option to the packed object iterators to skip over packs which are marked as kept in core. This option will become implicitly tested in a future patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09pack-bitmap: drop filter in prepare_bitmap_walk()Derrick Stolee
Now that all consumers of prepare_bitmap_walk() have populated the 'filter' member of 'struct rev_info', we can drop that extra parameter from the method and access it directly from the 'struct rev_info'. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-28object.h: add lookup_object_by_type() functionJeff King
In some cases it's useful for efficiency reasons to get the type of an object before deciding whether to parse it, but we still want an object struct. E.g., in reachable.c, bitmaps give us the type, but we just want to mark flags on each object. Likewise, we may loop over every object and only parse tags in order to peel them; checking the type first lets us avoid parsing the non-tags. But our lookup_blob(), etc, functions make getting an object struct annoying: we have to call the right function for every type. And we cannot just use the generic lookup_object(), because it only returns an already-seen object; it won't allocate a new object struct. Let's provide a function that dispatches to the correct lookup_* function based on a run-time type. In fact, reachable.c already has such a helper, so we'll just make that public. I did change the return type from "void *" to "struct object *". While the former is a clever way to avoid casting inside the function, it's less safe and less informative to people reading the function declaration. The next commit will add a new caller. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-07Merge branch 'ps/rev-list-object-type-filter'Junio C Hamano
"git rev-list" learns the "--filter=object:type=<type>" option, which can be used to exclude objects of the given kind from the packfile generated by pack-objects. * ps/rev-list-object-type-filter: rev-list: allow filtering of provided items pack-bitmap: implement combined filter pack-bitmap: implement object type filter list-objects: implement object type filter list-objects: support filtering by tag and commit list-objects: move tag processing into its own function revision: mark commit parents as NOT_USER_GIVEN uploadpack.txt: document implication of `uploadpackfilter.allow`
2021-04-29prune: save reachable-from-recent objects with bitmapsJeff King
We pass our prune expiration to mark_reachable_objects(), which will traverse not only the reachable objects, but consider any recent ones as tips for reachability; see d3038d22f9 (prune: keep objects reachable from recent objects, 2014-10-15) for details. However, this interacts badly with the bitmap code path added in fde67d6896 (prune: use bitmaps for reachability traversal, 2019-02-13). If we hit the bitmap-optimized path, we return immediately to avoid the regular traversal, accidentally skipping the "also traverse recent" code. Instead, we should do an if-else for the bitmap versus regular traversal, and then follow up with the "recent" traversal in either case. This reuses the "rev_info" for a bitmap and then a regular traversal, but that should work OK (the bitmap code clears the pending array in the usual way, just like a regular traversal would). Note that I dropped the comment above the regular traversal here. It has little explanatory value, and makes the if-else logic much harder to read. Here are a few variants that I rejected: - it seems like both the reachability and recent traversals could be done in a single traversal. This was rejected by d3038d22f9 (prune: keep objects reachable from recent objects, 2014-10-15), though the balance may be different when using bitmaps. However, there's a subtle correctness issue, too: we use revs->ignore_missing_links for the recent traversal, but not the reachability one. - we could try using bitmaps for the recent traversal, too, which could possibly improve performance. But it would require some fixes in the bitmap code, which uses ignore_missing_links for its own purposes. Plus it would probably not help all that much in practice. We use the reachable tips to generate bitmaps, so those objects are likely not covered by bitmaps (unless they just became unreachable). And in general, we expect the set of unreachable objects to be much smaller anyway, so there's less to gain. The test in t5304 detects the bug and confirms the fix. I also beefed up the tests in t6501, which covers the mtime-checking code more thoroughly, to handle the bitmap case (in addition to just "loose" and "packed" cases). Interestingly, this test doesn't actually detect the bug, because it is running "git gc", and not "prune" directly. And "gc" will call "repack" first, which does not suffer the same bug. So the old-but-reachable-from-recent objects get scooped up into the new pack along with the actually-recent objects, which gives both a recent mtime. But it seemed prudent to get more coverage of the bitmap case for related code. Reported-by: David Emett <dave@sp4m.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19rev-list: allow filtering of provided itemsPatrick Steinhardt
When providing an object filter, it is currently impossible to also filter provided items. E.g. when executing `git rev-list HEAD` , the commit this reference points to will be treated as user-provided and is thus excluded from the filtering mechanism. This makes it harder than necessary to properly use the new `--filter=object:type` filter given that even if the user wants to only see blobs, he'll still see commits of provided references. Improve this by introducing a new `--filter-provided-objects` option to the git-rev-parse(1) command. If given, then all user-provided references will be subject to filtering. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14pack-bitmap: basic noop bitmap filter infrastructureJeff King
Currently you can't use object filters with bitmaps, but we plan to support at least some filters with bitmaps. Let's introduce some infrastructure that will help us do that: - prepare_bitmap_walk() now accepts a list_objects_filter_options parameter (which can be NULL for no filtering; all the current callers pass this) - we'll bail early if the filter is incompatible with bitmaps (just as we would if there were no bitmaps at all). Currently all filters are incompatible. - we'll filter the resulting bitmap; since there are no supported filters yet, this is always a noop. There should be no behavior change yet, but we'll support some actual filters in a future patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14rev-list: allow commit-only bitmap traversalsJeff King
Ever since we added reachability bitmap support, we've been able to use it with rev-list to get the full list of objects, like: git rev-list --objects --use-bitmap-index --all But you can't do so without --objects, since we weren't ready to just show the commits. However, the internals of the bitmap code are mostly ready for this: they avoid opening up trees when walking to fill in the bitmaps. We just need to actually pass in the rev_info to traverse_bitmap_commit_list() so it knows which types to bother triggering our callback for. For completeness, the perf test now covers both the existing --objects case, as well as the new commits-only behavior (the objects one got way faster when we introduced bitmaps, but obviously isn't improved now). Here are numbers for linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 8.29(8.10+0.19) 1.76(1.72+0.04) -78.8% 5310.8: rev-list (objects) 8.06(7.94+0.12) 8.14(7.94+0.13) +1.0% That run was cheating a little, as I didn't have any commit-graph in the repository, and we'd built it by default these days when running git-gc. Here are numbers with a commit-graph: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 0.70(0.58+0.12) 0.51(0.46+0.04) -27.1% 5310.8: rev-list (objects) 6.20(6.09+0.10) 6.27(6.16+0.11) +1.1% Still an improvement, but a lot less impressive. We could have the perf script remove any commit-graph to show the out-sized effect, but it probably makes sense to leave it in what would be a more typical setup. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20object: convert lookup_object() to use object_idJeff King
There are no callers left of lookup_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-14prune: use bitmaps for reachability traversalJeff King
Pruning generally has to traverse the whole commit graph in order to see which objects are reachable. This is the exact problem that reachability bitmaps were meant to solve, so let's use them (if they're available, of course). Here are timings on git.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5304.6: prune with bitmaps 3.65(3.56+0.09) 1.01(0.92+0.08) -72.3% And on linux.git: Test HEAD^ HEAD -------------------------------------------------------------------------- 5304.6: prune with bitmaps 35.05(34.79+0.23) 3.00(2.78+0.21) -91.4% The tests show a pretty optimal case, as we'll have just repacked and should have pretty good coverage of all refs with our bitmaps. But that's actually pretty realistic: normally prune is run via "gc" right after repacking. A few notes on the implementation: - the change is actually in reachable.c, so it would improve reachability traversals by "reflog expire --stale-fix", as well. Those aren't performed regularly, though (a normal "git gc" doesn't use --stale-fix), so they're not really worth measuring. There's a low chance of regressing that caller, since the use of bitmaps is totally transparent from the caller's perspective. - The bitmap case could actually get away without creating a "struct object", and instead the caller could just look up each object id in the bitmap result. However, this would be a marginal improvement in runtime, and it would make the callers much more complicated. They'd have to handle both the bitmap and non-bitmap cases separately, and in the case of git-prune, we'd also have to tweak prune_shallow(), which relies on our SEEN flags. - Because we do create real object structs, we go through a few contortions to create ones of the right type. This isn't strictly necessary (lookup_unknown_object() would suffice), but it's more memory efficient to use the correct types, since we already know them. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29tree: add repository argument to lookup_treeStefan Beller
Add a repository argument to allow the callers of lookup_tree to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29blob: add repository argument to lookup_blobStefan Beller
Add a repository argument to allow the callers of lookup_blob to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: add repository argument to lookup_objectStefan Beller
Add a repository argument to allow callers of lookup_object to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26cache.h: add repository argument to oid_object_infoStefan Beller
Add a repository argument to allow the callers of oid_object_info to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11Merge branch 'sb/object-store'Junio C Hamano
Refactoring the internal global data structure to make it possible to open multiple repositories, work with and then close them. Rerolled by Duy on top of a separate preliminary clean-up topic. The resulting structure of the topics looked very sensible. * sb/object-store: (27 commits) sha1_file: allow sha1_loose_object_info to handle arbitrary repositories sha1_file: allow map_sha1_file to handle arbitrary repositories sha1_file: allow map_sha1_file_1 to handle arbitrary repositories sha1_file: allow open_sha1_file to handle arbitrary repositories sha1_file: allow stat_sha1_file to handle arbitrary repositories sha1_file: allow sha1_file_name to handle arbitrary repositories sha1_file: add repository argument to sha1_loose_object_info sha1_file: add repository argument to map_sha1_file sha1_file: add repository argument to map_sha1_file_1 sha1_file: add repository argument to open_sha1_file sha1_file: add repository argument to stat_sha1_file sha1_file: add repository argument to sha1_file_name sha1_file: allow prepare_alt_odb to handle arbitrary repositories sha1_file: allow link_alt_odb_entries to handle arbitrary repositories sha1_file: add repository argument to prepare_alt_odb sha1_file: add repository argument to link_alt_odb_entries sha1_file: add repository argument to read_info_alternates sha1_file: add repository argument to link_alt_odb_entry sha1_file: add raw_object_store argument to alt_odb_usable pack: move approximate object count to object store ...
2018-03-26object-store: move packed_git and packed_git_mru to object storeStefan Beller
In a process with multiple repositories open, packfile accessors should be associated to a single repository and not shared globally. Move packed_git and packed_git_mru into the_repository and adjust callers to reflect this. [nd: while at there, wrap access to these two fields in get_packed_git() and get_packed_git_mru(). This allows us to lazily initialize these fields without caller doing that explicitly] Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14sha1_file: convert sha1_object_info* to object_idbrian m. carlson
Convert sha1_object_info and sha1_object_info_extended to take pointers to struct object_id and rename them to use "oid" instead of "sha1" in their names. Update the declaration and definition and apply the following semantic patch, plus the standard object_id transforms: @@ expression E1, E2; @@ - sha1_object_info(E1.hash, E2) + oid_object_info(&E1, E2) @@ expression E1, E2; @@ - sha1_object_info(E1->hash, E2) + oid_object_info(E1, E2) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1.hash, E2, E3) + oid_object_info_extended(&E1, E2, E3) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1->hash, E2, E3) + oid_object_info_extended(E1, E2, E3) Signed-off-by: Junio C Hamano <gitster@pobox.com>