path: root/Documentation/config/maintenance.adoc
Age | Commit message | Author
2026-02-24 | builtin/maintenance: use "geometric" strategy by default | Patrick Steinhardt

The git-gc(1) command was introduced in the early days of Git in 30f610b7b0 (Create 'git gc' to perform common maintenance operations., 2006-12-27) as the main repository maintenance utility. And while the tool has of course evolved since then to cover new parts, the basic strategy it uses has never really changed much.

It is safe to say that since 2006 the Git ecosystem has changed quite a bit. Repositories tend to be much larger nowadays than they were almost 20 years ago, and large parts of the industry went crazy for monorepos (for various wildly different definitions of "monorepo"). So the maintenance strategy we used back then may no longer be the best fit.

Arguably, most of the maintenance tasks that git-gc(1) does are still perfectly fine today: repacking references, expiring various data structures and things like that tend to not cause huge problems. But the big exception is the way we repack objects.

git-gc(1) by default uses a split strategy: it performs incremental repacks by default, and then whenever we have too many packs we perform a large all-into-one repack. This all-into-one repack is what is causing problems nowadays, as it is an operation that is quite expensive. While it is wasteful in small- and medium-sized repositories, in large repos it may even be prohibitively expensive.

We eventually introduced git-maintenance(1), which was slated as a replacement for git-gc(1). In contrast to git-gc(1), it is much more flexible as it is structured around configurable tasks and strategies. So while its default "gc" strategy still uses git-gc(1) under the hood, it allows us to iterate.

A second strategy it knows about is the "incremental" strategy, which we configure when registering a repository for scheduled maintenance. This strategy isn't really a full replacement for git-gc(1) though, as it doesn't know to expire unused data structures.

In Git 2.52 we have thus introduced a new "geometric" strategy that is a proper replacement for the old git-gc(1). In contrast to the incremental/all-into-one split used by git-gc(1), the new "geometric" strategy maintains a geometric progression of packfiles, which significantly reduces the number of all-into-one repacks that we have to perform in large repositories. It is thus a much better fit for large repositories than git-gc(1).

Note that the "geometric" strategy isn't perfect though: while we perform far fewer all-into-one repacks compared to git-gc(1), we still have to perform them eventually. But for the largest repositories out there this may not be an option either, as client machines might not be powerful enough to perform such a repack in the first place. These cases would thus still be covered by the "incremental" strategy.

Switch the default strategy away from "gc" to "geometric", but retain the "incremental" strategy configured when registering background maintenance with `git maintenance register`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 | builtin/maintenance: introduce "geometric" strategy | Patrick Steinhardt

We have two different repacking strategies in Git:

- The "gc" strategy uses git-gc(1).
- The "incremental" strategy uses multi-pack indices and `git multi-pack-index repack` to merge together smaller packfiles as determined by a specific batch size.

The former strategy is our old and trusted default, whereas the latter has historically been used for our scheduled maintenance. But both strategies have their shortcomings:

- The "gc" strategy performs regular all-into-one repacks. Furthermore it is rather inflexible, as it is not easily possible for a user to enable or disable specific subtasks.
- The "incremental" strategy is not a full replacement for the "gc" strategy as it doesn't know to prune stale data.

So today, we don't have a strategy that is well-suited for large repos while being a full replacement for the "gc" strategy.

Introduce a new "geometric" strategy that aims to fill this gap. This strategy invokes all the usual cleanup tasks that git-gc(1) does like pruning reflogs and rerere caches as well as stale worktrees. But where it differs from both the "gc" and "incremental" strategy is that it uses our geometric repacking infrastructure exposed by git-repack(1) to repack packfiles. The advantage of geometric repacking is that we only need to perform an all-into-one repack when the object count in a repo has grown significantly.

One downside of this strategy is that pruning of unreferenced objects is not going to happen regularly anymore. Every geometric repack knows to soak up all loose objects regardless of their reachability, and merging two or more packs doesn't consider reachability, either. Consequently, the number of unreachable objects will grow over time.

This is remedied by doing an all-into-one repack instead of a geometric repack whenever we determine that the geometric repack would end up merging all packfiles anyway. This all-into-one repack then performs our usual reachability checks and writes unreachable objects into a cruft pack. As cruft packs won't ever be merged during geometric repacks we can thus phase out these objects over time.

Of course, this still means that we retain unreachable objects for far longer than with the "gc" strategy. But the maintenance strategy is intended especially for large repositories, where the basic assumption is that the set of unreachable objects will be significantly dwarfed by the number of reachable objects.

If this assumption is ever proven to be too disadvantageous we could for example introduce a time-based strategy: if the largest packfile has not been touched for longer than $T, we perform an all-into-one repack. But for now, such a mechanism is deferred into the future as it is not clear yet whether it is needed in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 | builtin/maintenance: make "gc" strategy accessible | Patrick Steinhardt

While the user can pick the "incremental" maintenance strategy, it is not possible to explicitly use the "gc" strategy. This has two downsides:

- It is impossible to use the default "gc" strategy for a specific repository when the strategy was globally set to a different strategy.
- It is not possible to use git-gc(1) for scheduled maintenance.

Address these issues by making the "gc" strategy configurable. Furthermore, extend the strategy so that git-gc(1) runs for both manual and scheduled maintenance.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 | builtin/maintenance: extend "maintenance.strategy" to manual maintenance | Patrick Steinhardt

The "maintenance.strategy" configuration allows users to configure how Git is supposed to perform repository maintenance. The idea is that we provide a set of high-level strategies that may be useful in different contexts, like for example when handling a large monorepo. Furthermore, the strategy can be tweaked by the user by overriding specific tasks.

In its current form though, the strategy only applies to scheduled maintenance. This creates something of a gap, as scheduled and manual maintenance will now use _different_ strategies as the latter would continue to use git-gc(1) by default. For one, this makes the strategies far less useful than they could be. But even more importantly, the two different strategies might clash with one another, where one of the strategies performs maintenance in such a way that it discards benefits from the other strategy.

So ideally, it should be possible to pick one strategy that then applies globally to all the different ways that we perform maintenance. This doesn't necessarily mean that the strategy always does the _same_ thing for every maintenance type. But it means that the strategy can configure the different types to work in tandem with each other.

Change the meaning of "maintenance.strategy" accordingly so that the strategy is applied to both types, manual and scheduled. As preceding commits have introduced logic to run maintenance tasks depending on this type we can tweak strategies so that they perform those tasks depending on the context.

Note that this raises the question of backwards compatibility: when the user has configured the "incremental" strategy we would have ignored that strategy beforehand. Instead, repository maintenance would have continued to use git-gc(1) by default. But luckily, we can match that behaviour by:

- Keeping all current tasks of the incremental strategy as `MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not run during manual maintenance.
- Configuring the "gc" task so that it is invoked during manual maintenance.

Like this, the user shouldn't observe any difference in behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 | builtin/maintenance: make the geometric factor configurable | Patrick Steinhardt

The geometric repacking task uses a factor of two for its geometric sequence, meaning that each next pack must contain at least twice as many objects as the next-smaller one. In some cases it may be helpful to configure this factor though to reduce the number of packfile merges even further, e.g. in very big repositories.

But while git-repack(1) itself supports doing this, the maintenance task does not give us a way to tune it. Introduce a new "maintenance.geometric-repack.splitFactor" configuration to plug this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 | builtin/maintenance: introduce "geometric-repack" task | Patrick Steinhardt

Introduce a new "geometric-repack" task. This task uses our geometric repack infrastructure as provided by git-repack(1) itself, a strategy that hosting providers in particular tend to use to amortize the costs of repacking objects.

There is one issue though with geometric repacks, namely that they unconditionally pack all loose objects, regardless of whether or not they are reachable. This is done because it means that we can completely skip the reachability step, which significantly speeds up the operation. But it has the big downside that we are unable to expire objects over time.

To address this issue we thus use a split strategy in this new task: whenever a geometric repack would merge together all packs, we instead do an all-into-one repack. By default, these all-into-one repacks have cruft packs enabled, so unreachable objects would now be written into their own pack. Consequently, they won't be soaked up during geometric repacking anymore and can be expired with the next full repack, assuming that their expiry date has passed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
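
The split strategy described in this commit can be sketched as a small Python model. This is a simplified illustration under stated assumptions, not the code used by git-repack(1) or git-maintenance(1): packs are represented only by their object counts, reachability and cruft packs are not modelled, and the names `plan_geometric_repack`, `merged`, `kept` and `all_into_one` are hypothetical.

```python
def plan_geometric_repack(object_counts, factor=2):
    """Simplified model of a geometric repack decision (illustrative only).

    Returns (merged, kept, all_into_one): the packs that would be rolled up
    into one new pack, the packs left untouched, and whether the roll-up
    would cover every pack. The last case is where the task described above
    would switch to an all-into-one repack instead.
    """
    packs = sorted(object_counts)

    # Walk from the largest pack downwards and find the first place where a
    # pack holds fewer than `factor` times the objects of its next-smaller
    # neighbour. Everything up to and including that pack gets rolled up.
    split = 0
    for i in range(len(packs) - 1, 0, -1):
        if packs[i] < factor * packs[i - 1]:
            split = i + 1
            break

    # The rolled-up pack may itself be too large relative to the next pack,
    # so keep absorbing packs until the progression holds again.
    total = sum(packs[:split])
    while split < len(packs) and packs[split] < factor * total:
        total += packs[split]
        split += 1

    merged, kept = packs[:split], packs[split:]
    all_into_one = bool(merged) and not kept
    return merged, kept, all_into_one
```

In this model, object counts of 100, 1, 2 and 1 lead to the three small packs being rolled up while the large pack is left alone. Counts of 10, 10 and 15 would merge every pack, which corresponds to the case where the task performs an all-into-one repack.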
2025-05-07 | builtin/maintenance: introduce "rerere-gc" task | Patrick Steinhardt

While git-gc(1) knows to garbage collect the rerere cache, git-maintenance(1) does not yet have a task for this cleanup. Introduce a new "rerere-gc" task to plug this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 | builtin/maintenance: introduce "worktree-prune" task | Patrick Steinhardt

While git-gc(1) knows to prune stale worktrees, git-maintenance(1) does not yet have a task for this cleanup. Introduce a new "worktree-prune" task to plug this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-16 | Merge branch 'ps/maintenance-reflog-expire' | Junio C Hamano

"git maintenance" learns a new task to expire reflog entries.

* ps/maintenance-reflog-expire:
  builtin/maintenance: introduce "reflog-expire" task
  builtin/gc: split out function to expire reflog entries
  builtin/reflog: make functions regarding `reflog_expire_options` public
  builtin/reflog: stop storing per-reflog expiry dates globally
  builtin/reflog: stop storing default reflog expiry dates globally
  reflog: rename `cmd_reflog_expire_cb` to `reflog_expire_options`
2025-04-08 | builtin/maintenance: introduce "reflog-expire" task | Patrick Steinhardt

By default, git-maintenance(1) uses the "gc" task to ensure that the repository is well-maintained. This can be changed, for example by either explicitly configuring which tasks should be enabled or by using the "incremental" maintenance strategy. If so, git-maintenance(1) does not know to expire reflog entries, which is a subtask that git-gc(1) knows to perform for the user. Consequently, the reflog will grow indefinitely unless the user manually trims it.

Introduce a new "reflog-expire" task that plugs this gap:

- When running the task directly, we simply execute `git reflog expire --all`, which is the same as git-gc(1).
- When running git-maintenance(1) with the `--auto` flag, we only run the task in case the "HEAD" reflog has at least N reflog entries that would be discarded. By default, N is set to 100, but this can be configured via "maintenance.reflog-expire.auto". When a negative integer has been provided we always expire entries, zero causes us to never expire entries, and a positive value specifies how many entries need to exist before we consider pruning the entries.

Note that the condition for the `--auto` flag is merely a heuristic and optimized for being fast. This is because `git maintenance run --auto` will be executed quite regularly, so scanning through all reflogs would likely be too expensive in many repositories.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
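
The `--auto` condition described in this commit can be written down as a short Python sketch. It only models the decision as stated in the message; how the number of expirable "HEAD" reflog entries is determined is not shown, and the names `should_expire_reflog`, `auto_threshold` and `expirable_head_entries` are hypothetical.

```python
def should_expire_reflog(auto_threshold, expirable_head_entries):
    """Model of the `--auto` heuristic for the "reflog-expire" task.

    `auto_threshold` stands for the value of "maintenance.reflog-expire.auto"
    (100 by default according to the commit message), and
    `expirable_head_entries` for the number of "HEAD" reflog entries that
    would be discarded.
    """
    if auto_threshold < 0:
        return True   # negative: always expire
    if auto_threshold == 0:
        return False  # zero: never expire
    # positive: run once at least that many entries would be discarded
    return expirable_head_entries >= auto_threshold
```

With the default of 100, this model skips the task at 99 expirable entries and runs it at 100.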
2025-03-23 | maintenance: add loose-objects.batchSize config | Derrick Stolee

The 'loose-objects' task of 'git maintenance run' first deletes loose objects that exist within packfiles and then collects loose objects into a packfile. This second step uses an implicit limit of fifty thousand that cannot be modified by users.

Add a new config option that allows this limit to be adjusted or ignored entirely.

While creating tests for this option, I noticed that there was actually an off-by-one error due to the strict comparison in the limit check. I considered making the limit check turn true on equality, but instead I thought to use INT_MAX as a "no limit" barrier which should mean it's never possible to hit the limit. Thus, a new decrement to the limit is provided if the value is positive. (The restriction to positive values is to avoid underflow if INT_MIN is configured.)

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
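
The off-by-one and its fix can be illustrated with a small Python model. This is a sketch under assumptions rather than the actual implementation: it assumes that the limit check runs after each object is handed over, that the comparison is a strict "greater than", and that a configured value of zero is what selects the INT_MAX "no limit" barrier. The names `effective_limit` and `objects_packed` are hypothetical.

```python
INT_MAX = 2**31 - 1  # stand-in for C's INT_MAX


def effective_limit(batch_size):
    """Translate the configured batch size into the value used by the check."""
    if batch_size == 0:
        return INT_MAX         # assumption: zero means "no limit"
    if batch_size > 0:
        return batch_size - 1  # decrement so the strict check stops on time
    return batch_size          # never decrement non-positive values


def objects_packed(loose_count, batch_size, decrement=True):
    """Count how many loose objects end up in the pack in this model."""
    limit = effective_limit(batch_size) if decrement else batch_size
    packed = 0
    for _ in range(loose_count):
        packed += 1
        if packed > limit:     # strict comparison, as in the commit message
            break
    return packed
```

In this model, ten loose objects with a batch size of three yield four packed objects without the decrement and three with it.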
2025-01-21 | doc: use .adoc extension for AsciiDoc files | brian m. carlson

We presently use the ".txt" extension for our AsciiDoc files. While not wrong, most editors do not associate this extension with AsciiDoc, meaning that contributors don't get automatic editor functionality that could be useful, such as syntax highlighting and prose linting.

It is much more common to use the ".adoc" extension for AsciiDoc files, since this helps editors automatically detect files and also allows various forges to provide rich (HTML-like) rendering. Let's do that here, renaming all of the files and updating the includes where relevant. Adjust the various build scripts and makefiles to use the new extension as well.

Note that this should not result in any user-visible changes to the documentation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>