| author | Junio C Hamano <gitster@pobox.com> | 2022-06-03 14:30:37 -0700 |
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2022-06-03 14:30:37 -0700 |
| commit | a50036da1a39806a8ae1aba2e2f2fea6f7fb8e08 (patch) | |
| tree | 1e6c184852ee5a5e5860073c04ea73e034b0060f /Documentation | |
| parent | 37d4ae58efcc9f716435f327c39d5552aedb4b7c (diff) | |
| parent | a613164257b46700ca583bdcab160c712ad392fe (diff) | |
| download | git-a50036da1a39806a8ae1aba2e2f2fea6f7fb8e08.tar.xz | |
Merge branch 'tb/cruft-packs'
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.
* tb/cruft-packs:
sha1-file.c: don't freshen cruft packs
builtin/gc.c: conditionally avoid pruning objects via loose
builtin/repack.c: add cruft packs to MIDX during geometric repack
builtin/repack.c: use named flags for existing_packs
builtin/repack.c: allow configuring cruft pack generation
builtin/repack.c: support generating a cruft pack
builtin/pack-objects.c: --cruft with expiration
reachable: report precise timestamps from objects in cruft packs
reachable: add options to add_unseen_recent_objects_to_traversal
builtin/pack-objects.c: --cruft without expiration
builtin/pack-objects.c: return from create_object_entry()
t/helper: add 'pack-mtimes' test-tool
pack-mtimes: support writing pack .mtimes files
chunk-format.h: extract oid_version()
pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
pack-mtimes: support reading .mtimes files
Documentation/technical: add cruft-packs.txt
Diffstat (limited to 'Documentation')
| -rw-r--r-- | Documentation/Makefile | 1 |
| -rw-r--r-- | Documentation/config/gc.txt | 21 |
| -rw-r--r-- | Documentation/config/repack.txt | 9 |
| -rw-r--r-- | Documentation/git-gc.txt | 5 |
| -rw-r--r-- | Documentation/git-pack-objects.txt | 30 |
| -rw-r--r-- | Documentation/git-repack.txt | 11 |
| -rw-r--r-- | Documentation/technical/cruft-packs.txt | 123 |
| -rw-r--r-- | Documentation/technical/pack-format.txt | 19 |
8 files changed, 212 insertions, 7 deletions
diff --git a/Documentation/Makefile b/Documentation/Makefile
index d3f043f50d..f2e7fc1daa 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -95,6 +95,7 @@ TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += ToolsForGit
 TECH_DOCS += technical/bundle-format
+TECH_DOCS += technical/cruft-packs
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index c834e07991..38fea076a2 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -81,14 +81,21 @@ gc.packRefs::
 	to enable it within all non-bare repos or it can be set to a
 	boolean value. The default is `true`.
 
+gc.cruftPacks::
+	Store unreachable objects in a cruft pack (see
+	linkgit:git-repack[1]) instead of as loose objects. The default
+	is `false`.
+
 gc.pruneExpire::
-	When 'git gc' is run, it will call 'prune --expire 2.weeks.ago'.
-	Override the grace period with this config variable. The value
-	"now" may be used to disable this grace period and always prune
-	unreachable objects immediately, or "never" may be used to
-	suppress pruning. This feature helps prevent corruption when
-	'git gc' runs concurrently with another process writing to the
-	repository; see the "NOTES" section of linkgit:git-gc[1].
+	When 'git gc' is run, it will call 'prune --expire 2.weeks.ago'
+	(and 'repack --cruft --cruft-expiration 2.weeks.ago' if using
+	cruft packs via `gc.cruftPacks` or `--cruft`). Override the
+	grace period with this config variable. The value "now" may be
+	used to disable this grace period and always prune unreachable
+	objects immediately, or "never" may be used to suppress pruning.
+	This feature helps prevent corruption when 'git gc' runs
+	concurrently with another process writing to the repository; see
+	the "NOTES" section of linkgit:git-gc[1].
 
 gc.worktreePruneExpire::
 	When 'git gc' is run, it calls
diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt
index 41ac6953c8..c79af6d7b8 100644
--- a/Documentation/config/repack.txt
+++ b/Documentation/config/repack.txt
@@ -30,3 +30,12 @@ repack.updateServerInfo::
 	If set to false, linkgit:git-repack[1] will not run
 	linkgit:git-update-server-info[1]. Defaults to true. Can be overridden
 	when true by the `-n` option of linkgit:git-repack[1].
+
+repack.cruftWindow::
+repack.cruftWindowMemory::
+repack.cruftDepth::
+repack.cruftThreads::
+	Parameters used by linkgit:git-pack-objects[1] when generating
+	a cruft pack and the respective parameters are not given over
+	the command line. See similarly named `pack.*` configuration
+	variables for defaults and meaning.
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 853967dea0..ba4e67700e 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -54,6 +54,11 @@ other housekeeping tasks (e.g. rerere, working trees, reflog...) will
 be performed as well.
 
 
+--cruft::
+	When expiring unreachable objects, pack them separately into a
+	cruft pack instead of storing the loose objects as loose
+	objects.
+
 --prune=<date>::
 	Prune loose objects older than date (default is 2 weeks ago,
 	overridable by the config variable `gc.pruneExpire`).
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index f8344e1e5b..a9995a932c 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 	[--no-reuse-delta] [--delta-base-offset] [--non-empty]
 	[--local] [--incremental] [--window=<n>] [--depth=<n>]
 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
+	[--cruft] [--cruft-expiration=<time>]
 	[--stdout [--filter=<filter-spec>] | <base-name>]
 	[--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
 
@@ -95,6 +96,35 @@ base-name::
 Incompatible with `--revs`, or options that imply `--revs` (such as
 `--all`), with the exception of `--unpacked`, which is compatible.
 
+--cruft::
+	Packs unreachable objects into a separate "cruft" pack, denoted
+	by the existence of a `.mtimes` file. Typically used by `git
+	repack --cruft`. Callers provide a list of pack names and
+	indicate which packs will remain in the repository, along with
+	which packs will be deleted (indicated by the `-` prefix). The
+	contents of the cruft pack are all objects not contained in the
+	surviving packs which have not exceeded the grace period (see
+	`--cruft-expiration` below), or which have exceeded the grace
+	period, but are reachable from an other object which hasn't.
++
+When the input lists a pack containing all reachable objects (and lists
+all other packs as pending deletion), the corresponding cruft pack will
+contain all unreachable objects (with mtime newer than the
+`--cruft-expiration`) along with any unreachable objects whose mtime is
+older than the `--cruft-expiration`, but are reachable from an
+unreachable object whose mtime is newer than the `--cruft-expiration`).
++
+Incompatible with `--unpack-unreachable`, `--keep-unreachable`,
+`--pack-loose-unreachable`, `--stdin-packs`, as well as any other
+options which imply `--revs`. Also incompatible with `--max-pack-size`;
+when this option is set, the maximum pack size is not inferred from
+`pack.packSizeLimit`.
+
+--cruft-expiration=<approxidate>::
+	If specified, objects are eliminated from the cruft pack if they
+	have an mtime older than `<approxidate>`. If unspecified (and
+	given `--cruft`), then no objects are eliminated.
+
 --window=<n>::
 --depth=<n>::
 	These two options affect how the objects contained in
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ee30edc178..0bf13893d8 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -63,6 +63,17 @@ to the new separate pack will be written.
 	Also run 'git prune-packed' to remove redundant
 	loose object files.
 
+--cruft::
+	Same as `-a`, unless `-d` is used. Then any unreachable objects
+	are packed into a separate cruft pack. Unreachable objects can
+	be pruned using the normal expiry rules with the next `git gc`
+	invocation (see linkgit:git-gc[1]). Incompatible with `-k`.
+
+--cruft-expiration=<approxidate>::
+	Expire unreachable objects older than `<approxidate>`
+	immediately instead of waiting for the next `git gc` invocation.
+	Only useful with `--cruft -d`.
+
 -l::
 	Pass the `--local` option to 'git pack-objects'. See
 	linkgit:git-pack-objects[1].
diff --git a/Documentation/technical/cruft-packs.txt b/Documentation/technical/cruft-packs.txt
new file mode 100644
index 0000000000..d81f3a8982
--- /dev/null
+++ b/Documentation/technical/cruft-packs.txt
@@ -0,0 +1,123 @@
+= Cruft packs
+
+The cruft packs feature offer an alternative to Git's traditional mechanism of
+removing unreachable objects. This document provides an overview of Git's
+pruning mechanism, and how a cruft pack can be used instead to accomplish the
+same.
+
+== Background
+
+To remove unreachable objects from your repository, Git offers `git repack -Ad`
+(see linkgit:git-repack[1]). Quoting from the documentation:
+
+[quote]
+[...] unreachable objects in a previous pack become loose, unpacked objects,
+instead of being left in the old pack. [...] loose unreachable objects will be
+pruned according to normal expiry rules with the next 'git gc' invocation.
+
+Unreachable objects aren't removed immediately, since doing so could race with
+an incoming push which may reference an object which is about to be deleted.
+Instead, those unreachable objects are stored as loose objects and stay that way
+until they are older than the expiration window, at which point they are removed
+by linkgit:git-prune[1].
+
+Git must store these unreachable objects loose in order to keep track of their
+per-object mtimes. If these unreachable objects were written into one big pack,
+then either freshening that pack (because an object contained within it was
+re-written) or creating a new pack of unreachable objects would cause the pack's
+mtime to get updated, and the objects within it would never leave the expiration
+window. Instead, objects are stored loose in order to keep track of the
+individual object mtimes and avoid a situation where all cruft objects are
+freshened at once.
+
+This can lead to undesirable situations when a repository contains many
+unreachable objects which have not yet left the grace period. Having large
+directories in the shards of `.git/objects` can lead to decreased performance in
+the repository. But given enough unreachable objects, this can lead to inode
+starvation and degrade the performance of the whole system. Since we
+can never pack those objects, these repositories often take up a large amount of
+disk space, since we can only zlib compress them, but not store them in delta
+chains.
+
+== Cruft packs
+
+A cruft pack eliminates the need for storing unreachable objects in a loose
+state by including the per-object mtimes in a separate file alongside a single
+pack containing all loose objects.
+
+A cruft pack is written by `git repack --cruft` when generating a new pack.
+linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
+is a classic all-into-one repack, meaning that everything in the resulting pack is
+reachable, and everything else is unreachable. Once written, the `--cruft`
+option instructs `git repack` to generate another pack containing only objects
+not packed in the previous step (which equates to packing all unreachable
+objects together). This progresses as follows:
+
+  1. Enumerate every object, marking any object which is (a) not contained in a
+     kept-pack, and (b) whose mtime is within the grace period as a traversal
+     tip.
+
+  2. Perform a reachability traversal based on the tips gathered in the previous
+     step, adding every object along the way to the pack.
+
+  3. Write the pack out, along with a `.mtimes` file that records the per-object
+     timestamps.
+
+This mode is invoked internally by linkgit:git-repack[1] when instructed to
+write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
+of packs which will not be deleted by the repack; in other words, they contain
+all of the repository's reachable objects.
+
+When a repository already has a cruft pack, `git repack --cruft` typically only
+adds objects to it. An exception to this is when `git repack` is given the
+`--cruft-expiration` option, which allows the generated cruft pack to omit
+expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
+later on.
+
+It is linkgit:git-gc[1] that is typically responsible for removing expired
+unreachable objects.
+
+== Caution for mixed-version environments
+
+Repositories that have cruft packs in them will continue to work with any older
+version of Git. Note, however, that previous versions of Git which do not
+understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
+all of the objects in it. In other words, do not expect older (pre-cruft pack)
+versions of Git to interpret or even read the contents of the `.mtimes` file.
+
+Note that having mixed versions of Git GC-ing the same repository can lead to
+unreachable objects never being completely pruned. This can happen under the
+following circumstances:
+
+  - An older version of Git running GC explodes the contents of an existing
+    cruft pack loose, using the cruft pack's mtime.
+  - A newer version running GC collects those loose objects into a cruft pack,
+    where the .mtime file reflects the loose object's actual mtimes, but the
+    cruft pack mtime is "now".
+
+Repeating this process will lead to unreachable objects not getting pruned as a
+result of repeatedly resetting the objects' mtimes to the present time.
+
+If you are GC-ing repositories in a mixed version environment, consider omitting
+the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
+leaving the `gc.cruftPacks` configuration unset until all writers understand
+cruft packs.
+
+== Alternatives
+
+Notable alternatives to this design include:
+
+  - The location of the per-object mtime data, and
+  - Storing unreachable objects in multiple cruft packs.
+
+On the location of mtime data, a new auxiliary file tied to the pack was chosen
+to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
+support for optional chunks of data, it may make sense to consolidate the
+`.mtimes` format into the `.idx` itself.
+
+Storing unreachable objects among multiple cruft packs (e.g., creating a new
+cruft pack during each repacking operation including only unreachable objects
+which aren't already stored in an earlier cruft pack) is significantly more
+complicated to construct, and so aren't pursued here. The obvious drawback to
+the current implementation is that the entire cruft pack must be re-written from
+scratch.
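
The three-step selection described in cruft-packs.txt (collect traversal tips, walk from them, write the pack with its per-object timestamps) can be illustrated with a toy in-memory model. This is only a sketch and not Git's implementation: `Obj`, `select_cruft`, and the dictionary-based object graph are hypothetical names introduced here. It also assumes that the walk stops at objects already stored in a kept pack, and that "within the grace period" means an mtime at or after the expiration cutoff.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical stand-in for a Git object in this toy model."""
    mtime: int                                 # epoch seconds
    refs: list = field(default_factory=list)   # ids of objects this one points at
    in_kept_pack: bool = False                 # True if stored in a surviving pack


def select_cruft(objects, expiration=None):
    """Return {object id: mtime} for the objects a cruft pack would hold.

    `expiration` is a cutoff in epoch seconds; None models running
    without --cruft-expiration, where no object is eliminated.
    """
    # Step 1: every object outside the kept packs whose mtime is still
    # within the grace period becomes a traversal tip.
    tips = [
        oid for oid, obj in objects.items()
        if not obj.in_kept_pack
        and (expiration is None or obj.mtime >= expiration)
    ]

    # Step 2: walk from the tips, adding every object along the way.
    # Assumption: objects in kept packs are skipped and not walked through.
    selected = {}
    stack = list(tips)
    while stack:
        oid = stack.pop()
        obj = objects.get(oid)
        if obj is None or obj.in_kept_pack or oid in selected:
            continue
        selected[oid] = obj.mtime
        stack.extend(obj.refs)

    # Step 3: the returned mapping stands in for the pack plus its
    # .mtimes file (one timestamp per packed object).
    return selected
```

In this model an expired object survives only when a still-fresh unreachable object refers to it, which matches the rule stated for `--cruft` in git-pack-objects.txt above.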
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 6d3efb7d16..b520aa9c45 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -294,6 +294,25 @@ Pack file entry: <+
 
 All 4-byte numbers are in network order.
 
+== pack-*.mtimes files have the format:
+
+All 4-byte numbers are in network byte order.
+
+  - A 4-byte magic number '0x4d544d45' ('MTME').
+
+  - A 4-byte version identifier (= 1).
+
+  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
+
+  - A table of 4-byte unsigned integers. The ith value is the
+    modification time (mtime) of the ith object in the corresponding
+    pack by lexicographic (index) order. The mtimes count standard
+    epoch seconds.
+
+  - A trailer, containing a checksum of the corresponding packfile,
+    and a checksum of all of the above (each having length according
+    to the specified hash function).
+
 == multi-pack-index (MIDX) files have the following format:
 
 The multi-pack-index files refer to multiple pack-files and loose objects.
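
The `pack-*.mtimes` layout added to pack-format.txt is small enough to decode by hand. The following Python sketch reads the header, the timestamp table, and the trailer as described in the hunk above. `parse_mtimes` and `build_mtimes` are hypothetical helper names, not part of Git. The sketch assumes that the number of table entries is inferred from the file length, and that the final checksum is the named hash function applied to every preceding byte of the file.

```python
import hashlib
import struct

MTIMES_MAGIC = 0x4D544D45                  # 'MTME'
HEADER = struct.Struct(">III")             # magic, version, hash function id
HASHES = {1: hashlib.sha1, 2: hashlib.sha256}


def parse_mtimes(data):
    """Decode the raw bytes of a .mtimes file into a dictionary."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, hash_id = HEADER.unpack_from(data, 0)
    if magic != MTIMES_MAGIC:
        raise ValueError("bad magic number")
    if version != 1:
        raise ValueError("unsupported version")
    if hash_id not in HASHES:
        raise ValueError("unknown hash function identifier")

    hash_len = HASHES[hash_id]().digest_size
    # Assumption: whatever lies between header and trailer is the table.
    table_len = len(data) - HEADER.size - 2 * hash_len
    if table_len < 0 or table_len % 4 != 0:
        raise ValueError("malformed mtimes table")
    count = table_len // 4
    mtimes = list(struct.unpack_from(f">{count}I", data, HEADER.size))

    trailer = data[HEADER.size + table_len:]
    pack_checksum, file_checksum = trailer[:hash_len], trailer[hash_len:]
    # Assumption: the last checksum covers all bytes that precede it.
    expected = HASHES[hash_id](data[:-hash_len]).digest()
    return {
        "version": version,
        "hash_id": hash_id,
        "mtimes": mtimes,                  # ith entry: ith object in index order
        "pack_checksum": pack_checksum,
        "checksum_ok": expected == file_checksum,
    }


def build_mtimes(mtimes, pack_checksum, hash_id=1):
    """Assemble bytes in the same layout (handy for experiments and tests)."""
    body = HEADER.pack(MTIMES_MAGIC, 1, hash_id)
    body += struct.pack(f">{len(mtimes)}I", *mtimes)
    body += pack_checksum
    return body + HASHES[hash_id](body).digest()
```

Because every field is a fixed-width big-endian integer, a single `struct` format string covers the whole table. The order of the timestamps is meaningful only together with the pack's `.idx` file, since the ith value belongs to the ith object in index order.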
