aboutsummaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorJunio C Hamano <gitster@pobox.com>2022-06-03 14:30:37 -0700
committerJunio C Hamano <gitster@pobox.com>2022-06-03 14:30:37 -0700
commita50036da1a39806a8ae1aba2e2f2fea6f7fb8e08 (patch)
tree1e6c184852ee5a5e5860073c04ea73e034b0060f /Documentation
parent37d4ae58efcc9f716435f327c39d5552aedb4b7c (diff)
parenta613164257b46700ca583bdcab160c712ad392fe (diff)
downloadgit-a50036da1a39806a8ae1aba2e2f2fea6f7fb8e08.tar.xz
Merge branch 'tb/cruft-packs'
A mechanism to pack unreachable objects into a "cruft pack", instead of ejecting them into loose form to be reclaimed later, has been introduced. * tb/cruft-packs: sha1-file.c: don't freshen cruft packs builtin/gc.c: conditionally avoid pruning objects via loose builtin/repack.c: add cruft packs to MIDX during geometric repack builtin/repack.c: use named flags for existing_packs builtin/repack.c: allow configuring cruft pack generation builtin/repack.c: support generating a cruft pack builtin/pack-objects.c: --cruft with expiration reachable: report precise timestamps from objects in cruft packs reachable: add options to add_unseen_recent_objects_to_traversal builtin/pack-objects.c: --cruft without expiration builtin/pack-objects.c: return from create_object_entry() t/helper: add 'pack-mtimes' test-tool pack-mtimes: support writing pack .mtimes files chunk-format.h: extract oid_version() pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' pack-mtimes: support reading .mtimes files Documentation/technical: add cruft-packs.txt
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/Makefile1
-rw-r--r--Documentation/config/gc.txt21
-rw-r--r--Documentation/config/repack.txt9
-rw-r--r--Documentation/git-gc.txt5
-rw-r--r--Documentation/git-pack-objects.txt30
-rw-r--r--Documentation/git-repack.txt11
-rw-r--r--Documentation/technical/cruft-packs.txt123
-rw-r--r--Documentation/technical/pack-format.txt19
8 files changed, 212 insertions, 7 deletions
diff --git a/Documentation/Makefile b/Documentation/Makefile
index d3f043f50d..f2e7fc1daa 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -95,6 +95,7 @@ TECH_DOCS += MyFirstObjectWalk
TECH_DOCS += SubmittingPatches
TECH_DOCS += ToolsForGit
TECH_DOCS += technical/bundle-format
+TECH_DOCS += technical/cruft-packs
TECH_DOCS += technical/hash-function-transition
TECH_DOCS += technical/http-protocol
TECH_DOCS += technical/index-format
diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index c834e07991..38fea076a2 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -81,14 +81,21 @@ gc.packRefs::
to enable it within all non-bare repos or it can be set to a
boolean value. The default is `true`.
+gc.cruftPacks::
+ Store unreachable objects in a cruft pack (see
+ linkgit:git-repack[1]) instead of as loose objects. The default
+ is `false`.
+
gc.pruneExpire::
- When 'git gc' is run, it will call 'prune --expire 2.weeks.ago'.
- Override the grace period with this config variable. The value
- "now" may be used to disable this grace period and always prune
- unreachable objects immediately, or "never" may be used to
- suppress pruning. This feature helps prevent corruption when
- 'git gc' runs concurrently with another process writing to the
- repository; see the "NOTES" section of linkgit:git-gc[1].
+ When 'git gc' is run, it will call 'prune --expire 2.weeks.ago'
+ (and 'repack --cruft --cruft-expiration 2.weeks.ago' if using
+ cruft packs via `gc.cruftPacks` or `--cruft`). Override the
+ grace period with this config variable. The value "now" may be
+ used to disable this grace period and always prune unreachable
+ objects immediately, or "never" may be used to suppress pruning.
+ This feature helps prevent corruption when 'git gc' runs
+ concurrently with another process writing to the repository; see
+ the "NOTES" section of linkgit:git-gc[1].
gc.worktreePruneExpire::
When 'git gc' is run, it calls
diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt
index 41ac6953c8..c79af6d7b8 100644
--- a/Documentation/config/repack.txt
+++ b/Documentation/config/repack.txt
@@ -30,3 +30,12 @@ repack.updateServerInfo::
If set to false, linkgit:git-repack[1] will not run
linkgit:git-update-server-info[1]. Defaults to true. Can be overridden
when true by the `-n` option of linkgit:git-repack[1].
+
+repack.cruftWindow::
+repack.cruftWindowMemory::
+repack.cruftDepth::
+repack.cruftThreads::
+ Parameters used by linkgit:git-pack-objects[1] when generating
+ a cruft pack and the respective parameters are not given over
+ the command line. See similarly named `pack.*` configuration
+ variables for defaults and meaning.
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 853967dea0..ba4e67700e 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -54,6 +54,11 @@ other housekeeping tasks (e.g. rerere, working trees, reflog...) will
be performed as well.
+--cruft::
+ When expiring unreachable objects, pack them separately into a
+ cruft pack instead of storing the loose objects as loose
+ objects.
+
--prune=<date>::
Prune loose objects older than date (default is 2 weeks ago,
overridable by the config variable `gc.pruneExpire`).
diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index f8344e1e5b..a9995a932c 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -13,6 +13,7 @@ SYNOPSIS
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=<n>] [--depth=<n>]
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
+ [--cruft] [--cruft-expiration=<time>]
[--stdout [--filter=<filter-spec>] | <base-name>]
[--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
@@ -95,6 +96,35 @@ base-name::
Incompatible with `--revs`, or options that imply `--revs` (such as
`--all`), with the exception of `--unpacked`, which is compatible.
+--cruft::
+ Packs unreachable objects into a separate "cruft" pack, denoted
+ by the existence of a `.mtimes` file. Typically used by `git
+ repack --cruft`. Callers provide a list of pack names and
+ indicate which packs will remain in the repository, along with
+ which packs will be deleted (indicated by the `-` prefix). The
+ contents of the cruft pack are all objects not contained in the
+ surviving packs which have not exceeded the grace period (see
+ `--cruft-expiration` below), or which have exceeded the grace
+ period, but are reachable from an other object which hasn't.
++
+When the input lists a pack containing all reachable objects (and lists
+all other packs as pending deletion), the corresponding cruft pack will
+contain all unreachable objects (with mtime newer than the
+`--cruft-expiration`) along with any unreachable objects whose mtime is
+older than the `--cruft-expiration`, but are reachable from an
+unreachable object whose mtime is newer than the `--cruft-expiration`).
++
+Incompatible with `--unpack-unreachable`, `--keep-unreachable`,
+`--pack-loose-unreachable`, `--stdin-packs`, as well as any other
+options which imply `--revs`. Also incompatible with `--max-pack-size`;
+when this option is set, the maximum pack size is not inferred from
+`pack.packSizeLimit`.
+
+--cruft-expiration=<approxidate>::
+ If specified, objects are eliminated from the cruft pack if they
+ have an mtime older than `<approxidate>`. If unspecified (and
+ given `--cruft`), then no objects are eliminated.
+
--window=<n>::
--depth=<n>::
These two options affect how the objects contained in
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ee30edc178..0bf13893d8 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -63,6 +63,17 @@ to the new separate pack will be written.
Also run 'git prune-packed' to remove redundant
loose object files.
+--cruft::
+ Same as `-a`, unless `-d` is used. Then any unreachable objects
+ are packed into a separate cruft pack. Unreachable objects can
+ be pruned using the normal expiry rules with the next `git gc`
+ invocation (see linkgit:git-gc[1]). Incompatible with `-k`.
+
+--cruft-expiration=<approxidate>::
+ Expire unreachable objects older than `<approxidate>`
+ immediately instead of waiting for the next `git gc` invocation.
+ Only useful with `--cruft -d`.
+
-l::
Pass the `--local` option to 'git pack-objects'. See
linkgit:git-pack-objects[1].
diff --git a/Documentation/technical/cruft-packs.txt b/Documentation/technical/cruft-packs.txt
new file mode 100644
index 0000000000..d81f3a8982
--- /dev/null
+++ b/Documentation/technical/cruft-packs.txt
@@ -0,0 +1,123 @@
+= Cruft packs
+
+The cruft packs feature offer an alternative to Git's traditional mechanism of
+removing unreachable objects. This document provides an overview of Git's
+pruning mechanism, and how a cruft pack can be used instead to accomplish the
+same.
+
+== Background
+
+To remove unreachable objects from your repository, Git offers `git repack -Ad`
+(see linkgit:git-repack[1]). Quoting from the documentation:
+
+[quote]
+[...] unreachable objects in a previous pack become loose, unpacked objects,
+instead of being left in the old pack. [...] loose unreachable objects will be
+pruned according to normal expiry rules with the next 'git gc' invocation.
+
+Unreachable objects aren't removed immediately, since doing so could race with
+an incoming push which may reference an object which is about to be deleted.
+Instead, those unreachable objects are stored as loose objects and stay that way
+until they are older than the expiration window, at which point they are removed
+by linkgit:git-prune[1].
+
+Git must store these unreachable objects loose in order to keep track of their
+per-object mtimes. If these unreachable objects were written into one big pack,
+then either freshening that pack (because an object contained within it was
+re-written) or creating a new pack of unreachable objects would cause the pack's
+mtime to get updated, and the objects within it would never leave the expiration
+window. Instead, objects are stored loose in order to keep track of the
+individual object mtimes and avoid a situation where all cruft objects are
+freshened at once.
+
+This can lead to undesirable situations when a repository contains many
+unreachable objects which have not yet left the grace period. Having large
+directories in the shards of `.git/objects` can lead to decreased performance in
+the repository. But given enough unreachable objects, this can lead to inode
+starvation and degrade the performance of the whole system. Since we
+can never pack those objects, these repositories often take up a large amount of
+disk space, since we can only zlib compress them, but not store them in delta
+chains.
+
+== Cruft packs
+
+A cruft pack eliminates the need for storing unreachable objects in a loose
+state by including the per-object mtimes in a separate file alongside a single
+pack containing all loose objects.
+
+A cruft pack is written by `git repack --cruft` when generating a new pack.
+linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
+is a classic all-into-one repack, meaning that everything in the resulting pack is
+reachable, and everything else is unreachable. Once written, the `--cruft`
+option instructs `git repack` to generate another pack containing only objects
+not packed in the previous step (which equates to packing all unreachable
+objects together). This progresses as follows:
+
+ 1. Enumerate every object, marking any object which is (a) not contained in a
+ kept-pack, and (b) whose mtime is within the grace period as a traversal
+ tip.
+
+ 2. Perform a reachability traversal based on the tips gathered in the previous
+ step, adding every object along the way to the pack.
+
+ 3. Write the pack out, along with a `.mtimes` file that records the per-object
+ timestamps.
+
+This mode is invoked internally by linkgit:git-repack[1] when instructed to
+write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
+of packs which will not be deleted by the repack; in other words, they contain
+all of the repository's reachable objects.
+
+When a repository already has a cruft pack, `git repack --cruft` typically only
+adds objects to it. An exception to this is when `git repack` is given the
+`--cruft-expiration` option, which allows the generated cruft pack to omit
+expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
+later on.
+
+It is linkgit:git-gc[1] that is typically responsible for removing expired
+unreachable objects.
+
+== Caution for mixed-version environments
+
+Repositories that have cruft packs in them will continue to work with any older
+version of Git. Note, however, that previous versions of Git which do not
+understand the `.mtimes` file will use the cruft pack's mtime as the mtime for
+all of the objects in it. In other words, do not expect older (pre-cruft pack)
+versions of Git to interpret or even read the contents of the `.mtimes` file.
+
+Note that having mixed versions of Git GC-ing the same repository can lead to
+unreachable objects never being completely pruned. This can happen under the
+following circumstances:
+
+ - An older version of Git running GC explodes the contents of an existing
+ cruft pack loose, using the cruft pack's mtime.
+ - A newer version running GC collects those loose objects into a cruft pack,
+ where the .mtime file reflects the loose object's actual mtimes, but the
+ cruft pack mtime is "now".
+
+Repeating this process will lead to unreachable objects not getting pruned as a
+result of repeatedly resetting the objects' mtimes to the present time.
+
+If you are GC-ing repositories in a mixed version environment, consider omitting
+the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and
+leaving the `gc.cruftPacks` configuration unset until all writers understand
+cruft packs.
+
+== Alternatives
+
+Notable alternatives to this design include:
+
+ - The location of the per-object mtime data, and
+ - Storing unreachable objects in multiple cruft packs.
+
+On the location of mtime data, a new auxiliary file tied to the pack was chosen
+to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
+support for optional chunks of data, it may make sense to consolidate the
+`.mtimes` format into the `.idx` itself.
+
+Storing unreachable objects among multiple cruft packs (e.g., creating a new
+cruft pack during each repacking operation including only unreachable objects
+which aren't already stored in an earlier cruft pack) is significantly more
+complicated to construct, and so aren't pursued here. The obvious drawback to
+the current implementation is that the entire cruft pack must be re-written from
+scratch.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 6d3efb7d16..b520aa9c45 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -294,6 +294,25 @@ Pack file entry: <+
All 4-byte numbers are in network order.
+== pack-*.mtimes files have the format:
+
+All 4-byte numbers are in network byte order.
+
+ - A 4-byte magic number '0x4d544d45' ('MTME').
+
+ - A 4-byte version identifier (= 1).
+
+ - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
+
+ - A table of 4-byte unsigned integers. The ith value is the
+ modification time (mtime) of the ith object in the corresponding
+ pack by lexicographic (index) order. The mtimes count standard
+ epoch seconds.
+
+ - A trailer, containing a checksum of the corresponding packfile,
+ and a checksum of all of the above (each having length according
+ to the specified hash function).
+
== multi-pack-index (MIDX) files have the following format:
The multi-pack-index files refer to multiple pack-files and loose objects.