aboutsummaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorJunio C Hamano <gitster@pobox.com>2025-10-22 11:38:58 -0700
committerJunio C Hamano <gitster@pobox.com>2025-10-22 11:38:58 -0700
commit98401c10fc9e991b36c4ccf5a270746f123feeb6 (patch)
tree83d4cb6fc595ed0db95d680c28178f8679e8af46 /Documentation
parentc9ccf81948973e9b9632cbb483a3908307092620 (diff)
parentdb00605c13a9f5709da712671df5c7594c06cf31 (diff)
downloadgit-98401c10fc9e991b36c4ccf5a270746f123feeb6.tar.xz
Merge branch 'bc/sha1-256-interop-01'
The beginning of SHA1-SHA256 interoperability work. * bc/sha1-256-interop-01: t1010: use BROKEN_OBJECTS prerequisite t: allow specifying compatibility hash fsck: consider gpgsig headers expected in tags rev-parse: allow printing compatibility hash docs: add documentation for loose objects docs: improve ambiguous areas of pack format documentation docs: reflect actual double signature for tags docs: update offset order for pack index v3 docs: update pack index v3 format
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/Makefile1
-rw-r--r--Documentation/fsck-msgids.adoc6
-rw-r--r--Documentation/git-rev-parse.adoc11
-rw-r--r--Documentation/gitformat-loose.adoc53
-rw-r--r--Documentation/gitformat-pack.adoc19
-rw-r--r--Documentation/meson.build1
-rw-r--r--Documentation/technical/hash-function-transition.adoc42
7 files changed, 109 insertions, 24 deletions
diff --git a/Documentation/Makefile b/Documentation/Makefile
index a3fbd29744..627204928e 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc
MAN5_TXT += gitformat-chunk.adoc
MAN5_TXT += gitformat-commit-graph.adoc
MAN5_TXT += gitformat-index.adoc
+MAN5_TXT += gitformat-loose.adoc
MAN5_TXT += gitformat-pack.adoc
MAN5_TXT += gitformat-signature.adoc
MAN5_TXT += githooks.adoc
diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc
index 81f11ba125..acac9683af 100644
--- a/Documentation/fsck-msgids.adoc
+++ b/Documentation/fsck-msgids.adoc
@@ -10,6 +10,12 @@
`badFilemode`::
(INFO) A tree contains a bad filemode entry.
+`badGpgsig`::
+ (ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
+
+`badHeaderContinuation`::
+ (ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
+
`badName`::
(ERROR) An author/committer name is empty.
diff --git a/Documentation/git-rev-parse.adoc b/Documentation/git-rev-parse.adoc
index 18383e52af..5398691f3f 100644
--- a/Documentation/git-rev-parse.adoc
+++ b/Documentation/git-rev-parse.adoc
@@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`:
path of the current directory relative to the top-level
directory.
---show-object-format[=(storage|input|output)]::
- Show the object format (hash algorithm) used for the repository
- for storage inside the `.git` directory, input, or output. For
- input, multiple algorithms may be printed, space-separated.
- If not specified, the default is "storage".
+--show-object-format[=(storage|input|output|compat)]::
+ Show the object format (hash algorithm) used for the repository for storage
+ inside the `.git` directory, input, output, or compatibility. For input,
+ multiple algorithms may be printed, space-separated. If `compat` is
+ requested and no compatibility algorithm is enabled, prints an empty line. If
+ not specified, the default is "storage".
--show-ref-format::
Show the reference storage format used for the repository.
diff --git a/Documentation/gitformat-loose.adoc b/Documentation/gitformat-loose.adoc
new file mode 100644
index 0000000000..947993663e
--- /dev/null
+++ b/Documentation/gitformat-loose.adoc
@@ -0,0 +1,53 @@
+gitformat-loose(5)
+==================
+
+NAME
+----
+gitformat-loose - Git loose object format
+
+
+SYNOPSIS
+--------
+[verse]
+$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+
+DESCRIPTION
+-----------
+
+Loose objects are how Git stores individual objects, where every object is
+written as a separate file.
+
+Over the lifetime of a repository, objects are usually written as loose objects
+initially. Eventually, these loose objects will be compacted into packfiles
+via repository maintenance to improve disk space usage and speed up the lookup
+of these objects.
+
+== Loose objects
+
+Each loose object contains a prefix, followed immediately by the data of the
+object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
+`tree`, `commit`, or `tag` and `size` is the size of the data (without the
+prefix) as a decimal integer expressed in ASCII.
+
+The entire contents, prefix and data concatenated, is then compressed with zlib
+and the compressed data is stored in the file. The object ID of the object is
+the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
+
+The file for the loose object is stored under the `objects` directory, with the
+first two hex characters of the object ID being the directory and the remaining
+characters being the file name. This is done to shard the data and avoid too
+many files being in one directory, since some file systems perform poorly with
+many items in a directory.
+
+As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
+and, in a SHA-256 repository, would have the object ID
+`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
+stored under
+`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
+
+Similarly, a blob containing the contents `abc` would have the uncompressed
+data of `blob 3\0abc`.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/gitformat-pack.adoc b/Documentation/gitformat-pack.adoc
index d6ae229be5..1b4db4aa61 100644
--- a/Documentation/gitformat-pack.adoc
+++ b/Documentation/gitformat-pack.adoc
@@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums,
and object IDs (object names) mentioned below are all computed using SHA-1.
Similarly, in SHA-256 repositories, these values are computed using SHA-256.
+CRC32 checksums are always computed over the entire packed object, including
+the header (n-byte type and length); the base object name or offset, if any;
+and the entire compressed object. The CRC32 algorithm used is that of zlib.
+
== pack-*.pack files have the following format:
- A header appears at the beginning and consists of the following:
@@ -80,6 +84,16 @@ Valid object types are:
Type 5 is reserved for future expansion. Type 0 is invalid.
+=== Object encoding
+
+Unlike loose objects, packed objects do not have a prefix containing the type,
+size, and a NUL byte. These are not necessary because they can be determined by
+the n-byte type and length that prefixes the data and so they are omitted from
+the compressed and deltified data.
+
+The computation of the object ID still uses this prefix by reconstructing it
+from the type and length as needed.
+
=== Size encoding
This document uses the following "size encoding" of non-negative
@@ -92,6 +106,11 @@ values are more significant.
This size encoding should not be confused with the "offset encoding",
which is also used in this document.
+When encoding the size of an undeltified object in a pack, the size is that of
+the uncompressed raw object. For deltified objects, it is the size of the
+uncompressed delta. The base object name or offset is not included in the size
+computation.
+
=== Deltified representation
Conceptually there are only four object types: commit, tree, tag and
diff --git a/Documentation/meson.build b/Documentation/meson.build
index 44f94cdb7b..9d24f2da54 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -173,6 +173,7 @@ manpages = {
'gitformat-chunk.adoc' : 5,
'gitformat-commit-graph.adoc' : 5,
'gitformat-index.adoc' : 5,
+ 'gitformat-loose.adoc' : 5,
'gitformat-pack.adoc' : 5,
'gitformat-signature.adoc' : 5,
'githooks.adoc' : 5,
diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc
index f047fd80ca..2359d7d106 100644
--- a/Documentation/technical/hash-function-transition.adoc
+++ b/Documentation/technical/hash-function-transition.adoc
@@ -227,9 +227,9 @@ network byte order):
** 4-byte length in bytes of shortened object names. This is the
shortest possible length needed to make names in the shortened
object name table unambiguous.
- ** 4-byte integer, recording where tables relating to this format
+ ** 8-byte integer, recording where tables relating to this format
are stored in this index file, as an offset from the beginning.
- * 4-byte offset to the trailer from the beginning of this file.
+ * 8-byte offset to the trailer from the beginning of this file.
* Zero or more additional key/value pairs (4-byte key, 4-byte
value). Only one key is supported: 'PSRC'. See the "Loose objects
and unreachable objects" section for supported values and how this
@@ -260,12 +260,10 @@ network byte order):
compressed data to be copied directly from pack to pack during
repacking without undetected data corruption.
- * A table of 4-byte offset values. For an object in the table of
- sorted shortened object names, the value at the corresponding
- index in this table indicates where that object can be found in
- the pack file. These are usually 31-bit pack file offsets, but
- large offsets are encoded as an index into the next table with the
- most significant bit set.
+ * A table of 4-byte offset values. The index of this table in pack order
+ indicates where that object can be found in the pack file. These are
+ usually 31-bit pack file offsets, but large offsets are encoded as
+ an index into the next table with the most significant bit set.
* A table of 8-byte offset entries (empty for pack files less than
2 GiB). Pack files are organized with heavily used objects toward
@@ -276,10 +274,14 @@ network byte order):
up to and not including the table of CRC32 values.
- Zero or more NUL bytes.
- The trailer consists of the following:
- * A copy of the 20-byte SHA-256 checksum at the end of the
+ * A copy of the full main hash checksum at the end of the
corresponding packfile.
- * 20-byte SHA-256 checksum of all of the above.
+ * Full main hash checksum of all of the above.
+
+The "full main hash" is a full-length hash of the main (not compatibility)
+algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
+a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.
Loose object index
~~~~~~~~~~~~~~~~~~
@@ -427,17 +429,19 @@ ordinary unsigned commit.
Signed Tags
~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
+We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
+allow signing tags in both formats. The in-body signature is used for the
+signature in the current hash algorithm and the header is used for the
+signature in the other algorithm. Thus, a dual-signature tag will contain both
+an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
+object or both an in-body signature and a gpgsig header for the SHA-256 format
+of and object.
-This means tags can be signed
+The signed payload of the tag is the content of the tag in the current
+algorithm with both its gpgsig and gpgsig-sha256 fields and
+"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
- signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
+This means tags can be signed using one or both algorithms.
Mergetag embedding
~~~~~~~~~~~~~~~~~~