diff options
| author | Junio C Hamano <gitster@pobox.com> | 2025-10-22 11:38:58 -0700 |
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2025-10-22 11:38:58 -0700 |
| commit | 98401c10fc9e991b36c4ccf5a270746f123feeb6 (patch) | |
| tree | 83d4cb6fc595ed0db95d680c28178f8679e8af46 /Documentation | |
| parent | c9ccf81948973e9b9632cbb483a3908307092620 (diff) | |
| parent | db00605c13a9f5709da712671df5c7594c06cf31 (diff) | |
| download | git-98401c10fc9e991b36c4ccf5a270746f123feeb6.tar.xz | |
Merge branch 'bc/sha1-256-interop-01'
The beginning of SHA1-SHA256 interoperability work.
* bc/sha1-256-interop-01:
t1010: use BROKEN_OBJECTS prerequisite
t: allow specifying compatibility hash
fsck: consider gpgsig headers expected in tags
rev-parse: allow printing compatibility hash
docs: add documentation for loose objects
docs: improve ambiguous areas of pack format documentation
docs: reflect actual double signature for tags
docs: update offset order for pack index v3
docs: update pack index v3 format
Diffstat (limited to 'Documentation')
| -rw-r--r-- | Documentation/Makefile | 1 | ||||
| -rw-r--r-- | Documentation/fsck-msgids.adoc | 6 | ||||
| -rw-r--r-- | Documentation/git-rev-parse.adoc | 11 | ||||
| -rw-r--r-- | Documentation/gitformat-loose.adoc | 53 | ||||
| -rw-r--r-- | Documentation/gitformat-pack.adoc | 19 | ||||
| -rw-r--r-- | Documentation/meson.build | 1 | ||||
| -rw-r--r-- | Documentation/technical/hash-function-transition.adoc | 42 |
7 files changed, 109 insertions, 24 deletions
diff --git a/Documentation/Makefile b/Documentation/Makefile index a3fbd29744..627204928e 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc MAN5_TXT += gitformat-chunk.adoc MAN5_TXT += gitformat-commit-graph.adoc MAN5_TXT += gitformat-index.adoc +MAN5_TXT += gitformat-loose.adoc MAN5_TXT += gitformat-pack.adoc MAN5_TXT += gitformat-signature.adoc MAN5_TXT += githooks.adoc diff --git a/Documentation/fsck-msgids.adoc b/Documentation/fsck-msgids.adoc index 81f11ba125..acac9683af 100644 --- a/Documentation/fsck-msgids.adoc +++ b/Documentation/fsck-msgids.adoc @@ -10,6 +10,12 @@ `badFilemode`:: (INFO) A tree contains a bad filemode entry. +`badGpgsig`:: + (ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header. + +`badHeaderContinuation`:: + (ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated. + `badName`:: (ERROR) An author/committer name is empty. diff --git a/Documentation/git-rev-parse.adoc b/Documentation/git-rev-parse.adoc index 18383e52af..5398691f3f 100644 --- a/Documentation/git-rev-parse.adoc +++ b/Documentation/git-rev-parse.adoc @@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`: path of the current directory relative to the top-level directory. ---show-object-format[=(storage|input|output)]:: - Show the object format (hash algorithm) used for the repository - for storage inside the `.git` directory, input, or output. For - input, multiple algorithms may be printed, space-separated. - If not specified, the default is "storage". +--show-object-format[=(storage|input|output|compat)]:: + Show the object format (hash algorithm) used for the repository for storage + inside the `.git` directory, input, output, or compatibility. For input, + multiple algorithms may be printed, space-separated. If `compat` is + requested and no compatibility algorithm is enabled, prints an empty line. If + not specified, the default is "storage". --show-ref-format:: Show the reference storage format used for the repository. diff --git a/Documentation/gitformat-loose.adoc b/Documentation/gitformat-loose.adoc new file mode 100644 index 0000000000..947993663e --- /dev/null +++ b/Documentation/gitformat-loose.adoc @@ -0,0 +1,53 @@ +gitformat-loose(5) +================== + +NAME +---- +gitformat-loose - Git loose object format + + +SYNOPSIS +-------- +[verse] +$GIT_DIR/objects/[0-9a-f][0-9a-f]/* + +DESCRIPTION +----------- + +Loose objects are how Git stores individual objects, where every object is +written as a separate file. + +Over the lifetime of a repository, objects are usually written as loose objects +initially. Eventually, these loose objects will be compacted into packfiles +via repository maintenance to improve disk space usage and speed up the lookup +of these objects. + +== Loose objects + +Each loose object contains a prefix, followed immediately by the data of the +object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`, +`tree`, `commit`, or `tag` and `size` is the size of the data (without the +prefix) as a decimal integer expressed in ASCII. + +The entire contents, prefix and data concatenated, is then compressed with zlib +and the compressed data is stored in the file. The object ID of the object is +the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data. + +The file for the loose object is stored under the `objects` directory, with the +first two hex characters of the object ID being the directory and the remaining +characters being the file name. This is done to shard the data and avoid too +many files being in one directory, since some file systems perform poorly with +many items in a directory. + +As an example, the empty tree contains the data (when uncompressed) `tree 0\0` +and, in a SHA-256 repository, would have the object ID +`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be +stored under +`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`. + +Similarly, a blob containing the contents `abc` would have the uncompressed +data of `blob 3\0abc`. + +GIT +--- +Part of the linkgit:git[1] suite diff --git a/Documentation/gitformat-pack.adoc b/Documentation/gitformat-pack.adoc index d6ae229be5..1b4db4aa61 100644 --- a/Documentation/gitformat-pack.adoc +++ b/Documentation/gitformat-pack.adoc @@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums, and object IDs (object names) mentioned below are all computed using SHA-1. Similarly, in SHA-256 repositories, these values are computed using SHA-256. +CRC32 checksums are always computed over the entire packed object, including +the header (n-byte type and length); the base object name or offset, if any; +and the entire compressed object. The CRC32 algorithm used is that of zlib. + == pack-*.pack files have the following format: - A header appears at the beginning and consists of the following: @@ -80,6 +84,16 @@ Valid object types are: Type 5 is reserved for future expansion. Type 0 is invalid. +=== Object encoding + +Unlike loose objects, packed objects do not have a prefix containing the type, +size, and a NUL byte. These are not necessary because they can be determined by +the n-byte type and length that prefixes the data and so they are omitted from +the compressed and deltified data. + +The computation of the object ID still uses this prefix by reconstructing it +from the type and length as needed. + === Size encoding This document uses the following "size encoding" of non-negative @@ -92,6 +106,11 @@ values are more significant. This size encoding should not be confused with the "offset encoding", which is also used in this document. +When encoding the size of an undeltified object in a pack, the size is that of +the uncompressed raw object. For deltified objects, it is the size of the +uncompressed delta. The base object name or offset is not included in the size +computation. + === Deltified representation Conceptually there are only four object types: commit, tree, tag and diff --git a/Documentation/meson.build b/Documentation/meson.build index 44f94cdb7b..9d24f2da54 100644 --- a/Documentation/meson.build +++ b/Documentation/meson.build @@ -173,6 +173,7 @@ manpages = { 'gitformat-chunk.adoc' : 5, 'gitformat-commit-graph.adoc' : 5, 'gitformat-index.adoc' : 5, + 'gitformat-loose.adoc' : 5, 'gitformat-pack.adoc' : 5, 'gitformat-signature.adoc' : 5, 'githooks.adoc' : 5, diff --git a/Documentation/technical/hash-function-transition.adoc b/Documentation/technical/hash-function-transition.adoc index f047fd80ca..2359d7d106 100644 --- a/Documentation/technical/hash-function-transition.adoc +++ b/Documentation/technical/hash-function-transition.adoc @@ -227,9 +227,9 @@ network byte order): ** 4-byte length in bytes of shortened object names. This is the shortest possible length needed to make names in the shortened object name table unambiguous. - ** 4-byte integer, recording where tables relating to this format + ** 8-byte integer, recording where tables relating to this format are stored in this index file, as an offset from the beginning. - * 4-byte offset to the trailer from the beginning of this file. + * 8-byte offset to the trailer from the beginning of this file. * Zero or more additional key/value pairs (4-byte key, 4-byte value). Only one key is supported: 'PSRC'. See the "Loose objects and unreachable objects" section for supported values and how this @@ -260,12 +260,10 @@ network byte order): compressed data to be copied directly from pack to pack during repacking without undetected data corruption. - * A table of 4-byte offset values. For an object in the table of - sorted shortened object names, the value at the corresponding - index in this table indicates where that object can be found in - the pack file. These are usually 31-bit pack file offsets, but - large offsets are encoded as an index into the next table with the - most significant bit set. + * A table of 4-byte offset values. The index of this table in pack order + indicates where that object can be found in the pack file. These are + usually 31-bit pack file offsets, but large offsets are encoded as + an index into the next table with the most significant bit set. * A table of 8-byte offset entries (empty for pack files less than 2 GiB). Pack files are organized with heavily used objects toward @@ -276,10 +274,14 @@ network byte order): up to and not including the table of CRC32 values. - Zero or more NUL bytes. - The trailer consists of the following: - * A copy of the 20-byte SHA-256 checksum at the end of the + * A copy of the full main hash checksum at the end of the corresponding packfile. - * 20-byte SHA-256 checksum of all of the above. + * Full main hash checksum of all of the above. + +The "full main hash" is a full-length hash of the main (not compatibility) +algorithm in the repository. Thus, if the main algorithm is SHA-256, this is +a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash. Loose object index ~~~~~~~~~~~~~~~~~~ @@ -427,17 +429,19 @@ ordinary unsigned commit. Signed Tags ~~~~~~~~~~~ -We add a new field "gpgsig-sha256" to the tag object format to allow -signing tags without relying on SHA-1. Its signed payload is the -SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP -SIGNATURE-----" delimited in-body signature removed. +We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to +allow signing tags in both formats. The in-body signature is used for the +signature in the current hash algorithm and the header is used for the +signature in the other algorithm. Thus, a dual-signature tag will contain both +an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an +object or both an in-body signature and a gpgsig header for the SHA-256 format +of and object. -This means tags can be signed +The signed payload of the tag is the content of the tag in the current +algorithm with both its gpgsig and gpgsig-sha256 fields and +"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed. -1. using SHA-1 only, as in existing signed tag objects -2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body - signature. -3. using only SHA-256, by only using the gpgsig-sha256 field. +This means tags can be signed using one or both algorithms. Mergetag embedding ~~~~~~~~~~~~~~~~~~ |
