rust: add a new binary object map format

Our current loose object format has a few problems. First, it is not efficient: the list of object IDs is not sorted and even if it were, there would not be an efficient way to look up objects in both algorithms.

Second, we need to store mappings for things which are not technically loose objects but are not packed objects, either, and so cannot be stored in a pack index. These kinds of things include shallows, their parents, and their trees, as well as submodules. Yet we also need to implement a sensible way to store the kind of object so that we can prune unneeded entries. For instance, if the user has updated the shallows, we can remove the old values.

For these reasons, introduce a new binary object map format. The careful reader will notice that it resembles very closely the pack index v3 format. Add an in-memory object map as well, and allow writing to a batched map, which can then be written later as one of the binary object maps. Include several tests for round tripping and data lookup across algorithms.

Note that the use of this code elsewhere in Git will involve some C code and some C-compatible code in Rust that will be introduced in a future commit. Thus, for example, we ignore the fact that if there is no current batch and the caller asks for data to be written, this code does nothing, mostly because this code also does not involve itself with opening or manipulating files. The C code that we will add later will implement this functionality at a higher level and take care of this, since the code which is necessary for writing to the object store is deeply involved with our C abstractions and it would require extensive work (which would not be especially valuable at this point) to port those to Rust.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: brian m. carlson <sandals@crustytoothpaste.net> 2026-02-07 20:04:44 +0000
committer: Junio C Hamano <gitster@pobox.com> 2026-02-07 17:41:03 -0800
commit: 40a1b4fb2bfc6a3af608655e2e55435912f5550a (patch)
tree: bb049da02df66b361ccad840a738c0647ad7017f /Documentation/gitformat-loose.adoc
parent: 0c04f2621ed7cbdff8b11f680fb66e1a9807f5b1 (diff)
download: git-40a1b4fb2bfc6a3af608655e2e55435912f5550a.tar.xz
1 files changed, 78 insertions, 0 deletions
diff --git a/Documentation/gitformat-loose.adoc b/Documentation/gitformat-loose.adoc
index 947993663e..b0b569761b 100644
--- a/Documentation/gitformat-loose.adoc
+++ b/Documentation/gitformat-loose.adoc
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 $GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+$GIT_DIR/objects/object-map/map-*.map
 
 DESCRIPTION
 -----------
@@ -48,6 +49,83 @@ stored under
 Similarly, a blob containing the contents `abc` would have the uncompressed
 data of `blob 3\0abc`.
 
+== Loose object mapping
+
+When the `compatObjectFormat` option is used, Git needs to store a mapping
+between the repository's main algorithm and the compatibility algorithm for
+loose objects as well as some auxiliary information.
+
+The mapping consists of a set of files under `$GIT_DIR/objects/object-map`
+ending in `.map`.  The portion of the filename before the extension is that of
+the main hash checksum (that is, the one specified in
+`extensions.objectformat`) in hex format.
+
+`git gc` will repack existing entries into one file, removing any unnecessary
+objects, such as obsolete shallow entries or loose objects that have been
+packed.
+
+The file format is as follows.  All values are in network byte order and all
+4-byte and 8-byte values must be 4-byte aligned in the file, so NUL padding
+may be required in some cases.  Git always uses the smallest number of NUL
+bytes (including zero) that is required for the padding in order to make
+writing files deterministic.
+
+- A header appears at the beginning and consists of the following:
+	* A 4-byte mapping signature: `LMAP`
+	* 4-byte version number: 1
+	* 4-byte length of the header section (including reserved entries but
+		excluding any NUL padding).
+	* 4-byte number of objects declared in this map file.
+	* 4-byte number of object formats declared in this map file.
+	* For each object format:
+		** 4-byte format identifier (e.g., `sha1` for SHA-1)
+		** 4-byte length in bytes of shortened object names (that is, prefixes of
+			 the full object names). This is the shortest possible length needed to
+			 make names in the shortened object name table unambiguous.
+		** 8-byte integer, recording where tables relating to this format
+		are stored in this map file, as an offset from the beginning.
+	* 8-byte offset to the trailer from the beginning of this file.
+	* The remainder of the header section is reserved for future use.
+		Readers must ignore unrecognized data here.
+- Zero or more NUL bytes.  These are used to improve the alignment of the
+	4-byte quantities below.
+- Tables for the first object format:
+	* A sorted table of shortened object names.  These are prefixes of the names
+		of all objects in this file, packed together to reduce the cache footprint
+		of the binary search for a specific object name.
+	* A sorted table of full object names.
+	* A table of 4-byte metadata values.
+- Zero or more NUL bytes.
+- Tables for subsequent object formats:
+	* A sorted table of shortened object names.  These are prefixes of the names
+		of all objects in this file, packed together without offset values to
+		reduce the cache footprint of the binary search for a specific object name.
+	* A table of full object names in the order specified by the first object format.
+	* A table of 4-byte values mapping object name order to the order of the
+		first object format. For an object in the table of sorted shortened object
+		names, the value at the corresponding index in this table is the index in
+		the previous table for that same object.
+	* Zero or more NUL bytes.
+- The trailer consists of the following:
+	* Hash checksum of all of the above using the main hash.
+
+The lower six bits of each metadata value contain a type field indicating the
+reason that this object is stored:
+
+0::
+	Reserved.
+1::
+	This object is stored as a loose object in the repository.
+2::
+	This object is a shallow entry.  The mapping refers to a shallow value
+	returned by a remote server.
+3::
+	This object is a submodule entry.  The mapping refers to the commit stored
+	representing a submodule.
+
+Other data may be stored in this field in the future.  Bits that are not used
+must be zero.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
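
The header layout and the metadata type field documented in the patch above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the code added by this commit: the names `Header`, `FormatEntry`, `EntryKind`, `parse_header`, `padding_len`, and `entry_kind` are hypothetical, and the sketch reads only the fixed header fields. It does not interpret reserved header data, read the tables, or verify the trailer checksum.

```rust
/// Reason an object is recorded in the map (lower six bits of a metadata value).
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum EntryKind {
    Reserved,
    Loose,
    Shallow,
    Submodule,
    Unknown(u8),
}

/// Decode the type field from a 4-byte metadata value.
pub fn entry_kind(metadata: u32) -> EntryKind {
    match (metadata & 0x3f) as u8 {
        0 => EntryKind::Reserved,
        1 => EntryKind::Loose,
        2 => EntryKind::Shallow,
        3 => EntryKind::Submodule,
        n => EntryKind::Unknown(n),
    }
}

/// Smallest number of NUL bytes needed to reach the next 4-byte boundary.
pub fn padding_len(offset: usize) -> usize {
    (4 - offset % 4) % 4
}

/// Per-format header entry (hypothetical field names).
#[derive(Debug, PartialEq, Eq)]
pub struct FormatEntry {
    pub id: [u8; 4],        // format identifier, e.g. b"sha1"
    pub short_len: u32,     // length in bytes of shortened object names
    pub tables_offset: u64, // offset of this format's tables from file start
}

/// Fixed header fields (hypothetical field names).
#[derive(Debug, PartialEq, Eq)]
pub struct Header {
    pub version: u32,
    pub header_len: u32,
    pub num_objects: u32,
    pub formats: Vec<FormatEntry>,
    pub trailer_offset: u64,
}

// Copy `N` bytes starting at `pos`, or return None if the buffer is too short.
fn take4(buf: &[u8], pos: usize) -> Option<[u8; 4]> {
    let mut out = [0u8; 4];
    out.copy_from_slice(buf.get(pos..pos.checked_add(4)?)?);
    Some(out)
}

fn take8(buf: &[u8], pos: usize) -> Option<[u8; 8]> {
    let mut out = [0u8; 8];
    out.copy_from_slice(buf.get(pos..pos.checked_add(8)?)?);
    Some(out)
}

/// Parse the header; all integers are in network (big-endian) byte order.
/// Returns None on a bad signature, an unknown version, or truncated input.
pub fn parse_header(buf: &[u8]) -> Option<Header> {
    if buf.get(..4)? != &b"LMAP"[..] {
        return None;
    }
    let version = u32::from_be_bytes(take4(buf, 4)?);
    if version != 1 {
        return None;
    }
    let header_len = u32::from_be_bytes(take4(buf, 8)?);
    let num_objects = u32::from_be_bytes(take4(buf, 12)?);
    let num_formats = u32::from_be_bytes(take4(buf, 16)?);

    // Each format entry is 4 + 4 + 8 = 16 bytes, so every field stays 4-byte aligned.
    let mut pos = 20usize;
    let mut formats = Vec::new();
    for _ in 0..num_formats {
        let id = take4(buf, pos)?;
        let short_len = u32::from_be_bytes(take4(buf, pos + 4)?);
        let tables_offset = u64::from_be_bytes(take8(buf, pos + 8)?);
        formats.push(FormatEntry { id, short_len, tables_offset });
        pos += 16;
    }
    let trailer_offset = u64::from_be_bytes(take8(buf, pos)?);

    Some(Header { version, header_len, num_objects, formats, trailer_offset })
}
```

A caller would pass the leading bytes of a `map-*.map` file to `parse_header` and then use each `tables_offset` to locate the per-format tables. Returning `None` for every failure keeps the sketch short; a real reader would likely want a distinct error for each case.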