| gitformat-loose(5) |
| ================== |
| |
| NAME |
| ---- |
| gitformat-loose - Git loose object format |
| |
| |
| SYNOPSIS |
| -------- |
| [verse] |
| $GIT_DIR/objects/[0-9a-f][0-9a-f]/* |
| $GIT_DIR/objects/loose-object-idx |
| $GIT_DIR/objects/loose-map/map-*.map |
| |
| DESCRIPTION |
| ----------- |
| |
| Loose objects are how Git stores individual objects, where every object is |
| written as a separate file. |
| |
| Over the lifetime of a repository, objects are usually written as loose objects |
| initially. Eventually, these loose objects will be compacted into packfiles |
| via repository maintenance to improve disk space usage and speed up the lookup |
| of these objects. |
| |
| == Loose objects |
| |
| Each loose object contains a prefix, followed immediately by the data of the |
| object. The prefix has the form `<type> <size>\0`, where `<type>` is one of |
| `blob`, `tree`, `commit`, or `tag` and `<size>` is the size of the data |
| (without the prefix) as a decimal integer expressed in ASCII. |
| |
| The entire contents, prefix and data concatenated, is then compressed with zlib |
| and the compressed data is stored in the file. The object ID of the object is |
| the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data. |
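| |
| As a minimal sketch of the encoding described above, the following Python |
| builds the prefix, hashes the uncompressed bytes, and compresses them with |
| zlib. The function name `encode_loose_object` and its parameters are |
| illustrative only and are not part of Git. |
| |
```python
import hashlib
import zlib


def encode_loose_object(obj_type: str, data: bytes, algo: str = "sha256"):
    """Return (object_id_hex, compressed_bytes) for one loose object."""
    if obj_type not in ("blob", "tree", "commit", "tag"):
        raise ValueError(f"unknown object type: {obj_type}")
    # The prefix is "<type> <size>\0", with the size in ASCII decimal.
    prefix = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    raw = prefix + data
    # The object ID is the hash of the uncompressed prefix plus data.
    oid = hashlib.new(algo, raw).hexdigest()
    # The file content is the zlib-compressed form of the same bytes.
    return oid, zlib.compress(raw)
```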
| |
| The file for the loose object is stored under the `objects` directory, with the |
| first two hex characters of the object ID being the directory and the remaining |
| characters being the file name. This is done to shard the data and avoid too |
| many files being in one directory, since some file systems perform poorly with |
| many items in a directory. |
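| |
| The sharding rule can be sketched as a small path helper. The name |
| `loose_object_path` is hypothetical, and the hex check is an assumption made |
| here to keep the example strict. |
| |
```python
import os


def loose_object_path(git_dir: str, oid: str) -> str:
    """Map a hex object ID to the path of its loose object file."""
    oid = oid.lower()
    if len(oid) < 3 or any(c not in "0123456789abcdef" for c in oid):
        raise ValueError("object ID must be a hex string")
    # The first two hex characters name the directory; the rest name the file.
    return os.path.join(git_dir, "objects", oid[:2], oid[2:])
```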
| |
| As an example, the empty tree contains the data (when uncompressed) `tree 0\0` |
| and, in a SHA-256 repository, would have the object ID |
| `6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be |
| stored under |
| `$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`. |
| |
| Similarly, a blob containing the contents `abc` would have the uncompressed |
| data of `blob 3\0abc`. |
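| |
| Reading works in the opposite direction: decompress, split the prefix at the |
| first NUL byte, and check the declared size. The sketch below assumes the |
| whole file fits in memory; `decode_loose_object` is an illustrative name, not |
| a Git function. |
| |
```python
import zlib


def decode_loose_object(compressed: bytes):
    """Return (type, data) from the bytes of a loose object file."""
    raw = zlib.decompress(compressed)
    # The prefix ends at the first NUL byte.
    prefix, sep, data = raw.partition(b"\x00")
    if not sep:
        raise ValueError("missing NUL after prefix")
    obj_type, _, size = prefix.partition(b" ")
    if obj_type not in (b"blob", b"tree", b"commit", b"tag"):
        raise ValueError("unknown object type")
    # The size is an ASCII decimal count of the data bytes only.
    if not size.isdigit() or int(size) != len(data):
        raise ValueError("size in prefix does not match data length")
    return obj_type.decode("ascii"), data
```
| |
| For instance, `decode_loose_object(zlib.compress(b"blob 3\x00abc"))` returns |
| `("blob", b"abc")`. |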
| |
| == Loose object mapping |
| |
| When the `compatObjectFormat` option is used, Git needs to store a mapping |
| between the repository's main algorithm and the compatibility algorithm. There |
| are two formats for this: the legacy mapping and the modern mapping. |
| |
| === Legacy mapping |
| |
| The compatibility mapping is stored in a file called |
| `$GIT_DIR/objects/loose-object-idx`. The format of this file looks like this: |
| |
| # loose-object-idx |
| (main-name SP compat-name LF)* |
| |
| `main-name` refers to the hexadecimal object ID of the object in the main |
| repository format and `compat-name` refers to the same thing, but for the |
| compatibility format. |
| |
| Git reads this file if it exists but does not write it. |
| |
| Note that carriage returns are not permitted in this file, regardless of the |
| host system or configuration. |
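| |
| A reader for this format might look like the following sketch. It assumes |
| that the `# loose-object-idx` line is a literal first line of the file and |
| that every entry is terminated by LF; `parse_loose_object_idx` is a |
| hypothetical name. |
| |
```python
HEX_DIGITS = set(b"0123456789abcdef")


def _is_hex(name: bytes) -> bool:
    return len(name) > 0 and all(b in HEX_DIGITS for b in name)


def parse_loose_object_idx(content: bytes) -> dict:
    """Parse legacy mapping bytes into {main_name: compat_name}."""
    if b"\r" in content:
        raise ValueError("carriage returns are not permitted")
    header, sep, body = content.partition(b"\n")
    if header != b"# loose-object-idx" or not sep:
        raise ValueError("missing '# loose-object-idx' header line")
    if body and not body.endswith(b"\n"):
        raise ValueError("last entry is not terminated by LF")
    mapping = {}
    for line in body.splitlines():
        # Each entry is: main-name SP compat-name
        main, sp, compat = line.partition(b" ")
        if not sp or not _is_hex(main) or not _is_hex(compat):
            raise ValueError(f"malformed entry: {line!r}")
        mapping[main.decode("ascii")] = compat.decode("ascii")
    return mapping
```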
| |
| === Modern mapping |
| |
| The modern mapping consists of a set of files under `$GIT_DIR/objects/loose` |
| ending in `.map`. The portion of the filename before the extension is the |
| hash checksum in hex format. |
| |
| `git pack-objects` will repack existing entries into one file, removing any |
| unnecessary objects, such as obsolete shallow entries or loose objects that |
| have been packed. |
| |
| ==== Mapping file format |
| |
| - A header appears at the beginning and consists of the following: |
| * A 4-byte mapping signature: `LMAP` |
| * 4-byte version number: 1 |
| * 4-byte length of the header section. |
| * 4-byte number of objects declared in this map file. |
| * 4-byte number of object formats declared in this map file. |
| * For each object format: |
| ** 4-byte format identifier (e.g., `sha1` for SHA-1) |
| ** 4-byte length in bytes of shortened object names. This is the |
| shortest possible length needed to make names in the shortened |
| object name table unambiguous. |
| ** 8-byte integer, recording where tables relating to this format |
| are stored in this map file, as an offset from the beginning. |
| * 8-byte offset to the trailer from the beginning of this file. |
| * Zero or more additional key/value pairs (4-byte key, 4-byte value), which |
| may optionally declare one or more chunks. No chunks are currently |
| defined. Readers must ignore unrecognized keys. |
| - Zero or more NUL bytes. These are used to improve the alignment of the |
| 4-byte quantities below. |
| - Tables for the first object format: |
| * A sorted table of shortened object names. These are prefixes of the names |
| of all objects in this file, packed together without offset values to |
| reduce the cache footprint of the binary search for a specific object name. |
| * A sorted table of full object names. |
| * A table of 4-byte metadata values. |
| * Zero or more chunks. A chunk starts with a four-byte chunk identifier and |
| a four-byte parameter (which, if unneeded, is all zeros) and an eight-byte |
| size (not including the identifier, parameter, or size), plus the chunk |
| data. |
| - Zero or more NUL bytes. |
| - Tables for subsequent object formats: |
| * A sorted table of shortened object names. These are prefixes of the names |
| of all objects in this file, packed together without offset values to |
| reduce the cache footprint of the binary search for a specific object name. |
| * A table of full object names in the order specified by the first object format. |
| * A table of 4-byte values mapping object name order to the order of the |
| first object format. For an object in the table of sorted shortened object |
| names, the value at the corresponding index in this table is the index in |
| the previous table for that same object. |
| * Zero or more NUL bytes. |
| - The trailer consists of the following: |
| * Hash checksum of all of the above. |
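| |
| The header layout above can be sketched with Python's `struct` module. This |
| is an illustration under stated assumptions, not Git's reader: it assumes the |
| 8-byte integers are also in network order, that the header length is measured |
| from the start of the file, and that format identifiers are kept as raw |
| 4-byte strings. The name `parse_map_header` and the returned dictionary keys |
| are hypothetical. |
| |
```python
import struct


def parse_map_header(buf: bytes) -> dict:
    """Parse the header of a version 1 loose object map (illustrative)."""
    sig, version, header_len, n_objects, n_formats = struct.unpack_from(
        ">4sIIII", buf, 0
    )
    if sig != b"LMAP":
        raise ValueError("missing LMAP signature")
    if version != 1:
        raise ValueError(f"unsupported version: {version}")
    pos = 20
    formats = []
    for _ in range(n_formats):
        # 4-byte identifier, 4-byte shortened name length, 8-byte table offset.
        ident, short_len, table_offset = struct.unpack_from(">4sIQ", buf, pos)
        formats.append(
            {"id": ident, "short_len": short_len, "table_offset": table_offset}
        )
        pos += 16
    (trailer_offset,) = struct.unpack_from(">Q", buf, pos)
    pos += 8
    # Assumption: header_len counts from the start of the file, so the bytes
    # between pos and header_len hold the optional key/value pairs.
    pairs = []
    while pos + 8 <= header_len:
        pairs.append((buf[pos:pos + 4], buf[pos + 4:pos + 8]))
        pos += 8
    return {
        "objects": n_objects,
        "formats": formats,
        "trailer_offset": trailer_offset,
        "pairs": pairs,  # no keys are defined yet, so callers ignore these
    }
```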
| |
| The lower six bits of each value in the metadata table contain a type field |
| indicating the reason that this object is stored: |
| |
| 0:: |
| Reserved. |
| 1:: |
| This object is stored as a loose object in the repository. |
| 2:: |
| This object is a shallow entry. The mapping refers to a shallow value |
| returned by a remote server. |
| 3:: |
| This object is a submodule entry. The mapping refers to the commit that is |
| stored to represent a submodule. |
| |
| Other data may be stored in this field in the future. Bits that are not used |
| must be zero. |
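| |
| Decoding the type field amounts to masking the lower six bits. The sketch |
| below is deliberately strict and rejects any value with other bits set or |
| with an undefined type; the names `EntryType` and `decode_metadata` are |
| illustrative only. |
| |
```python
from enum import IntEnum


class EntryType(IntEnum):
    RESERVED = 0
    LOOSE_OBJECT = 1
    SHALLOW = 2
    SUBMODULE = 3


TYPE_MASK = 0x3F  # the lower six bits of a 4-byte metadata value


def decode_metadata(value: int) -> EntryType:
    """Return the type stored in one 4-byte metadata value."""
    if value & ~TYPE_MASK:
        raise ValueError("bits outside the type field must be zero")
    # EntryType() raises ValueError for type values that are not defined.
    return EntryType(value & TYPE_MASK)
```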
| |
| All 4-byte numbers are in network order and must be 4-byte aligned in the file, |
| so NUL padding may be required in some cases. |
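| |
| A writer could satisfy the alignment rule by padding before each 4-byte |
| number, as in this small sketch. The helper name `append_u32` is hypothetical. |
| |
```python
import struct


def append_u32(out: bytearray, value: int) -> None:
    """Append a 4-byte network-order integer at a 4-byte aligned offset."""
    # Pad with NUL bytes until the current length is a multiple of four.
    out.extend(b"\x00" * (-len(out) % 4))
    out.extend(struct.pack(">I", value))
```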
| |
| Note that the hash at the end of the file is computed using the repository's |
| main algorithm. In the usual case when there are multiple algorithms, the main |
| algorithm will be SHA-256 and the compatibility algorithm will be SHA-1. |
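| |
| Checking the trailer can be sketched as follows, assuming the trailer holds |
| only the checksum and that `trailer_offset` is the value read from the |
| header. The function name `verify_map_checksum` is illustrative, and `algo` |
| stands in for the repository's main algorithm. |
| |
```python
import hashlib


def verify_map_checksum(buf: bytes, trailer_offset: int, algo: str = "sha256") -> bool:
    """Check that the trailer equals the hash of all bytes before it."""
    digest = hashlib.new(algo, buf[:trailer_offset]).digest()
    return buf[trailer_offset:] == digest
```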
| |
| GIT |
| --- |
| Part of the linkgit:git[1] suite |