|  | Git index format | 
|  | ================ | 
|  |  | 
|  | == The Git index file has the following format | 
|  |  | 
|  | All binary numbers are in network byte order. Version 2 is described | 
|  | here unless stated otherwise. | 
|  |  | 
|  | - A 12-byte header consisting of | 
|  |  | 
|  | 4-byte signature: | 
|  | The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache") | 
|  |  | 
|  | 4-byte version number: | 
|  | The current supported versions are 2, 3 and 4. | 
|  |  | 
|  | 32-bit number of index entries. | 
|  |  | 
|  | - A number of sorted index entries (see below). | 
|  |  | 
|  | - Extensions | 
|  |  | 
|  | Extensions are identified by signature. Optional extensions can | 
|  | be ignored if Git does not understand them. | 
|  |  | 
|  | Git currently supports cached tree and resolve undo extensions. | 
|  |  | 
|  | 4-byte extension signature. If the first byte is 'A'..'Z' the | 
|  | extension is optional and can be ignored. | 
|  |  | 
|  | 32-bit size of the extension | 
|  |  | 
|  | Extension data | 
|  |  | 
|  | - 160-bit SHA-1 over the content of the index file before this | 
|  | checksum. | 
|  |  | 
|  | == Index entry | 
|  |  | 
|  | Index entries are sorted in ascending order on the name field, | 
|  | interpreted as a string of unsigned bytes (i.e. memcmp() order, no | 
|  | localization, no special casing of directory separator '/'). Entries | 
|  | with the same name are sorted by their stage field. | 
|  |  | 
|  | 32-bit ctime seconds, the last time a file's metadata changed | 
|  | this is stat(2) data | 
|  |  | 
|  | 32-bit ctime nanosecond fractions | 
|  | this is stat(2) data | 
|  |  | 
|  | 32-bit mtime seconds, the last time a file's data changed | 
|  | this is stat(2) data | 
|  |  | 
|  | 32-bit mtime nanosecond fractions | 
|  | this is stat(2) data | 
|  |  | 
|  | 32-bit dev | 
|  | this is stat(2) data | 
|  |  | 
|  | 32-bit ino | 
|  | this is stat(2) data | 
|  |  | 
|  | 32-bit mode, split into (high to low bits) | 
|  |  | 
|  | 4-bit object type | 
|  | valid values in binary are 1000 (regular file), 1010 (symbolic link) | 
|  | and 1110 (gitlink) | 
|  |  | 
|  | 3-bit unused | 
|  |  | 
|  | 9-bit unix permission. Only 0755 and 0644 are valid for regular files. | 
|  | Symbolic links and gitlinks have value 0 in this field. | 
|  |  | 
|  | 32-bit uid | 
|  | this is stat(2) data | 
|  |  | 
|  | 32-bit gid | 
|  | this is stat(2) data | 
|  |  | 
|  | 32-bit file size | 
|  | This is the on-disk size from stat(2), truncated to 32-bit. | 
|  |  | 
|  | 160-bit SHA-1 for the represented object | 
|  |  | 
|  | A 16-bit 'flags' field split into (high to low bits) | 
|  |  | 
|  | 1-bit assume-valid flag | 
|  |  | 
|  | 1-bit extended flag (must be zero in version 2) | 
|  |  | 
|  | 2-bit stage (during merge) | 
|  |  | 
|  | 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF | 
|  | is stored in this field. | 
|  |  | 
|  | (Version 3 or later) A 16-bit field, only applicable if the | 
|  | "extended flag" above is 1, split into (high to low bits). | 
|  |  | 
|  | 1-bit reserved for future | 
|  |  | 
|  | 1-bit skip-worktree flag (used by sparse checkout) | 
|  |  | 
|  | 1-bit intent-to-add flag (used by "git add -N") | 
|  |  | 
|  | 13-bit unused, must be zero | 
|  |  | 
|  | Entry path name (variable length) relative to top level directory | 
|  | (without leading slash). '/' is used as path separator. The special | 
|  | path components ".", ".." and ".git" (without quotes) are disallowed. | 
|  | Trailing slash is also disallowed. | 
|  |  | 
|  | The exact encoding is undefined, but the '.' and '/' characters | 
|  | are encoded in 7-bit ASCII and the encoding cannot contain a NUL | 
|  | byte (iow, this is a UNIX pathname). | 
|  |  | 
|  | (Version 4) In version 4, the entry path name is prefix-compressed | 
|  | relative to the path name for the previous entry (the very first | 
|  | entry is encoded as if the path name for the previous entry is an | 
|  | empty string).  At the beginning of an entry, an integer N in the | 
|  | variable width encoding (the same encoding as the offset is encoded | 
|  | for OFS_DELTA pack entries; see pack-format.txt) is stored, followed | 
|  | by a NUL-terminated string S.  Removing N bytes from the end of the | 
|  | path name for the previous entry, and replacing it with the string S | 
|  | yields the path name for this entry. | 
|  |  | 
|  | 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes | 
|  | while keeping the name NUL-terminated. | 
|  |  | 
|  | (Version 4) In version 4, the padding after the pathname does not | 
|  | exist. | 
|  |  | 
|  | Interpretation of index entries in split index mode is completely | 
|  | different. See below for details. | 
|  |  | 
|  | == Extensions | 
|  |  | 
|  | === Cached tree | 
|  |  | 
|  | Cached tree extension contains pre-computed hashes for trees that can | 
|  | be derived from the index. It helps speed up tree object generation | 
|  | from index for a new commit. | 
|  |  | 
|  | When a path is updated in index, the path must be invalidated and | 
|  | removed from tree cache. | 
|  |  | 
|  | The signature for this extension is { 'T', 'R', 'E', 'E' }. | 
|  |  | 
|  | A series of entries fill the entire extension; each of which | 
|  | consists of: | 
|  |  | 
|  | - NUL-terminated path component (relative to its parent directory); | 
|  |  | 
|  | - ASCII decimal number of entries in the index that is covered by the | 
|  | tree this entry represents (entry_count); | 
|  |  | 
|  | - A space (ASCII 32); | 
|  |  | 
|  | - ASCII decimal number that represents the number of subtrees this | 
|  | tree has; | 
|  |  | 
|  | - A newline (ASCII 10); and | 
|  |  | 
|  | - 160-bit object name for the object that would result from writing | 
|  | this span of index as a tree. | 
|  |  | 
|  | An entry can be in an invalidated state and is represented by having | 
|  | a negative number in the entry_count field. In this case, there is no | 
|  | object name and the next entry starts immediately after the newline. | 
|  | When writing an invalid entry, -1 should always be used as entry_count. | 
|  |  | 
|  | The entries are written out in the top-down, depth-first order.  The | 
|  | first entry represents the root level of the repository, followed by the | 
|  | first subtree--let's call this A--of the root level (with its name | 
|  | relative to the root level), followed by the first subtree of A (with | 
|  | its name relative to A), ... | 
|  |  | 
|  | === Resolve undo | 
|  |  | 
|  | A conflict is represented in the index as a set of higher stage entries. | 
|  | When a conflict is resolved (e.g. with "git add path"), these higher | 
|  | stage entries will be removed and a stage-0 entry with proper resolution | 
|  | is added. | 
|  |  | 
|  | When these higher stage entries are removed, they are saved in the | 
|  | resolve undo extension, so that conflicts can be recreated (e.g. with | 
|  | "git checkout -m"), in case users want to redo a conflict resolution | 
|  | from scratch. | 
|  |  | 
|  | The signature for this extension is { 'R', 'E', 'U', 'C' }. | 
|  |  | 
|  | A series of entries fill the entire extension; each of which | 
|  | consists of: | 
|  |  | 
|  | - NUL-terminated pathname the entry describes (relative to the root of | 
|  | the repository, i.e. full pathname); | 
|  |  | 
|  | - Three NUL-terminated ASCII octal numbers, entry mode of entries in | 
|  | stage 1 to 3 (a missing stage is represented by "0" in this field); | 
|  | and | 
|  |  | 
|  | - At most three 160-bit object names of the entry in stages from 1 to 3 | 
|  | (nothing is written for a missing stage). | 
|  |  | 
|  | === Split index | 
|  |  | 
|  | In split index mode, the majority of index entries could be stored | 
|  | in a separate file. This extension records the changes to be made on | 
|  | top of that to produce the final index. | 
|  |  | 
|  | The signature for this extension is { 'l', 'i', 'n', 'k' }. | 
|  |  | 
|  | The extension consists of: | 
|  |  | 
|  | - 160-bit SHA-1 of the shared index file. The shared index file path | 
|  | is $GIT_DIR/sharedindex.<SHA-1>. If all 160 bits are zero, the | 
|  | index does not require a shared index file. | 
|  |  | 
|  | - An ewah-encoded delete bitmap, each bit represents an entry in the | 
|  | shared index. If a bit is set, its corresponding entry in the | 
|  | shared index will be removed from the final index.  Note, because | 
|  | a delete operation changes index entry positions, but we do need | 
|  | original positions in replace phase, it's best to just mark | 
|  | entries for removal, then do a mass deletion after replacement. | 
|  |  | 
|  | - An ewah-encoded replace bitmap, each bit represents an entry in | 
|  | the shared index. If a bit is set, its corresponding entry in the | 
|  | shared index will be replaced with an entry in this index | 
|  | file. All replaced entries are stored in sorted order in this | 
|  | index. The first "1" bit in the replace bitmap corresponds to the | 
|  | first index entry, the second "1" bit to the second entry and so | 
|  | on. Replaced entries may have empty path names to save space. | 
|  |  | 
|  | The remaining index entries after replaced ones will be added to the | 
|  | final index. These added entries are also sorted by entry name then | 
|  | stage. | 
|  |  | 
|  | == Untracked cache | 
|  |  | 
|  | Untracked cache saves the untracked file list and necessary data to | 
|  | verify the cache. The signature for this extension is { 'U', 'N', | 
|  | 'T', 'R' }. | 
|  |  | 
|  | The extension starts with | 
|  |  | 
|  | - A sequence of NUL-terminated strings, preceded by the size of the | 
|  | sequence in variable width encoding. Each string describes the | 
|  | environment where the cache can be used. | 
|  |  | 
|  | - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from | 
|  | ctime field until "file size". | 
|  |  | 
|  | - Stat data of core.excludesfile | 
|  |  | 
|  | - 32-bit dir_flags (see struct dir_struct) | 
|  |  | 
|  | - 160-bit SHA-1 of $GIT_DIR/info/exclude. Null SHA-1 means the file | 
|  | does not exist. | 
|  |  | 
|  | - 160-bit SHA-1 of core.excludesfile. Null SHA-1 means the file does | 
|  | not exist. | 
|  |  | 
|  | - NUL-terminated string of per-dir exclude file name. This usually | 
|  | is ".gitignore". | 
|  |  | 
|  | - The number of following directory blocks, variable width | 
|  | encoding. If this number is zero, the extension ends here with a | 
|  | following NUL. | 
|  |  | 
|  | - A number of directory blocks in depth-first-search order, each | 
|  | consists of | 
|  |  | 
|  | - The number of untracked entries, variable width encoding. | 
|  |  | 
|  | - The number of sub-directory blocks, variable width encoding. | 
|  |  | 
|  | - The directory name terminated by NUL. | 
|  |  | 
|  | - A number of untracked file/dir names terminated by NUL. | 
|  |  | 
|  | The remaining data of each directory block is grouped by type: | 
|  |  | 
|  | - An ewah bitmap, the n-th bit marks whether the n-th directory has | 
|  | valid untracked cache entries. | 
|  |  | 
|  | - An ewah bitmap, the n-th bit records "check-only" bit of | 
|  | read_directory_recursive() for the n-th directory. | 
|  |  | 
|  | - An ewah bitmap, the n-th bit indicates whether SHA-1 and stat data | 
|  | is valid for the n-th directory and exists in the next data. | 
|  |  | 
|  | - An array of stat data. The n-th data corresponds with the n-th | 
|  | "one" bit in the previous ewah bitmap. | 
|  |  | 
|  | - An array of SHA-1. The n-th SHA-1 corresponds with the n-th "one" bit | 
|  | in the previous ewah bitmap. | 
|  |  | 
|  | - One NUL. | 
|  |  | 
|  | == File System Monitor cache | 
|  |  | 
|  | The file system monitor cache tracks files for which the core.fsmonitor | 
|  | hook has told us about changes.  The signature for this extension is | 
|  | { 'F', 'S', 'M', 'N' }. | 
|  |  | 
|  | The extension starts with | 
|  |  | 
|  | - 32-bit version number: the current supported version is 1. | 
|  |  | 
|  | - 64-bit time: the extension data reflects all changes through the given | 
|  | time which is stored as the nanoseconds elapsed since midnight, | 
|  | January 1, 1970. | 
|  |  | 
|  | - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. | 
|  |  | 
|  | - An ewah bitmap, the n-th bit indicates whether the n-th index entry | 
|  | is not CE_FSMONITOR_VALID. | 
|  |  | 
|  | == End of Index Entry | 
|  |  | 
|  | The End of Index Entry (EOIE) is used to locate the end of the variable | 
|  | length index entries and the begining of the extensions. Code can take | 
|  | advantage of this to quickly locate the index extensions without having | 
|  | to parse through all of the index entries. | 
|  |  | 
|  | Because it must be able to be loaded before the variable length cache | 
|  | entries and other index extensions, this extension must be written last. | 
|  | The signature for this extension is { 'E', 'O', 'I', 'E' }. | 
|  |  | 
|  | The extension consists of: | 
|  |  | 
|  | - 32-bit offset to the end of the index entries | 
|  |  | 
|  | - 160-bit SHA-1 over the extension types and their sizes (but not | 
|  | their contents).  E.g. if we have "TREE" extension that is N-bytes | 
|  | long, "REUC" extension that is M-bytes long, followed by "EOIE", | 
|  | then the hash would be: | 
|  |  | 
|  | SHA-1("TREE" + <binary representation of N> + | 
|  | "REUC" + <binary representation of M>) | 
|  |  | 
|  | == Index Entry Offset Table | 
|  |  | 
|  | The Index Entry Offset Table (IEOT) is used to help address the CPU | 
|  | cost of loading the index by enabling multi-threading the process of | 
|  | converting cache entries from the on-disk format to the in-memory format. | 
|  | The signature for this extension is { 'I', 'E', 'O', 'T' }. | 
|  |  | 
|  | The extension consists of: | 
|  |  | 
|  | - 32-bit version (currently 1) | 
|  |  | 
|  | - A number of index offset entries each consisting of: | 
|  |  | 
|  | - 32-bit offset from the begining of the file to the first cache entry | 
|  | in this block of entries. | 
|  |  | 
|  | - 32-bit count of cache entries in this block |