| gitdatamodel(7) |
| =============== |
| |
| NAME |
| ---- |
| gitdatamodel - Git's core data model |
| |
| SYNOPSIS |
| -------- |
| gitdatamodel |
| |
| DESCRIPTION |
| ----------- |
| |
| You don't need to understand Git's data model to use Git, but knowing |
| it makes Git's documentation much easier to read, because you'll know |
| what the documentation means by "object", "reference", or "index". |
| |
| Git's core operations use 4 kinds of data: |
| |
| 1. <<objects,Objects>>: commits, trees, blobs, and tag objects |
| 2. <<references,References>>: branches, tags, |
| remote-tracking branches, etc. |
| 3. <<index,The index>>, also known as the staging area |
| 4. <<reflogs,Reflogs>>: logs of changes to references ("ref log") |
| |
| [[objects]] |
| OBJECTS |
| ------- |
| |
| All of the commits and files in a Git repository are stored as "Git objects". |
| Git objects never change after they're created, and every object has an ID, |
| like `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`. |
| |
| This means that if you have an object's ID, you can always recover its |
| exact contents as long as the object hasn't been deleted. |
| |
| Every object has: |
| |
| [[object-id]] |
| 1. an *ID* (aka "object name"), which is a cryptographic hash of its |
| type and contents. |
| It's fast to look up a Git object using its ID. |
| This is usually represented in hexadecimal, like |
| `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`. |
| 2. a *type*. There are 4 types of objects: |
| <<commit,commits>>, <<tree,trees>>, <<blob,blobs>>, |
| and <<tag-object,tag objects>>. |
| 3. *contents*. The structure of the contents depends on the type. |
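| |
| As a rough sketch of how an ID follows from an object's type and |
| contents: in a repository that uses the SHA-1 object format, Git |
| hashes a short header (the type, the content length in bytes, and a |
| NUL byte) followed by the contents. The function name |
| `git_object_id` below is made up for this illustration. |
| |
```python
import hashlib


def git_object_id(obj_type: str, contents: bytes) -> str:
    """Return the hex object ID, assuming the SHA-1 object format."""
    # The header is "<type> <size in bytes>\0"; the contents follow directly.
    header = f"{obj_type} {len(contents)}".encode() + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


print(git_object_id("blob", b"hello world\n"))
```
| |
| The result should match what `git hash-object` prints for the same |
| bytes. A repository that uses the SHA-256 object format would need a |
| different hash function. |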
| |
| Here's how each type of object is structured: |
| |
| [[commit]] |
| commit:: |
| A commit contains these required fields |
| (though there are other optional fields): |
| + |
| 1. The full directory structure of all the files in that version of the |
| repository and each file's contents, stored as the *<<tree,tree>>* ID |
| of the commit's top-level directory |
| 2. Its *parent commit ID(s)*. The first commit in a repository has 0 parents, |
| regular commits have 1 parent, and merge commits have 2 or more parents. |
| 3. An *author* and the time the commit was authored |
| 4. A *committer* and the time the commit was committed |
| 5. A *commit message* |
| + |
| Here's how an example commit is stored: |
| + |
| ---- |
| tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a |
| parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647 |
| author Maya <maya@example.com> 1759173425 -0400 |
| committer Maya <maya@example.com> 1759173425 -0400 |
| |
| Add README |
| ---- |
| + |
| Like all other objects, commits can never be changed after they're created. |
| For example, "amending" a commit with `git commit --amend` creates a new |
| commit with the same parent. |
| + |
| Git does not store the diff for a commit: when you ask Git to show |
| the commit with linkgit:git-show[1], it calculates the diff from its |
| parent on the fly. |
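| |
| To make the layout of the example commit concrete, here is a minimal |
| sketch that splits such text into its fields. The `parse_commit` |
| helper is hypothetical, and it simply skips any optional fields. |
| |
```python
def parse_commit(raw: str) -> dict:
    """Split the text of a commit object into header fields and message."""
    # A blank line separates the header fields from the commit message.
    header_text, _, message = raw.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # 0, 1, or several parents
        elif key in ("tree", "author", "committer"):
            commit[key] = value
        # Optional fields are ignored in this sketch.
    return commit


EXAMPLE = """\
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
author Maya <maya@example.com> 1759173425 -0400
committer Maya <maya@example.com> 1759173425 -0400

Add README
"""

commit = parse_commit(EXAMPLE)
print(commit["tree"], len(commit["parents"]), commit["message"].strip())
```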
| |
| [[tree]] |
| tree:: |
| A tree is how Git represents a directory. |
| It can contain files or other trees (which are subdirectories). |
| It lists, for each item in the tree: |
| + |
| 1. The *filename*, for example `hello.py` |
| 2. The *file type*, which must be one of these five types: |
| - *regular file* |
| - *executable file* |
| - *symbolic link* |
| - *directory* |
| - *gitlink* (for use with submodules) |
| 3. The <<object-id,*object ID*>> with the contents of the file, directory, |
| or gitlink. |
| + |
| For example, this is how a tree containing one directory (`src`) and one file |
| (`README.md`) is stored: |
| + |
| ---- |
| 100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md |
| 040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src |
| ---- |
| |
| NOTE: In the output above, Git displays the file type of each tree entry |
| using a format that's loosely modelled on Unix file modes (`100644` is |
| "regular file", `100755` is "executable file", `120000` is "symbolic |
| link", `040000` is "directory", and `160000` is "gitlink"). It also |
| displays the object's type: `blob` for files and symlinks, `tree` for |
| directories, and `commit` for gitlinks. |
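| |
| The mapping described in the note can be written down as a small |
| lookup table. The sketch below also parses one line of the tree |
| output shown above; `parse_tree_line` is a hypothetical helper, and it |
| assumes whitespace-separated fields with the filename last. |
| |
```python
# Mode strings from the pretty-printed tree output, mapped to the
# file types listed for tree entries.
FILE_TYPES = {
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symbolic link",
    "040000": "directory",
    "160000": "gitlink",
}


def parse_tree_line(line: str) -> dict:
    """Parse one line of pretty-printed tree output into its parts."""
    # Fields are: mode, object type, object ID, then the filename.
    # maxsplit=3 keeps spaces inside the filename intact.
    mode, obj_type, obj_id, name = line.split(None, 3)
    return {"file_type": FILE_TYPES[mode], "object_type": obj_type,
            "id": obj_id, "name": name}


entry = parse_tree_line(
    "040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d\tsrc")
print(entry["file_type"], entry["name"])
```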
| |
| [[blob]] |
| blob:: |
| A blob object contains a file's contents. |
| + |
| When you make a commit, Git stores the full contents of each file that |
| you changed as a blob. |
| For example, if you have a commit that changes 2 files in a repository |
| with 1000 files, that commit will create 2 new blobs, and use the |
| previous blob ID for the other 998 files. |
| This means that commits can use relatively little disk space even in a |
| very large repository. |
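| |
| The arithmetic in this example can be checked with a toy model. The |
| sketch below invents a repository of 1000 small files, changes 2 of |
| them, and counts how many blob IDs are new. It assumes the SHA-1 |
| object format, and the file names and contents are made up. |
| |
```python
import hashlib


def blob_id(contents: bytes) -> str:
    # Assumes the SHA-1 object format: hash of "blob <size>\0" + contents.
    return hashlib.sha1(b"blob %d\0" % len(contents) + contents).hexdigest()


# A made-up repository with 1000 files, keyed by path.
before = {f"file{i}.txt": f"contents {i}\n".encode() for i in range(1000)}
after = dict(before)
after["file1.txt"] = b"changed\n"
after["file2.txt"] = b"also changed\n"

old_ids = {blob_id(data) for data in before.values()}
new_ids = {path: blob_id(data) for path, data in after.items()}

# Only files whose contents changed need a blob that did not exist before.
created = [path for path, oid in new_ids.items() if oid not in old_ids]
reused = len(new_ids) - len(created)
print(len(created), reused)
```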
| |
| [[tag-object]] |
| tag object:: |
| Tag objects contain these required fields |
| (though there are other optional fields): |
| + |
| 1. The *ID* of the object it references |
| 2. The *type* of the object it references |
| 3. The *tagger* and tag date |
| 4. A *tag message*, similar to a commit message |
| |
| Here's how an example tag object is stored: |
| |
| ---- |
| object 750b4ead9c87ceb3ddb7a390e6c7074521797fb3 |
| type commit |
| tag v1.0.0 |
| tagger Maya <maya@example.com> 1759927359 -0400 |
| |
| Release version 1.0.0 |
| ---- |
| |
| NOTE: All of the examples in this section were generated with |
| `git cat-file -p <object-id>`. |
| |
| [[references]] |
| REFERENCES |
| ---------- |
| |
| References are a way to give a name to a commit. |
| It's easier to remember "the changes I'm working on are on the `turtle` |
| branch" than "the changes are in commit bb69721404348e". |
| Git often uses "ref" as shorthand for "reference". |
| |
| References can either refer to: |
| |
| 1. An object ID, usually a <<commit,commit>> ID |
| 2. Another reference. This is called a "symbolic reference" |
| |
| References are stored in a hierarchy, and Git handles references |
| differently based on where they are in the hierarchy. |
| Most references are under `refs/`. Here are the main types: |
| |
| [[branch]] |
| branches: `refs/heads/<name>`:: |
| A branch refers to a commit ID. |
| That commit is the latest commit on the branch. |
| + |
| To get the history of commits on a branch, Git will start at the commit |
| ID the branch references, and then look at the commit's parent(s), |
| the parent's parent, etc. |
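| |
| The walk described above can be sketched on a toy commit graph. The |
| commit names (`c1`, `c2`, `c3`) and the `history` function are |
| placeholders for illustration. Git's real traversal also takes care |
| of ordering its output, which this sketch does not attempt. |
| |
```python
def history(parents_of: dict, tip: str) -> list:
    """List every commit reachable from `tip` by following parents."""
    seen, order, todo = set(), [], [tip]
    while todo:
        commit = todo.pop()
        if commit in seen:
            continue  # merges can reach the same commit twice
        seen.add(commit)
        order.append(commit)
        todo.extend(parents_of[commit])
    return order


# Hypothetical commit graph: each commit maps to its parent IDs.
parents_of = {
    "c3": ["c2"],
    "c2": ["c1"],
    "c1": [],
}
refs = {"refs/heads/main": "c3"}
print(history(parents_of, refs["refs/heads/main"]))
```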
| |
| [[tag]] |
| tags: `refs/tags/<name>`:: |
| A tag refers to a commit ID, tag object ID, or other object ID. |
| There are two types of tags: |
| + |
| 1. "Annotated tags", which reference a <<tag-object,tag object>> ID |
| which contains a tag message |
| 2. "Lightweight tags", which reference a commit, blob, or tree ID |
| directly |
| + |
| Even though branches and tags can both refer to a commit ID, Git |
| treats them very differently. |
| Branches are expected to change over time: when you make a commit, Git |
| will update your <<HEAD,current branch>> to point to the new commit. |
| Tags are usually not changed after they're created. |
| |
| [[HEAD]] |
| HEAD: `HEAD`:: |
| `HEAD` is where Git stores your current <<branch,branch>>, |
| if there is a current branch. `HEAD` can either be: |
| + |
| 1. A symbolic reference to your current branch, for example `ref: |
| refs/heads/main` if your current branch is `main`. |
| 2. A direct reference to a commit ID. In this case there is no current branch. |
| This is called "detached HEAD state". See the DETACHED HEAD section |
| of linkgit:git-checkout[1] for more. |
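| |
| The two cases can be told apart mechanically. This sketch assumes |
| that the contents of `HEAD` have already been read as text (with the |
| default on-disk reference storage, that is the `.git/HEAD` file); the |
| `parse_head` helper is hypothetical. |
| |
```python
def parse_head(contents: str) -> dict:
    """Classify the contents of HEAD as symbolic or detached."""
    contents = contents.strip()
    if contents.startswith("ref: "):
        # Symbolic reference: HEAD names the current branch.
        return {"detached": False, "ref": contents[len("ref: "):]}
    # Otherwise HEAD holds a commit ID directly (detached HEAD state).
    return {"detached": True, "commit": contents}


print(parse_head("ref: refs/heads/main\n"))
print(parse_head("750b4ead9c87ceb3ddb7a390e6c7074521797fb3\n"))
```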
| |
| [[remote-tracking-branch]] |
| remote-tracking branches: `refs/remotes/<remote>/<branch>`:: |
| A remote-tracking branch refers to a commit ID. |
| It's how Git stores the last-known state of a branch in a remote |
| repository. `git fetch` updates remote-tracking branches. When |
| `git status` says "Your branch is up to date with 'origin/main'", it's |
| comparing your branch against this reference. |
| + |
| `refs/remotes/<remote>/HEAD` is a symbolic reference to the remote's |
| default branch. This is the branch that `git clone` checks out by default. |
| |
| [[other-refs]] |
| Other references:: |
| Git tools may create references anywhere under `refs/`. |
| For example, linkgit:git-stash[1], linkgit:git-bisect[1], |
| and linkgit:git-notes[1] all create their own references |
| in `refs/stash`, `refs/bisect`, etc. |
| Third-party Git tools may also create their own references. |
| + |
| Git may also create references other than `HEAD` at the base of the |
| hierarchy, like `ORIG_HEAD`. |
| |
| NOTE: Git may delete objects that aren't "reachable" from any reference |
| or <<reflogs,reflog>>. |
| An object is "reachable" if we can find it by following tags to whatever |
| they tag, commits to their parents or trees, and trees to the trees or |
| blobs that they contain. |
| For example, if you amend a commit with `git commit --amend`, |
| there will no longer be a branch that points at the old commit. |
| The old commit is recorded in the current branch's <<reflogs,reflog>>, |
| so it is still "reachable", but when the reflog entry expires it may |
| become unreachable and get deleted. |
| |
| Reachable objects will never be deleted. |
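| |
| Reachability is a graph search. The sketch below models the |
| `git commit --amend` example with placeholder object names (they are |
| not real IDs) and shows that the old commit stays reachable only |
| through the reflog. |
| |
```python
def reachable(edges: dict, starts: list) -> set:
    """Return every object reachable from the given starting IDs."""
    seen, todo = set(), list(starts)
    while todo:
        obj = todo.pop()
        if obj not in seen:
            seen.add(obj)
            todo.extend(edges.get(obj, []))
    return seen


# Hypothetical objects: a commit points to its tree and its parents,
# and a tree points to the blobs and trees it contains.
edges = {
    "commit-new": ["tree-new", "commit-root"],
    "commit-old": ["tree-old", "commit-root"],  # replaced by --amend
    "commit-root": ["tree-root"],
    "tree-new": ["blob-a"],
    "tree-old": ["blob-b"],
    "tree-root": ["blob-a"],
}
refs = {"refs/heads/main": "commit-new"}
reflog = ["commit-new", "commit-old"]

all_objects = set(edges).union(*edges.values())
with_reflog = reachable(edges, list(refs.values()) + reflog)
without_reflog = reachable(edges, list(refs.values()))

# Objects that could be deleted once the reflog entry has expired.
print(sorted(all_objects - without_reflog))
```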
| |
| [[index]] |
| THE INDEX |
| --------- |
| |
| The index, also known as the "staging area", is a list of files and |
| the contents of each file, stored as a <<blob,blob>>. |
| You can add files to the index or update the contents of a file in the |
| index with linkgit:git-add[1]. This is called "staging" the file for commit. |
| |
| Unlike a <<tree,tree>>, the index is a flat list of files. |
| When you commit, Git converts the list of files in the index to a |
| directory <<tree,tree>> and uses that tree in the new <<commit,commit>>. |
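| |
| The conversion from a flat list to a directory hierarchy can be |
| sketched as follows. This only shows the shape of the result: the |
| hypothetical `build_tree` function nests plain dictionaries, whereas |
| Git writes a tree object for each directory. |
| |
```python
def build_tree(index: dict) -> dict:
    """Turn a flat {path: blob ID} mapping into nested directories."""
    root = {}
    for path, blob_id in sorted(index.items()):
        *dirs, filename = path.split("/")
        node = root
        for name in dirs:
            node = node.setdefault(name, {})  # one dict per directory
        node[filename] = blob_id
    return root


index = {
    "README.md": "8728a858d9d21a8c78488c8b4e70e531b659141f",
    "src/hello.py": "665c637a360874ce43bf74018768a96d2d4d219a",
}
print(build_tree(index))
```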
| |
| Each index entry has 4 fields: |
| |
| 1. The *file type*, which must be one of: |
| - *regular file* |
| - *executable file* |
| - *symbolic link* |
| - *gitlink* (for use with submodules) |
| 2. The *<<blob,blob>>* ID of the file, |
| or (rarely) the *<<commit,commit>>* ID of the submodule |
| 3. The *stage number*, either 0, 1, 2, or 3. This is normally 0, but if |
| there's a merge conflict there can be multiple versions of the same |
| filename in the index. |
| 4. The *file path*, for example `src/hello.py` |
| |
| It's extremely uncommon to look at the index directly: normally you'd |
| run `git status` to see a list of changes between the index and <<HEAD,HEAD>>. |
| But you can use `git ls-files --stage` to see the index. |
| Here's the output of `git ls-files --stage` in a repository with 2 files: |
| |
| ---- |
| 100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md |
| 100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py |
| ---- |
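| |
| Each line of that output maps onto the fields listed above. The |
| sketch below parses the two example lines and collects the paths with |
| a non-zero stage number; `parse_stage_line` is a hypothetical helper |
| that assumes whitespace-separated fields with the path last. |
| |
```python
def parse_stage_line(line: str) -> dict:
    """Parse one line of `git ls-files --stage` output."""
    # Fields are: mode, object ID, stage number, then the file path.
    # maxsplit=3 keeps spaces inside the path intact.
    mode, obj_id, stage, path = line.split(None, 3)
    return {"mode": mode, "id": obj_id, "stage": int(stage), "path": path}


lines = [
    "100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0\tREADME.md",
    "100644 665c637a360874ce43bf74018768a96d2d4d219a 0\tsrc/hello.py",
]
entries = [parse_stage_line(line) for line in lines]

# A stage number other than 0 marks an entry involved in a merge conflict.
conflicted = [e["path"] for e in entries if e["stage"] != 0]
print(len(entries), conflicted)
```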
| |
| [[reflogs]] |
| REFLOGS |
| ------- |
| |
| Every time a branch, remote-tracking branch, or HEAD is updated, Git |
| updates a log called a "reflog" for that <<references,reference>>. |
| This means that if you make a mistake and "lose" a commit, you can |
| generally recover the commit ID by running `git reflog <reference>`. |
| |
| A reflog is a list of log entries. Each entry has: |
| |
| 1. The *commit ID* that the reference pointed to after the change |
| 2. *Timestamp* when the change was made |
| 3. *Log message*, for example `pull: Fast-forward` |
| |
| Reflogs only log changes made in your local repository. |
| They are not shared with remotes. |
| |
| You can view a reflog with `git reflog <reference>`. |
| For example, here's the reflog for a `main` branch which has changed twice: |
| |
| ---- |
| $ git reflog main --date=iso --no-decorate |
| 750b4ea main@{2025-09-29 15:17:05 -0400}: commit: Add README |
| 4ccb6d7 main@{2025-09-29 15:16:48 -0400}: commit (initial): Initial commit |
| ---- |
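| |
| For illustration, the lines of the output above can be split back |
| into the three fields of a log entry. The regular expression is an |
| assumption based only on the format shown in this example, not a |
| general parser for `git reflog` output. |
| |
```python
import re

# Matches lines like "<id> <ref>@{<timestamp>}: <message>".
REFLOG_LINE = re.compile(
    r"^(?P<id>[0-9a-f]+) (?P<ref>.+?)@\{(?P<when>[^}]*)\}: (?P<message>.*)$"
)

lines = [
    "750b4ea main@{2025-09-29 15:17:05 -0400}: commit: Add README",
    "4ccb6d7 main@{2025-09-29 15:16:48 -0400}: commit (initial): Initial commit",
]

entries = [REFLOG_LINE.match(line).groupdict() for line in lines]
for entry in entries:
    print(entry["id"], entry["when"], entry["message"])
```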
| |
| GIT |
| --- |
| Part of the linkgit:git[1] suite |