Understanding Git commit SHAs

Git's commit structure is built on hashing and trees: it uses the SHA-1 hash algorithm and a hierarchical object structure to store and manage project history efficiently.

How Git uses SHA-1 hashing

The SHA-1 hash is a cryptographic hash function that generates a 160-bit (20-byte) hash value, commonly expressed as a 40-character hexadecimal number. This hash serves several purposes within Git:

Uniqueness: Each commit and every piece of content in the repository is uniquely identified by its SHA-1 hash, ensuring that every change can be tracked and referenced distinctly.
Integrity: The hash provides a checksum of the content, which Git uses to detect corruption or tampering with the data. If even a single bit changes, the resulting hash will be entirely different.
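
The integrity property can be seen with a few lines of Python. This is a minimal sketch using the standard hashlib module on raw bytes. It computes plain SHA-1, not Git's exact object hash (Git also hashes a small header), and the two sample strings are illustrative values.

```python
import hashlib

# Two inputs that differ in a single character (illustrative values).
original = b"Hello, Git!\n"
modified = b"Hello, Git?\n"

digest_a = hashlib.sha1(original).hexdigest()
digest_b = hashlib.sha1(modified).hexdigest()

# Each digest is 160 bits, printed as 40 hexadecimal characters.
print(len(digest_a), digest_a)
print(len(digest_b), digest_b)

# Count how many of the 40 hex positions differ between the two digests.
differing = sum(a != b for a, b in zip(digest_a, digest_b))
print("hex positions that differ:", differing)
```

The two digests should look unrelated even though the inputs differ by one character, which is what lets Git notice any change to stored content.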

Why Git uses SHA-1

Git employs SHA-1 hashes for several reasons:

Efficiency: SHA-1 is fast to compute, which suits the rapid processing of objects in a repository, and its 160-bit output makes accidental hash collisions extremely unlikely.
Security: While not the primary reason, SHA-1 also adds a layer of protection by making it hard to create two different sets of content with the same hash, which makes it harder for a malicious actor to slip unnoticed code into your repository. SHA-1 is no longer considered collision-resistant, however: a practical collision was demonstrated in 2017, and newer Git versions use a collision-detecting SHA-1 implementation and offer SHA-256 as an alternative object format.

Git's use of hashing and trees

Git uses SHA-1 to compute a "git hash" for each "blob" (file content), "tree" (directory structure), and "commit" (a snapshot of the project plus metadata). These hashed objects form the backbone of Git's data model and version control capabilities.

Git blob: Represents a file's content in Git, with no file name or directory structure. Each blob is identified by a SHA-1 hash computed from its contents, prefixed with a short header giving the object type and size. This hash is referred to as the "git hash."
Git tree: A tree object represents a directory. It lists each entry's file mode, name, and hash: blob hashes for files and tree hashes for subdirectories, which makes the structure recursive. The tree itself is identified by a SHA-1 hash derived from its contents. This hierarchical organization, or "git hash tree," allows Git to efficiently manage and navigate the project's directory structure.
Git commit: A commit object points to a tree object that represents the top-level directory of the project at a certain point in time. It contains metadata such as the author, commit message, and parent commits, creating a linked history. The commit is also identified by a SHA-1 hash, known as the "git commit hash."
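
The way a tree refers to blobs and other trees by hash can be sketched in Python. This is an illustration under assumptions, not Git's implementation: the helper names git_object_hash and tree_body and the file names are hypothetical, and entries are sorted by plain name, which simplifies Git's real ordering rule for directories.

```python
import hashlib

def git_object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL byte + body, the form Git hashes objects in."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Build a tree body from (mode, name, hex_hash) tuples.

    Each entry is '<mode> <name>', a NUL byte, then the 20 raw hash bytes.
    Sorting by name only is a simplification of Git's ordering.
    """
    body = b""
    for mode, name, hex_hash in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_hash)
    return body

# A blob hash depends only on the file's bytes.
blob = git_object_hash("blob", b"Hello, Git!\n")

# A subdirectory is itself a tree, referenced by its own hash.
subtree = git_object_hash("tree", tree_body([("100644", "main.py", blob)]))

root = git_object_hash("tree", tree_body([
    ("100644", "example.txt", blob),  # regular file -> blob hash
    ("40000", "src", subtree),        # subdirectory -> tree hash
]))

print("blob:   ", blob)
print("subtree:", subtree)
print("root:   ", root)
```

Because the root tree's hash covers the hashes of everything beneath it, changing any file changes the hash of every tree on the path up to the root.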

Example of Git hashing

Let's illustrate how Git uses the git hash-object command to generate hashes for tracking and managing files within a repository.

Step 1: Create a new file

First, we create a new text file named example.txt and add some content to it. Let's say the content is "Hello, Git!".

Terminal

echo "Hello, Git!" > example.txt

Step 2: Calculate the hash with `git hash-object`

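Next, we use the git hash-object command to calculate the SHA-1 hash of the file's contents. The command reads the file, prepends a short header (the word "blob", the content size in bytes, and a null byte), and prints the SHA-1 hash of the result. This is the same computation Git performs internally when a file is added to the repository. By default the command only prints the hash; it stores nothing in the repository unless the -w option is given.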

Terminal

git hash-object example.txt

This command will output a 40-character SHA-1 hash that uniquely identifies the content of example.txt, in this case:

Terminal

d94b5f7ec7c6d7602c78a5e9b8a5b8c94d093eda

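This hash identifies the content of example.txt: the text "Hello, Git!" followed by the newline that echo appends. Any file with exactly the same bytes produces the same hash, regardless of its name or location.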
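
The computation behind git hash-object can be reproduced outside Git. The following is a minimal Python sketch, assuming the default SHA-1 object format and no content filters or line-ending conversion. The function name git_blob_hash is hypothetical.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the hash Git assigns to a blob holding `content`."""
    # Git hashes "blob <size in bytes>" + NUL byte + the raw content.
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# echo appends a newline, so example.txt holds these 12 bytes.
content = b"Hello, Git!\n"
print(git_blob_hash(content))

# The file name plays no part: equal bytes always give an equal hash.
print(git_blob_hash(content) == git_blob_hash(b"Hello, Git!\n"))
```

To check a real file, read it in binary mode, for example git_blob_hash(open("example.txt", "rb").read()), and compare the result with what git hash-object prints for the same file.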

Step 3: Understand the hash in Git's data model

The hash d94b5f7ec7c6d7602c78a5e9b8a5b8c94d093eda acts as a "git hash code" for the blob object representing the content of example.txt. If you were to add this file to a Git repository using git add and then commit the change, Git would use this hash to track the file content.

As a blob: In Git's data model, the file content "Hello, Git!" is stored as a blob object, identified by this SHA-1 hash. The blob contains just the content, with no information about the file name or directory structure.
In a tree: If example.txt is part of a directory that is committed to Git, a tree object will be created. This tree object contains entries for all items in the directory, including example.txt. The entry for example.txt in the tree will reference the blob by its hash.
In a commit: When you make a commit, a commit object is created. This commit object points to the top-level tree object representing the state of the repository at that commit. The commit itself is also identified by a SHA-1 hash, computed from its content and metadata: the tree it points to, the hashes of its parent commits, the author and committer with their timestamps, and the message. Changing any of these produces a different commit hash.
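
How a commit hash depends on its tree, parents, and metadata can be sketched in Python. This is a simplified illustration, assuming an unsigned commit with a UTC timestamp and the same person as author and committer. The helper names and the identity are hypothetical.

```python
import hashlib

def git_object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL byte + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parents, author, timestamp, message):
    """Assemble the text of a simple, unsigned commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of an empty tree
who = "Example Author <author@example.com>"  # hypothetical identity

first = git_object_hash(
    "commit", commit_body(EMPTY_TREE, [], who, 1700000000, "Initial commit"))
# The second commit names the first as its parent, linking the history.
second = git_object_hash(
    "commit", commit_body(EMPTY_TREE, [first], who, 1700000060, "Second commit"))
# Same tree and author, different message: the hash changes.
reworded = git_object_hash(
    "commit", commit_body(EMPTY_TREE, [], who, 1700000000, "Initial commit!"))

print(first)
print(second)
print(reworded)
```

Since each commit embeds its parent's hash, rewriting an earlier commit changes the hash of every commit that descends from it.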

This example illustrates how git hash-object gives us a glimpse into the foundational role that hashing plays in Git's version control system. By uniquely identifying file contents with SHA-1 hashes, Git can efficiently track changes, ensure data integrity, and manage complex project histories.

For further reading on how Git organizes its internal data storage, see the official Git documentation.
