
Cloning a single file in Git

Greg Foster
Graphite software engineer


Note

This guide explains this concept in vanilla Git. For Graphite documentation, see our CLI docs.


Git normally clones entire repositories, but sometimes you need just a single file. This is especially useful with large repositories, where downloading everything is unnecessary and slow. This guide covers three ways to retrieve a single file: sparse checkout, git archive, and the GitHub API.

Git does not support cloning a single file directly because the git clone command inherently works at the repository level. However, there are several methods to effectively retrieve just a single file from a repository without cloning the entire project.

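Sparse checkout lets you check out only specific files or directories from a repository. Git still fetches the branch's objects into the .git directory, but only the files you specify appear in your working directory, which keeps the working tree small. If download size matters, adding --depth 1 to the fetch limits the download to the latest commit.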

The steps below walk through the process, with the commands to run in a terminal:

  1. Initialize a new Git repository:

    Terminal
    git init <repo-name>
    cd <repo-name>

    The git init command creates a new, empty Git repository in the directory <repo-name>, and cd <repo-name> then changes into that directory.

    The repository starts with no files, only the .git directory where Git keeps its configuration and internal tracking information.

  2. Add the remote repository:

    Terminal
    git remote add origin <repository-url>

    This command connects your local repository to a remote repository, which is a repository hosted on a server (commonly on platforms like GitHub, GitLab, or Bitbucket).

    git remote add origin <repository-url> adds a new remote named "origin" at the specified URL. The name "origin" is a conventional name used to refer to the primary upstream repository, but you can name it anything. This step is crucial for linking your local repository with a remote repository to enable pushing (sending your commits) and pulling (receiving updates) between them.

  3. Enable sparse checkout:

    Terminal
    git config core.sparseCheckout true

    Sparse checkout lets you check out only specific subdirectories or files rather than the entire repository, which is useful in large repositories where you only need a subset of the content.

    The command git config core.sparseCheckout true sets the core.sparseCheckout option to true in the local repository's configuration, enabling this feature.

  4. Create a sparse-checkout file that specifies which files to check out:

    Terminal
    echo "path/to/your/file" > .git/info/sparse-checkout

    Once sparse checkout is enabled, you define what to check out in the file .git/info/sparse-checkout. This file contains a list of patterns, one per line, that specify the paths to include in the checkout.

    The echo command writes the path path/to/your/file to this file, replacing any previous contents, so that only that path is included. Each pattern can be a specific file path, a directory name, or a wildcard pattern; the syntax is the same as in .gitignore files.

  5. Fetch the data and check out the specific file:

    Terminal
    git fetch origin main
    git checkout main

    The git fetch origin main command contacts the remote named "origin" and downloads the commits and objects for the main branch only, not for every branch on the remote. It stores them in your local repository's database without altering your working directory.

    git checkout main then creates a local main branch from the fetched one and updates your working directory to match its latest commit. Because sparse checkout is enabled, this step checks out only the files that match the patterns in the sparse-checkout file, instead of all files in the branch.
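
The five steps above can be sketched as one script. This is a minimal illustration, not a definitive recipe: so that it runs without network access, it first builds a throwaway local repository (the hypothetical upstream directory, with made-up files docs/guide.txt and src/main.c) and uses that as the remote. With a real project you would skip that part and pass the actual repository URL to git remote add.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory; delete it by hand when you are done.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the remote repository (hypothetical content, illustration only).
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main   # name the first branch "main"
mkdir -p upstream/docs upstream/src
echo "hello" > upstream/docs/guide.txt
echo "int main(void) { return 0; }" > upstream/src/main.c
git -C upstream add .
git -C upstream -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Initial commit"

# Steps 1 and 2: new empty repository, linked to the remote.
git init -q single-file
cd single-file
git remote add origin "$workdir/upstream"

# Step 3: enable sparse checkout.
git config core.sparseCheckout true

# Step 4: list the one path we want in the working directory.
mkdir -p .git/info
echo "docs/guide.txt" > .git/info/sparse-checkout

# Step 5: fetch the branch, then check it out.
git fetch -q origin main
git checkout -q main

# List the files in the working directory, ignoring Git's own metadata.
find . -path ./.git -prune -o -type f -print
# prints: ./docs/guide.txt
```

Only docs/guide.txt appears in the working directory, while src/main.c stays in the repository's object database without being written to disk.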

Using git archive to download a single file involves accessing a remote repository and piping the output to tar to extract a specific file.

  1. Use git archive and tar:

    Terminal
    git archive --remote=<repository-url> HEAD:path/to/directory/ filename | tar -x
    • git archive: This Git command is used to create an archive (like a .tar or .zip file) of files from a named tree in the repository.
    • --remote=<repository-url>: This option specifies that the archive should be created not from the local repository, but directly from a remote repository at the given URL.
    • HEAD:path/to/directory/ filename: These are two separate arguments. HEAD:path/to/directory/ names the tree to archive: because --remote is used, HEAD refers to the latest commit on the remote repository's default branch, and the path after the colon selects a directory within that commit. filename then limits the archive to that one file inside the directory. The space between the two is intentional: git archive expects a tree followed by optional paths, so joining them into HEAD:path/to/directory/filename would name a file rather than a tree, and the command would fail.
    • | tar -x: The output of git archive is piped (|) directly into the tar command. tar -x extracts the files from the archive stream it receives from git archive. This means that as soon as git archive creates the archive, tar extracts it immediately, which allows for directly extracting files without having to save and then manually extract the archive.
  2. Replace placeholders:

    • <repository-url>: Replace this with the actual URL of the remote Git repository from which you want to extract the file. The server must accept git archive --remote requests, which not every host does; GitHub, for example, does not support them.
    • path/to/directory/: Replace this with the path of the directory that contains the file. The path must exist in the latest commit on the branch that HEAD points to in the remote repository (usually the main branch).
    • filename: Replace this with the name of the file you want to extract from that directory. To extract the entire directory instead, omit filename.
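
The command above can be tried end to end with the sketch below. It is an illustration under stated assumptions: to stay runnable offline it creates a throwaway local repository (the hypothetical upstream directory with made-up files) and passes its path to --remote, and it writes tar -xf - so that tar reads from standard input explicitly. Against a real server you would substitute the repository URL.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory; delete it by hand when you are done.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the remote repository (hypothetical content, illustration only).
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main   # name the first branch "main"
mkdir -p upstream/docs
echo "hello" > upstream/docs/guide.txt
echo "other" > upstream/docs/other.txt
git -C upstream add .
git -C upstream -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Initial commit"

# Extract into an empty directory that is not itself a Git repository.
mkdir extracted
cd extracted

# Two arguments after the options: the tree (HEAD:docs) and the file inside it.
git archive --remote="$workdir/upstream" HEAD:docs guide.txt | tar -xf -

ls
# prints: guide.txt
```

Because the tree argument is the docs directory itself, the file is extracted as guide.txt rather than docs/guide.txt, and other.txt is never transferred.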

If the file is hosted on GitHub, you can use GitHub's API to download a single file directly.

  1. Construct the URL:

    • Format: https://api.github.com/repos/<username>/<repository>/contents/<path-to-file>
  2. Use curl or wget to download the file:

    Terminal
    curl -H 'Accept: application/vnd.github.v3.raw' -O -L <URL>
    • Replace <URL> with the full URL constructed in the previous step.

This method is straightforward and does not require cloning the repository or installing Git, but it works only with repositories hosted on GitHub, and private repositories additionally require an authentication token.
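
The two steps can be wrapped in small shell functions, as in the sketch below. The function names github_contents_url and download_github_file are hypothetical helpers, and example-user, example-repo, and docs/guide.txt are placeholders rather than a real repository. The download call is left commented out because it needs network access and a repository that exists. The sketch also assumes the path contains no characters that need URL encoding.

```shell
#!/bin/sh
set -eu

# Build the contents URL from owner, repository, and file path.
github_contents_url() {
    printf 'https://api.github.com/repos/%s/%s/contents/%s\n' "$1" "$2" "$3"
}

# Download the raw file into the current directory, named after the file.
# -f fails on HTTP errors, -sS hides progress but shows errors, -L follows redirects.
download_github_file() {
    file_url=$(github_contents_url "$1" "$2" "$3")
    curl -fsSL -H 'Accept: application/vnd.github.v3.raw' \
        -o "$(basename "$3")" "$file_url"
}

url=$(github_contents_url example-user example-repo docs/guide.txt)
echo "$url"
# prints: https://api.github.com/repos/example-user/example-repo/contents/docs/guide.txt

# Uncomment with a real owner, repository, and path (requires network access):
# download_github_file example-user example-repo docs/guide.txt
```

By default the API returns the file from the repository's default branch; appending ?ref=<branch-or-commit> to the URL selects a different version.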

For more reading, see the official documentation on git sparse-checkout, git archive, and the GitHub API.
