
Managing monorepos with Git

Kenny DuMez
Graphite software engineer

A monorepo, or monolithic repository, is a source control strategy in which multiple software projects are stored in a single repository. This approach can simplify dependency management, streamline the development process, and encourage code reuse. However, managing a monorepo effectively with Git requires specific strategies and tools to keep everyday operations fast as the repository grows.

A monorepo is a single repository that contains the code for many separate projects, which may be related or independent. This contrasts with multi-repo approaches where each project has its own discrete repository. Large companies like Google and Facebook use monorepos for their codebases as it simplifies many aspects of their workflow and tooling.

Benefits of monorepos:

  • Simplified dependency management: Changes to shared libraries or services can be made atomically across all of the projects that depend on them.
  • Unified versioning: A single commit can represent a snapshot of the state of all of the projects at a point in time.
  • Collaboration: It's easier to refactor across project boundaries since all code is in a single repo.

Challenges of monorepos:

  • Scalability: As the repository grows, so do its storage footprint and the time taken by Git operations like clone, fetch, and pull.
  • CI/CD complexity: Continuous integration and deployment systems may need to be optimized so that a small change does not trigger testing or deploying everything in the monorepo.

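Sparse checkout is a feature in Git that allows you to check out only a subset of a larger repository's files into your working tree. Combined with a partial clone (the --filter option below), which defers downloading file contents until they are needed, this can significantly reduce the amount of data pulled onto a developer’s machine.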

Using sparse checkout:

Terminal
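# Partial clone: fetch history now and file contents on demand; check nothing out yet
git clone --filter=blob:none --no-checkout https://your-repository-url.git your-repository
cd your-repository
# Enable sparse checkout in cone mode (patterns match whole directories)
git sparse-checkout init --cone
# Limit the working tree to this directory, plus files at the repository root
git sparse-checkout set apps/myapp
git checkout main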

This sequence creates a partial clone without checking out any files, initializes sparse checkout in cone mode (which restricts patterns to whole directories and is optimized for performance), specifies the directories you wish to check out, and then checks out the main branch.
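To try the sequence without a real remote, here is a minimal sketch that builds a throwaway repository and sparse-clones it locally. The directory names (origin-repo, sparse-clone, apps/other, libs/shared) are made up for illustration, the --filter option is left out because a plain local clone has no server to filter on, and Git 2.26 or newer is assumed for git sparse-checkout add.

```shell
#!/bin/sh
set -eu

# Scratch workspace so nothing outside a temporary directory is touched.
work=$(mktemp -d)
cd "$work"

# Build a tiny stand-in monorepo with three hypothetical projects.
git init -q origin-repo
git -C origin-repo symbolic-ref HEAD refs/heads/main
for project in apps/myapp apps/other libs/shared; do
  mkdir -p "origin-repo/$project"
  echo "placeholder for $project" > "origin-repo/$project/README.txt"
done
git -C origin-repo add .
git -C origin-repo -c user.name=demo -c user.email=demo@example.com \
  commit -q -m "Add three projects"

# Same steps as the sequence above, without the partial-clone filter.
git clone -q --no-checkout origin-repo sparse-clone
cd sparse-clone
git sparse-checkout init --cone
git sparse-checkout set apps/myapp
git checkout -q main

ls apps                            # only myapp is present in the working tree

# Widen the working tree later without cloning again.
git sparse-checkout add libs/shared
git sparse-checkout list           # directories currently included
```

The other projects remain in the repository's history; they are simply not written to disk until they are added to the sparse-checkout set.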

VFS for Git, originally developed by Microsoft, manages large repositories by virtualizing the file system underneath them, so that file contents are downloaded only when they are first needed on the developer’s machine.

Git LFS (Large File Storage) handles large files without bloating the repository. Files tracked by LFS are stored on a separate server, and the repository itself holds only small pointer files that reference them.

Configuring Git LFS:

Terminal
git lfs install
git lfs track "*.psd"
git add .gitattributes
git commit -m "Track PSD files with Git LFS"

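For example, in a repository that stores large Photoshop .psd assets, LFS avoids transferring every version of those files on each clone and fetch. With this setup, .psd files added from now on are stored via LFS, which keeps the repository size manageable. Files already committed to history stay in regular Git storage unless they are migrated, for example with git lfs migrate import.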
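One way to check which paths a tracking rule covers is to ask Git which filter attribute applies to them. The sketch below assumes the rule that git lfs track "*.psd" normally records in .gitattributes; it is written by hand here so the example runs without Git LFS installed, and the design/ and docs/ paths are hypothetical.

```shell
#!/bin/sh
set -eu

# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Assumption: this mirrors the rule that `git lfs track "*.psd"` records.
echo '*.psd filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask Git which filter applies to each path; the files need not exist.
psd_filter=$(git check-attr filter -- design/logo.psd)
txt_filter=$(git check-attr filter -- docs/notes.txt)
echo "$psd_filter"
echo "$txt_filter"
```

A path that reports the lfs filter will be routed through LFS when it is added. In a repository where Git LFS is installed, git lfs ls-files lists the files that are already stored that way.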

For monorepos, CI/CD pipelines need to be smart about what they test and deploy:

  • Path filters: Configure your CI/CD system to trigger jobs based on changes to specific paths.
  • Bazel: Google's build tool, Bazel, can intelligently determine which parts of a repository need to be rebuilt and tested based on changes.
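Most CI systems offer path filters as configuration, but the underlying idea can be sketched with plain Git: list the files a change touches and reduce them to project directories. The example below is an illustration in a scratch repository, not any particular CI system's syntax; the apps/myapp and apps/other layout and the two-level project depth are assumptions.

```shell
#!/bin/sh
set -eu

# Scratch monorepo with two hypothetical projects under apps/.
repo=$(mktemp -d)
cd "$repo"
git init -q .
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

mkdir -p apps/myapp apps/other
echo "v1" > apps/myapp/main.txt
echo "v1" > apps/other/main.txt
git add .
commit "Initial import"

# A follow-up change that touches only apps/myapp.
echo "v2" > apps/myapp/main.txt
git add .
commit "Update myapp"

# Reduce the changed file list to two-level project paths, de-duplicated.
changed_projects=$(git diff --name-only HEAD~1 HEAD | cut -d/ -f1-2 | sort -u)

for project in $changed_projects; do
  echo "would run tests for $project"   # stand-in for the real test command
done
```

In a real pipeline the comparison would more likely be against the target branch, for example git diff --name-only origin/main...HEAD, and shared libraries would need extra handling so that a change to them also triggers the projects that depend on them.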

While the management of monorepos in Git can be complex, the benefits often outweigh the challenges, especially for large-scale projects. By leveraging tools and strategies like sparse checkouts, VFS for Git, Git LFS, and tailored CI/CD workflows, teams can maximize their productivity and maintain scalability in their development processes.

For further reading, see this comprehensive guide to monorepos.
