How 10,000 Developers All Contribute to the Same Repo

Imagine stepping into a role as a junior developer at a global tech behemoth. On your first day, you get straight to work and start exploring the git repo you’ll be working in. There’s only one problem: the repository is massive, containing millions of lines of code. To put this into perspective, cloning the repo directly onto your machine could take upwards of 12 hours. This is because the company uses something called a “monorepo”, where every line of code across every team in the whole company lives in a single repository. That means 10,000 software engineers all committing, reviewing, and merging into the same repository, all at once. This may seem insane at first, but it is the reality at some of the largest companies in tech. For example, both Meta (formerly Facebook) and Google structure their codebases as monorepos and use a workflow called “stacking” to handle the massive number of concurrent changes. By breaking up large changes into a series of smaller PRs that build on one another, stacking allows engineers at these giant companies to stay unblocked and deliver better-quality code, faster. Coupled with specifically tailored tooling and CI/CD systems, stacking in a monorepo helps the largest tech companies in the world build and ship the products that billions of people use every day.

What is a monorepo?

A monorepo (monolithic repository), much like its name implies, is a single repository that holds multiple projects together. These projects may be related or unrelated, and could be components, libraries, tools, services, or entire self-contained applications. Giant companies tend to favor monorepos because, given the right tools and engineering practices, they’re far easier to manage at scale. When implemented correctly, monorepos facilitate code reuse across multiple services and teams by unifying all your source code into a single source of truth. Organizing all of your code into a single repository also allows different projects to share the same configurations, libraries, and tools, while enforcing uniform code standards across your organization through a shared CI/CD pipeline.

The benefits of stacking

These advantages, however, come with inherent challenges, notably the monumental size of the repositories, which routinely clock in at tens of millions (sometimes billions) of lines of code. With so many engineers all contributing to the same repository, avoiding merge conflicts and managing code review can be extremely tricky. This is where “stacking” comes in. Traditional git workflows equate every feature with its own PR, made up of several commits, on its own branch. You finish a feature, submit a PR, and wait for its approval before you can merge it back into main. Once it’s merged, you branch off of main again and begin building the next feature. Even at its most seamless, this process can take a while, with PRs sitting and waiting for review for hours, days, or even weeks. When you start factoring in merge conflicts, bugs, and disagreements on the PR, the review process can grind productivity to a halt.

Stacking breaks up larger changes into many smaller ones and makes the pull request the unit of change, so each change can be tested, reviewed, landed, and reverted easily. Once you finish one piece of a feature and open a PR for it, you branch off of that PR, rather than off of main, to start the next piece that depends on it. These collections of smaller changes, or “stacks,” can continuously be built, one on top of the other, allowing engineers to stay unblocked and keep moving. Because new work no longer has to wait for earlier work to be reviewed and merged into main, developers can keep building in parallel with code review.
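To make the idea of a stack concrete, here is a minimal Python sketch that models a stack as PRs pointing at the PR they build on. The `PullRequest` class, the PR names, and the rule that a PR lands only after everything beneath it is approved are all illustrative assumptions, not the behavior of any specific tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PullRequest:
    """One small change in a stack (hypothetical model, not a real API)."""
    name: str
    parent: Optional[str]  # None means the PR is based directly on main
    approved: bool = False


def landing_order(prs: List[PullRequest]) -> List[str]:
    """Return PR names bottom-up: each PR appears after the PR it builds on."""
    children: Dict[Optional[str], List[str]] = {}
    for pr in prs:
        children.setdefault(pr.parent, []).append(pr.name)

    order: List[str] = []
    queue = list(children.get(None, []))  # start with PRs based on main
    while queue:
        name = queue.pop(0)
        order.append(name)
        queue.extend(children.get(name, []))
    return order


def ready_to_land(prs: List[PullRequest]) -> List[str]:
    """Assumed rule: a PR can land once it and everything beneath it are approved."""
    by_name = {pr.name: pr for pr in prs}
    ready: List[str] = []
    for name in landing_order(prs):
        pr = by_name[name]
        parent_ok = pr.parent is None or pr.parent in ready
        if pr.approved and parent_ok:
            ready.append(name)
    return ready


# A three-PR stack: schema first, then the API that uses it, then the UI.
stack = [
    PullRequest("add-db-schema", parent=None, approved=True),
    PullRequest("add-api-endpoint", parent="add-db-schema", approved=True),
    PullRequest("add-ui-form", parent="add-api-endpoint", approved=False),
]

print(landing_order(stack))
print(ready_to_land(stack))
```

In this sketch the author could keep working on `add-ui-form` while the two PRs beneath it are reviewed and landed, which is the "stay unblocked" property described above.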

Stacking manually can be challenging, so large companies have invested heavily in building their own internal tools to facilitate it. Critique by Google and Phabricator by Facebook are two examples of the kind of tooling that allows these companies to enhance developer productivity and minimize stagnation, which matters when every hanging PR can snowball into bigger and bigger blockages.

Continuous integration and delivery

Once a developer’s stack of PRs is approved and merged, the focus shifts to deploying. This is where a solid CI/CD (continuous integration and continuous delivery) workflow is essential. The first step, continuous integration, can be thought of as the build stage of the software release cycle: it automates the build process and catches integration problems early. During this stage a battery of tests is run to ensure that code standards are being met and that the changes integrate into the rest of the codebase without breaking anything. Again, the big companies have their own tools to help. Build tools like Bazel by Google specialize in large codebases, using extensive caching and distributed builds to speed up the integration process, which matters when even a 5% optimization translates to millions of dollars in savings. This stage primes the code for the next phase: continuous delivery.
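The caching idea can be illustrated with a toy content-addressed build cache. This is only a sketch of the general technique, not how Bazel is implemented; `BuildCache`, `cache_key`, `fake_compile`, and the target name `server` are hypothetical names introduced for the example. The key point is that a build step is skipped whenever its inputs hash to a key that has already been built.

```python
import hashlib
from typing import Callable, Dict, List


def cache_key(target: str, sources: List[str]) -> str:
    """Hash the target name and every source so any input change alters the key."""
    digest = hashlib.sha256()
    for part in [target, *sources]:
        data = part.encode("utf-8")
        # Length-prefix each part so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


class BuildCache:
    """Toy in-memory cache: identical inputs reuse the earlier build output."""

    def __init__(self) -> None:
        self._outputs: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def build(self, target: str, sources: List[str],
              compile_fn: Callable[[List[str]], str]) -> str:
        key = cache_key(target, sources)
        if key in self._outputs:
            self.hits += 1
            return self._outputs[key]
        self.misses += 1
        output = compile_fn(sources)  # only runs when the inputs are new
        self._outputs[key] = output
        return output


def fake_compile(sources: List[str]) -> str:
    """Stand-in for an expensive compile step."""
    return "|".join(s.upper() for s in sources)


cache = BuildCache()
cache.build("server", ["def main(): pass"], fake_compile)      # first build: miss
cache.build("server", ["def main(): pass"], fake_compile)      # unchanged: hit
cache.build("server", ["def main(): return 1"], fake_compile)  # edited: miss
print(cache.hits, cache.misses)
```

In a monorepo most targets are untouched by any given change, so a cache keyed on inputs lets CI rebuild and retest only what actually changed.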

Best practices for deploying production code dictate that all builds be:

Automated
Reproducible
Fault-tolerant

To achieve these goals, organizations use technologies like Terraform for reproducible infrastructure as code, along with techniques like canary deployments, which release a change to a small slice of traffic first, to make the release process systematic and much lower-risk. An optimized CI/CD deployment ensures a smooth user experience and provides safeguards against application downtime and widespread outages.
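As a rough illustration of a canary deployment gate, the Python sketch below compares the error rate of a small canary release against the current baseline and decides whether to promote, roll back, or keep waiting. The `Metrics` class, the function name, and the thresholds (`max_increase`, `min_requests`) are assumptions made up for the example; real systems typically look at many more signals.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Request and error counts observed for one version (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: Metrics, canary: Metrics,
                    max_increase: float = 0.01,
                    min_requests: int = 1000) -> str:
    """Return "wait", "rollback", or "promote" for the canary release."""
    # Too little traffic to judge the canary yet (assumed threshold).
    if canary.requests < min_requests:
        return "wait"
    # Roll back if the canary is meaningfully worse than the baseline.
    if canary.error_rate > baseline.error_rate + max_increase:
        return "rollback"
    return "promote"


baseline = Metrics(requests=100_000, errors=500)

print(canary_decision(baseline, Metrics(requests=200, errors=0)))
print(canary_decision(baseline, Metrics(requests=2_000, errors=12)))
print(canary_decision(baseline, Metrics(requests=2_000, errors=80)))
```

Because the decision is a pure function of measured metrics, it can run automatically on every release, which is what makes the rollout both automated and fault-tolerant in the sense of the list above.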

What about the little guys?

Throughout this post we’ve been looking at what the biggest companies in the world are doing to produce efficient, quality code. The good news, though, is that these practices aren’t just limited to trillion-dollar companies anymore! Tools like Graphite bring the stacking workflow to everyone, allowing teams of all sizes to start taking advantage of the same practices that the giant companies have been using for years. Graphite’s CLI and dashboard let engineers take advantage of the “stacking” workflow, while syncing all your data back to GitHub.

Whether you’re an engineer on a team or developing solo, Graphite’s developer productivity tools and stacking workflow will help you ship code faster. Whether you’re a junior dev looking to learn how bigger companies work, or a senior engineer who misses the tooling you used to have at the tech giants, tools like Graphite can help bridge the gap between smaller orgs and the biggest in the industry.

TL;DR

Monolithic Repositories: Enable consistent collaboration across applications but bring the challenge of managing a massive amount of code.
Stacking: Allows continuous incremental improvements, avoids waiting on the main branch, and facilitates parallel development.
Continuous Integration and Continuous Delivery: Ensure that the synchronized efforts of thousands are seamlessly and securely delivered, protecting against potential multimillion-dollar losses from deployment mishaps.

This holistic approach to software development, epitomized by methodologies like stacking and practices like CI/CD, has been refined over the years by the largest, most successful tech companies to enhance, innovate, and deliver. These are lessons we can all learn from, no matter the size of our team.
