
Ana was exhausted. Her team had just resolved a major incident caused by a bug that had made it into production, wreaking havoc for their customers. The feature involved a complex revamp of their recommendations engine—one of the biggest changes they had ever undertaken.

Faced with hundreds of lines of intricate new logic, reviewers had clearly focused only on the parts they understood, assuming the rest was fine.

As the lead developer, she took responsibility for the failure in their review process and recognized that in their rush to complete the key feature, some code reviews had been no more than a rubber stamp.

On the surface, code reviews may seem like an unnecessary drag on developer velocity. However, data reveals that structured reviews can improve quality and productivity—when implemented effectively.

Big ideas developed in a vacuum are doomed from the start. Feedback is an essential tool for building and growing a successful company. — Jay Samit, Independent Vice Chairman, Deloitte Digital

According to Steve McConnell’s book Code Complete, code inspections discover over 60% of defects compared to 25-45% for standard testing. Here are a few more stats from the book:

  • Code reviews cut errors by over 80% at Aetna, enabling a 20% decrease in dev staffing.

  • AT&T saw a 14% productivity boost and a 90% decrease in defects after introducing code reviews.

  • 63% of new devs were able to learn and use Git in one semester, indicating change process adoption viability.

At Graphite, we found that teams following PR size best practices have PRs hovering around the 50-line average and ship 40% more code than a similar team doing 200+ line PRs. Smaller PRs also make writing proper unit tests for each module easier and make reverting regressions much easier.

This data demonstrates that thoughtfully designed reviews can isolate objective changes and provide a model for increasing release stability. Consider consistent, structured code reviews as one of the main pillars of the development process.

When done right, code reviews improve product quality and team productivity—but the review model must adhere to modern code review processes.

What review models best balance robustness and speed? Let's explore the primary types of reviews and their tradeoffs.

Let’s pick back up with Ana and the team.

It was time for them to evaluate their code review process to prevent incidents like this moving forward. Ana knew proper code reviews were necessary to improve performance, velocity, and quality.

“If an egg is broken by outside force, life ends. If broken by inside force, life begins. Great things always begin from inside.” – Jim Kwik, Learning Expert.

Different developers and teams use different types of code review. Let's explore the most popular methods and evaluate how well each one fits Ana and her team.

A formal code review is a structured, thorough process involving multiple phases and participants that helps examine code for defects. Originating from Michael Fagan's work in the 1970s, this method emphasizes defect detection rather than correction or improvement.

The process typically unfolds in several stages:

  • Planning

  • Overview meeting

  • Preparation

  • Inspection meeting

  • Causal analysis

  • Reworking

  • Follow-up

The goal is to catch a large share of the defects in the code, typically 60 to 90 percent.

Each participant plays a specific role, including the moderator, program designer (or architect), developer (or coder), and tester, contributing to a comprehensive and detailed review.

Inspections are time-boxed to maintain efficiency, with two-hour sessions being optimal to prevent a decrease in error detection effectiveness.

While formal reviews are highly effective in finding defects, they can be time-consuming and require significant preparation and participation effort.

Formal code reviews also tend not to scale well, especially since systems grow more complicated over time.

As a program evolves and acquires more features, it becomes complicated, with subtle dependencies between its components. Over time, complexity accumulates, and it becomes harder and harder for programmers to keep all of the relevant factors in their minds as they modify the system. This slows down development and leads to bugs, which slow development even more and add to its cost. The larger the program, and the more people that work on it, the more difficult it is to manage complexity. — From the book A Philosophy of Software Design.

Ana considered whether formal reviews could have prevented the issue they experienced. The structured preparation and time-boxed inspection meeting would certainly have encouraged a more thorough checking of the entire change set.

However, she was concerned that the formality and time required would not work well with her team's Agile processes. The cost of disrupting development to prepare for and participate in lengthy review meetings would likely outweigh the benefits.

Lightweight code reviews offer a more flexible and less resource-intensive approach than formal reviews. They generally include several methods, such as pair programming, over-the-shoulder reviews, asynchronous reviews, and tool-assisted reviews.

These methods share a common goal of speeding up the feedback loops and integrating easily into the development workflow without the extensive setup of formal inspections.

In pair programming, two developers work simultaneously on the same piece of code, effectively conducting a continuous review process. This approach fosters mutual motivation and maintains focus, especially among developers of similar experience levels.

The appeal of pair programming for certain tasks may be clear, but across a whole project, it would be impractical. Two developers on Ana’s team would rarely be working in the same area of the codebase. This approach may be more practical for larger businesses with an established product that needs to be maintained.

Over-the-shoulder reviews occur in real time, with the reviewer joining the coder at their workstation to go through the code together. This method is most useful when the reviewer needs more familiarity with the task's objectives or anticipates substantial code improvements.

“Over the shoulder is often the developer explaining their decisions in the code, instead of the reviewer trying to reverse-engineer it, independently. It's just faster and has less resistance -- not necessarily better. The problem with remote live reviews is that in a remote environment, it's harder to tell if someone is free or if they are doing their deep work. Either the developer has to wait for the review to be done asynchronously before the merge... or ping someone to review their code through a screenshare and take away their attention.” — Hacker News user, aman-pro

However, over-the-shoulder review can force context switching, hurting the reviewer's productivity and the team's overall efficiency.

While synchronous reviews by a reviewer sitting at a workstation could be valuable, Ana's team was fully remote across multiple time zones. Real-time over-the-shoulder review would be almost impossible to coordinate.

Asynchronous code review allows the coder and reviewer to operate independently, with the reviewer examining the code and providing feedback at their convenience. It minimizes the disruption associated with synchronous reviews but can lead to extended review cycles spread over several days. Some teams prioritize reviews at specific times, such as after breaks, to mitigate delays and maintain a reasonable review turnaround.

Asynchronous reviews may be a good fit, allowing Ana's globally distributed team members to inspect code without forcing real-time alignment of schedules. However, she worried that long feedback delays could still be an issue without some way to focus reviews.

Tool-assisted review uses specialized code review tools to streamline and enhance the review process. These tools simplify the workflows for submitting changes, requesting reviews, annotating code, tracking issues, and approving or rejecting changes.

Modern code review platforms aim to assist teams in performing effective reviews without frustration. The most capable tools like Graphite, GitHub, GitLab, and Phabricator build lightweight code review workflows on top of existing systems.

Automation can streamline rote tasks like assignments, notifications, metrics gathering, policy compliance tracking, and more. However, restrictive automation that strictly dictates practices can hinder productivity, especially when developers have a preferred workflow. The most effective systems strike a balance—providing helpful guidance while keeping humans firmly in the loop.

Used well, tool assistance can incorporate team standards and best practices directly into the existing flow of work. Checklists, templates, and visibility help streamline lightweight reviews without excessive processes and SOPs.

Ana's globally distributed team members often struggle to align schedules for synchronous reviews. Tool-assisted code review offers consistency, reduced manual effort, and customizable workflows, but Ana worries that relying solely on tools could create problems of its own: without careful adjustment to the team's specific needs, automated processes might unintentionally limit their ability to work together. So, while code review tools are invaluable for efficiency, Ana recognizes the importance of balancing their benefits with the need for flexibility in her team's workflow.

Pull requests have become a standard in open-source and commercial development for improved code review. You can use pull requests for pair programming, formal code reviews, and most other kinds of code review. That flexibility is why most companies stick to pull request-based code reviews.

This method uses a version control system (VCS) like Git to submit code changes for review before merging, supporting collaboration and iterative feedback through comments and approvals.

Many development teams adopt this method due to its streamlined integration into daily workflows. If your team follows this method, you may also want to ensure they adhere to the pull request best practices for improved efficiency.

Ana could see how using pull requests as the vehicle for their code reviews could address some of the issues that led to their recent incident:

  • PRs connected to tickets make the scope for review more manageable.

  • The PR approval process acts as a speed bump, preventing changes from being merged without proper inspection.

  • Comments and version histories support discussion and iterative improvement of the changes.

Such change-based reviews would align well with Ana's team's Agile approach of working in fixed-length sprints and tracking progress via user stories and tickets.

However, one notable challenge with regular pull request workflows is that PRs often tend to become sizable, leading to review delays.

Large PRs could wait for a review for days—and, according to surveys, sometimes even years.

Thoroughly examining and validating thousands of lines of code across multiple files, while comprehending the PR's purpose, can be overwhelming for reviewers.

“I’ve found that pull request size solves a lot of the issues that people have with code reviews and quality. When people see a very large pull request, there is a tendency to skim and then slap on an Approval. Keeping pull requests small typically leads to a more thorough review because it’s much easier to parse the changes and build a mental model. This usually leads to better feedback. This also helps prevent less experienced devs from going crazy down the rabbit hole and making a huge code change. Small and steady is best, and fostering a culture where people are often asking each other questions and collaborating is key.” — Hacker News user, matthewwolfe

Large PRs may also lead to negligence, and reviewers may approve buggy code. In these situations, reviewers may do a quick skim instead of a detailed review—making it easier for bugs to go unnoticed and reducing the review process's thoroughness.

This issue is common enough that there is a large stock of memes floating around the web.

What’s the solution? 

Smaller, focused PRs enabled by stacked pull requests. Pull request stacking encourages a modular breakdown of massive changes into interconnected stacks of bite-sized PRs. This complements reviews by reducing complexity, making changes easier to validate without blocking progress.

Let’s understand them in more detail.

Our analysis of over 1.5 million pull requests compared the number of files changed to the time those PRs took to merge. The data revealed clear patterns:

  • The fastest PRs changed the fewest files; median time-to-merge was 3x higher for PRs that changed five or more files.

  • Review complexity grows with more files, requiring elevated cognitive effort.

  • Git's per-file model means more files increase rebase conflicts.
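As a rough illustration of this kind of comparison, the sketch below buckets pull requests by the number of files changed and compares median time-to-merge. The record shape, the field names, the sample values, and the five-file threshold are all hypothetical stand-ins; this is not Graphite's dataset or methodology.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical record shape; a real export from your Git host will differ.
    number: int
    files_changed: int
    hours_to_merge: float


def median_hours_by_bucket(prs, threshold=5):
    """Median time-to-merge for PRs below vs. at or above a file-count threshold."""
    small = [pr.hours_to_merge for pr in prs if pr.files_changed < threshold]
    large = [pr.hours_to_merge for pr in prs if pr.files_changed >= threshold]
    return {
        "small": median(small) if small else None,
        "large": median(large) if large else None,
    }


# Made-up sample data, for illustration only.
sample = [
    PullRequest(1, 1, 2.0),
    PullRequest(2, 2, 3.0),
    PullRequest(3, 3, 4.0),
    PullRequest(4, 6, 9.0),
    PullRequest(5, 12, 30.0),
]

result = median_hours_by_bucket(sample)
print(result)  # {'small': 3.0, 'large': 19.5}
```

Running the same comparison on your own team's merge history is a quick way to check whether the pattern holds for your codebase.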

Stacked pull requests involve breaking large feature changes down into a sequence of small, dependent pull requests that build on each other like a stack.

This forces changes to be structured into logical building blocks that are easy to review incrementally and without blocking progress. However, most code review tools are not built to support stacking. They approach code reviews traditionally, leaving much room for improvement.

That’s where Graphite comes in.

Graphite automates the Git branching and syncing required to maintain the relationships between stacked PRs.

However, the key benefit is the fundamental shift towards modular, layered changes that reduce complexity for authors and reviewers.

Stacked pull requests divide large code changes into smaller, interconnected pieces, simplify the review process, and enhance comprehensibility for those evaluating the code. Additionally, this method allows developers to maintain a swift pace of work without compromising accuracy. Let's explore the key advantages:

Reviewing a massive pull request that alters a vast amount of code is overwhelming, making it hard to keep track of the numerous interconnected changes. Dividing these changes into smaller, logically organized stacks clarifies and focuses each part. This approach enables code reviewers to easily understand small changes while maintaining an overview of the entire project.

For example, you could organize enhancements to a checkout page on a shopping website into separate, sequential stacks, such as:

  • Change the page layout.

  • Improve the order summary display.

  • Add upsells before the checkout button.

  • Integrate additional payment gateways.

  • Modify the order processing method.

Instead of one monster pull request, the stack splits the work into pieces that the right experts can each review more easily.
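To make the checkout example concrete, a stack can be modeled as a chain of branches in which each PR is based on the one below it. The sketch below is a minimal model under that assumption; the `StackedPR` type, the `review_order` helper, and the branch names are invented for illustration and do not reflect how Graphite represents stacks internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StackedPR:
    # Hypothetical model: each PR's branch is based on its parent's branch.
    branch: str
    parent: Optional[str]  # None means the PR is based directly on trunk


def review_order(prs):
    """List branches from the bottom of the stack (based on trunk) to the top.

    Assumes a single linear stack: at most one child per parent.
    """
    child_of = {pr.parent: pr for pr in prs}
    ordered = []
    current = child_of.get(None)
    while current is not None:
        ordered.append(current.branch)
        current = child_of.get(current.branch)
    return ordered


# Branch names are invented for illustration; input order does not matter.
checkout_stack = [
    StackedPR("order-summary", parent="page-layout"),
    StackedPR("page-layout", parent=None),
    StackedPR("upsells", parent="order-summary"),
    StackedPR("payment-gateways", parent="upsells"),
    StackedPR("order-processing", parent="payment-gateways"),
]

order = review_order(checkout_stack)
print(order)
```

Each entry in the resulting order is a small PR that can be reviewed on its own terms, while the parent links preserve how the pieces depend on one another.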

Figuring out hidden connections between changes is hard, especially when code reviewers don't have enough context. Stacked pull requests in Graphite visually lay out how each part fits together.

Reviewers can instantly see how the changes relate, and keeping the stack in working order as individual PRs change becomes much easier. Making these dependencies explicit leads to a more thorough code review.

With stacked PRs, developers no longer have to wait for the review to be completed and merged before moving to the next feature or part of the feature.

They can start a new branch from the feature branch, write the new code, and submit that as another small PR stacked on top of the first.

As reviewers go through the PRs, they can suggest changes. Once the developer makes them, the updates are automatically synced into the PRs stacked on top.
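The syncing step can be pictured as follows: when a branch in the middle of a stack changes, every branch built on top of it has to be updated, parents before children. The sketch below only computes that update order from a child-to-parent map; `branches_to_restack` and the branch names are hypothetical, and this is not a description of Graphite's actual implementation.

```python
from collections import deque


def branches_to_restack(parents, changed):
    """Return every branch built on top of `changed`, parents before children.

    `parents` maps each branch name to the branch it is based on.
    """
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)

    # Breadth-first walk, so a branch always appears after the one it is based on.
    pending = deque(children.get(changed, []))
    ordered = []
    while pending:
        branch = pending.popleft()
        ordered.append(branch)
        pending.extend(children.get(branch, []))
    return ordered


# Hypothetical stack: two branches build on "api-schema", one builds on "api-handlers".
parents = {
    "api-schema": "main",
    "api-handlers": "api-schema",
    "docs": "api-schema",
    "ui-wiring": "api-handlers",
}

to_update = branches_to_restack(parents, "api-schema")
print(to_update)  # ['api-handlers', 'docs', 'ui-wiring']
```

Doing this bookkeeping by hand with plain Git means rebasing each dependent branch in turn, which is the tedium that stacking tools aim to remove.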

Because the approvals or feedback are faster than traditional methods, code flows through the pipelines much quicker, releasing new features more frequently.

"I've been using Graphite for a week and it's already saved me ~20 hours of work." — Forbes Lindesay, Senior Software Engineer, Mavenoid

Ana could immediately see how Graphite’s stacked PR approach could have prevented their recent incident:

  • Large, unstructured changes are transformed into manageable, incremental improvements.

  • Direct dependencies between changes are explicitly defined, making their relationships clear.

  • Each small PR allows for a thorough review with minimal effort, enhancing code quality.

While the stacked PR workflow would require a mindset shift, the investment would pay dividends in more efficient reviews and increased dev velocity in the future.

Look, there is no "perfect" code review process. Each code review process has its place, depending on the project's requirements, team, and goals.

Modern Agile teams, especially distributed ones, should actively default to change-based reviews through pull requests. Pull requests streamline submitting code for inspection and support vital collaboration through comments and iterations. The key then becomes using automation to maximize the effectiveness of PR-based reviews.

This is where stacked pull requests shine.

Breaking large changes into modular building blocks reduces complexity, and reviewers can easily inspect changes without blocking overall progress.

Tools like Graphite take this to the next level by fully automating the Git workflow required to interlink stacked PR dependencies. The code review velocity is further boosted through auto-assignment, notifications, and metrics. Graphite's UI lowers the activation barrier so teams can easily shift to the new stacked PR workflows.

"With Graphite’s stacks, sync, and simplicity, engineers at The Browser Company can prototype, review, and ship features faster" — Andrew Monshizadeh, Software Engineer, The Browser Company (Arc Browser)

The bottom line is that centralized change review is necessary for any team serious about quality. Transition to stacked pull request workflows, improve code review accuracy, and get unblocked with Graphite today.

Sign up for free now and let your team experience the benefits of simpler, faster, and more effective code reviews from day one.
