Empirically supported code review best practices

Code review is a cornerstone of great engineering process. Done right, code reviews catch defects early, spread context across the team, enhance software design, and help sustain code quality over time as codebases scale.

When review workflows are inefficient, however, they can slow development to a crawl, damaging the team’s morale and momentum. The key to a good code review process is striking the right balance between speed and accuracy.

Teams with lightweight code review processes can quickly circulate feedback while upholding quality standards. Surveys across large tech organizations like Microsoft reveal the various techniques they use to improve reviews and accelerate release cycles. Often, just a few simple tweaks can add up to huge results.

Let’s look at some techniques to improve your company’s code reviews.

Start with a readable description

The first step of a successful code review is ensuring the code author provides proper context. Every pull request should begin with a short description summarizing the following:

What actually changed in the code?
Why were these changes made?
Does this pull request introduce any risks?

This overview helps reviewers understand the essence of the changes without diving into the code itself. Aim to get at least a one-paragraph summary so any team member can grasp the primary objective and functionality of the code.
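As a rough illustration, a team could check descriptions automatically before a review starts. The sketch below assumes a hypothetical description template with three headings ("What changed", "Why", "Risks") that mirror the questions above; the heading names and the `missing_sections` helper are made up for this example and do not come from any particular tool.

```python
# Hypothetical template headings, mirroring the three questions above.
REQUIRED_SECTIONS = ("what changed", "why", "risks")


def missing_sections(description: str) -> list[str]:
    """Return the required headings that are absent or have no text under them."""
    sections: dict[str, str] = {}
    current = None
    for line in description.splitlines():
        heading = line.strip().rstrip(":").lower()
        if heading in REQUIRED_SECTIONS:
            current = heading
            sections[current] = ""
        elif current is not None:
            # Accumulate any non-heading text under the most recent heading.
            sections[current] += line.strip()
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]


description = (
    "What changed:\n"
    "Adds a payment form.\n\n"
    "Why:\n"
    "Customers need to pay by card.\n\n"
    "Risks:\n"
)
print(missing_sections(description))  # ['risks']
```

A check like this only confirms that each question was answered, not that the answer is useful, so it works best as a prompt for the author rather than a gate.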

Create smaller pull requests

One of the most data-backed code review best practices is to work with smaller, more focused pull requests. Though bundling many changes into one large PR may seem easier, the data shows this consistently causes problems as reviewers are much less thorough when reviewing larger pull requests.

[Figure: average and median time to merge PRs by lines changed]

Splitting a PR into multiple smaller ones is better for the longer-term health of your code. Here are several benefits of segmenting changes into smaller pull requests:

More correct: the smaller the PR, the easier it is to design it well and the less likely the author is to introduce bugs.
Easier to review: not only are small PRs reviewed more quickly, but they tend to be reviewed more thoroughly.
Faster to merge: unsurprisingly, small PRs are reviewed more quickly, are less likely to hit rebase conflicts, and merge faster than larger PRs.
Better if things go wrong: in the unfortunate event that a PR is rejected, there is less wasted work; if the PR ends up being reverted, the faulty change is easier to identify and easier to revert.

Code reviewers in the survey above said they were 3x more likely to thoroughly review a 500-line PR than a 2000-line PR, which they tended to only skim. Try to keep pull requests below 200 lines changed whenever possible. Not only are reviewers more likely to approve changes, but the number of post-merge bugs also becomes substantially lower.

Stack related pull requests

Beyond keeping individual pull requests small, separating logically distinct changes into separate PRs can be extremely helpful, even if they are to be merged later. These individual pull requests can then be “stacked” on top of each other, letting you keep pushing changes without waiting for the PR they depend on to be merged.

For example, consider that you need to add a new payment form to your website.

This feature can be broken up into three logical sections:

Frontend form components
API endpoint and logic
Database table modification

[Diagram: trunk-based workflow vs. stacked PRs]

Breaking up this single “feature” into multiple pull requests helps you assign appropriate reviewers based on their domain-specific expertise, ensuring you get the most relevant eyes on every section of code. Stacking these changes also allows modifying one set of changes without blocking the entire feature.

For instance, if the backend API change gets rejected or needs rework during review, the frontend code can still be reviewed and approved. Then once the backend changes are approved, the frontend branch can be automatically rebased and merged together with the rest of the stack.
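To make the merge rule concrete, here is a minimal sketch of the payment form stack. It assumes the PRs are ordered database, then API, then frontend (the dependency order is an assumption for illustration), and that a PR can merge only once it and everything beneath it in the stack are approved. `StackedPR` and `mergeable_prefix` are hypothetical names, not the API of any stacking tool.

```python
from dataclasses import dataclass


@dataclass
class StackedPR:
    title: str
    approved: bool


def mergeable_prefix(stack: list[StackedPR]) -> list[str]:
    """Titles that can merge now: a PR and everything below it must be approved."""
    ready = []
    for pr in stack:  # stack[0] sits directly on trunk
        if not pr.approved:
            break  # everything above an unapproved PR has to wait
        ready.append(pr.title)
    return ready


stack = [
    StackedPR("Database table modification", approved=True),
    StackedPR("API endpoint and logic", approved=False),  # needs rework
    StackedPR("Frontend form components", approved=True),  # reviewed in parallel
]
print(mergeable_prefix(stack))  # ['Database table modification']

stack[1].approved = True  # the backend rework is approved
print(mergeable_prefix(stack))
```

The frontend PR was reviewed and approved while the API PR was still being reworked, so once the API PR is approved the whole stack becomes mergeable at once.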

On the other hand, in a feature branch model, even if you were to break up the monolithic payment form into smaller pull requests, starting work on a subsequent pull request would be blocked on the previous change being merged. In the case of our payment app, before starting work on the front end changes, you’d have to wait for the backend API to be reviewed and merged. This slows down development considerably.

Try to conceptualize features at a high level, break them into solution domains, and break down these changes into granular, stacked PRs for review.

Similar to multithreading functions, “stacking” enables parallel workstreams, increasing efficiency and allowing for more granular reviews rather than rejecting an entire monolithic feature outright. By parallelizing review and development, stacking increases engineering efficiency, and reduces the burden on both author and reviewer.

There are even developer tools, like Phabricator (no longer maintained) and Graphite, that are specifically designed to automate the complexities that manual stacking introduces. Using tools like Graphite will encourage default behaviors that lead to better, more manageable PRs.

Maintain manageable code review scope

As you work towards keeping individual pull requests small and stacked, work on developing standards around the total size and complexity of code changes under review at a given time.

Too many code reviews waiting in the queue can slow down development velocity. Here are a few key things teams should consider doing to maintain a manageable code review scope.

Keep reviews under 60 minutes

Research suggests the ideal individual code review session lasts about 60 minutes. Beyond this threshold, reviewers become drained, less focused, and unable to maintain peak accuracy.

Aim to scope review workloads so that team members can regularly complete high-quality reviews without frequently exceeding this 60-minute ceiling. Treat anything longer as a warning sign of overloaded queues that need addressing.

Keep pull requests less than 400 lines

Reviewer attention, comprehension, and recall drop precipitously beyond an average of 400 lines.

[Graph: attention span and accuracy vs. lines of code to review]

A SmartBear study of a Cisco Systems programming team suggests that reviewing fewer than 400 lines of code (LOC) at a time leads to a higher defect discovery rate, with optimal inspection rates under 500 LOC per hour. Setting this limit helps keep PRs small and encourages reviewers to review thoroughly instead of skimming.
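These two numbers translate into a simple check. The sketch below assumes you feed it the text produced by `git diff --numstat` (tab-separated added lines, deleted lines, and path, with `-` in place of counts for binary files) and uses the 400-line and 500-LOC-per-hour figures quoted above as thresholds. The function names are hypothetical, and counting added plus deleted lines as "lines to review" is a simplifying assumption.

```python
MAX_CHANGED_LINES = 400   # size guideline from this section
MAX_LINES_PER_HOUR = 500  # inspection-rate guideline from this section


def count_changed_lines(numstat_output: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files have no line counts
        total += int(added) + int(deleted)
    return total


def review_budget(changed_lines: int) -> dict:
    """Report whether a change fits the size guideline and the minimum review time."""
    return {
        "changed_lines": changed_lines,
        "within_limit": changed_lines < MAX_CHANGED_LINES,
        # Reviewing no faster than 500 LOC/hour implies at least this many minutes.
        "min_review_minutes": changed_lines * 60 / MAX_LINES_PER_HOUR,
    }


sample = (
    "120\t30\tsrc/payments/form.py\n"
    "-\t-\tassets/logo.png\n"
    "45\t5\ttests/test_form.py\n"
)
print(review_budget(count_changed_lines(sample)))
```

For the sample diff, 200 changed lines fit under the limit and call for at least 24 minutes of review time at the suggested pace.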

Eliminate context switching

As a reviewer works through disparate pull requests and switches contexts, the time required to reorient can seriously undermine efficiency. One study found that humans need over 20 minutes to recover their train of thought after a short, two-minute interruption.

This context switching can be easily avoided by setting processes that prevent reviewers from constantly jumping between unrelated pull requests.

Some proven techniques include:

Allow self-selection: Let reviewers pick from the available pull requests that touch areas of the codebase they already know, rather than assigning reviews directly.
Only assign relevant reviewers: Only assign engineers PRs that are relevant to them. For example, you wouldn’t want to assign your frontend expert a PR dealing with security in your AWS infrastructure.
Batch related PRs: Group related pull requests with labels so reviewers can iteratively make their way through a complete stack of changes within a contained system.
Limit allowed WIPs: Add work-in-progress limits that restrict how many parallel pull requests anyone can have assigned for review simultaneously, minimizing task switching.
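Two of these techniques, relevance and WIP limits, can be combined in a small assignment rule. The sketch below is one possible approach under stated assumptions: the limit of three open reviews, the `expertise` and `open_reviews` mappings, and the `pick_reviewer` function are all hypothetical and would need to be adapted to your team's tooling.

```python
from typing import Optional

WIP_LIMIT = 3  # assumed cap on simultaneous reviews per person; tune per team


def pick_reviewer(
    pr_area: str,
    expertise: dict[str, set[str]],
    open_reviews: dict[str, int],
    wip_limit: int = WIP_LIMIT,
) -> Optional[str]:
    """Return the least-loaded reviewer who knows pr_area, or None if nobody has capacity."""
    candidates = [
        name
        for name, areas in expertise.items()
        if pr_area in areas and open_reviews.get(name, 0) < wip_limit
    ]
    if not candidates:
        return None  # leave the PR in the queue instead of overloading someone
    # Fewest open reviews wins; the name breaks ties so the result is deterministic.
    return min(candidates, key=lambda name: (open_reviews.get(name, 0), name))


expertise = {"ana": {"frontend"}, "bo": {"frontend", "api"}, "cy": {"infra"}}
open_reviews = {"ana": 3, "bo": 1}

print(pick_reviewer("frontend", expertise, open_reviews))  # bo
print(pick_reviewer("database", expertise, open_reviews))  # None
```

Returning `None` instead of falling back to an unrelated reviewer keeps the rule consistent with the advice above: a PR that waits briefly for the right person is usually cheaper than one reviewed by someone who has to switch contexts.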

Maintain a code review checklist

Well-designed review checklists can standardize and simplify manual review processes and highlight key focus areas for code reviewers.

Studies show that checklist-driven code reviews increase defect detection rates by over 66.7% compared to non-checklist-based methods.

Now, checklists should generally be tailored to your team’s tech stack and project requirements. But here are a few common categories you can include:

Functionality: Does the code perform as intended?

Start by double-checking that all new code works as expected and provides the right outputs.

Test boundary conditions and edge cases thoroughly. Also, verify error handling behavior for invalid inputs and unexpected scenarios to avoid letting any failures or crashes into the final product.

For example, walk through user interfaces to confirm that all the workflows succeed without logic failures. Try invalid inputs and extreme use cases when calling APIs. You can also inspect data pipeline logic for consistency.

Readability: Is the code clear and easy to understand?

To make code maintenance more manageable, the codebase must be easy for other developers to read and understand. So, check that the code is documented and has logical explanations throughout.

The goal is to make it easy for a new coder to start working and improving the existing codebase with as little help as possible.

For example, check that variable and function names are descriptive and capture their intent. Keep naming conventions consistent across the codebase to avoid ambiguity, and check that functions are not nested too deeply. Group similar functions together.

Maintainability: Is the code structurally sound for future changes?

Ensure any new code is written to be easily adjustable if and when you need to change things later.

According to a study, 75% of defects found in code reviews directly affect the evolvability or maintainability of the software rather than functionality.

Start by breaking logic into smaller independent functions. Avoid coupling critical logic to unstable functions that change frequently, and document why certain structural decisions were made so that the reasoning is not lost.

While quick and dirty code shows impact fast, the cost of maintaining it climbs steeply as the code becomes more complex. So, when in doubt, prioritize modular code.

Favor loose coupling to enable faster testing and development of modules in the future. Also, keep documenting design decisions and trade-offs for future reference.

Security: Does the code safeguard against vulnerabilities?

Perform thorough security audits and vulnerability scans to identify potential weaknesses.

Maintain a security best practices checklist and also ensure that you add access controls to limit entry to sensitive code blocks—check that you’re handling.

For example, sanitize and confirm the validity of all external inputs. Check restrictions on modifying critical data stores and review storage and transmission encryption used.

Performance: Is the code optimized for efficiency and speed?

Code review should also identify and optimize resource bottlenecks through performance analysis.

Find sections to use caching and optimize database queries to improve efficiency. Flag repeated expensive calls out to external services where optimization is possible. Monitor emerging bottlenecks against current benchmarks.

For example, measure the runtime of computational logic. Minimize unnecessary database queries. Check memory usage growth indicators. Find ways to refactor inefficient algorithms and logic with more optimized implementations to reduce execution time. Utilize pre-commit hooks and automated tools to enforce linting rules and ensure the code adheres to the established style guide.

Coding standards: Are best practices and conventions followed?

Use linters to automatically catch styling, formatting, imports, and type hinting issues.


Enforce docstring templates for documentation and flag uses of problematic language idioms or anti-patterns.

For example, format code consistently project-wide. Specify commenting and context requirements. Enforce consistent coding styles and formatting guidelines. Adhere to established project conventions and best practices for readability and maintainability.

Using pre-defined issue checklists, automated tests, and linters can help reviewers quickly validate requirements in these areas rather than through complex manual analysis alone. The ultimate goal is to provide guard rails, ensuring code quality without creating undue overhead.

Provide direct, constructive, and respectful feedback

Empirical research shows that code review effectiveness depends on teams establishing an environment with open, constructive dialog.

Keep review comments focused on ideas for improving the system rather than criticisms of an author. Qualitative explanations outlining potential refinements or alternatives give authors actionable suggestions for enhancing their work rather than vague criticism.

It's also important that this feedback be delivered respectfully, with reviewers considering emotional intelligence alongside technical accuracy. Studies find that a single abrasive remark can deter engineers from performing up to potential. Instead, offer constructive criticisms and assume positive intent while acting patiently and gracefully.

Ultimately, high-performing teams start using code reviews as opportunities for mentoring and learning rather than condemnation. Aim to cultivate this empathy and intellectual humility right from the start.

Document and explain code decisions

Require authors and code reviewers to document and explain the motivation and justification for design decisions made within any code change.

Whether in the pull request overview or inline comments, call out considerations around:

Why a particular programming pattern or language feature was selected
Alternatives explored and rejected
Tradeoffs made regarding performance, coupling, readability, etc.

Requiring such explanations serves several ends:

It forces authors as well as reviewers to think through decisions
Documents the context for future maintenance cycles
Speeds reviewer understanding when motivations align
Surfaces incorrect assumptions early when mismatches occur

Over time, these documented discussions become knowledge bases that benefit the entire team. The review process also naturally offers guidance when the code author lacks experience in making architectural decisions independently.

Limit nits

A common complaint about code reviews is they become dominated by stylistic critiques and other minor issues—so-called bikeshedding or nits.

These “nits” distract authors from focusing on substantive problems and reduce morale when over-emphasized.

To avoid this, establish a norm that reviewers avoid nit-picking or offering stylistic feedback during the core peer review process. Instead, capture minor issues in a designated parking lot to address separately, or batch them for later.

Example nits:

Comment wording preferences
Code organization and placement
Bracket or semicolon position
Filename conventions

Consider instituting a “no nits” or issues-only policy for peer reviews. Some teams also implement “nit keeper” roles that capture nits for later correction instead of derailing current feedback.

With this distraction eliminated, reviewers can stay centered on identifying meaningful defects and optimization opportunities during tight 60-minute windows.

Automate testing and review where possible

Code reviews burden programmers with balancing multiple responsibilities under tight deadlines. Find opportunities to automate rote checks instead of pushing developers to do them by hand. Example categories that are prime for automation:

Formatting: Apply linters to enforce style guide rules for whitespace, indentation, quotes, and semicolons automatically without wasting time.
Patterns: Scan code for problematic patterns using codified knowledge bases.
Vulnerabilities: Inspect dependencies against CVE databases. Identify risky constructs like SQL injection and XSS vectors.
Best practices: Check for unused variables, unnecessary imports, ignored tests, and missing types, and keep customizing the rules over time.
Coverage: Enforce minimum automated test coverage by area or annotations to validate functions and edge cases.
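To show how little code such a check can take, here is a minimal sketch of one "best practices" rule from the list: flagging unused imports in a Python file with the standard library `ast` module. It is a deliberately naive illustration, not a replacement for an established linter. It ignores star imports, names re-exported through `__all__`, and names referenced only inside string annotations. The `unused_imports` function name is hypothetical.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced anywhere in the module."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds only the top-level name `a`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports cannot be checked this way
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


sample = (
    "import os\n"
    "import sys\n"
    "from json import dumps, loads\n"
    "\n"
    "print(os.getcwd(), dumps({}))\n"
)
print(unused_imports(sample))  # ['loads', 'sys']
```

In practice you would run an existing linter in CI or a pre-commit hook; the point of the sketch is that rules like this are mechanical, so reviewers should not be spending attention on them.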

As you invest in robust automation to verify correctness, standards alignment, and general code quality, developers and reviewers can focus their efforts on judgment-oriented problems. Consistently applying automated checks also provides stability and a safety net, complementing peer review even after deployment.

Create a predictable code review cadence

To make code reviews a habit within team workflows, create a recurring inspection interval and, if possible, put it on everyone’s calendar. You can start with an hour each week for a team code review session.

Initially, you can also huddle in a meeting room and give each developer a chance to review a peer’s code. Offer real-time feedback, and let the team give feedback to the reviewer so everyone knows how quality reviews are done.

In a one-week study at Google, developers were reported to be happy with the peer code review requirements. Through the week where developers performed peer code reviews, over 70% of diffs were committed in less than 24 hours after they were sent for the initial review.

Slowly move towards mandatory peer review before all feature branches merge. Before you realize it, your team will be set on independently reviewing and fixing code before it goes for the big merge.

Keep review cycles short

Try to keep average review cycle times under one day. When code reviews are slow, the team's overall velocity decreases, and developers start to protest them.

If your current delays stretch beyond a single day, consider the following remedies:

WIP limits: Cap how many pull requests an individual can have in review simultaneously.
Time boxing: Ask reviewers to provide feedback within a max window (4 hours), even if abbreviated to honor other commitments.
Asynchronous review: Allow PR merging if sufficient coverage is met through ongoing asynchronous reviews.
Split large changes: Split changes (changelists or CLs for ex-Googlers) into several smaller, atomic changes to make them manageable and easier to review.

There are certainly tradeoffs for adjusting policies too aggressively and too fast. But make the decision based on reviewer experiences to ensure velocity. Keep reviews regular, lightweight, and fast-flowing.

Create and measure metrics

Monitoring code review metrics provides crucial visibility to optimize workflows over time. To get started, remember to track key metrics like:

Publish to merge time: median time between a PR being marked as ready for review for the first time, and it being merged.
Review cycles until merge: for each PR, the maximum number of review cycles any single reviewer went through before the PR was merged; the metric is the median of this number across all PRs.
Reviewer workloads: ensure fair distribution by tracking total pull requests and weekly reviews per person.
Test coverage: enforce minimum automated test coverage by area to validate functions.
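The first two metrics can be computed from a handful of fields per merged PR. The sketch below follows the definitions in the list; the `MergedPR` record, its field names, and the sample data are assumptions for illustration, and in a real setup the values would come from your code hosting platform.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class MergedPR:
    ready_at: datetime                  # first marked ready for review
    merged_at: datetime
    cycles_by_reviewer: dict[str, int]  # review cycles each reviewer went through


def publish_to_merge_hours(prs: list[MergedPR]) -> float:
    """Median hours between a PR first being ready for review and being merged."""
    return median((pr.merged_at - pr.ready_at).total_seconds() / 3600 for pr in prs)


def review_cycles_until_merge(prs: list[MergedPR]) -> float:
    """Median, across PRs, of the most review cycles any single reviewer needed."""
    return median(max(pr.cycles_by_reviewer.values(), default=0) for pr in prs)


prs = [
    MergedPR(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11), {"ana": 1}),
    MergedPR(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15), {"ana": 1, "bo": 3}),
    MergedPR(datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 15), {"cy": 2}),
]
print(publish_to_merge_hours(prs))     # 6.0
print(review_cycles_until_merge(prs))  # 2
```

Using the median rather than the mean keeps one long-running PR (the 30-hour one in the sample) from dominating the number.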

Periodically visualize these metrics against benchmarks to reveal workflow constraints that need attention. For example, identify patterns of:

Slowing output due to individual overloaded queues
Lower acceptance rates suggesting inadequate review scrutiny
Test gaps leaving code insufficiently validated

Inspecting key metrics and responding to deviations can help you optimize processes for smooth code flow from creation through review, testing, and deployment.

Create the perfect lightweight code review process

I get it. Relying on what's familiar is easier—everyone adds feedback in a PR thread, and it’s worked upon until the PR gets merged with main.

But this approach slows things down, stresses everyone out, and builds tech debt instead of creating good systems. As the code becomes complex, you start seeing the consequences of this messy code review process.

Stacking solves this by helping code move more freely.

With stacking, you break complex tasks into smaller PRs, understand interdependencies, and sequence these PRs—making your approach about insight and accuracy.

Tools like Graphite make the implementation of stacking much smoother. You get the visibility and coordination you need to visualize blockers, track and even attribute success metrics.

Tooling that is "stacking-first" pushes your team to focus on small and focused pull requests, helping you speed up code review and implement best practices with minimal training.

So, why keep playing by the old rules? Effective code reviews are already here. Sign up for Graphite to start streamlining your team’s code review journey!
