Table of contents
- Why commit counts alone fall short
- Modern productivity frameworks (DORA & SPACE)
- Key metrics beyond commit count
- Tools for measuring developer productivity
- Use cases and examples
- Conclusion
Measuring developer productivity is about much more than counting commits. In modern software teams, understanding how efficiently and effectively developers work requires a holistic view of the development process. While the number of code commits or lines of code was once used as a proxy for productivity, these raw counts fail to capture code quality, collaboration, or the value delivered. This article explores how to measure developer productivity beyond commit counts, using proven frameworks, practical metrics, and tools. We'll define key concepts, provide examples, and highlight how tools like Graphite can assist with code reviews and productivity insights.
Why commit counts alone fall short
Relying on commit counts as a productivity metric is misleading. A high commit count or many lines of code might look "productive" but says nothing about the quality or impact of those changes. In fact, developers can game these metrics by splitting trivial changes into many commits or writing overly verbose code. As one expert noted, some of their most productive days involved a single small commit, while days with the most "code churn" were less productive. Focusing on commit quantity can also discourage important work like refactoring or planning, since those activities produce fewer commits but are vital for long-term success.
Misconceptions abound, such as "more code = more productivity." In reality, concise, high-quality code is often more valuable than a high volume of changes. Likewise, no single metric — be it commit count, pull request frequency, or sprint velocity — can fully capture a developer's contributions. A combination of metrics is essential. For example, a developer who fixes a critical bug with a one-line change has delivered enormous value, even though their commit count is low. This is why modern engineering organizations turn to more comprehensive measurements.
Modern productivity frameworks (DORA & SPACE)
Over the past decade, the industry has developed frameworks to measure productivity and performance in software teams more holistically. Two notable frameworks are DORA and SPACE:
- DORA Metrics: Originating from the DevOps Research and Assessment group, DORA metrics focus on software delivery performance. The four key DORA metrics are:
- Deployment frequency: How often releases happen.
- Lead time for changes: How quickly code goes from commit to production.
- Change failure rate: What percentage of deployments cause failures.
- Time to restore service: How fast incidents are resolved.
These metrics emphasize both speed and stability, reflecting a team's ability to deliver value quickly and reliably. For instance, shorter lead time and frequent deployments indicate an efficient pipeline, while a low failure rate signals good quality and testing practices.
- SPACE Framework: Proposed in 2021 by a group of researchers including Nicole Forsgren and Margaret-Anne Storey, SPACE is an acronym for Satisfaction, Performance, Activity, Communication/Collaboration, and Efficiency/Flow. This framework broadens the perspective beyond pure output. It considers developer satisfaction and well-being, performance outcomes (e.g. feature completion or code quality), the activities developers engage in (like code reviews or contributions), team collaboration effectiveness, and the efficiency of workflow (e.g. uninterrupted "flow" time). SPACE reminds us that developer productivity isn't just about speed – it's also about developer experience and team dynamics. For example, a team might have high output, but if developer burnout is high (poor well-being) or collaboration is low, productivity in the long run will suffer.
These frameworks highlight that measuring productivity requires multiple lenses. Historically, productivity was often gauged by outputs like lines of code, which don't reflect true software quality or team health. DORA and SPACE evolved to fill this gap, emphasizing that fast delivery, stability, satisfaction, and collaboration all matter for sustainable productivity.
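To make the DORA definitions concrete, here is a minimal sketch of how the four metrics could be computed from exported delivery data. The record shapes (deployments with commit and deploy timestamps plus a failure flag, incidents with detection and resolution times) are assumptions for illustration, not a schema any particular tool prescribes.

```python
from datetime import datetime
from statistics import median

# Hypothetical exports from deployment and incident tooling (illustrative only).
deployments = [
    # (commit_authored_at, deployed_at, caused_failure)
    (datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 15), False),
    (datetime(2024, 5, 2, 11), datetime(2024, 5, 3, 10), True),
    (datetime(2024, 5, 4, 14), datetime(2024, 5, 5, 9), False),
]
incidents = [
    # (detected_at, resolved_at)
    (datetime(2024, 5, 3, 11), datetime(2024, 5, 3, 13)),
]
period_days = 7

# Deployment frequency: deployments per day over the observed period.
deployment_frequency = len(deployments) / period_days

# Lead time for changes: median time from commit to running in production.
lead_time = median(deployed - committed for committed, deployed, _ in deployments)

# Change failure rate: share of deployments that caused a failure.
change_failure_rate = sum(failed for _, _, failed in deployments) / len(deployments)

# Time to restore service: median time from incident detection to resolution.
time_to_restore = median(resolved - detected for detected, resolved in incidents)

print(f"Deployments per day:    {deployment_frequency:.2f}")
print(f"Lead time for changes:  {lead_time}")
print(f"Change failure rate:    {change_failure_rate:.0%}")
print(f"Time to restore:        {time_to_restore}")
```

In practice the platforms discussed later compute these for you; the point is that each DORA metric reduces to simple arithmetic over well-defined delivery events.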
Key metrics beyond commit count
To move beyond the simplistic commit count, engineering teams track a variety of metrics that reflect code quality, collaboration, and workflow efficiency. Here are some key metrics commonly used to measure developer productivity in a more comprehensive way:
Total pull requests merged: The total number of pull requests (PRs) merged into the main branch in a given period. This indicates the team's overall output and throughput of completed work. Unlike raw commits, merged PRs typically correspond to completed units of work (features or fixes) that passed code review and tests.
PRs merged per engineer: The median number of PRs each developer merges in that period. This highlights individual contribution rates and helps identify if work is unevenly distributed. It's a more useful productivity signal than commit count because merging a PR implies the work was reviewed and integrated, not just pushed to a personal branch.
PRs reviewed per engineer: How many pull requests each team member reviews for others. Code review is a critical collaborative activity; a high number of reviews per engineer suggests a healthy, collaborative culture. It prevents silos and spreads knowledge. Productivity isn't just about writing code, but also about improving code via peer review.
Wait time to first review: The median time from when a PR is opened to when it receives the first reviewer feedback. Faster review response times mean developers aren't blocked waiting on feedback, which keeps the development flow moving. A long wait time to first review can indicate overburdened reviewers or poor team practices, leading to slowdowns.
Lines changed per PR: The average or median size of a pull request in lines of code (added + removed). Smaller PRs are easier and faster to review and less likely to introduce bugs. In fact, Google's engineering research suggests keeping PRs around 50 lines for optimal review efficiency. Tracking PR size helps teams encourage bite-sized, incremental changes instead of mammoth PRs.
Publish-to-merge time (cycle time): The duration from when a PR is published (opened) to when it is merged. This end-to-end cycle time measures how quickly code changes move through the pipeline, encompassing coding, review, and integration. Shorter cycle times mean faster delivery of value to users. If this metric is high, it could point to slow code reviews, extensive rework, or CI pipeline delays.
Review cycles per PR: The number of review iterations a PR goes through before merge. Each cycle typically corresponds to a round of changes requested by reviewers. Fewer review cycles suggest that the code was of high quality and clearly communicated in the first round. Multiple cycles could highlight miscommunication or areas where authors need to incorporate feedback more proactively. Reducing review cycles saves time for both authors and reviewers.
By combining these metrics, teams get a much richer view of productivity. For example, if "PRs merged" is low but "wait time for first review" is high, the bottleneck might be code review bandwidth. If "lines per PR" is high and "review cycles" are also high, it suggests huge PRs that require multiple rework rounds – an opportunity to encourage smaller, more focused PRs. Crucially, these metrics shift the focus to throughput, quality, and collaboration rather than raw output.
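To illustrate how these review-centric metrics come together, here is a minimal sketch that computes a few of them from exported PR records. The field names and sample data are assumptions for illustration; in practice the records would come from your Git host's API or an analytics tool.

```python
from datetime import datetime
from statistics import median

# Illustrative, hand-rolled PR records; real data would come from an export or API.
prs = [
    {
        "author": "alice",
        "opened_at": datetime(2024, 5, 1, 10),
        "first_review_at": datetime(2024, 5, 1, 14),
        "merged_at": datetime(2024, 5, 2, 9),
        "lines_changed": 120,
        "review_cycles": 2,
    },
    {
        "author": "bob",
        "opened_at": datetime(2024, 5, 2, 9),
        "first_review_at": datetime(2024, 5, 3, 16),
        "merged_at": datetime(2024, 5, 4, 11),
        "lines_changed": 40,
        "review_cycles": 1,
    },
]

wait_to_first_review = median(p["first_review_at"] - p["opened_at"] for p in prs)
publish_to_merge = median(p["merged_at"] - p["opened_at"] for p in prs)
lines_per_pr = median(p["lines_changed"] for p in prs)
cycles_per_pr = median(p["review_cycles"] for p in prs)

# PRs merged per engineer (the sample records above are all merged PRs).
merged_per_engineer = {}
for p in prs:
    merged_per_engineer[p["author"]] = merged_per_engineer.get(p["author"], 0) + 1

print(f"Median wait to first review: {wait_to_first_review}")
print(f"Median publish-to-merge:     {publish_to_merge}")
print(f"Median lines changed per PR: {lines_per_pr}")
print(f"Median review cycles per PR: {cycles_per_pr}")
print(f"PRs merged per engineer:     {merged_per_engineer}")
```

Using medians rather than means keeps a single outlier PR from skewing the picture, which is why most of the metrics above are defined as medians.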
Tools for measuring developer productivity
Capturing and analyzing the above metrics can be challenging without the right tooling. Fortunately, a range of tools has emerged to help teams measure and improve their development workflow:
Graphite – Developer productivity and code review analytics: Graphite is a tool specifically designed to streamline code reviews and provide insights into engineering workflows. Graphite enables a practice called "stacked diffs" – breaking large changes into smaller, reviewable PRs – which leads to faster, easier reviews. By keeping pull requests small, Graphite helps developers "get code out there faster" with less friction. On the analytics side, Graphite Insights dashboards visualize key metrics like median publish-to-merge time, time to first review, PRs per engineer, and more. For example, Graphite can show the median review turnaround time for your team, helping identify slowdowns. It integrates with GitHub and even provides a CLI/VS Code extension to make managing PR stacks effortless. In short, Graphite's tools assist with code reviews and give data to measure improvement areas (e.g. if adopting smaller PRs actually reduced cycle time, you'll see it in the dashboard).
Built-in Git platform analytics: Many Git platforms offer basic metrics. For instance, GitHub's Insights for a repository can show contribution graphs, and the GitHub API allows extracting data on PR timestamps, reviews, and more. Teams can script custom reports or use integrations. While not as comprehensive out of the box, these can track things like commit frequency, open vs. closed PRs, and code review comments. With some effort, one can leverage the GitHub API or a tool like Four Keys (an open-source dashboard from Google) to collect DORA metrics (deployment frequency, lead time, etc.) from Git data.
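As a rough sketch of that kind of scripting, the snippet below pulls recently closed pull requests from GitHub's REST API and derives publish-to-merge time and wait time to first review. It assumes a token in the GITHUB_TOKEN environment variable and placeholder owner/repo names, and it skips the pagination and error handling a real report would need.

```python
import os
from datetime import datetime

import requests

OWNER, REPO = "your-org", "your-repo"  # placeholders for your repository
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
base = f"https://api.github.com/repos/{OWNER}/{REPO}"

# One page of recently closed pull requests; paginate for real analyses.
pulls = requests.get(
    f"{base}/pulls", params={"state": "closed", "per_page": 50}, headers=headers
).json()

def parse(ts):
    # GitHub returns ISO 8601 timestamps like "2024-05-01T10:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

for pr in pulls:
    if not pr.get("merged_at"):
        continue  # skip PRs that were closed without merging
    opened = parse(pr["created_at"])
    merged = parse(pr["merged_at"])

    # Earliest submitted review on this PR, if any.
    reviews = requests.get(f"{base}/pulls/{pr['number']}/reviews", headers=headers).json()
    first_review = min(
        (parse(r["submitted_at"]) for r in reviews if r.get("submitted_at")),
        default=None,
    )

    wait = (first_review - opened) if first_review else None
    print(pr["number"], "merge time:", merged - opened, "first review wait:", wait)
```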
Engineering analytics platforms: Several specialized tools help measure developer productivity across the software delivery pipeline. For example, Pluralsight Flow (formerly GitPrime) and Code Climate Velocity aggregate data from version control, code reviews, and issue trackers to produce detailed reports on PR throughput, rework rates, and even individual coding patterns. LinearB is another platform that provides real-time metrics like cycle time, PR size, and review depth, often with benchmarking against industry data. These platforms often implement DORA and other metrics under the hood and provide dashboards for engineering leaders to spot bottlenecks. According to one industry report, focusing on smaller PRs and better review practices (often highlighted by these tools) has been correlated with higher merge rates and team velocity.
Project management and ticket tracking tools: Productivity can also be viewed through the lens of feature delivery and project flow. Tools like Jira or Linear track how work items (user stories, tickets) progress, which can be used to derive metrics like lead time per issue, or throughput of tasks completed per iteration. While these aren't code metrics, they provide context – for example, a developer might have few commits but may have closed many important tickets. Some teams even use point systems for completed tickets as a productivity indicator, though care must be taken to avoid perverse incentives. The key is linking code changes to user stories to ensure engineering output aligns with business value delivered.
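A lightweight way to make that link is a naming convention, such as ticket keys in PR titles, which a script can then join against issue-tracker data. The sketch below extracts hypothetical Jira-style keys from sample PR titles; the titles and key format are illustrative assumptions, not a required convention.

```python
import re
from collections import defaultdict

# Hypothetical PR titles following a "PROJ-123: ..." convention.
pr_titles = [
    "PAY-142: Fix rounding error in invoice totals",
    "PAY-142: Add regression test for invoice rounding",
    "AUTH-77: Support SSO logout",
    "Refactor build scripts",  # no ticket reference
]

TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

prs_by_ticket = defaultdict(list)
for title in pr_titles:
    match = TICKET_KEY.search(title)
    key = match.group(0) if match else "untracked"
    prs_by_ticket[key].append(title)

# PR count per ticket lets you join Git activity with issue-tracker lead times.
for key, titles in prs_by_ticket.items():
    print(f"{key}: {len(titles)} PR(s)")
```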
CI/CD and monitoring tools: Developer productivity is also tied to the efficiency of continuous integration and deployment. Tools such as Jenkins, CircleCI, or GitHub Actions can track build/test frequencies and failure rates. Monitoring tools in the DevOps toolchain (like Datadog, New Relic, or Grafana with Prometheus) won't measure code productivity directly, but they ensure the pipeline and environments are healthy. Interestingly, teams are adopting observability principles here: for instance, instrumenting the CI pipeline to monitor how often builds fail or how long deployments take. This is analogous to DORA metrics – it helps pinpoint if slow deployments (an operational issue) are dragging down developer effectiveness.
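As a small illustration of that kind of pipeline instrumentation, the sketch below derives a build failure rate and median build duration from exported build records. The record shape is an assumption for illustration, since every CI system exposes this data differently.

```python
from datetime import timedelta
from statistics import median

# Illustrative build records exported from a CI system.
builds = [
    {"conclusion": "success", "duration": timedelta(minutes=8)},
    {"conclusion": "failure", "duration": timedelta(minutes=12)},
    {"conclusion": "success", "duration": timedelta(minutes=7)},
    {"conclusion": "success", "duration": timedelta(minutes=9)},
]

# Share of builds that failed, and the median time a build takes.
failure_rate = sum(b["conclusion"] == "failure" for b in builds) / len(builds)
median_duration = median(b["duration"] for b in builds)

print(f"Build failure rate: {failure_rate:.0%}")
print(f"Median build time:  {median_duration}")
```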
In choosing tools, the goal is to reduce manual effort in gathering data and to present actionable insights. The right tooling can automate the collection of metrics, correlate them, and even recommend improvements. For instance, if your merge time is trending up, a tool might suggest adopting a merge queue or using auto-assignments for reviewers. Always remember that tools are there to assist, not to impose surveillance – metrics should be used to foster continuous improvement, not to micromanage developers.
Use cases and examples
Let's consider a few scenarios that illustrate measuring productivity beyond commit counts:
Identifying review delays: A mid-sized software team was concerned that features were taking too long to ship. By looking at metrics, they discovered the median wait time to first review on pull requests was 2 days – a major blocker. Engineers were opening PRs but then sitting idle waiting for feedback. Using a tool to track this metric, they adjusted their process: dedicating specific daily slots for code review and using auto-notifications. As a result, first review time dropped significantly, and overall cycle time (publish-to-merge) improved. In this case, commit counts alone would not have revealed the bottleneck; only by measuring review latency could they pinpoint the issue.
Improving PR size and quality: Another team found that their average PR touched hundreds of lines of code, and many PRs required 3-4 review cycles of rework. Large, lengthy PRs were causing reviewer fatigue and allowing more bugs to slip through. By setting a team goal to reduce the lines changed per PR and promoting the idea of "one focused change per PR", they saw improvements. Small PRs (under ~200 lines) began to merge faster, in one or two cycles. One Google study even recommends ~50-line changes for optimal review efficiency. The team used Graphite's stacked diffs feature to help break down big changes into smaller sequential PRs, making this practice easier. Over a quarter, their bug introduction rate went down in tandem with PR size, showing that code quality improved when reviews were more focused. This real-world use case shows how a metric like PR size, combined with the right tooling, can drive positive behavioral change.
Balanced workload and collaboration: In a large open-source project, maintainers worried that a few contributors were overburdened with code reviews while others only committed code. They started tracking PRs reviewed per engineer and visualized it. The data revealed an imbalance: only a couple of core team members did most of the reviewing. To fix this, they initiated a rotation system and mentorship to get more contributors involved in reviews. Over time, the review count metric became more balanced across the team. This not only prevented burnout of key reviewers but also improved knowledge sharing (more people reading each other's code). As an added benefit, the project's bus factor improved (fewer single points of failure), and new contributors felt more engaged. This example highlights how measuring collaborative metrics (not just individual output) leads to healthier team dynamics.
Impact of new tools (AI example): A recent case study by Harness looked at the impact of introducing GitHub Copilot (an AI pair-programmer) on a team's productivity. The results showed a 10.6% increase in pull requests merged and a 3.5-hour reduction in average cycle time after adopting the AI assistant. This example reinforces that modern productivity gains often manifest in metrics like PR throughput and cycle time. It also shows that by measuring these outcomes, teams can quantitatively evaluate the impact of new tools or practices (in this case, AI coding assistance). It's not about commits; it's about more PRs completed and faster feedback loops leading to quicker releases.
Each of these examples demonstrates a key point: actionable insights come from looking at the right metrics. By focusing on aspects like review efficiency, PR size, collaboration, and cycle times, teams can identify where to improve. Crucially, these are all areas that commit counts alone would never illuminate. Real-world use cases prove that when teams measure what truly matters, they can experiment (process changes, tool adoption) and then see the results in improved metrics.
Conclusion
Developer productivity is a multifaceted concept – it's about speed, quality, collaboration, and developer experience. Measuring it requires moving beyond crude metrics like commit counts to more meaningful indicators. We've discussed how frameworks like DORA and SPACE provide structured ways to think about productivity, and how an observability mindset (considering multiple signals together) beats simply monitoring a single metric. By tracking metrics such as pull request throughput, review times, PR sizes, and cycle time, teams get a data-driven understanding of their performance.
Just as importantly, these metrics should be used with context and care. Always combine quantitative metrics with qualitative insights (like team feedback) and focus on trends rather than individual micro-measurements. The goal is continuous improvement of the development process, not ranking developers by numbers. As one guide advised, no single metric can provide a complete picture – use multiple metrics and consider quality and context alongside the numbers.
Finally, leverage tools to make this easier. Platforms like Graphite offer specialized support for measuring and improving code review efficiency, while other analytics and monitoring tools can round out your view. With the right metrics and tooling in place, engineering leaders and teams can identify bottlenecks, celebrate improvements, and ensure that "productivity" ultimately means delivering value – not just writing more code. By measuring what matters beyond commit counts, you set the stage for both high-performing and healthy development teams, where fast software delivery goes hand in hand with quality and developer satisfaction.