
Continuous Integration (CI) is a cornerstone of modern software development. By automatically checking for regressions and linter errors on each pull request (PR), CI frees engineers to focus on the more nuanced, qualitative aspects of code review.

If you’re reading this, you likely already use CI. You may also be familiar with waiting for CI to finish. You may dread rebasing a PR or fixing a nit, knowing it’ll restart a long CI run. This dread is common, and it points to an obvious truth: CI is good, but long CI can be bad. That raises the question: how long is too long for CI?


Note

Greg spends full workdays writing weekly deep dives on engineering practices and dev-tools. This is made possible because these articles help get the word out about Graphite. If you like this post, try Graphite today, and start shipping 30% faster!


Various industry resources suggest an ideal CI time of around 10 minutes for completing a full build, test, and analysis cycle. As Kent Beck, author of Extreme Programming, said, “A build that takes longer than ten minutes will be used much less often, missing the opportunity for feedback. A shorter build doesn’t give you time to drink your coffee.” While taking time out of your day to drink coffee is hardly a contentious ideal, there is some debate around this 10-minute "golden time," with some arguing “Even if it takes 1 day to release commit A, that's OK b/c 10min later commit B has been released (because it was pushed 10min after commit A).”

As in other debates like how to write good commit messages, we couldn’t just sit on the sidelines; we had to see for ourselves!

At Graphite, we've always taken a data-driven approach to validating conventional wisdom. In the case of CI, Graphite already syncs millions of GitHub Actions workflow runs to power our review experience.

In this investigation, we sampled PRs merged over the last three months, keeping only those with at least one GitHub Actions workflow run and at least one code review. By correlating each PR’s CI time with other metrics, such as time-to-merge and weekly throughput, we found some interesting patterns.
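
The sampling criteria above can be sketched as a filter over PR records. This is a minimal sketch under assumptions: the `PullRequest` record and its field names are hypothetical (not Graphite’s schema), “three months” is approximated as 90 days, and a PR’s CI time is taken as the mean duration of its workflow runs, which the article does not specify.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class PullRequest:
    # Hypothetical record shape, for illustration only.
    published_at: datetime
    merged_at: Optional[datetime]
    review_count: int
    ci_run_minutes: List[float] = field(default_factory=list)  # one entry per workflow run


def in_sample(pr: PullRequest, now: datetime, window_days: int = 90) -> bool:
    """Merged within the window, with at least one workflow run and one review."""
    return (
        pr.merged_at is not None
        and now - pr.merged_at <= timedelta(days=window_days)
        and len(pr.ci_run_minutes) >= 1
        and pr.review_count >= 1
    )


def ci_vs_time_to_merge(prs: List[PullRequest], now: datetime) -> List[Tuple[float, float]]:
    """Return (mean CI minutes, time-to-merge hours) for each sampled PR."""
    pairs = []
    for pr in prs:
        if in_sample(pr, now):
            ttm_hours = (pr.merged_at - pr.published_at).total_seconds() / 3600
            pairs.append((mean(pr.ci_run_minutes), ttm_hours))
    return pairs
```

The resulting pairs are the kind of input a correlation or regression over CI time and time-to-merge would consume.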

Our analysis found a clear correlation between CI time and the average time for a PR to get merged. That is expected: longer tests mean more waiting before pressing the merge button. Surprisingly, however, the relationship is not one-to-one: an additional 5 minutes of CI time appears to increase average time-to-merge by over an hour.
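
One simple way to turn a correlation like this into a “per 5 extra minutes” figure is an ordinary least squares slope of time-to-merge against CI time. The article does not say which method produced its estimate, so treat this as an illustrative sketch; the function names are hypothetical, and the inputs are assumed to be paired CI minutes and time-to-merge hours.

```python
from typing import Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys on xs (units of y per unit of x)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("CI times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def merge_hours_per_extra_5_ci_minutes(
    ci_minutes: Sequence[float], merge_hours: Sequence[float]
) -> float:
    """Estimated change in time-to-merge (hours) per 5 extra minutes of CI."""
    return 5 * ols_slope(ci_minutes, merge_hours)
```

A fitted slope describes an association across PRs, not a causal effect of CI time on its own.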

This makes sense intuitively, because CI runs many times over the course of a PR’s lifecycle. As a PR is updated, reviewed, and merged, its code often goes through several build and test cycles. A small increase in the duration of each cycle can snowball as the number of CI runs grows.
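
The snowball effect can be illustrated with a deliberately simple, hypothetical model: assume every CI run in a PR’s lifecycle blocks progress until it finishes, so a per-run slowdown is paid once per blocking run. The function names and the all-runs-block assumption are ours, not the article’s.

```python
import math


def added_wait_minutes(blocking_runs: int, extra_ci_minutes: float) -> float:
    """Extra wall-clock wait if a PR waits on every one of `blocking_runs` CI runs."""
    if blocking_runs < 0 or extra_ci_minutes < 0:
        raise ValueError("inputs must be non-negative")
    return blocking_runs * extra_ci_minutes


def blocking_runs_needed(target_extra_minutes: float, extra_ci_minutes: float) -> int:
    """Smallest number of blocking runs at which the added wait reaches the target."""
    if extra_ci_minutes <= 0:
        raise ValueError("extra_ci_minutes must be positive")
    return math.ceil(target_extra_minutes / extra_ci_minutes)


# Under this model, 5 extra minutes per run adds an hour only after 12 blocking runs.
print(blocking_runs_needed(60, 5))  # 12
```

Under this purely additive model, an hour-plus increase from 5 extra minutes would require at least 12 blocking runs per PR. The model is a simplification and may not capture everything behind the observed effect.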

This highlights how much even small CI slowdowns matter, and why it pays to follow best practices when building out your CI infrastructure.

While shorter CI times help any one PR merge faster, they don’t seem to correlate with more changes merged overall. Counterintuitively, we found that CI times of 15-30 minutes correspond to the highest number of PRs merged per week per author. This graph shows that engineers with average CI run times around the 15-30 minute mark have the highest output in terms of PRs merged.
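
A per-author throughput comparison like the one described above could be computed by bucketing each author’s average CI time and averaging weekly merge rates within each bucket. This sketch assumes bucket edges at 5, 10, 15, and 30 minutes (matching the ranges the article mentions; its exact bucketing is not specified) and a hypothetical input of (average CI minutes, PRs merged, weeks observed) per author.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Hypothetical bucket edges (minutes); each bucket is [lower, upper).
EDGES = [5, 10, 15, 30]
LABELS = ["<5", "5-10", "10-15", "15-30", "30+"]


def ci_bucket(avg_ci_minutes: float) -> str:
    """Map an author's average CI time to a bucket label."""
    return LABELS[bisect_right(EDGES, avg_ci_minutes)]


def weekly_throughput_by_bucket(
    authors: Iterable[Tuple[float, int, float]],
) -> Dict[str, float]:
    """authors: (average CI minutes, PRs merged, weeks observed) per author.

    Returns the mean PRs merged per week per author for each non-empty bucket.
    """
    rates = defaultdict(list)
    for avg_ci, merged, weeks in authors:
        if weeks > 0:
            rates[ci_bucket(avg_ci)].append(merged / weeks)
    return {label: mean(rates[label]) for label in LABELS if label in rates}
```

Averaging per-author rates (rather than pooling all PRs) keeps one very prolific author from dominating a bucket.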

Our hypothesis for this conundrum is twofold. Lightning-fast CI times can be explained by products that are simply less complex: smaller builds, fewer tests, and so on. On the other hand, extremely long CI runtimes indicate a lack of optimization: builds are bloated, CI runs are flaky and need many retries, and tests are written inefficiently.

Teams centered in the middle of this curve are most typical of high-performing, well-staffed teams with more complex products and optimized CI infrastructure.

It should also be noted that there IS a drop-off in PRs merged per week from the 5-10 minute bucket to the 10-15 minute bucket, suggesting that the “10-minute golden rule” may still hold some validity.

Engineering organizations need to find a balance. CI takes time, and it often takes longer than one would like. Based on the data, it’s fair to say that 10 minutes is indeed a sweet spot for many small, fast PRs; however, you shouldn’t beat yourself up too much if your CI times start creeping up into the 25-30 minute range. Once you get past that, though, and start hitting 30+ minutes? It’s probably time to sit down with your DevOps engineers.

If you don’t have easy ways to shorten your CI execution time, consider stacking your pull requests. Stacking allows engineers to break up changes into dependency graphs of small PRs, each able to run CI in parallel. PRs can be created, modified, and reviewed all while other code is waiting for CI to finish. The workflow is ideal for maintaining velocity despite long individual CI times.
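
To see why stacking helps, compare the wall-clock CI wait for a change split into several PRs. This hypothetical sketch assumes that, without stacking, each PR is opened only after the previous one’s CI finishes, while in a stack all PRs are pushed together and their CI runs share a fixed pool of runners. It ignores reruns triggered when a PR lower in the stack changes and the PRs above it are rebased.

```python
import heapq
from typing import List


def sequential_ci_wait(durations: List[float]) -> float:
    """One PR at a time: each CI run starts after the previous one finishes."""
    return sum(durations)


def stacked_ci_wait(durations: List[float], runners: int) -> float:
    """All PRs pushed at once; each run takes the earliest free runner."""
    if runners < 1:
        raise ValueError("need at least one runner")
    if not durations:
        return 0.0
    free_at = [0.0] * min(runners, len(durations))  # time each runner is next free
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Three PRs, each with a 20-minute CI run.
print(sequential_ci_wait([20, 20, 20]))          # 60
print(stacked_ci_wait([20, 20, 20], runners=3))  # 20.0
```

With fewer runners than PRs the stacked wait grows (two runners give 40 minutes here), but the author can keep working on other PRs in the stack while CI runs.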

  • No Lower Bound: Our data reveals no diminishing returns to optimizing CI speed, even below 5 minutes, including when filtering to PRs that awaited code review.

  • New Rule of Thumb: Expect a 1-2 hour decrease in time-to-merge for every 5 minutes shaved off CI time.

  • Consider Stacking Workflows: Teams with CI times in the 15-30 minute range should evaluate stacking to maximize efficiency.

  • Check your own stats: Review metrics such as “publish to merge time” and “number of PRs merged” today. See how you stack up against the average developer and where you can improve.
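
A rough version of these two stats can be computed from your own PRs’ publish and merge timestamps. This is a sketch under assumptions: the list of (published_at, merged_at) pairs is hypothetical input, unmerged PRs are given as `None`, and the median is used for publish-to-merge time so a few long-lived PRs do not dominate.

```python
from datetime import datetime
from statistics import median
from typing import List, Optional, Tuple


def my_pr_stats(
    prs: List[Tuple[datetime, Optional[datetime]]], weeks: float
) -> dict:
    """Median publish-to-merge hours and PRs merged per week over `weeks` weeks."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    hours = [
        (merged - published).total_seconds() / 3600
        for published, merged in prs
        if merged is not None
    ]
    return {
        "median_publish_to_merge_hours": median(hours) if hours else None,
        "prs_merged_per_week": len(hours) / weeks,
    }


stats = my_pr_stats(
    [
        (datetime(2024, 5, 6, 9), datetime(2024, 5, 6, 11)),  # 2 hours
        (datetime(2024, 5, 7, 9), datetime(2024, 5, 7, 13)),  # 4 hours
        (datetime(2024, 5, 8, 9), datetime(2024, 5, 9, 15)),  # 30 hours
        (datetime(2024, 5, 9, 9), None),                      # still open
    ],
    weeks=2,
)
print(stats)  # {'median_publish_to_merge_hours': 4.0, 'prs_merged_per_week': 1.5}
```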
