How merge queues help prevent flaky tests from blocking deployments

Greg Foster
Graphite software engineer

Flaky tests are automated tests that yield inconsistent outcomes—passing or failing without any changes to the codebase or environment. Common causes include race conditions, reliance on external resources, lack of test isolation, and outdated or orphaned code.

In CI/CD pipelines, flaky tests can be particularly disruptive. A single flaky test can cause a pull request (PR) to fail its checks, blocking its merge and delaying deployments. Moreover, in a merge queue setup, a flaky test can halt the entire queue, affecting multiple PRs and wasting CI resources.

Merge queues manage and sequence PRs before they are merged into the main branch. They ensure that each PR is tested in the context of the latest codebase, reducing the chances of integration conflicts.

However, when flaky tests are present, merge queues can become bottlenecks. A flaky test failure can cause a PR to be removed from the queue, triggering re-tests for subsequent PRs and leading to delays.
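To get a feel for how quickly this compounds, here is a rough back-of-the-envelope sketch. It assumes each CI run fails spuriously with the same independent probability, which real suites only approximate; the function name and the example numbers are illustrative, not measurements from any real queue.

```python
def spurious_failure_probability(flake_rate: float, runs: int) -> float:
    """Chance that at least one of `runs` independent CI runs fails spuriously.

    `flake_rate` is the assumed per-run probability of a flaky failure.
    """
    if not 0.0 <= flake_rate <= 1.0:
        raise ValueError("flake_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    # P(at least one failure) = 1 - P(every run passes)
    return 1.0 - (1.0 - flake_rate) ** runs


if __name__ == "__main__":
    # A 2% per-run flake rate across a queue of 10 PRs.
    print(f"{spurious_failure_probability(0.02, 10):.1%}")
```

Under these assumptions, a suite that flakes on only 2% of runs still disrupts a queue of 10 PRs roughly 18% of the time.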

The Graphite Merge Queue is designed to streamline the integration of pull requests (PRs) into the main branch, addressing challenges posed by flaky tests in CI/CD pipelines. By automating the rebase process and ensuring that each PR is tested against the latest codebase, it helps maintain a stable and green main branch, reducing the likelihood of deployment blocks due to test flakiness.

Identifying and isolating flaky tests prevents them from affecting the merge queue. Tools can detect tests that fail intermittently and quarantine them, ensuring they don't block PRs.
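One common heuristic for detection is to flag any test that has both passed and failed on the same commit, since the code did not change between those runs. The sketch below assumes you can export CI history as simple records; `RunRecord` and `find_flaky_tests` are hypothetical names for illustration, not part of Graphite or any CI provider's API.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class RunRecord(NamedTuple):
    """One historical test result (hypothetical export format)."""
    test_name: str
    commit_sha: str
    passed: bool


def find_flaky_tests(runs: Iterable[RunRecord]) -> set[str]:
    """Return tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.commit_sha)].add(run.passed)
    # Two distinct outcomes for identical code suggests nondeterminism.
    return {name for (name, _sha), seen in outcomes.items() if len(seen) == 2}


if __name__ == "__main__":
    history = [
        RunRecord("test_login", "abc123", True),
        RunRecord("test_login", "abc123", False),   # same commit, new result
        RunRecord("test_payment", "abc123", False),
        RunRecord("test_payment", "def456", True),  # different commit: not flagged
    ]
    print(find_flaky_tests(history))
```

The resulting set could feed a quarantine list that the CI job treats as non-blocking while the tests are being fixed.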

Some systems allow for automatic retries of failed tests. If a test passes on a subsequent attempt, it may be deemed flaky, and the PR can proceed. However, excessive retries can mask genuine issues, so this approach should be used judiciously.
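A minimal sketch of a bounded retry policy follows. It treats a test as a callable that raises `AssertionError` on failure, and it reports "flaky" separately from "passed" so that retried tests stay visible instead of being silently hidden. The function name and the three outcome labels are assumptions made for this example.

```python
from typing import Callable


def run_with_retries(test: Callable[[], None], max_attempts: int = 3) -> str:
    """Run `test` up to `max_attempts` times.

    Returns "passed" if the first attempt succeeds, "flaky" if a later
    attempt succeeds, and "failed" if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            test()
        except AssertionError:
            continue  # try again, up to the cap
        return "passed" if attempt == 1 else "flaky"
    return "failed"
```

Keeping `max_attempts` small and recording every "flaky" result is one way to get the benefit of retries without masking genuine regressions.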

Running tests in parallel can expedite the CI process. The Graphite platform supports parallel CI runs, allowing multiple PRs to be tested simultaneously. However, when a flaky failure removes one PR from the queue, the PRs being tested behind it must be re-run, so higher concurrency means more wasted CI runs per flake. It's essential to balance concurrency levels against how flaky the test suite is.
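The trade-off can be estimated with a simplified model. The sketch below assumes a queue where each PR in a parallel batch is tested on top of the PRs ahead of it, so the first spurious failure invalidates every run behind it, and it assumes a fixed, independent per-run flake rate. Both are modeling assumptions for illustration, not a description of how Graphite schedules CI.

```python
def expected_invalidated_runs(flake_rate: float, concurrency: int) -> float:
    """Expected CI runs thrown away per batch because of the first flaky failure.

    Position i (0-indexed) is the first to flake with probability
    (1 - flake_rate) ** i * flake_rate, which invalidates the
    (concurrency - i - 1) runs queued behind it.
    """
    if not 0.0 <= flake_rate <= 1.0:
        raise ValueError("flake_rate must be between 0 and 1")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return sum(
        (1.0 - flake_rate) ** i * flake_rate * (concurrency - i - 1)
        for i in range(concurrency)
    )


if __name__ == "__main__":
    for c in (1, 5, 10, 20):
        print(c, round(expected_invalidated_runs(0.05, c), 2))
```

In this model the waste grows with concurrency for any nonzero flake rate, which is why the concurrency level is worth tuning alongside efforts to reduce flakiness.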

  • Monitor and log test results: Keep track of test outcomes to identify patterns indicative of flakiness.

  • Regularly review and refactor tests: Ensure tests are deterministic and isolated from external factors.

  • Mock external dependencies: Replace calls to external systems with mocks to reduce variability.

  • Use containerization: Tools like Docker can provide consistent environments, minimizing discrepancies that lead to flaky tests.
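As an illustration of the mocking practice above, here is a small sketch using Python's standard `unittest.mock`. The `fetch_display_name` function and the `get_user` client method are hypothetical stand-ins for code that would normally call an external service.

```python
import unittest
from unittest.mock import Mock


def fetch_display_name(client, user_id: int) -> str:
    """Hypothetical code under test: looks up a user and formats the name."""
    payload = client.get_user(user_id)
    return payload["name"].strip().title()


class FetchDisplayNameTest(unittest.TestCase):
    def test_formats_name_without_network(self):
        # The mock replaces the real client, so the result never depends
        # on network latency or the remote service being available.
        client = Mock()
        client.get_user.return_value = {"name": "  ada lovelace "}

        self.assertEqual(fetch_display_name(client, 42), "Ada Lovelace")
        client.get_user.assert_called_once_with(42)


if __name__ == "__main__":
    unittest.main()
```

Because the response is fixed by the test itself, the outcome is deterministic and the test cannot fail because of the external system.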

Flaky tests pose significant challenges in CI/CD pipelines, especially when using merge queues. By implementing strategies like quarantining flaky tests, using retry mechanisms, and leveraging tools like Graphite, teams can mitigate the impact of flaky tests, ensuring smoother deployments and improved developer productivity.
