Almost all of us have some experience with continuous integration and continuous delivery (CI/CD) systems. Whether you use GitHub Actions, Jenkins, or are one of the five people in the world still using Concourse, CI/CD systems provide the backbone of how most organizations build and ship software. However, one question we hear a lot from Graphite users is: how do we optimize our CI/CD pipelines to balance speed, cost, and efficiency?
In this post we’ll cover common challenges in CI/CD workflows, as well as share insights and best practices that our team here at Graphite has gathered over our collective decades of experience working at organizations of all shapes and sizes.
Note
Greg spends full workdays writing weekly deep dives on engineering practices and dev-tools. This is made possible because these articles help get the word out about Graphite. If you like this post, try Graphite today, and start shipping 30% faster!
Challenges in CI/CD
The two biggest potential problems in CI/CD flows are time and financial cost, which, if not managed properly, can quickly spiral out of control, grinding your eng team to a halt while running up a big compute bill in the process.
Time cost: Productive eng teams need fast CI processes. A slow CI pipeline can quickly become a bottleneck, delaying review feedback, slowing down merge queues, and hampering the overall pace of development. This is particularly painful for teams using trunk-based development, where short-lived branches and frequent merges are standard practice, meaning that CI disruptions can easily snowball as engineers add more and more PRs to the queue.
Financial cost: The typical response to slow CI processes is increased parallelization, but this often comes with increased computational and hosting costs. An initially simple setup can, over time, morph into a complex array of parallel jobs. Because each of these jobs runs on every revision of every PR, the associated costs can quickly balloon into tens or even hundreds of thousands of dollars in compute spend per month.
With these challenges in mind, fast-moving eng teams need to strike a delicate balance in their CI setups. Your CI process should ensure comprehensive test coverage, while at the same time balancing speed and resource requirements to keep costs in check. The second half of this post delves into strategies and best practices for creating a time- and cost-efficient CI/CD pipeline.
Strategies for optimizing CI/CD
Here are a few strategies that we use at Graphite to optimize our CI/CD pipelines:
Parallelization
It’s important to keep your CI runtime under control. Aim for a total CI time of under 10 minutes. Fast CI runtimes accelerate the time-to-merge and keep development momentum high.
Techniques for parallelizing unit tests:
Utilize testing libraries that support parallel execution, and match the parallelization level to the number of cores available on your test runner (a sketch of this follows the matrix example below).
For example, in GitHub Actions you can define matrices of different job configurations:

```yaml
jobs:
  parallelization_matrix:
    strategy:
      matrix:
        version: [1, 3, 5]
        os: [fedora-latest, ubuntu-latest]
    runs-on: ${{ matrix.os }} # each matrix combination gets its own runner
```

Each possible combination of variables triggers its own job run, so this example runs six jobs in total (three versions times two operating systems). The jobs all run in parallel, subject to runner availability.
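To match workers to cores at the test-library level, here’s a minimal sketch assuming a Node project that runs its suite with Jest (the flag is Jest-specific; other parallel-capable runners expose an equivalent setting):

```yaml
steps:
  - name: Run unit tests in parallel
    # --maxWorkers=100% asks Jest to spawn one worker per available core.
    run: npx jest --maxWorkers=100%
```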
Strategies for dividing tests into sub-groups:
Split unit tests by module, sub-project, or a hash of test file names. This allows for running multiple jobs or workflows in parallel (both approaches are sketched below).
Example of a test splitting configuration:
```yaml
jobs:
  test-module-A:
    runs-on: ubuntu-latest
    steps:
      - name: Run module A tests
        run: npm run test:module-A
  test-module-B:
    runs-on: ubuntu-latest
    steps:
      - name: Run module B tests
        run: npm run test:module-B
```
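To split by hash instead of hand-picked modules, one hypothetical approach (assuming a Node project with Jest and test files named `*.test.js`) is to shard files by a checksum of their paths:

```yaml
jobs:
  test-shard:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [0, 1, 2, 3]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - name: Run this shard's tests
        run: |
          # Keep only the test files whose path checksum lands in this shard,
          # so the four shards partition the suite without overlap.
          # (A guard for an empty shard is omitted for brevity.)
          FILES=""
          for f in $(find . -name '*.test.js'); do
            if [ $(( $(echo "$f" | cksum | cut -d' ' -f1) % 4 )) -eq ${{ matrix.shard }} ]; then
              FILES="$FILES $f"
            fi
          done
          npx jest $FILES
```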
Avoiding redundant work
Repetition in CI/CD flows is expensive. Avoid having each parallel job reinstall dependencies or rebuild the project. This can add significant time and cost to each CI run.
Best practices for caching and artifact management:
Use caching strategies to store and reuse build artifacts across jobs.
Recommended tools:
GitHub Cache Action: GitHub Actions' first-party caching action
Turborepo: efficient artifact management in monorepos
Example of efficient caching:
Cache your dependencies and build outputs to be reused in subsequent jobs.
Example GitHub Actions cache configuration:
```yaml
steps:
  - uses: actions/cache@v2
    with:
      path: |
        node_modules
        build-output
      key: ${{ runner.os }}-build-${{ hashFiles('**/lockfiles') }}
```
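Caching handles reuse across runs; to avoid rebuilding within a single run, you can also build once and hand the output to downstream jobs. A minimal sketch using GitHub’s first-party artifact actions (job and path names here are illustrative):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: build-output
  test:
    needs: build # waits for build, then reuses its output instead of rebuilding
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: build-output
      - run: npm test
```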
Selective testing
Avoid re-testing everything on every code change. This is inefficient, especially in large projects and monorepos.
Approaches for building and testing based on selective changes:
Implement CI logic to detect and test only the changed components.
Use tools like Vercel’s Turbo to detect changes and trigger only the necessary builds and tests (a Turbo sketch follows the example below).
Example of selective testing using GitHub Actions’ paths filter on push triggers:
```yaml
on:
  push:
    paths:
      - 'module-A/**'
      - 'module-B/**'
```
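And here’s a rough sketch of change detection with Turborepo, assuming a monorepo whose packages define a test task. The --filter syntax selects packages changed relative to main, plus anything that depends on them:

```yaml
jobs:
  affected:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Turbo needs git history to diff against the base branch
      - run: npm ci
      - name: Test only affected packages
        run: npx turbo run test --filter='...[origin/main]'
```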
Fail-fast principle
Halting CI workflows on failure:
Configure your CI to stop all subsequent jobs upon the first failure, saving time and resources.
Example of a fail-fast setup in GitHub Actions using the if conditional and a community action (andymckay/cancel-action) that calls GitHub’s API to cancel the current workflow run:
```yaml
- name: cancel_workflow
  if: failure() || steps.lint_all.outcome == 'failure'
  uses: andymckay/cancel-action@0.2
```
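For matrix jobs specifically, GitHub Actions has a built-in fail-fast flag (on by default) that cancels the remaining matrix jobs as soon as any one of them fails. The shard script name below is illustrative:

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: true # cancel sibling matrix jobs on the first failure
      matrix:
        shard: [1, 2, 3]
    steps:
      - run: npm run test:shard-${{ matrix.shard }}
```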
Configuring CI to cancel redundant executions:
Avoid running multiple CI executions for the same PR: cancel any in-progress runs when a new push is made, since those runs now target an outdated version of the branch.
Example GitHub Actions configuration for canceling in-progress jobs using GitHub’s cancel-in-progress keyword:
```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```
Set timeouts to prevent hanging jobs:
Implement a maximum timeout for CI jobs to avoid long-running or stalled jobs.
Example timeout configuration:
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 20
```
Through these strategies and practices, you can significantly optimize your CI/CD pipelines, ensuring that they are not only robust and reliable but also cost-effective and time-efficient.
Additional considerations
When optimizing your CI/CD pipeline, make sure to consider your team’s development workflows and processes. For example, your team’s repo setup will greatly affect how you approach CI:
Monorepo vs. polyrepo:
Monorepo advantages: In a monorepo, all of your projects and libraries reside in a single repository, simplifying dependency management and ensuring consistency across the codebase. This setup can significantly benefit CI/CD as changes across multiple projects can be tested together, ensuring that integration points are always in sync.
Polyrepo considerations: Polyrepos, where each project or library has its own repository, naturally shard code, which introduces complexity in managing atomic changes across repositories. While it may make testing siloed feature sets easier, it ultimately complicates integrating these features.
Graphite’s recommendation: Use a monorepo, especially for larger teams. It simplifies CI/CD pipeline management and fosters a more integrated development environment.
Key takeaways
Prioritize parallelization:
Assess opportunities: Regularly review your test suites and other CI jobs to identify opportunities for parallel execution.
Optimize resource allocation: Match the number of parallel jobs to your available computational resources to maximize efficiency without overloading the system.
Use parallel-friendly tools: Employ tools and frameworks that inherently support parallel execution of tasks.
Avoid redundant work:
Implement efficient caching: Cache dependencies and build artifacts to reuse them across jobs, reducing the time spent in setup.
Optimize Docker layers: If using Docker, structure your Dockerfiles to take advantage of layer caching.
Use smart build systems: Employ build systems that can intelligently skip unchanged parts of the codebase.
Embrace selective testing:
Implement change detection: Use tools like Vercel’s Turbo to limit testing to only the parts of the codebase that have changed.
Configure path-based triggers: Set up your CI to trigger different workflows based on the paths of changed files.
Shallow clone: Only check out the code you need for testing, cutting out time spent pulling down extraneous data (see the sketch below).
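On the shallow-clone point, actions/checkout already defaults to a depth of one commit; the sketch below just makes that explicit, which matters if your configuration has overridden it:

```yaml
steps:
  - uses: actions/checkout@v4
    with:
      fetch-depth: 1 # shallow clone: fetch only the latest commit, not full history
```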
Implement the fail-fast principle:
Early exit on failure: Configure your CI pipeline to stop subsequent steps immediately after a failure is detected.
Prioritize fast feedback loops: Run the quickest tests (like linters and static analysis) early in the CI process, as in the sketch below.
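One way to get that fast feedback loop in GitHub Actions is to gate slower jobs on a quick lint job with needs (job names and scripts here are illustrative):

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run lint # fast static checks run first
  test:
    needs: lint # the slower suite only starts once lint passes
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
```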
Optimize dependency management:
Use dependency locking: Utilize lock files to ensure consistent dependency installation across all CI runs (see the sketch after this list).
Optimize dependency retrieval: Use a package manager that supports efficient retrieval and caching of dependencies.
Prune unnecessary dependencies: Regularly audit your dependencies to remove or update those that are no longer needed or are outdated.
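Tying the first two items together, a minimal sketch for a Node project: setup-node’s built-in dependency cache keyed off the lock file, followed by npm ci for a reproducible install:

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with:
      node-version: 20
      cache: npm # caches the npm store, keyed on package-lock.json
  - run: npm ci # installs exactly what the lock file specifies
```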
Optimizing your CI/CD pipeline is a juggling act of balancing efficiency, cost, and performance. By implementing strategies such as parallelization, redundant-work avoidance, selective testing, and fail-fast execution, you can achieve significant improvements in your CI/CD processes. Make sure to consider the structure of your codebase (monorepo vs. polyrepo) and the limitations of your CI/CD tools as you think about these optimizations.
These approaches have worked well for us at Graphite, but obviously every company is different. We highly encourage you to experiment with these strategies, adapt them to your specific context, and continuously refine your CI/CD practices.
Let us know how these strategies work for you!