Engineering teams have a merging problem. Not the kind you solve with better Git workflows or cleaner code reviews, but a fundamental infrastructure problem that gets worse as you scale. At Graphite, we solved it by building the first merge queue that understands stacked pull requests—and made it available to every engineering team. The results speak for themselves: 74% faster merge times at Ramp, 7 hours saved per engineer per week at Asana, and 21% more code shipped.
This is the story of how we built Graphite's Merge Queue, why existing solutions couldn't work, and the technical innovations that any team can now adopt.
The merge queue paradox
Every fast-moving engineering team eventually hits the same wall. You hire more engineers, break work into smaller pull requests, and suddenly your main branch becomes a battleground. Merge conflicts multiply. CI runs stack up. Engineers spend more time waiting for their code to merge than writing it.
The standard solution is a merge queue—tools like GitHub's native queue, Mergify, or Bors that serialize merges to prevent conflicts. But here's the paradox: the moment your team adopts stacked development (breaking large features into chains of dependent PRs), these tools become your bottleneck instead of your salvation.
At companies like Asana, engineers were spending up to 40 minutes per stack per week just babysitting the merge process. Across 125 engineers, that's 83 hours of wasted developer time every single week. The math is brutal, and it only gets worse as you scale.
Why existing merge queues break with stacks
To understand why we needed to build something new, you need to understand what a stack really is. A stack isn't just a sequence of PRs—it's a dependency graph where each change builds on the previous one. Think of it like this:
PR #1: Add user authentication endpoints
PR #2: Add user profile UI (depends on #1)
PR #3: Add profile picture upload (depends on #2)
PR #4: Add social login integration (depends on #1)
This creates a merge order constraint: you can't merge PR #2 until #1 is safely in main, and #3 can't merge until #2 is done. Traditional merge queues are fundamentally PR-centric—they treat each pull request as an independent unit, completely ignoring these dependencies.
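To make the dependency structure concrete, here's a minimal sketch of how a stack like the one above could be modeled in code. The `StackPR` and `StackNode` types and the `buildDependencyTree` helper are illustrative, not Graphite's internal representation:

```typescript
// Illustrative types for modeling a stack as a dependency graph (not Graphite's internals)
interface StackPR {
  number: number;
  title: string;
  dependsOn: number | null; // PR number this change builds on; null means it's based on main
}

interface StackNode {
  pr: StackPR;
  children: StackNode[]; // PRs that build directly on this one
}

// Build a forest of PRs rooted at the PRs based directly on main
function buildDependencyTree(prs: StackPR[]): StackNode[] {
  const nodes = new Map<number, StackNode>(
    prs.map((pr): [number, StackNode] => [pr.number, { pr, children: [] }])
  );
  const roots: StackNode[] = [];
  for (const node of nodes.values()) {
    const parent = node.pr.dependsOn === null ? undefined : nodes.get(node.pr.dependsOn);
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

// The example stack above: #2 and #4 build on #1, and #3 builds on #2
const exampleStack = buildDependencyTree([
  { number: 1, title: 'Add user authentication endpoints', dependsOn: null },
  { number: 2, title: 'Add user profile UI', dependsOn: 1 },
  { number: 3, title: 'Add profile picture upload', dependsOn: 2 },
  { number: 4, title: 'Add social login integration', dependsOn: 1 },
]);
```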
Here's what happens when you try to merge a five-PR stack through a traditional queue:
PR #1 enters the queue: Runs CI, merges successfully
PR #2 enters the queue: Needs to rebase onto the new main (because #1 just merged), runs CI again
PR #3 enters the queue: Needs to rebase onto #2's new state, runs CI again
And so on...
Each PR requires a fresh rebase and CI run. If PR #3 fails CI, you have to evict it from the queue—but you also have to evict PRs #4 and #5 because they depend on #3. Then you need to rebase and re-run CI for everything downstream.
The result? A five-PR stack that should take 30 minutes to merge takes 3 hours, burns through CI resources like crazy, and requires constant manual intervention.
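To put rough numbers on that, here's an illustrative model (not a benchmark) of how many CI runs a PR-centric queue spends on a single stack when one PR fails:

```typescript
// Illustrative model: CI runs a PR-centric queue spends on one stack (not a benchmark)
function prCentricCiRuns(stackSize: number, failingPrIndex: number | null): number {
  // Happy path: every PR needs its own rebase and CI run
  let runs = stackSize;

  if (failingPrIndex !== null) {
    // The failing PR and everything stacked on top of it get evicted,
    // then rebased and re-run once the failure is fixed
    runs += stackSize - failingPrIndex;
  }
  return runs;
}

// Five-PR stack with a failure at PR #3 (index 2): 5 + 3 = 8 CI runs,
// versus a single stack-level run in a stack-aware queue
console.log(prCentricCiRuns(5, 2)); // 8
```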
The stack-aware breakthrough
Our insight was simple: treat stacks as first-class citizens, not afterthoughts.
Instead of a PR-centric queue, we built a stack-aware system that understands dependency relationships. When you queue up a stack, Graphite's merge queue:
Validates the entire stack as a unit: Rather than testing each PR independently, we run CI on the top-most PR, which contains every change beneath it in the stack
Merges atomically: If CI passes, all PRs in the stack fast-forward merge in sequence
Handles failures intelligently: If something breaks, we evict only the failing PR and its dependents, leaving the rest of the queue intact
Here's the technical architecture that makes this possible:
```typescript
// Stack-aware merge queue processor
const processStack = async (stack: Stack) => {
  // Build dependency graph
  const dependencyGraph = buildStackGraph(stack);

  // Run CI only on the stack head (contains all changes)
  const ciResult = await runSpeculativeCI(stack.head, {
    baseBranch: `gtmq_speculative_${stack.id}`,
    includedPRs: stack.allPRs
  });

  if (ciResult.success) {
    // Fast-forward merge all PRs in dependency order
    await mergeStackAtomically(stack);
  } else {
    // Bisect to find failing PR(s)
    const failingPRs = await bisectStack(stack, ciResult);
    await evictPRsAndDependents(failingPRs);
  }
};
```
The key insight is in that `runSpeculativeCI` call. Instead of running CI on each PR individually, we create a speculative merge branch that contains all the changes, then run CI once. This cuts CI costs dramatically while giving us complete confidence in the merged result.
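For context, here's a simplified sketch of what a speculative CI helper could do: point a temporary `gtmq_` branch at the stack head's commit, open a draft PR so the repository's normal CI triggers fire, and poll for the result. The signature is simplified, the polling approach is an assumption, and the client is assumed to be an Octokit-style REST client matching the call shapes in the other snippets in this post:

```typescript
import { Octokit } from '@octokit/rest';

// Assumed client; the call shapes mirror the githubClient used in the other snippets
const githubClient = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Simplified sketch of a speculative CI run (not Graphite's implementation)
const runSpeculativeCiSketch = async (opts: {
  owner: string;
  repo: string;
  stackHeadSha: string;
  speculativeBranch: string; // e.g. a gtmq_speculative_ branch name
}) => {
  const { owner, repo, stackHeadSha, speculativeBranch } = opts;

  // Point a temporary branch at the stack head, which contains every change in the stack
  await githubClient.git.createRef({
    owner,
    repo,
    ref: `refs/heads/${speculativeBranch}`,
    sha: stackHeadSha,
  });

  // A draft PR makes the repository's normal CI triggers fire against the branch
  const { data: draft } = await githubClient.pulls.create({
    owner,
    repo,
    head: speculativeBranch,
    base: 'main',
    title: `[GTMQ] Speculative CI for ${speculativeBranch}`,
    draft: true,
  });

  // Poll the combined commit status until CI settles (no timeout handling in this sketch)
  let state = 'pending';
  while (state === 'pending') {
    await new Promise((resolve) => setTimeout(resolve, 30_000));
    const { data } = await githubClient.repos.getCombinedStatusForRef({
      owner,
      repo,
      ref: stackHeadSha,
    });
    state = data.state; // 'success' | 'failure' | 'pending'
  }

  return { success: state === 'success', draftPrNumber: draft.number };
};
```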
Speculative execution and failure recovery
But stack-awareness alone wasn't enough for production-scale merge queues. When you're processing hundreds of PRs per day and CI takes 30-45 minutes, a single failure can back up your queue for hours.
Our solution: speculative execution with intelligent bisection.
When multiple stacks enter the queue, we don't wait for the first to complete before starting the second. Instead, we optimistically run CI on multiple stacks in parallel, using temporary branches with the `gtmq_` prefix:
```typescript
// Parallel speculative execution
const batchMerge = async (queuedStacks: Stack[]) => {
  const speculativeBranches = await Promise.all(
    queuedStacks.map(stack =>
      createSpeculativeBranch(stack, {
        assumeUpstreamSuccess: true,
        baseBranch: calculateOptimisticBase(stack)
      })
    )
  );

  const results = await runCIInParallel(speculativeBranches);

  // Handle successes and failures
  await Promise.all([
    ...results.successes.map(mergeStackAtomically),
    ...results.failures.map(handleFailureWithBisection)
  ]);
};
```
The magic happens in `handleFailureWithBisection`. When a batch of stacks fails CI, we don't just throw up our hands; we use a binary search algorithm to isolate the problematic changes:
```typescript
const bisectStack = async (stack: PR[], failedCI: CIResult): Promise<PR[]> => {
  if (stack.length === 1) return stack;

  const midpoint = Math.floor(stack.length / 2);
  const bottomHalf = stack.slice(0, midpoint);
  const topHalf = stack.slice(midpoint);

  // Test bottom half first (dependencies are cleaner)
  const bottomResult = await runCI(bottomHalf);

  if (bottomResult.success) {
    // Problem is in the top half
    return bisectStack(topHalf, failedCI);
  } else {
    // Problem is in the bottom half
    return bisectStack(bottomHalf, failedCI);
  }
};
```
This bisection algorithm can isolate a failing PR in a 32-PR batch with just 5 CI runs instead of 32. The math matters when CI costs are measured in dollars per minute.
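The run count follows from the halving: each CI run rules out half of the remaining candidates, so isolating a single failure among N PRs needs roughly ceil(log2(N)) runs. A quick sanity check, assuming exactly one failing PR:

```typescript
// CI runs needed to isolate one failing PR by bisection, versus testing every PR
const bisectionRuns = (batchSize: number): number => Math.ceil(Math.log2(batchSize));

console.log(bisectionRuns(32)); // 5 runs instead of 32
console.log(bisectionRuns(8));  // 3 runs instead of 8
```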
The performance impact
The results were immediate and dramatic. Here's what engineering teams saw after adopting Graphite's stack-aware merge queue:
Ramp Engineering: 74% decrease in median time between merges, with engineers merging PRs up to 3x faster. Their engineering velocity metrics showed a clear inflection point right after adoption.
Asana: Engineers saved 7 hours per week each—that's nearly a full workday returned to actual engineering. They shipped 21% more code in the same timeframe and, importantly, reported significantly higher job satisfaction around the merge process.
Shopify: Projected CI cost savings of 15-25% across their entire organization by eliminating redundant stack CI runs. At their scale, that translates to hundreds of thousands of dollars annually.
The performance gains come from three key optimizations:
Reduced CI overhead: One CI run per stack instead of one per PR
Parallel processing: Multiple stacks can be validated simultaneously
Intelligent failure handling: Bisection minimizes wasted CI when problems occur
But the numbers only tell part of the story. The qualitative impact was equally significant—engineers stopped dreading the merge process and could focus on building instead of babysitting queues.
Technical challenges and solutions: building production-scale infrastructure
Building a production-ready stack-aware merge queue wasn't just about implementing algorithms—it was about solving the gnarly infrastructure problems that only surface when you're processing hundreds of PRs per day for teams like Shopify and Netflix. Here's the real story of how we built the infrastructure that powers Graphite Merge Queue, challenge by challenge.
The great concurrency control problem
The first major issue we hit was concurrent queue corruption. When multiple merge queue processors tried to work on the same repository simultaneously, they'd step on each other's work—corrupting stack state, creating duplicate CI runs, and leaving PRs stuck in limbo.
Our solution: distributed locking with timeout recovery. Every repository gets a lock with a configurable expiration time, and we built sophisticated cleanup logic to handle processor crashes:
```typescript
// Repository-level locking
const locked = await mergeQueueDao.lockMqRepo({
  mqRepository,
  lockUntil: DateTime.now().plus({ seconds: iterationTimeSecs }),
});

if (locked) {
  // Process safely with exclusive access
  await mqProcessor({ mqRepository, deps });
  await mergeQueueDao.unlockMqRepo({ mqRepository, updateSuccess: true });
}

// Cleanup expired locks from crashed processors
const nRowsUnlocked = await mergeQueueDao.unlockExpiredLocks({});
if (nRowsUnlocked > 0) {
  splog.verbose({
    message: "Unlocked expired MQ repositories",
    tags: { nRowsUnlocked },
  });
}
```
But locking alone wasn't enough. We discovered that processors could still conflict when handling error recovery jobs. A failing stack might trigger multiple bisection processes, creating chaos. We had to implement stack-level locking within the repository lock:
```typescript
// Stack-level locking for error handling
const createStackLock = async (stackPrNumbers: number[]) => {
  // Sort numerically (not lexicographically) so the same set of PRs always yields the same key
  const lockKey = `stack_${[...stackPrNumbers].sort((a, b) => a - b).join('_')}`;
  return await acquireLock(lockKey, { timeoutMs: 30000 });
};
```
The timeout explosion
As we scaled up batching, we hit another brutal problem: exponential timeout complexity. CI runs that took 10 minutes for single PRs were taking 45+ minutes for large batches, causing cascading failures across the entire queue.
The math was nasty. For a batch with N stacks, our timeout needed to account for:
- Initial batch CI run: `baseCiTime`
- Potential bisection depth: `log2(N)`
- Flaky test retries: `retryMultiplier`
We built a dynamic timeout calculator that scales intelligently:
```typescript
// Exponential timeout calculation
const calculateTimeout = (stackCount: number, strategy: 'bisect' | 'batch-sect') => {
  const baseTimeout = 30; // minutes
  const bisectionDepth = Math.ceil(Math.log2(stackCount)) + 2;

  if (strategy === 'batch-sect') {
    // N-sect creates N draft PRs simultaneously
    return baseTimeout * stackCount;
  } else {
    // Bisect creates log(N) draft PRs sequentially
    return baseTimeout * bisectionDepth;
  }
};

// Timeout handling with proper cleanup
const handleTimeout = async ({ args, deps }) => {
  const { stackJobId, createdAt, timeoutMin } = args;
  const elapsed = DateTime.now().diff(DateTime.fromJSDate(createdAt), 'minutes');

  if (elapsed.minutes > timeoutMin) {
    await markJobAsCancelled({ stackJobId, reason: 'timeout' });
    await cleanupDraftPRs({ stackJobId });
    return { complete: true, timedOut: true };
  }
};
```
The draft PR nightmare
Speculative execution requires creating temporary "draft PRs" to test combinations of stacks. Simple in theory, brutal in practice. We discovered that GitHub's API has subtle race conditions when creating/deleting branches rapidly, leading to phantom branches and stale CI runs.
The breakthrough came when we realized we needed to treat draft PRs as first-class infrastructure, not just temporary artifacts:
```typescript
// Draft PR with SHA tracking
const createDraftPrWithSha = async ({
  stackJob,
  targetSha,
  baseBranch = 'main'
}) => {
  // Fetch the exact SHA before creating the branch to avoid race conditions
  await githubClient.git.fetchRef({
    owner,
    repo,
    ref: targetSha
  });

  const draftBranch = `gtmq_${stackJob.id}_${Date.now()}`;

  await githubClient.git.createRef({
    owner,
    repo,
    ref: `refs/heads/${draftBranch}`,
    sha: targetSha
  });

  return createDraftPR({
    head: draftBranch,
    base: baseBranch,
    title: `[GTMQ] Testing stack ${stackJob.id}`,
    body: generateStackTestingBody(stackJob)
  });
};
```
We also had to build intelligent cleanup logic because draft PRs could get orphaned when jobs failed or timed out:
```typescript
// Cleanup orphaned draft PRs
const cleanupOrphanedDrafts = async () => {
  const staleDrafts = await findDraftPRsOlderThan({ hours: 2 });

  await Promise.all(staleDrafts.map(async (draft) => {
    await githubClient.pulls.update({
      pull_number: draft.number,
      state: 'closed'
    });
    await githubClient.git.deleteRef({
      ref: `heads/${draft.head.ref}`
    });
  }));
};
```
The bisection algorithm evolution
Our original bisection was naive—just binary search through the stack list. But real stacks have dependency constraints. You can't test PRs #3-5 if PR #2 is broken, because #3 depends on #2.
We had to build a topology-aware bisection that respects dependency ordering:
```typescript
// Smart bisection respecting dependencies
const getDraftPrToCreateAndSafeStacksBisect = ({
  stacks,
  latestDraftPrMetadata,
  ciSummary
}) => {
  if (stacks.length === 1) {
    return { stackToBisect: stacks, potentialSafeStacks: [] };
  }

  // Find a midpoint that doesn't break dependencies
  const midpoint = findValidBisectionPoint(stacks);
  const bottomHalf = stacks.slice(0, midpoint);
  const topHalf = stacks.slice(midpoint);

  // Test bottom half first (cleaner dependency chains)
  return {
    stackToBisect: bottomHalf,
    potentialSafeStacks: topHalf
  };
};
```
But even that wasn't enough. We discovered that flaky tests were causing false positives in bisection, leading to good PRs getting evicted. So we added confidence scoring:
```typescript
// Confidence-based eviction to handle flaky tests
const shouldEvictWithConfidence = (ciResults: CiResult[]) => {
  const failurePattern = ciResults.map(r => !r.success);
  const consecutiveFailures = getConsecutiveFailures(failurePattern);

  // Require multiple consistent failures before evicting
  return consecutiveFailures >= 2;
};
```
The two-strategy failure system
The biggest architectural breakthrough came when we realized that different failure scenarios need different strategies:
N-Sect for Fast Isolation: When you have 8 stacks and suspect multiple might be broken, create 8 draft PRs simultaneously and test them all in parallel. Expensive but fast.
Bisect for Deep Debugging: When you have 32 stacks and expect at most one failure, use binary search to minimize CI costs.
```typescript
// Strategy selection logic
const selectFailureStrategy = (stacks: Stack[], ciHistory: CiResult[]) => {
  const stackCount = stacks.length;
  const historicalFailureRate = calculateFailureRate(ciHistory);

  if (stackCount <= 8 || historicalFailureRate > 0.3) {
    return { strategy: 'batch-sect', nsectN: BATCH_SIZE_N_SECT };
  } else {
    return { strategy: 'bisect', nsectN: 2 };
  }
};
```
Horizontal scaling: the partition breakthrough
Even with all these optimizations, we hit the monorepo scaling wall. A single merge queue processing 200+ PRs/day becomes a bottleneck no matter how smart your algorithms are.
The solution: partitioned queues that split repositories by file patterns:
```typescript
// Partition coordinator
const processMergeQueueWithPartitions = async ({ mqRepository, deps }) => {
  const partitions = mqRepository.partitions || [];

  // Process each partition independently
  await Promise.all(partitions.map(async (partition) => {
    const partitionEntries = await filterEntriesByPartition(partition);

    if (partitionEntries.length > 0) {
      await processPartition({
        partition,
        entries: partitionEntries,
        concurrency: partition.concurrency || 3
      });
    }
  }));
};

// Smart partition routing
const determinePartition = (changedFiles: string[], partitionConfig) => {
  for (const [name, config] of Object.entries(partitionConfig)) {
    if (changedFiles.some(file => minimatch(file, config.pattern))) {
      return name;
    }
  }
  return 'default';
};
```
This enables true horizontal scaling—frontend changes don't wait for backend CI, database migrations don't block UI tweaks, and each partition can have different concurrency limits based on CI capacity.
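To make the routing concrete, here's what a partition configuration consumed by `determinePartition` might look like. The partition names, glob patterns, and concurrency values are hypothetical, chosen only to show the shape of the config:

```typescript
// Hypothetical partition config for determinePartition above (illustrative only)
const partitionConfig = {
  frontend: { pattern: 'web/**/*.{ts,tsx,css}', concurrency: 5 },
  backend: { pattern: 'server/**/*.ts', concurrency: 3 },
  migrations: { pattern: 'db/migrations/**', concurrency: 1 },
};

// A PR that only touches web/ files routes to the frontend partition
determinePartition(['web/src/profile/Avatar.tsx'], partitionConfig); // => 'frontend'
```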
The observability challenge
The final piece was deep observability. When a 20-stack batch fails after 45 minutes of CI, you need surgical precision to understand what happened:
```typescript
// Comprehensive failure tracking
const logBisectionProgress = ({
  stackJob,
  currentDepth,
  totalStacks,
  strategy,
  ciResults
}) => {
  splog.info({
    message: "Bisection progress",
    tags: {
      stackJobId: stackJob.id,
      strategy,
      depth: `${currentDepth}/${Math.ceil(Math.log2(totalStacks))}`,
      testedStacks: ciResults.length,
      failureRate: calculateFailureRate(ciResults),
      estimatedTimeRemaining: calculateRemainingTime(currentDepth, totalStacks)
    }
  });
};
```
We instrument everything: lock acquisition times, bisection depths, CI wait times, GitHub API latencies, timeout frequencies. This data feeds back into our algorithms—we actually tune timeout multipliers based on historical CI performance per repository.
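As one example of that feedback loop, a per-repository timeout multiplier could be derived from recent CI durations. This is a simplified sketch under the assumption that a high percentile of historical run time drives the multiplier; the helper name and constants are illustrative:

```typescript
// Illustrative sketch: derive a per-repo timeout multiplier from historical CI durations
const tuneTimeoutMultiplier = (recentCiDurationsMin: number[], baseTimeoutMin = 30): number => {
  if (recentCiDurationsMin.length === 0) return 1;

  // Use the 95th percentile so a few fast runs don't cause premature timeouts
  const sorted = [...recentCiDurationsMin].sort((a, b) => a - b);
  const p95 = sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];

  // Never shrink below 1x, and cap the multiplier to keep queue latency bounded
  return Math.min(3, Math.max(1, p95 / baseTimeoutMin));
};

// Example: a repo whose slower CI runs take ~50 minutes gets roughly a 1.67x multiplier
tuneTimeoutMultiplier([22, 31, 28, 45, 50, 38]); // ≈ 1.67
```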
The compound effect
What makes this system special isn't any single algorithm—it's how all these pieces work together. The locking prevents corruption, the timeouts prevent cascading failures, the bisection algorithms minimize CI waste, the partitioning enables horizontal scale, and the observability ties it all together.
The result is infrastructure that scales with team velocity, not against it. When Asana's engineering team doubles in size, their merge queue gets faster, not slower. When Shopify pushes 500 PRs in a day, their CI costs go down, not up.
That's the difference between building a tool and building infrastructure—infrastructure should make the hard things easy and the impossible things possible.
Why this matters for every engineering team
The stack-aware merge queue represents more than just a performance optimization—it's an enabling technology for modern development workflows that any team can adopt.
As engineering teams scale, the pressure to break work into smaller, more reviewable chunks intensifies. But traditional merge infrastructure penalizes this approach, creating a painful trade-off between code quality and developer velocity.
By making stacks first-class citizens in the merge process, teams can eliminate that trade-off. With Graphite Merge Queue, you can embrace smaller PRs, deeper stacks, and more granular code reviews without paying a merge-time penalty. This changes the fundamental economics of how you structure development work.
Consider the second-order effects:
- Better code reviews: Smaller PRs are easier to review thoroughly
- Faster feature delivery: Partial stack merges let you ship incrementally
- Reduced merge anxiety: Developers stop batching changes to avoid merge queue pain
- CI cost optimization: Smarter test execution saves real money at scale
The future of developer infrastructure
The merge queue is just one example of infrastructure that needs to evolve as development practices mature. The Git primitives we use today—branches, merges, rebases—were designed for a different era of software development. They work fine for small teams working on single features, but they break down at scale.
What we've learned building Graphite's merge queue is that the solution isn't to work around these limitations—it's to build infrastructure that understands the higher-level abstractions developers actually work with. Stacks, dependencies, atomic feature delivery, intelligent CI optimization.
The tools we build should make best practices easier, not harder. When merging a five-PR stack is as simple as merging a single PR, teams naturally gravitate toward better development practices.
That's the real win here. We didn't just make merges faster—we made better development workflows viable at scale. And that's the kind of infrastructure innovation that compounds over time, enabling engineering teams to ship higher-quality software more quickly.