
Engineering teams have a merging problem. Not the kind you solve with better Git workflows or cleaner code reviews, but a fundamental infrastructure problem that gets worse as you scale. At Graphite, we solved it by building the first merge queue that understands stacked pull requests—and made it available to every engineering team. The results speak for themselves: 74% faster merge times at Ramp, 7 hours saved per engineer per week at Asana, and 21% more code shipped.

This is the story of how we built Graphite's Merge Queue, why existing solutions couldn't work, and the technical innovations that any team can now adopt.

Every fast-moving engineering team eventually hits the same wall. You hire more engineers, break work into smaller pull requests, and suddenly your main branch becomes a battleground. Merge conflicts multiply. CI runs stack up. Engineers spend more time waiting for their code to merge than writing it.

The standard solution is a merge queue—tools like GitHub's native queue, Mergify, or Bors that serialize merges to prevent conflicts. But here's the paradox: the moment your team adopts stacked development (breaking large features into chains of dependent PRs), these tools become your bottleneck instead of your salvation.

At companies like Asana, engineers were spending up to 40 minutes per stack per week just babysitting the merge process. Across 125 engineers, that's 83 hours of wasted developer time every single week. The math is brutal, and it only gets worse as you scale.

To understand why we needed to build something new, you need to understand what a stack really is. A stack isn't just a sequence of PRs—it's a dependency graph where each change builds on the previous one. Think of it like this:

Terminal
PR #1: Add user authentication endpoints
PR #2: Add user profile UI (depends on #1)
PR #3: Add profile picture upload (depends on #2)
PR #4: Add social login integration (depends on #1)

This creates a merge order constraint: you can't merge PR #2 until #1 is safely in main, and #3 can't merge until #2 is done. Traditional merge queues are fundamentally PR-centric—they treat each pull request as an independent unit, completely ignoring these dependencies.
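
To make the merge-order constraint concrete, here's a minimal sketch of a stack as a dependency graph, with a walk that produces a valid merge order. The types are hypothetical and not Graphite's actual data model:

typescript
// Hypothetical representation: each PR knows the PR it depends on (null = based on main)
type StackPR = { number: number; dependsOn: number | null };

// Returns PR numbers in an order where every PR comes after its parent
const mergeOrder = (prs: StackPR[]): number[] => {
  const byParent = new Map<number | null, StackPR[]>();
  for (const pr of prs) {
    const siblings = byParent.get(pr.dependsOn) ?? [];
    siblings.push(pr);
    byParent.set(pr.dependsOn, siblings);
  }

  // Breadth-first walk starting from the PRs based directly on main
  const order: number[] = [];
  const queue = [...(byParent.get(null) ?? [])];
  while (queue.length > 0) {
    const pr = queue.shift()!;
    order.push(pr.number);
    queue.push(...(byParent.get(pr.number) ?? []));
  }
  return order;
};

For the example stack above, this walk yields [1, 2, 4, 3]; any order in which a PR merges after its parent satisfies the constraint.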

Here's what happens when you try to merge a five-PR stack through a traditional queue:

  1. PR #1 enters the queue: Runs CI, merges successfully

  2. PR #2 enters the queue: Needs to rebase onto the new main (because #1 just merged), runs CI again

  3. PR #3 enters the queue: Needs to rebase onto #2's new state, runs CI again

  4. And so on...

Each PR requires a fresh rebase and CI run. If PR #3 fails CI, you have to evict it from the queue—but you also have to evict PRs #4 and #5 because they depend on #3. Then you need to rebase and re-run CI for everything downstream.

The result? A five-PR stack that should take 30 minutes to merge takes 3 hours, burns through CI resources like crazy, and requires constant manual intervention.

Our insight was simple: treat stacks as first-class citizens, not afterthoughts.

Instead of a PR-centric queue, we built a stack-aware system that understands dependency relationships. When you queue up a stack, Graphite's merge queue:

  1. Validates the entire stack as a unit: Rather than testing each PR independently, we run CI on the top-most PR (which contains all downstream changes)

  2. Merges atomically: If CI passes, all PRs in the stack fast-forward merge in sequence

  3. Handles failures intelligently: If something breaks, we evict only the failing PR and its dependents, leaving the rest of the queue intact

Here's the technical architecture that makes this possible:

typescript
// Stack-aware merge queue processor
const processStack = async (stack: Stack) => {
  // Build dependency graph
  const dependencyGraph = buildStackGraph(stack);

  // Run CI only on the stack head (contains all changes)
  const ciResult = await runSpeculativeCI(stack.head, {
    baseBranch: `gtmq_speculative_${stack.id}`,
    includedPRs: stack.allPRs
  });

  if (ciResult.success) {
    // Fast-forward merge all PRs in dependency order
    await mergeStackAtomically(stack);
  } else {
    // Bisect to find failing PR(s)
    const failingPRs = await bisectStack(stack.allPRs, ciResult);
    await evictPRsAndDependents(failingPRs);
  }
};

The key insight is in that runSpeculativeCI call. Instead of running CI on each PR individually, we create a speculative merge branch that contains all the changes, then run CI once. This cuts CI costs dramatically while giving us complete confidence in the merged result.
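
Conceptually, the speculative branch is just a temporary gtmq_ ref containing the optimistic base plus the whole stack. A rough sketch of how one might build it with plain git, as an illustration rather than Graphite's actual implementation:

typescript
import { execSync } from 'node:child_process';

// Hypothetical sketch: build a temporary branch containing the base plus
// every commit in the stack, so a single CI run validates the whole stack.
const createSpeculativeBranchSketch = (stackId: string, stackHeadRef: string, baseRef: string) => {
  const branch = `gtmq_speculative_${stackId}`;
  // Start the speculative branch from the (possibly optimistic) base
  execSync(`git checkout -B ${branch} ${baseRef}`);
  // Merging the stack head pulls in all downstack commits in one step
  execSync(`git merge --no-ff ${stackHeadRef}`);
  execSync(`git push origin ${branch} --force`);
  return branch;
};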

But stack-awareness alone wasn't enough for production-scale merge queues. When you're processing hundreds of PRs per day and CI takes 30-45 minutes, a single failure can back up your queue for hours.

Our solution: speculative execution with intelligent bisection.

When multiple stacks enter the queue, we don't wait for the first to complete before starting the second. Instead, we optimistically run CI on multiple stacks in parallel, using temporary branches with the gtmq_ prefix:

typescript
// Parallel speculative execution
const batchMerge = async (queuedStacks: Stack[]) => {
  const speculativeBranches = await Promise.all(
    queuedStacks.map(stack =>
      createSpeculativeBranch(stack, {
        assumeUpstreamSuccess: true,
        baseBranch: calculateOptimisticBase(stack)
      })
    )
  );

  const results = await runCIInParallel(speculativeBranches);

  // Handle successes and failures
  await Promise.all([
    ...results.successes.map(mergeStackAtomically),
    ...results.failures.map(handleFailureWithBisection)
  ]);
};

The magic happens in handleFailureWithBisection. When a batch of stacks fails CI, we don't just throw our hands up—we use a binary search algorithm to isolate the problematic changes:

typescript
const bisectStack = async (stack: PR[], failedCI: CIResult): Promise<PR[]> => {
  if (stack.length === 1) return stack;

  const midpoint = Math.floor(stack.length / 2);
  const bottomHalf = stack.slice(0, midpoint);
  const topHalf = stack.slice(midpoint);

  // Test bottom half first (dependencies are cleaner)
  const bottomResult = await runCI(bottomHalf);

  if (bottomResult.success) {
    // Problem is in the top half
    return bisectStack(topHalf, failedCI);
  } else {
    // Problem is in the bottom half
    return bisectStack(bottomHalf, failedCI);
  }
};

This bisection algorithm can isolate a failing PR in a 32-PR batch with just 5 CI runs instead of 32. The math matters when CI costs are measured in dollars per minute.

The results were immediate and dramatic. Here's what engineering teams saw after adopting Graphite's stack-aware merge queue:

Ramp Engineering: 74% decrease in median time between merges, with engineers merging PRs up to 3x faster. Their engineering velocity metrics showed a clear inflection point right after adoption.

Asana: Engineers saved 7 hours per week each—that's nearly a full workday returned to actual engineering. They shipped 21% more code in the same timeframe and, importantly, reported significantly higher job satisfaction around the merge process.

Shopify: Projected CI cost savings of 15-25% across their entire organization by eliminating redundant stack CI runs. At their scale, that translates to hundreds of thousands of dollars annually.

The performance gains come from three key optimizations:

  1. Reduced CI overhead: One CI run per stack instead of one per PR

  2. Parallel processing: Multiple stacks can be validated simultaneously

  3. Intelligent failure handling: Bisection minimizes wasted CI when problems occur

But the numbers only tell part of the story. The qualitative impact was equally significant—engineers stopped dreading the merge process and could focus on building instead of babysitting queues.

Building a production-ready stack-aware merge queue wasn't just about implementing algorithms—it was about solving the gnarly infrastructure problems that only surface when you're processing hundreds of PRs per day for teams like Shopify and Netflix. Here's the real story of how we built the infrastructure that powers Graphite Merge Queue, challenge by challenge.

The first major issue we hit was concurrent queue corruption. When multiple merge queue processors tried to work on the same repository simultaneously, they'd step on each other's work—corrupting stack state, creating duplicate CI runs, and leaving PRs stuck in limbo.

Our solution: distributed locking with timeout recovery. Every repository gets a lock with a configurable expiration time, and we built sophisticated cleanup logic to handle processor crashes:

typescript
// Repository-level locking
const locked = await mergeQueueDao.lockMqRepo({
  mqRepository,
  lockUntil: DateTime.now().plus({ seconds: iterationTimeSecs }),
});

if (locked) {
  // Process safely with exclusive access
  await mqProcessor({ mqRepository, deps });
  await mergeQueueDao.unlockMqRepo({ mqRepository, updateSuccess: true });
}

// Cleanup expired locks from crashed processors
const nRowsUnlocked = await mergeQueueDao.unlockExpiredLocks({});
if (nRowsUnlocked > 0) {
  splog.verbose({
    message: "Unlocked expired MQ repositories",
    tags: { nRowsUnlocked },
  });
}

But locking alone wasn't enough. We discovered that processors could still conflict when handling error recovery jobs. A failing stack might trigger multiple bisection processes, creating chaos. We had to implement stack-level locking within the repository lock:

typescript
// Stack-level locking for error handling
const createStackLock = async (stackPrNumbers: number[]) => {
  // Sort numerically so the same set of PRs always produces the same lock key
  const lockKey = `stack_${[...stackPrNumbers].sort((a, b) => a - b).join('_')}`;
  return await acquireLock(lockKey, { timeoutMs: 30000 });
};
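
The acquireLock helper is doing the real work here. A minimal sketch of a timeout-based lock, hypothetical and in-process only (a real deployment would back this with a shared store so separate processors can coordinate):

typescript
// Hypothetical in-process sketch of a timeout-based lock
const locks = new Map<string, number>(); // lockKey -> expiry timestamp (ms)

const acquireLock = async (lockKey: string, opts: { timeoutMs: number }): Promise<boolean> => {
  const now = Date.now();
  const expiry = locks.get(lockKey);
  if (expiry !== undefined && expiry > now) {
    return false; // another processor holds the lock and it hasn't expired yet
  }
  locks.set(lockKey, now + opts.timeoutMs);
  return true;
};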

As we scaled up batching, we hit another brutal problem: exponential timeout complexity. CI runs that took 10 minutes for single PRs were taking 45+ minutes for large batches, causing cascading failures across the entire queue.

The math was nasty. For a batch with N stacks, our timeout needed to account for:

  • Initial batch CI run: baseCiTime

  • Potential bisection depth: log2(N)

  • Flaky test retries: retryMultiplier

We built a dynamic timeout calculator that scales intelligently:

typescript
// Exponential timeout calculation
const calculateTimeout = (stackCount: number, strategy: 'bisect' | 'batch-sect') => {
  const baseTimeout = 30; // minutes
  const bisectionDepth = Math.ceil(Math.log2(stackCount)) + 2;

  if (strategy === 'batch-sect') {
    // N-sect creates N draft PRs simultaneously
    return baseTimeout * stackCount;
  } else {
    // Bisect creates log(N) draft PRs sequentially
    return baseTimeout * bisectionDepth;
  }
};

// Timeout handling with proper cleanup
const handleTimeout = async ({ args, deps }) => {
  const { stackJobId, createdAt, timeoutMin } = args;
  const elapsed = DateTime.now().diff(DateTime.fromJSDate(createdAt), 'minutes');

  if (elapsed.minutes > timeoutMin) {
    await markJobAsCancelled({ stackJobId, reason: 'timeout' });
    await cleanupDraftPRs({ stackJobId });
    return { complete: true, timedOut: true };
  }
};

Speculative execution requires creating temporary "draft PRs" to test combinations of stacks. Simple in theory, brutal in practice. We discovered that GitHub's API has subtle race conditions when creating/deleting branches rapidly, leading to phantom branches and stale CI runs.

The breakthrough came when we realized we needed to treat draft PRs as first-class infrastructure, not just temporary artifacts:

typescript
// Draft PR with SHA tracking
const createDraftPrWithSha = async ({
  stackJob,
  targetSha,
  baseBranch = 'main'
}) => {
  // Fetch the exact SHA before creating branch to avoid race conditions
  await githubClient.git.fetchRef({
    owner,
    repo,
    ref: targetSha
  });

  const draftBranch = `gtmq_${stackJob.id}_${Date.now()}`;
  await githubClient.git.createRef({
    owner,
    repo,
    ref: `refs/heads/${draftBranch}`,
    sha: targetSha
  });

  return createDraftPR({
    head: draftBranch,
    base: baseBranch,
    title: `[GTMQ] Testing stack ${stackJob.id}`,
    body: generateStackTestingBody(stackJob)
  });
};

We also had to build intelligent cleanup logic because draft PRs could get orphaned when jobs failed or timed out:

typescript
// Cleanup orphaned draft PRs
const cleanupOrphanedDrafts = async () => {
  const staleDrafts = await findDraftPRsOlderThan({ hours: 2 });

  await Promise.all(staleDrafts.map(async (draft) => {
    await githubClient.pulls.update({
      owner,
      repo,
      pull_number: draft.number,
      state: 'closed'
    });
    await githubClient.git.deleteRef({
      owner,
      repo,
      ref: `heads/${draft.head.ref}`
    });
  }));
};

Our original bisection was naive—just binary search through the stack list. But real stacks have dependency constraints. You can't test PRs #3-5 if PR #2 is broken, because #3 depends on #2.

We had to build a topology-aware bisection that respects dependency ordering:

typescript
// Smart bisection respecting dependencies
const getDraftPrToCreateAndSafeStacksBisect = ({
  stacks,
  latestDraftPrMetadata,
  ciSummary
}) => {
  if (stacks.length === 1) {
    return { stackToBisect: stacks[0], potentialSafeStacks: [] };
  }

  // Find a midpoint that doesn't break dependencies
  const midpoint = findValidBisectionPoint(stacks);
  const bottomHalf = stacks.slice(0, midpoint);
  const topHalf = stacks.slice(midpoint);

  // Test the bottom half first (cleaner dependency chains)
  return {
    stackToBisect: bottomHalf,
    potentialSafeStacks: topHalf
  };
};
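
The findValidBisectionPoint helper is the interesting part. One plausible sketch, assuming each queue entry is tagged with the stack it belongs to, is to only split on stack boundaries so that a chain of dependent PRs never straddles the two halves:

typescript
// Hypothetical sketch: pick the stack boundary closest to the naive midpoint,
// so dependent PRs from the same stack always stay in the same half.
const findValidBisectionPoint = (entries: { prNumber: number; stackId: string }[]): number => {
  const naiveMid = Math.floor(entries.length / 2);

  // Valid split points are the indices where the stack changes
  const boundaries: number[] = [];
  for (let i = 1; i < entries.length; i++) {
    if (entries[i].stackId !== entries[i - 1].stackId) boundaries.push(i);
  }

  // Single stack: fall back to the naive split
  if (boundaries.length === 0) return naiveMid;

  // Choose the boundary closest to the naive midpoint
  return boundaries.reduce((best, b) =>
    Math.abs(b - naiveMid) < Math.abs(best - naiveMid) ? b : best
  );
};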

But even that wasn't enough. We discovered that flaky tests were causing false positives in bisection, leading to good PRs getting evicted. So we added confidence scoring:

typescript
// Confidence-based eviction to handle flaky tests
const shouldEvictWithConfidence = (ciResults: CiResult[]) => {
  // true = that CI run failed
  const failurePattern = ciResults.map(r => !r.success);
  const consecutiveFailures = getConsecutiveFailures(failurePattern);

  // Require multiple consistent failures before evicting
  return consecutiveFailures >= 2;
};
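
getConsecutiveFailures is a small helper; a plausible version simply counts the trailing run of failures in the CI history:

typescript
// Hypothetical helper: length of the trailing run of failures
// (failurePattern is an array of booleans, true = that CI run failed)
const getConsecutiveFailures = (failurePattern: boolean[]): number => {
  let count = 0;
  for (let i = failurePattern.length - 1; i >= 0 && failurePattern[i]; i--) {
    count++;
  }
  return count;
};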

The biggest architectural breakthrough came when we realized that different failure scenarios need different strategies:

N-Sect for Fast Isolation: When you have 8 stacks and suspect multiple might be broken, create 8 draft PRs simultaneously and test them all in parallel. Expensive but fast.

Bisect for Deep Debugging: When you have 32 stacks and expect at most one failure, use binary search to minimize CI costs.

typescript
// Strategy selection logic
const selectFailureStrategy = (stacks: Stack[], ciHistory: CiResult[]) => {
  const stackCount = stacks.length;
  const historicalFailureRate = calculateFailureRate(ciHistory);

  if (stackCount <= 8 || historicalFailureRate > 0.3) {
    return { strategy: 'batch-sect', nsectN: BATCH_SIZE_N_SECT };
  } else {
    return { strategy: 'bisect', nsectN: 2 };
  }
};

Even with all these optimizations, we hit the monorepo scaling wall. A single merge queue processing 200+ PRs/day becomes a bottleneck no matter how smart your algorithms are.

The solution: partitioned queues that split repositories by file patterns:

typescript
import { minimatch } from 'minimatch';

// Partition coordinator
const processMergeQueueWithPartitions = async ({ mqRepository, deps }) => {
  const partitions = mqRepository.partitions || [];

  // Process each partition independently
  await Promise.all(partitions.map(async (partition) => {
    const partitionEntries = await filterEntriesByPartition(partition);
    if (partitionEntries.length > 0) {
      await processPartition({
        partition,
        entries: partitionEntries,
        concurrency: partition.concurrency || 3
      });
    }
  }));
};

// Smart partition routing
const determinePartition = (changedFiles: string[], partitionConfig) => {
  for (const [name, config] of Object.entries(partitionConfig)) {
    if (changedFiles.some(file => minimatch(file, config.pattern))) {
      return name;
    }
  }
  return 'default';
};

This enables true horizontal scaling—frontend changes don't wait for backend CI, database migrations don't block UI tweaks, and each partition can have different concurrency limits based on CI capacity.
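
As an illustration, a hypothetical partition configuration (not Graphite's actual schema) might map file globs to independent queues, with the determinePartition routing above picking the first matching partition:

typescript
// Hypothetical partition configuration: file globs routed to independent queues,
// each with its own concurrency limit
const partitionConfig = {
  frontend: { pattern: 'web/**', concurrency: 5 },
  backend:  { pattern: 'services/**', concurrency: 3 },
  database: { pattern: 'migrations/**', concurrency: 1 },
};

// A PR touching only web/ files routes to the frontend partition
determinePartition(['web/src/App.tsx', 'web/src/index.css'], partitionConfig); // -> 'frontend'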

The final piece was deep observability. When a 20-stack batch fails after 45 minutes of CI, you need surgical precision to understand what happened:

typescript
// Comprehensive failure tracking
const logBisectionProgress = ({
  stackJob,
  currentDepth,
  totalStacks,
  strategy,
  ciResults
}) => {
  splog.info({
    message: "Bisection progress",
    tags: {
      stackJobId: stackJob.id,
      strategy,
      depth: `${currentDepth}/${Math.ceil(Math.log2(totalStacks))}`,
      testedStacks: ciResults.length,
      failureRate: calculateFailureRate(ciResults),
      estimatedTimeRemaining: calculateRemainingTime(currentDepth, totalStacks)
    }
  });
};

We instrument everything: lock acquisition times, bisection depths, CI wait times, GitHub API latencies, timeout frequencies. This data feeds back into our algorithms—we actually tune timeout multipliers based on historical CI performance per repository.
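
That per-repository tuning can be as simple as scaling the base timeout by a high percentile of recent CI durations. A hedged sketch:

typescript
// Hypothetical sketch: derive a timeout multiplier from recent CI durations,
// using the p95 relative to the configured base timeout, clamped to a sane range
const tuneTimeoutMultiplier = (recentCiMinutes: number[], baseTimeoutMin: number): number => {
  if (recentCiMinutes.length === 0) return 1;
  const sorted = [...recentCiMinutes].sort((a, b) => a - b);
  const p95 = sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
  return Math.min(3, Math.max(1, p95 / baseTimeoutMin));
};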

What makes this system special isn't any single algorithm—it's how all these pieces work together. The locking prevents corruption, the timeouts prevent cascading failures, the bisection algorithms minimize CI waste, the partitioning enables horizontal scale, and the observability ties it all together.

The result is infrastructure that scales with team velocity, not against it. When Asana's engineering team doubles in size, their merge queue gets faster, not slower. When Shopify pushes 500 PRs in a day, their CI costs go down, not up.

That's the difference between building a tool and building infrastructure—infrastructure should make the hard things easy and the impossible things possible.

The stack-aware merge queue represents more than just a performance optimization—it's an enabling technology for modern development workflows that any team can adopt.

As engineering teams scale, the pressure to break work into smaller, more reviewable chunks intensifies. But traditional merge infrastructure penalizes this approach, creating a painful trade-off between code quality and developer velocity.

By making stacks first-class citizens in the merge process, teams can eliminate that trade-off. With Graphite Merge Queue, you can embrace smaller PRs, deeper stacks, and more granular code reviews without paying a merge-time penalty. This changes the fundamental economics of how you structure development work.

Consider the second-order effects:

- Better code reviews: Smaller PRs are easier to review thoroughly

- Faster feature delivery: Partial stack merges let you ship incrementally

- Reduced merge anxiety: Developers stop batching changes to avoid merge queue pain

- CI cost optimization: Smarter test execution saves real money at scale

The merge queue is just one example of infrastructure that needs to evolve as development practices mature. The Git primitives we use today—branches, merges, rebases—were designed for a different era of software development. They work fine for small teams working on single features, but they break down at scale.

What we've learned building Graphite's merge queue is that the solution isn't to work around these limitations—it's to build infrastructure that understands the higher-level abstractions developers actually work with. Stacks, dependencies, atomic feature delivery, intelligent CI optimization.

The tools we build should make best practices easier, not harder. When merging a five-PR stack is as simple as merging a single PR, teams naturally gravitate toward better development practices.

That's the real win here. We didn't just make merges faster—we made better development workflows viable at scale. And that's the kind of infrastructure innovation that compounds over time, enabling engineering teams to ship higher-quality software more quickly.

Built for the world's fastest engineering teams, now available for everyone