
When Graphite and Braintrust, a leading AI evaluation and observability platform, started talking about working together, it was like holding up a mirror. Both companies began the same way: with founders who built internal tools so effective that they knew others would need them too.
“After building this set of internal tools myself at Figma and before that at my own startup, I realized that other people have the same problem,” recalls Ankur Goyal, CEO of Braintrust. Graphite’s origin was strikingly similar: the team had created a faster, cleaner way to handle code review and pull requests, first for themselves, then for others.
This shared foundation made collaboration natural. Both teams began using each other’s tools to address pain points and collaborated to improve one another’s products. In this case study, we’ll take a closer look at how Braintrust uses Graphite’s AI code review tool, Diamond, and how that feedback loop strengthens both companies.
How Braintrust leverages Diamond
For Braintrust, usability is key. AI code review is only valuable if developers can trust it and integrate it seamlessly into their workflows. “Diamond felt like the first product that met a certain level of usability criteria. Diamond now runs on all of our PRs and serves as a really good additional layer of review. It’s particularly useful in enforcing a baseline quality of code and detecting subtle bugs that are hard to find and test,” says Ankur.
As Braintrust scaled, Graphite became central to their code review process. With Graphite and Diamond, the team was able to:
Onboard new users quickly, thanks to a low learning curve.
Accelerate code reviews by making them more structured.
Increase velocity without sacrificing code quality.
With Graphite in place, developers could ship improvements to Braintrust’s observability platform faster and with greater confidence, without lowering their standards for rigor and quality.
Braintrust accepts nearly two-thirds (63%) of Diamond’s code review suggestions.
Diamond also changed the way Braintrust engineers approached reviews. “Diamond allows you to treat more code reviews as if they were the most important code review in the world,” Ankur explains. “Diamond now reads through PRs, pulling out all the fine-print details that are usually the most tedious to review.” And because Braintrust’s documentation lives in the same repo as their product code, those reviews extend to docs as well, which helps the team improve important user-facing content alongside their software.
Building better tooling, together
This partnership is mutually reinforcing: while Braintrust gains value from AI code reviews, the Graphite team uses Braintrust’s findings to build evaluation datasets and criteria. “Sometimes I’ll go out of my way to find examples while using Diamond where it’s not doing what it should be doing,” says Ankur. “That helps the Graphite team build a repository of inputs for eval datasets, and it also helps them build intuition about what they should be evaluating in the first place.”
Evaluations help make Diamond more accurate and reliable. For example, after running evaluations with Braintrust, the Graphite team saw a 5% drop in negative rules generated through Diamond’s custom rules feature. As Ankur sums it up: “This is how tech companies are using each other to make each other better, but also build a better product for their user base.”
Conclusion
By offloading the tedious, detail-oriented checks to Diamond, Braintrust’s developers can spend more time focusing on customer-facing challenges around AI reliability, rather than getting bogged down in code review inefficiencies. At the same time, Graphite benefits from Braintrust’s real-world insights, creating a feedback loop where both products improve. The result is a partnership that reflects the best of the developer tools ecosystem: two companies working together to build better products for every team that relies on them.