
AI is buzzy and powerful right now. Look at how every company seems to be adding it to their H1s.

Much of this is just marketing fluff hyping a handful of small AI features. That being said, not all of these features are vaporware, and while many might be bad, some are… actually good. Given how many engineers are tinkering and productionizing LLM features lately, I’m curious - what makes an AI feature good vs. bad?


Note

Greg spends full workdays writing weekly deep dives on engineering practices and dev-tools. This is made possible because these articles help get the word out about Graphite. If you like this post, try Graphite today, and start shipping 30% faster!


I got some ideas last week when I attended an OpenAI developer day. There, I sat with other startup founders and developers, listening to presentations by OpenAI engineers on how to build on top of their APIs. During the day, there was a 30-minute session where folks had a chance to raise their hand and ask Sam Altman anything. One particular question was asked multiple times in different ways: “Will OpenAI launch a new product that will kill my startup in the future?” The response was friendly but firm - you’ll be fine as long as you:

  1. don’t compete with ChatGPT

  2. don’t try to build the best LLM

  3. and don’t just try to fix a small problem with existing LLM services

The advice is straightforward, and it is a clear warning to anyone trying to build a company whose only value prop is the equivalent of a caching layer on GPT-4’s API or an alternative GUI to ChatGPT.

(Caveat - this isn't to say one shouldn’t compete with OpenAI, but it is a valid response to those worried about getting steamrolled).

Based on this advice, if those are all bad AI features, what makes a good one? It’s not a ChatGPT skin or clone. It isn’t the addition of a model trying to be cheaper or smarter than GPT-3 or GPT-4. And it certainly isn’t some feature that improves the ergonomics of a model’s existing API.

That’s a lot of “what's not a good feature” - but from these negatives we can try to distill positive traits. OpenAI’s presence in the market (along with Anthropic and others) helps to establish some baseline constraints:

First, we can predict that LLMs will get asymptotically cheaper, smarter, and easier to integrate. Therefore, in the near future, any product that benefits from generative AI will integrate it. LLM-backed features may become as common as CSS on websites, and, like CSS, they might become necessary for your site to stay competitive.

“Now, though, AI is here — in fact, it’s everywhere, no matter the underlying technology.” - Ben Thompson

Secondly, ChatGPT (or some equivalent) will continue existing and improving. Therefore, products should assume that motivated users are always one new tab away from a generic input-output oracle. If you either don’t have generative features baked directly into your application, or your features aren’t good enough, the AI-loving user will quickly default to the next best option.

“there is a user experience issue when it comes to AI… and basically any friction in that experience matters way more” - Nat Friedman

These two constraints together ensure that AI outside your product is plentiful, powerful, and alluring to users. Therefore, any good AI feature must be better for one or both of the following reasons:

  • Lowest friction

  • Queries unique data

GitHub’s Copilot product is a great example of a tool that wins by offering the lowest-friction AI responses. As a user, I could easily copy and paste my code into a ChatGPT-like tool and instruct the AI to complete the function I was typing. If, however, like Copilot, you present me the generated response directly, without me having to switch contexts, you’ll win my usage. Superhuman’s use of AI to draft responses to emails is another great example of lowest-friction access. Seamless access to generative AI makes a common ability great because it’s cheaper (effort-wise) than any alternative.

Additionally, an AI feature can be great if it operates over data that only that product has access to. Notion’s AI search is a perfect example of this fit. Many tools use generative AI to answer questions about documents, but only Notion has access to all your Notion pages. You can’t export all your documents and paste them into ChatGPT, and a competing AI document editor hasn’t built up the dataset that Notion has. The unique data access makes the feature great because it’s the only way to use generative AI in this case.

Seamless access and unique data - what do these principles mean for someone looking to create an AI feature?

Firstly, it means that your product needs to create value for users without AI. In order to offer frictionless AI, there needs to be an existing user flow off which to trigger. A user already chooses Canva to create posters before Canva suggests AI improvements. A coder is already typing in an IDE before generative autocomplete surfaces for them. If the user doesn't have a reason to use a product outside of AI itself, then the product is likely just a skin on ChatGPT, destined for painful competition.

Secondly, these principles mean that a good AI feature should aim to leverage unique data from your application. To do that, the product needs to create unique data in the first place. AI without unique data is like adding an LLM to a calculator app - your only shot at it being a good feature is hoping that it’s lower friction than alternatives. But if you can accumulate a dataset that your user cares about, then you can generate great summaries, translations, search results, pictures, and more from that data. Day One’s journaling app (no mention of AI yet) could create a feature that lets you chat with former versions of yourself. Even if it took a few clicks to access (and a cavalier sense of trust), it would be the only way to use AI to talk to your past diary entries.

I spend all my time building Graphite—a better code change stack—and thinking about how we can build AI features that aren’t terrible. In the past, we’ve even shipped a few. In building those features, how have I applied these principles myself?

First, I created AI-generated descriptions for pull requests. This is a medium-good feature based on these principles. The generation can be triggered with a single button on the PR, making it slightly lower friction than copy-and-pasting code into ChatGPT. It also uses recursive summarization, which handles diffs that would otherwise exceed ChatGPT’s input size. But it could be better - a lower-friction feature would ghost a suggested description at PR creation time, serving as a viable default that the user could extend or delete.
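Since recursive summarization is doing the heavy lifting for large diffs, here is a minimal sketch of the general idea in Python. This is not Graphite’s actual implementation, which isn’t shown in this post. The function names, the character-based size limit, and the `summarize` callable (a stand-in for whatever LLM call you use) are all assumptions made for illustration.

```python
from typing import Callable, List


def chunk_text(text: str, max_chars: int) -> List[str]:
    """Split text into chunks of at most max_chars, preferring line boundaries."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be hard-split.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        # Start a new chunk when adding this line would overflow the current one.
        if len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def recursive_summarize(
    text: str,
    summarize: Callable[[str], str],  # hypothetical stand-in for an LLM call
    max_chars: int,
) -> str:
    """Summarize text of any length using a summarizer with a bounded input size."""
    # Base case: the text fits in a single call.
    if len(text) <= max_chars:
        return summarize(text)
    # Otherwise summarize each chunk, then summarize the joined summaries.
    partials = [summarize(chunk) for chunk in chunk_text(text, max_chars)]
    combined = "\n".join(partials)
    # Guard against a summarizer that does not actually shrink its input.
    if len(combined) >= len(text):
        raise ValueError("summaries are not shrinking the input")
    return recursive_summarize(combined, summarize, max_chars)
```

A real version would count tokens rather than characters and would probably split a diff per file instead of per line, but the shape is the same: summarize the pieces, then summarize the summaries until everything fits in one call.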

When it comes to data uniqueness, however, AI-generated PR descriptions are also only “okay”. Graphite as a product works hard to sync every PR instantaneously to GitHub, and vice versa. Therefore, it’s not the only tool with access to the code diff - GitHub has the same data. Moreover, GitHub is a generous platform that grants controlled access to code diffs to many other applications. While the input diff can’t be found publicly, there’s nothing stopping GitHub from creating an equal or better description generator.

I’d rate this feature 3/5 stars for low-friction, and 2/5 stars for unique data.

A similar but better AI feature we've built is “generated branch names.” Already, tens of thousands of developers use Graphite to create local git branches. Traditionally, they type gt create -m "feat(server): new endpoint" to make a neatly named branch. But now, we allow users to run gt create --ai and have GPT-4 select a branch title for them based on the diff, as well as old user-typed branch titles. This feature was partly inspired by my friend’s open-source project “gptcommit.”
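To make the idea concrete, here is a rough Python sketch of how a diff and a few earlier branch names could be turned into a prompt, and how the model’s reply could be cleaned up into something branch-shaped. This is not how gt create --ai is implemented internally. The prompt wording, the function names, and the `complete` callable (a stand-in for the model call) are all hypothetical.

```python
import re
import subprocess
from typing import Callable, List


def staged_diff() -> str:
    """Return the staged diff of the current repository (requires git on PATH)."""
    result = subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    )
    return result.stdout


def build_prompt(diff: str, previous_names: List[str], max_diff_chars: int = 4000) -> str:
    """Assemble a prompt from the diff and the user's most recent branch names."""
    examples = "\n".join(f"- {name}" for name in previous_names[-5:])
    return (
        "Suggest one short git branch name for the diff below.\n"
        "Match the style of these earlier branch names:\n"
        f"{examples}\n\n"
        f"Diff:\n{diff[:max_diff_chars]}\n"
    )


def sanitize_branch_name(raw: str, max_len: int = 50) -> str:
    """Reduce free-form model output to a conservative character set.

    This is a simplification, not a full implementation of git's ref-name rules.
    """
    name = raw.strip().lower()
    name = re.sub(r"[^a-z0-9/_-]+", "-", name)  # replace anything unusual with "-"
    name = re.sub(r"-{2,}", "-", name)          # collapse runs of dashes
    name = re.sub(r"/{2,}", "/", name)          # collapse runs of slashes
    name = name.strip("-/")
    return name[:max_len].rstrip("-/") or "unnamed-branch"


def suggest_branch_name(
    diff: str,
    previous_names: List[str],
    complete: Callable[[str], str],  # hypothetical stand-in for the LLM call
) -> str:
    """Build the prompt, ask the model, and sanitize whatever comes back."""
    return sanitize_branch_name(complete(build_prompt(diff, previous_names)))
```

In practice you would pass `staged_diff()` as the diff and wire `complete` to a real model. Keeping the model call behind a plain callable also makes the prompt building and sanitizing easy to test without any network access.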

Why is this a better feature? For one, it’s lower friction. The user types fewer characters and thinks less about a name that hardly matters. It’s not only easier than using ChatGPT, but it’s also easier than not using AI at all. Secondly, the feature leverages data that (roughly) only Graphite has access to - in this case, your local diff that you’re committing to a branch. At the moment of branch creation, GitHub doesn’t have access to the diff, nor does Git. The closest other service with access to the diff is VS Code and the plugins running on it - but Graphite runs there as well, along with our title generation.

5/5 stars for low-friction, and 3/5 stars for unique data.

What could change that might cause these principles to evolve? First, AI model access might become much more expensive. Based on the industry trends so far, though, I doubt it.

Secondly, Sam Altman might be proven wrong, and custom per-application models might become a critical variable - though if that were true, unique data would only become more important.

Thirdly, LLMs could develop further into a platform, as OpenAI hopes ChatGPT will with its GPT Store. The concept of building an AI feature into a product would be flipped on its head, where applications are instead built into AI rather than the other way around. If this came to pass, friction to trigger AI would stop mattering.

Regardless of what the future holds, it’s a fun time to be experimenting with AI features. If you want to read more about how Graphite has investigated AI features, check out our earlier blog post on the rest of our AI experiments.
