
Most engineers are familiar with creating branches and making commits in Git. Despite being notoriously unintuitive, Git has become universal in software engineering. But did you know that you can use Git to store more than just snapshots of code?

There is already a long history of tools hacking small amounts of metadata into Git. For example, the open-source code review tool Gerrit ingests pull requests through `git push` by allowing the user to encode data in the name of the remote ref. The command `git push gerrit HEAD:refs/for/master` would open a pull request against `master` on Gerrit rather than writing a new ref.

Another common form of metadata hacking is the practice of adding unique IDs to commits. Maintaining the association between a proposed code change and a specific commit can be hard because a Git commit ID changes whenever the commit is amended or rebased. In response, both Gerrit and another code review tool, Phabricator, use the commit message as a metadata store. Using a commit hook or an alternative source control CLI, they add a unique ID to each commit message, and that ID stays stable across rebases and amendments.
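To make the commit-message approach concrete, here is a minimal sketch that pulls a Gerrit-style `Change-Id:` trailer out of a commit message. The helper name `extractChangeId` and the sample message are made up for this example; the trailer shape (`I` followed by 40 hex characters) is the one Gerrit's commit hook writes.

```javascript
// Hypothetical helper: find a Gerrit-style Change-Id trailer in a commit message.
function extractChangeId(commitMessage) {
  // Match a whole line of the form "Change-Id: I<40 hex chars>".
  const match = commitMessage.match(/^Change-Id: (I[0-9a-f]{40})\s*$/m);
  return match ? match[1] : null;
}

// Sample commit message, invented for illustration.
const message = [
  "fix(cycles): disallow meta parent cycles",
  "",
  "Change-Id: I0123456789abcdef0123456789abcdef01234567",
].join("\n");

console.log(extractChangeId(message));
```

Because the ID lives in the message rather than in the commit hash, it survives an amend or rebase as long as the message is carried over.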

At Graphite, our CLI has a different form of metadata we need to track. To create stacks of branches, the tool needs to map branches to their parents. Storing a reference to the name of a parent branch in a commit message wouldn't work because no one commit is stable over the life of a branch.
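As a rough illustration of the branch-to-parent mapping, the sketch below rebuilds a stack by following parent pointers. The field name `parentBranchName`, the branch names, and the function `stackFor` are assumptions made for this example, not Graphite's actual schema or code.

```javascript
// Hypothetical metadata: each branch name maps to a record naming its parent.
const metadataByBranch = new Map([
  ["feature-a", { parentBranchName: "main" }],
  ["feature-b", { parentBranchName: "feature-a" }],
  ["feature-c", { parentBranchName: "feature-b" }],
]);

// Walk parent pointers from a branch down to the trunk and return the stack
// ordered from trunk to tip. Throws if the parent pointers form a cycle.
function stackFor(branchName, metadata) {
  const stack = [branchName];
  const seen = new Set(stack);
  let current = branchName;
  while (metadata.has(current)) {
    const parent = metadata.get(current).parentBranchName;
    if (seen.has(parent)) {
      throw new Error(`cycle detected at ${parent}`);
    }
    seen.add(parent);
    stack.push(parent);
    current = parent;
  }
  return stack.reverse();
}

console.log(stackFor("feature-c", metadataByBranch));
// [ 'main', 'feature-a', 'feature-b', 'feature-c' ]
```

The mapping is keyed on branch names rather than commits, which is why it has to live somewhere that outlasts any single commit on the branch.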

After investigating various mechanisms, we landed on using Git's object database directly to store branch metadata. The command [git hash-object](https://git-scm.com/docs/git-hash-object), run with the `-w` flag, writes any string to Git's object database and returns an ID. A second command, [git update-ref](https://git-scm.com/docs/git-update-ref), creates or updates any ref to point to the stored object by its ID. Together, they gave us a dead simple mechanism for storing JSON blobs in Git's native database:

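```javascript
const { execSync } = require("child_process");

// Write the JSON blob into Git's object database (-w stores it, --stdin reads it from input).
const objectId = execSync(`git hash-object -w --stdin`, {
  input: JSON.stringify(metadata),
}).toString().trim(); // drop the trailing newline from git's output
// Point a ref named after the branch at the stored blob.
execSync(`git update-ref refs/branch-metadata/${branchName} ${objectId}`, {
  stdio: "ignore",
});
```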
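The ID that `git hash-object` returns is not arbitrary. For a blob in a repository using Git's default SHA-1 object format, it is the SHA-1 of a short header (`blob <byte length>` plus a NUL byte) followed by the content. The sketch below reproduces that calculation in Node without touching a repository; the function name `blobObjectId` is made up for this example.

```javascript
const { createHash } = require("node:crypto");

// Compute the ID git would assign to a blob with this content
// (assumes the repository uses the default SHA-1 object format).
function blobObjectId(content) {
  const body = Buffer.from(content, "utf8");
  // Git hashes a header of the form "blob <byte length>\0" followed by the content.
  const header = Buffer.from(`blob ${body.length}\0`, "utf8");
  return createHash("sha1").update(header).update(body).digest("hex");
}

console.log(blobObjectId("hello world\n"));
// 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Since the ID depends only on the content, writing identical metadata twice yields the same object rather than a duplicate.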

Storing data is of no use if we can't read it back. Luckily, the read operation is even easier using [git cat-file](https://git-scm.com/docs/git-cat-file):

```javascript
// Print the blob that the metadata ref points to; git's error output is discarded.
const metadata = execSync(
  `git cat-file -p refs/branch-metadata/${branchName} 2> /dev/null`
).toString();
```
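One thing to watch when reading: `git cat-file` exits with a non-zero status when the ref does not exist, which makes `execSync` throw, and the stored string still has to be parsed as JSON. A minimal sketch of a more defensive reader follows. The names `readBranchMetadata`, `gitCatFile`, and the injectable `catFile` parameter are hypothetical and introduced only for this example.

```javascript
const { execFileSync } = require("node:child_process");

// Default reader: ask git for the blob behind a ref, discarding git's stderr.
function gitCatFile(ref) {
  return execFileSync("git", ["cat-file", "-p", ref], {
    stdio: ["ignore", "pipe", "ignore"],
  }).toString();
}

// Return the parsed metadata for a branch, or undefined when the ref is
// missing or its content is not valid JSON. `catFile` can be swapped out
// in tests so no repository is needed.
function readBranchMetadata(branchName, catFile = gitCatFile) {
  try {
    return JSON.parse(catFile(`refs/branch-metadata/${branchName}`));
  } catch {
    return undefined;
  }
}
```

Passing the branch name as an argument to `execFileSync` instead of interpolating it into a shell string also avoids quoting problems with unusual branch names.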

With these two code blocks, we have everything necessary to read and write any data to Git's object database. The advantages of this approach are plentiful:

  • The metadata refs are plainly visible to users by running `ls .git/refs/branch-metadata`.

  • The data can be inspected, modified, and removed using native git commands.

  • The refs can be pushed to and pulled from remote repositories with an explicit refspec, allowing easy syncing.
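The points above can be sketched with plain git commands. Refs outside `refs/heads` and `refs/tags` are not transferred by a default `git push` or `git fetch`, so the sync commands name the metadata refs explicitly, and `git for-each-ref` also sees refs that Git has moved into `.git/packed-refs`. The helper names below are made up for this example; the git subcommands and flags themselves are standard.

```javascript
const { execFileSync } = require("node:child_process");

const METADATA_PREFIX = "refs/branch-metadata";

// Argument lists for native git commands that operate on the metadata refs.
const listArgs = () => ["for-each-ref", "--format=%(refname)", METADATA_PREFIX];
const deleteArgs = (branchName) => ["update-ref", "-d", `${METADATA_PREFIX}/${branchName}`];
const pushArgs = (remote) => ["push", remote, `${METADATA_PREFIX}/*:${METADATA_PREFIX}/*`];
const fetchArgs = (remote) => ["fetch", remote, `${METADATA_PREFIX}/*:${METADATA_PREFIX}/*`];

// Run one of the commands above in the current repository and return its output.
function git(args) {
  return execFileSync("git", args).toString().trim();
}

// Example usage inside a repository (not executed here):
//   git(listArgs());           // list every metadata ref
//   git(pushArgs("origin"));   // copy the metadata refs to the remote
```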

Graphite simply stores small JSON blobs keyed on branch names, but this approach could be used to store any data under any keys while remaining accessible to a tool as common as Git. For example, Graphite has already started caching open PR statuses through git hash-object. By asynchronously fetching and storing PR information, Graphite is able to print elegant log outputs like:

```
․ ◯ gf--fix_cycles_disallow_meta_parent_cycl PR #238 (Approved)
․ │ fix(cycles): disallow meta parent cycles
․ │ 68 minutes ago
․ │ * f0e3e7 - fix(cycles): disallow meta parent cycles
․ │
◌──┘
```

You can read Graphite's full implementation of metadata handling here.

This piece originally appeared on dev.to
