
On April 3rd, 2005, Linus Torvalds cut a Linux kernel release candidate, 2.6.12-rc2. The release candidate itself wasn’t too interesting (in Linus’s words, “The diffstat output tells the story: this is a lot of very small changes, ie tons of small cleanups and bug fixes.”), but it would become significant for what it marked: the last non-Git release of the Linux kernel.

For the first ten years of development, prior to its usage of BitKeeper, the Linux kernel’s version control tool of choice had just been Linus himself. The system worked like this: developers would submit tarballs and patches to a handful of Linus’s trusted lieutenants. After vetting, the patches that passed review would then be sent up to Linus. Finally, Linus himself would incorporate them manually into his own source tree, and then cut the release.

Of course, Linus-as-a-version-control-service was far from a perfect product. In 1998, when Larry McVoy first sketched out the idea behind BitKeeper on the Linux Kernel Mailing List, he wrote, “It's clear that our fearless leader [Linus] is, at the moment, a bit overloaded so patches may be getting lost.”

While this manual workflow seems barbaric to us today, at the time, Linus viewed it as superior to the alternatives, namely CVS. Much later, when Linus gave a talk at Google in 2007 about Git, he mentioned one of his core design principles: “WWCVSND” or “What Would CVS Not Do?” Of course this hatred naturally extended to SVN as well; in the same talk, he’d go on to say with a smile, “If there are any Subversion users in the audience, you might want to leave. My hatred of CVS has meant that I see Subversion as being the most pointless project ever started. The whole slogan for Subversion for a while was CVS done right or something like that. And if you start with that slogan, there’s nowhere you can go. It’s like, there’s no way to do CVS right.”

The heart of Linus’s criticism of CVS was its centralized nature. Given the hundreds of Linux developers out there, Linus felt it was critical that each of them have their own discrete copy of the repository on which they could develop their own branches. This both eased offline work and helped with internal politics: each developer was free to commit whatever they’d like to their own repository, and then would have the opportunity to convince the community that their changes were valuable. This prevented a single set of contributors with commit access from gatekeeping the sole repository.

BitKeeper stood in sharp contrast to CVS. In the aforementioned 1998 pitch for BitKeeper, Larry McVoy sketched out a system that, while reminiscent of how we think about source control today, was radically different for the time. McVoy wrote:

The mechanism which allows all this to happen is a distributed source management system.

The main features of the system are:

- everybody gets a repository (contrast against the one repository model of CVS)
- changes can be mailed around as "super-patches", also know as change sets. A change set is just a patch file that contains
    * All the changes broken up into one revision at a time
    * An identifier that shows where the patch should be applied in the tree (patches will fail if you aren't as up to date as the sender of the patch)
    * All the revision history for the changes
    * Metadata such as pathname changes, symbolic tags (like alpha2 or linux-2.1.133), etc.
- a new concept called a line of development (LOD). It's logically a branch but it doesn't need to be on a branch. Patches can (and will) be their own LOD. You can perform operations on a LOD like "apply this to the trunk".
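To make the change set idea in McVoy's pitch a little more concrete, here is a minimal Python sketch. It is not BitKeeper's actual format or code: the names `Revision`, `ChangeSet`, `Repository`, and `apply` are hypothetical, and the identifiers are plain strings invented for the example. It only illustrates two points from the list above: every developer holds a full repository, and a change set refuses to apply if the receiver has never seen the state it was built on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Revision:
    """One step of history: a single file change plus its check-in comment."""
    path: str
    new_content: str
    comment: str


@dataclass
class ChangeSet:
    """Hypothetical 'super-patch': revisions plus the metadata needed to place them."""
    changeset_id: str            # identifier for the state this change set produces
    base_id: str                 # state the sender started from
    revisions: List[Revision]    # changes broken up one revision at a time
    tags: List[str] = field(default_factory=list)  # symbolic tags such as "alpha2"


@dataclass
class Repository:
    """Every developer holds one of these; there is no central copy."""
    known_ids: Set[str] = field(default_factory=lambda: {"root"})
    files: Dict[str, str] = field(default_factory=dict)
    history: List[Revision] = field(default_factory=list)

    def apply(self, cs: ChangeSet) -> None:
        # Refuse the patch if we have never seen the state it was built on,
        # i.e. we are not as up to date as the sender.
        if cs.base_id not in self.known_ids:
            raise ValueError(f"missing base {cs.base_id!r}; apply earlier change sets first")
        for rev in cs.revisions:
            self.files[rev.path] = rev.new_content
            self.history.append(rev)  # revision history travels with the patch
        self.known_ids.add(cs.changeset_id)


# Example: the second change set only applies once the first one has landed.
first = ChangeSet("cs1", "root", [Revision("README", "v1\n", "add README")])
second = ChangeSet("cs2", "cs1", [Revision("README", "v2\n", "update README")], tags=["alpha2"])

repo = Repository()
repo.apply(first)
repo.apply(second)
```

A fresh `Repository()` that tries to apply `second` directly raises `ValueError`, which mirrors the "patches will fail if you aren't as up to date as the sender" rule in the quoted pitch.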

Later, Linus would give great credit to BitKeeper for changing his view and inspiring Git: “BitKeeper was not only the first source control system that I ever felt was worth using at all, it was also the source control system that taught me why there's a point to them and how you actually can do things. So Git in many ways, even though from a technical angle it is very, very different from BitKeeper, which was another design goal because I wanted to make it clear that it wasn't a BitKeeper clone, a lot of the flows we use with Git come directly from the flows we learned from BitKeeper.” (The phrasing here is a little awkward because it comes from the aforementioned live Google talk.)

While Linus himself held BitKeeper in high regard, his decision to use the tool internally for Linux in 2002 led to massive flame wars on the Linux Kernel Mailing List.

Why the flames? When Larry McVoy had built BitKeeper, he did it as part of a commercial, closed-source endeavor (BitMover). Though folks were able to use BitKeeper’s free community version, this came with a restrictive license.

From the BitKeeper Wikipedia entry: “The license for the ‘community’ version of BitKeeper had allowed for developers to use the tool at no cost for open source or free software projects, provided those developers did not participate in the development of a competing tool (such as Concurrent Versions System, GNU arch, Subversion or ClearCase) for the duration of their usage of BitKeeper plus one year. This restriction applied regardless of whether the competing tool was free or proprietary.”

Even Richard Stallman, last of the true hackers and free software evangelist, chimed in: "The spirit of the Bitkeeper license is the spirit of the whip hand. It is the spirit that says, ‘You have no right to use Bitkeeper, only temporary privileges that we can revoke. Be grateful that we allow you to use Bitkeeper. Be grateful, and don't do anything we dislike, or we may revoke those privileges.’ … Outrage at this spirit is the reason for the free software movement.”

But Linus took a far more pragmatic view; from his perspective, he just wanted the best tool for the job, regardless of where it came from. In 2007 he’d say, “And I was happy with [the BitKeeper arrangement despite the license] because, quite frankly, as far as I'm concerned I do open source because I think it's the only right way to do software. But at the same time, I'll use the best tool for the job and, quite frankly, BitKeeper was it.”

The uneasy marriage wasn’t to last, however.

In 2005, one of the Linux Kernel developers, Andrew Tridgell, forced the issue when he violated the license and reverse-engineered BitKeeper so that he “could pull stuff out of BK trees without agreeing to the BK license.” From Tridgell’s perspective, this was completely ethical; “I did not use BitKeeper at all in writing this tool and thus was never subject to the BitKeeper license.”

Larry McVoy disagreed. And initially, Linus was on his side:

“Larry is perfectly fine with somebody writing a free replacement. ... What Larry is not fine with, is somebody writing a free replacement by just reverse-engineering what he did. Larry has a very clear moral standpoint: ‘You can compete with me, but you can’t do so by riding on my coat-tails. Solve the problems on your own, and compete honestly. Don’t compete by looking at my solution.’ And that is what the BK license boils down to. It says: ‘get off my coat-tails, you free-loader.’ And I [Linus] can’t really argue against that.”

For his part, Linus tried for three months to play peacemaker. (And, if anything, his future comments indeed seemed to hint at more frustration with Andrew than Larry.) But ultimately there was no way of reconciling those differences.

On April 6, 2005, Linus emailed the Linux Kernel Mailing List, subject line “Kernel SCM saga…”, beginning the chain of events that would change an industry:

“Ok, as a number of people are already aware, we've been trying to work out a conflict over BK usage over the last month or two (and it feels like longer ;). That hasn't been working out, and as a result, the kernel team is looking at alternatives.”

He joked about the history — “It's not like my choice of BK has been entirely conflict-free (’No, really? Do tell! Oh, you mean the gigabytes upon gigabytes of flames we had?’)” — and stressed his continued gratefulness to the BitKeeper team.

Despite the outcome, Linus looked back on the time with clear fondness:

In fact, one impact BK has had is to very fundamentally make us (and me in particular) change how we do things. That ranges from the fine-grained changeset tracking to just how I ended up trusting submaintainers with much bigger things, and not having to work on a patch-by-patch basis any more. So the three years with BK are definitely not wasted: I'm convinced it caused us to do things in better ways, and one of the things I'm looking at is to make sure that those things continue to work.

So I just wanted to say that I'm personally very happy with BK [BitKeeper], and with Larry. It didn't work out, but it sure as hell made a big difference to kernel development. And we'll work out the temporary problem of having to figure out a set of tools to allow us to continue to do the things that BK allowed us to do.

In reality, while he posted the public breakup news to the mailing list on April 6th, Linus had already been hard at work. Three days prior, right after the release of 2.6.12-rc2, he had halted his work on the Linux kernel and switched his full focus to finding an alternative to BitKeeper.

Linus’s goal was to have “something usable in two weeks.” As part of that April 6th email, he announced "I'm going to be effectively off-line for a week (think of it as a normal "Linus went on a vacation" event) and I'm just asking that people who continue to maintain BK trees at least try to also make sure that they can send me the result as (individual) patches, since I'll eventually have to merge some other way.”

Linus’s emails conveyed a clear sense of urgency; after all, the next Linux kernel release was blocked until he could figure this out.

In an email on April 7th, he even noted that, in the worst-case scenario, the Linux kernel might move to a centralized version control system: “NOTE! I detest the centralized SCM model, but if push comes to shove, and we just can't get a reasonable parallel merge thing going in the short timeframe (ie month or two), I'll use something like SVN on a trusted site with just a few committers, and at least try to distribute the merging out over a few people rather than making me be the throttle.”

While the outcome is clear to us in retrospect, at the time, the emails show that writing a custom version control system was far from a given. Out of the 205 emails in the chain that spanned from the first email on April 6th to the last on April 12th, there was much discussion of other open-source alternatives — Monotone, GNU arch, Bazaar-ng, Darcs — with some of these tools’ creators jumping in to advocate for their project.

(Even the Subversion developers chimed in with their post "Please Stop Bugging Linus Torvalds About Subversion.”)

The main consideration, especially for Linus, seemed to be the overall performance of each of these tools. Performance and efficiency come up again and again across the 205 emails.

The biggest question on everyone’s mind seemed to be: would any of the existing tools work for a project the size of the Linux kernel?

On April 8th, two days after his initial email and five days after he started work in earnest, Linus shared an update: “In the meantime (and because monotone really is that slow), here's a quick challenge for you, and any crazy hacker out there: if you want to play with something really nasty (but also very very fast), take a look at kernel.org:/pub/linux/kernel/people/torvalds/.”

Git was born.

When we hear about Linus writing the original Git in two weeks, it’s worth adding a large caveat: when we think of Git today, we think of the user-facing commands and the overall workflow, but at the time the goals (and the mandate) were much different and far more limited in scope.

As different folks debated the merits of various tools and approaches on the email list, one person wrote, describing roughly what was needed: “It is ok to be a little slow so long as it is not pathetically slow. The purpose of the interim solution is just to get the patch flow process back online.”

Linus’s original Git was far more of a content-addressable file system than a fully-fledged source-control management system. Here’s his explanation from another email:

(*) I call this "commit", but it's really something much simpler. It's really just a "I now have <this directory state>, I got here from <collection of previous directory states> and the reason was <reason>".

That, btw, is kind of the design. "git" really doesn't care about things like merges. You can use any SCM to do a merge. What "git" does is track directory state (and how you got to that state), and nothing else. It doesn't merge, it doesn't really do a whole lot of anything.

So when you "pull" or "push" on a Git archive, you get the "union" of all directory states in the destination. The HEAD thing is one pointer into the "sea of directory states", but you really have to use something else to merge two directory states together.
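Linus's description maps onto a very small data model. The following Python sketch illustrates that idea under stated assumptions; it is not Git's actual object format (real Git adds object headers and compression), and `ObjectStore`, `write_tree`, and `write_commit` are hypothetical names chosen for the example. A "commit" is just a record of a directory state, the states it came from, and the reason, and a pull is the union of two object stores.

```python
import hashlib
import json
from typing import Dict, List


class ObjectStore:
    """Toy content-addressable store: each object is keyed by the hash of its bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        self.objects[key] = data  # storing identical content twice is a no-op
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def pull(self, other: "ObjectStore") -> None:
        # Pulling yields the union of both stores' objects.
        self.objects.update(other.objects)


def write_tree(store: ObjectStore, files: Dict[str, bytes]) -> str:
    """Record a directory state as a mapping of path -> content hash."""
    entries = {path: store.put(content) for path, content in files.items()}
    return store.put(json.dumps(entries, sort_keys=True).encode())


def write_commit(store: ObjectStore, tree: str, parents: List[str], reason: str) -> str:
    """'I now have <tree>, I got here from <parents>, and the reason was <reason>.'"""
    record = {"tree": tree, "parents": parents, "reason": reason}
    return store.put(json.dumps(record, sort_keys=True).encode())


# Example: two directory states, the second recording the first as its parent.
store = ObjectStore()
t1 = write_tree(store, {"README": b"hello\n"})
c1 = write_commit(store, t1, [], "initial import")
t2 = write_tree(store, {"README": b"hello\n", "Makefile": b"all:\n"})
c2 = write_commit(store, t2, [c1], "add Makefile")
head = c2  # HEAD is just one pointer into the "sea of directory states"
```

Nothing in this sketch knows how to merge: combining two directory states into a third is left to some other tool, which is the division of labor the email describes.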

When the email thread turned to a cherry-picking workflow and moving commits around, Linus ruled this out of scope for the tool he was building.

All in all, Linus would comment "'Git' is really trivial, written in four days. Most of that was not actually spent coding, but thinking about the data structures.”

It’s a sentiment that he’d later echo again and again. It is sometimes taken out of context, but it rings true: the data structure choice was the novel part of the code Linus had written at the time. After Linus shared his first few commits, there was no discussion of workflow (what most comes to mind when thinking about Git today). Instead, most discussion centered on the architecture we now take for granted: Git’s direct interfacing with the filesystem and its hash-based approach to telling which files had changed (and ensuring data integrity).
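The hash-based approach can be illustrated in a few lines of Python. This is a simplified sketch rather than Git's implementation; the helper names `snapshot`, `changed_paths`, and `verify` are hypothetical. Hashing each file's content gives a cheap way to tell which files differ between two states, and re-hashing stored content detects corruption.

```python
import hashlib
from typing import Dict, List


def snapshot(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each path to the SHA-1 of its content."""
    return {path: hashlib.sha1(content).hexdigest() for path, content in files.items()}


def changed_paths(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Paths that were added, removed, or whose content hash differs."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


def verify(content: bytes, expected_hash: str) -> bool:
    """Integrity check: stored content must still hash to its recorded name."""
    return hashlib.sha1(content).hexdigest() == expected_hash


before = snapshot({"a.c": b"int x;\n", "b.c": b"int y;\n"})
after = snapshot({"a.c": b"int x;\n", "b.c": b"int y = 1;\n", "c.c": b""})

print(changed_paths(before, after))  # ['b.c', 'c.c']
```

Only the hashes are compared, so an unchanged file is recognized without looking at its content a second time.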

A minority contingent advocated for using SQL to store the changes. In classic Linus fashion, here’s one exchange on the subject:

> Why not to use sql as backend instead of the tree of directories?

Because it sucks?

I can come up with millions of ways to slow things down on my own. Please come up with ways to speed things up instead.

Linus

But there were also a few who saw the vision and were eager to start work on Git. The same day that Linus asked folks to check out the changes in his directory, a few folks sent back scripts that built additional functionality on top of his foundation.

Two weeks after he had started, on April 17, 2005, Linus emailed the mailing list: “First ever real kernel Git merge!”


The most satisfying part about reading through the initial email chain nearly twenty years after all of these events transpired is seeing the hints of the future, unknown to the authors at the time.

In evaluating one of the proposed source control alternatives, “monotone,” a user wrote:

One slightly annoying thing is that monotone doesn't appear to have a web interface. I used to use the bk one a lot when tracking down bugs, because it was really fast to have a web browser window open and click through the revisions of a file reading checkin comments, etc. Does anyone know if one is being worked on?

At the time, web UIs for source control and code review were just becoming popular; at Google, Guido van Rossum was building the company’s first dedicated web-based code review tool, Mondrian. And, about three years after this email thread, GitHub would launch.

There’s also an irony in all of the back-and-forth about performance in the email chain.

A brief side discussion expressing concern with how Git might perform if backed by a network-based file system has been flipped on its head in the present day; for giants like Google (FUSE) and Meta (EdenFS), these network-based, source-control-aware file systems are a critical part of how they continue to scale source control and builds in their massive monorepos.

And the core concern at the time that had inspired Linus to write Git in the first place — that none of the existing revision control alternatives would be performant enough to support the large history and commit throughput of the Linux kernel repo?

That would replay itself just a few years later when Meta would migrate its main repo off of Git itself for just the same reason.

Of course, at the time, all of this was unknown to the various folks participating in the original “Kernel SCM saga…” email chain.

Git was created as a tool to unblock future Linux kernel releases — not intended as a global reinvention of all source code management; Linus’s comments highlight that he explicitly saw source code management as the domain of other tools that would then interface with Git.

When we think of history, we often romanticize it as being born of a sudden stroke of inspiration. But the creation of Git shows the far harsher reality of invention: a slowly escalating disagreement over a license; the need for a scrappy backup solution to unblock work; and then continued polishing and iteration through years and years, led not by the inventor, but rather a community.

Eventually BitMover did open-source BitKeeper. Tying a bow on the whole situation, an HN commenter (Bryan Cantrill, now the CTO of Oxide) left a fantastic comment when BitMover made the announcement in 2016:

“The grand irony is that Larry was one of the earliest advocates of open sourcing the operating system at Sun[1] -- and believed that by the time Sun finally collectively figured it out and made it happen (in 2005), it was a decade or more too late.[2] So on the one hand, you can view the story of BitKeeper with respect to open source as almost Greek in its tragic scope: every reason that Larry outlined for "sourceware"[3] for Sun applied just as much to BK as it did to SunOS -- with even the same technologist (Torvalds) leading the open source alternative! And you can say to BK and Larry now that it's "too late", just as Larry told Sun in 2005, but I also think this represents a forced dichotomy of "winners" and "losers." To the contrary, I would like to believe that the ongoing innovation in the illumos communities (SmartOS, OmniOS, etc.) proves that it's never too late to open source software -- that open source communities (like cities) can be small yet vibrant, serving a critical role to their constituencies. In an alternate universe, might we be running BK on SunOS instead of Git on Linux? Sure -- but being able to run an open source BK on an open source illumos is also pretty great; the future of two innovative systems has been assured, even if it took a little longer than everyone might like.”

A response was posted by another HN user "luckydude" — Larry McVoy himself.

"Yeah this irony is not lost on me. But in both cases, the companies acted in self interest. Neither had the guts to walk away from their existing revenue stream. It's hard to say what would have happened.

It's been an interesting ride and if nothing else, BK was the inspiration for Git and Hg, that's a contribution to the field."
