Momentary Fascinations: How To Use Mercurial For Local Source Code Management With A Public Subversion Server

categories: programming; originally posted: 2007-01-05 17:25:30

I'm working on contributing some patches to Python. According to the Python patch submission guidelines:

We like unified diffs. We grudgingly accept contextual diffs. Straight ("ed-style") diffs are right out!
If you send diffs for multiple files, concatenate all the diffs in a single text file.
We appreciate it if you send patches relative to the current svn tree.

If you're like most people, you don't have Subversion checkin privileges for Python. That means you're going to be doing a read-only checkout of the sources. Which means you can't use the Python Subversion repository to check in your changes later.

I prefer to use a source code manager whenever I write code. But it seemed like a lot of bother, so for my first couple of patches I just gritted my teeth and worked directly in the Subversion checkout directory, using svn diff to make the unified diffs. But that meant I couldn't ever check in; I wound up keeping zip backups (what I call "hillbilly SCM"). This was even more painful because I had one patch based on another patch; I kept them in separate directories, and manually propagated fixes around. I made many mistakes when hand-merging my code around, and vowed "never again".

A Better Way

When I volunteered to tackle this latest (and hopefully final) rewrite of the patch, I resolved to use an SCM. Now, I knew I wouldn't use the official Python Subversion repository; I don't have write access, and anyway I wanted to work on branches, and Subversion's branch support roundly sucks. But I had an idea: I'd use my own SCM. I'd get the files from Python's Subversion server, then check them right back in to my own SCM, then use that as usual during development.

I decided to use Mercurial, for the following reasons:

Mercurial is one o' them new-fangled "distributed SCM systems", which means it's designed to work really well as a standalone system—no central server required.
Mercurial has excellent (if slightly puzzling!) support for branching.
Mercurial is fast as lightning and feature-rich to boot.
Like most SCMs these days, Mercurial automatically produces "unified diffs".
Mercurial is itself written in Python, so it seemed like good karma.
I wanted to play with a new tool and Mercurial sounded like fun.

I'm happy to report it all turned out great—and the best part is that it's really easy. The only hard part was figuring out how to do it. So I hope you all use this technique when contributing your own patches!

Notes:

To save time, I'm going to assume you've already installed Subversion and Mercurial.
If you don't have a good grasp of what working in a distributed SCM system is like, I highly recommend the Monotone tutorial. Start with "1. Concepts" and read through "2. Tutorial", stopping at "3. Advanced Uses". Monotone is a little different from Mercurial, but current distributed SCMs share a lot of the same concepts and workflow.
I'm going to tell you exactly how I set up my work for Python; I'm actually redoing it in another window as I type this, so I won't miss a step.
In case you're wondering, I'm doing my work on Windows; there are one or two spots where non-Windows folks should do something different, and I'll be sure to point that out to you when I get there.

The Basic Approach

In a nutshell, you check out a copy of the Subversion tree, then use Mercurial to make a copy of it, and do all your work in that copy. The Subversion tree remains read-only; you only ever pull changes out of it, you never push changes into it. It's like a waterfall; changes fall out of Subversion and into your work directory, but they never swim back up the waterfall (until that happy day when your patches are accepted!).

Here are the basic steps involved:

Check out the read-only Subversion tree.
Check that tree in to Mercurial. This tree will always be read-only.
Clone that tree to a second directory. This is the tree where you do your work.
When there are updates to the Subversion tree, update your read-only tree and pull the changes into your work tree.
When it's time to make your patch, diff your work tree to the read-only tree.

The Method, Revealed

Step 1: Pack Your Trunk

In this step we'll make a directory tree that is both your read-only copy of the current version checked in to Subversion and a Mercurial repository. Note that this is the longest step in the whole how-to; your actual day-to-day work will be much easier.

First, you need to pull the current SVN tree into its own directory. This is going to always represent the original source tree without your changes, what's often called "mainline" or "trunk". I'm going with "trunk". I run this:

% svn co http://svn.python.org/projects/python/branches/p3yk trunk

That'll take a couple minutes, so go get yourself a frosty cup of hot dog water or whatever it is you like to drink.

Once it's finished, you'll have a nice fresh copy of the Python source tree that Subversion will happily update for you. Next we're going to make that exact same directory into a Mercurial repository:

% hg init trunk

That finishes immediately, almost as if it didn't do anything, which is almost right. It's a Mercurial repository, but Mercurial doesn't know about any of the files in it yet. We need to check our fresh source tree right back in to Mercurial. However, we don't want to check in all of Subversion's metadata; that would be confusing and silly. To teach Mercurial to ignore Subversion's metadata, create a new file in the root of the repository called .hgignore with shell globs, one per line, for every set of files you want it to ignore. For right now all you really need is the first entry, but I'm going to save you a little time and show you my whole list:

syntax: glob
.svn\
*.o

# Windows build output
*.obj
*.pch
*.idb
*.pdb
*.exp
*.exe
*.lib
*.dll
*.suo
*.ncb
*.ilk
*.sbr
*.res
~BuildLog.htm
BuildLog.htm

# Python-specific build output
*.pyd
*.pyc
*.pyo
PC\python_nt_d.h
PC\pythonnt_rc_d.h
PC\python_nt.h
PC\pythonnt_rc.h

(UNIX folks probably just need the top two globs, but they might want other things too. Don't ask me.) Copy and paste that into .hgignore.
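If you want a rough idea of which files a list like this catches, here is a minimal Python sketch. It is only an approximation: `is_ignored` and `IGNORE_GLOBS` are names made up for this example, it uses Python's `fnmatch` module rather than Mercurial's own matcher, and it assumes a file is ignored when any path component is `.svn` or when its file name matches one of the globs.

```python
from fnmatch import fnmatchcase

# A few of the globs from the list above (not the complete list).
IGNORE_GLOBS = ["*.o", "*.obj", "*.pyc", "*.pyo", "*.pyd"]


def is_ignored(path, globs=IGNORE_GLOBS):
    """Roughly predict whether a path would be ignored.

    Assumption: a path is ignored if any component is ".svn" or if its
    file name matches one of the globs. Mercurial's real matcher has
    more rules than this.
    """
    # Accept both Windows and UNIX separators.
    parts = path.replace("\\", "/").split("/")
    if ".svn" in parts:
        return True
    return any(fnmatchcase(parts[-1], glob) for glob in globs)
```

For the authoritative answer, ask Mercurial itself: once the repository exists, `hg status -i` lists the files it is actually ignoring.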

Now you're ready to check in the whole tree.

% cd trunk
% hg add

That recursively adds every file under the current directory to the repository—and it's shockingly fast.

We're almost ready to check in, but first we'll take advantage of a new Mercurial feature: named branches. Now, Mercurial's support for branches is a little... unusual. The Mercurial wiki puts it best:

In Mercurial, a branch is a repository. Nothing more or less. A repository is a branch. Repeat the soothing mantra.

That's not to say that you can't give your branches (repositories!) useful names. Named branches are a new feature in Mercurial, and so far no nomenclature has been suggested. I went with this:

% hg branch larry@hastings.org:py3k:trunk

If you're never going to push changes to other people's Mercurial repositories, it probably doesn't matter much what you name your branch.

(Truthfully, you don't even need to give it a name at all; nothing I'm going to show you uses branch names. But it'll be helpful in case you ever need to go spelunking in your version tree; revisions are automatically tagged with the "branch" they were checked in to, so it's just a little extra helpful metadata for your revision history.)

At last, we're ready to check this mess in to Mercurial:

% hg ci

This will grind for a while, then pop open an editor for you to make your checkin comment. Type something pithy like "Checkin of current Python 3000 source tree from SVN.", save, and exit. Mercurial grinds a little more and it's done.

Step 2: Make Like A Tree And Branch

Now it's time for you to make your work directory. I'm going to call mine concat, as that's a mnemonic for the work I'm going to do in that tree.

% cd ..
% hg clone trunk concat

Next, rename the branch:

% cd concat
% hg branch larry@hastings.org:py3k:concat

And you're done—you're ready to rock and roll. Get to work!
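If you expect to repeat Steps 1 and 2 for other projects, the commands can be collected into a small Python helper. This is a sketch under assumptions, not something from my actual setup: `setup_commands` and `run_all` are hypothetical names, the commit message is passed with `-m` instead of popping open an editor, the optional `hg branch` naming is left out, and you still have to create `.hgignore` by hand before the `hg add` runs.

```python
import subprocess


def setup_commands(svn_url, trunk="trunk", work="concat"):
    """Build the (argv, cwd) pairs for Steps 1 and 2, in order."""
    return [
        (["svn", "co", svn_url, trunk], "."),
        (["hg", "init", trunk], "."),
        # Create trunk/.hgignore by hand at this point, before "hg add".
        (["hg", "add"], trunk),
        (["hg", "ci", "-m", "Checkin of current source tree from SVN."], trunk),
        (["hg", "clone", trunk, work], "."),
    ]


def run_all(commands):
    """Run each command in its directory, stopping at the first failure."""
    for argv, cwd in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_all(setup_commands("http://svn.python.org/projects/python/branches/p3yk"))` from the parent directory would run the whole sequence; because of the `.hgignore` step in the middle, it is probably easier to run the first two commands, write the file, and then run the rest.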

Step 3: Update From Subversion

When new updates of the original tree are available, you'll want to update your patch to be compatible with them. Here's how. Starting at the parent directory to our two Mercurial br—I mean, repositories:

% cd trunk
% svn update

That'll pull down all the updates from the Subversion repository.

% hg addremove
% hg ci

That'll update the Mercurial repository so it adds all the added files and removes all the removed files, then checks the result in.

% cd ../concat
% hg status

It's always a good idea to make sure everything is checked in. If you have any files marked as new/modified, make sure you check everything in (hg ci) before you run the next commands.

% hg pull ../trunk
% hg merge

That will pull the new version from trunk (your read-only Subversion copy) into concat (your work directory), then interactively merge those changes into your existing changes. Good luck!
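The update routine above is regular enough to write down as data. Here is a hedged Python sketch: `update_commands` and `worktree_is_clean` are hypothetical names, the commit message is an assumption, and the directories are assumed to be siblings named as in this article.

```python
def worktree_is_clean(status_output):
    """Interpret the text printed by "hg status".

    hg status prints one line per changed or unknown file, so empty
    output means everything is checked in.
    """
    return status_output.strip() == ""


def update_commands(trunk="trunk", work="concat"):
    """Build the (argv, cwd) pairs for Step 3, in order."""
    return [
        (["svn", "update"], trunk),
        (["hg", "addremove"], trunk),
        (["hg", "ci", "-m", "Update from SVN."], trunk),
        # Only continue past here if "hg status" in the work tree is clean.
        (["hg", "pull", "../" + trunk], work),
        (["hg", "merge"], work),
    ]
```

Two caveats if you automate this: `hg ci` exits with a non-zero status when Subversion brought down nothing new, so a script that stops on any failure will stop there; and once the merge is resolved, it still has to be committed with `hg ci` in the work tree.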

Step 4: Make Your Patch

Edit: 2007/01/08, gave up on hg outgoing, wrote hgbd.py

Finally! You're ready to publish your changes to an adoring world. It's time to run a diff between our work branch (repository) and the stock Python branch (repository). Now here comes some slightly bad news: Mercurial can't do this straight out of the box. I had originally done this with "hg outgoing -M -p ../trunk", but that produces one diff per revision! And that's as close as Mercurial currently gets to a pure cross-branch diff.

However! It's easy to create this functionality ourselves, and we're only cheating a little. The crucial observation is this: the tip revision in trunk always exists in the current branch. (We always pull it over using hg pull, remember?) So we don't really need to do a cross-branch diff after all. We just need to do a diff in the current branch against that revision.

To do that, we need to find out what the tip revision is in the other branch. And that's easy; just move to that branch and ask. The specific command you want is:

% hg tip --template {node}

That prints out just the full node ID of the current tip revision. From there it's easy; move back to the concat branch and run

% hg diff -r <node-id> -r tip

where <node-id> is the node ID we just pulled out of the trunk.

There's one more fly in the ointment: hg diff is dumb (perhaps on purpose) about EOL convention issues. To make a long story short, if you run hg.exe diff on two files with Windows-style EOL you'll get \r\r\n sequences littering the output. (And yes, I've already let them know.)
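Cleaning up those line endings is a two-step replace, and the order matters. A minimal Python sketch (`strip_extra_cr` is a name invented for this example, and it works on bytes so that no text decoding is involved):

```python
def strip_extra_cr(diff_bytes):
    """Collapse \\r\\r\\n and \\r\\n line endings to a bare \\n.

    The longer sequence must be replaced first. Replacing \\r\\n first
    would turn \\r\\r\\n into \\r\\n and leave a stray \\r behind.
    """
    diff_bytes = diff_bytes.replace(b"\r\r\n", b"\n")
    return diff_bytes.replace(b"\r\n", b"\n")
```

For example, `strip_extra_cr(b"+foo\r\r\n-bar\r\n")` returns `b"+foo\n-bar\n"`.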

I've killed two birds with one script. I call it hgbd.py, for hg branch diff. It runs the cross-branch diff for you and culls the extraneous \r characters from the output. Here it is:

import subprocess
import sys

if len(sys.argv) < 2:
    sys.exit("usage: hgbd path-to-branch\n"
        + "produces a patch between the tip of"
        + " the other branch and the current branch.\n"
        + "the tip revision of the other branch"
        + " must exist in this one.")

# Ask the other branch (repository) for the full node ID of its tip revision.
# Passing the command as a list works on both Windows and UNIX.
node = subprocess.check_output(
    ["hg", "tip", "--template", "{node}"], cwd=sys.argv[1])
node = node.decode("ascii").strip()

# Diff that revision against the tip of the current branch.
diff = subprocess.check_output(["hg", "diff", "-r", node, "-r", "tip"])

# Cull the extraneous \r characters; the longer sequence must go first.
diff = diff.replace(b"\r\r\n", b"\n")
diff = diff.replace(b"\r\n", b"\n")

# Write the bytes through unchanged (Python 3), so no \r is added back.
sys.stdout.buffer.write(diff)

Install that on your path in such a way that you can run it directly, and now generating patches is as easy as:

% cd concat
% hgbd.py ../trunk > patch.txt

Boom, you're done, the patch is ready to go!

Optional: A Patch On A Patch

Let's say that you're crazy like me and want to make a patch built on top of your other patch. I did this; my "lazy slices" patch required my "lazy concatenation" patch. Try doing that without your own SCM and you'll just go crazy—but with our handy-dandy Mercurial it's easy.

First, clone your work repository to a second repository; this second repository is where you'll create your stacked patch:

% hg clone concat slice

Now do your work in the slice tree. The rest of your workflow is mostly the same, except that the slice tree's parent is concat, not trunk:

When there's a new version in the Subversion repository, first pull from trunk into concat, then pull from concat into slice.
To make your stacked patch (from concat to slice), run
```
% cd slice
% hgbd.py ../concat > patch.txt
```
To make a cumulative patch that has all your changes (from both concat and slice), run
```
% cd slice
% hgbd.py ../trunk > patch.txt
```
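The "pull from trunk into concat, then from concat into slice" rule generalizes to any number of stacked repositories. Here is a hedged Python sketch that builds the command sequence for such a chain. `propagate_commands` is a hypothetical name, the repositories are assumed to be sibling directories, and the sketch adds an `hg ci` after each merge, on the assumption that a merge must be committed before the next repository down can pull it.

```python
def propagate_commands(chain):
    """Build (argv, cwd) pairs that carry changes down a chain of repos.

    `chain` lists repository directories from upstream to downstream,
    for example ["trunk", "concat", "slice"].
    """
    commands = []
    for parent, child in zip(chain, chain[1:]):
        commands.append((["hg", "pull", "../" + parent], child))
        commands.append((["hg", "merge"], child))
        # Commit the merge so the next repository in the chain can pull it.
        commands.append((["hg", "ci", "-m", "Merge from " + parent], child))
    return commands
```

Merges can need hand-resolving, so treat the output as a checklist to walk through rather than something to run unattended.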

Final Notes

If you have to make changes to your .hgignore file, make them in your trunk directory and pull them into your work directory.
Mercurial's patches are a little funny; it prepends fake "a/" and "b/" directories onto the names of the files listed in the patch. You can easily ignore these when applying the patch by running
```
% patch -p1 < your.patch.txt
```
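If it isn't obvious what `-p1` does to those "a/" and "b/" names, here is a tiny Python sketch of the idea. `strip_components` is a name made up for this example, and it only models the simple case of relative paths with forward slashes.

```python
def strip_components(path, count=1):
    """Drop the first `count` leading path components, as patch -pN does."""
    parts = path.split("/")
    return "/".join(parts[count:])
```

So a file listed in the patch as `a/Objects/stringobject.c` is applied to `Objects/stringobject.c` relative to the directory where you run patch.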

Happy patching!

Another Approach: Mercurial Queues

Edit: 2007/01/08, added this section

When I was having problems using hg outgoing to generate my patches, I went on the #mercurial IRC channel to get some help. Several people there suggested that I'd be better off using Mercurial Queues instead of multiple branches. I don't agree, and I'll tell you why.

First, let's make this clear: Mercurial Queues are really really neat. They're a way of stacking patches on top of each other. You start with your base repository, then "push" a patch, and perhaps "push" another... as deep as you like. To undo a push, you can "pop" the patches off. Perhaps best of all, if you edit the files in the repository with a "patch" set as current, you can save your changes back into the top patch! Rather than make hash of any further explanation, let me point you here, here, and to Chapter 6 of the unofficial Mercurial book here.

So if queues are so neat-o, why don't I want to use them? Because they're not native Mercurial objects. Mercurial queue patches are stored in their native "unified diff" format. While you can modify a patch, all Mercurial does with your edits is store them back into the patch file. You can't set checkin comments, and you can't go over your checkin history, so clever activities like hg bisect aren't possible.

Mercurial queues are great for managing patches, but for day-to-day development I want to leave behind a revision history.