Nikolaus Rath's Website

Why you should give Mercurial a shot

Git has arguably become the most widely used version control system in the open-source and free software communities. However, I believe this is mostly thanks to the amazing features of GitHub rather than the virtues of Git itself. As a matter of fact, I believe that if you can live without GitHub and don't need to manage a code base as large as e.g. the Linux kernel, you are much better off using Mercurial.

Before you continue with this post, please take a moment to read my previous article on the differences between Git and Mercurial to ensure that you're up to date on the differences and similarities. With that settled, let's reflect on why you really should be using Mercurial for most of your projects.

Mercurial requires fewer concepts to grasp

There are only a few things you need to understand to use Mercurial efficiently:

  • Mercurial tracks changes to files in the form of commits
  • Commits form a directed graph (each commit has at most two parents and zero or more children)
  • Tags are named commits, and heads are commits without children (i.e., the ones holding your most recent changes)
  • Bookmarks are pointers to commits that typically move to the youngest commit in a lineage.
  • Pulling from a remote repository adds additional commits into your local graph, and pushing to a repository puts your commits into the remote graph.
  • Merging means to create a new commit that combines the changes made in both of its parents.
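
The concepts in this list fit into a few lines of code. The following is a minimal sketch of such a commit graph, not Mercurial's actual implementation: the Repo class, its method names and the single-letter commit ids are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy commit graph (hypothetical): maps each commit id to its parent ids."""
    parents: dict = field(default_factory=dict)

    def commit(self, cid, *parent_ids):
        # A commit has at most two parents (two only for merges).
        assert len(parent_ids) <= 2
        assert all(p in self.parents for p in parent_ids)
        self.parents[cid] = tuple(parent_ids)
        return cid

    def heads(self):
        # Heads are the commits that no other commit names as a parent.
        with_children = {p for ps in self.parents.values() for p in ps}
        return sorted(c for c in self.parents if c not in with_children)

    def pull(self, other):
        # Pulling adds the commits we do not have yet to the local graph.
        for cid, ps in other.parents.items():
            self.parents.setdefault(cid, ps)


local = Repo()
local.commit("a")
local.commit("b", "a")

remote = Repo()
remote.pull(local)          # the remote starts as a copy of our history
remote.commit("c", "a")     # someone else commits on top of "a"

local.pull(remote)
heads_after_pull = local.heads()      # two heads: "b" and "c"
local.commit("m", "b", "c")           # a merge commit joins both lines
heads_after_merge = local.heads()     # a single head again: "m"
```

Note that the pull leaves the graph with two heads and nothing else needs to happen: no names have to be assigned and nothing is at risk of disappearing until you decide to merge.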

Most of these concepts have close analogues in Git - but Git also introduces several additional ones, and then ties them all together so that you cannot just ignore the extra complexity. Many of these additional features go under the headline of "plumbing layer". In theory there are "high-level" Git commands that don't require you to know about the lower level details, and "plumbing-layer" commands that you should need only if you want to do unusual things. In practice, this separation unfortunately does not work out. While you rarely need the plumbing layer commands, using Git effectively still requires you to understand and be aware of the lower-level details. So let's take a look at what this means in practice...

Using Git takes up more mental capacity

The "minimum working knowledge" that you have to keep in your mind when using Git amounts to something like this:

  • Git stores (de-duplicated) "objects" that are identified by their hash values. Each object has a type and a value.
  • Git can store snapshots of a directory tree in these objects. The data is stored in "blob" objects, and the list of blobs that constitute a snapshot is stored in a "tree" object.
  • Git can order the snapshots into a directed graph that tracks changes to files over time. This is done by creating "commit" objects.
  • There is one special tree object called the "index". Commits can be created from the index, or directly from the file system.
  • Tags are pointers to commits, HEAD is a pointer to the most-recently checked out commit.
  • Branches are pointers to commits that typically move to the youngest commit in a lineage
  • A commit that has no descendants and no branch or tag pointing at it is in a "limbo" state: it does not exist in the commit graph and it will be deleted when the next "garbage collection" runs. But it is still a regular "commit" object and (as long as it has not been garbage collected) you can still add a reference to it to make it part of the commit graph.
  • If you pull from a remote repository, the remote snapshots are added to your local repository. However, this would leave them in limbo (because nothing is pointing at them) so Git needs to assign a name to them. This is done by creating a "remote-tracking branch". If you repeatedly pull from a repository, the branch labels need to be moved from the old leaves to the new leaves -- which is called a "fast-forward merge" (even though nothing is merged).
  • Similarly, if you push to a remote repository, Git needs to tell the remote repository what branch name to attach to the new leaves. If this would leave other leaves without a label, the operation is forbidden and you first need to create a new commit that descends from both the leaf that you actually want to push and the leaf that's going to be orphaned. This is called a merge (an actual one, not a fast-forward one). Alternatively, you can force the push, which will put the orphaned leaf into the aforementioned limbo - it may be garbage-collected or it may be revived by someone else pushing a new, named descendant.
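
To make two of these points more concrete, here is a small sketch of content-addressed storage and of the "limbo" state. It is a toy model under stated assumptions, not how Git is implemented: the blob id follows Git's SHA-1 scheme (a "blob <size>" header, a NUL byte, then the content), while the reachable helper, the dictionaries and the commit names are hypothetical, and safety nets such as the reflog are ignored.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object id Git assigns to a blob: SHA-1 over a header plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def reachable(parents: dict, refs: dict) -> set:
    """All commits that can be reached from some ref by following parents."""
    seen, todo = set(), list(refs.values())
    while todo:
        cid = todo.pop()
        if cid not in seen:
            seen.add(cid)
            todo.extend(parents[cid])
    return seen


# Identical content yields the identical id, so it is stored only once.
store = {}
for data in (b"hello\n", b"hello\n", b""):
    store[blob_id(data)] = data
stored_objects = len(store)            # 2, not 3

# Toy history a <- b <- c with one branch pointing at c.
parents = {"a": [], "b": ["a"], "c": ["b"]}
refs = {"master": "c"}
before = set(parents) - reachable(parents, refs)   # nothing is in limbo

# Resetting the branch to b leaves c without any reference.
refs["master"] = "b"
limbo = set(parents) - reachable(parents, refs)    # c awaits garbage collection

# Pointing a new ref at c makes it part of the graph again.
refs["rescue"] = "c"
after = set(parents) - reachable(parents, refs)
```

The second half is the part that has no counterpart in the Mercurial list: whether a commit "exists" depends on whether some name still leads to it.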

At this point I just stopped, since this post isn't meant to be a Git tutorial. But this pretty much proves the point I'm trying to make here. Compare this list to the one for Mercurial, and consider that this is the bare minimum that you have to keep in mind just to properly use Git's commit, reset, pull, and checkout commands. When you program, do you want as much of your memory and intellect to be available for the problem at hand, or do you want to reserve large parts of it for using your version control software?

Mercurial commands are more intuitive

The Git Koans make for some pretty entertaining reading, but they are depressingly true. Figuring out the Git command to perform a certain action requires either rote memorization or intimate knowledge of Git's internals (or, in many cases, both). Mercurial's commands, on the other hand, are typically easily deduced from what one wants to do.

Do you want to revert changes to a file that you have not yet committed? Use hg revert or git checkout -- <file> (or git reset --hard to discard all uncommitted changes). Do you want to create a Git branch / Mercurial bookmark? Use hg bookmark or git checkout -b. Do you want to edit your commit history (reorder commits, squash commits, remove commits)? Use hg histedit or git rebase -i. And the list goes on.

Git further increases the cognitive load by often using very unfortunate terminology. Why is the "staging area" also called the "index"? And why is the option to make commands work on the staging area called --cached? Why is a "fast-forward merge" called a "merge", if nothing is actually being merged?

Specifying Mercurial revisions is more powerful and intuitive

In Git, apart from branch and tag names, the only way to specify revisions is to use their hashes and combine them with a few single-symbol operators (e.g. @ and ^). The hashes are hard to remember and type even if the hash printed by your last git log command is still on the screen and you want to run git diff for it next. The restriction to single-symbol operators makes even simple queries difficult to understand, and hard ones impossible.

In Mercurial, each commit has an additional short, numeric id that you can use to identify it in such situations. These numeric ids change when the history is modified, but they make it much easier to refer to specific changesets when you execute a number of (history-preserving) commands in sequence (e.g. during bisection).

(Incidentally, I think this is also one of the reasons why Mercurial bookmarks aren't used as often as Git branches even though they are conceptually the same thing: in Mercurial, it's much easier to refer to a commit even if it doesn't have a user-defined bookmark pointing at it).

Mercurial allows you to specify sets of revisions by combining named functions, which is both easier to understand and more powerful. Quickly, which revisions are selected by last(ancestors(ef4b8),3) or modifies("Changes.txt") and date(">2016-05-01")? You can probably figure that out without looking at hg help revsets and hg help dates. Now, what is the meaning of @{5} or ef4b8^{/bug 284}? And how does the former differ from @{-5}? If you are a frequent Git user, you may be able to answer those right away - but I dare you to argue that this is in any way intuitive or obvious.

Furthermore, what if you want to find the last three commits preceding the first release after a given bug was fixed? With Git, you will have to write a little program to walk through the output of git rev-list. With Mercurial, you can run something like hg log -r 'last(ancestors(tag("re:release-.*")),3) and descendants(3843)' (assuming that the bug was fixed in commit 3843 and that releases are tagged with names starting with release-). Note the single quotes around the whole expression, which keep the shell from interpreting the double quotes inside it.
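
The reason such expressions stay readable is that every revset function simply takes sets of revisions and returns a set, so they compose like ordinary functions. Here is a rough sketch of that idea. The function names mirror the revset names, but the code is a toy model over a made-up five-commit history, not Mercurial's implementation.

```python
def ancestors(parents, rev):
    """Revision `rev` together with everything it descends from."""
    seen, todo = set(), [rev]
    while todo:
        r = todo.pop()
        if r not in seen:
            seen.add(r)
            todo.extend(parents[r])
    return seen


def descendants(parents, rev):
    """Revision `rev` together with everything that descends from it."""
    return {r for r in parents if rev in ancestors(parents, r)}


def last(revs, n):
    """The n highest-numbered revisions of a set, in ascending order."""
    return sorted(revs)[-n:]


# Toy history keyed by local revision number: 0 <- 1 <- 2 <- 4 and 1 <- 3.
parents = {0: [], 1: [0], 2: [1], 3: [1], 4: [2]}

# last(ancestors(4), 3)
newest_three = last(ancestors(parents, 4), 3)

# ancestors(4) and descendants(2): "and" is plain set intersection
between = sorted(ancestors(parents, 4) & descendants(parents, 2))
```

In this toy history, newest_three is [1, 2, 4] (revision 3 sits on a side branch and is not an ancestor of 4), and between is [2, 4].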

Mercurial prevents accidental history rewrite

Git makes it very easy to lose the history of your project, and to mess up even remote repositories. Git will happily let you rewrite history after you have pushed it, or accidentally reset a branch to an earlier commit. While it is generally possible to recover from such accidents, the necessary skills are generally only found in those people who don't make the mistake in the first place. In other words, Git makes it easy to make mistakes while at the same time making it hard to fix them.

Mercurial, on the other hand, makes it hard to make mistakes. Unless you specifically force it to, it will refuse to rebase or modify history that has been pushed. Since there is no need to name branches, a whole class of mistakes (like resetting a branch or committing in detached head state) cannot even happen in the first place.

Mercurial messages are easier to understand

In recent years, there has been a heroic effort to make Git easier to use for beginners. Among other things, this has led to the introduction of (luckily optional) "advice" messages that are presumably intended to refresh the user's memory and help him understand what Git has just done. Let's look at an example:

$ git checkout 979d
Note: checking out '979d'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b new_branch_name

HEAD is now at 979d... Merge remote-tracking branch 'origin/master'

Now, I totally understand the intention here, but this completely fails in practice. If you need this message to understand what has happened, then you won't be able to understand it without going back to the documentation. And if you understand it, then you don't need it. Git is a perfect example of why it's generally not possible to fix a usability problem with additional documentation. The information that's provided here is too dense for people who need it, and at the same time also way too long for everyone (it gets in the way of experienced users, and further scares off people who don't understand it).

I would have liked to compare this with a message from Mercurial in the same situation, but there's a problem: in Mercurial, situations requiring such messages don't occur in the first place. The "detached head" state in Mercurial doesn't have a special name, it's a state like any other. There is no need to tell the user that he can "look around, make changes, and commit them" because that is always possible. Even more importantly, commits are never silently discarded (or impact other branches) so there is no need to warn about that. To delete an experimental commit, the user has to explicitly instruct Mercurial to do so. Therefore, the message that Mercurial prints when entering its version of a "detached head" state is:

$ hg update -r 2766
resolving manifests
removing tests/pytest_checklogs.py
getting tests/common.py
getting tests/conftest.py
getting tests/t4_fuse.py
getting tests/t5_failsafe.py
4 files updated, 0 files merged, 1 files removed, 0 files unresolved

In a nutshell

Mercurial is simple. It does the job without getting in your way, and without requiring you to worry more about your version control system than about the actual code you're working with. It implements everything that is required of a distributed version control system in the vast majority of cases.

Git, on the other hand, implements a generic content tracker with a distributed version control system built on top. This makes it amazingly flexible, but in the vast majority of cases the additional flexibility is not needed and just adds complexity that cannot be avoided even when using only the high-level commands. The unfortunate and often counterintuitive terminology further adds to the cognitive load.

Having said all this, there are situations when Git is the better choice. If you have to work with very large repositories (the size of the Linux kernel), Git is both faster and more space efficient. If you have a complicated development structure with many teams working on different features, Git's remote-tracking branches make life much easier. And finally, if you are looking for a service like GitHub, then there is simply nothing comparable for Mercurial.

Let me thus close with a final appeal: the next time you need a version control system (and even more importantly, the next time someone new to DVCS asks you for a recommendation) do not immediately reach for Git. Consider not just the advantages of GitHub, but also the drawbacks of Git, and maybe give Mercurial a try.

Convinced? Then head over to HG Init or just start using Mercurial. Most likely you will find using it a lot easier than when you started using Git - you just got used to the pain.
