Technology: Source code Management (SCM)

Author: Alexander Schatten

Source Code Management

Source code and other resources like icons, configuration files or documentation are the core assets of a software project. Hence careful management of these resources is an important issue, particularly in team collaboration. The following aspects have to be taken into consideration:

  • Source code should be versioned, so that changes can be undone and older versions of the software can be referred to.
  • Older versions of a software product often have to be maintained although newer versions are already available (e.g. to patch security issues: Version 1, Version 2, Version 1.1, 1.2, 2.1, 2.2 and so on).
  • Team collaboration has to be considered: sharing of code between developers has to be transparent, reliable and traceable.
  • Releases and other important milestones in the project should be marked in the version history, so that a specific version can be retrieved later.
  • Changes in the source code should be communicated and annotated, so that they are transparent for the whole team.

Source code management (SCM) systems are designed to support developers with these issues. They are therefore part of practically every software project and are recommended even for tiny one-person projects.

Centralised vs. Distributed SCM

Two different approaches to SCM can be distinguished: (1) centralised and (2) distributed systems:

Centralised systems like Subversion are server based. A central server is responsible for keeping the versions and metadata of a project. On the client side, working copies are checked out from the server. Changes made on the clients are committed back to the server.

Distributed systems are lightweight and do not rely on a central repository or server (although teams often designate one repository as the central one by convention). Every developer has a complete repository (including the whole history and metadata) on their own machine. Distributed SCM systems allow a variety of collaboration patterns. Most new projects use distributed SCM systems. Some concrete advantages over centralised systems are:

  • No central repository (and no server) required; each client has the full history of the project.
  • Offline work is better supported.
  • Performance on repository interaction is usually much better
  • A broad variety of collaboration patterns are supported (see also below)
  • A repository can be easily cloned (with one command); this way, "experiments" on the source code or repository can be isolated from other developers; in case of problems, the test-repository can be deleted; if the "experiment" was successful, the changes can be pushed to other repositories.
  • Creating repositories is very simple (usually only one command is necessary). Hence also small projects on one machine can benefit from SCM.
  • Merging is very well supported and usually more straightforward than with centralised systems.

If multiple developers work on the same project, conflicts are possible, e.g. two or more people changing the same file. Two approaches to handling conflicts are common: (1) locking and (2) merging. Locking means that resources (e.g. files) have to be checked out from the repository and locked before they can be changed; other developers cannot modify these files while they are locked. This approach can be useful for binary resources, but for text-based files, and particularly for source code, locking is not an efficient collaboration strategy. Merging, on the other hand, allows conflicting changes and supports the developers in resolving (merging) the conflicts on commit. Hence, in SCM systems designed for source code, merging is the common practice.

Distributed SCM systems, however, cannot by nature support locking mechanisms, as they do not use a central server instance.

References

  • Mercurial, Git and Bazaar are very popular (Open Source) distributed SCMs.
  • Subversion is one of the leading centralised SCM systems (Open Source).

Mercurial Distributed SCM

Overview

This document gives a brief introduction to Mercurial, one of the leading distributed SCM systems. Other DSCMs behave very similarly, and the general concepts described here are the same in systems like Git or Bazaar. Mercurial was selected because it is very easy to learn and understand, yet powerful enough for projects and collaborations. Binary installers are available for Unix-like systems, OS X and Windows.

The following short introduction to Mercurial uses Unix Bash syntax. The Mercurial commands are identical on the Windows command line, though, so the general idea should also be clear on non-Unix systems:

1: $ hg status
2: ? a.txt

The $ sign indicates that the following text in this line should be entered as a command in the command line ("hg status" in this case). In the next line (without $) the result from the command is shown ("? a.txt" in this example). The lines are numbered for easy reference within the text. Commands in Mercurial can be abbreviated as long as the abbreviation is clear, e.g. hg st can be used instead of hg status.

The echo "text" >> filename Unix command is used to append text to files. Of course any editor can be used instead, so the echo command can be replaced with: (1) open a text editor, (2) write "text", (3) save the file under the name "filename". The echo command is used here to provide complete working examples.

Some output may be abbreviated; e.g. in the glog output the user and date lines are often removed to keep listings short.

How to Read this Tutorial

For a basic understanding of distributed SCM and Mercurial (e.g. for individual use) read the sections:

More advanced functionality is described in the following sections (this will be needed particularly as soon as team collaboration is needed):

Finally, a short overview of "modern" collaboration platforms that support DSCM and a short overview of helpful tools and further information resources are given in the final sections.

Getting Started

For installation instructions follow the documentation on the Mercurial website. Plugins for Eclipse and other IDEs are available.

For the initial setup it is recommended to create a .hgrc configuration file. On Unix-like systems and Mac OS X this file should be created with a text editor in the user's home directory. For details check the wiki page. For a start, the initial .hgrc file could look like this:

[ui]
username = Firstname Lastname  <firstname.lastname@company.com>
[extensions]
hgext.graphlog = 
hgext.purge = 
hgext.rebase = 

It is very important to set at least username and email address, as these settings are used to associate changesets to particular users.

Mercurial comes with a set of basic functions. A large number of additional extensions is available, and many of them are already part of the distribution. These extensions are not activated by default, so that users are not confused by a large number of commands that they will rarely need. In this configuration file three extensions are activated: graphlog, purge and rebase.

Mercurial offers an easy-to-understand built-in help system: hg help lists all known commands; hg help command shows the detailed help for command.

Creating a New Project

$ mkdir hgexample
$ cd hgexample
$ hg init

First a new directory hgexample is created, then the shell changes into it, and hg init puts this directory and all its subdirectories under Mercurial version control from now on. That's all that is required to create a new repository. The init command can also be used in a directory that already contains files.

$ hg status

The status command gives an overview on the current status of the repository (new files, changed files, ...). In this newly created repo there is nothing to report (yet).

Cloning an Existing Project

If a Mercurial project exists already, either on the same computer (local) or on a server, the clone command can be used to create a local clone of the other repository:

$ hg clone repo clone_repo

This command assumes that the directory repo contains a Mercurial repository and creates a clone of this repository in the clone_repo directory.

$ hg clone ssh://username@server.com//home/project/repo project_clone

This version of the clone command connects via ssh to server.com and looks for a Mercurial repository in /home/project/repo (the double // in the URL is correct: it marks an absolute path on the server). This repository is then cloned locally into a directory named project_clone.

It is important to understand that a clone of a repository is not just a working copy but contains all information, including the full version history, from the source repository. The source and the cloned repository can hence be seen as identical.

Making Changes

In the following example a Java class is created, the new file is added to revision control and a change is committed:

1: $ echo "class Test { public static void main (String[] args) { System.out.println(\"Hello World\");}}" > Test.java 
2: $ hg status
3: ? Test.java
4: $ hg add Test.java
5: $ hg status
6: A Test.java
7: $ hg commit -m "Created initial Text class with 'Hello World' statement."
8: $ hg glog
9: @  changeset:   0:684afa0debd6
      tag:         tip
      user:        Firstname Lastname <firstname.lastname@company.com>
      date:        Fri Jan 29 17:31:04 2010 +0100
      summary:     Created initial Text class with 'Hello World' statement.
  • In line 1 a file with the name Test.java is created. This file contains a simple Java class that prints out "Hello World"
  • The command in line 2 checks the status of the repository, line 3 indicates (?) that there is a file that is not yet under version control.
  • The add command tells Mercurial to schedule this file for version control. If no file name is given, all files that are not yet under version control are added to the repository. This command prepares files for version control, but does not actually create a new version. This is done in line 7
  • The status command in line 5 now indicates that Test.java was added to the version control but was not yet committed.
  • Line 7 actually tells Mercurial to create a new version. A new version is created containing all changes to files that were already under version control since the last commit and all newly added file(s).
  • The glog command lists all versions (= changesets) in the repository.
  • Line 9 and following display the first changeset in the repository.

The user, date and summary fields should be clear. The changeset version is a little bit more tricky: As Mercurial is a distributed version control system, there is not one central server instance that can provide unique increasing version numbers. Hence Mercurial generates changeset ids that are unique ("684afa0debd6" in the example above) even over multiple users in cloned repositories. The counter ("0") is only a convenience counter that helps navigation within one repository and must not be used to identify changesets between clones of repositories.

Status of the Repository

To get an idea about the status the repository or the working copy is in, three commands are important to understand:

  • hg status: shows the status of files in the working directory, e.g. files that are not under revision control (?), modified (M), added (A), removed (R) or missing (!). The status command can also list the differences between revisions in a compact way, e.g. using hg status --rev 14:18.
  • hg (g)log: The log command shows the revision history of the repository. The glog command also displays a graphical outline of the history.
  • hg parent: the parent command shows the parent revision of the current working copy. Since hg update can set the working copy to an arbitrary revision, and hg pull can bring new changesets into the repository without touching the working copy, parent shows which revision the working copy is actually based on. In the glog output the parent is marked with the @ symbol.

Navigating in the Version History

An SCM system keeps track of changes in a project and also allows arbitrary versions from the project history to be restored. In Mercurial the hg update command is used to "navigate" in the version history. hg update without further parameters sets the working copy to the most recent version ("Tip") of the current branch. hg update --rev revision sets the working copy to an arbitrary revision. In the following example the Java class from the example above is modified and the update command is used to navigate in the history:

 1: $ echo "// Todo: add more functionality">> Test.java
 2: $ hg status
 3: M Test.java
 4: $ hg ci -m "Added todo to Test.java"
 5: $ hg glog
 6: @  changeset:   1:ce41ef368b17
    |  tag:         tip
    |  user:        Firstname Lastname <firstname.lastname@company.com>
    |  date:        Sun Feb 07 20:21:14 2010 +0100
    |  summary:     Added todo to Test.java
    |
    o  changeset:   0:684afa0debd6
       user:        Firstname Lastname <firstname.lastname@company.com>
       date:        Fri Jan 29 17:31:04 2010 +0100
       summary:     Created initial Text class with 'Hello World' statement.
 7: $ hg parent
 8: changeset:   1:ce41ef368b17
    tag:         tip
    user:        Firstname Lastname <firstname.lastname@company.com>
    date:        Sun Feb 07 20:21:14 2010 +0100
    summary:     Added todo to Test.java
 9: $ hg update 0
10: 1 files updated, 0 files merged, 0 files removed, 0 files unresolved
11: $ cat Test.java
12: class Test { public static void main (String[] args) { System.out.println("Hello World");}}
13: $ hg parent
14: changeset:   0:684afa0debd6
    user:        Firstname Lastname <firstname.lastname@company.com>
    date:        Fri Jan 29 17:31:04 2010 +0100
    summary:     Created initial Text class with 'Hello World' statement.
15: $ hg update     
16: 1 files updated, 0 files merged, 0 files removed, 0 files unresolved
17: $ cat Test.java
18: class Test { public static void main (String[] args) { System.out.println("Hello World");}}
    // Todo: add more functionality
19: $ hg parent
20: changeset:   1:ce41ef368b17
    tag:         tip
    user:        Firstname Lastname <firstname.lastname@company.com>
    date:        Sun Feb 07 20:21:14 2010 +0100
    summary:     Added todo to Test.java
  • Line 1-4: Add new line to file and commit change.
  • Line 5-6: Show repository history
  • Line 7-8: parent command indicates that the working copy is representing the "Tip", i.e. the most recent revision. The "@" sign in the glog history in line 6 also shows the parent.
  • Line 9: The working copy is set to revision 0, line 10 shows that one file was changed, lines 11-14 confirm that the Test.java file in the working directory is set back to revision 0.
  • Line 15-20: The update command is used (without parameters) to set the working copy back to the most recent version (Tip) of the repository.

Adding, (Re)moving and Ignoring Files

The hg add command was already explained above: it puts files under revision control and with the next commit these files are added to the repository. The "opposite" command is occasionally needed: hg forget removes files from the repository/version control, but does not delete the file from the working copy. forget also does not modify the history of the repository.

Additionally, Mercurial offers commands known from the Unix shell to remove files and to move files and directories: hg rm deletes files from the working copy and, with the next commit, stops tracking them (earlier revisions still contain them).

Moving files in the working copy with non-VCS tools like the Mac Finder or Windows Explorer is usually not a good idea. To many VCS systems this looks as if the old file was deleted and a new file was added in a different location. As a consequence, the change history of the file is lost or difficult to reconstruct. Hence it is always a good idea to use the commands the SCM system provides to move files and directories; Mercurial offers hg mv.

Mercurial also offers the addremove command, which adds all new files to revision control and removes all missing (i.e. deleted or moved) files from revision control. The --similarity switch can be particularly helpful: assume some files or directories were moved with non-Mercurial tools, e.g. with the Windows Explorer. On executing hg addremove --similarity 100, Mercurial checks all missing files and looks for newly added files that are 100% similar, i.e. identical. If such files are found, Mercurial records that they were actually moved. If the percentage value is set lower, minor changes in the files are tolerated as well. This command can be very helpful to "reconstruct" moves.

Finally, ignoring files is a very important feature. Generated files like .class files or target directories in Maven projects, or generated html-reports should never be put under revision control. First, they are redundant, secondly they clutter the version history! Every time a developer rebuilds a project potentially hundreds of files (all generated during the build) are "new" or "changed". If these files are under revision control, the version history gets messed up. Consider the following situation following the previous example:

 1: $ ls
 2: Test.java
 3: $ hg status
 4: 
 5: $ javac Test.java
 6: $ ls
 7: Test.class  Test.java
 8: $ hg status
 9: ? Test.class
10: $ echo "syntax: glob" >> .hgignore
11: $ echo "*.class" >> .hgignore
12: $ cat .hgignore 
13: syntax: glob
    *.class
14: $ hg status
15: ? .hgignore
16: $ hg add
17: adding .hgignore
18: $ hg commit -m "Created .hgignore and added *.class"
19: $ hg status
20:
  • Lines 1-4 show that Test.java is under revision control and all changes are committed, hence the status command shows no result. Test.java is also the only file in this directory.
  • In line 5 Test.java is compiled using javac, the result is a Test.class file.
  • Lines 6-9 indicate that a new file, Test.class was generated and Mercurial shows, that this file is not under revision control. Now, generated files should never be put under revision control hence:
  • Lines 10-13: Each repository usually has one .hgignore file in its root directory. All files listed there (or files that match a listed pattern, like *.class in this example) are ignored by Mercurial.
  • Lines 14-15: Mercurial already ignores the Test.class file, but indicates, that the .hgignore file is not yet under revision control.
  • In lines 16-18 this .hgignore file itself is put under revision control.
  • The status command in line 19 now shows no result, i.e. the Test.class file and all other .class files will be ignored from now on.

Usually all repositories have a .hgignore file; the files or file patterns to be ignored are listed in it, one file or pattern per line.

Sharing Changes

In most projects many developers are collaborating. Distributed SCMs allow a broad variety of collaboration patterns. In this short tutorial only two common collaboration patterns are outlined:

  1. Collaboration using one central repository on a server (company server, Google Code, Bitbucket, ... or the built-in Mercurial webserver).
  2. "Peer to peer" collaboration without a central server; changesets are exchanged via email.

(1) Collaboration with a central repository: The concept of cloning was already explained in a previous section. Cloned repositories share a common history, and changes in one of them can be exchanged with the others using the push and pull commands. A simple way to create a central company or workgroup repository is to create a Mercurial repository on a Linux server to which all developers have ssh access. This central repository can then be cloned using the clone command as described above. For Open Source projects, services like Google Code or Sourceforge can also be used. Commercial projects can use services like Bitbucket as a central repository.

Developers now make changes (commits) to their local repositories. These changes can be "pushed" from the local repository to the server using the hg push command. A Mercurial repository "remembers" where it was cloned from, hence push without a parameter pushes the changes to the parent repository.

hg pull pulls new changesets from the parent (central) repository to the local repository. Warning: pull loads new changesets into the repository, but does not change the working directory, i.e. in most cases hg update should be done after the pull. Alternatively the command hg pull -u can be executed. Pull with this switch automatically updates to the Tip after loading the new changesets.

hg outgoing and hg incoming are two very useful commands. outgoing shows which changesets would be sent to the server if push were executed, i.e. the changesets in the local repository that have not yet been pushed to the parent repository. incoming does the same for the other direction: it shows the changesets that would be loaded if pull were executed. incoming and outgoing make no changes, neither to the local nor to the associated repository.

The hg serve command can be very useful in certain situations: it starts a small webserver at port 8000 and allows (a) to browse the repository with a web-browser and (b) for others to pull changes from this address. Attention though: this included simple server has no authorisation, i.e. everyone that has network access to the machine/port can access the project.

(2) Peer to peer collaboration: Distributed SCMs also allow easy collaboration without central repositories: The general idea is to share changesets that are available in the repository of one developer with others, e.g. via email:

  • The hg export command exports one changeset as a text file, also called a patch. This patch can be sent to other developers.
  • The hg import command imports a patch into the local repository.
  • The hg bundle command is more elaborate: it packs a set of changesets into one compressed file. This file can be sent to other developers and imported into their repositories using the hg pull command.
  • The before-mentioned hg serve command can be used for ad hoc collaboration.

One recommendation: patch imports should always be done in a local clone. In case of a problem, the main repository is not affected; in case of success, the changes can be pushed to one's own main repository. The following example outlines such a collaboration scenario:

Let's assume developer A created a new project in his local repository RA, which initially contained three revisions. This repository was transferred via USB stick to developer B. Later, A makes three commits in RA and wants to share these with developer B via email:

Developer A:

 1: $ hg glog
 2: @  changeset:   5:75039d11a746
    |  tag:         tip
    |  summary:     changed c
    |
    o  changeset:   4:1459c7e874c7
    |  summary:     changed b
    |
    o  changeset:   3:952599ea532b
    |  summary:     changed a
    |
    o  changeset:   2:a5f1a9f158bd
    |  summary:     added c
    |
    o  changeset:   1:28a2fd44926b
    |  summary:     added b
    |
    o  changeset:   0:65abfa371065
       summary:     added a
 3: $ hg bundle --base 2 changes.hg
 4: 3 changesets found
 5: (email changes.hg to developer B)

Developer B (received bundle "changes.hg" via email):

 6: $ hg glog
 7: @  changeset:   2:a5f1a9f158bd
    |  summary:     added c
    |
    o  changeset:   1:28a2fd44926b
    |  summary:     added b
    |
    o  changeset:   0:65abfa371065
       summary:     added a
 8: $ hg pull changes.hg
 9: pulling from changes.hg
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 3 changesets with 3 changes to 3 files
    (run 'hg update' to get a working copy)
10: $ hg update
11: $ hg glog
12: @  changeset:   5:75039d11a746
   |  tag:         tip
   |  summary:     changed c
   |
   o  changeset:   4:1459c7e874c7
   |  summary:     changed b
   |
   o  changeset:   3:952599ea532b
   |  summary:     changed a
   |
   o  changeset:   2:a5f1a9f158bd
   |  summary:     added c
   |
   o  changeset:   1:28a2fd44926b
   |  summary:     added b
   |
   o  changeset:   0:65abfa371065
      summary:     added a
  • Line 1-2: Developer A has 6 changesets in his repository; he knows that developer B only has the first three revisions. Revisions 3-5 are new and should be shared.
  • Line 3-5: Developer A creates a bundle (file) that contains changesets 3-5 and sends this bundle (e.g. via email) to developer B.
  • Line 6-7: Developer B currently has only the first three revisions and receives the bundle via email.
  • Line 8-12: Developer B pulls the changesets from the bundle into her repository.

Instead of the bundle command, export and import can also be used. However, for sharing multiple revisions bundle is much easier to use and more "resilient": pull only imports those changesets from the bundle that are not yet in the repository.

Again a warning here: the short revision numbers (counting 0, 1, 2, ...) are only for convenience use within one repository. They must not be used to identify revisions between clones of repositories. For this purpose always the long ID has to be used!

The Version History: Working with Revisions

The log and glog as well as the update command were already outlined in the previous sections. But there is a set of other commands that are helpful in working with revisions:

The hg annotate command can be very helpful to assess changes in a file:

1: $ hg annotate -u -n -dq Test.java 
2:     Lisa 2 2010-02-09: class Test { 
       Lisa 2 2010-02-09: 	public static void main (String[] args) { 
  alexander 3 2010-02-09: 		System.out.println("Hello Java World");
       Lisa 2 2010-02-09: 	}
  alexander 3 2010-02-09: }
  alexander 1 2010-02-07: // Todo: add more functionality

The annotate command in line 1 is executed on the Test.java file with some switches: -u to display the name of the user who changed a line, -n to display the short revision counter, -dq to print the date in short form. The result is the content of the Test.java file with annotations: who changed a line in which revision at what date.

Another important command is hg diff: This command can be used to show differences of the project between revisions. For example:

 1: $ hg diff -r 2 -r 3
 2: diff -r 7eb91dedc66a -r 5ba8ca403e87 Test.java
 3: --- a/Test.java	Tue Feb 09 12:04:00 2010 +0100
 4: +++ b/Test.java	Tue Feb 09 12:04:55 2010 +0100
 5: @@ -1,5 +1,6 @@
 6: class Test { 
 7:    	public static void main (String[] args) { 
 8: -		System.out.println("Hello World");}
 9: +		System.out.println("Hello Java World");
10:    	}
11: +}
12: // Todo: add more functionality
  • Line 1 executes the diff command and compares revision 2 with revision 3. The following lines are the output of the diff command.
  • Line 2 shows the exact hexadecimal revision identifier and the file (Test.java) that has been changed.
  • Line 3 - 5 provide more details on the change, e.g. the dates.
  • Lines 6-12 show the changes in the file plus some additional unchanged lines as context around the changed lines
  • Lines 6, 7, 10 and 12 were unchanged between these revisions
  • Line 8 was replaced (-) with line 9 (+), and line 11 was added (+).

The hg revert command can be used to revert changes in the working copy to an earlier version from the version history. For example, assume the file Test.java is edited, but after saving the file we realise that we made a mistake and would like to revert the changes:

 1: $ cat Test.java
 2: class Test { 
     	public static void main (String[] args) { 
     		System.out.println("Hello Java World");
     	}
     }
     // Todo: add more functionality
 3: $ echo "bad line" >> Test.java
 4: $ cat Test.java
 5: ...
    }
    // Todo: add more functionality
    bad line
 6: $ hg status
 7: M Test.java
 8: $ hg revert Test.java
 9: $ ls
10: Test.java  Test.java.orig
11: $ rm Test.java.orig

First a "bad change" is made to Test.java in line 3. Line 4 and 5 show the added line to the file. This change is reverted by the command in line 8. The revert command (1) reverts the state of Test.java and (2) creates a new file Test.java.orig that contains the bad change, in case parts of the change should be kept. If this is not necessary, Test.java.orig can be deleted and the "bad change" is reverted.

Finally, the hg backout command can be used to undo the effect of an earlier changeset. The history is not rewritten: Mercurial tries to reverse the given changeset and records the reversal as a new commit. Warning: the backout command can be tricky, particularly for beginners. It is highly recommended to make a clone of the repository before trying out this command. Backing out changes also involves branching and merging, which are explained in the next section. Nevertheless, here is one backout example:

 1: $ hg init
 2: $ touch a.txt
 3: $ hg add a.txt
 4: adding a.txt
 5: $ hg commit -m "added a.txt"
 6: $ touch b.txt
 7: $ hg add b.txt
 8: adding b.txt
 9: $ hg commit -m "added b.txt"
10: $ touch c.txt
11: $ hg add c.txt
12: adding c.txt
13: $ hg commit -m "added c.txt"
14: $ hg glog
15: @  changeset:   2:74ffff5e37d6
    |  tag:         tip
    |  summary:     added c.txt
    |
    o  changeset:   1:95ee193a510f
    |  summary:     added b.txt
    |
    o  changeset:   0:39dcafad465b
       summary:     added a.txt
16: $ hg backout 1 -m "Backing out Revision 95ee193a510f"
17: removing b.txt
    Backed out changeset 95ee193a510f
    created new head
    changeset 3:b735c1b1f4d4 backs out changeset 1:95ee193a510f
    the backout changeset is a new head - do not forget to merge
18: $ hg glog
19: o  changeset:   3:b735c1b1f4d4
    |  tag:         tip
    |  parent:      1:95ee193a510f
    |  summary:     Backed out changeset 95ee193a510f
    |
    | @  changeset:   2:74ffff5e37d6
    |/  summary:     added c.txt
    |
    o  changeset:   1:95ee193a510f
    |  summary:     added b.txt
    |
    o  changeset:   0:39dcafad465b
       summary:     added a.txt
20: $ hg merge
21: 0 files updated, 0 files merged, 1 files removed, 0 files unresolved
    (branch merge, don't forget to commit)
22: $ hg commit -m "Merged Backout"
23: $ hg glog
24: @    changeset:   4:f665faf1f560
    |\   tag:         tip
    | |  parent:      2:74ffff5e37d6
    | |  parent:      3:b735c1b1f4d4
    | |  summary:     Merged Backout
    | |
    | o  changeset:   3:b735c1b1f4d4
    | |  parent:      1:95ee193a510f
    | |  summary:     Backed out changeset 95ee193a510f
    | |
    o |  changeset:   2:74ffff5e37d6
    |/   summary:     added c.txt
    |
    o  changeset:   1:95ee193a510f
    |  summary:     added b.txt
    |
    o  changeset:   0:39dcafad465b
       summary:     added a.txt

To execute a backout, a merge operation is needed. Branches, merge and rebase are explained in the next sections.

Branches and Tags

A tag is useful to mark a certain version in the repository. It is a convenience function, like a bookmark in a web browser. Technically speaking, tags are not necessary, as version IDs are unique identifiers of versions; one could simply keep an external list of IDs that have a "special meaning". Yet it is convenient to tag certain revisions with names, e.g. "Version_1.2", using the tag command in line 1:

1: $ hg tag -r 143 Version_1.2
2: $ hg tags
3: tip             254:708b52826a7c
   Version_1.2     143:d8eaf2c503d7
   Version_1.0      98:073887fbd616
4: $ hg update Version_1.2

The tags command (line 2) lists all available tags, and a tag name can be used to set the working directory to the tagged version (line 4). A tag can be removed with hg tag --remove Version_1.2.

Branches are parallel lines of development. Consider this example: Version 1 of a software was developed, then the development proceeds to version 2. Now a security issue is detected in version 1 which has to be maintained as customers still use it, hence a bugfix is written for version 1, creating version 1.1 leading to this situation:

    
    o  Version 2.0       
    |       
    |       
    o       
    |       
    | o Version 1.1
    | |    
    o |  
    | |    
    |/   
    |
    o  Version 1
    |
    |
    o 

Generally speaking, branching is a more complex topic than the rather simple usage patterns described so far. Distributed SCMs offer several options for dealing with branches. Steve Losh wrote a comprehensive and easy-to-understand article about branching in Mercurial and Git.

There are two main options:

  1. Cloned repositories: As soon as a repository is cloned and both clones are changed, you have in fact created a branch. This means, each branch lives in a separate repository.
  2. Create branches within one repository.

Both approaches have advantages and disadvantages. Option (1) is easy to do and easy to understand, yet one has to deal with a series of repositories, which is not always convenient. Option (2) keeps all branches in one repository, which can be a little harder to understand. Steve Losh analyses the advantages and disadvantages in detail in the article mentioned above. Here only a brief example of branching within a repository is given:

 1: $ hg init
 2: $ touch a.txt
 3: $ hg add
 4: adding a.txt
 5: $ hg commit -m "added a.txt"
 6: $ touch b.txt
 7: $ hg add
 8: adding b.txt
 9: $ hg commit -m "added b.txt"
10: $ hg tag -r 0 V_1.0
11: $ hg tag -r 1 V_2.0
12: $ hg glog
13: @  changeset:   3:6a562cba417e
    |  tag:         tip
    |  summary:     Added tag V_2.0 for changeset dd4ce84acde5
    |
    o  changeset:   2:95d33ae73048
    |  summary:     Added tag V_1.0 for changeset 984077fd8bf0
    |
    o  changeset:   1:dd4ce84acde5
    |  tag:         V_2.0
    |  summary:     added b.txt
    |
    o  changeset:   0:984077fd8bf0
       tag:         V_1.0
       summary:     added a.txt
14: $ hg update V_1.0
15: $ hg branch V_1.x
16: marked working directory as branch V_1.x
17: $ touch c.txt
18: $ hg add
19: adding c.txt
20: $ hg commit -m "Bugfix, added c.txt"
21: created new head
22: $ hg glog
23: @  changeset:   4:3a4f80f10c62
    |  branch:      V_1.x
    |  tag:         tip
    |  parent:      0:984077fd8bf0
    |  summary:     Bugfix, added c.txt
    |
    | o  changeset:   3:6a562cba417e
    | |  summary:     Added tag V_2.0 for changeset dd4ce84acde5
    | |
    | o  changeset:   2:95d33ae73048
    | |  summary:     Added tag V_1.0 for changeset 984077fd8bf0
    | |
    | o  changeset:   1:dd4ce84acde5
    |/   tag:         V_2.0
    |    summary:     added b.txt
    |
    o  changeset:   0:984077fd8bf0
       tag:         V_1.0
       summary:     added a.txt
24: $ hg update default
25: 2 files updated, 0 files merged, 1 files removed, 0 files unresolved
26: $ ls
27: a.txt  b.txt
28: $ hg branches
29: V_1.x                          4:3a4f80f10c62
    default                        3:6a562cba417e
30: $ hg update V_1.x
31: 1 files updated, 0 files merged, 2 files removed, 0 files unresolved
32: $ ls
33: a.txt  c.txt
  • Lines 1-9: A repository is created and two files are added in two commits.
  • Lines 10-11: Two tags are set: revision 0 is Version 1.0 of our "product" and revision 1 is Version 2.0. Each tag command creates a changeset of its own.
  • Lines 12-13 show the current revision history of the repository.
  • Now we have to patch Version 1 (bugfix): line 14 sets the working directory back to Version 1.0.
  • Line 15 declares that the next commit starts a new (named) branch; line 16 confirms this.
  • Lines 17-20: The "bugfix" is made by adding file c.txt and committing the change.
  • Line 21 reports that a new "head" was created, i.e. the repository now has a second line of development.
  • Lines 22-23 show the current revision history with the two branches.
  • Line 24 updates to the "default" branch, and lines 26-27 show the expected state with the two files a.txt and b.txt.
  • Lines 28-29: The hg branches command lists all branches in the repository, in this case two: default and V_1.x.
  • Lines 30-33: The update command switches back to branch V_1.x and shows the expected result: b.txt does not exist, but the "bugfix" c.txt does.
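
The effect of hg branches and hg update in this listing can be mimicked with a small toy model. The following Python sketch is not how Mercurial stores its data; the history table and the helper names branch_tips and files_at are hypothetical and only assume that every changeset knows its parent, its branch name and the files it added.

```python
# revision -> (parent revision, branch name, files added in that changeset)
# The two tag changesets (2 and 3) only touch the hidden .hgtags file,
# which is ignored in this toy model.
history = {
    0: (None, "default", ["a.txt"]),
    1: (0, "default", ["b.txt"]),
    2: (1, "default", []),
    3: (2, "default", []),
    4: (0, "V_1.x", ["c.txt"]),
}


def branch_tips(history):
    """Map each branch name to its highest revision, in the spirit of 'hg branches'."""
    tips = {}
    for rev in sorted(history):
        tips[history[rev][1]] = rev
    return tips


def files_at(history, rev):
    """Collect the files visible at a revision by walking back through its ancestors."""
    files = set()
    while rev is not None:
        parent, _branch, added = history[rev]
        files.update(added)
        rev = parent
    return sorted(files)


tips = branch_tips(history)
print(tips)                                 # {'default': 3, 'V_1.x': 4}
print(files_at(history, tips["default"]))   # ['a.txt', 'b.txt']
print(files_at(history, tips["V_1.x"]))     # ['a.txt', 'c.txt']
```

Because changeset 4 has revision 0 as its parent, walking back from the tip of V_1.x never reaches the changeset that added b.txt, which is why that file is missing on the bugfix branch.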

The principles of branching should be clear from this short example. Branches are also used to introduce more complex features without disrupting the rest of the development:

E.g., in a web application one developer has to make significant changes to the persistence part of the application, which takes several days. During this work it is highly likely that other parts of the application are temporarily broken. Hence the developer creates a feature branch, which is isolated from the development efforts of the rest of the team. As soon as the feature is finished, the two lines of development are integrated (merged) again, as described in the next section.

(From the technical point of view a feature branch is a branch as described above.)

Merging and Rebasing

Branches can be planned, as described in the previous section (e.g. for feature development or for maintaining multiple versions), or arise as a consequence of team members working in parallel. In both cases the branches eventually have to be merged into one common line of development. For this purpose the merge and rebase commands can be used. For example, assume the central repository is in this state:

@  changeset:   1:c0040aa5dff5
|  tag:         tip
|  user:        Anne
|  summary:     added b.txt
|
o  changeset:   0:00ebb2c92bb2
   user:        Anne
   summary:     added a.txt

So Anne created the first two changesets. Pete and Francis clone this repository and do their own work. Directly after cloning, the three repositories are of course identical. Then Pete adds the file c.txt, so his local repository looks like this:

@  changeset:   2:118d62ddb905
|  tag:         tip
|  user:        Pete
|  summary:     added c.txt
|
o  changeset:   1:c0040aa5dff5
|  user:        Anne
|  summary:     added b.txt
|
o  changeset:   0:00ebb2c92bb2
   user:        Anne
   summary:     added a.txt

Francis on the other hand added the file d.txt, hence her local repository looks like this:

@  changeset:   2:f0d318f6b0d1
|  tag:         tip
|  user:        Francis
|  date:        Fri Feb 19 15:22:32 2010 +0100
|  summary:     added d.txt
|
o  changeset:   1:c0040aa5dff5
|  user:        Anne
|  date:        Fri Feb 19 15:10:18 2010 +0100
|  summary:     added b.txt
|
o  changeset:   0:00ebb2c92bb2
   user:        Anne
   date:        Fri Feb 19 15:10:07 2010 +0100
   summary:     added a.txt

This example shows that the short revision numbers are only valid within one repository, whereas the hash IDs identify a changeset across repositories. Revisions 0 and 1 are identical in all three repositories, e.g. revision 1 has the hash ID c0040aa5dff5 everywhere. Revision number 2, on the other hand, has the hash ID 118d62ddb905 in Pete's repository and f0d318f6b0d1 in Francis's repository, indicating that these are different changesets!
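
Why hash IDs agree across clones while revision numbers do not can be illustrated with a toy model. The Python sketch below is not Mercurial's actual hashing scheme; the changeset_id helper and the Repo class are hypothetical and only assume that an ID is derived from the parent ID and the changeset data, while the revision number is merely the position in the local repository.

```python
import hashlib


def changeset_id(parent_id, user, summary):
    """Derive a short ID from the parent ID and the changeset data (toy scheme)."""
    data = f"{parent_id}|{user}|{summary}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()[:12]


class Repo:
    """A linear toy repository: the list index plays the role of the revision number."""

    def __init__(self, changesets=None):
        self.changesets = list(changesets or [])

    def commit(self, user, summary):
        parent = self.changesets[-1] if self.changesets else "null"
        self.changesets.append(changeset_id(parent, user, summary))

    def clone(self):
        return Repo(self.changesets)


central = Repo()
central.commit("Anne", "added a.txt")
central.commit("Anne", "added b.txt")

pete = central.clone()
francis = central.clone()
pete.commit("Pete", "added c.txt")
francis.commit("Francis", "added d.txt")

# Revisions 0 and 1 carry the same ID in every clone ...
print(pete.changesets[1] == francis.changesets[1])  # True
# ... but local revision number 2 names a different changeset in each clone.
print(pete.changesets[2] == francis.changesets[2])  # False
```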

Now Pete pushes his changes to the central repository using hg push, so changeset 118d62ddb905 ends up in the central repository. Later Francis also wants to push her changes, but notices that the central repository already contains a change that she does not have locally (the changeset Pete made). She therefore first has to pull the unknown changes into her local repository, merge them with her own work and finally push the merged version:

 1: $ hg incoming
 2: searching for changes
    changeset:   2:118d62ddb905
    tag:         tip
    user:        Pete
    summary:     added c.txt
 3: $ hg pull
 4: added 1 changesets with 1 changes to 1 files (+1 heads)
 5: $ hg glog
 6: o  changeset:   3:118d62ddb905
    |  tag:         tip
    |  parent:      1:c0040aa5dff5
    |  user:        Pete
    |  summary:     added c.txt
    |
    | @  changeset:   2:f0d318f6b0d1
    |/   user:        Francis
    |    summary:     added d.txt
    |
    o  changeset:   1:c0040aa5dff5
    |  user:        Anne
    |  summary:     added b.txt
    |
    o  changeset:   0:00ebb2c92bb2
       user:        Anne
       summary:     added a.txt
 7: $ hg merge
 8: 1 files updated, 0 files merged, 0 files removed, 0 files unresolved
    (branch merge, don't forget to commit)  
 9: $ hg ci -m "Merged Branches"
10: $ hg glog
11: @    changeset:   4:be47f1fca4d5
    |\   tag:         tip
    | |  parent:      2:f0d318f6b0d1
    | |  parent:      3:118d62ddb905
    | |  user:        Francis
    | |  summary:     Merged Branches
    | |
    | o  changeset:   3:118d62ddb905
    | |  parent:      1:c0040aa5dff5
    | |  user:        Pete
    | |  summary:     added c.txt
    | |
    o |  changeset:   2:f0d318f6b0d1
    |/   user:        Francis
    |    summary:     added d.txt
    |
    o  changeset:   1:c0040aa5dff5
    |  user:        Anne
    |  summary:     added b.txt
    |
    o  changeset:   0:00ebb2c92bb2
       user:        Anne
       summary:     added a.txt
12: $ ls
13: a.txt  b.txt  c.txt  d.txt
14: $ hg push
  • Line 1: Before pushing changesets, the incoming command should be used to check whether someone else has pushed changes in the meantime. The output in line 2 shows that Pete made changes that are not yet in the local repository.
  • Lines 3-6: Pete's changeset is pulled into the local repository, and the glog command illustrates that this created a branch: one branch is Pete's line of development, the other one Francis's.
  • Lines 7-9: These branches are merged using the merge command, and the merge is then committed.
  • Lines 10-11 show that the two branches have been merged in changeset 4, which has two parents.
  • Lines 12-13: The ls command confirms that Francis now has all four files in her working directory: a.txt and b.txt created by Anne, c.txt created by Pete and d.txt created by Francis herself.
  • Line 14: Having merged the two lines of development, Francis can now push the changesets back to the central repository.
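
What the merge does to the revision graph can be sketched with a few lines of Python. This is a toy model under the assumption that a repository is just a mapping from changeset ID to parent IDs; the heads helper is hypothetical and not part of Mercurial's API. The IDs are taken from the listing above.

```python
def heads(parents_of):
    """Return the changesets that no other changeset names as a parent."""
    referenced = {p for parents in parents_of.values() for p in parents}
    return sorted(cs for cs in parents_of if cs not in referenced)


# Francis's repository after 'hg pull': changeset ID -> tuple of parent IDs.
graph = {
    "00ebb2c92bb2": (),
    "c0040aa5dff5": ("00ebb2c92bb2",),
    "f0d318f6b0d1": ("c0040aa5dff5",),   # Francis: added d.txt
    "118d62ddb905": ("c0040aa5dff5",),   # Pete: added c.txt
}
heads_after_pull = heads(graph)
print(heads_after_pull)   # ['118d62ddb905', 'f0d318f6b0d1']

# 'hg merge' followed by a commit adds one changeset with two parents.
graph["be47f1fca4d5"] = ("f0d318f6b0d1", "118d62ddb905")
heads_after_merge = heads(graph)
print(heads_after_merge)  # ['be47f1fca4d5']
```

After the pull there are two heads, which corresponds to the "(+1 heads)" message in line 4; the merge changeset joins them into a single head again.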

Merging operations can be a little tricky at times, so it is recommended to try them first in a clone of your own repository.

It is often recommended not to use the merge command for scenarios like the one above, as the consequence can be a cluttered and unclear revision history. Merge should rather be reserved for merging longer-lived branches. For short-lived parallel work, as in the example above, the rebase command is usually the better choice. Rebase rewrites the history in the local repository and creates a "flat" line of revisions. The command is provided by the rebase extension, which is part of the Mercurial distribution but has to be enabled in the .hgrc file like this:

    [extensions]
    hgext.graphlog = 
    hgext.purge = 
    hgext.rebase = 

Now let's redo the previous example but with rebase instead of merge:

 1: $ hg incoming
 2: searching for changes
    changeset:   2:118d62ddb905
    tag:         tip
    user:        Pete
    summary:     added c.txt
 3: $ hg pull
 4: added 1 changesets with 1 changes to 1 files (+1 heads)
 5: $ hg glog
 6: o  changeset:   3:118d62ddb905
    |  tag:         tip
    |  parent:      1:c0040aa5dff5
    |  user:        Pete
    |  summary:     added c.txt
    |
    | @  changeset:   2:f0d318f6b0d1
    |/   user:        Francis
    |    summary:     added d.txt
    |
    o  changeset:   1:c0040aa5dff5
    |  user:        Anne
    |  summary:     added b.txt
    |
    o  changeset:   0:00ebb2c92bb2
       user:        Anne
       summary:     added a.txt
 7: $ hg rebase
 8: added 3 changesets with 3 changes to 3 files
    rebase completed
 9: $ hg glog
10:  o  changeset:   3:f0d318f6b0d1
     |  user:        Francis
     |  summary:     added d.txt
     |
     o  changeset:   2:118d62ddb905
     |  user:        Pete
     |  summary:     added c.txt
     |
     o  changeset:   1:c0040aa5dff5
     |  user:        Anne
     |  summary:     added b.txt
     |
     o  changeset:   0:00ebb2c92bb2
        user:        Anne
        summary:     added a.txt
    
11: $ ls
12: a.txt  b.txt  c.txt  d.txt
13: $ hg push

This example follows the previous merge example up to line 6. In line 7 the rebase command is used instead of merge and commit: it "linearises" the revision history, i.e. it puts Francis's changeset after Pete's changeset. This is easier to read and understand than the merged graph, particularly when a lot of parallel work is going on. Note that rebase recreates the rebased changeset on top of its new parent, so in practice it is assigned a new hash ID. For this reason it is very important not to rebase changesets that have already been shared with other repositories!
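
The warning about shared changesets can be made plausible with a toy calculation. The Python sketch below assumes, as a simplification, that a changeset ID is a hash over the parent ID and the changeset data; the changeset_id helper is hypothetical and does not reproduce Mercurial's real format.

```python
import hashlib


def changeset_id(parent_id, user, summary):
    """Toy ID: a hash over the parent ID and the changeset data."""
    data = f"{parent_id}|{user}|{summary}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()[:12]


base = changeset_id("null", "Anne", "added b.txt")
petes = changeset_id(base, "Pete", "added c.txt")

# Francis committed on top of Anne's changeset ...
before = changeset_id(base, "Francis", "added d.txt")
# ... and rebasing replays the same change on top of Pete's changeset.
after = changeset_id(petes, "Francis", "added d.txt")

print(before != after)  # True: same change, new parent, therefore a new ID
```

A clone that already received the changeset under its old ID would treat the rebased one as a second, unrelated changeset, so the same change would show up twice after the next pull.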

The previous examples were rather simple insofar as there were no conflicting changes. If two or more people change the same part of a file, Mercurial reports a conflict during merge or rebase. After the conflict has been fixed by hand, the resolve command is used to mark it as resolved, for instance:

 1: $ hg pull
 2: $ hg glog
 3: o  changeset:   6:295b143c7c17
    |  tag:         tip
    |  parent:      4:49795916bc6f
    |  user:        Pete
    |  summary:     changed e.txt
    |
    | @  changeset:   5:4739ebd8c16e
    |/   user:        Francis
    |    summary:     changed e.txt
    |
    o  changeset:   4:49795916bc6f
    |  user:        Francis
    |  summary:     added e.txt
    |
 4: $ hg rebase
 5: merging e.txt
    warning: conflicts during merge.
    merging e.txt failed!
    abort: fix unresolved conflicts with hg resolve then run hg rebase --continue
 6: $ ls
 7: a.txt  b.txt  c.txt  d.txt  e.txt  e.txt.orig
 
Open e.txt in a text editor, resolve conflict and save it. 
 
 8: $ rm e.txt.orig
 9: $ hg resolve --mark e.txt
10: $ hg rebase --continue
11:  added 2 changesets with 2 changes to 1 files
     rebase completed
12: $ hg glog 
13: @  changeset:   6:4739ebd8c16e
    |  tag:         tip
    |  user:        Francis
    |  summary:     changed e.txt
    |
    o  changeset:   5:295b143c7c17
    |  user:        Pete
    |  summary:     changed e.txt
    |
    o  changeset:   4:49795916bc6f
    |  user:        Francis
    |  summary:     added e.txt
  • Lines 1-4: Again a changeset is pulled into the local repository and a rebase is started. This time, the changes are conflicting.
  • Line 5: Rebase reports that the conflicting changes have to be resolved before the rebase can be finished.
  • Lines 6-7: Rebase tried to merge the changes in e.txt and left a file e.txt.orig that contains the content before the merge.
  • Now Francis opens e.txt in an editor and resolves the conflict. e.txt.orig can be consulted while doing so, but is no longer needed afterwards and is therefore removed (line 8).
  • Line 9: The resolve command marks the conflict as resolved.
  • Lines 10-13: The rebase is continued and completed.
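
When a merge can be done automatically and when it ends in a conflict can be illustrated with a very reduced three-way merge. The Python sketch below is not the algorithm Mercurial or its merge tools use; the merge3 function is hypothetical and assumes, for simplicity, that all three versions of the file have the same number of lines.

```python
def merge3(base, mine, theirs):
    """Merge line by line; return (merged lines, indices of conflicting lines).

    Toy assumption: all three versions have the same number of lines.
    """
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:      # both sides agree, or only my side changed
            merged.append(m)
        elif m == b:              # only the other side changed
            merged.append(t)
        else:                     # both sides changed the line differently
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


base = ["title", "first line", "second line"]
francis = ["title", "first line (Francis)", "second line"]
pete_ok = ["title", "first line", "second line (Pete)"]
pete_clash = ["title", "first line (Pete)", "second line"]

print(merge3(base, francis, pete_ok))
# (['title', 'first line (Francis)', 'second line (Pete)'], [])
print(merge3(base, francis, pete_clash)[1])  # [1]
```

Changes to different lines are combined without help, whereas two different changes to the same line are reported as a conflict that a person has to resolve, as Francis does in the listing above.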

Depending on the operating system and the tools installed, various merge tools can be used to support the resolving step. The merge tool that Mercurial uses by default also differs between operating systems. Please consult the Mercurial documentation on how to change the merge tool settings.

Tools

hg serve, Eclipse Plugins, TortoiseHG

Collaboration Platforms

Google Code, Bitbucket

Further Information

There is a lot of additional tutorial and reference documentation for Mercurial; some good references are listed here: