Scientific Software

Ten best practices for scientific software development.


Version Control: Beyond the Basics

 

Utilizing advanced version control features like forks, branches, and tags allows much greater flexibility for maintaining scientific software.

Tracking changes to a code base is a good start, but it uses only a small part of what version control software can do. Version control can go far beyond recording incremental revisions over time.

This page will briefly cover some of the key concepts, and provide links to more information.

Keep in mind that, depending on what version control software you're using, specific terms can have differing meanings. The terms used here are intended to be as general as possible.

Trunk and Branches

While models for version control and attitudes toward legacy software have changed substantially over time, it is still useful to start by talking about some classic models for version control. One of these is the trunk-branch model.

The term "branch" comes from a model of a repository that looks something like a tree: you have the main "trunk" of the tree, which is where most of the tree's growth (that is, code development) happens. Sprouting from the sides of the trunk are branches, each of which carries a larger or smaller share of the growth (development work). These branches may start early and exist for a long time, or they may be very short-lived. You can even have branches coming off of branches.

The basic idea, then, is that you can start with a single broad code base, and developers can decide they're going to branch off and work on a specific feature of the code, independently of the main development branch.
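To make the trunk-and-branch picture concrete, here is a minimal Python sketch. It is a toy model only: the class `ToyRepo`, its methods, and the revision labels are hypothetical names invented for this illustration, and no real version control tool stores history this way.

```python
class ToyRepo:
    """Toy model: each line of development is a named list of revisions."""

    def __init__(self):
        # The repository starts with a single, empty trunk.
        self.lines = {"trunk": []}

    def commit(self, line, label):
        """Record a new revision on the given line of development."""
        self.lines[line].append(label)

    def branch(self, source, new_line):
        """Start a new line that shares the source's history so far."""
        self.lines[new_line] = list(self.lines[source])


repo = ToyRepo()
repo.commit("trunk", "r1")
repo.commit("trunk", "r2")

repo.branch("trunk", "feature-x")        # branch off of the trunk
repo.commit("feature-x", "r3-feature")   # feature work, independent of trunk
repo.commit("trunk", "r3-trunk")         # trunk development continues

repo.branch("feature-x", "experiment")   # a branch coming off of a branch
```

After these steps the trunk and the feature branch share the revisions `r1` and `r2`, and then diverge: each has a third revision that the other does not.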

There is a nice explanation of Subversion repository layout in the SVN Book (at red-bean.com). This is an extensive O'Reilly book about Subversion that is published in its entirety online.

Tags

A tag, as the name implies, is a label that you attach to the code at a specific point in its history, so that you can refer back to that point by name. Think of it like freezing the code at a particular point in time, and saving that snapshot so that you can re-use it later.

This is useful if you reach a milestone in the code - say, version 1.0, or version 2.0, etc.

You can also add tags to branches. As you will see, the repository model for Git is a bit more sophisticated, so tags take on a more general meaning, but the idea is still the same.
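The "frozen snapshot" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular tool: the class `TaggedHistory`, its methods, and the file contents are all hypothetical.

```python
class TaggedHistory:
    """Toy model: an ordered list of snapshots plus named tags."""

    def __init__(self):
        self.snapshots = []   # one dict of {filename: contents} per revision
        self.tags = {}        # tag name -> index into self.snapshots

    def commit(self, files):
        """Store a copy of the files and return the new revision number."""
        self.snapshots.append(dict(files))
        return len(self.snapshots) - 1

    def tag(self, name, revision):
        """Attach a name to a revision; in this sketch a tag never moves."""
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists")
        self.tags[name] = revision

    def checkout(self, name):
        """Return the snapshot that the tag refers to."""
        return self.snapshots[self.tags[name]]


history = TaggedHistory()
first = history.commit({"model.py": "dt = 0.1"})
history.tag("v1.0", first)                   # milestone reached: freeze it
history.commit({"model.py": "dt = 0.05"})    # development continues

release = history.checkout("v1.0")           # still the code as it was at v1.0
```

The design choice worth noticing is that the tag stores a reference to a revision rather than a copy of the code, and that later commits do not change what the tag refers to.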

Forks

While it depends on the context, a fork is generally someone else making a copy of your repository in order to modify it - that is, they're branching off of your code. Usually they will have the intention of making changes and submitting those changes back to you so that you can incorporate them into the original code base, although not always.
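As a rough sketch of that workflow, a fork can be pictured as an independent copy of a repository whose changes may later be offered back. The dictionaries, owner names, and revision labels below are hypothetical, and real hosting services handle the "submit back" step with their own review mechanisms.

```python
import copy

# The original repository (hypothetical owner and revisions).
upstream = {"owner": "alice", "history": ["r1", "r2"]}

# Forking: someone else takes an independent copy in order to modify it.
fork = copy.deepcopy(upstream)
fork["owner"] = "bob"
fork["history"].append("r3-bob-fix")

# The original is unaffected by work done in the fork.
upstream_before = list(upstream["history"])

# Submitting back: the fork's new revisions are proposed to the original,
# and the original's owner chooses to incorporate them.
proposed = [rev for rev in fork["history"] if rev not in upstream["history"]]
upstream["history"].extend(proposed)
```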

Git and the Directed Acyclic Graph

It is important to understand that Git represents a vast leap forward in version control. It is very different from the Subversion model, but shares just enough in common that Subversion users can be lulled into thinking it works the same way.

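The basic concept behind a Git repository is a directed acyclic graph. This is a graph in the mathematical sense, not a bar chart. In plain English, a graph consists of nodes (dots) connected by edges (lines). "Directed" means that each edge points one way (in Git, from a commit to its parent), and "acyclic" means that following the edges never leads you back to where you started.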

Each time changes are committed to a Git repository, the state of the code is added as a node on the graph. Apart from the very first commit, which has no parent, each new node must be connected to at least one existing node - meaning, it must have at least one parent commit.

This is a more general way of representing a repository than the tree model that is built into Subversion. It means you can keep managing your code with a tree model for your repository, if you really like Subversion, but it also means you can do some much more powerful things.
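A small Python sketch may help show why a graph is more general than a tree. It is a simplified illustration, not Git's actual storage format: the `Commit` class, the `ancestors` function, and the single-letter labels are hypothetical. The key point is the merge commit, which has two parents, something a strict tree cannot express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A node in the graph; edges point from a commit to its parents."""
    label: str
    parents: tuple = ()   # empty only for the very first commit


def ancestors(commit):
    """Return the labels of every commit reachable via parent edges."""
    seen = set()
    stack = list(commit.parents)
    while stack:
        node = stack.pop()
        if node.label not in seen:
            seen.add(node.label)
            stack.extend(node.parents)
    return seen


root = Commit("A")             # first commit: no parent
b = Commit("B", (root,))
c = Commit("C", (b,))          # one line of development
d = Commit("D", (b,))          # a second line, diverging from B
merge = Commit("M", (c, d))    # merge commit: two parents

history_of_merge = ancestors(merge)
```

Because every edge points from a commit back to an already existing commit, no sequence of parent links can loop around to its starting point, which is what keeps the graph acyclic.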

Git originated in Linux kernel development - a coding project that is extremely complex, nuanced, and massive, with thousands of developers working on thousands of parts of thousands of code files. Git allows you to leverage all the hard work, blood, sweat, tears, and agony put into learning difficult lessons about version control, and yet remain blissfully ignorant about all of that suffering.

Git can be a bit confusing to beginners. To help clear things up, the talk "Git for Ages 4 and Up" is well worth watching - and re-watching.