Scientific Software

Ten best practices for scientific software development.


Version Control - The Basics


Version control is the starting point for any scientific software project.

I once knew a world-class researcher - I will call him George. He was an expert in mathematical models of fluid dynamics. I spent a summer working for him at a national lab. My first week there, I met with him and one of his collaborators, and we were discussing his turbulence code.

These two had been collaborating for several months, and had already created several versions of the code. George said, "Let me email you the latest code."

I had been lucky enough to learn version control at the start of my scientific software career, so this was my first taste of life without it. And it was frightening. Versions proliferated almost immediately. Hours spent squashing bugs went to waste whenever a "new" version of the code picked up the changes to one file but not to another. The entire model of collaboration was to keep making new copies of the code in place and email zip files of it back and forth.

This all came about because the original code had been a one-man operation, so George never saw a need to learn a new tool like version control. But his method was rife with unnecessary difficulties; bugs, for one, were hard to squash. Emailing zip files back and forth was still manageable with one additional person on the project, but with more than two collaborators, you're going to have big problems.

George's plan was to have five new collaborators join the development team that summer.

Needless to say, I introduced George to Subversion, and we never looked back.

Principles of Version Control: The Absolute Bare Minimum

Basic version control provides you with snapshots of your code over time. The most primitive form just takes "dumb" copies of files at regular or irregular intervals and compares those copies to track changes.
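The "dumb snapshot" approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real version control tool stores history: `take_snapshot` and `changed_files` are hypothetical helpers, the `snapshot-NNN` directory layout is an assumption, and the file names in the usage demo are made up.

```python
import shutil
import tempfile
from pathlib import Path


def take_snapshot(src: Path, store: Path) -> Path:
    """Copy the whole src directory into store/snapshot-NNN (hypothetical layout)."""
    store.mkdir(parents=True, exist_ok=True)
    # Next snapshot number; assumes no snapshot has ever been deleted.
    n = len(list(store.glob("snapshot-*")))
    dest = store / f"snapshot-{n:03d}"
    shutil.copytree(src, dest)
    return dest


def changed_files(old: Path, new: Path) -> list:
    """Relative paths whose contents differ, or that exist in only one snapshot."""
    old_files = {p.relative_to(old).as_posix(): p for p in old.rglob("*") if p.is_file()}
    new_files = {p.relative_to(new).as_posix(): p for p in new.rglob("*") if p.is_file()}
    changed = []
    for name in sorted(old_files.keys() | new_files.keys()):
        a, b = old_files.get(name), new_files.get(name)
        if a is None or b is None or a.read_bytes() != b.read_bytes():
            changed.append(name)
    return changed


# Usage: snapshot a tiny project, edit one file, snapshot again.
with tempfile.TemporaryDirectory() as tmp:
    src, store = Path(tmp) / "turbulence", Path(tmp) / "snapshots"
    src.mkdir()
    (src / "solver.f90").write_text("! version 1\n")
    (src / "README").write_text("Turbulence model\n")
    first = take_snapshot(src, store)
    (src / "solver.f90").write_text("! version 2\n")
    second = take_snapshot(src, store)
    names = (first.name, second.name)
    changes = changed_files(first, second)
    print(names, changes)
```

Every snapshot stores a full copy of every file, even unchanged ones, and nothing records why a file changed or how one copy relates to another.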

This can quickly become difficult to manage - how can multiple versions of the code be developed in parallel, and how are changes to code determined and stored?

Principles of Version Control: Patch and Diff

Most Unix distributions include patch and diff. These are extremely handy programs for working with the list of atomic changes made from one version of a file to the next. diff compares two files and outputs the changes between them; saved to a file, that output is a patch file. patch reads a patch file and applies its changes to existing source code.
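To see what such a list of changes looks like, here is a small sketch that uses Python's standard-library `difflib` module in place of the Unix diff program. The two file versions and the names `area.py.orig` and `area.py` are hypothetical. `difflib.unified_diff` produces the unified diff format, the same format that `diff -u` prints and that patch reads.

```python
import difflib

# Two versions of the same (hypothetical) file, as lists of lines.
old = ["def area(r):\n", "    return 3.14 * r * r\n"]
new = ["import math\n", "\n", "def area(r):\n", "    return math.pi * r * r\n"]

# unified_diff yields the lines of a unified diff: a header naming both files,
# then hunks where " " marks context, "-" a removed line and "+" an added line.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="area.py.orig", tofile="area.py")
)
print("".join(diff_lines), end="")
```

Only the changed lines and a little surrounding context appear in the output, which is what makes a patch file much smaller than a full copy of the code.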

This allows you to modify source code, then create a patch file containing all of your changes. When others download that patch file and apply it to their code, their code will be modified to match yours, provided they started from the same version you did.
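The make-a-patch, apply-a-patch cycle can be sketched with a toy patch format. This is an assumption for illustration only, not the format diff and patch actually use: `make_patch` records each change as `(start, end, replacement)` lines using `difflib.SequenceMatcher`, and `apply_patch` is a hypothetical helper that replays those changes. The last line shows why the shared starting point matters.

```python
from difflib import SequenceMatcher


def make_patch(old, new):
    """Record only the edits that turn `old` into `new` (a toy patch format).

    Each edit is (start, end, replacement): replace old[start:end] with replacement.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # unchanged runs do not need to be stored
    ]


def apply_patch(old, patch):
    """Apply edits from last to first so earlier line indices stay valid."""
    result = list(old)
    for start, end, replacement in reversed(patch):
        result[start:end] = replacement
    return result


mine = ["u = 0.0\n", "dt = 0.1\n", "steps = 100\n"]
edited = ["u = 0.0\n", "dt = 0.01\n", "steps = 1000\n", "save = True\n"]
patch = make_patch(mine, edited)

# A collaborator with the same starting point ends up with exactly `edited`.
theirs = apply_patch(mine, patch)

# A collaborator whose copy has drifted gets a garbled file (dt appears twice).
garbled = apply_patch(["dt = 0.1\n", "steps = 100\n"], patch)
print(theirs)
print(garbled)
```

Real patch files also carry context lines, which lets patch notice, and sometimes work around, a mismatched starting point; this toy format has no such safeguard.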

Principles of Version Control: Managing Versions

Changes to code happen in related sets of edits, hence the utility of patch and diff. But code development often happens in parallel, and patching, un-patching, and re-patching code by hand is cumbersome.

Enter version control software.

CVS (Concurrent Versions System) was the early de facto standard for version control. Today there are other options, such as Subversion (svn) and Git. These programs are designed to manage related sets of edits and the relationships between them: which set of edits builds on which, and where parallel lines of development split apart and join back together.
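To make "related sets of edits and their relationships" concrete, here is a toy model in Python, loosely in the spirit of how Git records history. Each `Commit` is one set of edits plus pointers to the commit(s) it builds on. `Commit`, `ancestors`, `merge_bases` and `log` are hypothetical names, and real systems also store file contents, authors and timestamps. `merge_bases` returns the common ancestors of two lines of work that are not ancestors of another common ancestor, which is roughly the notion `git merge-base` uses to find where parallel work diverged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One related set of edits, plus the commit(s) it builds on."""
    cid: str
    message: str
    parents: tuple = ()


def ancestors(commit):
    """Map of id -> Commit for everything reachable via parents (including itself)."""
    seen, stack = {}, [commit]
    while stack:
        c = stack.pop()
        if c.cid not in seen:
            seen[c.cid] = c
            stack.extend(c.parents)
    return seen


def merge_bases(a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    common = [anc_a[cid] for cid in anc_a.keys() & anc_b.keys()]
    best = {c.cid for c in common}
    for c in common:
        # c is a closer common ancestor than any of its own strict ancestors.
        best -= set(ancestors(c)) - {c.cid}
    return sorted(best)


def log(commit):
    """History along the first-parent chain, newest first."""
    lines, c = [], commit
    while c is not None:
        lines.append(f"{c.cid} {c.message}")
        c = c.parents[0] if c.parents else None
    return lines


# Two collaborators branch off from the same bug fix and work in parallel.
root = Commit("a1", "initial turbulence code")
fix = Commit("b2", "fix boundary condition", (root,))
george = Commit("c3", "new closure model", (fix,))
student = Commit("d4", "add output routines", (fix,))
merged = Commit("e5", "merge output routines", (george, student))

print(log(merged))
print(merge_bases(george, student))
```

Knowing the point where two lines of work diverged ("b2" here) is what lets a version control system combine parallel changes without the manual patch juggling described above.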

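Different version control systems take different approaches. CVS and Subversion, for example, keep the history on a central server, while Git and Mercurial give every developer a complete copy of it. All of them, though, are built with flexibility for software developers in mind, and most have special built-in features, which we'll cover later.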

Tools for Version Control

Version control can take many forms, but here is a list of some of the most common tools related to it.

Unix utilities:

patch - applies a patch file to existing source code; see Patch on the charlesreid1.com wiki

diff - lists the changes between two files; see Diff on the charlesreid1.com wiki

Version control programs:

cvs - CVS (Concurrent Versions System), the oldest of the version control systems listed here; see CVS project page

hg - Mercurial, a less common but still popular version control system; see Mercurial project page

svn - Subversion, an older but ubiquitous version control system; see SVN project page

git - Git, a complicated-but-ultimately-actually-simpler version control system; see Git project page