Ten best practices for scientific software development.
Checklists improve efficiency for repeated tasks, like running tests, releasing new versions, or bringing new developers on board.
Most scientific software developers probably got their start working on software in academia - they joined a research group, and began to work on developing a mathematical model, or analyzing data from instruments, or running a set of simulations to understand the effects of different variables.
Academia is unusual in that groups working on software development often experience high turnover - graduate students and postdocs have limited timelines, grant money dries up or disappears with time, and new projects and new applications come up.
In this type of environment especially, but indeed in any kind of research group (industry, national lab, or academia), it is very easy for critical information to disappear as people come and go, and for hard-won processes, controls, and software quality to erode.
It can take a lot of work to create protocols and establish precedents for how to handle certain tasks. Recording this information somewhere is crucial for maintaining high-quality code and efficient software development.
Checklists are an excellent way to do this. Not only does a checklist record the steps involved in a task or process, it also provides a starting point for process improvement (see Continuous Process Improvement).
What kind of tasks might you write checklists for? Here are a few ideas:
- Bringing a new member on board (what paperwork must be done, what resources can they use, where do they go for help, etc.)
- Running a regression test after making changes to the core functionality of software (to ensure correct answers are not changed)
- Creating a new major or minor version release (this often involves a long series of steps, and if any one in particular is missed, there aren't many chances for do-overs)
- Running a parameter study with a software program
- Visualizing results of simulations
- Profiling, timing, and benchmarking computer code
- Filing a bug using the issue-tracking software
- Squashing a bug and marking it as resolved
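As an illustration of the regression-test idea above, here is a minimal Python sketch of a check that a checklist step might ask a developer to run. The function name, the output names, and the numbers are all hypothetical, and the tolerances are assumptions that would need tuning for a real code.

```python
import math


def find_regressions(reference, current, rel_tol=1e-9, abs_tol=0.0):
    """Return the names of outputs that are missing or no longer match the reference."""
    changed = []
    for name, expected in reference.items():
        actual = current.get(name)
        # A missing output counts as a regression, as does a value outside tolerance.
        if actual is None or not math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            changed.append(name)
    return changed


# Hypothetical reference answers, recorded before the change was made.
reference = {"peak_temperature": 1523.4, "total_mass": 0.875}
# Hypothetical answers produced by the modified code.
current = {"peak_temperature": 1523.4, "total_mass": 0.880}

print(find_regressions(reference, current))  # prints ['total_mass']
```

An empty list means the change did not alter any recorded answer, which is the condition the checklist item is meant to verify.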
Software used to maintain these checklists must meet a few requirements.
First, it must be easy to use. The usefulness of the system grows with the number of people using it, so if the system is not easy to use, it will not be adopted, and will eventually peter out and die. You generally want to avoid requiring each user to install and use specialized software - and that means you want people to be able to use their web browser.
Second, it should be text-based, and as simple to use as writing a text document. This will contribute to ease of use, but (more importantly) will allow for flexibility and the ability to change to a different system down the road. There is always a new program just around the corner, and having the flexibility to get your data out of one system and into another will keep users happy and save headaches.
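To make the portability argument concrete, here is a small Python sketch, under the assumption that a checklist is kept as nothing more than a title and a list of steps. The function names and the example checklist are hypothetical; the output uses MediaWiki's `==` heading and `#` numbered-list markup and Markdown's `- [ ]` task-list markup.

```python
def to_mediawiki(title, steps):
    """Render a checklist as MediaWiki markup: a heading and a numbered list."""
    lines = [f"== {title} =="]
    lines += [f"# {step}" for step in steps]
    return "\n".join(lines)


def to_markdown(title, steps):
    """Render the same checklist as a Markdown task list."""
    lines = [f"## {title}", ""]
    lines += [f"- [ ] {step}" for step in steps]
    return "\n".join(lines)


# Hypothetical checklist, stored as plain text that any system can read.
title = "Minor version release"
steps = ["Run the regression tests", "Update the changelog", "Tag the release"]

print(to_mediawiki(title, steps))
print()
print(to_markdown(title, steps))
```

Because the checklist itself is plain text, moving to a different system only means writing another small rendering function.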
Last, you want the system to be democratic, such that everyone can add or change information. That way, if a person develops a procedure, they can document the checklist themselves, or they can delegate the task. Additionally, if people have suggestions for improvements, they don't have to wait - they can jump in and make the changes themselves. This encourages people to be active and take responsibility.
There is one class of software that meets the major requirements listed above: wiki software.
Wikis are large, collaboratively edited documents that allow for rapid generation of content, a flat and democratic editing model, and flexible organization of content. Wikis are an excellent way to maintain checklists for repeated tasks.
MediaWiki is by far the most popular wiki software, and is the same wiki software used by Wikipedia. MediaWiki has a large community of developers, has a familiar look-and-feel, and is almost trivial to set up (it uses PHP, a server-side web language, and a MySQL or other SQL database to store page data). MediaWiki provides a lot of power in a small package, and is easy to back up (page content is stored in an SQL database, so a database dump takes care of the text; uploaded files and the LocalSettings.php configuration file live on disk and should be copied alongside it).
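The database-dump backup mentioned above could be scripted along the following lines. This is a sketch that assumes a MySQL backend with `mysqldump` available on the PATH; the database name `wikidb` and the user `wikiuser` are placeholders, not values from any particular installation.

```python
import subprocess


def build_dump_command(database, user):
    """Build the mysqldump command line for a wiki database."""
    return [
        "mysqldump",
        "--single-transaction",  # consistent snapshot of InnoDB tables without locking them
        f"--user={user}",
        "--password",  # no value given, so mysqldump prompts for the password
        database,
    ]


def backup_wiki(database, user, out_path):
    """Write a SQL dump of the wiki database to out_path."""
    with open(out_path, "wb") as out_file:
        subprocess.run(build_dump_command(database, user), stdout=out_file, check=True)


# Show the command without running it; "wikidb" and "wikiuser" are placeholders.
print(" ".join(build_dump_command("wikidb", "wikiuser")))
```

Calling `backup_wiki("wikidb", "wikiuser", "wiki-backup.sql")` on the wiki server would then write the dump to a file, which makes "back up the wiki" a natural checklist item of its own.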
Additionally, MediaWiki has the (often overlooked!) advantage that, as you learn how to use MediaWiki templates, linking, images, and markup, that knowledge translates directly to better fluency with Wikipedia. It is also being constantly improved, and recent changes have made its user interface easier for beginners to understand and use.
GitHub repositories have integrated wikis, in addition to an issue tracker. This is an excellent solution if you want a zero-cost, zero-effort wiki that you can quickly get started using. The disadvantage of this approach, however, is that the wiki is limited in its scope - each repository has its own wiki, so there is no sharing of pages across wikis. This siloed approach makes it more difficult to create content for an organization, as opposed to for a particular piece of software.
Trac is another wiki software package. Trac is different in philosophy from MediaWiki - it is intended to provide integrated wiki, version control, and issue tracking functionality. Trac is more difficult to get set up, but once it is set up, it can be very, very handy to have your wiki integrated with your issue tracker and your source code repository. It makes it much easier to cross-reference issues, include source code, point to files, and reference particular commits. Trac can be a double-edged sword, however - because it tries to do many things, it is more complicated, and the more complicated a tool is, the harder it is to get buy-in from everyone. Trac is great for integrated teams where everyone is "on board," but without someone to lead the charge, Trac wikis will, more often than not, be abandoned.