Compare commits
2 Commits
Author | SHA1 | Date | |
---|---|---|---|
e793a6678f | |||
d2a6ceca81 |
1
.gitignore
vendored
@@ -1 +1,2 @@
|
|||||||
|
.snakemake/
|
||||||
site/
|
site/
|
||||||
|
118
Snakefile
Normal file
@@ -0,0 +1,118 @@
|
|||||||
|
ghurl = 'git@github.com:charlesreid1/how-do-i-snakemake.git'
|
||||||
|
cmrurl = 'ssh://git@git.charlesreid1.com:222/charlesreid1/how-do-i-snakemake.git'
|
||||||
|
cmrmkm = 'ssh://git@git.charlesreid1.com:222/charlesreid1/mkdocs-material.git'
|
||||||
|
index = 'index.html'
|
||||||
|
|
||||||
|
rule default:
|
||||||
|
message:
|
||||||
|
"""
|
||||||
|
|
||||||
|
Welcome to the Snakefile for the how-do-i-snakemake repo.
|
||||||
|
This Snakefile contains utility methods for building and
|
||||||
|
deploying the site.
|
||||||
|
|
||||||
|
----------------------------------------
|
||||||
|
Snakefile
|
||||||
|
|
||||||
|
Add the -p or --printshellcmd flag to print the shell commands being run.
|
||||||
|
Add the -n or --dryrun flag to do a dry run.
|
||||||
|
|
||||||
|
|
||||||
|
To clone the deployed site to site/:
|
||||||
|
snakemake -p clone_site
|
||||||
|
|
||||||
|
|
||||||
|
To initialize submodules (if you did not clone this repo recursively):
|
||||||
|
snakemake -p submodule_init
|
||||||
|
|
||||||
|
|
||||||
|
To build the documentation in docs/ into the site/ directory:
|
||||||
|
snakemake build
|
||||||
|
|
||||||
|
|
||||||
|
To build and serve the documentation site locally (viewable at localhost:8000):
|
||||||
|
snakemake serve
|
||||||
|
|
||||||
|
|
||||||
|
To safely clean the documentation site before next deployment:
|
||||||
|
snakemake clean_docs
|
||||||
|
|
||||||
|
|
||||||
|
To build and deploy the updated documentation site to the gh-pages branch:
|
||||||
|
snakemake deploy_docs
|
||||||
|
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
rule clone_site:
|
||||||
|
"""
|
||||||
|
Clone the deployed site to site/
|
||||||
|
and add the proper remotes.
|
||||||
|
"""
|
||||||
|
output:
|
||||||
|
'site'
|
||||||
|
shell:
|
||||||
|
'''
|
||||||
|
git clone -b gh-pages {ghurl} site/
|
||||||
|
cd site \
|
||||||
|
&& git remote add cmr {cmrurl} \
|
||||||
|
&& git remote add gh {ghurl}
|
||||||
|
'''
|
||||||
|
|
||||||
|
|
||||||
|
rule submodule_init:
|
||||||
|
"""
|
||||||
|
Initialize the submodules (mkdocs-material)
|
||||||
|
so we can build the documentation.
|
||||||
|
"""
|
||||||
|
shell:
|
||||||
|
'''
|
||||||
|
git submodule update --init
|
||||||
|
'''
|
||||||
|
|
||||||
|
|
||||||
|
rule build:
|
||||||
|
"""
|
||||||
|
Build the documentation with mkdocs.
|
||||||
|
"""
|
||||||
|
shell:
|
||||||
|
'''
|
||||||
|
mkdocs build
|
||||||
|
'''
|
||||||
|
|
||||||
|
|
||||||
|
rule serve:
|
||||||
|
"""
|
||||||
|
Serve the documentation with mkdocs.
|
||||||
|
Visit localhost:8000 in your browser.
|
||||||
|
"""
|
||||||
|
shell:
|
||||||
|
'''
|
||||||
|
mkdocs serve
|
||||||
|
'''
|
||||||
|
|
||||||
|
|
||||||
|
rule clean_docs:
|
||||||
|
"""
|
||||||
|
Safely clean the deployed documentation site.
|
||||||
|
"""
|
||||||
|
shell:
|
||||||
|
'''
|
||||||
|
rm -fr site/content/*
|
||||||
|
'''
|
||||||
|
|
||||||
|
|
||||||
|
rule deploy_docs:
|
||||||
|
"""
|
||||||
|
Deploy the documentation.
|
||||||
|
"""
|
||||||
|
shell:
|
||||||
|
'''
|
||||||
|
mkdocs build
|
||||||
|
cd site/
|
||||||
|
git add -A .
|
||||||
|
git commit --allow-empty . -m 'updating site'
|
||||||
|
git push cmr gh-pages
|
||||||
|
git push gh gh-pages
|
||||||
|
'''
|
||||||
|
|
@@ -25,7 +25,8 @@ Two recommendations to help overcome this:
|
|||||||
smaller parts, and convert some of those parts into separate Snakemake rules.
|
smaller parts, and convert some of those parts into separate Snakemake rules.
|
||||||
|
|
||||||
|
|
||||||
## Example: Filtering Sequencer Reads
|
<a name="example"></a>
|
||||||
|
## Example Overview: Filtering Sequencer Reads
|
||||||
|
|
||||||
Let's illustrate the process of converting a workflow from shell scripts to a
|
Let's illustrate the process of converting a workflow from shell scripts to a
|
||||||
Snakefile, and doing so in stages, using a hypothetical workflow that involves
|
Snakefile, and doing so in stages, using a hypothetical workflow that involves
|
||||||
@@ -43,6 +44,7 @@ downloading data files containing reads from a sequencer from an external URL:
|
|||||||
| `SRR609_2_reads.fq.gz` | <http://example.com/SRR609_2_reads.fq.gz> |
|
| `SRR609_2_reads.fq.gz` | <http://example.com/SRR609_2_reads.fq.gz> |
|
||||||
|
|
||||||
|
|
||||||
|
<a name="stage1"></a>
|
||||||
## Stage 1: Shell Script + Snakefile
|
## Stage 1: Shell Script + Snakefile
|
||||||
|
|
||||||
### The Shell Script
|
### The Shell Script
|
||||||
@@ -333,6 +335,7 @@ You can force Snakemake to re-download the files two ways:
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<a name="stage2"></a>
|
||||||
## Stage 2: Replace Script with Snakefile (Hard-Coded)
|
## Stage 2: Replace Script with Snakefile (Hard-Coded)
|
||||||
|
|
||||||
The next step in converting our workflow to Snakemake is to
|
The next step in converting our workflow to Snakemake is to
|
||||||
@@ -388,8 +391,95 @@ rule download_reads:
|
|||||||
The Python variables `read_file` and `read_url` are available to the shell command
|
The Python variables `read_file` and `read_url` are available to the shell command
|
||||||
through `{read_file}` and `{read_url}`.
|
through `{read_file}` and `{read_url}`.
|
||||||
|
|
||||||
|
The Snakefile above shows how the `run:` directive and `shell()` function call
|
||||||
|
can be combined. This is very convenient, since otherwise we would end up with
|
||||||
|
complicated subprocess command construction and funky string manipulations.
|
||||||
|
|
||||||
|
**Problem:** There's still one big problem, and that's how the task of
|
||||||
|
downloading each file is being divided up. We have a relatively short list of
|
||||||
|
files to download here, but suppose we had a list of 200 files. Now, we have
|
||||||
|
a single rule that is responsible for downloading 200 files. If any of those
|
||||||
|
files are missing, Snakemake will correctly assume the rule needs to be re-run, but
|
||||||
|
will end up running the entire rule, and re-downloading every file.
|
||||||
|
|
||||||
|
We really need to split our task up so that each rule corresponds to a single
|
||||||
|
task of downloading a single individual file. If we were hard-coding everything,
|
||||||
|
then we might end up hard-coding a bunch of rules, and that would stink.
|
||||||
|
Instead, let's use wildcards.
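
To make the granularity problem concrete, here is a minimal sketch in plain Python (not Snakemake itself) of the per-file check we want: only the files that are actually missing should be fetched. The helper name `missing_reads` and the two-entry `reads` map are assumptions made for illustration.

```python
import os

# Hypothetical subset of the read-file-to-URL map used in this walkthrough
# (the URLs are fake, as in the tables above).
reads = {
    "SRR606_1_reads.fq.gz": "http://example.com/SRR606_1_reads.fq.gz",
    "SRR606_2_reads.fq.gz": "http://example.com/SRR606_2_reads.fq.gz",
}


def missing_reads(reads, directory="."):
    """Return the read files that are not yet present in `directory`."""
    return [
        name
        for name in reads
        if not os.path.exists(os.path.join(directory, name))
    ]


# Only the missing files would need to be downloaded again.
for name in missing_reads(reads):
    print("would download", reads[name], "->", name)
```

A rule that downloads one file at a time gives Snakemake this same per-file view, so a single missing file triggers a single download.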
|
||||||
|
|
||||||
|
|
||||||
|
```
|
||||||
|
-----------------------------------8<----------------------------------------------
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
<a name="stage3"></a>
|
||||||
## Stage 3: Replace Script with Snakefile (Wildcards)
|
## Stage 3: Replace Script with Snakefile (Wildcards)
|
||||||
|
|
||||||
|
The next step in converting our workflow to Snakemake is to
|
||||||
|
hard-code the file names into a Snakemake rule and let Snakemake
|
||||||
|
run the curl command to download them. Here are the links:
|
||||||
|
|
||||||
|
| Read Files | URL (note: these links are fake) |
|
||||||
|
|------------|----------------------------------|
|
||||||
|
| `SRR606_1_reads.fq.gz` | <http://example.com/SRR606_1_reads.fq.gz> |
|
||||||
|
| `SRR606_2_reads.fq.gz` | <http://example.com/SRR606_2_reads.fq.gz> |
|
||||||
|
| `SRR607_1_reads.fq.gz` | <http://example.com/SRR607_1_reads.fq.gz> |
|
||||||
|
| `SRR607_2_reads.fq.gz` | <http://example.com/SRR607_2_reads.fq.gz> |
|
||||||
|
| `SRR608_1_reads.fq.gz` | <http://example.com/SRR608_1_reads.fq.gz> |
|
||||||
|
| `SRR608_2_reads.fq.gz` | <http://example.com/SRR608_2_reads.fq.gz> |
|
||||||
|
| `SRR609_1_reads.fq.gz` | <http://example.com/SRR609_1_reads.fq.gz> |
|
||||||
|
| `SRR609_2_reads.fq.gz` | <http://example.com/SRR609_2_reads.fq.gz> |
|
||||||
|
|
||||||
|
There are multiple ways to modify the Snakefile to download the files directly.
|
||||||
|
The approach shown below uses the HTTP remote provider for the rule's `input:`, and a
|
||||||
|
`shell:` directive that runs a curl command for each file matched by the wildcards:
|
||||||
|
|
||||||
|
**`Snakefile`:**
|
||||||
|
|
||||||
|
```python
|
||||||
|
touchfile = '.downloaded_reads'
|
||||||
|
|
||||||
|
# from https://snakemake.readthedocs.io/en/stable/snakefiles/remote_files.html#read-only-web-http-s
|
||||||
|
from snakemake.remote.HTTP import RemoteProvider as HTTPRemoteProvider
|
||||||
|
HTTP = HTTPRemoteProvider()
|
||||||
|
|
||||||
|
# map of read files to read urls
|
||||||
|
reads = {
|
||||||
|
"SRR606_1_reads.fq.gz" : "http://example.com/SRR606_1_reads.fq.gz",
|
||||||
|
"SRR606_2_reads.fq.gz" : "http://example.com/SRR606_2_reads.fq.gz",
|
||||||
|
"SRR607_1_reads.fq.gz" : "http://example.com/SRR607_1_reads.fq.gz",
|
||||||
|
"SRR607_2_reads.fq.gz" : "http://example.com/SRR607_2_reads.fq.gz",
|
||||||
|
"SRR608_1_reads.fq.gz" : "http://example.com/SRR608_1_reads.fq.gz",
|
||||||
|
"SRR608_2_reads.fq.gz" : "http://example.com/SRR608_2_reads.fq.gz",
|
||||||
|
"SRR609_1_reads.fq.gz" : "http://example.com/SRR609_1_reads.fq.gz",
|
||||||
|
"SRR609_2_reads.fq.gz" : "http://example.com/SRR609_2_reads.fq.gz"
|
||||||
|
}
|
||||||
|
|
||||||
|
rule download_read:
|
||||||
|
"""
|
||||||
|
Download a single individual read
|
||||||
|
"""
|
||||||
|
input:
|
||||||
|
# The input file is the remote HTTP url
|
||||||
|
HTTP.remote("example.com/{prefix}_{direction}_reads.fq.gz", keep_local=True)
|
||||||
|
output:
|
||||||
|
# The output file is now the read file itself
|
||||||
|
"{prefix}_{direction}_reads.fq.gz"
|
||||||
|
shell:
|
||||||
|
'''
|
||||||
|
curl -L {input} -o {output}
|
||||||
|
'''
|
||||||
|
```
|
||||||
|
|
||||||
|
Let's walk through this step by step.
|
||||||
|
|
||||||
|
We start by importing the `HTTPRemoteProvider`. This is an object that will
|
||||||
|
check whether a remote file is available; if it is not, the rule fails to
|
||||||
|
execute.
|
||||||
|
|
||||||
|
The `input:` directive contains a call to `HTTP.remote()` that passes the
|
||||||
|
URL of the file, containing the wildcards that are matched in the `output:`
|
||||||
|
directive. The `keep_local=True` flag ensures the downloaded files are
|
||||||
|
not deleted.
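
To see how the wildcards in `output:` determine the ones in `input:`, here is a simplified sketch in plain Python. It is an illustration of the idea, not Snakemake's actual implementation: the helper names `wildcard_pattern_to_regex` and `match_wildcards` are hypothetical, and each wildcard is assumed to match one or more arbitrary characters.

```python
import re


def wildcard_pattern_to_regex(pattern):
    """Turn 'a_{x}_b' into a regex with one named group per {wildcard}."""
    # Splitting on a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal text, escaped
        else:
            regex += "(?P<%s>.+)" % part   # wildcard: one or more characters
    return re.compile("^" + regex + "$")


def match_wildcards(pattern, filename):
    """Return a dict of wildcard values, or None if the file does not match."""
    m = wildcard_pattern_to_regex(pattern).match(filename)
    return m.groupdict() if m else None


output_pattern = "{prefix}_{direction}_reads.fq.gz"
input_pattern = "example.com/{prefix}_{direction}_reads.fq.gz"

# A requested output file fixes the wildcard values...
wildcards = match_wildcards(output_pattern, "SRR606_1_reads.fq.gz")
# ...which are then substituted into the input pattern.
url = input_pattern.format(**wildcards)
print(wildcards, url)
```

Requesting a specific file, for example `snakemake SRR606_1_reads.fq.gz`, is what supplies the concrete filename that the output pattern is matched against.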
|
||||||
|
|
||||||
|
@@ -49,6 +49,14 @@ throughout this documentation and what they mean.
|
|||||||
[Converting Workflows to Snakemake](converting.md) - strategies for
|
[Converting Workflows to Snakemake](converting.md) - strategies for
|
||||||
converting shell script workflows into Snakemake workflows.
|
converting shell script workflows into Snakemake workflows.
|
||||||
|
|
||||||
|
[**Example Overview: Filtering Sequencer Reads**](converting.md#example)
|
||||||
|
|
||||||
|
[**Stage 1: Shell Script + Snakefile**](converting.md#stage1)
|
||||||
|
|
||||||
|
[**Stage 2: Replace Script with Snakefile (Hard-Coded)**](converting.md#stage2)
|
||||||
|
|
||||||
|
[**Stage 3: Replace Script with Snakefile (Wildcards)**](converting.md#stage3)
|
||||||
|
|
||||||
|
|
||||||
## Useful Resources
|
## Useful Resources
|
||||||
|
|
||||||
@@ -56,3 +64,6 @@ Following is a list of useful Snakemake resources:
|
|||||||
|
|
||||||
* <https://snakemake.readthedocs.io/>
|
* <https://snakemake.readthedocs.io/>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@@ -25,8 +25,8 @@ Once you have Python installed, you should have `pip` available as well.
|
|||||||
Snakemake can be installed using pip:
|
Snakemake can be installed using pip:
|
||||||
|
|
||||||
```
|
```
|
||||||
$ virtualenv -p python3 .venv
|
$ virtualenv vp
|
||||||
$ source .venv/bin/activate
|
$ source vp/bin/activate
|
||||||
$ pip install snakemake
|
$ pip install snakemake
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -35,5 +35,9 @@ $ pip install snakemake
|
|||||||
If you are using conda, you can install Snakemake using conda by first
|
If you are using conda, you can install Snakemake using conda by first
|
||||||
adding some conda channels, then installing Snakemake using `conda install`:
|
adding some conda channels, then installing Snakemake using `conda install`:
|
||||||
|
|
||||||
|
```
|
||||||
|
conda install -c bioconda -c conda-forge snakemake
|
||||||
|
```
|
||||||
|
|
||||||
|
This will install snakemake from the bioconda channel.
|
||||||
|
|
||||||
|
@@ -1,11 +1,32 @@
|
|||||||
# Terminology
|
# Terminology
|
||||||
|
|
||||||
|
* **container** - Containers are like very lightweight
|
||||||
|
virtual machines that can provide an isolated,
|
||||||
|
consistent, reproducible environment in which to
|
||||||
|
run software.
|
||||||
|
|
||||||
|
* **directive** - this refers to subheadings of rules,
|
||||||
|
such as `input:` or `output:` or `shell:`
|
||||||
|
|
||||||
|
* **docker** - Docker is a program that allows running
|
||||||
|
containers. Docker is very popular in enterprise
|
||||||
|
software engineering, but presents challenges for
|
||||||
|
scientific computing because it requires an admin
|
||||||
|
account and presents security risks, making it
|
||||||
|
hard to run in an HPC environment.
|
||||||
|
|
||||||
* **rule** - Snakemake rules define a given task,
|
* **rule** - Snakemake rules define a given task,
|
||||||
the input files it depends on, the output files
|
the input files it depends on, the output files
|
||||||
it produces, the shell commands it should run,
|
it produces, the shell commands it should run,
|
||||||
etc.
|
etc.
|
||||||
|
|
||||||
* **directive** - this refers to subheadings of rules,
|
* **singularity** - Singularity is a program that allows
|
||||||
such as `input:` or `output:` or `shell:`
|
running containers, like Docker, but without requiring
|
||||||
|
an admin account and without many of the security
|
||||||
|
concerns that Docker creates.
|
||||||
|
|
||||||
|
* **Snakefile** - Snakemake is a Python program that runs
|
||||||
|
a set of rules defined in a file; Snakefile is the
|
||||||
|
default name of the file in which Snakemake expects to
|
||||||
|
find those rule definitions.
|
||||||
|
|
||||||
|
Submodule mkdocs-material updated: b0c6890853...6569122bb1
@@ -21,7 +21,7 @@ theme:
|
|||||||
font:
|
font:
|
||||||
text: 'Bitter'
|
text: 'Bitter'
|
||||||
code: 'PT Mono'
|
code: 'PT Mono'
|
||||||
nav:
|
pages:
|
||||||
- "Index" : "index.md"
|
- "Index" : "index.md"
|
||||||
- "Installing Snakemake" : "installing.md"
|
- "Installing Snakemake" : "installing.md"
|
||||||
- "Snakemake Terminology" : "terminology.md"
|
- "Snakemake Terminology" : "terminology.md"
|
||||||
@@ -34,6 +34,3 @@ markdown_extensions:
|
|||||||
guess_lang: false
|
guess_lang: false
|
||||||
- toc:
|
- toc:
|
||||||
permalink: true
|
permalink: true
|
||||||
|
|
||||||
|
|
||||||
strict: true
|
|
||||||
|