Snakemake workflows as command line utilities that run on k8s (kubernetes)... Bring Your Own Kubernetes

2019-snakemake-byok8s

Overview

This is an example of a Snakemake workflow that:

  • is a command line utility
  • is bundled as a Python package
  • is designed to run on a Kubernetes cluster
  • can be tested locally or with Travis CI using minikube

Snakemake functionality is provided through
a command line tool called byok8s, which
lets you do the following (abbreviated for clarity):

# Create virtual k8s cluster
minikube start

# Run the workflow
byok8s --s3-bucket=mah-s3-bukkit my-workflowfile my-paramsfile

# Clean up the virtual k8s cluster
minikube stop

Snakemake workflows are provided via a Snakefile by
the user. Snakemake runs tasks on the Kubernetes (k8s)
cluster. The approach is for the user to provide
their own Kubernetes cluster (byok8s = Bring Your
Own Kubernetes).

The example above uses minikube
to make a virtual k8s cluster, useful for testing.

For real workflows, you can get a Kubernetes
cluster from a cloud provider:

  • AWS EKS (Elastic Kubernetes Service)
  • GCP GKE (Google Kubernetes Engine)
  • DigitalOcean Kubernetes service
  • etc...

The Travis CI tests utilize minikube to run
test workflows.

Quickstart

This runs through the installation and usage
of 2019-snakemake-byok8s.

Step 1: Set up Kubernetes cluster with minikube.

Step 2: Install byok8s.

Step 3: Run the byok8s workflow using the Kubernetes cluster.

Step 4: Tear down Kubernetes cluster with minikube.

Step 1: Set Up Virtual Kubernetes Cluster

For the purposes of the quickstart, we will walk
through how to set up a local, virtual Kubernetes
cluster using minikube.

Start by installing minikube:

scripts/install_minikube.sh

Once it is installed, you can start up a kubernetes cluster
with minikube using the following commands:

cd test
minikube start

NOTE: If you are running on AWS, run this command before minikube start:

minikube config set vm-driver none

This sets the VM driver to none, so that minikube uses native Docker
instead of a virtual machine.

If you are running on AWS, the DNS in the minikube
Kubernetes cluster will not work, so run these commands
to fix the DNS settings (they should be run from the
test/ directory):

kubectl apply -f fixcoredns.yml
kubectl delete --all pods --namespace kube-system
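
Putting the AWS-specific steps in order, the sequence can be sketched as a small shell function. The run helper, the DRY_RUN variable, and the function name start_minikube_on_aws are illustrative assumptions and not part of this repository; the commands themselves are the ones listed above, and the function is assumed to be called from the test/ directory.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_minikube_on_aws: hypothetical wrapper around the commands from this step.
# Call it from the test/ directory so that fixcoredns.yml is found.
start_minikube_on_aws() {
    # The driver must be configured before the cluster is started
    run minikube config set vm-driver none || return 1
    run minikube start || return 1
    # Apply the DNS fix, then delete the kube-system pods
    # (presumably so they come back up with the new settings)
    run kubectl apply -f fixcoredns.yml || return 1
    run kubectl delete --all pods --namespace kube-system
}
```

Setting DRY_RUN=1 before calling start_minikube_on_aws prints the four commands instead of running them, which is a cheap way to check the order.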

Step 2: Install byok8s

Start by setting up a Python virtual environment,
then install the required packages into the
virtual environment:

pip install -r requirements.txt
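
The virtual environment needs to exist and be active before the pip command above is run. A minimal sketch, assuming Python 3 with the standard venv module and an arbitrarily chosen directory name venv:

```shell
# Create a virtual environment in ./venv (the directory name is an assumption)
python3 -m venv venv

# Activate it in the current shell so that pip and python resolve to venv/bin/
. venv/bin/activate
```

With the environment active, the pip install command above installs into venv/ rather than system-wide.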

The requirements file pulls in the snakemake and
kubernetes Python modules. Now install the byok8s
command line tool:

python setup.py build install

Now you can run:

which byok8s

and you should see byok8s in your virtual
environment’s bin/ directory.

This command line utility will expect a kubernetes
cluster to be set up before it is run.

Setting up a kubernetes cluster will create...
(fill in more info here)...

Snakemake will automatically create the pods
in the cluster, so you just need to allocate
a kubernetes cluster.
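
Before running the workflow, it can help to confirm that the tools used in this quickstart are on the PATH, in the same spirit as the which byok8s check. A minimal sketch, where require_tool is a hypothetical helper name:

```shell
# require_tool: succeed if the named command is on the PATH,
# otherwise print a message to stderr and fail (hypothetical helper)
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH" >&2
    return 1
}
```

For example, require_tool byok8s && require_tool minikube returns a non-zero status as soon as one of them is missing.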

Step 3: Run byok8s

Now you can run the workflow with the byok8s command.
This submits the Snakemake workflow jobs to the Kubernetes
cluster that minikube created.

You should have your workflow in a Snakefile in the
current directory. Use the --snakefile flag if it is
named something other than Snakefile.

You will also need to specify your AWS credentials
via the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
environment variables. These are used to access
S3 buckets for file I/O.

Finally, you will need to create an S3 bucket for
Snakemake to use for file I/O. Pass the name of the
bucket using the --s3-bucket flag.
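
One way to create the bucket is with the AWS CLI. That is an assumption here: the AWS CLI is a separate tool that this README does not mention, and it needs the same AWS credentials to be available. A minimal sketch, with make_bucket as a hypothetical helper name:

```shell
# make_bucket: create an S3 bucket with the AWS CLI (hypothetical helper;
# assumes the aws command is installed and AWS credentials are available)
make_bucket() {
    aws s3 mb "s3://$1"
}
```

Calling make_bucket mah-bukkit would create the bucket used in the examples. S3 bucket names are global across AWS, so you may need to pick a different name if that one is taken.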

Start by exporting the two AWS credential variables
(take care to scrub them from your bash history):

 export AWS_ACCESS_KEY_ID=XXXXX
 export AWS_SECRET_ACCESS_KEY=XXXXX
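
Since both variables are needed for S3 access, it can be worth checking that they are set before launching a workflow. A minimal sketch, where check_aws_credentials is a hypothetical helper and not part of byok8s:

```shell
# check_aws_credentials: return 0 only if both AWS credential variables
# are set and non-empty; report each missing one on stderr (hypothetical helper)
check_aws_credentials() {
    status=0
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
        echo "AWS_ACCESS_KEY_ID is not set" >&2
        status=1
    fi
    if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_SECRET_ACCESS_KEY is not set" >&2
        status=1
    fi
    return "$status"
}
```

The check only looks at whether the variables are non-empty; it does not verify that the credentials are valid.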

Run the alpha workflow with blue params:

byok8s --s3-bucket=mah-bukkit workflow-alpha params-blue

Run the alpha workflow with red params:

byok8s --s3-bucket=mah-bukkit workflow-alpha params-red

Run the gamma workflow with red params, and so on:

byok8s --s3-bucket=mah-bukkit workflow-gamma params-red
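
When several workflow and parameter combinations need to run, calls like the ones above can be generated in a loop. The sketch below only prints the commands. It assumes every workflow can be paired with every parameter set, which the examples above do not confirm (workflow-gamma with params-blue is not shown), and the function name byok8s_commands is hypothetical.

```shell
# byok8s_commands: print one byok8s invocation per workflow/params pair
# for the bucket given as the first argument (hypothetical helper)
byok8s_commands() {
    bucket="$1"
    for workflow in workflow-alpha workflow-gamma; do
        for params in params-blue params-red; do
            echo "byok8s --s3-bucket=$bucket $workflow $params"
        done
    done
}
```

Calling byok8s_commands mah-bukkit prints four lines, one per combination, which can be reviewed before being run.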

(NOTE: May want to let the user specify
input and output directories with flags.)

All input files are searched for relative to the working
directory.

Step 4: Tear Down Kubernetes Cluster

The last step, once the workflow has finished,
is to tear down the Kubernetes cluster. The virtual
Kubernetes cluster created by minikube can be torn
down with the following command:

minikube stop
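
Because the cluster should be stopped whether or not the workflow succeeds, the start, run, and tear-down steps can be tied together in one wrapper. A minimal sketch, where with_minikube is a hypothetical helper name and not something byok8s provides:

```shell
# with_minikube: start the cluster, run the given command, then stop the
# cluster even if the command failed (hypothetical helper).
# Returns the exit status of the wrapped command.
with_minikube() {
    minikube start || return 1
    status=0
    "$@" || status=$?
    minikube stop
    return "$status"
}
```

For example, with_minikube byok8s --s3-bucket=mah-bukkit workflow-alpha params-blue runs one workflow inside a start/stop pair. Note that minikube stop halts the cluster but keeps its state on disk; minikube delete removes it entirely.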

Using Kubernetes with Cloud Providers

Cloud Provider               Kubernetes Service                Guide
---------------------------  --------------------------------  ------------------
Minikube (on AWS EC2)        Minikube                          Minikube AWS Guide
Google Cloud Platform (GCP)  Google Kubernetes Engine (GKE)    GCP GKE Guide
Amazon Web Services (AWS)    Elastic Kubernetes Service (EKS)  AWS EKS Guide
DigitalOcean (DO)            DO Kubernetes (DOK)               DO DOK Guide