Charles Reid
b810e5e362
|
6 years ago | |
---|---|---|
cli | 6 years ago | |
scripts | 6 years ago | |
test | 6 years ago | |
.gitignore | 6 years ago | |
.travis.yml | 6 years ago | |
LICENSE | 6 years ago | |
MANIFEST.in | 6 years ago | |
README.md | 6 years ago | |
kubernetes_aws.md | 6 years ago | |
kubernetes_dok.md | 6 years ago | |
kubernetes_gcp.md | 6 years ago | |
kubernetes_minikube.md | 6 years ago | |
requirements-to-freeze.txt | 6 years ago | |
requirements.txt | 6 years ago | |
setup.py | 6 years ago |
README.md
2019-snakemake-byok8s
Overview
This is an example of a Snakemake workflow that:
- is a command line utility
- is bundled as a Python package
- is designed to run on a Kubernetes cluster
- can be tested locally or with Travis CI using minikube
Snakemake functionality is provided through
a command line tool called byok8s
, so that
it allows you to do this (abbreviated for clarity):
# Create virtual k8s cluster
minikube start
# Run the workflow
byok8s --s3-bucket=mah-s3-bukkit my-workflowfile my-paramsfile
# Clean up the virtual k8s cluster
minikube stop
Snakemake workflows are provided via a Snakefile by the user. Snakemake runs tasks on the Kubernetes (k8s) cluster. The approach is for the user to provide their own Kubernetes cluster (byok8s = Bring Your Own Kubernetes).
The example above uses minikube
to make a virtual k8s cluster, useful for testing.
For real workflows, your options for kubernetes clusters are cloud providers:
- AWS EKS (Elastic Container Service)
- GCP GKE (Google Kuberntes Engine)
- Digital Ocean Kubernetes service
- etc...
The Travis CI tests utilize minikube to run test workflows.
Quickstart
This runs through the installation and usage
of 2019-snakemake-byok8s
.
Step 1: Set up Kubernetes cluster with minikube
.
Step 2: Install byok8s
.
Step 3: Run the byok8s
workflow using the Kubernetes cluster.
Step 4: Tear down Kubernetes cluster with minikube
.
Step 1: Set Up Virtual Kubernetes Cluster
For the purposes of the quickstart, we will walk
through how to set up a local, virtual Kubernetes
cluster using minikube
.
Start by installing minikube:
scripts/install_minikube.sh
Once it is installed, you can start up a kubernetes cluster with minikube using the following commands:
cd test
minikube start
NOTE: If you are running on AWS, run this command first
minikube config set vm-driver none
to set the the vm driver to none and use native Docker to run stuff.
If you are running on AWS, the DNS in the minikube
kubernetes cluster will not work, so run this command
to fix the DNS settings (should be run from the
test/
directory):
kubectl apply -f fixcoredns.yml
kubectl delete --all pods --namespace kube-system
Step 2: Install byok8s
Start by setting up a python virtual environment, and install the required packages into the virtual environment:
pip install -r requirements.txt
This installs snakemake and kubernetes Python
modules. Now install the byok8s
command line
tool:
python setup.py build install
Now you can run:
which byok8s
and you should see byok8s
in your virtual
environment's bin/
directory.
This command line utility will expect a kubernetes cluster to be set up before it is run.
Setting up a kubernetes cluster will create... (fill in more info here)...
Snakemake will automatically create the pods in the cluster, so you just need to allocate a kubernetes cluster.
Step 3: Run byok8s
Now you can run the workflow with the byok8s
command.
This submits the Snakemake workflow jobs to the Kubernetes
cluster that minikube created.
You should have your workflow in a Snakefile
in the
current directory. Use the --snakefile
flag if it is
named something other than Snakefile
.
You will also need to specify your AWS credentials
via the AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY
environment variables. These are used to to access
S3 buckets for file I/O.
Finally, you will need to create an S3 bucket for
Snakemake to use for file I/O. Pass the name of the
bucket using the --s3-bucket
flag.
Start by exporting these two vars (careful to scrub them from bash history):
export AWS_ACCESS_KEY_ID=XXXXX
export AWS_SECRET_ACCESS_KEY=XXXXX
Run the alpha workflow with blue params:
byok8s --s3-bucket=mah-bukkit workflow-alpha params-blue
Run the alpha workflow with red params:
byok8s --s3-bucket=mah-bukkit workflow-alpha params-red
Run the gamma workflow with red params, &c:
byok8s --s3-bucket=mah-bukkit workflow-gamma params-red
(NOTE: May want to let the user specify input and output directories with flags.)
All input files are searched for relative to the working directory.
Step 4: Tear Down Kubernetes Cluster
The last step once the workflow has been finished, is to tear down the kubernetes cluster. The virtual kubernetes cluster created by minikube can be torn down with the following command:
minikube stop
Using Kubernetes with Cloud Providers
Cloud Provider | Kubernetes Service | Guide |
---|---|---|
Minikube (on AWS EC2) | Minikube | Minikube AWS Guide |
Google Cloud Platform (GCP) | Google Container Engine (GKE) | GCP GKE Guide |
Amazon Web Services (AWS) | Elastic Container Service (EKS) | AWS EKS Guide |
Digital Ocean (DO) | DO Kubernetes (DOK) | DO DOK Guide |