# 2019-snakemake-byok8s

[![travis](https://img.shields.io/travis/charlesreid1/2019-snakemake-byok8s.svg)](https://travis-ci.org/charlesreid1/2019-snakemake-byok8s)
[![license](https://img.shields.io/github/license/charlesreid1/2019-snakemake-byok8s.svg)](https://github.com/charlesreid1/2019-snakemake-byok8s/blob/master/LICENSE)

# Overview

This is an example of a Snakemake workflow that:

- is a command line utility
- is bundled as a Python package
- is designed to run on a Kubernetes cluster
- can be tested locally or with Travis CI using minikube

Snakemake functionality is provided through a command line tool called `byok8s`, which lets you do the following (abbreviated for clarity):

```
# Create virtual k8s cluster
minikube start

# Run the workflow
byok8s --s3-bucket=mah-s3-bukkit my-workflowfile my-paramsfile

# Clean up the virtual k8s cluster
minikube stop
```

The user provides the Snakemake workflow via a Snakefile. Snakemake runs the workflow's tasks on the Kubernetes (k8s) cluster. The approach is for the user to provide their own Kubernetes cluster (byok8s = Bring Your Own Kubernetes).

The example above uses [`minikube`](https://github.com/kubernetes/minikube) to create a virtual k8s cluster, which is useful for testing.

For real workflows, your options for Kubernetes clusters are cloud providers:

- AWS EKS (Elastic Kubernetes Service)
- GCP GKE (Google Kubernetes Engine)
- DigitalOcean Kubernetes service
- etc.

The Travis CI tests use minikube to run test workflows.

# Quickstart

This runs through the installation and usage of `2019-snakemake-byok8s`.

Step 1: Set up Kubernetes cluster with `minikube`.

Step 2: Install `byok8s`.

Step 3: Run the `byok8s` workflow using the Kubernetes cluster.

Step 4: Tear down Kubernetes cluster with `minikube`.

## Step 1: Set Up Virtual Kubernetes Cluster

For the purposes of the quickstart, we will walk through how to set up a local, virtual Kubernetes cluster using `minikube`.
Start by installing minikube:

```
scripts/install_minikube.sh
```

Once it is installed, you can start up a Kubernetes cluster with minikube using the following commands:

```
cd test
minikube start
```

NOTE: If you are running on AWS, run this command first

```
minikube config set vm-driver none
```

to set the VM driver to none, so that minikube runs on the host using native Docker instead of inside a virtual machine.

If you are running on AWS, the DNS in the minikube Kubernetes cluster will not work, so run these commands to fix the DNS settings (they should be run from the `test/` directory):

```
kubectl apply -f fixcoredns.yml
kubectl delete --all pods --namespace kube-system
```

## Step 2: Install byok8s

Start by setting up a Python virtual environment, then install the required packages into it:

```
pip install -r requirements.txt
```

This installs the `snakemake` and `kubernetes` Python modules. Now install the `byok8s` command line tool:

```
python setup.py build install
```

Now you can run:

```
which byok8s
```

and you should see `byok8s` in your virtual environment's `bin/` directory.

This command line utility expects a Kubernetes cluster to be set up before it is run.

Setting up a Kubernetes cluster will create... (fill in more info here)...

Snakemake will automatically create the pods in the cluster, so you just need to allocate a Kubernetes cluster.

## Step 3: Run byok8s

Now you can run the workflow with the `byok8s` command. This submits the Snakemake workflow jobs to the Kubernetes cluster that minikube created.

Your workflow should be in a `Snakefile` in the current directory. Use the `--snakefile` flag if the file is named something other than `Snakefile`.

You will also need to specify your AWS credentials via the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. These are used to access S3 buckets for file I/O.

Finally, you will need to create an S3 bucket for Snakemake to use for file I/O. Pass the name of the bucket using the `--s3-bucket` flag.
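Because the workflow depends on both credential variables being present, it can help to check for them before invoking `byok8s`. The following is a minimal sketch of such a pre-flight check in Python. It is not part of `byok8s`: the helper name `missing_aws_vars` and the constant `REQUIRED_VARS` are hypothetical, and only the two variable names come from the text above.

```python
import os

# Variable names taken from the README text; everything else in this
# snippet is a hypothetical helper, not part of the byok8s package.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the required AWS variable names that are unset or empty.

    `env` is any mapping of names to values; it defaults to the
    environment of the current process.
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_aws_vars()` with no argument checks the current process environment. An empty list means both variables are set; otherwise the list names the variables that still need to be exported.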
Start by exporting the two AWS credential variables (take care to scrub them from your bash history):

```
export AWS_ACCESS_KEY_ID=XXXXX
export AWS_SECRET_ACCESS_KEY=XXXXX
```

Run the alpha workflow with blue params:

```
byok8s --s3-bucket=mah-bukkit workflow-alpha params-blue
```

Run the alpha workflow with red params:

```
byok8s --s3-bucket=mah-bukkit workflow-alpha params-red
```

Run the gamma workflow with red params, and so on:

```
byok8s --s3-bucket=mah-bukkit workflow-gamma params-red
```

(NOTE: May want to let the user specify input and output directories with flags.)

All input files are searched for relative to the working directory.

## Step 4: Tear Down Kubernetes Cluster

The last step, once the workflow has finished, is to tear down the Kubernetes cluster. The virtual Kubernetes cluster created by minikube can be torn down with the following command:

```
minikube stop
```

# Using Kubernetes with Cloud Providers

| Cloud Provider              | Kubernetes Service               | Guide                                        |
|-----------------------------|----------------------------------|----------------------------------------------|
| Minikube (on AWS EC2)       | Minikube                         | [Minikube AWS Guide](kubernetes_minikube.md) |
| Google Cloud Platform (GCP) | Google Kubernetes Engine (GKE)   | [GCP GKE Guide](kubernetes_gcp.md)           |
| Amazon Web Services (AWS)   | Elastic Kubernetes Service (EKS) | [AWS EKS Guide](kubernetes_aws.md)           |
| DigitalOcean (DO)           | DO Kubernetes (DOK)              | [DO DOK Guide](kubernetes_dok.md)            |