Getting Started

Installation

snakeobjects is tested and works well on Linux and Mac; it doesn’t work on Windows. By far, the easiest method for installing snakeobjects is to use the snakeobjects conda package available at the iossifovlab channel (SOON AT bioconda!!). This method requires for conda or miniconda to be installed. (See Conda Installation). For faster installation it is recommended to first install mamba:

$ conda install -n base -c conda-forge mamba

With mamba ready, installing snakeobjects is simple:

$ conda activate base
$ mamba create -c bioconda -c conda-forge -n snakeobjects snakeobjects

After conda install finishes, you can use the sobjects version command to check if the installation was successful:

$ sobjects version
3.1.4

Hello world pipeline

We will show how you can create and execute a small pipeline, and we will use it to introduce some of the basic steps for working with snakeobjects projects and pipelines.

Let’s create a new directory to use both as a pipeline and as a project directory. The directory’s location and name do not matter, but we will use /tmp/helloWorld for the example below. In this directory, you should create three files. The first file should be called build_object_graph.py and should contain the following two lines:

def run(proj,OG):
        OG.add("hello","world")

The second file should be called hello.snakefile and contain:

add_targets("result.txt")
rule createResult:
   output: T("result.txt")
   shell: "echo 'hello world' > {output}"

Finally, the third file should be named so_project.yaml and should be empty. If you don’t want to be bothered creating a directory, copying, and pasting, you can instead download and extract (tar xzf helloWorld.tgz) the files from helloWorld.tgz.

The build_object_graph.py and the hello.snakefile files comprise our pipeline. The build_object_graph.py is a script that creates the object graph for our project containing only one object with object type hello and object id world. The hello.snakefile declares that objects of type hello have one target, result.txt, and includes the rule to create such a target. The so_project.yaml file indicates that the directory will be used as a snakeobjects project directory and will contain the results of the pipeline’s execution.

With snakeobjects, we execute a pipeline over a project in two steps: sobjects prepare and sobjects run. We perform both using the sobjects command-line utility from within our project directory.

$ cd /tmp/helloWorld

$ sobjects prepare
# WORKING ON PROJECT /tmp/helloWorld
# WITH PIPELINE /tmp/helloWorld

sobjects run -j -q
# WORKING ON PROJECT /tmp/helloWorld
# WITH PIPELINE /tmp/helloWorld
UPDATING ENVIRONMENT:
export SO_PROJECT=/tmp/helloWorld
export SO_PIPELINE=/tmp/helloWorld
export PATH=$SO_PIPELINE:$PATH
RUNNING: snakemake -s /tmp/helloWorld/Snakefile -d /tmp/helloWorld -j -q
Job stats:
job               count    min threads    max threads
--------------  -------  -------------  -------------
createResult          1              1              1
so_all_targets        1              1              1
so_hello_obj          1              1              1
total                 3              1              1

The sobjects prepare performs a few initialization steps. sobjects run does the ‘heavy lifting’ using the Snakemake to execute the rules for creating the object targets. The execution of our helloWorld pipeline should finish instantly, and we can find the file for the result.txt target in the directory hello creates for our single hello/world object:

$ cat /tmp/helloWorld/hello/world/result.txt
hello world

What’s next

We strongly suggest that you examine our extensive Tutorial next. It introduces all the components necessary to design complex workflows and to apply them to large projects. You can find more examples in the Additional Examples. For a high-level overview of snakeobjects, you should read the snakeobjects paper [REF to come]. You can find a detailed reference for all of the snakeobjects’ components in the rest of this documentation package.