Special topics
Working with clusters
Snakeobjects provides transparent access to different cluster architectures
such as slurm, SGE, and others. This is accomplished by placing cluster
profile path into so_project.yaml file with the name
default_snakemake_args as in this example:
default_snakemake_args: --profile <path to profile folder>
Profile folder should contain file config.yaml with directives specific to
cluster architecture. For more information on profiles see
https://github.com/snakemake-profiles/doc.
Multi-part targets
Sometimes target computing time exceeds several hours or even days. This
negatively affects overall project management: unstable file system,
failure of computational node, or many other causes may delay processing
downstream targets. With multiple processing units it may be valuable to split
a target into sub-targets. In bioinformatics it is convenient to restrict bam
file processing by extracting data for individual chromosomes and then
accumulating partial results in a separate target. In snakeobjects we have
special functions that make subdivision of a target into smaller ones with
the following aggregation of these parts into original target easy to
accomplish. Below is an example.
ids = list_of_part_ids
rule beg:
output:
T('beg-{c}.txt')
shell:
"initialize.sh {wildcards.oid} {wildcards.c} > {output}"
rule part:
input:
T('beg-{c}.txt')
output:
T('part-{c}.txt')
shell:
"process_part.sh {wildcards.oid} {wildcards.c} > {output}; "
rule mergedI:
input:
expand(TE('part-{c}.txt'),c=ids)
output:
T('merged.txt')
shell:
"merge_parts.sh {input} > {output}"
Here initialize.sh, process_part.sh, and merge_parts.sh are appropriate
commands user should provide for his/her application. They are not restricted
to shell codes, but can be python scripts or other executables. More specific
simple examples are presented in demos/d5 and demos/d8. The important element
in this implementation is function TE().