Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.

Support scripts

In addition to the main species_separator script, the Sargasso pipeline relies on a number of supporting Python and Bash scripts. Their usage is described here, although in normal use these scripts do not need to be run directly.

build_bowtie2_index (Bash)

Usage:

build_bowtie2_index
    <sequence-fasta-file> <num-threads> <index-dir> <bowtie2-build-executable>

Build a Bowtie2 index for a species’ genome. build_bowtie2_index is called from the species separation Makefile.
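The four arguments above map fairly directly onto a bowtie2-build invocation. The Python sketch below only assembles such a command line. It is an illustration, not the script's actual contents: the function name bowtie2_build_command and the index basename "genome" are assumptions.

```python
import os


def bowtie2_build_command(fasta_files, num_threads, index_dir,
                          bowtie2_build="bowtie2-build", index_name="genome"):
    """Return an argument list for building a Bowtie2 index.

    index_name (the index basename inside index_dir) is a hypothetical
    choice made for this sketch.
    """
    return [
        bowtie2_build,
        "--threads", str(num_threads),
        ",".join(fasta_files),  # bowtie2-build accepts comma-separated FASTA files
        os.path.join(index_dir, index_name),
    ]
```

The returned list could be passed to subprocess.check_call once the executable and input files exist.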

build_star_index (Bash)

Usage:

build_star_index
    <sequence-fasta-files> <gtf-file> <num-threads> <index-dir> <star-executable>

Build a STAR index for a species’ genome. build_star_index is called from the species separation Makefile.

collate_raw_reads (Bash)

Usage:

collate_raw_reads
    <samples> <raw-reads-directory> <reads-dir> <reads-type>
    <raw-read-files-1> <raw-read-files-2>

Assemble links to the FASTQ files containing raw sequencing reads for each sample. collate_raw_reads is called from the species separation Makefile.
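As a rough illustration of what assembling links could look like, the Python sketch below creates one subdirectory per sample and symlinks that sample's FASTQ files into it. The function name link_raw_reads and the per-sample directory layout are assumptions made for this example; the Bash script itself may organise its links differently.

```python
import os
import tempfile


def link_raw_reads(sample_files, raw_reads_dir, reads_dir):
    """Symlink each sample's raw FASTQ files into <reads_dir>/<sample>/.

    sample_files maps a sample name to the FASTQ file names found in
    raw_reads_dir. This layout is hypothetical, not Sargasso's own.
    """
    links = []
    for sample, file_names in sample_files.items():
        sample_dir = os.path.join(reads_dir, sample)
        os.makedirs(sample_dir, exist_ok=True)
        for file_name in file_names:
            target = os.path.abspath(os.path.join(raw_reads_dir, file_name))
            link = os.path.join(sample_dir, file_name)
            if not os.path.lexists(link):  # leave existing links untouched
                os.symlink(target, link)
            links.append(link)
    return links
```

Linking rather than copying keeps the raw data in one place while giving later steps a predictable per-sample path.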

filter_control (Python)

Usage:

filter_control
    [--log-level=<log-level>] [--reject-multimaps]
    <block-dir> <output-dir> <sample-name> 
    <mismatch-threshold> <minmatch-threshold> <multimap-threshold> 
    (<species>) (<species>) ...

Takes as input a directory containing sets of BAM files, each set being the result of mapping a set of mixed-species sequencing reads against each species’ genome. In normal operation, all sets of BAM files correspond to a single sample that has been split into pieces for efficiency of filtering. Each set of BAM files is passed to an instance of the script filter_sample_reads, running on a separate thread, which writes filtered read mappings to a set of species-specific output BAM files.

filter_control is called by the script filter_reads.
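The fan-out described above can be sketched in Python as follows: one filter_sample_reads command is built per set of BAM files, following the usage pattern documented for that script, and the commands are run on a pool of worker threads. This is a minimal sketch, not the script's implementation; the function names, the file names and the threshold values used in the example call are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def build_filter_command(input_bams, output_bams, thresholds,
                         reject_multimaps=False):
    """Build an argument list following the filter_sample_reads usage.

    input_bams / output_bams map species name -> BAM path for one block.
    thresholds is (mismatch, minmatch, multimap).
    """
    command = ["filter_sample_reads"]
    if reject_multimaps:
        command.append("--reject-multimaps")
    command.extend(str(t) for t in thresholds)
    for species in sorted(input_bams):
        command.extend([species, input_bams[species], output_bams[species]])
    return command


def run_blocks(commands, num_threads, runner=subprocess.check_call):
    """Run one command per block on worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(runner, commands))


# Example command for one block (all values are illustrative only).
example = build_filter_command(
    {"mouse": "mouse_block1.bam", "human": "human_block1.bam"},
    {"mouse": "mouse_block1_filtered.bam", "human": "human_block1_filtered.bam"},
    thresholds=(0, 2, 1),
)
```

The runner argument is injectable so that the command construction can be checked without the real executable being installed.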

filter_reads (Bash)

Usage:

filter_reads
    <data_type> <samples>
    <input-dir> <output-dir> <num-threads>
    <mismatch-threshold> <minmatch-threshold> <multimap-threshold>
    <reject-multimaps>
    (<species>) (<species>) ...

For each sample, take the sequencing reads mapping to each genome, and assign them to their correct species of origin. filter_reads is called by the species separation Makefile.

filter_sample_reads (Python)

Usage:

filter_sample_reads
    [--log-level=<log-level>] [--reject-multimaps]
    <mismatch-threshold> <minmatch-threshold> <multimap-threshold>
    (<species> <species-input-bam> <species-output-bam>)
    (<species> <species-input-bam> <species-output-bam>) ...

filter_sample_reads takes a set of BAM files as input, each the result of mapping a set of mixed-species sequencing reads against one species’ genome, and determines, where possible, from which species each read or read pair originates. Disambiguated read mappings are written to a set of species-specific output BAM files. Note that the input BAM files must be sorted by read name and should contain mappings for the same set of reads. Incorrectly sorted input BAM files will result in erroneous output.

filter_sample_reads is called by the script filter_control.
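The sort-order requirement follows from the way inputs sorted identically can be compared: each species' mappings are read in lockstep, one read at a time. The Python sketch below illustrates that idea on plain (read_name, mismatches) tuples instead of BAM records. The assignment rule it uses (fewest mismatches wins, ties left unassigned) is a deliberately simplified assumption and is not Sargasso's actual filtering criteria; the function names are also hypothetical.

```python
from itertools import groupby, zip_longest
from operator import itemgetter


def reads_by_name(mappings):
    """Group consecutive (read_name, mismatches) records sharing a read name."""
    for name, group in groupby(mappings, key=itemgetter(0)):
        yield name, [mismatches for _, mismatches in group]


def assign_reads(species_mappings):
    """Walk every species' mappings in lockstep and assign each read.

    species_mappings maps species -> list of (read_name, mismatches).
    Toy rule (an assumption, not Sargasso's criteria): a read goes to the
    species whose best mapping has strictly the fewest mismatches.
    """
    species = list(species_mappings)
    streams = [reads_by_name(species_mappings[s]) for s in species]
    assignments = {}
    for entries in zip_longest(*streams):
        if any(entry is None for entry in entries):
            raise ValueError("inputs do not contain the same set of reads")
        names = {name for name, _ in entries}
        if len(names) != 1:
            raise ValueError("inputs are not in the same read order: %s"
                             % sorted(names))
        best = [min(mismatches) for _, mismatches in entries]
        lowest = min(best)
        winners = [s for s, b in zip(species, best) if b == lowest]
        assignments[names.pop()] = winners[0] if len(winners) == 1 else None
    return assignments
```

Because the streams are compared position by position, inputs in different orders are detected here as a read name mismatch. A tool that skipped that check would silently compare mappings of different reads.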

map_reads_dnaseq (Bash)

Usage:

map_reads_dnaseq
    <species> <samples> <bowtie-indexes-dir> <num-threads>
    <input-dir> <output-dir> <reads-type> <bowtie2-executable>

For each sample, map raw sequencing reads to each species’ genome. map_reads_dnaseq is called by the species separation Makefile.

map_reads_rnaseq (Bash)

Usage:

map_reads_rnaseq
    <species> <samples> <bowtie-indexes-dir> <num-threads>
    <input-dir> <output-dir> <reads-type> <star-executable>

For each sample, map raw RNA-seq reads to each species’ genome. map_reads_rnaseq is called by the species separation Makefile.

sort_reads (Bash)

Usage:

sort_reads
    <species> <samples> <num-threads> <input-dir> <output-dir> <tmp-dir>

For each sample, sort mapped reads for each species into name order. sort_reads is called by the species separation Makefile.
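For illustration, a name sort of a single BAM file could be expressed as the command built below. This sketch assumes samtools is the sorting tool, which this page does not state; the function name and the temporary file prefix are likewise hypothetical.

```python
import os


def name_sort_command(input_bam, output_bam, num_threads, tmp_dir,
                      samtools="samtools"):
    """Return an argument list that sorts a BAM file by read name.

    Assumes samtools; the script documented here may use another tool.
    """
    return [
        samtools, "sort",
        "-n",                                   # sort by read name
        "-@", str(num_threads),                 # additional sorting threads
        "-T", os.path.join(tmp_dir, "sort_tmp"),  # prefix for temporary files
        "-o", output_bam,
        input_bam,
    ]
```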
