Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.

Choosing parameters

While Sargasso includes a number of “pre-packaged” filtering strategies, each providing a particular balance between sensitivity and specificity, users may wish to explore the choice of filtering parameters in greater depth, so that they can select values tailored to their own experimental situation and sequencing technology. The sargasso_parameter_test script is provided to aid this process.

Usage

The sargasso_parameter_test script runs Sargasso a number of times, on a set of provided samples, using different filtering parameters each time. Filtering statistics are gathered, and informative plots of sensitivity and specificity are produced, allowing users to examine the trade-offs between these two measures.

N.B. sargasso_parameter_test is intended to be run on single-species samples, so that incorrectly assigned reads can be easily identified.

sargasso_parameter_test.sh <data-type>
    [--mapper-executable <mapper-executable>]
    [--reads-base-dir <reads-base-dir>]
    [--mismatch-setting '<mismatch-values>']
    [--minmatch-setting '<minmatch-values>']
    [--multimap-setting '<multimap-values>']
    [--num-threads <num-threads>]
    [--plot-format <plot-format>]
    [--skip-init-run]
    --samples-origin '<species> <species> ...'
    <samples-file> <output-dir>
    (<species> <species-info>)
    (<species> <species-info>)
     ...

Script parameters are described below:

Output

Sargasso filtering is run for every combination of the values given for the parameters --mismatch-setting, --minmatch-setting and --multimap-setting. Initial mapping is run only once, or not at all if the flag --skip-init-run is set, in which case mapping is assumed to have already been run using this script. The main outputs of sargasso_parameter_test are two plots, counts.{png|pdf} and percentage.{png|pdf}. These show, respectively, the number and the percentage of reads which are assigned, marked as ambiguous, or rejected, for each (i) sample, (ii) species into which the data are being separated, and (iii) combination of the mismatch, minmatch and multimap parameters.

Examining these graphs can help the user to choose which filtering parameter values will achieve an acceptable assignment of data to the correct species of origin, while minimising misassignment to the wrong species.

Example usage

sargasso_parameter_test rnaseq \
    --samples-origin 'rat' \
    --mismatch-setting '0 2 4' \
    --minmatch-setting '0 2 4' \
    --multimap-setting '1' \
    --plot-format png \
    test_sample.tsv ~/tmp/sargasso_test \
    rat /srv/data/genome/rat/ensembl-103/STAR_indices/toplevel \
    human /srv/data/genome/human/ensembl-103/STAR_indices/primary_assembly

In this case, we are running the test script using a single sample, whose true species of origin is rat (paths to the raw reads files for the sample are contained in the file test_sample.tsv). Sargasso filtering will be run for every combination of the specified mismatch (0, 2, 4), minmatch (0, 2, 4), and multimap (1) settings – a total of 9 runs – separating the input sample into rat and human reads (where any reads assigned to human will be incorrect assignments). The resulting graphs counts.png and percentage.png will be written into the output directory ~/tmp/sargasso_test.
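To make the size of the parameter grid concrete, the following minimal sketch enumerates the combinations from the example above. It is an illustration only and is not taken from the Sargasso source: list_combinations is a hypothetical helper, and the way the real script iterates over settings may differ.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not taken from the Sargasso source.
# list_combinations is a hypothetical helper that prints one line per
# combination of the three space-separated value lists it is given.
list_combinations() {
    local mismatch minmatch multimap
    # The lists are deliberately left unquoted so the shell splits them on spaces.
    for mismatch in $1; do
        for minmatch in $2; do
            for multimap in $3; do
                echo "mismatch=${mismatch} minmatch=${minmatch} multimap=${multimap}"
            done
        done
    done
}

# Values from the example above: 3 x 3 x 1 = 9 combinations.
list_combinations '0 2 4' '0 2 4' '1'
```

Because the grid grows multiplicatively, adding a second multimap value (for example '1 2') would double the number of filtering runs to 18, which is worth bearing in mind when choosing how many values to test.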