When you have both Illumina and Nanopore data, then SPAdes remains a good option for hybrid assembly - SPAdes was used to produce the B fragilis assembly by Mick Watson’s group.
Again, running spades.py will show you the options:
spades.py
This produces:
SPAdes genome assembler v3.10.1
Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o <output_dir>
Basic options:
-o <output_dir> directory to store all the resulting files (required)
--sc this flag is required for MDA (single-cell) data
--meta this flag is required for metagenomic sample data
--rna this flag is required for RNA-Seq data
--plasmid runs plasmidSPAdes pipeline for plasmid detection
--iontorrent this flag is required for IonTorrent data
--test runs SPAdes on toy dataset
-h/--help prints this usage message
-v/--version prints version
Input data:
--12 <filename> file with interlaced forward and reverse paired-end reads
-1 <filename> file with forward paired-end reads
-2 <filename> file with reverse paired-end reads
-s <filename> file with unpaired reads
--pe<#>-12 <filename> file with interlaced reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-1 <filename> file with forward reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-2 <filename> file with reverse reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-s <filename> file with unpaired reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-<or> orientation of reads for paired-end library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--s<#> <filename> file with unpaired reads for single reads library number <#> (<#> = 1,2,..,9)
--mp<#>-12 <filename> file with interlaced reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-1 <filename> file with forward reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-2 <filename> file with reverse reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-s <filename> file with unpaired reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-<or> orientation of reads for mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--hqmp<#>-12 <filename> file with interlaced reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-1 <filename> file with forward reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-2 <filename> file with reverse reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-s <filename> file with unpaired reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-<or> orientation of reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--nxmate<#>-1 <filename> file with forward reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--nxmate<#>-2 <filename> file with reverse reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--sanger <filename> file with Sanger reads
--pacbio <filename> file with PacBio reads
--nanopore <filename> file with Nanopore reads
--tslr <filename> file with TSLR-contigs
--trusted-contigs <filename> file with trusted contigs
--untrusted-contigs <filename> file with untrusted contigs
Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler runs only assembling (without read error correction)
--careful tries to reduce number of mismatches and short indels
--continue continue run from the last available check-point
--restart-from <cp> restart run with updated options and from the specified check-point ('ec', 'as', 'k<int>', 'mc')
--disable-gzip-output forces error correction not to compress the corrected reads
--disable-rr disables repeat resolution stage of assembling
Advanced options:
--dataset <filename> file with dataset description in YAML format
-t/--threads <int> number of threads
[default: 16]
-m/--memory <int> RAM limit for SPAdes in Gb (terminates if exceeded)
[default: 250]
--tmp-dir <dirname> directory for temporary files
[default: <output_dir>/tmp]
-k <int,int,...> comma-separated list of k-mer sizes (must be odd and
less than 128) [default: 'auto']
--cov-cutoff <float> coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset <33 or 64> PHRED quality offset in the input reads (33 or 64)
[default: auto-detect]
As you can see this is also a “pipeline” of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:
spades.py -t 4 \
-m 32 \
-k 31,51,71 \
--only-assembler \
-1 miseq.1.fastq -2 miseq.2.fastq \
--nanopore minion.fastq \
-o hybrid_assembly
In turn, these parameters mean
Comments
use SPAdes to assemble the data. SPAdes is a swiss-army knife of genome assembly tools, and by default includes read correction. This takes up lots of RAM, so we are going to skip it. We will also only use 3 kmers to save time:
Use samtools to extract the top contig:
Finally, a quick comparison to the reference: