use SPAdes to assemble the data. SPAdes is a Swiss Army knife of genome assembly tools, and by default it includes read correction. Read correction takes up lots of RAM, so we are going to skip it with --only-assembler. We will also use only three k-mer sizes (21, 51 and 71) to save time:
./SPAdes-3.6.2-Linux/bin/spades.py --only-assembler \
-t 4 -k 21,51,71 \
-1 SRR2627175_1.fastq.gz \
-2 SRR2627175_2.fastq.gz \
--nanopore minion.pass.2D.fastq \
-o SPAdes_hybrid &
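Use samtools to extract the top contig. The contig name will probably differ between runs, so check the header printed by head and use that name in the final command: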
head -n 1 SPAdes_hybrid/contigs.fasta
samtools faidx SPAdes_hybrid/contigs.fasta
samtools faidx SPAdes_hybrid/contigs.fasta NODE_1_length_4620446_cov_135.169_ID_22238 > single_contig.fa
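Because the contig name is copied by hand above, a small helper can read it from the file instead. This is only a sketch: first_contig_name is a hypothetical function name, not part of samtools or SPAdes, and it assumes the first record in contigs.fasta is the one you want.

```shell
# Hypothetical helper: print the name of the first record in a FASTA file.
first_contig_name() {
    # Take the first header line, drop the leading ">" and
    # anything after the first whitespace character.
    head -n 1 "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//'
}

# Example usage (assumes the SPAdes output path used above):
# top=$(first_contig_name SPAdes_hybrid/contigs.fasta)
# samtools faidx SPAdes_hybrid/contigs.fasta "$top" > single_contig.fa
```

Passing the name through a variable avoids retyping the length and coverage values embedded in the SPAdes header.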
Finally, a quick comparison to the reference:
sudo apt-get install mummer
curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=NC_000913.3&rettype=fasta&retmode=txt" > NC_000913.3.fa
nucmer NC_000913.3.fa single_contig.fa
mummerplot -png out.delta
display out.png &
We now need to install the poRe dependencies in R, which is very easy:
R
source("http://www.bioconductor.org/biocLite.R")
biocLite("rhdf5")
install.packages(c("shiny","bit64","data.table","svDialogs"))
q()
R may ask if you want to install into a local library; just say Y and accept the defaults. We need to download poRe from SourceForge, and we are using version 0.16.
Once it has downloaded, and you are back at the Linux command line:
R CMD INSTALL poRe_0.16.tar.gz
The FASTQ extraction scripts for poRe are on GitHub, so let’s go get those:
git clone https://github.com/mw55309/poRe_scripts.git
We will assemble using SPAdes, so let’s go get that:
wget http://spades.bioinf.spbau.ru/release3.6.2/SPAdes-3.6.2-Linux.tar.gz
gunzip < SPAdes-3.6.2-Linux.tar.gz | tar xvf -
Now we are ready to go. First off, let’s extract the 2D sequence data as FASTQ from the MinION data. Nick’s SQK-MAP-006 data are in the old FAST5 format, so we use the script in “old_format”:
./poRe_scripts/old_format/extract2D MAP006-1/MAP006-1_downloads/pass/ > minion.pass.2D.fastq &
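The extraction runs in the background, so it is worth checking that it has finished and produced a sensible number of reads before moving on. A minimal sketch, assuming the output is strict four-line FASTQ (count_fastq_reads is a hypothetical helper name, not part of poRe):

```shell
# Hypothetical helper: count the records in a four-line-per-record FASTQ file.
count_fastq_reads() {
    # Each read occupies exactly four lines: header, sequence, "+", qualities.
    awk 'END { print NR / 4 }' "$1"
}

# Example usage, once the background job has completed:
# wait
# count_fastq_reads minion.pass.2D.fastq
```

A result that is not a whole number suggests the file is truncated or still being written.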