BOL: Run miniasm assembler on nanopore reads !

Pages
Jit
Run miniasm assembler on nanopore reads !

Run miniasm assembler on nanopore reads !

Last updated 2320 days ago by Jit Comments (5)

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to the raw input reads.

Find the detail of the reads repeats:

fq2fa ONT_A.fastq ONT_A.fasta

minimap2 -xava-ont ONT_A.fasta ONT_A.fasta -t10 -X > AONT.paf

awk '{if($1==$6){print}}' AONT.paf > AONTself.paf

awk '$5=="-"' AONTself.paf | awk '{print $1}'| sort|uniq > invertedrepeat.list

Generated a few palindrome and repeats plots (highlighting only repeats largest than 10, 20 and 30 kb)

minidot -f 5 -m 30000 AONTself.paf > AONTself30000.eps
sed 's/_template_pass_FAH31515//' AONTself30000.eps > AONTself30000final.eps

minidot -f 5 -m 20000 AONTself.paf > AONTself20000.eps
sed 's/_template_pass_FAH31515//' AONTself20000.eps > AONTself20000final.eps

minidot -f 5 -m 10000 AONTself.paf > AONTself10000.eps
sed 's/_template_pass_FAH31515//' AONTself10000.eps > AONTself10000final.eps

Assemble with miniasm:

miniasm -f ONT_A.fasta AONT.paf > AONT.gfa
grep '^S' AONT.gfa |awk '{print ">"$2"\n"$3}' > AONT_miniasm.fasta

minimap2 -xasm10 AONT_miniasm.fasta AONT_miniasm.fasta -t1 -X > AONT_miniasm.paf

awk '{if($1==$6){print}}' AONT_miniasm.paf > AONT_miniasm_self.paf

minidot -f 5 -m 10000 AONT_miniasm_self.paf > AONT_miniasm_self10000.eps

Njoy the assembly !

Comments

- Jit@jit.aber
Jit 2271 days ago
urbe@urbo214b[allONT] minimap2 -xava-ont both.fastq both.fastq -t20 -X > AONT.paf []
[M::mm_idx_gen::107.693*1.96] collected minimizers
[M::mm_idx_gen::117.963*3.34] sorted minimizers
[M::main::117.963*3.34] loaded/built the index for 702657 target sequence(s)
[M::mm_mapopt_update::121.629*3.27] mid_occ = 525
[M::mm_idx_stat] kmer size: 15; skip: 5; is_HPC: 0; #seq: 702657
[M::mm_idx_stat::123.917*3.22] distinct minimizers: 182052563 (37.70% are singletons); average occurrences: 7.475; average spacing: 2.940
[M::worker_pipeline::208.871*9.76] mapped 120736 sequences
[M::worker_pipeline::288.250*12.60] mapped 101108 sequences
[M::worker_pipeline::370.153*14.25] mapped 87860 sequences
[M::worker_pipeline::446.515*15.24] mapped 83415 sequences
[M::worker_pipeline::520.833*15.92] mapped 82980 sequences
[M::worker_pipeline::600.252*16.47] mapped 85392 sequences
[M::worker_pipeline::674.217*16.86] mapped 85983 sequences
[M::worker_pipeline::748.729*17.18] mapped 55098 sequences
[M::worker_pipeline::822.918*17.44] mapped 54844 sequences
[M::worker_pipeline::899.174*17.66] mapped 55252 sequences
[M::worker_pipeline::966.032*17.83] mapped 54720 sequences
[M::worker_pipeline::1043.026*17.99] mapped 62540 sequences
[M::worker_pipeline::1120.649*18.13] mapped 58892 sequences
[M::worker_pipeline::1197.893*18.26] mapped 56897 sequences
[M::worker_pipeline::1276.147*18.37] mapped 55813 sequences
[M::worker_pipeline::1353.298*18.46] mapped 56524 sequences
[M::worker_pipeline::1425.963*18.54] mapped 55836 sequences
[M::worker_pipeline::1488.410*18.61] mapped 55247 sequences
[M::worker_pipeline::1527.475*18.63] mapped 45017 sequences
[M::mm_idx_gen::1641.600*17.47] collected minimizers
[M::mm_idx_gen::1649.738*17.48] sorted minimizers
[M::main::1649.738*17.48] loaded/built the index for 455528 target sequence(s)
[M::mm_mapopt_update::1649.738*17.48] mid_occ = 525
[M::mm_idx_stat] kmer size: 15; skip: 5; is_HPC: 0; #seq: 455528
[M::mm_idx_stat::1651.772*17.46] distinct minimizers: 186643623 (37.28% are singletons); average occurrences: 7.298; average spacing: 2.937
[M::worker_pipeline::1735.440*17.56] mapped 120736 sequences
[M::worker_pipeline::1812.396*17.66] mapped 101108 sequences
[M::worker_pipeline::1891.734*17.76] mapped 87860 sequences
[M::worker_pipeline::1970.370*17.85] mapped 83415 sequences
[M::worker_pipeline::2046.242*17.93] mapped 82980 sequences
[M::worker_pipeline::2123.177*18.01] mapped 85392 sequences
[M::worker_pipeline::2195.681*18.07] mapped 85983 sequences
[M::worker_pipeline::2271.864*18.14] mapped 55098 sequences
[M::worker_pipeline::2348.983*18.20] mapped 54844 sequences
[M::worker_pipeline::2417.805*18.25] mapped 55252 sequences
[M::worker_pipeline::2480.751*18.30] mapped 54720 sequences
[M::worker_pipeline::2551.193*18.34] mapped 62540 sequences
[M::worker_pipeline::2623.354*18.39] mapped 58892 sequences
[M::worker_pipeline::2701.793*18.44] mapped 56897 sequences
[M::worker_pipeline::2779.713*18.48] mapped 55813 sequences
[M::worker_pipeline::2857.669*18.53] mapped 56524 sequences
[M::worker_pipeline::2935.004*18.57] mapped 55836 sequences
[M::worker_pipeline::3001.654*18.60] mapped 55247 sequences
[M::worker_pipeline::3044.627*18.61] mapped 45017 sequences
[M::mm_idx_gen::3084.077*18.40] collected minimizers
[M::mm_idx_gen::3086.956*18.40] sorted minimizers
[M::main::3086.956*18.40] loaded/built the index for 155969 target sequence(s)
[M::mm_mapopt_update::3086.956*18.40] mid_occ = 525
[M::mm_idx_stat] kmer size: 15; skip: 5; is_HPC: 0; #seq: 155969
[M::mm_idx_stat::3089.212*18.39] distinct minimizers: 129075783 (48.61% are singletons); average occurrences: 3.628; average spacing: 2.938
[M::worker_pipeline::3119.876*18.39] mapped 120736 sequences
[M::worker_pipeline::3148.589*18.40] mapped 101108 sequences
[M::worker_pipeline::3178.007*18.42] mapped 87860 sequences
[M::worker_pipeline::3207.291*18.43] mapped 83415 sequences
[M::worker_pipeline::3235.240*18.45] mapped 82980 sequences
[M::worker_pipeline::3263.770*18.46] mapped 85392 sequences
[M::worker_pipeline::3290.201*18.47] mapped 85983 sequences
[M::worker_pipeline::3318.045*18.48] mapped 55098 sequences
[M::worker_pipeline::3347.963*18.49] mapped 54844 sequences
[M::worker_pipeline::3375.439*18.50] mapped 55252 sequences
[M::worker_pipeline::3399.842*18.51] mapped 54720 sequences
[M::worker_pipeline::3427.431*18.52] mapped 62540 sequences
[M::worker_pipeline::3455.095*18.53] mapped 58892 sequences
[M::worker_pipeline::3483.205*18.54] mapped 56897 sequences
[M::worker_pipeline::3513.617*18.54] mapped 55813 sequences
[M::worker_pipeline::3541.396*18.56] mapped 56524 sequences
[M::worker_pipeline::3569.479*18.57] mapped 55836 sequences
[M::worker_pipeline::3592.692*18.58] mapped 55247 sequences
[M::worker_pipeline::3607.738*18.58] mapped 45017 sequences
[M::main] Version: 2.3-r531
[M::main] CMD: minimap2 -xava-ont -t20 -X both.fastq both.fastq
[M::main] Real time: 3608.342 sec; CPU: 67037.124 sec
urbe@urbo214b[allONT] ~/Tools/miniasm/miniasm -f both.fastq AONT.paf > AONT.gfa []
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::107.578*1.00] read 138602741 hits; stored 132126233 hits and 760720 sequences (8675939030 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::130.468*1.00] 745887 query sequences remain after sub
[M::ma_hit_cut::134.429*1.00] 128171424 hits remain after cut
[M::ma_hit_flt::137.921*1.00] 99545705 hits remain after filtering; crude coverage after filtering: 91.41
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::144.152*1.00] 743866 query sequences remain after sub
[M::ma_hit_cut::146.263*1.00] 99187462 hits remain after cut
[M::ma_hit_contained::149.341*1.00] 18263 sequences and 142921 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 129305 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 83030 arcs
[M::asg_arc_del_multi] removed 1833 multi-arcs
[M::asg_arc_del_asymm] removed 994 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 2053 tips
[M::asg_pop_bubble] popped 254 bubbles and trimmed 20 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 1392 asymmetric arcs
[M::asg_arc_del_short] removed 4908 short overlaps
[M::asg_cut_tip] cut 549 tips
[M::asg_pop_bubble] popped 92 bubbles and trimmed 22 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 273 asymmetric arcs
[M::asg_arc_del_short] removed 325 short overlaps
[M::asg_cut_tip] cut 106 tips
[M::asg_pop_bubble] popped 15 bubbles and trimmed 6 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 191 asymmetric arcs
[M::asg_arc_del_short] removed 239 short overlaps
[M::asg_cut_tip] cut 54 tips
[M::asg_pop_bubble] popped 14 bubbles and trimmed 6 tips
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 28 internal sequences
[M::asg_cut_biloop] cut 45 small bi-loops
[M::asg_cut_tip] cut 1 tips
[M::asg_pop_bubble] popped 3 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 139 asymmetric arcs
[M::asg_arc_del_short] removed 147 short overlaps
[M::asg_cut_tip] cut 24 tips
[M::asg_pop_bubble] popped 1 bubbles and trimmed 0 tips
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.2-r168-dirty
[M::main] CMD: /home/urbe/Tools/miniasm/miniasm -f both.fastq AONT.paf
[M::main] Real time: 168.768 sec; CPU: 168.772 sec
- Poonam Mahapatra@poonam
Poonam Mahapatra 2165 days ago
I followed this to assemble the genome using ONT (nanopore reads)
Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors)
We start with an all against all comparison:
minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
Then we can assemble
miniasm -f reads.fq reads.paf.gz > reads.gfa
Convert GFA to FASTA:
awk '/^S/{print ">"$2"\n"$3}' reads.gfa | fold > reads.fa
And then count how many contigs:
grep ">" reads.fa | wc -l
- Rahul Nayak@rahul
Rahul Nayak 2157 days ago
Here’s the quick and dirty of what was done:
1 Run minimap:
This uses a pre-built set of defaults (the ava-pb in the code below) for analyzing PacBio data. Minimap only accepts two FASTQ files and you need to map your FASTQ file against itself. So, if you have multiple FASTQ sequencing files, you have to concatenate them into a single file prior to running minimap.
minimap2 -x ava-pb -t 23 \ 20170911_oly_pacbio_cat.fastq \ 20170911_oly_pacbio_cat.fastq \ > 20170911_minimap2_pacbio_oly.paf
2 Run miniasm:
This uses your concatenated FASTQ file and the PAF file output from the miniasm step. The code below is taken from the example provided in the miniasm documentation; there are other options available.
miniasm \ -f \ /home/data/20170911_oly_pacbio_cat.fastq /home/data/20170911_minimap2_pacbio_oly.paf > /home/data/20170918_oly_pacbio_miniasm_reads.gfa
3 Convert miniasm output GFA to FASTA
The FASTA file is needed to re-run minimap in Step 4 below.
awk '$1 ~/S/ {print ">"$2"\n"$3}' 20170918_oly_pacbio_miniasm_reads.gfa > 20170918_oly_pacbio_miniasm_reads.fasta
4 Run minimap with default settings
Using the default settings maps the FASTQ reads back to the contigs (the PAF file) created in the fist step. These mappings are required for Racon assembly (Step 5).
minimap2 \ -t 23 \ 20170918_oly_pacbio_miniasm_reads.fasta 20170905_minimap2_pacibio_oly.paf > 20170918_minimap2_mapping_fasta_oly_pacbio.paf
5 Run racon
The output file is the FASTA file listed below.
racon -t 24 \ 20170911_oly_pacbio_cat.fastq \ 20170918_oly_pacbio_minimap_mappings.paf \ 20170918_oly_pacbio_miniasm_assembly.gfa \ 20170918_oly_pacbio_racon1_consensus.fasta
from Sam’s Notebook http://ift.tt/2fKBPUN
- Rahul Nayak@rahul
Rahul Nayak 2157 days ago

Must read paper https://genome.cshlp.org/content/27/5/737.full

Rahul Nayak 786 days ago

minimap2 –x ava-ont \
 ../../trimming_practical/nanofilt/nanofilt_trimmed.fastq \ 
 ../../trimming_practical/nanofilt/nanofilt_trimmed.fastq \
| gzip -1 > ./minimap.paf.gz

miniasm -f \
../../trimming_practical/nanofilt/nanofilt_trimmed.fastq \
./minimap.paf.gz > miniasm.gfa

awk ’/^S/{print “>”$2”\n”$3}’ miniasm.gfa > miniasm.fasta

assembly-stats ./miniasm.fasta

dnadiff -p dnadiff ~/course_data/precompiled/chr17.fasta miniasm.fasta

BOL

Jit

Navigation

Our Sponsors

Run miniasm assembler on nanopore reads !

Run miniasm assembler on nanopore reads !

Comments

1 Run minimap:

2 Run miniasm:

3 Convert miniasm output GFA to FASTA

4 Run minimap with default settings

5 Run racon