BOL: Related items

pbmm2: A minimap2 frontend for PacBio native data formats

BioStar — Tue, 18 Feb 2020 03:36:22 -0600

pbmm2 is a SMRT C++ wrapper for minimap2's C API. Its purpose is to support native PacBio in- and output, provide sets of recommended parameters, generate sorted output on-the-fly, and postprocess alignments. Sorted output can be used directly for polishing using GenomicConsensus, if BAM has been used as input to pbmm2. Benchmarks show that pbmm2 outperforms BLASR in sequence identity, number of mapped bases, and especially runtime. pbmm2 is the official replacement for BLASR.
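A typical invocation might look like the sketch below. The subcommands and flags (`pbmm2 index`, `pbmm2 align`, `--preset`, `--sort`) are as I recall them from the pbmm2 README, so confirm them with `pbmm2 align --help` for your version. The `run` helper, the `align_subreads` function, and all file names are hypothetical and exist only for illustration.

```shell
# Sketch only: file names are placeholders, not files from the pbmm2 project.
# With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# align_subreads REF_FASTA SUBREADS_BAM OUT_BAM  (hypothetical helper)
align_subreads() {
    ref=$1
    reads=$2
    aligned=$3
    index="${ref%.*}.mmi"

    # Build the minimap2 index once so later alignments can reuse it.
    run pbmm2 index "$ref" "$index" || return

    # BAM in, coordinate-sorted BAM out.
    run pbmm2 align "$index" "$reads" "$aligned" --preset SUBREAD --sort
}
```

Calling `DRY_RUN=1 align_subreads ref.fasta movie.subreads.bam aligned.bam` prints the two commands without running them, which is a cheap way to check paths before starting a long alignment.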

Address of the bookmark: https://github.com/PacificBiosciences/pbmm2

WGS Celera Assembler version 8.3rc2

Jit — Mon, 10 Apr 2017 04:45:40 -0500

These are release notes for Celera Assembler version 8.3rc2, which was released on May 24, 2015.

This distribution package provides a stable, tested, documented version of the software. The distribution is usable on most Unix-like platforms, and some platforms have pre-compiled binary distributions ready for installation.

The source code package includes full source code (revision 4627), Makefiles, and scripts. A subset of the kmer package (http://kmer.sourceforge.net/, version r1994), used by some modules of Celera Assembler, is included. This distribution includes SAMtools (http://samtools.sourceforge.net/), Jellyfish 2.0 (http://www.cbcb.umd.edu/software/jellyfish/), PBUTGCNS (https://github.com/pbjd/pbutgcns), PBDAGCON (https://github.com/PacificBiosciences/pbdagcon), BLASR (https://github.com/PacificBiosciences/BLASR), and parts of the Falcon assembler (https://github.com/PacificBiosciences/FALCON/tree/v0.1.3).

Full documentation can be found online at http://wgs-assembler.sourceforge.net/.

Executables and scripts in the distribution's bin directory:

urbe@urbo214b[bin] ls []
-rwxrwxr-x 1 urbe urbe 11K Apr 10 11:41 addCNSToStore
-rwxrwxr-x 1 urbe urbe 575K Apr 10 11:41 addReadsToUnitigs
-rwxrwxr-x 1 urbe urbe 128K Apr 10 11:41 analyzeBest
-rwxrwxr-x 1 urbe urbe 257K Apr 10 11:41 analyzePosMap
-rwxrwxr-x 1 urbe urbe 1,5M Apr 10 11:41 analyzeScaffolds
-rwxrwxr-x 1 urbe urbe 224K Apr 10 11:41 asmOutputFasta
-rwxrwxr-x 1 urbe urbe 448K Apr 10 11:41 asmOutputStatistics
-rwxrwxr-x 1 urbe urbe 2,4K Apr 10 11:41 asmToAGP.pl
-rwxrwxr-x 1 urbe urbe 7,6M Apr 10 11:41 blasr
-rwxrwxr-x 1 urbe urbe 1,6M Apr 10 11:41 bogart
-rwxrwxr-x 1 urbe urbe 183K Apr 10 11:41 bogus
-rwxrwxr-x 1 urbe urbe 272K Apr 10 11:41 bogusness
-rwxrwxr-x 1 urbe urbe 247K Apr 10 11:41 buildPosMap
-rwxrwxr-x 1 urbe urbe 213K Apr 10 11:41 buildRefContigs
-rwxrwxr-x 1 urbe urbe 990K Apr 10 11:41 buildUnitigs
-rwxrwxr-x 1 urbe urbe 18K Apr 10 11:41 ca2ace.pl
-rwxrwxr-x 1 urbe urbe 12K Apr 10 11:41 caqc_help.ini
-rwxrwxr-x 1 urbe urbe 61K Apr 10 11:41 caqc.pl
-rwxrwxr-x 1 urbe urbe 23K Apr 10 11:41 cat-corrects
-rwxrwxr-x 1 urbe urbe 24K Apr 10 11:41 cat-erates
-rwxrwxr-x 1 urbe urbe 1,9M Apr 10 11:41 cgw
-rwxrwxr-x 1 urbe urbe 1,4M Apr 10 11:41 cgwDump
-rwxrwxr-x 1 urbe urbe 204K Apr 10 11:41 chimChe
-rwxrwxr-x 1 urbe urbe 201K Apr 10 11:40 chimera
-rwxrwxr-x 1 urbe urbe 220K Apr 10 11:41 classifyMates
-rwxrwxr-x 1 urbe urbe 201K Apr 10 11:41 classifyMatesApply
-rwxrwxr-x 1 urbe urbe 215K Apr 10 11:41 classifyMatesPairwise
-rwxrwxr-x 1 urbe urbe 366K Apr 10 11:41 computeCoverageStat
-rwxrwxr-x 1 urbe urbe 9,8K Apr 10 11:41 convert-fasta-to-v2.pl
-rwxrwxr-x 1 urbe urbe 48K Apr 10 11:41 convertOverlap
-rwxrwxr-x 1 urbe urbe 119K Apr 10 11:41 convertSamToCA
-rwxrwxr-x 1 urbe urbe 20K Apr 10 11:41 convertToPBCNS
-rwxrwxr-x 1 urbe urbe 197K Apr 10 11:41 correct-frags
-rwxrwxr-x 1 urbe urbe 259K Apr 10 11:41 correct-olaps
-rwxrwxr-x 1 urbe urbe 520K Apr 10 11:41 correctPacBio
-rwxrwxr-x 1 urbe urbe 540K Apr 10 11:41 ctgcns
-rwxrwxr-x 1 urbe urbe 162K Apr 10 11:40 deduplicate
-rwxrwxr-x 1 urbe urbe 37K Apr 10 11:41 demotePosMap
-rwxrwxr-x 1 urbe urbe 1,5M Apr 10 11:41 dumpCloneMiddles
-rwxrwxr-x 1 urbe urbe 124K Apr 10 11:41 dumpPBRLayoutStore
-rwxrwxr-x 1 urbe urbe 1,3M Apr 10 11:41 dumpSingletons
-rwxrwxr-x 1 urbe urbe 171K Apr 10 11:41 erate-estimate
-rwxrwxr-x 1 urbe urbe 221K Apr 10 11:40 estimate-mer-threshold
-rwxrwxr-x 1 urbe urbe 1,5M Apr 10 11:41 extendClearRanges
-rwxrwxr-x 1 urbe urbe 1,3M Apr 10 11:41 extendClearRangesPartition
-rwxrwxr-x 1 urbe urbe 205K Apr 10 11:40 extractmessages
-rwxrwxr-x 1 urbe urbe 7,2M Apr 10 11:41 falcon_sense
-rwxrwxr-x 1 urbe urbe 9,8K Apr 10 11:41 fastaToCA
-rwxrwxr-x 1 urbe urbe 124K Apr 10 11:40 fastqAnalyze
-rwxrwxr-x 1 urbe urbe 137K Apr 10 11:40 fastqSample
-rwxrwxr-x 1 urbe urbe 62K Apr 10 11:40 fastqSimulate
-rwxrwxr-x 1 urbe urbe 121K Apr 10 11:40 fastqSimulate-sort
-rwxrwxr-x 1 urbe urbe 246K Apr 10 11:40 fastqToCA
-rwxrwxr-x 1 urbe urbe 140K Apr 10 11:41 filterOverlap
-rwxrwxr-x 1 urbe urbe 341K Apr 10 11:40 finalTrim
-rwxrwxr-x 1 urbe urbe 228K Apr 10 11:41 fixUnitigs
-rwxrwxr-x 1 urbe urbe 147K Apr 10 11:40 fragmentDepth
-rwxrwxr-x 1 urbe urbe 29K Apr 10 11:41 fragsInVars
-rwxrwxr-x 1 urbe urbe 545K Apr 10 11:41 frgs2clones
-rwxrwxr-x 1 urbe urbe 398K Apr 10 11:40 gatekeeper
-rwxrwxr-x 1 urbe urbe 139K Apr 10 11:40 gatekeeperbench
-rwxrwxr-x 1 urbe urbe 167K Apr 10 11:40 gkpStoreCreate
-rwxrwxr-x 1 urbe urbe 147K Apr 10 11:40 gkpStoreDumpFASTQ
-rwxrwxr-x 1 urbe urbe 184K Apr 10 11:41 greedyFragmentTiling
-rwxrwxr-x 1 urbe urbe 1,6K Apr 10 11:41 greedy_layout_to_IUM
-rwxrwxr-x 1 urbe urbe 142K Apr 10 11:40 initialTrim
-rwxrwxr-x 1 urbe urbe 967K Apr 10 11:41 jellyfish
-rwxrwxr-x 1 urbe urbe 219K Apr 10 11:41 markRepeatUnique
-rwxrwxr-x 1 urbe urbe 273K Apr 10 11:40 markUniqueUnique
-rwxrwxr-x 1 urbe urbe 114K Apr 10 11:40 mercy
-rwxrwxr-x 1 urbe urbe 3,8K Apr 10 11:41 mergeqc.pl
-rwxrwxr-x 1 urbe urbe 422K Apr 10 11:40 merTrim
-rwxrwxr-x 1 urbe urbe 125K Apr 10 11:40 merTrimApply
-rwxrwxr-x 1 urbe urbe 376K Apr 10 11:40 meryl
-rwxrwxr-x 1 urbe urbe 176K Apr 10 11:41 metagenomics_ovl_analyses
-rwxrwxr-x 1 urbe urbe 297K Apr 10 11:41 olap-from-seeds
-rwxrwxr-x 1 urbe urbe 275K Apr 10 11:41 outputLayout
-rwxrwxr-x 1 urbe urbe 229K Apr 10 11:41 overlapInCore
-rwxrwxr-x 1 urbe urbe 144K Apr 10 11:40 overlap_partition
-rwxrwxr-x 1 urbe urbe 179K Apr 10 11:41 overlapStats
-rwxrwxr-x 1 urbe urbe 179K Apr 10 11:41 overlapStore
-rwxrwxr-x 1 urbe urbe 153K Apr 10 11:41 overlapStoreBucketizer
-rwxrwxr-x 1 urbe urbe 175K Apr 10 11:41 overlapStoreBuild
-rwxrwxr-x 1 urbe urbe 33K Apr 10 11:41 overlapStoreIndexer
-rwxrwxr-x 1 urbe urbe 48K Apr 10 11:41 overlapStoreSorter
-rwxrwxr-x 1 urbe urbe 604K Apr 10 11:40 overmerry
lrwxrwxrwx 1 urbe urbe 4 Apr 10 11:41 pacBioToCA -> PBcR
-rwxrwxr-x 1 urbe urbe 131K Apr 10 11:41 PBcR
-rwxrwxr-x 1 urbe urbe 2,9M Apr 10 11:41 pbdagcon
-rwxrwxr-x 1 urbe urbe 1,9M Apr 10 11:41 pbutgcns
-rwxrwxr-x 1 urbe urbe 201K Apr 10 11:40 remove_fragment
-rwxrwxr-x 1 urbe urbe 153K Apr 10 11:40 removeMateOverlap
-rwxrwxr-x 1 urbe urbe 2,5K Apr 10 11:41 replaceUIDwithName-fastq
-rwxrwxr-x 1 urbe urbe 1,2K Apr 10 11:41 replaceUIDwithName-posmap
-rwxrwxr-x 1 urbe urbe 1,3M Apr 10 11:41 resolveSurrogates
-rwxrwxr-x 1 urbe urbe 139K Apr 10 11:41 rewriteCache
-rwxrwxr-x 1 urbe urbe 232K Apr 10 11:41 runCA
-rwxrwxr-x 1 urbe urbe 88K Apr 10 11:41 runCA-dedupe
-rwxrwxr-x 1 urbe urbe 14K Apr 10 11:41 runCA-overlapStoreBuild
-rwxrwxr-x 1 urbe urbe 3,6K Apr 10 11:41 run_greedy.csh
-rwxrwxr-x 1 urbe urbe 297K Apr 10 11:40 sffToCA
-rwxrwxr-x 1 urbe urbe 13K Apr 10 11:40 show-corrects
-rwxrwxr-x 1 urbe urbe 557K Apr 10 11:41 splitUnitigs
-rwxrwxr-x 1 urbe urbe 1,4M Apr 10 11:41 terminator
drwxrwxr-x 2 urbe urbe 4,0K Apr 10 11:41 TIGR
-rwxrwxr-x 1 urbe urbe 526K Apr 10 11:41 tigStore
-rwxrwxr-x 1 urbe urbe 35K Apr 10 11:41 tracearchiveToCA
-rwxrwxr-x 1 urbe urbe 35K Apr 10 11:41 tracedb-to-frg.pl
-rwxrwxr-x 1 urbe urbe 44K Apr 10 11:41 trimFastqByQVWindow
-rwxrwxr-x 1 urbe urbe 18K Apr 10 11:40 uidclient
-rwxrwxr-x 1 urbe urbe 589K Apr 10 11:41 unitigger
-rwxrwxr-x 1 urbe urbe 42K Apr 10 11:40 upgrade-v8-to-v9
-rwxrwxr-x 1 urbe urbe 42K Apr 10 11:40 upgrade-v9-to-v10
-rwxrwxr-x 1 urbe urbe 854 Apr 10 11:41 utg2fasta
-rwxrwxr-x 1 urbe urbe 731K Apr 10 11:41 utgcns
-rwxrwxr-x 1 urbe urbe 561K Apr 10 11:41 utgcnsfix

Address of the bookmark: http://wgs-assembler.sourceforge.net/wiki/index.php/Main_Page

shovill: Assemble bacterial isolate genomes from Illumina paired-end reads

BioStar — Sat, 02 Jan 2021 07:05:36 -0600

Shovill is a pipeline which uses SPAdes at its core, but alters the steps before and after the primary assembly step to get similar results in less time. Shovill also supports other assemblers like SKESA, Velvet and Megahit, so you can take advantage of the pre- and post-processing that Shovill provides with those too.

Address of the bookmark: https://github.com/tseemann/shovill

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Jit — Wed, 06 Dec 2017 02:08:14 -0600

COPE (Connecting Overlapped Pair-End reads) is an efficient tool that connects overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is, respectively, 10% and 2% higher than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs are found to have fewer errors and give a 14-fold improvement in the N50 measurement when compared with the contigs produced using unconnected reads.
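For readers unfamiliar with the N50 measurement quoted above: by the usual definition, it is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The sketch below computes it with `sort` and `awk`. The `n50` function name and the contig lengths are made up for illustration and are not taken from the COPE paper.

```shell
# n50: read one contig length per line on stdin and print the N50.
n50() {
    sort -nr | awk '
        { len[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                running += len[i]
                # First contig at which the cumulative length reaches half the total.
                if (running >= half) { print len[i]; exit }
            }
        }'
}

# Toy input: the lengths sum to 300, so the threshold is 150.
# 80 + 70 reaches 150, which makes the N50 equal to 70.
printf '%s\n' 80 70 50 40 30 20 10 | n50
```

Because N50 depends only on the length distribution, merging many short contigs into a few long ones raises it sharply even when the total assembly size barely changes.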

Address of the bookmark: ftp://ftp.genomics.org.cn/pub/cope

ALPACA: A hybrid strategy for assembly of genomic DNA shotgun sequencing reads.

Seema Singh — Mon, 30 Apr 2018 04:38:40 -0500

ALPACA requires Celera Assembler 8.3 or later. It is recommended to build Celera Assembler from source. (Why? The pre-built binaries CA_8.3rc1 and CA8.3rc2 will work for any large data set.)

The paper describing ALPACA is at https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3927-8

Address of the bookmark: https://github.com/VicugnaPacos/ALPACA

Frequently used mapping command lines for paired-end reads (PE 2x100)

Jit — Tue, 15 May 2018 08:59:29 -0500

bowtie2 -x hs37m -X 650 -q -1 r1.fq -2 r2.fq -S r12.bowtie2.sam

bwa aln hs37m.fa r1.fq > r1.sai && bwa aln hs37m.fa r2.fq > r2.sai \
&& bwa sampe hs37m.fa r1.sai r2.sai r1.fq r2.fq > r12.bwa.sam

bwa bwasw ../index/bwa/hs37m.fa r12.fq > r12.bwasw.sam

gsnap -A sam -d hs37m r1.fq r2.fq > r12.gsnap.sam

novoalign -r Random -o SAM -f r1.fq r2.fq -i 500 50 -d hs37m-k14s3.novo > r12.novo.sam

smalt map -f samsoft -i 650 -o r12.smalt-k20s13.sam hs37m-k20s13 r1.fq r2.fq

stampy.py -g hs37m -h hs37m -o r12.stampy.sam -M r1.fq,r2.fq

soap -D hs37m.fa.index -a r1.fq -b r2.fq -l 32 -g 3 -u dummy -2 dummy -o r12.soap

EAGLER: a scaffolding tool for long reads.

Jit — Mon, 04 Jun 2018 05:26:03 -0500

EAGLER is a scaffolding tool for long reads. The scaffolder takes as input a draft genome created by any NGS assembler and a set of long reads. The long reads are used to extend the contigs present in the NGS draft and possibly join overlapping contigs. EAGLER supports both PacBio and Oxford Nanopore reads.

The tool should be compatible with most UNIX flavors and has been successfully tested on the following operating systems:

Mac OS X 10.11.1
Mac OS X 10.10.3
Ubuntu 14.04 LTS

https://bib.irb.hr/datoteka/844447.Diplomski_2015_Luka_terbi.pdf

Address of the bookmark: https://github.com/mculinovic/EAGLER

JBrowse: Embeddable genome browser built completely with JavaScript and HTML5

Jit — Fri, 29 Jun 2018 09:19:56 -0500

JBrowse is a fast, embeddable genome browser built completely with JavaScript and HTML5, with optional run-once data formatting tools written in Perl.

Headline features:

Fast, smooth scrolling and zooming. Explore your genome with unparalleled speed.
Scales easily to multi-gigabase genomes and deep-coverage sequencing.
Quickly open and view data files on your computer without uploading them to any server.
Supports GFF3, BED, FASTA, Wiggle, BigWig, BAM, VCF (with either .tbi or .idx index), REST, and more. BAM, BigBed, BigWig, and VCF data are displayed directly from chunks of the compressed binary files, no conversion needed.
Includes an optional "faceted" track selector (see demo) suitable for large installations with thousands of tracks.
Very light server resource requirements. In fact, JBrowse has no back-end server code, just tools for formatting data files to be read directly over HTTP. Serve huge datasets from a single low-cost cloud instance.
Can run as a stand-alone app on OSX and Windows using the Electron platform.
Highly extensible plugin architecture, with a large plugin registry of existing examples at https://gmod.github.io/jbrowse-registry

https://jbrowse.org/

Address of the bookmark: https://github.com/GMOD/jbrowse

SimLoRD: A read simulator for third generation sequencing reads

Aaryan Lokwani — Wed, 22 Aug 2018 10:40:27 -0500

SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.

Reads are simulated from both strands of a provided or randomly generated reference sequence.

The reference can be read from a FASTA file or randomly generated with a given GC content. It can consist of several chromosomes, whose structure is respected when drawing reads. (Simulation of genome rearrangements may be incorporated at a later stage.)
The read lengths can be determined in four ways: drawing from a log-normal distribution (typical for genomic DNA), sampling from an existing FASTQ file (typical for RNA), sampling from a text file with integers (RNA), or using a fixed length.
Quality values and number of passes depend on fragment length.
Provided subread error probabilities are modified according to the number of passes.
Outputs reads in FASTQ format and alignments in SAM format.
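
The first of the four read-length options, drawing from a log-normal distribution, can be sketched in a few lines of `awk`. This is not SimLoRD's code. It uses a plain two-parameter log-normal with a Box-Muller normal draw, and the function name `sample_read_lengths` and the parameter values (mu 8.5, sigma 0.5) are illustrative assumptions rather than SimLoRD defaults.

```shell
# sample_read_lengths N MU SIGMA SEED
# Print N read lengths drawn from a log-normal distribution, i.e. exp(MU + SIGMA * Z)
# where Z is a standard normal variate.
sample_read_lengths() {
    awk -v n="$1" -v mu="$2" -v sigma="$3" -v seed="$4" 'BEGIN {
        srand(seed)
        pi = atan2(0, -1)
        for (i = 0; i < n; i++) {
            u1 = 1 - rand()   # in (0, 1], so log(u1) is always defined
            u2 = rand()
            z = sqrt(-2 * log(u1)) * cos(2 * pi * u2)   # Box-Muller transform
            len = int(exp(mu + sigma * z) + 0.5)        # round to whole bases
            if (len < 1) len = 1
            printf "%d\n", len
        }
    }'
}

# Five lengths with a fixed seed; exp(8.5) is roughly 4900 bases, the median of this toy distribution.
sample_read_lengths 5 8.5 0.5 42
```

A log-normal is a natural fit here because read lengths are strictly positive and right-skewed, with a long tail of unusually long reads.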

Address of the bookmark: https://bitbucket.org/genomeinformatics/simlord/

Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads

Rahul Nayak — Fri, 19 Oct 2018 08:23:42 -0500

Rainbow was developed to provide an ultra-fast and memory-efficient solution for clustering and assembling short reads produced by RAD-seq. First, Rainbow clusters reads using a spaced seed method. Then, Rainbow applies a strategy similar to heterozygote calling to divide potential groups into haplotypes in a top-down manner. Along a guided tree, it then iteratively merges sibling leaves in a bottom-up manner if they are similar enough. Here, similarity is defined by comparing the 2nd reads of a RAD segment. This approach tries to collapse heterozygotes while discriminating repetitive sequences. Finally, Rainbow uses a greedy algorithm to locally assemble merged reads into contigs. Rainbow outputs not only the optimal but also suboptimal assembly results. Based on simulated data and a real guppy RAD-seq data set, we show that Rainbow is more competent than the other tools in dealing with RAD-seq data.

Address of the bookmark: https://sourceforge.net/projects/bio-rainbow/files/