BOL: Related items

ASplice: a scalable and memory-efficient algorithm for de novo transcriptome assembly

Rahul Nayak — Tue, 03 Jul 2018 04:09:46 -0500

With the increased availability of de novo assembly algorithms, it is feasible to study entire transcriptomes of non-model organisms. While algorithms are available that are specifically designed for performing transcriptome assembly from high-throughput sequencing data, they are very memory-intensive, limiting their applications to small data sets with few libraries. Texas A&M University researchers develop a transcriptome assembly algorithm that recovers alternatively spliced isoforms and expression levels while utilizing as many RNA-Seq libraries as possible that contain hundreds of gigabases of data. New techniques are developed so that computations can be performed on a computing cluster with a moderate amount of physical memory. Availability – A software program that implements the algorithm is available at: http://faculty.cse.tamu.edu/shsze/asplice. Sze SH, Pimsler ML, Tomberlin JK, Jones CD, Tarone AM. (2017) A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms. BMC Genomics 18(Suppl 4):387.

Address of the bookmark: http://faculty.cse.tamu.edu/shsze/asplice/

MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Jit — Wed, 14 Nov 2018 04:50:27 -0600

MEGAHIT is a single-node assembler for large and complex metagenomics NGS reads, such as those from soil. It makes use of a succinct de Bruijn graph (SdBG) to achieve low-memory assembly. MEGAHIT can optionally utilize a CUDA-enabled GPU to accelerate its SdBG construction. The GPU-accelerated version of MEGAHIT has been tested on NVIDIA GTX680 (4G memory) and Tesla K40c (12G memory) with CUDA 5.5, 6.0 and 6.5. MEGAHIT v1.0 or greater also supports IBM Power PC and has been tested on IBM POWER8.
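To illustrate the underlying data structure, here is a minimal sketch of a plain (not succinct) de Bruijn graph built from reads. It is not MEGAHIT's implementation: MEGAHIT uses a compressed SdBG representation, whereas this example uses an ordinary Python dictionary, and the function name `build_de_bruijn` and the toy reads are assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        # Reads shorter than k contribute no k-mers (the range is empty).
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical toy reads that overlap by four bases.
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

A dictionary of sets like this grows quickly with the number of distinct k-mers, which is the memory cost a succinct representation is meant to reduce.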

https://academic.oup.com/bioinformatics/article/31/10/1674/177884

Address of the bookmark: https://github.com/voutcn/megahit

NxRepair: error correction in de novo assemblies using Nextera Mate Pair Reads

BioStar — Thu, 24 Jan 2019 10:35:12 -0600

NxRepair is a Python module that automatically detects large structural errors in de novo assemblies using Nextera mate pair reads. The detector will break a contig at the site of an identified misassembly and will generate a new fasta file containing both the corrected contigs and the correct, unaffected contigs.
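The contig-breaking step described above can be pictured with a small sketch. This is not NxRepair's code; the function `break_contig`, the 0-based breakpoint convention, and the `_partN` naming scheme are all assumptions made for illustration.

```python
def break_contig(name, sequence, breakpoints):
    """Split one contig at the given 0-based positions and name the pieces."""
    # Keep only breakpoints strictly inside the contig, without duplicates.
    inner = sorted({b for b in breakpoints if 0 < b < len(sequence)})
    cuts = [0] + inner + [len(sequence)]
    return [
        (f"{name}_part{i + 1}", sequence[start:end])
        for i, (start, end) in enumerate(zip(cuts, cuts[1:]))
    ]


# Hypothetical contig with two suspected misassembly sites.
pieces = break_contig("contig1", "AAAACCCCGGGG", [4, 8])
for piece_name, piece_seq in pieces:
    print(f">{piece_name}\n{piece_seq}")
```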

https://nxrepair.readthedocs.io/en/latest/tutorial.html

nxrepair aligned_matepairs.bam assemblyfasta.fasta error_locations.csv new_fasta.fasta

Address of the bookmark: https://github.com/rebeccaroisin/nxrepair

Flye: Fast and accurate de novo assembler for single molecule sequencing reads

BioJoker — Tue, 02 Apr 2019 21:54:55 -0500

Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PB / ONT reads as input and outputs polished contigs. Flye also includes a special mode for metagenome assembly.

Address of the bookmark: https://github.com/fenderglass/Flye

StringTie: Transcript assembly and quantification for RNA-Seq

Jit — Tue, 09 Jun 2020 05:21:11 -0500

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only alignments of short reads that can also be used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).

Address of the bookmark: https://ccb.jhu.edu/software/stringtie/

3D de novo assembly (3D DNA) pipeline

Jit — Sun, 02 Feb 2020 13:41:55 -0600

For a detailed description of the pipeline and how it integrates with other tools designed by the Aiden Lab, see the Genome Assembly Cookbook at http://aidenlab.org/assembly.

For the original version of the pipeline and to reproduce the Hs2-HiC and the AaegL4 genomes reported in (Dudchenko et al., Science, 2017) see the original commit.

For the detailed description of the merge section see https://github.com/theaidenlab/AGWG-merge.

Address of the bookmark: https://github.com/theaidenlab/3d-dna

HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads

BioStar — Fri, 27 Mar 2020 22:49:31 -0500

HiCanu is a significant modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering.
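Homopolymer compression, one of the techniques named above, collapses each run of identical bases to a single base. The sketch below shows the idea only; it is not taken from the Canu source, and the function name `homopolymer_compress` is an assumption.

```python
from itertools import groupby


def homopolymer_compress(seq):
    """Collapse every run of identical bases to one base."""
    # groupby yields one (base, run) pair per maximal run of equal characters.
    return "".join(base for base, _run in groupby(seq))


print(homopolymer_compress("GATTTACCA"))  # GATACA
```

Two reads that differ only in the length of a homopolymer run become identical after compression, which is why the transformation helps with that class of sequencing error.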

More at https://www.biorxiv.org/content/10.1101/2020.03.14.992248v3

Address of the bookmark: https://github.com/marbl/canu

VICUNA: a software tool that enables consensus assembly of ultra-deep sequence derived from diverse viral or other heterogeneous populations.

biogeek — Tue, 25 Aug 2020 03:40:17 -0500

VICUNA is a de novo assembly program targeting populations with high mutation rates. It creates a single linear representation of the mixed population on which intra-host variants can be mapped. For clinical samples rich in contamination (e.g., >95%), VICUNA can leverage existing genomes, if available, to assemble only target-alike reads. After initial assembly, it can also use existing genomes to perform guided merging of contigs. For each data set (e.g., Illumina paired read, 454), VICUNA outputs consensus sequence(s) and the corresponding multiple sequence alignment of constituent reads. VICUNA efficiently handles ultra-deep sequence data with coverage in the tens of thousands fold.

http://software.broadinstitute.org/viral/docs/vicuna_v1.0.pdf

Address of the bookmark: https://www.broadinstitute.org/viral-genomics/vicuna

CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes

Jit — Wed, 10 Mar 2021 06:13:49 -0600

The pipeline can use information from scaffolded assemblies (for example from HiC or 10X Genomics), or even from diverged (~65-100 Mya) reference genomes, to order the contigs and thus support the assembly process. This typically results in improved contig N50 when compared to current state-of-the-art methods.

For smaller vertebrate genomes (~1 Gbp), chromosome-scale assemblies can be achieved within 12 h on high-end desktop computers (Intel i7, 12 CPU threads, 128 GB RAM). Larger mammalian genomes (~3 Gbp) can be processed within 15-18 h on server equipment (Xeon, 96 CPU threads, 1 TB RAM).
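For readers unfamiliar with the contig N50 metric mentioned above: it is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch follows, with made-up contig lengths; the function name `n50` is an assumption and the code is unrelated to the CSA pipeline itself.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Hypothetical contig lengths: total 290, so half is 145; 80 + 70 = 150 reaches it.
contig_lengths = [80, 70, 50, 40, 30, 20]
print(n50(contig_lengths))  # 70
```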

Address of the bookmark: https://github.com/HMPNK/CSA2.6

Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation

Neel — Sun, 03 Apr 2022 20:35:19 -0500

Merfin is a k-mer-based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller’s internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.
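The general idea of checking a variant against k-mers in the reads can be sketched as follows. This is a heavily simplified illustration and not Merfin's algorithm: it only tests whether every k-mer spanning the alternate allele occurs at least once in the reads, whereas Merfin compares against expected multiplicities. The function names, the toy sequences, and the presence/absence rule are all assumptions.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def variant_supported(reference, pos, ref_allele, alt_allele, counts, k):
    """True if every k-mer overlapping the alternate allele is seen in the reads."""
    if reference[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("ref_allele does not match the reference at pos")
    # Apply the variant, then take the window of bases that any k-mer
    # touching the edited site could cover.
    edited = reference[:pos] + alt_allele + reference[pos + len(ref_allele):]
    start = max(0, pos - k + 1)
    end = min(len(edited), pos + len(alt_allele) + k - 1)
    window = edited[start:end]
    kmers = [window[i:i + k] for i in range(len(window) - k + 1)]
    return bool(kmers) and all(counts[kmer] > 0 for kmer in kmers)


# Hypothetical data: the reads carry a G where the reference has an A at position 4.
reference = "ACGTACGTTA"
reads = ["ACGTGCG", "GTGCGTTA"]
counts = count_kmers(reads, k=4)
print(variant_supported(reference, 4, "A", "G", counts, k=4))
print(variant_supported(reference, 4, "A", "C", counts, k=4))
```

Because the check uses only k-mer counts from the reads, it does not depend on how the reads were aligned, which mirrors the alignment-independence the description above emphasizes.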

More at https://www.nature.com/articles/s41592-022-01445-y

Address of the bookmark: https://github.com/arangrhie/merfin