BOL: Related items

StringTie Transcript assembly and quantification for RNA-Seq

Jit — Tue, 09 Jun 2020 05:21:11 -0500

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only alignments of short reads that can also be used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).

Address of the bookmark: https://ccb.jhu.edu/software/stringtie/

3D de novo assembly (3D DNA) pipeline

Jit — Sun, 02 Feb 2020 13:41:55 -0600

For a detailed description of the pipeline and how it integrates with other tools designed by the Aiden Lab, see the Genome Assembly Cookbook at http://aidenlab.org/assembly.

For the original version of the pipeline and to reproduce the Hs2-HiC and the AaegL4 genomes reported in (Dudchenko et al., Science, 2017) see the original commit.

For the detailed description of the merge section see https://github.com/theaidenlab/AGWG-merge.

Address of the bookmark: https://github.com/theaidenlab/3d-dna

HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads

BioStar — Fri, 27 Mar 2020 22:49:31 -0500

HiCanu is a significant modification of the Canu assembler, designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering.
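
To make the term "homopolymer compression" concrete, here is a minimal Python sketch of the general idea: every run of identical bases is collapsed to a single base, so that differences in homopolymer run length no longer separate two otherwise identical sequences. This is only an illustration. The function name is hypothetical and nothing here is taken from the HiCanu source code.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases into a single base.

    Illustrative helper only (hypothetical name), not HiCanu's implementation.
    """
    # groupby yields one (base, run) pair per maximal run of equal characters.
    return "".join(base for base, _run in groupby(seq))


# Two reads that differ only in homopolymer run lengths compress to the same string.
read_a = "AAACCGTTTTA"
read_b = "ACCCGGTA"
print(homopolymer_compress(read_a))  # ACGTA
print(homopolymer_compress(read_b))  # ACGTA
```

After compression, the two example reads above become identical, which is why run-length errors stop interfering with overlap detection in this simplified picture.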

More at https://www.biorxiv.org/content/10.1101/2020.03.14.992248v3

Address of the bookmark: https://github.com/marbl/canu

Supernova: generates phased, whole-genome de novo assemblies from a Chromium-prepared library.

Jit — Sun, 31 May 2020 01:59:30 -0500

Supernova generates phased, whole-genome de novo assemblies from a Chromium-prepared library.

Please see Achieving Success with De Novo Assembly and System Requirements before creating your Chromium libraries for assembly.

Supernova should be run using 38-56x coverage of the genome.
• Somewhat higher coverage is sometimes advantageous.
• Supernova will exit if it finds that coverage is far from the recommended range.
• Note that at most 2.14 billion reads are allowed.
• Please note that we have not extensively tested genomes larger than human, and any genome above approximately 4 GB should be considered experimental and is not supported.
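
As a rough pre-flight check before building libraries, the numbers above can be turned into a small calculation. The Python sketch below assumes raw coverage is simply reads × read length / genome size, which may differ from how Supernova itself estimates coverage. The function names are hypothetical, and only the 38-56x range and the 2.14 billion read cap come from the text above.

```python
from typing import List

MIN_COVERAGE = 38.0          # lower end of the recommended range quoted above
MAX_COVERAGE = 56.0          # upper end of the recommended range quoted above
MAX_READS = 2_140_000_000    # "at most 2.14 billion reads are allowed"


def estimate_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Raw coverage, ASSUMING coverage = reads * read length / genome size."""
    return num_reads * read_length / genome_size


def coverage_warnings(num_reads: int, read_length: int, genome_size: int) -> List[str]:
    """Return human-readable warnings; an empty list means nothing was flagged.

    This is stricter than the tool, which is described as exiting only when
    coverage is far from the recommended range.
    """
    warnings = []
    if num_reads > MAX_READS:
        warnings.append("read count exceeds the 2.14 billion read limit")
    coverage = estimate_coverage(num_reads, read_length, genome_size)
    if not MIN_COVERAGE <= coverage <= MAX_COVERAGE:
        warnings.append(f"estimated coverage {coverage:.1f}x is outside 38-56x")
    return warnings


# Hypothetical example: 1 billion 150 bp reads on a 3.2 Gb genome.
print(estimate_coverage(1_000_000_000, 150, 3_200_000_000))  # 46.875
print(coverage_warnings(1_000_000_000, 150, 3_200_000_000))  # []
```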

Address of the bookmark: https://support.10xgenomics.com/de-novo-assembly/software/pipelines/latest/using/running

auN: a new metric to measure assembly contiguity

Jit — Tue, 02 Aug 2022 01:18:47 -0500

Given a de novo assembly, we often measure the “average” contig length by N50. N50 is neither the real average nor the median. It is the length of the contig such that this and longer contigs cover at least 50% of the assembly. A longer N50 indicates better contiguity. We can similarly define Nx such that contigs no shorter than Nx cover x% of the assembly. The Nx curve plots Nx as a function of x, where x ranges from 0 to 100.
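
The definitions above are easy to check on a toy assembly. The Python sketch below computes Nx directly from a list of contig lengths. It also computes auN as the length-weighted mean contig length, sum(L²) / sum(L), which is the area under the Nx curve as described in the linked post. The function names and the example lengths are made up for illustration.

```python
from typing import Sequence


def nx(lengths: Sequence[int], x: float) -> int:
    """Length of the contig at which this and longer contigs cover >= x% of the assembly."""
    if not 0 <= x <= 100:
        raise ValueError("x must be between 0 and 100")
    if not lengths:
        raise ValueError("need at least one contig")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer-friendly form of running / total >= x / 100.
        if running * 100 >= total * x:
            return length
    return ordered[-1]  # only reachable through floating-point edge cases


def aun(lengths: Sequence[int]) -> float:
    """Area under the Nx curve: sum of squared lengths over the total length."""
    total = sum(lengths)
    return sum(length * length for length in lengths) / total


contigs = [80, 70, 50, 40, 30, 20, 10]  # toy assembly, 300 bases in total
print(nx(contigs, 50))   # 70
print(nx(contigs, 90))   # 30
print(aun(contigs))      # 56.0
```

Unlike N50, which only changes when the contig at the 50% mark changes, auN in this formulation moves whenever any contig length changes.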

Address of the bookmark: https://lh3.github.io/2020/04/08/a-new-metric-on-assembly-contiguity

The Genome 10K Project

Tue, 29 Jul 2014 09:11:04 -0500

The Genome 10K project aims to assemble a genomic zoo, a collection of DNA sequences representing the genomes of 10,000 vertebrate species, approximately one for every vertebrate genus. The trajectory of cost reduction in DNA sequencing suggests that this project will be feasible within a few years. Capturing the genetic diversity of vertebrate species would create an unprecedented resource for the life sciences and for worldwide conservation efforts. The growing Genome 10K Community of Scientists (G10KCOS), made up of leading scientists representing major zoos, museums, research centers, and universities around the world, is dedicated to coordinating efforts in tissue specimen collection that will lay the groundwork for a large-scale sequencing and analysis project.

Address of the bookmark: https://genome10k.soe.ucsc.edu

liftover

Jitendra Narayan — Mon, 08 Feb 2016 15:45:03 -0600

Convenient conversions between genome assemblies. The liftover package makes it easy to remap genomic coordinates to a different genome assembly.

More at https://github.com/aaronwolen/liftover

https://www.bioconductor.org/help/workflows/liftOver/

Address of the bookmark: https://github.com/aaronwolen/liftover

RNA-Seq De novo Assembly Using Trinity

Surabhi Chaudhary — Wed, 23 Mar 2016 05:53:46 -0500

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works like so:

• Inchworm assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
• Chrysalis groups the Inchworm contigs into clusters and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.
• Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes.
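
For readers unfamiliar with de Bruijn graphs, the following Python sketch shows the textbook construction on two toy reads: each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It is not Trinity's implementation. The function name, the reads, and the choice of k are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer node to the sorted list of nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix edge
    return {node: sorted(successors) for node, successors in graph.items()}


# Two toy reads that share a prefix and then diverge.
toy_reads = ["ACGTC", "CGTA"]
graph = build_de_bruijn(toy_reads, k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

In this toy graph the node GT has two successors (TA and TC). A branch like this is the kind of structure that a path-tracing step has to resolve when it separates alternative isoforms or paralogs.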

More at https://github.com/trinityrnaseq/trinityrnaseq/wiki

Download Trinity here.

Build Trinity by typing 'make' in the base installation directory.

Assemble RNA-Seq data like so:

 Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 6 --max_memory 20G

Find the assembled transcripts at 'trinity_out_dir/Trinity.fasta'.
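
Once the run has finished, a quick sanity check is to count the assembled transcripts and look at their lengths. The Python sketch below is a minimal FASTA reader that works on any iterable of lines. The function name and the example records are hypothetical, and it assumes nothing about the FASTA headers beyond the standard ">" prefix.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record name: sequence length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Made-up records standing in for the contents of a FASTA file.
example = [">transcript_1 some description", "ACGTAC", "GT", ">transcript_2", "GGGCC"]
lengths = fasta_lengths(example)
print(len(lengths), "transcripts")  # 2 transcripts
print(lengths)
```

To run it on real output, pass an open file handle instead of the example list, for instance the handle returned by open('trinity_out_dir/Trinity.fasta').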

Address of the bookmark: https://github.com/trinityrnaseq/trinityrnaseq/wiki

RACA: Reference-Assisted Chromosome Assembly

Priya Singh — Wed, 06 Apr 2016 09:29:50 -0500

Reference-Assisted Chromosome Assembly (RACA) is an algorithm to reliably order and orient sequence scaffolds generated by NGS and assemblers into longer chromosomal fragments using comparative genome information and paired-end reads.

http://www.ncbi.nlm.nih.gov/pubmed/23307812

http://bioen-compbio.bioen.illinois.edu/RACA/

Address of the bookmark: http://bioen-compbio.bioen.illinois.edu/RACA/

DISCOVAR

Abhimanyu Singh — Mon, 18 Apr 2016 11:59:16 -0500

DISCOVAR is a new variant caller and DISCOVAR de novo a new genome assembler, both designed for state-of-the-art data. Their inputs are chosen to optimize quality while keeping costs low. Currently it takes as input Illumina reads of length 250 or longer — produced on MiSeq or HiSeq 2500 — and from a single PCR-free library. These data enable a level of completeness and continuity that was not previously possible.

DISCOVAR can call variants on a region-by-region basis, potentially tiling an entire large genome. DISCOVAR variant calling is under active development and is transitioning to VCF.

DISCOVAR de novo can generate de novo assemblies for both large and small genomes. It currently does not call variants.

More at https://www.broadinstitute.org/software/discovar/blog/?page_id=14

Address of the bookmark: https://www.broadinstitute.org/software/discovar/blog/