BOL: Related items

proovread : large-scale high-accuracy PacBio correction through iterative short read consensus

Jit — Fri, 05 Jan 2018 04:12:20 -0600

proovread:

- outperforms PacBioToCA/LSC in terms of accuracy and contiguity/sensitivity (http://dx.doi.org/10.1093/bioinformatics/btu392)
- is easy to install, run and configure
- supports various types of data:
  - HiSeq/MiSeq (100-500bp)
  - Unitigs
  - 454, ...

proovread maps high-coverage short-read data to PacBio reads (bwa mem, blasr, daligner) in multiple iterations and uses the mapped reads to build a corrected consensus.
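The consensus idea can be illustrated with a minimal Python sketch. This is not proovread's implementation: the function name `consensus_correct`, the `min_cov` threshold, and the gapless `(start, sequence)` alignment format are all assumptions made for illustration, and only substitutions are handled. It shows a single pass; the real tool repeats mapping and consensus over several iterations.

```python
from collections import Counter


def consensus_correct(long_read, alignments, min_cov=3):
    """Correct a long read by per-position majority vote of aligned short reads.

    alignments: list of (start, short_read) pairs, assumed gapless (hypothetical
    simplification; real mappers report indels as well).
    min_cov: minimum number of covering short reads needed to overrule a base.
    """
    # One base counter per position of the long read.
    columns = [Counter() for _ in long_read]
    for start, short_read in alignments:
        for offset, base in enumerate(short_read):
            pos = start + offset
            if 0 <= pos < len(long_read):
                columns[pos][base] += 1

    corrected = []
    for original, counts in zip(long_read, columns):
        if sum(counts.values()) >= min_cov:
            # Enough coverage: take the most frequent short-read base.
            corrected.append(counts.most_common(1)[0][0])
        else:
            # Too little coverage: keep the uncorrected base.
            corrected.append(original)
    return "".join(corrected)


if __name__ == "__main__":
    noisy = "ACGTTCGA"  # position 3 is a toy substitution error
    short_reads = [(0, "ACGAT"), (2, "GATCG"), (3, "ATCGA")]
    print(consensus_correct(noisy, short_reads))
```

Positions covered by fewer than `min_cov` short reads are left untouched, which is one simple way to avoid overcorrecting poorly supported regions.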

Address of the bookmark: https://github.com/BioInf-Wuerzburg/proovread

HALC: High throughput algorithm for long read error correction

Jit — Fri, 08 Jun 2018 10:47:41 -0500

HALC is a high-throughput algorithm for long read error correction. HALC aligns the long reads to short read contigs from the same species with a relatively low identity requirement, so that a long read region can be aligned to at least one contig region, including repeats of its true genome region in the contigs that are sufficiently similar to it (a similar-repeat-based alignment approach). HALC obtained 6.7-41.1% higher throughput than the existing algorithms while maintaining comparable accuracy. The HALC-corrected long reads can thus result in 11.4-60.7% longer assembled contigs than those of the existing algorithms.

Address of the bookmark: https://github.com/lanl001/halc

pbmm2: A minimap2 frontend for PacBio native data formats

BioStar — Tue, 18 Feb 2020 03:36:22 -0600

pbmm2 is a SMRT C++ wrapper for minimap2's C API. Its purpose is to support native PacBio in- and output, provide sets of recommended parameters, generate sorted output on-the-fly, and postprocess alignments. Sorted output can be used directly for polishing using GenomicConsensus, if BAM has been used as input to pbmm2. Benchmarks show that pbmm2 outperforms BLASR in sequence identity, number of mapped bases, and especially runtime. pbmm2 is the official replacement for BLASR.

Address of the bookmark: https://github.com/PacificBiosciences/pbmm2

Next Generation Sequencing (NGS) Tutorials

Jitendra Narayan — Sat, 24 Aug 2013 06:01:37 -0500

The Institute for Computational Biomedicine, Cornell University, provides an NGS workshop tutorial at http://chagall.med.cornell.edu/NGScourse/

You can also add your favourite NGS educational material or workshop tutorial by commenting on this bookmark, for the benefit of other users.

Understanding the basics of genome sequencing:

Tutorial by Luke Jostins.

http://www.genetic-inference.co.uk/blog/2009/04/basics-sequencing-dna-part-1/

http://www.genetic-inference.co.uk/blog/2009/08/basics-sequencing-dna-part-2/

A window into third-generation sequencing

http://hmg.oxfordjournals.org/content/19/R2/R227.full.pdf

==============================================

NGS data analysis pipelines

Detecting and annotating genetic variations using the HugeSeq pipeline DOI: 10.1038/nbt.2134
NARWHAL, a primary analysis pipeline for NGS data http://bioinformatics.oxfordjournals.org/cgi/content/abstract/28/2/284?etoc
RseqFlow: Workflows for RNA-Seq data analysis DOI: 10.1093/bioinformatics/btr441
ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence DOI: 10.1186/1471-2164-12-285
A framework for variation discovery and genotyping using next-generation DNA sequencing data PubMed: 21478889
SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects DOI: 10.1186/1471-2105-12-134 Abstract: http://www.biomedcentral.com/1471-2105/12/134/abstract
WEP: a high-performance analysis pipeline for whole-exome data http://www.biomedcentral.com/1471-2105/14/S7/S11
DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data. http://www.ncbi.nlm.nih.gov/pubmed/23657089
GATK: a Toolkit for Genome Analysis http://www.broadinstitute.org/gatk/
Metagenomics: http://www.nbic.nl/education/nbic-phd-school/course-schedule/ngsmetagenomics/
RNASeq: http://www.nbic.nl/education/nbic-phd-school/course-schedule/ngsrnaseq/
Bioinformatics and Seq courses: http://www.isb-sib.ch/training/training-activities-schedule/archive-2013.html
Variant Detection (Model organism) Advanced tutorial https://docs.google.com/document/pub?id=1CuKkKylVDb03tnN7RSWl5EUzleetn0ctjmvaidPKLxM
Variant Detection Introductory tutorial https://docs.google.com/document/pub?id=1ZRzrjjOCvtAu3m-IKL-rbJ1f4On60dDL_IEwG7oejdI
Microbial de novo Assembly for Illumina Data Introductory tutorial https://docs.google.com/document/pub?id=1N3AB9ptISUu4zULqe1kXpVF0BDyGb5f5yzxWSJd_WNM
RNAseq Differential Gene Expression Introductory tutorial https://docs.google.com/document/pub?id=1KbTiBHtvHLfPRZ39AY3uriazrINA8TJzgjjwn1zPP7Y

"Please add your favourite NGS link below in the comment section for the benefit of the bioinformatics community."

Address of the bookmark: http://chagall.med.cornell.edu/NGScourse/

TULIP - The Uncorrected Long-read Integration Process

Jit — Thu, 23 Nov 2017 09:30:01 -0600

#Running TULIP (The Uncorrected Long-read Integration Process), version 0.4 late 2016 (European eel)

TULIP currently consists of two Perl scripts, tulipseed.perl and tulipbulb.perl. These are very much intended as prototypes, and additional components and/or implementations are likely to follow.
Tulipseed takes as input alignment files of long reads to sparse short seeds, and outputs a graph and scaffold structures. Tulipbulb adds long read sequencing data to these.
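How alignments of long reads to seeds can give rise to a graph is sketched below in Python. This is a simplified illustration, not the logic of tulipseed.perl: the function `build_seed_graph` and the `(read_id, seed_id, position)` input tuples are assumptions, and seed orientation, distances and ambiguity handling are ignored. Seeds hit consecutively by the same long read are connected, and each edge counts how many reads support it.

```python
from collections import defaultdict


def build_seed_graph(alignments):
    """Build an undirected seed graph from long-read-to-seed alignments.

    alignments: iterable of (read_id, seed_id, position_on_read) tuples
    (hypothetical format). Returns {(seed_a, seed_b): supporting_read_count}.
    """
    # Collect the seeds hit by each long read together with their positions.
    hits_by_read = defaultdict(list)
    for read_id, seed_id, position in alignments:
        hits_by_read[read_id].append((position, seed_id))

    edge_support = defaultdict(int)
    for hits in hits_by_read.values():
        hits.sort()  # order the seeds along the read
        for (_, seed_a), (_, seed_b) in zip(hits, hits[1:]):
            if seed_a != seed_b:
                # Undirected edge: store the pair in a canonical order.
                edge_support[tuple(sorted((seed_a, seed_b)))] += 1
    return dict(edge_support)


if __name__ == "__main__":
    toy_alignments = [
        ("read1", "seedA", 10),
        ("read1", "seedB", 500),
        ("read1", "seedC", 900),
        ("read2", "seedB", 50),
        ("read2", "seedC", 400),
    ]
    for edge, support in sorted(build_seed_graph(toy_alignments).items()):
        print(edge, support)
```

Edges supported by several reads are natural candidates for scaffold links, while weakly supported edges could be filtered out.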

https://github.com/Generade-nl/TULIP

Address of the bookmark: https://github.com/Generade-nl/TULIP

PhyloHerb: Phylogenomic Analysis Pipeline for Herbarium Specimens

LEGE — Wed, 21 Feb 2024 06:15:13 -0600

What is PhyloHerb: PhyloHerb is a wrapper program to process genome skimming data collected from plant materials. The outcomes include the plastid genome (plastome) assemblies, mitochondrial genome assemblies, nuclear ribosomal DNAs (NTS+ETS+18S+ITS1+5.8S+ITS2+28S), alignments of gene and intergenic regions, and a species tree. It is designed to be a high throughput program dealing with lower quality data. Examples include low-coverage (5x cpDNA) plastome phylogeny, recycling plastid genes from target enrichment data, retrieving low-copy nuclear genes from medium coverage (5x nucDNA) genome skimming.

Address of the bookmark: https://github.com/lmcai/PhyloHerb/

Webinar on Streamlining large scale analysis using the Strand NGS Pipeline Manager on 24 Feb 2016

Yeshodari — Fri, 05 Feb 2016 06:43:28 -0600

Live Webinar on Streamlining large scale NGS data analysis using the Strand NGS Pipeline Manager on 24 Feb 2016

Abstract: Strand NGS includes comprehensive workflows for DNA-Seq, RNA-Seq, Small RNA-Seq, ChIP-Seq, MeDIP-Seq, and Methyl-Seq analysis. Each workflow includes a quality assessment and filter section, followed by a workflow-specific analysis section. The pipeline functionality in Strand NGS allows users to execute a sequence of analysis steps with specific parameters - all without any manual intervention. This simplifies the analysis in large scale sequencing projects where every sample needs to be processed identically.

In this webinar we will discuss the pre-packaged pipelines present in Strand NGS. The packaged pipelines have well-chosen default parameters and are suitable for users analyzing data for the first time in the tool. We will also show how advanced users can customize pipelines and share them with other Strand NGS users. Finally, we will show a brief glimpse of an elaborate pipeline that aligns reads, filters poor-quality matches, computes coverage metrics, identifies variants, checks for sample cross-contamination, and emails quality reports - all from within Strand NGS.

Speaker: Dr. Vamsi Veeramachaneni, Vice President - Bioinformatics, Strand Life Sciences

Details: Session 1: 2:30 PM IST, Session 2: 10:30 PM IST
Register here: http://www.strand-ngs.com/webinar_registration

ARC: pipeline which facilitates iterative, reference guided de novo assemblies

Jit — Thu, 26 Jul 2018 09:20:26 -0500

ARC is a pipeline which facilitates iterative, reference guided de novo assemblies with the intent of:

- Reducing time in analysis and increasing accuracy of results by only considering those reads which should assemble together.
- Reducing/removing reference bias as compared to mapping based approaches.

The software is designed to work in situations where a whole-genome assembly is not the objective, but rather where the researcher wishes to assemble discrete 'targets' contained within next-generation shotgun sequence data. ARC simplifies the traditionally difficult problem of assembly by breaking the reads into small, manageable subsets which can then be assembled quickly and efficiently in parallel. Applications include those in which the researcher wishes to de novo assemble specific content and a set of semi-similar reference targets is available to initialize the assembly process.
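The divide-and-assemble idea can be sketched in Python as follows. This is a toy illustration under stated assumptions, not ARC's code: `bin_reads_by_target`, `assemble_stub` and `assemble_targets` are hypothetical names, the `(read, target)` pairs stand in for the output of a read mapper, and the "assembler" is a placeholder that simply returns the longest read in each bin.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def bin_reads_by_target(mappings):
    """Group reads by the reference target they were mapped to.

    mappings: iterable of (read_sequence, target_id) pairs (hypothetical format).
    """
    bins = defaultdict(list)
    for read, target in mappings:
        bins[target].append(read)
    return dict(bins)


def assemble_stub(reads):
    """Placeholder for a real assembler call: returns the longest read."""
    return max(reads, key=len)


def assemble_targets(mappings, workers=2):
    """Assemble each target's reads independently and in parallel."""
    bins = bin_reads_by_target(mappings)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with the bin keys.
        contigs = list(pool.map(assemble_stub, bins.values()))
    return dict(zip(bins.keys(), contigs))


if __name__ == "__main__":
    toy_mappings = [("ACGT", "target1"), ("ACGTAC", "target1"), ("GG", "target2")]
    print(assemble_targets(toy_mappings))
```

In an iterative setup, the contigs returned for each target would presumably serve as the references for the next round of read recruitment.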

https://ibest.github.io/ARC/

Address of the bookmark: https://ibest.github.io/ARC/

multiPhATE: bioinformatics pipeline for functional annotation of phage isolates

Abhimanyu Singh — Thu, 16 May 2019 00:17:39 -0500

multiPhATE (multiple-genome Phage Annotation Toolkit and Evaluator) is a throughput pipeline driver that invokes an annotation pipeline (PhATE) across a user-specified set of phage genomes. This tool incorporates a de novo phage gene-calling algorithm and assigns putative functions to gene calls using protein-, virus-, and phage-centric databases.

Address of the bookmark: https://github.com/carolzhou/multiPhATE

DeepVariant : an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

Jit — Sat, 25 Jan 2020 13:28:09 -0600

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF) designed for painless integration with the TensorFlow machine learning framework.

https://ai.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html

https://www.biorxiv.org/content/10.1101/092890v6

Address of the bookmark: https://github.com/google/deepvariant