BOL: Related items

Taxoblast: a pipeline to identify contamination in genomic sequences

Jit — Thu, 23 Nov 2017 08:37:15 -0600

Modern genome sequencing strategies are highly sensitive to contamination, making the detection of foreign DNA sequences an important part of analysis pipelines. Here we use Taxoblast, a simple pipeline with a graphical user interface, for the post-assembly detection of contaminating sequences in the published genome of the kelp Saccharina japonica. Analyses were based on multiple blastn searches with short sequence fragments. They revealed a number of probable bacterial contaminations as well as hybrid scaffolds that contain both bacterial and algal sequences. This or similar types of analysis, in combination with manual curation, may thus constitute a useful complement to standard bioinformatics analyses prior to submission of genomic data to public repositories. Our analysis pipeline is open-source and freely available at http://sdittami.altervista.org/taxoblast and via SourceForge (https://sourceforge.net/projects/taxoblast).
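
The fragment-based search described above can be sketched roughly as follows. This is an illustration only, not Taxoblast's own code: the fragment length, the step size, the function names and the taxon labels are all assumptions made for the example. It cuts a scaffold into short windows, writes them as FASTA records that could be passed to blastn, and then labels a scaffold as hybrid when its fragments hit more than one taxonomic group.

```python
def split_into_fragments(seq, frag_len=500, step=500):
    """Yield (start, fragment) pairs covering seq in windows of frag_len.

    frag_len and step are illustrative defaults, not Taxoblast settings.
    """
    for start in range(0, len(seq), step):
        frag = seq[start:start + frag_len]
        if frag:
            yield start, frag


def fragments_to_fasta(name, seq, frag_len=500, step=500):
    """Return FASTA text with one record per fragment.

    Record names carry the scaffold name and 1-based coordinates so that
    hits can be traced back to their position on the scaffold.
    """
    records = []
    for start, frag in split_into_fragments(seq, frag_len, step):
        end = start + len(frag)
        records.append(f">{name}:{start + 1}-{end}\n{frag}")
    return "\n".join(records) + "\n"


def classify_scaffold(fragment_labels):
    """Summarise per-fragment taxon labels (None means no hit).

    Returns the single label if all hits agree, "hybrid" if they
    disagree, and "no hit" if no fragment had a hit.
    """
    labels = {label for label in fragment_labels if label is not None}
    if not labels:
        return "no hit"
    if len(labels) == 1:
        return labels.pop()
    return "hybrid"
```

A scaffold labelled "hybrid" here would still need manual curation, as the abstract recommends.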

Address of the bookmark: https://sourceforge.net/projects/taxoblast/files/

Ancestral sequence reconstruction steps

Surabhi Chaudhary — Fri, 18 May 2018 08:28:26 -0500

Ancestral sequence reconstruction (ASR), also known as ancestral gene/sequence reconstruction/resurrection, is a technique used in the study of molecular evolution. The method consists of the synthesis of an ancestral gene and expression of the corresponding ancestral protein. The idea of protein 'resurrection' was suggested in 1963 by Pauling and Zuckerkandl. Some early efforts were made in the 1980s and 1990s, led by the laboratory of Steven A. Benner, showing the potential of this technique, which only started to be fulfilled in the post-genomic era. Thanks to improved algorithms and better sequencing and synthesis techniques, the method was developed further in the early 2000s to allow the resurrection of a greater variety of genes, including much more ancient ones. Over the last decade, ancestral protein resurrection has developed as a strategy to reveal the mechanisms and dynamics of protein evolution.

BEAST is the best way to predict the ancestral structure, but I suggest the following steps first:

1- Align the sequences with MAFFT (http://mafft.cbrc.jp/alignment/software/source.html):

mafft --maxiterate 1000 --reorder --thread 24 --genafpair Dataset.fasta > Dataset_Alig.fasta

2- Check that your dataset has a good phylogenetic signal; this can be done with Tree-Puzzle (http://www.tree-puzzle.de).

3- Check the saturation index of the dataset; I do this with DAMBE (http://dambe.bio.uottawa.ca/dambe.asp).

4- Check for evidence of recombination in your dataset, because its presence may influence the grouping of clades. I do this with:

4.1- The Phi test, implemented in SplitsTree4 (http://www.splitstree.org), which takes a .nex file

4.2- GARD, available on the Datamonkey web server (http://www.datamonkey.org/). To convert to amino acids, use SeaView (view proteins -> save as ...). Ideally, run it on tree-based groups.

4.3- RDP4, available for download and installation on Windows (http://web.cbio.uct.ac.za/~darren/rdp.html)

4.4- HyPhy (Mac, Windows, Linux) (http://hyphy.org/w/index.php/Download)

4.5- Path-O-Gen (temporal structure of a tree; input file -> arquivo.tre)

I call the steps above pre-processing for phylogenetic inference.

5- Build the phylogenetic tree. I use Bayesian inference with a molecular clock, which requires clock testing first:

- This step is performed with BEAST (BEAUti, BEAST and TreeAnnotator), plus Tracer v1.5 and FigTree for inspection.

- Tutorials: http://beast.bio.ed.ac.uk/tutorials

- Downloads: http://beast.bio.ed.ac.uk/downloads
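
If you want the alignment in step 1 to be reproducible from a script, something like the sketch below could work. It only assembles and runs the same MAFFT command shown in step 1; the function names and default values are illustrative, and it assumes that mafft is installed and on the PATH.

```python
import subprocess


def mafft_command(in_fasta, threads=24, max_iterate=1000):
    """Build the argument list for the MAFFT call from step 1."""
    return [
        "mafft",
        "--maxiterate", str(max_iterate),
        "--reorder",
        "--thread", str(threads),
        "--genafpair",
        in_fasta,
    ]


def run_mafft(in_fasta, out_fasta, threads=24, max_iterate=1000):
    """Run MAFFT and save the alignment.

    MAFFT prints the alignment to standard output, so stdout is
    redirected to out_fasta, like the '>' in the shell command.
    """
    with open(out_fasta, "w") as out:
        subprocess.run(
            mafft_command(in_fasta, threads, max_iterate),
            stdout=out,
            check=True,  # raise an error if MAFFT exits with a failure code
        )
```

For example, run_mafft("Dataset.fasta", "Dataset_Alig.fasta") would reproduce the command from step 1.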

Genome sequence-based (sub-)species delineation.

Abhimanyu Singh — Wed, 12 Dec 2018 08:31:14 -0600

The GGDC web service reports digital DDH for a universal and accurate delineation of prokaryotic (sub-)species without inheriting the pitfalls of classic DDH, and also calculates differences in genomic G+C content.
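
The G+C comparison mentioned above is simple enough to check by hand. The sketch below is not GGDC code; it is a minimal illustration that assumes each genome is given as one string of bases (contigs concatenated) and that ambiguous bases such as N are ignored.

```python
def gc_content(seq):
    """Return G+C content as a percentage of the unambiguous bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


def gc_difference(genome_a, genome_b):
    """Absolute difference in G+C content, in percentage points."""
    return abs(gc_content(genome_a) - gc_content(genome_b))
```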

http://ggdc.dsmz.de/ggdc_background.php#

Genome-to-Genome Distance Calculator 2.1

http://ggdc.dsmz.de/ggdc.php

Address of the bookmark: http://ggdc.dsmz.de/

TRITEX sequence assembly pipeline for Triticeae genomes

Jit — Tue, 20 Aug 2019 09:47:14 -0500

The pipeline is open-source and hosted in a public Bitbucket repository.

TRITEX has been run on highly inbred genotypes of barley (Hordeum vulgare), tetraploid wheat (Triticum turgidum) and hexaploid wheat (T. aestivum) with reasonable results: super-scaffold N50 values in the range of dozens of Mb and pseudomolecules with better gene space representation than a BAC-by-BAC assembly. It has never been tested and is not expected to work on heterozygous or autopolyploid genomes.
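
For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of how it is usually computed from a list of scaffold lengths. It is not part of TRITEX; the function name is chosen for this example.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```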

A protocol for generating chromosome-conformation capture sequencing (Hi-C) data suitable for use with the pipeline is described in Himmelbach et al. 2018. Refer to the technical notes of 10X Genomics on how to generate Chromium data.

Address of the bookmark: https://tritexassembly.bitbucket.io/

Shouji: a fast and efficient pre-alignment filter for sequence alignment

Jit — Mon, 04 Nov 2019 07:09:45 -0600

The ability to generate massive amounts of sequencing data continues to overwhelm the processing capacity of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes.

We introduce Shouji, a highly parallel and accurate pre-alignment filter that remarkably reduces the need for computationally-costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern FPGA (field-programmable gate array) architectures to further boost the performance of our algorithm.
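
To show what a pre-alignment filter does in general, here is a small software sketch. It is not Shouji's algorithm: it swaps in a much simpler base-count filter, and all function names are assumptions for this example. Each edit changes the per-base counts of a sequence by at most 2 in total, so a pair whose counts differ by more than twice the allowed number of edits can be rejected without running dynamic programming.

```python
from collections import Counter


def edit_distance(a, b):
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def passes_base_count_filter(a, b, max_edits):
    """Cheap check that never rejects a pair within max_edits.

    A substitution changes the summed count difference by at most 2 and
    an insertion or deletion by 1, so a difference above 2 * max_edits
    proves the pair needs more than max_edits edits.
    """
    ca, cb = Counter(a), Counter(b)
    diff = sum(abs(ca[base] - cb[base]) for base in set(ca) | set(cb))
    return diff <= 2 * max_edits


def within_threshold(a, b, max_edits):
    """Run the expensive alignment only on pairs that pass the filter."""
    if not passes_base_count_filter(a, b, max_edits):
        return False
    return edit_distance(a, b) <= max_edits
```

A filter like this lets some dissimilar pairs through (same base counts, different order), which is why more accurate filters such as Shouji are of interest.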

More at https://github.com/CMU-SAFARI/Shouji

Address of the bookmark: https://github.com/CMU-SAFARI/Shouji

SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies

Jit — Sat, 01 Feb 2020 13:38:49 -0600

SyRI is a pairwise whole-genome comparison tool for chromosome-level assemblies. SyRI starts by finding rearranged regions and then searches for differences in the sequences, which are distinguished by whether they reside in syntenic or rearranged regions. This distinction is important as rearranged regions are inherited differently compared to syntenic regions.
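
The idea of separating syntenic from rearranged regions can be illustrated with a toy example. The sketch below is not SyRI's implementation: it is a simplified stand-in that assumes forward-strand alignment blocks on a single chromosome pair, each given as a (reference start, query start) tuple. It treats the longest collinear chain of blocks as syntenic and everything else as rearranged.

```python
def split_syntenic_rearranged(blocks):
    """Split alignment blocks into (syntenic, rearranged) lists.

    blocks: list of (ref_start, qry_start) tuples (a hypothetical,
    simplified input format). The syntenic set is the longest chain of
    blocks that increases in both reference and query coordinates.
    """
    ordered = sorted(blocks)
    n = len(ordered)
    if n == 0:
        return [], []
    best = [1] * n    # best[i]: length of the longest chain ending at block i
    prev = [-1] * n   # prev[i]: previous block in that chain, or -1
    for i in range(n):
        for j in range(i):
            collinear = (ordered[j][0] < ordered[i][0]
                         and ordered[j][1] < ordered[i][1])
            if collinear and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    # Walk back from the end of the longest chain to collect its members.
    i = max(range(n), key=best.__getitem__)
    chain = set()
    while i != -1:
        chain.add(i)
        i = prev[i]
    syntenic = [b for k, b in enumerate(ordered) if k in chain]
    rearranged = [b for k, b in enumerate(ordered) if k not in chain]
    return syntenic, rearranged
```

In this toy model a translocated block shows up as the one block that breaks the collinear order.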

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1911-0

Address of the bookmark: https://github.com/schneebergerlab/syri

VICUNA: a software tool that enables consensus assembly of ultra-deep sequence derived from diverse viral or other heterogeneous populations.

biogeek — Tue, 25 Aug 2020 03:40:17 -0500

VICUNA is a de novo assembly program targeting populations with high mutation rates. It creates a single linear representation of the mixed population on which intra-host variants can be mapped. For clinical samples rich in contamination (e.g., >95%), VICUNA can leverage existing genomes, if available, to assemble only target-alike reads. After initial assembly, it can also use existing genomes to perform guided merging of contigs. For each data set (e.g., Illumina paired read, 454), VICUNA outputs consensus sequence(s) and the corresponding multiple sequence alignment of constituent reads. VICUNA efficiently handles ultra-deep sequence data with tens of thousands fold coverage.
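
As a rough picture of how a consensus relates to the multiple sequence alignment of its reads, consider the sketch below. It is not VICUNA's method: it is a plain majority vote per alignment column, and it assumes the reads are already aligned to equal length with "-" as the gap character.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Majority-vote consensus of equal-length aligned reads.

    Columns where the gap character is the most common symbol are
    left out of the consensus.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    out = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            out.append(base)
    return "".join(out)
```

Minority bases in a column are discarded here, whereas in a real viral population they are exactly the intra-host variants one would want to map back onto the consensus.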

http://software.broadinstitute.org/viral/docs/vicuna_v1.0.pdf

Address of the bookmark: https://www.broadinstitute.org/viral-genomics/vicuna

The complete sequence of a human genome

Neel — Thu, 31 Mar 2022 23:58:18 -0500

The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

Address of the bookmark: https://www.science.org/doi/10.1126/science.abj6987

Seal: SEquence ALignment evaluation suite

Jit — Wed, 03 Jan 2018 05:05:46 -0600

Seal is a comprehensive sequencing simulation and alignment tool evaluation suite. This software (implemented in Java) provides several utilities that can be used to evaluate alignment algorithms, including:

Reading a pre-existing reference genome from one or more FASTA files.
Alternatively, generating an artificial reference genome based on input parameters (length, repeat count, repeat length, repeat variability rate).
Simulating reads from random locations in the genome based on input parameters of read length, coverage, sequencing error rate, and indel rate.
Applying alignment tools to the genome and the reads through a standardized interface.
Parsing the output of the alignment tool and calculating the number of reads that were correctly or incorrectly mapped.
Computing run times and measures of accuracy.
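
The simulation and scoring steps listed above can be sketched in a few lines. Seal itself is implemented in Java; the Python below is only an illustration with made-up function names, and it models substitution errors only (the indel rate that Seal supports is left out).

```python
import random


def simulate_reads(genome, read_len, coverage, error_rate, seed=0):
    """Sample reads from random positions and add substitution errors.

    Returns a list of (true_start, read) pairs so that an aligner's
    reported positions can later be checked against the truth.
    """
    if read_len > len(genome):
        raise ValueError("read length exceeds genome length")
    rng = random.Random(seed)
    n_reads = int(coverage * len(genome) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace the base with a different one.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


def mapping_accuracy(true_starts, mapped_starts):
    """Fraction of reads whose reported position equals the true one."""
    correct = sum(t == m for t, m in zip(true_starts, mapped_starts))
    return correct / len(true_starts)
```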

Seal has interfaces to evaluate the following software packages:

Bowtie
BWA
MAQ
mrFAST
mrsFAST
Novoalign
SHRiMP
SOAPv2

Address of the bookmark: http://compbio.case.edu/seal/

Best Practices for Variant Calling with the GATK

biogeek — Sat, 22 Feb 2020 03:07:31 -0600

The presentations below were filmed during the March 2015 GATK Workshop, part of the BroadE Workshop series. At the time of this workshop, the current version of Broad’s Genome Analysis Toolkit (GATK) was version 3.3.

Genome Analysis Toolkit

03/19/15	Introduction to High-Throughput Sequencing data formats and methods	Joel Thibault	PDF	Video
03/19/15	Introduction to the GATK	Geraldine Van der Auwera	PDF	Video
03/19/15	Mapping, processing, and duplicate marking with Picard tools	Matt Sooknah	PDF	Video
03/19/15	Mapping and processing RNAseq	Ami Levy-Moonshine	PDF	Video
03/19/15	Indel realignment	Mark Fleharty	PDF	Video
03/19/15	Base quality score recalibration	David Roazen	PDF	Video
03/19/15	Introduction to variant discovery: calling cohorts	Louis Bergelson	PDF	Video
03/19/15	Variant calling and joint genotyping	Sheila Chandran	PDF	Video
03/19/15	Variant quality score recalibration	Bertrand Haas	PDF	Video
03/19/15	Introduction to working with variants	Yossi Farjoun	PDF	Video
03/19/15	Genotype refinement	Laura Gauthier	PDF	Video
03/19/15	Annotation and variant evaluation	David Benjamin	PDF	Video

Address of the bookmark: https://www.broadinstitute.org/partnerships/education/broade/best-practices-variant-calling-gatk-1