BOL: Related items

Mulan: Multiple-sequence local alignment and visualization for studying function and evolution

Jit — Fri, 24 Aug 2018 09:50:01 -0500


Mulan (http://mulan.dcode.org/) is a novel method and network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the TBA multi-aligner program for rapid identification of local sequence conservation, and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database: multiple-species alignments dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA.
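The idea of a binding site that is "conserved in multiple alignments" can be illustrated with a minimal sketch. It assumes a gapped alignment given as equal-length strings and treats a site as conserved when every sequence spells the same motif starting at the same alignment column (gaps inside the site are skipped). This is not multiTF's algorithm: real binding-site scanners typically score position weight matrices, and the exact-string motif, function names and example alignment here are hypothetical.

```python
from __future__ import annotations


def _motif_at(row: str, col: int, motif: str) -> bool:
    """True if the ungapped residues of `row` starting at column `col` spell `motif`."""
    if row[col] == "-":
        return False  # a site must start on a residue, not a gap
    residues = []
    for ch in row[col:]:
        if ch == "-":
            continue  # gaps inside the site are skipped
        residues.append(ch)
        if len(residues) == len(motif):
            break
    return "".join(residues) == motif


def conserved_sites(alignment: dict[str, str], motif: str) -> list[int]:
    """Alignment columns where every sequence carries `motif` (hypothetical helper)."""
    motif = motif.upper()
    rows = [seq.upper() for seq in alignment.values()]
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("all aligned sequences must have the same length")
    return [col for col in range(length)
            if all(_motif_at(row, col, motif) for row in rows)]
```

For example, with `{"human": "ACGTGACGT-CA", "mouse": "ACGTGACG-TCA", "dog": "TCGTGACGTTCA"}`, the motif `GTGACG` is reported at column 2 in all three species, while `CGTC` is shared only by human and mouse and is therefore not reported as conserved across all three.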

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC540288/

Proksee: in-depth characterization and visualization of bacterial genomes

LEGE — Tue, 09 May 2023 19:38:52 -0500

Proksee is an expert system for genome assembly, annotation and visualization. To begin using Proksee, provide a complete genome sequence, sequencing reads or a CGView/Proksee map JSON file.

Address of the bookmark: https://proksee.ca/

Protein function annotation and machine learning - UPMC - Paris, France

Sat, 02 Aug 2014 01:22:52 -0500


Job Description: We are looking for an excellent postdoc interested in protein functional annotation, machine learning and computer grids. The position is open for 3.5 years at the Université Pierre et Marie Curie, in the heart of Paris.

Research topic: Protein function annotation, multiple probabilistic models, domain architecture, machine learning, combinatorial optimization, computer grid.

Title: A novel integrative platform for large scale protein annotation that exploits a multitude of diversified probabilistic models in several protein signature databases.

We propose a novel integrated approach to large-scale protein annotation that will exploit an unprecedented amount of genomic data, together with sophisticated machine learning techniques and combinatorial optimization approaches, taking advantage of High Performance Computing (HPC) environments. The idea is to uncover, as far as possible, the evolutionary processes that acted on protein sequences throughout the whole tree of life and shaped the evolution of protein families. We have already demonstrated in previous work that functional annotation is intrinsically tied to the ability to uncover such evolutionary paths. We now intend to extend this approach to large-scale genome annotation by considering 11 different protein databases, comprising about 10^9 protein sequences, and by producing a large pool of diversified probabilistic models that encode about 10^7 evolutionary protein pathways. These models will be used to search for specific domains in the genomes to be annotated. Our previous methodology must be fundamentally improved to handle this volume of biological data. In this project, we will work on algorithms that reduce the model space and the search complexity, and we will implement key algorithmic changes toward a powerful integrated annotation tool.

Where: This project is run in the Laboratoire de Biologie Computationnelle et Quantitative UMR7238 CNRS-UPMC, Analytical Genomics team, headed by A. Carbone. It is co-advised by Pierre-Henri Wuillemin, Laboratoire d’Informatique de Paris 6, Equipe DECISION.

Start date: September 1st, 2014
Contact Person: Alessandra Carbone
Contact: alessandra.carbone@lip6.fr

flo

Jitendra Narayan — Wed, 10 Feb 2016 10:52:32 -0600

flo - same species annotations lift over pipeline

Lift over is the process of transferring annotations from one genome assembly to another. Usually lift over is done because there is a new, improved genome assembly for the species and good quality annotations (maybe manually curated or experimentally verified) are available on the old assembly.

The idea is simple: align the new assembly with the old one (e.g., with BLAT), process the alignment data to define how a coordinate or coordinate range on the old assembly should be transformed to the new assembly (e.g., as a chain file), and then transform the coordinates (e.g., with liftOver).
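The chain-file step can be made concrete with a minimal sketch. It parses a single `+`-strand chain in the UCSC chain layout (a `chain` header line, then `size dt dq` lines, ending with a lone `size`), where the target side is the old assembly and the query side is the new one, and maps 0-based coordinates through the ungapped blocks. This is not flo's or liftOver's code: the function names are hypothetical, reverse-strand chains are rejected, and `lift_interval` only lifts the two endpoints, whereas real tools handle partial overlaps more carefully.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Block = Tuple[int, int, int]  # (old_start, new_start, size), 0-based


def parse_chain(text: str) -> List[Block]:
    """Parse one '+'-strand chain into ungapped aligned blocks."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0]
    if header[0] != "chain":
        raise ValueError("expected a 'chain' header line")
    if header[4] != "+" or header[9] != "+":
        raise ValueError("this sketch only handles '+'-strand chains")
    old_pos, new_pos = int(header[5]), int(header[10])  # tStart, qStart
    blocks = []
    for fields in lines[1:]:
        size = int(fields[0])
        blocks.append((old_pos, new_pos, size))
        if len(fields) == 3:  # gap to the next block: dt on old, dq on new
            old_pos += size + int(fields[1])
            new_pos += size + int(fields[2])
    return blocks


def lift_point(blocks: List[Block], pos: int) -> Optional[int]:
    """Map one old-assembly position, or None if it falls outside every block."""
    i = bisect_right([b[0] for b in blocks], pos) - 1
    if i < 0:
        return None
    old_start, new_start, size = blocks[i]
    return new_start + (pos - old_start) if pos < old_start + size else None


def lift_interval(blocks: List[Block], start: int, end: int) -> Optional[Tuple[int, int]]:
    """Lift a half-open [start, end) interval by lifting both endpoints."""
    s, e = lift_point(blocks, start), lift_point(blocks, end - 1)
    return None if s is None or e is None else (s, e + 1)
```

With the made-up chain `chain 1000 chrA 1000 + 100 400 chrA 1200 + 150 480 1` followed by `100 20 0`, `100 50 100` and `30`, old position 120 maps to 170, position 210 falls in a 20-base segment deleted from the new assembly and returns `None`, and the interval [230, 260) maps to [260, 290).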


Address of the bookmark: https://github.com/wurmlab/flo

RASTtk: an implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

Abhi — Wed, 27 Apr 2016 11:07:59 -0500

The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
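The modular design described above (choosing the software for each step, adding custom features, and applying one protocol to a batch of genomes) can be pictured as a list of interchangeable stages. The sketch below is only an illustration of that architecture under stated assumptions: the genome-record layout and the stage functions `compute_gc` and `add_custom_feature` are hypothetical and are not part of RASTtk, which is driven by its own command-line tools.

```python
import copy
from typing import Callable, Dict, List

# Hypothetical genome record: a sequence plus accumulated annotation.
Genome = Dict[str, object]
Stage = Callable[[Genome], Genome]


def compute_gc(genome: Genome) -> Genome:
    """Example stage: record the GC fraction of the sequence."""
    seq = str(genome["sequence"]).upper()
    genome["gc"] = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    return genome


def add_custom_feature(feature_type: str, start: int, end: int) -> Stage:
    """Factory for a stage that injects a user-supplied feature."""
    def stage(genome: Genome) -> Genome:
        genome.setdefault("features", []).append(
            {"type": feature_type, "start": start, "end": end})
        return genome
    return stage


def run_pipeline(genome: Genome, stages: List[Stage]) -> Genome:
    """Apply each stage in order; swapping a stage swaps the tool for that step."""
    for stage in stages:
        genome = stage(genome)
    return genome


def run_batch(genomes: List[Genome], stages: List[Stage]) -> List[Genome]:
    """Apply the same protocol to every genome; deep copies leave inputs untouched."""
    return [run_pipeline(copy.deepcopy(g), stages) for g in genomes]
```

Because a pipeline is just an ordered list, customizing a protocol means editing that list, and batch submission reuses it unchanged across genomes.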

More at http://www.nature.com/articles/srep08365

Address of the bookmark: http://rast.nmpdr.org/

Prokka: tool for the rapid annotation of prokaryotic genomes

Jit — Mon, 06 Mar 2017 03:49:57 -0600

Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and the tool scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately for submission to GenBank/DDBJ/ENA.
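A quick way to sanity-check a GFF3 annotation such as Prokka's is to tally feature types. The minimal sketch below relies only on the GFF3 layout: nine tab-separated columns with the feature type in column 3, `#` comment or directive lines, and an optional trailing `##FASTA` section holding sequences, which Prokka's `.gff` output includes. The function name and the example records are made up for illustration.

```python
from collections import Counter


def count_feature_types(gff_text: str) -> Counter:
    """Count GFF3 feature types (column 3), stopping at an embedded ##FASTA section."""
    counts: Counter = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # blank line, comment or directive
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines rather than failing
        counts[cols[2]] += 1
    return counts
```

In practice the text would come from reading the `.gff` file produced by an annotation run.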

Address of the bookmark: http://www.vicbioinformatics.com/software.prokka.shtml

Genome Annotation Transfer Utility (GATU)

Jit — Mon, 29 May 2017 05:54:53 -0500

Genome Annotation Transfer Utility (GATU) was designed to facilitate quick, efficient annotation of similar genomes using genomes that have already been annotated. For example, whenever a new strain of SARS coronavirus is sequenced, it is possible, using GATU, to automatically annotate the new strain using a previously annotated strain of SARS CoV. This saves researchers from tedious manual annotation of these sequences.

The program uses the tBLASTn and BLASTn algorithms to map genes from the reference genome (the annotated strain) to the new sequence (the unannotated strain), with the goal of annotating the majority of the new genome’s genes in a single step. ORFs present in the target genome but absent from the reference genome are also identified; these ORFs can be further analyzed using BLAST, VGO and BBB, and then either accepted into or rejected from the annotation. GATU can handle multiple-exon genes as well as mature peptides. Although it was designed for use with viral genomes, GATU can also help annotate larger genomes (e.g., bacterial genomes).
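The step of finding ORFs that are present in the target but have no counterpart in the reference can be sketched as follows. This is an assumption-laden illustration, not GATU's procedure: it scans only the forward strand for ATG-to-stop ORFs with a hypothetical minimum length, and treats any ORF that overlaps none of the gene intervals already transferred from the reference as a candidate for review.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open; ORF end includes the stop codon
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Interval]:
    """Forward-strand ATG...stop ORFs with at least `min_codons` codons (stop included)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def novel_orfs(orfs: List[Interval], mapped_genes: List[Interval]) -> List[Interval]:
    """ORFs that overlap none of the genes mapped from the reference genome."""
    return [(s, e) for s, e in orfs
            if all(e <= gs or s >= ge for gs, ge in mapped_genes)]
```

For the toy sequence `ATGAAATAG` + `CC` + `ATGCCCGGGTTTTAA` with `min_codons=3`, the ORFs are (0, 9) and (11, 26); if (0, 9) was already transferred from the reference, only (11, 26) is left for further analysis.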

The output is saved in GenBank, XML, or EMBL file format.

Address of the bookmark: https://virology.uvic.ca/help/tool-help/help-books/genome-annotation-transfer-utility-gatu-documentation/

PASA: Gene Structure Annotation and Analysis

biogeek — Tue, 26 Dec 2017 21:14:03 -0600

PASA, an acronym for Program to Assemble Spliced Alignments, is a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures and to keep gene structure annotation consistent with the most recently available experimental sequence data. PASA also identifies and classifies all splicing variations supported by the transcript alignments.
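How spliced alignments reveal splicing variation can be shown with a small sketch: each alignment's exon blocks imply a set of introns, and two introns that share one boundary but not the other suggest an alternative donor or acceptor. The sketch assumes `+`-strand alignments in 1-based inclusive coordinates; the function names and labels are hypothetical and are not PASA's classification scheme.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]    # 1-based, inclusive genomic coordinates
Intron = Tuple[int, int]


def introns(exons: List[Exon]) -> Set[Intron]:
    """Introns implied by a spliced alignment: the gaps between consecutive exons."""
    ordered = sorted(exons)
    return {(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])}


def compare_splicing(a: List[Exon], b: List[Exon]) -> List[Tuple[str, Intron, Intron]]:
    """Flag intron pairs from two '+'-strand alignments that share exactly one boundary."""
    events = []
    for ia in sorted(introns(a)):
        for ib in sorted(introns(b)):
            if ia == ib:
                continue  # identical intron: consistent splicing
            if ia[0] == ib[0]:
                events.append(("alt_acceptor", ia, ib))  # same donor, different 3' site
            elif ia[1] == ib[1]:
                events.append(("alt_donor", ia, ib))     # same acceptor, different 5' site
    return events
```

Comparing exons `[(100, 200), (300, 400), (500, 600)]` with `[(100, 200), (320, 400), (500, 600)]` yields one alternative-acceptor pair, (201, 299) versus (201, 319). An exon-skipping event would appear here as a pair of such labels, so a fuller classifier would need to recognize that pattern separately.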

Address of the bookmark: http://pasapipeline.github.io/

DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication

Jit — Tue, 14 Nov 2017 10:26:16 -0600

We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7,000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 minutes, with rich information such as pseudogenes, translation exceptions, and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future.
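Orthologous gene assignment between two genomes is commonly approximated with reciprocal best hits: gene a in genome A and gene b in genome B are paired when each is the other's highest-scoring hit. The sketch below shows that one common approach on precomputed similarity scores (for example, BLAST bit scores); it is not necessarily the method DFAST uses, and the gene names and scores in the example are made up.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query_gene, subject_gene, score)


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Highest-scoring subject for each query; on ties the first hit seen wins."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Gene pairs that are each other's best hit in both search directions."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```

In the test data, `a2` hits `b2` best, but `b2`'s best hit back is `a1`, so the pair (`a2`, `b2`) is not reported as orthologous.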

Availability and Implementation

The software is implemented in Python 3 and runs under both Python 2.7 and Python 3.4 or later on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/ under the GPLv3 license, with external binaries bundled in the software distribution. An online version is also available at https://dfast.nig.ac.jp/.

Address of the bookmark: https://dfast.nig.ac.jp/

ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data

Rahul Nayak — Tue, 03 Jul 2018 04:14:52 -0500

ChopStitch is a new method for finding putative exons and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-seq data with the help of a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also detects base substitutions in transcript sequences corresponding to sequencing or assembly errors, haplotype variations, or putative RNA editing events. The primary output of the tool is a FASTA file containing putative exons. Further, exon edges are interrogated for alternative exon-exon boundaries to detect transcript isoforms, which are reported as splice graphs in DOT format.
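The core intuition, that k-mers spanning an exon-exon junction occur in the transcript but not in the genome, can be sketched with a small Bloom filter. Under the assumptions of this sketch, genomic k-mers are stored in canonical form (the smaller of a k-mer and its reverse complement), and a run of exactly k-1 consecutive transcript k-mers missing from the filter marks a candidate junction; a single substitution would instead produce a run of k missing k-mers. The hashing scheme, filter size, function names and toy sequences are hypothetical and do not reproduce ChopStitch's implementation.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple


class BloomFilter:
    """Bit array with salted SHA-256 hashes: false positives possible, no false negatives."""

    def __init__(self, n_bits: int = 1 << 16, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_filter(reads: Iterable[str], k: int) -> BloomFilter:
    """Load the canonical k-mers of genomic reads into a Bloom filter."""
    bf = BloomFilter()
    for read in reads:
        for i in range(len(read) - k + 1):
            bf.add(canonical(read[i:i + k]))
    return bf


def absent_runs(transcript: str, k: int, bf: BloomFilter) -> List[Tuple[int, int]]:
    """Half-open runs of k-mer start positions whose k-mers are missing from the filter."""
    runs, run_start = [], None
    n = len(transcript) - k + 1
    for i in range(n):
        present = canonical(transcript[i:i + k]) in bf
        if not present and run_start is None:
            run_start = i
        elif present and run_start is not None:
            runs.append((run_start, i))
            run_start = None
    if run_start is not None:
        runs.append((run_start, n))
    return runs


def junctions(runs: List[Tuple[int, int]], k: int) -> List[int]:
    """Runs of exactly k-1 missing k-mers are consistent with an exon-exon boundary."""
    return [start + k - 1 for start, end in runs if end - start == k - 1]
```

For a toy genome built as exon1 + intron + exon2 (16, 17 and 16 bases) and a transcript exon1 + exon2, the seven k-mers (k = 8) spanning the splice site are missing, giving the run (9, 16) and a predicted boundary at transcript position 16.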

Address of the bookmark: https://github.com/bcgsc/ChopStitch