BOL: Related items

LibsDyogen: Library for comparative genomics

Jit — Wed, 25 Dec 2019 01:32:39 -0600

A library of commonly used classes and functions, written in Python and used by the Dyogen team for comparative genomics applications.

A collaborative Python library used by the DYOGEN team for studying the evolution of gene order in vertebrates.

http://www.ibens.ens.fr/?rubrique43&lang=fr

Address of the bookmark: https://github.com/DyogenIBENS/LibsDyogen

Bioinformaticians in comparative and evolutionary genomics

Tue, 02 Aug 2022 01:22:48 -0500

NBIS is now looking for a new member to support Swedish research in evolutionary, comparative, and population genomics, with a particular focus on conifer genomics.

Your tasks will consist of:

Advanced bioinformatics analyses within research projects across Sweden, including key involvement in a major research effort in conifer genomics.
Development of bioinformatics tools and workflows.
Educating other scientists in bioinformatics through collaboration within supported projects, teaching at national courses, and through participating in various networks.
Taking part in the continuous development of NBIS/SciLifeLab at a national level.

More at https://www.uu.se/en/about-uu/join-us/details/?positionId=518909

List of comparative genomics resources!

Shruti Paniwala — Tue, 28 Jun 2022 04:08:06 -0500

3D-GENOMICS -- A Database to Compare Structural and Functional Annotations of Proteins between Sequenced Genomes

Compare structural and functional annotations of proteins between sequenced genomes.

ARED Organism -- expansion of ARED reveals AU-rich element cluster variations between human and mouse

View AREs in the human transcriptome and study the comparative genomics of AREs in model organisms.

ATGC -- Alignable Tight Genomic Clusters Database

Find information about orthologous genes in prokaryotes.

AnimalQTLdb -- a livestock QTL database tool set for positional QTL information mining and beyond

Search for publicly available QTL data on livestock and other animal species.

BGDB -- Bovine Genome Database

Find information about bovine genomics data.

COMPARE -- a multi-organism system for cross-species data comparison and transfer of information

A multi-organism web-based resource system designed to easily retrieve, correlate and interpret data across species.

CONDOR -- COnserved Non-coDing Orthologous Regions

A database resource of developmentally associated conserved non-coding elements.

CORG -- A database for COmparative Regulatory Genomics

Delineate conserved non-coding blocks from upstream regions of putative orthologous gene pairs from human, mouse, rat, fugu and zebrafish.

COXPRESdb -- a database of coexpressed gene networks in mammals

Find coexpressed gene lists and networks in human and mouse.

CVTree -- A Phylogenetic Tree Reconstruction Tool Based on Whole Genomes

Construct phylogenetic trees of microorganisms based on the oligopeptide content of their complete proteomes.

CleanEST -- the cleansed EST libraries database

A novel database server that classifies GenBank's dbEST (database of Expressed Sequence Tags) libraries and removes contaminants.

CoCoa -- COefficient of COAncestry software

Find information about the ancestral relationship between genes.

CoGemiR -- a comparative genomics microRNA database

Provides an overview of the genomic organization of microRNAs and extent of conservation during evolution in different metazoan species.

Comparative Genometrics (CG) -- a database dedicated to biometric comparisons of whole genomes

Conduct comparative biometric analysis of chromosomes of different organisms.

DoTS -- Database Of Transcribed Sequences

Search indices of genes and transcripts in human and mouse.

DroSpeGe -- rapid access database for new Drosophila species genomes

Search and compare 12 new and old Drosophila genomes.

ECR Browser -- A Tool for Visualizing and Accessing Data from Comparisons of Multiple Vertebrate Genomes

Access whole-genome alignments of human, mouse, rat and fish sequences.

EPGD -- Eukaryotic Paralog Group Database

Find eukaryotic paralog/paralogon information.

EVOG -- evolutionary visualizer for overlapping genes

Analyze the evolutionary process of overlapping genes when comparing different species.

GNAT -- Inter-species gene mention normalization (ISGN)

The first publicly available system reported to handle inter-species gene mention normalization.

GenColors -- annotation and comparative genomics of prokaryotes made easy

A web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes.

GeneNest gene indices

Visualize gene indices of human, mouse, Arabidopsis, zebrafish, Drosophila and sheep.

GenomeTrafac -- a whole genome resource for the detection of transcription factor binding site clusters associated with conventional and microRNA encoding genes conserved between mouse and human gene orthologs

Use a comparative genomics approach to characterize gene models and identify putative cis-regulatory regions of RefSeq gene orthologs.

IKMC -- International Knockout Mouse Consortium web portal

Find information about mutated mouse genes.

IMG/M -- Integrated Microbial Genomes/Metagenomes

A data management and analysis system for metagenomes.

ISED -- Influenza sequence and epitope database

Search for influenza sequence, vaccine, and drug resistance information.

LAMHDI: The Search for Animal Models Starts Here

LAMHDI, the initiative to Link Animal Models to Human DIsease, is designed to accelerate the research process by providing biomedical researchers with a simple, comprehensive Web-based resource to find the best animal models for their research.

MANTIS -- a phylogenetic framework for multi-species genome comparisons

The missing link between multi-species full genome comparisons and functional analysis.

MBGD -- Microbial genome database for comparative analysis

Conduct comparative analysis of completely sequenced microbial genomes.

MEGA -- Molecular Evolutionary Genetics Analysis

Biologist-centric software for evolutionary analysis of DNA and protein sequences.

MamPol -- a database of nucleotide polymorphism in the Mammalia class

Measure single nucleotide polymorphism (SNP) diversity among homologous sequences from the class Mammalia.

MicrobesOnline -- Prokaryotic Genome Database

Find information about thousands of microbial genomes.

Narcisse -- a mirror view of conserved syntenies

A database dedicated to the study of genome conservation.

OMA -- the Orthologous MAtrix project

Explore orthologous relations across 352 complete genomes.

OPTIC -- orthologous and paralogous transcripts in clades

Browse complete genomes in several clades.

OrthoDB -- the hierarchical catalog of eukaryotic orthologs

Find groups of orthologous genes.

OrthoMaM -- orthologous mammalian markers

A database of orthologous genomic markers for placental mammal phylogenetics.

PEDANT -- Protein Extraction, Description and ANalysis Tool

Conduct genome-wide functional and structural analysis.

PReMod -- a database of genome-wide mammalian cis-regulatory module predictions

Conduct genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes.

PhenomicDB -- Comparison of phenotypes of orthologous genes in human and model organisms

Compare phenotypes of a given gene or gene set in different model organisms.

Phylemon -- A suite of web tools for molecular evolution, phylogenetics and phylogenomics

Phylemon is a web server that integrates a selected suite of more than 20 different tools from the most popular stand-alone programs of phylogenetic and evolutionary analysis.

PhyloPat -- the phylogenetic pattern database

Use this database to see at which point in evolution particular phylogenetic lineages arose, and across which species they are found.

Pristionchus.org -- a genome-centric database of the nematode satellite species Pristionchus pacificus

Search for genomic information on nematode satellite species Pristionchus pacificus.

ProtClustDB -- NCBI Protein Clusters Database

Find information about related protein sequences.

ProtozoaDB -- database of protozoan genomes

Database hosting genomics and post-genomics data from multiple protozoans.

Pseudofam -- the pseudogene families database

A database of pseudogene families based on the protein families from the Pfam database.

RIDM -- RIKEN Integrated Database of Mammals

Find genomic information about mammals.

RegPrecise -- Regulon Prediction Database

Find information about predicted regulons in prokaryotic transcription regulation.

SALAD -- Surveyed conserved motif ALignment diagram and the Associating Dendrogram

Perform systematic comparison of proteome data among species.

SGN -- SOL Genomics Network

A comparative map viewer dedicated to the biology of the Solanaceae family.

ShotgunFunctionalizeR -- R-package for functional comparison of metagenomes

Analyze functional annotation data from fragmented (shotgun-sequenced) microbial genetic material.

SnoopCGH -- Comparative Genomic Hybridization software

Visualize and explore comparative genomic hybridization data sets.

SwissRegulon -- a database of genome-wide annotations of regulatory sites

Search for genome-wide annotations of regulatory sites in yeast and prokaryotic genomes.

TaxonGap -- a visualization tool for intra- and inter-species variation among individual biomarkers

Compare and select individual biomarkers.

The Adaptive Evolution Database (TAED) -- a phylogeny based tool for comparative genomics

Search for information on adaptive evolution in gene families of higher plants and chordates.

The CGView Server -- a comparative genomics tool for circular genomes

Generate graphical maps of circular genomes that show sequence features, base composition plots, analysis results and sequence similarity plots.

The ERGO -- Genome analysis and discovery system

Conduct a comprehensive analysis of genes and genomes.

The Macaque Genome: Interactive Poster and Teaching Resource

An interactive online poster presentation on the Macaque genome, including high-quality images, video clips, and Web resources.

The TIGR Gene Indices -- clustering and assembling EST and known genes and integration with eukaryotic genomes

Search for annotated genetic information of expressed sequence tags (ESTs) in different eukaryotic organisms.

UniGene

Find mapping and expression information for a UniGene cluster (ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene).

Uprobe -- universal overgo hybridization-based probe retrieval and design

A public online resource for identifying or designing 'universal' overgo-hybridization probes from conserved sequences that can be used to efficiently screen one or more genomic libraries from a designated group of species.

VISTA -- Computational Tools for Comparative Genomics

A comprehensive suite of programs and databases for comparative analysis of genomic sequences.

cBARBEL -- Catfish Breeder and Researcher Bioinformatics Entry Location

Find information about ictalurid catfish.

eggNOG -- evolutionary genealogy of genes: Non-supervised Orthologous Groups

Discover orthologous groups of genes.

metaTIGER -- a metabolic gene evolution resource

Find metabolic networks and phylogenomic information on a taxonomically diverse range of eukaryotes.

xBASE -- a collection of online databases for bacterial comparative genomics

Conduct bacterial comparative genomics.

Newborn babies get ready to know their whole genomes soon!

Rahul Agarwal — Thu, 05 Sep 2013 07:24:02 -0500

The USA has launched pilot projects to examine the medical information of newborn babies. The projects are funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and the National Human Genome Research Institute (NHGRI), both part of the National Institutes of Health.

Awards of $5 million to four grantees have been made in fiscal year 2013 under the Genomic Sequencing and Newborn Screening Disorders research program. The program will be funded at $25 million over five years, as funds are made available.

"Hundreds of US babies will be pioneers in genomic medicine through a US$25-million programme to sequence their genomes soon after they are born."

Source:

http://blogs.nature.com/news/2013/09/scientists-to-sequence-hundreds-of-newborns-genomes.html

http://www.genome.gov/27554919

GOLD: Genomes Online Database

Jit — Wed, 26 Jul 2017 07:49:29 -0500

GOLD (Genomes Online Database) is a World Wide Web resource for comprehensive access to information about genome and metagenome sequencing projects, and their associated metadata, around the world.

Address of the bookmark: https://gold.jgi.doe.gov/

SPAdes hybrid genome assembly

Jit — Mon, 27 Nov 2017 08:05:40 -0600

When you have both Illumina and Nanopore data, SPAdes remains a good option for hybrid assembly; SPAdes was used to produce the B. fragilis assembly by Mick Watson’s group.

Running spades.py without any arguments shows the available options:

spades.py

This produces:

SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o <output_dir>

Basic options:
-o <output_dir>                 directory to store all the resulting files (required)
--sc                            this flag is required for MDA (single-cell) data
--meta                          this flag is required for metagenomic sample data
--rna                           this flag is required for RNA-Seq data
--plasmid                       runs plasmidSPAdes pipeline for plasmid detection
--iontorrent                    this flag is required for IonTorrent data
--test                          runs SPAdes on toy dataset
-h/--help                       prints this usage message
-v/--version                    prints version

Input data:
--12 <filename>                 file with interlaced forward and reverse paired-end reads
-1 <filename>                   file with forward paired-end reads
-2 <filename>                   file with reverse paired-end reads
-s <filename>                   file with unpaired reads
--pe<#>-12 <filename>           file with interlaced reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-1 <filename>            file with forward reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-2 <filename>            file with reverse reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-s <filename>            file with unpaired reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-<or>                    orientation of reads for paired-end library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--s<#> <filename>               file with unpaired reads for single reads library number <#> (<#> = 1,2,..,9)
--mp<#>-12 <filename>           file with interlaced reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-1 <filename>            file with forward reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-2 <filename>            file with reverse reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-s <filename>            file with unpaired reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-<or>                    orientation of reads for mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--hqmp<#>-12 <filename>         file with interlaced reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-1 <filename>          file with forward reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-2 <filename>          file with reverse reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-s <filename>          file with unpaired reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-<or>                  orientation of reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--nxmate<#>-1 <filename>        file with forward reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--nxmate<#>-2 <filename>        file with reverse reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--sanger <filename>             file with Sanger reads
--pacbio <filename>             file with PacBio reads
--nanopore <filename>           file with Nanopore reads
--tslr <filename>               file with TSLR-contigs
--trusted-contigs <filename>    file with trusted contigs
--untrusted-contigs <filename>  file with untrusted contigs

Pipeline options:
--only-error-correction         runs only read error correction (without assembling)
--only-assembler                runs only assembling (without read error correction)
--careful                       tries to reduce number of mismatches and short indels
--continue                      continue run from the last available check-point
--restart-from <cp>             restart run with updated options and from the specified check-point ('ec', 'as', 'k<int>', 'mc')
--disable-gzip-output           forces error correction not to compress the corrected reads
--disable-rr                    disables repeat resolution stage of assembling

Advanced options:
--dataset <filename>            file with dataset description in YAML format
-t/--threads <int>              number of threads
                                [default: 16]
-m/--memory <int>               RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir <dirname>             directory for temporary files
                                [default: <output_dir>/tmp]
-k <int,int,...>                comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff <float>            coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset <33 or 64>       PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]

As you can see, this is also a “pipeline” of tools that can be switched on or off. SPAdes takes quite a long time to run, so for the purposes of this practical, something like this may suffice:

spades.py -t 4 \
          -m 32 \
          -k 31,51,71 \
          --only-assembler \
          -1 miseq.1.fastq -2 miseq.2.fastq \
          --nanopore minion.fastq \
          -o hybrid_assembly

In turn, these parameters mean:

use 4 threads
limit memory to 32 GB
use three k-mer values (31, 51 and 71) to build the de Bruijn graph(s)
only run the assembler, not the read error-correction step (for speed)
read 1 and read 2 of the MiSeq data
the Nanopore data
put the output in the folder “hybrid_assembly”
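To make the role of -k more concrete, here is a minimal Python sketch. It is not SPAdes' code. It applies the constraint from the help text (every k odd and below 128) and builds a toy de Bruijn graph for each k. In this graph, nodes are (k-1)-mers, and every k-mer in a read adds an edge from its prefix to its suffix. The function names are hypothetical. The sketch also ignores reverse complements, k-mer counts and sequencing errors, all of which a real assembler has to handle.

```python
from collections import defaultdict


def validate_k_values(ks):
    """Check the constraint from the SPAdes help text: every k must be odd
    and less than 128. Returns the k values in ascending order."""
    for k in ks:
        if k % 2 == 0 or not 0 < k < 128:
            raise ValueError(f"invalid k-mer size: {k}")
    return sorted(ks)


def de_bruijn_graph(reads, k):
    """Toy node-centric de Bruijn graph.

    Nodes are (k-1)-mers; each k-mer seen in a read adds an edge from its
    (k-1)-prefix to its (k-1)-suffix. Reverse complements and k-mer
    multiplicities are deliberately ignored in this sketch.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical usage: one graph per k value, as with -k 31,51,71 above
graphs = {k: de_bruijn_graph(["ACGTTGCA"], k) for k in validate_k_values([5, 3])}
```

Odd k rules out k-mers that are their own reverse complement. Combining several k values trades off two effects. Small k keeps low-coverage regions connected, and large k resolves more repeats.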

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Jit — Wed, 06 Dec 2017 02:08:14 -0600

COPE (Connecting Overlapped Pair-End reads) is an efficient tool that connects overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is 10% and 2% higher, respectively, than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs have fewer errors and show a 14-fold improvement in N50 compared with contigs produced from unconnected reads.
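COPE decides where the two reads of a pair overlap using k-mer frequencies. The Python sketch below swaps in a much simpler technique to show what "connecting" a pair means. It reverse-complements read 2 and then scans candidate overlaps from longest to shortest, accepting the first one under a mismatch threshold. It also includes a helper for N50, the contig metric quoted above. The function names and default thresholds are illustrative assumptions, not COPE's.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Connect a read pair whose ends overlap (simple overlap scan, not COPE).

    read2 is reverse-complemented onto read1's strand. Overlap lengths are
    tried from longest to shortest, and the first one whose mismatch rate is
    acceptable is used. Returns None when no overlap qualifies.
    """
    r2 = reverse_complement(read2)
    for olen in range(min(len(read1), len(r2)), max(min_overlap, 1) - 1, -1):
        tail, head = read1[-olen:], r2[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * olen:
            # No base qualities in this sketch, so read1's bases win in the overlap.
            return read1 + r2[olen:]
    return None


def n50(lengths):
    """Largest length L such that contigs of length >= L cover at least
    half of the total assembly length (0 for an empty assembly)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

Preferring the longest acceptable overlap reduces the chance of joining reads at a short, spurious match. A real tool would also weigh base qualities or, as COPE does, k-mer evidence.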

Address of the bookmark: ftp://ftp.genomics.org.cn/pub/cope

Tools for bacterial whole genome annotation

Radha Agarkar — Sat, 16 Dec 2017 17:37:47 -0600

RAST – Web tool (upload contigs) that uses the subsystems in the SEED database and provides detailed annotation and pathway analysis. It takes several hours per genome, but I think this is the best way to get a high-quality annotation (if you have only a few genomes to annotate).

Prokka – Standalone command-line tool that takes just a few minutes per genome. This is the best way to get good-quality annotation in a flash, which is particularly useful if you have loads of genomes or need to annotate a pangenome or metagenome. Note, however, that the quality of functional information is not as good as RAST's, and you will need several extra steps if you want to do functional profiling and pathway analysis of your genome(s)… which is built into RAST.

NCBI Prokaryotic Genome Annotation Pipeline is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).

Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.
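The first level, predicting protein-coding genes, can be illustrated with a naive open reading frame (ORF) scan in Python. This is a sketch under simple assumptions: ATG starts, the three standard stop codons (TAA, TAG, TGA), and a minimum length cutoff. The function name and cutoff are hypothetical. Real annotation pipelines use trained gene models and also handle alternative start codons, overlapping genes and frameshifts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end) for ATG-initiated ORFs on both strands.

    Coordinates are 0-based and half-open, relative to the strand that was
    scanned (the reverse strand is the reverse complement of seq); `end`
    includes the stop codon. min_codons counts codons from the start codon
    up to, but not including, the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # keep the first ATG, giving the longest ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

ORFs that run off the end of the sequence without a stop codon are not reported. That is one of several simplifications a real gene finder would not make.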

PGAP: NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. The first version of NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP; see Pubmed Article) developed in 2005 has been replaced with an upgraded version that is capable of processing a larger data volume. NCBI's annotation pipeline depends on several internal databases and is not currently available for download or use outside of the NCBI environment.

BEACON (automated tool for Bacterial GEnome Annotation ComparisON) is a fast tool for the automated, systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/.

BlastKOALA: Assigns K numbers to the user's sequence data by BLAST searches against a nonredundant set of KEGG GENES. KOALA (KEGG Orthology And Links Annotation) is KEGG's internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation. Annotate Sequence in KEGG Mapper and Pathogen Checker in KEGG Pathogen are special interfaces to this server and can be executed in an interactive mode. BlastKOALA is suitable for annotating fully sequenced genomes.

PAGIT: Provides a toolkit for improving the quality of genome assemblies produced by assembly software. PAGIT bundles four tools: (i) ABACAS, which orders and orients contigs and estimates the sizes of the gaps between them; (ii) IMAGE, which uses paired-end reads to extend contigs and close gaps within scaffolds; (iii) iCORN, which identifies and corrects small errors in consensus sequences; and (iv) RATT, which helps annotation by transferring it from a reference. The software was mainly created to analyze parasite genomes of up to about 300 Mb.

MAKER: A portable and easily configurable genome annotation pipeline. MAKER allows smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. It identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values. MAKER's inputs are minimal and its outputs can be directly loaded into a Generic Model Organism Database (GMOD). They can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view and edit individual contigs and BACs without the overhead of a database. MAKER is available for download and can be tested online via the MAKER Web Annotation Service (MWAS).

MyPro is a software pipeline for high-quality prokaryotic genome assembly and annotation. It was validated on 18 oral streptococcal strains to produce submission-ready, annotated draft genomes. MyPro installed as a virtual machine and supported by updated databases will enable biologists to perform quality prokaryotic genome assembly and annotation with ease.

Magic-BLAST: a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome.

Jit — Tue, 26 Dec 2017 22:23:39 -0600

Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score, taking into account simultaneously the two reads of a pair, and in case of RNA-seq, locating the candidate introns and adding up the score of all exons. This is very different from other versions of BLAST, where each exon is scored as a separate hit and read-pairing is ignored.

Magic-BLAST incorporates within the NCBI BLAST code framework ideas developed in the NCBI Magic pipeline, in particular hit extensions by local walk and jump (http://www.ncbi.nlm.nih.gov/pubmed/26109056), and recursive clipping of mismatches near the edges of the reads, which avoids accumulating artefactual mismatches near splice sites and is needed to distinguish short indels from substitutions near the edges.

Address of the bookmark: https://ncbi.github.io/magicblast/

MUMmer4: A fast and versatile genome alignment system

Jit — Sat, 03 Feb 2018 04:59:17 -0600

MUMmer4 is a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. As a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes.
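The data structure MUMmer4 moved to can be illustrated with a toy suffix array in Python. The array holds suffix start positions sorted lexicographically, and binary search over it finds every exact occurrence of a query. This is not MUMmer's implementation, and the function names are hypothetical. The naive construction below is only suitable for short strings. Finding maximal unique matches, as MUMmer does, needs further machinery on top.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically.
    Fine for toy inputs; real tools use far more efficient algorithms."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of all exact occurrences of pattern.

    Truncating sorted suffixes to len(pattern) keeps them in sorted order,
    so two binary searches bracket the block of suffixes that start with
    pattern.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

A suffix array stores one integer per position, which is why the index width (32-bit vs 48-bit) bounds the input size a tool can handle.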

Address of the bookmark: https://mummer4.github.io/