BOL: Related items

The Helsinki Summer School on Mathematical Ecology and Evolution

Thu, 10 Mar 2022 01:06:28 -0600

https://wiki.helsinki.fi/display/BioMath/The+Helsinki+Summer+School+on+Mathematical+Ecology+and+Evolution+2022

This is the seventh school of a biennial series of international summer schools on mathematical ecology and evolution in Finland, organised by the Biomathematics Group of the University of Helsinki. The series of The Helsinki Summer School on Mathematical Ecology and Evolution is part of the EMS-ESMTB Schools in Applied Mathematics.

After the two-year break forced upon us by the pandemic, we are looking forward to continuing this series in August 2022, if the COVID situation permits.

Postdoctoral Scholar in Bacterial Evolution at Pathogen and Microbiome Institute at Northern Arizona University

Fri, 13 Dec 2024 12:49:16 -0600

We are pleased to announce a Postdoctoral Scholar position to study
bacterial evolution at the Pathogen and Microbiome Institute at
Northern Arizona University with Professor Paul Keim. The scholar
will also have the opportunity to work with Professor Sam Sheppard at
The University of Oxford on joint projects. See our recent paper
on interspecific gene flow in Campylobacter. (DOI:
https://doi.org/10.1128/mbio.00581-24)

The job description: "This research position focuses on the science
of bacterial evolution. It will consist of researching theoretical
principles, but could include translational applications. Phylogenomic
and bioinformatic analysis of bacterial populations in nature or
in laboratory experiments will be a key component of the work. Prior
experience is an asset though training will be possible at PMI.
Likewise, laboratory microbiological, molecular, and biochemical
skills are an asset though not essential. Communication and critical
thinking skills are essential for performing the work and for
communicating to the local and international scientific communities.
Participating in team or independent grant writing to obtain research
funding will be required. Student mentoring is a part of the NAU
mission and is a partial expectation."

https://hr.peoplesoft.nau.edu/psp/ph92prta/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_APP_SCHJOB.GBL?Page=HRS_APP_JBPST&Action=U&FOCUS=Applicant&SiteId=1&JobOpeningId=608024&PostingSeq=1

Northern Arizona University is located in Flagstaff, Arizona, a
beautiful mountain town with a surprisingly vibrant restaurant
scene. Located a little over an hour from the Grand Canyon and ~45
min from Sedona, Flagstaff is a hiker's paradise. In fact, the city
of Flagstaff operates more than 50 miles of unpaved trails and there
are, on average, 266 sunny days per year with which to enjoy them.
At 7000 ft in elevation, Flagstaff experiences all four seasons,
but the summers are mild and, in the winter, you can be on the ski
slopes within 30 min! https://www.flagstaffarizona.org/

As mentioned, joint projects with Professor Sheppard at Oxford
University are possible, including travel to his laboratory in the
United Kingdom. https://www.biology.ox.ac.uk/people/samuel-sheppard

Contact Information:
Paul.Keim@nau.edu

Paul S. Keim, Ph.D.
Regents Professor, &
Cowden Endowed Chair of Microbiology
Northern Arizona University
Flagstaff, AZ 86011-4073


SOWDHAMINI Lab

Sun, 15 Sep 2013 09:19:12 -0500

Genome sequencing projects have enormous potential for benefiting human endeavors. However, just as acquiring a language's vocabulary does not enable one to speak it, databases that list the amino acid composition of proteins do not directly tell us much about these proteins' higher-level structure and function. The most productive way to indirectly exploit these databases has been to start with the small number of proteins that are fully characterised and to assume that other "similar" proteins will have a related structure and function. Proteins with very similar amino acid sequences are "no-brainers", but the real test, which our group largely focuses on, is to detect the "essential" similarity in proteins whose non-critical sections have experienced random rearrangements during evolution. In such cases, functionally similar proteins may have less than 25% sequence overlap.

More @ http://www.ncbs.res.in/sowdhamini/groups_sowdhamini.htm

Sidow Lab

Fri, 07 Dec 2018 09:06:30 -0600

We study mechanisms of cancer evolution by using state-of-the-art genomic approaches at the bench and in analysis. Accurate genome reconstruction is our other major area of interest. We also collaborate on important questions for which our expertise in genomics and computation is relevant. Arend's biosketch highlights some of our past contributions.

http://www.sidowlab.org/

"IdeasLab"* workshop !

Wed, 02 Feb 2022 06:13:48 -0600

A new, grant-funded opportunity seeks early career researchers interested
in life's origins: https://templetonideaslab.umbc.edu/

Applications are invited to an all-expenses paid position at a 5-day
"IdeasLab"* workshop to be held near Prague, CZ in June 2022. Thirty
successful applicants will be drawn in equal numbers from the relevant
sectors: biological evolution, A-Life and theoretical physics. The week's
activities will lead these thirty to form interdisciplinary teams which
each propose how they can advance frontiers of abiogenesis research. Up to
$5 million total funding will be available for developing the ideas
produced by the week's activities. Further details of the event,
including the online application form, are found at the link posted above.
Unlocking Evolutionary Secrets: A Dive into Comparative Genomics Methods

LEGE — Tue, 20 May 2025 00:25:09 -0500

Comparative genomics is the art and science of comparing genomes—across species, within species, or even among individuals—to unravel evolutionary relationships, functional elements, and genetic adaptations. As sequencing technologies have advanced and genome databases have expanded, comparative genomics has become a cornerstone of modern biology, shedding light on everything from antibiotic resistance in bacteria to human disease genetics.

In this post, we’ll explore the core methods used in comparative genomics, the questions they help answer, and how they’re shaping our understanding of life.

1. Whole-Genome Alignment
Whole-genome alignment involves mapping the entire genome of one species to another. Tools like MUMmer, MAUVE, and LASTZ perform large-scale sequence alignments to detect conserved regions, rearrangements, insertions, and deletions.

Use Case:
Comparing human and chimpanzee genomes to identify evolutionarily conserved sequences (ECS) and regions of divergence.

Key Challenges:
Handling repetitive sequences and genome rearrangements.

Computational complexity in large genomes.

2. Synteny and Collinearity Analysis
Synteny refers to conserved blocks of gene order across species. Tools like MCScanX, SynMap, or CHITRA (for visualizing synteny interactively) detect these blocks to understand chromosomal evolution.

Use Case:
Studying ancient genome duplications in plants.

Investigating chromosomal rearrangements in cancer genomes.
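
As a rough illustration of what "conserved blocks of gene order" means, the sketch below scans genes along genome A and groups them into runs whose orthologs sit at consecutive positions in genome B, in either orientation. It is a deliberately simplified stand-in, not the algorithm used by MCScanX or SynMap (which tolerate gaps and score chains). The gene names, positions, and the `min_genes` threshold are hypothetical.

```python
from typing import Dict, List, Optional


def collinear_blocks(
    order_a: List[str],
    position_in_b: Dict[str, int],
    min_genes: int = 3,
) -> List[List[str]]:
    """Group genes of genome A into runs whose orthologs are adjacent in genome B.

    order_a lists genes in chromosomal order in genome A; position_in_b maps a
    gene to the index of its ortholog in genome B (genes without an ortholog
    are simply absent from the mapping). No gaps are tolerated in this sketch.
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    direction = 0  # 0 = not yet known, +1 = same orientation, -1 = inverted
    prev_pos: Optional[int] = None

    for gene in order_a:
        pos = position_in_b.get(gene)
        step = None if pos is None or prev_pos is None else pos - prev_pos
        if current and step in (1, -1) and direction in (0, step):
            # The ortholog is the immediate neighbour in B: extend the run.
            current.append(gene)
            direction = step
        else:
            # The run is broken: keep it if long enough, then start over.
            if len(current) >= min_genes:
                blocks.append(current)
            current = [gene] if pos is not None else []
            direction = 0
        prev_pos = pos

    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Hypothetical toy data: one forward block, one inverted block, one orphan gene.
order_a = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
position_in_b = {"a1": 10, "a2": 11, "a3": 12, "a4": 40, "a5": 39, "a6": 38}
blocks = collinear_blocks(order_a, position_in_b)
print(blocks)
```

Real synteny tools allow intervening non-matching genes within a block, which is why they recover ancient duplications that a strict adjacency test like this one would split apart.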

3. Ortholog and Paralog Detection
Orthologs are genes in different species that descend from a single gene in their last common ancestor and diverged through speciation, while paralogs are genes that arose by duplication within a genome. Identifying them is crucial for functional annotation and evolutionary studies.

Popular Tools:
OrthoFinder, Orthologous MAtrix (OMA), InParanoid, and EggNOG.

Use Case:
Functional prediction of uncharacterized genes based on orthologs in model organisms.

Tracing gene family evolution.
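
One simple and widely used heuristic for proposing ortholog pairs is the reciprocal best hit (RBH): gene a in species A and gene b in species B are paired if each is the other's top-scoring match. The sketch below shows the idea on hypothetical bit scores. It is not how OrthoFinder, OMA, InParanoid, or eggNOG work internally, and the function names and score values are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query gene, subject gene) -> score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return the highest-scoring subject for every query gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pair genes that are each other's best hit in both search directions."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores from searching A against B and B against A.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a3"): 118.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here a3 also matches b2, but b2 prefers a2, so a3 is left unpaired; it may be a paralog of a2 (a duplication within species A), which is exactly the case that RBH alone cannot resolve and that dedicated orthology pipelines address.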

4. Phylogenomic Analysis
Phylogenomic methods combine phylogenetics and genomics to infer evolutionary trees based on genome-wide data. These methods can handle dozens to hundreds of genomes, using concatenated alignments or gene trees.

Tools:
RAxML, IQ-TREE, ASTRAL, PHYLIP, BEAST.

Use Case:
Resolving the evolutionary relationships between microbial species.

Studying speciation events.
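
The "concatenated alignments" approach mentioned above can be sketched as follows: per-gene alignments are joined end to end into one supermatrix, taxa missing a gene are padded with gaps, and the column range of each gene is recorded so that a tree-building program can be told where the partitions lie. This is a minimal sketch under the assumption that each gene alignment is a dictionary of equal-length aligned sequences; the taxon names and sequences are made up.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    gene_alignments: List[Dict[str, str]],
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Join per-gene alignments into a supermatrix.

    Returns the concatenated sequence per taxon and a list of 1-based,
    inclusive (start, end) column ranges, one per gene.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    offset = 0

    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one gene alignment must share a length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa lacking this gene are represented by an all-gap segment.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((offset + 1, offset + length))
        offset += length

    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Hypothetical alignments for two genes; sp2 lacks the second gene.
gene1 = {"sp1": "ATGC", "sp2": "ATGA", "sp3": "ATGT"}
gene2 = {"sp1": "GG-C", "sp3": "GGAC"}
supermatrix, partitions = concatenate_alignments([gene1, gene2])
print(supermatrix)
print(partitions)
```

The alternative named in the text, working from individual gene trees, skips this step and instead summarizes separately inferred trees.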

5. Pan-Genome Analysis
The pan-genome consists of the core genome (genes shared by all strains) and the accessory genome (genes present in only some strains, including strain-specific genes). This type of analysis is especially popular in microbial genomics.

Tools:
Roary, Panaroo, BPGA, PGAP.

Use Case:
Understanding virulence factor diversity in E. coli.

Designing broad-spectrum vaccines.
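
Given a gene presence/absence table, the core and accessory partitions reduce to set operations. The sketch below assumes genes have already been clustered into families (the hard part that tools such as Roary or Panaroo handle) and that each strain is represented by the set of family names it carries. Accessory genes are taken here to be those present in only some strains. The strain and gene names are purely illustrative.

```python
from typing import Dict, Set


def partition_pangenome(gene_content: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into pan, core, accessory, and strain-specific sets."""
    strains = list(gene_content.values())
    pan = set().union(*strains)                       # found in at least one strain
    core = set.intersection(*strains) if strains else set()  # found in every strain
    accessory = pan - core                            # found in some, not all
    unique = {g for g in accessory if sum(g in s for s in strains) == 1}
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


# Hypothetical gene content for three strains.
gene_content = {
    "strainA": {"gyrA", "recA", "stx1"},
    "strainB": {"gyrA", "recA", "stx1", "hlyA"},
    "strainC": {"gyrA", "recA"},
}
result = partition_pangenome(gene_content)
for name in ("pan", "core", "accessory", "unique"):
    print(name, sorted(result[name]))
```

In practice many pipelines relax the core definition (for example, genes present in nearly all strains) to tolerate assembly and annotation errors; the strict intersection used here is the simplest case.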

6. Comparative Transcriptomics
Comparing transcriptomes across species or conditions reveals conserved and unique expression patterns. RNA-seq data can be mapped to reference genomes to identify orthologous expression profiles.

Use Case:
Comparing stress response in extremophiles and model species.

Studying conserved regulatory networks.

7. Functional Element Comparison
Beyond genes, comparative genomics also targets non-coding regions—enhancers, promoters, miRNAs. Conservation across species often implies functional importance.

Tools:
PhastCons, GERP, phyloP (based on multiple alignments).

Use Case:
Detecting conserved non-coding elements in vertebrates.

Studying regulatory divergence in human evolution.

8. Horizontal Gene Transfer (HGT) Detection
In microbes, genes often jump across species boundaries. Comparative genomics can detect HGT by identifying genes that defy the expected phylogenetic pattern.

Tools:
HGTector, DarkHorse, AlienHunter, SIGI-HMM.

Use Case:
Tracing antibiotic resistance genes.

Exploring microbial adaptability in extreme environments.

Final Thoughts
Comparative genomics is a powerful lens to observe the diversity and unity of life. With a broad toolkit—from aligners to orthology pipelines, phylogenetic engines to visualization tools—it allows scientists to ask big questions: How did genomes evolve? What makes species unique? Where do new genes come from?

Whether you're studying extremophiles, building better crops, or exploring human ancestry, comparative genomics offers the methods to connect the dots across the tree of life.

SPAdes hybrid genome assembly

Jit — Mon, 27 Nov 2017 08:05:40 -0600

When you have both Illumina and Nanopore data, SPAdes remains a good option for hybrid assembly; it was used to produce the B. fragilis assembly by Mick Watson’s group.

Again, running spades.py will show you the options:

spades.py

This produces:

SPAdes genome assembler v3.10.1

Usage: /usr/local/SPAdes-3.10.1-Linux/bin/spades.py [options] -o <output_dir>

Basic options:
-o <output_dir>         directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data
--meta                  this flag is required for metagenomic sample data
--rna                   this flag is required for RNA-Seq data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12 <filename>               file with interlaced forward and reverse paired-end reads
-1 <filename>                 file with forward paired-end reads
-2 <filename>                 file with reverse paired-end reads
-s <filename>                 file with unpaired reads
--pe<#>-12 <filename>         file with interlaced reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-1 <filename>          file with forward reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-2 <filename>          file with reverse reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-s <filename>          file with unpaired reads for paired-end library number <#> (<#> = 1,2,..,9)
--pe<#>-<or>                  orientation of reads for paired-end library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--s<#> <filename>             file with unpaired reads for single reads library number <#> (<#> = 1,2,..,9)
--mp<#>-12 <filename>         file with interlaced reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-1 <filename>          file with forward reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-2 <filename>          file with reverse reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-s <filename>          file with unpaired reads for mate-pair library number <#> (<#> = 1,2,..,9)
--mp<#>-<or>                  orientation of reads for mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--hqmp<#>-12 <filename>       file with interlaced reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-1 <filename>        file with forward reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-2 <filename>        file with reverse reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-s <filename>        file with unpaired reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9)
--hqmp<#>-<or>                orientation of reads for high-quality mate-pair library number <#> (<#> = 1,2,..,9; <or> = fr, rf, ff)
--nxmate<#>-1 <filename>      file with forward reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--nxmate<#>-2 <filename>      file with reverse reads for Lucigen NxMate library number <#> (<#> = 1,2,..,9)
--sanger <filename>           file with Sanger reads
--pacbio <filename>           file with PacBio reads
--nanopore <filename>         file with Nanopore reads
--tslr <filename>             file with TSLR-contigs
--trusted-contigs <filename>  file with trusted contigs
--untrusted-contigs <filename>  file with untrusted contigs

Pipeline options:
--only-error-correction runs only read error correction (without assembling)
--only-assembler        runs only assembling (without read error correction)
--careful               tries to reduce number of mismatches and short indels
--continue              continue run from the last available check-point
--restart-from <cp>     restart run with updated options and from the specified check-point ('ec', 'as', 'k<int>', 'mc')
--disable-gzip-output   forces error correction not to compress the corrected reads
--disable-rr            disables repeat resolution stage of assembling

Advanced options:
--dataset <filename>            file with dataset description in YAML format
-t/--threads <int>              number of threads
                                [default: 16]
-m/--memory <int>               RAM limit for SPAdes in Gb (terminates if exceeded)
                                [default: 250]
--tmp-dir <dirname>             directory for temporary files
                                [default: <output_dir>/tmp]
-k <int,int,...>                comma-separated list of k-mer sizes (must be odd and
                                less than 128) [default: 'auto']
--cov-cutoff <float>            coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset <33 or 64>       PHRED quality offset in the input reads (33 or 64)
                                [default: auto-detect]

As you can see this is also a “pipeline” of tools that can be switched on or off. SPAdes takes quite a long time, so for the purposes of this practical, something like this may suffice:

spades.py -t 4 \
          -m 32 \
          -k 31,51,71 \
          --only-assembler \
          -1 miseq.1.fastq -2 miseq.2.fastq \
          --nanopore minion.fastq \
          -o hybrid_assembly

In turn, these parameters mean:

use 4 threads
limit memory to 32 GB
use three k-mer values to build the de Bruijn graph(s): 31, 51 and 71
only run the assembler, not the read error correction (for speed)
read 1 and read 2 of the MiSeq data
the Nanopore data
put the output in the folder “hybrid_assembly”
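
If you want to launch the same run from a script, the command can be assembled as an argument list. The sketch below is one possible way to do that: it uses only flags that appear in the help output above and checks the k-mer constraint stated there (odd and less than 128) before anything is run. The function name `build_spades_command` and its defaults are assumptions for this example and are not part of SPAdes.

```python
import shlex
from typing import List, Sequence


def build_spades_command(
    forward: str,
    reverse: str,
    nanopore: str,
    out_dir: str,
    kmers: Sequence[int] = (31, 51, 71),
    threads: int = 4,
    memory_gb: int = 32,
) -> List[str]:
    """Return the argument list for a hybrid Illumina + Nanopore SPAdes run."""
    for k in kmers:
        # The help text requires k-mer sizes to be odd and less than 128.
        if k < 1 or k % 2 == 0 or k >= 128:
            raise ValueError(f"k-mer size {k} must be odd and less than 128")
    return [
        "spades.py",
        "-t", str(threads),
        "-m", str(memory_gb),
        "-k", ",".join(str(k) for k in kmers),
        "--only-assembler",          # skip read error correction, for speed
        "-1", forward,
        "-2", reverse,
        "--nanopore", nanopore,
        "-o", out_dir,
    ]


command = build_spades_command(
    "miseq.1.fastq", "miseq.2.fastq", "minion.fastq", "hybrid_assembly"
)
print(shlex.join(command))
```

On a machine where SPAdes is installed, the list can be handed to `subprocess.run(command, check=True)`; building it as a list rather than a single string avoids shell quoting problems with unusual file names.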

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Jit — Wed, 06 Dec 2017 02:08:14 -0600

COPE (Connecting Overlapped Pair-End reads) is an efficient tool that connects overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is 10% and 2% higher, respectively, than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs are found to have fewer errors and give a 14-fold improvement in the N50 measurement when compared with the contigs produced using unconnected reads.
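
For readers unfamiliar with the N50 measurement: it is the length of the contig at which, going from the longest contig downwards, the running total first reaches half of the total assembly length. A minimal sketch of the calculation follows; the contig lengths are made-up numbers and are unrelated to the COPE results.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to >= 50%.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths: total 300, so the half-way mark is 150.
lengths = [100, 80, 60, 40, 20]
print(n50(lengths))
```

With these lengths the longest contig covers 100 bases and the two longest cover 180, so the half-way mark is crossed at the 80-base contig.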

Address of the bookmark: ftp://ftp.genomics.org.cn/pub/cope

Tools for bacterial whole genome annotation

Radha Agarkar — Sat, 16 Dec 2017 17:37:47 -0600

RAST – Web tool (upload contigs), uses the subsystems in the SEED database and provides detailed annotation and pathway analysis. Takes several hours per genome but I think this is the best way to get a high quality annotation (if you have only a few genomes to annotate).

Prokka – Standalone command line tool, takes just a few minutes per genome. This is the best way to get good quality annotation in a flash, which is particularly useful if you have loads of genomes or need to annotate a pangenome or metagenome. Note, however, that the quality of functional information is not as good as with RAST, and you will need several extra steps if you want to do functional profiling and pathway analysis of your genome(s), which are built into RAST.

NCBI Prokaryotic Genome Annotation Pipeline is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).

Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.

PGAP: NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology based methods. The first version of NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP; see Pubmed Article) developed in 2005 has been replaced with an upgraded version that is capable of processing a larger data volume. NCBI's annotation pipeline depends on several internal databases and is not currently available for download or use outside of the NCBI environment.

BEACON (automated tool for Bacterial GEnome Annotation ComparisON) is a fast tool for automated and systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/.

BlastKOALA: Assigns K numbers to the user's sequence data by BLAST searches against a nonredundant set of KEGG GENES. KOALA (KEGG Orthology And Links Annotation) is KEGG's internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation. Annotate Sequence in KEGG Mapper and Pathogen Checker in KEGG Pathogen are special interfaces to this server and can be executed in an interactive mode. BlastKOALA is suitable for annotating fully sequenced genomes.

PAGIT: Provides a toolkit for improving the quality of genome assemblies produced by assembly software. PAGIT bundles four tools: (i) ABACAS, which classifies and orientates contigs and estimates the sizes of gaps between them; (ii) IMAGE, which uses paired-end reads to extend contigs and close gaps within the scaffolds; (iii) ICORN, which identifies and corrects small errors in consensus sequences; and (iv) RATT, which helps with annotation. The software was mainly created to analyze parasite genomes of up to about 300 Mb.

MAKER: A portable and easily configurable genome annotation pipeline. MAKER allows smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. It identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions and automatically synthesizes these data into gene annotations having evidence-based quality values. MAKER's inputs are minimal and its outputs can be directly loaded into a Generic Model Organism Database (GMOD). They can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view and edit individual contigs and BACs without the overhead of a database. MAKER is available for download and can be tested online via the MAKER Web Annotation Service (MWAS).

MyPro is a software pipeline for high-quality prokaryotic genome assembly and annotation. It was validated on 18 oral streptococcal strains to produce submission-ready, annotated draft genomes. MyPro installed as a virtual machine and supported by updated databases will enable biologists to perform quality prokaryotic genome assembly and annotation with ease.

Magic-BLAST: a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome.

Jit — Tue, 26 Dec 2017 22:23:39 -0600

Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score, taking into account simultaneously the two reads of a pair, and in case of RNA-seq, locating the candidate introns and adding up the score of all exons. This is very different from other versions of BLAST, where each exon is scored as a separate hit and read-pairing is ignored.

Magic-BLAST incorporates within the NCBI BLAST code framework ideas developed in the NCBI Magic pipeline, in particular hit extensions by local walk and jump (http://www.ncbi.nlm.nih.gov/pubmed/26109056), and recursive clipping of mismatches near the edges of the reads, which avoids accumulating artefactual mismatches near splice sites and is needed to distinguish short indels from substitutions near the edges.

Address of the bookmark: https://ncbi.github.io/magicblast/