BOL: Related items

The Helsinki Summer School on Mathematical Ecology and Evolution

Thu, 10 Mar 2022 01:06:28 -0600

https://wiki.helsinki.fi/display/BioMath/The+Helsinki+Summer+School+on+Mathematical+Ecology+and+Evolution+2022

This is the seventh school of a biennial series of international summer schools on mathematical ecology and evolution in Finland, organised by the Biomathematics Group of the University of Helsinki. The series of The Helsinki Summer School on Mathematical Ecology and Evolution is part of the EMS-ESMTB Schools in Applied Mathematics.

After the two-year break forced by the pandemic, we look forward to continuing this series in August 2022, provided the COVID situation permits.

Postdoctoral Scholar in Bacterial Evolution at Pathogen and Microbiome Institute at Northern Arizona University

Fri, 13 Dec 2024 12:49:16 -0600

We are pleased to announce a Postdoctoral Scholar position to study
bacterial evolution at the Pathogen and Microbiome Institute at
Northern Arizona University with Professor Paul Keim. The scholar
will also have the opportunity to work with Professor Sam Sheppard at
The University of Oxford on joint projects. See our recent paper
on interspecific gene flow in Campylobacter. (DOI:
https://doi.org/10.1128/mbio.00581-24)

The job description: "This research position focuses on the science
of bacterial evolution. It will consist of researching theoretical
principles, but could include translational applications. Phylogenomic
and bioinformatic analysis of bacterial populations in nature or
in laboratory experiments will be a key component of the work. Prior
experience is an asset though training will be possible at PMI.
Likewise, laboratory microbiological, molecular, and biochemical
skills are an asset though not essential. Communication and critical
thinking skills are essential for performing the work and for
communicating to the local and international scientific communities.
Participating in team or independent grant writing to obtain research
funding will be required. Student mentoring is a part of the NAU
mission and is a partial expectation."

https://hr.peoplesoft.nau.edu/psp/ph92prta/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_APP_SCHJOB.GBL?Page=HRS_APP_JBPST&Action=U&FOCUS=Applicant&SiteId=1&JobOpeningId=608024&PostingSeq=1

Northern Arizona University is located in Flagstaff, Arizona, a
beautiful mountain town with a surprisingly vibrant restaurant
scene. Located a little over an hour from the Grand Canyon and ~45
min from Sedona, Flagstaff is a hiker's paradise. In fact, the city
of Flagstaff operates more than 50 miles of unpaved trails and there
are, on average, 266 sunny days per year with which to enjoy them.
At 7000 ft in elevation, Flagstaff experiences all four seasons,
but the summers are mild and, in the winter, you can be on the ski
slopes within 30 min! https://www.flagstaffarizona.org/

As mentioned, joint projects with Professor Sheppard at Oxford
University are possible, including travel to his laboratory in the
United Kingdom. https://www.biology.ox.ac.uk/people/samuel-sheppard

Contact Information:
Paul.Keim@nau.edu

Paul S. Keim, Ph.D.
Regents Professor, &
Cowden Endowed Chair of Microbiology
Northern Arizona University
Flagstaff, AZ 86011-4073

Postdoctoral Fellow in Genomics and Comparative Genomics

Thu, 09 Apr 2026 02:12:32 -0500

Work environment:
The successful candidate will join a dynamic research group working
on the ecology and evolution of host–parasite–environment
interactions in non-model organisms, particularly snail vectors and
their trematode parasites. She/He will conduct genomic analyses aimed at
understanding host–parasite coevolution and the genetic architecture
of resistance in the invasive snail Pseudosuccinea columella to the
zoonotic parasite Fasciola hepatica. This thematic line is embedded
within the regional scientific project InvaSnail financed by the
ExposUM initiative from Montpellier University. The position is based in
Montpellier, a vibrant scientific hub in Southern France internationally
recognized for excellence in ecology and evolutionary biology. The IHPE
laboratory provides a collaborative research environment with access
to high-performance computing facilities, sequencing platforms, and
strong interdisciplinary interactions across research institutions in
the Montpellier area.

Main mission:

Develop and implement strategies for whole-genome sequencing of non-model
species
Generate high-quality de novo genome assemblies using short- and long-read
sequencing technologies
Perform genome annotation and structural/functional characterization
Conduct comparative genomic analyses across related species or populations
Design and implement genome-wide association studies (GWAS) to identify
loci associated with phenotypic or adaptive traits
Integrate genomic, phenotypic, and environmental datasets
Contribute to the development of reproducible bioinformatics pipelines

Activities:

Lead the genomic component of the research project
High-molecular-weight DNA extraction optimization
Long-read genome assembly (PacBio HiFi / ONT)
Genome polishing and quality assessment (BUSCO, QUAST)
Structural and functional annotation
Variant discovery (SNPs, indels, SVs)
Population genomic analyses (FST, demographic inference)
Mixed-model GWAS accounting for structure
Workflow development (Snakemake/Nextflow)
HPC-based pipeline implementation
Publish results in peer-reviewed journals
Present findings at international conferences
Collaborate with experimental and computational team members
Contribute to project development
Mentor graduate students when appropriate

More at https://evol.mcmaster.ca/brian/evoldir/PostDocs//MontpellierU.ComparativeGenomics

Mulan: Multiple-sequence local alignment and visualization for studying function and evolution

Jit — Fri, 24 Aug 2018 09:50:01 -0500

Mulan (http://mulan.dcode.org/) is a novel method and network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the TBA multi-aligner program for rapid identification of local sequence conservation, and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA.

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC540288/

What is the hologenome concept of evolution?

Jit — Wed, 03 Feb 2021 12:23:54 -0600

All multicellular organisms are colonized by microbes, but a gestalt study of the composition of microbiome communities and their influence on the ecology and evolution of their macroscopic hosts has only recently become possible. One approach to thinking about the topic is to view the host–microbiome ecosystem as a “holobiont”.

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6198262/

Useful links for teaching evolution!

Abhi — Wed, 05 Oct 2022 18:29:30 -0500

Mimicry and other resources
Mimicry games:
Great Heliconius game:
http://heliconius.org/evolving_butterflies/
(See also 
https://royalsocietypublishing.org/doi/10.1098/rspb.2020.0014)
Other one, a bit less friendly:
https://ccl.northwestern.edu/netlogo/models/Mimicry
Camouflage practical
https://alexis-catherine.github.io/publication/natural-selection-and-camouflage/
(NetLogo also has one: 
https://ccl.northwestern.edu/netlogo/models/BugHuntCamouflage)
Peppered moth game:
https://askabiologist.asu.edu/peppered-moths-game/play.html

General resources
The always popular Populus:
https://cbs.umn.edu/populus/overview
Drift & Gene Flow 
https://cartwrig.ht/apps/genie/
(Cock van Oosterhout has a great ppt to lead students through this)
See also https://cartwrig.ht/apps/redlynx/
https://demonstrations.wolfram.com/ReplicatorMutatorDynamicsWithThreeStrategies/
NetLogo:
http://ccl.northwestern.edu/netlogo/models/index.cgi
Population Genetics:
https://www.radford.edu/~rsheehy/Gen_flash/popgen/
Evolution in general
https://evolution.berkeley.edu/evolibrary/home.php
Mitochondrial Eve:
https://projects.ncsu.edu/cals/gn/ex/mit-eve.html
Y chromosomes:
https://projects.ncsu.edu/cals/gn/ex/y-chrom.html
A professional online package from Michael Kasumovic:
https://arludo.com/
A compilation of resources:
https://planted.botany.org/index.php?P=Home
Finally, Donald Forsdyke has some great on-line videos explaining
evolutionary principles (occasionally in a fake Scottish accent):
http://post.queensu.ca/~forsdyke/videolectures.htm

Public Databases for Bioinformatics!

Jit — Tue, 23 Mar 2021 05:32:15 -0500

https://www.nature.com/articles/s41467-020-17155-y

Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access atx: dhara:5000
Has btsync server (try it - it's much better than Dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on RAID, and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e., if the hard drives fail or you delete something and notice 
       within 24 hours, we can recover it; otherwise not (vs. home, which is 
       properly backed up)
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP

# fork example in python:
# (for more detailed examples look at
#  https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream(object):
    # wraps a writable file object so that writes from several
    # processes do not interleave
    def __init__(self, writeable_obj):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name

    def write(self, data):
        # hold the lock across write and flush so each record
        # reaches the file in one piece
        with self.lock:
            self.writeable_obj.write(data)
            self.writeable_obj.flush()

    def close(self):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Re-seed so that forked workers do not all inherit the same
    # random state (try without this)
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        print(i, x)
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue, followed by one 'FINISHED' sentinel per worker
todo = multiprocessing.Queue()
for i in range(NSIMS): todo.put(i)
for i in range(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream(open("output.txt", "w"))

pids = []
for i in range(NPROC):
    pid = os.fork()
    if pid == 0:
        # child: process tasks until a sentinel arrives, then exit
        worker(todo, ofp)
        os._exit(0)
    else:
        pids.append(pid)

# wait for every child to finish
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")
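
The same "many independent tasks, one output file" pattern can also be sketched with multiprocessing.Pool, which handles the queue, the sentinels and the waiting. This is only an illustrative alternative, not part of the original example: the names simulate and run, the default parameters and the choice of the "fork" start method are assumptions made here.

```python
import multiprocessing
import random
import time


def simulate(i):
    """Stand-in for an expensive function; returns (task id, result)."""
    rng = random.Random()  # private generator, freshly seeded on every call
    x = rng.random()
    time.sleep(x / 1000)
    return i, x


def run(nsims=100, nproc=4, path="output.txt"):
    """Run nsims tasks on nproc forked workers and write one line per result."""
    ctx = multiprocessing.get_context("fork")  # POSIX only, like os.fork()
    # Only the parent writes, so no lock around the output file is needed.
    with ctx.Pool(nproc) as pool, open(path, "w") as ofp:
        for i, x in pool.imap_unordered(simulate, range(nsims)):
            ofp.write("%i\t%s\n" % (i, x))
    return path


if __name__ == "__main__":
    run()
    print("FINISHED")
```

Because results travel back to the parent before being written, this variant trades the shared lock for some inter-process copying; for very large per-task outputs, having each worker write its own file may be preferable.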

For use case 1 we obtained the following ENCODE and ROADMAP datasets https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz, https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam, https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam. Blacklisted regions were obtained from http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz. The human genome version hg38 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz.

For use case 2 we used the set of narrowPeak files summarized in https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt (archived version v1.0.1). The human genome version hg19 was obtained from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz

For use case 3 we used the ENCODE datasets https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam, https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig, https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam as well as the GENCODE annotation v29 from ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz.

Address of the bookmark: http://mitra.stanford.edu/

Scripts for the analysis of HGT in genome sequence data.

Jit — Wed, 29 Nov 2017 16:44:10 -0600

Scripts for the analysis of HGT in genome sequence data

Address of the bookmark: https://github.com/reubwn/hgt

kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome

Jit — Fri, 08 Dec 2017 16:48:40 -0600

Sept. 20, 2017 Version 3.1 released. Major upgrade. Version 3.1 fixes the problems with SNP annotation that arose when NCBI discontinued use of GI numbers. Please read carefully the Preface (page 3) and the File of annotated genomes section (pages 9-10) in the version 3.1 User Guide. Thanks to Tom Slezak for revising the get_genbank_file3 script and to Tod Stuber (USDA) for testing version 3.1 even though he doesn't need the annotation feature. All users are encouraged to upgrade to version 3.1.

Address of the bookmark: https://sourceforge.net/projects/ksnp/files/

String graph based genome assembly software and tools!

Rahul Nayak — Tue, 19 Dec 2017 17:17:38 -0600

In graph theory, a string graph is an intersection graph of curves in the plane; each curve is called a "string". In genome assembly, the string graph was first proposed by E. W. Myers in a 2005 publication. A recent Genome Research paper describing an innovative approach for assembling large genomes from NGS data caught our attention for several reasons: i) it gives a different, "string graph" perspective on the long-standing genome assembly problem; ii) the paper is coauthored by Jared Simpson, the developer of the ABySS assembler, and Richard Durbin; iii) the Simpson-Durbin algorithm does not rely on de Bruijn graphs, and instead employs a different graph construction approach called a ‘string graph’.

Following are the genome assembly tools based on string graph:

1. SGA (String Graph Assembler) https://github.com/jts/sga

Assembles large genomes from high coverage short read data. SGA is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient. The SGA assembly has three distinct phases. The first phase corrects base calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired end and/or mate pair data to build scaffolds from the contigs. The output of this software is a PDF report that allows the properties of the genome and data quality to be visually explored. By providing more information to the user at the start of an assembly project, this software will help increase awareness of the factors that make a given assembly easy or difficult, assist in the selection of software and parameters and help to troubleshoot an assembly if it runs into problems.
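
To make the FM-index idea more concrete, here is a toy sketch of exact pattern matching by backward search. It is not SGA's implementation: it builds the suffix array naively and stores the full occurrence table uncompressed, whereas real tools use compressed and sampled structures. The function names build_fm_index, backward_search and locate are illustrative choices made here.

```python
def build_fm_index(text):
    """Build a toy FM-index (C table, Occ table, suffix array) for text."""
    text += "$"  # sentinel, assumed absent from text and smaller than every other symbol
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    alphabet = sorted(set(text))
    # C[c]: number of symbols in text that are strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, sa


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix-array interval [lo, hi) of suffixes starting with pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one symbol to the left per step
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(pattern, index):
    """Return the sorted start positions of pattern in the indexed text."""
    C, occ, sa = index
    lo, hi = backward_search(pattern, C, occ, len(sa))
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = build_fm_index("ACGTACGA")
    print(locate("ACG", index))  # [0, 4]
    print(locate("TT", index))   # []
```

Each step of the search costs two table lookups regardless of text length, which is why overlap detection on top of such an index can be both fast and memory efficient once the tables are compressed.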

2. SAGE: String-overlap Assembly of GEnomes https://github.com/lucian-ilie/SAGE2

SAGE is a de novo genome assembler. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on string-overlap graph and maximum likelihood assembly, bringing an important number of new ideas, such as the efficient computation of the transitive reduction of the string overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.
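
As a rough illustration of what "transitive reduction of the string overlap graph" means, the sketch below builds an overlap graph from a few error-free reads and removes edges that are implied by a two-step path. It is a deliberately naive, quadratic toy and not SAGE's algorithm: it assumes distinct reads, exact overlaps, a single strand and no contained reads, and all function names are made up for this example.

```python
from itertools import permutations


def overlap(a, b, min_len):
    """Length of the longest proper suffix of a equal to a prefix of b, or 0 if below min_len."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads, min_len):
    """Directed graph: graph[a][b] = overlap length for every ordered pair that overlaps."""
    graph = {r: {} for r in reads}
    for a, b in permutations(reads, 2):
        k = overlap(a, b, min_len)
        if k:
            graph[a][b] = k
    return graph


def transitive_reduction(graph):
    """Drop every edge u->w that is implied by a two-edge path u->v->w with the same layout."""
    reduced = {u: dict(edges) for u, edges in graph.items()}
    for u, edges in graph.items():
        for v, k_uv in edges.items():
            for w, k_vw in graph[v].items():
                # u->v->w places w at the same offset as u->w exactly when
                # overlap(u, w) == overlap(u, v) + overlap(v, w) - len(v)
                if edges.get(w) == k_uv + k_vw - len(v):
                    reduced[u].pop(w, None)
    return reduced


def spell(path, graph):
    """Concatenate reads along a path, appending only the non-overlapping tail of each read."""
    contig = path[0]
    for u, v in zip(path, path[1:]):
        contig += v[graph[u][v]:]
    return contig


if __name__ == "__main__":
    reads = ["ACGTTG", "GTTGCA", "TGCAAT"]
    graph = overlap_graph(reads, min_len=2)
    reduced = transitive_reduction(graph)
    print(reduced)
    print(spell(reads, reduced))  # ACGTTGCAAT
```

In this example the short overlap between the first and third read is redundant once both overlap the middle read, so the reduced graph is a simple path that spells the original sequence.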

3. FSG: Fast String Graph

The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times.

4. BASE https://github.com/dhlbh/BASE

It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have high probability to appear uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove the branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. Such consensus sequences are then extended to contigs. BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.

5. Fermi https://github.com/lh3/fermi/

Fermi is a de novo assembler with a particular focus on assembling Illumina short sequence reads from a mammal-sized genome. In addition to the role of a typical assembler, fermi also aims to preserve heterozygotes which are often collapsed by other assemblers. Its ultimate goal is to find a minimal set of unitigs to represent all the information in raw reads.

If you want to learn about string graph assemblers, please read the following papers:

i) The Fragment Assembly String Graph - E. W. Myers

This paper describes the String Graph concept.

ii) Efficient construction of an assembly string graph using the FM-index - Jared T. Simpson and Richard Durbin

This earlier paper from Simpson and Durbin

iii) Efficient de novo assembly of large genomes using compressed data structures - Jared T. Simpson and Richard Durbin