BOL: Related items

Online resources on must-read papers in evolutionary biology

BioStar — Fri, 26 Jul 2024 01:39:14 -0500

Online resources on must-read papers in evolutionary biology, for a literature club.

Below is a summary of all answers that we received.

All the best,

Jana and Xiaoyan

1.       *Nick Barton:*

- The textbook "Evolution" by Nick Barton, with resources for
  exploring the literature: Barton, N. H., Briggs, D. E. G., Eisen, J.
  A., Goldstein, D. B., & Patel, N. H. (2007). Evolution. Cold Spring
  Harbor Laboratory Press.

- Papers from a course named "Classics in Evolutionary Biology":

Evolutionary Synthesis
1. Haldane, J. B. S. 1932. The causes of evolution. Longmans. New York.
   (esp. Ch. IV).
2. Fisher, R. A. 1930. The genetical theory of natural selection. Oxford
   University Press, Oxford. Selected Sections - Fundamental Theorem.

Genetic Variation
1a. Lewontin, R. C., and J. L. Hubby. 1966. A molecular approach to
the study of genic heterozygosity in natural populations. II. Amount
of variation and degree of heterozygosity in natural populations of
Drosophila pseudoobscura. Genetics. 54:595-609.

1b. Sachidanandam et al. 2001. A map of human genome sequence variation
containing 1.42 million single nucleotide polymorphisms. Nature 409: 928-933.

2. Wright S., Dobzhansky T., Hovanitz W. 1942 Genetics of natural
populations VII The allelism of lethals in the third chromosome of
Drosophila pseudoobscura. Genetics 27: 363-394.

Recombination and evolution
1. Hill, W. G., and A. Robertson. 1966. The effect of linkage on limits
to artificial selection. Genet. Res. 8:269-294.

2. Maynard Smith and Haigh. 1974. The hitch-hiking effect of a favourable
gene. Genet. Res. 23: 23-35.

Understanding sequence variation
1. Begun D. J., Aquadro C. F., 1992 Levels of naturally occurring DNA
polymorphism correlate with recombination rate in Drosophila melanogaster.
Nature 356: 519-520.

2. Green R. E., Reich D., Pääbo S., 2010 A draft sequence of the
Neandertal genome. Science 328: 710-722.

Quantitative Genetics:  variation in complex traits
1. Galton F., 1877 Typical laws of heredity. Nature 15: 492-495,
512-514, 532-533.

2. Turelli M., 1984 Heritable genetic variation via
mutation-selection balance: Lerch's Zeta meets the abdominal
bristle. Theor. Popul. Biol. 25: 138-193.

Quantitative Genetics:  finding the genes
1. Shrimpton A. E., Robertson A., 1988 The Isolation of polygenic factors
controlling bristle score in Drosophila melanogaster II Distribution of
third chromosome bristle effects within chromosome sections. Genetics
118: 445-459.

2. Boyle E. A., Li Y. I., Pritchard J. K., 2017 An expanded view of
complex traits: from polygenic to omnigenic. Cell 169: 1177-1186.

Neutral Evolution
1. Kimura, M. 1968. Evolutionary rate at the molecular level. Nature.
217:624-626.

2a. Kern A. D., Hahn M. W., 2018 The Neutral Theory in Light of Natural
Selection. Molecular Biology and Evolution 35: 1366-1371.

2b. Jensen J. D., Payseur B. A., Stephan W., Aquadro C. F., Lynch M.,
Charlesworth D., Charlesworth B., 2019 The importance of the Neutral Theory
in 1968 and 50 years on: a response to Kern and Hahn 2018. Evolution 73:
111-114.

2c. Ellegren & Galtier. 2016. Determinants of genetic diversity. Nature
Reviews Genetics.

Mutation and Genetic Variability
1. Luria, S. E., and M. Delbrück. 1943. Mutations of Bacteria from Virus
Sensitivity to Virus Resistance. Genetics. 28(6):491-511.

2. Hill, W G. 1982. "Rates of Change in Quantitative Traits From Fixation
of New Mutations." Proceedings of the National Academy of Sciences (U.S.A.)
79: 142-45.

Testing for selection
1. McDonald & Kreitman. 1991. Adaptive protein evolution at the Adh locus
in Drosophila. Nature.

2. Begun, et al. Mol. Biol. Evol. 16, 1816-1819 (1999).

3. Siddiq et al. 2016. Experimental test and refutation of a classic case
of molecular adaptation in Drosophila melanogaster.  Nature Ecology &
Evolution.

The shifting balance
1. Wright, S. 1932. The roles of mutation, inbreeding, crossbreeding and
selection in evolution. Proceedings of the VI International Congress of
Genetics: 1. pp 356-366.

2. Coyne, J.A., N.H. Barton, and M. Turelli. 1997. A critique of Wright's
shifting balance theory of evolution.  Evolution 51: 643-671.

3. Barton. 2016. Sewall Wright on Evolution in Mendelian Populations and
the "Shifting Balance". Genetics.

Evolution of Sex
1.  Muller, H.J. 1964. The relation of recombination to mutational advance.
Mutation Res. 1(1):2-9

2. McDonald et al. 2016. Sex speeds adaptation by altering the dynamics of
molecular evolution. Nature.

Kin Selection, Cooperation, and Conflict
1. Hamilton, W. D. 1964. The genetical evolution of social behaviour I.
Journal of Theoretical Biology. 7:1-52.

2. Trivers, R. L. 1974 Parent-offspring conflict. American Zoologist.
14(1):249-264.

Sexual Selection
1. Zahavi, A. 1975. Mate selection - a selection of a handicap. J. Theor.
Biol. 53:205-214.

2. Kirkpatrick, M., and Ryan, M.J. 1991. The evolution of mating
preferences and the paradox of the lek. Nature. 350:33-38.

Fitness Landscapes
1. Dean, A. 1995. A Molecular Investigation of Genotype by Environment
Interactions. Genetics. 139:19-33.

2. Costanzo et al. 2010. The Genetic Landscape of a Cell. Science.

Speciation
1. Coyne, J. A., and H. A. Orr. 1989. Patterns of speciation in Drosophila.
Evolution. 43:362-381.

2. Corbett-Detig et al. 2013. Genetic incompatibilities are widespread
within species. Nature.

2.       *Marcos Antezana:*

Van Valen, L. 1975. Energy and Evolution. University of Chicago, Department
of Biology.

3.       *Remco Folkertsma:*

1. The work by Hopi Hoekstra on local adaptation and oldfield mice

2. Poelstra, J. W., Vijay, N., Bossu, C. M., Lantz, H., Ryll, B., Müller,
I., ... & Wolf, J. B. (2014). The genomic landscape underlying phenotypic
integrity in the face of gene flow in crows. Science, 344(6190), 1410-1414.

4.       *Joshka Kaufmann and Leslie Turner*

They pointed us to a list of 'papers every evolutionary biologist should
read', collected by Leslie Turner:
https://static1.squarespace.com/static/53e8cb7ce4b02c4bc3aeeee4/t/5ab8fcb670a6ad55c67fcdf4/1522072758665/EvoBioClassicsRefList.pdf

5.       *Sarah Stockwell*

Matt Ridley collected classic papers in evolutionary biology and reprinted
parts of them in his book Evolution (see Matt Ridley. Evolution
(Oxford University Press, 2nd edition, 2004)).

Awk for bioinformaticians and computational biologists

Poonam Mahapatra — Tue, 06 Feb 2018 14:54:35 -0600

Awk is a programming language for manipulating structured text, mostly used for pattern scanning and processing. It searches one or more files for lines that match the specified patterns and performs the associated actions on them. The basic syntax is:

awk '/pattern1/ {Actions}
/pattern2/ {Actions}' file

Awk works as follows:

- Awk reads the input files one line at a time.
- Each line is tested against every pattern in the order given; wherever a pattern matches, its action is performed.
- If no pattern matches, no action is performed.
- Either the pattern or the action may be omitted, but not both.
- If the pattern is omitted, the action is performed for every line of the input.
- If the action is omitted, the default action applies: every line that matches the pattern is printed.
- Empty braces do nothing; they also suppress the default printing.
- Statements within an action are separated by semicolons (or newlines).
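
These rules can be checked with a few one-liners. The sketch below is only an illustration: it feeds three made-up lines (not the file used in the rest of this post) through awk, and the shell variable names are arbitrary.

```shell
# Three made-up input lines, used only for this illustration
input=$(printf 'contigA ACGT\ncontigB TTTT\ncontigC ACGT\n')

# Pattern only: the default action (print the line) is applied
pattern_only=$(printf '%s\n' "$input" | awk '/ACGT/')

# Action only: the action runs for every line
action_only=$(printf '%s\n' "$input" | awk '{print $1}')

# Pattern with empty braces: nothing is printed, not even the default
empty_braces=$(printf '%s\n' "$input" | awk '/ACGT/ {}')

# Two statements in one action, separated by a semicolon
two_statements=$(printf '%s\n' "$input" | awk '/TTTT/ {n = length($2); print $1, n}')

printf '%s\n' "$pattern_only" "--" "$action_only" "--" "$empty_braces" "--" "$two_statements"
```

The third command prints nothing, which is the difference between omitting the action and giving an empty one.
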
Say you have data/test.tsv with the following contents:

$ cat data/test.tsv
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2 ACTTTATATATT
contig3 ACTTATATATATATA
contig4 ACTTATATATATATA
contig5 ACTTTATATATT
With no pattern, the action runs on every line, so Awk prints the whole file.

$ awk '{print;}' data/test.tsv
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2 ACTTTATATATT
contig3 ACTTATATATATATA
contig4 ACTTATATATATATA
contig5 ACTTTATATATT
To print only the lines that match the pattern contig3:

$ awk '/contig3/' data/test.tsv
contig3 ACTTATATATATATA
Awk has a number of built-in variables. By default it splits each record (i.e. line) on whitespace and stores the fields in the variables $1, $2, and so on. If the line has 5 words, they are stored in $1, $2, $3, $4 and $5. $0 represents the whole line. NF is a built-in variable that holds the number of fields in the current record, so $NF is the last field.

$ awk '{print $1","$2;}' data/test.tsv
contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2,ACTTTATATATT
contig3,ACTTATATATATATA
contig4,ACTTATATATATATA
contig5,ACTTTATATATT

$ awk '{print $1","$NF;}' data/test.tsv
contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2,ACTTTATATATT
contig3,ACTTATATATATATA
contig4,ACTTATATATATATA
contig5,ACTTTATATATT
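
Because the default separator is any run of spaces or tabs, a field that itself contains a space is split in two. For a strictly tab-separated file the separator can be set with -F, and OFS sets what print puts between comma-separated arguments. A small sketch, using a made-up line with a hypothetical description column:

```shell
# Made-up tab-separated line whose second field contains a space
line=$(printf 'contig1\tputative kinase\tACTG\n')

# Default splitting: spaces and tabs both separate fields
default_nf=$(printf '%s\n' "$line" | awk '{print NF}')

# -F '\t' splits on tabs only, so the description stays in one field
tab_nf=$(printf '%s\n' "$line" | awk -F '\t' '{print NF}')
tab_field2=$(printf '%s\n' "$line" | awk -F '\t' '{print $2}')

# OFS is the separator that print inserts between comma-separated arguments
csv=$(printf '%s\n' "$line" | awk -F '\t' -v OFS=',' '{print $1, $NF}')

echo "$default_nf $tab_nf"
echo "$tab_field2"
echo "$csv"
```

With OFS set, print $1, $NF does the same job as the explicit "," concatenation used above.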

Awk has two special patterns, specified by the keywords BEGIN and END. The syntax is as follows:

BEGIN { Actions before reading the file}
{Actions for every line in the file}
END { Actions after reading the file }

For example,
$ awk 'BEGIN{print "Header,Sequence"}{print $1","$2;}END{print "-------"}' data/test.tsv
Header,Sequence
contig1,ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2,ACTTTATATATT
contig3,ACTTATATATATATA
contig4,ACTTATATATATATA
contig5,ACTTTATATATT
-------
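
A common use of END is to print a summary once all lines have been read. The following is a minimal sketch on three made-up sequences; the variable name total is arbitrary, and the BEGIN block is there only to show where initialisation would go (awk variables start at zero anyway).

```shell
summary=$(printf 'contig1 ACTGTC\ncontig2 ACT\ncontig3 ACTTA\n' |
  awk 'BEGIN { total = 0 }                 # runs once, before any input
       { total += length($2) }             # runs for every line
       END { print NR " sequences, " total " bases" }')   # runs once, after the last line
echo "$summary"
```

In the END block NR still holds the number of the last record read, so it doubles as a line count.
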
We can also use the conditional operator in a print statement, in the form print CONDITION ? PRINT_IF_TRUE_TEXT : PRINT_IF_FALSE_TEXT. For example, the code below labels each sequence by whether its length is greater than 14:

$ awk '{print (length($2)>14) ? $0">14" : $0"<=14";}' data/test.tsv
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG>14
contig2 ACTTTATATATT<=14
contig3 ACTTATATATATATA>14
contig4 ACTTATATATATATA>14
contig5 ACTTTATATATT<=14
We can also put 1 after the last block {} to print every line: 1 is a pattern that is always true, and because it has no action, the default action {print $0} applies. Blocks that run before it can change $0. For example, to replace the third line (NR==3) with its first field, we can use:

$ awk 'NR==3{$0=$1}1' data/test.tsv
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2 ACTTTATATATT
contig3
contig4 ACTTATATATATATA
contig5 ACTTTATATATT
You can have as many blocks as you want; they are executed for each line in the order they appear. For example, to print $1 three times (using printf rather than print, because printf does not append an end-of-line character). Note that printf treats its first argument as a format string, so printf "%s\t", $1 is safer when the data may contain % characters.

$ awk '{printf $1"\t"}{printf $1"\t"}{print $1}' data/test.tsv
contig1 contig1 contig1
contig2 contig2 contig2
contig3 contig3 contig3
contig4 contig4 contig4
contig5 contig5 contig5
We can also skip the remaining blocks for a given line with the next keyword:

$ awk '{printf $1"\t"}NR==3{print "";next}{print $1}' data/test.tsv
contig1 contig1
contig2 contig2
contig3
contig4 contig4
contig5 contig5

$ awk 'NR==3{print "";next}{printf $1"\t"}{print $1}' data/test.tsv
contig1 contig1
contig2 contig2

contig4 contig4
contig5 contig5
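
In practice next is often used to drop header or comment lines before the main block sees them. A minimal sketch, assuming (for illustration only) that such lines start with #:

```shell
kept=$(printf '#name seq\ncontig1 ACTG\n#comment\ncontig2 ACTTT\n' |
  awk '/^#/ { next }                  # skip comment lines; later blocks never see them
       { print $1, length($2) }')     # runs only for the remaining lines
echo "$kept"
```
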
You can also use getline to read another file in addition to the one being processed. In the statement below, the while loop reads each line of test.tsv into k until no lines are left; the > 0 test also stops the loop if the file cannot be opened, in which case getline returns -1:

$ awk 'BEGIN{while((getline k <"data/test.tsv")>0) print "BEGIN:"k}{print}' data/test.tsv
BEGIN:contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
BEGIN:contig2 ACTTTATATATT
BEGIN:contig3 ACTTATATATATATA
BEGIN:contig4 ACTTATATATATATA
BEGIN:contig5 ACTTTATATATT
contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG
contig2 ACTTTATATATT
contig3 ACTTATATATATATA
contig4 ACTTATATATATATA
contig5 ACTTTATATATT
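
The return value of getline is worth checking explicitly: it is 1 for each line read, 0 at end of file and -1 on error. The sketch below is an illustration with hypothetical file names (ids.txt and no_such_file, created in a temporary directory); it counts the lines of a second file in BEGIN and shows the status for a file that does not exist.

```shell
tmp=$(mktemp -d)
printf 'contig2\ncontig4\n' > "$tmp/ids.txt"     # hypothetical second file

report=$(printf 'contig1 ACTG\ncontig2 ACTT\n' |
  awk -v ids="$tmp/ids.txt" -v missing="$tmp/no_such_file" '
    BEGIN {
      # Read the second file line by line; stop at end of file (0) or error (-1)
      while ((getline line < ids) > 0) n++
      close(ids)
      status = (getline line < missing)    # this file does not exist
      print "ids=" n, "status_for_missing_file=" status
    }
    { print }')
echo "$report"
rm -r "$tmp"
```

Writing the loop as while (getline line < file) without the > 0 comparison would never terminate for a missing file, because -1 counts as true.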
You can also store data in memory in an associative array with the syntax VARIABLE_NAME[KEY]=VALUE, and later iterate over its keys with for (INDEX in VARIABLE_NAME). The order in which the keys are visited is not specified:

$ awk '{i[$1]=1}END{for (j in i) print j"<="i[j]}' data/test.tsv
contig1<=1
contig2<=1
contig3<=1
contig4<=1
contig5<=1
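
Arrays keyed on a field are a simple way to find duplicates. As a sketch (the array names count and names are arbitrary), the following groups the contigs from the example data by sequence and reports the sequences that occur more than once; the output is piped through sort because the for-in order is unspecified.

```shell
dups=$(printf '%s\n' \
    'contig1 ACTGTCTGTCACTGTGTTGTGATGTTGTGTGTG' \
    'contig2 ACTTTATATATT' \
    'contig3 ACTTATATATATATA' \
    'contig4 ACTTATATATATATA' \
    'contig5 ACTTTATATATT' |
  awk '{ count[$2]++                                          # occurrences per sequence
         names[$2] = names[$2] (names[$2] ? "," : "") $1 }    # comma-joined contig names
       END { for (s in count) if (count[s] > 1) print count[s], names[s] }' |
  sort -k2)
echo "$dups"
```

Here contig2/contig5 and contig3/contig4 are reported as identical pairs.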

bioinformatics workbook

biogeek — Tue, 05 Jan 2021 22:42:32 -0600

This book assumes that the reader has some knowledge of biology and a basic understanding of the Unix command line. For beginners, however, the appendix contains introductory material and tips/tricks for common bioinformatics problems, and the book refers to it throughout for more information.

https://bioinformaticsworkbook.org/

Address of the bookmark: https://bioinformaticsworkbook.org/

Understanding pango networks !

Abhi — Sat, 16 Oct 2021 14:02:36 -0500

In the vast majority of instances it is expected that Pango lineage names and designations will conform to the following rules. These rules also act as guidelines for the decisions made by the Lineage Designation Committee.

https://www.pango.network/the-pango-nomenclature-system/statement-of-nomenclature-rules/

https://www.pango.network/how-does-the-system-work/what-are-pango-lineages/

Reference paper

https://www.nature.com/articles/s41564-020-0770-5

Address of the bookmark: https://www.pango.network/the-pango-nomenclature-system/statement-of-nomenclature-rules/

Bioinformatics Training Material !

BioStar — Sat, 18 Mar 2023 11:26:18 -0500

Glittr is a curated list of bioinformatics training material.
All material is:

In a GitHub or GitLab repository
Free to use
Written in markdown or similar

NOTE: This list of courses is selected only based on the above criteria.
There are no checks on quality.

https://glittr.org/?per_page=25&sort_by=stargazers&sort_direction=desc

Address of the bookmark: https://glittr.org/?per_page=25&sort_by=stargazers&sort_direction=desc

3-day intensive course on Understanding 'omics data in Basel, Switzerland, 19-21 November

Mon, 23 Sep 2013 10:46:57 -0500

Benefits for the participants

- Plan more efficient experiments
- Correctly interpret results
- Communicate results in publications more effectively

The course focuses on methodologies, not on particular software tools. After the course, participants should be able to apply the methods in their respective environments. However, during the course, hands-on sessions will be performed using the Genedata Expressionist® software, which enables participants to quickly apply the discussed methods and visualize results. No previous knowledge of Expressionist® is required; access to the software is free of charge during the course.

More @ http://www.dixa-fp7.eu/dixa-training/dixa-training-agenda/genedata-academy#!

Essentials of Statistics and Data Analysis using R

Mon, 31 Aug 2015 01:32:12 -0500

Clinical Development Services Agency (CDSA) is an extramural unit of Translational Health Science and Technology Institute (THSTI), Department of Biotechnology, Ministry of Science & Technology, Government of India. CDSA has a national mandate of strengthening capacity and capability building in the area of Clinical development and Translational Research.

CDSA is pleased to announce a 4-day hands-on training program on “Essentials of Statistics and Data Analysis using R” at ICGEB, Aruna Asaf Ali Road, New Delhi, on December 1 – 4, 2015. The program aims to develop skills in the basic principles of statistics for summarizing data, the use of appropriate statistical tests, and data analysis using R. Didactic lectures and practical sessions will be delivered by experienced faculty from AIIMS and Novartis. The workshop includes live classroom sessions with PowerPoint presentations, case studies, mock exercises, practical sessions on R, and group work with time for discussion and Q&A.

Please contact gayatrivishwakarma.cdsa@thsti.res.in or vineetabaloni.cdsa@thsti.res.in for program and registration details.

Please nominate a participant or register yourself on or before November 6, 2015, along with the electronic transfer of the registration fee.

Integration Of Speciation Research : Workshop Announcement

Tue, 28 Apr 2026 07:07:57 -0500

We are excited to share that the ESEB-funded special topic network Integration Of Speciation Research (IOS - https://speciation-network.pages.ist.ac.at/) is hosting a second in-person workshop from 7–11 December 2026 at the Scottish Centre for Ecology & the Natural Environment (Glasgow, UK - https://www.gla.ac.uk/research/az/scene/).

This workshop is aimed at bringing together ~40 diverse speciation researchers:

- to collaborate on populating a database of published reproductive barriers based on a standardized RIO framework (https://ecoevorxiv.org/repository/view/10083/). This will involve working through papers during the workshop to extract RI measures and other metadata and entering them into a draft database (some preparatory work before the workshop may be requested to facilitate these steps during the workshop)

- to start working towards a manuscript using this database to answer an outstanding question in speciation

- to network, learn about reproductive isolation, and have fun!

If you are interested in applying to participate in the workshop, please fill out the form in the link below by ** May 20th **. Room & board will be covered by the organizers; all other travel costs are the responsibility of the attendee.

Application link:

https://docs.google.com/forms/d/e/1FAIpQLSenAMqSdZjeRmKEDtBNa5tpjPn7IukPyT5BzfZ4JMpIk2YOEw/viewform

The previous IOS workshop was held in Finland in 2023 (see more details at https://speciation-network.pages.ist.ac.at/workshops) and resulted in new interactions between speciation researchers and the successful publication of an Integration of Speciation research manuscript (https://academic.oup.com/evolinnean/article/3/1/kzae001/7609448).

We look forward to seeing you in Glasgow!

Omega2: metagenome assembly pipeline

Jit — Mon, 10 Jul 2017 05:56:07 -0500

Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph and then merged to contigs and scaffolds using mate-pair information. In comparison with three de Bruijn graph assemblers (SOAPdenovo, IDBA-UD and MetaVelvet), Omega provided comparable overall performance on a HiSeq 100-bp dataset and superior performance on a MiSeq 300-bp dataset. In comparison with Celera on the MiSeq dataset, Omega provided more continuous assemblies overall using a fraction of the computing time of existing overlap-layout-consensus assemblers. This indicates Omega can more efficiently assemble longer Illumina reads, and at deeper coverage, for metagenomic datasets.

Address of the bookmark: http://omega.omicsbio.org/

miniasm: very fast OLC-based de novo assembler for noisy long reads

Jit — Mon, 27 Nov 2017 07:58:49 -0600

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to the raw input reads.

So far miniasm is at an early stage of development. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default setting, miniasm assembles 9 out of 12 PacBio datasets and 3 out of 4 ONT datasets into a single contig. The 12 PacBio data sets are PacBio E. coli sample, ERS473430, ERS544009, ERS554120, ERS605484, ERS617393, ERS646601, ERS659581, ERS670327, ERS685285, ERS743109 and a deprecated PacBio E. coli data set. ONT data are acquired from the Loman Lab.

For a C. elegans PacBio data set (only 40X are used, not the whole dataset), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison, HGAP3 produces a 104Mb assembly with N50 1.61Mb. This dotter plot gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.

Miniasm confirms that, at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that minimap can be used as a read overlapper, even though it is probably not as sensitive as more sophisticated overlappers such as MHAP and DALIGNER. Coupled with long-read error correctors and consensus tools, miniasm may also be useful for producing high-quality assemblies.

Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).

We start with an all against all comparison:

minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz

Then we can assemble:

miniasm -f reads.fq reads.paf.gz > reads.gfa

Convert GFA to FASTA:

awk '/^S/{print ">"$2"\n"$3}' reads.gfa | fold > reads.fa

And then count how many contigs:

grep ">" reads.fa | wc -l

# Download sample PacBio from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap && (cd minimap && make)
git clone https://github.com/lh3/miniasm && (cd miniasm && make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 > reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz > reads.gfa
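
The N50 values quoted above can be computed directly from the S (segment) lines of a GFA file with awk. N50 is the length L such that sequences of length at least L contain at least half of the total assembly. The sketch below is not part of miniasm; it runs on a made-up four-segment file (toy.gfa, a hypothetical name) so that the result is easy to check by hand, and the same pipeline can be pointed at reads.gfa.

```shell
tmp=$(mktemp -d)
# Made-up GFA with segments of length 8, 6, 4 and 2 (not real miniasm output)
printf 'H\tVN:Z:1.0\nS\tutg1\tACGTACGT\nS\tutg2\tACGTAC\nS\tutg3\tACGT\nS\tutg4\tAC\n' > "$tmp/toy.gfa"

stats=$(awk '$1 == "S" { print length($3) }' "$tmp/toy.gfa" |   # one length per segment
  sort -rn |                                                    # longest first
  awk '{ len[NR] = $1; total += $1 }
       END {
         for (i = 1; i <= NR; i++) {
           running += len[i]
           # the first length at which the running sum reaches half the total is the N50
           if (running >= total / 2) { print "contigs=" NR, "total=" total, "N50=" len[i]; exit }
         }
       }')
echo "$stats"
rm -r "$tmp"
```

For the toy file the total is 20, and the running sum first reaches 10 at the second segment, so the N50 is 6.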

Address of the bookmark: https://github.com/lh3/miniasm