BOL: Related items

Ancient whole genome duplication (WGD) detection tools !

Rahul Nayak — Sun, 07 Mar 2021 00:32:44 -0600

There are two main approaches to detecting ancient WGD: collinearity analysis, and analysis of the Ks distribution. Ks is the average number of synonymous substitutions per synonymous site. Its counterpart, Ka, is the average number of non-synonymous substitutions per non-synonymous site.

Several WGD analysis pipelines have already been published. I searched for the keyword "wgd pipeline" and found the following:

GenoDup: https://github.com/MaoYafei/GenoDup-Pipeline
https://peerj.com/articles/6303/
WGDdetector: https://github.com/yongzhiyang2012/WGDdetector
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2670-3
wgd: https://github.com/arzwa/wgd
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2#Sec1
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
GeNoGAP: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
https://github.com/dfguan/purge_dups
https://www.biorxiv.org/content/10.1101/2020.01.24.917997v1

This article introduces the usage of wgd.

wgd cannot currently be installed directly with bioconda, so installation is a little troublesome because it depends on a lot of other software. wgd depends on the following:

BLAST
MCL
MUSCLE/MAFFT/PRANK
PAML
PhyML/FastTree
i-ADHoRe

The good news is that most of these dependencies can be installed with bioconda:

conda create -n wgd python=3.5 blast mcl muscle mafft prank paml fasttree cmake libpng mpi=1.0=mpich
conda activate wgd

Here mpi=1.0=mpich is pinned because i-ADHoRe depends on MPICH. If OpenMPI is installed instead, the following error appears: error while loading shared libraries: libmpi_cxx.so.40: cannot open shared object file: No such file or directory

After that, installing wgd itself is much simpler:

git clone https://github.com/arzwa/wgd.git
cd wgd
pip install .

Alternatively, install it directly from GitHub without cloning:

pip install git+https://github.com/arzwa/wgd.git

For i-ADHoRe, you need to register at http://bioinformatics.psb.ugent.be/webtools/i-adhore/licensing/ and agree to the license before you can download i-ADHoRe-3.0.

Since my miniconda3 is installed in ~/opt/, the installation prefix is ~/opt/miniconda3/envs/wgd/

tar -zxvf i-adhore-3.0.01.tar.gz
cd i-adhore-3.0.01
mkdir -p build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=~/opt/miniconda3/envs/wgd/
make -j 4
make install
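
Before moving on, it can be worth confirming that the dependencies are actually visible inside the wgd environment. The following is a minimal sketch and not part of wgd itself. The function names are hypothetical, and the executable names in the example call (blastp, makeblastdb, mcl, mafft, codeml, FastTree, i-adhore) are assumptions about how the conda packages and the i-ADHoRe build name their binaries, so adjust them if your installation differs.

```shell
# Report whether a single executable is on PATH.
# Prints "ok <name>" or "MISSING <name>"; returns non-zero when missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "MISSING $1"
        return 1
    fi
}

# Check several tools; the return code is the number of missing ones (0 = all found).
check_tools() {
    local missing=0
    for tool in "$@"; do
        check_tool "$tool" || missing=$((missing + 1))
    done
    return "$missing"
}

# Example call (executable names are assumptions, see above):
# check_tools blastp makeblastdb mcl mafft codeml FastTree i-adhore
```

Run the example call after conda activate wgd. Any line starting with MISSING points at a dependency that still needs to be installed or added to PATH.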

Take the sugarcane genome (Saccharum spontaneum L.) as an example. The genome is 8-ploid with 32 chromosomes (2n = 4 x 8 = 32).

Download the CDS and GFF annotation files for the tutorial:

mkdir -p wgd_tutorial && cd wgd_tutorial
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.cds.fasta.gz
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.gff3.gz
gunzip *.gz
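
A quick sanity check on the CDS file can catch problems before the long BLAST step. The sketch below counts the FASTA records and how many of them have a length that is not a multiple of three. The function name cds_summary is hypothetical, and the multiple-of-three test is a generic check for coding sequences, not something taken from the wgd documentation.

```shell
# Print "<records> <records_not_multiple_of_3>" for a FASTA file of coding sequences.
cds_summary() {
    awk '
        /^>/ { if (seen && len % 3 != 0) bad++; seen = 1; n++; len = 0; next }
             { gsub(/[ \t\r]/, ""); len += length($0) }
        END  { if (seen && len % 3 != 0) bad++; print n + 0, bad + 0 }
    ' "$1"
}

# Run only when the tutorial file is present in the current directory.
if [ -f Sspon.v20190103.cds.fasta ]; then
    cds_summary Sspon.v20190103.cds.fasta
fi
```

A large second number suggests that the file contains partial or mis-annotated CDS entries that are worth inspecting before continuing.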

First run conda activate wgd to start our analysis environment, and then begin the analysis.

Step 1: Use wgd mcl to identify homologous genes in the genome

wgd mcl -n 20 --cds --mcl -s Sspon.v20190103.cds.fasta -o Sspon_cds.out

Step 2: Use wgd ksd to build the Ks distribution

wgd ksd --n_threads 80 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl Sspon.v20190103.cds.fasta
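
To get a first look at the Ks distribution without plotting, the values can be binned on the command line. This is only a rough sketch. It assumes that the ksd output is a tab-separated file with a header row containing a column named Ks, which you should check against your own file. The function name ks_hist, the default bin width of 0.1 and the upper cutoff of 5 are illustrative choices, not wgd defaults.

```shell
# Bin the values of the "Ks" column of a tab-separated file.
# Usage: ks_hist FILE [BIN_WIDTH] [MAX_KS]
# Prints "<bin_start><TAB><count>" for every non-empty bin, sorted by bin start.
ks_hist() {
    LC_ALL=C awk -F'\t' -v w="${2:-0.1}" -v max="${3:-5}" '
        NR == 1 {
            # locate the column called "Ks" in the header row
            for (i = 1; i <= NF; i++) if ($i == "Ks") col = i
            if (!col) { print "no Ks column found" > "/dev/stderr"; exit 1 }
            next
        }
        # skip empty or non-numeric values and values outside (0, max]
        $col ~ /^[0-9.eE+-]+$/ && $col + 0 > 0 && $col + 0 <= max {
            count[int($col / w)]++
        }
        END {
            for (b in count) printf "%.2f\t%d\n", b * w, count[b]
        }
    ' "$1" | LC_ALL=C sort -n
}

# Example, using the output path that appears in Step 3 of this tutorial:
# ks_hist wgd_ksd/Sspon.v20190103.cds.fasta.ks.tsv 0.1 5
```

A WGD event is expected to show up as a peak in these counts. Values that sit exactly on a bin edge may fall into the neighbouring bin because of floating-point rounding, so treat the output as a rough preview rather than a final histogram.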

Step 3: If the genome assembly is of good quality, wgd syn can be used for collinearity analysis. It finds the collinear blocks in the genome and their corresponding anchor points.

wgd syn --feature gene --gene_attribute ID \
-ks wgd_ksd/Sspon.v20190103.cds.fasta.ks.tsv \
Sspon.v20190103.gff3 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl
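
The three steps above can be chained in one small script. The sketch below reuses only the commands and options shown in this tutorial. The variable names, the run helper and the DRY_RUN switch are my own additions for illustration and are not part of wgd.

```shell
# Input and output names from this tutorial; override them via the environment.
CDS=${CDS:-Sspon.v20190103.cds.fasta}
GFF=${GFF:-Sspon.v20190103.gff3}
OUT=${OUT:-Sspon_cds.out}

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Steps 1 to 3 in order; stops at the first step that fails.
wgd_pipeline() {
    local mcl_out="$OUT/$CDS.blast.tsv.mcl"
    run wgd mcl -n 20 --cds --mcl -s "$CDS" -o "$OUT" &&
    run wgd ksd --n_threads 80 "$mcl_out" "$CDS" &&
    run wgd syn --feature gene --gene_attribute ID \
        -ks "wgd_ksd/$CDS.ks.tsv" "$GFF" "$mcl_out"
}

# Example:
# DRY_RUN=1 wgd_pipeline   # print the commands only
# wgd_pipeline             # run them
```

Running DRY_RUN=1 wgd_pipeline first prints the three commands, so the file paths can be checked before committing to the long BLAST run.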

For further reading: wgd has 9 sub-modules

kde: KDE fitting to the Ks distribution
ksd: Ks distribution construction
mcl: All-vs-all BLASTP comparison + MCL clustering
mix: Mixture modeling of the Ks distribution
pre: Preprocess the CDS file
syn: Call i-ADHoRe 3.0 to run collinearity analysis using GFF files
viz: Draw histograms and density plots
wf1: Standard Ks analysis workflow for the whole-genome paranome; calls mcl, ksd and syn
wf2: Standard Ks analysis workflow for one-vs-one homologous genes (orthologs); calls mcl and ksd

Josefa González Lab

Thu, 19 Aug 2021 08:52:56 -0500

The lab focuses on understanding how organisms adapt to their environments. They combine omics approaches with detailed molecular and phenotypic analyses to get a comprehensive picture of adaptation. Their aim is to be internationally recognized as a leading lab in the field of environmental adaptation.
The lab shares its passion for science with the general public by leading outreach projects aimed at increasing science awareness.

More at https://www.biologiaevolutiva.org/gonzalez_lab/

maftools

Surabhi Chaudhary — Fri, 17 Dec 2021 03:18:28 -0600

With advances in Cancer Genomics, Mutation Annotation Format (MAF) is being widely accepted and used to store somatic variants detected. The Cancer Genome Atlas Project has sequenced over 30 different cancers with sample size of each cancer type being over 200. Resulting data consisting of somatic variants are stored in the form of Mutation Annotation Format. This package attempts to summarize, analyze, annotate and visualize MAF files in an efficient manner from either TCGA sources or any in-house studies as long as the data is in MAF format.

https://www.bioconductor.org/packages/devel/bioc/vignettes/maftools/inst/doc/maftools.html

Address of the bookmark: https://github.com/PoisonAlien/maftools

Comparative genomics visualisation tools !

Neel — Thu, 17 Feb 2022 05:37:55 -0600

Comparative genomics visualisation tools !

Address of the bookmark: https://cmdcolin.github.io/awesome-genome-visualization/?latest=true&selected=%23BRIG&tag=Comparative

The complete sequence of a human genome

Neel — Thu, 31 Mar 2022 23:58:18 -0500

The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

Address of the bookmark: https://www.science.org/doi/10.1126/science.abj6987

PhyloHerb: A high‐throughput phylogenomic pipeline for processing genome skimming data

Abhi — Wed, 06 Sep 2023 00:14:28 -0500

Phylogenomic Analysis Pipeline for Herbarium Specimens

What is PhyloHerb: PhyloHerb is a wrapper program to process genome skimming data collected from plant materials. The outcomes include the plastid genome (plastome) assemblies, mitochondrial genome assemblies, nuclear ribosomal DNAs (NTS+ETS+18S+ITS1+5.8S+ITS2+28S), alignments of gene and intergenic regions, and a species tree. It is designed to be a high throughput program dealing with lower quality data. Examples include low-coverage (5x cpDNA) plastome phylogeny, recycling plastid genes from target enrichment data, retrieving low-copy nuclear genes from medium coverage (5x nucDNA) genome skimming.

License: GNU General Public License

Citation:

Cai, Liming, Hongrui Zhang, and Charles C. Davis. 2022. PhyloHerb: A high‐throughput phylogenomic pipeline for processing genome‐skimming data. Applications in Plant Sciences 10(3): 1–9. https://doi.org/10.1002/aps3.11475

Address of the bookmark: https://github.com/lmcai/PhyloHerb/

Regular Expression Cheat Sheet

Jitendra Narayan — Tue, 09 Jul 2013 17:38:42 -0500

Regular expressions are the soul of the Perl language, and for a bioinformatician they are a magic wand for taming gigantic string data. We did not find any good, user-friendly regular expression cheat sheet, so we wrote our own. The Regular Expressions Cheat Sheet is a quick reference guide for regular expressions, including symbols, ranges, grouping, assertions and some sample patterns to get you started.

Claus-Peter Stelzer Lab

Mon, 15 Mar 2021 15:24:41 -0500

Interested in various topics at the intersection of ecology and evolution. In my research I use rotifers as model organisms for experimental studies at the individual and population level. Rotifers are ideally suited for this, because populations of thousands can be kept in small containers in the lab, while single individuals can still be handled conveniently.

More at https://www.uibk.ac.at/limno/personnel/stelzer/index.html.en#research

TEDMED Great Challenges: Genomics and Medicine: Where promise meets clinical practice

Fri, 22 Nov 2013 12:05:32 -0600

November 21, 2013 - NHGRI Director Eric Green, M.D., Ph.D., hosted the TEDMED Google+ Hangout to discuss genomic medicine with an all-star cast that included Carlos Bustamante, James Evans, Amy McGuire and Sharon Terry. More: http://www.tedmed.com/greatchallenges

Post-doctoral Research Assistant in Genetics

Thu, 05 Jun 2014 16:01:39 -0500

Post-doctoral Research Assistant in Genetics
Camden, North London
£31.1K per annum inclusive of London Weighting

This is a fixed term post for 36 months.

We wish to recruit a highly motivated postdoctoral scientist to carry out a BBSRC-funded project in the laboratory of Dr. Denis Larkin. The project is focused on developing and applying new algorithms to study genome and chromosome evolution in birds, mammals and other vertebrate species using whole-genome sequences and existing algorithms. The post holder will use cutting-edge computational and laboratory approaches to generate chromosomal assemblies for sequenced genomes, and to study chromosomal structures and genome differences between birds and other vertebrate species in an attempt to identify species- and clade-specific genome signatures.

Applicants must have a Ph.D. and a track record of success, as indicated by first-author publications in international journals. They must possess excellent organisation skills and be capable of individual initiative and of interacting as part of a team. Applicants with extensive practical experience in bioinformatics or computer science, programming, visualization, handling of large data sets and high-performance computing are encouraged to apply. The post will involve collaboration with a wide range of academic partners within the UK, the EU and worldwide. In addition to leading their own project, the post holder will have opportunities to contribute to multiple international genome initiatives.

Experience in programming, bioinformatics and comparative genome analysis is essential. Applicants should have a minimum of a degree and preferably a higher degree in a relevant subject.

The Royal Veterinary College has the largest range of veterinary, para-veterinary and animal science undergraduate and postgraduate courses of any veterinary school in the world and is one of the largest veterinary schools in Europe.

Prospective applicants are encouraged to contact Dr. Denis Larkin, Comparative Biomedical Sciences Department on +442071211906 or email: dlarkin@rvc.ac.uk

We offer a generous reward package.

For further information and to apply on-line please visit our website: www.rvc.ac.uk
Job reference CBS-0025-14A

Closing date: 4 July 2014
Interviews are likely to be held in July 2014

We promote equality of opportunity and diversity within the workplace and welcome applications from all sections of the community.