BOL: Related items

5,700-year-old human genome!

Jit — Thu, 19 Dec 2019 11:22:18 -0600

A landmark in genomics: scientists have done something that had never been done before.

Scientists have reconstructed the genome of an ancient human who lived nearly 5,700 years ago in southern Denmark from birch pitch, an ancient tar-like substance.

By sequencing the sample, researchers discovered not only ancient human DNA but also microbial DNA reflecting the oral microbiome of the person who chewed the pitch, along with plant and animal DNA that may have come from a recent meal.

The DNA sample is comparable in quality to well-preserved teeth and skull bones. The DNA suggests that the chewer was a female, most likely with dark skin, dark brown hair and blue eyes.

https://www.nature.com/articles/s41467-019-13549-9

Artistic reconstruction. (Tom Björklund)

More at https://gizmodo.com/scientists-reconstruct-lola-after-finding-her-dna-in-1840481633

mutatrix: a population genome simulator which generates simulated genomes.

Jit — Tue, 28 Jan 2020 04:06:58 -0600

Genome simulation across a population with zeta-distributed allele frequency, SNPs, insertions, deletions, and multi-nucleotide polymorphisms.

More at https://github.com/ekg/mutatrix

./mutatrix -S sample -P test/ -p 2 -n 10 reference.fasta
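
As a rough illustration of what "zeta-distributed allele frequency" means, the sketch below draws, for each simulated variant, the number of genome copies carrying the alternate allele from a zeta (Zipf) distribution truncated to the population size, so most variants are rare and only a few are common. The exponent s=2.0, the truncation, and the function names are assumptions made for illustration; this is not mutatrix's actual implementation.

```python
import random

def truncated_zeta_weights(n_copies, s=2.0):
    """Unnormalised zeta weights k**-s for k = 1..n_copies."""
    return [k ** -s for k in range(1, n_copies + 1)]

def sample_allele_counts(n_variants, n_copies, s=2.0, seed=0):
    """Draw an alternate-allele count (1..n_copies) for each variant."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    counts = list(range(1, n_copies + 1))
    weights = truncated_zeta_weights(n_copies, s)
    return rng.choices(counts, weights=weights, k=n_variants)

if __name__ == "__main__":
    # Example: 10 diploid samples give 20 genome copies (assumed values)
    allele_counts = sample_allele_counts(n_variants=1000, n_copies=20)
    singletons = sum(1 for c in allele_counts if c == 1)
    print(f"singletons: {singletons} of {len(allele_counts)}")
```

Dividing each count by the number of genome copies gives the allele frequency of that variant in the simulated population.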

Address of the bookmark: https://github.com/ekg/mutatrix

China’s BGI says it can sequence a genome for just $100

Neel — Sat, 29 Feb 2020 04:49:43 -0600

Using technology originally acquired in the US, the Chinese gene giant BGI Group says it will make genome sequencing cheaper than ever, breaking the $100 barrier for the first time.

The Shenzhen company says the low cost will be possible with an “extreme” DNA sequencing system it plans to offer that is capable of decoding the genomes of 100,000 people a year.

Ref: https://www.technologyreview.com/s/615289/china-bgi-100-dollar-genome/

HASLR: a hybrid assembler which uses both second and third generation sequencing reads

BioStar — Mon, 04 May 2020 02:04:03 -0500

HASLR is a hybrid assembler that uses both second and third generation sequencing reads to efficiently generate accurate genome assemblies. The authors' experiments show that HASLR is not only the fastest assembler but also the one with the lowest number of misassemblies on all the samples compared to the other tested assemblers. Furthermore, the contiguity and accuracy of the generated assemblies are on par with the other tools on most of the samples. Availability: HASLR is an open source tool available at https://github.com/vpc-ccg/haslr.

Address of the bookmark: https://github.com/vpc-ccg/haslr

Pollard Lab

Fri, 25 Sep 2020 20:20:50 -0500

We are a bioinformatics research lab focused on developing novel methods and using them to study genome evolution, organization, and regulation. Our mission is to decode biomedical knowledge that is missed without rigorous statistical approaches.

http://docpollard.org/

Tools

http://docpollard.org/resources/software/

Ancient whole genome duplication (WGD) detection tools!

Rahul Nayak — Sun, 07 Mar 2021 00:32:44 -0600

There are two methods for ancient WGD detection: one is collinearity analysis, and the other is based on the Ks distribution. Ks is defined as the average number of synonymous substitutions per synonymous site. Its counterpart, Ka, is the average number of non-synonymous substitutions per non-synonymous site.
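
To make the definitions concrete, here is a minimal sketch that turns already-counted substitutions and sites into Ks and Ka. It uses the observed proportion of differences with a Jukes-Cantor correction for multiple hits, a textbook simplification swapped in here for illustration; it is not what the pipelines below do internally (PAML, for example, appears among wgd's dependencies). The counts in the example are made up.

```python
import math

def jukes_cantor(p):
    """Correct an observed proportion of differences p for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

def ka_ks(syn_subs, syn_sites, nonsyn_subs, nonsyn_sites):
    """Return (Ka, Ks) from substitution and site counts for one gene pair."""
    ks = jukes_cantor(syn_subs / syn_sites)
    ka = jukes_cantor(nonsyn_subs / nonsyn_sites)
    return ka, ks

if __name__ == "__main__":
    # Hypothetical counts for a single pair of paralogous genes
    ka, ks = ka_ks(syn_subs=30, syn_sites=200, nonsyn_subs=12, nonsyn_sites=700)
    print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ka / ks:.3f}")
```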

Several WGD analysis pipelines have already been published. I searched for the keyword "wgd pipeline" and found the following:

GenoDup: https://github.com/MaoYafei/GenoDup-Pipeline
https://peerj.com/articles/6303/
WGDdetector: https://github.com/yongzhiyang2012/WGDdetector
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2670-3
wgd: https://github.com/arzwa/wgd
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2#Sec1
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
GeNoGAP https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
https://github.com/dfguan/purge_dups
https://www.biorxiv.org/content/10.1101/2020.01.24.917997v1

This article introduces the usage of wgd.

wgd cannot currently be installed directly with bioconda, and installing it is a little troublesome because it depends on a lot of software. wgd depends on the following software:

BLAST
MCL
MUSCLE/MAFFT/PRANK
PAML
PhyML/FastTree
i-ADHoRe

But the good news is that most of the software it depends on can be installed with bioconda:

conda create -n wgd python=3.5 blast mcl muscle mafft prank paml fasttree cmake libpng mpi=1.0=mpich
conda activate wgd

Here mpi=1.0=mpich is selected because i-ADHoRe depends on mpich. If openmpi is installed instead, the following error appears: error while loading shared libraries: libmpi_cxx.so.40: cannot open shared object file: No such file or directory

After that, installing wgd itself is much simpler. Clone the repository and install it:

git clone https://github.com/arzwa/wgd.git
cd wgd
pip install .

Alternatively, install it directly from GitHub without cloning:

pip install git+https://github.com/arzwa/wgd.git

For i-ADHoRe, you need to register at http://bioinformatics.psb.ugent.be/webtools/i-adhore/licensing/ and agree to the license to download i-ADHoRe-3.0.

Since my miniconda3 is installed in ~/opt/, the installation prefix is ~/opt/miniconda3/envs/wgd/

tar -zxvf i-adhore-3.0.01.tar.gz
cd i-adhore-3.0.01
mkdir -p build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=~/opt/miniconda3/envs/wgd/
make -j 4
make install

Take the sugarcane genome Saccharum spontaneum L. as an example. The genome is 8-ploid with 32 chromosomes (2n = 4 x 8 = 32).

Download the CDS and GFF annotation files for the tutorial:

mkdir -p wgd_tutorial && cd wgd_tutorial
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.cds.fasta.gz
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.gff3.gz
gunzip *.gz

First run conda activate wgd to start the analysis environment, then begin the analysis.

Step 1: Use wgd mcl to identify homologous genes in the genome

wgd mcl -n 20 --cds --mcl -s Sspon.v20190103.cds.fasta -o Sspon_cds.out

Step 2: Use wgd ksd to build the Ks distribution

wgd ksd --n_threads 80 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl Sspon.v20190103.cds.fasta
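
A WGD is expected to show up as a peak in the Ks distribution, since the duplicates it created are all roughly the same age. As a sketch of that idea only, the code below bins a list of Ks values and reports the fullest bin. It works on a plain list of floats; reading the values out of the wgd ksd output is left out because the column layout is not shown here, and the bin width and the Ks cut-off are assumed values. The kde and mix sub-modules of wgd are the tools meant for fitting the distribution properly.

```python
from collections import Counter

def ks_histogram(ks_values, bin_width=0.1, max_ks=5.0):
    """Count Ks values per bin; bin i covers [i*bin_width, (i+1)*bin_width)."""
    hist = Counter()
    for ks in ks_values:
        if 0 < ks <= max_ks:  # drop zero and saturated estimates
            hist[int(ks / bin_width)] += 1
    return hist

def modal_bin(hist, bin_width=0.1):
    """Return (bin_start, count) of the fullest bin."""
    index, count = max(hist.items(), key=lambda item: item[1])
    return index * bin_width, count

if __name__ == "__main__":
    # Made-up Ks values for a handful of paralogous pairs
    ks_values = [0.05, 0.52, 0.55, 0.58, 1.23, 7.9]
    start, count = modal_bin(ks_histogram(ks_values))
    print(f"peak near Ks {start:.1f}-{start + 0.1:.1f} ({count} pairs)")
```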

Step 3: If the genome assembly is of good quality, wgd syn can be used for collinearity analysis. It finds the collinear blocks in the genome and the corresponding anchor points.

wgd syn --feature gene --gene_attribute ID \
-ks wgd_ksd/Sspon.v20190103.cds.fasta.ks.tsv \
Sspon.v20190103.gff3 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl
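
Steps 1 to 3 can be chained from a script. The helper below only assembles the three command lines, using the same flags as above, and prints them; it does not run anything. The function name is hypothetical, and the output file naming pattern is an assumption copied from the example paths in this tutorial, so check it against what wgd actually writes on your system.

```python
import shlex

def wgd_commands(cds, gff, out_dir, threads=20):
    """Build the mcl, ksd and syn command lines used in steps 1-3."""
    # Assumed naming pattern, copied from the example paths in the tutorial
    families = f"{out_dir}/{cds}.blast.tsv.mcl"
    ks_table = f"wgd_ksd/{cds}.ks.tsv"
    return [
        ["wgd", "mcl", "-n", str(threads), "--cds", "--mcl", "-s", cds, "-o", out_dir],
        ["wgd", "ksd", "--n_threads", str(threads), families, cds],
        ["wgd", "syn", "--feature", "gene", "--gene_attribute", "ID",
         "-ks", ks_table, gff, families],
    ]

if __name__ == "__main__":
    commands = wgd_commands("Sspon.v20190103.cds.fasta",
                            "Sspon.v20190103.gff3",
                            "Sspon_cds.out")
    for argv in commands:
        # To execute instead of print: subprocess.run(argv, check=True)
        print(shlex.join(argv))
```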

Further reading: wgd has 9 sub-modules:

kde: KDE fitting to the Ks distribution
ksd: Ks distribution construction
mcl: all-vs-all BLASTP comparison plus MCL clustering
mix: mixture modeling of the Ks distribution
pre: preprocess the CDS file
syn: call i-ADHoRe 3.0 to run collinearity analysis using GFF files
viz: draw histograms and density plots
wf1: standard Ks analysis workflow for the whole paranome; calls mcl, ksd and syn
wf2: standard Ks analysis workflow for one-vs-one orthologs; calls mcl and ksd

Calling variants in non-diploid systems

Neel — Sat, 26 Jun 2021 15:37:49 -0500

The main challenge in non-diploid variant calling is distinguishing sequencing noise (abundant on all NGS platforms) from true low-frequency variants. Some of the early attempts to do this well were made on human mitochondrial DNA, although the same approaches work equally well on viral and bacterial genomes (Rebolledo-Jaramillo et al. 2014, Li et al. 2015).
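
One common way to frame the noise-versus-variant problem is to ask how likely the observed number of alternate reads would be if every one of them were a sequencing error. The sketch below computes that binomial tail probability with the standard library. The per-base error rate of 1% and the significance cut-off are assumptions chosen for illustration, and this is not necessarily the method used in the linked tutorial.

```python
from math import comb

def error_tail_probability(alt_reads, depth, error_rate=0.01):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate)."""
    # Sum the probabilities of seeing fewer than alt_reads errors.
    # Fine for modest depths; very deep data needs a log-space version.
    below = sum(
        comb(depth, k) * error_rate ** k * (1 - error_rate) ** (depth - k)
        for k in range(alt_reads)
    )
    return max(0.0, 1.0 - below)

def looks_like_variant(alt_reads, depth, error_rate=0.01, alpha=1e-6):
    """True when errors alone are an unlikely explanation for the alt reads."""
    return error_tail_probability(alt_reads, depth, error_rate) < alpha

if __name__ == "__main__":
    # At depth 1000 and 1% error, about 10 erroneous reads are expected
    for alt in (12, 40):
        print(alt, looks_like_variant(alt, depth=1000))
```

The point of the design is that the decision depends on depth as well as on the allele fraction: the same 4% fraction is far more convincing at depth 1000 than at depth 50.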

Address of the bookmark: https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/non-dip/tutorial.html

MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization

Neel — Tue, 30 Nov 2021 23:23:57 -0600

MitoZ consists of independent modules for de novo assembly, findMitoScaf (find Mitochondrial Scaffolds), annotation and visualization, and can generate a mitogenome assembly together with annotation and visualization results from raw HTS reads.

https://academic.oup.com/nar/article/47/11/e63/5377471

Address of the bookmark: https://github.com/linzhi2013/MitoZ

chromeister: An ultra fast, heuristic approach to detect conserved signals in extremely large pairwise genome comparisons.

Jit — Thu, 03 Feb 2022 04:01:55 -0600

USAGE:

-query: sequence A in fasta format
-db: sequence B in fasta format
-out: output matrix
-kmer: integer k>1 (default 32). Use 32 for chromosomes and genomes and 16 for small bacteria
-diffuse: integer z>0 (default 4). Use 4 for everything; for large plant genomes you can try 1
-dimension: size of the output matrix and plot, integer d>0 (default 1000). Use 1000 for everything that is not full genome size, where 2000 is recommended
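
The parameter constraints above can be captured in a small helper that validates the values and assembles the command line. This is only a sketch: the executable name CHROMEISTER and the example file names are assumptions (check the repository for the actual binary name), while the flags and defaults are the ones listed above.

```python
def chromeister_argv(query, db, out, kmer=32, diffuse=4, dimension=1000,
                     binary="CHROMEISTER"):
    """Validate parameters and return an argv list for a chromeister run."""
    if kmer <= 1:
        raise ValueError("kmer must be > 1")
    if diffuse <= 0:
        raise ValueError("diffuse must be > 0")
    if dimension <= 0:
        raise ValueError("dimension must be > 0")
    return [binary, "-query", query, "-db", db, "-out", out,
            "-kmer", str(kmer), "-diffuse", str(diffuse),
            "-dimension", str(dimension)]

if __name__ == "__main__":
    # Hypothetical inputs: two small bacterial genomes, hence kmer=16
    print(" ".join(chromeister_argv("a.fasta", "b.fasta", "ab.mat", kmer=16)))
```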

Address of the bookmark: https://github.com/estebanpw/chromeister

The complete sequence of a human genome

Neel — Thu, 31 Mar 2022 23:58:18 -0500

The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.

Address of the bookmark: https://www.science.org/doi/10.1126/science.abj6987