BOL: Related items

Arvados

Martin Jones — Sat, 20 Sep 2014 16:54:21 -0500

Arvados is a free and open source bioinformatics platform for genomic and biomedical data. Users can store, organize, compute on and share their data for free.

Address of the bookmark: https://arvados.org/

merqury: Evaluate genome assemblies with k-mers

Jit — Fri, 03 Jul 2020 19:29:34 -0500

Genome assembly projects often have Illumina whole-genome sequencing reads available for the assembled individual. The k-mer spectrum of this read set can be used to evaluate assembly quality independently, without the need for a high-quality reference. Merqury provides a set of tools for this purpose.
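The idea can be sketched in a few lines of Python. This is a simplified illustration, not Merqury's implementation (Merqury works on Meryl k-mer databases). It counts canonical k-mers, treats read k-mers seen at least `min_count` times as reliable ("solid"), and reports completeness (the fraction of solid read k-mers found in the assembly) and a consensus QV from assembly k-mers never seen in the reads, using the formula described in the Merqury paper: E = 1 - (1 - K_asm_only / K_total)^(1/k), QV = -10 log10(E). The function names and the `min_count` threshold are hypothetical.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(seqs, k):
    """Count canonical k-mers across a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return counts


def assembly_metrics(reads, contigs, k, min_count=2):
    """Return (completeness, qv) for an assembly, judged only by read k-mers."""
    read_counts = count_kmers(reads, k)
    asm_counts = count_kmers(contigs, k)
    # Read k-mers seen fewer than min_count times are treated as sequencing errors.
    solid = {km for km, c in read_counts.items() if c >= min_count}
    completeness = len(solid & set(asm_counts)) / len(solid)
    # Assembly k-mer occurrences absent from the reads point to consensus errors.
    k_total = sum(asm_counts.values())
    k_asm_only = sum(c for km, c in asm_counts.items() if km not in read_counts)
    if k_asm_only == 0:
        return completeness, math.inf  # no detectable errors at this read depth
    error_rate = 1 - (1 - k_asm_only / k_total) ** (1 / k)
    return completeness, -10 * math.log10(error_rate)
```

Real data need a much larger k (Merqury's documentation discusses choosing k from genome size) and a `min_count` chosen from the read k-mer histogram rather than fixed.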

More at https://www.biorxiv.org/content/10.1101/2020.03.15.992941v1.full

Address of the bookmark: https://github.com/marbl/merqury

Ancient whole genome duplication (WGD) detection tools !

Rahul Nayak — Sun, 07 Mar 2021 00:32:44 -0600

There are two methods for detecting ancient WGDs: collinearity (synteny) analysis and analysis of the Ks distribution. Ks is defined as the average number of synonymous substitutions per synonymous site; its counterpart, Ka, is the average number of non-synonymous substitutions per non-synonymous site.
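To make the Ks and Ka definitions concrete, here is a simplified sketch in the spirit of the Nei-Gojobori counting method with a Jukes-Cantor correction. It is an illustration only: tools such as wgd rely on PAML's maximum-likelihood estimates instead. It assumes uppercase, aligned, gap-free, in-frame coding sequences in which differing codons differ at only one position, and it counts changes to stop codons as non-synonymous; all function names are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are enumerated in TCAG order, '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Synonymous sites in one codon (0 to 3): the fraction of single-base changes
    at each position that keep the amino acid, summed over the three positions."""
    sites = 0.0
    for pos in range(3):
        synonymous = sum(
            CODE[codon[:pos] + base + codon[pos + 1:]] == CODE[codon]
            for base in BASES
            if base != codon[pos]
        )
        sites += synonymous / 3
    return sites


def jukes_cantor(p):
    """Correct a p-distance for multiple hits (undefined for p >= 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def ks_ka(seq1, seq2):
    """Estimate (Ks, Ka) for two aligned coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    # Average the synonymous-site counts of both sequences; the rest are non-synonymous.
    s_sites = (sum(map(synonymous_sites, codons1)) + sum(map(synonymous_sites, codons2))) / 2
    n_sites = len(seq1) - s_sites
    s_diffs = n_diffs = 0
    for c1, c2 in zip(codons1, codons2):
        mismatches = sum(a != b for a, b in zip(c1, c2))
        if mismatches == 0:
            continue
        if mismatches > 1:
            # Real methods average over mutational pathways; this sketch does not.
            raise ValueError(f"codons {c1}/{c2} differ at more than one position")
        if CODE[c1] == CODE[c2]:
            s_diffs += 1
        else:
            n_diffs += 1
    return jukes_cantor(s_diffs / s_sites), jukes_cantor(n_diffs / n_sites)
```

For example, `synonymous_sites("TTT")` is 1/3 (only TTT to TTC keeps phenylalanine), while the fourfold-degenerate third position of `CTG` makes it 4/3. A Ka/Ks ratio well below 1 indicates purifying selection.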

Several people have already published WGD analysis workflows. Searching for the keyword "wgd pipeline" turned up the following:

GenoDup: https://github.com/MaoYafei/GenoDup-Pipeline
https://peerj.com/articles/6303/
WGDdetector: https://github.com/yongzhiyang2012/WGDdetector
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2670-3
wgd: https://github.com/arzwa/wgd
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2#Sec1
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
GeNoGAP: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
purge_dups: https://github.com/dfguan/purge_dups
https://www.biorxiv.org/content/10.1101/2020.01.24.917997v1

This article introduces the usage of wgd.

At present, wgd cannot be installed directly with bioconda, and installing it is a little troublesome because it depends on a lot of other software:

BLAST
MCL
MUSCLE/MAFFT/PRANK
PAML
PhyML/FastTree
i-ADHoRe

The good news is that most of these dependencies can be installed with bioconda:

conda create -n wgd python=3.5 blast mcl muscle mafft prank paml fasttree cmake libpng mpi=1.0=mpich
conda activate wgd

Here mpi=1.0=mpich is pinned because i-ADHoRe depends on MPICH. If OpenMPI is installed instead, you get the error "error while loading shared libraries: libmpi_cxx.so.40: cannot open shared object file: No such file or directory".

After that, installing wgd itself is much simpler:

git clone https://github.com/arzwa/wgd.git
cd wgd
pip install .
# alternatively, install straight from GitHub without cloning:
pip install git+https://github.com/arzwa/wgd.git
For i-ADHoRe, you need to register at http://bioinformatics.psb.ugent.be/webtools/i-adhore/licensing/ and agree to the license before you can download i-ADHoRe-3.0.

Since my miniconda3 is installed under ~/opt/, the installation prefix (the wgd environment) is ~/opt/miniconda3/envs/wgd/

tar -zxvf i-adhore-3.0.01.tar.gz
cd i-adhore-3.0.01
mkdir -p build && cd build
# use $HOME rather than ~ here: the shell does not expand ~ in the middle of an argument
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/opt/miniconda3/envs/wgd/
make -j 4
make install

Take the sugarcane genome of Saccharum spontaneum L. as an example. The sequenced genome is polyploid, with 32 chromosomes made up of four sets of eight (4x = 32, x = 8).

Download the CDS and GFF annotation files used in this tutorial:

mkdir -p wgd_tutorial && cd wgd_tutorial
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.cds.fasta.gz
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.gff3.gz
gunzip *.gz

First, run conda activate wgd to enter the analysis environment, then start the analysis.

Step 1: Use wgd mcl to identify homologous gene families in the genome

wgd mcl -n 20 --cds --mcl -s Sspon.v20190103.cds.fasta -o Sspon_cds.out
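With --mcl, wgd clusters the all-vs-all BLASTP hits into gene families using the Markov Cluster (MCL) algorithm, via the external mcl program. The toy pure-Python version below only illustrates MCL's two alternating steps, expansion (squaring the column-stochastic matrix) and inflation (raising entries to a power, then renormalizing columns), on a small adjacency matrix assumed to hold similarity weights between genes. It is not how wgd or mcl implement it and would be far too slow for a real genome; all names are hypothetical.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster the nodes of a weighted, symmetric graph with the Markov Cluster algorithm."""
    n = len(adjacency)
    # Self-loops are conventionally added so that flow can stay on a node.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along longer random walks
        # inflation: strong flow is boosted, weak flow fades away
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        change = max(
            abs(x - y) for new, old in zip(inflated, m) for x, y in zip(new, old)
        )
        m = inflated
        if change < tol:
            break
    # Each non-zero row of the converged matrix (an "attractor") lists one cluster.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Higher inflation values give smaller, tighter clusters, which is why the inflation parameter matters when defining gene families.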

Step 2: Use wgd ksd to build the Ks distribution

wgd ksd --n_threads 80 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl Sspon.v20190103.cds.fasta

Step 3: If the genome assembly is of good quality, wgd syn can be used for collinearity analysis. It finds the collinear blocks in the genome and the corresponding anchor points.

wgd syn --feature gene --gene_attribute ID \
-ks wgd_ksd/Sspon.v20190103.cds.fasta.ks.tsv \
Sspon.v20190103.gff3 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl
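i-ADHoRe, which wgd syn calls, uses a considerably more elaborate detection procedure than anything shown here. The toy sketch below only illustrates what a collinear block is: a run of homologous gene pairs ("anchors") whose positions increase together on two chromosomes, with small gaps allowed. The anchor representation (gene ranks along each chromosome), the greedy chaining, and the `max_gap` and `min_anchors` values are all hypothetical, and only same-orientation blocks are found.

```python
def collinear_blocks(anchors, max_gap=3, min_anchors=3):
    """Greedily chain (pos_a, pos_b) anchor pairs into same-orientation collinear blocks.

    pos_a and pos_b are gene ranks (1st, 2nd, ... gene) on two chromosomes.
    """
    blocks = []
    for a, b in sorted(anchors):
        for block in blocks:
            last_a, last_b = block[-1]
            # Extend a block only if both positions move forward by a small step.
            if 0 < a - last_a <= max_gap and 0 < b - last_b <= max_gap:
                block.append((a, b))
                break
        else:
            blocks.append([(a, b)])  # no block could be extended: start a new one
    return [block for block in blocks if len(block) >= min_anchors]
```

For example, the anchors (1, 10), (2, 11), (4, 12), (5, 14) form one block, while isolated pairs such as (20, 3) are discarded as noise.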

For further reading: wgd has 9 sub-modules

kde: KDE fitting of the Ks distribution
ksd: Ks distribution construction
mcl: all-vs-all BLASTP comparison + MCL clustering
mix: mixture modeling of the Ks distribution
pre: preprocessing of the CDS file
syn: collinearity analysis with i-ADHoRe 3.0, using the GFF file
viz: histograms and density plots
wf1: standard Ks analysis workflow for the whole-genome paranome; calls mcl, ksd and syn
wf2: standard Ks analysis workflow for one-vs-one orthologs; calls mcl and ksd
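A peak in the paralog Ks distribution is the classic signature of a WGD, which is what the kde and mix sub-modules help locate. The sketch below is a hand-rolled Gaussian kernel density estimate with a fixed, hypothetical bandwidth, evaluated on a grid from 0 to an assumed upper limit of Ks = 5 (high Ks values saturate and are commonly excluded). It only shows how such a peak can be found; wgd uses its own fitting routines and options.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def ks_peak(ks_values, bandwidth=0.1, ks_max=5.0, step=0.01):
    """Return the Ks value on a 0..ks_max grid where the estimated density is highest."""
    density = gaussian_kde(ks_values, bandwidth)
    grid = [i * step for i in range(int(round(ks_max / step)) + 1)]
    return max(grid, key=density)
```

On made-up values such as [0.9, 0.95, 1.0, 1.05, 1.1, 3.0], the peak lands at Ks of about 1.0. The bandwidth strongly affects how many peaks appear, which is one reason mixture models are also used.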

Understanding kmer !

BioStar — Wed, 18 Aug 2021 04:27:51 -0500

What is a k-mer anyway? A k-mer is just a sequence of k characters in a string (or k nucleotides in a DNA sequence). To get all k-mers from a sequence, take the first k characters, then move the start forward by a single character to get the next k-mer, and so on. Consecutive k-mers therefore overlap in k-1 positions, and a sequence of length L contains L - k + 1 k-mers.
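The sliding-window procedure described above takes only a few lines of Python. This minimal sketch (function names are hypothetical) extracts k-mers, counts them, and builds the k-mer spectrum, i.e. how many distinct k-mers occur once, twice, and so on. It does not merge a k-mer with its reverse complement, which many real k-mer counters can do.

```python
from collections import Counter


def kmers(seq, k):
    """All k-mers of seq, taken with a window that slides one character at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(seq, k):
    """How many times each distinct k-mer occurs."""
    return Counter(kmers(seq, k))


def kmer_histogram(seq, k):
    """k-mer spectrum: for each multiplicity n, the number of distinct k-mers seen n times."""
    return Counter(kmer_counts(seq, k).values())
```

For example, `kmers("ATGCATG", 3)` gives `['ATG', 'TGC', 'GCA', 'CAT', 'ATG']`: five k-mers (7 - 3 + 1), each overlapping the next in two positions.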

Address of the bookmark: https://bioinfologics.github.io/post/2018/09/17/k-mer-counting-part-i-introduction/

MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization

Neel — Tue, 30 Nov 2021 23:23:57 -0600

MitoZ consists of independent modules for de novo assembly, findMitoScaf (find Mitochondrial Scaffolds), annotation and visualization, and can generate a mitogenome assembly together with annotation and visualization results from raw high-throughput sequencing (HTS) reads.

https://academic.oup.com/nar/article/47/11/e63/5377471

Address of the bookmark: https://github.com/linzhi2013/MitoZ

JRF/SRF at Jawaharlal Nehru Institute of Advanced Studies (JNIAS), Hyderabad

Fri, 31 Oct 2014 08:48:23 -0500

Applications for Academic Projects in Biotechnology, Bioinformatics, Environmental Sciences and Computer Science & Engineering

About JNIAS
Jawaharlal Nehru Institute of Advanced Studies (JNIAS), Hyderabad has been established by Dr. D. Swaminadhan Research Foundation (DSRF), Hyderabad as a research and educational institution with a view to contributing to the development of advanced technologies and building "core competence" in specific areas. The activities of JNIAS involve education, research, training and innovation in the fields of Sciences, Technologies, Humanities and Social Sciences. It aims to blossom into an advanced institute of education and research with a reservoir of expertise and experience in the relevant fields and the necessary capability to harness multi-disciplinary research and studies. JNIAS has been recognized as an Advanced Research Institute by Jawaharlal Nehru Technological University Hyderabad (JNTUH), Hyderabad and Jawaharlal Nehru Technological University Anantapur (JNTUA) for offering Ph.D., P.G M.Phil, P.G Diploma and Training Programmes in Sciences and Engineering & Technology.

Jawaharlal Nehru Architecture and Fine Arts University (JNAFAU), Hyderabad has also recognized JNIAS for offering UG and PG degrees in Architecture.

Projects & Facilities

JNIAS offers a wide range of projects:

Biotechnology area:

Molecular Biology
Microbiology
Nanotechnology
Bioinformatics (Schrodinger Software)
In Silico studies & Drug Designing
Sequence analysis
Protein structure function studies

Registration
Tuition Fees: Interested students need to pay the following tuition fees:
1. Six-Month Project: Rs. 20,000/-
2. Four-Month Project: Rs. 15,000/-
3. Three-Month Project: Rs. 10,000/-
4. One-Month Hands-on Training: Rs. 8,000/-

For enquiries, call:
91-7893203414 (Biotechnology), 91-9949582263 (Environmental Sciences), 91-8977369305 (Computer Science)

Interested students may download the application form from the website (www.jnias.in) and send a hard copy of the completed form and Curriculum Vitae, along with a Demand Draft drawn on any nationalized bank in favor of "The Registrar, JNIAS, Secunderabad". Application forms can also be sent by email to academicprojects@jnias.in

Address
Jawaharlal Nehru Institute of Advanced Studies (JNIAS)
6th Floor, Buddha Bhavan, M.G Road,
Secunderabad - 500 003
Andhra Pradesh, India
Tele/Fax: 040- 27541551; 27541553
Mobile: 08885541554
Web site: www.jnias.in

Brochure : https://drive.google.com/file/d/0B3zPwhgA-u-nU0dyMFd2OWcxNUpSTWNYc0xDSGs5UDI4UDNB/view?usp=sharing

maftools

Surabhi Chaudhary — Fri, 17 Dec 2021 03:18:28 -0600

With advances in cancer genomics, the Mutation Annotation Format (MAF) has become widely accepted and used to store detected somatic variants. The Cancer Genome Atlas (TCGA) project has sequenced over 30 different cancers, with a sample size of over 200 for each cancer type, and the resulting somatic variant data are stored in MAF. This package aims to summarize, analyze, annotate and visualize MAF files efficiently, whether they come from TCGA or from in-house studies, as long as the data are in MAF format.
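maftools itself is an R/Bioconductor package; the Python sketch below is not its API. It only illustrates the kind of summary it produces: reading a tab-separated MAF file (skipping `#` header lines such as `#version`) and counting, per gene, how many distinct tumor samples carry a non-synonymous variant. It uses the standard MAF columns Hugo_Symbol, Variant_Classification and Tumor_Sample_Barcode; the exact set of classes treated as non-synonymous is an assumption.

```python
import csv
from collections import defaultdict

# Assumed set of Variant_Classification values counted as non-synonymous.
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}


def mutated_samples_per_gene(maf_lines):
    """Return {gene: number of distinct samples with a non-synonymous variant}."""
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")), delimiter="\t"
    )
    samples = defaultdict(set)
    for row in rows:
        if row["Variant_Classification"] in NONSYNONYMOUS:
            samples[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return {gene: len(barcodes) for gene, barcodes in samples.items()}
```

Usage: `with open("cohort.maf") as fh: counts = mutated_samples_per_gene(fh)`, where `cohort.maf` is a placeholder file name. Sorting genes by these counts gives the kind of ranking shown at the top of an oncoplot.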

https://www.bioconductor.org/packages/devel/bioc/vignettes/maftools/inst/doc/maftools.html

Address of the bookmark: https://github.com/PoisonAlien/maftools

Google Genomics

Reshma Khatun — Fri, 17 Oct 2014 02:14:14 -0500

Google Genomics provides an API to store, process, explore, and share DNA sequence reads, reference-based alignments, and variant calls, using Google's cloud infrastructure.

Store alignments and variant calls for one genome or a million.
Process genomic data in batch, for example running principal component analysis or Hardy-Weinberg equilibrium tests, in minutes or hours using parallel computing frameworks like MapReduce.
Explore data by slicing alignments and variants by genomic range across one or multiple samples, either for your own algorithms or for visualization; or interactively process entire cohorts with BigQuery to find transition/transversion ratios, allele frequencies, genome-wide associations and more.
Share genomic data with your research group, collaborators, the broader community, or the public. You decide.
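The transition/transversion (Ti/Tv) ratio mentioned above is simple to compute once variants are in hand: transitions swap a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T), and every other single-base change is a transversion. This plain-Python sketch works on an assumed list of (ref, alt) allele pairs; it is not a Google Genomics or BigQuery query. For human whole-genome SNVs the ratio is typically around 2.0 to 2.1, so a much lower value can hint at false-positive calls.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if both alleles are purines or both are pyrimidines."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(variants):
    """Ti/Tv over single-nucleotide (ref, alt) pairs; other variants are skipped."""
    ti = tv = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # skip indels, multi-base substitutions and non-variants
        if not {ref, alt} <= PURINES | PYRIMIDINES:
            continue  # skip N or other non-ACGT alleles
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```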

Google Genomics is implementing the API defined by the Global Alliance for Genomics and Health for visualization, analysis and more. Compliant software can access Google Genomics, local servers, or any other implementation.

Address of the bookmark: https://cloud.google.com/genomics/

Short-read assembly using Spades !

Abhimanyu Singh — Mon, 31 Jan 2022 07:18:16 -0600

If we only had Illumina reads, we could also assemble them with SPAdes.

You can try this here, or try it later on your own data.

Get data

We will use the same Illumina data as we used above:

illumina_R1.fastq.gz: the Illumina forward reads
illumina_R2.fastq.gz: the Illumina reverse reads

Assemble

Run Spades:

spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o spades_assembly_all_illumina

-1 is input file of forward reads
-2 is input file of reverse reads
--careful minimizes mismatches and short indels
--cov-cutoff auto computes the coverage threshold (rather than the default setting, “off”)
-o is the output directory

Results

Move into the output directory and look at the contigs:

cd spades_assembly_all_illumina
infoseq contigs.fasta
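infoseq (from EMBOSS) lists the length of each contig. A common next step is to summarize the assembly with N50: the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The minimal Python sketch below (function names are hypothetical) reads sequence lengths from FASTA lines and computes N50.

```python
def fasta_lengths(lines):
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of all bases."""
    lengths = sorted(lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0
```

Usage, from inside the output directory: `with open("contigs.fasta") as fh: print(n50(fasta_lengths(fh)))`. For contig lengths 10, 20, 30 and 40, N50 is 30, because 40 + 30 already covers half of the 100 bases.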

Research Scientist – National Institute of Cholera and Enteric Diseases

Wed, 22 Oct 2014 10:26:46 -0500

The following post is to be filled on a purely temporary basis under the project entitled "Second phase of Task Force Biomedical Informatics Center of ICMR" under Dr. Santasabuj Das, Scientist 'D' of this Institute:

01. Scientist II (number of posts: 01)
Essential: Ph.D. degree in Life Sciences from a recognized university along with a minimum of 2 years of research experience in Bioinformatics, as evidenced by publications in peer-reviewed journals.

OR
Ph.D. degree in Bioinformatics from a recognized university.

OR
M.Sc. in Bioinformatics from a recognized university along with a minimum of 3 years of research experience in Bioinformatics, as evidenced by publications in peer-reviewed journals.

Desirable:
Thorough knowledge of in silico genome analysis and comparative genomics.
Experience with in silico identification of novel virulence factors of pathogens, host-pathogen interactions and Systems Biology.
Additional postdoctoral research experience in relevant subjects from a recognized institution.

Salary: Rs. 44,000/- p.m. (consolidated) plus 30% HRA

Age limit: below 40 years

Applications, along with a bio-data containing detailed work experience and a full list of publications, may be sent by email to santasabujdas@yahoo.com no later than October 27, 2014.

Short-listed candidates will be called via email for an interview to be held at the institute in the second week of November, 2014.

Advertisement: www.niced.org.in/placements.htm