BOL: Related items

Mike Ritchie Lab

Wed, 02 Oct 2013 15:25:45 -0500

The Mike Ritchie Lab's primary research focus is the detection of susceptibility genes for common diseases such as cancer, diabetes, hypertension, and cardiovascular disease. The lab's approach involves developing and applying new statistical methods, with a focus on detecting gene-gene interactions associated with human disease.

Comparing gene expression and protein expression patterns between normal and non-normal tissues is a growing area of research that may lead to the identification of candidate genes for understanding the etiology of common, complex diseases.

Lab homepage @ http://ritchielab.psu.edu/ritchielab/

Mycology Research Resources for Bioinformaticians: Unlocking the Fungal Kingdom

Neel — Fri, 13 Dec 2024 11:21:45 -0600

Mycology, the study of fungi, is a field that bridges ecology, medicine, and biotechnology. With advancements in bioinformatics, researchers now have unprecedented opportunities to explore the fungal kingdom at molecular, genetic, and ecological levels. From understanding pathogenic fungi to harnessing fungal enzymes for industrial applications, the potential is vast.

To fully leverage these opportunities, bioinformaticians require specialized tools and databases. This blog highlights essential resources for mycology research, focusing on databases, tools, and platforms tailored for fungal biology.

1. Fungal Databases

1.1. MycoCosm

Website: MycoCosm
Developed by the DOE Joint Genome Institute, MycoCosm is a comprehensive portal for fungal genomics. It offers genomic and transcriptomic data for a wide range of fungi, including saprobes, pathogens, and symbionts.

Key Features: Genome browsers, comparative genomics tools, and functional annotations.
Best For: Large-scale studies on fungal evolution and ecology.

1.2. FungiDB

Website: FungiDB
FungiDB is an integrated genomic resource for fungal pathogens and non-pathogens. It provides access to genome sequences, transcriptomic data, and functional annotations.

Key Features: Advanced search options, BLAST, and pathway analysis tools.
Best For: Studying fungal pathogenesis and host-pathogen interactions.

1.3. Index Fungorum

Website: Index Fungorum
This nomenclatural database provides information on the scientific names of fungi. It’s an essential resource for taxonomists and researchers focused on fungal biodiversity.

Key Features: Taxonomic hierarchy and synonymy tracking.
Best For: Identifying and classifying fungal species.

1.4. UNITE

Website: UNITE
UNITE is a specialized database for fungal ITS (Internal Transcribed Spacer) sequences, often used in fungal identification and phylogenetics.

Key Features: Curated reference datasets and community annotations.
Best For: Environmental mycology and microbial ecology studies.

2. Analytical Tools

2.1. Funannotate

Repository: GitHub - Funannotate
Funannotate is a genome annotation tool designed for fungi. It supports tasks like gene prediction, functional annotation, and orthology analysis.

Best For: Annotating newly sequenced fungal genomes.

2.2. BUSCO (Benchmarking Universal Single-Copy Orthologs)

Website: BUSCO
BUSCO evaluates genome assembly and annotation completeness using universal single-copy orthologs. It includes a fungal-specific dataset.

Best For: Assessing the quality of fungal genome assemblies.

2.3. Pathogen-Host Interactions Database (PHI-base)

Website: PHI-base
PHI-base is a manually curated resource containing information on pathogen-host interactions, including fungal pathogens.

Best For: Exploring virulence factors and host-pathogen relationships.

3. Visualization Platforms

3.1. Cytoscape

Website: Cytoscape
A powerful tool for visualizing molecular interaction networks, Cytoscape can be used to study protein-protein interactions, gene networks, and metabolic pathways in fungi.

Best For: Network biology and functional genomics.

3.2. iTOL (Interactive Tree of Life)

Website: iTOL
iTOL is an interactive tool for visualizing phylogenetic trees.

Best For: Displaying fungal phylogenies and comparing evolutionary relationships.

4. Community Resources

4.1. Mycological Society of America (MSA)

Website: MSA
The MSA promotes fungal research and provides access to resources, conferences, and publications.

Best For: Networking with fungal researchers and accessing recent studies.

4.2. OpenFungi

Website: OpenFungi
OpenFungi is an open-source initiative providing fungal genomic and transcriptomic datasets for research and education.

Best For: Sharing and accessing public fungal datasets.

5. Genomics Workflows

5.1. Galaxy

Website: Galaxy Project
Galaxy offers a web-based platform for reproducible bioinformatics workflows, including tools for fungal genome and transcriptome analysis.

Best For: User-friendly analysis pipelines without requiring coding skills.

5.2. Snakemake

Repository: Snakemake
Snakemake is a flexible pipeline management tool that supports fungal data processing and analysis.

Best For: Custom workflows for large-scale fungal datasets.

Conclusion

Fungal research is a rapidly growing field with vast implications for medicine, agriculture, and industry. For bioinformaticians, the availability of specialized resources—databases, tools, and community platforms—opens doors to innovative discoveries. Whether you are investigating fungal genomics, studying host-pathogen interactions, or exploring fungal biodiversity, the resources outlined above will empower your research journey.

Dive into these resources and help unravel the mysteries of the fungal kingdom!

Linux for bioinformatician !!!

Rahul Nayak — Thu, 13 Mar 2014 16:59:26 -0500

Linux, a free operating system, provides several powerful admin tools and utilities that help you manage your systems effectively and handle huge amounts of genomic and biological data with ease. The field of bioinformatics relies heavily on Linux-based computers and software, and most bioinformatics programs can be compiled to run on Linux. If you don't know what these not-so-user-friendly tools are and how to use them, you could spend a lot of time on even basic admin tasks. The focus of this Linux series is to help you understand system administration as well as the basic tools, so that you can become an effective bioinformatician and computational biologist.

To learn more about Linux and its importance to bioinformaticians, please read the article "An introduction to Linux for bioinformatics" by Paul Stothard.

Linux cheat sheet at http://bioinformaticsonline.com/file/view/87/linux-cheat-sheet

Please browse the right-hand side for further useful Linux pages.

Frequent parameters for bioinformatics tools !

BioStar — Tue, 27 Oct 2020 19:42:32 -0500

Third party executable parameters and options.

Trimmomatic

"ILLUMINACLIP:...:2:30:10"

"LEADING:15"

"TRAILING:15"

"SLIDINGWINDOW:4:20"

"MINLEN:20"

"TOPHRED33"
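
As a rough illustration of how the Trimmomatic steps above fit together on one command line, here is a minimal sketch. It assumes a single-end run and a `trimmomatic` wrapper on the PATH (as installed by bioconda). The file names are hypothetical, and the adapter file stands in for the path that is elided as "..." in the list above. The script only prints the command so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own paths.
adapters="adapters.fa"              # adapter FASTA for ILLUMINACLIP (elided in the list above)
reads_in="reads.fastq.gz"
reads_out="reads.trimmed.fastq.gz"

# Trimming steps in the order listed above; Trimmomatic applies them left to right.
trim_steps="ILLUMINACLIP:${adapters}:2:30:10 LEADING:15 TRAILING:15 SLIDINGWINDOW:4:20 MINLEN:20 TOPHRED33"

# Assumption: single-end mode and a 'trimmomatic' wrapper script on PATH.
trim_cmd="trimmomatic SE ${reads_in} ${reads_out} ${trim_steps}"

# Print the command for inspection instead of running it.
echo "${trim_cmd}"
```

Because the steps run in the order given, quality trimming (LEADING, TRAILING, SLIDINGWINDOW) happens before the MINLEN length filter.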

Filtlong

--min_length 500

--min_mean_q 85

--min_window_q 65

FastQ Screen

--aligner bowtie2 (bwa for PacBio)

--subset 1000 (for PacBio)

SPAdes

--careful

--disable-gzip-output

--cov-cutoff auto

--phred-offset 33

HGAP

Pbalign.task_options.min_accuracy: 70

Pbalign.task_options.no_split_subreads: false

Genomic_consensus.task_options.min_confidence: 40

falcon_ns.task_options.HGAP_GenomeLength_str: 6000000

Pbcoretools.task_options.read_length: 0

Genomic_consensus.task_options.use_score: 0

Pbalign.task_options.min_length: 50

Pbalign.task_options.algorithm_options: --minMatch 12 --bestn 10 --minPctSimilarity 70.0

Pbalign.task_options.hit_policy: randombest

Pbcoretools.task_options.other_filters: rq >= 0.7

Pbalign.task_options.concordant: false

Genomic_consensus.task_options.min_coverage: 5

falcon_ns.task_options.HGAP_SeedCoverage_str: 30

falcon_ns.task_options.HGAP_AggressiveAsm_bool: false

Genomic_consensus.task_options.algorithm: best

falcon_ns.task_options.HGAP_SeedLengthCutoff_str: -1

Genomic_consensus.task_options.diploid: false

MeDuSa

-random 100

Prokka

--usegenus

--force

--addgenes

--rfam

--rawproduct

cmsearch (taxonomy, 16S)

--rfam

--noali

blastn (taxonomy, 16S)

-evalue 1E-10

blastn (MLST)

-ungapped

-dust no

-evalue 1E-20

-word_size 32

-culling_limit 2

-perc_identity 95

blastp (VF)

-culling_limit 2

RGI (ABR)

--input_type contig

bowtie2 (mapping)

--sensitive

minimap2 (mapping)

-a

-x map-ont

samtools mpileup (SNP detection)

-uRI

bcftools call (SNP detection)

--variants-only

--skip-variants indels

--output-type v

--ploidy 1

-c

SnpSift filter (SNP detection)

"( QUAL >= 30 ) & (( na FILTER ) | (FILTER = 'PASS')) & ( DP >= 20 ) & ( MQ >= 20 )"
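
To make the filter expression above concrete, the following awk sketch applies the same four conditions to tab-separated VCF records. It is not SnpSift itself, only an illustration of the logic. It assumes DP and MQ appear as `key=value` entries in the INFO column, and the example records are made up.

```shell
#!/bin/sh
# Keep VCF records that satisfy the same conditions as the SnpSift expression:
# QUAL >= 30, FILTER missing (".") or PASS, INFO DP >= 20, INFO MQ >= 20.
filter_snps() {
  awk -F '\t' '
    /^#/ { print; next }                 # pass header lines through unchanged
    {
      dp = -1; mq = -1                   # -1 marks a value that is absent from INFO
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^DP=/) dp = substr(info[i], 4) + 0
        if (info[i] ~ /^MQ=/) mq = substr(info[i], 4) + 0
      }
      qual_ok   = ($6 != "." && $6 + 0 >= 30)
      filter_ok = ($7 == "." || $7 == "PASS")
      if (qual_ok && filter_ok && dp >= 20 && mq >= 20) print
    }'
}

# Made-up example records (CHROM POS ID REF ALT QUAL FILTER INFO).
example_records() {
  printf 'chr1\t100\t.\tA\tG\t45\tPASS\tDP=25;MQ=60\n'
  printf 'chr1\t200\t.\tC\tT\t12\tPASS\tDP=40;MQ=60\n'
  printf 'chr1\t300\t.\tG\tA\t50\t.\tDP=10;MQ=60\n'
  printf 'chr1\t400\t.\tT\tC\t99\tLowQual\tDP=30;MQ=60\n'
  printf 'chr1\t500\t.\tT\tC\t99\t.\tDP=30;MQ=45\n'
}

example_records | filter_snps
```

With these records, only positions 100 and 500 pass. The others fail on QUAL (200), DP (300), and FILTER (400).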

SnpEff ann (SNP detection)

-nodownload

-no-intron

-no-downstream

-no SPLICE_SITE_REGION

-upDownStreamLen 250

bcftools consensus (phylogenetic tree)

--haplotype 1

fasttreemp

-nt

-boot 100

roary

-e

-n

-cd 100

-g 100000

simNGS and simLibrary – Software for Simulating Next-Gen Sequencing Data

Jit — Tue, 28 Nov 2017 06:49:11 -0600

simNGS is software for simulating observations from Illumina sequencing machines using the statistical models behind the AYB base-calling software. By default, observations only incorporate noise due to sequencing and do not incorporate effects from more esoteric sources of noise that may be present in real data ("dust", bubbles, merged clusters, sequence-heterogeneous clusters, etc.). Many of these additional sources may optionally be applied.

simNGS takes FASTA-format sequences and a file describing the covariance of noise between bases and cycles observed in an actual run of the machine. It then randomly generates noisy intensities representing the signals for the sequence at each cycle and calculates likelihoods for all possible base calls.

Address of the bookmark: https://www.ebi.ac.uk/goldman-srv/simNGS/

Tools for Protein-Protein Docking !

Poonam Mahapatra — Wed, 25 Apr 2018 05:15:53 -0500

Predicting the structure of protein–protein complexes using docking approaches is a difficult problem whose major challenges include identifying correct solutions and properly dealing with molecular flexibility and conformational changes. The following tools predict the structure of protein–protein complexes:

3D-Dock Suite

Global rigid search: FFT

Shape complementarity and electrostatics

Re-scoring and clustering. Refinement of interface side-chains

3D-Garden

Global rigid search in ensemble

Shape complementarity and Lennard–Jones potential

Side chain and backbone dihedral refinement

DOT

Global rigid search: FFT

Shape complementarity, electrostatics and VDW

No further refinement

Escher NG

Global rigid search

Shape complementarity, hydrogen bonds and electrostatics

Integrated in VEGA

GRAMM

Global rigid search: FFT. Smooth protein surface representation for soft docking

Shape complementarity and Lennard–Jones potential

Clustering of conformations

GRAMM-X

Global rigid search: FFT. Smooth protein surface representation for soft docking

Shape complementarity and Lennard–Jones potential

Minimization and re-scoring with multiple filters

HEX

Global rigid search: Fourier correlation of spherical harmonics

Shape complementarity

HADDOCK

Global rigid search

Electrostatic, VDW and desolvation energy terms

MD simulated annealing refinement. Filtering based on external data

ICM

Global rigid search: Monte Carlo

Empirical scoring function

Clustering and selection of conformations. Refinement of interface side-chains and re-scoring

MolFit

Global rigid search: FFT

Shape complementarity

Clustering of good solutions, filtering using a priori information and small, local rigid rotations around selected conformations

PatchDock

Global rigid search

Shape complementarity and atomic desolvation energy

Clustering of conformations

PyDock

Global rigid search: FFT

Shape complementarity

Re-scoring by binding electrostatics and desolvation energy

RosettaDock

Local rigid search: Monte Carlo with low and high resolution structure representation levels

Different scoring parameters for the different resolutions

ZDOCK

Global rigid search: FFT

Shape complementarity, desolvation energy, and electrostatics

Energy minimization and re-scoring

Free for academics

Point to note:

The proper treatment of flexibility in protein–protein docking is still an active field of research. You should first analyze your proteins to define their conformational space, and then choose the most suitable method for your docking problem.

molinspiration: broad range of cheminformatics software tools supporting molecule manipulation

BioJoker — Sun, 20 Jan 2019 05:32:40 -0600

Molinspiration offers a broad range of cheminformatics software tools supporting molecule manipulation and processing, including SMILES and SDfile conversion, normalization of molecules, generation of tautomers, molecule fragmentation, calculation of various molecular properties needed in QSAR, molecular modelling and drug design, high-quality molecule depiction, and molecular database tools supporting substructure and similarity searches. The products also support fragment-based virtual screening, bioactivity prediction and data visualization. Molinspiration tools are written in Java and can therefore be used on practically any computer platform.

Address of the bookmark: https://www.molinspiration.com/

Ancient whole genome duplication (WGD) detection tools !

Rahul Nayak — Sun, 07 Mar 2021 00:32:44 -0600

There are two methods for detecting ancient WGD: collinearity analysis and analysis of the Ks distribution. Ks is the average number of synonymous substitutions per synonymous site; its counterpart, Ka, is the average number of non-synonymous substitutions per non-synonymous site.
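
As a hedged illustration of the definitions of Ks and Ka, one common textbook estimator is the Nei-Gojobori approach with a Jukes-Cantor correction. It is shown here only to make the quantities concrete and is not necessarily the estimator used by the tools listed below. The symbols are assumptions introduced for this sketch: S_d and N_d are the observed synonymous and non-synonymous differences between two aligned coding sequences, and S and N are the numbers of synonymous and non-synonymous sites.

```latex
p_S = \frac{S_d}{S}, \qquad p_N = \frac{N_d}{N}

K_s = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p_S\right), \qquad
K_a = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p_N\right)

% Worked example with made-up counts: S_d = 12, S = 100
p_S = 0.12, \qquad K_s = -\frac{3}{4}\,\ln(0.84) \approx 0.131
```

A peak in the histogram of Ks values across paralogous gene pairs indicates many duplicates of similar age, which is the signature a Ks-based WGD analysis looks for.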

Several WGD analysis pipelines have been published. Searching for the keyword "wgd pipeline" turned up the following:

GenoDup: https://github.com/MaoYafei/GenoDup-Pipeline
https://peerj.com/articles/6303/
WGDdetector: https://github.com/yongzhiyang2012/WGDdetector
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2670-3
wgd: https://github.com/arzwa/wgd
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2#Sec1
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
GeNoGAP: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1142-2
https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-017-0399-x
purge_dups: https://github.com/dfguan/purge_dups
https://www.biorxiv.org/content/10.1101/2020.01.24.917997v1

This article introduces the usage of wgd.

wgd cannot currently be installed directly with bioconda, and installation is a little troublesome because it depends on a lot of other software:

BLAST
MCL
MUSCLE/MAFFT/PRANK
PAML
PhyML/FastTree
i-ADHoRe

But the good news is that most of the software it depends on can be installed with bioconda

conda create -n wgd python=3.5 blast mcl muscle mafft prank paml fasttree cmake libpng mpi=1.0=mpich
conda activate wgd

Here mpi=1.0=mpich is selected because i-ADHoRe depends on mpich. If openmpi is installed instead, the following error appears: error while loading shared libraries: libmpi_cxx.so.40: cannot open shared object file: No such file or directory

After that, the installation is much simpler

git clone https://github.com/arzwa/wgd.git
cd wgd
pip install .
# Alternatively, install directly from GitHub without cloning:
pip install git+https://github.com/arzwa/wgd.git

For i-ADHoRe, you need to register at http://bioinformatics.psb.ugent.be/webtools/i-adhore/licensing/ and agree to the license to download i-ADHoRe-3.0.

Since my miniconda3 is installed in ~/opt/, the installation path is ~/opt/miniconda3/envs/wgd/

tar -zxvf i-adhore-3.0.01.tar.gz
cd i-adhore-3.0.01
mkdir -p build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=~/opt/miniconda3/envs/wgd/
make -j 4
make install

Take the sugarcane genome Saccharum spontaneum L. as an example. The genome is 8-ploid with 32 chromosomes (2n = 4 × 8 = 32).

Download the CDS and GFF annotation files for the tutorial

mkdir -p wgd_tutorial && cd wgd_tutorial
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.cds.fasta.gz
wget http://www.life.illinois.edu/ming/downloads/Spontaneum_genome/Sspon.v20190103.gff3.gz
gunzip *.gz

First run conda activate wgd to start the analysis environment, and then begin the analysis

Step 1: Use wgd mcl to identify homologous genes in the genome

wgd mcl -n 20 --cds --mcl -s Sspon.v20190103.cds.fasta -o Sspon_cds.out

Step 2: Use wgd ksd to build the Ks distribution

wgd ksd --n_threads 80 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl Sspon.v20190103.cds.fasta

Step 3: If the quality of the genome is good, wgd syn can be used for collinearity analysis. It helps find the collinear blocks in the genome and the corresponding anchor points

wgd syn --feature gene --gene_attribute ID \
-ks wgd_ksd/Sspon.v20190103.cds.fasta.ks.tsv \
Sspon.v20190103.gff3 Sspon_cds.out/Sspon.v20190103.cds.fasta.blast.tsv.mcl
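
The three steps above can be collected into one small script. This is a minimal sketch that reuses the exact commands and file names from the steps. The `run` helper and the `DRY_RUN` switch are hypothetical additions for illustration, not part of wgd. By default the script only prints each command, so it is safe to run before wgd is installed. The thread counts are taken from the steps and should be adjusted to your machine.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it
# and stop at the first failing step.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@" || { echo "step failed: $*" >&2; exit 1; }
  fi
}

cds="Sspon.v20190103.cds.fasta"
gff="Sspon.v20190103.gff3"
mcl_out="Sspon_cds.out/${cds}.blast.tsv.mcl"

wgd_pipeline() {
  # Step 1: identify homologous genes in the genome
  run wgd mcl -n 20 --cds --mcl -s "$cds" -o Sspon_cds.out
  # Step 2: build the Ks distribution
  run wgd ksd --n_threads 80 "$mcl_out" "$cds"
  # Step 3: collinearity analysis
  run wgd syn --feature gene --gene_attribute ID -ks "wgd_ksd/${cds}.ks.tsv" "$gff" "$mcl_out"
}

wgd_pipeline
```

Running it with `DRY_RUN=0` inside the activated wgd environment executes the steps in order and stops at the first failure.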

For further reading: wgd has 9 subcommands

kde: KDE fitting to the Ks distribution
ksd: Ks distribution construction
mcl: all-vs-all BLASTP comparison + MCL clustering
mix: mixture modeling of the Ks distribution
pre: preprocess the CDS file
syn: call i-ADHoRe 3.0 to use GFF files for collinearity analysis
viz: draw histogram and density plot
wf1: standard Ks analysis workflow for the whole-genome paranome; calls mcl, ksd and syn
wf2: standard Ks analysis workflow for one-vs-one orthologs; calls mcl and ksd

Picard

Neel — Fri, 29 Apr 2016 08:21:54 -0500

Picard is a set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. These file formats are defined in the Hts-specs repository. See especially the SAM specification and the VCF specification.

Note that the information on this page is targeted at end-users. For developers, the source code, building instructions and implementation/development resources are available on GitHub.

The Picard toolkit is open-source under the MIT license and free for all uses.

Enjoy!

Address of the bookmark: http://broadinstitute.github.io/picard/

CANU: Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing.

Jit — Tue, 26 Apr 2016 11:38:10 -0500

Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). The software is currently alpha level; feel free to use it and report any issues you encounter.

Canu is a hierarchical assembly pipeline which runs in four steps:

Detect overlaps in high-noise sequences using MHAP
Generate corrected sequence consensus
Trim corrected sequences
Assemble trimmed corrected sequences

Read the documentation

New release https://github.com/marbl/canu/releases

Address of the bookmark: https://github.com/marbl/canu