BOL: Neel's blogs

10 Books to Kickstart (and Level Up) Your Bioinformatics Journey

Neel — Tue, 12 Aug 2025 03:50:11 -0500

If you’re starting out in bioinformatics or looking to sharpen your computational biology skills, having the right learning resources makes all the difference.
Here’s my curated list of 10 must-read books — from beginner-friendly introductions to advanced computational genomics.

1️⃣ Data Analysis for the Life Sciences
A fantastic starting point to learn statistics, R programming, and exploratory data analysis in the context of biology. The best part? It’s available free online from HarvardX.

2️⃣ Practical Computing for Biologists
The very first book I picked up when I started learning computational biology. It’s beginner-friendly and focuses on essential computing skills every biologist needs.

3️⃣ A Primer for Computational Biology
An open-access, hands-on introduction to computational biology concepts and coding techniques. Perfect if you want to learn through real examples.

4️⃣ Computational Genomics with R
For those who already know R and want to dive deeper into genome-scale data analysis, from sequence alignment to gene expression.

5️⃣ The Biologist’s Guide to Computing
Bridges the gap between biological problems and computational thinking, making it easier for life scientists to approach programming and data analysis.

6️⃣ Bioinformatics Data Skills
A must-read to sharpen your bioinformatics toolkit — from command-line skills to reproducible research workflows. Ideal once you’ve covered the basics.

7️⃣ Bioinformatics Workbook
A practical tutorial series to help scientists design bioinformatics projects, analyze data, and understand best practices.

8️⃣ Modern Statistics for Modern Biology
An essential guide to modern statistical methods applied to biology, blending theory with hands-on examples in R.

9️⃣ Algorithms on Strings, Trees, and Sequences by Dan Gusfield
A classic reference for anyone wanting to understand the algorithms behind sequence alignment, genome assembly, and biological data structures.

A Beginner's Guide to Using Kraken for Taxonomic Classification

Neel — Fri, 13 Dec 2024 11:29:03 -0600

Kraken is a popular bioinformatics tool designed for fast and accurate taxonomic classification of metagenomic sequences. Its efficiency and precision make it a go-to resource for analyzing microbial communities, including bacteria, viruses, archaea, and fungi. Whether you're new to bioinformatics or experienced in the field, Kraken is an indispensable tool for taxonomic analysis.

In this blog, we’ll walk through the basics of Kraken, from installation to running an analysis, and highlight its key features and applications.

What is Kraken?

Kraken is a sequence classification tool that assigns taxonomic labels to DNA sequences using exact k-mer matching. It uses a reference database of genomes, dividing sequences into k-mers and identifying matches in a computationally efficient way.
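To make the k-mer idea concrete, here is a minimal sketch of exact k-mer lookup against a toy reference set. It is only an illustration: the taxon names, sequences, and the value k=4 are made up, and the simple majority vote stands in for Kraken's real scheme, which maps each k-mer to the lowest common ancestor of the genomes containing it and uses a much longer default k.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def classify(read, kmer_to_taxon, k):
    """Return the taxon hit by the most k-mers of the read, or None if nothing matches."""
    hits = Counter(kmer_to_taxon[km] for km in kmers(read, k) if km in kmer_to_taxon)
    if not hits:
        return None
    return hits.most_common(1)[0][0]

# Toy "database": hypothetical taxa and sequences, with a tiny k for readability.
K = 4
references = {"taxon_A": "ACGTACGGA", "taxon_B": "TTGACCATG"}
kmer_to_taxon = {}
for taxon, genome in references.items():
    for km in kmers(genome, K):
        kmer_to_taxon[km] = taxon

print(classify("GTACGG", kmer_to_taxon, K))  # all three k-mers occur only in taxon_A
print(classify("CCCCCC", kmer_to_taxon, K))  # no k-mer is in the toy database
```

Because the lookup is an exact dictionary match rather than an alignment, the cost per read grows only with the number of k-mers it contains, which is where the speed advantage comes from.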

Key Features of Kraken

Speed: Kraken processes data much faster than alignment-based methods.
Accuracy: It uses a precise k-mer matching algorithm for high-resolution taxonomic assignments.
Scalability: It can handle large metagenomic datasets.
Custom Databases: You can build and use custom databases tailored to your research needs.

Installing Kraken

System Requirements
- A Unix-based operating system (Linux/macOS).
- Sufficient computational resources for database building (RAM and disk space).

Installation Steps
- Clone the Kraken repository from GitHub:
  
  git clone https://github.com/DerrickWood/kraken.git
  cd kraken
- Build and install the Kraken programs into a directory of your choice using the bundled install script:
  
  ./install_kraken.sh /path/to/kraken
- Add that directory to your PATH for easy access:
  
  export PATH=$PATH:/path/to/kraken

Preparing a Database

Kraken requires a database of reference genomes. You can use a pre-built database or create a custom one.

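Downloading a Pre-built Database
Kraken offers pre-built databases, such as the MiniKraken database, which is lightweight and suitable for smaller datasets. MiniKraken is distributed as a compressed archive on the Kraken website rather than through kraken-build: download the archive, extract it, and pass the extracted directory to --db when running Kraken.

Building a Custom Database
To build a database yourself, download the NCBI taxonomy and a reference library, then run the build step:

kraken-build --download-taxonomy --db my_database
kraken-build --download-library bacteria --db my_database
kraken-build --build --threads 4 --db my_database

To include specific genomes of your own, add their FASTA files with kraken-build --add-to-library before the build step.

This process may take considerable time and resources, depending on the size of the database.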

Running Kraken

Once the database is ready, you can classify sequences.

Basic Usage
Use the following command to classify sequences:

kraken --db my_database --threads 4 --fastq-input input_sequences.fastq --output kraken_output.txt

Key options:
- --db: Specifies the database.
- --threads: Number of threads for parallel processing.
- --fastq-input: Indicates that the input is in FASTQ format (use --fasta-input for FASTA).
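When many samples need to be classified, it can help to build the command in a script. The sketch below only assembles the argument list from the options shown above; the function name and its defaults are illustrative, not part of Kraken.

```python
import shlex

def kraken_command(db, reads, output, threads=1, fastq=True):
    """Build the argument list for a Kraken classification run."""
    cmd = ["kraken", "--db", db, "--threads", str(threads)]
    if fastq:
        cmd.append("--fastq-input")
    cmd += ["--output", output, reads]
    return cmd

cmd = kraken_command("my_database", "input_sequences.fastq", "kraken_output.txt", threads=4)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.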
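
Interpreting Results
Kraken writes one tab-separated line per sequence with five columns: a C or U flag (classified or unclassified), the sequence ID, the assigned taxonomy ID (0 if unclassified), the sequence length, and a space-separated list showing which taxonomy IDs the sequence's k-mers mapped to. The standard output does not contain a confidence score.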
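As a rough illustration of working with this file, the sketch below parses the standard Kraken output layout as I understand it from the Kraken manual: five tab-separated fields (C/U flag, sequence ID, taxonomy ID, length, k-mer mapping). The sample lines are invented for the example and are not real results.

```python
from collections import Counter

def parse_kraken_line(line):
    """Split one Kraken output line into a dict of its five tab-separated fields."""
    status, read_id, taxid, length, lca = line.rstrip("\n").split("\t")
    return {
        "classified": status == "C",
        "read_id": read_id,
        "taxid": int(taxid),
        "length": length,        # kept as text; only the other fields are used below
        "lca_mapping": lca,
    }

# Made-up lines in the documented layout.
sample = [
    "C\tread_1\t562\t100\t562:13 561:4 0:53",
    "U\tread_2\t0\t100\t0:70",
    "C\tread_3\t562\t100\t562:60 0:10",
]

records = [parse_kraken_line(line) for line in sample]
per_taxon = Counter(r["taxid"] for r in records if r["classified"])
print(per_taxon)
```

For a real file, replace the sample list with the lines of kraken_output.txt.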

Visualizing Kraken Results

Kraken results can be visualized using tools like Krona or converted to human-readable reports using kraken-report.

Generate a Report

kraken-report --db my_database kraken_output.txt > kraken_report.txt

Krona Visualization
Install Krona, extract the sequence ID and taxonomy ID columns from the Kraken output, and import them:

cut -f2,3 kraken_output.txt > krona_input.txt
ktImportTaxonomy krona_input.txt -o krona_output.html

Open the HTML file in your browser to interactively explore the taxonomic classifications.

Advanced Usage

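Confidence Thresholds
Kraken can discard low-confidence assignments after classification with kraken-filter and its --threshold option (the --confidence option on the classifier itself belongs to Kraken 2). Higher values reduce false positives but may miss some true positives:

kraken-filter --db my_database --threshold 0.1 kraken_output.txt > kraken_filtered.txt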
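The effect of a threshold is easier to see with a small worked example. The sketch below follows the scoring rule as I understand it from the Kraken manual: a label's score is the fraction of the read's k-mers that map inside the clade rooted at that label, and a label that falls short is moved up to its parent until the score is high enough. The three-node taxonomy and the k-mer counts are hypothetical.

```python
def clade_contains(taxid, ancestor, parent):
    """True if ancestor is taxid itself or lies on its path to the root."""
    while taxid is not None:
        if taxid == ancestor:
            return True
        taxid = parent.get(taxid)
    return False

def filter_label(label, kmer_hits, parent, threshold):
    """Move the label up the taxonomy until its clade score meets the threshold.

    kmer_hits maps taxid -> number of k-mers assigned to it (0 means no hit).
    Returns the adjusted label, or 0 (unclassified) if no ancestor qualifies.
    """
    total = sum(kmer_hits.values())
    if total == 0:
        return 0
    while label is not None:
        in_clade = sum(n for t, n in kmer_hits.items()
                       if t != 0 and clade_contains(t, label, parent))
        if in_clade / total >= threshold:
            return label
        label = parent.get(label)
    return 0

# Hypothetical taxonomy: root (1) -> genus (561) -> species (562).
parent = {1: None, 561: 1, 562: 561}
hits = {562: 13, 561: 4, 0: 53}   # 70 k-mers in total

print(filter_label(562, hits, parent, 0.1))  # 13/70 is enough: species label kept
print(filter_label(562, hits, parent, 0.2))  # needs genus level: (13+4)/70
print(filter_label(562, hits, parent, 0.5))  # nothing qualifies: unclassified
```

Raising the threshold therefore trades specificity of the label for confidence: reads are pushed to higher ranks or dropped rather than assigned on thin evidence.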

Paired-End Reads
For paired-end sequencing data, use:

kraken --db my_database --fastq-input --paired reads_1.fastq reads_2.fastq

Customizing K-mers
Kraken allows you to set custom k-mer lengths during database building (the --kmer-len and --minimizer-len options of kraken-build) for specific applications.

Applications of Kraken

Microbial Ecology: Characterizing microbial communities in soil, water, and the human microbiome.
Pathogen Detection: Identifying pathogens in clinical samples.
Fungal Research: Analyzing fungal diversity in metagenomic datasets.
Environmental Monitoring: Tracking microbial populations in diverse habitats.

Conclusion

Kraken is a versatile and efficient tool for taxonomic classification in metagenomics. Its speed, accuracy, and flexibility make it a favorite among bioinformaticians. By following this guide, you can set up and use Kraken to unlock insights into microbial and fungal communities, paving the way for discoveries in ecology, medicine, and biotechnology.

Mycology Research Resources for Bioinformaticians: Unlocking the Fungal Kingdom

Neel — Fri, 13 Dec 2024 11:21:45 -0600

Mycology, the study of fungi, is a field that bridges ecology, medicine, and biotechnology. With advancements in bioinformatics, researchers now have unprecedented opportunities to explore the fungal kingdom at molecular, genetic, and ecological levels. From understanding pathogenic fungi to harnessing fungal enzymes for industrial applications, the potential is vast.

To fully leverage these opportunities, bioinformaticians require specialized tools and databases. This blog highlights essential resources for mycology research, focusing on databases, tools, and platforms tailored for fungal biology.

1. Fungal Databases

1.1. MycoCosm

Website: MycoCosm
Developed by the DOE Joint Genome Institute, MycoCosm is a comprehensive portal for fungal genomics. It offers genomic and transcriptomic data for a wide range of fungi, including saprobes, pathogens, and symbionts.

Key Features: Genome browsers, comparative genomics tools, and functional annotations.
Best For: Large-scale studies on fungal evolution and ecology.

1.2. FungiDB

Website: FungiDB
FungiDB is an integrated genomic resource for fungal pathogens and non-pathogens. It provides access to genome sequences, transcriptomic data, and functional annotations.

Key Features: Advanced search options, BLAST, and pathway analysis tools.
Best For: Studying fungal pathogenesis and host-pathogen interactions.

1.3. Index Fungorum

Website: Index Fungorum
This nomenclatural database provides information on the scientific names of fungi. It’s an essential resource for taxonomists and researchers focused on fungal biodiversity.

Key Features: Taxonomic hierarchy and synonymy tracking.
Best For: Identifying and classifying fungal species.

1.4. UNITE

Website: UNITE
UNITE is a specialized database for fungal ITS (Internal Transcribed Spacer) sequences, often used in fungal identification and phylogenetics.

Key Features: Curated reference datasets and community annotations.
Best For: Environmental mycology and microbial ecology studies.

2. Analytical Tools

2.1. Funannotate

Repository: GitHub - Funannotate
Funannotate is a genome annotation tool designed for fungi. It supports tasks like gene prediction, functional annotation, and orthology analysis.

Best For: Annotating newly sequenced fungal genomes.

2.2. BUSCO (Benchmarking Universal Single-Copy Orthologs)

Website: BUSCO
BUSCO evaluates genome assembly and annotation completeness by checking for near-universal single-copy orthologs expected in a given lineage. It includes fungal-specific datasets.

Best For: Assessing the quality of fungal genome assemblies.

2.3. Pathogen-Host Interactions Database (PHI-base)

Website: PHI-base
PHI-base is a manually curated resource containing information on pathogen-host interactions, including fungal pathogens.

Best For: Exploring virulence factors and host-pathogen relationships.

3. Visualization Platforms

3.1. Cytoscape

Website: Cytoscape
A powerful tool for visualizing molecular interaction networks, Cytoscape can be used to study protein-protein interactions, gene networks, and metabolic pathways in fungi.

Best For: Network biology and functional genomics.

3.2. iTOL (Interactive Tree of Life)

Website: iTOL
iTOL is an interactive tool for visualizing phylogenetic trees.

Best For: Displaying fungal phylogenies and comparing evolutionary relationships.

4. Community Resources

4.1. Mycological Society of America (MSA)

Website: MSA
The MSA promotes fungal research and provides access to resources, conferences, and publications.

Best For: Networking with fungal researchers and accessing recent studies.

4.2. OpenFungi

Website: OpenFungi
OpenFungi is an open-source initiative providing fungal genomic and transcriptomic datasets for research and education.

Best For: Sharing and accessing public fungal datasets.

5. Genomics Workflows

5.1. Galaxy

Website: Galaxy Project
Galaxy offers a web-based platform for reproducible bioinformatics workflows, including tools for fungal genome and transcriptome analysis.

Best For: User-friendly analysis pipelines without requiring coding skills.

5.2. Snakemake

Repository: Snakemake
A flexible pipeline management tool that supports fungal data processing and analysis.

Best For: Custom workflows for large-scale fungal datasets.

Conclusion

Fungal research is a rapidly growing field with vast implications for medicine, agriculture, and industry. For bioinformaticians, the availability of specialized resources—databases, tools, and community platforms—opens doors to innovative discoveries. Whether you are investigating fungal genomics, studying host-pathogen interactions, or exploring fungal biodiversity, the resources outlined above will empower your research journey.

Dive into these resources and help unravel the mysteries of the fungal kingdom!

Exploring RNA Sequence Analysis: Tools for Every Bioinformatician

Neel — Fri, 13 Dec 2024 04:03:04 -0600

RNA sequence analysis has become an essential part of modern biological research. From RNA-seq pipelines to specialized tools for specific RNA types, here's a comprehensive guide to tools you can use to make sense of RNA data.

1. RNA-Seq Analysis Pipelines

RNA-seq is one of the most popular techniques for studying RNA. These tools streamline processing raw sequence data:

FastQC: For quality control of raw RNA-seq reads.
Trimmomatic: For trimming and filtering RNA-seq reads.
HISAT2/STAR: High-performance aligners for RNA-seq reads.
featureCounts: For quantifying gene expression.
DESeq2/edgeR: For differential expression analysis.

2. Transcriptome Assembly and Annotation

For analyzing transcriptomes from non-model organisms or assembling novel transcripts:

Trinity: For de novo transcriptome assembly.
StringTie: For transcript assembly and quantification from RNA-seq alignments.
TransDecoder: To predict coding regions within assembled transcripts.
TAU: Tools for annotating non-coding and coding RNAs.

3. Exploring Non-Coding RNA (ncRNA)

Non-coding RNAs play critical regulatory roles. Dedicated tools for studying them include:

Infernal: For identifying ncRNA sequences based on covariance models.
Rfam: Database and tools for ncRNA families.
miRDeep: For identifying microRNAs in RNA-seq datasets.

4. RNA Structure and Motif Analysis

Structural biology of RNA helps in understanding its function:

RNAfold (ViennaRNA): Predicts secondary structures from RNA sequences.
RNAstructure: Tools for RNA secondary structure prediction and analysis.
MEME Suite: For identifying motifs in RNA sequences.
IntaRNA: For RNA-RNA interaction prediction.

5. RNA Editing and Modifications

Epitranscriptomics is a growing field focusing on RNA modifications:

REDItools: For RNA editing analysis.
m6Aboost: For identifying m6A modifications in RNA.

6. Long-Read RNA Sequencing Analysis

Long-read technologies like Nanopore and PacBio are transforming RNA research:

FLAIR: For isoform-level analysis of long-read RNA-seq data.
NanoMod: For detecting modifications in RNA from Nanopore sequencing.

7. RNA-Protein Interactions

To study RNA-protein interactions and complexes:

RBPmap: For identifying RNA-binding protein motifs.
PARalyzer: For analyzing PAR-CLIP data.

8. Functional Enrichment Analysis

Understanding biological functions and pathways from RNA-seq data:

getENRICH: A tool designed for pathway enrichment analysis of non-model organisms (hypergeometric P-value calculation with FDR correction).
clusterProfiler: For GO and KEGG pathway enrichment analysis.
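The statistics behind this kind of enrichment test are compact enough to sketch. The code below is a generic illustration of a one-sided hypergeometric test followed by Benjamini-Hochberg FDR correction; it is not the implementation used by getENRICH or clusterProfiler, and the pathway names and gene counts are invented.

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical experiment: 1000 background genes, 50 differentially expressed.
N, n = 1000, 50
pathways = {"pathway_A": (40, 8), "pathway_B": (100, 5)}  # (pathway size K, overlap k)

names = list(pathways)
pvals = [hypergeom_pvalue(k, K, n, N) for K, k in pathways.values()]
for name, p, q in zip(names, pvals, benjamini_hochberg(pvals)):
    print(f"{name}: p={p:.3g}, FDR={q:.3g}")
```

The background size N matters a great deal in practice: for non-model organisms it should be the set of genes that could have been detected and annotated, not the whole genome.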

9. Visualization and Data Sharing

Presenting and sharing RNA sequence analysis results effectively:

IGV: Genome browser for visualizing RNA-seq alignments.
Circos: Circular visualization of RNA-seq data.
Dash Bio: A Python library for creating bioinformatics visualizations.

Conclusion

The bioinformatics landscape for RNA sequence analysis is vast, with tools catering to specific needs. Whether you’re studying coding RNAs, non-coding RNAs, or exploring RNA-protein interactions, the right tools can transform your data into biological insights.

Understanding RNA-Seq Normalization Methods: TPM vs. FPKM vs. CPM

Neel — Wed, 11 Dec 2024 00:59:15 -0600

RNA sequencing (RNA-Seq) is a powerful technology used to study transcriptomes, providing insights into gene expression levels. However, raw RNA-Seq data requires normalization to account for sequencing depth and gene length, enabling accurate comparisons between genes and samples. Among the most widely used normalization methods are TPM (Transcripts Per Million), FPKM (Fragments Per Kilobase Million), and CPM (Counts Per Million). Each method has its unique principles and applications, which we’ll explore in this blog.

Why Normalize RNA-Seq Data?

Normalization is a crucial step in RNA-Seq analysis for the following reasons:

Sequencing depth: Different RNA-Seq experiments produce varying numbers of reads, making direct comparisons between samples misleading.
Gene length: Longer genes inherently generate more reads, irrespective of their actual expression level.
Bias reduction: Normalization mitigates technical biases, enabling meaningful biological interpretation.

TPM (Transcripts Per Million)

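TPM first divides each transcript's read count by its length and then rescales the resulting rates so that they sum to one million within the sample, which normalizes for both transcript length and sequencing depth. For transcript $i$ with read count $c_i$ and length $\ell_i$, it is calculated as:

$$\mathrm{TPM}_i = \frac{c_i / \ell_i}{\sum_j c_j / \ell_j} \times 10^6$$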

Key Features:

Proportionality: TPM values sum to 1,000,000 across all transcripts in a sample, making it easier to compare between samples.
Intuitive interpretation: TPM values directly represent the abundance of transcripts in a sample.
Preferred for comparisons: TPM facilitates between-sample comparisons better than FPKM.
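A small worked example may help here. The sketch below computes TPM in plain Python from made-up counts and lengths: each count is divided by the transcript length in kilobases, and the rates are then rescaled to sum to one million.

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million from raw counts and transcript lengths in base pairs."""
    rates = [c / (l / 1000) for c, l in zip(counts, lengths_bp)]  # reads per kilobase
    scale = sum(rates) / 1e6
    return [r / scale for r in rates]

counts = [100, 200, 300]       # hypothetical read counts
lengths = [1000, 2000, 3000]   # hypothetical transcript lengths in bp

values = tpm(counts, lengths)
print(values)
print(sum(values))
```

In this toy sample the three transcripts end up with the same TPM even though their raw counts differ threefold, because the extra reads are fully explained by transcript length.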

FPKM (Fragments Per Kilobase Million)

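FPKM normalizes read counts by transcript length and sequencing depth, but without enforcing proportionality like TPM. For transcript $i$ with $c_i$ mapped fragments, length $\ell_i$ in base pairs, and $N$ total mapped fragments in the sample, it is defined as:

$$\mathrm{FPKM}_i = \frac{c_i \times 10^9}{\ell_i \, N}$$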

Key Features:

Historical significance: FPKM was one of the first normalization methods used for RNA-Seq.
Single-end vs. paired-end: RPKM (Reads Per Kilobase Million) is the single-end counterpart of FPKM. With paired-end data, FPKM counts each fragment (read pair) once instead of counting both reads.
Limited utility: FPKM values are not as robust as TPM for cross-sample comparisons due to lack of proportionality.
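The lack of proportionality can be seen numerically. In this sketch, with invented counts for two samples over the same three transcripts, the FPKM values sum to different totals in each sample, whereas rescaling FPKM by its own sum recovers TPM.

```python
def fpkm(counts, lengths_bp):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    total_millions = sum(counts) / 1e6
    return [c / (l / 1000) / total_millions for c, l in zip(counts, lengths_bp)]

def fpkm_to_tpm(fpkm_values):
    """Rescale FPKM values so they sum to one million, which yields TPM."""
    total = sum(fpkm_values)
    return [v / total * 1e6 for v in fpkm_values]

lengths = [1000, 2000, 3000]   # hypothetical transcript lengths in bp
sample_a = [100, 200, 300]     # hypothetical fragment counts
sample_b = [300, 200, 100]

fpkm_a = fpkm(sample_a, lengths)
fpkm_b = fpkm(sample_b, lengths)
print(sum(fpkm_a), sum(fpkm_b))      # the two totals differ
print(sum(fpkm_to_tpm(fpkm_b)))      # after rescaling the total is one million
```

Because the per-sample totals differ, an FPKM of a given size does not mean the same share of the transcriptome in both samples, which is the main argument for reporting TPM instead.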

CPM (Counts Per Million)

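CPM normalizes raw read counts by sequencing depth, without considering gene length. For gene $i$ with read count $c_i$ and $N$ total mapped reads in the sample, it is expressed as:

$$\mathrm{CPM}_i = \frac{c_i}{N} \times 10^6$$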

Key Features:

Simplicity: CPM is straightforward and computationally less intensive.
Application: Suitable for non-length-dependent analyses, such as comparing total expression levels or differential expression analysis.
Gene length agnostic: CPM does not correct for gene length, making it less ideal for measuring expression levels.
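As a quick illustration with made-up numbers, the sketch below shows CPM removing a tenfold difference in sequencing depth between two otherwise identical samples.

```python
def cpm(counts):
    """Counts Per Million: scale each count by the sample's total count."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

shallow = [10, 30, 60]      # hypothetical counts from a shallow run
deep = [100, 300, 600]      # same proportions at ten times the depth

print(cpm(shallow))
print(cpm(deep))
```

Both samples give the same CPM values, but nothing in the calculation uses gene length, so a long gene and a short gene with equal CPM are not necessarily expressed at the same level.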

When to Use Each Method

TPM: Best for comparing expression levels between samples, especially when transcript length and sequencing depth vary.
FPKM: Useful for historical consistency but generally replaced by TPM.
CPM: Ideal for differential expression analysis when gene length normalization is unnecessary.

Conclusion

Choosing the right normalization method depends on the specific objectives of your RNA-Seq analysis. TPM’s proportionality and robustness make it the preferred choice for most applications, while CPM serves well for differential expression studies. Although FPKM paved the way for RNA-Seq normalization, it has largely been supplanted by TPM in modern workflows. Understanding these methods and their nuances ensures accurate and meaningful interpretations of RNA-Seq data.

References:

Li, B., & Dewey, C. N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics.
Trapnell, C., et al. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology.
Law, C. W., et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology.

Langya Virus Update !

Neel — Fri, 12 Aug 2022 05:31:10 -0500

https://www.ncbi.nlm.nih.gov/nuccore/OM101125,OM101126,OM101127,OM101128,OM101129,OM101130?

Zoonotic Henipavirus

https://pubmed.ncbi.nlm.nih.gov/35921459/

https://www.ncbi.nlm.nih.gov/nuccore/OM069646,,OM069567,OM069568,OM069569,OM069570,OM069571,OM069572,OM069573,OM069574,OM069575,OM069576,OM069577,OM069578,OM069579,OM069580,OM069581,OM069582,OM069583,OM069584,OM069585,OM069586,OM069587,OM069588,OM069589,OM069590,OM069591,OM069592,OM069593,OM069594,OM069595,OM069596,OM069597,OM069598,OM069599,OM069600,OM069601,OM069602,OM069603,OM069604,OM069605,OM069606,OM069607,OM069608,OM069609,OM069610,OM069611,OM069612,OM069613,OM069614,OM069615,OM069616,OM069617,OM069618,OM069619,OM069620,OM069621,OM069622,OM069623,OM069624,OM069625,OM069626,OM069627,OM069628,OM069629,OM069630,OM069631,OM069632,OM069633,OM069634,OM069635,OM069636,OM069637,OM069638,OM069639,OM069640,OM069641,OM069642,OM069643,OM069644,OM069645,OM069646

Installing ELGG on Ubuntu !

Neel — Wed, 25 May 2022 02:26:05 -0500

Elgg is an open-source and highly customizable framework used for building an online social environment. It provides a simple and powerful user interface that helps to manage and build your content through a web browser. Elgg offers a rich set of features including messaging, microblogging, file-sharing, RSS support, access control, groups, and many more.

In this tutorial, we will show you how to install and configure the Elgg social networking platform on Ubuntu 20.04.

Prerequisites

• A fresh Ubuntu 20.04 VPS on the Atlantic.net Cloud Platform
• A valid domain name pointed to your server IP
• A root password configured on your server

Step 1 – Create Atlantic.Net Cloud Server

First, log in to your Atlantic.Net Cloud Server. Create a new server, choosing Ubuntu 20.04 as the operating system with at least 2GB RAM. Connect to your Cloud Server via SSH and log in using the credentials highlighted at the top of the page.

Once you are logged in to your Ubuntu 20.04 server, run the following command to update your base system with the latest available packages.

apt-get update -y

Step 2 – Install Apache, MariaDB and PHP

Elgg runs on the Apache web server, is written in PHP, and uses MySQL/MariaDB as its database backend, so you will need to install Apache, MariaDB, PHP, and the required PHP extensions on your server. You can install all of them with the following command:

apt-get install apache2 mariadb-server php libapache2-mod-php php-common php-sqlite3 php-curl \
php-intl php-mbstring php-xmlrpc php-mysql php-gd php-xml php-cli php-zip unzip wget -y

After installing all the packages, edit the php.ini file and change some recommended settings.

nano /etc/php/7.4/apache2/php.ini

Change the following values:

max_execution_time = 300
memory_limit = 512M
upload_max_filesize = 100M
date.timezone = Asia/Kolkata

Save and close the file, then restart the Apache service to apply the configuration changes.

systemctl restart apache2
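If you want to double-check the edits before moving on, a short script can compare the file against the values above. This is only a sketch: it uses a deliberately simple line parser rather than a full INI reader, and the sample text stands in for the real php.ini.

```python
EXPECTED = {
    "max_execution_time": "300",
    "memory_limit": "512M",
    "upload_max_filesize": "100M",
    "date.timezone": "Asia/Kolkata",
}

def read_ini_values(text):
    """Collect key = value pairs, skipping blank lines, comments, and section headers."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "[")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values

def mismatches(text, expected=EXPECTED):
    """Return the settings whose current value differs from the expected one."""
    found = read_ini_values(text)
    return {k: found.get(k) for k, v in expected.items() if found.get(k) != v}

# Stand-in for the file contents; memory_limit is wrong and the timezone is commented out.
sample = """[PHP]
; example excerpt
max_execution_time = 300
memory_limit = 128M
upload_max_filesize = 100M
;date.timezone =
"""

print(mismatches(sample))
```

To check the real file, read /etc/php/7.4/apache2/php.ini and pass its contents to mismatches; an empty result means all four settings match.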

Step 3 – Create a Database for Elgg

Next, you will need to create a database and user for Elgg. First, log in to MySQL shell with the following command:

mysql

Once logged in, create a database and user with the following command:

CREATE DATABASE elgg;
CREATE USER 'elgg'@'localhost' IDENTIFIED BY 'secure-password';

Next, grant all privileges on the elgg database to the elgg user with the following command:

GRANT ALL ON elgg.* TO 'elgg'@'localhost' IDENTIFIED BY 'secure-password' WITH GRANT OPTION;

Next, flush the privileges and exit from the MariaDB shell with the following command:

FLUSH PRIVILEGES;
EXIT;

At this point, the MariaDB database is created for Elgg.

Step 4 – Install Elgg

First, download the latest version of Elgg from its official website using the following command:

wget https://elgg.org/download/elgg-3.3.13.zip

Once the download is completed, unzip the downloaded file with the following command:

unzip elgg-3.3.13.zip

Next, move the extracted directory to the Apache root directory:

mv elgg-3.3.13 /var/www/html/elgg

Next, create a data directory and set proper ownership and permissions to the Elgg directory:

mkdir /var/www/html/data
chown -R www-data:www-data /var/www/html/elgg
chown -R www-data:www-data /var/www/html/data
chmod -R 755 /var/www/html/elgg

Once you are finished, you can proceed to the next step.

Step 5 – Configure Apache for Elgg

Next, you will need to configure Apache to serve Elgg. You can configure it by creating a new Apache virtual host configuration file:

nano /etc/apache2/sites-available/elgg.conf

Add the following lines:


<VirtualHost *:80>
ServerAdmin admin@example.com
DocumentRoot /var/www/html/elgg/
ServerName elgg.example.com
<Directory /var/www/html/elgg/>
Options FollowSymLinks
AllowOverride All
</Directory>
ErrorLog /var/log/apache2/elgg-error_log
CustomLog /var/log/apache2/elgg-access_log common
</VirtualHost>

Save and close the file, then enable the virtual host and Apache rewrite module with the following command:

a2ensite elgg.conf
a2enmod rewrite

Finally, restart the Apache service to apply the changes:

systemctl restart apache2

Step 6 – Access Elgg Web Interface

Now, open your web browser and access the Elgg web interface using the URL http://elgg.example.com. You should see the Elgg welcome screen.

Useful Bioinformatics Analysis Tools !

Neel — Thu, 23 Dec 2021 23:10:02 -0600

CoMeta

Classifier of reads from metagenomic sequencing experiments.

• Kawulok, J., Deorowicz, S., CoMeta: Classification of Metagenomes Using k-mers, PLOS ONE, 2015; 10(4):1–23,

CoMSA

Compressor of multiple sequence alignments of proteins.

• Deorowicz, S., Walczyszyn, J., Debudaj-Grabysz, A., CoMSA: compression of protein multiple sequence alignment files, Bioinformatics, 2019; 35(2):227–234,

DSRC

Compressor of sequencing reads.

• Roguski, L., Deorowicz, S., DSRC 2: Industry-oriented compression of FASTQ files, Bioinformatics, 2014; 30(15):2213–2215,
• Deorowicz, S., Grabowski, Sz., Compression of DNA sequences in FASTQ format, Bioinformatics, 2011; 27(6):860–862,

FAMSA

Multiple sequence alignment designed for huge families of proteins (even containing hundreds of thousands of sequences).

• Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., FAMSA: Fast and accurate multiple sequence alignment of huge protein families, Scientific Reports, 2016; 6(33964):

FaStore

Compressor of FASTQ files.

• Roguski, L., Ochoa, I., Hernaez, M., Deorowicz, S., FaStore - a space-saving solution for raw sequencing data, Bioinformatics, 2018; 34(16):2748–2756,

FQSqueezer

Experimental high-end compressor of FASTQ files.

• Deorowicz, S., FQSqueezer: k-mer-based compression of sequencing data, Scientific Reports, 2020; 10(578):

GDC

Compressor of collections of genome sequences.

• Deorowicz, S., Danek, A., Niemiec, M., GDC 2: Compression of large collections of genomes, Scientific Reports, 2015; 5(11565):1–12,
• Deorowicz, S., Grabowski, Sz., Robust relative compression of genomes with random access, Bioinformatics, 2011; 27(21):2979–2986,

GTC

Genotype databases compressor with support for fast queries.

• Danek, A., Deorowicz, S., GTC: how to maintain huge genotype collections in a compressed form, Bioinformatics, 2018; 34(11):1834–1840,

GTShark

Genotypes compressor.

• Deorowicz, S., Danek, A., GTShark: Genotype compression in large projects, Bioinformatics, 2019; 35(22):4791–4793,

KMC

Memory frugal k-mer counter.

• Kokot, M., Długosz, M., Deorowicz, S., KMC 3: counting and manipulating k-mer statistics, Bioinformatics, 2017; 33(17):2759–2761,
• Deorowicz, S., Kokot, M., Grabowski, Sz., Debudaj-Grabysz, A., KMC 2: Fast and resource-frugal k-mer counting, Bioinformatics, 2015; 31(10):1569–1576,
• Deorowicz, S., Debudaj-Grabysz, A., Grabowski, Sz., Disk-based k-mer counting on a PC, BMC Bioinformatics, 2013; 14:Article no. 160,

Kmer-db

Tool for estimation of evolutionary distances in a collection of genomes.

• Deorowicz, S., Gudys, A., Dlugosz, M., Kokot, M., Danek, A., Kmer-db: instant evolutionary distance estimation, Bioinformatics, 2019; 35(1):133–136,

MuGI

Index allowing queries for a collection of multiple genome sequences.

• Danek, A., Deorowicz, S., Grabowski, Sz., Indexes of Large Genome Collections on a PC, PLOS ONE, 2014; 9(10):e109384,

ORCOM

Experimental compressor of sequencing reads.

• Grabowski, Sz., Deorowicz, S., Roguski, L., Disk-based compression of data from genome sequencing, Bioinformatics, 2014; 31(9):1389–1395,

PgSA

Index allowing queries for a collection of sequencing reads.

• Kowalski, T., Grabowski, Sz., Deorowicz, S., Indexing arbitrary-length k-mers in sequencing reads, PLOS ONE, 2015; 10(7):1–16,

QuickProbs

Multiple sequence alignment designed especially for GPU.

• Gudys, A., Deorowicz, S., QuickProbs 2: towards rapid construction of high-quality alignments of large protein families, Scientific Reports, 2017; 7(41553):
• Gudys, A., Deorowicz, S., QuickProbs – A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors, PLOS ONE, 2014; 9(2):e88901,

RECKONER

Read error corrector.

• Długosz, M., Deorowicz, S., RECKONER: read error corrector based on KMC, Bioinformatics, 2017; 33(7):1086–1089,

TGC

Compressor of collections of genomes given in Variant Call Format (VCF) files.

• Deorowicz, S., Danek, A., Grabowski, Sz., Genome compression: a novel approach for large collections, Bioinformatics, 2013; 29(20):2572–2578,

VCFShark

Compressor of VCF files.

• Deorowicz, S., Danek, A., VCFShark: how to squeeze a VCF file, bioRxiv.org, 2020

Whisper

Experimental mapper of whole genome sequencing data.

• Deorowicz, S., Gudys, A., Whisper 2: indel-sensitive short read mapping, bioRxiv.org, 2019
• Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz., Whisper: read sorting allows robust mapping of DNA sequencing data, Bioinformatics, 2019; 35(12):2043–2050,
• Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz., Robust mapping of whole genome sequencing data, Poster at The Biology of Genomes Conference, 2017

REST API

Neel — Mon, 04 Oct 2021 12:46:40 -0500

Sample clients for the Representational State Transfer (REST) service are provided for a number of programming languages. For details of how to use these clients, download the client and run the program without any arguments.

Language	Download	Requirements
Perl	psiblast.pl	LWP and XML::Simple
Python	psiblast.py	xmltramp2

For details, see the Environment setup for REST Web Services and Examples for Perl REST Web Services Clients pages.

Frequently used bioinformatics tools for viral genome analysis !

Neel — Wed, 23 Jun 2021 07:40:41 -0500

IVA: accurate de novo assembly of RNA virus genomes.
Hunt M, Gall A, Ong SH, Brener J, Ferns B, Goulder P, Nastouli E, Keane JA, Kellam P, Otto TD.
Bioinformatics. 2015 Jul 15;31(14):2374-6. doi: 10.1093/bioinformatics/btv120. Epub 2015 Feb 28.

Adapter sequences:
Optimal enzymes for amplifying sequencing libraries.
Quail, M. A. et al. Nat. Methods 9, 10-1 (2012).

GAGE:
GAGE: A critical evaluation of genome assemblies and assembly algorithms.
Salzberg, S. L. et al. Genome Res. 22, 557-67 (2012).

KMC:
Disk-based k-mer counting on a PC.
Deorowicz, S., Debudaj-Grabysz, A. & Grabowski, S. BMC Bioinformatics 14, 160 (2013).

Kraken:
Kraken: ultrafast metagenomic sequence classification using exact alignments.
Wood, D. E. & Salzberg, S. L. Genome Biol. 15, R46 (2014).

MUMmer:
Versatile and open software for comparing large genomes.
Kurtz, S. et al. Genome Biol. 5, R12 (2004).

R:
R: A language and environment for statistical computing.
R Core Team (2013). R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

RATT:
RATT: Rapid Annotation Transfer Tool.
Otto, T. D., Dillon, G. P., Degrave, W. S. & Berriman, M. Nucleic Acids Res. 39, e57 (2011).

SAMtools:
The Sequence Alignment/Map format and SAMtools.
Li, H. et al. Bioinformatics 25, 2078-9 (2009).

Trimmomatic:
Trimmomatic: A flexible trimmer for Illumina Sequence Data.
Bolger, A. M., Lohse, M. & Usadel, B. Bioinformatics 1-7 (2014).
