BOL: Related items

PHYMMBL

Jit — Mon, 10 Oct 2016 08:56:34 -0500

Metagenomics sequencing projects collect samples of DNA from uncharacterized environments that may contain hundreds or even thousands of species. One of the main challenges in analyzing a metagenome is phylogenetic classification of raw sequence reads into groups representing the same or similar species. Such classification is a useful prerequisite for genome assembly and for analysis of the biological diversity present in a sample. The newest sequencing technologies have simultaneously made metagenomics easier, by making the sequencing process faster, and more difficult, by producing shorter read lengths than previous technologies. Methods for classifying sequences as short as 100 base pairs (bp) have until now been relatively inaccurate, requiring metagenomics projects to use older, long-read technologies. Phymm, a new classification approach for metagenomics data which uses interpolated Markov models (IMMs) to taxonomically classify DNA sequences, can accurately classify reads as short as 100 bp. Its accuracy for short reads represents a significant leap forward over previous composition-based classification methods. PhymmBL (rhymes with "thimble"), the hybrid classifier included in this distribution which combines analysis from both Phymm and BLAST, produces even higher accuracy.

Address of the bookmark: http://www.cbcb.umd.edu/software/phymm/

A Beginner's Guide to Using Kraken for Taxonomic Classification

Neel — Fri, 13 Dec 2024 11:29:03 -0600

Kraken is a popular bioinformatics tool designed for fast and accurate taxonomic classification of metagenomic sequences. Its efficiency and precision make it a go-to resource for analyzing microbial communities, including bacteria, viruses, archaea, and fungi. Whether you're new to bioinformatics or experienced in the field, Kraken is an indispensable tool for taxonomic analysis.

In this post, we'll walk through the basics of Kraken, from installation to running an analysis, and highlight its key features and applications.

What is Kraken?

Kraken is a sequence classification tool that assigns taxonomic labels to DNA sequences using exact k-mer matching. It uses a reference database of genomes, dividing sequences into k-mers and identifying matches in a computationally efficient way.
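
As a rough illustration of exact k-mer matching, the sketch below builds a toy k-mer index from two made-up reference sequences and labels a read by majority vote over its k-mers. This is not Kraken's implementation: Kraken stores k-mers in a compact database and maps shared k-mers to the lowest common ancestor (LCA) of the genomes containing them, whereas this sketch simply ignores shared k-mers. The sequences, labels, and function names are all hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index

def classify(read, index, k):
    """Label a read by the reference matched by most of its k-mers.

    k-mers shared by several references are skipped in this toy version;
    Kraken would assign them to an LCA in the taxonomy instead.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer, set())
        if len(labels) == 1:
            votes[next(iter(labels))] += 1
    if not votes:
        return "unclassified"
    return votes.most_common(1)[0][0]

# Toy data: two made-up "genomes".
references = {
    "species_A": "ACGTACGTTAGCCGATAGCT",
    "species_B": "TTGACCGGTAACCGTTAGGA",
}
index = build_index(references, k=5)
print(classify("CGTTAGCCGATA", index, k=5))  # read taken from species_A
print(classify("GGGGGGGGGG", index, k=5))    # no k-mer in the index
```

Real databases use much longer k-mers (Kraken's default is 31), which is what makes an exact match informative.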

Key Features of Kraken

Speed: Kraken processes data much faster than alignment-based methods.
Accuracy: It uses a precise k-mer matching algorithm for high-resolution taxonomic assignments.
Scalability: It can handle large metagenomic datasets.
Custom Databases: You can build and use custom databases tailored to your research needs.

Installing Kraken

System Requirements
- A Unix-based operating system (Linux/macOS).
- Sufficient computational resources for database building (RAM and disk space).

Installation Steps
- Clone the Kraken repository from GitHub:
  
  git clone https://github.com/DerrickWood/kraken.git
  cd kraken
- Build and install the Kraken programs with the bundled install script, giving it a target directory:
  
  ./install_kraken.sh /path/to/kraken
- Add that directory to your PATH for easy access:
  
  export PATH=$PATH:/path/to/kraken

Preparing a Database

Kraken requires a database of reference genomes. You can use a pre-built database or create a custom one.

Downloading a Pre-built Database
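Kraken offers pre-built databases, such as the MiniKraken database, which is lightweight and suited to machines with limited memory. MiniKraken is distributed as a compressed archive on the Kraken website rather than through kraken-build. After downloading it, unpack the archive and pass the extracted directory to --db (the archive name below is a placeholder):

tar -xzf minikraken.tgz

Building a Custom Database
To build your own database, download the NCBI taxonomy and one or more reference libraries, then run the build step. Your own FASTA files can be added with --add-to-library before building:

kraken-build --download-taxonomy --db my_database
kraken-build --download-library bacteria --db my_database
kraken-build --build --threads 4 --db my_database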

This process may take considerable time and resources, depending on the size of the database.

Running Kraken

Once the database is ready, you can classify sequences.

Basic Usage
Use the following command to classify sequences:

kraken --db my_database --threads 4 --fastq-input input_sequences.fastq --output kraken_output.txt

Key options:
- --db: Specifies the database.
- --threads: Number of threads for parallel processing.
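- --fastq-input: Indicates that the input is in FASTQ format (FASTA is the default).
- --output: File to write the classification results to.

Interpreting Results
Kraken writes one tab-separated line per sequence with five fields: a C or U flag (classified or unclassified), the sequence ID, the assigned taxonomy ID (0 if unclassified), the sequence length in bp, and a space-separated list showing the LCA mapping of each k-mer. The standard output does not include a confidence score.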
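
A minimal sketch of reading such a file, assuming the five tab-separated columns that Kraken 1 writes for single-end reads (C/U flag, sequence ID, taxonomy ID, sequence length, and the per-k-mer LCA mapping). The sample lines and the helper names are made up for illustration.

```python
from collections import Counter, namedtuple

KrakenRecord = namedtuple(
    "KrakenRecord", ["classified", "seq_id", "tax_id", "length", "lca_map"]
)

def parse_kraken_line(line):
    """Split one Kraken output line into its five tab-separated fields."""
    flag, seq_id, tax_id, length, lca_map = line.rstrip("\n").split("\t")
    return KrakenRecord(flag == "C", seq_id, int(tax_id), int(length), lca_map)

def count_taxa(lines):
    """Count classified reads per taxonomy ID."""
    counts = Counter()
    for line in lines:
        record = parse_kraken_line(line)
        if record.classified:
            counts[record.tax_id] += 1
    return counts

# Made-up example lines in the assumed format.
sample = [
    "C\tread1\t562\t100\t562:50 0:20",
    "C\tread2\t562\t100\t561:30 562:40",
    "U\tread3\t0\t100\t0:70",
]
print(count_taxa(sample))
```

For a real run, pass an open file handle instead of the list, for example `count_taxa(open("kraken_output.txt"))`.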

Visualizing Kraken Results

Kraken results can be visualized using tools like Krona or converted to human-readable reports using kraken-report.

Generate a Report

kraken-report --db my_database kraken_output.txt > kraken_report.txt

Krona Visualization
Install Krona, extract the sequence ID and taxonomy ID columns from the Kraken output, and import them:

cut -f2,3 kraken_output.txt > krona_input.txt
ktImportTaxonomy krona_input.txt -o krona_output.html

Open the HTML file in your browser to interactively explore the taxonomic classifications.

Advanced Usage

Confidence Thresholds
In Kraken 1, confidence thresholds are applied after classification with kraken-filter and its --threshold option (the --confidence flag belongs to Kraken 2). Higher values reduce false positives but may miss some true positives:

kraken-filter --db my_database --threshold 0.1 kraken_output.txt > kraken_filtered.txt
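
The scoring behind such a threshold can be sketched as follows. This is a simplified illustration, assuming the rule described in the Kraken manual: a label's score is the fraction of a read's non-ambiguous k-mers that map inside the clade rooted at that label, and the label is moved up the tree until the score reaches the threshold. The toy taxonomy, the LCA string, and the function names are hypothetical.

```python
def parse_lca_map(lca_map):
    """Turn '562:6 561:2 0:2 A:5' into [(taxon, count), ...].

    Taxon 'A' marks k-mers with ambiguous bases, '0' marks k-mers
    that are not in the database.
    """
    pairs = []
    for token in lca_map.split():
        taxon, count = token.split(":")
        pairs.append((taxon, int(count)))
    return pairs

def in_clade(taxon, root, parents):
    """True if taxon is root or a descendant of root."""
    while True:
        if taxon == root:
            return True
        if taxon not in parents or parents[taxon] == taxon:
            return False
        taxon = parents[taxon]

def filter_label(label, lca_map, parents, threshold):
    """Return the lowest ancestor of label whose score meets threshold, or 0."""
    pairs = parse_lca_map(lca_map)
    total = sum(count for taxon, count in pairs if taxon != "A")
    node = label
    while total > 0:
        hits = sum(
            count for taxon, count in pairs
            if taxon not in ("A", "0") and in_clade(int(taxon), node, parents)
        )
        if hits / total >= threshold:
            return node
        if parents.get(node, node) == node:  # reached the root
            break
        node = parents[node]
    return 0  # unclassified

# Toy taxonomy (child -> parent); the root is its own parent.
parents = {1: 1, 561: 1, 562: 561}
lca_map = "562:6 561:2 0:2 A:5"
for threshold in (0.5, 0.7, 0.9):
    print(threshold, filter_label(562, lca_map, parents, threshold))
```

With these toy numbers the label stays at 562 for a threshold of 0.5, moves up to 561 at 0.7, and becomes unclassified (0) at 0.9.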

Paired-End Reads
For paired-end sequencing data in FASTQ format, use:

kraken --db my_database --fastq-input --paired reads_1.fastq reads_2.fastq

Customizing K-mers
Kraken allows you to set a custom k-mer length during database building (the --kmer-len option of kraken-build) for specific applications.

Applications of Kraken

Microbial Ecology: Characterizing microbial communities in soil, water, and the human microbiome.
Pathogen Detection: Identifying pathogens in clinical samples.
Fungal Research: Analyzing fungal diversity in metagenomic datasets.
Environmental Monitoring: Tracking microbial populations in diverse habitats.

Conclusion

Kraken is a versatile and efficient tool for taxonomic classification in metagenomics. Its speed, accuracy, and flexibility make it a favorite among bioinformaticians. By following this guide, you can set up and use Kraken to unlock insights into microbial and fungal communities, paving the way for discoveries in ecology, medicine, and biotechnology.

CLARK: Fast, accurate and versatile sequence classification system

Jit — Sat, 15 Feb 2020 01:49:01 -0600

CLARK is a method for supervised sequence classification based on discriminative k-mers. On two distinct classification problems (see the article for details), namely (1) the taxonomic classification of metagenomic reads against known bacterial genomes and (2) the assignment of BAC clones and transcripts to chromosome arms or centromeres in the absence of a finished assembly of the reference genome, CLARK outperforms the best state-of-the-art methods in classification speed and precision.
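
The idea of discriminative k-mers can be sketched as follows, on the assumption that a k-mer is discriminative when it occurs in exactly one target. This is only a toy illustration of the concept, not CLARK's code; the target sequences and function names are made up.

```python
def kmer_set(seq, k):
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def discriminative_kmers(targets, k):
    """Return, per target, the k-mers found in that target and in no other."""
    all_sets = {name: kmer_set(seq, k) for name, seq in targets.items()}
    result = {}
    for name, own in all_sets.items():
        others = set().union(*(s for n, s in all_sets.items() if n != name))
        result[name] = own - others
    return result

def assign(read, disc, k):
    """Assign a read to the target sharing the most discriminative k-mers."""
    read_kmers = kmer_set(read, k)
    hits = {name: len(read_kmers & ks) for name, ks in disc.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None

# Two made-up targets that share the prefix AAACCC.
targets = {"target_1": "AAACCCGGGTTT", "target_2": "AAACCCTTTGGG"}
disc = discriminative_kmers(targets, k=4)
print(assign("CCCGGGT", disc, k=4))  # only target_1 k-mers
print(assign("AAACCC", disc, k=4))   # only shared k-mers, so no assignment
```

Dropping shared k-mers up front keeps the index small and means every hit is evidence for exactly one target.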

Address of the bookmark: http://clark.cs.ucr.edu/Spaced/

Tiara: deep learning-based classification system for eukaryotic sequences

Rahul Nayak — Mon, 14 Mar 2022 23:02:11 -0500

With a large number of metagenomic datasets becoming available, eukaryotic metagenomics emerged as a new challenge. The proper classification of eukaryotic nuclear and organellar genomes is an essential step toward a better understanding of eukaryotic diversity.

Address of the bookmark: https://academic.oup.com/bioinformatics/article/38/2/344/6375939

Metabuli improves metagenomic read classification

Abhi — Sat, 03 Jun 2023 20:15:04 -0500

Metabuli improves metagenomic read classification through metamers (joint DNA-AA k-mers), making it both sensitive and specific, recovering 99% and 98% of DNA or AA classifiers.

Metabuli is a metagenomic classifier that jointly analyzes DNA and amino acid (AA) sequences. DNA-based classifiers can make specific classifications, exploiting point mutations to distinguish close taxa. AA-based classifiers are more sensitive in detecting homology between query and reference sequences because AA sequences are more conserved. Metabuli combines the information from both sequence types using a novel k-mer structure, the metamer, to enable both specific and sensitive characterization of metagenomic samples. In addition, it can classify reads against a database of any size, as long as the database fits on the hard disk.

Address of the bookmark: https://github.com/steineggerlab/Metabuli

Pango Lineage Analysis

Abhi — Mon, 15 Nov 2021 03:38:29 -0600

The Pango nomenclature is being used by researchers and public health agencies worldwide to track the transmission and spread of SARS-CoV-2, including variants of concern. This website documents all current Pango lineages and their spread, as well as various software tools that researchers can use to analyze SARS-CoV-2 sequence data.

Address of the bookmark: https://cov-lineages.org/resources/pangolin/output.html

RNAcon: web-server for the prediction and classification of non-coding RNAs

Shruti Paniwala — Mon, 17 Jul 2017 04:55:11 -0500

RNAcon is a web server for the prediction and classification of non-coding RNAs (ncRNAs). It uses an SVM-based model to discriminate between coding and non-coding RNAs and a Random Forest-based model to classify ncRNAs into different classes. The prediction models were developed using graph properties derived from structural information.

The standalone version (Linux-based command-line) of RNAcon is freely available for the global scientific community.

Reference: Panwar, B., Arora, A. and Raghava, G.P.S. (2014) Prediction and classification of ncRNAs using structural information. BMC Genomics, 15:127.

Address of the bookmark: http://crdd.osdd.net/raghava/rnacon/

CAT/BAT: tool for taxonomic classification of contigs and metagenome-assembled genomes (MAGs)

Jit — Mon, 18 May 2020 10:53:32 -0500

Contig Annotation Tool (CAT) and Bin Annotation Tool (BAT) are pipelines for the taxonomic classification of long DNA sequences and metagenome-assembled genomes (MAGs/bins) of both known and (highly) unknown microorganisms, as generated by contemporary metagenomics studies. The core algorithm of both programs involves gene calling, mapping of predicted ORFs against the nr protein database, and voting-based classification of the entire contig or MAG based on the classification of the individual ORFs. CAT and BAT can be run from intermediate steps if the files are formatted appropriately (see Usage).

Address of the bookmark: https://github.com/dutilh/CAT

Understanding DUMP files from the NCBI Taxonomy database

Shruti Paniwala — Fri, 15 Jul 2022 04:29:05 -0500

*.dmp files are bcp-like dumps from the GenBank taxonomy database.

General information.

Field terminator is "\t|\t"

Row terminator is "\t|\n"

The nodes.dmp file consists of taxonomy nodes. The description for each node includes the following fields:

tax_id -- node id in GenBank taxonomy database

parent tax_id -- parent node id in GenBank taxonomy database

rank -- rank of this node (superkingdom, kingdom, ...)

embl code -- locus-name prefix; not unique

division id -- see division.dmp file

inherited div flag (1 or 0) -- 1 if node inherits division from parent

genetic code id -- see gencode.dmp file

inherited GC flag (1 or 0) -- 1 if node inherits genetic code from parent

mitochondrial genetic code id -- see gencode.dmp file

inherited MGC flag (1 or 0) -- 1 if node inherits mitochondrial gencode from parent

GenBank hidden flag (1 or 0) -- 1 if name is suppressed in GenBank entry lineage

hidden subtree root flag (1 or 0) -- 1 if this subtree has no sequence data yet

comments -- free-text comments and citations
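
A minimal sketch of reading nodes.dmp with the terminators given above, building a child-to-parent map and walking a lineage up to the root. The sample rows are abbreviated to the first three fields and are for illustration only; the function names are hypothetical.

```python
def parse_dmp_line(line):
    # Rows end with "\t|\n" and fields are separated by "\t|\t".
    if line.endswith("\n"):
        line = line[:-1]
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")

def load_nodes(lines):
    """Return ({tax_id: parent_tax_id}, {tax_id: rank}) from nodes.dmp rows."""
    parents, ranks = {}, {}
    for line in lines:
        fields = parse_dmp_line(line)
        tax_id, parent_id, rank = int(fields[0]), int(fields[1]), fields[2]
        parents[tax_id] = parent_id
        ranks[tax_id] = rank
    return parents, ranks

def lineage(tax_id, parents):
    """Walk from tax_id up to the root (the node that is its own parent)."""
    path = [tax_id]
    while parents[tax_id] != tax_id:
        tax_id = parents[tax_id]
        path.append(tax_id)
    return path

# Abbreviated sample rows (only tax_id, parent tax_id, rank).
sample = [
    "1\t|\t1\t|\tno rank\t|\n",
    "131567\t|\t1\t|\tno rank\t|\n",
    "2\t|\t131567\t|\tsuperkingdom\t|\n",
]
parents, ranks = load_nodes(sample)
print(lineage(2, parents))  # [2, 131567, 1]
```

For the real file, pass an open file handle, for example `load_nodes(open("nodes.dmp"))`.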

Taxonomy names file (names.dmp):

tax_id -- the id of node associated with this name

name_txt -- name itself

unique name -- the unique variant of this name if name not unique

name class -- (synonym, common name, ...)
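
Because a node can have several names, a lookup table usually keeps only one name class. The sketch below, with made-up sample rows and hypothetical function names, keeps the rows whose name class is "scientific name".

```python
def split_row(line):
    # Drop the "\t|\n" row terminator, then split on the "\t|\t" field terminator.
    if line.endswith("\n"):
        line = line[:-1]
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")

def scientific_names(lines):
    """Return {tax_id: name_txt} for rows whose name class is 'scientific name'."""
    names = {}
    for line in lines:
        tax_id, name_txt, _unique_name, name_class = split_row(line)
        if name_class == "scientific name":
            names[int(tax_id)] = name_txt
    return names

# Made-up sample rows: tax_id, name_txt, unique name, name class.
sample_names = [
    "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n",
    "562\t|\tEscherichia coli\t|\t\t|\tscientific name\t|\n",
    "562\t|\tE. coli\t|\t\t|\tcommon name\t|\n",
]
names = scientific_names(sample_names)
print(names[562])  # Escherichia coli
```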

Divisions file (division.dmp):

division id -- taxonomy database division id

division cde -- GenBank division code (three characters), e.g. BCT, PLN, VRT, MAM, PRI...

division name -- full name of the division

comments

Genetic codes file (gencode.dmp):

genetic code id -- GenBank genetic code id

abbreviation -- genetic code name abbreviation

name -- genetic code name

cde -- translation table for this genetic code

starts -- start codons for this genetic code

Deleted nodes file (delnodes.dmp):

tax_id -- deleted node id

Merged nodes file (merged.dmp):

old_tax_id -- id of the node that has been merged

new_tax_id -- id of the node that results from the merge

Citations file (citations.dmp):

cit_id -- the unique id of citation

cit_key -- citation key

pubmed_id -- unique id in PubMed database (0 if not in PubMed)

medline_id -- unique id in MedLine database (0 if not in MedLine)

url -- URL associated with citation

text -- any text (usually article name and authors).

-- The following characters are escaped in this text by a backslash:

-- newline (appear as "\n"),

-- tab character ("\t"),

-- double quotes ('\"'),

-- backslash character ("\\").

taxid_list -- list of node ids separated by a single space

Mosquito species known for transmitting the Dengue virus

BioStar — Wed, 03 Apr 2024 00:05:51 -0500

Here is a list of mosquito species known for transmitting the Dengue virus along with essential and applied information about each species:

1. Aedes aegypti:
- Geographical Distribution: Found in tropical and subtropical regions worldwide.
- Biting Behavior: Daytime biter, prefers feeding indoors, often around human dwellings.
- Role in Dengue Transmission: Primary vector responsible for transmitting Dengue virus to humans.

2. Aedes albopictus (Asian tiger mosquito):
- Geographical Distribution: Found in tropical, subtropical, and temperate regions worldwide.
- Biting Behavior: Daytime biter, feeds both indoors and outdoors, aggressive feeder.
- Role in Dengue Transmission: Secondary vector, can transmit Dengue virus to humans.

3. Aedes polynesiensis:
- Geographical Distribution: Found in Pacific Islands and coastal regions.
- Biting Behavior: Daytime biter, prefers feeding outdoors, often near coastal areas.
- Role in Dengue Transmission: Vector of Dengue virus in specific geographic regions.

4. Aedes scutellaris:
- Geographical Distribution: Found in Southeast Asia, Pacific Islands, and coastal regions.
- Biting Behavior: Daytime feeder, active in shaded areas, prefers outdoor environments.
- Role in Dengue Transmission: Vector of Dengue virus, particularly in coastal areas.

5. Aedes africanus:
- Geographical Distribution: Found in parts of Africa, including forested areas.
- Biting Behavior: Daytime feeder, prefers shaded areas, bites humans and other animals.
- Role in Dengue Transmission: Vector of Dengue virus in African regions.

Understanding the geographical distribution and biting behavior of these mosquito species is crucial for implementing effective control and prevention strategies to reduce Dengue virus transmission.