BOL: Related items

BLAST+ 2.11.0 release is now available on the FTP site!

Jit — Sat, 14 Nov 2020 21:37:53 -0600

BLAST+ 2.11.0 release is now available from our FTP site. The main advance is the ability to provide usage reports to NCBI to help us improve BLAST. This information is limited to the name of the BLAST program, some basic database metadata, a few BLAST parameters, as well as the number and total size of your queries. See the Privacy document for more details on the information we collect, how we will use it, and how you can opt out of reporting.

Another new feature allows threading by query batch in rpsblast/rpstblastn. Enabling this option using -mt_mode 1 provides more efficient searching with large numbers of queries. See the release notes for details on more improvements and bug fixes.

Useful Links
------------
NCBI Insights: https://ncbiinsights.ncbi.nlm.nih.gov/2020/11/12/blast-2-11-0/

BLAST FTP: https://go.usa.gov/x7QQ3
Privacy document: https://go.usa.gov/x7QQe
Release notes: https://go.usa.gov/x7Qnv

Chemical Elements of Bioinformatics

Rahul Agarwal — Tue, 03 Sep 2013 16:35:39 -0500

You are probably familiar with the periodic table and its colour scheme, but this time you will be amazed by a new table of elements from Eagle Genomics. Check it out and have fun :)

http://elements.eaglegenomics.com/

TwinBLAST: When Two Is Better than One

Jit — Sat, 07 Sep 2019 08:50:08 -0500

TwinBLAST is a web-based tool for viewing 2 BLAST reports simultaneously side-by-side. It uses ExtJS (www.sencha.com/products/extjs/) to provide 2 independently scrollable panels. BioPerl (www.bioperl.org) is used to index raw BLAST reports and Bio::Graphics is used to draw pictograms of the BLAST hits.

https://github.com/IGS/twinblast

https://mra.asm.org/content/8/35/e00842-19

Address of the bookmark: https://github.com/IGS/twinblast

SOWDHAMINI Lab

Sun, 15 Sep 2013 09:19:12 -0500

Genome sequencing projects have enormous potential for benefiting human endeavors. However, just as acquiring a language's vocabulary does not enable one to speak it, databases that list the amino acid composition of proteins do not directly tell us much about these proteins' higher-level structure and function. The most productive way to indirectly exploit these databases has been to start with the small number of proteins that are fully-characterised and to assume that other "similar" proteins will have a related structure and function. Proteins with very similar amino acid sequence are "no-brainers", but the real test, which our group largely focuses on, is to detect the "essential" similarity in proteins whose non-critical sections have experienced random rearrangements during evolution. In such cases functionally similar proteins may have less than 25% sequence overlap.

More @ http://www.ncbs.res.in/sowdhamini/groups_sowdhamini.htm

BLAST+ 5: Key Updates and Enhancements for Modern Bioinformatics

LEGE — Sat, 07 Dec 2024 22:37:48 -0600

The BLAST+ 5 (Basic Local Alignment Search Tool) update has introduced several key enhancements aimed at improving performance, user experience, and compatibility with evolving genomic data standards. Here are the major updates:

Database Enhancements:
- The BLAST databases have shifted fully to the version 5 (v5) format, which integrates built-in taxonomy information. This allows for more detailed and efficient sequence annotation and analysis.
- Protein databases in v5 are now accession-based, supporting a broader range of sequences, including those from high-throughput projects and the Pathogen Detection Project. These databases also accommodate structural proteins with multi-character chain identifiers.
Performance Improvements:
- Adaptive Composition-Based Statistics (CBS) is available as an experimental feature, enhancing the detection of novel results in protein-protein comparisons.
- Updated algorithms improve the stability of search results, especially when fewer hits are requested than the default output.
Compatibility:
- Support for the older v4 databases has been discontinued. The v5 format is now the default for all BLAST database updates, ensuring alignment with current standards in bioinformatics.
User-Friendly Changes:
- Naming conventions for databases have been simplified to enhance clarity and ease of use. For example, database names no longer include version tags like "_v5".
Future-Proofing:
- BLAST+ 5 aligns with current and upcoming data requirements, ensuring that researchers have access to the most comprehensive and modern resources for sequence alignment.

These updates reflect NCBI's commitment to maintaining BLAST as a leading tool for sequence analysis. For detailed release notes and additional guidance, refer to NCBI Insights.

Google Genomics

Tenzin Paul — Thu, 18 Dec 2014 11:05:42 -0600

Explore genetic variation interactively. Compare entire cohorts in seconds with SQL-like queries. Compute transition/transversion ratios, genome-wide association, allelic frequency and more.
Process big genomic data easily. Run batch analyses like principal component analysis and Hardy-Weinberg equilibrium on as many samples as you like, in minutes or hours, with just a little code.
Use Google's infrastructure and big data expertise. Store one genome or a million using Google Genomics and take advantage of the same infrastructure that powers Search, Maps, YouTube, Gmail and Drive.
Support emerging global standards. Google Genomics is implementing the API defined by the Global Alliance for Genomics and Health for visualization, analysis and more. Compliant software can access Google Genomics, local servers, or any other implementation.

Address of the bookmark: https://cloud.google.com/genomics/

Genomics public data links!

Jit — Thu, 13 Feb 2020 00:20:00 -0600

List of publicly available databases on Google servers.

More at https://software.broadinstitute.org/gatk/download/bundle

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/GATK/

ftp://ftp.broadinstitute.org/bundle/hg38/hg38bundle/

Address of the bookmark: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0?pli=1

A Beginner's Guide to Using Kraken for Taxonomic Classification

Neel — Fri, 13 Dec 2024 11:29:03 -0600

Kraken is a popular bioinformatics tool designed for fast and accurate taxonomic classification of metagenomic sequences. Its efficiency and precision make it a go-to resource for analyzing microbial communities, including bacteria, viruses, archaea, and fungi. Whether you're new to bioinformatics or experienced in the field, Kraken is an indispensable tool for taxonomic analysis.

In this blog, we’ll walk through the basics of Kraken, from installation to running an analysis, and highlight its key features and applications.

What is Kraken?

Kraken is a sequence classification tool that assigns taxonomic labels to DNA sequences using exact k-mer matching. It uses a reference database of genomes, dividing sequences into k-mers and identifying matches in a computationally efficient way.
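
To make the idea of exact k-mer matching concrete, here is a minimal Python sketch. It is not Kraken's implementation: Kraken maps each k-mer to the lowest common ancestor (LCA) of the genomes that contain it, whereas this toy version simply ignores k-mers shared between taxa and takes a majority vote. The reference sequences, taxon labels, and function names are all made up for illustration.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(references, k):
    """Map each k-mer to the set of taxon labels whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index

def classify(read, index, k):
    """Return the taxon with the most uniquely matching k-mers, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer)
        if taxa and len(taxa) == 1:  # simplification: skip k-mers shared between taxa
            votes[next(iter(taxa))] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy references; a real database holds whole genomes and uses k = 31.
references = {
    "taxon_A": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_B": "TTGACCGGTAAACCCGGGTTTAAACG",
}
index = build_index(references, k=5)

print(classify("GTTAGCCGATAG", index, k=5))  # read taken from the taxon_A reference
print(classify("GGGGGGGGGGGG", index, k=5))  # no k-mer matches either reference
```

Because only exact dictionary lookups are involved, the cost per read grows with the number of k-mers rather than with the size of an alignment, which is where the speed advantage over alignment-based methods comes from.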

Key Features of Kraken

Speed: Kraken processes data much faster than alignment-based methods.
Accuracy: It uses a precise k-mer matching algorithm for high-resolution taxonomic assignments.
Scalability: It can handle large metagenomic datasets.
Custom Databases: You can build and use custom databases tailored to your research needs.

Installing Kraken

System Requirements
- A Unix-based operating system (Linux/macOS).
- Sufficient computational resources for database building (RAM and disk space).
Installation Steps
- Clone the Kraken repository from GitHub:
  
  git clone https://github.com/DerrickWood/kraken.git
  cd kraken
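- Compile and install the Kraken programs with the bundled install script, giving it the directory where they should be placed:
  
  ./install_kraken.sh /path/to/kraken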
- Add Kraken to your PATH for easy access:
  
  export PATH=$PATH:/path/to/kraken

Preparing a Database

Kraken requires a database of reference genomes. You can use a pre-built database or create a custom one.

Downloading a Pre-built Database
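Kraken offers pre-built databases, such as the MiniKraken database, which is lightweight and suitable for smaller datasets. MiniKraken is distributed as a compressed archive from the Kraken website rather than through kraken-build, whose --download-library option fetches genome libraries such as bacteria or viruses. After downloading the archive, unpack it and pass the extracted directory to --db (the file name below is a placeholder):

tar -xzf minikraken.tgz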
Building a Custom Database
To build your own database, download the NCBI taxonomy and a genome library, then build the database:

kraken-build --download-taxonomy --db my_database
kraken-build --download-library bacteria --db my_database
kraken-build --build --threads 4 --db my_database

This process may take considerable time and resources, depending on the size of the database.

Running Kraken

Once the database is ready, you can classify sequences.

Basic Usage
Use the following command to classify sequences:

kraken --db my_database --threads 4 --fastq-input input_sequences.fastq --output kraken_output.txt

Key options:
- --db: Specifies the database.
- --threads: Number of threads for parallel processing.
- --fastq-input: Indicates that the input is in FASTQ format (FASTA is assumed otherwise).
Interpreting Results
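Kraken writes one tab-delimited line per sequence with five fields: a C/U flag (classified or unclassified), the sequence ID, the assigned taxonomy ID (0 if unclassified), the sequence length in bp, and a space-separated list showing the LCA mapping of each k-mer. The output itself does not contain a confidence score.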
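
As a rough illustration of working with this file, the Python sketch below parses the output and counts classified reads per taxonomy ID. It assumes the standard five-column, tab-delimited Kraken 1 layout (C/U flag, sequence ID, taxonomy ID, length, k-mer LCA mappings); the sample lines, read names, and function name are invented for the example.

```python
import io
from collections import Counter

def parse_kraken_output(handle):
    """Yield one dict per read from tab-delimited Kraken output."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        status, seq_id, taxid, length, lca = fields[:5]
        yield {
            "classified": status == "C",  # "C" = classified, "U" = unclassified
            "seq_id": seq_id,
            "taxid": int(taxid),          # 0 when the read is unclassified
            "length": length,             # kept as text; format may vary for paired reads
            "lca_mappings": lca,          # e.g. "562:40 561:10 0:20"
        }

# Hypothetical three-read example in the assumed five-column layout.
SAMPLE = (
    "C\tread1\t562\t100\t562:40 561:10 0:20\n"
    "U\tread2\t0\t100\t0:70\n"
    "C\tread3\t562\t100\t562:70\n"
)

records = list(parse_kraken_output(io.StringIO(SAMPLE)))
reads_per_taxon = Counter(r["taxid"] for r in records if r["classified"])
print(reads_per_taxon)
```

For a real run, replace the io.StringIO(SAMPLE) handle with open("kraken_output.txt").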

Visualizing Kraken Results

Kraken results can be visualized using tools like Krona or converted to human-readable reports using kraken-report.

Generate a Report

kraken-report --db my_database kraken_output.txt > kraken_report.txt
Krona Visualization
Install Krona and convert Kraken output for visualization:

cut -f2,3 kraken_output.txt > kraken_krona_input.txt
ktImportTaxonomy kraken_krona_input.txt -o krona_output.html

Open the HTML file in your browser to interactively explore the taxonomic classifications.

Advanced Usage

Confidence Thresholds
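In Kraken 1, confidence thresholds are applied after classification with the kraken-filter script and its --threshold option; the --confidence option of the classifier itself belongs to Kraken 2. Higher values reduce false positives but may miss some true positives:

kraken-filter --db my_database --threshold 0.1 kraken_output.txt > kraken_filtered.txt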
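
The score behind such a threshold can be sketched in Python as follows. Per the Kraken manual, a label's score is C/Q, where C is the number of k-mers whose LCA lies in the clade rooted at the label and Q is the number of k-mers without ambiguous bases. This sketch assumes single-end Kraken output and takes the set of taxonomy IDs in the clade as given (a real tool would derive it from the taxonomy tree); the function name and example values are hypothetical.

```python
def confidence(lca_mappings, clade_taxids):
    """Fraction of unambiguous k-mers whose LCA falls inside clade_taxids.

    lca_mappings is the fifth output column, e.g. "562:40 561:10 0:20 A:5".
    """
    in_clade = 0
    total = 0
    for token in lca_mappings.split():
        taxid, count = token.split(":")
        if taxid == "A":  # k-mers containing ambiguous bases are excluded from Q
            continue
        count = int(count)
        total += count
        if int(taxid) in clade_taxids:
            in_clade += count
    return in_clade / total if total else 0.0

# Hypothetical read labelled 562; the clade is assumed to contain only 562 here.
score = confidence("562:40 561:10 0:20 A:5", {562})
print(round(score, 3))
print(score >= 0.1)  # would this label survive a threshold of 0.1?
```

Note that this only computes the score for one label. When a label fails the threshold, the actual filter moves up the taxonomy to an ancestor whose score is high enough, which this sketch does not attempt.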
Paired-End Reads
For paired-end sequencing data, use:

kraken --db my_database --fastq-input --paired reads_1.fastq reads_2.fastq
Customizing K-mers
Kraken allows you to set a custom k-mer length during database building with the --kmer-len option of kraken-build (the default is 31), which can be useful for specific applications.

Applications of Kraken

Microbial Ecology: Characterizing microbial communities in soil, water, and the human microbiome.
Pathogen Detection: Identifying pathogens in clinical samples.
Fungal Research: Analyzing fungal diversity in metagenomic datasets.
Environmental Monitoring: Tracking microbial populations in diverse habitats.

Conclusion

Kraken is a versatile and efficient tool for taxonomic classification in metagenomics. Its speed, accuracy, and flexibility make it a favorite among bioinformaticians. By following this guide, you can set up and use Kraken to unlock insights into microbial and fungal communities, paving the way for discoveries in ecology, medicine, and biotechnology.

Smash: An alignment-free method to find and visualise rearrangements between pairs of DNA sequences

Jit — Tue, 26 Apr 2016 12:18:49 -0500

Smash is a completely alignment-free method/tool to find and visualise genomic rearrangements. The detection is based on conditional exclusive compression, namely using a finite-context model (FCM, a Markov model) of high context order (typically 20). For visualisation, Smash outputs an SVG image with an ideogram-style output architecture, where the patterns are represented with several HSV values (only the value varies). The method works at both small and large scale. It is nevertheless directed more at the large scale, since the main aim of the research is to know where the large-scale regions [chromosome by chromosome] of several primates are equal or different, giving a map of the entire genomes at a glance.
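
The compression idea can be illustrated with a small Python sketch. It is not Smash's code: it trains an order-k finite-context model on a reference only, then estimates how many bits per base a target would cost under that model, so that low values suggest the target resembles the reference. The context order of 3, the smoothing parameter alpha, the sequences, and the function names are all assumptions chosen to keep the toy example small.

```python
import math
from collections import Counter, defaultdict

ALPHABET_SIZE = 4  # A, C, G, T

def train_fcm(reference, order):
    """Count which base follows each length-`order` context in the reference."""
    counts = defaultdict(Counter)
    for i in range(order, len(reference)):
        counts[reference[i - order:i]][reference[i]] += 1
    return counts

def bits_per_base(target, counts, order, alpha=1 / 16):
    """Average cost in bits of encoding target using only the reference model."""
    total_bits = 0.0
    n = 0
    for i in range(order, len(target)):
        context, base = target[i - order:i], target[i]
        seen = counts.get(context)  # None when the context never occurred
        hits = seen[base] if seen else 0
        context_total = sum(seen.values()) if seen else 0
        # Smoothed probability estimate, so unseen events still get a finite cost.
        p = (hits + alpha) / (context_total + alpha * ALPHABET_SIZE)
        total_bits += -math.log2(p)
        n += 1
    return total_bits / n if n else 0.0

# Hypothetical toy sequences; real use would involve whole chromosomes and order ~20.
reference = "ACGTTGCAAGCTTAGGCTAACGGATCCATG"
model = train_fcm(reference, order=3)

print(bits_per_base(reference, model, order=3))  # target identical to the reference
print(bits_per_base("G" * 20, model, order=3))   # target sharing no context with it
```

To locate rearrangements rather than score a whole sequence, the per-base costs would be kept as a profile along the target and smoothed, with low-cost stretches marking regions shared with the reference.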

Address of the bookmark: http://bioinformatics.ua.pt/software/smash/

Spines

Jit — Mon, 28 Nov 2016 05:33:26 -0600

Spines is a collection of software tools, developed and used by the Vertebrate Genome Biology Group at the Broad Institute. It provides basic data structures for efficient data manipulation (mostly genomic sequences, alignments, variation etc.), as well as specialized tool sets for various analyses. It also features three sequence alignment packages: Satsuma, a highly parallelized program for high-sensitivity, genome-wide synteny; Papaya, an all-purpose alignment tool for less diverged sequences; and SLAP, a context-sensitive local aligner for diverged sequences with large gaps.

Address of the bookmark: https://www.broadinstitute.org/genome-sequencing-and-analysis/spines