BOL: Related items

A Step-by-Step Guide to Running BLAST Offline

LEGE — Sat, 07 Dec 2024 22:32:37 -0600

BLAST (Basic Local Alignment Search Tool) is a powerful algorithm used to compare nucleotide or protein sequences to sequence databases, identifying regions of similarity. Running BLAST offline provides more control, ensures data security, and allows customization for specific research needs. Here’s a detailed guide to set up and run BLAST locally on your system.

Step 1: Install BLAST

Download BLAST:
- Visit the NCBI BLAST+ download page to download the appropriate version for your operating system (Windows, macOS, or Linux).
Install BLAST:
- Extract the downloaded archive. For Linux/Mac, use:
```
tar -xvzf ncbi-blast-*.tar.gz
cd ncbi-blast-*
```
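- Add the BLAST binary folder to your system PATH for easier access. Use the actual name of the extracted directory here, because the shell does not expand the * wildcard inside a PATH assignment:
```
export PATH="$PATH:/path/to/ncbi-blast-<version>/bin"
```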
Verify Installation:
Run the following command to ensure BLAST is installed correctly:
```
blastn -version
```
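If the version check fails, it can help to see which BLAST+ programs the shell can actually find. The following is a minimal sketch; the function name check_blast_tools and the list of programs are illustrative choices, not part of BLAST itself.

```shell
#!/usr/bin/env bash
# Report which BLAST+ programs are reachable on PATH.
# The tool list below is an assumption; trim it to the programs you use.
check_blast_tools() {
    local missing=0 tool
    for tool in blastn blastp blastx tblastn tblastx makeblastdb; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero return value if at least one tool was not found.
    return "$missing"
}

check_blast_tools || echo "Some BLAST+ tools are not on PATH" >&2
```

A "missing" line usually means the bin folder was not added to PATH in the current shell session.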

Step 2: Prepare a Local Database

To run BLAST offline, you’ll need a sequence database.

Download a Pre-Built Database (Optional):
- NCBI provides ready-to-use databases such as nt, nr, and Swiss-Prot (database name swissprot). Use the update_blastdb.pl script (bundled with BLAST) to download these; running it with --showall lists the available database names:
```
update_blastdb.pl --decompress nt
```
Create a Custom Database:
If you have specific sequences to use as a database:
- Prepare a FASTA file containing the sequences.
- Use makeblastdb to create a database:
```
makeblastdb -in your_sequences.fasta -dbtype [nucl|prot] -out custom_db
```
  Replace [nucl|prot] with nucl for nucleotide sequences or prot for protein sequences.
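When the type of a FASTA file is not obvious, a rough guess can be made from its letters. The sketch below is a heuristic only, with a hypothetical helper name (guess_dbtype) and an arbitrary 90% threshold: it calls a file nucleotide if at least 90% of its sequence letters are A, C, G, T, U, or N.

```shell
#!/usr/bin/env bash
# Heuristic sketch: prints "nucl", "prot", or "unknown" for the FASTA file in $1.
# The 90% cutoff is an assumption, not a BLAST rule.
guess_dbtype() {
    awk '
        /^>/ { next }                       # skip header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)         # keep letters only
            total += length(seq)
            nt = seq
            gsub(/[^ACGTUN]/, "", nt)       # keep nucleotide letters only
            nuc += length(nt)
        }
        END {
            if (total == 0) { print "unknown"; exit 1 }
            if (nuc / total >= 0.9) print "nucl"; else print "prot"
        }
    ' "$1"
}

# Possible use (requires BLAST+ on PATH):
# makeblastdb -in your_sequences.fasta -dbtype "$(guess_dbtype your_sequences.fasta)" -out custom_db
```

Short or low-complexity protein sequences can fool this check, so confirm the result by eye before building a large database.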

Step 3: Prepare the Query Sequence

Save your query sequence(s) in FASTA format.
Ensure the file is properly formatted, with a header line starting with > followed by the sequence name, and the sequence on subsequent lines.

Example:

>query_sequence
ATGCGTAGCTAGCGTAGCTAGCTAGCTA
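
The formatting rules above can be checked before a search is started. This is a minimal sketch with a hypothetical helper name (validate_fasta); it only tests the basic layout (header first, a sequence after every header, letters plus * and - in sequence lines) and is not a full FASTA validator.

```shell
#!/usr/bin/env bash
# Sketch: basic layout check for the FASTA file in $1.
# Prints one message per problem and returns non-zero if any were found.
validate_fasta() {
    awk '
        /^[[:space:]]*$/ { next }           # ignore blank lines
        /^>/ {
            if (length($0) == 1) { printf "line %d: empty header\n", NR; bad = 1 }
            if (n > 0 && !has_seq) { printf "line %d: previous record has no sequence\n", NR; bad = 1 }
            n++; has_seq = 0; next
        }
        {
            if (n == 0) { printf "line %d: sequence before first header\n", NR; bad = 1 }
            if ($0 !~ /^[A-Za-z*-]+$/) { printf "line %d: unexpected characters\n", NR; bad = 1 }
            has_seq = 1
        }
        END {
            if (n == 0) { print "no FASTA records found"; bad = 1 }
            else if (!has_seq) { print "last record has no sequence"; bad = 1 }
            if (!bad) printf "OK: %d record(s)\n", n
            exit (bad ? 1 : 0)
        }
    ' "$1"
}

# Usage: validate_fasta query.fasta
```

Files saved with Windows line endings are reported as "unexpected characters", which is often the cause of otherwise puzzling parse errors.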

Step 4: Run BLAST

Choose the Appropriate BLAST Tool:
Depending on your data type:
- blastn: For nucleotide-nucleotide searches.
- blastp: For protein-protein searches.
- blastx: Translates nucleotide sequences into proteins and compares them to a protein database.
- tblastn: Compares protein sequences to a nucleotide database.
- tblastx: Translates both nucleotide query and database sequences.
Run the Command:
Example command for blastn:
```
blastn -query query.fasta -db custom_db -out results.txt -outfmt 6 -evalue 1e-5
```
Explanation of Parameters:
- -query: Specifies the query file.
- -db: Points to the local database.
- -out: Output file name.
- -outfmt: Output format (e.g., 6 for tabular format).
- -evalue: E-value cutoff for significance.
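The tabular format also accepts a list of column names after the 6, which keeps the output limited to the fields you need. The wrapper below is a sketch under assumptions: the function name run_blastn_tabular and the chosen columns are illustrative, and the E-value cutoff is simply the one used in the example above.

```shell
#!/usr/bin/env bash
# Sketch: run blastn with a reduced set of tabular columns.
# usage: run_blastn_tabular query.fasta database output.tsv
run_blastn_tabular() {
    local query=$1 db=$2 out=$3
    # Stop early if the query file is missing or empty.
    [ -s "$query" ] || { echo "query file missing or empty: $query" >&2; return 1; }
    blastn -query "$query" -db "$db" -out "$out" \
           -outfmt "6 qseqid sseqid pident length evalue bitscore" \
           -evalue 1e-5
}

# Example call (requires BLAST+ and an existing database):
# run_blastn_tabular query.fasta custom_db results.tsv
```

Note that scripts written for the default 12-column layout will need their column numbers adjusted when a custom column list is used.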

Step 5: Interpret Results

Output Formats:
- Default (outfmt 0): Human-readable format.
- Tabular (outfmt 6): Twelve tab-separated columns by default: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
Analyze Results:
Use tools like grep, Python, or R to parse and filter results for downstream analysis.
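
As a starting point for such parsing, here is a small awk-based sketch for the default tabular output. The function names (filter_hits, best_hit_per_query) and the default thresholds of 90% identity and 100 aligned positions are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Default -outfmt 6 columns:
#  1 qseqid   2 sseqid   3 pident   4 length   5 mismatch   6 gapopen
#  7 qstart   8 qend     9 sstart  10 send    11 evalue    12 bitscore

# Keep hits at or above a minimum percent identity and alignment length.
# usage: filter_hits results.txt [min_pident] [min_length]
filter_hits() {
    awk -F'\t' -v p="${2:-90}" -v l="${3:-100}" '$3 >= p && $4 >= l' "$1"
}

# Keep the highest-bitscore hit for each query, in order of first appearance.
# usage: best_hit_per_query results.txt
best_hit_per_query() {
    awk -F'\t' '
        !($1 in best) || $12 + 0 > score[$1] {
            if (!($1 in best)) order[++n] = $1
            best[$1] = $0
            score[$1] = $12 + 0
        }
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$1"
}
```

For example, `filter_hits results.txt 95 200 > strong_hits.txt` would keep only hits with at least 95% identity over at least 200 aligned positions.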

Step 6: Optimize Performance

For large datasets, BLAST can be resource-intensive. To improve performance:

Multithreading:
Use the -num_threads option to leverage multiple CPU cores:

blastn -query query.fasta -db custom_db -out results.txt -num_threads 4
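
Rather than hard-coding 4, the thread count can be derived from the machine. This sketch assumes either nproc or getconf is available; the helper names and the default cap of 8 threads are arbitrary illustrative choices.

```shell
#!/usr/bin/env bash
# Print the number of online CPU cores.
detect_cores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

# Print the number of threads to use: the core count, limited by a cap.
# usage: pick_threads [cap]   (cap defaults to 8, an arbitrary choice)
pick_threads() {
    local cores cap=${1:-8}
    cores=$(detect_cores)
    if [ "$cores" -gt "$cap" ]; then echo "$cap"; else echo "$cores"; fi
}

# Example (requires BLAST+):
# blastn -query query.fasta -db custom_db -out results.txt -num_threads "$(pick_threads)"
```

On shared servers, a lower cap leaves cores free for other users.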

Database Subsetting:
Split large databases into smaller chunks for faster searches.
Adjust Parameters:
- Lower the -evalue threshold for stricter matches.
- Use -max_target_seqs to limit the number of results per query.
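
One way to approach the database subsetting mentioned above is to split the source FASTA file into chunks before running makeblastdb on each one. The helper below is a sketch with a hypothetical name (split_fasta) and an assumed output naming scheme (prefix_001.fasta, prefix_002.fasta, and so on). Keep in mind that E-values depend on database size, so results from separate chunks are only comparable if that is accounted for, for example with the -dbsize option.

```shell
#!/usr/bin/env bash
# Sketch: split a FASTA file into chunks of a fixed number of records.
# usage: split_fasta input.fasta records_per_chunk out_prefix
split_fasta() {
    awk -v size="$2" -v prefix="$3" '
        /^>/ {
            # Start a new output file every "size" records.
            if (count % size == 0) {
                if (out != "") close(out)
                out = sprintf("%s_%03d.fasta", prefix, ++chunk)
            }
            count++
        }
        # Lines before the first header are skipped.
        out != "" { print > out }
    ' "$1"
}

# Example:
# split_fasta your_sequences.fasta 1000 chunk
```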

Step 7: Update Databases (Optional)

If using NCBI databases, regularly update them to ensure the inclusion of the latest sequences:

update_blastdb.pl --decompress nt

Conclusion

Running BLAST offline is a straightforward process that offers flexibility and security for bioinformaticians working with sensitive data. By following this guide, you can harness the power of BLAST to analyze sequences efficiently and gain valuable biological insights.

For advanced use cases, explore BLAST’s customization options, such as custom scoring matrices, filtering, and iterative searches with tools like PSI-BLAST. Happy BLASTing!

Caretta – A multiple protein structure alignment and feature extraction suite

Rahul Nayak — Fri, 18 Dec 2020 02:09:44 -0600

Caretta is a multiple structure alignment suite meant for homologous but sequentially divergent protein families. It consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning.

Address of the bookmark: http://www.bioinformatics.nl/caretta/

NCBI PSI-BLAST Tutorial

Wed, 04 Sep 2013 11:46:06 -0500

http://www.biotechnology.jhu.edu/ Tutorial for PSI-BLAST, an extension of BLAST that iteratively builds a position-specific scoring matrix. BLAST, the Basic Local Alignment Search Tool, is a cornerstone bioinformatics tool at NCBI; it finds protein and DNA sequences that are related to a sequence that the user provides.

BLAST+ updated !!!

Jit — Tue, 16 Jun 2015 16:55:24 -0500

A new version (2.2.31) of the stand-alone BLAST executables (Linux, Windows and MacOSX on FTP) is now available. New features include support for BLAST-XML2 specification (information here) and JSON BLAST output format, as well as several bug fixes and improvements. The BLAST AMI at AWS will also be updated to 2.2.31 (see this BLAST Help page for more information). For a full list of improvements, see the release notes.

More at http://www.ncbi.nlm.nih.gov/news/06-16-2015-blast-plus-update/?

sequenceserver

Jit — Fri, 10 Mar 2017 08:51:55 -0600

SequenceServer lets you rapidly set up a BLAST+ server with an intuitive user interface for use locally or over the web.

More at http://sequenceserver.com.

Address of the bookmark: https://github.com/wurmlab/sequenceserver

NCBI Magic-BLAST

Jit — Tue, 14 Aug 2018 18:11:11 -0500

Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score, taking into account simultaneously the two reads of a pair, and in case of RNA-seq, locating the candidate introns and adding up the score of all exons. This is very different from other versions of BLAST, where each exon is scored as a separate hit and read-pairing is ignored.

Magic-BLAST incorporates within the NCBI BLAST code framework ideas developed in the NCBI Magic pipeline, in particular hit extensions by local walk and jump (http://www.ncbi.nlm.nih.gov/pubmed/26109056), and recursive clipping of mismatches near the edges of the reads, which avoids accumulating artefactual mismatches near splice sites and is needed to distinguish short indels from substitutions near the edges.

Address of the bookmark: https://ncbi.github.io/magicblast/

Magic-BLAST

Shruti Paniwala — Fri, 20 Mar 2020 15:18:36 -0500

Address of the bookmark: https://ncbi.github.io/magicblast/

Cleaner BLAST Databases for More Accurate Results

LEGE — Tue, 23 Apr 2024 01:23:08 -0500

Do you use BLAST to identify a sequence or the evolutionary scope of a gene? That can be challenging if contaminated and misclassified sequences are in the BLAST databases and show up in your search results. To address this problem, we now use the NCBI quality assurance tools listed below to systematically remove these misleading sequences from the default nucleotide (nt) and protein (nr) BLAST databases.

- Foreign Contamination Screen tool for genome cross-species screening (FCS-GX) detects contamination from foreign organisms in genomes and other sequences using the genome cross-species aligner (GX).
- Average Nucleotide Identity (ANI) evaluates the taxonomic classification of prokaryotic genome assemblies. Sequences from genomes marked up as ‘unverified source organism’ are considered suspect and removed.

Ref https://ncbiinsights.ncbi.nlm.nih.gov/2024/04/22/cleaner-blast-databases-more-accurate-results/

Circoletto: visualizing sequence similarity with Circos

Jit — Fri, 09 Feb 2018 10:23:40 -0600

Circoletto is an online visualization tool based on Circos that provides a fast, aesthetically pleasing and informative overview of sequence similarity search results.

The online version and a downloadable software package for offline use (source code in Perl) are freely available at http://bat.ina.certh.gr/tools/circoletto/

Contact: ndarz@certh.gr

Address of the bookmark: http://tools.bat.infspire.org/circoletto/

TwinBLAST: When Two Is Better than One

Jit — Sat, 07 Sep 2019 08:50:08 -0500

TwinBLAST is a web-based tool for viewing 2 BLAST reports simultaneously side-by-side. It uses ExtJS (www.sencha.com/products/extjs/) to provide 2 independently scrollable panels. BioPerl (www.bioperl.org) is used to index raw BLAST reports and Bio::Graphics is used to draw pictograms of the BLAST hits.

https://github.com/IGS/twinblast

https://mra.asm.org/content/8/35/e00842-19

Address of the bookmark: https://github.com/IGS/twinblast