BOL: Related items

RepeatMasker compatible blast tool

Neel — Fri, 07 Dec 2018 08:13:03 -0600

RMBlast is a RepeatMasker compatible version of the standard NCBI blastn program. The primary difference between this distribution and the NCBI distribution is the addition of a new program "rmblastn" for use with RepeatMasker and RepeatModeler.

RMBlast supports RepeatMasker searches by adding a few necessary features to the stock NCBI blastn program. These include:

Support for custom matrices (without KA-Statistics).
Support for cross_match-like complexity-adjusted scoring. Cross_match is Phil Green's seeded Smith-Waterman search algorithm.
Support for cross_match-like masklevel filtering.

https://anaconda.org/bioconda/rmblast

Address of the bookmark: http://www.repeatmasker.org/RMBlast.html

Visualise BLAST results!

Abhi — Tue, 11 Oct 2022 03:15:10 -0500

Kablammo helps you create interactive visualizations of BLAST results from your web browser. Find your most interesting alignments, list detailed parameters for each, and export a publication-ready vector image, all without installing any software.

Address of the bookmark: https://kablammo.wasmuthlab.org/

A Step-by-Step Guide to Running BLAST Offline

LEGE — Sat, 07 Dec 2024 22:32:37 -0600

BLAST (Basic Local Alignment Search Tool) is a powerful algorithm used to compare nucleotide or protein sequences to sequence databases, identifying regions of similarity. Running BLAST offline provides more control, ensures data security, and allows customization for specific research needs. Here’s a detailed guide to set up and run BLAST locally on your system.

Step 1: Install BLAST

Download BLAST:
- Visit the NCBI BLAST+ download page to download the appropriate version for your operating system (Windows, macOS, or Linux).
Install BLAST:
- Extract the downloaded archive. For Linux/Mac, use:
```
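tar -xvzf ncbi-blast-*.tar.gz
# The trailing slash restricts the glob to the extracted directory, so the .tar.gz is not matched too
cd ncbi-blast-*/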
```
- Add the BLAST binary folder to your system PATH for easier access:
```
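# Use the real name of the extracted directory here: bash does not expand a * wildcard inside a variable assignment
export PATH="$PATH:/path/to/ncbi-blast-<version>/bin"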
```
Verify Installation:
Run the following command to ensure BLAST is installed correctly:
```
blastn -version
```
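If you drive BLAST from scripts, the same check can be done programmatically. The following is a minimal sketch, assuming Python 3 is available; the function name `find_blast_programs` and the list of programs checked are illustrative choices, not part of BLAST+.

```python
import shutil

# Programs from the BLAST+ bin directory that this guide uses.
BLAST_PROGRAMS = ("blastn", "blastp", "blastx", "tblastn", "tblastx", "makeblastdb")


def find_blast_programs(programs=BLAST_PROGRAMS):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in programs}


if __name__ == "__main__":
    for name, path in find_blast_programs().items():
        print(f"{name:12s} {path or 'NOT FOUND on PATH'}")
```

Any program reported as not found usually means the PATH entry from the previous step points at the wrong directory.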

Step 2: Prepare a Local Database

To run BLAST offline, you’ll need a sequence database.

Download a Pre-Built Database (Optional):
- NCBI provides ready-to-use databases such as nt, nr, and Swiss-Prot. Use the update_blastdb.pl script (bundled with BLAST) to download these:
```
update_blastdb.pl --decompress nt
```
Create a Custom Database:
If you have specific sequences to use as a database:
- Prepare a FASTA file containing the sequences.
- Use makeblastdb to create a database:
```
makeblastdb -in your_sequences.fasta -dbtype [nucl|prot] -out custom_db
```
  Replace [nucl|prot] with nucl for nucleotide sequences or prot for protein sequences.

Step 3: Prepare the Query Sequence

Save your query sequence(s) in FASTA format.
Ensure the file is properly formatted, with a header line starting with > followed by the sequence name, and the sequence on subsequent lines.

Example:

>query_sequence
ATGCGTAGCTAGCGTAGCTAGCTAGCTA
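
Before starting a long search, it can help to confirm that the query file really follows this layout. The sketch below is one possible check, written in plain Python with no external libraries; the helper names `read_fasta` and `check_fasta` are hypothetical, and the rules it enforces (non-empty header, non-empty sequence) are a minimal assumption rather than the full FASTA conventions that BLAST accepts.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_fasta(lines):
    """Return the parsed records, raising ValueError on empty names or sequences."""
    records = list(read_fasta(lines))
    if not records:
        raise ValueError("no FASTA records found")
    for name, seq in records:
        if not name:
            raise ValueError("record with an empty header")
        if not seq:
            raise ValueError(f"record {name!r} has no sequence")
    return records


example = """>query_sequence
ATGCGTAGCTAGCGTAGCTAGCTAGCTA
"""
print(check_fasta(example.splitlines()))
```

For a real file, pass the open file handle instead of `example.splitlines()`.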

Step 4: Run BLAST

Choose the Appropriate BLAST Tool:
Depending on your data type:
- blastn: For nucleotide-nucleotide searches.
- blastp: For protein-protein searches.
- blastx: Translates nucleotide sequences into proteins and compares them to a protein database.
- tblastn: Compares protein sequences to a nucleotide database.
- tblastx: Translates both nucleotide query and database sequences.
Run the Command:
Example command for blastn:
```
blastn -query query.fasta -db custom_db -out results.txt -outfmt 6 -evalue 1e-5
```
Explanation of Parameters:
- -query: Specifies the query file.
- -db: Points to the local database.
- -out: Output file name.
- -outfmt: Output format (e.g., 6 for tabular format).
- -evalue: E-value cutoff for significance.
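
When the same search is repeated over many query files, it is convenient to assemble the command in a script. The sketch below only builds the argument list for the options explained above; `build_blastn_command` is a hypothetical helper name, and the default values simply mirror the example command.

```python
import shlex


def build_blastn_command(query, db, out, outfmt=6, evalue=1e-5, num_threads=None):
    """Return the blastn argument list for the options described above."""
    cmd = [
        "blastn",
        "-query", str(query),
        "-db", str(db),
        "-out", str(out),
        "-outfmt", str(outfmt),
        "-evalue", str(evalue),
    ]
    if num_threads is not None:
        cmd += ["-num_threads", str(num_threads)]
    return cmd


cmd = build_blastn_command("query.fasta", "custom_db", "results.txt")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

To actually run the search, the list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if blastn exits with a non-zero status. Passing a list rather than a single string avoids shell-quoting problems with file names that contain spaces.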

Step 5: Interpret Results

Output Formats:
- Default (outfmt 0): Human-readable format.
- Tabular (outfmt 6): Includes fields like query ID, subject ID, percent identity, alignment length, etc.
Analyze Results:
Use tools like grep, Python, or R to parse and filter results for downstream analysis.
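
As an illustration of the Python route, here is a minimal sketch that reads tabular output and keeps only strong hits. It assumes the search was run with plain `-outfmt 6`, whose twelve default columns are listed in `COLUMNS`; the function names, the thresholds, and the two sample lines are made up for the example.

```python
import csv
import io

# Default column order of -outfmt 6 when no custom field list is given.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Yield one dict per tabular hit line, converting numeric fields."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        yield hit


def filter_hits(hits, min_identity=90.0, max_evalue=1e-5):
    """Keep hits with at least min_identity percent identity and at most max_evalue."""
    return [h for h in hits if h["pident"] >= min_identity and h["evalue"] <= max_evalue]


# Two made-up hit lines, for illustration only.
sample = (
    "query_sequence\tsubjA\t98.5\t28\t0\t0\t1\t28\t101\t128\t1e-10\t52.8\n"
    "query_sequence\tsubjB\t80.0\t20\t4\t0\t1\t20\t5\t24\t0.5\t20.1\n"
)
kept = filter_hits(read_hits(io.StringIO(sample)))
print([h["sseqid"] for h in kept])
```

With a real results file, replace `io.StringIO(sample)` with `open("results.txt", newline="")`.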

Step 6: Optimize Performance

For large datasets, BLAST can be resource-intensive. To improve performance:

Multithreading:
Use the -num_threads option to leverage multiple CPU cores:

blastn -query query.fasta -db custom_db -out results.txt -num_threads 4

Database Subsetting:
Split large databases into smaller chunks for faster searches.
Adjust Parameters:
- Lower the -evalue threshold for stricter matches.
- Use -max_target_seqs to limit the number of results per query.
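
One way to read the "Database Subsetting" advice is to split the source FASTA file into several smaller files before building a database from each. The sketch below shows that idea only; `split_fasta_records` is a hypothetical helper, and round-robin assignment is an arbitrary choice. Note that E-values depend on database size, so hits against a chunk are not directly comparable with hits against the full database unless the search space is fixed (for example with the `-dbsize` option).

```python
def split_fasta_records(lines, n_chunks):
    """Distribute FASTA records round-robin into n_chunks lists of lines."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    index = -1  # index of the current record; stays -1 until the first header
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(">"):
            index += 1
        if index < 0:
            raise ValueError("sequence data found before the first '>' header line")
        chunks[index % n_chunks].append(line)
    return chunks


demo = [">a", "AC", ">b", "GG", ">c", "TT"]
for number, chunk in enumerate(split_fasta_records(demo, 2)):
    print(number, chunk)
```

Each returned list can be written to its own file and passed to makeblastdb as in Step 2.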

Step 7: Update Databases (Optional)

If using NCBI databases, regularly update them to ensure the inclusion of the latest sequences:

update_blastdb.pl --decompress nt

Conclusion

Running BLAST offline is a straightforward process that offers flexibility and security for bioinformaticians working with sensitive data. By following this guide, you can harness the power of BLAST to analyze sequences efficiently and gain valuable biological insights.

For advanced use cases, explore BLAST’s customization options, such as custom scoring matrices, filtering, and iterative searches with tools like PSI-BLAST. Happy BLASTing!

RefSeq viral genome sequences!

Jit — Sat, 11 Dec 2021 08:35:18 -0600

List of all viruses on NCBI

https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/

Address of the bookmark: https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/

SEASTAR: Systematic Evaluation of Alternative STArt site in RNA

BioStar — Thu, 13 Aug 2020 09:54:27 -0500

SEASTAR (Systematic Evaluation of Alternative STArt site in RNA) is a software package for Transcription Start Site (TSS) identification and quantification using only RNA-seq data. It assembles novel TSSs based only on RNA-Seq data and merges them with known TSSs from a public database. This package enables high-quality TSS identification that is comparable to the highly sophisticated CAGE technology. This package is particularly useful for finding novel TSSs that contribute to transcriptome complexity along with identifying differential promoter utilization.

Version 1.0.0 updates several descriptions and tests. To obtain v0.9.4, visit https://github.com/zhyqin/SEASTAR-0.9.4.

Address of the bookmark: https://github.com/Xinglab/SEASTAR

NCBI PSI-BLAST Tutorial

Fri, 23 Aug 2013 02:25:02 -0500

http://www.biotechnology.jhu.edu/ Tutorial for PSI-BLAST, an extension of BLAST that uses matrix algebra. BLAST is a cornerstone bioinformatics tool at NCBI. BLAST is the Basic Local Alignment Search Tool and will find protein and DNA sequences that are related to a sequence that the user provides.

A fast package to parse BLAST

Jitendra Narayan — Tue, 10 Sep 2013 16:58:56 -0500

We now handle huge amounts of genomics data and analyse them to extract biological meaning. Large-scale sequence studies that rely on BLAST-based analysis produce huge amounts of output to be parsed. Several BLAST parsers are available, but they often lack important features, such as keeping all information from the raw BLAST output, allowing direct access to single results, and performing logical operations over them.

Massimiliano Orsini and Simone Carcangiu developed a new, fast package, "BlaSTorage", to parse and store BLAST results. BlaSTorage shows speed comparable to that of more basic parsers written in compiled languages such as C++ and can be easily integrated into web applications or software pipelines.

Find more @ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571973/

http://biowiki.crs4.it/biowiki/MassimilianoOrsini

BLAST Ring Image Generator (BRIG)

Anjana — Fri, 30 Sep 2016 09:18:50 -0500

BRIG is a free cross-platform (Windows/Mac/Unix) application that can display circular comparisons between a large number of genomes, with a focus on handling genome assembly data. The application is available at: http://sourceforge.net/projects/brig

If you have any questions or comments, post them on one of the trackers on BRIG’s SourceForge page: http://sourceforge.net/tracker/?group_id=328245.

Features:

Images show similarity between a central reference sequence and other sequences as concentric rings.
BRIG will perform all BLAST comparisons and file parsing automatically via a simple GUI.
Contig boundaries and read coverage can be displayed for draft genomes; customized graphs and annotations can be displayed.
Using a user-defined set of genes as input, BRIG can display gene presence, absence, truncation or sequence variation in a set of complete genomes, draft genomes or even raw, unassembled sequence data.
BRIG also accepts SAM-formatted read-mapping files, enabling genomic regions present in unassembled sequence data from multiple samples to be compared simultaneously.

Address of the bookmark: http://brig.sourceforge.net/

Converting BLAST output into CSV

Poonam Mahapatra — Mon, 11 Dec 2017 04:17:58 -0600

Suppose we wanted to do something with all this BLAST output. Generally, that’s the case - you want to retrieve all matches, or do a reciprocal BLAST, or something.

As with most programs that run on UNIX, the text output is in some specific format. If the program is popular enough, there will be one or more parsers written for that format – these are just utilities written to help you retrieve whatever information you are interested in from the output.

Let’s conclude this tutorial by converting the BLAST output in out.txt into a spreadsheet format, using a Python script.

First, we need to get the script. We’ll do that using the ‘git’ program:

git clone https://github.com/ngs-docs/ngs-scripts.git /root/ngs-scripts

We’ll discuss ‘git’ more later; for now, just think of it as a way to get ahold of a particular set of files. In this case, we’ve placed the files in /root/ngs-scripts/, and you’re looking to run the script blast/blast-to-csv.py using Python:

python /root/ngs-scripts/blast/blast-to-csv.py out.txt

This outputs a spread-sheet like list of names and e-values. To save this to a file, do:

python /root/ngs-scripts/blast/blast-to-csv.py out.txt > ~/out.csv

If you have Excel installed, try double clicking on it.
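
If the search was instead run with tabular output (`-outfmt 6`), no dedicated parser is needed, because the file is already tab-separated. The sketch below is an alternative under that assumption, not a description of what blast-to-csv.py does; the function name `tabular_to_csv` and the sample line are invented for illustration.

```python
import csv
import io

# Default column order of -outfmt 6 when no custom field list is given.
HEADER = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def tabular_to_csv(src, dst, header=HEADER):
    """Copy tab-separated hit lines from src to dst as CSV with a header row.

    Returns the number of hit lines written.
    """
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(header)
    count = 0
    for row in csv.reader(src, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        writer.writerow(row)
        count += 1
    return count


# One made-up hit line, for illustration only.
src = io.StringIO("q1\ts1\t100.0\t28\t0\t0\t1\t28\t1\t28\t2e-12\t52.8\n")
dst = io.StringIO()
tabular_to_csv(src, dst)
print(dst.getvalue())
```

With real files, open both with `newline=""` (for example `open("out.csv", "w", newline="")`) so that the csv module controls line endings.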

ElasticBLAST!

Abhi — Tue, 06 Sep 2022 18:14:57 -0500

ElasticBLAST is a new way to BLAST large numbers of queries, faster and on the cloud. Here are the top three reasons you should use ElasticBLAST:

1. ElasticBLAST can handle much LARGER queries!

ElasticBLAST can search query sets that have hundreds to millions of sequences and against BLAST databases of all sizes.

2. ElasticBLAST is FASTER

ElasticBLAST distributes your searches across multiple cloud instances to process them simultaneously. The ability to scale resources in this way allows you to process large numbers of queries in a shorter time than you could with BLAST+.

3. ElasticBLAST is EASY to run on the cloud

ElasticBLAST is easy to set up using our step-by-step instructions for Amazon Web Services (AWS) and Google Cloud Platform (GCP), and it allows you to leverage the power of the cloud. Once configured, it manages the software and database installation, handles partitioning of the BLAST workload among the various instances, and deallocates cloud resources when the searches are done.

ElasticBLAST also selects the instance (i.e., machine) type for you based on database size. Of course, you can also choose the instance type manually if you prefer.

Address of the bookmark: https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/