<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/42370?offset=30</link>
	<atom:link href="https://bioinformaticsonline.com/related/42370?offset=30" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/videolist/watch/4193/bioinformatics-101-running-blast</guid>
	<pubDate>Tue, 03 Sep 2013 14:59:50 -0500</pubDate>
	<link>https://bioinformaticsonline.com/videolist/watch/4193/bioinformatics-101-running-blast</link>
	<title><![CDATA[Bioinformatics 101 -  Running BLAST]]></title>
	<description><![CDATA[<iframe width="" height="" src="https://www.youtube-nocookie.com/embed/CYnjROfGXv8" frameborder="0" allowfullscreen></iframe>How to format the database for BLAST, run the command, view the output file, and use BioPerl and Perl to parse the output. By David Francis, Ohio State University. Delivered live at the Tomato Disease Workshop 2010. For more information, please visit http://www.extension.org/pages/32521/bioinformatics-101-video.]]></description>
	
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/videolist/watch/4851/blast</guid>
	<pubDate>Wed, 25 Sep 2013 10:56:23 -0500</pubDate>
	<link>https://bioinformaticsonline.com/videolist/watch/4851/blast</link>
	<title><![CDATA[BLAST]]></title>
	<description><![CDATA[<iframe width="" height="" src="https://www.youtube-nocookie.com/embed/g0nSH17psDc" frameborder="0" allowfullscreen></iframe>Dr. Rob Edwards describes how BLAST works]]></description>
	
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30976/brig</guid>
	<pubDate>Thu, 16 Feb 2017 13:14:25 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30976/brig</link>
	<title><![CDATA[BRIG]]></title>
	<description><![CDATA[<p>BRIG is a free cross-platform (Windows/Mac/Unix) application that can display circular comparisons between a large number of genomes, with a focus on handling genome assembly data. The application is available at: <a href="http://sourceforge.net/projects/brig">http://sourceforge.net/projects/brig</a></p>
<p>If you have any questions or comments, post them on&nbsp;<a href="http://sourceforge.net/tracker/?group_id=328245">one of the trackers</a>&nbsp;on BRIG&rsquo;s SourceForge page: <a href="http://sourceforge.net/tracker/?group_id=328245">http://sourceforge.net/tracker/?group_id=328245</a>.</p>
<p>Features:</p>
<ul>
<li>Images show similarity between a central reference sequence and other sequences as concentric rings.</li>
<li>BRIG will perform all BLAST comparisons and file parsing automatically via a simple GUI.</li>
<li>Contig boundaries and read coverage can be displayed for draft genomes; customized graphs and annotations can be displayed.</li>
<li>Using a user-defined set of genes as input, BRIG can display gene presence, absence, truncation or sequence variation in a set of complete genomes, draft genomes or even raw, unassembled sequence data.</li>
<li>BRIG also accepts SAM-formatted read-mapping files, enabling genomic regions present in unassembled sequence data from multiple samples to be compared simultaneously.</li>
</ul>
<p>&nbsp;</p><p>Address of the bookmark: <a href="http://brig.sourceforge.net/" rel="nofollow">http://brig.sourceforge.net/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/35923/basic-command-line-to-run-blast</guid>
	<pubDate>Wed, 14 Mar 2018 05:10:34 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/35923/basic-command-line-to-run-blast</link>
	<title><![CDATA[Basic command-line to run BLAST]]></title>
	<description><![CDATA[<p>&nbsp;</p><p>The goal of this tutorial is to run you through a demonstration of the command line, which you may not have seen or used much before.</p><p>All of the commands below can be copied and pasted.</p><div id="install-software"><h2>Install software<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#install-software" title="Permalink to this headline"></a></h2><p>Copy and paste the following command:</p><div><div><pre>sudo apt-get update &amp;&amp; sudo apt-get -y install python ncbi-blast+
</pre></div></div><p>This updates the software list and installs the Python programming language and NCBI BLAST+.</p></div><div id="get-data"><h2>Get Data<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#get-data" title="Permalink to this headline"></a></h2><p>Grab some data to play with. Grab some cow and human RefSeq proteins:</p><div><div><pre>wget ftp://ftp.ncbi.nih.gov/refseq/B_taurus/mRNA_Prot/cow.1.protein.faa.gz
wget ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.1.protein.faa.gz
</pre></div></div><p>This is only the first part of the human and cow protein files; there are 24 files in total for human.</p><p>Both database files are gzipped, so let&rsquo;s unzip them:</p><div><div><pre>gunzip *gz
ls
</pre></div></div><p>Take a look at the head of each file:</p><div><div><pre>head cow.1.protein.faa
head human.1.protein.faa
</pre></div></div><p>These are protein sequences in FASTA format. FASTA format is something many of you have probably seen in one form or another &ndash; it&rsquo;s pretty ubiquitous. It&rsquo;s just a text file, containing records; each record starts with a line beginning with a &lsquo;&gt;&rsquo;, and then contains one or more lines of sequence text.</p><p>Note that the files are in FASTA format, even though they end in &ldquo;.faa&rdquo; instead of the usual &ldquo;.fasta&rdquo;. This is NCBI&rsquo;s way of denoting that this is a FASTA file with amino acids instead of nucleotides.</p><p>How many sequences are in each one?</p><div><div><pre>grep -c '^&gt;' cow.1.protein.faa
grep -c '^&gt;' human.1.protein.faa
</pre></div></div><p>This grep command uses the -c flag, which reports the number of lines that match the pattern. In this case, the pattern is a regular expression that matches only lines beginning with a &gt;.</p><p>This is a bit too big, so let&rsquo;s take a smaller set for practice: the first two sequences of the cow proteins, which we can see are on the first 6 lines.</p><div><div><pre>head -6 cow.1.protein.faa &gt; cow.small.faa
</pre></div></div></div><div id="blast"><h2>BLAST<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#blast" title="Permalink to this headline"></a></h2><p>Now we can blast these two cow sequences against the set of human sequences. First, we need to tell blast about our database. BLAST needs to do some pre-work on the database file prior to searching. This helps to make the software work a lot faster. Because BLAST+ was installed with apt-get above, the makeblastdb command should already be on your PATH, so you can call it by name. If you installed BLAST+ somewhere else, use the full path to makeblastdb instead:</p><div><div><pre>makeblastdb -in human.1.protein.faa -dbtype prot
ls
</pre></div></div><p>Note that this makes a lot of extra files, with the same name as the database plus new extensions (.pin, .psq, etc). To make blast work, these files, called index files, must be in the same directory as the fasta file.</p><p><br /> blastp [-h] [-help] [-import_search_strategy filename]<br /> [-export_search_strategy filename] [-task task_name] [-db database_name]<br /> [-dbsize num_letters] [-gilist filename] [-seqidlist filename]<br /> [-negative_gilist filename] [-negative_seqidlist filename]<br /> [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]<br /> [-db_hard_mask filtering_algorithm] [-subject subject_input_file]<br /> [-subject_loc range] [-query input_file] [-out output_file]<br /> [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]<br /> [-gapextend extend_penalty] [-qcov_hsp_perc float_value]<br /> [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]<br /> [-xdrop_gap_final float_value] [-searchsp int_value]<br /> [-sum_stats bool_value] [-seg SEG_options] [-soft_masking soft_masking]<br /> [-matrix matrix_name] [-threshold float_value] [-culling_limit int_value]<br /> [-best_hit_overhang float_value] [-best_hit_score_edge float_value]<br /> [-window_size int_value] [-lcase_masking] [-query_loc range]<br /> [-parse_deflines] [-outfmt format] [-show_gis]<br /> [-num_descriptions int_value] [-num_alignments int_value]<br /> [-line_length line_length] [-html] [-max_target_seqs num_sequences]<br /> [-num_threads int_value] [-ungapped] [-remote] [-comp_based_stats compo]<br /> [-use_sw_tback] [-version]</p><p>Now we can run the blast job. We will use blastp, which is appropriate for protein to protein comparisons.</p><div><div><pre>blastp -query cow.small.faa -db human.1.protein.faa
</pre></div></div><p>This prints a lot of information to the terminal, which is difficult to save and use later. BLAST can also write the output to a file:</p><div><div><pre>blastp -query cow.small.faa -db human.1.protein.faa -out cow_vs_human_blast_results.txt
ls
</pre></div></div><p>Take a look at the results using less. Note that there can be more than one match between the query and the same subject. These are referred to as high-scoring segment pairs (HSPs).</p><div><div><pre>less cow_vs_human_blast_results.txt
</pre></div></div><p>So how do you know about all the options, such as the flag to create an output file? Let&rsquo;s take a look at the help pages. Unfortunately there are no man pages (those are usually reserved for shell commands, but some software authors provide them as well), but there is a text help output:</p><div><div><pre>blastp -help
</pre></div></div><p>To scroll through slowly</p><div><div><pre>blastp -help | less
</pre></div></div><p>To quit the less screen, press the q key.</p><p>Parameters of interest include -evalue (the default is 10!) and -outfmt.</p><p>Let&rsquo;s filter for more statistically significant matches and use a different output format:</p><div><div><pre>blastp \
-query cow.small.faa \
-db human.1.protein.faa \
-out cow_vs_human_blast_results.tab \
-evalue 1e-5 \
-outfmt 7
</pre></div></div><p>I broke the long single command into many lines by &ldquo;escaping&rdquo; the newline. The backslash at the end of each line tells the command line &ldquo;Wait, I&rsquo;m not done yet!&rdquo;, so it waits for the next line of the command before executing.</p><p>Check out the results with less.</p><p>Let&rsquo;s try a medium-sized data set next:</p><div><div><pre>head -199 cow.1.protein.faa &gt; cow.medium.faa
</pre></div></div><p>How many sequences does this file contain?</p><div><div><pre>grep -c '^&gt;' cow.medium.faa
</pre></div></div><p>Let&rsquo;s run the blast again, but this time let&rsquo;s return only the best hit for each query.</p><div><div><pre>blastp \
-query cow.medium.faa \
-db human.1.protein.faa \
-out cow_vs_human_blast_results.tab \
-evalue 1e-5 \
-outfmt 6 \
-max_target_seqs 1
</pre></div></div></div><div id="summary"><h2>Summary<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#summary" title="Permalink to this headline"></a></h2><p>Review:</p><ul>
<li>command line programs such as blast use flags to get information about how and what to do</li>
<li>blast options can be found by typing&nbsp;<cite>blastp -help</cite></li>
<li>break a command up over many lines by using&nbsp;<code>\</code> to &ldquo;escape&rdquo; the new line</li>
</ul><p>&nbsp;</p><p>Blastn</p><p>blastn [-h] [-help] [-import_search_strategy filename]<br /> [-export_search_strategy filename] [-task task_name] [-db database_name]<br /> [-dbsize num_letters] [-gilist filename] [-seqidlist filename]<br /> [-negative_gilist filename] [-negative_seqidlist filename]<br /> [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]<br /> [-db_hard_mask filtering_algorithm] [-subject subject_input_file]<br /> [-subject_loc range] [-query input_file] [-out output_file]<br /> [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]<br /> [-gapextend extend_penalty] [-perc_identity float_value]<br /> [-qcov_hsp_perc float_value] [-max_hsps int_value]<br /> [-xdrop_ungap float_value] [-xdrop_gap float_value]<br /> [-xdrop_gap_final float_value] [-searchsp int_value]<br /> [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]<br /> [-min_raw_gapped_score int_value] [-template_type type]<br /> [-template_length int_value] [-dust DUST_options]<br /> [-filtering_db filtering_database]<br /> [-window_masker_taxid window_masker_taxid]<br /> [-window_masker_db window_masker_db] [-soft_masking soft_masking]<br /> [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]<br /> [-best_hit_score_edge float_value] [-window_size int_value]<br /> [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]<br /> [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]<br /> [-outfmt format] [-show_gis] [-num_descriptions int_value]<br /> [-num_alignments int_value] [-line_length line_length] [-html]<br /> [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]<br /> [-version]</p><p>DESCRIPTION<br /> Nucleotide-Nucleotide BLAST 2.7.0+</p></div>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/40589/new-layout-for-blast-ftp-database-site</guid>
	<pubDate>Tue, 21 Jan 2020 11:57:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/40589/new-layout-for-blast-ftp-database-site</link>
	<title><![CDATA[New Layout for BLAST ftp Database Site]]></title>
	<description><![CDATA[<p>As announced previously, the new default database version for&nbsp;<a href="https://ncbiinsights.ncbi.nlm.nih.gov/2019/12/18/blast-2-10-0/" target="_blank" title="Follow link">BLAST+</a>&nbsp;is&nbsp;<a href="https://ncbiinsights.ncbi.nlm.nih.gov/2019/09/30/protein-blastdbs-accession-based/" target="_blank" title="Follow link">dbV5</a>.&nbsp; To complete this transition, the&nbsp;<a href="ftp://ftp.ncbi.nlm.nih.gov/blast/db/" target="_blank" title="Follow link">ftp database site</a>&nbsp;will be updated to support this change.&nbsp; We expect this change to happen around February 4<sup>th</sup>, please adjust your scripts or procedures accordingly.</p><p>Here is a list of what is changing:</p><ol>
<li>All databases at the root level will be dbV5.</li>
<li>The dbV5 file naming, &nbsp;&ldquo;_v5&rdquo; will be removed. Databases with &nbsp;no &ldquo;_vX&rdquo; descriptor will be dbV5.</li>
<li>dbV4 tarballs will be renamed with "_v4"; files included in the tarballs will not be renamed.</li>
<li>dbV4 databases will be moved to a v4 subdirectory.</li>
<li>As of 1/13/20 the Cloud directory will be frozen with no more new entries.</li>
<li>There will be no more updates to dbV4 databases.</li>
<li>The FASTA directory will contain nr, nt, swissprot, and pdbaa files.</li>
</ol><p>If you have any questions or concerns, please contact&nbsp;<a href="mailto:blast-help@ncbi.nlm.nih.gov" target="_blank" title="Follow link">blast-help@ncbi.nlm.nih.gov</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43952/elastic-blast</guid>
	<pubDate>Tue, 06 Sep 2022 18:14:57 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43952/elastic-blast</link>
	<title><![CDATA[Elastic BLAST !]]></title>
	<description><![CDATA[<p><a href="https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/elasticblast.html?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=elasticblast-top3-20220823">ElasticBLAST</a>&nbsp;is a new way to&nbsp;<a href="https://blast.ncbi.nlm.nih.gov/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=elasticblast-top3-20220823">BLAST</a>&nbsp;large numbers of queries, faster and on the cloud. Here are the top three reasons you should use ElasticBLAST:</p>
<h6><strong><img src="https://i0.wp.com/ncbiinsights.ncbi.nlm.nih.gov/wp-content/uploads/2022/08/ElasticBLAST_Larger-e1659978198941.png?resize=150%2C120&amp;ssl=1" alt="" width="150" height="120" style="border: 0px;">1. ElasticBLAST can handle much LARGER queries!&nbsp;</strong></h6>
<p>ElasticBLAST can search query sets that have&nbsp;<em>hundreds to millions of sequences</em>&nbsp;and against BLAST databases of all sizes.</p>
<h6><span><img src="https://i0.wp.com/ncbiinsights.ncbi.nlm.nih.gov/wp-content/uploads/2022/08/ElasticBLAST_Faster.png?resize=150%2C120&amp;ssl=1" alt="" width="150" height="120" style="border: 0px;">2. ElasticBLAST is FASTER</span></h6>
<p>ElasticBLAST distributes your searches across multiple cloud instances to process them simultaneously. The ability to scale resources in this way allows you to process large numbers of queries in a shorter time than you could with BLAST+.</p>
<h6><img src="https://i0.wp.com/ncbiinsights.ncbi.nlm.nih.gov/wp-content/uploads/2022/08/ElasticBLAST_Easy.png?resize=150%2C120&amp;ssl=1" alt="" width="150" height="120" style="border: 0px;">3. ElasticBLAST is EASY to run on the cloud<strong><br></strong></h6>
<p>ElasticBLAST is easy to set up using our step-by-step instructions&nbsp;<span>(</span><a href="https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-aws.html?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=elasticblast-top3-20220823" target="_blank"><span><span>Amazon Web&nbsp;</span><span>Services (AWS)</span></span></a><span>,&nbsp;</span><a href="https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-gcp.html?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=elasticblast-top3-20220823" target="_blank"><span>Google Cloud Platform (GCP)</span></a><span><span>)</span>&nbsp;<span>and</span>&nbsp;<span>allows&nbsp;</span><span>you&nbsp;</span><span>to leverage the power of</span><span>&nbsp;the&nbsp;</span><span>cloud. Once configured, i</span><span>t</span>&nbsp;<span>manages the software and database installation, handles partitioning of the BLAST workload among the various instances, and deallocates cloud resources when the searches are done.</span></span></p>
<p><span><span>ElasticBLAST</span>&nbsp;<span>also&nbsp;</span><span>selects the instance (</span><span>i.e.,</span><span>&nbsp;machine) type for you based on database size. Of course, you can also choose the instance type manually if you prefer</span><span>.&nbsp;</span></span></p><p>Address of the bookmark: <a href="https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/" rel="nofollow">https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/fun/view/4196/chemical-elements-of-bioinformatics</guid>
	<pubDate>Tue, 03 Sep 2013 16:35:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/fun/view/4196/chemical-elements-of-bioinformatics</link>
	<title><![CDATA[Chemical Elements of Bioinformatics]]></title>
	<description><![CDATA[<p>You are probably familiar with the periodic table and its colour pattern, but this time you will be amazed by a new table of elements from Eagle Genomics. Just check it out and have fun :)</p><p><a href="http://elements.eaglegenomics.com/">http://elements.eaglegenomics.com/</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38389/blast-options-setting-and-defaults</guid>
	<pubDate>Mon, 10 Dec 2018 08:29:37 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38389/blast-options-setting-and-defaults</link>
	<title><![CDATA[BLAST options, setting and defaults]]></title>
	<description><![CDATA[<p>BLAST stands for Basic Local Alignment Search Tool and was developed by Altschul et al. (1990) and significantly improved by&nbsp;<a href="http://www3.oup.co.uk/nar/Volume_25/Issue_17/freepdf/">Altschul et al. (1997).</a>&nbsp;It is a very fast search algorithm that is used to separately search protein or DNA databases. BLAST is best used for sequence similarity searching, rather than for motif searching. For searches using a query sequence of fewer than twenty residues,&nbsp;<a href="https://www.arabidopsis.org/servlets/tools/patmatch/">PatMatch</a>&nbsp;is the best choice. Another sequence alignment tool that may yield different results from BLAST, and may be useful for motif searching, is&nbsp;<a href="https://www.arabidopsis.org/cgi-bin/fasta/TAIRfasta.pl">FASTA</a>. To search nonplant datasets, try&nbsp;<a href="http://seqsim.ncgr.org/newBlast.html">NCGR BLAST</a>&nbsp;or&nbsp;<a href="http://www.ncbi.nlm.nih.gov/blast/blast.cgi?Jform=0">NCBI BLAST</a>.</p>
<p>A fairly complete on-line guide to BLAST searching can be found at the&nbsp;<a href="http://www.ncbi.nlm.nih.gov/BLAST/blast_help.html">NCBI BLAST Help Manual</a>. For a theoretical overview of BLAST, see the&nbsp;<a href="http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html">NCBI BLAST Course</a>. Additional information can be found in the&nbsp;<a href="https://www.arabidopsis.org/blast/aboutblast2.htm">BLAST 2.0 Release Notes</a></p>
<table border="1">
<tbody>
<tr><th>&nbsp;</th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">BLASTN</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">BLASTP</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">BLASTX</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">TBLASTN</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">TBLASTX</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">PSIBLAST</a></th></tr>
<tr>
<td><a name="open" id="open"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#open"><strong>Gap opening penalty</strong></a>:<br>cost to open a gap [integer]</td>
<td align="center">default = 5</td>
<td align="center">default = 11<br>limited&nbsp;values&nbsp;are supported</td>
<td align="center">default = 11<br>limited&nbsp;values&nbsp;are supported</td>
<td align="center">default = 11<br>limited&nbsp;values&nbsp;are supported</td>
<td align="center">default = 11<br>limited&nbsp;values&nbsp;are supported</td>
<td align="center">default = 5</td>
</tr>
<tr>
<td><a name="extend" id="extend"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#extend"><strong>Gap extension penalty</strong></a>:<br>cost to extend a gap [integer]</td>
<td align="center">default = 2</td>
<td align="center">default = 1<br>a 0 in this field means to use the default</td>
<td align="center">default = 1<br>a 0 in this field means to use the default</td>
<td align="center">default = 1<br>a 0 in this field means to use the default</td>
<td align="center">default = 1<br>a 0 in this field means to use the default</td>
<td align="center">default = 2</td>
</tr>
<tr>
<td><a name="match" id="match"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#match"><strong>Nucleic match</strong></a>:<br>reward for a match in the BLAST portion of run [integer]</td>
<td align="center">default = 1</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">default = 1</td>
</tr>
<tr>
<td><a name="mismatch" id="mismatch"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#mismatch"><strong>Nucleic mismatch</strong></a>:<br>penalty for a mismatch in the blast portion of run [integer]</td>
<td align="center">default = -3</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">default = -3</td>
</tr>
<tr>
<td><strong><a name="expect" id="expect"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#expect">Expectation value</a></strong>:<br>(E) [real]</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
</tr>
<tr>
<td><a name="word" id="word"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#word"><strong>Word size</strong></a>:<br>the size of the initial word that must be matched between the database and the query sequence</td>
<td align="center">default = 11</td>
<td align="center">default = 3</td>
<td align="center">default = 3</td>
<td align="center">default = 3</td>
<td align="center">default = 3</td>
<td align="center">default = 11</td>
</tr>
<tr>
<td><a name="descriptions" id="descriptions"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#descriptions"><strong>Max scores</strong></a>:<br>Number of one-line descriptions (V) [Integer]</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
</tr>
<tr>
<td><strong><a name="alignments" id="alignments"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#alignments">Max alignments</a></strong>:<br>number of alignments to show (B) [integer]</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
</tr>
<tr>
<td><strong>Query filter</strong>:<br>filter applied to the query sequence</td>
<td align="center">default = DUST</td>
<td align="center">default = SEG</td>
<td align="center">default = SEG</td>
<td align="center">default = SEG</td>
<td align="center">default = SEG</td>
<td align="center">default = DUST</td>
</tr>
<tr>
<td><strong><a name="gencodes" id="gencodes"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#gencodes">Query genetic code</a></strong>:<br>genetic code to be used in BLASTX translation of the query</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">default = universal</td>
<td align="center">default = universal</td>
<td align="center">default = universal</td>
<td align="center">n/a</td>
</tr>
<tr>
<td><strong><a name="matrix" id="matrix"></a><a href="http://twod.med.harvard.edu/seqanal/matrices.html">Matrix</a></strong>:<br>substitution matrix to be used for amino acid comparisons</td>
<td align="center">no default</td>
<td align="center">default = blosum62</td>
<td align="center">default = blosum62</td>
<td align="center">default = blosum62</td>
<td align="center">default = blosum62</td>
<td align="center">no default</td>
</tr>
</tbody>
</table>
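<p>As a rough illustration, the BLASTP column of the table above could be spelled out as explicit BLAST+ command-line flags. This is only a sketch: the table describes a web form, whose defaults may differ from those of a given BLAST+ release, and <code>query.faa</code> and <code>my_protein_db</code> are hypothetical names. The function prints the command instead of running it, so it works without BLAST installed.</p>

```shell
#!/bin/sh
# Print a blastp command that states the table's BLASTP defaults explicitly.
# Arguments: $1 = query FASTA file, $2 = BLAST database name (both hypothetical).
blastp_cmd() {
    echo "blastp -query $1 -db $2" \
         "-gapopen 11 -gapextend 1" \
         "-evalue 10 -word_size 3 -matrix BLOSUM62" \
         "-num_descriptions 25 -num_alignments 15"
}

blastp_cmd query.faa my_protein_db
```

<p>Changing one value in the printed command (for example <code>-evalue 1e-5</code>) is then a matter of editing a single flag.</p>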
<p>Supported and Suggested&nbsp;Values&nbsp;for Gap Open and Extension in BLASTP, BLASTX, TBLASTN, and TBLASTX</p>
<table border="1">
<tbody>
<tr><th>Gap Open</th><th>Gap Extension</th></tr>
<tr>
<td align="center">10</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">10</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">11</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">8</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">9</td>
<td align="center">2</td>
</tr>
</tbody>
</table><p>Address of the bookmark: <a href="https://www.arabidopsis.org/Blast/BLASToptions.jsp" rel="nofollow">https://www.arabidopsis.org/Blast/BLASToptions.jsp</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44709/a-step-by-step-guide-to-running-blast-offline</guid>
	<pubDate>Sat, 07 Dec 2024 22:32:37 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44709/a-step-by-step-guide-to-running-blast-offline</link>
	<title><![CDATA[A Step-by-Step Guide to Running BLAST Offline]]></title>
	<description><![CDATA[<p>BLAST (Basic Local Alignment Search Tool) is a powerful algorithm used to compare nucleotide or protein sequences to sequence databases, identifying regions of similarity. Running BLAST offline provides more control, ensures data security, and allows customization for specific research needs. Here&rsquo;s a detailed guide to set up and run BLAST locally on your system.</p><hr><h3>Step 1: <strong>Install BLAST</strong></h3><ol>
<li>
<p><strong>Download BLAST</strong>:</p>
<ul>
<li>Visit the <a href="https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/">NCBI BLAST+ download page</a> to download the appropriate version for your operating system (Windows, macOS, or Linux).</li>
</ul>
</li>
<li>
<p><strong>Install BLAST</strong>:</p>
<ul>
<li>Extract the downloaded archive. For Linux/Mac, use:
<pre><code>tar -xvzf ncbi-blast-*.tar.gz
cd ncbi-blast-*
</code></pre>
</li>
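<li>Add the BLAST binary folder to your system PATH for easier access. Replace the placeholder path below with the actual directory created by the extraction step, because a <code>*</code> wildcard is not reliably expanded inside a variable assignment:
<pre><code>export PATH="$PATH:/path/to/ncbi-blast-VERSION/bin"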
</code></pre>
</li>
</ul>
</li>
<li>
<p><strong>Verify Installation</strong>:<br /> Run the following command to ensure BLAST is installed correctly:</p>
<pre><code>blastn -version
</code></pre>
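<p>If the version check fails, it can help to see which programs the shell cannot find. The following is a minimal sketch, assuming a POSIX shell; <code>check_tools</code> is a helper name made up for this example, not part of BLAST+.</p>

```shell
#!/bin/sh
# Print the name of every given program that is NOT found on PATH.
# Returns non-zero if at least one program is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# The BLAST+ programs used in this guide.
if check_tools blastn blastp makeblastdb update_blastdb.pl >/dev/null; then
    echo "All BLAST+ tools found on PATH"
else
    echo "Some BLAST+ tools are missing; re-check the PATH step" >&2
fi
```

<p>Running <code>check_tools blastn makeblastdb</code> without the redirection lists the missing names one per line.</p>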
</li>
</ol><hr><h3>Step 2: <strong>Prepare a Local Database</strong></h3><p>To run BLAST offline, you&rsquo;ll need a sequence database.</p><ol>
<li>
<p><strong>Download a Pre-Built Database (Optional)</strong>:</p>
<ul>
<li>NCBI provides ready-to-use databases such as <code>nt</code>, <code>nr</code>, and <code>swissprot</code> (Swiss-Prot). Use the <code>update_blastdb.pl</code> script (bundled with BLAST) to download these:
<pre><code>update_blastdb.pl --decompress nt
</code></pre>
</li>
</ul>
</li>
<li>
<p><strong>Create a Custom Database</strong>:<br /> If you have specific sequences to use as a database:</p>
<ul>
<li>Prepare a FASTA file containing the sequences.</li>
<li>Use <code>makeblastdb</code> to create a database:
<pre><code>makeblastdb -in your_sequences.fasta -dbtype [nucl|prot] -out custom_db
</code></pre>
Replace <code>[nucl|prot]</code> with <code>nucl</code> for nucleotide sequences or <code>prot</code> for protein sequences.</li>
</ul>
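<p>If you are unsure which <code>-dbtype</code> a FASTA file needs, a quick heuristic check can help. The sketch below assumes that a file is protein if any sequence line contains a letter that is not an IUPAC nucleotide code (E, F, I, J, L, O, P, Q, X or Z). <code>guess_dbtype</code> is a hypothetical helper, and the heuristic can be wrong for unusual sequences, so treat its answer as a hint only.</p>

```shell
#!/bin/sh
# Guess the makeblastdb -dbtype for a FASTA file.
# Header lines (starting with '>') are ignored; a sequence line containing
# a letter that is never a nucleotide code marks the file as protein.
guess_dbtype() {
    if grep -v '^>' "$1" | grep -qi '[EFIJLOPQXZ]'; then
        echo prot
    else
        echo nucl
    fi
}

# Small demonstration with made-up sequences.
tmpdir=$(mktemp -d)
printf '>query_sequence\nATGCGTAGCTAGCGTAGCTAGCTAGCTA\n' > "$tmpdir/dna.fasta"
printf '>example_protein\nMKTLLVAGQEFPI\n' > "$tmpdir/prot.fasta"
guess_dbtype "$tmpdir/dna.fasta"    # prints: nucl
guess_dbtype "$tmpdir/prot.fasta"   # prints: prot
rm -rf "$tmpdir"
```

<p>The result could then be passed on, for example <code>makeblastdb -in your_sequences.fasta -dbtype "$(guess_dbtype your_sequences.fasta)" -out custom_db</code>.</p>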
</li>
</ol><hr><h3>Step 3: <strong>Prepare the Query Sequence</strong></h3><ul>
<li>Save your query sequence(s) in FASTA format.</li>
<li>Ensure the file is properly formatted, with a header line starting with <code>&gt;</code> followed by the sequence name, and the sequence on subsequent lines.</li>
</ul><p>Example:</p><pre><code>&gt;query_sequence
ATGCGTAGCTAGCGTAGCTAGCTAGCTA
</code></pre><hr><h3>Step 4: <strong>Run BLAST</strong></h3><ol>
<li>
<p><strong>Choose the Appropriate BLAST Tool</strong>:<br /> Depending on your data type:</p>
<ul>
<li><strong>blastn</strong>: For nucleotide-nucleotide searches.</li>
<li><strong>blastp</strong>: For protein-protein searches.</li>
<li><strong>blastx</strong>: Translates nucleotide sequences into proteins and compares them to a protein database.</li>
<li><strong>tblastn</strong>: Compares protein sequences to a nucleotide database.</li>
<li><strong>tblastx</strong>: Translates both nucleotide query and database sequences.</li>
</ul>
</li>
<li>
<p><strong>Run the Command</strong>:<br /> Example command for <code>blastn</code>:</p>
<pre><code>blastn -query query.fasta -db custom_db -out results.txt -outfmt 6 -evalue 1e-5
</code></pre>
<p><strong>Explanation of Parameters</strong>:</p>
<ul>
<li><code>-query</code>: Specifies the query file.</li>
<li><code>-db</code>: Points to the local database.</li>
<li><code>-out</code>: Output file name.</li>
<li><code>-outfmt</code>: Output format (e.g., 6 for tabular format).</li>
<li><code>-evalue</code>: E-value cutoff for significance.</li>
</ul>
</li>
</ol><hr><h3>Step 5: <strong>Interpret Results</strong></h3><ol>
<li>
<p><strong>Output Formats</strong>:</p>
<ul>
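<li><strong>Default (outfmt 0)</strong>: Human-readable pairwise alignments.</li>
<li><strong>Tabular (outfmt 6)</strong>: One tab-separated line per hit. The 12 default columns are query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, and bit score.</li>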
</ul>
</li>
<li>
<p><strong>Analyze Results</strong>:<br /> Use tools like <code>grep</code>, Python, or R to parse and filter results for downstream analysis.</p>
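<p>A minimal Python sketch of such parsing, assuming the file was written with <code>-outfmt 6</code> and its default columns. The function name <code>best_hits</code> and the sample rows are made up for illustration; it keeps the highest-scoring hit for each query.</p>
<pre><code>
```python
import csv
import io

# Default column order of -outfmt 6.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query ID: hit dict}, keeping the top-bitscore hit per query."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        # Convert the numeric columns used below from text to float.
        row["pident"] = float(row["pident"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"], row)
        best[row["qseqid"]] = max(current, row, key=lambda hit: hit["bitscore"])
    return best


# Made-up rows for illustration; in practice pass open("results.txt", newline="").
sample = io.StringIO(
    "query_sequence\tsubjA\t100.0\t28\t0\t0\t1\t28\t101\t128\t2e-10\t52.8\n"
    "query_sequence\tsubjB\t92.0\t25\t2\t0\t1\t25\t7\t31\t3e-06\t38.2\n"
)
for query_id, hit in best_hits(sample).items():
    print(query_id, hit["sseqid"], hit["pident"], hit["evalue"])
```
</code></pre>
<p>If you request custom columns with <code>-outfmt "6 ..."</code>, the <code>COLUMNS</code> list has to be changed to match.</p>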
</li>
</ol><hr><h3>Step 6: <strong>Optimize Performance</strong></h3><p>For large datasets, BLAST can be resource-intensive. To improve performance:</p><ol>
<li>
<p><strong>Multithreading</strong>:<br /> Use the <code>-num_threads</code> option to leverage multiple CPU cores:</p>
<pre><code>blastn -query query.fasta -db custom_db -out results.txt -num_threads 4
</code></pre>
</li>
<li>
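<p><strong>Database Subsetting</strong>:<br /> Split large databases into smaller chunks, or build a database that contains only the sequences relevant to your question, so that each search scans less data. E-values depend on database size, so hits from separate chunks are only comparable if every search uses the same <code>-dbsize</code> value.</p>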
</li>
<li>
<p><strong>Adjust Parameters</strong>:</p>
<ul>
<li>Lower the <code>-evalue</code> threshold for stricter matches.</li>
<li>Use <code>-max_target_seqs</code> to limit the number of results per query.</li>
</ul>
</li>
</ol><hr><h3>Step 7: <strong>Update Databases (Optional)</strong></h3><p>If you use NCBI's preformatted databases, update them regularly with the <code>update_blastdb.pl</code> script bundled with BLAST+ so that searches include the latest sequences:</p><pre><code>update_blastdb.pl --decompress nt
</code></pre><hr><h3>Conclusion</h3><p>Running BLAST offline is a straightforward process that offers flexibility and security for bioinformaticians working with sensitive data. By following this guide, you can harness the power of BLAST to analyze sequences efficiently and gain valuable biological insights.</p><p>For advanced use cases, explore BLAST&rsquo;s customization options, such as custom scoring matrices, filtering, and iterative searches with tools like PSI-BLAST. Happy BLASTing!</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/40953/explore-taxdump-files</guid>
	<pubDate>Sat, 08 Feb 2020 04:44:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/40953/explore-taxdump-files</link>
	<title><![CDATA[Explore taxdump files !]]></title>
	<description><![CDATA[<pre>This is an extract of taxdump-readme.txt to be found at 
ftp://ftp.ncbi.nih.gov/pub/taxonomy/

The content of the archive
--------------------------

It may look like this:

delnodes.dmp
division.dmp
gencode.dmp
merged.dmp
names.dmp
nodes.dmp
readme.txt

The readme.txt file gives a brief description of *.dmp files. These files
contain taxonomic information and are briefly described below. Each file
stores one record per line, terminated by "\t|\n" (tab, vertical bar, and
newline) characters. Each record consists of one or more fields delimited
by "\t|\t" (tab, vertical bar, and tab) characters.
The brief description of field position and meaning for each file follows.
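
The record and field delimiters described above can be handled with a few
lines of Python. This is a minimal sketch, not an official parser: the
function name parse_dmp_line is an assumption, and the sample line only
illustrates the names.dmp layout.

```python
def parse_dmp_line(line):
    """Split one *.dmp record into a list of stripped field strings."""
    record = line.rstrip("\n")
    # Drop the record terminator "\t|" that precedes the newline.
    if record.endswith("\t|"):
        record = record[:-2]
    # Fields are separated by tab, vertical bar, tab.
    return [field.strip() for field in record.split("\t|\t")]


# Illustrative names.dmp record: tax_id, name_txt, unique name, name class.
sample = "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
print(parse_dmp_line(sample))
```

Empty fields, such as the unique name here, come back as empty strings.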

nodes.dmp
---------

This file represents taxonomy nodes. The description for each node includes 
the following fields:

	tax_id					-- node id in GenBank taxonomy database
 	parent tax_id				-- parent node id in GenBank taxonomy database
 	rank					-- rank of this node (superkingdom, kingdom, ...) 
 	embl code				-- locus-name prefix; not unique
 	division id				-- see division.dmp file
 	inherited div flag  (1 or 0)		-- 1 if node inherits division from parent
 	genetic code id				-- see gencode.dmp file
 	inherited GC  flag  (1 or 0)		-- 1 if node inherits genetic code from parent
 	mitochondrial genetic code id		-- see gencode.dmp file
 	inherited MGC flag  (1 or 0)		-- 1 if node inherits mitochondrial gencode from parent
 	GenBank hidden flag (1 or 0)            -- 1 if name is suppressed in GenBank entry lineage
 	hidden subtree root flag (1 or 0)       -- 1 if this subtree has no sequence data yet
 	comments				-- free-text comments and citations
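
Because every node stores its parent tax_id, a lineage can be rebuilt by
following parent links. The sketch below assumes the root node is its own
parent, which is how the loop knows where to stop. The function names and
the three toy records (truncated to the first three fields) are made up
for illustration.

```python
def parse_dmp_line(line):
    """Split one *.dmp record into a list of stripped field strings."""
    record = line.rstrip("\n")
    if record.endswith("\t|"):
        record = record[:-2]
    return [field.strip() for field in record.split("\t|\t")]


def load_parents(lines):
    """Map tax_id to (parent tax_id, rank) from nodes.dmp records."""
    parents = {}
    for line in lines:
        fields = parse_dmp_line(line)
        parents[int(fields[0])] = (int(fields[1]), fields[2])
    return parents


def lineage(tax_id, parents):
    """Return the tax_ids from tax_id up to the root node."""
    path = [tax_id]
    # Assumption: the root is the only node whose parent is itself.
    while parents[path[-1]][0] != path[-1]:
        path.append(parents[path[-1]][0])
    return path


# Toy records, not real taxonomy data.
toy_nodes = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tgenus\t|\n",
    "11\t|\t10\t|\tspecies\t|\n",
]
parents = load_parents(toy_nodes)
print(lineage(11, parents))
```

With a real nodes.dmp, pass the open file object to load_parents.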

names.dmp
---------
Taxonomy names file has these fields:

	tax_id					-- the id of node associated with this name
	name_txt				-- name itself
	unique name				-- the unique variant of this name if name not unique
	name class				-- (synonym, common name, ...)

division.dmp
------------
Divisions file has these fields:
	division id				-- taxonomy database division id
	division cde				-- GenBank division code (three characters), e.g. BCT, PLN, VRT, MAM, PRI...
	division name				-- full division name, e.g. Bacteria
	comments

gencode.dmp
-----------
Genetic codes file:

	genetic code id				-- GenBank genetic code id
	abbreviation				-- genetic code name abbreviation
	name					-- genetic code name
	cde					-- translation table for this genetic code
	starts					-- start codons for this genetic code

delnodes.dmp
------------
Deleted nodes (nodes that existed but were deleted) file field:

	tax_id					-- deleted node id

merged.dmp
----------
Merged nodes file fields:

	old_tax_id                              -- id of the node that has been merged
	new_tax_id                              -- id of the node that results from the merge
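
Stored tax_ids can go stale when nodes are merged or deleted. The sketch
below maps an old id to its current one using merged.dmp and delnodes.dmp
records. It assumes each new_tax_id is already current, so one lookup is
enough. The function names and toy records are made up for illustration.

```python
def parse_dmp_line(line):
    """Split one *.dmp record into a list of stripped field strings."""
    record = line.rstrip("\n")
    if record.endswith("\t|"):
        record = record[:-2]
    return [field.strip() for field in record.split("\t|\t")]


def load_merged(lines):
    """Map old_tax_id to new_tax_id from merged.dmp records."""
    return {int(old): int(new) for old, new in map(parse_dmp_line, lines)}


def load_deleted(lines):
    """Collect the tax_ids listed in delnodes.dmp records."""
    return {int(parse_dmp_line(line)[0]) for line in lines}


def resolve(tax_id, merged, deleted):
    """Return the current tax_id, or None if the node was deleted."""
    if tax_id in deleted:
        return None
    return merged.get(tax_id, tax_id)


# Toy records, not real taxonomy data.
merged = load_merged(["12\t|\t10\t|\n"])
deleted = load_deleted(["13\t|\n"])
print(resolve(12, merged, deleted), resolve(13, merged, deleted))
```

Ids that appear in neither file are returned unchanged.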

</pre>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>