<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44640?offset=30</link>
	<atom:link href="https://bioinformaticsonline.com/related/44640?offset=30" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/fun/view/4196/chemical-elements-of-bioinformatics</guid>
	<pubDate>Tue, 03 Sep 2013 16:35:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/fun/view/4196/chemical-elements-of-bioinformatics</link>
	<title><![CDATA[Chemical Elements of Bioinformatics]]></title>
	<description><![CDATA[<p>You are probably familiar with the periodic table and its colour pattern, but this time you will be amazed by a new table of elements from Eagle Genomics. Check it out and have fun :)</p><p><a href="http://elements.eaglegenomics.com/">http://elements.eaglegenomics.com/</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38389/blast-options-setting-and-defaults</guid>
	<pubDate>Mon, 10 Dec 2018 08:29:37 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38389/blast-options-setting-and-defaults</link>
	<title><![CDATA[BLAST options, setting and defaults]]></title>
	<description><![CDATA[<p>BLAST stands for Basic Local Alignment Search Tool and was developed by Altschul et al. (1990) and significantly improved by&nbsp;<a href="http://www3.oup.co.uk/nar/Volume_25/Issue_17/freepdf/">Altschul et al. (1997).</a>&nbsp;It is a very fast search algorithm that is used to separately search protein or DNA databases. BLAST is best used for sequence similarity searching, rather than for motif searching. For searches using a query sequence of fewer than twenty residues,&nbsp;<a href="https://www.arabidopsis.org/servlets/tools/patmatch/">PatMatch</a>&nbsp;is the best choice. Another sequence alignment tool that may yield different results from BLAST, and may be useful for motif searching, is&nbsp;<a href="https://www.arabidopsis.org/cgi-bin/fasta/TAIRfasta.pl">FASTA</a>. To search nonplant datasets, try&nbsp;<a href="http://seqsim.ncgr.org/newBlast.html">NCGR BLAST</a>&nbsp;or&nbsp;<a href="http://www.ncbi.nlm.nih.gov/blast/blast.cgi?Jform=0">NCBI BLAST</a>.</p>
<p>A fairly complete online guide to BLAST searching can be found at the&nbsp;<a href="http://www.ncbi.nlm.nih.gov/BLAST/blast_help.html">NCBI BLAST Help Manual</a>. For a theoretical overview of BLAST, see the&nbsp;<a href="http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html">NCBI BLAST Course</a>. Additional information can be found in the&nbsp;<a href="https://www.arabidopsis.org/blast/aboutblast2.htm">BLAST 2.0 Release Notes</a>.</p>
<table border="1">
<tbody>
<tr><th>&nbsp;</th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">BLASTN</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">BLASTP</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">BLASTX</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">TBLASTN</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">TBLASTX</a></th><th><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#methods">PSIBLAST</a></th></tr>
<tr>
<td><a name="open" id="open"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#open"><strong>Gap opening penalty</strong></a>:<br>cost to open a gap [integer]</td>
<td align="center">default = 5</td>
<td align="center">default = 11<br>limited&nbsp;values&nbsp;are supported</td>
<td align="center">default = 11<br>limited&nbsp;values&nbsp;are supported</td>
<td align="center">default = 11<br>limited&nbsp;values&nbsp;are supported</td>
<td align="center">default = 11<br>limited&nbsp;values&nbsp;are supported</td>
<td align="center">default = 5</td>
</tr>
<tr>
<td><a name="extend" id="extend"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#extend"><strong>Gap extension penalty</strong></a>:<br>cost to extend a gap [integer]</td>
<td align="center">default = 2</td>
<td align="center">default = 1<br>a 0 in this field means to use the default</td>
<td align="center">default = 1<br>a 0 in this field means to use the default</td>
<td align="center">default = 1<br>a 0 in this field means to use the default</td>
<td align="center">default = 1<br>a 0 in this field means to use the default</td>
<td align="center">default = 2</td>
</tr>
<tr>
<td><a name="match" id="match"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#match"><strong>Nucleic match</strong></a>:<br>reward for a match in the BLAST portion of run [integer]</td>
<td align="center">default = 1</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">default = 1</td>
</tr>
<tr>
<td><a name="mismatch" id="mismatch"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#mismatch"><strong>Nucleic mismatch</strong></a>:<br>penalty for a mismatch in the BLAST portion of run [integer]</td>
<td align="center">default = -3</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">default = -3</td>
</tr>
<tr>
<td><strong><a name="expect" id="expect"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#expect">Expectation value</a></strong>:<br>(E) [real]</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
<td align="center">default = 10.0</td>
</tr>
<tr>
<td><a name="word" id="word"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#word"><strong>Word size</strong></a>:<br>the size of the initial word that must be matched between the database and the query sequence</td>
<td align="center">default = 11</td>
<td align="center">default = 3</td>
<td align="center">default = 3</td>
<td align="center">default = 3</td>
<td align="center">default = 3</td>
<td align="center">default = 11</td>
</tr>
<tr>
<td><a name="descriptions" id="descriptions"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#descriptions"><strong>Max scores</strong></a>:<br>number of one-line descriptions (V) [integer]</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
<td align="center">default = 25</td>
</tr>
<tr>
<td><strong><a name="alignments" id="alignments"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#alignments">Max alignments</a></strong>:<br>number of alignments to show (B) [integer]</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
<td align="center">default = 15</td>
</tr>
<tr>
<td><strong>Query filter</strong>:<br>filter applied to the query sequence</td>
<td align="center">default = DUST</td>
<td align="center">default = SEG</td>
<td align="center">default = SEG</td>
<td align="center">default = SEG</td>
<td align="center">default = SEG</td>
<td align="center">default = DUST</td>
</tr>
<tr>
<td><strong><a name="gencodes" id="gencodes"></a><a href="https://www.arabidopsis.org/Blast/BLAST_help.jsp#gencodes">Query genetic code</a></strong>:<br>genetic code to be used in BLASTX translation of the query</td>
<td align="center">n/a</td>
<td align="center">n/a</td>
<td align="center">default = universal</td>
<td align="center">default = universal</td>
<td align="center">default = universal</td>
<td align="center">n/a</td>
</tr>
<tr>
<td><strong><a name="matrix" id="matrix"></a><a href="http://twod.med.harvard.edu/seqanal/matrices.html">Matrix</a></strong>:<br>substitution matrix to be used for amino acid comparisons</td>
<td align="center">no default</td>
<td align="center">default = blosum62</td>
<td align="center">default = blosum62</td>
<td align="center">default = blosum62</td>
<td align="center">default = blosum62</td>
<td align="center">no default</td>
</tr>
</tbody>
</table>
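<p>To make the scoring defaults in the table concrete, here is a minimal sketch of how the match reward, mismatch penalty and gap penalties combine into a raw alignment score. It assumes the common affine convention in which a gap of length k costs open + k * extend; the function names <code>gap_cost</code> and <code>raw_nucleotide_score</code> are hypothetical helpers written for illustration and are not part of BLAST.</p>

```python
def gap_cost(length, gap_open, gap_extend):
    """Affine cost of one gap: pay gap_open once plus gap_extend per gapped position.

    Assumption: a gap of length k costs gap_open + k * gap_extend.
    """
    if length == 0:
        return 0
    return gap_open + gap_extend * length


def raw_nucleotide_score(matches, mismatches, gap_lengths,
                         reward=1, penalty=-3, gap_open=5, gap_extend=2):
    """Raw score of a nucleotide alignment using the BLASTN defaults from the table."""
    score = matches * reward + mismatches * penalty
    for length in gap_lengths:
        score -= gap_cost(length, gap_open, gap_extend)
    return score


# 50 matches, 2 mismatches and one gap of length 3 with the BLASTN defaults
print(raw_nucleotide_score(50, 2, [3]))

# A single-residue gap with the protein defaults (open = 11, extend = 1)
print(gap_cost(1, 11, 1))
```

<p>Under these assumptions, raising the gap opening penalty discourages many short gaps, while raising the extension penalty discourages long ones.</p>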
<p>Supported and Suggested&nbsp;Values&nbsp;for Gap Open and Extension in BLASTP, BLASTX, TBLASTN, and TBLASTX</p>
<table border="1">
<tbody>
<tr><th>Gap Open</th><th>Gap Extension</th></tr>
<tr>
<td align="center">10</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">10</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">11</td>
<td align="center">1</td>
</tr>
<tr>
<td align="center">8</td>
<td align="center">2</td>
</tr>
<tr>
<td align="center">9</td>
<td align="center">2</td>
</tr>
</tbody>
</table><p>Address of the bookmark: <a href="https://www.arabidopsis.org/Blast/BLASToptions.jsp" rel="nofollow">https://www.arabidopsis.org/Blast/BLASToptions.jsp</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/44370/ncbiblast-2141-now-available</guid>
	<pubDate>Wed, 30 Aug 2023 02:36:13 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/44370/ncbiblast-2141-now-available</link>
	<title><![CDATA[NCBIBLAST+ 2.14.1 now available]]></title>
	<description><![CDATA[<p><a href="https://www.linkedin.com/feed/hashtag/?keywords=ncbiblast&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7101231946264924160">#NCBIBLAST</a><span>+ 2.14.1 now available with improved documentation, faster and more reliable database downloads, and some bug fixes.&nbsp;</span></p><p>Check out the changes they made.</p><p>They added the&nbsp;<code><span>cleanup-blastdb-volumes.py</span></code>&nbsp;script to remove unused BLAST database volumes. Read the documentation&nbsp;<a href="https://www.ncbi.nlm.nih.gov/books/NBK592857/">here</a>.</p><p>They also switched the protocol from&nbsp;<code><span>ftp</span></code>&nbsp;to&nbsp;<code><span>https</span></code>&nbsp;to access BLAST databases for increased performance and reliability when downloading data from the NCBI with the&nbsp;<code><span>update_blastdb.pl</span></code>&nbsp;script.</p><p>And fixed a few bugs related to downloading data from the NCBI, and&nbsp;<code><span>mt_mode</span></code>&nbsp;crashing&nbsp;<code><span>blastn</span></code>&nbsp;and&nbsp;<code><span>blastx</span></code>.</p><p>Check out the&nbsp;<a href="https://www.ncbi.nlm.nih.gov/books/NBK131777/">release notes</a>.</p><p>Download&nbsp;<a href="https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.14.1/">BLAST+ 2.14.1</a></p><p>Questions or comments? Please write the&nbsp;<a href="https://support.nlm.nih.gov/support/create-case/">BLAST help desk</a>.</p><p><span><span>More info and download:</span>&nbsp;https://blast.ncbi.nlm.nih.gov/doc/blast-news/2023-BLAST-News.html</span></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44709/a-step-by-step-guide-to-running-blast-offline</guid>
	<pubDate>Sat, 07 Dec 2024 22:32:37 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44709/a-step-by-step-guide-to-running-blast-offline</link>
	<title><![CDATA[A Step-by-Step Guide to Running BLAST Offline]]></title>
	<description><![CDATA[<p>BLAST (Basic Local Alignment Search Tool) is a powerful algorithm used to compare nucleotide or protein sequences to sequence databases, identifying regions of similarity. Running BLAST offline provides more control, ensures data security, and allows customization for specific research needs. Here&rsquo;s a detailed guide to set up and run BLAST locally on your system.</p><hr><h3>Step 1: <strong>Install BLAST</strong></h3><ol>
<li>
<p><strong>Download BLAST</strong>:</p>
<ul>
<li>Visit the <a href="https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/">NCBI BLAST+ download page</a> to download the appropriate version for your operating system (Windows, macOS, or Linux).</li>
</ul>
</li>
<li>
<p><strong>Install BLAST</strong>:</p>
<ul>
<li>Extract the downloaded archive. For Linux/Mac, use:
<pre><code>tar -xvzf ncbi-blast-*.tar.gz
cd ncbi-blast-*
</code></pre>
</li>
<li>Add the BLAST binary folder to your system PATH for easier access:
<pre><code>export PATH=$PATH:/path/to/ncbi-blast-*/bin
</code></pre>
</li>
</ul>
</li>
<li>
<p><strong>Verify Installation</strong>:<br /> Run the following command to ensure BLAST is installed correctly:</p>
<pre><code>blastn -version
</code></pre>
</li>
</ol><hr><h3>Step 2: <strong>Prepare a Local Database</strong></h3><p>To run BLAST offline, you&rsquo;ll need a sequence database.</p><ol>
<li>
<p><strong>Download a Pre-Built Database (Optional)</strong>:</p>
<ul>
<li>NCBI provides ready-to-use databases such as <code>nt</code>, <code>nr</code>, and Swiss-Prot (<code>swissprot</code>). Use the <code>update_blastdb.pl</code> script (bundled with BLAST) to download these:
<pre><code>update_blastdb.pl --decompress nt
</code></pre>
</li>
</ul>
</li>
<li>
<p><strong>Create a Custom Database</strong>:<br /> If you have specific sequences to use as a database:</p>
<ul>
<li>Prepare a FASTA file containing the sequences.</li>
<li>Use <code>makeblastdb</code> to create a database:
<pre><code>makeblastdb -in your_sequences.fasta -dbtype [nucl|prot] -out custom_db
</code></pre>
Replace <code>[nucl|prot]</code> with <code>nucl</code> for nucleotide sequences or <code>prot</code> for protein sequences.</li>
</ul>
</li>
</ol><hr><h3>Step 3: <strong>Prepare the Query Sequence</strong></h3><ul>
<li>Save your query sequence(s) in FASTA format.</li>
<li>Ensure the file is properly formatted, with a header line starting with <code>&gt;</code> followed by the sequence name, and the sequence on subsequent lines.</li>
</ul><p>Example:</p><pre><code>&gt;query_sequence
ATGCGTAGCTAGCGTAGCTAGCTAGCTA
</code></pre><hr><h3>Step 4: <strong>Run BLAST</strong></h3><ol>
<li>
<p><strong>Choose the Appropriate BLAST Tool</strong>:<br /> Depending on your data type:</p>
<ul>
<li><strong>blastn</strong>: For nucleotide-nucleotide searches.</li>
<li><strong>blastp</strong>: For protein-protein searches.</li>
<li><strong>blastx</strong>: Translates nucleotide sequences into proteins and compares them to a protein database.</li>
<li><strong>tblastn</strong>: Compares protein sequences to a nucleotide database.</li>
<li><strong>tblastx</strong>: Translates both nucleotide query and database sequences.</li>
</ul>
</li>
<li>
<p><strong>Run the Command</strong>:<br /> Example command for <code>blastn</code>:</p>
<pre><code>blastn -query query.fasta -db custom_db -out results.txt -outfmt 6 -evalue 1e-5
</code></pre>
<p><strong>Explanation of Parameters</strong>:</p>
<ul>
<li><code>-query</code>: Specifies the query file.</li>
<li><code>-db</code>: Points to the local database.</li>
<li><code>-out</code>: Output file name.</li>
<li><code>-outfmt</code>: Output format (e.g., 6 for tabular format).</li>
<li><code>-evalue</code>: E-value cutoff for significance.</li>
</ul>
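<p>When the same search has to be repeated for many query files, the command above can be assembled in a script. The following is a minimal sketch, assuming Python 3 and using only the flags explained above; <code>build_blastn_command</code> and <code>run_blastn</code> are hypothetical helper names, and the file names are the placeholders used in this guide.</p>

```python
import shutil
import subprocess


def build_blastn_command(query, db, out, evalue="1e-5", outfmt=6):
    """Return the blastn argument list using the flags described in this guide."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", str(outfmt),
        "-evalue", str(evalue),
    ]


def run_blastn(cmd):
    """Run the search if blastn is on PATH; return True when it was executed."""
    if shutil.which(cmd[0]) is None:
        return False
    # check=True raises CalledProcessError if blastn exits with an error
    subprocess.run(cmd, check=True)
    return True


cmd = build_blastn_command("query.fasta", "custom_db", "results.txt")
print(" ".join(cmd))
```

<p>Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces. Call <code>run_blastn(cmd)</code> once the query file and database exist.</p>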
</li>
</ol><hr><h3>Step 5: <strong>Interpret Results</strong></h3><ol>
<li>
<p><strong>Output Formats</strong>:</p>
<ul>
<li><strong>Default (outfmt 0)</strong>: Human-readable format.</li>
<li><strong>Tabular (outfmt 6)</strong>: Includes fields like query ID, subject ID, percent identity, alignment length, etc.</li>
</ul>
</li>
<li>
<p><strong>Analyze Results</strong>:<br /> Use tools like <code>grep</code>, Python, or R to parse and filter results for downstream analysis.</p>
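<p>As one possible way to do this in Python, the sketch below reads tabular (outfmt 6) output and keeps only hits above an identity and length threshold. It assumes the default 12-column layout of outfmt 6; the rows in <code>sample</code> are made up for illustration, and <code>read_hits</code> and <code>filter_hits</code> are hypothetical helper names.</p>

```python
import csv
import io

# Default column order of -outfmt 6
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Yield one dict per tab-separated hit line, with numeric fields converted."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def filter_hits(hits, min_identity=90.0, min_length=50):
    """Keep hits that meet both the identity and alignment-length thresholds."""
    return [h for h in hits
            if h["pident"] >= min_identity and h["length"] >= min_length]


# Made-up rows standing in for the contents of results.txt
sample = (
    "q1\tsubjA\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t360\n"
    "q1\tsubjB\t85.00\t120\t18\t0\t5\t124\t40\t159\t2e-20\t95.3\n"
    "q2\tsubjC\t99.00\t30\t0\t0\t1\t30\t7\t36\t4e-08\t56.5\n"
)
kept = filter_hits(read_hits(io.StringIO(sample)))
for hit in kept:
    print(hit["qseqid"], hit["sseqid"], hit["pident"], hit["length"])
```

<p>To process a real result file, replace <code>io.StringIO(sample)</code> with <code>open("results.txt", newline="")</code>. The thresholds are arbitrary examples and should be chosen for the data at hand.</p>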
</li>
</ol><hr><h3>Step 6: <strong>Optimize Performance</strong></h3><p>For large datasets, BLAST can be resource-intensive. To improve performance:</p><ol>
<li>
<p><strong>Multithreading</strong>:<br /> Use the <code>-num_threads</code> option to leverage multiple CPU cores:</p>
<pre><code>blastn -query query.fasta -db custom_db -out results.txt -num_threads 4
</code></pre>
</li>
<li>
<p><strong>Database Subsetting</strong>:<br /> Split large databases into smaller chunks for faster searches.</p>
</li>
<li>
<p><strong>Adjust Parameters</strong>:</p>
<ul>
<li>Lower the <code>-evalue</code> threshold for stricter matches.</li>
<li>Use <code>-max_target_seqs</code> to limit the number of results per query.</li>
</ul>
</li>
</ol><hr><h3>Step 7: <strong>Update Databases (Optional)</strong></h3><p>If using NCBI databases, regularly update them to ensure the inclusion of the latest sequences:</p><pre><code>update_blastdb.pl --decompress nt
</code></pre><hr><h3>Conclusion</h3><p>Running BLAST offline is a straightforward process that offers flexibility and security for bioinformaticians working with sensitive data. By following this guide, you can harness the power of BLAST to analyze sequences efficiently and gain valuable biological insights.</p><p>For advanced use cases, explore BLAST&rsquo;s customization options, such as custom scoring matrices, filtering, and iterative searches with tools like PSI-BLAST. Happy BLASTing!</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/2030/phylomedb</guid>
	<pubDate>Mon, 12 Aug 2013 11:55:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/2030/phylomedb</link>
	<title><![CDATA[PhylomeDB]]></title>
	<description><![CDATA[<p><span>PhylomeDB is a public database for complete&nbsp;</span><strong>collections of gene phylogenies</strong><span>&nbsp;(phylomes). It allows users to interactively explore the evolutionary history of genes through the visualization of phylogenetic trees and multiple sequence alignments.</span></p><p><span><span>Moreover, PhylomeDB provides genome-wide orthology and paralogy predictions based on the analysis of the phylogenetic trees. The automated pipeline used to reconstruct trees aims at providing a&nbsp;</span><strong>high-quality phylogenetic analysis</strong><span>&nbsp;of different genomes, including Maximum Likelihood or Bayesian tree inference, alignment trimming and evolutionary model testing. PhylomeDB also includes a public download section with the complete set of trees, alignments and orthology predictions.</span></span></p><p>&nbsp;</p><p>More at&nbsp;<a href="http://phylomedb.org/">http://phylomedb.org/</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27685/biodbnet</guid>
	<pubDate>Thu, 02 Jun 2016 11:11:47 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27685/biodbnet</link>
	<title><![CDATA[BioDBnet]]></title>
	<description><![CDATA[<p><span>Database to Database Conversions</span> </p>
<p>db2db converts identifiers from one database into identifiers or annotations from other databases. To use db2db, select the input type of your data; changing the input type automatically changes the output options to those available for the selected input. Then select one or more output types and add your identifiers in the ID list box. Set "remove duplicate values" to 'No' if you do not want duplicates to be removed. Clicking on submit returns a table of your inputs matched against all the selected outputs, in the exact order entered. Results can be limited to a particular taxon by entering its <a href="https://biodbnet-abcc.ncifcrf.gov/tools/orgTaxon.php">Taxon ID</a>. Performance varies widely depending on the number of outputs and the options selected. Conversions to a single output with the default options should complete in a few seconds.</p><p>Address of the bookmark: <a href="https://biodbnet-abcc.ncifcrf.gov/db/db2db.php" rel="nofollow">https://biodbnet-abcc.ncifcrf.gov/db/db2db.php</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31123/biodownloader</guid>
	<pubDate>Sat, 25 Feb 2017 17:52:33 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31123/biodownloader</link>
	<title><![CDATA[BioDownloader]]></title>
	<description><![CDATA[<p><strong><em>BioDownloader</em></strong> is a program for downloading and/or updating files from ftp/http servers. The program has unique features that are specifically designed to deal with bioinformatics data files and servers:</p>
<ul>
<li>optimized to work with vast amounts of data and very large file sets (~ 10,000 - 100,000 files).</li>
<li>allows the selective retrieval of only the required files (file masks, ls-lR parsing, recursive search, updates)</li>
<li>has a built-in repository containing the settings for the most common bioinformatics download needs</li>
<li>built-in wizard for batch post-processing of downloaded files (archive extraction, file conversion, etc.)</li>
<li>capable of performing multiple download or update tasks simultaneously</li>
</ul>
<p>BioDownloader has a built-in repository containing the settings for common bioinformatics file-synchronization needs, including the Protein Data Bank (PDB) and National Center for Biotechnology Information (NCBI) databases. It can post-process downloaded files, including archive extraction and file conversions.</p>
<p>Address of the bookmark: <a href="http://dunbrack.fccc.edu/BioDownloader/" rel="nofollow">http://dunbrack.fccc.edu/BioDownloader/</a></p>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43714/hiv-genome-database</guid>
	<pubDate>Fri, 21 Jan 2022 05:40:15 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43714/hiv-genome-database</link>
	<title><![CDATA[HIV genome database !]]></title>
	<description><![CDATA[<p>HIV resources</p>
<p>Address of the bookmark: <a href="https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html" rel="nofollow">https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</guid>
	<pubDate>Tue, 23 Mar 2021 05:32:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42987/public-databases-for-bioinformatics</link>
	<title><![CDATA[Public Databases for Bioinformatics !]]></title>
	<description><![CDATA[<pre>https://www.nature.com/articles/s41467-020-17155-y<br><br>Server Infrastructure:

File Server:

dhara: Synology 3614 Storage Appliance
4 Core Xeon
108TB disk storage
10Gb ethernet to SCG3
Access atx: dhara:5000
Has btsync server (try it - it's much better than Dropbox)

Compute Servers:

nandi: Kundaje and Phi Server
24 intel cores
256GB RAM
500GB of SSD storage 
36TB RAID6 local storage
4 Intel Phis (space for 4 more GPUs)


durga: Montgomery and sensitive data
24 intel cores
256GB RAM
500GB of SSD RAID0 storage 
60TB RAID6 local storage

mitra: Bassik and Web/DB Server
24 core
256GB RAM 
500GB of SSD RAID0 storage 
36TB RAID6 local storage

vayu: Kundaje GPU server
4 core
64GB RAM 
200GB of SSD storage 
8TB RAID10 local storage
4 Nvidia GTX 970 4GB GPUs

amold: Bickel and SGE server
32 AMD core
128GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

wotan: Bickel and SGE server
64 AMD core
256GB RAM 
200GB of SSD storage 
12TB RAID5 local storage

Filesystem:

/users/$USER
default home directory
full backups nightly 
nfs mount to dhara
should store code, papers, and other highly processed data here

/mnt/data/
globally accessible data
should store common data here
e.g. genomes and indexes, annotations, ENCODE data  
if you don't want this to count towards your quota you must chown

/mnt/lab_data/$LAB/
lab accessible data
should store lab project data here 
e.g. ATAC-seq prediction data, enhancer prediction, motif calls

/srv/scratch/$USER
fast local storage
not backed up, but on raid and data will never be deleted
most analysis should be performed here

/srv/persistent/$USER
fast local storage
synced nightly, but not backed up
       i.e. if the hard drives fail or you delete something and notice 
       within 24 hours we can recover. Otherwise not. (vs home which is 
       properly backed up )  
intermediate analysis products that would be hard to recover should be stored here 
       e.g. stochastic analysis results that need to be kept so that paper 
       results can be reproduced

/srv/www/$LABNAME/
web accessible from mitra.stanford.edu
*NOT BACKED UP*

Some parallel programming patterns:

# gzip a bunch of files
parallel gzip -- *.FILESTOGZIP

# fork example in python:
(for more detailed examples look at 
 https://github.com/nboley/grit/ grit/lib/multiprocessing_utils.py)

import os
import time
import random

import multiprocessing

class ProcessSafeOPStream( object ):
    def __init__( self, writeable_obj ):
        self.writeable_obj = writeable_obj
        self.lock = multiprocessing.Lock()
        self.name = self.writeable_obj.name
        return
    
    def write( self, data ):
        self.lock.acquire()
        self.writeable_obj.write( data )
        self.writeable_obj.flush()
        self.lock.release()
        return
    
    def close( self ):
        self.writeable_obj.close()

def worker(queue, ofp):
    # Reseed in each child: forked workers can otherwise inherit the
    # parent's RNG state (try removing this line to check).
    random.seed()
    while True:
        i = queue.get()
        if i == 'FINISHED': return
        # simulate an expensive function
        x = random.random()
        time.sleep(x/10)
        print(i, x)
        ofp.write("%i\t%s\n" % (i, x))

NSIMS = 10000
NPROC = 25

# populate queue, followed by one 'FINISHED' sentinel per worker
todo = multiprocessing.Queue()
for i in range(NSIMS): todo.put(i)
for i in range(NPROC): todo.put('FINISHED')

ofp = ProcessSafeOPStream( open("output.txt", "w") )

pids = []
for i in range(NPROC):
    # os.fork() is only available on POSIX systems
    pid = os.fork()
    if pid == 0:
       # child: process tasks, then exit without running parent cleanup
       worker(todo, ofp)
       os._exit(0)
    else:
       pids.append(pid)

# wait for every child to exit
for pid in pids:
    os.waitpid(pid, 0)

ofp.close()

print("FINISHED")<br><br></pre>
<p>For use case 1 we obtained the following ENCODE and ROADMAP datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz">https://www.encodeproject.org/files/ENCFF446WOD/@@download/ENCFF446WOD.bed.gz</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam">https://www.encodeproject.org/files/ENCFF546PJU/@@download/ENCFF546PJU.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam">https://www.encodeproject.org/files/ENCFF059BEU/@@download/ENCFF059BEU.bam</a>. Blacklisted regions were obtained from&nbsp;<a href="http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz">http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/hg38-human/hg38.blacklist.bed.gz</a>. The human genome version hg38 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz</a>.</p>
<p>For use case 2 we used the set of narrowPeak files summarized in&nbsp;<a href="https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt">https://github.com/wkopp/janggu_usecases/tree/master/extra/urls.txt</a>&nbsp;(archived version v1.0.1). The human genome version hg19 was obtained from&nbsp;<a href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz</a></p>
<p>For use case 3 we used the ENCODE datasets&nbsp;<a href="https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam">https://www.encodeproject.org/files/ENCFF591XCX/@@download/ENCFF591XCX.bam</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig">https://www.encodeproject.org/files/ENCFF736LHE/@@download/ENCFF736LHE.bigWig</a>,&nbsp;<a href="https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam">https://www.encodeproject.org/files/ENCFF177HHM/@@download/ENCFF177HHM.bam</a>&nbsp;as well as the GENCODE annotation v29 from&nbsp;<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz">ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz</a>.</p><p>Address of the bookmark: <a href="http://mitra.stanford.edu/" rel="nofollow">http://mitra.stanford.edu/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/38664/updated-ranking-of-institutes-and-countries-based-on-developed-biological-databases</guid>
	<pubDate>Fri, 11 Jan 2019 09:35:26 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/38664/updated-ranking-of-institutes-and-countries-based-on-developed-biological-databases</link>
	<title><![CDATA[Updated ranking of institutes and countries based on developed biological databases]]></title>
	<description><![CDATA[<p><span><span>An updated ranking of institutes and countries based on the biological databases they have developed is available at </span></span><a href="https://lnkd.in/fiVAdM6" target="_blank">https://lnkd.in/fiVAdM6</a><span><span>. India maintains 4th position, and the "Institute of Microbial Technology, Chandigarh" is in 3rd position (after EBI and NCBI). Reaching 3rd position in the world is a big achievement for any institute.</span></span></p><p><span><span>More at&nbsp;http://bigd.big.ac.cn/databasecommons/stat</span></span></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>