<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/26395?offset=20</link>
	<atom:link href="https://bioinformaticsonline.com/related/26395?offset=20" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43401/levenshtein-and-damerau-levenshtein-distance</guid>
	<pubDate>Tue, 28 Sep 2021 04:38:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43401/levenshtein-and-damerau-levenshtein-distance</link>
	<title><![CDATA[Levenshtein and Damerau-Levenshtein distance !]]></title>
	<description><![CDATA[<h3><strong>Levenshtein Distance</strong></h3><p>Also known as <strong>Edit Distance</strong>, it is the minimum number of single-character edits (deletions, insertions, or substitutions) required to transform a source string into the target one. For example, if the source is &ldquo;back&rdquo; and the target is &ldquo;book&rdquo;, you need to change &ldquo;a&rdquo; to &ldquo;o&rdquo; and &ldquo;c&rdquo; to &ldquo;o&rdquo;, which gives a Levenshtein Distance of 2. Edit Distance is straightforward to implement, and it is a popular challenge in coding interviews.</p><p>Additionally, some frameworks also support the Damerau-Levenshtein distance:</p><p>&nbsp;</p><h3><strong>Damerau-Levenshtein distance</strong></h3><p>It is an extension of Levenshtein Distance that allows one extra operation: <strong><em>Transposition</em></strong>&nbsp;of two adjacent characters:</p><p><strong>Ex: </strong>TSAR to STAR</p><p><strong>Damerau-Levenshtein distance = </strong>1&nbsp; (Swapping the adjacent T and S costs only one operation)</p><p><strong>Levenshtein distance = 2&nbsp;</strong> (Replace T by S and S by T)</p>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/9028/linux-for-bioinformatician</guid>
	<pubDate>Thu, 13 Mar 2014 16:59:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/9028/linux-for-bioinformatician</link>
	<title><![CDATA[Linux for bioinformatician !!!]]></title>
	<description><![CDATA[<p>Linux, a free operating system for computers, provides several powerful admin tools and utilities that help you manage your systems effectively and handle huge amounts of genomic/biological data with ease. The field of bioinformatics relies heavily on Linux-based computers and software. Although most bioinformatics programs can be compiled to run. If you don&rsquo;t know what these not-so-user-friendly tools are and how to use them, you could spend a lot of time trying to perform even basic admin tasks. The focus of this Linux series is to help you understand system administration as well as the basic tools, which will help you become an effective bioinformatician and computational biologist.<br /><br /></p><p>To learn more about Linux and its importance for bioinformaticians, please read the article "<a href="http://www.ualberta.ca/~stothard/downloads/linux_for_bioinformatics.pdf">An introduction to Linux for bioinformatics</a>" by Paul Stothard.</p><p>Linux cheat sheet at http://bioinformaticsonline.com/file/view/87/linux-cheat-sheet</p><p>Please browse the further useful Linux pages on the right-hand side ...</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/11313/linux-sort-commands-for-bioinformatics</guid>
	<pubDate>Sat, 31 May 2014 15:41:16 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/11313/linux-sort-commands-for-bioinformatics</link>
	<title><![CDATA[Linux Sort Commands for Bioinformatics]]></title>
	<description><![CDATA[<p>Almost all scripting languages, such as Perl and Python, have a built-in sort, but none of them is as flexible as the sort command. When it comes to space efficiency, GNU sort stands at the top: it can sort a 20 GB file with less than 2 GB of memory. It is not trivial to implement so powerful a sort by yourself.</p><p>sort a space-delimited file based on its first column, then the second if the first is the same, and so on:<br />sort input.txt</p><p>sort a huge file (GNU sort ONLY), limiting the memory buffer with -S and placing temporary files with -T:<br />sort -S 1500M -T $HOME/tmp input.txt &gt; sorted.txt</p><p>sort starting from the third column, skipping the first two columns (older versions of sort wrote this as sort +2):<br />sort -k3 input.txt</p><p>sort the second column as numbers, descending order; if identical, sort the 3rd as strings, ascending order:<br />sort -k2,2nr -k3,3 input.txt</p><p>sort starting from the 4th character at column 2, as numbers:<br />sort -k2.4n input.txt</p><p>More Linux sort command information<br /><br />If you have any sort commands you'd like to share, please add them to our comments section below. For more help, you can also type:<br /><br />man sort<br /><br />or<br /><br />sort --help<br /><br />on your Unix/Linux system.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/34864/installing-perl-environment-on-linux</guid>
	<pubDate>Tue, 26 Dec 2017 21:21:50 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/34864/installing-perl-environment-on-linux</link>
	<title><![CDATA[Installing Perl environment on Linux]]></title>
	<description><![CDATA[<p>By using&nbsp;<code>plenv</code>, you can easily install and switch among different versions of Perl. It is installed under your home directory in&nbsp;<code>~/.plenv</code>.</p><h4>Install Perl (with multithreading support) and CPANMinus.</h4><pre><code> $ cd
 $ git clone https://github.com/tokuhirom/plenv.git ~/.plenv
 $ git clone https://github.com/tokuhirom/Perl-Build.git ~/.plenv/plugins/perl-build/
 $ echo 'export PATH="$HOME/.plenv/bin:$PATH"' &gt;&gt; ~/.bashrc
 $ echo 'eval "$(plenv init -)"' &gt;&gt; ~/.bashrc
 $ source ~/.bashrc
 $ plenv install 5.18.1 -Dusethreads
 $ plenv rehash
 $ plenv global 5.18.1
 $ plenv install-cpanm
</code></pre><ul>
<li><code>git</code>&nbsp;is a distributed revision control and source code management software which can help you to download files from GitHub server.</li>
<li><code>echo</code>&nbsp;means "print".</li>
<li><code>&gt;&gt;</code>&nbsp;appends the output to the end of the file, while&nbsp;<code>&gt;</code>&nbsp;overwrites the whole file with the output. Please use&nbsp;<code>&gt;</code>&nbsp;with extra care.</li>
<li>On a Linux system, a command produces two types of output: standard output (STDOUT for short) and standard error (STDERR).&nbsp;<code>1&gt;</code>&nbsp;redirects STDOUT only,&nbsp;<code>2&gt;</code>&nbsp;redirects STDERR only, and&nbsp;<code>&amp;&gt;</code>&nbsp;redirects both. By default,&nbsp;<code>&gt;</code>&nbsp;is the same as&nbsp;<code>1&gt;</code>.</li>
<li><code>eval</code>&nbsp;executes its arguments as a shell command; here it runs the output of&nbsp;<code>plenv init -</code>.</li>
<li>Remember to build Perl with multithreading support (option&nbsp;<code>-Dusethreads</code>), which is important for many NGS analysis packages (e.g. Trinity). With this setting, Perl software can use multiple CPU cores.</li>
<li>Install the CPAN (Comprehensive Perl Archive Network) manager software, CPANMinus, by&nbsp;<code>install-cpanm</code>.</li>
</ul><p>You can use&nbsp;<code>plenv global</code>&nbsp;and&nbsp;<code>plenv local</code>&nbsp;to switch between Perl versions to meet the different needs of your Perl software.</p><p>For example, if a&nbsp;specific version of Perl is not compatible with your script, you can switch to a different version with:</p><pre><code> $ plenv local 
</code></pre><ul>
<li>This is similar to setting the local version of your scripting language with&nbsp;<code>pyenv</code>&nbsp;or&nbsp;<code>rbenv</code>.</li>
</ul><p>Put the following line into your&nbsp;<code>~/.bashrc</code>&nbsp;file.</p><pre><code>export PERL5LIB="$HOME/.plenv/build/perl-5.18.1/lib"
</code></pre><h4>Install BioPerl and PerlIO::gzip</h4><p>CPANMinus is a very good Perl module manager; using&nbsp;<code>cpanm</code>&nbsp;to install BioPerl can save you a lot of time. Here are some useful modules:</p><pre><code>$ cpanm Bio::Perl
$ cpanm Bio::SearchIO
$ cpanm PerlIO::gzip</code></pre><p><span>For more information, please visit:&nbsp;</span><a href="https://github.com/tokuhirom/plenv">https://github.com/tokuhirom/plenv</a></p>]]></description>
	<dc:creator>biogeek</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/40768/linux-advantages</guid>
	<pubDate>Thu, 30 Jan 2020 06:27:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/40768/linux-advantages</link>
	<title><![CDATA[Linux advantages]]></title>
	<description><![CDATA[<p>https://www.forbes.com/sites/jasonevangelho/2018/07/30/ditching-windows-heres-how-ubuntu-updates-your-pc-and-why-its-better/#7aa6fa5f7c23</p><p>https://www.forbes.com/sites/jasonevangelho/2018/07/23/5-reasons-you-should-switch-from-windows-to-linux-right-now/#70c74923777b</p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43911/slurm-commands</guid>
	<pubDate>Wed, 06 Jul 2022 07:40:07 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43911/slurm-commands</link>
	<title><![CDATA[SLURM Commands]]></title>
	<description><![CDATA[<h3>SLURM commands</h3><p>The following table shows SLURM commands on the SOE cluster.</p><table border="1">
<thead>
<tr><th>Command</th><th>Description</th></tr>
</thead>
<tbody>
<tr>
<td><strong>sbatch</strong></td>
<td>Submit batch scripts to the cluster</td>
</tr>
<tr>
<td><strong>scancel</strong></td>
<td>Signal jobs or job steps that are under the control of Slurm.</td>
</tr>
<tr>
<td><strong>sinfo</strong></td>
<td>View information about SLURM nodes and partitions.</td>
</tr>
<tr>
<td><strong>squeue</strong></td>
<td>View information about jobs located in the SLURM scheduling queue</td>
</tr>
<tr>
<td><strong>smap</strong></td>
<td>Graphically view information about SLURM jobs, partitions, and set configuration parameters</td>
</tr>
<tr>
<td><strong>sqlog</strong></td>
<td>View information about running and finished jobs</td>
</tr>
<tr>
<td><strong>sacct</strong></td>
<td>View resource accounting information for finished and running jobs</td>
</tr>
<tr>
<td><strong>sstat</strong></td>
<td>View resource accounting information for running jobs</td>
</tr>
</tbody>
</table><p><span>For more information, run&nbsp;</span><strong>man</strong><span>&nbsp;on the commands above. See some examples below.</span><br /><br /><span style="font-size: large;"><strong>1. Info about the partitions and nodes</strong></span><span></span><br /><span>List all the partitions available to you and the nodes therein:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sinfo
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Nodes in state&nbsp;</span><tt>idle</tt><span>&nbsp;can accept new jobs.</span><br /><br /><span>Show a partition configuration, for example,&nbsp;</span><tt>SOE_main</tt><span></span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show partition=SOE_main
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Show current info about a specific node:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show node=&lt;nodename&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>You can also specify a group of nodes in the command above. For example, if your MPI job is running across soenode05,06,35,36, you can execute the command below to get the info on the nodes you are interested in:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show node=soenode[05-06,35-36]
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>An informative parameter in the output to look at would be CPULoad. It allows you to see how your application utilizes the CPUs on the running nodes.</span><br /><br /><span style="font-size: large;"><strong>2. Submit scripts</strong></span><span></span><br /><span>The header in a submit script specifies job name, partition (queue), time limit, memory allocation, number of nodes, number of cores, and files to collect standard output and error at run time, for example</span></p><div><table border="1">
<tbody>
<tr>
<td>
<pre>#!/bin/bash

#SBATCH --job-name=OMP_run     # job name, "OMP_run"
#SBATCH --partition=SOE_main   # partition (queue)
#SBATCH -t 0-2:00              # time limit: (D-HH:MM) 
#SBATCH --mem=32000            # memory per node in MB 
#SBATCH --nodes=1              # number of nodes
#SBATCH --ntasks-per-node=16   # number of cores
#SBATCH --output=slurm.out     # file to collect standard output
#SBATCH --error=slurm.err      # file to collect standard errors
</pre>
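<p>As a worked example of the <tt>D-HH:MM</tt> time-limit notation used in the header above, the sketch below converts a limit into minutes. It is only an illustration: <tt>limit_to_minutes</tt> is a hypothetical helper, not a SLURM command, and it assumes the limit is always written with all three fields (days, hours, minutes).</p>

```shell
# limit_to_minutes is a hypothetical helper, not part of SLURM.
# It assumes the limit is written as D-HH:MM with all three fields present.
limit_to_minutes() {
    # Split on "-" and ":" so that $1 = days, $2 = hours, $3 = minutes.
    printf '%s\n' "$1" | awk -F'[-:]' '{ print $1 * 1440 + $2 * 60 + $3 }'
}

limit_to_minutes "0-2:00"    # the 2-hour limit from the header: prints 120
limit_to_minutes "3-0:00"    # 3 days: prints 4320
limit_to_minutes "14-0:00"   # two weeks: prints 20160
```

<p>Using <tt>awk</tt> for the arithmetic keeps the sketch portable and avoids zero-padded fields such as <tt>08</tt> being read as octal numbers by shell arithmetic.</p>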
</td>
</tr>
</tbody>
</table></div><p><span>If the time limit is not specified in the submit script, SLURM will assign the default run time, 3 days. This means the job will be terminated by SLURM after 72 hours. The maximum allowed run time is two weeks,&nbsp;</span><tt>14-0:00</tt><span>.</span><br /><span>If no memory limit is requested, SLURM will assign the default 16 GB. The maximum allowed memory per node is 128 GB. To see how much RAM per node your job is using, you can run the commands&nbsp;</span><tt>sacct</tt><span>&nbsp;or&nbsp;</span><tt>sstat</tt><span>&nbsp;to query MaxRSS for the job on the node - see examples below.</span><br /><span>Depending on the type of application you need to run, the submit script may contain commands to create a temporary space on a computational node -&nbsp;</span><a href="http://ecs.rutgers.edu/file_systems.html">see the discussion about using the file systems on the cluster.</a><span></span><br /><span>Then it sets the environment specific to the application and starts the application on one or multiple nodes - see sbatch sample scripts in directory&nbsp;</span><tt>/usr/local/Samples</tt><span>&nbsp;on soemaster1.hpc.rutgers.edu.</span><br /><span>You can submit your job to the cluster with the&nbsp;</span><tt>sbatch</tt><span>&nbsp;command:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sbatch myscript.sh
</pre>
</td>
</tr>
</tbody>
</table></div><p><br /><span style="font-size: large;"><strong>3. Query job information</strong></span><span></span><br /><span>List all currently submitted jobs in running and pending states for a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Command&nbsp;</span><tt>squeue</tt><span>&nbsp;can be run with format options to expose specific information, for example, when pending job #706 is scheduled to start running:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue -j 706 --format="%S"
</pre>
</td>
</tr>
</tbody>
</table></div><div><table border="1">
<tbody>
<tr>
<td>
<pre>START_TIME
2015-04-30T09:54:32
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>More info can be shown by placing additional format options, for example:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue -j 706 --format="%i %P %j %u %T %l %C %S"
</pre>
</td>
</tr>
</tbody>
</table></div><div><table border="1">
<tbody>
<tr>
<td>
<pre>JOBID PARTITION   NAME    USER STATE   TIMELIMIT  CPUS START_TIME
706   SOE_main  Par_job_3 mike PENDING 3-00:00:00 64   2015-04-30T09:54:32
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To see when all the jobs, pending in the queue, are scheduled to start:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue --start 
</pre>
</td>
</tr>
</tbody>
</table></div><p><br /><span>List all running and completed jobs for a user</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>or</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog -j &lt;JobID&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>The following abbreviations are used for the job states:</span></p><pre>       CA   CANCELLED      Job was cancelled.

       CD   COMPLETED      Job completed normally.

       CG   COMPLETING     Job is in the process of completing.

       F    FAILED         Job terminated abnormally.

       NF   NODE_FAIL      Job terminated due to node failure.

       PD   PENDING        Job is pending allocation.

       R    RUNNING        Job currently has an allocation.

       S    SUSPENDED      Job is suspended.

       TO   TIMEOUT        Job terminated upon reaching its time limit.
</pre><p><span>You can specify the fields you would like to see in the output of&nbsp;</span><tt>sqlog</tt><span>:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog --format=list
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>The command below, for example, provides Job ID, user name, exit state, start date-time, and end date-time for job #2831:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog -j 2831 --format=jid,user,state,start,end
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>List status info for a currently running job:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sstat -j &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Formatted output can be used to show only specific information, for example, the maximum resident RAM usage on a node:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sstat --format="JobID,MaxRSS" -j &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To get statistics on completed jobs by jobID:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct --format="JobID,JobName,MaxRSS,Elapsed" -j &lt;jobid&gt;
</pre>
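<p>The MaxRSS value reported by the command above can be compared with the <tt>--mem</tt> request in the submit script, which is given in MB. The sketch below shows one way to do the unit conversion. It assumes that MaxRSS is printed with a K, M, or G suffix (for example <tt>2048000K</tt>); <tt>rss_to_mb</tt> is a hypothetical helper, not part of SLURM.</p>

```shell
# rss_to_mb is a hypothetical helper, not part of SLURM.
# Assumption: the value carries a K, M, or G suffix (for example 2048000K).
rss_to_mb() {
    printf '%s\n' "$1" | awk '{
        unit  = substr($0, length($0), 1)       # last character: K, M, or G
        value = substr($0, 1, length($0) - 1)   # numeric part before the suffix
        if (unit == "K")      print int(value / 1024)
        else if (unit == "M") print int(value)
        else if (unit == "G") print int(value * 1024)
        else                  print int($0 / 1048576)   # no suffix: assume bytes
    }'
}

rss_to_mb "2048000K"   # prints 2000
rss_to_mb "512M"       # prints 512
rss_to_mb "4G"         # prints 4096
```

<p>A result of 2000, for instance, would sit well below the 32000 MB requested in the sample header, which suggests that the memory request could be lowered for similar jobs.</p>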
</td>
</tr>
</tbody>
</table></div><p><span>To view the same information for all jobs of a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct --format="JobID,JobName,MaxRSS,Elapsed" -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To print a list of fields that can be specified with the&nbsp;</span><tt>--format</tt><span>&nbsp;option:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct --helpformat
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>For example, to get Job ID, Job name, Exit state, start date-time, and end date-time for job #2831:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct -j 2831 --format="JobID,JobName,State,Start,End"
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Another useful command to gain information about a running job is&nbsp;</span><tt>scontrol</tt><span>:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show job=&lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><br /><span style="font-size: large;"><strong>4. Cancel a job</strong></span><span></span><br /><span>To cancel one job:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scancel &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To cancel one job and delete the TMP directory created by the submit script on a node:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sdel &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To cancel all the jobs for a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scancel -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To cancel one or more jobs by name:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scancel --name &lt;myJobName&gt;
</pre>
</td>
</tr>
</tbody>
</table></div>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44711/blast-5-key-updates-and-enhancements-for-modern-bioinformatics</guid>
	<pubDate>Sat, 07 Dec 2024 22:37:48 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44711/blast-5-key-updates-and-enhancements-for-modern-bioinformatics</link>
	<title><![CDATA[BLAST+ 5: Key Updates and Enhancements for Modern Bioinformatics]]></title>
	<description><![CDATA[<p>The BLAST+ 5 (Basic Local Alignment Search Tool) update has introduced several key enhancements aimed at improving performance, user experience, and compatibility with evolving genomic data standards. Here are the major updates:</p><ol>
<li>
<p><strong>Database Enhancements</strong>:</p>
<ul>
<li>The BLAST databases have shifted fully to the version 5 (v5) format, which integrates built-in taxonomy information. This allows for more detailed and efficient sequence annotation and analysis.</li>
<li>Protein databases in v5 are now accession-based, supporting a broader range of sequences, including those from high-throughput projects and the Pathogen Detection Project. These databases also accommodate structural proteins with multi-character chain identifiers.</li>
</ul>
</li>
<li>
<p><strong>Performance Improvements</strong>:</p>
<ul>
<li>Adaptive Composition-Based Statistics (CBS) is available as an experimental feature, enhancing the detection of novel results in protein-protein comparisons.</li>
<li>Updated algorithms improve the stability of search results, especially when fewer hits are requested than the default output.</li>
</ul>
</li>
<li>
<p><strong>Compatibility</strong>:</p>
<ul>
<li>Support for the older v4 databases has been discontinued. The v5 format is now the default for all BLAST database updates, ensuring alignment with current standards in bioinformatics.</li>
</ul>
</li>
<li>
<p><strong>User-Friendly Changes</strong>:</p>
<ul>
<li>Naming conventions for databases have been simplified to enhance clarity and ease of use. For example, database names no longer include version tags like "_v5".</li>
</ul>
</li>
<li>
<p><strong>Future-Proofing</strong>:</p>
<ul>
<li>BLAST+ 5 aligns with current and upcoming data requirements, ensuring that researchers have access to the most comprehensive and modern resources for sequence alignment.</li>
</ul>
</li>
</ol><p>These updates reflect NCBI's commitment to maintaining BLAST as a leading tool for sequence analysis. For detailed release notes and additional guidance, refer to NCBI Insights <a href="https://ncbiinsights.ncbi.nlm.nih.gov/">here</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/87/linux-cheat-sheet</guid>
	<pubDate>Tue, 09 Jul 2013 17:30:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/file/view/87/linux-cheat-sheet</link>
	<title><![CDATA[Linux Cheat Sheet]]></title>
	<description><![CDATA[<p><span>While looking for a good Linux reference for bioinformaticians and BOL readers, we could not find a decent one on the Internet. So, we decided to make a cheat sheet for biological programmers.</span></p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/87" length="81260" type="application/pdf" />
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/9030/linux-ssh-client-commands-for-bioinformatics</guid>
	<pubDate>Thu, 13 Mar 2014 17:16:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/9030/linux-ssh-client-commands-for-bioinformatics</link>
	<title><![CDATA[Linux SSH Client Commands for Bioinformatics]]></title>
	<description><![CDATA[<p>Let's walk through the basic command-line usage of the ssh client.<br /><br /><strong>1. Check your SSH Client Version:</strong><br /><br />Checking your SSH client version is rarely needed, but sometimes it may be necessary to identify the SSH client that you are currently running and its corresponding version number. The SSH client can be identified as follows:<br /><br />$ ssh -V<br />OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2013<br /><br />$ ssh -V<br />ssh: SSH Secure Shell 3.2.9.1 (non-commercial version) on i686-pc-linux-gnu<br /><br /><strong>2. Connect and login to remote host:</strong></p><p>The first time you log in to the remote host from the localhost, it will display a host-key-not-found message, and you can answer &ldquo;yes&rdquo; to continue. The host key of the remote host will be added under the .ssh2/hostkeys directory of your home directory, as shown below.<br /><br />localhost$ ssh -l jit remotehost.example.com<br /><br />jit@remotehost.example.com password:</p><p>remotehost.example.com$</p><p>The second time you log in to the remote host from the localhost, it will prompt only for the password, as the remote host key has already been added to the known hosts list of the ssh client.<br /><br />localhost$ ssh -l jit remotehost.example.com<br />jit@remotehost.example.com password: <br />remotehost.example.com$<br /><br />If, for some reason, the host key of the remote host has changed since you first logged in, you may get a warning message as shown below. 
This can happen for various reasons, such as 1) the sysadmin upgraded or reinstalled the SSH server on the remote host, or 2) someone is doing malicious activity. The best action to take before answering &ldquo;yes&rdquo; to the message below is to call your sysadmin, find out why you got the host-key-changed message, and verify whether it is the correct host key.<br /><br />localhost$ ssh -l jit remotehost.example.com<br /><br />jit@remotehost.example.com's password: <br />remotehost$<br /><br /><strong>4. Debug SSH Client:</strong><br /><br />Sometimes it is necessary to view debug messages to troubleshoot SSH connection issues. For this purpose, pass the -v (lowercase v) option to ssh as shown below.<br /><br />Example without debug message:<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; localhost$ ssh -l jit remotehost.example.com<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; warning: Connecting to remotehost.example.com failed: No address associated to the name<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; localhost$</p><p>Example with debug message:<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; localhost$ ssh -v -l jit remotehost.example.com<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; debug: SshConfig/sshconfig.c:2838/ssh2_parse_config_ext: Metaconfig parsing stopped at line 3.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; debug: SshConfig/sshconfig.c:637/ssh_config_set_param_verbose: Setting variable 'VerboseMode' to 'FALSE'.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; debug: SshConfig/sshconfig.c:3130/ssh_config_read_file_ext: Read 17 params from config file.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; debug: Ssh2/ssh2.c:1707/main: User config file not found, using defaults. (Looked for '/home/jit/.ssh2/ssh2_config')<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; debug: Connecting to remotehost.example.com, port 22... 
(SOCKS not used)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; warning: Connecting to remotehost.example.com failed: No address associated to</p><p><strong>5. Escape Character: (Toggle SSH session, SSH session statistics etc.)</strong><br /><br />The escape character ~ gets the SSH client&rsquo;s attention, and the character following the ~ determines the escape command.<br />Toggle SSH Session: When you&rsquo;ve logged on to the remotehost using ssh from the localhost, you may want to come back to the localhost to perform some activity and then go back to the remote host again. In this case, you don&rsquo;t need to disconnect the ssh session to the remote host. Instead, follow the steps below.</p><p>i. Login to remotehost from localhost: localhost$ ssh -l jit remotehost<br />ii. Now you are connected to the remotehost: remotehost$<br />iii. To come back to the localhost temporarily, type the escape character ~ and Control-Z. When you type ~ you will not see it on the screen immediately; it appears only after you press Control-Z and Enter. So, on the remotehost, on a new line, enter the following keystrokes: ~ followed by Control-Z<br /><br />&nbsp;&nbsp;&nbsp;&nbsp; remotehost$ ~^Z<br />&nbsp;&nbsp;&nbsp;&nbsp; [1]+&nbsp; Stopped&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ssh -l jit remotehost<br />&nbsp;&nbsp;&nbsp;&nbsp; localhost$</p><p>iv. Now you are back on the localhost, and the ssh remotehost client session is suspended like a typical Unix job, which you can check as shown below:<br /><br />&nbsp;&nbsp;&nbsp;&nbsp; localhost$ jobs<br />&nbsp;&nbsp;&nbsp;&nbsp; [1]+&nbsp; Stopped&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ssh -l jit remotehost<br /><br />v. 
You can go back to the remote host ssh session without entering the password again by bringing the suspended ssh remotehost job to the foreground on the localhost:<br /><br />&nbsp;&nbsp;&nbsp; localhost$ fg %1<br />&nbsp;&nbsp;&nbsp; ssh -l jit remotehost<br />&nbsp;&nbsp;&nbsp; remotehost$</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/9639/find-certain-filesdocuments-in-linux-os</guid>
	<pubDate>Sun, 06 Apr 2014 23:56:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/9639/find-certain-filesdocuments-in-linux-os</link>
	<title><![CDATA[Find certain files/documents in Linux OS]]></title>
	<description><![CDATA[<p>As bioinformaticians, we usually handle large datasets and can get lost in huge numbers of files and folders. To locate a missing file, a powerful search command is required. The Linux find command is one of the most important and most used commands on Linux systems. It searches for and lists files and directories that match the conditions you specify. Find can be used with a variety of conditions: you can find files by permissions, users, groups, file type, date, size and other criteria.<br /><br />Through this article we are sharing our day-to-day Linux find command experience and its usage in the form of examples. In this article we will show you the 35 most used find command examples in Linux. We have divided the article into five parts, from basic to advanced usage of the find command.</p><p><strong>Part I &ndash; Basic Find Commands for Finding Files with Names</strong><br />1. Find Files Using Name in Current Directory<br /><br />Find all the files whose name is gene.txt in the current working directory.<br /><br /># find . -name gene.txt<br /><br />./gene.txt<br /><br />2. Find Files Under Home Directory<br /><br />Find all the files under the /home directory with the name gene.txt.<br /><br /># find /home -name gene.txt<br /><br />/home/gene.txt<br /><br />3. Find Files Using Name and Ignoring Case<br /><br />Find all the files in the /home directory whose name is gene.txt, matching both capital and small letters.<br /><br /># find /home -iname gene.txt<br /><br />/home/gene.txt<br />/home/Gene.txt<br /><br />4. Find Directories Using Name<br /><br />Find all directories whose name is Gene in the / directory.<br /><br /># find / -type d -name Gene<br /><br />/Gene<br /><br />5. Find fasta Files Using Name<br /><br />Find all files whose name is gene.fasta in the current working directory.<br /><br /># find . -type f -name gene.fasta<br /><br />./gene.fasta<br /><br />6. 
Find all fasta Files in Directory<br /><br />Find all fasta files in a directory.<br /><br /># find . -type f -name "*.fasta"<br /><br />./gene.fasta<br />./cancer.fasta<br />./allgene.fasta<br /><br /><strong>Part II &ndash; Find Files Based on their Permissions</strong><br />7. Find Files With 777 Permissions<br /><br />Find all the files whose permissions are 777.<br /><br /># find . -type f -perm 0777 -print<br /><br />8. Find Files Without 777 Permissions<br /><br />Find all the files without permission 777.<br /><br /># find / -type f ! -perm 777<br /><br />9. Find SGID Files with 644 Permissions<br /><br />Find all the SGID bit files whose permissions are set to 644.<br /><br /># find / -perm 2644<br /><br />10. Find Sticky Bit Files with 551 Permissions<br /><br />Find all the files with the Sticky Bit set whose permissions are 551.<br /><br /># find / -perm 1551<br /><br />11. Find SUID Files<br /><br />Find all files with the SUID bit set.<br /><br /># find / -perm /u=s<br /><br />12. Find SGID Files<br /><br />Find all files with the SGID bit set.<br /><br /># find / -perm /g+s<br /><br />13. Find Read Only Files<br /><br />Find all files with the owner read bit set. Note that this test matches any owner-readable file, whether or not it is also writable.<br /><br /># find / -perm /u=r<br /><br />14. Find Executable Files<br /><br />Find all files that have an execute bit set for the user, group or others.<br /><br /># find / -perm /a=x<br /><br />15. Find Files with 777 Permissions and Chmod to 644<br /><br />Find all files with 777 permissions and use the chmod command to set their permissions to 644.<br /><br /># find / -type f -perm 0777 -print -exec chmod 644 {} \;<br /><br />16. Find Directories with 777 Permissions and Chmod to 755<br /><br />Find all directories with 777 permissions and use the chmod command to set their permissions to 755.<br /><br /># find / -type d -perm 777 -print -exec chmod 755 {} \;<br /><br />17. Find and Remove a Single File<br /><br />Find a single file called gene.txt and remove it.<br /><br /># find . -type f -name "gene.txt" -exec rm -f {} \;<br /><br />18. 
Find and Remove Multiple Files<br /><br />To find and remove multiple files, such as all .fa or .gb files, use:<br /><br /># find . -type f -name "*.fa" -exec rm -f {} \;<br /><br />OR<br /><br /># find . -type f -name "*.gb" -exec rm -f {} \;<br /><br />19. Find all Empty Files<br /><br />To find all empty files under a certain path.<br /><br /># find /tmp -type f -empty<br /><br />20. Find all Empty Directories<br /><br />To find all empty directories under a certain path.<br /><br /># find /tmp -type d -empty<br /><br />21. Find all Hidden Files<br /><br />To find all hidden files, use the command below.<br /><br /># find /tmp -type f -name ".*"<br /><br /><strong>Part III &ndash; Search Files Based On Owners and Groups</strong><br />22. Find Single File Based on User<br /><br />To find all files called gene.txt under the / root directory that are owned by root.<br /><br /># find / -user root -name gene.txt<br /><br />23. Find all Files Based on User<br /><br />To find all files that belong to user rahul under the /home directory.<br /><br /># find /home -user rahul<br /><br />24. Find all Files Based on Group<br /><br />To find all files that belong to group developer under the /home directory.<br /><br /># find /home -group developer<br /><br />25. Find Particular Files of User<br /><br />To find all .txt files of user rahul under the /home directory.<br /><br /># find /home -user rahul -iname "*.txt"<br /><br /><strong>Part IV &ndash; Find Files and Directories Based on Date and Time</strong><br />26. Find Last 50 Days Modified Files<br /><br />To find all the files which were modified 50 days ago.<br /><br /># find / -mtime 50<br /><br />27. Find Last 50 Days Accessed Files<br /><br />To find all the files which were accessed 50 days ago.<br /><br /># find / -atime 50<br /><br />28. Find Last 50-100 Days Modified Files<br /><br />To find all the files which were modified more than 50 days ago and less than 100 days ago.<br /><br /># find / -mtime +50 -mtime -100<br /><br />29. 
Find Changed Files in Last 1 Hour<br /><br />To find all the files which were changed in the last hour.<br /><br /># find / -cmin -60<br /><br />30. Find Modified Files in Last 1 Hour<br /><br />To find all the files which were modified in the last hour.<br /><br /># find / -mmin -60<br /><br />31. Find Accessed Files in Last 1 Hour<br /><br />To find all the files which were accessed in the last hour.<br /><br /># find / -amin -60<br /><br /><strong>Part V &ndash; Find Files and Directories Based on Size</strong><br />32. Find 50MB Files<br /><br />To find all 50MB files, use:<br /><br /># find / -size 50M<br /><br />33. Find Size between 50MB &ndash; 100MB<br /><br />To find all the files which are larger than 50MB and smaller than 100MB.<br /><br /># find / -size +50M -size -100M<br /><br />34. Find and Delete Files Larger than 100MB<br /><br />To find all files larger than 100MB and delete them using one single command.<br /><br /># find / -size +100M -exec rm -rf {} \;<br /><br />35. Find Specific Files and Delete<br /><br />Find all .gb files larger than 10MB and delete them using one single command. The pattern is quoted so that the shell does not expand it before find sees it.<br /><br /># find / -type f -name "*.gb" -size +10M -exec rm {} \;</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>