<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: July 2022]]></title>
	<link>https://bioinformaticsonline.com/blog/archive/shruti/1656651600/1659330000?</link>
	<atom:link href="https://bioinformaticsonline.com/blog/archive/shruti/1656651600/1659330000?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43916/understanding-dump-files-from-ncbi-taxonomy-database</guid>
	<pubDate>Fri, 15 Jul 2022 04:29:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43916/understanding-dump-files-from-ncbi-taxonomy-database</link>
	<title><![CDATA[Understanding DUMP files from NCBI Taxonomy database !]]></title>
	<description><![CDATA[<p>*.dmp files are bcp-like dump from GenBank taxonomy database</p><p>General information.</p><p>Field terminator is "\t|\t"</p><p>Row terminator is "\t|\n"</p><p>&nbsp;</p><p>nodes.dmp file consists of taxonomy nodes. The description for each node includes the following</p><p>fields:</p><p>tax_id -- node id in GenBank taxonomy database</p><p>&nbsp; parent tax_id -- parent node id in GenBank taxonomy database</p><p>&nbsp; rank -- rank of this node (superkingdom, kingdom, ...)&nbsp;</p><p>&nbsp; embl code -- locus-name prefix; not unique</p><p>&nbsp; division id -- see division.dmp file</p><p>&nbsp; inherited div flag&nbsp; (1 or 0) -- 1 if node inherits division from parent</p><p>&nbsp; genetic code id -- see gencode.dmp file</p><p>&nbsp; inherited GC&nbsp; flag&nbsp; (1 or 0) -- 1 if node inherits genetic code from parent</p><p>&nbsp; mitochondrial genetic code id -- see gencode.dmp file</p><p>&nbsp; inherited MGC flag&nbsp; (1 or 0) -- 1 if node inherits mitochondrial gencode from parent</p><p>&nbsp; GenBank hidden flag (1 or 0)&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -- 1 if name is suppressed in GenBank entry lineage</p><p>&nbsp; hidden subtree root flag (1 or 0) &nbsp; &nbsp; &nbsp; -- 1 if this subtree has no sequence data yet</p><p>&nbsp; comments -- free-text comments and citations</p><p>&nbsp;</p><p>Taxonomy names file (names.dmp):</p><p>tax_id -- the id of node associated with this name</p><p>name_txt -- name itself</p><p>unique name -- the unique variant of this name if name not unique</p><p>name class -- (synonym, common name, ...)</p><p>&nbsp;</p><p>Divisions file (division.dmp):</p><p>division id -- taxonomy database division id</p><p>division cde -- GenBank division code (three characters)</p><p>division name -- e.g. BCT, PLN, VRT, MAM, PRI...</p><p>comments</p><p>&nbsp;</p><p>Genetic codes file (gencode.dmp):</p><p>genetic code id -- GenBank genetic code id</p><p>abbreviation -- genetic code name abbreviation</p><p>name -- genetic code name</p><p>cde -- translation table for this genetic code</p><p>starts -- start codons for this genetic code</p><p>&nbsp;</p><p>Deleted nodes file (delnodes.dmp):</p><p>tax_id -- deleted node id</p><p>&nbsp;</p><p>Merged nodes file (merged.dmp):</p><p>old_tax_id&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -- id of nodes which has been merged</p><p>new_tax_id&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -- id of nodes which is result of merging</p><p>Citations file (citations.dmp):</p><p>cit_id -- the unique id of citation</p><p>cit_key -- citation key</p><p>pubmed_id -- unique id in PubMed database (0 if not in PubMed)</p><p>medline_id -- unique id in MedLine database (0 if not in MedLine)</p><p>url -- URL associated with citation</p><p>text -- any text (usually article name and authors).</p><p>-- The following characters are escaped in this text by a backslash:</p><p>-- newline (appear as "\n"),</p><p>-- tab character ("\t"),</p><p>-- double quotes ('\"'),</p><p>-- backslash character ("\\").</p><p>taxid_list -- list of node ids separated by a single space</p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43911/slurm-commands</guid>
	<pubDate>Wed, 06 Jul 2022 07:40:07 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43911/slurm-commands</link>
	<title><![CDATA[SLURM Commands]]></title>
	<description><![CDATA[<h3>SLURM commands</h3><p>The following table shows SLURM commands on the SOE cluster.</p><table border="1">
<thead>
<tr><th>Command</th><th>Description</th></tr>
</thead>
<tbody>
<tr>
<td><strong>sbatch</strong></td>
<td>Submit batch scripts to the cluster</td>
</tr>
<tr>
<td><strong>scancel</strong></td>
<td>Signal jobs or job steps that are under the control of Slurm.</td>
</tr>
<tr>
<td><strong>sinfo</strong></td>
<td>View information about SLURM nodes and partitions.</td>
</tr>
<tr>
<td><strong>squeue</strong></td>
<td>View information about jobs located in the SLURM scheduling queue</td>
</tr>
<tr>
<td><strong>smap</strong></td>
<td>Graphically view information about SLURM jobs, partitions, and set configurations parameters</td>
</tr>
<tr>
<td><strong>sqlog</strong></td>
<td>View information about running and finished jobs</td>
</tr>
<tr>
<td><strong>sacct</strong></td>
<td>View resource accounting information for finished and running jobs</td>
</tr>
<tr>
<td><strong>sstat</strong></td>
<td>View resource accounting information for running jobs</td>
</tr>
</tbody>
</table><p><span>For more information, run&nbsp;</span><strong>man</strong><span>&nbsp;on the commands above. See some examples below.</span><br /><br /><span style="font-size: large;"><strong>1. Info about the partitions and nodes</strong></span><span></span><br /><span>List all the partitions available to you and the nodes therein:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sinfo
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Nodes in state&nbsp;</span><tt>idle</tt><span>&nbsp;can accept new jobs.</span><br /><br /><span>Show a partition configuratuin, for example,&nbsp;</span><tt>SOE_main</tt><span></span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show partition=SOE_main
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Show current info about a specific node:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show node=&lt;nodename&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>You can also specify a group of nodes in the command above. For example, if your MPI job is running across soenode05,06,35,36, you can execute the command below to get the info on the nodes you are interested in:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show node=soenode[05-06,35-36]
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>An informative parameter in the output to look at would be CPULoad. It allows you to see how your application utilizes the CPUs on the running nodes.</span><br /><br /><span style="font-size: large;"><strong>2. Submit scripts</strong></span><span></span><br /><span>The header in a submit script specifies job name, partition (queue), time limit, memory allocation, number of nodes, number of cores, and files to collect standard output and error at run time, for example</span></p><div><table border="1">
<tbody>
<tr>
<td>
<pre>#!/bin/bash

#SBATCH --job-name=OMP_run     # job name, "OMP_run"
#SBATCH --partition=SOE_main   # partition (queue)
#SBATCH -t 0-2:00              # time limit: (D-HH:MM) 
#SBATCH --mem=32000            # memory per node in MB 
#SBATCH --nodes=1              # number of nodes
#SBATCH --ntasks-per-node=16   # number of cores
#SBATCH --output=slurm.out     # file to collect standard output
#SBATCH --error=slurm.err      # file to collect standard errors
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>If the time limit is not specified in the submit script, SLURM will assign the default run time, 3 days. This means the job will be terminated by SLURM in 72 hrs. The maximum allowed run time is two weeks,&nbsp;</span><tt>14-0:00</tt><span>.</span><br /><span>If the memory limit is not requested, SLURM will assign the default 16 GB. The maximum allowed memory per node is 128 GB. To see how much RAM per node your job is using, you can run commands&nbsp;</span><tt>sacct</tt><span>&nbsp;or&nbsp;</span><tt>sstat</tt><span>&nbsp;to query MaxRSS for the job on the node - see examples below.</span><br /><span>Depending on a type of application you need to run, the submit script may contain commands to create a temporary space on a computational node -&nbsp;</span><a href="http://ecs.rutgers.edu/file_systems.html">see the discussion about using the file systems on the cluster.</a><span></span><br /><span>Then it sets the environment specific to the application and starts the application on one or multiple nodes - see sbatch sample scripts in directory&nbsp;</span><tt>/usr/local/Samples</tt><span>&nbsp;on soemaster1.hpc.rutgers.edu.</span><br /><span>You can submit your job to the cluster with&nbsp;</span><tt>sbatch</tt><span>&nbsp;command:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sbatch myscript.sh
</pre>
</td>
</tr>
</tbody>
</table></div><p><br /><span style="font-size: large;"><strong>3. Query job information</strong></span><span></span><br /><span>List all currently submitted jobs in running and pending states for a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Command&nbsp;</span><tt>squeue</tt><span>&nbsp;can be run with format options to expose specific information, for example, when pending job #706 is scheduled to start running:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue -j 706 --format="%S"
</pre>
</td>
</tr>
</tbody>
</table></div><div><table border="1">
<tbody>
<tr>
<td>
<pre>START_TIME
2015-04-30T09:54:32
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>More info can be shown by placing additional format options, for example:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue -j 706 --format="%i %P %j %u %T %l %C %S"
</pre>
</td>
</tr>
</tbody>
</table></div><div><table border="1">
<tbody>
<tr>
<td>
<pre>JOBID PARTITION   NAME    USER STATE   TIMELIMIT  CPUS START_TIME
706   SOE_main  Par_job_3 mike PENDING 3-00:00:00 64   2015-04-30T09:54:32
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To see when all the jobs, pending in the queue, are scheduled to start:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue --start 
</pre>
</td>
</tr>
</tbody>
</table></div><p><br /><span>List all running and completed jobs for a user</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>or</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog -j &lt;JobID&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>The following appreviations are used for the job states:</span></p><pre>       CA   CANCELLED      Job was cancelled.

       CD   COMPLETED      Job completed normally.

       CG   COMPLETING     Job is in the process of completing.

       F    FAILED         Job termined abnormally.

       NF   NODE_FAIL      Job terminated due to node failure.

       PD   PENDING        Job is pending allocation.

       R    RUNNING        Job currently has an allocation.

       S    SUSPENDED      Job is suspended.

       TO   TIMEOUT        Job terminated upon reaching its time limit.
</pre><p><span>You can specify the fields you would like to see in the output of&nbsp;</span><tt>sqlog</tt><span>:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog --format=list
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>The command below, for example, provides Job ID, user name, exit state, start date-time, and end date-time for job #2831:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog -j 2831 --format=jid,user,state,start,end
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>List status info for a currently running job:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sstat -j &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>A formatted output can be used to gain only a specific info, for example, the maximum resident RAM usage on a node:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sstat --format="JobID,MaxRSS" -j &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To get statistics on completed jobs by jobID:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct --format="JobID,JobName,MaxRSS,Elapsed" -j &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To view the same information for all jobs of a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct --format="JobID,JobName,MaxRSS,Elapsed" -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To print a list of fields that can be specified with the&nbsp;</span><tt>--format</tt><span>&nbsp;option:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct --helpformat
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>For example, to get Job ID, Job name, Exit state, start date-time, and end date-time for job #2831:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct -j 2831 --format="JobID,JobName,State,Start,End"
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Another useful command to gain information about a running job is&nbsp;</span><tt>scontrol</tt><span>:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show job=&lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><br /><span style="font-size: large;"><strong>4. Cancel a job</strong></span><span></span><br /><span>To cancel one job:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scancel &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To cancel one job and delete the TMP directory created by the submit script on a node:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sdel &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To cancel all the jobs for a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scancel -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To cancel one or more jobs by name:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scancel --name &lt;myJobName&gt;
</pre>
</td>
</tr>
</tbody>
</table></div>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>

</channel>
</rss>