<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Shruti Paniwala's blogs]]></title>
	<link>https://bioinformaticsonline.com/blog/owner/shruti?</link>
	<atom:link href="https://bioinformaticsonline.com/blog/owner/shruti?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43916/understanding-dump-files-from-ncbi-taxonomy-database</guid>
	<pubDate>Fri, 15 Jul 2022 04:29:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43916/understanding-dump-files-from-ncbi-taxonomy-database</link>
	<title><![CDATA[Understanding DUMP files from the NCBI Taxonomy database]]></title>
	<description><![CDATA[<p>The *.dmp files are bcp-like dumps from the GenBank taxonomy database.</p><p>General information:</p><p>Field terminator is "\t|\t"</p><p>Row terminator is "\t|\n"</p><p>&nbsp;</p><p>Nodes file (nodes.dmp): each row describes one taxonomy node and includes the following fields:</p><p>tax_id -- node id in GenBank taxonomy database</p><p>parent tax_id -- parent node id in GenBank taxonomy database</p><p>rank -- rank of this node (superkingdom, kingdom, ...)</p><p>embl code -- locus-name prefix; not unique</p><p>division id -- see division.dmp file</p><p>inherited div flag (1 or 0) -- 1 if node inherits division from parent</p><p>genetic code id -- see gencode.dmp file</p><p>inherited GC flag (1 or 0) -- 1 if node inherits genetic code from parent</p><p>mitochondrial genetic code id -- see gencode.dmp file</p><p>inherited MGC flag (1 or 0) -- 1 if node inherits mitochondrial gencode from parent</p><p>GenBank hidden flag (1 or 0) -- 1 if name is suppressed in GenBank entry lineage</p><p>hidden subtree root flag (1 or 0) -- 1 if this subtree has no sequence data yet</p><p>comments -- free-text comments and citations</p><p>&nbsp;</p><p>Taxonomy names file (names.dmp):</p><p>tax_id -- the id of the node associated with this name</p><p>name_txt -- the name itself</p><p>unique name -- the unique variant of this name if the name is not unique</p><p>name class -- (synonym, common name, ...)</p><p>&nbsp;</p><p>Divisions file (division.dmp):</p><p>division id -- taxonomy database division id</p><p>division cde -- GenBank division code (three characters)</p><p>division name -- e.g. BCT, PLN, VRT, MAM, PRI...</p><p>comments</p><p>&nbsp;</p><p>Genetic codes file (gencode.dmp):</p><p>genetic code id -- GenBank genetic code id</p><p>abbreviation -- genetic code name abbreviation</p><p>name -- genetic code name</p><p>cde -- translation table for this genetic code</p><p>starts -- start codons for this genetic code</p><p>&nbsp;</p><p>Deleted nodes file (delnodes.dmp):</p><p>tax_id -- deleted node id</p><p>&nbsp;</p><p>Merged nodes file (merged.dmp):</p><p>old_tax_id -- id of the node that has been merged</p><p>new_tax_id -- id of the node that is the result of the merge</p><p>&nbsp;</p><p>Citations file (citations.dmp):</p><p>cit_id -- the unique id of the citation</p><p>cit_key -- citation key</p><p>pubmed_id -- unique id in PubMed database (0 if not in PubMed)</p><p>medline_id -- unique id in MedLine database (0 if not in MedLine)</p><p>url -- URL associated with the citation</p><p>text -- any text (usually article name and authors).</p><p>-- The following characters are escaped in this text by a backslash:</p><p>-- newline (appears as "\n"),</p><p>-- tab character ("\t"),</p><p>-- double quotes ('\"'),</p><p>-- backslash character ("\\").</p><p>taxid_list -- list of node ids separated by a single space</p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43911/slurm-commands</guid>
	<pubDate>Wed, 06 Jul 2022 07:40:07 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43911/slurm-commands</link>
	<title><![CDATA[SLURM Commands]]></title>
	<description><![CDATA[<h3>SLURM commands</h3><p>The following table shows SLURM commands on the SOE cluster.</p><table border="1">
<thead>
<tr><th>Command</th><th>Description</th></tr>
</thead>
<tbody>
<tr>
<td><strong>sbatch</strong></td>
<td>Submit batch scripts to the cluster</td>
</tr>
<tr>
<td><strong>scancel</strong></td>
<td>Signal jobs or job steps that are under the control of Slurm.</td>
</tr>
<tr>
<td><strong>sinfo</strong></td>
<td>View information about SLURM nodes and partitions.</td>
</tr>
<tr>
<td><strong>squeue</strong></td>
<td>View information about jobs located in the SLURM scheduling queue</td>
</tr>
<tr>
<td><strong>smap</strong></td>
<td>Graphically view information about SLURM jobs, partitions, and set configuration parameters</td>
</tr>
<tr>
<td><strong>sqlog</strong></td>
<td>View information about running and finished jobs</td>
</tr>
<tr>
<td><strong>sacct</strong></td>
<td>View resource accounting information for finished and running jobs</td>
</tr>
<tr>
<td><strong>sstat</strong></td>
<td>View resource accounting information for running jobs</td>
</tr>
</tbody>
</table><p><span>For more information, run&nbsp;</span><strong>man</strong><span>&nbsp;on the commands above. See some examples below.</span><br /><br /><span style="font-size: large;"><strong>1. Info about the partitions and nodes</strong></span><span></span><br /><span>List all the partitions available to you and the nodes therein:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sinfo
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Nodes in state&nbsp;</span><tt>idle</tt><span>&nbsp;can accept new jobs.</span><br /><br /><span>Show the configuration of a partition, for example&nbsp;</span><tt>SOE_main</tt><span>:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show partition=SOE_main
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Show current info about a specific node:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show node=&lt;nodename&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>You can also specify a group of nodes in the command above. For example, if your MPI job is running across soenode05,06,35,36, you can execute the command below to get the info on the nodes you are interested in:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show node=soenode[05-06,35-36]
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>A useful parameter in the output is CPULoad, which shows how heavily your application is using the CPUs on the nodes where it runs.</span><br /><br /><span style="font-size: large;"><strong>2. Submit scripts</strong></span><span></span><br /><span>The header of a submit script specifies the job name, partition (queue), time limit, memory allocation, number of nodes, number of cores, and the files that collect standard output and standard error at run time, for example:</span></p><div><table border="1">
<tbody>
<tr>
<td>
<pre>#!/bin/bash

#SBATCH --job-name=OMP_run     # job name, "OMP_run"
#SBATCH --partition=SOE_main   # partition (queue)
#SBATCH -t 0-2:00              # time limit: (D-HH:MM) 
#SBATCH --mem=32000            # memory per node in MB 
#SBATCH --nodes=1              # number of nodes
#SBATCH --ntasks-per-node=16   # tasks per node (one core per task by default)
#SBATCH --output=slurm.out     # file to collect standard output
#SBATCH --error=slurm.err      # file to collect standard errors
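
# Sketch of a job body (an assumption, not part of the header above);
# ./my_omp_app is a hypothetical OpenMP executable in the submission directory
cd "$SLURM_SUBMIT_DIR"                            # directory sbatch was run from
export OMP_NUM_THREADS="$SLURM_NTASKS_PER_NODE"   # 16, as requested above
./my_omp_app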
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>If the time limit is not specified in the submit script, SLURM assigns the default run time of 3 days, so the job will be terminated after 72 hours. The maximum allowed run time is two weeks,&nbsp;</span><tt>14-0:00</tt><span>.</span><br /><span>If no memory is requested, SLURM assigns the default of 16 GB. The maximum allowed memory per node is 128 GB. To see how much RAM per node your job is using, run&nbsp;</span><tt>sacct</tt><span>&nbsp;or&nbsp;</span><tt>sstat</tt><span>&nbsp;to query MaxRSS for the job on the node (see the examples below).</span><br /><span>Depending on the type of application you need to run, the submit script may contain commands to create temporary space on a compute node;&nbsp;</span><a href="http://ecs.rutgers.edu/file_systems.html">see the discussion about using the file systems on the cluster.</a><span></span><br /><span>The script then sets the environment specific to the application and starts the application on one or more nodes; see the sbatch sample scripts in directory&nbsp;</span><tt>/usr/local/Samples</tt><span>&nbsp;on soemaster1.hpc.rutgers.edu.</span><br /><span>You can submit your job to the cluster with the&nbsp;</span><tt>sbatch</tt><span>&nbsp;command:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sbatch myscript.sh
</pre>
</td>
</tr>
</tbody>
</table></div><p><br /><span style="font-size: large;"><strong>3. Query job information</strong></span><span></span><br /><span>List all currently submitted jobs in running and pending states for a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Command&nbsp;</span><tt>squeue</tt><span>&nbsp;can be run with format options to expose specific information, for example, when pending job #706 is scheduled to start running:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue -j 706 --format="%S"
</pre>
</td>
</tr>
</tbody>
</table></div><div><table border="1">
<tbody>
<tr>
<td>
<pre>START_TIME
2015-04-30T09:54:32
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>More information can be shown by adding format options, for example:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue -j 706 --format="%i %P %j %u %T %l %C %S"
</pre>
</td>
</tr>
</tbody>
</table></div><div><table border="1">
<tbody>
<tr>
<td>
<pre>JOBID PARTITION   NAME    USER STATE   TIMELIMIT  CPUS START_TIME
706   SOE_main  Par_job_3 mike PENDING 3-00:00:00 64   2015-04-30T09:54:32
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To see when all pending jobs in the queue are scheduled to start:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>squeue --start 
</pre>
</td>
</tr>
</tbody>
</table></div><p><br /><span>List all running and completed jobs for a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>or</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog -j &lt;JobID&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>The following abbreviations are used for the job states:</span></p><pre>       CA   CANCELLED      Job was cancelled.

       CD   COMPLETED      Job completed normally.

       CG   COMPLETING     Job is in the process of completing.

       F    FAILED         Job terminated abnormally.

       NF   NODE_FAIL      Job terminated due to node failure.

       PD   PENDING        Job is pending allocation.

       R    RUNNING        Job currently has an allocation.

       S    SUSPENDED      Job is suspended.

       TO   TIMEOUT        Job terminated upon reaching its time limit.
</pre><p><span>You can specify the fields you would like to see in the output of&nbsp;</span><tt>sqlog</tt><span>:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog --format=list
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>The command below, for example, provides Job ID, user name, exit state, start date-time, and end date-time for job #2831:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sqlog -j 2831 --format=jid,user,state,start,end
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>List status info for a currently running job:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sstat -j &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Formatted output can be used to show only specific information, for example, the maximum resident RAM usage on a node:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sstat --format="JobID,MaxRSS" -j &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To get statistics on completed jobs by jobID:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct --format="JobID,JobName,MaxRSS,Elapsed" -j &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To view the same information for all jobs of a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct --format="JobID,JobName,MaxRSS,Elapsed" -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To print a list of fields that can be specified with the&nbsp;</span><tt>--format</tt><span>&nbsp;option:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct --helpformat
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>For example, to get Job ID, Job name, Exit state, start date-time, and end date-time for job #2831:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sacct -j 2831 --format="JobID,JobName,State,Start,End"
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>Another useful command for getting information about a running job is&nbsp;</span><tt>scontrol</tt><span>:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scontrol show job=&lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><br /><span style="font-size: large;"><strong>4. Cancel a job</strong></span><span></span><br /><span>To cancel one job:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scancel &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To cancel one job and delete the TMP directory created by the submit script on a node:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>sdel &lt;jobid&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To cancel all the jobs for a user:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scancel -u &lt;username&gt;
</pre>
</td>
</tr>
</tbody>
</table></div><p><span>To cancel one or more jobs by name:</span></p><div><table border="0" style="background-color: #D0D0D0;">
<tbody>
<tr>
<td>
<pre>scancel --name &lt;myJobName&gt;
</pre>
</td>
</tr>
</tbody>
</table></div>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43900/finding-a-mimicry-game-for-teaching-on-line-and-mentioned-general-resources</guid>
	<pubDate>Tue, 28 Jun 2022 07:32:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43900/finding-a-mimicry-game-for-teaching-on-line-and-mentioned-general-resources</link>
	<title><![CDATA[Finding a mimicry game for teaching on-line and mentioned general resources]]></title>
	<description><![CDATA[<pre>Mimicry and other resources
Mimicry games:
Great Heliconius game:
http://heliconius.org/evolving_butterflies/
(See also 
https://royalsocietypublishing.org/doi/10.1098/rspb.2020.0014)
Another one, a bit less friendly:
https://ccl.northwestern.edu/netlogo/models/Mimicry
Camouflage practical
https://alexis-catherine.github.io/publication/natural-selection-and-camouflage/
(NetLogo also has one: 
https://ccl.northwestern.edu/netlogo/models/BugHuntCamouflage)
Peppered moth game:
https://askabiologist.asu.edu/peppered-moths-game/play.html

General resources
The always popular Populus:
https://cbs.umn.edu/populus/overview
Drift &amp; Gene Flow 
https://cartwrig.ht/apps/genie/
(Cock van Oosterhout has a great ppt to lead students through this)
See also https://cartwrig.ht/apps/redlynx/
https://demonstrations.wolfram.com/ReplicatorMutatorDynamicsWithThreeStrategies/
NetLogo:
http://ccl.northwestern.edu/netlogo/models/index.cgi
Population Genetics:
https://www.radford.edu/~rsheehy/Gen_flash/popgen/
Evolution in general
https://evolution.berkeley.edu/evolibrary/home.php
Mitochondrial Eve:
https://projects.ncsu.edu/cals/gn/ex/mit-eve.html
Y chromosomes:
https://projects.ncsu.edu/cals/gn/ex/y-chrom.html
A professional online package from Michael Kasumovic:
https://arludo.com/
A compilation of resources:
https://planted.botany.org/index.php?P=Home
Finally, Donald Forsdyke has some great on-line videos explaining
evolutionary principles (occasionally in a fake Scottish accent):
http://post.queensu.ca/~forsdyke/videolectures.htm
</pre><p>&nbsp;</p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43898/online-resources-on-must-read-papers-in-evolutionary-biology-for-a-literature-club</guid>
	<pubDate>Tue, 28 Jun 2022 07:29:08 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43898/online-resources-on-must-read-papers-in-evolutionary-biology-for-a-literature-club</link>
	<title><![CDATA[Online resources on must-read papers in evolutionary biology, for a literature club]]></title>
	<description><![CDATA[<pre>1.       *Nick Barton:*

- The textbook "Evolution" by Nick Barton, with resources for
  exploring the literature: Barton, N. H., Briggs, D. E. G., Eisen, J.
  A., Goldstein, D. B., &amp; Patel, N. H. (2007). Evolution. Cold Spring
  Harbor Laboratory Press.

- Papers from a course named "Classics in Evolutionary Biology":

Evolutionary Synthesis
1. Haldane, J. B. S. 1932. The causes of evolution. Longmans. New York.
   (esp. Ch. IV).
2. Fisher, R. A. 1930. The genetical theory of natural selection. Oxford
   University Press, Oxford. Selected Sections - Fundamental Theorem.

Genetic Variation
1a. Lewontin, R. C., and J. L. Hubby. 1966. A molecular approach to
the study of genic heterozygosity in natural populations. II. Amount
of variation and degree of heterozygosity in natural populations of
Drosophila pseudoobscura. Genetics. 54:595-609.

1b. Sachidanandam et al. 2001. A map of human genome sequence variation
containing 1.42 million single nucleotide polymorphisms. Nature 409: 928-933.

2. Wright S., Dobzhansky T., Hovanitz W. 1942 Genetics of natural
populations VII The allelism of lethals in the third chromosome of
Drosophila pseudoobscura. Genetics 27: 363-394.

Recombination and evolution
1. Hill, W. G., and A. Robertson. 1966. The effect of linkage on limits
to artificial selection. Genet. Res. 8:269-294.

2. Maynard Smith and Haigh. 1974. The hitch-hiking effect of a favourable
gene. Genet. Res. 23: 23-35.

Understanding sequence variation
1. Begun D. J., Aquadro C. F., 1992 Levels of naturally occurring DNA
polymorphism correlate with recombination rate in Drosophila melanogaster.
Nature 356: 519-520.

2. Green R. E., Reich D., P&auml;&auml;bo S., 2010 A draft sequence of the
Neandertal genome. Science 328: 710-722.

Quantitative Genetics:  variation in complex traits
1. Galton F., 1877 Typical laws of heredity. Nature 15: 492-495,
512-514, 532-533.

2. Turelli M., 1984 Heritable genetic variation via
mutation-selection balance: Lerch's Zeta meets the abdominal
bristle. Theor. Popul. Biol. 25: 138-193.

Quantitative Genetics:  finding the genes
1. Shrimpton A. E., Robertson A., 1988 The Isolation of polygenic factors
controlling bristle score in Drosophila melanogaster II Distribution of
third chromosome bristle effects within chromosome sections. Genetics
118: 445-459.

2. Boyle E. A., Li Y. I., Pritchard J. K., 2017 An expanded view of
complex traits: from polygenic to omnigenic. Cell 169: 1177-1186.

Neutral Evolution
1. Kimura, M. 1968. Evolutionary rate at the molecular level. Nature.
217:624-626.

2a. Kern A. D., Hahn M. W., 2018 The Neutral Theory in Light of Natural
Selection. Molecular Biology and Evolution 35: 1366-1371.

2b. Jensen J. D., Payseur B. A., Stephan W., Aquadro C. F., Lynch M.,
Charlesworth D., Charlesworth B., 2019 The importance of the Neutral Theory
in 1968 and 50 years on: a response to Kern and Hahn 2018. Evolution 73:
111-114.

2c. Ellegren &amp; Galtier. 2016. Determinants of genetic diversity. Nature
Reviews Genetics.

Mutation and Genetic Variability
1. Luria, S. E., and M. Delbr&uuml;ck. 1943. Mutations of Bacteria from Virus
Sensitivity to Virus Resistance. Genetics. 28(6):491-511.

2. Hill, W G. 1982. "Rates of Change in Quantitative Traits From Fixation
of New Mutations." Proceedings of the National Academy of Sciences (U.S.A.)
79: 142-45.

Testing for selection
1. McDonald &amp; Kreitman. 1991. Adaptive protein evolution at the Adh locus
in Drosophila. Nature.

2. Begun, et al. Mol. Biol. Evol. 16, 1816-1819 (1999).

3. Siddiq et al. 2016. Experimental test and refutation of a classic case
of molecular adaptation in Drosophila melanogaster.  Nature Ecology &amp;
Evolution.

The shifting balance
1. Wright, S. 1932. The roles of mutation, inbreeding, crossbreeding and
selection in evolution. Proceedings of the VI International Congress of
Genetics: 1. pp 356-366.

2. Coyne, J.A., N.H. Barton, and M. Turelli. 1997. A critique of Wright's
shifting balance theory of evolution.  Evolution 51: 643-671.

3. Barton. 2016. Sewall Wright on Evolution in Mendelian Populations and
the "Shifting Balance". Genetics.

Evolution of Sex
1.  Muller, H.J. 1964. The relation of recombination to mutational advance.
Mutation Res. 1(1):2-9

2. McDonald et al. 2016. Sex speeds adaptation by altering the dynamics of
molecular evolution. Nature.

Kin Selection, Cooperation, and Conflict
1. Hamilton, W. D. 1964. The genetical evolution of social behaviour I.
Journal of Theoretical Biology. 7:1-52.

2. Trivers, R. L. 1974 Parent-offspring conflict. American Zoologist.
14(1):249-264.

Sexual Selection
1. Zahavi, A. 1975. Mate selection - a selection for a handicap. J. Theor.
Biol. 53:205-214.

2. Kirkpatrick, M., and Ryan, M.J. 1991. The evolution of mating
preferences and the paradox of the lek. Nature. 350:33-38.

Fitness Landscapes
1. Dean, A. 1995. A Molecular Investigation of Genotype by Environment
Interactions. Genetics. 139:19-33.

2. Costanzo et al. 2010. The Genetic Landscape of a Cell. Science.

Speciation
1. Coyne, J. A., and H. A. Orr. 1989. Patterns of speciation in Drosophila.
Evolution. 43:362-381.

2. Corbett-Detig et al. 2013. Genetic incompatibilities are widespread
within species. Nature.

2.       *Marcos Antezana:*

Van Valen, L. 1975. Energy and Evolution. University of Chicago, Department
of Biology.

3.       *Remco Folkertsma:*

1. The work by Hopi Hoekstra on local adaptation and oldfield mice

2. Poelstra, J. W., Vijay, N., Bossu, C. M., Lantz, H., Ryll, B., M&uuml;ller,
I., ... &amp; Wolf, J. B. (2014). The genomic landscape underlying phenotypic
integrity in the face of gene flow in crows. Science, 344(6190), 1410-1414.

4.       *Joshka Kaufmann and Leslie Turner*

They offer us a link to 'papers every evolutionary biologist should read';
the papers were collected by Leslie Turner.
https://static1.squarespace.com/static/53e8cb7ce4b02c4bc3aeeee4/t/5ab8fcb670a6ad55c67fcdf4/1522072758665/EvoBioClassicsRefList.pdf

5.       *Sarah Stockwell*

Matt Ridley collected classic papers in evolutionary biology and printed
part of these papers in his book Evolution (see Matt Ridley. Evolution
(Oxford University Press, 2nd edition, 2004))</pre>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43896/list-of-comparative-genomics-resources</guid>
	<pubDate>Tue, 28 Jun 2022 04:08:06 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43896/list-of-comparative-genomics-resources</link>
	<title><![CDATA[List of comparative genomics resources]]></title>
	<description><![CDATA[<div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1096638041"><span>3D-GENOMICS -- A Database to Compare Structural and Functional Annotations of Proteins between Sequenced Genomes</span></a></div><p>Compare structural and functional annotations of proteins between sequenced genomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1100640374"><span>ARED Organism -- expansion of ARED reveals AU-rich element cluster variations between human and mouse</span></a></div><p>View AREs in the human transcriptome and study the comparative genomics of AREs in model organisms.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1234973128"><span>ATGC -- Alignable Tight Genomic Clusters Database</span></a></div><p>Find information about orthologous genes in prokaryotes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1174596104"><span>AnimalQTLdb -- a livestock QTL database tool set for positional QTL information mining and beyond</span></a></div><p>Search for publicly available QTL data on livestocks and animal species.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL20110518150135"><span>BGDB -- Bovine Genome Database</span></a></div><p>Find information about bovine genomics data.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1229012662"><span>COMPARE -- a multi-organism system for cross-species data comparison and transfer of information</span></a></div><p>A multi-organism web-based resource system designed to easily retrieve, correlate and interpret data across species.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1218141952"><span>CONDOR -- COnserved Non-coDing Orthologous Regions</span></a></div><p>A database resource of developmentally associated conserved non-coding elements.</p></div><div><div><a 
href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1099057221"><span>CORG -- A database for COmparative Regulatory Genomics</span></a></div><p>Delineate conserved non-coding blocks from upstream regions of putative orthologous gene pairs from man, mouse, rat, fugu, Mus musculus, Danio rerio, and zebrafish.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1203608896"><span>COXPRESdb -- a database of coexpressed gene networks in mammals</span></a></div><p>Find coexpressed gene lists and networks in human and mouse.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1097763045"><span>CVTree -- A Phylogenetic Tree Reconstruction Tool Based on Whole Genomes</span></a></div><p>Construct phylogenetic tree of microorganisms based on oligopeptide content of their complete proteomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1232729680"><span>CleanEST -- the cleansed EST libraries database</span></a></div><p>A novel database server that classifies GenBank's dbEST (database of expressed gene sequences) libraries and removes contaminants.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1256926144"><span>CoCoa -- COefficient of COAncestry software</span></a></div><p>Find information about the ancestral relationship between genes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1227549154"><span>CoGemiR -- a comparative genomics microRNA database</span></a></div><p>Provides an overview of the genomic organization of microRNAs and extent of conservation during evolution in different metazoan species.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1117678221"><span>Comparative Genometrics (CG) -- a database dedicated to biometric comparisons of whole genomes</span></a></div><p>Conduct comparative biometric analysis of chromosomes of different organisms.</p></div><div><div><a 
href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1151007916"><span>DoTS -- Database Of Transcribed Sequences</span></a></div><p>Search for Indices of gene and transcripts in human and mouse.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1174510065"><span>DroSpeGe -- rapid access database for new Drosophila species genomes</span></a></div><p>Search and compare 12 new and old Drosophila genomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1098208414"><span>ECR Browser -- A Tool for Visualizing and Accessing Data from Comparisons of Multiple Vertebrate Genomes</span></a></div><p>Access to whole genome alignments of human, mouse, rat and fish sequences.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1209738459"><span>EPGD -- Eukaryotic Paralog Group Database</span></a></div><p>Find eukaryotic paralog/paralogon information.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1232726869"><span>EVOG -- evolutionary visualizer for overlapping genes</span></a></div><p>Analyze the evolutionary process of overlapping genes when comparing different species.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1227633714"><span>GNAT -- Inter-species gene mention normalization (ISGN)</span></a></div><p>The first publicly available system reported to handle inter-species gene mention normalization.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1229438992"><span>GenColors -- annotation and comparative genomics of prokaryotes made easy</span></a></div><p>A web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1151086258"><span>GeneNest gene indices</span></a></div><p>Visualize gene indices of human, mouse, Arabidopsis, Zebrafish, Drosophila and Sheep.</p></div><div><div><a 
href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1174489378"><span>GenomeTrafac -- a whole genome resource for the detection of transcription factor binding site clusters associated with conventional and microRNA encoding genes conserved between mouse and human gene orthologs</span></a></div><p>Use comparative genomics approach to characterize gene models and identify putative cis-regulatory regions of RefSeq Gene Orthologs.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL20110518150753"><span>IKMC -- International Knockout Mouse Consortium web portal</span></a></div><p>Find information about mutated mouse genes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1209411604"><span>IMG/M -- Integrated Microbial Genomes/Metagenomes</span></a></div><p>A data management and analysis system for metagenomes</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1234976694"><span>ISED -- Influenza sequence and epitope database.</span></a></div><p>Search for influenza sequence, vaccine, and drug resistance information.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL20140710115515"><span>LAMDHI: The Search for Animal Models Starts Here</span></a></div><p>LAMHDI, the initiative to Link Animal Models to Human DIsease, is designed to accelerate the research process by providing biomedical researchers with a simple, comprehensive Web-based resource to find the best animal models for their research.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1228843803"><span>MANTIS -- a phylogenetic framework for multi-species genome comparisons</span></a></div><p>The missing link between multi-species full genome comparisons and functional analysis.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1099578148"><span>MBGD -- Microbial genome database for comparative analysis</span></a></div><p>Conduct comparative analysis 
of completely sequenced microbial genomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1221077729"><span>MEGA -- Molecular Evolutionary Genetics Analysis</span></a></div><p>A biologist-centric software for evolutionary analysis of DNA and protein sequences.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1174596756"><span>MamPol -- a database of nucleotide polymorphism in the Mammalia class</span></a></div><p>Conduct single nucleotide polymorphisms diversity measurements among homologous sequences from the Mammalia class.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1266437314"><span>MicrobesOnline -- Prokaryotic Genome Database</span></a></div><p>Find information about 1000s of microbial genomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1208461006"><span>Narcisse -- a mirror view of conserved syntenies</span></a></div><p>A database dedicated to the study of genome conservation.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1219772764"><span>OMA -- the Orthologous MAtrix project</span></a></div><p>Explore orthologous relations across 352 complete genomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1209738741"><span>OPTIC -- orthologous and paralogous transcripts in clades</span></a></div><p>Browse complete genomes in several clades.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1209573208"><span>OrthoDB -- the hierarchical catalog of eukaryotic orthologs</span></a></div><p>Find groups of orthologous genes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1221231200"><span>OrthoMaM -- orthologous mammalian markers</span></a></div><p>A database of orthologous genomic markers for placental mammal phylogenetics.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1100009979"><span>PEDANT -- 
Protein Extraction, Description and ANalysis Tool</span></a></div><p>Conduct genome wide functional and structural analysis.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1174489475"><span>PReMod -- a database of genome-wide mammalian cis-regulatory module predictions</span></a></div><p>Conduct genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1151083092"><span>PhenomicDB -- Comparison of phenotypes of orthologous genes in human and model organisms</span></a></div><p>Compare phenotypes of a given gene or gene set in different model organisms.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1190899370"><span>Phylemon -- A suite of web tools for molecular evolution, phylogenetics and phylogenomics</span></a></div><p>Phylemon is a web server that integrates a selected suite of more than 20 different tools from the most popular stand-alone programs of phylogenetic and evolutionary analysis.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1232555615"><span>PhyloPat -- the phylogenetic pattern database</span></a></div><p>Use this database to see where in the evolution some phylogenetic lineages were started, and over which species they were contained.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1174510223"><span>Pristionchus.org -- a genome-centric database of the nematode satellite species Pristionchus pacificus</span></a></div><p>Search for genomic information on nematode satellite species Pristionchus pacificus.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1236367352"><span>ProtClustDB -- NCBI Protein Clusters Database</span></a></div><p>Find information about related protein sequences.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1209410278"><span>ProtozoaDB -- 
database of protozoan genomes</span></a></div><p>Database hosting genomics and post-genomics data from multiple protozoans.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1232554690"><span>Pseudofam -- the pseudogene families database</span></a></div><p>A database of pseudogene families based on the protein families from the Pfam database.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL20110518151439"><span>RIDM - RIKEN Integrated Database of Mammals</span></a></div><p>Find genomic information about mammals.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1272562567"><span>RegPrecise -- Regulon Prediction Database</span></a></div><p>Find information about predicted regulons in prokaryotic transcription regulation.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1272477473"><span>SALAD -- Surveyed contained motif ALignment diagram and the Associating Dendrogram</span></a></div><p>Perform systematic comparison of proteome data among species.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1229010765"><span>SGN -- SOL Genomics Network</span></a></div><p>A comparative map viewer dedicated to the biology of the Solanaceae family.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1256669040"><span>ShotgunFunctionalizeR -- R-package for functional comparison of metagenomes</span></a></div><p>Analyze data from functional analysis on fragmented microbial genetic material.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1256238439"><span>SnoopCGH -- Comparative Genomic Hybridization software</span></a></div><p>Visualize and explore comparative genomic hybridization data sets.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1174489598"><span>SwissRegulon -- a database of genome-wide annotations of regulatory sites</span></a></div><p>Search for 
genome-wide annotations of regulatory sites in yeast and prokaryotes genomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1229013521"><span>TaxonGap -- a visualization tool for intra- and inter-species variation among individual biomarkers</span></a></div><p>Compare and select individual biomarkers.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1106063477"><span>The Adaptive Evolution Database (TAED) -- a phylogeny based tool for comparative genomics</span></a></div><p>Search for information on adaptive evolution in gene families of higher plants and chordate.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1216742716"><span>The CGView Server -- a comparative genomics tool for circular genomes</span></a></div><p>Generate graphical maps of circular genomes that show sequence features, base composition plots, analysis results and sequence similarity plots.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1099663588"><span>The ERGO -- Genome analysis and discovery system</span></a></div><p>Conduct a comprehensive analysis of genes and genomes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1177611772"><span>The Macaque Genome: Interactive Poster and Teaching Resource</span></a></div><p>An interactive online poster presentation on the Macaque genome, including high-quality images, video clips, and Web resources</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1103816940"><span>The TIGR Gene Indices -- clustering and assembling EST and known genes and integration with eukaryotic genomes</span></a></div><p>Search for annotated genetic information of expressed sequence tags (ESTs) in different eukaryotic organisms.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1043767169"><span>UniGene</span></a></div><p>Find mapping and expression information for a unigene cluster 
(ESTs and full-length mRNA sequences organized into clusters that each represent a unique known or putative gene)</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1216738072"><span>Uprobe -- universal overgo hybridization-based probe retrieval and design</span></a></div><p>A public online resource for identifying or designing 'universal' overgo-hybridization probes from conserved sequences that can be used to efficiently screen one or more genomic libraries from a designated group of species.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1098205291"><span>VISTA -- Computational Tools for Comparative Genomics</span></a></div><p>Comprehensive suite of programs and databases for comparative analysis of genomic sequences.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL20110518144404"><span>cBARBEL -- Catfish Breeder and Researcher Bioinformatics Entry Location</span></a></div><p>Find information about ictalurid catfish.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1209738040"><span>eggNOG -- evolutionary genealogy of genes: Non-supervised Orthologous Groups</span></a></div><p>Discover orthologous groups of genes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1234370319"><span>metaTIGER -- a metabolic gene evolution resource</span></a></div><p>Find metabolic networks and phylogenomic information on a taxonomically diverse range of eukaryotes.</p></div><div><div><a href="https://www.hsls.pitt.edu/obrc/index.php?page=URL1138901833"><span>xBASE -- a collection of online databases for bacterial comparative genomics</span></a></div><p>Conduct bacterial comparative genomics.</p></div>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/40413/50-iisc-raman-post-doctoral-fellowships</guid>
	<pubDate>Thu, 19 Dec 2019 09:59:12 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/40413/50-iisc-raman-post-doctoral-fellowships</link>
	<title><![CDATA[50 IISC Raman Post Doctoral Fellowships]]></title>
	<description><![CDATA[<p><span>IISc Bangalore has launched the Raman Post-Doc Program. Bioscience &amp; Chemical Science researchers are eligible to apply for&nbsp;IISc Raman Post Doctoral Fellowships. 50&nbsp;IISc Raman Post Doctoral Fellowships are available.</span></p><p>The Indian Institute of Science (IISc) has been recognised as an Institution of Eminence (IoE) by the Government of India. As a part of the IoE initiative, IISc has created the Raman Post-Doc Program, a highly selective Post-Doc program with 50 positions. The Institute invites applications from highly motivated individuals with an established record of&nbsp;high-quality&nbsp;research for the positions of Raman Post-Docs. Overseas Citizens of India (OCI), Persons of Indian Origin (PIO), and foreign nationals are also eligible to apply.</p><p><span>The information below specifically pertains to applicants intending to work with Faculty in the Biological Sciences Division.</span></p><p>This is a rolling advertisement and candidates can apply any time during the year. The applications will be reviewed every four months around the following dates:&nbsp;<span>April 30,&nbsp;August 31, December 31</span>.</p><p><span>Further details about the various departments and interdisciplinary centres, faculty profiles, academic programs, and areas of research are available at the departmental websites and also at&nbsp;</span><a href="http://www.iisc.ac.in/" target="_blank">www.iisc.ac.in</a></p><p>Note:&nbsp;<span>Candidates should preferably be less than 32 years of age at the time of applying.</span></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/40404/exchange-programme-for-indian-scientist</guid>
	<pubDate>Wed, 18 Dec 2019 21:11:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/40404/exchange-programme-for-indian-scientist</link>
	<title><![CDATA[Exchange Programme for Indian scientist !!]]></title>
	<description><![CDATA[<p>The Indian National Science Academy (INSA) is a premier scientific learned body (established in 1935) representing all branches of science: the Physical and Biological Sciences, including Engineering, Medicine and Agricultural Sciences. The Academy has been promoting scientific cooperation with Academies/Organisations of several countries the world over, and has links with Academies and Organisations in Asia, Europe and South America. These programmes give scientists working in various scientific institutions and organizations in the country opportunities to exchange ideas and knowledge, establish new links, strengthen old links and undertake joint projects with their research partners in leading laboratories and institutions abroad.</p><p>The Academy has an International Exchange Programme with Academies/Organizations in the following countries:&nbsp;<span>Brazil, China, France, Hungary, Iran, Israel, Nepal, Philippines, Poland, Scotland, Slovak Republic, Republic of Slovenia, Sudan and Taiwan.</span></p><p>Applications are invited from Indian Nationals for consideration by the Academy for the next calendar year.</p><ul>
<li>The applicant should be a scientist holding a regular (<span>permanent</span>) position in a recognized S &amp; T Institution/University and actively engaged in research work in frontline areas.</li>
<li>He/She should not have been abroad during the last 3 years under any INSA Programme.</li>
<li>The scientist should have been accepted to work in an Institute/Laboratory in the country to be visited and this should be supported by a&nbsp;<span>letter of invitation</span>&nbsp;from the host abroad.</li>
<li>Those who wish to go abroad for three months should submit a detailed programme of the collaborative research work to be conducted.</li>
</ul><p>All duly completed applications should be forwarded to the Academy through the proper channel by the employer/head of the Institute.</p><ul>
<li>Scientists selected for deputation abroad would be provided&nbsp;<span>100% travel support (Air India excursion-class airfare only, by the shortest route from the place of duty in India to the airport nearest the host Institute and back)</span>&nbsp;by INSA.</li>
<li>Medical Insurance purchased in India.</li>
<li>Visa fee (if any).</li>
<li>The receiving Academy/Organization would provide local hospitality including internal travel abroad.</li>
</ul><p>For details, contact:&nbsp;</p><p><a href="http://www.insaindia.res.in/" target="_blank"><span>www.insaindia.res.in</span></a></p><p><span>INDIAN NATIONAL SCIENCE ACADEMY</span><br /><span>Bahadur Shah Zafar Marg, New Delhi &ndash; 110 002.</span><br /><span>Telephone: 91-11-23221931 &ndash; 23221950 (EPABX),</span><br /><span>Fax: 91-11-23235648, 23231095</span></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/35923/basic-command-line-to-run-blast</guid>
	<pubDate>Wed, 14 Mar 2018 05:10:34 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/35923/basic-command-line-to-run-blast</link>
	<title><![CDATA[Basic command-line to run BLAST]]></title>
	<description><![CDATA[<p>&nbsp;</p><p>The goal of this tutorial is to run you through a demonstration of the command line, which you may not have seen or used much before.</p><p>All of the commands below can be copied and pasted.</p><div id="install-software"><h2>Install software<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#install-software" title="Permalink to this headline"></a></h2><p>Copy and paste the following commands:</p><div><div><pre>sudo apt-get update &amp;&amp; sudo apt-get -y install python ncbi-blast+
</pre></div></div><p>This updates the software list and installs the Python programming language and NCBI BLAST+.</p></div><div id="get-data"><h2>Get Data<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#get-data" title="Permalink to this headline"></a></h2><p>Grab some data to play with. Grab some cow and human RefSeq proteins:</p><div><div><pre>wget ftp://ftp.ncbi.nih.gov/refseq/B_taurus/mRNA_Prot/cow.1.protein.faa.gz
wget ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.1.protein.faa.gz
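# Optional integrity check (a sketch, not part of the original tutorial):
# gzip -t exits with a non-zero status if an archive is truncated or corrupt,
# which can happen when a download is interrupted.
# check_gz is a hypothetical helper name introduced here for illustration.
check_gz() {
    for f in "$@"; do
        if gzip -t "$f"; then echo "OK $f"; else echo "BAD $f"; fi
    done
}
check_gz cow.1.protein.faa.gz human.1.protein.faa.gz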
</pre></div></div><p>This is only the first part of the human and cow protein files - there are 24 files total for human.</p><p>The database files are both gzipped, so let&rsquo;s unzip them:</p><div><div><pre>gunzip *gz
ls
</pre></div></div><p>Take a look at the head of each file:</p><div><div><pre>head cow.1.protein.faa
head human.1.protein.faa
</pre></div></div><p>These are protein sequences in FASTA format. FASTA format is something many of you have probably seen in one form or another &ndash; it&rsquo;s pretty ubiquitous. It&rsquo;s just a text file containing records; each record starts with a line beginning with a &lsquo;&gt;&rsquo;, followed by one or more lines of sequence text.</p><p>Note that the files are in FASTA format, even though they end in &ldquo;.faa&rdquo; instead of the usual &ldquo;.fasta&rdquo;. This is NCBI&rsquo;s way of denoting that this is a FASTA file with amino acids instead of nucleotides.</p><p>How many sequences are in each one?</p><div><div><pre>grep -c '^&gt;' cow.1.protein.faa
grep -c '^&gt;' human.1.protein.faa
</pre></div></div><p>This grep command uses the -c flag, which reports the number of lines that match the pattern. In this case, the pattern is a regular expression that matches only lines beginning with a &gt;.</p><p>This is a bit too big, so let&rsquo;s take a smaller set for practice. Let&rsquo;s take the first two sequences of the cow proteins, which we can see are on the first 6 lines:</p><div><div><pre>head -6 cow.1.protein.faa &gt; cow.small.faa
</pre></div></div></div><div id="blast"><h2>BLAST<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#blast" title="Permalink to this headline"></a></h2><p>Now we can blast these two cow sequences against the set of human sequences. First, we need to tell BLAST about our database. BLAST needs to do some pre-work on the database file prior to searching, which makes the search a lot faster. The install step above puts the BLAST+ programs on your PATH, so you can call makeblastdb directly; if you installed your own copy of the software elsewhere, use the full path to the program instead:</p><div><div><pre>makeblastdb -in human.1.protein.faa -dbtype prot
ls
</pre></div></div><p>Note that this makes a lot of extra files, with the same name as the database plus new extensions (.pin, .psq, etc). To make blast work, these files, called index files, must be in the same directory as the fasta file.</p><p><br /> blastp [-h] [-help] [-import_search_strategy filename]<br /> [-export_search_strategy filename] [-task task_name] [-db database_name]<br /> [-dbsize num_letters] [-gilist filename] [-seqidlist filename]<br /> [-negative_gilist filename] [-negative_seqidlist filename]<br /> [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]<br /> [-db_hard_mask filtering_algorithm] [-subject subject_input_file]<br /> [-subject_loc range] [-query input_file] [-out output_file]<br /> [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]<br /> [-gapextend extend_penalty] [-qcov_hsp_perc float_value]<br /> [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]<br /> [-xdrop_gap_final float_value] [-searchsp int_value]<br /> [-sum_stats bool_value] [-seg SEG_options] [-soft_masking soft_masking]<br /> [-matrix matrix_name] [-threshold float_value] [-culling_limit int_value]<br /> [-best_hit_overhang float_value] [-best_hit_score_edge float_value]<br /> [-window_size int_value] [-lcase_masking] [-query_loc range]<br /> [-parse_deflines] [-outfmt format] [-show_gis]<br /> [-num_descriptions int_value] [-num_alignments int_value]<br /> [-line_length line_length] [-html] [-max_target_seqs num_sequences]<br /> [-num_threads int_value] [-ungapped] [-remote] [-comp_based_stats compo]<br /> [-use_sw_tback] [-version]</p><p>Now we can run the blast job. We will use blastp, which is appropriate for protein to protein comparisons.</p><div><div><pre>blastp -query cow.small.faa -db human.1.protein.faa
</pre></div></div><p>This gives us a lot of information on the terminal screen, but that is difficult to save and use later. BLAST also gives the option of saving the text to a file:</p><div><div><pre>blastp -query cow.small.faa -db human.1.protein.faa -out cow_vs_human_blast_results.txt
ls
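# Quick sanity check (a sketch, not part of the original tutorial): in the
# default text report, each query section starts with a line beginning with
# "Query=", so counting those lines should match the number of query sequences.
# count_queries is a hypothetical helper name introduced here for illustration.
count_queries() {
    grep -c '^Query=' "$1"
}
count_queries cow_vs_human_blast_results.txt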
</pre></div></div><p>Take a look at the results using less. Note that there can be more than one match between the query and the same subject. These are referred to as high-scoring segment pairs (HSPs).</p><div><div><pre>less cow_vs_human_blast_results.txt
</pre></div></div><p>So how do you know about all the options, such as the flag to create an output file? Let&rsquo;s take a look at the help pages. Unfortunately there are no man pages (those are usually reserved for shell commands, but some software authors will provide them as well), but there is a text help output:</p><div><div><pre>blastp -help
</pre></div></div><p>To scroll through it slowly, pipe it to less:</p><div><div><pre>blastp -help | less
</pre></div></div><p>To quit the less screen, press the q key.</p><p>Parameters of interest include -evalue (the default is 10, which is very permissive) and -outfmt.</p><p>Let&rsquo;s filter for more statistically significant matches with a different output format:</p><div><div><pre>blastp \
-query cow.small.faa \
-db human.1.protein.faa \
-out cow_vs_human_blast_results.tab \
-evalue 1e-5 \
-outfmt 7
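# A quick way to summarise the tabular file (a sketch, not part of the
# original tutorial): -outfmt 7 writes comment lines starting with '#', so
# drop those and count the remaining hit lines per query ID (column 1).
# hits_per_query is a hypothetical helper name introduced here for illustration.
hits_per_query() {
    grep -v '^#' "$1" | cut -f 1 | sort | uniq -c
}
hits_per_query cow_vs_human_blast_results.tab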
</pre></div></div><p>I broke the long blastp command into many lines by &ldquo;escaping&rdquo; the newline. The backslash at the end of each line tells the command line &ldquo;Wait, I&rsquo;m not done yet!&rdquo;, so it waits for the next line of the command before executing.</p><p>Check out the results with less.</p><p>Let&rsquo;s try a medium-sized data set next:</p><div><div><pre>head -199 cow.1.protein.faa &gt; cow.medium.faa
</pre></div></div><p>How many sequences are in this file?</p><div><div><pre>grep -c '^&gt;' cow.medium.faa
</pre></div></div><p>Let&rsquo;s run BLAST again, but this time let&rsquo;s return only the best hit for each query.</p><div><div><pre>blastp \
-query cow.medium.faa \
-db human.1.protein.faa \
-out cow_vs_human_blast_results.tab \
-evalue 1e-5 \
-outfmt 6 \
-max_target_seqs 1
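# Even with -max_target_seqs 1, a query can still produce several lines, one
# per high-scoring segment pair (HSP). As a sketch (not part of the original
# tutorial), keep only the top-scoring line per query. This assumes the
# default 12 columns of -outfmt 6, with the query ID in column 1 and the
# bit score in column 12.
# best_hit_per_query is a hypothetical helper name introduced here for illustration.
best_hit_per_query() {
    sort -t "$(printf '\t')" -k1,1 -k12,12gr "$1" | awk -F '\t' '!seen[$1]++'
}
best_hit_per_query cow_vs_human_blast_results.tab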
</pre></div></div></div><div id="summary"><h2>Summary<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#summary" title="Permalink to this headline"></a></h2><p>Review:</p><ul>
<li>command-line programs such as BLAST use flags to specify what to do and how to do it</li>
<li>blast options can be found by typing&nbsp;<cite>blastp -help</cite></li>
<li>break a command up over many lines by using&nbsp;<cite>\</cite> to &ldquo;escape&rdquo; the newline</li>
</ul><p>&nbsp;</p><p>Blastn</p><p>blastn [-h] [-help] [-import_search_strategy filename]<br /> [-export_search_strategy filename] [-task task_name] [-db database_name]<br /> [-dbsize num_letters] [-gilist filename] [-seqidlist filename]<br /> [-negative_gilist filename] [-negative_seqidlist filename]<br /> [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]<br /> [-db_hard_mask filtering_algorithm] [-subject subject_input_file]<br /> [-subject_loc range] [-query input_file] [-out output_file]<br /> [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]<br /> [-gapextend extend_penalty] [-perc_identity float_value]<br /> [-qcov_hsp_perc float_value] [-max_hsps int_value]<br /> [-xdrop_ungap float_value] [-xdrop_gap float_value]<br /> [-xdrop_gap_final float_value] [-searchsp int_value]<br /> [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]<br /> [-min_raw_gapped_score int_value] [-template_type type]<br /> [-template_length int_value] [-dust DUST_options]<br /> [-filtering_db filtering_database]<br /> [-window_masker_taxid window_masker_taxid]<br /> [-window_masker_db window_masker_db] [-soft_masking soft_masking]<br /> [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]<br /> [-best_hit_score_edge float_value] [-window_size int_value]<br /> [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]<br /> [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]<br /> [-outfmt format] [-show_gis] [-num_descriptions int_value]<br /> [-num_alignments int_value] [-line_length line_length] [-html]<br /> [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]<br /> [-version]</p><p>DESCRIPTION<br /> Nucleotide-Nucleotide BLAST 2.7.0+</p></div>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>

</channel>
</rss>