<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44762?offset=1150</link>
	<atom:link href="https://bioinformaticsonline.com/related/44762?offset=1150" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44734/data-visualization-in-bioinformatics-useful-and-eye-catching-plots-for-data-analysis</guid>
	<pubDate>Sat, 14 Dec 2024 12:41:53 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44734/data-visualization-in-bioinformatics-useful-and-eye-catching-plots-for-data-analysis</link>
	<title><![CDATA[Data Visualization in Bioinformatics: Useful and Eye-Catching Plots for Data Analysis]]></title>
	<description><![CDATA[<p>Data visualization is a cornerstone of bioinformatics, enabling researchers to interpret complex datasets effectively. With a plethora of data types&mdash;genomic sequences, expression profiles, protein interactions, and more&mdash;the right visualizations can make or break an analysis. This blog highlights some of the most useful and visually compelling plots for bioinformatics data analysis, along with tools to create them.</p><h4><strong>1. Heatmaps: Exploring Patterns in High-Dimensional Data</strong></h4><p>Heatmaps are a go-to visualization for representing high-dimensional datasets, such as gene expression or metabolomics data. They use color gradients to display data intensity, making patterns and clusters easily detectable.</p><ul>
<li>
<p><strong>Applications</strong>: Gene expression analysis, pathway enrichment, methylation studies.</p>
</li>
<li>
<p><strong>Tools</strong>: Seaborn (Python), ComplexHeatmap (R), Morpheus (web-based).</p>
</li>
</ul><p><strong>Tip</strong>: Add dendrograms to visualize clustering of rows and columns for hierarchical relationships.</p><h4><strong>2. Volcano Plots: Highlighting Differential Features</strong></h4><p>Volcano plots are indispensable for identifying significantly differentially expressed genes or proteins. They plot the log2 fold change against &ndash;log10(p-value), making it easy to spot statistically significant changes.</p><ul>
<li>
<p><strong>Applications</strong>: RNA-seq, proteomics, and metabolomics.</p>
</li>
<li>
<p><strong>Tools</strong>: ggplot2 (R), EnhancedVolcano (R), Plotly (Python).</p>
</li>
</ul><p><strong>Tip</strong>: Use color to highlight significant features and label key genes or proteins.</p><h4><strong>3. PCA Plots: Reducing Complexity with Principal Component Analysis</strong></h4><p>Principal Component Analysis (PCA) plots are used to reduce dimensionality and uncover trends or clusters in data. They provide insights into sample variability and grouping.</p><ul>
<li>
<p><strong>Applications</strong>: Transcriptomics, metabolomics, microbiome studies.</p>
</li>
<li>
<p><strong>Tools</strong>: scikit-learn + Matplotlib (Python), prcomp (R), ClustVis (web-based).</p>
</li>
</ul><p><strong>Tip</strong>: Annotate clusters with metadata to enhance interpretability.</p><h4><strong>4. Manhattan Plots: Genome-Wide Association Studies</strong></h4><p>Manhattan plots visualize p-values across the genome, making it easy to identify significant associations in genome-wide studies. They resemble city skylines, with the highest peaks indicating loci of interest.</p><ul>
<li>
<p><strong>Applications</strong>: GWAS, QTL mapping.</p>
</li>
<li>
<p><strong>Tools</strong>: qqman (R), Matplotlib (Python).</p>
</li>
</ul><p><strong>Tip</strong>: Use alternating colors for chromosomes and highlight significant SNPs for clarity.</p><h4><strong>5. Circular Plots (Circos): Visualizing Genomic Relationships</strong></h4><p>Circular plots are ideal for visualizing relationships across the genome, such as structural variations, gene duplications, or synteny.</p><ul>
<li>
<p><strong>Applications</strong>: Comparative genomics, structural variation studies.</p>
</li>
<li>
<p><strong>Tools</strong>: Circos (standalone), RCircos (R), pyCircos (Python).</p>
</li>
</ul><p><strong>Tip</strong>: Keep the plot clean and avoid overcrowding to maintain readability.</p><h4><strong>6. Sankey Diagrams: Tracking Data Flows</strong></h4><p>Sankey diagrams visualize flows or relationships between categories, often used to track changes in gene expression or pathway enrichment across conditions.</p><ul>
<li>
<p><strong>Applications</strong>: Pathway analysis, gene set enrichment analysis.</p>
</li>
<li>
<p><strong>Tools</strong>: Plotly (Python), networkD3 (R).</p>
</li>
</ul><p><strong>Tip</strong>: Use gradients or distinct colors to highlight key transitions.</p><h4><strong>7. Network Graphs: Mapping Interactions</strong></h4><p>Network graphs represent relationships between entities, such as protein-protein interactions or gene regulatory networks. Nodes represent entities, and edges represent relationships.</p><ul>
<li>
<p><strong>Applications</strong>: Systems biology, interactomics.</p>
</li>
<li>
<p><strong>Tools</strong>: Cytoscape (standalone), igraph (R), NetworkX (Python).</p>
</li>
</ul><p><strong>Tip</strong>: Use edge thickness or node size to represent interaction strength or centrality.</p><h4><strong>8. Violin Plots: Visualizing Data Distribution</strong></h4><p>Violin plots combine a boxplot with a density plot, showing the distribution and variability of data.</p><ul>
<li>
<p><strong>Applications</strong>: Single-cell RNA-seq, quantitative trait analysis.</p>
</li>
<li>
<p><strong>Tools</strong>: Seaborn (Python), ggplot2 (R).</p>
</li>
</ul><p><strong>Tip</strong>: Split violins by groups for side-by-side comparisons.</p><h4><strong>9. Time-Series Plots: Monitoring Changes Over Time</strong></h4><p>Time-series plots display changes in variables across time points, useful for tracking gene expression dynamics or metabolic fluxes.</p><ul>
<li>
<p><strong>Applications</strong>: Time-course experiments, cell cycle studies.</p>
</li>
<li>
<p><strong>Tools</strong>: Matplotlib (Python), ggplot2 (R).</p>
</li>
</ul><p><strong>Tip</strong>: Smooth the data to highlight trends while avoiding overfitting.</p><h4><strong>10. Genome Tracks: Visualizing Genomic Features</strong></h4><p>Genome tracks display multiple layers of genomic data, such as gene annotations, sequencing coverage, and epigenetic marks.</p><ul>
<li>
<p><strong>Applications</strong>: ChIP-seq, ATAC-seq, whole-genome sequencing.</p>
</li>
<li>
<p><strong>Tools</strong>: IGV (standalone), pyGenomeTracks (Python).</p>
</li>
</ul><p><strong>Tip</strong>: Stack related tracks for direct comparisons.</p><h4><strong>11. UpSet Plots: Visualizing Set Intersections</strong></h4><p>UpSet plots are a powerful alternative to Venn diagrams for visualizing intersections between multiple datasets.</p><ul>
<li>
<p><strong>Applications</strong>: Overlap analysis for gene sets, pathways, or variants.</p>
</li>
<li>
<p><strong>Tools</strong>: UpSetR and ComplexUpset (R), UpSetPlot (Python).</p>
</li>
</ul><p><strong>Tip</strong>: Use bar plots to represent the size of each intersection for added clarity.</p><h4><strong>12. Ridge Plots: Comparing Distributions</strong></h4><p>Ridge plots visualize the distributions of multiple datasets, stacked for easy comparison.</p><ul>
<li>
<p><strong>Applications</strong>: Transcriptomics, single-cell RNA-seq.</p>
</li>
<li>
<p><strong>Tools</strong>: ggridges (R), Matplotlib (Python).</p>
</li>
</ul><p><strong>Tip</strong>: Use transparency and consistent scaling for better readability.</p><h4><strong>13. Chord Diagrams: Visualizing Connections Between Groups</strong></h4><p>Chord diagrams illustrate relationships between categories, such as shared genes between pathways or overlaps in regulatory elements.</p><ul>
<li>
<p><strong>Applications</strong>: Pathway overlap, synteny, co-expression networks.</p>
</li>
<li>
<p><strong>Tools</strong>: circlize (R), HoloViews (Python).</p>
</li>
</ul><p><strong>Tip</strong>: Use distinct colors for each group to emphasize relationships.</p><h4><strong>14. Treemaps: Hierarchical Data Representation</strong></h4><p>Treemaps visualize hierarchical data as nested rectangles, with area proportional to data size.</p><ul>
<li>
<p><strong>Applications</strong>: Ontology enrichment, pathway analysis.</p>
</li>
<li>
<p><strong>Tools</strong>: treemapify (R), Plotly (Python).</p>
</li>
</ul><p><strong>Tip</strong>: Use colors to represent additional variables, like significance or enrichment scores.</p><h4><strong>15. t-SNE/UMAP Plots: Dimensionality Reduction for Clustering</strong></h4><p>t-SNE and UMAP plots project high-dimensional data into two dimensions. t-SNE mainly preserves local neighborhoods, while UMAP preserves local structure and tends to retain more of the global structure.</p><ul>
<li>
<p><strong>Applications</strong>: Single-cell transcriptomics, clustering analyses.</p>
</li>
<li>
<p><strong>Tools</strong>: scikit-learn and umap-learn (Python), Seurat (R).</p>
</li>
</ul><p><strong>Tip</strong>: Combine with metadata annotations for better cluster interpretation.</p><h4><strong>Bringing It All Together</strong></h4><p>The choice of visualization can significantly impact the insights gained from bioinformatics data. By selecting plots tailored to your data type and analysis goals, you can effectively communicate your findings and make your research more impactful. Whether you&rsquo;re a seasoned bioinformatician or a beginner, mastering these visualizations will elevate your analyses and presentations.</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/10460/assistant-professor-at-jawaharlal-nehru-university-in-delhi</guid>
  <pubDate>Wed, 07 May 2014 00:29:22 -0500</pubDate>
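  <link>https://bioinformaticsonline.com/opportunity/view/10460/assistant-professor-at-jawaharlal-nehru-university-in-delhi</link>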
  <title><![CDATA[Assistant Professor at Jawaharlal Nehru University in Delhi]]></title>
  <description><![CDATA[
<p>Advt. No. RC/48/2014</p>

<p>SCHOOL OF COMPUTATIONAL AND INTEGRATIVE SCIENCES (SC&amp;IS)</p>

<p>ESSENTIAL QUALIFICATION : - M.Sc./M.Tech. in Physics/ Chemistry/ Biology/ Mathematics/ Statistics/ Bioinformatics/ Computational Biology. Ph.D. in the broad areas of Bioinformatics/ Computational Biology. Candidates must have demonstrated capabilities in terms of high impact research publications in either of the above mentioned areas.</p>

<p>Scale of Pay : - 15600-39100/- (PB-III) AGP Rs. 6000/-</p>

<p>For more details on the Centre/School, specializations, etc., please visit the JNU website www.jnu.ac.in or contact the Section Officer, Room Nos. 131-132, Recruitment Cell, Administrative Block, JNU, New Delhi – 110067, Email: recruitmentjnu2013@gmail.com. The last date for receipt of applications is 15 May 2014.</p>

<p>http://www.jnu.ac.in/Career/</p>

<p>http://www.jnu.ac.in/Career/ADVTNo_RC_48_2014.pdf</p>

<p>Last Apply Date: 15 May 2014</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/39834/jnu-openings</guid>
	<pubDate>Thu, 08 Aug 2019 11:04:25 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/39834/jnu-openings</link>
	<title><![CDATA[JNU openings !]]></title>
	<description><![CDATA[<div>Openings for faculty positions (Assistant Professor, Associate Professor and Professor)</div><div>&nbsp;</div><div><span style="text-decoration: underline;">Advt. No. RC/62/2019 for Assistant Professor</span>:&nbsp;<a href="https://jnu.ac.in/sites/default/files/career/Advt_RC_62_2019_Assist_Professor_Window.pdf">Window advt.</a>&nbsp;and&nbsp;<a href="https://jnu.ac.in/sites/default/files/career/Advt_RC_62_2019_Assist_Professor_Detailed.pdf">Detailed advt.</a></div><div><span style="text-decoration: underline;">Advt. No. RC/61/2019 for Associate Professor</span>:&nbsp;<a href="https://jnu.ac.in/sites/default/files/career/Advt_RC_61_2019_Associate_Professor_Window.pdf">Window advt.</a>&nbsp;and&nbsp;<a href="https://jnu.ac.in/sites/default/files/career/Advt_RC_61_2019_Associate_Professor_Detailed.pdf">Detailed advt.</a></div><div><span style="text-decoration: underline;">Advt. No. RC/60/2019 for Professor</span>:&nbsp;<a href="https://jnu.ac.in/sites/default/files/career/Advt_RC_60_2019_Professor_Window.pdf">Window advt.</a>&nbsp;and&nbsp;<a href="https://jnu.ac.in/sites/default/files/career/Advt_RC_60_2019_Professor_Detailed.pdf">Detailed advt.</a></div><h4><a href="http://jnurc60.fdsrecruit.com/">Click to apply online</a></h4><div>The last date for submission of applications, complete in all respects, is&nbsp;<span style="text-decoration: underline;">19 August 2019 (5:30 PM)</span>.</div>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/videolist/watch/10749/memories-can-be-passed-down-through-dna</guid>
	<pubDate>Sat, 10 May 2014 21:24:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/videolist/watch/10749/memories-can-be-passed-down-through-dna</link>
	<title><![CDATA[Memories Can Be Passed Down Through DNA]]></title>
	<description><![CDATA[<iframe width="" height="" src="https://www.youtube-nocookie.com/embed/tbPwzII_g6o" frameborder="0" allowfullscreen></iframe>The premise of Assassin's Creed is the reliving of other people's memories stored inside DNA. Well, scientists have found that in mice, it actually happens! Anthony is joined by special guest and our friend Tara Long from Hard Science to explain how this process works, and whether it might apply to humans as well.

Read More: 
Parental olfactory experience influences behavior and neural structure in subsequent generations
http://www.nature.com/neuro/journal/vaop/ncurrent/abs/nn.3594.html
"Using olfactory molecular specificity, we examined the inheritance of parental traumatic exposure, a phenomenon that has been frequently observed, but not understood."

What Is Epigenetics?
http://www.sciencemag.org/content/330/6004/611
"The cells in a multicellular organism have nominally identical DNA sequences (and therefore the same genetic instruction sets), yet maintain different terminal phenotypes. This nongenetic cellular memory, which records developmental and environmental cues (and alternative cell states in unicellular organisms), is the basis of epi-(above)-genetics."

Epigenetics
http://en.wikipedia.org/wiki/Epigenetics

Watch More:
How to Change Your Genes
https://www.youtube.com/watch?v=B5DU9lgbsSE
TestTube Wild Card
http://testtube.com/dnews/dnews-231-how-too-many-screens-affect-our-brain?utm_source=YT&utm_medium=DNews&utm_campaign=DNWC
Is Sexiness Hereditary?
https://www.youtube.com/watch?v=z6STRCncvM8]]></description>
	
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43952/elastic-blast</guid>
	<pubDate>Tue, 06 Sep 2022 18:14:57 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43952/elastic-blast</link>
	<title><![CDATA[Elastic BLAST !]]></title>
	<description><![CDATA[<p><a href="https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/elasticblast.html?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=elasticblast-top3-20220823">ElasticBLAST</a>&nbsp;is a new way to&nbsp;<a href="https://blast.ncbi.nlm.nih.gov/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=elasticblast-top3-20220823">BLAST</a>&nbsp;large numbers of queries, faster and on the cloud. Here are the top three reasons you should use ElasticBLAST:</p>
<h6><strong><img src="https://i0.wp.com/ncbiinsights.ncbi.nlm.nih.gov/wp-content/uploads/2022/08/ElasticBLAST_Larger-e1659978198941.png?resize=150%2C120&amp;ssl=1" alt="" width="150" height="120" style="border: 0px;">1. ElasticBLAST can handle much LARGER queries!&nbsp;</strong></h6>
<p>ElasticBLAST can search query sets that have&nbsp;<em>hundreds to millions of sequences</em>&nbsp;and against BLAST databases of all sizes.</p>
<h6><strong><img src="https://i0.wp.com/ncbiinsights.ncbi.nlm.nih.gov/wp-content/uploads/2022/08/ElasticBLAST_Faster.png?resize=150%2C120&amp;ssl=1" alt="" width="150" height="120" style="border: 0px;">2. ElasticBLAST is FASTER</strong></h6>
<p>ElasticBLAST distributes your searches across multiple cloud instances to process them simultaneously. The ability to scale resources in this way allows you to process large numbers of queries in a shorter time than you could with BLAST+.</p>
<h6><strong><img src="https://i0.wp.com/ncbiinsights.ncbi.nlm.nih.gov/wp-content/uploads/2022/08/ElasticBLAST_Easy.png?resize=150%2C120&amp;ssl=1" alt="" width="150" height="120" style="border: 0px;">3. ElasticBLAST is EASY to run on the cloud</strong></h6>
<p>ElasticBLAST is easy to set up using our step-by-step instructions (<a href="https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-aws.html?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=elasticblast-top3-20220823" target="_blank">Amazon Web Services (AWS)</a>, <a href="https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-gcp.html?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=elasticblast-top3-20220823" target="_blank">Google Cloud Platform (GCP)</a>) and allows you to leverage the power of the cloud. Once configured, it manages the software and database installation, handles partitioning of the BLAST workload among the various instances, and deallocates cloud resources when the searches are done.</p>
<p>ElasticBLAST also selects the instance (i.e., machine) type for you based on database size. Of course, you can also choose the instance type manually if you prefer.</p><p>Address of the bookmark: <a href="https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/" rel="nofollow">https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/10773/bioinformatics-jrfsrf-position-at-national-research-centre-on-plant-biotechnology</guid>
  <pubDate>Sun, 11 May 2014 22:29:12 -0500</pubDate>
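  <link>https://bioinformaticsonline.com/opportunity/view/10773/bioinformatics-jrfsrf-position-at-national-research-centre-on-plant-biotechnology</link>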
  <title><![CDATA[Bioinformatics JRF/SRF position at NATIONAL RESEARCH CENTRE ON PLANT BIOTECHNOLOGY]]></title>
  <description><![CDATA[
<p>NATIONAL RESEARCH CENTRE ON PLANT BIOTECHNOLOGY<br />LBS, CENTRE, PUSA CAMPUS, IARI NEW DELHI<br />NEW DELHI – 110 012</p>

<p>WALK-IN INTERVIEWS</p>

<p>Eligible candidates may appear in Walk-in-Interview on May 23, 2014 at 10 AM for the posts of Research Associates &amp; Senior Research Fellows (SRF) in the following DST/DBT/ICAR funded projects.</p>

<p>1 NPTC Project on Bioinformatics and Comparative Genomics</p>

<p>Research Associate (One)</p>

<p>Rs. 24000/- + 30% HRA for master's degree holders with more than 4 years' experience</p>

<p>Essential: Ph.D. in Plant Molecular Biology &amp; Biotechnology/Genetics, or candidates who have already submitted their Ph.D. thesis in the above subjects</p>

<p>Desirable: Research experience in genomics, molecular biology, microarray analysis, gene cloning, transgenic techniques, and computational analysis.</p>

<p>Senior Research Fellow (UGC-CSIR/DBT/ICAR NET qualified only): (One)</p>

<p>Rs. 16000/- + 30% HRA, and Rs. 18000/- + 30% HRA from the 3rd year onwards</p>

<p>Essential:</p>

<p>1. ICAR/UGC-CSIR/DBT NET qualified only</p>

<p>2. M.Sc. (with thesis) in Biotechnology, Life Sciences, Biosciences/Bioinformatics, Genetics/Plant Pathology with experience in molecular biology.</p>

<p>Or M.Sc. with more than 3 years of research experience</p>

<p>3. B.Sc. Agriculture or Biology</p>

<p>Desirable:<br />1. M. Sc. with thesis<br />2. Experience in molecular biology, plant tissue culture<br />3. Bioinformatics knowledge is important</p>

<p>2 DST JC Bose National Fellowship</p>

<p>Research Associate (Bioinformatics) : One</p>

<p>Rs. 22000/- + 30% HRA for the 1st &amp; 2nd years, Rs. 23000/- + 30% HRA for the 3rd year and Rs. 24000/- + 30% HRA for the 4th &amp; 5th years</p>

<p>Essential: M Ph D in Plant Molecular Biology &amp; Biotechnology/Genetics</p>

<p>Desirable: Research experience in genomics, molecular biology, microarray analysis, gene cloning, transgenic techniques, and computational analysis.</p>

<p>Age limit: Max. 35 years (age relaxation of 5 years for SC/ST &amp; women and 3 years for OBC)</p>

<p>The posts are purely temporary and are co-terminous with the project. Initially the offer will be made for one year only and may be extended based on the performance of the candidate. The interview will be held on May 23, 2014 at 10:00 AM at NRCPB, LBS Building, Pusa Campus, IARI, New Delhi – 110012. Candidates must bring four copies of their biodata (in the prescribed proforma), original certificates, attested photocopies of each certificate and an attested copy of a recent passport-size photograph. No TA/DA will be paid for appearing in the interview. Only candidates with the essential qualifications will be interviewed. If there is a large number of applicants, candidates will be short-listed on academic merit and experience.</p>

<p>Advertisement: http://www.nrcpb.org/sites/default/files/Advertisement%20for%20RA%20and%20SRF%20Position.pdf</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/35923/basic-command-line-to-run-blast</guid>
	<pubDate>Wed, 14 Mar 2018 05:10:34 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/35923/basic-command-line-to-run-blast</link>
	<title><![CDATA[Basic command-line to run BLAST]]></title>
	<description><![CDATA[<p>The goal of this tutorial is to walk you through a demonstration of the command line, which you may not have seen or used much before.</p><p>All of the commands below can be copied and pasted.</p><div id="install-software"><h2>Install software<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#install-software" title="Permalink to this headline"></a></h2><p>Copy and paste the following command:</p><div><div><pre>sudo apt-get update &amp;&amp; sudo apt-get -y install python ncbi-blast+
</pre></div></div><p>This updates the software list and installs the Python programming language and NCBI BLAST+.</p></div><div id="get-data"><h2>Get Data<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#get-data" title="Permalink to this headline"></a></h2><p>Grab some data to play with. Grab some cow and human RefSeq proteins:</p><div><div><pre>wget ftp://ftp.ncbi.nih.gov/refseq/B_taurus/mRNA_Prot/cow.1.protein.faa.gz
wget ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/human.1.protein.faa.gz
</pre></div></div><p>This is only the first part of the human and cow protein files; there are 24 files in total for human.</p><p>Both files are gzipped, so let&rsquo;s unzip them:</p><div><div><pre>gunzip *gz
ls
</pre></div></div><p>Take a look at the head of each file:</p><div><div><pre>head cow.1.protein.faa
head human.1.protein.faa
</pre></div></div><p>These are protein sequences in FASTA format. FASTA format is something many of you have probably seen in one form or another &ndash; it&rsquo;s pretty ubiquitous. It&rsquo;s just a text file containing records; each record starts with a line beginning with a &lsquo;&gt;&rsquo;, followed by one or more lines of sequence text.</p><p>Note that the files are in FASTA format, even though they end in &ldquo;.faa&rdquo; instead of the usual &ldquo;.fasta&rdquo;. This is NCBI&rsquo;s way of denoting that this is a FASTA file with amino acids instead of nucleotides.</p><p>How many sequences are in each one?</p><div><div><pre>grep -c '^&gt;' cow.1.protein.faa
grep -c '^&gt;' human.1.protein.faa
</pre></div></div><p>This grep command uses the -c flag, which reports the number of lines that match the pattern. In this case, the pattern is a regular expression that matches only lines beginning with a &gt;.</p><p>This is a bit too big, so let&rsquo;s take a smaller set for practice. Let&rsquo;s take the first two sequences of the cow proteins, which we can see are on the first 6 lines:</p><div><div><pre>head -6 cow.1.protein.faa &gt; cow.small.faa
</pre></div></div></div><div id="blast"><h2>BLAST<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#blast" title="Permalink to this headline"></a></h2><p>Now we can blast these two cow sequences against the set of human sequences. First, we need to tell BLAST about our database. BLAST needs to do some pre-work on the database file prior to searching, which makes the search a lot faster. If you had installed your own copy of the software outside the shell&rsquo;s search path, you would need to give the full path to the program; because we installed it with apt-get above, the makeblastdb command can be called by name:</p><div><div><pre>makeblastdb -in human.1.protein.faa -dbtype prot
ls
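# Optional check, a sketch that assumes blastdbcmd (shipped with BLAST+) is on your PATH:
# print a summary of the database that makeblastdb just built
blastdbcmd -db human.1.protein.faa -info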
</pre></div></div><p>Note that this makes a lot of extra files, with the same name as the database plus new extensions (.pin, .psq, etc). To make blast work, these files, called index files, must be in the same directory as the fasta file.</p><p><br /> blastp [-h] [-help] [-import_search_strategy filename]<br /> [-export_search_strategy filename] [-task task_name] [-db database_name]<br /> [-dbsize num_letters] [-gilist filename] [-seqidlist filename]<br /> [-negative_gilist filename] [-negative_seqidlist filename]<br /> [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]<br /> [-db_hard_mask filtering_algorithm] [-subject subject_input_file]<br /> [-subject_loc range] [-query input_file] [-out output_file]<br /> [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]<br /> [-gapextend extend_penalty] [-qcov_hsp_perc float_value]<br /> [-max_hsps int_value] [-xdrop_ungap float_value] [-xdrop_gap float_value]<br /> [-xdrop_gap_final float_value] [-searchsp int_value]<br /> [-sum_stats bool_value] [-seg SEG_options] [-soft_masking soft_masking]<br /> [-matrix matrix_name] [-threshold float_value] [-culling_limit int_value]<br /> [-best_hit_overhang float_value] [-best_hit_score_edge float_value]<br /> [-window_size int_value] [-lcase_masking] [-query_loc range]<br /> [-parse_deflines] [-outfmt format] [-show_gis]<br /> [-num_descriptions int_value] [-num_alignments int_value]<br /> [-line_length line_length] [-html] [-max_target_seqs num_sequences]<br /> [-num_threads int_value] [-ungapped] [-remote] [-comp_based_stats compo]<br /> [-use_sw_tback] [-version]</p><p>Now we can run the blast job. We will use blastp, which is appropriate for protein to protein comparisons.</p><div><div><pre>blastp -query cow.small.faa -db human.1.protein.faa
</pre></div></div><p>This prints a lot of information to the terminal screen, which is difficult to save and use later. BLAST also gives the option of saving the output to a file with -out:</p><div><div><pre>blastp -query cow.small.faa -db human.1.protein.faa -out cow_vs_human_blast_results.txt
ls
</pre></div></div><p>Take a look at the results using less. Note that there can be more than one match between the query and the same subject. These are referred to as high-scoring segment pairs (HSPs).</p><div><div><pre>less cow_vs_human_blast_results.txt
</pre></div></div><p>So how do you find out about all the options, such as the flag to create an output file? Let&rsquo;s take a look at the help pages. Unfortunately there are no man pages (those are usually reserved for shell commands, although some software authors provide them as well), but there is a text help output:</p><div><div><pre>blastp -help
</pre></div></div><p>To scroll through it slowly:</p><div><div><pre>blastp -help | less
</pre></div></div><p>To quit the less screen, press the q key.</p><p>Parameters of interest include -evalue (the default is 10, which is very permissive) and -outfmt.</p><p>Let&rsquo;s filter for more statistically significant matches and use a different output format:</p><div><div><pre>blastp \
-query cow.small.faa \
-db human.1.protein.faa \
-out cow_vs_human_blast_results.tab \
-evalue 1e-5 \
-outfmt 7
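# Optional, a minimal sketch rather than part of the original run:
# -outfmt 7 is tab-separated with '#' comment lines, so this shows only the first hit rows
grep -v '^#' cow_vs_human_blast_results.tab | head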
</pre></div></div><p>I broke the long single command into many lines by &ldquo;escaping&rdquo; the newline. The backslash at the end of each line tells the command line &ldquo;Wait, I&rsquo;m not done yet!&rdquo;, so it waits for the next line of the command before executing.</p><p>Check out the results with less.</p><p>Let&rsquo;s try a medium-sized data set next:</p><div><div><pre>head -199 cow.1.protein.faa &gt; cow.medium.faa
</pre></div></div><p>How many sequences are in this query set?</p><div><div><pre>grep -c '^&gt;' cow.medium.faa
</pre></div></div><p>Let&rsquo;s run BLAST again, but this time return only the best hit for each query. This run also switches to -outfmt 6, the same tabular format as 7 but without the comment lines.</p><div><div><pre>blastp \
-query cow.medium.faa \
-db human.1.protein.faa \
-out cow_vs_human_blast_results.tab \
-evalue 1e-5 \
-outfmt 6 \
-max_target_seqs 1
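# Optional, a minimal sketch: the default -outfmt 6 columns are
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore,
# so this prints query, best hit, e-value and bit score for the first few rows
cut -f 1,2,11,12 cow_vs_human_blast_results.tab | head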
</pre></div></div></div><div id="summary"><h2>Summary<a href="http://angus.readthedocs.io/en/2016/running-command-line-blast.html#summary" title="Permalink to this headline"></a></h2><p>Review:</p><ul>
<li>command-line programs such as BLAST use flags to specify what to do and how to do it</li>
<li>blast options can be found by typing&nbsp;<cite>blastp -help</cite></li>
<li>break a command up over many lines by using&nbsp;<cite>\</cite> to &ldquo;escape&rdquo; the new line</li>
</ul><p>&nbsp;</p><p>Blastn</p><p>blastn [-h] [-help] [-import_search_strategy filename]<br /> [-export_search_strategy filename] [-task task_name] [-db database_name]<br /> [-dbsize num_letters] [-gilist filename] [-seqidlist filename]<br /> [-negative_gilist filename] [-negative_seqidlist filename]<br /> [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]<br /> [-db_hard_mask filtering_algorithm] [-subject subject_input_file]<br /> [-subject_loc range] [-query input_file] [-out output_file]<br /> [-evalue evalue] [-word_size int_value] [-gapopen open_penalty]<br /> [-gapextend extend_penalty] [-perc_identity float_value]<br /> [-qcov_hsp_perc float_value] [-max_hsps int_value]<br /> [-xdrop_ungap float_value] [-xdrop_gap float_value]<br /> [-xdrop_gap_final float_value] [-searchsp int_value]<br /> [-sum_stats bool_value] [-penalty penalty] [-reward reward] [-no_greedy]<br /> [-min_raw_gapped_score int_value] [-template_type type]<br /> [-template_length int_value] [-dust DUST_options]<br /> [-filtering_db filtering_database]<br /> [-window_masker_taxid window_masker_taxid]<br /> [-window_masker_db window_masker_db] [-soft_masking soft_masking]<br /> [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]<br /> [-best_hit_score_edge float_value] [-window_size int_value]<br /> [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]<br /> [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]<br /> [-outfmt format] [-show_gis] [-num_descriptions int_value]<br /> [-num_alignments int_value] [-line_length line_length] [-html]<br /> [-max_target_seqs num_sequences] [-num_threads int_value] [-remote]<br /> [-version]</p><p>DESCRIPTION<br /> Nucleotide-Nucleotide BLAST 2.7.0+</p></div>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/videolist/watch/12943/a-history-of-bioinformatics-in-the-year-2039</guid>
	<pubDate>Wed, 23 Jul 2014 06:37:51 -0500</pubDate>
	<link>https://bioinformaticsonline.com/videolist/watch/12943/a-history-of-bioinformatics-in-the-year-2039</link>
	<title><![CDATA[A History of Bioinformatics (in the Year 2039)]]></title>
	<description><![CDATA[<iframe width="" height="" src="https://www.youtube-nocookie.com/embed/uwsjwMO-TEA" frameborder="0" allowfullscreen></iframe><p>C. Titus Brown http://video.open-bio.org/video/1/a-history-of-bioinformatics-in-the-year-2039</p>]]></description>
	
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43563/apache-server-setting</guid>
	<pubDate>Fri, 29 Oct 2021 04:29:51 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43563/apache-server-setting</link>
	<title><![CDATA[Apache server setting !]]></title>
	<description><![CDATA[<p>Apache is an open-source web server that&rsquo;s available for Linux servers free of charge.</p>
<p>In this tutorial we&rsquo;ll be going through the steps of setting up an Apache server.</p>
<h3>What you&rsquo;ll learn</h3>
<ul>
<li>How to set up Apache</li>
<li>Some basic Apache configuration</li>
</ul><p>Address of the bookmark: <a href="https://ubuntu.com/tutorials/install-and-configure-apache#3-creating-your-own-website" rel="nofollow">https://ubuntu.com/tutorials/install-and-configure-apache#3-creating-your-own-website</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/11144/scientists-map-17294-proteins-produced-in-human-body</guid>
	<pubDate>Thu, 29 May 2014 01:57:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/11144/scientists-map-17294-proteins-produced-in-human-body</link>
	<title><![CDATA[Scientists map 17,294 proteins produced in human body]]></title>
	<description><![CDATA[<p>Indian scientists missed the genomic profiling bus, but they have more than made up for it by helping create the first human proteome map, which is an extension of the genomic study. Until now, there has been no direct equivalent of the genome map for the human proteome. Recently, however, two groups presented mass spectrometry-based analyses of human tissues, body fluids and cells, mapping the large majority of the human proteome.</p><p>The Indian scientists, working in Bangalore along with their American counterparts, have mapped more than 17,000 proteins in 30 organs of the human body. Just as the human genome was sequenced around the turn of the millennium, this is an equivalent mapping of the human proteome.<br /><br />The researchers estimated that there are around 20,500 proteins in the human body. They have profiled 17,294 of them, which accounts for around 84% of the total. In addition, the team traced around 2,500 of the 3,000 proteins that had been categorised as "missing proteins".</p><p>The work, done by a group of Indian scientists together with Johns Hopkins University, was published in the journal Nature ( http://www.nature.com/nature/journal/v509/n7502/full/nature13302.html ). Of the 72 people who worked on the project, 46 are Indians.</p><p>References:</p><p>http://www.nature.com/nature/journal/v509/n7502/full/nature13302.html</p><p>http://www.proteinatlas.org/ - The antibody-based Human Protein Atlas programme</p><p>http://www.humanproteomemap.org/ - Proteogenomic analysis by identifying translated proteins from annotated pseudogenes, non-coding RNAs and untranslated regions.</p><p>https://www.proteomicsdb.org/ - Assembled protein evidence for 18,097 genes in ProteomicsDB</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>