<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/27321?offset=0</link>
	<atom:link href="https://bioinformaticsonline.com/related/27321?offset=0" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27238/slurm</guid>
	<pubDate>Wed, 04 May 2016 05:13:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27238/slurm</link>
	<title><![CDATA[SLURM]]></title>
	<description><![CDATA[<p><a href="http://www.schedmd.com/">SLURM</a> is a free, open-source workload manager designed specifically to satisfy the demanding needs of high performance computing.</p>
<p>This page is a <em>HOWTO</em> guide for setting up a <a href="http://www.schedmd.com/">SLURM</a> installation, currently focused on a CentOS 7 Linux OS. Please send feedback to Ole.H.Nielsen /at/ fysik.dtu.dk.</p>
<p>See the <a href="http://www.schedmd.com/">SLURM</a> homepage (also <a href="https://computing.llnl.gov/linux/slurm/">https://computing.llnl.gov/linux/slurm/</a>).</p><p>Address of the bookmark: <a href="https://wiki.fysik.dtu.dk/niflheim/SLURM" rel="nofollow">https://wiki.fysik.dtu.dk/niflheim/SLURM</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44762/stay-connected-and-productive-unlock-the-power-of-screen-tmux-and-mosh-for-bioinformatics</guid>
	<pubDate>Wed, 22 Jan 2025 00:29:52 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44762/stay-connected-and-productive-unlock-the-power-of-screen-tmux-and-mosh-for-bioinformatics</link>
	<title><![CDATA[Stay Connected and Productive: Unlock the Power of Screen, Tmux, and Mosh for Bioinformatics]]></title>
	<description><![CDATA[<p>If you are a bioinformatician, chances are you have spent hours running long, complex analyses on remote servers only to lose your session because of an unstable connection. Frustrating, isn't it? Fear not! With tools like <strong>screen</strong>, <strong>tmux</strong>, and <strong>mosh</strong>, you can safeguard your workflow and stay productive, no matter where you are.</p><h4>Why Remote Session Management is a Must-Have</h4><p>In bioinformatics, tasks like genome assembly, RNA-seq analyses, and phylogenetic computations often take hours or days. A dropped SSH connection can result in:</p><ul>
<li><strong>Lost Progress:</strong> Restarting a job from scratch wastes valuable time.</li>
<li><strong>Workflow Interruptions:</strong> Disruptions can derail your focus and productivity.</li>
<li><strong>Corrupted Data:</strong> Interrupted processes may lead to incomplete or corrupted outputs.</li>
</ul><p>By integrating <strong>screen</strong>, <strong>tmux</strong>, or <strong>mosh</strong> into your workflow, you can avoid these setbacks: processes started inside a screen or tmux session keep running on the server after you disconnect.</p><h4>Screen: The Classic Workhorse</h4><p><strong>Screen</strong> is a terminal multiplexer that is available on most Linux systems and often comes pre-installed. It allows you to manage multiple terminal sessions and reconnect to them even after being disconnected.</p><p><strong>Getting Started with Screen:</strong></p><ol>
<li><strong>Start a Session:</strong>
<div>
<div>
<div>
<div>screen</div>
</div>
</div>
</div>
</li>
<li><strong>Detach from a Session:</strong><br />Press <code>Ctrl+A</code>, then <code>D</code>.</li>
<li><strong>Reattach to a Session:</strong>
<div>
<div>
<div>
<div>screen -r</div>
</div>
</div>
</div>
</li>
</ol><p><strong>Pro Tip:</strong> Enhance your screen experience with a customized <code>.screenrc</code> configuration file. Download one here: <a href="https://lnkd.in/es8vhcEH" target="_new">Get .screenrc</a>.</p><h4>Tmux: A Modern Alternative</h4><p><strong>Tmux</strong> takes everything great about screen and adds modern features, including better key bindings and intuitive session management. It's perfect for bioinformaticians who want more control over their workflow.</p><p><strong>Getting Started with Tmux:</strong></p><ol>
<li><strong>Start a Session:</strong>
<div>
<div>
<div>
<div>tmux</div>
</div>
</div>
</div>
</li>
<li><strong>Detach from a Session:</strong><br />Press <code>Ctrl+B</code>, then <code>D</code>.</li>
<li><strong>Reattach to a Session:</strong>
<div>
<div>
<div>
<div>tmux attach</div>
</div>
</div>
</div>
</li>
</ol><p><strong>Customize Your Tmux Experience:</strong><br />Use a <code>.tmux.conf</code> file to personalize your setup. Grab one here: <a href="https://lnkd.in/eZZfxmq7" target="_new">Download .tmux.conf</a>.</p><h4>Mosh: The Mobile Shell for Unreliable Connections</h4><p>SSH works well for stable networks, but it struggles in areas with spotty connectivity. Enter <strong>Mosh</strong>, the Mobile Shell. Designed for intermittent networks, Mosh keeps your session alive even when the connection drops temporarily.</p><p><strong>Why Mosh is a Game-Changer:</strong></p><ul>
<li>Less perceived lag over high-latency networks, because keystrokes are echoed locally.</li>
<li>Automatically reconnects when the network is restored.</li>
<li>Ideal for working on the go, from cafes to trains.</li>
</ul><p><strong>Getting Started with Mosh:</strong></p><ol>
<li><strong>Install Mosh:</strong>
<div>
<div>
<div>
<div>sudo apt install mosh # For Debian/Ubuntu</div>
</div>
</div>
</div>
</li>
<li><strong>Connect to a Server:</strong>
<div>
<div>
<div>
<div>mosh username@server</div>
</div>
</div>
</div>
</li>
</ol><p>Learn more at <a href="https://mosh.org" target="_new">mosh.org</a>.</p><h4>Why This Matters for Bioinformatics</h4><p>Every bioinformatician knows the value of time and data integrity. Tools like screen, tmux, and mosh provide a lifeline when running long analyses, enabling you to:</p><ul>
<li>Safeguard your work against disconnections.</li>
<li>Easily manage multiple workflows in parallel.</li>
<li>Stay productive, even in challenging environments.</li>
</ul><h4>Quickstart Cheat Sheet</h4><ul>
<li>
<p><strong>Screen:</strong></p>
<div>
<div>
<div>
<div>screen # Start a session<br />Ctrl+A, D # Detach<br />screen -r # Reattach</div>
</div>
</div>
</div>
</li>
<li>
<p><strong>Tmux:</strong></p>
<div>
<div>tmux <span># Start a session</span><br />Ctrl+B, D <span># Detach</span><br />tmux attach <span># Reattach</span></div>
</div>
</li>
<li>
<p><strong>Mosh:</strong></p>
<div>
<div>mosh username@server</div>
</div>
</li>
</ul><h4>Final Thoughts</h4><p>As a bioinformatician, your time is too valuable to spend restarting analyses due to technical hiccups. With screen, tmux, and mosh in your toolkit, you can work smarter, protect your progress, and stay productive no matter where you are. Start using these tools today and transform the way you work with remote systems.</p><p>Let me know how these tools work for you, and don't forget to follow for more bioinformatics tips!</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27850/clusterprofiler</guid>
	<pubDate>Thu, 16 Jun 2016 18:57:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27850/clusterprofiler</link>
	<title><![CDATA[clusterProfiler]]></title>
	<description><![CDATA[<p>statistical analysis and visualization of functional profiles for genes and gene clusters<br><br>Bioconductor version: Release (3.3)<br><br>This package implements methods to analyze and visualize functional profiles (GO and KEGG) of gene and gene clusters.<br><br>Author: Guangchuang Yu &lt;guangchuangyu at gmail.com&gt; with contributions from Li-Gen Wang and Giovanni Dall'Olio.<br><br>Maintainer: Guangchuang Yu &lt;guangchuangyu at gmail.com&gt;<br><br>Citation (from within R, enter citation("clusterProfiler")):<br><br>Yu G, Wang L, Han Y and He Q (2012). &ldquo;clusterProfiler: an R package for comparing biological themes among gene clusters.&rdquo; OMICS: A Journal of Integrative Biology, 16(5), pp. 284-287.<br>Installation<br><br>To install this package, start R and enter:<br><br>## try http:// if https:// URLs are not supported<br>source("https://bioconductor.org/biocLite.R")<br>biocLite("clusterProfiler")</p>
<p>https://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html</p><p>Address of the bookmark: <a href="https://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html" rel="nofollow">https://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/31251/bioinformatics-opening-at-icgeb-new-delhi</guid>
  <pubDate>Thu, 02 Mar 2017 04:16:36 -0600</pubDate>
  <link></link>
  <title><![CDATA[Bioinformatics opening at ICGEB NEW DELHI]]></title>
  <description><![CDATA[
<p>ICGEB NEW DELHI</p>

<p>Applications are invited for:</p>

<p>Junior Research Fellow, in a DBT funded project, is available in Translational Health Group, ICGEB, New Delhi</p>

<p>Qualifications:</p>

<p>Education: M.Sc. (preferably in Biotechnology, Life Sciences or Zoology, Chemistry, Bioinformatics). Candidates with hands on experience on GC-MS data acquisition and analysis will be given preference. Bioinformatics expertise required.</p>

<p>Fellowship: As per DBT guidelines.</p>

<p>Tenure: The position is purely on temporary basis with an initial tenure of six months and based on satisfactory performance may continue until the completion of the project.</p>

<p>Closing date for applications: 04/03/2017</p>

<p>Please send a "TWO PAGE" CV by email to:  th.icgeb@gmail.com on or before the last date.</p>

<p>Research Associate, in a DBT funded project, is available in Translational Health Group, ICGEB, New Delhi</p>

<p>Qualifications:</p>

<p>Education: Ph.D. (in Biology, Biotechnology, Chemistry, Bioinformatics). Candidates with hands on experience on GC-MS data acquisition and analysis will be given preference. </p>

<p>Fellowship: As per DBT guidelines.</p>

<p>Tenure: The position is purely on temporary basis with an initial tenure of six months and  based on satisfactory performance may continue until the completion of the project.</p>

<p>Closing date for applications: 04/03/2017</p>

<p>Please send a "TWO PAGE" CV by email to: th.icgeb@gmail.com on or before the last date.</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/10925/a-brief-bioinformatics-tutorial</guid>
	<pubDate>Wed, 21 May 2014 12:50:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/10925/a-brief-bioinformatics-tutorial</link>
	<title><![CDATA[A Brief Bioinformatics Tutorial]]></title>
	<description><![CDATA[<p>This is about how to use a computer to find what is known about a gene of interest and also how to get new insights about it.</p>
<p>The tutorial is divided in three main parts:</p>
<ul>
<li>In the <strong>Sequence </strong>part, you will see how to look efficiently for a particular protein sequence, how to blast it against the database of your choice to find homologues, how to perform a multiple alignment of the homologues you've selected and how to edit this alignment.</li>
<li>The <strong>Structure </strong>part is about molecular visualization, homology modeling and structural domain prediction.</li>
<li>In the <strong>Function </strong>part, you will be introduced to three useful servers for investigating the function of a protein, e.g. finding interactors and co-expressed genes, viewing a phylogenetic profile, and easily accessing papers that cite your gene.</li>
</ul>
<p>During all the three parts, we will use the <em>S. cerevisiae </em>VPS36 protein as an example.</p><p>Address of the bookmark: <a href="http://www.mrc-lmb.cam.ac.uk/rlw/text/bioinfo_tuto/introduction.html" rel="nofollow">http://www.mrc-lmb.cam.ac.uk/rlw/text/bioinfo_tuto/introduction.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/21443/a-guide-for-complete-r-beginners-getting-data-into-r</guid>
	<pubDate>Tue, 24 Feb 2015 20:15:08 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/21443/a-guide-for-complete-r-beginners-getting-data-into-r</link>
	<title><![CDATA[A guide for complete R beginners :- Getting data into R]]></title>
	<description><![CDATA[<p>For a beginner this can be the hardest part; it is also the most important to get right.</p><p>It is possible to create a vector by typing data directly into R using the combine function 'c'</p><blockquote><p><strong>x </strong></p></blockquote><p>same as</p><blockquote><p><strong>x </strong></p></blockquote><p>creates the vector x with the numbers between 1 and 5.</p><p>You can see what is in an object at any time by typing its name;</p><blockquote><p><strong>x</strong></p></blockquote><p>will produce the output<strong> '[1] 1 2 3 4 5'</strong></p><p>Note that names need to be quoted</p><blockquote><p><strong>daysofweek &lt;- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')</strong></p></blockquote><p>Usually, however, you want to input from a file. We have touched on the 'read.table' function already.</p><blockquote><p><strong>mydata </strong></p></blockquote><p>Now <strong>mydata</strong> is a data frame with multiple vectors</p><p>each vector can be identified by the default syntax</p><p>#if any of these are typed it will print to screen</p><blockquote><p><strong>mydata$V1 mydata$V2 mydata$V3 </strong></p></blockquote><p>By default the function assumes certain things from the file</p><ul>
<li>The file is a plain text file (there are functions to read Excel files: <em>not covered here</em>)</li>
<li>columns are separated by any number of tabs or spaces</li>
<li>there is the same number of data points in each column</li>
<li>there is no header row (labels for the columns)</li>
<li>there is no column with names for the rows** [I&rsquo;ll explain].</li>
</ul><p><span style="text-decoration: underline;">If any of these are false, we need to tell that to the function</span></p><p>If it has a header row</p><blockquote><p><strong>mydata <em>header=T also works</em></strong></p></blockquote><p>Note that there is a comma between the different arguments of the function</p><p>If there is one less column in the header row, then R assumes that the 1<sup>st</sup> column of data after the header contains the row names</p><p>Now the vectors (columns) are identified by their name</p><p>#if any of these are typed it will print to screen</p><blockquote><p><strong>mydata$A mydata$B mydata$C </strong></p></blockquote><p># Summary about the whole data frame</p><blockquote><p><strong>summary(mydata)</strong></p></blockquote><p># Summary information of column A</p><blockquote><p><strong>summary(mydata$A) </strong></p></blockquote><p>We can shortcut having to type the data frame each time by attaching it</p><blockquote><p><strong>attach(mydata)</strong></p></blockquote><p># summary of column B as 'mydata' is attached</p><blockquote><p><strong>summary(B)</strong></p></blockquote><p><span style="text-decoration: underline;">Two other important options for </span><em><span style="text-decoration: underline;">read.table</span></em></p><p>If it is separated only by tabs and has a header</p><blockquote><p><strong>mydata </strong></p></blockquote><p>Really useful if you have spaces in the contents of some columns, so R does not mess up reading the columns.
However, if the columns are of uneven length, it will tell you.</p><p>If you know that the file has uneven columns</p><blockquote><p><strong>mydata </strong></p></blockquote><p>This causes R to fill empty spaces in a column with 'NA'.</p><p>The last two examples will still work with our file and give the same result as with only header=T</p><p><span style="text-decoration: underline;">Graphs</span></p><p>To get an idea of what R is capable of, type</p><blockquote><p><strong>demo(graphics)</strong></p></blockquote><p>This steps through the examples, and the code is printed to the screen</p><p>We will work with simpler examples that have immediate use to biologists.</p><p>Remember, to get more information about the options to a function, type '?function'</p><p><span style="text-decoration: underline;">Histogram of A</span><span style="text-decoration: underline;"></span></p><blockquote><p><strong>hist(mydata$A)</strong></p></blockquote><p>If there was more data we could increase the number of vertical columns with the option, breaks=50 (or another relevant number).</p><blockquote><p><strong>boxplot(mydata)</strong></p></blockquote><p>We can get rid of the need to type the data frame each time by using the <strong>attach</strong> function</p><p># if not already done so</p><blockquote><p><strong>attach(mydata) </strong></p><p><strong>boxplot(mydata$A, mydata$B, names=c("Value A", "Value B"), ylab="Count of Something")</strong></p></blockquote><p>same as</p><blockquote><p><strong>boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")</strong></p></blockquote><p><span style="text-decoration: underline;">Scatter plot</span></p><p># if not already done so</p><blockquote><p><strong>attach(mydata) </strong></p><p><strong>plot(A,B) # or plot(mydata$A, mydata$B)</strong></p></blockquote><p><strong><span style="text-decoration: underline;">SAVING an 
image</span></strong></p><p>Windows users (Rgui): right-click on the image and select the format you want.</p><p><span style="text-decoration: underline;">These instructions work for everyone.</span></p><p>You need to create a new device of the type of file you need, then send the data to that device</p><p>To save as a png file (easy to load into the likes of PowerPoint, also great for web applications):</p><blockquote><p><strong>png('filename') </strong></p><p><strong>boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")</strong></p></blockquote><p>or to save as a pdf</p><blockquote><p><strong>pdf('filename') </strong></p><p><strong>boxplot(A, B, names=c("Value A", "Value B"), ylab="Count of Something")</strong></p></blockquote><p><span style="text-decoration: underline;">Note</span></p><ul>
<li>Nothing will appear on screen; the output is going to the file</li>
<li>Also, it may not be saved immediately, but it will be once the device is closed (or R is quit).</li>
</ul><p>To quit R type</p><p><strong>q() # </strong>If you save your session, next time you start R, you will have your data preloaded.</p><p>Or if you want to remain in R</p><blockquote><pre><strong>dev.off() #</strong>turns off the png (or pdf etc) device, thus forcing the data to be saved</pre></blockquote>]]></description>
	<dc:creator>Archana Malhotra</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/32496/bioinformatician-at-23andme</guid>
  <pubDate>Sat, 06 May 2017 17:57:39 -0500</pubDate>
  <link></link>
  <title><![CDATA[Bioinformatician at 23andMe]]></title>
  <description><![CDATA[
<p>23andMe’s mission is to help people access, understand, and benefit<br />from the human genome. We are a group of passionate individuals excited<br />to push the boundaries of what’s possible to help turn genetic insight<br />into better health and personal understanding.</p>

<p>Our Research Team prides itself on driving cutting edge, industrial-scale<br />science to make an impact that belies the team’s size, in an environment<br />and culture that fosters creativity, innovation, collaboration, and fun.</p>

<p>More than 80% of our customers consent to participate in research, and as<br />a result of their participation, we have one of the largest recontactable,<br />genotyped, and phenotyped research cohorts in the world. The scope and<br />breadth of our vision means that most of the methods and tools necessary<br />to unlock the potential of this unique resource for discovery have yet<br />to be developed.</p>

<p>Our science has garnered the respect of many members of the<br />broader scientific community. For a list of our publications, see<br />www.23andme.com/publications/for-scientists/.</p>

<p>Join us! Visit our Careers page (www.23andMe.com/careers) to learn more<br />about these open positions:</p>

<p>•	Scientist, Research Communications<br />•	Bioinformaticist<br />•	Computational Biologist, Ancestry R&amp;D<br />•	Scientist/Senior Scientist, Statistical Genetics<br />•	Scientist/Senior Scientist, Survey Methodology<br />•	Scientist/Senior Scientist, Health R&amp;D<br />•	Senior Computational Biologist<br />•	Biostatistician</p>

<p>pfontanillas@23andme.com</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/42670/icgeb-bioinformatics-job</guid>
  <pubDate>Sat, 23 Jan 2021 21:01:55 -0600</pubDate>
  <link></link>
  <title><![CDATA[ICGEB Bioinformatics Job]]></title>
  <description><![CDATA[
<p>The following vacancies are available in the various ongoing bioinformatics projects at the<br />Translational Bioinformatics Group (https://www.icgeb.org/dinesh-gupta/), ICGEB, New Delhi, India. Shortlisted candidates will be welcomed for an on-line interview at ICGEB. Only the chosen applicants will be informed individually. Preference will be given to the applicants with experience related to Bioinformatics as well as Computational area.</p>

<p>Interested applicants must submit their complete, updated Curriculum Vitae, mentioning details of two references as well as various other details at – http://14.139.62.220/survey/index.php/2021/01/21/icgeb-dbt-project-vacancy/</p>

<p>The last date of submission of applications is January 31st, 2021.</p>

<p>Research Associate: Ph.D. degree in Computational Biology/Bioinformatics.</p>

<p>Consolidated Salary: 58280/- pm (including HRA).</p>

<p>More at https://www.icgeb.org/project-positions-translational-bioinformatics-group/ and https://www.icgeb.org/category/vacancies/</p>
]]></description>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/43292/bioinformatics-scientist-production-bioinformatics-south-san-francisco-ca</guid>
  <pubDate>Thu, 19 Aug 2021 08:45:24 -0500</pubDate>
  <link></link>
  <title><![CDATA[Bioinformatics Scientist, Production Bioinformatics @ South San Francisco, CA]]></title>
  <description><![CDATA[
<p>Twist is looking for a Bioinformatics Scientist to join our Production Bioinformatics Team. You will work alongside research scientists, software engineers and data scientists to further deliver on our mission to expand access to best-in-class synthetic biology and next-generation sequencing applications. You will be developing and engineering tools to better evaluate and build hardened, production quality pipelines, optimize data quality, and automate lab and bioinformatics processes. Our ideal candidate is an organized problem solver with a background in developing and building novel production-quality bioinformatics tools and packages. Equally excellent communication skills and a proven ability to work independently are required.</p>

<p>More at https://boards.greenhouse.io/twistbioscience/jobs/3135495?gh_src=9ecc0b941us</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44720/a-beginners-guide-to-using-kraken-for-taxonomic-classification</guid>
	<pubDate>Fri, 13 Dec 2024 11:29:03 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44720/a-beginners-guide-to-using-kraken-for-taxonomic-classification</link>
	<title><![CDATA[A Beginner's Guide to Using Kraken for Taxonomic Classification]]></title>
	<description><![CDATA[<div>Kraken is a popular bioinformatics tool designed for fast and accurate taxonomic classification of metagenomic sequences. Its efficiency and precision make it a go-to resource for analyzing microbial communities, including bacteria, viruses, archaea, and fungi. Whether you're new to bioinformatics or experienced in the field, Kraken is an indispensable tool for taxonomic analysis.</div><div><div><div><div dir="auto"><div><div><p>In this blog, we&rsquo;ll walk through the basics of Kraken, from installation to running an analysis, and highlight its key features and applications.</p><h4><strong>What is Kraken?</strong></h4><p>Kraken is a sequence classification tool that assigns taxonomic labels to DNA sequences using exact k-mer matching. It uses a reference database of genomes, dividing sequences into k-mers and identifying matches in a computationally efficient way.</p><h4><strong>Key Features of Kraken</strong></h4><ul>
<li><strong>Speed</strong>: Kraken processes data much faster than alignment-based methods.</li>
<li><strong>Accuracy</strong>: It uses a precise k-mer matching algorithm for high-resolution taxonomic assignments.</li>
<li><strong>Scalability</strong>: It can handle large metagenomic datasets.</li>
<li><strong>Custom Databases</strong>: You can build and use custom databases tailored to your research needs.</li>
</ul><h4><strong>Installing Kraken</strong></h4><ol>
<li>
<p><strong>System Requirements</strong></p>
<ul>
<li>A Unix-based operating system (Linux/macOS).</li>
<li>Sufficient computational resources for database building (RAM and disk space).</li>
</ul>
</li>
<li>
<p><strong>Installation Steps</strong></p>
<ul>
<li>Clone the Kraken repository from GitHub:
<div>
<div>&nbsp;</div>
<div dir="ltr"><code>git <span style="font-size: 12.8px; font-weight: normal;">clone</span> https://github.com/DerrickWood/kraken.git<br /><span style="font-size: 12.8px; font-weight: normal;">cd</span> kraken </code></div>
</div>
</li>
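<li>Build and install Kraken with the bundled install script, giving the directory where the programs should go (a placeholder here):
<div>
<div>&nbsp;</div>
<div dir="ltr"><code>./install_kraken.sh /path/to/kraken </code></div>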
</div>
</li>
<li>Add Kraken to your PATH for easy access:
<div>
<div>&nbsp;</div>
<div dir="ltr"><code><span style="font-size: 12.8px; font-weight: normal;">export</span> PATH=<span style="font-size: 12.8px; font-weight: normal;">$PATH</span>:/path/to/kraken </code></div>
</div>
</li>
</ul>
</li>
</ol><h4><strong>Preparing a Database</strong></h4><p>Kraken requires a database of reference genomes. You can use a pre-built database or create a custom one.</p><ol>
<li>
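<p><strong>Downloading a Pre-built Database</strong><br />Kraken offers pre-built databases, such as the <em>MiniKraken</em> database, which is lightweight and suitable for smaller datasets. MiniKraken is distributed as a compressed archive from the Kraken project website rather than through <code>kraken-build --download-library</code>; after downloading, extract it (the archive name below is a placeholder) and pass the extracted directory to <code>--db</code>:</p>
<div>
<div dir="ltr"><code>tar -xzf minikraken.tgz </code></div>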
</div>
</li>
<li>
<p><strong>Building a Custom Database</strong><br />Download the NCBI taxonomy and a reference library (here, bacteria), then build the database:</p>
<div>
<div dir="ltr"><code>kraken-build --download-taxonomy --db my_database<br />kraken-build --download-library bacteria --db my_database<br />kraken-build --build --threads 4 --db my_database </code></div>
</div>
<p>This process may take considerable time and resources, depending on the size of the database.</p>
</li>
</ol><h4><strong>Running Kraken</strong></h4><p>Once the database is ready, you can classify sequences.</p><ol>
<li>
<p><strong>Basic Usage</strong><br />Use the following command to classify sequences:</p>
<div>
<div dir="ltr"><code>kraken --db my_database --threads 4 --fastq-input input_sequences.fastq --output kraken_output.txt </code></div>
</div>
<p>Key options:</p>
<ul>
<li><code>--db</code>: Specifies the database.</li>
<li><code>--threads</code>: Number of threads for parallel processing.</li>
<li><code>--fastq-input</code>: Declares that the input is FASTQ; without it, Kraken expects FASTA (<code>--fasta-input</code>).</li>
</ul>
</li>
<li>
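<p><strong>Interpreting Results</strong><br />Kraken writes one tab-separated line per sequence: a classified/unclassified flag (<code>C</code> or <code>U</code>), the sequence ID, the assigned taxonomy ID (0 if unclassified), the sequence length in bp, and a space-separated list of the LCA mappings of its k-mers.</p>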
</li>
</ol><h4><strong>Visualizing Kraken Results</strong></h4><p>Kraken results can be visualized using tools like <strong>Krona</strong> or converted to human-readable reports using <code>kraken-report</code>.</p><ol>
<li>
<p><strong>Generate a Report</strong></p>
<div>
<div dir="ltr"><code>kraken-report --db my_database kraken_output.txt &gt; kraken_report.txt </code></div>
</div>
</li>
<li>
<p><strong>Krona Visualization</strong><br />Install Krona and convert Kraken output for visualization:</p>
<div>
<div dir="ltr"><code>cut -f2,3 kraken_output.txt &gt; kraken_krona.txt<br />ktImportTaxonomy kraken_krona.txt -o krona_output.html </code></div>
</div>
<p>Open the HTML file in your browser to interactively explore the taxonomic classifications.</p>
</li>
</ol><h4><strong>Advanced Usage</strong></h4><ol>
<li>
<p><strong>Confidence Thresholds</strong><br />Kraken 1 applies confidence thresholds after classification with the separate <code>kraken-filter</code> script and its <code>--threshold</code> option (the <code>--confidence</code> flag belongs to Kraken 2). Higher values reduce false positives but may miss some true positives:</p>
<div>
<div dir="ltr"><code>kraken-filter --db my_database --threshold 0.1 kraken_output.txt &gt; kraken_filtered.txt </code></div>
</div>
</li>
<li>
<p><strong>Paired-End Reads</strong><br />For paired-end sequencing data, use:</p>
<div>
<div dir="ltr"><code>kraken --db my_database --fastq-input --paired reads_1.fastq reads_2.fastq </code></div>
</div>
</li>
<li>
<p><strong>Customizing K-mers</strong><br />Kraken allows you to set custom k-mer lengths during database building for specific applications.</p>
</li>
</ol><h4><strong>Applications of Kraken</strong></h4><ul>
<li><strong>Microbial Ecology</strong>: Characterizing microbial communities in soil, water, and the human microbiome.</li>
<li><strong>Pathogen Detection</strong>: Identifying pathogens in clinical samples.</li>
<li><strong>Fungal Research</strong>: Analyzing fungal diversity in metagenomic datasets.</li>
<li><strong>Environmental Monitoring</strong>: Tracking microbial populations in diverse habitats.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Kraken is a versatile and efficient tool for taxonomic classification in metagenomics. Its speed, accuracy, and flexibility make it a favorite among bioinformaticians. By following this guide, you can set up and use Kraken to unlock insights into microbial and fungal communities, paving the way for discoveries in ecology, medicine, and biotechnology.</p></div></div></div></div></div></div>]]></description>
	<dc:creator>Neel</dc:creator>
</item>

</channel>
</rss>