<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/30696?</link>
	<atom:link href="https://bioinformaticsonline.com/related/30696?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl !]]></title>
	<description><![CDATA[<p>Here is a small tutorial on how to make best use of multiple processors for bioinformatics analysis. One best way is using perl threads and forks. Knowing how these threads and forks work is very important before implementing them. Getting to know how these work would be really useful before reading this tutorial.</p><p>Many times in bioinformatics we need to deal with huge datasets which&nbsp; are more than 100GB size. The traditional way to analysis a file is using the while loop</p><p>while (FILE){</p><p>Do something;</p><p>}</p><p>This is very slow(since we are using only one processor) and if we have 500 million lines in the dataset it takes more than a day to iterate through the whole dataset. So how do we make best use of all our processors and get the work done quickly?</p><p>Here is a very simple and efficient technique with perl which i have been using. I am&nbsp; more inclined towards using perl fork than perl threads.</p><p>One of the oldest way to fork is</p><blockquote><p>my $fork = fork();<br />if($fork){&nbsp;&nbsp;&nbsp;<br />push (@childs,$fork);&nbsp;<br />}<br />elseif($fork==0){<br /><strong>your code here;</strong><br />exit(0);<br />}<br />else{die &ldquo;Couldnt fork : $!&rdquo;;}</p><p>## wait for the child process to finish<br />foreach(@childs){<br />my $tmp=waitid($_,0);<br />}</p></blockquote><p>what a fork does is it creates a child process and takes the variables and code with it to analyze it separately (detached from the parent process) and thus a separate process is created( which usually runs on a separate processor). Thats it!! One big disadvantage of forking is its very difficult to share variables among the different processes. I will show you how to do it easily but still it has its own drawbacks.</p><blockquote><p>Okie, now if you really do not want to use fork in your code, that&rsquo;s okie too..There are many useful modules which do it for you very efficiently. One really useful module is Parallel::ForkManager. You can use Parallel::ForkManager to manage the number of forks you want to generate (number of processors you want to use).</p><p><strong>Simple usage:</strong><br />use Parallel::ForkManager;<br />my $max_processors=8;<br />my $fork= new Parallel::ForkManager($max_processors);<br />foreach (@dna) {<br />$fork-&gt;start and next; # do the fork<br /><strong>you code here;</strong><br />$fork-&gt;finish; # do the exit in the child process<br />}<br />$pm-&gt;wait_all_children;</p></blockquote><p>so you will be generating 8 forks which do the same thing for your each element of array. when one child finishes, Parallel::ForkManager generates a new one and thus you will be using all your processors to analyze the data. Now, if you have generated 8 child processes and want to write the data to one file. You need to lock the file to do this, because you will have problems with the buffering. You can lock the file using flock command.</p><blockquote><p>open (my $QUAL, &ldquo;myfile.txt&rdquo;);<br />flock $QUAL, LOCK_EX or die &ldquo;cant lock file $!&rdquo;;<br />print $QUAL &ldquo;$output&rdquo;;<br />flock $QUAL, LOCK_UN or die &ldquo;$!&rdquo;;<br />close $QUAL;</p></blockquote><p>I would not suggest using flock when dealing with multiple processes because it will decrease the processing efficiency( each child process must wait for the lock to be released by the other child process). Instead, I would suggest each fork writing to a separate file and after the processing just concatenating them.</p><p><strong>Putting it all together, If you have 100GB data you can do this</strong></p><blockquote><p><strong>step 1</strong>&nbsp;: split the dataset equally according to number of processors you have. this may take a few hours(about 2-3 hrs for 100GB file)<br />You can use unix &ldquo;split&rdquo; command for this<br />for example:<br />my $number_split=int($number_of_entries_in_your_dataset/$max_processors);<br />my $split_Files=`split -l $number_split &ldquo;your_file.fasta&rdquo; &ldquo;file_name&rdquo;`;</p><p><strong>step2</strong>: open you directory comtaining you split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />opendir(DIRECTORY, $split_files_directory) or die $!; ### open the directory<br />my $fork= new Parallel::ForkManager($max_processors);<br />while (my $file = readdir(DIRECTORY)) { ### read the directory<br />if($file=~/^\./){next;}<br />print $file,&rdquo;\n&rdquo;;<br />########## Start fork ##########<br />my $pid= $super_fork-&gt;start and next;<br /><strong>Whatever you want to do with the split file ;</strong><br /><strong>analyze my piece of $file;</strong><br />######### end fork ###############<br />$super_fork-&gt;finish;<br />}<br />$super_fork-&gt;wait_all_children;</p></blockquote><p>So basically each processor will be active with its piece of data (split file) and thus you have created 8 processes at one time which run without interfering with the other process. I again will not suggest writing output from each child process to one file(for reasons above). Write output from each fork to a separate file and finally concatenate them. Thats it, you have just increased your program speed by 8 times!! Isnt it easy?</p><p><strong>Note:</strong><br />You may worry about concatenation of the output each child generates, since it does take some time(remember 100GB). I think now you can use a mysql database LOAD DATA LOCAL INFILE command to load all the files into a single table(Should take about 3hrs for 100Gb dataset) and then export the whole table into one file. This should be faster than just concatenating them using &ldquo;cat&rdquo; command.(correct me if I am wrong)</p><p>Or much simpler way is to use pipes</p><p>cat output_dir/* | my_pipe or my_pipe &lt;(file1) final_file;</p><p>Thats it guys!! Enjoy programming and please do comment. I am not a computer scientist so forgive me for any mistakes and if any please report them. Thank you.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/22567/rosalind-problem-solution-with-perl</guid>
	<pubDate>Tue, 09 Jun 2015 23:35:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/22567/rosalind-problem-solution-with-perl</link>
	<title><![CDATA[Rosalind Problem Solution with Perl]]></title>
	<description><![CDATA[<p>Rosalind is a platform for learning bioinformatics and programming through problem solving. <a href="http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track">Take a tour</a> to get the hang of how Rosalind works.</p><p>Bioinformatics Textbook Track</p><p>Find more about Rosalind puzzle at http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track</p><p>I will provide solution of all the Rosalind problem with Perl for community.</p><p>Check out the right sidebar for more links ...</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27463/bpipe-a-tool-for-running-and-managing-bioinformatics-pipelines</guid>
	<pubDate>Sat, 21 May 2016 22:42:16 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27463/bpipe-a-tool-for-running-and-managing-bioinformatics-pipelines</link>
	<title><![CDATA[Bpipe - a tool for running and managing bioinformatics pipelines]]></title>
	<description><![CDATA[<p>Bpipe provides a platform for running big bioinformatics jobs that consist of a series of processing stages - known as 'pipelines'.</p>
<ul>
<li>January 20th, 2016 - New! Bpipe 0.9.9 released!</li>
<li>Download <a href="http://download.bpipe.org/versions/bpipe-0.9.9.tar.gz">latest</a>, <a href="http://download.bpipe.org">all</a></li>
<li><a href="http://docs.bpipe.org">Documentation</a></li>
<li><a href="https://groups.google.com/forum/#%21forum/bpipe-discuss">Mailing List</a> (Google Group)</li>
</ul>
<p>Bpipe has been published in <a href="http://bioinformatics.oxfordjournals.org/content/early/2012/04/11/bioinformatics.bts167.abstract">Bioinformatics</a>! If you use Bpipe, please cite:</p>
<p><em>Sadedin S, Pope B &amp; Oshlack A, Bpipe: A Tool for Running and Managing Bioinformatics Pipelines, Bioinformatics</em></p><p>Address of the bookmark: <a href="http://docs.bpipe.org/" rel="nofollow">http://docs.bpipe.org/</a></p>]]></description>
	<dc:creator>Radha Agarkar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/396/bioinformatics-introduction-to-perl</guid>
	<pubDate>Thu, 11 Jul 2013 09:49:37 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/396/bioinformatics-introduction-to-perl</link>
	<title><![CDATA[Bioinformatics: Introduction to PERL]]></title>
	<description><![CDATA[<p>This course is aimed at those new to programming and provides an introduction to programming using <strong>Perl</strong>. By the end of this course, attendees should be able to write simple <strong>Perl</strong> programs and to understand more complex <strong>Perl</strong> programs written by others. The course will be taught using the online <a href="http://sofiarobb.com/learning-perl-toc/" title="http://sofiarobb.com/learning-perl-toc/">Learning Perl</a> materials created by <a href="http://stajich.bioinformatics.ucr.edu/members/sofia-robb" title="http://stajich.bioinformatics.ucr.edu/members/sofia-robb">Sofia Robb</a> of the <a href="http://www.ucr.edu/" title="http://www.ucr.edu/">University of California Riverside</a>. Further information is <a href="http://ruddles.bio.cam.ac.uk/%7Edpjudge/Descriptions/PERL.php" title="http://ruddles.bio.cam.ac.uk/~dpjudge/Descriptions/PERL.php">available</a>.</p>]]></description>
	<dc:creator>Archana Malhotra</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/43292/bioinformatics-scientist-production-bioinformatics-south-san-francisco-ca</guid>
  <pubDate>Thu, 19 Aug 2021 08:45:24 -0500</pubDate>
  <link></link>
  <title><![CDATA[Bioinformatics Scientist, Production Bioinformatics @ South San Francisco, CA]]></title>
  <description><![CDATA[
<p>wist is looking for a Bioinformatics Scientist to join our Production Bioinformatics Team. You will work alongside research scientists, software engineers and data scientists to further deliver on our mission to expand access to best-in-class synthetic biology and next-generation sequencing applications. You will be developing and engineering tools to better evaluate and build hardened, production quality pipelines, optimize data quality, and automate lab and bioinformatics processes. Our ideal candidate is an organized problem solver with a background in developing and building novel production-quality bioinformatics tools and packages. Equally excellent communication skills and a proven ability to work independently are required.</p>

<p>More at https://boards.greenhouse.io/twistbioscience/jobs/3135495?gh_src=9ecc0b941us</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/20331/type-hinting</guid>
	<pubDate>Fri, 09 Jan 2015 22:26:13 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/20331/type-hinting</link>
	<title><![CDATA[Type Hinting]]></title>
	<description><![CDATA[<p>Python creator Guido van Rossum&rsquo;s proposal for static type-checking annotations is inching closer to reality, and the feature has taken on a new name: type hinting.</p><p><img src="http://sdtimes.com/wp-content/uploads/2015/01/0107.sdt-python-typehinting.png" alt="image" width="619" height="219" style="border: 0px; border: 0px;"></p><p>Back in August, van Rossum published a proposal on the Python mailing list recommending type-checking annotations as a valuable feature for the next version of Python to improve the performance of editors and IDEs, linter capabilities, standard notation, and refactoring. Van Rossum&rsquo;s <a href="http://lwn.net/Articles/627558/">latest proposal</a>, posted late last month, outlined plans to publish a Python Enhancement Proposal (PEP) in early January to put the feature now known as type hinting on track for inclusion in Python 3.5, slated for release this September.</p><p>Reference</p><p>https://quip.com/r69HA9GhGa7J</p>]]></description>
	<dc:creator>Pranjali Yadav</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/22571/pattern-matching-problem-solution-with-perl</guid>
	<pubDate>Tue, 09 Jun 2015 23:58:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/22571/pattern-matching-problem-solution-with-perl</link>
	<title><![CDATA[Pattern Matching Problem Solution with Perl]]></title>
	<description><![CDATA[<p>Problem at http://rosalind.info/problems/1c/</p><p>#Find all occurrences of a pattern in a string.<br />#Given: Strings Pattern and Genome.<br />#Return: All starting positions in Genome where Pattern appears as a substring. Use 0-based indexing.<br /><br />use strict;<br />use warnings;<br /><br />my $string="GATATATGCATATACTT";<br />my $subStr="ATAT";<br />my $kmer=length($subStr);<br /><br />kmerMatch ($string, $subStr, $kmer);<br /><br />sub kmerMatch { #Check the exact matching kmers with sliding window<br />my ($string, $myStr, $kmer)=@_;<br />for (my $aa=0; $aa&lt;=(length($string)-$kmer); $aa++) {<br />&nbsp;&nbsp;&nbsp; my $myWin=substr&nbsp; $string, $aa,$kmer;<br />&nbsp;&nbsp;&nbsp; if ($myWin eq $myStr) {<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #print "$myWin eq $myStr\n";<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print $aa;<br />&nbsp;&nbsp;&nbsp; }<br />}<br />}</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/22961/bioscripts</guid>
	<pubDate>Sun, 28 Jun 2015 07:46:14 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/22961/bioscripts</link>
	<title><![CDATA[BioScripts]]></title>
	<description><![CDATA[<p>You are requested to please bookmark collection of bioinformatics tools, scripts, codes that can be pieced together in a very easy and flexible manner to perform both simple and complex bioinformatics tasks.</p>
<p>The next-generation sequencing included whole genome sequencing(WGS), transcriptome sequencing (whole cDNA sequencing, RNA-seq), digital gene expression sequencing (Tag-Seq), ChIP-Seq, and so on. And there are many sequencing platform to generate sequece, as well know Sanger/ABi(the frist generation), Solexa/illumina, SOLiD/ABi, 454/Roche. But thier sequence format is different, also they have different error type. High quality data is very important for further analysis or data mining. There are many pipeline for raw sequence quality analysis and control with few of process for reporting reads quality statistical details, trimming, filtering, and error correction. Please bookmarks them for the benefits of bioinformatics community.</p>
<p>https://code.google.com/p/biowiki/</p>
<p>https://code.google.com/p/ngs-pipeline/source/browse/#svn%2Ftrunk</p>
<p>NGSand Perl scripts https://code.google.com/hosting/search?q=NGS+perl&amp;projectsearch=Search+projects</p>
<p>NGS and Python scripts https://code.google.com/hosting/search?q=NGS+Python&amp;projectsearch=Search+projects</p><p>Address of the bookmark: <a href="https://code.google.com/hosting/search?q=bioinformatics&amp;sa=Search" rel="nofollow">https://code.google.com/hosting/search?q=bioinformatics&amp;sa=Search</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/26424/biotoolbox</guid>
	<pubDate>Fri, 19 Feb 2016 09:14:44 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/26424/biotoolbox</link>
	<title><![CDATA[BioToolbox]]></title>
	<description><![CDATA[<p>This is a collection of libraries and high-quality end-user scripts for bioinformatic analysis, including working with gene annotation, collecting data scores from a variety of modern file formats, and conversion between file formats. The Bio::ToolBox libraries provide a unified, abstracted interface to multiple common gene annotation formats and the collection of data from multiple data files. They rely on BioPerl SeqFeature libraries and related adaptors to access binary file formats including Bam, BigWig, BigBed, and USeq. The Bio::ToolBox package includes scripts for setting up databases of annotation, collecting annotated features, collecting genomic data relative to features, manipulating and analyzing data, and data format conversion.</p>
<p>More at http://cpansearch.perl.org/src/TJPARNELL/</p><p>Address of the bookmark: <a href="http://cpansearch.perl.org/src/TJPARNELL/" rel="nofollow">http://cpansearch.perl.org/src/TJPARNELL/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/29479/how-to-install-perl-modules-on-mac-os-x-in-easy-steps</guid>
	<pubDate>Thu, 20 Oct 2016 07:26:29 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/29479/how-to-install-perl-modules-on-mac-os-x-in-easy-steps</link>
	<title><![CDATA[How to install Perl modules on Mac OS X in easy steps !!]]></title>
	<description><![CDATA[<p>Today at work, I learned how to install Perl modules using&nbsp;<a href="http://en.wikipedia.org/wiki/CPAN">CPAN</a>. It&rsquo;s a lot easier than I thought.</p><p>You see, for the past couple of years, I&rsquo;ve been a bit frustrated because OS X does not come with a whole lot of Perl modules pre-installed, and for all I googled, I couldn&rsquo;t find an &ldquo;idiot&rsquo;s&rdquo; guide for moderately-savvy-but-not-expert users like myself to install modules and dependencies on demand.</p><p>The only instructions I could find point to&nbsp;<a href="http://fink.sourceforge.net/">Fink</a>, which basically installs modules in a path that isn&rsquo;t included in the Perl @INC variable, meaning you have to manually specify the full path to the modules in every script &mdash; which is not a lot of fun if you&rsquo;re developing on OS X and deploying on Red Hat, for instance.</p><p>Moreover, Fink doesn&rsquo;t seem to make every module available, and it&rsquo;s not very easy to determine which Fink package you need to install if you need a particular module.</p><p>So, with a script that called on several apparently unavailable modules, and a deadline looming, I finally decided to suck it up and figure out how to use CPAN to install them:</p><h4>1) Make sure you have the Apple Developer Tools (XCode) installed.</h4><p>These are on one of your install discs, or available as a huge but free download from the&nbsp;<a href="https://developer.apple.com/xcode/">Apple Developer Connection</a>&nbsp;[free registration required] or the Mac App Store. I thought I had them, but apparently when we upgraded that computer to Tiger, they went missing.</p><p>If you don&rsquo;t have this stuff installed, your installation will fail with errors about unavailable commands.</p><h4>1.5) Install Command Line Tools (Recent XCode versions only)</h4><p>(Thank you to Tom Marchioro for informing me about this step.)</p><p>Older versions of XCode installed the command line tools (which are required to properly install CPAN modules) by default, but apparently newer ones do not. To check whether you have the command line tools already installed, run the following from the Terminal:</p><p><code>$ which make</code></p><p>This command checks the system for the &ldquo;<code>make</code>&rdquo; tool. If it spits out something like&nbsp;<code>/usr/bin/make</code>&nbsp;you&rsquo;re golden and can skip ahead to Step 2. If you just get a new prompt and no output, you&rsquo;ll need to install the tools:</p><ol>
<li>Launch XCode and bring up the Preferences panel.</li>
<li>Click on the Downloads tab</li>
<li>Click to install the Command Line Tools</li>
</ol><p>If you like, you can run&nbsp;<code>which make</code>&nbsp;again to confirm that everything&rsquo;s installed correctly.</p><h4>2) Configure CPAN.</h4><p><code>$ sudo perl -MCPAN -e shell</code></p><p><code>perl&gt; o conf init</code></p><p>This will prompt you for some settings. You can accept the defaults for almost everything (just hit &ldquo;return&rdquo;). The two things you must fill in are the path to&nbsp;<code>make</code>&nbsp;(which should be&nbsp;<code>/usr/bin/make</code>&nbsp;or the value returned when you run&nbsp;<code>which make</code>&nbsp;from the command line) and your choice of CPAN mirrors (which you actually choose don&rsquo;t really matter, but it won&rsquo;t let you finish until you select at least one). If you use a proxy or a very restrictive firewall, you may have to configure those settings as well.</p><p>If you skip Step 2, you may get errors about&nbsp;<code>make</code>&nbsp;being unavailable.</p><h4>3) Upgrade CPAN</h4><p><code>$ sudo perl -MCPAN -e 'install Bundle::CPAN'</code></p><p>Don&rsquo;t forget the&nbsp;<code>sudo</code>, or it&rsquo;ll fail with permissions errors, probably when doing something relatively unimportant like installing&nbsp;<code>man</code>&nbsp;files.</p><p>This will spend a long time downloading, testing, and compiling various files and dependencies. Bear with it. It will prompt you a few times about dependencies. You probably want to enter &ldquo;yes&rdquo;. I agreed to everything it asked me, and everything turned out fine. YMMV of course. If everything installs properly, it&rsquo;ll give you an &ldquo;OK&rdquo; at the end.</p><h4>4) Install your modules. For each module&hellip;.</h4><p><code>$ sudo perl -MCPAN -e 'install Bundle::Name'</code></p><p>or</p><p><code>$ sudo perl -MCPAN -e 'install Module::Name'</code></p><p>This will install the module&nbsp;<em>and</em>&nbsp;its dependencies. Nice, eh? Again, don&rsquo;t forget the&nbsp;<code>sudo</code>.</p><p>The first time you run this after upgrading CPAN, it may prompt you to configure again (see Step 2). If you accept its offer to try to configure itself automatically, it may just run through everything without a problem.</p><p>There are a couple of potential pitfalls with specific modules (such as the<code>LWP::UserAgent</code>&nbsp;/&nbsp;<code>HEAD</code>&nbsp;issue), but most have workarounds, and I haven&rsquo;t run into anything that wasn&rsquo;t easily recoverable.</p><p>And that&rsquo;s it!</p><p>Did you find this useful? Is there anything I missed?</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>