<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/38535?offset=400</link>
	<atom:link href="https://bioinformaticsonline.com/related/38535?offset=400" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38199/pacasus-correction-of-palindromes-in-long-reads-from-pacbio-and-nanopore</guid>
	<pubDate>Mon, 12 Nov 2018 05:26:48 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38199/pacasus-correction-of-palindromes-in-long-reads-from-pacbio-and-nanopore</link>
	<title><![CDATA[Pacasus: Correction of palindromes in long reads from PacBio and Nanopore]]></title>
	<description><![CDATA[<p>Tool for detecting and cleaning PacBio / Nanopore long reads after whole genome amplification. Check the poster from the Revolutionizing Next-Generation Sequencing (2nd edition) conference in the source folder:&nbsp;<a href="https://github.com/swarris/Pacasus/blob/master/vib2017.pdf">https://github.com/swarris/Pacasus/blob/master/vib2017.pdf</a>.</p>
<p>The preprint version is available on bioRxiv:&nbsp;<a href="http://www.biorxiv.org/content/early/2017/08/09/173872">http://www.biorxiv.org/content/early/2017/08/09/173872</a></p>
<p>It uses the pyPaSWAS framework for sequence alignment (<a href="https://github.com/swarris/pyPaSWAS">https://github.com/swarris/pyPaSWAS</a>)</p><p>Address of the bookmark: <a href="https://github.com/swarris/Pacasus" rel="nofollow">https://github.com/swarris/Pacasus</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40754/understanding-your-reads-and-mapping</guid>
	<pubDate>Wed, 29 Jan 2020 06:29:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40754/understanding-your-reads-and-mapping</link>
	<title><![CDATA[Understanding your reads and mapping!]]></title>
	<description><![CDATA[<p>One of the best tutorials for beginners.</p>
<p>Address of the bookmark: <a href="https://bioinformatics-core-shared-training.github.io/cruk-summer-school-2017/Day1/Session4-seqIntro.html" rel="nofollow">https://bioinformatics-core-shared-training.github.io/cruk-summer-school-2017/Day1/Session4-seqIntro.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl!]]></title>
	<description><![CDATA[<p>Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. A good way to do this in Perl is with threads or forks, and it is worth knowing how threads and forks work before reading this tutorial.</p><p>In bioinformatics we often need to deal with huge datasets of more than 100 GB. The traditional way to analyze such a file is a while loop that reads it line by line:</p><blockquote><p>while (my $line = &lt;FILE&gt;) {<br /><strong>do something with $line;</strong><br />}</p></blockquote><p>This is very slow, because it uses only one processor: if the dataset has 500 million lines, iterating through all of it can take more than a day. So how do we make the best use of all our processors and get the work done quickly?</p><p>Here is a very simple and efficient technique in Perl that I have been using. I am more inclined towards using Perl forks than Perl threads.</p><p>One of the oldest ways to fork is to call fork() directly:</p><blockquote><p>my @childs;<br />my $pid = fork();<br />if (!defined $pid) {<br />die "Couldn't fork: $!";<br />}<br />elsif ($pid == 0) { # child process<br /><strong>your code here;</strong><br />exit(0);<br />}<br />else { # parent process: remember the child's PID<br />push @childs, $pid;<br />}</p><p>## wait for the child processes to finish<br />foreach my $child (@childs) {<br />waitpid($child, 0);<br />}</p></blockquote><p>What fork does is create a child process with its own copy of the parent's variables and code, which then runs separately from (detached from) the parent process. The operating system can run each process on a separate processor. That's it! One big disadvantage of forking is that it is very difficult to share variables among the different processes. I will show you an easy workaround, but it still has its own drawbacks.</p><p>OK, if you really do not want to call fork in your own code, that's fine too: there are many useful modules that do it for you very efficiently. One really useful module is Parallel::ForkManager, which lets you limit the number of forks running at once (the number of processors you want to use).</p><blockquote><p><strong>Simple usage:</strong><br />use Parallel::ForkManager;<br />my $max_processors = 8;<br />my $pm = Parallel::ForkManager-&gt;new($max_processors);<br />foreach my $seq (@dna) {<br />$pm-&gt;start and next; # fork; the parent moves on to the next element<br /><strong>your code here;</strong><br />$pm-&gt;finish; # exit from the child process<br />}<br />$pm-&gt;wait_all_children;</p></blockquote><p>So at most 8 child processes run at the same time, each doing the same work on one element of the array. When one child finishes, Parallel::ForkManager starts a new one, so all your processors stay busy analyzing the data. Now suppose your 8 child processes all need to write to one file. You need to lock the file to do this, because otherwise buffered output from different processes can get interleaved. You can lock the file with flock:</p><blockquote><p>use Fcntl qw(:flock);<br />open(my $QUAL, '&gt;&gt;', 'myfile.txt') or die "can't open file: $!";<br />flock($QUAL, LOCK_EX) or die "can't lock file: $!";<br />print $QUAL $output;<br />flock($QUAL, LOCK_UN) or die "can't unlock file: $!";<br />close $QUAL;</p></blockquote><p>I would not suggest using flock with many processes, because it reduces efficiency: each child process must wait for the others to release the lock. Instead, I suggest that each fork write to a separate file, and that you concatenate the files after processing.</p><p><strong>Putting it all together, if you have 100 GB of data you can do this:</strong></p><blockquote><p><strong>Step 1</strong>: split the dataset into as many equal parts as you have processors. This may take a few hours (about 2-3 hours for a 100 GB file). You can use the Unix split command for this. Because split -l cuts purely by line count, make sure the chunk size does not cut a record in half (for example, a 4-line FASTQ record or a multi-line FASTA sequence).<br />For example:<br />my $number_split = int(($number_of_lines_in_your_file + $max_processors - 1) / $max_processors); # round up<br />system('split', '-l', $number_split, 'your_file.fasta', "$split_files_directory/file_name") == 0 or die "split failed: $?";</p><p><strong>Step 2</strong>: open the directory containing your split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />opendir(my $dh, $split_files_directory) or die $!; ### open the directory<br />my $pm = Parallel::ForkManager-&gt;new($max_processors);<br />while (my $file = readdir($dh)) { ### read the directory<br />next if $file =~ /^\./; # skip . and .. (and hidden files)<br />my $path = "$split_files_directory/$file"; # readdir returns bare file names<br />print $path, "\n";<br />########## start fork ##########<br />$pm-&gt;start and next;<br /><strong>whatever you want to do with the split file;</strong><br /><strong>analyze my piece of $path;</strong><br />######### end fork ###############<br />$pm-&gt;finish;<br />}<br />closedir($dh);<br />$pm-&gt;wait_all_children;</p></blockquote><p>So each processor is busy with its own piece of the data (a split file): you have created 8 processes that run at the same time without interfering with one another. Again, I do not suggest writing the output of every child process to one file (for the reasons above). Write the output of each fork to a separate file and concatenate them at the end. That's it: as long as the work is CPU-bound and your disks can keep up, your program can now run up to about 8 times faster. Isn't it easy?</p><p><strong>Note:</strong><br />You may worry about concatenating the outputs of the child processes, since that also takes time (remember, 100 GB). One option is to load all the files into a single table with MySQL's LOAD DATA LOCAL INFILE command (about 3 hours for a 100 GB dataset) and then export the whole table into one file. It is not obvious that this beats a plain cat, though, since cat makes a single sequential pass over the data while the database route writes it twice (correct me if I am wrong).</p><p>A much simpler way is to use cat directly, or to skip the merged file and stream the outputs into the next step with a pipe (here my_pipe stands for whatever program consumes the results):</p><blockquote><p>cat output_dir/* &gt; final_file<br />cat output_dir/* | my_pipe &gt; final_file<br />my_pipe &lt;(cat output_dir/*) &gt; final_file # bash process substitution</p></blockquote><p>That's it, guys! Enjoy programming, and please do comment. I am not a computer scientist, so forgive me for any mistakes, and please report any you find. Thank you.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/4099/sequencing-solutions-to-world-health</guid>
	<pubDate>Thu, 29 Aug 2013 15:05:35 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/4099/sequencing-solutions-to-world-health</link>
	<title><![CDATA[Sequencing Solutions to World Health]]></title>
	<description><![CDATA[<p><em>"New technology that quickly, easily and economically reveals the genomes of viruses and pathogens transforms public health and medicine."</em></p>
<p><strong>Source</strong>: Life Technologies</p><p>Address of the bookmark: <a href="http://www.lifetechnologies.com/global/en/home/communities-social/blog/blogs/sequencing-solutions-to-world-health.html?cid=social_blogseries_20130829_11098264" rel="nofollow">http://www.lifetechnologies.com/global/en/home/communities-social/blog/blogs/sequencing-solutions-to-world-health.html?cid=social_blogseries_20130829_11098264</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/2726/comparison-of-short-read-de-novo-alignment-algorithms</guid>
	<pubDate>Wed, 21 Aug 2013 07:56:01 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/2726/comparison-of-short-read-de-novo-alignment-algorithms</link>
	<title><![CDATA[Comparison of Short Read De Novo Alignment Algorithms]]></title>
	<description><![CDATA[<p>An excellent article introducing different sequencing methods, along with tools for de novo assembly of sequencing reads and the relevant references.</p>
<p>Title:&nbsp;<strong>Comparison of Short Read De Novo Alignment Algorithms</strong></p>
<p>Author:&nbsp;<strong>Nikhil Gopal</strong></p><p>Address of the bookmark: <a href="http://biochem218.stanford.edu/Projects%202011/Gopal%202011.pdf" rel="nofollow">http://biochem218.stanford.edu/Projects%202011/Gopal%202011.pdf</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/4208/latest-paper-on-comparison-of-mapping-tools</guid>
	<pubDate>Tue, 03 Sep 2013 18:00:38 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/4208/latest-paper-on-comparison-of-mapping-tools</link>
	<title><![CDATA[Latest paper on comparison of mapping tools]]></title>
	<description><![CDATA[<p>A. Hatem, D. Bozdag, A. E. Toland, U. V. Catalyurek "Benchmarking short sequence mapping tools" BMC Bioinformatics, 14(1):184, 2013.</p>
<p><a href="http://bmi.osu.edu/hpc/software/benchmark/">http://bmi.osu.edu/hpc/software/benchmark/</a></p>
<p><a href="http://bmi.osu.edu/hpc/software/pmap/pmap.html">http://bmi.osu.edu/hpc/software/pmap/pmap.html</a></p>
<p>Other similar papers:</p>
<p><a href="http://online.liebertpub.com/doi/pdf/10.1089/cmb.2012.0022">http://online.liebertpub.com/doi/pdf/10.1089/cmb.2012.0022</a></p>
<p><a href="http://bioinformatics.oxfordjournals.org/content/28/24/3169">http://bioinformatics.oxfordjournals.org/content/28/24/3169</a></p>
<p>Links to some newer mapping tools:</p>
<p><strong>GSNAP</strong></p>
<p><a href="http://research-pub.gene.com/gmap/">http://research-pub.gene.com/gmap/</a></p>
<p><strong>RMAP</strong></p>
<p><a href="http://rulai.cshl.edu/rmap/">http://rulai.cshl.edu/rmap/</a></p>
<p><strong>mrsFAST</strong></p>
<p><a href="http://mrsfast.sourceforge.net/Home">http://mrsfast.sourceforge.net/Home</a></p>
<p><a href="http://sourceforge.net/projects/mrsfast/files/mrsfast-ultra-3.1.0/">http://sourceforge.net/projects/mrsfast/files/mrsfast-ultra-3.1.0/</a></p>
<p><strong>BFAST</strong></p>
<p><a href="http://sourceforge.net/apps/mediawiki/bfast/index.php?title=Main_Page">http://sourceforge.net/apps/mediawiki/bfast/index.php?title=Main_Page</a></p>
<p><strong>SHRiMP (for&nbsp;AB SOLiD color-space reads)</strong></p>
<p><a href="http://compbio.cs.toronto.edu/shrimp/">http://compbio.cs.toronto.edu/shrimp/</a></p>
<p><strong>RazerS 3</strong></p>
<p><a href="http://www.seqan.de/projects/razers/">http://www.seqan.de/projects/razers/</a></p><p>Address of the bookmark: <a href="http://www.biomedcentral.com/1471-2105/14/184" rel="nofollow">http://www.biomedcentral.com/1471-2105/14/184</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/9400/largest-genome-sequenced</guid>
	<pubDate>Fri, 21 Mar 2014 13:57:19 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/9400/largest-genome-sequenced</link>
	<title><![CDATA[Largest Genome Sequenced]]></title>
	<description><![CDATA[<p>The <strong>loblolly pine genome</strong> is enormous: it has <strong>22 billion base pairs</strong>, compared with only 3 billion in the human genome. In other words, it is more than&nbsp;<strong>seven times</strong> larger than a human&rsquo;s, and it is also the largest and most complete&nbsp;<strong><a href="http://en.wikipedia.org/wiki/Pinophyta" target="_blank">conifer</a></strong>&nbsp;genome ever sequenced.</p>
<p><strong>Related Paper:</strong></p>
<p><a href="http://genomebiology.com/2014/15/3/R59/abstract">http://genomebiology.com/2014/15/3/R59/abstract</a></p>
<p>Address of the bookmark: <a href="http://www.news.ucdavis.edu/search/news_detail.lasso?id=10859" rel="nofollow">http://www.news.ucdavis.edu/search/news_detail.lasso?id=10859</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/10243/new-rna-seq-tool</guid>
	<pubDate>Fri, 25 Apr 2014 10:59:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/10243/new-rna-seq-tool</link>
	<title><![CDATA[New RNA Seq tool]]></title>
	<description><![CDATA[<p>"By removing the time-consuming step of read mapping, the authors reported, Sailfish is able to provide quantification estimates 20&ndash;30 times faster than current methods without loss of accuracy."</p>
<p>Tool link: <a href="http://www.cs.cmu.edu/~ckingsf/software/sailfish/">http://www.cs.cmu.edu/~ckingsf/software/sailfish/</a></p>
<p>Address of the bookmark: <a href="http://www.genengnews.com/gen-news-highlights/lightweight-algorithms-sail-through-rna-sequencing-data/81249765/" rel="nofollow">http://www.genengnews.com/gen-news-highlights/lightweight-algorithms-sail-through-rna-sequencing-data/81249765/</a></p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/10966/genxpro-gmbh</guid>
	<pubDate>Thu, 22 May 2014 07:18:35 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/10966/genxpro-gmbh</link>
	<title><![CDATA[GenXPro GmbH]]></title>
	<description><![CDATA[<p><strong>GenXPro</strong>&nbsp;GmbH is a service provider covering the entire spectrum of nucleotide-based information from any biological sample. By combining intelligent data reduction techniques with the latest next-generation sequencing technologies, our service portfolio provides the most accurate and cost-efficient solutions for transcriptomic, genomic and epigenomic research.</p><p><strong>GENXPRO GMBH</strong>, ALTENH&Ouml;FERALLEE 3, 60438 FRANKFURT MAIN, GERMANY</p><p><strong>Website</strong>: <a href="http://www.genxpro.info/products_and_services/">http://www.genxpro.info/products_and_services/</a></p><p><strong>PHONE</strong>: +49 (0)69- 95 73 97 10, FAX: +49 (0)69- 95 73 97 06</p><p>EMAIL: info@genxpro.de</p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/13415/genomics-and-sequencing-approach-for-identification-of-biomarkers-to-assess-the-efficacy-of-tgf-%CE%B2ri-inhibitors-of-liver-cancer-in-vivo</guid>
	<pubDate>Tue, 05 Aug 2014 13:55:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/file/view/13415/genomics-and-sequencing-approach-for-identification-of-biomarkers-to-assess-the-efficacy-of-tgf-%CE%B2ri-inhibitors-of-liver-cancer-in-vivo</link>
	<title><![CDATA[Genomics and sequencing approach for identification of biomarkers to assess the efficacy of TGF-βRI inhibitors (of liver cancer) in vivo]]></title>
	<description><![CDATA[<p>Liver cancer is the third leading cause of cancer deaths and the fourth most frequently occurring cancer worldwide. Multiple signaling pathways contribute to it, and among them TGF-β is one of the most important cytokines whose signaling promotes cancer. The main problem, however, is treating this cancer at a late stage, for which we still have no effective treatment strategy. Hence we need to find new therapeutic targets. One way is to examine the relationships between mRNA, methylation and miRNA data from patients in different pathological conditions (cancer vs. control, with or without inhibitor). MiRNAs are small RNA molecules that inhibit the expression of particular genes by binding, usually with imperfect complementarity, to the 3' UTR of their target mRNAs, thereby repressing translation or promoting mRNA degradation. CpG islands are often located in gene promoter regions, where they are usually unmethylated, which allows the gene to be transcribed; sometimes, however, these regions become hypermethylated, which silences expression of the gene. Integrating these three data types can therefore reveal new targets and pathways involved in causing or preventing cancer, as well as biomarkers for assessing the effect of inhibitors on the signaling pathways underlying liver cancer.</p>]]></description>
	<dc:creator>Rahul Agarwal</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/13415" length="26423" type="image/jpeg" />
</item>

</channel>
</rss>