<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/38457?</link>
	<atom:link href="https://bioinformaticsonline.com/related/38457?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27076/ale-a-generic-assembly-likelihood-evaluation-framework-for-assessing-the-accuracy-of-genome-and-metagenome-assemblies</guid>
	<pubDate>Tue, 26 Apr 2016 03:38:43 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27076/ale-a-generic-assembly-likelihood-evaluation-framework-for-assessing-the-accuracy-of-genome-and-metagenome-assemblies</link>
	<title><![CDATA[ALE: a Generic Assembly Likelihood Evaluation Framework for Assessing the Accuracy of Genome and Metagenome Assemblies]]></title>
	<description><![CDATA[<p>The Assembly Likelihood Evaluation (ALE) framework systematically evaluates the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. The framework is comprehensive: it integrates read quality, mate-pair orientation and insert length (for paired-end reads), sequencing coverage, read alignment and k-mer frequency. ALE pinpoints synthetic errors in both single-genome and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies in metagenomes. At the genome level with real-world data, ALE identifies three large misassemblies in the finished <em>Spirochaeta smaragdinae</em> genome, all of which were independently validated by Pacific Biosciences sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich <em>Rhodobacter sphaeroides</em> genome. Using real Pacific Biosciences data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single-genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process.</p>
<p>More at&nbsp;http://www.ncbi.nlm.nih.gov/pubmed/23303509</p><p>Address of the bookmark: <a href="http://sc932.github.io/ALE/about.html" rel="nofollow">http://sc932.github.io/ALE/about.html</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35119/frontend-perl-web-framework-documentation-andrej-sali-lab</guid>
	<pubDate>Mon, 08 Jan 2018 22:32:03 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35119/frontend-perl-web-framework-documentation-andrej-sali-lab</link>
	<title><![CDATA[Frontend: Perl Web framework documentation - Andrej Sali Lab]]></title>
	<description><![CDATA[<p><span>The frontend is a set of Perl classes that displays the web interface, allowing a user to upload their input files, start a job, display a list of all jobs in the system, and get back job results. The main&nbsp;</span><a href="https://saliweb.readthedocs.io/en/latest/modules/frontend.html#saliwebfrontend" title="saliwebfrontend"><code><span>saliweb::frontend</span></code></a><span>&nbsp;class must be subclassed for each web service. This subclass is then used to display the web pages through a set of CGI scripts that are set up automatically by the build system.</span></p><p>Address of the bookmark: <a href="https://saliweb.readthedocs.io/en/latest/frontend.html" rel="nofollow">https://saliweb.readthedocs.io/en/latest/frontend.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38489/biotite-a-general-framework-for-computational-biology</guid>
	<pubDate>Mon, 17 Dec 2018 18:52:27 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38489/biotite-a-general-framework-for-computational-biology</link>
	<title><![CDATA[Biotite: A general framework for computational biology]]></title>
	<description><![CDATA[<p>The package is open source and freely available on GitHub (<a href="https://github.com/biotite-dev/biotite" target="_blank">https://github.com/biotite-dev/biotite</a>). It is easy to use, even for beginners in programming, and computationally efficient thanks to its use of NumPy and Cython. Biotite consists of four subpackages: <em>sequence</em>, <em>structure</em>, <em>database</em> and <em>application</em>. The&nbsp;<em>sequence</em> and <em>structure</em> subpackages handle the analysis of sequence and structural data, respectively; <em>database</em> downloads files from external databases such as the RCSB PDB; and <em>application</em> provides interfaces to external software.</p>
<p>The&nbsp;<em>Biotite</em> package bundles popular tasks in computational biology into a unifying framework that is easy to use on the one hand and computationally efficient on the other, owing to its extensive use of&nbsp;<em>NumPy</em> and&nbsp;<em>Cython</em>. It focuses on working with sequence and structure data, supports various file formats, and provides analysis and manipulation functions.</p><p>Address of the bookmark: <a href="https://github.com/biotite-dev/biotite" rel="nofollow">https://github.com/biotite-dev/biotite</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40221/dash-a-web-application-framework-that-provides-pure-python-abstraction-around-html-css-and-javascript</guid>
	<pubDate>Tue, 05 Nov 2019 06:39:48 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40221/dash-a-web-application-framework-that-provides-pure-python-abstraction-around-html-css-and-javascript</link>
	<title><![CDATA[Dash: a web application framework that provides pure Python abstraction around HTML, CSS, and JavaScript.]]></title>
	<description><![CDATA[<p style="margin-top: 0px; margin-bottom: 0.75rem;">Dash is a web application framework that provides pure Python abstraction around HTML, CSS, and JavaScript.</p>
<p style="margin-top: 0px; margin-bottom: 0.75rem;">Dash Bio is a suite of bioinformatics components that make it simpler to analyze and visualize bioinformatics data and interact with them in a Dash application.</p>
<p style="margin-top: 0px; margin-bottom: 0.75rem;">The source can be found on GitHub at<span>&nbsp;</span><a href="https://github.com/plotly/dash-bio">plotly/dash-bio</a>.</p>
<p style="margin-top: 0px; margin-bottom: 0.75rem;">The linked documentation covers Dash Bio version 0.1.4.</p><p>Address of the bookmark: <a href="https://dash.plot.ly/dash-bio" rel="nofollow">https://dash.plot.ly/dash-bio</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl!]]></title>
	<description><![CDATA[<p>Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. One good way is to use Perl threads or forks. It helps to understand how threads and forks work before implementing them, and before reading this tutorial.</p><p>In bioinformatics we often need to deal with huge datasets of more than 100GB. The traditional way to analyze a file is a while loop:</p><blockquote><p>while (my $line = &lt;$fh&gt;) {<br />&nbsp;&nbsp;&nbsp;&nbsp;# do something with $line<br />}</p></blockquote><p>This is slow, since it uses only one processor: if the dataset has 500 million lines, iterating through the whole file can take more than a day. So how do we make the best use of all our processors and get the work done quickly?</p><p>Here is a simple and efficient technique with Perl that I have been using. I prefer Perl fork over Perl threads.</p><p>One of the oldest ways to fork is:</p><blockquote><p>my @children;<br />my $pid = fork();<br />if ($pid) {<br />&nbsp;&nbsp;&nbsp;&nbsp;push @children, $pid;&nbsp;&nbsp;# parent: remember the child's process ID<br />}<br />elsif (defined $pid) {<br />&nbsp;&nbsp;&nbsp;&nbsp;<strong># child: your code here</strong><br />&nbsp;&nbsp;&nbsp;&nbsp;exit(0);<br />}<br />else {<br />&nbsp;&nbsp;&nbsp;&nbsp;die "Couldn't fork: $!";<br />}</p><p># wait for the child processes to finish<br />foreach my $child (@children) {<br />&nbsp;&nbsp;&nbsp;&nbsp;waitpid($child, 0);<br />}</p></blockquote><p>What fork does is create a child process that takes a copy of the parent's variables and code and runs separately from (detached from) the parent process. The child is a separate process, which the operating system can run on a different processor. One big disadvantage of forking is that it is very difficult to share variables among processes: each child works on its own copy, so changes it makes are not seen by the parent or by the other children. I will show a simple way around this, though it has drawbacks of its own.</p><p>If you do not want to call fork yourself, that's fine too: there are useful modules that do it for you very efficiently. One really useful module is Parallel::ForkManager, which lets you limit the number of forks running at once (that is, the number of processors you want to use).</p><p><strong>Simple usage:</strong></p><blockquote><p>use Parallel::ForkManager;<br />my $max_processes = 8;<br />my $pm = Parallel::ForkManager-&gt;new($max_processes);<br />foreach my $seq (@dna) {<br />&nbsp;&nbsp;&nbsp;&nbsp;$pm-&gt;start and next;&nbsp;&nbsp;# fork; the parent moves on to the next element<br />&nbsp;&nbsp;&nbsp;&nbsp;<strong># your code here (runs in the child, using $seq)</strong><br />&nbsp;&nbsp;&nbsp;&nbsp;$pm-&gt;finish;&nbsp;&nbsp;# exit the child process<br />}<br />$pm-&gt;wait_all_children;</p></blockquote><p>This runs up to 8 child processes at a time, each handling one element of the array. When one child finishes, Parallel::ForkManager starts a new one, so all your processors stay busy analyzing the data. Now suppose your 8 child processes all write to the same output file: you need to lock the file, otherwise buffered writes from different processes can interleave. You can lock the file with flock:</p><blockquote><p>use Fcntl qw(:flock);<br />open(my $QUAL, '&gt;&gt;', 'myfile.txt') or die "Can't open myfile.txt: $!";<br />flock($QUAL, LOCK_EX) or die "Can't lock file: $!";<br />print $QUAL $output;<br />flock($QUAL, LOCK_UN) or die "Can't unlock file: $!";<br />close $QUAL;</p></blockquote><p>I would not suggest using flock with multiple processes, because it decreases processing efficiency: each child process must wait for the others to release the lock. Instead, I suggest that each fork write to a separate file, and that you concatenate the files after processing.</p><p><strong>Putting it all together, if you have 100GB of data you can do this:</strong></p><blockquote><p><strong>Step 1</strong>: split the dataset into as many equal parts as you have processors. This may take a few hours (about 2-3 hours for a 100GB file). You can use the Unix split command for this. Note that split -l counts lines, not records, so choose a line count that never cuts a record in half (for FASTA with single-line sequences, an even number of lines).<br />For example:<br />my $lines_per_chunk = int($number_of_lines_in_your_dataset / $max_processes) + 1;<br />$lines_per_chunk++ if $lines_per_chunk % 2;&nbsp;&nbsp;# keep FASTA header/sequence pairs together<br />system('split', '-l', $lines_per_chunk, 'your_file.fasta', 'file_name') == 0 or die "split failed: $?";</p><p><strong>Step 2</strong>: open the directory containing your split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />use Parallel::ForkManager;<br />opendir(my $dh, $split_files_directory) or die "Can't open $split_files_directory: $!";<br />my $pm = Parallel::ForkManager-&gt;new($max_processes);<br />while (my $file = readdir($dh)) {<br />&nbsp;&nbsp;&nbsp;&nbsp;next if $file =~ /^\./;&nbsp;&nbsp;# skip . and .. (and hidden files)<br />&nbsp;&nbsp;&nbsp;&nbsp;print "$file\n";<br />&nbsp;&nbsp;&nbsp;&nbsp;$pm-&gt;start and next;&nbsp;&nbsp;# fork a child for this chunk<br />&nbsp;&nbsp;&nbsp;&nbsp;<strong># whatever you want to do with "$split_files_directory/$file"</strong><br />&nbsp;&nbsp;&nbsp;&nbsp;$pm-&gt;finish;&nbsp;&nbsp;# exit the child<br />}<br />closedir($dh);<br />$pm-&gt;wait_all_children;</p></blockquote><p>So each processor is busy with its own piece of the data (a split file): you have 8 processes running at once, none interfering with the others. Again, I would not write the output of every child process to one file (for the reasons above). Write the output of each fork to a separate file and concatenate them at the end. That's it: with 8 processors the processing step can run up to roughly 8 times faster, although disk I/O and the split and concatenation steps eat into that gain. Isn't it easy?</p><p><strong>Note:</strong><br />You may worry about concatenating the output each child generates, since it does take some time (remember, 100GB). One option is to use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (this should take about 3 hours for a 100GB dataset) and then export the whole table into one file. This may be faster than concatenating them with the cat command (correct me if I am wrong).</p><p>A much simpler way is to use cat directly, either writing to one file or piping into the next program:</p><p>cat output_dir/* &gt; final_file<br />cat output_dir/* | my_pipe &gt; final_file</p><p>That's it, guys! Enjoy programming, and please do comment. I am not a computer scientist, so forgive me for any mistakes, and please report any you find. Thank you.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33955/crocoblast-optimized-parallel-implementation-of-local-sequence-alignment-algorithms</guid>
	<pubDate>Tue, 25 Jul 2017 05:03:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33955/crocoblast-optimized-parallel-implementation-of-local-sequence-alignment-algorithms</link>
	<title><![CDATA[CrocoBLAST: Optimized parallel implementation of local sequence alignment algorithms]]></title>
	<description><![CDATA[<p><span>Local sequence alignment is a cornerstone of bioinformatics, allowing the comparison of the amino-acid sequences of different proteins, or of the nucleotide sequences of different pieces of DNA. The Basic Local Alignment Search Tool (BLAST) has revolutionized the field of bioinformatics and is implemented in virtually all free and commercial bioinformatics packages. However, with the advent of Next Generation Sequencing (NGS) and the development of new sequencing techniques, the utility of traditional BLAST implementations has become limited. CrocoBLAST combines the accuracy and general applicability of BLAST with computational efficiency, accessibility and a good user experience, so that NGS data can be analyzed efficiently even when only modest computational resources are available.</span></p>
<p>Address of the bookmark: <a href="https://webchem.ncbr.muni.cz/Platform/App/CrocoBLAST" rel="nofollow">https://webchem.ncbr.muni.cz/Platform/App/CrocoBLAST</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34567/jobtree-based-python-wrapper-to-run-the-genome-simulation-tool-suite-evolver</guid>
	<pubDate>Fri, 08 Dec 2017 16:26:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34567/jobtree-based-python-wrapper-to-run-the-genome-simulation-tool-suite-evolver</link>
	<title><![CDATA[jobTree-based Python wrapper to run the genome simulation tool suite Evolver]]></title>
	<description><![CDATA[<p><span>evolverSimControl</span><span>&nbsp;(</span><span>eSC</span><span>) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (</span><a href="http://evolution.genetics.washington.edu/phylip/newicktree.html">Newick format</a><span>). In addition to simply running Evolver,&nbsp;</span><span>eSC</span><span>&nbsp;also automatically creates statistical summaries of the simulation as it runs, including text and image files. Also included are convenience scripts to: check on a running simulation and see detailed status and logging information; extract FASTA sequence files from the leaf nodes of a completed simulation; and extract pairwise alignments in Multiple Alignment Format (</span><a href="http://genome.ucsc.edu/FAQ/FAQformat.html#format5">.maf</a><span>) from the leaf and branch nodes of a completed simulation and, with the help of&nbsp;</span><a href="https://github.com/dentearl/mafTools/">mafJoin</a><span>, join them into a single MAF file covering the entire simulation.</span></p><p>Address of the bookmark: <a href="https://github.com/dentearl/evolverSimControl" rel="nofollow">https://github.com/dentearl/evolverSimControl</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36616/srbreak-a-read-depth-and-split-read-framework-to-identify-breakpoints-of-different-events-inside-simple-copy-number-variable-regions</guid>
	<pubDate>Tue, 15 May 2018 04:42:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36616/srbreak-a-read-depth-and-split-read-framework-to-identify-breakpoints-of-different-events-inside-simple-copy-number-variable-regions</link>
	<title><![CDATA[SRBreak: A Read-Depth and Split-Read Framework to Identify Breakpoints of Different Events Inside Simple Copy-Number Variable Regions]]></title>
	<description><![CDATA[<p>SRBreak is a read-depth and split-read package written in R for identifying copy-number variants in next-generation sequencing datasets.</p>
<p>Note: SRBreak was designed to work with multiple samples. It works with two or more samples, but we suggest using at least five samples, as in the experiments reported in our paper.</p><p>Address of the bookmark: <a href="https://github.com/hoangtn/SRBreak" rel="nofollow">https://github.com/hoangtn/SRBreak</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35148/mojolicious-a-next-generation-web-framework-for-the-perl-programming-language</guid>
	<pubDate>Fri, 12 Jan 2018 16:48:10 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35148/mojolicious-a-next-generation-web-framework-for-the-perl-programming-language</link>
	<title><![CDATA[mojolicious: a next generation web framework for the Perl programming language.]]></title>
	<description><![CDATA[<p><span>Back in the early days of the web, many people learned Perl because of a wonderful Perl library called&nbsp;</span><a href="https://metacpan.org/module/CGI" target="_blank">CGI</a><span>. It was simple enough to get started without knowing much about the language and powerful enough to keep you going, and learning by doing was a lot of fun. While most of the techniques used are outdated now, the idea behind it is not. Mojolicious is a new endeavor to implement this idea using bleeding-edge technologies.</span></p>
<h2>Features</h2>
<ul>
<li>An amazing&nbsp;<strong>real-time web framework</strong>, allowing you to easily grow single file prototypes into well-structured MVC web applications.
<ul>
<li>Powerful out of the box with RESTful routes, plugins, commands, Perl-ish templates, content negotiation, session management, form validation, testing framework, static file server, CGI/<a href="http://plackperl.org/" target="_blank">PSGI</a>&nbsp;detection, first class Unicode support and much more for you to discover.</li>
</ul>
</li>
<li>A powerful&nbsp;<strong>web development toolkit</strong>, that you can use for all kinds of applications, independently of the web framework.
<ul>
<li>Full stack HTTP and WebSocket client/server implementation with IPv6, TLS, SNI, IDNA, HTTP/SOCKS5 proxy, UNIX domain socket, Comet (long polling), Promises/A+, keep-alive, connection pooling, timeout, cookie, multipart and gzip compression support.</li>
<li>Built-in non-blocking I/O web server, supporting multiple event loops as well as optional pre-forking and hot deployment, perfect for building highly scalable web services.</li>
<li>JSON and HTML/XML parser with CSS selector support.</li>
</ul>
</li>
<li>Very clean, portable and object-oriented pure-Perl API with no hidden magic and no requirements besides Perl 5.24.0 (versions as old as 5.10.1 can be used too, but may require additional CPAN modules to be installed)</li>
<li>Fresh code based upon years of experience developing&nbsp;<a href="http://catalystframework.org/" target="_blank">Catalyst</a>, free and open source.</li>
<li>Hundreds of 3rd party&nbsp;<a href="https://metacpan.org/requires/distribution/Mojolicious">extensions</a>&nbsp;and high quality spin-off projects like the&nbsp;<a href="https://metacpan.org/pod/Minion">Minion</a>&nbsp;job queue.</li>
</ul>
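<p>To illustrate the single-file prototype mentioned above, here is a minimal Mojolicious::Lite application. It is a sketch that assumes Mojolicious is installed from CPAN; the route and the response text are arbitrary examples, and it uses only the standard Lite functions <code>get</code>, <code>render</code> and <code>app-&gt;start</code>.</p>

```perl
#!/usr/bin/env perl
use Mojolicious::Lite;    # also enables strict and warnings

# A single route: GET / answers with a plain-text body.
get '/' => sub {
    my $c = shift;    # the controller for the current request
    $c->render(text => 'Hello from a single-file Mojolicious app');
};

# Hand control to the Mojolicious command system (e.g. "daemon" to serve).
app->start;
```

<p>Saved as, say, <code>hello.pl</code>, it can be served with <code>perl hello.pl daemon</code>, which listens on port 3000 by default. As the prototype grows, the same routes can later be moved into a full MVC application.</p>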
<p>Address of the bookmark: <a href="http://mojolicious.org/" rel="nofollow">http://mojolicious.org/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38829/nquire-a-statistical-framework-for-ploidy-estimation-using-ngs-short-read-data</guid>
	<pubDate>Thu, 31 Jan 2019 05:12:19 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38829/nquire-a-statistical-framework-for-ploidy-estimation-using-ngs-short-read-data</link>
	<title><![CDATA[nQuire: A statistical framework for ploidy estimation using NGS short-read data]]></title>
	<description><![CDATA[<p>nQuire implements a set of commands to estimate the ploidy level of individuals from species in which recent polyploidization has occurred and intraspecific ploidy variation is observed. Specifically, nQuire uses next-generation sequencing data to distinguish between diploids, triploids and tetraploids on the basis of frequency distributions at variant sites where only two bases are segregating.</p>
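<p>The idea behind this can be sketched as follows: at a biallelic site, the fraction of reads carrying one allele is ideally about 1/2 in a diploid, about 1/3 or 2/3 in a triploid, and about 1/4, 1/2 or 3/4 in a tetraploid, so the shape of the allele-fraction histogram hints at the ploidy. The Perl snippet below is only a toy illustration of that histogram, not nQuire's own statistical model (which is described in the publication); the helper <code>allele_fraction_histogram</code> and the read counts are hypothetical.</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not part of nQuire): bin the fraction of reads carrying
# allele A at each biallelic site into $nbins equal-width bins over [0, 1].
sub allele_fraction_histogram {
    my ($nbins, @sites) = @_;    # each site: [reads_allele_A, reads_allele_B]
    my @counts = (0) x $nbins;
    for my $site (@sites) {
        my ($a_reads, $b_reads) = @$site;
        my $depth = $a_reads + $b_reads;
        next unless $depth;                           # skip sites with no reads
        my $bin = int($a_reads * $nbins / $depth);    # multiply first so exact bin edges stay exact
        $bin = $nbins - 1 if $bin == $nbins;          # a fraction of exactly 1 goes in the last bin
        $counts[$bin]++;
    }
    return @counts;
}

# Made-up read counts for a triploid-like sample: fractions near 1/3 and 2/3.
my @sites = ([10, 20], [11, 19], [9, 21], [20, 10], [19, 11]);
print join(' ', allele_fraction_histogram(10, @sites)), "\n";
```

<p>With these made-up counts the script prints <code>0 0 0 3 0 0 2 0 0 0</code>: two clusters, in the bins containing 1/3 and 2/3, as expected for a triploid. Real data also carry sequencing errors and mapping bias, which is why a proper statistical model is needed.</p>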
<p>For more background see also the publication at&nbsp;<a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2128-z">BMC Bioinformatics</a>.</p>
<p>Address of the bookmark: <a href="https://github.com/clwgg/nQuire" rel="nofollow">https://github.com/clwgg/nQuire</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>