<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/41875?offset=10</link>
	<atom:link href="https://bioinformaticsonline.com/related/41875?offset=10" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43447/rna-seq-workflow-gene-level-exploratory-analysis-and-differential-expression</guid>
	<pubDate>Sat, 09 Oct 2021 07:59:23 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43447/rna-seq-workflow-gene-level-exploratory-analysis-and-differential-expression</link>
	<title><![CDATA[RNA-seq workflow: gene-level exploratory analysis and differential expression]]></title>
	<description><![CDATA[<p><span>Here we walk through an end-to-end gene-level RNA-seq differential expression workflow using Bioconductor packages. We will start from the FASTQ files, show how these were quantified to the reference transcripts, and prepare gene-level count datasets for downstream analysis. We will perform exploratory data analysis (EDA) for quality assessment and to explore the relationship between samples, perform differential gene expression analysis, and visually explore the results.</span></p><p>Address of the bookmark: <a href="http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html" rel="nofollow">http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</guid>
	<pubDate>Sat, 25 Aug 2018 11:32:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37590/parallel-processing-with-perl</link>
	<title><![CDATA[Parallel Processing with Perl !]]></title>
	<description><![CDATA[<p>Here is a small tutorial on how to make the best use of multiple processors for bioinformatics analysis. In Perl, the usual ways to do this are threads and forks. It helps to understand how they work before reading this tutorial or using them in your own code.</p><p>In bioinformatics we often have to deal with huge datasets of more than 100&nbsp;GB. The traditional way to analyze a file is a single while loop:</p><p>while (&lt;FILE&gt;) {</p><p>do something with the current line in $_;</p><p>}</p><p>This is very slow because it uses only one processor. If the dataset has 500 million lines, iterating through the whole file can take more than a day. So how do we make the best use of all our processors and get the work done quickly?</p><p>Here is a simple and efficient technique with Perl that I have been using. I prefer Perl forks over Perl threads.</p><p>The oldest way to fork is to call fork() directly:</p><blockquote><p>my @childs;<br />my $pid = fork();<br />if (!defined $pid) { # fork failed<br />die "Couldn't fork: $!";<br />}<br />elsif ($pid == 0) { # child process<br /><strong>your code here;</strong><br />exit(0);<br />}<br />else { # parent process: remember the child's PID<br />push @childs, $pid;<br />}</p><p>## wait for the child processes to finish<br />foreach my $child (@childs) {<br />waitpid($child, 0);<br />}</p></blockquote><p>A call to fork() creates a child process that starts as a copy of the parent, with its own copy of the variables and code, and then runs separately from the parent (the operating system can schedule it on a separate processor). That's it! One big disadvantage of forking is that it is very difficult to share variables among the different processes, since each process changes only its own copy. I will show you an easy way around this, although it has its own drawbacks.</p><blockquote><p>Okay, if you would rather not call fork yourself, that's fine too. There are many useful modules that do it for you very efficiently. 
One really useful module is Parallel::ForkManager, which lets you cap the number of forks running at once (i.e. the number of processors you want to use).</p><p><strong>Simple usage:</strong><br />use Parallel::ForkManager;<br />my $max_processors = 8;<br />my $pm = Parallel::ForkManager-&gt;new($max_processors);<br />foreach my $dna (@dna) {<br />$pm-&gt;start and next; # fork; the parent moves on to the next element<br /><strong>your code here;</strong><br />$pm-&gt;finish; # exit in the child process<br />}<br />$pm-&gt;wait_all_children;</p></blockquote><p>This starts one child process per array element, with at most 8 children running at a time. When one child finishes, Parallel::ForkManager starts the next one, so all your processors stay busy analyzing the data. If the 8 child processes write to a single output file, you need to lock that file, otherwise their buffered writes can interleave. You can lock the file with Perl's flock function:</p><blockquote><p>use Fcntl qw(:flock); # imports LOCK_EX and LOCK_UN<br />open(my $QUAL, '&gt;&gt;', 'myfile.txt') or die "Can't open myfile.txt: $!"; # append mode<br />flock($QUAL, LOCK_EX) or die "Can't lock file: $!";<br />print $QUAL $output;<br />flock($QUAL, LOCK_UN) or die "Can't unlock file: $!";<br />close $QUAL;</p></blockquote><p>I would not recommend flock when many processes write to the same file, because it reduces throughput: each child process must wait for the others to release the lock. Instead, have each fork write to its own file and concatenate the files after processing.</p><p><strong>Putting it all together, if you have 100&nbsp;GB of data you can do this:</strong></p><blockquote><p><strong>Step 1</strong>: split the dataset into equal parts, one per processor. 
This may take a few hours (about 2-3 hours for a 100&nbsp;GB file).<br />You can use the Unix split command for this.<br />For example:<br />## split -l counts lines, so divide the number of lines, rounding up to get at most $max_processors pieces<br />my $lines_per_piece = int(($number_of_lines_in_your_dataset + $max_processors - 1) / $max_processors);<br />system('split', '-l', $lines_per_piece, 'your_file.fasta', "$split_files_directory/file_name") == 0 or die "split failed: $?";<br />## caution: split -l cuts only at line boundaries, so pick a line count that never splits a FASTA/FASTQ record (e.g. a multiple of 4 for FASTQ)</p><p><strong>Step 2</strong>: open the directory containing your split files and start Parallel::ForkManager.<br /><strong>For example:</strong><br />opendir(my $dir, $split_files_directory) or die $!; ### open the directory<br />my $pm = Parallel::ForkManager-&gt;new($max_processors);<br />while (my $file = readdir($dir)) { ### read the directory<br />next if $file =~ /^\./; ### skip ., .. and hidden files<br />print "$file\n";<br />########## start fork ##########<br />$pm-&gt;start and next;<br /><strong>Whatever you want to do with the split file;</strong><br />### readdir returns bare file names, so open "$split_files_directory/$file"<br /><strong>analyze my piece of $file;</strong><br />######### end fork ###############<br />$pm-&gt;finish;<br />}<br />closedir($dir);<br />$pm-&gt;wait_all_children;</p></blockquote><p>Each processor is now busy with its own piece of data (one split file): you have created 8 processes that run at the same time without interfering with each other. Again, I do not recommend writing the output of every child process to one file (for the reasons above). Write the output of each fork to a separate file and concatenate them at the end. That's it: if the job is CPU-bound and your disks can keep up, your program can run up to about 8 times faster. Isn't it easy?</p><p><strong>Note:</strong><br />You may worry about concatenating the output each child generates, since that also takes time (remember, 100&nbsp;GB). One option is to use the MySQL LOAD DATA LOCAL INFILE command to load all the files into a single table (this should take about 3 hours for a 100&nbsp;GB dataset) and then export the whole table into one file. 
This might be faster than simply concatenating them with the cat command (correct me if I am wrong).</p><p>A much simpler way is to concatenate with cat directly, or to use a pipe if the merged output only feeds the next program:</p><p>cat output_dir/* &gt; final_file<br />cat output_dir/* | my_pipe &gt; final_file</p><p>That's it, guys! Enjoy programming, and please do comment. I am not a computer scientist, so forgive me for any mistakes, and please report any you find. Thank you.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33955/crocoblast-optimized-parallel-implementation-of-local-sequence-alignment-algorithms</guid>
	<pubDate>Tue, 25 Jul 2017 05:03:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33955/crocoblast-optimized-parallel-implementation-of-local-sequence-alignment-algorithms</link>
	<title><![CDATA[CrocoBLAST: Optimized parallel implementation of local sequence alignment algorithms]]></title>
	<description><![CDATA[<p><span>Local sequence alignment is a cornerstone of bioinformatics, making it possible to compare the amino-acid sequences of different proteins or the nucleotide sequences of different pieces of DNA. The Basic Local Alignment Search Tool (BLAST) has revolutionized the field of bioinformatics and is implemented in most free and commercial bioinformatics packages. However, with the advent of Next Generation Sequencing (NGS) and the development of new sequencing techniques, the utility of traditional BLAST implementations is limited. CrocoBLAST combines the accuracy and general applicability of BLAST with computational efficiency, accessibility, and user experience, so that NGS data can be analyzed efficiently even when only modest computational resources are available.</span></p>
<p>Address of the bookmark: <a href="https://webchem.ncbr.muni.cz/Platform/App/CrocoBLAST" rel="nofollow">https://webchem.ncbr.muni.cz/Platform/App/CrocoBLAST</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27847/anvio</guid>
	<pubDate>Thu, 16 Jun 2016 18:15:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27847/anvio</link>
	<title><![CDATA[Anvio]]></title>
	<description><![CDATA[<p>In a nutshell</p>
<p>Anvi&rsquo;o is an analysis and visualization platform for &lsquo;omics data.</p>
<p>Please find the methods paper here: https://peerj.com/articles/1319/</p>
<p>Project page: <a href="http://merenlab.org/projects/anvio">http://merenlab.org/projects/anvio</a></p>
<p>Paper: https://peerj.com/articles/1839/</p><p>Address of the bookmark: <a href="https://github.com/meren/anvio" rel="nofollow">https://github.com/meren/anvio</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44313/orthovenn3-an-integrated-platform-for-exploring-and-visualizing-orthologous-data-across-genomes</guid>
	<pubDate>Tue, 02 May 2023 00:48:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44313/orthovenn3-an-integrated-platform-for-exploring-and-visualizing-orthologous-data-across-genomes</link>
	<title><![CDATA[OrthoVenn3: an integrated platform for exploring and visualizing orthologous data across genomes]]></title>
	<description><![CDATA[<p><span>OrthoVenn3 is a powerful tool for comparative genomics analysis, used as a web server for full genome comparisons, annotation, and evolutionary analysis of orthologous clusters across multiple species. It has already been used by thousands of users from over 60 countries.</span></p><p>Address of the bookmark: <a href="https://orthovenn3.bioinfotoolkits.net/" rel="nofollow">https://orthovenn3.bioinfotoolkits.net/</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38008/quast-lg-versatile-genome-assembly-evaluation</guid>
	<pubDate>Thu, 25 Oct 2018 10:46:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38008/quast-lg-versatile-genome-assembly-evaluation</link>
	<title><![CDATA[QUAST-LG: Versatile genome assembly evaluation]]></title>
	<description><![CDATA[<p>QUAST-LG is a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low-coverage regions, we introduce the concept of an upper-bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.</p>
<p>Address of the bookmark: <a href="http://cab.spbu.ru/software/quast-lg/" rel="nofollow">http://cab.spbu.ru/software/quast-lg/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36994/minimap2-a-versatile-pairwise-aligner-for-genomic-and-spliced-nucleotide-sequences</guid>
	<pubDate>Wed, 20 Jun 2018 07:55:29 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36994/minimap2-a-versatile-pairwise-aligner-for-genomic-and-spliced-nucleotide-sequences</link>
	<title><![CDATA[minimap2: A versatile pairwise aligner for genomic and spliced nucleotide sequences]]></title>
	<description><![CDATA[<pre>git clone https://github.com/lh3/minimap2
cd minimap2 &amp;&amp; make
# long sequences against a reference genome
./minimap2 -a test/MT-human.fa test/MT-orang.fa &gt; test.sam
# create an index first and then map
./minimap2 -d MT-human.mmi test/MT-human.fa
./minimap2 -a MT-human.mmi test/MT-orang.fa &gt; test.sam
# use presets (no test data)
./minimap2 -ax map-pb ref.fa pacbio.fq.gz &gt; aln.sam       # PacBio genomic reads
./minimap2 -ax map-ont ref.fa ont.fq.gz &gt; aln.sam         # Oxford Nanopore genomic reads
./minimap2 -ax sr ref.fa read1.fa read2.fa &gt; aln.sam      # short genomic paired-end reads
./minimap2 -ax splice ref.fa rna-reads.fa &gt; aln.sam       # spliced long reads
./minimap2 -ax splice -k14 -uf ref.fa reads.fa &gt; aln.sam  # Nanopore Direct RNA-seq
./minimap2 -cx asm5 asm1.fa asm2.fa &gt; aln.paf             # intra-species asm-to-asm alignment
./minimap2 -x ava-pb reads.fa reads.fa &gt; overlaps.paf     # PacBio read overlap
./minimap2 -x ava-ont reads.fa reads.fa &gt; overlaps.paf    # Nanopore read overlap
# man page for detailed command line options
man ./minimap2.1</pre><p>Address of the bookmark: <a href="https://github.com/lh3/minimap2" rel="nofollow">https://github.com/lh3/minimap2</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43766/genometools-the-versatile-open-source-genome-analysis-software</guid>
	<pubDate>Wed, 02 Feb 2022 04:00:21 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43766/genometools-the-versatile-open-source-genome-analysis-software</link>
	<title><![CDATA[GenomeTools: The versatile open source genome analysis software]]></title>
	<description><![CDATA[<p>The&nbsp;<em>GenomeTools</em>&nbsp;genome analysis system is a&nbsp;<a href="http://genometools.org/license.html">free</a>&nbsp;collection of bioinformatics&nbsp;<a href="http://genometools.org/tools.html">tools</a>&nbsp;(in the realm of genome informatics) combined into a single binary named&nbsp;<em>gt</em>. It is based on a C library named &ldquo;libgenometools&rdquo; which consists of several modules.</p>
<p><img src="http://genometools.org/images/annotation.png" alt="image" style="border: 0px;"></p>
<p>If you are interested in gene prediction, have a look at&nbsp;<a href="http://genomethreader.org/" title="GenomeThreader gene prediction        software"><em>GenomeThreader</em></a>.</p>
<p>http://genometools.org/pub/</p><p>Address of the bookmark: <a href="http://genometools.org/" rel="nofollow">http://genometools.org/</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42972/list-of-bioinformatics-workflow-management-tools</guid>
	<pubDate>Sat, 20 Mar 2021 00:15:25 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42972/list-of-bioinformatics-workflow-management-tools</link>
	<title><![CDATA[List of bioinformatics workflow management tools !]]></title>
	<description><![CDATA[<h3>Here is a list of workflow managers</h3><ul>
<li><span><a href="https://github.com/pcingola/BigDataScript">BigDataScript</a></span>&nbsp;&ndash; A cross-system scripting language for working with big data pipelines in computer systems of different sizes and capabilities. [&nbsp;<a href="https://pubmed.ncbi.nlm.nih.gov/25189778">paper-2014</a>&nbsp;|&nbsp;<a href="https://pcingola.github.io/BigDataScript">web</a>&nbsp;]</li>
<li><span><a href="https://github.com/ssadedin/bpipe">Bpipe</a></span>&nbsp;&ndash; A small language for defining pipeline stages and linking them together to make pipelines. [&nbsp;<a href="http://docs.bpipe.org/">web</a>&nbsp;]</li>
<li><span><a href="https://github.com/common-workflow-language/common-workflow-language">Common Workflow Language</a></span>&nbsp;&ndash; a specification for describing analysis workflows and tools that are portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. [&nbsp;<a href="http://www.commonwl.org/">web</a>&nbsp;]</li>
<li><span><a href="https://github.com/broadinstitute/cromwell">Cromwell</a></span>&nbsp;&ndash; A Workflow Management System geared towards scientific workflows. [&nbsp;<a href="https://cromwell.readthedocs.io/">web</a>&nbsp;]</li>
<li><span><a href="https://github.com/galaxyproject">Galaxy</a></span>&nbsp;&ndash; a popular open-source, web-based platform for data-intensive biomedical research. Its features range from data analysis to workflow management to visualization tools. [&nbsp;<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030816">paper-2018</a>&nbsp;|&nbsp;<a href="https://galaxyproject.org/">web</a>&nbsp;]</li>
<li><span><a href="https://github.com/nextflow-io/nextflow">Nextflow</a>&nbsp;(recommended)</span>&nbsp;&ndash; A fluent DSL modelled around the UNIX pipe concept, that simplifies writing parallel and scalable pipelines in a portable manner. [&nbsp;<a href="https://pubmed.ncbi.nlm.nih.gov/29412134">paper-2018</a>&nbsp;|&nbsp;<a href="http://nextflow.io/">web</a>&nbsp;]</li>
<li><span><a href="https://github.com/cgat-developers/ruffus">Ruffus</a></span>&nbsp;&ndash; A computation pipeline library for Python, widely used in science and bioinformatics. [&nbsp;<a href="https://pubmed.ncbi.nlm.nih.gov/20847218">paper-2010</a>&nbsp;|&nbsp;<a href="http://www.ruffus.org.uk/">web</a>&nbsp;]</li>
<li><span><a href="https://github.com/SeqWare/seqware">SeqWare</a></span>&nbsp;&ndash; Hadoop Oozie-based workflow system focused on genomics data analysis in cloud environments. [&nbsp;<a href="https://pubmed.ncbi.nlm.nih.gov/21210981">paper-2010</a>&nbsp;|&nbsp;<a href="https://seqware.github.io/">web</a>&nbsp;]</li>
<li><span><a href="https://bitbucket.org/snakemake">Snakemake</a></span>&nbsp;&ndash; A workflow management system in Python that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment. [&nbsp;<a href="https://pubmed.ncbi.nlm.nih.gov/29788404">paper-2018</a>&nbsp;|&nbsp;<a href="https://snakemake.readthedocs.io/">web</a>&nbsp;]</li>
<li><span><a href="https://github.com/broadinstitute/wdl">Workflow Description Language</a></span>&nbsp;&ndash; A workflow standard developed by the Broad Institute. [&nbsp;<a href="https://software.broadinstitute.org/wdl">web</a>&nbsp;]</li>
</ul>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44628/uncovar-workflow-for-transparent-and-robust-virus-variant-calling-genome-reconstruction-and-lineage-assignment</guid>
	<pubDate>Mon, 05 Aug 2024 23:01:29 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44628/uncovar-workflow-for-transparent-and-robust-virus-variant-calling-genome-reconstruction-and-lineage-assignment</link>
	<title><![CDATA[UnCoVar: Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction and Lineage Assignment]]></title>
	<description><![CDATA[<p>UnCoVar: Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction and Lineage Assignment</p>
<ul>
<li>
<p>Uses state-of-the-art tools and is easily extended to other viruses</p>
</li>
<li>
<p>Tool and database updates for critical components via Conda</p>
</li>
<li>
<p>Built using modern design patterns with Conda and Snakemake</p>
</li>
<li>
<p>Extensible and easy to customize</p>
</li>
<li>
<p>Submission-ready genomes</p>
</li>
<li>
<p>Customizable reporting with comprehensive visualization</p>
</li>
</ul>
<p>GitHub: https://github.com/IKIM-Essen/uncovar</p>
<p>Address of the bookmark: <a href="https://ikim-essen.github.io/uncovar/" rel="nofollow">https://ikim-essen.github.io/uncovar/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>

</channel>
</rss>