<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/36525?offset=40</link>
	<atom:link href="https://bioinformaticsonline.com/related/36525?offset=40" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36723/hapsembler-an-assembler-for-highly-polymorphic-genomes</guid>
	<pubDate>Tue, 22 May 2018 04:09:53 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36723/hapsembler-an-assembler-for-highly-polymorphic-genomes</link>
	<title><![CDATA[Hapsembler: An Assembler for Highly Polymorphic Genomes]]></title>
	<description><![CDATA[Hapsembler is a haplotype-specific genome assembly toolkit that is designed for genomes that are rich in SNPs and other types of polymorphism. Hapsembler can be used to assemble reads from a variety of platforms including Illumina and Roche/454. 

http://compbio.cs.toronto.edu/hapsembler/<p>Address of the bookmark: <a href="http://compbio.cs.toronto.edu/hapsembler/" rel="nofollow">http://compbio.cs.toronto.edu/hapsembler/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</guid>
	<pubDate>Sat, 16 Jan 2021 21:42:11 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/42633/protocol-for-de-novo-genome-assembly-using-illumina-reads</link>
	<title><![CDATA[Protocol for De novo Genome Assembly using Illumina Reads]]></title>
	<description><![CDATA[<p>This protocol describes a de novo assembly method for small to medium-sized genomes.</p><p><strong>What is de novo genome assembly?<br /></strong>Genome assembly is the process of taking a large number of short DNA sequences and putting them back together to reconstruct the original chromosomes from which the DNA originated. A de novo assembly assumes no prior knowledge of the length, structure or composition of the source DNA sequence. In a genome sequencing experiment, the DNA of the target organism is broken into millions of small fragments that are read on a sequencing machine. Depending on the sequencing platform used, these "reads" range from 20 to 1000 nucleotide base pairs (bp) in length. Illumina short-read sequencing usually produces reads of 36 - 150 bp. These reads can be either &ldquo;single-end&rdquo; or &ldquo;paired-end.&rdquo;</p><p><strong>Why genome assembly?</strong><br />Identifying the DNA sequence of an organism is useful both in basic research into how and why it lives and in applied topics. Because DNA is central to all living things, knowledge of a DNA sequence can be useful in virtually any biological research. For example, it may be used in medicine to classify and diagnose genetic disorders and eventually improve their therapies. Similarly, the study of pathogens can lead to treatments for infectious diseases.</p><p><strong>Raw NGS data</strong><br />Reads can be saved as plain sequence text in a FASTA file, or together with their attributes in a FASTQ file.&nbsp;FASTQ is the most common read file format since this is what the Illumina sequencing pipeline creates. The rest of this protocol therefore focuses on FASTQ files.</p><p><strong>In a nutshell the protocol:</strong> <br />Get the sequence read file(s) from the sequencing machine(s). <br />Look at the reads - get an idea of what you have and what their quality is like. 
<br />If required, clean up the raw data and trim it by quality. <br />Choose an adequate parameter set for assembly. <br />Assemble the data into scaffolds/contigs. <br />Examine the assembly output and assess the quality of the assembly.</p><p><strong>Read Quality Control:</strong><br />Check the quality with FastQC.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42540/install-fastqc-using-conda</p><p>Quality trimming/cleanup of read files.<br />This step trims adapters, barcodes and other contaminants from the reads.<br />Script<br />https://bioinformaticsonline.com/snippets/view/42542/trimmomatic-command</p><p><strong>Genome Assembly:</strong><br />This part of the protocol explains how to assemble the quality-trimmed reads into draft contigs.</p><blockquote><p>spades.py -1 illumina_R1.fastq.gz -2 illumina_R2.fastq.gz --careful --cov-cutoff auto -o result_of_spades_assembly_all_illumina</p></blockquote><p>A wide range of short-read assemblers is available, each with its own strengths and weaknesses. <br /><em>Some of the assemblers available include:</em><br />Velvet<br />SOAP-denovo<br />MIRA<br />ALLPATHS</p><p>The next step is to assess the draft contigs and decide what to do with them for the remainder of the study.&nbsp;A few things to note about the contigs you just created:&nbsp;They are draft contigs. 
They may contain mis-assemblies.</p><p><strong>Mis-assembly checking and assembly metric tools:</strong><br />QUAST - Quality assessment tool for genome assembly http://bioinf.spbau.ru/quast<br />Mauve assembly metrics - http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve<br />InGAP-SV - https://sites.google.com/site/nextgengenomics/ingap and http://ingap.sourceforge.net/<br />inGAP is also useful for finding structural variants between genomes from read mappings.</p><p><strong>Genome finishing tools:</strong><br />Semi-automated gap fillers:<br />Gap filler - http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/gapfiller/</p><p>IMAGE (V2) - http://sourceforge.net/apps/mediawiki/image2/index.php?title=Main_Page</p><p><strong>Genome visualisers and editors:</strong><br />Artemis - http://www.sanger.ac.uk/resources/software/artemis/<br />IGV - http://www.broadinstitute.org/igv/</p><p><strong>Automated and semi-automated annotation tools:</strong><br />Prokka - https://github.com/tseemann/prokka<br />RAST - http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer<br />JCVI Annotation Service - http://www.jcvi.org/cms/research/projects/annotation-service/</p><p><strong>Frequently used commands for the analysis are listed at:</strong></p><p>https://bioinformaticsonline.com/blog/view/38765/list-of-tools-frequently-used-while-genome-assembly<br />https://bioinformaticsonline.com/pages/view/42275/frequent-parameters-for-bioinformatics-tools</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/44342/ncbi-datasets%E2%80%AFpages</guid>
	<pubDate>Wed, 12 Jul 2023 06:29:31 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/44342/ncbi-datasets%E2%80%AFpages</link>
	<title><![CDATA[NCBI Datasets pages]]></title>
	<description><![CDATA[<p>Update! Assembly and Genome record pages now redirect to new NCBI Datasets pages. NCBI Datasets is a new resource that makes it easier to find and download genome data. Learn more: https://ncbiinsights.ncbi.nlm.nih.gov/2023/07/11/ncbi-datasets-genome-assembly-pages/&nbsp;<a href="https://ow.ly/GU3o50P8QH4"></a><a href="https://www.linkedin.com/feed/hashtag/?keywords=ncbicgr&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7084592728260386816">#NCBICGR</a></p><p><span>Effective July 10, 2023, NCBI&rsquo;s Assembly and Genome record pages now redirect to&nbsp;</span>new<a href="https://www.ncbi.nlm.nih.gov/datasets/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711"> NCBI Datasets </a><span>pages. As&nbsp;</span><a href="https://ncbiinsights.ncbi.nlm.nih.gov/2023/03/07/ncbi-datasets-genome-taxonomy-pages/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711">previously announced</a><span>, these updates are part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.  </span><span>&nbsp;</span></p><h5>The following pages have been updated:</h5><ul>
<li><span>The NCBI Assembly record pages now redirect to the new </span><a href="https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_023065955.2/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711"><span>NCBI Datasets</span><strong><span> </span></strong><span>Genome</span></a><span> </span><span>record pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST. </span><span>&nbsp;</span></li>
<li><span>The NCBI</span><strong> </strong><span>Genome record pages now redirect to the </span><a href="https://www.ncbi.nlm.nih.gov/datasets/taxonomy/9644/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711"><span>NCBI Datasets</span><strong><span> </span></strong><span>Taxonomy</span></a><span> </span><span>record pages that provide a taxonomy-focused portal to genes, genomes, and additional NCBI resources.  </span><span>&nbsp;</span></li>
</ul><p><span>During this transition, you will have the option to return to the legacy Genome and Assembly record pages. We will remove the legacy pages in early 2024. </span><span>&nbsp;</span></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from short DNA sequence reads. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide will walk you through the steps of genome assembly using state-of-the-art tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly involves piecing together short DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, accurate reads ideal for scaffolding.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use tools like <strong>FastQC</strong> or <strong>MultiQC</strong> to assess the quality of your reads:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq<br />multiqc . </code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
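<p>To make "low-quality bases" concrete: FASTQ stores one quality character per base, and with the common Phred+33 encoding each character maps to a score Q with error probability 10^(-Q/10). The following is a minimal Python sketch under that encoding assumption; the function names and the quality string are made up for illustration and are not part of FastQC.</p>
<pre>
```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(score):
    """Convert a Phred score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


# Made-up quality line: four high-quality bases followed by a decaying tail.
quals = phred_scores("IIII5+#")
print(quals)
print(error_probability(20))
```
</pre>
<p>A score of 20 corresponds to roughly a 1 in 100 chance that the base call is wrong, which is why thresholds around Q20 are common in trimming settings.</p>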
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq \<br />trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \<br />ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36 </code></div>
</div>
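<p>The <code>SLIDINGWINDOW:4:20</code> setting can be pictured as sliding a 4-base window along the read and cutting where the window's mean quality first drops below 20. The Python sketch below is a simplified illustration of that idea, not Trimmomatic's actual implementation; the function name <code>sliding_window_trim</code> and the example qualities are hypothetical.</p>
<pre>
```python
def sliding_window_trim(qualities, window=4, threshold=20):
    """Return how many leading bases to keep (simplified sliding-window trim).

    Scans from the 5' end and cuts at the start of the first window whose
    mean quality is below the threshold. Illustrative helper only.
    """
    for start in range(len(qualities) - window + 1):
        chunk = qualities[start:start + window]
        if threshold > sum(chunk) / window:
            return start  # keep only the bases before the failing window
    return len(qualities)  # no window failed, keep the whole read


# Made-up Phred scores for a 10-base read whose quality decays at the 3' end.
read_quals = [38, 37, 36, 35, 30, 22, 15, 10, 8, 5]
keep = sliding_window_trim(read_quals)
print(keep)
```
</pre>
<p>For this example read, the first window with a mean below 20 starts at index 4, so four bases are kept. A read that short would then be discarded by <code>MINLEN:36</code>.</p>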
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Ideal for long-read datasets.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST generates assembly statistics, such as N50, genome size, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
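<p>N50 is the contig length such that contigs of that length or longer contain at least half of the assembled bases. The following minimal Python sketch shows the calculation; it illustrates the statistic and is not QUAST's code, and the function name <code>n50</code> and the contig lengths are made up.</p>
<pre>
```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (illustrative helper)."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


lengths = [100, 80, 50, 40, 30]  # hypothetical contig lengths in bp
print(n50(lengths))
```
</pre>
<p>Here the total is 300 bp, and the two longest contigs (100 + 80) already cover more than half of it, so the N50 is 80.</p>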
</li>
<li>
<p><strong>BUSCO</strong><br />BUSCO checks genome completeness by identifying conserved genes:</p>
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome </code></div>
</div>
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (for short reads) or <strong>Racon</strong> (for long reads).</p>
<div>
<div dir="ltr"><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta </code></div>
</div>
</li>
<li>
<p><strong>Scaffolding</strong><br />Link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>Opera-LG</strong> if required.</p>
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>Maker</strong> for eukaryotes.</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
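<p>Coverage is commonly reported as average depth: total sequenced bases divided by genome size. A minimal Python sketch, assuming a 4.7 Mb genome (the <code>genomeSize=4.7m</code> value from the Canu example) and made-up read counts; the function name <code>mean_coverage</code> is hypothetical:</p>
<pre>
```python
def mean_coverage(num_reads, read_length, genome_size):
    """Estimate average sequencing depth as total bases / genome size."""
    return num_reads * read_length / genome_size


# Hypothetical run: 2 million reads of 150 bp against a 4.7 Mb genome.
depth = mean_coverage(num_reads=2_000_000, read_length=150, genome_size=4_700_000)
print(f"{depth:.1f}x")
```
</pre>
<p>This is only an estimate from raw read counts; trimming and filtering reduce the effective depth.</p>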
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34328/dfast-a-flexible-prokaryotic-genome-annotation-pipeline-for-faster-genome-publication</guid>
	<pubDate>Tue, 14 Nov 2017 10:26:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34328/dfast-a-flexible-prokaryotic-genome-annotation-pipeline-for-faster-genome-publication</link>
	<title><![CDATA[DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication]]></title>
	<description><![CDATA[<p>We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7,000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 minutes, with rich information such as pseudogenes, translation exceptions, and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future.</p>
<div>Availability and Implementation</div>
<p>The software is implemented in Python 3 and runs in both Python 2.7 and 3.4&ndash; on Macintosh and Linux systems. It is freely available at&nbsp;<a href="https://github.com/nigyta/dfast_core/" target="">https://github.com/nigyta/dfast_core/</a>&nbsp;under the GPLv3 license with external binaries bundled in the software distribution. An on-line version is also available at&nbsp;<a href="https://dfast.nig.ac.jp/" target="">https://dfast.nig.ac.jp/</a>.</p><p>Address of the bookmark: <a href="https://dfast.nig.ac.jp/" rel="nofollow">https://dfast.nig.ac.jp/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36516/metassembler-merging-and-optimizing-de-novo-genome-assemblies</guid>
	<pubDate>Tue, 08 May 2018 04:52:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36516/metassembler-merging-and-optimizing-de-novo-genome-assemblies</link>
	<title><![CDATA[Metassembler: merging and optimizing de novo genome assemblies]]></title>
	<description><![CDATA[<p><span>Metassembler combines multiple whole genome de novo assemblies into a combined consensus assembly using the best segments of the individual assemblies.</span></p>
<p><span><span>Genome assembly projects typically run multiple algorithms in an attempt to find the single best assembly, although those assemblies often have complementary, if untapped, strengths and weaknesses. We present our metassembler algorithm that merges multiple assemblies of a genome into a single superior sequence.&nbsp;</span></span></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/metassembler/?source=directory" rel="nofollow">https://sourceforge.net/projects/metassembler/?source=directory</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40465/airlift-a-methodology-and-tool-for-comprehensively-moving-mappings-and-annotations-from-one-genome-to-another-similar-genome</guid>
	<pubDate>Mon, 23 Dec 2019 10:20:13 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40465/airlift-a-methodology-and-tool-for-comprehensively-moving-mappings-and-annotations-from-one-genome-to-another-similar-genome</link>
	<title><![CDATA[AirLift, a methodology and tool for comprehensively moving mappings and annotations from one genome to another similar genome]]></title>
	<description><![CDATA[<p>We propose AirLift, a methodology and tool for comprehensively moving mappings and annotations from one genome to another similar genome while maintaining the accuracy of a full mapper.</p><p>Address of the bookmark: <a href="https://github.com/CMU-SAFARI/AirLift" rel="nofollow">https://github.com/CMU-SAFARI/AirLift</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43364/ragtag-a-collection-of-software-tools-for-scaffolding-and-improving-modern-genome-assemblies</guid>
	<pubDate>Sat, 11 Sep 2021 00:28:14 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43364/ragtag-a-collection-of-software-tools-for-scaffolding-and-improving-modern-genome-assemblies</link>
	<title><![CDATA[RagTag: a collection of software tools for scaffolding and improving modern genome assemblies]]></title>
	<description><![CDATA[<p>RagTag is a collection of software tools for scaffolding and improving modern genome assemblies. Tasks include:</p>
<ul>
<li>Homology-based misassembly&nbsp;<a href="https://github.com/malonge/RagTag/wiki/correct">correction</a></li>
<li>Homology-based assembly&nbsp;<a href="https://github.com/malonge/RagTag/wiki/scaffold">scaffolding</a>&nbsp;and&nbsp;<a href="https://github.com/malonge/RagTag/wiki/patch">patching</a></li>
<li>Scaffold&nbsp;<a href="https://github.com/malonge/RagTag/wiki/merge">merging</a></li>
</ul><p>Address of the bookmark: <a href="https://github.com/malonge/RagTag" rel="nofollow">https://github.com/malonge/RagTag</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43670/useful-bioinformatics-analysis-tools</guid>
	<pubDate>Thu, 23 Dec 2021 23:10:02 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43670/useful-bioinformatics-analysis-tools</link>
	<title><![CDATA[Useful Bioinformatics Analysis Tools !]]></title>
	<description><![CDATA[<h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=cometa&amp;subpage=about">CoMeta</a></h3><p><strong>Classifier of reads from metagenomic sequencing experiments.</strong></p><p><span>&bull;&nbsp;&nbsp;Kawulok, J., Deorowicz, S.,&nbsp;</span><em>CoMeta: Classification of Metagenomes Using k-mers</em><span>,&nbsp;</span><a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0121453">PLOS ONE,&nbsp;</a><span>2015; 10(4):1&ndash;23,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=CoMSA&amp;subpage=about">CoMSA</a></h3><p><strong>Compressor of multiple sequence alignments of proteins.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Walczyszyn, J., Debudaj-Grabysz, A.,&nbsp;</span><em>CoMSA: compression of protein multiple sequence alignment files</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty619">Bioinformatics,&nbsp;</a><span>2019; 35(2):22&ndash;234,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=dsrc&amp;subpage=about">DSRC</a></h3><p><strong>Compressor of sequencing reads.</strong></p><p><span>&bull;&nbsp;&nbsp;Roguski, L., Deorowicz, S.,&nbsp;</span><em>DSRC 2: Industry-oriented compression of FASTQ files</em><span>,&nbsp;</span><a href="http://bioinformatics.oxfordjournals.org/content/30/15/2213">Bioinformatics,&nbsp;</a><span>2014; 30(15):2213&ndash;2215,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Grabowski, Sz.,&nbsp;</span><em>Compression of DNA sequences in FASTQ format</em><span>,&nbsp;</span><a href="http://bioinformatics.oxfordjournals.org/">Bioinformatics,&nbsp;</a><span>2011; 27(6):860&ndash;862,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=famsa&amp;subpage=about">FAMSA</a></h3><p><strong>Multiple sequence alignment designed for huge families of proteins (even containing hundreds of thousands of 
sequences).</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Debudaj-Grabysz, A., Gudys, A.,&nbsp;</span><em>FAMSA: Fast and accurate multiple sequence alignment of huge protein families</em><span>,&nbsp;</span><a href="http://www.nature.com/articles/srep33964">Scientific Reports,&nbsp;</a><span>2016; 6(33964):</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=fastore&amp;subpage=about">FaStore</a></h3><p><strong>Compressor of FASTQ files.</strong></p><p><span>&bull;&nbsp;&nbsp;Roguski, L., Ochoa, I., Hernaez, M., Deorowicz, S.,&nbsp;</span><em>FaStore - a space-saving solution for raw sequencing data</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty205">Bioinformatics,&nbsp;</a><span>2018; 34(16):2748&ndash;2756,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=fqsqueezer&amp;subpage=about">FQSqueezer</a></h3><p><strong>Experimental high-end compressor of FASTQ files.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S.,&nbsp;</span><em>FQSqueezer: k-mer-based compression of sequencing data</em><span>,&nbsp;</span><a href="https://www.nature.com/articles/s41598-020-57452-6">Scientific Reports,&nbsp;</a><span>2020; 10(578):</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=gdc&amp;subpage=about">GDC</a></h3><p><strong>Compressor of collections of genome sequences.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Danek, A., Niemiec, M.,&nbsp;</span><em>GDC 2: Compression of large collections of genomes</em><span>,&nbsp;</span><a href="http://www.nature.com/srep/2015/150625/srep11565/full/srep11565.html">Scientific Reports,&nbsp;</a><span>2015; 5(11565):1&ndash;12,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Grabowski, Sz.,&nbsp;</span><em>Robust relative compression of genomes with random access</em><span>,&nbsp;</span><a 
href="http://sun.aei.polsl.pl/REFRESH/bioinformatics.oxfordjournals.org/content/27/21/2979.abstract">Bioinformatics,&nbsp;</a><span>2011; 27(21):2979&ndash;2986,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=gtc&amp;subpage=about">GTC</a></h3><p><strong>Genotype databases compressor with support for fast queries.</strong></p><p><span>&bull;&nbsp;&nbsp;Danek, A., Deorowicz, S.,&nbsp;</span><em>GTC: how to maintain huge genotype collections in a compressed form</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty023">Bioinformatics,&nbsp;</a><span>2018; 34(11):1834&ndash;1840,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=gtshark&amp;subpage=about">GTShark</a></h3><p><strong>Genotypes compressor.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Danek, A.,&nbsp;</span><em>GTShark: Genotype compression in large projects</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/btz508">Bioinformatics,&nbsp;</a><span>2019; 35(22):4791&ndash;4793,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=kmc&amp;subpage=about">KMC</a></h3><p><strong>Memory frugal&nbsp;<em>k</em>-mer counter.</strong></p><p><span>&bull;&nbsp;&nbsp;Kokot, M., Długosz, M., Deorowicz, S.,&nbsp;</span><em>KMC 3: counting and manipulating k -mer statistics</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/btx304">Bioinformatics,&nbsp;</a><span>2017; 33(17):2759&ndash;2761,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Kokot, M., Grabowski, Sz., Debudaj-Grabysz, A.,&nbsp;</span><em>KMC 2: Fast and resource-frugal k-mer counting</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/btv022">Bioinformatics,&nbsp;</a><span>2015; 31(10):1569&ndash;1576,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Debudaj-Grabysz, A., Grabowski, Sz.,&nbsp;</span><em>Disk-based k-mer counting on a 
PC</em><span>,&nbsp;</span><a href="http://www.biomedcentral.com/1471-2105/14/160">BMC Bioinformatics,&nbsp;</a><span>2013; 14():Article no. 160,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=kmer-db&amp;subpage=about">Kmer-db</a></h3><p><strong>Tool for estimation of evolutionary distances in a collection of genomes.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Gudys, A., Dlugosz, M., Kokot, M., Danek, A.,&nbsp;</span><em>Kmer-db: instant evolutionary distance estimation</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty610">Bioinformatics,&nbsp;</a><span>2019; 35(1):133&ndash;136,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=mugi&amp;subpage=about">MuGI</a></h3><p><strong>Index allowing queries for a collection of multiple genome sequences.</strong></p><p><span>&bull;&nbsp;&nbsp;Danek, A., Deorowicz, S., Grabowski, Sz.,&nbsp;</span><em>Indexes of Large Genome Collections on a PC</em><span>,&nbsp;</span><a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0109384">PLOS ONE,&nbsp;</a><span>2014; 9(10):e109384,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=orcom&amp;subpage=about">ORCOM</a></h3><p><strong>Experimental compressor of sequencing reads.</strong></p><p><span>&bull;&nbsp;&nbsp;Grabowski, Sz., Deorowicz, S., Roguski, L.,&nbsp;</span><em>Disk-based compression of data from genome sequencing</em><span>,&nbsp;</span><a href="http://bioinformatics.oxfordjournals.org/content/early/2014/12/22/bioinformatics.btu844.abstract">Bioinformatics,&nbsp;</a><span>2014; 31(9):1389&ndash;1395,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=pgsa&amp;subpage=about">PgSA</a></h3><p><strong>Index allowing queries for a collection of sequencing reads.</strong></p><p><span>&bull;&nbsp;&nbsp;Kowalski, T., Grabowski, Sz., Deorowicz, 
S.,&nbsp;</span><em>Indexing arbitrary-length k-mers in sequencing reads</em><span>,&nbsp;</span><a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0133198">PLOS ONE,&nbsp;</a><span>2015; 10(7):1&ndash;16,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=quickprobs&amp;subpage=about">QuickProbs</a></h3><p><strong>Multiple sequence alignment designed especially for GPU.</strong></p><p><span>&bull;&nbsp;&nbsp;Gudys, A., Deorowicz, S.,&nbsp;</span><em>QuickProbs 2: towards rapid construction of high-quality alignments of large protein families</em><span>,&nbsp;</span><a href="http://www.nature.com/articles/srep41553">Scientific Reports,&nbsp;</a><span>2017; 7(41553):</span><br /><span>&bull;&nbsp;&nbsp;Gudys, A., Deorowicz, S.,&nbsp;</span><em>QuickProbs &ndash; A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors</em><span>,&nbsp;</span><a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0088901">PLOS ONE,&nbsp;</a><span>2014; 9(2):e88901,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=reckoner&amp;subpage=about">RECKONER</a></h3><p><strong>Read error corrector.</strong></p><p><span>&bull;&nbsp;&nbsp;Długosz, M., Deorowicz, S.,&nbsp;</span><em>RECKONER: read error corrector based on KMC</em><span>,&nbsp;</span><a href="https://academic.oup.com/bioinformatics/article-abstract/33/7/1086/2843893/RECKONER-read-error-corrector-based-on-KMC">Bioinformatics,&nbsp;</a><span>2017; 33(7):1086&ndash;1089,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=tgc&amp;subpage=about">TGC</a></h3><p><strong>Compressor of collections of genomes given in Variant Call Format (VCF) files.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Danek, A., Grabowski, Sz.,&nbsp;</span><em>Genome compression: a novel approach for large collections</em><span>,&nbsp;</span><a 
href="http://bioinformatics.oxfordjournals.org/content/early/2013/08/29/bioinformatics.btt460">Bioinformatics,&nbsp;</a><span>2013; 29(20):2572&ndash;2578,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=vcfshark&amp;subpage=about">VCFShark</a></h3><p><strong>Compressor of VCF files.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Danek, A.,&nbsp;</span><em>GTShark: Genotype compression in large projects</em><span>,&nbsp;</span><a href="https://www.biorxiv.org/content/10.1101/2020.12.18.423437v1">biorxiv.org,&nbsp;</a><span>2020; ():</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=whisper&amp;subpage=about">Whisper</a></h3><p><strong>Experimental mapper of whole genome sequencing data.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Gudys, A.,&nbsp;</span><em>Whisper 2: indel-sensitive short read mapping</em><span>,&nbsp;</span><a href="https://doi.org/10.1101/2019.12.18.881292">bioRxiv.org,&nbsp;</a><span>2019; :</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz.,&nbsp;</span><em>Whisper: read sorting allows robust mapping of DNA sequencing data</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty927">Bioinformatics,&nbsp;</a><span>2019; 35(12):2043&ndash;2050,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz.,&nbsp;</span><em>Robust mapping of whole genome sequencing data</em><span>,&nbsp;</span><a href="https://meetings.cshl.edu/abstracts.aspx?meet=GENOME&amp;year=17">Poster at The Biology of Genomes Conference,&nbsp;</a><span>2017;</span></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/34864/installing-perl-environment-on-linux</guid>
	<pubDate>Tue, 26 Dec 2017 21:21:50 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/34864/installing-perl-environment-on-linux</link>
	<title><![CDATA[Installing Perl environment on Linux]]></title>
	<description><![CDATA[<p>With&nbsp;<code>plenv</code>, you can easily install several versions of Perl and switch among them. It is installed under your home directory in&nbsp;<code>~/.plenv</code>.</p><h4>Install the latest Perl (with multithreading support) and CPANMinus</h4><pre><code> $ cd
 $ git clone https://github.com/tokuhirom/plenv.git ~/.plenv
 $ git clone https://github.com/tokuhirom/Perl-Build.git ~/.plenv/plugins/perl-build/
 $ echo 'export PATH="$HOME/.plenv/bin:$PATH"' &gt;&gt; ~/.bashrc
 $ echo 'eval "$(plenv init -)"' &gt;&gt; ~/.bashrc
 $ source ~/.bashrc
 $ plenv install 5.18.1 -Dusethreads
 $ plenv rehash
 $ plenv global 5.18.1
 $ plenv install-cpanm
</code></pre><ul>
<li><code>git</code>&nbsp;is a distributed version control and source code management system; here it downloads the files from the GitHub server.</li>
<li><code>echo</code>&nbsp;prints its arguments.</li>
<li><code>&gt;&gt;</code>&nbsp;appends the output to the end of a file, while&nbsp;<code>&gt;</code>&nbsp;overwrites the whole file with the output. Use&nbsp;<code>&gt;</code>&nbsp;with extra care.</li>
<li>On a Linux system, a command produces two types of output: standard output (STDOUT) and standard error (STDERR).&nbsp;<code>1&gt;</code>&nbsp;redirects STDOUT only,&nbsp;<code>2&gt;</code>&nbsp;redirects STDERR only, and&nbsp;<code>&amp;&gt;</code>&nbsp;redirects both. By default,&nbsp;<code>&gt;</code>&nbsp;is the same as&nbsp;<code>1&gt;</code>.</li>
<li><code>exec</code>&nbsp;executes a command, replacing the current shell process.</li>
<li>Remember to build Perl with multithreading support (option&nbsp;<code>-Dusethreads</code>), which is important for many NGS analysis packages (e.g. Trinity). With this setting, Perl software can use multiple CPUs.</li>
<li>Install the CPAN (Comprehensive Perl Archive Network) manager software, CPANMinus, by&nbsp;<code>install-cpanm</code>.</li>
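<li>The redirection operators described above can be tried safely in a scratch directory. This is a minimal sketch, assuming&nbsp;<code>bash</code>&nbsp;(<code>&amp;&gt;</code>&nbsp;is a bash extension); the file names&nbsp;<code>out.txt</code>,&nbsp;<code>err.txt</code>,&nbsp;<code>both.txt</code>&nbsp;and&nbsp;<code>missing</code>&nbsp;are hypothetical and are not part of the plenv setup.<pre><code>#!/usr/bin/env bash
tmp=$(mktemp -d)                        # scratch directory for the demo
echo "first"  &gt; "$tmp/out.txt"          # &gt; creates or overwrites the file
echo "second" &gt;&gt; "$tmp/out.txt"         # &gt;&gt; appends a line to the end
echo "third"  &gt; "$tmp/out.txt"          # &gt; again: both earlier lines are lost
# "$tmp/missing" does not exist, so ls reports an error on STDERR
ls "$tmp/missing" 2&gt; "$tmp/err.txt" || true            # 2&gt; captures STDERR only
ls "$tmp" "$tmp/missing" &amp;&gt; "$tmp/both.txt" || true    # &amp;&gt; captures STDOUT and STDERR
cat "$tmp/out.txt"                      # prints: third
</code></pre>This is why the&nbsp;<code>~/.bashrc</code>&nbsp;lines above use&nbsp;<code>&gt;&gt;</code>: a single&nbsp;<code>&gt;</code>&nbsp;would replace the existing file instead of adding to it.</li>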
</ul><p>You can use&nbsp;<code>plenv global</code>&nbsp;and&nbsp;<code>plenv local</code>&nbsp;to switch between Perl versions and meet the needs of different Perl software.</p><p>For example, if a&nbsp;specific version of Perl is not compatible with your script, you can switch to a different version with:</p><pre><code> $ plenv local 
</code></pre><ul>
<li>This is similar to setting the local version of your scripting language with&nbsp;<code>pyenv</code>&nbsp;or&nbsp;<code>rbenv</code>.</li>
</ul><p>Add the following line to your&nbsp;<code>~/.bashrc</code>&nbsp;file.</p><pre><code>export PERL5LIB="$HOME/.plenv/build/perl-5.18.1/lib"
</code></pre><h4>Install BioPerl and PerlIO::gzip</h4><p>CPANMinus is a very good Perl module manager; using&nbsp;<code>cpanm</code>&nbsp;to install BioPerl can save you a lot of time. Here are some useful modules:</p><pre><code>$ cpanm Bio::Perl
$ cpanm Bio::SearchIO
$ cpanm PerlIO::gzip</code></pre><p><span>For more information, please visit:&nbsp;</span><a href="https://github.com/tokuhirom/plenv">https://github.com/tokuhirom/plenv</a></p>]]></description>
	<dc:creator>biogeek</dc:creator>
</item>

</channel>
</rss>