<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/41734?offset=70</link>
	<atom:link href="https://bioinformaticsonline.com/related/41734?offset=70" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from DNA sequence reads. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide walks you through the steps of genome assembly with widely used tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly involves piecing together the sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, highly accurate reads, well suited to small genomes and to polishing long-read assemblies.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use <strong>FastQC</strong> to assess the quality of your reads and <strong>MultiQC</strong> to aggregate the resulting reports:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq &amp;&amp; multiqc . </code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36 </code></div>
</div>
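<p>After paired-end trimming, the two paired output files should still contain the same number of reads. The sketch below is one way to check that, assuming uncompressed FASTQ files with the usual four lines per record. The helper names <code>count_reads</code> and <code>check_pairs</code> are illustrative and are not part of Trimmomatic or Cutadapt.</p>

```shell
# count_reads: illustrative helper that prints the number of records in an
# uncompressed FASTQ file, assuming four lines per record.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# check_pairs: illustrative helper that compares the read counts of two
# paired files and returns a non-zero status when they differ.
check_pairs() {
    r1=$(count_reads "$1")
    r2=$(count_reads "$2")
    if [ "$r1" -eq "$r2" ]; then
        echo "OK: $r1 read pairs"
    else
        echo "MISMATCH: $r1 vs $r2 reads"
        return 1
    fi
}
```

<p>For example, <code>check_pairs trimmed_R1.fastq trimmed_R2.fastq</code> reports either the shared read count or a mismatch. Compressed input would need to be decompressed first.</p>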
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Ideal for long-read datasets.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST generates assembly statistics, such as N50, genome size, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
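<p>As a rough illustration of what the N50 figure means, the sketch below computes it directly from a FASTA file: contig lengths are sorted from longest to shortest and summed until the running total reaches at least half of the assembly size. The function name <code>n50</code> is an illustrative helper, not part of QUAST, and QUAST's own report remains the reference.</p>

```shell
# n50: illustrative helper (not part of QUAST) that prints the N50 of a FASTA file.
n50() {
    # Step 1: emit one length per sequence, summing wrapped sequence lines.
    awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
         { len += length($0) }
         END { if (seen) print len }' "$1" |
    # Step 2: order contigs from longest to shortest.
    sort -rn |
    # Step 3: accumulate lengths until at least half of the total is covered.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             for (i = 1; NR >= i; i++) {
                 running += lens[i]
                 if (running * 2 >= total) { print lens[i]; exit }
             }
         }'
}
```

<p>Running <code>n50 contigs.fasta</code> prints a single number. For contigs of 8, 6, 4, and 2 bases (20 in total), the two longest contigs together cover at least half of the assembly, so the N50 is 6.</p>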
</li>
<li>
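<p><strong>BUSCO</strong><br />BUSCO checks genome completeness by searching for conserved single-copy genes. Choose the lineage dataset that matches your organism; the example below uses <code>fungi_odb10</code>, whereas a bacterial genome would use <code>bacteria_odb10</code>:</p>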
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome </code></div>
</div>
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (for short reads) or <strong>Racon</strong> (for long reads).</p>
<div>
<div dir="ltr"><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta </code></div>
</div>
</li>
<li>
<p><strong>Scaffolding</strong><br />If required, link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>OPERA-LG</strong>.</p>
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>MAKER</strong> for eukaryotes. For a prokaryotic assembly:</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
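<p>Coverage is usually reported as the total number of sequenced bases divided by the genome size. The sketch below estimates it from an uncompressed, four-line-per-record FASTQ file. The helper name <code>estimate_coverage</code> is illustrative, and the genome size is a value you supply yourself, so the result is only as good as that estimate.</p>

```shell
# estimate_coverage: illustrative helper; prints total sequenced bases divided
# by an assumed genome size (in bases), to one decimal place.
estimate_coverage() {
    fastq=$1
    genome_size=$2
    # In four-line FASTQ, the sequence is the second line of each record.
    awk -v g="$genome_size" 'NR % 4 == 2 { bases += length($0) }
         END { printf "%.1f\n", bases / g }' "$fastq"
}
```

<p>For the 4.7 Mb genome size used in the Canu example, the call would be <code>estimate_coverage reads.fastq 4700000</code>. For paired-end data, run it on each file and add the results.</p>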
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36817/kwip-the-k-mer-weighted-inner-product-a-de-novo-estimator-of-genetic-similarity</guid>
	<pubDate>Tue, 29 May 2018 08:37:53 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36817/kwip-the-k-mer-weighted-inner-product-a-de-novo-estimator-of-genetic-similarity</link>
	<title><![CDATA[kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity]]></title>
	<description><![CDATA[<p>The k-mer Weighted Inner Product.</p>
<p>This software implements a <em>de novo</em>, <em>alignment free</em> measure of sample genetic dissimilarity which operates upon raw sequencing reads. It is able to calculate the genetic dissimilarity between samples without any reference genome, and without assembling one.</p>
<p>De novo estimates of genetic relatedness from next-gen sequencing data: https://kwip.readthedocs.org</p><p>Address of the bookmark: <a href="https://github.com/kdmurray91/kwip" rel="nofollow">https://github.com/kdmurray91/kwip</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37957/base-a-practical-de-novo-assembler-for-large-genomes-using-long-ngs-reads</guid>
	<pubDate>Fri, 19 Oct 2018 07:25:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37957/base-a-practical-de-novo-assembler-for-large-genomes-using-long-ngs-reads</link>
	<title><![CDATA[BASE: a practical de novo assembler for large genomes using long NGS reads]]></title>
	<description><![CDATA[<p><span>BASE is a&nbsp;</span><em>de novo</em><span>&nbsp;assembler for large genomes. It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. BASE uses these seeds to build extension trees, then applies reverse validation to prune branches based on read coverage and paired-end information, yielding high-quality consensus sequences of the reads that share each seed. These consensus sequences are then extended into contigs.</span></p><p>Address of the bookmark: <a href="https://github.com/dhlbh/BASE" rel="nofollow">https://github.com/dhlbh/BASE</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39671/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</guid>
	<pubDate>Sat, 06 Jul 2019 03:48:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39671/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</link>
	<title><![CDATA[Flye: Fast and accurate de novo assembler for single molecule sequencing reads]]></title>
	<description><![CDATA[<p><span>Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PB / ONT reads as input and outputs polished contigs. Flye also includes a special mode for metagenome assembly.</span></p><p>Address of the bookmark: <a href="https://github.com/fenderglass/Flye" rel="nofollow">https://github.com/fenderglass/Flye</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34049/libsvm-a-library-for-support-vector-machines</guid>
	<pubDate>Wed, 02 Aug 2017 06:49:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34049/libsvm-a-library-for-support-vector-machines</link>
	<title><![CDATA[LIBSVM -- A Library for Support Vector Machines]]></title>
	<description><![CDATA[<p><strong>LIBSVM&nbsp;</strong>is integrated software for support vector classification (C-SVC,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#nuandone">nu-SVC</a>), regression (epsilon-SVR,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#nuandone">nu-SVR</a>) and distribution estimation (<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#nuandone">one-class SVM</a>). It supports multi-class classification.</p>
<p>Since version 2.8, it implements an SMO-type algorithm proposed in this paper:<br>R.-E. Fan, P.-H. Chen, and C.-J. Lin.&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/papers/quadworkset.pdf">Working set selection using second order information for training SVM</a>. Journal of Machine Learning Research 6, 1889-1918, 2005. Pseudocode is also given there. (<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f203">how to cite LIBSVM</a>)</p>
<p><span style="color: #ff0000;">Our goal is to help users from other fields to easily use SVM as a tool.&nbsp;</span><strong>LIBSVM&nbsp;</strong>provides a simple interface where users can easily link it with their own programs. Main features of&nbsp;<strong>LIBSVM</strong>&nbsp;include</p>
<ul>
<li>Different SVM formulations</li>
<li>Efficient multi-class classification</li>
<li>Cross validation for model selection</li>
<li>Probability estimates</li>
<li>Various kernels (including precomputed kernel matrix)</li>
<li>Weighted SVM for unbalanced data</li>
<li>Both C++ and&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#java">Java</a>&nbsp;sources</li>
<li><a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#GUI">GUI</a>&nbsp;demonstrating SVM classification and regression</li>
<li><a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#python">Python</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#R">R</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#matlab">MATLAB</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#perl">Perl</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#ruby">Ruby</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#weka">Weka</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#lisp">Common LISP</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#clisp">CLISP</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#haskell">Haskell</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#ocaml">OCaml</a>,&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#labview">LabVIEW</a>, and&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#PHP">PHP</a>&nbsp;interfaces.&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#csharp">C# .NET</a>&nbsp;code and a&nbsp;<a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/#cuda">CUDA</a>&nbsp;extension are available.&nbsp;<br>It is also included in some data mining environments:&nbsp;<a href="http://rapid-i.com/">RapidMiner</a>,&nbsp;<a href="http://pcp.sourceforge.net/">PCP</a>, and&nbsp;<a href="http://lionoso.org/">LIONsolver</a>.</li>
<li>Automatic model selection which can generate contour of cross validation accuracy.</li>
</ul>
<p>Address of the bookmark: <a href="https://www.csie.ntu.edu.tw/~cjlin/libsvm/" rel="nofollow">https://www.csie.ntu.edu.tw/~cjlin/libsvm/</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37529/bokeh-an-interactive-visualization-library-that-targets-modern-web-browsers-for-presentation</guid>
	<pubDate>Fri, 10 Aug 2018 18:43:08 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37529/bokeh-an-interactive-visualization-library-that-targets-modern-web-browsers-for-presentation</link>
	<title><![CDATA[Bokeh: An interactive visualization library that targets modern web browsers for presentation]]></title>
	<description><![CDATA[<p id="about">Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.</p>
<p>To get started using Bokeh to make your visualizations, see the&nbsp;<a href="https://bokeh.pydata.org/en/latest/docs/user_guide.html#userguide">User Guide</a>.</p>
<p>To see examples of how you might use Bokeh with your own data, check out the&nbsp;<a href="https://bokeh.pydata.org/en/latest/docs/gallery.html#gallery">Gallery</a>.</p>
<p>A complete API reference of Bokeh is at&nbsp;<a href="https://bokeh.pydata.org/en/latest/docs/reference.html#refguide">Reference Guide</a>.</p>
<p>If you are interested in contributing to Bokeh, or extending the library, see the&nbsp;<a href="https://bokeh.pydata.org/en/latest/docs/dev_guide.html#devguide">Developer Guide</a>.</p><p>Address of the bookmark: <a href="https://bokeh.pydata.org/en/latest/" rel="nofollow">https://bokeh.pydata.org/en/latest/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40389/sequila-cov-a-fast-and-scalable-library-for-depth-of-coverage-calculations</guid>
	<pubDate>Sun, 15 Dec 2019 10:19:35 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40389/sequila-cov-a-fast-and-scalable-library-for-depth-of-coverage-calculations</link>
	<title><![CDATA[SeQuiLa-cov: A fast and scalable library for depth of coverage calculations]]></title>
	<description><![CDATA[<p><span>The Docker image is available at&nbsp;</span><a href="https://hub.docker.com/r/biodatageeks/" target="">https://hub.docker.com/r/biodatageeks/</a><span>. Supplementary information on the benchmarking procedure, as well as test data, is publicly accessible at the project documentation site&nbsp;</span><a href="http://biodatageeks.org/sequila/benchmarking/benchmarking.html#depth-of-coverage" target="">http://biodatageeks.org/sequila/benchmarking/benchmarking.html#depth-of-coverage</a><span>. An archival copy of the code and supporting data is also available via the GigaScience database, GigaDB.</span></p>
<p>&bull; Project name: SeQuiLa-cov</p>
<p>&bull; Project home page:&nbsp;<a href="http://biodatageeks.org/sequila/" target="">http://biodatageeks.org/sequila/</a></p>
<p>&bull; Source code repository:&nbsp;<a href="https://github.com/ZSI-Bio/bdg-sequila" target="">https://github.com/ZSI-Bio/bdg-sequila</a></p>
<p>&bull; Operating system: Platform independent</p>
<p>&bull; Programming language: Scala</p>
<p>&bull; Other requirements: Docker</p>
<p>&bull; License: Apache License 2.0</p><p>Address of the bookmark: <a href="https://academic.oup.com/gigascience/article/8/8/giz094/5543653" rel="nofollow">https://academic.oup.com/gigascience/article/8/8/giz094/5543653</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35249/gpopsim-a-simulation-tool-for-whole-genome-genetic-data</guid>
	<pubDate>Wed, 17 Jan 2018 03:47:46 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35249/gpopsim-a-simulation-tool-for-whole-genome-genetic-data</link>
	<title><![CDATA[GPOPSIM: a simulation tool for whole-genome genetic data]]></title>
	<description><![CDATA[<p><span>GPOPSIM is a simulation tool for pedigree, phenotype, and genomic data that supports a variety of population and genome structures and trait genetic architectures. It provides flexible parameter settings for users across a wide range of disciplines and, in particular, can simulate multiple genetically correlated traits with desired genetic parameters and underlying genetic architectures.</span></p><p>Address of the bookmark: <a href="https://github.com/SCAU-AnimalGenetics/GPOPSIM" rel="nofollow">https://github.com/SCAU-AnimalGenetics/GPOPSIM</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39837/cactus-a-reference-free-whole-genome-multiple-alignment-program</guid>
	<pubDate>Mon, 12 Aug 2019 07:52:33 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39837/cactus-a-reference-free-whole-genome-multiple-alignment-program</link>
	<title><![CDATA[Cactus: a reference-free whole-genome multiple alignment program]]></title>
	<description><![CDATA[<p>Cactus is a reference-free whole-genome multiple alignment program. The principal algorithms are described here:&nbsp;<a href="https://doi.org/10.1101/gr.123356.111">https://doi.org/10.1101/gr.123356.111</a></p>
<p><span>Cactus uses substantial resources. For primate-sized genomes (3 gigabases each), you should expect Cactus to use approximately 120 CPU-days of compute per genome, with about 120 GB of RAM used at peak. The requirements scale roughly quadratically, so aligning two 1-megabase bacterial genomes takes only 1.5 CPU-hours and 14 GB RAM.</span>&nbsp;</p><p>Address of the bookmark: <a href="https://github.com/ComparativeGenomicsToolkit/cactus" rel="nofollow">https://github.com/ComparativeGenomicsToolkit/cactus</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42148/chromatiblock-scalable-whole-genome-visualisation-of-structural-changes-in-prokaryotes</guid>
	<pubDate>Sat, 22 Aug 2020 05:17:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42148/chromatiblock-scalable-whole-genome-visualisation-of-structural-changes-in-prokaryotes</link>
	<title><![CDATA[chromatiblock: Scalable, whole-genome visualisation of structural changes in prokaryotes]]></title>
	<description><![CDATA[<p>To create a fresh conda environment for chromatiblock, run:</p>
<pre><code>conda create --name chromatiblock
conda activate chromatiblock
conda install chromatiblock --channel conda-forge --channel bioconda
</code></pre>
<p>In future sessions, reactivate this environment before running chromatiblock with&nbsp;<code>conda activate chromatiblock</code>.</p>
<h4>Direct download:</h4>
<p>Alternatively, you can download and run the script from&nbsp;<a href="https://github.com/mjsull/chromatiblock/releases/download/v0.4.1/chromatiblock">here</a>.</p><p>Address of the bookmark: <a href="https://github.com/mjsull/chromatiblock" rel="nofollow">https://github.com/mjsull/chromatiblock</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>

</channel>
</rss>