<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/38212?offset=70</link>
	<atom:link href="https://bioinformaticsonline.com/related/38212?offset=70" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a core task in bioinformatics: it reconstructs an organism's genome from DNA sequence reads. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide walks you through the steps of genome assembly with widely used tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly pieces together overlapping sequence reads generated by sequencing platforms (e.g., short reads from Illumina, long reads from PacBio or Oxford Nanopore) into longer, contiguous sequences called contigs. It can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, accurate reads, well suited to high-accuracy base calls and to polishing long-read assemblies.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use tools like <strong>FastQC</strong> or <strong>MultiQC</strong> to assess the quality of your reads:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq<br />multiqc . </code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
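<p>To make "low-quality bases" concrete, the following minimal Python sketch decodes a FASTQ quality string into Phred scores, assuming the Phred+33 encoding used by current Illumina output. The function names and the example quality string are illustrative and are not part of FastQC.</p>
<pre>
```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into per-base Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Convert a Phred score Q into an error probability, P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Hypothetical quality line from one read: 'I' is Q40, '5' is Q20, '#' is Q2.
scores = phred_scores("II55##")
print(scores)
print(error_probability(20))
```
</pre>
<p>A score of Q20 corresponds to roughly a 1-in-100 chance that the base call is wrong, which is why 20 is a common trimming threshold.</p>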
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36 </code></div>
</div>
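<p>The <code>SLIDINGWINDOW:4:20</code> step scans each read with a 4-base window and cuts the read once the window's average quality falls below 20. The Python sketch below illustrates that idea on a list of Phred scores. It is a simplified illustration, not Trimmomatic's actual implementation, and the function name and example scores are hypothetical.</p>
<pre>
```python
def sliding_window_cut(scores: list[int], window: int = 4, min_avg: float = 20.0) -> int:
    """Return how many leading bases to keep.

    Scan windows from the 5' end and cut at the start of the first window
    whose mean quality drops below min_avg (simplified illustration).
    """
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if min_avg > mean_q:
            return start
    return len(scores)


# Hypothetical read whose quality degrades toward the 3' end.
scores = [38, 38, 37, 36, 30, 22, 12, 8, 5, 2]
keep = sliding_window_cut(scores)
print(keep, scores[:keep])
```
</pre>
<p>In the command above, <code>MINLEN:36</code> then discards any read that ends up shorter than 36 bases after trimming.</p>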
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Ideal for long-read datasets.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST generates assembly statistics, such as N50, total assembly length, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
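<p>As a rough illustration of two of these statistics, the Python sketch below computes N50 and GC content from a list of contig sequences. The helper names and toy contigs are made up for this example; QUAST remains the tool to use on real assemblies.</p>
<pre>
```python
def n50(lengths: list[int]) -> int:
    """Return N50: the contig length at which the running total of lengths,
    taken from longest to shortest, first reaches half of the assembly size."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def gc_content(sequences: list[str]) -> float:
    """Return the fraction of G and C bases across all sequences."""
    total = sum(len(s) for s in sequences)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return gc / total if total else 0.0


# Toy contigs (hypothetical sequences, for illustration only).
contigs = ["ATGC" * 25, "GGCC" * 10, "AT" * 10]
print(n50([len(c) for c in contigs]))
print(gc_content(contigs))
```
</pre>
<p>A higher N50 generally indicates a more contiguous assembly, but it says nothing about correctness, so it is best read alongside completeness checks.</p>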
</li>
<li>
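<p><strong>BUSCO</strong><br />BUSCO estimates genome completeness by searching for conserved single-copy orthologs. Choose the lineage dataset (<code>-l</code>) that matches your organism; <code>fungi_odb10</code> here is only an example:</p>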
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome </code></div>
</div>
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (for short reads) or <strong>Racon</strong> (for long reads).</p>
<div>
<div dir="ltr"><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta </code></div>
</div>
</li>
<li>
<p><strong>Scaffolding</strong><br />Link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>Opera-LG</strong> if required.</p>
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>MAKER</strong> for eukaryotes.</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
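<p>Coverage is commonly estimated as total sequenced bases divided by genome size. The Python sketch below shows that arithmetic. The read count and read length are hypothetical, and the 4.7 Mb genome size simply reuses the value from the Canu example above.</p>
<pre>
```python
def estimated_coverage(read_count: int, mean_read_length: float, genome_size: int) -> float:
    """Estimate sequencing depth as total sequenced bases / genome size."""
    if not genome_size > 0:
        raise ValueError("genome_size must be positive")
    return read_count * mean_read_length / genome_size


# Hypothetical run: 2 million reads of 150 bp against a 4.7 Mb genome.
depth = estimated_coverage(2_000_000, 150, 4_700_000)
print(f"{depth:.1f}x")
```
</pre>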
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41559/dahak-benchmarking-and-containerization-of-tools-for-analysis-of-complex-non-clinical-metagenomes</guid>
	<pubDate>Thu, 09 Apr 2020 04:56:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41559/dahak-benchmarking-and-containerization-of-tools-for-analysis-of-complex-non-clinical-metagenomes</link>
	<title><![CDATA[Dahak: benchmarking and containerization of tools for analysis of complex non-clinical metagenomes.]]></title>
	<description><![CDATA[<p><span>Dahak is a software suite that integrates state-of-the-art open source tools for metagenomic analyses. Tools in the dahak software suite will perform various steps in metagenomic analysis workflows including data pre-processing, metagenome assembly, taxonomic and functional classification, genome binning, and gene assignment. We aim to deliver the analytical framework as a robust and reliable containerized workflow system, which will be free from dependency, installation, and execution problems typically associated with other open-source bioinformatics solutions. This will maximize the transparency, data provenance (i.e., the process of tracing the origins of data and its movement through the workflow), and reproducibility.</span></p>
<p><span>More at&nbsp;<a href="https://dahak-metagenomics.github.io/dahak/">https://dahak-metagenomics.github.io/dahak/</a></span></p><p>Address of the bookmark: <a href="https://github.com/dahak-metagenomics/dahak" rel="nofollow">https://github.com/dahak-metagenomics/dahak</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31714/krona</guid>
	<pubDate>Wed, 22 Mar 2017 04:47:35 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31714/krona</link>
	<title><![CDATA[Krona]]></title>
	<description><![CDATA[<p>Krona allows hierarchical data to be explored with zooming, multi-layered pie charts. Krona charts can be created using an <a href="https://github.com/marbl/Krona/wiki/ExcelTemplate">Excel template</a> or <a href="https://github.com/marbl/Krona/wiki/KronaTools">KronaTools</a>, which includes support for several bioinformatics tools and raw data formats. The interactive charts are self-contained and can be viewed with any modern web browser (see <a href="https://github.com/marbl/Krona/wiki/Browser%20support">Browser support</a>).</p>
<p><a href="http://marbl.github.io/Krona/img/screen_mgrast.png"><img src="https://camo.githubusercontent.com/27b71b1f1832523723c3d14dec764e7ad098438c/687474703a2f2f6d6172626c2e6769746875622e696f2f4b726f6e612f696d672f7468756d625f6d67726173742e706e67" width="210" height="167" alt="image" style="border: 0px;"></a></p><p>Address of the bookmark: <a href="https://github.com/marbl/Krona/wiki" rel="nofollow">https://github.com/marbl/Krona/wiki</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40713/glia-a-graphsmith-waterman-partial-order-alignerrealigner</guid>
	<pubDate>Tue, 28 Jan 2020 04:02:58 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40713/glia-a-graphsmith-waterman-partial-order-alignerrealigner</link>
	<title><![CDATA[Glia: a Graph/Smith-Waterman (partial order) aligner/realigner]]></title>
	<description><![CDATA[<p><span>glia's main use is as a local realigner. It will realign reads to a set of known (or putative) variants in a VCF, both consuming and producing an ordered stream of BAM alignments.&nbsp;</span></p>
<p><span>More at&nbsp;<a href="https://github.com/ekg/glia">https://github.com/ekg/glia</a></span></p>
<pre><code>glia -f ~/human_g1k_v37.fasta -t 20:62900077-62902077 -v variants.vcf.gz \
     -s AAATGTAAACATTTTATAGGGGATTCCCCTAAAAACAAAAAAACTTTCTGGGAAAGATTTTTCAAAAAATAAAA</code></pre><p>Address of the bookmark: <a href="https://github.com/ekg/glia" rel="nofollow">https://github.com/ekg/glia</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44182/collection-of-graph-visualization-tools</guid>
	<pubDate>Wed, 25 Jan 2023 02:57:42 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44182/collection-of-graph-visualization-tools</link>
	<title><![CDATA[Collection of Graph Visualization tools !]]></title>
	<description><![CDATA[<p>Standard approaches to genome inference and analysis relate sequences to a single linear reference genome. This is efficient but has a fundamental problem: Differences from this reference are hard to observe and describe in a coherent way. Variation and sequence are separated.</p>
<p><a href="https://pangenome.github.io/images/genomic-vs-pangenomic-analysis.png"><img src="https://pangenome.github.io/images/genomic-vs-pangenomic-analysis.png" alt="image" width="45%" style="border: 0px; border: 0px;"></a><span>&nbsp;</span><a href="https://pangenome.github.io/images/genomic-vs-pangenomic-models.png"><img src="https://pangenome.github.io/images/genomic-vs-pangenomic-models.png" alt="image" width="54%" style="border: 0px; border: 0px;"></a></p>
<p><a href="https://fungidb.org/fungidb/app/downloads/Current_Release/GultimumBR650/" target="_blank">https://fungidb.org/fungidb/app/downloads/Current_Release/GultimumBR650/</a></p><p>Address of the bookmark: <a href="https://pangenome.github.io/" rel="nofollow">https://pangenome.github.io/</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33791/slactree-svg-large-annotated-circular-tree-drawing</guid>
	<pubDate>Mon, 03 Jul 2017 08:02:56 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33791/slactree-svg-large-annotated-circular-tree-drawing</link>
	<title><![CDATA[slacTree: SVG Large Annotated Circular Tree drawing]]></title>
	<description><![CDATA[<p>A simple, extensible, Perl script for producing figures of large phylogenetic trees.</p>
<ul>
<li>While there are many other tree drawing programs, slacTree was originally written in 2009 to fill a need for producing publication quality figures of circular trees with more than 1000 taxa with custom annotations</li>
<li>Because it is a single Perl script with very few dependencies, it is easy to run, and easy to further customize</li>
<li>SVG is used because it is a scalable format allowing for very small representations of entire trees or highly magnified regions with unlimited resolution</li>
<li>Circular and radial trees are more compact than linear representations</li>
</ul>
<p>Address of the bookmark: <a href="https://github.com/mccrowjp/slacTree" rel="nofollow">https://github.com/mccrowjp/slacTree</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37957/base-a-practical-de-novo-assembler-for-large-genomes-using-long-ngs-reads</guid>
	<pubDate>Fri, 19 Oct 2018 07:25:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37957/base-a-practical-de-novo-assembler-for-large-genomes-using-long-ngs-reads</link>
	<title><![CDATA[BASE: a practical de novo assembler for large genomes using long NGS reads]]></title>
	<description><![CDATA[<p><span>BASE is a new&nbsp;</span><em>de novo</em><span>&nbsp;assembler. It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. These seeds form the basis for BASE to build extension trees and then use reverse validation to prune branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. These consensus sequences are then extended into contigs.</span></p><p>Address of the bookmark: <a href="https://github.com/dhlbh/BASE" rel="nofollow">https://github.com/dhlbh/BASE</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43770/chromeister-an-ultra-fast-heuristic-approach-to-detect-conserved-signals-in-extremely-large-pairwise-genome-comparisons</guid>
	<pubDate>Thu, 03 Feb 2022 04:01:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43770/chromeister-an-ultra-fast-heuristic-approach-to-detect-conserved-signals-in-extremely-large-pairwise-genome-comparisons</link>
	<title><![CDATA[chromeister: An ultra fast, heuristic approach to detect conserved signals in extremely large pairwise genome comparisons.]]></title>
	<description><![CDATA[<p>chromeister: An ultra fast, heuristic approach to detect conserved signals in extremely large pairwise genome comparisons.</p>
<p dir="auto">USAGE:</p>
<ul dir="auto">
<li>-query: sequence A in fasta format</li>
<li>-db: sequence B in fasta format</li>
<li>-out: output matrix</li>
<li>-kmer: integer k&gt;1 (default 32). Use 32 for chromosomes and genomes, and 16 for small bacterial genomes</li>
<li>-diffuse: integer z&gt;0 (default 4). Use 4 for everything; for large plant genomes you can try 1</li>
<li>-dimension: size of the output matrix and plot, integer d&gt;0 (default 1000). Use 1000 for anything smaller than a full genome; 2000 is recommended for full genomes</li>
</ul><p>Address of the bookmark: <a href="https://github.com/estebanpw/chromeister" rel="nofollow">https://github.com/estebanpw/chromeister</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/22761/pit-bioinformatics-group</guid>
  <pubDate>Tue, 16 Jun 2015 14:34:26 -0500</pubDate>
  <link></link>
  <title><![CDATA[PIT Bioinformatics Group]]></title>
  <description><![CDATA[
<p>PIT Bioinformatics Group solves problems in bioinformatics and computational biology. Recently developed online tools:</p>

<p>- Budapest Reference Connectome: View a parametrizable connectome (brain graph).<br />- AmphoraNet: The webserver implementation of the AMPHORA2 workflow for phylogenetic analysis of metagenomic shotgun sequencing data.<br />- AmphoraVizu: Chart visualization for metagenomics analysis tools AMPHORA2 and AmphoraNet.<br />- SCARF: Free online association rule mining tool.</p>

<p>More at: http://pitgroup.org</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42132/squeezemeta-a-fully-automated-metagenomics-pipeline-from-reads-to-bins</guid>
	<pubDate>Mon, 17 Aug 2020 05:25:10 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42132/squeezemeta-a-fully-automated-metagenomics-pipeline-from-reads-to-bins</link>
	<title><![CDATA[SqueezeMeta: a fully automated metagenomics pipeline, from reads to bins]]></title>
	<description><![CDATA[<p>SqueezeMeta is a fully automatic pipeline for metagenomics/metatranscriptomics, covering all steps of the analysis. SqueezeMeta includes multi-metagenome support, allowing the co-assembly of related metagenomes and the retrieval of individual genomes via binning procedures. SqueezeMeta features several unique characteristics:</p>
<ol>
<li>Co-assembly procedure with read mapping for estimation of the abundances of genes in each metagenome</li>
<li>Co-assembly of a large number of metagenomes via merging of individual metagenomes</li>
<li>Includes binning and bin checking, for retrieving individual genomes</li>
<li>The results are stored in a database, where they can be easily exported and shared, and can be inspected anywhere using a web interface.</li>
<li>Internal checks for the assembly and binning steps report on the consistency of contigs and bins, making it possible to spot potential chimeras.</li>
<li>Metatranscriptomic support via mapping of cDNA reads against reference metagenomes</li>
</ol><p>Address of the bookmark: <a href="https://github.com/jtamames/SqueezeMeta" rel="nofollow">https://github.com/jtamames/SqueezeMeta</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>

</channel>
</rss>