<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44474?offset=50</link>
	<atom:link href="https://bioinformaticsonline.com/related/44474?offset=50" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34216/meraculous-de-novo-genome-assembly-with-short-paired-end-reads</guid>
	<pubDate>Tue, 07 Nov 2017 04:36:10 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34216/meraculous-de-novo-genome-assembly-with-short-paired-end-reads</link>
	<title><![CDATA[Meraculous: De Novo Genome Assembly with Short Paired-End Reads]]></title>
	<description><![CDATA[<p><span>We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast&nbsp;</span><em>Pichia stipitis</em><span>. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the&nbsp;</span><em>k</em><span>-mer (deBruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by &sim;280 bp or &sim;3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.</span></p><p>Address of the bookmark: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3158087/" rel="nofollow">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3158087/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36514/evidentialgene-tr2aacds-mrna-transcript-assembly-software</guid>
	<pubDate>Tue, 08 May 2018 04:39:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36514/evidentialgene-tr2aacds-mrna-transcript-assembly-software</link>
	<title><![CDATA[EvidentialGene: tr2aacds, mRNA Transcript Assembly Software]]></title>
	<description><![CDATA[<p><span>EvidentialGene is a genome informatics project, "Evidence Directed Gene Construction for Eukaryotes", for constructing high-quality, accurate gene sets for animals and plants, developed by Don Gilbert at Indiana University; see</span><br><a href="http://arthropods.eugenes.org/EvidentialGene/" target="_blank">http://arthropods.eugenes.org/EvidentialGene/</a><br><br><span>Construction refers to the combination of classical gene prediction and more recent gene assembly methods (de novo and genome-assisted). The basic Evigene methods use available best-of-breed gene prediction and assembly software and combine all evidence for genes (expressed sequences, genome assembly sequences, related-species protein sequences, and any other sources) to annotate and score gene constructions. Over-produced constructions are classified by gene evidence to select the best-quality construction per "locus", using both genome-aligned and gene-transcript-aligned (genome-free) locus identification. All software developed for EvidentialGene is publicly available; see the project wiki/blog for notes.</span></p>
<p><span>Download&nbsp;</span></p>
<p>http://arthropods.eugenes.org/EvidentialGene/trassembly.html</p>
<p>https://sourceforge.net/p/evidentialgene/blog/</p><p>Address of the bookmark: <a href="http://arthropods.eugenes.org/EvidentialGene/trassembly.html" rel="nofollow">http://arthropods.eugenes.org/EvidentialGene/trassembly.html</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36597/gappadder-a-sensitive-approach-for-closing-gaps-on-draft-genomes-with-short-sequence-reads</guid>
	<pubDate>Mon, 14 May 2018 05:25:48 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36597/gappadder-a-sensitive-approach-for-closing-gaps-on-draft-genomes-with-short-sequence-reads</link>
	<title><![CDATA[GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads]]></title>
	<description><![CDATA[<p><span>This software is provided &ldquo;as is&rdquo; without warranty of any kind. In no event shall the author be held responsible for any damage resulting from the use of this software. The program package, including source code, executables, and this documentation, is distributed free of charge. If you use this program in a publication, please cite the following reference:</span><br><span>Chong Chu, Xin Li, and Yufeng Wu. "GAPPadder: A Sensitive Approach for Closing Gaps on Draft Genomes with Short Sequence Reads." bioRxiv (2017): 125534.</span></p><p>Address of the bookmark: <a href="https://github.com/Reedwarbler/GAPPadder" rel="nofollow">https://github.com/Reedwarbler/GAPPadder</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38316/simba-a-genome-assembly-project-management-system</guid>
	<pubDate>Thu, 29 Nov 2018 08:52:25 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38316/simba-a-genome-assembly-project-management-system</link>
	<title><![CDATA[SIMBA: a Genome Assembly Project Management System]]></title>
	<description><![CDATA[<p><span>SIMBA</span><span>, SImple Manager for Bacterial Assemblies, is a web interface for managing bacterial genome assembly projects. SIMBA was created to help bioinformaticians assemble bacterial genomes sequenced on Next-Generation Sequencing (NGS) platforms quickly, easily, and effectively. SIMBA is also an open-source tool: it can be freely downloaded, shared, and modified.</span></p><p>Address of the bookmark: <a href="http://ufmg-simba.sourceforge.net/" rel="nofollow">http://ufmg-simba.sourceforge.net/</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40573/de-novo-genome-assembly-for-illumina-data</guid>
	<pubDate>Mon, 20 Jan 2020 05:13:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40573/de-novo-genome-assembly-for-illumina-data</link>
	<title><![CDATA[De novo Genome Assembly for Illumina Data]]></title>
	<description><![CDATA[<p>Written and maintained by <a href="mailto:simon.gladman@unimelb.edu.au">Simon Gladman</a> - Melbourne Bioinformatics (formerly VLSCI)</p>
<p>Protocol Overview / Introduction</p>
<p>In this protocol we discuss and outline the process of de novo assembly for small to medium sized genomes.</p>
<p>Address of the bookmark: <a href="https://www.melbournebioinformatics.org.au/tutorials/tutorials/assembly/assembly-protocol/" rel="nofollow">https://www.melbournebioinformatics.org.au/tutorials/tutorials/assembly/assembly-protocol/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/42626/spades-team-announce-new-version-spades-v315</guid>
	<pubDate>Fri, 15 Jan 2021 10:24:27 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/42626/spades-team-announce-new-version-spades-v315</link>
	<title><![CDATA[SPAdes team announces new version SPAdes v3.15]]></title>
	<description><![CDATA[<p>The SPAdes team has announced SPAdes 3.15.0. This release includes the following new features:&nbsp;<br />- coronaSPAdes pipeline for assembling full-length coronaviridae genomes from transcriptomic and metatranscriptomic data;&nbsp;<br />- Meta-Viral and RNA-Viral pipelines for identifying viral genomes in metagenomic and metatranscriptomic data;&nbsp;<br />- a new algorithm for using trusted contigs;&nbsp;<br />- a switch to the mimalloc memory allocator;&nbsp;<br />- plasmidSPAdes and bgcSPAdes now accept an assembly graph as input;&nbsp;<br />- important improvements and fixes to the metaplasmid pipeline;&nbsp;<br />- multiple performance improvements in the graph simplification and repeat resolution procedures.&nbsp;<br />Please consider updating.</p><p>Check out more at&nbsp;<a href="https://cab.spbu.ru/software/spades/">https://cab.spbu.ru/software/spades/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43315/genome-assembly-workshop-2020</guid>
	<pubDate>Wed, 25 Aug 2021 04:30:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43315/genome-assembly-workshop-2020</link>
	<title><![CDATA[Genome Assembly Workshop 2020]]></title>
	<description><![CDATA[<p><span>Our team offers custom bioinformatics services to academic and private organizations. We have a strong academic background with a focus on cutting-edge, open-source software. We replicate standard analysis pipelines (best practices) when appropriate and develop novel applications and pipelines when needed; in either case, we always emphasize biological interpretation of the data.</span></p>
<p><span>More at&nbsp;<a href="https://ucdavis-bioinformatics-training.github.io/">https://ucdavis-bioinformatics-training.github.io/</a></span></p><p>Address of the bookmark: <a href="https://ucdavis-bioinformatics-training.github.io/2020-Genome_Assembly_Workshop/snakemake/snakemake_intro" rel="nofollow">https://ucdavis-bioinformatics-training.github.io/2020-Genome_Assembly_Workshop/snakemake/snakemake_intro</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44223/ale-assembly-likelihood-estimator</guid>
	<pubDate>Wed, 08 Mar 2023 01:39:33 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44223/ale-assembly-likelihood-estimator</link>
	<title><![CDATA[ALE: Assembly Likelihood Estimator]]></title>
	<description><![CDATA[<p>In IGV, load the assembly, the BAM file, and the ALE scores. You can convert the .ale file to a set of .wig files with ale2wiggle.py, which IGV can read directly. Depending on your genome size, you may want to convert the .wig files to BigWig format.</p><p>Address of the bookmark: <a href="https://github.com/sc932/ALE" rel="nofollow">https://github.com/sc932/ALE</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from DNA sequencing reads. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide walks you through the steps of genome assembly using widely used tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly involves piecing together the DNA sequence reads generated by sequencing platforms (short reads from Illumina, long reads from PacBio or Oxford Nanopore) into longer, contiguous sequences called contigs. It can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, highly accurate reads, useful for base-level accuracy (polishing) and, with paired-end libraries, for scaffolding.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use <strong>FastQC</strong> to assess the quality of your reads and <strong>MultiQC</strong> to aggregate the FastQC reports into a single summary:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq</code><br /><code>multiqc .</code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
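<p>FastQC's "per base sequence quality" plot summarizes Phred scores by read position. The Python sketch below shows the underlying arithmetic for plain four-line FASTQ records, assuming the Phred+33 encoding used by current Illumina pipelines. The function names are illustrative, and this is not FastQC's implementation.</p>

```python
"""Per-position mean Phred quality for 4-line FASTQ records (sketch)."""
from io import StringIO
from typing import Iterator, List, TextIO


def quality_lines(handle: TextIO) -> Iterator[str]:
    """Yield the quality string of each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        handle.readline()  # sequence line (unused here)
        handle.readline()  # '+' separator line
        yield handle.readline().rstrip("\n")


def per_base_mean_quality(handle: TextIO, offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (Phred+33 assumed by default)."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_lines(handle):
        for i, char in enumerate(qual):
            if i == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(char) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Two toy reads; 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2 in Phred+33.
    demo = StringIO("@r1\nACGT\n+\nIII#\n@r2\nACG\n+\n?I5\n")
    print(per_base_mean_quality(demo))  # [35.0, 40.0, 30.0, 2.0]
```

<p>A sharp drop in mean quality toward the 3' end is the usual signal that quality trimming is needed.</p>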
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \<br />ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36 </code></div>
</div>
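<p>To make the quality-trimming steps concrete, here is a simplified Python sketch of what LEADING:3, TRAILING:3, SLIDINGWINDOW:4:20 and MINLEN:36 do to a single read whose Phred scores are given as integers. It is an illustration under those assumptions, not Trimmomatic's code. The real tool differs in details, and it also handles adapter clipping (ILLUMINACLIP) and read pairing.</p>

```python
"""Simplified LEADING / TRAILING / SLIDINGWINDOW / MINLEN trimming (sketch)."""
from typing import List, Optional, Tuple


def trim_read(seq: str, quals: List[int], leading: int = 3, trailing: int = 3,
              window: int = 4, min_mean: float = 20.0,
              min_len: int = 36) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed (sequence, qualities), or None if the read is dropped."""
    start, end = 0, len(seq)
    # LEADING:3 - remove 5' bases with quality below `leading`.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING:3 - remove 3' bases with quality below `trailing`.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW:4:20 - scan 4-base windows from the 5' end and cut the read
    # at the start of the first window whose mean quality is below 20.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_mean:
            end = i
            break
    # MINLEN:36 - drop reads that are now shorter than `min_len`.
    if end - start < min_len:
        return None
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    print(trim_read("ACGTACGTAC", [2, 30, 30, 30, 30, 30, 10, 10, 10, 10], min_len=4))
```

<p>In the example, the Q2 first base is removed, the read is cut where the first 4-base window averages below 20, and the result is <code>('CGTA', [30, 30, 30, 30])</code>. With the default MINLEN of 36, the same read would be discarded.</p>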
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Ideal for long-read datasets.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST reports assembly statistics such as the number of contigs, total assembly length, N50, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
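<p>N50 is the length N such that contigs of length N or longer contain at least half of the total assembly length. The Python sketch below computes it, along with the contig count, total length and GC content, from a FASTA file. QUAST ignores contigs shorter than 500 bp by default, so its numbers can differ from a raw calculation like this one.</p>

```python
"""Contig count, total length, N50 and GC% from a FASTA file (sketch)."""
from io import StringIO
from typing import Dict, List, TextIO


def read_fasta(handle: TextIO) -> List[str]:
    """Return the sequences of a (possibly multi-line) FASTA file."""
    seqs: List[str] = []
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                seqs.append("".join(chunks))
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        seqs.append("".join(chunks))
    return seqs


def n50(lengths: List[int]) -> int:
    """Length at which the cumulative sum (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(seqs: List[str]) -> Dict[str, float]:
    lengths = [len(s) for s in seqs]
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    acgt = sum(s.count(b) for s in seqs for b in "ACGT")  # ignores N and IUPAC codes
    return {
        "contigs": len(seqs),
        "total_length": sum(lengths),
        "N50": n50(lengths),
        "GC_percent": 100.0 * gc / acgt if acgt else 0.0,
    }


if __name__ == "__main__":
    demo = StringIO(">c1\nGGCC\nAT\n>c2\nAT\n")
    print(assembly_stats(read_fasta(demo)))
```

<p>For contig lengths of 100, 80, 50, 30 and 20 bp (280 bp in total), the two longest contigs reach 180 bp, which is at least half of the total, so N50 is 80 bp.</p>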
</li>
<li>
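<p><strong>BUSCO</strong><br />BUSCO estimates genome completeness by searching for conserved, near-universal single-copy orthologs. Choose the lineage dataset (<code>-l</code>) that matches your organism; for a bacterial genome, for example, use <code>bacteria_odb10</code> instead of <code>fungi_odb10</code>:</p>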
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome </code></div>
</div>
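<p>BUSCO reports its result as a one-line summary. In it, C (complete) is split into S (single-copy) and D (duplicated), followed by F (fragmented), M (missing) and n (the number of BUSCO genes searched). Below is a minimal Python sketch for extracting those fields from such a line and sanity-checking them. The example line is made up, and the tolerances only allow for BUSCO's rounding to one decimal place.</p>

```python
"""Parse a BUSCO one-line summary such as C:..%[S:..%,D:..%],F:..%,M:..%,n:.. (sketch)."""
import re
from typing import Dict


def parse_busco_summary(line: str) -> Dict[str, float]:
    """Map every KEY:value field in the summary line to a float."""
    return {key: float(value) for key, value in re.findall(r"([A-Za-z]+):([\d.]+)", line)}


def summary_is_consistent(fields: Dict[str, float]) -> bool:
    """C should equal S + D, and C + F + M should be ~100 (values are rounded)."""
    complete_ok = abs(fields["S"] + fields["D"] - fields["C"]) < 0.2
    total_ok = abs(fields["C"] + fields["F"] + fields["M"] - 100.0) < 0.5
    return complete_ok and total_ok


if __name__ == "__main__":
    example = "C:98.4%[S:97.6%,D:0.8%],F:0.8%,M:0.8%,n:124"  # made-up values
    fields = parse_busco_summary(example)
    print(fields["C"], int(fields["n"]), summary_is_consistent(fields))
```

<p>A high D can point to duplicated regions, such as uncollapsed haplotypes. High F or M values suggest a fragmented or incomplete assembly.</p>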
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
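<p>Bandage draws the graph. For a similar overview from the command line, the Python sketch below counts segments (<code>S</code> lines), links (<code>L</code> lines) and connected components in a GFA1 file using union-find. It ignores link orientation and all other record types, so treat the result as a rough summary; many small components usually indicate a fragmented assembly.</p>

```python
"""Segments, links and connected components of a GFA1 assembly graph (sketch)."""
from io import StringIO
from typing import Dict, Set, TextIO, Tuple


def gfa_summary(handle: TextIO) -> Tuple[int, int, int]:
    """Return (segments, links, connected components); link orientation is ignored."""
    parent: Dict[str, str] = {}
    segments: Set[str] = set()
    links = 0

    def find(node: str) -> str:
        # Union-find root lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments.add(fields[1])
            parent.setdefault(fields[1], fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            links += 1
            a, b = fields[1], fields[3]
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two components
    components = len({find(node) for node in parent})
    return len(segments), links, components


if __name__ == "__main__":
    demo = StringIO("S\ta\tACGT\nS\tb\tGGCC\nS\tc\tTTTT\nL\ta\t+\tb\t-\t0M\n")
    print(gfa_summary(demo))  # (3, 1, 2)
```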
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (for short reads) or <strong>Racon</strong> (for long reads).</p>
<div>
<div dir="ltr"><code>minimap2 -ax map-ont contigs.fasta long_reads.fasta &gt; mapped_reads.sam</code><br /><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta</code></div>
</div>
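<p>Polishers use reads aligned to a draft assembly to correct its errors. As a deliberately simplified illustration, the Python sketch below performs majority-vote consensus over ungapped read alignments and corrects substitutions only. This is not how Racon works (it builds a partial-order alignment consensus), nor how Pilon works (it also corrects indels and can fill gaps). The input format and the <code>min_depth</code> threshold are assumptions made for the example.</p>

```python
"""Toy majority-vote polishing over ungapped read alignments (substitutions only)."""
from collections import Counter
from typing import List, Tuple


def majority_polish(draft: str, alignments: List[Tuple[int, str]],
                    min_depth: int = 3) -> str:
    """Replace each draft base by the most common aligned read base.

    `alignments` holds (0-based start on the draft, aligned read bases) pairs.
    Positions covered by fewer than `min_depth` reads keep the draft base.
    """
    piles: List[Counter] = [Counter() for _ in draft]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(draft):
                piles[pos][base] += 1
    polished = []
    for draft_base, pile in zip(draft, piles):
        if sum(pile.values()) >= min_depth:
            polished.append(pile.most_common(1)[0][0])
        else:
            polished.append(draft_base)
    return "".join(polished)


if __name__ == "__main__":
    reads = [(0, "ACGAT"), (0, "ACGAT"), (1, "CGATA"), (2, "GA")]
    print(majority_polish("ACGTTA", reads))  # ACGATA: the T at position 3 becomes A
```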
</li>
<li>
<p><strong>Scaffolding</strong><br />Link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>Opera-LG</strong> if required.</p>
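<p>Once a scaffolder has ordered and oriented the contigs and estimated the gaps between them (for example, from paired-read insert sizes), writing the scaffold sequence is straightforward: reverse-complement contigs on the minus strand and join them with runs of N. The Python sketch below assumes a hypothetical layout of (contig id, strand, gap) tuples. The 10-N minimum gap is an arbitrary choice, not a value taken from SSPACE or Opera-LG.</p>

```python
"""Write a scaffold from ordered, oriented contigs and estimated gaps (sketch)."""
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def build_scaffold(contigs: Dict[str, str],
                   layout: List[Tuple[str, str, int]],
                   min_gap: int = 10) -> str:
    """Join contigs in `layout` order.

    Each layout item is (contig id, '+' or '-', estimated gap to the next contig).
    Gaps shorter than `min_gap` (including negative estimates) are padded to
    `min_gap` Ns; the gap value of the last contig is ignored.
    """
    parts: List[str] = []
    for i, (name, strand, gap) in enumerate(layout):
        seq = contigs[name]
        parts.append(seq if strand == "+" else reverse_complement(seq))
        if i < len(layout) - 1:
            parts.append("N" * max(gap, min_gap))
    return "".join(parts)


if __name__ == "__main__":
    contigs = {"ctg1": "AAAC", "ctg2": "GGTT"}
    print(build_scaffold(contigs, [("ctg1", "+", 12), ("ctg2", "-", 0)]))
    # AAAC + 12 Ns + AACC
```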
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>MAKER</strong> for eukaryotes.</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
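<p>Prokka writes its annotation in several formats. One is a GFF3 file (here <code>annotation_output/genome.gff</code>) with the contig sequences appended after a <code>##FASTA</code> line. The short Python sketch below tallies the annotated feature types (CDS, tRNA, rRNA and so on) in such a file. It reads only the nine-column feature lines and is an illustration, not a full GFF3 parser; the demo records are toy lines.</p>

```python
"""Count feature types in a Prokka-style GFF3 file (sketch)."""
from collections import Counter
from io import StringIO
from typing import TextIO


def count_gff_features(handle: TextIO) -> Counter:
    """Tally column 3 (feature type) of the nine-column GFF3 feature lines."""
    counts: Counter = Counter()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    demo = StringIO(
        "##gff-version 3\n"
        "ctg1\t.\tCDS\t1\t90\t.\t+\t0\tID=g1\n"
        "ctg1\t.\ttRNA\t100\t172\t.\t-\t.\tID=g2\n"
        "ctg1\t.\tCDS\t200\t290\t.\t+\t0\tID=g3\n"
        "##FASTA\n>ctg1\nACGT\n"
    )
    print(dict(count_gff_features(demo)))  # {'CDS': 2, 'tRNA': 1}
```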
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
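<p>Average sequencing coverage, a common metadata field, can be estimated as the total number of sequenced bases divided by the genome size. The Python sketch below counts the bases in four-line FASTQ records and applies that formula. As a worked example, two million 150 bp reads (300 Mb) over a 4.7 Mb genome give roughly 63.8&times; coverage.</p>

```python
"""Estimate average sequencing coverage from FASTQ files (sketch)."""
from io import StringIO
from typing import TextIO


def fastq_total_bases(handle: TextIO) -> int:
    """Sum the sequence-line lengths of 4-line FASTQ records."""
    total = 0
    for index, line in enumerate(handle):
        if index % 4 == 1:  # line 2 of each record holds the bases
            total += len(line.strip())
    return total


def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


if __name__ == "__main__":
    # Worked example: 2,000,000 reads x 150 bp over a 4.7 Mb genome.
    print(round(mean_coverage(2_000_000 * 150, 4_700_000), 1))  # 63.8
    print(fastq_total_bases(StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIII\n")))  # 7
```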
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37524/fmlrc-a-long-read-error-correction-tool-using-the-multi-string-burrows-wheeler-transform</guid>
	<pubDate>Fri, 10 Aug 2018 13:29:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37524/fmlrc-a-long-read-error-correction-tool-using-the-multi-string-burrows-wheeler-transform</link>
	<title><![CDATA[FMLRC: a long-read error correction tool using the multi-string Burrows Wheeler Transform]]></title>
	<description><![CDATA[<p><span>FMLRC, or FM-index Long Read Corrector, is a tool for performing hybrid correction of long-read sequencing data using the BWT and FM-index of short-read sequencing data. Given a BWT of the short-read sequencing data, FMLRC builds an FM-index and uses it as an implicit de Bruijn graph. Each long read is then corrected independently by identifying low-frequency k-mers in the long read and replacing them with the closest matching high-frequency k-mers in the implicit de Bruijn graph. In contrast to other de Bruijn graph-based implementations, FMLRC is not restricted to a particular k-mer size and instead uses a two-pass method with both a short "k-mer" and a longer "K-mer". This allows FMLRC to correct through low-complexity regions that are computationally difficult for short k-mers.</span></p><p>Address of the bookmark: <a href="https://github.com/holtjma/fmlrc" rel="nofollow">https://github.com/holtjma/fmlrc</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>

</channel>
</rss>