<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/41501?offset=260</link>
	<atom:link href="https://bioinformaticsonline.com/related/41501?offset=260" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38316/simba-a-genome-assembly-project-management-system</guid>
	<pubDate>Thu, 29 Nov 2018 08:52:25 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38316/simba-a-genome-assembly-project-management-system</link>
	<title><![CDATA[SIMBA: a Genome Assembly Project Management System]]></title>
	<description><![CDATA[<p><span>SIMBA</span><span>, SImple Manager for Bacterial Assemblies, is a Web interface for managing assembly projects of bacterial genomes. SIMBA was created to help bioinformaticians assemble bacterial genomes sequenced with Next-Generation Sequencing (NGS) platforms quickly, easily and effectively. SIMBA is also an open-source tool, i.e., it can be freely downloaded, shared and modified.</span></p><p>Address of the bookmark: <a href="http://ufmg-simba.sourceforge.net/" rel="nofollow">http://ufmg-simba.sourceforge.net/</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40573/de-novo-genome-assembly-for-illumina-data</guid>
	<pubDate>Mon, 20 Jan 2020 05:13:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40573/de-novo-genome-assembly-for-illumina-data</link>
	<title><![CDATA[De novo Genome Assembly for Illumina Data]]></title>
	<description><![CDATA[<p>Written and maintained by <a href="mailto:simon.gladman@unimelb.edu.au">Simon Gladman</a> - Melbourne Bioinformatics (formerly VLSCI)</p>
<p>Protocol Overview / Introduction</p>
<p>In this protocol we discuss and outline the process of de novo assembly for small to medium sized genomes.</p>
<p>https://www.melbournebioinformatics.org.au/tutorials/tutorials/assembly/assembly-protocol/</p><p>Address of the bookmark: <a href="https://www.melbournebioinformatics.org.au/tutorials/tutorials/assembly/assembly-protocol/" rel="nofollow">https://www.melbournebioinformatics.org.au/tutorials/tutorials/assembly/assembly-protocol/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/43419/senior-bioinformatician-assembly-moore-aquatic-symbiosis-project-tree-of-life</guid>
  <pubDate>Sat, 02 Oct 2021 00:28:30 -0500</pubDate>
  <link></link>
  <title><![CDATA[Senior Bioinformatician (Assembly) Moore Aquatic Symbiosis Project Tree of Life]]></title>
  <description><![CDATA[
<p>You will have some previous experience with genome bioinformatics or other large-scale scientific data analysis, or be a newly qualified graduate student with data science skills and an interest in DNA sequence data. While desirable, previous experience with DNA sequencing data is not strictly necessary for the position. We have a strong publication record and a culture of producing open data resources and open source software. This role requires an investigative and solution-oriented mindset and excellent communication skills to work effectively within large national and international consortia.</p>

<p>More at https://jobs.sanger.ac.uk/vacancy/senior-bioinformatician-assembly-moore-aquatic-symbiosis-project-tree-of-life-458923.html</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/44342/ncbi-datasets%E2%80%AFpages</guid>
	<pubDate>Wed, 12 Jul 2023 06:29:31 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/44342/ncbi-datasets%E2%80%AFpages</link>
	<title><![CDATA[NCBI Datasets pages]]></title>
	<description><![CDATA[<p>Update! Assembly and Genome record pages now redirect to the new NCBI Datasets pages. NCBI Datasets is a new resource that makes it easier to find and download genome data. Learn more: https://ncbiinsights.ncbi.nlm.nih.gov/2023/07/11/ncbi-datasets-genome-assembly-pages/&nbsp;<a href="https://www.linkedin.com/feed/hashtag/?keywords=ncbicgr&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7084592728260386816">#NCBICGR</a></p><p><span>Effective July 10, 2023, NCBI&rsquo;s Assembly and Genome record pages now redirect to&nbsp;</span>new <a href="https://www.ncbi.nlm.nih.gov/datasets/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711">NCBI Datasets</a> <span>pages. As&nbsp;</span><a href="https://ncbiinsights.ncbi.nlm.nih.gov/2023/03/07/ncbi-datasets-genome-taxonomy-pages/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711">previously announced</a><span>, these updates are part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.</span></p><h5>The following pages have been updated:</h5><ul>
<li><span>The NCBI Assembly record pages now redirect to the new </span><a href="https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_023065955.2/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711"><span>NCBI Datasets</span><strong><span> </span></strong><span>Genome</span></a><span> </span><span>record pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST. </span><span>&nbsp;</span></li>
<li><span>The NCBI</span><strong> </strong><span>Genome record pages now redirect to the </span><a href="https://www.ncbi.nlm.nih.gov/datasets/taxonomy/9644/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711"><span>NCBI Datasets</span><strong><span> </span></strong><span>Taxonomy</span></a><span> </span><span>record pages that provide a taxonomy-focused portal to genes, genomes, and additional NCBI resources.  </span><span>&nbsp;</span></li>
</ul><p><span>During this transition, you will have the option to return to the legacy Genome and Assembly record pages. We will remove the legacy pages in early 2024. </span><span>&nbsp;</span></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from DNA sequence reads that are much shorter than the genome itself. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide will walk you through the steps of genome assembly using state-of-the-art tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly involves piecing together the DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, accurate reads, well suited to base-level accuracy and to polishing long-read assemblies.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use tools like <strong>FastQC</strong> or <strong>MultiQC</strong> to assess the quality of your reads:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq<br />multiqc .</code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq \<br />trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \<br />ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36</code></div>
</div>
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Ideal for long-read datasets.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST generates assembly statistics, such as N50, total assembly length, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
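<p>To make the N50 statistic concrete, here is a minimal sketch that computes it from a FASTA file using only <code>awk</code> and <code>sort</code>. The function name <code>n50</code> is hypothetical and this is not how QUAST itself is implemented; it only illustrates the definition.</p>

```shell
# n50 FILE: print the N50 of the contigs in a FASTA file.
# N50 is the largest length L such that contigs of length L or more
# together contain at least half of all assembled bases.
n50() {
  awk '
    /^>/ { if (len > 0) print len; len = 0; next }  # header line: emit previous contig length
    { len += length($0) }                           # sequence line: add to current contig
    END { if (len > 0) print len }                  # emit the last contig
  ' "$1" | sort -rn | awk '
    { lens[NR] = $1; total += $1 }                  # lengths arrive longest first
    END {
      for (i = 1; NR >= i; i++) {
        running += lens[i]
        if (running * 2 >= total) { print lens[i]; exit }
      }
    }
  '
}
```

<p>For contigs of 10, 5, 4, 3 and 2 bp (24 bp in total), <code>n50 contigs.fasta</code> prints 5: the two longest contigs together hold 15 bp, which is the first running total to reach half of 24.</p>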
</li>
<li>
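<p><strong>BUSCO</strong><br />BUSCO checks genome completeness by searching for conserved single-copy genes from a lineage dataset. Choose the dataset that matches your organism; <code>fungi_odb10</code> below is only an example:</p>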
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome </code></div>
</div>
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (for short reads) or <strong>Racon</strong> (for long reads).</p>
<div>
<div dir="ltr"><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta </code></div>
</div>
</li>
<li>
<p><strong>Scaffolding</strong><br />Link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>Opera-LG</strong> if required.</p>
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>Maker</strong> for eukaryotes.</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
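<p>Coverage is usually reported as total sequenced bases divided by genome size. As a rough sketch, assuming an uncompressed FASTQ file with exactly four lines per record, it can be estimated as follows. The function name <code>coverage_depth</code> is hypothetical, and the genome size must be a non-zero number of bases.</p>

```shell
# coverage_depth FASTQ GENOME_SIZE: estimate mean sequencing depth.
#   FASTQ        uncompressed FASTQ file, four lines per record (assumption)
#   GENOME_SIZE  genome size in bases, e.g. 4700000 for a 4.7 Mb genome
coverage_depth() {
  awk -v genome="$2" '
    NR % 4 == 2 { bases += length($0) }      # second line of each record is the sequence
    END { printf "%.1f\n", bases / genome }  # depth = total bases / genome size
  ' "$1"
}
```

<p>For example, <code>coverage_depth reads.fastq 4700000</code> reports the depth for a 4.7 Mb genome. For paired-end data, add the values of both files.</p>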
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36618/lamsa-fast-split-read-alignment-with-long-approximate-matches</guid>
	<pubDate>Tue, 15 May 2018 04:44:42 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36618/lamsa-fast-split-read-alignment-with-long-approximate-matches</link>
	<title><![CDATA[LAMSA: fast split read alignment with long approximate matches]]></title>
	<description><![CDATA[LAMSA (Long Approximate Matches-based Split Aligner) is a novel split alignment approach that is fast and handles structural variant (SV) events well. It is well suited to aligning long reads (thousands of base pairs or more).

LAMSA takes advantage of the rareness of SVs to implement a specifically designed two-step strategy. That is, LAMSA first splits the read into relatively long fragments and aligns them co-linearly to resolve small variations and sequencing errors and to mitigate the effect of repeats. The alignments of the fragments are then used in a sparse dynamic programming (SDP)-based split alignment approach to handle large or non-co-linear variants.

We benchmarked LAMSA on simulated and real datasets with various read lengths and sequencing error rates. The results demonstrate that it is substantially faster than state-of-the-art long-read aligners, while also handling various categories of SVs well.

LAMSA is open source and free for non-commercial use.

LAMSA was designed mainly by Bo Liu &amp; Yan Gao and developed by Yan Gao at the Center for Bioinformatics, Harbin Institute of Technology, China.<p>Address of the bookmark: <a href="https://github.com/hitbc/LAMSA" rel="nofollow">https://github.com/hitbc/LAMSA</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37457/nanofilt-filtering-and-trimming-of-long-read-sequencing-data</guid>
	<pubDate>Mon, 30 Jul 2018 12:01:52 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37457/nanofilt-filtering-and-trimming-of-long-read-sequencing-data</link>
	<title><![CDATA[nanofilt: Filtering and trimming of long read sequencing data]]></title>
	<description><![CDATA[<p>Filtering on quality and/or read length, and optional trimming after passing filters.<br>Reads from stdin, writes to stdout.</p>
<p>Intended to be used:</p>
<ul>
<li>directly after fastq extraction</li>
<li>prior to mapping</li>
<li>in a stream between extraction and mapping</li>
</ul>
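<p>To illustrate the stdin-to-stdout streaming pattern described above, here is a minimal stand-in written with <code>awk</code>. It is not nanofilt and does not reproduce its options; the function name <code>filter_min_length</code> is hypothetical, it filters on read length only, and it assumes four-line FASTQ records.</p>

```shell
# filter_min_length MIN: keep FASTQ records whose sequence has at least MIN bases.
# Reads FASTQ from stdin and writes the surviving records to stdout,
# so it can sit in a pipe between extraction and mapping.
filter_min_length() {
  awk -v min="$1" '
    { rec[NR % 4] = $0 }                 # buffer the current four-line record
    NR % 4 == 0 {                        # fourth line read: record is complete
      if (length(rec[2]) >= min)
        print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
    }
  '
}
```

<p>Used in a stream, this looks like <code>cat reads.fastq | filter_min_length 500</code>, with the output piped straight into the next tool.</p>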
<p>https://github.com/wdecoster/nanofilt</p><p>Address of the bookmark: <a href="https://github.com/wdecoster/nanofilt" rel="nofollow">https://github.com/wdecoster/nanofilt</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35057/ectools-long-read-correction-and-other-correction-tools</guid>
	<pubDate>Fri, 05 Jan 2018 04:02:22 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35057/ectools-long-read-correction-and-other-correction-tools</link>
	<title><![CDATA[ECTOOLS: Long Read Correction and other Correction tools]]></title>
	<description><![CDATA[<p>Long Read Correction and other Correction tools</p>
<p>This package is a loose collection of scripts. To run the correction<br>routine see the section below. Descriptions of the other scripts<br>are at the bottom of this file.</p>
<p>Contact: gurtowsk@cshl.edu</p>
<p>In short, the correction algorithm takes as input the unitigs from a short read assembly and uses them to correct long read data. More background information on the algorithm can be found at:<br>http://schatzlab.cshl.edu/presentations/2013-06-18.PBUserMeeting.pdf</p><p>Address of the bookmark: <a href="https://github.com/jgurtowski/ectools" rel="nofollow">https://github.com/jgurtowski/ectools</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42530/shovill-assemble-bacterial-isolate-genomes-from-illumina-paired-end-reads</guid>
	<pubDate>Sat, 02 Jan 2021 07:05:36 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42530/shovill-assemble-bacterial-isolate-genomes-from-illumina-paired-end-reads</link>
	<title><![CDATA[shovill: Assemble bacterial isolate genomes from Illumina paired-end reads]]></title>
	<description><![CDATA[<p><span>Shovill is a pipeline which uses SPAdes at its core, but alters the steps before and after the primary assembly step to get similar results in less time. Shovill also supports other assemblers like SKESA, Velvet and Megahit, so you can take advantage of the pre- and post-processing that Shovill provides with those too.</span></p><p>Address of the bookmark: <a href="https://github.com/tseemann/shovill" rel="nofollow">https://github.com/tseemann/shovill</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36512/hisat2-a-fast-and-sensitive-alignment-program-for-mapping-next-generation-sequencing-reads</guid>
	<pubDate>Tue, 08 May 2018 04:27:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36512/hisat2-a-fast-and-sensitive-alignment-program-for-mapping-next-generation-sequencing-reads</link>
	<title><![CDATA[HISAT2: a fast and sensitive alignment program for mapping next-generation sequencing reads]]></title>
	<description><![CDATA[<p><strong>HISAT2</strong><span>&nbsp;is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). Based on an extension of BWT for graphs&nbsp;</span><a href="http://dl.acm.org/citation.cfm?id=2674828">[Sir&eacute;n et al. 2014]</a><span>, we designed and implemented a graph FM index (GFM), an original approach and its first implementation to the best of our knowledge. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).&nbsp;</span></p>
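<p>As a quick sanity check on the figures above, the following sketch multiplies the number of local indexes by the region each one represents. It assumes 1 Kbp = 1,000 bp and ignores any overlap between neighbouring regions; the function name <code>local_index_span</code> is hypothetical.</p>

```shell
# local_index_span COUNT BASES_PER_INDEX: total bases represented by the local indexes.
#   COUNT            number of local indexes (55000 in the text above)
#   BASES_PER_INDEX  region size per index in bases (56000, assuming 1 Kbp = 1000 bp)
local_index_span() {
  echo $(( $1 * $2 ))
}
```

<p>Running <code>local_index_span 55000 56000</code> prints 3080000000, i.e. about 3.08 Gbp, which is roughly the size of the human genome.</p>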
<p><span>more at&nbsp;https://ccb.jhu.edu/software/hisat2/index.shtml</span></p><p>Address of the bookmark: <a href="https://github.com/infphilo/hisat2" rel="nofollow">https://github.com/infphilo/hisat2</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>