<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44366?offset=230</link>
	<atom:link href="https://bioinformaticsonline.com/related/44366?offset=230" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44770/nvidia-and-arc-institute-unveil-evo-2-a-breakthrough-ai-for-dna-design</guid>
	<pubDate>Fri, 21 Feb 2025 10:39:47 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44770/nvidia-and-arc-institute-unveil-evo-2-a-breakthrough-ai-for-dna-design</link>
	<title><![CDATA[NVIDIA and Arc Institute Unveil Evo 2: A Breakthrough AI for DNA Design]]></title>
	<description><![CDATA[<p>NVIDIA and the Arc Institute have introduced <strong style="font-size: 12.8px;">Evo 2</strong>, a groundbreaking AI model designed to <strong style="font-size: 12.8px;">understand, predict, and generate DNA sequences</strong>. This marks a major advancement in computational biology, offering scientists an unprecedented tool to decode the genetic blueprint of life and even design entirely new biological systems.</p><h3><strong>The Power of Evo 2: AI Meets DNA</strong></h3><p>Evo 2 is <strong>the largest AI model for biology ever created</strong>, trained on an astonishing <strong>9.3 trillion DNA "letters"</strong> (nucleotides) carefully selected from genomes spanning the entire tree of life. This massive dataset ensures that Evo 2 can recognize patterns and relationships in genetic sequences at an unparalleled scale.</p><p>For the first time, scientists can <strong>design DNA with AI</strong>, moving beyond simple sequence analysis to active DNA generation. Evo 2 enables researchers to <strong>predict, modify, and even create entire genetic sequences</strong>, opening new possibilities in medicine, agriculture, and synthetic biology.</p><h3><strong>Decoding the Dark Genome</strong></h3><p>One of the biggest challenges in genetics is understanding the <strong>non-coding regions</strong> of DNA&mdash;vast stretches of the genome that do not code for proteins but play crucial roles in regulating gene expression. These regions control when and how genes are activated, influencing everything from development to disease.</p><p>Evo 2 is designed to <strong>decode these non-coding elements</strong>, helping researchers uncover their functions and use this knowledge to develop gene-based therapies, synthetic life forms, and precision agriculture solutions.</p><h3><strong>From Reading DNA to Writing It</strong></h3><p>To put Evo 2&rsquo;s impact into perspective:</p><ul>
<li><strong>Previous AI models could "read" DNA</strong> like a book, analyzing genetic sequences and identifying patterns.</li>
<li><strong>Evo 2 can "write" entirely new DNA</strong>, designing functional genes, chromosomes, and even full genomes from scratch.</li>
</ul><p>This means scientists can now <strong>engineer biological systems with AI</strong>, designing new proteins, metabolic pathways, and genetic circuits to address real-world challenges.</p><h3><strong>A Step Toward Generative Biology</strong></h3><p>The Arc Institute describes Evo 2 as a major step toward <strong>"generative biology"</strong>&mdash;a revolutionary approach where AI is used to create <strong>novel biological structures</strong> rather than just analyzing existing ones. This could lead to breakthroughs such as:</p><ul>
<li><strong>New medicines</strong>: AI-generated enzymes and proteins tailored for targeted therapies.</li>
<li><strong>Disease-resistant crops</strong>: Genetically optimized plants for higher yield and climate resilience.</li>
<li><strong>Synthetic organisms</strong>: Custom-designed microbes for bioremediation, biofuel production, and industrial applications.</li>
</ul><h3><strong>An Open-Source Revolution</strong></h3><p>Unlike many proprietary AI models, <strong>Evo 2 is open source</strong>, making its capabilities accessible to researchers worldwide. This democratization of AI-driven biology means that scientists from different disciplines can <strong>collaborate, experiment, and innovate</strong>, accelerating discoveries in genetic engineering and synthetic biology.</p><p>With Evo 2, the boundaries of what&rsquo;s possible in <strong>DNA design, genetic engineering, and biological innovation</strong> are being redrawn. The future of life sciences is no longer just about understanding life&rsquo;s code&mdash;it&rsquo;s about writing it.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</guid>
	<pubDate>Mon, 10 Jul 2017 05:56:07 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</link>
	<title><![CDATA[Omega2: metagenome assembly pipeline]]></title>
	<description><![CDATA[<p><span>Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph and then merged to contigs and scaffolds using mate-pair information. In comparison with three de Bruijn graph assemblers (SOAPdenovo, IDBA-UD and MetaVelvet), Omega provided comparable overall performance on a HiSeq 100-bp dataset and superior performance on a MiSeq 300-bp dataset. In comparison with Celera on the MiSeq dataset, Omega provided more continuous assemblies overall using a fraction of the computing time of existing overlap-layout-consensus assemblers. This indicates Omega can more efficiently assemble longer Illumina reads, and at deeper coverage, for metagenomic datasets.</span></p><p>Address of the bookmark: <a href="http://omega.omicsbio.org/" rel="nofollow">http://omega.omicsbio.org/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</guid>
	<pubDate>Mon, 27 Nov 2017 07:58:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</link>
	<title><![CDATA[miniasm: very fast OLC-based de novo assembler for noisy long reads]]></title>
	<description><![CDATA[<p>Miniasm is a very fast OLC-based&nbsp;<em>de novo</em>&nbsp;assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>) as input and outputs an assembly graph in the&nbsp;<a href="https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md">GFA</a>&nbsp;format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology">unitig</a>&nbsp;sequences. Thus the per-base error rate is similar to the raw input reads.</p>
<p>Miniasm is still at an early stage of development. It has only been tested on a dozen PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default setting, miniasm assembles 9 out of 12 PacBio datasets and 3 out of 4 ONT datasets into a single contig. The 12 PacBio data sets are&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly">PacBio E. coli sample</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS473430">ERS473430</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS544009">ERS544009</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS554120">ERS554120</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS605484">ERS605484</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS617393">ERS617393</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS646601">ERS646601</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS659581">ERS659581</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS670327">ERS670327</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS685285">ERS685285</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS743109">ERS743109</a>&nbsp;and a&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-20kb-Size-Selected-Library-with-P6-C4/ce0533c1d2a957488594f0b29da61ffa3e4627e8">deprecated PacBio E. coli data set</a>. ONT data are acquired from the&nbsp;<a href="http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/">Loman Lab</a>.</p>
<p>For a&nbsp;<em>C. elegans</em>&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/C.-elegans-data-set">PacBio data set</a>&nbsp;(only 40X are used, not the whole dataset), miniasm finishes the assembly, including read overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison,&nbsp;<a href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP">HGAP3</a>&nbsp;produces a 104Mb assembly with N50 1.61Mb.&nbsp;<a href="http://lh3lh3.users.sourceforge.net/download/ce-miniasm.png">This dotter plot</a>&nbsp;gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.</p>
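<p>The N50 figures quoted above can be computed from a list of contig lengths. The following is a minimal Python sketch and not part of miniasm; the function name <code>n50</code> and the example lengths are hypothetical. It assumes the common definition of N50: sort the contigs from longest to shortest and report the length of the contig at which the running sum first reaches half of the total assembly size.</p>

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig where the running sum covers half the total.
        if running * 2 >= total:
            return length
    return 0


# Hypothetical contig lengths, used only to illustrate the calculation.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

<p>Because N50 depends only on the length distribution, it says nothing about per-base accuracy, which is why a miniasm assembly and an HGAP3 assembly can have similar N50 values yet very different error rates.</p>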
<p>Miniasm confirms that at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>&nbsp;can be used as a read overlapper, even though it is probably not as sensitive as the more sophisticated overlappers such as&nbsp;<a href="https://github.com/marbl/MHAP">MHAP</a>&nbsp;and&nbsp;<a href="https://github.com/thegenemyers/DALIGNER">DALIGNER</a>. Coupled with long-read error correctors and consensus tools, miniasm may also be useful to produce high-quality assemblies.</p>
<p>Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors).</p>
<p>We start with an all-against-all comparison of the reads:</p>
<div>
<pre><code>minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
</code></pre>
</div>
<p>Then we assemble the reads from the overlaps:</p>
<div>
<pre><code>miniasm -f reads.fq reads.paf.gz &gt; reads.gfa
</code></pre>
</div>
<p>Convert GFA to FASTA:</p>
<div>
<pre><code>awk '/^S/{print "&gt;"$2"\n"$3}' reads.gfa | fold &gt; reads.fa
</code></pre>
</div>
<p>And then count the contigs:</p>
<div>
<pre><code>grep "&gt;" reads.fa | wc -l</code></pre>
</div>
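<p>The awk and grep steps above can also be sketched in Python, which may be easier to extend. This is only an illustration, not code from the miniasm repository: the function names <code>gfa_segments_to_fasta</code> and <code>count_contigs</code> and the example unitig names are hypothetical. It assumes that segment (S) lines are tab-separated, with the name in the second field and the sequence in the third, matching the <code>$2</code> and <code>$3</code> used by the awk command.</p>

```python
def gfa_segments_to_fasta(gfa_lines, width=80):
    """Convert GFA segment (S) lines into a list of FASTA lines."""
    fasta = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Keep only segment records that carry both a name and a sequence.
        if fields[0] == "S" and len(fields) >= 3:
            name, sequence = fields[1], fields[2]
            fasta.append(">" + name)
            # Wrap the sequence at a fixed width (80 mirrors the default of `fold`).
            for start in range(0, len(sequence), width):
                fasta.append(sequence[start:start + width])
    return fasta


def count_contigs(fasta_lines):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


# Hypothetical two-unitig graph; real input would be the lines of reads.gfa.
example_gfa = [
    "S\tutg000001l\tACGTACGTAC\n",
    "L\tutg000001l\t+\tutg000002l\t+\t0M\n",
    "S\tutg000002l\tGG\n",
]
example_fasta = gfa_segments_to_fasta(example_gfa, width=4)
example_count = count_contigs(example_fasta)
```

<p>For a real assembly, an open file handle can be passed in place of the example list, since the function only iterates over lines.</p>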
<p>&nbsp;</p>
<pre># Download sample PacBio from the PBcR website
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz | tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
# Install minimap and miniasm (requiring gcc and zlib)
git clone https://github.com/lh3/minimap &amp;&amp; (cd minimap &amp;&amp; make)
git clone https://github.com/lh3/miniasm &amp;&amp; (cd miniasm &amp;&amp; make)
# Overlap
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
# Layout
miniasm/miniasm -f reads.fq reads.paf.gz &gt; reads.gfa</pre><p>Address of the bookmark: <a href="https://github.com/lh3/miniasm" rel="nofollow">https://github.com/lh3/miniasm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</guid>
	<pubDate>Tue, 12 Dec 2017 17:23:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</link>
	<title><![CDATA[MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)]]></title>
	<description><![CDATA[<p><span>MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a&nbsp;</span><em>k</em><span>-mer based&nbsp;</span><a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a><span>&nbsp;using a combination of&nbsp;</span><a href="http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p76-schleimer.pdf">Winnowing</a><span>&nbsp;and&nbsp;</span><a href="https://en.wikipedia.org/wiki/MinHash">MinHash</a><span>. This is then converted to an estimate of sequence identity using the&nbsp;</span><a href="http://mash.readthedocs.org/">Mash</a><span>&nbsp;distance. An appropriate&nbsp;</span><em>k</em><span>-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.</span></p><p>Address of the bookmark: <a href="https://github.com/marbl/MashMap" rel="nofollow">https://github.com/marbl/MashMap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35345/rgfa-powerful-and-convenient-handling-of-assembly-graphs</guid>
	<pubDate>Thu, 25 Jan 2018 05:47:53 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35345/rgfa-powerful-and-convenient-handling-of-assembly-graphs</link>
	<title><![CDATA[RGFA: powerful and convenient handling of assembly graphs]]></title>
	<description><![CDATA[<p><span>RGFA is an implementation of the proposed GFA specification in Ruby. It allows the user to conveniently parse, edit and write GFA files. Complex operations such as the separation of the implicit instances of repeats and the merging of linear paths can be performed. A typical application of RGFA is the editing of a graph, to finish the assembly of a sequence, using information not available to the assembler. The authors illustrate a use case in which the assembly of a repetitive metagenomic fosmid insert was completed using a script based on RGFA.</span></p>
<p><span>https://github.com/ggonnella/rgfa</span></p><p>Address of the bookmark: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5103826/" rel="nofollow">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5103826/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36867/cerulean-a-hybrid-assembly-using-high-throughput-short-and-long-reads</guid>
	<pubDate>Tue, 05 Jun 2018 10:10:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36867/cerulean-a-hybrid-assembly-using-high-throughput-short-and-long-reads</link>
	<title><![CDATA[Cerulean: A hybrid assembly using high throughput short and long reads]]></title>
	<description><![CDATA[Cerulean extends contigs assembled from short-read datasets, such as Illumina paired-end reads, using long reads such as PacBio RS reads.

Cerulean v0.1 has been implemented with bacterial genomes in mind.

The method is fully described in Deshpande, V., Fung, E. D., Pham, S., &amp; Bafna, V. (2013). Cerulean: A hybrid assembly using high throughput short and long reads. arXiv preprint arXiv:1307.7933.
http://arxiv.org/abs/1307.7933<p>Address of the bookmark: <a href="https://sourceforge.net/projects/ceruleanassembler/" rel="nofollow">https://sourceforge.net/projects/ceruleanassembler/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37554/finishersca-repeat-aware-tool-for-upgrading-de-novo-assembly-using-long-reads</guid>
	<pubDate>Mon, 20 Aug 2018 04:08:50 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37554/finishersca-repeat-aware-tool-for-upgrading-de-novo-assembly-using-long-reads</link>
	<title><![CDATA[FinisherSC: a repeat-aware tool for upgrading de novo assembly using long reads]]></title>
	<description><![CDATA[<p>Here is the command to run the tool:</p>
<pre><code>python finisherSC.py destinedFolder mummerPath
</code></pre>
<p>If you are running on a server and would like to use multiple threads, the following command runs FinisherSC with 20 threads.</p>
<pre><code>python finisherSC.py -par 20 destinedFolder mummerPath
</code></pre>
<p>If the names of raw reads or contigs contain special characters or formats, FinisherSC/MUMmer may not parse them correctly. In that case, quickly rename the contigs/reads in contigs.fasta or raw_reads.fasta using the following commands.</p>
<pre><code>    perl -pe 's/&gt;[^\$]*$/"&gt;Seg" . ++$n ."\n"/ge' raw_reads.fasta &gt; newRaw_reads.fasta
    cp newRaw_reads.fasta raw_reads.fasta
    perl -pe 's/&gt;[^\$]*$/"&gt;Seg" . ++$n ."\n"/ge' contigs.fasta &gt; newContigs.fasta
    cp newContigs.fasta contigs.fasta</code></pre><p>Address of the bookmark: <a href="https://github.com/kakitone/finishingTool" rel="nofollow">https://github.com/kakitone/finishingTool</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39213/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</guid>
	<pubDate>Tue, 02 Apr 2019 21:54:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39213/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</link>
	<title><![CDATA[Flye: Fast and accurate de novo assembler for single molecule sequencing reads]]></title>
	<description><![CDATA[<p><span>Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PB / ONT reads as input and outputs polished contigs. Flye also includes a special mode for metagenome assembly.</span></p><p>Address of the bookmark: <a href="https://github.com/fenderglass/Flye" rel="nofollow">https://github.com/fenderglass/Flye</a></p>]]></description>
	<dc:creator>BioJoker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41843/stringtie-transcript-assembly-and-quantification-for-rna-seq</guid>
	<pubDate>Tue, 09 Jun 2020 05:21:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41843/stringtie-transcript-assembly-and-quantification-for-rna-seq</link>
	<title><![CDATA[StringTie Transcript assembly and quantification for RNA-Seq]]></title>
	<description><![CDATA[<p><strong>StringTie</strong><span>&nbsp;is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional&nbsp;</span><em>de novo</em><span>&nbsp;assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only alignments of short reads that can also be used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like&nbsp;</span><a href="https://github.com/alyssafrazee/ballgown">Ballgown</a><span>,&nbsp;</span><a href="http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html">Cuffdiff</a><span>&nbsp;or other programs (DESeq2, edgeR, etc.).</span></p><p>Address of the bookmark: <a href="https://ccb.jhu.edu/software/stringtie/" rel="nofollow">https://ccb.jhu.edu/software/stringtie/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40856/3d-de-novo-assembly-3d-dna-pipeline</guid>
	<pubDate>Sun, 02 Feb 2020 13:41:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40856/3d-de-novo-assembly-3d-dna-pipeline</link>
	<title><![CDATA[3D de novo assembly (3D DNA) pipeline]]></title>
	<description><![CDATA[<p>For a detailed description of the pipeline and how it integrates with other tools designed by the Aiden Lab see&nbsp;<a href="http://aidenlab.org/assembly/manual_180322.pdf">Genome Assembly Cookbook</a>&nbsp;on&nbsp;<a href="http://aidenlab.org/assembly">http://aidenlab.org/assembly</a>.</p>
<p>For the original version of the pipeline and to reproduce the Hs2-HiC and the AaegL4 genomes reported in&nbsp;<a href="http://science.sciencemag.org/content/356/6333/92">(Dudchenko et al.,&nbsp;<em>Science</em>, 2017)</a>&nbsp;see the&nbsp;<a href="https://github.com/theaidenlab/3d-dna/tree/745779bdf64db6e55bddb70c24e9b58825938c33">original commit</a>.</p>
<p>For the detailed description of the merge section see&nbsp;<a href="https://github.com/theaidenlab/AGWG-merge">https://github.com/theaidenlab/AGWG-merge</a>.</p><p>Address of the bookmark: <a href="https://github.com/theaidenlab/3d-dna" rel="nofollow">https://github.com/theaidenlab/3d-dna</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>