<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/40994?offset=250</link>
	<atom:link href="https://bioinformaticsonline.com/related/40994?offset=250" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44770/nvidia-and-arc-institute-unveil-evo-2-a-breakthrough-ai-for-dna-design</guid>
	<pubDate>Fri, 21 Feb 2025 10:39:47 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44770/nvidia-and-arc-institute-unveil-evo-2-a-breakthrough-ai-for-dna-design</link>
	<title><![CDATA[NVIDIA and Arc Institute Unveil Evo 2: A Breakthrough AI for DNA Design]]></title>
	<description><![CDATA[<p>NVIDIA and the Arc Institute have introduced <strong style="font-size: 12.8px;">Evo 2</strong>, a groundbreaking AI model designed to <strong style="font-size: 12.8px;">understand, predict, and generate DNA sequences</strong>. This marks a major advancement in computational biology, offering scientists an unprecedented tool to decode the genetic blueprint of life and even design entirely new biological systems.</p><h3><strong>The Power of Evo 2: AI Meets DNA</strong></h3><p>Evo 2 is <strong>the largest AI model for biology ever created</strong>, trained on an astonishing <strong>9.3 trillion DNA "letters"</strong> (nucleotides) carefully selected from genomes spanning the entire tree of life. This massive dataset ensures that Evo 2 can recognize patterns and relationships in genetic sequences at an unparalleled scale.</p><p>For the first time, scientists can <strong>design DNA with AI</strong>, moving beyond simple sequence analysis to active DNA generation. Evo 2 enables researchers to <strong>predict, modify, and even create entire genetic sequences</strong>, opening new possibilities in medicine, agriculture, and synthetic biology.</p><h3><strong>Decoding the Dark Genome</strong></h3><p>One of the biggest challenges in genetics is understanding the <strong>non-coding regions</strong> of DNA&mdash;vast stretches of the genome that do not code for proteins but play crucial roles in regulating gene expression. These regions control when and how genes are activated, influencing everything from development to disease.</p><p>Evo 2 is designed to <strong>decode these non-coding elements</strong>, helping researchers uncover their functions and use this knowledge to develop gene-based therapies, synthetic life forms, and precision agriculture solutions.</p><h3><strong>From Reading DNA to Writing It</strong></h3><p>To put Evo 2&rsquo;s impact into perspective:</p><ul>
<li><strong>Previous AI models could "read" DNA</strong> like a book, analyzing genetic sequences and identifying patterns.</li>
<li><strong>Evo 2 can "write" entirely new DNA</strong>, designing functional genes, chromosomes, and even full genomes from scratch.</li>
</ul><p>This means scientists can now <strong>engineer biological systems with AI</strong>, designing new proteins, metabolic pathways, and genetic circuits to address real-world challenges.</p><h3><strong>A Step Toward Generative Biology</strong></h3><p>The Arc Institute describes Evo 2 as a major step toward <strong>"generative biology"</strong>&mdash;a revolutionary approach where AI is used to create <strong>novel biological structures</strong> rather than just analyzing existing ones. This could lead to breakthroughs such as:</p><ul>
<li><strong>New medicines</strong>: AI-generated enzymes and proteins tailored for targeted therapies.</li>
<li><strong>Disease-resistant crops</strong>: Genetically optimized plants for higher yield and climate resilience.</li>
<li><strong>Synthetic organisms</strong>: Custom-designed microbes for bioremediation, biofuel production, and industrial applications.</li>
</ul><h3><strong>An Open-Source Revolution</strong></h3><p>Unlike many proprietary AI models, <strong>Evo 2 is open source</strong>, making its capabilities accessible to researchers worldwide. This democratization of AI-driven biology means that scientists from different disciplines can <strong>collaborate, experiment, and innovate</strong>, accelerating discoveries in genetic engineering and synthetic biology.</p><p>With Evo 2, the boundaries of what&rsquo;s possible in <strong>DNA design, genetic engineering, and biological innovation</strong> are being redrawn. The future of life sciences is no longer just about understanding life&rsquo;s code&mdash;it&rsquo;s about writing it.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38659/detail-annotation-of-genes</guid>
	<pubDate>Fri, 11 Jan 2019 05:23:33 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38659/detail-annotation-of-genes</link>
	<title><![CDATA[Detailed annotation of genes]]></title>
	<description><![CDATA[<p>gene_info recalculated daily<br>---------------------------------------------------------------------------<br> tab-delimited<br> one line per GeneID<br> Column header line is the first line in the file.<br> Note: subsets of gene_info are available in the DATA/GENE_INFO<br> directory (described later)<br>---------------------------------------------------------------------------</p>
<p>tax_id:<br> the unique identifier provided by NCBI Taxonomy<br> for the species or strain/isolate</p>
<p>GeneID:<br> the unique identifier for a gene<br> ASN1: geneid</p>
<p>Symbol:<br> the default symbol for the gene<br> ASN1: gene-&gt;locus</p>
<p>LocusTag:<br> the LocusTag value<br> ASN1: gene-&gt;locus-tag</p>
<p>Synonyms:<br> bar-delimited set of unofficial symbols for the gene</p>
<p>dbXrefs:<br> bar-delimited set of identifiers in other databases<br> for this gene. The unit of the set is database:value.<br> Note that HGNC and MGI include 'HGNC' and 'MGI', respectively,<br> in the value part of their identifier. Consequently,<br> dbXrefs for these databases will appear like:<br> HGNC:HGNC:1100<br> This would be interpreted as database='HGNC', value='HGNC:1100'<br> Example for MGI:<br> MGI:MGI:104537<br> This would be interpreted as database='MGI', value='MGI:104537'</p>
<p>chromosome:<br> the chromosome on which this gene is placed.<br> for mitochondrial genomes, the value 'MT' is used.</p>
<p>map location:<br> the map location for this gene</p>
<p>description:<br> a descriptive name for this gene</p>
<p>type of gene:<br> the type assigned to the gene according to the list of options<br> provided in https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/objects/entrezgene/entrezgene.asn</p>
<p>Symbol from nomenclature authority:<br> when not '-', indicates that this symbol is from<br> a nomenclature authority</p>
<p>Full name from nomenclature authority:<br> when not '-', indicates that this full name is from<br> a nomenclature authority</p>
<p>Nomenclature status:<br> when not '-', indicates the status of the name from the <br> nomenclature authority (O for official, I for interim)</p>
<p>Other designations:<br> pipe-delimited set of some alternate descriptions that<br> have been assigned to a GeneID<br> '-' indicates none is being reported.</p>
<p>Modification date:<br> the last date a gene record was updated, in YYYYMMDD format</p>
<p>Feature type:<br> pipe-delimited set of annotated features and their classes or <br> controlled vocabularies, displayed as feature_type:feature_class <br> or feature_type:controlled_vocabulary, when appropriate; derived <br> from select feature annotations on RefSeq(s) associated with the <br> GeneID</p><p>Address of the bookmark: <a href="ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/" rel="nofollow">ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/8417/conserved-domain-database-cdd-version-311-released</guid>
	<pubDate>Wed, 19 Feb 2014 15:02:40 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/8417/conserved-domain-database-cdd-version-311-released</link>
	<title><![CDATA[Conserved Domain Database (CDD) version 3.11 released]]></title>
	<description><![CDATA[<p>Version 3.11 of the National Center for Biotechnology Information (NCBI) Conserved Domain Database (CDD) is now available, with 596 new or updated NCBI-curated domain models and 49,641 domain models in total. The new version incorporates the most recent Pfam release, Pfam 27.</p><p><img src="http://www.ncbi.nlm.nih.gov/Structure/cdd/docs/images/np_081086_triangles_site_features_on_query_gi255958238_mouse_mutl1.png" alt="image" width="800" height="415" style="border: 0px; border: 0px;"></p><p>Updates to the Conserved Domain Database include:</p><ul>
<li>Position-specific score matrices (PSSMs) have been recomputed for many models in CDD, and frequency tables have been added to the PSSMs;</li>
<li>The search databases distributed as part of this release can now be used with the more recent versions of RPS-BLAST (BLAST release 2.2.28 and up) using composition-based scoring. This removes the need to mask out compositionally biased regions in query sequences;</li>
<li>Domain annotation displays in CD-Search, BATCH CD-Search, and other services now all use a uniform display style. A new display option in CD-Search and BATCH CD-Search provides “standard” results, in addition to “concise” and “full” results. “Standard” results will provide, for each region on the query sequence, the best-scoring domain model (if any) from each of CDD’s database providers (Pfam, SMART, COG, TIGRFAMs, Protein Clusters, and the NCBI in-house curation project), but will suppress redundancy from within a single provider's results list.</li>
</ul><p>You can access CDD at the <a href="http://www.ncbi.nlm.nih.gov/cdd">Conserved Domains homepage</a> and find updated content on the <a href="ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd">CDD FTP site</a>.</p><p>Reference:</p><p>NCBI Website</p>]]></description>
	<dc:creator>Shikha Logwani</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/26375/ncbi-remap</guid>
	<pubDate>Thu, 11 Feb 2016 11:02:26 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/26375/ncbi-remap</link>
	<title><![CDATA[NCBI Remap]]></title>
	<description><![CDATA[<p><span><span><strong>NCBI Remap</strong>. This tool is conceptually similar to liftOver in that it manages conversions between a pair of genome assemblies, but it uses different methods to achieve these mappings. It is available through a simple <a href="http://www.ncbi.nlm.nih.gov/genome/tools/remap">web interface</a>, or you can use the <a href="http://www.ncbi.nlm.nih.gov/genome/tools/remap/docs/api">API for NCBI Remap</a>.</span></span></p>
<p><span><span>More at http://www.ncbi.nlm.nih.gov/genome/tools/remap</span></span></p>
<p><span><span>API http://www.ncbi.nlm.nih.gov/genome/tools/remap/docs/api</span></span></p><p>Address of the bookmark: <a href="http://www.ncbi.nlm.nih.gov/genome/tools/remap" rel="nofollow">http://www.ncbi.nlm.nih.gov/genome/tools/remap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/29410/entrez-direct-e-utilities-on-the-unix-command-line</guid>
	<pubDate>Wed, 19 Oct 2016 08:06:24 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/29410/entrez-direct-e-utilities-on-the-unix-command-line</link>
	<title><![CDATA[Entrez Direct: E-utilities on the UNIX Command Line]]></title>
	<description><![CDATA[<p>Entrez Direct (EDirect) is an advanced method for accessing the NCBI's suite of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command-line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process.</p>
<p>EDirect also provides an argument-driven function that simplifies the extraction of data from document summaries or other results that are returned in structured XML format. This can eliminate the need for writing custom software to answer ad hoc questions. Queries can move seamlessly between EDirect commands and UNIX utilities or scripts to perform actions that cannot be accomplished entirely within Entrez.</p><p>Address of the bookmark: <a href="https://www.ncbi.nlm.nih.gov/books/NBK179288/" rel="nofollow">https://www.ncbi.nlm.nih.gov/books/NBK179288/</a></p>]]></description>
	<dc:creator>Anjana</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41482/magic-blast</guid>
	<pubDate>Fri, 20 Mar 2020 15:18:36 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41482/magic-blast</link>
	<title><![CDATA[Magic-BLAST]]></title>
	<description><![CDATA[<p>Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. Each alignment optimizes a composite score that takes both reads of a pair into account simultaneously and, in the case of RNA-seq, locates the candidate introns and adds up the scores of all exons. This is very different from other versions of BLAST, where each exon is scored as a separate hit and read pairing is ignored.</p><p>Address of the bookmark: <a href="https://ncbi.github.io/magicblast/" rel="nofollow">https://ncbi.github.io/magicblast/</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/44604/new-release-of-refseq</guid>
	<pubDate>Tue, 16 Jul 2024 10:09:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/44604/new-release-of-refseq</link>
	<title><![CDATA[New Release of RefSeq !]]></title>
	<description><![CDATA[<p>Check out RefSeq release 225, now available&nbsp;<a href="https://www.ncbi.nlm.nih.gov/refseq/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=refseq-release-225-20240715">online</a>&nbsp;and from the&nbsp;<a href="https://ftp.ncbi.nlm.nih.gov/refseq/release/">FTP</a>&nbsp;site. You can access RefSeq data through&nbsp;<a href="https://www.ncbi.nlm.nih.gov/datasets/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=refseq-release-225-20240715">NCBI Datasets</a>.</p><h5>What&rsquo;s included in this release?</h5><p>As of July 8, 2024, this full release incorporates genomic, transcript, and protein data containing:</p><ul>
<li><span>448,507,905 records</span></li>
<li><span>334,845,613 proteins</span></li>
<li><span>63,542,774 RNAs</span></li>
<li><span>Sequences from 152,668 organisms</span></li>
</ul><p>The release is provided in several directories, both as a complete dataset and divided into logical groupings.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34702/run-miniasm-assembler-on-nanopore-reads</guid>
	<pubDate>Mon, 18 Dec 2017 04:07:50 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34702/run-miniasm-assembler-on-nanopore-reads</link>
	<title><![CDATA[Run miniasm assembler on nanopore reads !]]></title>
	<description><![CDATA[<p>Miniasm is a very fast OLC-based&nbsp;<em>de novo</em>&nbsp;assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>) as input and outputs an assembly graph in the&nbsp;<a href="https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md">GFA</a>&nbsp;format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology">unitig</a>&nbsp;sequences. Thus the per-base error rate is similar to that of the raw input reads.</p><p>Identify repeats within the reads, including inverted repeats:</p><blockquote><p>fq2fa ONT_A.fastq ONT_A.fasta<br /><br />minimap2 -xava-ont ONT_A.fasta ONT_A.fasta -t10 -X &gt; AONT.paf<br /><br />awk '{if($1==$6){print}}' AONT.paf &gt; AONTself.paf<br /><br />awk '$5=="-"' AONTself.paf | awk '{print $1}'| sort|uniq &gt; invertedrepeat.list</p></blockquote><p>Generate a few palindrome and repeat plots (highlighting only repeats larger than 10, 20 and 30 kb):</p><blockquote><p>minidot -f 5 -m 30000 AONTself.paf &gt; AONTself30000.eps<br />sed 's/_template_pass_FAH31515//' AONTself30000.eps &gt; AONTself30000final.eps<br /><br />minidot -f 5 -m 20000 AONTself.paf &gt; AONTself20000.eps<br />sed 's/_template_pass_FAH31515//' AONTself20000.eps &gt; AONTself20000final.eps<br /><br />minidot -f 5 -m 10000 AONTself.paf &gt; AONTself10000.eps<br />sed 's/_template_pass_FAH31515//' AONTself10000.eps &gt; AONTself10000final.eps</p></blockquote><p>Assemble with miniasm:</p><blockquote><p>miniasm -f ONT_A.fasta AONT.paf &gt; AONT.gfa</p><p>grep '^S' AONT.gfa |awk '{print "&gt;"$2"\n"$3}' &gt; AONT_miniasm.fasta<br /><br />minimap2 -xasm10 AONT_miniasm.fasta AONT_miniasm.fasta -t1 -X &gt; AONT_miniasm.paf<br /><br />awk '{if($1==$6){print}}' AONT_miniasm.paf &gt; AONT_miniasm_self.paf<br /><br />minidot -f 5 -m 10000 AONT_miniasm_self.paf &gt; AONT_miniasm_self10000.eps</p></blockquote><p>Enjoy the assembly!</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36739/blasr-mapping-single-molecule-sequencing-reads-using-basic-local-alignment-with-successive-refinement-blasr-theory-and-application</guid>
	<pubDate>Wed, 23 May 2018 06:54:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36739/blasr-mapping-single-molecule-sequencing-reads-using-basic-local-alignment-with-successive-refinement-blasr-theory-and-application</link>
	<title><![CDATA[BLASR: Mapping single molecule sequencing reads using Basic Local Alignment with Successive Refinement (BLASR): Theory and Application]]></title>
	<description><![CDATA[<p><span>BLASR (Basic Local Alignment with Successive Refinement) maps Single Molecule Sequencing (SMS) reads that are thousands to tens of thousands of bases long, where the divergence between the read and the genome is dominated by insertion and deletion errors.</span></p>
<p>Here is how I use blasr to align PacBio reads to the contigs (target.fasta). The file &ldquo;target.fasta.sa&rdquo; is the suffix array generated from &ldquo;target.fasta&rdquo; by sawriter.</p>
<blockquote>
<p>blasr query.fa ./target.fasta -sa ./target.fasta.sa -bestn 40 -maxScore -500 -m 4 -nproc 24 -out target.m4 -maxLCPLength 15</p>
</blockquote>
<p>The output format option &ldquo;-m 4&rdquo; generates the alignment coordinates. It is not fully documented, but I can explain it to you.</p>
<p>I use a server with 24 cores and 48 GB of RAM for the alignment. It took about 2 to 3 hours to align 3G of PacBio reads to 10^6 short-read contigs with a mean length of 3.5 kbp.</p><p>Address of the bookmark: <a href="http://bix.ucsd.edu/projects/blasr/" rel="nofollow">http://bix.ucsd.edu/projects/blasr/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36812/porechop-tool-for-finding-and-removing-adapters-from-oxford-nanopore-reads</guid>
	<pubDate>Tue, 29 May 2018 07:33:44 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36812/porechop-tool-for-finding-and-removing-adapters-from-oxford-nanopore-reads</link>
	<title><![CDATA[Porechop: a tool for finding and removing adapters from Oxford Nanopore reads]]></title>
	<description><![CDATA[<p>Porechop is a tool for finding and removing adapters from <a href="https://nanoporetech.com/">Oxford Nanopore</a> reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity.</p>
<p>Porechop also supports demultiplexing of Nanopore reads that were barcoded with the <a href="https://store.nanoporetech.com/native-barcoding-kit-1d.html">Native Barcoding Kit</a>, <a href="https://store.nanoporetech.com/pcr-barcoding-kit-96.html">PCR Barcoding Kit</a> or <a href="https://store.nanoporetech.com/rapid-barcoding-sequencing-kit.html">Rapid Barcoding Kit</a>.</p><p>Address of the bookmark: <a href="https://github.com/rrwick/Porechop" rel="nofollow">https://github.com/rrwick/Porechop</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>