<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/19090?offset=1290</link>
	<atom:link href="https://bioinformaticsonline.com/related/19090?offset=1290" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44223/ale-assembly-likelihood-estimator</guid>
	<pubDate>Wed, 08 Mar 2023 01:39:33 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44223/ale-assembly-likelihood-estimator</link>
	<title><![CDATA[ALE: Assembly Likelihood Estimator]]></title>
	<description><![CDATA[<p>Just import the assembly, bam and ALE scores. You can convert the .ale file to a set of .wig files with ale2wiggle.py and IGV can read those directly.&nbsp; Depending on your genome size you may want to convert the .wig files to the BigWig format.</p><p>Address of the bookmark: <a href="https://github.com/sc932/ALE" rel="nofollow">https://github.com/sc932/ALE</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</guid>
	<pubDate>Thu, 31 Aug 2023 02:43:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</link>
	<title><![CDATA[Steps to find all the repeats in the genome !]]></title>
	<description><![CDATA[<div><p>To find repeats in a genome from 2 to 9 length using a Perl script, you can use the RepeatMasker tool with the "--length" option<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>. Here's a step-by-step guide:</p></div><div><ol>
<li>Install RepeatMasker: First, you need to install RepeatMasker on your system. You can download it from the RepeatMasker website<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>.</li>
</ol></div><div><ol>
<li>Prepare the genome sequence: Make sure you have the genome sequence in a FASTA file format. Let's assume the file is named "genome.fasta".</li>
</ol><blockquote><p>./RepeatMasker -pa &lt;number_of_processors&gt; -nolow -norna -no_is -div &lt;divergence_value&gt; -lib RepeatMaskerLib.embl -gff -xsmall -small -poly -species &lt;species_name&gt; -dir &lt;output_directory&gt; -length &lt;min_length&gt;-&lt;max_length&gt; genome.fasta</p></blockquote><div><p>Replace the following placeholders with appropriate values:</p><ul>
<li><code>&lt;number_of_processors&gt;</code>: The number of processors/threads you want to use for parallel processing.</li>
<li><code>&lt;divergence_value&gt;</code>: The divergence value for the species you are analyzing. You can find divergence values for different species in the RepeatMasker documentation<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>.</li>
<li><code>&lt;species_name&gt;</code>: The name of the species you are analyzing.</li>
<li><code>&lt;output_directory&gt;</code>: The directory where you want the output files to be saved.</li>
<li><code>&lt;min_length&gt;</code>&nbsp;and&nbsp;<code>&lt;max_length&gt;</code>: The minimum and maximum lengths of the repeats you want to find (in this case, 2 and 9).</li>
</ul></div><div><ol>
<li>Analyze the output: RepeatMasker will generate several output files, including a .out file. You can parse this file to extract the information you need. There is a Perl tool called "one_code_to_find_them_all.pl" that can help you parse RepeatMasker output files<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>. You can download it from the source provided.</li>
</ol></div><div><ol>
<li>Use the provided Perl script: Once you have the "one_code_to_find_them_all.pl" script, you can run it to conveniently parse the RepeatMasker output files. Here's an example of how to use it:</li>
</ol><blockquote><p>perl one_code_to_find_them_all.pl --rm &lt;RepeatMasker_out_file&gt; --length &lt;length_file&gt;</p></blockquote></div><p>&nbsp;</p></div><div><div><p>Replace&nbsp;<code>&lt;RepeatMasker_out_file&gt;</code>&nbsp;with the path to your RepeatMasker .out file, and&nbsp;<code>&lt;length_file&gt;</code>&nbsp;with the path to a file containing the lengths of the reference elements.</p></div><div><p>This script will generate several output files, including .log.txt and .copynumber.csv, which contain quantitative information about the identified repeat elements.</p></div><div><p>Remember to adjust the parameters and options according to your specific needs and the characteristics of your genome.</p></div></div>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44637/tools-to-access-the-quality-of-your-assembled-genome</guid>
	<pubDate>Thu, 08 Aug 2024 23:31:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44637/tools-to-access-the-quality-of-your-assembled-genome</link>
	<title><![CDATA[Tools to access the quality of your assembled genome !]]></title>
	<description><![CDATA[<ul dir="auto">
<li><a href="https://github.com/linsalrob/fasta_validator">FASTA VALIDATOR</a>&nbsp;+&nbsp;<a href="https://github.com/shenwei356/seqkit">SEQKIT RMDUP</a>: FASTA validation</li>
<li><a href="https://genometools.org/tools/gt_gff3validator.html">GENOMETOOLS GT GFF3VALIDATOR</a>: GFF3 validation</li>
<li><a href="https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl">ASSEMBLATHON STATS</a>: Assembly statistics</li>
<li><a href="https://genometools.org/tools/gt_stat.html">GENOMETOOLS GT STAT</a>: Annotation statistics</li>
<li><a href="https://github.com/ncbi/fcs">NCBI FCS ADAPTOR</a>: Adaptor contamination pass/fail</li>
<li><a href="https://github.com/ncbi/fcs">NCBI FCS GX</a>: Foreign organism contamination pass/fail</li>
<li><a href="https://gitlab.com/ezlab/busco">BUSCO</a>: Gene-space completeness estimation</li>
<li><a href="https://github.com/tolkit/telomeric-identifier">TIDK</a>: Telomere repeat identification</li>
<li><a href="https://github.com/oushujun/LTR_retriever/blob/master/LAI">LAI</a>: Continuity of repetitive sequences</li>
<li><a href="https://github.com/DerrickWood/kraken2">KRAKEN2</a>: Taxonomy classification</li>
<li><a href="https://github.com/igvteam/juicebox.js">HIC CONTACT MAP</a>: Alignment and visualisation of HiC data</li>
<li><a href="https://github.com/mummer4/mummer">MUMMER</a>&nbsp;&rarr;&nbsp;<a href="http://circos.ca/documentation/">CIRCOS</a>&nbsp;+&nbsp;<a href="https://plotly.com/">DOTPLOT</a>&nbsp;&amp;&nbsp;<a href="https://github.com/lh3/minimap2">MINIMAP2</a>&nbsp;&rarr;&nbsp;<a href="https://github.com/schneebergerlab/plotsr">PLOTSR</a>: Synteny analysis</li>
<li><a href="https://github.com/marbl/merqury">MERQURY</a>: K-mer completeness, consensus quality and phasing assessment</li>
</ul>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44770/nvidia-and-arc-institute-unveil-evo-2-a-breakthrough-ai-for-dna-design</guid>
	<pubDate>Fri, 21 Feb 2025 10:39:47 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44770/nvidia-and-arc-institute-unveil-evo-2-a-breakthrough-ai-for-dna-design</link>
	<title><![CDATA[NVIDIA and Arc Institute Unveil Evo 2: A Breakthrough AI for DNA Design]]></title>
	<description><![CDATA[<p>NVIDIA and the Arc Institute have introduced <strong style="font-size: 12.8px;">Evo 2</strong>, a groundbreaking AI model designed to <strong style="font-size: 12.8px;">understand, predict, and generate DNA sequences</strong>. This marks a major advancement in computational biology, offering scientists an unprecedented tool to decode the genetic blueprint of life and even design entirely new biological systems.</p><h3><strong>The Power of Evo 2: AI Meets DNA</strong></h3><p>Evo 2 is <strong>the largest AI model for biology ever created</strong>, trained on an astonishing <strong>9.3 trillion DNA "letters"</strong> (nucleotides) carefully selected from genomes spanning the entire tree of life. This massive dataset ensures that Evo 2 can recognize patterns and relationships in genetic sequences at an unparalleled scale.</p><p>For the first time, scientists can <strong>design DNA with AI</strong>, moving beyond simple sequence analysis to active DNA generation. Evo 2 enables researchers to <strong>predict, modify, and even create entire genetic sequences</strong>, opening new possibilities in medicine, agriculture, and synthetic biology.</p><h3><strong>Decoding the Dark Genome</strong></h3><p>One of the biggest challenges in genetics is understanding the <strong>non-coding regions</strong> of DNA&mdash;vast stretches of the genome that do not code for proteins but play crucial roles in regulating gene expression. These regions control when and how genes are activated, influencing everything from development to disease.</p><p>Evo 2 is designed to <strong>decode these non-coding elements</strong>, helping researchers uncover their functions and use this knowledge to develop gene-based therapies, synthetic life forms, and precision agriculture solutions.</p><h3><strong>From Reading DNA to Writing It</strong></h3><p>To put Evo 2&rsquo;s impact into perspective:</p><ul>
<li><strong>Previous AI models could "read" DNA</strong> like a book, analyzing genetic sequences and identifying patterns.</li>
<li><strong>Evo 2 can "write" entirely new DNA</strong>, designing functional genes, chromosomes, and even full genomes from scratch.</li>
</ul><p>This means scientists can now <strong>engineer biological systems with AI</strong>, designing new proteins, metabolic pathways, and genetic circuits to address real-world challenges.</p><h3><strong>A Step Toward Generative Biology</strong></h3><p>The Arc Institute describes Evo 2 as a major step toward <strong>"generative biology"</strong>&mdash;a revolutionary approach where AI is used to create <strong>novel biological structures</strong> rather than just analyzing existing ones. This could lead to breakthroughs such as:</p><ul>
<li><strong>New medicines</strong>: AI-generated enzymes and proteins tailored for targeted therapies.</li>
<li><strong>Disease-resistant crops</strong>: Genetically optimized plants for higher yield and climate resilience.</li>
<li><strong>Synthetic organisms</strong>: Custom-designed microbes for bioremediation, biofuel production, and industrial applications.</li>
</ul><h3><strong>An Open-Source Revolution</strong></h3><p>Unlike many proprietary AI models, <strong>Evo 2 is open source</strong>, making its capabilities accessible to researchers worldwide. This democratization of AI-driven biology means that scientists from different disciplines can <strong>collaborate, experiment, and innovate</strong>, accelerating discoveries in genetic engineering and synthetic biology.</p><p>With Evo 2, the boundaries of what&rsquo;s possible in <strong>DNA design, genetic engineering, and biological innovation</strong> are being redrawn. The future of life sciences is no longer just about understanding life&rsquo;s code&mdash;it&rsquo;s about writing it.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/researchlabs/view/22393/narcis-fernandez-fuentes-lab</guid>
  <pubDate>Mon, 25 May 2015 07:30:00 -0500</pubDate>
  <link></link>
  <title><![CDATA[Narcis Fernandez-Fuentes Lab]]></title>
  <description><![CDATA[
<p>Welcome to our web-site compiling all the research-related activities of the group. Our research interests relate to a number of areas within Bioinformatics. We have a long-standing interest in protein structure prediction and structure-to-function relationships. We work in the study of biomolecular interactions, modeling of protein complexes, the study and characterization of protein-protein interactions, peptide design, modeling of genetic variation, structure-based protein design and different aspects of Plant Bioinformatics. Take a look at the our databases and servers and the list of publications for more information.</p>

<p>More at http://www.bioinsilico.org/</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/videolist/watch/3918/the-human-genome-project-video-3d-animation-introduction-low</guid>
	<pubDate>Sat, 24 Aug 2013 19:01:19 -0500</pubDate>
	<link>https://bioinformaticsonline.com/videolist/watch/3918/the-human-genome-project-video-3d-animation-introduction-low</link>
	<title><![CDATA[The Human Genome Project Video   3D Animation Introduction Low)]]></title>
	<description><![CDATA[<iframe width="" height="" src="https://www.youtube-nocookie.com/embed/YxoQFSBwyms" frameborder="0" allowfullscreen></iframe>]]></description>
	
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/33486/quick-next-generation-sequencing-ngs-terms-definition</guid>
	<pubDate>Fri, 09 Jun 2017 04:52:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/33486/quick-next-generation-sequencing-ngs-terms-definition</link>
	<title><![CDATA[Quick next generation sequencing (NGS) terms definition]]></title>
	<description><![CDATA[<p><strong>fragment size:</strong><span>&nbsp;the Illumina WGS protocol generates paired-end reads from both ends of longer fragments. The lengths of these fragments are assumed to be sampled from a normal distribution. Therefore, in the absence of structural variants, mapping locations of the paired ends span within an interval [&delta;min,&delta;max]. Most (&gt;90%) of paired-end reads are sampled from no-SV regions, therefore the fragment size distribution can be learned empirically for each WGS data set separately.</span><br /><br /><strong>concordant reads:</strong><span>&nbsp;a read pair is called concordant if they can be mapped to the reference genome as &ldquo;expected&rdquo;: (a) mapped to opposing strands where the upstream read is mapped to the forward strand and the downstream read is mapped to the reverse strand2, (b) the distance between ends is between the minimum and maximum expected fragment size.</span><br /><br /><strong>discordant reads:</strong><span>&nbsp;briefly, any non-concordant read pair is considered discordant. Note that, by definition, the discordant read pairs signal potential SVs. The sequence signature produced by these type of reads is known as read-pair signature.</span><br /><br /><strong>split reads:</strong><span>&nbsp;a read that can only be mapped to the reference genome by breaking into two sub-reads is called a split-read. These types of reads also indicate a potential SV or a short insertion or deletion (indel).</span><br /><br /><strong>read depth:</strong><span>&nbsp;number of reads that map within a region of the genome. Overall genome-wide read depth is also referred to as depth of coverage. It is expected that the number of reads that &ldquo;cover&rdquo; each base-pair to follow a Poisson distribution. Therefore, if the read depth over a certain region deviates significantly from this distribution, it signals for a potential copy number variation (CNV).</span></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36884/halc-high-throughput-algorithm-for-long-read-error-correction</guid>
	<pubDate>Fri, 08 Jun 2018 10:47:41 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36884/halc-high-throughput-algorithm-for-long-read-error-correction</link>
	<title><![CDATA[HALC: High throughput algorithm for long read error correction]]></title>
	<description><![CDATA[HALC, a high throughput algorithm for long read error correction. HALC aligns the long reads to short read contigs from the same species with a relatively low identity requirement so that a long read region can be aligned to at least one contig region, including its true genome region’s repeats in the contigs sufficiently similar to it (similar repeat based alignment approach)

HALC was able to obtain 6.7-41.1% higher throughput than the existing algorithms while maintaining comparable accuracy. The HALC corrected long reads can thus result in 11.4-60.7% longer assembled contigs than the existing algorithms.<p>Address of the bookmark: <a href="https://github.com/lanl001/halc" rel="nofollow">https://github.com/lanl001/halc</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/28051/convert-ensembl-gtf-to-annotation-table-geneid-genesymbol-genewisechrlocation-geneclass-strand-raw</guid>
	<pubDate>Fri, 24 Jun 2016 18:08:49 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/28051/convert-ensembl-gtf-to-annotation-table-geneid-genesymbol-genewisechrlocation-geneclass-strand-raw</link>
	<title><![CDATA[Convert EnsEMBL GTF to Annotation table (Geneid, GeneSymbol, GeneWiseChrLocation, GeneClass, Strand) Raw]]></title>
	<description><![CDATA[<p><strong>Bash Script source:</strong></p><p>https://gist.github.com/santhilalsubhash/367befcf5216be4b1fd9</p><p>&nbsp;</p><p><strong>Information</strong>:</p><p>This script converts EnsEMBL GTF (Ex:&nbsp;<a href="https://gist.githubusercontent.com/santhilalsubhash/1e7cca357e52a181dc25/raw/cfb803e07900a2baefbb6534f1299fd30cb57a29/sample.GTF">https://gist.githubusercontent.com/santhilalsubhash/1e7cca357e52a181dc25/raw/cfb803e07900a2baefbb6534f1299fd30cb57a29/sample.GTF</a>) file to annotation table format. It generated two files<br />1) Transcript wise chromosome location with information about transcripts (Ex:&nbsp;<a href="https://gist.githubusercontent.com/santhilalsubhash/c7dec516e0338503a4b6/raw/de0af1a39f0005c4ce7321c5ae57fc8b4a14c7f4/sample.GTF_enst_annotation.txt">https://gist.githubusercontent.com/santhilalsubhash/c7dec516e0338503a4b6/raw/de0af1a39f0005c4ce7321c5ae57fc8b4a14c7f4/sample.GTF_enst_annotation.txt</a>)<br />2) Gene wise chromosome location with information about genes (Ex:&nbsp;<a href="https://gist.githubusercontent.com/santhilalsubhash/c92006c5080f0333bec2/raw/d16e0b2440d73b09b486d3c9751cdb248a73fa0b/sample.GTF_ensg_annotation.txt">https://gist.githubusercontent.com/santhilalsubhash/c92006c5080f0333bec2/raw/d16e0b2440d73b09b486d3c9751cdb248a73fa0b/sample.GTF_ensg_annotation.txt</a>)</p><p>Note: You can download GTF files from&nbsp;<a href="http://www.ensembl.org/info/data/ftp/index.html">http://www.ensembl.org/info/data/ftp/index.html</a></p>]]></description>
	<dc:creator>EagleEye</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37830/nquire-a-statistical-framework-for-ploidy-estimation-using-next-generation-sequencing</guid>
	<pubDate>Thu, 04 Oct 2018 05:23:59 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37830/nquire-a-statistical-framework-for-ploidy-estimation-using-next-generation-sequencing</link>
	<title><![CDATA[nQuire: a statistical framework for ploidy estimation using next generation sequencing]]></title>
	<description><![CDATA[<p>nQuire provides a statistical framework to study organisms with intraspecific variation in ploidy. nQuire is likely to be useful in epidemiological studies of pathogens, artificial selection experiments, and for historical or ancient samples where intact nuclei are not preserved. It is implemented as a stand-alone Linux command line tool in the C programming language and is available at https://github.com/clwgg/nQuireunder the MIT license.</p><p>Address of the bookmark: <a href="https://github.com/clwgg/nQuireunder" rel="nofollow">https://github.com/clwgg/nQuireunder</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>