<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/43057?offset=520</link>
	<atom:link href="https://bioinformaticsonline.com/related/43057?offset=520" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43714/hiv-genome-database</guid>
	<pubDate>Fri, 21 Jan 2022 05:40:15 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43714/hiv-genome-database</link>
	<title><![CDATA[HIV genome database !]]></title>
	<description><![CDATA[<p>HIV resources</p>
<p>https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html</p><p>Address of the bookmark: <a href="https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html" rel="nofollow">https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43909/human-complete-genome</guid>
	<pubDate>Wed, 06 Jul 2022 06:42:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43909/human-complete-genome</link>
	<title><![CDATA[Human Complete Genome]]></title>
	<description><![CDATA[<h1 dir="auto">Telomere-to-telomere consortium</h1>
<p dir="auto">We have sequenced the CHM13hTERT human cell line with a number of technologies. Human genomic DNA was extracted from the cultured cell line. As the DNA is native, modified bases will be preserved. The data includes 30x&nbsp;<a href="https://www.pacb.com/">PacBio</a>&nbsp;<a href="https://www.ncbi.nlm.nih.gov/sra/?term=SRX789768*+CHM13">HiFi</a>, 120x coverage of&nbsp;<a href="https://nanoporetech.com/">Oxford Nanopore</a>, 70x&nbsp;<a href="https://www.pacb.com/">PacBio</a>&nbsp;CLR, 50x&nbsp;<a href="https://www.10xgenomics.com/">10X Genomics</a>, as well as&nbsp;<a href="https://bionanogenomics.com/technology/dls-technology/">BioNano DLS</a>&nbsp;and&nbsp;<a href="https://arimagenomics.com/kit/">Arima Genomics HiC</a>. Most raw data is available from this site, with the exception of the PacBio data which was generated by the University of Washington/PacBio and is available from&nbsp;<a href="https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&amp;from_uid=269593">NCBI SRA</a>.</p>
<p dir="auto">A UCSC browser is available for&nbsp;<a href="https://genome.ucsc.edu/h/GCA_009914755.4">v2.0</a>&nbsp;(as well as legacy&nbsp;<a href="http://genome.ucsc.edu/cgi-bin/hgTracks?genome=t2t-chm13-v1.0&amp;hubUrl=http://t2t.gi.ucsc.edu/chm13/hub/hub.txt">v1.0</a>&nbsp;and&nbsp;<a href="http://genome.ucsc.edu/cgi-bin/hgTracks?genome=t2t-chm13-v1.1&amp;hubUrl=http://t2t.gi.ucsc.edu/chm13/hub/hub.txt">v1.1</a>&nbsp;versions). An interactive dotplot visualization of all genomic repeats is also available from&nbsp;<a href="https://resgen.io/paper-data/T2T-Nurk-et-al-2021/views/t2t-identity-v2">resgen.io</a>. Known issues identified in the assembly are tracked at&nbsp;<a href="https://github.com/marbl/CHM13-issues">CHM13 issues</a>.</p>
<p dir="auto">&nbsp;</p>
<p dir="auto">More at&nbsp;https://github.com/marbl/CHM13</p><p>Address of the bookmark: <a href="https://www.science.org/doi/10.1126/science.abj6987" rel="nofollow">https://www.science.org/doi/10.1126/science.abj6987</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</guid>
	<pubDate>Thu, 31 Aug 2023 02:43:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</link>
	<title><![CDATA[Steps to find all the repeats in the genome !]]></title>
	<description><![CDATA[<div><p>To find repeats of length 2 to 9 in a genome, you can run RepeatMasker (a Perl program) and then filter its output by length with a Perl script<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>. As far as I know, RepeatMasker has no option that restricts hits to a length range, so the length filter is applied to the output in step 4. Here's a step-by-step guide:</p></div><div><ol>
<li>Install RepeatMasker: First, you need to install RepeatMasker on your system. You can download it from the RepeatMasker website<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>.</li>
<li>Prepare the genome sequence: Make sure you have the genome sequence in FASTA format. Let's assume the file is named "genome.fasta".</li>
<li>Run RepeatMasker:<blockquote><p>./RepeatMasker -pa &lt;number_of_processors&gt; -norna -no_is -div &lt;divergence_value&gt; -gff -xsmall -poly -species &lt;species_name&gt; -dir &lt;output_directory&gt; genome.fasta</p></blockquote><p>Do not add -nolow here: it skips low-complexity DNA and simple repeats, which include the short tandem repeats of interest. Replace the following placeholders with appropriate values:</p><ul>
<li><code>&lt;number_of_processors&gt;</code>: The number of processors you want to use for parallel processing.</li>
<li><code>&lt;divergence_value&gt;</code>: The maximum percent divergence from the consensus; only repeats less diverged than this value are masked.</li>
<li><code>&lt;species_name&gt;</code>: The name of the species you are analyzing.</li>
<li><code>&lt;output_directory&gt;</code>: The directory where you want the output files to be saved.</li>
</ul></li>
<li>Analyze the output: RepeatMasker will generate several output files, including a .out file. You can parse this file to extract the information you need, keeping only the hits whose length falls in the range you want (in this case, 2 to 9). There is a Perl tool called "one_code_to_find_them_all.pl" that can help you parse RepeatMasker output files<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>. You can download it from the source provided.</li>
<li>Use the provided Perl script: Once you have the "one_code_to_find_them_all.pl" script, you can run it to conveniently parse the RepeatMasker output files. Here's an example of how to use it:<blockquote><p>perl one_code_to_find_them_all.pl --rm &lt;RepeatMasker_out_file&gt; --length &lt;length_file&gt;</p></blockquote><p>Replace&nbsp;<code>&lt;RepeatMasker_out_file&gt;</code>&nbsp;with the path to your RepeatMasker .out file, and&nbsp;<code>&lt;length_file&gt;</code>&nbsp;with the path to a file containing the lengths of the reference elements.</p></li>
</ol></div><div><p>This script will generate several output files, including .log.txt and .copynumber.csv, which contain quantitative information about the identified repeat elements.</p></div><div><p>Remember to adjust the parameters and options according to your specific needs and the characteristics of your genome.</p></div>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44628/uncovar-workflow-for-transparent-and-robust-virus-variant-calling-genome-reconstruction-and-lineage-assignment</guid>
	<pubDate>Mon, 05 Aug 2024 23:01:29 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44628/uncovar-workflow-for-transparent-and-robust-virus-variant-calling-genome-reconstruction-and-lineage-assignment</link>
	<title><![CDATA[UnCoVar: Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction and Lineage Assignment]]></title>
	<description><![CDATA[<p>UnCoVar: Workflow for Transparent and Robust Virus Variant Calling, Genome Reconstruction and Lineage Assignment</p>
<ul>
<li>
<p>Using state of the art tools, easily extended for other viruses</p>
</li>
<li>
<p>Tool and database updates for critical components via Conda</p>
</li>
<li>
<p>Built using modern design patterns with Conda and Snakemake</p>
</li>
<li>
<p>Extensible and easy to customize</p>
</li>
<li>
<p>Submission Ready Genomes</p>
</li>
<li>
<p>Customizable reporting with comprehensive visualization</p>
</li>
</ul>
<p>https://ikim-essen.github.io/uncovar/</p>
<p>GitHub:&nbsp;https://github.com/IKIM-Essen/uncovar</p>
<p>&nbsp;</p>
<p>&nbsp;</p><p>Address of the bookmark: <a href="https://ikim-essen.github.io/uncovar/" rel="nofollow">https://ikim-essen.github.io/uncovar/</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44766/genome-simulation-with-slim-and-msprime</guid>
	<pubDate>Fri, 31 Jan 2025 12:47:43 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44766/genome-simulation-with-slim-and-msprime</link>
	<title><![CDATA[Genome Simulation with SLiM and msprime]]></title>
	<description><![CDATA[<p>Genome simulation is an essential tool in population genetics, enabling researchers to model evolutionary processes and study genetic variation. Two widely used simulation tools in this field are <strong style="font-size: 12.8px;">SLiM</strong><span style="font-size: 12.8px; font-weight: normal;"> and </span><strong style="font-size: 12.8px;">msprime</strong><span style="font-size: 12.8px; font-weight: normal;">. While both serve different purposes, they can be used together with the </span><strong style="font-size: 12.8px;">slendr</strong><span style="font-size: 12.8px; font-weight: normal;"> framework to compare simulation outputs effectively.</span></p><h2>Overview of SLiM and msprime</h2><h3>SLiM: Forward Genetic Simulator</h3><p>SLiM is a <strong>free, open-source</strong> tool designed for forward genetic simulations. It allows researchers to model complex evolutionary scenarios, including selection, recombination, and demographic events, making it particularly useful for studying adaptation and selection in populations.</p><p><strong>Key Features of SLiM:</strong></p><ul>
<li>
<p>Simulates population evolution forward in time</p>
</li>
<li>
<p>Supports custom evolutionary models using an embedded scripting language</p>
</li>
<li>
<p>Allows modeling of spatial and ecological dynamics</p>
</li>
<li>
<p>Provides high flexibility and extensibility for user-defined scenarios</p>
</li>
<li>
<p>Available on GitHub as an open-source project</p>
</li>
</ul><h3>msprime: Ancestry and Mutation Simulator</h3><p>msprime is an efficient, <strong>open-source</strong> tool that simulates ancestry and mutations using a coalescent framework. It is known for its high-speed performance and low memory requirements, making it a popular choice for large-scale genomic simulations.</p><p><strong>Key Features of msprime:</strong></p><ul>
<li>
<p>Implements coalescent simulations for ancestry modeling</p>
</li>
<li>
<p>Efficiently simulates large population histories</p>
</li>
<li>
<p>Supports the addition of mutations to genealogies</p>
</li>
<li>
<p>Developed using an open-source community model</p>
</li>
<li>
<p>Often faster and more memory-efficient than alternative simulators</p>
</li>
</ul><h2>Using SLiM and msprime with slendr</h2><p>Both SLiM and msprime can be integrated with <strong>slendr</strong>, a framework that facilitates structured population genetic simulations. This integration allows for seamless comparison of simulation outputs.</p><h3>How They Work Together:</h3><ul>
<li>
<p>SLiM and msprime simulations can be analyzed within slendr.</p>
</li>
<li>
<p>The <strong>ts_read()</strong> function in slendr enables loading and comparing tree sequence outputs from both simulators.</p>
</li>
<li>
<p>This integration allows researchers to validate simulation results and gain deeper insights into evolutionary processes.</p>
</li>
</ul><h2>Performance Considerations</h2><p>While SLiM offers powerful forward simulations with extensive customization, msprime is often preferred for its <strong>speed and memory efficiency</strong> when simulating ancestry and mutations. The choice between the two depends on the research goals:</p><ul>
<li>
<p><strong>For detailed evolutionary modeling with selection and recombination:</strong> Use SLiM.</p>
</li>
<li>
<p><strong>For large-scale coalescent simulations with mutations:</strong> Use msprime.</p>
</li>
<li>
<p><strong>For comparing different simulation models and their outputs:</strong> Use slendr to integrate SLiM and msprime results.</p>
</li>
</ul><h2>Conclusion</h2><p>SLiM and msprime are valuable tools for genome simulation, each serving distinct but complementary purposes in population genetics research. By leveraging the strengths of both simulators with slendr, researchers can conduct robust and efficient evolutionary simulations, enhancing our understanding of genetic diversity and adaptation.</p><p>For more information, check out the official GitHub repositories for <strong>SLiM</strong> and <strong>msprime</strong>, and explore the <strong>slendr</strong> framework for streamlined simulation workflows.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37205/afterqc-automatic-filtering-trimming-error-removing-and-quality-control-for-fastq-data</guid>
	<pubDate>Fri, 29 Jun 2018 03:26:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37205/afterqc-automatic-filtering-trimming-error-removing-and-quality-control-for-fastq-data</link>
	<title><![CDATA[AfterQC: Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data]]></title>
	<description><![CDATA[Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data
AfterQC can simply go through all fastq files in a folder and then output three folders: good, bad and QC, which contain the good reads, the bad reads and the QC results of each fastq file/pair.
Currently it supports processing data from HiSeq 2000/2500/3000/4000, NextSeq 500/550, MiniSeq and other Illumina 1.8 or newer formats.
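
The good/bad split described above can be pictured with a minimal sketch. It is not AfterQC's actual filtering logic: the mean-quality criterion, the threshold of 20 and all function names are illustrative assumptions. The only fact relied on is that Illumina 1.8 or newer formats encode qualities as Phred+33.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def is_good_read(quality: str, min_mean_q: float = 20.0) -> bool:
    """Hypothetical criterion: mean quality at or above a threshold."""
    scores = phred33_scores(quality)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


def split_reads(
    fastq_lines: Iterable[str], min_mean_q: float = 20.0
) -> Tuple[List[Record], List[Record]]:
    """Partition 4-line FASTQ records into (good, bad) lists."""
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    good: List[Record] = []
    bad: List[Record] = []
    # Step through complete 4-line records; a trailing partial record is ignored.
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        record = (header, seq, qual)
        if is_good_read(qual, min_mean_q):
            good.append(record)
        else:
            bad.append(record)
    return good, bad
```

For real data, pass an open file handle to split_reads and write each list to its own folder. AfterQC itself applies more checks than this sketch does.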

The author has reimplemented this tool in C++ with multithreading support to make it much faster. The new tool is called fastp and can be found at: https://github.com/OpenGene/fastp . If you prefer a C++ based tool, please use fastp instead.

https://github.com/OpenGene/AfterQC<p>Address of the bookmark: <a href="https://github.com/OpenGene/AfterQC" rel="nofollow">https://github.com/OpenGene/AfterQC</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32943/npscarf-scaffolding-and-completing-assemblies-in-real-time-fashion</guid>
	<pubDate>Tue, 23 May 2017 04:53:29 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32943/npscarf-scaffolding-and-completing-assemblies-in-real-time-fashion</link>
	<title><![CDATA[npScarf: Scaffolding and Completing Assemblies in Real-time Fashion]]></title>
	<description><![CDATA[<p><em>npScarf</em>&nbsp;(jsa.np.npscarf) is a program that scaffolds and completes draft genome assemblies in real time with Oxford Nanopore sequencing. The pipeline can run on a computing cluster as well as on a laptop computer for microbial datasets. It also facilitates the real-time analysis of positional information such as gene ordering and the detection of genes from mobile elements (plasmids and genomic islands).</p>
<p>Complete paper at&nbsp;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5321748/</p><p>Address of the bookmark: <a href="https://github.com/mdcao/npScarf" rel="nofollow">https://github.com/mdcao/npScarf</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36015/repeat-aware-repeat-aware-scaffolding-evaluation-framework-by-igor-mandric</guid>
	<pubDate>Wed, 21 Mar 2018 18:10:00 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36015/repeat-aware-repeat-aware-scaffolding-evaluation-framework-by-igor-mandric</link>
	<title><![CDATA[repeat-aware: Repeat aware scaffolding evaluation framework by Igor Mandric]]></title>
	<description><![CDATA[<p>Genome scaffolding is a classical challenging problem in bioinformatics. It refers to joining assembly contigs into chains (called scaffolds). The join between two contigs A and B is considered correct if:</p>
<ul>
<li>Their relative orientation is correct</li>
<li>Their relative order is correct</li>
<li>The gap estimate is similar to the true distance on the reference</li>
</ul>
<p>Scaffolding validation is also a challenging problem. One of the main obstacles to an adequate scaffolding evaluation is genome repeats. The previous standard for evaluation&nbsp;<a href="https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-3-r42">(Hunt et al.,&nbsp;<em>Genome Biology</em>, 2014)</a>&nbsp;did not take repeats into account. This evaluation framework does.</p>
<p style="text-align: center;"><a href="https://camo.githubusercontent.com/9675b90205e5bc0dc0b6b84b321b00bc87d8d88e/687474703a2f2f616c616e2e63732e6773752e6564752f7265706561742d61776172652f6669677572652e706e67" target="_blank"><img src="https://camo.githubusercontent.com/9675b90205e5bc0dc0b6b84b321b00bc87d8d88e/687474703a2f2f616c616e2e63732e6773752e6564752f7265706561742d61776172652f6669677572652e706e67" width="75%" alt="image" style="border: 0px;"></a></p>
<p>The new evaluation framework considers the optimal assignment of contigs in the output scaffolding to contigs in the reference scaffolding in the sense of the number of correct links.</p>
<p>&nbsp;</p>
<p>https://github.com/mandricigor/repeat-aware</p><p>Address of the bookmark: <a href="https://github.com/mandricigor/repeat-aware" rel="nofollow">https://github.com/mandricigor/repeat-aware</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/39302/understanding-reads-mapping-and-flags</guid>
	<pubDate>Thu, 25 Apr 2019 09:06:20 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/39302/understanding-reads-mapping-and-flags</link>
	<title><![CDATA[Understanding reads mapping and flags !]]></title>
	<description><![CDATA[<p><strong>Linear Alignment:</strong>&nbsp;An alignment of a read to a single reference sequence that may&nbsp;<q>include insertions, deletions, skips and clipping</q>,&nbsp;<span style="text-decoration: underline;">but may not include direction changes</span>&nbsp;(i.e. one portion of the alignment on the forward strand and another portion on the reverse strand).<sup id="fnref:1"><a href="https://yulijia.net/en/bioinformatics/2015/12/21/Linear-Chimeric-Supplementary-Primary-and-Secondary-Alignments.html#fn:1"><br /></a></sup></p><p><strong>Chimeric Alignment:</strong>&nbsp;An alignment of a read that cannot be represented as a linear alignment. Typically, one of the linear alignments in a chimeric alignment is considered the &ldquo;representative&rdquo; alignment, and the others are called &ldquo;supplementary&rdquo; and are distinguished by the supplementary alignment flag.<sup id="fnref:1:1"><a href="https://yulijia.net/en/bioinformatics/2015/12/21/Linear-Chimeric-Supplementary-Primary-and-Secondary-Alignments.html#fn:1"><br /></a></sup></p><p>Chimeric reads are indicative of structural variation in DNA-seq and may indicate the presence of&nbsp;<a href="https://en.wikipedia.org/wiki/Chimeric_gene">chimeric genes</a>&nbsp;in RNA-seq.<sup id="fnref:2"><a href="https://yulijia.net/en/bioinformatics/2015/12/21/Linear-Chimeric-Supplementary-Primary-and-Secondary-Alignments.html#fn:2"><br /></a></sup></p><p>In short, a chimeric read can be split into two or more parts, each of which maps to the reference (it is not&nbsp;<a href="https://www.biostars.org/p/119537/">hard-clipped</a>); the total length of the mapped parts is longer than the read length.<sup id="fnref:3"><a href="https://yulijia.net/en/bioinformatics/2015/12/21/Linear-Chimeric-Supplementary-Primary-and-Secondary-Alignments.html#fn:3"><br /></a></sup></p><p><strong>Representative alignment:</strong>&nbsp;A chimeric alignment that is represented as a set of linear alignments that do not have large overlaps typically has one linear alignment that is considered the representative alignment.<sup id="fnref:4"><a href="https://yulijia.net/en/bioinformatics/2015/12/21/Linear-Chimeric-Supplementary-Primary-and-Secondary-Alignments.html#fn:4"><br /></a></sup></p><p>One read can align to multiple positions. The linear alignment whose sequence does not have large overlaps with the others is called the representative alignment; the alignments at the other positions are called supplementary alignments.</p><p>It seems that GATK can realign those representative reads to the correct position via&nbsp;<q>RealignerTargetCreator and IndelRealigner</q>. (WARNING: I am not quite sure if I understand this correctly. If someone could help me, please leave me a message below, thanks.)</p><p><strong>Supplementary Alignment:</strong>&nbsp;A linear alignment that is part of a chimeric alignment but is not the representative alignment.</p><p><strong>Primary Alignment and Secondary Alignment:</strong>&nbsp;A read may map ambiguously to multiple locations, e.g. due to repeats.&nbsp;<strong>Only one of the multiple read alignments is considered primary</strong>,<span style="text-decoration: underline;">&nbsp;and this decision may be arbitrary</span>. All other alignments have the secondary alignment flag.<sup id="fnref:5"><a href="https://yulijia.net/en/bioinformatics/2015/12/21/Linear-Chimeric-Supplementary-Primary-and-Secondary-Alignments.html#fn:5"><br /></a></sup></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>