<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/44783?offset=170</link>
	<atom:link href="https://bioinformaticsonline.com/related/44783?offset=170" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43060/simons-genome-diversity-project</guid>
	<pubDate>Sat, 08 May 2021 21:55:25 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43060/simons-genome-diversity-project</link>
	<title><![CDATA[Simons Genome Diversity Project]]></title>
	<description><![CDATA[<p><em>Complete genome sequences from more than one hundred diverse human populations</em></p>
<p>All genomes in the dataset were sequenced to at least 30x coverage using Illumina technology. The sequencing reads were mapped and genotyped using a customized procedure that was optimized for population genetic analysis. The researchers eliminated bias of alleles toward matching the human genome reference sequence, and determined genotypes on a single-sample basis to avoid preferential calling of genotypes from populations that had more individuals represented.</p><p>Address of the bookmark: <a href="https://www.simonsfoundation.org/simons-genome-diversity-project/" rel="nofollow">https://www.simonsfoundation.org/simons-genome-diversity-project/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43376/hisat2-index-files-download</guid>
	<pubDate>Wed, 15 Sep 2021 22:17:49 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43376/hisat2-index-files-download</link>
	<title><![CDATA[HISAT2 Index Files Download !]]></title>
	<description><![CDATA[<p>Resource for downloading all HISAT2-related files, including the prebuilt index files.</p>
<p>Please cite:</p>
<blockquote>
<p>Kim, D., Paggi, J.M., Park, C.&nbsp;<em>et al.</em>&nbsp;Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.&nbsp;<em>Nat Biotechnol</em>&nbsp;<strong>37</strong>, 907&ndash;915 (2019).&nbsp;<a href="https://doi.org/10.1038/s41587-019-0201-4" target="_blank">https://doi.org/10.1038/s41587-019-0201-4</a></p>
</blockquote><p>Address of the bookmark: <a href="http://daehwankimlab.github.io/hisat2/download/#h-sapiens" rel="nofollow">http://daehwankimlab.github.io/hisat2/download/#h-sapiens</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/43670/useful-bioinformatics-analysis-tools</guid>
	<pubDate>Thu, 23 Dec 2021 23:10:02 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/43670/useful-bioinformatics-analysis-tools</link>
	<title><![CDATA[Useful Bioinformatics Analysis Tools !]]></title>
	<description><![CDATA[<h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=cometa&amp;subpage=about">CoMeta</a></h3><p><strong>Classifier of reads from metagenomic sequencing experiments.</strong></p><p><span>&bull;&nbsp;&nbsp;Kawulok, J., Deorowicz, S.,&nbsp;</span><em>CoMeta: Classification of Metagenomes Using k-mers</em><span>,&nbsp;</span><a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0121453">PLOS ONE,&nbsp;</a><span>2015; 10(4):1&ndash;23,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=CoMSA&amp;subpage=about">CoMSA</a></h3><p><strong>Compressor of multiple sequence alignments of proteins.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Walczyszyn, J., Debudaj-Grabysz, A.,&nbsp;</span><em>CoMSA: compression of protein multiple sequence alignment files</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty619">Bioinformatics,&nbsp;</a><span>2019; 35(2):227&ndash;234,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=dsrc&amp;subpage=about">DSRC</a></h3><p><strong>Compressor of sequencing reads.</strong></p><p><span>&bull;&nbsp;&nbsp;Roguski, L., Deorowicz, S.,&nbsp;</span><em>DSRC 2: Industry-oriented compression of FASTQ files</em><span>,&nbsp;</span><a href="http://bioinformatics.oxfordjournals.org/content/30/15/2213">Bioinformatics,&nbsp;</a><span>2014; 30(15):2213&ndash;2215,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Grabowski, Sz.,&nbsp;</span><em>Compression of DNA sequences in FASTQ format</em><span>,&nbsp;</span><a href="http://bioinformatics.oxfordjournals.org/">Bioinformatics,&nbsp;</a><span>2011; 27(6):860&ndash;862,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=famsa&amp;subpage=about">FAMSA</a></h3><p><strong>Multiple sequence alignment designed for huge families of proteins (even containing hundreds of thousands of 
sequences).</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Debudaj-Grabysz, A., Gudys, A.,&nbsp;</span><em>FAMSA: Fast and accurate multiple sequence alignment of huge protein families</em><span>,&nbsp;</span><a href="http://www.nature.com/articles/srep33964">Scientific Reports,&nbsp;</a><span>2016; 6(33964):</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=fastore&amp;subpage=about">FaStore</a></h3><p><strong>Compressor of FASTQ files.</strong></p><p><span>&bull;&nbsp;&nbsp;Roguski, L., Ochoa, I., Hernaez, M., Deorowicz, S.,&nbsp;</span><em>FaStore - a space-saving solution for raw sequencing data</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty205">Bioinformatics,&nbsp;</a><span>2018; 34(16):2748&ndash;2756,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=fqsqueezer&amp;subpage=about">FQSqueezer</a></h3><p><strong>Experimental high-end compressor of FASTQ files.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S.,&nbsp;</span><em>FQSqueezer: k-mer-based compression of sequencing data</em><span>,&nbsp;</span><a href="https://www.nature.com/articles/s41598-020-57452-6">Scientific Reports,&nbsp;</a><span>2020; 10(578):</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=gdc&amp;subpage=about">GDC</a></h3><p><strong>Compressor of collections of genome sequences.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Danek, A., Niemiec, M.,&nbsp;</span><em>GDC 2: Compression of large collections of genomes</em><span>,&nbsp;</span><a href="http://www.nature.com/srep/2015/150625/srep11565/full/srep11565.html">Scientific Reports,&nbsp;</a><span>2015; 5(11565):1&ndash;12,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Grabowski, Sz.,&nbsp;</span><em>Robust relative compression of genomes with random access</em><span>,&nbsp;</span><a 
href="http://bioinformatics.oxfordjournals.org/content/27/21/2979.abstract">Bioinformatics,&nbsp;</a><span>2011; 27(21):2979&ndash;2986,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=gtc&amp;subpage=about">GTC</a></h3><p><strong>Genotype databases compressor with support for fast queries.</strong></p><p><span>&bull;&nbsp;&nbsp;Danek, A., Deorowicz, S.,&nbsp;</span><em>GTC: how to maintain huge genotype collections in a compressed form</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty023">Bioinformatics,&nbsp;</a><span>2018; 34(11):1834&ndash;1840,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=gtshark&amp;subpage=about">GTShark</a></h3><p><strong>Genotypes compressor.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Danek, A.,&nbsp;</span><em>GTShark: Genotype compression in large projects</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/btz508">Bioinformatics,&nbsp;</a><span>2019; 35(22):4791&ndash;4793,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=kmc&amp;subpage=about">KMC</a></h3><p><strong>Memory frugal&nbsp;<em>k</em>-mer counter.</strong></p><p><span>&bull;&nbsp;&nbsp;Kokot, M., Długosz, M., Deorowicz, S.,&nbsp;</span><em>KMC 3: counting and manipulating k-mer statistics</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/btx304">Bioinformatics,&nbsp;</a><span>2017; 33(17):2759&ndash;2761,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Kokot, M., Grabowski, Sz., Debudaj-Grabysz, A.,&nbsp;</span><em>KMC 2: Fast and resource-frugal k-mer counting</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/btv022">Bioinformatics,&nbsp;</a><span>2015; 31(10):1569&ndash;1576,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Debudaj-Grabysz, A., Grabowski, Sz.,&nbsp;</span><em>Disk-based k-mer counting on a 
PC</em><span>,&nbsp;</span><a href="http://www.biomedcentral.com/1471-2105/14/160">BMC Bioinformatics,&nbsp;</a><span>2013; 14:Article no. 160,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=kmer-db&amp;subpage=about">Kmer-db</a></h3><p><strong>Tool for estimation of evolutionary distances in a collection of genomes.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Gudys, A., Długosz, M., Kokot, M., Danek, A.,&nbsp;</span><em>Kmer-db: instant evolutionary distance estimation</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty610">Bioinformatics,&nbsp;</a><span>2019; 35(1):133&ndash;136,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=mugi&amp;subpage=about">MuGI</a></h3><p><strong>Index allowing queries for a collection of multiple genome sequences.</strong></p><p><span>&bull;&nbsp;&nbsp;Danek, A., Deorowicz, S., Grabowski, Sz.,&nbsp;</span><em>Indexes of Large Genome Collections on a PC</em><span>,&nbsp;</span><a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0109384">PLOS ONE,&nbsp;</a><span>2014; 9(10):e109384,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=orcom&amp;subpage=about">ORCOM</a></h3><p><strong>Experimental compressor of sequencing reads.</strong></p><p><span>&bull;&nbsp;&nbsp;Grabowski, Sz., Deorowicz, S., Roguski, L.,&nbsp;</span><em>Disk-based compression of data from genome sequencing</em><span>,&nbsp;</span><a href="http://bioinformatics.oxfordjournals.org/content/early/2014/12/22/bioinformatics.btu844.abstract">Bioinformatics,&nbsp;</a><span>2014; 31(9):1389&ndash;1395,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=pgsa&amp;subpage=about">PgSA</a></h3><p><strong>Index allowing queries for a collection of sequencing reads.</strong></p><p><span>&bull;&nbsp;&nbsp;Kowalski, T., Grabowski, Sz., Deorowicz, 
S.,&nbsp;</span><em>Indexing arbitrary-length k-mers in sequencing reads</em><span>,&nbsp;</span><a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0133198">PLOS ONE,&nbsp;</a><span>2015; 10(7):1&ndash;16,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=quickprobs&amp;subpage=about">QuickProbs</a></h3><p><strong>Multiple sequence alignment designed especially for GPU.</strong></p><p><span>&bull;&nbsp;&nbsp;Gudys, A., Deorowicz, S.,&nbsp;</span><em>QuickProbs 2: towards rapid construction of high-quality alignments of large protein families</em><span>,&nbsp;</span><a href="http://www.nature.com/articles/srep41553">Scientific Reports,&nbsp;</a><span>2017; 7(41553):</span><br /><span>&bull;&nbsp;&nbsp;Gudys, A., Deorowicz, S.,&nbsp;</span><em>QuickProbs &ndash; A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors</em><span>,&nbsp;</span><a href="http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0088901">PLOS ONE,&nbsp;</a><span>2014; 9(2):e88901,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=reckoner&amp;subpage=about">RECKONER</a></h3><p><strong>Read error corrector.</strong></p><p><span>&bull;&nbsp;&nbsp;Długosz, M., Deorowicz, S.,&nbsp;</span><em>RECKONER: read error corrector based on KMC</em><span>,&nbsp;</span><a href="https://academic.oup.com/bioinformatics/article-abstract/33/7/1086/2843893/RECKONER-read-error-corrector-based-on-KMC">Bioinformatics,&nbsp;</a><span>2017; 33(7):1086&ndash;1089,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=tgc&amp;subpage=about">TGC</a></h3><p><strong>Compressor of collections of genomes given in Variant Call Format (VCF) files.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Danek, A., Grabowski, Sz.,&nbsp;</span><em>Genome compression: a novel approach for large collections</em><span>,&nbsp;</span><a 
href="http://bioinformatics.oxfordjournals.org/content/early/2013/08/29/bioinformatics.btt460">Bioinformatics,&nbsp;</a><span>2013; 29(20):2572&ndash;2578,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=vcfshark&amp;subpage=about">VCFShark</a></h3><p><strong>Compressor of VCF files.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Danek, A., Kokot, M.,&nbsp;</span><em>VCFShark: how to squeeze a VCF file</em><span>,&nbsp;</span><a href="https://www.biorxiv.org/content/10.1101/2020.12.18.423437v1">bioRxiv.org,&nbsp;</a><span>2020,</span></p><h3><a href="http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&amp;project=whisper&amp;subpage=about">Whisper</a></h3><p><strong>Experimental mapper of whole genome sequencing data.</strong></p><p><span>&bull;&nbsp;&nbsp;Deorowicz, S., Gudys, A.,&nbsp;</span><em>Whisper 2: indel-sensitive short read mapping</em><span>,&nbsp;</span><a href="https://doi.org/10.1101/2019.12.18.881292">bioRxiv.org,&nbsp;</a><span>2019,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz.,&nbsp;</span><em>Whisper: read sorting allows robust mapping of DNA sequencing data</em><span>,&nbsp;</span><a href="https://doi.org/10.1093/bioinformatics/bty927">Bioinformatics,&nbsp;</a><span>2019; 35(12):2043&ndash;2050,</span><br /><span>&bull;&nbsp;&nbsp;Deorowicz, S., Debudaj-Grabysz, A., Gudys, A., Grabowski, Sz.,&nbsp;</span><em>Robust mapping of whole genome sequencing data</em><span>,&nbsp;</span><a href="https://meetings.cshl.edu/abstracts.aspx?meet=GENOME&amp;year=17">Poster at The Biology of Genomes Conference,&nbsp;</a><span>2017;</span></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43799/kast</guid>
	<pubDate>Wed, 23 Feb 2022 08:28:36 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43799/kast</link>
	<title><![CDATA[KAST]]></title>
	<description><![CDATA[<p><span>KAST performs alignment-free k-tuple frequency comparisons between sequences. The input can be two files (e.g. a reference and a query) or a single file whose sequences are compared pairwise.</span></p><p>Address of the bookmark: <a href="https://github.com/martinjvickers/KAST" rel="nofollow">https://github.com/martinjvickers/KAST</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44223/ale-assembly-likelihood-estimator</guid>
	<pubDate>Wed, 08 Mar 2023 01:39:33 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44223/ale-assembly-likelihood-estimator</link>
	<title><![CDATA[ALE: Assembly Likelihood Estimator]]></title>
	<description><![CDATA[<p>To inspect ALE scores in IGV, import the assembly, the BAM file and the ALE scores. You can convert the .ale file to a set of .wig files with ale2wiggle.py, which IGV can read directly. Depending on your genome size, you may want to convert the .wig files to BigWig format.</p><p>Address of the bookmark: <a href="https://github.com/sc932/ALE" rel="nofollow">https://github.com/sc932/ALE</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</guid>
	<pubDate>Thu, 31 Aug 2023 02:43:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44371/steps-to-find-all-the-repeats-in-the-genome</link>
	<title><![CDATA[Steps to find all the repeats in the genome !]]></title>
	<description><![CDATA[<div><p>To find short repeats (2 to 9 bp) in a genome, you can annotate repeats with RepeatMasker and then parse its output with a Perl script. RepeatMasker has no documented option for restricting repeat length, so the length filter is applied when parsing the output. Here's a step-by-step guide:</p></div><div><ol>
<li>Install RepeatMasker: First, install RepeatMasker on your system. You can download it from the RepeatMasker website.</li>
<li>Prepare the genome sequence: Make sure you have the genome sequence in FASTA format. Let's assume the file is named "genome.fasta".</li>
<li>Run RepeatMasker on the genome:<blockquote><p>./RepeatMasker -pa &lt;number_of_processors&gt; -norna -no_is -div &lt;divergence_value&gt; -gff -xsmall -poly -species &lt;species_name&gt; -dir &lt;output_directory&gt; genome.fasta</p></blockquote><p>Replace the following placeholders with appropriate values:</p><ul>
<li><code>&lt;number_of_processors&gt;</code>: The number of search jobs to run in parallel.</li>
<li><code>&lt;divergence_value&gt;</code>: The maximum percent divergence from the consensus sequence; only repeats less diverged than this value are masked.</li>
<li><code>&lt;species_name&gt;</code>: The name of the species you are analyzing.</li>
<li><code>&lt;output_directory&gt;</code>: The directory where you want the output files to be saved.</li>
</ul><p>Do not add <code>-nolow</code> when short tandem repeats are the target, because that option skips low-complexity DNA and simple repeats. To search with a custom repeat library instead of a species name, replace <code>-species</code> with <code>-lib</code> followed by the library file.</p></li>
<li>Analyze the output: RepeatMasker will generate several output files, including a .out file. You can parse this file to extract the information you need. This is where the 2 to 9 bp restriction is applied, for example by keeping only simple-repeat entries such as (CA)n whose motif is 2 to 9 bases long. There is a Perl tool called "one_code_to_find_them_all.pl" that can help you parse RepeatMasker output files<a href="https://mobilednajournal.biomedcentral.com/articles/10.1186/1759-8753-5-13" target="_blank">[0]</a>. You can download it from the source provided.</li>
<li>Use the provided Perl script: Once you have the "one_code_to_find_them_all.pl" script, you can run it to parse the RepeatMasker output files. Here's an example of how to use it:<blockquote><p>perl one_code_to_find_them_all.pl --rm &lt;RepeatMasker_out_file&gt; --length &lt;length_file&gt;</p></blockquote></li>
</ol></div><div><div><p>Replace&nbsp;<code>&lt;RepeatMasker_out_file&gt;</code>&nbsp;with the path to your RepeatMasker .out file, and&nbsp;<code>&lt;length_file&gt;</code>&nbsp;with the path to a file containing the lengths of the reference elements. Check the tool's documentation for any additional arguments it requires.</p></div><div><p>This script will generate several output files, including .log.txt and .copynumber.csv, which contain quantitative information about the identified repeat elements.</p></div><div><p>Remember to adjust the parameters and options according to your specific needs and the characteristics of your genome.</p></div></div>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44637/tools-to-access-the-quality-of-your-assembled-genome</guid>
	<pubDate>Thu, 08 Aug 2024 23:31:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44637/tools-to-access-the-quality-of-your-assembled-genome</link>
	<title><![CDATA[Tools to assess the quality of your assembled genome!]]></title>
	<description><![CDATA[<ul dir="auto">
<li><a href="https://github.com/linsalrob/fasta_validator">FASTA VALIDATOR</a>&nbsp;+&nbsp;<a href="https://github.com/shenwei356/seqkit">SEQKIT RMDUP</a>: FASTA validation</li>
<li><a href="https://genometools.org/tools/gt_gff3validator.html">GENOMETOOLS GT GFF3VALIDATOR</a>: GFF3 validation</li>
<li><a href="https://github.com/PlantandFoodResearch/assemblathon2-analysis/blob/a93cba25d847434f7eadc04e63b58c567c46a56d/assemblathon_stats.pl">ASSEMBLATHON STATS</a>: Assembly statistics</li>
<li><a href="https://genometools.org/tools/gt_stat.html">GENOMETOOLS GT STAT</a>: Annotation statistics</li>
<li><a href="https://github.com/ncbi/fcs">NCBI FCS ADAPTOR</a>: Adaptor contamination pass/fail</li>
<li><a href="https://github.com/ncbi/fcs">NCBI FCS GX</a>: Foreign organism contamination pass/fail</li>
<li><a href="https://gitlab.com/ezlab/busco">BUSCO</a>: Gene-space completeness estimation</li>
<li><a href="https://github.com/tolkit/telomeric-identifier">TIDK</a>: Telomere repeat identification</li>
<li><a href="https://github.com/oushujun/LTR_retriever/blob/master/LAI">LAI</a>: Continuity of repetitive sequences</li>
<li><a href="https://github.com/DerrickWood/kraken2">KRAKEN2</a>: Taxonomy classification</li>
<li><a href="https://github.com/igvteam/juicebox.js">HIC CONTACT MAP</a>: Alignment and visualisation of HiC data</li>
<li><a href="https://github.com/mummer4/mummer">MUMMER</a>&nbsp;&rarr;&nbsp;<a href="http://circos.ca/documentation/">CIRCOS</a>&nbsp;+&nbsp;<a href="https://plotly.com/">DOTPLOT</a>&nbsp;&amp;&nbsp;<a href="https://github.com/lh3/minimap2">MINIMAP2</a>&nbsp;&rarr;&nbsp;<a href="https://github.com/schneebergerlab/plotsr">PLOTSR</a>: Synteny analysis</li>
<li><a href="https://github.com/marbl/merqury">MERQURY</a>: K-mer completeness, consensus quality and phasing assessment</li>
</ul>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a core task in bioinformatics: reconstructing an organism's genome from DNA sequence reads. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide walks you through the steps of genome assembly using widely used tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly involves piecing together DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, accurate reads, well suited to assembling small genomes and to polishing long-read assemblies.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use tools like <strong>FastQC</strong> or <strong>MultiQC</strong> to assess the quality of your reads:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq<br />multiqc . </code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq \<br />trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \<br />ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36 </code></div>
</div>
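<p>To make a setting such as <code>SLIDINGWINDOW:4:20</code> more concrete, here is a minimal Python sketch of sliding-window quality trimming. It is a simplified illustration, not Trimmomatic's actual implementation: the function names <code>phred33</code> and <code>sliding_window_trim</code> are hypothetical, and Phred+33 quality encoding is assumed.</p>

```python
def phred33(quality_string):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(char) - 33 for char in quality_string]


def sliding_window_trim(qualities, window=4, threshold=20):
    """Return the number of leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window
    whose mean quality is below the threshold.
    """
    for start in range(0, len(qualities) - window + 1):
        chunk = qualities[start:start + window]
        if threshold > sum(chunk) / window:
            return start
    # No window failed (or the read is shorter than one window): keep it all.
    return len(qualities)


# Six high-quality bases followed by four poor ones.
scores = phred33("IIIIII##!!")
keep = sliding_window_trim(scores, window=4, threshold=20)
print(keep)
```

<p>A read whose kept length ends up below a minimum (36 in the <code>MINLEN:36</code> setting) would then be discarded.</p>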
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Ideal for long-read datasets.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST generates assembly statistics, such as N50, genome size, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
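<p>As a rough illustration of two of these statistics, the sketch below computes N50 and GC content in plain Python. It is not QUAST's code: the function names <code>n50</code> and <code>gc_content</code> and the example contig lengths are made up for this example, and GC content is computed over A, C, G and T only, which is an assumption about how ambiguous bases are handled.</p>

```python
def n50(lengths):
    """Return the N50: the length of the contig at which the running total
    of contig lengths, sorted longest first, reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def gc_content(sequence):
    """Return the GC fraction of a sequence, ignoring ambiguous bases such as N."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


contig_lengths = [80, 70, 50, 40, 30, 20]  # total 290, half is 145
print(n50(contig_lengths))                 # 80 + 70 = 150 reaches 145, so N50 is 70
print(round(gc_content("ATGCGCNN"), 3))
```

<p>A higher N50 means that more of the assembly sits in long contigs, but it says nothing about correctness, which is why it is usually read together with completeness checks.</p>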
</li>
<li>
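<p><strong>BUSCO</strong><br />BUSCO checks genome completeness by searching for conserved single-copy genes. Choose the lineage dataset (<code>-l</code>) that matches your organism, for example <code>bacteria_odb10</code> for a bacterial genome:</p>
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l bacteria_odb10 -m genome </code></div>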
</div>
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
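<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (for short reads) or <strong>Racon</strong> (for long reads). Racon takes the reads, their alignments to the contigs (for example a SAM file produced by a read mapper such as minimap2), and the contigs, in that order:</p>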
<div>
<div dir="ltr"><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta </code></div>
</div>
</li>
<li>
<p><strong>Scaffolding</strong><br />Link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>Opera-LG</strong> if required.</p>
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>Maker</strong> for eukaryotes.</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44902/hite-a-fast-and-accurate-dynamic-boundary-adjustment-approach-for-full-length-transposable-elements-detection-and-annotation-in-genome-assemblies</guid>
	<pubDate>Sat, 20 Sep 2025 09:34:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44902/hite-a-fast-and-accurate-dynamic-boundary-adjustment-approach-for-full-length-transposable-elements-detection-and-annotation-in-genome-assemblies</link>
	<title><![CDATA[HiTE: a fast and accurate dynamic boundary adjustment approach for full-length Transposable Elements detection and annotation in Genome Assemblies]]></title>
	<description><![CDATA[<p dir="auto"><code>HiTE</code>&nbsp;is a Python tool that uses a dynamic boundary adjustment approach to detect and annotate full-length transposable elements (TEs) in genome assemblies. According to its authors, HiTE detects more full-length TEs than other tools.</p>
<div dir="auto">
<h2 dir="auto">panHiTE</h2>
</div>
<p dir="auto">We have developed panHiTE, a comprehensive and accurate pipeline for TE detection in large-scale population genomes. It has been successfully applied to hundreds of plant population genomes, demonstrating its effectiveness and scalability.</p>
<p dir="auto">For detailed instructions, please refer to the&nbsp;<a href="https://github.com/CSU-KangHu/HiTE/wiki/panHiTE-tutorial">panHiTE tutorial</a>.</p><p>Address of the bookmark: <a href="https://github.com/CSU-KangHu/HiTE" rel="nofollow">https://github.com/CSU-KangHu/HiTE</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44791/hibc-human-intestinal-bacteria-collection</guid>
	<pubDate>Wed, 07 May 2025 05:49:19 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44791/hibc-human-intestinal-bacteria-collection</link>
	<title><![CDATA[HiBC: Human Intestinal Bacteria Collection]]></title>
	<description><![CDATA[<p>The human gut is home to trillions of microorganisms, forming one of the most complex and dynamic microbial ecosystems known to science. The <strong>Human Intestinal Bacteria Collection (HiBC)</strong> is a pioneering initiative aimed at cataloging, preserving, and studying the diverse bacterial species that inhabit the human gastrointestinal tract. This curated collection serves as a critical resource for researchers working on microbiome-related health, disease, and therapeutics.</p><h2>What is HiBC?</h2><p>The Human Intestinal Bacteria Collection (HiBC) is a comprehensive, high-quality reference repository of bacterial isolates derived from human fecal samples. It focuses on anaerobic and facultative anaerobic bacteria that play pivotal roles in digestion, immune modulation, vitamin synthesis, and pathogen resistance. The collection includes both culturable strains and genomic data from unculturable taxa, bridging the gap between culture-dependent and -independent microbiome studies.</p><h2>Why is HiBC Important?</h2><ol>
<li>
<p><strong>Understanding Microbiome-Host Interactions</strong><br /> HiBC enables deeper insight into the functions of specific bacterial taxa in the gut. With well-characterized isolates, researchers can conduct mechanistic studies to explore how certain bacteria influence metabolism, inflammation, or mental health.</p>
</li>
<li>
<p><strong>Precision Probiotics and Therapeutics</strong><br /> By providing access to native human gut microbes, HiBC supports the development of next-generation probiotics, live biotherapeutic products (LBPs), and fecal microbiota transplantation (FMT) alternatives.</p>
</li>
<li>
<p><strong>Standardization and Reproducibility</strong><br /> With standardized cultivation and genomic protocols, HiBC ensures consistency across microbiome research studies, improving reproducibility and comparability of findings.</p>
</li>
<li>
<p><strong>Antimicrobial Resistance (AMR) Surveillance</strong><br /> HiBC includes metadata on antibiotic resistance genes (ARGs), helping track the spread of AMR in commensal gut bacteria and understanding its implications for human health.</p>
</li>
</ol><h2>Key Features of HiBC</h2><ul>
<li>
<p><strong>Culturable Bacteria Repository:</strong> A living collection of anaerobic and facultative strains isolated from healthy and diseased individuals worldwide.</p>
</li>
<li>
<p><strong>Metadata-rich Entries:</strong> Each isolate is annotated with host details (age, health status, diet), geographical origin, phenotypic traits, and antibiotic susceptibility profiles.</p>
</li>
<li>
<p><strong>Whole Genome Sequencing (WGS):</strong> High-quality genome assemblies for most strains to support functional and comparative genomics.</p>
</li>
<li>
<p><strong>Interactive Database Access:</strong> User-friendly search and filtering options for strain selection based on taxonomy, function, or clinical relevance.</p>
</li>
<li>
<p><strong>Cross-linking with Other Databases:</strong> Integration with NCBI, GOLD, and Human Microbiome Project (HMP) data for broader context and validation.</p>
</li>
</ul><h2>Applications of HiBC</h2><ul>
<li>
<p>Microbiome-based diagnostics and biomarker discovery</p>
</li>
<li>
<p>Host-microbe interaction studies in gnotobiotic mouse models</p>
</li>
<li>
<p>Gut microbiome modulation through diet, drugs, or engineered bacteria</p>
</li>
<li>
<p>Longitudinal studies of gut flora across age, geography, and lifestyle</p>
</li>
<li>
<p>Environmental and evolutionary microbiology of human-associated bacteria</p>
</li>
</ul><h2>Accessing HiBC</h2><p>Researchers and interested parties can explore the HiBC database through its official website: <a href="https://www.hibc.rwth-aachen.de/" target="_new">https://www.hibc.rwth-aachen.de/</a>. The platform offers comprehensive information on bacterial isolates, including taxonomy, cultivation conditions, and genomic data, facilitating advanced research in human gut microbiome studies.</p><h2>Final Thoughts</h2><p>The <strong>HiBC</strong> is a cornerstone resource in the rapidly evolving field of microbiome research. As science moves toward personalized medicine and microbial therapeutics, having a reliable and diverse collection of human gut bacteria is not just useful &mdash; it's essential. Whether you're a microbiologist, clinician, computational biologist, or biotechnologist, HiBC offers tools to accelerate discovery and innovation in gut microbiome science.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>

</channel>
</rss>