<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/43732?offset=60</link>
	<atom:link href="https://bioinformaticsonline.com/related/43732?offset=60" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	
<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/43419/senior-bioinformatician-assembly-moore-aquatic-symbiosis-project-tree-of-life</guid>
  <pubDate>Sat, 02 Oct 2021 00:28:30 -0500</pubDate>
  <link></link>
  <title><![CDATA[Senior Bioinformatician (Assembly) Moore Aquatic Symbiosis Project Tree of Life]]></title>
  <description><![CDATA[
<p>You will have some previous experience with genome bioinformatics or other large scale scientific data analysis, or a newly qualified graduate student with data science skills interested in DNA sequence data. While desirable, previous experience with DNA sequencing data is not strictly necessary for the position. We have a strong publication record and culture of producing open data resources and open source software development. This role requires an investigative and solution-oriented mindset and excellent communication skills to work effectively within large national and international consortia. </p>

<p>More at https://jobs.sanger.ac.uk/vacancy/senior-bioinformatician-assembly-moore-aquatic-symbiosis-project-tree-of-life-458923.html</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/44342/ncbi-datasets%E2%80%AFpages</guid>
	<pubDate>Wed, 12 Jul 2023 06:29:31 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/44342/ncbi-datasets%E2%80%AFpages</link>
	<title><![CDATA[NCBI Datasets pages]]></title>
	<description><![CDATA[<p>Update! Assembly and Genome record pages now redirect to new NCBI Datasets pages. NCBI Datasets is a new resource that makes it easier to find and download genome data. Learn more: https://ncbiinsights.ncbi.nlm.nih.gov/2023/07/11/ncbi-datasets-genome-assembly-pages/&nbsp;<a href="https://ow.ly/GU3o50P8QH4"></a><a href="https://www.linkedin.com/feed/hashtag/?keywords=ncbicgr&amp;highlightedUpdateUrns=urn%3Ali%3Aactivity%3A7084592728260386816">#NCBICGR</a></p><p><span>Effective July 10, 2023, NCBI&rsquo;s Assembly and Genome record pages now redirect to&nbsp;</span>new<a href="https://www.ncbi.nlm.nih.gov/datasets/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711"> NCBI Datasets </a><span>pages. As&nbsp;</span><a href="https://ncbiinsights.ncbi.nlm.nih.gov/2023/03/07/ncbi-datasets-genome-taxonomy-pages/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711">previously announced</a><span>, these updates are part of our ongoing effort to modernize and improve your user experience. NCBI Datasets is a new resource that makes it easier to find and download genome data.  </span><span>&nbsp;</span></p><h5>The following pages have been updated:</h5><ul>
<li><span>The NCBI Assembly record pages now redirect to the new </span><a href="https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_023065955.2/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711"><span>NCBI Datasets</span><strong><span> </span></strong><span>Genome</span></a><span> </span><span>record pages that describe assembled genomes and provide links to related NCBI tools such as Genome Data Viewer and BLAST. </span><span>&nbsp;</span></li>
<li><span>The NCBI</span><strong> </strong><span>Genome record pages now redirect to the </span><a href="https://www.ncbi.nlm.nih.gov/datasets/taxonomy/9644/?utm_source=ncbi_insights&amp;utm_medium=referral&amp;utm_campaign=datasets-genome-assembly-redirect-20230711"><span>NCBI Datasets</span><strong><span> </span></strong><span>Taxonomy</span></a><span> </span><span>record pages that provide a taxonomy-focused portal to genes, genomes, and additional NCBI resources.  </span><span>&nbsp;</span></li>
</ul><p><span>During this transition, you will have the option to return to the legacy Genome and Assembly record pages. We will remove the legacy pages in early 2024. </span><span>&nbsp;</span></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37576/lrcstats-a-tool-for-evaluating-long-reads-correction-methods</guid>
	<pubDate>Wed, 22 Aug 2018 11:05:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37576/lrcstats-a-tool-for-evaluating-long-reads-correction-methods</link>
	<title><![CDATA[LRCstats: a tool for evaluating long reads correction methods]]></title>
	<description><![CDATA[<p><span>LRCstats is an open-source pipeline for benchmarking DNA long read correction algorithms for long reads outputted by third generation sequencing technology such as machines produced by Pacific Biosciences. The reads produced by third generation sequencing technology, as the name suggests, are longer in length than reads produced by next generation sequencing technologies, such as those produced by Illumina. However, long reads are plagued by high error rates, which can cause issues in downstream analysis. Long read correction algorithms reduce the error rate of long reads either through self-correcting methods or using accurate, short reads outputted by next generation sequencing technologies to correct long reads.</span></p><p>Address of the bookmark: <a href="https://github.com/cchauve/lrcstats" rel="nofollow">https://github.com/cchauve/lrcstats</a></p>]]></description>
	<dc:creator>Aaryan Lokwani</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44227/common-methods-to-discover-tandem-repeats</guid>
	<pubDate>Thu, 09 Mar 2023 02:40:52 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44227/common-methods-to-discover-tandem-repeats</link>
	<title><![CDATA[Common methods to discover tandem repeats]]></title>
	<description><![CDATA[<div><div><div><div><div><div><div><div><div><div><p>Tandem repeats are DNA sequences that are repeated in a contiguous manner in the genome. These sequences are often used as genetic markers and are important in many areas of genetics and genomics research. Here are some methods for discovering tandem repeats in genomes:</p><ol>
<li>
<p>Tandem Repeat Finder: Tandem Repeat Finder is a software tool that identifies tandem repeats in DNA sequences. It is available for free download and can be used on both nucleotide and protein sequences. The tool uses a statistical algorithm to identify repeats based on their length, copy number, and overall composition.</p>
</li>
<li>
<p>RepeatMasker: RepeatMasker is another software tool that can identify tandem repeats in DNA sequences. It works by comparing the input sequence to a database of known repeats and then identifies any tandem repeats that match those in the database.</p>
</li>
<li>
<p>PCR-based methods: Polymerase chain reaction (PCR) can be used to amplify and detect tandem repeats in genomic DNA. PCR primers are designed to flank the tandem repeat region, and amplification of the target DNA fragment can be visualized on a gel. This method can be useful for detecting novel tandem repeats and for genotyping.</p>
</li>
<li>
<p>Southern blotting: Southern blotting is a classic method for detecting DNA fragments in a sample. It can be used to detect tandem repeats by digesting genomic DNA with a restriction enzyme, separating the fragments by gel electrophoresis, and then probing the blot with a tandem repeat-specific probe.</p>
</li>
</ol><p>Overall, a combination of these methods can be used to comprehensively identify tandem repeats in genomes.</p></div></div></div></div></div></div></div></div></div></div>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/44724/step-by-step-guide-to-detect-pirnas-using-bioinformatics</guid>
	<pubDate>Fri, 13 Dec 2024 11:41:46 -0600</pubDate>
	<link>https://bioinformaticsonline.com/news/view/44724/step-by-step-guide-to-detect-pirnas-using-bioinformatics</link>
	<title><![CDATA[Step-by-Step Guide to Detect piRNAs Using Bioinformatics]]></title>
	<description><![CDATA[<p>Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs that play crucial roles in silencing transposable elements and regulating gene expression, particularly in germline cells. Detecting piRNAs involves identifying their unique characteristics, such as size, sequence motifs, and association with Piwi proteins, from high-throughput RNA sequencing data.</p><p>This blog provides a comprehensive step-by-step guide to detect piRNAs using bioinformatics tools and workflows.</p><h4><strong>Step 1: Prepare Your Data</strong></h4><ol>
<li>
<p><strong>Obtain RNA Sequencing Data</strong><br />Acquire raw small RNA-seq data in FASTQ format. Datasets can be sourced from repositories like <strong>NCBI SRA</strong>, <strong>EMBL-EBI</strong>, or specific small RNA sequencing projects.</p>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use <strong>FastQC</strong> to assess the quality of raw reads:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq </code></div>
</div>
<p>Evaluate the per-base quality, adapter content, and overrepresented sequences.</p>
</li>
<li>
<p><strong>Trimming and Adapter Removal</strong><br />Use tools like <strong>Cutadapt</strong> or <strong>Trim Galore!</strong> to remove adapters and low-quality bases:</p>
<div>
<div dir="ltr"><code>cutadapt -a TGGAATTCTCGGGTGCCAAGG -o trimmed_reads.fastq reads.fastq </code></div>
</div>
<p>Ensure the remaining reads are of high quality for downstream analysis.</p>
</li>
</ol><h4><strong>Step 2: Map Reads to the Genome</strong></h4><p>Mapping reads to the reference genome is crucial for identifying piRNA loci.</p><ol>
<li>
<p><strong>Reference Genome Preparation</strong><br />Download the genome assembly of your organism from databases like <strong>Ensembl</strong>, <strong>UCSC Genome Browser</strong>, or <strong>NCBI</strong>.</p>
</li>
<li>
<p><strong>Align Reads</strong><br />Use <strong>Bowtie</strong> or <strong>STAR</strong> for small RNA alignment:</p>
<div>
<div dir="ltr"><code>bowtie -v 1 -k 1 --best genome_index trimmed_reads.fastq -S aligned_reads.sam </code></div>
</div>
<ul>
<li><code>-v 1</code>: Allows one mismatch.</li>
<li><code>-k 1</code>: Reports the best alignment.</li>
</ul>
</li>
<li>
<p><strong>Convert SAM to BAM</strong><br />Convert and sort alignments using <strong>SAMtools</strong>:</p>
<div>
<div dir="ltr"><code>samtools view -Sb aligned_reads.sam | samtools sort -o sorted_reads.bam </code></div>
</div>
</li>
</ol><h4><strong>Step 3: Identify Small RNAs</strong></h4><p>piRNAs are characterized by their size (24&ndash;32 nt) and strand bias.</p><ol>
<li>
<p><strong>Extract Reads by Size</strong><br />Use tools like <strong>BEDtools</strong> or custom scripts to filter reads between 24 and 32 nt:</p>
<div>
<div dir="ltr"><code>bedtools bamtofastq -i sorted_reads.bam -fq all_reads.fastq seqkit seq -m 24 -M 32 all_reads.fastq &gt; piRNA_size_reads.fastq </code></div>
</div>
</li>
<li>
<p><strong>Check for Sequence Bias</strong><br />piRNAs often have a strong bias for a uridine at the 5&rsquo; end (1U bias). Use tools like <strong>WebLogo</strong> to visualize sequence motifs.</p>
</li>
</ol><h4><strong>Step 4: Detect Ping-Pong Signature</strong></h4><p>The ping-pong amplification loop is a hallmark of piRNA biogenesis, characterized by a 10 nt overlap between piRNAs on opposite strands.</p><ol>
<li>
<p><strong>Generate Overlap Statistics</strong><br />Use the <strong>piPipes</strong> tool or custom scripts to calculate overlap:</p>
<div>
<div dir="ltr"><code>python ping_pong_overlap.py sorted_reads.bam </code></div>
</div>
</li>
<li>
<p><strong>Visualize Overlap Distribution</strong><br />Plot the distribution of overlaps to confirm the presence of the 10 nt ping-pong signature.</p>
</li>
</ol><h4><strong>Step 5: Annotate piRNA Clusters</strong></h4><p>piRNAs are often generated from genomic clusters.</p><ol>
<li>
<p><strong>Cluster Identification</strong><br />Use tools like <strong>proTRAC</strong> or <strong>PIRANHA</strong> to identify piRNA-producing clusters:</p>
<div>
<div dir="ltr"><code>proTRAC.pl -s sorted_reads.bam -g genome.fa -o clusters </code></div>
</div>
</li>
<li>
<p><strong>Annotate Genomic Regions</strong><br />Annotate the identified clusters using gene annotation files (GTF/GFF). Tools like <strong>BEDtools intersect</strong> can help associate piRNA clusters with genes or transposable elements:</p>
<div>
<div dir="ltr"><code>bedtools intersect -a clusters.bed -b genome_annotation.gtf &gt; annotated_clusters.bed </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Functional Analysis</strong></h4><p>Functional analysis of piRNAs can uncover their targets and regulatory roles.</p><ol>
<li>
<p><strong>Predict piRNA Targets</strong><br />Use tools like <strong>IntaRNA</strong> or <strong>RNAhybrid</strong> to predict interactions between piRNAs and potential target mRNAs:</p>
<div>
<div dir="ltr"><code>RNAhybrid -t target_transcripts.fa -q piRNAs.fa &gt; piRNA_targets.txt </code></div>
</div>
</li>
<li>
<p><strong>Enrichment Analysis</strong><br />Perform GO or KEGG enrichment analysis of target genes using tools like <strong>g:Profiler</strong> or <strong>DAVID</strong>.</p>
</li>
</ol><h4><strong>Step 7: Validation and Visualization</strong></h4><ol>
<li>
<p><strong>Validate piRNA Candidates</strong><br />Cross-check the identified piRNAs against known piRNA databases, such as <strong>piRBase</strong> or <strong>piRNAdb</strong>.</p>
</li>
<li>
<p><strong>Visualize Results</strong></p>
<ul>
<li>Use <strong>IGV</strong> (Integrative Genomics Viewer) to visualize piRNA alignment and clusters on the genome.</li>
<li>Generate heatmaps or circos plots to present piRNA distributions.</li>
</ul>
</li>
</ol><h4><strong>Step 8: Share and Publish Findings</strong></h4><ol>
<li>
<p><strong>Archive Data</strong><br />Submit sequencing data to public repositories like <strong>SRA</strong> or <strong>GEO</strong> with metadata specifying piRNA-related experiments.</p>
</li>
<li>
<p><strong>Publish Results</strong><br />Share findings in journals or conferences, emphasizing novel piRNA candidates, target genes, or regulatory mechanisms.</p>
</li>
</ol><h4><strong>Conclusion</strong></h4><p>Detecting piRNAs involves a combination of computational and analytical methods to identify these unique small RNAs and their roles in gene regulation and transposable element suppression. By following this step-by-step guide, you can confidently navigate the complexities of piRNA detection and contribute to the growing understanding of their biological significance.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</guid>
	<pubDate>Mon, 10 Jul 2017 05:56:07 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</link>
	<title><![CDATA[Omega2: metagenome assembly pipeline]]></title>
	<description><![CDATA[<p><span>Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph and then merged to contigs and scaffolds using mate-pair information. In comparison with three de Bruijn graph assemblers (SOAPdenovo, IDBA-UD and MetaVelvet), Omega provided comparable overall performance on a HiSeq 100-bp dataset and superior performance on a MiSeq 300-bp dataset. In comparison with Celera on the MiSeq dataset, Omega provided more continuous assemblies overall using a fraction of the computing time of existing overlap-layout-consensus assemblers. This indicates Omega can more efficiently assemble longer Illumina reads, and at deeper coverage, for metagenomic datasets.</span></p><p>Address of the bookmark: <a href="http://omega.omicsbio.org/" rel="nofollow">http://omega.omicsbio.org/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</guid>
	<pubDate>Mon, 27 Nov 2017 07:58:49 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34416/miniasm-very-fast-olc-based-de-novo-assembler-for-noisy-long-reads</link>
	<title><![CDATA[miniasm: very fast OLC-based de novo assembler for noisy long reads]]></title>
	<description><![CDATA[<p>Miniasm is a very fast OLC-based&nbsp;<em>de novo</em>&nbsp;assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>) as input and outputs an assembly graph in the&nbsp;<a href="https://github.com/pmelsted/GFA-spec/blob/master/GFA-spec.md">GFA</a>&nbsp;format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final&nbsp;<a href="http://wgs-assembler.sourceforge.net/wiki/index.php/Celera_Assembler_Terminology">unitig</a>&nbsp;sequences. Thus the per-base error rate is similar to the raw input reads.</p>
<p>So far miniasm is in early development stage. It has only been tested on a dozen of PacBio and Oxford Nanopore (ONT) bacterial data sets. Including the mapping step, it takes about 3 minutes to assemble a bacterial genome. Under the default setting, miniasm assembles 9 out of 12 PacBio datasets and 3 out of 4 ONT datasets into a single contig. The 12 PacBio data sets are&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-Bacterial-Assembly">PacBio E. coli sample</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS473430">ERS473430</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS544009">ERS544009</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS554120">ERS554120</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS605484">ERS605484</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS617393">ERS617393</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS646601">ERS646601</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS659581">ERS659581</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS670327">ERS670327</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS685285">ERS685285</a>,&nbsp;<a href="http://www.ebi.ac.uk/ena/data/view/ERS743109">ERS743109</a>&nbsp;and a&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/E.-coli-20kb-Size-Selected-Library-with-P6-C4/ce0533c1d2a957488594f0b29da61ffa3e4627e8">deprecated PacBio E. coli data set</a>. ONT data are acquired from the&nbsp;<a href="http://lab.loman.net/2015/09/24/first-sqk-map-006-experiment/">Loman Lab</a>.</p>
<p>For a&nbsp;<em>C. elegans</em>&nbsp;<a href="https://github.com/PacificBiosciences/DevNet/wiki/C.-elegans-data-set">PacBio data set</a>&nbsp;(only 40X are used, not the whole dataset), miniasm finishes the assembly, including reads overlapping, in ~10 minutes with 16 CPUs. The total assembly size is 105Mb; the N50 is 1.94Mb. In comparison, the&nbsp;<a href="https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP">HGAP3</a>produces a 104Mb assembly with N50 1.61Mb.&nbsp;<a href="http://lh3lh3.users.sourceforge.net/download/ce-miniasm.png">This dotter plot</a>&nbsp;gives a global view of the miniasm assembly (on the X axis) and the HGAP3 assembly (on Y). They are broadly comparable. Of course, the HGAP3 consensus sequences are much more accurate. In addition, on the whole data set (assembled in ~30 min), the miniasm N50 is reduced to 1.79Mb. Miniasm still needs improvements.</p>
<p>Miniasm confirms that at least for high-coverage bacterial genomes, it is possible to generate long contigs from raw PacBio or ONT reads without error correction. It also shows that&nbsp;<a href="https://github.com/lh3/minimap">minimap</a>&nbsp;can be used as a read overlapper, even though it is probably not as sensitive as the more sophisticated overlapers such as&nbsp;<a href="https://github.com/marbl/MHAP">MHAP</a>&nbsp;and&nbsp;<a href="https://github.com/thegenemyers/DALIGNER">DALIGNER</a>. Coupled with long-read error correctors and consensus tools, miniasm may also be useful to produce high-quality assemblies.</p>
<p>Minimap and miniasm are ultrafast tools for (i) mapping and (ii) assembly. Designed for long, noisy reads, they do not have a correction or consensus step, and therefore the resulting assemblies are contiguous (i.e. long) but very noisy (i.e. full of errors)</p>
<p>We start with an all against all comparison:</p>
<div>
<pre><code>minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq | gzip -1 &gt; reads.paf.gz
</code></pre>
</div>
<p>Then we can assemble</p>
<div>
<pre><code>miniasm -f reads.fq reads.paf.gz &gt; reads.gfa
</code></pre>
</div>
<p>Convert GFA to FASTA:</p>
<div>
<pre><code>awk <span>'/^S/{print "&gt;"$2"\n"$3}'</span> reads.gfa | fold &gt; reads.fa
</code></pre>
</div>
<p>And then count how many contigs:</p>
<div>
<pre><code>grep <span>"&gt;"</span> reads.fa | wc -l</code></pre>
</div>
<p>&nbsp;</p>
<pre><span><span>#</span> Download sample PacBio from the PBcR website</span>
wget -O- http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz <span>|</span> tar zxf -
ln -s selfSampleData/pacbio_filtered.fastq reads.fq
<span><span>#</span> Install minimap and miniasm (requiring gcc and zlib)</span>
git clone https://github.com/lh3/minimap <span>&amp;&amp;</span> (cd minimap <span>&amp;&amp;</span> make)
git clone https://github.com/lh3/miniasm <span>&amp;&amp;</span> (cd miniasm <span>&amp;&amp;</span> make)
<span><span>#</span> Overlap</span>
minimap/minimap -Sw5 -L100 -m0 -t8 reads.fq reads.fq <span>|</span> gzip -1 <span>&gt;</span> reads.paf.gz
<span><span>#</span> Layout</span>
miniasm/miniasm -f reads.fq reads.paf.gz <span>&gt;</span> reads.gfa</pre><p>Address of the bookmark: <a href="https://github.com/lh3/miniasm" rel="nofollow">https://github.com/lh3/miniasm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</guid>
	<pubDate>Tue, 12 Dec 2017 17:23:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</link>
	<title><![CDATA[MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)]]></title>
	<description><![CDATA[<p><span>MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a&nbsp;</span><em>k</em><span>-mer based&nbsp;</span><a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a><span>&nbsp;using a combination of&nbsp;</span><a href="http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p76-schleimer.pdf">Winnowing</a><span>&nbsp;and&nbsp;</span><a href="https://en.wikipedia.org/wiki/MinHash">MinHash</a><span>. This is then converted to an estimate of sequence identity using the&nbsp;</span><a href="http://mash.readthedocs.org/">Mash</a><span>&nbsp;distance. An appropriate&nbsp;</span><em>k</em><span>-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.</span></p><p>Address of the bookmark: <a href="https://github.com/marbl/MashMap" rel="nofollow">https://github.com/marbl/MashMap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35345/rgfa-powerful-and-convenient-handling-of-assembly-graphs</guid>
	<pubDate>Thu, 25 Jan 2018 05:47:53 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35345/rgfa-powerful-and-convenient-handling-of-assembly-graphs</link>
	<title><![CDATA[RGFA: powerful and convenient handling of assembly graphs]]></title>
	<description><![CDATA[<p><span>RGFA, an implementation of the proposed GFA specification in Ruby. It allows the user to conveniently parse, edit and write GFA files. Complex operations such as the separation of the implicit instances of repeats and the merging of linear paths can be performed. A typical application of RGFA is the editing of a graph, to finish the assembly of a sequence, using information not available to the assembler. We illustrate a use case, in which the assembly of a repetitive metagenomic fosmid insert was completed using a script based on RGFA.</span></p>
<p><span>https://github.com/ggonnella/rgfa</span></p><p>Address of the bookmark: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5103826/" rel="nofollow">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5103826/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36533/mecat-fast-mapping-error-correction-and-de-novo-assembly-for-single-molecule-sequencing-reads</guid>
	<pubDate>Fri, 11 May 2018 05:07:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36533/mecat-fast-mapping-error-correction-and-de-novo-assembly-for-single-molecule-sequencing-reads</link>
	<title><![CDATA[MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads]]></title>
	<description><![CDATA[<p>MECAT is an ultra-fast Mapping, Error Correction and de novo Assembly Tools for single molecula sequencing (SMRT) reads. MECAT employs novel alignment and error correction algorithms that are much more efficient than the state of art of aligners and error correction tools. MECAT can be used for effectively de novo assemblying large genomes. For example, on a 32-thread computer with 2.0 GHz CPU , MECAT takes 9.5 days to assemble a human genome based on 54x SMRT data, which is 40 times faster than the current&nbsp;<a href="http://cbcb.umd.edu/software/pbcr/mhap/">PBcR-Mhap pipeline</a>. MECAT performance were compared with&nbsp;<a href="http://cbcb.umd.edu/software/pbcr/mhap/">PBcR-Mhap pipeline</a>,&nbsp;<a href="https://github.com/PacificBiosciences/falcon">FALCON</a>&nbsp;and&nbsp;<a href="http://canu.readthedocs.io/en/latest/">Canu(v1.3)</a>&nbsp;in five real datasets. The quality of assembled contigs produced by MECAT is the same or better than that of the&nbsp;<a href="http://cbcb.umd.edu/software/pbcr/mhap/">PBcR-Mhap pipeline</a>&nbsp;and&nbsp;<a href="https://github.com/PacificBiosciences/falcon">FALCON</a>.&nbsp;</p>
<p>https://www.nature.com/articles/nmeth.4432</p><p>Address of the bookmark: <a href="https://github.com/xiaochuanle/MECAT" rel="nofollow">https://github.com/xiaochuanle/MECAT</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>