<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/43983?offset=40</link>
	<atom:link href="https://bioinformaticsonline.com/related/43983?offset=40" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30966/maftools</guid>
	<pubDate>Thu, 16 Feb 2017 11:16:01 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30966/maftools</link>
	<title><![CDATA[MafTools]]></title>
	<description><![CDATA[<p>maftools - An R package to summarize, analyze and visualize MAF files. <a href="https://github.com/PoisonAlien/maftools#introduction">Introduction</a>.</p>
<p>With advances in cancer genomics, the Mutation Annotation Format (MAF) is being widely accepted and used to store detected variants. <a href="http://cancergenome.nih.gov">The Cancer Genome Atlas</a> project has sequenced over 30 different cancers, with the sample size of each cancer type being over 200. The <a href="https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files">resulting data</a>, consisting of genetic variants, is stored in <a href="https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%29+Specification">Mutation Annotation Format</a>. This package attempts to summarize, analyze, annotate and visualize MAF files efficiently, whether they come from TCGA sources or from in-house studies, as long as the data is in MAF format. Maftools can also handle the ICGC Simple Somatic Mutation format.</p>
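<p>As a rough illustration of the kind of summary described above, the sketch below counts variants per gene and per variant classification in a tab-delimited MAF stream. It is plain Python, not maftools (which is an R package), and it makes no claim about how maftools works internally. The column names <code>Hugo_Symbol</code> and <code>Variant_Classification</code> come from the MAF specification; the function name <code>summarize_maf</code> and the sample rows are made up for this example.</p>
<pre>
```python
import csv
import io
from collections import Counter


def summarize_maf(handle):
    """Count variants per gene and per variant classification in a MAF stream."""
    # MAF files may start with comment lines such as "#version 2.4"; skip them.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    per_gene = Counter()
    per_class = Counter()
    for record in reader:
        per_gene[record["Hugo_Symbol"]] += 1
        per_class[record["Variant_Classification"]] += 1
    return per_gene, per_class


# Tiny made-up example; real MAF files carry many more columns.
EXAMPLE_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "TP53\tMissense_Mutation\tSAMPLE-01\n"
    "TP53\tNonsense_Mutation\tSAMPLE-02\n"
    "DNMT3A\tMissense_Mutation\tSAMPLE-01\n"
)

if __name__ == "__main__":
    genes, classes = summarize_maf(io.StringIO(EXAMPLE_MAF))
    print(genes.most_common())
    print(classes.most_common())
```
</pre>
<p>To try it on a real file, pass an open file handle instead of the in-memory example, for instance <code>summarize_maf(open("cohort.maf"))</code>, where <code>cohort.maf</code> is a placeholder path.</p>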
<p>maftools is on <a href="http://biorxiv.org/content/early/2016/05/11/052662">bioRxiv</a>.</p>
<p>Please cite the following if you find this tool useful.</p>
<p>Mayakonda, A. and H.P. Koeffler, Maftools: Efficient analysis, visualization and summarization of MAF files from large-scale cohort based cancer studies. bioRxiv, 2016. doi: <a href="http://dx.doi.org/10.1101/052662">http://dx.doi.org/10.1101/052662</a></p><p>Address of the bookmark: <a href="https://github.com/PoisonAlien/maftools" rel="nofollow">https://github.com/PoisonAlien/maftools</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/35033/bbsplit-read-binning-tool-for-metagenomes-and-contaminated-libraries</guid>
	<pubDate>Wed, 03 Jan 2018 00:25:27 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/35033/bbsplit-read-binning-tool-for-metagenomes-and-contaminated-libraries</link>
	<title><![CDATA[BBSplit: Read Binning Tool for Metagenomes and Contaminated Libraries]]></title>
	<description><![CDATA[<p>BBSplit internally uses BBMap to map reads to multiple genomes at once and determine which genome they match best. This differs from ordinary mapping. If a genome (say, human) contains an exact repeat somewhere, reads mapping to it will be mapped ambiguously. But if you want to determine whether reads are mouse or human, it does not matter whether they map ambiguously within human, only whether they are ambiguous between human and mouse. BBSplit tracks this additional ambiguity information and decides how to use it based on the &ldquo;ambig2&rdquo; flag. The normal use of BBSplit is like Seal, either quantifying how many reads go to each reference, or splitting the reads into multiple output files, one per reference. BBSplit can only be run using references indexed with BBSplit, as they contain additional information regarding which sequences came from which reference file.</p><p><span>BBSplit is a tool that bins reads by mapping to multiple references simultaneously, using&nbsp;</span><a href="http://seqanswers.com/forums/showthread.php?t=41057" target="_blank">BBMap</a><span>. The reads go to the bin of the reference they map to best. There are also disambiguation options, such that reads that map to multiple references can be binned with all of them, none of them, one of them, or put in a special "ambiguous" file for each of them. 
Paired reads will always be kept together.</span><br /><br /><span>For example, if you had a library that was contaminated with E. coli and Salmonella, you could do this:</span><br /><br /><strong>bbsplit.sh in=reads.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu=clean.fq int=t</strong><br /><br /><span>This will produce 3 output files:</span><br /><strong>out_ecoli.fq</strong><span>&nbsp;(E. coli reads)</span><br /><strong>out_salmonella.fq</strong><span>&nbsp;(Salmonella reads)</span><br /><strong>clean.fq</strong><span>&nbsp;(unmapped reads)</span><br /><br /><span>In this case, "int=t" means that the input file is paired and interleaved. For single-end reads you would leave that out. For paired reads in 2 files, you would do this:</span><br /><strong>bbsplit.sh in1=reads1.fq in2=reads2.fq ref=ecoli.fa,salmonella.fa basename=out_%.fq outu1=clean1.fq outu2=clean2.fq</strong></p><p><strong><span>BBSplit is available here:</span><br /><a href="https://sourceforge.net/projects/bbmap/" target="_blank">https://sourceforge.net/projects/bbmap/</a></strong></p><p><span>The sensitivity can be raised to match that of BBMap with these flags: "minratio=0.56 minhits=1 maxindel=16000"</span></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/43977/read-simulators</guid>
	<pubDate>Fri, 30 Sep 2022 06:48:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/43977/read-simulators</link>
	<title><![CDATA[Read Simulators]]></title>
	<description><![CDATA[<h1>Short Read Simulators</h1><p>With the popularity of next-generation sequencing (NGS) technologies, many NGS read simulators have been developed. Currently, many of the popular short read simulators are designed to simulate reads mimicking the Illumina, 454 and SOLiD platforms. Listed below are some popular short read simulators. Links to their publications are provided as well.</p><ol>
<li><a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0003373" target="_blank">MetaSim</a></li>
<li><a href="https://github.com/lh3/wgsim" target="_blank">wgsim</a></li>
<li><a href="https://github.com/timmassingham/simNGS" target="_blank">SimNGS</a></li>
<li><a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0049110" target="_blank">ArtificialFastqGenerator</a></li>
<li id="e943"><a href="https://academic.oup.com/bioinformatics/article/35/3/521/5055123" target="_blank">InSilicoSeq</a></li>
</ol><h1>Long Read Simulators</h1><p id="d469">With the advancements in sequencing technologies, scientists have shown an increasing interest in using third-generation sequencing (TGS) technologies. Currently, many of the popular long read simulators are designed to simulate reads mimicking the two main TGS technologies: (1)&nbsp;<em>Pacific Biosciences (PacBio)</em>&nbsp;and (2)&nbsp;<em>Oxford Nanopore (ONT)</em>. Listed below are some of the popular and recently introduced PacBio and ONT simulators. Links to their publications are provided as well.</p><h2><span>PacBio Simulators</span></h2><ol>
<li><a href="https://academic.oup.com/bioinformatics/article/29/1/119/273243" target="_blank">PBSIM</a></li>
<li><a href="https://academic.oup.com/bioinformatics/article/32/24/3829/2525710" target="_blank">LongISLND</a></li>
<li><a href="https://academic.oup.com/bioinformatics/article/32/17/2704/2450740" target="_blank">SimLoRD</a></li>
<li><a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2208-0" target="_blank">NPBSS</a></li>
<li id="fed0"><a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2901-7" target="_blank">PaSS</a></li>
</ol><h2><span>ONT Simulators</span></h2><ol>
<li id="f145"><a href="https://academic.oup.com/gigascience/article/6/4/gix010/3051934" target="_blank">NanoSim</a></li>
<li id="c6f5"><a href="https://ieeexplore.ieee.org/document/8621253" target="_blank">Nanopore SimulatION</a></li>
<li><a href="https://academic.oup.com/bioinformatics/article/34/17/2899/4962495" target="_blank">DeepSimulator</a></li>
<li><a href="https://academic.oup.com/bioinformatics/article/36/8/2578/5698265" target="_blank">DeepSimulator1.5</a></li>
</ol>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35061/proovread-large-scale-high-accuracy-pacbio-correction-through-iterative-short-read-consensus</guid>
	<pubDate>Fri, 05 Jan 2018 04:12:20 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35061/proovread-large-scale-high-accuracy-pacbio-correction-through-iterative-short-read-consensus</link>
	<title><![CDATA[proovread : large-scale high-accuracy PacBio correction through iterative short read consensus]]></title>
	<description><![CDATA[<p>proovread : large-scale high-accuracy PacBio correction through iterative short read consensus</p>
<ul>
<li>outperforms PacBioToCA/LSC in terms of accuracy and contiguity/sensitivity (<a href="http://dx.doi.org/10.1093/bioinformatics/btu392">http://dx.doi.org/10.1093/bioinformatics/btu392</a>)</li>
<li>is easy to install/run/configure</li>
<li>supports various types of data:
<ul>
<li><strong>HiSeq/MiSeq&nbsp;</strong>(100-500bp)</li>
<li><strong>Unitigs</strong></li>
<li>454, ...</li>
</ul>
</li>
</ul>
<p>proovread maps high-coverage data to PacBio reads (bwa mem, blasr, daligner) in multiple iterations.</p><p>Address of the bookmark: <a href="https://github.com/BioInf-Wuerzburg/proovread" rel="nofollow">https://github.com/BioInf-Wuerzburg/proovread</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36950/salsa-a-tool-to-scaffold-long-read-assemblies-with-hi-c</guid>
	<pubDate>Fri, 15 Jun 2018 04:01:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36950/salsa-a-tool-to-scaffold-long-read-assemblies-with-hi-c</link>
	<title><![CDATA[SALSA: A tool to scaffold long read assemblies with Hi-C]]></title>
	<description><![CDATA[This code is used to scaffold your assemblies using Hi-C data. This version implements some improvements to the original SALSA algorithm. If you want to use the old version, it can be found in the old_salsa branch.

To use the latest version, first run the following commands:

  cd SALSA
  make
To run the code, you will need Python 2.7, the BOOST libraries and NetworkX (version lower than 1.2).

If you consider using this tool, please cite our publication which describes the methods used for scaffolding.

Ghurye, J., Pop, M., Koren, S., Bickhart, D., &amp; Chin, C. S. (2017). Scaffolding of long read assemblies using long range contact information. BMC Genomics, 18(1), 527.

Ghurye, J., Rhie, A., Walenz, B.P., Schmitt, A., Selvaraj, S., Pop, M., Phillippy, A.M. and Koren, S., 2018. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. bioRxiv, p.261149.

For any queries, please either ask on the GitHub issue page or send an email to Jay Ghurye (jayg@cs.umd.edu).<p>Address of the bookmark: <a href="https://github.com/machinegun/SALSA" rel="nofollow">https://github.com/machinegun/SALSA</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37574/simlord-a-read-simulator-for-third-generation-sequencing-reads</guid>
	<pubDate>Wed, 22 Aug 2018 10:40:27 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37574/simlord-a-read-simulator-for-third-generation-sequencing-reads</link>
	<title><![CDATA[SimLoRD: A read simulator for third generation sequencing reads]]></title>
	<description><![CDATA[<p>SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.</p>
<p>Reads are simulated from both strands of a provided or randomly generated reference sequence.</p>
<div id="rst-header-features">
<ul>
<li>The reference can be read from a FASTA file or randomly generated with a given GC content. It can consist of several chromosomes, whose structure is respected when drawing reads. (Simulation of genome rearrangements may be incorporated at a later stage.)</li>
<li>The read lengths can be determined in four ways: drawing from a log-normal distribution (typical for genomic DNA), sampling from an existing FASTQ file (typical for RNA), sampling from a text file with integers (RNA), or using a fixed length.</li>
<li>Quality values and number of passes depend on fragment length.</li>
<li>Provided subread error probabilities are modified according to the number of passes.</li>
<li>Outputs reads in FASTQ format and alignments in SAM format.</li>
</ul>
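<p>The first read-length mode in the list above, drawing from a log-normal distribution, can be sketched in a few lines of Python. This is not SimLoRD's code. The function name <code>sample_read_lengths</code> and the parameter values (<code>mu</code>, <code>sigma</code>, <code>min_length</code>) are placeholders chosen for illustration and are not SimLoRD's defaults.</p>
<pre>
```python
import random


def sample_read_lengths(n, mu=8.5, sigma=0.5, min_length=50, seed=None):
    """Draw n read lengths from a log-normal distribution.

    mu and sigma are the mean and standard deviation of the underlying
    normal distribution (assumed values, not taken from SimLoRD).
    Draws shorter than min_length are clamped up to min_length.
    """
    rng = random.Random(seed)
    return [
        max(min_length, int(rng.lognormvariate(mu, sigma)))
        for _ in range(n)
    ]


if __name__ == "__main__":
    lengths = sample_read_lengths(5, seed=42)
    print(lengths)
```
</pre>
<p>Passing a fixed <code>seed</code> makes the draw reproducible, which is convenient when comparing simulation runs.</p>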
</div><p>Address of the bookmark: <a href="https://bitbucket.org/genomeinformatics/simlord/" rel="nofollow">https://bitbucket.org/genomeinformatics/simlord/</a></p>]]></description>
	<dc:creator>Aaryan Lokwani</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40531/shasta-long-read-assembler</guid>
	<pubDate>Tue, 14 Jan 2020 06:47:07 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40531/shasta-long-read-assembler</link>
	<title><![CDATA[Shasta long read assembler]]></title>
	<description><![CDATA[<p>The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using as input DNA reads generated by&nbsp;<a href="https://nanoporetech.com/">Oxford Nanopore</a>&nbsp;flow cells.</p>
<p>Computational methods used by the Shasta assembler include:</p>
<ul>
<li>Using a&nbsp;<a href="https://en.wikipedia.org/wiki/Run-length_encoding">run-length</a>&nbsp;representation of the read sequence. This makes the assembly process more resilient to errors in homopolymer repeat counts, which are the most common type of errors in Oxford Nanopore reads.</li>
<li>In some phases of the computation, using a representation of the read sequence based on&nbsp;<em>markers</em>, a fixed subset of short k-mers (k &asymp; 10).</li>
</ul>
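<p>The run-length representation mentioned above can be illustrated with a minimal Python sketch. It is not taken from Shasta, and the function names <code>run_length_encode</code> and <code>run_length_decode</code> are hypothetical. Each homopolymer run is stored as one base plus a repeat count, so an error in the repeat count leaves the collapsed base sequence unchanged.</p>
<pre>
```python
from itertools import groupby


def run_length_encode(read):
    """Collapse each homopolymer run into one base plus its repeat count."""
    bases = []
    counts = []
    for base, run in groupby(read):
        bases.append(base)
        counts.append(len(list(run)))
    return "".join(bases), counts


def run_length_decode(bases, counts):
    """Rebuild the original read from collapsed bases and repeat counts."""
    return "".join(base * count for base, count in zip(bases, counts))


if __name__ == "__main__":
    bases, counts = run_length_encode("GATTTTACCA")
    print(bases, counts)  # GATACA [1, 1, 4, 1, 2, 1]
    print(run_length_decode(bases, counts))  # GATTTTACCA
```
</pre>
<p>In this toy example, two reads that differ only in the length of the T run (say, <code>GATTTACCA</code> and <code>GATTTTACCA</code>) share the same collapsed sequence <code>GATACA</code>.</p>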
<p>More at&nbsp;<a href="https://chanzuckerberg.github.io/shasta/index.html">https://chanzuckerberg.github.io/shasta/index.html</a></p><p>Address of the bookmark: <a href="https://github.com/chanzuckerberg/shasta" rel="nofollow">https://github.com/chanzuckerberg/shasta</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>