<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/35918?offset=30</link>
	<atom:link href="https://bioinformaticsonline.com/related/35918?offset=30" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41565/csar-web-a-web-server-of-contig-scaffolding-using-algebraic-rearrangements</guid>
	<pubDate>Fri, 10 Apr 2020 04:39:36 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41565/csar-web-a-web-server-of-contig-scaffolding-using-algebraic-rearrangements</link>
	<title><![CDATA[CSAR-web: a web server of contig scaffolding using algebraic rearrangements]]></title>
	<description><![CDATA[<p><span>CSAR-web is a web-based tool that allows the users to efficiently and accurately scaffold (i.e. order and orient) the contigs of a target draft genome based on a complete or incomplete reference genome from a related organism.&nbsp;</span></p>
<p><span><span>CSAR-web can serve as a convenient and useful scaffolding tool allowing the users to efficiently and accurately scaffold their draft genomes according to a complete or incomplete reference genome.&nbsp;</span></span></p><p>Address of the bookmark: <a href="http://genome.cs.nthu.edu.tw/CSAR-web" rel="nofollow">http://genome.cs.nthu.edu.tw/CSAR-web</a></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38623/kallisto-a-program-for-quantifying-abundances-of-transcripts-from-bulk-and-single-cell-rna-seq-data</guid>
	<pubDate>Mon, 07 Jan 2019 10:35:14 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38623/kallisto-a-program-for-quantifying-abundances-of-transcripts-from-bulk-and-single-cell-rna-seq-data</link>
	<title><![CDATA[kallisto: a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data]]></title>
	<description><![CDATA[<p><strong>kallisto</strong>&nbsp;is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of&nbsp;<em>pseudoalignment</em>&nbsp;for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data,&nbsp;<strong>kallisto</strong>&nbsp;can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and&nbsp;<strong>kallisto</strong>&nbsp;is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks&nbsp;<strong>kallisto</strong>&nbsp;significantly outperforms existing tools.&nbsp;<strong>kallisto</strong>&nbsp;is described in detail in:</p>
<p>Nicolas L Bray, Harold Pimentel, P&aacute;ll Melsted and Lior Pachter,&nbsp;<a href="http://www.nature.com/nbt/journal/v34/n5/full/nbt.3519.html">Near-optimal probabilistic RNA-seq quantification</a>, Nature Biotechnology&nbsp;<strong>34</strong>, 525&ndash;527 (2016), doi:10.1038/nbt.3519</p><p>Address of the bookmark: <a href="https://pachterlab.github.io/kallisto/about" rel="nofollow">https://pachterlab.github.io/kallisto/about</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/37677/installing-blat-on-linux</guid>
	<pubDate>Tue, 11 Sep 2018 08:17:35 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/37677/installing-blat-on-linux</link>
	<title><![CDATA[Installing BLAT on Linux !]]></title>
	<description><![CDATA[<p><span>It's been a while since I last installed BLAT, and when I went to the download directory at UCSC:&nbsp;</span><a href="http://users.soe.ucsc.edu/~kent/src/">http://users.soe.ucsc.edu/~kent/src/</a><span>&nbsp;I found that the latest BLAT is now version 35 and that the code to download was:&nbsp;</span><a href="http://users.soe.ucsc.edu/~kent/src/blatSrc35.zip">blatSrc35.zip</a><span>. However, you can also get pre-compiled binaries at:&nbsp;</span><a href="http://hgdownload.cse.ucsc.edu/admin/exe/">http://hgdownload.cse.ucsc.edu/admin/exe/</a><span>&nbsp;and there was a Linux x86_64 executable for my architecture available at:&nbsp;</span><a href="http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/">http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/blat/</a><span>. Though YMMV, BLAT can be a little bit of a tricky beast to get going, so I decided to download the source code and compile that.</span><br /><br /><span>I will be compiling this code as 'root' as a system tool in&nbsp;</span><code>/usr/local/src</code><span>, so do not scream at me for that.</span><br /><br /><span>First I created a /usr/local/src/blat directory and copied the blatSrc35.zip file into it.</span><br /><br /><span>Next I used</span></p><pre><code>unzip blatSrc35.zip</code></pre><p><span>to unpack the archive. 
This gives a directory called blatSrc; now move into that directory:</span></p><pre><code>cd blatSrc</code></pre><p><span>Before you begin, read the README file that comes with the source code.</span><br /><br /><span>One thing about building BLAT is that you need to set the MACHTYPE variable so that the BLAT sources know what type of machine you are compiling the software on.</span><br /><br /><span>On most *nix machines, typing</span></p><pre><code>echo $MACHTYPE</code></pre><p><span>will return the machine architecture type.</span><br /><br /><span>On my CentOS 6 based system this gave:</span></p><pre><code>x86_64-redhat-linux-gnu</code></pre><p><span>However, what BLAT requires is the 'short value' (i.e. the first part of the MACHTYPE). To correct this, type the following in the bash shell (change this to the correct MACHTYPE for your system):</span></p><pre><code>MACHTYPE=x86_64
export MACHTYPE</code></pre><p><span>Now running the command:</span></p><pre><code>echo $MACHTYPE</code></pre><p><span>should give the correct short form of the MACHTYPE:</span></p><pre><code>x86_64</code></pre><p><span>Now create the directory lib/$MACHTYPE in the source tree, i.e.:</span></p><pre><code>mkdir lib/$MACHTYPE</code></pre><p><span>For my machine, lib/x86_64 already existed, so I did not have to do this, but this is not the case for all architectures.</span><br /><br /><span>The BLAT code assumes that you are compiling BLAT as a non-privileged (i.e. non-root) user. As a result, you must create the directory for the executables to go into:</span><br /><br /><span>mkdir -p ~/bin/$MACHTYPE</span><br /><br /><span>If you are installing as a normal user, edit your .bashrc to add the following (change the x86_64 to be your MACHTYPE):</span><br /><br /><span>export PATH=~/bin/x86_64:$PATH</span><br /><br /><span>For me, though, this was not good enough. I wanted the executables in /usr/local/bin where all my other code goes. 
As a result I did some hackery...</span><br /><br /><span>There is a master make template in the&nbsp;</span><code>inc</code><span>&nbsp;directory called&nbsp;</span><code>common.mk</code><span>&nbsp;and I edited this file with the command:</span><br /><br /><span>vi inc/common.mk</span><br /><br /><span>I replaced the line</span></p><pre><code>    BINDIR=${HOME}/bin/${MACHTYPE}</code></pre><p><span>with</span></p><pre><code>    BINDIR=/usr/local/bin</code></pre><p><span>then saved and quit (as this is in my path, I do not need to do anything else).</span><br /><br /><span>All the preparation is now done and you can create the BLAT executables by going into the top level of the BLAT source tree (for me it was&nbsp;</span><code>/usr/local/src/blat/blatSrc</code><span>, but change to wherever you unpacked BLAT).</span><br /><br /><span>Now simply run the command:</span></p><pre><code>make</code></pre><p><span>to compile the code.</span><br /><br /><span>BLAT installed cleanly and the executables were all neatly placed in /usr/local/bin, just like I wanted.</span><br /><br /><span>Now simply running the command:</span></p><pre><code>blat</code></pre><p><span>on the command line gives me information on BLAT and sample usage.</span><br /><br /><span>BLAT is installed, and it's installed properly in my system code tree!</span></p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36842/gap-filling-or-contigs-extensions-tools</guid>
	<pubDate>Fri, 01 Jun 2018 08:07:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36842/gap-filling-or-contigs-extensions-tools</link>
	<title><![CDATA[Gap filling or Contigs extensions tools !]]></title>
	<description><![CDATA[
<p>There are many tools to perform gap filling using Illumina short reads, for example "GapFiller: a de novo assembly approach to fill the gap within paired reads" or "Toward almost closed genomes with GapFiller". There are also some tools like GAPresolution that can help to perform local re-assemblies using 454 reads. We used GAPresolution, but it is not very good software; it is useful only in some specific situations.</p>

<p>Take a look at the PRICE software from the DeRisi lab. It is meant to do something very similar: http://derisilab.ucsf.edu/index.php?page=software</p>

<p>You could also look at SSPACE (http://www.baseclear.com/landingpages/basetools-a-wide-range-of-bioinformatics-solutions/sspacev12/), ATLAS tools (http://www.hgsc.bcm.tmc.edu/content/bcm-hgsc-software), and SCARPA (http://compbio.cs.toronto.edu/hapsembler/scarpa.html).</p>

<p>See the PAGIT protocol: http://www.sanger.ac.uk/resources/software/pagit/ </p>

<p>In particular, take a look at the IMAGE tool: http://genomebiology.com/2010/11/4/R41 </p>

<p>SOAPdenovo also has a function for scaffolding. I am not sure about ABySS.</p>

<p>A useful explanation of several tools is available here:</p>

<p>https://bioinformaticsonline.com/search?q=scaffolding&amp;entity_type=object&amp;entity_subtype=bookmarks&amp;offset=0&amp;search_type=entities</p>

<p>I could be wrong, but the above answers to your hypothetical scenario appear to miss the point that you aren't interested in assembling the full genome, just the 100 kb part you're interested in. I suggest the following algorithm:</p>

<p>1. Start with the initial assembly C0 of the contigs you have identified as overlapping your region of interest, and the set S of reads those contigs contain. Let C = C0.</p>

<p>2. Repeat:<br />a. Identify paired-end reads (not in C) for which one or both ends align within, or extend, contigs in C.<br />b. Identify unpaired reads that align to, and extend, these new paired-end reads.<br />c. Construct a new assembly C' from C and the new reads identified in (a) and (b).<br />d. Trim C' so it does not extend more than 100 kb to either end of C0. Set C = C'.<br />e. Let S' denote the reads that contribute to C'. If S' does not contain any reads not present in S, stop. Otherwise, set S = S'.</p>

<p>3. If you don't have a complete assembly of the region of interest, generate an STS for each end of each contig, probe a library for clones including these STSes, subclone these clones into a paired-end sequencing vector, and generate paired-end reads for this library; then try steps (1) and (2) again, adding these new sequencing reads to what you had before.</p>

<p>4. If your average sequencing depth for the region of interest exceeds 25 or so without filling all gaps, it is likely that the remaining gaps represent sequences that are not getting cloned in your sequencing vectors. Try different sequencing vectors.</p>
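
<p>The loop in step 2 can be sketched in Python. This is a minimal illustration under loud assumptions, not a working assembler: reads are plain strings, a read counts as extending the current set when it shares an exact suffix-prefix overlap of at least <code>min_len</code> bases with an already recruited read, and read pairing, re-assembly (step 2c) and the 100 kb trim (step 2d) are left out. The names <code>recruit_reads</code>, <code>touches</code> and <code>min_len</code> are hypothetical.</p>

```python
from typing import Iterable, Set


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> bool:
    """True if a suffix of `a` (min_len bases or longer) equals a prefix of `b`."""
    longest = min(len(a), len(b))
    return any(a[-k:] == b[:k] for k in range(min_len, longest + 1))


def touches(a: str, b: str, min_len: int) -> bool:
    """True if the two reads overlap end-to-end in either order."""
    return suffix_prefix_overlap(a, b, min_len) or suffix_prefix_overlap(b, a, min_len)


def recruit_reads(seed: Iterable[str], pool: Iterable[str], min_len: int) -> Set[str]:
    """Grow the read set S from the seed reads until a round adds nothing (step 2e)."""
    recruited = set(seed)                 # S: reads already in the local assembly
    candidates = set(pool) - recruited    # reads not yet used
    while True:
        # Assumption: exact overlap stands in for "aligns within or extends C".
        new = {r for r in candidates
               if any(touches(r, s, min_len) for s in recruited)}
        if not new:                       # S' adds nothing to S: stop
            return recruited
        recruited |= new                  # S = S'
        candidates -= new
```

<p>With the seed read ACGTAC, the pool GTACGG, CGGTTA, TTTTTT and <code>min_len</code> set to 3, the first round recruits GTACGG, the second recruits CGGTTA, and TTTTTT is never pulled in because it overlaps nothing. A real pipeline would replace the exact-overlap test with an aligner and re-assemble after each round.</p>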
]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30236/pyscaf</guid>
	<pubDate>Mon, 19 Dec 2016 14:20:33 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30236/pyscaf</link>
	<title><![CDATA[pyScaf]]></title>
	<description><![CDATA[<p>pyScaf orders contigs from genome assemblies utilising several types of information:</p>
<ul>
<li>paired-end (PE) and/or mate-pair libraries (<a href="https://github.com/lpryszcz/pyScaf#ngs-based-scaffolding">NGS-based mode</a>)</li>
<li>long reads (<a href="https://github.com/lpryszcz/pyScaf#scaffolding-based-on-long-reads">NGS-based mode</a>)</li>
<li>synteny to the genome of some related species (<a href="https://github.com/lpryszcz/pyScaf#reference-based-scaffolding">reference-based mode</a>)</li>
</ul>
<p>Scaffolding&nbsp;</p>
<p>In reference-based mode, pyScaf uses synteny to the genome of a closely related species to order contigs and estimate distances between adjacent contigs.</p>
<p>Contigs are aligned globally (end-to-end) onto reference chromosomes, ignoring:</p>
<ul>
<li>matches not satisfying cut-offs (<code>--identity</code>&nbsp;and&nbsp;<code>--overlap</code>)</li>
<li>suboptimal matches (only best match of each query to reference is kept)</li>
<li>and removing overlapping matches on reference.</li>
</ul>
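<p>The three filters above can be sketched as follows. This is a hedged illustration and not pyScaf's actual code: the <code>Match</code> record, its fields, the <code>filter_matches</code> function and the greedy, score-based choice between matches that overlap on the reference are all assumptions. Only the three filtering rules come from the description.</p>

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Match:
    """Hypothetical alignment record; field names are assumptions."""
    query: str        # contig name
    ref: str          # reference chromosome name
    ref_start: int    # 0-based start on the reference
    ref_end: int      # end on the reference (exclusive)
    identity: float   # fraction of identical bases
    overlap: float    # fraction of the contig covered by the match
    score: float      # alignment score, higher is better


def filter_matches(matches: Iterable[Match],
                   min_identity: float,
                   min_overlap: float) -> List[Match]:
    # 1. Drop matches that do not satisfy the identity and overlap cut-offs.
    passing = [m for m in matches
               if m.identity >= min_identity and m.overlap >= min_overlap]

    # 2. Keep only the best-scoring match of each query.
    best: Dict[str, Match] = {}
    for m in passing:
        current = best.get(m.query)
        if current is None or m.score > current.score:
            best[m.query] = m

    # 3. Remove matches that overlap on the reference.
    #    Assumption: resolve greedily, keeping the higher-scoring match.
    kept: List[Match] = []
    for m in sorted(best.values(), key=lambda x: x.score, reverse=True):
        if all(m.ref != k.ref
               or m.ref_start >= k.ref_end
               or k.ref_start >= m.ref_end
               for k in kept):
            kept.append(m)

    # Return matches in reference order, ready for ordering the contigs.
    return sorted(kept, key=lambda x: (x.ref, x.ref_start))
```

<p>The thresholds are passed in explicitly here; in pyScaf they correspond to the <code>--identity</code> and <code>--overlap</code> options named above.</p>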
<p>In preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on&nbsp;<em>C. parapsilosis</em>&nbsp;(13 Mb; CANPA) and&nbsp;<em>A. thaliana</em>&nbsp;(119 Mb; ARATH) chromosomes, reconstructing correctly all chromosomes always for CANPA and nearly always for ARATH (<a href="https://www.dropbox.com/sh/bb7lwggo40xrwtc/AAAZ7pByVQQQ-WhUXZVeJaZVa/pyScaf?dl=0">Figures in dropbox</a>,&nbsp;<a href="https://docs.google.com/spreadsheets/d/1InBExy-qKDLj-upd8tlPItVSKc4mLepZjZxB31ii9OY/edit#gid=2036953672">CANPA table</a>,&nbsp;<a href="https://docs.google.com/spreadsheets/d/1InBExy-qKDLj-upd8tlPItVSKc4mLepZjZxB31ii9OY/edit#gid=1920757821">ARATH table</a>).<br>Runs took ~0.5 min for CANPA on&nbsp;<code>4 CPUs</code>&nbsp;and ~2 min for ARATH on&nbsp;<code>16 CPUs</code>.</p>
<p><span>Important remarks:</span></p>
<ul>
<li>Reduce your assembly beforehand (fasta2homozygous.py), as any redundancy will likely break the synteny.</li>
<li>pyScaf works better with contigs than scaffolds, as scaffolds are often affected by mis-assemblies (no&nbsp;<em>de novo assembler</em>&nbsp;/ scaffolder is perfect...), which breaks synteny.</li>
<li>pyScaf works very well if divergence between reference genome and assembled contigs is below 20% at nucleotide level.</li>
<li>pyScaf deals with large rearrangements, i.e. deletions, insertions, inversions and translocations.&nbsp;<span>Note, however, that this is an experimental implementation!</span></li>
<li>Consider closing gaps after scaffolding.</li>
</ul><p>Address of the bookmark: <a href="https://github.com/lpryszcz/pyScaf" rel="nofollow">https://github.com/lpryszcz/pyScaf</a></p>]]></description>
	<dc:creator>Bulbul</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32946/grass-a-generic-algorithm-for-scaffolding-next-generation-sequencing-assemblies</guid>
	<pubDate>Tue, 23 May 2017 05:20:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32946/grass-a-generic-algorithm-for-scaffolding-next-generation-sequencing-assemblies</link>
	<title><![CDATA[GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies.]]></title>
	<description><![CDATA[<p><span>GRASS (GeneRic ASsembly Scaffolder) is a novel algorithm for scaffolding second-generation sequencing assemblies, capable of using diverse information sources. GRASS offers a mixed-integer programming formulation of the contig scaffolding problem, which combines contig order, distance and orientation in a single optimization objective. The resulting optimization problem is solved using an expectation-maximization procedure and an unconstrained binary quadratic programming approximation of the original problem. We compared GRASS with existing HTS scaffolders using Illumina paired reads of three bacterial genomes. Our algorithm constructs a comparable number of scaffolds, but makes fewer errors. This result is further improved when additional data, in the form of related genome sequences, are used.</span></p><p>Address of the bookmark: <a href="https://github.com/AlexeyG/GRASS" rel="nofollow">https://github.com/AlexeyG/GRASS</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35883/arcs-scaffolding-genome-drafts-with-linked-reads</guid>
	<pubDate>Tue, 06 Mar 2018 16:35:26 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35883/arcs-scaffolding-genome-drafts-with-linked-reads</link>
	<title><![CDATA[ARCS: scaffolding genome drafts with linked reads]]></title>
	<description><![CDATA[<p><span>ARCS is an application that utilizes the barcoding information contained in linked reads to further organize draft genomes into highly contiguous assemblies. We show how the contiguity of an ABySS&nbsp;</span><em>H. sapiens</em><span>&nbsp;genome assembly can be increased over six-fold, using moderate coverage (25-fold) Chromium data. We expect ARCS to have broad utility in harnessing the barcoding information contained in linked read data for connecting high-quality sequences in genome assembly drafts.</span></p><p>Address of the bookmark: <a href="https://github.com/bcgsc/ARCS/" rel="nofollow">https://github.com/bcgsc/ARCS/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/38481/arcs-scaffolding-genome-drafts-with-linked-reads</guid>
	<pubDate>Mon, 17 Dec 2018 17:40:28 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/38481/arcs-scaffolding-genome-drafts-with-linked-reads</link>
	<title><![CDATA[ARCS: scaffolding genome drafts with linked reads]]></title>
	<description><![CDATA[<p>ARCS requires two input files:</p>
<ul>
<li>Draft assembly fasta file</li>
<li>Interleaved linked reads file (Barcode sequence expected in the BX tag of the read header or in the form "@readname_barcode" ; Run&nbsp;<a href="https://support.10xgenomics.com/genome-exome/software/pipelines/latest/what-is-long-ranger">Long Ranger basic</a>&nbsp;on raw chromium reads to produce this interleaved file)</li>
</ul><p>Address of the bookmark: <a href="https://github.com/bcgsc/ARCS/" rel="nofollow">https://github.com/bcgsc/ARCS/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42267/hapsolo-an-optimization-approach-for-removing-secondary-haplotigs-during-diploid-genome-assembly-and-scaffolding</guid>
	<pubDate>Mon, 26 Oct 2020 21:23:36 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42267/hapsolo-an-optimization-approach-for-removing-secondary-haplotigs-during-diploid-genome-assembly-and-scaffolding</link>
	<title><![CDATA[HapSolo: An optimization approach for removing secondary haplotigs during diploid genome assembly and scaffolding.]]></title>
	<description><![CDATA[<p><span>Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary assembly will overrepresent both the size and complexity of the genome, which complicates downstream analysis such as scaffolding.</span></p>
<p><span>More at&nbsp;https://github.com/esolares/HapSolo</span></p><p>Address of the bookmark: <a href="https://github.com/esolares/HapSolo" rel="nofollow">https://github.com/esolares/HapSolo</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>