<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/38169?offset=280</link>
	<atom:link href="https://bioinformaticsonline.com/related/38169?offset=280" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/27799/bbmapbbtools-package-multipurpose-tool-designed-for-converting-reads-or-other-nucleotide-data-between-different-formats</guid>
	<pubDate>Mon, 13 Jun 2016 05:47:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/27799/bbmapbbtools-package-multipurpose-tool-designed-for-converting-reads-or-other-nucleotide-data-between-different-formats</link>
	<title><![CDATA[BBMap/BBTools package: Multipurpose tool designed for converting reads or other nucleotide data between different formats.]]></title>
	<description><![CDATA[<div id="post_message_148585"><a href="https://sourceforge.net/projects/bbmap/" target="_blank">Reformat</a> is a member of the <a href="https://sourceforge.net/projects/bbmap/" target="_blank">BBMap/BBTools package</a>. It is a multipurpose tool designed for converting reads or other nucleotide data between different formats. It supports, and can inter-convert:<br /> <br /> fastq<br /> fasta<br /> fasta+qual<br /> sam<br /> scarf (an old Illumina format)<br /> bam (if samtools is installed)<br /> gzip<br /> zip<br /> ascii-33 (sanger)<br /> ascii-64 (old Illumina)<br /> paired files<br /> interleaved files<br /> <br /> It is multithreaded and can process data at over 500 megabytes per second, and can accept streams from standard in and write to standard out, allowing it to be easily dropped into the middle of a pipeline for format conversion. Reformat autodetects formats based on file extensions and content, making it very easy to use; and the autodetection can be overridden, allowing flexibility for people who don't like to follow naming conventions, or for out-of-spec fastq files with quality values like -17 or 120.<br /> <br /> The program has been gradually expanded, and can now perform various other functions. None of these will break pairing, if the input is paired.<br /> <br /> Quality trimming (either or both ends)<br /> Quality filtering<br /> Fixed-length trimming<br /> Generation of histograms (base composition, quality, etc.)<br /> Subsampling (to a fraction of input reads, or an exact number of reads or bases)<br /> Changing fasta line-wrapping length<br /> Reverse-complementing (all reads or only read 2)<br /> Adding /1 and /2 suffix to read names<br /> GC-content filtering<br /> Length-filtering<br /> Testing for corrupted interleaved files<br /> <br /> Reformat is compatible with any platform that supports Java 1.7 or higher. It also has a bash shellscript for simpler invocation. 
Typical usage examples:<br /> <br /> Reformat fastq into fasta:<br /> <strong>reformat.sh in=x.fq out=y.fa</strong><br /> <br /> Interleave paired reads:<br /> <strong>reformat.sh in1=x1.fq in2=x2.fq out=y.fq</strong><br /> <br /> Note - you can actually use a shortcut if paired read files have the same name with a 1 and a 2. This is equivalent to the above command:<br /> <strong>reformat.sh in=x#.fq out=y.fq</strong><br /> <br /> De-interleave reads:<br /> <strong>reformat.sh in=x.fq out1=y1.fq out2=y2.fq</strong><br /> <br /> Verify that interleaving appears correct, assuming Illumina naming conventions:<br /> <strong>reformat.sh in=x.fq vint</strong><br /> <br /> Convert ASCII-33 to ASCII-64:<br /> <strong>reformat.sh in=x.fq out=y.fq qin=33 qout=64</strong><br /> <br /> Quality-trim paired reads to Q10 on the left and right ends and discard reads shorter than 50bp after trimming:<br /> <strong>reformat.sh in1=x1.fq in2=x2.fq out1=y1.fq out2=y2.fq outsingle=singletons.fq qtrim=rl trimq=10 minlength=50</strong><br /> <br /> Subsample 10% of the first 20000 pairs in an interleaved file:<br /> <strong>reformat.sh in=x.fq out=y.fq reads=20000 samplerate=0.1 int=t</strong><br /> (in this case "int=t" overrides interleaving autodetection, to ensure reads are treated as pairs)<br /> <br /> Pipe in a gzipped sam file and pipe out fasta:<br /> <strong>reformat.sh in=stdin.sam.gz out=stdout.fa</strong><br /> <br /> Reverse-complement reads:<br /> <strong>reformat.sh in=x.fq out=y.fq rcomp</strong><br /> <br /> For reformatting a file with very long sequences, Reformat will need more memory; just add the additional flag "-Xmx2g". For example, to change the line-wrapping length on the human genome (which has individual sequences over 200Mbp long) to 70 characters:<br /> <strong>reformat.sh -Xmx2g in=HG19.fa.gz out=HG19_wrapped.fa.gz fastawrap=70</strong><br /> <br /> For additional functions, please run the shellscript with no arguments, or just read it with a text editor. 
If you have any questions, please post them in this thread.<br /> <br /> For people using a non-bash terminal, you may need to type "bash reformat.sh" instead of just "reformat.sh".<br /> For users of Windows or other platforms that do not support bash shellscripts, replace "reformat.sh" with "java -ea -Xmx200m -cp /path/to/bbmap/current/ jgi.ReformatReads"<br /> for example,<br /> <strong>java -ea -Xmx200m -cp C:\bbmap\current\ jgi.ReformatReads in=x.fq out=y.fa</strong><br /> <br /> Reformat can be downloaded with BBTools here:<br /> <a href="https://sourceforge.net/projects/bbmap/" target="_blank">https://sourceforge.net/projects/bbmap/</a></div>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27967/linux-command-line-exercises-for-ngs-data-processing</guid>
	<pubDate>Wed, 22 Jun 2016 07:59:39 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27967/linux-command-line-exercises-for-ngs-data-processing</link>
	<title><![CDATA[Linux command line exercises for NGS data processing]]></title>
	<description><![CDATA[<p>The purpose of this tutorial is to introduce students to the frequently used tools for NGS analysis as well as giving experience in writing one-liners. Copy the required files to your current directory, change directory (<code>cd</code>) to the <code>linuxTutorial</code> folder, and do all the processing inside:</p>
<pre><span>[uzi@quince-srv2 ~/]$</span> cp -r /home/opt/MScBioinformatics/linuxTutorial .
<span>[uzi@quince-srv2 ~/]$</span> cd linuxTutorial
<span>[uzi@quince-srv2 ~/linuxTutorial]$</span>
</pre>
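<p>To give a flavour of the one-liners this tutorial has in mind, here is a minimal sketch that counts the records in a FASTQ stream with <code>awk</code>. The helper name <code>count_reads</code> and the two reads are made up for illustration and are not part of the tutorial files. The sketch also assumes the common layout of exactly four lines per record.</p>
<pre>
# Hypothetical helper: count FASTQ records from the given files, or from stdin if none are given.
# Assumes every record spans exactly four lines (header, sequence, "+", qualities).
count_reads() {
    awk 'END { print NR / 4 }' "$@"
}

# Feed two made-up records through the helper; this prints 2.
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' | count_reads
</pre>
<p>Wrapping the one-liner in a function keeps it reusable in a pipeline. FASTQ files with wrapped sequence lines would need a proper parser instead.</p>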
<p>I have deliberately chosen <code>Awk</code> in the exercises as it is a language in itself and is used more often to manipulate NGS data than other command line tools such as <code>grep</code>, <code>sed</code>, <code>perl</code> etc. Furthermore, having a good command of <code>awk</code> will make it easier to understand advanced tutorials such as <a href="http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/Illumina_workflow.html">Illumina Amplicons Processing Workflow</a>. <br><br> In <code>Linux</code>, we use a shell, a program that takes your commands from the keyboard and gives them to the operating system. Most Linux systems use the Bourne Again SHell (<code>bash</code>), but there are several additional shell programs on a typical Linux system such as <code>ksh</code>, <code>tcsh</code>, and <code>zsh</code>. To see which shell you are using, type</p>
<pre><span>[uzi@quince-srv2 ~/linuxTutorial]$</span> echo $SHELL

<span>/bin/bash
</span></pre><p>Address of the bookmark: <a href="http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/linux.html" rel="nofollow">http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/linux.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28835/a5-miseq</guid>
	<pubDate>Thu, 18 Aug 2016 04:05:23 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28835/a5-miseq</link>
	<title><![CDATA[A5-miseq]]></title>
	<description><![CDATA[<p><span><span>A5-miseq is a pipeline for assembling DNA sequence data generated on the Illumina sequencing platform. This README will take you through the steps necessary for running A5-miseq. </span></span></p>
<p><span>Point to note:</span></p>
<p><span>There are many situations where A5-miseq is not the right tool for the job. In order to produce accurate results, A5-miseq requires Illumina data with certain characteristics. A5-miseq will likely not work well with Illumina reads shorter than around 80nt, or reads where the base qualities are low in all or most reads before 60nt. A5-miseq assumes it is assembling homozygous haploid genomes. Use a different assembler for metagenomes and heterozygous diploid or polyploid organisms. Use a different assembler if a tool like FastQC reports your data quality is dubious. You have been warned! Datasets consisting solely of unpaired reads are not currently supported.</span></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/ngopt/" rel="nofollow">https://sourceforge.net/projects/ngopt/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28415/scarpa</guid>
	<pubDate>Wed, 13 Jul 2016 07:59:25 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28415/scarpa</link>
	<title><![CDATA[Scarpa]]></title>
	<description><![CDATA[<p><strong>Scarpa</strong>&nbsp;is a stand-alone scaffolding tool for NGS data. It can be used together with virtually any genome assembler and any NGS read mapper that supports SAM format. Other features include support for multiple libraries and an option to estimate insert size distributions from data. Scarpa is available free of charge for academic and commercial use under the GNU General Public License (GPL).</p>
<p>See the&nbsp;<a href="http://compbio.cs.toronto.edu/hapsembler/hapsembler-2.21_manual.pdf">user manual</a>&nbsp;or the&nbsp;<a href="http://compbio.cs.toronto.edu/hapsembler/scarpa_paper.pdf">paper</a>&nbsp;for more information about Scarpa. Click&nbsp;<a href="http://compbio.cs.toronto.edu/hapsembler/ScarpaSupplementary.pdf">here</a>&nbsp;for the supplementary material.</p><p>Address of the bookmark: <a href="http://compbio.cs.toronto.edu/hapsembler/scarpa.html" rel="nofollow">http://compbio.cs.toronto.edu/hapsembler/scarpa.html</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/29018/crossmap</guid>
	<pubDate>Mon, 05 Sep 2016 04:07:38 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/29018/crossmap</link>
	<title><![CDATA[CrossMap]]></title>
	<description><![CDATA[<ul>
<li>CrossMap is a program for convenient conversion of genome coordinates (or annotation files) between&nbsp;<em>different assemblies</em>&nbsp;(such as Human&nbsp;<a href="http://www.ncbi.nlm.nih.gov/assembly/2928/">hg18 (NCBI36)</a>&nbsp;&lt;&gt;&nbsp;<a href="http://www.ncbi.nlm.nih.gov/assembly/2758/">hg19 (GRCh37)</a>, Mouse&nbsp;<a href="http://www.ncbi.nlm.nih.gov/assembly/165668/">mm9 (MGSCv37)</a>&nbsp;&lt;&gt;&nbsp;<a href="http://www.ncbi.nlm.nih.gov/assembly/327618/">mm10 (GRCm38)</a>).</li>
<li>It supports most commonly used file formats including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF, VCF.</li>
<li>CrossMap is designed to lift over genome coordinates between assemblies. It&rsquo;s&nbsp;<em>not</em>&nbsp;a program for aligning sequences to a reference genome.</li>
<li>We&nbsp;<em>do not</em>&nbsp;recommend using CrossMap to convert genome coordinates between species.</li>
</ul><p>Address of the bookmark: <a href="http://crossmap.sourceforge.net/" rel="nofollow">http://crossmap.sourceforge.net/</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28844/teannot</guid>
	<pubDate>Thu, 18 Aug 2016 10:02:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28844/teannot</link>
	<title><![CDATA[TEannot]]></title>
	<description><![CDATA[<p>We advise running the TEdenovo pipeline first, but it is not compulsory. We assume you begin by running the TEannot pipeline on the example provided in the directory "db/" rather than directly on your own genomic sequences. Thus, from now on, the project name is "DmelChr4".</p>
<p>Address of the bookmark: <a href="https://urgi.versailles.inra.fr/Tools/REPET/TEannot-tuto" rel="nofollow">https://urgi.versailles.inra.fr/Tools/REPET/TEannot-tuto</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28891/lumpy</guid>
	<pubDate>Thu, 25 Aug 2016 08:05:02 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28891/lumpy</link>
	<title><![CDATA[LUMPY]]></title>
	<description><![CDATA[<p>A probabilistic framework for structural variant discovery.</p>
<p>Ryan M Layer, Colby Chiang, Aaron R Quinlan, and Ira M Hall. 2014. "LUMPY: a Probabilistic Framework for Structural Variant Discovery." Genome Biology 15 (6): R84.&nbsp;<a href="http://dx.doi.org/10.1186/gb-2014-15-6-r84">doi:10.1186/gb-2014-15-6-r84</a>.</p>
<p>More at&nbsp;https://github.com/arq5x/lumpy-sv</p><p>Address of the bookmark: <a href="https://github.com/arq5x/lumpy-sv" rel="nofollow">https://github.com/arq5x/lumpy-sv</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28922/ka-ks-and-kaks-calculations</guid>
	<pubDate>Mon, 29 Aug 2016 11:44:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28922/ka-ks-and-kaks-calculations</link>
	<title><![CDATA[Ka, Ks and Ka/Ks calculations]]></title>
	<description><![CDATA[<p>gKaKs is a codon-based genome-level Ka/Ks computation pipeline developed and based on programs from four widely used packages: BLAT, BLASTALL (including bl2seq, formatdb and fastacmd), PAML (including codeml and yn00) and KaKs_Calculator (including 10 substitution rate estimation methods). gKaKs can automatically detect and eliminate frameshift mutations and premature stop codons to compute the substitution rates (Ka, Ks and Ka/Ks) between a well-annotated genome and a non-annotated genome or even a poorly assembled scaffold dataset. It is especially useful for newly sequenced genomes that have not been well annotated.&nbsp;</p>
<p>Resources for Ka/Ks calculation:</p>
<p>https://github.com/fumba/kaks-calculator</p>
<p>http://longlab.uchicago.edu/?q=gKaKs</p>
<p>http://www.ncbi.nlm.nih.gov/pubmed/23314322</p><p>Address of the bookmark: <a href="http://longlab.uchicago.edu/?q=gKaKs" rel="nofollow">http://longlab.uchicago.edu/?q=gKaKs</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28999/redundans</guid>
	<pubDate>Thu, 01 Sep 2016 08:28:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28999/redundans</link>
	<title><![CDATA[Redundans]]></title>
	<description><![CDATA[<p>The Redundans pipeline assists&nbsp;<span>the assembly of heterozygous genomes</span>.<br>The program takes as input&nbsp;<span>assembled contigs</span> and&nbsp;<span>paired-end and/or mate-pair sequencing libraries</span>, and returns a&nbsp;<span>scaffolded homozygous genome assembly</span>&nbsp;that should be&nbsp;<span>less fragmented</span>&nbsp;and have a&nbsp;<span>smaller total size</span>&nbsp;than the input contigs. In addition, Redundans will automatically&nbsp;<span>close the gaps</span>&nbsp;resulting from genome assembly or scaffolding (<a href="https://github.com/Gabaldonlab/redundans/blob/master/test#redundans-pipeline">more details</a>).</p>
<p>The pipeline consists of three steps/modules:</p>
<ul>
<li><span>redundancy reduction</span>: detection and selective removal of redundant contigs from an initial&nbsp;<em>de novo</em>&nbsp;assembly</li>
<li><span>scaffolding</span>: joining of genome fragments using paired-end and/or mate-pairs reads</li>
<li><span>gap closing</span></li>
</ul>
<p>Redundans is:</p>
<ul>
<li><span>fast</span>&nbsp;&amp;&nbsp;<span>lightweight</span>, with multi-core support and memory optimisation, so it can be run even on a laptop for small-to-medium size genomes</li>
<li><span>flexible</span>&nbsp;toward many sequencing technologies (Illumina, 454 or Sanger) and library types (paired-end, mate pairs, fosmids)</li>
<li><span>modular</span>: every step can be omitted or replaced by other tools</li>
</ul><p>Address of the bookmark: <a href="https://github.com/Gabaldonlab/redundans" rel="nofollow">https://github.com/Gabaldonlab/redundans</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/file/view/29108/assembly-tutorial-ppt</guid>
	<pubDate>Wed, 07 Sep 2016 03:12:53 -0500</pubDate>
	<link>https://bioinformaticsonline.com/file/view/29108/assembly-tutorial-ppt</link>
	<title><![CDATA[Assembly tutorial PPT]]></title>
	<description><![CDATA[<p>Saved Cornell University assembly workshop PPT.</p><p>Reference:&nbsp;</p><p>http://cbsu.tc.cornell.edu/lab/doc/assembly_workshop_20150420_lecture1.pdf</p>]]></description>
	<dc:creator>Jit</dc:creator>
	<enclosure url="https://bioinformaticsonline.com/file/download/29108" length="1617402" type="application/pdf" />
</item>

</channel>
</rss>