<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/27076?offset=170</link>
	<atom:link href="https://bioinformaticsonline.com/related/27076?offset=170" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27094/smash-an-alignment-free-method-to-find-and-visualise-rearrangements-between-pairs-of-dna-sequences</guid>
	<pubDate>Tue, 26 Apr 2016 12:18:49 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27094/smash-an-alignment-free-method-to-find-and-visualise-rearrangements-between-pairs-of-dna-sequences</link>
	<title><![CDATA[Smash: An alignment-free method to find and visualise rearrangements between pairs of DNA sequences]]></title>
	<description><![CDATA[<p><strong>Smash is a completely alignment-free method/tool to find and visualise genomic rearrangements</strong><span>. The detection is based on&nbsp;</span><strong>conditional exclusive compression</strong><span>, namely using an FCM (finite-context model, a type of Markov model) of high context order (typically 20). For visualisation, Smash outputs an&nbsp;</span><strong>SVG image</strong><span>, with an&nbsp;</span><strong>ideogram</strong><span>&nbsp;output architecture, where the patterns are represented with several&nbsp;</span><strong>HSV values</strong><span>&nbsp;(only the value component varies). The method works at both small and large scale. Nevertheless, it is directed more at large scale, since the main aim of the research is to&nbsp;</span><strong>know where, at large scale [chromosome by chromosome], several primates were equal/different, giving at a glance a map of the entire genomes</strong><span>.</span></p><p>Address of the bookmark: <a href="http://bioinformatics.ua.pt/software/smash/" rel="nofollow">http://bioinformatics.ua.pt/software/smash/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/29343/accnet</guid>
	<pubDate>Fri, 07 Oct 2016 05:22:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/29343/accnet</link>
	<title><![CDATA[AccNET]]></title>
	<description><![CDATA[<p><span>AccNET is a Perl application that presents a new way to study the accessory genome of a given set of organisms. Using the proteomes of these organisms, AccNET creates a bipartite network compatible with common network analysis platforms. AccNET collects phylogenetic and functional information in a single network, which improves analysis capability. Networks offer a new perspective on organism organization through elements acquired by horizontal gene transfer and not constrained by hierarchical structures.</span></p>
<p><span>More at&nbsp;https://www.youtube.com/watch?v=vdGuy1GAJrQ</span></p><p>Address of the bookmark: <a href="https://sourceforge.net/projects/accnet/" rel="nofollow">https://sourceforge.net/projects/accnet/</a></p>]]></description>
	<dc:creator>Jitendra Narayan</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27430/mosaik-a-hash-based-algorithm-for-accurate-next-generation-sequencing-short-read-mapping</guid>
	<pubDate>Fri, 20 May 2016 18:53:49 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27430/mosaik-a-hash-based-algorithm-for-accurate-next-generation-sequencing-short-read-mapping</link>
	<title><![CDATA[MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping]]></title>
	<description><![CDATA[<p><span>MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery.</span></p><p>Address of the bookmark: <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090581" rel="nofollow">http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0090581</a></p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27845/cnidaria-fast-reference-free-phylogenomic-clustering</guid>
	<pubDate>Thu, 16 Jun 2016 17:55:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27845/cnidaria-fast-reference-free-phylogenomic-clustering</link>
	<title><![CDATA[CNIDARIA: fast, reference-free phylogenomic clustering]]></title>
	<description><![CDATA[<p>Motivation: Identification of biological specimens is a major requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but these do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances.</p>
<p>Results: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distances. We successfully simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100% accuracy at supra-species level and 78% accuracy for species level.</p>
<p>Availability and Implementation: Cnidaria is written in C++ and Python and is available at http://www.ab.wur.nl/cnidaria.</p>
<p>Contact: Saulo Aflitos - sauloal@gmail.com</p>
<p>Supplementary information: Supplementary data are available at Bioinformatics online.</p><p>Address of the bookmark: <a href="https://github.com/sauloal/cnidaria/wiki" rel="nofollow">https://github.com/sauloal/cnidaria/wiki</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27971/samtools-primer</guid>
	<pubDate>Thu, 23 Jun 2016 07:18:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27971/samtools-primer</link>
	<title><![CDATA[Samtools Primer !!]]></title>
	<description><![CDATA[<p>SAMtools: Primer / Tutorial by Ethan Cerami, Ph.D.<br><br>keywords: samtools, next-gen, next-generation, sequencing, bowtie, sam, bam, primer, tutorial, how-to, introduction<br>Revisions<br><br>&nbsp;&nbsp;&nbsp; 1.0: May 30, 2013: First public release on biobits.org.<br>&nbsp;&nbsp;&nbsp; 1.1: July 24, 2013: Updated with Disqus Comments / Feedback section.<br>&nbsp;&nbsp;&nbsp; 1.2: December 19, 2014: Multiple updates, including:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Updated to use samtools 1.1 and bcftools 1.2.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Updated usage for bcftools.<br><br>About<br><br>SAMtools is a popular open-source tool used in next-generation sequence analysis. This primer provides an introduction to SAMtools, and is geared towards those new to next-generation sequence analysis. The primer is also designed to be self-contained and hands-on, meaning that you only need to install SAMtools, and no other tools, and sample data sets are provided. Terms in bold are also explained in the glossary at the end of the document.</p><p>Address of the bookmark: <a href="http://biobits.org/samtools_primer.html" rel="nofollow">http://biobits.org/samtools_primer.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/28112/ngs-glossary</guid>
	<pubDate>Mon, 27 Jun 2016 08:56:18 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/28112/ngs-glossary</link>
	<title><![CDATA[NGS Glossary !!]]></title>
	<description><![CDATA[<p><strong>alignment</strong>: the mapping of a raw sequence read to a location within a reference genome. The mapping occurs because the sequences within the raw read match or align to sequences within the reference genome. Alignment information is stored in the <strong>SAM</strong> or <strong>BAM</strong> file formats.</p><p><strong>bcftools</strong>: a set of companion tools, currently bundled with SAMtools, for identifying and filtering genomic variants.</p><p><strong>bowtie</strong>: widely used, open source alignment software for aligning raw sequence reads to a reference genome.</p><p><strong>BAM Format</strong>: binary, compressed format for storing <strong>SAM</strong> data.</p><p><strong>BCF Format</strong>: Binary call format. Binary, compressed format for storing <strong>VCF</strong> data.</p><p><strong>CIGAR String</strong>: Compact Idiosyncratic Gapped Alignment Report. A compact string that (partially) summarizes the alignment of a raw sequence read to the reference genome. Three core abbreviations are used: M for alignment match; I for insertion; and D for deletion. For example, a CIGAR string of 5M2I63M indicates that the first 5 base pairs of the read align to the reference, followed by 2 base pairs that are present in the read but not in the reference genome (an insertion), followed by an additional 63 base pairs of alignment.</p><p><strong>FASTA Format</strong>: text format for storing raw sequence data. For example, the FASTA file at: <a href="http://www.ncbi.nlm.nih.gov/nuccore/NC_008253">http://www.ncbi.nlm.nih.gov/nuccore/NC_008253</a> contains the entire genome of Escherichia coli 536.</p><p><strong>FASTQ Format</strong>: text format for storing raw sequence data along with quality scores for each base; usually generated by sequencing machines.</p><p><strong>genotype likelihood</strong>: the probability that a specific genotype is present in the sample of interest. 
Genotype likelihoods are usually expressed as a <strong>Phred-scaled probability</strong>, where P = 10<sup>-Q/10</sup>. For example, if the score for the genotype TT (both alleles are T) at position 1,299,132 on human chromosome 12 (reference G) is 37, this translates to a probability of 10<sup>-37/10</sup> = 0.0001995, meaning that there is a very low probability that the reads in your sample support a TT genotype. On the other hand, a genotype of AA at the same position with a score of 0 translates into a probability of 10<sup>-0</sup> = 1, indicating an extremely high probability that your sample contains a homozygous mutation of G to A.</p><p><strong>mate-pair</strong>: in paired-end sequencing, both ends of a single DNA or RNA fragment are sequenced, but the intermediate region is not. The two ends which are sequenced form a pair, and are frequently referred to as mate-pairs.</p><p><strong>QNAME</strong>: unique identifier of a raw sequence read (also known as the Query Name). Used in <strong>FASTQ</strong> and <strong>SAM</strong> files.</p><p><strong>paired-end sequencing</strong>: sequencing process where both ends of a single DNA or RNA fragment are sequenced, but the intermediate region is not. Particularly useful for identifying structural rearrangements, including gene fusions.</p><p><strong>Phred-scaled probability</strong>: a scaled value (Q) used to compactly summarize a probability, where P = 10<sup>-Q/10</sup>. For example, a Phred Q score of 10 translates to probability (P) = 10<sup>-10/10</sup> = 0.1. Phred-scaled probabilities are common in next-generation sequencing, and are used to represent multiple types of quality metrics, including quality of base calls, quality of mappings, and probabilities associated with specific genotypes. 
The name Phred refers to the original Phred base-calling software, which first used and developed the scale.</p><p><strong>Phred quality score</strong>: a score assigned to each base within a sequence, quantifying the probability that the base was called incorrectly. Scores use a <strong>Phred-scaled probability</strong> metric. For example, a Phred Q score of 10 translates to P=10<sup>-10/10</sup> = 0.1, indicating that the base has a 0.1 probability of being incorrect. Higher Phred scores correspond to higher accuracy. In the <strong>FASTQ format</strong>, Phred scores are represented as single ASCII characters. For details on translating between Phred scores and ASCII values, refer to <a href="http://www.somewhereville.com/?p=1508">Table 1 of this useful blog post from Damian Gregory Allis</a>.</p><p><strong>read-length</strong>: the number of base pairs that are sequenced in an individual sequence read.</p><p><strong>read-depth</strong>: the number of sequence reads that pile up at the same genomic location. For example, 30X read-depth coverage indicates that the genomic location is covered by 30 independent sequencing reads. Increased read-depth translates into higher confidence for calling genomic variants.</p><p><strong>RNAME</strong>: reference genome identifier (also known as the Reference Name). Within a SAM formatted file, the RNAME identifies the reference genome where the raw read aligns.</p><p><strong>SAM Flag</strong>: a single integer value (e.g. 16), which encodes multiple elements of meta-data regarding a read and its alignment. Elements include: whether the read is one part of a paired-end read, whether the read aligns to the genome, and whether the read aligns to the forward or reverse strand of the genome. 
A <a href="http://picard.sourceforge.net/explain-flags.html">useful online utility</a> decodes a single SAM flag value into plain English.</p><p><strong>SAM Format</strong>: Text file format for storing sequence alignments against a reference genome. See also <strong>BAM</strong> Format.</p><p><strong>SAMtools</strong>: widely used, open source command line tool for manipulating SAM/BAM files. Includes options for converting, sorting, indexing and viewing SAM/BAM files. The SAMtools distribution also includes bcftools, a set of command line tools for identifying and filtering genomic variants. Created by <a href="http://lh3lh3.users.sourceforge.net/">Heng Li</a>, currently of the Broad Institute.</p><p><strong>single-read sequencing</strong>: sequencing process where only one end of a DNA or RNA fragment is sequenced. Contrast with <strong>paired-end</strong> sequencing.</p><p><strong>VCF Format</strong>: Variant call format. Text file format for storing genomic variants, including single nucleotide polymorphisms, insertions, deletions and structural rearrangements. See also <strong>BCF</strong> format.</p><p><strong>Next Generation Sequencing</strong><br /> A high-throughput sequencing method which parallelizes the sequencing process, producing thousands or millions of sequences at once.</p><p><strong>Deep Sequencing</strong><br /> Techniques of nucleotide sequence analysis that increase the range, complexity, sensitivity, and accuracy of results by greatly increasing the scale of operations and thus the number of nucleotides, and the number of copies of each nucleotide sequenced.</p><p><strong>Paired-End Sequencing</strong><br /> Sequence both ends of the same fragment and keep track of the paired data.</p><p><strong>Adapter</strong><br /> Short oligonucleotides which are attached to the DNA to be sequenced. 
An adapter can provide a priming site for both amplification and sequencing of the adjoining, unknown nucleic acid.</p><p><strong>Library</strong><br /> A collection of DNA fragments with adapters ligated to each end.</p><p><strong>Bridge Amplification</strong><br /> Generation of in situ copies of a specific DNA molecule on an oligo-decorated solid support.</p><p><strong>Emulsion PCR</strong><br /> A method for bead-based amplification of a library. A single adapter-bound fragment is attached to the surface of a bead, and an oil emulsion containing necessary amplification reagents is formed around the bead/fragment component. Parallel amplification of millions of beads with millions of single strand fragments produces a sequencer-ready library.</p><p><strong>Alignment</strong><br /> Mapping of sequence reads to a known reference sequence.</p><p><strong>Reference sequence/genome</strong><br /> A fully assembled version of a genome that can be used for mapping short DNA sequence reads for comparisons of genomes from various individuals.</p><p><strong>Coverage Depth</strong><br /> The number of nucleotides from reads that are mapped to a given position of the reference genome.</p><p><strong>Specificity</strong><br /> The percentage of sequences that map to the intended targets out of total bases per run.</p><p><strong>Uniformity</strong><br /> The variability in sequence coverage across target regions.</p><p><strong>Homopolymer</strong><br /> Uninterrupted stretch of a single nucleotide type (e.g., TTT or GGGGGG).</p><p><strong>InDel</strong><br /> InDel stands for insertion or deletion. A form of structural variation in which a DNA segment is either deleted or inserted.</p><p><strong>SNP</strong><br /> SNP stands for Single Nucleotide Polymorphism. 
A single base difference found when comparing the same DNA sequence from two different individuals.</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28417/wisescaffolder</guid>
	<pubDate>Wed, 13 Jul 2016 08:08:57 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28417/wisescaffolder</link>
	<title><![CDATA[WiseScaffolder]]></title>
	<description><![CDATA[<p>Function</p>
<p>WiseScaffolder is a stand-alone, semi-automatic application for genome scaffolding of pre-assembled contigs using mate-pair data. It also produces editable scaffold maps, which can be used either to build gapped scaffolds or as a common thread for the manual improvement of scaffolds.</p>
<p>Description&nbsp;</p>
<p>WiseScaffolder includes 4 subcommands:</p>
<p>dumpconfig generates a configuration file that notably specifies the average insert size of the mate-pair library.</p>
<p>preprocess allows the detection and correction of chimerae and the estimation of contig copy number, and produces valuable outputs for the manual improvement of scaffolds.</p>
<p>scaffold constitutes the central scaffold-builder and comprises two modules:</p>
<p>i) the interative_scaffold_extender, which works with big, unambiguous contigs, or when they run out, single copy contigs, and</p>
<p>ii) the small_contig_inserter, which inserts the small contigs within scaffolds.</p>
<p>buildfasta converts the scaffold map(s) into FASTA sequences.</p><p>Address of the bookmark: <a href="http://abims.sb-roscoff.fr/wisescaffolder" rel="nofollow">http://abims.sb-roscoff.fr/wisescaffolder</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/29235/valet</guid>
	<pubDate>Thu, 22 Sep 2016 04:27:09 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/29235/valet</link>
	<title><![CDATA[valet]]></title>
	<description><![CDATA[<div>
<div>
<div>VALET is a pipeline for performing&nbsp;<em>de novo</em>&nbsp;validation of metagenomic assemblies. VALET checks a number of properties that should hold true for a correct assembly (e.g., mate-pairs are aligned at the correct distance from each other in the assembly, the depth of coverage is fairly uniform along contigs, etc.). The violations of these invariants are reported allowing one to pinpoint areas that were potentially mis-assembled, or to compare the quality of different assemblies. For comparing multiple assemblies of the same data-sets, VALET also reports an overall estimate of the likelihood a particular assembly is correct.</div>
</div>
</div>
<div>
<div>Home Page:&nbsp;</div>
<div>
<div><a href="https://github.com/jgluck/VALET">VALET code repository</a></div>
</div>
</div><p>Address of the bookmark: <a href="https://www.cbcb.umd.edu/software/valet" rel="nofollow">https://www.cbcb.umd.edu/software/valet</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34528/cope-an-accurate-k-mer-based-pair-end-reads-connection-tool-to-facilitate-genome-assembly</guid>
	<pubDate>Wed, 06 Dec 2017 02:08:14 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34528/cope-an-accurate-k-mer-based-pair-end-reads-connection-tool-to-facilitate-genome-assembly</link>
	<title><![CDATA[COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly]]></title>
	<description><![CDATA[<p><span>COPE (Connecting Overlapped Pair-End reads) is an efficient tool for connecting overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30&times; simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is, respectively, 10 and 2% higher than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs are found to have fewer errors and give a 14-fold improvement in the N50 measurement when compared with the contigs produced using unconnected reads.</span></p><p>Address of the bookmark: <a href="ftp://ftp.genomics.org.cn/pub/cope" rel="nofollow">ftp://ftp.genomics.org.cn/pub/cope</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28842/repeatmodeler</guid>
	<pubDate>Thu, 18 Aug 2016 09:57:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28842/repeatmodeler</link>
	<title><![CDATA[RepeatModeler]]></title>
	<description><![CDATA[<p><span>RepeatModeler is a de novo repeat family identification and modeling package. At the heart of RepeatModeler are two de novo repeat finding programs (RECON and RepeatScout) which employ complementary computational methods for identifying repeat element boundaries and family relationships from sequence data. RepeatModeler assists in automating the runs of RECON and RepeatScout given a genomic database and uses the output to build, refine and classify consensus models of putative interspersed repeats.</span></p><p>Address of the bookmark: <a href="http://www.repeatmasker.org/RepeatModeler.html" rel="nofollow">http://www.repeatmasker.org/RepeatModeler.html</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>