<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/43859?offset=160</link>
	<atom:link href="https://bioinformaticsonline.com/related/43859?offset=160" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</guid>
	<pubDate>Fri, 13 Dec 2024 11:35:55 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44722/step-by-step-guide-to-running-genome-assembly</link>
	<title><![CDATA[Step-by-Step Guide to Running Genome Assembly]]></title>
	<description><![CDATA[<p>Genome assembly is a critical process in bioinformatics, enabling the reconstruction of an organism's genome from DNA sequence reads. Whether you&rsquo;re working on a new microbial genome or a complex eukaryotic organism, this guide will walk you through the steps of genome assembly using state-of-the-art tools and best practices.</p><h4><strong>What is Genome Assembly?</strong></h4><p>Genome assembly involves piecing together DNA sequence reads generated by sequencing platforms (e.g., Illumina, PacBio, Oxford Nanopore) into longer, contiguous sequences called contigs. This can be performed as:</p><ul>
<li><strong>De Novo Assembly</strong>: Without a reference genome.</li>
<li><strong>Reference-Guided Assembly</strong>: Using a reference genome to guide the assembly process.</li>
</ul><h4><strong>Step 1: Preparing Your Data</strong></h4><p>Before starting the assembly, ensure that your raw sequencing data is high quality.</p><ol>
<li>
<p><strong>Input Data</strong></p>
<ul>
<li><strong>Short Reads</strong>: Illumina sequencing generates short, highly accurate reads, well suited to assembling small genomes and to polishing long-read assemblies.</li>
<li><strong>Long Reads</strong>: PacBio and Nanopore sequencing provide long reads for resolving repetitive regions.</li>
</ul>
</li>
<li>
<p><strong>Quality Control (QC)</strong><br />Use tools like <strong>FastQC</strong> or <strong>MultiQC</strong> to assess the quality of your reads:</p>
<div>
<div dir="ltr"><code>fastqc reads.fastq<br />multiqc . </code></div>
</div>
<p>Look for issues like low-quality bases, adapter contamination, or overrepresented sequences.</p>
</li>
<li>
<p><strong>Read Trimming and Filtering</strong><br />Trim low-quality bases and adapters using <strong>Trimmomatic</strong> or <strong>Cutadapt</strong>:</p>
<div>
<div dir="ltr"><code>trimmomatic PE reads_R1.fastq reads_R2.fastq \<br />trimmed_R1.fastq unpaired_R1.fastq trimmed_R2.fastq unpaired_R2.fastq \<br />ILLUMINACLIP:adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36 </code></div>
</div>
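<p>To make the <code>SLIDINGWINDOW:4:20</code> setting concrete, the Python sketch below illustrates the general idea of sliding-window quality trimming: scan from the 5' end and cut the read at the first window whose mean Phred score falls below the threshold. It is a simplified illustration, not Trimmomatic's actual implementation; the function name <code>sliding_window_trim</code> and the example read are hypothetical, and Phred+33 encoding is assumed.</p>
<pre>
```python
def sliding_window_trim(seq, qual, window=4, min_mean=20, offset=33):
    """Cut the read at the first window whose mean quality is too low.

    seq and qual are the sequence and quality strings of one FASTQ record.
    Phred+33 encoding is assumed (offset=33).
    """
    scores = [ord(ch) - offset for ch in qual]
    keep = len(seq)  # default: keep the whole read
    for start in range(0, len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= min_mean:
            continue
        # First window below the threshold: drop it and everything after it.
        keep = start
        break
    return seq[:keep], qual[:keep]


# Hypothetical read: eight high-quality bases (I = Q40), then four poor ones (# = Q2).
read_seq = "ACGTACGTACGT"
read_qual = "IIIIIIII####"
trimmed_seq, trimmed_qual = sliding_window_trim(read_seq, read_qual)
print(trimmed_seq)
```
</pre>
<p>A read shorter than the window is returned unchanged in this sketch; a real pipeline would additionally apply a minimum-length filter such as <code>MINLEN:36</code>.</p>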
</li>
</ol><h4><strong>Step 2: Choosing an Assembly Strategy</strong></h4><p>Select an assembly strategy based on your data type:</p><ul>
<li>
<p><strong>Short-Read Assemblers</strong>:</p>
<ul>
<li>SPAdes: Popular for microbial genomes.</li>
<li>Velvet: Fast for smaller genomes.</li>
</ul>
</li>
<li>
<p><strong>Long-Read Assemblers</strong>:</p>
<ul>
<li>Canu: Ideal for long-read datasets.</li>
<li>Flye: Versatile for small and large genomes.</li>
</ul>
</li>
<li>
<p><strong>Hybrid Assemblers</strong>:</p>
<ul>
<li>MaSuRCA: Combines short and long reads.</li>
<li>Unicycler: Optimized for bacterial genomes.</li>
</ul>
</li>
</ul><h4><strong>Step 3: Running the Assembly</strong></h4><h5><strong>3.1. SPAdes (Short-Read Assembly)</strong></h5><p>SPAdes is an excellent choice for small genomes, such as bacteria.</p><div><div dir="ltr"><code>spades.py -1 trimmed_R1.fastq -2 trimmed_R2.fastq -o spades_output </code></div></div><p>The output includes assembled contigs (<code>contigs.fasta</code>) and scaffolds (<code>scaffolds.fasta</code>).</p><h5><strong>3.2. Canu (Long-Read Assembly)</strong></h5><p>Canu is designed for high-error long reads from PacBio or Nanopore.</p><div><div dir="ltr"><code>canu -p genome -d canu_output genomeSize=4.7m -nanopore-raw reads.fastq </code></div></div><p>The output will be in <code>canu_output/genome.contigs.fasta</code>.</p><h5><strong>3.3. Hybrid Assembly with Unicycler</strong></h5><p>Unicycler combines short and long reads for improved assemblies.</p><div><div dir="ltr"><code>unicycler -1 trimmed_R1.fastq -2 trimmed_R2.fastq -l long_reads.fastq -o unicycler_output </code></div></div><h4><strong>Step 4: Assessing Assembly Quality</strong></h4><p>After assembly, evaluate its quality using the following tools:</p><ol>
<li>
<p><strong>QUAST</strong><br />QUAST generates assembly statistics, such as N50, genome size, and GC content:</p>
<div>
<div dir="ltr"><code>quast contigs.fasta -o quast_output </code></div>
</div>
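<p>N50 is the contig length at which contigs of that length or longer contain at least half of the total assembly. As a rough illustration of that statistic, the Python sketch below computes N50 from a list of contig lengths; the function name <code>n50</code> and the example lengths are made up for illustration, and this is not QUAST's own code.</p>
<pre>
```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Made-up contig lengths in base pairs.
contigs = [100, 200, 300, 400, 500]
print(n50(contigs))
```
</pre>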
</li>
<li>
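<p><strong>BUSCO</strong><br />BUSCO checks genome completeness by identifying conserved genes. Choose the lineage dataset (<code>-l</code>) that matches your organism; <code>fungi_odb10</code> below is only an example:</p>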
<div>
<div dir="ltr"><code>busco -i contigs.fasta -o busco_output -l fungi_odb10 -m genome </code></div>
</div>
</li>
<li>
<p><strong>Assembly Graph Visualization</strong><br />Visualize assembly graphs with <strong>Bandage</strong>:</p>
<div>
<div dir="ltr"><code>Bandage load assembly_graph.gfa </code></div>
</div>
</li>
</ol><hr><h4><strong>Step 5: Post-Assembly Steps</strong></h4><ol>
<li>
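<p><strong>Polishing</strong><br />Improve assembly accuracy using tools like <strong>Pilon</strong> (with short reads) or <strong>Racon</strong> (with long reads). Racon takes the reads, their alignments to the contigs, and the contigs themselves, so the reads must first be mapped to the assembly (for example with minimap2) to produce <code>mapped_reads.sam</code>:</p>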
<div>
<div dir="ltr"><code>racon long_reads.fasta mapped_reads.sam contigs.fasta &gt; polished_contigs.fasta </code></div>
</div>
</li>
<li>
<p><strong>Scaffolding</strong><br />Link contigs into scaffolds using tools like <strong>SSPACE</strong> or <strong>Opera-LG</strong> if required.</p>
</li>
<li>
<p><strong>Annotation</strong><br />Annotate the assembled genome using <strong>Prokka</strong> for prokaryotes or <strong>Maker</strong> for eukaryotes.</p>
<div>
<div dir="ltr"><code>prokka --outdir annotation_output --prefix genome contigs.fasta </code></div>
</div>
</li>
</ol><h4><strong>Step 6: Sharing and Archiving</strong></h4><ol>
<li>
<p><strong>Submit to Public Repositories</strong><br />Share your assembly in databases like <strong>NCBI GenBank</strong>, <strong>ENA</strong>, or <strong>DDBJ</strong>.</p>
</li>
<li>
<p><strong>Metadata Preparation</strong><br />Include detailed metadata for your submission, such as organism name, sequencing platform, and coverage.</p>
</li>
</ol><h4><strong>Best Practices</strong></h4><ul>
<li>Always perform quality checks at each stage to ensure data integrity.</li>
<li>Use multiple tools to cross-validate results when working with complex genomes.</li>
<li>Document parameters and software versions for reproducibility.</li>
</ul><h4><strong>Conclusion</strong></h4><p>Genome assembly is a powerful process that transforms raw sequencing data into a coherent representation of an organism&rsquo;s genome. By following this step-by-step guide, you can successfully assemble genomes and uncover valuable biological insights. Whether you&rsquo;re assembling a microbial genome or tackling the complexities of a eukaryotic genome, these tools and strategies will set you on the path to success.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44775/genomic-architecture-surrounding-the-fusion-site-of-human-chromosome-2</guid>
	<pubDate>Tue, 04 Mar 2025 12:26:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44775/genomic-architecture-surrounding-the-fusion-site-of-human-chromosome-2</link>
	<title><![CDATA[Genomic architecture surrounding the fusion site of human chromosome 2]]></title>
	<description><![CDATA[<p>The article <strong>"Genomic Structure and Evolution of the Ancestral Chromosome Fusion Site in 2q13&ndash;2q14.1 and Paralogous Regions on Other Human Chromosomes"</strong> (https://pmc.ncbi.nlm.nih.gov/articles/PMC187548/) explores the genomic architecture surrounding the fusion site of human chromosome 2. This fusion event is a key evolutionary marker distinguishing humans from other great apes, as humans have 46 chromosomes while chimpanzees, gorillas, and orangutans possess 48. The fusion occurred through an end-to-end joining of two ancestral chromosomes, which remain separate in nonhuman primates.</p><h3><strong>Key Findings:</strong></h3><ol>
<li>
<p><strong>Chromosomal Fusion and Its Molecular Signature:</strong></p>
<ul>
<li>The fusion site is located at <strong>2q13&ndash;2q14.1</strong> and is characterized by <strong>degenerate telomeric sequences</strong> appearing interstitially, indicating the historical head-to-head joining of ancestral chromosomes.</li>
<li>Despite being a signature of a past fusion event, these telomeric repeats are no longer functional and have undergone sequence degradation over time.</li>
</ul>
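<p>For intuition, a head-to-head fusion of two telomeres would leave an array of the human telomeric repeat TTAGGG followed directly by an array of its reverse complement, CCCTAA. The Python sketch below locates such an orientation switch in a synthetic toy sequence. The function name <code>find_fusion_junction</code> and the sequence are hypothetical, and because the real repeats at the fusion site are degenerate, exact matching like this would not be sufficient on genomic data.</p>
<pre>
```python
import re

FORWARD = "TTAGGG"   # human telomeric repeat
REVERSE = "CCCTAA"   # its reverse complement


def find_fusion_junction(seq, min_repeats=2):
    """Return the 0-based index where a TTAGGG array meets a CCCTAA array.

    Returns None when no such head-to-head junction is found.
    """
    pattern = re.compile(
        "(?:%s){%d,}(?=(?:%s){%d,})" % (FORWARD, min_repeats, REVERSE, min_repeats)
    )
    match = pattern.search(seq)
    return match.end() if match else None


# Synthetic toy sequence: flank, three forward repeats, three reverse repeats, flank.
toy = "ACGT" + FORWARD * 3 + REVERSE * 3 + "GGCA"
print(find_fusion_junction(toy))
```
</pre>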
</li>
<li>
<p><strong>Extensive Duplications in the Surrounding Genomic Region:</strong></p>
<ul>
<li>The study identifies <strong>large-scale segmental duplications</strong> flanking the fusion site, with several of these regions duplicated and scattered across multiple chromosomes.</li>
<li>These duplications are predominantly located in <strong>subtelomeric and pericentromeric regions</strong>, suggesting their role in genomic instability and chromosomal evolution.</li>
</ul>
</li>
<li>
<p><strong>Paralogous Regions and Their Evolutionary Relationships:</strong></p>
<ul>
<li>A <strong>168-kilobase (kb) segment</strong> near the fusion site has <strong>98%&ndash;99% sequence identity</strong> with three regions on <strong>chromosome 9 (9pter, 9p11.2, and 9q13)</strong>.</li>
<li>Another <strong>67-kb region distal to the fusion site</strong> shows a high degree of homology to sequences in <strong>chromosome 22qter</strong>.</li>
<li>Additionally, a <strong>100-kb segment</strong> exhibits <strong>96% sequence identity</strong> with a region in <strong>chromosome 2q11.2</strong>.</li>
</ul>
</li>
<li>
<p><strong>Comparative Genomics and Evolutionary Implications:</strong></p>
<ul>
<li>By comparing the duplicated sequences and their arrangement in primates, the researchers traced the order of duplication events leading to their present distribution.</li>
<li>The presence of specific repetitive elements within these duplicated segments serves as <strong>evolutionary markers</strong> that help infer their historical rearrangements.</li>
<li>Some of these <strong>duplicated regions are associated with chromosomal inversion breakpoints</strong>, potentially contributing to evolutionary changes in primates.</li>
<li>Recurrent <strong>structural rearrangements</strong> in these regions have been linked to human chromosomal disorders.</li>
</ul>
</li>
</ol><h3><strong>Conclusions and Implications:</strong></h3><ul>
<li>The findings provide valuable insights into <strong>the structural evolution of human chromosome 2</strong>, which played a crucial role in human speciation.</li>
<li>Understanding these <strong>segmental duplications</strong> and their evolutionary trajectories sheds light on <strong>genomic instability</strong>, which may contribute to <strong>human genetic diseases</strong>.</li>
<li>The study highlights how large-scale chromosomal rearrangements, such as fusion and duplication, have influenced the <strong>evolutionary divergence of humans</strong> from other primates.</li>
</ul><p>This research advances our understanding of <strong>human genome evolution</strong> and offers a foundation for studying the effects of <strong>structural variants in genetic disorders</strong>.</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39898/itis-the-integrated-taxonomic-information-system</guid>
	<pubDate>Fri, 30 Aug 2019 23:07:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39898/itis-the-integrated-taxonomic-information-system</link>
	<title><![CDATA[ITIS: the Integrated Taxonomic Information System!]]></title>
	<description><![CDATA[<p><span>ITIS, the Integrated Taxonomic Information System! Here you will find authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world. We are a&nbsp;</span><a href="https://www.itis.gov/organ.html">partnership</a><span>&nbsp;of U.S.,&nbsp;</span><a href="http://www.cbif.gc.ca/eng/home/?id=1370403266262" target="_blank">Canadian</a><span>, and&nbsp;</span><a href="http://www.conabio.gob.mx/" target="_blank">Mexican</a><span>&nbsp;agencies (</span><a href="http://www.cbif.gc.ca/eng/integrated-taxonomic-information-system-itis/?id=1381347793621" target="_blank">ITIS-North America</a><span>); other organizations; and taxonomic specialists. ITIS is also a partner of&nbsp;</span><a href="http://www.sp2000.org/" target="_blank">Species 2000</a><span>&nbsp;and the&nbsp;</span><a href="http://www.gbif.org/" target="_blank">Global Biodiversity Information Facility (GBIF)</a><span>. The ITIS and Species 2000&nbsp;</span><a href="http://www.catalogueoflife.org/annual-checklist/" target="_blank">Catalogue of Life (CoL)</a><span>&nbsp;partnership is proud to provide the taxonomic backbone to the&nbsp;</span><a href="http://www.eol.org/" target="_blank">Encyclopedia of Life (EOL)</a><span>.&nbsp;</span></p>
<p><span><a href="https://www.itis.gov/pdf/twb_ug.pdf">https://www.itis.gov/pdf/twb_ug.pdf</a></span></p><p>Address of the bookmark: <a href="https://www.itis.gov/" rel="nofollow">https://www.itis.gov/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/fun/view/14036/introduction-to-programming-write-short-programs-that-generate-graphics-and-animation</guid>
	<pubDate>Thu, 14 Aug 2014 23:29:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/fun/view/14036/introduction-to-programming-write-short-programs-that-generate-graphics-and-animation</link>
	<title><![CDATA[Introduction to programming. Write short programs that generate graphics and animation.]]></title>
	<description><![CDATA[<p>Introduction to programming. Write short programs that generate graphics and animation.</p><p>http://funprogramming.org/</p>]]></description>
	<dc:creator>Ram Yash Pal</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/22571/pattern-matching-problem-solution-with-perl</guid>
	<pubDate>Tue, 09 Jun 2015 23:58:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/22571/pattern-matching-problem-solution-with-perl</link>
	<title><![CDATA[Pattern Matching Problem Solution with Perl]]></title>
	<description><![CDATA[<p>Problem at http://rosalind.info/problems/1c/</p><p>#Find all occurrences of a pattern in a string.<br />#Given: Strings Pattern and Genome.<br />#Return: All starting positions in Genome where Pattern appears as a substring. Use 0-based indexing.<br /><br />use strict;<br />use warnings;<br /><br />my $string="GATATATGCATATACTT";<br />my $subStr="ATAT";<br />my $kmer=length($subStr);<br /><br />kmerMatch ($string, $subStr, $kmer);<br /><br />sub kmerMatch { #Check for exact k-mer matches with a sliding window<br />my ($string, $myStr, $kmer)=@_;<br />for (my $aa=0; $aa&lt;=(length($string)-$kmer); $aa++) {<br />&nbsp;&nbsp;&nbsp; my $myWin=substr($string, $aa, $kmer);<br />&nbsp;&nbsp;&nbsp; if ($myWin eq $myStr) {<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #print "$myWin eq $myStr\n";<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; print "$aa "; #space-separated so the positions stay distinguishable<br />&nbsp;&nbsp;&nbsp; }<br />}<br />}</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/22961/bioscripts</guid>
	<pubDate>Sun, 28 Jun 2015 07:46:14 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/22961/bioscripts</link>
	<title><![CDATA[BioScripts]]></title>
	<description><![CDATA[<p>Please bookmark collections of bioinformatics tools, scripts, and code that can be pieced together easily and flexibly to perform both simple and complex bioinformatics tasks.</p>
<p>Next-generation sequencing includes whole-genome sequencing (WGS), transcriptome sequencing (whole cDNA sequencing, RNA-seq), digital gene expression sequencing (Tag-Seq), ChIP-Seq, and so on. Many platforms generate sequence data, including the well-known Sanger/ABI (first generation), Solexa/Illumina, SOLiD/ABI, and 454/Roche. Their sequence formats differ, and so do their error types. High-quality data is essential for further analysis and data mining. Many pipelines exist for raw-sequence quality analysis and control; they report read-quality statistics and perform trimming, filtering, and error correction. Please bookmark them for the benefit of the bioinformatics community.</p>
<p>https://code.google.com/p/biowiki/</p>
<p>https://code.google.com/p/ngs-pipeline/source/browse/#svn%2Ftrunk</p>
<p>NGS and Perl scripts https://code.google.com/hosting/search?q=NGS+perl&amp;projectsearch=Search+projects</p>
<p>NGS and Python scripts https://code.google.com/hosting/search?q=NGS+Python&amp;projectsearch=Search+projects</p><p>Address of the bookmark: <a href="https://code.google.com/hosting/search?q=bioinformatics&amp;sa=Search" rel="nofollow">https://code.google.com/hosting/search?q=bioinformatics&amp;sa=Search</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30897/finestructure-v2-globetrotter</guid>
	<pubDate>Mon, 13 Feb 2017 08:40:23 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30897/finestructure-v2-globetrotter</link>
	<title><![CDATA[fineSTRUCTURE v2 &amp; GLOBETROTTER]]></title>
	<description><![CDATA[<p>Software available at this site</p>
<div>
<ul>
<li><a href="https://people.maths.bris.ac.uk/%7Emadjl/finestructure/finestructure_info.html">FineSTRUCTURE version 2</a>, a pipeline for running ChromoPainter and FineSTRUCTURE for population inference. A GUI is available for interpretation. Download from the <a href="https://people.maths.bris.ac.uk/%7Emadjl/finestructure/finestructure.html">Downloads</a> page.</li>
<li><a href="https://people.maths.bris.ac.uk/%7Emadjl/finestructure/finestructureR.html">FineSTRUCTURE R scripts</a>, a facility for exploring the results when the GUI is unavailable.</li>
<li><a href="https://people.maths.bris.ac.uk/%7Emadjl/finestructure/globetrotter.html">GLOBETROTTER</a>, the admixture dating method based on ChromoPainter. Download from the <a href="https://people.maths.bris.ac.uk/%7Emadjl/finestructure/finestructure.html">Downloads</a> page.</li>
<li><a href="https://people.maths.bris.ac.uk/%7Emadjl/finestructure/admixture.html">AdmixturePainting</a>, a set of R tools to interpret the results of ADMIXTURE and STRUCTURE-like mixture models.</li>
<li><a href="https://people.maths.bris.ac.uk/%7Emadjl/finestructure/radpainter.html">RADpainter</a>, finestructure and ChromoPainter for RAD tag data used for non-model organisms.</li>
<li>Scripts to perform many types of conversion. Included in the main software download from the <a href="https://people.maths.bris.ac.uk/%7Emadjl/finestructure/finestructure.html">Downloads</a> page.</li>
</ul>
<strong>What this page is</strong><br> This page provides information about and downloads for <strong>methodology for Chromosome Painting</strong>. It is not a facility to analyse your genome. Sorry if you were misled by the punchy name!<br> <strong>About Chromosome Painting</strong><br> Painting is an efficient way of identifying important haplotype information from dense genotype data. It describes ancestry in an efficient way suitable for a range of further analyses, including population identification and admixture dating.</div><p>Address of the bookmark: <a href="http://paintmychromosomes.com/" rel="nofollow">http://paintmychromosomes.com/</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34546/comparative-genomics-scripts</guid>
	<pubDate>Wed, 06 Dec 2017 15:20:45 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34546/comparative-genomics-scripts</link>
	<title><![CDATA[Comparative genomics scripts]]></title>
	<description><![CDATA[<p>Comparative genomics educational material and papers bookmarks</p>
<p>https://github.com/iansealy/coursera-comparinggenomes</p><p>Address of the bookmark: <a href="https://github.com/iansealy/coursera-comparinggenomes" rel="nofollow">https://github.com/iansealy/coursera-comparinggenomes</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36905/d-genies-a-tool-for-dotplot-large-genomes-in-an-interactive-efficient-and-simple-way</guid>
	<pubDate>Mon, 11 Jun 2018 09:41:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36905/d-genies-a-tool-for-dotplot-large-genomes-in-an-interactive-efficient-and-simple-way</link>
	<title><![CDATA[D-GENIES: A tool for Dotplot large Genomes in an Interactive, Efficient and Simple way]]></title>
	<description><![CDATA[D-GENIES – for Dotplot large Genomes in an Interactive, Efficient and Simple way – is an online tool designed to compare two genomes. It supports large genomes, and you can interact with the dot plot to improve the visualisation.

We use minimap2 to align the two genomes. The resulting PAF file is then parsed and rendered as an interactive plot written with the d3.js library.

D-GENIES can also display dot plots from other aligners if you upload their PAF or MAF alignment file.
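
As a rough sketch of the PAF parsing step described above, the Python snippet below reads one PAF record and extracts the coordinates needed to draw a single dot-plot segment. It relies only on the twelve mandatory PAF columns; the function name paf_to_segment and the sample record are hypothetical, and this is not D-GENIES's own code.

<pre>
```python
def paf_to_segment(line):
    """Turn one PAF line into a dot-plot segment.

    Returns (query, target, x0, x1, y0, y1), where x is the target axis
    and y is the query axis. Reverse-strand hits get a descending y range.
    """
    fields = line.rstrip("\n").split("\t")
    # Mandatory PAF columns: 1 query name, 3-4 query start/end, 5 strand,
    # 6 target name, 8-9 target start/end (1-based column numbers).
    query, q_start, q_end = fields[0], int(fields[2]), int(fields[3])
    strand = fields[4]
    target, t_start, t_end = fields[5], int(fields[7]), int(fields[8])
    if strand == "-":
        q_start, q_end = q_end, q_start
    return query, target, t_start, t_end, q_start, q_end


# Hypothetical record: 900 bp of chrA aligned to the reverse strand of chrB.
record = "chrA\t1000\t50\t950\t-\tchrB\t2000\t100\t1000\t880\t900\t60"
print(paf_to_segment(record))
```
</pre>

Each returned tuple corresponds to one line segment on the plot; forward-strand hits rise along the diagonal and reverse-strand hits fall.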

http://dgenies.toulouse.inra.fr/<p>Address of the bookmark: <a href="http://dgenies.toulouse.inra.fr/" rel="nofollow">http://dgenies.toulouse.inra.fr/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/14191/scalpel</guid>
	<pubDate>Wed, 20 Aug 2014 02:07:58 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/14191/scalpel</link>
	<title><![CDATA[Scalpel]]></title>
	<description><![CDATA[<p>A team from Cold Spring Harbor Laboratory has released an algorithm, called Scalpel, for finding insertions and deletions in next generation sequencing data sets. Scalpel, which is open source and <a href="http://scalpel.sourceforge.net/" title="available for download">available for download</a> on SourceForge,&nbsp;<span>outperformed the popular tools GATK HaplotypeCaller and SOAPindel in test runs on both simulated and real whole human exomes.</span></p><p>Like other indel callers, Scalpel works by performing <em>de novo</em>&nbsp;assembly of regions of interest, so that misalignment to the reference genome cannot obscure the presence of an insertion or deletion. Scalpel's innovation is to repeatedly check its assembly before comparing to the reference genome, to account for simple sequence repeats that are a regular source of error in indel calling. When Scalpel assembles an exon, it collects reads that map to that exon (including partial matches), splits them into k-mers, and creates a de Bruijn graph to span the exon; however, if it detects repeats in the map, it iteratively increases the size of the k-mers by one base until the repeats are eliminated. This ensures that the final assembly of the exon is highly accurate while minimizing compute time.</p><p>The Cold Spring Harbor team's validation of Scalpel, <a href="http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3069.html" title="published over the weekend in Nature Methods">published over the weekend in <em>Nature Methods</em></a>, compares Scalpel's performance on a live whole exome against HaplotypeCaller and SOAPindel. The donor is an individual with serious neurological disorders, which may be linked to a high incidence of indels. One thousand indels from this individual's exome, called by one or more of the informatics pipelines, were selected for focused resequencing. 
This resequencing revealed a 77% true positive rate for Scalpel calls, dramatically better than the rates for either of the competing tools; Scalpel performed especially well with indels longer than five base pairs, a traditional weak point for indel callers.</p><p>Finally, the authors demonstrate Scalpel's use on a large set of genetic data from nearly 600 families who donated samples to the Simons Simplex Collection, a project of the Simons Foundation Autism Research Initiative. Scalpel found a very high enrichment for indels in children affected by autism, compared with their unaffected siblings, a pattern that persisted even after excluding common variants.</p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>

</channel>
</rss>