<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/42419?offset=70</link>
	<atom:link href="https://bioinformaticsonline.com/related/42419?offset=70" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43641/refseq-viraal-genome-sequences</guid>
	<pubDate>Sat, 11 Dec 2021 08:35:18 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43641/refseq-viraal-genome-sequences</link>
	<title><![CDATA[RefSeq viral genome sequences!]]></title>
	<description><![CDATA[<p>List of all viruses on NCBI&nbsp;</p>
<p>https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/</p><p>Address of the bookmark: <a href="https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/" rel="nofollow">https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44529/contigextender-a-new-approach-to-improving-de-novo-sequence-assembly-for-viral-metagenomics-data</guid>
	<pubDate>Wed, 08 May 2024 07:32:45 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44529/contigextender-a-new-approach-to-improving-de-novo-sequence-assembly-for-viral-metagenomics-data</link>
	<title><![CDATA[ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data]]></title>
	<description><![CDATA[<p dir="auto">ContigExtender was developed to extend contigs, complementing de novo assembly. It employs a novel recursive Overlap Layout Candidates (r-OLC) strategy that explores multiple extending paths to achieve longer and highly accurate contigs. ContigExtender extends contigs significantly in both in silico synthesized and real metagenomics datasets.</p>
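<p dir="auto">As a rough illustration of what exploring multiple extending paths can mean, the toy Python sketch below extends a contig to the right by trying every read whose prefix overlaps the contig's suffix and recursing on each candidate. It is not the r-OLC implementation from ContigExtender: the function names, the <code>min_overlap</code> parameter, and the exact-match overlap rule are all assumptions made for this example.</p>
<pre>
```python
def suffix_prefix_overlap(contig, read, min_overlap):
    """Return the length of the longest contig suffix that equals a read prefix.

    Returns 0 when no exact overlap of at least min_overlap bases exists.
    """
    longest = min(len(contig), len(read))
    for size in range(longest, min_overlap - 1, -1):
        if contig.endswith(read[:size]):
            return size
    return 0


def extend_right(contig, reads, min_overlap=4, used=frozenset()):
    """Recursively try every overlapping read and keep the longest extension.

    This is a hypothetical toy, not ContigExtender's algorithm: each read may
    be used once per path, and all candidate paths are explored exhaustively.
    """
    best = contig
    for index, read in enumerate(reads):
        if index in used:
            continue
        size = suffix_prefix_overlap(contig, read, min_overlap)
        # Skip reads that do not overlap or that add no new bases.
        if size == 0 or size == len(read):
            continue
        candidate = extend_right(
            contig + read[size:], reads, min_overlap, used | {index}
        )
        if len(candidate) > len(best):
            best = candidate
    return best


if __name__ == "__main__":
    # Two reads overlap the seed contig; only one of them leads to a longer path.
    seed = "ACGTACGGA"
    toy_reads = ["CGGATTCA", "TTCAGGCT", "CGGATG"]
    print(extend_right(seed, toy_reads, min_overlap=4))
```
</pre>
<p dir="auto">The exhaustive recursion is exponential in the worst case, so this sketch only suits tiny inputs. A practical tool would also need mismatch tolerance, candidate scoring, and pruning.</p>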
<p dir="auto">More at&nbsp;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7953547/</p>
<p dir="auto"><a href="https://camo.githubusercontent.com/72dc78177cd84dd0c667a2922a9fd984fb548b5ec94b11f9a547211a4adba3b1/68747470733a2f2f692e696d6775722e636f6d2f7734516944496a2e706e67" target="_blank"><img src="https://camo.githubusercontent.com/72dc78177cd84dd0c667a2922a9fd984fb548b5ec94b11f9a547211a4adba3b1/68747470733a2f2f692e696d6775722e636f6d2f7734516944496a2e706e67" alt="extension process" title="extension process" style="border: 0px;"></a></p><p>Address of the bookmark: <a href="https://github.com/dengzac/contig-extender" rel="nofollow">https://github.com/dengzac/contig-extender</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44375/phyloherb-a-high%E2%80%90throughput-phylogenomic-pipeline-for-processing-genome-skimming-data</guid>
	<pubDate>Wed, 06 Sep 2023 00:14:28 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44375/phyloherb-a-high%E2%80%90throughput-phylogenomic-pipeline-for-processing-genome-skimming-data</link>
	<title><![CDATA[PhyloHerb: A high‐throughput phylogenomic pipeline for processing genome skimming data]]></title>
	<description><![CDATA[<p dir="auto"><span>Phylo</span>genomic Analysis Pipeline for&nbsp;<span>Herb</span>arium Specimens</p>
<p dir="auto"><span>What is PhyloHerb</span>: PhyloHerb is a wrapper program to process&nbsp;<span>genome skimming</span>&nbsp;data collected from plant materials. The outcomes include the plastid genome (plastome) assemblies, mitochondrial genome assemblies, nuclear ribosomal DNAs (NTS+ETS+18S+ITS1+5.8S+ITS2+28S), alignments of gene and intergenic regions, and a species tree. It is designed to be a high throughput program dealing with lower quality data. Examples include&nbsp;<span>low-coverage (5x cpDNA) plastome phylogeny, recycling plastid genes from target enrichment data, retrieving low-copy nuclear genes from medium coverage (5x nucDNA) genome skimming</span>.</p>
<p dir="auto"><span>License</span>: GNU General Public License</p>
<p dir="auto"><span>Citation</span>:</p>
<ul dir="auto">
<li>Cai, Liming, Hongrui Zhang, and Charles C. Davis. 2022. PhyloHerb: A high‐throughput phylogenomic pipeline for processing genome‐skimming data. Applications in Plant Sciences 10(3): 1&ndash;9.&nbsp;<a href="https://doi.org/10.1002/aps3.11475">https://doi.org/10.1002/aps3.11475</a></li>
</ul><p>Address of the bookmark: <a href="https://github.com/lmcai/PhyloHerb/" rel="nofollow">https://github.com/lmcai/PhyloHerb/</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/44672/libraries-or-management-tools-for-high-throughput-sequencing-data</guid>
	<pubDate>Fri, 04 Oct 2024 02:45:06 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/44672/libraries-or-management-tools-for-high-throughput-sequencing-data</link>
	<title><![CDATA[Libraries or management tools for high throughput sequencing data]]></title>
	<description><![CDATA[<ul>
<li><a href="http://gatb.inria.fr/"><span>GATB</span></a>&nbsp;Library.&nbsp;The&nbsp;<span>Genome Analysis Toolbox with de-Bruijn graph.&nbsp;</span>A large part of the tools developed by the GenScale team are based on this library.<br />These methods enable the analysis of data sets of any size on multi-core desktop computers, including very large volumes of read data from any kind of organism, such as bacteria, plants, animals, and even complex samples (<em>e.g.</em>&nbsp;metagenomes). Among them are (the full list is available here:&nbsp;<a href="https://gatb.inria.fr/software/">https://gatb.inria.fr/software/</a>):</li>
<li><a href="https://github.com/morispi/LRez"><span>LRez</span></a>: C++ Library and toolkit for the barcode-based management and indexation of linked-read datasets.</li>
</ul><h2>Variant calling and/or genotyping</h2><ul>
<li><a href="https://gatb.inria.fr/software/discosnp/" title="DiscoSNP">DiscoSNP++ and&nbsp;discoSnpRAD</a>: Reference-free small variant discovery (SNPs and indels)</li>
<li><a href="https://gatb.inria.fr/software/mind-the-gap/" title="MindTheGap">MindTheGap</a>: Detection and assembly of large insertion variants</li>
<li><a href="https://gatb.inria.fr/software/takeabreak/" title="TakeABreak">TakeABreak</a>:&nbsp;reference-free inversion discovery tool</li>
<li><a href="https://github.com/llecompte/SVJedi">SVJedi</a>: Structural Variant genotyper with long read data</li>
<li><a href="https://github.com/SandraLouise/SVJedi-graph">SVJedi-graph</a>: Structural Variant genotyper with long read data using a variation graph</li>
</ul><h2>Sequence assembly</h2><ul>
<li><a href="https://github.com/cguyomar/MinYS">MinYS</a>: reference-guided genome assembly in metagenomics data</li>
<li><a href="https://github.com/anne-gcd/MTG-Link">MTG-link</a>: local assembly tool for linked-read data</li>
<li><a href="https://gatb.inria.fr/software/minia/" title="Minia">Minia</a>: De novo short read assembler</li>
<li><a href="https://gatb.inria.fr/de-novo-genome-assembly/">de-novo pipeline</a>:&nbsp;<em>de-novo</em>&nbsp;assembly pipeline (error correction / contigs / scaffolding) for genomes and meta-genomes</li>
<li><a href="https://gatb.inria.fr/software/mapsembler/" title="Mapsembler2">Mapsembler2</a>: Targeted assembly (not maintained)</li>
</ul><h2>Managing k-mers &amp; indexation</h2><ul>
<li><a href="https://github.com/lrobidou/findere">findere</a>:&nbsp;simple strategy for speeding up queries and for reducing false positive calls from any Approximate Membership Query data structure.
<ul>
<li><a href="https://github.com/lrobidou/fimpera">fimpera</a>&nbsp;extends findere adding the abundance information.</li>
</ul>
</li>
<li><a href="https://github.com/tlemane/kmtricks">kmtricks</a>:&nbsp;modular tool suite for counting kmers, and constructing Bloom filters or kmer matrices, for large collections of sequencing data.</li>
<li><a href="https://github.com/tlemane/kmindex">kmindex&nbsp;</a>is a tool for indexing and querying sequencing samples. It is built on top of kmtricks.</li>
<li><a href="https://github.com/pierrepeterlongo/back_to_sequences">back to sequences</a>: Find sequences (reads, unitigs, genes) related to a set of kmers in large datasets, in a matter of seconds.</li>
<li><a href="https://github.com/vicLeva/bqf">Backpack Quotient Filter</a>:&nbsp;k-mer indexing data structure with abundance</li>
<li><a href="http://github.com/GATB/rconnector">short read connector</a>:&nbsp;Detect similar reads in potentially large read sets</li>
<li><a href="https://gatb.inria.fr/software/dsk/" title="DSK">DSK</a>:&nbsp;Count k-mers in sequences</li>
</ul><h2>Pangenome graph manipulation</h2><ul>
<li><a href="https://github.com/Tharos-ux/pancat">Pancat</a>: Pangenome Comparison and Analysis Toolkit</li>
<li><a href="https://pypi.org/project/gfagraphs/">GFAGraphs</a>: a Python library to handle pangenome graph files in GFA format.</li>
</ul><h2>Comparative metagenomics with k-mers</h2><ul>
<li><a href="https://github.com/GATB/simka">Simka and SimkaMin</a>:&nbsp;Comparative metagenomics for large-scale datasets</li>
<li><a href="https://team.inria.fr/genscale/high-throughput-sequence-analysis/compreads-metagenomic-data-analysis/">Comparead &amp; Commet</a>:&nbsp;comparison of metagenomic datasets</li>
</ul><h2>Species and bacterial strains identification</h2><ul>
<li><a href="https://github.com/gsiekaniec/ORI">ORI</a>: software using long nanopore reads to identify bacteria present in a sample at the strain level</li>
<li><a href="https://github.com/kevsilva/StrainFLAIR">StrainFLAIR</a>:&nbsp;STRAIN-level proFiLing using vArIation gRaph</li>
</ul><h2>General-purpose sequencing data manipulation</h2><ul>
<li><a href="https://team.inria.fr/genscale/ngs-software/gassst/">GASSST</a>:&nbsp;long read mapper</li>
<li><a href="https://gatb.inria.fr/software/leon/" title="Leon">Leon</a>: short read compressor (now included in GATB-core)</li>
<li><a href="https://gatb.inria.fr/software/bloocoo/" title="Bloocoo">Bloocoo</a>:&nbsp;short read corrector</li>
<li><a href="https://github.com/GATB/bcalm">BCALM</a>:&nbsp;Construct compacted de Bruijn graphs (unitigs)</li>
</ul><h2>Protein Structure</h2><ul>
<li><a href="https://team.inria.fr/genscale/protein-structure/a-purva-contact-map-overlap-solver/">A_Purva</a>:&nbsp;Contact Map Overlap solver</li>
<li><a href="https://team.inria.fr/genscale/protein-structure/md-jeep-distance-geomtry-solver/">MD-Jeep</a>:&nbsp;Distance Geometry solver</li>
<li><a href="https://team.inria.fr/genscale/csa-comparative-structural-alignment/">CSA</a>:&nbsp;Comparative Structural Alignment</li>
</ul><h2>Workflow</h2><ul>
<li><a href="https://team.inria.fr/genscale/workflows/slicee/">SLICEE</a>:&nbsp;parallel execution of bioinformatics workflows</li>
</ul><h2>Comparative Genomics</h2><ul>
<li><a href="https://team.inria.fr/genscale/comparative-genomics/cassis/">CASSIS</a>:&nbsp;detection of rearrangement breakpoints</li>
<li><a href="https://team.inria.fr/genscale/high-throughput-sequence-analysis/plast-intensive-sequence-comparison/">PLAST</a>:&nbsp;intensive bank-to-bank sequence comparison</li>
<li><a href="https://github.com/stephanierobin/DrjBreakpointFinder">DRJBreakpointFinder</a>: detection and precise localization of excision sites in proviral segments</li>
</ul>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/23680/five-key-traits-to-seek-out-in-potential-bioinformatics-candidates</guid>
	<pubDate>Mon, 10 Aug 2015 12:53:50 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/23680/five-key-traits-to-seek-out-in-potential-bioinformatics-candidates</link>
	<title><![CDATA[Five key traits to seek out in potential bioinformatics candidates !!!]]></title>
	<description><![CDATA[<p>Genomics and proteomics data are being collected in bulk, but many traditional biologists don&rsquo;t know what to do with them. This is one reason (though not the only one) why computational biologists and bioinformatics scientists are hot commodities in the research world.</p><p>In fact, there is huge demand for expert biological data analysts. Although the field is no longer entirely new, bioinformaticians are invaluable because they understand the significance of biological data for your research and how you can use it to better understand biological problems.</p><p>Bioinformaticians can discover biological patterns and stories in genomic and proteomic data. They can develop the pipelines needed to properly collect, store and analyse it.</p><p><img src="http://bioinformaticsonline.com/mod/photo/hire.gif" alt="image" style="border: 0px;"></p><p>Once your research group is ready to make a larger investment and hire a bioinformatician to gain a competitive edge, there are several key traits to seek out in potential candidates. The best bioinformaticians are:</p><p>1. Highly skilled - programming skills, experience with biological software and tools.</p><p>Biological data won&rsquo;t illuminate much if the scientist analysing it doesn&rsquo;t possess practical programming skills, experience with biological software and tools, and a thorough understanding of basic biology. A solid background in mathematics and statistics is also indispensable.</p><p>2. Insightful - real vision, robust understanding and deep insight.</p><p>To hire the best bioinformatician or computational biologist for your needs, a common and recommended practice among recruiters is to ask each contender to develop a sample script or presentation based on a specific data set you provide. 
Then explore the approaches each candidate used to handle the data, and pick those who convey real vision, robust understanding and deep insight.</p><p>3. Energetic - curiosity to explore.</p><p>Natural curiosity and enthusiasm for solving big biological problems, coupled with an ability to transform data into a scientific story, may place one candidate above the rest. In addition, the bioinformatician should be agile enough to quickly modify their methods to suit changes within a particular research project.</p><p>4. Researcher - publications.</p><p>Look for someone who has a keen understanding of the biological problems concerned. You can judge this by looking at previously published papers and data. It is also worth looking at GitHub and other repositories for code the candidate has written.</p><p>5. Impressive communicator - insight that can&rsquo;t be expressed is worthless.</p><p>Good bioinformatics scientists are able to uncover biological patterns and are willing to explain those patterns in clear and helpful ways through thoughtful and open communication. In other words, they must have good scientific writing skills. A computational biologist or bioinformatician should know how to present the data and tell a scientific story through numbers and images.</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37993/platypus-a-haplotype-based-variant-caller-for-next-generation-sequence-data</guid>
	<pubDate>Thu, 25 Oct 2018 06:14:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37993/platypus-a-haplotype-based-variant-caller-for-next-generation-sequence-data</link>
	<title><![CDATA[Platypus: A Haplotype-Based Variant Caller For Next Generation Sequence Data]]></title>
	<description><![CDATA[<p><strong>Platypus</strong><span>&nbsp;is a tool designed for efficient and accurate variant detection in high-throughput sequencing data. By using local realignment of reads and local assembly, it achieves both high sensitivity and high specificity. Platypus can detect SNPs, MNPs, short indels, replacements and (using the assembly option) deletions up to several kb. It has been extensively tested on&nbsp;</span><a href="http://www.ncbi.nlm.nih.gov/pubmed/?term=24463883">whole-genome</a><span>,&nbsp;</span><a href="http://www.nature.com/ng/journal/v45/n1/abs/ng.2492.html">exon-capture</a><span>, and&nbsp;</span><a href="http://www.nature.com/nature/journal/v493/n7432/abs/nature11725.html">targeted capture</a><span>&nbsp;data. It has been run on very large datasets as part of the&nbsp;</span><a href="http://www.1000genomes.org/">Thousand Genomes</a><span>&nbsp;and WGS500 projects, and is being used in clinical sequencing trials in the&nbsp;</span><a href="http://www.mcgprogramme.com/">Mainstreaming Cancer Genetics</a><span>&nbsp;programme.&nbsp;</span></p>
<p><span>Tutorial&nbsp;https://github.com/andyrimmer/Platypus/blob/master/misc/README.txt</span></p><p>Address of the bookmark: <a href="http://www.well.ox.ac.uk/platypus" rel="nofollow">http://www.well.ox.ac.uk/platypus</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40573/de-novo-genome-assembly-for-illumina-data</guid>
	<pubDate>Mon, 20 Jan 2020 05:13:29 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40573/de-novo-genome-assembly-for-illumina-data</link>
	<title><![CDATA[De novo Genome Assembly for Illumina Data]]></title>
	<description><![CDATA[<p>Written and maintained by <a href="mailto:simon.gladman@unimelb.edu.au">Simon Gladman</a> - Melbourne Bioinformatics (formerly VLSCI)</p>
<p>Protocol Overview / Introduction</p>
<p>In this protocol we discuss and outline the process of de novo assembly for small to medium sized genomes.</p>
<p>https://www.melbournebioinformatics.org.au/tutorials/tutorials/assembly/assembly-protocol/</p><p>Address of the bookmark: <a href="https://www.melbournebioinformatics.org.au/tutorials/tutorials/assembly/assembly-protocol/" rel="nofollow">https://www.melbournebioinformatics.org.au/tutorials/tutorials/assembly/assembly-protocol/</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/43374/reference-sequence-resource</guid>
	<pubDate>Wed, 15 Sep 2021 21:15:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/43374/reference-sequence-resource</link>
	<title><![CDATA[Reference Sequence Resource!]]></title>
	<description><![CDATA[<p><span>The ENCODE project uses Reference Genomes from&nbsp;</span><a href="http://www.ncbi.nlm.nih.gov/genome/browse/reference/">NCBI</a><span>&nbsp;or&nbsp;</span><a href="http://hgdownload.cse.ucsc.edu/downloads.html">UCSC</a><span>&nbsp;to provide a consistent framework for mapping high-throughput sequencing data.&nbsp;In general, ENCODE data are mapped consistently to 2 human (GRCh38, hg19) and 2 mouse (mm9/mm10) genomes for historical comparability.&nbsp;</span><em>Drosophila melanogaster</em><span>&nbsp;experiments are mapped to either dm3 or dm6 and&nbsp;</span><em>Caenorhabditis elegans&nbsp;</em><span>experiments are mapped to ce10 or ce11.</span></p><p>Address of the bookmark: <a href="https://www.encodeproject.org/data-standards/reference-sequences/" rel="nofollow">https://www.encodeproject.org/data-standards/reference-sequences/</a></p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44852/what-is-data-science-%E2%80%94-a-bioinformatics-perspective</guid>
	<pubDate>Mon, 16 Jun 2025 01:44:34 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44852/what-is-data-science-%E2%80%94-a-bioinformatics-perspective</link>
	<title><![CDATA[What is Data Science? — A Bioinformatics Perspective]]></title>
	<description><![CDATA[<p>In today&rsquo;s era of big biology, we&rsquo;re generating more data than ever before&mdash;genomes, transcriptomes, proteomes, metabolomes, microbiomes&hellip; you name it. But raw biological data doesn&rsquo;t speak for itself. Making sense of it requires more than traditional biology. This is where data science steps in.</p><p><strong>So, What Is Data Science?</strong><br />At its core, data science is the interdisciplinary field that extracts knowledge and insights from data using programming, statistics, and domain expertise. In bioinformatics, data science enables us to turn gigabytes of sequence data into biological meaning.</p><p>Imagine trying to understand gene regulation in cancer by analyzing thousands of RNA-seq samples, or predicting antibiotic resistance from bacterial genomes&mdash;these challenges are not solvable through wet lab experiments alone. They require data-driven thinking.</p><p><strong>Data Science Meets Bioinformatics</strong><br />Bioinformatics is inherently a data science domain. 
From genomics to systems biology, every field in modern biology relies on data science techniques to:</p><ul><li>Clean and process massive datasets</li><li>Discover patterns in high-dimensional data</li><li>Build predictive models (e.g., for disease classification)</li><li>Visualize complex biological networks and trends</li><li>Integrate diverse data types (e.g., transcriptomic + epigenomic data)</li></ul><p><strong>The Bioinformatics Toolkit</strong><br />Here&rsquo;s what data science typically looks like in bioinformatics:</p><table><tr><th>Task</th><th>Data Science Role</th></tr><tr><td>Sequence alignment</td><td>Efficient algorithms, indexing, parallel processing</td></tr><tr><td>Gene expression analysis</td><td>Statistical modeling (e.g., DESeq2, limma)</td></tr><tr><td>Variant calling</td><td>Data filtering, probabilistic models</td></tr><tr><td>Clustering of cells in single-cell data</td><td>Unsupervised learning</td></tr><tr><td>Protein structure prediction</td><td>Deep learning models (e.g., AlphaFold)</td></tr><tr><td>Metagenomics</td><td>Data integration, classification, dimensionality reduction</td></tr></table><p>Common tools include Python, R, Bioconductor, scikit-learn, Pandas, Seurat, and TensorFlow, often working together in reproducible workflows.</p><p><strong>It's Not Just About Coding</strong><br />A common misconception is that bioinformatics is just programming or scripting. 
But being a data scientist in bioinformatics also means:</p><ul><li>Understanding experimental design</li><li>Asking biologically meaningful questions</li><li>Choosing the right statistical or machine learning models</li><li>Communicating findings effectively (e.g., plots, dashboards, papers)</li></ul><p>In other words, data science in bioinformatics is where biology, statistics, and computer science converge.</p><p><strong>Why It Matters</strong><br />The real power of data science in bioinformatics is its ability to scale discovery.</p><ul><li>Instead of studying one gene, we can study thousands.</li><li>Instead of analyzing one species, we can explore entire ecosystems.</li><li>Instead of waiting months for lab results, we can generate hypotheses in days.</li></ul><p>From personalized medicine and cancer diagnostics to agricultural genomics and pandemic surveillance, data science is at the heart of the bioinformatics revolution.</p><p><strong>Final Thoughts</strong><br />If you&rsquo;re a biologist who&rsquo;s curious about code, or a data enthusiast fascinated by life sciences, bioinformatics is your playground, and data science is your toolkit.</p><p>In bioinformatics, data science isn&rsquo;t just useful. It&rsquo;s essential.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/news/view/18738/surrogate-variable-analysis-sva</guid>
	<pubDate>Thu, 30 Oct 2014 08:01:58 -0500</pubDate>
	<link>https://bioinformaticsonline.com/news/view/18738/surrogate-variable-analysis-sva</link>
	<title><![CDATA[Surrogate Variable Analysis (SVA)]]></title>
	<description><![CDATA[<p>The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiments. Specifically, it contains functions for identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (such as gene expression, RNA sequencing, methylation, or brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways:</p><p>(1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics, 2008 PNAS),</p><p>(2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and</p><p>(3) removing batch effects with known control probes (Leek 2014 bioRxiv).</p><p>Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility (see Leek and Storey 2007 PLoS Genetics, 2008 PNAS, or Leek et al. 2011 Nat. Reviews Genetics).</p><p>More at http://www.bioconductor.org/packages/release/bioc/html/sva.html</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>