<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: June 2025]]></title>
	<link>https://bioinformaticsonline.com/blog/archive/abhinav/1748754000/1751346000?</link>
	<atom:link href="https://bioinformaticsonline.com/blog/archive/abhinav/1748754000/1751346000?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44858/p-value-fdr-q-score-what-do-they-mean-a-simple-guide-with-example</guid>
	<pubDate>Fri, 27 Jun 2025 03:26:38 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44858/p-value-fdr-q-score-what-do-they-mean-a-simple-guide-with-example</link>
	<title><![CDATA[P-Value, FDR, q-score: What Do They Mean? A Simple Guide with Example]]></title>
	<description><![CDATA[<p>In statistics and bioinformatics, you&rsquo;ll often see results reported with p-values, FDR, and q-values (q-scores). But what do these terms mean, and how are they different? Let&rsquo;s break them down with simple definitions and a step-by-step example.</p>
<p>1. What is a P-Value?<br />Definition: The p-value is the probability of observing a result at least as extreme as the one you got, assuming the null hypothesis is true.</p><p>Low p-value (e.g., p &lt; 0.05) &rarr; evidence against the null hypothesis.</p><p>High p-value &rarr; no strong evidence against the null (which is not the same as evidence that the null is true).</p><p>Key idea: it tells you how surprising your data would be if there were really no effect.</p>
<p>2. The Multiple Testing Problem<br />In bioinformatics, genomics, or any large-scale study, you often test thousands of hypotheses at once (e.g., one per gene). Even if there is no real signal anywhere, about 5% of the tests will have p &lt; 0.05 purely by chance.</p><p>Example:</p><p>Testing 10,000 genes</p><p>If every null hypothesis is true, expect about 10,000 &times; 0.05 = 500 genes with p &lt; 0.05 by chance alone</p><p>This is why we need multiple testing correction.</p>
<p>3. What is FDR (False Discovery Rate)?<br />Definition: FDR is the expected proportion of false positives among the results you declare significant.</p><p>The family-wise error rate (FWER) is the probability of making even one false positive; controlling it (e.g., with the Bonferroni correction) becomes very strict when thousands of tests are run. Controlling FDR instead tolerates a small proportion of false discoveries in exchange for more power to detect real effects.</p><p>The Benjamini&ndash;Hochberg (BH) procedure is the most popular method to control FDR.</p>
<p>4. What is a q-value (or q-score)?<br />Definition: The q-value of a test is the minimum FDR at which that test would be called significant.</p><p>A p-value tells you how surprising your result is.</p><p>A q-value tells you what proportion of false positives to expect among all results at least as significant as this one, if you set your significance cutoff at this result.</p><p>You can think of the q-value as the FDR-adjusted p-value.</p>
<p>5. Example: Step-by-Step<br />Let&rsquo;s work through an example with 10 tests.</p><table><thead><tr><th>Test</th><th>Raw p-value</th></tr></thead><tbody><tr><td>1</td><td>0.001</td></tr><tr><td>2</td><td>0.004</td></tr><tr><td>3</td><td>0.010</td></tr><tr><td>4</td><td>0.020</td></tr><tr><td>5</td><td>0.030</td></tr><tr><td>6</td><td>0.040</td></tr><tr><td>7</td><td>0.050</td></tr><tr><td>8</td><td>0.060</td></tr><tr><td>9</td><td>0.070</td></tr><tr><td>10</td><td>0.080</td></tr></tbody></table><p>Goal: Control FDR at 5% (Q = 0.05).</p>
<p>Step 1: Rank p-values<br />Rank the p-values from lowest to highest. In this example they are already in ascending order, so rank i is simply test i.</p>
<p>Step 2: Apply the Benjamini&ndash;Hochberg threshold<br />For each rank i, compute:</p><p>BH critical value = (i / m) &times; Q<br />m = 10 tests<br />Q = 0.05</p><table><thead><tr><th>Rank</th><th>p-value</th><th>BH critical value</th></tr></thead><tbody><tr><td>1</td><td>0.001</td><td>0.005</td></tr><tr><td>2</td><td>0.004</td><td>0.010</td></tr><tr><td>3</td><td>0.010</td><td>0.015</td></tr><tr><td>4</td><td>0.020</td><td>0.020</td></tr><tr><td>5</td><td>0.030</td><td>0.025</td></tr><tr><td>6</td><td>0.040</td><td>0.030</td></tr><tr><td>7</td><td>0.050</td><td>0.035</td></tr><tr><td>8</td><td>0.060</td><td>0.040</td></tr><tr><td>9</td><td>0.070</td><td>0.045</td></tr><tr><td>10</td><td>0.080</td><td>0.050</td></tr></tbody></table>
<p>Find the largest rank k whose p-value is &le; its critical value:</p><p>p(4) = 0.020 &le; 0.020 (true)</p><p>p(5) = 0.030 &gt; 0.025 (false), and every p-value from rank 5 to rank 10 is above its critical value.</p><p>So k = 4, and BH declares all tests ranked 1 to k significant.</p><p>Result: We can declare the top 4 tests significant at FDR 5%.</p>
<p>Step 3: Computing q-values (conceptually)<br />The q-value for each p-value is the minimum FDR at which it would be significant. For BH, the adjusted p-value of the test at rank i is the smallest value of p(j) &times; m / j over all ranks j &ge; i (capped at 1). This is what R&rsquo;s <code>p.adjust(p, method = "BH")</code> returns, and many tools report it as the FDR or q-value. Storey&rsquo;s q-values (R&rsquo;s qvalue package) also estimate the proportion of true null hypotheses, so they can come out somewhat smaller.</p><p>In our example, using BH-adjusted values:</p><p>Tests 1&ndash;4 have q-values &le; 0.05 (test 4: 0.020 &times; 10 / 4 = 0.05)</p><p>Tests 5&ndash;10 have q-values &gt; 0.05 (test 5: 0.030 &times; 10 / 5 = 0.06)</p><p>The q-value gives you an adjusted p-value that accounts for multiple testing.</p>
<p>6. In Bioinformatics Workflows<br />You see these all the time:</p><p>RNA-seq differential expression &rarr; results typically report both raw p-values and FDR/q-values</p><p>ChIP-seq peak calling</p><p>Genome-wide association studies (GWAS)</p><p>Proteomics, metabolomics</p><p>Always check whether results are corrected for multiple testing. 
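</p><p>As a cross-check on the worked example above, here is a minimal sketch in plain Python (standard library only) of BH significance calls, BH-adjusted p-values, and the 10,000-test null scenario from section 2. The function names <code>bh_significant</code>, <code>bh_adjust</code>, and <code>count_null_hits</code> are hypothetical, chosen for illustration rather than taken from any package. <code>bh_adjust</code> computes BH-adjusted p-values (the quantity R&rsquo;s <code>p.adjust(p, method = "BH")</code> reports), not Storey&rsquo;s q-values. The simulation assumes true-null p-values are uniform on (0, 1), which holds for continuous test statistics.</p>
<pre><code class="language-python">
```python
import random


def bh_significant(pvalues, q=0.05):
    """Indices (0-based) of tests called significant by Benjamini-Hochberg at FDR q."""
    m = len(pvalues)
    # Rank tests by p-value: order[0] is the test with the smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # BH critical value for this rank: (rank / m) * q.
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # remember the LARGEST rank that passes
    # All tests ranked 1..k_max are significant, even if one of them
    # individually sits above its own critical value.
    return sorted(order[:k_max])


def bh_adjust(pvalues):
    """BH-adjusted p-values: min over ranks j >= i of p_(j) * m / j, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the result stays monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_null_hits(n_tests=10_000, alpha=0.05, seed=0):
    """Simulate n_tests true nulls (p ~ Uniform(0, 1)) and count p < alpha."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_tests))


if __name__ == "__main__":
    pvals = [0.001, 0.004, 0.010, 0.020, 0.030,
             0.040, 0.050, 0.060, 0.070, 0.080]
    print("significant tests:", [i + 1 for i in bh_significant(pvals)])  # [1, 2, 3, 4]
    print("BH-adjusted:", [round(a, 4) for a in bh_adjust(pvals)])
    print("null tests with p < 0.05:", count_null_hits())  # roughly 500
```
</code></pre>
<p>On the ten example p-values, this sketch should call tests 1 to 4 significant at Q = 0.05, matching step 2. The null simulation should land near the ~500 chance hits predicted in section 2. The running minimum in <code>bh_adjust</code> keeps the adjusted values in the same order as the raw p-values.</p><p>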
Reporting raw p-values alone can be misleading.</p><p>Summary</p><table><thead><tr><th>Term</th><th>Meaning</th><th>Interpretation</th></tr></thead><tbody><tr><td>p-value</td><td>Probability under the null</td><td>Small p &rarr; evidence against the null</td></tr><tr><td>FDR</td><td>False Discovery Rate</td><td>Expected proportion of false positives among significant calls</td></tr><tr><td>q-value</td><td>FDR-adjusted p-value</td><td>Minimum FDR threshold at which the result is significant</td></tr></tbody></table><p>Final Tip<br />Always correct for multiple testing! Otherwise, your beautiful "significant" results might just be noise.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44852/what-is-data-science-%E2%80%94-a-bioinformatics-perspective</guid>
	<pubDate>Mon, 16 Jun 2025 01:44:34 -0500</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44852/what-is-data-science-%E2%80%94-a-bioinformatics-perspective</link>
	<title><![CDATA[What is Data Science? — A Bioinformatics Perspective]]></title>
	<description><![CDATA[<p>In today&rsquo;s era of big biology, we&rsquo;re generating more data than ever before&mdash;genomes, transcriptomes, proteomes, metabolomes, microbiomes&hellip; you name it. But raw biological data doesn&rsquo;t speak for itself. Making sense of it requires more than traditional biology. This is where data science steps in.</p><p><strong>So, What Is Data Science?</strong><br />At its core, data science is the interdisciplinary field that extracts knowledge and insights from data using programming, statistics, and domain expertise. In bioinformatics, data science enables us to turn gigabytes of sequence data into biological meaning.</p><p>Imagine trying to understand gene regulation in cancer by analyzing thousands of RNA-seq samples, or predicting antibiotic resistance from bacterial genomes&mdash;these challenges are not solvable through wet lab experiments alone. They require data-driven thinking.</p><p><strong>Data Science Meets Bioinformatics</strong><br />Bioinformatics is inherently a data science domain. 
From genomics to systems biology, every field in modern biology relies on data science techniques to:</p><p>Clean and process massive datasets</p><p>Discover patterns in high-dimensional data</p><p>Build predictive models (e.g., for disease classification)</p><p>Visualize complex biological networks and trends</p><p>Integrate diverse data types (e.g., transcriptomic + epigenomic data)</p><p><strong>The Bioinformatics Toolkit</strong><br />Here&rsquo;s what data science typically looks like in bioinformatics:</p><table><thead><tr><th>Task</th><th>Data Science Role</th></tr></thead><tbody><tr><td>Sequence alignment</td><td>Efficient algorithms, indexing, parallel processing</td></tr><tr><td>Gene expression analysis</td><td>Statistical modeling (e.g., DESeq2, limma)</td></tr><tr><td>Variant calling</td><td>Data filtering, probabilistic models</td></tr><tr><td>Clustering of cells in single-cell data</td><td>Unsupervised learning</td></tr><tr><td>Protein structure prediction</td><td>Deep learning models (e.g., AlphaFold)</td></tr><tr><td>Metagenomics</td><td>Data integration, classification, dimensionality reduction</td></tr></tbody></table><p>Common tools include Python, R, Bioconductor, scikit-learn, pandas, Seurat, and TensorFlow, often working together in reproducible workflows.</p><p><strong>It&rsquo;s Not Just About Coding</strong><br />A common misconception is that bioinformatics is just programming or scripting. 
But being a data scientist in bioinformatics also means:</p><p>Understanding experimental design</p><p>Asking biologically meaningful questions</p><p>Choosing the right statistical or machine learning models</p><p>Communicating findings effectively (e.g., plots, dashboards, papers)</p><p>In other words, data science in bioinformatics is where biology, statistics, and computer science converge.</p><p><strong>Why It Matters</strong><br />The real power of data science in bioinformatics is its ability to scale discovery.</p><p>Instead of studying one gene, we can study thousands.</p><p>Instead of analyzing one species, we can explore entire ecosystems.</p><p>Instead of waiting months for lab results, we can generate hypotheses in days.</p><p>From personalized medicine and cancer diagnostics to agricultural genomics and pandemic surveillance, data science is at the heart of the bioinformatics revolution.</p><p><strong>Final Thoughts</strong><br />If you&rsquo;re a biologist who&rsquo;s curious about code, or a data enthusiast fascinated by life sciences, bioinformatics is your playground&mdash;and data science is your toolkit.</p><p>In bioinformatics, data science isn&rsquo;t just useful. It&rsquo;s essential.</p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>

</channel>
</rss>