In statistics and bioinformatics, you’ll often see results reported with p-values, FDR, and q-values (q-scores). But what do these terms mean, and how are they different? Let’s break them down with simple definitions and a step-by-step example.
1. What is a p-value?
Definition: The p-value is the probability of observing a result at least as extreme as the one you got, assuming the null hypothesis is true.
Low p-value (e.g., p < 0.05) → evidence against the null hypothesis.
High p-value → no strong evidence against the null.
Key idea: It tells you how surprising your data is if there’s really no effect.
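To make the definition concrete, here is a minimal sketch that computes a one-sided p-value by hand. The coin-flip experiment (16 heads in 20 flips, with "the coin is fair" as the null hypothesis) and the function name binomial_upper_tail are made up for illustration; they are not part of the example used later in this article.

```python
from math import comb

def binomial_upper_tail(n: int, k: int, p_null: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p_null), i.e. a one-sided p-value."""
    return sum(
        comb(n, j) * p_null**j * (1 - p_null) ** (n - j)
        for j in range(k, n + 1)
    )

# Hypothetical experiment: 16 heads in 20 flips; null hypothesis: the coin is fair.
p_value = binomial_upper_tail(n=20, k=16)
print(f"one-sided p-value = {p_value:.4f}")  # 0.0059
```

"At least as extreme" here means 16 or more heads, so the p-value adds up the null probabilities of 16, 17, 18, 19, and 20 heads.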
2. The Multiple Testing Problem
In bioinformatics, genomics, or any large-scale study, you test thousands of hypotheses (e.g., thousands of genes). Even if there’s no real signal, some tests will have p < 0.05 just by chance.
Example:
Testing 10,000 genes
Even if every null hypothesis is true, you expect about 10,000 × 0.05 = 500 genes with p < 0.05 by chance alone
This is why we need multiple testing correction.
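The 500-gene figure can be checked with a small simulation. This is only a sketch: it assumes that p-values under a true null are uniformly distributed between 0 and 1 (which holds for continuous test statistics) and that the tests are independent. The seed and variable names are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

m = 10_000      # number of tests (genes)
alpha = 0.05    # per-test significance threshold

# Assumption: every null hypothesis is true, so each p-value is uniform on [0, 1].
null_p_values = [random.random() for _ in range(m)]
false_positives = sum(p < alpha for p in null_p_values)

print(f"expected by chance: {m * alpha:.0f}")
print(f"observed in this simulation: {false_positives}")
```

The observed count varies from run to run but stays close to 500, even though no gene has a real effect.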
3. What is FDR (False Discovery Rate)?
Definition: FDR is the expected proportion of false positives among the results you declare significant.
Unlike the family-wise error rate (FWER), which controls the probability of making even a single false positive, FDR lets you tolerate some false discoveries to gain power.
The Benjamini–Hochberg (BH) procedure is the most widely used method for controlling FDR.
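In symbols, the two error rates are commonly written as below. The notation is introduced here for illustration and does not come from the text above: V is the number of false positives among the tests you call significant, and R is the total number of tests you call significant.

```latex
\mathrm{FWER} = \Pr(V \ge 1),
\qquad
\mathrm{FDR} = \mathbb{E}\!\left[ \frac{V}{\max(R,\, 1)} \right]
```

The max(R, 1) in the denominator simply defines the proportion as 0 when nothing is called significant.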
4. What is a q-value (or q-score)?
Definition: The q-value of a test is the minimum FDR at which that test would be called significant.
A p-value tells you how surprising your result is.
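A q-value tells you what fraction of your discoveries you should expect to be false positives if you call this result, and every result with a smaller p-value, significant.
You can think of the q-value as an FDR-adjusted p-value; BH-adjusted p-values are often reported under that name.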
5. Example: Step-by-Step
Let’s work through an example with 10 tests.
Test Raw p-value
1 0.001
2 0.004
3 0.010
4 0.020
5 0.030
6 0.040
7 0.050
8 0.060
9 0.070
10 0.080
Goal: Control FDR at 5%.
Step 1: Rank p-values
Rank from lowest to highest:
Rank p-value
1 0.001
2 0.004
3 0.010
4 0.020
5 0.030
6 0.040
7 0.050
8 0.060
9 0.070
10 0.080
Step 2: Apply Benjamini–Hochberg threshold
For each rank i, compute the BH critical value:
BH critical value = (i / m) × Q
where:
i = rank of the p-value (1 = smallest)
m = 10 (total number of tests)
Q = 0.05 (target FDR)
Rank p-value BH critical value
1 0.001 0.005
2 0.004 0.010
3 0.010 0.015
4 0.020 0.020
5 0.030 0.025
6 0.040 0.030
7 0.050 0.035
8 0.060 0.040
9 0.070 0.045
10 0.080 0.050
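Find the largest rank k whose p-value is at or below its critical value:
p(4) = 0.020 ≤ 0.020 → passes
p(5) = 0.030 > 0.025 → fails
Ranks 6–10 also exceed their critical values, so k = 4.
Result: We declare the four tests with the smallest p-values (ranks 1–4) significant at an FDR of 5%.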
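Steps 1 and 2 can be sketched in a few lines of Python. This is an illustrative implementation, not code from any particular package. The function name benjamini_hochberg is made up, and the values are held as exact fractions so that the borderline comparison at rank 4 (0.020 against 0.020) is not affected by floating-point rounding.

```python
from fractions import Fraction

def benjamini_hochberg(p_values, fdr_level):
    """Return the 0-based indices of the tests rejected by the BH step-up procedure."""
    m = len(p_values)
    # Step 1: order the test indices by ascending p-value.
    order = sorted(range(m), key=lambda idx: p_values[idx])
    # Step 2: find the largest rank k with p(k) <= (k / m) * Q.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= Fraction(rank, m) * fdr_level:
            largest_k = rank
    # Reject every hypothesis ranked 1..k, even one that missed its own cutoff.
    return sorted(order[:largest_k])

raw = ["0.001", "0.004", "0.010", "0.020", "0.030",
       "0.040", "0.050", "0.060", "0.070", "0.080"]
p_values = [Fraction(p) for p in raw]

rejected = benjamini_hochberg(p_values, Fraction("0.05"))
print([i + 1 for i in rejected])  # 1-based test numbers: [1, 2, 3, 4]
```

The loop scans all ranks rather than stopping at the first failure, because BH is a step-up procedure: a p-value that misses its own critical value is still declared significant if a higher-ranked one passes.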
Step 3: Computing q-values (conceptually)
The q-value for each p-value is roughly the minimum FDR at which it would be significant. Specialized software (e.g., R’s qvalue package) can estimate them.
In our example:
Tests 1–4 would have q-values ≤ 0.05
Tests 5–10 would have q-values > 0.05
The q-value gives you an adjusted p-value that accounts for multiple testing.
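For a rough picture of where those numbers come from, the sketch below computes BH-adjusted p-values, which many tools report as q-values. It is not the estimator used by R's qvalue package: that package also estimates the proportion of true null hypotheses, so its q-values can be smaller. The function name bh_adjusted is made up for illustration.

```python
from fractions import Fraction

def bh_adjusted(p_values):
    """BH-adjusted p-values, returned in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [None] * m
    running_min = Fraction(1)  # adjusted values are capped at 1
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

raw = ["0.001", "0.004", "0.010", "0.020", "0.030",
       "0.040", "0.050", "0.060", "0.070", "0.080"]
q_values = bh_adjusted([Fraction(p) for p in raw])

for test_number, q in enumerate(q_values, start=1):
    print(f"test {test_number}: adjusted p = {float(q):.4f}")
```

For the ten example tests this gives 0.0100, 0.0200, 0.0333, and 0.0500 for tests 1–4, and values from 0.0600 up to 0.0800 for tests 5–10, which matches the split at 5% above.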
6. In Bioinformatics Workflows
You see these all the time:
RNA-seq differential expression → Report p-values, FDR/q-values
ChIP-seq peak calling
Genome-wide association studies (GWAS)
Proteomics, metabolomics
Always check if results are corrected for multiple testing. Reporting raw p-values alone can be misleading.
Summary
Term | Meaning | Interpretation
p-value | Probability under null | Small p → evidence against null
FDR | False Discovery Rate | Expected proportion of false positives among calls
q-value | FDR-adjusted p-value | Minimum FDR threshold where result is significant
Final Tip
Always correct for multiple testing! Otherwise, your beautiful "significant" results might just be noise.