Confidence Intervals¶
Confidence intervals tell you the range where your true metric likely falls. They're often more useful than just knowing if a result is "significant."
Why Confidence Intervals Matter¶
A p-value only tells you "is there a difference?"
A confidence interval tells you "how big is the difference likely to be?"
Example
Two tests both have p < 0.05 (significant):
- Test A: Lift = +5%, CI = [+0.1%, +9.9%] - Could be tiny or substantial
- Test B: Lift = +5%, CI = [+4.2%, +5.8%] - Reliably around 5%
The confidence interval gives you much more information!
Single Rate Confidence Interval¶
Get the range for a single conversion rate:
from expstats import conversion
ci = conversion.confidence_interval(
    visitors=1000,
    conversions=50,
    confidence=95,
)
print(f"Rate: {ci.rate:.2%}")
print(f"95% CI: [{ci.lower:.2%}, {ci.upper:.2%}]")
print(f"Margin of error: ±{ci.margin_of_error:.2%}")
Single Mean Confidence Interval¶
Get the range for a numeric metric:
from expstats import magnitude
ci = magnitude.confidence_interval(
    visitors=1000,
    mean=50.00,
    std=25.00,
    confidence=95,
)
print(f"Mean: ${ci.mean:.2f}")
print(f"95% CI: [${ci.lower:.2f}, ${ci.upper:.2f}]")
Confidence Intervals in Test Results¶
Every test result includes confidence intervals for the lift:
result = conversion.analyze(...)
print(f"Lift: {result.lift_absolute:.4f}")
print(f"CI: [{result.confidence_interval_lower:.4f}, {result.confidence_interval_upper:.4f}]")
Interpreting the CI¶
| CI Range | Interpretation |
|---|---|
| Both bounds positive | Variant is better (significant positive effect) |
| Both bounds negative | Control is better (significant negative effect) |
| Spans zero | No significant difference |
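This mapping is easy to turn into code. A minimal sketch, reusing the result fields from the previous section (interpret_ci is a hypothetical helper, not part of expstats):

def interpret_ci(lower, upper):
    """Classify a lift CI by whether it excludes zero."""
    if lower > 0:
        return "Variant is better (significant positive effect)"
    if upper < 0:
        return "Control is better (significant negative effect)"
    return "No significant difference (CI spans zero)"

print(interpret_ci(result.confidence_interval_lower,
                   result.confidence_interval_upper))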
Effect of Sample Size¶
Larger samples give narrower (more precise) confidence intervals. The margin of error shrinks roughly in proportion to 1/√n, so 100× the sample gives an interval about 10× narrower:
# Small sample
ci_small = conversion.confidence_interval(visitors=100, conversions=5)
print(f"n=100: [{ci_small.lower:.2%}, {ci_small.upper:.2%}]")
# Large sample
ci_large = conversion.confidence_interval(visitors=10000, conversions=500)
print(f"n=10000: [{ci_large.lower:.2%}, {ci_large.upper:.2%}]")
Choosing Confidence Level¶
Common choices:
| Level | Z-score | Use When |
|---|---|---|
| 90% | 1.645 | Exploratory analysis |
| 95% | 1.96 | Standard for most tests |
| 99% | 2.576 | High-stakes decisions |
ci_90 = conversion.confidence_interval(visitors=1000, conversions=50, confidence=90)
ci_95 = conversion.confidence_interval(visitors=1000, conversions=50, confidence=95)
ci_99 = conversion.confidence_interval(visitors=1000, conversions=50, confidence=99)
print(f"90% CI: [{ci_90.lower:.2%}, {ci_90.upper:.2%}]")
print(f"95% CI: [{ci_95.lower:.2%}, {ci_95.upper:.2%}]")
print(f"99% CI: [{ci_99.lower:.2%}, {ci_99.upper:.2%}]")
Higher confidence = wider interval.
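The z-scores in the table are the standard normal quantiles for each two-sided level; you can reproduce them with scipy (not part of expstats):

from scipy.stats import norm

for level in (90, 95, 99):
    # Two-sided interval: split the leftover probability between the two tails.
    z = norm.ppf(0.5 + level / 200)
    print(f"{level}%: z = {z:.3f}")  # 1.645, 1.960, 2.576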
Methods Used¶
For Conversion Rates¶
expstats uses the Wilson score interval, which is more accurate than the normal approximation for:
- Small samples
- Rates near 0% or 100%
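For reference, here is a minimal standalone implementation of the uncorrected Wilson score interval; expstats's internals may differ in details such as continuity correction:

import math

def wilson_interval(conversions, visitors, z=1.96):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    p = conversions / visitors
    denom = 1 + z**2 / visitors
    center = (p + z**2 / (2 * visitors)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / visitors
                                         + z**2 / (4 * visitors**2))
    return center - half_width, center + half_width

lo, hi = wilson_interval(50, 1000)
print(f"[{lo:.2%}, {hi:.2%}]")  # roughly [3.81%, 6.53%]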
For Numeric Metrics¶
expstats uses the t-distribution, which accounts for uncertainty in the standard deviation estimate.
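A standalone sketch of the same idea using scipy's t-distribution (not expstats's internal code; inputs match the earlier numeric example):

from scipy.stats import t

def mean_ci(mean, std, n, confidence=0.95):
    """t-distribution confidence interval for a sample mean."""
    se = std / n ** 0.5                            # standard error of the mean
    t_crit = t.ppf(0.5 + confidence / 2, df=n - 1)
    return mean - t_crit * se, mean + t_crit * se

lo, hi = mean_ci(mean=50.0, std=25.0, n=1000)
print(f"[${lo:.2f}, ${hi:.2f}]")  # roughly [$48.45, $51.55]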
Best Practices¶
- Always look at CIs - Not just p-values
- Consider practical significance - Is the lower bound of the CI large enough to matter?
- Use adequate sample sizes - Small samples produce wide CIs and weak conclusions
- Report CIs in presentations - They're more informative than p-values alone