Analyzing Results¶

Once your test has collected enough data, it's time to analyze the results.

Basic Analysis¶

Conversion Rate Test¶

from expstats import conversion

result = conversion.analyze(
    control_visitors=10000,
    control_conversions=500,      # 5.0%
    variant_visitors=10000,
    variant_conversions=600,      # 6.0%
)

# Key metrics
print(f"Control rate: {result.control_rate:.2%}")
print(f"Variant rate: {result.variant_rate:.2%}")
print(f"Lift: {result.lift_percent:+.1f}%")
print(f"P-value: {result.p_value:.4f}")
print(f"Significant: {result.is_significant}")
print(f"Winner: {result.winner}")

Revenue Test¶

from expstats import magnitude

result = magnitude.analyze(
    control_visitors=5000,
    control_mean=50.00,
    control_std=25.00,
    variant_visitors=5000,
    variant_mean=52.50,
    variant_std=25.00,
)

print(f"Lift: ${result.lift_absolute:+.2f} ({result.lift_percent:+.1f}%)")
print(f"Winner: {result.winner}")

Understanding the Results¶

Is It Significant?¶

Check result.is_significant:

if result.is_significant:
    print(f"Winner: {result.winner}")
else:
    print("No winner yet - need more data")

What 'significant' means

Statistical significance means the observed difference is unlikely to be due to random chance. It does NOT mean the difference is large or important.

Reading the P-Value¶

The p-value tells you the probability of seeing your results if there was no real difference:

P-value	Interpretation
< 0.01	Very strong evidence
0.01 - 0.05	Strong evidence (significant at 95% confidence)
0.05 - 0.10	Weak evidence
> 0.10	Not enough evidence

Confidence Intervals¶

The confidence interval tells you the likely range of the true effect:

print(f"95% CI: [{result.confidence_interval_lower:.4f}, {result.confidence_interval_upper:.4f}]")

If the CI doesn't include zero, the result is significant.

The Recommendation¶

Every result includes a plain-English recommendation:

print(result.recommendation)

Example output:

**Test variant is significantly higher than control** (p-value: 0.0003).

_What this means:_ With 95% confidence, the difference between variant (6.00%)
and control (5.00%) is statistically real, not due to random chance. A p-value
of 0.0003 means there's only a 0.03% probability this result occurred by chance.

Generating Reports¶

Create shareable reports for stakeholders:

report = conversion.summarize(
    result,
    test_name="Homepage CTA Test"
)
print(report)

For revenue tests, customize the metric name and currency:

report = magnitude.summarize(
    result,
    test_name="Checkout Flow Test",
    metric_name="Average Order Value",
    currency="€"
)

Common Scenarios¶

Scenario 1: Clear Winner¶

result.is_significant  # True
result.winner          # "variant"
result.lift_percent    # +20.0%

Action: Implement the variant!

Scenario 2: No Significant Difference¶

result.is_significant  # False
result.p_value         # 0.35

Action: Either: - Run the test longer to collect more data - Accept that the variants are equivalent - The effect may be smaller than you designed for

Scenario 3: Significant, But Small Effect¶

result.is_significant  # True
result.lift_percent    # +1.2%

Action: Consider if a 1.2% improvement is worth the implementation effort.

Scenario 4: Control Wins¶

result.is_significant  # True
result.winner          # "control"
result.lift_percent    # -15.0%

Action: Don't implement the variant—it hurts performance!

Best Practices¶

Don't peek - Decide your sample size before starting and stick to it
Run full weeks - Capture day-of-week patterns
Look at confidence intervals - They tell you the range of possible effects
Consider business impact - Statistical significance ≠ business significance