Home

Measure what changed in user behavior, not just which test to run.

Outcome Effects¶

expstats models experimental impact across three fundamental outcome dimensions:

Effect Type	Question Answered	Examples
Conversion	Whether something happens	Signup, purchase, click, trial start
Magnitude	How much it happens	Revenue, time spent, order value
Timing	When it happens	Time to purchase, time to churn

This framework ensures experiments are interpreted in terms of behavioral change, not just statistical tests.

Installation¶

pip install expstats

Quick Start¶

from expstats import conversion, magnitude, timing

# Conversion: Did the treatment change whether users purchase?
result = conversion.analyze(
    control_visitors=10000,
    control_conversions=500,
    variant_visitors=10000,
    variant_conversions=600,
)
print(f"Conversion lift: {result.lift_percent:+.1f}%")

# Magnitude: Did the treatment change how much users spend?
result = magnitude.analyze(
    control_visitors=5000,
    control_mean=50.00,
    control_std=25.00,
    variant_visitors=5000,
    variant_mean=52.50,
    variant_std=25.00,
)
print(f"Revenue lift: ${result.lift_absolute:+.2f}")

# Timing: Did the treatment change when users convert?
result = timing.analyze(
    control_times=[5, 8, 12, 15, 20],
    control_events=[1, 1, 1, 0, 1],
    treatment_times=[3, 6, 9, 12, 16],
    treatment_events=[1, 1, 1, 1, 1],
)
print(f"Hazard ratio: {result.hazard_ratio:.2f}")

Or use the fully-qualified path:

from expstats.effects.outcome import conversion, magnitude, timing

📊 Conversion Effects — Whether it happens¶

Use when your outcome is binary: did the user convert or not?

Analyze a Test¶

from expstats import conversion

result = conversion.analyze(
    control_visitors=10000,
    control_conversions=500,      # 5.0% conversion
    variant_visitors=10000,
    variant_conversions=600,      # 6.0% conversion
)

print(f"Control: {result.control_rate:.2%}")
print(f"Variant: {result.variant_rate:.2%}")
print(f"Lift: {result.lift_percent:+.1f}%")
print(f"Significant: {result.is_significant}")
print(f"Winner: {result.winner}")

Calculate Sample Size¶

plan = conversion.sample_size(
    current_rate=5,       # 5% baseline
    lift_percent=10,      # detect 10% relative lift
    confidence=95,
    power=80,
)

print(f"Need {plan.visitors_per_variant:,} per variant")
plan.with_daily_traffic(10000)
print(f"Duration: {plan.test_duration_days} days")

Multi-Variant Tests (Chi-Square)¶

result = conversion.analyze_multi(
    variants=[
        {"name": "control", "visitors": 10000, "conversions": 500},
        {"name": "variant_a", "visitors": 10000, "conversions": 550},
        {"name": "variant_b", "visitors": 10000, "conversions": 600},
    ]
)

print(f"Best: {result.best_variant}")
print(f"P-value: {result.p_value:.4f}")

Difference-in-Differences¶

result = conversion.diff_in_diff(
    control_pre_visitors=5000, control_pre_conversions=250,
    control_post_visitors=5000, control_post_conversions=275,
    treatment_pre_visitors=5000, treatment_pre_conversions=250,
    treatment_post_visitors=5000, treatment_post_conversions=350,
)

print(f"DiD effect: {result.diff_in_diff:+.2%}")

📈 Magnitude Effects — How much it happens¶

Use when your outcome is a continuous value: revenue, time, count.

Analyze a Test¶

from expstats import magnitude

result = magnitude.analyze(
    control_visitors=5000,
    control_mean=50.00,
    control_std=25.00,
    variant_visitors=5000,
    variant_mean=52.50,
    variant_std=25.00,
)

print(f"Control: ${result.control_mean:.2f}")
print(f"Variant: ${result.variant_mean:.2f}")
print(f"Lift: ${result.lift_absolute:+.2f} ({result.lift_percent:+.1f}%)")
print(f"Significant: {result.is_significant}")

Calculate Sample Size¶

plan = magnitude.sample_size(
    current_mean=50,      # $50 average
    current_std=25,       # $25 std dev
    lift_percent=5,       # detect 5% lift
)

print(f"Need {plan.visitors_per_variant:,} per variant")

Multi-Variant Tests (ANOVA)¶

result = magnitude.analyze_multi(
    variants=[
        {"name": "control", "visitors": 1000, "mean": 50, "std": 25},
        {"name": "new_layout", "visitors": 1000, "mean": 52, "std": 25},
        {"name": "premium_upsell", "visitors": 1000, "mean": 55, "std": 25},
    ]
)

print(f"Best: {result.best_variant}")
print(f"F-statistic: {result.f_statistic:.2f}")

Difference-in-Differences¶

result = magnitude.diff_in_diff(
    control_pre_n=1000, control_pre_mean=50, control_pre_std=25,
    control_post_n=1000, control_post_mean=51, control_post_std=25,
    treatment_pre_n=1000, treatment_pre_mean=50, treatment_pre_std=25,
    treatment_post_n=1000, treatment_post_mean=55, treatment_post_std=26,
)

print(f"DiD effect: ${result.diff_in_diff:+.2f}")

⏱️ Timing Effects — When it happens¶

Use when you care about time-to-event: time to purchase, time to churn, event rates.

Survival Analysis¶

from expstats import timing

result = timing.analyze(
    control_times=[5, 8, 12, 15, 18, 22, 25, 30],
    control_events=[1, 1, 1, 0, 1, 1, 0, 1],      # 1=event, 0=censored
    treatment_times=[3, 6, 9, 12, 14, 16, 20, 24],
    treatment_events=[1, 1, 1, 1, 0, 1, 1, 1],
)

print(f"Control median time: {result.control_median_time}")
print(f"Treatment median time: {result.treatment_median_time}")
print(f"Hazard ratio: {result.hazard_ratio:.3f}")
print(f"Significant: {result.is_significant}")

Kaplan-Meier Survival Curves¶

curve = timing.survival_curve(
    times=[5, 10, 15, 20, 25, 30],
    events=[1, 1, 0, 1, 1, 0],
    confidence=95,
)

print(f"Median survival time: {curve.median_time}")
print(f"Survival probabilities: {curve.survival_probabilities}")

Event Rate Analysis (Poisson)¶

result = timing.analyze_rates(
    control_events=45,
    control_exposure=100,      # 100 days of observation
    treatment_events=38,
    treatment_exposure=100,
)

print(f"Rate ratio: {result.rate_ratio:.3f}")
print(f"Significant: {result.is_significant}")

📋 Generate Stakeholder Reports¶

Every effect type includes summarize() for markdown reports:

result = conversion.analyze(...)
report = conversion.summarize(result, test_name="Signup Button Test")
print(report)

🌐 Web Interface¶

expstats includes a web UI for interactive analysis:

expstats-server
# Open http://localhost:8000

Features include:

Sample Size Calculator — Plan tests with intuitive parameter explanations
A/B Test Results — Analyze 2-variant and multi-variant tests
Timing & Rates — Survival analysis and Poisson rate comparisons
Diff-in-Diff — Quasi-experimental causal inference
Confidence Intervals — Estimate precision of your metrics

Why "Outcome Effects"?¶

Traditional A/B testing tools are test-centric: "Which statistical test should I use?"

expstats is effect-centric: "What changed about user behavior?"

This means:

Matches how stakeholders think — "Did conversion increase?" not "Did we reject the null hypothesis?"
Avoids false equivalence — A conversion effect and a magnitude effect are different things
Generalizes naturally — Timing, variance, and durability effects fit cleanly

Best Practices¶

Decide sample size BEFORE starting — Don't peek and stop early
Run for at least 1-2 weeks — Capture weekly patterns
Look at confidence intervals — Not just p-values
Statistical significance ≠ business significance — A 0.1% lift might be "significant" but not worth it
Use Bonferroni correction — For multi-variant tests

License¶

MIT License

Credits¶

Inspired by Evan Miller's A/B Testing Tools.