magnitude

Magnitude Effects

How much it happens

The magnitude module provides tools for analyzing experiments where the outcome is a continuous value: revenue, time spent, order value, number of actions. Use this when you care about the size of the outcome, not just whether it happened.

Overview

| Function | Purpose |
| --- | --- |
| `sample_size()` | Calculate required sample size for a test |
| `analyze()` | Analyze a 2-variant A/B test |
| `analyze_multi()` | Analyze a multi-variant test (3+ variants) |
| `diff_in_diff()` | Difference-in-Differences analysis |
| `confidence_interval()` | Calculate confidence interval for a mean |
| `summarize()` | Generate stakeholder report for 2-variant test |
| `summarize_multi()` | Generate stakeholder report for multi-variant test |
| `summarize_diff_in_diff()` | Generate stakeholder report for DiD |
| `summarize_plan()` | Generate stakeholder report for sample size plan |

sample_size

Calculate the required sample size to detect a given lift in a numeric metric.

def sample_size(
    current_mean: float,
    current_std: float,
    lift_percent: float = 5,
    confidence: int = 95,
    power: int = 80,
    num_variants: int = 2,
) -> SampleSizePlan

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `current_mean` | float | required | Current mean value of the metric |
| `current_std` | float | required | Standard deviation of the metric |
| `lift_percent` | float | 5 | Minimum relative lift to detect (e.g., 5 = 5% improvement) |
| `confidence` | int | 95 | Confidence level (e.g., 95 for 95% confidence) |
| `power` | int | 80 | Statistical power (e.g., 80 for 80% power) |
| `num_variants` | int | 2 | Number of variants including control |

Returns

SampleSizePlan with attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| `visitors_per_variant` | int | Required visitors per variant |
| `total_visitors` | int | Total visitors needed across all variants |
| `current_mean` | float | Current mean value |
| `expected_mean` | float | Expected variant mean if lift is achieved |
| `standard_deviation` | float | Standard deviation used |
| `lift_percent` | float | Target lift percentage |
| `confidence` | int | Confidence level |
| `power` | int | Statistical power |
| `test_duration_days` | Optional[int] | Estimated test duration (set via `with_daily_traffic()`) |

Methods

with_daily_traffic(daily_visitors: int) -> SampleSizePlan

Set daily traffic to calculate estimated test duration.

Example

from expstats import magnitude

plan = magnitude.sample_size(
    current_mean=50,      # $50 average order value
    current_std=25,       # $25 standard deviation
    lift_percent=5,       # detect 5% relative lift
    confidence=95,
    power=80,
)

print(f"Need {plan.visitors_per_variant:,} per variant")
print(f"Total: {plan.total_visitors:,}")

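# Calculate duration; reassign so the code works whether the method
# updates the plan in place or returns a new SampleSizePlan
plan = plan.with_daily_traffic(5000)
print(f"Duration: {plan.test_duration_days} days")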
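This page does not state which formula `sample_size()` uses. As a rough cross-check, the sketch below applies the common textbook formula for a two-sided, two-sample comparison of means under a normal approximation. The function name `approx_visitors_per_variant` is hypothetical, and the library's own numbers may differ (for example if it adjusts for `num_variants`).

```python
import math
from statistics import NormalDist


def approx_visitors_per_variant(current_mean, current_std, lift_percent=5,
                                confidence=95, power=80):
    """Textbook per-group sample size; NOT necessarily what expstats computes."""
    # Absolute difference implied by the relative lift
    delta = current_mean * lift_percent / 100
    # Two-sided critical value and the quantile matching the requested power
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)
    z_beta = NormalDist().inv_cdf(power / 100)
    # n per group = 2 * ((z_alpha + z_beta) * sigma / delta)^2, rounded up
    return math.ceil(2 * ((z_alpha + z_beta) * current_std / delta) ** 2)


n = approx_visitors_per_variant(50, 25, lift_percent=5)
total = n * 2  # two variants, control included
print(f"Approx. {n:,} per variant, {total:,} total")
```

Because the required size scales with `(current_std / delta) ** 2`, halving the detectable lift roughly quadruples the traffic needed.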

analyze

Analyze a 2-variant A/B test for numeric metrics using Welch's t-test.

def analyze(
    control_visitors: int,
    control_mean: float,
    control_std: float,
    variant_visitors: int,
    variant_mean: float,
    variant_std: float,
    confidence: int = 95,
) -> TestResults

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `control_visitors` | int | required | Sample size in control group |
| `control_mean` | float | required | Mean value in control group |
| `control_std` | float | required | Standard deviation in control group |
| `variant_visitors` | int | required | Sample size in variant group |
| `variant_mean` | float | required | Mean value in variant group |
| `variant_std` | float | required | Standard deviation in variant group |
| `confidence` | int | 95 | Confidence level |

Returns

TestResults with attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| `control_mean` | float | Control mean |
| `variant_mean` | float | Variant mean |
| `lift_percent` | float | Relative lift (%) |
| `lift_absolute` | float | Absolute lift |
| `is_significant` | bool | Whether result is statistically significant |
| `confidence` | int | Confidence level used |
| `p_value` | float | P-value of the test |
| `confidence_interval_lower` | float | Lower bound of CI for lift |
| `confidence_interval_upper` | float | Upper bound of CI for lift |
| `control_visitors` | int | Control sample size |
| `control_std` | float | Control standard deviation |
| `variant_visitors` | int | Variant sample size |
| `variant_std` | float | Variant standard deviation |
| `winner` | str | `"control"`, `"variant"`, or `"no winner yet"` |
| `recommendation` | str | Plain-English recommendation |

Example

from expstats import magnitude

result = magnitude.analyze(
    control_visitors=5000,
    control_mean=50.00,
    control_std=25.00,
    variant_visitors=5000,
    variant_mean=52.50,
    variant_std=25.00,
)

print(f"Significant: {result.is_significant}")
print(f"Lift: {result.lift_percent:+.1f}%")
print(f"Winner: {result.winner}")
print(result.recommendation)
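To show what a Welch's t-test on summary statistics involves, here is a minimal standard-library sketch using the same inputs as the example above. The helper name `welch_from_summary` and the returned dictionary keys are hypothetical. It also assumes a normal approximation for the p-value and interval, which is reasonable only because the degrees of freedom are large here; the library itself may use the exact t-distribution.

```python
import math
from statistics import NormalDist


def welch_from_summary(n_c, mean_c, std_c, n_v, mean_v, std_v, confidence=95):
    """Welch statistic from summary stats; p-value via normal approximation."""
    var_c, var_v = std_c ** 2 / n_c, std_v ** 2 / n_v   # squared standard errors
    se = math.sqrt(var_c + var_v)                       # unpooled standard error
    diff = mean_v - mean_c
    t_stat = diff / se
    # Welch-Satterthwaite degrees of freedom
    df = (var_c + var_v) ** 2 / (var_c ** 2 / (n_c - 1) + var_v ** 2 / (n_v - 1))
    # Assumption: df is large, so the standard normal stands in for t
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    z = NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)
    return {
        "lift_absolute": diff,
        "lift_percent": diff / mean_c * 100,
        "t_stat": t_stat,
        "df": df,
        "p_value": p_value,
        "ci": (diff - z * se, diff + z * se),
    }


res = welch_from_summary(5000, 50.0, 25.0, 5000, 52.5, 25.0)
print(f"t = {res['t_stat']:.2f}, df = {res['df']:.0f}, p = {res['p_value']:.2g}")
print(f"Lift: {res['lift_percent']:+.1f}%, CI: {res['ci'][0]:.2f} to {res['ci'][1]:.2f}")
```

Unlike the pooled-variance t-test, this form does not assume both groups share the same variance, which suits revenue-like metrics where a variant can change the spread as well as the mean.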

analyze_multi

Analyze a multi-variant test (3+ variants) using one-way ANOVA with optional Bonferroni correction for pairwise comparisons.

def analyze_multi(
    variants: List[Dict[str, Any]],
    confidence: int = 95,
    correction: Literal["bonferroni", "none"] = "bonferroni",
) -> MultiVariantResults

Parameters

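| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `variants` | list[dict] | required | List of variant dictionaries |
| `confidence` | int | 95 | Confidence level |
| `correction` | str | `"bonferroni"` | Multiple comparison correction method: `"bonferroni"` or `"none"` |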

Each variant dictionary must have:

| Key | Type | Description |
| --- | --- | --- |
| `name` | str | Variant name |
| `visitors` | int | Sample size |
| `mean` | float | Mean value |
| `std` | float | Standard deviation |

Returns

MultiVariantResults with attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| `variants` | list[Variant] | List of Variant objects |
| `is_significant` | bool | Whether overall test is significant |
| `confidence` | int | Confidence level |
| `p_value` | float | ANOVA test p-value |
| `f_statistic` | float | F-statistic |
| `df_between` | int | Degrees of freedom (between groups) |
| `df_within` | int | Degrees of freedom (within groups) |
| `best_variant` | str | Name of best performing variant |
| `worst_variant` | str | Name of worst performing variant |
| `pairwise_comparisons` | list[PairwiseComparison] | All pairwise comparisons |
| `recommendation` | str | Plain-English recommendation |

Example

from expstats import magnitude

result = magnitude.analyze_multi(
    variants=[
        {"name": "control", "visitors": 1000, "mean": 50, "std": 25},
        {"name": "new_layout", "visitors": 1000, "mean": 52, "std": 25},
        {"name": "premium_upsell", "visitors": 1000, "mean": 55, "std": 25},
    ]
)

print(f"Best: {result.best_variant}")
print(f"F-statistic: {result.f_statistic:.2f}")
print(f"Significant: {result.is_significant}")

for p in result.pairwise_comparisons:
    if p.is_significant:
        print(f"  {p.variant_a} vs {p.variant_b}: p={p.p_value_adjusted:.4f}")
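The F-statistic, the two degrees-of-freedom values, and the Bonferroni adjustment can all be derived from per-variant summary statistics. The sketch below is one way to do that with the standard library, using the same three variants as the example above. The helper names are hypothetical, the pairwise p-values use a normal approximation, and the library's internals may differ.

```python
import math
from itertools import combinations
from statistics import NormalDist

variants = [
    {"name": "control", "visitors": 1000, "mean": 50, "std": 25},
    {"name": "new_layout", "visitors": 1000, "mean": 52, "std": 25},
    {"name": "premium_upsell", "visitors": 1000, "mean": 55, "std": 25},
]


def anova_from_summary(groups):
    """One-way ANOVA F-statistic and degrees of freedom from summary stats."""
    k = len(groups)
    total_n = sum(g["visitors"] for g in groups)
    grand_mean = sum(g["visitors"] * g["mean"] for g in groups) / total_n
    ss_between = sum(g["visitors"] * (g["mean"] - grand_mean) ** 2 for g in groups)
    ss_within = sum((g["visitors"] - 1) * g["std"] ** 2 for g in groups)
    df_between, df_within = k - 1, total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_pairs(groups):
    """Pairwise p-values (normal approximation) with Bonferroni adjustment."""
    pairs = list(combinations(groups, 2))
    out = []
    for a, b in pairs:
        se = math.sqrt(a["std"] ** 2 / a["visitors"] + b["std"] ** 2 / b["visitors"])
        z = (b["mean"] - a["mean"]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        # Bonferroni: multiply by the number of comparisons, cap at 1
        out.append((a["name"], b["name"], p, min(1.0, p * len(pairs))))
    return out


f_stat, df_between, df_within = anova_from_summary(variants)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")

significant = [(a, b) for a, b, _, p_adj in bonferroni_pairs(variants) if p_adj < 0.05]
for a, b in significant:
    print(f"  {a} vs {b} remains significant after adjustment")
```

With three variants there are three pairwise comparisons, so each raw p-value is tripled before being compared with 0.05. This is why a pair that looks borderline on its own can fail to reach significance once the correction is applied.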

diff_in_diff

Perform a Difference-in-Differences analysis for numeric metrics. Used for quasi-experimental designs with pre/post measurements.

def diff_in_diff(
    control_pre_n: int,
    control_pre_mean: float,
    control_pre_std: float,
    control_post_n: int,
    control_post_mean: float,
    control_post_std: float,
    treatment_pre_n: int,
    treatment_pre_mean: float,
    treatment_pre_std: float,
    treatment_post_n: int,
    treatment_post_mean: float,
    treatment_post_std: float,
    confidence: int = 95,
) -> DiffInDiffResults

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `control_pre_n` | int | required | Control group sample size in pre-period |
| `control_pre_mean` | float | required | Control group mean in pre-period |
| `control_pre_std` | float | required | Control group std dev in pre-period |
| `control_post_n` | int | required | Control group sample size in post-period |
| `control_post_mean` | float | required | Control group mean in post-period |
| `control_post_std` | float | required | Control group std dev in post-period |
| `treatment_pre_n` | int | required | Treatment group sample size in pre-period |
| `treatment_pre_mean` | float | required | Treatment group mean in pre-period |
| `treatment_pre_std` | float | required | Treatment group std dev in pre-period |
| `treatment_post_n` | int | required | Treatment group sample size in post-period |
| `treatment_post_mean` | float | required | Treatment group mean in post-period |
| `treatment_post_std` | float | required | Treatment group std dev in post-period |
| `confidence` | int | 95 | Confidence level |

Returns

DiffInDiffResults with attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| `control_pre_mean` | float | Control pre-period mean |
| `control_post_mean` | float | Control post-period mean |
| `treatment_pre_mean` | float | Treatment pre-period mean |
| `treatment_post_mean` | float | Treatment post-period mean |
| `control_change` | float | Change in control group |
| `treatment_change` | float | Change in treatment group |
| `diff_in_diff` | float | DiD estimate (treatment effect) |
| `diff_in_diff_percent` | float | DiD as relative percent |
| `is_significant` | bool | Whether DiD is significant |
| `confidence` | int | Confidence level |
| `p_value` | float | P-value |
| `t_statistic` | float | T-statistic |
| `degrees_of_freedom` | float | Degrees of freedom |
| `confidence_interval_lower` | float | Lower CI bound |
| `confidence_interval_upper` | float | Upper CI bound |
| `recommendation` | str | Plain-English recommendation |

Example

from expstats import magnitude

result = magnitude.diff_in_diff(
    control_pre_n=1000,
    control_pre_mean=50.00,
    control_pre_std=25.00,
    control_post_n=1000,
    control_post_mean=51.00,
    control_post_std=25.00,
    treatment_pre_n=1000,
    treatment_pre_mean=50.00,
    treatment_pre_std=25.00,
    treatment_post_n=1000,
    treatment_post_mean=55.00,
    treatment_post_std=26.00,
)

print(f"DiD effect: ${result.diff_in_diff:+.2f}")
print(f"Significant: {result.is_significant}")
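The DiD estimate is the treatment group's change minus the control group's change. The sketch below reproduces that arithmetic for the example inputs and attaches a standard error, assuming the four group-period samples are independent and using a normal approximation for the p-value and interval. The function name and return shape are hypothetical, and the library may compute its test statistic differently.

```python
import math
from statistics import NormalDist


def did_from_summary(cells, confidence=95):
    """cells maps (group, period) -> (n, mean, std); returns DiD and inference."""
    def mean(key):
        return cells[key][1]

    control_change = mean(("control", "post")) - mean(("control", "pre"))
    treatment_change = mean(("treatment", "post")) - mean(("treatment", "pre"))
    did = treatment_change - control_change
    # Assumption: the four samples are independent, so their variances add
    se = math.sqrt(sum(std ** 2 / n for n, _, std in cells.values()))
    z_stat = did / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    z = NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)
    return did, se, p_value, (did - z * se, did + z * se)


cells = {
    ("control", "pre"): (1000, 50.0, 25.0),
    ("control", "post"): (1000, 51.0, 25.0),
    ("treatment", "pre"): (1000, 50.0, 25.0),
    ("treatment", "post"): (1000, 55.0, 26.0),
}
did, se, p_value, ci = did_from_summary(cells)
print(f"DiD: {did:+.2f} (SE {se:.2f}), p = {p_value:.4f}")
print(f"CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

Here the control group moved by 1.00 and the treatment group by 5.00, so the estimate attributes 4.00 to the treatment. That attribution is only valid if both groups would have followed parallel trends without the intervention.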

confidence_interval

Calculate the confidence interval for a single mean using the t-distribution.

def confidence_interval(
    visitors: int,
    mean: float,
    std: float,
    confidence: int = 95,
) -> ConfidenceInterval

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `visitors` | int | required | Sample size |
| `mean` | float | required | Sample mean |
| `std` | float | required | Sample standard deviation |
| `confidence` | int | 95 | Confidence level |

Returns

ConfidenceInterval with attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| `mean` | float | Sample mean |
| `lower` | float | Lower bound of CI |
| `upper` | float | Upper bound of CI |
| `confidence` | int | Confidence level |
| `margin_of_error` | float | Margin of error |

Example

from expstats import magnitude

ci = magnitude.confidence_interval(
    visitors=1000,
    mean=50.00,
    std=25.00,
    confidence=95,
)

print(f"Mean: ${ci.mean:.2f}")
print(f"95% CI: [${ci.lower:.2f}, ${ci.upper:.2f}]")
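The margin of error is the critical value times the standard error, `std / sqrt(visitors)`. The sketch below shows that calculation with the standard library only. Because the standard library has no t-distribution quantile, it substitutes the normal quantile. That is an approximation which is close at 1,000 visitors but too narrow for small samples, so expect the library's t-based interval to be slightly wider. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_mean_ci(visitors, mean, std, confidence=95):
    """Normal-approximation CI for a mean; the t-based interval is a bit wider."""
    standard_error = std / math.sqrt(visitors)
    critical = NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)
    margin = critical * standard_error
    return mean - margin, mean + margin, margin


lower, upper, margin = approx_mean_ci(1000, 50.00, 25.00, confidence=95)
print(f"Margin of error: {margin:.2f}")
print(f"Approx. 95% CI: {lower:.2f} to {upper:.2f}")
```

Since the margin shrinks with the square root of the sample size, quadrupling the number of visitors only halves the interval width.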

summarize

Generate a markdown report for a 2-variant test result.

def summarize(
    result: TestResults,
    test_name: str = "Revenue Test",
    metric_name: str = "Average Order Value",
    currency: str = "$",
) -> str

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `result` | TestResults | required | Result from `analyze()` |
| `test_name` | str | `"Revenue Test"` | Name of the test for the report |
| `metric_name` | str | `"Average Order Value"` | Name of the metric |
| `currency` | str | `"$"` | Currency symbol to use |

Returns

A markdown-formatted string suitable for sharing with stakeholders.

Example

from expstats import magnitude

result = magnitude.analyze(...)
report = magnitude.summarize(
    result,
    test_name="Checkout Flow Test",
    metric_name="Average Order Value",
    currency="$"
)
print(report)

summarize_multi

Generate a markdown report for a multi-variant test result.

def summarize_multi(
    result: MultiVariantResults,
    test_name: str = "Multi-Variant Test",
    metric_name: str = "Average Value",
    currency: str = "$",
) -> str

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `result` | MultiVariantResults | required | Result from `analyze_multi()` |
| `test_name` | str | `"Multi-Variant Test"` | Name of the test for the report |
| `metric_name` | str | `"Average Value"` | Name of the metric |
| `currency` | str | `"$"` | Currency symbol to use |

Returns

A markdown-formatted string with variant performance table and pairwise comparisons.


summarize_diff_in_diff

Generate a markdown report for a Difference-in-Differences analysis.

def summarize_diff_in_diff(
    result: DiffInDiffResults,
    test_name: str = "Difference-in-Differences Analysis",
    metric_name: str = "Average Value",
    currency: str = "$",
) -> str

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `result` | DiffInDiffResults | required | Result from `diff_in_diff()` |
| `test_name` | str | `"Difference-in-Differences Analysis"` | Name of the analysis |
| `metric_name` | str | `"Average Value"` | Name of the metric |
| `currency` | str | `"$"` | Currency symbol |

Returns

A markdown-formatted string with pre/post comparison table, DiD estimate, and interpretation.


summarize_plan

Generate a markdown report for a sample size plan.

def summarize_plan(
    plan: SampleSizePlan,
    test_name: str = "Revenue Test",
    metric_name: str = "Average Order Value",
    currency: str = "$",
) -> str

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `plan` | SampleSizePlan | required | Result from `sample_size()` |
| `test_name` | str | `"Revenue Test"` | Name of the test for the report |
| `metric_name` | str | `"Average Order Value"` | Name of the metric |
| `currency` | str | `"$"` | Currency symbol to use |

Returns

A markdown-formatted string with test parameters, required sample size, and duration estimate.


Data Classes

SampleSizePlan

@dataclass
class SampleSizePlan:
    visitors_per_variant: int
    total_visitors: int
    current_mean: float
    expected_mean: float
    standard_deviation: float
    lift_percent: float
    confidence: int
    power: int
    test_duration_days: Optional[int] = None

    def with_daily_traffic(self, daily_visitors: int) -> 'SampleSizePlan': ...
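
The body of `with_daily_traffic` is elided above. Purely as an illustration, the sketch below shows one plausible shape for such a method on a cut-down, hypothetical `PlanSketch` class: it rounds the duration up to whole days, stores it, and returns the plan so calls can be chained. Whether the real method mutates in place, how it rounds, and how it validates input are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlanSketch:
    """Hypothetical stand-in for SampleSizePlan with only the fields needed here."""
    total_visitors: int
    test_duration_days: Optional[int] = None

    def with_daily_traffic(self, daily_visitors: int) -> "PlanSketch":
        if daily_visitors <= 0:
            raise ValueError("daily_visitors must be positive")
        # Assumed rule: a partial day still counts as a full day of testing
        self.test_duration_days = math.ceil(self.total_visitors / daily_visitors)
        return self  # returning self allows chaining


sketch = PlanSketch(total_visitors=3140).with_daily_traffic(1000)
print(f"Duration: {sketch.test_duration_days} days")
```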

TestResults

@dataclass
class TestResults:
    control_mean: float
    variant_mean: float
    lift_percent: float
    lift_absolute: float
    is_significant: bool
    confidence: int
    p_value: float
    confidence_interval_lower: float
    confidence_interval_upper: float
    control_visitors: int
    control_std: float
    variant_visitors: int
    variant_std: float
    winner: Literal["control", "variant", "no winner yet"]
    recommendation: str

ConfidenceInterval

@dataclass
class ConfidenceInterval:
    mean: float
    lower: float
    upper: float
    confidence: int
    margin_of_error: float

MultiVariantResults

@dataclass
class MultiVariantResults:
    variants: List[Variant]
    is_significant: bool
    confidence: int
    p_value: float
    f_statistic: float
    df_between: int
    df_within: int
    best_variant: str
    worst_variant: str
    pairwise_comparisons: List[PairwiseComparison]
    recommendation: str

PairwiseComparison

@dataclass
class PairwiseComparison:
    variant_a: str
    variant_b: str
    mean_a: float
    mean_b: float
    lift_percent: float
    lift_absolute: float
    p_value: float
    p_value_adjusted: float
    is_significant: bool
    confidence_interval_lower: float
    confidence_interval_upper: float