magnitude¶
Magnitude Effects — How much it happens
The magnitude module provides tools for analyzing experiments where the outcome is a continuous value: revenue, time spent, order value, number of actions. Use this when you care about the size of the outcome, not just whether it happened.
Overview¶
| Function | Purpose |
|---|---|
| `sample_size()` | Calculate required sample size for a test |
| `analyze()` | Analyze a 2-variant A/B test |
| `analyze_multi()` | Analyze a multi-variant test (3+ variants) |
| `diff_in_diff()` | Difference-in-Differences analysis |
| `confidence_interval()` | Calculate confidence interval for a mean |
| `summarize()` | Generate stakeholder report for 2-variant test |
| `summarize_multi()` | Generate stakeholder report for multi-variant test |
| `summarize_diff_in_diff()` | Generate stakeholder report for DiD |
| `summarize_plan()` | Generate stakeholder report for sample size plan |
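A typical workflow strings these together: plan the test, analyze the collected data, then produce a report. The sketch below only uses the documented functions and parameters; all numbers and names are illustrative.

```python
from expstats import magnitude

# 1. Plan: how many visitors are needed to detect a 5% lift?
plan = magnitude.sample_size(current_mean=50, current_std=25, lift_percent=5)

# 2. Analyze: compare control and variant once the data is in
result = magnitude.analyze(
    control_visitors=5000, control_mean=50.0, control_std=25.0,
    variant_visitors=5000, variant_mean=52.5, variant_std=25.0,
)

# 3. Report: produce a stakeholder-friendly markdown summary
print(magnitude.summarize(result, test_name="Checkout Flow Test"))
```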
sample_size¶
Calculate the required sample size to detect a given lift in a numeric metric.
```python
def sample_size(
    current_mean: float,
    current_std: float,
    lift_percent: float = 5,
    confidence: int = 95,
    power: int = 80,
    num_variants: int = 2,
) -> SampleSizePlan
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `current_mean` | `float` | required | Current mean value of the metric |
| `current_std` | `float` | required | Standard deviation of the metric |
| `lift_percent` | `float` | `5` | Minimum relative lift to detect (e.g., `5` = 5% improvement) |
| `confidence` | `int` | `95` | Confidence level (e.g., `95` for 95% confidence) |
| `power` | `int` | `80` | Statistical power (e.g., `80` for 80% power) |
| `num_variants` | `int` | `2` | Number of variants including control |
Returns¶
`SampleSizePlan` with attributes:

| Attribute | Type | Description |
|---|---|---|
| `visitors_per_variant` | `int` | Required visitors per variant |
| `total_visitors` | `int` | Total visitors needed across all variants |
| `current_mean` | `float` | Current mean value |
| `expected_mean` | `float` | Expected variant mean if lift is achieved |
| `standard_deviation` | `float` | Standard deviation used |
| `lift_percent` | `float` | Target lift percentage |
| `confidence` | `int` | Confidence level |
| `power` | `int` | Statistical power |
| `test_duration_days` | `int \| None` | Estimated test duration (set via `with_daily_traffic()`) |
Methods¶
```python
with_daily_traffic(daily_visitors: int) -> SampleSizePlan
```
Set daily traffic to calculate estimated test duration.
Example¶
```python
from expstats import magnitude

plan = magnitude.sample_size(
    current_mean=50,   # $50 average order value
    current_std=25,    # $25 standard deviation
    lift_percent=5,    # detect a 5% relative lift
    confidence=95,
    power=80,
)

print(f"Need {plan.visitors_per_variant:,} per variant")
print(f"Total: {plan.total_visitors:,}")

# Calculate duration; with_daily_traffic() returns a plan with test_duration_days filled in
plan = plan.with_daily_traffic(5000)
print(f"Duration: {plan.test_duration_days} days")
```
analyze¶
Analyze a 2-variant A/B test for numeric metrics using Welch's t-test.
```python
def analyze(
    control_visitors: int,
    control_mean: float,
    control_std: float,
    variant_visitors: int,
    variant_mean: float,
    variant_std: float,
    confidence: int = 95,
) -> TestResults
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `control_visitors` | `int` | required | Sample size in control group |
| `control_mean` | `float` | required | Mean value in control group |
| `control_std` | `float` | required | Standard deviation in control group |
| `variant_visitors` | `int` | required | Sample size in variant group |
| `variant_mean` | `float` | required | Mean value in variant group |
| `variant_std` | `float` | required | Standard deviation in variant group |
| `confidence` | `int` | `95` | Confidence level |
Returns¶
`TestResults` with attributes:

| Attribute | Type | Description |
|---|---|---|
| `control_mean` | `float` | Control mean |
| `variant_mean` | `float` | Variant mean |
| `lift_percent` | `float` | Relative lift (%) |
| `lift_absolute` | `float` | Absolute lift |
| `is_significant` | `bool` | Whether result is statistically significant |
| `confidence` | `int` | Confidence level used |
| `p_value` | `float` | P-value of the test |
| `confidence_interval_lower` | `float` | Lower bound of CI for lift |
| `confidence_interval_upper` | `float` | Upper bound of CI for lift |
| `control_std` | `float` | Control standard deviation |
| `variant_std` | `float` | Variant standard deviation |
| `winner` | `str` | `"control"`, `"variant"`, or `"no winner yet"` |
| `recommendation` | `str` | Plain-English recommendation |
Example¶
```python
from expstats import magnitude

result = magnitude.analyze(
    control_visitors=5000,
    control_mean=50.00,
    control_std=25.00,
    variant_visitors=5000,
    variant_mean=52.50,
    variant_std=25.00,
)

print(f"Significant: {result.is_significant}")
print(f"Lift: {result.lift_percent:+.1f}%")
print(f"Winner: {result.winner}")
print(result.recommendation)
```
analyze_multi¶
Analyze a multi-variant test (3+ variants) using one-way ANOVA with optional Bonferroni correction for pairwise comparisons.
```python
def analyze_multi(
    variants: List[Dict[str, Any]],
    confidence: int = 95,
    correction: Literal["bonferroni", "none"] = "bonferroni",
) -> MultiVariantResults
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `variants` | `list[dict]` | required | List of variant dictionaries |
| `confidence` | `int` | `95` | Confidence level |
| `correction` | `str` | `"bonferroni"` | Multiple comparison correction method |
Each variant dictionary must have:
| Key | Type | Description |
|---|---|---|
| `name` | `str` | Variant name |
| `visitors` | `int` | Sample size |
| `mean` | `float` | Mean value |
| `std` | `float` | Standard deviation |
Returns¶
`MultiVariantResults` with attributes:

| Attribute | Type | Description |
|---|---|---|
| `variants` | `list[Variant]` | List of `Variant` objects |
| `is_significant` | `bool` | Whether overall test is significant |
| `confidence` | `int` | Confidence level |
| `p_value` | `float` | ANOVA test p-value |
| `f_statistic` | `float` | F-statistic |
| `df_between` | `int` | Degrees of freedom (between groups) |
| `df_within` | `int` | Degrees of freedom (within groups) |
| `best_variant` | `str` | Name of best performing variant |
| `worst_variant` | `str` | Name of worst performing variant |
| `pairwise_comparisons` | `list[PairwiseComparison]` | All pairwise comparisons |
| `recommendation` | `str` | Plain-English recommendation |
Example¶
```python
from expstats import magnitude

result = magnitude.analyze_multi(
    variants=[
        {"name": "control", "visitors": 1000, "mean": 50, "std": 25},
        {"name": "new_layout", "visitors": 1000, "mean": 52, "std": 25},
        {"name": "premium_upsell", "visitors": 1000, "mean": 55, "std": 25},
    ]
)

print(f"Best: {result.best_variant}")
print(f"F-statistic: {result.f_statistic:.2f}")
print(f"Significant: {result.is_significant}")

for p in result.pairwise_comparisons:
    if p.is_significant:
        print(f"  {p.variant_a} vs {p.variant_b}: p={p.p_value_adjusted:.4f}")
```
diff_in_diff¶
Perform a Difference-in-Differences analysis for numeric metrics. Used for quasi-experimental designs with pre/post measurements.
```python
def diff_in_diff(
    control_pre_n: int,
    control_pre_mean: float,
    control_pre_std: float,
    control_post_n: int,
    control_post_mean: float,
    control_post_std: float,
    treatment_pre_n: int,
    treatment_pre_mean: float,
    treatment_pre_std: float,
    treatment_post_n: int,
    treatment_post_mean: float,
    treatment_post_std: float,
    confidence: int = 95,
) -> DiffInDiffResults
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `control_pre_n` | `int` | required | Control group sample size in pre-period |
| `control_pre_mean` | `float` | required | Control group mean in pre-period |
| `control_pre_std` | `float` | required | Control group std dev in pre-period |
| `control_post_n` | `int` | required | Control group sample size in post-period |
| `control_post_mean` | `float` | required | Control group mean in post-period |
| `control_post_std` | `float` | required | Control group std dev in post-period |
| `treatment_pre_n` | `int` | required | Treatment group sample size in pre-period |
| `treatment_pre_mean` | `float` | required | Treatment group mean in pre-period |
| `treatment_pre_std` | `float` | required | Treatment group std dev in pre-period |
| `treatment_post_n` | `int` | required | Treatment group sample size in post-period |
| `treatment_post_mean` | `float` | required | Treatment group mean in post-period |
| `treatment_post_std` | `float` | required | Treatment group std dev in post-period |
| `confidence` | `int` | `95` | Confidence level |
Returns¶
`DiffInDiffResults` with attributes:

| Attribute | Type | Description |
|---|---|---|
| `control_pre_mean` | `float` | Control pre-period mean |
| `control_post_mean` | `float` | Control post-period mean |
| `treatment_pre_mean` | `float` | Treatment pre-period mean |
| `treatment_post_mean` | `float` | Treatment post-period mean |
| `control_change` | `float` | Change in control group |
| `treatment_change` | `float` | Change in treatment group |
| `diff_in_diff` | `float` | DiD estimate (treatment effect) |
| `diff_in_diff_percent` | `float` | DiD as relative percent |
| `is_significant` | `bool` | Whether DiD is significant |
| `confidence` | `int` | Confidence level |
| `p_value` | `float` | P-value |
| `t_statistic` | `float` | T-statistic |
| `degrees_of_freedom` | `float` | Degrees of freedom |
| `confidence_interval_lower` | `float` | Lower CI bound |
| `confidence_interval_upper` | `float` | Upper CI bound |
| `recommendation` | `str` | Plain-English recommendation |
Example¶
```python
from expstats import magnitude

result = magnitude.diff_in_diff(
    control_pre_n=1000,
    control_pre_mean=50.00,
    control_pre_std=25.00,
    control_post_n=1000,
    control_post_mean=51.00,
    control_post_std=25.00,
    treatment_pre_n=1000,
    treatment_pre_mean=50.00,
    treatment_pre_std=25.00,
    treatment_post_n=1000,
    treatment_post_mean=55.00,
    treatment_post_std=26.00,
)

print(f"DiD effect: ${result.diff_in_diff:+.2f}")
print(f"Significant: {result.is_significant}")
```
confidence_interval¶
Calculate the confidence interval for a single mean using the t-distribution.
```python
def confidence_interval(
    visitors: int,
    mean: float,
    std: float,
    confidence: int = 95,
) -> ConfidenceInterval
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `visitors` | `int` | required | Sample size |
| `mean` | `float` | required | Sample mean |
| `std` | `float` | required | Sample standard deviation |
| `confidence` | `int` | `95` | Confidence level |
Returns¶
`ConfidenceInterval` with attributes:

| Attribute | Type | Description |
|---|---|---|
| `mean` | `float` | Sample mean |
| `lower` | `float` | Lower bound of CI |
| `upper` | `float` | Upper bound of CI |
| `confidence` | `int` | Confidence level |
| `margin_of_error` | `float` | Margin of error |
Example¶
```python
from expstats import magnitude

ci = magnitude.confidence_interval(
    visitors=1000,
    mean=50.00,
    std=25.00,
    confidence=95,
)

print(f"Mean: ${ci.mean:.2f}")
print(f"95% CI: [${ci.lower:.2f}, ${ci.upper:.2f}]")
```
summarize¶
Generate a markdown report for a 2-variant test result.
```python
def summarize(
    result: TestResults,
    test_name: str = "Revenue Test",
    metric_name: str = "Average Order Value",
    currency: str = "$",
) -> str
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `result` | `TestResults` | required | Result from `analyze()` |
| `test_name` | `str` | `"Revenue Test"` | Name of the test for the report |
| `metric_name` | `str` | `"Average Order Value"` | Name of the metric |
| `currency` | `str` | `"$"` | Currency symbol to use |
Returns¶
A markdown-formatted string suitable for sharing with stakeholders.
Example¶
```python
from expstats import magnitude

result = magnitude.analyze(...)

report = magnitude.summarize(
    result,
    test_name="Checkout Flow Test",
    metric_name="Average Order Value",
    currency="$",
)
print(report)
```
summarize_multi¶
Generate a markdown report for a multi-variant test result.
```python
def summarize_multi(
    result: MultiVariantResults,
    test_name: str = "Multi-Variant Test",
    metric_name: str = "Average Value",
    currency: str = "$",
) -> str
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `result` | `MultiVariantResults` | required | Result from `analyze_multi()` |
| `test_name` | `str` | `"Multi-Variant Test"` | Name of the test for the report |
| `metric_name` | `str` | `"Average Value"` | Name of the metric |
| `currency` | `str` | `"$"` | Currency symbol to use |
Returns¶
A markdown-formatted string with variant performance table and pairwise comparisons.
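Example¶
A minimal usage sketch; the variant data reuses the illustrative numbers from the `analyze_multi()` example above, and the test and metric names are placeholders.
```python
from expstats import magnitude

result = magnitude.analyze_multi(
    variants=[
        {"name": "control", "visitors": 1000, "mean": 50, "std": 25},
        {"name": "new_layout", "visitors": 1000, "mean": 52, "std": 25},
        {"name": "premium_upsell", "visitors": 1000, "mean": 55, "std": 25},
    ]
)

# Turn the multi-variant results into a shareable markdown report
report = magnitude.summarize_multi(
    result,
    test_name="Layout Test",
    metric_name="Average Order Value",
    currency="$",
)
print(report)
```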
summarize_diff_in_diff¶
Generate a markdown report for a Difference-in-Differences analysis.
```python
def summarize_diff_in_diff(
    result: DiffInDiffResults,
    test_name: str = "Difference-in-Differences Analysis",
    metric_name: str = "Average Value",
    currency: str = "$",
) -> str
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `result` | `DiffInDiffResults` | required | Result from `diff_in_diff()` |
| `test_name` | `str` | `"Difference-in-Differences Analysis"` | Name of the analysis |
| `metric_name` | `str` | `"Average Value"` | Name of the metric |
| `currency` | `str` | `"$"` | Currency symbol |
Returns¶
A markdown-formatted string with pre/post comparison table, DiD estimate, and interpretation.
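Example¶
A minimal usage sketch; it reuses the illustrative numbers from the `diff_in_diff()` example above, and the test and metric names are placeholders.
```python
from expstats import magnitude

# Same illustrative numbers as the diff_in_diff() example above
result = magnitude.diff_in_diff(
    control_pre_n=1000, control_pre_mean=50.00, control_pre_std=25.00,
    control_post_n=1000, control_post_mean=51.00, control_post_std=25.00,
    treatment_pre_n=1000, treatment_pre_mean=50.00, treatment_pre_std=25.00,
    treatment_post_n=1000, treatment_post_mean=55.00, treatment_post_std=26.00,
)

# Turn the DiD results into a shareable markdown report
report = magnitude.summarize_diff_in_diff(
    result,
    test_name="Loyalty Program Rollout",
    metric_name="Average Order Value",
    currency="$",
)
print(report)
```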
summarize_plan¶
Generate a markdown report for a sample size plan.
```python
def summarize_plan(
    plan: SampleSizePlan,
    test_name: str = "Revenue Test",
    metric_name: str = "Average Order Value",
    currency: str = "$",
) -> str
```
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `plan` | `SampleSizePlan` | required | Result from `sample_size()` |
| `test_name` | `str` | `"Revenue Test"` | Name of the test for the report |
| `metric_name` | `str` | `"Average Order Value"` | Name of the metric |
| `currency` | `str` | `"$"` | Currency symbol to use |
Returns¶
A markdown-formatted string with test parameters, required sample size, and duration estimate.
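Example¶
A minimal usage sketch; the planning numbers mirror the `sample_size()` example above, and the test name is a placeholder.
```python
from expstats import magnitude

# Plan the test, attach daily traffic for a duration estimate, then report it
plan = magnitude.sample_size(current_mean=50, current_std=25, lift_percent=5)
plan = plan.with_daily_traffic(5000)

report = magnitude.summarize_plan(
    plan,
    test_name="Checkout Flow Test",
    metric_name="Average Order Value",
    currency="$",
)
print(report)
```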
Data Classes¶
SampleSizePlan¶
```python
@dataclass
class SampleSizePlan:
    visitors_per_variant: int
    total_visitors: int
    current_mean: float
    expected_mean: float
    standard_deviation: float
    lift_percent: float
    confidence: int
    power: int
    test_duration_days: Optional[int] = None

    def with_daily_traffic(self, daily_visitors: int) -> 'SampleSizePlan': ...
```
TestResults¶
```python
@dataclass
class TestResults:
    control_mean: float
    variant_mean: float
    lift_percent: float
    lift_absolute: float
    is_significant: bool
    confidence: int
    p_value: float
    confidence_interval_lower: float
    confidence_interval_upper: float
    control_visitors: int
    control_std: float
    variant_visitors: int
    variant_std: float
    winner: Literal["control", "variant", "no winner yet"]
    recommendation: str
```
ConfidenceInterval¶
```python
@dataclass
class ConfidenceInterval:
    mean: float
    lower: float
    upper: float
    confidence: int
    margin_of_error: float
```
MultiVariantResults¶
```python
@dataclass
class MultiVariantResults:
    variants: List[Variant]
    is_significant: bool
    confidence: int
    p_value: float
    f_statistic: float
    df_between: int
    df_within: int
    best_variant: str
    worst_variant: str
    pairwise_comparisons: List[PairwiseComparison]
    recommendation: str
```