Intermediate 10 min

Reading Your Results

Significance, confidence intervals, and the discipline of not peeking.

Statistical significance is a threshold, not a verdict. 95% confidence means there is roughly a one-in-twenty chance your result is noise. Run twenty flat tests and one will look like a winner by accident.

Always read the confidence interval alongside the point estimate. '+3.2% with a CI of -0.5% to +6.9%' means the test cannot rule out a small loss. '+3.2% with a CI of +1.4% to +5.1%' means you have something.

Segment the result after — never before — the primary metric. Segment results are hypotheses for the next test, not conclusions for this one. And do not call a result early because the dashboard looks good on day three. The dashboard always looks good on day three.

Example

// A flat result is still a result. Log it.
const result = {
  test: 'hero-cta-verb-v1',
  outcome: 'flat',
  primaryMetric: 'ctr',
  lift: 0.4,
  ciLow: -1.2,
  ciHigh: 2.0,
  significant: false,
};

Try this on the Shop

The best way to learn is to ship. Open this surface and apply the lesson.

Hold the Stone of Significance