The fine print of every chart

What Is Correlation vs Causation?

Two things moving together is not the same as one of them making the other happen. Knowing the difference is the whole reason this site exists — every correlation here is real, and almost none of them are causal.

Figure: summer heat, a confounding variable, drives up both ice cream sales and drowning deaths, so the two series correlate without one causing the other.

The short version

Correlation means two measurements tend to rise and fall together. Causation means changing one of them actually changes the other. Correlation is something you can see in the data; causation is a claim about why. Observational data alone cannot settle the why, which is exactly why “correlation does not imply causation” is repeated so often.

The classic example: ice cream & drownings

Across a year, ice cream sales and drowning deaths rise and fall together. Both climb in June, peak in July and August, and collapse by October. If you only looked at the chart, you might conclude that buying ice cream causes drowning.

It is not. A third variable — summer heat — is quietly driving both. When it is hot, more people buy ice cream, and, completely separately, more people go swimming, so more people drown. The heat is a confounding variable: it causes both series, which makes them correlate even though neither one touches the other. Ban ice cream and the drownings would not budge.

This is the textbook illustration of confounding, and it is the trap nearly every “controversial correlation” on this site is built on.
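
To see the confounder at work, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the numbers are synthetic, not real sales or drowning figures, and the names heat, ice_cream, drownings, and partial_corr are hypothetical. Both series are built from heat plus their own independent noise, so neither touches the other. The sketch compares the raw correlation with a partial correlation, a standard way of asking how much correlation is left once a third variable is held fixed.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic year (made-up numbers): a seasonal temperature curve plus daily noise.
heat = [15 + 12 * math.sin(2 * math.pi * (day - 100) / 365) + rng.gauss(0, 2)
        for day in range(365)]

# Each series depends on heat and on its own noise, never on the other series.
ice_cream = [20 * t + rng.gauss(0, 40) for t in heat]
drownings = [0.3 * t + rng.gauss(0, 1.0) for t in heat]  # an index, not a count

raw = pearson(ice_cream, drownings)
controlled = partial_corr(ice_cream, drownings, heat)

print(f"raw correlation:      {raw:.2f}")
print(f"controlling for heat: {controlled:.2f}")
```

In this toy setup the raw correlation comes out strong while the heat-adjusted one sits close to zero. Real data is messier: you can only adjust for confounders you have thought of and measured.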

Why it matters

Mistaking correlation for causation leads to expensive mistakes: medicine that does not work, policies that miss the real lever, and headlines that scare people for no reason. The cost of the error is acting on the wrong cause — pouring effort into ice cream when the problem was the heat all along.

How to tell them apart

There is no single test, but investigators lean on a handful of reliable tactics. Run a suspicious correlation through these before believing it:

  1. Hunt for a confounder

    Ask what third thing could be driving both variables at once. A "lurking" variable like summer heat, age, or income is the single most common reason unrelated trends march in lock-step.

  2. Check the direction

    If A and B are linked, does A cause B, does B cause A, both, or neither? "Reverse causation" is easy to miss: the arrow may point the opposite way from the obvious story, or both ways at once. Exercise and good mood, for example, feed each other.

  3. Demand temporal order

    A cause has to come before its effect. If the supposed effect was already moving before the cause showed up, the story falls apart.

  4. Insist on a mechanism

    Can you tell a believable step-by-step story for how one leads to the other? "No plausible mechanism" is a strong hint you are looking at a coincidence.

  5. Distrust tiny or cherry-picked data

    Mine enough datasets over enough years and striking correlations appear by pure chance. Short time windows and small samples make spurious matches almost inevitable.

  6. Look for a dose-response

    Real causes usually scale: more of the cause produces more of the effect. A consistent dose-response gradient is one of the Bradford Hill criteria epidemiologists use.

  7. Find a controlled experiment

    Randomized controlled trials randomly assign who gets the "cause", so confounders are balanced, on average, across the groups being compared. When an experiment is impossible, look for natural experiments or replication across populations.
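
Tactic 5 is easy to check for yourself. The sketch below is an illustration under stated assumptions, not a description of how this site finds its charts: it generates a random "target" series, compares it against 1,000 unrelated random series, and reports the strongest match found by chance. The function name strongest_chance_match and all parameters are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strongest_chance_match(n_series, n_points, rng):
    """Largest |r| between one random target and n_series unrelated random series."""
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Same number of candidate series, different window lengths.
short_window = strongest_chance_match(n_series=1000, n_points=10, rng=rng)
long_window = strongest_chance_match(n_series=1000, n_points=200, rng=rng)

print(f"best chance match, 10 points:  {short_window:.2f}")
print(f"best chance match, 200 points: {long_window:.2f}")
```

With only ten data points per series, pure noise reliably produces at least one impressive-looking match; with 200 points the best accidental match is much weaker. The sketch uses independent noise, which is a simplification. Real yearly series often share an upward or downward trend, and that makes accidental matches more likely still.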

Common questions

What is the difference between correlation and causation?
Correlation means two variables tend to move together. Causation means a change in one variable actually produces a change in the other. Correlation can exist without causation, most famously when a hidden third variable drives both.

Why do ice cream sales correlate with drowning deaths?
Both rise in summer. Hot weather is a confounding variable: it pushes people to buy ice cream and, separately, to swim, which raises drownings. Ice cream does not cause drownings; the heat drives both.

How can you prove causation?
The strongest evidence is a randomized controlled experiment, which assigns the cause at random so that confounders are balanced, on average, across the groups. Where experiments are impossible, researchers weigh criteria like mechanism, temporal order, dose-response, and replication.
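
Why random assignment helps can be sketched in a few lines of Python. This is a toy simulation with made-up numbers, and every name in it (heat, risk, chose, assigned, group_gap) is hypothetical. The "treatment" has no true effect on the outcome at all; only the hidden confounder does. In the observational version people on hotter days are more likely to pick the treatment, and in the experimental version a coin flip decides.

```python
import random

def mean(values):
    return sum(values) / len(values)

def group_gap(treated_flags, outcomes):
    """Mean outcome among the treated minus mean outcome among the untreated."""
    treated = [o for t, o in zip(treated_flags, outcomes) if t]
    control = [o for t, o in zip(treated_flags, outcomes) if not t]
    return mean(treated) - mean(control)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 10_000

# Hidden confounder: how hot each person's day is (standardized, made-up units).
heat = [rng.gauss(0, 1) for _ in range(n)]

# The outcome depends on heat only; the treatment has zero true effect.
risk = [2.0 * h + rng.gauss(0, 1) for h in heat]

# Observational world: hotter days make choosing the treatment more likely.
chose = [h + rng.gauss(0, 1) > 0 for h in heat]

# Experimental world: a coin flip decides, independent of heat.
assigned = [rng.random() < 0.5 for _ in range(n)]

observational_gap = group_gap(chose, risk)
randomized_gap = group_gap(assigned, risk)

print(f"gap when people choose:      {observational_gap:.2f}")
print(f"gap when a coin flip decides: {randomized_gap:.2f}")
```

The self-selected groups show a large gap even though the treatment does nothing, because they differ in heat. The coin-flip groups have roughly the same mix of hot and cool days, so the gap shrinks toward zero, leaving only sampling noise.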
