#54 - Correlation and Causality
Why “correlation isn’t causation” is both true... and badly overused
Last time we looked at the curse of monocausality and the damage done when people insist a single variable explains everything. Today we turn to a related and equally abused idea: the relationship between correlation and causation.
You have heard the phrase a thousand times... “Correlation does not equal causation!” It is one of the most repeated lines in popular discourse.
Like many overused phrases, it’s both completely true and badly misused.
The Cliché Exists Because It Is True
Two things can move together without either one causing the other. Ice cream sales and drowning deaths rise and fall in near lockstep, not because popsicles are dangerous but because both track the arrival of summer. Children’s shoe sizes correlate strongly with their reading ability, not because bigger feet help you read, but because older kids have both larger feet and more schooling. The world is chock-full of variables that move in parallel for reasons that have nothing to do with one causing the other.
This is a useful caution when reading any newspaper or even (especially) any academic study! Because correlations are far easier to find than causal relationships, a ton of published research, and even more journalism about that research, treats “X is correlated with Y” as though it proved “X causes Y”.
These statements resonate because we humans like explanations about the world we live in. But such simplifications are often absurd misrepresentations of what’s actually happening. Anyone trying to understand the world needs to hold this error in mind.
The Cliché Gets Weaponized
Unfortunately, “Correlation doesn’t equal causation” has become a thought-terminating cliché. The moment someone presents a finding another person doesn’t like, out comes the phrase, deployed as though it settled the matter. It doesn’t.
Smoking and lung cancer started as a correlation. So did seat belts and survival. So did hand-washing and reduced infection. A huge portion of what we now accept as basic causal knowledge began as an observed correlation that turned out to reflect a real causal link once properly investigated.
Dismissing any uncomfortable correlation by reciting the phrase is not skepticism. It is lazy reasoning dressed up to sound rigorous. It is also, incidentally, a close cousin of the motte-and-bailey tactic we covered earlier. The motte: “correlation is not proof of causation”. The bailey: “therefore the correlation is meaningless and we should stop talking about it”. The first is true. The second does not follow.
What a Correlation Actually Implies
When you observe a real, reproducible correlation between two variables, there are only three possibilities:
Coincidence. The correlation is a statistical accident. Over enough variables, pure chance will produce apparent patterns that mean nothing. The Nicolas Cage films vs pool drownings meme is the canonical example.
A confounding third factor. Both variables are being driven by something else. Summer drives ice cream and drowning. Age drives shoe size and reading ability.
Actual causation. One variable genuinely influences the other. Sometimes this runs in one direction, sometimes both variables reinforce each other.
A real correlation is always one of these three. The honest analyst’s job is to uncover which of the three a given correlation most likely is. And as we covered in single vs multivariate and monocausality, even when actual causation is present, it is almost never the entire story. The cause you have identified is usually one of several, each contributing something to the outcome.
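To make the first two possibilities concrete, here is a minimal Python sketch on made-up data. Nothing in it is real: the series, the effect sizes, and the helper names pearson and residuals are all assumptions chosen for illustration. The first part searches 200 pure-noise series for the one that best matches a ten-point noise target. The second builds a toy world where temperature drives both ice cream sales and drownings, then checks what is left of their correlation once temperature is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting its best straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# 1. Coincidence: every series here is pure noise, unrelated to every other.
years = 10
target = [rng.gauss(0, 1) for _ in range(years)]
candidates = [[rng.gauss(0, 1) for _ in range(years)] for _ in range(200)]
strongest = max(abs(pearson(target, c)) for c in candidates)

# 2. Confounding: temperature drives both series; neither affects the other.
#    All coefficients below are invented for the illustration.
days = 2000
temperature = [rng.uniform(0, 35) for _ in range(days)]
ice_cream = [10 * t + rng.gauss(0, 60) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1.5) for t in temperature]

raw = pearson(ice_cream, drownings)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(drownings, temperature))

print(f"best match among 200 noise series: {strongest:.2f}")
print(f"ice cream vs drownings, raw:       {raw:.2f}")
print(f"after adjusting for temperature:   {adjusted:.2f}")
```

On a run like this, expect the best noise match to look impressively strong, the raw ice cream and drowning correlation to be clearly positive, and the adjusted one to sit near zero. Subtracting a straight-line fit is only one simple way to account for a confounder, and it works here because the fake data was built to be linear. Real data is rarely that polite.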

A Rough Framework for Evaluating Causality
Epidemiologists have worked on this problem for decades. The result is a set of criteria (often attributed to the British statistician Austin Bradford Hill) that can be simplified into something any reader can actually apply:
Strength. How strong is the correlation? Weak correlations are much more likely to be spurious or confounded.
Consistency. Does it show up across different studies, populations, and contexts? A correlation that replicates is much more credible than one that appears once.
Temporality. Does the supposed cause precede the effect? If not, whatever you are looking at is not causation in the normal sense.
Dose-response. When the supposed cause increases, does the effect increase with it? A clean dose-response pattern is a strong signal of real causation.
Plausibility. Is there a known mechanism that would explain the link? The absence of a mechanism doesn’t kill a causal claim, but its presence strengthens one considerably.
Ruling out confounders. Have plausible third factors been controlled for? This is the step most armchair analysts skip entirely.
Experimental evidence where possible. Can the effect be reproduced in a controlled setting? Even a rough natural experiment can tell you a lot.
An aside: we have applied this set of criteria before, in “Midwits in Action But Correlation Doesn’t Equal Causation”, to assess the ties between the COVID vaccine and the increased rates of cancer that have followed.
No single criterion is decisive. But the more of them line up, the more confident you can be that a correlation reflects a real causal relationship.
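Two of these checks, strength and dose-response, are easy to sketch in Python. The cohort below is entirely hypothetical: the exposure levels, the risk numbers, and the rate_by_dose helper are assumptions made up for the illustration, not data from any study. The sketch groups people by exposure level, computes the share with the outcome at each level, and asks whether that share climbs steadily as exposure rises.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical cohort: exposure level 0 to 4, with risk rising 5 points per level.
BASE_RISK = 0.05
RISK_PER_LEVEL = 0.05

people = []
for _ in range(20000):
    dose = rng.randrange(5)
    sick = rng.random() < BASE_RISK + RISK_PER_LEVEL * dose
    people.append((dose, sick))


def rate_by_dose(records):
    """Share of people with the outcome at each exposure level."""
    totals, cases = {}, {}
    for dose, sick in records:
        totals[dose] = totals.get(dose, 0) + 1
        cases[dose] = cases.get(dose, 0) + int(sick)
    return {d: cases[d] / totals[d] for d in sorted(totals)}


rates = rate_by_dose(people)
ordered = list(rates.values())

# Dose-response: does the outcome rate rise at every step up in exposure?
is_monotonic = all(low < high for low, high in zip(ordered, ordered[1:]))

# Strength: how much riskier is the top exposure level than the bottom one?
relative_risk = rates[4] / rates[0]

for dose, rate in rates.items():
    print(f"exposure {dose}: {rate:.1%} with outcome")
print(f"rises at every step: {is_monotonic}")
print(f"relative risk, top vs bottom: {relative_risk:.1f}")
```

A steady climb like this is encouraging, but it is not proof by itself: a confounder that tracks exposure would produce the same staircase, which is why this check belongs alongside the others rather than in place of them.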
The Honest Middle
The intellectually honest position on any given correlation sits between two lazy extremes. On one side, the person who treats every correlation as proof of causation. On the other, the person who reflexively dismisses every correlation (or at least the ones they don’t like) with the cliché and walks away feeling clever. Truth does not live at either extreme. It lives in the work of figuring out which kind of correlation you are actually looking at.
With normal distributions, multivariate thinking, monocausality, and now correlation and causality, the statistical toolkit is nearly complete. But there is one more illusion to dismantle before we put any of it to work: the myth of the average person. Next time, we look at why the average does not exist.






