A celebrated experiment once seemed to prove that babies, long before they can talk, are born preferring the kind over the unkind — but shown the same choice, more than a thousand babies across thirty-seven labs split evenly down the middle

It is one of those findings that feels true the moment you hear it, and that is part of why it traveled so far. Show a baby a little morality play — one character helps, another hinders — and the baby, too young to speak or even crawl well, reaches for the helper. We are born, the story went, already leaning toward the good. The result made the rounds of television programs and popular books and became a fixture of the idea that human decency runs deeper than anything we are taught.

The trouble is that when researchers recently ran the experiment again, on a larger scale and more carefully than ever before, the babies did not reach for the helper. They split down the middle.

The experiment that charmed everyone

The original study, published in 2007 by Kiley Hamlin, Karen Wynn, and Paul Bloom, used what came to be called the hill paradigm. A wooden shape with googly eyes — the “climber” — struggled to get up a hill. Sometimes a second shape, the “helper,” nudged it up from below. Sometimes a third shape, the “hinderer,” shoved it back down from above. Afterward, the babies were offered the two shapes to reach for, and they overwhelmingly chose the helper.

The researchers had an answer ready for the obvious objection. Maybe babies just like upward movement and dislike downward movement, with nothing moral about it. So they ran a version with an inanimate block in place of the climber — same pushing up, same pushing down, but now nothing that could plausibly have a goal. This time the babies showed no preference. That contrast was the heart of the claim: the preference seemed to track helping and hindering, not mere direction. Six-month-olds did it. Ten-month-olds did it. It looked like the first flicker of a moral sense, present before language, before instruction, almost before anything.

It is worth being clear about what we are doing here. We are writers reporting on psychology, not psychologists, and what follows is not a verdict on whether babies have an inner life. It is the story of how one famous piece of evidence held up when the field stopped admiring it and started testing it.

When others looked again

The first crack concerned bouncing. In 2012, a team led by Damian Scarf pointed out that in the original films the climber did more than simply go up: after being helped, it bounced happily at the top of the hill. Babies, it turns out, like bouncing. When Scarf’s group rebuilt the scene so the climber bounced at the bottom after being hindered, the babies preferred the hinderer. When it bounced in both places, they chose at random. The unsettling implication was that a low-level perceptual quirk, not a moral judgment, might have been steering the babies all along. Hamlin and her colleagues pushed back, arguing the climber’s gaze and struggling movements made its goal clear in their version and not in Scarf’s, and the two camps reached an impasse that smaller studies could not break.

What broke it was scale. In a project called ManyBabies, dozens of laboratories agreed to run the very same experiment, with the same procedure and a shared plan registered in advance so no one could quietly adjust the analysis after seeing the data. The result, published in 2025, drew on 1,018 infants tested across thirty-seven labs on five continents. In the social version of the task, 49 percent of the babies reached for the helper — a coin flip. In the version with the pushed object, 56 percent reached for the one that pushed up, also indistinguishable from chance, and no different from the social version. The preference that had launched a thousand headlines simply was not there. Tellingly, Kiley Hamlin, who ran the original study, was one of the authors of the replication that failed to find it.

What a failed replication does, and does not, mean

It is tempting to swing from one tidy story to the opposite one — from “babies are born good” to “babies are little blanks.” The evidence supports neither extreme, and this is the part most worth slowing down for.

A large, careful study failing to find an effect does not prove the effect is imaginary. It means that if something is there, it is weaker, rarer, or slower to appear than the early work suggested — too faint for even a thousand babies to reveal it cleanly in this particular task. The helper preference might be real in some narrower form, or under conditions the standardized version smoothed away. Other lines of infant research, using other methods, continue to find that babies are far from indifferent to how people treat one another. One paradigm wobbling is not the collapse of everything we think we know about babies’ social minds.

What the replication does puncture is the confidence. A result that was told and retold as settled fact, as proof of inborn morality, turns out to rest on a finding that the field’s own most rigorous test could not reproduce. That is not a scandal. It is, in the least dramatic sense, science working: a beloved claim subjected to the kind of scrutiny that beloved claims usually escape, and found wanting.

What this can and cannot tell us

For a parent, the temptation is to read any of this as a scorecard on your child’s character, and it is nothing of the sort. The original study never could have told you whether your baby was kind, and the replication cannot tell you your baby is callous. Both were asking a narrower question — whether, on average, infants in a lab reach toward a helpful shape — and the honest current answer is that they do not reliably do so.

What it can offer is a small lesson in how to hold a wonderful-sounding fact. The next time a study promises that science has proven something tender and flattering about human nature — that we are born generous, born fair, born good — it is worth remembering the babies and the hill. Not because the warm version is necessarily false, but because the warm version is exactly the kind we are least inclined to question, and therefore the kind most in need of a second look.

Babies may yet turn out to arrive with more moral equipment than we can presently measure. For now, the most honest thing to say is that we wanted the story to be true, and a thousand babies declined to confirm it.
