umma.dev

Backpropagation vs the Brain

Every few months, a headline announces that some new neural network has “learned to do X just like the human brain.” And sure, there are surface-level similarities - networks of nodes, weighted connections, something vaguely reminiscent of a neuron firing. But underneath that analogy sits an awkward fact: the most important algorithm in modern deep learning, backpropagation, is almost certainly not how your brain learns anything.

This isn’t a minor technical footnote. It goes to the heart of what AI is, what the brain is, and why those two things are more different than the hype suggests.

What Backpropagation Actually Does

A neural network learns by adjusting the strength (weight) of the connections between its neurons. The question is: how does it know which connections to adjust, and by how much?

Backpropagation answers this by working backwards. You feed an input through the network, get an output, compare that output to the correct answer and compute an error. Then you propagate that error signal backwards through every layer of the network, using the chain rule from calculus to figure out how much each connection contributed to the mistake.

Each weight gets nudged in the direction that would have reduced the error. Do this millions of times across millions of examples and the network starts learning.
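
As a minimal sketch of that loop, assume a toy network with a single hidden unit, a tanh nonlinearity and a squared-error loss. The names train_step, w1, w2 and lr are illustrative choices, not taken from any particular library, and real networks do the same thing with matrices rather than single numbers.

```python
import math

def train_step(w1, w2, x, target, lr=0.1):
    """One backprop step on the toy network y = w2 * tanh(w1 * x)."""
    # Forward pass: input -> hidden -> output.
    h = math.tanh(w1 * x)
    y = w2 * h

    # Compare the output to the correct answer.
    error = y - target
    loss = 0.5 * error ** 2

    # Backward pass: chain rule, from the output back towards the input.
    d_y = error                      # dLoss/dy
    d_w2 = d_y * h                   # dLoss/dw2
    d_h = d_y * w2                   # error sent back through w2
    d_w1 = d_h * (1.0 - h * h) * x   # tanh'(a) = 1 - tanh(a)^2

    # Nudge each weight against its gradient.
    return w1 - lr * d_w1, w2 - lr * d_w2, loss

w1, w2 = 0.5, -0.3   # arbitrary starting weights (an assumption)
losses = []
for _ in range(500):
    w1, w2, loss = train_step(w1, w2, x=1.0, target=0.5)
    losses.append(loss)

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.8f}")
```

Note that the update to w1 needs d_h, which only exists once the error has been computed at the output and passed back through w2.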

It works extraordinarily well. It’s the backbone of GPT, image classifiers, AlphaFold and speech recognition.

The Biological Implausibility Problem

Here’s where backpropagation runs into trouble as a model of the brain. The algorithm requires things that, as far as we know, neurons cannot do.

The Weight Transport Problem

Backpropagation requires that the same weights used in the forward pass (passing signals forward through the network) are also used in the backward pass (sending error signals backwards). Mathematically, the error signal at layer N is multiplied by the very weights that connect layer N-1 to layer N (their transpose, in matrix terms) to obtain the error at layer N-1.

Neurons don’t work like that. A synapse is a one-way street: signals travel from one neuron to another, not back and forth across the same connection with the same strength. For backpropagation to work in the brain, every synapse would need a precise symmetric counterpart carrying information in the reverse direction. There’s no evidence such a system exists.
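
To make the symmetry requirement concrete, here is a small sketch for a single linear layer, with the activation derivative left out to keep the weights in focus. forward and backward are hypothetical helper names, and the numbers in W are arbitrary.

```python
def forward(W, h):
    """Forward pass: y[i] = sum over j of W[i][j] * h[j]."""
    return [sum(W[i][j] * h[j] for j in range(len(h)))
            for i in range(len(W))]

def backward(W, delta_out):
    """Backward pass: delta_h[j] = sum over i of W[i][j] * delta_out[i].

    The very same entries W[i][j] are read again, just indexed the
    other way round (a multiplication by the transpose of W).
    """
    return [sum(W[i][j] * delta_out[i] for i in range(len(W)))
            for j in range(len(W[0]))]

W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # 3 hidden units -> 2 output units

y = forward(W, [1.0, 0.0, -1.0])
delta_h = backward(W, [1.0, -1.0])
print(y)        # [-2.0, -2.0]
print(delta_h)  # [-3.0, -3.0, -3.0]
```

In software this is free, because both functions read the same array. A biological circuit would need a second set of connections whose strengths track the first set exactly.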

Update Locking

Backpropagation requires the network to complete its forward pass, and the backward pass to reach a given layer, before that layer’s weights can be updated. You can’t start learning at layer 3 until layer 4 has finished computing and sent its error signal back to layer 3.

Biological neurons don’t wait. They fire, they update, they keep going in real time, continuously, without a global “done” signal. The brain doesn’t have a clock cycle marking the end of a computation. Learning in the cortex appears to be local and ongoing, not batched and sequential.

The Non-Local Learning Signal

Perhaps most critically, backpropagation is a global algorithm. The gradient at any given synapse depends on the error computed at the very end of the network, passed back through every intermediate layer. A synapse buried deep in the network has to “know” about an error signal that originated far away.

There’s no known biological mechanism for this. A synapse only has access to the activity of the neurons directly connected to it. It cannot see what’s happening three layers downstream. A biologically plausible learning rule would have to be local - dependent only on information available at that synapse.
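
A toy calculation shows the mismatch. Assume the two-weight network y = w2 * tanh(w1 * x) with a squared-error loss; the function names and values below are made up for illustration. The first synapse sees exactly the same local activity in both cases, yet the gradient backprop assigns to it flips sign when only the downstream weight changes.

```python
import math

def backprop_grad_w1(w1, w2, x, target):
    """Gradient of 0.5 * (y - target)^2 w.r.t. w1, for y = w2 * tanh(w1 * x)."""
    h = math.tanh(w1 * x)
    error = w2 * h - target              # computed at the output
    return error * w2 * (1.0 - h * h) * x

def local_signal_w1(w1, x):
    """What the first synapse can observe directly: (pre, post) activity."""
    return x, math.tanh(w1 * x)

# Same synapse, same input; only the downstream weight w2 differs.
g_a = backprop_grad_w1(0.5, w2=1.0, x=1.0, target=0.5)
g_b = backprop_grad_w1(0.5, w2=-1.0, x=1.0, target=0.5)

print(local_signal_w1(0.5, 1.0))   # identical in both cases
print(g_a, g_b)                    # g_a is negative, g_b is positive
```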

This constraint is called the locality principle, and it’s one of the oldest critiques of backpropagation as a model of brain learning, raised by Francis Crick as far back as 1989.

How the Brain Actually Learns (As Far As We Know)

The brain’s learning mechanisms are still far from fully understood, but a few principles are well established.

Hebbian Learning

The classic formulation is “neurons that fire together, wire together.” A synapse strengthens when the presynaptic neuron (the one sending a signal) and the postsynaptic neuron (the one receiving it) are active at the same time. This is purely local: each synapse only needs to observe its own inputs and outputs.

Hebbian learning captures a lot of real neuroscience. But in its basic form it’s unstable (weights can grow without bound), and it’s not obvious how it would enable the kind of structured, error-driven learning that lets you, say, get better at chess.
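
The instability is easy to see in a sketch. Assume a single linear neuron with one constant input and the plain rule “change in weight = learning rate * pre * post”; hebbian_run and its default values are illustrative assumptions.

```python
def hebbian_run(w=0.1, pre=1.0, lr=0.1, steps=100):
    """Apply the basic Hebbian rule repeatedly and record the weight."""
    history = [w]
    for _ in range(steps):
        post = w * pre          # linear neuron: output follows the input
        w += lr * pre * post    # fire together -> wire together
        history.append(w)
    return history

history = hebbian_run()
print(history[0], history[10], history[-1])
```

With a constant input, each step multiplies the weight by (1 + lr), so it grows exponentially. Nothing in the rule itself pushes the weight back down.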

Spike-Timing-Dependent Plasticity (STDP)

Synaptic strength changes depending on the relative timing of pre- and postsynaptic spikes. If neuron A fires just before neuron B, the connection from A to B strengthens. If A fires just after B, the connection weakens.

STDP is well-documented experimentally and provides a temporal structure to Hebbian learning. It’s still local. It’s still nothing like sending a gradient backwards through a chain of matrix multiplications.
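
One common way to model this is an exponential timing window, sketched below. The window shape is a modelling assumption, and a_plus, a_minus and tau_ms are placeholder values rather than measured ones.

```python
import math

def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre."""
    if dt_ms > 0:
        # Pre fired before post: strengthen, more so for short gaps.
        return a_plus * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        # Pre fired after post: weaken, more so for short gaps.
        return -a_minus * math.exp(dt_ms / tau_ms)
    return 0.0

for dt in (5.0, 50.0, -5.0, -50.0):
    print(dt, stdp_dw(dt))
```

The rule takes only two spike times as input, both available at the synapse itself.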

Neuromodulators as a Global Signal

Neuromodulators like dopamine can act as global reward signals. Dopamine release in the striatum correlates with reward prediction error: roughly, “things went better or worse than expected.” This is closer to reinforcement learning than backprop, and the brain seems to use something like it for reward-based learning.

But this is a coarse signal compared to the high-precision gradients backprop computes. It tells the brain “things went well” or “things went badly,” not “the third-layer weight at position [42, 17] needs to increase by 0.003.”
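
For a sense of how little information such a signal carries, here is a sketch of a reward prediction error using a simple running-average update (in the style of the Rescorla-Wagner model, chosen here purely as an illustration). The function name and the learning rate are assumptions.

```python
def reward_prediction_errors(rewards, lr=0.2):
    """Return the prediction error on each trial as expectations adapt."""
    expected = 0.0
    deltas = []
    for r in rewards:
        delta = r - expected      # better or worse than expected?
        expected += lr * delta    # shift the expectation towards reality
        deltas.append(delta)
    return deltas

deltas = reward_prediction_errors([1.0] * 30)
print(deltas[0], deltas[-1])
```

Each trial produces one number shared by every synapse that receives it, whereas backprop produces a separate gradient for every weight. As the reward becomes expected, even that one number shrinks towards zero.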

The Gap Is Real, and It Matters

Researchers have not ignored this mismatch. Predictive coding frameworks, for example, suggest that the brain might compute something like prediction errors at each layer, which could approximate gradients locally. Geoffrey Hinton’s forward-forward algorithm and Yoshua Bengio’s work on equilibrium propagation are attempts to find biologically plausible alternatives.

None of them has yet shown that a brain-like learning rule can match backprop on hard, real-world tasks. The algorithm’s mathematical efficiency - the way it precisely attributes error across a deep hierarchy - is, for now, uniquely powerful.

What This Should Change About How We Talk About AI

“Inspired by the brain” is a useful origin story for neural networks - Rosenblatt’s perceptron, Rumelhart and McClelland’s PDP models, Hopfield networks. The early researchers were trying to model neurons.

The brain remains one of the most efficient learning systems we’ve ever encountered - capable of learning from remarkably few examples, generalising in flexible ways, and consuming about 20 watts. Understanding how it does that is still an open question. And the answer is probably not “it runs gradient descent.”