Neuroscience and computer vision have always had a complicated relationship. Computer vision borrowed heavily from neuroscience in its early days (convolutional neural networks were explicitly inspired by Hubel and Wiesel’s work on the cat visual cortex in the 1960s) and then largely went its own way, driven by data, compute, and benchmark scores rather than biology.
Now the two fields are pulling back toward each other, and the recent research is genuinely interesting in both directions: AI is getting better by thinking more like a brain, and neuroscience is getting better by using AI as a tool to understand what the brain is actually doing.
Standard convolutional neural networks apply a fixed grid of square filters across an image. Each filter slides over the image, detecting edges, textures, or patterns within a fixed rectangular window.
The visual cortex doesn’t work this way. Neurons in V1 (the first cortical visual processing area) respond to oriented edges, but their receptive fields are shaped organically - some are narrow and elongated, some are more circular, and they stretch or compress depending on what they’re tuned to detect. The brain matches its filter shape to the job.
A recent paper from researchers at the Institute for Basic Science, Yonsei University, and the Max Planck Institute tried to build this adaptability into AI. Their technique, Lp-Convolution, replaces the fixed square filter with a filter that can dynamically reshape itself, stretching horizontally or vertically depending on what the task demands, using a mathematical framework called the multivariate p-generalised normal distribution. The filters behave more like the selective, tuned receptive fields of cortical neurons.
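To make the idea of a reshapeable filter concrete, here is a minimal sketch of a mask whose outline is controlled by an exponent p and two widths. It assumes a simplified, axis-aligned form, exp(-(|dx/sigma_x|^p + |dy/sigma_y|^p)), with fixed values chosen by hand; the function names `lp_mask` and `apply_mask` are hypothetical, and the paper's actual parameterisation may differ.

```python
import math


def lp_mask(size, p, sigma_x, sigma_y):
    """Build a size x size mask with weights exp(-(|dx/sigma_x|^p + |dy/sigma_y|^p)).

    Assumption: a simplified axis-aligned form used only for illustration.
    p = 2 gives a Gaussian-like bump; large p approaches a flat box;
    unequal sigmas stretch the mask horizontally or vertically.
    """
    centre = (size - 1) / 2.0
    mask = []
    for row in range(size):
        dy = abs((row - centre) / sigma_y) ** p
        mask.append([
            math.exp(-(abs((col - centre) / sigma_x) ** p + dy))
            for col in range(size)
        ])
    return mask


def apply_mask(kernel, mask):
    """Multiply a square kernel elementwise by the mask (both are lists of rows)."""
    return [[k * m for k, m in zip(k_row, m_row)]
            for k_row, m_row in zip(kernel, mask)]


def show(name, mask):
    """Print a mask as a small grid so its shape is visible."""
    print(name)
    for row in mask:
        print(" ".join(f"{v:.2f}" for v in row))
    print()


if __name__ == "__main__":
    show("p=2, round", lp_mask(7, p=2, sigma_x=3.5, sigma_y=3.5))
    show("p=16, nearly square", lp_mask(7, p=16, sigma_x=3.5, sigma_y=3.5))
    show("p=2, stretched horizontally", lp_mask(7, p=2, sigma_x=3.0, sigma_y=1.0))
```

In a trainable version, p and the widths would presumably be learned alongside the kernel weights rather than fixed; here they are constants so the three shapes are easy to compare.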
The practical upshot: on standard image recognition benchmarks (CIFAR-100, TinyImageNet), Lp-Convolution improved accuracy over both traditional CNNs and Vision Transformers while requiring less compute. The researchers presented this at ICLR 2025.
This is a good example of what “brain-inspired” actually means in practice: not mimicking the brain in every detail, but taking a specific principle (adaptive, shape-matched filtering) and asking whether it translates into engineering benefit. Here, it did.
Most deep learning vision models treat an image as a static grid of pixels. The brain doesn’t. It processes the world as a continuous stream of events arriving over time, a fundamentally temporal operation.
A review paper in the journal Machine Intelligence Research argues this is the key failure mode of conventional computer vision: it captures spatial patterns well but largely ignores the spatio-temporal structure of real visual scenes.
The biological alternative starts at the sensor. The retina doesn’t send a full-frame snapshot to the brain every 30 milliseconds. Its ganglion cells fire when something changes in their receptive field, a spike-based, event-driven signal. Neuromorphic cameras (sometimes called spike cameras or event cameras) attempt to replicate this: instead of outputting frames, they output a stream of individual pixel-level events, each timestamped to the microsecond, recording only where and when brightness changed.
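The difference between frames and events is easier to see in code. The sketch below emulates an event stream from a short sequence of frames: a pixel emits an event only when its log-brightness has moved by more than a threshold since its last event. The log-intensity model, the threshold value, and the name `frames_to_events` are assumptions for illustration; a real neuromorphic sensor does this asynchronously in hardware, per pixel, without ever producing frames.

```python
import math


def frames_to_events(frames, timestamps_us, threshold=0.2):
    """Emulate an event stream from a list of frames (lists of rows of positive values).

    Returns a list of (timestamp_us, x, y, polarity) tuples, where polarity is
    +1 for a brightness increase and -1 for a decrease.
    Assumption: events fire when the change in log-intensity since the pixel's
    last event reaches the threshold.
    """
    eps = 1e-6  # avoids log(0) for fully dark pixels
    reference = [[math.log(v + eps) for v in row] for row in frames[0]]
    events = []
    for frame, t in zip(frames[1:], timestamps_us[1:]):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                level = math.log(value + eps)
                diff = level - reference[y][x]
                if abs(diff) >= threshold:
                    events.append((t, x, y, 1 if diff > 0 else -1))
                    reference[y][x] = level  # reset this pixel's reference level
    return events


if __name__ == "__main__":
    # A bright spot moving left to right across the top row of a 2x3 image.
    frames = [
        [[1.0, 0.1, 0.1], [0.1, 0.1, 0.1]],
        [[0.1, 1.0, 0.1], [0.1, 0.1, 0.1]],
        [[0.1, 0.1, 1.0], [0.1, 0.1, 0.1]],
    ]
    for event in frames_to_events(frames, [0, 1000, 2000]):
        print(event)
```

Note that the static bottom row never produces an event: only the pixels the spot leaves or enters show up in the output, which is the property that makes event streams sparse.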
Downstream of that, the review describes several brain-inspired processing models:
Excitation-inhibition balanced networks: borrowed from the brain’s mechanism for keeping neural activity stable. Real cortical circuits maintain a careful balance between neurons that excite their neighbours and neurons that inhibit them. This balance enables rapid signal detection and makes the network less likely to spiral into runaway activity or collapse into silence. Computer implementations of this show faster response times than conventional deep networks.
Spiking continuous attractor networks: a model specialised for tracking moving objects. The key feature is something called spike frequency adaptation: as a neuron fires repeatedly, it gradually becomes harder to fire again. This makes the network “predict ahead” rather than just react, which is exactly the anticipatory behaviour you see in the brain when tracking a moving target.
Hierarchical spike train processing: matching the brain’s visual hierarchy (V1 → V2 → V4 → higher areas), where each stage processes increasingly abstract features from the temporal spike patterns coming up from below.
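Spike frequency adaptation, mentioned above, can be illustrated with a single neuron. The sketch below uses a leaky integrate-and-fire neuron with an extra adaptation variable that jumps on every spike and decays slowly, so a constant input produces spikes that get progressively further apart. This is a generic textbook-style model with hand-picked parameters, assumed here for illustration; it is not the specific network described in the review, and `simulate_adaptive_lif` is a hypothetical name.

```python
def simulate_adaptive_lif(current=2.0, steps=3000, dt=0.1, tau_v=10.0,
                          tau_a=200.0, adapt_step=0.3, v_thresh=1.0):
    """Simulate a leaky integrate-and-fire neuron with spike frequency adaptation.

    v is the membrane potential, a is the adaptation variable that opposes
    the input. Each spike resets v to 0 and raises a by adapt_step; a then
    decays with time constant tau_a. Times are in milliseconds.
    Returns the list of spike times.
    """
    v = 0.0
    a = 0.0
    spike_times = []
    for i in range(steps):
        # Forward-Euler update of membrane potential and adaptation.
        v += dt * (current - v - a) / tau_v
        a += dt * (-a) / tau_a
        if v >= v_thresh:
            spike_times.append(i * dt)
            v = 0.0
            a += adapt_step  # each spike makes the next one harder to trigger
    return spike_times


if __name__ == "__main__":
    spikes = simulate_adaptive_lif()
    gaps = [later - earlier for earlier, later in zip(spikes, spikes[1:])]
    print("spike times (ms):", [round(t, 1) for t in spikes])
    print("inter-spike gaps (ms):", [round(g, 1) for g in gaps])
```

Setting `adapt_step=0.0` removes the adaptation and gives evenly spaced spikes, which makes a useful comparison.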
If AI vision systems are increasingly inspired by the brain, brain scientists are increasingly using AI systems as tools to study the brain itself.
The most striking recent example is TRIBE v2, a foundation model from Meta AI. It’s a single model trained on video, audio, and language simultaneously, using more than 1,000 hours of fMRI data collected from 720 subjects watching and listening to real-world content. The model learns to predict what pattern of brain activity a given piece of video, audio, or language will produce in a given person.
Why does this matter? Because fMRI experiments are slow, expensive, and exhausting for participants. You can realistically run maybe a few dozen stimuli in a session. TRIBE v2 lets you run millions of virtual experiments: feed any hypothetical stimulus into the model and get a prediction of the brain response. The paper calls this in silico neuroscience: neuroscience run inside a computer rather than inside a skull.
The results are compelling in two ways. First, the model’s predictions are several times more accurate than the previous best (linear encoding models, which essentially just learn a direct mapping from stimulus features to voxel activity). Second, the researchers used the model to simulate classic neuroscience experiments (the ones that established well-known findings about how the brain responds to faces, objects, language, and multisensory stimuli), and the model recovered those findings without being explicitly trained to. It discovered the brain’s functional organisation as a byproduct of learning to predict activity.
One particularly interesting finding: TRIBE v2 reveals the fine-grained geography of how the brain integrates information across senses. Vision and audition don’t live in completely separate boxes - there are regions of cortex that respond to both, in complex ways that depend on context. The model makes these integration patterns legible in a way that was previously hard to study directly because you’d need too many experimental conditions to map it by hand.
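For readers unfamiliar with the baseline mentioned above, a linear encoding model can be sketched in a few lines: one weight per (feature, voxel) pair, fitted by regularised least squares. Ridge regression is assumed here as the fitting method, the data are synthetic, and every name (`fit_ridge`, `predict`, and the helpers) is hypothetical. This illustrates the baseline only, not TRIBE v2 itself.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ridge(features, responses, alpha=1.0):
    """Return W minimising ||features @ W - responses||^2 + alpha * ||W||^2.

    features: one row per stimulus, one column per stimulus feature.
    responses: one row per stimulus, one column per voxel.
    """
    f_t = transpose(features)
    gram = matmul(f_t, features)
    for i in range(len(gram)):
        gram[i][i] += alpha  # ridge penalty on the diagonal
    return solve(gram, matmul(f_t, responses))


def predict(features, weights):
    """Predicted voxel responses for new stimuli: a single matrix product."""
    return matmul(features, weights)


if __name__ == "__main__":
    random.seed(0)
    true_weights = [[1.0, -2.0], [0.5, 0.0], [-1.5, 3.0]]  # 3 features, 2 voxels
    features = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
    responses = [[y + random.gauss(0, 0.1) for y in row]
                 for row in matmul(features, true_weights)]
    weights = fit_ridge(features, responses, alpha=1.0)
    for row in weights:
        print([round(w, 2) for w in row])
```

The whole model is the weight matrix, which is why such baselines are easy to interpret but limited in what they can capture.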
The broader claim the authors make is that AI foundation models might become a “unifying framework” for neuroscience - a way of bridging the dozens of specialised, isolated models that different labs have built for different brain regions and cognitive tasks, into something coherent.
A more applied convergence is happening in neuroimaging, using computer vision and deep learning to analyse brain scans for clinical purposes.
A Frontiers in Neuroscience special issue, archived on PMC, collected 17 papers on AI in neuroimaging, covering a notable range of applications.
The consistent theme is that AI is better than manual analysis at integrating information across modalities and time, finding patterns too subtle or too distributed across the scan to be obvious to a human reader.
On the pure computer vision side, a collection of recent papers shows the practical influence of neuroscience-adjacent thinking even in systems that don’t explicitly invoke biology:
Attention mechanisms, now ubiquitous in Vision Transformers and U-Net variants, have a clear neuroscientific analogue in the top-down attentional modulation the brain applies to sensory processing. The cortex doesn’t passively receive visual input; it actively amplifies what’s relevant and suppresses what isn’t, based on current task and expectations. Attention in neural networks does something structurally similar, weighting different parts of the input based on context.
Lightweight, efficient architectures are showing that you don’t need brute-force scale to get strong performance. Much of this work is driven by deployment constraints (running vision models on phones, embedded systems, autonomous vehicles), but it echoes a real feature of biological vision: the brain achieves remarkable perceptual performance at extremely low power - about 20 watts for the entire brain - through extreme efficiency of representation and processing.
Medical image segmentation using transformer-based models with attention mechanisms is achieving state-of-the-art results for lymph node detection and other diagnostic tasks where spatial precision matters enormously.
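The context-dependent weighting that attention performs, described above, comes down to a small computation. Below is a minimal sketch of scaled dot-product attention, the form used in Transformers, written in plain Python for readability. The function names and toy inputs are illustrative assumptions; real models add learned projections, multiple heads, and batched tensor operations.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    For each query, score every key by dot product (scaled by sqrt of the
    key dimension), convert scores to weights with softmax, and return the
    weighted average of the values along with the weights themselves.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        output = [sum(w * v[j] for w, v in zip(weights, values))
                  for j in range(len(values[0]))]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    queries = [[5.0, 0.0]]  # strongly aligned with the first key
    outputs, weights = attention(queries, keys, values)
    print("weights:", [round(w, 3) for w in weights[0]])
    print("output:", [round(o, 3) for o in outputs[0]])
```

The query that matches the first key pulls most of its output from the first value: relevant input is amplified and the rest is suppressed, which is the loose parallel to attentional modulation in cortex.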
Step back and the pattern is clear. There are roughly three modes of interaction happening simultaneously:
Neuroscience → AI: Biological principles get formalised and tested as engineering choices. Dynamic filter shapes (Lp-Convolution), excitation-inhibition balance, spike-based temporal processing, hierarchical representations - these ideas from biology are being translated into systems that compete with or beat standard approaches on real tasks.
AI → Neuroscience: Large-scale models trained on perception data become tools for studying perception itself. TRIBE v2 is the clearest example: a model that predicts brain activity well enough to run experiments the real brain would be too slow and expensive to participate in.
AI applied to neuroscience data: Separate from whether the AI is brain-inspired, deep learning is simply becoming the standard tool for analysing the data neuroscience produces - MRI, EEG, PET, spike recordings. This is less conceptually exciting but probably has the largest near-term clinical impact.
Computer vision was famously dismissive of neuroscience for much of the deep learning era. The ImageNet moment in 2012 seemed to suggest that scale and data were all that mattered; biological plausibility was a distraction. For a while, that was roughly correct - brute-force scaling of CNNs and then Transformers produced results that neuroscience-inspired approaches couldn’t match.
The shift now is partly because the easy scaling gains are getting harder to find, and partly because the problems that remain - efficient inference, temporal reasoning, robustness to distribution shift, learning from less data - are precisely the ones where the brain remains far ahead of our best systems. When you’re trying to close those gaps, looking at how evolution solved them starts to seem more useful.
At the same time, neuroscience is running into the limits of what it can learn from traditional experiments. fMRI gives you spatial resolution but not temporal. EEG gives you temporal resolution but not spatial. You can’t run every experiment you want on a human subject. Computational models that accurately predict brain responses open up the experimental space enormously.
Institute for Basic Science / Yonsei University / Max Planck Institute. Lp-Convolution: Brain-inspired adaptive filters for image recognition. Presented at ICLR 2025. ScienceDaily coverage
Meta AI. TRIBE v2: A Foundation Model of Vision, Audition, and Language for In Silico Neuroscience. ai.meta.com
Frontiers in Neuroscience. Research collection: Computer vision advances in applied deep learning. (2025). frontiersin.org
Machine Intelligence Research. Towards a New Paradigm for Brain-inspired Computer Vision. (2022). mi-research.net
PMC / Frontiers in Neuroscience. Special Issue: Advances of Artificial Intelligence in Neuroimaging. PMC12025469. pmc.ncbi.nlm.nih.gov
Neurocomputing. Research article on neuroscience-inspired computer vision. (2025). sciencedirect.com
Science of the Total Environment / ScienceDirect. Research article on AI and visual neuroscience. (2025). sciencedirect.com