What Transformers Could Learn from the Human Brain

Why prediction alone isn't enough, and how neuroscience could shape the next generation of AI.

Dr Daniel Navarro
Jun 28, 2025

Hi there - and welcome to Rethinking Intelligence.

If you are new, this is a space where neuroscience meets machine learning. We explore how brain-inspired computation might unlock the next leap in AI - from predictive coding to planning agents and beyond.

This post kicks off our flagship series.


The Illusion of Intelligence

Large Language Models (LLMs) such as GPT-4, Claude, and Gemini have captured global fascination. They write code, pass exams, and hold conversations that feel intelligent. Yet beneath the surface, they are statistical engines trained to predict the next word in a sequence.

This raises the question:

Are these systems truly intelligent, or simply convincing mimics?

Biological brains have been solving prediction problems for far longer, and with far greater efficiency, than any artificial neural network. What might artificial intelligence learn from nature?

This essay explores how predictive coding, a central theory in neuroscience, compares with transformer-based AI models. More importantly, it asks: What cognitive ingredients are still missing from our most advanced AI systems?


1. The Brain as a Prediction Machine

Predictive coding, formalised by Rao & Ballard (1999), proposes that the brain constantly generates internal models of the world, forecasting sensory input before it arrives. When predictions match reality, the brain stays quiet. When they don't, it updates its model based on the difference: the prediction error.

Friston’s Free Energy Principle (2010) builds on this: intelligence is about minimising surprise (or technically, variational free energy) over time. The brain is not just reactive: it is proactive, efficient, and deeply hierarchical.
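
To make the mechanism concrete, here is a minimal NumPy sketch of a single predictive-coding unit: a belief generates a top-down prediction, the mismatch becomes a prediction error, and both the belief and the generative weights are nudged to reduce that error. The dimensions, learning rates, and initialisation are illustrative placeholders, not parameters from Rao & Ballard's model.

```python
# A minimal, illustrative sketch of one predictive-coding step
# (not Rao & Ballard's full hierarchical model): a latent "belief" mu
# generates a prediction of the input x via weights W; the prediction
# error drives updates that reduce squared error ("surprise").
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)                   # incoming sensory sample
W = rng.normal(scale=0.1, size=(8, 4))   # generative weights (illustrative)
mu = np.zeros(4)                         # current belief about hidden causes

for _ in range(50):                      # inference: settle the belief on this input
    prediction = W @ mu                  # top-down prediction of the input
    error = x - prediction               # prediction error (the "surprise" signal)
    mu += 0.1 * (W.T @ error)            # move the belief to reduce the error

# slow learning step: adjust the generative weights with the residual error
W += 0.01 * np.outer(x - W @ mu, mu)
print(float(np.mean((x - W @ mu) ** 2)))  # residual prediction error
```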

Recent work has shown that predictive coding networks can be scaled to depths exceeding 100 layers, narrowing the gap between biologically inspired models and deep learning architectures (Innocenti et al., 2025).


2. How Transformers Predict

Transformers, by contrast, are feedforward models trained to predict the next token in a sequence. Introduced in Attention is All You Need (Vaswani et al., 2017), they replace recurrence with self-attention. This enables the model to weigh all previous tokens when predicting the next.
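
For readers who want to see the mechanism itself, here is a minimal single-head causal self-attention in NumPy. The projection matrices and dimensions are illustrative; real Transformers stack many such heads with residual connections, layer normalisation, and feedforward blocks.

```python
# Minimal single-head causal self-attention, to make "weigh all previous
# tokens" concrete. Shapes and projection matrices are illustrative.
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings -> attended values."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # pairwise relevance
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf                              # hide future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the past
    return weights @ V                                  # each token mixes its history

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                            # 5 tokens, d_model = 16
W = [rng.normal(scale=0.1, size=(16, 16)) for _ in range(3)]
print(causal_self_attention(X, *W).shape)               # (5, 16)
```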

LLMs scale this architecture massively and model complex linguistic patterns. Yet they fundamentally lack several key ingredients of biological cognition, such as:

  • Intrinsic goals

  • Grounding in real-world causality

  • Sensory input

While newer multimodal LLMs can process sensory-like data (images, audio) as tokens, this differs from experiencing the world through continuous, integrated sensory modalities as biological systems do.

As a consequence, they excel at mimicry, but struggle with:

  • Common-sense reasoning

  • Long-term planning

  • Goal-directed behaviour

  • Adaptive memory

So, while LLMs can predict, do they truly understand? Some researchers have noted that attention mechanisms in deep learning models do not map cleanly to human attention — conceptually or functionally (Lindsay, 2020).


3. What the Brain Has That Transformers Don’t

Here’s how some core cognitive traits compare between the brain and current Transformer-based models:

Prediction

  • Brain: Hierarchical, context-aware

  • Transformer: Autoregressive, shallow (short-range)

Feedback Loops

  • Brain: Extensive reentrant processing

  • Transformer: Mostly absent

Memory

  • Brain: Working, episodic, long-term

  • Transformer: Limited context window, patched with retrieval mechanisms

Energy Efficiency

  • Brain: ~20W power usage

  • Transformer: Requires extensive GPU computation

Sensory Grounding

  • Brain: Embodied in multimodal sensorimotor experience

  • Transformer: Often symbolic or text-based; some recent models incorporate images and audio, but lack true embodied grounding

While many Transformer models remain symbolic and primarily text-based, others, such as DeepMind's Gato (Reed et al., 2022), incorporate multimodal input and interaction with simulated environments to address the gap in embodied grounding.

Goals

  • Brain: Intrinsic and adaptive

  • Transformer: Externally prompted and task-constrained

Recent models like RWKV (Peng et al., 2023) and Hyena (Poli et al., 2023) reintroduce architectural features such as recurrence and prioritise efficient sequence processing to overcome Transformer limitations. Mamba (Gu & Dao, 2023) pushes further with selective state space models, which support recurrence, content-dependent temporal abstraction, and markedly better efficiency.
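
To illustrate the contrast with attention, here is a toy diagonal linear recurrence, loosely the flavour of state-space models: a fixed-size state is updated token by token instead of re-reading the whole history. The fixed decay and gain parameters are illustrative; Mamba makes them input-dependent ("selective") and computes the recurrence with an efficient parallel scan.

```python
# A toy diagonal linear recurrence: constant-size state per step instead
# of attending over the whole history. Parameters here are fixed and
# illustrative, not those of RWKV or Mamba.
import numpy as np

def ssm_scan(x, decay, gain_in, gain_out):
    """x: (seq_len, d). Returns outputs computed with O(1) state per step."""
    h = np.zeros(x.shape[1])
    ys = []
    for x_t in x:                        # recurrent, no attention over history
        h = decay * h + gain_in * x_t    # fold the new token into the state
        ys.append(gain_out * h)          # read out from the compressed state
    return np.stack(ys)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))
y = ssm_scan(x, decay=np.full(4, 0.9), gain_in=np.ones(4), gain_out=np.ones(4))
print(y.shape)                           # (10, 4)
```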

TransformerFAM (Hwang et al., 2024) uses feedback attention to construct an internal working memory, demonstrating that adding feedback mechanisms to Transformer architectures can significantly improve long-context handling and memory capacity, which the authors argue is a key prerequisite for reasoning.
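
As a loose illustration of the feedback-memory idea (not TransformerFAM's actual mechanism), the sketch below processes a long stream in segments, carries a small summary vector forward, and lets each segment attend to that summary as extra context.

```python
# Loose sketch of feedback memory: segments attend to a persistent summary
# that is itself updated from their outputs. Purely illustrative.
import numpy as np

def attend(queries, context):
    scores = queries @ context.T / np.sqrt(context.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ context

def process_stream(segments, d):
    memory = np.zeros((1, d))                        # persistent summary "token"
    outputs = []
    for seg in segments:                             # each seg: (seg_len, d)
        context = np.concatenate([memory, seg])      # memory is always visible
        out = attend(seg, context)
        memory = 0.5 * memory + 0.5 * out.mean(0, keepdims=True)  # feedback update
        outputs.append(out)
    return np.concatenate(outputs), memory

rng = np.random.default_rng(0)
segments = [rng.normal(size=(6, 8)) for _ in range(4)]
out, mem = process_stream(segments, d=8)
print(out.shape, mem.shape)                          # (24, 8) (1, 8)
```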

This growing body of research suggests a shift: modern AI may need to rediscover the very mechanisms evolution refined in biological systems.


4. What AI Can Learn from the Brain

Several architectural insights from neuroscience may enhance next-generation AI:

  • Prediction Error Signalling: Enable models to pass local prediction errors forward through hierarchical layers, mimicking how brains adjust internal beliefs based on surprise.

  • Recurrence and Memory: Move beyond context windows; enable true internal state.

Recent studies have explored augmenting Transformers with recurrent mechanisms. One approach, depth-wise recurrence with dynamic halting (Chowdhury & Caragea, 2024), enables adaptive computational depth; a toy sketch of the halting idea appears after this list. Another, chunk-wise recurrence via temporal latent bottlenecks (Didolkar et al., 2022), consolidates information by combining fast and slow processing streams. These innovations address core limitations of standard Transformers, including high computational cost, rigid processing depth, and difficulty generalising to long or unfamiliar sequences.

  • Sparse and Modular Computation: Emulate the brain’s efficiency.

  • Multi-modal Grounding: Integrate sensory modalities for richer representation.

  • Goal-Directed Planning: Design agents with intrinsic reward systems or self-updating goals.
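
Below is the promised sketch of depth-wise recurrence with dynamic halting, written in the spirit of adaptive-computation ideas rather than the exact formulation of Chowdhury & Caragea (2024): a shared block is applied repeatedly, and a halting score decides how much depth each input receives.

```python
# Illustrative depth-wise recurrence with dynamic halting: the same block
# refines the hidden state until an accumulated halting score says "stop".
# Weights and thresholds are illustrative, not from any published model.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recurrent_depth(h, W_block, w_halt, max_steps=8, threshold=0.99):
    """Apply a shared block repeatedly; halt when the cumulative score is high."""
    cumulative, steps = 0.0, 0
    while cumulative < threshold and steps < max_steps:
        h = np.tanh(W_block @ h)             # one more pass of the shared block
        cumulative += sigmoid(w_halt @ h)    # accumulate the halting score
        steps += 1
    return h, steps                          # easy inputs halt earlier

rng = np.random.default_rng(0)
h0 = rng.normal(size=16)
W = rng.normal(scale=0.3, size=(16, 16))
w = rng.normal(scale=0.3, size=16)
print(recurrent_depth(h0, W, w)[1])          # number of refinement steps used
```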

While DeepMind projects such as Gato (Reed et al., 2022) work from the hypothesis that scaling data, compute, and model parameters can lead to generalist AI, other researchers argue that scaling alone, without addressing architectural inefficiencies or the absence of genuine understanding, may not be enough to solve intelligence (Bender et al., 2021).


5. Intelligence Is More Than Prediction

Predictive coding teaches us that intelligence involves more than reaction. It requires anticipation, adaptation, and purpose. The human brain is generative, model-driven, and constantly adjusting to incoming sensory feedback.

Today's LLMs are powerful tools, and with frameworks like AutoGPT and Open Interpreter, plus memory features layered on top of models such as GPT-4, we are seeing early forms of memory and agent-like behaviour emerge. Agent frameworks such as ReAct, Voyager, and LangGraph also show promise in enabling goal-conditioned tool use and iterative planning.

But these systems still fall short of true intelligence:

  • Memory is externally scaffolded or narrowly scoped

  • Embodiment is largely symbolic or simulated

  • Goals are prompt-engineered (not intrinsically generated)

LLMs simulate agency, but do not yet possess it.

As Bender et al. (2021) caution, surface-level fluency can be misleading: what appears intelligent may simply be statistical mimicry without real comprehension or intent.

To build genuinely intelligent machines, we may need to return to the first intelligent system we ever knew: the brain.


References

  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623).

  • Chowdhury, J. R., & Caragea, C. (2024). Investigating Recurrent Transformers with Dynamic Halt. arXiv preprint arXiv:2402.00976.

  • Didolkar, A., Gupta, K., Goyal, A., Gundavarapu, N. B., Lamb, A. M., Ke, N. R., & Bengio, Y. (2022). Temporal latent bottleneck: Synthesis of fast and slow processing mechanisms in sequence learning. Advances in Neural Information Processing Systems, 35, 10505–10520.

  • Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.

  • Gu, A., & Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.

  • Hwang, D., Wang, W., Huo, Z., Sim, K. C., & Mengibar, P. M. (2024). TransformerFAM: Feedback attention is working memory. arXiv preprint arXiv:2404.09173.

  • Innocenti, F., Achour, E. M., & Buckley, C. L. (2025). μPC: Scaling Predictive Coding to 100+ Layer Networks. arXiv preprint arXiv:2505.13124.

  • Lindsay, G. W. (2020). Attention in psychology, neuroscience, and machine learning. Frontiers in Computational Neuroscience, 14, 29.

  • Peng, B., Alcaide, E., Anthony, Q., Albalak, A., Arcadinho, S., Biderman, S., Cao, H., Cheng, X., Chung, M., Grella, M., & GV, K. K. (2023). RWKV: Reinventing RNNs for the Transformer era. arXiv preprint arXiv:2305.13048.

  • Poli, M., Massaroli, S., Nguyen, E., Fu, D.Y., Dao, T., Baccus, S., Bengio, Y., Ermon, S., & Ré, C. (2023, July). Hyena hierarchy: Towards larger convolutional language models. In International Conference on Machine Learning (pp. 28043–28078). PMLR.

  • Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87.

  • Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S.G., Novikov, A., Barth-Maron, G., Gimenez, M., Sulsky, Y., Kay, J., Springenberg, J.T., & Eccles, T. (2022). A generalist agent. arXiv preprint arXiv:2205.06175.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.


Stay in the Loop

This post is part of Rethinking Intelligence, a series exploring how neuroscience can shape the future of AI, from predictive coding to intelligent agents.

→ For early access, subscriber-only posts, and the full roadmap:
🔗 Subscribe on Substack

→ Prefer reading on Medium? Follow this publication for future posts:
🔗 Follow on Medium

🧠 Next up: How Brains Learn from Almost Nothing: Why AI needs mountains of data, and the brain doesn’t.
