The real unit of intelligence is not the model. It is the collective.
For the past several years, the AI industry has measured progress by a single metric: how smart is the model? We benchmark on reasoning tasks, score on graduate-level exams, count parameters, and measure tokens per second. The implicit assumption is that intelligence is a property of an individual system, something you can isolate, test, and rank.
That assumption is wrong. The evidence has been in front of us the whole time.
Intelligence Was Never a Solo Act
Before we talk about AI, let us talk about human intelligence: the kind that built civilizations, cured diseases, and put telescopes in orbit. Human intelligence has never been an individual phenomenon. It is a collective one.
When Newtonian mechanics reached its limits, when physicists tried to apply it to the atom and it broke completely, the response was not a single genius solving the problem alone. It was a collective. Planck quantized energy in 1900. Bohr proposed the atomic model in 1913. Heisenberg formulated matrix mechanics in 1925. Schrödinger published his wave equation in 1926. Dirac unified quantum mechanics with special relativity in 1928. None of them could have done it alone. Each built on what the others had done, questioned what the others had assumed, and handed something forward that the next mind could stand on.
Schrödinger's equation, which describes how quantum systems evolve, is not just a piece of physics history. It is the computational foundation of modern materials science and drug discovery. Every density functional theory calculation, every molecular simulation, every quantum chemistry software package is, at its core, an attempt to solve or approximate Schrödinger's equation for systems too complex to solve analytically. When a startup today uses AI to accelerate materials discovery or catalyst design at the atomic level, it stands on Schrödinger's shoulders; Schrödinger stood on Heisenberg's, Heisenberg on Bohr's, and Bohr on Planck's.
This is collective intelligence in its most literal form. Not a metaphor. A mechanism. Knowledge accumulating across generations, each layer enabling the next, until an AI agent running in an autonomous lab can propose a new catalyst by reasoning from first quantum principles in seconds rather than decades.
Go back further. Primate intelligence, as Robin Dunbar's social brain hypothesis argues, scaled not with habitat difficulty but with social group size. The brain grew because tracking complex social relationships required it. The brain, at its evolutionary root, is a social organ.
Writing, law, bureaucracy, double-entry bookkeeping: each was an externalization of social cognition into infrastructure. A Sumerian scribe administering a grain accounting system did not need to comprehend its macroeconomic function. The system itself was functionally smarter than any participant within it.
The point is not that individuals do not matter. It is that intelligence, at every scale that has ever changed the world, is a property of systems, not nodes.
What Reasoning Models Are Actually Doing
This reframe matters enormously for how we interpret what is happening inside today's frontier AI systems.
Researchers from Google, the University of Chicago, and the Santa Fe Institute have revealed something striking about how reasoning models like DeepSeek-R1 and QwQ-32B actually work. In their February 2026 paper, "Reasoning Models Generate Societies of Thought" (arXiv:2601.10825), they show that these models do not get better at hard tasks simply by thinking longer. What actually happens inside their extended chain-of-thought is more interesting: they simulate multi-perspective internal debates. Distinct cognitive stances argue with each other, question each other's premises, and reconcile their differences.
The critical detail is this: none of these models were trained to do this. When reinforcement learning is applied with accuracy as the sole reward signal, multi-perspective conversational behavior emerges spontaneously. The models are rediscovering, through optimization pressure alone, what centuries of epistemology have suggested: robust reasoning is a social process, even when it occurs inside a single mind.
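The intuition that structured disagreement among biased perspectives can outperform any single perspective has a simple statistical face. The sketch below is a deliberately tiny, rule-based caricature, not the paper's mechanism; the stance names and bias factors are invented for illustration.

```python
import statistics

def society_of_thought(truth, stances):
    """Toy reconciliation: each stance proposes an answer and the group
    settles on the median, which resists any single biased perspective."""
    proposals = {name: fn(truth) for name, fn in stances.items()}
    return statistics.median(proposals.values()), proposals

# Three hypothetical stances with systematic biases (illustrative only).
stances = {
    "optimist":  lambda x: x * 1.50,   # habitually overshoots
    "pessimist": lambda x: x * 0.70,   # habitually undershoots
    "analyst":   lambda x: x * 1.02,   # nearly calibrated
}

answer, proposals = society_of_thought(100.0, stances)
# The median sits closer to the truth than either extreme stance alone.
```

The point of the toy is the reconciliation step: the collective answer is more robust than the optimist's or the pessimist's, even though no participant was individually well calibrated.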
This should change how we think about model scaling. The question is not just "how many parameters?" or "how many tokens of pretraining?" It is "how rich is the internal social structure of the model's reasoning?" Team science, small-group sociology, and social psychology have spent a century studying how composition, hierarchy, role differentiation, and structured conflict shape collective performance. These fields are suddenly relevant as blueprints for AI architecture. Almost none of that research has been brought to bear yet.
Biology Already Solved This
The spontaneous emergence of internal debate inside reasoning models is not surprising to anyone who has been paying attention to neuroscience for the past three decades. Biology arrived at the same solution long before silicon did.
The human brain does not run on synchronized, continuous computation the way a GPU does. It runs on spikes: discrete, asynchronous electrical events that fire only when a neuron's input crosses a threshold. Most neurons are silent most of the time. Computation is event-driven, sparse, and massively parallel across roughly 86 billion nodes, each connected to thousands of others. The result is a system that consumes roughly 20 watts while outperforming any silicon architecture ever built on tasks requiring flexible, contextual reasoning.
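The event-driven character of spiking computation can be sketched in a few lines. This is the textbook leaky integrate-and-fire model, stripped to its essentials; thresholds and leak rates are illustrative, not calibrated to biology.

```python
def lif_neuron(inputs, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: the membrane potential decays each
    step, accumulates input, and emits a discrete spike only when it
    crosses the threshold (then resets). Most steps produce no event."""
    v, spikes = 0.0, []
    for t, current in enumerate(inputs):
        v = v * leak + current
        if v >= threshold:
            spikes.append(t)
            v = 0.0
    return spikes

# Weak background input never crosses threshold; a brief strong burst does.
background_then_burst = [0.1] * 5 + [0.6, 0.6] + [0.1] * 5
spike_times = lif_neuron(background_then_burst)
```

Notice what the output is: a short list of event times, not a dense vector of activations. Silence is the default, which is exactly why the power budget can be so small.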
Neuromorphic computing, pioneered by Caltech's Carver Mead in the late 1980s and accelerating rapidly today, attempts to replicate this architecture in hardware. Intel's Loihi 2 chip supports up to one million neurons and 120 million synapses on a single die. Its Hala Point system scales to billions. IBM's TrueNorth, BrainChip's Akida, and Manchester's SpiNNaker 2 are all pushing similar frontiers. This week, Intel released Loihi 3 and IBM transitioned NorthPole to full production, marking neuromorphic computing's entry into the commercial mainstream.
The relevance to collective intelligence runs across three layers.
First: architectural proof. The biological brain is not organized as a single unified processor. It is a society of specialized regions: prefrontal cortex for planning, hippocampus for memory consolidation, amygdala for threat detection, cerebellum for procedural skill. These regions communicate asynchronously, in parallel, through sparse spike-based signals. When reasoning models spontaneously develop internal debates between distinct cognitive perspectives, they are converging on something the brain discovered through 500 million years of evolution: that distributed, asynchronous, multi-perspective processing is what robust intelligence looks like from the inside.
Second: the hardware substrate that future collectives will run on. Today's transformer-based AI agents live in data centers: power-hungry, latency-constrained, tethered to the cloud. Neuromorphic chips change that equation. Because spiking neurons only consume significant power when they fire, and most neurons are silent most of the time, the overall power draw is drastically lower, often in the milliwatt range. As these chips mature, always-on AI agents become feasible at the edge, embedded in physical environments, running on ambient power, without requiring a round-trip to a data center.
Third: the learning architecture. Unlike traditional artificial neural networks, spiking neural networks operate with local plasticity rules, where synaptic weights update based on the timing of neural signals rather than a global loss function. This is unsupervised, decentralized learning: no backpropagation, no central supervisor. Each synapse learns from local information only. That is precisely the model of institutional alignment we need at scale.
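The canonical local plasticity rule is spike-timing-dependent plasticity (STDP). A minimal sketch, with illustrative learning rates and time constant:

```python
import math

def stdp_update(w, t_pre, t_post, a_plus=0.05, a_minus=0.05, tau=20.0):
    """Spike-timing-dependent plasticity: if the presynaptic spike precedes
    the postsynaptic one (a causal pairing), strengthen the synapse;
    otherwise weaken it. The update uses only the two spike times local
    to this synapse -- no global loss function, no backpropagation."""
    dt = t_post - t_pre
    if dt > 0:
        return w + a_plus * math.exp(-dt / tau)    # potentiation
    return w - a_minus * math.exp(dt / tau)        # depression

strengthened = stdp_update(0.5, t_pre=10.0, t_post=12.0)  # pre fired first
weakened = stdp_update(0.5, t_pre=12.0, t_post=10.0)      # post fired first
```

Every argument to the rule is locally observable at the synapse. That locality is the architectural point: learning proceeds without any component seeing the whole system.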
Transformer-based reasoning models are rediscovering internal social debate. Neuromorphic systems are rediscovering event-driven, decentralized learning. Both are independently arriving at the same conclusion the brain encoded long ago: intelligence at scale is a property of how components interact, not of any single component.
The Centaur Is Already Here
Step outside the model and the picture becomes clearer. We have entered the era of human-AI centaurs: composite actors that are neither purely human nor purely machine.
This is not a metaphor. It is the literal structure of how high-value knowledge work is being reorganized right now, across engineering, law, medicine, finance, research, and design. A corporation comprising thousands of humans is already a single legal person, acting with a collective agency that no individual member fully controls. The explosion of agentic AI is now seeding the possibility of something analogous, operating at the scale of billions of interacting minds, human and non-human alike.
Conflict, in this architecture, is not a bug. It is a resource.
The Alignment Problem Is an Institutional Design Problem
This is where the reframe becomes urgent for practitioners.
The dominant paradigm for AI alignment, Reinforcement Learning from Human Feedback, is a dyadic model. One trainer, one model, a correction loop that resembles a parent-child relationship. It has been enormously productive. It has also reached a structural ceiling.
RLHF cannot scale to billions of agents interacting in complex social configurations. If the real unit of intelligence is the collective, then alignment is not fundamentally a model-training problem. It is an institutional design problem.
Human societies do not maintain cooperation through individual virtue. They do it through persistent institutional templates: courtrooms with defined roles for judges, attorneys, and juries; markets with price signals and contract law; bureaucracies with procedural rules and audit mechanisms. These institutions are more intelligent than any of their participants because they encode accumulated social learning about how to coordinate under uncertainty, manage conflict, and correct errors over time.
The implication is direct: scalable AI ecosystems require digital equivalents. Not better RLHF. Role protocols. Procedural norms. Constitutional structures where the identity of any agent matters less than its ability to fulfill a well-defined slot.
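What "a well-defined slot" could mean in code is easy to sketch. The role names, fields, and thresholds below are hypothetical, invented purely to illustrate the idea that the protocol, not the occupant, carries the alignment guarantee.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    """An institutional slot: the protocol matters, not who occupies it."""
    name: str
    may_decide: bool             # can this role ever act autonomously?
    escalation_threshold: float  # risk above which escalation is mandatory

def act(role: Role, risk: float) -> str:
    """Procedural norm applied identically to any agent filling the role."""
    if not role.may_decide or risk > role.escalation_threshold:
        return "escalate"
    return "decide"

# Hypothetical roles for illustration.
OPERATOR = Role("operator", may_decide=True, escalation_threshold=0.7)
REVIEWER = Role("reviewer", may_decide=False, escalation_threshold=0.0)
```

Any agent, human or machine, bound to OPERATOR is subject to the same escalation rule; swapping the occupant does not change the institution's behavior.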
Read Sculley et al.'s 2015 paper, "Hidden Technical Debt in Machine Learning Systems," today and something becomes clear: the authors were not describing a technical problem. They were describing the failure modes of a poorly designed social system. Boundary erosion is what happens when there are no stable role protocols between components. Entanglement is what happens when there is no institutional boundary between agents. Hidden feedback loops are what you would expect from a system with no audit mechanism and no checks and balances.
The vocabulary was software engineering. The problem was governance.
A decade later, we are running the same pattern at a civilizational scale. The debt is not technical anymore. It is governance debt. Unlike a tangled ML pipeline, you cannot refactor a civilization in a weekend sprint.

Figure 1: The five frameworks that define how collective intelligence works at scale, from the human-AI composite actor to the governance debt that accumulates without institutional design.
What This Means for How You Build
The practical implications form four load-bearing principles. Miss any one and the architecture is unsound.
Principle 1: Benchmark composites, not components. The relevant unit of performance is the human-AI team, not the model in isolation. A model that scores lower on standard benchmarks but integrates better with a human workflow is often more valuable in deployment than one that scores higher in testing. We need evaluation frameworks for ensembles, metrics for collaboration quality, and accountability structures that measure collective output.
Principle 2: Treat organizational structure as an architecture decision. The social and organizational sciences have spent a century studying how team size, composition, hierarchy, role differentiation, and conflict norms shape collective performance. How many agents can a working group hold before coordination costs exceed capacity gains? When should an agent escalate rather than decide autonomously? These are first-class architectural decisions that belong in the design phase, not afterthoughts.
Principle 3: Build institutional infrastructure with the same rigor as model capability. Every multi-agent system needs defined role protocols, escalation paths, audit mechanisms, and conflict resolution procedures. Building without them accumulates, at the system level, the same kind of hidden debt Sculley et al. documented at the pipeline level.
Principle 4: Redefine who the user is. In an agentic world, users are not individual humans making discrete requests. They are composite actors: human-AI ensembles, multi-agent workflows, organizational entities that span biological and silicon components in shifting configurations. Design for that actor, not for the simplified individual user of the pre-agentic era.
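One of the institutional mechanisms named in Principle 3, the audit mechanism, can be made concrete. Below is a minimal, hypothetical sketch: an append-only log where each entry commits to its predecessor's hash, so tampering with the record is detectable after the fact. The class and field names are invented for illustration.

```python
import hashlib
import json

class AuditLog:
    """Append-only, hash-chained record of agent actions: each entry commits
    to its predecessor's hash, so any after-the-fact edit is detectable."""

    def __init__(self):
        self.entries = []
        self._prev = "genesis"

    def record(self, agent: str, action: str) -> None:
        entry = {"agent": agent, "action": action, "prev": self._prev}
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self.entries.append(entry)
        self._prev = digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = "genesis"
        for e in self.entries:
            body = {k: e[k] for k in ("agent", "action", "prev")}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```

The design choice worth noting: verification requires no trusted central supervisor, only the log itself, which mirrors the decentralized-accountability theme of the institutional argument.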
Intelligence Is Not Singular
The singularity narrative imagines one vast mind ascending to godlike intelligence and consolidating all cognition. That has always been the wrong mental model, not just empirically wrong, but conceptually confused about what intelligence is and where it comes from.
The more accurate picture is that we are building something that looks less like a brain and more like a city: dense, interconnected, full of conflict and coordination, specialization and redundancy. Different kinds of intelligence operate at different scales, checking and balancing each other, accumulating collective knowledge that no single participant fully holds.
That city is already under construction. The question is not whether it will be built, but whether we will build it with the institutional wisdom the moment demands, or keep treating it like a very large individual model and be surprised when it behaves like a society.
This article draws on three foundational works:
- Evans, Bratton & Agüera y Arcas, "Agentic AI and the Next Intelligence Explosion," Science / arXiv:2603.20639 (2026);
- Researchers from Google, the University of Chicago, and the Santa Fe Institute, "Reasoning Models Generate Societies of Thought," arXiv:2601.10825 (February 8, 2026); and
- D. Sculley et al., "Hidden Technical Debt in Machine Learning Systems," Advances in Neural Information Processing Systems 28, NeurIPS (2015).
Key Concepts
Governance debt: The accumulated cost of deploying AI systems without institutional scaffolding (role protocols, escalation paths, audit mechanisms, and conflict resolution structures). Analogous to technical debt in software engineering, but operating at the level of social systems rather than codebases. First framed in this context by Dr. Jean-Leah Njoroge, drawing on Sculley et al. (2015).
Institutional alignment: The proposition that scalable AI safety is not primarily a model-training problem but an institutional design problem. Rather than aligning individual models through RLHF, institutional alignment focuses on designing the role protocols, procedural norms, and constitutional structures within which AI agents operate, drawing on Elinor Ostrom's work on commons governance.
Society of thought: A term coined by researchers from Google, the University of Chicago, and the Santa Fe Institute to describe the emergent multi-perspective internal debate that arises spontaneously in reasoning models when reinforcement learning is applied with accuracy as the sole reward signal. The models develop distinct cognitive stances that argue with, question, and reconcile with each other, rediscovering through optimization what epistemology has long suggested: robust reasoning is a social process.
Human-AI centaur: A composite actor that is neither purely human nor purely machine. The knowledge worker directing AI agents while participating as a human node in larger AI-orchestrated workflows. Coined by Evans, Bratton & Agüera y Arcas (2026).
Frequently Asked Questions
What is the difference between traditional AI alignment and institutional alignment?
Traditional AI alignment, primarily through Reinforcement Learning from Human Feedback (RLHF), focuses on correcting individual model behavior through a trainer-model feedback loop. Institutional alignment, by contrast, treats the governance challenge as a systems design problem: rather than making each model safer in isolation, it designs the role protocols, audit mechanisms, and procedural norms within which agents operate collectively. RLHF is a dyadic model that cannot scale to billions of interacting agents. Institutional alignment draws on Elinor Ostrom's work on how communities govern shared resources without top-down control.
What is neuromorphic computing and why does it matter for AI governance?
Neuromorphic computing replicates the architecture of the biological brain in hardware, using spiking neural networks that process information through discrete, asynchronous spike events rather than continuous computation. Intel's Loihi 2, IBM's TrueNorth, and BrainChip's Akida are leading examples, with Intel's Loihi 3 and IBM's NorthPole entering commercial production this week. It matters for governance because neuromorphic systems use decentralized, local learning rules with no global supervisor, the same architectural principle that institutional alignment proposes for multi-agent AI systems.
What does governance debt mean in the context of AI systems?
Governance debt refers to the accumulated liability of deploying AI systems, particularly multi-agent and agentic systems, without the institutional infrastructure to govern them. Just as technical debt accumulates when engineering shortcuts are taken without addressing systemic dependencies, governance debt accumulates when AI systems are deployed without defined role protocols, escalation paths, audit mechanisms, or conflict resolution procedures. Unlike a tangled ML pipeline, governance debt cannot be refactored in a sprint. When it compounds across billions of agents, the interest rate becomes civilizationally significant.
About the Author
Dr. Jean-Leah Njoroge is an Engineer and AI Systems Architect who writes about AI governance, frontier science, and the commercial deployment of emerging technology. She has worked at every scale of the engineering stack: industrial chemistry at Caterpillar, quantum-scale computational modelling during her PhD in Computational Materials Science and Nanotechnology, and AI systems deployment across Fortune 500 companies in tech and retail. She now evaluates AI technologies for patent eligibility and commercial viability in university technology transfer. Her named frameworks, the Bridge Framework and the 7 AI Categories system, give practitioners a structured approach to AI governance decisions in real deployment contexts. She is recognized among VentureBeat's leading women in AI and publishes in Business Daily Africa. She writes at insightsbydrjean.com.
