AI Councils May Smooth Out Unique Insights, Experiment Finds
A new test reveals that multi-model AI systems can dilute distinctive perspectives, mirroring the 'design by committee' problem in human groups.
Testing AI Councils for Groupthink
Many practitioners now use AI councils, systems in which multiple large language models collaborate to answer questions or solve problems. The theory is straightforward: a diverse set of models should produce better outputs than any single model working alone. But researcher Rohit Krishnan wondered whether these multi-model systems might share a weakness that plagues human committees: a tendency to smooth out distinctive viewpoints into bland consensus.
Krishnan designed an experiment to test whether AI councils lose something valuable in the deliberation process. He set up several committee structures using different models, then compared their outputs. The configurations included a synthesizer, in which a fourth model merged the answers of three others; a peer-review system, in which a chairperson summarized the results; and a direct "best answer" picker that simply selected the strongest response.
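The three configurations can be sketched as simple orchestration functions. This is a minimal illustration under stated assumptions, not Krishnan's code: the `Model` type, the function names (`synthesize`, `peer_review`, `pick_best`), the prompt wording, and the assumption that peer review means each member critiques the other members' answers are all hypothetical. A "model" here is just a function from a prompt string to a response string, so any LLM client could be wrapped to fit.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM: any function mapping a prompt to a response.
# This is not a real client API; wrap your own provider's call to match it.
Model = Callable[[str], str]


def synthesize(question: str, members: List[Model], synthesizer: Model) -> str:
    """Configuration 1: a fourth model merges the members' answers."""
    answers = [m(question) for m in members]
    prompt = (
        f"Question: {question}\n"
        + "\n".join(f"Answer {i + 1}: {a}" for i, a in enumerate(answers))
        + "\nCombine these into a single answer."
    )
    return synthesizer(prompt)


def peer_review(question: str, members: List[Model], chair: Model) -> str:
    """Configuration 2 (assumed shape): members critique each other, a chair summarizes."""
    answers = [m(question) for m in members]
    reviews = []
    for i, reviewer in enumerate(members):
        # Each member reviews every answer except its own.
        others = [a for j, a in enumerate(answers) if j != i]
        reviews.append(reviewer("Critique these answers:\n" + "\n".join(others)))
    prompt = (
        f"Question: {question}\nAnswers:\n" + "\n".join(answers)
        + "\nReviews:\n" + "\n".join(reviews)
        + "\nSummarize the council's conclusion."
    )
    return chair(prompt)


def pick_best(
    question: str,
    members: List[Model],
    judge: Callable[[str, List[str]], int],
) -> str:
    """Configuration 3: a judge returns the index of one answer, used verbatim."""
    answers = [m(question) for m in members]
    return answers[judge(question, answers)]
```

Structurally, the picker returns one member's text unchanged, while the synthesizer and the chair each pass every answer through a further model that rewrites it. The sketch makes no claim about which configuration performed best in the experiment.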
The Smoothing Effect
The central finding mirrors a familiar human problem. When people form committees, they often sand down idiosyncratic ideas and eliminate "spiky" points of view in favor of safer, more conventional outputs. Krishnan's experiment suggested AI councils exhibit similar behavior.
To measure this effect, he needed a way to tell whether a council's final response retained the distinctive characteristics of the individual model outputs or homogenized them into a generic answer. In other words, the challenge was to quantify how much distinctive insight is lost when multiple models deliberate rather than work independently.
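The article does not say which metric Krishnan used, so the following is only a crude lexical proxy, offered as an assumption to make the idea concrete. For each member answer it collects the words no other member used (its "spiky" vocabulary), then reports what share of those words survives into the council's final answer. The function names (`terms`, `unique_terms`, `distinctiveness_retained`) are hypothetical.

```python
import re
from typing import List, Set


def terms(text: str) -> Set[str]:
    """Lowercased word tokens; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z']+", text.lower()))


def unique_terms(answers: List[str]) -> List[Set[str]]:
    """For each answer, the terms that no other answer uses."""
    sets = [terms(a) for a in answers]
    result = []
    for i, s in enumerate(sets):
        others = set().union(*(t for j, t in enumerate(sets) if j != i))
        result.append(s - others)
    return result


def distinctiveness_retained(answers: List[str], final: str) -> float:
    """Share of all member-unique terms that survive into the final answer."""
    final_terms = terms(final)
    uniques = set().union(*unique_terms(answers))
    if not uniques:
        return 1.0  # nothing distinctive to lose
    return len(uniques & final_terms) / len(uniques)
```

For example, with the member answers "prices will rise", "prices will fall" and "prices will crash spectacularly", the summary "prices will rise or fall" keeps 2 of the 4 member-unique terms, a score of 0.5. Word overlap misses paraphrase, so an embedding-based or model-as-judge comparison would likely be a more robust choice in practice.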
Why it matters
As organizations increasingly deploy multi-agent AI systems for research, analysis, and decision support, understanding their limitations becomes critical. If AI councils systematically eliminate the most distinctive insights (the very perspectives that might prove most valuable), practitioners need to design workflows that preserve model diversity rather than dilute it. The finding suggests that simply adding more models to a deliberation process may not improve output quality if the synthesis mechanism favors consensus over originality.
Implications for AI System Design
The research builds on earlier work showing that model diversity can improve performance, including studies on MarketBench and Andrej Karpathy's LLM Council concept. But Krishnan's experiment adds an important caveat: the architecture of how models interact matters as much as the diversity itself.
The findings were originally published by Rohit Krishnan on Strange Loop Cannon and shared through Azeem Azhar's Exponential View. The work raises practical questions for anyone building or using multi-model AI systems about how to capture the benefits of diverse perspectives without losing what makes each model's contribution valuable.
This is an original analysis by the Omega editorial team. Source reporting: AI Watch.
