AI's Real Bottleneck: How to Talk to the People Building It
The biggest constraint on AI development isn't compute power or annotation budgets. It's the ability to communicate clearly with the humans shaping these systems.
Research surveying more than 120 AI practitioners (engineers, researchers, product managers, and business leaders) ranked cost dead last among obstacles slowing their work. The top bottleneck was something the industry has largely overlooked: the quality of human feedback used to train models. Close behind it was the difficulty of measuring whether training is working at all.
This finding contradicts years of industry consensus about what actually limits AI progress. And it points to a problem no infrastructure spending can solve.
How the Work Changed
Early AI development relied on volume. Thousands of people performed repetitive tasks: labeling images, filtering spam, ranking outputs. The work was binary. The instructions were simple. The humans doing it were largely interchangeable.
That model no longer works. Today's systems need to reason through ambiguous legal scenarios, navigate complex clinical workflows, and generate production-grade code. These tasks demand a different kind of human input entirely.
A domain expert now needs to judge whether a model is actually reasoning correctly, not merely whether its output sounds plausible. Nearly half of surveyed practitioners cited designing evaluation methodologies and subject-matter expert validation as primary forms of human input. The humans shaping these models have moved well past labeling. They're designing the tests that determine whether reasoning is sound.
The industry hasn't figured out how to work with these people.
The Instruction Gap
When asked what makes involving humans in AI workflows difficult, only one in six respondents cited cost. The top challenges were communicating tasks clearly enough for experts to perform them, and finding people with the right domain knowledge and contextual understanding.
These are management problems, not compute problems.
The research identifies this as the instruction gap: the loss of signal between what an engineer needs and what an expert contributor can deliver without proper context. In the era of simple labeling, instructions were objective. A picture either contained a stop sign or it didn't. Today, explaining to a cardiologist how to evaluate a model's reasoning about arrhythmia requires genuine transfer of operational knowledge. When that transfer fails, the data comes back noisy.
Noisy data fed into a sophisticated model produces a system that is less capable and more poorly aligned. The stakes rise as systems become more ambitious.
The Shift Toward Autonomous Systems
Nearly two-thirds of practitioners identified AI agents and autonomous systems as the primary growth area for 2026. These systems don't just generate text. They plan, decide, and act. They book appointments, execute transactions, triage documents, and navigate interfaces.
A chatbot producing a confused response is an inconvenience. An agent making decisions in healthcare, legal, or financial contexts without properly calibrated human guidance is categorically different. The more autonomous these systems become, the more their behavior depends on the precision of the human signal used to train them. There is no shortcut.
How Leading Teams Close the Gap
Companies closing the instruction gap aren't spending more money. They're treating the management of human expertise as a first-class engineering problem, one that deserves the same operational rigor as model architecture or deployment pipelines.
Before domain experts ever see a model output, the best teams invest significant time in calibration. They explain not just what the task is, but why it matters, how the model will be deployed, what failure looks like in context, and which edge cases concern them most.
A cardiologist evaluating a diagnostic model needs to know whether it's being built for emergency triage or routine screening. Those are different tasks requiring different judgments. Without that context, even a highly qualified contributor works in the dark.
This process must be ongoing, not a one-time onboarding. The best human feedback systems build in regular loops where contributors flag ambiguous instructions, engineers tighten task design based on observed disagreements, and the gap between intention and understanding progressively narrows. Most teams write instructions once, ship them to contributors, and treat the resulting data as ground truth. The noise entering the pipeline at that stage doesn't announce itself. It quietly degrades model judgment over time.
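One simple way to surface "observed disagreements" is to have several experts rate the same items and then flag the items where they diverge. Those items are candidates for clearer instructions. The Python sketch below is illustrative only. The item IDs, verdict labels, and the 0.75 agreement cutoff are hypothetical assumptions and do not come from the research cited above. Teams that want a more rigorous measure might use a chance-corrected statistic such as Cohen's or Fleiss' kappa instead of raw majority agreement.

```python
from collections import Counter

# Hypothetical data: each item ID maps to the verdicts several experts gave
# on whether a model's reasoning was sound. Labels and IDs are illustrative.
ratings = {
    "case-001": ["sound", "sound", "sound"],
    "case-002": ["sound", "flawed", "sound"],
    "case-003": ["flawed", "sound", "unsure"],
}


def agreement(verdicts: list[str]) -> float:
    """Share of raters who gave the most common verdict for one item."""
    _, top_count = Counter(verdicts).most_common(1)[0]
    return top_count / len(verdicts)


def flag_ambiguous(ratings: dict[str, list[str]], threshold: float = 0.75) -> list[str]:
    """Return item IDs whose agreement falls below the threshold.

    The 0.75 default is an assumed cutoff, not a recommended standard.
    """
    return [item for item, verdicts in ratings.items() if agreement(verdicts) < threshold]


flagged = flag_ambiguous(ratings)
print(flagged)  # ['case-002', 'case-003']
```

A flagged item is a prompt to revisit the task instructions with contributors. It does not prove the data is wrong. Low agreement can mean the instructions are ambiguous, or it can mean the case is genuinely hard.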
There's also a recruiting challenge. Finding a credentialed expert is one thing. Finding someone who can translate tacit professional knowledge into explicit, consistent evaluations is considerably harder. Teams doing this well have stopped treating expert recruitment as a one-off procurement exercise. They build relationships with domain networks, invest in contributor development, and create conditions for genuine expertise to be expressed rather than extracted.
The Real Work Ahead
None of this is glamorous. But as AI systems take on more consequential roles, the quality of human judgment shaping them stops being optional and becomes a direct determinant of whether those systems are safe to deploy.
The industry has spent years asking how to make models more capable. The more pressing question now is how to make the humans guiding them more effective. The model on the other side of the process is only ever as good as the clarity of the people who shaped it.
For managers overseeing AI development, this means treating human feedback systems with the same rigor as technical infrastructure.