Myth Number 2: MIT Showed That 95% of AI Pilots Fail

Ray Poynter, 31 May 2026


Perhaps the most commonly quoted statistic about AI projects is that 95% of them fail, according to MIT. It has been quoted by Fortune, Forbes, Harvard Business Review, and a long tail of content that is still, in 2026, using “95% of AI pilots flop” as a hook to sell the very thing the study was supposedly warning us about.

It is a wonderful statistic. It is shocking, it carries an elite institutional badge, it confirms what a tired and sceptical audience already suspected, and it is short enough to survive a thousand reposts. There is only one problem: it does not say what almost everyone thinks it says.

This is the second post in my series on AI myths, and like the first, the point is not to claim that AI in business is going wonderfully. The point is that bad evidence makes bad decisions, even when the bad evidence happens to point in a direction you find reassuring.

Where the number actually comes from

The source is a document called The GenAI Divide: State of AI in Business 2025. It was produced by Project NANDA, branded as “MIT NANDA”, dated July 2025, and authored by Aditya Challapally, Chris Pease, Ramesh Raskar and Pradyumna Chari. NANDA stands for Networked Agents and Decentralised Architecture, and that name is relevant, as you will see.

A few things are worth noting before digging into the weeds. The report describes itself as “Preliminary Findings”. It states that the views are those of the authors and not their affiliated employers. The original MIT-hosted link now redirects, although mirrored copies of the PDF remain online. None of that makes the work worthless, but it does tell you that this was never a peer-reviewed, institutionally underwritten piece of MIT research. It was a working paper from a group that builds agentic AI infrastructure. It is NOT an MIT statistic.

What the report measured, and what it did not

Here is the part that the headlines tend to skate over.

The famous “95% failed” line is a media simplification of several different claims that the report itself blurred together. In the executive summary, the report says that 95% of organisations are getting zero return. The chart that people tend to screenshot shows something narrower: a funnel, running from investigation through pilot to implementation, for custom, embedded, workflow-specific GenAI tools. In that funnel, 60% of organisations investigated such tools, 20% reached pilot stage, and 5% were “successfully implemented”.

If 20% reach pilot and 5% reach production, then roughly one quarter of the organisations that actually ran a pilot cleared the bar. That is not “95% of pilots failed”. It is closer to “75% of pilots did not meet a deliberately demanding success threshold”. Those are very different sentences, but it was the “95% fail” misdirection that ended up in most of the news coverage.
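For readers who like to check the arithmetic, here is a minimal sketch in Python. The three percentages are the funnel figures quoted above; the function name and variable names are my own, chosen for illustration, and the sketch assumes the three figures all share the same base (all organisations in the funnel).

```python
def conditional_rate(reached_stage: float, reached_next: float) -> float:
    """Share of organisations at one funnel stage that went on to the next.

    Both arguments are shares of the same base (all organisations).
    """
    if reached_stage <= 0:
        raise ValueError("reached_stage must be positive")
    if reached_next > reached_stage:
        raise ValueError("a later stage cannot exceed an earlier one")
    return reached_next / reached_stage


# Funnel figures as quoted in this post (shares of all organisations).
investigated = 0.60
piloted = 0.20
implemented = 0.05

# Success rate among organisations that actually ran a pilot.
pilot_success = conditional_rate(piloted, implemented)
pilot_shortfall = 1 - pilot_success

# The headline number uses all organisations as its base, not pilots.
not_implemented_overall = 1 - implemented

print(f"Pilots that cleared the bar:        {pilot_success:.0%}")
print(f"Pilots that did not:                {pilot_shortfall:.0%}")
print(f"All organisations not implemented:  {not_implemented_overall:.0%}")
```

The only thing that changes between the 75% and the 95% is the denominator: organisations that ran a pilot versus every organisation in the funnel.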

The success bar itself was high. The report defined success as deployment beyond pilot, with measurable KPIs, and ROI measured six months after the pilot. Elsewhere it described a successful tool as one where users or executives reported marked and sustained productivity and/or P&L impact. Frankly, it is remarkable that 25% of pilots managed to generate that much success in six months.

The commentary about pilots failing to show a return masked one of the big findings from the report, that overall AI was having a positive effect. The same PDF reports that over 80% of organisations had explored or piloted general-purpose LLM tools such as ChatGPT and Copilot, nearly 40% reported deployment, and generic LLM chatbots showed a pilot-to-implementation rate of around 83%. The report did not find that enterprise AI was failing across the board. It found that custom, embedded, workflow-specific GenAI struggled to meet a tough six-month business-impact test.

The methodology, in the report’s own words

Not only is the report being misreported, it is a pretty weak study in the first place. The report’s own list of limitations is long, and to the authors’ credit they did not hide it. They said the figures were “directionally accurate” and based on interviews rather than official company reporting. They warned that sample sizes varied by category, that success definitions differed across organisations, that selection bias was possible, that build-versus-buy percentages were interview-based rather than market-wide, and that a six-month window might understate success for slower enterprise rollouts.

The criticism that arrived almost immediately

Serious pushback to the report came within days of its launch, from people who actually read the PDF.

Kevin Werbach, a Wharton professor, dug into the report on LinkedIn and concluded it was deeply problematic, with its central claim insufficiently supported and the MIT association potentially misleading. As one analysis of his critique put it, if you go through the report looking for the evidential basis of the 95% figure, you struggle to find it. The number appears in essentially one sentence, with no underlying data presented to support it.

Paul Roetzer of the Marketing AI Institute reached the same place. His objection was that the success definition was narrow, that it ignored efficiency gains, cost reductions, churn reduction and other indirect value, that the “zero return” finding rested on just 52 interviews the report itself called only directionally accurate, and that the analysis of 300 public initiatives was never properly explained. His advice to readers was blunt: do not put weight on this study.

Practitioner commentary went further still, with some calling the work marketing in disguise, pointing to the short ROI window, the self-selected sample, and the report’s repeated endorsement of NANDA-style agentic infrastructure as the path across the GenAI divide. That last point deserves emphasis. A working paper from a project that builds agentic AI infrastructure concluded that the solution to the problem it identified was, broadly, agentic AI infrastructure. That is not proof of bad faith, but it is exactly the kind of conflict that should make a careful reader slow down.

The media changed the study

There is one more wrinkle that should bother anyone who cares about evidence.

The PDF clearly says 52 interviews and 153 leaders. Yet Fortune, which launched the story into the mainstream in August 2025, reported 150 interviews and a survey of 350 employees. Outlets including Financial Express and Tom’s Hardware then repeated those larger numbers. Perhaps an earlier draft or a press briefing existed, or perhaps the numbers were simply misstated. Either way, a great many people quoted, and still quote, a sample size that the published document does not contain.

Why the myth stuck anyway

The 95% claim endured because it was a perfect media object. A shocking round number and an elite institutional label delivered a simple message that a large audience already wanted to hear.

The timing helped too. The number landed in a jittery market. Fortune itself noted that the report, alongside comments from Sam Altman about an AI bubble, was part of the rationale for a tech sell-off that week, with investors reading it as an indictment of AI as a whole even though that is not what the research said.

It is worth holding two things in your head at once here. First, the 95% headline is a serious distortion of a preliminary working paper. Second, the underlying concern is not made up: lots of pilots do fail.

What to say when someone quotes the number at you

Here are three points you might want to highlight.

First, the 5% figure is not the pilot success rate, so 95% is not the pilot failure rate. The 5% comes from a funnel for one narrow category, custom embedded tools, where 20% of organisations reached pilot and 5% reached production. That implies roughly a quarter of actual pilots cleared a deliberately high bar, not one in twenty.

Second, the same report found general-purpose tools such as ChatGPT and Copilot doing well, with pilot-to-implementation rates above 80%. The study did not show that enterprise AI was failing. It showed that one type of bespoke deployment struggled against a six-month ROI test.

Third, this was a preliminary working paper, self-described as such, from a project with a commercial interest in the recommended solution, and parts of the coverage inflated even its sample size.

Read the paper yourself

You can read the paper via https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf
