Why do AI pilots fail?

Most AI pilots fail because they are designed to produce declarations rather than demonstrations. A pilot generates activity, screenshots, and a slide for the board, while the underlying tasks are never re-examined. Diagnosed at task level, the commonest cause is a divergence between task value and user surplus: the tool helps the organisation but gives the person doing the work no reason to use it again.

Why isn't AI delivering ROI?

AI shows no return when spending tracks declared capability rather than demonstrated behaviour. Licences, logins, and training attendance are declarations. ROI appears only when repeated use changes cycle time, quality, cost, risk, decision speed, revenue, or another real operating measure. Adoption is not transformation.

Is there a framework for AI adoption?

Yes. The Demonstration Gap is James Kerr's umbrella framework: organisations declare AI capability they cannot demonstrate, and the return lives in demonstrated task behaviour. Inside it sits a task-level 2x2 diagnostic crossing task value from AI with user surplus for the person doing the work. Its four quadrants are Compounding Adoption, the Willingness Gap, AI Theatre, and Correctly Left Alone.

What is AI Theatre?

AI Theatre is the quadrant of the Demonstration Gap diagnostic where enthusiasm runs high but task value runs low. People use the tools, attend the workshops, and generate visible activity, while the work itself does not materially improve. It is where declarations concentrate: pilots, demos, and announcements that never reach an operating metric.

What is declared versus demonstrated AI adoption?

Declared adoption is what an organisation says about its AI capability: strategy decks, licence counts, town halls, vendor announcements. Demonstrated adoption is what its people can actually show: changed task behaviour, repeated voluntary use, and outcomes that move an operating measure. The two ledgers routinely diverge, and only the second one pays.

How do you measure AI adoption?

Measure demonstrations, not declarations. Task value can be measured through cycle time, quality, error reduction, risk reduction, decision speed, cost, revenue, or conversion impact. User surplus can be measured through voluntary repeat use, return without prompting, time to first successful use, prompt or template reuse, workaround rate, and whether users would miss the tool. Licences and logins are weak signals unless they connect to changed work.

How do you close the Demonstration Gap?

You close the Demonstration Gap by auditing what people can demonstrate rather than what the organisation declares, then working task by task: start where AI creates measurable task value and the person doing the work captures real surplus, encode what works into shared playbooks, and measure repeated behaviour against operating outcomes. Demonstration over declaration.

The Demonstration Gap

Q: What is the Demonstration Gap?

The Demonstration Gap is James Kerr's term for the gap between the AI capability an organisation declares and the capability its people can demonstrate in real work. Declared adoption is cheap: licences, pilots, announcements. Demonstrated adoption shows up in how tasks actually get done. Most AI investment dies in the distance between the two.

I coined the term because the failure it names hides in plain sight. Every company now says it is adopting AI. The decks say so, the licence counts say so, the town halls say so. Ask the same company to show you ten tasks that are done differently because of AI, with the behaviour repeating and an operating measure moving, and the room goes quiet. Nothing in the saying was false, exactly. It just was never the same thing as the showing.

What the Demonstration Gap is

Every organisation keeps two ledgers of its AI capability, whether it knows it or not. The declared ledger holds the strategy deck, the vendor contracts, the enablement programme, the licence count, the chief AI officer. The demonstrated ledger holds something smaller and harder: the specific tasks where work is measurably done differently, by people who chose the tool again when nobody was watching.

The Demonstration Gap is the distance between those ledgers. It is not a measure of dishonesty. Most leaders genuinely believe the declared ledger. It is a measure of how far belief has outrun evidence, and it predicts, better than spend or enthusiasm, whether the investment will ever reach the P&L.

Why declared and demonstrated adoption diverge

The divergence is diagnosed one task at a time, not one company at a time. A model can be powerful in general and still fail in the specific workflow where it lands. Inside the Demonstration Gap sits a 2x2 diagnostic built on two plain questions.

Axis one: task value from AI. Does AI measurably improve this specific task? Signals include cycle time, quality, error reduction, risk reduction, decision speed, cost, revenue, conversion, or the quality of judgement.

Axis two: user surplus. Does the person doing the work experience enough net benefit to use the tool again? Signals include voluntary repeat use, return without prompting, time to first successful use, prompt or template reuse, workaround rate, and whether users would miss the tool if it disappeared.

Cross them and every task lands in one of four quadrants.

Compounding Adoption (high task value, high user surplus). AI improves the task and the person captures enough benefit to repeat the behaviour. Demonstration accumulates here on its own; adoption sticks without a mandate.

The Willingness Gap (high task value, low user surplus). The overlooked, load-bearing failure: a genuinely useful tool goes untouched because the person at the desk gets more review burden, less autonomy, more risk, or no credit. This quadrant carries enough weight that it has its own page.

AI Theatre (low task value, high user surplus). Enthusiasm without payoff, and the quadrant where declarations concentrate. Pilots, demos, workshops, screenshots for the board. Activity is abundant, operating leverage is not. Most of a company's declared ledger is written here.

Correctly Left Alone (low task value, low user surplus). AI adds little and nobody needs it to. Leaving these tasks alone is judgement, not failure.

	Low user surplus	High user surplus
High task value	The Willingness Gap (useful, unused)	Compounding Adoption (adoption sticks)
Low task value	Correctly Left Alone (leave it)	AI Theatre (activity, little return)
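
The grid above can be read as a small lookup. The sketch below is illustrative only: it assumes each axis has already been judged as simply high or low for a given task, and the function name, enum, and example tasks are hypothetical rather than part of the framework. How to decide "high" is left to the signals listed for each axis.

```python
from enum import Enum


class Quadrant(Enum):
    COMPOUNDING_ADOPTION = "Compounding Adoption"
    WILLINGNESS_GAP = "The Willingness Gap"
    AI_THEATRE = "AI Theatre"
    CORRECTLY_LEFT_ALONE = "Correctly Left Alone"


def classify_task(high_task_value: bool, high_user_surplus: bool) -> Quadrant:
    """Place one task in the 2x2 from two high/low judgements."""
    if high_task_value and high_user_surplus:
        return Quadrant.COMPOUNDING_ADOPTION
    if high_task_value:
        # Useful to the organisation, but the user gets too little back.
        return Quadrant.WILLINGNESS_GAP
    if high_user_surplus:
        # People enjoy using it, but the task does not measurably improve.
        return Quadrant.AI_THEATRE
    return Quadrant.CORRECTLY_LEFT_ALONE


# Hypothetical tasks, invented for illustration only:
# (high task value?, high user surplus?)
tasks = {
    "meeting summary drafts": (True, True),
    "contract first-pass review": (True, False),
    "novelty image generation": (False, True),
    "payroll sign-off": (False, False),
}

for name, (value, surplus) in tasks.items():
    print(f"{name}: {classify_task(value, surplus).value}")
```

The point of classifying per task rather than per tool is that the same tool can land in different quadrants depending on the workflow it is dropped into.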

How to measure the Demonstration Gap

Run the audit on demonstrations, not declarations, and keep the two ledgers separate.

Demonstrated capability: tasks done measurably differently with AI; voluntary repeat use; return without prompting; reuse of prompts, templates, or playbooks; an operating measure that moved and stayed moved.
Declared capability: licences, logins, training attendance, pilot counts, announcements. Record these, then treat them as claims awaiting evidence rather than results.
The gap itself: for each declared capability, ask who can demonstrate it, on which task, with what repeated behaviour and outcome. Every claim with no demonstrable task attached is the gap, itemised.

Then work the quadrants: protect and spread what compounds, redesign the workflow where willingness is missing so the user captures real surplus, stop counting theatre as progress, and leave the fourth quadrant alone.
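
The two-ledger audit described above can be sketched as a simple record structure. This is a minimal illustration under stated assumptions, not a prescribed tool: the class names, fields, and sample ledger entries are all hypothetical, and it assumes a claim counts as demonstrated only when at least one attached task shows both repeated voluntary use and an operating measure that moved.

```python
from dataclasses import dataclass, field


@dataclass
class Demonstration:
    task: str
    who: str
    repeated_use: bool   # voluntary, repeated behaviour was observed
    measure_moved: bool  # an operating measure moved and stayed moved


@dataclass
class DeclaredCapability:
    claim: str
    demonstrations: list[Demonstration] = field(default_factory=list)

    def is_demonstrated(self) -> bool:
        # Assumption: one task with both signals is enough to back the claim.
        return any(d.repeated_use and d.measure_moved for d in self.demonstrations)


def itemise_gap(ledger: list[DeclaredCapability]) -> list[str]:
    """Return every declared claim with no demonstrable task attached."""
    return [c.claim for c in ledger if not c.is_demonstrated()]


# Hypothetical ledger, invented for illustration only.
ledger = [
    DeclaredCapability(
        "AI-assisted support replies",
        [Demonstration("drafting refund replies", "support team", True, True)],
    ),
    DeclaredCapability(
        "AI-enabled sales forecasting",
        [Demonstration("weekly forecast", "sales ops", True, False)],
    ),
    DeclaredCapability("500 assistant licences deployed"),
]

for claim in itemise_gap(ledger):
    print("Undemonstrated:", claim)
```

Keeping declarations and demonstrations as separate records mirrors the two ledgers: a licence count is recorded, but it stays in the gap until someone can attach a task, a person, and an outcome to it.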

Why it matters

The external evidence rhymes with this. MIT NANDA's 2025 GenAI Divide report found that only a small share of enterprise AI efforts were translating into measurable business impact, despite widespread experimentation. Goldman Sachs chief economist Jan Hatzius made a parallel macro point in February 2026: AI investment had contributed far less to 2025 US GDP growth than the market story implied. Enormous declared adoption, far less demonstrated return. The gap, at national scale.

The diagnosis is not that AI does not work. It is that declaration has been allowed to stand in for demonstration, and only demonstration pays.

The key insight

Stop asking what your organisation says about AI and start asking what it can show. The declared ledger will always look healthy, because declarations are cheap to mint and pleasant to read. The demonstrated ledger is the one the P&L reads.

Closing the gap is task-level work: find where AI creates real task value, design the workflow so the person doing the work captures real surplus, encode what works into shared playbooks, and measure repeated behaviour against operating outcomes. Demonstration over declaration, one task at a time.

The question worth sitting with is the uncomfortable one: if a serious buyer, board, or acquirer asked your team to demonstrate its AI capability tomorrow, task by task, how much of the declared ledger would survive the meeting?
