
MIT says 95% of enterprise GenAI pilots show no ROI; Google says most production users already see value. Both are right — because they’re measuring different worlds. The gap between pilots and production explains the divide.
If you followed the AI news cycle this summer, you probably felt whiplash. On one side, MIT’s Project NANDA says a stunning 95% of enterprise GenAI efforts are yielding no measurable P&L return, despite tens of billions in spend. On the other, Google’s latest global ROI of AI survey reports most production adopters already see ROI, and that agentic AI is accelerating value. Both can’t be true, unless they’re looking at different worlds. They are. And the gap between those worlds explains why so many pilots stall while a minority sprint ahead.
MIT’s world is the messy middle of enterprise AI: experiments, proofs of concept, and “pilot islands” that never touch real systems. The key finding is stark: $30–40B in enterprise GenAI spend; 95% of initiatives with zero P&L lift; only 5% of integrated pilots extracting meaningful value. The report labels this the GenAI Divide: a growing split between firms that can wire AI into work and those that can’t. It isn’t anti-AI; it’s anti-PowerPoint. The difference is production, not potential.
Google, by contrast, sampled the cohort already using GenAI in production, not just dabbling. Within that population, 74% say they’re seeing ROI on at least one use case; among agentic early adopters, that share jumps to 88%. This is not a contradiction of MIT — it’s the flip side: what happens after you cross the chasm from chatty demos to systems that plan, call tools, and close loops.
Agentic AI — the boring definition, not the hype — means software that can decide and do things: reason over goals, orchestrate steps, call enterprise APIs, and hand off or escalate under human guardrails. Google reports over half (52%) of organizations using GenAI now also leverage agents, and 39% have more than 10 agents in production. That tells you where the ROI lives: not in prompts, but in process.
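To make that boring definition concrete, here is a minimal sketch of the decide-and-do loop in Python. Everything in it is a hypothetical placeholder, the Step shape, the TOOLS registry, and the escalate_to_human guardrail included; in a real deployment the plan would come from a model reasoning over the goal, and the tools would be governed enterprise APIs.

```python
# Minimal sketch of an agentic loop: plan, call tools, escalate under guardrails.
# All names here (Step, TOOLS, escalate_to_human) are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    tool: str             # which enterprise API to call
    args: dict            # arguments for that call
    needs_approval: bool  # guardrail: must a human sign off first?

# Registry of governed enterprise actions the agent may invoke (stubs here).
TOOLS: dict[str, Callable[..., str]] = {
    "create_ticket": lambda **a: f"ticket created: {a}",
    "post_journal_entry": lambda **a: f"journal entry posted: {a}",
}

def escalate_to_human(step: Step) -> bool:
    """Placeholder guardrail: route the step to a reviewer and await approval."""
    print(f"escalating for approval: {step.tool} {step.args}")
    return True  # pretend the reviewer approved

def run_agent(goal: str, plan: list[Step]) -> None:
    """Execute a plan step by step: act where allowed, escalate where required."""
    print(f"goal: {goal}")
    for step in plan:
        if step.needs_approval and not escalate_to_human(step):
            print(f"skipped (rejected by reviewer): {step.tool}")
            continue
        print(TOOLS[step.tool](**step.args))

# In a real system the plan would be produced by the model; here it is hard-coded.
run_agent(
    goal="reconcile a flagged supplier invoice",
    plan=[
        Step("create_ticket", {"summary": "invoice mismatch #A-102"}, needs_approval=False),
        Step("post_journal_entry", {"amount": 420.00, "account": "accruals"}, needs_approval=True),
    ],
)
```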
So the debate we should be having is not “Is AI overhyped?” but “What separates the 5% from the 95%?” Three patterns show up across both reports and in real deployments:
First, pilots that never touch source-of-truth systems (ERP, ticketing, CRM, policy engines) can’t move numbers that finance will recognize. Google’s methodology makes this plain: its dataset covers production users, which naturally skews toward measurable return; MIT’s lens includes the swamp of experiments that never had a chance to earn. If you want CFO-grade proof, you must plug agents into the stack finance already audits.
Second, where ROI shows up, agents have access to tools and data under governance: they open cases, post journal entries, create tickets, update records, and follow policy. Google’s own “what works” sections emphasize secure access to internal systems and governance first; performance follows. MIT’s 5% are, in essence, those who did exactly this (a minimal sketch of such a policy gate follows after these three patterns).
Third, the companies reporting returns tend to have strong C-suite sponsorship and a clear definition of value, whether speed, accuracy, cost, or revenue, instrumented in advance. Unsurprisingly, organizations with comprehensive executive alignment are far likelier to see ROI. That’s not a platitude; it’s the difference between a science fair and a factory.
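As promised above, here is what “tools and data under governance” can look like at the smallest scale: a policy gate that sits between the agent and the system of record, and writes every decision to an audit line. The POLICY table, rule fields, and function names are illustrative assumptions, not any real policy engine’s API.

```python
# Illustrative sketch: every agent action passes a policy gate before it reaches
# the system of record. The rules and names here are hypothetical.

POLICY = {
    "post_journal_entry": {"max_amount": 1_000.00, "allowed_accounts": {"accruals", "expenses"}},
}

def policy_allows(action: str, args: dict) -> tuple[bool, str]:
    """Check an action against policy; return (allowed, reason) for the audit trail."""
    rule = POLICY.get(action)
    if rule is None:
        return False, f"no policy defined for {action}"
    if args.get("amount", 0) > rule["max_amount"]:
        return False, "amount exceeds agent limit; escalate to a human approver"
    if args.get("account") not in rule["allowed_accounts"]:
        return False, f"account {args.get('account')!r} not permitted for agents"
    return True, "within policy"

def gated_call(action: str, args: dict) -> None:
    allowed, reason = policy_allows(action, args)
    print(f"AUDIT {action} {args} -> {'ALLOW' if allowed else 'DENY'} ({reason})")
    # Only on ALLOW would the real ERP/ticketing API be invoked here.

gated_call("post_journal_entry", {"amount": 420.00, "account": "accruals"})    # allowed
gated_call("post_journal_entry", {"amount": 5_000.00, "account": "accruals"})  # denied: over limit
```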
This is why the “AI bubble” framing misses the point. The data don’t say “AI doesn’t work.” They say AI that isn’t wired into work doesn’t work. If you evaluate language models as knowledge toys, you’ll get toy results. If you treat agents as transaction participants—with identity, policy, and commitments — you get operational leverage.
Finance operations are instructive. They’re structured, policy-heavy, and already instrumented for controls — perfect terrain for agentic systems to prove themselves without inviting existential risk. In practice, the early patterns are emerging in four everyday workflows: purchasing, travel, expenses, and payments.
These aren’t moonshots; they’re narrow, measurable, and auditable — precisely why they move the P&L needle.
Meanwhile, budgets are consolidating around what works. As AI infra costs fall, overall spend is still rising, often via reallocation from non-AI budgets, with a mean 26% of total IT spend now pointed at AI. That capital will keep chasing use cases that clear the ROI bar, i.e., agentic automations tied to governed systems.
So what’s the contrarian take leaders should champion?
The wrong metric is “number of proofs of concept.” The right metric is “rate of closed-loop automations per quarter that meet policy and pass audit,” plus the dollars attached. That’s how you collapse the perceived gap between MIT’s reality check and Google’s optimism.
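One way to keep that metric honest is to make it computable from your own automation log. A toy sketch, with an invented record shape:

```python
# Toy computation of the proposed KPI: closed-loop automations per quarter that
# meet policy and pass audit, plus the dollars attached. The fields are invented.
automations = [
    {"name": "invoice matching", "quarter": "2025-Q3", "closed_loop": True,  "passed_audit": True,  "annual_savings": 120_000},
    {"name": "travel approvals", "quarter": "2025-Q3", "closed_loop": True,  "passed_audit": False, "annual_savings": 45_000},
    {"name": "chat summaries",   "quarter": "2025-Q3", "closed_loop": False, "passed_audit": True,  "annual_savings": 0},
]

qualifying = [a for a in automations
              if a["quarter"] == "2025-Q3" and a["closed_loop"] and a["passed_audit"]]
print(f"closed-loop, audited automations this quarter: {len(qualifying)}")
print(f"dollars attached: ${sum(a['annual_savings'] for a in qualifying):,}")
```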
You need a runbook: intent hardening (translate human asks into unambiguous, policy-aware plans), idempotent actions (safe retries without double-spend), rollback semantics (compensations that unwind bad sequences), and observability (trace every tool call and decision). Don’t worry about grand “AI strategies” until you can ship, roll back, and measure an agent the way you do a microservice. (Google’s guidance is blunt here: give agents governed access to enterprise systems and write the rulebook early.)
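For the mechanics, here is a compressed sketch of three of those four runbook items, idempotent actions, rollback semantics, and observability, in plain Python. It is a toy under stated assumptions (the trace list, the idempotency ledger, and the simulated gateway failure are all invented), not Google’s guidance or any product’s API.

```python
# Toy sketch of three runbook mechanics: idempotent actions, rollback semantics,
# and observability. All names and structures here are illustrative assumptions.
import uuid

TRACE: list[dict] = []           # observability: every tool call and decision lands here
_COMPLETED: dict[str, str] = {}  # idempotency ledger: key -> result, so retries are safe

def record(event: str, **details) -> None:
    TRACE.append({"event": event, **details})

def idempotent_action(key: str, action: str) -> str:
    """Run an action at most once per key; a retry returns the prior result."""
    if key in _COMPLETED:
        record("retry_deduplicated", key=key, action=action)
        return _COMPLETED[key]
    result = f"{action} done ({key})"  # stand-in for the real API call
    _COMPLETED[key] = result
    record("action_executed", key=key, action=action)
    return result

def run_with_rollback(steps: list[tuple[str, str]]) -> None:
    """Execute (action, compensation) pairs; on failure, unwind in reverse order."""
    done: list[str] = []
    try:
        for action, compensation in steps:
            idempotent_action(str(uuid.uuid4()), action)
            done.append(compensation)
            if action == "capture payment":  # simulate a mid-sequence failure
                raise RuntimeError("payment gateway timeout")
    except RuntimeError as err:
        record("failure", error=str(err))
        for compensation in reversed(done):  # rollback: apply compensations
            record("compensation_applied", action=compensation)

key = "invoice-A-102"  # in production, derive the key from the business object
print(idempotent_action(key, "reserve budget"))
print(idempotent_action(key, "reserve budget"))  # retry: deduplicated, same result back
run_with_rollback([
    ("reserve budget", "release budget"),
    ("capture payment", "refund payment"),
])
for entry in TRACE:
    print(entry)
```

Intent hardening is the one piece that resists a ten-line sketch; it lives in how you translate a human request into the unambiguous (action, compensation) pairs above.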
Anchor ROI in operations, not imagination.
Google’s data show ROI clusters around five areas — productivity, customer experience, business growth, marketing, and security — with rapid time-to-production when use cases are repeatable and data are reachable. Security, notably, is emerging as a first-class agentic domain because the work is event-driven and tool-heavy. That’s not sexy—but it’s bankable.
If your board quotes MIT’s 95%, ask: “How many of those efforts were truly in production?” If your vendor quotes 74% ROI, ask: “Were non-production users included?” These aren’t quibbles; they’re entirely different universes. Google’s own methodology limits claims to organizations using GenAI in production — hence the sunnier numbers. Both truths can coexist; your job is to move from one sample to the other.
In short, the paradox dissolves once you see the dividing line. Most pilots fail; many production deployments pay off. The path from the first to the second is not model magic but systems engineering plus governance. That’s what the 5% already know, and what the 95% must learn fast.
The media will keep chasing the binary — “AI boom!” vs. “AI bust!”—because binaries make good headlines. The better headline for operators is this: AI returns are a function of agency and integration. If your agents can’t call the systems that move money, manage risk, or serve customers, they can’t move your numbers. If they can, they will.
Leaders don’t need another debate about hype. They need a factory for turning intents into actions — and a P&L that notices. Build that, and you won’t have to argue with either report. You’ll be living in the dataset that wins.
Georgi Ivanov is a former CFO turned marketing and communications strategist who now leads brand strategy and AI thought leadership at Payhawk, blending deep financial expertise with forward-looking storytelling.