
From pilots to profit: Finance’s GenAI paradox is finally cracking

Georgi Ivanov, Senior Communications Manager at Payhawk
Read time: 6 minutes
Published: Oct 17, 2025
Last updated: Oct 17, 2025
Quick summary

Many finance companies are experimenting with AI, but few are realizing real value. Success depends less on the model itself and more on robust processes, clear guardrails, auditable workflows, and the right organizational setup. By focusing on measurable tasks like invoice processing, reconciliation, and expense management, and by embedding compliance from the start, finance teams can finally turn AI pilots into tangible profits.

If you talk to boards, everyone’s “doing AI.” If you look at P&Ls, almost no one is. That gap, sky-high intent and thin impact, framed a Financial Times webinar (in partnership with Payhawk) that functioned less like another hype session and more like a field manual for escaping pilot paralysis. The moderator opened with the tell: while 82% of finance leaders say they’re open to AI, only around 20% of companies have progressed beyond proof of concept (PoC) to capture tangible value.

What made the discussion compelling is that the panelists didn’t argue about model-of-the-week. They focused on operating reality — guardrails, architectures, incentives, and the messy work of rewiring how finance actually runs.

Enterprise plumbing is the difference between launch and liability

Payhawk cofounder and CEO Hristo Borisov articulated what many buyers whisper: pilots die when shiny tools fail the enterprise checklist. In an agentic era, where software can take actions rather than just draft text, scoped permissions, explicit workflow boundaries, and immutable audit trails aren’t nice-to-haves; they’re the ticket to production. “Unexciting,” he admitted, but without them the intent/impact disconnect persists and PoCs don’t make it into real life.
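To make the “unexciting” checklist concrete, here is a minimal sketch of the two guardrails Borisov names: an agent whose every action must match a scoped permission, and an append-only, hash-chained audit log in which even denied attempts are recorded. The class and permission strings are illustrative assumptions, not Payhawk’s implementation.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

@dataclass
class GuardedAgent:
    """Hypothetical agent wrapper: every action is checked against a
    scoped permission set, and every attempt (allowed or not) lands in
    a hash-chained audit log so tampering is detectable."""
    permissions: set                       # e.g. {"invoice:read"}
    audit_log: list = field(default_factory=list)

    def act(self, action: str, payload: dict):
        allowed = action in self.permissions
        entry = {
            "ts": time.time(),
            "action": action,
            "payload": payload,
            "allowed": allowed,
            # chain each entry to the previous one: immutability check
            "prev_hash": self.audit_log[-1]["hash"] if self.audit_log else None,
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True, default=str).encode()
        ).hexdigest()
        self.audit_log.append(entry)
        if not allowed:
            raise PermissionError(f"{action} is outside this agent's scope")
        return f"executed {action}"

agent = GuardedAgent(permissions={"invoice:read"})
agent.act("invoice:read", {"id": "INV-001"})        # allowed, logged
try:
    agent.act("payment:send", {"amount": 5000})     # denied, still logged
except PermissionError:
    pass
```

The point of the sketch is that the audit trail is a by-product of the execution path, not a separate reporting step: there is no code path that acts without logging.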

Barclays’ Gaurav Sawhney extended the point with a practical autonomy ladder — human in command → human in the loop → human over the loop — and a warning: in agentic flows, errors propagate fast without meticulous decision logs. Finance needs “auditability, explainability, reliability, repeatability” baked in from day one. In other words, governance is product.
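One way to make Sawhney’s autonomy ladder operational (a sketch; the enum names, approval rule, and log fields are my assumptions, not Barclays’ implementation) is to tag every flow with a level and record each decision, so that errors in agentic flows can be traced back through the log he calls for:

```python
from enum import Enum
from datetime import datetime, timezone

class Autonomy(Enum):
    HUMAN_IN_COMMAND = 1   # human initiates every action
    HUMAN_IN_LOOP = 2      # agent proposes, human approves first
    HUMAN_OVER_LOOP = 3    # agent acts, human reviews afterwards

decision_log = []

def run_flow(name: str, level: Autonomy, proposed_action: str, approver=None):
    """Record every decision with its autonomy level and approver,
    building the meticulous decision log agentic flows need."""
    needs_pre_approval = level in (Autonomy.HUMAN_IN_COMMAND,
                                   Autonomy.HUMAN_IN_LOOP)
    approved = (approver is not None) if needs_pre_approval else True
    decision_log.append({
        "flow": name,
        "level": level.name,
        "action": proposed_action,
        "approved": approved,
        "approver": approver,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return approved

# An after-the-fact-review flow needs no approver; a payment does.
run_flow("expense-chase", Autonomy.HUMAN_OVER_LOOP, "send reminder email")
run_flow("payment-release", Autonomy.HUMAN_IN_LOOP, "pay invoice",
         approver="cfo@example.com")
```

The design choice matches the panel’s “governance is product” framing: the autonomy level is data attached to the flow, not a convention in someone’s head.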


It’s not just tech debt; it’s people and process debt

PwC’s Lilia Christofi pushed back on the industry’s comfort with experimentation for experimentation’s sake. Only “maybe one percent” of institutions have the maturity to run AI at scale, she argued, because success demands rewiring culture, org design, and operating models, not just plugging in new models. For now, many enterprises will be better served by a centralised AI stack that enforces consistency and control; over time, that can federate.

That centralisation isn’t monolithic. Christofi described a modular approach where an “agent” is really a system of agents — task logic, prompts, guardrails — plus validation and challenge agents that check each other’s work. The architectural question is how far to componentise for reuse across domains such as procurement or legal. However you answer it, you’ll need tool orchestration and the discipline to fund ongoing R&D, not just demos.
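Christofi’s “system of agents” pattern, where validation and challenge agents check a task agent’s work, can be sketched as below. The function names, the toy invoice format, and the VAT cross-check are illustrative assumptions, not PwC’s architecture; in practice the task agent would be an LLM extraction step.

```python
def task_agent(invoice_text: str) -> dict:
    # Stand-in for an LLM extraction step: parse "vendor|net|vat" lines.
    vendor, net, vat = invoice_text.split("|")
    return {"vendor": vendor.strip(), "net": float(net), "vat": float(vat)}

def validation_agent(record: dict) -> list:
    """Guardrail checks on structure: required fields, sane values."""
    issues = []
    if not record.get("vendor"):
        issues.append("missing vendor")
    if record.get("net", 0) <= 0:
        issues.append("non-positive net amount")
    return issues

def challenge_agent(record: dict, vat_rate: float = 0.20) -> list:
    """Independent cross-check: does the VAT match the expected rate?"""
    expected = round(record["net"] * vat_rate, 2)
    if abs(record["vat"] - expected) > 0.01:
        return [f"VAT {record['vat']} != expected {expected}"]
    return []

def run_pipeline(invoice_text: str) -> dict:
    record = task_agent(invoice_text)
    # Validation and challenge agents check the task agent's work;
    # only clean records flow on, everything else routes to a human.
    issues = validation_agent(record) + challenge_agent(record)
    return {"record": record, "issues": issues, "auto_approved": not issues}

result = run_pipeline("Acme GmbH | 100.00 | 20.00")
```

Componentising at this granularity is what lets the same validation and challenge agents be reused across domains such as procurement or legal, which is exactly the architectural question she poses.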

Platforms beat point models

Swiss Re’s Ermir Qeli brought receipts from insurance — one of the few corners of finance that may be out of PoC theater. His team is “past the pilot stage” on use cases buried in the unstructured data sprawl of finance, including claims triage, fraud detection, and contract intelligence. The lesson wasn’t “pick Model X.” It was: build an AI platform where models can change without breaking governance, processes, or measurement. That’s how you manage the entropy of a landscape that updates weekly.

Qeli also made a subtler point: foundation models don’t understand your business. There’s no model that can underwrite risk or pay a claim end-to-end. That’s good news for incumbents — if they staff the “translator” roles that fuse domain expertise with AI fluency and keep efforts aligned to business goals. That’s how you “move away from the trap of experimenting… to big impact.”

Regulators aren’t the excuse; ambiguity is

Yes, regimes are diverging: the EU AI Act’s risk tiers vs. the UK’s sandbox vs. the US’s sectoral patchwork. But the panel’s read was refreshingly pragmatic: treat the EU AI Act as a pillar of a global framework, then flex for APAC and US specifics; expect regulators to demand clarity and remediation, not just data hoarding; and assume fraud vectors will expand with AI, requiring explicit controls on what data agents can access and where it flows. Being “best prepared with what you know” beats waiting for perfect harmonisation.

What actually moves the needle now

The conversation kept returning to discrete, closed-loop finance tasks where value is measurable and fast: chasing expenses and receipts, extracting invoice data, turning transactions into ERP-ready records, closing reconciliation loops. If you can quantify hours/FTE saved, cycle-time reduction, and auto-reconciliation rates, CFOs will listen. If you can’t, they won’t. Payhawk is explicitly structuring its “AI Office of the CFO” around such line-item outcomes.
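The three numbers named above can be computed directly from task-level records; here is a minimal sketch (the field names and baseline figures are assumed for illustration, not Payhawk’s schema):

```python
def closed_loop_metrics(tasks: list) -> dict:
    """Compute the metrics CFOs listen to, per the panel: hours saved,
    cycle-time reduction, and auto-reconciliation rate."""
    total = len(tasks)
    auto = sum(1 for t in tasks if t["auto_reconciled"])
    hours_saved = sum(t["manual_minutes"] - t["actual_minutes"]
                      for t in tasks) / 60
    baseline_cycle = sum(t["baseline_cycle_days"] for t in tasks) / total
    actual_cycle = sum(t["actual_cycle_days"] for t in tasks) / total
    return {
        "auto_reconciliation_rate": auto / total,
        "hours_saved": round(hours_saved, 1),
        "cycle_time_reduction_pct":
            round(100 * (1 - actual_cycle / baseline_cycle), 1),
    }

# Two invoices: one fully auto-reconciled, one that needed a human.
invoices = [
    {"auto_reconciled": True,  "manual_minutes": 12, "actual_minutes": 2,
     "baseline_cycle_days": 5, "actual_cycle_days": 1},
    {"auto_reconciled": False, "manual_minutes": 12, "actual_minutes": 9,
     "baseline_cycle_days": 5, "actual_cycle_days": 4},
]
metrics = closed_loop_metrics(invoices)
# e.g. {'auto_reconciliation_rate': 0.5, 'hours_saved': 0.2,
#       'cycle_time_reduction_pct': 50.0}
```

The discipline is in the baselines: without a measured manual-minutes and cycle-time starting point, none of these numbers means anything to a board.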

A production-first stance, or nothing

Christofi’s blunt assessment of the last 18 months (“expensive experimentation”) landed because it matched what buyers feel. Her praise for Payhawk’s posture (“not doing PoC but taking everything into production in a structured way”) won’t settle vendor bake-offs, but it tees up a broader lesson for the ecosystem: production is a product choice. Teams that make it will stop talking about AI strategies and start reporting AI revenues.

A CFO’s cheat sheet: six ways to escape the PoC trap

  1. Ship the guardrails first: Track the % of AI actions with complete, immutable audit trails and the % executed within scoped permissions. If those numbers are low, your own risk team will halt deployment.
  2. Pick closed-loop use cases: Instrument hours/FTE saved, cycle-time reduction, auto-reconciliation rates—not vibes. Discrete tasks with clear baselines make the ROI story bulletproof.
  3. Build a model-portable platform: Measure time to swap/upgrade a model, shared orchestration coverage, and regression pass rates after changes. Your moat is stable orchestration and testing, not vendor lock-in.
  4. Measure beyond cost takeout: For service journeys, track CSAT (customer satisfaction score), first-contact resolution, handle-time. For risk, quantify detection lift and false-positive/negative deltas. For revenue, show conversion or upsell from AI-surfaced insights. Boards want profitable growth, not demo metrics.
  5. Codify the autonomy ladder: Report the mix of flows by autonomy level (in-command/in-loop/over-loop) and the MTTR (mean time to resolution) when humans intervene. If you can’t explain when people stay in charge, you won’t scale agents.
  6. Treat compliance as a design input: Stand up a global framework with EU AI Act as a pillar, plus playbooks for UK/US/APAC variance. Track time-to-evidence for regulators and how quickly audit findings are closed. Regulation is a constraint, not a blocker.
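Several of the numbers in the cheat sheet (items 1 and 5 in particular) reduce to simple aggregations over a flow event log. A hypothetical sketch, with field names assumed for illustration:

```python
from collections import Counter

def board_report(flows: list) -> dict:
    """Aggregate cheat-sheet metrics from a flow event log:
    audit-trail coverage, autonomy mix, and MTTR on interventions."""
    n = len(flows)
    audited = sum(1 for f in flows if f["complete_audit_trail"])
    mix = Counter(f["autonomy"] for f in flows)
    interventions = [f["resolution_minutes"]
                     for f in flows if f.get("human_intervened")]
    return {
        "audit_trail_coverage": audited / n,
        "autonomy_mix": {k: v / n for k, v in mix.items()},
        "mttr_minutes": (sum(interventions) / len(interventions))
                        if interventions else None,
    }

flows = [
    {"autonomy": "in-loop",    "complete_audit_trail": True,
     "human_intervened": True, "resolution_minutes": 30},
    {"autonomy": "over-loop",  "complete_audit_trail": True},
    {"autonomy": "over-loop",  "complete_audit_trail": False,
     "human_intervened": True, "resolution_minutes": 90},
    {"autonomy": "in-command", "complete_audit_trail": True},
]
report = board_report(flows)
# audit_trail_coverage 0.75, mttr_minutes 60.0
```

If the report itself is this cheap to produce, there is no excuse for a board deck built on demo metrics.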

The bigger narrative here is hopeful. Finance is edging from assistive UI sugar to auditable action. Competitive edge won’t come from model selection; it will come from operational discipline — the ability to turn model outputs into financial outcomes. After a year of AI show-and-tell, the bar has moved. The real question now is simple: Where is the money showing up?

Georgi Ivanov
Senior Communications Manager

Georgi Ivanov is a former CFO turned marketing and communications strategist who now leads brand strategy and AI thought leadership at Payhawk, blending deep financial expertise with forward-looking storytelling.
