The procurement agent myth: Conversation doesn’t replace process

Read time: 4 minutes

Georgi Ivanov - Senior Communications Manager at Payhawk


Published: Jan 27, 2026

Last updated: Jan 30, 2026


Quick summary

Everyone’s selling “agents” as if autonomy is the breakthrough. In procurement, it isn’t. The real advance is delegation inside controls: Work that’s broken into steps that can be followed, checked, and audited.

Project Vend is the clearest public example of what “agents in the real economy” look like. It’s an experiment where Anthropic put an AI agent in charge of a small shop in its office and let it run for weeks. In phase two, the shop got better, but not because “autonomy arrived.” It got better because Anthropic stopped letting the agent improvise and started enforcing procedures.


Phase one: Capable but not operational

In phase one, Anthropic and Andon Labs put Claude 3.7 Sonnet in charge of a real office shop. The agent (nicknamed Claudius) could browse the web, message people in Slack, and ask Andon Labs to restock. It could set prices, choose inventory, and deal with customers.

Some of it worked: It found suppliers fast, handled requests, and refused a few obvious “harmful” asks from internal testers.

But it failed in the way agents usually fail when they touch real operations: It ignored clear profit opportunities, hallucinated critical details like where to send payments, priced products without research, and sold at a loss. It also got talked into discount codes and giveaways, and it did not reliably learn from its own mistakes.

If you work in procurement, none of this should surprise you.

Procurement is a control system that exists to prevent three outcomes: Buying the wrong thing, buying it the wrong way, or buying it at the wrong price. That’s why a “helpful” assistant that improvises in real time can never pass those checks.


Phase two: The real change

Phase two brought a newer Sonnet model and a changed environment. That helped, but the bigger change was structural: More tools, tighter instructions, and more mandatory procedures.

Anthropic gave the agent a CRM to track customers, suppliers, deliveries, and orders. They improved inventory management so it could see what it paid for items. They expanded web browsing so it could do deeper supplier and pricing research. They added basic ops helpers like feedback forms, payment links, and reminders. And they kept a key boundary: No direct payment interface, so purchases still needed human approval.

That setup steadied performance. Claudius got better at normal business interactions, sourcing items, setting prices that kept margin, and executing sales.

What worked: Forced procedure

The most telling part of phase two is what Anthropic says moved the needle: Forcing Claudius to follow steps. When a request came in, instead of throwing out a low price and an optimistic delivery date, the agent had to check facts using tools first. Prices went up, and delivery promises got more conservative, so the output got more realistic.
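To make “check facts using tools first” concrete, here is a minimal Python sketch of a look-it-up-before-you-quote rule. It is not Anthropic’s actual setup: the tool functions `lookup_unit_cost` and `lookup_lead_days`, the item names, the 25% markup, and the three-day delivery buffer are all hypothetical. The point is the shape: the quote is derived from tool results, and missing data blocks the quote instead of inviting a guess.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tool back ends; in practice these might be a CRM or supplier catalog.
_UNIT_COSTS = {"usb-c-cable": 8.00, "snack-box": 12.00}
_LEAD_DAYS = {"usb-c-cable": 5, "snack-box": 2}

MIN_MARKUP = 0.25         # assumed policy: price at least 25% above unit cost
DELIVERY_BUFFER_DAYS = 3  # assumed policy: pad supplier lead time before promising


def lookup_unit_cost(item: str) -> Optional[float]:
    """Hypothetical tool: what we actually pay per unit, or None if unknown."""
    return _UNIT_COSTS.get(item)


def lookup_lead_days(item: str) -> Optional[int]:
    """Hypothetical tool: supplier lead time in days, or None if unknown."""
    return _LEAD_DAYS.get(item)


@dataclass
class Quote:
    item: str
    unit_price: float
    promised_days: int


def quote(item: str) -> Quote:
    # Step 1: check facts with tools before answering the customer.
    cost = lookup_unit_cost(item)
    lead = lookup_lead_days(item)
    if cost is None or lead is None:
        # Missing facts block the quote; there is no fallback guess.
        raise LookupError(f"cannot quote {item!r}: cost or lead time unknown")
    # Step 2: derive price and delivery promise from the looked-up facts.
    return Quote(item, round(cost * (1 + MIN_MARKUP), 2), lead + DELIVERY_BUFFER_DAYS)
```

With this shape, an optimistic price or delivery date has no code path: the only way to produce a quote is through the lookups.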

Anthropic also says the quiet part out loud: Bureaucracy matters. Procedures and checklists exist because they provide institutional memory and prevent common screwups.

That is the core strategic implication of Project Vend’s evolution. The path from “cool demo” to something you can actually run isn’t just about a bigger model. It’s about turning work into steps the agent can’t skip, backed by tools that make those steps easy to follow and hard to game.
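One common way to make steps unskippable is a small state machine. The sketch below is illustrative only: the step names, the `PurchaseFlow` class, and the audit fields are assumptions, not any vendor’s API. A request can only move to the next step in order, and every move is written to an audit log with who made it and why.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed procedure: the only legal order of steps for a purchase request.
STEPS = ["intake", "checks_passed", "approved", "ordered"]


@dataclass
class PurchaseFlow:
    request_id: str
    state: str = STEPS[0]
    audit: list = field(default_factory=list)

    def advance(self, to_state: str, actor: str, reason: str) -> None:
        """Move to the next step only; skipping ahead or going back raises ValueError."""
        i = STEPS.index(self.state)
        if i + 1 >= len(STEPS) or STEPS[i + 1] != to_state:
            raise ValueError(
                f"{self.request_id}: cannot move from {self.state!r} to {to_state!r}"
            )
        # Record every transition: when, from where, to where, who, and why.
        self.audit.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "from": self.state,
            "to": to_state,
            "actor": actor,
            "reason": reason,
        })
        self.state = to_state
```

For example, after `flow.advance("checks_passed", ...)`, a call to `flow.advance("ordered", ...)` raises `ValueError` because approval cannot be skipped, and `flow.audit` holds a record of each step that did happen.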

Why most “procurement agents” disappoint in production

This is why so many procurement agents look great in a demo and fall apart in production. They treat procurement as a conversation. But procurement needs required fields, routing, approvals, and an audit trail (conversation is just the interface).
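As an illustration of conversation being just the interface, the sketch below assumes the chat layer has already extracted whatever it could into a structured record. The field list is hypothetical; each company would configure its own. The workflow, not the model, decides whether the request is complete, and the chat only asks for what is missing.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PurchaseRequest:
    # Hypothetical required fields; each company would configure its own list.
    item: Optional[str] = None
    quantity: Optional[int] = None
    unit_price: Optional[float] = None
    currency: Optional[str] = None
    vendor: Optional[str] = None
    cost_center: Optional[str] = None
    business_reason: Optional[str] = None


def missing_fields(req: PurchaseRequest) -> List[str]:
    """Names of required fields that are still empty, in declaration order."""
    return [f.name for f in fields(req) if getattr(req, f.name) in (None, "")]


def next_prompt(req: PurchaseRequest) -> Optional[str]:
    """What the chat should ask next, or None when the request is complete."""
    gaps = missing_fields(req)
    if gaps:
        return "Before I can submit this, I still need: " + ", ".join(gaps)
    return None
```

A request only moves on to routing and approvals once `next_prompt` returns `None`, so nobody has to retype a half-finished chat into a form.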

Project Vend also shows what doesn’t work as a shortcut. Anthropic introduced an AI “CEO,” Seymour Cash, to pressure the shopkeeper into fewer bad decisions. It helped in some areas, like reducing discounts, but it created new problems. The CEO swapped discounts for refunds and store credits, which still burn revenue. It drifted into strange loops. Some of that got patched with more aggressive prompting.

The takeaway is straightforward: You don’t get governance by adding another agent. If the “manager agent” is made of the same stuff as the “worker agent,” you get the same kinds of mistakes, but now doubled. Governance comes from policy, role permissions, and hard gates.
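A hard gate can be as plain as a permission check that runs in code before anything is committed. In this sketch the roles, approval limits (in EUR), and names are made up for illustration. Agent roles are simply absent from the policy table, so no prompt, and no “manager agent,” can talk its way into an approval.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Principal:
    name: str
    role: str


# Assumed policy table: which roles may approve, and up to what amount (EUR).
# Agent roles are deliberately absent, so no agent can approve anything.
APPROVAL_LIMITS: Dict[str, float] = {
    "team_lead": 1_000,
    "finance_manager": 25_000,
    "cfo": float("inf"),
}


def can_approve(approver: Principal, amount: float) -> bool:
    """True only if the approver's role exists in policy and covers the amount."""
    limit = APPROVAL_LIMITS.get(approver.role)
    return limit is not None and amount <= limit


def commit_purchase(amount: float, approver: Principal) -> str:
    # Hard gate: enforced in code before commitment, not requested in a prompt.
    if not can_approve(approver, amount):
        raise PermissionError(
            f"{approver.name} ({approver.role}) may not approve {amount:.2f}"
        )
    return f"committed {amount:.2f}, approved by {approver.name}"
```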

By contrast, Clothius, a specialist agent introduced in phase two to handle custom merch, worked better, mostly because its role was narrow and clear. Splitting that work off also let Claudius focus on running the shop.

That maps cleanly to procurement: The pattern that scales is not a single general agent that “runs procurement,” but specialists operating within defined workflows with defined boundaries.
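In code, “defined boundaries” often means each specialist gets an explicit allowlist of actions. The agent names and actions below are hypothetical; the sketch only shows that anything outside a specialist’s scope is refused rather than attempted, leaving the caller to route it elsewhere.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Specialist:
    name: str
    allowed_actions: FrozenSet[str]


# Hypothetical specialists, each with a narrow, explicit scope.
SPECIALISTS = {
    "intake": Specialist("intake", frozenset({"draft_request", "ask_clarifying_question"})),
    "vendor_check": Specialist("vendor_check", frozenset({"lookup_vendor", "flag_new_vendor"})),
    "merch": Specialist("merch", frozenset({"quote_custom_merch"})),
}


def dispatch(agent_name: str, action: str) -> str:
    """Return a placeholder result for an in-scope action; a real system would call the tool here."""
    spec = SPECIALISTS[agent_name]
    if action not in spec.allowed_actions:
        # Refuse out-of-scope work; the caller must route it to the right owner.
        raise PermissionError(f"{agent_name} is not allowed to {action}")
    return f"{agent_name}: {action}"
```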

Even after these changes, phase two didn’t turn into “autonomous commerce.” Anthropic is clear that there’s still a wide gap between “capable” and “consistently safe.” The agent was still open to naive decisions and social engineering.

The experiment still needed substantial human support, both for physical tasks and for getting the agents unstuck.

What this means for agentic procurement

Slick chat UX is easy to copy. But the advantage goes to teams that translate policy into enforced workflows, so the agent acts consistently within real controls.

That’s the philosophy behind Payhawk’s Procurement Agent. The point isn’t to add “AI” to intake. The point is to run a procurement procedure end-to-end, inside the tools people already use, with controls built in.

In Payhawk, the Procurement AI Agent helps employees create and manage requests in Slack, as well as in the Payhawk web portal or mobile app. It guides users step by step and is built to keep requests within the company’s configured workflows, rather than relying on ad hoc judgment. It also helps keep requests moving through notifications and follow-ups in Slack, so work doesn’t stall.

Notice what’s missing: It’s not “the agent buys things.” It’s “the agent helps the organisation run procurement cleanly.” That’s what makes delegation possible.

How to evaluate procurement agents after Vend 2

If you want a practical way to evaluate procurement agents after Project Vend 2, don’t ask for a demo of “what the model can do.” Ask how the system forces procedure.

Questions that matter:

Can it turn messy chat into a structured request with required fields every time, without someone retyping it?
Can it route approvals by category, thresholds, and policy, without guessing who should approve?
Does it run checks before commitment (budget, vendor status, basic price sanity)?
Does it handle exceptions as a real path (new vendor, out of policy, urgent buy, missing info) and package them into a short escalation that an approver can actually decide on?
Can it produce an audit trail that explains what happened and why, without reconstructing the story from Slack threads?
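Several of these questions come down to whether routing and checks live in explicit policy rather than in the model’s judgment. The Python sketch below is a hypothetical illustration, not Payhawk’s implementation: the categories, thresholds, approved-vendor list, and 20% price tolerance are assumptions. Approvers come from a lookup table, checks run before commitment, and any failures are bundled into one short message an approver can decide on.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed policy: per category, ascending (ceiling, approver chain) pairs in EUR.
ROUTING = {
    "software": [(5_000, ["it_lead"]), (float("inf"), ["it_lead", "finance_manager"])],
    "office": [(1_000, ["team_lead"]), (float("inf"), ["team_lead", "finance_manager"])],
}
APPROVED_VENDORS = {"Acme Office Supply", "CloudSoft"}  # hypothetical vendor master
PRICE_TOLERANCE = 1.2  # assumed: flag totals more than 20% above a known benchmark


@dataclass
class Request:
    category: str
    amount: float
    vendor: str
    budget_remaining: float
    benchmark_total: Optional[float] = None


def route(req: Request) -> List[str]:
    """Approvers come from the policy table, never from the model's guess."""
    if req.category not in ROUTING:
        raise KeyError(f"no routing rule for category {req.category!r}")
    for ceiling, approvers in ROUTING[req.category]:
        if req.amount <= ceiling:
            return approvers
    raise ValueError("routing table has no matching threshold")


def pre_commit_issues(req: Request) -> List[str]:
    """Checks that run before any commitment; each failure is an exception path."""
    issues = []
    if req.amount > req.budget_remaining:
        issues.append(f"over budget by {req.amount - req.budget_remaining:.2f}")
    if req.vendor not in APPROVED_VENDORS:
        issues.append(f"new vendor: {req.vendor}")
    if req.benchmark_total is not None and req.amount > req.benchmark_total * PRICE_TOLERANCE:
        issues.append(
            f"price {req.amount:.2f} is more than 20% above benchmark {req.benchmark_total:.2f}"
        )
    return issues


def escalation_summary(req: Request) -> Optional[str]:
    """One short message an approver can decide on, or None if nothing needs escalating."""
    issues = pre_commit_issues(req)
    if not issues:
        return None
    return (
        f"{req.category} purchase of {req.amount:.2f} from {req.vendor} "
        f"needs a decision: " + "; ".join(issues)
    )
```

The design choice worth checking in any real system is the same one shown here: if changing who approves a purchase requires editing a policy table rather than a prompt, routing is being enforced rather than guessed.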

Project Vend 2 is a reminder that polished output is easy. The hard part is operational reliability: Explicit procedures, connected tools, and enforced control points that keep the model behaving like a steady operator.

The best systems feel simple to employees: “Message Slack and it gets done.” But under the hood, they are strict. That’s what procurement needs and what finance can sign off on.

Project Vend didn’t show that agents are ready to run businesses alone. It showed something more useful: The road to trustworthy agents in finance is paved with the boring stuff — procedures, checklists, cost visibility, routing logic, and audit trails.

Book a demo with Payhawk to see how procurement workflows can run end-to-end inside Slack and Payhawk — with procedures, controls, and auditability built in from the start.

Georgi Ivanov

Senior Communications Manager


Georgi Ivanov is a former CFO turned marketing and communications strategist who now leads brand strategy and AI thought leadership at Payhawk, blending deep financial expertise with forward-looking storytelling.
