Three weeks after shipping to production, a team's inference bill is running 4x projections. Their "AI agent" works flawlessly in demos but silently degrades on complex requests. When something fails mid-workflow, the on-call engineer can't reconstruct what the system actually did — there's no meaningful trace, no checkpoint, no way to distinguish what succeeded from what didn't. The postmortem will blame the model, or the prompts, or the cloud provider.

It should blame the architecture.

This pattern is becoming endemic. The culprit isn't incompetence — it's a category error that vendor marketing has made extremely easy to commit. Teams are deploying MCP, watching tools execute, and calling it an agentic architecture. They've solved a transport problem and mistaken it for an orchestration strategy. The production failures are already showing up. So are the infrastructure bills.

---

What MCP Actually Is

Start with a precise mental model, because the confusion downstream traces back to a fuzzy one here.

MCP — the Model Context Protocol, developed by Anthropic and now under Linux Foundation governance through the Agentic AI Foundation — is a transport and discovery protocol. It standardises how tools are described, discovered, and invoked across client-server boundaries using JSON-RPC 2.0. Its intellectual lineage is the Language Server Protocol, which solved an almost identical problem in developer tooling: instead of every IDE building custom integrations for every language analysis server, LSP gave them a common interface. MCP does the same thing for AI model-to-tool connectivity.

This is genuinely valuable. Before MCP, every vendor integration was a custom connector. OpenAI's function-calling API was powerful but vendor-locked. ChatGPT plugins were a stopgap. MCP's standardised schema means a compatible model can discover and invoke tools without bespoke integration work for each one. Linux Foundation governance signals serious long-term standardisation intent — this isn't a proprietary play that gets deprecated in eighteen months.

None of that changes what MCP is: plumbing. Standardised, well-designed, increasingly well-governed plumbing.

The analogy that matters: HTTP doesn't build your web application — it specifies how clients and servers exchange messages. Nobody conflates setting up HTTP routing with building an application. MCP is closer to HTTP than it is to an application server. It handles the message format and the transport. It does not handle what gets decided, when, or why. As IBM's technical documentation states explicitly: MCP does not decide when a tool is called and for what purpose.

That gap — between invoking a tool and deciding to invoke it — is where the expensive confusion lives.

---

What an Agent Actually Requires

A model that can call tools is not an agent. A model that can call tools is a model that can call tools. The distinction matters architecturally.

A genuine agent architecture requires five capabilities that MCP provides exactly zero of:

- Goal decomposition: breaking an objective into a sequence of sub-tasks, determining dependencies, and managing execution order.
- State persistence: maintaining a coherent record of what has been done, what succeeded, what failed, and what context exists at each step.
- Conditional execution: branching workflow logic based on intermediate results, not just executing a predetermined sequence.
- Error recovery with context: not just retrying a failed step, but retrying it with knowledge of what succeeded before it, so recovery is semantically informed rather than blind.
- Role coordination: in multi-agent systems, managing who does what, who supervises whom, and how conflicting actions are arbitrated.

Frameworks like LangGraph and CrewAI exist to provide these capabilities. They give you graph-based execution where nodes are tasks and edges encode dependencies and conditions. They give you state objects that persist across steps. They give you memory layers — in-context for immediate reasoning, database-backed for long-term retention, episodic for workflow history. They give you the infrastructure to build systems that can actually pursue goals rather than just respond to prompts.
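The pattern those frameworks implement can be sketched in plain Python. This is a hand-rolled illustration of the idea, not the LangGraph or CrewAI API: nodes are functions, edges are their return values (so branching is conditional by construction), and a state object persists across every step.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """Persisted across every step — the record a flat loop lacks."""
    completed: list = field(default_factory=list)
    results: dict = field(default_factory=dict)

def gather(state: State) -> str:
    state.results["data"] = ["source_a", "source_b"]
    return "synthesise"  # the edge: which node runs next

def synthesise(state: State) -> str:
    state.results["summary"] = f"{len(state.results['data'])} sources merged"
    return "done"

NODES = {"gather": gather, "synthesise": synthesise}

def run(start: str = "gather") -> State:
    state, node = State(), start
    while node != "done":
        next_node = NODES[node](state)  # node may branch based on state
        state.completed.append(node)    # checkpointable progress record
        node = next_node
    return state
```

Everything an orchestration framework adds — persistence backends, retries, parallel branches — elaborates on this core loop: a state object plus a graph of conditional transitions.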

MCP can make those frameworks richer by giving them standardised tool access. That's the correct relationship: MCP as the integration layer beneath an orchestration layer. The problem is what happens when the orchestration layer doesn't exist.

What you get is what practitioners call the flat loop:

User Request → LLM → MCP Tool Call → Result → LLM → MCP Tool Call → Result → [repeat]

No state object. No workflow graph. No error branching. This works for simple, single-step requests — which is exactly what demos show. It collapses under any workflow that requires more than two sequential decisions.
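For contrast, the flat loop made explicit. `call_llm` and `call_mcp_tool` are hypothetical stand-ins for a model API and an MCP client; the point is what the structure omits.

```python
def flat_loop(user_request, call_llm, call_mcp_tool, max_turns=10):
    transcript = user_request
    for _ in range(max_turns):
        decision = call_llm(transcript)  # model picks the next tool, or finishes
        if decision.get("final"):
            return decision["answer"]
        result = call_mcp_tool(decision["tool"], decision["args"])
        transcript += f"\n{decision['tool']} -> {result}"
        # Note what is missing: no state object, no record of which steps
        # succeeded, no failure branch — a mid-run error loses everything.
    raise RuntimeError("turn budget exhausted with no final answer")
```

The only "memory" is the transcript string, which grows unboundedly and vanishes on failure.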

Consider a representative business workflow: gather data from multiple sources, synthesise findings, format output for a specific audience, route for human approval, then distribute. Run this through a flat loop and you have a system that cannot track which steps completed, cannot branch based on approval outcomes, and cannot recover from a synthesis failure without restarting from data gathering. In production, under real load, with real failure rates, this architecture doesn't degrade gracefully. It fails opaquely.

---

The Costs That Show Up in Month Three

The budget destruction happens in three distinct ways. None of them appear in the deployment cost estimate.

The token burn problem. Anthropic's own engineering team has flagged this: tool descriptions consume context window space, increasing both latency and cost. When an MCP server exposes a large tool inventory to a general-purpose model, every request begins with the model processing every tool description in context before reasoning about which one to use. At 500–2,000 tokens per tool description, a server with 50 tools creates 25,000–100,000 tokens of overhead per request. That overhead compounds with every call. It doesn't show up in the deployment cost — it shows up in the inference bill three months later, by which point the architecture is load-bearing and expensive to change.
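The arithmetic above is worth making explicit. A back-of-envelope sketch, with a hypothetical request volume and per-million-token price plugged in:

```python
def tool_overhead(n_tools, tokens_per_desc, requests_per_day, price_per_mtok):
    """Daily input-token cost attributable to tool descriptions alone."""
    tokens_per_request = n_tools * tokens_per_desc
    daily_tokens = tokens_per_request * requests_per_day
    return tokens_per_request, daily_tokens * price_per_mtok / 1_000_000

per_req, daily_cost = tool_overhead(
    n_tools=50, tokens_per_desc=1_000,
    requests_per_day=10_000, price_per_mtok=3.00,  # illustrative $/M input tokens
)
# 50 tools x 1,000 tokens = 50,000 tokens of overhead per request;
# at 10k requests/day that is $1,500/day before the model has done
# any useful reasoning.
```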

The fix requires a semantic tool router: an intent-classification layer that narrows the available tool set to a relevant subset — typically five to ten tools — before the model reasons about them. This cuts context overhead by 80–90% for large tool inventories and meaningfully improves reasoning quality, because the model isn't distributing attention across irrelevant options. That routing layer is an orchestration responsibility. It doesn't exist in MCP's spec.
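A toy version of that router, to make the shape concrete. Keyword overlap stands in for real intent classification here; a production router would use embeddings or a small classifier, but the contract is the same: request in, top-k tool subset out.

```python
def route_tools(request, tool_descriptions, k=5):
    """tool_descriptions: {tool_name: description}. Returns the k best-matching names."""
    req_words = set(request.lower().split())
    def score(item):
        _, desc = item
        return len(req_words & set(desc.lower().split()))
    ranked = sorted(tool_descriptions.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:k]]

# Illustrative inventory — only the routed subset reaches the model's context.
tools = {
    "crm_lookup": "search customer records in the CRM by name or email",
    "send_email": "send an email message to a recipient",
    "run_sql": "execute a read-only sql query against the warehouse",
}
subset = route_tools("find the customer email in the CRM", tools, k=1)
```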

The workflow restart cost. Without an orchestration layer checkpointing state, a failed step in a multi-step workflow triggers a full restart. In a five-step research and synthesis workflow, a failure at step four means re-executing steps one through three — re-calling upstream tools, re-burning tokens, and re-triggering any side effects those steps produced. If step two sent an email or wrote to a database, it does so again. During failure periods, per-workflow costs run three to ten times projected amounts. This isn't a monitoring problem. You cannot monitor your way out of a stateless architecture. Faster failure detection helps, but the restart cost is structural.
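Checkpointing is the structural fix, and it is small. A sketch, with a JSON file standing in for whatever durable store the orchestrator uses: completed step results are persisted as they land, so a re-run skips straight to the failed step instead of re-executing everything upstream.

```python
import json
import pathlib

def run_checkpointed(steps, ckpt_path):
    """steps: ordered list of (name, fn); each fn(results) -> result."""
    path = pathlib.Path(ckpt_path)
    results = json.loads(path.read_text()) if path.exists() else {}
    for name, fn in steps:
        if name in results:          # already done — skip, don't re-burn tokens
            continue
        results[name] = fn(results)  # may raise; completed work is safe on disk
        path.write_text(json.dumps(results))
    return results
```

With this in place, a failure at step four re-runs step four, not steps one through three, and side effects from completed steps fire once.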

Tool description drift. An MCP server gets updated — new tools added, existing descriptions modified. The model now operates on a different tool set than was used during evaluation. Unlike a traditional API breaking change, this failure mode doesn't throw an error. It degrades decision quality silently, in production, with no obvious signal. Catching it requires versioned tool registries and regression testing at the orchestration layer — infrastructure that doesn't exist in most bolt-on MCP implementations because the category wasn't on anyone's architectural radar when the system was built.
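A minimal version of that registry check: fingerprint the tool schema the system was evaluated against, and fail loudly at startup if the live server now exposes something different, rather than degrading silently.

```python
import hashlib
import json

def registry_fingerprint(tools):
    """tools: {name: description}. Order-independent hash of the schema."""
    canonical = json.dumps(tools, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def assert_no_drift(live_tools, pinned_fingerprint):
    fp = registry_fingerprint(live_tools)
    if fp != pinned_fingerprint:
        raise RuntimeError(
            f"tool registry drift: expected {pinned_fingerprint[:12]}, got {fp[:12]} "
            "- re-run evaluation before serving traffic"
        )
```

The pinned fingerprint becomes part of the deployment artifact, the same way a lockfile pins dependency versions.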

---

The Architecture That Actually Works

The correct pattern is straightforward once the separation of concerns is clear:

Orchestrator (LangGraph / equivalent)
  → Task Graph with persisted State object
      → When tool execution is needed:
          MCP Client → MCP Server → External Tool / API / Database
          ← Structured result returned to orchestrator
  → State updated, next graph node determined

MCP handles what it's designed for — standardised tool connectivity with a clean schema and transport layer. The orchestrator handles all decisional logic: which tools to call, in what sequence, with what conditional branching, and what to do when something fails. The state object lives at the orchestration layer, checkpointing progress so failures are recoverable rather than catastrophic.

The LSP analogy is instructive here too. LSP succeeded because IDEs already had rich orchestration logic for editing, refactoring, and navigation. LSP standardised how those IDEs talked to language analysis servers — it didn't replace the editors' internal logic. MCP is the same kind of advance. It standardises the interface between your orchestration layer and the tools it calls. It doesn't replace the orchestration layer.

One caveat worth naming: MCP's experimental Tasks specification describes durable execution wrappers with status tracking and deferred result retrieval. If this matures, it will push some orchestration capability into the protocol itself, and the calculation here changes. Watch it. Don't build production systems on it yet. Shipping load-bearing infrastructure on experimental specs is precisely the kind of architecture bet that creates the budget problems this article describes.

---

The Governance Gap Nobody Is Discussing

MCP standardises how tools are called. It says nothing about whether they should be called, by whom, with what authorisation, or with what audit trail.

In enterprise environments, a tool call that triggers a database write, sends a customer-facing communication, or initiates a financial transaction needs authorisation gates that exist entirely outside the MCP layer. That compliance exposure only becomes visible when something goes wrong — which, given the stateless failure modes described above, is a question of when, not whether.

Teams building "agentic" systems on MCP alone are implicitly assuming these controls exist somewhere else in the stack. They may exist in principle. The question worth asking in your next design review is whether they're actually wired into the workflow execution path, or merely assumed to be present downstream.
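"Wired into the execution path" can be as simple as a wrapper the orchestrator routes every tool call through. A sketch — the policy function, audit sink, and tool names are illustrative placeholders, not a real framework's API:

```python
import datetime

# Tools whose invocation needs an authorisation decision, not just transport.
SIDE_EFFECTING = {"send_email", "write_db", "initiate_payment"}

def gated_call(tool_name, args, caller, policy, audit_log, execute):
    """execute: the underlying MCP invocation, called only if authorised."""
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
    if tool_name in SIDE_EFFECTING and not policy(caller, tool_name):
        audit_log.append({"ts": ts, "caller": caller,
                          "tool": tool_name, "allowed": False})
        raise PermissionError(f"{caller} is not authorised to call {tool_name}")
    audit_log.append({"ts": ts, "caller": caller,
                      "tool": tool_name, "allowed": True})
    return execute(tool_name, args)
```

The gate sits between the orchestrator and the MCP client, so every side-effecting call leaves an audit entry whether it succeeds or is refused.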

---

What to Do This Week

If you have a system in production that was described as an "AI agent" and was built primarily by deploying an MCP server, run one diagnostic: find the state object.

Where is the record of what your workflow has done, what succeeded, and what the system knew at each decision point? If that object doesn't exist — if there's no orchestration layer maintaining it — you have a flat loop in production. The costs above are either already accumulating or waiting for load to expose them.

If you're mid-build, the priority inversion is counterintuitive but important: design the orchestration layer before the integration layer. Map your workflow as a graph — nodes for tasks, edges for dependencies and conditions, explicit state for what needs to persist across steps. Then use MCP to connect the tools that graph will need. In that order, not the reverse.

MCP is a genuine advancement in AI infrastructure standardisation. The mistake isn't adopting it. The mistake is treating connectivity as architecture — and discovering six months later that you built a very expensive, very well-connected system with no floor.