Healthcare providers lose 30-60 minutes per insurance claim error, with the industry wasting $262 billion annually on administrative overhead. At Kustode, we built a multi-tenant Revenue Cycle Management (RCM) platform processing thousands of daily EDI transactions — 837P claims, 835 remittances, and 270/271 eligibility checks. While we automated the EDI pipes early on, the intelligent workflow orchestration — denial management, prior authorization, claim intervention — remained stubbornly manual. Until we integrated MCP.

This post shares the production lessons from deploying MCP workflows in a HIPAA-compliant healthcare platform. I'll be presenting this in depth at the MCP Dev Summit North America in New York on April 3, 2026.

The Problem: $262 Billion in Administrative Waste

Healthcare billing is a domain where small errors cascade into massive costs. The numbers are stark:

  • $262B annual administrative waste in the US healthcare system
  • 11.8% claim denial rate industry-wide
  • $57 average rework cost per denied claim
  • 30-60 minutes per error for manual claim correction

Behind each of these numbers is a billing specialist navigating between CMS coverage databases, NCCI edit rules, payer-specific policies, and clinical documentation — copying data from one system, cross-referencing in another, and manually constructing appeal letters. The workflows are complex, multi-step, and require domain expertise. They're also perfectly suited for AI agent orchestration.

Why MCP Instead of Traditional API Orchestration?

Before MCP, we evaluated several approaches: hard-coded workflow engines, LangChain-style chains, and custom API orchestration layers. Each had fundamental limitations for our use case:

  • Workflow engines (Temporal, Airflow) require predefined DAGs. Denial resolution paths vary per payer, per denial code, per facility. The combinatorial space is too large for static workflows.
  • LangChain/LangGraph chains couple tool definitions with orchestration logic. When you need the same CMS coverage tool accessible to denial resolution, medical necessity review, and eligibility checking agents, the abstraction breaks down.
  • Custom API layers work but require writing and maintaining tool-specific integration code for every agent.

MCP solved this by separating tool providers (MCP servers) from tool consumers (agents). Each domain knowledge source — CMS coverage, RVU data, medical coding, NCCI edits, denial codes — runs as an independent MCP server. Agents connect to the servers they need for a given task.

Architecture: MCP Servers as Healthcare Knowledge APIs

Our MCP server topology maps directly to the knowledge domains billing specialists rely on:

┌──────────────────────────────────────────────────────┐
│  Agents                                              │
│  ┌──────────────┐ ┌───────────────┐ ┌──────────────┐ │
│  │ CodeResearch │ │ MedNecessity  │ │ DenialResolv │ │
│  └──────┬───────┘ └───────┬───────┘ └──────┬───────┘ │
│         │   MCP tool calls│                │         │
├─────────┼─────────────────┼────────────────┼─────────┤
│  MCP Servers                                         │
│  ┌────────────┐ ┌─────────┐ ┌──────────┐ ┌────────┐  │
│  │CMS Coverage│ │ CMS RVU │ │Med Coding│ │  NCCI  │  │
│  └────────────┘ └─────────┘ └──────────┘ └────────┘  │
│  ┌──────────────┐                                    │
│  │ Denial Codes │                                    │
│  └──────────────┘                                    │
├──────────────────────────────────────────────────────┤
│  External APIs                                       │
│  CMS.gov  │  NIH Clinical Tables  │  CMS Downloads   │
└──────────────────────────────────────────────────────┘

Each MCP server is built with FastMCP (Python) and exposes domain-specific tools. The CMS Coverage server, for example, provides tools like search_coverage_by_cpt, get_lcd_details, and check_ncd_applicability. The NCCI Edits server provides check_code_pair_edits and get_modifier_requirements.
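To make the tool contract concrete, here is a minimal sketch. The ToolServer class is a plain-Python stand-in for FastMCP's decorator registration (real servers use FastMCP's @tool decorator), and the coverage data is invented for the example; the actual server queries CMS.gov.

```python
from dataclasses import dataclass, field
from typing import Callable

# Plain-Python stand-in for FastMCP's tool registry: real servers
# register tools with FastMCP's @tool decorator instead.
@dataclass
class ToolServer:
    name: str
    tools: dict[str, Callable] = field(default_factory=dict)

    def tool(self, fn: Callable) -> Callable:
        self.tools[fn.__name__] = fn  # register under the function name
        return fn

cms_coverage = ToolServer("cms-coverage")

# Illustrative coverage data; the real server queries CMS.gov.
_COVERAGE = {
    "99213": {"policy": "LCD L34567", "criteria": ["established patient", "documented E/M level"]},
}

@cms_coverage.tool
def search_coverage_by_cpt(cpt_code: str) -> dict:
    """Return LCD/NCD coverage criteria for a CPT code."""
    return _COVERAGE.get(cpt_code, {"policy": None, "criteria": []})

result = cms_coverage.tools["search_coverage_by_cpt"]("99213")
```

The function signature and docstring are the tool's schema: FastMCP derives the tool's input schema from type hints, so agents discover what each tool accepts without any agent-side integration code.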

Real Workflow: Automated Denial Resolution

Here's what happens when a claim denial arrives via an 835 remittance:

  1. Denial ingestion — The platform receives the 835, extracts CARC/RARC codes, and creates a denial work item.
  2. Agent activation — The DenialResolution agent receives the denial context: CPT codes, diagnosis codes, modifiers, denial reason, payer, and facility.
  3. Evidence gathering via MCP — The agent calls MCP tools to gather evidence:
    • cms_coverage.search_coverage_by_cpt("99213") → Gets LCD/NCD coverage criteria
    • ncci_edits.check_code_pair("99213", "36415") → Verifies bundling rules
    • denial_codes.get_resolution_guidance("CO-4") → Gets payer-specific resolution patterns
  4. Reasoning trace — Every MCP tool call produces a ReasoningTrace with confidence scores, evidence sources, and the logical chain. This feeds into SENTINEL for audit.
  5. Resolution action — Based on evidence, the agent either corrects and resubmits the claim, generates an appeal letter with cited policy references, or escalates to a human specialist with a structured summary.

A workflow that previously took 30-60 minutes of human research now completes in under 60 seconds.
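The evidence-gathering and decision steps can be sketched as follows. The MCP calls are mocked with canned responses, and the confidence threshold is an illustrative assumption, not our production rule:

```python
# Sketch of the denial-resolution loop. MCP tool calls are mocked
# with canned responses; the decision threshold is illustrative.
def mock_mcp_call(server: str, tool: str, *args) -> dict:
    canned = {
        ("cms_coverage", "search_coverage_by_cpt"): {"covered": True, "policy": "LCD"},
        ("ncci_edits", "check_code_pair"): {"bundled": False, "modifier_allowed": True},
        ("denial_codes", "get_resolution_guidance"): {"action": "resubmit", "confidence": 0.92},
    }
    return canned[(server, tool)]

def resolve_denial(cpt: str, secondary_cpt: str, carc: str) -> str:
    # Step 3: gather evidence from each knowledge-domain server.
    coverage = mock_mcp_call("cms_coverage", "search_coverage_by_cpt", cpt)
    bundling = mock_mcp_call("ncci_edits", "check_code_pair", cpt, secondary_cpt)
    guidance = mock_mcp_call("denial_codes", "get_resolution_guidance", carc)
    # Step 5: act only when every evidence source agrees and the
    # guidance confidence is high; otherwise escalate to a human.
    if coverage["covered"] and not bundling["bundled"] and guidance["confidence"] >= 0.8:
        return guidance["action"]
    return "escalate"

print(resolve_denial("99213", "36415", "CO-4"))  # resubmit
```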

Multi-Tenant MCP: PHI Isolation in a Shared Architecture

Multi-tenancy in healthcare means PHI isolation isn't optional — it's a HIPAA requirement. Our MCP architecture enforces tenant boundaries at three levels:

  • Request-level isolation — Every MCP tool call carries a tenant context (organization ID). MCP servers filter all queries by tenant scope. A tool call from Org A can never return data belonging to Org B.
  • Database-level isolation — PostgreSQL row-level security policies enforce tenant boundaries at the query level. Even if agent logic had a bug, the database would reject cross-tenant access.
  • Audit trail — Every MCP tool invocation is logged with tenant ID, user ID, tool name, parameters, and the response summary. The audit trail is immutable and tenant-scoped.
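A minimal sketch of the request-level check, with an in-memory record store standing in for the tenant-scoped database (the record and field names are illustrative):

```python
# Sketch of request-level tenant scoping. The in-memory store and
# field names are illustrative stand-ins for the real database.
class TenantIsolationError(Exception):
    pass

_CLAIMS = [
    {"claim_id": "CLM-1", "org_id": "org_a", "status": "denied"},
    {"claim_id": "CLM-2", "org_id": "org_b", "status": "paid"},
]

def get_claim(claim_id: str, tenant_org_id: str) -> dict:
    """Every tool call carries the caller's organization ID; rows
    outside that scope are invisible, mirroring the row-level
    security policy enforced in PostgreSQL beneath."""
    for row in _CLAIMS:
        if row["claim_id"] == claim_id and row["org_id"] == tenant_org_id:
            return row
    raise TenantIsolationError(f"{claim_id} not visible to {tenant_org_id}")

get_claim("CLM-1", "org_a")    # ok: same tenant
# get_claim("CLM-2", "org_a")  # raises TenantIsolationError
```

The same check failing at two layers (application filter and database RLS) is the point: a bug in one layer still cannot leak PHI across tenants.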

Long-Running Workflows: Managing 45+ Day Denial Cycles

Healthcare denial resolution isn't a request-response pattern. A denial can take 45+ days to resolve, involving multiple rounds of additional documentation requests, payer communications, and appeals. MCP's tool-call model is inherently stateless — so we needed a state management layer on top:

  • Workflow state machine — Each denial/prior-auth workflow has a state machine (pending → researching → awaiting_documentation → submitted → resolved/escalated). State transitions are durable and resumable.
  • Checkpoint-and-resume — When a workflow pauses (waiting for a payer response, additional documentation, or human review), its full context is serialized. When the trigger arrives (835 update, document upload, timer), the agent resumes from the exact state with all prior MCP tool results available.
  • Timeout and escalation — Configurable per-payer timeout rules automatically escalate workflows that exceed expected response windows.
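A simplified sketch of the state machine and the checkpoint-and-resume pattern. The state names come from the workflow above; the exact transition table and serialization format are assumptions:

```python
import json

# Allowed transitions for a denial workflow. The state names come
# from the post; the exact table shown here is an assumption.
TRANSITIONS = {
    "pending": {"researching"},
    "researching": {"awaiting_documentation", "submitted"},
    "awaiting_documentation": {"researching", "escalated"},
    "submitted": {"resolved", "escalated"},
}

class DenialWorkflow:
    def __init__(self, denial_id: str):
        self.denial_id = denial_id
        self.state = "pending"
        self.tool_results: list[dict] = []  # prior MCP evidence, kept across pauses

    def transition(self, new_state: str) -> None:
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def checkpoint(self) -> str:
        """Serialize the full context so the workflow can pause for days."""
        return json.dumps(self.__dict__)

    @classmethod
    def resume(cls, blob: str) -> "DenialWorkflow":
        wf = cls.__new__(cls)
        wf.__dict__.update(json.loads(blob))
        return wf

wf = DenialWorkflow("DEN-42")
wf.transition("researching")
wf.tool_results.append({"tool": "search_coverage_by_cpt", "covered": True})
blob = wf.checkpoint()                 # pause: waiting on payer response
resumed = DenialWorkflow.resume(blob)  # trigger (835 update) arrives days later
```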

Security Patterns for Regulated Environments

Deploying MCP in a HIPAA-compliant environment required security patterns beyond standard MCP usage:

  • AEGIS integration — All agent inputs pass through our AEGIS security proxy before reaching MCP-connected agents. This prevents prompt injection attacks from reaching billing data.
  • Tool-level RBAC — Not all agents can access all MCP tools. The CodeResearch agent can call CMS coverage tools but cannot access claims data. The DenialResolution agent has claims access but cannot modify coverage rules. Enforced via agentgateway with CEL policies.
  • PHI redaction in traces — MCP tool responses that contain PHI (patient names, dates of birth) are automatically redacted in reasoning traces and logs. The operational data pipeline never contains unredacted PHI.
  • No PHI in model context — Agent prompts use anonymized identifiers (claim IDs, organization IDs) rather than patient-identifying information. The MCP servers perform the mapping between anonymous IDs and actual records.
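A minimal sketch of trace-level redaction. The two patterns shown, a DOB-style date regex and a known-name list, are simplified stand-ins for the real detection pipeline:

```python
import re

# Simplified trace redaction: the production pipeline uses richer
# PHI detection; these two patterns are illustrative only.
DOB_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")  # e.g. 03/14/1962

def redact_trace(text: str, patient_names: list[str]) -> str:
    """Mask dates of birth and known patient names before a tool
    response enters reasoning traces or logs."""
    text = DOB_RE.sub("[DOB-REDACTED]", text)
    for name in patient_names:
        text = text.replace(name, "[NAME-REDACTED]")
    return text

trace = "Claim CLM-7 for Jane Doe, DOB 03/14/1962, denied CO-4"
print(redact_trace(trace, ["Jane Doe"]))
# Claim CLM-7 for [NAME-REDACTED], DOB [DOB-REDACTED], denied CO-4
```

Note that claim IDs and denial codes survive redaction: the trace stays useful for audit and debugging while the PHI does not.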

Observability: Tracing MCP Tool Calls in Production

We instrument every MCP interaction with OpenTelemetry. Each tool call becomes a span in a distributed trace, carrying:

  • Tool name and MCP server
  • Input parameters (PHI-redacted)
  • Response latency and size
  • Confidence score from the agent's reasoning
  • Tenant and user context
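As a sketch of how a tool call becomes a span: the context manager below is a minimal stand-in for an OpenTelemetry tracer (production code uses the OpenTelemetry SDK), and the attribute names are illustrative:

```python
import time
from contextlib import contextmanager

# Minimal stand-in for an OpenTelemetry tracer; production code
# creates real SDK spans. Attribute names here are illustrative.
SPANS: list[dict] = []

@contextmanager
def tool_call_span(server: str, tool: str, tenant_id: str, params: dict):
    span = {"mcp.server": server, "mcp.tool": tool,
            "tenant.id": tenant_id, "mcp.params": params}  # params pre-redacted
    start = time.perf_counter()
    try:
        yield span
    finally:
        span["latency_ms"] = (time.perf_counter() - start) * 1000
        SPANS.append(span)  # the SDK would export this to the collector

with tool_call_span("ncci_edits", "check_code_pair",
                    "org_a", {"cpt": "99213", "secondary": "36415"}) as span:
    span["mcp.response_bytes"] = 512  # recorded after the call returns
```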

Prometheus metrics track tool call rates, error rates, and latency distributions per MCP server. Grafana dashboards give us real-time visibility into which tools are being called most, which are slowest, and which have the highest error rates. This observability data feeds into capacity planning for the MCP servers themselves.

When MCP Beats Traditional API Orchestration

After six months of production deployment, clear patterns have emerged for when MCP adds value versus when it's overhead:

MCP wins when:

  • The workflow requires multiple knowledge sources with varying combinations per task
  • New knowledge sources need to be added without changing agent code
  • The same tools are shared across multiple agents with different access policies
  • Reasoning transparency is required (MCP's structured tool-call format enables audit trails)

Traditional APIs win when:

  • The workflow is deterministic and rarely changes
  • Latency requirements are sub-millisecond
  • No reasoning transparency is needed

For healthcare billing, the dynamic multi-source research pattern makes MCP a natural fit. Every payer has different rules, every denial has a different resolution path, and the knowledge sources (CMS databases, NCCI edits, payer policies) are constantly being updated.

Hear More at MCP Dev Summit

I'll be presenting this in full detail at the MCP Dev Summit North America on April 3, 2026, in New York. The talk covers production metrics, deployment challenges, live architecture walkthroughs, and lessons learned from integrating MCP into an existing production system rather than a greenfield build.