AI · February 28, 2026 · 7 min read

We Gave Our AI Agents a Database. They Stopped Hallucinating.

DB-first state, complexity-routed models, and why fire-and-forget LLM calls are an insurance liability.

The Hallucination Problem Nobody Talks About

Every AI demo shows the happy path. The agent classifies a document correctly. Files a claim. Sends a notification. The audience claps.

What they don't show: the agent that invents a claim status that doesn't exist. The agent that reports a $45,000 reserve when the actual number is $4,500. The agent that "remembers" a policy endorsement from a previous conversation that never happened, because its context window reset between sessions.

In insurance, a hallucinated data point isn't an annoying chatbot glitch. It's a regulatory violation, a misquoted premium, or a fraud referral on a clean claim. We needed agents that couldn't lie, even by accident.

The Fix: Database-First State

The agent doesn't own state. The database does. Before every conversation turn, the agent calls syncFromDb() and pulls the latest authoritative data: claim status, reserve amount, fraud signals, document inventory. If an adjuster updated the claim while the agent was idle, the agent picks it up on the next sync.

When the agent takes an action (escalates to SIU, updates a reserve, requests a document), it writes to the database first, then re-syncs its in-memory state. The agent's "memory" is always a reflection of ground truth, never a divergent cache.

This is a deliberate architectural choice: the agent holds session context (conversation history, reasoning chain), but the database holds authoritative state (claim numbers, dollar amounts, statuses). The two never conflict because state flows in one direction: database → agent.

Not Every Decision Needs Claude

We run 15 agents in production. Most of them handle routine tasks: parsing a document, classifying an email, updating a timeline. For these, a fast inference model running at the edge is more than sufficient, and 4x cheaper per token.

But some decisions are high-stakes. Fraud detection on a $200,000 claim. An SIU referral that triggers a human investigation. An underwriting decision that binds a carrier to $5M in liability. These need the best reasoning model available.

Our base agent class exposes a single property: taskComplexity. Standard complexity routes to the fast model. High-stakes routes to the reasoning model. The decision is per-agent, not per-request, because the nature of the work determines the required reasoning depth, not the individual message.

The claim agent is always high-stakes. The document classifier is always standard. The submission agent defaults to standard but can escalate per-tool when FMCSA enrichment reveals a carrier with revoked authority.
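The routing itself can be sketched in a few lines. Model identifiers and class names here are placeholders, not the production ones:

```typescript
type TaskComplexity = "standard" | "high-stakes";

// Placeholder model IDs for illustration.
const FAST_MODEL = "fast-edge-model";
const REASONING_MODEL = "frontier-reasoning-model";

abstract class BaseAgent {
  // Each agent declares its complexity once; the base class routes.
  abstract readonly taskComplexity: TaskComplexity;

  // Per-agent, not per-request: the routing decision is a property
  // of the agent's job, not of any individual message.
  get model(): string {
    return this.taskComplexity === "high-stakes" ? REASONING_MODEL : FAST_MODEL;
  }
}

class ClaimAgent extends BaseAgent {
  readonly taskComplexity = "high-stakes" as const;
}

class DocumentClassifierAgent extends BaseAgent {
  readonly taskComplexity = "standard" as const;
}
```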

Tools Are Transactions, Not Side Effects

Every agent tool follows the same pattern: read from DB, compute decision, write to DB, re-sync state, return result. No tool fires and forgets. No tool modifies in-memory state without persisting it first.

The fraud detection tool reads the policy inception date, compares it to the loss date, checks the claim description length, and generates severity-tagged signals. It merges new signals with existing ones (deduplicating by description), writes the merged list to the database, and only then returns the result to the agent.

If the tool fails (network error, database timeout), nothing is left half-done: either the write never committed and state is unchanged, or it committed and a retry produces the identical result. The agent can retry safely because every tool is idempotent. There is no partial state.
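The fraud detection tool above can be sketched as follows. Types, the heuristic, and the in-memory store are illustrative stand-ins for the real implementation:

```typescript
type Signal = { description: string; severity: "low" | "medium" | "high" };
type Claim = { inceptionDate: string; lossDate: string; signals: Signal[] };

// Stand-in for the database.
const store = new Map<string, Claim>();

function detectFraud(claimId: string): Signal[] {
  // 1. Read authoritative state.
  const claim = store.get(claimId);
  if (!claim) throw new Error(`Unknown claim ${claimId}`);

  // 2. Compute decision (one simplified heuristic shown).
  const computed: Signal[] = [];
  const daysToLoss =
    (Date.parse(claim.lossDate) - Date.parse(claim.inceptionDate)) / 86_400_000;
  if (daysToLoss < 30) {
    computed.push({ description: "Loss within 30 days of inception", severity: "high" });
  }

  // 3. Merge with existing signals, deduplicating by description.
  const byDescription = new Map(claim.signals.map((s) => [s.description, s]));
  for (const s of computed) byDescription.set(s.description, s);
  const merged = [...byDescription.values()];

  // 4. Write to the database first...
  store.set(claimId, { ...claim, signals: merged });

  // 5. ...and only then return. Re-running yields the same state,
  // so a retry after a failure is safe: the tool is idempotent.
  return merged;
}
```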

Vectorized Knowledge: Agents That Read the Manual

Agents don't just reason over claim data. They search a vectorized knowledge base of underwriting guidelines, regulatory requirements, and historical patterns. The search is semantic; an agent handling a workers' comp claim in California can pull WCIRB filing requirements without knowing the exact document title.

The neural index runs as a background workflow, continuously ingesting and embedding new documents, guidelines, and regulatory updates. When an agent calls its searchKnowledge tool, it's querying a live, up-to-date vector store, not a static prompt preamble.

This is the difference between an agent that says "I think California WC requires a WCIRB filing" (hallucination risk) and one that says "Per document UW-2026-CA-003, California WC requires WCIRB filing for all new policies effective after January 1, 2026" (grounded in retrieved evidence).
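At its core, the `searchKnowledge` tool is nearest-neighbor search over embeddings. A toy sketch, with tiny hand-made vectors standing in for a real embedding model and vector store:

```typescript
type Doc = { id: string; text: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k documents most similar to the query embedding.
function searchKnowledge(query: number[], index: Doc[], k = 1): Doc[] {
  return [...index]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

Because matching is by embedding similarity rather than by title, a query about "California WC filing rules" can land on the right guideline document even when the words don't overlap.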

Every Generation Is Traced

Insurance regulators will eventually ask: "Why did your AI escalate this claim to SIU?" You need an answer that includes the exact input, the exact model, the reasoning chain, the tool calls, and the final output. You need to know what it cost and how long it took.

Every agent generation is traced with full observability: model ID, prompt tokens, completion tokens, latency, cost in USD, complexity routing decision, and the complete tool call chain. The trace is immutable and queryable.

This isn't optional instrumentation we'll add later. It's built into the base class. Every agent that extends it gets tracing for free. You can't ship an unobserved agent because the framework won't let you.
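One way to make tracing impossible to skip, sketched with assumed field names (the real trace schema is not shown in this post): subclasses can only reach the model through a base-class method that records the trace.

```typescript
type Trace = {
  modelId: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  costUsd: number;
  toolCalls: string[];
};

abstract class TracedAgent {
  private readonly traces: Trace[] = [];

  // Subclasses implement the actual model call...
  protected abstract callModel(prompt: string): { text: string; tokens: number };

  // ...but every generation goes through here, so there is no
  // untraced code path.
  generate(prompt: string): string {
    const start = Date.now();
    const result = this.callModel(prompt);
    this.traces.push({
      modelId: "stub-model",        // placeholder for the routed model ID
      promptTokens: prompt.length,  // stand-in for a real tokenizer count
      completionTokens: result.tokens,
      latencyMs: Date.now() - start,
      costUsd: 0,                   // real pricing lookup omitted
      toolCalls: [],
    });
    return result.text;
  }

  // Traces are queryable; returning copies keeps the log immutable.
  getTraces(): readonly Trace[] {
    return this.traces.map((t) => ({ ...t }));
  }
}
```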

15 Agents. Zero Hallucinated State.

The pattern works because it constrains the agent to what it's good at (reasoning, pattern matching, natural language) and removes what it's bad at (remembering numbers, maintaining state, being deterministic). The database handles truth. The model handles inference. The framework handles safety.

Adding a new agent is about 100 lines of TypeScript: an initial state type, a syncFromDb() implementation, a system prompt, and a set of typed tools. The base class provides database access, model routing, observability, and lifecycle management.
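A hypothetical skeleton of what such an agent declares; the real base class supplies much more (database access, routing, tracing) than this stub suggests:

```typescript
// Stripped-down stand-in for the real base class.
abstract class AgentBase<S> {
  protected state!: S;
  abstract syncFromDb(): void;
  abstract readonly systemPrompt: string;
}

// 1. An initial state type.
type TimelineState = { claimNumber: string; events: string[] };

class TimelineAgent extends AgentBase<TimelineState> {
  // 2. A system prompt.
  readonly systemPrompt = "You maintain claim event timelines.";

  // 3. A syncFromDb() implementation (stubbed; real one queries the DB).
  syncFromDb(): void {
    this.state = { claimNumber: "C-1", events: [] };
  }

  // 4. Typed tools (the real version writes to the DB first).
  addEvent(event: string): void {
    this.state.events.push(event);
  }

  get eventCount(): number {
    return this.state.events.length;
  }
}
```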

We went from "AI is too risky for insurance" to "AI is too useful not to use in insurance" by making the risk architectural instead of aspirational.

Want to see the agents in action?