GROUNDING·Apr 22, 2026·5 min read

No confident nonsense

Hallucinations are a UX problem before they're a model problem. The fix isn't a smarter LLM — it's a stricter pipeline.

The most common complaint about AI support chat — fairly or not — is that it makes things up. It does. Every model in production today will, given enough surface area, generate a refund policy, an SLA, a discount code, an integration that doesn't exist. The temptation is to wait for the next model release to fix it. We don't think that works.

Hallucinations are a pipeline problem. The model is a component, not the cure.

The shape of a grounded answer

A Chatified answer isn't one model call. It's a sequence:

  1. Embed the visitor's question. Search a per-tenant Qdrant collection for the top-20 candidate chunks.
  2. Pass those 20 to a cross-encoder reranker. Keep the top 5 by relevance, drop the rest.
  3. Build a sectioned prompt — sources block, voice rules, citation rules, hard rules — and stream the answer.
  4. Inspect the answer for hedge language and missing citations. Classify the result as answered, retrieval_empty, or model_hedged.
  5. Render the answer with inline [n] citations to the chunks that supported each claim.
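To make steps 1 to 3 concrete, here is a minimal sketch of the retrieve, rerank, and prompt-building stages. It is not Chatified's code: `search` and `rerank_score` are hypothetical callables standing in for the vector search and the cross-encoder, the `Chunk` type and the toy helpers are invented for illustration, and the rule text inside the prompt is placeholder wording. Only the 20-candidate and 5-kept counts and the section names come from the steps above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    title: str
    text: str


# Hypothetical signatures: `SearchFn` stands in for the per-tenant vector
# search, `ScoreFn` for the cross-encoder. Neither is a real library API.
SearchFn = Callable[[str, int], Sequence[Chunk]]
ScoreFn = Callable[[str, Chunk], float]


def retrieve_and_rerank(
    question: str,
    search: SearchFn,
    rerank_score: ScoreFn,
    candidates: int = 20,
    keep: int = 5,
) -> List[Tuple[Chunk, float]]:
    """Steps 1-2: fetch a wide candidate set, rescore it, keep the best few."""
    found = search(question, candidates)
    scored = [(chunk, rerank_score(question, chunk)) for chunk in found]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep]


def build_prompt(question: str, ranked: Sequence[Tuple[Chunk, float]]) -> str:
    """Step 3: a sectioned prompt whose sources are numbered for [n] citations."""
    sources = "\n".join(
        f"[{n}] {chunk.title}: {chunk.text}"
        for n, (chunk, _score) in enumerate(ranked, start=1)
    )
    sections = [
        "## SOURCES\n" + sources,
        "## VOICE RULES\n(placeholder: merchant tone guidance goes here)",
        "## CITATION RULES\nCite the supporting source inline as [n].",
        "## HARD RULES\nAnswer only from SOURCES.",
        "## QUESTION\n" + question,
    ]
    return "\n\n".join(sections)


# Toy stand-ins so the sketch runs without a vector store or a model.
def toy_search(question: str, limit: int) -> Sequence[Chunk]:
    corpus = [Chunk(f"c{i}", f"Doc {i}", f"filler text {i}") for i in range(30)]
    corpus[7] = Chunk("c7", "Refunds", "refunds are issued within 14 days")
    return corpus[:limit]


def toy_score(question: str, chunk: Chunk) -> float:
    # Word overlap as a crude relevance score; a real reranker reads both texts.
    asked = set(question.lower().split())
    return float(len(asked & set(chunk.text.lower().split())))
```

Calling `retrieve_and_rerank("when are refunds issued", toy_search, toy_score)` returns five `(chunk, score)` pairs with the refunds chunk first, and `build_prompt` numbers them so the model can cite by position. Keeping the two stages behind plain callables means the wide, cheap search and the narrow, expensive rerank can be swapped independently.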

Steps 4 and 5 are where most chat tools cut corners and where most hallucinations leak through. If the retriever found nothing useful, the answer is suppressed and the visitor is offered a human. If the model had usable sources but its answer still came out as a hedge ("I don't know but…"), it's flagged as a miss anyway, because a hedge isn't the answer the merchant wanted.
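A rough sketch of what the step 4 check could look like follows. The three labels come from the list above; everything else is an assumption made for illustration: the hedge phrases, the `min_relevance` threshold used to decide that retrieval "found nothing useful", and the choice to file an answer with no valid citation under `model_hedged`.

```python
import re
from typing import Sequence

# Hypothetical hedge phrases; a production list would be longer and tuned.
HEDGE_PATTERNS = (
    r"\bi don[’']?t know\b",
    r"\bi[’']?m not sure\b",
    r"\bi (?:can[’']?t|cannot) find\b",
)
CITATION = re.compile(r"\[(\d+)\]")


def classify(
    answer: str,
    rerank_scores: Sequence[float],
    min_relevance: float = 0.3,  # assumed cut-off, not a documented value
) -> str:
    """Label a drafted answer as answered, retrieval_empty, or model_hedged."""
    useful = [score for score in rerank_scores if score >= min_relevance]
    if not useful:
        # Nothing relevant was retrieved: suppress and hand off to a human.
        return "retrieval_empty"

    hedged = any(re.search(p, answer, re.IGNORECASE) for p in HEDGE_PATTERNS)

    # A citation counts only if it points at a source that was actually sent.
    cited = {int(n) for n in CITATION.findall(answer)}
    has_valid_citation = any(1 <= n <= len(rerank_scores) for n in cited)

    if hedged or not has_valid_citation:
        # Assumption: an uncited answer is treated as a miss, like a hedge.
        return "model_hedged"
    return "answered"
```

For example, `classify("Refunds take 14 days [1].", [0.9, 0.4])` yields `answered`, while the same sentence prefixed with "I don't know but" yields `model_hedged`, and any answer paired with an empty score list yields `retrieval_empty`.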

Why citations matter even when the answer is right

A correct answer without sources is indistinguishable, to the visitor, from a confident hallucination. They have no way to verify. Showing the chunks that supported the answer — by title, by URL, by confidence — converts a black-box reply into something the visitor can audit on their own. The interesting effect is on the operator side: when the agent cites the wrong chunk, it's an obvious tell, and you can fix the underlying KB instead of trying to fix the model.
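One way to picture the rendering step is a function that resolves each [n] marker against the sources that were sent to the model and emits a footnote with title, URL, and confidence. This is a sketch under assumptions: the `Source` type, the footnote format, and the decision to silently drop markers that point at no source are all illustrative, and the URLs are placeholders.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Source:
    title: str
    url: str
    confidence: float


# Match the marker plus any whitespace before it, so a dropped marker
# does not leave a stray space in front of the punctuation.
MARKER = re.compile(r"\s*\[(\d+)\]")


def render_with_sources(
    answer: str, sources: Sequence[Source]
) -> Tuple[str, List[str]]:
    """Return the answer with only resolvable markers, plus its footnotes."""
    used: List[int] = []

    def keep_or_drop(match: "re.Match[str]") -> str:
        n = int(match.group(1))
        if 1 <= n <= len(sources):
            if n not in used:
                used.append(n)
            return match.group(0)
        return ""  # marker points at no source: remove it

    cleaned = MARKER.sub(keep_or_drop, answer)
    footnotes = [
        f"[{n}] {sources[n - 1].title} "
        f"({sources[n - 1].url}, confidence {sources[n - 1].confidence:.2f})"
        for n in sorted(used)
    ]
    return cleaned, footnotes
```

Because the footnotes list only the chunks that were actually cited, an operator scanning a transcript can spot a wrong citation at a glance and trace it back to the article that needs fixing.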

The miss is the asset

The classifier's output isn't just a runtime decision — it's a signal. Every retrieval_empty is a question your knowledge base couldn't answer. Every model_hedged is a question it could have answered if the chunks were better written. Cluster a week of those, draft a starting article from each cluster, fact-check before publishing — the miss tells you which doc to write next.
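As an illustration of the clustering step, the sketch below groups a batch of missed questions by word overlap and ranks the groups by size. It is a deliberately simple stand-in: greedy Jaccard matching against each cluster's first question replaces whatever clustering the product actually uses, and the stopword list, the `threshold` value, and the `MissCluster` type are assumptions.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Tuple

# Small illustrative stopword list, not a complete one.
STOPWORDS = frozenset(
    "do you to can how i my is the a what are your".split()
)


@dataclass
class MissCluster:
    seed_tokens: FrozenSet[str]
    questions: List[str] = field(default_factory=list)
    labels: Counter = field(default_factory=Counter)


def tokens(question: str) -> FrozenSet[str]:
    words = re.findall(r"[a-z0-9]+", question.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_misses(
    log: Iterable[Tuple[str, str]], threshold: float = 0.5
) -> List[MissCluster]:
    """Group (question, label) pairs; answered rows are ignored."""
    clusters: List[MissCluster] = []
    for question, label in log:
        if label == "answered":
            continue
        toks = tokens(question)
        for cluster in clusters:
            if jaccard(toks, cluster.seed_tokens) >= threshold:
                break
        else:
            # No existing cluster is close enough: start a new one.
            cluster = MissCluster(seed_tokens=toks)
            clusters.append(cluster)
        cluster.questions.append(question)
        cluster.labels[label] += 1
    # Biggest clusters first: these are the articles to draft next.
    clusters.sort(key=lambda c: len(c.questions), reverse=True)
    return clusters
```

The per-cluster label counts carry the distinction made above: a cluster dominated by `retrieval_empty` suggests a missing article, while one dominated by `model_hedged` suggests an existing article that needs rewriting.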

That's the inversion at the centre of the product. Hallucinations aren't a problem you suppress. They're a problem you instrument, classify, and convert into the next version of your knowledge base.
