Support is a retrieval problem, not a generation problem

The reflex that fails at the stepping boundary

A customer writes in: their accelerator hangs under a specific DMA pattern, only on the B0 stepping, only above a certain board temperature, and only when a particular firmware revision is loaded. The reflex in 2026 is to paste that into a frontier model and ship whatever comes back. For this class of question that reflex is a trap. The model will answer fluently, confidently, in well-formed paragraphs, and it has no way to tell you that the real answer is sitting in a GitHub issue your colleague closed eight months ago with a workaround and a link to the errata sheet. The question is not “can a model produce text about this.” It can. The question is whether the text matches what your team already knows to be true. For a heavy-tailed technical inquiry, those two are rarely the same.

Retrieval is the discipline; generation is the surface

Information retrieval has been a rigorous field for half a century, with metrics that predate transformers and outlive every model release. Manning, Raghavan and Schütze’s Introduction to Information Retrieval gives the two numbers that matter: precision, the fraction of what you retrieved that was actually relevant, and recall, the fraction of the relevant material you actually found. Tighten one and you usually pay in the other: surface everything and you bury the signal; surface only the safe hits and you miss the one closed ticket that mattered. This is the real engineering problem in support, and it is measurable. Even attribution lives in this frame: citations have recall (is every claim covered by a source that supports it) and precision (does each cited source actually support the claim it is attached to). Generation is the surface a user sees. Retrieval is where the system is correct or wrong.
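The two numbers are easy to compute once you have a set of retrieved cases and a set of cases judged relevant. The sketch below is a minimal illustration, not a benchmark harness; the ticket IDs are made up, and the function name precision_recall is ours.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of case IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets so an empty result list scores 0.0, not an error.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ticket IDs: three closed tickets are truly relevant to the query.
relevant = {"T-101", "T-204", "T-377"}

narrow = ["T-101"]                                            # only the safe hit
broad = ["T-101", "T-204", "T-377", "T-002", "T-015", "T-090"]  # surface everything

print(precision_recall(narrow, relevant))  # precision 1.0, recall 1/3
print(precision_recall(broad, relevant))   # precision 0.5, recall 1.0
```

The narrow result list is perfectly precise and misses two of the three tickets that mattered; the broad list finds all three and buries them in noise. That is the trade-off in miniature.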

The answer already exists — and reusing it is a named discipline

Case-based reasoning has made this argument since 1994. Aamodt and Plaza’s R4 cycle — Retrieve, Reuse, Revise, Retain — solves a new problem by pulling the solution to a similar past one and adapting it, with roots in Schank’s Dynamic Memory and Kolodner’s work through the early nineties. Read that loop again with a support org in mind: retrieve the prior resolved ticket, reuse its fix, revise it for the new board, retain the new resolution so the next person inherits it. That is not a clever hack bolted onto a chatbot. It is a thirty-year-old formalization of how expert problem-solving actually works, and it says the answer is usually already in the casebook.
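The loop is small enough to sketch. What follows is a toy illustration under loud assumptions: the Casebook class, the word-overlap similarity, the 0.2 threshold, and the sample case are all hypothetical stand-ins, and a real system would use a far better retriever. The shape of the four steps is the point.

```python
from dataclasses import dataclass


@dataclass
class Case:
    problem: str
    solution: str


def similarity(a, b):
    """Jaccard overlap of lowercase word sets; a crude stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class Casebook:
    def __init__(self, cases=None):
        self.cases = list(cases or [])

    def retrieve(self, problem, min_score=0.2):
        """Return the most similar past case, or None when nothing is close enough."""
        best = max(self.cases, key=lambda c: similarity(problem, c.problem), default=None)
        if best is None or similarity(problem, best.problem) < min_score:
            return None
        return best

    def retain(self, problem, solution):
        self.cases.append(Case(problem, solution))


def solve(casebook, problem, revise):
    prior = casebook.retrieve(problem)   # Retrieve
    if prior is None:
        return None                      # explicit miss: escalate, do not guess
    draft = prior.solution               # Reuse
    final = revise(draft)                # Revise (a person or a model adapts it)
    casebook.retain(problem, final)      # Retain
    return final


# Hypothetical casebook with one resolved ticket.
book = Casebook([Case("dma hang on b0 stepping above temperature threshold",
                      "apply workaround W1")])
answer = solve(book, "dma hang on b0 stepping with new firmware",
               lambda draft: draft + " (checked on rev C board)")
print(answer)
print(solve(book, "fan noise at idle", lambda draft: draft))  # no similar case
```

Note what the miss path does: when no prior case clears the threshold, the function returns nothing rather than inventing a fix. That explicit miss is exactly what an ungrounded model cannot give you.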

The service industry codified the same insight without the academic vocabulary. Knowledge-Centered Service treats knowledge as a by-product of resolving tickets: you capture the answer in the act of solving, then reuse and improve it on every subsequent contact. Solve once, reuse many. KCS is also where person-level provenance earns its place — the resolution is not anonymous text, it is the thing a named engineer worked out, and citing that engineer is both a trust signal and a way to route the follow-up to someone who can defend the answer.

The weights are a lossy cache of the training corpus. Asking an ungrounded model an expert hardware question is a silent cache hit on stale, averaged data, with no miss signal to tell you it guessed.

Why ungrounded generation breaks exactly here

The failure is not random; it is structural and it is documented. Ji et al.’s survey of hallucination in natural language generation makes the point an infrastructure engineer will recognize immediately: ungrounded generation is least reliable in niche, low-resource domains. Hardware support is nothing but that. The questions follow a Zipf-like distribution — a few intake categories recur every week, and the expensive tribal knowledge lives in the long tail of rare, specific failures that appear a handful of times across a company’s entire history. That tail is precisely where the training data is thinnest and where a model’s averaged prior is most likely to invent a plausible, wrong mechanism. The regime where you most need a correct answer is the regime where pure generation is most confidently incorrect.
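To get a feel for how much volume sits outside the head, here is a back-of-the-envelope sketch. It assumes an idealized Zipf law with exponent 1 over 1,000 distinct failure categories; both numbers are illustrative assumptions, not measurements from any support queue.

```python
def zipf_head_share(n_categories, head, s=1.0):
    """Fraction of total volume in the `head` most frequent categories,
    assuming frequency proportional to 1 / rank**s (an idealized Zipf law)."""
    weights = [1.0 / (rank ** s) for rank in range(1, n_categories + 1)]
    return sum(weights[:head]) / sum(weights)


# Illustrative only: 1,000 categories, top 10 treated as the "head".
share = zipf_head_share(1000, 10)
print(round(share, 3))        # about 0.39
print(round(1.0 - share, 3))  # everything outside the top 10
```

Under those assumptions the ten most common categories cover only about two fifths of the volume, and the rest is spread across 990 categories that each show up rarely. Real queues will differ, but the shape is why the tail cannot be ignored.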

Errata and No Fault Found, the two cases that settle it

Two everyday support buckets show why this is a retrieval problem in the literal sense. The first is silicon errata. Every CPU and GPU ships with a list of documented hardware bugs. Intel publishes them per stepping; AMD’s early Phenom shipped with a TLB erratum whose firmware mitigation disabled a cache optimization at a measurable performance cost. A large share of inbound support is one question wearing different clothes: is this my bug, or is it a known erratum on this stepping? You do not generate that answer. You retrieve it, against a known corpus of cases, and you cite the document. The second is No Fault Found: the returned unit that passes every bench test because the real cause was a configuration or an interaction the bench never reproduced. NFF is the most expensive bucket in the building, and it is the one where retrieving the prior case (the one engineer who saw this exact non-reproduction and found the missing variable) pays the most. A model with no access to that case has nothing to offer but a confident guess.
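The errata question is, at its simplest, a keyed lookup that returns a document to cite. A minimal sketch follows, assuming a hand-built table; the part name ACME-X1, the erratum ID, the keywords, and the document name are all hypothetical, and real matching would be richer than keyword overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Erratum:
    erratum_id: str
    keywords: frozenset
    source_doc: str  # the document the reply should cite


# Hypothetical errata table keyed by (part, stepping).
ERRATA = {
    ("ACME-X1", "B0"): [
        Erratum("ERR-012", frozenset({"dma", "hang"}), "acme-x1-errata-rev3"),
    ],
    ("ACME-X1", "B1"): [],  # sheet on file, nothing listed
}


def known_errata(part, stepping, symptom):
    """Return matching errata, [] if the sheet lists no match,
    or None if there is no sheet on file for this stepping at all."""
    entries = ERRATA.get((part, stepping))
    if entries is None:
        return None  # a miss, not a "no": someone has to go find the sheet
    words = set(symptom.lower().split())
    return [e for e in entries if e.keywords & words]


for stepping in ("B0", "B1", "A0"):
    result = known_errata("ACME-X1", stepping, "hang during dma burst")
    if result is None:
        print(stepping, "no errata sheet on file")
    elif result:
        print(stepping, [(e.erratum_id, e.source_doc) for e in result])
    else:
        print(stepping, "no known erratum matches")
```

The design choice worth keeping is the three-way answer. "Known erratum, here is the document", "sheet checked, no match", and "we have no sheet for this stepping" are different facts, and only retrieval can tell them apart.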

Retrieve first, ground every claim, let generation phrase it

The architecture follows from the diagnosis. Retrieve against the resolved corpus (closed issues, resolved tickets, the teammate’s sent folder, the Discord thread where the real fix surfaced) across a graph that knows a board maps to a silicon revision and a revision carries its own errata. Ground every claim in a retrieved case, and measure it the way IR has always measured: citation recall for coverage of the claims, citation precision for whether each cited source actually supports its claim. Then, and only then, let generation do the one job it is genuinely good at: phrasing the retrieved answer in clean prose, adapted to this customer’s board and this customer’s tone. Generation is the Reuse and Revise step of the R4 loop, never the Retrieve. It writes the reply; it is not the source of truth. And the last step stays human: the FAE reads the cited draft, recognizes the case and the colleague who closed it, and presses send.
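Both halves of that paragraph can be sketched in a few lines: walking the board-to-revision-to-cases graph, and scoring a draft's citations. Everything named here is a hypothetical illustration. The board names, case IDs, and claims are invented, and the set of (claim, case) support judgments is assumed to come from a reviewer or a separate checker. The metric definitions are simplified: recall is the share of claims with at least one supporting citation, precision is the share of citations that support the claim they are attached to.

```python
# Hypothetical graph edges: board -> silicon revision -> resolved cases.
BOARD_REVISION = {"board-rev-c": "B0", "board-rev-d": "B1"}
REVISION_CASES = {"B0": ["ISSUE-88", "ERR-012"], "B1": []}


def retrieve_cases(board):
    """Walk board -> revision -> cases; an unknown board yields no cases."""
    revision = BOARD_REVISION.get(board)
    return list(REVISION_CASES.get(revision, []))


def citation_scores(claims, supports):
    """claims: list of (claim_text, [cited_case_ids]).
    supports: set of (claim_text, case_id) pairs judged to be genuine support.
    Returns (citation_recall, citation_precision)."""
    supported_claims = total_citations = good_citations = 0
    for text, cited in claims:
        good = [c for c in cited if (text, c) in supports]
        supported_claims += bool(good)   # claim has at least one real source
        total_citations += len(cited)
        good_citations += len(good)
    recall = supported_claims / len(claims) if claims else 0.0
    precision = good_citations / total_citations if total_citations else 0.0
    return recall, precision


print(retrieve_cases("board-rev-c"))

# A hypothetical draft: two grounded claims, one claim nothing supports.
draft = [
    ("known erratum on B0", ["ERR-012"]),
    ("workaround is in firmware", ["ISSUE-88", "ERR-012"]),
    ("fix ships next quarter", ["ISSUE-88"]),
]
judged = {("known erratum on B0", "ERR-012"),
          ("workaround is in firmware", "ISSUE-88")}
print(citation_scores(draft, judged))  # recall 2/3, precision 0.5
```

A draft that scores below 1.0 on recall contains at least one claim nobody can source, which is the cue to cut the claim or send the draft back to retrieval before a person ever sees it.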

The line that should survive the demo

The trap in generation is that it always produces an answer, which is exactly why it is dangerous for facts that have a right answer already written down somewhere in your company. Treat support as generation and you optimize for fluent text. Treat it as retrieval and you optimize for a true, sourced answer a skeptical engineer will stake their name on.

In expert support, the answer is almost never missing. It is misfiled. Build the system that finds it, cites it, and lets a person send it — and stop asking a model to hallucinate what a colleague already solved.
