Why FAE knowledge evaporates — and what to do about it

The two-day problem you pay for twice

The first time a board fails link training after a firmware bump, it costs a senior field application engineer (FAE) two days: pulling lane-margining logs, ruling out the SerDes, and finally finding that the new power-management path entered the ASPM L1.1 substate too aggressively. The second time, for a different customer on the same silicon, it should cost two minutes. On most teams it costs two days again, because the only copy of the resolved case lives in one person’s head.

It’s not unwritten — it’s scattered

The obvious diagnosis is wrong. The knowledge was not lost because nobody wrote it down; it was lost because of where the pieces were written. The root cause surfaced in a Discord thread between two driver engineers. The symptom arrived as a customer email with an attached dmesg. The fix landed as three lines in a closed GitHub issue, logged in the commit as “tweak L1.1 entry delay.” Each artifact is searchable on its own and inert in aggregate. The one person who can join them into an answer is the person who did the joining the first time, and that person takes PTO, switches teams, or leaves.

This is why “just search harder” does not rescue you. An answer to a returning customer is not a document you retrieve; it is a synthesis you reconstruct: this symptom, on this board revision, under this firmware, was this root cause, fixed this way, and here is why the three other plausible causes were ruled out. No single page holds that. The closest page describes the previous board spin, and the one engineer who knows it is stale stopped trusting it months ago. Wikis rot from the top: the people who understand a system best get the least value from documenting what they already carry, so they document last and least.

Storage is not the constraint — externalization is

Michael Polanyi gave the cleaner half of the diagnosis in 1966: “we can know more than we can tell.” The senior FAE’s competence is procedural, accumulated by debugging a thousand boards, and most of it never surfaces as a sentence. David Autor later sharpened this into Polanyi’s Paradox: the tasks hardest to automate or document are exactly the tacit ones. That is the real answer to “why can’t we feed it all to a model.” You cannot externalize what was never explicit.

Nonaka’s SECI model (socialization, externalization, combination, internalization) traces the loop that knowledge travels, and the step that fails is externalization, the tacit-to-explicit conversion, because it is lossy by construction. When Matsushita set out to specify a home bread machine, the kneading would not reduce to numbers; one engineer apprenticed herself to a master baker for months to acquire the feel before it could become a spec. The artifact existed only because someone paid the conversion cost in person, in shared context. A wiki edit box asks an engineer to pay that cost alone, in prose, for a reader they will never meet.

The information-theoretic core

Here is the point most teams miss, and it is information-theoretic, not a matter of laziness. A design doc records the conclusion, “we use ASPM L1.1 on this platform,” and discards the reasoning path: the substates tried, the thermal corner where L1.2 reintroduced the failure, the load pattern under which the first fix deadlocked. The reasoning path is the reusable asset, and it is exactly what does not fit in a doc.

Worse, both failure modes land on the same people. The truck-factor literature found that across 133 popular GitHub systems, roughly two-thirds had a truck factor of two or fewer, and nearly half sat at one. A replication put about half of projects at a bus factor of two or fewer and found no correlation between a project’s stars and its bus factor: popularity buys no resilience. The term is engineer jargon, traced to a 1994 thread on the Python mailing list asking what would happen if Guido were hit by a bus. Truck-factor research shows knowledge concentrating in one or two people; the documentation paradox means those exact people are the least likely to write it down. The binding constraint is the externalization step, not the storage layer.
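To make the metric concrete, here is a simplified greedy truck-factor estimate. It is an illustration in the spirit of the truck-factor papers, not the exact algorithm of any one of them: the input (a hypothetical map from each file to the people who know it) and the "more than half the files orphaned" threshold are assumptions of this sketch, and published approaches estimate authorship far more carefully. It repeatedly removes whoever still covers the most files until too much of the system has nobody left.

```python
from collections import Counter


def truck_factor(knowers: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Greedy estimate: how many people must leave before more than
    `threshold` of the files have nobody left who knows them."""
    # Copy the sets so the caller's data is not mutated.
    remaining = {f: set(people) for f, people in knowers.items()}
    total = len(remaining)
    removed = 0
    while True:
        orphaned = sum(1 for people in remaining.values() if not people)
        if orphaned > threshold * total:
            return removed
        # Count how many still-covered files each person knows.
        coverage = Counter(p for people in remaining.values() for p in people)
        if not coverage:  # nobody left to remove
            return removed
        top, _ = coverage.most_common(1)[0]  # the most concentrated knowledge holder
        for people in remaining.values():
            people.discard(top)
        removed += 1


# Hypothetical ownership map for a small driver tree.
knowers = {
    "pcie/aspm.c": {"alice"},
    "pcie/link_train.c": {"alice"},
    "pcie/l1ss.c": {"alice"},
    "serdes/margin.c": {"alice", "bob"},
    "fw/pm.c": {"bob"},
}
print(truck_factor(knowers))  # 1: losing alice orphans 3 of 5 files
```

Note that bob's presence on `serdes/margin.c` does not help: one departure still orphans the majority of the tree.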

A cache-eviction problem with no write-back

There is one frame an engineer will trust here. The senior engineer’s memory is a hot L1 cache holding the resolved state of a thousand incidents. Every answer from memory is a hit on a line that was never written back to the slow store, the wiki; after a departure, the next access re-debugs the 2021 link-training bug from the dmesg up. In this frame, the truck-factor papers are measuring how many cache lines exist in exactly one copy with no backing store.

A departure is not a graceful flush; it is an eviction with no write-back, and the next access is a full recompute.
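The cache analogy can be made concrete with a toy model. This is an illustration under heavy simplifications, not a real cache: a hypothetical `ExpertCache` sits in front of a shared dict standing in for the wiki, and a single flag, `write_back_on_resolve`, decides whether a freshly debugged case is persisted before the cache is evicted.

```python
class ExpertCache:
    """Toy model: one engineer's memory as a cache in front of a slow store (the wiki)."""

    def __init__(self, store: dict[str, str], write_back_on_resolve: bool) -> None:
        self.cache: dict[str, str] = {}  # hot, private, lost on departure
        self.store = store               # slow, shared, survives departures
        self.write_back_on_resolve = write_back_on_resolve
        self.recomputes = 0              # each one stands for a multi-day re-debug

    def answer(self, symptom: str) -> str:
        if symptom in self.cache:        # hit: answered from memory
            return self.cache[symptom]
        if symptom in self.store:        # miss, but someone persisted the case
            self.cache[symptom] = self.store[symptom]
            return self.cache[symptom]
        self.recomputes += 1             # full miss: re-debug from the dmesg up
        resolution = f"resolved case for {symptom!r}"
        self.cache[symptom] = resolution
        if self.write_back_on_resolve:
            self.store[symptom] = resolution  # write back while the reasoning is warm
        return resolution

    def evict(self) -> None:
        """A departure: the cache is dropped and unpersisted entries are never flushed."""
        self.cache.clear()


def recomputes_after_departure(write_back_on_resolve: bool) -> int:
    wiki: dict[str, str] = {}
    team = ExpertCache(wiki, write_back_on_resolve)
    team.answer("link training fails after firmware bump")  # first occurrence
    team.evict()                                             # the engineer leaves
    team.answer("link training fails after firmware bump")  # the next customer
    return team.recomputes
```

Without write-back, `recomputes_after_departure(False)` returns 2: the same case is debugged twice. With write-back on resolve it returns 1, because the access after eviction is served from the store.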

So the move is not to extract harder; it is to invert the write policy. Stop draining tacit knowledge into static prose after the fact, which loses the reasoning path and conscripts the wrong people. Capture the solved case at the moment of solving, while the reasoning is warm, as a structured failure-mode and mitigation pair: this symptom, this board, this firmware, this root cause, this fix, these ruled-out alternatives, with the Discord thread, the email, and the closed issue carried as provenance. Model the case as one entity linked across its sources, with the silicon, board revision, and firmware named by canonical identifiers, so the next inquiry retrieves the case, not three orphaned documents.
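The structured pair described above can be sketched as a small record type. This is a minimal illustration that assumes Python dataclasses as the storage shape; the part ID, board revision, firmware version, and ruled-out reasons are hypothetical placeholders loosely paraphrased from the scenario in this piece, and the angle-bracketed references stand in for real links. Retrieval is a plain filter on a canonical silicon ID plus a symptom keyword, which is enough to show a query returning the whole case rather than its scattered sources.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """One artifact the case was assembled from, kept as provenance."""
    kind: str  # e.g. "chat", "email", "issue", "commit"
    ref: str   # link or ID of the artifact (placeholders below)
    role: str  # what this artifact contributed to the resolution


@dataclass
class ResolvedCase:
    """A failure-mode / mitigation pair captured at the moment of solving."""
    symptom: str
    silicon: str
    board_rev: str
    firmware: str
    root_cause: str
    fix: str
    ruled_out: dict[str, str] = field(default_factory=dict)  # alternative -> why rejected
    provenance: list[Source] = field(default_factory=list)


def find_cases(cases: list[ResolvedCase], silicon: str, symptom_term: str) -> list[ResolvedCase]:
    """Retrieve whole cases by canonical silicon ID plus a symptom keyword."""
    term = symptom_term.lower()
    return [c for c in cases if c.silicon == silicon and term in c.symptom.lower()]


case = ResolvedCase(
    symptom="PCIe link training fails after firmware update",
    silicon="SOC-A",       # hypothetical part ID
    board_rev="rev C",     # hypothetical
    firmware="fw 2.4.1",   # hypothetical
    root_cause="new power-management path enters ASPM L1.1 too aggressively",
    fix="lengthen L1.1 entry delay",
    ruled_out={  # illustrative reasons only
        "SerDes": "excluded via lane-margining logs",
        "ASPM L1.2": "reintroduced the failure at the thermal corner",
    },
    provenance=[
        Source("chat", "<driver-team thread>", "root cause identified"),
        Source("email", "<customer report with dmesg>", "symptom"),
        Source("issue", "<closed issue>", "fix: 'tweak L1.1 entry delay'"),
    ],
)

hits = find_cases([case], "SOC-A", "link training")  # one hit, with all three sources attached
```

Keying on a canonical silicon ID rather than free text is the design choice that matters here: two customers describing the same part differently should still land on the same case.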

Why provenance is load-bearing

Provenance is the structural member, not decoration. A skeptical FAE about to reply to a customer will not stake their name on a synthesis they cannot audit, and provenance is what makes the audit possible, converting a plausible answer into a defensible one: this is the teammate who solved it, this is the thread where the root cause was found, this is the commit that fixed it. A wiki asserts; a cited resolved case shows its work.
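A provenance requirement can be enforced mechanically. The sketch below uses a hypothetical `Citation` type and `draft_reply` helper, neither of which comes from any real tool. It refuses to produce a customer-facing draft unless every claim the FAE must be able to audit (assumed here to be the root cause and the fix) is backed by a named teammate and a specific thread, issue, or commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    who: str    # teammate who solved it
    where: str  # thread, issue, or commit that backs the claim
    claim: str  # which part of the answer this citation supports


def draft_reply(answer: str, citations: list[Citation], required: set[str]) -> str:
    """Return a cited draft, or raise if any required claim lacks a source."""
    covered = {c.claim for c in citations}
    missing = required - covered
    if missing:
        raise ValueError(f"unauditable: no source for {sorted(missing)}")
    lines = [answer, "", "Sources:"]
    lines += [f"- {c.claim}: {c.where} ({c.who})" for c in citations]
    return "\n".join(lines)


cites = [
    Citation("<engineer>", "<thread link>", "root cause"),
    Citation("<engineer>", "<commit id>", "fix"),
]
print(draft_reply("Lengthen the L1.1 entry delay.", cites, {"root cause", "fix"}))
```

Failing loudly on a missing source is deliberate: an uncited answer should never reach the reply box looking as trustworthy as a cited one.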

The policy fits in four words: write back on resolve. The expensive recompute already happened once; the only question is whether you persist the result before the line is evicted. Every time a senior engineer answers from memory and the answer dies with the conversation, the team has paid for a computation it threw away. The fix is not another place to read. It is the discipline of not making the team re-earn what one of them already knows.
