We cut our LLM trace bill 30% with two sampling rules

Our observability bill for LLM traces was climbing in a straight line with request volume, and most of what we were paying to store was boring: successful calls that did exactly what they were supposed to, captured at full fidelity, forever. We were treating every trace as equally worth keeping. They are not.

The two rules

| Traffic | Rule |
| --- | --- |
| Successful traces | Keep 10% (head-based sample) |
| Errors, timeouts, retries, user-flagged | Keep 100%, never sampled out |

Errors are never sampled out. Slow outliers are never sampled out. The one weird retry storm is never sampled out. What gets dropped is the ninth identical successful call that tells you nothing the first did not.
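A minimal sketch of how the two rules might look as a single keep-or-drop decision. This is an illustration, not our production code: the `TraceSummary` record and its field names are hypothetical, and it assumes the decision runs where the trace outcome is already known (for example at export time), since the "keep every failure" rule cannot be decided from the trace ID alone.

```python
import hashlib
from dataclasses import dataclass

SUCCESS_KEEP_RATE = 0.10  # the 10% figure from the rules above


@dataclass
class TraceSummary:
    # Hypothetical record; field names are illustrative, not from any tracing SDK.
    trace_id: str
    error: bool = False
    timed_out: bool = False
    retry_count: int = 0
    user_flagged: bool = False


def head_sample(trace_id: str, rate: float = SUCCESS_KEEP_RATE) -> bool:
    """Deterministic decision from the trace ID alone, so every service agrees on it."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Use the top 53 bits so the float is exact and strictly inside [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


def should_keep(trace: TraceSummary) -> bool:
    # Rule 2: failures, timeouts, retries, and user-flagged traces are always kept.
    if trace.error or trace.timed_out or trace.retry_count > 0 or trace.user_flagged:
        return True
    # Rule 1: plain successes are kept at the sampling rate.
    return head_sample(trace.trace_id)
```

Hashing the trace ID instead of calling a random number generator is a design choice in this sketch: the same trace always gets the same answer, so a retry of the export or a second service looking at the same ID does not produce half-kept traces.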

What happened

The trace bill dropped about 30%. In three months of incident debugging since, we have not once needed a trace that sampling threw away. You almost never debug a success. You debug failures, and every failure is still there at full fidelity.

The part I am still uneasy about

The rare GOOD trace. Sometimes a successful run is interesting precisely because it succeeded in a surprising way, and head-based sampling has a 90% chance of dropping it. Tail-based sampling would fix this in theory, but it is more infrastructure than we wanted to run.
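To put a number on that unease, here is a small worked calculation. It assumes each trace is sampled independently at the 10% rate, which is an assumption of the sketch, not something we measured. The helper name `p_at_least_one_kept` is made up for illustration.

```python
def p_at_least_one_kept(occurrences: int, keep_rate: float = 0.10) -> float:
    """Chance that at least one of `occurrences` independently sampled traces is kept."""
    return 1.0 - (1.0 - keep_rate) ** occurrences


# How the odds improve if the surprising success happens more than once.
for n in (1, 5, 10):
    print(n, round(p_at_least_one_kept(n), 3))
```

Under that assumption a one-off surprising success survives 10% of the time, and one that recurs ten times has roughly a 65% chance of being captured at least once. The truly unique trace is the one at risk.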

If you have found a cheap way to keep the rare informative success without storing every boring one, I want to hear it.