We cut our LLM trace bill 30% with two sampling rules

Our observability bill for LLM traces was climbing in a straight line with request volume, and most of what we were paying to store was boring: successful calls that did exactly what they were supposed to, captured at full fidelity, forever. We were treating every trace as equally worth keeping. They are not.

The two rules

Traffic	Rule
Successful traces	keep 10% (head-based sample)
Errors, timeouts, retries, user-flagged	keep 100%, never sampled out

Errors are never sampled out. Slow outliers are never sampled out. The one weird retry storm is never sampled out. What gets dropped is the ninth identical successful call that tells you nothing the first did not.
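Here is a minimal sketch of how the two rules could look in Python. It is an illustration, not a copy of any particular tracing SDK: the `Trace` fields, the function names, and the choice of hashing the trace ID are all assumptions made for the example. It also assumes the outcome of the call (error, timeout, retry, user flag) is already known at the point where the keep-or-drop decision runs.

```python
import hashlib
from dataclasses import dataclass

SUCCESS_KEEP_RATE = 0.10  # from the table: keep 10% of successful traces


@dataclass
class Trace:
    # Hypothetical trace record; field names are illustrative only.
    trace_id: str
    error: bool = False
    timed_out: bool = False
    retries: int = 0
    user_flagged: bool = False


def always_keep(trace: Trace) -> bool:
    """Rule 2: errors, timeouts, retries, and user-flagged traces are never sampled out."""
    return trace.error or trace.timed_out or trace.retries > 0 or trace.user_flagged


def sample_bucket(trace_id: str) -> float:
    """Map a trace ID to a stable value in [0, 1) so the same ID always gets the same decision."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def should_keep(trace: Trace, rate: float = SUCCESS_KEEP_RATE) -> bool:
    """Rule 1: keep roughly `rate` of the remaining (successful) traces."""
    if always_keep(trace):
        return True
    return sample_bucket(trace.trace_id) < rate
```

Hashing the trace ID instead of calling a random number generator is a design choice in this sketch: every service that sees the same ID reaches the same decision, so a trace is either kept whole or dropped whole.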

What happened

The trace bill dropped about 30%. In three months of incident debugging since, we have not once needed a trace that sampling threw away. You almost never debug a success. You debug failures, and every failure is still there at full fidelity.

The part I am still uneasy about

The rare GOOD trace. Sometimes a successful run is interesting precisely because it succeeded in a surprising way, and head-based sampling has a 90% chance of dropping it. Tail-based sampling would fix this in theory, but it is more infrastructure than we wanted to run.
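To put a number on that unease, here is a small back-of-the-envelope sketch. It assumes each successful trace is sampled independently at the 10% keep rate; the function name is made up for the example.

```python
def miss_probability(occurrences: int, keep_rate: float = 0.10) -> float:
    """Chance that every one of `occurrences` independently sampled successes is dropped."""
    return (1.0 - keep_rate) ** occurrences


# How the odds of losing every copy shrink as the surprising behavior recurs.
for k in (1, 5, 10, 22):
    print(f"{k:>2} occurrences: {miss_probability(k):.1%} chance none is kept")
```

A surprising success that recurs a couple of dozen times will very likely leave at least one trace behind. A true one-off still has the full 90% chance of vanishing, which is the case that worries me.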

If you have found a cheap way to keep the rare informative success without storing every boring one, I want to hear it.
