KV cache

/ˌkeɪ ˈviː ˌkæʃ/

LLM Inference

stored attention key and value tensors of previously processed tokens, reused at each decoding step to speed up autoregressive generation

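The KV cache avoids recomputing the key and value projections of earlier tokens at each decoding step: only the new token's query, key, and value are computed, and the new query attends over the cached keys and values.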
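
The saving can be illustrated with a minimal single-head sketch in plain Python. It is an illustration under stated assumptions, not any particular library's implementation: the names `KVCache`, `decode_step`, and `recompute_step` are hypothetical, the weights are random, and real systems keep one such cache per layer and per attention head. The cached path projects only the newest token, while the uncached path re-projects the whole prefix, and both should give the same output.

```python
import math
import random


def matvec(x, W):
    """Multiply row vector x (length n) by matrix W (n x m), using plain lists."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys and values."""
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


class KVCache:
    """Hypothetical single-head cache: one key and one value vector per token."""

    def __init__(self):
        self.keys = []
        self.values = []


def decode_step(x_new, W_q, W_k, W_v, cache):
    """Cached path: project only the newest token, then attend over the cache."""
    cache.keys.append(matvec(x_new, W_k))
    cache.values.append(matvec(x_new, W_v))
    return attend(matvec(x_new, W_q), cache.keys, cache.values)


def recompute_step(xs, W_q, W_k, W_v):
    """Uncached path: re-project every token in the prefix at every step."""
    keys = [matvec(x, W_k) for x in xs]
    values = [matvec(x, W_v) for x in xs]
    return attend(matvec(xs[-1], W_q), keys, values)


# Assumed toy setup: random weights and random token embeddings.
random.seed(0)
d = 4
W_q, W_k, W_v = (
    [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(3)
)
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]

cache = KVCache()
max_diff = 0.0
for t in range(len(tokens)):
    cached = decode_step(tokens[t], W_q, W_k, W_v, cache)
    full = recompute_step(tokens[: t + 1], W_q, W_k, W_v)
    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(cached, full)))
```

In this sketch the cached path performs two projections per step regardless of prefix length, whereas the uncached path performs two per prefix token. The price is memory: the cache grows by one key and one value vector for every generated token.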

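Origin: short for key-value cache, after the key (K) and value (V) tensors of transformer attention; the key/value naming itself is borrowed from key-value lookup in databases and information retrieval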