
KV cache
/ˌkeɪ ˈviː ˌkæʃ/
the stored attention key and value tensors of previously processed tokens, reused at each decoding step to speed up autoregressive generation
KV cache in a sentence
“The KV cache avoids recomputing the keys and values of earlier tokens at each decoding step.”
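The reuse described above can be illustrated with a minimal single-head sketch in plain Python. The names `KVCache`, `project`, `attend`, and `decode_step` are hypothetical, and `project` is only a stand-in for learned projection matrices; real implementations operate on batched tensors per layer and per head.

```python
import math

def project(x, scale):
    # Hypothetical stand-in for a learned linear projection (W_q, W_k, W_v).
    return [scale * xi for xi in x]

def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

class KVCache:
    """Holds the keys and values of every token processed so far."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def decode_step(x, cache):
    """Process one new token embedding x.

    Only the new token's key and value are computed; those of earlier
    tokens are read back from the cache instead of being recomputed.
    """
    cache.append(project(x, 0.5), project(x, 2.0))
    q = project(x, 1.0)
    return attend(q, cache.keys, cache.values)

def attend_no_cache(xs):
    """Reference path: recompute every key and value from scratch."""
    keys = [project(x, 0.5) for x in xs]
    values = [project(x, 2.0) for x in xs]
    return attend(project(xs[-1], 1.0), keys, values)
```

Calling `decode_step` once per token gives the same output as `attend_no_cache` on the full prefix, but each step computes one new key and value instead of one per earlier token. The cost is memory: the cache grows by one entry per generated token.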
Origin of KV cache
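Key-Value cache, from the key and value vectors of the attention mechanism, whose names in turn borrow from key-value lookup in databases and retrieval systems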
Related Words
inference
the process of using a trained model to generate predictions or outputs
temperature
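a parameter that divides the logits before the softmax to control randomness in generation; higher values flatten the distribution and give more varied output, lower values sharpen it and make output more deterministic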
sampling
randomly drawing the next token from the model's probability distribution rather than always choosing the most likely token
beam search
a search algorithm that keeps a fixed number of the highest-scoring candidate sequences (the beam) and extends them in parallel at each step
greedy decoding
always selecting the highest probability token at each step
top-k sampling
sampling only from the k most likely next tokens
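Several of the decoding terms above (temperature, greedy decoding, sampling, top-k sampling) can be compared side by side in a short sketch. The function names below are hypothetical and the logits are made-up numbers for illustration; production libraries apply the same ideas to batched tensors.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, dividing by temperature first."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample(logits, temperature=1.0, rng=random):
    """Sampling: draw a token index according to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Top-k sampling: renormalise over the k best tokens, then draw."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top], temperature)
    return rng.choices(top, weights=probs, k=1)[0]

# Made-up logits over a four-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]
rng = random.Random(0)  # seeded so runs are repeatable
print(greedy(logits))
print(sample(logits, temperature=0.7, rng=rng))
print(top_k_sample(logits, k=2, rng=rng))
```

With `k=1`, top-k sampling reduces to greedy decoding, and as the temperature approaches zero plain sampling behaves the same way, since almost all probability mass lands on the top token.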