KV cache

/ˌkeɪ ˈviː ˌkæʃ/

the stored attention key and value tensors of previous tokens, reused at each step to speed up autoregressive generation

KV cache in a sentence

“The KV cache avoids recomputing the keys and values of earlier tokens at each decoding step.”

Key-Value cache, after the key and value vectors of the attention mechanism, whose names borrow from database and retrieval terminology
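
To make the definition concrete, here is a minimal sketch of a single-head cache in plain Python. It is an illustration under simplifying assumptions, not how any particular library implements it: the names `attend` and `KVCache` are hypothetical, and the query, key, and value vectors are passed in directly rather than produced by learned projections.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention of one query over all stored keys/values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]

class KVCache:
    """Hypothetical single-head cache: one key and one value per past token."""
    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Only the newest token's key and value are added; earlier ones are reused.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)

# Toy (query, key, value) triples standing in for three generated tokens.
tokens = [([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
          ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0]),
          ([1.0, 1.0], [1.0, 1.0], [5.0, 6.0])]
cache = KVCache()
outputs = [cache.step(q, k, v) for q, k, v in tokens]
```

Each call to `step` does work proportional to the number of cached tokens, instead of recomputing keys and values for the whole prefix.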

Related Words

inference

the process of using a trained model to generate predictions or outputs

temperature

a parameter that scales the logits before the softmax to control randomness in generation; higher values flatten the distribution (more varied output), lower values sharpen it (more deterministic output)
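
As a rough sketch of the effect, the function below divides the logits by the temperature before applying a softmax. The name `softmax_with_temperature` and the example logits are illustrative assumptions.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)   # low temperature: peaked
neutral = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 2.0)    # high temperature: flatter
```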

sampling

drawing the next token at random according to the model's probability distribution rather than always choosing the most likely one
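
A minimal illustration using the standard library's `random.choices`; the function name `sample_token` and the toy distribution are assumptions made for the example.

```python
import random

def sample_token(probs, rng=random):
    """Draw one token index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)  # seeded so the draws are repeatable
probs = [0.7, 0.2, 0.1]
draws = [sample_token(probs, rng) for _ in range(1000)]
```

Over many draws the most likely token appears most often, but less likely tokens are still chosen some of the time.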

beam search

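a decoding algorithm that keeps a fixed number of the highest-scoring partial sequences (the beam) at each step and extends each of them, rather than committing to a single sequence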
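
The sketch below shows one simple form of the idea on a toy next-token table. `beam_search`, `toy_model`, and `TABLE` are hypothetical names, and real decoders typically add details such as end-of-sequence handling and length normalization that are omitted here.

```python
import math

# Toy next-token probabilities keyed by the previous token (an assumption for the demo).
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"x": 0.9, "y": 0.1},
}

def toy_model(seq):
    """Return log-probabilities of each possible next token."""
    return {tok: math.log(p) for tok, p in TABLE[seq[-1]].items()}

def beam_search(next_log_probs, start, steps, beam_width):
    """Keep the beam_width highest-scoring sequences after each step."""
    beams = [(0.0, [start])]
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for token, logp in next_log_probs(seq).items():
                candidates.append((score + logp, seq + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

narrow = beam_search(toy_model, "<s>", steps=2, beam_width=1)
wide = beam_search(toy_model, "<s>", steps=2, beam_width=2)
```

With a beam width of 1 the search follows "a" and ends with total probability 0.33; with a width of 2 it also keeps "b" alive and finds the sequence with probability 0.36.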

greedy decoding

always selecting the highest-probability token at each step
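
A short sketch of the loop, assuming a toy model that maps the previous token to next-token probabilities; `greedy_decode`, `toy_model`, and `TABLE` are illustrative names only.

```python
# Toy next-token probabilities keyed by the previous token (an assumption for the demo).
TABLE = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "a": {"cat": 0.2, "dog": 0.8},
}

def toy_model(seq):
    return TABLE[seq[-1]]

def greedy_decode(next_probs, start, steps):
    """Append the single most probable next token at every step."""
    seq = [start]
    for _ in range(steps):
        probs = next_probs(seq)
        seq.append(max(probs, key=probs.get))
    return seq

result = greedy_decode(toy_model, "<s>", steps=2)
```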

top-k sampling

sampling only from the k most likely next tokens
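
One way this could look in plain Python: keep the k largest probabilities and draw from them. The name `top_k_sample` and the example distribution are assumptions; `random.choices` treats the weights as relative, so the kept probabilities need no explicit renormalization.

```python
import random

def top_k_sample(probs, k, rng=random):
    """Sample a token index from the k most probable tokens only."""
    # Indices of the k largest probabilities.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded so the draws are repeatable
probs = [0.5, 0.1, 0.3, 0.1]
draws = [top_k_sample(probs, k=2, rng=rng) for _ in range(200)]
```

With k=2 only indices 0 and 2 can ever be drawn here; with k=1 the procedure reduces to greedy decoding.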