Segue
Segue
Play
iOS
LLM Inference·Artificial Intelligence
KV cache

KV cache

/ˌkeɪ ˈviː ˌkæʃ/

⚡ LLM Inference

cached key-value pairs from previous tokens to speed up autoregressive generation

KV cache in a sentence

“The KV cache avoids recomputing attention for earlier tokens at each step.”

Origin of KV cache

Key-Value cache, from database terminology

Related Words

inference

the process of using a trained model to generate predictions or outputs

temperature

a parameter controlling randomness in generation—higher means more creative, lower means more deterministic

sampling

randomly selecting the next token from the probability distribution rather than always choosing the most likely

beam search

a search algorithm that explores multiple candidate sequences simultaneously

greedy decoding

always selecting the highest probability token at each step

top-k sampling

sampling only from the k most likely next tokens

SegueMaster the art of eloquence
iOS AppWord of the DayContactPrivacyTerms