How language models generate responses at runtime

inference
the process of using a trained model to generate predictions or outputs
“Inference latency determines how quickly the chatbot can respond.”
Origin: Latin inferre `to bring in, conclude` from in- + ferre `to carry`

temperature
a parameter controlling randomness in generation: higher means more creative output, lower means more deterministic output
“Setting temperature to 0.7 balances creativity with coherence.”
Origin: Latin temperatura `a mixing in due proportion`
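
To make the definition concrete, here is a minimal sketch (plain Python, with illustrative logit values; the function name is hypothetical) of how temperature rescales logits before the softmax:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then normalize with softmax.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_cool = softmax_with_temperature([2.0, 1.0, 0.1], temperature=0.5)
probs_hot = softmax_with_temperature([2.0, 1.0, 0.1], temperature=2.0)
# At low temperature the top token dominates much more than at high temperature.
```

Dividing the same logits by a smaller temperature exaggerates their differences, which is why low temperatures behave almost deterministically.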

sampling
randomly selecting the next token from the probability distribution rather than always choosing the most likely one
“Top-p sampling only considers tokens whose cumulative probability exceeds a threshold.”
Origin: Old French essample `example` from Latin exemplum
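
A small illustration of weighted random selection, the core operation behind sampling-based decoding; `sample_token` is a hypothetical helper, not any library's API:

```python
import random

def sample_token(probs):
    """Draw one index at random, weighted by the probability distribution."""
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

random.seed(0)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_token([0.7, 0.2, 0.1])] += 1
# The most likely token is drawn most often, but not always.
```

Unlike greedy decoding, repeated runs produce different outputs, with frequencies tracking the distribution.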

beam search
a search algorithm that explores multiple candidate sequences simultaneously
“Beam search with width 5 tracks the five most promising response paths.”
Origin: Old English bēam `tree, ray of light` + Old French cerchier `to search`
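
A toy beam-search sketch under stated assumptions: `next_log_probs` stands in for a real model and here returns fixed log-probabilities over a two-token vocabulary.

```python
import math

def beam_search(next_log_probs, beam_width, length):
    """Keep the `beam_width` highest-scoring partial sequences at each step.
    `next_log_probs(seq)` returns {token: log_prob} for the next position."""
    beams = [((), 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for token, lp in next_log_probs(seq).items():
                candidates.append((seq + (token,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the most promising paths
    return beams

# Toy model: "a" is always slightly likelier than "b".
def toy_model(seq):
    return {"a": math.log(0.6), "b": math.log(0.4)}

best_seq, best_score = beam_search(toy_model, beam_width=2, length=3)[0]
```

Scores are summed in log space so that long products of probabilities stay numerically stable.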

greedy decoding
always selecting the highest-probability token at each step
“Greedy decoding is fast but may miss better overall sequences.”
Origin: Old English grǣdig `voracious` + de- + code, from Latin cōdex `book`
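
Greedy decoding reduces to an argmax at each step, as in this sketch; `toy_scorer` is a made-up stand-in for a model:

```python
def greedy_decode(next_logits, steps):
    """Pick the argmax token at every step; fast, but can miss
    sequences that score better overall."""
    seq = []
    for _ in range(steps):
        logits = next_logits(seq)           # {token: raw score}
        best = max(logits, key=logits.get)  # highest-scoring token
        seq.append(best)
    return seq

# Toy scorer: always prefers "the", purely for illustration.
def toy_scorer(seq):
    return {"the": 1.5, "cat": 0.5, "sat": -0.2}

result = greedy_decode(toy_scorer, steps=3)
```

Because no probabilities beyond the current step are considered, a locally best token can lock the decoder out of a globally better sequence, which is exactly what beam search tries to avoid.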

top-k sampling
sampling only from the k most likely next tokens
“Top-k sampling with k=50 prevents rare, nonsensical tokens from being selected.”
Origin: From statistics: selecting the top k elements
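
A sketch of the top-k filtering step: drop everything outside the k highest logits, then renormalize with softmax. Token names and values are illustrative.

```python
import math

def top_k_filter(logits, k):
    """Keep only the k highest logits, renormalize, and return
    {token: probability} for the surviving candidates."""
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    m = max(v for _, v in top)  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in top}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

logits = {"cat": 2.1, "dog": 1.9, "xylophone": -4.0, "the": 2.5}
filtered = top_k_filter(logits, k=2)
# Only "the" and "cat" survive; "xylophone" can never be sampled.
```

Sampling then proceeds over the filtered distribution, so low-probability tokens are excluded outright rather than merely made unlikely.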

nucleus sampling
sampling from the smallest set of tokens whose cumulative probability reaches a threshold p (also called top-p sampling)
“Nucleus sampling with p=0.95 adapts vocabulary size to context certainty.”
Origin: Latin nucleus `kernel` from nux `nut`
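
A sketch of the top-p cutoff, showing how the kept set (the "nucleus") shrinks when the model is confident and grows when it is not; the distributions are made up for illustration.

```python
def top_p_filter(probs, p=0.95):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize over that nucleus."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}

# Confident distribution: the nucleus shrinks to a single token.
confident = top_p_filter({"yes": 0.96, "no": 0.03, "maybe": 0.01}, p=0.95)
# Flat distribution: the nucleus keeps most of the vocabulary.
uncertain = top_p_filter({"a": 0.4, "b": 0.35, "c": 0.25}, p=0.95)
```

This adaptivity is the contrast with top-k, whose cutoff is a fixed count regardless of how peaked the distribution is.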

logits
raw, unnormalized scores output by the model before conversion to probabilities
“Logits are converted to probabilities using the softmax function.”
Origin: From log-odds in statistics, coined 1944

softmax
a function that converts logits into a probability distribution summing to one
“Softmax exponentiates each logit and normalizes so all probabilities sum to 1.”
Origin: soft (smooth approximation) + max (maximum function)
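
The relationship between logits and softmax fits in a few lines; this is the standard numerically stable formulation, not any particular library's implementation:

```python
import math

def softmax(logits):
    """Exponentiate each logit and normalize so the outputs
    form a probability distribution summing to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, -1.0])
# Larger logits map to larger probabilities; the ordering is preserved.
```

Subtracting the maximum logit changes nothing mathematically (it cancels in the ratio) but keeps `exp` from overflowing on large logits.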

KV cache
cached key-value pairs from previous tokens to speed up autoregressive generation
“The KV cache avoids recomputing attention for earlier tokens at each step.”
Origin: Key-Value cache, from database terminology
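
A schematic (not a real attention implementation) showing the bookkeeping a KV cache does: each generation step appends one new key/value pair and reuses all the earlier ones. The projection arithmetic here is a stand-in for the model's real key/value projections.

```python
class KVCache:
    """Schematic KV cache: store each token's key/value vectors so that
    attention at step t reuses steps 0..t-1 instead of recomputing them."""
    def __init__(self):
        self.keys, self.values = [], []
        self.projections_computed = 0  # count of key/value projections done

    def step(self, token_embedding):
        # Without a cache, every past token would be re-projected here too.
        self.keys.append([x * 0.5 for x in token_embedding])    # stand-in for W_k @ x
        self.values.append([x * 2.0 for x in token_embedding])  # stand-in for W_v @ x
        self.projections_computed += 1
        return self.keys, self.values  # attention reads the full history

cache = KVCache()
for t in range(8):
    keys, values = cache.step([float(t)])

# 8 projections with the cache; 1 + 2 + ... + 8 = 36 without it.
```

The saving grows quadratically with sequence length, which is why caching is standard in autoregressive inference; the cost is the memory needed to hold the cached tensors.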