
greedy decoding
/ˌɡriːdi dɪˈkoʊdɪŋ/
a text-generation strategy that always selects the single highest-probability token at each step
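As a rough illustration of the definition, the sketch below picks the highest-probability token at each step. It assumes a hypothetical callable `next_logits` that maps a list of token ids to a list of logits; that callable, `eos_id`, and the function names are illustrative and not part of any particular library.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_step(logits):
    # Greedy choice: index of the highest-probability token.
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

def greedy_decode(next_logits, prompt, max_new_tokens, eos_id=None):
    # next_logits is a hypothetical stand-in for a model's forward pass.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = greedy_step(next_logits(tokens))
        tokens.append(token)
        if token == eos_id:
            break
    return tokens
```

Because softmax preserves the ordering of the logits, taking the argmax of the logits directly would select the same token; the softmax call is kept here only to mirror the wording of the definition.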
greedy decoding in a sentence
“Greedy decoding is fast but may miss better overall sequences.”
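The example sentence can be made concrete with a toy case. The probability table below is invented purely for illustration (it does not come from any real model); it shows a greedy choice at the first step leading to a less likely two-token sequence than an exhaustive search finds.

```python
# Hypothetical next-token probabilities, made up for this illustration.
NEXT_PROBS = {
    "<s>": {"A": 0.5, "B": 0.4, "C": 0.1},
    "A": {"x": 0.4, "y": 0.3, "z": 0.3},
    "B": {"x": 0.9, "y": 0.05, "z": 0.05},
    "C": {"x": 0.4, "y": 0.3, "z": 0.3},
}

def greedy_pair():
    # Take the most likely token at each of the two steps.
    first = max(NEXT_PROBS["<s>"], key=NEXT_PROBS["<s>"].get)
    second = max(NEXT_PROBS[first], key=NEXT_PROBS[first].get)
    return (first, second), NEXT_PROBS["<s>"][first] * NEXT_PROBS[first][second]

def best_pair():
    # Score every two-token sequence and keep the most likely one.
    candidates = [
        ((a, b), pa * pb)
        for a, pa in NEXT_PROBS["<s>"].items()
        for b, pb in NEXT_PROBS[a].items()
    ]
    return max(candidates, key=lambda c: c[1])
```

Here `greedy_pair()` commits to "A" because it is the most likely first token, giving a sequence probability of about 0.20, while `best_pair()` finds "B" followed by "x" at about 0.36.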
Origin of greedy decoding
Old English grǣdig voracious + decode, from de- + code (ultimately from Latin codex book)
Related Words
top-k sampling
sampling only from the k most likely next tokens
nucleus sampling
sampling from the smallest set of most likely tokens whose cumulative probability reaches a threshold p (top-p)
logits
raw, unnormalized scores output by the model before conversion to probabilities
softmax
a function that converts logits into a probability distribution summing to one
KV cache
cached attention key and value tensors for previous tokens, reused to speed up autoregressive generation
inference
the process of using a trained model to generate predictions or outputs
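The two sampling strategies listed among the related words, top-k and nucleus (top-p) sampling, can be sketched as filters over a probability list. This is a minimal sketch under simple assumptions: `probs` is a plain list of probabilities indexed by token id, and the function names are illustrative rather than taken from any library.

```python
import random

def top_k_filter(probs, k):
    # Keep the k most likely token ids and renormalize their probabilities.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    # Keep the smallest set of most likely token ids whose cumulative
    # probability reaches p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}

def sample(dist, rng=random):
    # Draw one token id from a {token_id: probability} mapping.
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

With `k = 1`, `top_k_filter` keeps only the most likely token, so sampling from it reduces to greedy decoding.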