greedy decoding

/ˌɡriːdi dɪˈkoʊdɪŋ/

a decoding strategy that selects the single highest-probability token at each generation step, without revisiting earlier choices
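A minimal sketch of the idea, assuming a toy model. The names `greedy_step`, `greedy_decode`, `TOY_TABLE`, and `toy_next_probs` are hypothetical and stand in for a real language model, which would return a probability for every token in its vocabulary.

```python
def greedy_step(probs):
    """Return the index of the highest-probability token."""
    return max(range(len(probs)), key=lambda i: probs[i])


def greedy_decode(next_probs, prompt, max_new_tokens, eos_id=None):
    """Extend `prompt` one token at a time, always taking the argmax.

    `next_probs` is any callable mapping the tokens so far to a list of
    next-token probabilities (an assumed interface, not a library API).
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = greedy_step(next_probs(tokens))
        tokens.append(token)
        if token == eos_id:  # stop once the end-of-sequence token appears
            break
    return tokens


# Hypothetical toy model: next-token probabilities depend only on the
# last token (vocabulary of three token ids: 0, 1, 2).
TOY_TABLE = {
    0: [0.1, 0.6, 0.3],
    1: [0.2, 0.1, 0.7],
    2: [0.5, 0.3, 0.2],
}


def toy_next_probs(tokens):
    return TOY_TABLE[tokens[-1]]
```

With this toy table, `greedy_decode(toy_next_probs, [0], 3)` yields `[0, 1, 2, 0]`. The result is deterministic because no sampling is involved.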

greedy decoding in a sentence

“Greedy decoding is fast, but it can miss sequences with a higher overall probability.”
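The trade-off in the example sentence can be shown with a small sketch. The two-step probability tables `FIRST` and `SECOND` are made up for illustration, and the exhaustive search in `best_sequence` is only practical for a toy case like this one.

```python
from itertools import product

# Hypothetical two-step model: probabilities for the first token, then
# probabilities for the second token conditioned on the first.
FIRST = {"A": 0.5, "B": 0.4, "C": 0.1}
SECOND = {
    "A": {"x": 0.4, "y": 0.3, "z": 0.3},
    "B": {"x": 0.9, "y": 0.05, "z": 0.05},
    "C": {"x": 0.4, "y": 0.3, "z": 0.3},
}


def sequence_prob(first, second):
    """Joint probability of a two-token sequence."""
    return FIRST[first] * SECOND[first][second]


def greedy_sequence():
    """Pick the locally most likely token at each of the two steps."""
    first = max(FIRST, key=FIRST.get)
    second = max(SECOND[first], key=SECOND[first].get)
    return (first, second)


def best_sequence():
    """Score every two-token sequence and return the most likely one."""
    return max(product(FIRST, "xyz"), key=lambda pair: sequence_prob(*pair))
```

Here the greedy choice is `("A", "x")` with joint probability 0.5 × 0.4, while the exhaustive search finds `("B", "x")` with 0.4 × 0.9. Taking the best first token locked greedy decoding out of the better sequence.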

Old English grǣdig voracious + decode to decipher, from de- + code (Latin codex)

Related Words

top-k sampling

sampling only from the k most likely next tokens
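A minimal sketch, assuming the probabilities arrive as a plain Python list. The function name `top_k_sample` and its `rng` parameter are hypothetical. `random.choices` accepts relative weights, so the kept probabilities do not need to be renormalized by hand.

```python
import random


def top_k_sample(probs, k, rng=random):
    """Sample a token index from the k most likely entries of `probs`."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Rank token indices by probability, highest first, and keep k of them.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Draw one index in proportion to its probability among the kept tokens.
    return rng.choices(top, weights=[probs[i] for i in top], k=1)[0]
```

Setting `k=1` reduces this to greedy decoding, since only the most likely token remains.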

nucleus sampling

sampling from the smallest set of most likely tokens whose cumulative probability reaches a threshold p (top-p)
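A minimal sketch of top-p selection, assuming probabilities in a plain Python list. The names `nucleus_candidates` and `nucleus_sample` are hypothetical.

```python
import random


def nucleus_candidates(probs, p):
    """Return the smallest set of top-ranked token indices whose
    cumulative probability is at least p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= p:  # enough probability mass collected
            break
    return kept


def nucleus_sample(probs, p, rng=random):
    """Sample a token index from the nucleus, weighted by probability."""
    kept = nucleus_candidates(probs, p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Unlike top-k, the number of candidates varies with the shape of the distribution: a sharply peaked distribution keeps few tokens and a flat one keeps many.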

logits

raw, unnormalized scores output by the model before conversion to probabilities

softmax

a function that converts logits into a probability distribution summing to one
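A minimal sketch in plain Python. Subtracting the maximum logit before exponentiating is a common numerical-stability step; it leaves the result unchanged and avoids overflow for large logits.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to one."""
    m = max(logits)  # shift so the largest exponent is exp(0) = 1
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The ordering of the logits is preserved, so the token with the highest logit is also the token greedy decoding would select.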

KV cache

stored attention keys and values from previous tokens, reused at each step to avoid recomputing them during autoregressive generation
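A toy sketch of the idea for a single attention head. Everything here is an illustrative assumption: `project` stands in for learned projection matrices, the query is the raw input vector, and `KVCache` is a hypothetical class, not a library API.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def project(x, scale):
    """Hypothetical stand-in for a learned key or value projection."""
    return [scale * xi for xi in x]


class KVCache:
    """Keeps the keys and values computed for earlier tokens."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def step_with_cache(cache, x):
    """Project only the newest token, then attend over the cache."""
    cache.append(project(x, 0.5), project(x, 2.0))
    return attend(x, cache.keys, cache.values)


def step_without_cache(history):
    """Recompute keys and values for every token seen so far."""
    keys = [project(h, 0.5) for h in history]
    values = [project(h, 2.0) for h in history]
    return attend(history[-1], keys, values)
```

Both functions produce the same output for the newest token. The cached version does one projection per step, where the uncached version repeats the projections for the whole history.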

inference

the process of using a trained model to generate predictions or outputs