How many LLM inference words are in this list?

This vocabulary list contains 10 curated LLM inference words with definitions and examples.

How can I learn these LLM inference vocabulary words?

Segue offers multiple ways to learn: interactive flashcards for memorization, multiple-choice quizzes for testing, and typing practice for reinforcement. Add this list to your collection and practice with any method.

LLM Inference Vocabulary

How language models generate responses at runtime

10 words

All 10 Words

inference

/ˈɪnfɝəns/

the process of using a trained model to generate predictions or outputs

“Inference latency determines how quickly the chatbot can respond.”

Origin: Latin inferre `to bring in, conclude` from in- + ferre `to carry`

temperature

/ˈtɛmpɝətʃɝ/

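a parameter that divides the logits before sampling to control randomness in generation: higher values flatten the distribution and give more varied output, lower values sharpen it and give more deterministic output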

“Setting temperature to 0.7 balances creativity with coherence.”

Origin: Latin temperatura `a mixing in due proportion`
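
To make the effect of temperature concrete, here is a minimal sketch that divides logits by a temperature before applying softmax. The function name `softmax_with_temperature` and the example logits are illustrative assumptions, not taken from any particular library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Temperature must be positive; the T -> 0 limit corresponds to greedy decoding.
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max for numerical stability; this does not change the result.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                        # made-up scores for three tokens
sharp = softmax_with_temperature(logits, 0.5)   # low temperature: more peaked
flat = softmax_with_temperature(logits, 2.0)    # high temperature: flatter
```

With these inputs, the most likely token receives a larger share of the probability at temperature 0.5 than at 2.0.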

sampling

/ˈsæmpɫɪŋ/

randomly selecting the next token from the probability distribution rather than always choosing the most likely

“Top-p sampling only considers the smallest set of most likely tokens whose cumulative probability reaches a threshold.”

Origin: Old French essample `example` from Latin exemplum

beam search

/ˈbiːm ˌsɜːrtʃ/

a search algorithm that extends several candidate sequences in parallel and keeps only the highest-scoring few (the beam width) at each step

“Beam search with width 5 tracks the five most promising response paths.”

Origin: Old English bēam `tree, ray of light` + Old French cerchier `to search`
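
As a rough illustration of beam search, the sketch below runs over a tiny hand-written probability table. `TOY_PROBS` is a hypothetical stand-in for a real model, and the scoring is a plain sum of log-probabilities with no length normalization, which real systems often add.

```python
import math

# Hypothetical toy model: next-token probabilities depend only on the last token.
TOY_PROBS = {
    "<bos>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.4, "b": 0.3, "<eos>": 0.3},
    "b": {"<eos>": 0.9, "a": 0.1},
}

def beam_search(width, max_steps=3):
    # Each candidate is (cumulative log-probability, token list).
    beams = [(0.0, ["<bos>"])]
    finished = []
    for _ in range(max_steps):
        candidates = []
        for score, tokens in beams:
            for token, prob in TOY_PROBS[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the `width` highest-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:width]:
            if cand[1][-1] == "<eos>":
                finished.append(cand)   # complete sequence, stop extending it
            else:
                beams.append(cand)
        if not beams:
            break
    best = max(finished + beams, key=lambda c: c[0])
    return best[1][1:]  # drop the <bos> marker

wide = beam_search(width=2)
narrow = beam_search(width=1)  # width 1 behaves like greedy decoding
```

In this toy table, width 1 commits to "a" because it is the most likely first token, while width 2 keeps "b" alive long enough to find the higher-probability sequence that ends right after it.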

greedy decoding

/ˌɡriːdi dɪˈkoʊdɪŋ/

always selecting the highest probability token at each step

“Greedy decoding is fast but may miss better overall sequences.”

Origin: Old English grǣdig `voracious` + de- + code, from Latin codex `book`
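
A minimal sketch of greedy decoding, assuming a made-up lookup table (`TOY_LOGITS`) in place of a real model: at every step the loop simply takes the token with the largest logit.

```python
# Hypothetical toy "model": maps the last token to logits over a tiny vocabulary.
VOCAB = ["the", "cat", "sat", "<eos>"]
TOY_LOGITS = {
    "<bos>": [3.0, 1.0, 0.5, 0.1],
    "the": [0.1, 2.5, 1.0, 0.2],
    "cat": [0.1, 0.2, 2.0, 1.0],
    "sat": [0.3, 0.1, 0.2, 2.5],
}

def greedy_decode(max_steps=10):
    tokens = ["<bos>"]
    for _ in range(max_steps):
        logits = TOY_LOGITS[tokens[-1]]
        # Pick the index of the highest logit; no randomness is involved.
        best = max(range(len(logits)), key=logits.__getitem__)
        token = VOCAB[best]
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens[1:]  # drop the <bos> marker

result = greedy_decode()
```

Because there is no sampling, running this twice always yields the same sequence.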

top-k sampling

/ˌtɒp ˈkeɪ ˌsæmplɪŋ/

sampling only from the k most likely next tokens

“Top-k sampling with k=50 prevents rare, nonsensical tokens from being selected.”

Origin: From statistics: selecting the top k elements
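
One way to sketch top-k sampling with only the standard library is shown below. The function name `top_k_sample` and the logits are illustrative assumptions; the weights are unnormalized softmax values, which `random.choices` normalizes internally.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    # Indices of the k largest logits; everything else is excluded outright.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the surviving tokens only (max subtracted for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.1, -3.0, -5.0]  # made-up scores for five tokens
rng = random.Random(0)                 # seeded only to make the demo repeatable
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(200)]
```

With k=2, only token indices 0 and 1 can ever appear in `samples`, no matter how many draws are made.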

nucleus sampling

/ˈnjuːkliəs ˌsæmplɪŋ/

sampling from the smallest set of most likely tokens whose cumulative probability reaches a threshold p (also called top-p sampling)

“Nucleus sampling with p=0.95 adapts the number of candidate tokens to how confident the model is in each context.”

Origin: Latin nucleus `kernel` from nux `nut`
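
The sketch below illustrates how the nucleus grows or shrinks with the model's confidence. The helper names `top_p_filter` and `nucleus_sample` and both probability lists are assumptions made up for this example.

```python
import random

def top_p_filter(probs, p):
    # Walk tokens from most to least likely, stopping once the running total reaches p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

def nucleus_sample(probs, p, rng=random):
    kept = top_p_filter(probs, p)
    # random.choices renormalizes the weights of the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

confident = [0.97, 0.01, 0.01, 0.01]  # model is nearly sure of one token
uncertain = [0.3, 0.3, 0.2, 0.2]      # probability is spread out

small_nucleus = top_p_filter(confident, 0.95)
large_nucleus = top_p_filter(uncertain, 0.95)
```

Here the confident distribution keeps a single token, while the uncertain one keeps all four, which is the adaptive behavior that a fixed k cannot provide.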

logits

/ˈloʊdʒɪts/

raw, unnormalized scores output by the model before conversion to probabilities

“Logits are converted to probabilities using the softmax function.”

Origin: From log-odds in statistics, coined 1944

softmax

/ˈsɔːftmæks/

a function that converts logits into a probability distribution summing to one

“Softmax exponentiates each logit and normalizes so all probabilities sum to 1.”

Origin: soft (smooth approximation) + max (maximum function)
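
A small standard-library sketch of softmax follows. Subtracting the largest logit before exponentiating is a common stability trick that leaves the result unchanged; the example logits are made up.

```python
import math

def softmax(logits):
    # Subtract the max logit so math.exp never overflows on large inputs.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Normalize so the outputs sum to 1.
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The output preserves the ordering of the logits: the largest logit gets the largest probability.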

KV cache

/ˌkeɪ ˈviː ˌkæʃ/

stored attention key and value tensors from previous tokens, reused to speed up autoregressive generation

“The KV cache avoids recomputing keys and values for earlier tokens at each step.”

Origin: Key-Value cache, after the key and value tensors in transformer attention
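
To show the idea in miniature, the sketch below implements single-head attention with a cache in plain Python. The `KVCache` class and the tiny vectors are hypothetical; in a real model the query, key, and value would be learned projections of each token's hidden state rather than hand-written lists.

```python
import math

def attend(query, keys, values):
    # Scaled dot-product attention for one query over all stored positions.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, query, key, value):
        # Append only the new token's key and value; earlier entries are reused as-is.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)

# Made-up (query, key, value) triples for three generation steps.
steps = [
    ([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
    ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0]),
    ([1.0, 1.0], [1.0, 1.0], [5.0, 6.0]),
]
cache = KVCache()
outputs = [cache.step(q, k, v) for q, k, v in steps]
```

Each call to `step` does work proportional to the number of cached positions, instead of rebuilding every key and value from scratch for the whole sequence.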
