Core concepts of how large language models process and generate text
token
a sub-word unit that language models process, rather than whole words or characters
“The word 'unbelievable' might be split into tokens like ['un', 'believ', 'able'].”
Origin: Old English tacen `sign, symbol` from Germanic *taiknam
tokenization
the process of breaking text into tokens for model processing
“Tokenization affects how the model 'sees' text and can cause character-counting errors.”
Origin: token + Greek -izein `to make`
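A minimal sketch of tokenization, assuming the tiktoken library (the encoding name cl100k_base is just one example; the exact sub-word split varies by tokenizer):

```python
# Sketch: tokenizing a word with a byte-pair-encoding tokenizer.
# Assumes the tiktoken library; the exact split depends on the encoding used.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "unbelievable"
token_ids = enc.encode(word)

# The model sees token ids, not characters, which is why character
# counting can go wrong: len(word) and len(token_ids) usually differ.
pieces = [enc.decode([tid]) for tid in token_ids]
print(token_ids)   # a short list of integer ids
print(pieces)      # the sub-word pieces those ids map back to
print(len(word), "characters vs", len(token_ids), "tokens")
```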
attention mechanism
a mechanism that lets each token attend to every other token in the context, creating connections between distant parts of the input
“The attention mechanism allows the model to connect a pronoun with its antecedent many sentences earlier.”
Origin: Latin attendere `to stretch toward` + Greek mekhanē `device`
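A minimal NumPy sketch of scaled dot-product attention with toy dimensions; each position's output is a weighted mix of every position's values, which is how distant tokens get connected:

```python
# Sketch: scaled dot-product attention over a toy sequence.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 5, 8                      # toy sizes
rng = np.random.default_rng(0)
q = rng.normal(size=(seq_len, d_model))      # queries
k = rng.normal(size=(seq_len, d_model))      # keys
v = rng.normal(size=(seq_len, d_model))      # values

# Every token scores every other token, so distant positions can connect.
scores = q @ k.T / np.sqrt(d_model)          # (seq_len, seq_len)
weights = softmax(scores, axis=-1)           # each row sums to 1
output = weights @ v                         # each row mixes all the values
print(weights.shape, output.shape)
```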
transformer
the neural network architecture underlying modern LLMs, based on self-attention
“The transformer architecture revolutionized NLP by enabling parallel processing of sequences.”
Origin: Latin transformare `to change in shape` from trans- `across` + forma `form`
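A short sketch, assuming PyTorch, of one transformer encoder layer (self-attention plus feed-forward) processing every position of a sequence in parallel; the sizes are arbitrary:

```python
# Sketch: one transformer encoder layer (self-attention + feed-forward).
# Assumes PyTorch; dimensions are illustrative only.
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
x = torch.randn(1, 10, 64)      # (batch, sequence of 10 tokens, features)
y = layer(x)                    # all 10 positions are processed in parallel
print(y.shape)                  # torch.Size([1, 10, 64])
```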
autoregressive generation
producing output one token at a time, where each token depends on all previous tokens
“Autoregressive generation means the model can't revise earlier words once they're written.”
Origin: Greek auto- `self` + Latin regressus `return` + generare `to produce`
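A minimal sketch of greedy autoregressive decoding; next_token_logits is a hypothetical stand-in for a real language model:

```python
# Sketch: autoregressive generation with greedy decoding.
# next_token_logits is a hypothetical stand-in for a real model.
import numpy as np

vocab_size = 100

def next_token_logits(tokens):
    # A real model would condition on every previous token here.
    rng = np.random.default_rng(hash(tuple(tokens)) % (2**32))
    return rng.normal(size=vocab_size)

tokens = [1, 7, 42]                        # the prompt, as token ids
for _ in range(5):
    logits = next_token_logits(tokens)     # depends on all previous tokens
    tokens.append(int(np.argmax(logits)))  # commit to one token, never revise it
print(tokens)
```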
context window
the finite amount of text a model can process at once, including input and output
“With a 100K context window, the model can process roughly a 300-page book.”
Origin: Latin contextus `a joining together` + Old Norse vindauga `wind-eye`
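The 300-page figure follows from rough arithmetic like the sketch below; the tokens-per-word and words-per-page ratios are common rules of thumb, not exact values:

```python
# Back-of-the-envelope sizing for a 100K-token context window.
# The ratios are rough rules of thumb, not exact values.
context_tokens = 100_000
tokens_per_word = 1.3        # roughly 0.75 words per token for English prose
words_per_page = 250         # a typical prose page

words = context_tokens / tokens_per_word
pages = words / words_per_page
print(round(words), "words, about", round(pages), "pages")  # roughly 300 pages
```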
embedding
a dense vector representation of text in high-dimensional space where similar concepts are geometrically close
“In embedding space, 'king' - 'man' + 'woman' approximately equals 'queen'.”
Origin: em- `in` + Old English bedd `bed` + -ing
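A toy sketch of the analogy arithmetic using hand-made 3-dimensional vectors; real embeddings are learned and have hundreds or thousands of dimensions:

```python
# Sketch: vector arithmetic in a toy embedding space.
# Hand-made 3-d vectors stand in for real learned embeddings.
import numpy as np

emb = {
    "king":  np.array([0.9, 0.8, 0.1]),   # royal, male-ish
    "man":   np.array([0.1, 0.9, 0.0]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.9, 0.0, 1.0]),   # royal, female-ish
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)   # "queen" is the nearest neighbour of king - man + woman
```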
latent space
the high-dimensional space where neural networks represent concepts as directions and positions
“Concepts exist in latent space as directions, making analogical reasoning geometric.”
Origin: Latin latens `lying hidden` + spatium `space`
feed-forward layer
a neural network layer that processes each position independently after the attention step
“Feed-forward layers transform the attention outputs into richer representations.”
Origin: Old English fēdan `to nourish` + Old English foreweard `toward the front` + layer
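A minimal sketch, assuming PyTorch, of the position-wise feed-forward block used in transformer layers; the hidden width is illustrative:

```python
# Sketch: a position-wise feed-forward block (expand, nonlinearity, project back).
# Assumes PyTorch; sizes are illustrative.
import torch
import torch.nn as nn

d_model, d_hidden = 64, 256
feed_forward = nn.Sequential(
    nn.Linear(d_model, d_hidden),   # expand each position's features
    nn.GELU(),                      # nonlinearity
    nn.Linear(d_hidden, d_model),   # project back to the model width
)

x = torch.randn(1, 10, d_model)     # (batch, positions, features), e.g. attention output
y = feed_forward(x)                 # applied to each of the 10 positions independently
print(y.shape)                      # torch.Size([1, 10, 64])
```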
layer normalization
a technique to stabilize training by normalizing activations across features
“Layer normalization helps transformers train more stably on long sequences.”
Origin: Latin norma `carpenter's square, rule` + -ization
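A small NumPy sketch of layer normalization: each position's activations are normalized across the feature dimension (the learned scale and shift parameters are omitted for brevity):

```python
# Sketch: layer normalization across the feature dimension.
# The learned scale (gamma) and shift (beta) parameters are omitted.
import numpy as np

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)   # per-position mean over features
    var = x.var(axis=-1, keepdims=True)     # per-position variance over features
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(size=(10, 64))  # 10 positions, 64 features
y = layer_norm(x)
print(y.mean(axis=-1)[:3], y.std(axis=-1)[:3])      # ~0 means, ~1 stds per position
```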