Core concepts of how large language models process and generate text

token
a unit of text, often a sub-word piece, that language models process in place of whole words or single characters
“The word 'unbelievable' might be split into tokens like ['un', 'believ', 'able'].”

tokenization
the process of breaking text into tokens for model processing
“Tokenization affects how the model 'sees' text and can cause character-counting errors.”
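
A minimal sketch of how a word can end up split into sub-word pieces. It assumes a tiny hand-written vocabulary (`VOCAB` is hypothetical) and uses greedy longest-match splitting, which is a simplification; production tokenizers typically learn their vocabulary from data with algorithms such as byte-pair encoding.

```python
# Toy vocabulary (hypothetical). Real tokenizers learn tens of thousands of pieces.
VOCAB = {"un", "believ", "able", "token", "s"}

def tokenize(word, vocab=VOCAB):
    """Greedy longest-match split; unknown characters become one-character tokens."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary piece matched, so fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens

pieces = tokenize("unbelievable")
print(pieces)                             # ['un', 'believ', 'able']
print(len(pieces), len("unbelievable"))   # 3 12
```

The model receives 3 items here, not 12 letters, which is one reason character-counting questions can go wrong.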

attention mechanism
a mechanism that lets each token weigh the other tokens in the context (in decoder-only LLMs, only the earlier ones), creating connections between distant parts of the text
“The attention mechanism allows the model to connect a pronoun with its antecedent many sentences earlier.”
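
A minimal sketch of scaled dot-product attention in plain Python. It makes two assumptions that are not in the definition above: the queries, keys, and values are the raw token vectors (real models apply learned projections first), and a causal mask is used, as in decoder-only LLMs, so each position only sees itself and earlier positions.

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Each position t mixes the values of positions 0..t, weighted by similarity."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for t, q in enumerate(queries):
        # Similarity of query t with every key it is allowed to see.
        scores = [
            sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        out = [
            sum(w * values[s][k] for s, w in enumerate(weights))
            for k in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy token vectors; the first and third are identical.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = causal_attention(x, x, x)
print(weights[2])  # position 2 weighs position 0 as much as itself, and position 1 less
```

The distance between positions plays no role in the score, which is why a token can attend strongly to something far earlier in the context.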

transformer
the neural network architecture underlying modern LLMs, based on self-attention
“The transformer architecture revolutionized NLP by enabling parallel processing of sequences.”

autoregressive generation
producing output one token at a time, where each token depends on all previous tokens
“Autoregressive generation means the model can't revise earlier words once they're written.”
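
A minimal sketch of the generation loop. The lookup table `NEXT` and the function `next_token` are hypothetical stand-ins for a model; a real model scores every vocabulary item using the whole context, whereas this toy looks only at the last token.

```python
# Hypothetical stand-in for a trained model: maps the last token to the next one.
NEXT = {"<s>": "the", "the": "cat", "cat": "sat", "sat": "</s>"}

def next_token(context):
    """Pick the next token given the context so far (toy: uses only the last token)."""
    return NEXT.get(context[-1], "</s>")

def generate(prompt, max_new_tokens=10):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == "</s>":      # end-of-sequence marker stops generation
            break
        tokens.append(tok)     # append-only: earlier tokens are never edited
    return tokens

print(generate(["<s>"]))  # ['<s>', 'the', 'cat', 'sat']
```

The list only ever grows at the end, which is the sense in which earlier words cannot be revised once written.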

context window
the finite amount of text, measured in tokens, that a model can process at once, including input and output
“With a 100K-token context window, the model can process roughly a 300-page book.”
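
A rough worked version of the page estimate, plus the budget arithmetic implied by "including input and output". The conversion factors are rule-of-thumb assumptions for English prose (about 0.75 words per token and 250 words per page), not properties of any particular model.

```python
CONTEXT_WINDOW = 100_000   # tokens, from the example above
WORDS_PER_TOKEN = 0.75     # assumption: rough average for English text
WORDS_PER_PAGE = 250       # assumption: typical printed page

def approx_pages(n_tokens):
    """Estimate how many printed pages a token count corresponds to."""
    return n_tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

def max_output_tokens(prompt_tokens, window=CONTEXT_WINDOW):
    """Input and output share the window, so a longer prompt leaves less room to answer."""
    return max(0, window - prompt_tokens)

print(approx_pages(CONTEXT_WINDOW))   # 300.0
print(max_output_tokens(98_000))      # 2000
```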

embedding
a dense vector representation of text in high-dimensional space where similar concepts are geometrically close
“In embedding space, 'king' - 'man' + 'woman' approximately equals 'queen'.”
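
A minimal sketch of the analogy arithmetic, using hand-made 3-dimensional vectors. The values in `EMB` are invented for illustration; real embeddings are learned, have hundreds or thousands of dimensions, and satisfy such analogies only approximately.

```python
import math

# Hypothetical toy embeddings; dimensions loosely mean (royalty, male, female).
EMB = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# king - man + woman, computed component by component.
target = [k - m + w for k, m, w in zip(EMB["king"], EMB["man"], EMB["woman"])]

# The word whose vector points in the direction closest to the target.
nearest = max(EMB, key=lambda word: cosine(EMB[word], target))
print(nearest)  # queen
```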

latent space
the high-dimensional space where neural networks represent concepts as directions and positions
“Because concepts correspond to directions in latent space, an analogy can be computed as vector arithmetic.”

feed-forward layers
neural network layers that process each position independently after attention
“Feed-forward layers transform the attention outputs into richer representations.”
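
A minimal sketch of a position-wise feed-forward layer: two linear maps with a ReLU in between, applied with the same weights to every position. The weight values and sizes here are arbitrary assumptions chosen to keep the arithmetic easy to follow; real layers are learned and much wider.

```python
def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

# Hypothetical weights: expand 2 features to 3, then project back to 2.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
B1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]
B2 = [0.0, 0.0]

def feed_forward(x):
    """Transform one position's vector; no other position is consulted."""
    hidden = relu([h + b for h, b in zip(matvec(W1, x), B1)])
    return [o + b for o, b in zip(matvec(W2, hidden), B2)]

def apply_positionwise(sequence):
    # Same weights for every position, each handled on its own.
    return [feed_forward(x) for x in sequence]

print(apply_positionwise([[1.0, 0.0], [0.0, 1.0]]))  # [[1.0, 1.0], [-1.0, 1.0]]
```

Changing the vector at one position leaves the outputs at all other positions unchanged; mixing information across positions is the job of attention.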

layer normalization
a technique to stabilize training by normalizing activations across features
“Layer normalization helps deep transformers train more stably.”
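
A minimal sketch of layer normalization for a single position's feature vector: subtract the mean, divide by the standard deviation, then apply a scale and shift. The parameter names `gamma` and `beta` and the default `eps` follow common convention but are assumptions here; in a trained model the scale and shift are learned.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one vector across its features, then scale and shift."""
    n = len(x)
    if gamma is None:
        gamma = [1.0] * n   # identity scale
    if beta is None:
        beta = [0.0] * n    # zero shift
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)  # eps guards against division by zero
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]

y = layer_norm([1.0, 2.0, 3.0, 4.0])
y_mean = sum(y) / len(y)
y_var = sum((v - y_mean) ** 2 for v in y) / len(y)
print(round(y_mean, 6), round(y_var, 3))
```

After normalization the features have a mean near 0 and a variance near 1 regardless of the input's original scale, which keeps activations in a predictable range from layer to layer.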