How many LLM architecture words are in this list?

This vocabulary list contains 10 carefully curated LLM architecture words with definitions and examples.

How can I learn these LLM architecture vocabulary words?

Segue offers multiple ways to learn: interactive flashcards for memorization, multiple-choice quizzes for testing, and typing practice for reinforcement. Add this list to your collection and practice with any method.

LLM Architecture Vocabulary

Core concepts of how large language models process and generate text

10 words

All 10 Words

token

/ˈtoʊkən/

a unit of text, often a sub-word piece, that language models process instead of whole words or individual characters

“The word 'unbelievable' might be split into tokens like ['un', 'believ', 'able'].”

Origin: Old English tacen `sign, symbol` from Germanic *taiknam

tokenization

/ˌtoʊkənaɪˈzeɪʃən/

the process of breaking text into tokens for model processing

“Tokenization affects how the model 'sees' text and can cause character-counting errors.”

Origin: token + Greek -izein `to make` + Latin -ation (noun of action)
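
To make the idea concrete, here is a minimal sketch of tokenization as a greedy longest-match split. The `VOCAB` set and the `tokenize` function are hypothetical and chosen only to reproduce the 'unbelievable' example; production tokenizers learn far larger vocabularies from data and typically use merge-based schemes rather than this simple lookup.

```python
# Toy vocabulary (hypothetical); real tokenizers learn tens of thousands of pieces.
VOCAB = {"un", "believ", "able", "token", "ization"}


def tokenize(word, vocab=VOCAB):
    """Split a word greedily into the longest known pieces.

    Falls back to single characters when no known piece matches.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit one character.
            tokens.append(word[i])
            i += 1
    return tokens


print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

Because the model receives the pieces rather than the letters, a question such as "how many letters are in this word?" is harder than it looks.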

attention mechanism

/əˈtenʃən ˌmekənɪzəm/

a system that lets each token attend to other tokens in the context (in generative LLMs, typically only the tokens that came before it), creating connections between distant parts

“The attention mechanism allows the model to connect a pronoun with its antecedent many sentences earlier.”

Origin: Latin attendere `to stretch toward` + Greek mekhanē `device`
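
As a rough illustration, scaled dot-product attention can be sketched in plain Python. This is a simplified single-head version with no causal masking and no learned projections; the function names and the list-of-lists representation are assumptions made for readability, not how any particular library implements it.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Mix the value vectors according to the attention weights.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Two identical keys receive equal weight, so the output is the mean of the values.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [2.0, 0.0]]))
```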

transformer

/tɹænsˈfɔɹmɝ/

the neural network architecture underlying modern LLMs, based on self-attention

“The transformer architecture revolutionized NLP by enabling parallel processing of sequences.”

Origin: Latin transformare `to change in shape` from trans- `across` + forma `form`

autoregressive generation

/ˌɔːtoʊrɪˈɡresɪv ˌdʒenəˈreɪʃən/

producing output one token at a time, where each token depends on all previous tokens

“Autoregressive generation means the model can't revise earlier words once they're written.”

Origin: Greek auto- `self` + Latin regressus `return` + generare `to produce`
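
The loop below is a minimal sketch of autoregressive generation. The `NEXT` lookup table is a hypothetical stand-in for a trained model, and it looks only at the last token for simplicity; a real LLM conditions on all previous tokens and samples from a probability distribution.

```python
# Hypothetical lookup table standing in for a trained model.
NEXT = {"<s>": "the", "the": "cat", "cat": "sat", "sat": "</s>"}


def next_token(tokens):
    # A real model conditions on all previous tokens; this toy uses only the last.
    return NEXT.get(tokens[-1], "</s>")


def generate(max_new_tokens=10):
    tokens = ["<s>"]  # start-of-sequence marker
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == "</s>":  # end-of-sequence marker stops generation
            break
        tokens.append(tok)  # once appended, a token is never revised
    return tokens[1:]


print(generate())  # ['the', 'cat', 'sat']
```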

context window

/ˈkɒntekst ˌwɪndoʊ/

the finite amount of text a model can process at once, including input and output

“With a 100K-token context window, the model can process roughly a 300-page book.”

Origin: Latin contextus `a joining together` + Old Norse vindauga `wind-eye`
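
The 300-page figure can be checked with back-of-the-envelope arithmetic. The ratios below (about 0.75 English words per token and 250 words per page) are common rules of thumb and are assumptions here; actual values vary by tokenizer, language, and page layout. The helper names are hypothetical.

```python
def estimated_pages(context_tokens, words_per_token=0.75, words_per_page=250):
    """Rough page count that fits in a context window (assumed ratios)."""
    return context_tokens * words_per_token / words_per_page


def fits(prompt_tokens, max_output_tokens, context_window):
    """Input and output share the same window, so both must fit together."""
    return prompt_tokens + max_output_tokens <= context_window


print(estimated_pages(100_000))        # 300.0
print(fits(98_000, 4_000, 100_000))    # False: prompt plus output exceeds the window
```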

embedding

/ɛmˈbɛdɪŋ/

a dense vector representation of text in high-dimensional space where similar concepts are geometrically close

“In embedding space, 'king' - 'man' + 'woman' approximately equals 'queen'.”

Origin: em- `in` (from Latin in-) + Old English bedd `bed` + -ing
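
A small sketch of the analogy arithmetic, using hand-made three-dimensional vectors. The `EMBEDDINGS` values are invented for illustration (real embeddings are learned and have hundreds or thousands of dimensions), and the helper names are hypothetical.

```python
import math

# Hand-made toy vectors (hypothetical); real embeddings are learned from data.
EMBEDDINGS = {
    "king":  [0.9, 0.9, 0.1],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "queen": [0.9, 0.1, 0.9],
    "apple": [0.1, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word nearest to a - b + c, excluding the three input words."""
    target = [x - y + z for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    candidates = [w for w in EMBEDDINGS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))


print(analogy("king", "man", "woman"))  # queen
```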

latent space

/ˈleɪtənt ˌspeɪs/

the high-dimensional space where neural networks represent concepts as directions and positions

“Concepts exist in latent space as directions, making analogical reasoning geometric.”

Origin: Latin latens `lying hidden` + spatium `space`

feed-forward layer

/ˌfiːd ˈfɔːrwərd ˌleɪər/

a neural network layer, applied after attention, that processes each position independently

“Feed-forward layers transform the attention outputs into richer representations.”

Origin: Old English fēdan `to nourish` + Old English foreweard `toward the front` + layer (lay + -er)
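
"Independently" here means the same small network is applied to each position's vector on its own. The sketch below assumes the common two-linear-layers-with-ReLU form; the weights are arbitrary illustrative numbers and the function names are hypothetical.

```python
def linear(x, weights, bias):
    """Matrix-vector product plus bias; one weight row per output feature."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Zero out negative activations."""
    return [max(0.0, v) for v in x]


def feed_forward(x, w1, b1, w2, b2):
    """Expand to a wider hidden layer, apply ReLU, project back down."""
    return linear(relu(linear(x, w1, b1)), w2, b2)


# Arbitrary illustrative weights: 2 features -> 4 hidden units -> 2 features.
W1 = [[1, 0], [0, 1], [1, 1], [-1, -1]]
B1 = [0, 0, 0, 0]
W2 = [[1, 0, 1, 0], [0, 1, 0, 1]]
B2 = [0, 0]

sequence = [[1, -2], [2, 3]]  # one vector per position
# Each position is transformed on its own; no position sees another.
outputs = [feed_forward(x, W1, B1, W2, B2) for x in sequence]
print(outputs)
```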

layer normalization

/ˈleɪər ˌnɔːrməlaɪˈzeɪʃən/

a technique to stabilize training by normalizing activations across features

“Layer normalization helps transformers train more stably on long sequences.”

Origin: Latin norma `carpenter's square, rule` + -ization
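
In its simplest form, layer normalization rescales one position's feature vector to zero mean and unit variance. The sketch below omits the learned scale and shift parameters that practical implementations usually add; `layer_norm` and `eps` are illustrative names.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a single vector across its features."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    # eps guards against division by zero when all features are equal.
    return [(v - mean) / math.sqrt(var + eps) for v in x]


normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)  # values centered on 0 with variance close to 1
```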
