
layer normalization
/ˈleɪər ˌnɔːrməlaɪˈzeɪʃən/
a technique that stabilizes neural network training by normalizing each example's activations across its feature dimension, typically followed by a learned scale and shift
layer normalization in a sentence
“Layer normalization helps transformers train more stably on long sequences.”
Origin of layer normalization
Latin norma “carpenter's square, rule” + -ization
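A minimal sketch of the computation the definition describes, for a single example's feature vector. The function name `layer_norm`, the parameter names `gamma` and `beta` (the learned scale and shift), and the `eps` default are illustrative assumptions, not any particular library's API.

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one example's features to roughly zero mean and unit variance."""
    n = len(x)
    # Assumed defaults: identity scale and zero shift (no learned parameters yet).
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n

    mean = sum(x) / n
    # Population variance over the feature dimension of this one example.
    var = sum((v - mean) ** 2 for v in x) / n
    # eps keeps the division stable when the variance is close to zero.
    inv_std = 1.0 / math.sqrt(var + eps)

    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]

activations = [2.0, 4.0, 6.0, 8.0]
normalized = layer_norm(activations)
```

Because the statistics are computed per example rather than per batch, the result does not depend on what else is in the batch, which is one reason the technique suits variable-length sequence models.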
Related Words
token
a unit of text, often a sub-word fragment, that a language model processes in place of whole words or single characters
tokenization
the process of breaking text into tokens for model processing
attention mechanism
a component that lets each token weight the other tokens in its context when computing its representation, so that distant parts of a sequence can influence one another directly
transformer
the neural network architecture underlying modern large language models (LLMs), based on self-attention
autoregressive generation
producing output one token at a time, where each token depends on all previous tokens
context window
the maximum number of tokens a model can process at once, counting both the input and the generated output
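The last two entries can be illustrated together with a toy sketch. Everything here is hypothetical: `toy_next_token` stands in for a real model, tokens are plain integers, and dropping the oldest tokens is only one possible way to handle a full context window (a real system might instead stop or report an error).

```python
def generate(next_token, prompt, max_new_tokens, context_window):
    """Append tokens one at a time, each chosen from the tokens before it."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Prompt and generated tokens share one budget: only the most
        # recent `context_window` tokens are visible to the model.
        visible = tokens[-context_window:]
        tokens.append(next_token(visible))
    return tokens

def toy_next_token(visible):
    # Hypothetical stand-in for a model: sum of visible tokens, modulo 10.
    return sum(visible) % 10

output = generate(toy_next_token, [1, 2, 3], max_new_tokens=4, context_window=3)
print(output)  # [1, 2, 3, 6, 1, 0, 7]
```

Each new token is computed from earlier tokens, including ones the loop itself produced, which is what makes the process autoregressive.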