
tokenization
/ˌtoʊkənaɪˈzeɪʃən/
The process of breaking text into smaller units (tokens), such as words, subwords, or characters, for processing
tokenization in a sentence
“Tokenization splits sentences into words or subword pieces.”
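As a rough illustration of the definition, the sketch below splits a sentence into word tokens with a simple regular expression, then splits a single word into subword pieces by greedy longest match. The function names and the toy vocabulary are hypothetical and chosen only for this example; production tokenizers use learned vocabularies and more careful rules.

```python
import re


def word_tokenize(text):
    """Split text into word and punctuation tokens using a simple regex rule."""
    # \w+ matches runs of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def subword_tokenize(word, vocab):
    """Greedy longest-match split of one word into pieces found in vocab.

    Falls back to a single character when no vocabulary piece matches.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the vocabulary
        # or only one character is left.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"token", "ization", "split", "s"}

words = word_tokenize("Tokenization splits sentences.")
pieces = subword_tokenize("tokenization", TOY_VOCAB)
print(words)
print(pieces)
```

Greedy longest match is only one possible strategy; it is used here because it is short enough to read at a glance.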
Origin of tokenization
From token, from Old English tācen, tācn (sign, symbol); -ization from -ize (via Latin and French from Greek -izein) + -ation (from Latin)
Related Words
attention mechanism
A technique that lets a model weight the most relevant parts of its input when producing each output
gradient descent
An optimization algorithm that iteratively adjusts parameters in the direction of the negative gradient to minimize a loss function
overfitting
When a model fits noise in the training data instead of the underlying pattern, so it performs poorly on new data
underfitting
When a model is too simple to capture the underlying pattern
hyperparameter
A parameter set before training begins, not learned from data
epoch
One complete pass through the entire training dataset
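Three of the related words above (gradient descent, hyperparameter, epoch) can be seen together in one small sketch. It fits a single weight w in the model y = w * x by gradient descent on mean squared error. The function name, the data, and the default values are assumptions made for this illustration, not a reference implementation.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=200):
    """Fit y = w * x by minimizing mean squared error with gradient descent.

    learning_rate and epochs are hyperparameters: they are set before
    training and are not learned from the data.
    """
    w = 0.0  # the learned parameter, starting from an arbitrary guess
    n = len(xs)
    for _ in range(epochs):
        # One epoch: a complete pass over every (x, y) pair in the dataset.
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad
    return w


# Toy data generated from y = 2 * x, so the fitted weight should approach 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = gradient_descent(xs, ys)
print(w)
```

In this sketch each epoch performs one update using the whole dataset; mini-batch variants perform several updates per epoch.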