Segue

speculative decoding

/ˈspekjələtɪv diˈkoʊdɪŋ/

⚡ Model Optimization

using a small draft model to propose several tokens, which a large target model then verifies in one parallel pass
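A minimal sketch of the greedy variant in Python. The "models" are hypothetical toy functions that return the next token. A real system verifies all drafted positions in one batched forward pass of the large model and, when sampling, uses a rejection-sampling acceptance rule instead of exact matching. Because every kept token is one the large model would have chosen itself, the output matches plain greedy decoding with the large model.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: maps the tokens so far to
# its greedy (highest-probability) next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one large-model step per new token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the verified prefix."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The small draft model proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The large target model checks each position. A real system scores
        #    all k positions in a single batched forward pass; we loop for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)   # always keep the target's own choice
            if tok != expected:
                break                   # first disagreement ends this round
        else:
            # Every draft token matched, so the same pass yields one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal]
```

With a draft that usually agrees, each round keeps several tokens for the cost of one verification; with a draft that always disagrees, it falls back to one token per round but stays correct.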

speculative decoding in a sentence

“Speculative decoding doubled the inference speed without losing quality.”

Origin of speculative decoding

Latin speculari, "to spy out," + decoding

Related Words

KV cache

storing the attention keys and values of earlier tokens so each new generation step does not recompute them
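A toy illustration in Python, assuming a single attention head, made-up key/value projections (`to_key` and `to_value` are hypothetical stand-ins for learned weight matrices) and the input itself used as the query. The cached version projects each token once; the uncached version re-projects the whole history at every step, yet both give the same attention output.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Single-head scaled dot-product attention for one query vector."""
    scores = [dot(q, k) / math.sqrt(len(q)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


# Hypothetical projections; a real model uses learned matrices here.
def to_key(x: Vec) -> Vec:
    return [0.5 * t for t in x]


def to_value(x: Vec) -> Vec:
    return [t + 1.0 for t in x]


class KVCache:
    """Keeps keys/values of past tokens so each step projects only the new one."""

    def __init__(self) -> None:
        self.keys: List[Vec] = []
        self.values: List[Vec] = []
        self.projections = 0  # number of key/value projections performed

    def step(self, x: Vec) -> Vec:
        self.keys.append(to_key(x))
        self.values.append(to_value(x))
        self.projections += 1
        return attend(x, self.keys, self.values)  # query = x for simplicity


def step_without_cache(history: List[Vec]) -> Vec:
    """Recomputes keys/values for the entire history at every step."""
    keys = [to_key(x) for x in history]
    values = [to_value(x) for x in history]
    return attend(history[-1], keys, values)
```

For three tokens the cache performs 3 projections instead of 1 + 2 + 3 = 6, and the gap grows quadratically with sequence length.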

context caching

saving the processed state of a prompt prefix to avoid recomputing it
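A minimal sketch of prefix caching in Python. The processed "state" is a hypothetical running checksum standing in for the per-prefix KV cache a real inference server would keep, and `PrefixCache` is an illustrative name, not a library API. A prompt that begins with a cached prefix only pays for its new suffix.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy context cache: stores the state reached after processing a prefix."""

    def __init__(self) -> None:
        self.store: Dict[Tuple[int, ...], int] = {}
        self.tokens_processed = 0  # counts the (expensive) per-token work

    def _advance(self, state: int, token: int) -> int:
        # Stand-in for running one token through the model.
        self.tokens_processed += 1
        return (state * 31 + token) % 1_000_003

    def cache_prefix(self, prefix: List[int]) -> None:
        state = 0
        for tok in prefix:
            state = self._advance(state, tok)
        self.store[tuple(prefix)] = state

    def run(self, prompt: List[int]) -> int:
        # Look up the longest cached prefix of this prompt.
        start, state = 0, 0
        for n in range(len(prompt), 0, -1):
            key = tuple(prompt[:n])
            if key in self.store:
                start, state = n, self.store[key]
                break
        # Only the uncached suffix is processed.
        for tok in prompt[start:]:
            state = self._advance(state, tok)
        return state
```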

quantization

reducing the precision of model weights (e.g., to 4-bit) to save memory
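A sketch of one common scheme, symmetric per-tensor 4-bit quantization, in Python. Real libraries typically use per-channel or per-group scales and dedicated kernels, and the function names here are illustrative. Each weight becomes a signed integer in [-8, 7] plus one shared float scale, and two such values fit in one byte.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed 4-bit integers sharing a single scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # [-max_abs, max_abs] -> [-7, 7]
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


def pack(q: List[int]) -> bytes:
    """Store two 4-bit values per byte as two's-complement nibbles."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    return bytes(((a & 0xF) << 4) | (b & 0xF) for a, b in zip(q[0::2], q[1::2]))


def unpack(data: bytes, n: int) -> List[int]:
    out: List[int] = []
    for byte in data:
        for nib in (byte >> 4, byte & 0xF):
            out.append(nib - 16 if nib >= 8 else nib)  # restore the sign
    return out[:n]
```

Five weights stored as 32-bit floats take 20 bytes; packed as 4-bit values they take 3 bytes plus the scale, at the cost of a rounding error of at most half a scale step per weight.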

LoRA

Low-Rank Adaptation; fine-tuning by freezing the original weights and training small low-rank matrices added to them
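A minimal sketch of a LoRA-adapted linear layer in plain Python, following the usual formulation y = Wx + (alpha/r)·B(Ax): W stays frozen, and only the small matrices A (r × d_in) and B (d_out × r) would be trained. The training loop is omitted; the initialization shown (random A, zero B) is the common choice, so the adapted layer starts out identical to the original.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """A frozen d_out x d_in weight W plus a trainable rank-r update B @ A."""

    def __init__(self, W: Matrix, r: int, alpha: float = 1.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: never updated during fine-tuning
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]  # zero init: output equals W x
        self.scale = alpha / r

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: x -> r -> d_out
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a 64 × 64 layer with rank 4, that is 512 trainable values instead of 4,096.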

distillation

training a smaller 'student' model to mimic a larger 'teacher' model
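One common distillation objective, sketched in Python under the assumption that both models output logits over the same classes: the student is trained to match the teacher's temperature-softened probability distribution, measured with KL divergence and scaled by T² so gradient magnitudes stay comparable across temperatures. In practice this term is often mixed with ordinary cross-entropy on the true labels.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```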

Mixture of Experts

using multiple specialized sub-networks (experts), with a router that sends each token to only a few of them
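A minimal sketch of top-k routing for one token, in Python. The linear gate and the toy experts are assumptions for illustration (real designs differ in gating, load balancing and capacity limits); the point is that the router scores every expert, but only the k best-scoring experts actually run, and their outputs are mixed with softmax weights.

```python
import math
from typing import Callable, List

Vec = List[float]


def moe_layer(x: Vec, gate_weights: List[Vec],
              experts: List[Callable[[Vec], Vec]], k: int = 2) -> Vec:
    """Top-k routed Mixture of Experts for a single token vector x.

    Assumes each expert returns a vector of the same length as x.
    """
    # Router: one linear score per expert.
    scores = [sum(w * xi for w, xi in zip(g, x)) for g in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected experts only.
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    # Only the k selected experts are evaluated for this token.
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        out = [o + (exps[i] / total) * yi for o, yi in zip(out, y)]
    return out
```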
