speculative decoding

speculative decoding

/ˈspekjələtɪv diˈkoʊdɪŋ/

Model Optimization

using a small model to draft tokens for verification by a large model

Speculative decoding doubled the inference speed without losing quality.

Origin: Latin speculari to spy out + decoding