
speculative decoding
/ˈspekjələtɪv diˈkoʊdɪŋ/
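an inference technique in which a small draft model proposes several tokens ahead and a large target model verifies them in a single parallel pass, keeping the ones it accepts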
“Speculative decoding doubled the inference speed without losing quality.”
Origin: Latin speculari "to spy out" + decoding
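
A minimal sketch of the idea in Python. It assumes the simplified greedy variant, in which a drafted token is kept only if it matches the target model's own top choice, rather than the rejection-sampling scheme used when sampling. The names `speculative_decode`, `target`, and `draft` are hypothetical, and both "models" are plain functions standing in for neural networks.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy
# next token. Real systems use neural networks here.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target
    keeps the longest prefix it agrees with, then adds one token of its own."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. Draft k tokens cheaply with the small model.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. Verify. A real system scores all k positions in one batched
        #    forward pass of the target; here we call it position by position.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                # First mismatch: keep the target's token and discard the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft token was accepted: the target adds a bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    # The last round may overshoot, so trim to exactly max_new new tokens.
    return seq[:len(prompt) + max_new]
```

Because every emitted token is either confirmed or supplied by the target, the output is identical to decoding with the target alone. The speedup comes from rounds where the draft is right: one target pass can then yield up to k + 1 tokens.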