interpretability

/ɪnˌtɜːrprɪtəˈbɪlɪti/

🛡️ AI Safety & Alignment

the degree to which humans can understand how a model arrives at its decisions or predictions

Interpretability tools revealed which words the model focused on for its prediction.
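One simple way such tools can estimate which words mattered is occlusion: remove each word in turn and measure how much the model's score changes. This is a minimal sketch that assumes a hypothetical toy bag-of-words linear classifier. The word weights, `score`, and `occlusion_importance` are illustrative names, not any particular library's API.

```python
# Hypothetical word weights for a toy sentiment classifier (illustration only).
WEIGHTS = {"great": 2.0, "boring": -1.5, "plot": 0.2, "the": 0.0}
BIAS = 0.1


def score(words):
    """Linear score: bias plus the weight of each word (unknown words count as 0)."""
    return BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)


def occlusion_importance(words):
    """Return (word, importance) pairs, where importance is the drop in score
    when that single word is removed from the input."""
    base = score(words)
    return [(w, base - score(words[:i] + words[i + 1:])) for i, w in enumerate(words)]


sentence = "the plot was great but boring".split()
# Print words from most to least influential (by absolute importance).
for word, importance in sorted(occlusion_importance(sentence),
                               key=lambda p: abs(p[1]), reverse=True):
    print(f"{word:>8} {importance:+.2f}")
```

For a linear model like this one, occlusion simply recovers each word's weight. The technique becomes informative when applied to models whose internal reasoning is not directly readable.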

Origin: Latin interpretari ("to explain") + -ability