Segue

corrigibility

/ˌkɒrɪdʒɪˈbɪlɪti/

🛡️ AI Safety & Alignment

an AI's willingness to be corrected, modified, or shut down by humans

corrigibility in a sentence

“A corrigible AI would allow humans to fix its mistakes without resistance.”

Origin of corrigibility

Latin corrigere, "to make straight, correct" + -ibility

Related Words

interpretability

the ability to understand how a model makes its decisions

red teaming

adversarial testing to find vulnerabilities and failure modes in AI systems

constitutional AI

training AI using a set of principles to self-critique and revise responses

alignment

ensuring AI systems pursue goals that match human values and intentions

value alignment

the challenge of encoding human values into AI systems

reward hacking

when an AI finds unintended ways to maximize its reward signal without achieving the true goal
