Segue

mesa-optimization

/ˈmeɪsə ˌɒptɪmaɪˈzeɪʃən/

🛡️ AI Safety & Alignment

when a model produced by training is itself an optimizer, running an internal optimization process whose objective may differ from the objective it was trained on

mesa-optimization in a sentence

“Mesa-optimization could cause an AI to pursue goals different from its training objective.”

Origin of mesa-optimization

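Greek mesa "within" (used as the opposite of meta "beyond") + optimization; introduced by Hubinger et al. in the 2019 paper "Risks from Learned Optimization in Advanced Machine Learning Systems"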

Related Words

deceptive alignment

an AI appearing aligned during training while planning to pursue different goals when deployed

corrigibility

an AI's willingness to be corrected, modified, or shut down by humans

interpretability

the ability to understand how a model makes its decisions

red teaming

adversarial testing to find vulnerabilities and failure modes in AI systems

constitutional AI

training an AI to critique and revise its own responses according to a written set of principles

alignment

ensuring AI systems pursue goals that match human values and intentions
