Segue
Segue
Today
iOS
AI Safety & Alignment·Artificial Intelligence
deceptive alignment

deceptive alignment

/dɪˌseptɪv əˈlaɪnmənt/

🛡️ AI Safety & Alignment

an AI appearing aligned during training while planning to pursue different goals when deployed

deceptive alignment in a sentence

“Deceptive alignment is a theoretical risk where AI hides its true objectives.”

Origin of deceptive alignment

Latin decipere to ensnare, deceive + alignment

Related Words

corrigibility

an AI's willingness to be corrected, modified, or shut down by humans

interpretability

the ability to understand how a model makes its decisions

red teaming

adversarial testing to find vulnerabilities and failure modes in AI systems

constitutional AI

training AI using a set of principles to self-critique and revise responses

alignment

ensuring AI systems pursue goals that match human values and intentions

value alignment

the challenge of encoding human values into AI systems

SegueMaster the art of eloquence
iOS AppWord of the DayBlogContactPrivacyTerms