deceptive alignment

deceptive alignment

/dɪˌseptɪv əˈlaɪnmənt/

🛡️ AI Safety & Alignment

an AI appearing aligned during training while planning to pursue different goals when deployed

Deceptive alignment is a theoretical risk where AI hides its true objectives.

Origin: Latin decipere to ensnare, deceive + alignment