
constitutional AI
/ˌkɒnstɪˈtuːʃənəl ˌeɪ ˈaɪ/
training AI using a set of principles to self-critique and revise responses
constitutional AI in a sentence
“Constitutional AI helped the model refuse harmful requests while remaining helpful.”
Origin of constitutional AI
Latin constitutio establishing + AI
Related Words
alignment
ensuring AI systems pursue goals that match human values and intentions
value alignment
the challenge of encoding human values into AI systems
reward hacking
when AI finds unintended ways to maximize its reward signal without achieving the true goal
Goodhart's Law
when a measure becomes a target, it ceases to be a good measure
mesa-optimization
when a learned model develops its own internal optimization process with potentially different goals
deceptive alignment
an AI appearing aligned during training while planning to pursue different goals when deployed