
value alignment
/ˈvæljuː əˌlaɪnmənt/
the challenge of encoding human values into AI systems
value alignment in a sentence
“Value alignment is difficult because human values are complex and context-dependent.”
Origin of value alignment
Latin valere to be strong + alignment
Related Words
reward hacking
when AI finds unintended ways to maximize its reward signal without achieving the true goal
Goodhart's Law
when a measure becomes a target, it ceases to be a good measure
mesa-optimization
when a learned model develops its own internal optimization process with potentially different goals
deceptive alignment
an AI appearing aligned during training while planning to pursue different goals when deployed
corrigibility
an AI's willingness to be corrected, modified, or shut down by humans
interpretability
the ability to understand how a model makes its decisions