
RLHF
/ˌɑːr el eɪtʃ ˈef/
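Reinforcement Learning from Human Feedback: a method for fine-tuning AI models in which human preference judgments serve as the training signal, typically via a learned reward model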
RLHF in a sentence
“RLHF helped the model produce more helpful and harmless responses.”
Origin of RLHF
Initialism of "reinforcement learning from human feedback": reinforcement (ultimately from Latin re- + fortis, "strong") + learning + human (Latin humanus) + feedback
Related Words
guardrails
Constraints preventing AI from producing harmful outputs
agent
An AI system that can take actions autonomously to achieve goals
tool use
An AI model's ability to invoke external functions or APIs
agentic workflow
A multi-step AI process that iterates and self-corrects
synthetic data
Artificially generated data used for training AI models
distillation
Training a smaller model to mimic a larger one