RLHF

/ˌɑːr el eɪtʃ ˈef/

Reinforcement Learning from Human Feedback: a method for fine-tuning AI models in which human judgments of model outputs supply the training signal

RLHF in a sentence

“RLHF helped the model produce more helpful and harmless responses.”

Initialism of reinforcement (Latin re- + fortis) + learning + human (Latin humanus) + feedback
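One common way human feedback enters training is through a reward model fitted to pairwise preferences: annotators pick the better of two responses, and the model is penalized when it scores the rejected response higher. The sketch below shows only that pairwise loss, in plain Python. The function name and the example scores are illustrative assumptions, not taken from any particular library or system.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Small when the human-preferred response scores higher, large otherwise.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward-model scores for two responses to the same prompt.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees_with_human < disagrees_with_human)
```

In a full pipeline this loss would be averaged over many labeled comparisons, and the resulting reward model would then score outputs during a reinforcement learning step.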

Related Words

guardrails

Constraints preventing AI from producing harmful outputs

agent

An AI system that can take actions autonomously to achieve goals

tool use

AI capability to invoke external functions or APIs

agentic workflow

A multi-step AI process that iterates and self-corrects

synthetic data

Artificially generated data used for training AI models

distillation

Training a smaller model to mimic a larger one
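As a rough illustration of "mimic" in the distillation entry, one widely used objective compares the two models' output distributions after softening them with a temperature. The sketch below is a minimal plain-Python version under that assumption. The function names, the temperature value, and the logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the student's."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(
        p * math.log(p / q) for p, q in zip(teacher, student) if p > 0
    )


# Hypothetical logits over three classes.
teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]
print(distillation_loss(teacher_logits, close_student)
      < distillation_loss(teacher_logits, far_student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge.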