RLHF – Reinforcement Learning from Human Feedback
Reinforcement Learning (RL) is the science of decision making: learning the behavior in an environment that obtains maximum reward. RLHF applies RL methods to directly optimize a language model, using "human feedback" as the measure of performance and quality. RLHF's most recent success story is its use in ChatGPT.
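To make that loop concrete, here is a minimal, purely illustrative Python sketch: a "policy" picks among three canned responses, a hard-coded score stands in for aggregated human feedback, and a REINFORCE-style update shifts the policy toward the preferred output. The responses, reward values, and hyperparameters are all invented for illustration; real RLHF trains a learned reward model from human preference data and optimizes a full language model against it.

```python
import numpy as np

# Toy sketch of the core RLHF idea: treat the model as a policy, treat
# human preference scores as the reward, and nudge the policy toward
# higher-reward outputs. All values below are illustrative assumptions,
# not the actual ChatGPT training setup.

responses = ["curt one-liner", "detailed, helpful answer", "off-topic rambling"]

# Stand-in for aggregated human feedback: raters prefer the detailed answer.
human_reward = np.array([0.2, 1.0, -0.5])

theta = np.zeros(len(responses))  # policy logits over candidate responses
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

rng = np.random.default_rng(0)
for step in range(500):
    probs = softmax(theta)
    a = rng.choice(len(responses), p=probs)  # policy samples a response
    r = human_reward[a]                      # "human" scores it
    # REINFORCE update: raise the log-probability of the sampled action
    # in proportion to the reward it received.
    grad = -probs
    grad[a] += 1.0
    theta += lr * r * grad

print({resp: round(p, 3) for resp, p in zip(responses, softmax(theta))})
```

After a few hundred updates, nearly all probability mass lands on the human-preferred response, which is the essence of optimizing a model against human feedback.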
ChatGPT, like InstructGPT, is trained to follow instructions in a prompt and provide detailed responses. The difference between the two models is that ChatGPT incorporates additional interaction with human agents during training, which makes it better at aligning with human preferences.