2 Comments

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2307.15217


Is RLHF More Difficult than Standard RL?

https://arxiv.org/pdf/2306.14111.pdf

“This paper theoretically proves that, for a wide range of preference models, we can solve preference-based RL directly using existing algorithms and techniques for reward-based RL, with small or no extra costs. Specifically, (1) for preferences that are drawn from reward-based probabilistic models, we reduce the problem to robust reward-based RL that can tolerate small errors in rewards; (2) for general arbitrary preferences where the objective is to find the von Neumann winner, we reduce the problem to multiagent reward-based RL which finds Nash equilibria for factored Markov games under a restricted set of policies.”
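To make reduction (1) concrete, here is a minimal toy sketch, assuming the "reward-based probabilistic model" is a Bradley-Terry model (my assumption; the paper covers a broader class, and this is not the paper's algorithm). It recovers per-item reward estimates from simulated pairwise preferences by maximizing the Bradley-Terry log-likelihood; the small estimation error left over is what the robust reward-based RL step would then have to tolerate. All names and numbers below are made up for illustration.

# Toy sketch: estimate rewards from pairwise preferences (assumed Bradley-Terry model),
# then the noisy estimates could be handed to any ordinary reward-based RL algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Five state-action pairs with unknown "true" rewards (hypothetical values).
true_rewards = np.array([0.1, 0.4, 0.9, 0.2, 0.6])
n_items = len(true_rewards)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Simulate preference data: P(i preferred over j) = sigmoid(r_i - r_j).
pairs, labels = [], []
for _ in range(2000):
    i, j = rng.choice(n_items, size=2, replace=False)
    pref_i = rng.random() < sigmoid(true_rewards[i] - true_rewards[j])
    pairs.append((i, j))
    labels.append(1.0 if pref_i else 0.0)

# Fit reward estimates by gradient ascent on the Bradley-Terry log-likelihood.
r_hat = np.zeros(n_items)
lr = 0.5
for _ in range(200):
    grad = np.zeros(n_items)
    for (i, j), y in zip(pairs, labels):
        p = sigmoid(r_hat[i] - r_hat[j])
        grad[i] += y - p
        grad[j] -= y - p
    r_hat += lr * grad / len(pairs)

# Rewards are only identifiable up to an additive constant; center both for comparison.
r_hat -= r_hat.mean()
print("estimated:", np.round(r_hat, 2))
print("true     :", np.round(true_rewards - true_rewards.mean(), 2))

The estimates come out close to the true (centered) rewards but not exact, which is why the reduction targets robust reward-based RL rather than assuming exact rewards.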
