Sycophancy is one form of RLHF induced reward hacking, but reasoning training (R...

Sycophancy is one form of RLHF induced reward hacking, but reasoning training (RLVR) can also induce other forms of reward hacking. OpenAIs models are particularly affected. See https://www.lesswrong.com/posts/rKC4xJFkxm6cNq4i9/reward-hac...