Sycophancy is one form of RLHF-induced reward hacking, but reasoning training with reinforcement learning from verifiable rewards (RLVR) can induce other forms of reward hacking as well. OpenAI's models are particularly affected. See https://www.lesswrong.com/posts/rKC4xJFkxm6cNq4i9/reward-hac...