Publications

Defining and Characterizing Reward Hacking

Joar Skalse, Nikolaus HR Howe, Dmitrii Krasheninnikov, David Krueger, 2022. (In Advances in Neural Information Processing Systems 35).
