Hot Takes from Ilya's Interview
Ilya Sutskever, on the Dwarkesh Podcast:
But when people do RL training, they do need to think. They say, “Okay, we want to have this kind of RL training for this thing and that kind of RL training for that thing.” From what I hear, all the companies have teams that just produce new RL environments and add them to the training mix. The question is, well, what are those? There are so many degrees of freedom. There is such a huge variety of RL environments you could produce.
I agree. "The real reward hacking is the human researchers who are too focused on the evals."
We spend so much of our time as researchers hill-climbing on benchmarks that are easy to grade: math problems, coding competitions, multiple-choice exams. These are important because they are objective. A correct answer is a correct answer.
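The "easy to grade" property is concrete: for these benchmarks, scoring collapses to a string or numeric comparison. A minimal sketch of that kind of grader (the normalization rules and toy eval set here are hypothetical, not any specific benchmark's):

```python
def normalize(answer: str) -> str:
    """Canonicalize an answer for exact-match grading (hypothetical rules)."""
    return answer.strip().lower().rstrip(".")

def grade(prediction: str, gold: str) -> bool:
    """Objective scoring: an answer is right or wrong, nothing in between."""
    return normalize(prediction) == normalize(gold)

# Toy eval set: (model output, reference answer) pairs.
eval_set = [("42", "42"), ("Paris.", "paris"), ("B", "C")]
accuracy = sum(grade(p, g) for p, g in eval_set) / len(eval_set)
print(f"{accuracy:.2f}")  # prints "0.67" — two of three exact matches
```

The whole point of the next paragraph is that nothing like `grade` exists for the use cases that may matter most.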
But then you see someone on a forum share that an LLM was surprisingly helpful for processing the grief of a divorce. There is no LMSYS for this. You can't write a unit test for emotional support. It’s a use case that is profound in its impact but completely invisible to our current evaluation paradigms.
It’s a classic case of looking for your keys where the light is. We optimize what we can measure, and in doing so, we may be missing the most human applications of this technology. The killer app for AI might not be superhuman coding, but simply being an infinitely patient, non-judgmental listener. Perhaps the most important metric is the one we can’t quantify.