Senior Research Scientist @Google DeepMind

Learnings from Cursor Tab

Some notes on Cursor's "Tab RL" blog:

To use policy gradient methods to improve Tab, we defined a reward that encourages accepted suggestions while discouraging showing suggestions to the user that aren’t accepted. Let’s say we want the model to show a suggestion if its chance of being accepted is at least 25%. Then we could assign a reward of 0.75 for accepted suggestions, a reward of -0.25 for rejected suggestions...

This is a beautiful, concrete example of translating a product goal ("don't be annoying") into a mathematical objective the model can optimize for. The art of applied RL is all in the design of the reward function. This isn't just about predicting the next token; it's about teaching the model to have good judgment.
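To make the arithmetic concrete, here's a minimal sketch of that reward scheme in Python. The 0.75 / -0.25 values come straight from the excerpt; the 0 reward for not showing a suggestion is my assumption (it's the choice that makes 25% the break-even), and the names are illustrative, not Cursor's actual code.

```python
# Minimal sketch of the reward scheme described in the excerpt.
# 0.75 / -0.25 are from the quote; the 0.0 reward for suppressing a
# suggestion is my assumption, chosen so showing is worthwhile exactly
# when P(accept) >= 0.25. Names are illustrative, not Cursor's code.

ACCEPT_REWARD = 0.75    # suggestion shown and accepted
REJECT_REWARD = -0.25   # suggestion shown but rejected
NO_SHOW_REWARD = 0.0    # assumed: no suggestion shown

def reward(shown: bool, accepted: bool) -> float:
    if not shown:
        return NO_SHOW_REWARD
    return ACCEPT_REWARD if accepted else REJECT_REWARD

def expected_reward_of_showing(p_accept: float) -> float:
    """E[reward | show] = 0.75*p - 0.25*(1-p) = p - 0.25,
    positive exactly when p_accept > 0.25."""
    return ACCEPT_REWARD * p_accept + REJECT_REWARD * (1.0 - p_accept)

if __name__ == "__main__":
    for p in (0.10, 0.25, 0.40):
        print(f"P(accept)={p:.2f} -> E[reward|show]={expected_reward_of_showing(p):+.3f}")
    # P(accept)=0.10 -> E[reward|show]=-0.150
    # P(accept)=0.25 -> E[reward|show]=+0.000
    # P(accept)=0.40 -> E[reward|show]=+0.150
```

Under a policy-gradient update, rollouts where the model shows suggestions that get rejected are pushed down and confident, accepted ones are pushed up, so the 25% threshold emerges from the reward scale rather than from a hard-coded filter.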

This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes.
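A quick back-of-envelope on those two numbers (my arithmetic, not Cursor's, and it assumes both percentages are measured against the same baseline): the new model surfaces roughly the same number of accepted suggestions while showing about a fifth fewer overall.

```python
# Back-of-envelope on the quoted numbers, assuming both percentages
# are relative to the same previous-model baseline.
suggestions = 1.00 * (1 - 0.21)   # 21% fewer suggestions shown
accept_rate = 1.00 * (1 + 0.28)   # 28% higher accept rate
accepted = suggestions * accept_rate
print(f"relative accepted suggestions: {accepted:.2f}x")  # ~1.01x
```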

And here’s the payoff for the user. It's a classic "less is more" design win. The best AI assistant isn't the one that's constantly chattering; it's the one that speaks up only when it has something genuinely useful to say. This result, less noise and more signal, follows directly from that reward design. It's a sign of a product team that deeply understands the user's flow state.