Learnings from Cursor Tab
Some notes on Cursor's "Tab RL" blog:
To use policy gradient methods to improve Tab, we defined a reward that encourages accepted suggestions while discouraging showing suggestions to the user that aren’t accepted. Let’s say we want the model to show a suggestion if its chance of being accepted is at least 25%. Then we could assign a reward of 0.75 for accepted suggestions, a reward of -0.25 for rejected suggestions...
This is a beautiful, concrete example of translating a product goal ("don't be annoying") into a mathematical objective the model can optimize for. The art of applied RL is all in the design of the reward function. This isn't just about predicting the next token; it's about teaching the model to have good judgment.
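To make the thresholding explicit: with a reward of 0.75 for an accept and -0.25 for a reject, the expected reward of showing a suggestion is 0.75·p - 0.25·(1-p), which crosses zero exactly at p = 0.25, so a policy maximizing this reward learns to show suggestions only above that threshold. Here is a minimal Python sketch of that arithmetic; the names and structure are illustrative assumptions, not Cursor's actual implementation:

```python
ACCEPT_THRESHOLD = 0.25  # target: show a suggestion only if P(accept) >= 25%

def suggestion_reward(shown: bool, accepted: bool) -> float:
    """Reward for a single suggestion decision, per the scheme quoted above."""
    if not shown:
        return 0.0  # staying quiet is neutral
    return 0.75 if accepted else -0.25

def expected_reward(p_accept: float) -> float:
    """Expected reward of showing a suggestion given an acceptance probability."""
    return p_accept * 0.75 + (1.0 - p_accept) * (-0.25)

# Break-even check: expected reward is negative below 0.25, zero at 0.25, positive above.
for p in (0.10, 0.25, 0.50):
    print(f"P(accept)={p:.2f} -> expected reward {expected_reward(p):+.3f}")
```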
This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes.
And here’s the payoff for the user. It's a classic "less is more" design win. The best AI assistant isn't the one that's constantly chattering; it's the one that speaks up only when it has something truly useful to say. This result, less noise and more signal, follows directly from the reward design above: the model is explicitly penalized for suggestions nobody accepts. It's the mark of a product team that deeply understands the user's flow state.