We Are Not Losing Our Voice to AI
People are scared that AI is making the internet flat and boring. They worry it's creating a sea
Hot Takes from Ilya's Interview
Ilya Sutskever, on Dwarkesh Podcast:
But when people do RL training, they do need to think. They say, “Okay, we
The Art of the Productive Failure
I'm starting to believe the most valuable runs are the ones that fail catastrophically. The runs where the
The Frontend Verification Gap for AI Agents
Agentic coding loops are a marvel for backend work. An LLM can write some code, run a test to verify
2025 Thoughts
✦ September
Learnings from Cursor Tab
Cursor's "Tab RL" blog:
To use policy gradient methods to improve Tab, we defined a reward
Confidence != Wisdom
The single most dangerous design flaw in today's language models isn't that they hallucinate; it'
LLM's Data
We obsess over model architecture and FLOPS, but the most important component of any new language model is the one
The "AI Button"
The new trend of every app getting an "AI button" feels like a solution in search of a
The Browser Window
There seems to be a strange desperation in the air with AI companies trying to buy web browsers. It’s
Benchmark Matters
A recent realization I've had is that the most consequential design document in modern AI research isn't the
Type 1 / 2 Hard Problems
An underdeveloped meta-skill in AI research is distinguishing between "Type 1 Hard" and "Type 2 Hard"
Keynote / Experiments
It's that time of year again—keynote season. We're about to see a firehose of new