Senior Research Scientist @Google DeepMind

Confidence != Wisdom

The single most dangerous design flaw in today's language models isn't that they hallucinate; it's the unearned confidence with which they do it.

The frontier of alignment research isn't just about making models more truthful. It's about making them more humble. An AI that can reliably say, "I'm only 40% sure about this; you should double-check," is far more useful and trustworthy than one that guesses with perfect conviction. We're building systems that are great at projecting confidence; the real challenge is building systems capable of demonstrating wisdom.