Senior Research Scientist @Google DeepMind

The Frontend Verification Gap for AI Agents

Agentic coding loops are a marvel for backend work. An LLM can write some code, run a test to verify its work, see the result, and iterate. It's a clean, closed loop of logical feedback.
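
To make that loop concrete, here is a minimal sketch of it. Everything in it is hypothetical: `agent_loop`, `propose`, and `run_tests` are illustrative names, and `propose` stands in for an LLM call by replaying two canned candidates. The point is only the shape of the loop: the test result is a clear signal the agent can act on.

```python
from typing import Callable, Optional


def agent_loop(
    propose: Callable[[Optional[str]], str],
    run_tests: Callable[[str], Optional[str]],
    max_iters: int = 5,
) -> Optional[str]:
    """Return the first candidate that passes, or None if the budget runs out."""
    feedback: Optional[str] = None
    for _ in range(max_iters):
        candidate = propose(feedback)    # hypothetical stand-in for an LLM call
        feedback = run_tests(candidate)  # None means every test passed
        if feedback is None:
            return candidate
    return None


# Toy stand-ins (assumptions for illustration): a buggy attempt, then a fix.
_candidates = iter([
    "def add(a, b): return a - b",
    "def add(a, b): return a + b",
])


def propose(feedback: Optional[str]) -> str:
    return next(_candidates)


def run_tests(source: str) -> Optional[str]:
    namespace: dict = {}
    exec(source, namespace)  # acceptable for a toy; never do this with untrusted code
    if namespace["add"](2, 3) != 5:
        return "add(2, 3) did not return 5"
    return None


result = agent_loop(propose, run_tests)
```

The first candidate fails, the failure message becomes feedback, and the second candidate passes. Nothing in the loop requires judgment: pass or fail is all the agent needs.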

But for the frontend? It’s a mess.

This is the real hurdle for AI code generation. How does an agent truly verify a UI change? It can run a test to confirm a button exists in the DOM, but can it tell if the padding is off? If the color clashes with the brand palette? If the new animation feels janky? We have decades of practice in verifying logical correctness with automated tests. We have almost no practice in programmatically verifying aesthetic correctness.
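
A small sketch of the problem, using only Python's standard `html.parser`. The two snippets and the `button_exists` helper are made up for illustration; a real agent would more likely drive a browser. Both versions of the button satisfy the same existence check, even though one of them has padding and a color that a designer would presumably reject on sight.

```python
from html.parser import HTMLParser


class ButtonFinder(HTMLParser):
    """Collects the inline style of every <button> start tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.styles: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self.styles.append(dict(attrs).get("style") or "")


def button_exists(html: str) -> bool:
    finder = ButtonFinder()
    finder.feed(html)
    return len(finder.styles) > 0


# Hypothetical before/after markup: same structure, very different look.
intended = '<button style="padding: 12px 16px; background: #1a73e8">Save</button>'
broken = '<button style="padding: 1px 40px; background: #ff00ff">Save</button>'

both_pass = button_exists(intended) and button_exists(broken)
```

`both_pass` is true: the test cannot tell the two apart. One could add assertions on specific style values, but only for rules someone has already written down as numbers, and that still says nothing about whether the result looks right.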

It’s the difference between a system that can pass a unit test and one that can pass a design review. Until we can bridge that perception gap, AI agents will remain brilliant backend engineers who are frustratingly poor product builders.