June 22, 2026

Observability: Role of Evals, Benchmarks & Data in Frontier AI | Alex Ratner from Snorkel AI

Topic:

The excitement around agentic AI is real, backed by quantitative progress on model cards and genuine leaps in capability. But our ability to measure AI has been outpaced by our ability to develop it, and closing this evaluation gap is one of the most important problems facing the field. More enduring benchmarks are needed to advance the next vectors of capability and chart the path to reliable agents.

In this talk, Snorkel AI Co-Founder and CEO Alex Ratner will share insights from major research and benchmark collaborations on agentic coding and continual learning, along with practical tips from working with global frontier labs and leading academics. He’ll focus on three dimensions where today’s models most often break down, and where the next generation of benchmarks will need to deliver real signal: environment complexity (how dynamic and rich the operating world is), autonomy horizon (how far an agent can act independently), and output complexity (how sophisticated and verifiable the deliverable is).

To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy