Agentic Observability – Making LLM Apps Debuggable, Trustworthy, and Scalable

As LLM applications evolve into multi-agent systems that power complex decision-making workflows, the ability to observe and debug their behavior becomes a core engineering challenge. These systems are dynamic, non-deterministic, and increasingly reliant on external tools and APIs, which makes traditional monitoring approaches insufficient. At Fiddler, we have worked with enterprise and federal teams deploying LLMs at scale, and we have consistently seen that the absence of effective observability creates blind spots that slow iteration and introduce risk.

In this talk, we will introduce Agentic Observability, a set of techniques and infrastructure for monitoring production LLM systems. We will walk through how we trace agent reasoning and tool usage in structured form, apply Fast Trust Models to evaluate output quality beyond token-level accuracy, and monitor shifts in behavior using statistical and embedding-based methods. We will also share how we enable integration testing for agent workflows by simulating decision paths and validating semantic intent, all while operating under the scale and latency constraints of modern AI stacks.

This work bridges AI science, platform engineering, and real-world GenAI deployment. We will highlight engineering lessons learned from high-scale environments and show how these observability tools help teams move faster, catch failures earlier, and build AI systems that can be trusted in production.
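To make the structured-tracing idea concrete, below is a minimal sketch of how an agent's reasoning steps and tool calls might be captured as structured spans. The `AgentSpan` dataclass and `emit` helper are illustrative names invented for this example, not Fiddler's actual API; a production system would ship spans to an observability backend rather than print them.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Optional

@dataclass
class AgentSpan:
    """One step of an agent run: a planning step, LLM call, or tool call."""
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_id: Optional[str] = None
    kind: str = "llm"              # e.g. "plan", "llm", "tool"
    name: str = ""
    inputs: dict[str, Any] = field(default_factory=dict)
    outputs: dict[str, Any] = field(default_factory=dict)
    started_at: float = field(default_factory=time.time)
    ended_at: Optional[float] = None

    def finish(self, **outputs: Any) -> None:
        self.outputs = outputs
        self.ended_at = time.time()

def emit(span: AgentSpan) -> None:
    # Stand-in for exporting to an observability backend:
    # log each span as one structured JSON line.
    print(json.dumps(asdict(span)))

# Example: record a tool call nested under a planning step.
trace_id = uuid.uuid4().hex
plan = AgentSpan(trace_id=trace_id, kind="plan", name="choose_tool",
                 inputs={"query": "What was Q3 revenue?"})
plan.finish(decision="call sql_tool")
emit(plan)

tool = AgentSpan(trace_id=trace_id, parent_id=plan.span_id, kind="tool",
                 name="sql_tool", inputs={"sql": "SELECT SUM(revenue) ..."})
tool.finish(rows=1, value=1.2e7)
emit(tool)
```

Because every span carries a `trace_id` and `parent_id`, the full decision path of an agent run can be reconstructed as a tree and queried after the fact.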
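Similarly, one simple instance of embedding-based behavior monitoring is to compare the embedding distribution of live traffic against a baseline window. The sketch below uses cosine distance between window means as the drift statistic; this is a common choice but not necessarily the statistic Fiddler uses, and the threshold and synthetic data are purely illustrative.

```python
import numpy as np

def drift_score(baseline: np.ndarray, live: np.ndarray) -> float:
    """Cosine distance between the mean embedding of a baseline window
    and a live window: near 0 means similar behavior, larger values
    mean live traffic has shifted away from the baseline."""
    b = baseline.mean(axis=0)
    l = live.mean(axis=0)
    cos = np.dot(b, l) / (np.linalg.norm(b) * np.linalg.norm(l))
    return float(1.0 - cos)

# Synthetic embeddings: all windows share a base direction, and the
# "shifted" window drifts toward a second direction.
rng = np.random.default_rng(0)
base = rng.normal(size=384)
baseline = base + rng.normal(0.0, 0.3, size=(500, 384))
live_ok = base + rng.normal(0.0, 0.3, size=(500, 384))
live_shifted = base + 0.5 * rng.normal(size=384) \
               + rng.normal(0.0, 0.3, size=(500, 384))

print(f"no drift: {drift_score(baseline, live_ok):.3f}")
print(f"drift:    {drift_score(baseline, live_shifted):.3f}")

ALERT_THRESHOLD = 0.05  # illustrative; tune per workload
if drift_score(baseline, live_shifted) > ALERT_THRESHOLD:
    print("behavioral drift detected -> flag for review")
```

In practice the same comparison would run over sliding windows of prompt or response embeddings, with statistical tests layered on top to control false alarms.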
