At Expedia Group, we are building our on-road experience, which includes a common runtime compute platform with a target scale of more than 15,000 applications running on Kubernetes across a fleet of tens of thousands of nodes.
In this talk, we will present our chaos engineering platform, a part of our platform on-road experience, which aims to enable thousands of engineers to run chaos experiments. We will touch upon the importance of a great developer experience, scaling the platform through integrations with continuous delivery mechanisms, and operational aspects such as monitoring and runbooks. We will also share what we learned from promoting the platform through GameDays, bite-size videos, and success stories. Finally, we will demonstrate our recent work on closing the feedback loop between reliability best practices and tools through our reliability hub.