SEPTEMBER 26, 2023

Large Language Models for Automatic Cloud Incident Management

Building reliable hyper-scale cloud services can be challenging. Incidents must be detected, analyzed, and mitigated quickly, yet these tasks still rely largely on human effort today. Recent breakthroughs in Large Language Models (LLMs) have motivated us to explore their potential for automated incident diagnosis. By leveraging LLMs, we aim to accelerate incident resolution, leading to improved service reliability and a better customer experience. For the first time, we have demonstrated the effectiveness of LLMs in improving cloud reliability. In this talk, we will share our findings, research innovations, and vision in this space.
