FEBRUARY 24, 2016

Automatic regression triaging at Facebook

TOPIC: Dev Tools and Ops, Privacy, Sustainability and Performance

@SCALE SERIES: Performance @Scale

TYPE: video

YEAR: 2016

TAGS: performance

Guilin Chen shifted focus to backend server efficiency. At Facebook’s scale, even small regressions can have major implications for site efficiency. The team pushes massive amounts of code to production every week, and catching regressions early without slowing developers down is a big challenge. After a quick overview of the Facebook release process, Guilin stepped through how the team identifies and fixes regressions using AutoTriage. The team starts by logging performance-tracking metrics for the products it cares about. Once a regression has been observed, the team uses Stack Trace Finder to map it to a candidate list of offending functions. A tool called Pushed Commit Search then locates all diffs that changed those functions, and a Diff Ranker algorithm prioritizes the diffs by their likelihood of having introduced the regression. With these steps chained together into the AutoTriage system, the team has largely automated the most tedious aspects of regression analysis.
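The chain of steps described above can be illustrated with a small sketch. This is not Facebook's implementation: the talk summary does not describe how any of the tools work internally, so every name here (`Diff`, `find_regressed_functions`, `search_pushed_diffs`, `rank_diffs`, `auto_triage`), the 10% threshold, and the ranking heuristic are assumptions chosen only to show how the three stages might feed into one another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diff:
    """Hypothetical record of a pushed code change."""
    diff_id: str
    touched_functions: frozenset
    lines_changed: int


def find_regressed_functions(before, after, threshold=0.10):
    """Stand-in for the stack-trace step (assumed logic).

    `before` and `after` map function names to a sampled cost.
    A function is flagged if it is new, or if its cost grew by
    more than `threshold` relative to the earlier measurement.
    """
    regressed = []
    for fn, new_cost in after.items():
        old_cost = before.get(fn, 0.0)
        if old_cost == 0.0:
            if new_cost > 0.0:
                regressed.append(fn)
        elif (new_cost - old_cost) / old_cost > threshold:
            regressed.append(fn)
    return sorted(regressed)


def search_pushed_diffs(diffs, functions):
    """Stand-in for the commit-search step: keep diffs that touch
    at least one of the flagged functions."""
    wanted = set(functions)
    return [d for d in diffs if d.touched_functions & wanted]


def rank_diffs(candidates, functions):
    """Stand-in for the ranking step (assumed heuristic): prefer diffs
    that touch more flagged functions, then larger diffs."""
    wanted = set(functions)

    def score(d):
        return (len(d.touched_functions & wanted), d.lines_changed)

    return sorted(candidates, key=score, reverse=True)


def auto_triage(before, after, diffs, threshold=0.10):
    """Chain the three stages into one call."""
    functions = find_regressed_functions(before, after, threshold)
    candidates = search_pushed_diffs(diffs, functions)
    return rank_diffs(candidates, functions)
```

Given per-function cost samples from before and after a push plus the list of pushed diffs, `auto_triage` returns the candidate diffs ordered from most to least suspicious under this toy heuristic. A real ranker would presumably draw on much richer signals than the two used here.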
