Machine learning at scale: FBLearner Flow
Using large-scale data efficiently for Machine Learning (ML) research is a challenge. Training and distributing hundreds of models, monitoring their performance, and sharing algorithms in a production environment all require tools that simplify the daily tasks of ML engineers. Facebook has developed a family of tools to manage the entire process of training, testing, and deploying ML models. These include FBLearner Flow and Predictor. The former is a pipeline management system that facilitates experimentation, training, and comparison of models; the latter is an inference framework that uses those models to serve real-time predictions in production. FBLearner Flow is used by more than a thousand engineers per month. Each month, FBLearner is used to train more than 600,000 models, ingesting 2.3 billion data entries per model. These models are then used in production, where they serve more than six million predictions per second and touch all of Facebook's major functionality, including ranking News Feed stories and matching users to ads.