June 22, 2026

How Meta Scaled AI Training Storage via Data Normalization | Sarang Masti and Weiran Liu from Meta

Topic:

Training data for Meta’s recommendation systems was entirely stored in Data Warehouse, structured as relational tables where each row captures labels and snapshotted features at the point of recommendation.

New modeling techniques, such as learning from user sequences and multi-modality, have led to a 10-100x increase in feature size, making training data increasingly cost-prohibitive to store because of heavy duplication. The same user’s features are stored again for every recommendation request, and features of highly popular content can be duplicated over a million times.

We present a co-design of data and infrastructure that addresses this scaling challenge. By moving features out of training samples into high-performance indexing storage and implementing pushdown optimizations that are aware of model access patterns, we have achieved a 10x storage cost reduction for the largest feature: long user sequences.
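
The idea of normalizing features out of training samples can be illustrated with a small sketch. This is not Meta's implementation: the names `Sample`, `SequenceIndex`, `seq_version`, and `read_batch` are hypothetical, the index is an in-memory dictionary standing in for real indexed storage, and the "pushdown" shown here (returning only the last `last_k` events) is just one assumed example of an access pattern a model might have.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A normalized training row: a label plus keys, with no inline sequence."""
    user_id: int
    item_id: int
    label: int
    seq_version: int  # hypothetical: which snapshot of the user's sequence was seen


class SequenceIndex:
    """Toy stand-in for indexed feature storage, keyed by (user_id, version)."""

    def __init__(self):
        self._store = {}

    def put(self, user_id, version, events):
        # Each sequence snapshot is stored once, however many samples refer to it.
        self._store[(user_id, version)] = tuple(events)

    def get(self, user_id, version, last_k=None):
        events = self._store[(user_id, version)]
        if last_k is None:
            return events
        # Assumed pushdown: hand back only the slice the model actually reads.
        return events[-last_k:] if last_k > 0 else ()

    def stored_events(self):
        return sum(len(v) for v in self._store.values())


def denormalized_events(samples, index):
    """Events that would be stored if every sample carried its full sequence."""
    return sum(len(index.get(s.user_id, s.seq_version)) for s in samples)


def read_batch(samples, index, last_k):
    """Join samples back to their sequences at read time."""
    return [
        (s.label, s.item_id, index.get(s.user_id, s.seq_version, last_k))
        for s in samples
    ]


index = SequenceIndex()
index.put(1, 0, range(1000))  # user 1 has a 1000-event history
index.put(2, 0, range(500))   # user 2 has a 500-event history

samples = [Sample(1, item, item % 2, 0) for item in range(10)]
samples += [Sample(2, item, 0, 0) for item in range(5)]

print(denormalized_events(samples, index))  # events stored with sequences inline
print(index.stored_events())                # events stored after normalization
print(read_batch(samples, index, last_k=3)[0])
```

In this toy setup, fifteen samples that would otherwise carry their full sequences inline share two stored sequences, and the reader pulls only the trailing events it asks for. The savings in a real system depend on how often sequences change between requests and on how versions are tracked, which this sketch does not model.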
