The problem of deep learning and building large scale systems for production is not just one of model training, but data preprocessing as well. At production scale, just the data loading and processing part of the system can cause significant friction and consume your engineers’ time, while still being non-performant as more and more data is used. We provide an overview of the top pain points that are normally faced in this space. With these pain points in mind, we’ve created two libraries that solve different parts of the data workflow, TorchData to make pipeline creation composable, easy to use, and flexible simplifying the path from research to production, and TorchArrow a DataFrame library that allows for scale through the use of high performance execution runtimes built on the Arrow memory format. We’ll step through the out of the box offerings with our open-sourced TorchData and TorchArrow APIs and building blocks, and provide a real world case study that shows how we’ve made data preprocessing performant at scale within Meta. Lastly, we’ll give a peek into upcoming work as we continue to develop and share our learnings with the open source community.
- WATCH NOW
- VIEW EVENTS
- 2023
- JANUARY
- No Events
- FEBRUARY
- no events
- MARCH
- RTC @Scale 2023
- April
- no events
- May
- AI Infra @Scale
- June
- no events
- July
- Systems @Scale Summer 2023
- August
- Product @Scale 2023
- September
- Networking @Scale 2023
- Reliability @Scale 2023
- October
- Mobile @Scale 2023
- November
- Video @Scale 2023
- December
- Systems @Scale Winter 2023
- 2022
- January
- no events
- February
- RTC @Scale 2022
- March
- Systems @Scale Spring 2022
- April
- Product @Scale Spring 2022
- May
- Data @Scale Spring 2022
- June
- Systems @Scale Summer 2022
- Networking @Scale Summer 2022
- July
- no events
- August
- Reliability @Scale Summer 2022
- September
- AI @Scale 2022
- October
- no events
- November
- Networking @Scale Fall 2022
- Video @Scale Fall 2022
- December
- Systems @Scale Winter 2022
- 2021
- 2020
- January
- no events
- February
- no events
- March
- no events
- April
- no events
- May
- no events
- June
- no events
- July
- no events
- August
- Systems @Scale Remote Edition — Summer 2020
- September
- no events
- October
- no events
- November
- Performance @Scale NY 2020
- Keeping the Lights On @Scale
- AI @Scale 2020
- December
- no events
- 2019
- January
- no events
- February
- no events
- March
- no events
- April
- no events
- May
- no events
- June
- Performance @Scale 2019
- Systems @Scale Summer 2019
- July
- no events
- August
- no events
- September
- Networking @Scale California 2019
- Systems @Scale Fall 2019
- Video @Scale 2019
- October
- The @Scale Conference 2019
- November
- Fighting Abuse @Scale 2019
- Systems @Scale Tel Aviv Fall 2019
- Networking @Scale Boston 2019
- December
- no events
- 2018
- January
- Android @Scale 2018
- February
- no events
- March
- Performance @Scale 2018
- April
- Video @Scale 2018
- Fighting Abuse @Scale 2018
- May
- Networking @Scale 2018
- June
- no events
- July
- Systems @Scale Summer 2018
- August
- no events
- September
- The @Scale Conference 2018
- October
- Data @Scale Boston 2018
- November
- Mobile @Scale Tel Aviv 2018
- December
- no events
- 2017
- January
- no events
- February
- Machine Learning @Scale 2017
- Video @Scale 2017
- March
- no events
- April
- no events
- May
- Dev Tools @Scale 2017
- Networking @Scale 2017
- June
- Data @Scale 2017
- July
- no events
- August
- The @Scale Conference 2017
- September
- no events
- October
- Mobile @Scale Boston 2017
- November
- no events
- December
- no events
- 2016
- January
- Video @Scale 2016
- February
- Performance @Scale 2016
- March
- Mobile @Scale 2016
- April
- no events
- May
- Networking @Scale 2016
- June
- Data @Scale 2016
- July
- no events
- August
- The @Scale Conference 2016
- September
- no events
- October
- Boston Networking @Scale 2016
- November
- Spam Fighting 2016
- December
- no events
- 2015
- 2023
- DIVIDER
- EXPLORE TOPICS
- MACHINE LEARNING AND AI
- Data, Systems, and Networking
- MOBILE, VIDEO, AND WEB
- DEV TOOLS AND OPS, PRIVACY, SUSTAINABILITY, AND PERFORMANCE
- Fighting Abuse and Security
- DIVIDER
- Annual @Scale Conference
- Blog
- Community Forum
- Speaker Submissions
- About @Scale