In this talk we will show how we implemented a media processing pipeline to perform media inference (autodub and lipsync) at Meta scale.
We will focus on the challenges we faced from a media processing and scaling point of view, including inference latency and scheduling, voice isolation, media timing and alignment, alternate track delivery, instrumentation, and model evaluation.