Movie Gen: A Cast of Media-Generation Foundation Models

Humans communicate using a rich variety of digital media—text, images, videos, audio.

Movie Gen is a cast of media-generation foundation models that lets users generate high-quality videos from simple text inputs, personalize or edit those videos, and add audio. In human evaluations, Movie Gen establishes new state-of-the-art performance on all of these tasks compared to existing solutions.

Movie Gen builds upon Meta’s track record of foundational research in this space. Our Make-A-Scene models enabled generation of images, audio, video, and 3D animation. A second wave of work, the Llama Image foundation models, enabled higher-quality generation of images and video, as well as image editing. Movie Gen advances beyond this prior work, delivering higher-quality outputs and finer-grained control.

We have piloted Movie Gen with Hollywood creatives [3], who have found it to be a useful collaborative tool. You can read more about Movie Gen on our blog [2] and in our technical paper [4], and see examples on our website [1].

Resources

[1] https://ai.meta.com/research/movie-gen/

[2] https://ai.meta.com/blog/movie-gen-media-foundation-models-generative-ai-video/

[3] https://ai.meta.com/blog/movie-gen-video-sound-generation-blumhouse/

[4] https://arxiv.org/abs/2410.13720
