Movie Gen: A Cast of Media-Generation Foundation Models

Humans communicate using a rich variety of digital media—text, images, videos, audio.

Movie Gen is a cast of media-generation foundation models that lets users generate high-quality videos from simple text inputs, personalize or edit those videos, and add audio. In human evaluations, Movie Gen establishes new state-of-the-art performance on all of these tasks compared to existing solutions.

Movie Gen builds upon Meta’s track record of foundational research in this space. Our Make-A-Scene models enabled generation of images, audio, video, and 3D animation. A second wave of work, the Llama Image foundation models, enabled higher-quality generation of images and video, as well as image editing. Movie Gen advances beyond this prior work, delivering higher-quality outputs and finer-grained control.

We have piloted Movie Gen with Hollywood creatives [3], who have found it to be a useful collaborative tool. You can read more about Movie Gen on our blog [2] and in our technical paper [4], and see examples on our website [1].

Resources

[1] https://ai.meta.com/research/movie-gen/

[2] https://ai.meta.com/blog/movie-gen-media-foundation-models-generative-ai-video/

[3] https://ai.meta.com/blog/movie-gen-video-sound-generation-blumhouse/

[4] https://arxiv.org/abs/2410.13720
