Text-to-video AI inches closer as startup Runway announces new model

Text-to-video is the next frontier for generative AI, though current output is rudimentary. Runway says it’ll be making its new generative video model, Gen-2, available to users in ‘the coming weeks.’

AI-generated footage of “A shot following a hiker through jungle brush.” The video is not photorealistic and appears smeared and blurry.
An example video generated by Runway’s Gen-2 model. The text input prompt was “A shot following a hiker through jungle brush.”
Image: Runway

Text-to-image AI is mainstream now, but just waiting in the wings is text-to-video. The pitch for this technology is that you’ll be able to type a description and generate a corresponding video in any style you like. Current capabilities lag behind this dream, but for those tracking the tech’s progress, an announcement today by AI startup Runway of a new AI video generation model is noteworthy nonetheless.

Runway offers a web-based video editor that specializes in AI tools like background removal and pose detection. The company helped develop open-source text-to-image model Stable Diffusion and announced its first AI video editing model, Gen-1, in February.

Gen-1 focused on transforming existing video footage, letting users input a rough 3D animation or shaky smartphone clip and apply an AI-generated overlay. In the clip below, for example, footage of cardboard packaging is paired with an image of an industrial factory to produce a clip that could be used for storyboarding or pitching a more polished feature.

Gen-2, by comparison, seems more focused on generating videos from scratch, though there are lots of caveats to note. First, the demo clips shared by Runway are short, unstable, and certainly not photorealistic, and second, access is limited. Bloomberg News reports that users will have to sign up to join a waitlist for Gen-2 via Runway’s Discord, and a spokesperson for the company, Kelsey Rondenet, told The Verge that Runway will be “providing broad access in the coming weeks.”

In other words, all we have to judge Gen-2 right now is a demo reel and a handful of clips (most of which were already being advertised as part of Gen-1).

Close-up footage of an eye.
AI video generated using Gen-2 with the prompt “A close-up of an eye.”
Image: Runway
AI-generated video of “An aerial shot of a mountain landscape.”
AI-generated video using the prompt “An aerial shot of a mountain landscape.”
Image: Runway
An AI-generated video showing sunlight flickering behind a window in an urban apartment.
AI-generated video using the prompt “Sunset through a window in a New York apartment.”
Image: Runway

Still, the results are fascinating, and the prospect of text-to-video AI is certainly intoxicating, promising both new creative opportunities and new threats, such as misinformation. It’s also worth comparing Runway’s work with text-to-video research shared by behemoths like Meta and Google. The work by these companies is more advanced (their AI-generated clips are longer and more cohesive), but the gap is not as wide as those firms’ massive resources might suggest. (Runway, by comparison, is a team of just 45 people.)

In other words: startups continue to do exciting work in generative AI, including the still-nascent territory of text-to-video. Watch for more soon, AI-generated or not.