OpenAI’s latest model can generate videos – and they look decent


OpenAI, following in the footsteps of startups like Runway and tech giants like Google and Meta, is launching into video generation.

OpenAI today unveiled Sora, a generative AI model that creates video from text. Given a brief or detailed description or a still image, Sora can generate 1080p movie-like scenes with multiple characters, different types of motion and background details, OpenAI claims.

Sora can also “extend” existing video clips, doing its best to fill in the missing details.

“Sora has a deep understanding of language, allowing it to accurately interpret prompts and generate compelling characters that express vibrant emotions,” OpenAI writes in a blog post. “The model understands not only what the user asked for in the prompt, but also how those things exist in the physical world.”

Now, there’s a lot of bombast in OpenAI’s demo page for Sora – the statement above being an example. But the handpicked samples from the model do look pretty impressive, at least compared to the other text-to-video technologies we’ve seen.

For starters, Sora can generate videos in a range of styles (e.g. photorealistic, animated, black and white) up to a minute long, which is far longer than most text-to-video models can manage. And these videos maintain reasonable consistency in the sense that they don’t always succumb to what I like to call “AI weirdness,” like objects moving in physically impossible directions.

Check out this art gallery tour, all generated by Sora (ignore the grain – compression from my video-to-GIF conversion tool):

OpenAI Sora

Image credits: OpenAI

Or this animation of a flower blooming:

OpenAI Sora

Image credits: OpenAI

I will say that some of Sora’s videos with a humanoid subject (a robot standing against a cityscape, for example, or a person walking down a snowy path) have a video game quality to them, perhaps because there’s not much going on in the background. AI weirdness also manages to creep into many clips, such as cars driving in one direction then suddenly reversing, or arms melting into a duvet cover.

OpenAI Sora

Image credits: OpenAI

OpenAI – for all its superlatives – acknowledges that the model is not perfect. It writes:

“[Sora] may struggle to accurately simulate the physics of a complex scene and may not understand specific cases of cause and effect. For example, a person may bite into a cookie, but subsequently the cookie may not have a bite mark. The model may also confuse the spatial details of a prompt, such as mixing up left and right, and may have difficulty accurately describing events that unfold over time, such as following a specific camera path.”

OpenAI is positioning Sora as a research preview, revealing little about the data used to train the model (short of approximately 10,000 hours of “high quality” video) and refraining from making Sora generally available. Its rationale is the potential for abuse; OpenAI rightly points out that bad actors could misuse a model like Sora in multiple ways.

OpenAI says it is working with experts to probe the model for exploits and create tools to detect whether a video was generated by Sora. The company also says that if it chooses to integrate the model into a public-facing product, it will ensure that provenance metadata is included in the generated results.

“We will engage policymakers, educators, and artists around the world to understand their concerns and identify positive use cases for this new technology,” writes OpenAI. “Despite extensive research and testing, we cannot predict all the beneficial ways people will use our technology, nor all the ways they will abuse it. This is why we believe that learning about real-world usage is a critical part of creating and disseminating increasingly secure AI systems over time.”

