OpenAI’s Sora video generation model can also render video games


OpenAI’s new video generation model, Sora — the company’s first — can achieve truly impressive cinematic feats. But the model is even more capable than OpenAI initially let on, at least judging from a technical paper published this evening.

The paper, titled “Video Generation Models as World Simulators” and co-authored by numerous OpenAI researchers, sheds light on key aspects of Sora’s architecture — revealing, for example, that Sora can generate videos of arbitrary resolution and aspect ratio (up to 1080p). According to the paper, Sora is also capable of performing a range of image and video editing tasks, from creating looping videos to extending videos forward or backward in time to changing the background of an existing video.

But what intrigues this author most is Sora’s ability to “simulate digital worlds,” as the OpenAI co-authors put it. In one experiment, OpenAI set Sora loose on Minecraft and had it render the world — and its dynamics, including physics — while simultaneously controlling the player.


Sora controls a player in Minecraft – and renders the video game world at the same time. Note that the grain was introduced by a video to GIF conversion tool, not Sora. Image credits: OpenAI

So how is Sora able to do this? Well, as observed by Jim Fan, principal researcher at Nvidia (via Quartz), Sora is more of a “data-driven physics engine” than a creative tool. It doesn’t just generate a single photo or video; it determines the physics of each object in an environment and renders a photo or video (or an interactive 3D world, as appropriate) based on those calculations.

“These capabilities suggest that continued scaling of video models represents a promising path toward developing high-performance simulators of the physical and digital world, as well as the objects, animals, and people that live there,” write the co-authors.

Now, Sora’s usual limitations apply in the video game realm. The model can’t accurately approximate the physics of basic interactions like glass shattering. And even with interactions it can model, Sora is often inconsistent — rendering a person eating a hamburger, for example, but failing to render the bite marks.

Still, if I’m reading the paper correctly, it seems Sora could pave the way for more realistic — perhaps even photorealistic — procedurally generated games. That’s both exciting and terrifying (think of the deepfake implications, for example) — which is probably why OpenAI has chosen to gate Sora behind a very limited access program for now.

Here’s hoping we learn more soon.

