OpenAI Sora: One Step Away From The Matrix

The best text-to-video AI model is also… a world simulator?

Alberto Romero



This article is a selection from The Algorithmic Bridge, an educational project to bridge the gap between AI and people.

Yesterday, OpenAI announced the most important AI model of 2024 so far: Sora, a state-of-the-art (SOTA) text-to-video model that can generate high-quality, high-fidelity videos up to one minute long, in different aspect ratios and resolutions. Calling it SOTA is an understatement; Sora is miles ahead of anything else in the space. It’s general, scalable, and it’s also… a world simulator?

Quick digression: Sorry, Google. Gemini 1.5 was the most important release yesterday, and perhaps of 2024, but OpenAI didn’t want to give you a single ounce of the spotlight. If Jimmy Apples is to be believed, OpenAI has had Sora ready since March (what?), which would explain why they manage to be so timely in disrupting competitors’ PR moves. I’ll do a write-up about Gemini 1.5 anyway because, although it flew under the radar, we shouldn’t ignore a 10M-token context window breakthrough.

Back to Sora. This two-part article is intended for those of you who know nothing about this AI model. It’s also for those of you who watched the cascade of generated videos that flooded the X timeline but didn’t bother to read the post or the report.

In the first part (this one), I review the model and the “technical” report (it deserves to be in quotes) at a high level, avoiding jargon for the most part. Throughout the text, I interleave the best examples I’ve compiled, along with some insightful comments and hypotheses I’ve read about how Sora was trained and what we can expect in future releases.

Before you ask, OpenAI isn’t releasing Sora at this time (not even as a low-key research preview). The model is going through red-teaming and safety checks. OpenAI wants to gather feedback from “policymakers, educators and artists around the world.” They’re also working on a detection classifier to recognize Sora-made videos and on ways to prevent misinformation.

In the second part (hopefully soon), I’ll share reflections about where I think we’re going both technologically and culturally (there’s optimism but also pessimism). I hope you…