OpenAI, the company behind ChatGPT and DALL-E, has just introduced a new artificial intelligence model dedicated to video generation. Called Sora, it represents a major step forward from existing tools, but no public launch has been announced yet.
Only a few weeks ago, Google announced Lumiere, an artificial intelligence capable of producing five-second video clips of unprecedented quality. But the field is moving faster and faster, and a new AI has already surpassed Google's by a wide margin. Meet OpenAI's Sora, an artificial intelligence capable of generating highly realistic videos up to a minute long.
From a single sentence, Sora can generate a complex scene in Full HD (1920 x 1080 pixels) and even simulate multiple shots within one video. The model is based on a transformer architecture and represents visual data as “patches”, which play the same role as tokens in GPT models. The AI is not limited to text prompts; it can also start from an image or extend an existing video. The project page features a large number of generated videos, such as one of a woman walking through the streets of Tokyo.
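To make the idea of "patches" concrete, here is a minimal sketch of how a video can be cut into small blocks spanning space and time, each flattened into one vector that a transformer can treat like a token. This is an illustration only, not OpenAI's implementation: the function name `patchify` and the patch sizes are hypothetical, Sora's actual patch dimensions have not been published, and for simplicity the sketch works directly on raw pixels.

```python
import numpy as np

def patchify(video, pt=2, ph=16, pw=16):
    """Split a video array (T, H, W, C) into flattened spacetime patches.

    pt, ph, pw are hypothetical patch sizes (frames, height, width).
    """
    t, h, w, c = video.shape
    if t % pt or h % ph or w % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Break each axis into (number of patches, patch size).
    x = video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
    # Move the patch-grid axes to the front and the patch contents to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch: a sequence of token-like vectors.
    return x.reshape(-1, pt * ph * pw * c)

# A tiny dummy clip: 4 frames of 32 x 48 pixels with 3 color channels.
video = np.zeros((4, 32, 48, 3), dtype=np.float32)
tokens = patchify(video)
print(tokens.shape)  # (12, 1536)
```

With these assumed sizes, the clip becomes 12 patches of 1,536 values each. A longer or higher-resolution video simply yields a longer sequence, which is one reason a patch-based representation can handle videos of varying duration and size.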
Stunning videos but still imperfect
At this point, it's still fairly easy to tell that these videos are AI-generated. Some details expose the model's limitations, such as objects or people in the background that appear or disappear without explanation, or people in the foreground who look unnaturally large. And unsurprisingly, like other image and video generators, Sora struggles to render legible text, such as the writing on signs. Nevertheless, it represents a stunning advance in the field.
Sora is not currently open to the public. OpenAI says it is working with specialists in misinformation, hateful content and bias to make the model safer, and is developing tools to identify Sora-generated videos.