OpenAI Launches Sora, a Revolutionary Artificial Intelligence Video Tool

While the fascination with ChatGPT and generative artificial intelligence language models has not yet faded, OpenAI has just introduced a striking new video creation tool called Sora. To use it, you simply enter a description of what you want to see on screen, and artificial intelligence creates it. Some results are more successful than others, and some have a video game look that sets them apart from reality, but they are all surprising.

OpenAI CEO Sam Altman announced the launch on the social network X, which was immediately flooded with the new creations. Realistic, futuristic, outlandish, cartoon-style: the videos cover all kinds of creations made with generative artificial intelligence. Sora can generate entire videos at once or extend generated videos to make them longer.

Sora is capable of creating complex scenes with multiple characters, specific types of movement, and precise subject and background details. According to OpenAI, the model understands not only what the user asked for in the prompt, but also how those things exist in the physical world. The model has a deep understanding of language, allowing it to accurately interpret prompts and generate compelling characters that express vivid emotions, the company explains.

“Here is Sora, our video generation model,” Altman wrote. “We offer access to a limited number of creators,” he added, before asking his followers to suggest prompts for new videos, in addition to the examples already posted on the company's website.

The instructions can be more or less detailed. One of the examples offered by OpenAI responds to the following description: “An elegant woman walks down a street in Tokyo filled with warm, bright neon lighting and vibrant urban signage. She is wearing a black leather jacket, a long red dress, black boots, and a black bag. She is wearing sunglasses and red lipstick. She walks confidently and casually. The road is wet and reflective, creating a mirror effect of the colored lights. Lots of pedestrians are walking around.” The result is surprising.

Another reads: “Trailer for a film about the adventures of the 30-year-old spaceman wearing a red wool knit motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”

The model can generate a video solely from text instructions, and it can also take an existing still image and generate a video from it, animating the image content with precision and attention to detail. The model can also take an existing video and extend it or fill in missing frames.

Users can specify content and style and give instructions of all kinds. Altman has posted new videos requested by X users, showing how quickly results can be produced. Sora can also create multiple shots within a single generated video while accurately maintaining the characters and visual style.

“We are teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require interaction in the real world,” OpenAI explains in announcing the new text-to-video tool. “Sora can create videos up to a minute long while maintaining visual quality and fidelity to user instructions,” it adds.

The tool is initially available to so-called red teams. The members of these teams challenge a product or service, push it to its limits, put it to the test and look for its faults as if they were adversaries of the company. Their specific task is to assess critical areas for potential harms or risks. Among them are experts in areas such as misinformation, hateful content and bias.

OpenAI is also granting access to a number of visual artists, designers and filmmakers to gather feedback on how the model can be improved to make it more useful for creative professionals.

“We share our research progress early to collaborate with people outside of OpenAI and get their feedback, as well as to give the public an idea of the AI capabilities on the horizon,” the company explains.

Flaws to iron out

The artificial intelligence company itself acknowledges that Sora still has some very obvious flaws. It may have difficulty accurately simulating the physics of a complex scene and may not understand certain cases of cause and effect. The company gives the example of a person taking a bite out of a cookie, after which the cookie may show no bite mark.

The model may also confuse spatial details of a prompt, such as left and right, and may struggle with precise descriptions of events that unfold over time, such as following a specific camera trajectory.

Before releasing the tool to the public, OpenAI promises to take some precautions. These include taking into account the feedback from the red teams. The company is also developing tools to detect misleading content, including detectors that can tell when a video was created by Sora. It has additionally developed robust image classifiers that check the frames of every generated video to ensure they comply with its usage guidelines before the video is shown to the user.

OpenAI is also reusing the safety methods it built for its products that use DALL-E 3. For example, a text classifier checks and rejects text prompts that violate its usage guidelines, such as those that request extreme violence, sexual content, hateful imagery, celebrity likenesses, or third-party intellectual property.

“We will reach out to policymakers, educators and artists around the world to hear their concerns and identify positive use cases for this new technology. Despite extensive research and testing, we cannot predict all the beneficial ways people will use our technology, nor all the ways they will abuse it. For this reason, we believe that learning from real-world usage is a fundamental component of creating and deploying increasingly safe AI systems over time,” concludes OpenAI.

You can follow EL PAÍS technology on Facebook and X or sign up here to receive our weekly newsletter.