A first online tool can generate videos from text, following the model of Midjourney and other image-generation AIs. For now, though, the results are still patchy.
If you’re still amazed by ChatGPT’s answers or the graphics generated by Midjourney, the next level of artificial intelligence may surprise you even more: generating videos from plain text. This is what Modelscope, a tool still in its infancy, offers.
Based on the same principle as other generative AIs, Modelscope creates short videos from a “prompt”, i.e. a written instruction. One of the first ideas, from a user of the Reddit forum, was to have actor Will Smith eat spaghetti. The result, viewed more than 4 million times on Twitter, is frankly terrifying.
Another user took up the same idea, this time with actress Scarlett Johansson, who again labors through a plate of spaghetti.
In fact, there seems to be a netizen obsession with celebrities eating everything from pizza to cake. Modelscope’s obvious flaws still give these clips a very sketchy, even nightmarish look.
Many generated videos show a Shutterstock watermark, suggesting that the tool draws its raw visuals from online image and video databases.
At the moment, Modelscope can be tried via the HuggingFace platform, but it is often at full capacity. Other similar models are currently available, but none is sufficiently mature, which was also true of Midjourney and Stable Diffusion a year ago.
OpenAI, the creator of ChatGPT and Dall-E, is also working on a similar AI, one that should demonstrate far more convincingly what generative AI can do in this area.
Thomas Leroy, Journalist, BFM Business