SAN JOSE: The makers of ChatGPT are broadening the horizon of their AI-powered software with a new feature that allows users to create up to a minute of video footage from text prompts.
The AI model, named Sora, is first being made available to a limited number of creative professionals, OpenAI chief executive Sam Altman announced on the social media platform X on Thursday.
OpenAI published several demo videos, all generated entirely by artificial intelligence, on the software’s website, along with the description on which they were based.
One of them shows a woman walking through a brightly lit city street at night. The prompt asked for footage of a woman wearing a leather jacket and a red dress, on a street reminiscent of Tokyo with lots of neon signs reflected in puddles.
Other videos show mammoths walking in the snow and historic-looking footage of California during the Gold Rush.
The videos are impressively realistic at times, particularly the lighting and texture. But they are also clearly artificial, and OpenAI admits that Sora still has weaknesses.
One recurring flaw is that moving subjects tend to bend the laws of physics, and the way people and animals walk still appears unnatural.
Scale and continuity are also pain points: in one video, someone bites into a biscuit that later still appears whole, while in another, some people appear to be giants while others nearby are far smaller.
However, the sample videos suggest that using AI to generate moving images from text prompts could change video production over time.
Certain short clips, such as a birthday scene and generic lifestyle shots, appear close to being ready for use in advertising or promotional material. Some footage requires more than a passing glance to tell it is artificial.
At the same time, there are major concerns that the technology could be used to create fake videos on a large scale that would be almost indistinguishable from real footage. Several other companies have already developed software that can generate videos from text.
The developers of the technology want Sora videos to be clearly recognisable as being created by AI, and are working on ways to incorporate unique identifying features, such as watermarks, into the videos.
A group of experts is now set to assess possible security risks before the software can be widely used.
OpenAI’s announcement came just as Google also announced an update to its AI software, one that allows users to get a fast analysis of massive amounts of video or audio material.
The latest version of Gemini AI, Google’s answer to ChatGPT, was tested out with a search for “comedic moments” in a 400-page transcript of conversations from the Apollo 11 space mission to the moon.
In half a minute, Gemini version 1.5 delivered three instances of humour and could even give context on why a certain phrase was funny.
Proving its ability to understand things in their context, the software responded to an uploaded drawing of a boot by linking this to the moment when Neil Armstrong took the first step on the moon.
Since the model can process text, code and audio in addition to video, the development could make it possible to search for certain visual elements in large amounts of footage without a person having to watch it.
“When given a 44-minute silent Buster Keaton movie, the model can accurately analyse various plot points and events, and even reason about small details in the movie that could easily be missed,” Google’s AI head Demis Hassabis wrote on Thursday.
The internet giant is competing with ChatGPT inventor OpenAI, which triggered global hype surrounding AI just over a year ago.
Earlier in February Google rebranded its AI apps and services under the name Gemini. The Gemini 1.5 model will initially be available to developers and corporate customers before it is rolled out to all users. – dpa