YouTube says OpenAI training Sora with its videos would break the rules

The use of YouTube videos to train OpenAI’s text-to-video generator would be an infraction of the platform’s terms of service, YouTube chief executive officer Neal Mohan said.

In his first public remarks on the topic, Mohan said he had no firsthand knowledge of whether OpenAI had, in fact, used YouTube videos to refine its artificial intelligence-powered video creation tool, called Sora. But if that were the case, it would be a “clear violation” of YouTube’s terms of use, he said.

“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations,” Mohan said Thursday in an interview with Emily Chang, host of Bloomberg Originals.

“One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.”

There has been much public debate over what material OpenAI uses to train the AI models underlying popular content creation products such as ChatGPT and DALL-E. Sora and other generative AI tools work by sucking up all sorts of content from around the web and using that data as the foundation from which the tools can generate new content, including videos, photos, narrative text and more.

As OpenAI, Google and other companies race to develop more powerful artificial intelligence, they are looking to source as much content as possible to train their AI models and get better-quality results. Google and YouTube are units of Alphabet Inc.

OpenAI, which is backed by Microsoft Corp., didn’t immediately respond to a request for comment. OpenAI chief technology officer Mira Murati said in an interview with the Wall Street Journal last month that she wasn’t sure whether Sora was trained on user-generated videos from YouTube, Facebook and Instagram.

The Journal reported this week that OpenAI has discussed training its next-generation large language model, GPT-5, on transcriptions of public YouTube videos, citing people familiar with the matter.

Mohan said Google adheres to YouTube’s individual contracts with creators before deciding whether to use videos from the platform in training the company’s own powerful AI model, Gemini.

“Lots of creators have different sorts of licensing contracts in terms of their content on our platform,” Mohan said. Though “some portion of that YouTube corpus may be being used” to train models like Gemini, Google and YouTube ensure that using the videos as training data for Google’s AI is “in concert with whatever the terms of service or the contract that that creator has signed” beforehand, he said. – Bloomberg
