The Times article says that the company exhausted supplies of useful data in 2021 and discussed transcribing YouTube videos, podcasts, and audiobooks after blowing through other resources. By then, it had trained its models on data that included computer code from GitHub, chess move databases, and schoolwork content from Quizlet.
Google spokesperson Matt Bryant told The Verge in an email that the company has “seen unconfirmed reports” of OpenAI’s activity, adding that “both our robots.txt files and Terms of Service prohibit unauthorized scraping or downloading of YouTube content,” echoing the company’s terms of use. This week, YouTube CEO Neal Mohan said similar things about the possibility that OpenAI used YouTube to train its Sora video-generating model. Bryant said Google takes “technical and legal measures” to prevent such unauthorized use “when we have a clear legal or technical basis to do so.”