OpenAI Reportedly Used Data From YouTube Videos to Train GPT-4 AI Model

As per the report, over a million hours of YouTube videos were transcribed by OpenAI to train its latest AI model.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 8 April 2024 17:46 IST
Highlights
  • Reportedly, OpenAI exhausted its data sources in 2021
  • Leading chatbots have reportedly been trained on three trillion words
  • OpenAI is said to be looking into synthetic data to train AI models

The report alleged that OpenAI President Greg Brockman helped collect data from YouTube videos

Photo Credit: Pexels/Solen Feyissa

OpenAI may have used more than a million hours of transcribed YouTube videos to train GPT-4, its latest artificial intelligence (AI) model, according to a report. The report states that the ChatGPT maker was forced to turn to YouTube after exhausting its supply of text data for training its AI models. The allegation, if true, could create new problems for the AI firm, which is already fighting multiple lawsuits over its use of copyrighted data. Notably, a report last month highlighted that its GPT Store contained mini chatbots that violated the company's guidelines.

In its report, The New York Times claimed that after running out of sources of unique text to train its AI models, the company developed an automatic speech recognition tool called Whisper, used it to transcribe YouTube videos, and trained its models on the resulting text. OpenAI launched Whisper publicly in September 2022, and the AI firm said it was trained on 680,000 hours of “multilingual and multitask supervised data collected from the web”.

The report further alleges, citing unnamed sources familiar with the matter, that OpenAI employees discussed whether using YouTube's data could breach the platform's guidelines and land the company in legal trouble. Notably, Google prohibits the use of YouTube videos for applications that are independent of the platform.


Eventually, the company went ahead with the plan and transcribed more than a million hours of YouTube videos, and the text was fed to GPT-4, as per the report. Further, the NYT report also alleges that OpenAI President Greg Brockman was directly involved with the process and personally helped collect data from videos.


Speaking with The Verge, Google spokesperson Matt Bryant said the company had seen unconfirmed reports of such activity, adding, “Both our robots.txt files and Terms of Service prohibit unauthorized scraping or downloading of YouTube content.” OpenAI spokesperson Lindsay Held told the publication that the company uses “numerous sources including publicly available data and partnerships for non-public data” to train its models. She also added that the AI firm was looking into the possibility of using synthetic data to train its future AI models.



© Copyright Red Pixels Ventures Limited 2025. All rights reserved.