Are YouTube videos in the public domain? Many AI-related companies seem to think so.
Recent findings show that prominent tech companies, including Apple, have used YouTube video content to train artificial intelligence models without permission from the videos' creators. The training data included subtitle files from more than 170,000 videos, among them videos by popular YouTube personalities such as Marques Brownlee (MKBHD), MrBeast, and PewDiePie, and by well-known talk show hosts such as Stephen Colbert and John Oliver.
The subtitle files were originally downloaded by EleutherAI, a nonprofit organization that compiles datasets to aid small developers and academic researchers in AI training. While EleutherAI’s intent was to democratize access to AI training materials, the dataset, widely known as the Pile, was also utilized by large tech corporations with considerable resources.
Apple, NVIDIA, Anthropic, and more AI giants under fire for using YouTube transcripts without permission 🤖💥 What are your thoughts on this controversial practice? #AIethics #TechIndustry https://t.co/8KP7iEbLt6
— AInnovation (@AInnovationAI) July 16, 2024
Research papers and posts by companies like Apple, Nvidia, and Salesforce have detailed their use of the Pile for developing AI technologies. Apple, for instance, used this dataset to train OpenELM, a major AI model it introduced shortly before announcing new AI features for iPhones and MacBooks.
This situation has raised concerns about the ethics of using online content for AI training, particularly without direct consent from content creators. The implications extend beyond copyright to the broader ethics of AI development and the responsibility of tech giants to respect creators' intellectual property.
Apple trained AI models on YouTube content without consent; includes MKBHD videos https://t.co/u7ArS3jU3b $AAPL pic.twitter.com/0fJy4ewRMX
— MacHash (@MacHashNews) July 16, 2024
Key Points:
i. A recent report reveals that several tech giants, including Apple, have been training their AI models using YouTube videos without the consent of the content creators.
ii. The training involved subtitle files from over 170,000 videos, affecting well-known creators such as Marques Brownlee, MrBeast, and others.
iii. The subtitle files were sourced by EleutherAI, a nonprofit aimed at providing AI training materials to developers and academics, though the data ended up being used by major corporations.
iv. The dataset, known as the Pile, was used by companies like Apple, Nvidia, and Salesforce in their AI development.
v. The use of this dataset highlights the ongoing ethical and legal issues surrounding the use of publicly available data for training AI systems.
Al Santana – Reprinted with permission of Whatfinger News