Nvidia Working on AI for Videos, Scraped Netflix and YouTube Despite Concerns Raised by Employees

New Delhi | Updated 06-08-2024, 02:20 PM

  • Nvidia is developing an AI model named ‘Cosmos’ for video content.
  • The company scraped data from Netflix and YouTube to train the model, raising legal and ethical concerns.
  • Nvidia asserts its practices comply with copyright laws, despite employee apprehensions.


Nvidia is reportedly developing an advanced AI model capable of understanding and generating video content. According to an exclusive investigation by 404 Media, the AI chip maker has scraped vast amounts of data from sources including Netflix and YouTube to train its AI model for videos, raising legal and ethical questions about the use of copyrighted material for training artificial intelligence models.

Citing documents seen by the publication and conversations with Nvidia employees, 404 Media reports that the new AI model for video has been internally named “Cosmos.” The goal of this project is to create a video foundation model “that encapsulates simulation of light transport, physics, and intelligence in one place to unlock various downstream applications critical to NVIDIA.”

Nvidia also reportedly instructed employees to use tools including the open-source YouTube video downloader yt-dlp to scrape full-length videos, running them on virtual machines to evade detection and avoid being blocked by YouTube. The employees reportedly used virtual machines on Amazon Web Services that could refresh their IP addresses, allowing them to download roughly 80 years' worth of video content per day.

According to employees who spoke with the publication anonymously, Nvidia's new model is meant to support the company's diverse product lineup, including its Omniverse 3D world generator, self-driving car systems, and "digital human" products. The model has not yet been released to the public.

A former Nvidia employee said that, in addition to YouTube, Nvidia also targeted Netflix, even though Netflix's terms of service explicitly prohibit such scraping. The effort was not limited to publicly available content either: Nvidia also reportedly mined academic datasets and other resources intended solely for research purposes, raising further ethical questions.

The report also stated that Nvidia's Cosmos team discussed, in Slack conversations, downloading a variety of video content, including Hollywood films, Discovery Channel documentaries, and high-quality gaming footage. In one Slack message, Ming-Yu Liu, one of the project's leaders, pointed to the potential benefits of training on Hollywood films, noting their "gaming-like 3D consistency and fictional content but much higher quality." However, he also acknowledged the sensitivity of using such content, referencing concerns similar to those raised by artists after the release of Stable Diffusion.

Although many employees reportedly questioned the legality of this data collection, project managers dismissed their concerns, assuring them that the scraping had top-level approval and calling it an "executive decision." Nvidia, for its part, has defended the practice, stating that the work is "in full compliance with the letter and the spirit of copyright law."
