
Meta Faces Lawsuit Over Copyrighted AI Training Data


Meta is currently embroiled in a copyright lawsuit concerning the alleged use of copyrighted materials for training its artificial intelligence (AI) models. The lawsuit features multiple plaintiffs, including several bestselling authors, who have brought forward claims against the tech company. Key allegations against Meta assert that it utilized pirated e-books and articles to train earlier iterations of its Llama AI models, thereby infringing copyright laws. Furthermore, company CEO Mark Zuckerberg is accused of permitting the Llama AI team to access a questionable link aggregator to obtain these copyrighted works.

The allegations appear in two filings submitted to the US District Court for the Northern District of California on Wednesday. The plaintiffs, who include authors Sarah Silverman and Ta-Nehisi Coates, point to Meta’s testimony from late 2024, which revealed that Zuckerberg approved the use of a dataset known as LibGen to train the company’s Llama AI models.

LibGen, or Library Genesis, is recognized as a file-sharing platform that provides access to academic and general interest literature. It is frequently labeled a pirate library due to its distribution of copyrighted works that may otherwise be behind paywalls or not digitally accessible. The platform has faced numerous legal challenges and has been mandated to shut down in the past.

The legal documents assert that Meta knowingly employed the LibGen dataset, fully aware that it contained pirated content, thereby violating copyright laws. The filings reference a memo directed to Meta’s AI decision-makers, which indicates that after escalation to “MZ,” the Llama AI team received approval to utilize LibGen. Here, “MZ” refers to Mark Zuckerberg.

The memo also shows that executives were warned that public knowledge of the company using “a dataset we know to be pirated such as LibGen” could weaken its negotiating position with regulators. Meta is further accused of stripping copyright information from the text and metadata of the dataset to conceal the infringement.

According to the filings, Nikolay Bashlykov, a research engineer in Meta’s AI division, allegedly deleted copyright information from the LibGen dataset. To further cover their tracks, Meta’s programmers reportedly added “supervised samples” of data during the fine-tuning of Llama so that the model would give less incriminating answers when asked about the origin of its training data.

The plaintiffs have also claimed that just accessing LibGen constitutes another form of copyright infringement. The documents allege that Meta engaged in torrenting the LibGen dataset, a process that entails both downloading and uploading (or seeding) content. Uploading copyrighted material can represent a distribution violation, according to the filings.

“Had Meta purchased the Plaintiffs’ works in a bookstore or borrowed them from a library and trained its Llama models on them without a license, it would have committed copyright infringement. Meta’s choice to circumvent lawful methods of obtaining books and engage in an illegal torrenting network violates the CDAFA [California Comprehensive Computer Data Access and Fraud Act] and serves as ample evidence of copyright infringement,” the filings state.

The lawsuit remains active, and a ruling is still pending. Meta has yet to present its defense, which is expected to rely on fair use arguments. The court will need to determine whether the generative abilities of the AI model are transformative enough to support that claim.
