Meta, the parent company of Facebook, is facing serious allegations of systematic copyright infringement in its development of artificial intelligence models, according to newly unsealed court documents. The tech giant is accused of downloading over 81.7 terabytes of pirated books through torrent networks to train its Llama AI models.
Internal emails reveal that Meta employees engaged in a coordinated effort to obtain copyrighted content through unauthorized channels, with 35.7 terabytes reportedly downloaded from known pirate sites Z-Library and LibGen. The documents indicate that CEO Mark Zuckerberg was aware of and approved the use of these pirated sources.
According to the court filings, Meta took deliberate steps to conceal its activities, including removing copyright markings and altering metadata to minimize legal exposure. The company allegedly operated in “stealth mode” while downloading the content, adjusting settings “so that the smallest amount of seeding possible could occur.”
The revelations have emerged as part of an ongoing lawsuit filed by prominent authors, including comedian Sarah Silverman and novelist Christopher Golden. While Meta’s legal team maintains a “fair use” defense under copyright law, internal communications suggest that the company’s own lawyers had previously warned about potential legal risks associated with these practices.
One Meta engineer expressed discomfort with the operation in internal communications, stating that “Torrenting from a corporate laptop doesn’t feel right.” This sentiment highlights the ethical concerns surrounding the company’s approach to acquiring training data for its AI systems.
The case could have far-reaching implications for the AI industry, as it may establish important legal precedents regarding the acquisition of training data for artificial intelligence models. The outcome could influence how technology companies approach copyright compliance in AI development moving forward.