Meta Platforms used public posts from Facebook and Instagram to train its new Meta AI virtual assistant, while intentionally excluding private posts shared only with family and friends, according to the company’s top policy executive. Nick Clegg, Meta’s President of Global Affairs, shared these insights in an interview with Reuters at the annual Connect conference.
Clegg emphasized that private messages shared through Meta’s messaging services were not incorporated into the AI’s training data. He noted that the company took measures to filter out personal information from public datasets used for training. “We’ve tried to exclude datasets that have a heavy preponderance of personal information,” Clegg stated, mentioning that the “vast majority” of the data fed into the system was publicly accessible.
He cited LinkedIn as a specific example of a platform whose content Meta deliberately avoided because of privacy concerns. The comments come amid ongoing scrutiny of tech companies, including Meta, OpenAI, and Google’s parent company Alphabet, for training their AI systems on internet-sourced information without consent. Such models are trained on vast amounts of data to enable tasks like content summarization and image generation.
Amidst growing legal challenges from authors asserting copyright violations, these companies are assessing how to handle private or copyrighted materials captured during data collection.
The launch of Meta AI marked a significant moment at the company’s annual product unveiling event, where CEO Mark Zuckerberg introduced it as the flagship consumer-facing AI tool. This year’s Connect conference focused heavily on artificial intelligence, a shift from previous years’ emphasis on augmented and virtual reality.
Meta AI was developed using a proprietary model based on the Llama 2 large language model, released for public commercial use in July, alongside a new image-generating model known as Emu. The virtual assistant will have the capability to produce text, audio, and visual content, leveraging real-time information through a partnership with Microsoft’s Bing search engine.
According to Clegg, public Facebook and Instagram posts, both text and images, served as the foundation for training Meta AI. The Emu model drew on these posts for its image-generation component, while the chat functionality was primarily based on Llama 2, supplemented by publicly available datasets. Furthermore, users’ interactions with Meta AI may help refine its features in the future.
To address safety concerns, Clegg mentioned that the tool would have restrictions in place, including a prohibition against generating photo-realistic images of public figures. On the topic of copyrighted materials, Clegg anticipated a considerable amount of litigation regarding the interpretation of the “fair use doctrine,” which permits limited use of protected works for commentary, research, and parody. He expressed confidence in their position but acknowledged the likelihood of legal disputes ahead.
While some companies with image-generation capabilities allow the replication of trademarked characters like Mickey Mouse, others have opted for licensing agreements or avoided such content in their training data. OpenAI recently entered into a six-year partnership with Shutterstock to use its extensive collections of images, videos, and music for similar training purposes. In response to inquiries about Meta’s approach to copyrighted imagery, a spokesperson highlighted the company’s new terms of service, which prohibit users from generating content that infringes on privacy and intellectual property rights.