Apple is advancing its methods for analyzing aggregate user data to enhance its artificial intelligence (AI) capabilities. The Cupertino-based company detailed its differential privacy techniques on Monday, emphasizing that these approaches safeguard user privacy. Under this system, Apple collects usage trends and data embeddings to help refine its text generation features and Genmoji. Importantly, this information is sourced solely from devices whose users have opted in to share Device Analytics.
Apple Aims to Learn from User Data While Preserving Privacy
In a post on its Machine Learning Research site, Apple provided insights into its novel approach aimed at enhancing various features within Apple Intelligence. The company’s AI capabilities have not met expectations, and it attributes some of this to its commitment to ethical data sourcing and pretraining methodologies for its AI models.
Apple asserts that its generative AI systems are trained using synthetic data, created by other AI models or digital sources, rather than human-generated content. While this strategy is effective for training large language models (LLMs), it may result in outputs that lack the distinctiveness and style of human expression, leading to what some refer to as “AI slop.”
To address these challenges and elevate the quality of its AI outputs, Apple is exploring ways to learn from user data without compromising on privacy. The company refers to this approach as “differential privacy.”
For its Genmoji feature, Apple intends to apply differentially private techniques to discern popular user prompts and patterns from those who have chosen to share their Device Analytics. The company has assured that this method will mathematically guarantee the anonymity of unique or rare prompts, ensuring they cannot be linked to individual users.
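Apple's post does not include code, but the core idea behind a locally differentially private popularity count can be illustrated with classic randomized response. The sketch below is an illustrative stand-in, not Apple's actual mechanism: each device reports whether it used a candidate prompt, but lies with calibrated probability, so no individual report can be trusted while the aggregate remains recoverable.

```python
import math
import random

def randomized_response(true_bit: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps); otherwise lie.

    This satisfies epsilon-local differential privacy: a single report is
    nearly equally likely under either true value, so a rare prompt cannot
    be pinned on an individual device.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_truth else not true_bit

def estimate_true_count(reports: list, epsilon: float) -> float:
    """Debias the noisy tally to recover the aggregate popularity."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    n = len(reports)
    observed = sum(reports)
    # E[observed] = p * true + (1 - p) * (n - true); solve for true.
    return (observed - (1.0 - p) * n) / (2.0 * p - 1.0)

# Simulate 10,000 opted-in devices, 3,000 of which used the candidate prompt.
random.seed(1)
epsilon = 2.0
reports = [randomized_response(device < 3000, epsilon) for device in range(10_000)]
print(round(estimate_true_count(reports, epsilon)))  # close to 3,000
```

The server only ever sees noisy bits, yet the debiased sum tracks the true count, which is the mathematical guarantee Apple describes: popularity of common prompts is measurable, while rare prompts stay statistically hidden.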
This data collection initiative will let Apple identify the interaction patterns most representative of real usage. Essentially, the company wants to know which prompts lead to satisfactory outputs and where users had to revise a prompt to get the result they wanted. One example in the post examined how the models performed when a prompt called for generating multiple entities.
Looking ahead, Apple plans to implement this methodology for various features, including Image Playground, Image Wand, Memories Creation, and Writing Tools within Apple Intelligence, as well as Visual Intelligence in future updates.
Differential Privacy in Apple Intelligence’s text generation feature
Photo Credit: Apple
Apple is also applying this differential privacy technique to its text generation capabilities, albeit with a different strategy than that employed for Genmoji. To evaluate the effectiveness of its email generation tools, the company produced a series of emails covering common subjects, generating various versions for each topic and capturing key attributes such as language, topic, and length. These representations are known as embeddings.
These embeddings were then sent to a small group of devices belonging to users who had opted in to Device Analytics. Each device compared the synthetic embeddings against a sample of the user’s own emails, allowing Apple to build representations of aggregate trends without ever accessing actual email content. “Thanks to these privacy safeguards, Apple can generate synthetic data that reflects overall patterns while ensuring the privacy of user communications,” the company stated.
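The on-device step can be sketched in a few lines. This is a toy illustration under stated assumptions: the `embed` function here is a deterministic hashed bag-of-words stand-in for a real embedding model, and `nearest_synthetic` is a hypothetical name for the comparison Apple describes; in practice the device's report would itself be protected with differential-privacy noise.

```python
import math

def _bucket(word: str) -> int:
    # Deterministic toy hash; a real system would use a learned embedding model.
    return sum(ord(ch) for ch in word) % 16

def embed(text: str) -> list:
    """Toy stand-in for an embedding model: a normalized hashed bag of words."""
    vec = [0.0] * 16
    for word in text.lower().split():
        vec[_bucket(word)] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))

def nearest_synthetic(device_email: str, synthetic_variants: list) -> int:
    """On-device: find which synthetic variant is closest to a local email.

    Only the winning index (further protected with differential privacy in a
    real deployment) would leave the device -- never the email itself.
    """
    device_vec = embed(device_email)
    sims = [cosine(device_vec, embed(s)) for s in synthetic_variants]
    return max(range(len(sims)), key=sims.__getitem__)

variants = ["want to play tennis", "dinner at seven"]
print(nearest_synthetic("play tennis tomorrow", variants))  # 0 (the tennis variant)
```

Aggregating these indices across many devices tells Apple which synthetic emails best resemble real usage, letting it refine the synthetic corpus without reading anyone's mail.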
This means that while Apple gains no insight into the specific contents of any email, it can still learn how users prefer their emails phrased. The current focus on email text generation paves the way for Apple to extend the technique to email summaries in the future.