
Apple Teams Up with Nvidia to Supercharge AI Speed


Apple is collaborating with Nvidia to enhance the speed of artificial intelligence (AI) models. On Wednesday, the tech company based in Cupertino announced its ongoing research into inference acceleration on Nvidia’s platform, aiming to simultaneously boost the efficiency and reduce the latency of large language models (LLMs). This initiative employs a method called Recurrent Drafter (ReDrafter), previously detailed in a research paper published earlier this year, and integrates it with Nvidia’s TensorRT-LLM inference acceleration framework.

Apple Leverages Nvidia Platform to Enhance AI Efficiency

In a blog post, Apple researchers elaborated on their partnership with Nvidia to improve LLM performance and shared the achievements stemming from this collaboration. They underscored their focus on addressing the challenge of boosting inference efficiency while keeping latency low in AI models.

Inference in the realm of machine learning is defined as the process of utilizing a trained model to make predictions, decisions, or conclusions based on input data. Essentially, it represents the processing phase of an AI model, where commands are interpreted and raw data is transformed into usable information.
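The distinction between training and inference can be made concrete with a minimal sketch. The model below is a toy linear classifier whose weights are hypothetical stand-ins for parameters learned during a prior training phase; inference simply applies those fixed parameters to new input.

```python
# A minimal sketch of inference: applying a trained model's fixed
# parameters to new input data. The weights here are illustrative
# placeholders, standing in for values learned during training.

def predict(features, weights, bias):
    """Run inference: map raw input features to a usable decision."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1 if score > 0 else 0  # interpret the raw score as a decision

# "Trained" parameters (hypothetical values, not from a real model).
weights = [0.8, -0.5, 0.3]
bias = -0.1

# Inference step: the model is only applied, never updated.
print(predict([1.0, 0.2, 0.5], weights, bias))  # prints 1
```

Note that no gradient computation or parameter update occurs here; that separation is what makes inference a distinct optimization target from training.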

This year, Apple released and open-sourced the ReDrafter technique, introducing a novel approach to speculative decoding. This method incorporates a recurrent neural network (RNN) draft model, which combines beam search—a mechanism that enables AI to explore multiple candidate solutions—with dynamic tree attention, in which tree-structured candidate data is processed using an attention mechanism. Researchers reported that this approach can generate as many as 3.5 tokens per generation step, substantially accelerating LLM token generation.
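The core idea behind speculative decoding can be sketched in a few lines. In the toy below, a cheap "draft" model proposes several tokens at once and the expensive "target" model verifies them, accepting the longest matching prefix—so one verification step can yield multiple tokens. The two model functions are deliberately simplistic stand-ins; Apple's actual ReDrafter uses an RNN draft model with beam search and dynamic tree attention, which this sketch does not attempt to reproduce.

```python
# Toy sketch of the draft-and-verify loop underlying speculative
# decoding. Both "models" are stand-in functions over a digit
# vocabulary, not real neural networks.

def draft_model(context, k=4):
    """Cheaply propose the next k tokens (toy rule: count upward)."""
    return [(context[-1] + i + 1) % 10 for i in range(k)]

def target_model(context):
    """The 'expensive' model's true next token (same toy rule)."""
    return (context[-1] + 1) % 10

def speculative_step(context, k=4):
    """Accept drafted tokens for as long as the target model agrees."""
    drafted = draft_model(context, k)
    accepted = []
    for token in drafted:
        if token == target_model(context + accepted):
            accepted.append(token)
        else:
            break
    if not accepted:  # guarantee progress: fall back to one target token
        accepted.append(target_model(context))
    return accepted

print(speculative_step([3]))  # prints [4, 5, 6, 7]: four tokens in one step
```

Because the draft model here always agrees with the target, all four proposals are accepted in a single step; in practice the acceptance rate depends on how well the draft model approximates the target, which is why figures like 3.5 tokens per step are reported as averages.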

Although merging these techniques produced measurable performance improvements in research settings, Apple noted that the speed gains were limited on their own. To address this limitation, researchers integrated ReDrafter with Nvidia’s TensorRT-LLM inference acceleration framework.

Within the scope of this collaboration, Nvidia implemented new operators and enhanced existing ones to optimize the speculative decoding process. The findings revealed that utilizing the Nvidia platform alongside ReDrafter resulted in a 2.7x increase in tokens generated per second for greedy decoding—a technique often employed in sequence generation tasks.

Apple emphasized that this technology has the potential to lower AI processing latency while requiring fewer GPUs and minimizing power consumption.
