
Apple Teams Up with Nvidia to Supercharge AI Speed


Apple has announced a new collaboration with Nvidia aimed at speeding up artificial intelligence (AI) models. The Cupertino-based technology firm revealed on Wednesday that it has been investigating inference acceleration on Nvidia's platform to determine whether efficiency and latency improvements can be achieved concurrently in large language models (LLMs). The company paired a methodology called Recurrent Drafter (ReDrafter), detailed in a research paper published earlier this year, with Nvidia's TensorRT-LLM inference acceleration framework.

Apple Leverages Nvidia Platform for AI Enhancements

In a blog post, Apple's researchers described the partnership with Nvidia on LLM performance and the outcomes of their efforts, emphasizing the company's exploration of improving inference efficiency while maintaining latency in AI models.

Inference in the context of machine learning involves generating predictions, decisions, or conclusions from a given dataset or input utilizing a trained model. Essentially, it represents the phase of an AI model where it processes prompts and translates raw data into comprehensible information.
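At its simplest, inference means running an input through an already-trained model and reading off its output. The toy sketch below illustrates the idea with a stand-in "model"; the function names and scoring rule are purely illustrative, not Apple's or Nvidia's code.

```python
# Minimal illustration of inference: a trained model maps an input (context)
# to a prediction. The "model" here is a deterministic toy, not a real LLM.

def toy_model(context):
    """Stand-in for a trained model: returns scores over a tiny vocabulary."""
    vocab = ["the", "cat", "sat", "mat"]
    # Toy scoring rule so the example is deterministic and self-contained.
    return {tok: (i + len(context)) % 4 for i, tok in enumerate(vocab)}

def infer_next_token(context):
    """One inference step: run the model on the input, pick the best output."""
    scores = toy_model(context)
    return max(scores, key=scores.get)

print(infer_next_token(["the"]))  # prints "sat" under this toy scoring rule
```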

Earlier this year, Apple published and open-sourced the ReDrafter technique, introducing a novel method for speculative decoding. The approach employs a recurrent neural network (RNN) draft model, combining beam search, which examines multiple candidate sequences, with dynamic tree attention, a mechanism for processing tree-structured data. The research indicated that the technique could generate as many as 3.5 tokens per generation step, accelerating LLM output.

Although combining beam search with dynamic tree attention yielded some performance gains on its own, the company noted that the resulting speed-up was not substantial. To address this, its researchers integrated ReDrafter into Nvidia's TensorRT-LLM inference acceleration framework.

As part of the collaboration, Nvidia introduced new operators and enhanced existing ones to refine the speculative decoding process. The findings indicated that employing the Nvidia platform alongside ReDrafter yielded a speed increase of 2.7 times in token generation per second for greedy decoding—a strategy commonly used in sequence generation tasks.

Apple emphasized that this new technology can effectively lower AI processing latency while also requiring fewer GPUs and reducing overall power consumption.
