Apple Partners With Nvidia to Improve Performance Speed of Its AI Models

Apple used Nvidia’s inference acceleration framework for its open-source Recurrent Drafter technique for AI models.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 19 December 2024 17:09 IST
Highlights
  • Apple published a paper on Recurrent Drafter earlier this year
  • Nvidia’s TensorRT-LLM acceleration framework was used for this
  • Apple claims the process resulted in 2.7x faster token generation

Apple had earlier stated the Recurrent Drafter can improve token generation by up to 3.5 tokens per step

Photo Credit: Reuters

Apple is partnering with Nvidia in an effort to improve the performance speed of artificial intelligence (AI) models. On Wednesday, the Cupertino-based tech giant announced that it has been researching inference acceleration on Nvidia's platform to see whether both the efficiency and latency of a large language model (LLM) can be improved simultaneously. The iPhone maker used a technique dubbed Recurrent Drafter (ReDrafter) that was published in a research paper earlier this year. This technique was combined with the Nvidia TensorRT-LLM inference acceleration framework.

Apple Uses Nvidia Platform to Improve AI Performance

In a blog post, Apple researchers detailed the new collaboration with Nvidia for LLM performance and the results achieved from it. The company highlighted that it has been researching the problem of improving inference efficiency while maintaining latency in AI models.

Inference in machine learning refers to the process of making predictions, decisions, or conclusions from a given input using a trained model. Put simply, it is the step in which an AI model processes a prompt and generates output for data it did not see during training.

Earlier this year, Apple published and open-sourced the ReDrafter technique, a new approach to speculative decoding, in which a smaller draft model proposes several tokens ahead and the main LLM verifies them. Using a recurrent neural network (RNN) draft model, it combines beam search (a mechanism where AI explores multiple possibilities for a solution) and dynamic tree attention (tree-structure data is processed using an attention mechanism). The researchers stated that it can generate up to 3.5 tokens per generation step.
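
The draft-and-verify idea behind speculative decoding can be illustrated with a minimal Python sketch. This is not Apple's ReDrafter implementation: it has no RNN, beam search, or tree attention, and `target_model` and `draft_model` are hypothetical toy functions standing in for a large LLM and a cheaper draft model. In a real system the verification of all drafted tokens would happen in one batched forward pass of the large model.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token context to the next token
VOCAB = 50  # hypothetical toy vocabulary size


def target_model(context: List[int]) -> int:
    """Toy stand-in for the large LLM: a deterministic next-token rule."""
    return (sum(context) * 31 + len(context) * 7) % VOCAB


def draft_model(context: List[int]) -> int:
    """Toy stand-in for the draft model: wrong whenever len(context) % 4 == 0."""
    guess = target_model(context)
    return (guess + 1) % VOCAB if len(context) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Return the generated sequence and the number of verification steps."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    steps = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks the proposal (counted as one step).
        steps += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong token
                break
            accepted.append(tok)
        else:
            # Every drafted token matched, so the target adds one more.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], steps


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 20)
output, steps = speculative_decode(target_model, draft_model, prompt, 20)
print(output == baseline, steps)
```

Because every accepted token is checked against the target model's own choice, the output matches plain greedy decoding; the gain comes from needing fewer target-model steps than generated tokens.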

While the company was able to improve performance efficiency to a certain degree by combining the two techniques, Apple highlighted that there was no significant boost to speed. To solve this, researchers integrated ReDrafter into the Nvidia TensorRT-LLM inference acceleration framework.

As a part of the collaboration, Nvidia added new operators and exposed the existing ones to improve the speculative decoding process. The post claimed that when using the Nvidia platform with ReDrafter, they found a 2.7x speed-up in generated tokens per second for greedy decoding (a decoding strategy used in sequence generation tasks).
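
Greedy decoding, mentioned above, simply picks the single most probable next token at every step. The sketch below assumes a hypothetical bigram probability table, `NEXT_TOKEN_PROBS`, invented purely for illustration; a real LLM would compute these probabilities from the full context.

```python
from typing import Dict, List

# Hypothetical next-token probabilities for a toy bigram "model".
NEXT_TOKEN_PROBS: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.7, "token": 0.3},
    "a": {"token": 0.9, "model": 0.1},
    "model": {"generates": 0.8, "</s>": 0.2},
    "token": {"</s>": 1.0},
    "generates": {"text": 0.9, "</s>": 0.1},
    "text": {"</s>": 1.0},
}


def greedy_decode(start: str = "<s>", max_len: int = 10) -> List[str]:
    """Follow the highest-probability token until the end marker or max_len."""
    output: List[str] = []
    current = start
    for _ in range(max_len):
        probs = NEXT_TOKEN_PROBS[current]
        current = max(probs, key=probs.get)  # the greedy choice
        if current == "</s>":
            break
        output.append(current)
    return output


print(greedy_decode())  # ['the', 'model', 'generates', 'text']
```

Greedy decoding never revisits a choice, which makes it cheap and deterministic, in contrast to beam search, which keeps several candidate sequences alive at once.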

Apple highlighted that this technology can be used to reduce the latency of AI processing while also using fewer GPUs and consuming less power.
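
As a rough back-of-the-envelope illustration of why higher throughput can translate into fewer GPUs, the sketch below applies the reported 2.7x figure to made-up numbers. The per-GPU throughput and the demand are hypothetical, and the calculation assumes the speed-up carries over unchanged to every GPU, which the article does not claim.

```python
import math

SPEEDUP = 2.7  # speed-up in tokens per second reported for greedy decoding


def gpus_needed(demand_tokens_per_s: float, per_gpu_tokens_per_s: float) -> int:
    """Smallest whole number of GPUs that covers the demanded throughput."""
    return math.ceil(demand_tokens_per_s / per_gpu_tokens_per_s)


baseline_per_gpu = 1000.0  # hypothetical tokens per second per GPU
demand = 50000.0           # hypothetical total tokens per second to serve

before = gpus_needed(demand, baseline_per_gpu)
after = gpus_needed(demand, baseline_per_gpu * SPEEDUP)
print(before, after)  # 50 19
```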
