Sakana AI Announces AI CUDA Engineer That Can Speed Up Model Development and Deployment

AI CUDA Engineer is an agent framework for automatically converting standard PyTorch code into CUDA kernels.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 21 February 2025 16:52 IST
Highlights
  • These CUDA kernels are 10-100 times faster than typical PyTorch versions
  • Sakana AI is also releasing a dataset with over 30,000 CUDA kernels
  • The company recently released The AI Scientist

Sakana AI has launched a website where others can explore 17,000 verified kernels and their profiles

Photo Credit: Unsplash/Gerard Siderius

Sakana AI, a Tokyo-based artificial intelligence (AI) firm, has introduced a new agentic framework that can improve the development and deployment speeds of large language models (LLMs). Unveiled on Thursday, the AI CUDA Engineer is said to improve both the pre-training and inference speeds of an AI model by optimising its codebase. The AI firm highlighted that the entire process is driven by AI agents and is automated end to end. Notably, Sakana AI introduced The AI Scientist last year, which can conduct scientific research.

Sakana AI Unveils AI CUDA Engineer

In a post, the Japanese AI firm stated that after developing AI systems that can create new models and fully automate the AI research process, it began working on ways to speed up the deployment and inference of LLMs.

The company said that the research led to the development of the AI CUDA Engineer. It is a fully automated, comprehensive agent framework for CUDA (Compute Unified Device Architecture) kernel discovery and optimisation.


CUDA kernels can be understood as specialised functions that run on Nvidia GPUs, allowing parallel execution of code across multiple threads. Because of this parallelism, they can run much faster than sequential code and accelerate computational tasks, especially those involving large datasets. As such, writing optimised kernels is considered an effective way to speed up AI models' deployment and inference.
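To make the idea of a kernel more concrete, here is a minimal sketch in plain Python rather than real CUDA. It imitates the way each GPU thread typically works out its own global index and handles a single element of the data. The function names `vector_add_kernel` and `launch` are hypothetical, and the simulated threads run one after another here, whereas a GPU would run them in parallel.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Work done by one simulated thread: add a single pair of elements."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):  # bounds guard: the last block may have spare threads
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical launcher: visits every (block, thread) pair in turn.

    On a real GPU these invocations would execute concurrently.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)

block_dim = 4  # threads per block (illustrative value)
grid_dim = (len(a) + block_dim - 1) // block_dim  # enough blocks to cover the data

launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
print(out)
```

Because every element is computed independently of the others, the work can be spread across as many threads as the hardware offers, which is where the acceleration comes from.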


Sakana AI said the AI CUDA Engineer can automatically convert PyTorch modules into optimised CUDA kernels to significantly improve deployment speeds. It can generate kernels that are said to be 10-100 times faster than their PyTorch counterparts.
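A figure such as "10-100 times faster" is usually a speedup ratio: the runtime of the baseline divided by the runtime of the optimised version. The sketch below shows that arithmetic in Python using only the standard library. It is an assumption that Sakana AI's numbers follow this definition, and the helper names `median_runtime` and `speedup` are hypothetical. The two workloads are ordinary Python functions standing in for a PyTorch module and a CUDA kernel.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Return the median wall-clock time of calling fn(), in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds, candidate_seconds):
    """Speedup ratio: how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate runtime must be positive")
    return baseline_seconds / candidate_seconds


data = list(range(200_000))


def baseline():
    # Stand-in for the unoptimised version: an explicit Python loop.
    total = 0
    for x in data:
        total += x
    return total


def candidate():
    # Stand-in for the optimised version: the built-in sum().
    return sum(data)


ratio = speedup(median_runtime(baseline), median_runtime(candidate))
print(f"speedup: {ratio:.1f}x")
```

The printed ratio varies from machine to machine. Timing real GPU code needs extra care, since kernel launches are asynchronous and the device has to be synchronised before the clock is read.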

The process includes four steps. First, the agent framework converts the PyTorch code into working kernels. Next, the agent implements optimisation techniques to ensure only the best kernels are generated. Kernel crossover prompts are then added, which combine multiple optimised kernels to create new kernels. Finally, the AI agent preserves the high-performance CUDA kernels in an archive, and these are used to deliver performance improvements. The company has also published a study that further details the process.
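The four steps described above can be pictured as a small search loop. The following Python sketch is a toy illustration only and not Sakana AI's implementation: a "kernel" is reduced to a pair of tuning values (block size, unroll factor), and `runtime_ms` is a made-up cost model. In the real framework, the candidates would be CUDA source code produced by AI agents and timed on a GPU. All function names are hypothetical.

```python
import itertools


def runtime_ms(kernel):
    """Made-up cost model: fastest at block_size=256, unroll=4."""
    block_size, unroll = kernel
    return abs(block_size - 256) / 64 + abs(unroll - 4) + 1.0


def translate():
    """Step 1: produce a first working kernel (here, a fixed baseline)."""
    return (32, 1)


def optimise(kernel):
    """Step 2: propose variants and keep only those faster than the parent."""
    block_size, unroll = kernel
    variants = [(block_size * 2, unroll), (block_size, unroll * 2)]
    return [v for v in variants if runtime_ms(v) < runtime_ms(kernel)]


def crossover(a, b):
    """Step 3: combine the traits of two kernels into new candidates."""
    return [(a[0], b[1]), (b[0], a[1])]


def run_pipeline(generations=4, archive_size=3):
    archive = [translate()]
    for _ in range(generations):
        candidates = set(archive)
        for kernel in archive:
            candidates.update(optimise(kernel))
        # sorted() takes a snapshot, so adding to the set while looping is safe.
        for a, b in itertools.combinations(sorted(candidates), 2):
            candidates.update(crossover(a, b))
        # Step 4: archive only the fastest kernels found so far.
        archive = sorted(candidates, key=runtime_ms)[:archive_size]
    return archive


archive = run_pipeline()
best = archive[0]
print("best kernel:", best, "runtime (ms):", runtime_ms(best))
print("speedup over baseline:", runtime_ms(translate()) / runtime_ms(best))
```

In this toy setting, the loop moves from the slow baseline towards the fastest configuration within a few generations, with crossover contributing combinations that neither parent reached alone.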


Alongside the paper, Sakana AI is also publishing the AI CUDA Engineer Archive, a dataset consisting of more than 30,000 kernels generated by the AI. These kernels are released under the CC-BY-4.0 license and can be accessed via Hugging Face.

The Japanese firm has also launched a website that lets visitors interactively explore 17,000 verified kernels and their profiles. The website allows users to explore these kernels across 230 tasks and compare CUDA kernels across individual experiments.

 
