Sakana AI, a company specializing in artificial intelligence (AI) based in Tokyo, has unveiled an innovative framework aimed at enhancing the development and deployment speeds of large language models (LLMs). The announcement, made on Thursday, introduced the AI CUDA Engineer, which optimizes the codebase to improve both pre-training and inference speeds of AI models. The firm emphasized that the entire operation is driven by AI agents, ensuring a fully automated process. This follows the launch of The AI Scientist last year, a tool designed for conducting scientific research.
Sakana AI Unveils AI CUDA Engineer
In a post, Sakana AI explained that after creating AI systems capable of generating new models and automating the AI research process, they shifted focus towards enhancing the speeds of deployment and inference for LLMs.
This research culminated in the creation of the AI CUDA Engineer, a comprehensive agent framework designed for the discovery and optimization of CUDA (Compute Unified Device Architecture) kernels.
CUDA kernels are specialized functions that run on Nvidia GPUs, executing code in parallel across thousands of threads. For computational tasks involving large datasets, this parallelism is far more efficient than sequential execution on a CPU. Consequently, hand-tuned kernels are regarded as an effective way to speed up the deployment and inference of AI models.
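To make the idea concrete, here is a minimal, generic CUDA kernel (an illustration only, not code from Sakana's system): each GPU thread computes one element of a vector sum, so the whole array is processed in parallel rather than in a sequential loop.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles exactly one array element; the GPU runs
// thousands of these threads concurrently.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;              // one million elements
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);       // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover all n
    vectorAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();            // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```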
Sakana AI claims that the AI CUDA Engineer can autonomously convert PyTorch modules into optimized CUDA kernels, significantly enhancing deployment speeds. The generated kernels are reported to be 10 to 100 times faster than their PyTorch equivalents.
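One common source of such speedups is kernel fusion: a chain of PyTorch elementwise operations normally launches several separate kernels, each making a full pass over GPU memory, whereas a single fused kernel does all the work in one pass. The sketch below is a hypothetical example of this general technique; the expression and kernel name are illustrative and not taken from Sakana's archive.

```cuda
#include <cuda_runtime.h>

// In PyTorch, y = torch.relu(x * scale + bias) on a tensor typically runs
// as multiple elementwise kernels, each reading and writing global memory.
// This fused kernel performs scale, shift, and ReLU in a single memory pass.
__global__ void fusedScaleAddRelu(const float* x, float scale, float bias,
                                  float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i] * scale + bias;   // scale and shift
        y[i] = v > 0.0f ? v : 0.0f;      // ReLU, applied before writing back
    }
}
```

Because global-memory traffic, not arithmetic, dominates the cost of elementwise operations, collapsing three passes into one can yield large gains; automatically discovering fusions like this is the kind of optimization the framework targets.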
The process is executed in four distinct stages. First, the agent framework translates PyTorch code into functioning CUDA kernels. Next, evolutionary optimization is applied so that only the best-performing kernels are retained. After that, kernel crossover prompts combine multiple optimized kernels into new variants. Finally, high-performance CUDA kernels are archived by the AI agent to seed future performance improvements. The company has published a study describing the process in more detail.
In conjunction with the study, Sakana AI is releasing the AI CUDA Engineer Archive, a dataset containing over 30,000 kernels developed by the AI. These kernels are available under the CC-BY-4.0 license and can be accessed via Hugging Face.
The Japanese company has also launched an interactive website, allowing visitors to explore 17,000 verified kernels and their specifications. The site offers users the ability to browse these kernels across 230 tasks and compare CUDA kernels from individual experiments.