On Wednesday, Microsoft researchers unveiled a new artificial intelligence (AI) model designed to generate 3D gaming environments. Called the World and Human Action Model (WHAM), or Muse, it was developed by Microsoft Research's Game Intelligence and Teachable AI Experiences (Tai X) teams in collaboration with Xbox Game Studios' Ninja Theory. According to the company, this generative AI model is intended to help game designers generate ideas and produce game visuals and controller actions, supporting the game development process.
Microsoft Introduces Muse AI Model
The Redmond-based tech giant described Muse in a blog post. Although Muse is still a research model, Microsoft plans to open-source the model's weights and sample data, along with the WHAM Demonstrator, a prototype interface for interacting with the model. Developers will be able to experiment with the model through Azure AI Foundry. Technical details about the model have also been published in a paper in the journal Nature.
Training a model for such a complex task poses significant challenges. To build Muse, the researchers drew on a large dataset of human gameplay from Bleeding Edge, a 2020 title developed by Ninja Theory. The model was trained on a billion image-action pairs, equivalent to roughly seven years of continuous human gameplay. According to Microsoft, the data was collected in line with ethical standards and is used strictly for research purposes.
According to the researchers, one of the main hurdles was the large scale of compute required for training. Muse was initially trained on a cluster of Nvidia V100 GPUs and was later scaled up to multiple Nvidia H100 GPUs.
Muse is designed to accept both text and visual prompts. Once it has generated a gameplay environment, users can refine it further using controller inputs. The model responds to these inputs by generating new frames that stay consistent with the original prompt and with the rest of the gameplay.
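The idea of a model that keeps extending a scene in response to controller inputs can be illustrated with a generic autoregressive rollout loop. The sketch below is not Microsoft's implementation: `rollout`, `toy_predict`, and the tuple-based "frames" are hypothetical stand-ins, and a real world model would replace `toy_predict` with a trained neural network operating on encoded images. What it shows is only the general pattern, in which each new frame is predicted from the full history of frames and actions so far, so that earlier context continues to shape later output.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a real model would use encoded image frames
# and richer controller-action vectors.
Frame = Tuple[int, ...]
Action = int


def rollout(
    predict_next: Callable[[List[Frame], List[Action]], Frame],
    prompt_frames: List[Frame],
    actions: List[Action],
) -> List[Frame]:
    """Autoregressively extend a prompt: each new frame is predicted from
    all frames generated so far plus every controller action applied."""
    frames = list(prompt_frames)
    action_history: List[Action] = []
    for action in actions:
        action_history.append(action)
        next_frame = predict_next(frames, action_history)
        frames.append(next_frame)  # the new frame becomes context for the next step
    return frames


def toy_predict(frames: List[Frame], actions: List[Action]) -> Frame:
    """Hypothetical toy 'model': moves the last frame's x position by the latest action."""
    x, y = frames[-1]
    return (x + actions[-1], y)


if __name__ == "__main__":
    print(rollout(toy_predict, [(0, 0)], [1, 1, -1]))
    # [(0, 0), (1, 0), (2, 0), (1, 0)]
```

Passing the whole history, rather than only the last frame, to the predictor is the design choice that lets a model of this kind keep generated content consistent with what came before.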
Because Muse generates interactive gameplay rather than text, conventional AI benchmarks are not well suited to measuring its performance. The researchers said they instead ran internal evaluations based on metrics such as consistency, diversity, and persistence. As a research model, Muse currently generates output at a resolution of only 300×180 pixels.