Last week, Microsoft unveiled an interactive, real-time gameplay experience of Quake II through Copilot Labs. The tech powerhouse from Redmond used its newly developed Muse AI models along with a novel approach known as the World and Human Action MaskGIT Model (WHAMM) to create the AI-driven gameplay. The demo is currently accessible to all users as a research preview, featuring AI-generated versions of the game’s world and traditional gameplay mechanics. Microsoft has also acknowledged several limitations of the AI-generated experience.
Microsoft’s Quake II Gameplay Was Built on Muse AI
In a blog post, Microsoft researchers elaborated on the AI-generated gameplay and the methodology behind its creation. The development of AI-powered 2D and 3D game generation has emerged as a focal point for researchers, as it assesses the technology’s ability to create real-time environments and adapt them to various mechanics employed by human players. This exploration provides insight into the potential for AI models to perform real-world tasks, including robotics applications.
Quake II, originally published in 1997 by Activision (now a Microsoft subsidiary), is a first-person shooter whose 3D, level-based design incorporates a range of mechanics such as jumping, crouching, shooting, environment destruction, and camera movement. Players can currently access a single level through Copilot Labs, experiencing approximately two minutes of gameplay using either a controller or keyboard input.
Regarding the development, the researchers said the gameplay was built using the Muse AI models and the WHAMM approach, drawing on insights gained from the earlier WHAM-1.6B model.
WHAMM Overview
Photo Credit: Microsoft
WHAMM is the enhanced successor to WHAM-1.6B, capable of generating real-time video at more than 10 frames per second. Gameplay is rendered at a resolution of 640×360 pixels. Microsoft attributed much of WHAMM’s speed improvement to the adoption of the MaskGIT (Masked Generative Image Transformer) framework, which raised the frame rate from one to over ten frames per second.
The MaskGIT setup lets the model generate all of a frame’s image tokens in a small, fixed number of forward passes: every masked token is predicted in parallel, and the most confident predictions are kept at each pass. This is what makes it feasible to respond to a player’s movements in real time.
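To make the idea concrete, here is a minimal, illustrative sketch of MaskGIT-style parallel decoding. This is not Microsoft’s implementation: `toy_predict` is a hypothetical stand-in for the real transformer, and the linear unmasking schedule is a simplification. Only the overall pattern matters: start with every token masked, predict all positions in one pass, keep the most confident, and repeat for a few passes instead of decoding one token at a time.

```python
import math
import random

random.seed(0)
MASK = -1  # sentinel for a not-yet-decoded image token

def maskgit_decode(predict_fn, num_tokens, num_steps=4):
    """MaskGIT-style decoding sketch: fill all tokens in num_steps
    parallel passes rather than num_tokens sequential ones."""
    tokens = [MASK] * num_tokens
    for step in range(1, num_steps + 1):
        # One forward pass yields a (token, confidence) guess per position.
        preds = predict_fn(tokens)
        # Rank positions: already-fixed tokens first, then masked ones
        # by descending confidence.
        ranked = sorted(range(num_tokens),
                        key=lambda i: (tokens[i] == MASK, -preds[i][1]))
        # Unmask a growing fraction each pass; all tokens by the last pass.
        n_keep = math.ceil(num_tokens * step / num_steps)
        for i in ranked[:n_keep]:
            if tokens[i] == MASK:
                tokens[i] = preds[i][0]
    return tokens

# Hypothetical stand-in for the model: random token guess + confidence.
def toy_predict(tokens, vocab_size=8):
    return [(random.randrange(vocab_size), random.random()) for _ in tokens]

frame = maskgit_decode(toy_predict, num_tokens=16)
print(all(t != MASK for t in frame))  # True: every token filled in 4 passes
```

Because each pass fixes many tokens at once, the cost of a frame scales with the handful of passes, not the hundreds of tokens in a 640×360 frame, which is where the one-to-ten-fps jump comes from.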
While the core gameplay stays faithful to the original Quake II, Microsoft has highlighted several limitations of the current demo. The AI-generated environments are approximations of the original game rather than exact replicas, which leads to fuzzy visuals during enemy encounters and inaccuracies in combat scenarios.
WHAMM currently has a context window of 0.9 seconds (nine frames at 10fps), so the model loses track of any object that stays out of view for longer than that. This can create scenarios where a player turns around to discover a new area, or looks at the sky and then back down to find themselves relocated within the map. Users may also experience noticeable latency, which Microsoft attributes to the high volume of users accessing the demo.
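The forgetting behaviour follows directly from that nine-frame window. The sketch below uses a fixed-size buffer as an illustrative stand-in for the model’s conditioning context (the frame labels and loop are invented for the example; only the 10fps and nine-frame figures come from the article): whatever fell out of the buffer simply cannot influence the next generated frame.

```python
from collections import deque

FPS = 10
CONTEXT_FRAMES = 9  # 0.9 s of history at 10 fps, per the article

# Stand-in for the model's conditioning window: only the last nine
# frames are visible when generating the next one.
context = deque(maxlen=CONTEXT_FRAMES)

for t in range(25):  # simulate 2.5 s of play
    context.append(f"frame_{t}")

# Frames 0-15 have already been evicted; the model cannot "remember"
# an enemy or corridor that only appeared in them.
print(list(context)[0])   # frame_16
print(len(context) / FPS) # 0.9 (seconds of retained history)
```

This is why looking away for a second can effectively reroll the scene: once the old view leaves the window, the model must hallucinate a plausible replacement rather than recall the original.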