Last week, Microsoft introduced an interactive, real-time AI-generated Quake II experience through its Copilot Labs initiative. The gameplay is powered by the company's recently released Muse AI models together with a new technique called the World and Human Action MaskGIT Model (WHAMM). The demo is currently accessible to the public as a research preview, offering a glimpse of AI-generated world-building and the game's core mechanics, though Microsoft has acknowledged several limitations of the experience.
Microsoft’s Quake II Gameplay Was Built on Muse AI
In a blog post, researchers at Microsoft elaborated on the AI-generated gameplay and the underlying development process. The application of AI in generating both 2D and 3D game environments has garnered significant interest from researchers, serving as a test for the technology’s potential to create real-time worlds and adapt to various mechanics employed by users. This initiative offers insights into whether AI models can learn to perform real-world tasks, similar to controlling robots in physical environments.
Quake II, originally launched in 1997 and published by Microsoft-owned Activision, is a level-based first-person shooter. It features mechanics such as jumping, crouching, shooting, environmental destruction, and camera control. Players can access the demo through Copilot Labs, where they can play a single level for approximately two minutes using either a controller or keyboard input.
Regarding the development methodology, the research team reported that it built the new WHAMM technique on top of its Muse AI models and the earlier World and Human Action Model (WHAM).
WHAMM Overview
Photo Credit: Microsoft
WHAMM is an advancement over WHAM-1.6B, generating more than 10 frames per second to enable real-time video output at a resolution of 640×360 pixels. According to Microsoft, much of WHAMM's speedup comes from adopting the MaskGIT (Masked Generative Image Transformer) framework, which raised the frame rate from roughly one frame per second to above ten.
The MaskGIT architecture lets the model generate all of an image's tokens in a small number of forward passes, predicting many masked tokens in parallel rather than one at a time. This is what allows each new frame to be produced quickly enough for fluid gameplay.
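As a rough illustration of this parallel-decoding idea (a sketch, not Microsoft's implementation), the snippet below fills a fully masked token grid in a handful of confidence-ordered passes. The cosine unmasking schedule, the `maskgit_decode` function, and the toy predictor are all assumptions made for demonstration:

```python
import numpy as np

def maskgit_decode(predict_fn, num_tokens, vocab_size, steps=4):
    """Fill a fully masked token grid in a few forward passes.

    predict_fn(tokens, mask) must return per-position logits of shape
    (num_tokens, vocab_size). Each pass commits the most confident
    predictions among still-masked positions, so the whole image needs
    only `steps` passes instead of `num_tokens` sequential ones.
    """
    MASK = -1                                # sentinel for "not yet generated"
    tokens = np.full(num_tokens, MASK)
    mask = np.ones(num_tokens, dtype=bool)   # True = still masked
    for step in range(steps):
        logits = predict_fn(tokens, mask)
        # Softmax to get a confidence score for every position.
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        pred = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        # Cosine schedule: how many tokens remain masked after this pass
        # (reaches zero on the final pass).
        keep = int(np.floor(num_tokens * np.cos(np.pi / 2 * (step + 1) / steps)))
        commit = int(mask.sum()) - keep
        masked_idx = np.flatnonzero(mask)
        # Commit the most confident predictions among masked positions.
        best = masked_idx[np.argsort(-conf[masked_idx])[:commit]]
        tokens[best] = pred[best]
        mask[best] = False
    return tokens

# Toy "model": always prefers token (position mod vocab_size).
def toy_predict(tokens, mask):
    n = len(tokens)
    logits = np.zeros((n, 8))
    logits[np.arange(n), np.arange(n) % 8] = 5.0
    return logits

print(maskgit_decode(toy_predict, 16, 8))  # all 16 tokens in 4 passes
```

The key contrast with autoregressive decoding is in the loop bound: the number of model calls scales with `steps`, not with the number of tokens, which is how a frame rate above 10 fps becomes feasible.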
While the core gameplay remains true to the original Quake II, Microsoft has highlighted several limitations of the demo. The AI-generated environment approximates the original game rather than replicating it exactly, and players may encounter blurry visuals during enemy interactions or inaccuracies in combat mechanics.
Moreover, WHAMM currently operates with a context window of only 0.9 seconds (nine frames at 10 fps). The model can therefore forget objects that stay out of view for longer than this, leading to unexpected scenarios: turning around may reveal a newly generated area, and looking up and then back down can shift the player's location on the map. The demo also exhibits noticeable latency, which Microsoft attributes to serving it to a wide audience.
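The nine-frame memory described above behaves like a fixed-length rolling window over recent frames and inputs. The sketch below (the `FrameContext` class and its method names are illustrative assumptions, not Microsoft's API) shows why anything older than the window simply vanishes from the model's conditioning context:

```python
from collections import deque

CONTEXT_FRAMES = 9  # 0.9 s of memory at 10 fps, as stated for WHAMM

class FrameContext:
    """Rolling window of the last CONTEXT_FRAMES (frame, action) pairs.

    A deque with maxlen silently drops the oldest entry on append,
    mirroring why objects out of view for longer than ~1 second can
    disappear: the model no longer sees them in its context.
    """

    def __init__(self, size=CONTEXT_FRAMES):
        self.buffer = deque(maxlen=size)

    def push(self, frame, action):
        self.buffer.append((frame, action))

    def window(self):
        # Oldest-to-newest conditioning context for the next frame.
        return list(self.buffer)

ctx = FrameContext()
for i in range(12):                 # simulate 1.2 s of gameplay at 10 fps
    ctx.push(f"frame{i}", "forward")
print(len(ctx.window()))            # 9: frames 0-2 have been forgotten
```

Extending this window is an obvious direction for future versions, but a longer context means more tokens per forward pass, which trades directly against the real-time frame rate discussed earlier.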