AMD has unveiled its latest artificial intelligence (AI) model, a version of Stable Diffusion 3 Medium optimized for its XDNA 2 neural processing units (NPUs). The company says it is the first model of its kind to run in the block floating point 16 (BF16) format. The model runs on newer Ryzen AI laptops with at least 24GB of RAM once Tensorstack’s Amuse 3.1 beta software is installed. Notably, Stable Diffusion 3 Medium is an on-device image generator that works without an Internet connection.
AMD’s Image Generation Model Can Generate Print-Ready Images
The Santa Clara-based company detailed the new model in a press release. The Stable Diffusion 3 Medium model is designed to take advantage of AMD’s XDNA 2 NPUs and is supported on Ryzen AI laptops launched in 2024 and later.
According to AMD, the model lets users create stock-quality images directly from text prompts. It generates images at a resolution of 1024×1024 pixels, then uses the NPU to upscale them to a print-ready 2048×2048 pixels.
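To put those resolutions in perspective, the short Python sketch below works through the arithmetic. The 300 DPI print density is an assumption used only for illustration; AMD does not state the print density behind its "print-ready" claim.

```python
# Resolutions quoted in the article.
base_side = 1024      # pixels per side before upscaling
upscaled_side = 2048  # pixels per side after upscaling

# Doubling each side quadruples the total pixel count.
scale_per_side = upscaled_side / base_side
pixel_ratio = (upscaled_side ** 2) / (base_side ** 2)

# ASSUMPTION: 300 dots per inch, a commonly used print density.
ASSUMED_DPI = 300
print_side_inches = upscaled_side / ASSUMED_DPI

print(f"Scale per side: {scale_per_side:.0f}x")
print(f"Total pixels: {pixel_ratio:.0f}x")
print(f"Print size at {ASSUMED_DPI} DPI: {print_side_inches:.2f} in per side")
```

Under that assumption, the upscaled image covers a square of a little under 7 inches per side.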
The model is included in AMD and Tensorstack’s new Amuse 3.1 desktop application, which is available as a free download. It is designed to work entirely offline: image generation runs on the device itself, on the XDNA 2 NPU.
AMD has also lowered the model’s memory requirement to 24GB of RAM, down from the 32GB needed by the previous Stable Diffusion XL Turbo model. While running, the new model uses only 9GB of RAM. AMD attributes this memory efficiency to the block floating point 16 (BF16) format.
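The general idea behind block floating point is that a group of values shares a single exponent while each value keeps only a small integer mantissa, which is where the memory saving comes from. The Python sketch below illustrates that idea only. The function names, the 8-bit mantissa width and the four-value block are hypothetical choices for the example and do not describe AMD's actual BF16 layout.

```python
import math

def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block of floats to one shared exponent plus integer mantissas.

    Illustrative only: the mantissa width is an assumption, not AMD's format.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1; e is the shared exponent.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round each value to the shared grid and clamp to the signed mantissa range.
    mantissas = [max(-limit, min(limit, round(x / scale))) for x in block]
    return shared_exp, mantissas

def bfp_dequantize(shared_exp, mantissas, mantissa_bits=8):
    """Rebuild approximate floats from the shared exponent and mantissas."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

block = [0.5, -0.25, 0.125, 0.03]
shared_exp, mantissas = bfp_quantize(block)
restored = bfp_dequantize(shared_exp, mantissas)
print(shared_exp, mantissas, restored)
```

Values close to the largest magnitude in the block are stored almost exactly, while much smaller values lose precision because they share the larger value's exponent.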
The company emphasized that the Stable Diffusion 3 Medium model adheres strictly to the structure and order of user prompts. Users are encouraged to state the type of image first, then its structural components, and finally specific details and context. The model also accepts negative prompts to exclude certain elements, and the placement of full stops affects how it interprets context.
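The recommended ordering can be sketched as a small Python helper that assembles a prompt from its three parts. This is a hypothetical illustration, not part of Amuse or any AMD tooling: the function name, its parameters and the choice to end each part with a full stop are all assumptions made for the example.

```python
def build_prompt(image_type, structure, details):
    """Assemble a prompt in the order: image type, structure, details/context.

    Hypothetical helper; each part becomes one sentence ending in a full stop.
    """
    groups = [image_type, ", ".join(structure), ", ".join(details)]
    # Drop empty parts and any trailing full stops before rejoining.
    sentences = [g.strip().rstrip(".") for g in groups if g.strip()]
    return ". ".join(sentences) + "." if sentences else ""

prompt = build_prompt(
    "Closeup photo",
    ["a red fox sitting on a mossy log"],
    ["soft morning light", "shallow depth of field"],
)
# Elements to exclude are kept apart from the main prompt.
negative_prompt = "text, watermark"

print(prompt)
print(negative_prompt)
```

Keeping the negative prompt as a separate string mirrors the article's description of it as a distinct input for excluding elements.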