On Thursday, Microsoft unveiled its inaugural in-house AI models, known as MAI-Voice-1 and MAI-1-preview. The MAI-Voice-1 speech model is capable of producing a minute of audio in less than one second using a single GPU, while MAI-1-preview promises to showcase future enhancements within the Copilot suite.
The MAI-Voice-1 model already powers several features, including Copilot Daily, where an AI host narrates the day's top news stories, and podcast-style discussions that explain complex topics.
Users can experiment with MAI-Voice-1 in Copilot Labs, where they can enter text for the AI to speak and customize its voice and speaking style. MAI-1-preview, meanwhile, was trained on approximately 15,000 Nvidia H100 GPUs and is designed to follow instructions and provide helpful responses to everyday queries.
Mustafa Suleyman, Microsoft's AI chief, said in an earlier episode of Decoder that the company's internal AI models are not primarily aimed at enterprise use. "We need to develop something that excels in consumer use and is tailored to our specific requirements," Suleyman said, emphasizing that the focus is on building models that work well as consumer companions.
Microsoft plans to deploy MAI-1-preview for certain text-based use cases within its Copilot AI assistant, which currently relies on OpenAI's large language models. The company has also begun publicly testing MAI-1-preview on the AI benchmarking site LMArena.
According to a blog post from Microsoft AI, “We have significant ambitions for our future direction. Beyond immediate advancements, we believe that orchestrating a collection of specialized models to serve diverse user intents and needs will unlock substantial value.”