MAI-Voice-1, Microsoft’s expressive speech generation model, is now powering the company's Copilot Daily feature.
MAI-1-preview is a text-based model designed to help users with everyday queries.
Microsoft unveiled the company's first two homegrown artificial intelligence (AI) models on Thursday. The foundation models were developed by the Redmond-based tech giant's AI division and are said to be built entirely in-house. The first is MAI-Voice-1, a speech generation model that natively produces expressive, natural-sounding voice. The second is MAI-1-preview, a general-purpose AI model currently in testing that is aimed at helping users with everyday queries. Microsoft said it is focusing on applied AI as a platform to develop new products that address the unique needs of its user base.
Since the start of the AI arms race in 2023, Microsoft has been reliant on OpenAI to power its Copilot-based AI products and tools. Over the years, the company has released smaller, fine-tuned models for specific purposes. Before Thursday, it was not a company one would associate with foundation model development.
Courtesy of its newly bolstered AI division, Microsoft is now changing that perception, detailing two foundation models it has created entirely from scratch. The company says MAI-Voice-1 is a speech generation model that can generate “a full minute of audio in under a second on a single GPU.”
MAI-Voice-1 is not a text-to-speech (TTS) model that can only read out text presented to it. Instead, it can carry out voice conversations with a user, automatically adjusting its intonation, pitch, and cadence based on the context. Microsoft is offering the model's capabilities via its Copilot Daily news feature and an experiment running in Copilot Labs.
First up is Copilot Daily, which features an AI host that narrates the top news stories of the day and discusses them in a podcast-style conversation to help explain the topics. The company also announced a Copilot Labs experiment that lets users provide any text, which the model then speaks aloud in the voice and narration style selected by the user.
Apart from this, the tech giant is currently testing MAI-1-preview, which is available via the crowdsourced AI model evaluation platform LMArena. Microsoft is also making it available to trusted testers via an application programming interface (API). The mixture-of-experts (MoE) model was pre-trained and post-trained on around 15,000 Nvidia H100 GPUs.
The company says the model excels at following instructions and generating responses to everyday queries. MAI-1-preview is planned to roll out for certain text-based use cases within Copilot over the next few weeks.