Microsoft has released MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 AI models.
Photo Credit: Microsoft
The Image-2 model is being rolled out to Copilot, Bing, and PowerPoint
Microsoft released three specialised artificial intelligence (AI) models on Thursday, covering image generation, voice generation, and speech-to-text transcription. The Redmond-based tech giant claims that these models outperform specialised models from rivals such as Google and OpenAI. The models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, are also said to focus on fast generation and competitive pricing. They are currently available via Microsoft Foundry and are also being rolled out to various consumer products.
In a newsroom post, the tech giant introduced the three new AI models. All of them are currently available via Microsoft Foundry and the MAI Playground. The biggest highlight is MAI-Transcribe-1, which the company claims delivers state-of-the-art (SOTA) speech-to-text transcription across the 25 most-used languages.
The claims are based on Microsoft's internal testing on the FLEURS benchmark, where MAI-Transcribe-1 is said to post a lower error rate than Gemini 3.1 Flash and GPT-Transcribe. Additionally, the company says Foundry users will find it to be the “best price-performance of any large cloud provider.”
Coming to MAI-Voice-1, the model is said to generate “natural, realistic speech, rich with nuance, emotional range, and expression.” It is also said to deliver consistent speech and voice identity during long-form content generation. Inside Foundry, the model will also allow users to create a custom voice with a few seconds of audio.
Microsoft claims that this voice creation process is safe and secure. MAI-Voice-1 is said to generate 60 seconds of audio in a single second. Notably, the AI model will also power Copilot Audio Expressions and Copilot Podcasts.
Finally, the MAI-Image-2 model builds on the capabilities of its predecessor and is said to deliver improved output quality at a faster speed. Microsoft revealed that the model was created in collaboration with photographers, designers, and visual storytellers, and it focuses on natural lighting, accurate textures, and clear in-image text. Notably, WPP is among the first enterprise partners to have adopted the AI model.
Like the other two, the model is available via Microsoft Foundry and the MAI Playground. It is also rolling out to Copilot, Bing, and PowerPoint.