Microsoft Releases New AI Models That Can Generate Images, Audio and Transcribe Text

Microsoft has released MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 AI models.

Written by Akash Dutta, Edited by Ketan Pratap | Updated: 3 April 2026 18:44 IST
Highlights
  • These models are available via Microsoft Foundry and MAI Playground
  • MAI-Transcribe-1 is said to outperform Google and OpenAI’s models
  • Voice-1 can generate realistic speech with an emotional range

The Image-2 model is being rolled out to Copilot, Bing, and PowerPoint

Photo Credit: Microsoft

Microsoft released three specialised artificial intelligence (AI) models on Thursday, focusing on image generation, voice generation, and speech-to-text transcription. The Redmond-based tech giant claims that these models outperform specialised models from rivals such as Google and OpenAI. The models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, are also said to focus on fast generation and competitive pricing. They are currently available via Microsoft Foundry and are also being rolled out to various consumer products.

Microsoft Brings Three New AI Models

In a newsroom post, the tech giant introduced the three new AI models. All of them are currently available via Microsoft Foundry and the MAI Playground. The biggest highlight is MAI-Transcribe-1, which the company claims delivers state-of-the-art (SOTA) speech-to-text transcription across the 25 most used languages.

The claims are based on Microsoft's internal testing on the FLEURS benchmark, where MAI-Transcribe-1 is said to achieve a lower error rate than Gemini 3.1 Flash and GPT-Transcribe. Additionally, the company says Foundry users will find it to be the “best price-performance of any large cloud provider.”

MAI-Voice-1, meanwhile, is said to generate “natural, realistic speech, rich with nuance, emotional range, and expression.” The model is also said to maintain consistent speech and voice identity during long-form content generation. Inside Foundry, it will also allow users to create a custom voice from a few seconds of audio.

Microsoft claims that this voice-creation process is safe and secure. The model is said to generate 60 seconds of audio in a single second. Notably, it will also power Copilot Audio Expressions and Copilot Podcasts.

Finally, the MAI-Image-2 model builds on the capabilities of its predecessor and is said to deliver improved output quality at a faster speed. Microsoft revealed that the model was created in collaboration with photographers, designers, and visual storytellers, and it focuses on natural lighting, accurate textures, and clear in-image text. Notably, WPP is among the first enterprise partners to have adopted the AI model.

Like the other two, the model is available via Microsoft Foundry and the MAI Playground. It is also rolling out to Copilot, Bing, and PowerPoint.

 

© Copyright Red Pixels Ventures Limited 2026. All rights reserved.