Microsoft Releases New AI Models That Can Generate Images and Audio, and Transcribe Speech

Microsoft has released MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 AI models.

Written by Akash Dutta, Edited by Ketan Pratap | Updated: 3 April 2026 18:44 IST
Highlights
  • These models are available via Microsoft Foundry and MAI Playground
  • MAI-Transcribe-1 is said to outperform Google and OpenAI’s models
  • Voice-1 can generate realistic speech with an emotional range

The Image-2 model is being rolled out to Copilot, Bing, and PowerPoint

Photo Credit: Microsoft

Microsoft released three specialised artificial intelligence (AI) models on Thursday, focusing on image generation, voice generation, and speech-to-text transcription. The Redmond-based tech giant claims that these models outperform specialised models from rival companies such as Google and OpenAI. The models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, are also said to focus on fast generation and competitive pricing. They are currently available via Microsoft Foundry, and they are also being rolled out to various consumer products.

Microsoft Brings Three New AI Models

In a newsroom post, the tech giant introduced the three new AI models. All of them are currently available via Microsoft Foundry and the MAI Playground. The biggest highlight is MAI-Transcribe-1, which the company claims delivers state-of-the-art (SOTA) speech-to-text transcription across the 25 most-used languages.


The claims are based on Microsoft's internal testing on the FLEURS benchmark. MAI-Transcribe-1 is said to outperform Gemini 3.1 Flash and GPT-Transcribe in error rate. Additionally, the company says Foundry users will find it to be the “best price-performance of any large cloud provider.”

MAI-Voice-1, meanwhile, is said to generate “natural, realistic speech, rich with nuance, emotional range, and expression.” The model is also said to deliver consistent speech and voice identity during long-form content generation. Inside Foundry, the model will also allow users to create a custom voice with a few seconds of audio.


Microsoft claims that the custom voice creation process is safe and secure. MAI-Voice-1 is said to generate 60 seconds of audio in a single second. Notably, the AI model will also power Copilot Audio Expressions and Copilot Podcasts.

Finally, the MAI-Image-2 model builds on the capabilities of its predecessor and is said to deliver improved output quality at a faster speed. Microsoft revealed that the model was created in collaboration with photographers, designers, and visual storytellers, and it focuses on natural lighting, accurate textures, and clear in-image text. Notably, WPP is among the first enterprise partners to have adopted the AI model.


Like the other two models, MAI-Image-2 is available via Microsoft Foundry and the MAI Playground. It is also rolling out to Copilot, Bing, and PowerPoint.

 

© Copyright Red Pixels Ventures Limited 2026. All rights reserved.