OpenAI Introduces New Audio Models in API, Can Be Used for Agentic Workflows

OpenAI introduced three new AI models: GPT-4o-transcribe, GPT-4o-mini-transcribe, and GPT-4o-mini-tts.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 21 March 2025 18:14 IST
Highlights
  • These models can be customised to speak in a certain manner
  • The text-to-speech models can express emotions through voice
  • OpenAI’s new generation of audio models outperforms its existing models

OpenAI says its transcription models can pick up accented speech in noisy environments

Photo Credit: OpenAI

OpenAI on Thursday introduced new audio models in its application programming interface (API) that offer improved accuracy and reliability. The San Francisco-based AI firm released three new artificial intelligence (AI) models covering speech-to-text transcription and text-to-speech (TTS) functions. The company claimed that these models will enable developers to build applications with agentic workflows. It also stated that the API can help businesses automate operations such as customer support. Notably, the new models are based on the company's GPT-4o and GPT-4o mini AI models.

OpenAI Brings New Audio Models in API

In a blog post, the AI firm detailed the new API-specific AI models. The company highlighted that in recent months it has released several agent-focused products, such as Operator, Deep Research, Computer-Using Agents, and the Responses API with built-in tools. However, it added that the true potential of agents can only be unlocked when they can perform intuitively and interact across mediums beyond text.


There are three new audio models. GPT-4o-transcribe and GPT-4o-mini-transcribe are speech-to-text models, and GPT-4o-mini-tts is, as the name suggests, a TTS model. OpenAI claims that these models outperform its existing Whisper models, which were first released in 2022. However, unlike the older models, the new ones are not open-source.

For GPT-4o-transcribe, the AI firm stated that it achieves a lower word error rate (WER) on the Few-shot Learning Evaluation of Universal Representations of Speech (FLEURS) benchmark, which tests AI models on multilingual speech across more than 100 languages. OpenAI said the improvements were a result of targeted training techniques such as reinforcement learning (RL) and extensive midtraining with high-quality audio datasets.
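
For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. Lower is better. The sketch below is a minimal illustration in Python, not the FLEURS evaluation code: the function name, the lowercase-and-split-on-whitespace normalisation, and the sample sentences are all assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    # Simplified normalisation (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level edit distance, computed one row at a time.
    # prev[j] is the distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word and one wrong word out of five.
print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))  # 0.4
```

Real benchmark scoring typically applies more careful text normalisation (punctuation, numerals, language-specific tokenisation) before counting errors, so scores from this sketch would not be directly comparable to published figures.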


These speech-to-text models can transcribe speech accurately even in challenging scenarios such as heavy accents, noisy environments, and varying speech speeds.

The GPT-4o-mini-tts model also comes with significant improvements. The AI firm claims that the model can speak with customisable inflections, intonations, and emotional expressiveness. This will enable developers to build applications that can be used for a wide range of tasks including customer service and creative storytelling. Notably, the model only offers artificial, preset voices.
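
As a rough illustration of what this customisation could look like for a developer, the Python sketch below assembles a speech request that pairs the text to be spoken with a plain-language style instruction. It assumes the official `openai` Python package, an `OPENAI_API_KEY` environment variable, the lowercase API model ID `gpt-4o-mini-tts`, and the preset voice `coral`. The helper names `build_speech_request` and `synthesize_to_file` and the sample text are hypothetical and not taken from OpenAI's announcement.

```python
def build_speech_request(text: str, style: str, voice: str = "coral") -> dict:
    """Assemble keyword arguments for a text-to-speech call (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text to speak must not be empty")
    return {
        "model": "gpt-4o-mini-tts",  # API model ID (assumed lowercase form)
        "voice": voice,              # one of the preset voices
        "input": text,               # the words to be spoken
        "instructions": style,       # how to speak them: tone, pacing, emotion
    }


def synthesize_to_file(request: dict, out_path: str) -> None:
    """Send the request and save the audio. Needs network access and an API key."""
    from openai import OpenAI  # imported lazily so the helper above runs without it

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)


request = build_speech_request(
    "Your replacement card is on its way.",
    "Speak in a calm, reassuring customer-support tone.",
)
print(request["instructions"])
# To generate audio: synthesize_to_file(request, "reply.mp3")
```

Keeping request construction separate from the network call makes the style instruction easy to vary and inspect without spending tokens on every change.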


OpenAI's API pricing page highlights that the GPT-4o-based audio model will cost $40 (roughly Rs. 3,440) per million input tokens and $80 (roughly Rs. 6,880) per million output tokens. The GPT-4o mini-based audio models, on the other hand, are priced at $10 (roughly Rs. 860) per million input tokens and $20 (roughly Rs. 1,720) per million output tokens.
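
To put the per-million-token rates in perspective, the short Python sketch below estimates the bill for a given workload using only the dollar figures quoted above. The tier labels, the function name, and the token counts in the example are hypothetical, and actual charges depend on OpenAI's current pricing page.

```python
from decimal import Decimal

# USD per million tokens as (input rate, output rate), as quoted in this article.
# The tier labels are hypothetical and are not API model IDs.
RATES = {
    "gpt-4o-audio": (Decimal("40"), Decimal("80")),
    "gpt-4o-mini-audio": (Decimal("10"), Decimal("20")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated cost in USD for the given token counts at the quoted rates."""
    input_rate, output_rate = RATES[tier]
    return (input_rate * input_tokens + output_rate * output_tokens) / MILLION


# Hypothetical workload: 250,000 input tokens and 50,000 output tokens.
for tier in RATES:
    print(tier, estimate_cost(tier, 250_000, 50_000))
```

At the quoted rates, this workload comes to $14 on the GPT-4o-based tier and $3.50 on the GPT-4o mini-based tier, a fourfold difference.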

All of the audio models are now available to developers via the API. OpenAI is also releasing an integration with its Agents software development kit (SDK) to help developers build voice agents.

 

© Copyright Red Pixels Ventures Limited 2026. All rights reserved.