OpenAI Introduces GPT-Realtime Speech Generation Model, Makes Realtime API Generally Available

OpenAI’s GPT-Realtime is reportedly the company’s most advanced voice model, designed for customer support and assistance.

Written by Akash Dutta, Edited by Ketan Pratap | Updated: 29 August 2025 13:21 IST
Highlights
  • OpenAI said the model was trained in collaboration with companies
  • GPT-Realtime will be available with new Cedar and Marin voices
  • The Realtime API was first released as a public beta in October 2024

OpenAI’s new speech generation model can also analyse and read text in images

Photo Credit: OpenAI

OpenAI, on Thursday, announced a new artificial intelligence (AI) speech generation model dubbed GPT-Realtime. This is an enterprise-focused model that is capable of generating native audio with low latency, enabling two-way, real-time voice conversations. The San Francisco-based AI firm said that compared to its existing voice models, the Realtime model offers higher quality output, lower processing times, as well as additional features such as tool calling, support for remote Model Context Protocol (MCP) servers and image input, and the ability to detect alphanumeric sequences in select non-English languages.

OpenAI Brings New Speech Model for Enterprises

In a post, the AI firm announced the release of its most advanced speech generation model, GPT-Realtime. A speech generation model differs from the traditional voice assistants that companies use for customer support. Those chain together multiple systems, such as speech-to-text and text-to-speech, to carry out a voice conversation with a human. In comparison, the OpenAI model natively processes speech input and generates the corresponding speech output, resulting in significantly lower response times.

GPT-Realtime features several new and enhanced capabilities. Similar to Advanced Voice Mode, it is capable of generating a highly expressive and natural-sounding voice, which developers can steer with text-based instructions. Two new voices are being introduced, the male voice Cedar and the female voice Marin, and the company is also updating its eight existing voices.


In terms of performance, the model can capture non-verbal cues, such as laughter, and respond to them. It can also switch languages mid-sentence and adapt to the user's tone. Based on internal evaluations, OpenAI claims that the model displays higher performance in detecting alphanumeric sequences (such as phone and policy numbers) in non-English languages, such as Chinese, French, Japanese, and Spanish.


The company claimed that GPT-Realtime scored 82.8 percent on the Big Bench Audio benchmark, which measures a voice model's accuracy and reasoning ability. That is 17.2 percentage points higher than its predecessor from December 2024, which scored 65.6 percent.

Additionally, OpenAI claimed that the speech generation model has higher instruction adherence, supports function and tool calling, and can be configured to support remote MCP servers. It can also analyse and read images, allowing use cases where users can upload an image for better context, and the model can then incorporate it into the conversation.


Notably, GPT-Realtime is an enterprise-focused offering, and it is exclusively available with the company's Realtime API, which is now generally available to all developers. The API was first introduced in October 2024 as a public beta.

As for pricing, GPT-Realtime will cost developers $32 (roughly Rs. 2,800) per million audio input tokens and $64 (roughly Rs. 5,600) per million audio output tokens. Cached input tokens are priced at $0.40 (roughly Rs. 35) per million.
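
To put those rates in perspective, the per-million prices can be turned into a rough per-session estimate. The Python sketch below is only an illustration: the function name and the token counts are hypothetical, and it assumes cached input tokens are billed at the cached rate instead of the regular input rate, which the announcement does not spell out in detail.

```python
# Per-million-token prices in USD, as quoted in the article.
PRICE_PER_MILLION_USD = {
    "input": 32.00,         # regular (uncached) input tokens
    "cached_input": 0.40,   # cached input tokens
    "output": 64.00,        # output tokens
}


def estimate_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Rough cost estimate in USD (hypothetical helper, not an OpenAI API).

    Assumption: input_tokens counts only uncached tokens, and cached
    tokens are charged at the cached rate instead of the input rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")

    total = (
        input_tokens * PRICE_PER_MILLION_USD["input"]
        + cached_input_tokens * PRICE_PER_MILLION_USD["cached_input"]
        + output_tokens * PRICE_PER_MILLION_USD["output"]
    ) / 1_000_000
    return round(total, 4)


if __name__ == "__main__":
    # Made-up token counts for a single support call.
    cost = estimate_cost_usd(
        input_tokens=50_000,
        output_tokens=20_000,
        cached_input_tokens=10_000,
    )
    print(f"Estimated cost: ${cost}")
```

With these made-up counts, the estimate works out to about $2.88, most of it from the uncached input and output tokens.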

 
