OpenAI Introduces Flex Processing in API to Help Developers Cut AI Usage Costs

OpenAI says Flex processing will offer lower inference costs in exchange for slower response times.

Advertisement
Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 18 April 2025 17:39 IST
Highlights
  • Flex processing is currently available in beta for o3 and o4-mini models
  • It can return a resource unavailability error occasionally
  • Flex processing will reduce API inference costs by half

OpenAI recommends that developers increase the timeout duration for lengthy prompts

Photo Credit: Unsplash/Solen Feyissa

OpenAI introduced a new service tier for developers on Thursday via its application programming interface (API). Dubbed Flex processing, it reduces the AI usage costs by half for developers, compared to standard pricing. However, the lowered prices come with the consequence of slower response times and occasional resource unavailability. The new API feature is currently available in beta for select reasoning-focused large language models (LLMs). The San Francisco-based AI firm said this service tier can be useful for non-production and non-priority tasks.

OpenAI Adds New Service Tier in API

In its support page, the AI firm detailed this service tier. The Flex processing is currently available in beta for Chat Completions and Responses APIs, and works with the o3 and o4-mini AI models. Developers can set the service tier parameter to Flex in API request to activate the new mode.

Advertisement

One downside of the cheaper API pricing is that the processing time will be significantly higher. OpenAI says developers opting for Flex processing should expect slower response times and occasional resource unavailability. Additionally, users may also face API request timeout issues, in case the prompt is lengthy or the request is complex. As per the AI firm, this mode can be helpful for non-production or low-priority tasks such as model evaluations, data enrichment, or asynchronous workloads.

Notably, OpenAI highlights that developers can avoid timeout errors by increasing the default timeout. By default, these APIs are set to timeout at 10 minutes. However, with Flex processing, lengthy and complex prompts can take longer than that. The company suggests increasing the timeout will reduce the chances of getting a error.

Advertisement

Additionally, Flex processing might sometimes lack resources to handle developers' requests, and instead flag the “429 Resource Unavailable” error code. To manage these scenarios, developers can retry requests with exponential backoff, or switch to the default service tier if timely completion is necessary. OpenAI said it will not charge developers when they receive this error.

Currently, the o3 AI model charges $10 (roughly Rs. 854) per million input tokens and $40 (roughly Rs. 3,418) per million output tokens in the standard mode. The Flex processing brings down the input cost to $5 (roughly Rs. 427) and the output cost to $20 (roughly Rs. 1,709). Similarly, the new service tier will charge $0.55 (roughly Rs. 47) per million input tokens and $2.20 (roughly Rs. 188) per million output tokens for the o4-mini AI model, instead of $1.10 (roughly Rs. 94) for input and $4.40 (roughly Rs. 376) for output in the standard mode.

 

Get your daily dose of tech news, reviews, and insights, in under 80 characters on Gadgets 360 Turbo. Connect with fellow tech lovers on our Forum. Follow us on X, Facebook, WhatsApp, Threads and Google News for instant updates. Catch all the action on our YouTube channel.

Advertisement

Related Stories

Popular Mobile Brands
  1. Realme Narzo Days Sale Brings Discounts on These Narzo Series Phones
  2. New Game From Assassin's Creed Creator Faces Backlash Over AI Assets
  3. ColorOS 17 to Focus on User Experience, No Major Design Changes Expected
  4. Moto G Max 5G With a 200-Megapixel Rear Camera Arrives at This Price
  5. YouTube Brings Its In-App Chat and Video Sharing Features to More Countries
  6. Samsung's One UI 9 Beta Is Now Available to Test on the Galaxy S26 Series
  7. Vi 5G Comes to More Cities; Services Restored on Mumbai Metro Aqua Line 3
  8. Oppo Reno 16 Series Price, Storage Variants Leak Ahead of Launch
  9. FIFA World Cup 2026: How to Watch the World Cup Live on OTT, TV Channels
  1. Honor X7e Plus 5G Reportedly Listed on TDRA and SGS Databases, May Launch in UAE and Other Global Markets
  2. 1666 Amsterdam Developer Apologises for AI Assets in Playable Demo, Promises No AI in Final Game
  3. Samsung's One UI 9 Beta Now Available to Test on Galaxy S26 Series; Wider Roll Out Could Follow
  4. Moto G Max 5G Launched With 5,200mAh Battery, 200-Megapixel Rear Camera: Price, Specifications
  5. Oppo Working on Improving User Experience With ColorOS 17 Update, Executive Says No New Features or Major Design Updates Expected
  6. OpenAI Said to Be Considering Lower Token Pricing Amidst Growing Rivalry With Anthropic
  7. Wikipedia Brings ‘Which Came First?’ History Trivia Game to iPhone a Year After Launch on Android
  8. YouTube Brings Its In-App Chat and Video Sharing Features to More Countries
  9. Oppo Reno 16 Series Price, Storage Variants Leak via European Retailer Listing
  10. Instagram, Facebook and WhatsApp Get Football-Themed Features Ahead of FIFA World Cup 2026
Download Our Apps
Available in Hindi
© Copyright Red Pixels Ventures Limited 2026. All rights reserved.