OpenAI called GPT-5.5 models a “new class of intelligence for real work and powering agents.”
GPT-5.5 outperforms Claude Opus 4.7 and Gemini 3.1 Pro on the Terminal-Bench 2.0 benchmark
Photo Credit: Unsplash/Solen Feyissa
OpenAI on Thursday introduced a major update to its GPT-5 series of artificial intelligence (AI) models. Called GPT-5.5, the new models are said to offer improved intent understanding, agentic coding, and reasoning capabilities. The San Francisco-based AI giant claims that the models outperform Anthropic's latest Claude Opus 4.7 and Google's Gemini 3.1 Pro models across a wide range of tasks. These large language models (LLMs) are rolling out first to the company's paid subscribers and will also be offered via the application programming interface (API).
In a post, the company introduced and detailed the GPT-5.5 series of AI models. There are three variants: the base model, GPT-5.5 Thinking, and GPT-5.5 Pro. OpenAI is rolling out GPT-5.5 and the Thinking model to Plus, Pro, Business, and Enterprise users in ChatGPT. GPT-5.5 Pro, however, is not available to Plus users.
In Codex, the GPT-5.5 AI model is available to Go, Plus, Pro, Business, Enterprise, and Edu subscribers. The coding platform will offer a context window of 400K tokens and a Fast mode. OpenAI said the AI models will soon be available via the Responses and Chat Completions APIs. The pricing is set at $5 (roughly Rs. 471) per million input tokens and $30 (roughly Rs. 2,828) per million output tokens.
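To put the quoted API pricing in perspective, here is a minimal sketch of how the cost of a single request could be estimated. It assumes only the two per-million-token rates cited above; the function name `estimate_cost_usd` and the sample token counts are hypothetical, and actual billing may differ (for instance, if cached or batched tokens are priced separately).

```python
# Prices quoted in the article, in US dollars per one million tokens.
INPUT_USD_PER_MILLION = 5.0
OUTPUT_USD_PER_MILLION = 30.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request in US dollars.

    Hypothetical helper: it applies only the two flat rates above.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 10,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 10_000):.2f}")
```

At these rates, output tokens cost six times as much as input tokens, so long generated responses are likely to dominate the bill for agentic coding sessions.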
As for improvements, the models focus heavily on agentic coding and intent understanding. OpenAI claims that GPT-5.5 achieved 82.7 percent on the Terminal-Bench 2.0 benchmark, which tests command-line workflows, outperforming the latest models from both Anthropic and Google. It also scored 73.1 percent on SWE-Bench Pro, the company's internal evaluation for long-horizon coding tasks.
In real-world performance, it is said that Codex, powered by GPT-5.5, can handle code implementation, refactoring, debugging, testing, and validation. OpenAI says the model can also reason through ambiguous failures, check assumptions with tools, and carry changes across the surrounding codebase.
Another key area is knowledge work. The model is said to be more intuitive than its predecessor and to offer improved intent understanding. It can better gauge what the user really wants, find relevant information, and review its output to generate something useful. These capabilities are also claimed to make the model better at scientific and technical research. GPT-5.5 can gather evidence, test assumptions, interpret results, and decide on the next steps, OpenAI said.