Date: 2026-02-06

OpenAI released GPT-5.3-Codex, a new artificial intelligence (AI) model for agentic coding tasks, on Thursday. The San Francisco-based AI giant said that the new model will allow Codex to do more, including developing complex video games and applications from scratch. Calling it the company's most capable agentic coding model, OpenAI said it can handle full workflows, debug entire codebases, research requirements, and deploy changes. Interestingly, GPT-5.3-Codex is also the AI giant's first model to have played a key role in its own development.

GPT-5.3-Codex Released

In a post, the AI giant detailed its latest frontier model for agentic coding. It is currently available on all paid ChatGPT plans globally, across the mobile and desktop apps, the command-line interface (CLI), the integrated development environment (IDE) extension, and the web. The company will soon offer the model via OpenAI's application programming interface (API) as well.

The latest model fuses the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge depth of GPT-5.2, creating a single, unified system. The model is 25 percent faster, allowing it to tackle long-horizon projects that involve research, tool usage and intricate execution steps. Crucially, users can now steer it mid-task, which was not possible earlier. They can get progress updates, ask questions, suggest course corrections, or debate approaches without the agent dropping context.

The model even helped build itself. Early versions assisted the Codex team in debugging training runs, managing deployment and diagnosing evaluation results. OpenAI says the team was impressed by how much acceleration this self-assistance provided during development.

On performance, OpenAI shared benchmark scores based on internal evaluations. On SWE-Bench Pro, a tough real-world software engineering test across multiple languages, GPT-5.3-Codex scores 56.8 percent, edging out GPT-5.2-Codex at 56.4 percent and GPT-5.2 at 55.6 percent. On Terminal-Bench 2.0, it jumps to 77.3 percent from 64.0 percent for the prior Codex variant. On OSWorld-Verified, which measures agent performance in visual desktop environments for productivity tasks, it reaches 64.7 percent compared to 38.2 percent for GPT-5.2-Codex.
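To put those scores side by side, the gaps can be worked out as percentage-point differences. The short Python sketch below does only that arithmetic on the figures quoted above. The `SCORES` table and the `point_gain` helper are illustrative names invented for this example and are not part of any OpenAI tooling, and the Terminal-Bench 2.0 baseline is assumed to be GPT-5.2-Codex, which the article describes only as the prior Codex variant.

```python
# Benchmark scores (percent) as quoted in the article.
# The table layout and names are illustrative assumptions.
SCORES = {
    "SWE-Bench Pro": {"GPT-5.3-Codex": 56.8, "GPT-5.2-Codex": 56.4},
    # Baseline assumed to be GPT-5.2-Codex ("the prior Codex variant").
    "Terminal-Bench 2.0": {"GPT-5.3-Codex": 77.3, "GPT-5.2-Codex": 64.0},
    "OSWorld-Verified": {"GPT-5.3-Codex": 64.7, "GPT-5.2-Codex": 38.2},
}


def point_gain(benchmark, new="GPT-5.3-Codex", old="GPT-5.2-Codex"):
    """Return the gain of `new` over `old` in percentage points (1 decimal)."""
    scores = SCORES[benchmark]
    return round(scores[new] - scores[old], 1)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: +{point_gain(name)} percentage points")
```

By this arithmetic, the gain is 0.4 percentage points on SWE-Bench Pro, 13.3 on Terminal-Bench 2.0 and 26.5 on OSWorld-Verified, so the improvement is concentrated in the terminal and desktop agent benchmarks rather than in SWE-Bench Pro.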

GPT-5.3-Codex can build complex web games from underspecified prompts, iterating autonomously over millions of tokens. One demo shared in the announcement post showed a racing game complete with maps, items and racers. It also generates more production-ready websites, automatically handling features like discount displays or testimonial carousels. Beyond pure coding, it supports the full software lifecycle, including writing product requirements documents (PRDs), editing copy, conducting user research, building slide decks, analyzing spreadsheets and monitoring systems.

OpenAI has also focused on the safety guardrails of the AI model. GPT-5.3-Codex is the first model OpenAI classifies as High capability for cybersecurity tasks under its Preparedness Framework. It deploys a comprehensive cybersecurity safety stack, including safety training, automated monitoring, trusted access controls, and threat intelligence enforcement.