Date: 2025-11-14

Google DeepMind introduced Scalable Instructable Multiworld Agent (SIMA) 2, an artificial intelligence (AI) agent, on Thursday. It is the successor to SIMA, which was unveiled in March 2024, and brings several improvements. SIMA 2 is powered by Gemini models and can now think about its actions, reason over them, and even interact with the user via a text interface. The core functionality remains the same: it is designed to play 3D open-world video games, but it now does so more effectively. The company says SIMA 2 also improves over time, learning from its experiences.

SIMA 2 Can Now Reason, Interact, and Play Games Better

In a blog post, Google DeepMind introduced and detailed the SIMA 2 AI agent. Powered by Gemini, it can not only execute tasks given by humans but also understand what is being asked, reason about the environment, and plan its next steps accordingly.

The system ingests visual input (the game screen or virtual world imagery) and a human-issued goal (for example, “build a shelter” or “find the red house”). The agent then interprets that goal, breaks it down into intermediate actions, and performs them via keyboard- and mouse-style outputs.
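The loop described above can be illustrated with a minimal Python sketch. It is not DeepMind's implementation: `plan_steps`, `step_to_actions`, `run_agent`, the `Action` type, and the lookup-table "planner" are all hypothetical stand-ins, and the real system uses a Gemini model where this sketch uses a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """A keyboard- or mouse-style output (hypothetical representation)."""
    device: str   # "keyboard" or "mouse"
    command: str  # free-form description of the input to send


def plan_steps(goal: str, frame: str) -> List[str]:
    """Hypothetical planner: a lookup table stands in for the reasoning model."""
    playbook = {
        "build a shelter": ["collect wood", "place walls", "place roof"],
    }
    # Unknown goals are treated as a single step.
    return playbook.get(goal, [goal])


def step_to_actions(step: str, frame: str) -> List[Action]:
    """Hypothetical mapping from one intermediate step to low-level inputs."""
    return [
        Action("keyboard", f"navigate: {step}"),
        Action("mouse", f"interact: {step}"),
    ]


def run_agent(goal: str,
              observe: Callable[[], str],
              act: Callable[[Action], None]) -> List[Action]:
    """Observe the screen, plan intermediate steps, then emit actions."""
    performed: List[Action] = []
    for step in plan_steps(goal, observe()):
        frame = observe()  # look at the screen again before each step
        for action in step_to_actions(step, frame):
            act(action)
            performed.append(action)
    return performed


if __name__ == "__main__":
    # A fake game: the "screen" is a fixed string and actions are printed.
    for a in run_agent("build a shelter", lambda: "frame", print):
        pass
```

The point of the sketch is the separation of concerns: the goal and the current frame go in, a plan of intermediate steps comes out, and only the last stage touches the keyboard or mouse.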

One of the biggest improvements in SIMA 2 is its ability to familiarise itself with new games and environments it has not been trained on. The system was evaluated in previously unseen games, such as MineDojo (a research version of Minecraft) and ASKA, a Viking survival game, and achieved higher success rates than its predecessor.

It also accepts multimodal prompts (sketches, emojis, different languages) and can transfer concepts. For instance, it may learn “mining” in one game and apply its learnings to the notion of “harvesting” in another, without having to start from zero.

As for the training setup, SIMA 2 is trained on human-demonstration data combined with annotations auto-generated by Gemini. Additionally, whenever the agent learns a new motion or skill in a novel environment, that experience data is collected and fed back to train subsequent generations of the agent. DeepMind says this reduces reliance on human-labelled data and allows SIMA 2 to continue improving from its own play.
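As a rough illustration of that feedback loop, the sketch below merges self-play experience into the dataset for the next generation. Everything here is an assumption made for illustration: the `Episode` type, the `score` field, the `min_score` threshold, and the idea of filtering by score are not details given in the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """One recorded attempt at a task (hypothetical structure)."""
    task: str
    actions: List[str]
    source: str          # "human" or "self_play"
    score: float = 1.0   # assumed quality score in [0, 1]


def next_generation_dataset(current: List[Episode],
                            self_play: List[Episode],
                            min_score: float = 0.5) -> List[Episode]:
    """Keep existing data and add self-play episodes that pass the threshold."""
    kept = [ep for ep in self_play if ep.score >= min_score]
    return current + kept


if __name__ == "__main__":
    human = [Episode("mine ore", ["walk", "dig"], "human")]
    played = [
        Episode("harvest wheat", ["walk", "cut"], "self_play", score=0.9),
        Episode("harvest wheat", ["walk", "wander"], "self_play", score=0.1),
    ]
    dataset = next_generation_dataset(human, played)
    print(len(dataset), "episodes for the next generation")
```

In this toy version, human demonstrations seed the dataset and each round of play adds only the attempts judged good enough, which is one simple way a system could lean less on human-labelled data over time.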

SIMA 2 still has limitations: the model's memory of past interactions remains constrained, very long-horizon reasoning (many steps ahead) remains challenging, and precise low-level actions (such as robot-style joint control) are not addressed within this game-world framework.

Despite its prowess in video games, SIMA 2 is not being developed to become a gaming assistant. DeepMind believes that by training and testing the agent in varied 3D worlds, what it learns can then be applied to embodied AI, which powers robots that work in the real world. Ultimately, the goal is to create a general-purpose robot capable of handling multiple tasks and controllable via natural language instructions.