Microsoft Announces Magma Foundation Model That Can Complete Multimodal Agentic Tasks

Magma is pre-trained on large, heterogeneous vision-language (VL) datasets, including images, videos, and robotics data.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 21 February 2025 19:23 IST
Highlights
  • Magma can not only understand multimodal input but also act upon it
  • Magma also features spatial intelligence
  • It features Set-of-Mark and Trace-of-Mark technical components
Microsoft said the foundation model ranked competitively in benchmark tests, as per internal testing

Photo Credit: Reuters

Microsoft researchers announced a new foundation model on Wednesday that can perform agentic functions. Dubbed Magma, the artificial intelligence (AI) model is pre-trained on large datasets spanning text, images, videos, and spatial data. The Redmond-based tech giant said that Magma is an extension of vision-language (VL) models and that it can not only understand multimodal information but also plan and act on it. The AI agent-enabled model can be used for a wide range of tasks, including computer vision, user interface (UI) navigation, and robot manipulation.

Microsoft Announces Magma Foundation Model

In a GitHub post, Microsoft researchers detailed the new Magma foundation model. Foundation models are large AI models trained on broad data, and they often become the baseline for other models in a series. Magma stands out because it is pre-trained on a wide range of datasets, including images, videos, and robotics data.

The researchers stated that the base architecture behind Magma is the Llama 3 AI model. However, Magma is also equipped with the ability to plan and act in the visual-spatial world. This allows the model to not only generate outputs like a chatbot but also execute actions.

It can be used as a computer vision chatbot that can offer information about the world it views when paired with camera sensors. Magma can also be used to control the UI of a device. But more interestingly, it can also control robots to complete complex tasks using agentic capabilities.

The researchers said a major reason behind these capabilities is the diverse dataset along with two technical components: Set-of-Mark and Trace-of-Mark. The former enables action grounding in images, videos and spatial data by having the model predict numeric marks for actionable objects, such as buttons or robot arms, in image space. The latter feeds the model temporal video dynamics and makes it predict how those marks will move in future frames before it takes action. This allows the model to develop a strong spatial understanding.
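To make the two ideas concrete, here is a minimal Python sketch of how numeric marks could be mapped to actions. It is an illustration only and not Microsoft's implementation: the `Mark` class and the `assign_marks`, `ground_action` and `trace_of_mark` functions are hypothetical names, and the bounding boxes and positions are made-up inputs that a real system would obtain from a detector or tracker.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
Point = Tuple[int, int]           # (x, y) in pixels


@dataclass(frozen=True)
class Mark:
    """A numbered candidate region in image space (hypothetical structure)."""
    index: int
    box: Box

    def center(self) -> Point:
        left, top, right, bottom = self.box
        return ((left + right) // 2, (top + bottom) // 2)


def assign_marks(boxes: List[Box]) -> List[Mark]:
    """Set-of-Mark style labelling: number each actionable region from 1."""
    return [Mark(index=i, box=box) for i, box in enumerate(boxes, start=1)]


def ground_action(marks: List[Mark], predicted_index: int) -> Point:
    """Turn the mark number a model predicts into a click position."""
    by_index = {mark.index: mark for mark in marks}
    if predicted_index not in by_index:
        raise ValueError(f"no mark numbered {predicted_index}")
    return by_index[predicted_index].center()


def trace_of_mark(frames: List[Dict[int, Point]], mark_index: int) -> List[Point]:
    """Trace-of-Mark style label: the path one mark follows across frames."""
    return [frame[mark_index] for frame in frames]


if __name__ == "__main__":
    # Two on-screen buttons, given as assumed bounding boxes.
    marks = assign_marks([(10, 10, 110, 50), (10, 70, 110, 110)])
    # Suppose the model answers "2": the action becomes a click at its center.
    print(ground_action(marks, 2))
    # Positions of mark 1 (for example, a robot gripper) over three frames.
    frames = [{1: (5, 5)}, {1: (7, 6)}, {1: (9, 8)}]
    print(trace_of_mark(frames, 1))
```

In this sketch the model only has to output a small integer rather than raw pixel coordinates, which is the intuition behind grounding actions through marks. The trace is simply the ordered list of positions a mark occupies over time.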

Microsoft researchers also shared the AI model's benchmark scores, which are based on internal testing. According to the researchers, it achieved competitive scores across all the agentic evaluation tests, outperforming models by OpenAI, Alibaba, and Google. The company has not yet made Magma publicly available.

 
