Microsoft Announces Magma Foundation Model That Can Complete Multimodal Agentic Tasks

Magma is pre-trained on large amounts of heterogeneous VL datasets including images, videos and robotics data.

Highlights
  • Magma can not only understand multimodal input but also act on it
  • Magma also features spatial intelligence
  • It features Set-of-Mark and Trace-of-Mark technical components
Microsoft said the foundation model scored competitively on benchmarks in its internal testing

Photo Credit: Reuters

Microsoft researchers on Wednesday announced a new foundation model that can perform agentic functions. Dubbed Magma, the artificial intelligence (AI) model is pre-trained on a large volume of data spanning text, images, videos, and spatial formats. The Redmond-based tech giant said that Magma is an extension of vision-language (VL) models and that it can not only understand multimodal information but also plan and act on it. The model can be used for a wide range of tasks, including computer vision, user interface (UI) navigation, and robot manipulation.

Microsoft Announces Magma Foundation Model

In a GitHub post, Microsoft researchers detailed the new Magma foundation model. Foundation models are large AI models pre-trained on broad data that can be adapted to a wide range of downstream tasks. They often become the baseline for other models in a series. Magma stands out because it is pre-trained on heterogeneous datasets that include images, videos and robotics data.

The researchers stated that the base architecture behind Magma is the Llama 3 AI model. However, Magma is also equipped with the ability to plan and act in the visual-spatial world. This allows the model to not only generate outputs like a chatbot but also execute actions.

It can be used as a computer vision chatbot that can offer information about the world it views when paired with camera sensors. Magma can also be used to control the UI of a device. But more interestingly, it can also control robots to complete complex tasks using agentic capabilities.

The researchers said a major reason behind these capabilities is the diverse dataset along with two technical components — Set-of-Mark and Trace-of-Mark. The former enables action grounding in images, videos and spatial data by having the model predict numeric marks for buttons or robot arms in image space. The latter feeds the model temporal video dynamics and makes it predict the next frames before it takes action. This allows the model to develop a strong spatial understanding.
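To make the Set-of-Mark idea more concrete, here is a minimal sketch of how a predicted numeric mark could be turned into an on-screen action. It is an illustration only, not Microsoft's implementation: the `Mark` type, the `ground_action` helper, and the sample button coordinates are all hypothetical, and the model's prediction is hard-coded rather than produced by Magma.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """A numbered candidate region overlaid on a screenshot (hypothetical type)."""
    mark_id: int
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int

    def center(self) -> tuple[int, int]:
        # The point an agent would click or move towards.
        return (self.x + self.width // 2, self.y + self.height // 2)


def ground_action(marks: list[Mark], predicted_id: int) -> tuple[int, int]:
    """Map the mark number a model predicts to pixel coordinates in image space."""
    by_id = {m.mark_id: m for m in marks}
    if predicted_id not in by_id:
        raise ValueError(f"model predicted unknown mark {predicted_id}")
    return by_id[predicted_id].center()


# Assumed example: two buttons detected on a UI screenshot.
marks = [
    Mark(mark_id=1, x=10, y=20, width=100, height=40),
    Mark(mark_id=2, x=10, y=80, width=100, height=40),
]

# Stand-in for the model's output; a real system would get this from the model.
click_xy = ground_action(marks, predicted_id=2)
print(click_xy)
```

The point of the design is that the model only has to pick a number from a small set of labelled candidates, which is an easier prediction target than raw pixel coordinates.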

Microsoft researchers also shared the AI model's benchmark scores, which are based on internal testing. According to the researchers, it achieved competitive scores across all the agentic evaluation tests, outperforming models from OpenAI, Alibaba, and Google. The company has not yet made Magma publicly available.

© Copyright Red Pixels Ventures Limited 2025. All rights reserved.