ByteDance Unveils Bagel Open Source Multimodal AI Model With Support for Generating, Editing Images

ByteDance’s Bagel is a visual language model (VLM) with 14 billion parameters.

Written by Akash Dutta, Edited by David Delima | Updated: 27 May 2025 16:23 IST
Highlights
  • Bagel is said to outperform Gemini-2-exp in image editing
  • It is said to outperform Qwen2.5-VL in image understanding
  • The AI model is available to download with an Apache 2.0 licence

ByteDance said Bagel can generate and edit images while using reasoning capabilities

Photo Credit: Unsplash/Markus Winkler

ByteDance released a new multimodal artificial intelligence (AI) model last week. Dubbed Bagel, it is a visual language model (VLM), which is capable of understanding, generating, and editing images. The Beijing-based tech giant has open-sourced the AI model, and it is available to download via popular AI repositories such as GitHub and Hugging Face. The company claims Bagel is capable of free-form visual manipulation, multiview synthesis, and world navigation, which makes it more capable in image editing compared to existing open-source VLMs.

ByteDance's Bagel Outperforms Gemini-2-exp in Image Editing

A GitHub listing page sheds more light on ByteDance's Bagel AI model, including its weights and datasets. However, the company did not provide details about the post-training processes or the architecture of the model. The model is currently available under a permissive Apache 2.0 licence, which allows both academic and commercial usage.

Bagel is a multimodal AI model that accepts both text and images as input. The open-source VLM has a total of 14 billion parameters, of which seven billion are active at a time. ByteDance claims that the model was trained on large-scale interleaved multimodal data. This means that different types of data, such as text and images, were combined when fed to the AI system. As a result, the model learned from both modalities jointly, instead of separately.

This method allows foundation models to gain context across different modalities. For instance, if Bagel were fed images together with their captions, it would be better able to understand exactly what the text represents visually. According to the company, this results in more efficient output.
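
As a rough illustration of what "interleaved" means here, the Python sketch below flattens image and caption pairs into one ordered sequence. It is not ByteDance's data pipeline, which the article does not describe. The `TextSegment`, `ImageSegment`, and `interleave` names and the file paths are all hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class TextSegment:
    text: str


@dataclass
class ImageSegment:
    # Hypothetical file path; a real pipeline would carry pixel data or image tokens.
    path: str


Segment = Union[TextSegment, ImageSegment]


def interleave(pairs: List[Tuple[str, str]]) -> List[Segment]:
    """Flatten (image_path, caption) pairs into a single ordered sequence."""
    sequence: List[Segment] = []
    for image_path, caption in pairs:
        # Each image is immediately followed by the text that describes it,
        # so both modalities sit side by side in one training sample.
        sequence.append(ImageSegment(image_path))
        sequence.append(TextSegment(caption))
    return sequence


sample = interleave([
    ("cat.jpg", "A cat sitting on a windowsill."),
    ("dog.jpg", "A dog running on a beach."),
])
kinds = [type(segment).__name__ for segment in sample]
print(kinds)
```

Keeping each caption next to its image in a single sequence, rather than in separate text-only and image-only datasets, is the idea the paragraph above describes.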

ByteDance also claims that the AI model displays better image editing capabilities than existing open-source VLMs. It can perform complex tasks such as adding emotion to an image; removing, replacing, or adding elements; transferring styles; and making free-form edits. The company claims that this ability allows Bagel to produce significantly higher-quality output in world-modelling.

World-modelling refers to an AI system's internal understanding of how the real world functions visually. This would include the relationship between different objects, physical context, and the effect of physical factors such as light, wind, rain, and gravity.

Based on internal testing, ByteDance claims that Bagel was able to outperform Qwen2.5-VL-7B, a similarly sized model, in image understanding. It is also said to score higher in image generation benchmarks than Janus-Pro-7B and Flux-1-dev, and to beat Gemini-2-exp on the GEdit-Bench for image editing.

Those who wish to try out the AI model without locally running it can head to Hugging Face, where ByteDance has set up a cloud-based interface to test its image analysis, generation, and editing.

 
