ByteDance Unveils Bagel Open Source Multimodal AI Model With Support for Generating, Editing Images

ByteDance’s Bagel is a visual language model (VLM) with 14 billion parameters.

Written by Akash Dutta, Edited by David Delima | Updated: 27 May 2025 16:23 IST
Highlights
  • Bagel is said to outperform Gemini-2-exp in image editing
  • It is said to outperform Qwen2.5-VL in image understanding
  • The AI model is available to download with an Apache 2.0 licence

ByteDance said Bagel can generate and edit images while using reasoning capabilities

Photo Credit: Unsplash/Markus Winkler

ByteDance released a new multimodal artificial intelligence (AI) model last week. Dubbed Bagel, it is a visual language model (VLM) capable of understanding, generating, and editing images. The Beijing-based tech giant has open-sourced the AI model, and it is available to download via platforms such as GitHub and Hugging Face. The company claims Bagel is capable of free-form visual manipulation, multiview synthesis, and world navigation, and says these abilities make it better at image editing than existing open-source VLMs.

ByteDance's Bagel Outperforms Gemini-2-exp in Image Editing

A GitHub listing page sheds more light on ByteDance's Bagel AI model, including its weights and datasets. However, the company did not provide details about the post-training processes or the architecture of the model. It is currently available under a permissive Apache 2.0 licence, which allows both academic and commercial use.


Bagel is a multimodal AI model that accepts both text and images as input. The open-source VLM has a total of 14 billion parameters, of which seven billion are active at any given time. ByteDance claims that the model was trained on large-scale interleaved multimodal data. This means that different types of data, such as text and images, were mixed together in the data fed to the AI system. As a result, the model learned from both modalities jointly, instead of separately.

This method helps foundation models learn context across modalities. For instance, if Bagel were fed images together with their captions, it would better understand what the text represents visually. According to the company, this results in more efficient output.
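To make the idea of interleaved training data more concrete, here is a minimal Python sketch of how a single interleaved sample might be flattened into one ordered sequence, so that an image and its caption sit side by side. It is an illustration under stated assumptions, not ByteDance's pipeline: the `Text` and `Image` segment types, the `<img>`/`</img>` markers, the naive whitespace tokenizer, and the fixed `IMAGE_TOKENS` count of four placeholder patches per image are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    content: str


@dataclass
class Image:
    # Hypothetical: a file name stands in for real pixel data.
    path: str


Segment = Union[Text, Image]

# Hypothetical: each image becomes a fixed number of placeholder "patch"
# tokens. Real systems typically derive image tokens from the pixels.
IMAGE_TOKENS = 4


def flatten(sample: List[Segment]) -> List[str]:
    """Flatten an interleaved sample into one token sequence, keeping order."""
    tokens: List[str] = []
    for seg in sample:
        if isinstance(seg, Text):
            # Naive whitespace split used as a stand-in tokenizer.
            tokens.extend(seg.content.split())
        else:
            # Wrap the image's placeholder tokens in hypothetical markers.
            tokens.append("<img>")
            tokens.extend(f"<patch:{seg.path}:{i}>" for i in range(IMAGE_TOKENS))
            tokens.append("</img>")
    return tokens


# An image followed by its caption, combined into a single sequence.
sample = [Image("dog.jpg"), Text("A dog catching a frisbee")]
tokens = flatten(sample)
```

Here `tokens` starts with the image block (`<img>`, four patch placeholders, `</img>`) and ends with the caption words. Because both modalities share one ordered sequence, a model trained on such samples sees the caption in the same context as the image, rather than processing each modality in a separate pass.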


ByteDance also claims that the AI model offers better image editing capabilities than existing open-source VLMs. It can perform complex tasks such as adding emotion to an image; removing, replacing, or adding elements; style transfer; and free-form edits. The company claims this ability also lets Bagel deliver significantly better output in world-modelling.

World-modelling refers to an AI system's internal understanding of how the real world functions visually. This would include the relationship between different objects, physical context, and the effect of physical factors such as light, wind, rain, and gravity.


Based on internal testing, ByteDance claims that Bagel outperformed Qwen2.5-VL-7B, a similarly sized model, in image understanding. It is also said to score higher than Janus-Pro-7B and FLUX.1-dev on image generation benchmarks, and to beat Gemini-2-exp on GEdit-Bench, an image editing benchmark.

Those who wish to try out the AI model without running it locally can head to Hugging Face, where ByteDance has set up a cloud-based interface to test its image analysis, generation, and editing capabilities.

 

© Copyright Red Pixels Ventures Limited 2026. All rights reserved.