Apple Researchers Are Building AI Model Called ‘Ferret UI’ That Can Navigate Through iOS

Researchers claim that Ferret UI is capable of complex tasks such as widget classification and icon recognition.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 10 April 2024 17:27 IST
Highlights
  • Apple researchers said that Ferret UI is a vision-language model
  • The paper claims most MLLMs cannot understand anything beyond natural images
  • The AI model was trained using data generated by GPT-4

The model is designed to automate perception and interaction within smartphone user interfaces

Photo Credit: Pexels/Mateusz Taciak

Apple researchers have published another paper on artificial intelligence (AI) models, this time focused on understanding and navigating smartphone user interfaces (UI). The research paper, which is yet to be peer-reviewed, describes a large language model (LLM) dubbed Ferret UI, which can go beyond traditional computer vision and understand complex smartphone screens. Notably, this is not the first AI paper published by the tech giant's research division. It has already published a paper on multimodal LLMs (MLLMs) and another on on-device AI models.

The pre-print version of the research paper has been published on arXiv, an open-access online repository of scholarly papers. The paper is titled “Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs” and focuses on expanding the use cases of MLLMs. It highlights that most language models with multimodal capabilities cannot understand anything beyond natural images and are functionally “restricted”. It also states the need for AI models to understand complex and dynamic interfaces such as those on a smartphone.

As per the paper, Ferret UI is “designed to execute precise referring and grounding tasks specific to UI screens, while adeptly interpreting and acting upon open-ended language instructions.” In simple terms, the vision-language model can not only process a smartphone screen containing multiple elements that represent different information, but also tell a user about those elements when prompted with a query.


How Ferret UI processes information on a screen
Photo Credit: Apple


Based on an image shared in the paper, the model can understand and classify widgets and recognise icons. It can also answer questions such as “Where is the launch icon” and “How do I open the Reminders app”. This shows that the AI is not only capable of explaining the screen it sees, but can also navigate to different parts of an iPhone based on a prompt.


To train Ferret UI, the Apple researchers created data of varying complexities themselves. This helped the model in learning basic tasks and understanding single-step processes. “For advanced tasks, we use GPT-4 [40] to generate data, including detailed description, conversation perception, conversation interaction, and function inference. These advanced tasks prepare the model to engage in more nuanced discussions about visual components, formulate action plans with specific goals in mind, and interpret the general purpose of a screen,” the paper explained.

The paper is promising, and if it passes the peer-review stage, Apple might be able to utilise this capability to add powerful tools to the iPhone that can perform complex UI navigation tasks with simple text or verbal prompts. This capability appears to be ideal for Siri.

