Search

OpenAI Adds a New ‘Instructional Hierarchy’ Protocol to Prevent Jailbreaking Incidents in GPT-4o Mini

OpenAI’s Instructional Hierarchy lets AI know how models should behave when instructions of different priorities conflict.

Advertisement
Highlights
  • OpenAI said the technique will stop issues of prompt injections as well
  • GPT-4o Mini is the first OpenAI AI model to get this new safety measure
  • The AI model has a context window of 128,000 tokens
OpenAI Adds a New ‘Instructional Hierarchy’ Protocol to Prevent Jailbreaking Incidents in GPT-4o Mini

GPT-4o Mini, which was released last week, is now the default mode on ChatGPT

Photo Credit: Unsplash/Solen Feyissa

OpenAI released a new artificial intelligence (AI) model dubbed GPT-4o Mini last week, which has new safety and security measures to protect it from harmful usage. The large language model (LLM) is built with a technique called Instructional Hierarchy, which will stop malicious prompt engineers from jailbreaking the AI model. The company said the technique will also show an increased resistance towards issues such as prompt injections and system prompt extractions. As per the company, the new method has improved the robustness score of the AI model by 63 percent.

OpenAI Builts a New Safety Framework

In a research paper, which is published in the online pre-print journal (non-peer-reviewed) arXiv, the AI firm explained the new technique and how it functions. To understand Instructional Hierarchy, jailbreaking needs to be explained first. Jailbreaking is a privilege escalation exploit that uses certain flaws in the software to make it do things it is not programmed to.

In the early days of ChatGPT, many people attempted to make the AI generate offensive or harmful text by tricking it into forgetting the original programming. Such prompts often began with “Forget all previous instructions and do this…” While ChatGPT has come a long way from there and malicious prompt engineering is more difficult, bad actors have also become more strategic in the attempt.

To combat issues where the AI model generates not only offensive text or images but also harmful content such as methods to create a chemical explosive or ways to hack a website, OpenAI is now using the Instructional Hierarchy technique. Put simply, the technique dictates how models should behave when instructions of different priorities conflict.

By creating a hierarchical structure, the company can keep its instructions at the highest priority, which will make it very difficult for any prompt engineer to break, as the AI will always follow the order of priority when it is asked to generate something it was not initially programmed to.

The company claims that it saw an improvement of 63 percent in robustness scores. However, there is a risk that the AI might refuse to listen to the lowest-level instructions. OpenAI's research paper has also outlined several refinements to improve the technique in future. One of the key areas of focus is handling other modalities such as images or audio which can also contain injected instructions.

For the latest tech news and reviews, follow Gadgets 360 on X, Facebook, WhatsApp, Threads and Google News. For the latest videos on gadgets and tech, subscribe to our YouTube channel. If you want to know everything about top influencers, follow our in-house Who'sThat360 on Instagram and YouTube.

 
Show Full Article
Please wait...
Advertisement

Related Stories

Popular Mobile Brands
  1. Vivo Y400 Pro 5G Confirmed to Launch in India Soon, Design Teased
  2. Vivo T4 Lite 5G Price in India, Launch Timeline and Key Features Leaked
  3. Top Smartphones Under Rs 35,000 in India (June 2025): Check List
  4. OnePlus Bullets Wireless Z3 to Launch in India on June 19: All Details
  5. BSNL to Reportedly Install One Lakh Additional 4G Towers Across India
  6. Oppo K13x 5G Build, Durability Details Revealed Ahead of India Launch
  7. Realme GT 7 Dream Edition is Now Available for Purchase in India
  8. Vivo X Fold 5 Battery Capacity Revealed in Latest Teaser
  1. SpaceX Launches 26 New Starlink Satellites, Expands Global Internet Network
  2. Aurora Alert! Northern Lights May Be Visible as Far South as New York on June 14
  3. New Island Forms in Caspian Sea as Water Levels Drop, Russian Scientists Confirm
  4. Kesari Chapter 2 Now Streaming on JioHotstar: Everything You Need to Know About Akshay Kumar Starrer Movie
  5. Steam for Mac Now Available as Native Apple Silicon App With Latest Beta Release
  6. Coinbase Announces American Express-Backed Crypto Credit Card That Offers Bitcoin Rewards
  7. Killed by Google: Support for Android Instant Apps to Reportedly Be Dropped Later This Year
  8. Realme GT 7 Dream Edition Now Available for Purchase in India: Price, Sale Offers
  9. Meta AI Discovery Feed Is Reportedly Filled With Users' Seemingly Private Chats
  10. Boat Airdopes Prime 701 ANC Earphones Confirmed to Launch in India on June 21
Gadgets 360 is available in
Download Our Apps
App Store App Store
Available in Hindi
App Store
© Copyright Red Pixels Ventures Limited 2025. All rights reserved.
Trending Products »
Latest Tech News »