OpenAI Adds a New ‘Instructional Hierarchy’ Protocol to Prevent Jailbreaking Incidents in GPT-4o Mini

OpenAI’s Instructional Hierarchy lets AI know how models should behave when instructions of different priorities conflict.

Advertisement
Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 22 July 2024 12:05 IST
Highlights
  • OpenAI said the technique will stop issues of prompt injections as well
  • GPT-4o Mini is the first OpenAI AI model to get this new safety measure
  • The AI model has a context window of 128,000 tokens

GPT-4o Mini, which was released last week, is now the default mode on ChatGPT

Photo Credit: Unsplash/Solen Feyissa

OpenAI released a new artificial intelligence (AI) model dubbed GPT-4o Mini last week, which has new safety and security measures to protect it from harmful usage. The large language model (LLM) is built with a technique called Instructional Hierarchy, which will stop malicious prompt engineers from jailbreaking the AI model. The company said the technique will also show an increased resistance towards issues such as prompt injections and system prompt extractions. As per the company, the new method has improved the robustness score of the AI model by 63 percent.

OpenAI Builts a New Safety Framework

In a research paper, which is published in the online pre-print journal (non-peer-reviewed) arXiv, the AI firm explained the new technique and how it functions. To understand Instructional Hierarchy, jailbreaking needs to be explained first. Jailbreaking is a privilege escalation exploit that uses certain flaws in the software to make it do things it is not programmed to.

Advertisement

In the early days of ChatGPT, many people attempted to make the AI generate offensive or harmful text by tricking it into forgetting the original programming. Such prompts often began with “Forget all previous instructions and do this…” While ChatGPT has come a long way from there and malicious prompt engineering is more difficult, bad actors have also become more strategic in the attempt.

To combat issues where the AI model generates not only offensive text or images but also harmful content such as methods to create a chemical explosive or ways to hack a website, OpenAI is now using the Instructional Hierarchy technique. Put simply, the technique dictates how models should behave when instructions of different priorities conflict.

Advertisement

By creating a hierarchical structure, the company can keep its instructions at the highest priority, which will make it very difficult for any prompt engineer to break, as the AI will always follow the order of priority when it is asked to generate something it was not initially programmed to.

The company claims that it saw an improvement of 63 percent in robustness scores. However, there is a risk that the AI might refuse to listen to the lowest-level instructions. OpenAI's research paper has also outlined several refinements to improve the technique in future. One of the key areas of focus is handling other modalities such as images or audio which can also contain injected instructions.

 

Get your daily dose of tech news, reviews, and insights, in under 80 characters on Gadgets 360 Turbo. Connect with fellow tech lovers on our Forum. Follow us on X, Facebook, WhatsApp, Threads and Google News for instant updates. Catch all the action on our YouTube channel.

Advertisement

Related Stories

Popular Mobile Brands
  1. You Can Now Use WhatsApp to Recharge Your Prepaid Number in India
  2. Vivo Y6 5G Debuts With 7,200mAh Battery, 6.75-Inch Screen at This Price
  3. Honor Earbuds 4 With Up to 46 Hours of Total Battery Life Debut Globally
  4. Honor MagicPad 3 Pro 12.3 Debuts With 10,100mAh Battery, Slim 4.8mm Profile
  5. Poco C81 Series Arrives With 13-Megapixel Camera at This Price in India
  6. Redmi Note 17 Pro Max Leak Reveals Chipset, Camera Details
  1. Vivo X300 Ultra Content Creation Features Showcased at Vivo Imagine Labs Ahead of Launch in India
  2. Honor MagicPad 3 Pro 12.3 Launched With Snapdragon 8 Gen 5, Slim 4.8mm Profile and 10,100mAh Battery
  3. Vivo Y6 5G Launched With 7,200mAh Battery, Snapdragon 4 Gen 2 Chip: Price, Specifications
  4. iQOO Smartphone With Dimensity 7500 Chip and 8,000mAh Battery in Development, Tipster Claims
  5. Redmi Note 17 Pro Max Battery, Camera and Chipset Details Leak; to Feature Dimensity 7500 SoC
  6. Assassin's Creed Black Flag Resynced Pricing Leaked Ahead of Official Reveal
  7. Honor Earbuds 4 Launched Globally With Active Noise Cancellation, Up to 46 Hours of Total Battery Life
  8. Motorola Razr 70 Ultra Design, Colour Options Spotted in Leaked Renders and Promotional Image
  9. UK’s FCA Raids Multiple Sites Suspected of Illegal P2P Crypto Operations
  10. Honor Win H7, Win H9 Launched With Up to Intel Core 9 Ultra HX CPU: Price, Specifications
Download Our Apps
Available in Hindi
© Copyright Red Pixels Ventures Limited 2026. All rights reserved.