OpenAI Unveils DALL·E and CLIP AI Models That Create and Classify Images

DALL·E can create images from quirky text descriptions, such as ‘an illustration of a baby daikon radish in a tutu walking a dog’.

By Jasmin Jose | Updated: 7 January 2021 17:52 IST
Highlights
  • GPT-3 is a deep learning language model that produces human-like text
  • DALL·E is a GPT-3 model that creates images from text and image prompts
  • CLIP is a neural network that learns what’s in a picture from its caption
The results are surreal and imaginative

OpenAI has unveiled DALL·E and CLIP, two new AI models that, respectively, generate images from text and classify images into categories. DALL·E is a neural network that can generate images from the wildest text and image descriptions fed to it, such as “an armchair in the shape of an avocado” or “the exact same cat on the top as a sketch on the bottom”. CLIP uses a new training method for image classification that is meant to be more accurate, efficient, and flexible across a range of image types.

Generative Pre-trained Transformer 3 (GPT-3) models from the US-based AI company use deep learning to create images and human-like text. You can let your imagination run wild, as DALL·E is trained to create diverse, and sometimes surreal, images depending on the text input. But the model has also raised questions about copyright, since DALL·E sources images from the Web to create its own.

AI illustrator DALL·E creates quirky images

The name DALL·E, as you might have already guessed, is a portmanteau of surrealist artist Salvador Dalí and Pixar's WALL·E. DALL·E can use text and image inputs to create quirky images. For example, it can create “an illustration of a baby daikon radish in a tutu walking a dog” or “a snail made of harp”. DALL·E is trained not only to generate images from scratch but also to regenerate a rectangular region of an existing image in a way that is consistent with the text or image prompt.

Image results for the text prompt 'a snail made of harp'

GPT-3 by OpenAI is a deep learning language model that can perform a variety of text-generation tasks from a language prompt. It can, for instance, write a story much as a human would. For DALL·E, the San Francisco-based AI lab built on Image GPT, an earlier model that swaps text for image pixels and is trained to complete half-finished images.
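
The idea of training a model to complete half-finished images can be illustrated with a small sketch. This is not OpenAI's code: a toy frequency table stands in for the transformer, an "image" is reduced to a flat list of pixel values, and the names `train_next_pixel_model` and `complete_image` are hypothetical. It only shows the general pattern of predicting the next element from what has been seen so far and repeating until the image is full.

```python
from collections import Counter, defaultdict


def train_next_pixel_model(images):
    """Count which pixel value follows each pixel value in the training images.

    A real model such as a transformer conditions on the whole prefix;
    this toy stand-in (an assumption for illustration) looks only at the
    previous pixel.
    """
    counts = defaultdict(Counter)
    for pixels in images:
        for previous, following in zip(pixels, pixels[1:]):
            counts[previous][following] += 1
    return counts


def complete_image(model, partial, total_length):
    """Greedily extend a non-empty partial pixel list to total_length."""
    pixels = list(partial)
    while len(pixels) < total_length:
        followers = model.get(pixels[-1])
        if followers:
            # Pick the most frequently observed next pixel.
            pixels.append(followers.most_common(1)[0][0])
        else:
            # Unseen context: fall back to repeating the last pixel.
            pixels.append(pixels[-1])
    return pixels


# Hypothetical training data: two striped one-dimensional "images".
TRAINING_IMAGES = [
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
]

model = train_next_pixel_model(TRAINING_IMAGES)
completed = complete_image(model, [0, 1, 0, 1], total_length=8)
print(completed)
```

Given the first half of a striped image, the sketch continues the stripe pattern it saw during training. A text-to-image model follows the same loop in spirit, except that the prompt text also conditions each prediction.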

DALL·E can draw images of animals or things with human characteristics and combine unrelated items sensibly to produce a single image. The success rate of the images will depend on how well the text is phrased. DALL·E is often able to “fill in the blanks” when the caption implies that the image must contain a certain detail that is not explicitly stated. For example, the text “a giraffe made of turtle” or “an armchair in the shape of an avocado” will give you a satisfactory output.

CLIPing text and images together

CLIP (Contrastive Language-Image Pre-training) is a neural network that can perform accurate image classification based on natural language. It helps classify images into distinct categories more accurately and efficiently from "unfiltered, highly varied, and highly noisy data". What makes CLIP different is that it does not learn to recognise images from a curated data set, as most existing models for visual classification do. Instead, CLIP has been trained on the wide variety of natural language supervision that's available on the Internet. Thus, CLIP learns what is in a picture from a detailed description rather than from a single-word label in a data set.

CLIP can be applied to any visual classification benchmark by providing the names of the visual categories to be recognised. According to the OpenAI blog, this is similar to the “zero-shot” capabilities of GPT-2 and GPT-3.
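
Classifying an image just by naming the candidate categories can be sketched as follows. This is a minimal illustration, not OpenAI's implementation: the three-dimensional embeddings are made-up numbers standing in for the outputs of CLIP's image and text encoders, the function names are hypothetical, and the temperature of 0.07 is an assumed value. The sketch scores each category description against the image by cosine similarity and turns the scores into probabilities.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.07):
    """Score an image against every label description; no retraining needed."""
    labels = list(label_embeddings)
    scores = [
        cosine_similarity(image_embedding, label_embeddings[label]) / temperature
        for label in labels
    ]
    return dict(zip(labels, softmax(scores)))


# Hypothetical embeddings: in practice a text encoder would produce one
# vector per category description and an image encoder one per image.
LABEL_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of an avocado": [0.0, 0.1, 0.9],
}
IMAGE_EMBEDDING = [0.8, 0.2, 0.1]

probabilities = zero_shot_classify(IMAGE_EMBEDDING, LABEL_EMBEDDINGS)
best_label = max(probabilities, key=probabilities.get)
print(best_label)
```

Adding a new category in this setup only means adding another description to the dictionary, which is what lets the same model be pointed at different benchmarks without a task-specific labelled training set.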

Models like DALL·E and CLIP have the potential for significant societal impact. The OpenAI team says it will analyse how these models relate to societal issues such as the economic impact on certain professions, the potential for bias in the model outputs, and the longer-term ethical challenges implied by this technology.

A generative AI model like DALL·E that picks images directly from the Internet could open the door to copyright infringement. DALL·E can regenerate any rectangular region of an existing image on the Internet, and people have been tweeting about the attribution and copyright of the distorted images.


Further reading: DALL E, CLIP, OpenAI