The DeepSeek-OCR AI model brings a new approach to compressing long-context text via optical 2D mapping.
DeepSeek-OCR can compress a 1,000-word article into 100 vision tokens
DeepSeek released a new open-source artificial intelligence (AI) model on Monday that changes how machines analyse and process plain text. Dubbed DeepSeek-OCR, it uses optical 2D mapping to convert text into pixels, compressing long context into a digestible size. The AI startup claims that large language models (LLMs) can process pixels more efficiently than text tokens, and that the compression lets them retain more relevant information when generating a response. The new approach is also said to produce more accurate results than traditional methods.
Based on optical character recognition (OCR) technology, the latest DeepSeek AI model processes information in a new way: it first converts plain text into images, then analyses that content to generate responses. The promise is that by reading text from an image, the model can compress and store large chunks of a document in a form that is easier for it to remember and reason over.
At its core, the model introduces “Context Optical Compression,” an approach that turns long pages of text into images and then lets the model convert those images into a highly condensed “vision token” representation, far smaller than the usual text-token representation. To illustrate the savings, the makers say a 1,000-word article could be processed with just 100 vision tokens.
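The claimed savings are easy to quantify. As a rough sketch (the figure of 1.3 text tokens per English word is a common rule of thumb, not a number from DeepSeek), the 1,000-word example works out to roughly a 13x reduction:

```python
def compression_ratio(n_words: int, n_vision_tokens: int,
                      tokens_per_word: float = 1.3) -> float:
    """Estimate text-token vs vision-token compression.

    tokens_per_word is an assumed average for English text tokenisers;
    the vision-token count comes from the article's 1,000-word example.
    """
    n_text_tokens = n_words * tokens_per_word
    return n_text_tokens / n_vision_tokens


# The article's example: 1,000 words compressed to 100 vision tokens.
ratio = compression_ratio(n_words=1000, n_vision_tokens=100)
print(f"~{ratio:.0f}x fewer tokens")  # ~13x under the assumed tokeniser rate
```

The exact ratio depends on the tokeniser, but the point stands: the downstream model sees an order of magnitude fewer tokens.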
How the model works is also interesting. First, a document image is captured. A vision encoder, a custom module built by the researchers, then analyses the image and breaks it into smaller patches, which are compressed into a small number of vision tokens. Finally, a decoder takes these vision tokens and reconstructs the textual meaning.
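The shape of that pipeline can be sketched in a few lines of NumPy. This is a toy illustration of the patch-then-compress idea only: the patch size, projection, and 4:1 pooling are invented for the example, and the real model uses learned encoder and decoder networks rather than random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. A rendered document "image" (toy 64x64 grid; real inputs are far larger).
image = rng.standard_normal((64, 64))

# 2. Vision encoder stage: split the image into 16x16 patches.
patch = 16
patches = (image.reshape(64 // patch, patch, 64 // patch, patch)
                .transpose(0, 2, 1, 3)
                .reshape(-1, patch * patch))      # 16 patches x 256 values

# 3. Compression stage: project each patch to a small embedding,
#    then pool groups of 4 patches into a single vision token.
W = rng.standard_normal((patch * patch, 32)) * 0.1  # hypothetical projection
embedded = patches @ W                               # (16, 32)
vision_tokens = embedded.reshape(4, 4, 32).mean(axis=1)  # (4, 32)

# 4. Decoder stage (stand-in): in the real model, a decoder maps these
#    few vision tokens back to the document's text.
print(patches.shape, "->", vision_tokens.shape)
```

Here 16 patches collapse into 4 vision tokens; the real encoder achieves its compression with learned components, but the flow of shapes is the same.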
Because the AI model is working with far fewer tokens, the downstream language model (or reasoning module) has less memory burden and can handle longer content or bigger documents.
Andrej Karpathy, co-founder of OpenAI and former Director of AI at Tesla, praised DeepSeek-OCR for its novel use of vision tokens. He said the approach could improve efficiency and opens the door to bidirectional attention. He also suggested that this method could eventually eliminate the tokeniser altogether, which would make models more efficient.
For those who want to try out DeepSeek-OCR, the model is currently hosted on GitHub, where the repository received more than 6,700 stars within 24 hours. The model is available under the permissive MIT licence for both academic and commercial use.