🌟 GitHub | 📥 Model Download | 📄 Paper Link | 📄 arXiv Paper Link
DeepSeek-OCR: Contexts Optical Compression
Explore the boundaries of visual-text compression.
The official DeepSeek-OCR release pins transformers to 4.46.3 and has not been adapted to newer versions. This community edition therefore modifies the modeling.py module so that no transformers downgrade is required. It has additionally been adapted for MindSpore + MindNLP, so users are welcome to run it on Ascend hardware.
Feel free to choose an attention implementation such as FlashAttention or SDPA to take advantage of the latest optimizations in transformers for a performance boost.
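If flash-attn is not installed, a small fallback check like the one below (a sketch, not part of the official example) can select SDPA instead before loading the model:
import importlib.util
# use FlashAttention 2 when the flash-attn package is available, otherwise fall back to SDPA
attn_impl = 'flash_attention_2' if importlib.util.find_spec('flash_attn') else 'sdpa'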
MindSpore Usage
Inference using Hugging Face transformers on Ascend NPUs. Requirements tested with MindSpore 2.7 + CANN 8.2:
mindspore==2.7.0
mindnlp==0.5.0rc4
transformers==4.57.1
tokenizers
einops
addict
easydict
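The dependencies above can be installed with pip, for example (a sample command based on the pinned versions listed; on Ascend you may need the MindSpore wheel matching your CANN build):
pip install mindspore==2.7.0 mindnlp==0.5.0rc4 transformers==4.57.1 tokenizers einops addict easydict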
import os
import mindnlp  # importing mindnlp patches transformers to run on the MindSpore backend (Ascend NPUs)
import mindspore
from transformers import AutoModel, AutoTokenizer
model_name = 'lvyufeng/DeepSeek-OCR-Community-Latest'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, dtype=mindspore.float16, _attn_implementation='sdpa', trust_remote_code=True, use_safetensors=True, device_map='auto')
model = model.eval()
# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'
# infer(self, tokenizer, prompt='', image_file='', output_path = ' ', base_size = 1024, image_size = 640, crop_mode = True, test_compress = False, save_results = False):
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path = output_path, base_size = 1024, image_size = 640, crop_mode=True, save_results = True, test_compress = True)
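The same infer call covers the other modes listed in the comment above; for instance, a minimal sketch of the Small mode (single 640x640 view, no cropping):
# Small mode: one 640x640 global view, dynamic cropping disabled
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path,
                  base_size=640, image_size=640, crop_mode=False, save_results=True)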
PyTorch Usage
Inference using Hugging Face transformers on NVIDIA GPUs. Requirements tested with Python 3.12.9 + CUDA 11.8:
torch
transformers==4.57.1
tokenizers
einops
addict
easydict
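A sample pip command for the packages above (choose the torch build that matches your CUDA version):
pip install torch transformers==4.57.1 tokenizers einops addict easydict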
pip install flash-attn --no-build-isolation
from transformers import AutoModel, AutoTokenizer
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
model_name = 'lvyufeng/DeepSeek-OCR-Community-Latest'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, dtype=torch.bfloat16, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True, device_map='auto')
model = model.eval()
# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'
# infer(self, tokenizer, prompt='', image_file='', output_path = ' ', base_size = 1024, image_size = 640, crop_mode = True, test_compress = False, save_results = False):
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path = output_path, base_size = 1024, image_size = 640, crop_mode=True, save_results = True, test_compress = True)
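To extract plain text rather than markdown, the commented "Free OCR" prompt above can be passed to the same call (a minimal sketch reusing the Gundam-mode settings):
# plain OCR output, no grounding / markdown conversion
res = model.infer(tokenizer, prompt="<image>\nFree OCR. ", image_file=image_file, output_path=output_path,
                  base_size=1024, image_size=640, crop_mode=True, save_results=True)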
Acknowledgement
We would like to thank Vary, GOT-OCR2.0, MinerU, PaddleOCR, OneChart, and Slow Perception for their valuable models and ideas.
We also appreciate the benchmarks: Fox and OmniDocBench.