Files changed (1)
  1. README.md +4 -68
README.md CHANGED
@@ -8,7 +8,6 @@ tags:
  - ocr
  - custom_code
  license: mit
- library_name: transformers
  ---
  <div align="center">
  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek AI" />
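Context on the metadata hunk above: on the Hugging Face Hub, `library_name` is the front-matter field that attributes a model to a library and drives the default loading snippet shown on the card; with it removed, the card relies on the remaining `custom_code` tag, while `license: mit` is unchanged.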
@@ -41,18 +40,18 @@ library_name: transformers
  <a href="https://github.com/deepseek-ai/DeepSeek-OCR"><b>🌟 Github</b></a> |
  <a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR"><b>📥 Model Download</b></a> |
  <a href="https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf"><b>📄 Paper Link</b></a> |
- <a href="https://arxiv.org/abs/2510.18234"><b>📄 Arxiv Paper Link</b></a> |
+ <a href=""><b>📄 Arxiv Paper Link</b></a> |
  </p>
  <h2>
  <p align="center">
- <a href="https://huggingface.co/papers/2510.18234">DeepSeek-OCR: Contexts Optical Compression</a>
+ <a href="">DeepSeek-OCR: Contexts Optical Compression</a>
  </p>
  </h2>
  <p align="center">
  <img src="assets/fig1.png" style="width: 1000px" align=center>
  </p>
  <p align="center">
- <a href="https://huggingface.co/papers/2510.18234">Explore the boundaries of visual-text compression.</a>
+ <a href="">Explore the boundaries of visual-text compression.</a>
  </p>
 
  ## Usage
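The `## Usage` section itself is untouched by this change; its closing `res = model.infer(...)` call is what surfaces as the context line in the next hunk's header. For orientation, a minimal sketch of that call path, assuming the card's `trust_remote_code` custom-code loading; the file paths and any keywords beyond what the context line shows are illustrative, not quoted from the card:

```python
# Minimal sketch of the unchanged "Usage" section, whose final
# `res = model.infer(...)` call appears as context in the next hunk's header.
# DeepSeek-OCR ships custom modeling code, so trust_remote_code=True is required.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

prompt = "<image>\nFree OCR."           # same prompt the vLLM example below uses
image_file = "path/to/your/image.png"   # illustrative path
output_path = "path/to/output/dir"      # illustrative path

# The custom infer() entry point quoted in the next hunk's context line.
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path)
```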
@@ -99,63 +98,6 @@ res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path =
  ## vLLM
  Refer to [🌟GitHub](https://github.com/deepseek-ai/DeepSeek-OCR/) for guidance on model inference acceleration, PDF processing, etc.
 
- [2025/10/23] 🚀🚀🚀 DeepSeek-OCR is now officially supported in upstream [vLLM](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html#installing-vllm).
- ```shell
- uv venv
- source .venv/bin/activate
- # Until the v0.11.1 release, install vLLM from the nightly build
- uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
- ```
-
- ```python
- from vllm import LLM, SamplingParams
- from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
- from PIL import Image
-
- # Create the model instance
- llm = LLM(
-     model="deepseek-ai/DeepSeek-OCR",
-     enable_prefix_caching=False,
-     mm_processor_cache_gb=0,
-     logits_processors=[NGramPerReqLogitsProcessor]
- )
-
- # Prepare batched input with your image files
- image_1 = Image.open("path/to/your/image_1.png").convert("RGB")
- image_2 = Image.open("path/to/your/image_2.png").convert("RGB")
- prompt = "<image>\nFree OCR."
-
- model_input = [
-     {
-         "prompt": prompt,
-         "multi_modal_data": {"image": image_1}
-     },
-     {
-         "prompt": prompt,
-         "multi_modal_data": {"image": image_2}
-     }
- ]
-
- sampling_param = SamplingParams(
-     temperature=0.0,
-     max_tokens=8192,
-     # n-gram logits processor args
-     extra_args=dict(
-         ngram_size=30,
-         window_size=90,
-         whitelist_token_ids={128821, 128822},  # whitelist: <td>, </td>
-     ),
-     skip_special_tokens=False,
- )
- # Generate output
- model_outputs = llm.generate(model_input, sampling_param)
-
- # Print output
- for output in model_outputs:
-     print(output.outputs[0].text)
- ```
-
-
  ## Visualizations
  <table>
  <tr>
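A few notes on the design of the example removed above, which lives on in the vLLM recipe linked from the deleted announcement: `temperature=0.0` makes decoding deterministic for OCR; `enable_prefix_caching=False` and `mm_processor_cache_gb=0` turn off caches; and the per-request arguments to the n-gram logits processor travel in `SamplingParams.extra_args`, with token ids 128821 and 128822 (`<td>`, `</td>`) whitelisted because table-cell tags legitimately repeat and should not be suppressed as degenerate n-gram loops.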
@@ -177,10 +119,4 @@ We also appreciate the benchmarks: [Fox](https://github.com/ucaslcl/Fox), [Omini
 
 
  ## Citation
- ```bibtex
- @article{wei2025deepseek,
-   title={DeepSeek-OCR: Contexts Optical Compression},
-   author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
-   journal={arXiv preprint arXiv:2510.18234},
-   year={2025}
- }
- ```
+ Coming soon!