add transformers usage of PaddleOCR-VL-0.9B

#32
Files changed (2):
  1. README.md (+52 −0)
  2. chat_template.jinja (+27 −3)
README.md CHANGED
````diff
@@ -140,6 +140,58 @@ for res in output:
 ```
 
 **For more usage details and parameter explanations, see the [documentation](https://www.paddleocr.ai/latest/en/version3.x/pipeline_usage/PaddleOCR-VL.html).**
+
+## PaddleOCR-VL-0.9B Usage with transformers
+
+Currently, we support inference with the PaddleOCR-VL-0.9B model through the `transformers` library; it can recognize text, formulas, tables, and chart elements. Support for full document parsing with `transformers` is planned. The simple script below runs PaddleOCR-VL-0.9B inference with `transformers`. For now, we still recommend the official inference method, which is faster and supports page-level document parsing.
+
+```python
+from PIL import Image
+import torch
+from transformers import AutoModelForCausalLM, AutoProcessor
+
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+CHOSEN_TASK = "ocr"  # Options: 'ocr' | 'table' | 'chart' | 'formula'
+PROMPTS = {
+    "ocr": "OCR:",
+    "table": "Table Recognition:",
+    "formula": "Formula Recognition:",
+    "chart": "Chart Recognition:",
+}
+
+model_path = "PaddlePaddle/PaddleOCR-VL"
+image_path = "test.png"
+image = Image.open(image_path).convert("RGB")
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
+).to(DEVICE).eval()
+processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image", "image": image},
+            {"type": "text", "text": PROMPTS[CHOSEN_TASK]},
+        ],
+    }
+]
+inputs = processor.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_dict=True,
+    return_tensors="pt",
+).to(DEVICE)
+
+outputs = model.generate(**inputs, max_new_tokens=1024)
+outputs = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+print(outputs)
+```
+
 ## Performance
 
 ### Page-Level Document Parsing
````
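One caveat with the script above: `batch_decode` on the raw `generate` output decodes the prompt tokens along with the answer. A common refinement (not part of this PR, shown here as a hypothetical sketch with stand-in tensors) is to slice off the first `inputs["input_ids"].shape[1]` columns before decoding, so only newly generated ids remain:

```python
import torch

# model.generate returns prompt ids followed by newly generated ids, so
# slicing off the prompt length leaves only the new tokens; in the script
# above this would be: generated = outputs[:, inputs["input_ids"].shape[1]:]
prompt_len = 5                             # stand-in for tokenized prompt length
outputs = torch.arange(12).reshape(1, 12)  # stand-in for generate() output
generated = outputs[:, prompt_len:]        # drop the echoed prompt ids
print(generated.shape)
```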
chat_template.jinja CHANGED
```diff
@@ -7,14 +7,38 @@
 {%- if not sep_token is defined -%}
 {%- set sep_token = "<|end_of_sentence|>" -%}
 {%- endif -%}
+{%- if not image_token is defined -%}
+{%- set image_token = "<|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>" -%}
+{%- endif -%}
 {{- cls_token -}}
 {%- for message in messages -%}
 {%- if message["role"] == "user" -%}
-{{- "User: <|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>" + message["content"] + "\n" -}}
+{{- "User: " -}}
+{%- for content in message["content"] -%}
+{%- if content["type"] == "image" -%}
+{{ image_token }}
+{%- endif -%}
+{%- endfor -%}
+{%- for content in message["content"] -%}
+{%- if content["type"] == "text" -%}
+{{ content["text"] }}
+{%- endif -%}
+{%- endfor -%}
+{{ "\n" -}}
 {%- elif message["role"] == "assistant" -%}
-{{- "Assistant: " + message["content"] + sep_token -}}
+{{- "Assistant: " -}}
+{%- for content in message["content"] -%}
+{%- if content["type"] == "text" -%}
+{{ content["text"] + "\n" }}
+{%- endif -%}
+{%- endfor -%}
+{{ sep_token -}}
 {%- elif message["role"] == "system" -%}
-{{- message["content"] -}}
+{%- for content in message["content"] -%}
+{%- if content["type"] == "text" -%}
+{{ content["text"] + "\n" }}
+{%- endif -%}
+{%- endfor -%}
 {%- endif -%}
 {%- endfor -%}
 {%- if add_generation_prompt -%}
```
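The template change switches `message["content"]` from a plain string to a list of typed parts, which is what the README script's `messages` structure relies on. The new user-branch logic can be exercised outside `transformers` with plain `jinja2`. This is a minimal sketch, not the shipped template: only the user branch is reproduced, and the `"Assistant: "` generation prompt is an assumption (the diff cuts off before the `add_generation_prompt` body).

```python
# Sketch of the new user-message handling, rendered with jinja2 (assumed
# installed). Image parts are emitted first, then text parts, matching the
# two passes over message["content"] in the updated chat_template.jinja.
from jinja2 import Template

TEMPLATE = r"""{%- if not image_token is defined -%}
{%- set image_token = "<|IMAGE_START|><|IMAGE_PLACEHOLDER|><|IMAGE_END|>" -%}
{%- endif -%}
{%- for message in messages -%}
{%- if message["role"] == "user" -%}
{{- "User: " -}}
{%- for content in message["content"] -%}
{%- if content["type"] == "image" -%}
{{ image_token }}
{%- endif -%}
{%- endfor -%}
{%- for content in message["content"] -%}
{%- if content["type"] == "text" -%}
{{ content["text"] }}
{%- endif -%}
{%- endfor -%}
{{ "\n" -}}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{- "Assistant: " -}}
{%- endif -%}"""

# content is now a list of typed parts, not a plain string:
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "<pil-image-here>"},
        {"type": "text", "text": "OCR:"},
    ]}
]
prompt = Template(TEMPLATE).render(messages=messages, add_generation_prompt=True)
print(repr(prompt))
```

The two separate loops mean an image placeholder always precedes the text prompt, regardless of the order of parts in `content`.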