cernis-intelligence/cernis-thinking

@@ -8,14 +8,212 @@ tags:
 license: apache-2.0
 language:
 - en
 ---
-# Uploaded finetuned  model
-- **Developed by:** coolAI
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/qwen2.5-vl-7b-instruct-unsloth-bnb-4bit
-This qwen2_5_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 license: apache-2.0
 language:
 - en
+datasets:
+- AI4Math/MathVista
+- unsloth/LaTeX_OCR
+- mychen76/invoices-and-receipts_ocr_v1
+- corto-ai/handwritten-text
 ---
+# Cernis-Thinking: Multi-Task Vision Language Model for Document Understanding
+**Cernis-Thinking** is a reasoning-capable vision language model fine-tuned with reinforcement learning (GRPO/GSPO) for document understanding. Built on Qwen2.5-VL-7B, it is trained for mathematical reasoning, LaTeX OCR, invoice extraction, and handwriting transcription.
+## Model Details
+- **Base Model**: [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
+- **Training Method**: Group Relative Policy Optimization (GRPO) with GSPO extensions
+- **Training Data**: ~2,000 samples across 4 document understanding tasks
+- **Model Size**: 7B parameters
+- **License**: Apache 2.0
+## Capabilities
+Cernis-Thinking is trained on four distinct document understanding tasks:
+1. **Mathematical Reasoning** - Solves math problems from images with step-by-step reasoning
+2. **LaTeX OCR** - Converts mathematical notation images to LaTeX code
+3. **Invoice Extraction** - Extracts structured information from invoices and receipts
+4. **Handwriting Transcription** - Transcribes handwritten text from images
+## Training Details
+### Datasets
+- [AI4Math/MathVista](https://huggingface.co/datasets/AI4Math/MathVista) - Mathematical reasoning (filtered for numeric answers)
+- [unsloth/LaTeX_OCR](https://huggingface.co/datasets/unsloth/LaTeX_OCR) - LaTeX formula recognition
+- [mychen76/invoices-and-receipts_ocr_v1](https://huggingface.co/datasets/mychen76/invoices-and-receipts_ocr_v1) - Invoice extraction
+- [corto-ai/handwritten-text](https://huggingface.co/datasets/corto-ai/handwritten-text) - Handwriting transcription
+### Reinforcement Learning Approach
+The model was trained using GRPO (Group Relative Policy Optimization) with custom reward functions:
+**1. Formatting Reward Function**
+- Rewards proper use of `<REASONING>` and `<SOLUTION>` tags
+- Penalizes malformed outputs (e.g., excessive "addCriterion" artifacts)
+- Encourages structured, parseable responses
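The formatting reward described above can be sketched as follows. This is a minimal illustration, not the training code: the function name `formatting_reward`, the point values, and the threshold for "excessive" artifacts are all assumptions.

```python
import re

# Assumed layout: one <REASONING> block followed by one <SOLUTION> block.
FORMAT_RE = re.compile(
    r"<REASONING>.+?</REASONING>\s*<SOLUTION>.+?</SOLUTION>", re.DOTALL
)

# Hypothetical threshold for what counts as "excessive" artifacts.
MAX_ARTIFACTS = 2


def formatting_reward(completion: str) -> float:
    """Score the structure of a completion (illustrative point values)."""
    score = 0.0
    # Reward a well-formed reasoning/solution pair.
    if FORMAT_RE.search(completion):
        score += 1.0
    # Penalize repeated "addCriterion" artifacts.
    if completion.count("addCriterion") > MAX_ARTIFACTS:
        score -= 1.0
    return score
```

A completion with both tag pairs and no artifacts scores 1.0 under these assumed values, while a run of repeated `addCriterion` tokens with no tags scores -1.0.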
+**2. Task-Specific Correctness Reward**
+- **Math**: Exact numeric matching (2.0 points)
+- **LaTeX/Handwriting**: String similarity with word overlap scoring (0.75-2.0 points)
+- **Invoices**: Partial credit for extracting key information (1.5 points)
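As an illustration of the math case, exact numeric matching could look like the sketch below. The helper names `extract_solution` and `math_correctness_reward` are hypothetical, and comparing values as floats is an assumption about how "exact" matching is defined; only the 2.0-point value comes from the list above.

```python
import re
from typing import Optional


def extract_solution(completion: str) -> Optional[str]:
    """Return the text inside <SOLUTION> tags, or None if the tags are missing."""
    match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", completion, re.DOTALL)
    return match.group(1).strip() if match else None


def math_correctness_reward(completion: str, answer: str) -> float:
    """Give 2.0 points when the predicted number equals the reference number."""
    guess = extract_solution(completion)
    if guess is None:
        return 0.0
    try:
        # Assumption: "42" and "42.0" count as the same answer.
        return 2.0 if float(guess) == float(answer) else 0.0
    except ValueError:
        # Non-numeric output earns nothing.
        return 0.0
```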
+**3. ROUGE-like Word Overlap**
+- For text-heavy tasks, rewards based on word overlap ratio:
+  - >50% overlap: 1.5 points
+  - >30% overlap: 0.75 points
+  - Prevents wasted training on completely wrong outputs
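The overlap tiers above can be sketched like this. The thresholds and point values are taken from the list; the way the ratio is computed (unique lowercase reference words found in the prediction) is an assumption, and `word_overlap_reward` is a hypothetical name.

```python
def word_overlap_reward(prediction: str, reference: str) -> float:
    """Tiered reward based on the share of reference words the prediction recovers."""
    ref_words = set(reference.lower().split())
    if not ref_words:
        return 0.0
    pred_words = set(prediction.lower().split())
    # Assumed definition: fraction of unique reference words present in the prediction.
    ratio = len(ref_words & pred_words) / len(ref_words)
    if ratio > 0.5:
        return 1.5
    if ratio > 0.3:
        return 0.75
    return 0.0
```

For example, recovering four of five reference words gives a ratio of 0.8 and a reward of 1.5, while recovering two of five gives 0.4 and a reward of 0.75.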
+### Training Configuration
+```python
+from trl import GRPOConfig
+
+training_args = GRPOConfig(
+    learning_rate = 5e-6,
+    num_train_epochs = 0.5,
+    per_device_train_batch_size = 1,
+    gradient_accumulation_steps = 2,
+    num_generations = 4,
+    max_prompt_length = 1024,
+    max_completion_length = 1024,
+    # GSPO settings
+    importance_sampling_level = "sequence",
+    loss_type = "dr_grpo",
+)
+```
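With `num_generations = 4`, each prompt is sampled four times and every completion is scored relative to its own group. The sketch below shows that group-relative baseline, assuming plain mean-centering. Whether rewards are also scaled by the group standard deviation depends on trainer settings not shown in the configuration, and `group_relative_advantages` is a hypothetical helper, not part of TRL.

```python
from statistics import mean
from typing import List


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Center each completion's reward on the mean reward of its group."""
    baseline = mean(rewards)
    return [reward - baseline for reward in rewards]


# Four completions for one prompt: two correct (2.0 points), two wrong (0.0 points).
print(group_relative_advantages([2.0, 0.0, 0.0, 2.0]))  # [1.0, -1.0, -1.0, 1.0]
```

If all four completions receive the same reward, every advantage is zero and the group contributes no learning signal, which is one reason partial-credit rewards can help.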
+## Usage
+### With Transformers
+```python
+from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
+from PIL import Image
+# Load model and processor (Qwen2.5-VL checkpoints use the Qwen2_5_VL model class)
+model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+    "coolAI/cernis-thinking",
+    torch_dtype="auto",
+    device_map="auto"
+)
+processor = AutoProcessor.from_pretrained("coolAI/cernis-thinking")
+# Prepare image and prompt
+image = Image.open("document.jpg")
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image"},
+            {"type": "text", "text": "Extract the key information from this invoice. First provide your reasoning between <REASONING> and </REASONING>, then your answer between <SOLUTION> and </SOLUTION>"}
+        ]
+    }
+]
+# Prepare inputs
+text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True).to(model.device)
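+# Generate
+output_ids = model.generate(**inputs, max_new_tokens=1024)
+# Drop the prompt tokens so only the newly generated text is decoded
+new_token_ids = output_ids[:, inputs["input_ids"].shape[1]:]
+generated_text = processor.batch_decode(new_token_ids, skip_special_tokens=True)
+print(generated_text[0])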
+```
+### With vLLM (Recommended for Production)
+```python
+from vllm import LLM, SamplingParams
+from PIL import Image
+# Initialize vLLM
+llm = LLM(
+    model="coolAI/cernis-thinking",
+    max_model_len=16384,
+    gpu_memory_utilization=0.8
+)
+# Prepare prompt
+prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>What is the LaTeX code shown in this image? Provide your answer between <SOLUTION> and </SOLUTION><|im_end|>\n<|im_start|>assistant\n"
+# Sampling parameters
+sampling_params = SamplingParams(
+    temperature=0.7,
+    top_k=50,
+    max_tokens=1024
+)
+# Generate
+outputs = llm.generate(
+    {
+        "prompt": prompt,
+        # Load the local file with PIL; vLLM's ImageAsset only resolves vLLM's bundled sample images
+        "multi_modal_data": {"image": Image.open("formula.png").convert("RGB")}
+    },
+    sampling_params=sampling_params
+)
+print(outputs[0].outputs[0].text)
+```
+## Example Outputs
+### Mathematical Reasoning
+**Input**: Image of geometry problem
+**Output**:
+```
+<REASONING>
+To solve this parallelogram problem, I need to use the properties:
+1. Opposite sides are equal in a parallelogram
+2. Angle bisectors create specific relationships...
+</REASONING>
+<SOLUTION>
+42
+</SOLUTION>
+```
+### LaTeX OCR
+**Input**: Image of mathematical formula
+**Output**:
+```
+<SOLUTION>
+\frac{2}{3} < a^{2} \alpha^{2} \leq 1
+</SOLUTION>
+```
+### Invoice Extraction
+**Input**: Invoice image
+**Output**:
+```
+<SOLUTION>
+Invoice No: 53553822
+Date: 07/24/2012
+Vendor: Leo Brown
+Seller Address: 082 Christopher Club Apt. 771 Thomasberg, OH 42949
+Seller Tax ID: 926-74-9803
+Total: $247.50
+</SOLUTION>
+```
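Outputs in the formats shown above can be post-processed with a few lines of parsing. The sketch below assumes the model emits at most one `<REASONING>` and one `<SOLUTION>` block, and that invoice fields appear one per line as `Key: value`; `parse_tagged_output` and `parse_invoice_fields` are hypothetical helpers, not part of the model's tooling.

```python
import re
from typing import Dict, Optional


def parse_tagged_output(text: str) -> Dict[str, Optional[str]]:
    """Split a completion into its reasoning and solution parts (None if absent)."""
    sections: Dict[str, Optional[str]] = {}
    for tag in ("REASONING", "SOLUTION"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        sections[tag.lower()] = match.group(1).strip() if match else None
    return sections


def parse_invoice_fields(solution: str) -> Dict[str, str]:
    """Turn 'Key: value' lines into a dict, splitting on the first colon only."""
    fields: Dict[str, str] = {}
    for line in solution.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


output = """<SOLUTION>
Invoice No: 53553822
Date: 07/24/2012
Total: $247.50
</SOLUTION>"""
parsed = parse_tagged_output(output)
print(parse_invoice_fields(parsed["solution"])["Total"])  # $247.50
```

Real outputs may omit tags or fields, so callers should handle `None` and missing keys.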
+## Citation
+```bibtex
+@misc{cernis-thinking-2025,
+  title={Cernis-Thinking: Multi-Task Vision Language Model for Document Understanding},
+  author={Your Name},
+  year={2025},
+  publisher={HuggingFace},
+  howpublished={\url{https://huggingface.co/coolAI/cernis-thinking}}
+}
+```
+## Acknowledgments
+- Built with [Unsloth](https://github.com/unslothai/unsloth) for efficient VLM training
+- Base model: [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
+- Training datasets: AI4Math, Unsloth, mychen76, corto-ai
+## License
+Apache 2.0 - Free for commercial and research use