---
base_model: unsloth/gemma-3n-E4B-it
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3n
- ocr
- document-understanding
- multilingual
- vision-language
license: apache-2.0
language:
- en
- kn
- hi
- mr
- sa
datasets:
- Nayana-cognitivelab/SectionOCR-SFT
library_name: transformers
pipeline_tag: image-text-to-text
---

# Nayana SectionOCR - Advanced Multilingual OCR Model

**Developed by:** [CognitiveLab](https://nayana.cognitivelab.in/)
**License:** Apache 2.0
**Base Model:** unsloth/gemma-3n-E4B-it
**Architecture:** Gemma 3n (4B parameters)

## Model Overview

Nayana SectionOCR is a multilingual vision-language model fine-tuned for Optical Character Recognition (OCR) and Document Visual Question Answering (Document VQA). Built on the Gemma 3n architecture, it extracts and understands text in complex visual documents across multiple languages.

## Supported Languages

- **English** (en) - Primary language
- **Kannada** (kn) - Indian regional language
- **Hindi** (hi) - Indian national language
- **Marathi** (mr) - Indian regional language
- **Sanskrit** (sa) - Classical language

Support for 17 additional languages is coming soon. The sketch below shows how the ISO codes above map to the language names used in the prompts under Basic Usage.
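
A minimal sketch; the `LANGUAGE_NAMES` dict and `ocr_prompt` helper are illustrative names, not part of the model's API:

```python
# Map the ISO codes from the model card metadata to the language
# names expected in the OCR prompt (see Basic Usage below).
LANGUAGE_NAMES = {
    "en": "English",
    "kn": "Kannada",
    "hi": "Hindi",
    "mr": "Marathi",
    "sa": "Sanskrit",
}

def ocr_prompt(lang_code: str) -> str:
    """Build the text-extraction prompt for a supported language code."""
    return f"Extract the text from this image in {LANGUAGE_NAMES[lang_code]}"
```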

## Key Features

- **Multilingual OCR**: Accurate text extraction in 5 languages
- **Document Understanding**: Advanced layout and structure comprehension
- **Fast Inference**: Optimized for real-time applications
- **High Accuracy**: Fine-tuned on diverse document datasets
- **Easy Integration**: Compatible with Transformers and Modal deployment

## Model Specifications

| Parameter | Value |
|-----------|-------|
| Model Size | 4B parameters |
| Context Length | 32K tokens |
| Image Resolution | Flexible (optimized for documents) |
| Precision | BFloat16 |
| Framework | Transformers + Unsloth |
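
As a rough sizing estimate (an assumption, not from the model card): 4B parameters at BFloat16 (2 bytes each) come to about 4e9 × 2 bytes ≈ 8 GB for the weights alone, so budget additional GPU memory beyond that for activations and the KV cache.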

## Quick Start

### Installation

```bash
pip install transformers torch pillow unsloth
```

### Basic Usage

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

# Load model and processor
model_id = "Nayana-cognitivelab/NayanaSectionOCR"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# System prompt
system_prompt = (
    "You are Nayana, an advanced AI assistant developed by CognitiveLab. "
    "You specialize in vision-based tasks, particularly Optical Character "
    "Recognition (OCR) and Document Visual Question Answering (Document VQA). "
    "You are highly accurate, fast, and reliable when working with complex "
    "visual documents. Most importantly, you are multilingual, capable of "
    "understanding and processing documents in a wide range of languages "
    "with precision."
)

# Load and process image
image = Image.open("your_document.jpg")
language = "English"  # or "Kannada", "Hindi", "Marathi", "Sanskrit"
user_prompt = f"Extract the text from this image in {language}"

# Prepare messages
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": system_prompt}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_prompt},
            {"type": "image", "image": image}
        ]
    }
]

# Apply chat template and move the inputs to the model's device
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

# Generate response
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        do_sample=True
    )

# Decode only the newly generated tokens
response = processor.tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)
```
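
For repeated calls, the steps above can be folded into a small helper. This is a sketch that reuses `processor`, `model`, and `system_prompt` from the snippet above; `extract_text` is an illustrative name, and greedy decoding is an assumption (the sampling settings above also work):

```python
def extract_text(image_path: str, language: str = "English") -> str:
    """Run the full pipeline above on one image file and return the text."""
    image = Image.open(image_path)
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user", "content": [
            {"type": "text", "text": f"Extract the text from this image in {language}"},
            {"type": "image", "image": image},
        ]},
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.inference_mode():
        # Greedy decoding (an assumption): deterministic output is often
        # preferable for OCR.
        outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    return processor.tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example usage:
# print(extract_text("your_document.jpg", language="Hindi"))
```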

This model was trained **2x faster** with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
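
Since the checkpoint was trained with Unsloth, it can presumably also be loaded through Unsloth's fast inference path. A minimal sketch, assuming this checkpoint is compatible with Unsloth's `FastVisionModel` loader:

```python
from unsloth import FastVisionModel

# Load via Unsloth's fast path (assumption: this checkpoint works with
# the vision loader); 4-bit loading shrinks the memory footprint.
model, tokenizer = FastVisionModel.from_pretrained(
    "Nayana-cognitivelab/NayanaSectionOCR",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)  # switch to inference mode
```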

## Citation

```bibtex
@misc{nayana_sectionocr_2024,
  title={Nayana SectionOCR: Multilingual Document Understanding with Gemma 3n},
  author={CognitiveLab},
  year={2024},
  url={https://huggingface.co/Nayana-cognitivelab/SectionOCR_SFT_v3_half_en_kn_hi_sa_mr_7250}
}
```