AdithyaSK committed on
Commit 5898981 · verified · Parent(s): e4ec008

Update README.md

Files changed (1): README.md (+142 −6)

---
base_model: unsloth/gemma-3n-E4B-it
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3n
- ocr
- document-understanding
- multilingual
- vision-language
license: apache-2.0
language:
- en
- kn
- hi
- mr
- sa
datasets:
- Nayana-cognitivelab/SectionOCR-SFT
library_name: transformers
pipeline_tag: image-text-to-text
---

# 🔍 Nayana SectionOCR - Advanced Multilingual OCR Model

**Developed by:** [CognitiveLab](https://nayana.cognitivelab.in/)  
**License:** Apache 2.0  
**Base Model:** unsloth/gemma-3n-E4B-it  
**Architecture:** Gemma 3n (4B parameters)

## 🌟 Model Overview

Nayana SectionOCR is a multilingual vision-language model fine-tuned for Optical Character Recognition (OCR) and Document Visual Question Answering (Document VQA). Built on the Gemma 3n architecture, it understands and extracts text from complex visual documents across multiple languages.
## 🌍 Supported Languages

- **English** (en) - Primary language
- **Kannada** (kn) - Indian regional language
- **Hindi** (hi) - Indian official language
- **Marathi** (mr) - Indian regional language
- **Sanskrit** (sa) - Classical language

Support for 17 additional languages is coming soon.
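The prompts in the Quick Start below spell the language out in English ("Kannada", not "kn"). A small sketch of a lookup from the ISO 639-1 codes above to those names; the `LANG_NAMES` table and `ocr_prompt` helper are illustrative assumptions, not part of the model's API:

```python
# Hypothetical mapping from the ISO 639-1 codes listed above to the
# spelled-out names the OCR prompt expects (illustrative only).
LANG_NAMES = {
    "en": "English",
    "kn": "Kannada",
    "hi": "Hindi",
    "mr": "Marathi",
    "sa": "Sanskrit",
}

def ocr_prompt(lang_code: str) -> str:
    """Build the text-extraction prompt for a supported language code."""
    try:
        name = LANG_NAMES[lang_code]
    except KeyError:
        raise ValueError(f"Unsupported language code: {lang_code}") from None
    return f"Extract the text from this image in {name}"
```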
## 🎯 Key Features

- **Multilingual OCR**: Accurate text extraction in 5 languages
- **Document Understanding**: Advanced layout and structure comprehension
- **Fast Inference**: Optimized for real-time applications
- **High Accuracy**: Fine-tuned on diverse document datasets
- **Easy Integration**: Compatible with Transformers and Modal deployment
## 📋 Model Specifications

| Parameter | Value |
|-----------|-------|
| Model Size | 4B parameters |
| Context Length | 32K tokens |
| Image Resolution | Flexible (optimized for documents) |
| Precision | BFloat16 |
| Framework | Transformers + Unsloth |
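As a rough sanity check on hardware needs (an estimate, not an official figure): 4B parameters stored in BFloat16 take 2 bytes each, so the weights alone occupy about 8 GB before activations and the KV cache:

```python
# Back-of-the-envelope weight memory from the table above (estimate only):
# 4 billion parameters x 2 bytes per BFloat16 value.
params = 4e9
bytes_per_param = 2  # BFloat16
weight_gb = params * bytes_per_param / 1e9
print(f"~{weight_gb:.0f} GB of weights")  # → ~8 GB of weights
```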
## 🚀 Quick Start

### Installation

```bash
pip install transformers torch pillow unsloth
```

### Basic Usage

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

# Load model and processor
model_id = "Nayana-cognitivelab/NayanaSectionOCR"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# System prompt
system_prompt = "You are Nayana, an advanced AI assistant developed by CognitiveLab. You specialize in vision-based tasks, particularly Optical Character Recognition (OCR) and Document Visual Question Answering (Document VQA). You are highly accurate, fast, and reliable when working with complex visual documents. Most importantly, you are multilingual, capable of understanding and processing documents in a wide range of languages with precision."

# Load and process image
image = Image.open("your_document.jpg")
language = "English"  # or "Kannada", "Hindi", "Marathi", "Sanskrit"
user_prompt = f"Extract the text from this image in {language}"

# Prepare messages
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": system_prompt}],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_prompt},
            {"type": "image", "image": image},
        ],
    },
]

# Apply chat template and move inputs to the model's device
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate response
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
response = processor.tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
```
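For Document VQA, the same chat format can carry a question about the document instead of an extraction instruction. A minimal sketch of a message builder; the `vqa_messages` helper is an illustrative assumption, not part of the processor API, and its output is passed to `processor.apply_chat_template(...)` exactly as in Basic Usage:

```python
def vqa_messages(image, question: str, system_prompt: str) -> list:
    """Assemble chat-template messages for a Document VQA turn.

    `image` is a PIL.Image; feed the returned list to
    processor.apply_chat_template(...) as in the Basic Usage example.
    """
    return [
        {
            "role": "system",
            "content": [{"type": "text", "text": system_prompt}],
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image", "image": image},
            ],
        },
    ]
```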

This model was trained **2x faster** with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

## 📜 Citation

```bibtex
@misc{nayana_sectionocr_2024,
  title={Nayana SectionOCR: Multilingual Document Understanding with Gemma 3n},
  author={CognitiveLab},
  year={2024},
  url={https://huggingface.co/Nayana-cognitivelab/SectionOCR_SFT_v3_half_en_kn_hi_sa_mr_7250}
}
```