---
license: apache-2.0
base_model: google/gemma-3-1b-it
tags:
- gemma
- northeast-india
- cultural
- fine-tuned
- assam
- manipur
- nagaland
- mizoram
- tripura
- meghalaya
- arunachal-pradesh
- sikkim
- neodac-mini
language:
- en
pipeline_tag: text-generation
library_name: transformers
widget:
- example_title: Bihu Festival
  text: |
    <start_of_turn>user
    What is Bihu festival?<end_of_turn>
    <start_of_turn>model
- example_title: Hornbill Festival
  text: |
    <start_of_turn>user
    Tell me about Hornbill Festival.<end_of_turn>
    <start_of_turn>model
- example_title: Assamese Cuisine
  text: |
    <start_of_turn>user
    What is traditional Assamese cuisine?<end_of_turn>
    <start_of_turn>model
---

# Neodac-mini: Northeast India Cultural AI Model

**Neodac-mini** (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac-mini provides authentic, detailed responses about the rich cultural heritage of the region.

## 🎯 Model Overview

- **Base Model**: [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it)
- **Specialization**: Northeast India Cultural Knowledge
- **Training Data**: 6,205 culturally authentic Q&A pairs
- **Coverage**: All 8 Northeast Indian states
- **Languages**: English (with cultural context)

## 🌟 Key Features

### Cultural Domains Covered
- **Festivals & Celebrations**: Bihu, Hornbill, Losar, Chapchar Kut, etc.
- **Traditional Arts**: Dance forms, music, crafts, weaving
- **Cuisine**: Regional foods, cooking methods, traditional recipes
- **Tribal Heritage**: Community practices, languages, customs
- **Geography**: Cultural significance of places and landmarks
- **Literature**: Folk tales, oral traditions, regional literature

### Model Capabilities
- ✅ Accurate cultural information without hallucinations
- ✅ Detailed responses about regional traditions
- ✅ Authentic representation of tribal communities
- ✅ Contextual understanding of cultural nuances
- ✅ Preservation of cultural knowledge through AI

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac-mini")
model = AutoModelForCausalLM.from_pretrained(
    "MWirelabs/neodac-mini",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example usage
def ask_neodac_mini(question):
    # Gemma chat format: one user turn followed by an open model turn
    prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n"
    # Move the inputs to the device chosen by device_map="auto"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=300,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens; the turn markers are special
    # tokens and are stripped on decode, so splitting on them is unreliable
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Ask about Northeast India culture
response = ask_neodac_mini("What is the significance of bamboo in Northeast India?")
print(response)
```
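The Gemma turn format used for prompts in this card can also be assembled by a small helper. The sketch below is illustrative only: `build_gemma_prompt` is a hypothetical name, and whether the fine-tuned model handles multi-turn conversations well is an assumption, since the card only shows single-turn prompts.

```python
from typing import List, Tuple

def build_gemma_prompt(turns: List[Tuple[str, str]]) -> str:
    """Render (role, text) turns in the Gemma chat format and open a model turn.

    Hypothetical helper, not part of the released model or of transformers.
    """
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave a model turn open so generation continues from here
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = build_gemma_prompt([("user", "What is Bihu festival?")])
print(prompt)
```

If the tokenizer ships a chat template, `tokenizer.apply_chat_template` from `transformers` is the library's built-in route to the same kind of prompt and may be preferable to hand-built strings.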

## 📊 Training Details

### Dataset
- **Size**: 6,205 cultural Q&A pairs
- **Sources**: Regional cultural databases, wiki content, expert curation
- **Quality**: Manually verified for cultural authenticity
- **Split**: 90% training, 10% validation

### Training Configuration
- **Hardware**: NVIDIA A40 40GB
- **Epochs**: 5 (enhanced from initial 3)
- **Learning Rate**: 2e-5 (optimized for detailed responses)
- **Batch Size**: 8 per device
- **Precision**: bfloat16
- **Max Sequence Length**: 512 tokens
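As a rough sanity check, the figures above imply the following training schedule. This is a back-of-the-envelope sketch, assuming a single device, no gradient accumulation, and a split that rounds the training set down; none of these details are stated in the card.

```python
import math

TOTAL_PAIRS = 6205   # dataset size from the card
BATCH_SIZE = 8       # per-device batch size from the card
EPOCHS = 5           # from the card

# 90/10 split in integer arithmetic (assumption: training set is rounded down)
n_train = TOTAL_PAIRS * 9 // 10
n_val = TOTAL_PAIRS - n_train

# Assumption: one device and no gradient accumulation
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(f"train={n_train}, val={n_val}")
print(f"steps/epoch={steps_per_epoch}, total steps={total_steps}")
```

Under these assumptions the run comes to 5,584 training pairs, 621 validation pairs, 698 optimizer steps per epoch, and 3,490 steps in total.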

### Improvements Over Base Model
| Aspect | Base Gemma 3 1B-IT | Neodac-mini |
|--------|-------------------|---------|
| Cultural Accuracy | ❌ Hallucinations | ✅ Factually correct |
| Response Detail | ⚠️ Generic/brief | ✅ Rich & comprehensive |
| Regional Context | ❌ Limited knowledge | ✅ Deep cultural understanding |
| Tribal Information | ❌ Inaccurate/missing | ✅ Authentic representation |

## 🎪 Example Comparisons

### Question: "What is Bihu festival?"

**Base Model Response:**
> Claims Bihu is about Lord Shiva (incorrect)

**Neodac-mini Response:**
> Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter.

## 🎯 Use Cases

### Cultural Education
- Educational institutions teaching Northeast India studies
- Cultural preservation initiatives
- Tourism and travel information

### Research & Documentation
- Academic research on regional culture
- Cultural anthropology studies
- Digital heritage preservation

### Community Applications
- Cultural chatbots for tourism
- Educational tools for diaspora communities
- Content creation for cultural media

## ⚠️ Limitations

- **Geographic Scope**: Specialized for Northeast India only
- **Language**: Responses in English (cultural terms may be in local languages)
- **Temporal Knowledge**: Training data has a knowledge cutoff
- **Bias Inheritance**: May inherit biases from base model and training data

## 🔬 Evaluation & Performance

The model was evaluated on cultural accuracy, response completeness, and factual correctness. Significant improvements were observed over the base model in all cultural domains.
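The card does not describe the evaluation procedure. One simple way to spot-check factual correctness is to measure how many expected terms a response mentions. The sketch below is only an illustration of that idea: `keyword_coverage`, the checklist of terms, and the incorrect sample answer are all hypothetical, and this is not the authors' evaluation method.

```python
from typing import List

def keyword_coverage(response: str, expected_terms: List[str]) -> float:
    """Fraction of expected terms found in the response (case-insensitive)."""
    if not expected_terms:
        return 0.0
    text = response.lower()
    hits = sum(1 for term in expected_terms if term.lower() in text)
    return hits / len(expected_terms)

# Hypothetical checklist for the question "What is Bihu festival?"
bihu_terms = ["Assam", "Rongali", "Kati", "Magh"]

good_answer = (
    "Bihu is the most important festival of Assam. There are three Bihus: "
    "Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, "
    "and Magh (or Bhogali) Bihu in winter."
)
# Hypothetical incorrect answer, for contrast
bad_answer = "Bihu is a festival dedicated to Lord Shiva."

print(keyword_coverage(good_answer, bihu_terms))
print(keyword_coverage(bad_answer, bihu_terms))
```

Keyword matching is a coarse proxy: it rewards mentioning the right terms, not using them correctly, so it complements rather than replaces review by people familiar with the culture.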

## 📜 Citation

If you use Neodac-mini in your research or applications, please cite:

```bibtex
@misc{neodac2025,
  title={Neodac-mini: A Specialized Language Model for Northeast India Cultural Knowledge},
  author={MWire Labs},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/MWirelabs/neodac-mini},
  note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education}
}
```

## 🀝 Contributing

Interested in improving Neodac-mini? We welcome:
- Additional cultural data from Northeast India
- Feedback on cultural accuracy
- Suggestions for new cultural domains
- Community validation of responses

## 📄 License

This model is released under the Apache 2.0 license. The base Gemma model itself is distributed by Google under the Gemma Terms of Use, which also cover derivative models.

## 🙏 Acknowledgments

- Google for the Gemma 3 1B-IT base model
- Cultural experts and communities of Northeast India
- Contributors to the cultural dataset
- Hugging Face for the platform and tools

---

*Neodac-mini represents a step forward in culturally-aware AI, preserving and making accessible the rich heritage of Northeast India through technology.*