| --- |
| license: apache-2.0 |
| base_model: google/gemma-3-1b-it |
| tags: |
| - gemma |
| - northeast-india |
| - cultural |
| - fine-tuned |
| - assam |
| - manipur |
| - nagaland |
| - mizoram |
| - tripura |
| - meghalaya |
| - arunachal-pradesh |
| - sikkim |
| - neodac-mini |
| language: |
| - en |
| pipeline_tag: text-generation |
| library_name: transformers |
| widget: |
| - example_title: Bihu Festival |
|   text: | |
|     <start_of_turn>user |
|     What is Bihu festival?<end_of_turn> |
|     <start_of_turn>model |
| - example_title: Hornbill Festival |
|   text: | |
|     <start_of_turn>user |
|     Tell me about Hornbill Festival.<end_of_turn> |
|     <start_of_turn>model |
| - example_title: Assamese Cuisine |
|   text: | |
|     <start_of_turn>user |
|     What is traditional Assamese cuisine?<end_of_turn> |
|     <start_of_turn>model |
| --- |
| |
| # Neodac-mini: Northeast India Cultural AI Model |
|
|
| **Neodac-mini** (Northeast India Cultural) is a specialized language model fine-tuned on cultural knowledge of Northeast India's eight states. Built on Google's Gemma 3 1B Instruct, Neodac-mini provides authentic, detailed responses about the rich cultural heritage of the region. |
|
|
| ## Model Overview |
|
|
| - **Base Model**: [google/gemma-3-1b-it](https://huggingface.co/google/gemma-3-1b-it) |
| - **Specialization**: Northeast India Cultural Knowledge |
| - **Training Data**: 6,205 culturally authentic Q&A pairs |
| - **Coverage**: All 8 Northeast Indian states |
| - **Languages**: English (with cultural context) |
|
|
| ## Key Features |
|
|
| ### Cultural Domains Covered |
| - **Festivals & Celebrations**: Bihu, Hornbill, Losar, Chapchar Kut, etc. |
| - **Traditional Arts**: Dance forms, music, crafts, weaving |
| - **Cuisine**: Regional foods, cooking methods, traditional recipes |
| - **Tribal Heritage**: Community practices, languages, customs |
| - **Geography**: Cultural significance of places and landmarks |
| - **Literature**: Folk tales, oral traditions, regional literature |
|
|
| ### Model Capabilities |
| - ✅ Accurate cultural information without hallucinations |
| - ✅ Detailed responses about regional traditions |
| - ✅ Authentic representation of tribal communities |
| - ✅ Contextual understanding of cultural nuances |
| - ✅ Preservation of cultural knowledge through AI |
|
|
| ## Quick Start |
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| import torch |
| |
| # Load model and tokenizer |
| tokenizer = AutoTokenizer.from_pretrained("MWirelabs/neodac-mini") |
| model = AutoModelForCausalLM.from_pretrained( |
|     "MWirelabs/neodac-mini", |
|     torch_dtype=torch.bfloat16, |
|     device_map="auto" |
| ) |
| |
| # Example usage |
| def ask_neodac_mini(question): |
|     # Gemma chat format: one user turn, then an open model turn |
|     prompt = f"<start_of_turn>user\n{question}<end_of_turn>\n<start_of_turn>model\n" |
|     # Move the inputs to the device the model was placed on |
|     inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
| |
|     with torch.no_grad(): |
|         outputs = model.generate( |
|             **inputs, |
|             max_new_tokens=300,  # cap on generated tokens, excluding the prompt |
|             temperature=0.7, |
|             do_sample=True, |
|             pad_token_id=tokenizer.eos_token_id |
|         ) |
| |
|     # Decode only the newly generated tokens, not the echoed prompt |
|     prompt_length = inputs["input_ids"].shape[-1] |
|     response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True) |
|     return response.strip() |
| |
| # Ask about Northeast India culture |
| response = ask_neodac_mini("What is the significance of bamboo in Northeast India?") |
| print(response) |
| ``` |
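For multi-turn conversations, the same turn markers used in the snippet above can be assembled programmatically. The following is a minimal sketch, not part of the released code: `build_gemma_prompt` is a hypothetical helper, and it assumes the Gemma convention that assistant turns use the role name `model`. In practice, `tokenizer.apply_chat_template` is usually preferable because it reads the template shipped with the tokenizer.

```python
def build_gemma_prompt(messages):
    """Render a list of {"role", "content"} dicts in Gemma's turn format.

    Hypothetical helper for illustration; roles are assumed to be
    "user" or "model".
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n")
    # Leave a model turn open so generation continues from here
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "What is Bihu festival?"},
    {"role": "model", "content": "Bihu is the most important festival of Assam."},
    {"role": "user", "content": "Tell me about Hornbill Festival."},
]
prompt = build_gemma_prompt(conversation)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the single-turn prompt built inside `ask_neodac_mini`.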
|
|
| ## Training Details |
|
|
| ### Dataset |
| - **Size**: 6,205 cultural Q&A pairs |
| - **Sources**: Regional cultural databases, wiki content, expert curation |
| - **Quality**: Manually verified for cultural authenticity |
| - **Split**: 90% training, 10% validation |
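A 90/10 split of 6,205 pairs could be reproduced along the following lines. This is a sketch under assumptions: the card does not say how the split was made, so the seeded shuffle, the seed value, the `split_pairs` name, and the rounding (validation size rounded down to 620) are illustrative choices, and the placeholder records stand in for the real dataset.

```python
import random


def split_pairs(pairs, val_ratio=0.1, seed=42):
    """Shuffle a copy of `pairs` and split it into (train, validation)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)  # rounds down
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder records standing in for the 6,205 Q&A pairs
pairs = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(6205)]
train, val = split_pairs(pairs)
print(len(train), len(val))
```

Fixing the seed keeps the validation set stable across runs, which matters when comparing checkpoints.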
|
|
| ### Training Configuration |
| - **Hardware**: NVIDIA A40 40GB |
| - **Epochs**: 5 (increased from an initial 3) |
| - **Learning Rate**: 2e-5 (optimized for detailed responses) |
| - **Batch Size**: 8 per device |
| - **Precision**: bfloat16 |
| - **Max Sequence Length**: 512 tokens |
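The settings listed above imply a rough count of optimizer steps for the run. The sketch below works through that arithmetic under assumptions the card does not confirm: a single GPU, no gradient accumulation, and 5,585 training pairs (6,205 minus a 10% validation split, rounded down). `optimizer_steps` is a hypothetical helper, not a function from any training library.

```python
import math


def optimizer_steps(num_examples, batch_size, epochs, num_devices=1, grad_accum=1):
    """Estimate total optimizer steps for a fine-tuning run."""
    effective_batch = batch_size * num_devices * grad_accum
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * epochs


total_pairs = 6205
num_train = total_pairs - total_pairs // 10  # assumed 90/10 split
steps = optimizer_steps(num_train, batch_size=8, epochs=5)
print(steps)
```

Under these assumptions the run comes to 699 steps per epoch and 3,495 steps in total; a different device count or accumulation setting would change the figure proportionally.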
|
|
| ### Improvements Over Base Model |
| | Aspect | Base Gemma 3 1B-IT | Neodac-mini | |
| |--------|-------------------|---------| |
| | Cultural Accuracy | ❌ Hallucinations | ✅ Factually correct | |
| | Response Detail | ⚠️ Generic/brief | ✅ Rich & comprehensive | |
| | Regional Context | ❌ Limited knowledge | ✅ Deep cultural understanding | |
| | Tribal Information | ❌ Inaccurate/missing | ✅ Authentic representation | |
|
|
| ## Example Comparisons |
|
|
| ### Question: "What is Bihu festival?" |
|
|
| **Base Model Response:** |
| > Claims Bihu is about Lord Shiva (incorrect) |
|
|
| **Neodac-mini Response:** |
| > Bihu is the most important festival of Assam, celebrated by all Assamese people. There are three Bihus that mark different stages of the agricultural calendar: Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, and Magh (or Bhogali) Bihu in winter. |
|
|
| ## Use Cases |
|
|
| ### Cultural Education |
| - Educational institutions teaching Northeast India studies |
| - Cultural preservation initiatives |
| - Tourism and travel information |
|
|
| ### Research & Documentation |
| - Academic research on regional culture |
| - Cultural anthropology studies |
| - Digital heritage preservation |
|
|
| ### Community Applications |
| - Cultural chatbots for tourism |
| - Educational tools for diaspora communities |
| - Content creation for cultural media |
|
|
| ## Limitations |
|
|
| - **Geographic Scope**: Specialized for Northeast India only |
| - **Language**: Responses in English (cultural terms may be in local languages) |
| - **Temporal Knowledge**: Training data has a knowledge cutoff |
| - **Bias Inheritance**: May inherit biases from base model and training data |
|
|
| ## Evaluation & Performance |
|
|
| The model was evaluated on cultural accuracy, response completeness, and factual correctness. Significant improvements were observed over the base model in all cultural domains. |
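The card does not describe the evaluation procedure or report scores. As one illustration of how factual correctness might be spot-checked, the sketch below measures how many expected key terms appear in a response. The `keyword_coverage` function and the choice of key terms are assumptions made for this example, not the authors' method; the sample response is the Bihu answer quoted in this card.

```python
def keyword_coverage(response, expected_terms):
    """Return the fraction of expected terms found in the response (case-insensitive)."""
    if not expected_terms:
        raise ValueError("expected_terms must not be empty")
    text = response.lower()
    found = [term for term in expected_terms if term.lower() in text]
    return len(found) / len(expected_terms)


# Key terms chosen by hand for this example
bihu_terms = ["Assam", "Rongali", "Kati", "Magh"]

neodac_response = (
    "Bihu is the most important festival of Assam, celebrated by all Assamese people. "
    "There are three Bihus that mark different stages of the agricultural calendar: "
    "Rongali (or Bohag) Bihu in spring, Kati (or Kongali) Bihu in autumn, "
    "and Magh (or Bhogali) Bihu in winter."
)
# Paraphrase of the base model's incorrect claim, not a verbatim output
base_summary = "Bihu is a festival about Lord Shiva."

print(keyword_coverage(neodac_response, bihu_terms))
print(keyword_coverage(base_summary, bihu_terms))
```

Substring matching is a crude proxy: it cannot tell whether a term is used correctly, so review by people familiar with the culture remains necessary.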
|
|
| ## Citation |
|
|
| If you use Neodac-mini in your research or applications, please cite: |
|
|
| ```bibtex |
| @misc{neodac2025, |
| title={Neodac-mini: A Specialized Language Model for Northeast India Cultural Knowledge}, |
| author={MWire Labs}, |
| year={2025}, |
| publisher={Hugging Face}, |
| url={https://huggingface.co/MWirelabs/neodac-mini}, |
| note={Fine-tuned from google/gemma-3-1b-it for cultural preservation and education} |
| } |
| ``` |
|
|
| ## Contributing |
|
|
| Interested in improving Neodac-mini? We welcome: |
| - Additional cultural data from Northeast India |
| - Feedback on cultural accuracy |
| - Suggestions for new cultural domains |
| - Community validation of responses |
|
|
| ## License |
|
|
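| This model is released under the Apache 2.0 license. The base model, google/gemma-3-1b-it, is distributed by Google under the Gemma Terms of Use rather than Apache 2.0, so those terms should be reviewed before redistribution or commercial use. |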
|
|
| ## Acknowledgments |
|
|
| - Google for the Gemma 3 1B-IT base model |
| - Cultural experts and communities of Northeast India |
| - Contributors to the cultural dataset |
| - Hugging Face for the platform and tools |
|
|
| --- |
|
|
| *Neodac-mini represents a step forward in culturally aware AI, preserving and making accessible the rich heritage of Northeast India through technology.* |