GPT-OSS Mini Thai Chat

GPT-OSS Mini Thai Chat is a small Thai language model based on the GPT-OSS architecture with Mixture of Experts (MoE). It is designed for experimentation and for use on resource-constrained machines, such as a Google Colab T4 or a GPU with limited memory. The model was trained on Thai conversational data structured as instruction, input, and output.

Model Specifications

Specification	Value
Architecture	GPT-OSS Mini (MoE)
Number of Layers	6
Hidden Size	768
Attention Heads	8
Intermediate Size	3072 (SwiGLU)
Number of Experts	4
Top-K Active Experts	2
Vocabulary Size	50,000
Maximum Sequence Length	512
Normalization	RMSNorm
Activation Function	SwiGLU
Tokenizer	ZombitX64/Hanuman
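
The table lists 4 experts with 2 active per token. As a rough illustration of what that means, the following pure-Python sketch selects the two highest-scoring experts for one token and mixes their outputs. It is not taken from the model's code: the function names, the toy experts, and the choice to renormalize softmax weights over the selected experts are assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are softmax probabilities renormalized over the selected experts
    (an assumed convention; the model's actual router may differ).
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs for one token vector x."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy usage: 4 hypothetical "experts" that simply scale their input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 2.0]
selected = route_top_k(router_logits, k=2)
mixed = moe_forward([1.0, 1.0], experts, router_logits, k=2)
```

Only the selected experts run for a given token, so the compute per token is closer to that of a 2-expert model even though all 4 experts' weights must be held in memory.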

Training Data

Dataset: ZombitX64/ThaiChatbotConversation
Domain: Thai conversation (instruction → input → output)
Task: Causal Language Modeling (CLM)
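
For causal language modeling, each instruction/input/output record has to be flattened into a single text sequence. The template actually used for this model is not documented here, so the sketch below uses a hypothetical one: the `Instruction:`, `Input:`, and `Output:` labels and the `build_example` helper are assumptions, shown only to make the data shape concrete.

```python
def build_example(instruction, input_text, output):
    """Flatten one record into (prompt, full_text).

    The section labels are a hypothetical template, not the one used in training.
    The "Input:" section is skipped when input_text is empty.
    """
    parts = [f"Instruction: {instruction}"]
    if input_text:
        parts.append(f"Input: {input_text}")
    parts.append("Output: ")
    prompt = "\n".join(parts)
    return prompt, prompt + output

prompt, full_text = build_example("Translate to English", "สวัสดี", "Hello")
```

Keeping the prompt separate from the full text makes it possible to tell which tokens belong to the response when building training labels.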

Training Details

Framework: Hugging Face Transformers + Trainer
Loss Function: Cross Entropy (ignore index = -100)
Optimizer: AdamW
Mixed Precision: FP16
Epochs: 3
Batch Size: 2 (gradient accumulation steps = 8, giving an effective batch size of 16 per device)
Learning Rate: 5e-4
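
The ignore index of -100 in the loss means that positions labeled -100 (typically padding, and sometimes prompt tokens) contribute nothing to the loss. The sketch below shows that behavior in plain Python for already-aligned logits and labels. It is a simplified stand-in, not the training code: the label shift applied by causal LM heads is omitted, and returning 0.0 when every position is ignored is a choice made for this example.

```python
import math

IGNORE_INDEX = -100

def cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label != ignore_index.

    logits: list of per-position score lists (one score per vocabulary entry)
    labels: list of integer token ids, with ignore_index marking masked positions
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position: excluded from both the sum and the mean
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[label]
        count += 1
    # Returning 0.0 for a fully masked sequence is a simplification for this sketch.
    return total / count if count else 0.0

# Toy usage: the first position is masked and does not affect the loss.
toy_logits = [[0.0, 0.0], [0.0, 0.0], [5.0, 0.0]]
toy_labels = [IGNORE_INDEX, 0, 0]
loss = cross_entropy(toy_logits, toy_labels)
```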

Usage

Installation

pip install transformers accelerate sentencepiece

Load Model

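from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True runs the custom model code shipped in the repository
tokenizer = AutoTokenizer.from_pretrained("JonusNattapong/gptoss-mini-thaichat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("JonusNattapong/gptoss-mini-thaichat", trust_remote_code=True)

# Tokenize the prompt ("Hello" in Thai) and move the tensors to the model's device
inputs = tokenizer("สวัสดีครับ", return_tensors="pt").to(model.device)

# Sample a continuation; max_length counts prompt tokens plus generated tokens
outputs = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,
    top_p=0.9,
    temperature=0.7
)

# Decode the first (and only) returned sequence back to text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))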
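
The `temperature=0.7` and `top_p=0.9` settings above control how each next token is sampled. The following is a minimal pure-Python sketch of the general technique (temperature scaling followed by nucleus sampling) for a single step. It is an illustration only, not the Transformers implementation, and the `sample_top_p` helper is a hypothetical name.

```python
import math
import random

def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and nucleus (top-p) filtering."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices normalizes the weights, so no explicit renormalization is needed.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]

# Toy usage with a seeded generator for repeatability.
token = sample_top_p([2.0, 1.0, 0.5, -1.0], rng=random.Random(0))
```

Lower `top_p` or `temperature` values make the output more deterministic, which can help a small model stay on topic at the cost of variety.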

Example Output

Input:  สวัสดีครับ ("Hello")
Output: สวัสดีครับ ยินดีที่ได้สนทนาด้วยครับ ต้องการให้ช่วยเรื่องใดเพิ่มเติมหรือไม่ ("Hello, it is nice to talk with you. Would you like help with anything else?")

Limitations

This is a small experimental model and may not be suitable for production use.
Its reasoning ability and its performance on complex tasks are limited.
It has not been fine-tuned for safety or content filtering.

License

MIT License

Safetensors

Model size: 0.2B params
Tensor type: F32
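
The 0.2B figure can be sanity-checked against the values in the specification table. The estimate below is a back-of-the-envelope sketch under stated assumptions that are not confirmed by the source: standard multi-head attention with four bias-free hidden-by-hidden projections, three weight matrices per SwiGLU expert, a linear router, two RMSNorm weights per layer plus a final one, and no learned position embeddings. The `estimate_params` helper is hypothetical.

```python
def estimate_params(layers=6, hidden=768, intermediate=3072, experts=4,
                    vocab=50_000, tied_embeddings=True):
    """Rough parameter count from the specification table (assumptions in comments)."""
    attention = 4 * hidden * hidden        # Q, K, V, O projections, no biases (assumed)
    expert = 3 * hidden * intermediate     # SwiGLU: gate, up, down matrices
    router = hidden * experts              # linear router, no bias (assumed)
    norms = 2 * hidden                     # two RMSNorm weight vectors per layer
    per_layer = attention + experts * expert + router + norms
    embeddings = vocab * hidden
    total = layers * per_layer + embeddings + hidden  # + final RMSNorm
    if not tied_embeddings:
        total += vocab * hidden            # separate output projection
    return total

tied = estimate_params()
untied = estimate_params(tied_embeddings=False)
print(f"{tied / 1e6:.1f}M tied, {untied / 1e6:.1f}M untied, "
      f"about {tied * 4 / 1e9:.2f} GB at 4 bytes per F32 weight")
```

Under these assumptions the count comes to roughly 222M parameters with tied embeddings and roughly 261M without, both consistent with the reported 0.2B. Most of the weights sit in the expert feed-forward layers.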
