---
license: mit
language:
- en
datasets:
- open-web-math/open-web-math
- vietgpt/the_pile_openwebtext2
- li2017dailydialog/daily_dialog
- OpenAssistant/oasst1
tags:
- gpt2
- decoder-only
- from-scratch
- small-model
- conversational
- open-source
- transformers
---
# 🧠 GPT-1.5 High

**GPT-1.5 High** is a lightweight, from-scratch language model for text generation, conversational AI, and light reasoning tasks. It follows the GPT-2 architecture but is designed to reduce complexity and resource usage while keeping strong language-generation quality.
## 🧠 Built From Scratch

- **Architecture:** Decoder-only transformer (GPT-style)
- **Layers:** 12 transformer blocks
- **Hidden size:** 1024
- **Attention heads:** 16
- **Context length:** 1024 tokens
- **Total parameters:** ~200M
- **Tokenizer:** Custom byte-pair-encoding (BPE) tokenizer, trained from scratch

The model was trained from the ground up with a custom tokenizer, so the vocabulary is tailored to the training corpora rather than inherited from an existing checkpoint; a configuration sketch follows below.
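The hyperparameters above map directly onto a standard GPT-2 configuration in 🤗 Transformers. The sketch below shows one plausible way to build the tokenizer and model from scratch; the corpus path, vocabulary size, and special tokens are illustrative assumptions, not values published for this model.

```python
from tokenizers import ByteLevelBPETokenizer
from transformers import GPT2Config, GPT2LMHeadModel

# Train a byte-level BPE tokenizer from scratch.
# "corpus.txt" and the vocabulary size are placeholders, not published values.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=50257,
    special_tokens=["<|endoftext|>"],
)

# Decoder-only GPT-style configuration matching the card:
# 12 layers, hidden size 1024, 16 attention heads, 1024-token context.
config = GPT2Config(
    vocab_size=50257,   # assumed to match the tokenizer above
    n_positions=1024,
    n_embd=1024,
    n_layer=12,
    n_head=16,
)
model = GPT2LMHeadModel(config)

print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")  # roughly 200M
```

With these settings the parameter count comes out at roughly 200M, consistent with the figure listed above.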
## 🛠️ Training Setup

The model was trained from scratch with the following resources:

- **GPU:** NVIDIA A100 (40 GB VRAM)
- **Framework:** 🤗 Hugging Face Transformers + PyTorch
- **Batching:** Gradient accumulation to reach larger effective batch sizes on a single GPU
- **Precision:** Mixed-precision (fp16) and full-precision (fp32) experiments
- **Training time:** Several days, depending on GPU availability
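As a rough illustration of that setup, here is a minimal sketch of how such a run could be configured with 🤗 `TrainingArguments`; the batch size, accumulation steps, learning rate, and schedule for GPT-1.5 High are not published, so every value below is an assumption.

```python
from transformers import TrainingArguments

# Hypothetical configuration; none of these values are published for GPT-1.5 High.
training_args = TrainingArguments(
    output_dir="gpt-1.5-high",
    per_device_train_batch_size=8,    # assumed micro-batch size on one A100 40GB
    gradient_accumulation_steps=16,   # accumulate gradients for a larger effective batch
    fp16=True,                        # mixed-precision run; set False for the fp32 experiments
    learning_rate=3e-4,               # assumed
    num_train_epochs=1,               # assumed
    logging_steps=100,
    save_steps=1000,
)
```

These arguments would then be passed to `Trainer` together with the model and a tokenized, 1024-token-packed dataset built from the corpora listed in the next section.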
## 📚 Datasets Used

To create a diverse and capable language model, **GPT-1.5 High** was trained on a variety of datasets:

- 🧮 `open-web-math/open-web-math` – Mathematical reasoning
- 🌐 `vietgpt/the_pile_openwebtext2` – General web knowledge
- 💬 `li2017dailydialog/daily_dialog` – Dialogue for everyday conversations
- 🤖 `OpenAssistant/oasst1` – Assistant-style tasks and prompts

These datasets provide a good mix of reasoning, everyday conversation, and general knowledge; a loading-and-mixing sketch is shown below.
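As a minimal sketch, the snippet below shows one way two of these corpora could be streamed and mixed with the 🤗 `datasets` library; the `text` field name and the sampling ratios are assumptions, since the actual preprocessing and mixture used for GPT-1.5 High are not published.

```python
from datasets import load_dataset, interleave_datasets

# Stream two of the corpora listed above; "streaming=True" avoids a full download.
web  = load_dataset("vietgpt/the_pile_openwebtext2", split="train", streaming=True)
math = load_dataset("open-web-math/open-web-math", split="train", streaming=True)

# Mix the streams with assumed sampling probabilities; the dialogue and assistant
# corpora would be added the same way after flattening their conversations into text.
mixed = interleave_datasets([web, math], probabilities=[0.7, 0.3], seed=42)

for example in mixed.take(2):
    print(example["text"][:80])  # "text" column name is an assumption
```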
## 💡 Intended Use

This model is ideal for:

- Prototyping smaller, efficient language models
- Educational and research purposes
- Local/edge deployments where size and latency are critical
- Conversational AI and chatbots

⚠️ **Please note:** This model is **not fine-tuned** for safety, ethical guidelines, or content moderation. Use with caution in public-facing applications.
## 🧪 Example Usage

```python
from transformers import pipeline

# Load the model and its tokenizer directly from the Hub.
generator = pipeline("text-generation", model="WolfInk/GPT-1.5-High", tokenizer="WolfInk/GPT-1.5-High")

# Sample a short completion.
response = generator("Hey", max_length=30, do_sample=True, temperature=0.9)
print(response[0]["generated_text"])
```
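For finer-grained control over decoding, the model can also be loaded directly, assuming the repository ships standard Transformers weight and tokenizer files; the generation settings below are illustrative rather than recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("WolfInk/GPT-1.5-High")
model = AutoModelForCausalLM.from_pretrained("WolfInk/GPT-1.5-High")

# Sampled generation; keep prompt + output within the 1024-token context window.
inputs = tokenizer("Hey", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,   # illustrative value
    do_sample=True,
    temperature=0.9,
    top_p=0.95,          # illustrative value
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```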