WolfInk committed on
Commit e2d6609 · verified · 1 Parent(s): 7755f64

🔄 More intelligent

Files changed (1)
  1. README.md +0 -74
README.md CHANGED
@@ -1,75 +1 @@
- ---
- license: mit
- language:
- - en
- datasets:
- - open-web-math/open-web-math
- - vietgpt/the_pile_openwebtext2
- - li2017dailydialog/daily_dialog
- - OpenAssistant/oasst1
- tags:
- - gpt2
- - decoder-only
- - from-scratch
- - small-model
- - conversational
- - open-source
- - transformers
- ---

- # 🧠 GPT-1.5 High
-
- **GPT-1.5 High** is a lightweight, from-scratch language model designed for text generation, conversational AI, and light reasoning tasks. It follows the GPT-2 architecture but is scaled down to reduce complexity and resource usage while maintaining strong language-generation capabilities.
-
- ## 🚧 Built From Scratch
-
- **Architecture:** Decoder-only transformer (GPT-style)
- **Layers:** 12 transformer blocks
- **Hidden size:** 1024
- **Heads:** 16 attention heads
- **Context length:** 1024 tokens
- **Total parameters:** ~200M
- **Tokenizer:** Custom BPE (Byte-Pair Encoding) tokenizer, trained from scratch
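The card says the tokenizer is a BPE model trained from scratch but does not publish the training script. Below is a minimal sketch of how such a tokenizer could be trained with the 🤗 `tokenizers` library; the corpus path, vocabulary size, and special token are assumptions, not values from the card.

```python
from tokenizers import ByteLevelBPETokenizer

# Minimal sketch of training a byte-level BPE tokenizer from scratch.
# The corpus file, vocab size, and special token are assumptions; the
# actual values used for GPT-1.5 High are not published.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],             # hypothetical plain-text training corpus
    vocab_size=50257,                 # assumed GPT-2-sized vocabulary
    min_frequency=2,
    special_tokens=["<|endoftext|>"],
)
tokenizer.save_model("tokenizer")     # writes vocab.json and merges.txt
```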
-
- The model was trained entirely from the ground up, and the custom tokenizer keeps its vocabulary tailored to the training corpora.
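Because the card states the model follows the GPT-2 architecture, the listed hyperparameters can be expressed as a standard `GPT2Config`. This is only an illustrative reconstruction; the vocabulary size is assumed, and the config shipped with the checkpoint may differ.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative reconstruction of the hyperparameters listed above.
# vocab_size is an assumption (not stated in the card); with ~50k tokens
# the total lands near the quoted ~200M parameters.
config = GPT2Config(
    n_layer=12,        # 12 transformer blocks
    n_embd=1024,       # hidden size
    n_head=16,         # attention heads
    n_positions=1024,  # context length
    vocab_size=50257,  # assumed GPT-2-style vocabulary
)
model = GPT2LMHeadModel(config)
print(f"~{model.num_parameters() / 1e6:.0f}M parameters")
```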
-
- ## 🛠️ Training Setup
-
- This model was trained from scratch using the following resources:
-
- **GPU:** NVIDIA A100 (40GB VRAM)
- **Framework:** 🤗 Hugging Face Transformers + PyTorch
- **Batching:** Gradient accumulation to reach a larger effective batch size with full 1024-token sequences
- **Precision:** Mixed-precision (fp16) and full-precision (fp32) runs
- **Training time:** Several days, depending on GPU availability
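The training script itself is not published, so the sketch below only mirrors the setup described above (gradient accumulation plus an fp16 run) using 🤗 `TrainingArguments`; every numeric value is an assumption.

```python
from transformers import TrainingArguments

# Sketch of the described setup: gradient accumulation for a larger effective
# batch and mixed-precision (fp16) training. All values are assumptions; the
# real batch sizes, learning rate, and schedule are not published.
args = TrainingArguments(
    output_dir="gpt-1.5-high",
    per_device_train_batch_size=8,   # assumed per-step batch
    gradient_accumulation_steps=8,   # effective batch of 64 sequences
    fp16=True,                       # set False to reproduce the fp32 runs
    learning_rate=3e-4,              # assumed
    num_train_epochs=1,              # assumed
    logging_steps=100,
    save_steps=1000,
)
```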
-
- ## 📚 Datasets Used
-
- To create a diverse and capable language model, **GPT-1.5 High** was trained on a variety of datasets:
-
- 🧮 `open-web-math/open-web-math` – Mathematical reasoning
- 🌐 `vietgpt/the_pile_openwebtext2` – General web knowledge
- 💬 `li2017dailydialog/daily_dialog` – Dialogue for everyday conversations
- 🤖 `OpenAssistant/oasst1` – Assistant-style tasks and prompts
-
- These datasets provide a good mix of reasoning, everyday conversation, and general knowledge for diverse use cases.
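The four corpora are public Hugging Face datasets, so loading them could look like the sketch below. Split names and any preprocessing are assumptions, mixing ratios are not documented, and some of these repos use loading scripts that may require an older `datasets` version or `trust_remote_code=True`.

```python
from datasets import load_dataset

# Illustrative loading of the four corpora listed above. Split names and any
# preprocessing are assumptions; mixing ratios are not documented in the card.
web_math  = load_dataset("open-web-math/open-web-math", split="train")
web_text  = load_dataset("vietgpt/the_pile_openwebtext2", split="train")
dialogue  = load_dataset("li2017dailydialog/daily_dialog", split="train")
assistant = load_dataset("OpenAssistant/oasst1", split="train")

# Each corpus exposes different columns, so before tokenization every example
# would first be mapped to a single plain-text field and then shuffled together.
```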
-
- ## 💡 Intended Use
-
- This model is ideal for:
-
- Prototyping smaller, efficient language models
- Educational and research purposes
- Local/edge deployments where size and latency are critical
- Conversational AI and chatbots
-
- ⚠️ Please note: This model is **not fine-tuned** for safety, ethical guidelines, or content moderation. Use with caution in public-facing applications.
-
- ## 🧪 Example Usage
-
- ```python
- from transformers import pipeline
-
- generator = pipeline("text-generation", model="WolfInk/GPT-1.5-High", tokenizer="WolfInk/GPT-1.5-High")
- response = generator("Hey", max_length=30, do_sample=True, temperature=0.9)
- print(response[0]["generated_text"])