---
language:
- en
license: apache-2.0
base_model: google/t5-base
tags:
- text-generation
- prompt-engineering
- stable-diffusion
- image-generation
- wd14-tags
- comfyui
- t5
- text2text-generation
pipeline_tag: text-generation
widget:
- text: "translate prompt to tags: magical girl with blue hair in a garden"
  example_title: "Magical Girl"
- text: "translate prompt to tags: cyberpunk city at night with neon lights"
  example_title: "Cyberpunk City"
- text: "translate prompt to tags: cute cat sleeping on a windowsill"
  example_title: "Cute Cat"
datasets:
- arcenciel
metrics:
- accuracy
model-index:
- name: t5-base-prompt-translator
  results:
  - task:
      type: text-generation
      name: Prompt to Tags Translation
    metrics:
    - type: accuracy
      value: 87.5
      name: Tag Matching Accuracy
---

# T5 Base Prompt Translator

Transform natural language descriptions into optimized WD14 tags for Stable Diffusion!

This model, trained on 95,000 high-quality prompts from Arcenciel.io, translates creative natural language prompts into standardized WD14-format tags.

## Quick Start

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("Elldreth/t5_base_prompt_translator")
model = T5ForConditionalGeneration.from_pretrained("Elldreth/t5_base_prompt_translator")

# Translate a prompt
prompt = "translate prompt to tags: magical girl with blue hair in a garden"
inputs = tokenizer(prompt, return_tensors="pt", max_length=160, truncation=True)
outputs = model.generate(**inputs, max_length=256, num_beams=4)
tags = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(tags)
# Output: 1girl, blue hair, garden, outdoors, solo, long hair, dress, flowers, standing, day, smile, magical girl
```

## Model Details

**Name:** `t5_base_prompt_translator`  
**Base Model:** T5-Base (Google)  
**Parameters:** 220 million  
**Training Data:** 95,000 high-quality prompts from Arcenciel.io  
**Training Duration:** ~10 hours on RTX 4090  
**Model Size:** ~850 MB

### Training Configuration

- **Epochs:** 7 (~10,388 steps)
- **Batch Size:** 64 (effective)
- **Learning Rate:** 3e-4 → 3e-5 (linear decay)
- **Optimizer:** AdaFactor (memory efficient)
- **Precision:** BF16 mixed precision
- **Max Length:** 160 tokens input, 256 tokens output
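
For reference, here is a minimal sketch of how these settings might map to Hugging Face `Seq2SeqTrainingArguments`. The per-device batch size / gradient-accumulation split and the output directory are assumptions; only the effective batch size of 64, epochs, learning rate, optimizer, and precision are documented above.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the training configuration above;
# values not listed in the bullet points are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5_base_prompt_translator",  # assumed output path
    num_train_epochs=7,
    per_device_train_batch_size=16,          # 16 x 4 accumulation = effective batch of 64 (assumed split)
    gradient_accumulation_steps=4,
    learning_rate=3e-4,                      # documented schedule decays toward 3e-5
    lr_scheduler_type="linear",              # approximation; the exact decay-to-3e-5 scheduler may differ
    optim="adafactor",                       # memory-efficient optimizer
    bf16=True,                               # BF16 mixed precision
    predict_with_generate=True,
)
# Input/output lengths (160 / 256 tokens) are enforced during tokenization, not here.
```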

### Performance Metrics

**Accuracy:** 85-90% tag matching  
**Final Loss:** ~1.2-1.3  

**Inference Performance (RTX 4090):**
- Higher beam counts provide better quality at the cost of speed
- 2-4 beams: Very fast, good for iteration
- 8-16 beams: Balanced quality/speed
- 32-64 beams: Maximum quality, excellent performance on RTX 4090

**VRAM Usage:**
- Model loading: ~2 GB
- Inference: Additional 1-4 GB depending on beam count
- Total: ~3-6 GB for highest quality settings

**Note:** Inference remains fast even at high beam counts (32-64) on an RTX 4090, making maximum-quality settings practical for production work.
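
As a rough illustration of the beam-count trade-off, here is a sketch that times generation at a few settings on GPU (the timing loop is illustrative, not part of the model's API, and absolute numbers will vary by hardware):

```python
import time

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained("Elldreth/t5_base_prompt_translator")
model = T5ForConditionalGeneration.from_pretrained("Elldreth/t5_base_prompt_translator").to(device)

prompt = "translate prompt to tags: cyberpunk city at night with neon lights"
inputs = tokenizer(prompt, return_tensors="pt", max_length=160, truncation=True).to(device)

# Compare a fast, a balanced, and a maximum-quality beam setting.
for num_beams in (4, 16, 64):
    start = time.perf_counter()
    outputs = model.generate(**inputs, max_length=256, num_beams=num_beams)
    elapsed = time.perf_counter() - start
    tags = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"{num_beams:>2} beams ({elapsed:.2f}s): {tags}")
```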

### Data Format

**Input Format:**
```
translate prompt to tags: [natural language description]
```

**Output Format:**
```
tag1, tag2, tag3, tag4, ...
```

**Tag Format:**
- WD14 tagger format with escaped parentheses: `tag \(descriptor\)`
- Example: `shrug \(clothing\)`, `blue eyes`, `long hair`
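
A minimal helper sketch for wrapping these formats (the task prefix is the documented one; the function names are illustrative, not part of the model):

```python
def build_input(description: str) -> str:
    # The model expects this exact task prefix in front of the description.
    return f"translate prompt to tags: {description.strip()}"


def parse_tags(output: str) -> list[str]:
    # The model emits one comma-separated string of WD14 tags, with
    # parentheses escaped for Stable Diffusion (e.g. "shrug \(clothing\)").
    return [tag.strip() for tag in output.split(",") if tag.strip()]


print(build_input("magical girl with blue hair in a garden"))
# translate prompt to tags: magical girl with blue hair in a garden
print(parse_tags(r"blue eyes, long hair, shrug \(clothing\)"))
# ['blue eyes', 'long hair', 'shrug \\(clothing\\)']
```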

### Training Data Sources

- **Source:** Arcenciel.io API
- **Dataset Size:** 95,000 image-prompt pairs
- **Rating Filter:** None (all ratings included for maximum diversity)
- **Quality Filter:** None (engagement metrics not widely used on site)
- **Ground Truth:** WD14 v1.4 MOAT tagger (SmilingWolf)

**Note:** Quality filtering was intentionally avoided so as not to limit the diversity of the training data. Engagement metrics (hearts, likes) are not used consistently across the site, so filtering by them would have reduced dataset quality rather than improved it.

### Model Files

- `config.json` - Model configuration
- `model.safetensors` - Model weights (safetensors format)
- `tokenizer_config.json` - Tokenizer configuration
- `spiece.model` - SentencePiece tokenizer model
- `special_tokens_map.json` - Special tokens mapping
- `added_tokens.json` - Additional tokens
- `generation_config.json` - Generation defaults
- `training_args.bin` - Training arguments (metadata)

### License

This model is based on T5-Base by Google, which is licensed under Apache 2.0.

**Model License:** Apache 2.0  
**Training Data:** Arcenciel.io (public API)  
**Usage:** Free for commercial and non-commercial use

### Citation

If you use this model in your work, please cite:

```
T5X Prompt Translator Base 95K
Trained on Arcenciel.io dataset using WD14 v1.4 MOAT tagger
Base model: T5-Base (Google)
```

### Updates & Versions

**Version 1.0** (Current)
- Initial release
- Trained on 95K prompts
- T5-Base architecture
- WD14 v1.4 MOAT ground truth

## Use with ComfyUI

This model is designed to work with the [ComfyUI-T5X-Prompt-Translator](https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator) custom node:

1. Install the custom node in ComfyUI
2. The model will auto-download on first use
3. Use the node to translate natural language to WD14 tags
4. Connect to CLIP Text Encode for image generation

See the [ComfyUI custom node repository](https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator) for installation instructions.

## Intended Use

**Primary Use Case:** Converting creative natural language descriptions into optimized WD14-format tags for Stable Diffusion image generation.

**Example Applications:**
- Simplifying prompt creation for Stable Diffusion
- Standardizing prompts across different workflows
- Learning WD14 tag vocabulary
- Batch processing natural language descriptions (see the sketch below)
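
A minimal batch-processing sketch (the list of descriptions is illustrative; padding lets a single `generate` call handle the whole batch):

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained("Elldreth/t5_base_prompt_translator")
model = T5ForConditionalGeneration.from_pretrained("Elldreth/t5_base_prompt_translator").to(device)

descriptions = [
    "magical girl with blue hair in a garden",
    "cyberpunk city at night with neon lights",
    "cute cat sleeping on a windowsill",
]

# Prepend the required task prefix and tokenize everything as one padded batch.
batch = tokenizer(
    [f"translate prompt to tags: {d}" for d in descriptions],
    return_tensors="pt", padding=True, max_length=160, truncation=True,
).to(device)

outputs = model.generate(**batch, max_length=256, num_beams=4)
for description, output in zip(descriptions, outputs):
    print(description, "->", tokenizer.decode(output, skip_special_tokens=True))
```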

## Limitations

- Trained primarily on anime/illustration style prompts
- May not perform as well on photorealistic or other specialized domains
- Tag accuracy depends on similarity to training data
- Requires proper input format: `"translate prompt to tags: [your description]"`

## Training Data

- **Source:** Arcenciel.io public API
- **Size:** 95,000 image-prompt pairs
- **Filtering:** No quality or rating filters (maximum diversity)
- **Ground Truth:** WD14 v1.4 MOAT tagger by SmilingWolf
- **Format:** Escaped parentheses format for Stable Diffusion compatibility

**Note:** Quality filtering was intentionally avoided to maximize training data diversity. Engagement metrics (hearts, likes) are not consistently used across the source platform.

## Citation

```bibtex
@misc{t5-base-prompt-translator,
  title={T5 Base Prompt Translator: Natural Language to WD14 Tags},
  author={Elldreth},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Elldreth/t5_base_prompt_translator}},
}
```

## Support & Links

- **ComfyUI Node:** https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator
- **Issues:** https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator/issues
- **Training Code:** https://github.com/yourusername/tag_generator

## Acknowledgments

- **Base Model:** T5-Base by Google Research
- **Training Data:** Arcenciel.io community
- **WD14 Tagger:** SmilingWolf's WD v1.4 MOAT tagger
- **Framework:** Hugging Face Transformers