---
language:
- en
license: apache-2.0
base_model: google/t5-base
tags:
- text-generation
- prompt-engineering
- stable-diffusion
- image-generation
- wd14-tags
- comfyui
- t5
- text2text-generation
pipeline_tag: text-generation
widget:
- text: "translate prompt to tags: magical girl with blue hair in a garden"
example_title: "Magical Girl"
- text: "translate prompt to tags: cyberpunk city at night with neon lights"
example_title: "Cyberpunk City"
- text: "translate prompt to tags: cute cat sleeping on a windowsill"
example_title: "Cute Cat"
datasets:
- arcenciel
metrics:
- accuracy
model-index:
- name: t5-base-prompt-translator
results:
- task:
type: text-generation
name: Prompt to Tags Translation
metrics:
- type: accuracy
value: 87.5
name: Tag Matching Accuracy
---
# T5 Base Prompt Translator
Transform natural language descriptions into optimized WD14 tags for Stable Diffusion!
This model translates creative natural-language prompts into standardized WD14-format tags. It was trained on 95,000 prompts collected from Arcenciel.io.
## Quick Start
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration
# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("Elldreth/t5_base_prompt_translator")
model = T5ForConditionalGeneration.from_pretrained("Elldreth/t5_base_prompt_translator")
# Translate a prompt
prompt = "translate prompt to tags: magical girl with blue hair in a garden"
inputs = tokenizer(prompt, return_tensors="pt", max_length=160, truncation=True)
outputs = model.generate(**inputs, max_length=256, num_beams=4)
tags = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(tags)
# Output: 1girl, blue hair, garden, outdoors, solo, long hair, dress, flowers, standing, day, smile, magical girl
```
## Model Details
**Name:** `t5_base_prompt_translator`
**Base Model:** T5-Base (Google)
**Parameters:** 220 million
**Training Data:** 95,000 prompts from Arcenciel.io
**Training Duration:** ~10 hours on RTX 4090
**Model Size:** ~850 MB
### Training Configuration
- **Epochs:** 7 (~10,388 steps)
- **Batch Size:** 64 (effective)
- **Learning Rate:** 3e-4 → 3e-5 (linear decay)
- **Optimizer:** AdaFactor (memory efficient)
- **Precision:** BF16 mixed precision
- **Max Length:** 160 tokens input, 256 tokens output
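The settings above map directly onto Hugging Face `Seq2Seq` training arguments. The following is an illustrative reconstruction under stated assumptions (the actual training script is not published, and the batch-size split between per-device size and gradient accumulation is a guess):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical sketch of the configuration listed above, not the exact script used.
args = Seq2SeqTrainingArguments(
    output_dir="t5_base_prompt_translator",
    num_train_epochs=7,
    per_device_train_batch_size=16,   # assumed split; effective batch size 64
    gradient_accumulation_steps=4,    # 16 * 4 = 64 effective
    learning_rate=3e-4,               # card reports linear decay 3e-4 -> 3e-5
    lr_scheduler_type="linear",
    optim="adafactor",                # memory-efficient optimizer
    bf16=True,                        # BF16 mixed precision
    predict_with_generate=True,
)
```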
### Performance Metrics
**Accuracy:** 85-90% tag matching
**Final Loss:** ~1.2-1.3
**Inference Performance (RTX 4090):**
- Higher beam counts improve output quality at the cost of speed
- 2-4 beams: fastest; good for rapid iteration
- 8-16 beams: balanced quality and speed
- 32-64 beams: maximum quality
**VRAM Usage:**
- Model loading: ~2 GB
- Inference: Additional 1-4 GB depending on beam count
- Total: ~3-6 GB for highest quality settings
**Note:** Inference remains fast enough at high beam counts (32-64) on an RTX 4090 that maximum-quality settings are practical for production work.
### Data Format
**Input Format:**
```
translate prompt to tags: [natural language description]
```
**Output Format:**
```
tag1, tag2, tag3, tag4, ...
```
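The input prefix and comma-separated output lend themselves to two small helpers. These function names are illustrative (not part of the model or its tokenizer), but the prefix string and output shape match the formats above:

```python
PREFIX = "translate prompt to tags: "

def build_input(description: str) -> str:
    """Wrap a natural-language description in the required task prefix."""
    return PREFIX + description.strip()

def parse_tags(output: str) -> list[str]:
    """Split the model's comma-separated output into a clean tag list."""
    return [tag.strip() for tag in output.split(",") if tag.strip()]
```

For example, `build_input("cute cat sleeping")` produces `"translate prompt to tags: cute cat sleeping"`, and `parse_tags` turns the generated string back into a list suitable for further filtering or reordering.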
**Tag Format:**
- WD14 tagger format with escaped parentheses: `tag \(descriptor\)`
- Example: `shrug \(clothing\)`, `blue eyes`, `long hair`
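The model's output already escapes parentheses, but if you mix in your own tags, the same escaping can be applied with a one-liner (an illustrative helper, not part of this model):

```python
def escape_parens(tag: str) -> str:
    """Escape parentheses so Stable Diffusion doesn't parse them as weighting syntax."""
    return tag.replace("(", r"\(").replace(")", r"\)")

print(escape_parens("shrug (clothing)"))  # shrug \(clothing\)
```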
### Training Data Sources
- **Source:** Arcenciel.io API
- **Dataset Size:** 95,000 image-prompt pairs
- **Rating Filter:** None (all ratings included for maximum diversity)
- **Quality Filter:** None (engagement metrics not widely used on site)
- **Ground Truth:** WD14 v1.4 MOAT tagger (SmilingWolf)
**Note:** Quality filtering was intentionally avoided to prevent limiting the training data diversity. Engagement metrics (hearts, likes) are not consistently used across the site, so filtering by them would have reduced dataset quality rather than improved it.
### Model Files
- `config.json` - Model configuration
- `model.safetensors` - Model weights (safetensors format)
- `tokenizer_config.json` - Tokenizer configuration
- `spiece.model` - SentencePiece tokenizer model
- `special_tokens_map.json` - Special tokens mapping
- `added_tokens.json` - Additional tokens
- `generation_config.json` - Generation defaults
- `training_args.bin` - Training arguments (metadata)
### License
This model is based on T5-Base by Google, which is licensed under Apache 2.0.
**Model License:** Apache 2.0
**Training Data:** Arcenciel.io (public API)
**Usage:** Free for commercial and non-commercial use
### Updates & Versions
**Version 1.0** (Current)
- Initial release
- Trained on 95K prompts
- T5-Base architecture
- WD14 v1.4 MOAT ground truth
## Use with ComfyUI
This model is designed to work with the [ComfyUI-T5X-Prompt-Translator](https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator) custom node:
1. Install the custom node in ComfyUI
2. The model will auto-download on first use
3. Use the node to translate natural language to WD14 tags
4. Connect to CLIP Text Encode for image generation
See the [ComfyUI custom node repository](https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator) for installation instructions.
## Intended Use
**Primary Use Case:** Converting creative natural language descriptions into optimized WD14-format tags for Stable Diffusion image generation.
**Example Applications:**
- Simplifying prompt creation for Stable Diffusion
- Standardizing prompts across different workflows
- Learning WD14 tag vocabulary
- Batch processing natural language descriptions
## Limitations
- Trained primarily on anime/illustration style prompts
- May not perform as well on photorealistic or other specialized domains
- Tag accuracy depends on similarity to training data
- Requires proper input format: `"translate prompt to tags: [your description]"`
## Citation
```bibtex
@misc{t5-base-prompt-translator,
  title={T5 Base Prompt Translator: Natural Language to WD14 Tags},
  author={Elldreth},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Elldreth/t5_base_prompt_translator}}
}
```
## Support & Links
- **ComfyUI Node:** https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator
- **Issues:** https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator/issues
- **Training Code:** https://github.com/yourusername/tag_generator
## Acknowledgments
- **Base Model:** T5-Base by Google Research
- **Training Data:** Arcenciel.io community
- **WD14 Tagger:** SmilingWolf's WD v1.4 MOAT tagger
- **Framework:** Hugging Face Transformers