---
language:
- en
license: apache-2.0
base_model: google/t5-base
tags:
- text-generation
- prompt-engineering
- stable-diffusion
- image-generation
- wd14-tags
- comfyui
- t5
- text2text-generation
pipeline_tag: text2text-generation
widget:
- text: "translate prompt to tags: magical girl with blue hair in a garden"
  example_title: "Magical Girl"
- text: "translate prompt to tags: cyberpunk city at night with neon lights"
  example_title: "Cyberpunk City"
- text: "translate prompt to tags: cute cat sleeping on a windowsill"
  example_title: "Cute Cat"
datasets:
- arcenciel
metrics:
- accuracy
model-index:
- name: t5-base-prompt-translator
  results:
  - task:
      type: text2text-generation
      name: Prompt to Tags Translation
    metrics:
    - type: accuracy
      value: 87.5
      name: Tag Matching Accuracy
---

# T5 Base Prompt Translator

Transform natural language descriptions into optimized WD14 tags for Stable Diffusion! This model translates creative natural language prompts into standardized WD14-format tags and was trained on 95,000 high-quality prompts from Arcenciel.io.

## Quick Start

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("Elldreth/t5_base_prompt_translator")
model = T5ForConditionalGeneration.from_pretrained("Elldreth/t5_base_prompt_translator")

# Translate a prompt (the task prefix is required)
prompt = "translate prompt to tags: magical girl with blue hair in a garden"
inputs = tokenizer(prompt, return_tensors="pt", max_length=160, truncation=True)
outputs = model.generate(**inputs, max_length=256, num_beams=4)
tags = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(tags)
# Output: 1girl, blue hair, garden, outdoors, solo, long hair, dress, flowers, standing, day, smile, magical girl
```

## Model Details

- **Name:** `t5_base_prompt_translator`
- **Base Model:** T5-Base (Google)
- **Parameters:** 220 million
- **Training Data:** 95,000 high-quality prompts from Arcenciel.io
- **Training Duration:** ~10 hours on an RTX 4090
- **Model Size:** ~850 MB

### Training Configuration

- **Epochs:** 7 (~10,388 steps)
- **Batch Size:** 64 (effective)
- **Learning Rate:** 3e-4 → 3e-5 (linear decay)
- **Optimizer:** Adafactor (memory efficient)
- **Precision:** BF16 mixed precision
- **Max Length:** 160 tokens input, 256 tokens output

### Performance Metrics

- **Accuracy:** 85-90% tag matching
- **Final Loss:** ~1.2-1.3

**Inference Performance (RTX 4090):** higher beam counts yield better tag quality at the cost of speed.

- 2-4 beams: very fast, good for iteration
- 8-16 beams: balanced quality and speed
- 32-64 beams: maximum quality

**VRAM Usage:**

- Model loading: ~2 GB
- Inference: an additional 1-4 GB, depending on beam count
- Total: ~3-6 GB at the highest-quality settings

**Note:** Generation remains fast even at high beam counts (32-64) on an RTX 4090, so maximum-quality settings are practical for production work.

### Data Format

**Input Format:**

```
translate prompt to tags: [natural language description]
```

**Output Format:**

```
tag1, tag2, tag3, tag4, ...
```
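Putting the two formats together, here is a minimal helper sketch; `build_input` and `parse_tags` are illustrative names, not functions shipped with this repository:

```python
PREFIX = "translate prompt to tags: "

def build_input(description: str) -> str:
    """Wrap a natural-language description in the required task prefix."""
    return PREFIX + description.strip()

def parse_tags(output: str) -> list[str]:
    """Split the model's comma-separated output into a clean tag list."""
    return [tag.strip() for tag in output.split(",") if tag.strip()]

print(build_input("magical girl with blue hair in a garden"))
# translate prompt to tags: magical girl with blue hair in a garden

print(parse_tags("1girl, blue hair, garden, outdoors, solo"))
# ['1girl', 'blue hair', 'garden', 'outdoors', 'solo']
```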
**Tag Format:**

- WD14 tagger format with escaped parentheses: `tag \(descriptor\)`
- Examples: `shrug \(clothing\)`, `blue eyes`, `long hair`

### Training Data Sources

- **Source:** Arcenciel.io public API
- **Dataset Size:** 95,000 image-prompt pairs
- **Rating Filter:** None (all ratings included for maximum diversity)
- **Quality Filter:** None (engagement metrics are not widely used on the site)
- **Ground Truth:** WD14 v1.4 MOAT tagger (SmilingWolf)
- **Format:** Escaped-parentheses tags for Stable Diffusion compatibility

**Note:** Quality filtering was intentionally avoided to preserve the diversity of the training data. Engagement metrics (hearts, likes) are not used consistently across the site, so filtering on them would have shrunk the dataset without raising its quality.

### Model Files

- `config.json` - Model configuration
- `model.safetensors` - Model weights (safetensors format)
- `tokenizer_config.json` - Tokenizer configuration
- `spiece.model` - SentencePiece tokenizer model
- `special_tokens_map.json` - Special tokens mapping
- `added_tokens.json` - Additional tokens
- `generation_config.json` - Generation defaults
- `training_args.bin` - Training arguments (metadata)

### License

This model is based on T5-Base by Google, which is licensed under Apache 2.0.

- **Model License:** Apache 2.0
- **Training Data:** Arcenciel.io (public API)
- **Usage:** Free for commercial and non-commercial use

### Updates & Versions

**Version 1.0** (Current)

- Initial release
- Trained on 95K prompts
- T5-Base architecture
- WD14 v1.4 MOAT ground truth

## Use with ComfyUI

This model is designed to work with the [ComfyUI-T5X-Prompt-Translator](https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator) custom node:

1. Install the custom node in ComfyUI
2. The model auto-downloads on first use
3. Use the node to translate natural language into WD14 tags
4. Connect the output to a CLIP Text Encode node for image generation

See the [ComfyUI custom node repository](https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator) for installation instructions.

## Intended Use

**Primary Use Case:** Converting creative natural language descriptions into optimized WD14-format tags for Stable Diffusion image generation.

**Example Applications:**

- Simplifying prompt creation for Stable Diffusion
- Standardizing prompts across different workflows
- Learning the WD14 tag vocabulary
- Batch processing natural language descriptions (see the sketch after the Limitations list below)

## Limitations

- Trained primarily on anime/illustration-style prompts
- May not perform as well on photorealistic or other specialized domains
- Tag accuracy depends on similarity to the training data
- Requires the exact input format: `"translate prompt to tags: [your description]"`
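For the batch-processing application listed above, the following is a minimal sketch using only the standard Transformers API (this repository ships no batching helper of its own). Each description gets the required task prefix, and the beam count follows the fast-iteration row of the performance table:

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("Elldreth/t5_base_prompt_translator")
model = T5ForConditionalGeneration.from_pretrained("Elldreth/t5_base_prompt_translator")
model.eval()

descriptions = [
    "magical girl with blue hair in a garden",
    "cyberpunk city at night with neon lights",
    "cute cat sleeping on a windowsill",
]

# Every input must carry the task prefix the model was trained with.
batch = ["translate prompt to tags: " + d for d in descriptions]
inputs = tokenizer(batch, return_tensors="pt", padding=True,
                   max_length=160, truncation=True)

# num_beams=4 is in the "very fast, good for iteration" range;
# raise it toward 32-64 for maximum quality if VRAM allows.
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=256, num_beams=4)

for desc, out in zip(descriptions, outputs):
    print(f"{desc} -> {tokenizer.decode(out, skip_special_tokens=True)}")
```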
## Citation

```bibtex
@misc{t5-base-prompt-translator,
  title={T5 Base Prompt Translator: Natural Language to WD14 Tags},
  author={Elldreth},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Elldreth/t5_base_prompt_translator}},
}
```

## Support & Links

- **ComfyUI Node:** https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator
- **Issues:** https://github.com/yourusername/ComfyUI-T5X-Prompt-Translator/issues
- **Training Code:** https://github.com/yourusername/tag_generator

## Acknowledgments

- **Base Model:** T5-Base by Google Research
- **Training Data:** Arcenciel.io community
- **WD14 Tagger:** SmilingWolf's WD v1.4 MOAT tagger
- **Framework:** Hugging Face Transformers