---
pipeline_tag: text-to-image
license: apache-2.0
base_model:
- neta-art/Neta-Lumina
- Alpha-VLLM/Lumina-Image-2.0
tags:
- stable-diffusion
- text-to-image
- comfyui
- diffusion-single-file
---

# NetaYume Lumina Image v2.0

![NetaYume Lumina Image v2.0](./Example/Demo_v2.png)

---

**I. Introduction**

NetaYume Lumina is a text-to-image model fine-tuned from [Neta Lumina](https://huggingface.co/neta-art/Neta-Lumina), a high-quality anime-style image generation model developed by [Neta.art Lab](https://huggingface.co/neta-art). Neta Lumina in turn builds upon [Lumina-Image-2.0](https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0), an open-source base model released by the [Alpha-VLLM](https://huggingface.co/Alpha-VLLM) team at Shanghai AI Laboratory.

This model was trained not only to generate realistic human images but also to produce high-quality anime-style images. Although it was fine-tuned on a specific dataset, it retains much of the base model's knowledge.

**Key Features:**

- **High-Quality Anime Generation**: Generates detailed anime-style images with sharp outlines, vibrant colors, and smooth shading.
- **Improved Character Understanding**: Better captures characters, especially those from the Danbooru dataset, producing more coherent and accurate character representations.
- **Enhanced Fine Details**: Accurately generates accessories, clothing textures, hairstyles, and background elements with greater clarity.

`NetaYume_Lumina_v2_all_in_one.safetensors` is an all-in-one checkpoint that bundles the VAE, text encoder, and image backbone weights needed to run the model in ComfyUI.

---

**II. Model Components & Training Details**

- **Text Encoder**: Pre-trained **Gemma-2-2b**
- **Variational Autoencoder**: Pre-trained **Flux.1 dev's VAE**
- **Image Backbone**: Fine-tuned **Neta Lumina's Image Backbone**

---

**III. Suggestions**

**System Prompt:** A system prompt helps the model understand and align with your prompts, making it easier to generate the images you want.

For anime-style images using Danbooru tags, use either of the following system prompts:

- `You are an assistant designed to generate anime images based on textual prompts.`
- `You are an assistant designed to generate high-quality images based on user prompts and danbooru tags.`

**Recommended Settings**

- CFG: 4–7
- Sampling Steps: 40–50
- Sampler:
  - Euler a (scheduler: normal)
  - res_multistep (scheduler: linear_quadratic)

---

**IV. Acknowledgments**

- [narugo1992](https://huggingface.co/narugo) – for the invaluable Danbooru dataset
- [Alpha-VLLM](https://huggingface.co/Alpha-VLLM) – for creating a wonderful base model
- [Neta.art](https://huggingface.co/neta-art/Neta-Lumina) and team – for openly sharing their awesome model
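
The system prompts and recommended settings from Section III can be gathered into a small helper script. This is a minimal sketch, not official tooling for this model. It makes several assumptions. The ` <Prompt Start> ` separator follows the prompt convention used by Lumina-Image-2.0-based pipelines and is assumed here. The ComfyUI identifier `euler_ancestral` is assumed to correspond to "Euler a". The names `build_prompt` and `GenerationSettings`, and the default values CFG 5.5 and 45 steps (midpoints of the recommended ranges), are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# System prompts quoted from Section III of this model card.
ANIME_SYSTEM_PROMPT = (
    "You are an assistant designed to generate anime images based on textual prompts."
)
DANBOORU_SYSTEM_PROMPT = (
    "You are an assistant designed to generate high-quality images "
    "based on user prompts and danbooru tags."
)

# ASSUMPTION: Lumina-Image-2.0-style separator between system and user prompt.
PROMPT_SEPARATOR = " <Prompt Start> "

# Recommended sampler -> scheduler pairs from the card, written as ComfyUI
# identifiers (ASSUMPTION: "Euler a" corresponds to "euler_ancestral").
RECOMMENDED_PAIRS = {
    "euler_ancestral": "normal",
    "res_multistep": "linear_quadratic",
}


def build_prompt(user_prompt: str, system_prompt: str = DANBOORU_SYSTEM_PROMPT) -> str:
    """Prefix the user prompt (e.g. Danbooru tags) with a system prompt."""
    return system_prompt + PROMPT_SEPARATOR + user_prompt.strip()


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the card's recommended sampling settings."""

    cfg: float = 5.5  # recommended range: 4-7
    steps: int = 45  # recommended range: 40-50
    sampler: str = "euler_ancestral"
    scheduler: str = "normal"

    def validate(self) -> list[str]:
        """Return one warning for each value outside the card's recommendations."""
        warnings = []
        if not 4 <= self.cfg <= 7:
            warnings.append(f"cfg={self.cfg} is outside the recommended range 4-7")
        if not 40 <= self.steps <= 50:
            warnings.append(f"steps={self.steps} is outside the recommended range 40-50")
        expected = RECOMMENDED_PAIRS.get(self.sampler)
        if expected is None:
            warnings.append(f"sampler {self.sampler!r} is not a recommended sampler")
        elif self.scheduler != expected:
            warnings.append(
                f"sampler {self.sampler!r} is recommended with scheduler {expected!r}, "
                f"not {self.scheduler!r}"
            )
        return warnings


if __name__ == "__main__":
    print(build_prompt("1girl, solo, silver hair, school uniform, cherry blossoms"))
    print(GenerationSettings().validate())  # → []
    print(GenerationSettings(cfg=9.0, sampler="res_multistep").validate())
```

Running the script prints the combined prompt and an empty warning list for the defaults. It then prints two warnings for the last configuration: the CFG is above 7, and `res_multistep` is paired with `normal` instead of `linear_quadratic`. Keeping sampler and scheduler as a paired lookup reflects the card's recommendation that each sampler be used with its own scheduler.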