Pony V7

Pony V7 is a versatile character generation model based on the AuraFlow architecture. It supports a wide range of styles and species types (humanoid, anthro, feral, and more) and handles character interactions through natural language prompts.

Fictional

First, let me introduce Fictional, our multimodal platform where AI characters come alive through text, images, voice, and (soon) video. Powered by PonyV7, V6, Chroma, Seedream 4, and other advanced models, Fictional lets you discover, create, and interact with characters who live their own lives and share their own stories.

Fictional is also what enables the development of models like V7, so if you're excited about the future of multimodal AI characters, please download Fictional on iOS or Android and help shape our future!

iOS: https://apps.apple.com/us/app/fictional/id6739802573
Android: https://play.google.com/store/apps/details?id=ai.fictional.app

Get in touch with us

Please join our Discord Server if you have questions about Fictional and Pony models.

Important model information

Please check this article to learn more about why it took so long for us to ship V7 and upcoming model releases.

Important HuggingFace links

GGUF Models - Quantized models for lower VRAM usage (Q8_0 recommended for best quality/size balance)
Safetensor Model - Single-file safetensors format for easier loading
LoRA Training - Information and tools for training LoRAs with SimpleTuner
Workflows - ComfyUI workflow examples for standard and GGUF inference
ComfyUI Nodes - Custom PonyNoise node for GPU/CPU noise selection
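As a rough guide to the size side of the quantization trade-off, the weight storage of each GGUF variant can be estimated from the listed 7B parameter count and the bits per weight of each GGML block format. This is a back-of-the-envelope sketch, not a measurement: it assumes every weight is quantized uniformly and ignores file metadata, tensors kept at higher precision, the text encoder, the VAE, and activation memory, so real file sizes and VRAM usage will differ. Q4_0 is used here only as one example of a 4-bit format.

```python
# Back-of-the-envelope GGUF weight-size estimate (assumes uniform
# quantization of all weights; ignores metadata and non-quantized tensors).

# Bits per weight for a few GGML formats, derived from their block layouts:
#   Q8_0: 32 int8 weights + one fp16 scale = 34 bytes per 32 weights
#   Q4_0: 32 4-bit weights (16 bytes) + one fp16 scale = 18 bytes per 32 weights
#   F16:  2 bytes per weight
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 34 * 8 / 32,  # 8.5 bits per weight
    "Q4_0": 18 * 8 / 32,  # 4.5 bits per weight
}


def estimate_weight_gb(num_params: float, quant: str) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


if __name__ == "__main__":
    params = 7e9  # "7B params", as rounded on the model page
    for quant in ("F16", "Q8_0", "Q4_0"):
        print(f"{quant}: ~{estimate_weight_gb(params, quant):.1f} GB")
```

With 7e9 parameters this gives roughly 14 GB for F16, 7.4 GB for Q8_0, and 3.9 GB for Q4_0, so Q8_0 roughly halves weight storage relative to F16.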

Model prompting

This model supports a wide array of styles and aesthetics but provides an opinionated default prompt template:

special tags, factual description of image, stylistic description of image, additional content tags
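The template can be assembled mechanically. The sketch below uses a hypothetical helper, `build_prompt` (not part of any Pony tooling), that joins the four sections in template order and skips empty ones. Separating sections with commas is an assumption based on how the template line is written, not a documented requirement.

```python
from typing import Sequence


def build_prompt(
    special_tags: Sequence[str],
    factual: str,
    stylistic: str,
    content_tags: Sequence[str],
) -> str:
    """Join the four template sections in order, skipping empty ones.

    Comma separation is an assumption based on the template line; it is
    not a documented requirement of the model.
    """
    sections = [
        ", ".join(special_tags),
        factual.strip(),
        stylistic.strip(),
        ", ".join(content_tags),
    ]
    return ", ".join(s for s in sections if s)


if __name__ == "__main__":
    print(build_prompt(
        ["score_9"],
        "Anthro bunny female Lola Bunny from Space Jam dribbling a basketball",
        "digital illustration, full-body shot, soft lighting",
        ["smiling"],
    ))
```

Skipping empty sections keeps a prompt without special tags from starting with a stray comma, which matters because special tags are optional and currently less reliable in V7.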

Special Tags

score_X, style_cluster_x, source_X. Warning: V7 prompting with these tags may be inconsistent; please see the article for details. We are working on V7.1 to address this.

Factual description of image

Description of what is portrayed in the image without any stylistic indicators. Two recommendations:

Start with a single phrase describing what you want in the image before going into details
When referring to characters, use the pattern: <species> <gender> <name> from <source>

For example "Anthro bunny female Lola Bunny from Space Jam".
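The naming pattern is plain string formatting. This hypothetical helper, `character_reference`, reproduces the Lola Bunny example from its four parts. Treating the species slot as a free-form phrase (so "Anthro bunny" fits in it) is an interpretation of the example rather than a documented rule.

```python
def character_reference(species: str, gender: str, name: str, source: str) -> str:
    """Format a character as '<species> <gender> <name> from <source>'."""
    return f"{species} {gender} {name} from {source}"


if __name__ == "__main__":
    print(character_reference("Anthro bunny", "female", "Lola Bunny", "Space Jam"))
    # Anthro bunny female Lola Bunny from Space Jam
```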

This model is capable of recognizing many popular and obscure characters and series.

Stylistic description of image

Any information about image medium, shot type, lighting, etc. (more details will follow with the captioning Colab)

Captioning Colab

To get a better understanding of V7 prompting, we are releasing a captioning Colab with all the models used for V7 captioning.

Supported inference settings

V7 supports resolutions from 768px to 1536px. We recommend using higher resolutions and at least 30 inference steps.
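A small pre-flight check can catch settings outside this guidance before a long generation run. The sketch below assumes the 768px to 1536px range applies to each side independently (consistent with the stated maximum of 1536x1536); `check_settings` and the constant names are hypothetical.

```python
from typing import List

# Values taken from this model card; the per-side interpretation is an assumption.
MIN_SIDE_PX = 768
MAX_SIDE_PX = 1536
MIN_RECOMMENDED_STEPS = 30


def check_settings(width: int, height: int, steps: int) -> List[str]:
    """Return human-readable warnings for settings outside the card's guidance."""
    warnings = []
    for label, side in (("width", width), ("height", height)):
        if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            warnings.append(
                f"{label}={side}px is outside the supported {MIN_SIDE_PX}-{MAX_SIDE_PX}px range"
            )
    if steps < MIN_RECOMMENDED_STEPS:
        warnings.append(
            f"steps={steps} is below the recommended minimum of {MIN_RECOMMENDED_STEPS}"
        )
    return warnings


if __name__ == "__main__":
    print(check_settings(1024, 1536, 30))  # []
    print(check_settings(512, 1024, 20))   # two warnings: width and steps
```

Returning warnings instead of raising keeps the check advisory, since the step count is a recommendation rather than a hard limit.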

Highlights compared to V6

Much stronger understanding of prompts, especially when it comes to spatial information and multiple characters
Much stronger background support - both generation of backgrounds and using background with character
Much stronger realism support out of the box
Ability to generate very dark and very light images
Resolution up to 1536x1536 pixels
Expanded character recognition (some V6 characters may be recognized less reliably, but overall we have extended character knowledge substantially)

Special thanks

Iceman for helping to procure necessary training resources
Simo Ryu and the rest of the FAL.ai team for creating AuraFlow and for emotional support
Runpod for providing captioning compute
Piclumen for being our partners
City96 for help with GGUF support
diffusers team for supporting AuraFlow integration work
PSAI Server Subscribers for supporting the project costs
PSAI Server Moderators for being vigilant and managing the community
Many supporters who chose to remain anonymous, whose help was critical to getting V7 done

Technical details

The model has been trained on ~10M images, aesthetically ranked and selected from a superset of over 30M images, with roughly equal proportions of anime, cartoon, furry, and pony data and roughly equal proportions of safe, questionable, and explicit ratings. All images have been tagged and captioned with high-quality, detailed captions.
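The stated figures imply approximate per-category sizes. The arithmetic below is illustrative only: it uses the rounded numbers from the text, treats "over 30M" as a lower bound, and does not assume anything about how the dataset and rating balances combine.

```python
# Approximate counts implied by the stated figures (illustrative arithmetic only;
# the card does not say whether the two balances hold jointly).
TRAINING_IMAGES = 10_000_000   # "~10M images"
SUPERSET_IMAGES = 30_000_000   # "over 30M images" (treated as a lower bound)
DATASETS = ("anime", "cartoon", "furry", "pony")
RATINGS = ("safe", "questionable", "explicit")

# Because the superset is *over* 30M, roughly a third or less was kept.
max_kept_fraction = TRAINING_IMAGES / SUPERSET_IMAGES
images_per_dataset = TRAINING_IMAGES / len(DATASETS)
images_per_rating = TRAINING_IMAGES / len(RATINGS)

if __name__ == "__main__":
    print(f"kept fraction: <= {max_kept_fraction:.2f}")
    print(f"per dataset:   ~{images_per_dataset:,.0f}")
    print(f"per rating:    ~{images_per_rating:,.0f}")
```

That works out to about 2.5M images per dataset category and about 3.3M per rating.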

All images have been used in training with both captions and tags. Artists' names have been removed and source data has been filtered based on our Opt-in/Opt-out program. Any inappropriate explicit content has been filtered out.

Limitations

This model does not support text generation; its text rendering capabilities are degraded compared to base AuraFlow.
Special tags (including quality tags) perform much worse than in V6, meaning score_9 will not necessarily yield better results on some prompts. We are working on a V7.1 follow-up to improve this.
Small details, and especially faces, may degrade significantly depending on the art style. This is caused by a combination of an outdated VAE and insufficient training, which we are trying to improve in V7.1.

LoRA training

We recommend using SimpleTuner for LoRA training following this guide.

For information on converting SimpleTuner LoRAs to diffusers/ComfyUI compatible format, see the LoRA folder. A LoRA workflow example is also available.

Commercial API

We provide a commercial API through our exclusive partner, FAL.ai.

License

This model is licensed under a Pony License

In short, you can use this model and its outputs commercially unless you provide an inference service or application, have a company with over 1M in revenue, or use it in professional video production. These limitations do not apply if you use first-party commercial APIs.

If you want to use this model commercially, please reach us at [email protected].

Explicit permission for commercial inference has been granted to CivitAi and Hugging Face.

Model details

Model size: 7B params
Architecture: aura
GGUF hardware compatibility: 4-bit, 8-bit
