Rotated Positional Embedding for Object Detection in Latent Space

The initial positional embeddings are rotated to align with the latent coordinates of the tagged objects, which positions them close to the corresponding object in the image.
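
The card does not say how the rotation is computed, so the following is only a minimal sketch of one common approach: a RoPE-style rotation, where consecutive pairs of embedding values are rotated by angles proportional to a latent (x, y) coordinate. The function names, the split of the embedding into an x half and a y half, and the `base` frequency are all assumptions made for illustration, not the model's actual implementation.

```python
import math


def rotate_pairs(values, angles):
    """Rotate consecutive (even, odd) pairs of `values`, one angle per pair."""
    out = []
    for i, angle in enumerate(angles):
        a, b = values[2 * i], values[2 * i + 1]
        c, s = math.cos(angle), math.sin(angle)
        # Standard 2D rotation applied to the pair (a, b).
        out.extend([a * c - b * s, a * s + b * c])
    return out


def rotate_to_latent_position(embedding, x, y, base=10000.0):
    """Rotate an initial embedding by angles derived from a latent (x, y).

    Assumption (hypothetical layout): the first half of the embedding
    encodes the x coordinate and the second half encodes y.
    """
    dim = len(embedding)
    if dim % 4 != 0:
        raise ValueError("embedding length must be divisible by 4")
    half = dim // 2
    n_pairs = half // 2
    # Geometrically decreasing frequencies, as in rotary embeddings.
    freqs = [base ** (-i / n_pairs) for i in range(n_pairs)]
    x_part = rotate_pairs(embedding[:half], [x * f for f in freqs])
    y_part = rotate_pairs(embedding[half:], [y * f for f in freqs])
    return x_part + y_part


# Example: rotate an 8-dimensional embedding toward latent position (3, 5).
initial = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rotated = rotate_to_latent_position(initial, 3.0, 5.0)
```

Because each pair is rotated rather than scaled, the embedding keeps its norm, and a coordinate of (0, 0) leaves it unchanged.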

Built on a multimodal model, with the image encoded by Wan2.1.

Categories:

- [1] hat
- [2] hair
- [3] sunglasses
- [4] shirt
- [5] skirt
- [6] pants
- [7] dress
- [8] belt
- [9] shoes
- [11] face
- [12] legs
- [14] arms
- [16] bag
- [17] scarf
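
For downstream code, the list above can be kept as a lookup table. This is a minimal sketch that assumes the bracketed numbers are the class IDs the model emits; `CATEGORIES` and `category_name` are hypothetical names, not part of the model's API. IDs missing from the list (10, 13, 15) are left unmapped rather than guessed.

```python
# Category IDs exactly as listed in this card; the gaps are intentional.
CATEGORIES = {
    1: "hat",
    2: "hair",
    3: "sunglasses",
    4: "shirt",
    5: "skirt",
    6: "pants",
    7: "dress",
    8: "belt",
    9: "shoes",
    11: "face",
    12: "legs",
    14: "arms",
    16: "bag",
    17: "scarf",
}


def category_name(category_id):
    """Return the label for a category ID, or None if the ID is not listed."""
    return CATEGORIES.get(category_id)
```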

Disclaimer

The documentation and the model require citation and attribution to the author via a link to their Hugging Face profile.


Safetensors

- Model size: 41.4M params
- Tensor type: F32