jaisidhsingh's Collections
HYMA: LlaVA Alignment CC3M-558K Pre-embedded

updated Aug 19

This collection presents CC3M-558K, the dataset used by LLaVA for multi-modal feature alignment, pre-embedded with nine image encoders and three text encoders.

  • jaisidhsingh/cc3m558k-img-embed-dim-384

    Updated Aug 18 • 31

  • jaisidhsingh/cc3m558k-img-embed-dim-768

    Updated Aug 18 • 30

  • jaisidhsingh/cc3m558k-img-embed-dim-1024

    Updated Aug 18 • 35

  • jaisidhsingh/cc3m558k-txt-embed-all-dims

    Updated Aug 18 • 35

  • (Almost) Free Modality Stitching of Foundation Models

    Paper • 2507.10015 • Published Jul 14 • 1
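
The repositories listed above could be fetched locally along the following lines. This is a minimal sketch, not the author's documented workflow: it assumes the entries are dataset repositories on the Hugging Face Hub and that the `huggingface_hub` package is installed. The helper names `image_repo_id` and `download_embeddings` are hypothetical, and the file layout inside each repository is not described on this page, so the sketch only downloads the files and returns their local directories.

```python
# Repo IDs are copied from the collection listing above.
IMAGE_EMBED_DIMS = (384, 768, 1024)
TEXT_REPO_ID = "jaisidhsingh/cc3m558k-txt-embed-all-dims"


def image_repo_id(dim: int) -> str:
    """Return the listed repo ID for image embeddings of the given dimension."""
    if dim not in IMAGE_EMBED_DIMS:
        raise ValueError(f"no image-embedding repo listed for dim={dim}")
    return f"jaisidhsingh/cc3m558k-img-embed-dim-{dim}"


def download_embeddings(dim: int) -> tuple[str, str]:
    """Download the image and text repos; return their local directories."""
    # Imported lazily so image_repo_id works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Assumption: both entries are dataset repos (hence repo_type="dataset").
    image_dir = snapshot_download(repo_id=image_repo_id(dim), repo_type="dataset")
    text_dir = snapshot_download(repo_id=TEXT_REPO_ID, repo_type="dataset")
    return image_dir, text_dir
```

Calling `download_embeddings(768)` would download both repositories into the local Hub cache. How the embedding files inside them are stored and read is left open here, since the listing does not say.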