jaisidhsingh/cc3m558k-img-embed-dim-384
This collection presents CC3M-558K, the data used by LLaVA for multi-modal feature alignment, pre-embedded with 9 image encoders and 3 text encoders.