HYMA: LLaVA Alignment CC3M-558K Pre-embedded

jaisidhsingh's Collections

updated Aug 19

This collection provides CC3M-558K, the dataset LLaVA uses for multimodal feature alignment. It includes precomputed embeddings from 9 image encoders and 3 text encoders.
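Pre-embedding means each image and caption was already encoded once by every encoder. The 9 image encoders and 3 text encoders give 27 encoder pairs, and an alignment experiment can work directly on the cached vectors without re-running any encoder.

The sketch below illustrates this under several assumptions:
- The encoder names are hypothetical placeholders, because the collection does not list them here.
- The embeddings are assumed to load as NumPy arrays with one row per image/caption pair.
- The projector is fitted by ordinary least squares. This is a deliberately simple stand-in, not LLaVA's actual alignment objective, which trains a projector through the language model.

```python
import itertools

import numpy as np

# Hypothetical placeholder names: the collection states 9 image encoders and
# 3 text encoders, but the specific models are not named here.
IMAGE_ENCODERS = [f"image_encoder_{i}" for i in range(9)]
TEXT_ENCODERS = [f"text_encoder_{j}" for j in range(3)]


def encoder_pairs():
    """Return every (image encoder, text encoder) combination: 9 x 3 = 27."""
    return list(itertools.product(IMAGE_ENCODERS, TEXT_ENCODERS))


def fit_linear_projector(img_emb: np.ndarray, txt_emb: np.ndarray) -> np.ndarray:
    """Fit W minimizing ||img_emb @ W - txt_emb||^2 by ordinary least squares.

    Assumed layout (hypothetical):
      img_emb: (n, d_img) cached image embeddings from one image encoder
      txt_emb: (n, d_txt) cached caption embeddings from one text encoder,
               row-aligned with img_emb
    """
    W, *_ = np.linalg.lstsq(img_emb, txt_emb, rcond=None)
    return W


if __name__ == "__main__":
    # Synthetic stand-in data; no real embeddings from the collection are used.
    rng = np.random.default_rng(0)
    n, d_img, d_txt = 200, 16, 8
    img = rng.normal(size=(n, d_img))
    true_W = rng.normal(size=(d_img, d_txt))
    txt = img @ true_W
    W = fit_linear_projector(img, txt)
    print(len(encoder_pairs()))
    print(np.allclose(W, true_W))
```

Because the encoders never run inside the loop, the projector for each of the 27 pairs is cheap to fit. That is the practical benefit of shipping the data pre-embedded.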