xT: Nested Tokenization for Larger Context in Large Images • Paper • 2403.01915 • Published Mar 4, 2024
Describe Anything: Detailed Localized Image and Video Captioning • Paper • 2504.16072 • Published Apr 22, 2025