Tony Wu

tonywu71

AI & ML interests

LLM, Multimodal, Agents, Information Retrieval, RAG, Speech

Recent Activity

liked a Space about 1 month ago

Hcompany/Holo1.5-Localization

liked a Space about 1 month ago

Hcompany/Holo1.5-Navigation

liked a model about 1 month ago

Hcompany/Holo1.5-72B

upvoted a collection about 1 month ago

Holo1.5

Collection

Holo1.5 - Open Foundation Models for Computer Use Agents • 5 items • Updated Sep 15 • 33

upvoted 2 articles 3 months ago

Article

You could have designed state of the art positional encoding

Nov 25, 2024

• 385

Article

Merge Large Language Models with mergekit

Jan 9, 2024

• 144

upvoted an article 4 months ago

Article

SmolLM3: smol, multilingual, long-context reasoner

Jul 8

• 702

upvoted a collection 5 months ago

Holo1

Collection

Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated Jun 10 • 48

upvoted 2 articles 5 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

May 21

• 224

Article

Preference Optimization for Vision Language Models

Jul 10, 2024

• 86

upvoted 2 articles 6 months ago

Article

Vision Language Models (Better, Faster, Stronger)

May 12

• 555

Article

Gotchas in Tokenizer Behavior Every Developer Should Know

Apr 18

• 44

upvoted 2 papers 7 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 297

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200

upvoted an article 7 months ago

Article

ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval

and 2 others •

Mar 18

• 12

upvoted an article 8 months ago

Article

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

Jul 29, 2024

• 364

upvoted a paper 8 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 152

upvoted 2 articles 8 months ago

Article

SigLIP 2: A better multilingual vision language encoder

Feb 21

• 186

Article

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

Feb 19

• 72

upvoted a paper 9 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 243

upvoted 3 articles 9 months ago

Article

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control

Feb 4

• 179

Article

Open-source DeepResearch – Freeing our search agents

Feb 4

• 1.31k

Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Jan 23

• 186
