# EO-1 Vision-Language-Action Model (Initialization)
A pre-initialized vision-language-action model based on Qwen2.5-VL-3B-Instruct, prepared for the LeRobot integration proposed in PR #1971: https://github.com/huggingface/lerobot/pull/1971
## Quick Start
```python
from transformers import AutoProcessor, AutoModelForCausalLM

# Load the model and processor
model = AutoModelForCausalLM.from_pretrained("IPEC-COMMUNITY/eo1-qwen2_5_vl-initial", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("IPEC-COMMUNITY/eo1-qwen2_5_vl-initial", trust_remote_code=True)

# Ready for training - no additional setup required!
```
## Key Features
- Pre-configured Special Tokens: All EO-1 robotic tokens are pre-added to the vocabulary
- Multimodal Processing: Integrated processor handles images, videos, text, robot states, and actions
- Training-Ready: Directly loadable for fine-tuning without modifications
- Based on Qwen2.5-VL-3B: Inherits strong vision-language understanding capabilities
## Special Tokens
The model includes pre-configured special tokens for robotic manipulation; their token IDs can be resolved as shown in the snippet after the table:
| Token | Purpose |
|---|---|
| `<\|action_start\|>` | Marks the beginning of action sequences |
| `<\|action_pad\|>` | Padding token for actions |
| `<\|action_pass\|>` | Pass-through token for actions |
| `<\|action_end\|>` | Marks the end of action sequences |
| `<\|state_start\|>` | Marks the beginning of state sequences |
| `<\|state_pad\|>` | Padding token for states |
| `<\|state_end\|>` | Marks the end of state sequences |
| `<\|vla\|>` | Vision-Language-Action task token |
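A quick way to confirm the tokens are registered is to look up their IDs through the processor's tokenizer. The sketch below assumes the standard `processor.tokenizer` attribute exposed by Hugging Face multimodal processors:

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "IPEC-COMMUNITY/eo1-qwen2_5_vl-initial", trust_remote_code=True
)

eo1_tokens = [
    "<|action_start|>", "<|action_pad|>", "<|action_pass|>", "<|action_end|>",
    "<|state_start|>", "<|state_pad|>", "<|state_end|>", "<|vla|>",
]
# Each token should resolve to a single, non-unknown ID if it was pre-added to the vocabulary.
for token in eo1_tokens:
    print(token, processor.tokenizer.convert_tokens_to_ids(token))
```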
## Data Processing
The integrated processor handles multiple modalities (a minimal call sketch follows the list):
- Images: Automatically resized within the processor's adaptive pixel limits
- Videos: Frames automatically resized within the same adaptive pixel limits
- Text: Standard tokenization with special token support
- Robot States: Vectorized and tokenized
- Actions: Vectorized and tokenized with denoising support
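As a rough orientation, the snippet below calls the processor with an image and a text prompt through the standard Hugging Face processor interface. The prompt string, the placement of `<|vla|>`, and the omission of state/action inputs are assumptions for illustration; the authoritative argument names for robot states and actions are defined by the repository's custom processor code.

```python
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(
    "IPEC-COMMUNITY/eo1-qwen2_5_vl-initial", trust_remote_code=True
)

# Placeholder camera frame; in practice this comes from the robot's observation stream.
image = Image.new("RGB", (224, 224))

# Standard processor call; the exact prompt format and any state/action keyword
# arguments are determined by the model's custom processor implementation.
inputs = processor(
    text="<|vla|> pick up the red block",
    images=image,
    return_tensors="pt",
)
print({k: tuple(v.shape) for k, v in inputs.items() if hasattr(v, "shape")})
```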
## Model Architecture
- Base Model: Qwen2.5-VL-3B-Instruct
- Vision Encoder: Pre-trained vision transformer
- Language Model: 3B parameter transformer
- Action Projector: Custom layers for robotic action prediction
- Flow Matching: Integrated denoising mechanism for action generation (see the illustrative sketch below)
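Flow matching generates actions by integrating a learned velocity field from Gaussian noise toward the action distribution. The snippet below is a generic Euler-integration sketch of that idea, not the model's actual implementation; `velocity_model`, `num_steps`, and the tensor shapes are assumptions.

```python
import torch

def sample_actions(velocity_model, batch_size, action_dim, num_steps=10, device="cpu"):
    """Illustrative flow-matching sampler: Euler-integrate a learned velocity
    field from Gaussian noise (t = 0) toward denoised actions (t = 1)."""
    actions = torch.randn(batch_size, action_dim, device=device)  # start from noise
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = torch.full((batch_size, 1), step * dt, device=device)
        actions = actions + dt * velocity_model(actions, t)  # Euler step along the flow
    return actions

# Dummy velocity field standing in for the model's action head.
dummy_velocity = lambda actions, t: -actions
print(sample_actions(dummy_velocity, batch_size=2, action_dim=7).shape)  # torch.Size([2, 7])
```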
## Usage Projects
- LeRobot: https://github.com/huggingface/lerobot/tree/main/src/lerobot/policies/eo1
- EO-1: https://github.com/EO-Robotics/EO-1
## Contributing
For issues, questions, or contributions, please visit our GitHub repository.
Note: This is an initialization model. For best results, fine-tune on your specific robotic task data.