Improve model card: Add metadata and sample usage

#1
by nielsr (HF Staff) · opened
Files changed (1)
  1. README.md +68 -10
README.md CHANGED
@@ -1,5 +1,16 @@
  # InternVLA-N1 Model Series

  ![License](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)
  ![Transformers](https://img.shields.io/badge/%F0%9F%A4%97%20Transformers-9cf?style=flat)
  ![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white)
@@ -43,20 +54,68 @@ InternVLA-N1 is a state-of-the-art navigation foundation model built on a **mult
  ## Model Variants

  | Model Variant | Description | Key Characteristics |
- |--------------|-------------|----------------------|
- | [**InternVLA-N1 (S2)**](https://huggingface.co/InternRobotics/InternVLA-N1-System2) | Finetuned Qwen2.5-VL model for pixel-goal grounding | Strong System 2 module; compatible with decoupled System 1 controllers or joint optimization pipelines |
- | [**InternVLA-N1 (Dual System) _w/ NavDP\*_**](https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP) | Jointly tuned System 1 (NavDP*) and InternVLA-N1 (S2) | Optimized end-to-end performance; uses RGB-D observations |
- | [**InternVLA-N1 (Dual System) _DualVLN_**](https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN) | Latest dual-system architecture | Optimized end-to-end performance and faster convergence; uses RGB observations |
-
-


  > The previously released version is now called [InternVLA-N1-wo-dagger](https://huggingface.co/InternRobotics/InternVLA-N1-wo-dagger). The latest official release is recommended for best performance.

  ---

- ## Usage
- For inference, evaluation, and the Gradio demo, please refer to the [InternNav repository](https://github.com/InternRobotics/InternNav).

  ---
 
@@ -85,5 +144,4 @@ If you find our work helpful, please consider starring this repository 🌟 and
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2512.08186},
  }
-
-
 
+ ---
+ pipeline_tag: robotics
+ library_name: transformers
+ license: cc-by-nc-sa-4.0
+ tags:
+ - vision-language-model
+ - navigation
+ ---
+
  # InternVLA-N1 Model Series

+ This model was presented in the paper [Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation](https://huggingface.co/papers/2512.08186).
+
  ![License](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)
  ![Transformers](https://img.shields.io/badge/%F0%9F%A4%97%20Transformers-9cf?style=flat)
  ![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white)
 
  ## Model Variants

  | Model Variant | Description | Key Characteristics |
+ |--------------|-------------|----------------------|
+ | [**InternVLA-N1 (S2)**](https://huggingface.co/InternRobotics/InternVLA-N1-System2) | Finetuned Qwen2.5-VL model for pixel-goal grounding | Strong System 2 module; compatible with decoupled System 1 controllers or joint optimization pipelines |
+ | [**InternVLA-N1 (Dual System) _w/ NavDP\*_**](https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP) | Jointly tuned System 1 (NavDP\*) and InternVLA-N1 (S2) | Optimized end-to-end performance; uses RGB-D observations |
+ | [**InternVLA-N1 (Dual System) _DualVLN_**](https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN) | Latest dual-system architecture | Optimized end-to-end performance and faster convergence; uses RGB observations |
 
 


  > The previously released version is now called [InternVLA-N1-wo-dagger](https://huggingface.co/InternRobotics/InternVLA-N1-wo-dagger). The latest official release is recommended for best performance.

  ---
 
+ ## Sample Usage
+
+ This model is loaded through the Hugging Face `transformers` library with `trust_remote_code=True`. The following snippet sketches how inference can be run; the exact processor methods are defined by the repository's custom code:
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoProcessor, AutoModelForCausalLM
+ import requests
+ from io import BytesIO
+
+ # Load model and processor
+ hf_model_id = "InternRobotics/InternVLA-N1-DualVLN"
+ model = AutoModelForCausalLM.from_pretrained(hf_model_id, torch_dtype=torch.float16, trust_remote_code=True, device_map="cuda")
+ processor = AutoProcessor.from_pretrained(hf_model_id, trust_remote_code=True)
+
+ # Load a sample image
+ # Replace with your actual image path or a URL to a relevant scene
+ image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/bird_image.jpg"
+ image = Image.open(BytesIO(requests.get(image_url).content)).convert("RGB")
+
+ # Define a question related to navigation or visual understanding
+ question = "What is the most direct path to the kitchen from here? Describe the first few steps."
+
+ messages = [
+     {"role": "user", "content": f"<|image_pad|>{question}"},
+ ]
+
+ # Process inputs (tokenize=True and return_dict=True so the template
+ # returns a dict of tensors that can be splatted into generate)
+ inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt")
+ inputs = inputs.to(model.device)
+ pixel_values = processor.preprocess(images=image, return_tensors="pt")["pixel_values"]
+ pixel_values = pixel_values.to(model.device, dtype=torch.float16)
+
+ # Generate response
+ with torch.inference_mode():
+     outputs = model.generate(
+         **inputs,
+         pixel_values=pixel_values,
+         do_sample=True,
+         temperature=0.7,
+         max_new_tokens=1024,
+         eos_token_id=processor.tokenizer.eos_token_id,
+         repetition_penalty=1.05,
+     )
+
+ response = processor.decode(outputs[0], skip_special_tokens=True)
+ print(f"User: {question}\nAssistant: {response}")
+ ```
+
+ For more detailed usage (inference, evaluation, and the Gradio demo), please refer to the [InternNav repository](https://github.com/InternRobotics/InternNav).
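
As an aside, the three variant checkpoints differ mainly in observation modality (RGB vs. RGB-D vs. System 2 only). A minimal, purely illustrative helper for picking a repo id — the mapping follows the Model Variants table, and the function name `variant_for` is hypothetical, not part of any released API:

```python
# Hypothetical helper: map an observation modality to the matching
# InternVLA-N1 checkpoint, following the Model Variants table.
VARIANT_BY_MODALITY = {
    "rgb": "InternRobotics/InternVLA-N1-DualVLN",        # RGB-only dual system
    "rgbd": "InternRobotics/InternVLA-N1-w-NavDP",       # RGB-D dual system
    "pixel_goal": "InternRobotics/InternVLA-N1-System2", # System 2 module only
}

def variant_for(modality: str) -> str:
    """Return the Hugging Face repo id for the given observation modality."""
    try:
        return VARIANT_BY_MODALITY[modality]
    except KeyError:
        raise ValueError(
            f"Unknown modality {modality!r}; expected one of {sorted(VARIANT_BY_MODALITY)}"
        )

print(variant_for("rgb"))  # InternRobotics/InternVLA-N1-DualVLN
```

The returned id can then be passed to `from_pretrained` as in the snippet above.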

  ---
 
 
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2512.08186},
  }
+ ```