InternRobotics/InternVLA-N1-w-NavDP

Improve model card: Add metadata and sample usage

by nielsr HF Staff - opened 3 days ago

base: refs/heads/main ← from: refs/pr/1

README.md +68 -10

@@ -1,5 +1,16 @@
 # InternVLA-N1 Model Series
 ![License](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)
 ![Transformers](https://img.shields.io/badge/%F0%9F%A4%97%20Transformers-9cf?style=flat)
 ![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white)
@@ -43,20 +54,68 @@ InternVLA-N1 is a state-of-the-art navigation foundation model built on a **mult
 ## Model Variants
 | Model Variant | Description | Key Characteristics |
-|--------------|-------------|----------------------|
-| [**InternVLA-N1 (S2)**](https://huggingface.co/InternRobotics/InternVLA-N1-System2) | Finetuned Qwen2.5-VL model for pixel-goal grounding | Strong System 2 module; compatible with decoupled System 1 controllers or joint optimization pipelines |
-| [**InternVLA-N1 (Dual System) _w/ NavDP\*_**](https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP) | Jointly tuned System 1 (NavDP*) and InternVLA-N1 (S2) | Optimized end-to-end performance; uses RGB-D observations |
-| [**InternVLA-N1 (Dual System) _DualVLN_**](https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN) | Latest dual-system architecture | Optimized end-to-end performance and faster convergence; uses RGB observations |
 > The previously released version is now called [InternVLA-N1-wo-dagger](https://huggingface.co/InternRobotics/InternVLA-N1-wo-dagger). The latest official release is recommended for best performance.
 ---
-## Usage
-For inference, evaluation, and the Gradio demo, please refer to the [InternNav repository](https://github.com/InternRobotics/InternNav).
 ---
@@ -85,5 +144,4 @@ If you find our work helpful, please consider starring this repository 🌟 and
       primaryClass={cs.RO},
       url={https://arxiv.org/abs/2512.08186},
 }

+---
+pipeline_tag: robotics
+library_name: transformers
+license: cc-by-nc-sa-4.0
+tags:
+  - vision-language-model
+  - navigation
+---
 # InternVLA-N1 Model Series
+This model was presented in the paper [Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation](https://huggingface.co/papers/2512.08186).
 ![License](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)
 ![Transformers](https://img.shields.io/badge/%F0%9F%A4%97%20Transformers-9cf?style=flat)
 ![PyTorch](https://img.shields.io/badge/PyTorch-EE4C2C?logo=pytorch&logoColor=white)
 ## Model Variants
 | Model Variant | Description | Key Characteristics |
+|--------------|-------------|----------------------|
+| [**InternVLA-N1 (S2)**](https://huggingface.co/InternRobotics/InternVLA-N1-System2) | Finetuned Qwen2.5-VL model for pixel-goal grounding | Strong System 2 module; compatible with decoupled System 1 controllers or joint optimization pipelines |
+| [**InternVLA-N1 (Dual System) _w/ NavDP\*_**](https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP) | Jointly tuned System 1 (NavDP\*) and InternVLA-N1 (S2) | Optimized end-to-end performance; uses RGB-D observations |
+| [**InternVLA-N1 (Dual System) _DualVLN_**](https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN) | Latest dual-system architecture | Optimized end-to-end performance and faster convergence; uses RGB observations |
 > The previously released version is now called [InternVLA-N1-wo-dagger](https://huggingface.co/InternRobotics/InternVLA-N1-wo-dagger). The latest official release is recommended for best performance.
 ---
+## Sample Usage
+This model is compatible with the Hugging Face `transformers` library. The following code snippet demonstrates how to perform inference:
+```python
+import torch
+from PIL import Image
+from transformers import AutoProcessor, AutoModelForCausalLM
+import requests
+from io import BytesIO
+# Load model and processor
+hf_model_id = "InternRobotics/InternVLA-N1-DualVLN"
+model = AutoModelForCausalLM.from_pretrained(hf_model_id, torch_dtype=torch.float16, trust_remote_code=True, device_map="cuda")
+processor = AutoProcessor.from_pretrained(hf_model_id, trust_remote_code=True)
+# Load a dummy image
+# Replace with your actual image path or a URL to a relevant scene
+image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/bird_image.jpg"
+image = Image.open(BytesIO(requests.get(image_url).content)).convert("RGB")
+# Define a question related to navigation or visual understanding
+question = "What is the most direct path to the kitchen from here? Describe the first few steps."
+# Assumes the processor follows the Qwen2.5-VL interface: the chat template
+# renders the prompt text, then a single processor call tokenizes the text
+# and preprocesses the image together.
+messages = [
+    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]},
+]
+# Process inputs
+prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
+# Generate response
+with torch.inference_mode():
+    outputs = model.generate(
+        **inputs,
+        do_sample=True,
+        temperature=0.7,
+        max_new_tokens=1024,
+        eos_token_id=processor.tokenizer.eos_token_id,
+        repetition_penalty=1.05
+    )
+response = processor.decode(outputs[0], skip_special_tokens=True)
+print(f"User: {question}\nAssistant: {response}")
+```
+For more detailed usage (inference, evaluation, and Gradio demo), please refer to the [InternNav repository](https://github.com/InternRobotics/InternNav).
 ---
       primaryClass={cs.RO},
       url={https://arxiv.org/abs/2512.08186},
 }
+```
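
A reviewer may want to sanity-check the metadata block this change adds at the top of README.md. The sketch below is one way to do that using only the standard library. `parse_front_matter` is a hypothetical helper written for this illustration, not part of `transformers` or any Hub tooling, and it assumes the front matter contains only flat `key: value` pairs and simple `- item` lists, as in this diff.

```python
def parse_front_matter(readme_text):
    """Return the metadata between the leading '---' delimiters as a dict.

    Hypothetical helper: handles only flat `key: value` pairs and `- item`
    lists. It is not a general YAML parser.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta, current_key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter reached
            return meta
        if not stripped:
            continue
        if stripped.startswith("- ") and isinstance(meta.get(current_key), list):
            # List item belonging to the most recent key with an empty value
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means a list follows on the next lines
            meta[current_key] = value if value else []
    return {}  # closing delimiter never found


# Front matter copied from the lines this diff adds to README.md
readme = """---
pipeline_tag: robotics
library_name: transformers
license: cc-by-nc-sa-4.0
tags:
  - vision-language-model
  - navigation
---
# InternVLA-N1 Model Series
"""

metadata = parse_front_matter(readme)
print(metadata)
```

For anything beyond this flat structure, a real YAML parser would be the safer choice.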