Update model card: Add main paper link, update project page and citation
This PR improves the model card for InternVLA-N1 by:
- Adding a direct link to the main research paper: [Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation](https://huggingface.co/papers/2512.08186).
- Updating the project page link to the correct URL: `https://internrobotics.github.io/internvla-n1-dualvln.github.io/`.
- Reorganizing the top links for better readability and clarity, including the GitHub repository, updated project page, and technical report for InternVLA-N1.
- Updating the BibTeX citation to reflect the "Ground Slow, Move Fast" paper details.
- Ensuring no sample usage code snippets are added, per guidelines that require explicit usage evidence in the GitHub README.
Please review and merge if these improvements align with our goals for model card completeness.
Files changed:

````diff
@@ -1,11 +1,11 @@
 ---
-pipeline_tag: robotics
 library_name: transformers
 license: cc-by-nc-sa-4.0
+pipeline_tag: robotics
 tags:
-
-
-
+- vision-language-model
+- video-language-model
+- navigation
 ---

 <div id="top" align="center">
@@ -13,18 +13,13 @@ tags:

 </div>

-
-
-
 # InternVLA-N1: An Open Dual-System Navigation Foundation Model with Learned Latent Plans

-[
-
-Project
-
-
-
-Data: https://huggingface.co/datasets/InternRobotics/InternData-N1
+**Paper:** [Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation](https://huggingface.co/papers/2512.08186)
+**Code:** https://github.com/InternRobotics/InternNav
+**Project Page:** https://internrobotics.github.io/internvla-n1-dualvln.github.io/
+**Technical Report (InternVLA-N1):** https://internrobotics.github.io/internvla-n1.github.io/static/pdfs/InternVLA_N1.pdf
+**Data:** https://huggingface.co/datasets/InternRobotics/InternData-N1



@@ -35,13 +30,13 @@ Data: https://huggingface.co/datasets/InternRobotics/InternData-N1
 * We recommend using this official release for research and deployment, as it contains the most stable and up-to-date improvements.

 ### Key Difference: Preview vs Official
-| Feature
-
-| System Design | Dual-System (synchronous)
-| Training
-| Inference
-| Performance
-| Status
+| Feature | InternVLA-N1-Preview | InternVLA-N1 (official) |
+|---|---|---|
+| System Design | Dual-System (synchronous) | Dual-System (asynchronous) |
+| Training | System 1 trained only at System 2 inference steps | System 1 trained at denser steps (~25 cm), using the latest System 2 hidden state |
+| Inference | Systems 1 and 2 inferred at the same frequency (~2 Hz) | Systems 1 and 2 inferred asynchronously, allowing dynamic obstacle avoidance |
+| Performance | Solid baseline in simulation & benchmarks | Improved smoothness, efficiency, and real-world zero-shot generalization |
+| Status | Historical preview | Stable official release (recommended) |

 ## Highlights

@@ -66,11 +61,11 @@ Please refer to [InternNav](https://github.com/InternRobotics/InternNav) for its
 If you find our work helpful, please consider starring this repo 🌟 and cite:

 ```bibtex
-@
-
-
-
-
+@article{wei2025ground,
+  title={Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation},
+  author={Wei, Meng and Wan, Chenyang and Peng, Jiaqi and Yu, Xiqian and Yang, Yuqiang and Feng, Delin and Cai, Wenzhe and Zhu, Chenming and Wang, Tai and Pang, Jiangmiao and Liu, Xihui},
+  journal={arXiv preprint arXiv:2512.08186},
+  year={2025}
 }
 ```
````
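
As background on the Preview-vs-Official table above: the key change in the official release is scheduling, with System 2 grounding and System 1 control running at different rates. A minimal, self-contained sketch of that asynchronous pattern, using plain Python threads, is below. It is illustrative only; all names are hypothetical and do not correspond to InternNav or model APIs (and per the guideline noted above, it is not proposed for the model card itself).

```python
import threading
import time

# Illustrative sketch of an asynchronous dual-system loop (hypothetical names;
# not the InternNav API). System 2 ("ground slow") refreshes a latent plan at a
# low rate; System 1 ("move fast") reads the freshest latent at a higher rate,
# so action generation never blocks on the slow grounding model.

latest = {"plan": None}           # shared slot holding the newest latent plan
lock = threading.Lock()
stop = threading.Event()

def system2_grounding(hz: float = 2.0) -> None:
    """Slow loop: stands in for VLM grounding that emits a latent plan."""
    step = 0
    while not stop.is_set():
        time.sleep(1.0 / hz)      # placeholder for expensive VLM inference
        step += 1
        with lock:
            latest["plan"] = f"latent-plan-{step}"

def system1_control(hz: float = 10.0) -> None:
    """Fast loop: consume the freshest latent and emit an action each tick."""
    while not stop.is_set():
        time.sleep(1.0 / hz)
        with lock:
            plan = latest["plan"]
        if plan is not None:
            # A real controller would decode the latent into a trajectory here;
            # ticking faster than System 2 is what leaves room to react to
            # dynamic obstacles between plan refreshes.
            print(f"action derived from {plan}")

if __name__ == "__main__":
    workers = [
        threading.Thread(target=system2_grounding, daemon=True),
        threading.Thread(target=system1_control, daemon=True),
    ]
    for w in workers:
        w.start()
    time.sleep(2.0)               # let both loops run briefly, then shut down
    stop.set()
```

The two rates make the Inference row concrete: System 2 refreshes the plan at roughly 2 Hz while System 1 keeps acting in between, which is the behavior the dynamic obstacle avoidance claim refers to.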
|