---
library_name: transformers
license: cc-by-nc-sa-4.0
pipeline_tag: robotics
tags:
- vision-language-model
- video-language-model
- navigation
---

<div id="top" align="center">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/64e6d9d229a548f66aff6e5b/4ZRvK6ySWCFj9mlpND791.gif" width="60%">

</div>

# InternVLA-N1: An Open Dual-System Navigation Foundation Model with Learned Latent Plans

* **Paper:** [Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation](https://huggingface.co/papers/2512.08186)
* **Code:** [![Code](https://img.shields.io/badge/GitHub-Code-181717?logo=github)](https://github.com/InternRobotics/InternNav)
* **Project Page:** https://internrobotics.github.io/internvla-n1-dualvln.github.io/
* **Technical Report (InternVLA-N1):** https://internrobotics.github.io/internvla-n1.github.io/static/pdfs/InternVLA_N1.pdf
* **Data:** https://huggingface.co/datasets/InternRobotics/InternData-N1



## 🔔 Important Notice

* This repository hosts the **official release** of **InternVLA-N1**.
* The previously released **InternVLA-N1** model has been renamed to **InternVLA-N1-Preview**. If you are looking for the **earlier preview version**, please check [InternVLA-N1-Preview](https://huggingface.co/InternRobotics/InternVLA-N1-Preview).
* We recommend using this official release for research and deployment, as it contains the most stable and up-to-date improvements.

### Key Difference: Preview vs Official
| Feature | InternVLA-N1-Preview | InternVLA-N1 (official) |
|---|---|---|
| System Design | Dual-System (synchronous) | Dual-System (asynchronous) |
| Training | System 1 trained only at System 2 inference steps | System 1 trained at denser steps (~25 cm), using the latest System 2 hidden state |
| Inference | Systems 1 and 2 inferred at the same frequency (~2 Hz) | Systems 1 and 2 inferred asynchronously, enabling dynamic obstacle avoidance |
| Performance | Solid baseline in simulation & benchmarks | Improved smoothness, efficiency, and real-world zero-shot generalization |
| Status | Historical preview | Stable official release (recommended) |

## Highlights

- Dual-System Framework

The first navigation foundation model to achieve joint tuning and asynchronous inference of System-2 reasoning and System-1 action, enabling smooth and efficient execution during instruction-following navigation (see the sketch after this list).

- State-of-the-art

The full navigation foundation model, as well as each individual system, achieves state-of-the-art performance on both mainstream and our newly established challenging benchmarks, including VLN-CE R2R & RxR, GRScenes-100, VLN-PE, etc.

- Sim2Real Zero-shot Generalization

Training relies solely on the simulated InternData-N1 dataset, with diverse scenes, embodiments, and other randomization, yet the model achieves strong zero-shot generalization in the real world.
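
To make the asynchronous dual-system design concrete, below is a minimal, illustrative Python sketch (not the InternVLA-N1 implementation; all function names and rates are hypothetical placeholders): a slow System-2 thread periodically publishes a latent plan, while a fast System-1 thread keeps acting on the newest available plan without waiting for it.

```python
# Illustrative sketch of asynchronous dual-system inference.
# `system2_reason` and `system1_act` are hypothetical stand-ins for the
# slow reasoning model and the fast action model; they are NOT the
# InternVLA-N1 API.
import threading
import time

plan_lock = threading.Lock()
latest_plan = None            # most recent System-2 output (shared state)
stop = threading.Event()

def system2_reason(instruction):
    """Placeholder for slow System-2 grounding over the instruction."""
    time.sleep(0.5)           # stands in for heavy VLM inference
    return f"latent plan for {instruction!r}"

def system1_act(plan):
    """Placeholder for fast System-1 action prediction."""
    return "move_forward" if plan is not None else "wait"

def slow_loop(instruction):
    global latest_plan
    while not stop.is_set():
        plan = system2_reason(instruction)   # slow reasoning step
        with plan_lock:
            latest_plan = plan               # publish without blocking System 1

def fast_loop(hz=15.0):
    while not stop.is_set():
        with plan_lock:
            plan = latest_plan               # always read the newest plan
        print(system1_act(plan))             # act far more often than System 2
        time.sleep(1.0 / hz)

threading.Thread(target=slow_loop, args=("go to the kitchen",), daemon=True).start()
threading.Thread(target=fast_loop, daemon=True).start()
time.sleep(2.0)
stop.set()
```

Because System 1 only reads the latest published plan, it can keep reacting to new observations (e.g., a dynamic obstacle) at a high rate even while System 2 is still mid-inference.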

## Usage

Please refer to [InternNav](https://github.com/InternRobotics/InternNav) for inference, evaluation, and a Gradio demo.
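
For quick programmatic access, a sketch along the following lines may work given the `transformers` library tag and the Qwen2.5-VL base noted in the Acknowledgements; the repo id and the use of `Auto*` classes with `trust_remote_code` are assumptions, so treat InternNav as the authoritative entry point.

```python
# Hedged sketch: loading the checkpoint with Hugging Face transformers.
# The repo id and loading classes below are assumptions inferred from this
# card's metadata (library_name: transformers, Qwen2.5-VL base), not a
# documented workflow; consult InternNav for the supported pipeline.
from transformers import AutoModel, AutoProcessor

model_id = "InternRobotics/InternVLA-N1"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()
```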

## Citation

If you find our work helpful, please consider starring this repo 🌟 and citing:

```bibtex
@article{wei2025ground,
  title={Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation},
  author={Wei, Meng and Wan, Chenyang and Peng, Jiaqi and Yu, Xiqian and Yang, Yuqiang and Feng, Delin and Cai, Wenzhe and Zhu, Chenming and Wang, Tai and Pang, Jiangmiao and Liu, Xihui},
  journal={arXiv preprint arXiv:2512.08186},
  year={2025}
}
```

## License
This work is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).

## Acknowledgements
This repository is based on [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL).