Update README.md

README.md (CHANGED)

@@ -29,12 +29,16 @@ base_model:
 <div align="center" style="line-height: 1;">
   <a href="https://huggingface.co/manycore-research/SpatialLM1.1-Qwen-0.5B" target="_blank" style="margin: 2px;"><img alt="Hugging Face"
     src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialLM-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
+  <a href="https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset" target="_blank" style="margin: 2px;"><img alt="Dataset"
+    src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Dataset-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
   <a href="https://huggingface.co/datasets/manycore-research/SpatialLM-Testset" target="_blank" style="margin: 2px;"><img alt="Dataset"
     src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Testset-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
 </div>

 ## ✨ News

+- [Sept, 2025] [SpatialLM-Dataset](https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset) is now available on Hugging Face.
+- [Sept, 2025] SpatialLM accepted at NeurIPS 2025.
 - [Jun, 2025] Check out our new models: [SpatialLM1.1-Llama-1B](https://huggingface.co/manycore-research/SpatialLM1.1-Llama-1B) and [SpatialLM1.1-Qwen-0.5B](https://huggingface.co/manycore-research/SpatialLM1.1-Qwen-0.5B), now available on Hugging Face. SpatialLM1.1 doubles the point cloud resolution, incorporates a more powerful point cloud encoder, [Sonata](https://xywu.me/sonata/), and supports detection with user-specified categories.
 - [Jun, 2025] The SpatialLM [Technical Report](https://arxiv.org/abs/2506.07491) is now on arXiv.
 - [Mar, 2025] We're excited to release [SpatialLM-Llama-1B](https://huggingface.co/manycore-research/SpatialLM-Llama-1B) and [SpatialLM-Qwen-0.5B](https://huggingface.co/manycore-research/SpatialLM-Qwen-0.5B) on Hugging Face.

@@ -160,6 +164,20 @@ python eval.py --metadata SpatialLM-Testset/test.csv --gt_dir SpatialLM-Testset/

 We provide an example of how to use our model to estimate scene layout starting from an RGB video with the newly released [SLAM3R](https://github.com/PKU-VCL-3DV/SLAM3R) in [EXAMPLE.md](EXAMPLE.md). These steps also work for MASt3R-SLAM and other reconstruction methods.

+## SpatialLM Dataset
+
+The SpatialLM dataset is a large-scale, high-quality synthetic dataset designed by professional 3D designers and used for real-world production. It contains point clouds from 12,328 diverse indoor scenes comprising 54,778 rooms, each paired with rich ground-truth 3D annotations. The SpatialLM dataset provides a valuable additional resource for advancing research in indoor scene understanding, 3D perception, and related applications.
+
+For access to the photorealistic RGB/Depth/Normal/Semantic/Instance panoramic renderings and camera trajectories used to generate the SpatialLM point clouds, please refer to the [SpatialGen project](https://manycore-research.github.io/SpatialGen).
+
+<div align="center">
+
+| **Dataset**       | **Download**                                                                       |
+| :---------------: | ---------------------------------------------------------------------------------- |
+| SpatialLM-Dataset | [🤗 Datasets](https://huggingface.co/datasets/manycore-research/SpatialLM-Dataset) |
+
+</div>
+
 ## SpatialLM Testset

 We provide a test set of 107 preprocessed point clouds reconstructed from RGB videos using [MASt3R-SLAM](https://github.com/rmurai0610/MASt3R-SLAM). SpatialLM-Testset is considerably more challenging than prior clean RGB-D scan datasets due to the noise and occlusions in point clouds reconstructed from monocular RGB videos.

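For readers who want to fetch the newly added SpatialLM-Dataset right away, the snippet below is a minimal sketch of downloading it from Hugging Face with the `huggingface_hub` client. The repo id comes from the links in the hunk above; the local directory name is only an illustrative choice, not something prescribed by the README.

```python
# Minimal sketch: download the SpatialLM-Dataset point clouds and annotations
# from Hugging Face. Requires `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="manycore-research/SpatialLM-Dataset",
    repo_type="dataset",           # dataset repo, not a model repo
    local_dir="SpatialLM-Dataset", # hypothetical output directory for this example
)
print(f"Dataset downloaded to {local_path}")
```
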
@@ -182,8 +200,8 @@ Layout estimation focuses on predicting architectural elements, i.e., walls, doo

 | **Method**      | **RoomFormer** | **SceneScript (finetuned)** | **SpatialLM1.1-Qwen-0.5B (finetuned)** |
 | :-------------: | :------------: | :-------------------------: | :------------------------------------: |
-| **F1 @.25 IoU** |
-| **F1 @.5 IoU**  |
+| **F1 @.25 IoU** | 83.4           | 90.4                        | 94.3                                   |
+| **F1 @.5 IoU**  | 81.4           | 89.2                        | 93.5                                   |

 </div>

@@ -210,8 +228,8 @@ Zero-shot detection results on the challenging SpatialLM-Testset are reported in
 | :-------------: | :-----------------------: | :------------------------: |
 | **Layout**      | **F1 @.25 IoU (2D)**      | **F1 @.25 IoU (2D)**       |
 | wall            | 68.9                      | 68.2                       |
-| door            |
-| window          |
+| door            | 49.1                      | 47.4                       |
+| window          | 47.0                      | 51.4                       |
 |                 |                           |                            |
 | **Objects**     | **F1 @.25 IoU (3D)**      | **F1 @.25 IoU (2D)**       |
 | curtain         | 34.9                      | 37.0                       |

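The F1 @.25 IoU and F1 @.5 IoU numbers in these tables follow the usual thresholded-matching recipe: a prediction counts as a true positive only if it can be matched one-to-one to a ground-truth box with IoU above the threshold. The sketch below illustrates that idea for axis-aligned 2D boxes with a simple greedy matcher; it is an illustration of the metric only, not the repo's eval.py.

```python
# Illustrative F1 @ IoU computation for axis-aligned 2D boxes (xmin, ymin, xmax, ymax).
import numpy as np

def iou_2d(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def f1_at_iou(preds, gts, thresh=0.25):
    """Greedy one-to-one matching; a match is a true positive if IoU >= thresh."""
    unmatched_gt = list(range(len(gts)))
    tp = 0
    for p in preds:
        if not unmatched_gt:
            break
        ious = [iou_2d(p, gts[i]) for i in unmatched_gt]
        best = int(np.argmax(ious))
        if ious[best] >= thresh:
            tp += 1
            unmatched_gt.pop(best)
    precision = tp / max(len(preds), 1)
    recall = tp / max(len(gts), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```
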
@@ -262,14 +280,11 @@ SpatialLM1.1 are built upon Sonata point cloud encoder, model weight is licensed
 If you find this work useful, please consider citing:

 ```bibtex
-@
-
-
-
-
-  eprint        = {2506.07491},
-  archivePrefix = {arXiv},
-  primaryClass  = {cs.CV}
+@inproceedings{SpatialLM,
+  title     = {SpatialLM: Training Large Language Models for Structured Indoor Modeling},
+  author    = {Mao, Yongsen and Zhong, Junhao and Fang, Chuan and Zheng, Jia and Tang, Rui and Zhu, Hao and Tan, Ping and Zhou, Zihan},
+  booktitle = {Advances in Neural Information Processing Systems},
+  year      = {2025}
 }
 ```