Atlas: 3D-Tokenized LLM for Autonomous Driving

A multimodal large language model for autonomous driving, implemented from the Atlas paper. StreamPETR (3D object detection) and TopoMLP (lane detection) extract 3D visual tokens. These tokens are injected into the Vicuna-7B LLM, which then generates the outputs for detection, lane detection, planning and other tasks in one unified model.

Project Structure

3dtokenizer-atlas/
├── train_atlas.py                  # Atlas LLM training entry point
├── eval_atlas.py                   # Atlas evaluation entry point
├── extract_streampetr_tokens.py    # Pre-extract StreamPETR detection tokens
├── extract_topomlp_tokens.py       # Pre-extract TopoMLP raw outputs (lane samples)
├── train_streampetr.sh             # StreamPETR pretraining launch script
├── train_topomlp.sh                # TopoMLP pretraining launch script
│
├── configs/
│   ├── streampetr_atlas_aligned.py # StreamPETR config (EVA-02 ViT-L, 800x1600)
│   ├── topomlp_atlas_aligned.py    # TopoMLP config (EVA-02 ViT-L, 800x1600)
│   ├── ds_zero2.json               # DeepSpeed ZeRO-2 config
│   └── REPRODUCTION.md             # Reproduction notes
│
├── src/
│   ├── model/
│   │   ├── modeling_atlas.py       # AtlasForCausalLM main model
│   │   ├── streampetr_adapter.py   # StreamPETR → detection token adapter
│   │   ├── topomlp_adapter.py      # TopoMLP → map token adapter (Perceiver resampler)
│   │   └── token_resampler.py      # CrossAttentionTokenResampler
│   ├── dataset/
│   │   ├── atlas_dataset.py        # AtlasDataset + Collate
│   │   └── scene_sampler.py        # SceneSequentialSampler (temporal sampling)
│   ├── eval/
│   │   └── metrics.py              # Evaluation metrics (F1/Chamfer/L2/Collision)
│   └── prompting.py                # Multi-task prompt templates
│
├── scripts/
│   ├── gen_atlas_full_data.py               # nuScenes → detection QA JSON
│   ├── gen_atlas_openlane_subsetB_lane_qa.py # OpenLane-V2 → lane QA JSON
│   └── gen_atlas_planning_qa.py             # nuScenes → planning QA JSON
│
├── data/                                    # Training/validation data (JSON)
│   ├── atlas_nuscenes_train.json            # Detection (28,130 samples)
│   ├── atlas_nuscenes_val.json              # Detection validation (6,019 samples)
│   ├── openlane_subsetB_lane_train_4pt.json # Lanes (27,968 samples, 4 points/lane)
│   ├── openlane_subsetB_lane_val_4pt.json   # Lane validation (6,019 samples)
│   ├── atlas_planning_train.json            # Planning (23,541 samples)
│   └── atlas_planning_val.json              # Planning validation (5,037 samples)
│
├── pretrained/                     # Pretrained weights
│   ├── vicuna-7b-v1.5/             # Vicuna-7B-v1.5 LLM
│   ├── eva02_L_coco_det_sys_o365_remapped_fixed.pth
│   └── streampetr/
│       └── streampetr_eva02_ep24.pth
│
├── work_dirs/
│   ├── atlas_full_repro/           # Current training output
│   ├── precomputed_det_tokens/     # Pre-extracted StreamPETR tokens
│   │   └── train/                  # 56,098 .pt files (nuScenes + OpenLane)
│   ├── precomputed_map_tokens/     # Pre-extracted TopoMLP raw outputs
│   │   └── train/                  # 27,968 .pt files (OpenLane lane samples)
│   └── topomlp_atlas_aligned/      # TopoMLP pretrained weights
│       └── epoch_24.pth
│
└── external/                       # External dependencies
    ├── StreamPETR/
    ├── TopoMLP_Repo/
    └── nuscenes-devkit/

Model Architecture

                          ┌──────────────────────────────────────┐
6x surround-view images → │ StreamPETR (frozen, EVA-02 ViT-L)    │→ det tokens [B, 256, 256]
                          │ TopoMLP    (frozen, EVA-02 ViT-L)    │→ lane queries → Resampler → map tokens [B, 256, 256]
                          └──────────────────────────────────────┘
                                              ↓
                                    AtlasUnifiedProjector
                             ┌─────────────────────────────────┐
                             │ projector_det: Linear(256→4096) │  ← single-layer linear projection
                             │ projector_map: Linear(256→4096) │
                             │ projector_rp:  Linear(3→256)    │  ← Reference Point, zero-init
                             │ features += projector_rp(ref)   │
                             └─────────────────────────────────┘
                                              ↓
                 Injected at the <query> token positions (256 det + 256 map)
                                              ↓
                         ┌────────────────────────────────────────┐
                         │ Vicuna-7B                              │
                         │ full-parameter fine-tuning (DeepSpeed) │
                         │ Causal Language Modeling Loss          │
                         └────────────────────────────────────────┘
                                              ↓
                                   Multi-task text output
                       (3D detection / lanes / planning trajectories)
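The following minimal pure-Python sketch shows the projection path in the diagram above. It assumes that the reference-point embedding is added to the 256-d token features before the 256→4096 projection. It also assumes that the projected tokens overwrite the LLM input embeddings at the `<query>` placeholder positions in order. `Linear`, `project_tokens` and `inject_at_queries` are hypothetical helpers, not the repository's `AtlasUnifiedProjector` API. The test uses tiny widths instead of 256/4096.

```python
import random
from typing import List

Vector = List[float]


class Linear:
    """Hypothetical dense layer y = W x + b (a stand-in for torch.nn.Linear)."""

    def __init__(self, in_dim: int, out_dim: int, zero_init: bool = False, seed: int = 0):
        rng = random.Random(seed)
        scale = 0.0 if zero_init else 1.0 / in_dim ** 0.5
        self.weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.weight, self.bias)]


def project_tokens(features: List[Vector], ref_points: List[Vector],
                   proj: Linear, proj_rp: Linear) -> List[Vector]:
    """features += projector_rp(ref), then project every token to the LLM width."""
    out = []
    for feat, ref in zip(features, ref_points):
        rp = proj_rp(ref)                          # 3 -> feature_dim
        fused = [f + r for f, r in zip(feat, rp)]  # add reference-point embedding
        out.append(proj(fused))                    # feature_dim -> llm_dim
    return out


def inject_at_queries(input_ids: List[int], embeddings: List[Vector],
                      visual_tokens: List[Vector], query_id: int) -> List[Vector]:
    """Overwrite the embedding at each <query> position, in order, with one visual token."""
    positions = [i for i, tok in enumerate(input_ids) if tok == query_id]
    if len(positions) != len(visual_tokens):
        raise ValueError(f"{len(positions)} <query> slots but {len(visual_tokens)} visual tokens")
    merged = [list(e) for e in embeddings]
    for pos, tok in zip(positions, visual_tokens):
        merged[pos] = tok
    return merged
```

Because `projector_rp` is zero-initialized, the reference-point term starts as an exact no-op. The projected tokens therefore initially depend only on the detector or map features, and the positional contribution is learned from there.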

Training Configuration

The configuration follows Appendix B.2 of the paper (arXiv:2405.18361).

Atlas LLM (current training run)

Parameter               Value
LLM                     Vicuna-7B-v1.5
Fine-tuning             Full-parameter fine-tuning (no LoRA)
Trainable Parameters    6,740,530,176
Learning Rate           2e-5
Optimizer               AdamW (weight_decay=1e-4, torch_adam, adam_w_mode)
LR Schedule             Cosine with warmup (3% of steps)
Epochs                  8
Batch Size              1 per GPU
Gradient Accumulation   1
Effective Batch Size    8 (8 GPUs x 1 x 1 accum)
Total Steps             79,639
Warmup Steps            2,389
Max Sequence Length     4096 tokens
Distributed             DeepSpeed ZeRO-2 (optimizer-state and gradient sharding)
GPU                     8x NVIDIA A100 80GB
Precision               BF16 (model + gradients, via DeepSpeed bf16), optimizer states sharded
Memory Queue            StreamPETR temporal modeling (3 frames, top-256, FIFO)
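The table maps onto a DeepSpeed config roughly as follows. This is a hypothetical reconstruction, not the actual contents of `configs/ds_zero2.json`. The keys are standard DeepSpeed config fields: `train_batch_size`, `train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `gradient_clipping`, `bf16`, `zero_optimization.stage`, and the `torch_adam`/`adam_w_mode` parameters of the `AdamW` optimizer. The values come from the table and from the `--max_grad_norm 1.0` flag in the training command. The LR scheduler is left out, on the assumption that `train_atlas.py` builds the cosine schedule itself from `--warmup_ratio`.

```python
import json

WORLD_SIZE = 8   # 8x A100 80GB
MICRO_BATCH = 1  # batch size per GPU
GRAD_ACCUM = 1   # gradient accumulation steps

ds_config = {
    # DeepSpeed requires train_batch_size == micro batch * grad accum * world size.
    "train_batch_size": MICRO_BATCH * GRAD_ACCUM * WORLD_SIZE,
    "train_micro_batch_size_per_gpu": MICRO_BATCH,
    "gradient_accumulation_steps": GRAD_ACCUM,
    "gradient_clipping": 1.0,           # mirrors --max_grad_norm 1.0
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # shard optimizer states and gradients
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 2e-5,
            "weight_decay": 1e-4,
            "torch_adam": True,         # use torch's AdamW instead of DeepSpeed's fused Adam
            "adam_w_mode": True,        # decoupled weight decay
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(ds_config, indent=2))
```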

Training Data

Task                     Data file                              Samples
3D object detection      atlas_nuscenes_train.json              28,130
3D lane detection        openlane_subsetB_lane_train_4pt.json   27,968
Trajectory planning      atlas_planning_train.json              23,541
Total                                                           79,639
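The Total Steps and Warmup Steps values in the training configuration follow from these sample counts. The sketch below shows that arithmetic, ignoring per-epoch rounding of a final partial batch. It also includes a warmup-plus-cosine learning-rate function. The exact schedule shape used by `train_atlas.py` (linear warmup from 0, cosine decay to 0) is an assumption.

```python
import math

SAMPLES = {"detection": 28_130, "lane": 27_968, "planning": 23_541}
EPOCHS = 8
EFFECTIVE_BATCH = 8 * 1 * 1  # GPUs x per-GPU batch x accumulation steps
WARMUP_RATIO = 0.03
PEAK_LR = 2e-5


def schedule_lengths(samples: dict, epochs: int, batch: int, warmup_ratio: float):
    """Return (total samples, total optimizer steps, warmup steps)."""
    total_samples = sum(samples.values())
    total_steps = total_samples * epochs // batch
    warmup_steps = int(total_steps * warmup_ratio)
    return total_samples, total_steps, warmup_steps


def lr_at(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0 (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    n, total, warmup = schedule_lengths(SAMPLES, EPOCHS, EFFECTIVE_BATCH, WARMUP_RATIO)
    print(n, total, warmup)  # 79639 79639 2389
```

With an effective batch of 8 over 8 epochs, the number of optimizer steps equals the number of training samples. This is why both figures are 79,639.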

Lane data uses 4 uniformly sampled points per lane, consistent with Appendix A.2 of the paper. All coordinates are discretized into 1000 bins over the BEV range [-50m, +50m], i.e. 0.1 m per bin.
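The two preprocessing steps in the paragraph above can be sketched as follows. First, a lane polyline is resampled to 4 points evenly spaced by arc length. Then each coordinate in [-50 m, +50 m] is mapped to an integer bin in [0, 999]. Several details are assumptions: arc-length spacing, clamping of out-of-range values, the floor-based bin formula, and decoding to bin centers. `gen_atlas_openlane_subsetB_lane_qa.py` may interpolate or round differently.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

BEV_MIN, BEV_MAX, NUM_BINS = -50.0, 50.0, 1000


def resample_uniform(polyline: Sequence[Point], num_points: int = 4) -> List[Point]:
    """Resample a polyline to num_points points evenly spaced by arc length."""
    if len(polyline) < 2:
        raise ValueError("need at least two points")
    cum = [0.0]  # cumulative arc length at each vertex
    for a, b in zip(polyline, polyline[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    out, seg = [], 0
    for k in range(num_points):
        target = total * k / (num_points - 1)
        # advance to the segment that contains the target arc length
        while seg < len(polyline) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        a, b = polyline[seg], polyline[seg + 1]
        out.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return out


def to_bin(value: float, lo: float = BEV_MIN, hi: float = BEV_MAX, bins: int = NUM_BINS) -> int:
    """Map a metric coordinate to an integer bin index in [0, bins - 1]."""
    value = min(max(value, lo), hi)  # clamp to the BEV range
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)        # value == hi falls into the last bin


def bin_center(idx: int, lo: float = BEV_MIN, hi: float = BEV_MAX, bins: int = NUM_BINS) -> float:
    """Inverse mapping, e.g. for decoding generated bin indices back to metres."""
    return lo + (idx + 0.5) * (hi - lo) / bins
```

Decoding to the bin center bounds the quantization error at half a bin (0.05 m) for coordinates inside the BEV range.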

3D Tokenizer Pretraining (completed)

Parameter        StreamPETR                      TopoMLP
Backbone         EVA-02 ViT-L (embed_dim=1024)   EVA-02 ViT-L (embed_dim=1024)
Resolution       800x1600                        800x1600
Queries          256 (detection)                 256 (map, resampled from 1800)
Control Points   -                               4 per lane
Epochs           24                              24
Dataset          nuScenes trainval               OpenLane-V2 subset-B
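"Resampled from 1800" refers to the TopoMLP adapter compressing 1800 lane queries into 256 map tokens with a Perceiver-style resampler (`CrossAttentionTokenResampler`). The pure-Python sketch below shows only the core idea: single-head scaled dot-product cross-attention from a set of latent queries to the input tokens. Keys and values are the inputs themselves. The repository's module may add learned query/key/value/output projections, multiple heads and feed-forward layers, all omitted here. `cross_attention_resample` is a hypothetical name.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention_resample(latents: Matrix, inputs: Matrix) -> Matrix:
    """Each of the M latent queries attends over all N inputs; returns M x d tokens.

    Keys and values are the inputs themselves (identity projections) in this sketch.
    """
    d = len(inputs[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in latents:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in inputs]
        weights = softmax(scores)  # attention weights over the N inputs
        out.append([sum(w * v[j] for w, v in zip(weights, inputs)) for j in range(d)])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    lane_queries = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(1800)]
    latent_queries = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(256)]
    map_tokens = cross_attention_resample(latent_queries, lane_queries)
    print(len(map_tokens), len(map_tokens[0]))  # 256 4
```

The output length is set by the number of latent queries rather than the number of inputs. This is what lets a variable or large set of lane queries be reduced to a fixed 256-token budget for the LLM.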

Quick Start

1. Environment

conda activate streampetr
# Main dependencies: PyTorch 2.0+, transformers, peft, flash-attn, mmcv 1.7, mmdet3d 1.0
# DeepSpeed (ZeRO-2): pip install deepspeed
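Before launching training, you can confirm that the main dependencies are importable in the active environment. The minimal check below uses `importlib.util.find_spec`. It assumes the usual import names for these packages (for example `flash_attn` for flash-attn), and it only tests importability, not versions.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the dependencies listed above (pip and import names can differ).
REQUIRED = ["torch", "transformers", "peft", "flash_attn", "mmcv", "mmdet3d", "deepspeed"]


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Return {module_name: importable?} without importing the modules themselves."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    missing = [n for n, ok in check_modules(REQUIRED).items() if not ok]
    print("missing:", ", ".join(missing) if missing else "none")
```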

2. Data Preparation

# nuScenes data root (contains v1.0-trainval/ and samples/)
export DATA_ROOT=/path/to/nuscenes

# OpenLane-V2 subset-B
export OPENLANE_ROOT=/path/to/OpenLane-V2/subset_B

# Generate lane QA data (4 points/lane, matching the paper)
python scripts/gen_atlas_openlane_subsetB_lane_qa.py \
  --openlane_root $OPENLANE_ROOT \
  --split train --out_json data/openlane_subsetB_lane_train_4pt.json

python scripts/gen_atlas_openlane_subsetB_lane_qa.py \
  --openlane_root $OPENLANE_ROOT \
  --split val --out_json data/openlane_subsetB_lane_val_4pt.json
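After generation, a quick sanity check is to compare the number of records with the counts in the data/ tree: 27,968 train and 6,019 val lane samples. The sketch assumes each QA file is a top-level JSON array with one object per sample. That layout is an assumption about the generator's output, so adjust `count_samples` if the files are structured differently.

```python
import json
from pathlib import Path

# Expected counts copied from the data/ directory listing.
EXPECTED = {
    "data/openlane_subsetB_lane_train_4pt.json": 27_968,
    "data/openlane_subsetB_lane_val_4pt.json": 6_019,
}


def count_samples(path: str) -> int:
    """Number of samples in a QA file, assuming a top-level JSON list."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise TypeError(f"{path}: expected a JSON list, got {type(data).__name__}")
    return len(data)


if __name__ == "__main__":
    for path, expected in EXPECTED.items():
        if not Path(path).exists():
            print(f"{path}: not found")
            continue
        n = count_samples(path)
        status = "OK" if n == expected else f"MISMATCH (expected {expected})"
        print(f"{path}: {n} samples {status}")
```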

3. Training

# Full-parameter fine-tuning on 8x A100 with DeepSpeed ZeRO-2
# Run extract_streampetr_tokens.py and extract_topomlp_tokens.py first to pre-extract the tokens
# Effective batch size = 8 GPUs × 1 × 1 accum = 8 (matching the paper)
torchrun --nproc_per_node=8 train_atlas.py \
  --llm_model pretrained/vicuna-7b-v1.5 \
  --streampetr_config configs/streampetr_atlas_aligned.py \
  --streampetr_ckpt pretrained/streampetr/streampetr_eva02_ep24.pth \
  --topomlp_config configs/topomlp_atlas_aligned.py \
  --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
  --precomputed_det_tokens work_dirs/precomputed_det_tokens/train \
  --precomputed_map_tokens work_dirs/precomputed_map_tokens/train \
  --data_json data/atlas_nuscenes_train.json,data/atlas_planning_train.json,data/openlane_subsetB_lane_train_4pt.json \
  --data_root $DATA_ROOT \
  --image_path_remap /home/guoyuanbo/autodl-tmp/OpenLane-V2=/mnt/OpenLane-V2 \
  --output_dir work_dirs/atlas_full_repro \
  --lr 2e-5 --weight_decay 1e-4 \
  --batch_size 1 --epochs 8 --gradient_accumulation_steps 1 \
  --warmup_ratio 0.03 --max_grad_norm 1.0 \
  --save_epochs 2 --log_steps 100 \
  --seed 42 --num_workers 2 \
  --deepspeed configs/ds_zero2.json
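`--data_json` takes several QA files separated by commas. `--image_path_remap OLD=NEW` rewrites image paths that were baked into the JSON on the machine where it was generated. The sketch below shows how such flags are commonly interpreted, assuming a plain path-prefix substitution for the remap. It is not taken from `train_atlas.py`, and `parse_data_json`, `parse_remap` and `remap_path` are hypothetical names.

```python
from typing import List, Tuple


def parse_data_json(arg: str) -> List[str]:
    """Split a comma-separated --data_json value into individual file paths."""
    return [p.strip() for p in arg.split(",") if p.strip()]


def parse_remap(arg: str) -> Tuple[str, str]:
    """Parse 'OLD=NEW' into (old_prefix, new_prefix) without trailing slashes."""
    old, sep, new = arg.partition("=")
    if not sep or not old:
        raise ValueError(f"expected OLD=NEW, got {arg!r}")
    return old.rstrip("/"), new.rstrip("/")


def remap_path(path: str, old: str, new: str) -> str:
    """Replace a leading OLD directory prefix with NEW; other paths are unchanged."""
    if path == old or path.startswith(old + "/"):
        return new + path[len(old):]
    return path
```

Matching on `old + "/"` rather than a bare `startswith(old)` avoids accidentally rewriting sibling directories that merely share a name prefix.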

4. Evaluation

python eval_atlas.py \
  --checkpoint work_dirs/atlas_full_repro/final/checkpoint.pt \
  --llm_model pretrained/vicuna-7b-v1.5 \
  --topomlp_config configs/topomlp_atlas_aligned.py \
  --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
  --data_json data/openlane_subsetB_lane_val_4pt.json \
  --data_root $DATA_ROOT \
  --batch_size 1 --max_new_tokens 512 --no_flash_attn
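`src/eval/metrics.py` lists F1, Chamfer, L2 and collision metrics. To illustrate two of them, the code below gives textbook definitions of two metrics. The first is the symmetric Chamfer distance between two point sets, such as a predicted and a ground-truth lane. The second is the average L2 error between time-aligned predicted and ground-truth trajectory waypoints. Whether the repository averages or sums the two Chamfer directions, or applies a matching threshold, is not stated here, so treat this as an assumed variant.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance, averaged over both directions."""
    if not a or not b:
        raise ValueError("point sets must be non-empty")
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)


def average_l2(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Euclidean error between time-aligned predicted and ground-truth waypoints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)
```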

References

Paper for guoyb0/3dtokenizer-atlas: arXiv:2405.18361