Atlas — 3D-Tokenized LLM for Autonomous Driving

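An implementation of a multimodal large language model for autonomous driving, based on the Atlas paper. 3D visual tokens extracted by StreamPETR (3D object detection) and TopoMLP (lane detection) are injected into the Vicuna-7B LLM, which then generates detection, lane, and planning outputs as text from a single unified model.

Project Structure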

3dtokenizer-atlas/
├── train_atlas.py                  # Atlas LLM training entry point
├── eval_atlas.py                   # Atlas evaluation entry point
├── extract_streampetr_tokens.py    # Pre-extracts StreamPETR detection tokens
├── extract_topomlp_tokens.py       # Pre-extracts TopoMLP raw outputs (lane samples)
├── train_streampetr.sh             # StreamPETR pretraining launch script
├── train_topomlp.sh                # TopoMLP pretraining launch script
│
├── configs/
│   ├── streampetr_atlas_aligned.py # StreamPETR config (EVA-02 ViT-L, 800x1600)
│   ├── topomlp_atlas_aligned.py    # TopoMLP config (EVA-02 ViT-L, 800x1600)
│   ├── ds_zero2.json               # DeepSpeed ZeRO-2 config
│   └── REPRODUCTION.md             # Reproduction notes
│
├── src/
│   ├── model/
│   │   ├── modeling_atlas.py       # AtlasForCausalLM main model
│   │   ├── streampetr_adapter.py   # StreamPETR → detection token adapter
│   │   ├── topomlp_adapter.py      # TopoMLP → map token adapter (Perceiver resampler)
│   │   └── token_resampler.py      # CrossAttentionTokenResampler
│   ├── dataset/
│   │   ├── atlas_dataset.py        # AtlasDataset + Collate
│   │   └── scene_sampler.py        # SceneSequentialSampler (temporal sampling)
│   ├── eval/
│   │   └── metrics.py              # Evaluation metrics (F1/Chamfer/L2/Collision)
│   └── prompting.py                # Multi-task prompt templates
│
├── scripts/
│   ├── gen_atlas_full_data.py               # nuScenes → detection QA JSON
│   ├── gen_atlas_openlane_subsetB_lane_qa.py # OpenLane-V2 → lane QA JSON
│   └── gen_atlas_planning_qa.py             # nuScenes → planning QA JSON
│
├── data/                                    # Training/validation data (JSON)
│   ├── atlas_nuscenes_train.json            # Detection (28,130 samples)
│   ├── atlas_nuscenes_val.json              # Detection validation (6,019 samples)
│   ├── openlane_subsetB_lane_train_4pt.json # Lanes (27,968 samples, 4 points/lane)
│   ├── openlane_subsetB_lane_val_4pt.json   # Lane validation (6,019 samples)
│   ├── atlas_planning_train.json            # Planning (23,541 samples)
│   └── atlas_planning_val.json              # Planning validation (5,037 samples)
│
├── pretrained/                     # Pretrained weights
│   ├── vicuna-7b-v1.5/             # Vicuna-7B-v1.5 LLM
│   ├── eva02_L_coco_det_sys_o365_remapped_fixed.pth
│   └── streampetr/
│       └── streampetr_eva02_ep24.pth
│
├── work_dirs/
│   ├── atlas_full_repro/           # Current training output
│   ├── precomputed_det_tokens/     # Pre-extracted StreamPETR tokens
│   │   └── train/                  # 56,098 .pt files (nuScenes + OpenLane)
│   ├── precomputed_map_tokens/     # Pre-extracted TopoMLP raw outputs
│   │   └── train/                  # 27,968 .pt files (OpenLane lane samples)
│   └── topomlp_atlas_aligned/      # TopoMLP pretrained weights
│       └── epoch_24.pth
│
└── external/                       # External dependencies
    ├── StreamPETR/
    ├── TopoMLP_Repo/
    └── nuscenes-devkit/

Model Architecture

                            ┌──────────────────────────────────────────┐
  6x surround-view images → │ StreamPETR (frozen, EVA-02 ViT-L)        │→ det tokens [B, 256, 256]
                            │ TopoMLP    (frozen, EVA-02 ViT-L)        │→ lane queries → Resampler → map tokens [B, 256, 256]
                            └──────────────────────────────────────────┘
                                                  ↓
                                        AtlasUnifiedProjector
                            ┌──────────────────────────────────────────┐
                            │ projector_det: Linear(256→4096)          │  ← single linear projection
                            │ projector_map: Linear(256→4096)          │
                            │ projector_rp:  Linear(3→256)             │  ← reference points, zero-init
                            │ features += projector_rp(ref)            │
                            └──────────────────────────────────────────┘
                                                  ↓
                       Injected at <query> token positions (256 det + 256 map)
                                                  ↓
                            ┌──────────────────────────────────────────┐
                            │ Vicuna-7B (full fine-tuning, DeepSpeed)  │
                            │ Causal Language Modeling Loss            │
                            └──────────────────────────────────────────┘
                                                  ↓
                                       Multi-task text output
                            (3D detection / lanes / planning trajectory)
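
The projection and injection steps in the diagram can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's AtlasUnifiedProjector: the class name UnifiedProjectorSketch, the helper inject_query_tokens, and the argument names are hypothetical. It also assumes that a single projector_rp is shared by the detection and map branches, that the reference-point embedding is added in the 256-d token space before the projection to 4096, and that every sample contains exactly as many <query> placeholders as there are 3D tokens.

```python
import torch
import torch.nn as nn


class UnifiedProjectorSketch(nn.Module):
    """Hypothetical stand-in for AtlasUnifiedProjector (illustration only)."""

    def __init__(self, token_dim: int = 256, llm_dim: int = 4096):
        super().__init__()
        self.projector_det = nn.Linear(token_dim, llm_dim)
        self.projector_map = nn.Linear(token_dim, llm_dim)
        self.projector_rp = nn.Linear(3, token_dim)
        # Zero-init: the reference-point branch contributes nothing at step 0.
        nn.init.zeros_(self.projector_rp.weight)
        nn.init.zeros_(self.projector_rp.bias)

    def forward(self, det, det_ref, map_tokens, map_ref):
        # Assumption: reference points are embedded and added before the
        # 256 -> llm_dim projection, and both branches share projector_rp.
        det = self.projector_det(det + self.projector_rp(det_ref))
        map_tokens = self.projector_map(map_tokens + self.projector_rp(map_ref))
        return torch.cat([det, map_tokens], dim=1)  # [B, N_det + N_map, llm_dim]


def inject_query_tokens(text_embeds, input_ids, visual_embeds, query_token_id):
    """Overwrite the embeddings at <query> positions with projected 3D tokens.

    text_embeds:   [B, T, D] token embeddings from the LLM embedding layer
    input_ids:     [B, T] token ids containing <query> placeholders
    visual_embeds: [B, N, D] projected 3D tokens, N placeholders per sample
    """
    mask = input_ids == query_token_id  # [B, T], True at placeholder positions
    out = text_embeds.clone()
    # Boolean-mask assignment fills positions in row-major order, which matches
    # the row-major flattening of visual_embeds when each row has N placeholders.
    out[mask] = visual_embeds.reshape(-1, visual_embeds.shape[-1]).to(out.dtype)
    return out
```

With the default sizes and the 256 + 256 tokens from the diagram, forward returns a [B, 512, 4096] tensor. Because projector_rp starts at zero, the model initially sees only the projected tokens, and the reference-point contribution is learned during training.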

Training Configuration

Aligned with Appendix B.2 of the paper (arXiv:2405.18361).

Atlas LLM (current training run)

Parameter	Value
LLM	Vicuna-7B-v1.5
Fine-tuning	Full-parameter fine-tuning (no LoRA)
Trainable Parameters	6,740,530,176
Learning Rate	2e-5
Optimizer	AdamW (weight_decay=1e-4, torch_adam, adam_w_mode)
LR Schedule	Cosine with warmup (3% steps)
Epochs	8
Batch Size	1 per GPU
Gradient Accumulation	1
Effective Batch Size	8 (8 GPU x 1 x 1 accum)
Total Steps	79,639
Warmup Steps	2,389
Max Sequence Length	4096 tokens
Distributed Training	DeepSpeed ZeRO-2 (optimizer sharding)
GPU	8x NVIDIA A100 80GB
Precision	BF16 (model + gradients, via DeepSpeed bf16), optimizer states sharded
Memory Queue	StreamPETR temporal modeling (3 frames, top-256, FIFO)
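
The step counts in the table follow from the sample count and batch settings. The sketch below reproduces that arithmetic and pairs it with a linear-warmup, cosine-decay schedule. The function names are hypothetical, and the exact schedule shape (linear ramp from zero, decay to zero) is an assumption; the table only says "Cosine with warmup (3% steps)".

```python
import math

NUM_SAMPLES = 79_639                  # total training samples (all three tasks)
EPOCHS = 8
GPUS, BATCH_PER_GPU, GRAD_ACCUM = 8, 1, 1
BASE_LR = 2e-5
WARMUP_RATIO = 0.03

effective_batch = GPUS * BATCH_PER_GPU * GRAD_ACCUM       # 8
total_steps = NUM_SAMPLES * EPOCHS // effective_batch     # 79,639
warmup_steps = int(WARMUP_RATIO * total_steps)            # 2,389


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (assumed linear warmup + cosine decay)."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"effective batch: {effective_batch}")
    print(f"total steps:     {total_steps}")
    print(f"warmup steps:    {warmup_steps}")
    for s in (0, warmup_steps, total_steps // 2, total_steps):
        print(f"step {s:>6}: lr = {lr_at(s):.3e}")
```

Because the epoch count equals the effective batch size (both are 8), the total number of optimizer steps happens to equal the number of training samples.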

Training Data

Task	Data File	Samples
3D object detection	`atlas_nuscenes_train.json`	28,130
3D lane detection	`openlane_subsetB_lane_train_4pt.json`	27,968
Trajectory planning	`atlas_planning_train.json`	23,541
Total		79,639

Lane data uses 4 uniformly sampled points per lane (consistent with Appendix A.2 of the paper). All coordinates are discretized into 1000 bins over a BEV range of [-50 m, +50 m].
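
A minimal sketch of such a 1000-bin discretization is shown below. The function names, the clamping of out-of-range values, and the floor-then-bin-center convention are assumptions made for illustration; the generation scripts in scripts/ may round differently.

```python
BEV_MIN, BEV_MAX = -50.0, 50.0   # metres
NUM_BINS = 1000                  # each bin spans 0.1 m


def to_bin(x_m: float) -> int:
    """Map a metric coordinate to a bin index in [0, NUM_BINS - 1]."""
    x = min(max(x_m, BEV_MIN), BEV_MAX)          # clamp into the BEV range
    idx = int((x - BEV_MIN) / (BEV_MAX - BEV_MIN) * NUM_BINS)
    return min(idx, NUM_BINS - 1)                # x == BEV_MAX falls in the last bin


def from_bin(idx: int) -> float:
    """Map a bin index back to the metric coordinate of the bin center."""
    return BEV_MIN + (idx + 0.5) * (BEV_MAX - BEV_MIN) / NUM_BINS


if __name__ == "__main__":
    for x in (-50.0, -0.3, 12.34, 50.0, 73.0):
        b = to_bin(x)
        print(f"{x:>7.2f} m -> bin {b:>3} -> {from_bin(b):.2f} m")
```

With 0.1 m bins, a round trip through to_bin and from_bin changes any in-range coordinate by at most 0.05 m under these assumptions.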

3D Tokenizer Pretraining (completed)

Parameter	StreamPETR	TopoMLP
Backbone	EVA-02 ViT-L (embed_dim=1024)	EVA-02 ViT-L (embed_dim=1024)
Resolution	800x1600	800x1600
Queries	256 (detection)	256 (map, resampled from 1800)
Control Points	-	4 per lane
Epochs	24	24
Dataset	nuScenes trainval	OpenLane-V2 subset-B

Quick Start

1. Environment

conda activate streampetr
# Main dependencies: PyTorch 2.0+, transformers, peft, flash-attn, mmcv 1.7, mmdet3d 1.0
# DeepSpeed (ZeRO-2): pip install deepspeed

2. Data Preparation

# nuScenes data root (contains v1.0-trainval/ and samples/)
export DATA_ROOT=/path/to/nuscenes

# OpenLane-V2 subset-B
export OPENLANE_ROOT=/path/to/OpenLane-V2/subset_B

# Generate lane QA data (4 points/lane, consistent with the paper)
python scripts/gen_atlas_openlane_subsetB_lane_qa.py \
  --openlane_root $OPENLANE_ROOT \
  --split train --out_json data/openlane_subsetB_lane_train_4pt.json

python scripts/gen_atlas_openlane_subsetB_lane_qa.py \
  --openlane_root $OPENLANE_ROOT \
  --split val --out_json data/openlane_subsetB_lane_val_4pt.json
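
After generating the JSON files, it can be useful to confirm that the sample counts match the numbers listed in this README. The check below is a sketch that assumes each file is a top-level JSON list with one entry per sample; this layout is not documented here, so adjust count_samples if the files are structured differently. The helper names are hypothetical.

```python
import json
from pathlib import Path

# Expected sample counts, taken from the project structure listing above.
EXPECTED = {
    "atlas_nuscenes_train.json": 28_130,
    "openlane_subsetB_lane_train_4pt.json": 27_968,
    "atlas_planning_train.json": 23_541,
}


def count_samples(path) -> int:
    """Count samples in a QA file (assumes a top-level JSON list)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a top-level list, got {type(data).__name__}")
    return len(data)


def check_counts(data_dir, expected=EXPECTED) -> dict:
    """Return {file name: (found, expected)} for each missing or mismatched file."""
    problems = {}
    for name, want in expected.items():
        path = Path(data_dir) / name
        found = count_samples(path) if path.exists() else None
        if found != want:
            problems[name] = (found, want)
    return problems


if __name__ == "__main__":
    issues = check_counts("data")
    if not issues:
        print("All sample counts match.")
    for name, (found, want) in issues.items():
        print(f"{name}: found {found}, expected {want}")
```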

3. Training

# Full-parameter fine-tuning on 8x A100 with DeepSpeed ZeRO-2
# Run extract_streampetr_tokens.py and extract_topomlp_tokens.py first to pre-extract the tokens
# Effective batch size = 8 GPUs × 1 × 1 accum = 8 (consistent with the paper)
torchrun --nproc_per_node=8 train_atlas.py \
  --llm_model pretrained/vicuna-7b-v1.5 \
  --streampetr_config configs/streampetr_atlas_aligned.py \
  --streampetr_ckpt pretrained/streampetr/streampetr_eva02_ep24.pth \
  --topomlp_config configs/topomlp_atlas_aligned.py \
  --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
  --precomputed_det_tokens work_dirs/precomputed_det_tokens/train \
  --precomputed_map_tokens work_dirs/precomputed_map_tokens/train \
  --data_json data/atlas_nuscenes_train.json,data/atlas_planning_train.json,data/openlane_subsetB_lane_train_4pt.json \
  --data_root /mnt/data/nuscenes \
  --image_path_remap /home/guoyuanbo/autodl-tmp/OpenLane-V2=/mnt/OpenLane-V2 \
  --output_dir work_dirs/atlas_full_repro \
  --lr 2e-5 --weight_decay 1e-4 \
  --batch_size 1 --epochs 8 --gradient_accumulation_steps 1 \
  --warmup_ratio 0.03 --max_grad_norm 1.0 \
  --save_epochs 2 --log_steps 100 \
  --seed 42 --num_workers 2 \
  --deepspeed configs/ds_zero2.json

4. Evaluation

python eval_atlas.py \
  --checkpoint work_dirs/atlas_full_repro/final/checkpoint.pt \
  --llm_model pretrained/vicuna-7b-v1.5 \
  --topomlp_config configs/topomlp_atlas_aligned.py \
  --topomlp_ckpt work_dirs/topomlp_atlas_aligned/epoch_24.pth \
  --data_json data/openlane_subsetB_lane_val_4pt.json \
  --data_root $DATA_ROOT \
  --batch_size 1 --max_new_tokens 512 --no_flash_attn
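
The project structure lists Chamfer and L2 among the metrics in src/eval/metrics.py. For orientation, the sketch below shows one common definition of each: a symmetric Chamfer distance between two lane polylines and a mean L2 displacement between two trajectories. These are generic definitions, not a copy of metrics.py; the function names are hypothetical, and the repository may use different averaging or thresholds.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def chamfer_distance(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbour distance, both directions."""
    def one_way(a: Sequence[Point], b: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

    return 0.5 * (one_way(pred, gt) + one_way(gt, pred))


def mean_l2_error(pred_traj: Sequence[Point], gt_traj: Sequence[Point]) -> float:
    """Mean Euclidean distance between waypoints at matching time steps."""
    if len(pred_traj) != len(gt_traj):
        raise ValueError("trajectories must have the same number of waypoints")
    return sum(math.dist(p, q) for p, q in zip(pred_traj, gt_traj)) / len(gt_traj)


if __name__ == "__main__":
    # Two 4-point lanes, the prediction offset laterally from the ground truth.
    gt_lane = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    pred_lane = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
    print("Chamfer:", chamfer_distance(pred_lane, gt_lane))
    print("Mean L2:", mean_l2_error(pred_lane, gt_lane))
```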

References

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? arXiv:2405.18361, published May 28, 2024.