Sigma: The Key for Vision–Language–Action Models toward Telepathy

Sigma is a telepathy-style Vision–Language–Action (VLA) model built on top of lerobot/pi05_base.
It adds a semantic “telepathy” path and LoRA adapters that steer continuous robot control using internal semantic memory and intent states, while keeping the original π0.5 backbone weights intact and recoverable.

1. Summary

Base policy: lerobot/pi05_base (π0.5)
Author: Libo Wang
GPU for training: single RTX 4090 (24GB)
Data: lerobot/svla_so101_pickplace
Objective:
Make a π0.5-style VLA use internal semantic & intent states to refine continuous control, rather than only imitating trajectories.

Sigma keeps the perception and control structure of π0.5, and introduces an additional pathway that:

fuses vision, language, and robot state into a shared latent sequence,
maintains a semantic state m_t and an intent vector z_intent over time,
converts them into telepathy factors that modulate the policy’s action outputs as residual corrections.

2. Architecture at a Glance

Sigma can be seen as π0.5 + telepathic head + LoRA adapters:

Vision / State stream
- reuse π0.5 encoders for images and robot state;
- add FiLM-style modulation from telepathy factors on vision tokens.
Language–semantic stream
- take text tokens, vision tokens, and state tokens into a shared MLLM backbone;
- derive:
  - a semantic memory m_t that accumulates cross-time information,
  - an intent vector z_intent,
  - pooled semantic factors aligned with the text embedding space.
Action stream (three branches)
- treat π0.5 outputs as baseline:
  - action vector (per-step),
  - action chunk (short horizon),
  - action trajectory (full horizon);
- learn residual actions driven by telepathy factors on all three branches.

The resulting policy still looks like π0.5 from the outside (same inputs, same output types), but actions are now corrected by an internal telepathy pathway that is aware of deep semantics and associative intent.
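The two mechanisms named above, FiLM-style modulation of vision tokens and residual correction of baseline actions, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the projection matrices `w_gamma`, `w_beta`, `w_res`, and all dimensions are hypothetical and are not taken from the Sigma code.

```python
import numpy as np

def film_modulate(tokens, tau, w_gamma, w_beta):
    """FiLM-style modulation: scale and shift every token with factors derived from tau.

    tokens: [B, N, D] vision tokens, tau: [B, K] telepathy factors.
    """
    gamma = 1.0 + tau @ w_gamma   # [B, D]; centred at 1 so tau = 0 leaves tokens unchanged
    beta = tau @ w_beta           # [B, D]
    return gamma[:, None, :] * tokens + beta[:, None, :]

def residual_action(base_action, tau, w_res):
    """Baseline action plus a correction driven by the telepathy factors."""
    return base_action + tau @ w_res

# Toy sizes (hypothetical): batch, tokens, token dim, telepathy dim, action dim.
B, N, D, K, A = 2, 5, 8, 4, 6
rng = np.random.default_rng(0)
tokens = rng.normal(size=(B, N, D))
tau = rng.normal(size=(B, K))
w_gamma = 0.01 * rng.normal(size=(K, D))
w_beta = 0.01 * rng.normal(size=(K, D))
w_res = np.zeros((K, A))          # zero-initialised head: output equals the baseline
base = rng.normal(size=(B, A))

modulated = film_modulate(tokens, tau, w_gamma, w_beta)
action = residual_action(base, tau, w_res)
```

With a zero-initialised residual head the policy starts out identical to the baseline, which is one common way to keep the original behaviour recoverable; whether Sigma initialises its heads this way is not stated here.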

3. Training Setup

3.1 Dataset & preprocessing

Upstream dataset: lerobot/svla_so101_pickplace
Task: pick-and-place style manipulation with multi-frame RGB + robot state + continuous actions.

A preprocessing script (dataset_preprocess_sigma_vla.py) performs the following steps:

sliding-window segmentation with horizon T = 16,
filtering out windows with nearly zero action norm to remove static segments,
packing vision frames, robot state, and 3-scale action targets into tensor batches,
exporting three sharded files:

storage/sigma_pickplace/shard_00000.pt
storage/sigma_pickplace/shard_00001.pt
storage/sigma_pickplace/shard_00002.pt

These shards are the only data used for Sigma training and evaluation.
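The windowing and filtering steps can be illustrated as follows. This is a minimal sketch, not the contents of dataset_preprocess_sigma_vla.py: the stride, the `min_norm` threshold, the chunk length, and the choice of the first step as the per-step target are all assumptions made for the example; only the horizon T = 16 comes from the description above.

```python
import numpy as np

def make_windows(actions, horizon=16, stride=1, min_norm=1e-3):
    """Return start indices of windows whose mean per-step action norm exceeds min_norm."""
    starts = []
    for s in range(0, len(actions) - horizon + 1, stride):
        window = actions[s:s + horizon]
        if np.linalg.norm(window, axis=-1).mean() > min_norm:
            starts.append(s)          # keep: the arm moves somewhere in this window
    return starts

def three_scale_targets(window, chunk_len=4):
    """Split one action window into per-step, short-chunk and full-trajectory targets."""
    return {
        "vector": window[0],            # [A]
        "chunk": window[:chunk_len],    # [chunk_len, A]
        "trajectory": window,           # [T, A]
    }

# Toy episode: 20 static steps followed by 20 moving steps, 6 action dimensions.
actions = np.concatenate([np.zeros((20, 6)), np.ones((20, 6))])
starts = make_windows(actions)
targets = three_scale_targets(actions[starts[0]:starts[0] + 16])
```

In this toy episode every window that lies entirely inside the static prefix is dropped, so the first kept window is the first one that reaches a moving step.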

3.2 LoRA fine-tuning (Sigma training)

Training is performed on a single RTX 4090 using train_sigma_telepathy_vla_lora.py:

python train_sigma_telepathy_vla_lora.py \
  --base_model_id lerobot/pi05_base \
  --dataset_dir /workspace/storage/sigma_pickplace \
  --output_dir /workspace/storage/sigma_lora_out \
  --batch_size 4 \
  --gradient_accumulation_steps 4 \
  --max_steps 300 \
  --dtype bf16
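
As a quick sanity check on the flags above, the effective batch size follows from the per-device batch size and the accumulation steps. The sketch below assumes that --max_steps counts optimizer steps (not micro-batches), which the command itself does not confirm; the helper name is hypothetical.

```python
def training_budget(batch_size, grad_accum_steps, max_steps):
    """Effective batch per optimizer step and total windows consumed."""
    effective = batch_size * grad_accum_steps
    return {"effective_batch": effective, "windows_seen": effective * max_steps}

budget = training_budget(batch_size=4, grad_accum_steps=4, max_steps=300)
```

Under that assumption, each optimizer step averages gradients over 16 windows and the full run consumes 4800 windows.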

Key aspects:

freeze backbone weights from lerobot/pi05_base;
attach LoRA to the attention projections (q, k, v, o) and the telepathy heads;
jointly optimize:
- three control losses:
  - L_act_vec for per-step action vectors,
  - L_act_chk for short-horizon chunks,
  - L_act_trj for full trajectories;
- semantic & telepathy regularizers:
  - alignment of semantic factors with text embeddings,
  - control of telepathy factor norm tau_l2.
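
One way to combine the terms listed above is a weighted sum. The sketch below is an assumption-laden illustration: the weights, the use of 1 - cosine for alignment, and the squared-norm penalty on the telepathy factors are placeholders, not the values or forms used by train_sigma_telepathy_vla_lora.py.

```python
import numpy as np

# Hypothetical loss weights, chosen only for illustration.
DEFAULT_WEIGHTS = {"vec": 1.0, "chk": 1.0, "trj": 1.0, "align": 0.1, "tau": 1e-4}

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def cosine(a, b, eps=1e-8):
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

def sigma_loss(pred, target, sem, text_emb, tau, weights=None):
    """Weighted sum of three control losses and two regularizers."""
    weights = weights or DEFAULT_WEIGHTS
    terms = {
        "vec": mse(pred["vector"], target["vector"]),              # L_act_vec
        "chk": mse(pred["chunk"], target["chunk"]),                # L_act_chk
        "trj": mse(pred["trajectory"], target["trajectory"]),      # L_act_trj
        "align": float(np.mean(1.0 - cosine(sem, text_emb))),      # semantic-text alignment
        "tau": float(np.mean(np.sum(tau ** 2, axis=-1))),          # telepathy norm control
    }
    total = sum(weights[k] * terms[k] for k in terms)
    return total, terms

rng = np.random.default_rng(0)
target = {
    "vector": rng.normal(size=(2, 6)),
    "chunk": rng.normal(size=(2, 4, 6)),
    "trajectory": rng.normal(size=(2, 16, 6)),
}
sem = rng.normal(size=(2, 8))
# Perfect predictions and perfect alignment: only the tau penalty remains.
total, terms = sigma_loss(target, target, sem, sem, tau=np.ones((2, 4)))
```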

All LoRA and telepathy parameters are stored under:

storage/sigma_lora_out/
  sigma_telepathy_heads.pt
  adapter_config.json
  adapter_model.bin
  ...

3.3 Telepathy-aware training logic

Two key training mechanisms are implemented inside the loss:

Telepathic Residual Action Focusing (TRAF)
Focuses learning on residual actions instead of full actions, and uses hard-sample mining (top-k error segments) to allocate more gradient budget to difficult control windows.
Telepathic Semantic Alignment Curriculum (TSAC)
Gradually increases the weights of:
- semantic memory–text alignment,
- intent–telepathy alignment.
Action regression remains the primary objective early in training; late in training, Sigma is encouraged to let internal semantic/intent structure drive the residual corrections.
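
Both mechanisms can be sketched compactly. The code below is a hypothetical illustration, not the loss implemented in the training script: `top_frac`, `w_max`, `warmup_frac`, and the linear ramp are assumed values and shapes, and averaging only the hardest windows is just one simple form of top-k hard-sample mining.

```python
import numpy as np

def traf_loss(pred_residual, base_action, target_action, top_frac=0.25):
    """Regress the residual (target - base) and average only the hardest windows.

    All inputs are [B, T, A]; the result is the mean error of the top-k windows.
    """
    target_residual = target_action - base_action
    reduce_axes = tuple(range(1, pred_residual.ndim))
    per_window = np.mean((pred_residual - target_residual) ** 2, axis=reduce_axes)
    k = max(1, int(np.ceil(top_frac * len(per_window))))
    hardest = np.sort(per_window)[-k:]      # k largest per-window errors
    return float(hardest.mean())

def tsac_weight(step, max_steps, w_max=0.1, warmup_frac=0.2):
    """Alignment weight: zero during warm-up, then a linear ramp up to w_max."""
    start = warmup_frac * max_steps
    if step <= start:
        return 0.0
    return w_max * min(1.0, (step - start) / (max_steps - start))
```

During warm-up the alignment weight is zero, so action regression is the only objective; the ramp then lets the semantic and intent terms gain influence as training proceeds.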

4. Inference-time Telepathy Adapter

A lightweight adapter (sigma_adapter.py) controls how much the telepathy residuals are allowed to modify the baseline π0.5 actions:

reads:
- baseline π0.5 actions (base_action_vector, …),
- Sigma residuals,
- telepathy diagnostics (norms, cosine alignments),
computes a risk-aware scaling factor in [min_scale, max_scale],
blends:

action = base_action + scale * telepathy_residual

If residuals are too large or misaligned, scale is pushed toward 0, effectively reverting to π0.5 behavior.
If residuals are moderate and well aligned, scale approaches 1, enabling telepathy-enhanced control.
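
The blending rule can be illustrated with a small gate. This is a sketch under assumptions, not the logic of sigma_adapter.py: the risk formula, the `norm_budget` parameter, and the exponential gate are hypothetical, and `alignment` stands in for whichever cosine diagnostic the adapter actually reads.

```python
import numpy as np

def risk_aware_scale(residual, alignment, min_scale=0.0, max_scale=1.0,
                     risk_temperature=1.0, norm_budget=1.0):
    """Map residual size and alignment to a scale in [min_scale, max_scale].

    residual: [B, A]; alignment: [B] cosine diagnostic in [-1, 1].
    """
    res_norm = np.linalg.norm(residual, axis=-1)
    # Risk grows when the residual exceeds its norm budget or alignment drops below 1.
    risk = np.maximum(res_norm - norm_budget, 0.0) + np.maximum(1.0 - alignment, 0.0)
    gate = np.exp(-risk / risk_temperature)          # in (0, 1]
    return min_scale + (max_scale - min_scale) * gate

def blend(base_action, residual, alignment, **kwargs):
    scale = risk_aware_scale(residual, alignment, **kwargs)
    return base_action + scale[..., None] * residual

base = np.ones((2, 6))
residual = np.stack([np.full(6, 0.1), np.full(6, 50.0)])   # small vs. oversized residual
alignment = np.array([1.0, -1.0])                          # aligned vs. opposed
scale = risk_aware_scale(residual, alignment)
action = blend(base, residual, alignment)
```

The first sample keeps its full residual, while the second falls back to the baseline action because its residual is both oversized and misaligned.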

5. Evaluation Protocol

Evaluation uses eval_sigma_vla_rollout.py in offline closed-loop replay:

both Sigma and the baseline:
- use the same preprocessed shards (shard_0000x.pt),
- share the same telepathy heads file sigma_telepathy_heads.pt,
only Sigma:
- loads LoRA weights,
- activates telepathy residuals and the adapter in control output.

5.1 CHECK A – telepathy geometry & alignment sanity

CHECK A verifies that telepathy geometry is identical between experimental and control runs:

heads_tensors = 325
mean ≈ 0.002, std ≈ 0.107, rms ≈ 0.107 for telepathy head weights
avg_tau_l2 ≈ 51.6 – average L2 norm of telepathy factors
avg_semantic_text_alignment ≈ 0.13 – semantic factor vs. text embedding alignment

These numbers match between Sigma and the π0.5 baseline, so differences in behavior cannot be attributed to changes in telepathy parameters or in text-alignment geometry.
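
The weight statistics reported in CHECK A can be reproduced with a few reductions over the head tensors. The sketch below assumes the heads are available as a name-to-array mapping; the helper names and the tolerance are hypothetical and are not taken from eval_sigma_vla_rollout.py.

```python
import numpy as np

def head_weight_stats(tensors):
    """Tensor count plus mean / std / rms over all head weights, flattened together."""
    flat = np.concatenate(
        [np.asarray(t, dtype=np.float64).ravel() for t in tensors.values()]
    )
    return {
        "heads_tensors": len(tensors),
        "mean": float(flat.mean()),
        "std": float(flat.std()),
        "rms": float(np.sqrt(np.mean(flat ** 2))),
    }

def same_geometry(stats_a, stats_b, tol=1e-6):
    """True when two runs report the same head statistics within tol."""
    return all(abs(stats_a[k] - stats_b[k]) <= tol for k in stats_a)

toy_heads = {"a": np.array([1.0, -1.0]), "b": np.array([[3.0, -3.0]])}
stats = head_weight_stats(toy_heads)
```

Because rms² = mean² + std², a mean of about 0.002 and a std of about 0.107 imply an rms of about 0.107, which agrees with the figures listed above.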

5.2 CHECK B – multiscale control & telepathy metrics

CHECK B defines and reports:

mse_vec – per-step action vector MSE (fine-grain control precision)
mse_chk – short segment chunk MSE (local motion consistency)
mse_trj – full trajectory MSE (long-horizon tracking)
tau_l2 – telepathy factor norms (activation strength)
sem_align – semantic alignment (e.g., cosine) between semantic factors and text embeddings
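
The metrics listed above can be computed per batch roughly as follows. This is a sketch under assumptions: the chunk length of 4, the use of the first step for mse_vec, and cosine similarity for sem_align are illustrative choices, and the function name is hypothetical.

```python
import numpy as np

def check_b_metrics(pred, target, tau, sem, text_emb, chunk_len=4, eps=1e-8):
    """pred/target: [B, T, A] actions; tau: [B, K]; sem/text_emb: [B, D]."""
    sq = (pred - target) ** 2
    cos = np.sum(sem * text_emb, axis=-1) / (
        np.linalg.norm(sem, axis=-1) * np.linalg.norm(text_emb, axis=-1) + eps
    )
    return {
        "mse_vec": float(sq[:, 0].mean()),             # first step only
        "mse_chk": float(sq[:, :chunk_len].mean()),    # short-horizon chunk
        "mse_trj": float(sq.mean()),                   # full trajectory
        "tau_l2": float(np.linalg.norm(tau, axis=-1).mean()),
        "sem_align": float(cos.mean()),
    }

pred = np.zeros((2, 16, 6))
target = np.zeros((2, 16, 6))
target[:, 0] = 1.0                                     # error only on the first step
tau = np.array([[3.0, 4.0], [0.0, 0.0]])
sem = np.ones((2, 8))
metrics = check_b_metrics(pred, target, tau, sem, sem)
```

In this toy case the same first-step error is diluted as the horizon grows, which shows why the three MSE scales are reported separately.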

On the same 723 samples and 181 batches:

Sigma shows consistently lower mse_vec, mse_chk, mse_trj than the baseline,
while tau_l2 and sem_align remain similar between both models.

This pattern supports the interpretation that Sigma uses the same semantic / telepathy geometry more effectively, converting it into tangible gains in control accuracy instead of merely altering the embedding space.

6. How to Use Sigma

⚠️ You must have access to lerobot/pi05_base and the preprocessed shards or an equivalent environment to reproduce full experiments.

6.1 Installation (example)

# base env
pip install "transformers>=4.40.0" accelerate torch torchvision
pip install lerobot

# clone this repository (example path)
git clone https://github.com/Veltraxor/Sigma.git
cd Sigma

6.2 Loading Sigma on top of π0.5

import torch
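# NOTE: these import paths are illustrative. Adjust them to your installed lerobot
# version and to where sigma_telepathy_vla.py / sigma_adapter.py live in your checkout.
from lerobot import Pi05Policy
from sigma_vla import SigmaTelepathyVLA, SigmaTelepathyAdapter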

device = "cuda"
dtype = torch.bfloat16

# 1. Load base π0.5 policy
base_policy = Pi05Policy.from_pretrained("lerobot/pi05_base")

# 2. Build Sigma on top of the base policy
sigma_policy = SigmaTelepathyVLA.from_base(
    base_policy=base_policy,
    lora_dir="./storage/sigma_lora_out",
    telepathy_heads_path="./storage/sigma_lora_out/sigma_telepathy_heads.pt",
    device=device,
    dtype=dtype,
)

# 3. Optional runtime adapter
adapter = SigmaTelepathyAdapter(
    min_scale=0.0,
    max_scale=1.0,
    risk_temperature=1.0,
)

# 4. Single batch forward (offline replay)
# vis_obs_tensor, robot_state_tensor and list_of_text_prompts are placeholders for your own data.
batch = {
    "vis_obs": vis_obs_tensor,           # [B, T, C, H, W]
    "robot_state": robot_state_tensor,   # [B, T, D_state]
    "texts": list_of_text_prompts,       # length B
}

with torch.no_grad():
    out = sigma_policy(**batch, use_telepathy=True)
    blended_action = adapter(
        base_action_vector=out["base_action_vector"],
        telepathy_residual=out["telepathy_residual_vector"],
        telepathy_factors=out["telepathy_factors"],
    )

7. Repository Layout (typical)

A typical Sigma repo / model card includes:

README.md                      # this file
sigma_env.example              # example env file for HF tokens, paths
dataset_preprocess_sigma_vla.py
train_sigma_telepathy_vla_lora.py
eval_sigma_vla_rollout.py
sigma_telepathy_vla.py         # model definition
sigma_adapter.py               # inference-time adapter

storage/
  sigma_pickplace/
    shard_00000.pt
    shard_00001.pt
    shard_00002.pt
  sigma_lora_out/
    sigma_telepathy_heads.pt
    adapter_config.json
    adapter_model.bin
    ...

logs/
  sigma_eval_report.json
  sigma_eval_checkA.json
  sigma_eval_checkB.json

You can adapt this layout to your own environment; the key assumption is that Sigma is always loaded as a LoRA + telepathy delta on top of lerobot/pi05_base.

8. Intended Use, Risks, and Limitations

Intended use
Sigma is intended for research and experimentation on:
- semantic / telepathy-style control in VLA systems,
- offline trajectory analysis and simulation,
- early-stage humanoid / manipulator control studies.
Not intended for
- direct deployment on physical robots without additional safety layers;
- safety-critical or human-facing applications.
Known limitations
- trained only on svla_so101_pickplace;
- evaluated only in offline replay;
- telepathy path tuned for a single task family and embodiment.

Users should treat Sigma as a proof-of-concept that demonstrates how “deep semantic + associative intent” can be engineered into residual control, not as a generic controller.

9. Author & Acknowledgements

Author: Libo Wang
Base policy and dataset by Physical Intelligence / LeRobot teams.
Training environment based on a single RTX 4090 GPU; all scripts are structured to be portable to other single-GPU or multi-GPU setups with minimal changes.

10. Citation

If you use Sigma, please cite both the original π0.5 / OpenPI work and this Sigma extension.

π0.5 / OpenPI:

@article{openpi2024,
  title   = {Open-World Robotic Manipulation with Vision-Language-Action Models},
  author  = {Physical Intelligence},
  year    = {2024},
  url     = {https://github.com/Physical-Intelligence/openpi}
}

Sigma (example entry):

@article{sigma2025,
  title   = {Sigma: The Key for Vision--Language--Action Models toward Telepathy},
  author  = {Wang, Libo},
  year    = {2025},
  note    = {Telepathy-style extension of lerobot/pi05_base},
  url     = {https://huggingface.co/Veltraxor/Sigma}
}

Model size: 4B params (Safetensors, tensor type F32)
