Sigma: The Key for Vision–Language–Action Models toward Telepathy
Sigma is a telepathy-style Vision–Language–Action (VLA) model built on top of lerobot/pi05_base.
It adds a semantic “telepathy” path and LoRA adapters that steer continuous robot control using internal semantic memory and intent states, while keeping the original π0.5 backbone weights intact and recoverable.
1. Summary
- Base policy: lerobot/pi05_base (π0.5)
- Author: Libo Wang
- GPU for training: single RTX 4090 (24GB)
- Data: lerobot/svla_so101_pickplace
- Objective: Make a π0.5-style VLA use internal semantic & intent states to refine continuous control, rather than only imitating trajectories.
Sigma keeps the perception and control structure of π0.5, and introduces an additional pathway that:
- fuses vision, language, and robot state into a shared latent sequence,
- maintains a semantic state m_t and an intent vector z_intent over time,
- converts them into telepathy factors that modulate the policy’s action outputs as residual corrections.
2. Architecture at a Glance
Sigma can be seen as π0.5 + telepathic head + LoRA adapters:
Vision / State stream
- reuse π0.5 encoders for images and robot state;
- add FiLM-style modulation from telepathy factors on vision tokens.
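As an illustration of the FiLM-style modulation mentioned above, here is a minimal NumPy sketch. The function name `film_modulate`, the projection matrices `w_gamma` / `w_beta`, and all tensor shapes are assumptions made for this example; they are not taken from the Sigma code.

```python
import numpy as np

def film_modulate(vision_tokens, telepathy_factors, w_gamma, w_beta):
    """Apply FiLM: tokens * (1 + gamma) + beta, with gamma/beta predicted from the factors.

    vision_tokens:     [B, N, D] token features
    telepathy_factors: [B, F] conditioning vector (hypothetical shape)
    w_gamma, w_beta:   [F, D] projection matrices (hypothetical parameters)
    """
    gamma = telepathy_factors @ w_gamma   # [B, D] per-channel scale offset
    beta = telepathy_factors @ w_beta     # [B, D] per-channel shift
    # Broadcast over the token axis so every token of a sample shares one modulation.
    return vision_tokens * (1.0 + gamma[:, None, :]) + beta[:, None, :]

rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 8))
factors = rng.normal(size=(2, 4))
# Zero projections leave the tokens unchanged, so the original features are recoverable.
identity_out = film_modulate(tokens, factors, np.zeros((4, 8)), np.zeros((4, 8)))
```

Writing the scale as `1 + gamma` means that zero-initialised projections reproduce the unmodulated π0.5 vision tokens, which fits the goal of keeping the base behavior recoverable.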
Language–semantic stream
- take text tokens, vision tokens, and state tokens into a shared MLLM backbone;
- derive:
  - a semantic memory m_t that accumulates cross-time information,
  - an intent vector z_intent,
  - pooled semantic factors aligned with the text embedding space.
Action stream (three branches)
- treat π0.5 outputs as baseline:
  - action vector (per-step),
  - action chunk (short horizon),
  - action trajectory (full horizon);
- learn residual actions driven by telepathy factors on all three branches.
The resulting policy still looks like π0.5 from the outside (same inputs, same output types), but actions are now corrected by an internal telepathy pathway that is aware of deep semantics and associative intent.
3. Training Setup
3.1 Dataset & preprocessing
- Upstream dataset: lerobot/svla_so101_pickplace
- Task: pick-and-place style manipulation with multi-frame RGB + robot state + continuous actions.
A preprocessing script (dataset_preprocess_sigma_vla.py) does:
- sliding-window segmentation with horizon T = 16,
- filtering out windows with nearly zero action norm to remove static segments,
- packing vision frames, robot state, and 3-scale action targets into tensor batches,
- exporting three sharded files:
storage/sigma_pickplace/shard_00000.pt
storage/sigma_pickplace/shard_00001.pt
storage/sigma_pickplace/shard_00002.pt
These shards are the only data used for Sigma training and evaluation.
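The segmentation and filtering steps above can be sketched roughly as follows. This is not the code of dataset_preprocess_sigma_vla.py: the function name, the `stride`, and the `min_action_norm` threshold are illustrative assumptions; only the horizon T = 16 comes from the description.

```python
import numpy as np

def sliding_windows(actions, horizon=16, stride=1, min_action_norm=1e-3):
    """Return start indices of windows whose mean per-step action norm exceeds a threshold.

    actions: [N, D_action] array for one episode.
    stride and min_action_norm are hypothetical values, not the script's settings.
    """
    starts = []
    for start in range(0, len(actions) - horizon + 1, stride):
        window = actions[start:start + horizon]
        mean_norm = np.linalg.norm(window, axis=-1).mean()
        if mean_norm > min_action_norm:   # drop nearly static segments
            starts.append(start)
    return starts

# Toy episode: 20 static steps followed by 20 moving steps, 6-D actions.
episode = np.concatenate([np.zeros((20, 6)), np.ones((20, 6))])
kept = sliding_windows(episode, horizon=16, stride=4)
```

In the toy episode, the windows starting at steps 0 and 4 contain only zero actions and are discarded, while every window that overlaps the moving part is kept.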
3.2 LoRA fine-tuning (Sigma training)
Training is performed on a single RTX 4090 using train_sigma_telepathy_vla_lora.py:
python train_sigma_telepathy_vla_lora.py \
  --base_model_id lerobot/pi05_base \
  --dataset_dir /workspace/storage/sigma_pickplace \
  --output_dir /workspace/storage/sigma_lora_out \
  --batch_size 4 \
  --gradient_accumulation_steps 4 \
  --max_steps 300 \
  --dtype bf16
Key aspects:
- freeze backbone weights from lerobot/pi05_base;
- attach LoRA on key projections (q, k, v, o) and the telepathy heads;
- jointly optimize:
  - three control losses: L_act_vec for per-step action vectors, L_act_chk for short-horizon chunks, L_act_trj for full trajectories;
  - semantic & telepathy regularizers:
    - alignment of semantic factors with text embeddings,
    - control of telepathy factor norm tau_l2.
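One way to read the joint objective is as a weighted sum of three action MSE terms and two regularizers. The sketch below is an assumption-heavy illustration: it slices a single predicted window to obtain the three scales, whereas the real model may use separate heads, and `chunk_len`, `w_align`, and `w_tau` are hypothetical values.

```python
import numpy as np

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def sigma_loss(pred, target, sem_factors, text_emb, tau,
               chunk_len=4, w_align=0.1, w_tau=1e-4):
    """Combine three control losses with two regularizers (illustrative weights).

    pred, target:          [B, T, D_action] predicted and ground-truth action windows
    sem_factors, text_emb: [B, D_sem] pooled semantic factors and text embeddings
    tau:                   [B, D_tau] telepathy factors
    """
    l_act_vec = mse(pred[:, 0], target[:, 0])                     # per-step action vector
    l_act_chk = mse(pred[:, :chunk_len], target[:, :chunk_len])   # short-horizon chunk
    l_act_trj = mse(pred, target)                                 # full trajectory
    cos = np.sum(sem_factors * text_emb, axis=-1) / (
        np.linalg.norm(sem_factors, axis=-1) * np.linalg.norm(text_emb, axis=-1) + 1e-8
    )
    l_align = float(np.mean(1.0 - cos))                           # semantic-text alignment
    l_tau = float(np.mean(np.linalg.norm(tau, axis=-1) ** 2))     # telepathy norm control
    total = l_act_vec + l_act_chk + l_act_trj + w_align * l_align + w_tau * l_tau
    parts = {"L_act_vec": l_act_vec, "L_act_chk": l_act_chk, "L_act_trj": l_act_trj,
             "align": l_align, "tau_l2": l_tau}
    return total, parts

# Perfect predictions, aligned embeddings, and zero telepathy factors give ~0 loss.
target = np.ones((2, 16, 6))
emb = np.ones((2, 8))
total, parts = sigma_loss(target, target, emb, emb, np.zeros((2, 3)))
```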
All LoRA and telepathy parameters are stored under:
storage/sigma_lora_out/
  sigma_telepathy_heads.pt
  adapter_config.json
  adapter_model.bin
  ...
3.3 Telepathy-aware training logic
Two key training mechanisms are implemented inside the loss:
Telepathic Residual Action Focusing (TRAF)
Focuses learning on residual actions instead of full actions, and uses hard-sample mining (top-k error segments) to allocate more gradient budget to difficult humanoid control windows.
Telepathic Semantic Alignment Curriculum (TSAC)
Gradually increases the weights of:
- semantic memory–text alignment,
- intent–telepathy alignment,
while maintaining action regression as the primary objective early on.
Late in training, Sigma is encouraged to let internal semantic/intent structure drive the residual corrections.
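The two mechanisms can be illustrated with a small sketch. Both functions are assumptions about how such logic might look: the top-k selection rule, the linear ramp, and the parameter names (`k`, `warmup_steps`, `w_start`, `w_end`) are hypothetical and not read from the training script.

```python
import numpy as np

def topk_residual_loss(residual_pred, residual_target, k):
    """TRAF-style loss: mean squared residual error over the k hardest windows."""
    per_window = np.mean((residual_pred - residual_target) ** 2, axis=(1, 2))  # [B]
    hardest = np.sort(per_window)[-k:]   # the k largest per-window errors
    return float(hardest.mean())

def curriculum_weight(step, max_steps, warmup_steps=60, w_start=0.0, w_end=1.0):
    """TSAC-style schedule: hold w_start during warmup, then ramp linearly to w_end."""
    if step <= warmup_steps:
        return w_start
    progress = min(1.0, (step - warmup_steps) / (max_steps - warmup_steps))
    return w_start + progress * (w_end - w_start)

# Residual targets: what the expert did beyond the frozen baseline's prediction.
base_actions = np.zeros((4, 16, 6))
expert_actions = np.stack([np.full((16, 6), float(i)) for i in range(4)])
residual_target = expert_actions - base_actions
residual_pred = np.zeros_like(residual_target)
hard_loss = topk_residual_loss(residual_pred, residual_target, k=2)

# Alignment weight at the start, midway through the ramp, and at the end of 300 steps.
weights = [curriculum_weight(s, max_steps=300) for s in (0, 180, 300)]
```

With a schedule of this shape the alignment terms contribute nothing early on, so action regression dominates, and they reach full weight only at the end of training.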
4. Inference-time Telepathy Adapter
A lightweight adapter (sigma_adapter.py) controls how much the telepathy residuals are allowed to modify the baseline π0.5 actions:
- reads:
  - baseline π0.5 actions (base_action_vector, …),
  - Sigma residuals,
  - telepathy diagnostics (norms, cosine alignments),
- computes a risk-aware scaling factor in [min_scale, max_scale],
- blends:
action = base_action + scale * telepathy_residual
If residuals are too large or misaligned, scale is pushed toward 0, effectively reverting to π0.5 behavior.
If residuals are moderate and well aligned, scale approaches 1, enabling telepathy-enhanced control.
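To make the scaling behavior concrete, here is a hypothetical rule that matches the description: the scale stays near `max_scale` for moderate, aligned residuals and decays toward `min_scale` as they grow or point against the baseline. The risk formula, the `max_ratio` parameter, and the choice to measure alignment between residual and baseline action are assumptions; sigma_adapter.py may compute this differently.

```python
import numpy as np

def risk_aware_scale(base_action, residual, min_scale=0.0, max_scale=1.0,
                     risk_temperature=1.0, max_ratio=1.0):
    """Map residual size and direction to a blend scale in [min_scale, max_scale]."""
    base_norm = np.linalg.norm(base_action) + 1e-8
    res_norm = np.linalg.norm(residual)
    ratio = res_norm / base_norm
    cos = float(np.dot(base_action, residual) / (base_norm * (res_norm + 1e-8)))
    # Risk grows with oversized residuals and with residuals opposing the baseline.
    risk = max(0.0, ratio - max_ratio) + max(0.0, -cos)
    trust = float(np.exp(-risk / risk_temperature))   # 1 at zero risk, decays toward 0
    return min_scale + (max_scale - min_scale) * trust

def blend(base_action, residual, **kwargs):
    """action = base_action + scale * telepathy_residual"""
    scale = risk_aware_scale(base_action, residual, **kwargs)
    return base_action + scale * residual

base = np.array([1.0, 0.0])
small_aligned = blend(base, np.array([0.1, 0.0]))     # residual applied in full
large_opposed = blend(base, np.array([-10.0, 0.0]))   # residual almost fully suppressed
```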
5. Evaluation Protocol
Evaluation uses eval_sigma_vla_rollout.py in offline closed-loop replay:
- both Sigma and the baseline:
  - use the same preprocessed shards (shard_0000x.pt),
  - share the same telepathy heads file sigma_telepathy_heads.pt,
- only Sigma:
  - loads LoRA weights,
  - activates telepathy residuals and the adapter in control output.
5.1 CHECK A – telepathy geometry & alignment sanity
CHECK A verifies that telepathy geometry is identical between experimental and control runs:
- heads_tensors = 325
- mean ≈ 0.002, std ≈ 0.107, rms ≈ 0.107 for telepathy head weights
- avg_tau_l2 ≈ 51.6 – average L2 norm of telepathy factors
- avg_semantic_text_alignment ≈ 0.13 – semantic factor vs. text embedding alignment
These numbers are matched between Sigma and the π0.5 baseline, so behavior differences cannot be explained by changing telepathy parameters or text alignment geometry.
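A check of this kind could be computed along the following lines. The sketch pools all head tensors before taking statistics, which is an assumption; the evaluation script may aggregate per tensor instead, and the names `head_weight_stats` and `stats_match` are hypothetical.

```python
import numpy as np

def head_weight_stats(tensors):
    """Pool all head tensors and return count, mean, std, and RMS of the weights."""
    flat = np.concatenate([np.asarray(t, dtype=np.float64).ravel() for t in tensors])
    return {
        "heads_tensors": len(tensors),
        "mean": float(flat.mean()),
        "std": float(flat.std()),
        "rms": float(np.sqrt(np.mean(flat ** 2))),
    }

def stats_match(a, b, rtol=1e-6):
    """True when two runs report the same statistics within a relative tolerance."""
    return all(np.isclose(a[key], b[key], rtol=rtol) for key in a)

# Stand-in tensors; a real check would load sigma_telepathy_heads.pt in both runs.
rng = np.random.default_rng(0)
heads = [rng.normal(0.0, 0.1, size=(64, 64)) for _ in range(3)]
stats = head_weight_stats(heads)
same_geometry = stats_match(stats, head_weight_stats(heads))
```

Because rms² = mean² + std², the reported values are mutually consistent: with mean ≈ 0.002 and std ≈ 0.107, the RMS is √(0.002² + 0.107²) ≈ 0.107.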
5.2 CHECK B – multiscale control & telepathy metrics
CHECK B defines and reports:
- mse_vec – per-step action vector MSE (fine-grained control precision)
- mse_chk – short-segment chunk MSE (local motion consistency)
- mse_trj – full trajectory MSE (long-horizon tracking)
- tau_l2 – telepathy factor norms (activation strength)
- sem_align – semantic alignment (e.g., cosine) between semantic factors and text embeddings
On the same 723 samples and 181 batches:
- Sigma shows consistently lower mse_vec, mse_chk, mse_trj than the baseline,
- while tau_l2 and sem_align remain similar between both models.
This pattern supports the interpretation that Sigma uses the same semantic / telepathy geometry more effectively, converting it into tangible gains in control accuracy instead of merely altering the embedding space.
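When per-batch metrics are averaged into dataset-level numbers, the last batch is smaller than the rest, so a sample-weighted mean is the safer aggregate. The sketch below assumes an evaluation batch size of 4 (the value used in the training command, and consistent with 723 samples yielding 181 batches); `weighted_mean` is a hypothetical helper, not part of eval_sigma_vla_rollout.py.

```python
import math

def weighted_mean(per_batch_values, batch_sizes):
    """Sample-weighted mean of per-batch metric values."""
    total = sum(batch_sizes)
    return sum(v * n for v, n in zip(per_batch_values, batch_sizes)) / total

num_samples, batch_size = 723, 4   # batch size 4 is an assumption, see lead-in
num_batches = math.ceil(num_samples / batch_size)

# 180 full batches plus one final batch with the 3 remaining samples.
batch_sizes = [batch_size] * (num_samples // batch_size) + [num_samples % batch_size]

# Toy example: a 3-sample batch with MSE 1.0 and a 1-sample batch with MSE 2.0.
toy_mse = weighted_mean([1.0, 2.0], [3, 1])
```

An unweighted mean over batches would give the three samples in the final batch the same influence as four samples in any other batch, which slightly biases mse_vec, mse_chk, and mse_trj.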
6. How to Use Sigma
⚠️ You must have access to lerobot/pi05_base and the preprocessed shards or an equivalent environment to reproduce full experiments.
6.1 Installation (example)
# base env
pip install "transformers>=4.40.0" accelerate torch torchvision
pip install lerobot
# clone this repository (example path)
git clone https://github.com/Veltraxor/Sigma.git
cd Sigma
6.2 Loading Sigma on top of pi0.5
import torch
from lerobot import Pi05Policy
from sigma_vla import SigmaTelepathyVLA, SigmaTelepathyAdapter
device = "cuda"
dtype = torch.bfloat16
# 1. Load base π0.5 policy
base_policy = Pi05Policy.from_pretrained("lerobot/pi05_base")
# 2. Build Sigma on top of the base policy
sigma_policy = SigmaTelepathyVLA.from_base(
    base_policy=base_policy,
    lora_dir="./storage/sigma_lora_out",
    telepathy_heads_path="./storage/sigma_lora_out/sigma_telepathy_heads.pt",
    device=device,
    dtype=dtype,
)
# 3. Optional runtime adapter
adapter = SigmaTelepathyAdapter(
    min_scale=0.0,
    max_scale=1.0,
    risk_temperature=1.0,
)
# 4. Single batch forward (offline replay)
batch = {
    "vis_obs": vis_obs_tensor,          # [B, T, C, H, W]
    "robot_state": robot_state_tensor,  # [B, T, D_state]
    "texts": list_of_text_prompts,      # length B
}
with torch.no_grad():
    out = sigma_policy(**batch, use_telepathy=True)
    blended_action = adapter(
        base_action_vector=out["base_action_vector"],
        telepathy_residual=out["telepathy_residual_vector"],
        telepathy_factors=out["telepathy_factors"],
    )
7. Repository Layout (typical)
A typical Sigma repo / model card includes:
README.md # this file
sigma_env.example # example env file for HF tokens, paths
dataset_preprocess_sigma_vla.py
train_sigma_telepathy_vla_lora.py
eval_sigma_vla_rollout.py
sigma_telepathy_vla.py # model definition
sigma_adapter.py # inference-time adapter
storage/
  sigma_pickplace/
    shard_00000.pt
    shard_00001.pt
    shard_00002.pt
  sigma_lora_out/
    sigma_telepathy_heads.pt
    adapter_config.json
    adapter_model.bin
    ...
logs/
  sigma_eval_report.json
  sigma_eval_checkA.json
  sigma_eval_checkB.json
You can adapt this layout to your own environment; the key assumption is that Sigma is always loaded as a LoRA + telepathy delta on top of lerobot/pi05_base.
8. Intended Use, Risks, and Limitations
Intended use
Sigma is intended for research and experimentation on:
- semantic / telepathy-style control in VLA systems,
- offline trajectory analysis and simulation,
- early-stage humanoid / manipulator control studies.
Not intended for
- direct deployment on physical robots without additional safety layers;
- safety-critical or human-facing applications.
Known limitations
- trained only on svla_so101_pickplace;
- evaluated only in offline replay;
- telepathy path tuned for a single task family and embodiment.
Users should treat Sigma as a proof-of-concept that demonstrates how “deep semantic + associative intent” can be engineered into residual control, not as a generic controller.
9. Author & Acknowledgements
- Author: Libo Wang
- Base policy and dataset by Physical Intelligence / LeRobot teams.
- Training environment based on a single RTX 4090 GPU; all scripts are structured to be portable to other single-GPU or multi-GPU setups with minimal changes.
10. Citation
If you use Sigma, please cite both the original π0.5 / OpenPI work and this Sigma extension.
π0.5 / OpenPI:
@article{openpi2024,
title = {Open-World Robotic Manipulation with Vision-Language-Action Models},
author = {Physical Intelligence},
year = {2024},
url = {https://github.com/Physical-Intelligence/openpi}
}
Sigma (example entry):
@article{sigma2025,
title = {Sigma: The Key for Vision--Language--Action Models toward Telepathy},
author = {Wang, Libo},
year = {2025},
note = {Telepathy-style extension of lerobot/pi05_base},
url = {https://huggingface.co/Veltraxor/Sigma}
}