Behavior Uncloning — PO (step 60)
Vision-language-action (VLA) unlearning checkpoint: the pi0.5 LIBERO model after 60 training steps of PO unlearning.
Results
| Metric | Value |
|---|---|
| Method | PO |
| Training Steps | 60 |
| Forget Task | "turn on the stove" (LIBERO-Goal T6) |
| Forget SR | 0% (baseline: 100%) |
| Retain SR | 100.0% (baseline: 97.8%) |
| HM | 1.0 |
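The table does not define HM. A minimal sketch, assuming it is the harmonic mean of forgetting efficacy (1 − Forget SR) and Retain SR, both expressed as fractions; that definition is an assumption, but it is consistent with the reported value of 1.0. The function name `harmonic_mean` is illustrative only.

```python
def harmonic_mean(forget_efficacy: float, retain_sr: float) -> float:
    """Harmonic mean of two rates in [0, 1]; defined as 0 when both are 0."""
    total = forget_efficacy + retain_sr
    if total == 0.0:
        return 0.0
    return 2.0 * forget_efficacy * retain_sr / total


# Values from the table above, converted from percent to fractions.
forget_sr = 0.0   # Forget SR: 0%
retain_sr = 1.0   # Retain SR: 100.0%

# Assumed definition: forgetting efficacy is 1 - Forget SR.
hm = harmonic_mean(1.0 - forget_sr, retain_sr)
print(hm)  # 1.0
```

Under this definition, HM drops to 0 if either the forget task still succeeds every time or the retain tasks fail every time, so a score of 1.0 requires both complete forgetting and full retention.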
Usage
# Serve with openpi
uv run scripts/serve_policy.py --env LIBERO policy:checkpoint \
--policy.config pi05_libero --policy.dir <path_to_checkpoint>
Method
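Action Redirection (PO): L = L_flow(obs_forget, 0) + β·L_retain. The first term is the flow loss on forget-task observations with an all-zero action target, so the model is trained to output zero actions on the forget task. The second term, weighted by β, is the loss on the retain tasks.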
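The objective above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the training code behind this checkpoint: it assumes a conditional flow-matching loss with the interpolation x_t = t·noise + (1 − t)·actions and target velocity noise − actions, uniform time sampling, and L_retain being the same flow loss on retain-task data. `predict_velocity`, `flow_matching_loss`, and `po_loss` are hypothetical names.

```python
import numpy as np


def flow_matching_loss(predict_velocity, obs, actions, rng):
    """Flow-matching MSE for one batch (assumed convention, see lead-in).

    actions has shape (batch, horizon, action_dim).
    predict_velocity(obs, x_t, t) is a hypothetical model callable.
    """
    batch = actions.shape[0]
    noise = rng.standard_normal(actions.shape)
    # One time value per sample, broadcast over horizon and action dims.
    t = rng.uniform(0.001, 1.0, size=(batch, 1, 1))
    x_t = t * noise + (1.0 - t) * actions   # noisy interpolant
    u_t = noise - actions                   # target velocity
    v_t = predict_velocity(obs, x_t, t)
    return float(np.mean((v_t - u_t) ** 2))


def po_loss(predict_velocity, forget_batch, retain_batch, beta, rng):
    """L = L_flow(obs_forget, 0) + beta * L_retain."""
    obs_f, actions_f = forget_batch
    obs_r, actions_r = retain_batch
    # Forget term: replace the demonstrated actions with an all-zero target.
    l_forget = flow_matching_loss(
        predict_velocity, obs_f, np.zeros_like(actions_f), rng
    )
    # Retain term: ordinary flow loss on the demonstrated retain actions.
    l_retain = flow_matching_loss(predict_velocity, obs_r, actions_r, rng)
    return l_forget + beta * l_retain
```

With a zero action target the interpolant reduces to x_t = t·noise and the target velocity to the noise itself, so a model that minimizes the forget term denoises toward the zero action on forget-task observations.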
Base model: pi0.5 LIBERO
See full report: experiment_report.md