Abstract
Humanoid robots have achieved impressive locomotion performance, yet contact-rich and long-horizon manipulation remains a major bottleneck. Manipulation is inherently contact-rich and demands compliant whole-body control for stable interaction, while its diversity and long-horizon nature favor modular, planner-compatible interfaces over joint-space tracking. We propose CEER, a compliant end-effector-root (EE-root) control abstraction for modular humanoid loco-manipulation within a hierarchical planning framework. CEER enables compliance-aware whole-body control in an interpretable task space defined by root motion commands and end-effector pose targets, and supports plug-and-play integration with heterogeneous high-level planners. A teacher-student framework is adopted to distill a general motion-tracking controller into a low-level policy that consumes only EE-root commands. We further construct a hierarchical system that integrates heterogeneous planners and task modules through the EE-root interface, enabling diverse manipulation tasks without retraining the underlying whole-body policy. Experiments in simulation and on hardware demonstrate 3.3 cm end-effector tracking accuracy with substantially reduced jerk compared to baselines, stable contact-rich manipulation under teleoperation, and up to 70% success in simulated single-object loco-manipulation tasks within a room-scale environment. These results indicate that compliant EE-root control provides a practical abstraction for humanoid loco-manipulation, enabling modular and scalable integration of diverse skills.
Overview

Video
System
Overview of the proposed three-layer hierarchical system. High-level planners and task modules connect to the compliant whole-body policy through a single end-effector–root (EE-root) interface.
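To make the single-interface idea concrete, here is a minimal sketch of what an EE-root command and a planner-facing contract might look like. It is not the authors' implementation. The class names (`EEPose`, `EERootCommand`, `Planner`, `ReachPlanner`, `WalkPlanner`), the two-hand layout, the `(vx, vy, yaw_rate)` root command, the quaternion convention, and the stub policy are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z); assumed convention

IDENTITY: Quat = (1.0, 0.0, 0.0, 0.0)


@dataclass(frozen=True)
class EEPose:
    """Hypothetical end-effector pose target (frame choice is an assumption)."""
    position: Vec3
    orientation: Quat


@dataclass(frozen=True)
class EERootCommand:
    """Hypothetical EE-root command: one root motion command plus two EE targets."""
    root_velocity: Vec3  # assumed layout: (vx, vy, yaw_rate)
    left_ee: EEPose
    right_ee: EEPose

    def flatten(self) -> List[float]:
        """Pack the command into a flat vector: 3 + 2 * (3 + 4) = 17 values."""
        out = list(self.root_velocity)
        for ee in (self.left_ee, self.right_ee):
            out.extend(ee.position)
            out.extend(ee.orientation)
        return out


class Planner(Protocol):
    """Any high-level module that emits EE-root commands satisfies this contract."""
    def plan(self, t: float) -> EERootCommand: ...


class ReachPlanner:
    """Toy planner: stand still and move both hands forward over one second."""
    def plan(self, t: float) -> EERootCommand:
        x = 0.3 + 0.1 * min(t, 1.0)
        return EERootCommand((0.0, 0.0, 0.0),
                             EEPose((x, 0.2, 1.0), IDENTITY),
                             EEPose((x, -0.2, 1.0), IDENTITY))


class WalkPlanner:
    """Toy planner: walk forward while holding the hands at a fixed pose."""
    def plan(self, t: float) -> EERootCommand:
        return EERootCommand((0.5, 0.0, 0.0),
                             EEPose((0.3, 0.2, 1.0), IDENTITY),
                             EEPose((0.3, -0.2, 1.0), IDENTITY))


def rollout(planner: Planner, policy: Callable[[List[float]], object],
            steps: int, dt: float) -> list:
    """Feed commands from any planner to the same low-level policy."""
    return [policy(planner.plan(i * dt).flatten()) for i in range(steps)]


def stub_policy(command: List[float]) -> int:
    """Stand-in for the whole-body policy; it only reports the command size."""
    return len(command)
```

In this sketch, swapping `ReachPlanner` for `WalkPlanner` requires no change to `rollout` or to the policy, which mirrors the claim that planners can be exchanged without retraining the underlying controller.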

A teacher-student framework distills a general motion-tracking controller into a low-level policy that consumes only EE-root commands.

Results
End-effector tracking accuracy and smoothness compared to baselines: CEER achieves 3.3 cm position error with substantially reduced jerk.
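For readers who want to compute comparable numbers on their own trajectories, the sketch below shows one common way to measure tracking error and jerk. The page does not state how either metric is defined, so the definitions here are assumptions: mean Euclidean distance between actual and target positions, and the mean norm of a third-order finite difference. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_position_error(actual: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between matching points, in the input units."""
    if len(actual) != len(target) or not actual:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(a, t) for a, t in zip(actual, target)) / len(actual)


def mean_jerk_magnitude(positions: Sequence[Point], dt: float) -> float:
    """Mean norm of the third finite difference of position, divided by dt**3."""
    if len(positions) < 4:
        raise ValueError("need at least four samples to estimate jerk")
    norms = []
    for i in range(len(positions) - 3):
        p0, p1, p2, p3 = positions[i:i + 4]
        # Forward third difference per axis: p3 - 3*p2 + 3*p1 - p0
        jerk = [(d - 3 * c + 3 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        norms.append(math.hypot(*jerk))
    return sum(norms) / len(norms)
```

A constant-velocity trajectory yields a jerk of zero under this definition, and positions given in metres yield an error in metres, so 3.3 cm would correspond to 0.033.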

Contact-rich manipulation on hardware across diverse tasks.

Single-object loco-manipulation in a room-scale simulated environment, including spatial-relation and long-horizon tasks.
