- One-shot demo highlights advances in end-to-end speech-to-motion control
- G1 humanoid responds to spoken commands without pre-programmed trajectories
Unitree (宇树科技) on May 19 released a single-take video showing its G1 humanoid robot generating movements in real time from spoken voice commands, marking a fresh step toward fully AI-driven speech-to-action control.
The video, recorded with live on-site audio and no editing, showed the robot responding directly to verbal instructions, such as commands to throw punches or perform other body movements, without relying on pre-set motion trajectories.
According to Unitree, the current system supports a full pipeline covering speech recognition, intent parsing, task decomposition, motion generation and execution control.
Spoken instructions are first converted into text before being interpreted by a large AI model, which then coordinates the robot’s joint motors to carry out the requested movement.
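Unitree has not published code for this pipeline. The Python sketch below is a hypothetical illustration of how the five stages it names (speech recognition, intent parsing, task decomposition, motion generation and execution control) might hand data to one another. Every function name, the `JointCommand` format, the joint names and the angle values are assumptions made for illustration. Each stage is a trivial stand-in: the large AI model is replaced by keyword matching, and unlike the real-time generation Unitree describes, the toy motion generator returns fixed keyframes.

```python
from dataclasses import dataclass


@dataclass
class JointCommand:
    """One target angle for one joint (hypothetical message format)."""
    joint: str
    angle_rad: float


def recognize_speech(audio_transcript: str) -> str:
    # Stand-in for speech recognition: the "audio" is already a transcript,
    # so this stage only normalizes it.
    return audio_transcript.strip().lower()


def parse_intent(text: str) -> dict:
    # Stand-in for the large-model interpretation step: keyword matching
    # that extracts an action name and a repetition count (digits only).
    if "punch" in text:
        action = "punch"
    elif "wave" in text:
        action = "wave"
    else:
        action = "idle"
    count = next((int(w) for w in text.split() if w.isdigit()), 1)
    return {"action": action, "count": count}


def decompose_task(intent: dict) -> list[str]:
    # Break the intent into primitive sub-tasks, one per repetition.
    return [intent["action"]] * intent["count"]


def generate_motion(subtask: str) -> list[JointCommand]:
    # Toy motion generator returning two fixed keyframes per sub-task;
    # a real system would compute whole-body trajectories on the fly.
    if subtask == "punch":
        return [JointCommand("right_elbow", 1.5), JointCommand("right_elbow", 0.1)]
    if subtask == "wave":
        return [JointCommand("right_shoulder", 1.2), JointCommand("right_shoulder", 0.8)]
    return []


def execute(commands: list[JointCommand]) -> list[str]:
    # Stand-in for execution control: logs each command instead of
    # sending it to motor drivers.
    return [f"{c.joint}->{c.angle_rad:.2f}" for c in commands]


def run_pipeline(transcript: str) -> list[str]:
    # Chain the five stages in the order the article lists them.
    intent = parse_intent(recognize_speech(transcript))
    log: list[str] = []
    for subtask in decompose_task(intent):
        log.extend(execute(generate_motion(subtask)))
    return log
```

For example, `run_pipeline("Throw 2 punches")` yields two elbow keyframes per punch, four in total, while an unrecognized command such as `"sit down"` yields no motor commands. Keeping each stage as a separate function mirrors the staged structure described above, so any one stand-in could be swapped for a real model without changing the others.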

The company said all motions in the demo were generated in real time by AI rather than retrieved from pre-programmed libraries.
It acknowledged that the current version still has some response latency and that movement smoothness still needs improvement.
Industry analysts said the demonstration suggests humanoid whole-body controllers have become stable enough to link voice inputs interpreted by large language models with real-time motion generation.

The next stage, they said, will likely focus on context-aware task execution and more adaptive autonomous behavior.
Unitree launched the G1 humanoid in 2024. The robot stands about 127 centimeters tall, weighs roughly 35 kilograms and comes equipped with between 23 and 43 joint motors, as well as dexterous hands.
The company has accelerated product launches in recent weeks. On May 12, Unitree unveiled what it described as the world’s first mass-produced manned “mecha,” with prices starting from 3.9 million yuan ($574,197).
