In this video, speech recognition is used to control a robotic arm (e.DO by Comau), keeping the user's hands free. Two modalities are available: predefined poses, or spatial mode using x, y, and z coordinates. The deep neural network model is speaker-independent and classifies audio into ten keywords.

The user can issue ten voice commands. A neural network, trained to classify the incoming audio into one of the ten keywords, is deployed on an ST SensorTile board, and the recognized command is used to control the e.DO arm.

We developed a ROS package in Python that:
- handles the serial communication with the SensorTile, and
- controls e.DO using ROS messages or ROS MoveIt.
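The serial side of such a package could look roughly like the following Python sketch. It assumes, hypothetically, that the SensorTile firmware writes each recognized keyword as a newline-terminated ASCII line. The wire format, the helper names parse_keyword and read_commands, and the keyword filter are illustrative and are not taken from the ALOHA code.

```python
from typing import Iterator, Optional, Protocol

# The eight keywords that the text maps to robot actions.
ROBOT_KEYWORDS = {"up", "down", "left", "right", "on", "off", "stop", "go"}


class LineSource(Protocol):
    """Anything with a readline() returning bytes, e.g. a pyserial Serial."""

    def readline(self) -> bytes: ...


def parse_keyword(raw: bytes) -> Optional[str]:
    """Decode one line from the board and return the keyword, or None.

    Assumes (hypothetically) that each recognized keyword arrives as a
    newline-terminated ASCII string.
    """
    text = raw.decode("ascii", errors="ignore").strip().lower()
    return text if text in ROBOT_KEYWORDS else None


def read_commands(port: LineSource, max_lines: int) -> Iterator[str]:
    """Yield the robot commands found in up to max_lines lines read from port."""
    for _ in range(max_lines):
        raw = port.readline()
        if not raw:  # read timeout or end of stream: nothing received
            continue
        keyword = parse_keyword(raw)
        if keyword is not None:  # ignore noise and non-robot keywords
            yield keyword
```

On the robot side, port would typically be a pyserial object such as serial.Serial("/dev/ttyACM0", 115200, timeout=1). The device path and baud rate here are placeholders. Each yielded keyword could then be published on a ROS topic for the node that drives e.DO.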

In predefined-pose mode, the robot moves directly to the pose corresponding to the received speech command.
The implemented commands are:
- “up”, “down”, “left”, and “right” move the robot to four poses in those directions, as seen from a top view of the robot;
- “off” brings the robot to the rest position;
- “on” moves the robot forward, then backward, and then back to the rest position.

At any moment, the robot's movement can be stopped with the command “stop”.
The interrupted movement can then be resumed with the command “go”, or a new movement command can be issued.
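One way to combine the predefined poses with “stop” and “go” is a small queue-based controller, sketched below in Python. The pose names (“forward”, “backward”, “rest”) and the PoseController class are hypothetical. Executing a pose through ROS messages or MoveIt is left out: step() returns the next pose to send, and finished() is meant to be called once the arm has reached it.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

# Hypothetical pose names; in a real package each would map to a joint
# configuration sent to e.DO through ROS messages or MoveIt.
POSE_SEQUENCES: Dict[str, List[str]] = {
    "up": ["up"],
    "down": ["down"],
    "left": ["left"],
    "right": ["right"],
    "off": ["rest"],
    "on": ["forward", "backward", "rest"],
}


class PoseController:
    """Queue of target poses that "stop" pauses and "go" resumes."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()
        self.current: Optional[str] = None  # pose currently being executed
        self.paused = False

    def handle(self, command: str) -> None:
        if command == "stop":
            if self.current is not None:
                # Re-queue the interrupted pose so that "go" can finish it.
                self.queue.appendleft(self.current)
                self.current = None
            self.paused = True
        elif command == "go":
            self.paused = False
        elif command in POSE_SEQUENCES:
            # A new movement command replaces whatever was pending.
            self.queue = deque(POSE_SEQUENCES[command])
            self.current = None
            self.paused = False

    def step(self) -> Optional[str]:
        """Start the next pose and return its name, or None if paused or idle."""
        if self.paused or not self.queue:
            return None
        self.current = self.queue.popleft()
        return self.current

    def finished(self) -> None:
        """Mark the current pose as reached."""
        self.current = None
```

On “stop”, a real node would also have to halt the trajectory already in progress (for example with MoveIt's MoveGroupCommander.stop()). The sketch only re-queues the interrupted pose so that “go” can resume it.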

In spatial mode, the robot's end effector can be moved in space along the x, y, and z axes.
The commands used are:
- “up” and “down” increase and decrease the z coordinate, respectively;
- “left” and “right” increase and decrease the x coordinate, respectively;
- “on” and “off” increase and decrease the y coordinate, respectively.
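The spatial-mode mapping amounts to a table from keyword to axis direction. The Python sketch below applies this table to an end-effector target. The 20 mm step size and the next_target function are assumptions for illustration. A real controller would also need to check the new target against the arm's workspace before planning a move.

```python
from typing import Dict, Tuple

# Keyword -> unit direction (dx, dy, dz), following the mapping in the text.
AXIS_STEPS: Dict[str, Tuple[int, int, int]] = {
    "up": (0, 0, 1),
    "down": (0, 0, -1),
    "left": (1, 0, 0),
    "right": (-1, 0, 0),
    "on": (0, 1, 0),
    "off": (0, -1, 0),
}

STEP_MM = 20.0  # hypothetical increment per command, in millimetres


def next_target(position: Tuple[float, float, float], command: str,
                step: float = STEP_MM) -> Tuple[float, float, float]:
    """Return the new end-effector target after one spatial-mode command.

    Commands that are not in AXIS_STEPS leave the target unchanged.
    """
    dx, dy, dz = AXIS_STEPS.get(command, (0, 0, 0))
    x, y, z = position
    return (x + dx * step, y + dy * step, z + dz * step)
```

For example, next_target((0.0, 0.0, 0.0), "up") gives (0.0, 0.0, 20.0).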

Video by Santer Reply.

Follow us on LinkedIn and Twitter to discover the next steps!

Read more about this ALOHA use case: Speech recognition in Smart Industry


Project Coordinator
Giuseppe Desoli - STMicroelectronics

Scientific Coordinator
Paolo Meloni - University of Cagliari, EOLAB

Dissemination Manager
Francesca Palumbo - University of Sassari, IDEA Lab