In a dramatic breakthrough for robotics, researchers in the College of Engineering at Oregon State University used a reinforcement learning algorithm operating in a simulated environment to train a bipedal robot to walk, run, hop, skip, and climb stairs in the real world.
The “sim-to-real” learning process represents a transformation in robotics control, according to Jonathan Hurst, professor of mechanical engineering and robotics.
“It’s groundbreaking work that’s simply never been done before,” said Hurst, who developed the robot, named Cassie, at Oregon State in 2017 and later began marketing it through Agility Robotics, the spinoff company he cofounded.
Typically, bipedal robots learn to move using reference trajectories that specify the positions and velocities of limbs and joints. A reinforcement learning algorithm, which earns rewards for imitating those trajectories, trains the neural network that controls the robot.
But finding good reference trajectories can be difficult; so is ensuring that they’re compatible with the robot’s hardware. In addition, reference trajectories may not capture the variation of movement needed to fully realize gaits in changing circumstances.
“For a robot like Cassie, which has a tremendous amount of dynamic complexity, writing down equations that describe the robot’s motion is just too complicated,” Hurst said.
At first, just getting the robot to take a few steps on a treadmill without falling was difficult when traditional training techniques were applied, added Alan Fern, professor of computer science. “We thought Cassie was capable of all of the major bipedal gaits, but we didn’t know how to specify them,” he said.
A turning point occurred when the team developed a straightforward mathematical framework describing all common bipedal gaits based on their periodic motions.
“We started thinking in terms of the periodic structure of each gait and how to specify constraints on foot force and velocity,” Fern explained.
When walking, for example, the foot moving through the air has velocity but no force, while the grounded foot registers force but no velocity. By adjusting the force and velocity constraints, the entire realm of bipedal gaits could be stipulated.
Training occurred entirely within a simulation. Through trial and error, a reinforcement learning algorithm trained the neural network used to control a virtual Cassie. The algorithm adjusted the network's behavior to maximize its rewards, which it earned when the virtual robot accurately reproduced the forces and velocities of specified gaits. So, during the swing phase of a walking step, velocity was rewarded and force was penalized, prompting the algorithm to lift a foot.
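The idea of rewarding or penalizing foot force and velocity according to the phase of a periodic gait can be sketched in a few lines. This is a minimal illustration, not the researchers' actual formulation: the phase convention, duty factor, and coefficient values below are assumptions chosen for clarity.

```python
def gait_coefficients(phase, duty_factor=0.5):
    """Return (force_weight, velocity_weight) for one foot at a given
    point in the gait cycle, with phase in [0, 1).

    During stance (the first part of the cycle) ground force is rewarded
    and foot velocity penalized; during swing, the weights flip. Changing
    the duty factor and phase offsets between feet yields other gaits.
    """
    if phase < duty_factor:          # stance: foot planted on the ground
        return 1.0, -1.0
    return -1.0, 1.0                 # swing: foot moving through the air

def step_reward(phase, foot_force, foot_velocity):
    """Reward is a weighted sum of measured foot force and foot speed."""
    c_f, c_v = gait_coefficients(phase)
    return c_f * foot_force + c_v * foot_velocity

# Mid-stance: the foot carries load and is nearly still -> positive reward.
r_stance = step_reward(0.25, foot_force=1.0, foot_velocity=0.0)
# Mid-swing: the foot is airborne and moving -> also positive reward.
r_swing = step_reward(0.75, foot_force=0.0, foot_velocity=1.0)
```

A foot that pushes on the ground during its swing phase, or drags during stance, drives the reward negative, which is what steers the learning algorithm toward the specified gait.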
Simulated training allows the reinforcement learning algorithm to run through hundreds of millions of practice steps. By contrast, training on a treadmill would severely limit the number of repetitions and inevitably involve numerous falls — and possibly damage the hardware.
However, small differences between the physics of the virtual and real environments must be reconciled, or they can lead to failures in the physical realm. Bridging this “realism gap” involves a process called domain randomization, in which small perturbations to fundamental physical parameters, like friction, are seeded into the simulation. The process enhances the robot’s ability to generalize its behavior across different environments.
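In practice, domain randomization amounts to perturbing the simulator's physical constants before each training episode so the policy never overfits to one exact virtual world. The parameter names and ranges below are illustrative assumptions, not values from the actual Cassie training setup.

```python
import random

# Nominal physics parameters for the simulated robot (illustrative values).
NOMINAL = {"ground_friction": 1.0, "body_mass_kg": 31.0, "motor_strength": 1.0}

def randomize_domain(nominal, spread=0.15, rng=random):
    """Return a perturbed copy of the physics parameters for one episode.

    Each value is scaled by a random factor in [1 - spread, 1 + spread],
    so the policy must succeed across a whole family of slightly
    different simulated worlds, not just one.
    """
    return {key: value * rng.uniform(1.0 - spread, 1.0 + spread)
            for key, value in nominal.items()}

# Each training episode sees a slightly different world.
episode_params = randomize_domain(NOMINAL)
```

A policy that works across the whole randomized family is much more likely to also work in the real world, whose physics fall somewhere inside (or near) that family.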
The researchers also decided to use a recurrent neural network — rather than a feed-forward neural network — because it incorporates memory. RNNs are adept at error-correcting and encoding domain randomizations, even when faced with situations not encountered during training.
“We think this memory, combined with domain randomization, was critical for creating robust behavior in the real world,” Fern said.
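The advantage of a recurrent controller is that its hidden state persists across control steps, so past observations, such as an unexpected foot impact, can shape future actions. The toy network below makes that concrete; the sizes, the plain tanh cell, and the random weights are illustrative assumptions, far smaller and simpler than the actual controller.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, HID, ACT = 8, 16, 4                       # toy dimensions
W_in = rng.standard_normal((HID, OBS)) * 0.1   # observation -> hidden
W_h = rng.standard_normal((HID, HID)) * 0.1    # hidden -> hidden (memory)
W_out = rng.standard_normal((ACT, HID)) * 0.1  # hidden -> action

def rnn_policy(obs_sequence):
    """Run the recurrent controller over a sequence of observations.

    Unlike a feed-forward network, the action at each step depends on the
    whole observation history through the hidden state h.
    """
    h = np.zeros(HID)
    actions = []
    for obs in obs_sequence:
        h = np.tanh(W_in @ obs + W_h @ h)      # memory update
        actions.append(W_out @ h)
    return actions

obs = np.ones(OBS)
# The same final observation yields different actions depending on history.
a_no_history = rnn_policy([obs])[-1]
a_with_history = rnn_policy([np.zeros(OBS), -obs, obs])[-1]
```

A feed-forward controller would map identical observations to identical actions; the recurrent one can react differently because its hidden state encodes what just happened, which is the memory Fern credits for robustness.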
Training was halted after 150 million practice steps, which took only several hours of wall-clock time. Then the knowledge about the newly learned gaits was transferred to Cassie. In a separate sim-to-real process, Cassie was also trained to negotiate stairs.
So that Cassie would be prepared to handle the range of step elevations that it would encounter in the physical world, stair heights in the simulation were randomized within reasonable limits. That way, the robot learned to lift its leg high enough to clear any step within those limits.
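Randomizing stair heights is the same domain-randomization idea applied to terrain: each training episode generates a staircase whose step rises are drawn from a plausible real-world range. The function and limits below are illustrative assumptions, not the study's actual values.

```python
import random

def random_staircase(num_steps=10, min_rise_m=0.10, max_rise_m=0.20,
                     rng=random):
    """Return a list of step rises (in metres) for one training episode.

    Drawing each rise independently from [min_rise_m, max_rise_m] forces
    the policy to lift its feet high enough for any step in that range,
    rather than memorizing a single staircase geometry.
    """
    return [rng.uniform(min_rise_m, max_rise_m) for _ in range(num_steps)]

# A fresh, slightly different staircase for each episode.
stairs = random_staircase()
```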
Cassie executed all the specified gaits and repeatedly navigated stairs — a remarkable feat for a robot that completely lacks external sensors and that relies entirely on proprioceptive feedback from contact between its limbs and the ground for information about its position in the world. It also robustly completed tasks not modeled in training, such as traversing curbs, inclines, and logs in a natural setting.
“This was all really shocking,” Fern said. “I can’t overstate how surprised we were by the results. When we watched Cassie climb stairs in the simulation, at first we thought it was some simulator artifact, because a ‘blind’ robot couldn’t possibly do that in the real world. Then they brought the robot outside, and it walked up and down stairs all over campus.”
According to Fern, Cassie learned to change its gait and use higher default trajectories of its feet when necessary to clear each stair, then it learned some basic reactive behaviors when it sensed it was becoming unstable. He likened the skill to how a blindfolded human would climb or descend steps.
“We would try to feel our way ahead one foot at a time, and we could do it, carefully, without seeing the stairs,” he said.
Even after poor foot placements and stumbles on stairs, Cassie showed a remarkable ability to instantly adjust and recover. And in July, it became the first bipedal robot to run a 5K.
“This is a new tool that will be part of robot control moving forward,” Hurst said. “As we figure out how to make robots go where people go, we’ll be enabling an entirely new era of robotics. Until recently, the core science of how to make a two-legged robot walk or run was not known. Now that’s changing.”