How do you teach new skills to robots?

A warehouse robot pulls mugs off a shelf and places them into boxes for delivery as e-commerce orders flood in. Everything runs smoothly until the warehouse undergoes a change, requiring the robot to grasp taller, narrower mugs that are stored upside down.

Reprogramming that robot traditionally involves hand-labeling thousands of images to show it how to grasp these new mugs, then retraining the entire system.

MIT researchers, however, have developed a technique that requires only a few human demonstrations to reprogram the robot. This machine-learning method enables a robot to pick up and place never-before-seen objects in random poses it has never encountered. Within 10 to 15 minutes, the robot would be ready to perform a new pick-and-place task.

The technique relies on a neural network originally designed to reconstruct the shapes of 3D objects. The system uses what the network has learned about 3D geometry to grasp new objects that are similar to those in the demonstrations.

In simulations and using a real robotic arm, the researchers show that their system can effectively manipulate never-before-seen mugs, bowls, and bottles, arranged in random poses, using only 10 demonstrations to teach the robot.

“Our major contribution is the general ability to much more efficiently provide new skills to robots that need to operate in more unstructured environments where there could be a lot of variability. The concept of generalization by construction is a fascinating capability because this problem is typically so much harder,” says Anthony Simeonov, a graduate student in electrical engineering and computer science (EECS) and co-lead author of the paper.

Grasping Geometry

A robot may be trained to pick up a specific item, but if that item is lying on its side (for example, because it has fallen over), the robot perceives this as a totally different position. This is one reason it is so hard for machine-learning systems to generalize to new object orientations.

To address this problem, the researchers developed a Neural Descriptor Field (NDF), a new type of neural network model that learns the 3D geometry of a class of objects. The model computes the geometric representation of an individual object from a 3D point cloud, a collection of data points or coordinates in three dimensions. The data points can come from a depth camera, which measures the distance between an object and a viewpoint. Although the network was trained in simulation on a large dataset of synthetic 3D shapes, it can be directly applied to real-world objects.
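To make the input format concrete, here is a minimal sketch of how a depth image is back-projected into a point cloud with the standard pinhole camera model. The intrinsics values (`fx`, `fy`, `cx`, `cy`) are placeholders for illustration; real values come from the depth camera's calibration, and this is not the authors' code.

```python
import numpy as np

# Hypothetical camera intrinsics: focal lengths (fx, fy) and
# principal point (cx, cy). Real values come from camera calibration.
fx, fy, cx, cy = 525.0, 525.0, 320.0, 240.0

def depth_to_point_cloud(depth):
    """Back-project a depth image (meters) into an Nx3 point cloud
    using the pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Toy example: a flat surface one meter from a 640x480 camera.
cloud = depth_to_point_cloud(np.ones((480, 640)))
print(cloud.shape)  # (307200, 3)
```

The resulting Nx3 array of coordinates is the kind of representation the NDF consumes.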

The team designed the NDF with a property known as equivariance. With this property, if the model is shown an image of an upright mug, and then shown an image of the same mug on its side, it understands that the second mug is the same object, just rotated.

“This equivariance is what allows us to much more effectively handle cases where the object you observe is in some arbitrary orientation,” Simeonov says.
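A numerical sketch of this property, using a hand-crafted stand-in descriptor rather than the learned NDF: sorted distances from a query point to a point cloud are unchanged when the same rigid transform (rotation plus translation) is applied to both, so the upright mug and the mug on its side yield identical features. Everything here (the toy descriptor, the random "mug" cloud) is illustrative, not the authors' model.

```python
import numpy as np

def toy_descriptor(query, cloud, k=5):
    """Toy pose-invariant descriptor: sorted distances from a query point
    to its k nearest neighbors. Rigid transforms preserve distances, so
    transforming both inputs leaves the output unchanged. (The real NDF
    uses a learned neural network, not hand-crafted distances.)"""
    d = np.linalg.norm(cloud - query, axis=1)
    return np.sort(d)[:k]

rng = np.random.default_rng(0)
mug = rng.normal(size=(200, 3))        # stand-in "mug" point cloud
query = np.array([0.5, 0.0, 0.2])      # e.g. a point near the handle

# A rigid transform: rotate 90 degrees about z, then translate.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, -2.0, 0.5])

d_upright = toy_descriptor(query, mug)
d_rotated = toy_descriptor(R @ query + t, mug @ R.T + t)
print(np.allclose(d_upright, d_rotated))  # True
```

The learned NDF is built so that its neural features satisfy this same consistency under rotation and translation.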

As the NDF learns to reconstruct shapes of similar objects, it also learns to associate related parts of those objects. For instance, it learns that the handles of mugs are similar, even if some mugs are taller or wider than others, or have smaller or longer handles.

“If you wanted to do this with another approach, you’d have to hand-label all the parts. Instead, our approach automatically discovers these parts from the shape reconstruction,” says Yilun Du, an EECS graduate student and co-lead author of the paper.

The researchers use this trained NDF model to teach a robot a new skill with only a few physical examples. They move the hand of the robot onto the part of an object they want it to grip, like the rim of a bowl or the handle of a mug, and record the locations of the fingertips.

Because the NDF has learned so much about 3D geometry and how to reconstruct shapes, it can infer the structure of a new shape, which enables the system to transfer the demonstrations to new objects in arbitrary poses, Du explains.
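The transfer step can be sketched with a hand-crafted toy descriptor (sorted nearest-neighbor distances, which are pose-invariant) standing in for the learned neural features: record the descriptor at the demonstrated fingertip location, then find the point on the new object whose descriptor best matches it. The actual system optimizes a full 6-DoF gripper pose over learned features; this only illustrates the matching idea.

```python
import numpy as np

def toy_descriptor(query, cloud, k=5):
    """Toy pose-invariant descriptor: sorted distances to the k nearest
    points (a stand-in for learned NDF features)."""
    return np.sort(np.linalg.norm(cloud - query, axis=1))[:k]

rng = np.random.default_rng(1)
# Demo object and the fingertip location recorded during the demonstration.
demo_obj = rng.normal(size=(300, 3))
demo_grasp = demo_obj[42]              # point the human guided the gripper to
demo_desc = toy_descriptor(demo_grasp, demo_obj)

# "New" object: the same shape observed in a different pose.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.3, 0.7, -0.2])
new_obj = demo_obj @ R.T + t

# Transfer: pick the point on the new object whose descriptor best
# matches the demonstrated one.
errors = [np.linalg.norm(toy_descriptor(p, new_obj) - demo_desc)
          for p in new_obj]
best = new_obj[int(np.argmin(errors))]
print(np.allclose(best, R @ demo_grasp + t))  # True
```

Because the descriptor ignores the object's pose, the matched point lands on the transformed counterpart of the demonstrated grasp location.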

Picking a winner

They tested their approach in simulations and on a real robotic arm, using mugs, bowls, and bottles as objects. On pick-and-place tasks with new objects in new orientations, their technique succeeded 85 percent of the time, while the best baseline achieved only a 45 percent success rate. Success meant grasping a new object and placing it in a target location, such as hanging a mug on a rack.

Many of the baselines rely on 2D image data rather than 3D geometry, which makes it harder for those methods to incorporate equivariance. This is one reason the NDF approach outperformed them.

While the researchers were happy with the results, their method only works for the object category on which it was trained. A robot taught to pick up mugs won’t be able to pick up boxes or headphones, since those objects have geometric features too different from anything the network was trained on.

“In the future, scaling it up to many categories or completely letting go of the notion of category altogether would be ideal,” Simeonov says.

They also plan to adapt the system for nonrigid objects and, in the longer term, enable the system to perform pick-and-place tasks when the target area changes.
