Robots Learn Faster Than Ever
A new artificial intelligence startup called Osaro aims to give industrial robots the same turbocharge that DeepMind Technologies gave Atari-playing computer programs.
In December 2013, DeepMind showcased a type of artificial intelligence that had mastered seven Atari 2600 games from scratch in a matter of hours, and could outperform some of the best human players. Google swiftly snapped up the London-based company, and the deep-reinforcement learning technology behind it, for a reported $400 million.
Now Osaro, with $3.3 million in investments from the likes of Peter Thiel and Jerry Yang, claims to have taken deep-reinforcement learning to the next level, delivering the same superhuman AI performance but over 100 times as fast.
Deep-reinforcement learning arose from deep learning, a method of using multiple layers of neural networks to efficiently process and organize mountains of raw data (see “10 Breakthrough Technologies 2013: Deep Learning”). Deep learning now underlies many of the best facial recognition, video classification, and text and speech recognition systems from Google, Microsoft, and IBM Watson.
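In code, the layered idea is simple: each layer transforms the output of the one before it, gradually turning raw numbers into something meaningful. Here is a minimal sketch, with random placeholder weights standing in for the parameters a real system would learn from data:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    # One layer: a linear map followed by a nonlinearity (ReLU here).
    return np.maximum(0.0, x @ w + b)

# A raw input, e.g. a flattened 28x28 grayscale image.
x = rng.random(784)

# Three stacked layers turn 784 raw pixels into 10 class scores.
# Real systems learn these weights from data; these are random placeholders.
w1, b1 = rng.normal(size=(784, 256)) * 0.01, np.zeros(256)
w2, b2 = rng.normal(size=(256, 64)) * 0.01, np.zeros(64)
w3, b3 = rng.normal(size=(64, 10)) * 0.01, np.zeros(10)

h = layer(layer(x, w1, b1), w2, b2)  # successive layers re-represent the input
scores = h @ w3 + b3                 # final layer: one score per class
print(scores.argmax())               # the predicted class
```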
Deep-reinforcement learning adds control to the mix, using deep learning’s ability to accurately classify inputs, such as video frames from a game of Breakout or Pong, to work toward a high score. Deep-reinforcement learning systems train themselves automatically by repeating a task over and over again until they reach their goal. “The power of deep reinforcement is that you can discover behaviors that a human would not have guessed or thought to hand code,” says Derik Pridmore, president and chief operating officer of Osaro.
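Stripped of the deep network, the reinforcement-learning core is a simple loop: try an action, observe the reward, and nudge a value estimate toward what worked. Here is a toy sketch of that loop on a made-up five-square “walk to the goal” task; this illustrates the general technique (Q-learning), not DeepMind’s or Osaro’s code:

```python
import random

# Toy task: an agent on a line of five squares (states 0..4) starts at 0
# and earns reward 1 for reaching square 4. Actions: 0 = left, 1 = right.
GOAL = 4
q = [[0.0, 0.0] for _ in range(GOAL + 1)]  # learned value of each (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount, exploration

def pick_action(s):
    # Explore at random occasionally (and when the estimates are tied);
    # otherwise exploit the action currently rated best.
    if random.random() < epsilon or q[s][0] == q[s][1]:
        return random.randrange(2)
    return 0 if q[s][0] > q[s][1] else 1

for episode in range(500):
    s = 0
    while s != GOAL:
        a = pick_action(s)
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: nudge the estimate toward the reward plus the
        # discounted value of the best action available in the next state.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2

# The learned policy: always head right toward the goal, i.e. [1, 1, 1, 1].
print([1 if q[s][1] >= q[s][0] else 0 for s in range(GOAL)])
```

No one told the agent to move right; the behavior emerged from repeated trials and the reward signal alone, which is exactly the property Pridmore describes.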
Training a new AI system from a blank slate, however, can take a long time. DeepMind’s Atari demo required tens of millions of video frames, the equivalent of many thousands of games, to reach near-perfection. That is fine for tasks that run in simulation, where thousands of games can be compressed into hours or minutes on powerful computers, but it doesn’t translate well to real-world robotics.
“A robot is a physically embodied system that takes time to move through space,” says Pridmore. “If you want to use basic deep-reinforcement learning to teach a robot to pick up a cup from scratch, it would literally take a year or more.”
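That estimate is easy to sanity-check with back-of-envelope numbers; the trial count and cycle time below are illustrative assumptions, not figures from Osaro:

```python
# Rough sanity check on the "a year or more" claim. Both numbers are
# illustrative assumptions: DeepMind-scale training took tens of millions
# of frames/attempts, and a physical grasp attempt takes a few seconds.
attempts = 10_000_000     # assumed trials to learn the task from scratch
seconds_per_attempt = 5   # assumed time for one real-world pick-up attempt

years = attempts * seconds_per_attempt / (60 * 60 * 24 * 365)
print(f"{years:.1f} years of nonstop robot time")  # ~1.6 years
```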
To accelerate that training process, Osaro took inspiration from the way people learn most activities: by watching other people. Osaro has built a game-playing program that starts by observing a human play several games; it then uses those behaviors as a jumping-off point for its own training efforts. “It doesn’t copy a human and you don’t have to play precisely or very well. You just give it a reasonable idea of what to do,” says Pridmore. He claims Osaro’s AI system can pick up a game 100 times as fast as DeepMind’s program, although the company has yet to publish its research.
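Osaro has not published its method, but the standard recipe for seeding reinforcement learning with human play is behavioral cloning: fit a policy to recorded observation-action pairs from the demonstrations, then let trial-and-error refine it against the score. Here is a minimal sketch of that pretraining step, with toy random data standing in for real gameplay logs and hypothetical shapes throughout:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstration log: 1,000 recorded (observation, action) pairs
# from a human playing, with 32-dimensional observations and 4 possible actions.
obs = rng.random((1000, 32))
actions = rng.integers(0, 4, size=1000)

# A linear softmax policy, trained by gradient descent to imitate the human.
w = np.zeros((32, 4))
for _ in range(200):
    logits = obs @ w
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = p.copy()
    grad[np.arange(len(actions)), actions] -= 1.0  # d(cross-entropy)/d(logits)
    w -= 0.1 * (obs.T @ grad) / len(obs)

# The cloned policy is only a "reasonable idea of what to do": from here,
# reinforcement learning takes over, refining w against the game's score
# rather than against the human's choices.
def act(o):
    return int(np.argmax(o @ w))
```

The payoff of starting from a cloned policy is that early exploration is no longer random: the agent already spends most of its time in sensible parts of the game, so the reward signal arrives far sooner.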
Osaro’s first application for its deep-reinforcement learning technology is likely to be high-volume manufacturing, where reprogramming assembly line robots can currently take weeks of effort from highly skilled (and highly paid) professionals. Pridmore says Osaro can reduce that time to around a week, with an added benefit of building efficient control systems that can cope with “noisy” conditions such as uneven components or changing lighting.
Eventually, says Pridmore, the training process should be almost effortless. “In the future, you will be able to give a robot three buckets of parts, show it a finished product, and simply say, ‘Make something like this.’” That day is still some ways off. Osaro’s next step is to run simulated robotic demos in a virtual environment called Gazebo before launching with industrial robot manufacturers and their customers in 2017.
Oren Etzioni, executive director of the Allen Institute for Artificial Intelligence, says the approach is “technically exciting” and “tantalizing.” Pieter Abbeel, a professor of computer science at the University of California, Berkeley, and organizer of a deep-reinforcement learning symposium, agrees. “Learning more directly from human demonstrations and advice in all kinds of formats is intuitively the way to get a system to learn more quickly,” he says. “However, developing a system that is able to leverage a wide range of learning modalities is challenging.”