Cheap Lasers and Bad Math: The Coming Revolution in Robot Perception
In 2011, I took a ride in Google’s self-driving car. It was an early prototype with a monitor mounted on the passenger side that showed, in real time, what the car was seeing. It was set for a pretty aggressive test and was laying rubber on the road as we weaved through cones on the closed track. Plainly put, it drove better than any human I’ve ever driven with, and as it drove I could watch on the monitor everything it saw. The entire scene in 360 degrees was lit up. The car saw everything around it in a way humans never could. This was the future of auto safety.
As I got out of the car and watched it take other excited passengers around the track, I was convinced self-driving cars were right around the corner. Clearly that did not happen. The reason was not something I would learn until I got into computer vision work several years later: that beautiful 360-degree situational awareness was driven primarily by one sensor, the Velodyne LIDAR unit.
LIDAR (Light Detection and Ranging) uses lasers to measure the distance to objects. It has been used for years in military and industrial applications, but it has never really been a consumer technology experienced by the public. One major reason is cost: the LIDAR unit on top of Google’s car in 2011 cost roughly $75,000.
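For readers who want to see the underlying idea, the sketch below is a minimal illustration of my own (not from the original article) of the time-of-flight calculation at the heart of laser ranging: a pulse goes out, bounces back, and the round-trip time gives the distance.

```python
# Illustrative only: the basic time-of-flight math behind laser ranging.
# A real scanner repeats this calculation for a very large number of pulses
# across many angles to build up a 3-D point cloud.

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def range_from_time_of_flight(round_trip_seconds: float) -> float:
    """Distance to the target, given the pulse's round-trip travel time."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A return pulse arriving 200 nanoseconds after emission implies a target
# roughly 30 meters away.
print(range_from_time_of_flight(200e-9))  # ~29.98 m
```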
In the years since I took that ride, the size and cost of LIDAR units have decreased. Smaller units can be seen on many robots, including Boston Dynamics’ famous BigDog and Atlas robots, and cost about $30,000. In 2015 that cost came down to about $8,000 for a unit with fewer lasers. These units were small enough to be carried comfortably by small consumer drones and were used in robotics projects globally.
As exciting as that was, 2016 puts us on the precipice of an even greater breakthrough, from both established providers and new upstarts on Kickstarter: the sub-$500 LIDAR sensor is about to hit the market. Usually when prices for a technology fall that fast, it starts to pop up in new and interesting places. Along with advances in other vision technologies, such as structured light and structure from motion, LIDAR sensing is about to give robots the ability to navigate the world around them in ways that were hitherto expensive and difficult.
There are some amazing feats of robot cooperation and precise movement in pre-mapped and controlled environments, from drone swarms navigating quickly through small openings to high-speed precision welding. But the capability of a robot learning its three-dimensional environment as it goes, known to roboticists as SLAM (Simultaneous Localization and Mapping), hasn’t been widely available in low-cost robots. That is about to change.
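To make the term concrete, here is a deliberately tiny sketch of my own (not from the article) of the mapping half of SLAM: folding individual laser range readings into a 2-D occupancy grid. It assumes the robot’s pose is already known; the hard part of real SLAM is that the pose must be estimated at the same time, typically with filtering or graph optimization.

```python
# Toy mapping step: mark grid cells where laser returns land, assuming a
# known robot pose. Real SLAM must estimate that pose simultaneously.

import math

GRID_SIZE = 20        # 20 x 20 cells
CELL_METERS = 0.5     # each cell covers 0.5 m x 0.5 m

# 0 = unknown/free, 1 = occupied
grid = [[0] * GRID_SIZE for _ in range(GRID_SIZE)]

def mark_hit(robot_x, robot_y, bearing_rad, range_m):
    """Mark the cell where a single laser return landed."""
    hit_x = robot_x + range_m * math.cos(bearing_rad)
    hit_y = robot_y + range_m * math.sin(bearing_rad)
    col = int(hit_x / CELL_METERS)
    row = int(hit_y / CELL_METERS)
    if 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE:
        grid[row][col] = 1

# A robot at (5 m, 5 m) sweeps a few beams and sees obstacles about 3 m away.
for degrees in range(0, 90, 10):
    mark_hit(5.0, 5.0, math.radians(degrees), 3.0)

occupied = sum(cell for row in grid for cell in row)
print(f"{occupied} cells marked occupied")
```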
Unfortunately, low-cost sensors solve only half the problem. Processing power is the other limiting factor in three-dimensional SLAM. Vision processing demands substantial computation, which is a difficult challenge in smaller robots and drones. Some of this work can be offloaded to off-board computers, but in high-speed applications like autonomous flight or driving, the risk of communications lag is too great. For safety, it is best to do this computing on board the robot.
A few years ago this presented real difficulties, but new hardware is now on the cusp of solving this problem too. NVIDIA, long known for its graphics cards, has begun the drive towards greater GPU processing power in embedded systems and robots. The NVIDIA Jetson embedded platform is targeted at deep learning researchers, but it holds great promise for computer vision processing as well.
NVIDIA is not alone. New chips are also being developed by startups with computer vision as a target. For example, Singular Computing is taking the interesting approach of designing chips that are deliberately imprecise. Inspired by human neurons, whose “fire together, wire together” behavior holds only about 90% of the time, Singular explored what could be done in chip architecture if you allowed up to a 1% error in mathematical results. The answer? You could pack about 1,000 times as many processors into the same space. For vision processing, this means a picture that is nearly identical to one produced by a traditional computer, in a fraction of the time and for a fraction of the power. The ability of robots to both see and reason could eventually be powered by bad math.
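To get a feel for why a 1% error budget can be tolerable in perception, here is a toy experiment of my own; it says nothing about Singular’s actual architecture, but it shows that perturbing every arithmetic result of a simple image operation by up to 1% changes the output by only a couple of gray levels.

```python
# Illustrative only: inject ~1% relative error into each arithmetic result
# of a simple blur and compare against the exact computation.

import random

def blur_row(pixels, error=0.0):
    """3-tap box blur over a row of grayscale pixel values (0-255)."""
    out = []
    for i in range(1, len(pixels) - 1):
        exact = (pixels[i - 1] + pixels[i] + pixels[i + 1]) / 3.0
        # Model an "approximate" arithmetic unit with up to +/- `error` relative error.
        out.append(exact * (1.0 + random.uniform(-error, error)))
    return out

random.seed(0)
row = [random.randint(0, 255) for _ in range(1000)]

exact_result = blur_row(row)                 # conventional, exact arithmetic
approx_result = blur_row(row, error=0.01)    # "bad math" with 1% error

max_diff = max(abs(a - b) for a, b in zip(exact_result, approx_result))
print(f"Worst-case pixel difference: {max_diff:.2f} out of 255")
```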
To date, most robots have either been tied to a human operator or have operated in a tightly controlled environment such as a factory floor. The idea of robots navigating freely and interacting with people is still very new, reserved for a few test cases like the Google self-driving car. The coming reduction in cost and increase in capability of robotic perception will, over the next few years, forever alter the landscape of where we see robots in our daily lives. So, is the world ready for volumes of self-navigating robots with narrow AIs entering our streets, homes, workplaces, schools, or battlefields in the next five to ten years? This is a question we should all be considering.
Photo credit: Peter Haas.