The Heart of a Robot? Computer Vision & AI

January 10, 2020

Max Maxfield

AI, computer vision, machine learning, facial recognition

Editor’s Note: insight.tech stands in support of ending acts of racism, inequity, and social injustice. We do not tolerate our sponsors’ products being used to violate human rights, including but not limited to the abuse of visualization technologies by governments. Products, technologies, and solutions are featured on insight.tech under the presumption of responsible and ethical use of artificial intelligence and computer vision tools, technologies, and methods.

Robots have long been a staple of science fiction stories. Robby the Robot, with a distinct personality and a dry wit, first appeared in the 1956 movie Forbidden Planet. And in the U.S. television series Lost in Space, the Model B9 robot had both superhuman strength and musical talents.

Of course, we are still a long way from fully autonomous robots that offer personality, protection, and companionship—on this planet, let alone in the farthest reaches of space. But we are starting to see glimpses of intelligent robots in a wide range of everyday applications:

Greeting customers, answering questions, and guiding retail shoppers
Providing information about hospital facilities and guidance on continuing patient care
Receiving guests, guiding them to reception, and transporting luggage to hotel rooms
Accepting payments and gathering account information in banking centers
Shuttling goods around warehouses and serving as after-hours security guards

The Anatomy of a Smart Service Robot

Video 1 shows Smart Service and Delivery Robots from New Era AI Robotic Inc. The systems use simultaneous localization and mapping (SLAM) algorithms, voice and facial recognition software, and a comprehensive sensor suite to carry out the tasks mentioned above.

Video 1. Intelligent service and delivery robots are used as assistants in several industries. (Source: New Era AI Robotic)

These capabilities are executed on two separate subsystems: one for navigation and control, and the other to drive user interfaces.

At the Core: Computer Vision and Deep Learning

New Era’s in-house SLAM technology is at the heart of its robots, allowing the 40- to 50-kg systems to safely navigate surroundings. The deterministic, control-oriented SLAM software runs against input data from multiple sensors to give robots a 2D/3D view of their surroundings for object detection, recognition, and avoidance.

“Autonomous cars have many, many sensors,” said Allen Tsai, chief engineer of SLAM software at New Era AI. “Likewise, indoor robots can’t just rely on one sensor. In real-world environments like shopping malls where there are a lot of people, nothing is static.”

Initially, the systems leveraged just a 2D planar LiDAR sensor array. Although this LiDAR is cost-effective and reliable, it proved limiting for robots navigating dynamic three-dimensional spaces. By adding an Intel® RealSense™ camera to the design, New Era implemented stereo vision for better perception of angles, corners, and more (Figure 1).

Figure 1. The Intel® RealSense™ camera provides depth perception and angular information. (Source: Digital Trends)

“With Intel RealSense, we are able to use classic computer vision algorithms to enhance images and identify features,” Tsai continued. “And then we infuse that with our LiDAR sensor so we’re not dependent on just one sensor.”
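The article does not describe how New Era combines the two sensors. As a purely illustrative sketch, one textbook way to merge two independent distance estimates is inverse-variance weighting, where the less noisy sensor gets more say. The function name `fuse_ranges` and the noise figures used in the example are assumptions made for illustration, not values from New Era or Intel.

```python
import math
from typing import Optional, Tuple


def fuse_ranges(
    lidar_m: Optional[float],
    lidar_sigma: float,
    depth_m: Optional[float],
    depth_sigma: float,
) -> Optional[Tuple[float, float]]:
    """Fuse two range readings (meters) by inverse-variance weighting.

    Returns (fused_range, fused_sigma), or None if neither sensor
    produced a reading. A missing reading is passed as None.
    """
    if lidar_m is None and depth_m is None:
        return None
    if lidar_m is None:
        return depth_m, depth_sigma  # fall back to the camera alone
    if depth_m is None:
        return lidar_m, lidar_sigma  # fall back to the LiDAR alone

    # Weight each reading by the inverse of its variance.
    w_lidar = 1.0 / lidar_sigma ** 2
    w_depth = 1.0 / depth_sigma ** 2
    total = w_lidar + w_depth

    fused = (w_lidar * lidar_m + w_depth * depth_m) / total
    fused_sigma = math.sqrt(1.0 / total)
    return fused, fused_sigma


# Hypothetical readings: LiDAR says 2.00 m (sigma 2 cm),
# depth camera says 2.10 m (sigma 6 cm).
print(fuse_ranges(2.00, 0.02, 2.10, 0.06))
```

The fused estimate sits closer to the more precise sensor, and its uncertainty is lower than either input. If one sensor drops out, the other is used on its own, which reflects the point about not depending on a single sensor.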

A quad-core Intel® Core™ i5-based Linux PC processes sensor data from the LiDAR array and RealSense camera, then applies the SLAM algorithms to these inputs. These algorithms map the physical space a robot moves through to within 5 centimeters. The software then overlays descriptors that identify features such as rooms, corridors, and objects.
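To make the idea of a 5-centimeter map with overlaid descriptors concrete, here is a minimal sketch of a grid map with labeled regions. It assumes the map is a simple occupancy grid, which the article does not confirm; the `GridMap` class and all of its method names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

RESOLUTION_M = 0.05  # assumed cell size: 5 cm, the accuracy quoted in the text


@dataclass
class GridMap:
    """Toy occupancy grid with named rectangular regions (hypothetical)."""

    width_m: float
    height_m: float
    resolution_m: float = RESOLUTION_M
    occupied: Set[Tuple[int, int]] = field(default_factory=set)
    # label -> (x0, y0, x1, y1) bounding box in meters
    labels: Dict[str, Tuple[float, float, float, float]] = field(default_factory=dict)

    def to_cell(self, x_m: float, y_m: float) -> Tuple[int, int]:
        # The small epsilon guards against float error at cell boundaries.
        col = math.floor(x_m / self.resolution_m + 1e-9)
        row = math.floor(y_m / self.resolution_m + 1e-9)
        return col, row

    def mark_occupied(self, x_m: float, y_m: float) -> None:
        self.occupied.add(self.to_cell(x_m, y_m))

    def is_occupied(self, x_m: float, y_m: float) -> bool:
        return self.to_cell(x_m, y_m) in self.occupied

    def add_label(self, name: str, x0: float, y0: float, x1: float, y1: float) -> None:
        self.labels[name] = (x0, y0, x1, y1)

    def label_at(self, x_m: float, y_m: float) -> Optional[str]:
        # Return the first region whose bounding box contains the point.
        for name, (x0, y0, x1, y1) in self.labels.items():
            if x0 <= x_m <= x1 and y0 <= y_m <= y1:
                return name
        return None


floor_map = GridMap(width_m=10.0, height_m=8.0)
floor_map.mark_occupied(1.23, 2.50)            # an obstacle seen by the sensors
floor_map.add_label("corridor", 0.0, 0.0, 10.0, 2.0)
print(floor_map.to_cell(1.23, 2.50), floor_map.label_at(5.0, 1.0))
```

Any two points that fall within the same 5 cm cell are treated as the same location, which is one way to read "5-centimeter accuracy." The labels play the role of the descriptors for rooms and corridors.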

The SLAM algorithms are extremely memory efficient, allowing thousands of maps to be stored on a robot’s hard drive at any given time. As a result, each robot requires only 4 GB of DDR4 memory.
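A back-of-the-envelope calculation shows why maps of this kind can be compact. The sketch below assumes a plain grid with one byte per 5 cm cell and a hypothetical 100 m by 100 m floor; New Era's actual map format and sizes are not given in the article.

```python
def map_size_bytes(width_m: float, height_m: float,
                   resolution_m: float = 0.05, bytes_per_cell: int = 1) -> int:
    """Approximate storage for an uncompressed grid map (assumed format)."""
    # round() avoids off-by-one cell counts caused by float division.
    cols = round(width_m / resolution_m)
    rows = round(height_m / resolution_m)
    return cols * rows * bytes_per_cell


size = map_size_bytes(100.0, 100.0)   # hypothetical 100 m x 100 m floor
print(size)                           # 2000 x 2000 cells at 1 byte each
print(size * 1000 / 1e9, "GB for 1,000 such maps")
```

Under these assumptions a single map is a few megabytes, so a large library of maps fits on disk while only the map in use needs to occupy RAM.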

Human Interaction with Facial Recognition and AI

The second compute subsystem runs all of the applications necessary for interaction with humans, including facial recognition, voice detection, chatbots, and a touchscreen UI. It is based on a Windows PC that leverages a quad-core Intel® Pentium® N4200 CPU and runs convolutional neural network (CNN) algorithms developed using the Intel® OpenVINO™ Toolkit (Video 2).

Video 2. Robots use the Intel® OpenVINO™ Toolkit algorithms to detect human faces and emotions. (Source: Omar Lam Demonstration)

OpenVINO helped New Era AI engineers optimize algorithms for execution on the Pentium processor, which contains an integrated Intel® HD Graphics 505 GPU. This delivers enough throughput for images captured by the RealSense camera to be processed in real time. It also opens up a range of critical facial recognition functions.

The OpenVINO-optimized algorithms not only help the robots detect humans but also estimate age, gender, and emotion. With this information, collected as anonymized metadata, robot operators can determine which demographics are most likely to interact with the robot, where, and for how long. In a retail or hospitality setting, for instance, these analytics can be used to maximize sales or improve customer service.
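As a rough illustration of how such metadata could be turned into "who, where, and for how long" figures, the sketch below groups anonymized interaction records and averages their dwell time. The `Interaction` record, its field names, and the sample values are all hypothetical; the article does not describe New Era's data schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Interaction:
    """One anonymized encounter: no image, name, or ID is stored."""

    age_band: str        # e.g. "25-34", as estimated by the model
    gender: str          # estimated, not self-reported
    emotion: str         # dominant estimated emotion
    location: str        # where the robot was at the time
    dwell_seconds: float


def summarize(interactions: Iterable[Interaction]) -> Dict[Tuple[str, str, str], dict]:
    """Count encounters and average dwell time per (location, age band, gender)."""
    counts: Dict[Tuple[str, str, str], int] = defaultdict(int)
    dwell: Dict[Tuple[str, str, str], float] = defaultdict(float)
    for it in interactions:
        key = (it.location, it.age_band, it.gender)
        counts[key] += 1
        dwell[key] += it.dwell_seconds
    return {
        key: {"count": counts[key], "mean_dwell_s": dwell[key] / counts[key]}
        for key in counts
    }


# Made-up sample records for illustration only.
sample = [
    Interaction("25-34", "F", "happy", "entrance", 40.0),
    Interaction("25-34", "F", "neutral", "entrance", 20.0),
    Interaction("45-54", "M", "neutral", "electronics", 90.0),
]
for key, stats in summarize(sample).items():
    print(key, stats)
```

Keeping only coarse bands and aggregate counts, rather than per-person records, is in line with the anonymized metadata the article describes.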

And thanks to local connectivity provided by the Windows PC, new algorithms, chatbots, and other software can be updated over time.

More Realistic Robots

The engineers at New Era AI Robotic continue integrating technologies that will make interacting with their robot platforms a more natural, human-like experience.

For instance, next-generation designs may leverage Intel® Movidius™ vision processing units (VPUs) and/or the Intel® Neural Compute Stick, in conjunction with more advanced OpenVINO algorithms. This technology stack could have significant implications for the platform, enabling simultaneous multi-person communications, localized natural language processing (NLP), and even improved image throughput and resolution for more granular mapping and navigation.

While intelligent robots are not yet capable of being intergalactic companions, they are leaps and bounds ahead of anything that was available just a few short years ago. They also offer a glimpse of the integrated human/robot society we can look forward to in the years and decades to come.