Meet Walker, an intelligent humanoid service robot. Walker can do a variety of tasks, from interacting with people to pouring drinks, transporting goods, wiping off surfaces, and writing notes. The robot can even climb stairs (Video 1).
Video 1. Developed by UBTECH Robotics, Inc., Walker is an intelligent service robot that leverages advanced CV technology.
While many engineering disciplines were needed to bring Walker to life, what makes it exceptional is computer vision. Object detection, facial recognition, target tracking, human pose estimation, and simultaneous localization and mapping (SLAM) algorithms are all essential to Walker’s abilities.
Custom Computer Vision Remains a Challenge
But just because advanced CV technology has become widely available through open-source projects like OpenFace and OpenCV doesn’t mean that it can be used right off the shelf. In fact, attempts to port CV software directly from one context to another while maintaining comparable performance rarely, if ever, succeed.
Take, for example, human pose estimation in a healthcare assistant versus the same function in a customer service representative. In the healthcare use case, the algorithm will likely be optimized to detect falls or incapacitated patients. In customer service, it probably emphasizes what can be inferred about a person’s emotions.
Both instances could even use similar types of neural networks, but the individual applications still dictate significant variations in the final AI architecture.
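To make the distinction concrete, consider what a healthcare application might add on top of a generic pose estimator. This is a minimal sketch under stated assumptions. It uses a hypothetical keypoint format: a dict of named (x, y) pixel coordinates, with y growing downward as in image coordinates. It also applies a simple heuristic: a person whose detected keypoints spread much wider than they are tall is probably lying down. The 1.2 threshold is an illustrative assumption, and a production fall detector would also consider motion over time. A customer service variant could consume the same keypoints and apply entirely different logic.

```python
from typing import Dict, Tuple

# Hypothetical keypoint format: name -> (x, y) in pixels, y grows downward.
Keypoints = Dict[str, Tuple[float, float]]

def bounding_box(kps: Keypoints) -> Tuple[float, float]:
    """Return (width, height) of the box enclosing all keypoints."""
    xs = [x for x, _ in kps.values()]
    ys = [y for _, y in kps.values()]
    return max(xs) - min(xs), max(ys) - min(ys)

def possibly_fallen(kps: Keypoints, ratio_threshold: float = 1.2) -> bool:
    """Flag a pose whose keypoints are much wider than tall (assumed threshold)."""
    width, height = bounding_box(kps)
    if height == 0:
        # Perfectly horizontal spread: treat any nonzero width as lying down.
        return width > 0
    return width / height > ratio_threshold

standing = {"head": (100, 40), "hip": (102, 160), "ankle": (98, 280)}
lying = {"head": (40, 300), "hip": (160, 305), "ankle": (280, 310)}
print(possibly_fallen(standing))  # False
print(possibly_fallen(lying))     # True
```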
And the above example doesn’t even consider the dizzying combinations of hardware that best suit a particular use case. These can include different connectivity interfaces and protocols, camera lenses and latencies, processor types and memory configurations, and so on. Nor does it account for the fact that the continued commoditization of hardware actually makes developing unique, specialized computer vision a value-added differentiator.
In short, innovative computer vision requires custom AI models. Unfortunately, this takes time. Usually, a lot of time.
“There are different software frameworks available, a variety of neural network models to build upon, and a multitude of hardware components and peripherals that can be used,” says Christian Lang, Senior Manager of Embedded Solutions at Avnet, Inc. “Designers need months just to evaluate the right hardware and software setup for an application.”
Accelerating Application Development
Still, all of the enabling technologies for advanced computer vision already exist. So the time spent evaluating cameras and computing hardware, selecting software development frameworks, and importing models from open-source repositories is really a technology integration challenge that delays the rapid prototyping of custom AI models.
“The best way to accelerate computer vision application development is to eliminate the time-consuming evaluation process by providing out-of-the-box, intelligent, real-time analytics at the edge, which can be up and running within hours,” Lang says.
Avnet has done this with its Visual Analytics at the Edge platform, which brings together all of the hardware and software components needed to rapidly prototype advanced CV models on an edge-based system.
The company created a proof of concept to demonstrate how its flexible hardware-software solution stack can be adapted to the requirements of industries ranging from retail and education to industrial automation and public safety. It consists of a complete vision camera setup, application software from Avnet subsidiary Softweb Solutions Inc., and rugged edge AI processor platforms, and it leverages Intel® Neural Compute Stick technology based on eight Intel® Movidius™ Myriad™ X VPUs.
But much of the remaining magic comes courtesy of the Intel® Distribution of OpenVINO™ toolkit, an AI model optimization environment that integrates dozens of computer vision software components behind a single, intuitive API. The toolkit accepts models trained in popular frameworks such as TensorFlow and Caffe, as well as models in the Open Neural Network Exchange (ONNX) format, and can deploy them to the Neural Compute Stick.
Once a model is brought into the OpenVINO environment, the toolkit passes it through a Model Optimizer, which converts it into an optimized intermediate representation, and then runs it with an Inference Engine, reducing the overall footprint and improving the performance of CV workloads. Perhaps just as important, the open-source toolkit automates this process not only for VPUs but also for CPUs, GPUs, and FPGAs.
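One common way to shrink a model's footprint, and one that matters for VPUs such as the Myriad X that favor half-precision arithmetic, is to store weights as 16-bit floats (FP16) rather than 32-bit floats (FP32). The sketch below illustrates only that idea using Python's standard `struct` module. It is not OpenVINO's Model Optimizer, which also performs graph-level optimizations. The 10,000-weight layer is a hypothetical stand-in for real model parameters.

```python
import random
import struct

def pack_weights(weights, fmt_char):
    """Serialize floats with a struct format char ('f' = FP32, 'e' = FP16)."""
    return struct.pack(f"<{len(weights)}{fmt_char}", *weights)

def max_roundtrip_error(weights):
    """Largest absolute error introduced by storing the weights as FP16."""
    restored = struct.unpack(f"<{len(weights)}e", pack_weights(weights, "e"))
    return max(abs(a - b) for a, b in zip(weights, restored))

random.seed(0)
# Hypothetical layer: 10,000 weights in the range [-1, 1].
weights = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

fp32_bytes = len(pack_weights(weights, "f"))
fp16_bytes = len(pack_weights(weights, "e"))
print(fp32_bytes, fp16_bytes)               # 40000 20000
print(max_roundtrip_error(weights) < 1e-3)  # True
```

Halving the storage per weight halves the memory needed for the parameters. For values in this range, the rounding error stays below about 2.5e-4, which many vision models tolerate.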
Because the same workflow targets all of these processors, computer vision engineers can begin customizing AI models almost immediately, without worrying about whether their go-to-market hardware configuration differs from their prototyping platform.
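Hardware portability often comes down to treating the target device as a configuration value rather than hard-coding it. This minimal sketch picks the first available device from a fixed preference order. The names `CPU`, `GPU`, and `MYRIAD` mirror OpenVINO device plugin names. However, `select_device` is a hypothetical helper, and the availability lists are hard-coded stand-ins for what an inference runtime would report at startup. The exact device strings a given OpenVINO version returns may differ.

```python
from typing import Iterable, Sequence

def select_device(preferred: Sequence[str], available: Iterable[str]) -> str:
    """Return the first preferred device that is actually present (hypothetical helper)."""
    available_set = set(available)
    for device in preferred:
        if device in available_set:
            return device
    raise RuntimeError(
        f"none of {list(preferred)} available; found {sorted(available_set)}"
    )

# One preference order shared by the prototype and the production build.
PREFERRED = ["MYRIAD", "GPU", "CPU"]

prototype = select_device(PREFERRED, ["CPU", "MYRIAD"])  # e.g., PC plus Neural Compute Stick
production = select_device(PREFERRED, ["CPU", "GPU"])   # e.g., a target without a VPU
print(prototype, production)  # MYRIAD GPU
```

The model-loading code downstream receives only the chosen device string, so moving from the prototyping platform to different production hardware changes a list, not the application.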
To date, the platform has been used for the rapid prototyping of custom anonymous behavior and pattern analysis models and real-time posture detection algorithms (Figure 1).
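Real-time posture detection typically runs a pose network on each video frame and then applies lightweight geometry to the resulting keypoints. The sketch below covers only that second step, and it is not the algorithm used on Avnet's platform, which is not described here. It assumes hypothetical shoulder and hip midpoints in pixel coordinates, with y growing downward. It measures how far the torso leans from vertical and uses a 20-degree tolerance chosen purely for illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # hypothetical (x, y) pixel coordinates, y grows downward

def torso_lean_degrees(shoulder_mid: Point, hip_mid: Point) -> float:
    """Angle between the hip-to-shoulder vector and vertical, in degrees (0 = upright)."""
    dx = shoulder_mid[0] - hip_mid[0]
    dy = hip_mid[1] - shoulder_mid[1]  # flip the sign so "up" is positive
    return math.degrees(math.atan2(abs(dx), dy))

def classify_posture(shoulder_mid: Point, hip_mid: Point, max_lean: float = 20.0) -> str:
    """Label one frame 'upright' or 'leaning' using an assumed tolerance."""
    if torso_lean_degrees(shoulder_mid, hip_mid) <= max_lean:
        return "upright"
    return "leaning"

print(classify_posture((200, 100), (200, 300)))  # upright
print(classify_posture((300, 127), (200, 300)))  # leaning (about 30 degrees)
```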
“The innovation is a combination of new technologies in a single platform that takes complex hardware and software down to a level that customers can bring up and test their applications within hours,” Lang explains. “The goal was to take our customers from the typical experience of learning how to do things to the much faster experience of ‘Aha! That’s the way it works!’”
What’s in a Pose?
Vision-first AIs are poised to upend virtually every industry, not to mention our everyday lives. Of course, everyone wants a part of this next tech revolution. Unfortunately, the learning curve that must be overcome to innovate in the world of advanced computer vision is steep.
The way for new market entrants to make up that ground is to avoid reinventing the wheel and to borrow time in the form of work that others have already completed. But the real prize will go to those who use that head start to create something custom and new.