How Deep Learning Makes Cities Safer

August 9, 2018

Erik Sherman

Video systems have become a cornerstone of citywide safety and security, but these systems are already producing more video than cities can use. Finding staff to monitor video feeds and search archives has been an ongoing challenge. And as more cameras are deployed, the problem is growing.

Artificial intelligence (AI) can help by making it easier to search and analyze video feeds, reducing the need for manual labor. But conventional AI is challenging to deploy, often imposing high costs and long timelines.

“To customize an AI algorithm for a new application or location might take an R&D team four to six months,” said Sean Lin, Product Manager at GeoVision Inc. “And the results can be underwhelming, with too many false alarms and other errors.” He went on to say that “what cities need is an easier way for operators to pinpoint what they are looking for in a critical video feed, instead of searching for a needle in a haystack.”

The arrival of deep-learning solutions is dramatically improving computer vision and video analytics. These systems are more powerful, easier to deploy, and available today.

With deep learning, different models can be trained to match the environmental characteristics of each camera location. The algorithm is effectively customized for each situation without rewriting any code.

Huge amounts of video data are a help, not a hindrance. Deep learning can continuously ingest data to adapt to new conditions and requirements.

Deep Learning Is Changing the Game

With deep learning, computer vision techniques such as facial recognition or motion detection have become more sophisticated, transforming surveillance and other video applications.

In a controlled environment, traditional algorithms work fine, but they are typically written for specific use cases. Detecting an object or person crossing a predetermined virtual line, for example, is fundamentally a simple yes-or-no algorithm. The challenge arises when this algorithm is deployed in a more complex scenario.
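To make the yes-or-no nature of a virtual-line check concrete, here is a minimal Python sketch. It is an illustration only, not GeoVision's implementation: it assumes an upstream tracker (not shown) already supplies an object's position in two consecutive frames, and the function names are hypothetical. Note that it tests against the infinite line through the two endpoints, not the bounded segment.

```python
from typing import Tuple

Point = Tuple[float, float]


def side_of_line(a: Point, b: Point, p: Point) -> float:
    """Signed test: > 0 if p is left of the line a->b, < 0 if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossing_direction(a: Point, b: Point, prev: Point, curr: Point) -> int:
    """Return +1 for a right-to-left crossing, -1 for left-to-right, 0 for none."""
    before = side_of_line(a, b, prev)
    after = side_of_line(a, b, curr)
    if before < 0 < after:
        return 1
    if after < 0 < before:
        return -1
    return 0


def crossed_line(a: Point, b: Point, prev: Point, curr: Point) -> bool:
    """The plain yes-or-no answer: did the object change sides between frames?"""
    return crossing_direction(a, b, prev, curr) != 0


# Hypothetical virtual line along the x-axis of the image.
line_start, line_end = (0.0, 0.0), (10.0, 0.0)
print(crossed_line(line_start, line_end, (5.0, -1.0), (5.0, 1.0)))
print(crossed_line(line_start, line_end, (5.0, 1.0), (6.0, 2.0)))
```

The check itself carries no notion of what crossed the line or why, which is one way to see why a rule this simple can struggle once the scene becomes busy.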

Lin provides clear examples: “When you take a traditional algorithm and apply it to different camera locations—some might be in the park, some might be in the street—these actual environments appear differently in the video feed. Traditional algorithms won’t be able to handle these subtleties.”

“On a busy street, motion detection or intruder alarms may get a lot of false alarms because people are constantly moving around. That’s the limit of a traditional algorithm,” he said.

Another common scenario is face recognition where police have identified a wanted person. “With deep learning, we can basically register this person’s face with just a single picture or video into the database. Then our software can automatically go through all the surveillance recording from a month ago, two months ago, and automatically find that person for the authorities,” said Lin.
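One common way to build this kind of archive search, sketched below in Python, is to compare numeric face embeddings by cosine similarity. This is an assumption for illustration and not a description of how GeoVision's software works: the embeddings are presumed to come from some face-recognition model that is not shown, and the function names, the 0.8 threshold, and the sample data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search_archive(
    reference: Sequence[float],
    archive: List[Tuple[str, Sequence[float]]],
    threshold: float = 0.8,  # hypothetical cut-off; a real system would tune this
) -> List[str]:
    """Return timestamps of archived faces whose embedding matches the reference."""
    return [
        timestamp
        for timestamp, embedding in archive
        if cosine_similarity(reference, embedding) >= threshold
    ]


# Toy two-dimensional embeddings; real models produce far longer vectors.
registered_face = [1.0, 0.0]
recordings = [
    ("2018-07-01T08:15", [0.9, 0.1]),
    ("2018-07-03T17:40", [0.0, 1.0]),
    ("2018-07-09T12:05", [2.0, 0.0]),
]
print(search_archive(registered_face, recordings))
```

Because only one reference vector is needed, a single registered picture is enough to scan any number of archived detections.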

He predicts that it will soon be possible to search with just a sketch instead of a photo. While recognition accuracy would go down, this would not be possible at all using traditional algorithms.

This is where the GeoVision Inc. Smart Video Management Solution (GV-VMS) comes in, taking the AI model further and allowing for more complicated and intensive analysis. GeoVision deep-learning algorithms can be trained for a multitude of conditions, including:

Counting people or objects moving in two directions
Detecting and recognizing faces for multiple applications
Masking faces for privacy when detected in the video
“Defogging” video taken in murky conditions so it can be seen clearly
Stitching video from multiple cameras into a single panoramic view
Stabilizing video in a vibrating environment
Counting crowds where occupancy may be restricted by code
Removing distortion created by wide-angle lenses
Intelligently searching for an event when an area has motion

An End-to-End Solution

Underlying GeoVision's unique deep-learning functionality is a comprehensive system comprising cameras, recording servers, and a video control center. It connects with GeoVision and third-party IP cameras through standard protocols as shown in Figure 1. This scale is made possible by Intel® processors that increase video-processing efficiency and deep-learning capability.

Figure 1: GeoVision Smart Video Management System

Based on Intel® x86 architecture, GV-VMS fully utilizes the Intel® Core™ processor. When combined with an implementation of the Intel® OpenVINO™ toolkit, performance of deep learning-driven video analytics is boosted eight to tenfold. This enables a greater capacity for simultaneous video processing without any additional requirements.

GeoVision cameras can run deep learning at the network’s edge. The cameras can send alerts when something is detected rather than transmitting all the video to a central location for analysis, reducing the delay before action can be taken.
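The edge pattern described here can be sketched in a few lines of Python. This is a hedged illustration rather than GeoVision's camera firmware: the `Detection` record, the `edge_alerts` function, the watched labels, and the 0.6 confidence threshold are all hypothetical, and the on-camera model that produces the detections is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Set, Union


@dataclass
class Detection:
    """One result from a (hypothetical) on-camera model for a single frame."""
    frame_index: int
    label: str
    confidence: float


def edge_alerts(
    detections: Iterable[Detection],
    watch_labels: Set[str],
    min_confidence: float = 0.6,  # hypothetical threshold
) -> Iterator[Dict[str, Union[int, str, float]]]:
    """Yield a small alert message only for confident detections of watched labels."""
    for det in detections:
        if det.label in watch_labels and det.confidence >= min_confidence:
            yield {
                "frame": det.frame_index,
                "label": det.label,
                "confidence": det.confidence,
            }


# Sample detections from three frames; only one should leave the camera.
frames = [
    Detection(0, "car", 0.95),
    Detection(1, "person", 0.40),
    Detection(2, "person", 0.91),
]
for alert in edge_alerts(frames, watch_labels={"person"}):
    print(alert)
```

Only the compact alert record crosses the network, while the frames that produced no alert stay on the camera.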

Most cities also have legacy video systems with cameras, gateways, and software. GeoVision application programming interfaces (APIs) and a software developer’s kit (SDK) allow connectivity with legacy hardware and software. The GeoVision Control Center provides unified cloud management software, integrating all IP cameras into an overall security and management system.

In one example, Vatican City has been using video surveillance systems for decades. Over the years, this resulted in a mix of cameras, gateways, and software tools from multiple vendors. Working with GeoVision, Vatican City strategically integrated old cameras and software into a central surveillance solution. Cameras in important government buildings, churches, chapels, and intersections are under central control. The GeoVision solution creates a unified system that monitors video feeds from 140 sites across Rome.

Smart and Scalable

The solution can scale to effectively any level of video use. A single implementation can manage up to 57,600 video streams. The system transfers video data to the unified cloud management system, which can monitor and control more than 1,000 GV-VMS systems. On the back end, big data storage is available in the customer's data center or in the cloud, using Intel processor-based servers.

The Smart Video Management Solution also integrates with other systems, like fire detection or access control, extending its overall functionality. Connected with access control, for example, the solution could use facial recognition to allow people into restricted areas of any sort, whether limited building or parking access. It can display data from other systems in a centralized window.

By combining deep learning and the ability to integrate with other hardware and software, cities can use solutions like those from GeoVision to improve video surveillance. Deep learning improves automated responses, integration improves operational efficiency, and scalability means that a city won’t outgrow the capabilities of the video system.

“When it comes to a city scenario, a traditional video surveillance solution covers all the basics. But once a project grows to city scale, in just one day you can have thousands of hours of recording. It takes too long and too many people to actually pinpoint something that you’re looking for. The GeoVision solution makes it easier for the operator to isolate exactly who or what they are looking for,” said Lin.