ATM skimming is a common way for criminals to capture debit and credit card information from unsuspecting bank customers. It has become such a problem that a recent report estimated losses from skimming at roughly $2 billion per year.
ATM skimming technology is effective because it is cheap and extremely difficult to spot. A skimming system usually consists of a card reader inserted into the original card receptacle and a hidden camera or other recording device to capture a user’s PIN (Figure 1).
In an attempt to prevent skimming, Italy’s Banca Monte dei Paschi di Siena turned to advanced technology. Initially, bank officials looked into monitoring an RGB camera feed present on each of its ATMs. In theory, this would allow them to identify criminals installing skimming devices and shut down the entire system to protect their customers from theft.
In practice, however, this approach presented challenges of its own. Because the RGB cameras were always on, they recorded sensitive customer financial information (such as PINs and card numbers) in addition to any nefarious activity.
Video feeds from the RGB cameras also had to be cleaned and filtered, which required additional edge computing that was not readily available. Finally, the bank had only 10 operators to monitor activity at more than 10,000 ATMs across the country.
An alternative was needed that could reliably detect installation of skimming devices, respect the privacy of bank customers, and reduce the load on operators.
The bank eventually partnered with SECO, which developed a novel security solution called ATMSense.
Computer Vision and the Anatomy of ATM Monitoring
ATMSense is an aftermarket intelligent vision system that can be deployed on existing ATMs, thereby minimizing infrastructure costs. The system is based on an Intel® RealSense™ depth camera, gesture recognition algorithms, and SECO’s SBC-A80-eNUC single-board computer that features a 1.6 GHz Intel® Pentium® N3710 processor and 4 GB of RAM (Figure 2).
The key differentiator between ATMSense and other ATM monitoring solutions is that the RealSense depth camera uses infrared technology – as opposed to a typical RGB video feed – to create depth maps that obscure any personally identifiable information (Figure 3). The cameras are also positioned such that they do not capture interactions with an ATM keypad interface, only the general actions and behaviors of ATM users.
The RealSense camera performs some initial processing on captured video feeds to remove unwanted noise, then passes the depth maps on to the SBC-A80-eNUC. There, further image processing occurs on the Pentium processor before gesture recognition algorithms are applied to the maps. In all, image processing and neural net prediction require just 40 ms, helping reduce power consumption.
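As a rough illustration of the kind of depth-map preprocessing described above, the following Python sketch clips a depth frame to a working range and scales it to [0, 1] before it would be handed to a model. The function names, the millimetre units, the 200 to 1200 mm range, and the treatment of zero as "no reading" are assumptions made for illustration, not details of SECO's implementation. The second helper converts the 40 ms per-frame figure into the frame rate it implies.

```python
from typing import List

# A depth frame as rows of integer distances in millimetres.
# Assumption: a value of 0 (or less) means the sensor had no reading for that pixel.
DepthFrame = List[List[int]]


def normalize_depth(frame: DepthFrame, near_mm: int = 200, far_mm: int = 1200) -> List[List[float]]:
    """Clip each pixel to [near_mm, far_mm] and scale it to [0.0, 1.0].

    Pixels with no reading are mapped to 1.0, i.e. treated as "far away".
    """
    if far_mm <= near_mm:
        raise ValueError("far_mm must be greater than near_mm")
    span = far_mm - near_mm
    normalized = []
    for row in frame:
        out_row = []
        for depth in row:
            if depth <= 0:
                out_row.append(1.0)  # missing reading, treated as background
            else:
                clipped = min(max(depth, near_mm), far_mm)
                out_row.append((clipped - near_mm) / span)
        normalized.append(out_row)
    return normalized


def max_frame_rate(latency_ms: float) -> float:
    """Upper bound on frames per second when each frame takes latency_ms to process."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms
```

With a 40 ms budget per frame, `max_frame_rate(40)` gives a ceiling of 25 frames per second if frames are processed one at a time.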
If the neural network algorithms identify suspicious activity such as the installation of a skimming device, the ATMSense system activates the RealSense camera’s RGB sensor and issues automated alerts to operators at the bank’s monitoring facility over an LTE or Ethernet connection. This allows operators to view interactions with the ATM in real time and determine what action should be taken.
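The escalation step can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the `AlertController` class, the label strings, the confidence threshold, and the rule of requiring several consecutive suspicious predictions before escalating are all hypothetical and are not taken from SECO's design. The sketch only records that the RGB sensor should be enabled and that an alert should be sent; actual camera control and LTE or Ethernet transport are out of scope.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlertController:
    """Hypothetical escalation logic: depth-only until activity looks suspicious."""

    threshold: float = 0.9      # assumed minimum confidence to count a prediction
    required_hits: int = 3      # assumed number of consecutive suspicious predictions
    rgb_enabled: bool = False   # RGB sensor stays off during normal operation
    alerts: List[str] = field(default_factory=list)
    _streak: int = 0

    def update(self, label: str, confidence: float) -> bool:
        """Feed one prediction; return True if this call triggered an escalation."""
        suspicious = label != "normal" and confidence >= self.threshold
        self._streak = self._streak + 1 if suspicious else 0
        if self._streak >= self.required_hits and not self.rgb_enabled:
            self.rgb_enabled = True  # operators can now view the live RGB feed
            self.alerts.append(f"suspicious activity: {label}")
            return True
        return False
```

Requiring a short run of consistent predictions is one common way to avoid alerting operators on a single misclassified frame sequence; whether ATMSense does this is not stated in the source.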
The camera unit also includes speakers, so ATM users who are in trouble can be told that the police are on their way to help.
Neural Network Algorithms and Gesture Recognition
The “secret sauce” of the ATMSense solution is its gesture recognition, which analyzes sequences of frames over time. The algorithm, called “Mona,” contains three convolution layers, two long short-term memory (LSTM) layers, and three additional neural network layers.
SECO engineers were able to train the algorithm using just 25,000 image frames recorded during fake withdrawals made by university students on an ATM in a lab.
“We were surprised by the small range of data needed to develop the model,” said Antonio Rizzo, Scientific Coordinator of SECO’s UDOO project. “Over 200 university students carried out different kinds of malicious activities, including ATM skimming and ATM bombing. The prediction accuracy was over 93 percent.”
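Recognizing a gesture from sequences of frames usually means grouping consecutive frames into fixed-length windows before passing them to a recurrent model. The Python sketch below shows one generic way to do that. The `sliding_windows` helper, the window length, and the stride are assumptions for illustration; the source does not say how the Mona algorithm segments its input.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Sequence[T], length: int, stride: int = 1) -> Iterator[List[T]]:
    """Yield overlapping windows of `length` consecutive frames, `stride` frames apart.

    Trailing frames that do not fill a complete window are dropped.
    """
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    for start in range(0, len(frames) - length + 1, stride):
        yield list(frames[start:start + length])
```

For example, `list(sliding_windows([0, 1, 2, 3, 4], length=3, stride=2))` produces the two windows `[0, 1, 2]` and `[2, 3, 4]`. A smaller stride gives the model more overlapping views of the same gesture at the cost of more inference calls.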
Once the algorithm had been refined, a production-ready version of the ATMSense system was deployed on a trial basis at the main branch of the Monte dei Paschi di Siena bank. Integration required only mounting the SBC and RealSense camera behind metal plates on existing ATMs, which helped lower the overall solution cost.
Over the course of four weeks in 2019, the system recorded more than 10,000 ATM transactions, including another 200 faux skimming device installations, again performed by university students. This resulted in a large gesture-recognition data set that SECO, with permission from Monte dei Paschi di Siena, is considering opening up to the engineering community to help advance gesture recognition research and development.
Cybersecurity and the ATM Arms Race
Technology is more ubiquitous, affordable, and accessible than ever before. Unfortunately, this has become a universal truth for both the good guys and criminals. The main difference is that security professionals have to protect their entire infrastructure all of the time, but bad actors need to find only one exploitable vulnerability.
Given the low cost and automated nature of security systems like ATMSense, bank officials and their customers may once again be able to swipe their ATM cards without thinking twice.
Monte dei Paschi di Siena and the Italian government are currently analyzing findings from the four-week trial in 2019 with the hope of being able to deploy the SECO solution at ATMs across the country.