Companies seeking to transform their businesses by deploying large Internet of Things (IoT) sensor networks face challenges in getting Industrial IoT applications into production. Intel® IoT Gateway Technology has made it easy to connect these sensors to the cloud, but many organizations struggle to use data from sensors and other streaming sources effectively in their business operations. A McKinsey & Company report suggests that better use of sensor-based data could deliver productivity gains worth up to $11.1 trillion per year by 2025, yet in many industries as little as one percent of sensor-based data is actually analyzed. As an industry, we clearly have a long way to go.
One key to unlocking this productivity is improving the handling of “fast data.” Fast data differs from other data in that it is generated in high volume and its value is time-sensitive. Examples include clickstream data, financial ticker data, aggregated logs, and sensor data from real-time processes. These events can arrive thousands to tens of thousands of times per second. If all the system does is dump fast data into a Hadoop Distributed File System (HDFS), an analytic relational database management system (RDBMS), or flat files, the opportunity to act immediately or deliver an alert is lost.
Some groundbreaking solutions to this dilemma come from Objectivity, a Silicon Valley company serving Global 1000 companies and leading government agencies. Its solutions combine data from multiple sources in real time to support advanced analytics in law enforcement, cyber intelligence, geoscience, telecommunications, energy, logistics, and other industries implementing IoT applications.
To give developers an even faster ramp-up on fast data IoT solutions, Objectivity announced in late 2015 its support for the open source Trusted Analytics Platform (TAP), which Intel introduced in 2015 as part of the Intel® IoT Platform. In this post, we first look at Objectivity’s solution, ThingSpan*, and then at the significance of its support for TAP.
A Data Fusion Platform for the IoT
ThingSpan is a purpose-built information fusion platform designed to correlate sensor-driven fast data and enterprise big data in real time from multiple sources to drive advanced analytics and insight. It is a massively scalable distributed platform that combines the power of object data modeling technology with the high-performance, parallel processing of Hadoop and Apache Spark to deliver a faster, more effective way of supporting mission-critical applications at big data scale.
While most fusion applications running on big data platforms rely on batch processing, which can take hours to days, ThingSpan-based fusion applications can collect and interpret data in real time, analyzing new information against an organization’s existing data. ThingSpan is designed to ingest data at high speed from a wide variety of structured and unstructured sources and to analyze the relationships between them within the data store, uncovering value from time-sensitive data.
ThingSpan uses a hierarchically structured object data model to enable organizations to discover new patterns and associations in their big data archives and in streaming, real-time data from IoT sensors and devices (Figure 1). As new data is collected, it is immediately analyzed against existing queries, so there is no delay in taking necessary actions or gaining valuable insights.
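The idea of evaluating each new reading against queries that are already registered, instead of waiting for a batch job, can be sketched in plain Python. This is a generic illustration of standing queries, not ThingSpan’s API: the `StandingQueryEngine` class, the `spike` and `over_limit` rules, the 1.5× threshold, and the in-memory history are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    value: float


# A standing query sees the new reading plus that sensor's prior values.
Predicate = Callable[[Reading, list[float]], bool]


class StandingQueryEngine:
    """Hypothetical engine: checks every reading against registered queries at ingest time."""

    def __init__(self) -> None:
        self._queries: dict[str, Predicate] = {}
        # In-memory stand-in for the archive of existing data, keyed by sensor.
        self._history: dict[str, list[float]] = defaultdict(list)

    def register(self, name: str, predicate: Predicate) -> None:
        self._queries[name] = predicate

    def ingest(self, reading: Reading) -> list[str]:
        """Evaluate the reading against all queries, then store it; return the queries it fired."""
        prior = self._history[reading.sensor_id]
        fired = [name for name, pred in self._queries.items() if pred(reading, prior)]
        prior.append(reading.value)
        return fired


def spike(reading: Reading, prior: list[float]) -> bool:
    # Hypothetical rule: fire when a value exceeds 1.5x the sensor's historical mean.
    return len(prior) >= 3 and reading.value > 1.5 * mean(prior)


def over_limit(reading: Reading, prior: list[float]) -> bool:
    # Hypothetical rule: a fixed absolute limit that needs no history.
    return reading.value > 100.0


engine = StandingQueryEngine()
engine.register("spike", spike)
engine.register("over_limit", over_limit)

for v in [20.0, 22.0, 21.0]:
    engine.ingest(Reading("pump-7", v))
print(engine.ingest(Reading("pump-7", 40.0)))  # ['spike']
```

Because the check runs inside `ingest`, an alert is available as soon as the reading arrives; a batch design would only discover the spike on its next scheduled run.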
Figure 1. The ThingSpan architecture enables organizations to discover new patterns and associations in their big data archives and streaming, real-time data.
ThingSpan organizes data about people, locations, events, and devices into real-world objects. This allows the associations between them to be persisted, eliminating the need to repeatedly join tables at query time as traditional relational databases require, a process that is typically expensive, time-consuming, and inefficient. Combined with native support for major open source projects such as HDFS, YARN, Spark, and Kafka, ThingSpan gives Industrial IoT leaders what they need to gain real-time insight into their data streams and achieve a return on their IoT investment.
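The difference between resolving associations with joins and persisting them as references can be illustrated with a small in-memory Python analogy. This is not ThingSpan’s storage format or API; the tables, classes, and function names below are hypothetical and only contrast the two access patterns.

```python
from dataclasses import dataclass, field

# --- Relational style: associations are foreign-key columns resolved by joins at query time ---
LOCATIONS = [{"loc_id": 1, "name": "Plant A"}, {"loc_id": 2, "name": "Plant B"}]
DEVICES = [{"dev_id": "d1", "loc_id": 1}, {"dev_id": "d2", "loc_id": 2}]
EVENTS = [
    {"dev_id": "d1", "kind": "overheat"},
    {"dev_id": "d2", "kind": "vibration"},
    {"dev_id": "d1", "kind": "restart"},
]


def events_at_location_relational(name: str) -> list[str]:
    # Every query repeats the locations -> devices -> events join by matching keys.
    loc_ids = {row["loc_id"] for row in LOCATIONS if row["name"] == name}
    dev_ids = {row["dev_id"] for row in DEVICES if row["loc_id"] in loc_ids}
    return [row["kind"] for row in EVENTS if row["dev_id"] in dev_ids]


# --- Object style (hypothetical classes): associations are stored once as direct references ---
@dataclass
class Event:
    kind: str


@dataclass
class Device:
    dev_id: str
    events: list[Event] = field(default_factory=list)


@dataclass
class Location:
    name: str
    devices: list[Device] = field(default_factory=list)


def events_at_location_object(location: Location) -> list[str]:
    # Follow the stored references; no key matching happens at query time.
    return [event.kind for device in location.devices for event in device.events]


plant_a = Location("Plant A", [Device("d1", [Event("overheat"), Event("restart")])])
plant_b = Location("Plant B", [Device("d2", [Event("vibration")])])

print(events_at_location_relational("Plant A"))  # ['overheat', 'restart']
print(events_at_location_object(plant_a))        # ['overheat', 'restart']
```

Both functions return the same answer; the object version simply pays the cost of linking records when the data is written rather than on every read.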
ThingSpan’s components include:
- ThingSpan for HDFS, which adapts Hadoop's HDFS environment for simpler, high-speed data processing and analysis
- ThingSpan for Apache Spark, which provides Spark adapters that let users manage Spark DataFrames and make ThingSpan-collected data available to Spark components such as Spark SQL and MLlib (Figure 2)
- ThingSpan Metadata Store, which lets users predefine metadata schemas that describe the relationships between data, so that these relationships are stored in memory
- ThingSpan REST API, which provides a simple interface for defining and managing queries and transforming data
Figure 2. ThingSpan’s advanced storage layer is fully integrated into Spark SQL to optimize movement of data in and out of the processing layer.
Accelerating Time to Production with TAP
With the announcement of TAP, Objectivity saw an opportunity to help developers and ThingSpan customers accelerate time to production for Industrial IoT applications. TAP is an open source project developed by Intel and designed to make it easier and faster for data scientists to deploy big data analytics. Pushing the fusion workflow into TAP enables a dramatic improvement in execution performance when enriching big data with fast, streaming data.
TAP makes it easier for developers, data scientists, and system operators to collaborate by providing a shared, flexible environment for advanced analytics in public and private clouds. Its extensible environment combines many open source components into a single, integrated platform that exposes their capabilities as easy-to-integrate tools and services, so teams do not have to recreate existing features (Figure 3). This allows data scientists and developers to focus on their areas of expertise rather than getting mired in a complex integration project (Figure 4).
Figure 3. TAP’s extensible environment combines open-source components into a single, integrated platform in which capabilities are exposed as easy-to-integrate tools and services.
Data scientists get extensible tools, scalable algorithms, and powerful engines to train and deploy predictive models. Developers get consistent APIs, services, and runtimes to quickly integrate these models into applications. System operators get an integrated stack that they can easily provision in a cloud infrastructure.
Figure 4. TAP helps developers, data scientists, and system operators do their jobs rather than get mired in complex system integration.
Faster, Easier Data Fusion for Real-Time Insight
ThingSpan’s object-oriented approach to data fusion and its support for TAP provide an excellent way to make sure time-sensitive information is translated into relevant insight, and action, for greater business value. For more help, you can find a repository of innovative IoT software and hardware solutions in the Intel® Internet of Things Solutions Alliance’s Solution Directory.