Towards a Trustworthy Pervasive Sensing Substrate with Provenance-based Sensor Data Analytics

Rich high-frequency multi-modal sensor data streams, continually captured by mobile, embedded and human sensors and processed by machine learning algorithms, are revolutionizing a range of scientific, engineering, and humanities disciplines. Innovative applications in domains such as precision medicine, energy and water management, and smart cities seek to provide new insights and trigger just-in-time interventions. Software tools and cloud services tailored to collection, transport, storage, processing, and visualization of sensory data are available.

Clearly, there has been considerable progress towards the vision of a multi-tenant pervasive substrate providing sensing as a service for applications that need awareness of the state of the natural, engineered, and social world around us. Yet a significant challenge remains: the trust that consumers and producers of sensory data can place in this emerging pervasive sensing substrate. With diverse sensors deployed out in the wild, and sensory information traversing multiple entities along the data-to-decision pathway, decision makers who use sensory data face uncertain and variable data quality, and lack visibility into the contextual information that would help explain the data and its quality. Likewise, potential data contributors with privacy concerns face uncertainty about how their sensor data is managed downstream.

A key to addressing both of these problems is to have metadata accompany the sensor measurement values, so that on the one hand downstream users gain visibility into the quality and provenance of the data, and on the other hand upstream users can exercise control over how the data is handled downstream. The cyberinfrastructure underlying the pervasive sensing substrate must therefore provide run-time support for efficiently capturing, representing, propagating, querying, and reasoning about metadata relating to the quality, provenance, and usage constraints associated with the sensor measurements. Furthermore, the sensor processing software must be designed so that it also derives the metadata associated with the output values it produces, taking into account not only the input values but also the input metadata.
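As an illustration of the last requirement, here is a minimal Python sketch in which each measurement carries quality, provenance, and usage-policy metadata, and a processing operator derives the output metadata from both the input values and the input metadata. The `Sample` record and the `average` operator are hypothetical and are not part of mProv or MetroInsight; the specific rules (quality-weighted mean, mean input quality as output quality, ordered union of lineages, intersection of permitted uses) are illustrative assumptions, not the projects' method.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Sample:
    """A measurement value plus the metadata that travels with it (hypothetical schema)."""
    value: float
    quality: float                 # assumed confidence score in [0, 1]
    provenance: Tuple[str, ...]    # ids of the sensors/operators that produced this value
    allowed_uses: FrozenSet[str]   # usage policy set by the upstream data contributor


def average(samples: List[Sample], op_id: str = "average") -> Sample:
    """Hypothetical operator that derives output metadata from input values AND input metadata."""
    if not samples:
        raise ValueError("average() needs at least one sample")

    # Value: quality-weighted mean, so low-quality inputs contribute less.
    total_quality = sum(s.quality for s in samples)
    if total_quality > 0:
        value = sum(s.value * s.quality for s in samples) / total_quality
    else:
        value = sum(s.value for s in samples) / len(samples)

    # Quality: assumed rule, the mean of the input quality scores.
    quality = total_quality / len(samples)

    # Provenance: ordered, de-duplicated union of input lineages, then this operator.
    lineage: List[str] = []
    for s in samples:
        for step in s.provenance:
            if step not in lineage:
                lineage.append(step)
    lineage.append(op_id)

    # Usage policy: the output may only be used in ways that every input permits.
    allowed = frozenset.intersection(*(s.allowed_uses for s in samples))

    return Sample(value, quality, tuple(lineage), allowed)


if __name__ == "__main__":
    a = Sample(70.0, 1.0, ("hr_sensor_1",), frozenset({"research", "clinical"}))
    b = Sample(80.0, 0.5, ("hr_sensor_2",), frozenset({"research"}))
    out = average([a, b])
    print(round(out.value, 2), out.quality, out.provenance, sorted(out.allowed_uses))
    # prints: 73.33 0.75 ('hr_sensor_1', 'hr_sensor_2', 'average') ['research']
```

A downstream consumer could then check, for example, `"clinical" in out.allowed_uses` and compare `out.quality` against its own threshold before acting on the value, while the upstream contributor's restriction on the second sensor is carried through to the derived result.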

The talk will present ideas towards architecting a sensor cyberinfrastructure that incorporates a metadata framework with the aforementioned characteristics. Across two recently funded NSF projects, mProv (http://mprov.md2k.org) and MetroInsight (http://metroinsight.io), we are working towards developing such sensor cyberinfrastructures, targeting the mHealth and urban sensing application domains, respectively. In these systems, the multimodal, high-frequency, real-time sensor data streams would carry not only sensor measurement values but also metadata relating to quality, provenance, and usage policy, so that knowledge discovery and decision making can be done robustly and responsibly.

Keynote speaker: Mani Srivastava

Mani Srivastava is on the faculty at UCLA where he is associated with the EE Department with a joint appointment in the CS Department. His research is broadly in the area of networked human-cyber-physical systems, and spans problems across the entire spectrum of applications, architectures, algorithms, and technologies. His current interests include issues of energy efficiency, privacy and security, data quality, and variability in the context of systems and applications for mHealth and sustainable buildings. He is a Fellow of the IEEE.

Important Dates

Paper submission
November 11, 2016 (extended to December 6, 2016)

Acceptance notification
December 23, 2016

Camera-Ready due
January 13, 2017

Author registration due
January 13, 2017

Workshop date
March 17, 2017