Specification of data properties to identify anomalies in scientific sensor data

Irbis J Gallegos, University of Texas at El Paso

Abstract

Environmental scientists use advanced sensor technology such as meteorological towers, wireless sensor networks, and robotic trams equipped with sensors to collect data at remote research sites. Because the amount of environmental sensor data acquired by such instruments is increasing, the ability to evaluate the accuracy of the data at collection time and to check that the instrumentation is operating correctly becomes critical to avoid losing valuable time and information. The goal of the research is to define a solution, based on software-engineering techniques, to support the scientist's ability to specify data properties that can identify anomalies in scientific sensor data collected by instruments in remote locations.

The research effort included deriving a data property categorization from the findings of a literature survey of 15 projects that collected environmental data from sensors and from a case study conducted in the Arctic. More than 500 published data properties were manually extracted and analyzed from the surveyed projects and from the Arctic case study, in which scientists were collecting hyperspectral data using robotic tram systems. The data property categorization catalogs recurrent data patterns that have been used by scientists. The Specification and Pattern System (SPS) from the software-engineering community was used as a model to develop a system, called the Data Specification and Pattern System (D-SPS), to define patterns and scopes of data properties based on the data property categorization. D-SPS provides the foundation for the Data Property Specification (DaProS) tool, which can assist scientists in the specification of sensor data properties.

A series of experiments was conducted in collaboration with experts working with eddy covariance (EC) data from the Jornada Basin Experimental Range to determine whether the approach is effective for specifying data properties and identifying anomalies in sensor data. A complementary sensor data verification tool was developed to verify the expert-specified data properties over the EC data. The approach successfully identified anomalies and distinguished those due to environmental variability events from those due to equipment malfunction. This work also identified key factors that influence the effectiveness of the data anomaly detection process.
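
As a rough illustration of what a data property built from a scope and a pattern might look like, the following Python sketch checks a simple range property over a list of readings. It is not taken from D-SPS or DaProS: the names `Reading`, `DataProperty`, and `find_anomalies`, the temperature bounds, and the sample values are all hypothetical, and the check only loosely follows the universality-style pattern known from SPS.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Reading:
    timestamp: int   # seconds since the start of the record (hypothetical unit)
    value: float     # sensor value, e.g. air temperature in degrees Celsius


@dataclass
class DataProperty:
    name: str
    in_scope: Callable[[Reading], bool]  # scope: which readings the property covers
    holds: Callable[[Reading], bool]     # pattern body: what must be true of each one


def find_anomalies(prop: DataProperty, readings: List[Reading]) -> List[Reading]:
    """Return the in-scope readings that violate the property."""
    return [r for r in readings if prop.in_scope(r) and not prop.holds(r)]


# Hypothetical property: temperature stays within plausible bounds over the whole record.
temperature_range = DataProperty(
    name="air temperature within -40..50 deg C",
    in_scope=lambda r: True,                   # global scope: every reading is covered
    holds=lambda r: -40.0 <= r.value <= 50.0,  # assumed bounds, for illustration only
)

# Made-up sample data with one obviously implausible value.
readings = [Reading(0, 21.5), Reading(60, 22.0), Reading(120, 999.0), Reading(180, 21.8)]
anomalies = find_anomalies(temperature_range, readings)
print([r.timestamp for r in anomalies])  # prints [120]
```

Restricting `in_scope` to a time window, for example `lambda r: r.timestamp < 100`, would narrow the scope so that the reading at 120 seconds is no longer checked. A flagged reading only says that a property was violated; deciding whether the cause is an environmental event or an equipment fault still requires expert knowledge.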

Subject Area

Environmental Sciences|Computer Science

Recommended Citation

Gallegos, Irbis J, "Specification of data properties to identify anomalies in scientific sensor data" (2011). ETD Collection for University of Texas, El Paso. AAI3490104.
http://digitalcommons.utep.edu/dissertations/AAI3490104
