Eliminating duplicates under interval and fuzzy uncertainty: An asymptotically optimal algorithm and its geospatial applications
Geospatial databases generally consist of measurements related to points (or pixels, in the case of raster data), lines, and polygons. In recent years, the size and complexity of these databases have increased significantly, and they often contain duplicate records, i.e., two or more nearby records that represent the same measurement result. In this thesis, we address the problem of detecting duplicates in a database consisting of point measurements. As a test case, we use a database of measurements of anomalies in the Earth's gravity field that we have compiled. We describe a natural duplicate-deletion algorithm and show that, in the worst case, it requires quadratic time; we then propose a new asymptotically optimal O(n · log(n)) algorithm. Both algorithms have been successfully applied to gravity databases, and we believe they will prove useful for many other types of spatial data.*

*This dissertation is a compound document (it contains both a paper copy and a CD as part of the dissertation).
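To make the contrast concrete, the following sketch shows the two styles of duplicate elimination the abstract alludes to. This is an illustrative reconstruction, not the thesis's actual algorithm: `eps` is a hypothetical closeness threshold standing in for the interval/fuzzy uncertainty of each measurement, and the "fast" variant uses a sort-then-sweep strategy that compares each point only against nearby kept points along the sorted axis (near-linearithmic on typical data, though its worst case is still quadratic for degenerate inputs).

```python
import math

def eliminate_duplicates_naive(points, eps):
    """All-pairs scan: O(n^2) comparisons in every case."""
    kept = []
    for p in points:
        if all(math.dist(p, q) > eps for q in kept):
            kept.append(p)
    return kept

def eliminate_duplicates_sweep(points, eps):
    """Sort by x, then compare each point only against kept points
    whose x-coordinate is within eps (a sweep over the sorted order)."""
    kept = []
    for p in sorted(points):  # O(n log n) sort dominates on typical data
        is_dup = False
        for q in reversed(kept):  # walk back while x-distance <= eps
            if p[0] - q[0] > eps:
                break
            if math.dist(p, q) <= eps:
                is_dup = True
                break
        if not is_dup:
            kept.append(p)
    return kept

# Two nearly coincident gravity stations plus one distinct point:
pts = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)]
print(eliminate_duplicates_sweep(pts, eps=0.1))  # the pair collapses to one record
```

A guaranteed O(n · log(n)) bound, as claimed for the thesis's optimal algorithm, would require a different structure (e.g., grid bucketing or a plane sweep with a balanced search tree); the sweep above is only a simple approximation of that idea.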
Torres, Roberto, "Eliminating duplicates under interval and fuzzy uncertainty: An asymptotically optimal algorithm and its geospatial applications" (2003). ETD Collection for University of Texas, El Paso. AAIEP10611.