Technical Report: UTEP-CS-07-25

Published in: Marek Reformat and Michael R. Berthold (eds.), Proceedings of the 26th International Conference of the North American Fuzzy Information Processing Society NAFIPS'2007, San Diego, California, June 24-27, 2007, pp. 554-559.


Geospatial databases generally consist of measurements related to points (or pixels in the case of raster data), lines, and polygons. In recent years, the size and complexity of these databases have increased significantly and they often contain duplicate records, i.e., two or more close records representing the same measurement result. In this paper, we address the problem of detecting duplicates in a database consisting of point measurements. As a test case, we use a database of measurements of anomalies in the Earth's gravity field that we have compiled.

In our previous papers, we proposed a new fast (O(n log n)) duplicate-deletion algorithm for the case when closeness of two points (x1,y1) and (x2,y2) is described as closeness of both coordinates, i.e., x1 is close to x2 and y1 is close to y2. In this paper, we extend this algorithm to the case when closeness is described by an arbitrary metric.
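
To illustrate the two notions of closeness, here is a minimal Python sketch. It is not the algorithm from the paper: it substitutes a simple grid-bucketing scheme, and it assumes a single threshold `eps` and uses the Euclidean distance as one example of a metric. All function names and the threshold are hypothetical, chosen for illustration only.

```python
import math
from collections import defaultdict


def close_coordinatewise(p, q, eps):
    """Close if BOTH coordinates differ by at most eps (assumed threshold)."""
    return abs(p[0] - q[0]) <= eps and abs(p[1] - q[1]) <= eps


def close_euclidean(p, q, eps):
    """Close if the Euclidean distance is at most eps (one example metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= eps


def find_duplicate_pairs(points, eps, is_close=close_coordinatewise):
    """Return sorted index pairs (i, j), i < j, of points judged close.

    Illustrative grid bucketing, not the paper's method: points are placed
    in square cells of side eps, so any two points whose coordinates each
    differ by at most eps lie in the same or in adjacent cells.
    """
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(i)

    pairs = set()
    for (cx, cy), members in grid.items():
        # Compare each cell only with itself and its 8 neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and is_close(points[i], points[j], eps):
                            pairs.add((i, j))
    return sorted(pairs)


# Hypothetical sample data, not taken from the gravity database.
points = [(0.0, 0.0), (0.05, 0.05), (1.0, 1.0), (0.08, -0.08)]
coordinatewise_pairs = find_duplicate_pairs(points, 0.1)
euclidean_pairs = find_duplicate_pairs(points, 0.1, close_euclidean)
```

The grid shortcut is valid here only because a Euclidean distance of at most `eps` implies that each coordinate differs by at most `eps`; a metric without such a bound would need a different candidate search. On the sample data, the two closeness tests disagree on the pair of points (0.0, 0.0) and (0.08, -0.08), which is close coordinate-wise but not under the Euclidean metric.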

Both algorithms have been successfully applied to gravity databases.