Publication Date

2-2006

Comments

UTEP-CS-05-30a.

Published in the Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems IPMU'06, Paris, France, July 2-7, 2006, pp. 802-809.

Abstract

In many application areas, it is important to detect outliers. The traditional engineering approach to outlier detection starts with some "normal" values x1,...,xn, computes the sample average E and the sample standard deviation sigma, and then marks a value x as an outlier if x lies outside the k0-sigma interval [E-k0*sigma,E+k0*sigma] (for some pre-selected parameter k0). In real life, we often know only interval ranges [xi-,xi+] for the normal values x1,...,xn. In this case, we only have intervals of possible values for the bounds L=E-k0*sigma and U=E+k0*sigma. We can therefore identify guaranteed outliers as values that are outside all possible k0-sigma intervals, i.e., values outside the interval [L-,U+], where L- is the smallest possible value of L and U+ is the largest possible value of U. In general, the problem of computing L- and U+ is NP-hard; a polynomial-time algorithm is known for the case when the measurements are sufficiently accurate, i.e., when the "narrowed" intervals do not intersect each other. In this paper, we use constraint satisfaction to show that we can efficiently compute L- and U+ under a weaker (and more general) condition: that none of the narrowed intervals is a proper subinterval of another narrowed interval.
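The quantities named in the abstract can be illustrated with a small Python sketch. This is not the algorithm from the paper: it is a brute-force check that tries every combination of interval endpoints, so it takes 2^n steps and is only usable for very small n. It relies on the fact that U=E+k0*sigma is a convex function of x1,...,xn and L=E-k0*sigma is a concave one, so the largest U and the smallest L over the box of intervals are attained at endpoint combinations. The function names are hypothetical, and the sketch assumes the population form of the standard deviation (division by n); the abstract does not say which form is used.

```python
from itertools import product
from statistics import fmean, pstdev


def k_sigma_interval(values, k0):
    """Return (L, U) = (E - k0*sigma, E + k0*sigma) for exact values.

    Assumption: sigma is the population standard deviation (divide by n).
    """
    e = fmean(values)
    sigma = pstdev(values)
    return e - k0 * sigma, e + k0 * sigma


def outer_bounds_bruteforce(intervals, k0):
    """Return (L-, U+) by enumerating all 2^n endpoint combinations.

    `intervals` is a sequence of (lower, upper) pairs. Exponential time:
    an illustration only, not the efficient method from the paper.
    """
    l_minus = float("inf")
    u_plus = float("-inf")
    for vertex in product(*intervals):
        lo, up = k_sigma_interval(vertex, k0)
        l_minus = min(l_minus, lo)
        u_plus = max(u_plus, up)
    return l_minus, u_plus


def is_guaranteed_outlier(x, intervals, k0):
    """True if x lies outside every possible k0-sigma interval."""
    l_minus, u_plus = outer_bounds_bruteforce(intervals, k0)
    return x < l_minus or x > u_plus
```

For example, with the two intervals [0,1] and [2,3] and k0=1, the four endpoint combinations give k0-sigma intervals [0,2], [0,3], [1,2], and [1,3], so L-=0 and U+=3; a value such as 3.5 is a guaranteed outlier, while 2.5 is not.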

tr05-30.pdf (86 kB)
original file:UTEP-CS-05-30
