Fast algorithms for computing odd moments in statistical analysis of privacy-related interval data
In many practical situations, we perform statistical analysis of data about human subjects. For example, the main purpose of a census is to provide statistical information about the population: how fast it grows, what the trends in population and income are, etc. These trends can be found by computing the mean values of different characteristics, the degrees of deviation from these mean values (e.g., variance and higher moments), the correlations between different characteristics, etc. Similarly, it is important to find the correlation between the effectiveness of a certain treatment and the age, gender, etc., of patients.

In all these situations, it is important to preserve, as much as possible, the privacy of the people whose data is being processed. One way to preserve this privacy is to store not the actual values of the corresponding quantities but only ranges (intervals). For example, instead of the actual age, we store the range (0 to 10, 10 to 20, etc.). This interval representation solves the privacy problem, but a new problem appears: how to perform statistical analysis of such interval data? Different values $x_i$ from the given ranges $[\underline{x}_i, \overline{x}_i]$ lead, in general, to different values of the desired statistical characteristic $C(x_1, \ldots, x_n)$. It is therefore necessary to find the range $\mathbf{C} = [\underline{C}, \overline{C}]$ of possible values of the desired characteristic $C(x_1, \ldots, x_n)$ when $x_i \in [\underline{x}_i, \overline{x}_i]$.

Several researchers have developed efficient algorithms for statistical processing under such privacy-related interval uncertainty. Specifically, efficient algorithms have been designed for computing the range $\mathbf{C}$ for the mean, the variance, the even-order central moments $M_{2p}$, and the third central moment $M_3$.

Higher odd moments are also important in describing the shape of empirical distributions. For computing these moments under privacy-related interval uncertainty, no efficient algorithm was previously known.
In this thesis, we design a new efficient $O(n^2)$ algorithm for computing the range of odd central moments under privacy-related interval uncertainty.
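
To make the range problem concrete, the following Python sketch illustrates the problem statement only; it is not the algorithm developed in the thesis. All function names are hypothetical. The range of the mean is exact because the mean is increasing in every $x_i$. For an odd central moment, the sketch uses a brute-force grid search, which yields only an inner approximation of the true range (the extrema need not lie on the grid) and takes time exponential in $n$, which is why an efficient algorithm is of interest.

```python
from itertools import product


def central_moment(xs, k):
    """k-th central moment of a sample: (1/n) * sum((x - mean)**k)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def mean_range(intervals):
    """Exact range of the mean over all x_i in [lo_i, hi_i].

    The mean is increasing in every x_i, so its minimum is attained at
    the lower endpoints and its maximum at the upper endpoints.
    """
    n = len(intervals)
    lower = sum(lo for lo, _ in intervals) / n
    upper = sum(hi for _, hi in intervals) / n
    return lower, upper


def grid_moment_range(intervals, k, points=5):
    """Brute-force inner estimate of the range of the k-th central moment.

    Assumption for illustration: each interval is sampled at `points`
    (>= 2) equally spaced values and every combination is evaluated.
    The cost is points**n evaluations, so this is usable only for tiny n.
    """
    grids = [
        [lo + (hi - lo) * j / (points - 1) for j in range(points)]
        for lo, hi in intervals
    ]
    values = [central_moment(xs, k) for xs in product(*grids)]
    return min(values), max(values)


if __name__ == "__main__":
    ages = [(0, 10), (10, 20), (20, 30)]  # stored age ranges, not exact ages
    print("mean range:", mean_range(ages))
    print("approx. M3 range:", grid_moment_range(ages, 3))
```

With three stored age ranges the grid search evaluates only 125 combinations, but the count grows as `points**n`, so the same approach is impractical for census-sized data.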
Vargas, Jorge Ivan, "Fast algorithms for computing odd moments in statistical analysis of privacy-related interval data" (2007). ETD Collection for University of Texas, El Paso. AAI1445695.