Pre-tuned Principal Component Regression and Several Variants

Pei Wang, University of Texas at El Paso

Abstract

The regression coefficient estimates from ordinary least squares (OLS) are unlikely to be close to the true values when the design matrix suffers from multicollinearity. Many regularization methods have been introduced to combat this problem. Principal components regression (PCR) is an important tool for dealing with multicollinearity and high dimensionality. In conventional PCR, the first step is to transform the original predictors into orthogonal principal components (PCs) by a linear transformation; the PCs correspond to the eigenvalues of the design matrix, sorted in decreasing order. The next step is to regress the response on the leading PCs and to compute a model selection criterion, such as AIC, BIC, or GCV, for each candidate model. The final step is to compare the criterion values and choose the model with the smallest value. However, this traditional way of carrying out PCR is computationally inefficient. We therefore propose three competitive methods to overcome this problem, in which the number of PCs is determined automatically. The main idea is to approximate the indicator threshold function with a smooth sigmoid surrogate function, in several different ways. In our first method, PTPCR, we replace the indicator function with a logistic function that has a large fixed shape parameter and an undetermined threshold parameter. The selection criterion can then be treated as an objective function and optimized to estimate the threshold parameter; the PCs included in the final model are those whose eigenvalues exceed the estimated threshold. This reformulation allows the best number of PCs to be estimated directly, leading to much improved computational efficiency. Beyond PTPCR, we propose two further methods: PTPCR-V1 and PTPCR-V2. In PTPCR-V1, the shape parameter in the logistic function is left free, and the criterion function is optimized with respect to both the shape and threshold parameters. PTPCR-V2 is fitted in the same manner as PTPCR, except that the PCs are ordered by their regression coefficients rather than by their eigenvalues. Extensive simulation studies show that all three proposed methods perform better than conventional PCR. More specifically, PTPCR yields predictive performance similar to PCR with a shorter computing time, while PTPCR-V1 and PTPCR-V2 outperform PCR not only in computational efficiency but also in prediction accuracy.
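To make the core idea concrete, the following is a minimal sketch, in Python, of replacing the hard eigenvalue threshold in PCR with a logistic surrogate and tuning the threshold by minimizing a GCV-type criterion. It is an illustration only, not the dissertation's exact algorithm: the criterion, the fixed shape parameter gamma, and the function names are assumptions made for this example.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def ptpcr_sketch(X, y, gamma=50.0):
        """Illustrative pre-tuned PCR: the indicator that keeps a PC
        (eigenvalue above a threshold c) is replaced by a logistic weight,
        and a GCV-type criterion is minimized directly over c."""
        n, p = X.shape
        Xc = X - X.mean(axis=0)              # center predictors
        yc = y - y.mean()                    # center response
        # eigen-decomposition of the predictor covariance matrix
        eigvals, V = np.linalg.eigh(Xc.T @ Xc / n)
        order = np.argsort(eigvals)[::-1]    # sort eigenvalues in decreasing order
        eigvals, V = eigvals[order], V[:, order]
        Z = Xc @ V                           # orthogonal principal component scores
        # per-PC least-squares coefficients (columns of Z are orthogonal)
        coef = (Z * yc[:, None]).sum(axis=0) / ((Z ** 2).sum(axis=0) + 1e-12)

        def gcv(c):
            # smooth surrogate for the indicator I(eigenvalue > c)
            w = 1.0 / (1.0 + np.exp(-gamma * (eigvals - c)))
            fitted = Z @ (w * coef)
            df = w.sum()                     # effective number of retained PCs
            rss = ((yc - fitted) ** 2).sum()
            return rss / (n * (1.0 - df / n) ** 2)

        res = minimize_scalar(gcv, bounds=(eigvals.min(), eigvals.max()),
                              method="bounded")
        keep = eigvals > res.x               # PCs above the tuned threshold
        return keep, res.x

Because the criterion is optimized over a single threshold parameter rather than refitted for every possible number of components, the search is a one-dimensional optimization, which is the source of the computational savings described in the abstract.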

Subject Area

Statistics

Recommended Citation

Wang, Pei, "Pre-tuned Principal Component Regression and Several Variants" (2016). ETD Collection for University of Texas, El Paso. AAI10118215.
https://scholarworks.utep.edu/dissertations/AAI10118215
