Date of Award

2016-01-01

Degree Name

Master of Science

Department

Mathematical Sciences

Advisor(s)

Xiaogang Su

Abstract

The regression coefficient estimates from ordinary least squares (OLS) are unlikely to be close to the true values when the design matrix suffers from multicollinearity. To combat this problem, many regularized methods have been introduced. Principal components regression (PCR) is an important analysis tool for dealing with multicollinearity and high dimensionality. In conventional PCR, the first step is to transform the original predictors into orthogonal principal components (PCs) by a linear transformation. The PCs are ordered by their corresponding eigenvalues, from largest to smallest. The next step is to regress the response on the leading PCs, for each candidate number of PCs, and to compute a model selection criterion such as AIC, BIC, or GCV for each model. The final step is to compare the criterion values and choose the model with the smallest value. However, this traditional way of doing PCR is computationally inefficient. We therefore proposed three competitive models to overcome this problem. In these proposed methods, the number of PCs is determined automatically. The main idea is to approximate the indicator threshold function with a smooth sigmoid surrogate function, in several different ways. In our first model, PTPCR, we used the logistic function with a large fixed shape parameter and an undetermined threshold parameter to replace the indicator function. The selection criterion can then be treated as an objective function and optimized to estimate the threshold parameter. The PCs to be included in the final model are those with eigenvalues greater than the estimated threshold parameter. This reformulation facilitates direct estimation of the best number of PCs, leading to much improved computational efficiency. Apart from PTPCR, we proposed two further models: PTPCR-V1 and PTPCR-V2. In PTPCR-V1, we free the shape parameter in the logistic function and optimize the criterion function with respect to both the shape and threshold parameters. PTPCR-V2 is fit in a similar manner to PTPCR, except that the preference order of the PCs is based on their regression coefficients rather than their eigenvalues. On the basis of extensive simulation studies, all three proposed models perform better than PCR. More specifically, PTPCR yields predictive performance similar to that of PCR with a shorter computing time, while PTPCR-V1 and PTPCR-V2 outperform PCR in both computational efficiency and prediction accuracy.
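The thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the BIC form, the effective degrees of freedom (intercept plus the sum of the logistic weights), the shape value `a=50.0`, and the grid search over the threshold `c` are all assumptions made for illustration. The abstract does not say which optimizer is used, and "large" for the shape parameter is relative to the scale of the eigenvalues.

```python
import numpy as np

def pc_decomposition(X):
    """Center X; return eigenvalues (descending) and orthogonal PC scores."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = S**2 / (X.shape[0] - 1)   # eigenvalues of the sample covariance
    scores = U * S                      # same as Xc @ Vt.T
    return eigvals, scores

def logistic_weights(eigvals, c, a=50.0):
    """Smooth surrogate for the indicator I(eigval > c).

    Uses the identity 1/(1+exp(-z)) = 0.5*(1+tanh(z/2)) to avoid overflow.
    """
    z = a * (np.asarray(eigvals, dtype=float) - c)
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def smooth_bic(c, eigvals, scores, y, a=50.0):
    """BIC in which each PC enters with weight logistic_weights(...).

    Assumes every PC has nonzero variance (no exactly collinear columns).
    """
    n = len(y)
    yc = y - y.mean()
    w = logistic_weights(eigvals, c, a)
    # PCs are orthogonal, so each coefficient is a simple projection.
    gamma = scores.T @ yc / np.sum(scores**2, axis=0)
    fitted = scores @ (w * gamma)
    rss = np.sum((yc - fitted) ** 2)
    df = 1.0 + np.sum(w)                # assumed effective degrees of freedom
    return n * np.log(rss / n) + np.log(n) * df

# Synthetic, strongly collinear data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, p = 200, 6
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.5 * rng.normal(size=n)

eigvals, scores = pc_decomposition(X)
grid = np.linspace(0.0, eigvals.max(), 400)   # crude stand-in for an optimizer
crit = [smooth_bic(c, eigvals, scores, y) for c in grid]
c_hat = grid[int(np.argmin(crit))]
k_hat = int(np.sum(eigvals > c_hat))          # PCs kept: eigenvalue > threshold
print(c_hat, k_hat)
```

Because the PCs are orthogonal, the coefficients are computed once and only the weights change with `c`, so each criterion evaluation is cheap. Freeing `a` as a second argument of the objective would correspond loosely to the PTPCR-V1 variant described above.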

Language

en

Provenance

Received from ProQuest

File Size

75 pages

File Format

application/pdf

Rights Holder

Pei Wang
