Calculation Methods for Linear Regression
Given the linear regression model Y = Xβ + ε, finding the least squares solution is equivalent to solving the normal equations XᵀXβ = XᵀY. Thus the solution for β is given by:

β = (XᵀX)⁻¹XᵀY
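The following NumPy sketch illustrates the normal equations themselves, not the Business Analysis Module API; the data and variable names are hypothetical:

    import numpy as np

    # Hypothetical data for the model Y = Xβ + ε, with n = 100 and p = 3.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 3))                   # regression matrix
    beta_true = np.array([2.0, -1.0, 0.5])
    Y = X @ beta_true + 0.1 * rng.standard_normal(100)

    # Solve the normal equations XᵀXβ = XᵀY directly.
    beta = np.linalg.solve(X.T @ X, X.T @ Y)
    print(beta)                                         # close to beta_true

Note that forming XᵀX squares the condition number of X, which is why the classes described below work from factorizations of X itself rather than solving the normal equations directly.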
The Business Analysis Module includes three classes for calculating multiple linear regression parameters: RWLeastSqQRCalc, RWLeastSqQRPvtCalc, and RWLeastSqSVDCalc. The following three sections briefly describe the method each class encapsulates, along with its pros and cons.
RWLeastSqQRCalc
Class RWLeastSqQRCalc encapsulates the QR method. This method begins by decomposing the regression matrix X into the product of an orthogonal matrix Q and an upper triangular matrix R. Substituting X = QR into the normal equations in “Calculation Methods for Linear Regression” reduces them to Rβ = QᵀY, yielding the solution β = R⁻¹QᵀY; a sketch of the method appears after the pros and cons below.
 
Pros:
Good performance. Parameter values are recalculated very quickly when predictor variables are added or removed, so model selection performs best with this calculation method.
Cons:
Calculation fails when the regression matrix X has less than full rank. (A matrix has less than full rank if its columns are linearly dependent.) Results may not be accurate if X is extremely ill-conditioned.
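The following sketch shows the QR approach in the same NumPy style as the earlier example; it illustrates the method, not the RWLeastSqQRCalc interface:

    import numpy as np

    def qr_least_squares(X, Y):
        # Decompose X = QR, with Q orthogonal and R upper triangular.
        Q, R = np.linalg.qr(X)
        # Substituting X = QR into the normal equations reduces them to
        # Rβ = QᵀY. If X has less than full rank, R is singular and the
        # solve below fails, which is the limitation noted under Cons.
        return np.linalg.solve(R, Q.T @ Y)   # β = R⁻¹QᵀY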
RWLeastSqQRPvtCalc
Class RWLeastSqQRPvtCalc uses essentially the same QR method described in “RWLeastSqQRCalc”, except that the QR decomposition is formed using column pivoting; a sketch appears after the pros and cons below.
 
Pros:
Calculation succeeds for regression matrices of less than full rank. However, calculations fail if the regression matrix contains a column of all 0s.
Cons:
Slower than the straight QR technique described in “RWLeastSqQRCalc.”
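A sketch of QR with column pivoting, again illustrating the method rather than the RWLeastSqQRPvtCalc interface; the tolerance tol is a hypothetical rank-detection cutoff:

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def pivoted_qr_least_squares(X, Y, tol=1e-10):
        # Column-pivoted QR: X[:, perm] = QR, with the diagonal of R
        # non-increasing in magnitude, so rank deficiency shows up as
        # (near-)zero trailing diagonal entries.
        Q, R, perm = qr(X, mode="economic", pivoting=True)
        diag = np.abs(np.diag(R))
        rank = int(np.sum(diag > tol * diag[0]))   # estimated numerical rank
        # Solve the leading rank-by-rank triangular system; coefficients
        # for the remaining (dependent) columns are set to zero.
        beta = np.zeros(X.shape[1])
        beta[perm[:rank]] = solve_triangular(R[:rank, :rank], (Q.T @ Y)[:rank])
        return beta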
RWLeastSqSVDCalc
Class RWLeastSqSVDCalc employs singular value decomposition (SVD). The method solves the least squares problem by decomposing the n × p regression matrix X into the form X = PΣQᵀ, where P is an n × p matrix consisting of the p orthonormalized eigenvectors associated with the p largest eigenvalues of XXᵀ, Q is a p × p orthogonal matrix consisting of the orthonormalized eigenvectors of XᵀX, and Σ = diag(σ1, σ2, ... , σp) is a diagonal matrix of the singular values of X. This singular value decomposition of X is used to solve the equation in “Calculation Methods for Linear Regression”; a sketch appears after the pros and cons below.
Pros:
Works on matrices of less than full rank. Produces accurate results even when X has full rank but is highly ill-conditioned.
Cons:
Slower than the straight QR technique described in “RWLeastSqQRCalc.”
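Finally, a sketch of the SVD method using the P, Σ, Q notation above, not the RWLeastSqSVDCalc interface; tol is a hypothetical cutoff for negligible singular values:

    import numpy as np

    def svd_least_squares(X, Y, tol=1e-10):
        # Thin SVD: X = PΣQᵀ, with s holding the singular values σ1..σp.
        P, s, Qt = np.linalg.svd(X, full_matrices=False)
        # Invert only the singular values above the cutoff; discarding
        # the negligible ones is what lets the method handle
        # rank-deficient and ill-conditioned regression matrices.
        s_inv = np.zeros_like(s)
        keep = s > tol * s[0]
        s_inv[keep] = 1.0 / s[keep]
        # β = QΣ⁺PᵀY, the minimum-norm least squares solution.
        return Qt.T @ (s_inv * (P.T @ Y))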