Business Analysis Module User's Guide

3.2 Multiple Linear Regression

In the late 1880s, Francis Galton was studying the inheritance of physical characteristics. In particular, he wondered if he could predict a boy's adult height based on the height of his father. Galton hypothesized that the taller the father, the taller the son would be. He plotted the heights of fathers against the heights of their sons for a number of father-son pairs, then tried to fit a straight line through the data. If we denote the son's height by HS and the father's height by HF, then in mathematical terms, Galton wanted to determine constants β0 and β1 such that:

HS = β0 + β1HF

This is an example of a simple linear regression problem with a single predictor variable, HF. The parameter β0 is called the intercept parameter. In general, a regression problem may consist of several predictor variables. Thus the multiple linear regression problem may be stated as follows:

Let Y be a random variable that can be expressed in the form:

Y = β0 + β1x1 + ... + βp – 1xp – 1 + ε

where x1, x2, ..., xp – 1 are known constants, and ε is a random fluctuation (error) term. The problem is to estimate the parameters βj. If the xj are varied and the n values Y1, Y2, ..., Yn of Y are observed, then we write:

Yi = β0 + β1xi1 + ... + βp – 1xi, p – 1 + εi (i = 1, 2, ..., n)

where xij is the ith value of xj. Writing these n equations in matrix form, with Y = (Y1, Y2, ..., Yn)ᵀ, β = (β0, β1, ..., βp – 1)ᵀ, ε = (ε1, ε2, ..., εn)ᵀ, and X the n × p matrix whose ith row is (1, xi1, ..., xi, p – 1), we have:

Y = Xβ + ε

where x10 = x20 = ... = xn0 = 1, so the first column of X is a column of 1s.

We call the matrix X the regression matrix, each Yi a response variable, Y the response vector, and each xj a predictor variable.
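To make the notation concrete, here is a minimal Python sketch using NumPy. The sizes, coefficients, and error scale are invented for illustration, and nothing in it is part of the Business Analysis Module's C++ API; it simply assembles a regression matrix with a leading column of 1s and simulates a response vector from the model Y = Xβ + ε:

    import numpy as np

    rng = np.random.default_rng(42)
    n, p = 30, 3                                # n observations, p parameters (including the intercept)

    data = rng.normal(size=(n, p - 1))          # predictor data matrix: columns x1, ..., x(p-1)
    X = np.column_stack([np.ones(n), data])     # regression matrix: first column is all 1s
    beta = np.array([1.0, 2.0, -0.5])           # "true" parameters, chosen arbitrarily
    eps = rng.normal(scale=0.3, size=n)         # fluctuation errors
    Y = X @ beta + eps                          # response vector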

3.2.1 Parameter Calculation by Least Squares Minimization

The method of least squares consists of minimizing ‖Y – Xβ‖², the sum of squared errors, with respect to β. Setting θ = Xβ, we minimize:

‖Y – θ‖² = Σi (Yi – θi)²

subject to:

θ = Xβ for some β (that is, θ lies in the column space of X)

Let β̂ be the least squares estimate of β. The fitted regression is denoted by:

Ŷ = Xβ̂

The elements of Y – Ŷ are called the residuals. The value of:

RSS = ‖Y – Ŷ‖² = Σi (Yi – Ŷi)²

is called the residual sum of squares. The matrix obtained from X by deleting the first column of 1s, whose (i, j) entry is xij for j = 1, ..., p – 1, is called the predictor data matrix.
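The fit itself is a one-liner in the illustrative sketch begun above (repeated here so it runs on its own; NumPy, not the module's API). It produces the estimate β̂, the fitted values Ŷ, the residuals, and the residual sum of squares:

    import numpy as np

    # Simulated data as in the previous sketch (invented numbers).
    rng = np.random.default_rng(42)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]   # minimizes ||Y - X beta||^2 over beta
    Y_hat = X @ beta_hat                              # fitted regression values
    resid = Y - Y_hat                                 # residuals
    rss = resid @ resid                               # residual sum of squares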

3.2.2 Model Variance

The variance of the model is defined to be the variance of ε. The statistic:

S2 = RSS / (n – p)

is an unbiased estimator of this variance.
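In the running NumPy sketch (repeated so it runs on its own; all numbers are invented), this estimator is a single expression:

    import numpy as np

    # Simulated fit as in the previous sketches (invented numbers).
    rng = np.random.default_rng(42)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
    rss = np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)

    s2 = rss / (n - p)   # S2: unbiased estimate of the variance of eps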

3.2.3 Parameter Dispersion (Variance-Covariance) Matrix

The dispersion matrix for the parameter estimates is the matrix D = (dij), where dij is the covariance of β̂i and β̂j. The dispersion matrix is calculated according to the formula:

D = S2 (XᵀX)⁻¹

where S2 is the estimated variance, as defined above, and X and Xᵀ are the regression matrix and its transpose, respectively.
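Under the same illustrative setup (again self-contained NumPy, not the module's API), the dispersion matrix is one line, and the square roots of its diagonal entries are the standard errors of the parameter estimates:

    import numpy as np

    # Simulated fit as in the previous sketches (invented numbers).
    rng = np.random.default_rng(42)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
    rss = np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    s2 = rss / (n - p)

    D = s2 * np.linalg.inv(X.T @ X)    # D[i, j] estimates Cov(beta_hat_i, beta_hat_j)
    std_err = np.sqrt(np.diag(D))      # standard errors of the parameter estimates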

3.2.4 Significance of the Model (Overall F Statistic)

The overall F statistic is a statistic for testing the null hypothesis β1 = β2 = ... = βp – 1 = 0. It is defined by the equation:

F = [Σi (Ŷi – Ȳ)² / (p – 1)] / [RSS / (n – p)]

where:

Ȳ = (1/n) Σi Yi

This statistic follows an F distribution with (p – 1) and (n – p) degrees of freedom.
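Continuing the illustrative sketch (self-contained NumPy with invented numbers), the F statistic follows directly from the fitted values:

    import numpy as np

    # Simulated fit as in the previous sketches (invented numbers).
    rng = np.random.default_rng(42)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
    Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    rss = np.sum((Y - Y_hat) ** 2)

    ss_reg = np.sum((Y_hat - Y.mean()) ** 2)    # regression sum of squares
    F = (ss_reg / (p - 1)) / (rss / (n - p))    # overall F statistic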

3.2.4.1 p-Value

The p-value is the probability of seeing an F statistic at least as large as the one calculated for a given linear regression if the null hypothesis:

β1 = β2 = ... = βp – 1 = 0

is true.

3.2.4.2 Critical Value

The critical value of the F statistic for a specified significance level, α, is the value v such that if the F statistic calculated for the multiple linear regression is greater than v, we reject the hypothesis β1 = β2 = ... = βp – 1 = 0 at the significance level α.
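Both the p-value and the critical value come from the F distribution itself. A sketch using SciPy (the simulated fit is repeated with invented numbers; this is illustration, not the module's API):

    import numpy as np
    from scipy.stats import f

    # Simulated fit as in the previous sketches (invented numbers).
    rng = np.random.default_rng(42)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
    Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    F = (np.sum((Y_hat - Y.mean()) ** 2) / (p - 1)) / (np.sum((Y - Y_hat) ** 2) / (n - p))

    p_value = f.sf(F, p - 1, n - p)       # probability of an F value at least this large under H0
    alpha = 0.05
    v = f.ppf(1 - alpha, p - 1, n - p)    # critical value: reject H0 at level alpha when F > v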

3.2.5 Significance of Predictor Variables

Let β̂j be the estimate for element j of the parameter vector β. The T statistic for the parameter estimate is a statistic for testing the hypothesis that:

βj = 0

It is calculated according to the formula:

Tj = β̂j / √(djj)

where djj is the jth diagonal element of the dispersion matrix D. This statistic is assumed to follow a T distribution with n – p degrees of freedom.
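In the running sketch, the T statistics for all parameter estimates come out in one vectorized expression (self-contained NumPy, invented numbers):

    import numpy as np

    # Simulated fit as in the previous sketches (invented numbers).
    rng = np.random.default_rng(42)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    s2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)
    D = s2 * np.linalg.inv(X.T @ X)

    T = beta_hat / np.sqrt(np.diag(D))    # T[j] = beta_hat_j / sqrt(d_jj)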

3.2.5.1 p-Values

The p-value for each parameter estimate is the probability of seeing a T statistic at least as large in absolute value as the one calculated using the formula in Section 3.2.5 if the hypothesis βj = 0 is true.

3.2.5.2 Critical Values

The critical value of a parameter T statistic for a given level of significance α is the value vj such that if the absolute value of the T statistic calculated for a given parameter is greater than vj, we reject the hypothesis βj = 0 at the significance level α.
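A SciPy sketch of both computations, two-sided to match the absolute-value test described here (the T statistics are recomputed from the same invented simulation so the block runs on its own):

    import numpy as np
    from scipy.stats import t

    # Simulated fit and T statistics as in the previous sketch (invented numbers).
    rng = np.random.default_rng(42)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    D = (np.sum((Y - X @ beta_hat) ** 2) / (n - p)) * np.linalg.inv(X.T @ X)
    T = beta_hat / np.sqrt(np.diag(D))

    p_values = 2 * t.sf(np.abs(T), n - p)    # two-sided p-value for each hypothesis beta_j = 0
    alpha = 0.05
    v = t.ppf(1 - alpha / 2, n - p)          # reject beta_j = 0 at level alpha when |T[j]| > v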

3.2.6 Prediction Intervals

Suppose that we have calculated parameter estimates for our linear regression problem. Suppose further that we have a vector of values, x = (1, x1, ..., xp – 1)ᵀ, for the predictor variables, with a leading 1 for the intercept. We may obtain an α-level prediction interval for ŷ = xᵀβ̂, the value of the dependent (observed) variable predicted by our model, according to the formula:

ŷ ± t(n – p; α/2) S √(1 + xᵀ(XᵀX)⁻¹x)

where t(n – p; α/2) is the upper α/2 quantile of a T distribution with n – p degrees of freedom, S = √S2 is the square root of the estimated variance, and X is the regression matrix.
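A final sketch of the interval (same invented simulation; x_new is an arbitrary point chosen for illustration, with a leading 1 for the intercept):

    import numpy as np
    from scipy.stats import t

    # Simulated fit as in the previous sketches (invented numbers).
    rng = np.random.default_rng(42)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
    beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    s2 = np.sum((Y - X @ beta_hat) ** 2) / (n - p)

    alpha = 0.05
    x_new = np.array([1.0, 0.5, -1.0])            # leading 1 for the intercept, then predictor values
    y_pred = x_new @ beta_hat                     # response predicted by the model at x_new
    half = (t.ppf(1 - alpha / 2, n - p)
            * np.sqrt(s2)
            * np.sqrt(1 + x_new @ np.linalg.inv(X.T @ X) @ x_new))
    lower, upper = y_pred - half, y_pred + half   # prediction interval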


