Linear Models Lecture 1: Linear Regression

Overview

Linear models are a fundamental topic in experimental design, since the form of the model we select dictates which experiments are run.

Here, we cover linear regression for the purpose of building empirical models.

Linear Regression

Linear regression is the simplest form of model

  • Mathematical function (exact) vs. statistical function (approximate)

Fitting outputs/responses Y to inputs X

  • Multiple inputs are possible
  • With two independent variables $x_1$ and $x_2$, the regression model assumes a probability distribution of Y for each $(x_1, x_2)$ point
  • The relation between the means of these $Y(x_1, x_2)$ distributions and the independent variable values $x_1, x_2$ is given by a regression surface

Selecting independent variables

  • Variables that reduce the variance of Y after the effects of other variables have been accounted for
  • Importance of variable as causal agent
  • Degree to which response can be obtained accurately, quickly, economically
  • Degree to which variable can be set

Part I - basic regression

  • Focuses on linear models

$$ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i $$

$Y_i$ is the response value of the $i$th trial. $X_i$ is the known constant input value for that trial. The $\beta$ values are parameters. The term $\epsilon_i$ is a random error term, normally distributed with mean 0 and variance $\sigma^2$.

A simple linear first order model:

  • Simple - one independent variable
  • Linear - the variable appears only to the first power
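
As a concrete illustration, here is a minimal Python sketch that simulates data from this model. The parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$) and the sample size are arbitrary choices for illustration, not values from the lecture.

```python
import numpy as np

# Assumed illustrative values, not from the lecture
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 25

rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, n)                    # known constant inputs X_i
eps = rng.normal(loc=0.0, scale=sigma, size=n)   # errors: mean 0, variance sigma^2
Y = beta0 + beta1 * X + eps                      # responses Y_i
```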

Model Features


  • The observation $Y_i$ is the sum of the model prediction and a random error term
  • If the expectation of the error is 0 (which is what zero mean implies), then

$$ E(Y_i) = E(\beta_0 + \beta_1 X_i + \epsilon_i) $$

which reduces to:

$$ E(Y_i) = \beta_0 + \beta_1 X_i $$

The regression function is a linear function of $X$.

The error terms are assumed to have constant variance, so $V(Y_i) = \sigma^2$.
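
Making the constant-variance claim explicit: since $\beta_0 + \beta_1 X_i$ is a fixed quantity, all of the variability in $Y_i$ comes from the error term:

$$ V(Y_i) = V(\beta_0 + \beta_1 X_i + \epsilon_i) = V(\epsilon_i) = \sigma^2 $$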

Model Interpretation

$\beta_0$ and $\beta_1$ are the regression coefficients.

$\beta_1$ is the slope - it indicates the change in the mean of the probability distribution of Y per unit increase in X.

Alternative versions of the model: if a dummy variable $X_0 = 1$ is introduced alongside $X_i$, the model can be written as in the first line below; centering $X_i$ about its mean $\overline{X}$ then gives the remaining lines:

$$ Y_i = \beta_0 X_0 + \beta_1 X_i + \epsilon_i \\ Y_i = \beta_0 + \beta_1 \left( X_i - \overline{X} \right) + \beta_1 \overline{X} + \epsilon_i \\ = \left( \beta_0 + \beta_1 \overline{X} \right) + \beta_1 \left( X_i - \overline{X} \right) + \epsilon_i $$

This can be written in terms of a slightly modified intercept, $\beta_0^{\star} = \beta_0 + \beta_1 \overline{X}$:

$$ Y_i = \beta_0^{\star} + \beta_1 \left( X_i - \overline{X} \right) + \epsilon_i $$

Method of Least Squares

Considering observations from $n$ trials, for the $i$th trial the model is:

$$ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i $$

The method of least squares minimizes $Q$, the sum of squared deviations of the observations from the regression line, where $Q$ is:

$$ Q = \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2 $$

The results are estimators of $\beta_0$ and $\beta_1$, namely $b_0$ and $b_1$.

Least squares estimators: start with normal equations:

$$ \sum Y_i = n b_0 + b_1 \sum X_i \\ \sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2 $$

To obtain $b_0, b_1$ directly,

$$ b_1 = \dfrac{ \sum_i X_i Y_i - \dfrac{ ( \sum_i X_i )( \sum_i Y_i ) }{n} }{ \sum_i X_i^2 - \dfrac{ (\sum X_i)^2 }{ n } } $$

and the other coefficient is:

$$ b_0 = \dfrac{1}{n} \left( \sum_i Y_i - b_1 \sum_i X_i \right) = \overline{Y} - b_1 \overline{X} $$

Note: to derive these, use calculus. Set the partial derivatives $\dfrac{\partial Q}{\partial \beta_0}$ and $\dfrac{\partial Q}{\partial \beta_1}$ to 0.
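
As a sketch of that derivation, differentiating $Q$ with respect to each parameter and setting the results to zero gives:

$$ \dfrac{\partial Q}{\partial \beta_0} = -2 \sum_i \left( Y_i - \beta_0 - \beta_1 X_i \right) = 0 \\ \dfrac{\partial Q}{\partial \beta_1} = -2 \sum_i X_i \left( Y_i - \beta_0 - \beta_1 X_i \right) = 0 $$

Dividing by $-2$, expanding the sums, and replacing $\beta_0, \beta_1$ with the estimators $b_0, b_1$ recovers the normal equations above.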

Fitted Model

$\hat{Y}_i$ is a fitted value (using model)

Regression model:

$$ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i $$

Alternate version in terms of $\beta^{\star}$:

$$ Y_i = \beta_0^{\star} + \beta_1 ( X_i - \overline{X}) + \epsilon_i $$

Now the coefficients are determined by (compare with the direct formulas above):

$$ b_1 = \dfrac{ \sum (X_i - \overline{X})(Y_i -\overline{Y}) }{ \sum (X_i - \overline{X})^2 } $$

and

$$ b_0 = \overline{Y} - b_1 \overline{X} $$
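
As a minimal sketch, these formulas translate directly into NumPy; the data values here are arbitrary, chosen only for illustration:

```python
import numpy as np

# Arbitrary illustrative data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])

Xbar, Ybar = X.mean(), Y.mean()

# b1 = sum((X_i - Xbar)(Y_i - Ybar)) / sum((X_i - Xbar)^2)
b1 = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)

# b0 = Ybar - b1 * Xbar
b0 = Ybar - b1 * Xbar

Yhat = b0 + b1 * X   # fitted values
e = Y - Yhat         # residuals
```

As a cross-check, `np.polyfit(X, Y, 1)` returns the same coefficients in the order `[b1, b0]`.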

Properties of Least Squares Estimators

By the Gauss-Markov theorem, the least squares estimators are unbiased - they neither systematically overpredict nor underpredict.

Furthermore, they have minimum variance among all unbiased linear estimators.

Using this information, the fitted value from the linear regression model is:

$$ \hat{Y}_i = b_0 + b_1 X_i $$

or, using an alternate model,

$$ \hat{Y}_i = \overline{Y} + b_1 (X_i - \overline{X}) $$

The residual $e_i$ can be written as the difference between the actual and the predicted values:

$$ e_i = Y_i - \hat{Y}_i = Y_i - b_0 - b_1 X_i $$

Properties of Fitted Regression Lines

Sum of residuals is 0:

$$ \sum_i e_i = 0 $$

Sum of squared residuals is minimized

(Plotting $Q$ versus $b_0$ or $b_1$ would show a minimum.)

Sum of observed values $Y_i$ equals sum of fitted values $\hat{Y}_i$:

$$ \sum_i Y_i = \sum_i \hat{Y}_i $$

Sum of weighted residuals is zero when the residuals are weighted by the level $X_i$:

$$ \sum_i X_i e_i = 0 $$

Sum of weighted residuals is zero when the residuals are weighted by the fitted response $\hat{Y}_i$:

$$ \sum_i \hat{Y}_i e_i = 0 $$

The regression line always passes through the point $(\overline{X}, \overline{Y})$.
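
These properties can be checked numerically; continuing from the `b0`, `b1`, `Yhat`, and `e` computed in the fitting sketch above:

```python
# Each of the following should be zero up to floating-point error
print(np.sum(e))                  # sum of residuals
print(np.sum(X * e))              # residuals weighted by X_i
print(np.sum(Yhat * e))           # residuals weighted by fitted values
print(np.sum(Y) - np.sum(Yhat))   # observed total minus fitted total

# The fitted line passes through (Xbar, Ybar)
print(np.isclose(b0 + b1 * Xbar, Ybar))
```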

Estimating Variance

For a regression line, we are drawing samples from a distribution. The fewer samples we have, the less representative our estimates of the distribution will be. Each degree of freedom is used to improve an estimate of a parameter, whether it be the mean, the variance, or the coefficients of a linear model.

Variance without linear model

Suppose we have a set of samples drawn from a given distribution, and we wish to estimate the variance from the samples.

Each sample drawn from the distribution gives one degree of freedom (DOF). One DOF is used to estimate the mean $\overline{Y}$. The remainder can give an estimate of the variance in the population. These two quantities characterize the distribution if it is normal.

The real variance $\sigma^2$ is estimated with $s^2$:

$$ s^2 = \dfrac{ \sum (Y_i - \overline{Y})^2 }{ n-1 } $$

Here, $n$ is the number of samples, giving $n$ degrees of freedom. One degree of freedom is used to estimate the mean, leaving $n-1$ degrees of freedom to estimate the variance.
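
A quick check of this estimate, assuming a NumPy array `Y` of samples (the values are arbitrary):

```python
import numpy as np

Y = np.array([2.1, 2.9, 3.6, 4.4, 5.2])          # arbitrary illustrative samples
s2 = np.sum((Y - Y.mean()) ** 2) / (len(Y) - 1)  # sample variance with n-1 DOF
# np.var(Y, ddof=1) gives the same value
```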

Variance with linear model

Alternatively, we saw how to assemble an unbiased linear estimator based on $b_0$ and $b_1$.

Using a linear regression model, we can get the sum of squared errors via:

$$ SSE = \sum_i e_i^2 $$

which then gives us MSE:

$$ MSE = \dfrac{SSE}{n-2} = \dfrac{ \sum_i e_i^2 }{n-2} $$

The mean squared error MSE is an unbiased estimator of $\sigma^2$; that is, it satisfies:

$$ E(MSE) = \sigma^2 $$

Now we can go one step further: specify a model for $\epsilon_i$ by assuming the errors are normally distributed, and use the functional form of the normal (Gaussian) distribution to obtain a maximum likelihood estimator of $\sigma^2$.

Unbiased estimator:

$$ MSE = \dfrac{1}{n-2} \sum_i \left( Y_i - \hat{Y}_i \right)^2 = \dfrac{n}{n-2} \hat{\sigma}^2 $$

Maximum likelihood estimator:

$$ \hat{\sigma}^2 = \dfrac{1}{n} \sum_i \left( Y_i - \hat{Y}_i \right)^2 $$

The only difference is the divisor: $\frac{1}{n-2}$ versus $\frac{1}{n}$.
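
A sketch comparing the two, reusing `Y` and `Yhat` from the fitting example above:

```python
n = len(Y)                      # number of observations
sse = np.sum((Y - Yhat) ** 2)   # sum of squared residuals
mse = sse / (n - 2)             # unbiased estimator of sigma^2
sigma2_mle = sse / n            # maximum likelihood estimator
# mse equals (n / (n - 2)) * sigma2_mle
```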