Linear models are a fundamental topic in experimental design, since the form of the model we select dictates which experiments are run.
Here, we cover linear regression for the purpose of building empirical models.
Linear regression is the simplest form of model: we fit outputs (responses) $Y$ to inputs $X$ after selecting the independent variables.
Part I - basic regression
$$ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i $$
$Y_i$ is the response value for the $i$th trial. $X_i$ is the known, constant input variable associated with that trial. The $\beta$ values are parameters. The term $\epsilon_i$ is a random error term, normally distributed with mean 0 and variance $\sigma^2$.
This is a simple, linear, first-order model.
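As a quick illustration, here is a minimal Python sketch that simulates data from this model; the parameter values, noise level, and $X$ levels are arbitrary choices for illustration, not taken from any particular experiment.

```python
import numpy as np

# Simulate n trials of Y_i = beta0 + beta1 * X_i + eps_i,
# with eps_i ~ N(0, sigma^2). All numeric values are illustrative.
rng = np.random.default_rng(0)

n = 25
beta0, beta1, sigma = 2.0, 0.5, 1.0
X = np.linspace(0.0, 10.0, n)          # known, fixed input levels
eps = rng.normal(0.0, sigma, size=n)   # random error terms
Y = beta0 + beta1 * X + eps            # observed responses
```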
Model features:
$$ E(Y_i) = E(\beta_0 + \beta_1 X_i + \epsilon_i) $$
which reduces to:
$$ E(Y_i) = \beta_0 + \beta_1 X_i $$
The regression function is a linear function.
The error terms are assumed to have constant variance, so $V(Y_i) = \sigma^2$.
$\beta_0$ and $\beta_1$ are the regression coefficients.
$\beta_1$ is the slope: it indicates the change in the mean of the probability distribution of $Y$ per unit increase in $X$.
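As a quick numerical check of these features (with illustrative parameter values): at a fixed level $X_i$, the sample mean of repeated draws of $Y_i$ should approach $\beta_0 + \beta_1 X_i$ and the sample variance should approach $\sigma^2$.

```python
import numpy as np

# Check E(Y_i) = beta0 + beta1 * X_i and V(Y_i) = sigma^2 at one level X_i.
# Parameter values are illustrative.
rng = np.random.default_rng(1)
beta0, beta1, sigma = 2.0, 0.5, 1.0
x_i = 4.0

y = beta0 + beta1 * x_i + rng.normal(0.0, sigma, size=100_000)
print(y.mean())   # approx. beta0 + beta1 * x_i = 4.0
print(y.var())    # approx. sigma^2 = 1.0
```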
Alternative versions of the model: introducing a dummy variable $X_0 = 1$, or centering $X_i$ about its mean $\overline{X}$, the model becomes:
$$ Y_i = \beta_0 X_0 + \beta_1 X_i + \epsilon_i \\ Y_i = \beta_0 + \beta_1 \left( X_i - \overline{X} \right) + \beta_1 \overline{X} + \epsilon_i \\ = \left( \beta_0 + \beta_1 \overline{X} \right) + \beta_1 \left( X_i - \overline{X} \right) + \epsilon_i $$
This can be written in terms of a slightly modified intercept, $\beta_0^{\star} = \beta_0 + \beta_1 \overline{X}$:
$$ Y_i = \beta_0^{\star} + \beta_1 \left( X_i - \overline{X} \right) + \epsilon_i $$
Considering observations from $n$ trials, for the $i$th trial we write:
$$ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i $$
The method of least squares minimizes the sum of squared deviations $Q$, where $Q$ is:
$$ Q = \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2 $$
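To make the criterion concrete, this sketch evaluates $Q$ on simulated data (illustrative values) and minimizes it numerically with `scipy.optimize.minimize`; the closed-form estimators derived below should land at the same point.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data (illustrative values).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 25)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=X.size)

# Least squares criterion Q(beta0, beta1).
def Q(beta):
    b0, b1 = beta
    return np.sum((Y - b0 - b1 * X) ** 2)

res = minimize(Q, x0=[0.0, 0.0])
print(res.x)   # numerical (b0, b1); should match the closed-form estimators
```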
The results are estimators of $\beta_0$ and $\beta_1$, namely $b_0$ and $b_1$.
Least squares estimators: start with the normal equations:
$$ \sum Y_i = n b_0 + b_1 \sum X_i \\ \sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2 $$
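The normal equations are just a 2x2 linear system in $b_0$ and $b_1$; here is a sketch of solving them directly with `numpy.linalg.solve` on simulated data (illustrative values).

```python
import numpy as np

# Simulated data (illustrative values).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 25)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=X.size)
n = X.size

# Normal equations written as A @ [b0, b1] = c.
A = np.array([[n,       X.sum()],
              [X.sum(), (X**2).sum()]])
c = np.array([Y.sum(), (X * Y).sum()])

b0, b1 = np.linalg.solve(A, c)
print(b0, b1)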
To obtain $b_0, b_1$ directly,
$$ b_1 = \dfrac{ \sum_i X_i Y_i - \dfrac{ ( \sum_i X_i )( \sum_i Y_i ) }{n} }{ \sum_i X_i^2 - \dfrac{ (\sum X_i)^2 }{ n } } $$
and the other coefficient is:
$$ b_0 = \dfrac{1}{n} \left( \sum_i Y_i - b_1 \sum_i X_i \right) = \overline{Y} - b_1 \overline{X} $$
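A sketch of the same estimates computed from these direct formulas (simulated data, illustrative values):

```python
import numpy as np

# Simulated data (illustrative values).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 25)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=X.size)
n = X.size

# b1 and b0 from the closed-form expressions above.
b1 = ((X * Y).sum() - X.sum() * Y.sum() / n) / ((X**2).sum() - X.sum()**2 / n)
b0 = Y.mean() - b1 * X.mean()
print(b0, b1)
```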
Note: to derive these, use calculus: set the partial derivatives $\dfrac{\partial Q}{\partial \beta_0}$ and $\dfrac{\partial Q}{\partial \beta_1}$ to 0.
$\hat{Y}_i$ is a fitted value (computed using the model).
Regression model:
$$ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i $$
Alternate version in terms of $\beta^{\star}$:
$$ Y_i = \beta_0^{\star} + \beta_1 ( X_i - \overline{X}) + \epsilon_i $$
Now the coefficients are determined by (note: compare to above...):
$$ b_1 = \dfrac{ \sum (X_i - \overline{X})(Y_i -\overline{Y}) }{ \sum (X_i - \overline{X})^2 } $$
and
$$ b_0 = \overline{Y} - b_1 \overline{X} $$
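A sketch of the deviation (centered) form of the estimators on simulated data (illustrative values), cross-checked against `numpy.polyfit` for a degree-1 fit:

```python
import numpy as np

# Simulated data (illustrative values).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 25)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=X.size)

# Estimators in deviation form.
dX, dY = X - X.mean(), Y - Y.mean()
b1 = (dX * dY).sum() / (dX**2).sum()
b0 = Y.mean() - b1 * X.mean()

# Cross-check against numpy's degree-1 polynomial fit (slope, intercept).
slope, intercept = np.polyfit(X, Y, deg=1)
print(b1, slope)
print(b0, intercept)
```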
Due to the Gauss-Markov theorem, the least squares estimators are unbiased: on average, they neither overpredict nor underpredict the true parameters.
Furthermore, they have minimum variance among all unbiased linear estimators.
Using this information and the linear regression model, the fitted value is:
$$ \hat{Y}_i = b_0 + b_1 X_i $$
or, using an alternate model,
$$ \hat{Y}_i = \overline{Y} + b_1 (X_i - \overline{X}) $$
The residual $e_i$ can be written as the difference between the actual and the predicted values:
$$ e_i = Y_i - \hat{Y}_i = Y_i - b_0 - b_1 X_i $$
Sum of residuals is 0:
$$ \sum_i e_i = 0 $$
Sum of squared residuals is minimized
(Plotting quantity Q vs. $b_0$ or $b_1$ would show minimum)
Sum of observed values $Y_i$ equals sum of fitted values $\hat{Y}_i$:
$$ \sum_i Y_i = \sum_i \hat{Y}_i $$
Sum of weighted residuals is zero when weighted by level:
$$ \sum_i X_i e_i = 0 $$
Sum of weighted residuals is also zero when the residual is weighted by the fitted response:
$$ \sum_i \hat{Y}_i e_i = 0 $$
The regression line always passes through the point $(\overline{X}, \overline{Y})$.
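These residual properties are easy to verify numerically; a sketch with simulated data (illustrative values):

```python
import numpy as np

# Simulated data and least squares fit (illustrative values).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 25)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=X.size)

b1 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean())**2).sum()
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
e = Y - Y_hat

print(e.sum())                        # ~ 0
print(Y.sum() - Y_hat.sum())          # ~ 0
print((X * e).sum())                  # ~ 0
print((Y_hat * e).sum())              # ~ 0
print(b0 + b1 * X.mean() - Y.mean())  # line passes through (Xbar, Ybar): ~ 0
```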
For a regression line, we are drawing samples from a distribution. The fewer samples we have, the less representative our estimates of the distribution will be. Each degree of freedom is used to improve an estimate of a parameter, whether it be the mean, the variance, or the coefficients of a linear model.
Suppose we have a set of samples from a given distribution, and we wish to estimate its variance.
Each sample drawn from the distribution gives one degree of freedom (DOF). One DOF goes to estimating the mean $\overline{Y}$. The remainder can give an estimate of the variance of the population. These two quantities characterize the distribution if it is normal.
The real variance $\sigma^2$ is estimated with $s^2$:
$$ s^2 = \dfrac{ \sum_i (Y_i - \overline{Y})^2 }{ n-1 } $$
Here, $n$ is the number of samples, giving $n$ degrees of freedom. One degree of freedom estimates the mean, so $n-1$ is the number of degrees of freedom left to estimate the variance.
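In NumPy terms, the $n-1$ in the denominator corresponds to the `ddof=1` argument; a sketch comparing the manual formula with `numpy.var` on illustrative data:

```python
import numpy as np

# Illustrative sample from a normal distribution.
rng = np.random.default_rng(2)
Y = rng.normal(loc=5.0, scale=2.0, size=30)
n = Y.size

s2_manual = ((Y - Y.mean())**2).sum() / (n - 1)
s2_numpy = Y.var(ddof=1)   # ddof=1 spends one DOF on the mean
print(s2_manual, s2_numpy)
```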
Alternatively, we saw how to assemble an unbiased linear estimator based on $b_0$ and $b_1$; estimating these two coefficients consumes two degrees of freedom, leaving $n-2$ for estimating the variance.
Using a linear regression model, we can get the sum of squared errors via:
$$ SSE = \sum_i e_i^2 $$
which then gives us MSE:
$$ MSE = \dfrac{SSE}{n-2} = \dfrac{ \sum_i e_i^2 }{n-2} $$
The mean squared error MSE is an unbiased estimator of $\sigma^2$; that is, it satisfies:
$$ E(MSE) = \sigma^2 $$
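A sketch computing SSE and MSE for the fitted line (simulated data, illustrative values); with two coefficients estimated, the denominator is $n-2$:

```python
import numpy as np

# Simulated data and least squares fit (illustrative values).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 25)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, size=X.size)
n = X.size

b1 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean())**2).sum()
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)

SSE = (e**2).sum()
MSE = SSE / (n - 2)   # unbiased estimate of sigma^2 (true sigma^2 = 1 here)
print(SSE, MSE)
```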
Now we can go one step further: specify a model for $\epsilon_i$ by assuming the errors are normally distributed, and use the functional form of the normal (Gaussian) distribution to obtain a maximum likelihood estimator.
Maximum likelihood estimator:
$$ \hat{\sigma}^2 = \dfrac{1}{n} \sum_i \left( Y_i - \hat{Y}_i \right)^2 $$
Unbiased estimator, written in terms of $\hat{\sigma}^2$:
$$ MSE = \dfrac{n}{n-2} \hat{\sigma}^2 $$
Difference? $\frac{1}{n-2}$ versus $\frac{1}{n}$.
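A sketch comparing the two estimators over repeated simulations (illustrative values); on average the $n-2$ denominator recovers $\sigma^2$, while the $1/n$ maximum likelihood version is biased low:

```python
import numpy as np

# Compare MSE (denominator n-2) with the MLE of sigma^2 (denominator n)
# over many simulated data sets. All numeric values are illustrative.
rng = np.random.default_rng(3)
n, beta0, beta1, sigma = 10, 2.0, 0.5, 1.0
X = np.linspace(0.0, 10.0, n)

mse_vals, mle_vals = [], []
for _ in range(20_000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=n)
    b1 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean())**2).sum()
    b0 = Y.mean() - b1 * X.mean()
    sse = ((Y - b0 - b1 * X) ** 2).sum()
    mse_vals.append(sse / (n - 2))
    mle_vals.append(sse / n)

print(np.mean(mse_vals))   # ~ sigma^2 = 1.0      (unbiased)
print(np.mean(mle_vals))   # ~ (n-2)/n * sigma^2  (biased low)
```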