3.1 Simple Linear Regression

Simple linear regression assumes that there is approximately a linear relationship between a predictor variable \(X\) and a quantitative response \(Y\), given by

\[ Y \approx \beta_0 + \beta_1 X. \tag{3.1} \]

Here, \(\beta_0\) and \(\beta_1\) are the coefficients representing the intercept and slope terms in the linear model.

3.1.1 Estimating the Coefficients

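Let

\[ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \tag{3.2} \]

represent \(n\) observation pairs. Our goal is to obtain coefficient estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) such that \(y_i \approx \hat{\beta}_0 + \hat{\beta}_1 x_i\) for \(i = 1, \ldots, n\). Let \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\) be the prediction for \(Y\) based on the \(i\)th value of \(X\). Then \(e_i = y_i - \hat{y}_i\) represents the \(i\)th residual, and we define the residual sum of squares (RSS) as

\[ \mathrm{RSS} = e_1^2 + e_2^2 + \cdots + e_n^2 = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2. \tag{3.3} \]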

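The least squares approach chooses \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize the RSS. Setting the partial derivatives of the RSS with respect to \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to zero, we can show that the minimizers are

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \tag{3.4} \]

where \(\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i\) and \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\) are the sample means.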
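
As a concrete illustration, the closed-form least squares estimates can be computed directly from the sample means. The sketch below is a minimal version using only the Python standard library; the function names `least_squares_fit` and `rss` and the five data points are made up for illustration and do not come from the text.

```python
from statistics import mean


def least_squares_fit(x, y):
    """Return (beta0_hat, beta1_hat) that minimize the RSS for y ~ beta0 + beta1 * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    x_bar, y_bar = mean(x), mean(y)
    # Numerator and denominator of the slope estimate.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


def rss(x, y, beta0, beta1):
    """Residual sum of squares of the line beta0 + beta1 * x on the data."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))


# Toy data, invented for this example.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0_hat, beta1_hat = least_squares_fit(x, y)
print(f"intercept = {beta0_hat:.3f}, slope = {beta1_hat:.3f}")
print(f"RSS at the fit = {rss(x, y, beta0_hat, beta1_hat):.3f}")
# Any other line should give an RSS at least as large.
print(f"RSS after nudging the slope = {rss(x, y, beta0_hat, beta1_hat + 0.1):.3f}")
```

For this toy data the sample means are \(\bar{x} = 3\) and \(\bar{y} = 4\), so the slope is \(6/10 = 0.6\) and the intercept is \(4 - 0.6 \cdot 3 = 2.2\), with an RSS of 2.4.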

3.1.2 Assessing the Accuracy of the Coefficient Estimates

Recall that we assume the true relationship between \(X\) and \(Y\) takes the form \(Y = f(X) + \varepsilon\) for some unknown function \(f\), where \(\varepsilon\) is a mean-zero random error term. If \(f\) is to be approximated by a linear function, then we can write this relationship as

\[ Y = \beta_0 + \beta_1 X + \varepsilon. \tag{3.5} \]

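The error term is a catch-all for what the linear model misses: the true relationship may not be exactly linear, other variables that influence \(Y\) may have been omitted, and there may be measurement error. We typically assume that the error term is independent of \(X\). The model \(Y = \beta_0 + \beta_1 X + \varepsilon\) in Equation (3.5) defines the population regression line, and the least squares estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) in Equation (3.4) characterize the least squares line.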
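
To make the distinction between the two lines concrete, one can simulate data from a known population regression line and compare it with the least squares line fitted to the sample. The sketch below assumes an illustrative model \(Y = 2 + 3X + \varepsilon\) with normally distributed errors of standard deviation 2 and \(X\) drawn uniformly from \([-2, 2]\). These values, the sample size, the seed, and the function names are all assumptions chosen for the example, not taken from the text.

```python
import random


def least_squares_fit(x, y):
    """Closed-form least squares estimates (beta0_hat, beta1_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sxy / sxx
    return y_bar - beta1_hat * x_bar, beta1_hat


def simulate(n, beta0, beta1, sigma, rng):
    """Draw n pairs from Y = beta0 + beta1 * X + eps, with eps independent of X."""
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y


# Assumed population parameters (hypothetical, for illustration only).
TRUE_BETA0, TRUE_BETA1, SIGMA = 2.0, 3.0, 2.0

rng = random.Random(0)  # fixed seed so the run is reproducible
x, y = simulate(100, TRUE_BETA0, TRUE_BETA1, SIGMA, rng)
beta0_hat, beta1_hat = least_squares_fit(x, y)

print(f"population line:    Y = {TRUE_BETA0:.3f} + {TRUE_BETA1:.3f} X")
print(f"least squares line: Y = {beta0_hat:.3f} + {beta1_hat:.3f} X")
```

The population line is fixed but unobserved in practice, while the least squares line depends on the particular sample. Changing the seed produces a different sample and therefore a slightly different fitted line.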