Module 9: OLS Assumptions and Extensions


Module 9 introduces the idea that the OLS estimator is BLUE: it is the best linear unbiased estimator available. But this result holds only when some important assumptions are met. Module 9 thus lays out these assumptions as well as methods for checking for potential violations. It then covers some common OLS extensions, including dummy and categorical independent variables and interaction terms.


  • Explain what we mean when we say that the OLS estimator is BLUE.
  • Grasp the intuition behind the core OLS assumptions.
  • Examine and begin checking assumptions in R; explain what violations of different assumptions mean for statistical inference.
  • Incorporate and interpret dummy and categorical independent variables as well as interaction terms in linear regression.


Assignment 4 is due by 11:59pm CST on Wednesday, 3/17. Be sure to download the Assignment 4 RMD and review it early in the week so that you can plan your time accordingly.

The Module

As you work your way through the video lectures, note that I again draw many examples from the “CASchools” dataset that is available upon installing and loading the “AER” package in R. You should feel free to follow along with these examples in R, pausing the video as necessary. To do so, start by running this code, which loads the required package and creates our main IV and DV:

# Load the package and the data
library(AER)
data("CASchools")

# Dependent variable
CASchools$score <- (CASchools$read + CASchools$math)/2

# Independent variable
CASchools$STR <- CASchools$students/CASchools$teachers

# Our simple model
model <- lm(score ~ STR, data = CASchools)

OLS Assumptions: BLUE

First read OIS Section 8.3 on “Checking Model Assumptions Using Graphs.” Then watch the video lecture series on “OLS Assumptions.”

OLS Assumptions: The Core Assumptions

OLS Assumptions: Checking Assumptions
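
As a rough companion to the graphical checks in the reading, here is a minimal sketch of how one might begin inspecting the simple model's residuals in R. It rebuilds the model from the setup code so that it runs on its own. The choice of diagnostics (a residuals-vs-fitted plot, a normal Q-Q plot, and a Breusch-Pagan test via the lmtest package) is an assumption for illustration and may not match the exact checks used in the videos.

```r
library(AER)          # provides the CASchools data
data("CASchools")

# Rebuild the variables and the simple model from the setup code
CASchools$score <- (CASchools$read + CASchools$math) / 2
CASchools$STR   <- CASchools$students / CASchools$teachers
model <- lm(score ~ STR, data = CASchools)

# Residuals vs. fitted values: curvature hints at nonlinearity,
# a funnel shape hints at non-constant error variance
plot(fitted(model), resid(model),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal residuals
qqnorm(resid(model))
qqline(resid(model))

# Breusch-Pagan test for heteroskedasticity (illustrative choice of test)
bp <- lmtest::bptest(model)
print(bp)
```

A small p-value from the test is evidence against constant error variance, in which case the usual standard errors should be treated with caution.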

OLS Extensions

We have so far mainly included numeric variables in our regression models. These are by far the easiest to interpret. But we might also wish to include other variable types in our models, including dummy and categorical variables. The main challenges here concern interpretation and procedure. In terms of interpretation, we can’t straightforwardly interpret the coefficient for a dummy independent variable as “a change in X is linked to a change in Y.” In terms of procedure, we need to be somewhat careful in our syntax whenever we run a regression that includes a categorical independent variable.
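
To make the syntax point concrete, here is a minimal sketch using the CASchools data. The 10% cutoff used to build Hi_ELL and the three-level income_cat variable are assumptions made up for this illustration; the lecture's own definitions may differ.

```r
library(AER)          # provides the CASchools data
data("CASchools")

CASchools$score <- (CASchools$read + CASchools$math) / 2
CASchools$STR   <- CASchools$students / CASchools$teachers

# Dummy variable (assumed definition): 1 if at least 10% of students
# are English learners, 0 otherwise
CASchools$Hi_ELL <- as.numeric(CASchools$english >= 10)

# The Hi_ELL coefficient is the expected difference in score between
# the two groups, holding STR constant
dummy_model <- lm(score ~ STR + Hi_ELL, data = CASchools)
print(coef(dummy_model))

# Categorical variable (hypothetical): district income split into terciles
CASchools$income_cat <- cut(
  CASchools$income,
  breaks = quantile(CASchools$income, probs = c(0, 1/3, 2/3, 1)),
  labels = c("low", "middle", "high"),
  include.lowest = TRUE
)

# R expands a factor into one dummy per level, minus a reference level.
# Here "low" is the reference, so each coefficient compares to "low".
cat_model <- lm(score ~ STR + income_cat, data = CASchools)
print(coef(cat_model))

# relevel() changes the reference category if another baseline is wanted
CASchools$income_cat2 <- relevel(CASchools$income_cat, ref = "middle")
```

If a categorical variable is stored as numeric codes (1, 2, 3), wrapping it in factor() inside the formula keeps R from treating it as a continuous variable.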

Start by reading Kellstedt and Whitten, The Fundamentals of Political Science Research, pp. 202-212. Then watch the brief video series on “OLS Extensions.”

Extensions: Dummy independent variables

As you watch the next video, note that there is an error on my last slide, which I neglected to correct during recording. Specifically, in the final bullet point on that slide, STR should equal 0. Thus, the correct interpretation should read “On average, when STR = 0, we expect schools with Hi_ELL to have test scores around 692.361 – 19.533 = 672.8.”
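
The arithmetic behind the corrected interpretation can be sketched in a few lines of R. The two coefficient values are the ones quoted above; the object names are hypothetical. Any term involving STR drops out of the prediction because STR = 0.

```r
# Coefficients as quoted in the corrected slide text
intercept <- 692.361   # expected score when STR = 0 and Hi_ELL = 0
b_hi_ell  <- -19.533   # shift in expected score when Hi_ELL = 1

# Predictions at STR = 0 for each group
pred_low_ell  <- intercept + b_hi_ell * 0
pred_high_ell <- intercept + b_hi_ell * 1

print(round(pred_high_ell, 1))   # 672.8
```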

Extensions: Categorical independent variables

Extensions: Interaction terms
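
As a starting point for following along, here is a minimal sketch of an interaction between STR and a dummy variable. As before, the 10% cutoff defining Hi_ELL is an assumption for illustration and may differ from the definition used in the video.

```r
library(AER)          # provides the CASchools data
data("CASchools")

CASchools$score  <- (CASchools$read + CASchools$math) / 2
CASchools$STR    <- CASchools$students / CASchools$teachers
CASchools$Hi_ELL <- as.numeric(CASchools$english >= 10)  # assumed cutoff

# STR * Hi_ELL expands to STR + Hi_ELL + STR:Hi_ELL
inter_model <- lm(score ~ STR * Hi_ELL, data = CASchools)
b <- coef(inter_model)

# The slope on STR now depends on the value of Hi_ELL
slope_low_ell  <- unname(b["STR"])                      # Hi_ELL = 0
slope_high_ell <- unname(b["STR"] + b["STR:Hi_ELL"])    # Hi_ELL = 1

print(c(low = slope_low_ell, high = slope_high_ell))
```

The coefficient on STR:Hi_ELL is the difference between the two slopes, so a value near zero would suggest that the relationship between STR and test scores is similar in both groups.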