Module 8: Multivariate regression

Overview

Module 8 builds directly on the concepts that we examined in Module 7 on bivariate regression. It shows how omitted variable bias can undermine our regression results and demonstrates how multivariate regression, when done well, can help to mitigate that bias. It lays out the logic of multivariate regression and helps students begin to run and interpret multivariate regression models and to use them for prediction.

Objectives

  • Explain what omitted variable bias is as well as how multivariate regression can help us to mitigate it.
  • Build and run multivariate regression models in R.
    • Explain the logic behind your IV, DV, and controls.
    • Interpret regression results.
    • Present regression results in publishable tables.
    • Use a regression model for prediction.

Assignment

Exercise 4 is due by 11:59pm CST on 3/08. Be sure to download the RMD for Exercise 4 from Canvas and review it early on in the week so that you can plan your time accordingly.

Module: Multivariate regression

Omitted variable bias

First, read OIS Sections 8.1 and 8.2 on “Multiple Regression”. Then watch the video series on “Multivariate Regression.” As with last week’s Module, I have divided what would normally be a single lecture into a series of short videos on different subtopics within multivariate regression.

Begin by watching the video on “Omitted variable bias: Intuition.”

Now watch the video on “Omitted variable bias: The mechanics.” Note that this video uses maths to demonstrate the intuition that I laid out in the first video. The maths are limited to linear algebra, so I encourage you to follow along with the different steps as best you can. I happen to think that this is one instance in which the maths do help us toward a better grasp of the concept.
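
The algebra in the video can also be checked numerically. The sketch below uses simulated data, not course data; the variable names x1, x2, and y and all parameter values are assumptions chosen purely for illustration. It shows that leaving a correlated regressor out of the model shifts the coefficient on the included regressor by the omitted variable's coefficient times the slope from regressing the omitted variable on the included one.

```r
# Simulated illustration of omitted variable bias (all values are hypothetical)
set.seed(123)
n  <- 1000
x1 <- rnorm(n)                          # regressor of interest
x2 <- 0.6 * x1 + rnorm(n)               # confounder, correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)    # true model: both variables affect y

short <- lm(y ~ x1)         # omits x2
long  <- lm(y ~ x1 + x2)    # controls for x2
aux   <- lm(x2 ~ x1)        # how strongly x2 moves with x1

b_short <- coef(short)[["x1"]]
b_long  <- coef(long)[["x1"]]
g_long  <- coef(long)[["x2"]]
delta   <- coef(aux)[["x1"]]

# OVB identity: short-model slope = long-model slope + omitted coefficient * delta
c(short = b_short, long = b_long, implied_short = b_long + g_long * delta)
```

With these made-up parameters, the short model's slope should land near 2 + 3 × 0.6 = 3.8 rather than the true value of 2, while the long model should recover a slope close to 2.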

Multivariate regression

Now that we have examined OVB, how can multivariate regression help to mitigate it? We examine that question in the next video. Note that in this video and those that follow, I will walk you through some examples using the CASchools data that are contained in the AER package in R. You can follow along. Simply use the code below to load the data and create our dependent and independent variables:

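# Load the data (run install.packages("AER") once if the package is not yet installed)
library(AER)
data(CASchools)

# Dependent variable: average of the reading and math test scores
CASchools$score <- (CASchools$read + CASchools$math) / 2

# Independent variable: student-teacher ratio
CASchools$STR <- CASchools$students / CASchools$teachers

# Our simple model: regress test scores on the student-teacher ratio
model <- lm(score ~ STR, data = CASchools)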

Now watch the videos on “Multivariate regression: The logic” and “Multivariate regression: Prediction”.
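
As a rough sketch of what these two videos cover, the code below extends the simple model with two controls and then predicts a test score for a hypothetical district. The choice of english and lunch as controls, the object names model_mv and new_district, and the values passed to predict() are assumptions for illustration only; the videos may use a different specification.

```r
library(AER)
data(CASchools)

# Same variables as in the simple model
CASchools$score <- (CASchools$read + CASchools$math) / 2
CASchools$STR   <- CASchools$students / CASchools$teachers

# Multivariate model: STR plus two illustrative controls
#   english = percent of English learners
#   lunch   = percent qualifying for reduced-price lunch
model_mv <- lm(score ~ STR + english + lunch, data = CASchools)
summary(model_mv)

# A hypothetical district (values chosen for illustration only)
new_district <- data.frame(STR = 20, english = 10, lunch = 40)

# Predicted average score with a 95% confidence interval
pred <- predict(model_mv, newdata = new_district, interval = "confidence")
pred
```

The STR coefficient in summary(model_mv) is read as the expected change in score for a one-unit change in STR, holding english and lunch constant.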

One of the last points I make in the video on “Prediction” is that we should avoid so-called “kitchen sink” or “garbage can” regression models that contain lots of control variables not rooted in good theory. So what criteria should guide our control strategy? Let’s see what other scholars suggest.

First, read Chapter 4 of John Martin’s book “Thinking through statistics.” What strategies does the author suggest? What pitfalls can crop up as we embark on a control strategy, and how do we avoid them?

How do Martin’s arguments play out in actual scholarship? Earlier in the quarter, we read the first portion of Koch and Nicholson’s article on “Death and Turnout.” You should now read the remainder of the article. Here are some questions to guide your reading and prepare you for the practice questions and exercise:

  • What is the authors’ dependent variable?
  • What is their main independent variable?
  • What is the expected direction of the relationship between IV and DV?
  • What are some variables that may confound this relationship?
  • What variables do the authors control for and why? (Focus on their discussion of Table 4 on p. 942).
  • What do you think about these controls? Do you buy the authors’ reasoning? Why or why not?
  • Finally, pay close attention to the authors’ interpretation of their results. Note the specific language they use to discuss statistical and substantive significance. (As above, focus on their discussion of Table 4).
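
To practice the distinction between statistical and substantive significance before turning to the article's tables, here is a minimal sketch using the CASchools data from earlier in the module. The model specification, the object names model_sig and effect_sd, and the use of a one-standard-deviation change as the yardstick are assumptions for illustration; they are not taken from Koch and Nicholson.

```r
library(AER)
data(CASchools)

CASchools$score <- (CASchools$read + CASchools$math) / 2
CASchools$STR   <- CASchools$students / CASchools$teachers

# Illustrative model: STR with one control
model_sig <- lm(score ~ STR + english, data = CASchools)

# Statistical significance: estimate, standard error, t value, and p value for STR
coef(summary(model_sig))["STR", ]

# 95% confidence interval for the STR coefficient
confint(model_sig, "STR")

# Substantive significance: change in score (in standard deviations of score)
# associated with a one-standard-deviation increase in STR
effect_sd <- coef(model_sig)[["STR"]] * sd(CASchools$STR) / sd(CASchools$score)
effect_sd
```

A coefficient can be statistically distinguishable from zero and still imply only a small shift in the outcome, which is why both pieces of output are worth reporting.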

Practice questions and exercise

Now complete the Module 8 practice questions. These are short and straightforward; I have done much of the coding for you as a way of walking you through running and presenting multivariate regression in R. But note that Exercise 4 builds directly on the practice questions, so be sure to work through them.

Once you have completed the practice questions, be sure to download and complete Exercise 4.