Module 5: What kind of data analysis? (Part 1)


Module 5 examines data and data analysis. We examine how to evaluate the relevance and credibility of secondary data for our research, as well as how to justify our data to readers. Module 5 also introduces the concept of focal relationships and examines how we can use them to guide the development of our data analysis plan. We then look at how this focal-relationship approach plays out in “More coffee, less crime” by Papachristos et al. Finally, the module introduces regression analysis with panel data.


  • Evaluate the relevance and credibility of secondary data for your independent research; justify your data for readers.
  • Explain the concept of a focal relationship and how it should guide our data analysis (and all other aspects of our research).
  • Grasp the intuition behind fixed effects (FE) regression when analyzing panel data. Implement a simple FE regression in R and/or Stata.


  • Thu/Fri @ 8pm. Post your Q & A about the week’s reading on Ed Discussion.


Recap. of Module 4

Before we dive into Module 5 material, watch the video on “Recap. of Module 4.”

Evaluating and justifying your data

Most of you will analyze secondary data this quarter. Secondary data are data that you do not personally collect via surveys, interviews, and so forth. Instead, someone else collected them for a different study, and you are re-purposing them for your own. You must therefore scrutinize the data: Are they credible and relevant for your purposes? What are their limitations? You must also answer these questions for your readers in your final research report.

The first video examines how to go about performing this sort of assessment. Watch the video on “Assessing and justifying your data.”

Using focal relationships to develop your analysis plan

Once you have obtained some (relevant and credible) data, you can now begin developing your analysis plan. How should you go about doing this? In the video below, I advocate an approach that centers on so-called “focal relationships.”

However, before you watch the video, you should read “More coffee, less crime” by Papachristos et al. Pay close attention, in particular, to the authors’ analysis. Here are some questions to guide you toward that end:

  • The authors’ analysis proceeds in several steps. How would you describe these different steps?
  • How do the authors familiarize readers with their data (i.e., trends and patterns)?
  • What kind of regression analysis do the authors perform? Read the details and write them down.
  • Why do the authors select this sort of regression? How do they justify their model selection?

Now watch the video on “Developing an analysis plan.”

Fixed effects regression using panel data

In “More coffee, less crime,” the authors analyze panel data. Panel data contain repeated observations on the same units (e.g., neighborhoods or states) over time. Many of you will analyze some type of panel data this quarter, which creates some unique analytical opportunities and issues. I have therefore created a brief video on how to run so-called “fixed effects” regression with panel data.
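To make the intuition concrete, here is a minimal sketch of the standard entity fixed effects model; the notation is my own and is not taken from the video. For unit $i$ observed in period $t = 1, \dots, T$, with outcome $y_{it}$, independent variable $x_{it}$, a unit-specific intercept $\alpha_i$ that captures everything about unit $i$ that does not change over time, and an error $u_{it}$:

$$
y_{it} = \beta x_{it} + \alpha_i + u_{it}
$$

Averaging over time within each unit gives $\bar{y}_i = \beta \bar{x}_i + \alpha_i + \bar{u}_i$, where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$ (and likewise for $\bar{x}_i$ and $\bar{u}_i$). Subtracting this from the first equation removes $\alpha_i$:

$$
y_{it} - \bar{y}_i = \beta \left( x_{it} - \bar{x}_i \right) + \left( u_{it} - \bar{u}_i \right)
$$

Because $\alpha_i$ drops out, $\beta$ is estimated only from variation within each unit over time, so time-invariant unit characteristics can no longer confound the estimate.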

In the video, I mainly use R, but I also include some code for running fixed effects regression in Stata. Note that the example in the video is drawn from “Econometrics in R,” which is an excellent online resource for anyone who is interested. Here is a link to the example from which I have drawn.

Watch the video on “Intro. to panel data.” NOTE: On one slide, I incorrectly label the IV and DV. Throughout the regression examples, the correct DV is rate, which is the number of traffic deaths per 10,000 people in a given state-year. The IV is beertax, which is the tax on a case of beer (expressed in 1988 dollars).
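The arithmetic behind the fixed effects (“within”) estimator can be sketched in a few lines. The example below is illustrative only: it uses Python rather than the R/Stata code from the video, and the three-unit toy panel is invented (it is not the traffic-fatality data). The data are built so that units with higher x also have higher time-invariant baselines, while within each unit a one-unit increase in x lowers y by exactly 1. Pooled OLS, which ignores the panel structure, picks up the between-unit pattern; demeaning within each unit recovers the within-unit effect.

```python
from collections import defaultdict

# Hypothetical toy panel: 3 units observed in 3 periods each.
# True model: y = alpha_unit - 1.0 * x, with alpha_A=0, alpha_B=10, alpha_C=20.
units = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [-1, -2, -3, 6, 5, 4, 13, 12, 11]


def pooled_slope(xs, ys):
    """OLS slope of y on x (with an intercept), ignoring which unit each row is."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_slope(units, xs, ys):
    """Fixed effects (within) slope: demean x and y by unit, then regress."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # unit -> [sum x, sum y, count]
    for u, x, y in zip(units, xs, ys):
        totals[u][0] += x
        totals[u][1] += y
        totals[u][2] += 1
    x_dm, y_dm = [], []
    for u, x, y in zip(units, xs, ys):
        sx, sy, n = totals[u]
        x_dm.append(x - sx / n)  # deviation from the unit's own mean of x
        y_dm.append(y - sy / n)  # deviation from the unit's own mean of y
    # Demeaned variables have mean zero, so no intercept is needed.
    sxy = sum(a * b for a, b in zip(x_dm, y_dm))
    sxx = sum(a * a for a in x_dm)
    return sxy / sxx


print(f"pooled OLS slope: {pooled_slope(xs, ys):.2f}")   # pooled OLS slope: 2.00
print(f"FE (within) slope: {within_slope(units, xs, ys):.2f}")  # FE (within) slope: -1.00
```

In practice you would not demean by hand: in R, the plm package's `plm()` with `model = "within"` performs this transformation, and in Stata, `xtset` followed by `xtreg ..., fe` does the same.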

Module 5 R Exercise: Maps in R

This week’s R exercise centers on map making. The Papachristos et al. reading offers some excellent examples of how maps can help support a claim. Download the RMD file from Canvas along with the accompanying .xlsx data on chicago_crime. Complete the RMD and knit your file to PDF. Note that in order to complete the exercise, you must first complete the first portion of the Module 4 R exercise. Specifically, complete all steps up to the merge, and then save your cleaned GDP per capita data as a .csv file. You will load this dataset into R as part of this week’s exercise.