Module 6: What kind of data analysis? (Part 2)


Module 6 is the second part of a 2-part series on data analysis. Whereas Module 5 focused mainly on assessing the credibility and relevance of data, Module 6 focuses on developing an analysis plan. We examine how a good analysis plan is linked to argument and how this plays out in the reading “Queens” by Dube and Harish. We then examine some key considerations in developing an analysis plan.


  • Explain how data analysis is linked to argument.
  • Examine the three main “links” of an analysis plan and how they crop up in social science research.
  • Map out an analysis plan based on your argument, the types of your focal relationship variables, and the structure of your data.
  • Use your own data to create 1-2 visualizations that motivate your question or support your argument.


  • Thu/Fri @ 8pm. Post your Q & A about the week’s reading on Ed Discussion.


Review of Module 5

To get started, watch the video on “Module 5 review.”

Analysis plans as argument

Whereas Module 5 focused on assessing the credibility and relevance of our data, Module 6 focuses on analysis plans: how to conceptualize them, what they look like in practice, and how we can go about developing them.

A running theme throughout the Module is that good data analysis is an extension of an argument: It should be tightly linked to the mechanisms that we lay out in our argument. The next video introduces this idea.

Now watch the video on “Analysis plans as argument.”

Anatomy of a plan: “Queens”

How do these three “links” play out in practice? What do they look like in scholarship? In a moment, I will ask you to read “Queens” by Dube and Harish. I’ve selected this article for two reasons. For one, it tackles a fascinating question: Are states ruled by women less prone to conflict than those ruled by men? In addition, the article nicely exemplifies the three links that we examined in the previous video. Accordingly, as you read, try to answer the following questions:

  • What are the focal relationship variables?
  • What are the mechanisms that link the variables?
  • What are some observable implications of the mechanisms? Can you think of some that the authors do not address?
  • How is the analysis linked to the mechanisms?

We will tackle some of these questions in the video below, but you should try to answer them for yourself before watching. Now read “Queens” by Dube and Harish. When you have finished reading, watch the video on “Anatomy of a plan: Queens.”

Developing your analysis plan

So far, we have examined how data analysis is linked to argument. But there are a few other considerations that we should take into account as we develop our analysis plans. We briefly examine these considerations in the final video. As you watch, think about your own data in some detail: What type of variables are in your focal relationship? What is the overall structure of your data? What are the units of analysis?
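One way to answer the structure question concretely is to check how your rows are identified. The sketch below is in Python purely for illustration (the module exercises themselves use R or Stata). It assumes your data are a list of records, each with a unit identifier (for example, a country) and optionally a time identifier (for example, a year); the function name `describe_structure` and the field names are hypothetical. It labels the data as a cross-section, a balanced panel, or an unbalanced panel, and flags duplicate unit-period rows.

```python
from collections import Counter


def describe_structure(rows, unit_key, time_key=None):
    """Roughly classify a list of dict records by data structure.

    rows: list of dicts, one per observation (hypothetical format)
    unit_key: field that identifies the unit of analysis
    time_key: field that identifies the time period, if there is one
    """
    units = Counter(row[unit_key] for row in rows)
    if time_key is None:
        # Without a time identifier, each unit should appear exactly once.
        if all(count == 1 for count in units.values()):
            return "cross-section"
        return "repeated units without a time identifier"

    pairs = Counter((row[unit_key], row[time_key]) for row in rows)
    if any(count > 1 for count in pairs.values()):
        # The same unit-period combination appears more than once.
        return "duplicate unit-period rows"

    periods = {row[time_key] for row in rows}
    if len(periods) == 1:
        return "cross-section"
    # Balanced means every unit is observed in every period.
    if all(count == len(periods) for count in units.values()):
        return "balanced panel"
    return "unbalanced panel"


panel = [
    {"country": "A", "year": 2000},
    {"country": "A", "year": 2001},
    {"country": "B", "year": 2000},
    {"country": "B", "year": 2001},
]
print(describe_structure(panel, "country", "year"))  # balanced panel
```

In Stata, `isid country year` performs the uniqueness part of this check; whether your data turn out to be a cross-section or a panel is exactly the kind of fact that should shape your choice of model.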

The reason to keep your answers to these questions in mind is that, ultimately, they should inform your analysis. For instance, in Module 5, Papachristos et al. used so-called Poisson regressions rather than OLS because their dependent variable was a “count” variable that took the value 0 for many observations. Meanwhile, in Module 2, Albertus and Deming used fixed effects regression analysis because they were using panel data and worried that omitted country-level variables might otherwise bias their coefficient estimates.
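To see why fixed effects help with omitted country-level variables, consider a stylized toy panel (made-up numbers, not Albertus and Deming’s data or specification). Assume the true effect of x on y is 2, but each country also has an unobserved, time-invariant trait that raises y and is correlated with x. The Python sketch below (for illustration only; in practice you would use something like Stata’s `xtreg y x, fe` or add country dummies to `lm()` in R) compares pooled OLS with the textbook “within” estimator, which subtracts each country’s own mean from x and y before running OLS. The helper names are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a one-regressor OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def demean_by_unit(values, units):
    """Subtract each unit's own mean: the 'within' transformation."""
    sums, counts = {}, {}
    for v, u in zip(values, units):
        sums[u] = sums.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - sums[u] / counts[u] for v, u in zip(values, units)]


def within_slope(x, y, units):
    """Unit fixed-effects slope: OLS on unit-demeaned x and y."""
    return ols_slope(demean_by_unit(x, units), demean_by_unit(y, units))


# Toy panel: y = 2 * x + country effect (A: 0, B: 10); B also has higher x.
units = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 18, 20, 22]

print(f"pooled OLS slope: {ols_slope(x, y):.2f}")           # 4.57
print(f"fixed effects slope: {within_slope(x, y, units):.2f}")  # 2.00
```

Pooled OLS attributes part of country B’s higher baseline to x and overstates the effect; demeaning within each country removes any time-invariant country trait, so the fixed effects slope recovers the true value in this noiseless example.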

In this vein, upon watching the video, it may be worth reviewing your notes on the readings from past Modules. In particular: What sort of analysis did different authors perform? How does their analysis plan seem to be shaped by data and variable considerations?

Now watch the video on “Developing your analysis plan.”

Stata and R Exercise: Visualizing your data

The Module 6 exercise is intended to get you thinking about the first 1-2 pieces of your final data analysis, which will very likely consist of some kind of table, plot, map, or other visualization. If you are a Stata user, you should complete the exercise in Stata; Stata has a user-friendly interface for generating simple plots such as histograms, scatterplots, and barplots. Note that in order to complete the exercise, you will need to have your data in hand. You do not have to submit your completed exercise.

Before completing the exercise, let’s reflect on the properties of good data visualization by reading “Aesthetics and technique in data graphical design” by Tufte. Go ahead and read the chapter.

When you have finished reading, follow the instructions that I have pasted below. If you are completing the Module exercises in R, note that these instructions are shown in the RMD file:

Exercise instructions

Be sure to read Tufte’s chapter on “Aesthetics and technique in data graphical design” before you complete this exercise. Think about Tufte’s advice about how to create an impactful graphic and try to implement it below. In particular, label your graphics, use nice colors, and tell a story.

A: Univariate description

Create two visualizations of the univariate distributions of your two main variables (that is, your focal relationship variables). Think of this as presenting your data to your readers. Be sure to consider your variables’ types and select your visualization type accordingly (e.g., don’t create a histogram for an indicator variable; use a barplot instead).
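The rule of thumb above (match the plot to the variable type) can be written down as a small decision rule. The Python sketch below is illustrative only; in R you would draw the plots with `hist()` or `barplot()`, and in Stata with `histogram` or `graph bar`. The function names and the classification heuristic are hypothetical assumptions, and they are no substitute for knowing how your variable was actually measured.

```python
def variable_type(values):
    """Heuristically classify a variable from its observed values.

    A rough, hypothetical rule of thumb: for example, a count variable
    whose observed values happen to be only 0 and 1 is labeled an
    indicator here.
    """
    if not values:
        raise ValueError("variable has no observations")
    if set(values) <= {0, 1}:
        return "indicator"
    if any(isinstance(v, str) for v in values):
        return "categorical"
    if all(isinstance(v, int) and v >= 0 for v in values):
        return "count"
    return "continuous"


# Discrete categories get bars; numeric distributions get histograms
# (for counts, one bin per integer value usually works well).
PLOT_FOR_TYPE = {
    "indicator": "barplot",
    "categorical": "barplot",
    "count": "histogram",
    "continuous": "histogram",
}


def suggest_univariate_plot(values):
    """Suggest a univariate plot type for a variable's observed values."""
    return PLOT_FOR_TYPE[variable_type(values)]


print(suggest_univariate_plot([0, 1, 1, 0]))        # barplot
print(suggest_univariate_plot([0, 0, 3, 7, 1]))     # histogram
print(suggest_univariate_plot([1.5, 2.25, 3.0]))    # histogram
```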

B: Create a bivariate graph

Create a visualization that begins to capture your theory about your focal relationship. That is, create a visualization of the association between your two main variables of interest. Think of this step as presenting your story (argument) to your readers.
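Before plotting, it can help to compute the numbers your bivariate graph will summarize. If your explanatory variable is an indicator or categorical, a barplot or dot plot of the outcome’s mean within each group is a simple starting point; if both variables are numeric, a scatterplot is the standard choice, and a correlation coefficient summarizes it. The Python sketch below (illustrative only, with made-up toy values; in R, `tapply(y, x, mean)` and `cor(x, y)` compute the same quantities) assumes your variables are plain lists of equal length. The helper names are hypothetical.

```python
from math import sqrt


def group_means(x, y):
    """Mean of y within each level of a categorical or indicator x."""
    totals, counts = {}, {}
    for xi, yi in zip(x, y):
        totals[xi] = totals.get(xi, 0.0) + yi
        counts[xi] = counts.get(xi, 0) + 1
    return {level: totals[level] / counts[level] for level in sorted(totals)}


def pearson_r(x, y):
    """Pearson correlation between two numeric variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


# Toy values for illustration only.
treated = [0, 0, 0, 1, 1, 1]
outcome = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(group_means(treated, outcome))          # {0: 3.0, 1: 6.0}
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

Neither number replaces the graph itself; they are a quick check that the pattern you plan to show is actually in your data.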


Using your data, create the ugliest and most useless graphic you can imagine. What makes it bad / useless?