Module 2: Summarizing data, numerically and visually

SOSC 13200-2


Module 2 asks you to engage with some fundamental questions: What are data? Cases? Variables? It also asks you to grapple with critical issues surrounding measurement and to consider how these issues play out in common measures of human behavior and well-being. Finally, the Module draws on the data from Card & Krueger (1994) to introduce some important statistical and visual tools for exploring and describing single variables.


  • Define data, cases, and variables
  • Explain the qualities of “good” measurement; explore how measurement issues play out in social science research
  • Grasp the intuitions behind common measures of central tendency and spread; become familiar with their notation and learn to calculate them in R
  • Explore data in R using summary functions, tables, and plots

To Do

Download, complete, and submit Assignment 2 by 11:5pm on 1/24. The file will be available for download on Tuesday, 1/18. I recommend that you preview the Assignment shortly after it is posted so that you can plan your time accordingly.

…And Now, The Module

What are data?

Start by watching the video on “Data” below.

Variables and measurement

As the video mentions, we will dig into some actual data as we complete the practice questions. Assignment 2 is also an opportunity to replicate portions of Card & Krueger’s own summary of their data. But first, watch the video on “Variables & Measurement.” Note that the video will occasionally ask you to press “pause” and spend some time answering “class questions.” You don’t have to submit your answers to these questions. However, quickly jotting down your answers may be useful, as we may circle back to some of the questions in our next class meeting.

How do the issues surrounding conceptual clarity, validity, and reliability play out in real life? In a moment, you will read “Measuring and Understanding Behavior, Welfare, and Poverty” by Nobel Prize-winning economist Angus Deaton. Before you read, take a moment to answer these questions:

  • What measures of human welfare and poverty can you think of?
  • How good do you think these measures are?
  • What are some potential problems with the conceptual clarity, validity, and reliability of these measures?

Now read the article. As you read, think about the different examples that Deaton lays out. What sort of violation does each one exemplify (conceptual clarity, validity, or reliability)? And critically, what is at stake?

Summarizing variables numerically

Before watching the videos below, you should first read the assigned excerpts from the Verzani (Simple R) and Wickham (ggplot2) texts. These will be useful as you engage with, and in some cases follow along with, the videos.

You should also read Card & Krueger’s (1994) article on “Minimum Wages and Employment.” Many of you will remember the article from the fall quarter. As we revisit the article now, we are mainly concerned with getting a strong grasp of the authors’ data: What are the units? What sorts of variables do the data contain? How do the authors describe their data, numerically and visually? Accordingly, pay particular attention to the authors’ description of the dataset as well as any tables and figures that summarize the different variables.

Once you have completed the readings above, watch the videos on “Summarizing Variables Numerically” and “Summarizing Variables Visually.”

Let’s get some additional practice with summarizing data in R. On Monday evening, I will post a practice exercise for the week that will guide you through some basic operations that you will later apply on Assignment 2. The practice exercise will be located on Canvas under “Files/Practice Exercises.” Recall that although you should complete the practice exercise, you do not have to submit it to me for a grade. It is for your learning only.
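To give a flavor of what summarizing a single variable in R can look like, here is a minimal sketch using only base R functions. The vector `wage` and its values are made up for illustration; they are not taken from Card & Krueger’s data, and the practice exercise may use different functions or variable names.

```r
# Hypothetical starting wages (in dollars) for eight restaurants.
# These values are invented for illustration only.
wage <- c(4.25, 4.25, 4.50, 4.75, 5.00, 5.05, 5.05, 5.25)

# Measures of central tendency
wage_mean   <- mean(wage)     # arithmetic average
wage_median <- median(wage)   # middle value of the sorted data

# Measures of spread
wage_sd    <- sd(wage)        # sample standard deviation (divides by n - 1)
wage_range <- range(wage)     # smallest and largest values
wage_iqr   <- IQR(wage)       # distance between the 1st and 3rd quartiles

# A quick numeric overview, a frequency table, and a histogram
summary(wage)
table(wage)
hist(wage, main = "Hypothetical starting wages", xlab = "Wage (dollars)")
```

If a variable contains missing values, functions such as `mean()`, `median()`, and `sd()` return `NA` unless you add the argument `na.rm = TRUE`.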

Remember: If you get stuck at any point… breathe. Coding can be frustrating at first, but we will work through it together. There are lots of ways to seek help:

  1. Use the “Help” tab in RStudio
  2. Internet search
  3. Post your question to Ed Discussion
  4. As a final option, email me directly or visit me during my office hours

As you seek help, try to specify the nature of the problem: Examine any warnings or error messages. What line of code seems to be the issue? Which function, specifically? (During knitting, Markdown will often tell you which line of code is stalling the knitting process.) If you are getting error messages, are you missing parentheses, commas, or quotation marks? (This happens to me all the time.) Answering these questions will help to ensure that you get the help you need.
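As a small illustration of the kinds of typos mentioned above, the sketch below shows two broken lines next to their corrected versions. The object names `avg` and `greeting` are hypothetical, and the broken lines are commented out so that the chunk still runs.

```r
# Broken: the closing parenthesis for mean() is missing, so R cannot
# parse the line.
# avg <- mean(c(1, 2, 3)

# Fixed: every opening parenthesis has a matching closing parenthesis.
avg <- mean(c(1, 2, 3))

# Broken: without quotation marks, R looks for an object named hello.
# greeting <- hello

# Fixed: quotation marks tell R that this is text, not an object name.
greeting <- "hello"
```

In the console, an unfinished line such as the first broken example typically leaves you at a `+` prompt, which means R is waiting for the rest of the expression.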