Module 5 builds directly on Module 4 by examining how we can measure and report the uncertainty surrounding our estimates of population parameters. In particular, it introduces the concepts of standard error and confidence intervals. It then aims to build your familiarity with these concepts through a set of practice questions that will help you simulate and visualize their core logic.
- Explain what standard error and confidence intervals (CIs) capture, conceptually
- Link these concepts to the CLT and properties of the normal distribution
- Calculate standard error and build CIs for numeric and indicator variables manually
- Use R to calculate standard error, build CIs, and examine the mechanics of these concepts
- Grasp the precise meaning of CIs by building and visualizing 100 CIs around sample means drawn from a simulated population
Assignment 2 is due at 5pm CST on Monday, 2/15. The Assignment 2 RMD file is located at Canvas. Be sure to download and skim through the file early in the week so that you can plan your time accordingly.
Foundations for inference
We cover less new conceptual terrain this week than we did in Module 4. This is partly because we want to focus in this module on linking the different concepts that we have examined so far. In particular, as we examine standard error and confidence intervals below, you should at times pause and try to answer one very important question: “How are these ideas linked to the CLT and/or properties of the Normal distribution?”
Because the CLT and normal distribution are so critical for this week’s module, it’s worth taking a moment to revise them. Toward that end, watch the video on “Foundations for inference 2.”
Standard error and confidence intervals
Now read OIS, Sections 4.1 through 4.2 (pp. 169-179). Note that the Syllabus states that you should also read Section 4.3. This is an error; we will read Section 4.3 next week when we begin performing hypothesis tests.
Now watch the videos on “Standard error” and “Confidence intervals”.
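As a rough sketch of the manual calculation for a numeric variable, the R code below computes a standard error and a 95% CI around a sample mean. The data are simulated, and the object names (x, x_bar, se, ci) are assumptions made for illustration rather than anything taken from the course files. It uses the normal critical value, which assumes a reasonably large sample.

```r
# Hypothetical data: a simulated sample, used only for illustration
set.seed(42)
x <- rnorm(n = 50, mean = 100, sd = 15)

n     <- length(x)      # sample size
x_bar <- mean(x)        # sample mean (our point estimate)
s     <- sd(x)          # sample standard deviation
se    <- s / sqrt(n)    # standard error of the sample mean

# Critical value for a 95% CI from the standard normal (about 1.96)
z <- qnorm(0.975)

# 95% CI: point estimate plus or minus z standard errors
ci <- c(lower = x_bar - z * se, upper = x_bar + z * se)
ci
```

Changing 0.975 to 0.995 in qnorm() would give a 99% CI instead, which is wider because it has to capture the parameter more often.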
Extending confidence intervals to indicator variables
Let’s now see how we might extend the logic of CIs to indicator variables. We have already encountered indicator variables while working with the British Election Study (BES) data in Exercise 1 and Assignment 1: In particular, we created the indicator variable female, which took the value 1 if the respondent was female and 0 if the respondent was male. Can we apply CIs to this sort of variable?
Think about this for a moment: In Assignment 1, many of you correctly noted that female is a (nominal) categorical variable, and accordingly, it seems strange to calculate its mean and standard deviation. But many of you also noted that the mean does supply us with some useful information about the variable: namely, the proportion of respondents in the data that are female. This is a good intuition. When we deal with indicator variables, we are – strictly speaking – interested in a proportion rather than an average or central tendency (which are captured by the arithmetic mean). Accordingly, we have to shift our thinking and terminology somewhat. But don’t be thrown by this shift! Although the next video introduces some new parameters (π) and estimators (p-hat) – as well as some math – the concepts themselves are fairly straightforward. For instance, you will find that the formula for calculating the standard deviation of the sample proportion (p-hat) initially looks very similar to the formula for calculating the standard deviation of a numeric variable.
With this in mind, watch the video on “Confidence Intervals: Extensions”.
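To make the shift from means to proportions concrete, here is a minimal sketch in R. It assumes a simulated 0/1 vector named female (hypothetical data, not the BES data) and uses the usual large-sample formula for the standard error of a sample proportion, sqrt(p-hat × (1 − p-hat) / n). The names p_hat, se_p, and ci_p are illustrative choices.

```r
# Hypothetical indicator variable: 1 = female, 0 = male (simulated, not BES data)
set.seed(1)
female <- rbinom(n = 400, size = 1, prob = 0.5)

n     <- length(female)
p_hat <- mean(female)   # the mean of a 0/1 variable is the sample proportion

# Standard error of the sample proportion
se_p <- sqrt(p_hat * (1 - p_hat) / n)

# 95% CI for the population proportion, using the normal critical value
z    <- qnorm(0.975)
ci_p <- c(lower = p_hat - z * se_p, upper = p_hat + z * se_p)
ci_p

# For comparison: sd() divides by n - 1, so this is close to, but not
# exactly equal to, se_p
sd(female) / sqrt(n)
```

The last line is there only to show that treating the indicator as if it were numeric gives almost the same standard error; the two differ slightly because sd() uses n − 1 in its denominator.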
Note that we did not read the sections from OIS that deal with the sample proportion. If you would like to do so, the relevant section is Section 3.3 (“Geometric distribution”) on pages 141-145. (The following section on the “Binomial distribution” is also useful.) We will read the section on the sampling distribution of the sample proportion next week (Section 4.5).
Now complete the practice questions, which can be downloaded at the “Files/Practice Questions” section at Canvas. You’ll note that the functions that you used to complete Exercise 2 will be very handy here. In fact, we are going to “prove” the logic of CIs in the very same way that we “proved” the CLT: experimentally, through simulation. The real payoff in using R to simulate our data and build CIs is that we can actually visualize what we mean when we interpret our CIs by saying that “we constructed this CI using a method that produces CIs that contain the true population parameter 95 of 100 times.” Let’s attach some real meaning to this phrase!
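The simulation idea can be sketched roughly as follows. This is not the practice-question code: the population, the sample size, and every object name (population, mu, cis, covers) are assumptions chosen for illustration. The sketch draws 100 samples from a simulated population, builds a 95% CI around each sample mean, and then counts and plots how many of those CIs contain the true population mean.

```r
set.seed(2021)

# Hypothetical population with a known mean (simulated for illustration)
population <- rnorm(100000, mean = 50, sd = 10)
mu <- mean(population)          # the "true" parameter we are trying to capture

n_samples <- 100                # number of CIs to build
n         <- 100                # size of each sample
z         <- qnorm(0.975)       # critical value for a 95% CI

# Draw a sample, then compute its mean and 95% CI; repeat n_samples times
cis <- t(replicate(n_samples, {
  s  <- sample(population, size = n)
  m  <- mean(s)
  se <- sd(s) / sqrt(n)
  c(mean = m, lower = m - z * se, upper = m + z * se)
}))
cis <- as.data.frame(cis)

# Does each CI contain the true population mean?
cis$covers <- cis$lower <= mu & cis$upper >= mu
sum(cis$covers)                 # should be near 95, but varies by seed

# Plot each CI as a horizontal segment; CIs that miss mu are drawn in red
plot(NULL, xlim = range(cis$lower, cis$upper), ylim = c(1, n_samples),
     xlab = "Value", ylab = "Sample number")
segments(x0 = cis$lower, y0 = seq_len(n_samples),
         x1 = cis$upper, y1 = seq_len(n_samples),
         col = ifelse(cis$covers, "grey40", "red"))
abline(v = mu, lty = 2)         # dashed line at the true mean
```

Rerunning with a different seed will change which CIs miss and exactly how many do. That is the point: the 95% describes how the method performs over repeated samples, not any single interval.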
Once you have completed the practice questions, complete Assignment 2. You can download the RMD at the “Files/Exercises and Assignments” section at Canvas. Submit your completed PDF via Canvas by 5pm on Monday, 2/15. I repeat my standing admonition that you collaborate with your fellow students to complete the practice questions, exercises, and assignments. Of course, this does not mean copy-pasting answers or code; it does mean that you should talk about the questions, discuss your answers, and help each other to better understand both the concepts and code.