Module 4: Foundations for inference 1

Overview

Module 4 introduces foundations for statistical inference. In particular, it introduces the concepts of random variables, probability density functions, the law of large numbers, sampling distributions, and the central limit theorem. In addition to introducing these concepts, the Module aims to help you become familiar with their mechanics by applying them in R using simulated data.

Objectives

  • Define random variables and probability density functions (PDFs), and explain how they are used in statistics
  • Become familiar with some common PDFs and the real-world processes that they model (describe)
  • Explain what the Law of Large Numbers tells us
  • Explain what the Central Limit Theorem says and why it matters

Exercise 2

Exercise 2 will ask you to explain the mechanics of the Central Limit Theorem using simulated data and plots. It is due at 5pm CST on Monday, 2/08. It can be downloaded from the “Files/Assignments” section on Canvas.

And now, the Module

We switch gears somewhat in this module. So far, we have focused on actual, observed data (samples). We now want to begin thinking about how we can use these data to draw inferences about the populations from which they are drawn. Toward that end, we are going to examine some foundational statistical concepts. We start with the concepts of random variables and probability density functions (PDFs).

Start by reading OIS Sections 3.1 and 3.2 on “The Normal Distribution.” Then watch the video on “Random Variables and PDFs.”

If you would like to explore these topics in greater depth, the Syllabus lists some suggested but optional readings on “Probability” and “Random Variables and Continuous Distributions.”

Now watch the video on “The Normal Distribution.”

The Module 4 practice questions (parts 1 and 2) will further familiarize you with PDFs by guiding you through an exploration of some especially common ones using simulated data and plots. In the meantime, before moving on, take a moment to answer a few fundamental conceptual questions:

  • Conceptually, what are random variables?
  • What is their relationship to the observed variables that we have been working with up until now?
  • And, similarly, what are PDFs?
  • What are some of the real-life, chance processes modelled by one or two of the PDFs from the video?
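One way to connect these questions to something concrete: a random variable is a chance process that generates values, and its PDF describes how likely values in each region are. As a minimal sketch, assuming Python's standard library in place of the R used in the course, the code below writes out the normal PDF, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)), then simulates many draws from a normal random variable and checks what fraction land within one and two standard deviations of the mean (roughly 68% and 95%). The helper names `normal_pdf` and `share_within`, the parameters μ = 100 and σ = 15, and the seed are hypothetical choices for illustration.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Normal(mu, sigma) random variable evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def share_within(draws, mu, sigma, k):
    """Fraction of simulated draws lying within k standard deviations of mu."""
    return sum(abs(x - mu) <= k * sigma for x in draws) / len(draws)

if __name__ == "__main__":
    rng = random.Random(42)           # fixed seed so the results are reproducible
    mu, sigma = 100.0, 15.0           # hypothetical population parameters
    draws = [rng.gauss(mu, sigma) for _ in range(100_000)]

    # The density is highest at the mean and symmetric around it.
    print(round(normal_pdf(mu, mu, sigma), 4), round(normal_pdf(mu + sigma, mu, sigma), 4))
    # Compare with the familiar 68% / 95% rule of thumb.
    print(round(share_within(draws, mu, sigma, 1), 3))
    print(round(share_within(draws, mu, sigma, 2), 3))
```

Swapping `rng.gauss` for other generators in the `random` module (for example `rng.uniform` or `rng.expovariate`) is a quick way to see how differently shaped PDFs produce differently shaped piles of simulated values.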

Now read Lane, Introductory Statistics, pp. 300–315. The reading is available as a PDF file in the “Files/Readings” section on Canvas.

Now watch parts 1 and 2 of the video on the “LLN and CLT” (the Law of Large Numbers and the Central Limit Theorem).
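For reference, here is one common informal way to state the two results. Let X_1, ..., X_n be independent draws from a population with mean μ and finite standard deviation σ, and let X̄_n be their sample mean. The LLN says the sample mean settles toward the population mean as n grows; the CLT says that, for large n, the sampling distribution of the sample mean is approximately normal with standard error σ/√n, whatever the shape of the population:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\longrightarrow\; \mu \quad \text{as } n \to \infty \qquad \text{(LLN)}$$

$$\bar{X}_n \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu,\ \frac{\sigma^2}{n}\right) \quad \text{for large } n \qquad \text{(CLT)}$$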

The best way to really grasp the LLN and CLT is to first become very familiar with their mechanics. We will do this in two ways. First, we will spend some time working with a Java-based simulation of the Central Limit Theorem at http://onlinestatbook.com/stat_sim/sampling_dist/. Second, in Exercise 2, we will explore the CLT using our own simulated data and graphics, and teach the concept to the general public via a very brief article.
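The mechanics of the LLN can also be seen in a few lines of code. As a minimal sketch, assuming Python's standard library in place of the R used in the course, the code below rolls a simulated fair six-sided die many times and tracks the running mean. A single roll has expected value 3.5, so the LLN predicts that the running mean wanders early on and then stays close to 3.5 as the number of rolls grows. The helper `running_means` and the seed are hypothetical.

```python
import random

def running_means(values):
    """Return the mean of the first 1, 2, ..., len(values) values."""
    means, total = [], 0.0
    for i, v in enumerate(values, start=1):
        total += v
        means.append(total / i)
    return means

if __name__ == "__main__":
    rng = random.Random(1)                                 # reproducible simulation
    rolls = [rng.randint(1, 6) for _ in range(100_000)]    # fair die: expected value 3.5
    means = running_means(rolls)
    for n in (10, 100, 1_000, 10_000, 100_000):
        # Early running means can sit far from 3.5; later ones stay close to it.
        print(f"after {n:>7,} rolls: running mean = {means[n - 1]:.3f}")
```

Rerunning with different seeds is a useful check: the first few running means change a lot from run to run, while the values after many rolls barely move.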

Watch the “CLT demo” below, in which I briefly demonstrate how to use the simulation at onlinestatbook.com.

Note: While working with the wonky population distribution in the video, at roughly 7 min. 20 sec., I incorrectly say something like “even now the sampling distribution is beginning to look a bit like a normal distribution.” In fact, the opposite is true: after just 2 simple random samples (SRS) of size 10, the sampling distribution looks anything but normal. The point is that it comes to approximate the normal distribution only as we take more and more SRS. Sorry if that error caused confusion.

Now go to http://onlinestatbook.com/stat_sim/sampling_dist/. Spend some time playing with the simulation. In particular:

  • Experiment with parent populations that have various distributions. Start with a normally distributed population, then move to a uniformly distributed one. Then draw some really wonky distributions of your own.
  • Experiment with different sample sizes (n). Start with small samples (e.g., n = 5). Then take a bunch of samples and see what happens to the sampling distribution of your sample statistic. Do it again, this time with a larger sample size. What happens to the sampling distribution of your sample statistic as you change the sample size?
  • Start slow. Use the “animated” button to draw one sample at a time as you get started. Then, gradually speed up the sampling process, taking 5 samples at a time, and so forth. What happens to the mean and standard deviation of the sampling distribution as you complete more repetitions?
  • Compare sampling distributions. Set the lower two sets of axes both to generate sampling distributions of the sample mean, but designate a different sample size for each plot. Then compare the two sampling distributions as you draw more and more samples. How does the sample size affect the resulting distribution?
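The same comparison can be reproduced outside the applet. As a minimal sketch under stated assumptions (Python's standard library in place of R; an exponential parent population with mean 1 and standard deviation 1 standing in for a “wonky,” right-skewed distribution; `sample_means`, `skewed_population`, and `skewness` are hypothetical helpers), the code below draws many SRS of size n, records each sample mean, and summarizes the resulting sampling distribution. Its center should stay near μ = 1, its spread should track σ/√n, and its skewness (2 for the exponential population itself, about 2/√n for means of size n) should shrink toward 0, the value for a symmetric, normal-looking distribution.

```python
import math
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Means of `reps` simple random samples of size n from the population `draw`."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def skewed_population(rng):
    """One draw from a right-skewed exponential population (mean 1, SD 1)."""
    return rng.expovariate(1.0)

def skewness(xs):
    """Sample skewness: about 0 for a symmetric (e.g., normal) distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

if __name__ == "__main__":
    rng = random.Random(2021)              # reproducible simulation
    sigma = 1.0                            # SD of the Exponential(rate = 1) population
    for n in (5, 25, 100):
        means = sample_means(skewed_population, n, reps=5_000, rng=rng)
        print(f"n={n:>3}: mean of means={statistics.fmean(means):.3f}, "
              f"SD of means={statistics.stdev(means):.3f}, "
              f"sigma/sqrt(n)={sigma / math.sqrt(n):.3f}, "
              f"skewness={skewness(means):.2f}")
```

Increasing `reps` plays the role of drawing more and more samples in the applet: it makes the simulated sampling distribution a more faithful picture of the true one, while increasing `n` is what actually narrows it and pulls it toward a normal shape.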

After you have spent some time with the online simulation, complete the Module 4 practice questions. As in past Modules, I have split the practice questions into two parts in order to ease the knitting process.

Once you have completed the practice questions, be sure to complete Exercise 2 on the Central Limit Theorem. As always, please collaborate and deliberate with each other on Ed Discussions. I will continue to monitor Ed Discussions regularly to clarify, guide, and help to resolve unanswered questions.