# Lesson 2: Statistical Questions

## 2.1: Pencils on A Plot (5 minutes)

### Warm-up

The purpose of this warm-up is for students to review how to represent measurements on a dot plot and how to interpret the data.

### Launch

Arrange students in groups of 2. Distribute rulers marked in inches to each group, and ensure each student has a pencil.

Display the large class dot plot prepared before class for all to see and access. Tell students to measure the length of their pencil to the nearest $$\frac14$$ inch and record their measurement as a dot on the class dot plot. Give each student a dot sticker as a way to record their measurement.

When the class data is recorded, give students 1 minute of quiet work time. Then, ask partners to briefly share their responses and follow with a whole-class discussion.

### Student Facing

1. Measure your pencil to the nearest $$\frac14$$ inch. Then, plot your measurement on the class dot plot.
2. What is the difference between the longest and shortest pencil lengths in the class?
3. What is the most common pencil length?
4. Find the difference in lengths between the most common length and the shortest pencil.

### Anticipated Misconceptions

Some students may struggle with subtracting the shortest pencil length from the longest. Ask if they could use the horizontal axis to find the difference (e.g., by adding up from the shorter length to the longer one).
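The arithmetic behind questions 2 through 4, and the "adding up" strategy along the horizontal axis, can be sketched in Python. This is a minimal illustration, not part of the lesson materials: the pencil lengths are hypothetical (not real class data), they are stored as exact fractions so quarter-inch values do not pick up rounding error, and the function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical class data: pencil lengths in quarter inches (21 means 21/4 = 5 1/4 inches).
QUARTERS = [21, 24, 24, 26, 28, 24, 30, 19]
lengths = [Fraction(q, 4) for q in QUARTERS]


def length_difference(values):
    """Question 2: difference between the longest and shortest lengths."""
    return max(values) - min(values)


def most_common(values):
    """Question 3: the most common length(s). Ties are possible, so return a sorted list."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def count_up(short, long, step=Fraction(1, 4)):
    """'Adding up' along the axis: count the quarter-inch steps from short to long."""
    steps = 0
    position = short
    while position < long:
        position += step
        steps += 1
    return steps


shortest = min(lengths)
print(length_difference(lengths))           # 11/4, that is, 2 3/4 inches
print(count_up(shortest, max(lengths)))     # 11 quarter-inch steps
print(most_common(lengths)[0] - shortest)   # 5/4, that is, 1 1/4 inches (question 4)
```

Counting 11 steps of $$\frac14$$ inch gives the same $$2\frac34$$ inches as subtracting, which is the connection the misconception prompt is meant to draw out. Returning a list from `most_common` reflects that a class dot plot can have more than one most common length.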

### Activity Synthesis

The purpose of the discussion is for students to recognize the usefulness of the dot plot structure.

Ask a student to share their responses for each of the questions. Record and display their reasoning for all to see. After the student shares, ask the class if they agree or disagree and why. Some discussion may arise about how to interpret the most common pencil length. It is okay to allow some ambiguity at this time.

To involve more students in the conversation, consider asking some of the following questions:

• Who can restate ___’s reasoning in a different way?
• Did anyone have the same response but would explain it differently?
• Did anyone find the difference between the shortest and longest lengths in a different way?
• Does anyone want to add on to _____’s reasoning?

## 2.2: What’s in the Data? (15 minutes)

### Activity

In this activity, students reason abstractly and quantitatively (MP2) about numerical data sets to match them with questions that are likely to produce the data. Along the way, they categorize data sets based on whether they contain more than one distinct value and make use of this structure (MP7) to define variability. They see that some survey questions lead to responses that are expected to vary when posed to different people (e.g., How many books did you read last year?), while others produce responses that are likely to be the same (e.g., What year is it this year?).

As students match questions with data sets, look out for different plausible explanations for their choices. Their matches are reasonable if they can explain why the given data could be responses to a question. Identify one student to share the response to each question. Also notice students who offer different but equally reasonable explanations for the same data set; invite them to share later.

### Launch

Tell students that they will be looking at numerical data sets and thinking about what question could produce the responses in each data set. Emphasize that they need to be able to support their matching decisions with reasonable explanations.

If necessary, guide students to understand how to read the table by asking:

• “What values are included in data set A?” (0, 1, 1, 3, 0, 0, 0, 2, 1, 1)
• “To which data set does the number 8 belong?” (data set C)
• “How many people answered '1' to the question that produced the data for data set A?” (4)

Keep students in groups of 2. Give them 5 minutes of quiet work time and 1–2 minutes to share their responses with their partner.

Representing, Conversing: MLR5 Co-Craft Questions. Display the chart with the five data sets without revealing the questions that follow. Ask pairs of students to create possible survey questions that could lead to these data sets. Then, invite pairs to share their questions with the class. Highlight features of each data set that are described when students share the questions they came up with. This will help students make sense of the data creatively, drawing on their experiences with data, before engaging with the specific language of the questions provided in the task.
Design Principle(s): Support sense-making; Cultivate conversation

### Student Facing

Ten sixth-grade students at a school were each asked five survey questions. Their answers to each question are shown here.

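| data set A | data set B | data set C | data set D | data set E |
| --- | --- | --- | --- | --- |
| 0 | 12 | 6 | 6 | 3 |
| 1 | 12 | 5 | 6 | 7 |
| 1 | 12 | 7 | 6 | 9 |
| 3 | 12 | 6 | 6 | 11 |
| 0 | 12 | 4 | 6 | 6 |
| 0 | 12 | 5 | 6 | 4 |
| 0 | 12 | 3 | 6 | 2 |
| 2 | 12 | 4 | 6 | 16 |
| 1 | 12 | 6 | 6 | 6 |
| 1 | 12 | 8 | 6 | 10 |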

1. Here are the five survey questions. Match each question to a data set that could represent the students’ answers. Explain your reasoning.

• Question 1: Flip a coin 10 times. How many heads did you get?

• Question 2: How many books did you read in the last year?

• Question 3: What grade are you in?

• Question 4: How many dogs and cats do you have?

• Question 5: How many inches are in 1 foot?

2. How are survey questions 3 and 5 different from the other questions?

### Anticipated Misconceptions

Some students may have trouble matching questions and data sets because they do not attend carefully to the range of possible responses. For example, they may not notice that a data set containing the value “11” cannot be the set of responses to the first question, which asks about flipping a coin 10 times. Ask them to study the questions and data values more closely, and to look for values that seem unlikely or impossible for a given context.
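The check students are asked to make, looking for values that are impossible in a given context, can be written as a simple rule. The sketch below applies it to the coin-flip question using the five data sets from the task; the function name `could_be_heads_count` is illustrative, not from the lesson.

```python
# The five data sets from the student task (ten students' answers each).
DATA_SETS = {
    "A": [0, 1, 1, 3, 0, 0, 0, 2, 1, 1],
    "B": [12, 12, 12, 12, 12, 12, 12, 12, 12, 12],
    "C": [6, 5, 7, 6, 4, 5, 3, 4, 6, 8],
    "D": [6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    "E": [3, 7, 9, 11, 6, 4, 2, 16, 6, 10],
}


def could_be_heads_count(values, flips=10):
    """True if every value is a whole number of heads that `flips` coin tosses could produce."""
    return all(isinstance(v, int) and 0 <= v <= flips for v in values)


possible = [name for name, values in DATA_SETS.items() if could_be_heads_count(values)]
print(possible)  # ['A', 'C', 'D']
```

The rule only rules out impossible values (data sets B and E). Deciding which of A, C, and D is the most plausible match still calls for the contextual reasoning the activity asks students to explain.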

### Activity Synthesis

The purpose of this discussion is for students to define variability and recognize when it is present.

Select previously identified students to share their choices and explanations. Briefly poll the class after each explanation to see if others made the same choice for the same reason. If not, invite students with different explanations to share.

Discuss how the question about grade level and the one about number of inches in a foot are different from the others. If not mentioned by students, highlight the idea of variability. Explain that we use the term variability to describe data sets in which not every data value is the same. Data sets B and D are unlike the other sets because they show no variability. In future lessons, we will look deeper into the concept of variability and what it can tell us about the data we have collected.
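The definition of variability used here (not every data value is the same) translates directly into a one-line test. This is a minimal sketch; the name `has_variability` is hypothetical, and the data are the five data sets from the task.

```python
# The five data sets from the student task (ten students' answers each).
DATA_SETS = {
    "A": [0, 1, 1, 3, 0, 0, 0, 2, 1, 1],
    "B": [12, 12, 12, 12, 12, 12, 12, 12, 12, 12],
    "C": [6, 5, 7, 6, 4, 5, 3, 4, 6, 8],
    "D": [6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
    "E": [3, 7, 9, 11, 6, 4, 2, 16, 6, 10],
}


def has_variability(values):
    """True when not every data value is the same.

    A data set with a single value, or with one value repeated, has no variability.
    """
    return len(set(values)) > 1


for name, values in DATA_SETS.items():
    print(name, has_variability(values))
# A True
# B False
# C True
# D False
# E True
```

Because the test only compares values for equality, it works the same way for categorical data (for example, a list of favorite subjects), and it treats a one-value data set as having no variability.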

## 2.3: What Makes a Statistical Question? (15 minutes)

### Activity

In the previous activities, students made sense of data sets contextually and reasoned about possible questions that could produce them. They also looked at variability in data sets and contrasted data with and without variability. Here they use both the experience of reasoning about questions and the idea of variability to define statistical questions.

From their work in earlier grades, students are familiar with the idea that some questions can be answered by collecting data (e.g., “How many students in our class like ice cream?”). In this activity, students learn that a statistical question is one that can be answered by using data in which variability is expected.

For example, the question, “What is the favorite subject of students in my class?” is a statistical question because we need data about favorite subjects and we can expect students to have different preferences. The question, “What is the counselor's favorite subject?” is not a statistical question because it can be answered by collecting a single data value. Even if multiple responses were collected, the responses are not expected to show variability.

As students analyze and discuss examples and non-examples of statistical questions, listen for groups who distinguish the two in terms of the data needed to answer the questions. For example, some questions may require collecting data that will probably show some variability while other questions may have only a single response. Invite them to share later.

### Launch

Arrange students in groups of 3–4. Give students 1–2 minutes of quiet time to study the examples and non-examples of statistical questions and then 4–5 minutes to discuss with their group how the two sets are different and generate a rough definition of statistical questions. Pause the class after the first question for a discussion about their work and to review the concept of “variability” before having students complete the rest of the activity.

Set up a two-column table that can be displayed for all to see. Use the two table columns to record students' observations about characteristics of statistical and non-statistical questions during discussion.

Invite groups to share their observations and record them for all to see. Be sure that the class hears from students who distinguish statistical and non-statistical questions in terms of the data needed to answer them. If not mentioned by students, highlight that answering each of the three statistical questions requires data and that each data set will most likely have variability, and remind them that we use the term variability to describe data sets in which not every data value is the same. In contrast, finding out the color of the principal’s car, whether Elena has a cell phone, or Diego’s reading preference either does not require data, or any data collected are not expected to vary.

Representation: Develop Language and Symbols. Display or provide charts with symbols and meanings. Add new examples and non-examples of statistical questions to the visual display.
Supports accessibility for: Conceptual processing; Memory

### Student Facing

These three questions are examples of statistical questions:

• What is the most common color of the cars in the school parking lot?
• What percentage of students in the school have a cell phone?
• Which kind of literature—fiction or nonfiction—is more popular among students in the school?

These three questions are not examples of statistical questions:

• What color is the principal’s car?
• Does Elena have a cell phone?
• What kind of literature—fiction or nonfiction—does Diego prefer?

1. Study the examples and non-examples. Discuss with your group:

1. How are the three statistical questions alike? What do they have in common?
2. How are the three non-statistical questions alike? What do they have in common?
3. What makes a question a statistical question?

Pause here for a class discussion.

2. Read each question. Think about the data you might collect to answer it and whether you expect to see variability in the data. Complete each blank with “Yes” or “No.”

1. How many cups of water do my classmates drink each day?

• Is variability expected in the data? ______

• Is the question statistical? _____

2. Where in town does our math teacher live?

• Is variability expected in the data? ______

• Is the question statistical? _____

3. How many minutes does it take students in my class to get ready for school in the morning?

• Is variability expected in the data? ______

• Is the question statistical? _____

4. How many minutes of recess do sixth-grade students have each day?

• Is variability expected in the data? ______

• Is the question statistical? _____

5. Do all students in my class know what month it is?

• Is variability expected in the data? ______

• Is the question statistical? _____

### Anticipated Misconceptions

Students might think that if the response to a question requires counting or some kind of analysis, then the question is statistical. Though statistical questions do require analysis, help students see that the starting point for distinguishing a statistical question is to check whether the data used to answer it have variability, which then determines whether analysis is called for. (In other words, a data set that shows no variability, because it has the same value for all data points or has only a single data point, would not require analysis.)

### Activity Synthesis

Briefly poll students on their responses to the second set of questions. Be sure students understand that a question is statistical if we need data to answer it, and the data are expected to have variability.

Students may have trouble recognizing statistical questions in some cases. Here are two examples to ask students:

• “Who is the tallest person in the class?” may appear to be non-statistical, either because we might be able to visually tell who is tallest (so data seem unnecessary), or because we believe that everyone in the class would give the same answer (so no variability is expected).

• “How long is the longest river in the United States?” may appear to be non-statistical because we could readily look up the answer.

While the tallest person may be obvious in some classrooms, it is helpful to remember that this is not true in all classrooms. The students in a class are often close (but not identical) in height, and finding out who is tallest requires analyzing different heights. So the question, “Who is the tallest person in the class?” is generally a statistical question, because answering it may require collecting data by measuring the heights of students.

Likewise, while the longest river in the country can be easily researched, it is helpful to remember that this was not always the case. The answer may be considered a fact now, but the question was once a statistical question—at some point, lengths of rivers were collected and compared in order to answer it.

To tell statistical questions from non-statistical ones, it is useful to look closely at the context of the questions and what it takes to answer them.

Reading, Representing, Conversing: MLR3 Clarify, Critique, Correct. To help students make sense of “What makes a question a statistical question?”, offer an incorrect response such as “A statistical question is one in which more than one answer is possible and a non-statistical question has only one possible answer.” Invite students to offer a correct response by asking “What language might you add or change to make this statement more accurate?” Improved responses should include the term “variability” with an explanation of what variability means. This will support student understanding of variability through language production.
Design Principle(s): Maximize meta-awareness; Support sense-making

## 2.4: Sifting for Statistical Questions (15 minutes)

### Optional activity

This activity provides additional practice in determining what it means for a question to be statistical in nature.

Earlier in this lesson, students learned about variability in data and about statistical questions. Here they develop a deeper understanding of statistical questions by studying a wider range of examples and non-examples. Students sort a variety of questions and explain why they are or are not statistical. When students explain their reasoning, critique the reasoning of others, and attempt to persuade, they engage in MP3. Students also begin writing statistical questions and think about the data that might be used to answer the questions.

As students discuss and sort the questions in groups, listen for the rationales they give and notice questions that they have trouble classifying so that they could be addressed later.

### Launch

Arrange students in groups of 2. Give each group a set of pre-cut cards that contain questions (from the blackline master). Give partners 4–5 minutes to sort the cards into three piles—Statistical, Non-statistical, and Unsure—and another 3–4 minutes to discuss their decisions with another group and then record the result of their deliberations.

If necessary, demonstrate productive ways for partners to communicate during a sorting activity. For example, partners take turns identifying a category for a card and explaining why they think it is a match. The other partner either accepts their explanation, or explains why they don't think it should be included in that category. Then they change roles for the next card to sort.

The word “typical” appears for the first time in this activity (in one of the questions to be sorted: “What is a typical number of students per class in your school?”). The term is connected to the idea of center and spread later in the unit, but is used informally here. If needed, explain that we can think of “typical” as meaning what is common or what can be expected in a given group.

Action and Expression: Develop Expression and Communication. Maintain a display of important terms and vocabulary. During the launch, take time to review the following terms from previous activities that students will need to access for this activity: typical value, examples and non-examples of statistical questions.
Supports accessibility for: Memory; Language
Conversing: MLR8 Discussion Supports. Students should take turns selecting a card and explaining their reasoning to their partner. Display the following sentence frames for all to see: “_____ is a statistical question because . . .”, “_____ is not a statistical question because . . .”, and “I am unsure about this question because . . .” Encourage students to challenge each other when they disagree. This will help students clarify their reasoning about what it means for a question to be statistical in nature.
Design Principle(s): Support sense-making; Maximize meta-awareness

### Student Facing

1. Your teacher will give you and your partner a set of cards with questions. Sort them into three piles: Statistical Questions, Not Statistical Questions, and Unsure.

2. Compare your sorting decisions with another group of students. Start by discussing the cards your group sorted into the Statistical Questions and Not Statistical Questions piles. Then, review the cards in the Unsure pile. Discuss the questions until both groups reach an agreement and have no cards left in the Unsure pile. If you get stuck, think about whether the question could be answered by collecting data and whether there would be variability in those data.

3. Record the letter names of the questions in each pile.

• Statistical questions:
• Non-statistical questions:

### Student Facing

#### Are you ready for more?

Tyler and Han are discussing the question, “Which sixth-grade student lives the farthest from school?”

• Tyler says, “I don’t think the question is a statistical question. There is only one person who lives the farthest from school, so there would not be variability in the data we collect.”

• Han says, “I think it is a statistical question. To answer the question, we wouldn’t actually be asking everyone, 'Which student lives the farthest from school?' We would have to ask each student how far away from school they live, and we can expect their responses to have variability.”

Do you agree with either one of them? Explain your reasoning.

### Anticipated Misconceptions

As students encounter additional examples of statistical questions, expect to see several areas of confusion.

• Students may confuse statistical questions with survey questions. A survey question is what we use to collect data. A statistical question is what we are trying to answer using collected data. For instance, the question, “How old are you?” is a survey question, because it can be used to gather data about the ages of people in a group being studied. The question, “Are most residents of this building older or younger than 30?” is a statistical question, because answering it requires collecting and analyzing the ages of the residents.
• Related to the potential confusion about statistical and survey questions, students may mistakenly think that the number of possible answers to a question is what defines a statistical question. In other words, they may say that the question, “Which ice cream flavor is most popular in this class?” is not statistical because there is potentially only one answer (e.g., “chocolate is most popular”). Students may need to be reminded that answering the question requires surveying the students on their ice cream preferences, and that the responses are expected to have variability.
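The distinction between a survey question and a statistical question can be made concrete: the survey question produces one value per person, while the statistical question is answered only by analyzing all the values together. The sketch below is a minimal illustration under stated assumptions: the ages are hypothetical, “most” is taken to mean more than half, and a resident who is exactly 30 counts as neither older nor younger.

```python
# Hypothetical answers to the survey question "How old are you?" from ten residents.
AGES = [34, 27, 45, 30, 19, 52, 61, 28, 38, 41]


def most_older_or_younger(ages, cutoff=30):
    """Answer "Are most residents older or younger than `cutoff`?" from the collected ages.

    Assumptions: "most" means more than half, and anyone exactly `cutoff` years old
    counts as neither older nor younger.
    """
    older = sum(1 for age in ages if age > cutoff)
    younger = sum(1 for age in ages if age < cutoff)
    if older > len(ages) / 2:
        return "older"
    if younger > len(ages) / 2:
        return "younger"
    return "neither"


print(most_older_or_younger(AGES))  # older (6 of the 10 residents are over 30)
```

No single answer to “How old are you?” settles the statistical question; it takes the whole, varied collection of ages.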

### Activity Synthesis

Most of the discussions happen in small groups. Bring the class together to discuss any remaining disagreements or questions. Select previously identified groups that had trouble classifying some of the cards to share their thinking and ask the class to help resolve the issue if possible.

• “Do you all agree with the list of the statistical questions?”
• “If not, which one(s) are harder to distinguish? Why?”
• “Now that you have seen more examples of statistical questions, what new insights do you have about them?”

Remind students that we might classify some questions differently depending on when the questions are asked. If a statistical question has been previously studied and its answer is now considered a fact, the question may no longer be considered statistical. For example, “How tall is the tallest mountain in the world?” was long ago a statistical question, when not all mountains had been measured and topographic data had not been assembled and analyzed. We can now find this information easily, without having to collect and study data, because that work has been done and made available. This nuance may help students better distinguish statistical and non-statistical questions, and help them clarify their assumptions when classifying questions as one type or the other (MP3).

## Lesson Synthesis

### Lesson Synthesis

In this lesson, in addition to looking more closely at numerical and categorical data, we also thought about whether the data we study show variability, and about the kinds of questions the data sets could help us answer.

• “What does it mean to say that data have variability?” (Not all the data values are the same.)
• “When might we expect data to have variability?” (When the question we are trying to answer is about a feature or characteristic of a group whose members can differ in that feature or characteristic.)
• “When might we expect data to have no variability?” (When the question we are trying to answer is about an individual, or about a feature that all group members have in common.)
• “Give some examples of data that would show variability and some that would not show variability.”
• “What is a statistical question?” (A question that can be answered using data that are expected to have variability.)
• “Give some examples of statistical and non-statistical questions.”
• “What kinds of data are needed to answer them?”
• “How might you collect the data?”
• “What units of measurement are involved?”

## Student Lesson Summary

### Student Facing

We often collect data to answer questions about something. The data we collect may show variability, which means the data values are not all the same.

Some data sets have more variability than others. Here are two sets of figures.

Set A has more figures with the same shape, color, or size. Set B shows more figures with different shapes, colors, or sizes, so set B has greater variability than set A.

Both numerical and categorical data can show variability. Numerical sets can contain different numbers, and categorical sets can contain different categories or types.

When a question can only be answered by using data and we expect that data to have variability, we call it a statistical question. Here are some examples.

• Who is the most popular musical artist at your school?
• When do students in your class typically eat dinner?
• Which classroom in your school has the most books?

To answer the question about books, we may need to count all of the books in each classroom of a school. The data we collect would likely show variability because we would expect each classroom to have a different number of books.

In contrast, the question “How many books are in your classroom?” is not a statistical question. If we collect data to answer it (for example, by asking everyone in the class to count the books), we expect all the data values to be the same. Likewise, if we ask all of the students at a school where they go to school, that question is not a statistical question because the responses will all be the same.