# Lesson 2

Data Representations

## 2.1: Notice and Wonder: Battery Life (5 minutes)

### Warm-up

This is the first Notice and Wonder activity in the course. Students are shown three statistical displays representing the same data set. The prompt to students is “What do you notice? What do you wonder?” Students are given a few minutes to write down things they notice and things they wonder. After students have had a chance to write down their responses, ask several students to share things they noticed and things they wondered; record these for all to see. Often, the goal is to steer the conversation to wondering about something mathematical that the class is about to focus on. The purpose is to make a mathematical task accessible to all students with these two approachable questions. By thinking about them and responding, students gain entry into the context and might get their curiosity piqued.

The purpose of this warm-up is to elicit the idea that the same data can be displayed in different ways, which will be useful when students create different data displays in a later activity. While students may notice and wonder many things about these images, the comparison of the three representations and interpreting the information in each representation are the important discussion points.

This prompt gives students opportunities to see and make use of structure (MP7). Specifically, they might use the structure of the three representations, particularly the structure of the horizontal number line, to find mathematically important similarities in how the same set of data is represented.

### Launch

Display the images for all to see. Ask students to think of at least one thing they notice and at least one thing they wonder. Give students 1 minute of quiet think time, and then 1 minute to discuss the things they notice and wonder with their partner, followed by a whole-class discussion.

### Student Facing

The dot plot, histogram, and box plot summarize the hours of battery life for 26 cell phones constantly streaming video. What do you notice? What do you wonder?

### Activity Synthesis

Ask students to share the things they noticed and wondered. Record and display their responses for all to see. If possible, record the relevant reasoning on or near the image. After all responses have been recorded without commentary or editing, ask students, “Is there anything on this list that you are wondering about now?” Encourage students to respectfully disagree, ask for clarification, or point out contradicting information.

The goal is to help students recall different ways to represent distributions of data. Highlight the similarities between the dot plot and the histogram. Tell students that the tallest bar in the histogram is created from the two data values at 5 and the six data values at 5.5 in the dot plot, and that the final bar is created from the two data values at 6 and the two data values at 6.5 in the dot plot. If time permits, discuss questions such as:

• “Which representation(s) shows all the data values?” (The dot plot shows all the data values.)
• “How do you create a box plot?” (You calculate the values for the five-number summary and then graph them on a number line. The first quartile, the median, and the third quartile are used for the box, and the minimum value and maximum value are used for the whiskers.)

## 2.2: Tomato Plants: Histogram (15 minutes)

### Optional activity

The mathematical purpose of this activity is to represent and analyze data with histograms. Students will create two different histograms from the same data set by organizing data into different intervals.

### Launch

Arrange students in groups of 2.

Conversing: MLR 2 Collect and Display. As groups work, circulate and listen to student talk about the similarities and differences between the types of data collected. Write down common or important phrases you hear students say about each type on a visual display. For example, “these are all numbers” or “this only has one answer.” Collect the responses into a visual display. Throughout the remainder of the lesson, continue to update collected student language and remind students to borrow language from the display as needed. In the lesson synthesis, after the terms “numerical data” and “categorical data” have been introduced, ask students to sort the collected language into two groups, one for each type of data. This will help students organize data throughout the unit.
Design Principle(s): Support sense-making; Maximize meta-awareness
Action and Expression: Internalize Executive Functions. Provide students with grid or graph paper to organize their two histograms with different interval widths.
Supports accessibility for: Language; Organization

### Student Facing

A histogram can be used to represent the distribution of numerical data.

1. The data represent the number of days it takes for different tomato plants to produce tomatoes. Use the information to complete the frequency table.
• 47
• 52
• 53
• 55
• 57
• 60
• 61
• 62
• 63
• 65
• 65
• 65
• 65
• 68
• 70
• 72
• 72
• 75
• 75
• 75
• 76
• 77
• 78
• 80
• 81
• 82
• 85
• 88
• 89
• 90

| days to produce fruit | frequency |
| --- | --- |
| 40–50 | |
| 50–60 | |
| 60–70 | |
| 70–80 | |
| 80–90 | |
| 90–100 | |

2. Use the set of axes and the information in your table to create a histogram.

3. The histogram you created has intervals of width 10 (like 40–50 and 50–60). Use the set of axes and data to create another histogram with an interval of width 5. How does this histogram differ from the other one?

### Student Facing

#### Are you ready for more?

It often takes some playing around with the interval lengths to figure out which gives the best sense of the shape of the distribution.

1. What might be a problem with using interval lengths that are too large?

2. What might be a problem with using interval lengths that are too small?

3. What other considerations might go into choosing the length of an interval?

### Anticipated Misconceptions

Students may struggle to know how to place numbers that lie on the boundary between intervals. For example, students may not know if a value like 60 should be included in the interval 50–60 or 60–70. Explain to students that the lower boundary value is included in the interval, and the upper boundary value is not. For example, the interval 60–70 includes all the values that are greater than or equal to 60 and less than 70.
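
The boundary convention above can be checked with a short script. This is a minimal sketch for teacher reference, not part of the student task; the function name `frequency_table` and the starting points of the intervals (40 for width 10, 45 for width 5) are assumptions made for illustration.

```python
# Days for each tomato plant to produce fruit (copied from the activity).
days = [47, 52, 53, 55, 57, 60, 61, 62, 63, 65, 65, 65, 65, 68, 70,
        72, 72, 75, 75, 75, 76, 77, 78, 80, 81, 82, 85, 88, 89, 90]


def frequency_table(values, start, stop, width):
    """Count values in intervals that include the lower boundary only."""
    table = {}
    for low in range(start, stop, width):
        high = low + width
        # low <= v < high: the lower boundary is in, the upper boundary is out.
        table[(low, high)] = sum(1 for v in values if low <= v < high)
    return table


width_10 = frequency_table(days, 40, 100, 10)  # 40-50, 50-60, ..., 90-100
width_5 = frequency_table(days, 45, 95, 5)     # 45-50, 50-55, ..., 90-95

for (low, high), count in width_10.items():
    print(f"{low}-{high}: {count}")
```

With this convention the value 60 is counted in the 60–70 interval, and every data value lands in exactly one interval, so the frequencies in each table add up to 30.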

### Activity Synthesis

The purpose of this discussion is to make sure that students know how to create and begin to interpret histograms. Here are some questions for discussion.

• “Where did you put the 60? In the 50–60 interval or 60–70 interval?” (Tell students that we use the convention of including the 60 in the 60–70 interval. The interval 60–70 means all the values greater than or equal to 60, but less than 70. The interval 50–60 means all values greater than or equal to 50 but less than 60.)
• “What information is easily seen in the histogram?” (The shape of the distribution as well as estimates for the measure of center and measure of variability.)
• “According to each histogram, what appears to be the typical number of days it takes a tomato plant to produce tomatoes?” (Using the first histogram, it appears that the typical number of days is somewhere between 60 and 80. Looking at the second histogram, it appears that the typical number of days could be between 75 and 80.)
• “What information is not seen in the histogram?” (You are not able to see the actual values. You only know the number of values within an interval rather than the values themselves.)
Writing, Speaking: MLR 1 Stronger and Clearer Each Time. Use this with successive pair shares to give students a structured opportunity to revise and refine their explanation of the differences between the two histograms. Ask each student to meet with 2–3 other partners in a row for feedback. Provide students with prompts for feedback that will help individuals strengthen their ideas and clarify their language. For example, “what do the different heights of the bars mean?” “what do the different numbers of bars mean?” “can you give an example of . . .?” etc. Students can borrow ideas and language from each partner to strengthen their final explanation.
Design Principle(s): Optimize output (for comparison)

## 2.3: Tomato Plants: Box Plot (10 minutes)

### Optional activity

The mathematical purpose of this activity is to represent the distribution of data on the real number line with a box plot and help students think informally about the median as a measure of center. Students calculate the values for the five-number summary and create a box plot. The median, quartiles, and extreme values split the data set into four intervals with approximately the same number of data values in each. Students engage in MP2 when they interpret these values in the given context. Although these intervals are often called “quartiles,” the term “quarters” is used in these materials to avoid confusion with the quartile values Q1 and Q3.

### Launch

Keep students in groups of 2. Give students 5 minutes to work on the questions. Ask them to compare their answers with their partner after each question.

Action and Expression: Internalize Executive Functions. To support development of organizational skills in problem solving, chunk this task into more manageable parts. For example, instruct students to reference their sequential data, divide the data into quarters, then find the median, Q1, and Q3.
Supports accessibility for: Memory; Organization

### Student Facing

A box plot can also be used to represent the distribution of numerical data.

| minimum | Q1 | median | Q3 | maximum |
| --- | --- | --- | --- | --- |
| | | | | |

1. Using the same data as the previous activity for tomato plants, find the median and add it to the table. What does the median represent for these data?
2. Find the median of the least 15 values to split the data into the first and second quarters. This value is called the first quartile. Add this value to the table under Q1. What does this value mean in this situation?
3. Find the value (the third quartile) that splits the data into the third and fourth quarters and add it to the table under Q3. Add the minimum and maximum values to the table.
4. Use the five-number summary to create a box plot that represents the number of days it takes for these tomato plants to produce tomatoes.

### Anticipated Misconceptions

For students who have difficulty calculating the median, remind them that the median is the middle of a sequential (ordered) data set. For students who have difficulty finding Q1 and Q3, ask them how many groups there should be if the data are split into “quarters.” The data should be divided into four groups of equal size: the median of the lower half of the values is Q1, and the median of the upper half of the values is Q3.
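
The procedure described here (median of the whole set, then the median of each half) can be sketched in a few lines. The function name `five_number_summary` is hypothetical, and the sketch assumes the "median of each half" convention used in this activity; statistical software may use other quartile conventions and give slightly different values for Q1 and Q3.

```python
from statistics import median

# Days for each tomato plant to produce fruit (copied from the activity).
days = [47, 52, 53, 55, 57, 60, 61, 62, 63, 65, 65, 65, 65, 68, 70,
        72, 72, 75, 75, 75, 76, 77, 78, 80, 81, 82, 85, 88, 89, 90]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]      # the least half of the values
    upper = ordered[n - half:]  # the greatest half (middle value skipped if n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


summary = five_number_summary(days)
print(summary)
```

For the 30 tomato plant values, each half has 15 values, so Q1 is the 8th value from the bottom and Q3 is the 8th value from the top.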

### Activity Synthesis

The goal is to make sure students understand the five-number summary and to help them think informally about the median as a measure of center. Here are some questions for discussion.

• “What information is easily seen in the box plot?” (The minimum value, quartiles including the median, and the maximum value. This also highlights the interquartile range and the range.)
• “According to the box plot, what is the typical number of days it takes a tomato plant to produce fruit?” (The typical number of days is 71 because the median of the data is 71 days.)

## Lesson Synthesis

### Lesson Synthesis

In this lesson students viewed data represented by dot plots, histograms, and box plots.

• “What are the strengths of each of the representations?” (A dot plot lets you see all of the data and how it is distributed. The histogram summarizes the data into intervals that make for fewer columns. The box plot displays the five-number summary graphically.)
• “What are the weaknesses of each of the representations?” (A dot plot has many columns of dots, which can make it difficult to determine patterns graphically. Neither the histogram nor the box plot displays each individual value in the data set, which means that the mean cannot be calculated directly from either representation.)
• “How do you find the ‘typical’ value for a data set?” (You can calculate the mean or median, or estimate the mean or median using a graphical representation.)

## Student Lesson Summary

### Student Facing

The table shows a list of the number of minutes people could intensely focus on a task before needing a break. 50 people of different ages are represented.

• 19
• 7
• 1
• 16
• 20
• 2
• 7
• 19
• 9
• 13
• 3
• 9
• 18
• 13
• 20
• 8
• 3
• 14
• 13
• 2
• 8
• 5
• 17
• 7
• 18
• 17
• 8
• 8
• 7
• 6
• 2
• 20
• 7
• 7
• 10
• 7
• 6
• 19
• 3
• 18
• 8
• 19
• 7
• 13
• 20
• 14
• 6
• 3
• 19
• 4

In a situation like this, it is helpful to represent the data graphically to better notice any patterns or other interesting features in the data. A dot plot can be used to see the shape and distribution of the data.

Quite a few people lost focus at around 3, 7, 13, and 19 minutes, and nobody lost focus at 11, 12, or 15 minutes. Dot plots are useful when the data set is not too large, and they show all of the individual values in the data set. In this example, a dot plot can easily show all the data. If the data set is very large (more than 100 values, for example) or if there are many different values that are not exactly the same, it may be hard to see all of the dots on a dot plot.
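
As a rough illustration of how a dot plot is built, the sketch below tallies how often each value occurs and prints one `o` per person. The text layout is only a stand-in for the actual dot plot, and the variable names are chosen for this example.

```python
from collections import Counter

# Minutes of intense focus for the 50 people (copied from the list above).
minutes = [19, 7, 1, 16, 20, 2, 7, 19, 9, 13, 3, 9, 18, 13, 20, 8, 3,
           14, 13, 2, 8, 5, 17, 7, 18, 17, 8, 8, 7, 6, 2, 20, 7, 7, 10,
           7, 6, 19, 3, 18, 8, 19, 7, 13, 20, 14, 6, 3, 19, 4]

counts = Counter(minutes)  # value -> number of people with that value

# One row per whole minute from the minimum to the maximum.
for value in range(min(minutes), max(minutes) + 1):
    print(f"{value:>2} | {'o' * counts[value]}")

# Whole-minute values in the range that no one reported.
missing = [v for v in range(min(minutes), max(minutes) + 1) if counts[v] == 0]
print("no data at:", missing)
```

The rows with no dots correspond to the values at which nobody lost focus.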

A histogram is another representation that shows the shape and distribution of the same data.

Most people lost focus between 5 and 10 minutes or between 15 and 20 minutes, while only 4 of the 50 people got distracted between 20 and 25 minutes. When creating histograms, each interval includes the number at the lower end of the interval but not the upper end. For example, the tallest bar displays values that are greater than or equal to 5 minutes but less than 10 minutes. In a histogram, values that are in an interval are grouped together. Although the individual values get lost with the grouping, a histogram can still show the shape of the distribution.
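
One way to see the grouping is to compute the lower end of each value's interval with floor division. This is a sketch under the assumption that the intervals have width 5 and start at 0, which matches the intervals named above; the `#` bars are a text stand-in for the histogram.

```python
from collections import Counter

# Minutes of intense focus for the 50 people (copied from the list above).
minutes = [19, 7, 1, 16, 20, 2, 7, 19, 9, 13, 3, 9, 18, 13, 20, 8, 3,
           14, 13, 2, 8, 5, 17, 7, 18, 17, 8, 8, 7, 6, 2, 20, 7, 7, 10,
           7, 6, 19, 3, 18, 8, 19, 7, 13, 20, 14, 6, 3, 19, 4]

WIDTH = 5  # assumed interval width

# Floor division maps each value to the lower end of its interval,
# so 5 lands in 5-10 and 20 lands in 20-25 (lower end included).
bins = Counter((m // WIDTH) * WIDTH for m in minutes)

for low in sorted(bins):
    print(f"{low:>2}-{low + WIDTH:<2} | {'#' * bins[low]}")
```

Only the count in each interval survives the grouping, which is why the individual values cannot be read back from the bars.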

Here is a box plot that represents the same data.

Box plots are created using the five-number summary. For a set of data, the five-number summary consists of these five statistics: the minimum value, the first quartile, the median, the third quartile, and the maximum value. These values split the data into four sections each representing approximately one-fourth of the data. The median of this data is indicated at 8 minutes and about 25% of the data falls in the short second quarter of the data between 6 and 8 minutes. Similarly, approximately one-fourth of the data is between 8 and 17 minutes. Like the histogram, the box plot does not show individual data values, but other features such as quartiles, range, and median are seen more easily. Dot plots, histograms, and box plots provide 3 different ways to look at the shape and distribution while highlighting different aspects of the data.
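
To compare how spread out each quarter is, the sketch below computes the five-number summary for the same data and the width of each section of the box plot. It assumes the quartiles are found as medians of the lower and upper halves; other conventions exist and may give slightly different values.

```python
from statistics import median

# Minutes of intense focus for the 50 people (copied from the list above).
minutes = [19, 7, 1, 16, 20, 2, 7, 19, 9, 13, 3, 9, 18, 13, 20, 8, 3,
           14, 13, 2, 8, 5, 17, 7, 18, 17, 8, 8, 7, 6, 2, 20, 7, 7, 10,
           7, 6, 19, 3, 18, 8, 19, 7, 13, 20, 14, 6, 3, 19, 4]

ordered = sorted(minutes)
half = len(ordered) // 2

# minimum, Q1, median, Q3, maximum
summary = [ordered[0], median(ordered[:half]), median(ordered),
           median(ordered[-half:]), ordered[-1]]

# Width of each quarter: the distance between neighboring summary values.
widths = [high - low for low, high in zip(summary, summary[1:])]

print("five-number summary:", summary)
print("quarter widths:", widths)
```

A narrow section, such as the one between Q1 and the median, means that roughly one-fourth of the values are packed into a small range, while a wide section means they are more spread out.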