# Lesson 6

Areas in Histograms

## 6.1: Find the Area (5 minutes)

### Warm-up

The mathematical purpose of this activity is for students to find the areas of different regions under a curve. Monitor for students who divide the shaded area into a square and a triangle and those who treat the entire shape as a trapezoid. Later in this lesson, students will find areas under a normal curve to approximate the proportion of values in an interval.

### Launch

Tell students that the phrase “the area under the curve” means the area between the $$x$$-axis and the curve.

### Student Facing

1. Find the shaded area between the function, the $$x$$-axis, and the boundaries $$x = 1$$ and $$x = 2$$. Explain or show your reasoning.
2. What proportion of the area between the function, the $$x$$-axis, and the boundaries $$x = 0$$ and $$x = 7$$ is shaded? Explain or show your reasoning.

### Activity Synthesis

The purpose of this discussion is to find strategies for comparing the area of a shaded region to the total area under a curve. Select previously identified students to share how they determined the area of the shaded region. Here are some questions for discussion.

• “How did you determine the area of the shaded region?” (I see that the region is composed of a unit square and a triangle with base 1 and height 2. The area of a triangle is one-half base times height, so the triangle has an area of 1 square unit. Adding the 1 square unit from the square gives 2 square units for the shaded region.)
• “How did you find the area under the curve?” (I used the vertical lines $$x = 3$$, $$x = 4$$, $$x = 5$$, and $$x = 6$$, to divide the region into triangles and rectangles, then I found the area of each of the regions and added them together.)
• “Andre said he drew the vertical line $$x = 3.5$$ and found the area to the left of the line and then doubled it. Will Andre’s method work? Explain your reasoning.” (Yes, Andre’s method will work because $$x = 3.5$$ is a line of symmetry.)

## 6.2: Story Submissions (15 minutes)

### Activity

The mathematical purpose of this activity is for students to understand that approximately normal distributions have a similar proportion of area in regions described by the mean and standard deviations. Students are given a data set including the number of words for 200 submitted short stories. They are asked to determine the proportion of submitted stories that are within certain ranges.

Monitor for students who:

1. return to the original data set for each question and count the values in each interval
2. create a frequency table to break the data into natural intervals
3. create a histogram to visualize the areas in question
4. use technology to count values in each region

The problem mentions that a page holds approximately 200 words to help students identify natural intervals to summarize the data into a frequency table or histogram.

Making statistical technology available gives students an opportunity to choose appropriate tools strategically (MP5).

### Launch

Arrange students in groups of 2.

Ask, “What do you think the phrase ‘within two standard deviations of the mean’ means?” (It means the interval from the value of the mean minus two times the standard deviation to the value of the mean plus two times the standard deviation.)
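
The interval described in that response can be computed directly. Here is a minimal sketch in Python using the mean (2,600 words) and standard deviation (400 words) given in the task. The function name `interval_within` is illustrative only and is not part of the lesson materials.

```python
def interval_within(mean, sd, k):
    """Return the (low, high) endpoints of the interval within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)


# Values from the story-submission task: mean 2,600 words, standard deviation 400 words.
low, high = interval_within(2600, 400, 2)
print(low, high)  # 1800 3400
```

With `k = 1` the same function gives the interval from 2,200 to 3,000 words.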

If students are using a computer to create a spreadsheet or histogram, or to find other information, suggest that they copy and paste the data into a program like GeoGebra, Desmos, or another spreadsheet program rather than type all of the data themselves.

### Student Facing

A publisher takes submissions for short stories to include in a book. 200 stories are submitted, but the publisher needs to be aware of how long each story is. Given the way the publisher will lay out the collection of stories, a page typically contains 200 words. The mean number of words for each story is 2,600 and the standard deviation is 400 words.

• 2844
• 2643
• 3316
• 2084
• 2316
• 2513
• 2931
• 2563
• 2655
• 2345
• 2465
• 2821
• 2493
• 2263
• 2706
• 2501
• 2627
• 2220
• 2372
• 2635
• 3066
• 2824
• 2357
• 2522
• 2564
• 2901
• 2118
• 2325
• 3551
• 2734
• 2888
• 2695
• 2763
• 2867
• 3301
• 2546
• 2174
• 2515
• 2936
• 3308
• 3624
• 2927
• 3101
• 3118
• 2761
• 3020
• 2556
• 3193
• 2513
• 3247
• 2476
• 2678
• 2466
• 3311
• 2863
• 2632
• 2669
• 2710
• 2440
• 2846
• 2425
• 3143
• 2491
• 2736
• 2115
• 2175
• 1722
• 3462
• 2570
• 2797
• 2505
• 2308
• 2224
• 1613
• 2361
• 2724
• 2438
• 3377
• 2156
• 2219
• 2302
• 1908
• 1453
• 2213
• 3172
• 2976
• 2042
• 3063
• 2954
• 3153
• 2470
• 1650
• 2404
• 2188
• 2722
• 2359
• 2635
• 2896
• 2809
• 2864
• 2756
• 2663
• 2259
• 2904
• 3138
• 2739
• 2784
• 3124
• 1867
• 3184
• 2073
• 2463
• 2374
• 1976
• 2746
• 3462
• 2730
• 1952
• 2068
• 3054
• 2476
• 2853
• 2538
• 2167
• 2732
• 3304
• 2347
• 3015
• 2151
• 2446
• 2714
• 2839
• 2727
• 2489
• 2481
• 2367
• 3116
• 2650
• 2477
• 2360
• 2975
• 2871
• 2946
• 1849
• 2897
• 2625
• 2938
• 2407
• 2218
• 2287
• 2356
• 2125
• 3296
• 2289
• 2379
• 2868
• 2715
• 2793
• 2631
• 2973
• 2876
• 2295
• 2551
• 2381
• 3259
• 3094
• 2452
• 2149
• 3043
• 2638
• 2549
• 2542
• 2753
• 2985
• 2501
• 2393
• 2896
• 2135
• 3191
• 2319
• 1984
• 2013
• 2462
• 3186
• 2674
• 2273
• 2483
• 2671
• 2702
• 2819
• 2197
• 2427
• 2018
• 1927
• 2428
• 2438
• 1852
• 2395
• 1826
• 2767

1. If a histogram is created using intervals of 200 words, what would be the area of the bar representing the number of stories that contain between 2,000 and 2,200 words? Explain or show your reasoning.
2. What proportion of the total area is represented by the bar for stories that contain between 2,000 and 2,200 words? Explain or show your reasoning.
3. What proportion of stories in this group contain between 2,000 and 2,200 words? Explain or show your reasoning.
4. How does the proportion of the area you calculated relate to the proportion of stories in the group that contain between 2,000 and 2,200 words?
5. What proportion of stories in this group are within 1 standard deviation of the mean number of words?
6. What proportion of stories in this group are within 2 standard deviations of the mean number of words?
7. What proportion of stories in this group are within 1 standard deviation of 2,400 words?

### Launch

Arrange students in groups of 2. Distribute one copy of the blackline master to each group.

Ask, “What do you think the phrase ‘within two standard deviations of the mean’ means?” (It means the interval from the value of the mean minus two times the standard deviation to the value of the mean plus two times the standard deviation.)

### Student Facing

#### Are you ready for more?

Prove more generally that the proportion of total area taken up by a bar in a histogram is equal to the proportion of all data values that are contained in the interval represented by the bar. To begin, let $$n$$ represent the number of data values in an interval given by one bar, $$M$$ represent the number of data values in the entire set, and $$w$$ be the width of the interval in each bar of the histogram. Prove that the proportion of area taken up by the bar is $$\frac{n}{M}$$.
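
One possible argument is sketched here. It assumes a frequency histogram in which every bar has the same width $$w$$ and each bar's height is the count for its interval. The symbol $$n_i$$, for the count in the $$i$$th bar, is introduced only for this sketch.

$$\text{area of the bar} = n \cdot w$$

$$\text{total area} = \sum_i n_i \cdot w = w \cdot \sum_i n_i = w \cdot M$$

$$\frac{\text{area of the bar}}{\text{total area}} = \frac{n \cdot w}{M \cdot w} = \frac{n}{M}$$

The common width $$w$$ cancels, so the result does not depend on the interval size chosen for the histogram.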

### Anticipated Misconceptions

Some students may have trouble understanding the phrase “within 1 standard deviation of the mean.” Ask students to draw a number line containing the mean value and mark on the number line 1 standard deviation below and 1 standard deviation above the mean.

### Activity Synthesis

The goal of this discussion is to get students thinking about histograms as an area model for a distribution. Select previously identified students to share in this order:

1. return to the original data set for each question and count the values in each interval
2. create a frequency table to break the data into natural intervals
3. create a histogram to visualize the areas in question
4. use technology to sort or count values in each interval

Connect the methods used by asking students about the advantages and disadvantages of the methods they used. (The summary methods, such as a frequency table or histogram, require additional time to set up, but are quicker to find the useful values for questions such as the ones included in the activity. Using technology, such as spreadsheets, to sort or count the data is also quick and accurate, but may take some time to enter the data if it is not already on the computer.)
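
For students who use technology, the frequency-table method might look like this minimal Python sketch. It uses only the first ten values of the data set as a sample, and it assumes that a value falling exactly on a boundary is counted in the higher interval. The function name `frequency_table` is illustrative only.

```python
from collections import Counter


def frequency_table(values, width):
    """Count values in intervals of the given width, keyed by each interval's lower endpoint.

    Assumption: a value exactly on a boundary is counted in the higher interval.
    """
    counts = Counter((value // width) * width for value in values)
    return dict(sorted(counts.items()))


# First ten values from the story data (a sample, not the full set of 200).
sample = [2844, 2643, 3316, 2084, 2316, 2513, 2931, 2563, 2655, 2345]

for lower, count in frequency_table(sample, 200).items():
    print(f"{lower}-{lower + 200}: {count}")
```

Pasting the full data set into `sample` gives the counts needed for each question in the activity.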

If possible, display a histogram created by the students. If none of the students created a histogram, display the one here.

Here are some questions for discussion.

• “How is this activity related to the previous one?” (In the previous activity, we figured out the area of a shaded region and divided it by the total area under the curve. In this activity, we can think of the bars as having an area.)
• “Why would a normal distribution be a good model for the data summarized in the histogram?” (The distribution is approximately symmetric and bell-shaped.)
• “Why do you think the area for a bar representing 2,000 to 2,200 words has the same proportion of the whole as the proportion of stories in the same region to the whole?” (The proportion of the area is $$\frac{18 \cdot 200}{200 \cdot 200}$$, which is equivalent to $$\frac{18}{200}$$.)
• “Was it surprising to you that more than 95% of the data was within two standard deviations of the mean or that almost 70% of the data was within one standard deviation of the mean? Explain your reasoning.” (Both were surprising to me. I had never really thought about using the standard deviation to create an interval around the mean, so I had never thought about using the standard deviation to estimate the proportion of the data that was within a specific interval. But since most of the data is near the center, I can understand why it is this way.)
• “Why do you think it could be useful to look at the proportion of values that are within one standard deviation of the mean?” (In the last lesson, we were introduced to normal curves that were based on the mean and standard deviation. If we modeled a histogram with a normal curve, it would be interesting to know what proportion of values fall within a given interval on the normal curve.)
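
The equivalence between the area proportion and the count proportion can be checked with exact fractions. This is a minimal sketch that uses the count of 18 stories and the interval width of 200 words from the discussion. The function name `bar_area_proportion` is illustrative only.

```python
from fractions import Fraction


def bar_area_proportion(count, total_count, width):
    """Proportion of total histogram area taken up by one bar (all bars share the same width)."""
    bar_area = count * width
    total_area = total_count * width
    return Fraction(bar_area, total_area)


# 18 of the 200 stories fall in the 2,000 to 2,200 word interval; each bar is 200 words wide.
print(bar_area_proportion(18, 200, 200))  # 9/100
print(Fraction(18, 200))                  # 9/100
```

Changing the `width` argument does not change the result, because the width appears in both the bar's area and the total area.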

## 6.3: Website Load Times (10 minutes)

### Activity

The mathematical purpose of this activity is for students to recognize that, with approximately normally distributed data, regions described completely by the mean and standard deviation represent the same proportion of the whole. For example, there should always be approximately 68% of the data within one standard deviation of the mean. This activity provides a second context for approximately normal data so that comparisons can be made and to highlight the similar proportions for regions described only in terms of the mean and standard deviation.

### Launch

Tell students that they are going to do an activity very similar to the previous one involving finding proportions using a histogram.

### Student Facing

A company collects data from 10,000 websites about how long it takes to load the site. The number of seconds it takes to fully load the website is summarized in the relative frequency table.

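| load time (seconds) | relative frequency |
| --- | --- |
| 1.4–1.6 | 0.0003 |
| 1.6–1.8 | 0.0012 |
| 1.8–2.0 | 0.0053 |
| 2.0–2.2 | 0.0181 |
| 2.2–2.4 | 0.0442 |
| 2.4–2.6 | 0.0910 |
| 2.6–2.8 | 0.1555 |
| 2.8–3.0 | 0.1861 |
| 3.0–3.2 | 0.1938 |
| 3.2–3.4 | 0.1447 |
| 3.4–3.6 | 0.0923 |
| 3.6–3.8 | 0.0447 |
| 3.8–4.0 | 0.0166 |
| 4.0–4.2 | 0.0048 |
| 4.2–4.4 | 0.0012 |
| 4.4–4.6 | 0.0002 |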

The relative frequency histogram summarizes the same data.

The mean time to load a website is 3 seconds and the standard deviation is 0.4 seconds.

1. Would a normal distribution be a good model for this distribution? Explain your reasoning.
2. What proportion of websites loaded within 1 standard deviation of the mean?
3. What proportion of websites loaded within 2 standard deviations of the mean?
4. What proportion of websites loaded within 1 standard deviation of 2.8 seconds?
5. Compare the proportion of websites within 1 standard deviation of the mean to the proportion of stories in the submissions that are within 1 standard deviation of the mean number of words from the previous task. Do the same for the proportion within 2 standard deviations.

### Activity Synthesis

Select students to share their comparisons.

Students may comment on the fact that the two activities had some questions that are very much alike, and their answers are very much alike. If no students mention this, point it out. For example, each task asked about the proportion of stories or websites that were within 1 standard deviation of a value that was half of a standard deviation below the mean (within 1 standard deviation of 2,400 words or within 1 standard deviation of 2.8 seconds), and they both had about 62% of the values in that region.
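
The website proportions can be checked against the relative frequency table from the task. This is a minimal sketch that stores interval endpoints in tenths of a second (an implementation choice made here to avoid rounding issues with decimals). The names `REL_FREQ` and `proportion_between` are illustrative only.

```python
# Relative frequencies from the task's table.
# Keys are the lower endpoints of each interval, in tenths of a second (14 means 1.4 seconds).
REL_FREQ = {
    14: 0.0003, 16: 0.0012, 18: 0.0053, 20: 0.0181,
    22: 0.0442, 24: 0.0910, 26: 0.1555, 28: 0.1861,
    30: 0.1938, 32: 0.1447, 34: 0.0923, 36: 0.0447,
    38: 0.0166, 40: 0.0048, 42: 0.0012, 44: 0.0002,
}
BIN_WIDTH = 2  # each interval is 0.2 seconds wide


def proportion_between(low, high):
    """Add the relative frequencies of all intervals that lie entirely between low and high."""
    return sum(
        freq for start, freq in REL_FREQ.items()
        if low <= start and start + BIN_WIDTH <= high
    )


# Mean 3 seconds, standard deviation 0.4 seconds.
print(round(proportion_between(26, 34), 4))  # within 1 SD of the mean: 0.6801
print(round(proportion_between(22, 38), 4))  # within 2 SDs of the mean: 0.9523
print(round(proportion_between(24, 32), 4))  # within 1 SD of 2.8 seconds: 0.6264
```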

Display the images for the normal distributions that model each situation. Tell students that since both questions ask for the proportion of values within one standard deviation of a value that is the same number of standard deviations from the mean, and we know that these distributions are approximately normal, we would expect the region in each image to include approximately the same proportion of values from the data set. That is what it means to say that a normal distribution is defined only by the mean value and the standard deviation.
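
If it is helpful, the two shaded regions can be compared numerically. This minimal sketch uses `NormalDist` from Python's standard `statistics` module and assumes each data set is modeled by an exactly normal curve with the stated mean and standard deviation. The helper name `normal_proportion` is illustrative only.

```python
from statistics import NormalDist

# Normal models, using the means and standard deviations given in the two tasks.
stories = NormalDist(mu=2600, sigma=400)
websites = NormalDist(mu=3, sigma=0.4)


def normal_proportion(dist, low, high):
    """Area under the normal curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)


# Within 1 standard deviation of a value half a standard deviation below the mean.
p_stories = normal_proportion(stories, 2400 - 400, 2400 + 400)
p_websites = normal_proportion(websites, 2.8 - 0.4, 2.8 + 0.4)
print(round(p_stories, 3), round(p_websites, 3))  # 0.625 0.625
```

Both models give the same area because each interval runs from 1.5 standard deviations below the mean to 0.5 standard deviations above it.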

Here are some questions for discussion:

• “Describe what is happening in the two graphs.” (Each graph represents a normal curve fit to the data in the words or the website problem. Each of the shaded regions represents the proportion of the data that is within one standard deviation of 2,400 words and 2.8 seconds, respectively.)
• “Describe a story that includes 2,400 words in terms of the mean and standard deviation only. Then do the same for a website that loads in 2.8 seconds.” (Both values are half of a standard deviation below the mean.)
• “How would this graph change if the shaded region was within one standard deviation of the mean?” (The shaded region would be shifted to the right and it would take up about 68% or 69% of the area under the curve. It would be symmetric around the normal curve’s vertical line of symmetry.)
• “Why do you think it is helpful to model bell-shaped symmetric data using a normal curve?” (It is helpful because then you can compare data sets with different means and different standard deviations.)

Reading, Writing, Speaking: MLR3 Clarify, Critique, Correct. Use this routine to clarify the meaning of the phrase “within one standard deviation of the mean.” Display this ambiguous response: “I added up the relative frequencies before and after 3 seconds to get the proportion.” Ask students to identify the error, critique the reasoning, and write an improved explanation. Invite students to share their response with a partner before selecting 1–2 students to share with the whole class. Listen for and amplify language students use to explain why both the mean and standard deviation are necessary in calculating the proportion. This helps students evaluate, and improve upon, the written mathematical arguments of others, as they clarify how to calculate a proportion of data within a certain standard deviation from the mean.
Design Principle(s): Optimize output (for explanation); Maximize meta-awareness

Engagement: Develop Effort and Persistence. Break the class into small discussion groups and then invite a representative from each group to report back to the whole class. Remind students that each member should be prepared to speak on behalf of the group if called on.
Supports accessibility for: Language; Social-emotional skills; Attention

## Lesson Synthesis

Tell students, “A biologist measures the stride lengths of a population of emus, the second-tallest birds in the world, and the stride lengths of a population of ostriches, the tallest birds in the world. The biologist found that the stride lengths of both populations were approximately normally distributed.” Display the information for all to see.

• The mean stride length of the population of emus is 3 meters with a standard deviation of 0.5 meters.
• The mean stride length of the population of ostriches is 4.5 meters with a standard deviation of 0.75 meters.

Here are some questions for discussion:

• “Approximately 34% of the ostriches have stride lengths between 4.5 and 5.25 meters. Describe these values in terms of the mean and standard deviation only. What interval would represent a similar percentage of emus?” (4.5 is the mean and 5.25 is one standard deviation above the mean. For emus, a similar percentage can be found between 3 and 3.5 meters since that interval runs from the mean to one standard deviation above the mean.)
• “How can this percentage be seen using a graph of the normal curve?” (There is a vertical line down the middle, then another vertical line to the right, and shaded in between the two lines.)
• “Approximately 95% of the emus have a stride length between 2 and 4 meters. What interval for the ostriches has a similar percentage?” (3 to 6 meters since these values are within 2 standard deviations of the mean.)
• “How can this percentage be seen using a graph of the normal curve?” (The area under the normal curve is shaded two standard deviations to the left and right of the mean. The shading is symmetric around the center of the curve.)
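
The reasoning in these responses, describing each endpoint by how many standard deviations it is from the mean and then applying that description to the other population, can be sketched in Python. The function name `matching_interval` is illustrative only.

```python
def matching_interval(low, high, from_mean, from_sd, to_mean, to_sd):
    """Translate an interval from one distribution to another by matching
    the number of standard deviations each endpoint is from the mean."""
    z_low = (low - from_mean) / from_sd
    z_high = (high - from_mean) / from_sd
    return (to_mean + z_low * to_sd, to_mean + z_high * to_sd)


# Ostriches: mean 4.5 m, SD 0.75 m. Emus: mean 3 m, SD 0.5 m.
print(matching_interval(4.5, 5.25, 4.5, 0.75, 3, 0.5))  # (3.0, 3.5)
print(matching_interval(2, 4, 3, 0.5, 4.5, 0.75))       # (3.0, 6.0)
```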

## Student Lesson Summary

### Student Facing

There is an important connection between areas in histograms and the data represented by the histogram. In particular, the proportion of the total area in the histogram that is represented by a single bar in the histogram is equivalent to the proportion of all the data that is included in that interval. This is made more interesting by the fact that, for normally distributed data, the proportion of values in an interval whose endpoints are described by the mean and standard deviation is always the same.

For example, a woodshop produces boards of various lengths. During a certain week, 5,000 boards are produced and measured. The mean length is 6 feet, and the standard deviation is 1 foot. The table and histogram show a summary of the board lengths.

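| board length (feet) | 3.5–4 | 4–4.5 | 4.5–5 | 5–5.5 | 5.5–6 | 6–6.5 | 6.5–7 | 7–7.5 | 7.5–8 | 8–8.5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| frequency | 113 | 220 | 460 | 747 | 960 | 955 | 753 | 460 | 220 | 112 |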

The total area of all the rectangles in the histogram is 2,500 since we could stack all the bars on top of one another and have a rectangle that is 5,000 tall and 0.5 wide. If we look at just the rectangle representing boards between 5.5 and 6 feet long, the area is 480, which is 19.2% of the total area since $$\frac{480}{2,\!500} = 0.192$$. Similarly, we can see from the data that 19.2% of the data is in this same interval since $$\frac{960}{5,\!000} = 0.192$$. It is not a coincidence that these values are the same! The proportion of the total area that is in one of the rectangles is always equivalent to the proportion of all the data values that are in the same interval.
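
These area calculations can be reproduced with a short Python sketch that uses the frequencies from the table. The variable names are illustrative only.

```python
# Frequencies from the board-length table, for intervals 3.5-4, 4-4.5, ..., 8-8.5 feet.
FREQUENCIES = [113, 220, 460, 747, 960, 955, 753, 460, 220, 112]
BIN_WIDTH = 0.5  # each interval is 0.5 feet wide
LOWER_EDGES = [3.5 + BIN_WIDTH * i for i in range(len(FREQUENCIES))]

total_count = sum(FREQUENCIES)                          # 5000 boards
total_area = sum(f * BIN_WIDTH for f in FREQUENCIES)    # area of all rectangles

# The rectangle for boards between 5.5 and 6 feet long.
count_55_to_6 = FREQUENCIES[LOWER_EDGES.index(5.5)]
bar_area = count_55_to_6 * BIN_WIDTH

print(total_area, bar_area)                    # 2500.0 480.0
print(round(bar_area / total_area, 3))         # 0.192
print(round(count_55_to_6 / total_count, 3))   # 0.192
```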

When the data is normally distributed, the proportions of certain regions are always the same. For example, about 68% of the data is always within one standard deviation of the mean. Since the lengths of the boards produced by the woodshop are approximately normally distributed, we can test this claim.

The boards within one standard deviation of the mean are between 5 and 7 feet long. Using the table, we can see that 3,415 boards are in this range ($$747+960+955+753 = 3,\!415$$) and those represent 68.3% ($$\frac{3,\!415}{5,\!000} = 0.683$$) of the boards produced in the woodshop.
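
The same count can be made for any number of standard deviations. This is a minimal sketch using the table's frequencies, with the mean of 6 feet and standard deviation of 1 foot. The names `BOARDS` and `proportion_within` are illustrative only.

```python
# Frequencies from the board-length table, keyed by each interval's lower endpoint in feet.
BOARDS = {
    3.5: 113, 4.0: 220, 4.5: 460, 5.0: 747, 5.5: 960,
    6.0: 955, 6.5: 753, 7.0: 460, 7.5: 220, 8.0: 112,
}
BIN_WIDTH = 0.5


def proportion_within(mean, sd, k):
    """Proportion of boards in intervals lying entirely within k standard deviations of the mean."""
    low, high = mean - k * sd, mean + k * sd
    inside = sum(
        count for start, count in BOARDS.items()
        if low <= start and start + BIN_WIDTH <= high
    )
    return inside / sum(BOARDS.values())


print(proportion_within(6, 1, 1))  # 0.683
print(proportion_within(6, 1, 2))  # 0.955
```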

Let’s say that, another week, the woodshop produces 5,000 boards again, but this time, the mean is 6.5 feet and the standard deviation is 0.75 feet. As long as the board lengths continue to be approximately normal, we can expect about 68% of the boards to be within 1 standard deviation of the mean. For that week, it means that about 68% of the boards will be between 5.75 and 7.25 feet long.

In fact, as long as the interval can be described using only the mean and standard deviation and the data is normally distributed, the proportion of data values in the interval can be found. In general, about 68% of the data is within 1 standard deviation of the mean, about 95% of the data is within 2 standard deviations of the mean, and more than 99% of the data is within 3 standard deviations of the mean.
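
These general proportions can be checked with `NormalDist` from Python's standard `statistics` module. This minimal sketch assumes an exactly normal distribution, and the function name `proportion_within_k_sd` is illustrative only. Because the result depends only on the number of standard deviations, the standard normal curve (mean 0, standard deviation 1) is used.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within_k_sd(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(proportion_within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```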