Lesson 5

Describing Trends in Scatter Plots

Let’s look for associations between variables.

5.1: Which One Doesn’t Belong: Scatter Plots

Which one doesn’t belong? 

Four scatterplots.

 

5.2: Fitting Lines

Experiment with finding lines to fit the data.

  1. Here is a scatter plot. Experiment with different lines to fit the data. Pick the line that you think best fits the data. Compare it with a partner’s.

     
  2. Here is a different scatter plot. Experiment with drawing lines to fit the data. Pick the line that you think best fits the data. Compare it with a partner’s.

     
  3. In your own words, describe what makes a line fit a data set well.

5.3: Good Fit Bad Fit

The scatter plots both show the year and price for the same 17 used cars. However, each scatter plot shows a different model for the relationship between year and price.

Two scatterplots.
  1. Look at Diagram A.
    1. For how many cars does the model in Diagram A make a good prediction of its price?

    2. For how many cars does the model underestimate the price?

    3. For how many cars does it overestimate the price?

  2. Look at Diagram B.
    1. For how many cars does the model in Diagram B make a good prediction of its price?

    2. For how many cars does the model underestimate the price?

    3. For how many cars does it overestimate the price?

  3. For how many cars does the prediction made by the model in Diagram A differ from the actual price by more than $3,000? What about the model in Diagram B?

  4. Which model does a better job of predicting the price of a used car from its year?
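
One way to make the counting in these questions concrete is to compare each car's actual price with the price a model predicts. The sketch below is only an illustration: the five cars, the `model` function, and the name `summarize_errors` are all made up for this example and are not the 17 cars or the models shown in Diagrams A and B.

```python
def summarize_errors(cars, predict, threshold=3000):
    """Count underestimates, overestimates, and predictions that miss
    the actual price by more than `threshold` dollars."""
    under = sum(1 for year, price in cars if predict(year) < price)
    over = sum(1 for year, price in cars if predict(year) > price)
    far = sum(1 for year, price in cars if abs(predict(year) - price) > threshold)
    return under, over, far

# Hypothetical (year, price) pairs, invented for illustration only.
cars = [(2005, 6000), (2008, 9500), (2010, 17000), (2012, 12000), (2014, 21000)]

def model(year):
    # Hypothetical linear model: price rises $1,500 per model year after 2001.
    return 1500 * (year - 2001)

under, over, far = summarize_errors(cars, model)
print("underestimates:", under)
print("overestimates:", over)
print("off by more than $3,000:", far)
```

Running the same function with two different models on the same list of cars gives a simple way to compare which model makes fewer large errors.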

5.4: Practice Fitting Lines

  1. Is this line a good fit for the data? Explain your reasoning.
    A scatterplot. Horizontal, from 1000 to 1500, by 125’s. Vertical, from 0 to 4000, by 1000’s. 21 data points above and below line. Trends downward and right.
  2. Draw a line that fits the data better.
    A scatterplot. Horizontal, from 1000 to 1500, by 125’s. Vertical, from 0 to 4000, by 1000’s. 21 data points. Trend downward and to right.
  3. Is this line a good fit for the data? Explain your reasoning.
    A scatterplot.
  4. Draw a line that fits the data better.
    A scatterplot. Horizontal, from 0 to 100 by 25’s. Vertical, from 0 to 200, by 50’s. 20 data points. Trends upward and right, clustered in two groups.


A scatterplot, 30 points arranged very close to the line from 0 comma 0 to 10 comma 30.
A scatterplot, points at x= 0 lie between negative 2 and 8, generally trend up and to the right. Points at x = 9 lie between 20 and 35.
A scatterplot, points at x= 0 lie between negative 18 and negative 2, generally trend up and to the right. Points at x = 9 lie between 15 and 40.

These scatter plots were created by multiplying the \(x\)-coordinate by 3 and then adding a random number between two values to get the \(y\)-coordinate. For the first scatter plot, the random number was between -0.5 and 0.5. For the second scatter plot, it was between -2 and 2. For the third scatter plot, it was between -10 and 10.

  1. For each scatter plot, draw a line that fits the data.
  2. Explain why some were easier to do than others.
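
The recipe described above (multiply \(x\) by 3, then add a random number from a fixed range) can be tried out in a few lines of Python. This is a rough sketch, not the method used to make the original plots: the number of points, the range of \(x\)-values, the fixed seed, and the function names are all assumptions chosen for illustration.

```python
import random

def make_scatter(noise, n=30, seed=0):
    """Return n points (x, y) where y = 3x plus a random number
    between -noise and noise. n, seed, and the x-range are assumptions."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    points = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3 * x + rng.uniform(-noise, noise)
        points.append((x, y))
    return points

def max_distance_from_line(points):
    """Largest vertical distance between any point and the line y = 3x."""
    return max(abs(y - 3 * x) for x, y in points)

# The three ranges named in the text: 0.5, 2, and 10.
for noise in (0.5, 2, 10):
    pts = make_scatter(noise)
    print(noise, round(max_distance_from_line(pts), 2))
```

The larger the range of the random number, the farther the points can stray from the line \(y = 3x\), which is one reason a fitting line is harder to see in some of the plots.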

Summary

When a linear function fits data well, we say there is a linear association between the variables. For example, the relationship between height and weight for 25 dogs can be modeled with the linear function whose graph is shown in the scatter plot.

A scatterplot, horizontal, dog height in inches, 6 to 30 by 3, vertical, 0 to 112 by 16. Same scatterplot as previous, this time with a line through 9 comma 0 and 27 comma 80.

Because the model fits the data well and because the slope of the line is positive, we say that there is a positive association between dog height and dog weight.

What do you think the association between the weight of a car and its fuel efficiency is?

Scatterplot, weight, kilograms, 1000 to 2500 by 250, fuel efficiency, miles per gallon, 14 to 32 by 2. Points are arranged close to the line through 1100 comma 28 down and right through 2300 comma 14.

Because the slope of a line that fits the data well is negative, we say that there is a negative association between the fuel efficiency and weight of a car.
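
The sign of the slope can be checked with a short calculation. The sketch below uses the two points named in each graph description (the dog line through (9, 0) and (27, 80), and the car line through (1100, 28) and (2300, 14)); the function names `line_through` and `association` are made up for this example.

```python
def line_through(p, q):
    """Slope and y-intercept of the line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

def association(slope):
    """Describe the association suggested by a well-fitting line's slope."""
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "neither"

# Points taken from the descriptions of the two graphs.
dog_slope, dog_intercept = line_through((9, 0), (27, 80))
car_slope, car_intercept = line_through((1100, 28), (2300, 14))

print("dogs:", round(dog_slope, 2), association(dog_slope))
print("cars:", round(car_slope, 4), association(car_slope))
```

The slope only tells us the direction of the association. Whether the line is a good model still depends on how closely the points cluster around it.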

Glossary Entries

  • negative association

    A negative association is a relationship between two quantities where one tends to decrease as the other increases. In a scatter plot, the data points tend to cluster around a line with negative slope.

    Different stores across the country sell a book for different prices.

    The scatter plot shows that there is a negative association between the price of the book in dollars and the number of books sold at that price.

    Scatterplot with line of best fit.
  • outlier

    An outlier is a data value that is far from the other values in the data set.

    Here is a scatter plot that shows lengths and widths of 20 different left feet. The foot whose length is 24.5 cm and width is 7.8 cm is an outlier.

    A scatterplot with line.
  • positive association

    A positive association is a relationship between two quantities where one tends to increase as the other increases. In a scatter plot, the data points tend to cluster around a line with positive slope.

    The relationship between height and weight for 25 dogs is shown in the scatter plot. There is a positive association between dog height and dog weight.

    A scatterplot, horizontal, dog height in inches, 6 to 30 by 3, vertical, 0 to 112 by 16. Same scatterplot as previous, this time with a line through 9 comma 0 and 27 comma 80.