Illinois State University Mathematics Department

MAT 312: Probability and Statistics for Middle School Teachers

Dr. Roger Day (day@ilstu.edu)



Test #2: Possible Solutions

Scoring
  • Part I: 25 Multiple Choice Questions (1 pt each)
  • Part II: 2 Open-Response Questions (25 pts total)
  • Total: 50 points
  • Impact on Course Grade: 20% of your grade

Criteria Used to Evaluate Part II Responses

26: 15 points

a) 2 pts: accurate scatter plot
b) 3 pts: correct median-median points, identified on scatter plot and listed as ordered pairs
c) 6 pts
(i) and (ii): 2 pts each: accurate interpretation, clearly expressed
(iii): 2 pts: correct numerical response to nearest tenth of a pound
d) 2 pts: correct least-squares equation
e) 2 pts: correct numerical response to nearest hundredth

27: 10 points

a) 2 pts: correct equation
b) 1 pt: correct statement of the correlation coefficient, precisely as shown on calculator screen
c) 1 pt: correct SSE, rounded to nearest hundredth of a unit
d) 2 pts: accurate residual plot, including indication of scales
e) 4 pts: comprehensive and accurate report of your exploration, with reference to at least two additional models and appropriate use of criteria for judging goodness of fit

Part I: Multiple Choice

For each question, choose the one best response and circle that letter at the appropriate spot on the answer sheet.

1. The scatter plot to the right shows _?_ relationship. Assume that vertical and horizontal axes are identically scaled.

a. a strong negative
b. a strong positive
c. a weak negative
d. a weak positive


2. The visual representation shown here helps describe the relationship between direct current electrical output from a wind power generator and wind speed. The plot provides information about the _?_ of that relationship.

a. center, spread, and shape
b. direction, shape, and location
c. location, value, and shape
d. shape, strength, and direction
e. source, direction, and value

3. Estimate the slope of a spaghetti line that might appropriately fit the data plotted in Question 2. Note the axes scale values.

a. -4.00
b. 0.20
c. 0.75
d. 1.25
e. 5.00


4. True or false: A least-squares linear regression line maximizes the sum of the squared residual values.

a. True
b. False


The following data is used for problems 5 through 7.

Table 1: Comparison of Per Capita Ice Cream Consumption and Price of Ice Cream (Thirty 4-Week Periods)

Price per pint, in dollars,    Ice Cream Consumption
for Ice Cream (x)              in Pints Per Capita (y)
.270                           .386
.282                           .374
.277                           .393
.280                           .425
.272                           .406
.262                           .344
.275                           .327
.267                           .288
.265                           .269
.277                           .256
.282                           .286
.270                           .298
.272                           .329
.287                           .318
.277                           .381
.287                           .381
.280                           .470
.277                           .443
.277                           .386
.277                           .342
.292                           .319
.287                           .307
.277                           .284
.285                           .326
.282                           .309
.265                           .359
.265                           .376
.265                           .416
.268                           .437
.260                           .548

The data above is from a research study. Ice cream consumption was measured over 30 four-week periods. One purpose of the study was to determine whether ice cream consumption depended on the price of ice cream. The researcher uses least-squares linear regression to determine this prediction equation:

y = -2.047x + 0.9230.

5. Which one of the following statements is least correct?

a. The prediction equation assures there is a strong negative relationship between price and consumption of ice cream.
b. The prediction equation can be used to predict general patterns in ice cream consumption based on ice cream price.
c. The prediction equation indicates that if ice cream was available at no cost ($0), the per capita consumption would be just less than 1 pint.
d. The prediction equation indicates that there is a negative relationship between price and consumption of ice cream.
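For option a, the equation by itself says nothing about strength; the correlation coefficient does. The sketch below is a minimal standard-library Python check, assuming Table 1 has been transcribed correctly; `fit_line` is our own helper name, not a calculator command. It recomputes the least-squares line from the 30 periods along with r, which should come out near -0.26, a weak negative relationship.

```python
from math import sqrt

# Table 1: price per pint in dollars (x) and per capita consumption in pints (y)
price = [.270, .282, .277, .280, .272, .262, .275, .267, .265, .277,
         .282, .270, .272, .287, .277, .287, .280, .277, .277, .277,
         .292, .287, .277, .285, .282, .265, .265, .265, .268, .260]
consumption = [.386, .374, .393, .425, .406, .344, .327, .288, .269, .256,
               .286, .298, .329, .318, .381, .381, .470, .443, .386, .342,
               .319, .307, .284, .326, .309, .359, .376, .416, .437, .548]


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # r uses the same deviation sums as the slope
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)


slope, intercept, r = fit_line(price, consumption)
print(f"y = {slope:.3f}x + {intercept:.4f},  r = {r:.2f}")
```

A negative slope fixes the direction, but with |r| this small the points scatter widely around the line, which is why "assures there is a strong negative relationship" overstates what the equation tells us.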

6. Which one of the following statements about the prediction line y = -2.047x + 0.9230 is most correct?

a. The linear equation can be used to state the actual consumption of ice cream when we know the price of ice cream.
b. The linear equation generates predictions for ice cream consumption for which the sum of the squared residuals is minimized.
c. The linear equation predicts that ice cream priced at $0.400 (40 cents) per pint will yield a per capita consumption of about one-fifth of a pint.
d. The linear equation suggests that when per capita consumption is at 0.300 pints, the price of ice cream will be about $0.403 (40.3 cents) per pint.
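Options c and d come down to arithmetic with the stated equation. This small sketch (the helper names are ours, chosen for illustration) evaluates the line at x = 0.400 and, for comparison only, solves it backward for the price paired with 0.300 pints. A regression of y on x is built to predict y from x, so the backward step is not a legitimate use of the line; it only shows that the number in option d is also off.

```python
def predicted_consumption(price):
    """Per capita consumption (pints) from y = -2.047x + 0.9230."""
    return -2.047 * price + 0.9230


def price_solved_backward(pints):
    """Solve y = -2.047x + 0.9230 for x; used here only to check option d's number."""
    return (0.9230 - pints) / 2.047


print(round(predicted_consumption(0.400), 4))  # option c claims about 0.2 (one-fifth)
print(round(price_solved_backward(0.300), 3))  # option d claims about 0.403
```

The first value is roughly one-tenth of a pint rather than one-fifth, and the second is roughly $0.304 rather than $0.403.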

7. Which statement best interprets the meaning of the slope of the prediction equation?

a. For a $1 increase in the price of a pint of ice cream, we can estimate a per capita ice cream consumption increase of approximately $2.05.
b. For a $1 increase in the price of a pint of ice cream, we can estimate a per capita ice cream consumption decrease of approximately 2.05 pints.
c. For ice cream priced at $1 per pint, we can estimate that the per capita ice cream consumption will decrease by 0.9230 pints.
d. For 1 pint of ice cream consumed per capita, we can estimate its price will be approximately $2.05.


8. True or false: For any two-variable data set, the calculator's cubic regression model will always generate a smaller SSE than will the calculator's linear regression model.

a. True
b. False
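The key word in Question 8 is "always." A least-squares cubic includes every straight line as a special case (set the x² and x³ coefficients to 0), so its SSE can never exceed the linear SSE, but the two can tie, for example when the points lie exactly on a line. The sketch below illustrates this with a hand-rolled polynomial least-squares fit via the normal equations; `poly_fit` and `sse` are our own illustrative helpers, not the calculator's routines, and the data are the Problem 26 weight-loss values.

```python
# Problem 26 weight-loss data: weeks on the diet (x) and pounds lost (y)
weeks = [10, 4, 8, 5, 12, 7, 5, 9, 6]
loss = [19, 18, 8, 16, 26, 12, 12, 29, 15]


def poly_fit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] from the normal equations."""
    k = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k):
                a[row][c] -= factor * a[col][c]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, k))) / a[i][i]
    return coeffs


def sse(xs, ys, coeffs):
    """Sum of squared residuals for the polynomial with the given coefficients."""
    return sum((y - sum(c * x ** p for p, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))


linear_sse = sse(weeks, loss, poly_fit(weeks, loss, 1))
cubic_sse = sse(weeks, loss, poly_fit(weeks, loss, 3))
print(linear_sse, cubic_sse)  # the cubic SSE is never larger

# Perfectly linear data: both fits are exact, so the two SSEs tie at (numerically) zero
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]
print(sse(xs, ys, poly_fit(xs, ys, 1)), sse(xs, ys, poly_fit(xs, ys, 3)))
```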


9. A median-median line is to be generated from a scatter plot of data. Given the scatter plot, what is the first step in creating the median-median line?

a. Calculate the slope of the median-median line.
b. Determine the ordered pairs corresponding to the summary points.
c. Draw an ellipse around the points of the scatter plot.
d. Partition the data into three groups with, if possible, an equal number of points in each group.
e. None of the statements (a) through (d) correctly identify the first step.
f. More than one of the steps in statements (a) through (d) could be completed first.
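The full median-median procedure, starting from the partitioning step in option d, can be sketched in Python as follows. This is a minimal sketch: `median_median_line` is a hypothetical helper name, and the rule for placing leftover points when n is not a multiple of 3 is a common convention assumed here rather than one stated in this test. Applied to the chestnut oak data from Questions 10 through 14, it should reproduce the slope 0.18148 and intercept 0.967901 given there.

```python
from statistics import median


def median_median_line(xs, ys):
    """Return (summary_points, slope, intercept) for the median-median line.

    Assumed grouping convention: with n = 3k + 1 the extra point goes to the
    middle group; with n = 3k + 2 one extra point goes to each outer group.
    """
    points = sorted(zip(xs, ys))  # step 1: order the data by x, then partition
    k, rem = divmod(len(points), 3)
    left, middle = {0: (k, k), 1: (k, k + 1), 2: (k + 1, k)}[rem]
    groups = [points[:left], points[left:left + middle], points[left + middle:]]
    # Step 2: each summary point is (median x, median y), taken separately per group
    summary = [(median(x for x, _ in g), median(y for _, y in g)) for g in groups]
    (x1, y1), (x2, y2), (x3, y3) = summary
    slope = (y3 - y1) / (x3 - x1)  # slope from the two outer summary points
    # Intercept: average of the intercepts of parallel lines through all three points
    intercept = ((y1 - slope * x1) + (y2 - slope * x2) + (y3 - slope * x3)) / 3
    return summary, slope, intercept


ages = [4, 5, 8, 8, 8, 10, 10, 12, 13, 14, 16, 18, 20, 22, 23, 25, 28, 29,
        30, 30, 33, 34, 35, 38, 38, 41, 42]
diameters = [1.0, 1.2, 1.3, 2.3, 3.3, 2.4, 3.8, 5.1, 3.8, 2.7, 4.7, 4.9, 5.8, 6.1,
             5.0, 6.8, 6.2, 4.8, 6.2, 7.3, 8.2, 6.8, 7.3, 5.2, 7.3, 7.7, 7.8]
summary, slope, intercept = median_median_line(ages, diameters)
print(summary, round(slope, 5), round(intercept, 6))
```

The same function applied to the Problem 26 diet data should return the summary points (5, 16), (7, 12), (10, 26) and the line y = 2x + 10/3.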


The following situation is used for problems 10 through 14.

A horticulturist gathered the data shown in the table below. We are interested in the relationship that may exist between tree age (x) and tree diameter (y).

It can be shown that the equation for the median-median line that models this data set is y = 0.18148x + 0.967901. Also, the equation of the least-squares linear regression line for this data is y = 0.16065x + 1.465806. The sum of the squared residuals for the least-squares linear regression line is 25.87845.

Tree Age and Diameter

This table lists the ages and diameters of 27 chestnut oak trees planted on a poor site.

Age in Years (x)    Diameter in Inches (y)
4                   1.0
5                   1.2
8                   1.3
8                   2.3
8                   3.3
10                  2.4
10                  3.8
12                  5.1
13                  3.8
14                  2.7
16                  4.7
18                  4.9
20                  5.8
22                  6.1
23                  5.0
25                  6.8
28                  6.2
29                  4.8
30                  6.2
30                  7.3
33                  8.2
34                  6.8
35                  7.3
38                  5.2
38                  7.3
41                  7.7
42                  7.8
This data is adapted from Elements of Forest Mensuration (1936) by Chapman & Demeritt. The diameters were measured 48 inches off the ground.

10. True or false: The slope of the least-squares regression line is smaller than the slope of the median-median line.

a. True.
b. False.
c. It cannot be determined from the information provided.

11. The median-median line model predicts that a 32-year-old tree from the research location will have a diameter of _?_ inches.

a. 5.141
b. 5.807
c. 6.606
d. 6.775
e. None of these are correct.

12. Which statement below is the most meaningful interpretation of the slope of the least-squares regression line?

a. A tree's diameter is estimated to increase by 1.465806 inches for every 1-year increase in the tree's age.
b. A tree's diameter is estimated to increase by 0.16065 inches for every 1-year increase in the age of the tree.
c. For every 1-inch increase in a tree's diameter, its age is estimated to have increased by 0.16065 years.
d. For every 1-inch increase in a tree's diameter, its age is estimated to have increased by 1.465806 years.

13. Which statement below is the most correct interpretation of the y-intercept of the median-median line?

a. A tree's diameter will be 0.967901 inches when the tree is 0.18148 years old.
b. A tree's diameter will increase by 0.967901 inches during the tree's first year of life.
c. The median-median line cuts through the x-axis at 0.967901.
d. The median-median line y-intercept indicates that a tree has a diameter of 0.967901 inches when the tree is 0 years old.

14. Which statement below is the most meaningful interpretation of the sum of the squared residuals (SSE) for the least-squares regression line?

a. Because the SSE is greater than 25, this straight-line model is of no use for predicting tree diameter (y).
b. Because the SSE is so small, the least-squares regression line will be the best model for these data.
c. No other linear model fit to these data will produce a smaller sum of the squared residuals.
d. Only the median-median line model will produce a smaller SSE.
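Option c is the defining property of least squares. As a rough check (a sketch; `sse` is our own helper name), the code below recomputes the least-squares line and its SSE from the tree table, then evaluates the SSE of the median-median line and of a slightly tilted least-squares line. Both of those should be larger than 25.87845.

```python
ages = [4, 5, 8, 8, 8, 10, 10, 12, 13, 14, 16, 18, 20, 22, 23, 25, 28, 29,
        30, 30, 33, 34, 35, 38, 38, 41, 42]
diameters = [1.0, 1.2, 1.3, 2.3, 3.3, 2.4, 3.8, 5.1, 3.8, 2.7, 4.7, 4.9, 5.8, 6.1,
             5.0, 6.8, 6.2, 4.8, 6.2, 7.3, 8.2, 6.8, 7.3, 5.2, 7.3, 7.7, 7.8]


def sse(slope, intercept):
    """Sum of squared residuals for the line y = slope*x + intercept on the tree data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(ages, diameters))


# Least-squares line from the deviation sums
n = len(ages)
mx, my = sum(ages) / n, sum(diameters) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(ages, diameters))
sxx = sum((x - mx) ** 2 for x in ages)
ls_slope = sxy / sxx
ls_intercept = my - ls_slope * mx

print("least squares:", round(ls_slope, 5), round(ls_intercept, 6), round(sse(ls_slope, ls_intercept), 5))
print("median-median SSE:", round(sse(0.18148, 0.967901), 5))
print("tilted LS line SSE:", round(sse(ls_slope + 0.01, ls_intercept), 5))
```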


15. A stem-and-leaf plot _?_.

a. can have only one-line stems
b. cannot be used with values expressed as decimal fractions
c. does not preserve the values of a data set
d. is not a visual summary of the data
 

16. Among the following statistics, which one is most likely being used to support the following statement:

"The first-place score of 178 is clearly an outlier among all scores for this event"?
 
a. correlation coefficient
b. 5-number summary
c. mean
d. mode
e. SSE

17. Which of the following orders correctly represents the measures of central tendency for the distribution shown here?

a. A: mean B: median C: mode
b. A: median B: mean C: mode
c. A: median B: mode C: mean
d. A: mode B: mean C: median
e. A: mode B: median C: mean


Use the diagram below for questions 18 through 21.

18. In the box plot, where is the 75th percentile located?

a. at 17
b. at 20
c. at 29
d. somewhere between 20 and 38

19. Which quartile in the data set exhibits the most spread?

a. the lowest quartile
b. the 25th to 50th percentile
c. the 50th to 75th percentile
d. the highest quartile

20. What value in the data set is the smallest value inside the lower inner fence?

a. -4
b. 5
c. 14
d. 28
e. More than one of these values satisfy the conditions stated.
f. None of these values satisfy the conditions stated.

21. Although not shown in the plot, which of the following values would be considered an outlier in this data set?

a. -5
b. 6
c. 17
d. 25
e. More than one of these values would be an outlier.
f. None of these values would be outliers.


The following situation is used for problems 22 and 23.

Suppose that the ages in years of professional golfers on the PGA Tour form a mound-shaped (normal) distribution with a mean of 34 years and a standard deviation of 6 years.

22. Ages ranging from 34 years to 40 years represent approximately what portion of all the ages in this distribution?

a. 5%
b. 13.5%
c. 27%
d. 34%
e. 68%
f. 95%

23. Determine an age that will be exceeded by approximately 97.5% of all ages in the distribution.

a. 16 years
b. 22 years
c. 28 years
d. 34 years
e. 40 years
f. 46 years
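Questions 22 and 23 rely on the empirical (68-95-99.7) rule: 34 to 40 years spans the mean to one standard deviation above it, about 34%, and the age exceeded by about 97.5% of golfers sits two standard deviations below the mean, 34 - 2(6) = 22 years. Python's standard `statistics.NormalDist` gives the exact normal values for comparison; expect roughly 0.341 and 22.24, since the rule's 95% is itself a rounded figure.

```python
from statistics import NormalDist

ages = NormalDist(mu=34, sigma=6)

# Question 22: share of ages between the mean (34) and one SD above it (40)
share_34_to_40 = ages.cdf(40) - ages.cdf(34)

# Question 23: the age exceeded by 97.5% of golfers is the 2.5th percentile
age_exceeded_by_97_5 = ages.inv_cdf(0.025)

print(round(share_34_to_40, 4), round(age_exceeded_by_97_5, 2))
```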


24. Here is a representation for a linear relationship:

Choose the representation below that captures the same linear relationship. Correct Response: E

25. Determine the upper inner fence for this data set: 1, 2, 4, 4, 4, 7, 8, 8, 9, 9, 9, 9, 9, 11, 15, 15, 20

a. -5
b. 6
c. 10
d. 19
e. 20
f. None of these.
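The fence arithmetic for Question 25 can be sketched as below. We assume the median-of-halves convention for quartiles (when n is odd, the median is excluded from both halves); some software uses other quartile definitions that can shift the fences slightly. `inner_fences` is an illustrative helper name. For this data set it yields Q1 = 4, Q3 = 10, and IQR = 6, so the upper inner fence is 10 + 1.5(6) = 19 and the lower inner fence is 4 - 1.5(6) = -5.

```python
from statistics import median


def inner_fences(data):
    """Return (q1, q3, iqr, lower_fence, upper_fence).

    Assumed quartile rule: split the sorted data at the median; when n is odd,
    the median itself belongs to neither half.
    """
    values = sorted(data)
    half = len(values) // 2
    lower_half = values[:half]
    upper_half = values[half + 1:] if len(values) % 2 else values[half:]
    q1, q3 = median(lower_half), median(upper_half)
    iqr = q3 - q1
    return q1, q3, iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [1, 2, 4, 4, 4, 7, 8, 8, 9, 9, 9, 9, 9, 11, 15, 15, 20]
q1, q3, iqr, lower, upper = inner_fences(data)
print(q1, q3, iqr, lower, upper)
```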

Part II: Open Response

Complete each question and write your response in the space provided.

26. Here are weight-loss results for nine people who are participating in a low-carb diet program.

Participant    Number of Weeks     Total Weight Loss
ID #           on the Diet (x)     in Pounds (y)
1              10                  19
2              4                   18
3              8                   8
4              5                   16
5              12                  26
6              7                   12
7              5                   12
8              9                   29
9              6                   15

a. On the grid provided above, create a scatter plot for these data. Represent weeks on the diet on the horizontal axis (x) and weight loss in pounds on the vertical axis (y).

b. Suppose that we want to create a median-median line for these data. Begin that process by finding the three summary points used to help determine the equation of the line. DO NOT go beyond this step! You must identify these ordered pairs on the scatter plot and also list them here.

(x1,y1) = (5,16)
(x2,y2) = (7,12)
(x3,y3) = (10,26)

c. When we complete the process of determining the median-median line of best fit, the equation is y = 2x + (10/3), where x represents number of weeks in the program and y represents total weight loss in pounds.

i) Interpret the value 2 in the median-median line equation as it relates to these data.

 For each additional week in the weight-loss program, the model predicts an additional 2 pounds of weight loss for a participant.

ii) Interpret the value (10/3) as it relates to these data and the median-median line.

 With respect to the equation itself, it says that for 0 weeks in the program, a participant will lose 10/3 pounds. Of course, this isn't meaningful if we assume that some positive time participation is required to actually lose weight.

iii) Use the median-median line to predict the weight loss for a participant who has been in the program for 11 weeks.

 Substituting x = 11 weeks into the median-median line gives y = 2(11) + 10/3 ≈ 25.3 pounds lost (rounded to the nearest tenth).

d. Enter these data into your calculator and create the least-squares linear regression equation, where weeks on the diet is the independent variable (x) and weight loss in pounds is the dependent variable (y).

 y = 1.363095238x + 7.226190476

e. For these data, calculate the difference between the SSEs for the median-median line and the least-squares line.

median-median line SSE: 289.666667
least-squares linear regression SSE: 261.5059524
difference: 28.16071429, or 28.16 to the nearest hundredth
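The values in parts d and e can be reproduced without a calculator. This minimal sketch (the helper names are ours) computes the least-squares line from deviation sums, then the SSE of each line, using the median-median line y = 2x + 10/3 given in part c.

```python
# Problem 26 data: weeks on the diet (x) and total pounds lost (y)
weeks = [10, 4, 8, 5, 12, 7, 5, 9, 6]
loss = [19, 18, 8, 16, 26, 12, 12, 29, 15]


def least_squares(xs, ys):
    """Slope and intercept of the least-squares line, from deviation sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def sse(xs, ys, slope, intercept):
    """Sum of squared residuals for y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


m, b = least_squares(weeks, loss)
sse_ls = sse(weeks, loss, m, b)
sse_mm = sse(weeks, loss, 2, 10 / 3)  # median-median line from part c
print(f"least squares: y = {m:.9f}x + {b:.9f}")
print(f"SSE (median-median) = {sse_mm:.6f}, SSE (least squares) = {sse_ls:.7f}")
print(f"difference = {sse_mm - sse_ls:.2f}")
```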


27. The questions here are to be used with the planetary data shown below. Begin by carefully entering the data into your calculator.

This data shows the distance and orbital period of each planet in our solar system. Note that planet Earth is the third entry in the table.

Distance from the Sun in     Orbital Period
Astronomical Units (x)       in Years (y)
0.386                        0.241
0.720                        0.615
1.00                         1.00
1.52                         1.88
5.19                         11.9
9.53                         29.4
19.2                         83.8
30.0                         164
39.5                         248

a. Use your calculator to generate the least-squares linear regression line for these data. Write the equation in the form y = mx + b, where x is the distance from the sun and y is the orbital period.

 y = 6.103775467x - 12.50541652

b. State the correlation coefficient for the least-squares linear regression line.

 r = 0.988682691

c. Calculate the SSE for the least-squares linear regression line.

 SSE = 1438.910023

d. Calculate and plot the residuals for the least-squares linear regression line. Sketch the residual plot here. On the graph, include indication of the scale you are using.
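Parts a through d can be checked with a short standard-library sketch (the variable names are ours). It computes the least-squares slope, intercept, r, and SSE from the table and prints each residual. The residual signs run positive for the first four entries, negative for the next four, then positive again for the last entry, a curved pattern suggesting that a straight line is not the right model even though r is close to 1.

```python
from math import sqrt

distance = [0.386, 0.720, 1.00, 1.52, 5.19, 9.53, 19.2, 30.0, 39.5]  # AU (x)
period = [0.241, 0.615, 1.00, 1.88, 11.9, 29.4, 83.8, 164, 248]     # years (y)

n = len(distance)
mx, my = sum(distance) / n, sum(period) / n
sxx = sum((x - mx) ** 2 for x in distance)
syy = sum((y - my) ** 2 for y in period)
sxy = sum((x - mx) * (y - my) for x, y in zip(distance, period))

slope = sxy / sxx
intercept = my - slope * mx
r = sxy / sqrt(sxx * syy)
residuals = [y - (slope * x + intercept) for x, y in zip(distance, period)]
sse = sum(e ** 2 for e in residuals)

print(f"y = {slope:.6f}x + ({intercept:.6f}),  r = {r:.6f},  SSE = {sse:.3f}")
for x, e in zip(distance, residuals):
    print(f"x = {x:6.3f}   residual = {e:8.3f}")
```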

e. Explore at least two other models to better represent these data as compared to the least-squares linear regression line. Report the results of your exploration with respect to the criteria we have identified for judging goodness of fit.

Here is goodness-of-fit information for the potential models available using your calculator's regression capabilities. Your report should identify at least two models beyond the linear model and discuss the components of each model that contribute to goodness of fit.

You also might be interested to know that Kepler's Third Law precisely describes the relationship between a planet's distance from the sun and its orbital period!
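Kepler's Third Law says that, with distance in astronomical units and period in years, the square of the period equals the cube of the distance, so y = x^1.5. A power model y = a x^b can be fit by ordinary least squares on ln y versus ln x. We assume (this test does not say so) that this log-transformed fit is what the calculator's power regression does; under that assumption the reported r is the correlation of the logged data. The sketch should give b close to 1.5 and a close to 1.

```python
from math import exp, log, sqrt

distance = [0.386, 0.720, 1.00, 1.52, 5.19, 9.53, 19.2, 30.0, 39.5]  # AU (x)
period = [0.241, 0.615, 1.00, 1.88, 11.9, 29.4, 83.8, 164, 248]     # years (y)

# Power model y = a * x**b  <=>  ln y = ln a + b * ln x, a straight line in log-log space
lx = [log(x) for x in distance]
ly = [log(y) for y in period]
n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n
sxx = sum((u - mx) ** 2 for u in lx)
syy = sum((v - my) ** 2 for v in ly)
sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))

b = sxy / sxx               # exponent
a = exp(my - b * mx)        # back-transform the log-space intercept
r = sxy / sqrt(sxx * syy)   # correlation of ln x and ln y

print(f"y = {a:.4f} * x^{b:.4f}   (r = {r:.7f})")
```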

Name/Type of Model                SSE            correlation coefficient (r)        residuals
least-squares linear regression   1438.910       r=0.988682691                      pattern (quadratic?)
median-median line                1924.299       r not calculated
quadratic regression              18.3277        R^2=0.999713                       pattern (cubic?)
cubic regression                  0.979485       R^2=0.9999846795                   pattern (quartic?)
quartic regression                0.1557246959   R^2=0.9999975643                   pattern (damped oscillation?)
exponential regression            229407.86      r=0.8934489799                     pattern
logarithmic regression            20425.904      r=0.824930726                      pattern
power regression                  0.1372815378   r^2=0.9999989029, r=0.9999994514   pattern (cubic?)
logistic regression               423.0237       r not calculated                   pattern
sinusoidal regression             512300.97      r not calculated                   pattern