Illinois State University Mathematics Department

 MAT 312: Probability and Statistics for Middle School Teachers Spring 1999 9:35 - 10:50 am TR STV 350A Dr. Roger Day (day@ilstu.edu)

 Possible Solutions: Test #2

Part I: Multiple Choice

 1. The scatter plot to the right shows a _?_ relationship. Assume that the vertical and horizontal axes are identically scaled.

a. weak negative
b. weak positive
c. strong positive
d. strong negative

2. The visual representation shown here helps describe the relationship between mathematics placement-test scores and writing-test scores for an incoming class of students. The plot provides information about the _?_ of that relationship.

a. source, direction, and value
b. center, spread, and shape
c. location, value, and shape
d. shape, strength, and direction
e. direction, shape, and location

3. Estimate the slope of a spaghetti line that might appropriately fit the data plotted in Question 2.

a. -0.75
b. -0.35
c. 0
d. 0.25
e. 0.9
f. 2.10

4. True or false: A median-median line is more resistant to outliers than is a least-squares linear regression line.

a. True
b. False

The following situation is used for problems 5 through 7.

 Table 1: Raises and Job Performance Ratings for University Administrators

Administrator ID #   Annual Salary Raise, $ (y)   Job Rating, 5-Point Scale (x)
 1                   18,000                       2.76
 2                   16,700                       1.52
 3                   15,787                       4.40
 4                   10,608                       3.10
 5                   10,268                       3.83
 6                    9,795                       2.84
 7                    9,513                       2.10
 8                    8,459                       2.38
 9                    6,099                       3.59
10                    4,557                       4.11
11                    3,741                       3.14
12                    3,718                       3.64
13                    3,652                       3.36
14                    3,227                       2.92
15                    2,808                       3.00

A faculty group seeks to determine whether job rating (call it x) is a useful linear predictor of annual salary raise (call it y). The group uses least-squares linear regression to determine this prediction equation:

y = -1782.83x + 14012.17.
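As a check on this prediction equation, the Python sketch below refits a least-squares line to the Table 1 values. It uses the textbook formulas slope = Sxy/Sxx and intercept = (mean of y) - slope * (mean of x). This is only an illustration: the helper names are hypothetical and it is not necessarily how the faculty group computed the line. The printed coefficients should come out very close to the reported -1782.83 and 14012.17, with possible small differences in the last digits. The sketch also checks two properties that come up in Questions 5, 6, and 14. First, the residuals sum to 0, because the line passes through the point of means. Second, any other line, here one with a different slope, has a larger sum of squared residuals.

```python
# Table 1 data: job rating (x) and annual salary raise in dollars (y).
ratings = [2.76, 1.52, 4.40, 3.10, 3.83, 2.84, 2.10, 2.38,
           3.59, 4.11, 3.14, 3.64, 3.36, 2.92, 3.00]
raises = [18000, 16700, 15787, 10608, 10268, 9795, 9513, 8459,
          6099, 4557, 3741, 3718, 3652, 3227, 2808]


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept


def sse(xs, ys, slope, intercept):
    """Sum of squared residuals for the line y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


m, b = least_squares(ratings, raises)
residuals = [y - (m * x + b) for x, y in zip(ratings, raises)]

print(f"y = {m:.2f}x + {b:.2f}")
print("sum of residuals:", round(sum(residuals), 6))
# A different line (here, the slope changed by 10) has a larger SSE.
print(sse(ratings, raises, m, b) < sse(ratings, raises, m + 10, b))
```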

5. Which one of the following statements is most correct?

a. The prediction equation assures that the actual raises for the 15 administrators fall in a perfect straight line.
b. The prediction equation suggests that if we know an administrator's job rating, we can determine his or her exact salary raise.
c. The prediction equation suggests that for the mean job rating an administrator would get the mean salary increase.
d. The prediction equation suggests that, on the average, administrators have a poor job rating.
e. None of the statements (a) through (d) are correct.
f. More than one of the statements (a) through (d) are correct.

6. Which one of the following statements about the prediction line y = -1782.83x + 14012.17 is least correct?

a. The equation can be used to state the actual raise for an administrator by knowing his or her job rating.
b. The equation produces predicted raises with the sum of the squared residuals minimized.
c. If the linear model is accepted as valid, the equation could be used to predict an administrator raise from an administrator's job rating.
d. The equation produces predicted raises with an average residual (average error) of 0.

7. Which statement best interprets the meaning of the slope of the prediction equation?

a. For a 1-point increase in an administrator job rating, we can estimate a salary increase of $1782.83.
b. For a 1-point increase in an administrator job rating, we can estimate a salary decrease of $1782.83.
c. For an administrator with a job rating of 1.00, we can estimate his or her raise to be $1782.83.
d. For a $1 salary raise, we can estimate that the administrator job rating will decrease by 1782.83.
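The meaning of the slope can be checked numerically. The minimal Python sketch below evaluates the reported prediction equation at ratings one point apart; the function name is hypothetical. Each 1-point increase in job rating changes the predicted raise by the slope, -1782.83. In other words, the predicted raise is about $1782.83 lower, which is the reading given in statement (b).

```python
def predicted_raise(rating):
    """Predicted annual raise (dollars) from y = -1782.83x + 14012.17."""
    return -1782.83 * rating + 14012.17


# Raising the job rating by 1 point changes the predicted raise by the slope.
for rating in (2.0, 3.0, 4.0):
    change = predicted_raise(rating + 1) - predicted_raise(rating)
    print(f"rating {rating} -> {rating + 1}: change in predicted raise = {change:.2f}")
```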

8. True or false: The best model to represent any two-variable data set is a least-squares linear regression equation.

a. True
b. False

9. A median-median line is to be generated on a scatter plot. The plot has been divided into three sections and a median-median point for each of the three sections has been determined. What is the next step in creating the median-median line?

Note that a response of (a) or (e) has been counted as correct for this question.
a. Calculate the slope of the median-median line.
b. Determine the value of r, the correlation coefficient.
c. Draw an ellipse around the points of the scatter plot.
d. Identify the centroid of the data.
e. None of the statements (a) through (d) correctly identify the next step.
f. More than one of the steps in statements (a) through (d) could be completed next.

The following situation is used for problems 10 through 14.

 Emotional exhaustion, or burnout, can be a significant problem for college students. Researchers have used linear models to investigate how emotional exhaustion may relate to aspects of college life. One study considered how an index of exhaustion (determined through responses to a questionnaire) related to the portion of a person's social contact that was with students in the same program or field of study. The researchers called this factor "concentration of contacts." The table here lists the values of the emotional exhaustion index (higher values indicate greater emotional exhaustion) and the percent concentration of contacts for a sample of 25 education students from a large university. It can be shown that the equation of the median-median line that models these data is y = 11.51x - 148.833, where x is the percent concentration and y is the emotional exhaustion index. Also, the equation of the least-squares linear regression line for these data is y = 8.865x - 29.497, and its sum of the squared residuals is 698,009.

10. True or false: The slope of the least-squares regression line is greater than the slope of the median-median line.

a. True.
b. False.
c. It cannot be determined from the information provided.

11. The median-median line model predicts that someone with a concentration of 50% will have an emotional exhaustion index of _?_.

a. 0.7825
b. 413.78
c. 426.67
d. 698,009
e. None of these are correct.
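The two reported lines can be compared directly. The short Python sketch below uses hypothetical function names for the equations given above. It evaluates both lines at a concentration of 50% and compares their slopes, which bears on Questions 10 and 11.

```python
def mm_line(x):
    """Median-median line from the problem: y = 11.51x - 148.833."""
    return 11.51 * x - 148.833


def ls_line(x):
    """Least-squares line from the problem: y = 8.865x - 29.497."""
    return 8.865 * x - 29.497


concentration = 50  # percent
print(round(mm_line(concentration), 3))  # 426.667
print(round(ls_line(concentration), 3))  # 413.753
print("least-squares slope greater than median-median slope?", 8.865 > 11.51)
```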

12. Which statement below is the most meaningful interpretation of the slope of the least-squares regression line?

a. The emotional exhaustion index is estimated to decrease by 29.497 units for every 1% increase in concentration.
b. The emotional exhaustion index is estimated to increase by 8.865 units for every 1% increase in concentration.
c. The percent concentration is estimated to increase by 8.865 units for every 1 unit increase in emotional exhaustion.
d. The least-squares slope, 8.865, is the smallest slope that can be estimated using these data.

13. Which statement below is the most meaningful interpretation of the y-intercept of the median-median line?

a. The emotional exhaustion index is -148.833 when concentration is 0%.
b. The emotional exhaustion index increases by 148.833 units when concentration is increased from 0% to 1%.
c. The median-median line y-intercept has no meaningful interpretation, because 0% concentration is outside the range of the data.
d. The median-median line cuts through the x-axis at -148.833.

14. Which statement below is the most meaningful interpretation of the sum of the squared residuals for the least-squares regression line?

a. No other linear model will produce a larger sum of squared residuals.
b. The large value of the sum of the squared residuals indicates that a straight-line model is of no use for predicting emotional exhaustion (y).
c. No other straight-line model fit to these data will produce a smaller sum of the squared residuals.

15. A box-and-whiskers plot is a visual representation best suited for _?_ data.

a. quantitative
b. qualitative
c. loose

16. Among the following statistics, which one is likely being used to help make the following claim:

"Based on the sample, we are 95% sure that the average speed of all drivers on this highway is between 63 and 69 miles per hour"?

a. 5-number summary
b. mode
c. median
d. standard deviation

 17. Which of the following orders correctly represents the measures of central tendency for the distribution shown here?

a. P: mean, Q: median, R: mode
b. P: mode, Q: mean, R: median
c. P: median, Q: mode, R: mean
d. P: median, Q: mean, R: mode
e. P: mode, Q: median, R: mean

This visual representation shows test scores of 24 students in a statistics course.

18. If m students scored in the range shown and n students scored in the range shown, which statement is most correct?

a. n > m
b. n = m
c. m = 15
d. n = 21

The following situation is used for problems 19 and 20.

The distribution of times that it takes to drive from Illinois Wesleyan University to College Hills Mall at 5:00 pm on a weekday is mound shaped (normal) with a mean of 18 minutes and a standard deviation of 5 minutes.

19. Driving times ranging from 13 minutes to 23 minutes represent approximately what portion of all the driving times?

a. 5%
b. 13.5%
c. 27%
d. 34%
e. 68%
f. 95%

20. Determine a driving time that will be exceeded by approximately 2.5% of all drivers making the trip from IWU to College Hills Mall.

a. 8 minutes
b. 13 minutes
c. 18 minutes
d. 23 minutes
e. 28 minutes
f. 33 minutes
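Questions 19 and 20 rely on the empirical rule: about 68% of a normal distribution lies within 1 standard deviation of the mean, and about 95% lies within 2. As a cross-check, the Python sketch below uses the standard library's statistics.NormalDist (Python 3.8 or later) with mean 18 and standard deviation 5. The exact normal model puts about 68.27% of times between 13 and 23 minutes. It also puts the 97.5th percentile at about 27.8 minutes (mean + 1.96 SD); the empirical rule rounds 1.96 to 2, which gives 28 minutes.

```python
from statistics import NormalDist

drive = NormalDist(mu=18, sigma=5)  # driving times in minutes

# Q19: proportion of times within one standard deviation (13 to 23 minutes).
within_one_sd = drive.cdf(23) - drive.cdf(13)
print(f"P(13 <= T <= 23) = {within_one_sd:.4f}")  # close to the empirical-rule 68%

# Q20: the time exceeded by 2.5% of trips is the 97.5th percentile.
cutoff = drive.inv_cdf(0.975)
print(f"97.5th percentile = {cutoff:.2f} minutes")  # close to mean + 2 SD = 28
```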

Part II: Open Response

Complete each question and write your response in the space provided. Please include descriptive comments as necessary.

21. Here are the results of the long jump and 50-meter dash for six middle-school students.

 Student   x: jump length (ft)   y: dash time (sec)
Andy      15                    6
Becky     12                    8
Carl      17                    6
Dan        9                    10
Edna      11                    10
Fran      14                    8

a. On the grid provided above, create a scatter plot for these data. Represent jump length on the horizontal axis (x) and dash time on the vertical axis (y). See plot above.

b. Suppose that we want to create a median-median line for these data. Begin that process by finding the three median-median points used to help determine the equation of the line. DO NOT go beyond this step! You may show the ordered pairs on the scatter plot, but also list them here.

Median-Median Points: (10,10), (13,8), (16,6)

c. When we complete the process of determining the median-median line of best fit, the equation is y = -(2/3)x + 50/3, where x represents a student's jump length and y represents a student's dash time.

i) Interpret the value -2/3 as it relates to these data and the median-median line. The slope represents a predicted decrease of two-thirds of a second in dash time for every one-foot increase in jump length.

ii) Interpret the value 50/3 as it relates to these data and the median-median line. This is the y-intercept of the median-median line. In the context of jump length vs. dash time, it represents a dash time of 50/3 seconds (about 16.67 seconds) for someone whose long jump was 0 feet. In essence, it says that even a person who could not jump any distance in the long jump could complete the 50-meter dash. We should be skeptical of this conclusion, because it is an extrapolation well beyond the values in the data set.

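iii) Use the median-median line to predict the dash time for a student who jumped 10 feet. Substituting x = 10 gives y = -(2/3)(10) + 50/3 = 30/3 = 10, so the predicted dash time is 10 seconds.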
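The whole median-median procedure for Question 21 can be sketched in Python. The steps are: sort by x, split the data into three groups, take the median x and median y of each group, compute the slope from the two outer summary points, and average the three intercepts. Exact fractions keep -2/3 and 50/3 from turning into rounded decimals. The group-size rule used here is a common convention and an assumption: an extra point goes to the middle group when n mod 3 = 1, and one extra point goes to each outer group when n mod 3 = 2. The sketch also does not handle ties in x at group boundaries. The helper names are hypothetical.

```python
from fractions import Fraction
from statistics import median

# Q21 data: (jump length in feet, dash time in seconds).
students = {
    "Andy": (15, 6), "Becky": (12, 8), "Carl": (17, 6),
    "Dan": (9, 10), "Edna": (11, 10), "Fran": (14, 8),
}


def group_sizes(n):
    """Split n points into three groups as evenly as possible (assumed convention)."""
    k, r = divmod(n, 3)
    if r == 0:
        return k, k, k
    if r == 1:
        return k, k + 1, k      # extra point goes to the middle group
    return k + 1, k, k + 1      # one extra point to each outer group


def median_median_line(points):
    """Return (slope, intercept, summary_points) for the median-median line."""
    pts = sorted((Fraction(x), Fraction(y)) for x, y in points)  # sort by x
    left, middle, _ = group_sizes(len(pts))
    groups = [pts[:left], pts[left:left + middle], pts[left + middle:]]
    # Summary point of each group: median x and median y, taken separately.
    summary = [(median(p[0] for p in g), median(p[1] for p in g)) for g in groups]
    (x1, y1), (x2, y2), (x3, y3) = summary
    slope = (y3 - y1) / (x3 - x1)  # slope uses only the two outer summary points
    # Intercept: the average of the three intercepts through the summary points.
    intercept = ((y1 - slope * x1) + (y2 - slope * x2) + (y3 - slope * x3)) / 3
    return slope, intercept, summary


slope, intercept, summary = median_median_line(students.values())
print("summary points:", [(int(x), int(y)) for x, y in summary])
print("slope =", slope, " intercept =", intercept)
print("predicted dash time at 10 ft:", slope * 10 + intercept)
```

The printed summary points should match the median-median points listed in part (b), and the slope and intercept should match the equation in part (c).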

22. The questions here are to be used with the data set relating ages of trees to the diameters of the trees. The data set was distributed prior to the test.

a. Determine the centroid of the data. The centroid is at (22,5).

b. Use your calculator to generate the median-median line and the least squares linear regression line for these data. Write each equation in the form d = ma + b, where d is the diameter and a is the tree age.

 median-median line: d = 0.1815a + 0.9679
least-squares linear regression line: d = 0.1606a + 1.4658
Slopes and d-intercepts are rounded to the nearest ten-thousandth.

c. With respect to the least-squares linear regression line, calculate the sum of the squared residual values for the first five data values in the table, that is, for the ordered pairs with ages 4, 5, 8, 8, and 8. Show your calculations here.

 tree age (a)   tree diameter (d)   predicted diameter (d'), least-squares model   residual (d - d')   square of residual
4              1.0                 2.1084                                         -1.1084             1.2285
5              1.2                 2.2690                                         -1.0690             1.1428
8              1.3                 2.7510                                         -1.4510             2.1053
8              2.3                 2.7510                                         -0.4510             0.2034
8              3.3                 2.7510                                          0.5490             0.3014

SSE for first five values: 4.9815
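The residual calculation for the first five trees can be sketched in Python as follows; the function name is hypothetical. The sketch uses the rounded least-squares coefficients reported in part (b), so its predicted values differ from the table in the fourth decimal place. Its total comes out near 4.980 rather than 4.9815, most likely because the table was computed with the calculator's unrounded coefficients.

```python
# First five (age, diameter) pairs from the tree data table.
trees = [(4, 1.0), (5, 1.2), (8, 1.3), (8, 2.3), (8, 3.3)]


def predicted_diameter(age):
    """Least-squares line d = 0.1606a + 1.4658 (coefficients rounded)."""
    return 0.1606 * age + 1.4658


sse = 0.0
for age, diameter in trees:
    residual = diameter - predicted_diameter(age)  # observed minus predicted
    sse += residual ** 2
    print(f"a={age}  d={diameter}  d'={predicted_diameter(age):.4f}  "
          f"residual={residual:.4f}  squared={residual ** 2:.4f}")
print(f"SSE for first five values: {sse:.4f}")
```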

23. A 1990s labor dispute between the Major League Baseball Players' Association and the team owners drastically changed the face of professional baseball. During a recent spring, replacement players populated the training grounds of the major league teams.

Suppose we know that the distribution of the ages, in years, of all replacement players in spring training that season was mound shaped and symmetric (normal).

a. If the mean age of the replacement players was 20.4 years, and if 95% of the replacement players ranged from 18.6 to 22.2 years old, determine the standard deviation for the distribution. Show your calculations.

For a mound-shaped and symmetric (normal) distribution, approximately 95% of the data fall within 2 standard deviations of the mean, that is, from 2 standard deviations below the mean to 2 standard deviations above it. So for this situation, the range of ages from 18.6 to 22.2 years spans 4 standard deviations. This gives a standard deviation of (22.2 - 18.6)/4 = 3.6/4 = 0.9 years.
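The arithmetic in part (a) takes only a few lines of Python. The sketch below divides the width of the middle-95% interval by 4. It then checks the answer in two ways: the interval is centered on the mean, and mean ± 2 SD reproduces its endpoints.

```python
mean_age = 20.4
low, high = 18.6, 22.2  # ages covering the middle 95% of replacement players

# Empirical rule: the middle 95% runs from mean - 2*SD to mean + 2*SD (4 SDs wide).
sd = (high - low) / 4
print("standard deviation:", round(sd, 2))

# Check: mean +/- 2*SD should reproduce the given interval.
print("interval:", round(mean_age - 2 * sd, 2), "to", round(mean_age + 2 * sd, 2))
```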

b. Suppose, instead, that the mean age of the replacement players was 21.1 years and the distribution of ages had a standard deviation of 1.1 years. Was it unlikely that a team had a replacement player who turned 24 years old during that training camp? Explain.

If we take "unlikely" to mean that a player's age is outside the middle 95% of the data, then an age would be unlikely if it were more than 2 standard deviations away from the mean. For the information given here, that means the age would have to be less than 21.1 - 2(1.1) = 18.9 years or more than 21.1 + 2(1.1) = 23.3 years. Because 24 is more than 23.3, a 24-year-old replacement player would be unlikely under these conditions.
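The same reasoning can be expressed with a z-score, which counts how many standard deviations an age lies from the mean. The Python sketch below uses the "more than 2 standard deviations" reading of "unlikely" from the answer above. An age of 24 gives z = (24 - 21.1)/1.1, about 2.64, so it falls outside the middle 95%.

```python
mean_age, sd = 21.1, 1.1
age = 24

# "Unlikely" here means outside the middle 95%, i.e. more than 2 SDs from the mean.
low, high = mean_age - 2 * sd, mean_age + 2 * sd
z = (age - mean_age) / sd
print(f"middle 95%: {low:.1f} to {high:.1f} years")
print(f"z-score for age {age}: {z:.2f}")
print("unlikely" if abs(z) > 2 else "not unlikely")
```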