Illinois State University Mathematics Department
MAT 312: Probability and Statistics for Middle School Teachers Spring 1999 
Possible Solutions: Test #2
Part I: Multiple Choice
1. The scatter plot to the right shows a _?_ relationship. Assume that vertical and horizontal axes are identically scaled.
a. weak negative
b. weak positive
c. strong positive
d. strong negative
2. The visual representation shown here helps describe the relationship between mathematics placement-test scores and writing-test scores for an incoming class of students. The plot provides information about the _?_ of that relationship.
a. source, direction, and value
b. center, spread, and shape
c. location, value, and shape
d. shape, strength, and direction
e. direction, shape, and location
3. Estimate the slope of a spaghetti line that might appropriately fit the data plotted in Question 2.
 a. 0.75
 b. 0.35
 c. 0
 d. 0.25
 e. 0.9
 f. 2.10
4. True or false: A median-median line is more resistant to outliers than is a least-squares linear regression line.
 a. True
 b. False
The following situation is used for problems 5 through 7.
Table 1: Raises and Job Performance Ratings for University Administrators
Administrator ID #   Annual Salary Raise (y)   Job Rating: 5-Point Scale (x)
1                    $18,000                   2.76
2                    16,700                    1.52
3                    15,787                    4.40
4                    10,608                    3.10
5                    10,268                    3.83
6                    9,795                     2.84
7                    9,513                     2.10
8                    8,459                     2.38
9                    6,099                     3.59
10                   4,557                     4.11
11                   3,741                     3.14
12                   3,718                     3.64
13                   3,652                     3.36
14                   3,227                     2.92
15                   2,808                     3.00
A faculty group seeks to determine whether job rating (call it x) is a useful linear predictor of annual salary raise (call it y). The group uses least-squares linear regression to determine this prediction equation:
y = -1782.83x + 14012.17
5. Which one of the following statements is most correct?
 a. The prediction equation assures that the actual raises for the 15 administrators fall in a perfect straight line.
 b. The prediction equation suggests that if we know an administrator's job rating, we can determine his or her exact salary raise.
 c. The prediction equation suggests that for the mean job rating an administrator would get the mean salary increase.
 d. The prediction equation suggests that, on the average, administrators have a poor job rating.
 e. None of the statements (a) through (d) are correct.
 f. More than one of the statements (a) through (d) are correct.
6. Which one of the following statements about the prediction line y = -1782.83x + 14012.17 is least correct?
 a. The equation can be used to state the actual raise for an administrator by knowing his or her job rating.
 b. The equation produces predicted raises with the sum of the squared residuals minimized.
 c. If the linear model is accepted as valid, the equation could be used to predict an administrator raise from an administrator's job rating.
 d. The equation produces predicted raises with an average residual (average error) of 0.
7. Which statement best interprets the meaning of the slope of the prediction equation?
 a. For a 1-point increase in an administrator job rating, we can estimate a salary increase of $1782.83.
 b. For a 1-point increase in an administrator job rating, we can estimate a salary decrease of $1782.83.
 c. For an administrator with a job rating of 1.00, we can estimate his or her raise to be $1782.83.
 d. For a $1 salary raise, we can estimate that the administrator job rating will decrease by 1782.83.
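As a rough check on Questions 5 through 7, the short Python sketch below (standard library only) fits a least-squares line to the Table 1 data, assuming the table values above were transcribed correctly. The fitted slope comes out negative and close to the reported 1782.83 in size, and the intercept lands near 14012 (small differences can come from rounding in the published values). Because the least-squares intercept is chosen so the line passes through (mean rating, mean raise), the average residual is essentially 0.

```python
from statistics import mean

# Table 1: job rating (x) and annual salary raise in dollars (y)
ratings = [2.76, 1.52, 4.40, 3.10, 3.83, 2.84, 2.10, 2.38,
           3.59, 4.11, 3.14, 3.64, 3.36, 2.92, 3.00]
raises = [18000, 16700, 15787, 10608, 10268, 9795, 9513, 8459,
          6099, 4557, 3741, 3718, 3652, 3227, 2808]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # This intercept forces the line through the centroid (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = least_squares(ratings, raises)
residuals = [y - (slope * x + intercept) for x, y in zip(ratings, raises)]

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"mean residual = {mean(residuals):.6f}")
```

A negative slope is what makes choice (b) of Question 7 the sensible reading: each additional rating point is associated with a smaller estimated raise.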
8. True or false: The best model to represent any two-variable data set is a least-squares linear regression equation.
 a. True
 b. False
9. A median-median line is to be generated on a scatter plot. The plot has been divided into three sections and a median-median point for each of the three sections has been determined. What is the next step in creating the median-median line?
Note that a response of (a) or (e) has been counted as correct for this question.
 a. Calculate the slope of the median-median line.
 b. Determine the value of r, the correlation coefficient.
 c. Draw an ellipse around the points of the scatter plot.
 d. Identify the centroid of the data.
 e. None of the statements (a) through (d) correctly identify the next step.
 f. More than one of the steps in statements (a) through (d) could be completed next.
The following situation is used for problems 10 through 14.
Emotional exhaustion, or burnout, can be a significant problem for college students. Researchers have used linear models to investigate how emotional exhaustion may relate to aspects of college life.
One study considered how an index of exhaustion (determined through responses to a questionnaire) related to what portion of a person's social contact was with students in the same program or field of study. The researchers called this factor "concentration of contacts." The table here lists the values of the emotional exhaustion index (higher values indicate greater emotional exhaustion) and percent concentration of contacts for a sample of 25 education students from a large university.
It can be shown that the equation for the median-median line that models this data set is y = 11.51x - 148.833. Also, the equation of the least-squares linear regression line for these data is y = 8.865x - 29.497, where the sum of the squared residuals is 698,009.
10. True or false: The slope of the least-squares regression line is greater than the slope of the median-median line.
 a. True.
 b. False.
 c. It cannot be determined from the information provided.
11. The median-median line model predicts that someone with a concentration of 50% will have an emotional exhaustion index of _?_.
 a. 0.7825
 b. 413.78
 c. 426.67
 d. 698,009
 e. None of these are correct.
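To see where the choices in Questions 10 and 11 come from, this sketch evaluates both lines at a concentration of 50%, reading both intercepts as negative (that reading is what produces choice c, 426.67). The helper names `predict_mm` and `predict_ls` are hypothetical.

```python
# Lines from the text, with both intercepts read as negative.
def predict_mm(x):
    """Median-median line: y = 11.51x - 148.833 (x = percent concentration)."""
    return 11.51 * x - 148.833

def predict_ls(x):
    """Least-squares line: y = 8.865x - 29.497 (x = percent concentration)."""
    return 8.865 * x - 29.497

x = 50  # percent concentration of contacts
print(round(predict_mm(x), 2))  # 426.67
print(round(predict_ls(x), 2))  # 413.75

# Question 10: is the least-squares slope greater than the median-median slope?
print(8.865 > 11.51)  # False
```

The least-squares value at 50% (about 413.75) is close to choice b, which is a reminder to use the line the question actually names.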
12. Which statement below is the most meaningful interpretation of the slope of the least-squares regression line?
 a. The emotional exhaustion index is estimated to decrease by 29.497 units for every 1% increase in concentration.
 b. The emotional exhaustion index is estimated to increase by 8.865 units for every 1% increase in concentration.
 c. The percent concentration is estimated to increase by 8.865 units for every 1 unit increase in emotional exhaustion.
 d. The least-squares slope, 8.865, is the smallest slope that can be estimated using these data.
13. Which statement below is the most meaningful interpretation of the y-intercept of the median-median line?
 a. The emotional exhaustion index is -148.833 when concentration is 0%.
 b. The emotional exhaustion index increases by 148.833 units when concentration is increased from 0% to 1%.
 c. The median-median line y-intercept has no meaningful interpretation, because 0% concentration is outside the range of the data.
 d. The median-median line cuts through the x-axis at 148.833.
14. Which statement below is the most meaningful interpretation of the sum of the squared residuals for the least-squares regression line?
 a. No other linear model will produce a larger sum of squared residuals.
 b. The large value of the sum of the squared residuals indicates that a straight-line model is of no use for predicting emotional exhaustion (y).
 c. No other straight-line model fit to these data will produce a smaller sum of the squared residuals.
15. A box-and-whiskers plot is a visual representation best suited for _?_ data.
 a. quantitative
 b. qualitative
 c. loose
 d. basketball
16. Among the following statistics, which one is likely being used to help make the following claim:
17. Which of the following orders correctly represents the measures of central tendency for the distribution shown here?
a. P: mean, Q: median, R: mode

This visual representation shows test scores of 24 students in a statistics course.
18. If m students scored in the range shown and n students scored in the range shown, which statement is most correct?
The following situation is used for problems 19 and 20.
The distribution of times that it takes to drive from Illinois Wesleyan University to College Hills Mall at 5:00 pm on a weekday is mound shaped (normal) with a mean of 18 minutes and a standard deviation of 5 minutes.
19. Driving times ranging from 13 minutes to 23 minutes represent approximately what portion of all the driving times?
20. Determine a driving time that will be exceeded by approximately 2.5% of all drivers making the trip from IWU to College Hills Mall.
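Questions 19 and 20 can be checked against the normal model directly. This sketch uses Python's `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative. The 13-to-23-minute range is the mean plus or minus 1 standard deviation, and the exact 97.5th percentile sits slightly below the 2-standard-deviation rule-of-thumb value of 28 minutes.

```python
from statistics import NormalDist

# Driving times: normal with mean 18 minutes and standard deviation 5 minutes.
drive = NormalDist(mu=18, sigma=5)

# Question 19: share of driving times between 13 and 23 minutes (mean +/- 1 SD).
within_one_sd = drive.cdf(23) - drive.cdf(13)
print(round(within_one_sd, 4))                 # 0.6827

# Question 20: time exceeded by about 2.5% of drivers (the 97.5th percentile).
exact_cutoff = drive.inv_cdf(0.975)
rule_of_thumb = drive.mean + 2 * drive.stdev   # mean + 2 SD from the 95% rule
print(round(exact_cutoff, 1), rule_of_thumb)   # 27.8 28.0
```

At the level of this course, the 68% and 2-standard-deviation (95%) rules give the intended approximate answers; the exact values are shown only for comparison.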
Complete each question and write your response in the space provided. Please include descriptive comments as necessary.
21. Here are the results of the long jump and 50-meter dash for six middle-school students.
Student   x: jump length (ft)   y: dash time (sec)
Andy      15                    6
Becky     12                    8
Carl      17                    6
Dan       9                     10
Edna      11                    10
Fran      14                    8
a. On the grid provided above, create a scatter plot for these data. Represent jump length on the horizontal axis (x) and dash time on the vertical axis (y). See plot above.
b. Suppose that we want to create a median-median line for these data. Begin that process by finding the three median-median points used to help determine the equation of the line. DO NOT go beyond this step! You may show the ordered pairs on the scatter plot, but also list them here.
c. When we complete the process of determining the median-median line of best fit, the equation is y = -(2/3)x + 50/3, where x represents a student's jump length and y represents a student's dash time.
ii) Interpret the value 50/3 as it relates to these data and the median-median line. This is the y-intercept of the median-median line. In the context of jump length vs. dash time, it represents a dash time of 50/3 seconds (about 16.67 seconds) for someone whose long jump was 0 feet. In essence, it says that even a person who could jump no distance at all in the long jump could still complete the 50-meter dash. We should be skeptical of this conclusion, because it is an extrapolation well beyond the values in the data set.
iii) Use the median-median line to predict the dash time for a student who jumped 10 feet.
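The median-median steps in Question 21 can be sketched in Python as follows. This is one common version of the procedure, assuming the group-size convention coded in the hypothetical helper `group_sizes` (one extra point goes to the middle group, two extra points go to the outer groups); textbooks differ on how ties in x at group boundaries are handled, and this sketch simply splits the list after sorting. All function names are illustrative.

```python
from statistics import median

def group_sizes(n):
    """Split n points into (left, middle, right) group sizes as evenly as possible."""
    base, extra = divmod(n, 3)
    if extra == 0:
        return base, base, base
    if extra == 1:
        return base, base + 1, base      # one extra point goes to the middle group
    return base + 1, base, base + 1      # two extra points go to the outer groups

def summary_points(points):
    """Return the three median-median summary points for a list of (x, y) pairs."""
    pts = sorted(points)                 # order by x (ties broken by y)
    left_n, mid_n, _ = group_sizes(len(pts))
    groups = [pts[:left_n], pts[left_n:left_n + mid_n], pts[left_n + mid_n:]]
    # Each summary point is (median x, median y) of its group.
    return [(median(x for x, _ in g), median(y for _, y in g)) for g in groups]

def median_median_line(points):
    """Return (slope, intercept) of the median-median line through the points."""
    left, middle, right = summary_points(points)
    # Slope comes from the two outer summary points.
    slope = (right[1] - left[1]) / (right[0] - left[0])
    # Average the intercepts of lines with this slope through all three summary
    # points (equivalently, slide the outer line 1/3 of the way toward the middle point).
    intercept = sum(y - slope * x for x, y in (left, middle, right)) / 3
    return slope, intercept

# Question 21 data: name -> (jump length in ft, dash time in sec)
students = {"Andy": (15, 6), "Becky": (12, 8), "Carl": (17, 6),
            "Dan": (9, 10), "Edna": (11, 10), "Fran": (14, 8)}
points = list(students.values())

print(summary_points(points))                 # [(10.0, 10.0), (13.0, 8.0), (16.0, 6.0)]
slope, intercept = median_median_line(points)
print(round(slope, 4), round(intercept, 4))   # -0.6667 16.6667
print(round(slope * 10 + intercept, 2))       # 10.0 (dash time for a 10-ft jump)
```

With six points, each group holds two students, so the summary points are (10, 10), (13, 8), and (16, 6); the slope is -2/3, the intercept is 50/3, and the line predicts a dash time of 10 seconds for a 10-foot jump.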
22. The questions here are to be used with the data set relating ages of trees to the diameters of the trees. The data set was distributed prior to the test.
a. Determine the centroid of the data. The centroid is at (22, 5).
b. Use your calculator to generate the median-median line and the least-squares linear regression line for these data. Write each equation in the form d = ma + b, where d is the diameter and a is the tree age.
median-median line: d = 0.1815a + 0.9679
least-squares linear regression line: d = 0.1606a + 1.4658
Slopes and d-intercepts are rounded to the nearest ten-thousandth.
c. With respect to the leastsquares linear regression line, calculate the sum of the squared residual values for the first five data values in the table, that is, for the ordered pairs with ages 4, 5, 8, 8, and 8. Show your calculations here.
[Table of residual calculations: SSE for first five values]
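The tree data set is not reproduced here, so this sketch only shows how the residuals and their sum of squares are computed for the least-squares line d = 0.1606a + 1.4658. The check points are made up (they are NOT the tree data): each sits exactly 1 unit above or below the line at ages 4, 5, 8, 8, and 8, so the SSE should be 5. The helper names `residuals` and `sse` are hypothetical; substituting the actual diameters from the distributed data set would give the answer to part c.

```python
# Least-squares line from part b: d = 0.1606a + 1.4658
SLOPE, INTERCEPT = 0.1606, 1.4658

def residuals(ages, diameters, slope=SLOPE, intercept=INTERCEPT):
    """Observed minus predicted diameter for each (age, diameter) pair."""
    return [d - (slope * a + intercept) for a, d in zip(ages, diameters)]

def sse(ages, diameters, slope=SLOPE, intercept=INTERCEPT):
    """Sum of the squared residuals."""
    return sum(r ** 2 for r in residuals(ages, diameters, slope, intercept))

# Sanity check with MADE-UP diameters (not the tree data): each point is
# exactly 1 unit above or below the line, so the SSE should be 5.
ages = [4, 5, 8, 8, 8]
offsets = [1, -1, 1, -1, 1]
diameters = [SLOPE * a + INTERCEPT + e for a, e in zip(ages, offsets)]
print(round(sse(ages, diameters), 6))  # 5.0
```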

23. A 1990s labor dispute between the Major League Baseball Players' Association and the team owners drastically changed the face of professional baseball. During a recent spring, replacement players populated the training grounds of the major league teams.
Suppose we know that the distribution of the ages, in years, of all replacement players in spring training that season was mound shaped and symmetric (normal).
a. If the mean age of the replacement players was 20.4 years, and if 95% of the replacement players ranged from 18.6 to 22.2 years old, determine the standard deviation for the distribution. Show your calculations.
For a mound-shaped and symmetric (normal) distribution, we know that about 95% of the data lie between 2 standard deviations below the mean and 2 standard deviations above the mean. So for this situation, the age range from 18.6 to 22.2 years spans 4 standard deviations. This results in a standard deviation of (22.2 - 18.6)/4 = 0.9 years.
b. Suppose, instead, that the mean age of the replacement players was 21.1 years and the distribution of ages had a standard deviation of 1.1 years. Was it unlikely that a team had a replacement player who turned 24 years old during that training camp? Explain.
If we take "unlikely" to mean that a player's age is outside the middle 95% of the data, then a player's age would be unlikely if it were 2 or more standard deviations away from the mean. For the information given here, that means the age would have to be less than 21.1 - 2(1.1) = 18.9 years old or more than 21.1 + 2(1.1) = 23.3 years old. Because 24 is greater than 23.3, a 24-year-old replacement player would be unlikely under these conditions.
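The arithmetic in Question 23 can be retraced in a few lines of Python. This is only a check of the 2-standard-deviation (95%) rule used above, not a different method; it also reports how many standard deviations 24 years lies above the mean of 21.1.

```python
# Part (a): 95% of ages lie within 2 SD of the mean, so the
# 18.6-to-22.2 range spans 4 standard deviations.
low, high = 18.6, 22.2
sd_a = (high - low) / 4
print(round(sd_a, 2))                     # 0.9

# Part (b): mean 21.1, SD 1.1; "unlikely" means outside mean +/- 2 SD.
mean_b, sd_b = 21.1, 1.1
lower, upper = mean_b - 2 * sd_b, mean_b + 2 * sd_b
print(round(lower, 1), round(upper, 1))   # 18.9 23.3

# z-score of a 24-year-old player
age = 24
z = (age - mean_b) / sd_b
print(round(z, 2), age > upper)           # 2.64 True
```

Note also that the midpoint of 18.6 and 22.2 is 20.4, matching the stated mean in part (a), as it must for a symmetric interval.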