Illinois State University Mathematics Department
MAT 312: Probability and Statistics for Middle School Teachers
- Part I: 20 Multiple Choice Questions (1.2 pt each)
- Part II: 3 Open-Response Questions (11, 8, 6 pts)
- Total: 50 points
- Impact on Course Grade: 15% of your Semester Grade
Criteria Used to Evaluate Part II Responses
21: 11 points
- a) 2 pts: accurate scatter plot
- b) 3 pts: correct median-median points
- c) 6 pts
- (i) and (ii): 2 pts each: accurate interpretation, clearly expressed
- (iii): 2 pts: correct numerical response to nearest tenth of a second
22: 8 points
- a) 2 pts: correct numerical response
- b) 4 pts: 2 pts for each correctly stated equation
- c) 2 pts: correct numerical response
23: 6 points
- a) 3 pts: correct numerical response
- b) 3 pts: accurate and clear explanation
Part I: Multiple Choice
For each question, choose the one best response and circle that letter at the appropriate spot on the answer sheet.
1. The scatter plot to the right shows a _?_ relationship. Assume that vertical and horizontal axes are identically scaled.
- a. weak negative
- b. weak positive
- c. strong positive
- d. strong negative
2. The visual representation shown here helps describe the relationship between mathematics placement-test scores and writing-test scores for an incoming class of students. The plot provides information about the _?_ of that relationship.
- a. source, direction, and value
- b. center, spread, and shape
- c. location, value, and shape
- d. shape, strength, and direction
- e. direction, shape, and location
3. Estimate the slope of a spaghetti line that might appropriately fit the data plotted in Question 2.
- a. -0.75
- b. -0.35
- c. 0
- d. 0.25
- e. 0.9
- f. 2.10
4. True or false: A median-median line is more resistant to outliers than is a least-squares linear regression line.
- a. True
- b. False
The following situation is used for problems 5 through 7.
Table 1: Raises and Job Performance Ratings for University Administrators
(Columns: Administrator ID #; Annual Salary Raise (y); Job Rating: 5-point Scale (x). Data rows not shown.)
A faculty group seeks to determine whether job rating (call it x) is a useful linear predictor of annual salary raise (call it y). The group uses least-squares linear regression to determine this prediction equation:
y = -1782.83x + 14012.17.
5. Which one of the following statements is most correct?
- a. The prediction equation assures that the actual raises for the 15 administrators fall in a perfect straight line.
- b. The prediction equation suggests that if we know an administrator's job rating, we can determine his or her exact salary raise.
- c. The prediction equation suggests that for the mean job rating an administrator would get the mean salary increase.
- d. The prediction equation suggests that, on the average, administrators have a poor job rating.
- e. None of the statements (a) through (d) are correct.
- f. More than one of the statements (a) through (d) are correct.
6. Which one of the following statements about the prediction line y = -1782.83x + 14012.17 is least correct?
- a. The equation can be used to state the actual raise for an administrator by knowing his or her job rating.
- b. The equation produces predicted raises with the sum of the squared residuals minimized.
- c. If the linear model is accepted as valid, the equation could be used to predict an administrator raise from an administrator's job rating.
- d. The equation produces predicted raises with an average residual (average error) of 0.
7. Which statement best interprets the meaning of the slope of the prediction equation?
- a. For a 1-point increase in an administrator job rating, we can estimate a salary increase of $1782.83.
- b. For a 1-point increase in an administrator job rating, we can estimate a salary decrease of $1782.83.
- c. For an administrator with a job rating of 1.00, we can estimate his or her raise to be $1782.83.
- d. For a $1 salary raise, we can estimate that the administrator job rating will decrease by 1782.83.
8. True or false: The best model to represent any two-variable data set is a least-squares linear regression equation.
- a. True
- b. False
9. A median-median line is to be generated on a scatter plot. The plot has been divided into three sections and a median-median point for each of the three sections has been determined. What is the next step in creating the median-median line?
- a. Calculate the slope of the median-median line.
- b. Determine the value of r, the correlation coefficient.
- c. Draw an ellipse around the points of the scatter plot.
- d. Identify the centroid of the data.
- e. None of the statements (a) through (d) correctly identify the next step.
- f. More than one of the steps in statements (a) through (d) could be completed next.
The following situation is used for problems 10 through 14.
Emotional exhaustion, or burnout, can be a significant problem for college students. Researchers have used linear models to investigate how emotional exhaustion may relate to aspects of college life.
One study considered how an index of exhaustion (determined through responses to a questionnaire) related to what portion of a person's social contact was with students in the same program or field of study. The researchers called this factor "concentration of contacts." The table here lists the values of the emotional exhaustion index (higher values indicate greater emotional exhaustion) and percent concentration of contacts for a sample of 25 education students from a large university.
It can be shown that the equation of the median-median line that models this data set is y = 11.51x - 148.833. Also, the equation of the least-squares linear regression line for these data is y = 8.865x - 29.497, where the sum of the squared residuals is 698,009.
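As a rough illustration of how the two quoted models are used, the following sketch evaluates each equation at one concentration value. The input of 40% is a hypothetical value chosen only for illustration, and the function names are not part of the original problem.

```python
def median_median_prediction(concentration):
    """Median-median model quoted in the text: y = 11.51x - 148.833."""
    return 11.51 * concentration - 148.833


def least_squares_prediction(concentration):
    """Least-squares model quoted in the text: y = 8.865x - 29.497."""
    return 8.865 * concentration - 29.497


# Hypothetical input (an assumption for illustration, not from the data table).
concentration = 40.0

mm_value = median_median_prediction(concentration)
ls_value = least_squares_prediction(concentration)

print(f"median-median prediction at {concentration}%: {mm_value:.3f}")
print(f"least-squares prediction at {concentration}%: {ls_value:.3f}")
```

Each prediction is simply the slope times the concentration plus the intercept, so the two models generally give different estimates for the same input.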
10. True or false: The slope of the least-squares regression line is greater than the slope of the median-median line.
- a. True.
- b. False.
- c. It cannot be determined from the information provided.
11. The median-median line model predicts that someone with a concentration of 50% will have an emotional exhaustion index of _?_.
- a. 0.7825
- b. 413.78
- c. 426.67
- d. 698,009
- e. None of these are correct.
12. Which statement below is the most meaningful interpretation of the slope of the least-squares regression line?
- a. The emotional exhaustion index is estimated to decrease by 29.497 units for every 1% increase in concentration.
- b. The emotional exhaustion index is estimated to increase by 8.865 units for every 1% increase in concentration.
- c. The percent concentration is estimated to increase by 8.865 units for every 1 unit increase in emotional exhaustion.
- d. The least-squares slope, 8.865, is the smallest slope that can be estimated using these data.
13. Which statement below is the most meaningful interpretation of the y-intercept of the median-median line?
- a. The emotional exhaustion index is -148.833 when concentration is 0%.
- b. The emotional exhaustion index increases by 148.833 units when concentration is increased from 0% to 1%.
- c. The median-median line y-intercept has no meaningful interpretation, because 0% concentration is outside the range of the data.
- d. The median-median line cuts through the x-axis at -148.833.
14. Which statement below is the most meaningful interpretation of the sum of the squared residuals for the least-squares regression line?
- a. No other linear model will produce a larger sum of squared residuals.
- b. The large value of the sum of the squared residuals indicates that a straight-line model is of no use for predicting emotional exhaustion (y).
- c. No other straight-line model fit to these data will produce a smaller sum of the squared residuals.
15. A box-and-whiskers plot is a visual representation best suited for _?_ data.
- a. quantitative
- b. qualitative
- c. loose
- d. basketball
16. Among the following statistics, which one is likely being used to help make the following claim:
17. Which of the following orders correctly represents the measures of central tendency for the distribution shown here?
- a. P: mean, Q: median, R: mode
This visual representation shows test scores of 24 students in a statistics course.
18. If m students scored in the range shown and n students scored in the range shown, which statement is most correct?
- a. n > m
- b. n = m
- c. m = 15
- d. n = 21
The following situation is used for problems 19 and 20.
19. Driving times ranging from 13 minutes to 23 minutes represent approximately what portion of all the driving times?
20. Determine a driving time that will be exceeded by approximately 2.5% of all drivers making the trip from IWU to College Hills Mall.
Part II: Open Response
Complete each question and write your response in the space provided. Please include descriptive comments as necessary.
21. Here are the results of the long jump and 50-meter dash for six middle-school students.
x: jump length (ft)
y: dash time (sec)
a. On the grid provided above, create a scatter plot for these data. Represent jump length on the horizontal axis (x) and dash time on the vertical axis (y).
b. Suppose that we want to create a median-median line for these data. Begin that process by finding the three median-median points used to help determine the equation of the line. DO NOT go beyond this step! You may show the ordered pairs on the scatter plot, but also list them here.
c. When we complete the process of determining the median-median line of best fit, the equation is , where x represents a student's jump length and y represents a student's dash time.
ii) Interpret the value as it relates to these data and the median-median line.
iii) Use the median-median line to predict the dash time for a student who jumped 10 feet.
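The median-median procedure referred to in this question can be sketched in a few lines. This is a minimal sketch under stated assumptions: the six data pairs are hypothetical (they are not the students' results), the function names are illustrative, and tied x-values at group boundaries are not treated specially.

```python
from statistics import median


def summary_points(points):
    """Split points (sorted by x) into three groups and return the
    (median x, median y) summary point of each group."""
    pts = sorted(points)
    base, extra = divmod(len(pts), 3)
    if extra == 0:
        sizes = [base, base, base]
    elif extra == 1:
        sizes = [base, base + 1, base]      # extra point goes to the middle
    else:
        sizes = [base + 1, base, base + 1]  # extra points go to the outer groups
    groups, start = [], 0
    for size in sizes:
        groups.append(pts[start:start + size])
        start += size
    return [(median([x for x, _ in g]), median([y for _, y in g])) for g in groups]


def median_median_line(points):
    """Return (slope, intercept) of the median-median line."""
    (x1, y1), (x2, y2), (x3, y3) = summary_points(points)
    slope = (y3 - y1) / (x3 - x1)  # slope from the two outer summary points
    # Average the three intercepts, which shifts the outer line one third
    # of the way toward the middle summary point.
    intercept = ((y1 - slope * x1) + (y2 - slope * x2) + (y3 - slope * x3)) / 3
    return slope, intercept


# Hypothetical data, for illustration only.
data = [(1, 10), (2, 9), (3, 8.5), (4, 8), (5, 6), (6, 7)]
points = summary_points(data)
slope, intercept = median_median_line(data)
print(points, slope, round(intercept, 3))
```

Because each summary point uses medians rather than means, a single unusual value in a group has little effect on the resulting line.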
22. The questions here are to be used with the data set relating ages of trees to the diameters of the trees. The data set was distributed prior to the test.
a. Determine the centroid of the data.
b. Use your calculator to generate the median-median line and the least-squares linear regression line for these data. Write each equation in the form d = ma + b, where d is the diameter and a is the tree age.
c. With respect to the least-squares linear regression line, calculate the sum of the squared residual values for the first five data values in the table, that is, for the ordered pairs with ages 4, 5, 8, 8, and 8. Show your calculations here.
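The quantities named in this question (centroid, least-squares line, and sum of squared residuals) can be computed as in the sketch below. The ages and diameters are hypothetical values chosen for illustration, since the distributed data set is not reproduced here, and the variable names are assumptions.

```python
# Hypothetical (age, diameter) values, for illustration only.
ages = [2, 4, 6, 8]
diameters = [1.0, 2.5, 2.5, 4.0]

n = len(ages)

# Centroid: the point (mean age, mean diameter).
mean_a = sum(ages) / n
mean_d = sum(diameters) / n
centroid = (mean_a, mean_d)

# Least-squares slope and intercept for d = m*a + b.
s_ad = sum((a - mean_a) * (d - mean_d) for a, d in zip(ages, diameters))
s_aa = sum((a - mean_a) ** 2 for a in ages)
m = s_ad / s_aa
b = mean_d - m * mean_a  # the least-squares line passes through the centroid

# Residual = observed diameter minus predicted diameter.
residuals = [d - (m * a + b) for a, d in zip(ages, diameters)]
sum_squared_residuals = sum(r ** 2 for r in residuals)

print("centroid:", centroid)
print("line: d = %.2fa + %.2f" % (m, b))
print("sum of squared residuals: %.4f" % sum_squared_residuals)
```

Over the full data set the residuals of a least-squares line sum to zero; the sum of squared residuals over only part of the data, as requested in part (c), is computed the same way but restricted to those points.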
23. A 1990s labor dispute between the Major League Baseball Players' Association and the team owners drastically changed the face of professional baseball. During a recent spring, replacement players populated the training grounds of the major league teams.
Suppose we know that the distribution of the ages, in years, of all replacement players in spring training that season was mound-shaped and symmetric (normal).
a. If the mean age of the replacement players was 20.4 years, and if 95% of the replacement players ranged from 18.6 to 22.2 years old, determine the standard deviation for the distribution. Show your calculations.
b. Suppose, instead, that the mean age of the replacement players was 21.1 years and the distribution of ages had a standard deviation of 1.1 years. Was it unlikely that a team had a replacement player who turned 24 years old during that training camp? Explain.
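The reasoning behind questions about normal distributions often rests on the rule of thumb that about 95% of values fall within two standard deviations of the mean. The sketch below checks that rule with Python's standard library, using a hypothetical mean and interval that are assumptions for illustration and not the values in this question.

```python
from statistics import NormalDist

# Hypothetical values, for illustration only.
mean = 50.0
low, high = 42.0, 58.0  # assumed interval holding about 95% of the values

# If about 95% lies within 2 standard deviations of the mean,
# the interval spans roughly 4 standard deviations.
sd_from_interval = (high - low) / 4

dist = NormalDist(mu=mean, sigma=sd_from_interval)

# Proportion within 2 standard deviations of the mean.
within_two_sd = dist.cdf(mean + 2 * sd_from_interval) - dist.cdf(mean - 2 * sd_from_interval)

# Proportion above the mean plus 2 standard deviations (upper tail).
upper_tail = 1 - dist.cdf(mean + 2 * sd_from_interval)

print(f"standard deviation: {sd_from_interval}")
print(f"within 2 SD: {within_two_sd:.4f}")
print(f"above mean + 2 SD: {upper_tail:.4f}")
```

The computed proportions are close to, but not exactly, the rounded 95% and 2.5% figures used in the rule of thumb.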