Illinois State University Mathematics Department

MAT 312: Probability and Statistics for Middle School Teachers
Spring 1999, 9:35 - 10:50 am TR, STV 350A
Dr. Roger Day (day@ilstu.edu)

### Possible Solutions: Test #1

Part I: Multiple Choice

For each question, choose the one best response and circle that letter at the appropriate spot on the answer sheet.

1. Which visual representation preserves the values in a data set?

a. box-and-whiskers plot
b. relative frequency histogram
c. circle graph
d. root-and-stalk plot
e. More than one of the representations preserve the values in a data set.
f. None of these representations preserve the values in a data set.

For items 2-5, select one of the following data types to best describe each variable.

 a. ratio data b. ordinal data c. interval data d. nominal data

2. weight: A
3. eye color: D
4. birth order: B
5. air temperature: C

For questions 6-9, use the following data set: 2, 4, 6, 8, 10, 12

6. Determine the range of the data.

a. 2
b. 8
c. 10
d. 12

7. Determine the median of the data.

a. 6
b. 7
c. 8
d. 9

8. Determine the midspread of the data.

a. 4
b. 6
c. 8
d. 10

9. Determine the mean deviation of the data.

a. 3
b. 3.74
c. 6
d. 18
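
The four measures in questions 6-9 can be checked with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data (other quartile conventions can give a different midspread), and the helper names `midspread` and `mean_deviation` are illustrative only.

```python
from statistics import mean, median

data = [2, 4, 6, 8, 10, 12]

def midspread(values):
    """Interquartile range, with quartiles taken as medians of the two halves."""
    ordered = sorted(values)
    half = len(ordered) // 2          # the middle value is left out when n is odd
    lower, upper = ordered[:half], ordered[-half:]
    return median(upper) - median(lower)

def mean_deviation(values):
    """Average absolute distance of the values from their mean."""
    center = mean(values)
    return sum(abs(x - center) for x in values) / len(values)

data_range = max(data) - min(data)    # largest value minus smallest value

print("range:", data_range)
print("median:", median(data))
print("midspread:", midspread(data))
print("mean deviation:", mean_deviation(data))
```

Under this quartile convention the lower half is 2, 4, 6 and the upper half is 8, 10, 12, so the midspread is the distance from 4 to 10.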

10. Consider the following data set: 13, 18, 28, 28, 31, 26, 35, 19. Which measure of central tendency will change the most if the "19" should have been a "49"?

a. mean
b. median
c. mode
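
One way to explore question 10 is to compute each measure before and after the correction. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, mode

original = [13, 18, 28, 28, 31, 26, 35, 19]
# Replace the mistaken 19 with the intended 49.
corrected = [49 if x == 19 else x for x in original]

changes = {}
for name, stat in (("mean", mean), ("median", median), ("mode", mode)):
    before, after = stat(original), stat(corrected)
    changes[name] = after - before
    print(f"{name}: {before} -> {after} (change {changes[name]})")
```

The mean uses every value in its computation, so a single changed value shifts it directly, while the median depends only on the middle of the ordered data and the mode only on the most frequent value.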

11. What are three characteristics used to describe a data set?

a. shape, direction, strength

Use the diagram below for questions 12 through 15.

12. In the box plot, where is the 75th percentile located?

a. at 17
b. at 20
c. at 29
d. somewhere between 20 and 38

13. Which quartile in the data set exhibits the most spread?

a. the lowest quartile
b. the 25th to 50th percentile
c. the 50th to 75th percentile
d. the highest quartile

14. What value in the data set is the smallest value inside the lower inner fence?

a. -4
b. 5
c. 14
d. 28
e. More than one of these values satisfy the conditions stated.
f. None of these values satisfy the conditions stated.

15. Although not shown in the plot, which of the following values would be considered an outlier in this data set?

a. -5
b. 6
c. 17
d. 25
e. More than one of these values would be an outlier.
f. None of these values would be outliers.

16. In which situation is it appropriate to use the mode as the preferred measure of central tendency?

a. in reporting average selling price for homes in a community
b. when the distribution is significantly skewed to the left or to the right
c. in determining what size shoes to reorder in a retail establishment
d. when the data is considered modal data

17. In a negatively skewed distribution, which of the following is most likely true?

a. The mean and median are approximately equal.
b. The mean is greater than the median.
c. The mean is less than the median.
d. The median takes on the value 0.

18. The distribution of the life span of a certain South American fruit fly has a symmetric, mound-shaped (normal) distribution with a mean of 400 hours and a standard deviation of 25 hours. Within what bounds do we expect approximately 95% of the life spans to fall?

a. 375 to 425 hours
b. 375 to 450 hours
c. 350 to 425 hours
d. 350 to 450 hours
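
The bounds in question 18 follow from the rule of thumb that about 95% of a normal distribution lies within 2 standard deviations of the mean. A minimal sketch, assuming an exactly normal model and using `statistics.NormalDist` only as a check on that rule:

```python
from statistics import NormalDist

mean_hours, sd_hours = 400, 25

# Rule of thumb: about 95% of the data lie within 2 standard deviations of the mean.
low = mean_hours - 2 * sd_hours
high = mean_hours + 2 * sd_hours

# Share of an exactly normal distribution that falls between those bounds.
model = NormalDist(mean_hours, sd_hours)
share = model.cdf(high) - model.cdf(low)

print(f"bounds: {low} to {high} hours")
print(f"share inside bounds under a normal model: {share:.3f}")
```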

For questions 19-22, use this setting:

A light bulb manufacturer wants to show that a new bulb outlasts a major competitor's light bulb. The manufacturer tests 30 bulbs and records how long each bulb lasts. The data is shown here.

19. Of the 30 light bulbs sampled, what is the relative frequency of light bulbs lasting longer than 400 hours?

a. 6
b. 20%
c. 24
d. 80%

20. What value represents the 25th percentile of the observations?

a. 420 hrs
b. 480 hrs
c. 490 hrs
d. 500 hrs

21. What is the midspread of the light bulb data?

a. 150 hours
b. 480 hours
c. 510 hours
d. 630 hours
e. The midspread is impossible to determine here.

22. The distribution shown in the plot on the previous page appears to be _?_.

a. positively skewed
b. negatively skewed
c. symmetric
d. uniform

23. Which of the following orders correctly represents the measures of central tendency for the distribution shown here?

a. A: mean, B: median, C: mode
b. A: mode, B: mean, C: median
c. A: median, B: mode, C: mean
d. A: median, B: mean, C: mode
e. A: mode, B: median, C: mean
f. None of these orders are correct.

Use this information for questions 24 and 25:

The consumption of caffeine has become an increasingly important concern to consumers, because research has suggested that caffeine may be associated with birth defects, cancer, cardiovascular disease, and other health problems. Federal FDA scientists have conducted studies to assess the possible association between caffeine and these various health problems. Although many results have been inconclusive with regard to implicating caffeine as a cause for such health problems, the consumption of caffeine in moderation seems to be important (FDA Consumer, Dec 1987/Jan 1988).

One FDA study found that the distribution of the amount of caffeine in a 12-ounce can of a specific brand and variety of cola (Coke Classic) had a mound-shaped (normal) distribution with a mean of 19 milligrams (mg) and a standard deviation of 2 mg.

24. Based on this study, where can you expect about 95% of the caffeine measurements for 12-ounce cans of Coke Classic to fall?

a. between 17 and 21 mg
b. between 15 and 23 mg
c. between 13 and 25 mg
d. at least 19 mg

25. Suppose that one 12-ounce can of cola -- with its brand identification masked -- is randomly selected from a supermarket shelf where there are many varieties of cola. The cola in this can is determined to have a caffeine content of 26 mg. What would you conclude about the brand of this randomly selected can of cola, and why?

a. It is impossible for it to be a can of Coke Classic, because 26 mg of caffeine was never found in the experimental study.

b. It is very unlikely to be a can of Coke Classic, because 26 mg falls outside the expected range of caffeine measurements for Coke Classic.

c. It could be a can of Coke Classic, because 26 mg is only 7 mg more than the mean of 19 mg.

d. It may or may not be Coke Classic. Although 26 mg seems like a lot, the next can of cola will probably have less caffeine.
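
To judge how unusual 26 mg would be for Coke Classic in question 25, it helps to express the measurement in standard deviations from the mean. The sketch below assumes the normal model stated in the problem (mean 19 mg, standard deviation 2 mg); the variable names are illustrative.

```python
from statistics import NormalDist

mean_mg, sd_mg = 19, 2
observed_mg = 26

# Number of standard deviations the observation lies above the mean.
z = (observed_mg - mean_mg) / sd_mg

# Under the normal model, the share of cans with at least this much caffeine.
upper_tail = 1 - NormalDist(mean_mg, sd_mg).cdf(observed_mg)

print(f"z-score: {z}")
print(f"share of cans at or above {observed_mg} mg: {upper_tail:.5f}")
```

A value more than 3 standard deviations above the mean lies outside the range that holds nearly all of a normal distribution, which is the sense in which 26 mg falls outside the expected range.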

Part II: Open Response

Complete each question and write your response in the space provided. Please include descriptive comments as necessary.

26. Here is a relative frequency histogram that shows points earned by 200 students in physical education classes at an elementary school. Use the histogram to answer questions (a) through (e) below. Include a brief note to describe how you arrived at your solution response.

a. What range of points was earned by the top 5% of the students?

The three right-most or upper measurement classes sum to represent 5% of the data. These three classes represent scores from 15 to 21, not including 21.

b. Suppose a child was chosen at random from the top 50% of those earning points. What is the minimum number of points that student could have earned?

The median, or 50th percentile, occurs in the 7-to-9 measurement class. We cannot, however, determine where within that class the actual score occurred. Therefore, the minimum number of points a student in the top 50% could have earned is 7 points.

c. How many students earned less than 5 points?

The first two measurement classes span from 1 to 5 points, not including 5 points. This represents 21% of the data, or 42 students (21% of 200).

d. In what measurement class does the 25th percentile lie?

From (c) we know that 21% of the data range from 1 to 5 points, and the next measurement class (5 to 7 points) represents another 27% of the data. The 25th percentile must be in this 5 to 7 point range.

e. Suppose this histogram was used to create a cumulative frequency histogram. What would be the height (cumulative frequency) for the measurement class spanning 11 to 13 points?

Adding the frequencies up to that class and including it, the cumulative value is 92%, or 0.92.

27. A data set of flange lengths has a midspread of 12 cm and a median of 50 cm. State a value that will lie between the lower inner fence and the lower outer fence if we know that the median divides the middle 50% of the data symmetrically. Describe the process you used to determine a value.

Any value between 8 and 26 satisfies the requirements stated in the problem. If the midspread is 12 cm and the median is at 50 cm, and the median divides the middle 50% of the data symmetrically, then the midspread ranges from 44 to 56 cm. The lower inner fence is at 26 cm, or 1.5 midspreads (18 units) below the 25th percentile (that is, below 44). The lower outer fence is at 8 cm, or 3.0 midspreads (36 units) below the 25th percentile (that is, below 44).
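
The fence arithmetic above can be sketched in a few lines of Python. This assumes, as the problem states, that the median sits at the center of the middle 50% of the data; the variable names are illustrative.

```python
median_cm = 50
midspread_cm = 12

# With the median centered in the middle 50%, the quartiles sit half a midspread away.
q1 = median_cm - midspread_cm / 2
q3 = median_cm + midspread_cm / 2

# Fences are measured downward from the 25th percentile.
lower_inner_fence = q1 - 1.5 * midspread_cm
lower_outer_fence = q1 - 3.0 * midspread_cm

print(f"quartiles: {q1} and {q3}")
print(f"lower inner fence: {lower_inner_fence}")
print(f"lower outer fence: {lower_outer_fence}")
```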

28. We have considered the properties of spread and location to help characterize a one-variable data set. Describe how these two properties differ. Be specific!

Location refers to the center or anchor point of a distribution. It provides us a typical or representative value of the data set. A representation of location serves to position a data set along the continuum from positive infinity to negative infinity. Typical location measures, or measures of central tendency, include the mean, the median, and the mode.

Spread refers to the variability or dispersion of a data set. Measures or pictures of spread represent how the data vary along the full range of the data set. Spread can also be interpreted as depicting the concentration of the data relative to its full range of values. Numerical measures of spread include variance, standard deviation, mean deviation, and absolute deviation, among others. Many visual representations of data help to show the spread of a data set. Histograms and box plots are among those useful visual representations, as are line plots and stem plots.

29. By federal law a box of cereal labeled as containing 16 ounces must contain at least 16 ounces of cereal. At Kick-a-Poo Milling, it is known that the machine filling the cereal boxes produces a distribution of fill weights that is symmetric and mound-shaped (normal), with mean equal to the setting on the machine and with a standard deviation equal to 0.03 ounces.

a. To ensure that no less than 99% of the boxes contain at least 16 ounces, to what mean fill rate should the machine be set? Explain your response.

The machine's mean fill rate should be set to 16.09 ounces. With a standard deviation of 0.03 ounces, this will assure that 99% or more of the boxes of cereal will weigh at least 16 ounces. This is because the weight 16 ounces is then 3 standard deviations below the mean, and the interval from 3 standard deviations below the mean to 3 standard deviations above the mean represents at least 99% of the data in a normal distribution.
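
A quick numerical check of this setting, assuming the fill weights follow an exactly normal model; `statistics.NormalDist` stands in for the 3-standard-deviation rule of thumb, and the variable names are illustrative.

```python
from statistics import NormalDist

label_oz = 16
sd_oz = 0.03

# Place the labeled weight 3 standard deviations below the machine's mean setting.
setting_oz = label_oz + 3 * sd_oz

# Share of boxes at or above the labeled weight under a normal model.
share_ok = 1 - NormalDist(setting_oz, sd_oz).cdf(label_oz)

print(f"mean setting: {setting_oz:.2f} ounces")
print(f"share of boxes with at least {label_oz} ounces: {share_ok:.4f}")
```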

b. Suppose that the weights of a random sample of filled 16-ounce cereal boxes have been checked. The sample shows a mean of 16.10 ounces and a standard deviation of 0.05 ounces. Suppose you are the floor supervisor at Kick-a-Poo Milling. What does this information tell you about your company meeting cereal-box fill-weight requirements? Explain.

This indicates that 16 ounces is 2 standard deviations below the mean, meaning that approximately 2.5% of the cereal box fill weights will be less than 16 ounces. This is because approximately 97.5% of the data in a normal distribution lie at or above 2 standard deviations below the mean.

In the context of the problem situation, the fill process is not assuring that Kick-a-Poo Milling meets the federal truth-in-labeling law. Either the process must be "tightened up" (generate less variability in the fill weights) or the mean fill weight setting on the machine must be pushed up to 16.15 ounces, which is 3 standard deviations of 0.05 ounces above 16 ounces. Carrying out the latter strategy will help assure compliance with the law, but will also result in more cereal being used to fill the boxes.
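
The sample figures in part (b) can be examined the same way. The sketch below treats the sample mean and standard deviation as if they described a normal distribution of fill weights, which is an assumption; the exact normal tail comes out slightly below the 2.5% rule-of-thumb figure.

```python
from statistics import NormalDist

label_oz = 16
sample_mean_oz, sample_sd_oz = 16.10, 0.05

# How many standard deviations the labeled weight lies from the sample mean.
z = (label_oz - sample_mean_oz) / sample_sd_oz

# Share of boxes expected to fall short of the labeled weight under a normal model.
share_short = NormalDist(sample_mean_oz, sample_sd_oz).cdf(label_oz)

# Mean setting that would put the labeled weight 3 standard deviations below the mean,
# assuming the variability stays at 0.05 ounces.
new_setting_oz = label_oz + 3 * sample_sd_oz

print(f"z-score of {label_oz} ounces: {z:.1f}")
print(f"share of boxes below {label_oz} ounces: {share_short:.4f}")
print(f"mean setting with 3 standard deviations of margin: {new_setting_oz:.2f} ounces")
```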