Illinois State University Mathematics Department


MAT 312: Probability and Statistics for Middle School Teachers

Spring 1999
9:35 - 10:50 am TR STV 350A
Dr. Roger Day (day@math.ilstu.edu)



Possible Solutions to Problem Set #2


A. For the first question, you are to create a specified data set and then represent it in various ways.

The table below shows one of many possible 20-game team-points scoring records that satisfy the restrictions of the problem. The scores are already listed in ascending order.

Game #   Score
1        55
2        55
3        57
4        58
5        59
6        60
7        65
8        65
9        75
10       79
11       79
12       80
13       81
14       81
15       82
16       83
17       84
18       85
19       85
20       86

Here is a stem plot of the data.

Here is a box plot of the data. The 5-number summary is 55, 59.5, 79, 82.5, 86.

Here are the numerical values requested in the problem set.

mean: 72.7
median: 79
range: 86 - 55 = 31
population standard deviation: 11.472
5-number summary: 55, 59.5, 79, 82.5, 86
midspread: 82.5 - 59.5 = 23
lower outer fence: 59.5 - 3*23 = 59.5 - 69 = -9.5
lower inner fence: 59.5 - 1.5*23 = 59.5 - 34.5 = 25
upper inner fence: 82.5 + 1.5*23 = 82.5 + 34.5 = 117
upper outer fence: 82.5 + 3*23 = 82.5 + 69 = 151.5
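The summary values and fences above can be recomputed with Python's standard `statistics` module. This is a minimal sketch, assuming each quartile is the median of the lower or upper half of the sorted data (the convention that reproduces 59.5 and 82.5 here); other quartile conventions, such as the default of `statistics.quantiles`, can give slightly different quartiles. The helper names `five_number_summary` and `fences` are illustrative only.

```python
from statistics import mean, median, pstdev

# The 20 game scores from the table, in ascending order.
scores = [55, 55, 57, 58, 59, 60, 65, 65, 75, 79,
          79, 80, 81, 81, 82, 83, 84, 85, 85, 86]

def five_number_summary(data):
    """Min, Q1, median, Q3, max, taking each quartile as the median of the
    lower or upper half of the sorted data (the middle value is left out of
    both halves when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def fences(q1, q3):
    """Inner fences lie 1.5 midspreads, and outer fences 3 midspreads,
    beyond the quartiles."""
    midspread = q3 - q1
    return {
        "lower outer": q1 - 3 * midspread,
        "lower inner": q1 - 1.5 * midspread,
        "upper inner": q3 + 1.5 * midspread,
        "upper outer": q3 + 3 * midspread,
    }

lo, q1, med, q3, hi = five_number_summary(scores)
print("mean", mean(scores))
print("median", med)
print("range", hi - lo)
print("population sd", round(pstdev(scores), 3))  # pstdev divides by n, not n - 1
print("5-number summary", (lo, q1, med, q3, hi))
print("fences", fences(q1, q3))
```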

1. List the scores in ascending order. See above.

2. Create a stem-and-leaf plot and a box-and-whiskers plot of your data set. See above.

3. List the mean, the median, the range, the population standard deviation, the 5-number summary, and the upper and lower inner and outer fences. See above.
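A text stem-and-leaf plot of the same scores can be produced with a few lines of Python. This sketch assumes tens digits as stems and units digits as leaves with no split stems, so its layout may differ from the plot described above; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# The 20 game scores from the table, in ascending order.
scores = [55, 55, 57, 58, 59, 60, 65, 65, 75, 79,
          79, 80, 81, 81, 82, 83, 84, 85, 85, 86]

def stem_and_leaf(data):
    """Return the lines of a stem plot: tens digit as stem, units digit as leaf."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # Stems with no leaves are still listed so gaps in the data stay visible.
        row = " ".join(str(leaf) for leaf in leaves[stem])
        lines.append(f"{stem} | {row}")
    return lines

for line in stem_and_leaf(scores):
    print(line)
```

Listing every stem between the smallest and largest, even empty ones, keeps the plot's vertical scale honest, which is what lets a stem plot show shape the way a histogram does.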

4. Describe the data as either nominal, ordinal, interval, or ratio data. Justify your choice.

The data are ratio data: the values can be compared using multiplication or division, and a score of 0 represents the absence of points. For example, 45 points is half the points of a 90-point game, and a team that scores 80 points scores 4 times as many points as it does when it scores 20 points.

5. Based on your data set, comment on whether or not this women's team had a winning or losing basketball season.

It's virtually impossible to tell whether the team had a winning or losing season when all we have is this information. Unless we know something about the opponents, either individually or collectively, we can draw no conclusions about the won-loss record of this team.

B. Here are some prices of compact-disc players from Consumer Reports. Generate a description of the prices for this set of CD players. Your description should include written, numerical, and visual representations.

Numerical Summaries

mean: $263.96
5-number summary: $120, $140, $225, $352.50, $560
standard deviation: $136.40
midspread: $212.50
n = 24 data points
no outliers beyond the fences
The data are slightly positively skewed (the mean is greater than the median).
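The statement that no prices lie beyond the fences can be checked from the 5-number summary alone, since only the minimum or maximum could cross a fence. A minimal sketch using the reported summary values; the mean-versus-median comparison is only a rough indicator of skew, not a formal test.

```python
# Summary values reported above for the 24 CD-player prices (in dollars).
minimum, q1, median, q3, maximum = 120, 140, 225, 352.50, 560
mean_price = 263.96

midspread = q3 - q1
lower_inner = q1 - 1.5 * midspread
upper_inner = q3 + 1.5 * midspread

# Checking the extremes is enough: every other price lies between them.
has_outliers = minimum < lower_inner or maximum > upper_inner
# Rule of thumb: a mean above the median suggests a longer right (positive) tail.
positively_skewed = mean_price > median

print(f"midspread = {midspread:.2f}")
print(f"inner fences = {lower_inner:.2f}, {upper_inner:.2f}")
print("outliers:", has_outliers)
print("mean > median:", positively_skewed)
```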

Visual Display: Stem-and-leaf plot

Visual Summaries (Histogram, Box Plot)

6. In your description, clearly identify the location, spread, and shape of the data.

location

The median price is $225.

Because the data are positively skewed, I used the median rather than the mean as a measure of location.

spread

The middle 50% of the data cover the prices from $140 to $352.50 (i.e., the midspread is $212.50).

Again, because the data are positively skewed, I used the midspread rather than the standard deviation as a measure of spread.

shape

The data set is positively skewed, with a concentration of prices from $120 to $185, then a long tail extending to $560.

7. Examine and reflect upon the representations you chose to use in your description. Why did you choose those particular representations?

Your answers will vary, depending on your choices. Much has to do with the size of the data set and the fact that it is positively skewed; see the reasons given for the choices of location and spread measures in response #6.

C. Here are statistics for the top 50 finishers in the 1994 Iditarod Dog Sled Race. The information was transferred from a file found on the Internet. Use the data to respond to the following questions.

8. Create a box plot and an absolute frequency histogram of the elapsed race times for the top 50 finishers. Note that race-time information is given in the four columns on the right side of the table, where elapsed time is expressed in days, hours, minutes, and seconds.

Here are the 50 race times expressed in days to the nearest hundredth of a day (a few are given to the nearest thousandth). They are in ascending order, reading down each column.

10.54    11.371   12.28    13.255   15.19
10.76    11.373   12.30    13.26    15.25
10.91    11.48    12.325   13.27    15.28
10.93    11.49    12.327   13.34    15.31
10.94    11.58    12.412   13.37    15.38
11.00    11.652   12.413   13.68    15.80
11.07    11.653   12.53    14.44    15.84
11.15    11.66    12.55    14.46    16.17
11.18    11.72    12.83    14.70    16.45
11.26    11.73    13.247   15.16    16.68
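Converting an elapsed time from days, hours, minutes, and seconds to decimal days means dividing each part by the number of those units in a day and adding. The function name below is hypothetical, and the sample time is illustrative rather than a row from the race table.

```python
def to_decimal_days(days, hours, minutes, seconds):
    """Convert an elapsed time given as days, hours, minutes, seconds to days."""
    return days + hours / 24 + minutes / (24 * 60) + seconds / (24 * 60 * 60)

# Illustrative elapsed time (not taken from the race table):
# 10 days, 13 hours, 0 minutes, 0 seconds.
t = to_decimal_days(10, 13, 0, 0)
print(round(t, 2))
```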

Here are a box plot and a histogram of the data.

Comments on similarities and differences between the box plot and the histogram.

Both plots are visual summaries of the data, capturing the data using a few specific values. Both plots show a concentration of values at the lower end of the scale. The box plot shows this with narrower sections for the lower two quarters of the data, and the histogram shows this by placing more than half of the values in the data set (29 of the 50) in days 10, 11, and 12.

The figure above shows the classic difference between box plots and histograms: the box plot divides the data into quarters with variable endpoints. Each "chunk" holds one-fourth of the data set, but each covers a different range of values. The histogram has a fixed range for each chunk (each measurement class), yet the number of data points in each chunk varies.
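The concentration of times in days 10 through 12 can be confirmed by counting the listed times in one-day classes. This sketch assumes classes of the form [d, d + 1) days; the histogram described above may use different class boundaries.

```python
from collections import Counter

# The 50 elapsed times (days) listed above, in ascending order.
times = [
    10.54, 10.76, 10.91, 10.93, 10.94, 11.00, 11.07, 11.15, 11.18, 11.26,
    11.371, 11.373, 11.48, 11.49, 11.58, 11.652, 11.653, 11.66, 11.72, 11.73,
    12.28, 12.30, 12.325, 12.327, 12.412, 12.413, 12.53, 12.55, 12.83, 13.247,
    13.255, 13.26, 13.27, 13.34, 13.37, 13.68, 14.44, 14.46, 14.70, 15.16,
    15.19, 15.25, 15.28, 15.31, 15.38, 15.80, 15.84, 16.17, 16.45, 16.68,
]

# Fixed-width classes one day wide: class d holds times in [d, d + 1).
counts = Counter(int(t) for t in times)
for day in sorted(counts):
    print(f"{day}-{day + 1} days: {'*' * counts[day]} ({counts[day]})")

first_three = sum(counts[d] for d in (10, 11, 12))
print(f"days 10-12: {first_three} of {len(times)}")
```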

a. On your box plot, label the 5-number summary of the data. See plot shown above.

b. State the inner and outer fences as units of time expressed in parts of a day to the nearest hundredth of a day (e.g., 12.54 days). Show the four fences on your box plot.

The midspread is 2.974 days. Moving out 1.5 and 3 times the midspread from the upper and lower quartiles shown above, the fences are 2.56 (lower outer), 7.02 (lower inner), 18.92 (upper inner), and 23.38 (upper outer).

c. Your box plot should be the type of box plot that identifies outliers with solid and hollow dots, if any exist.

Using the fence criteria, no outliers exist: every race time lies between the two inner fences.
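A sketch of the fence computation for the race times, using the halves-of-the-data quartile convention. Because it works from the rounded times listed above, it gives quartiles of 11.48 and 14.46 days and a midspread of about 2.98 days rather than 2.974, so its fences (about 2.54, 7.01, 18.93, and 23.40) differ by a hundredth or two from the ones stated in part (b), presumably because those were computed from unrounded times. The no-outlier conclusion is the same. The helper name `quartiles` is illustrative.

```python
from statistics import median

# The 50 elapsed times (days) listed above, in ascending order.
times = [
    10.54, 10.76, 10.91, 10.93, 10.94, 11.00, 11.07, 11.15, 11.18, 11.26,
    11.371, 11.373, 11.48, 11.49, 11.58, 11.652, 11.653, 11.66, 11.72, 11.73,
    12.28, 12.30, 12.325, 12.327, 12.412, 12.413, 12.53, 12.55, 12.83, 13.247,
    13.255, 13.26, 13.27, 13.34, 13.37, 13.68, 14.44, 14.46, 14.70, 15.16,
    15.19, 15.25, 15.28, 15.31, 15.38, 15.80, 15.84, 16.17, 16.45, 16.68,
]

def quartiles(data):
    """Q1, median, Q3, taking each quartile as the median of the lower or
    upper half of the sorted data (middle value excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    return median(xs[: n // 2]), median(xs), median(xs[(n + 1) // 2 :])

q1, med, q3 = quartiles(times)
midspread = q3 - q1
lower_outer, lower_inner = q1 - 3 * midspread, q1 - 1.5 * midspread
upper_inner, upper_outer = q3 + 1.5 * midspread, q3 + 3 * midspread

# A time beyond an inner fence is an outlier (beyond an outer fence, an extreme one).
outliers = [t for t in times if t < lower_inner or t > upper_inner]

print("5-number summary:", min(times), q1, round(med, 4), q3, max(times))
print("fences:", [round(f, 2) for f in (lower_outer, lower_inner, upper_inner, upper_outer)])
print("outliers:", outliers)
```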

d. Line up vertically your box plot and your histogram so they share the same horizontal scale. Comment on similarities and differences in what is revealed in these two visual summaries.

See the plots above and the commentary just below the plot.

9. Use your calculator to create a scatter plot to compare the number of dogs on a team (fifth column in from right edge of table) to the elapsed race time for that team. Use the horizontal axis for number of dogs and the vertical axis for elapsed race time. Express elapsed race time in parts of a day as explained in question 8(b).

Write a brief, specific, and justifiable statement to describe what is revealed in the scatter plot about the relationship between the number of dogs on a team to the elapsed race time for that team. (You DO NOT need to sketch your scatter plot.)

The scatter plot shows no clear relationship between the number of dogs on a team and the elapsed race time. The ellipse test helps justify this, because an ellipse that captures the points in the scatter plot is close to being a circle. We can also look at a specific number of dogs on a team and see a wide variation in the elapsed times for teams with that number of dogs. For example, look at the elapsed times for all 12-dog teams.

10. Use your calculator to generate a median-median line of best fit to model the relationship between the number of dogs on a team and the elapsed race time, expressed in parts of a day, for that team.

a. Write the equation of the median-median line. Use d as the independent variable that represents the number of dogs and t as the dependent variable that represents the elapsed race time for a team. In your equation, round the slope and the t-axis intercept to the nearest thousandth of a unit.
t=0.385d+8.906

b. According to the median-median line, what race time is predicted for a team of 13 dogs? How does that compare to actual race results?

For d = 13 dogs, t = 0.385(13) + 8.906 ≈ 13.91 days. The actual data show that the 13-dog teams had elapsed times of 10.93, 14.46, 15.19, and 15.80 days. These times span more than 4 days (10.93 to 15.80), and the value predicted by the median-median line falls in the upper half of that range.
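The comparison in part (b) can be checked numerically. This sketch evaluates the rounded equation from part (a), which gives 13.911 days for d = 13 (agreeing with the prediction above to the nearest hundredth), and tests whether that prediction lies in the upper half of the span of the 13-dog times. The function name `predicted_time` is illustrative.

```python
# Median-median line reported in part (a): t = 0.385d + 8.906.
def predicted_time(dogs):
    return 0.385 * dogs + 8.906

# Elapsed times (days) of the 13-dog teams, as listed in the answer above.
thirteen_dog_times = [10.93, 14.46, 15.19, 15.80]

t13 = predicted_time(13)
low, high = min(thirteen_dog_times), max(thirteen_dog_times)
midpoint = (low + high) / 2

print(f"predicted: {t13:.3f} days")
print(f"actual span: {high - low:.2f} days ({low} to {high})")
print("in upper half of span:", midpoint <= t13 <= high)
```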

c. Identify the three ordered pairs that represent the centers of each one third of the data set, according to the median-median line procedure.

The median-median points are (8, 12.53), (10, 11.66), and (12, 14.07), where the first value in each ordered pair is the number of dogs and the second value is the elapsed race time in days.
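Given the three summary points, the median-median line takes its slope from the two outer points and then slides that line one third of the way toward the middle point, which is the same as averaging the three intercepts y - mx. A minimal sketch; starting from the rounded points in part (c), it reproduces the slope 0.385 but gives an intercept of about 8.903 rather than 8.906, presumably because the calculator worked from unrounded times. The function name `median_median_line` is illustrative.

```python
def median_median_line(p1, p2, p3):
    """Slope and intercept of a median-median line from the three group
    summary points (x1, y1), (x2, y2), (x3, y3), ordered by x."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Slope of the line through the two outer summary points.
    m = (y3 - y1) / (x3 - x1)
    # Averaging the three intercepts moves the outer line one third of the
    # vertical distance toward the middle point.
    b = ((y1 - m * x1) + (y2 - m * x2) + (y3 - m * x3)) / 3
    return m, b

m, b = median_median_line((8, 12.53), (10, 11.66), (12, 14.07))
print(f"t = {m:.3f}d + {b:.3f}")
```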

11. What information in the data table can be used to conclude that more than 50 racers participated? Explain.

In the "BIB" column there are numbers larger than 50. We could conjecture that these are the bib numbers worn by the racers. If a number less than 59 isn't shown in this column, that racer may not have finished the race, or at least may not have finished with one of the 50 fastest times.



Assigned: Tuesday 16 February 1999

Due: Tuesday 23 February 1999