Illinois State University Mathematics Department


MAT 312: Probability and Statistics for Middle School Teachers

Spring 1999
9:35 - 10:50 am TR STV 350A
Dr. Roger Day (day@math.ilstu.edu)



Possible Solutions to Spring 99 Semester Exam

Part I: Multiple Choice

For each question, choose the one best response and circle that letter at the appropriate spot on the answer sheet.

1.

The manufacturer of a new type of light bulb wants to show that the new bulbs outlast those of a major competitor. The manufacturer tested 30 bulbs and recorded the life span of each. Here are the data.

The data are represented in a _?_.

a. line plot
b. stem-and-leaf plot
c. box-and-whisker plot
d. scatter plot
e. vertical plot
f. nice plot

2.

What portion of the bulbs tested lasted less than 400 hours? (See question 1.)

a. 6%
b. 20%
c. 24%
d. 80%

3.

Determine the 75th percentile of this data set. (See question 1.)

a. 420 hours
b. 480 hours
c. 490 hours
d. 630 hours

4.

The plot here shows the distribution of heights of residents in a Rockford nursing home. The median height lies in which measurement class?

a. 50-55 inches
b. 55-60 inches
c. 60-65 inches
d. 65-70 inches
e. None of these measurement classes contain the median height.

5.

This visual representation shows test scores of 48 students in a science course. How many students scored at least 80 on the test?

a. 12
b. 20
c. 40
d. 50

6.

If a represents the number of students who scored in the second quartile and b represents the number of students who scored in the third quartile, which statement is most correct? (See question 5 figure.)

a. a = b
b. b < a
c. a = 19
d. b = 13

7.

In a distribution that is negatively skewed, which statement is most likely to be true?

a. The mean and median will be equal.
b. The mean will be greater than the median.
c. The mean will be less than the median.
d. The mean will equal 0.

8.

For the plot here, what correlation coefficient is most likely? Assume that the axes scales are equal.

a. 1.0
b. 0.67
c. 0.06
d. -0.64
e. -1.0

9.

The time that it takes to drive from Stevenson Hall to Eastland Mall at 9:00 am on a Saturday is normally distributed with a mean of 12 minutes and a standard deviation of 2 minutes.

Driving times ranging from 10 minutes to 12 minutes represent approximately what portion of all the driving times?

a. 5%
b. 13.5%
c. 27%
d. 34%
e. 68%
f. 95%

10.

[Refer to question 9.] Determine the driving time that will be exceeded by approximately 97.5% of all drivers making the trip from Stevenson Hall to Eastland Mall at 9:00 am on a Saturday.

a. 6 minutes
b. 8 minutes
c. 10 minutes
d. 14 minutes
e. 16 minutes
f. 18 minutes

11.

The visual representation shown here helps describe the relationship between mathematics placement test scores and writing test scores for an incoming class of students.

The plot provides information about the _?_ of that relationship.

a. source, direction, and value
b. center, spread, and shape
c. location, value, and shape
d. shape, strength, and direction
e. direction, shape, and location

12.

State an appropriate estimate for the slope of a straight line that could be fit to these data. Note the axes scales. (See question 11.)

a. -0.75
b. -0.35
c. 0
d. 1.2
e. 4.0
f. 8.54

13.

A running club on a college campus has 20 members, each of whom is eligible to serve on the club's 4-member executive committee. In how many ways can the executive committee be selected from the membership?

a. 4845
b. 116,280
c. 160,000
d. None of these are correct.

14.

If the chair of the executive committee is to be elected from among the 20 members, in how many ways could the 4-member executive committee be formed, assuming that the elected chair is to be one of the four members? (See question 13.)

a. 4845
b. 116,280
c. 160,000
d. None of these are correct.

15.

The spinner shown here is spun once. Each of the center angles measures 60°. Determine the probability that the pointer lands on 20 or 30, given that it lands on a multiple of 10.

a. 1/6
b. 1/3
c. 1/2
d. 2/3
e. None of these are correct.

16.

Here are the theoretical probabilities for an experiment whose sample space is {0,1,2,3,4,5}.

x       0      1      2      3      4      5
p(x)    1/5    3/8    3/10   1/10   1/50   1/200

According to the theoretical probabilities, which value ought to occur most often?

a. 0
b. 1
c. 2
d. 3
e. 4
f. 5

17.

Determine P(x 3). (See question 16.)

a. 0
b. 1/40
c. 1/10
d. 1/8
e. 1

18.

Determine the expected value of this experiment. (See question 16.)

a. 1.0
b. 1.38
c. 2.0
d. 2.5
e. 3.0

19.

Suppose the experiment was carried out 200 times and a histogram of the results was created. The histogram would most likely appear _?_.

a. bimodal
b. symmetric
c. negatively skewed
d. uniform
e. positively skewed

20.

Two numbers, a and b, are each created using a TI-83 random number generator. The random number generator creates numbers between 0 and 1. What is the probability that both a and b are less than 0.5 ?

a. 0
b. 0.05
c. 0.25
d. 0.5
e. 0.75
f. 1.0


Part II: Open Response

Complete each question and write your response in the space provided. Please include descriptive comments as necessary.

 

A.

A student group sells donuts at the mall. On recent Saturdays, they've been recording the number of donuts sold, along with the selling price. Here's the data:

A.1. On the grid provided, create a scatter plot for these data. Represent sale price in cents on the horizontal axis (x) and number of donuts sold on the vertical axis (y). Clearly indicate how you have scaled each axis.

A.2. Explain whether either the table of values or your scatter plot reveals a relationship between donut sale price and the number of donuts sold.

Both the table of values and the scatter plot reveal a strong negative relationship between donut sale price and the number of donuts sold. As the price increases, the number sold decreases. The scatter plot shows an apparent linear relationship.

A.3. Least-squares linear regression is applied to the donut sales data set, resulting in the equation y = -36.583x + 2121.10, where x represents donut sale price in cents and y represents the number of donuts sold. A correlation coefficient of r = -0.9995 is computed with this least-squares linear regression equation.

A.3.a. What is the slope of the regression equation? Describe its meaning in the context of this data set. Be specific.

The slope is -36.583. This represents the rate of change of the number of donuts sold with respect to the price of donuts. For every 1-cent increase in price (i.e., x increases by 1), the number of donuts sold decreases by about 36 or 37 donuts (i.e., y decreases by 36.583).

A.3.b. State two reasons you might question or doubt the meaningfulness of the y-intercept of the regression equation. Be specific.

The y-intercept is 2121.1. As an ordered pair in the context of the problem, this represents that at a price of 0 cents (apparently donuts are given away), the number of donuts sold (given away) is about 2121.

The meaningfulness of the y-intercept could be doubted or questioned for several reasons. One primary reason is that this ordered pair represents an extrapolation beyond the existing data. That is, the apparent linear pattern exhibited by the data is assumed to continue outside the bounds of the data set, all the way to a price of 0 cents for the donuts. This linear pattern may not extend in this manner.

Also, giving donuts away could be questioned, for at least two reasons. Is it reasonable to suggest that the group would give donuts away, thereby apparently eliminating any income? Also, in connection with extrapolation, if indeed the group did give away donuts, there may be far more or far fewer donuts than 2121 given away.

A.3.c. Use the linear regression equation to predict the number of donuts sold when the sale price is 60 cents.

When x=60 is substituted into the linear regression equation, a value of -73.88 is returned. This says that at a price of 60 cents, a negative number of donuts is sold.

While the arithmetic of the calculation is accurate, it is an impossible situation to sell a negative number of donuts. Therefore, the prediction would be that no donuts would be sold at the price of 60 cents. Again this assumes that the linear relationship inherent in the regression equation continues outside the bounds of the data set.
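The substitution in A.3.c can be checked with a short script. This is only a sketch: the coefficients come from the regression equation stated in A.3, while the function names `predicted_donuts` and `practical_prediction` are hypothetical names chosen for illustration.

```python
# Coefficients taken from the regression equation stated in A.3.
SLOPE = -36.583       # change in donuts sold per 1-cent change in price
INTERCEPT = 2121.10   # value of the line at a price of 0 cents


def predicted_donuts(price_cents: float) -> float:
    """Raw value of the regression line y = SLOPE * x + INTERCEPT."""
    return SLOPE * price_cents + INTERCEPT


def practical_prediction(price_cents: float) -> float:
    """Clamp the raw value at zero, since sales cannot be negative."""
    return max(0.0, predicted_donuts(price_cents))


print(round(predicted_donuts(60), 2))  # raw value of the line at x = 60
print(practical_prediction(60))        # prediction after clamping at zero
```

Clamping at zero mirrors the reasoning in the solution: the line itself goes negative, but the number of donuts sold cannot.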

2, 2, 6: 10 points total

B.

On the wall at a local pizzeria is a square dart board, each side 10" long. For $1 a customer can try to win a pizza by throwing a dart at the board.

The board contains three smaller squares whose centers are at the center of the board. Any dart landing in the innermost square, a square with side length 1", earns a large pizza ($10 value). If a dart sticks in the first layer outside the innermost square, part of a square of side length 3", the customer gets a medium pizza ($5 value). For a dart sticking in the next layer, part of a square with side length 5", the customer gets a small pizza ($2 value). A dart on any other portion of the board wins no prize.

B.4. Suppose a dart hits the board at some random point. What is the probability of winning a medium-size pizza?

The area associated with winning a medium-size pizza is the 3-inch square less the 1-inch square within it. This represents an area of 8 square inches. The entire board has an area of 100 square inches. Therefore, the desired probability is 8/100 or 2/25.

B.5. If customers played this game many many times, and we assume that darts always hit the board at some random location, what is the expected gain or loss per play, from a customer's standpoint?

Here are the net gains (losses) associated with the four possible outcomes of a dart randomly hitting the board, together with the probabilities of each.

net gain (g), in dollars    9        4        1         -1
probability p(g)            1/100    8/100    16/100    75/100

The expected gain (loss) is the sum of the product of each net gain with its probability.

Expected gain

= 9(1/100) + 4(8/100) + 1(16/100) + (-1)(75/100)

= -18/100 dollars = -18 cents

This says that in the long run, over the course of many plays, a player can expect to lose 18 cents per play.

B.6. Assume again that darts always hit the board at some random location. What is the net gain the pizzeria can expect if 1000 rounds of this game are played some weekend? Take into account only the cost to play and the value of the prizes.

If each player loses 18 cents per play, the pizzeria must gain 18 cents per play. Thus, over 1000 plays, the pizzeria can expect to have a net gain of $180.
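The area reasoning in B.4 through B.6 can be reproduced with exact fractions. The sketch below assumes, as the problem does, that a dart is equally likely to land anywhere on the board; the names `SQUARES`, `gain_distribution`, and `expected_gain` are hypothetical.

```python
from fractions import Fraction

BOARD_SIDE = 10    # inches
COST_TO_PLAY = 1   # dollars
# (side length of the square in inches, prize value in dollars), innermost first
SQUARES = [(1, 10), (3, 5), (5, 2)]


def gain_distribution():
    """Return (net gain, probability) pairs for a dart hitting a random point."""
    board_area = BOARD_SIDE ** 2
    outcomes = []
    inner_area = 0
    for side, prize in SQUARES:
        layer_area = side ** 2 - inner_area  # area of this layer only
        outcomes.append((prize - COST_TO_PLAY, Fraction(layer_area, board_area)))
        inner_area = side ** 2
    # The rest of the board wins nothing, so the player loses the cost to play.
    outcomes.append((-COST_TO_PLAY, Fraction(board_area - inner_area, board_area)))
    return outcomes


def expected_gain(outcomes):
    """Sum of each net gain times its probability."""
    return sum(gain * prob for gain, prob in outcomes)


dist = gain_distribution()
ev = expected_gain(dist)
print(dist)        # net gains with their probabilities
print(ev)          # player's expected gain per play, in dollars
print(-ev * 1000)  # pizzeria's expected net gain over 1000 plays, in dollars
```

Using `Fraction` keeps the probabilities exact, so the results can be compared directly with the values in the table above.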

3,3,4: 10 points total

C.

C.7. In the far-off world of Balbion III in the Mostarth Galaxy, each year every family in the city of Krameth is given a pet. There are three species of pets randomly distributed to the Kramethian families. A family receives a Quark with probability 0.2, a Rorst with probability 0.3, and a Swimp with probability 0.5.

Determine the probability that in a three-year sequence, a family gets:

We will use Q to mean a family gets a Quark, R to mean a family gets a Rorst, and S to mean a family gets a Swimp. Then P(Q) = 0.2, P(R) = 0.3, and P(S) = 0.5.

You may find a tree diagram helpful for this problem, or some sort of organized list that shows all possible 3-pet arrangements a family could have. Note that arrangement is important here, for QRS differs from SRQ in the order the pets were received by a family over a three-year period.

C.7.a. two Rorsts and a Quark

There are three ways to get two Rorsts and a Quark: RRQ, RQR, and QRR. Each of these has probability 0.018, determined by the product (0.3)(0.3)(0.2). The desired probability is then the sum of the three individual probabilities, or 0.054.

C.7.b. three pets all of the same species

The three situations are QQQ, RRR, and SSS. The probabilities associated with these are (0.2)^3, (0.3)^3, and (0.5)^3. The sum of these values is the probability we seek: 0.16.

C.7.c. at least two Swimps

Here are the ways for a family to get at least two Swimps, together with the probability of each:

arrangement    SSQ     SQS     QSS     SSR      SRS      RSS      SSS
probability    0.05    0.05    0.05    0.075    0.075    0.075    0.125

The sum of these probabilities represents the probability that a family gets at least two Swimps. That value is 0.50.
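The organized-list approach used for C.7 can also be carried out by enumerating all 27 ordered arrangements. This is a sketch that assumes the three yearly assignments are independent, as the solutions do; the helper names `arrangement_probability` and `probability_of` are hypothetical.

```python
from itertools import product

# Probabilities given in the problem for each species.
P = {"Q": 0.2, "R": 0.3, "S": 0.5}


def arrangement_probability(arrangement):
    """Probability of one ordered arrangement, e.g. ('R', 'R', 'Q')."""
    prob = 1.0
    for pet in arrangement:
        prob *= P[pet]
    return prob


def probability_of(event):
    """Sum the probabilities of all 3-year arrangements for which event is true."""
    return sum(
        arrangement_probability(a)
        for a in product(P, repeat=3)
        if event(a)
    )


two_rorsts_one_quark = probability_of(
    lambda a: a.count("R") == 2 and a.count("Q") == 1
)
all_same_species = probability_of(lambda a: len(set(a)) == 1)
at_least_two_swimps = probability_of(lambda a: a.count("S") >= 2)

print(round(two_rorsts_one_quark, 3))
print(round(all_same_species, 3))
print(round(at_least_two_swimps, 3))
```

Each event is written as a test on a single arrangement, which matches the way a tree diagram is read: find the qualifying branches, then add their probabilities.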

3,3,4: 10 points total

D.

Suppose we know that the distribution of the waiting times (in minutes) for drivers boarding a ferry boat on Lake Erie is mound-shaped and symmetrical, that is, the waiting times are normally distributed. The mean waiting time is 16 minutes and the standard deviation is 4 minutes.

D.8. What portion of all drivers will wait 16 minutes or less?

Because 16 is the mean of this normal distribution, half the waiting times will be greater than 16 minutes and half will be less than 16 minutes. Therefore half the drivers will wait 16 minutes or less.

D.9. What is the probability that a driver will wait 20 minutes or more?

The waiting times up to 16 minutes represent half the data set and the times from 16 minutes to 20 minutes represent an additional 34% of the data values. This means that 84% of drivers wait 20 minutes or less. The remaining drivers, 16%, wait 20 minutes or longer.

D.10. Based on the information given, we know that approximately 2.5% of all drivers wait at least x minutes. What value of x makes this true?

This value, 2.5% of all drivers, represents the 2.5% longest waiting times in the data set. These data values represent values at and beyond 2 standard deviations above the mean. This value is 24 minutes, the desired value of x.
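The answers to D.8 through D.10 rely on the 68% and 95% rules of thumb for normal distributions. As a rough check, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are hypothetical. The exact values differ slightly from the rule-of-thumb answers because 34% and 2.5% are rounded figures.

```python
from statistics import NormalDist

# Waiting times: mean 16 minutes, standard deviation 4 minutes.
waiting = NormalDist(mu=16, sigma=4)

p_16_or_less = waiting.cdf(16)              # D.8
p_20_or_more = 1 - waiting.cdf(20)          # D.9
x_top_2_5_percent = waiting.inv_cdf(0.975)  # D.10

print(round(p_16_or_less, 3))
print(round(p_20_or_more, 3))
print(round(x_top_2_5_percent, 1))
```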

3, 3, 4: 10 points total

E.

This problem requires you to design and carry out a simulation. The situation is first described to you and then several questions are asked related to the simulation.

My nephew Seth noticed that Kellogg's cereals offered a set of 3 cartoon characters in its current cereal selections. One cartoon character is in each specially marked box of cereal and the cartoon characters are equally distributed among the cereal boxes currently coming off the production line. Seth wondered how many boxes of cereal he'd have to purchase to get the entire set of cartoon characters.

Design and carry out a simulation to address Seth's question. Assume that one trial of your simulation will determine the number of boxes of cereal he must purchase to get a complete set of 3 cartoon characters.

The solution presented here represents an example solution using one method and the outcomes of specific trials I carried out. Other solutions will result from different (equivalent) models and the specific outcomes of your trials.

E.11. Describe the model you will use to simulate this situation. In your description:

E.11.a. Indicate how you will generate random outcomes.
 

Random outcomes will be generated by the random number generator of a TI-83 graphing calculator.

 
E.11.b. Specify the decisions you will make based on the random outcomes.
 

I will produce random integers from the set {1,2,3}. Each value represents one of the three possible cartoon characters in the boxes. When a 2 shows up as the random number, that represents Seth getting the second of three unique cartoon characters.

 
E.11.c. Justify that your model accurately represents the situation.
 

The three characters are equally distributed in the boxes, so each is equally likely to be contained in a box selected at random off the shelf. The random numbers from the set {1,2,3} generated by the TI-83 are assumed to be equally likely to occur. The model for generating random numbers matches the probabilistic situation within the context of the problem.

E.12. Show the details of one trial of your simulation. Include:

E.12.a. a list of the random outcomes you generated for one trial,
 

One trial: 3, 2, 2, 3, 1

 
E.12.b. the decisions you made based on the random outcomes, and
 

The 3 represents the third of the three characters, the 2 the second, the 1 the first. I kept generating random numbers only until I had at least one of each.

 
E.12.c. the number of cereal boxes required to get a complete set of cartoon characters, based on this single trial.
 
The above trial meant that it required 5 boxes to generate a complete set of three unique cartoon characters.

E.13. Carry out at least 10 trials of this simulation. Use the results to answer Seth's original question: How many boxes of cereal will he have to purchase to get the entire set of cartoon characters?

The 10 trials, carried out as described here, required the following numbers of boxes to complete the set: 5, 8, 6, 6, 3, 5, 4, 7, 9, 5. Thus, 58 boxes were required, or approximately 6 boxes per trial.

I would tell Seth that he may get lucky and only require 3 boxes, but it's more likely it will require 5 or 6 boxes, and perhaps even more.

To be more sure of our results we could run the simulation for 100 trials or 1000 trials and look at the distribution of the required number of boxes of cereal. Ten trials are not enough to make confident predictions.
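A larger run of the kind suggested above could be sketched in Python rather than on a TI-83. This is a swapped-in tool, not the method used for the trials reported here; the function names `boxes_needed` and `run_trials` and the seed value are hypothetical choices.

```python
import random


def boxes_needed(rng, characters=3):
    """One trial: open boxes until every character has appeared at least once."""
    seen = set()
    boxes = 0
    while len(seen) < characters:
        seen.add(rng.randint(1, characters))  # each character equally likely
        boxes += 1
    return boxes


def run_trials(n_trials, seed=None):
    """Run n_trials independent trials and return the box counts."""
    rng = random.Random(seed)
    return [boxes_needed(rng) for _ in range(n_trials)]


results = run_trials(1000, seed=312)  # arbitrary seed, used only for repeatability
print(sum(results) / len(results))    # average number of boxes per trial
print(min(results), max(results))     # luckiest and unluckiest trials
```

With many trials the average should settle near the theoretical value for three equally likely characters, 3(1 + 1/2 + 1/3) = 5.5 boxes, which agrees with the estimate of 5 or 6 boxes.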

3, 3, 4: 10 points total

BONUS!

Assume that 2% of the population is on drugs. A test is 98% accurate in indicating whether or not a person is on drugs. This means that people on drugs will test positive* 98% of the time and people not on drugs will test negative* 98% of the time.

Determine the probability that a person is on drugs given that the person's test result is positive. Provide clear and specific evidence to support your response.

If you're interested in getting to the solution of this problem, start by giving yourself a population of 100,000 people. Now divide the group into those using drugs and those not using drugs. Further divide the groups into those who test positive and those who test negative for drug use. You should be able to get a box of data that looks something like this, with values in each of the four cells:

100,000 people in all    positive test for drug use    negative test for drug use
those on drugs           ?                             ?
those not on drugs       ?                             ?

Next, try to determine what cells of the table relate to the question. What portion of those who test positive are on drugs?
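If you would rather let a computer fill in the cells, the steps above can be sketched as follows. The population size, the 2% rate, and the 98% accuracy come from the problem; the variable names and the dictionary layout are hypothetical.

```python
POPULATION = 100_000
PREVALENCE = 0.02  # portion of the population on drugs
ACCURACY = 0.98    # portion of each group the test classifies correctly

on_drugs = round(POPULATION * PREVALENCE)
not_on_drugs = POPULATION - on_drugs

# Four cells of the table: (group, test result) -> number of people.
table = {
    ("on drugs", "positive"): round(on_drugs * ACCURACY),
    ("on drugs", "negative"): round(on_drugs * (1 - ACCURACY)),
    ("not on drugs", "positive"): round(not_on_drugs * (1 - ACCURACY)),
    ("not on drugs", "negative"): round(not_on_drugs * ACCURACY),
}

# Only the "positive" column relates to the question.
all_positive = table[("on drugs", "positive")] + table[("not on drugs", "positive")]
p_on_drugs_given_positive = table[("on drugs", "positive")] / all_positive

for cell, count in table.items():
    print(cell, count)
print(p_on_drugs_given_positive)
```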


*A positive test means the test results indicate the person is on drugs; a negative test means the test results indicate the person is not on drugs.