Illinois State University Mathematics Department


MAT 312: Probability and Statistics for Middle School Teachers

Dr. Roger Day (day@math.ilstu.edu)



Putting It All Together:
Location, Spread, and Shape For One-Variable Data Sets


Throughout the previous sections, we have described and illustrated characteristics of location, spread, and shape for one-variable data sets. Here, we consider all three of these characteristics as we explore and analyze specific data sets.

Example #1

Suppose a manufacturing process creates pipes of length 50 cm. Because the process isn't perfect, some of the pipes may measure less than 50 cm and some may measure more than 50 cm. Every day during the manufacturing process, a random sample of pipes is pulled from the entire group that has been manufactured and their lengths are analyzed. Here are the lengths from a random sample of 30 pipes pulled from those manufactured on a recent day. Assume that the sample pipe lengths are from a population of pipe lengths that has a normal distribution.

The pipe buyer has been assured by the manufacturer that 95% of all pipe lengths are within 0.02 cm of 50 cm. Does this sample support that claim? Write a concise paragraph to respond to this question and include evidence to support your claim.

50.032
49.993
49.928
50.037
50.017
49.988
50.009
49.988
50.018
49.990
50.013
50.018
49.995
50.000
50.040
49.973
50.002
50.024
50.034
50.055
50.011
50.034
49.993
49.986
49.958
49.975
49.999
50.023
50.044
50.000

The mean of the sample is 50.0059 cm with a sample standard deviation of 0.0273 cm. Here are those values represented within a picture of the normal distribution.

The manufacturer claims that 95% of the lengths are within the range from 49.98 cm to 50.02 cm. From our knowledge of the normal distribution, about 95% of values lie within two standard deviations of the mean, so the claim implies a mean of 50.00 cm and a standard deviation of about 0.01 cm. However, the sample has a mean of 50.0059 cm and a sample standard deviation of 0.0273 cm. Assuming the sample comes from a population of pipe lengths that are normally distributed, it suggests that 95% of the pipe lengths are within the range from 49.9513 cm to 50.0605 cm, an interval nearly three times as wide as the one claimed.

The sample shows that the manufacturing process has more error than the manufacturer claims.
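The figures in this example can be rechecked with a short Python sketch. It is only an illustration, not part of the original assignment: the variable names are our own, and the two-standard-deviation interval relies on the normality assumption stated in the problem.

```python
from statistics import mean, stdev

# Pipe lengths in cm, copied from the sample listed above.
lengths = [
    50.032, 49.993, 49.928, 50.037, 50.017, 49.988, 50.009, 49.988, 50.018, 49.990,
    50.013, 50.018, 49.995, 50.000, 50.040, 49.973, 50.002, 50.024, 50.034, 50.055,
    50.011, 50.034, 49.993, 49.986, 49.958, 49.975, 49.999, 50.023, 50.044, 50.000,
]

TARGET = 50.0     # cm, nominal pipe length
TOLERANCE = 0.02  # cm, from the manufacturer's claim

x_bar = mean(lengths)
s = stdev(lengths)  # sample standard deviation (divides by n - 1)

# Under the normality assumption, about 95% of lengths lie
# within two standard deviations of the mean.
low, high = x_bar - 2 * s, x_bar + 2 * s

# Direct count of sample values that fall inside the claimed band.
in_band = sum(1 for x in lengths if abs(x - TARGET) <= TOLERANCE)

print(f"mean = {x_bar:.4f} cm, s = {s:.4f} cm")
print(f"mean +/- 2s: {low:.4f} cm to {high:.4f} cm")
print(f"{in_band} of {len(lengths)} lengths are within {TOLERANCE} cm of {TARGET} cm")
```

The direct count gives a second line of evidence: only 17 of the 30 sampled lengths (about 57%) fall between 49.98 cm and 50.02 cm, well short of the claimed 95%.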

Example #2

Here are data about the land speed of 32 different animals. Describe the location, spread and shape of the distribution of these data.

Land Speed of Various Animals, in Miles Per Hour

Animal               mph   Animal               mph   Animal               mph   Animal               mph
Cheetah               70   Coyote                43   Mule Deer             35   Human                 28
Pronghorn Antelope    61   Gray Fox              42   Jackal                35   Elephant              25
Wildebeest            50   Hyena                 40   Reindeer              32   Black Mamba Snake     20
Lion                  50   Zebra                 40   Giraffe               32   6-Lined Race Runner   18
Thomson's Gazelle     50   Mongolian Wild Ass    40   White-Tailed Deer     30   Wild Turkey           15
Quarter Horse         48   Greyhound             39   Wart Hog              30   Squirrel              12
Elk                   45   Whippet               36   Grizzly Bear          30   Pig (domestic)        11
Cape Hunting Dog      45   Rabbit (domestic)     35   Cat (domestic)        30   Chicken                9

Most of these measurements are for maximum speeds over approximate quarter-mile distances. Exceptions include the lion and the elephant, whose speeds were clocked in the act of charging; the whippet, which was timed over a 200-yard run (of 13.6 seconds); and the black mamba and six-lined race runner, which were measured over very small distances. Source: The World Almanac and Book of Facts, 1994, p. 175.

On the left below is a TI-83 screen shot showing both a modified box-and-whiskers plot and a histogram for the data shown above. On the calculator screen, the horizontal scale ranges from 5 to 75 miles per hour (mph), with tick marks every 5 units. The vertical scale ranges from 0 to 15 with tick marks every 2 units. The two additional screen shots show numerical statistics calculated by the TI-83.

The median and mean of the data set are very close, at or near 35 mph. The mode is 30 mph. Based on these numerical summaries and the visual summaries (box plot and histogram), we can safely anchor the distribution at 35 mph. Because the difference between the mean and median is so close to 0, there is little skewness in this distribution.
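For readers without a TI-83 at hand, the three measures of location can be reproduced with Python's standard library. This is a minimal sketch; the list name `speeds` and the dictionary `center` are our own choices.

```python
from statistics import mean, median, multimode

# Land speeds in mph for the 32 animals, read row by row from the table above.
speeds = [70, 43, 35, 28, 61, 42, 35, 25, 50, 40, 32, 20, 50, 40, 32, 18,
          50, 40, 30, 15, 48, 39, 30, 12, 45, 36, 30, 11, 45, 35, 30, 9]

center = {
    "mean": mean(speeds),
    "median": median(speeds),
    "modes": multimode(speeds),  # list of the most frequent value(s)
}

for name, value in center.items():
    print(name, value)
```

The mean works out to 35.1875 mph, which the text rounds to 35.19, and the median is exactly 35 mph, so the two differ by less than 0.2 mph.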

The 5-number summary for the data is 9-29-35-44-70, resulting in a range of 61, a lowspread of 26, a midspread of 15, and a highspread of 35. These values indicate that the data are more highly concentrated in the middle of the range of speeds and more spread out at the ends of the distribution. This is supported by the histogram as well, for it shows more values in the middle of the distribution (24 of the 32 values, or 75% of the data, range from 25 to 54 mph) than at either end of the distribution. The TI-83 modified box plot indicates one outlier value, at 70 mph, the speed of the cheetah. You can verify that this speed is more than 1.5 midspreads beyond the 75th percentile.
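The 5-number summary, the three spreads, and the outlier check can be sketched in Python as follows. The helper `five_number_summary` is our own illustrative function; it takes the quartiles as the medians of the lower and upper halves of the sorted data, which reproduces the values quoted in this example. Other software may use a different quartile rule and report slightly different quartiles.

```python
from statistics import median

speeds = [70, 43, 35, 28, 61, 42, 35, 25, 50, 40, 32, 20, 50, 40, 32, 18,
          50, 40, 30, 15, 48, 39, 30, 12, 45, 36, 30, 11, 45, 35, 30, 9]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); assumes at least two values.

    Q1 and Q3 are the medians of the lower and upper halves of the
    sorted data (the middle value is left out when the count is odd).
    """
    data = sorted(values)
    half = len(data) // 2
    return min(data), median(data[:half]), median(data), median(data[-half:]), max(data)

lo, q1, med, q3, hi = five_number_summary(speeds)

lowspread = med - lo    # minimum to median
midspread = q3 - q1     # interquartile range
highspread = hi - med   # median to maximum

# A value is flagged as an outlier when it lies more than
# 1.5 midspreads below Q1 or above Q3.
lower_fence = q1 - 1.5 * midspread
upper_fence = q3 + 1.5 * midspread
outliers = [v for v in speeds if v < lower_fence or v > upper_fence]

print((lo, q1, med, q3, hi), lowspread, midspread, highspread)
print("fences:", lower_fence, upper_fence, "outliers:", outliers)
```

The upper fence is 44 + 1.5(15) = 66.5 mph, so the cheetah's 70 mph is the only value flagged; the pronghorn antelope at 61 mph falls inside the fence.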

Although not perfectly a normal distribution, the distribution is somewhat mound-shaped. A perfectly normal distribution for a population with a mean of 35.19 and a standard deviation of 13.82 would have 22 of 32 values (approximately 68% of the data) in the range from 21 mph to 49 mph. Here, there are 21 such values. Likewise, for a perfectly normal distribution, 30 values would range from 7 mph to 63 mph. Here, 31 values fall in that range.
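The comparison with a normal distribution can be checked by counting directly. In this sketch, `count_within` is a hypothetical helper of our own, and we assume the standard deviation of 13.82 is the population form (dividing by n), since that is the version that matches the quoted value.

```python
from statistics import mean, pstdev

speeds = [70, 43, 35, 28, 61, 42, 35, 25, 50, 40, 32, 20, 50, 40, 32, 18,
          50, 40, 30, 15, 48, 39, 30, 12, 45, 36, 30, 11, 45, 35, 30, 9]

mu = mean(speeds)
sigma = pstdev(speeds)  # population standard deviation (divides by n)

def count_within(values, center, spread, k):
    """Count the values lying within k spreads of the center."""
    return sum(1 for v in values if abs(v - center) <= k * spread)

n = len(speeds)
observed_1 = count_within(speeds, mu, sigma, 1)
observed_2 = count_within(speeds, mu, sigma, 2)
expected_1 = 0.68 * n  # about 68% within one standard deviation
expected_2 = 0.95 * n  # about 95% within two standard deviations

print(f"within 1 SD: observed {observed_1}, expected about {expected_1:.1f}")
print(f"within 2 SD: observed {observed_2}, expected about {expected_2:.1f}")
```

The expected counts are 21.76 and 30.4, which round to the 22 and 30 used in the text; the observed counts of 21 and 31 are each within one value of those.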

Example #3

The 2000 presidential election was hotly contested and highly controversial. A Federal Elections Commission website provides a variety of data related to this election.

  • From this website, generate a table to show the percent of popular vote, by state (including the District of Columbia), earned by Al Gore and George W. Bush. Round percentages to the nearest hundredth of a percent.

Analyze the percentages calculated and present a report of your analysis.

In your report:

  • Create at least two different visual displays and two different visual summaries of the data. For at least one visual display and at least one visual summary, your visual representation should include both data sets for comparative purposes.
  • Report on where the center of each data set resides as well as on the variability of each data set.
  • Describe the overall shape of the distributions.

Here are a dot plot that compares the data and a back-to-back stem-and-leaf plot. You can use your calculator to create a histogram as well as a box plot. Try showing the two box plots on the same screen for comparison.

 

Numerical Summaries

        mean    median   standard deviation   5-number summary
Bush    49.63   50.42    10.30                8.95--43.97--50.42--56.84--67.76
Gore    46.01   46.46    10.10                26.34--40.90--46.46--50.63--85.16

For each candidate, the mean and median are quite similar, so either measure can serve as the center of the data. The standard deviation for Bush is slightly higher than for Gore, likely because of the extremely low percentage for the District of Columbia for Bush (8.95%).

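These distributions are mound-shaped and skewed. The middle 50% of the data for Gore is more highly compressed than for Bush: Gore's midspread is 50.63 - 40.90 = 9.73 percentage points, compared with 56.84 - 43.97 = 12.87 for Bush.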
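The spread and skew comparisons can be read straight from the two 5-number summaries. The sketch below uses only the summary values reported in this example; the helper `spreads` and the dictionary layout are our own.

```python
# 5-number summaries (percent of popular vote) as reported in the table above.
summaries = {
    "Bush": (8.95, 43.97, 50.42, 56.84, 67.76),
    "Gore": (26.34, 40.90, 46.46, 50.63, 85.16),
}

def spreads(five):
    """Return the lowspread, midspread, and highspread of a 5-number summary."""
    lo, q1, med, q3, hi = five
    return {
        "lowspread": round(med - lo, 2),   # minimum to median
        "midspread": round(q3 - q1, 2),    # interquartile range
        "highspread": round(hi - med, 2),  # median to maximum
    }

results = {name: spreads(five) for name, five in summaries.items()}

for name, s in results.items():
    print(name, s)
```

For Bush the lowspread is much larger than the highspread, which points to a long tail toward low percentages; for Gore the pattern is reversed. In both cases the extreme value is the District of Columbia.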

Example #4

The dot plots shown here represent lengths of steel rods created by machines A, B, C, and D at a manufacturing plant. The rods are to have length 4.7 inches with an error allowance of 0.1 inches above or below that value. Any rod outside these specifications is not delivered to the buyer.

  • Describe each distribution in terms of its location and its spread. Justify your description.

Machine A: The distribution is anchored just beyond 4.7 inches (median) and it appears to be almost uniformly distributed from just greater than 4.6 inches to just less than 4.8 inches.

Machine B: This distribution is also anchored just beyond 4.7 inches (median) but it appears to be widely distributed from just greater than 4.5 inches to approximately 4.95 inches. There are modest gaps at around 4.65 inches and 4.75 inches, with a cluster in between and just below the gap at 4.65 inches. Except for the gaps, the data is close to being uniformly distributed.

Machine C: This distribution is anchored at about 4.75 inches (median) but it appears to be bimodal, with data values distributed from 4.5 inches to just less than 4.9 inches. There are clusters of values around the 4.6 inch length and from 4.75 inches to 4.8 inches. Except for the gaps, the data is close to being uniformly distributed. The data are more spread out in the lower 50% of the distribution compared to the upper 50% of the distribution.

Machine D: This distribution is anchored at about 4.73 inches (median) and it appears to be almost mound-shaped and symmetrical, with data values distributed from just greater than 4.6 inches to just less than 4.9 inches. The data are more compressed in the middle 50% of the distribution compared to the lower 25% or upper 25% of the distribution.

  • Of the four machines, which, if any, need not be checked or altered? Why is that?

Machine A seems to be performing according to specifications. Machine D is not far from that, but is anchored a bit high and spread out more than allowed by the error allowance. Both Machines B and C need attention.

  • Which machine seems most stable in production? Which is least stable? How can you tell?

Machine A seems quite stable, given its almost uniform distribution. Machine D represents a sample we could expect from a normally distributed population, which is what we might expect from this process. Machines B and C seem least stable, with Machine B showing a large variation and Machine C showing inconsistent production across a distribution that is outside the tolerance levels.

  • Which machine produces rods farthest from the target length? How did you determine that?

Machine B has the greatest deviation from the target length 4.7 inches. We can see this by comparing each value in the distribution to the desired location 4.7 inches. 
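If the individual rod lengths were read off a dot plot, the comparison to the target could be made numerical along the following lines. This is only a sketch: the function `rod_report` is our own, and the sample readings are invented for illustration, because the dot plots' actual values are not reproduced in this text.

```python
from statistics import mean

TARGET = 4.7     # inches, from the specification above
ALLOWANCE = 0.1  # inches above or below the target

def rod_report(lengths, target=TARGET, allowance=ALLOWANCE):
    """Summarize how far one machine's rods fall from the target length."""
    deviations = [abs(x - target) for x in lengths]
    # The tiny margin keeps a rod exactly at the limit from being
    # rejected because of floating-point rounding.
    rejected = sum(1 for d in deviations if d > allowance + 1e-9)
    return {
        "mean_abs_deviation": mean(deviations),
        "max_deviation": max(deviations),
        "rejected": rejected,
    }

# Hypothetical readings, NOT taken from the dot plots for Machines A-D.
made_up_rods = [4.62, 4.68, 4.70, 4.74, 4.79, 4.93]
report = rod_report(made_up_rods)
print(report)
```

Running the same function on each machine's readings would give a mean deviation from 4.7 inches and a count of undeliverable rods for each, which is one way to back up a visual judgment with numbers.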



