Illinois State University Mathematics Department

MAT 312: Probability and Statistics for Middle School Teachers
Dr. Roger Day (day@math.ilstu.edu)

Putting It All Together: Location, Spread, and Shape for One-Variable Data Sets

Throughout the previous sections, we have described and illustrated the location, spread, and shape of one-variable data sets. Here, we consider all three of these characteristics together as we explore and analyze specific data sets.

Example #1

Suppose a manufacturing process is designed to create pipes 50 cm long. Because the process isn't perfect, some of the pipes may measure less than 50 cm and some may measure more. Every day during the manufacturing process, a random sample of pipes is pulled from the entire group that has been manufactured, and their lengths are analyzed. Here are the lengths from a random sample of 30 pipes pulled from those manufactured on a recent day.

Assume that the sample pipe lengths are from a population of pipe lengths that has a normal distribution. The pipe buyer has been assured by the manufacturer that 95% of all pipe lengths are within 0.02 cm of 50 cm. Does this sample support that claim? Write a concise paragraph to respond to this question and include evidence to support your claim.

50.032 49.993 49.928 50.037 50.017 49.988 50.009 49.988 50.018 49.990
50.013 50.018 49.995 50.000 50.040 49.973 50.002 50.024 50.034 50.055
50.011 50.034 49.993 49.986 49.958 49.975 49.999 50.023 50.044 50.000

The mean of the sample is 50.0059 cm, with a sample standard deviation of 0.0273 cm. Here are those values represented within a picture of the normal distribution.

The manufacturer claims that 95% of the lengths lie within 0.02 cm of 50 cm, that is, in the range from 49.98 cm to 50.02 cm. Because about 95% of a normal distribution lies within two standard deviations of the mean, this claim amounts to a population mean of 50.00 cm with a standard deviation of about 0.01 cm. The sample, however, has a mean of 50.0059 cm and a standard deviation of 0.0273 cm. If the sample comes from a normally distributed population with these values, then 95% of the pipe lengths lie within 50.0059 ± 2(0.0273) cm, that is, in the range from 49.9513 cm to 50.0605 cm, an interval more than twice as wide as the one the manufacturer claims.

The sample shows that the manufacturing process has more error than the manufacturer claims.
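The arithmetic in this example can be checked with a short Python sketch that uses only the standard library's `statistics` module (`NormalDist` requires Python 3.8 or later). It recomputes the sample mean, the sample standard deviation, and the mean ± 2s interval. It also adds two checks that are not part of the original discussion: it counts how many of the 30 sampled lengths fall inside the manufacturer's claimed interval, and it computes the share of a normal model with the sample's mean and standard deviation that falls inside that interval. Treating the sample statistics as the population parameters is an assumption of the sketch.

```python
import statistics
from statistics import NormalDist

# The 30 sampled pipe lengths (cm) from Example #1.
lengths = [
    50.032, 49.993, 49.928, 50.037, 50.017, 49.988, 50.009, 49.988, 50.018, 49.990,
    50.013, 50.018, 49.995, 50.000, 50.040, 49.973, 50.002, 50.024, 50.034, 50.055,
    50.011, 50.034, 49.993, 49.986, 49.958, 49.975, 49.999, 50.023, 50.044, 50.000,
]

mean = statistics.mean(lengths)   # sample mean
s = statistics.stdev(lengths)     # sample standard deviation (divides by n - 1)

# About 95% of a normal population lies within two standard deviations of the mean.
low, high = mean - 2 * s, mean + 2 * s

# The manufacturer's claim: 95% of lengths lie within 0.02 cm of 50 cm.
claim_low, claim_high = 49.98, 50.02
inside = sum(claim_low <= x <= claim_high for x in lengths)

# Share of a normal model (sample mean, sample sd) that falls in the claimed interval.
model = NormalDist(mean, s)
model_share = model.cdf(claim_high) - model.cdf(claim_low)

print(f"mean = {mean:.4f} cm, s = {s:.4f} cm")
print(f"mean +/- 2s: {low:.4f} cm to {high:.4f} cm")
print(f"{inside} of {len(lengths)} sampled pipes lie from {claim_low} to {claim_high} cm")
print(f"normal-model share in the claimed interval: {model_share:.2f}")
```

Running it reports that only 17 of the 30 sampled pipes (about 57%) fall inside the claimed interval. That is well short of 95% and consistent with the conclusion above.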

Example #2

Here are data about the land speed of 32 different animals. Describe the location, spread, and shape of the distribution of these data.

Land Speed of Various Animals, in Miles Per Hour

Animal                Speed (mph)
Cheetah               70
Coyote                43
Mule Deer             35
Human                 28
Pronghorn Antelope    61
Gray Fox              42
Jackal                35
Elephant              25
Wildebeest            50
Hyena                 40
Reindeer              32
Black Mamba Snake     20
Lion                  50
Zebra                 40
Giraffe               32
6-Lined Race Runner   18
Thomson's Gazelle     50
Mongolian Wild Ass    40
White-Tailed Deer     30
Wild Turkey           15
Quarter Horse         48
Greyhound             39
Wart Hog              30
Squirrel              12
Elk                   45
Whippet               36
Grizzly Bear          30
Pig (domestic)        11
Cape Hunting Dog      45
Rabbit (domestic)     35
Cat (domestic)        30
Chicken               9

Most of these measurements are for maximum speeds over approximate quarter-mile distances. Exceptions include the lion and the elephant, whose speeds were clocked in the act of charging; the whippet, which was timed over a 200-yard run (13.6 seconds); and the black mamba and six-lined race runner, which were measured over very short distances.

Source: The World Almanac and Book of Facts, 1994, p. 175.

On the left below is a TI-83 screen shot showing both a modified box-and-whiskers plot and a histogram for the data shown above. On the calculator screen, the horizontal scale ranges from 5 to 75 miles per hour (mph), with tick marks every 5 units. The vertical scale ranges from 0 to 15, with tick marks every 2 units. The two additional screen shots show numerical statistics calculated by the TI-83.

The median (35 mph) and the mean (35.19 mph) of the data set are very close. The mode is 30 mph. Based on these numerical summaries and the visual summaries (box plot and histogram), we can safely anchor the distribution at 35 mph. Because the mean and median differ by less than 0.2 mph, there is little skewness in this distribution.

The 5-number summary for the data is 9-29-35-44-70, resulting in a range of 61, a lowspread (median minus minimum) of 26, a midspread (interquartile range) of 15, and a highspread (maximum minus median) of 35. These values indicate that the data are more highly concentrated in the middle of the range of speeds and more spread out at the ends of the distribution: the middle 50% of the speeds spans only 15 mph of the 61-mph range. The histogram supports this as well, for it shows more values in the middle of the distribution (24 of the 32 values, or 75% of the data, range from 25 to 54 mph) than at either end. The TI-83 modified box plot indicates one outlier, at 70 mph, the speed of the cheetah. You can verify that this speed is more than 1.5 midspreads beyond the 75th percentile: 44 + 1.5(15) = 66.5, and 70 > 66.5.
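The center and spread summaries in the two preceding paragraphs can be reproduced with the Python sketch below. It assumes the quartile convention that matches the 9-29-35-44-70 summary reported above: each quartile is the median of the lower or upper half of the sorted data. Other software may use a different quartile rule and report slightly different values. The helper name `five_number_summary` is our own.

```python
import statistics

# Land speeds (mph) of the 32 animals, in the order they appear in the table above.
speeds = [70, 43, 35, 28, 61, 42, 35, 25, 50, 40, 32, 20, 50, 40, 32, 18,
          50, 40, 30, 15, 48, 39, 30, 12, 45, 36, 30, 11, 45, 35, 30, 9]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted data;
    when the count is odd, the middle value is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


low, q1, med, q3, high = five_number_summary(speeds)
midspread = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * midspread     # the modified box plot flags values beyond the fences
lower_fence = q1 - 1.5 * midspread
outliers = [x for x in speeds if x < lower_fence or x > upper_fence]

print("mean:", statistics.mean(speeds), "median:", med, "mode:", statistics.mode(speeds))
print("5-number summary:", low, q1, med, q3, high)
print("range:", high - low, "lowspread:", med - low,
      "midspread:", midspread, "highspread:", high - med)
print("outliers:", outliers)
print("values from 25 to 54 mph:", sum(25 <= x <= 54 for x in speeds))
```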

Although the distribution is not perfectly normal, it is somewhat mound-shaped. A normal distribution for a population with a mean of 35.19 mph and a standard deviation of 13.82 mph would have about 68% of its values (about 22 of 32) within one standard deviation of the mean, in the range from 21 mph to 49 mph. Here, there are 21 such values. Likewise, a normal distribution would have about 95% of its values (about 30 of 32) within two standard deviations of the mean, in the range from 7 mph to 63 mph. Here, 31 values fall in that range.
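The comparison with a normal distribution can be checked by counting the speeds within one and two standard deviations of the mean. The sketch below uses the population standard deviation (`statistics.pstdev`), which reproduces the 13.82 used above and corresponds to σx on the TI-83's 1-Var Stats screen. The 68% and 95% figures are the usual normal-distribution benchmarks. The helper name `count_within` is our own.

```python
import statistics

# Land speeds (mph) of the 32 animals from the table above.
speeds = [70, 43, 35, 28, 61, 42, 35, 25, 50, 40, 32, 20, 50, 40, 32, 18,
          50, 40, 30, 15, 48, 39, 30, 12, 45, 36, 30, 11, 45, 35, 30, 9]

mu = statistics.mean(speeds)
sigma = statistics.pstdev(speeds)  # population standard deviation (divides by n)


def count_within(k):
    """Count the speeds lying within k standard deviations of the mean."""
    return sum(mu - k * sigma <= x <= mu + k * sigma for x in speeds)


print(f"mean = {mu:.2f} mph, standard deviation = {sigma:.2f} mph")
for k, share in [(1, 0.68), (2, 0.95)]:
    expected = share * len(speeds)
    print(f"within {k} sd: {count_within(k)} observed, about {expected:.0f} expected if normal")
```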

Example #3

The 2000 presidential election was hotly contested and highly controversial. A Federal Election Commission website provides a variety of data related to this election.

- From this website, generate a table to show the percent of popular vote, by state (including the District of Columbia), earned by Al Gore and George W. Bush. Round percentages to the nearest hundredth of a percent.
- Analyze the percentages calculated and present a report of your analysis.

In your report:

- Create at least two different visual displays and two different visual summaries of the data. For at least one visual display and at least one visual summary, your visual representation should include both data sets for comparative purposes.
- Report on where the center of each data set resides as well as on the variability of each data set.
- Describe the overall shape of the distributions.
Here are a dot plot that compares the two data sets and a back-to-back stem-and-leaf plot. You can use your calculator to create a histogram as well as a box plot. Try showing the two box plots on the same screen for comparison.

Numerical Summaries

Candidate   Mean    Median   Std. dev.   5-number summary
Bush        49.63   50.42    10.30       8.95--43.97--50.42--56.84--67.76
Gore        46.01   46.46    10.10       26.34--40.9--46.46--50.63--85.16

For each candidate, the mean and median are quite similar, so either measure can be seen as the center of the data. The standard deviation for Bush is slightly higher than for Gore, likely because of the extremely low percentage for Bush in the District of Columbia (8.95%).

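These distributions are mound-shaped and skewed. The middle 50% of the data for Gore is more highly compressed than for Bush: Gore's interquartile range is 50.63 - 40.9 = 9.73 percentage points, compared with 56.84 - 43.97 = 12.87 percentage points for Bush.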
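The spread comparison can be made concrete from the five-number summaries alone. The Python sketch below computes each candidate's interquartile range and range. It also applies the same 1.5 × IQR outlier rule used by the modified box plot in Example #2. Because the individual state percentages are not listed here, only the minimum and maximum can be tested against the fences. The function name `spread_report` is our own.

```python
# Five-number summaries (min, Q1, median, Q3, max) of the state-by-state
# percentages, copied from the numerical summaries above.
summaries = {
    "Bush": (8.95, 43.97, 50.42, 56.84, 67.76),
    "Gore": (26.34, 40.9, 46.46, 50.63, 85.16),
}


def spread_report(summary):
    """Return (IQR, range, extremes lying outside the 1.5 * IQR fences).

    Only the minimum and maximum are checked, because the individual
    state values are not available from a five-number summary.
    """
    mn, q1, _median, q3, mx = summary
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    flagged = [v for v in (mn, mx) if v < low_fence or v > high_fence]
    return iqr, mx - mn, flagged


for name, summary in summaries.items():
    iqr, rng, flagged = spread_report(summary)
    print(f"{name}: IQR = {iqr:.2f}, range = {rng:.2f}, beyond fences: {flagged}")
```

The two ranges are nearly equal (58.81 and 58.82 percentage points), so the IQR, rather than the range, captures the difference in how tightly the middle half of each distribution is packed. Each candidate has one extreme beyond the fences: Bush's minimum of 8.95% and Gore's maximum of 85.16%.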

Example #4

The dot plots shown here represent lengths of steel rods created by machines A, B, C, and D at a manufacturing plant. The rods are to have length 4.7 inches, with an error allowance of 0.1 inch above or below that value, so acceptable rods measure from 4.6 to 4.8 inches. Any rod outside these specifications is not delivered to the buyer.

- Describe each distribution in terms of its location and its spread. Justify your description.
Machine A: The distribution is anchored just above 4.7 inches (median), and it appears to be almost uniformly distributed from just above 4.6 inches to just below 4.8 inches.

Machine B: This distribution is also anchored just above 4.7 inches (median), but it is much more widely spread, from just above 4.5 inches to approximately 4.95 inches. There are modest gaps at around 4.65 inches and 4.75 inches, with a cluster in between and just below the gap at 4.65 inches. Except for the gaps, the data are close to being uniformly distributed.

Machine C: This distribution is anchored at about 4.75 inches (median) but it appears to be bimodal, with data values distributed from 4.5 inches to just less than 4.9 inches. There are clusters of values around the 4.6 inch length and from 4.75 inches to 4.8 inches. Except for the gaps, the data is close to being uniformly distributed. The data are more spread out in the lower 50% of the distribution compared to the upper 50% of the distribution.

Machine D: This distribution is anchored at about 4.73 inches (median), and it appears to be almost mound-shaped and symmetric, with data values distributed from just above 4.6 inches to just below 4.9 inches. The data are more compressed in the middle 50% of the distribution than in the lower 25% or upper 25%.

- Of the four machines, which, if any, need not be checked or altered? Why is that?
Machine A seems to be performing according to specifications. Machine D is not far from that, but it is anchored a bit high and is more spread out than the error allowance permits. Both Machines B and C need attention.

- Which machine seems most stable in production? Which is least stable? How can you tell?
Machine A seems quite stable, given its almost uniform distribution. Machine D represents a sample we could expect from a normally distributed population, which is what we might expect from this process. Machines B and C seem least stable, with Machine B showing a large variation and Machine C showing inconsistent production across a distribution that extends outside the tolerance limits.

- Which machine produces rods farthest from the target length? How did you determine that?
Machine B has the greatest deviation from the target length of 4.7 inches. We can see this by comparing each value in the distribution to the target length of 4.7 inches.
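The individual rod lengths behind the dot plots are not reproduced here, so the following Python sketch only shows how the two criteria in these questions could be computed for one machine's sample: the share of rods inside the 4.6 to 4.8 inch specification, and the average distance from the 4.7 inch target. The function name `spec_report` and the use of mean absolute deviation as the "distance from target" measure are our own choices. The discussion above judges these features visually from the plots.

```python
TARGET = 4.7     # target rod length in inches (from the problem statement)
TOLERANCE = 0.1  # allowed error above or below the target, in inches


def spec_report(lengths, target=TARGET, tol=TOLERANCE):
    """Return (share of rods within target +/- tol, mean absolute deviation from target).

    The in-spec test works in whole thousandths of an inch so that a rod measuring
    exactly 4.6 or 4.8 inches is not misclassified by floating-point rounding.
    """
    t = round(target * 1000)
    d = round(tol * 1000)
    in_spec = sum(abs(round(x * 1000) - t) <= d for x in lengths)
    mad = sum(abs(x - target) for x in lengths) / len(lengths)
    return in_spec / len(lengths), mad
```

For instance, `spec_report(machine_b_lengths)` would return both numbers for Machine B, where `machine_b_lengths` is a hypothetical list of values read off that machine's dot plot.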
