📊 Comparing Distributions

Cambridge Lower Secondary · Grade 7 · Statistics & Data

Average vs Spread
Mean tells you the centre.
Range tells you how spread out the data is.
Comparison Statement
Class A mean = 14, Class B mean = 9.
Class A scored higher on average by 5 marks.
Outlier Effect
One extreme value shifts the mean dramatically.
The median barely moves; it is resistant to outliers.

Live comparison preview!

Class A: 12, 15, 9, 18, 11  ·  Class B: 6, 8, 7, 10, 9

Class A mean: 13
Class B mean: 8

Class A's mean is 5 higher than Class B's, suggesting Class A performed better on average.

What you'll learn:

  • Calculate and interpret mean, median, mode and range
  • Compare two datasets using averages AND spread
  • Write structured comparison statements
  • Build and read back-to-back stem-and-leaf diagrams
  • Use bar charts and frequency diagrams to compare distributions
  • Identify outliers and understand their effect on the mean
  • Recognise positive skew, negative skew and symmetry

📖 Learn: Comparing Distributions

Part 1: Measures of Average

There are three measures of average, each with its own strength:

Measure | How to find it | Best used when...
Mean | Sum of all values ÷ number of values | Data has no extreme outliers
Median | Middle value when data is ordered | There are outliers (it's resistant)
Mode | Most frequently occurring value | Describing the most common item
Example data: 3, 7, 7, 9, 12, 15, 15, 15, 20
Mean = (3+7+7+9+12+15+15+15+20) ÷ 9 = 103 ÷ 9 ≈ 11.4
Median = middle value (5th of 9) = 12
Mode = most frequent = 15 (appears 3 times)
💡 For an even number of values: median = average of the two middle values.
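
The three averages above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only and are not part of the lesson.

```python
from statistics import mean, median, multimode

data = [3, 7, 7, 9, 12, 15, 15, 15, 20]

data_mean = mean(data)        # sum of all values divided by how many there are
data_median = median(data)    # middle value once the data is ordered
data_modes = multimode(data)  # list of the most frequent value(s)

print(round(data_mean, 1), data_median, data_modes)  # 11.4 12 [15]

# With an even number of values, median() averages the two middle values.
print(median([5, 8, 8, 9, 12, 15]))  # 8.5
```

`multimode` is used rather than `mode` so that a dataset with two equally common values reports both of them.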

Part 2: Range (Measuring Spread)

The range tells us how spread out the data is.

📌 Formula: Range = Highest value − Lowest value
Example: data = 3, 7, 7, 9, 12, 15, 15, 15, 20  →  Range = 20 − 3 = 17
💡 A large range means the data is widely spread. A small range means the data is clustered together, so it is more consistent.
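
As a quick check of the range formula, here is a small sketch in Python. The function name `value_range` is a made-up helper for illustration.

```python
def value_range(values):
    """Return the highest value minus the lowest value."""
    return max(values) - min(values)

print(value_range([3, 7, 7, 9, 12, 15, 15, 15, 20]))  # 17
```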

When comparing two groups, you must comment on both the average AND the spread. Commenting on only one gets half marks!

Part 3: Comparing Two Datasets

A complete comparison requires two statements:

1. Compare an average (mean or median) โ€” who scored higher/lower and by how much?
2. Compare the spread (range) โ€” whose results were more consistent/variable?
Template:
"Class A has a higher mean of [x̄ = 14] compared to Class B [x̄ = 9], suggesting Class A performed better on average.
Class A also has a smaller range [10] than Class B [18], so Class A's results were more consistent."
💡 Always state the actual values in brackets; this is what earns marks in exams.
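
The two-statement template can also be sketched as code. The example below is illustrative only: `compare` is a hypothetical helper, it assumes the two means differ and the two ranges differ, and it uses the Class A and Class B scores from the preview at the top of the page.

```python
from statistics import mean

def compare(name_a, a, name_b, b):
    """Return two sentences: one comparing the means, one comparing the ranges.

    Assumes the two means differ and the two ranges differ.
    """
    mean_a, mean_b = mean(a), mean(b)
    range_a, range_b = max(a) - min(a), max(b) - min(b)

    avg_word = "higher" if mean_a > mean_b else "lower"
    spread_word = "smaller" if range_a < range_b else "larger"
    consistency = "more consistent" if range_a < range_b else "less consistent"

    sentence_1 = (f"{name_a} has a {avg_word} mean ({mean_a:g}) "
                  f"than {name_b} ({mean_b:g}).")
    sentence_2 = (f"{name_a} has a {spread_word} range ({range_a}) "
                  f"than {name_b} ({range_b}), so its results were {consistency}.")
    return sentence_1, sentence_2

for sentence in compare("Class A", [12, 15, 9, 18, 11], "Class B", [6, 8, 7, 10, 9]):
    print(sentence)
# Class A has a higher mean (13) than Class B (8).
# Class A has a larger range (9) than Class B (4), so its results were less consistent.
```

Note that both sentences carry the actual values in brackets, as the template requires.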

Part 4: Back-to-Back Stem-and-Leaf Diagrams

A back-to-back stem-and-leaf diagram places two datasets either side of a shared stem, making comparison easy.

Dataset A (Class A scores): 12, 15, 23, 25, 28, 31, 34
Dataset B (Class B scores): 11, 18, 21, 22, 29, 30, 36
Class A (read right to left) | Stem | Class B (read left to right)
5 2 | 1 | 1 8
8 5 3 | 2 | 1 2 9
4 1 | 3 | 0 6
Key: for Class A, 2 | 1 means 12; for Class B, 1 | 8 means 18. (Class A leaves read right to left from the stem.)
Median of Class A (7 values) = 4th value = 25
Median of Class B (7 values) = 4th value = 22
💡 Always write a key! e.g. "Key: for Class A, 5|2 means 25"
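
One possible way to build the diagram above automatically is sketched below. It assumes two-digit whole numbers (tens digit = stem, units digit = leaf); `back_to_back` is a hypothetical helper name, not something defined by the lesson.

```python
from collections import defaultdict

def back_to_back(left, right):
    """Return the rows of a back-to-back stem-and-leaf diagram as strings.

    Assumes two-digit whole numbers: tens digit = stem, units digit = leaf.
    """
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for value in sorted(left):
        left_leaves[value // 10].append(value % 10)
    for value in sorted(right):
        right_leaves[value // 10].append(value % 10)

    rows = []
    for stem in sorted(set(left_leaves) | set(right_leaves)):
        # Left-hand leaves are reversed so the smallest leaf sits next to the stem.
        left_text = " ".join(str(leaf) for leaf in reversed(left_leaves[stem]))
        right_text = " ".join(str(leaf) for leaf in right_leaves[stem])
        rows.append(f"{left_text:>8} | {stem} | {right_text}")
    return rows

class_a = [12, 15, 23, 25, 28, 31, 34]
class_b = [11, 18, 21, 22, 29, 30, 36]
for row in back_to_back(class_a, class_b):
    print(row)
```

Each printed row has the Class A leaves on the left, the shared stem in the middle, and the Class B leaves on the right.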

Part 5: Outliers and Skew

Outlier: a value that is much higher or much lower than the rest of the data.

Dataset without outlier: 8, 9, 10, 11, 12  →  Mean = 10, Median = 10
Dataset with outlier: 8, 9, 10, 11, 12, 50  →  Mean ≈ 16.7, Median = 10.5
💡 The mean is pulled towards the outlier. The median barely changes. When data has outliers, the median is a more representative average.
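
The effect of the outlier can be reproduced with a few lines of Python. This is a small sketch using the two datasets above; the variable names are illustrative.

```python
from statistics import mean, median

without_outlier = [8, 9, 10, 11, 12]
with_outlier = without_outlier + [50]

for label, data in [("without", without_outlier), ("with", with_outlier)]:
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data):g}")
# without: mean = 10.0, median = 10
# with: mean = 16.7, median = 10.5

# How far did each average move when 50 was added?
mean_shift = mean(with_outlier) - mean(without_outlier)
median_shift = median(with_outlier) - median(without_outlier)
print(round(mean_shift, 1), median_shift)  # 6.7 0.5
```

The mean moves by about 6.7 while the median moves by only 0.5, which is why the median is described as resistant.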

Skew describes the shape of a distribution:

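Positive Skew
[Diagram: distribution curve with the mode and mean marked]

Long tail to the right.
Mean > Median > Mode.

Symmetric
[Diagram: symmetric curve with mean = median = mode at the centre]

Balanced about centre.
Mean = Median = Mode.

Negative Skew
[Diagram: distribution curve with the mode and mean marked]

Long tail to the left.
Mean < Median < Mode.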
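
The relationships above suggest a rough check: compare the mean with the median. The sketch below is only a rule of thumb, not a formal test of skewness, and `skew_hint` is a hypothetical helper name. The third dataset is made up for illustration.

```python
from statistics import mean, median

def skew_hint(values):
    """Guess the skew direction by comparing the mean with the median."""
    gap = mean(values) - median(values)
    if gap > 0:
        return "positive skew (mean > median)"
    if gap < 0:
        return "negative skew (mean < median)"
    return "roughly symmetric (mean = median)"

print(skew_hint([8, 9, 10, 11, 12, 50]))  # positive skew (mean > median)
print(skew_hint([8, 9, 10, 11, 12]))      # roughly symmetric (mean = median)
print(skew_hint([1, 10, 11, 12, 12]))     # negative skew (mean < median)
```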

💡 Worked Examples

Example 1: Calculating All Four Measures

Find the mean, median, mode and range for: 4, 7, 3, 9, 7, 12, 7, 5

Step 1: Order the data: 3, 4, 5, 7, 7, 7, 9, 12
Step 2: Mean = (3+4+5+7+7+7+9+12) ÷ 8 = 54 ÷ 8 = 6.75
Step 3: Median = average of 4th and 5th values = (7+7) ÷ 2 = 7
Step 4: Mode = most frequent value = 7 (appears 3 times)
Step 5: Range = 12 − 3 = 9
💡 Always order the data first! It makes finding the median and range much easier.

Example 2: Writing a Full Comparison Statement

Class A test scores: 8, 12, 14, 15, 16    Class B test scores: 5, 6, 8, 9, 12

Class A: Mean = 65 ÷ 5 = 13, Median = 14, Range = 16 − 8 = 8
Class B: Mean = 40 ÷ 5 = 8, Median = 8, Range = 12 − 5 = 7
Comparison (average): Class A has a higher mean (x̄ = 13) than Class B (x̄ = 8), suggesting Class A performed better overall, scoring 5 marks higher on average.
Comparison (spread): Class A has a slightly larger range (8) than Class B (7), meaning Class B's results were marginally more consistent.
✅ Always compare both an average and the spread. State the actual values in your comparison.

Example 3: Reading a Back-to-Back Stem-and-Leaf

The diagram shows reaction times (ms) for Group X and Group Y.

Group X | Stem | Group Y
9 4 2 | 2 | 1 5 8
8 5 3 | 3 | 0 4 7
6 2 | 4 | 1 9

Key: For Group X, 5|3 means 35 ms. For Group Y, 3|0 means 30 ms.

Group X values: 22, 24, 29, 33, 35, 38, 42, 46 → Median = average of 4th and 5th values = (33+35) ÷ 2 = 34 ms
Group Y values: 21, 25, 28, 30, 34, 37, 41, 49 → Median = average of 4th and 5th values = (30+34) ÷ 2 = 32 ms
Comparison: Group X has a higher median reaction time (34 ms) than Group Y (32 ms), suggesting Group Y reacted faster on average.
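
Reading a diagram back into a list of values follows the same rule each time: value = stem × 10 + leaf. The sketch below assumes each row is entered as (left leaves, stem, right leaves), with the leaves typed exactly as they appear in the diagram; `read_values` is a hypothetical helper name.

```python
from statistics import median

# Each row: (Group X leaves as printed, stem, Group Y leaves as printed)
rows = [
    ("9 4 2", 2, "1 5 8"),
    ("8 5 3", 3, "0 4 7"),
    ("6 2", 4, "1 9"),
]

def read_values(rows):
    """Turn diagram rows back into two ordered lists of values."""
    group_x, group_y = [], []
    for left, stem, right in rows:
        group_x += [stem * 10 + int(leaf) for leaf in left.split()]
        group_y += [stem * 10 + int(leaf) for leaf in right.split()]
    return sorted(group_x), sorted(group_y)

x_values, y_values = read_values(rows)
print(x_values)  # [22, 24, 29, 33, 35, 38, 42, 46]
print(y_values)  # [21, 25, 28, 30, 34, 37, 41, 49]
print(median(x_values), median(y_values))
```

Sorting each list first means the order in which the leaves were typed does not matter when finding the median.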

Example 4: Outlier Effect on the Mean

A teacher records these homework scores: 6, 7, 7, 8, 8, 9, 42

Mean with outlier: (6+7+7+8+8+9+42) ÷ 7 = 87 ÷ 7 ≈ 12.4
Median with outlier: ordered data → 4th value = 8
If we remove the outlier (42): Mean = 45 ÷ 6 = 7.5, Median = (7+8) ÷ 2 = 7.5
Comment: The value 42 is an outlier. It pulls the mean from 7.5 to 12.4, making it an unrepresentative average. The median (8) is a better measure here as it is resistant to the outlier.
💡 The mean is sensitive to outliers. The median is resistant (robust) to them.

📊 Interactive Visualizer

Live Dataset Comparator

Enter comma-separated numbers for each class, then press Compare!

Class A
Class B

Outlier Impact Demo

Base dataset: 5, 6, 7, 7, 8, 9, 8, 6. Watch what happens when you add an outlier!

Mean

7

Median

7

Back-to-Back Stem-and-Leaf Builder

Enter two datasets of two-digit numbers (10–99). The diagram will be built automatically!

Group A
Group B

Write the Comparison Scaffold

Stats are shown below. Build your comparison sentence by selecting the correct options!

Class A: Mean = 14, Median = 13, Range = 12   |   Class B: Mean = 9, Median = 10, Range = 7

Sentence 1 (compare an average):

Class A has a [____] than Class B [____], suggesting Class A [____].

Sentence 2 (compare the spread):

Class A has a [____] range than Class B [____], meaning Class A's results were [____].

✏️ Exercise 1: Mean, Median, Mode & Range

Calculate all four measures. Round to 1 decimal place where needed.

✏️ Exercise 2: Compare Two Datasets

Calculate the mean and range for each group, then answer the comparison questions.

✏️ Exercise 3: Stem-and-Leaf Diagrams

Read the diagrams carefully and answer the questions.

📈 Exercise 4: Interpreting Frequency Charts

Read the comparative frequency chart and answer the questions.

✏️ Exercise 5: Write Full Comparison Statements

Choose the correct words/values to build comparison statements. Every answer must include an average comparison AND a spread comparison.

📝 Practice Questions

  1. Find the mean of: 3, 7, 9, 5, 11
  2. Find the median of: 12, 4, 8, 15, 6, 10, 3
  3. Find the mode of: 4, 7, 2, 7, 9, 3, 7, 1
  4. Find the range of: 14, 3, 8, 19, 6, 11
  5. A dataset is: 5, 8, 8, 9, 12, 15. Find the median.
  6. Class A scores: 10, 12, 14, 16, 18. Class B scores: 5, 8, 13, 18, 21. Calculate the mean and range for each class.
  7. Which class in Q6 had a higher mean? Which had a larger range? What does the range tell us about consistency?
  8. Write a comparison statement for Q6 using both mean and range.
  9. A back-to-back stem-and-leaf diagram shows: Team A leaves for stem 4: 2, 5, 7. Team B leaves for stem 4: 1, 3, 8. What are these values?
  10. A dataset: 3, 5, 6, 6, 7, 8, 50. Identify the outlier and state its effect on the mean versus the median.
  11. Dataset: 4, 6, 7, 7, 8, 9. Find the mean and median. Now add an outlier of 45. Recalculate both. How much did each change?
  12. Describe a positively skewed distribution in terms of the tail and the relationship between mean, median and mode.
  13. Two groups ran 100 m. Group X mean = 14.2 s, range = 3.1 s. Group Y mean = 13.8 s, range = 5.4 s. Which group was faster on average? Which was more consistent?
  14. The heights (cm) of 7 plants: 12, 15, 18, 15, 22, 15, 19. Find all four measures.
  15. Why is the median sometimes a better average than the mean?
  16. Two shops record daily sales. Shop A: mean = £320, range = £80. Shop B: mean = £310, range = £200. Write a comparison statement.
  17. A dataset has mean = 10 and median = 10. A value of 40 is added. Will the mean or median change more?
  18. In a stem-and-leaf diagram, the stem is 3 and a leaf is 6. What value does this represent?
  19. Describe what a symmetric distribution looks like and state the relationship between its mean, median and mode.
  20. The range of Dataset A is 4 and the range of Dataset B is 14. What does this tell you about the two distributions?

Answers:

  1. (3+7+9+5+11) ÷ 5 = 35 ÷ 5 = 7
  2. Ordered: 3, 4, 6, 8, 10, 12, 15 → 4th value = 8
  3. 7 appears 3 times → Mode = 7
  4. 19 − 3 = 16
  5. Ordered: 5, 8, 8, 9, 12, 15 → median = (8+9) ÷ 2 = 8.5
  6. Class A: mean = 70 ÷ 5 = 14, range = 18 − 10 = 8. Class B: mean = 65 ÷ 5 = 13, range = 21 − 5 = 16
  7. Class A had a higher mean (14 vs 13). Class B had a larger range (16 vs 8), meaning Class B's scores were less consistent
  8. "Class A has a higher mean (14) than Class B (13), suggesting Class A performed slightly better. Class A also has a smaller range (8) than Class B (16), showing Class A's results were more consistent."
  9. Team A: 42, 45, 47. Team B: 41, 43, 48
  10. Outlier = 50. Without it: mean = 35 ÷ 6 ≈ 5.8, median = 6. With it: mean = 85 ÷ 7 ≈ 12.1, median = 6. The mean is pulled up dramatically; the median does not change.
  11. Original mean = 41 ÷ 6 ≈ 6.8, median = (7+7) ÷ 2 = 7. With 45: mean = 86 ÷ 7 ≈ 12.3 (changed by about 5.5), median = 7 (changed by 0)
  12. Positively skewed: long tail to the right; Mean > Median > Mode
  13. Group Y was faster on average (13.8 s < 14.2 s). Group X was more consistent (range 3.1 s < 5.4 s)
  14. Ordered: 12, 15, 15, 15, 18, 19, 22. Mean = 116 ÷ 7 ≈ 16.6, Median = 15, Mode = 15, Range = 22 − 12 = 10
  15. The median is resistant to outliers, so it gives a more representative average when extreme values are present
  16. "Shop A has a higher mean (£320) than Shop B (£310). Shop A also has a much smaller range (£80 vs £200), meaning Shop A's sales were far more consistent/predictable."
  17. The mean will change much more. The median will shift only slightly (one position).
  18. 36
  19. Bell-shaped, balanced about the centre. Mean = Median = Mode
  20. Dataset A is much more consistent/clustered (spread of only 4); Dataset B is far more spread out/variable (spread of 14)

🏆 Challenge: Multi-Step & Extended Problems

  1. A teacher says: "The mean mark for my class was 62%." A student argues: "But the median was only 54%. Why is there such a big difference?" Explain, and state which measure is more representative. What does this suggest about the shape of the distribution?
  2. Two basketball players' points per game over 8 games:
    Player A: 12, 15, 18, 12, 20, 14, 11, 20
    Player B: 8, 25, 6, 30, 7, 28, 5, 27
    (a) Calculate the mean and range for each player. (b) Write a full comparison statement. (c) Which player would you prefer if your team needed consistent scoring? Justify using statistics.
  3. The back-to-back stem-and-leaf diagram shows hours of TV watched per week by Year 7 and Year 8 students:
    Year 7 | Stem | Year 8
    9 5 2 | 1 | 3 6 8
    8 6 4 1 | 2 | 0 2 5 9
    5 3 | 3 | 1 4
    (a) How many students are in each year group? (b) Find the median for each year group. (c) Write a comparison statement.
  4. A dataset of 6 values has a mean of 12 and a range of 10. The smallest value is 7 and the largest is 17. One value is 8, another is 14. The mode is 15. Find all six values.
  5. The ages of employees at two companies:
    Company X: 22, 24, 28, 31, 34, 38, 42, 45
    Company Y: 19, 20, 21, 22, 23, 45, 52, 60
    (a) Calculate the mean and median age for both companies. (b) For Company Y, which is a better average: mean or median? Justify by identifying the outliers. (c) Compare the ranges.
  6. A frequency table shows maths test scores for Class 1 and Class 2 (scores out of 10):
    Score: 4 5 6 7 8 9 10
    Class 1 freq: 1 2 3 8 6 3 2
    Class 2 freq: 4 3 5 5 3 3 2
    (a) How many students in each class? (b) Find the modal score for each class. (c) Estimate the mean for each class. (d) Compare both classes fully.
  7. A distribution is negatively skewed. Explain what this means in terms of: (a) the shape of the graph, (b) the relationship between mean, median and mode, (c) a real-life example where this might occur.
  8. A sports coach claims: "Team A's mean score of 18 is better than Team B's mean score of 16." A statistician replies: "But Team A's result is misleading due to outliers." Team A data: 5, 6, 7, 8, 8, 9, 67. Team B data: 12, 14, 15, 17, 17, 18, 19. Verify both claims: calculate the mean for each team, identify any outliers, and determine which team's mean is more representative.

Answers:

  1. The mean is being pulled up by a few very high marks (outliers). The median, at 54%, is more representative as it is resistant to those extreme values. Since mean > median, the distribution is positively skewed: most students scored below the mean, but a few high-scorers pulled it up.
  2. (a) Player A: mean = 122 ÷ 8 = 15.25, range = 20 − 11 = 9. Player B: mean = 136 ÷ 8 = 17, range = 30 − 5 = 25. (b) "Player B has a higher mean (17 pts) than Player A (15.25 pts), but Player A has a much smaller range (9 vs 25), showing Player A is far more consistent." (c) For consistent scoring, choose Player A: a lower range means more predictable performance.
  3. (a) Year 7: 3+4+2 = 9 students. Year 8: 3+4+2 = 9 students. (b) Year 7 ordered: 12, 15, 19, 21, 24, 26, 28, 33, 35 → median = 5th value = 24 hrs. Year 8 ordered: 13, 16, 18, 20, 22, 25, 29, 31, 34 → median = 5th value = 22 hrs. (c) "Year 7 has a higher median (24 hrs) than Year 8 (22 hrs), suggesting Year 7 students watched more TV on average. Year 7 also has a slightly larger range (35 − 12 = 23 hrs) than Year 8 (34 − 13 = 21 hrs), so Year 8's viewing was marginally more consistent."
  4. Sum needed = 12 × 6 = 72. A mode of 15 needs at least two 15s, but 7, 8, 14, 15, 15, 17 has a sum of 76 ≠ 72, so the clues cannot all hold with a value of 14. Replacing 14 with 10 gives 7, 8, 10, 15, 15, 17: sum = 72 ✓, range = 17 − 7 = 10 ✓, mode = 15 ✓. Values: 7, 8, 10, 15, 15, 17
  5. (a) Co. X: mean = 264 ÷ 8 = 33; ordered: 22, 24, 28, 31, 34, 38, 42, 45 → median = (31+34) ÷ 2 = 32.5. Co. Y: mean = 262 ÷ 8 = 32.75; ordered: 19, 20, 21, 22, 23, 45, 52, 60 → median = (22+23) ÷ 2 = 22.5. (b) Company Y has three high outliers (45, 52, 60) pulling the mean to 32.75, far above most workers' ages. The median (22.5) is more representative. (c) Co. X range = 45 − 22 = 23; Co. Y range = 60 − 19 = 41. Company Y has a much wider age spread.
  6. (a) Class 1: 1+2+3+8+6+3+2 = 25 students. Class 2: 4+3+5+5+3+3+2 = 25 students. (b) Class 1 modal score = 7 (frequency 8). Class 2 has two modal scores, 6 and 7 (frequency 5 each). (c) Class 1 mean = (4+10+18+56+48+27+20) ÷ 25 = 183 ÷ 25 = 7.32. Class 2 mean = (16+15+30+35+24+27+20) ÷ 25 = 167 ÷ 25 = 6.68. (d) Class 1 has a higher mean (7.32 vs 6.68), suggesting Class 1 performed better on average. Both classes have the same range (10 − 4 = 6), so by this measure their spread is the same. Class 1's scores peak at a single mode of 7, while Class 2's are split between modes of 6 and 7.
  7. (a) The graph has a long tail to the left; the bulk of data clusters to the right. (b) Mean < Median < Mode, because low values on the left pull the mean down. (c) Example: exam scores where most students do very well but a few score very poorly (e.g. a hard re-sit paper).
  8. Team A mean = (5+6+7+8+8+9+67) ÷ 7 = 110 ÷ 7 ≈ 15.7. Team B mean = (12+14+15+17+17+18+19) ÷ 7 = 112 ÷ 7 = 16. The coach's figures do not hold up: Team A's mean is about 15.7, not 18, and it is slightly lower than Team B's 16. Team A has an outlier of 67 (far above the rest: 5–9). Without it, Team A mean = 43 ÷ 6 ≈ 7.2. Team B's values all cluster between 12 and 19 with no outliers. The statistician is correct: Team B's mean is more representative.
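
The frequency-table mean in challenge question 6 can be checked with a short script: multiply each score by its frequency, add the products, then divide by the total frequency. This is a minimal sketch; `frequency_mean` is a hypothetical helper name.

```python
scores = [4, 5, 6, 7, 8, 9, 10]
class_1_freq = [1, 2, 3, 8, 6, 3, 2]
class_2_freq = [4, 3, 5, 5, 3, 3, 2]

def frequency_mean(scores, freqs):
    """Return (number of students, mean score) for a frequency table."""
    total_students = sum(freqs)
    total_marks = sum(score * freq for score, freq in zip(scores, freqs))
    return total_students, total_marks / total_students

print(frequency_mean(scores, class_1_freq))  # (25, 7.32)
print(frequency_mean(scores, class_2_freq))  # (25, 6.68)
```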