Grade 10 · Statistics · Cambridge IGCSE · Age 15–16
Scatter graphs and lines of best fit let us spot patterns in data and make predictions. Correlation tells us how strongly two variables are linked, but it does not show that one causes the other. Comparing distributions goes beyond single numbers to describe both average and spread in context.
Plotting, correlation type and strength
Pearson's coefficient from −1 to +1
Through mean point, equation, prediction
When predictions are reliable vs unreliable
Mean/median, IQR/range, in context
Effect on line, PMCC, and comparisons
A scatter graph plots paired data as (x, y) points. Each point represents one observation of two variables. The pattern of points shows whether, and how strongly, the two variables are related.
A line of best fit (regression line) summarises the trend. It must pass through the mean point (x̄, ȳ).
When comparing two datasets, you must comment on both average and spread, and put your comparison in context.
A line of best fit MUST pass through (x̄, ȳ). If it doesn't, it is not the line of best fit. Always calculate and plot the mean point first, then draw the line through it with roughly equal numbers of points on each side.
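As a rough illustration of why the mean point matters, the sketch below computes (x̄, ȳ) and then a least-squares line for a small made-up dataset. The data and the function names `mean_point` and `least_squares_line` are assumptions chosen for this example, and a line drawn by eye in an exam will not match the least-squares line exactly.

```python
def mean_point(points):
    """Return (x_bar, y_bar) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    return x_bar, y_bar


def least_squares_line(points):
    """Return (m, c) for the least-squares line y = m*x + c."""
    x_bar, y_bar = mean_point(points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    m = s_xy / s_xx
    c = y_bar - m * x_bar  # choosing c this way puts the mean point on the line
    return m, c


# Illustrative data only (not taken from the exercises).
data = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
x_bar, y_bar = mean_point(data)
m, c = least_squares_line(data)

print("mean point:", (x_bar, y_bar))
print("gradient:", round(m, 2), "intercept:", round(c, 2))
# Substituting x_bar into the line should give back y_bar.
print("line value at x_bar:", round(m * x_bar + c, 2))
```

Because the intercept is calculated as c = ȳ − m·x̄, substituting x̄ into the equation always returns ȳ, which is the same check you can do by hand on a drawn line.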
Predicting outside the data range (extrapolation) is unreliable. The trend shown in the data may not continue beyond the measured range. Always check whether the value you are predicting for lies within or outside the data range, and comment accordingly.
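The range check described above can be sketched in a few lines. The line y = 1.5x + 2, the x-values, and the helper names `predict` and `prediction_type` are made up for this illustration.

```python
def predict(m, c, x):
    """Predicted y on the line y = m*x + c."""
    return m * x + c


def prediction_type(x, x_values):
    """Say whether predicting at x stays inside the measured x-range."""
    if min(x_values) <= x <= max(x_values):
        return "interpolation"
    return "extrapolation"


# Illustrative x-values and line only (not taken from the exercises).
x_data = [4, 7, 10, 13, 16]
for x in (10, 22):
    kind = prediction_type(x, x_data)
    note = "reliable" if kind == "interpolation" else "treat with caution"
    print(f"x = {x}: predicted y = {predict(1.5, 2, x)} ({kind}, {note})")
```

The arithmetic for the prediction is the same in both cases; only the comment about reliability changes.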
A strong correlation between X and Y does NOT mean X causes Y. There could be a third confounding variable affecting both. In exam answers, never write "X causes Y" from a scatter graph alone — write "as X increases, Y tends to increase" (correlation language).
When comparing two distributions, you must (a) compare averages AND spreads, and (b) relate both to the context of the data. Saying "Group A has a higher mean" is incomplete. Add "so Group A performed better on average in the test" — the context matters for marks.
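A small sketch of such a comparison is shown here. The two groups of scores are invented, and the quartiles are found as the medians of the lower and upper halves (leaving out the middle value when the count is odd), which is one common school method; other quartile conventions can give a different IQR.

```python
from statistics import mean, median


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    Assumed method: the middle value is left out when the count is odd.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])


def summary(values):
    """Average and spread measures for one dataset."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


# Invented test scores for two groups (illustration only).
group_a = [54, 56, 58, 60, 62, 64, 66]
group_b = [40, 50, 60, 70, 80, 90, 100]
a, b = summary(group_a), summary(group_b)

higher = "B" if b["mean"] > a["mean"] else "A"
steadier = "A" if a["iqr"] < b["iqr"] else "B"
print(f"Group {higher} scored higher on average in the test.")
print(f"Group {steadier} had more consistent scores (smaller IQR).")
```

Each printed sentence pairs a statistic with what it means for the test scores, which is the "in context" part of a full answer.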
r = −0.85 describes strong NEGATIVE correlation, not strong positive. The sign tells you the direction. When describing correlation, always state both direction AND strength. Never say "r = −0.85 means strong correlation" — say "strong negative correlation".
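To make "direction AND strength" concrete, the sketch below computes r for a small invented dataset and turns it into a description. The cut-offs (0.8, 0.5, 0.2) are assumptions made for this example only; textbooks and mark schemes draw the boundaries in different places, so check the convention your course uses.

```python
from math import sqrt


def pmcc(points):
    """Pearson's product-moment correlation coefficient for (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    return s_xy / sqrt(s_xx * s_yy)


def describe(r, strong=0.8, moderate=0.5, weak=0.2):
    """Describe r by strength and direction.

    The default cut-offs are ASSUMED for illustration, not an official scale.
    """
    size = abs(r)
    if size < weak:
        return "no correlation"
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# Invented data: y falls as x rises.
data = [(1, 9), (2, 8), (3, 5), (4, 4), (5, 1)]
r = pmcc(data)
print(round(r, 2), "->", describe(r))
```

The sign of r sets the direction word and the size of r sets the strength word, so the description always contains both.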
Enter up to 10 (x,y) data points. The visualiser plots them, computes the mean point (pink cross), draws the regression line, gives an r description, and predicts y from x.
| # | x | y |
|---|---|---|
| 1 |  |  |
| 2 |  |  |
| 3 |  |  |
| 4 |  |  |
| 5 |  |  |
| 6 |  |  |
| 7 |  |  |
| 8 |  |  |
| 9 |  |  |
| 10 |  |  |
For each PMCC value r, enter: 1=strong positive, 2=moderate positive, 3=weak positive, 4=no correlation, 5=weak negative, 6=moderate negative, 7=strong negative.
1. r = 0.92. Enter code (1–7).
2. r = −0.85. Enter code.
3. r = 0.05. Enter code.
4. r = 0.62. Enter code.
5. r = −0.55. Enter code.
6. r = 0.35. Enter code.
7. r = −0.97. Enter code.
8. r = −0.30. Enter code.
Calculate the mean point (x̄, ȳ) for each dataset. Enter x̄ for the first question, ȳ for the second, and so on alternately.
1. Points: (1,3), (3,5), (5,7), (7,9). Find x̄.
2. Same points. Find ȳ.
3. Points: (2,10), (4,14), (6,18), (8,22), (10,26). Find x̄.
4. Same points. Find ȳ.
5. Points: (5,2), (10,4), (15,8), (20,10), (25,6). Find x̄.
6. Same points. Find ȳ.
7. Points: (0,12), (4,8), (8,4), (12,0). Find x̄.
8. Same points. Find ȳ.
Find gradient, then predict y using the given equation.
1. Line through (2,6) and (10,22). Find the gradient.
2. Same line. Find the y-intercept c (y=mx+c).
3. y = 2x + 2. Predict y when x = 7.
4. Line through (0,5) and (8,21). Find gradient m.
5. y = 2x + 5. Predict y when x = 11. Is this interpolation (enter 1) or extrapolation (enter 2), if data range is x=1 to x=9?
6. Line through (4,10) and (12,34). Find gradient.
7. y = 3x + 1. Predict y when x = 5.
8. Line through (1,4) and (9,20). Find y-intercept c.
Use the statistics given to answer each question numerically.
1. Group A: mean=55, median=52. Group B: mean=68, median=67. How much greater is Group B's mean? Enter the difference.
2. Group A: IQR=12. Group B: IQR=20. Which group is more spread? Enter 1 for A, 2 for B.
3. Dataset A: 3,5,7,9,11. Dataset B: 1,4,7,10,13. Both have median=7. Which has larger range? Enter 1=A, 2=B, 3=same.
4. Dataset: 10,12,14,16,100. Which is more representative, mean (enter 1) or median (enter 2)?
5. Group A: mean=70, sd=5. Group B: mean=70, sd=15. Group B is more spread (enter 1) or less spread (enter 2)?
6. Data: 4,6,8,10,12. Mean = ? Enter the mean.
7. Data: 4,6,8,10,12. IQR = Q3−Q1 = ? Enter IQR.
8. Removing outlier 100 from dataset {10,12,14,16,100}: does the mean increase (1) or decrease (2)?
Use understanding of outlier effects and PMCC.
1. r = 0.94 for a dataset that includes an outlier lying away from the trend. Does removing that outlier make r increase (1) or decrease (2)?
2. After the outlier in question 1 is removed, r = 0.98. Did the correlation become stronger (1) or weaker (2)?
3. Data range x: 5 to 25. Predicting at x=30 is interpolation (1) or extrapolation (2)?
4. Data range x: 5 to 25. Predicting at x=15 is interpolation (1) or extrapolation (2)?
5. r = 0.75 describes strong (1), moderate (2), or weak (3) correlation?
6. r = −0.92 — direction is positive (1) or negative (2)?
7. Outlier above the mean is removed. The mean decreases (1) or increases (2)?
8. Removing an outlier from a dataset generally: decreases the range (1) or increases the range (2)?
Mixed data analysis questions. Enter numerical answers where asked; for codes use the key given in Exercise 1.
1. r = 0.88. Correlation code (1–7)?
2. r = −0.45. Correlation code?
3. Points: (1,4),(2,6),(3,8),(4,10). Find x̄.
4. Same points. Find ȳ.
5. Line through (3,9) and (9,21). Gradient m = ?
6. y = 2x + 3. Predict y at x=8.
7. y = 2x + 3. Predict y at x=20. Data range is x=2 to x=10. Interpolation (1) or extrapolation (2)?
8. Group A mean=60, Group B mean=75. Difference (B−A)?
9. Group A IQR=8, Group B IQR=20. Which is more consistent (smaller spread)? Enter 1=A, 2=B.
10. Data: 2,4,6,8,10. Mean = ?
11. Same data. Median = ?
12. r = 0.12. Correlation code?
13. Line through (0,3) and (5,18). Gradient m = ?
14. y = 3x + 3. Predict y at x = 6.
15. Data: 5,7,9,11,13. x̄ = ?
16. Outlier above mean is removed. Mean decreases (1) or increases (2)?
17. After removing outlier, r increases from 0.70 to 0.91. Correlation got stronger (1) or weaker (2)?
18. r = −0.78. Correlation code?
19. Points: (2,14),(6,18),(10,22),(14,26). ȳ = ?
20. y = 4x − 2. Predict y at x = 5.
21. Data range x: 10 to 50. Predicting at x=35 is interpolation (1) or extrapolation (2)?
22. For skewed data with outliers, which average is more appropriate? Mean=1, Median=2.
23. Line through (2,5) and (8,17). y-intercept c = ?
24. r = 0.55. Correlation code?
25. Dataset: 3,5,7,9,200. Is mean (1) or median (2) more resistant to the outlier?
Harder multi-step data analysis problems.
1. Points: (1,2),(2,4),(3,6),(4,8),(5,10). Line of best fit y=mx+c. Find m.
2. Same data. Find c (y-intercept). Line passes through mean point.
3. An outlier (5,30) is added to the data in Q1. Does the gradient of the line increase (1) or decrease (2)?
4. Points: (0,20),(5,15),(10,10),(15,5),(20,0). Find gradient m of line through mean point.
5. Same data. Find mean x̄.
6. Same data. Using y = mx + c with m from Q4, find c.
7. y = −x + 20. Predict y at x=12. Enter value.
8. r = −0.95 describes what type? Enter correlation code.
9. Data: 10,20,30,40,50. Mean=30. If 50 is replaced by 100, new mean = ?
10. Data A: mean=50, sd=4. Data B: mean=50, sd=12. Which has values closer to the mean? Enter 1=A, 2=B.
11. 8 data points: mean=15, total sum Σx = ?
12. Line: y = 1.5x + 4. At what x does y = 25? Enter x value.
Cambridge IGCSE style. Show working on paper where needed. Enter final answers.
Q1. The table shows data for 6 students: hours studied (x) and test score (y).
| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| y | 30 | 40 | 45 | 55 | 65 | 75 |
(a) Calculate the mean point (x̄, ȳ).
Enter x̄:
Q2. Using the same data: enter ȳ.
Q3. The line of best fit for Q1 data passes through (1, 28) and (6, 78).
Find the gradient of the line of best fit.
Q4. Using the line equation from Q3 (y=10x+18), predict the score for a student who studied 4.5 hours. Is this interpolation or extrapolation? Enter predicted score.
Q5. Class A: mean = 62, IQR = 10. Class B: mean = 58, IQR = 22.
Which class has more consistent (less spread) results? Enter 1 for Class A, 2 for Class B.