Grade 10 · Statistics · Cambridge IGCSE 0580 · Age 14–16
Scatter diagrams let us see whether two variables are related — from studying whether revision time links to exam scores, to whether car age affects fuel efficiency. They are a core part of IGCSE Statistics and appear on most Extended papers.
Independent vs dependent, reading coordinates
Strong/weak, positive/negative, none
(x̄, ȳ) always lies on the line of best fit
Roughly equal numbers of points on each side, drawn through the mean point
Gradient from two points on the line
Predicting within data range — reliable
A scatter diagram (also called a scatter graph or scatter plot) displays pairs of numerical data as individual points. Each point represents one individual (person, object, trial) and shows the values of two measurements taken from that individual.
The independent variable is the one you control or that changes naturally. It goes on the x-axis (horizontal). The dependent variable is the one you measure and expect to change as a result. It goes on the y-axis (vertical).
Follow these steps every time:
Once a scatter diagram and its line of best fit are drawn, you can read off estimated values. To estimate y for a given x, go up from that x-value to the line of best fit, then across to the y-axis. To estimate x for a given y, go across from that y-value to the line of best fit, then down to the x-axis.
Correlation describes whether and how strongly two variables are related. We use the pattern of the scatter diagram to decide.
Strong Positive
Weak Positive
No Correlation
Weak Negative
Strong Negative
Perfect Positive
In exams you must describe correlation in context, not just say "positive". Use the variable names.
Just because two variables are correlated does not mean one causes the other. There may be a third "lurking variable" that drives both, or the relationship could be coincidental.
The line of best fit (also called the trend line or regression line) is a straight line that best summarises the trend in a scatter diagram.
Calculate x̄ (mean of all x-values) and ȳ (mean of all y-values). The mean point (x̄, ȳ) always lies on the line of best fit.
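As a minimal sketch of this calculation, the Python below finds the mean point for a small data set. The function name `mean_point` and the data values are made up for illustration only.

```python
def mean_point(xs, ys):
    """Return (x-bar, y-bar) for paired data given as two equal-length lists."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    return sum(xs) / n, sum(ys) / n


# Hypothetical data: hours of revision (x) and test score (y) for 5 students.
hours = [1, 2, 3, 4, 5]
scores = [35, 45, 50, 60, 70]

x_bar, y_bar = mean_point(hours, scores)
print(f"Mean point: ({x_bar}, {y_bar})")  # Mean point: (3.0, 52.0)
```

Plot this point on the diagram first, then draw the line of best fit through it.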
The equation takes the form y = mx + c where m is the gradient and c is the y-intercept.
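One way to check this working is the short Python sketch below. It takes two points read from a line of best fit and returns m and c. The function name `line_through` and the two points are illustrative assumptions, not values from any exam question.

```python
def line_through(p1, p2):
    """Return (m, c) for the straight line y = mx + c through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x-values")
    m = (y2 - y1) / (x2 - x1)  # gradient: change in y divided by change in x
    c = y1 - m * x1            # substitute one point into y = mx + c
    return m, c


# Hypothetical points read from a line of best fit, chosen far apart.
m, c = line_through((2, 11), (12, 41))
print(f"y = {m}x + {c}")  # y = 3.0x + 5.0
```

Substituting the second point is a quick check: 3 × 12 + 5 = 41, as expected.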
Interpolation means predicting a y-value for an x-value that lies within the range of the data. This is generally reliable.
Extrapolation means predicting a y-value for an x-value that lies outside the range of the data. This is unreliable — the pattern may not continue.
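The distinction can be expressed as a simple range check. The Python sketch below is one possible way to write it; the function name `prediction_type` and the x-values are hypothetical.

```python
def prediction_type(x, collected_xs):
    """Classify a prediction at x against the x-values that were collected."""
    if min(collected_xs) <= x <= max(collected_xs):
        return "interpolation"  # inside the data range: generally reliable
    return "extrapolation"      # outside the data range: unreliable


# Hypothetical x-values from a data set covering 5 to 25.
collected_xs = [5, 10, 15, 20, 25]

for x in (14, 40, 2):
    print(x, prediction_type(x, collected_xs))
# 14 interpolation
# 40 extrapolation
# 2 extrapolation
```

Note that a value below the smallest x is extrapolation just as much as a value above the largest x.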
These are the errors that cost marks most often. Study each one carefully.
There is no reason to force the line through the origin. The y-intercept (c in y = mx + c) is determined by the data, not assumed to be zero.
Always check: is the x-value inside the range of the collected data? If not, say "unreliable — extrapolation beyond the data range."
Correlation means two variables move together. Causation means one directly produces the other. These are different claims. Only controlled experiments can show causation.
When calculating the gradient, small errors in reading graph coordinates have a larger effect if the two points you choose are close together. Use two widely spaced points on the line for accuracy.
IGCSE mark schemes expect: the type (positive/negative/none), the strength (strong/weak) AND the context (reference to the specific variables). Giving just "positive" often scores only 1 out of 2.
The mean point is a mathematical requirement, not optional. In IGCSE, the examiner's line is drawn through the mean point — your line must pass through (or very close to) it to earn the mark.
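To see why the mean point matters, the sketch below fits a line by the least-squares method and checks that it passes through (x̄, ȳ). This assumes "line of best fit" is taken to mean the least-squares regression line; at IGCSE you draw the line by eye, so this is a supporting illustration rather than an exam method. The function name and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (m, c, mean_point) for the least-squares line y = mx + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar  # the least-squares intercept: forces the line through the mean point
    return m, c, (x_bar, y_bar)


# Hypothetical data: hours of revision (x) and test score (y).
xs = [1, 2, 3, 4, 5]
ys = [35, 45, 50, 60, 70]

m, c, (x_bar, y_bar) = least_squares_line(xs, ys)
print(f"y = {m}x + {c}")      # y = 8.5x + 26.5
print(m * x_bar + c, y_bar)   # 52.0 52.0
```

The value of the line at x̄ equals ȳ, so the fitted line goes through the mean point.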
| Formula / Concept | Details |
|---|---|
| Mean x̄ | x̄ = (x₁ + x₂ + … + xₙ) / n = Σx / n |
| Mean ȳ | ȳ = (y₁ + y₂ + … + yₙ) / n = Σy / n |
| Mean point | (x̄, ȳ) — the line of best fit ALWAYS passes through this point |
| Gradient of line | m = (y₂ − y₁) / (x₂ − x₁) — use two well-separated points on the line |
| Equation of line | y = mx + c — substitute one point to find c |
| Find y from x | Substitute x into y = mx + c |
| Find x from y | Rearrange: x = (y − c) / m |
| Interpolation | Predicting within data range — generally reliable |
| Extrapolation | Predicting outside data range — unreliable, trend may not continue |
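The "Find y from x" and "Find x from y" rows of the table can be sketched in Python as follows. The function names and the values m = 3, c = 5 are illustrative assumptions.

```python
def predict_y(x, m, c):
    """Substitute x into y = mx + c."""
    return m * x + c


def solve_x(y, m, c):
    """Rearrange y = mx + c to give x = (y - c) / m."""
    if m == 0:
        raise ValueError("a horizontal line (m = 0) cannot be solved for x")
    return (y - c) / m


# Hypothetical line of best fit: y = 3x + 5.
m, c = 3, 5
print(predict_y(4, m, c))  # 17
print(solve_x(26, m, c))   # 7.0
```

Whichever direction you work in, check afterwards whether the x-value lies inside the data range before trusting the answer.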
1. On a scatter diagram, which axis should the independent variable (the one you control) be placed on? Enter 1 for x-axis, 2 for y-axis.
2. A researcher measures engine size (cm³) and CO₂ emissions (g/km). Which variable goes on the x-axis? Enter 1 for engine size, 2 for CO₂ emissions.
3. Eight students' revision hours: 1,2,3,4,5,6,7,8. Their scores: 30,38,46,54,62,70,78,86. Calculate x̄ (mean revision hours).
4. Using the same data as Q3, calculate ȳ (mean score).
5. A line of best fit passes through (1, 32) and (9, 88). What is the gradient (m)?
6. Using m = 7 and the point (4, 54), find the y-intercept c (from y = mx + c).
7. Using y = 7x + 26, predict y when x = 6.
8. Using y = 7x + 26, find x when y = 61.
1. Data: x = 2, 4, 6, 8, 10. Find x̄.
2. Data: y = 14, 22, 30, 38, 46. Find ȳ.
3. Using (x̄, ȳ) from Q1–Q2, what is the mean point? Enter x̄ value.
4. A line of best fit passes through (2, 15) and (10, 47). Find the gradient.
5. Using m = 4 and point (2, 15), find c in y = mx + c.
6. Using y = 4x + 7, find y when x = 9.
7. x values: 3, 7, 9, 11, 15. y values: 8, 20, 26, 32, 44. Find x̄.
8. Using same data as Q7, find ȳ.
Enter: 1 = Strong Positive, 2 = Weak Positive, 3 = No Correlation, 4 = Weak Negative, 5 = Strong Negative
1. Hours studying vs exam marks — as study time increases, marks consistently increase with little spread.
2. Shoe size vs intelligence — points are randomly scattered with no pattern.
3. Age of car vs value — older cars tend to be worth less, but there is quite a spread.
4. Temperature vs heating bills — as temperature rises, heating bills drop significantly in a tight pattern.
5. Amount of rainfall vs amount of sunshine — more rain generally means less sun, but points vary widely.
6. Number of people in a car vs journey time on same route — no real pattern.
7. Height vs weight for adults — taller people tend to be heavier, fairly tight cluster.
8. Speed of a car vs fuel efficiency (mpg) — faster speeds strongly associated with much lower mpg, tight pattern.
1. A line passes through (0, 5) and (10, 45). Find the gradient.
2. Using gradient = 4 and point (0, 5), find c.
3. A line passes through (3, 20) and (9, 38). Find the gradient.
4. Using gradient = 3 and point (3, 20), find c.
5. Using y = 3x + 11, predict y when x = 7.
6. Using y = 3x + 11, find x when y = 29.
7. A line passes through (1, 50) and (11, 10). Find the gradient.
8. Using m = −4 and point (1, 50), find c. (y = −4x + c)
Enter: 1 = Interpolation (reliable), 2 = Extrapolation (unreliable)
1. Data collected for x in range 5–25. Predicting y when x = 14.
2. Data collected for x in range 5–25. Predicting y when x = 40.
3. Data collected for x in range 5–25. Predicting y when x = 2.
4. Using y = 2x + 3, predict y when x = 10 (data range: 1 to 20). Enter y value.
5. Using y = 2x + 3, predict y when x = 50 (data range: 1 to 20). Enter y value.
6. Data covers ages 10–16. Is predicting for age 13 reliable? Enter 1 for Yes, 2 for No.
7. Using y = −3x + 60 and data range x: 5–15, find y when x = 12.
8. The line of best fit has equation y = 5x + 2. Find x when y = 42.
For correlation type: 1=Strong Pos, 2=Weak Pos, 3=No Corr, 4=Weak Neg, 5=Strong Neg | For reliability: 1=Reliable (interpolation), 2=Unreliable (extrapolation)
1. Find x̄ for the data: 4, 8, 10, 14, 19.
2. Find ȳ for: 12, 20, 26, 34, 48.
3. A line passes through (2, 10) and (8, 34). Find gradient m.
4. Using m = 4 and point (2, 10), find c.
5. Using y = 4x + 2, predict y when x = 7.
6. Using y = 4x + 2, find x when y = 26.
7. Correlation type: number of absences vs end-of-year grade (fewer absences = higher grade, tight pattern).
8. Correlation type: eye colour vs salary — no pattern.
9. Data range: x from 3 to 18. Is predicting for x = 10 reliable? (1=Yes, 2=No)
10. Data range: x from 3 to 18. Is predicting for x = 25 reliable? (1=Yes, 2=No)
11. x values: 1, 3, 5, 7, 9. Find x̄.
12. y values: 5, 11, 17, 23, 29. Find ȳ.
13. A line passes through (0, 3) and (5, 23). Find gradient m.
14. Using m = 4 and point (0, 3), find c.
15. Using y = 4x + 3, predict y when x = 6.
16. A line passes through (5, 40) and (15, 20). Find gradient.
17. Using m = −2 and point (5, 40), find c.
18. Using y = −2x + 50, predict y when x = 12.
19. Using y = −2x + 50, find x when y = 30.
20. Correlation type: hours of exercise per week vs resting heart rate (more exercise = lower heart rate, fairly scattered).
21. x: 6, 8, 10, 12, 14. y: 3, 7, 11, 15, 19. Find the gradient of the line of best fit using the two extreme points.
22. Using m = 2 and point (6, 3), find c.
23. Using y = 2x − 9, predict y when x = 11.
24. Data range x: 6–14. Is predicting y for x = 9 interpolation? (1=Yes, 2=No)
25. x: 2, 5, 8, 11, 14. y: 9, 18, 27, 36, 45. Find ȳ.
1. Six data points have x values: 3, 6, 9, 12, 15, 18 and y values: 11, 19, 27, 35, 43, 51. Find the gradient of the line of best fit using two extreme points.
2. Using the data from Q1, find the mean point x̄ (enter x̄ only).
3. Using the data from Q1, find ȳ.
4. Using the exact gradient m = 8/3 (≈ 2.67) and the mean point (10.5, 31), find c to 2 decimal places.
5. Using y = 2.67x + 3.0 (approx), predict y when x = 7.5. Give your answer to 1 d.p.
6. The data in Q1 covers x from 3 to 18. Is predicting for x = 20 using the equation from Q5 reliable? (1=Yes / 2=No)
7. A line of best fit passes through (4, 58) and (16, 22). Find the gradient.
8. Using m = −3 and point (4, 58), find c.
9. Using y = −3x + 70, find x when y = 40.
10. A dataset gives Σx = 84 and n = 7. Find x̄.
11. The mean point of a dataset is (9, 30). The gradient of the line of best fit is 2.5. Find c.
12. Using y = 2.5x + 7.5, predict y when x = 14.
Mark-scheme style. Show all working in your book. Enter final answers below for self-marking.
A scatter diagram shows the outside temperature (°C) on the x-axis and the number of hot drinks sold at a café on the y-axis. The data was collected over 8 days.
The line of best fit for the café data passes through (5, 93) and (25, 21).
The equation of the line of best fit is y = −3.6x + 111. Data was collected for temperatures 5°C to 26°C.
A student says: "The scatter diagram proves that hot weather causes people to buy fewer hot drinks."
A scientist records the depth (cm) of a river (x) and the water speed (cm/s) (y) at 6 locations. Data: x = 10, 20, 30, 40, 50, 60 and y = 4, 10, 16, 22, 28, 34.