Cambridge Lower Secondary Stage 7, Unit 6
Learn how data is collected, classified, and organised!
Survey / Collect ▶ Tally Chart ▶ Frequency Table ▶ Chart / Analyse
Types of Data
Qualitative (Categorical): Described in words, e.g. colour, gender, favourite subject.
Quantitative-Discrete: Counted in whole numbers, e.g. number of goals, shoe size.
Quantitative-Continuous: Measured, can take any value in a range, e.g. height, time, mass.
Primary vs Secondary Data
Primary: You collect it yourself (survey, experiment, observation). Up to date but takes time.
Secondary: Already collected by someone else (internet, books, databases). Quick but may be outdated or biased.
Sampling Methods
Random: Every member has an equal chance; fair but hard to organise.
Systematic: Every nth person from a list; quick and easy.
Stratified: Population split into groups; sample proportionally from each group, so it is representative.
Convenience: Whoever is easiest to reach; fast but often biased.
Bias in Surveys
Leading questions push people to one answer.
Too small a sample is not reliable.
Non-representative samples leave out important groups.
Vague options make data hard to interpret.
Frequency Tables & Tally Charts
A tally chart records data as it is collected using tally marks (| | | |). Every 5th mark is drawn across the previous four to make a group of five, shown on this page as ᵋ.
A frequency table summarises the tally into counts (frequencies).
A grouped frequency table groups continuous data into class intervals, e.g. 10 ≤ h < 20.
Writing Good Survey Questions
Bad: "Don't you agree that school should start later?" — This is a leading question!
Good: "What time do you think school should start? Tick one: Before 8am / 8am–9am / After 9am"
Tips: Use tick-boxes with clear, non-overlapping options. Keep questions neutral. Include enough choices to cover all likely answers.
Tally Chart Example
Colour    Tally      Frequency
Red       | | | |    4
Blue      ᵋ |        6
Green     | | |      3
White     ᵋ ᵋ        10
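The tally-to-frequency step can also be sketched in code. This is a minimal Python illustration rather than part of the lesson itself: the list of observations is rebuilt from the frequencies in the example chart, `tally_marks` is a hypothetical helper name, and ᵋ stands for a completed group of five as in the chart.

```python
from collections import Counter

def tally_marks(count):
    """Return tally marks for a count: one ᵋ per full group of five, then single strokes."""
    groups, singles = divmod(count, 5)
    return " ".join(["ᵋ"] * groups + ["|"] * singles)

# Observations rebuilt from the example chart (order does not matter for counting).
observations = ["Red"] * 4 + ["Blue"] * 6 + ["Green"] * 3 + ["White"] * 10

# Counter does the tallying: it maps each colour to how often it occurs.
frequencies = Counter(observations)

for colour, freq in frequencies.items():
    print(f"{colour:<8}{tally_marks(freq):<12}{freq}")

# The frequencies should add up to the number of observations.
total = sum(frequencies.values())
print("Total:", total)
```

Checking that the total matches the number of observations is a quick way to catch a missed or double-counted tally mark.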
Grouped Frequency Table Example
Height (cm)       Tally      Frequency
130 ≤ h < 140     | | |      3
140 ≤ h < 150     ᵋ | |      7
150 ≤ h < 160     ᵋ ᵋ |      11
160 ≤ h < 170     | | | |    4
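To see how a single measurement is placed into a class interval such as 130 ≤ h < 140, here is a small Python sketch. The function names and the test heights are assumptions chosen for illustration; they are not data from the lesson.

```python
import math

def class_interval(height, start=130, width=10):
    """Return (lower, upper) for the class with lower <= height < upper."""
    index = math.floor((height - start) / width)
    lower = start + index * width
    return lower, lower + width

def label(height):
    """Format the class interval the way the table writes it."""
    lower, upper = class_interval(height)
    return f"{lower} ≤ h < {upper}"

# Example inputs chosen for illustration only.
print(label(134.5))  # 130 ≤ h < 140
print(label(140))    # 140 ≤ h < 150 (a boundary value goes in the higher class)
print(label(149.9))  # 140 ≤ h < 150
```

Because each class includes its lower bound but not its upper bound, every height lands in exactly one class.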
Worked Examples
Study these carefully before trying the exercises.
Example 1: Classifying Data Types
Classify each item as Qualitative, Quantitative-Discrete, or Quantitative-Continuous:
Eye colour → Qualitative (described in words, no numbers)
Number of siblings → Quantitative-Discrete (counted, whole numbers only)
Time taken to run 100 m → Quantitative-Continuous (measured, any decimal value possible)
Favourite sport → Qualitative (categories, not numbers)
Temperature → Quantitative-Continuous (measured, can be 23.4°C etc.)
Key trick: Ask "Do I measure it on a scale, or do I count it?" Measured means continuous, counted means discrete, and if neither applies, it's qualitative.
Example 2: Identifying Sampling Methods
A school has 500 students. The head teacher wants to survey 50 students about the canteen menu.
She puts all names in a hat and picks 50 → Random sampling
She alphabetically lists all 500, then picks every 10th student → Systematic sampling
Year 7 = 100 students (10 selected), Year 8 = 150 (15 selected), Year 9 = 250 (25 selected) → Stratified sampling (proportional)
She surveys the first 50 students who arrive in the morning → Convenience sampling
Of these four methods, stratified sampling is usually the most representative, because each year group is included in proportion to its size.
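The four methods in this example can be compared with a short Python sketch. It is only an illustration under stated assumptions: the student register is made up (ID numbers tagged with a year group, using the group sizes from the example), the list is ordered by year group rather than alphabetically, and "first 50 on the list" stands in for "first 50 to arrive".

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable; any value works

# Hypothetical register: 500 student IDs tagged with a year group (sizes from the example).
students = (
    [("Year 7", i) for i in range(100)]
    + [("Year 8", i) for i in range(150)]
    + [("Year 9", i) for i in range(250)]
)

# Random sampling: every student has the same chance ("names in a hat").
random_sample = random.sample(students, 50)

# Systematic sampling: every 10th student on the list (10th, 20th, ..., 500th).
systematic_sample = students[9::10]

# Stratified sampling: pick randomly within each year group, in proportion to its size.
sample_size = 50
stratified_sample = []
for year in ("Year 7", "Year 8", "Year 9"):
    group = [s for s in students if s[0] == year]
    share = len(group) * sample_size // len(students)  # exact here: 10, 15, 25
    stratified_sample += random.sample(group, share)

# Convenience sampling: simply the first 50 on the list.
convenience_sample = students[:50]

def year_counts(sample):
    """Count how many sampled students come from each year group."""
    counts = {}
    for year, _ in sample:
        counts[year] = counts.get(year, 0) + 1
    return counts

print(year_counts(stratified_sample))   # {'Year 7': 10, 'Year 8': 15, 'Year 9': 25}
print(year_counts(convenience_sample))  # {'Year 7': 50}
```

The convenience sample contains only Year 7 students in this sketch, which shows how an easy-to-reach sample can leave out whole groups.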
Example 3: Spotting Bias & Rewriting
Biased question: "Most people prefer fizzy drinks. What is your favourite drink?"
Problems:
Leads with a statement that influences the answer.
No tick-box options given — open-ended answers are hard to tally.
Improved question: "What is your favourite type of drink? Tick one: Water □ Fruit juice □ Fizzy drink □ Milk □ Other □"
Example 4: Completing a Grouped Frequency Table
The ages of 25 visitors to a library were recorded. Complete the table.
Note: Always check that your frequencies add up to the total number of data values (the number of visitors recorded here). Use class intervals that don't overlap and that cover all values.
Tally Counter Simulator
Click each car card to record it. The tally chart and frequency table update live! Click all 20 cars to see the bar chart.
Exercise 1: Data Type Classifier
Click the correct category for each data item. Get them all right for a surprise!
Exercise 2: Sampling Method Matcher
Read each scenario and click the sampling method being used.
Exercise 3: Bias Spotter
Is each survey question BIASED or FAIR? Click to decide. For biased ones, see the improved version!
Exercise 4: Frequency Table Completer
Fill in the missing values and press Check to see how you did.
Exercise 5: Sample Size Judge
For each scenario, rate the sample as Good, Too Small, or Biased.
Exercise 6: Survey Design Quiz
Pick the best answer for each survey design question.
Practice Questions
Write your answers on paper. Reveal answers when ready.
1. State whether "number of books on a shelf" is qualitative, quantitative-discrete, or quantitative-continuous.
2. Classify "the mass of a watermelon" as qualitative, quantitative-discrete, or quantitative-continuous.
3. Give one example of qualitative data and one example of quantitative-continuous data.
4. What is the difference between primary data and secondary data? Give one advantage of each.
5. A researcher asks 10 out of 1000 employees about work conditions. Is this sample size likely to be reliable? Explain.
6. Describe random sampling. Give one advantage and one disadvantage.
7. A supermarket has 600 customers per day. A manager surveys every 20th customer as they enter. What type of sampling is this?
8. A school has 300 girls and 200 boys. A stratified sample of 50 is taken. How many boys should be in the sample?
9. Why is convenience sampling often considered unreliable?
10. Identify the bias in this question: "Don't you think that too much homework is harmful to students?"
11. Rewrite this question without bias: "Surely you prefer science to art?"
12. A tally chart shows: | | | | for Red, ᵋ for Blue, | | for Green. Write out the frequency table with totals.
13. In a grouped frequency table, the class "20 ≤ t < 30" has a frequency of 8 and "30 ≤ t < 40" has a frequency of 12. What is the total frequency for these two classes?
14. Why must class intervals in a grouped frequency table not overlap?
15. Data is collected on the time (in minutes) students spend on homework. Suggest suitable class intervals for a grouped frequency table, given values range from 0 to 120 minutes.
16. A student surveys only their friends about favourite music. Give two reasons why this is a poor sample.
17. What does "representative sample" mean?
18. Give one example of a leading question and explain why it is problematic.
19. A frequency table has five groups with frequencies: 3, 7, 12, 5, 3. What is the total number of data values?
20. The ages of 6 students are: 12, 13, 12, 14, 13, 12. Complete a tally chart and frequency table for this data.
Answers
1. Quantitative-discrete: it is counted in whole numbers.
2. Quantitative-continuous: mass is measured and can take any value (e.g. 3.47 kg).
3. Qualitative: e.g. eye colour. Quantitative-continuous: e.g. height in cm. (Accept other valid examples.)
4. Primary data is collected by you (advantage: up to date, designed for your purpose). Secondary data already exists (advantage: quick and cheap to obtain).
5. No. A sample of 10 from 1000 (1%) is very small and unlikely to represent the full population reliably.
6. Random sampling gives every member an equal chance. Advantage: unbiased. Disadvantage: difficult to organise with large populations.
7. Systematic sampling.
8. Boys make up 200/500 = 2/5 of the school. 2/5 × 50 = 20 boys.
9. Convenience sampling only includes people who are easiest to reach, which may not represent the wider population, leading to biased results.
10. The question is leading: it suggests the correct answer is "yes" and influences the respondent to agree.
11. Example: "Which subject do you prefer? Tick one: Science / Art / Other"
12. Red: 4, Blue: 5, Green: 2. Total: 11.
13. 8 + 12 = 20.
14. If intervals overlap, a data value could belong to two classes, so it might be counted twice and the frequencies would no longer add up to the total.
15. Suitable intervals: 0 ≤ t < 20, 20 ≤ t < 40, 40 ≤ t < 60, 60 ≤ t < 80, 80 ≤ t < 100, 100 ≤ t ≤ 120. The last class includes 120 so that the largest value is covered. (Accept intervals of equal width covering 0–120.)
16. Friends share similar tastes/backgrounds (biased); the sample is not randomly chosen from the full population.
17. A representative sample reflects the characteristics of the whole population in correct proportions.
18. Example: "Don't you agree exercise is important?" This pushes the respondent to say yes, so results will not reflect true opinion.
19. 3 + 7 + 12 + 5 + 3 = 30.
20. Tally: 12: | | |, 13: | |, 14: |. Frequency table: Age 12: 3, Age 13: 2, Age 14: 1. Total: 6.
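The proportional calculation behind the stratified-sample answer (300 girls, 200 boys, sample of 50) can be sketched in Python. This is a minimal illustration; `stratified_allocation` is a hypothetical helper name, and it assumes the shares come out as whole numbers, as they do here. Otherwise the results would need rounding.

```python
from fractions import Fraction

def stratified_allocation(group_sizes, sample_size):
    """Share a sample between groups in proportion to each group's size."""
    population = sum(group_sizes.values())
    return {
        group: Fraction(size, population) * sample_size
        for group, size in group_sizes.items()
    }

allocation = stratified_allocation({"girls": 300, "boys": 200}, 50)
for group, number in allocation.items():
    print(group, number)  # girls 30, then boys 20

# The group shares should add up to the full sample size.
print(sum(allocation.values()))  # 50
```

Using exact fractions avoids rounding errors when checking that the group shares add back up to the sample size.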
Challenge Questions
Exam-style questions: show full reasoning on paper.
1. A school wants to find out how students travel to school. There are 400 students: 160 in Year 7, 140 in Year 8, and 100 in Year 9. The school decides to take a stratified sample of 40 students.
(a) How many Year 7 students should be in the sample?
(b) How many Year 9 students should be in the sample?
(c) Give one reason why stratified sampling is better than convenience sampling for this survey.
2. A student designs this survey question: "Most teenagers spend too long on social media. How many hours per day do you waste on social media?"
(a) Identify two problems with this question.
(b) Write an improved version of the question, including suitable response options.
3. The heights (cm) of 20 plants are: 12, 18, 25, 31, 14, 22, 29, 17, 35, 11, 27, 33, 20, 16, 24, 38, 9, 21, 28, 15.
(a) Complete a grouped frequency table using class intervals of width 10, starting at 0 ≤ h < 10.
(b) Which class interval has the highest frequency?
(c) Can you tell the exact height of the tallest plant from the frequency table? Explain.
4. A researcher wants to know the average amount of pocket money received by teenagers in a city of 50,000 teenagers. She surveys 15 students from one school.
(a) Is 15 an appropriate sample size? Explain.
(b) What type of sampling is being used?
(c) Suggest a more reliable sampling method and explain why it is better.
5. Classify each of the following and justify your classification:
(a) The number of text messages sent in a day.
(b) The temperature of a room.
(c) A person's nationality.
(d) The time taken to complete a puzzle.
6. A factory produces bolts. Every 50th bolt is tested for size. The factory produces 5,000 bolts per day.
(a) How many bolts are tested each day?
(b) What type of sampling is this?
(c) Give one advantage and one disadvantage of this method for quality control.
7. Design a short questionnaire (3 questions) to find out about students' reading habits. For each question:
- Ensure it is unbiased.
- Provide appropriate tick-box response options.
- State whether the data collected will be qualitative or quantitative.
8. A local council surveys residents about a new park. They post a survey online and get 500 responses. The local area has 20,000 residents.
(a) What percentage of residents responded?
(b) Give two reasons why these responses may not represent the whole community.
(c) Suggest one way the council could improve the representativeness of the sample.
9. The table below shows the number of hours of TV watched per week by 30 students.
Hours (h)      Frequency   Cumulative Frequency
0 ≤ h < 5      4           ?
5 ≤ h < 10     9           ?
10 ≤ h < 15    11          ?
15 ≤ h < 20    6           ?
(a) Complete the cumulative frequency column.
(b) How many students watch fewer than 10 hours of TV per week?
(c) What fraction of students watch 15 or more hours? Simplify your answer.
10. A newspaper reports: "8 out of 10 dentists recommend this toothpaste." Critically evaluate this claim.
(a) What information is missing that would help assess whether the claim is reliable?
(b) Give two ways the survey could have been biased.
(c) How would you design a fair survey to test this claim?
Answers
1. (a) Year 7: 160/400 × 40 = 16 students. (b) Year 9: 100/400 × 40 = 10 students. (c) Stratified sampling ensures each year group is represented in proportion to its size, unlike convenience sampling, which may accidentally exclude a year group entirely.
2. (a) Two problems: (1) The phrase "waste on social media" is a loaded/judgmental term that makes the respondent feel guilty. (2) "Most teenagers spend too long" is a leading statement that biases the respondent. (b) Improved question: "On a typical school day, how many hours do you spend on social media? Tick one: Less than 1 hour / 1 hour to less than 2 hours / 2 hours to less than 3 hours / 3 hours or more"
3. (a) Sorted data: 9, 11, 12, 14, 15, 16, 17, 18, 20, 21, 22, 24, 25, 27, 28, 29, 31, 33, 35, 38.
0 ≤ h < 10: 9, so frequency 1.
10 ≤ h < 20: 11, 12, 14, 15, 16, 17, 18, so frequency 7.
20 ≤ h < 30: 20, 21, 22, 24, 25, 27, 28, 29, so frequency 8.
30 ≤ h < 40: 31, 33, 35, 38, so frequency 4.
Check: 1 + 7 + 8 + 4 = 20.
(b) The class 20 ≤ h < 30 has the highest frequency (8). (c) No. The table only shows that the tallest plant is in the 30 ≤ h < 40 class; its exact height (38 cm) cannot be read from the table alone.
4. (a) No. 15 students from one school is far too small a sample from 50,000 teenagers: it is only 0.03% of the population. (b) Convenience/opportunity sampling: she surveyed only the school easiest to access. (c) Stratified random sampling across several schools and areas of the city would be more reliable, as it would proportionally represent all types of teenagers (different ages, areas, income levels).
5. (a) Quantitative-discrete: text messages are counted in whole numbers. (b) Quantitative-continuous: temperature is measured and can take any decimal value. (c) Qualitative: nationality is a category described in words, not a number. (d) Quantitative-continuous: time is measured and can be any value (e.g. 3 min 47.2 sec).
6. (a) 5,000 ÷ 50 = 100 bolts tested per day. (b) Systematic sampling. (c) Advantage: quick and easy to implement with no need to list all items. Disadvantage: if there is a repeating pattern in the production (e.g. a machine fault every 50 bolts), every defective bolt could be missed.
7. Example questionnaire:
Q1: "How many books did you read last month?" Options: 0 / 1–2 / 3–5 / More than 5. Data type: Quantitative-discrete.
Q2: "What type of book do you enjoy most?" Options: Fiction / Non-fiction / Graphic novel / Poetry / Other. Data type: Qualitative.
Q3: "On average, how many minutes per day do you spend reading?" Options: 0–15 / 16–30 / 31–60 / More than 60. Data type: Quantitative-continuous (grouped).
8. (a) 500/20,000 × 100 = 2.5% responded. (b) Reason 1: Only people with internet access could respond, excluding older residents or those without technology. Reason 2: Only people who feel strongly (positively or negatively) about a park may bother to respond (self-selection bias). (c) Conduct door-to-door interviews using a stratified random sample of addresses across different areas of the community.
9. (a) Cumulative frequencies: 4, 13, 24, 30. (b) Students watching fewer than 10 hours = cumulative frequency at h < 10 = 13 students. (c) Students watching 15 or more hours = 6. Fraction = 6/30 = 1/5.
10. (a) Missing info: total number of dentists surveyed; how dentists were selected; who funded the survey; what "recommend" means; whether any dentists were excluded. (b) Possible biases: (1) The toothpaste company may have chosen dentists already known to favour their product. (2) Dentists who disagreed may not have been included (non-response or cherry-picking). (c) Fair survey design: randomly select dentists from the national register; ask them blind (without knowing the brand) to compare several toothpastes on specific clinical criteria; use an independent organisation to run the study.
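Two of the numerical answers above, the grouped frequency table for the plant heights and the cumulative frequencies for the TV-hours table, can be checked with a short Python sketch. It is offered as one possible way to verify the working, not as the method the questions expect; the variable names are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

# Plant heights (cm) from the grouped-frequency challenge question.
heights = [12, 18, 25, 31, 14, 22, 29, 17, 35, 11,
           27, 33, 20, 16, 24, 38, 9, 21, 28, 15]

# Floor division by the class width gives the lower bound of each class.
width = 10
grouped = Counter((h // width) * width for h in heights)
for lower in sorted(grouped):
    print(f"{lower} ≤ h < {lower + width}: {grouped[lower]}")

assert sum(grouped.values()) == len(heights)  # frequencies must total 20

# TV-hours table: a cumulative frequency is a running total of the frequencies.
tv_frequencies = [4, 9, 11, 6]
cumulative = list(accumulate(tv_frequencies))
print(cumulative)  # [4, 13, 24, 30]

# Fraction watching 15 or more hours: last class over the total, simplified automatically.
print(Fraction(tv_frequencies[-1], cumulative[-1]))  # 1/5
```

The last cumulative frequency should always equal the total number of data values (30 students here), which makes it a useful self-check.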