Data handling

Data is a collection of numerical figures and information used in research.
Data handling involves the following processes:
data 1 jhvghadc

6.1 Developing research questions

Before we start the research process, we need to make sure that we state the aim of the research clearly, in a way that can be measured. The aim of the research is then written as a research question. This will guide us to formulate the tool to be used to collect the data. Data may be collected from a population. Open-ended and closed questions may be used.
In an open-ended question, the answer is usually the opinion of the respondent and the respondent can answer in their own words. In this way you can gain insightful data and avoid receiving answers that are biased. A disadvantage of this type of question is that respondents might leave it out if it takes too long to answer.
Closed questions could give options for the respondent to choose from, which is convenient because they can simply tick the right box. A disadvantage of this type is that, when an option provided does not accommodate all respondents, this may lead to a situation where some of the questions are not answered.

6.2 Collecting data

The aim of the research influences the choice of the sample and the method of collecting data. The population is a group that the data is collected from, e.g. Gauteng Matric learners. If the population is large, a sample may be used. A sample is a portion chosen to represent the population e.g. learners from 10 schools in Gauteng to represent the matriculants of Gauteng.
The choice of sample can have an effect on the reliability of the data and could even lead to sample bias. Sample bias occurs when the sample is not representative of that population, e.g. if the selected learners are from city schools only, then the sample will be biased because they might not share similar characteristics with learners from townships and farms schools. Random sampling is used to minimise sample bias. However it might still lead to sample bias, if the random sample is composed of learners from township schools only.
So demographic factors, race, gender, age, etc. should be controlled when sampling.
The method of collecting data will help us to identify the most appropriate tool to be used when collecting the data.
There are three methods of collecting data:

  • Observation: This is the method of collecting data by watching and recording the The advantage of this method is that you don’t interact with people to get the response.
  • Questionnaire: This is a list of questions used to collect data from the respondents. Participants do not have to identify themselves. The advantage of using this method is that you get the information directly from the
  • Interview: The interviewer asks the interviewee questions and records the The advantage of this method is that the interviewer may ask further questions if the response is vague.

6.3 Classifying and organising data

  • Organising data is taking information and arranging it into some kind of order (such as ascending or descending order).
  • Classifying data means organising it in groups or classes, based on some common

Data can be organised by using tally marks. These are a way of counting how many of each group there are. They are used when the data is discreet.
e.g.  Worked example 1
An Audi sales person has ordered cars from their plant in Germany. The table below shows the number of cars they received.

Colour

Red

White

Silver

Black

Model

A3

2

5

4

3

A4

3

2

3

6

S3

4

3

5

5

Q7

1

4

4

3

R8

2

3

1

4

Use the information provided above to construct a tally table for A3 cars.
Solution
Use vertical lines (tally marks) to represent the specific colour; the fifth tally mark should be drawn the across the 4 tally marks.

Colour

Frequency

Tally

Red

2

 data 2 kjhguygad

White

5

 data 3jhguygad

Silver

4

 data 4 kiuhiuyad

Black

3

 data 5 uyguyga

e.g.  Worked example 2
This grouped frequency table shows the heights of seedlings (young plants) in different categories.

Height of seedling (mm)

Frequency

10–14

3

15–19

6

20–24

7

25–29

5

30–34

4

  1. How many plants were measured altogether?
  2. How many plants are less than 20 mm high?
  3. How many plants are more than 24 mm high?
  4. What percentage of seedlings are below 25 mm?
  5. How many plants are at least 25 mm high?

Solutions

  1. 3 + 6 + 7 + 5 + 4 = 25 plants were measured
  2. 3 + 6 = 9 plants are less than 20 mm
  3. 5 + 4 = 9 plants are more than 24 mm
  4. 16 ÷ 25 × 100% = 64%
  5. There are nine plants that fall into the intervals of 25 mm or longer.

Activity 1: Working with frequency tables

The Geography examination marks, expressed as a percentage, of 52 learners were recorded as follows:

54

67

83

34

49

56

78

89

90

79

20

49

50

70

89

57

27

48

56

65

70

22

98

89

29

56

47

95

49

67

89

48

46

89

63

75

45

50

58

73

67

45

76

70

38

46

37

47

36

38

99

100

In the exam you are required to show results in terms of seven performance levels rather than percentages. As a result, the subject internal moderator who is analysing the results needs to work out the number of learners per performance level. Complete the frequency table below to work out the number of learners per performance level.          [14]

FREQUENCY TABLE : LEARNER PERFORMANCE IN GEOGRAPHY

PERFORMANCE LEVEL

PERCENTAGE RANGE

TALLY

FREQUENCY

1

0 to 29

   

2

30 to 39

   

3

40 to 49

   

4

50 to 59

   

5

60 to 69

   

6

70 to 79

   

7

80 to 100

   

Solutions

Frequency table : learner performance in geography

Performance Level

Percentage Range

Tally

Frequency

1

0 to 29

III ✓

4 ✓

2

30 to 39

IIII

5 ✓

3

40 to 49

IIII IIII I ✓

11 ✓

4

50 to 59

IIII III ✓

8 ✓

5

60 to 69

IIII

5 ✓

6

70 to 79

IIII III  ✓

8 ✓

7

80 to 100

IIII IIII I ✓

11 ✓

[14]

6.4 Summarising data

After data has been collected, classified and organised it is not always possible to mention every piece of data in a report. Instead we summarise data by describing the whole data set using just a few numbers. Summarising data also makes it easier to analyse the data later. Data can be summarised by using measures of central tendency or measures of spread.

6.4.1 Measures of central tendency and measures of spread

A measure of central tendency is a single value that attempts to show a central position of a set of data. There are three types of measures of central tendency: mean, mode and median.
Mean
The mean is the most common measure of central tendency that is used, but it can be easily influenced by high or low numbers in the data set. It is also known as the average. It is calculated by adding all the values together and dividing by the number of values in the data set.
Median
The median is the middle number in the data set. To determine the median, you have to write all the numbers in the data set from the smallest to the highest and the number in the middle will be your median. If there is more than one number in the middle (i.e. if the data contains an even number of data values) add the two numbers in the middle and divide the answer by 2.
Mode
The mode is the data value that appears most often in a set of data. No calculation is needed to find the mode. You just find the value that appears most frequently. If no number appears more than the other numbers, then there is no mode.

e.g.  Worked example 3

The principal  of Hills Primary School compiled data of the number of learners who receive social social grants in each class.
He arranged the numbers in ascending order, as follows:

0           0            1            1            1            2            2

2           3            3            3            3            4            4

5           5            6            6            6            7            7

  1. How many different classes are there at Hills Primary School?
  2. Determine:
    1. the mode
    2. the median
    3. the mean

Solutions

  1. 21 (Count how many numbers are in the data set)
  2. (i) Mode = 3 (the number that appears most in the data set)
    (ii) Median = 3 (the middle number in the data set)
    (iii) Mean = 0+0+1+1+1+2+2+2+3+3+3+3+4+4+5+5+6+6+6+7+7
                                                         21
    = 71
       21
    = 3,38
    (Add all the numbers in the data set and divide by the total number of the data set.)

e.g.  Worked example 4

Thembeka compared the monthly salaries of the employees at two call centres, one in Greytown and the other in Johannesburg.
The following are the monthly salaries, in rand, earned by call-centre agents:

Greytown
4 200     4 320     4 500     4 650     4 650     4 650     5 500     5 650     7 250
Johannesburg:
5 500     5 525     5 980     6 250     6 250     6 250     6 300     7 800     8 200     8 900

  1. How many employees are working at the Johannesburg call centre?
  2. Calculate the average (mean) salary earned at the Greytown agency.
  3. Write down the median monthly salary earned at the Johannesburg agency.
  4. What is the modal salary earned at Greytown?
  5. Find the median monthly salary of the employees in both agencies.

Solutions

  1. 10 (count how many numbers are in the data set)
  2. Mean = 4 200 + 4 320 + 4 500 + 4 650 + 4 650 + 4 650 + 5 500 + 5 650 + 7 250
                                                                            9
    = 45 370
          9
    = 5 041,11
    (Add all numbers in the data set (Greytown) and divide by 9.)
  3. (Since there are ten numbers in the data set, add the two middle numbers and divide by two.)
    Median  = 6 250  +  6 250
                                2
    = 12 500
           2
    = 6 250
  4. Mode = 4 650 (the number that appears most in the Greytown data set)
  5. Start by arranging all of the numbers into one data set in ascending order
    4200 4320 4500 4650 4650 4650 5500 5500 5525 5650 5980 6250 6250 6250 6300 7250 7800 8200 8900
    The median is 5 650.

When answering questions on data handling where two or more data sets are given, it is advisable to underline keywords (e.g. in order to choose the correct data set for the question, underline the town for the data set that you have to use).

Activity 2: Measures of central tendency

1. Information for question 1
Thandeka has a shop with a scrapbooking department and a toy department. She kept a record of the ages of the customers who visited the two departments on a particular day.
Scrapbooking is a hobby which involves cutting and pasting photos, pictures and other decorative items into a book.
data 6 jhguygad

Ages of customers who visited the scrapbooking department

35

60

46

57

54

34

60

54

56

46

47

67

65

54

45

Ages of customers who visited the toy department

5

15

25

7

36

21

70

20

17

6

15

65

9

15

1.

a)

Arrange the ages of the customers who visited the toy department in ascending order.

(1)

 

b)

Determine the mode of the ages of customers who visited the scrapbooking department.

(1)

 

c)

Calculate the mean age of the customers who visited the scrapbooking department.

(3)

 

d)

Determine the median age of customers who visited the toy department.

(4)

 

e)

How many customers visited the toy department?

(1)

 

f)

Calculate the percentage of customers older than 50 years who visited the scrapbooking department.

(1)

2.) The table below shows the results of the games played by 16 teams who are playing against each other to win the league.
Absa Premiership

Pos

Team

Pld

W

D

L

GF

GA

Pts

1

Kaizer Chiefs

19

13

4

2

31

12

43

2

Mamelodi Sundowns

19

10

4

5

33

21

34

3

Bidvest Wits

18

10

4

4

23

13

34

4

SuperSport United

20

9

5

6

28

22

32

5

Orlando Pirates

17

9

3

5

21

13

30

6

AmaZulu

21

7

7

7

20

27

28

7

Platinum Stars

18

7

6

5

19

18

27

8

Bloem Celtic

19

6

8

5

25

24

26

9

Ajax Cape Town

20

7

5

8

20

22

26

10

Moroka Swallows

18

6

5

7

22

22

23

11

University of Pretoria

20

7

2

11

19

21

23

12

Black Aces

18

6

5

7

16

20

23

13

Maritzburg Utd

19

5

5

9

19

25

20

14

Polokwane City

19

5

4

10

21

26

19

15

Free State Stars

18

4

4

10

14

26

16

16

Golden Arrows

19

4

1

14

16

35

13

Key: Pld(games played) W(games won) D(games drawn) L(lost games) GF(goal for) GA(goals against) Pts(points)
2).

  1. How many teams do we have on the Absa Premiership league?   (1)
  2. How many points does the last team on the league have? (1)
  3. How many games did the first team on the league play (Pld)? (1)
  4. Which team(s) played the least number of games? (1)
  5. Calculate the mean for the number of games played.Give your answer to the nearest whole (4)
  6. Determine the median of the “goal against” data set (GA). (3)
  7. Write down the mode for the points scored (Pts). (1)  [12]

Note: The reason that we have all three measures: mean, median and mode, is because they can give us different information.

Solutions

  1. a)   5 6 7 9 15 15 15 17 20 21 25 36 65 70 ✓         
    b) Mode = 54 ✓         
    c) Mean = 35+34+47+60+60+67+46+54+65+57+56+54+54+46+45
                                                            15
    = 780
       15
    = 52
    d) 5 6 7 9 15 15 15 17 20 21 25 36 65 70        ✓
    Median = 15 + 17
                          2
    = 32 
        2
    = 16
    e) 14 ✓
    f) 9   × 100 = 60% ✓
       15
  2. a) 16 teams ✓ (1)
    b) 13 points ✓ (1)
    c) 19 games ✓ (1)
    d) Orlando Pirates ✓
    e). Mean = 19+19+18+20+17+21+18+19+20+18+20+18+19+19+18+19 ✓
                                                            16
    = 302
       16
    = 18,88 ✓ ✓
    ≈ 19 ✓ (4)
    f) Median (first arrange the numbers in an ascending order)
    12 13 13 18 20 21 21 22 22 22 24 25 26 26 27 35 ✓
    (22 + 22) ÷ 2 ✓
    = 44 ÷ 2
    = 22 ✓   
    g) Mode = 23 ✓                 

e.g.  Worked example 5
The department of trade and industry has funded 1 internship student from Gauteng to study engineering at the University of Toronto in Canada. Mr Kasmal, the project co-ordinator has decided to use the learner’s marks in the table below, to choose the best learner.
Help Mr Kasmal to choose the best learner by calculating the mean, median and mode of the 2 learners’ results.

Subject

Internship student A

Internship student B

Mathematics

95

95

Physical Science

93

93

Life Sciences

69

72

Life Orientation

87

87

English

90

90

Home Language

92

89

Geography

90

90

Solution

Internship student A

Internship student B

Mean = 95+93+69+87+90+92+90 
                                 7
= 88 

Mean = 95+93+72+87+90+89+90 
                                 7
= 88   

69 87 90  90 92 93 95
Median = 90

72 87 89  90 90 93 95
Median = 90

Mode is 90

Mode is 90

The measures of central tendency are the same for the two learners. In this case, these cannot be used to determine the best candidate.
The measures of spread can then be used to make the choice by analysing the spread of marks, as discussed on the next page.

6.4.2 Measures of spread

Range
The range is the difference between the largest (highest) and the smallest (lowest) values.
The range is a measure of spread because it tells you how spread out the data values are.
A small range suggests that the values are grouped closer to the median, while a bigger range suggests that the values are more spread out.
Range = highest data value – lowest data value

e.g.  Worked example 6
Find the range of the number of death fatalities that occurred over 6 months on the N4 highway: 3; 7; 8; 5; 4; 10.
Solution
The lowest value is 3, and the highest is 10, so the range = 10 – 3 = 7.
worked excahbghj
Measures of spread are used to determine how spread out the data is. We have the following measures of spread: range, quartiles and percentiles. These are used together with measures of central tendency to analyse and interpret data.
The range only indicates the spread between the lowest and highest values. This might be misleading if only the minimum and the maximum values are spread apart and the other values are grouped together.

e.g.  Worked example 7
In the data: 2, 75, 79, 83, 86, 86, 89, 99, the range will be: 99 – 2 = 97, which might give the wrong interpretation that the data is spread apart. To overcome outlier values, quartiles may be used to analyse the data.
(An outlier is an extremely low or extremely high value.)

Quartiles
This is the division of data into 4 equal parts. The data is divided into four portions of 25% each.
To determine the quartiles, first divide the information into 2 equal parts to determine the median (Q2), then divide the lower half into 2 equal parts, so that the median of the first half is the lower quartile (Q1). Then divide the upper half into 2 equal parts, so that the median of the second half is the upper quartile (Q3).
data 8 jkhguytgfa

Data can be summarised using 5 values, called the five number summary,
i.e.   the minimum value, lower quartile, median, upper quartile, and maximum value.

Interquartile range
This is the difference between the upper quartile and the lower quartile.
It indicates the spread between the lower part of the data and the upper part of the data

e.g.  Worked example 8
The following data has been released by Statistics South Africa on fatal crashes for November and December 2011.

MONTH

GP

KZN

WC

EC

FS

MP

NW

LIM

NC

NOV

200

156

77

112

73

75

57

81

21

DEC

182

227

107

135

90

109

78

120

32

Use the given table to determine the five number summary and the interquartile range for each month.
First arrange the data for each month in ascending order.
Determine the minimum and the maximum values.
Determine the median (quartile 2).
Then determine quartile 1 and quartile 3.

Solutions
November:

21 57 73 75 77 81 112 156 200

  • Minimum : 21
  • Maximum : 200
  • Median (quartile 2 ) : 77, divides the data into 2 equal halves
  • Quartile 1 divides the lower half of the data into 2 equal parts
    Quartile 1 = 57+73  =  65
                           2
  • Quartile 3 divides the upper half of the data into 2 equal parts
    Quartile 3 = 112+156  = 134
                              2
  • Interquatile = Q3 - Q1
    = 134 - 65
    = 69

December:

32 78 90 107 109 120 135 182 227

  • Minimum : 32
  • Maximum : 227
  • Median (quartile 2 ) : 109, divides the data into 2 equal halves
  • Quartile 1 divides the lower half of the data into 2 equal parts
    Quartile 1 = 78+ 90  =  84
                           2
  • Quartile 3 divides the upper half of the data into 2 equal parts
    Quartile 3 = 135 +182  = 158,5
                              2
  • Interquatile = Q3 - Q1
    = 158,5 - 84 
    = 74,5

This can be summarised in the following table:

Month

Minimum

Q1

Q2

Q3

Maximum

IQR

November

21

65

77

134

200

69

December

32

84

109

158,5

227

74,5

The summary can be used to analyse which month had more fatalities by representing the data on box and whisker plots (in Section 7.5).

Activity 3: Measures of spread

The South African Weather Service recorded the temperatures for ten towns and cities in South Africa on 2009-05-13

TABLE 5: Temperatures recorded on 2009-05-13 for ten South African towns and cities

 

 

Temperature in °C

Bloemfontein (Bfn)

Cape Town (Ctn)

Durban (Dbn)

Johannesburg (Jhb)

Kimberley (Kmb)

Mafikeng (Mfk)

Musina (Msn)

Nelspruit (Nls)

Pretoria (Pta)

Polokwane (Pol.)

Minimum

5

13

15

6

10

8

20

9

7

3

Maximum

23

22

A

21

24

23

40

22

22

22

Mean (average) maximum temperature = 25,6°C

Use the information in the above table to answer the following questions.

  1. The upper quartile for the minimum temperature is 13°C.
    Identify the towns or cities in which the minimum temperatures were less than the upper quartile.     (7)
  2. Calculate:
    1. The maximum temperature, A for (2)
    2. The median of the maximum (3)
    3. The percentage of the towns and cities that had a maximum temperature greater than the (1)
  3. Would the maximum temperatures best be represented by the median or the mean?
    Justify your answer.   (3)
  4. Determine the interquartile range for the maximum temperatures.     (6) [22]

Solutions

  1. Bloemfontein, ✓ Johannesburg, ✓ Mafikeng, ✓ Kimberly, ✓ Nelspruit, ✓ Pretoria ✓ and Polokwane ✓  (7)
  2.              
    1. 25,6  =  23+22+A+21+24+23+40+22+22+22
                                                   10
      25,6  = A + 219
                    10
      A = 37
    2. 21 22 22 22 22 23 23 24 37 40 ✓
      Median =  22+23 
                              2
      = 22,5 
    3. 2.3 50% of the cities and towns had a maximum temperature greater than the median. ✓    (1)
  3. The mean is affected by the 2 high temperatures ✓ (Durban 37°C and Musina 40°C). Eight of the 10 towns have maximum temperatures ✓ less than the mean The median is therefore a better representation. ✓  (3)
  4. 21 22 22 22 22 23 23 24 37 40 ✓✓
    Q1 = 22 + 22
                  2
    = 22
    Q3 = 24 + 37
                 2
    = 30,5
    IQR =   Q3 - Q1
    =  30,5 -  22
    = 8,5

Percentiles
This is the division of data into 100 equal groups. This is used to analyse the spread of large sets of data. Percentiles can be represented as follows:
data 9 jhgyuhgad

5% of values lie below the 5th percentile and 95% of the values lie above.
25% of values lie below the 25th percentile and 75% of the values lie above.
50% of the values lie below the 50th percentile and 50% of the values lie above.
95% of values lie below the 95th percentile and 5% of the values lie above.
Percentiles are used to determine the percentage of the data grouped in categories.
The concept of percentiles is used in growth charts. The curves on the growth chart below represent the percentile values of the data collected from different age groups. The growth chart is used to compare the BMI (body mass index) of a child to others in his age group. This is also used to determine the health status of the baby.

e.g.  Worked example 9
A South African couple has relocated to USA .The growth chart below has been used to monitor the growth of their female children.
Use the chart to answer the questions.
data 10 jguyguad

  1. What is the BMI of a 4 year old girl at the 95th percentile?
    Solution
    Draw a vertical line upwards from 4 years to the 95th percentile. Draw a horizontal line across to find the relevant BMI.
    The BMI is 18 kg/m2.
  2. The couple’s 10 year old child has a BMI of 16 kg/m2. Between which percentile curves does her BMI lie?
    Solution
    Draw a vertical line upwards from 10 years. Draw a horizontal lie across from 16 kg/m2. Locate the percentile, where the two lines meet. Between the 25th and 50th percentiles.
  3. The BMI of their youngest child who is 2 years old lies at the 45th What does this mean?
    Solution
    The BMI of 45% of the girls of her age group is less than hers and the BMI of 55% of the girls in her age group is above hers.
  4. Use the table below to determine the health status of their 16 year old girl with the BMI of 20 kg/m2

    BMI for age percentile range

    Weight status

    <5th percentile

    Underweight

    5th percentile to < 85th percentile

    Healthy

    85th percentile to < 95th percentile

    Risk of overweight

    ≥ 95th percentile

    Overweight

    Solution
    Draw a vertical line upwards from 16 years.
    Draw the horizontal line across from 20 kg/m2.
    Determine the percentile and use it to determine the health status: it is just below the 50th percentile, therefore the child is healthy.

Activity 4: Working with a percentile graph

Study the growth chart below and answer questions that follow.
data 11 kjhuuygad

Mrs Michael, the visiting American ambassador has brought her twins, a boy and a girl who are 9 months old. She is also looking after her late sister’s daughter who is 1 year old. Use the table below.

BMI for age percentile range

Weight status

<5th percentile

Underweight

5th percentile to < 85th percentile

Healthy

85th percentile to < 95th percentile

Risk of overweight

≥ 95th percentile

Overweight

  1. What is the weight of her daughter at the 75th percentile? (1)
  2. Give a range of percentile curves for her son who weighs 10,5 (1)
  3. Calculate the BMI of her niece whose height is 60 cm and whose weight at the 25th Give your answer in kg/m2.
    Use the formula : BMI = mass .    (3)
                                           height2
  4. Do you think she must be worried about her niece’s health status? Explain. (1) 
  5. What does it mean if the weight of a child is at the 68thpercentile? (2)  [8]

Solutions

  1. 9 kg ✓     
  2. 90th to 75th percentile ✓
  3. Her weight is 9 kg ✓
    60 cm = 0,6 m
    BMI = mass .  
              height2
    = 9/0.62
    = 25 kg/m2
  4. No, because she is healthy according to the BMI table. ✓
  5. The weight of 68% of the children of her age group is less than hers, and the weight of 32% of the children in her age group is above hers. ✓✓    (2) [8]

6.5 Representing, interpreting and analysing data

Purposes of graphs:

  • a way of exploring the relationships in data
  • a way of displaying and reporting data
  • making it easier to report patterns and relationships, shapes of distributions and trends.

Any graph used to report findings should show:

  • the significant features and findings of the investigation in a fair and easy-to-read way
  • the underlying structure of an investigation in terms of the relationships between and within the variables
  • the dependent variable on the horizontal (x) axis and the independent variable on the vertical (y)

Types of graphs

We have the following types of graphs:

  • Line graph
  • Bar graph
  • Histogram
  • Scatter plot
  • Pie chart
  • Box and whisker

6.5.1 Line graphs

In data handling we use line graphs to show the relationship between two quantities. A line graph is formed by using straight lines to join data points which have been mapped on a grid. It is used to show the change of information over time.

e.g.  Worked example 10
The table below shows the average number of minutes per month that Jabu spent watching TV from January to November last year.

Month

Jan

Feb

Mar

Apr

May

Jun

Jul

Aug

Sep

Oct

Nov

Daily TV viewing time (min)

108

103

108

120

115

122

116

105

110

105

104

  1. Plot this data on a set of axes.
  2. Can you observe any trends or patterns in the data? Give some possible reasons for these trends.
  3. Would you be able to represent this data on a bar graph?
  4. What is the advantage of using a line graph to show this information?

Solution for example 1

  1. The points are plotted and connected with line segments
    data 12 uigyghad
  2. You can see that Jabu’s viewing time increases in April, again in June and slightly in September (perhaps due to school holidays). We also see decreases in his viewing time during February, May, August, October and These could be times when he was preparing for tests and exams.
  3. Yes, it would be possible to represent this data on a bar graph; the number of minutes would be plotted as a bar for each
  4. A line graph helps us to see trends because we can easily see the increasing or decreasing slope of each line segment in the graph

6.5.2 Bar graphs

A bar graph is used to represent data that is sorted into categories. Display data is compared in categories. Each bar shows the number of items in that category and there are spaces between the bars.

A bar graph can be a:

  • single graph
  • double or multiple graph
  • compound or stacked

Worked example 11
The school tuck shop keeps track of how many hot dogs, sandwiches, salads and burgers they sell at one break time. They have the data given in the table below. Draw a bar graph to represent this data.

Item

Frequency

Hot dogs

15

Sandwiches

35

Salads

10

Burgers

12

Solution for example 2
data 13 kjhuuyadf
e.g.  Worked example 12
A survey of 1 000 households was undertaken during 2001 to determine how many households used various electronic appliances. A survey of the same number of households was repeated during 2007.
The graph below shows the results of the two surveys.
data 14 kjhiuhf

  1. What was the percentage increase in usage of TV sets between 2001 and 2007?
  2. Which appliance was used most in households during both 2001 and 2007?
  3. Which appliance showed a decreased usage in 2007 compared to 2001?
  4. How many of the 1 000 households surveyed used cellphones during 2007?
  5. Calculate the difference in percentage usage during 2001 between TV sets and DVD players.

Solution for example 3

  1. 65,6% – 53,8% = 11,8%
  2. Radio
  3. Video machine
  4. 72,9% × 1 000 households
    = 0,729 × 1 000
    = 729 households
  5. Difference in percentage = 53,8% – 24,4%
    = 29,4%

Activity 5: Working with bar graphs

The compound bar graph below shows the percentage of South African children for age seven to thirteen enrolled in primary schools during 1996 and 2007.
data 15 mhgujhyad
[Source: www.businessreport.co.za. 8 November 2007]
Use the graph above to answer the following questions.

  1. What percentage of 10 year olds was enrolled during 1996? (1)
  2. Calculate the increase in the percentage enrolment of 11 year olds from 1996 to 2007.     (1)
  3. Which age group had
    1. the largest percentage enrolment in 1996? (1)
    2. the smallest percentage enrolment in 2007? (1)
    3. the greatest increase in percentage enrolment between 1996 and 2007? (1)
  4. If there were 240 000 ten year old children in South Africa in 1996, calculate the number of 10 year olds enrolled in primary schools in 1996.    (2) [7]

Solutions

  1. 91,3% ✓         
  2. Increase = 96,3% - 93,6% ✓
    = 2,7%    (1)
  3.                              
    1. 13 year olds ✓     (1)
    2. 7 year olds ✓   (1)
    3. 7 year olds ✓ (1)
  4. d. 91,3% of 240 000
    91,3 ÷ 100 ×  240 000 ✓
    = 219 120 ✓   (2)  [7]

6.5.3 Histograms

Histograms are different from bar graphs in that they represent continuous data. Data that is displayed on a histogram is also grouped. There are no spaces between the bars.
e.g.  Worked example 13
Lwanda measures the lengths of his school books (in cm) and draws up the frequency table below. Draw a histogram to represent this data.

Length of book

Frequency

20 – 23,9 cm

4

24 – 26,9 cm

7

27 – 29,9 cm

5

Longer than 30 cm

3

data 16 khguyhad

Activity 6: Working with histograms     

Mr Smith, an investor from Australia, has just opened the branch of Raetsiza Company in Pretoria central. The graph below represents the salary categories of the employees versus the number of employees per category. Study the graph and answer questions that follow.
data 17 mkbhjgad

  1. How many people were employed by the Raetsiza Company? (2)
  2. How many employees are earning the lowest salary? (1)
  3. Why do fewer employees earn the highest salary? (1)
  4. Give possible reasons why there are fewer employees in the category of R180 000 to R210 000. (1)
  5. If the salary increases by 6%, what will be the new maximum amount for employees in the category R150 000 – R180 000?  (2)  [7]

Solutions

  1. 150 + 100 + 110 + 40 + 70+ 80 ✓ + 30 = 580 ✓ (2)
  2. 150 earn less than R120 000 ✓ (1)
  3. They are senior employees. ✓ (1)
  4. They have got special skills. ✓ (1)
  5. 106% of R180
    = 1,06 ×  R180 000 ✓
    = R190 800 ✓ (2) [7]

6.5.4 Pie charts

Pie charts are circular graphs, divided into sectors. They are used to show the parts that make up a whole. They can be useful for comparing the size of relative parts. They do not give quantities of the categories, only the relative (compared) amounts. They do not show the actual amounts. The information is often presented as percentages that must add up to 100%. They are often used in media to show clear and important differences, but  they cannot show shape and spread of data.
Note: Learners are not expected to draw pie charts.

e.g.  Worked example 14
The pie chart below shows a survey of the different types of the favourite fruit juice flavours that are normally bought by a group of 120 learners from Ndukwenhle High school during their lunch time.
data 18 hguyagduy

  1. Calculate how many learners chose each type of juice
  2. In what way does the pie chart work better than a bar graph to represent this data?
  3. What information would a bar graph give you that this pie chart does not?

Solutions

  1. 45% of 120 learners
    = 54 learners who chose fruit cocktail.
    30% of 120
    = 36 learners who chose apple.
    12,5% of 120 learners
    = 15 learners who chose grape.
    12,5% of 120
    = 15 learners who chose litchi.
  2. The pie chart is a simple, visual representation that works well for representing A pie chart allows us to see at a glance the relative proportions of the learners who prefer each flavour.
  3. The number of learners who prefer each

Activity 7: Working with pie charts 

A recent survey looked at households in two income groups. The study determined what percentage of monthly income was spent on food, housing and other requirements. The pie charts below represent the findings of the study.
data 19 kjhuighua

  1. What were the average monthly incomes of the groups considered?     (2)
  2. What percentage of Group 1’s earnings was spent on housing? (1)
  3. How much was spent on housing by a household in Group 2? (2)
  4. Which group spent the larger amount of money on food? Justify your answer by calculations.      (5) [10]

Solutions

  1. R3 000 and R20 000 ✓✓ (2)
  2. 100% – 75% = 25%  (1)
  3.  (40 ÷ 100) × 20 000
    = R8 000 ✓ ✓ (2)
  4. 1. (55 ÷ 100) × 3 000 
    = R1 650 ✓
    2. (14 ÷ 100) × 20 000 
    = R2 800 (2 spent more) ✓✓ (5)  [10]

6.5.5 Scatter plot

A scatter plot is the most useful graph for studying the relationship (correlation) between two variables. It shows one of the variables on the horizontal axis and the other variable on the vertical axis. The resulting scatter plot of points will show at a glance whether a relationship exists. You cannot have more than two sets of data on a scatter plot.

A scatter plot can show:

  • positive correlation
  • negative correlation
  • no correlation
  • When seeing patterns remember that the tighter together the points are clustered, the stronger the correlation between the variables you have plotted
  • If you find a pattern that slopes from the lower left to the upper right, this tells you that as x increases, y also increases. This means there is a “positive” correlation between the two variables
  • If you find a pattern that slopes from the upper left to the lower right, this tells you that as x increases, y This means there is a “negative” correlation between the two variables.

data 20 kjhiughaud

e.g.  Worked example 15
After writing controlled tests for term one, Tourism and Mathematical Literacy marks of 10 randomly selected Grade 10 learners were recorded.
Learners are not expected to draw a line of best fit.

TOURISM

55

60

 

20

70

5

40

50

10

30

55

MATHS LIT

70

60

 

40

75

80

30

70

5

45

50

  1. Draw a scatter plot for the
  2. Describe the relationship between the
  3. Is there any point you regard as an outlier? Give a reason for your
  4. Is there a correlation between the sets of data?

Solutions

  1. SCATTER PLOT
    data 21 jhgvjuyad
  2. There is a positive relationship between the Tourism and Mathematical Literacy
  3. Yes, point (5; 80). The learner has got the highest mark in Mathematical Literacy and the lowest mark in Perhaps there is a mistake with one of the marks.
  4. Yes and it is a positive correlation

6.5.6 Box and whisker plots

Box and whisker plots are graphical representation of the five number summary of a set of data.
The five number summary:

  • Minimum value
  • Lower quartile (Q1)
  • Median (Q2)
  • Third quartile (Q3)
  • Maximum value

Learners are not expected to draw the box and whisker plot.
e.g.  Worked example 16
Read from the box and whisker plot the values of the five number summary.
data 22 uyguygad
Solution

Minimum

70

Lower quartile (Q1)

100

Median (Q2)

110

Third quartile (Q3)

115

Maximum value

120

Activity 8: Box and whisker plots

(Sunday times 2009 Q & A)
The box and whisker plot below represents the batting averages of 160 cricketers who have batted in T20 matches since 1 January 2009. Answer the questions that are based on the plot.
data 23 ujyyhgujyhac

  1. What is the name given to the two data points with values 57 and 57,33? (1)
  2. How many players have a batting average less than 6,25? (2)
  3.  What must a batsman’s average be for him to be in the top quartile? (1)
  4. Jacques Kallis is the South African with the highest batting average. If his average is 48,4, how does he compare with the other batsman? (1) [5]

Solution

  1. Outliers ✓ (1)
  2. About 40 players (lower quartile of 160 players) ✓✓ (2)
  3.  >25,5 ✓ (1)
  4. He is definitely in the top quartile. ✓ He is close to the highest batting averages so he compares favourably with the best batsmen in the world. (1)  [5]

6.6 Misleading data

(Used http://en.wikipedia.org/wiki/Misleading_graph as a source for graphics – Creative Commons)
Graphs in the media are often drawn to support a particular point of view. We need to be aware that in some situations the graph may give a false impression of the data.
There are numerous ways in which a misleading graph may be constructed.

Biased labelling
The use of biased words in the graph’s title, axis labels, or caption may lead the reader to an incorrect conclusion.

Misleading graphs
Comparing pie charts of different sizes could be misleading as people cannot accurately read the comparative area of circles.
Thin slices which are hard to read may be difficult to interpret.
Making a pie chart 3D or drawing at an angle might make interpretation difficult due to the resulting effect

Misleading pie chart

Regular pie chart

data 24 juhguyghad

   data 25 jughiuhgad

In the misleading pie chart, Item C appears to be at least as large as Item A, whereas it is actually less than Item A.

Source: BBC – GCSE: Bitesize misleading graphs
e.g.  Worked example 17
The histogram below shows the price increase of houses from 1998 to 1999.
data 26 jhgjyhgad

What is misleading about the histogram shown above and how should the information be represented?
Solution
It looks as though house prices have tripled (increased by a factor of 3) in one year, but this is not true. The graph is misleading because the vertical axis does not start at 0.
data 27 kjhgjgadc

e.g.  Worked example 18: Misleading graphs
What can you conclude from the graph given here? How best can this information be presented?
data 28 khjggfad

Axis changes
Changing y-axis maximum
data 29 kljhgjuygad

No scale
The scales of a graph are often used to exaggerate or minimise differences.
Misleading bar graph with no scale

Less difference

More difference

data 30 kuhhad data 31 jhygyyuagd

Note the lack of a starting value for the y-axis, which makes it unclear if the graph is truncated. Additionally,note the lack of tick marks which prevents the reader from determining if the graph bars are properly scaled.Without a scale, the visual difference between the bars can be easily manipulated.

Solution
It looks from the graph as if the number of singles sold went down from 1995 to 1996 and up from 1996 to 1997 and then down again. The information could be clearer if the 2D bars were used.
Now it is clear that sales for the year 1995 and 1997 are the same.
data 32 khgyugad

Activity 9: Revision exercise  

Misleading graphs

  1. Look at the bar graph below and answer the questions that follow
    data 33 jhghyugad
    1. Does this graph tell us how many Grade 10 learners there are in total?    (3)
    2. Can we assume that none of the learners who take Accounting take Geography? (2)
    3. A pie graph of this data would not make Explain why. (3) [8]

Solutions to 1

  1. No. It may look like there are 140 learners in total ✓ but learners take more than one subject, ✓ so we can’t use the numbers of learners per subject to determine how many learners there are altogether. ✓
  2. No, we have no information ✓ about whether learners can take both Accounting and ✓            (2)
  3. Learners do not only take one subject, ✓ therefore the data cannot be split into discrete percentages ✓ per subject and represented using a pie ✓   (3)  [8]

2). The two graphs below show the same data in different forms about the members of Uthando Saving Club from 2004 to 2007. Using the graphs, explain whether each of the statements beneath is true or false
data 34 huuyhgafd

  1. The number of female members was greater than the number of male members each year.     (1)
  2. In 2006 the numbers of male and female members were equal.    (2)
  3. The number of members who are men has gradually increased over the (2)
  4. There were more female members in 2007 than there were in (1)
  5. There were more male members in 2006 than there were in (1)
  6. The comparison of male and female members has changed over the years.     (1)  [8]

Solutions to 2

  1. FALSE – In 2007 there were fewer female members. ✓
  2. TRUE – In 2006 the bars for men and women are the same height on the bar chart; ✓ on the line graph the lines showing ‘men’ and ‘women’ cross. ✓ (2)
  3. TRUE – In 2005 and 2006 the number is the same, but over ✓ the four-year period overall, it gradually increases✓ (2)
  4. TRUE – The line showing the ‘women’ is higher in 2007 than the same line in 2005✓            (1)
  5. FALSE – There is the same number of men in 2005 and 2006✓ (1)
  6. TRUE – In 2005 there were more women than men; by 2007 there are more men than women. ✓   (1)  [8]

3). Marvin has a gym. In 2012 a total of 1 150 people attended the weight-lifting classes. He kept a record of the number of males, females and different races attended the weight-lifting class from 1 January to 31 December

TABLE: Number of males and females attending the weight-lifting classes

Month

Number of males

Number of females

January

60

16

February

71

19

March

63

18

April

82

15

May

80

19

June

52

13

July

96

A

August

79

14

September

80

15

October

119

20

November

76

25

December

85

18

TOTAL

943

 

data 35 jhguyhgad  

3). Use the pie chart and the table above to answer the following questions

  1. Give the ratio (in simplest form) of the number of females to males who attended the weight-lifting classes in September 2012.   (2)
  2. Calculate the missing values A and B(5)
  3. If a weight lifter is chosen at random from the whole year’s weight-lifting class, what is the probability that the weight lifter will be a white female?   (5)
  4. Determine the:
    1. mean (average) of the number of males in the weight-lifting class   (2)
    2. modal monthly number of females in the weight-lifting class  (1)
    3. median of the number of males in the weight-lifting class (3)
    4. range of the number of females in the weight-lifting class(2) [20]

Solutions to 3

  1. 15: 80 ✓
    = 3: 16 ✓   
  2. A =1 150 – (943+16+19+18+15+19+13+14+15+20+25+18) ✓
    = 1 150 – 1135
    = 15 ✓
    B = 18%- ✓ (1,57%+8,26%+5,08%) ✓
    = 3,09% ✓
  3. Number of females =1 150 – 943
    = 207 ✓
    Number of white females = 8,26% of 207 ✓
    = 17,0982
    ≈ 17 ✓
    P (white female) =   17      ✓
                                 1 150
    = 0,01478 ✓ 
  4. (i)    Mean = 943 ÷ 12 ✓
    = 78,58
    ≈ 79 ✓     
    (ii) Mode =15 ✓
    (iii) 52;60;63;71;76;79;80;80;82;85;96;119
    Median = 79+80 ✓
                        2       
    = 79,5
    ≈ 80 ✓   
    (iv) Range = 25 – 13 ✓
    = 12 ✓    
Last modified on Wednesday, 08 September 2021 12:26