Introduction
The aim of this coursework is to estimate how good the population (HAM students) is at estimating string lengths and angles. To do this, I will use secondary data (firstly because it would be very time-consuming to collect primary data from the whole population, and secondly because I was already provided with secondary data). Because the data set is too extensive (there are more than 400 pieces of data), I will analyse it using samples (I have limited time to complete this coursework, and if I analysed every piece of data, I wouldn’t complete it on time). Both quantitative data (e.g. IQ) and qualitative data (e.g. hair colour) will be considered when formulating my hypotheses and conclusion. I chose to do this coursework entirely on the computer because the graphs will be more accurate and less affected by human error.
I was told to use a sample percentage between 20% and 30% of the population. I chose to use 30%, because it will represent the population better (it’s the biggest percentage I could use), but it’s still small enough to handle. There are several sampling methods I could use (for example, random, systematic or stratified sampling), but I chose to use stratified systematic sampling: stratified because it chooses the sample proportionally, making the estimation less biased, and systematic because it’s easier to select than random sampling (you just have to choose every nth piece of data on the list). After my sampling is complete, I’ll review it to make sure there are no abnormalities or errors, and if I find any, I’ll take those out of my samples and replace them with another piece of data chosen at random.
Afterwards, I will formulate between three and five hypotheses (i.e. what I think the outcome of the investigation will be, e.g. I think Year 11s are better at estimating than Year 7s, or I think Year 11s with an IQ above 100 are better at estimating than Year 11s with a lower IQ).
One of the main reasons why I’m doing this coursework is for my examiners to see if I can test my hypotheses using different methods. To prove I can, I will have to test my hypotheses using more than one method for each one. I have a range of choices, and here are some examples:
Pie charts/graphs (grade E) Scatter graphs (grade D) Averages – mean, median, mode (grade D) Estimate of mean (grade C) Cumulative frequency/box plots (grade B) Spearman’s rank (grade B) Histograms (grade A)
Logically, I will try to choose the higher grade methods so I might achieve a higher grade in this coursework. It’s not a problem if my hypotheses turn out to be wrong – just the fact that I can test them will give me some marks. When looking at all my testing of the hypotheses, I will be able to write a conclusion saying what I have found out and interpreting the results (this means that if, for example, girls’ guesses are closer to the actual figures than boys’ guesses, girls are better at estimating than boys). In spite of all these things, I still have some limitations; for example, I have no assurance that the data provided is correct or unbiased (some people may have
made up some values or been influenced by what others wrote). The data also has a wide range: people’s guesses are far from each other, and qualitative data isn’t very concentrated (for example, hair colours are very varied). If I have some time after finishing the coursework, I can add some extra features as an extension – like creating my own survey/questionnaire, for example. Another thing I can do is to ask my own family and friends; in this way I could collect some primary data.
Sampling calculations
To do my calculations, I need to choose an appropriate sample of data. I chose to use 30% of the total population in my sample. To do my sampling, I gave myself the option to choose between four methods:
Random sampling
Systematic sampling
Stratified systematic sampling
Stratified random sampling
I chose stratified systematic sampling, but I will demonstrate the other three methods as well.
Random sampling
In random sampling, everyone has an equal chance of being chosen, regardless of their gender or age. We don’t need to know how many boys and girls we should choose from each year; we only need to know how many students we should choose from each year. This means that, sometimes, the sampling is not fair (for example, if there are twice as many boys as girls and we use random sampling, it is very likely that many more boys than girls will be picked for the sample).
To do the calculations, I will need the total population of each year (which I named tp) and the sample population (sp). sp will always be 30% of tp, as I want to use 30% of the population in my sample.
Year 7
There are 100 students in Year 7.
sp = tp x 0.3
sp = 100 x 0.3
sp = 30
The result tells us we have to choose 30 people at random from Year 7, regardless of their gender.
Year 8
There are 120 students in Year 8.
sp = tp x 0.3
sp = 120 x 0.3
sp = 36
This means we have to choose 36 people in Year 8.
Year 9
There are 77 students in Year 9.
sp = tp x 0.3
sp = 77 x 0.3
sp = 23.1 ≈ 23
So we have to choose 23 people in Year 9 to be part of our sample.
Year 10
I’ve counted 78 students in Year 10.
sp = tp x 0.3
sp = 78 x 0.3
sp = 23.4 ≈ 23
We need to randomly choose 23 people in Year 10.
Year 11
Finally, there are 58 students in Year 11.
sp = tp x 0.3
sp = 58 x 0.3
sp = 17.4 ≈ 17
We must choose 17 people from Year 11.
So here are the figures of my random samples:
Year 7: 30 students chosen
Year 8: 36 students chosen
Year 9: 23 students chosen
Year 10: 23 students chosen
Year 11: 17 students chosen
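The sample-size arithmetic above can be sketched in Python. This is only an illustration under assumptions: the function names and the numbered Year 7 register are made up for the example, and only the year sizes and the 30% figure come from the text.

```python
import random

# Year-group sizes (tp) as counted in the text.
year_sizes = {7: 100, 8: 120, 9: 77, 10: 78, 11: 58}
SAMPLE_PERCENT = 30  # sp is 30% of tp


def sample_size(tp, percent=SAMPLE_PERCENT):
    """Return sp: the number of students to pick, rounded to a whole person."""
    return round(tp * percent / 100)


sample_sizes = {year: sample_size(tp) for year, tp in year_sizes.items()}


def random_sample(students, percent=SAMPLE_PERCENT, seed=None):
    """Pick sp students at random from a list, without repeats."""
    rng = random.Random(seed)
    return rng.sample(students, sample_size(len(students), percent))


# Hypothetical register: the students are simply numbered 1 to 100.
year7_register = list(range(1, 101))
year7_sample = random_sample(year7_register, seed=1)

print(sample_sizes)
print(len(year7_sample))
```

Fixing the seed only makes the example repeatable; a real selection would leave it unset.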
Systematic sampling
In this sampling method, every nth piece of data from the list is chosen. To find the value of n, we must divide the total number of students in the school by the total number of students in each year.
Example (Year 7):
Total number of students in school = 100 + 120 + 77 + 78 + 58 = 433
Y7 = 433÷100 = 4.33 ≈ 4
In the Year 7 list, we must choose every 4th student, regardless of their age or gender.
Year 8
There are 120 students in Year 8.
Y8 = 433÷120 = 3.61 ≈ 4
We will choose every 4th student in the Year 8 list.
Year 9
Year 9 has 77 students.
Y9 = 433÷77 = 5.62 ≈ 6
We have to choose every 6th student from the list of Year 9 students.
Year 10
In Year 10, there are 78 students.
Y10 = 433÷78 = 5.55 ≈ 6
We would also have to choose every 6th student from the list of Year 10 students.
Year 11
There are only 58 students in Year 11.
Y11 = 433÷58 = 7.47 ≈ 7
This means we have to choose every 7th student from the list of Year 11s.
Here are the figures of my systematic samples:
Year 7: choose every 4th student
Year 8: choose every 4th student
Year 9: choose every 6th student
Year 10: choose every 6th student
Year 11: choose every 7th student
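The interval calculation above, and the "every nth student" selection, can be sketched as follows. The helper names are assumptions made for the example, and the numbered register is hypothetical; the interval rule itself is the one described in the text.

```python
# Year-group sizes as counted in the text.
year_sizes = {7: 100, 8: 120, 9: 77, 10: 78, 11: 58}
school_total = sum(year_sizes.values())  # 433 students in the school


def interval(year_size, total=school_total):
    """Return n for a year group: school total divided by year size, rounded."""
    return round(total / year_size)


intervals = {year: interval(size) for year, size in year_sizes.items()}


def every_nth(items, n):
    """Return every nth item of a list, starting with the nth."""
    return items[n - 1::n]


# Hypothetical Year 7 register, numbered 1 to 100.
year7_register = list(range(1, 101))
year7_picked = every_nth(year7_register, intervals[7])

print(intervals)
print(len(year7_picked))
```

Note that the interval fixes how many students are picked: taking every 4th student from a list of 100 gives 25 students, so the count can differ from the 30% target.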
Stratified systematic sampling
This is the method I’ll use to obtain the samples I will be working with, because it represents the population more fairly than any other sampling method (people are chosen according to their proportions in the population). In this type of sampling, what we need to do is find the proportion of each gender to the total number of students (we do that by dividing, for example, the number of boys by tp). Then, we multiply the result by the size we want our sample to be, which is sp. As in random sampling, sp will be equal to tp x 30%. These calculations will give me the number of boys and girls I have to choose according to their proportion in one particular year.
To see which students I have to choose, I divide the total number of students in one year by the number of boys (or girls) in that same year.
Example: 15 boys, 30 girls, 45 students
b = 45÷15 = 3
g = 45÷30 = 1.5 ≈ 2
This means I should choose every 3rd boy and 2nd girl in the list (after separating boys and girls).
Year 7
In this case, as we already know, there are 100 students in Year 7. There are 46 boys and 54 girls. In Year 7, sp = 100 x 0.3 = 30.
b = (46÷100) x sp
b = 0.46 x 30
b = 13.8 ≈ 14
g = (54÷100) x sp
g = 0.54 x 30
g = 16.2 ≈ 16
(We have to round the results because we’re talking about people – there are no fractions of a person. If the results are not whole numbers, we have to round them.)
Let’s check if b + g = sp:
b + g = sp
14 + 16 = 30
The sum of b and g is equal to sp, so our values are correct. Now let’s see the systematic numbers:
b = 100÷46 = 2.17 ≈ 2
g = 100÷54 = 1.85 ≈ 2
This means we have to choose 14 boys and 16 girls from Year 7 (every 2nd boy and 2nd girl from the list).
Year 8
There are 59 boys and 61 girls (120 students).
sp = 120 x 0.3 = 36
b = (59÷120) x sp
b = 0.49 x 36
b = 17.64 ≈ 18
g = (61÷120) x sp
g = 0.51 x 36
g = 18.36 ≈ 18
18 + 18 = 36
b = 120÷59 = 2.03 ≈ 2
g = 120÷61 = 1.97 ≈ 2
We should choose 18 boys and 18 girls from Year 8. Again, we should choose every 2nd boy and 2nd girl.
Year 9
There are 32 boys and 45 girls: 77 students in total.
sp = 77 x 0.3 = 23.1 ≈ 23
b = (32÷77) x sp
b = 0.42 x 23
b = 9.66 ≈ 10
g = (45÷77) x sp
g = 0.58 x 23
g = 13.34 ≈ 13
10 + 13 = 23
b = 77÷32 = 2.41 ≈ 2
g = 77÷45 = 1.71 ≈ 2
There will be 10 boys and 13 girls in our Year 9 sample. Every 2nd boy and 2nd girl will be chosen.
Year 10
45 boys and 33 girls are in Year 10 (78 students).
sp = 78 x 0.3 = 23.4 ≈ 23
b = (45÷78) x sp
b = 0.58 x 23
b = 13.34 ≈ 13
g = (33÷78) x sp
g = 0.42 x 23
g = 9.66 ≈ 10
13 + 10 = 23
b = 78÷45 = 1.73 ≈ 2
g = 78÷33 = 2.36 ≈ 2
We have to choose 13 boys and 10 girls for our sample (every 2nd boy and girl).
Year 11
There are 28 boys and 30 girls in Year 11. That makes 58 students.
sp = 58 x 0.3 = 17.4 ≈ 17
b = (28÷58) x sp
b = 0.48 x 17
b = 8.16 ≈ 8
g = (30÷58) x sp
g = 0.52 x 17
g = 8.84 ≈ 9
8 + 9 = 17
b = 58÷28 = 2.07 ≈ 2
g = 58÷30 = 1.93 ≈ 2
We have to choose 8 boys and 9 girls for our Year 11 sample (every 2nd boy and girl).
These are the figures of my stratified systematic samples:
Year 7: 30 students chosen, 14 boys (every 2nd), 16 girls (every 2nd)
Year 8: 36 students chosen, 18 boys (every 2nd), 18 girls (every 2nd)
Year 9: 23 students chosen, 10 boys (every 2nd), 13 girls (every 2nd)
Year 10: 23 students chosen, 13 boys (every 2nd), 10 girls (every 2nd)
Year 11: 17 students chosen, 8 boys (every 2nd), 9 girls (every 2nd)
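The stratified figures can be checked with a short Python sketch. It is a simplified version of the working above: it skips the intermediate rounding of the proportion to two decimal places, which happens to give the same whole numbers here. The function name is an assumption made for the example.

```python
# (boys, girls) in each year, as given in the text.
year_counts = {7: (46, 54), 8: (59, 61), 9: (32, 45), 10: (45, 33), 11: (28, 30)}


def stratified_sizes(boys, girls, fraction=0.3):
    """Return (sp, b, g): sample size and the boys/girls split for one year."""
    tp = boys + girls
    sp = round(tp * fraction)      # whole number of students in the sample
    b = round(boys / tp * sp)      # boys in proportion to their share of tp
    g = round(girls / tp * sp)     # girls in proportion to their share of tp
    return sp, b, g


stratified = {year: stratified_sizes(b, g) for year, (b, g) in year_counts.items()}

for year, (sp, b, g) in stratified.items():
    # b + g should add back up to sp, as checked by hand in the text.
    print(f"Year {year}: sp={sp}, boys={b}, girls={g}, check={b + g == sp}")
```

Rounding b and g separately does not always guarantee b + g = sp in general, so the explicit check is worth keeping.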
Stratified random sampling
I will not do the calculations for this sampling method, since I would be repeating myself (the only difference between stratified systematic sampling and stratified random sampling is that stratified random sampling chooses the data randomly, while stratified systematic sampling chooses the data systematically). Knowing this, the only thing I will do is ignore the rule of choosing every nth student, but I will still keep the sample sizes and the proportion of boys and girls that should be chosen. So after doing this, the figures for stratified random sampling would be:
Year 7: 30 students chosen, 14 boys, 16 girls
Year 8: 36 students chosen, 18 boys, 18 girls
Year 9: 23 students chosen, 10 boys, 13 girls
Year 10: 23 students chosen, 13 boys, 10 girls
Year 11: 17 students chosen, 8 boys, 9 girls
And in this way I conclude my sampling calculations. As I said earlier, the figures that I will use are the stratified systematic sampling figures. That takes us to the next part: sample selection.
Sample selection
Since I’m using stratified systematic sampling, what I need to do is separate boys and girls from each year. This means I will have 10 tables:
Year 7 boys
Year 7 girls
Year 8 boys
Year 8 girls
Year 9 boys
Year 9 girls
Year 10 boys
Year 10 girls
Year 11 boys
Year 11 girls
The tables will not show all the data, only the selected samples. The first table is on the next page.
Hypotheses
This is the part where I think of some guesses (hypotheses) about how good particular year groups or other groups are at estimation. Here are my hypotheses:
Hypothesis 1
KS3 girls with an IQ over 100 estimate lengths more accurately than KS3 girls with an IQ under or equal to that value.
Hypothesis 2
KS4 girls over 1.65 m estimate length B more accurately than KS3 girls with a height equal to or smaller than 1.60 m.
Hypothesis 3
KS4 boys estimate angle D more accurately than KS3 boys.
Hypothesis 4
KS3 blonde girls estimate angle C more accurately than KS3 girls with brown hair.
So now I have the hypotheses, which means the next step is to start testing them!
Testing the hypotheses, part 1
First of all, let’s look at the first hypothesis:
KS3 girls with an IQ over 100 estimate lengths more accurately than KS3 girls with an IQ under or equal to that value.
Estimated mean
The first thing I’m going to do is draw up a table where I’ll group the data. This estimate is for KS3 girls with an IQ over 100:
KS3 girls, IQ > 100
So the estimated mean will be:
$\frac{687.5}{29} = 23.71$ cm
We see the guesses were quite good, because the actual value of length A is 23 cm. The estimated mean of the guesses is only about 7 mm off. As a percentage, that deviation would be:
$\frac{0.71}{23} \times 100 = \frac{71}{23} = +3.087\%$
This is the deviation in KS3 girls’ guesses with an IQ over 100 (related to length A).
Now let’s look at KS3 girls with an IQ equal to or under 100:
KS3 girls, IQ ≤ 100
An outlier (38) was removed from the original 18 pieces of data.
$\frac{397.5}{17} = 23.38$ cm
Surprisingly, the girls with a lower IQ have guessed closer to the actual value! I didn’t need to calculate the deviation to notice this. To find out the figures for length B (59.5 cm):
KS3 girls, IQ > 100
$\frac{1747.5}{29} = 60.26$ cm
$60.26 - 59.5 = 0.76$ cm
$\frac{0.76}{59.5} \times 100 = \frac{76}{59.5} = +1.28\%$
KS3 girls, IQ ≤ 100
The total frequency for this group has gone down from 18 to 16 because two outliers, 20 and 95, were removed (I hadn’t noticed these when I was doing the samples).
$\frac{930}{16} = 58.13$ cm
$58.13 - 59.5 = -1.37$ cm
$\frac{-1.37}{59.5} \times 100 = \frac{-137}{59.5} = -2.3\%$
So we have a tie: for length A, girls with an IQ ≤ 100 have guessed more accurately, while for length B, the opposite happened. To (dis)prove my hypothesis, I’ll have to use another method.
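The percentage deviations worked out above all follow the same rule, which can be written as a small Python function. This is a sketch: the function and dictionary names are assumptions, while the estimated means and actual lengths are the figures quoted in the text.

```python
ACTUAL_A = 23.0   # actual length A in cm
ACTUAL_B = 59.5   # actual length B in cm


def percentage_deviation(estimate, actual):
    """Signed deviation of an estimate from the actual value, as a percentage."""
    return (estimate - actual) / actual * 100


# Estimated means taken from the text.
deviations = {
    "A, IQ > 100": percentage_deviation(23.71, ACTUAL_A),
    "A, IQ <= 100": percentage_deviation(23.38, ACTUAL_A),
    "B, IQ > 100": percentage_deviation(60.26, ACTUAL_B),
    "B, IQ <= 100": percentage_deviation(58.13, ACTUAL_B),
}

for group, value in deviations.items():
    print(f"{group}: {value:+.2f}%")
```

A positive result means the group overestimated on average and a negative one means it underestimated; the size of the number, ignoring the sign, shows which group was closer.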
CF graphs/box plots (length A)
KS3 girls, IQ > 100
For cumulative frequency graphs, the points are always plotted at the upper bound of the interval, so that’s what I’ve done. After plotting the graph (which is on page 16), I have to find the positions of Q1, Q2 and Q3.
Total frequency = 29
Q1 position = $\frac{29}{4} = 7.25$
Q2 position = $\frac{29}{2} = 14.5$
Q3 position = $\frac{29}{4} \times 3 = 21.75$
Now I can draw lines on the graph to find out the values of the quartiles. After doing this, I found that:
Q1 = 21.5 cm
Q2 = 24.5 cm
Q3 = 27 cm
The median (Q2) is 1.5 cm above the actual figure of 23 cm. As a percentage, that is 6.52% above the actual figure. To find out how consistent the data is, I drew a box and whisker plot based on the CF graph (on the next page). However, the best way to take information out of a box plot is to compare it with another box plot (which will be related to KS3 girls with a lower IQ), so the next thing I will do is plot another CF graph and another box plot for KS3 girls with an IQ ≤ 100. First, we need some tables:
KS3 girls, IQ ≤ 100
Q1 position = $\frac{17}{4} = 4.25$
Q2 position = $\frac{17}{2} = 8.5$
Q3 position = $\frac{17}{4} \times 3 = 12.75$
The CF graph and the box plot for this data are on page 17. The information I can take from these is on page 18.
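The quartile positions used for the cumulative frequency graphs follow one rule: a quarter, a half and three quarters of the total frequency. A minimal Python sketch, with a function name chosen for the example:

```python
def quartile_positions(total_frequency):
    """Return the positions of Q1, Q2 and Q3 on the cumulative frequency axis."""
    q1 = total_frequency / 4
    q2 = total_frequency / 2
    q3 = 3 * total_frequency / 4
    return q1, q2, q3


# Total frequencies of the two groups compared in the text.
print(quartile_positions(29))  # KS3 girls, IQ > 100
print(quartile_positions(17))  # KS3 girls, IQ <= 100
```

Each position is then read across to the curve and down to the horizontal axis to find the quartile value.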
[Figure: box plot, KS3 girls, IQ > 100. Axis: estimated length A (cm).]
[Figure: cumulative frequency graph, KS3 girls, IQ > 100. Axes: estimated length A (cm) against cumulative frequency.]
[Figure: cumulative frequency graph, KS3 girls, IQ ≤ 100. Axes: estimated length A (cm) against cumulative frequency.]
[Figure: box plot, KS3 girls, IQ ≤ 100. Axis: estimated length A (cm).]
[Figure: box plots for KS3 girls, IQ > 100 and KS3 girls, IQ ≤ 100, drawn on the same scale. Axis: estimated length A (cm).]
Looking then at the two box plots (which now have the same scale so that they can be compared accurately),
we can get some pieces of information:
The average (median) of the group with IQ ≤ 100 is closer to the correct value of 23 cm, and is lower than the median of the other group.
The interquartile range of the group with a smaller IQ (25.5 – 22.5 = 3) is less than the interquartile range of the other group (27 – 21.5 = 5.5). This means the central part of the data (the middle 50%) is more consistent for the second box plot than for the first.
The spread of the data is also smaller in the second box plot (30 – 19 = 11 against 34 – 19 = 15).
All of the above clearly suggests that KS3 girls with an IQ ≤ 100 have guessed the value of length A more accurately, which means...
The part of my hypothesis related to length A was disproved.
To see if KS3 girls with IQ > 100 estimate length B better than KS3 girls with IQ ≤ 100, I will plot...
Scatter graphs (length B)
My X axis will be the IQ and my Y axis will be people’s guesses. What I have in mind is plotting the guesses of both girls with IQ > 100 and girls with IQ ≤ 100 on the same graph. Then, I’ll start from the actual length (59.5 cm) on the Y axis, go across to the trend line and then down to the X axis; the IQ I obtain will tell me who is most likely to guess correctly: girls with IQ > 100 or ≤ 100.
After collecting all the data (which I did in a separate sheet), I obtained this graph:
[Figure: scatter graph, KS3 girls. Axes: estimated length B (cm) against IQ, with a trend line.]
The trend line was automatically drawn by the computer. If we find the correct value of length B (59.5 cm) on the Y axis, follow it across and then down when it reaches the trend line, we discover that the IQ most likely to guess correctly is 110. This is above 100, which means that, from the scatter graph, we conclude that KS3 girls with an IQ > 100 are better at estimating length B.
Hypothesis 1 conclusion
The hypothesis was partly proved – KS3 girls with an IQ > 100 can estimate length B more accurately, but not length A.
Testing the hypotheses, part 2
A reminder of the second hypothesis: KS4 girls over 1.65 m estimate length B more accurately than KS3 girls with a height equal to or smaller than 1.60 m. This time I will use different methods:
Pie charts
Starting with KS3 girls ≤ 1.60 m, I will produce a pie chart for each of the groups and then compare them. We first find the frequency sum (26), and then divide the frequency of each interval by the total frequency so we obtain a proportion. To find the angle each interval should have in the pie chart, we multiply each proportion by 360˚. These workings are shown in the following table, and the finished pie chart is below:
KS3 girls ≤ 1.60 m
[Figure: pie chart, KS3 girls ≤ 1.60 m (guesses in cm), five slices.]
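The pie chart working described above (frequency divided by total frequency, then multiplied by 360˚) can be sketched in Python. The interval labels and frequencies below are hypothetical placeholders, chosen only so that they add up to the stated total of 26; they are not the values from the coursework table.

```python
# HYPOTHETICAL grouped data: five intervals whose frequencies sum to 26.
frequencies = {
    "30-39": 1,
    "40-49": 3,
    "50-59": 12,
    "60-69": 7,
    "70-89": 3,
}


def pie_angles(freqs):
    """Return the angle in degrees for each interval: proportion x 360."""
    total = sum(freqs.values())
    return {label: f / total * 360 for label, f in freqs.items()}


angles = pie_angles(frequencies)

for label, angle in angles.items():
    print(f"{label} cm: {angle:.1f} degrees")
```

A quick check on any such table is that the angles add up to 360˚.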
The same thing was done for KS4 girls > 1.65 m:
KS4 girls > 1.65 m
[Figure: pie chart, KS4 girls > 1.65 m (guesses in cm), three slices.]
The correct value of length B is 59.5 cm. If we look at the interval with the biggest frequency (the one with the largest slice), it contains the correct guess in both situations, so we can say that both groups have mostly guessed accurately. But for me to know which of the groups has guessed the most accurately, I’ll test this hypothesis again with a different method.
Ordered stem and leaf diagrams/averages
To test this hypothesis, I need to find the basic averages for both groups – mean, median and mode. But this time I want to do more than just find the averages; instead, I’ll find them through an ordered stem and leaf diagram (one diagram for each group) showing the guesses of every girl in the sample (on the next page):
KS3 girls ≤ 1.60 m
Median To find the median’s position, we count the number of values (which is 26), and then:
$\frac{26 + 1}{2} = 13.5$th value
In this case, we will have to calculate the mean of the 13th and 14th values (which we find by counting the leaves from right to left), and this will give us the median:
$\frac{59 + 59}{2} = 59$ cm
So we just found that the median for these KS3 girls is 59 cm. This number is very close to the correct value of 59.5 cm, showing the average guess is quite accurate.
Mean
We have to add up all the values (the result is 1575) and divide it by the number of values (the frequency):
$\frac{1575}{26} = 60.58$ cm
The mean is slightly higher than the median (because it considers all the pieces of data), but 60.58 cm is still quite an accurate guess.
Mode/modal group
To find the mode, we look at the number which appears the most in the stem and leaf diagram. This is quite easy to work out – the mode for this group is, again, 59 cm (which is very accurate). The modal group is the group which contains most of the data (in this case, the stem which contains most of the data). The modal group for this set of data is clearly 52-59. This group does not contain the actual value, but it is very close (only 0.5 cm off).
Range
The range is obtained by subtracting the lowest value from the highest value. If we do this, we will get 81 – 37 = 44 cm, which is a high value (but we need to consider that the sample was also big; therefore, we should compare this to the range of the other group before drawing any conclusions). So far, this group has guessed very accurately in general, but let’s take a look at the other group and then we can make some comparisons.
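The median and mean working for this group can be sketched in Python. The function names are assumptions, and the short list used to demonstrate the median is hypothetical, since the stem and leaf diagram itself is not reproduced here; the totals 1575 and 26 are the ones quoted above.

```python
def median_position(n):
    """Position of the median in an ordered list of n values: (n + 1) / 2."""
    return (n + 1) / 2


def median_of_sorted(values):
    """Median of an already ordered list, averaging two values if needed."""
    pos = median_position(len(values))
    lower = values[int(pos) - 1]        # 1-based position to 0-based index
    upper = values[int(pos + 0.5) - 1]  # same value when pos is whole
    return (lower + upper) / 2


print(median_position(26))             # 13.5, so average the 13th and 14th
print(median_of_sorted([1, 2, 3, 4]))  # hypothetical ordered data

# Mean from the totals quoted in the text: sum of guesses / number of guesses.
ks3_mean = 1575 / 26
print(round(ks3_mean, 2))
```

When the position ends in .5, the two neighbouring values are averaged, which is what was done with the 13th and 14th values above.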
KS4 girls > 1.65 m
Median
$\frac{\sum f + 1}{2} = \frac{12 + 1}{2} = 6.5$th value
$\frac{6\text{th} + 7\text{th}}{2} = \frac{61 + 65}{2} = 63$ cm
The median for this group is further off from the actual value. According to the hypothesis, this was the group supposed to guess more accurately, so the hypothesis is looking disproven at the moment. Let’s look at the other averages before making conclusions:
Mean
$\frac{\sum fx}{\sum f} = \frac{767}{12} = 63.92$ cm
The mean guess for this group is also further off from the correct value than the mean guess of the other group. At this point, it looks like KS3 girls are guessing more accurately.
Mode/modal group
There are two modal groups: 54-58 and 70-75. When this happens, we say a set of data is bimodal.
The mode for this set of data is also more than one value; 54, 58 and 72 all appear twice. This doesn’t allow us to reach much of a conclusion with this average.
Range
The range for this group is 75 – 54 = 21 cm. This is less than half of the other group’s range, meaning this set of data is much more consistent, and therefore, better. Unfortunately for the hypothesis, this is the only statistical value pointing in favour of this group.
This hypothesis seems disproven at the moment, because KS3 girls ≤ 1.60 m have guessed length B more accurately than KS4 girls > 1.65 m. However, I still think KS4’s sample was slightly too small, and that might have influenced some of the results. I think I will have to try another method to be sure.
Standard deviation
This method considers all of the data, and it gives us an idea of how spread out or consistent the data is in relation to an average (the mean). Both groups have a similar mean, but the group with more consistent data and an average closer to the correct value will be the group that has guessed more accurately.
KS3 girls ≤ 1.60 m
The formula for the calculation of standard deviation is the following:
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
The lower case letter sigma symbolises standard deviation, and it is just one of the symbols. The x bar is the symbol for the mean (which I have calculated earlier – 60.58 cm). The calculations require a table, which is below:
KS3 girls ≤ 1.60 m
The number emphasised in blue is the main figure we need for the calculation, which we can now make:
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \Leftrightarrow \sigma = \sqrt{\frac{2486.3464}{26}} \Leftrightarrow \sigma = \sqrt{95.63} = 9.78$ cm
We have now discovered that the standard deviation for the guesses of KS3 girls ≤ 1.60 m is 9.78 cm. This means that, in general (assuming the guesses are roughly normally distributed), about 68% of the total population would be within 9.78 cm above or below the mean (60.58 cm). In other words, about 68% of the population has guessed between 50.8 cm and 70.36 cm. Let’s make the calculations for KS4 girls > 1.65 m and compare them (the mean for this group is 63.92 cm):
KS4 girls > 1.65 m
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \Leftrightarrow \sigma = \sqrt{\frac{664.9168}{12}} \Leftrightarrow \sigma = \sqrt{55.41} = 7.44$ cm
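Both standard deviations can be reproduced from the sums of squared deviations quoted in the working. The sketch below uses only those totals; the function names are assumptions made for the example.

```python
import math


def sd_from_sum_of_squares(sum_sq_dev, n):
    """Standard deviation from the sum of (x - mean)^2 and the number of values."""
    return math.sqrt(sum_sq_dev / n)


def one_sd_interval(mean, sd):
    """Interval from one standard deviation below the mean to one above it."""
    return mean - sd, mean + sd


ks3_sd = sd_from_sum_of_squares(2486.3464, 26)  # KS3 girls <= 1.60 m
ks4_sd = sd_from_sum_of_squares(664.9168, 12)   # KS4 girls > 1.65 m

print(round(ks3_sd, 2), round(ks4_sd, 2))

# Interval quoted in the text for the KS3 group (mean 60.58 cm, sd 9.78 cm).
low, high = one_sd_interval(60.58, 9.78)
print(round(low, 2), round(high, 2))
```

The smaller result belongs to the group whose guesses lie closer together around its own mean; it says nothing by itself about how close that mean is to the actual length.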
This means KS4 girls’ guesses are less spread out from the average – meaning KS4’s data is more consistent and closer to the average of 63.92 cm. Considering this, I conclude... Hypothesis 2 conclusion The hypothesis was proven – KS4 girls can estimate length B more accurately than KS3 girls, because their data was more consistent and closer to the correct value.
Testing the hypotheses, part 3
The third hypothesis was: KS4 boys estimate angle D more accurately than KS3 boys.
Percentages
I’m planning to count every correct estimate in KS3 and KS4 and divide that number by the total number of individuals in the sample. The result multiplied by 100 will give me the percentage of correct guesses, and the group with the largest figure will be the group that can guess more accurately.
KS3 boys
The total number of individuals in the KS3 sample is 42. Out of these, 4 boys guessed angle D would be 145˚, which is correct. As a percentage, that is:
$\frac{4}{42} \times 100 = 9.52$
So the percentage of correct guesses in KS3 is 9.52%.
KS4 boys
There are 21 pieces of data in total in the KS4 sample. Out of all these individuals, 2 have guessed the exact value of 145˚. But wait a minute! The proportion 4 out of 42 has the same value as 2 out of 21! That means KS3 boys and KS4 boys have the exact same proportion of correct guesses, so we have no means of knowing who guesses more accurately by just using this method.
CF graphs/box plots
Perhaps we might be able to reach a conclusion by finding out each group’s median and determining which one has the most consistent data.
KS3 boys
This group has 42 pieces of data, so the median’s position will be half of this (21). Q1’s position will be halfway to the median (10.5). Q3 will be ¾ of the way through the total frequency (21 + 10.5 = 31.5). The points are always plotted at the upper bound of the interval.
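The percentage comparison made earlier in this part (4 out of 42 against 2 out of 21) can be checked exactly with Python’s fractions module, which avoids any rounding. The variable names are assumptions made for the example.

```python
from fractions import Fraction

# Correct guesses of 145 degrees out of each sample, as counted in the text.
ks3_correct = Fraction(4, 42)
ks4_correct = Fraction(2, 21)

# Fractions are reduced automatically, so equal proportions compare equal.
same_proportion = ks3_correct == ks4_correct

ks3_percent = round(float(ks3_correct) * 100, 2)
ks4_percent = round(float(ks4_correct) * 100, 2)

print(ks3_correct, ks4_correct, same_proportion)
print(ks3_percent, ks4_percent)
```

Both fractions reduce to the same value, which confirms that this method cannot separate the two groups.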
KS3 boys
[Figure: cumulative frequency graph, KS3 boys. Axes: estimated angle D (˚) against cumulative frequency.]
[Figure: box plot, KS3 boys. Axis: estimated angle D (˚).]
After plotting the CF graph, I found that:
Lowest value = 100˚
Q1 = 118.5˚
Q2 = 135˚
Q3 = 141.5˚
Highest value = 160˚
Considering this data, I drew a box plot which is shown below the CF graph. The central 50% of the population (represented by the box) looks quite centred, meaning that the data is not mainly made up of high or low values. It looks like the 3rd quarter of the data is the section where the values are most tightly packed (25% of the data lies between 135˚ and 141.5˚).
KS4 boys
First, we need a table:
KS4 boys
The total population of this sample is 21, so Q2’s position will be half of this (10.5). Q1’s position is half of Q2’s (5.25). Q3’s position is three times Q1’s (15.75). When analysing the CF graph (on the next page), I discovered that:
Lowest value = 125˚
Q1 = 128˚
Q2 = 134.5˚
Q3 = 143.5˚
Highest value = 160˚
I then used this data to draw the box plot (shown below the CF graph). The comparison of the two box plots is shown on page 30.
[Figure: cumulative frequency graph, KS4 boys. Axes: estimated angle D (˚) against cumulative frequency.]
[Figure: box plot, KS4 boys. Axis: estimated angle D (˚).]
Comparing box plots
[Figure: box plots for KS3 boys and KS4 boys on the same scale. Axis: estimated angle D (˚).]
KS3 boys have a larger spread (this is not affected by the size of the sample, because KS4’s data could be as spread out as KS3’s, regardless of the fact that the sample is only half the size of KS3’s sample). This means KS4’s data is more consistent and, possibly, more accurate.
KS4’s interquartile range is smaller than KS3’s (143.5 – 128 = 15.5˚ against 141.5 – 118.5 = 23˚).
Neither group’s interquartile range contains the correct value of 145˚.
KS3’s median is larger by 0.5˚ (135˚ against 134.5˚).
Since KS4’s data is more consistent and closer to the correct value, we can say...
Hypothesis 3 conclusion This hypothesis was proven – KS4 boys can estimate angle D more accurately than KS3 boys.
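The box plot comparison for this hypothesis rests on a few subtractions, sketched below in Python. The five values for each group are the ones read off the CF graphs in the text; the function name and dictionary keys are assumptions made for the example.

```python
ACTUAL_ANGLE_D = 145  # degrees

# (lowest, Q1, Q2, Q3, highest) as read from the CF graphs in the text.
summaries = {
    "KS3 boys": (100, 118.5, 135, 141.5, 160),
    "KS4 boys": (125, 128, 134.5, 143.5, 160),
}


def describe(summary, actual=ACTUAL_ANGLE_D):
    """Return the range, interquartile range and median error of one group."""
    lowest, q1, q2, q3, highest = summary
    return {
        "range": highest - lowest,
        "iqr": q3 - q1,
        "median_error": abs(q2 - actual),
    }


comparison = {group: describe(s) for group, s in summaries.items()}

for group, stats in comparison.items():
    print(group, stats)
```

On these figures KS4 has the smaller range and interquartile range, while the two medians are within 0.5˚ of each other.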
Testing the hypotheses, part 4
The fourth hypothesis stated: KS3 blonde girls estimate angle C more accurately than KS3 girls with brown hair.
Ordered double stem and leaf diagrams
KS3 brunette and blonde girls, respectively
The real value of angle C is 65˚. The stem and leaf diagram on the left shows the guesses of KS3 brunette girls and the right hand side shows the guesses of KS3 blonde girls (as the title indicates). This way of showing a pair of stem and leaf diagrams is beneficial, because you can compare them more easily and at a glance. Some information we can immediately draw from this method:
Both samples’ modal group is 60˚ - 69˚ (the longest row of leaves), which contains the correct guess.
The sample of brunettes is multimodal (trimodal). The 3 modes are 45˚, 54˚ and 65˚ (all of these appear 3 times), with 65˚ being the correct value of the angle.
The sample has a range of 82˚ - 45˚ = 37˚.
The total frequency of the sample is 20. The position of the median is the 10.5th value, which is between 62˚ and 62˚, so the median is 62˚.
The sample of blondes is unimodal: the mode is 65˚ (the correct value), which appears three times in the sample.
The range of this sample is 75˚ - 31˚ = 44˚. This value is 7˚ larger than the range of the sample of KS3 brunette girls. Although the range is higher, it is only so because of the three lowest values: 31˚, 32˚ and 32˚. The rest of the values are concentrated between 50˚ and 75˚, which would be practically a range of only 25˚.
The total frequency of this sample is 22. The position of the median is therefore the 11.5th value, which corresponds to the value of (63 + 65) ÷ 2 = 64˚. The median of the guesses of blonde girls is 2˚ larger than the median of the guesses of brunette girls, but closer to the actual value of the angle.
All these statistics point to KS3 blonde girls as the group who can estimate angle C more accurately, but I should test the hypothesis with more than one method to remove any doubts.
Relative frequency
I am thinking of obtaining the relative frequency of the correct guess for each of the groups. The way to obtain this statistic is to divide the frequency of the value we want to measure (in this case, 65˚) by the total frequency of the sample. After doing this for each of the groups, the sample with the highest relative frequency of correct guesses will be the sample that can guess more accurately.
KS3 brunette girls
Total frequency = 20
Frequency of correct guesses = 3
Calculation of relative frequency of correct guesses:
$\frac{\text{Correct guesses}}{\text{Total guesses}} = \frac{3}{20} = 0.15$
KS3 blonde girls
Total frequency = 22
Frequency of correct guesses = 3
Calculation of relative frequency of correct guesses:
$\frac{\text{Correct guesses}}{\text{Total guesses}} = \frac{3}{22} \approx 0.14$
The relative frequency points to KS3 brunette girls as the ones who guess more accurately (although just by a short margin). This means the two methods I have used are against each other, because each one identifies a different sample as the one that can estimate more accurately.
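The two relative frequencies above can be reproduced with a one-line function. This is a minimal sketch; the function and variable names are assumptions, and the counts are the ones given in the text.

```python
def relative_frequency(correct, total):
    """Relative frequency of correct guesses: correct guesses / total guesses."""
    return correct / total


brunette = relative_frequency(3, 20)  # KS3 brunette girls
blonde = relative_frequency(3, 22)    # KS3 blonde girls

print(round(brunette, 2), round(blonde, 2))
print("difference:", round(brunette - blonde, 3))
```

With the same number of correct guesses in both groups, the difference comes entirely from the two extra girls in the blonde sample.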
However, relative frequency is more accurate when a very large number of trials have been made (normally at least 300), and that suggests that these results do not portray the data very well. Unfortunately, I am short of time to develop this coursework further, so I will have to say...
Hypothesis 4 conclusion
Based on my ordered double stem and leaf diagram and the analysis I took from it, this hypothesis is proven – KS3 blonde girls have more consistent data and the median is closer to the actual value. The blonde girls’ mode is also more accurate (the sample is unimodal against the trimodal result of the brunette girls). Because of all these facts, I believe KS3 blonde girls are the ones who can estimate angle C more accurately, just like my original hypothesis stated.
Conclusion
Here are the conclusions to each of my hypotheses:
Hypothesis 1: KS3 girls with an IQ over 100 estimate lengths more accurately than KS3 girls with an IQ under or equal to that value.
This hypothesis was partly proved – KS3 girls with IQ > 100 can estimate length B more accurately, but not length A.
Hypothesis 2: KS4 girls over 1.65 m estimate length B more accurately than KS3 girls with a height equal to or smaller than 1.60 m.
The hypothesis was proven – KS4 girls can estimate length B more accurately than KS3 girls, because their data was more consistent and closer to the correct value.
Hypothesis 3: KS4 boys estimate angle D more accurately than KS3 boys.
This hypothesis was proven – KS4’s data was more consistent and closer to the actual value.
Hypothesis 4: KS3 blonde girls estimate angle C more accurately than KS3 girls with brown hair.
Although this hypothesis was a bit rushed, it was proven – based on my statistics, KS3 blonde girls can estimate angle C more accurately and consistently.
There were some hindrances to my work, and a particular example was the lack of time. This was most noticeable in the testing of hypothesis 4, where my testing was limited to only one method. There were also some occasions where the samples might have been a bit small – in hypothesis 2, for example, KS4’s sample had only 12 people.
Besides my teacher, I had other sources of aid for the completion of this coursework, like mymaths.co.uk, where I got an idea of what methods might be more suitable and how to apply them properly. On some occasions I searched on Google for help (this was mainly for me to learn how to draw graphs in Excel, particularly box plots). I do not regret the decision to write this coursework completely on the computer, because it gives a neater presentation and more accuracy in the graphs (e.g. when drawing lines of best fit in scatter graphs or drawing the angles for a pie chart).
If I had the chance of doing this again, I would like to have had some more time so I could develop my testing further. I believe this would have boosted my grade further. Other ways of collecting data would also have been helpful (instead of using secondary data, perhaps I could have created my own questionnaire to get only the information I want – as you will have noticed, I didn’t even touch some of the data that was provided to me, for example KS2 statistics or the distance pupils have to cover every day to come to school).
Despite these obstacles and errors (and some others you might find), I still have hope that I might achieve a grade in this coursework that will satisfy me, but as you know, that is not my decision, but yours!