WEBVTT Kind: captions; language: en-us
NOTE
Treffsikkerhet: 89% (H?Y)
00:00:00.000 --> 00:00:08.800
In this video we will talk about standard scores. We will see how standard scores are calculated, what
00:00:08.800 --> 00:00:17.000
they represent, and why they are so useful. To motivate the idea of a standard score, let us look at
00:00:17.000 --> 00:00:24.900
some of our actual data. Let's go to jamovi. Let us first load our data set
NOTE
Treffsikkerhet: 91% (H?Y)
00:00:27.500 --> 00:00:33.900
and let us look at some of our vocabulary scores.
NOTE
Treffsikkerhet: 91% (H?Y)
00:00:33.900 --> 00:00:43.000
The first child listed here has a vocabulary score of 47. This means that the child answered 47
00:00:43.000 --> 00:00:49.000
questions correctly, or selected 47 pictures correctly.
NOTE
Treffsikkerhet: 91% (H?Y)
00:00:49.000 --> 00:00:59.300
Okay, is that a good thing or a bad thing? Is that a good performance or a bad performance? This 47 is
00:00:59.300 --> 00:01:09.100
essentially meaningless unless we know what kids are expected to do with this test at this age. Let
00:01:09.100 --> 00:01:17.000
us see what the other kids in our small sample did. I'll go to Exploration, Descriptives
NOTE
Treffsikkerhet: 91% (H?Y)
00:01:17.600 --> 00:01:21.300
and choose to look at
NOTE
Treffsikkerhet: 91% (H?Y)
00:01:21.300 --> 00:01:33.950
the mean, median, minimum and maximum performance. So for vocabulary in kindergarten we see that scores
00:01:33.950 --> 00:01:38.900
ranged between 38 and 91.
NOTE
Treffsikkerhet: 84% (H?Y)
00:01:38.900 --> 00:01:48.050
And the mean score was 61.3. So compared to 61.3,
NOTE
Treffsikkerhet: 85% (H?Y)
00:01:48.050 --> 00:01:52.500
our score of 47 seems quite low,
NOTE
Treffsikkerhet: 91% (H?Y)
00:01:52.500 --> 00:01:55.699
but how low is it really?
NOTE
Treffsikkerhet: 91% (H?Y)
00:01:55.699 --> 00:02:03.800
What proportion of children score higher or lower than 47? That would be a much more useful kind of
00:02:03.800 --> 00:02:12.300
information to understand the meaning of this 47. Let us look at the standard deviation for our
00:02:12.300 --> 00:02:13.900
sample.
NOTE
Treffsikkerhet: 91% (H?Y)
00:02:18.300 --> 00:02:36.200
The standard deviation is 12.1. So our score of 47 is 47 - 61.3 = -14.3, that is, more than 14 points below the
00:02:36.200 --> 00:02:37.350
mean.
NOTE
Treffsikkerhet: 91% (H?Y)
00:02:37.350 --> 00:02:51.100
And 14 points below the mean is more than one standard deviation. In fact, by dividing 14.3 by 12.1, it
00:02:51.100 --> 00:03:01.900
turns out that this child with the score of 47 correct choices is actually 1.18 standard deviations
00:03:01.900 --> 00:03:07.050
below the mean. What does that mean? To understand
NOTE
Treffsikkerhet: 91% (H?Y)
00:03:07.050 --> 00:03:14.400
the proportion that is associated with each score, we can go to our normal probability display. So
00:03:14.400 --> 00:03:25.800
this is our normal probability display, and let us go to a score of -1.18, that is, 1.18 standard
00:03:25.800 --> 00:03:28.300
deviations below the mean.
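NOTE
The arithmetic just described can be checked with a short Python sketch. The numbers are the ones quoted in the video; the variable names are only illustrative and do not come from the jamovi data set.
```python
# Values quoted in the video; names are illustrative assumptions.
raw_score = 47
sample_mean = 61.3
sample_sd = 12.1
deviation = raw_score - sample_mean  # about -14.3 points (below the mean)
z = deviation / sample_sd            # about -1.18 standard deviations
print(f"deviation = {deviation:.1f}, z = {z:.2f}")
```
The negative sign simply records that the score lies below the mean.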
NOTE
Treffsikkerhet: 91% (H?Y)
00:03:28.900 --> 00:03:37.900
This is where this child lies with respect to the normal distribution, when we think about it in
00:03:37.900 --> 00:03:40.500
terms of standard deviations.
NOTE
Treffsikkerhet: 91% (H?Y)
00:03:40.900 --> 00:03:51.700
And if we click "up to", we see that 1.18 standard deviations below the mean is associated with a
00:03:51.700 --> 00:04:01.700
probability of about 12%. So this means that our child is at the 12th percentile: this means that 88
00:04:01.700 --> 00:04:09.899
percent of children score higher, and 12 percent of children score at 47 or below.
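NOTE
As a rough cross-check of the value read off the display, the same proportion can be computed with Python's standard library. This is only a sketch: it assumes the scores follow a normal distribution, and it is not the tool used in the video.
```python
from statistics import NormalDist
z = -1.18                      # standard score quoted in the video
below = NormalDist().cdf(z)    # proportion at or below z under a standard normal
above = 1 - below              # proportion scoring higher
print(f"below: {below:.1%}, above: {above:.1%}")
```
The result should land close to the 12% and 88% quoted above.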
NOTE
Treffsikkerhet: 87% (H?Y)
00:04:09.899 --> 00:04:20.300
Indeed, if you go back to jamovi and count how many of the vocabulary in kindergarten values are
00:04:20.300 --> 00:04:30.800
higher than 47, you will see that there are 42 of them. 42 out of 47 children in the sample is about 89
00:04:30.800 --> 00:04:38.000
percent, which closely matches what we expected based on looking at the percentile from the normal
00:04:38.000 --> 00:04:39.250
distribution.
NOTE
Treffsikkerhet: 91% (H?Y)
00:04:39.250 --> 00:04:48.799
So what we did by using information from the mean and standard deviation was to convert this
00:04:48.799 --> 00:04:57.700
meaningless raw score of 47 into a value that is directly interpretable as a proportion of children
00:04:57.700 --> 00:05:07.000
scoring higher. So we turned an almost useless number into a highly informative quantity that tells
00:05:07.000 --> 00:05:09.450
us how high or how low this
NOTE
Treffsikkerhet: 76% (H?Y)
00:05:09.450 --> 00:05:18.000
child scores on this test in comparison with the same age range, or at least in comparison with
00:05:18.000 --> 00:05:19.950
a relevant sample.
NOTE
Treffsikkerhet: 91% (H?Y)
00:05:19.950 --> 00:05:28.900
This is the distribution of vocabulary raw scores in kindergarten for our entire sample of 47
00:05:28.900 --> 00:05:37.500
children. Here is the histogram, and on top of the histogram I have plotted a normal curve with the
00:05:37.500 --> 00:05:43.150
same mean and standard deviation as the sample, for comparison.
NOTE
Treffsikkerhet: 85% (H?Y)
00:05:43.150 --> 00:05:52.000
So it looks like the histogram is a fairly good approximation of this normal curve, although there
00:05:52.000 --> 00:06:00.350
seems to be a slight excess of children scoring between 50 and 60 compared to children scoring
00:06:00.350 --> 00:06:02.500
around 60.
NOTE
Treffsikkerhet: 91% (H?Y)
00:06:02.600 --> 00:06:12.000
Anyway let us use this sample to see how we go from the raw score values, which as we saw are
00:06:12.000 --> 00:06:18.900
essentially uninformative, to those values that are informative, that are interpretable as
00:06:18.900 --> 00:06:27.700
proportions of the relevant sample. Recall that the mean of this sample was 61.3,
NOTE
Treffsikkerhet: 91% (H?Y)
00:06:27.700 --> 00:06:31.750
and the standard deviation was 12.1.
NOTE
Treffsikkerhet: 91% (H?Y)
00:06:31.750 --> 00:06:38.900
So the first thing we do is subtract the mean from each value.
NOTE
Treffsikkerhet: 91% (H?Y)
00:06:39.100 --> 00:06:50.200
If we subtract 61.3 from each value, then we get this histogram. Now, as you will notice, its shape is
00:06:50.200 --> 00:06:57.700
identical. Of course the distribution of values hasn't changed; what has changed is that every value
00:06:57.700 --> 00:07:08.200
is 61.3 less than it used to be. So the dispersion is unchanged, and indeed the new set of numbers has
00:07:08.200 --> 00:07:09.299
the same standard deviation
NOTE
Treffsikkerhet: 82% (H?Y)
00:07:09.299 --> 00:07:18.000
as before. But since we subtracted the mean from every value, the mean is now zero, because a value
00:07:18.000 --> 00:07:27.350
that used to be around 61 is now around zero after subtracting 61.3, and the same is true for every value.
00:07:27.350 --> 00:07:38.200
So values that used to be around 61 are now around 0. It's like shifting the histogram so that the
00:07:38.200 --> 00:07:39.250
mean is now
NOTE
Treffsikkerhet: 77% (H?Y)
00:07:39.250 --> 00:07:40.550
at zero.
NOTE
Treffsikkerhet: 91% (H?Y)
00:07:40.550 --> 00:07:43.850
That's what subtracting the mean did.
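NOTE
The effect of subtracting the mean can be sketched in Python. The scores below are made up for illustration and are NOT the data set from the video; stdev here is the sample standard deviation (n - 1 in the denominator), which is assumed to match what the descriptives table reports.
```python
from statistics import mean, stdev
# Made-up raw scores for illustration only; NOT the data set used in the video.
raw = [38, 47, 55, 58, 61, 64, 70, 77, 91]
m = mean(raw)
centered = [x - m for x in raw]  # subtract the mean from every value
# The mean moves to (numerically) zero, while the spread is untouched.
print(abs(mean(centered)) < 1e-9)
print(abs(stdev(centered) - stdev(raw)) < 1e-9)
```
Both checks print True: the histogram has shifted, but its width has not changed.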
NOTE
Treffsikkerhet: 87% (H?Y)
00:07:43.850 --> 00:07:49.800
Let us now divide every value by this standard deviation.
NOTE
Treffsikkerhet: 91% (H?Y)
00:07:51.600 --> 00:08:00.200
Again, it should not be a surprise that the shape of the histogram is identical. We did not change the relative
00:08:00.200 --> 00:08:11.100
standing of any number, we just divided every number by 12.1. The mean remains unchanged because it's
00:08:11.100 --> 00:08:19.400
not affected by the division: what used to be zero is still zero. What did change was the actual
00:08:19.400 --> 00:08:22.150
values of the numbers, so that
NOTE
Treffsikkerhet: 91% (H?Y)
00:08:22.150 --> 00:08:26.350
a number that used to be 12 points above the mean,
NOTE
Treffsikkerhet: 86% (H?Y)
00:08:26.350 --> 00:08:34.600
which was about one standard deviation above the mean, well, divided by 12.1, it is now around
00:08:34.600 --> 00:08:43.150
one. So being 12 points above the mean is now one. So being one above the mean is now
NOTE
Treffsikkerhet: 91% (H?Y)
00:08:43.150 --> 00:08:52.600
one standard deviation above the mean. Indeed, the numbers here are interpretable as how many standard
00:08:52.600 --> 00:08:57.650
deviations above or below the mean the original value was,
NOTE
Treffsikkerhet: 91% (H?Y)
00:08:57.650 --> 00:09:05.800
and that's what a standard score is. So we have calculated standard scores, otherwise known as
00:09:05.800 --> 00:09:09.300
z-scores, using the formula:
NOTE
Treffsikkerhet: 79% (H?Y)
00:09:09.500 --> 00:09:18.800
raw score, which is our original measurement, minus the mean of all the measurements, divided by the
00:09:18.800 --> 00:09:29.200
standard deviation. This is called a z-score or standard score. And a z-score is exactly a number
00:09:29.200 --> 00:09:37.400
that expresses how many standard deviations away from the mean the original measurement was. So a
00:09:37.400 --> 00:09:39.200
positive z-score means the
NOTE
Treffsikkerhet: 86% (H?Y)
00:09:39.200 --> 00:09:45.800
original measurement was above the mean, and a negative z-score means the original measurement was
00:09:45.800 --> 00:09:47.800
below the mean.
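NOTE
Written compactly, the formula described above is z = (raw score - mean) / standard deviation. A minimal Python sketch follows; the function name z_scores and the scores are made up for illustration (NOT the data set from the video), and the sample standard deviation is assumed.
```python
from statistics import mean, stdev
def z_scores(values):
    """Return (x - mean) / SD for every x, using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]
# Made-up raw scores for illustration only; NOT the data set used in the video.
raw = [38, 47, 55, 58, 61, 64, 70, 77, 91]
z = z_scores(raw)
for x, zx in zip(raw, z):
    side = "above" if zx > 0 else "below"
    print(f"raw {x}: z = {zx:+.2f} ({side} the mean)")
```
After this transformation the values have a mean of zero and a standard deviation of one, and the sign of each z-score shows on which side of the mean the raw score fell.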
NOTE
Treffsikkerhet: 91% (H?Y)
00:09:48.100 --> 00:09:57.900
The reason this is so useful is that if our data are approximately normally distributed, or can be
00:09:57.900 --> 00:10:05.700
brought to be approximately normally distributed, then the number of standard deviations above or
00:10:05.700 --> 00:10:14.400
below the mean is directly linked to a probability, because of the direct link of probabilities to
00:10:14.400 --> 00:10:17.349
positions in the normal distribution.
NOTE
Treffsikkerhet: 89% (H?Y)
00:10:17.349 --> 00:10:27.300
And as you recall, probability is thought of as frequency or proportion: how often things happen,
00:10:27.300 --> 00:10:38.700
or in what proportion of instances things happen. So a probability of 12% is the same as a 12
00:10:38.700 --> 00:10:41.500
percent proportion of the sample,
NOTE
Treffsikkerhet: 91% (H?Y)
00:10:41.500 --> 00:10:46.800
or of the population, depending on how things were computed.
NOTE
Treffsikkerhet: 91% (H?Y)
00:10:46.800 --> 00:10:57.400
So by using a z-score, through the normal distribution, we can link to percentiles: the proportion of
00:10:57.400 --> 00:11:03.050
children scoring below and above our original raw score.
NOTE
Treffsikkerhet: 86% (H?Y)
00:11:03.050 --> 00:11:12.100
And therefore we have computed a very informative measure out of the raw score that tells us the
00:11:12.100 --> 00:11:20.200
relative standing of this child as a proportion of the sample, and that's what a z-score or standard
00:11:20.200 --> 00:11:24.000
score is, and why it is so useful.