WEBVTT Kind: captions; language: en-us
NOTE
Accuracy: 87% (HIGH)
00:00:00.000 --> 00:00:08.700
In this video we will talk about test standardization. In particular, we will talk about rescaling of
00:00:08.700 --> 00:00:16.300
standard scores to easily understood numbers, and about norming, or standardization, samples, whose
00:00:16.300 --> 00:00:22.900
measurements are used to derive the conversion. These are essential elements of every assessment
00:00:22.900 --> 00:00:29.750
test. As you recall from our presentation of standard scores
NOTE
Accuracy: 88% (HIGH)
00:00:29.750 --> 00:00:41.400
when we have a set of measurements on a test, we can subtract the mean and get to a centered set of
00:00:41.400 --> 00:00:47.200
measurements with a mean of zero and the same standard deviation as the original, and then by
00:00:47.200 --> 00:00:53.900
dividing by the standard deviation we get standard scores, which have a mean of 0 and a
00:00:53.900 --> 00:00:58.700
standard deviation of one, and these are the z-scores
NOTE
Accuracy: 91% (HIGH)
00:00:58.700 --> 00:01:05.800
and they are essentially the original scores expressed as the number of standard deviations away from
00:01:05.800 --> 00:01:14.300
the mean. Now these kinds of numbers are not very easy to perceive or remember for many people
00:01:14.300 --> 00:01:21.600
because they have decimal digits and they're also often negative, indicating performance below the
00:01:21.600 --> 00:01:28.600
mean. So negatives and decimals make these numbers a bit cumbersome for many people,
NOTE
Accuracy: 90% (HIGH)
00:01:28.600 --> 00:01:35.100
and therefore what we can do to correct that and make them more convenient without changing their
00:01:35.100 --> 00:01:44.300
essence is to rescale them to any mean and standard deviation we would like. So if we multiply them by a number such as 15
00:01:44.300 --> 00:01:52.800
and then add a hundred to every one of them, we get a distribution with exactly the same shape,
00:01:52.800 --> 00:01:58.699
the same as that of the original raw scores, but this distribution
NOTE
Accuracy: 81% (HIGH)
00:01:58.699 --> 00:02:04.900
has a mean of a hundred and a standard deviation of 15, the two numbers that we used to derive
00:02:04.900 --> 00:02:05.850
it.
NOTE
Accuracy: 84% (HIGH)
00:02:05.850 --> 00:02:14.600
So someone who was one standard deviation above the mean is now 15 above the mean, because we
00:02:14.600 --> 00:02:22.600
multiplied everything by 15. Someone who was 2 standard deviations below the mean is now 2 times 15,
00:02:22.600 --> 00:02:27.650
that is 30, below the mean, and 30 below a hundred is 70.
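NOTE
Editorial illustration: the rescaling described above can be sketched in a few lines of Python. This is only a minimal sketch; the raw scores are made-up numbers and the name rescale is a hypothetical helper, not something taken from the lecture or from any real test.
```python
from statistics import mean, pstdev
# Hypothetical raw scores (made-up numbers, not from any real test).
raw_scores = [12, 15, 18, 21, 24]
m = mean(raw_scores)
sd = pstdev(raw_scores)  # population SD; the sample SD (stdev) is another common choice
# Step 1: subtract the mean and divide by the standard deviation to get z-scores.
z_scores = [(x - m) / sd for x in raw_scores]
# Step 2: multiply by the new SD and add the new mean.
def rescale(z, new_mean=100, new_sd=15):
    return new_mean + new_sd * z
iq_like = [rescale(z) for z in z_scores]
print(rescale(1))   # one standard deviation above the mean
print(rescale(-2))  # two standard deviations below the mean
print(rescale(0))   # exactly average performance
```
The order of the rescaled scores, and their distances in standard deviation units, are unchanged; only the mean and standard deviation of the scale differ.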
NOTE
Accuracy: 80% (HIGH)
00:02:27.650 --> 00:02:39.100
Someone who had exactly average performance would be at 100. So what we did was transform the actual
00:02:39.100 --> 00:02:47.000
distribution of raw scores that we measured into an equivalent distribution with a mean of a hundred
00:02:47.000 --> 00:02:54.400
and a standard deviation of 15, which are numbers that are easy to remember, as long as we know and do not
00:02:54.400 --> 00:02:57.500
forget what they represent.
NOTE
Accuracy: 86% (HIGH)
00:02:58.500 --> 00:03:08.100
So standardized scores can be rescaled. The original standard score is the z-score, but we can
00:03:08.100 --> 00:03:15.800
scale them to be like an IQ scale, with a mean of a hundred and a standard deviation of
00:03:15.800 --> 00:03:25.700
15. This is what IQ scores actually are: standardized numbers that are made to have a mean of
00:03:25.700 --> 00:03:29.300
100 and a standard deviation of 15.
NOTE
Accuracy: 91% (HIGH)
00:03:29.300 --> 00:03:36.200
These are based on the actual raw scores of how many questions were answered, which are uninformative numbers
00:03:36.200 --> 00:03:42.100
in themselves because in order to interpret those properly we would need to know the mean and
00:03:42.100 --> 00:03:47.050
standard deviation of everyone answering the questions on the test.
NOTE
Accuracy: 91% (HIGH)
00:03:47.050 --> 00:03:54.600
There are other scales you are probably familiar with, such as WISC subscales, which are made to
00:03:54.600 --> 00:04:01.000
have a mean of 10 and a standard deviation of three using an operation similar to the one we just
00:04:01.000 --> 00:04:07.750
demonstrated, and there are many other possibilities of scaled scores that are used in assessment tests,
00:04:07.750 --> 00:04:16.100
and they all have the same interpretation: remember to always think of percentiles. So all kinds of
00:04:16.100 --> 00:04:17.299
scaled scores
NOTE
Accuracy: 73% (MEDIUM)
00:04:17.299 --> 00:04:25.200
are informative to the extent that you know what percentile, that is, what proportion of the population, each score
00:04:25.200 --> 00:04:26.950
corresponds to.
NOTE
Accuracy: 91% (HIGH)
00:04:26.950 --> 00:04:35.200
Standardization of a test begins with selecting a sample, that is, a group of people, adults or
00:04:35.200 --> 00:04:37.250
children, to be measured.
NOTE
Accuracy: 89% (HIGH)
00:04:37.250 --> 00:04:44.100
This is called the norming sample and it is extremely important for the norming sample to be
00:04:44.100 --> 00:04:51.100
representative of the population, that is, to have the same mean and standard deviation as the whole
00:04:51.100 --> 00:04:58.200
population. Of course we can never measure everyone so we have to carefully select our sample so that
00:04:58.200 --> 00:05:04.300
we can reasonably expect it to have a mean and standard deviation that will be very close to those of the
00:05:04.300 --> 00:05:07.600
whole population, so the sample must have the same average
NOTE
Accuracy: 90% (HIGH)
00:05:07.600 --> 00:05:14.100
performance and a similar dispersion of performance as the whole population. Once the norming
00:05:14.100 --> 00:05:21.400
sample is selected everyone in this sample is administered the test and the distribution of scores
00:05:21.400 --> 00:05:27.300
on the test is checked to make sure it conforms to the normal distribution, or is brought to conform to
00:05:27.300 --> 00:05:28.150
it.
NOTE
Accuracy: 82% (HIGH)
00:05:28.150 --> 00:05:34.400
And after that we can calculate a mean and standard deviation, expecting them to reflect the
00:05:34.400 --> 00:05:42.900
population, and then calculate z-scores and provide conversion tables to z-scores, other standardized
00:05:42.900 --> 00:05:51.100
scores, scaled scores, and eventually percentiles. So for each raw score we can express the
00:05:51.100 --> 00:05:58.400
corresponding z-score, scaled score, or percentile, and these are the tables
NOTE
Accuracy: 80% (HIGH)
00:05:58.400 --> 00:06:05.500
you see accompanying every assessment test, where you look up your raw score and derive a scaled score, or
00:06:05.500 --> 00:06:07.900
a percentile, or both.
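NOTE
Editorial illustration: a conversion table of the kind described above could be sketched as follows. This is a simplified sketch under stated assumptions: the norming data are made up, the scores are assumed to be normally distributed, and conversion_row is a hypothetical helper. Published tests may build their tables differently.
```python
from statistics import NormalDist, mean, stdev
# Hypothetical norming-sample raw scores (made-up data for illustration only).
norming_raw = [8, 10, 11, 12, 12, 13, 14, 15, 16, 19]
m = mean(norming_raw)
sd = stdev(norming_raw)  # sample SD, used here as an estimate of the population SD
def conversion_row(raw, scale_mean=100, scale_sd=15):
    # Raw score to z-score, then to a scaled score and a percentile.
    z = (raw - m) / sd
    scaled = round(scale_mean + scale_sd * z)
    percentile = round(100 * NormalDist().cdf(z))  # assumes normality
    return {"raw": raw, "z": round(z, 2), "scaled": scaled, "percentile": percentile}
# One row per possible raw score in the observed range.
table = [conversion_row(r) for r in range(min(norming_raw), max(norming_raw) + 1)]
for row in table:
    print(row)
```
Looking up a raw score in such a table gives the scaled score and percentile directly, without redoing the arithmetic.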
NOTE
Accuracy: 87% (HIGH)
00:06:07.900 --> 00:06:17.600
So the result of the test is a percentile rank, the proportion of people in the norming sample this
00:06:17.600 --> 00:06:26.800
person performs better or worse than, so a relative standing. And this percentile is generalizable to
00:06:26.800 --> 00:06:32.049
the population to the extent the norming sample was representative.
NOTE
Accuracy: 87% (HIGH)
00:06:32.049 --> 00:06:40.300
Remember, percentiles derived from a test actually correspond to rankings with respect to the
00:06:40.300 --> 00:06:45.750
norming sample from which the conversion tables were derived.
NOTE
Accuracy: 91% (HIGH)
00:06:45.750 --> 00:06:51.600
Now let us look at some examples of interpreting such scores.
NOTE
Accuracy: 91% (HIGH)
00:06:51.800 --> 00:06:59.800
Let's start with the case of full scale IQ, which is scaled to a mean of 100 and a standard deviation
00:06:59.800 --> 00:07:01.250
of 15.
NOTE
Accuracy: 91% (HIGH)
00:07:01.250 --> 00:07:11.100
So what does a score of 100 on an IQ test mean? Well, 100 is equal to the standardization mean: by
00:07:11.100 --> 00:07:16.650
definition, we make IQ scaled scores have a mean of 100,
NOTE
Accuracy: 91% (HIGH)
00:07:16.650 --> 00:07:24.700
so this is a z-score of zero: not away from the mean at all, exactly equal to the mean. Therefore it
00:07:24.700 --> 00:07:34.600
corresponds to the 50th percentile. This means half the people score better and half score worse or
00:07:34.600 --> 00:07:40.400
equal to this score, so an IQ of 100 means you are in the middle.
NOTE
Accuracy: 87% (HIGH)
00:07:40.400 --> 00:07:55.400
What does an IQ of 130 mean? Well, 130 is 30 above 100, 30 above the mean, and 30 is twice the standard
00:07:55.400 --> 00:08:06.600
deviation of 15, so an IQ of 130 means two standard deviations above the mean, that is, a z-score of +2,
00:08:06.600 --> 00:08:11.500
which corresponds to the 98th percentile.
NOTE
Accuracy: 88% (HIGH)
00:08:11.500 --> 00:08:19.700
Note that we don't usually use decimal places for percentiles; we round them to integers so that
00:08:19.700 --> 00:08:25.700
they're easier to interpret. So we say the 98th percentile here.
NOTE
Accuracy: 86% (HIGH)
00:08:26.700 --> 00:08:40.100
What about an IQ of 80? Well, 80 is 20 points below the mean of 100, and 20 divided by 15 is one and
00:08:40.100 --> 00:08:47.800
one third, so one and one third standard deviations below the mean. This is what 80 means: 80 means a
00:08:47.800 --> 00:08:57.400
z-score of -1.33, which corresponds to the 9th percentile. So someone with an IQ of 80
NOTE
Accuracy: 74% (MEDIUM)
00:08:57.400 --> 00:09:06.300
has a score that is expected to be lower than ninety-one percent of the population and equal to or
00:09:06.300 --> 00:09:14.700
better than nine percent of the population, and this is the meaning of an IQ equal to 80.
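NOTE
Editorial illustration: the three IQ examples above (100, 130 and 80) can be checked with Python's standard library. This sketch assumes that IQ scaled scores follow a normal distribution with a mean of 100 and a standard deviation of 15; iq_percentile is a hypothetical helper name.
```python
from statistics import NormalDist
# Assumption: IQ scaled scores are normally distributed with mean 100 and SD 15.
iq_scale = NormalDist(mu=100, sigma=15)
def iq_percentile(iq):
    # Proportion of the distribution at or below this score, rounded to an integer.
    return round(100 * iq_scale.cdf(iq))
for iq in (100, 130, 80):
    z = (iq - 100) / 15  # distance from the mean in standard deviations
    print(iq, round(z, 2), iq_percentile(iq))
```
Each printed line shows the IQ score, its z-score and its rounded percentile.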
NOTE
Accuracy: 91% (HIGH)
00:09:15.600 --> 00:09:23.800
Let's look at a different scaling, turning to WISC subscale scores, which are standardized to
00:09:23.800 --> 00:09:28.150
have a mean of 10 and standard deviation of three.
NOTE
Accuracy: 75% (MEDIUM)
00:09:28.150 --> 00:09:37.000
What does a subscale score of 10 mean? Well, that is equal to the standardization mean, that is, a z-score
00:09:37.000 --> 00:09:44.250
of 0, corresponding to the 50th percentile. So a scaled score of 10 means that you're doing
00:09:44.250 --> 00:09:49.500
better than half, and worse than half the sample, you are in the middle.
NOTE
Accuracy: 69% (MEDIUM)
00:09:49.600 --> 00:10:00.200
What about a subscale score of 13? Well, 13 is three above the mean, which is 10, so that is one
00:10:00.200 --> 00:10:02.800
standard deviation above the mean.
NOTE
Accuracy: 80% (HIGH)
00:10:02.800 --> 00:10:10.700
A z-score of +1, one standard deviation above the mean, corresponds to the 84th percentile, so a
00:10:10.700 --> 00:10:20.900
scaled score of 13 means that you are doing better than or as well as 84% of the relevant population, that is,
00:10:20.900 --> 00:10:25.600
children of your age, if we are talking about the WISC.
NOTE
Accuracy: 85% (HIGH)
00:10:26.100 --> 00:10:35.500
What about a scaled score of 8 on a WISC subscale? Well, 8 is 2 below the mean of 10
NOTE
Accuracy: 82% (HIGH)
00:10:35.500 --> 00:10:41.450
and that is actually two thirds of a standard deviation, 2/3 of three.
NOTE
Accuracy: 79% (HIGH)
00:10:41.450 --> 00:10:51.950
So 2/3 of a standard deviation below the mean is z equal to -0.67, that's 2/3,
00:10:51.950 --> 00:10:59.900
which corresponds to the 25th percentile. So a scaled score of 8 means you are doing worse than
00:10:59.900 --> 00:11:08.900
75% of the norming sample and as well as or better than 25% of it.
NOTE
Accuracy: 82% (HIGH)
00:11:09.300 --> 00:11:20.600
What about a scaled score of 4? Well, 4 is 6 below the mean of 10, and 6 is twice three, so two standard
00:11:20.600 --> 00:11:29.000
deviations below the mean. This means a z-score of -2, which corresponds to the second percentile, so
00:11:29.000 --> 00:11:36.900
a scaled score of 4 means you're doing worse than 98% of the norming sample, or of the corresponding
00:11:36.900 --> 00:11:39.700
comparison population.
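NOTE
Editorial illustration: the subscale examples above (10, 13, 8 and 4) follow the same two steps, first to a z-score and then to a percentile. The sketch below assumes normally distributed scores on a scale with a mean of 10 and a standard deviation of 3; the function names are hypothetical.
```python
from statistics import NormalDist
SUBSCALE_MEAN = 10  # scaling described in the transcript
SUBSCALE_SD = 3
def subscale_to_z(score):
    # Distance from the scale mean in standard deviation units.
    return (score - SUBSCALE_MEAN) / SUBSCALE_SD
def z_to_percentile(z):
    # Assumes z-scores follow a standard normal distribution.
    return round(100 * NormalDist().cdf(z))
results = {}
for score in (10, 13, 8, 4):
    z = subscale_to_z(score)
    results[score] = (round(z, 2), z_to_percentile(z))
for score, (z, pct) in results.items():
    print(score, z, pct)
```
Going through the z-score explicitly makes it clear that the IQ scale and the subscale differ only in the mean and standard deviation chosen for reporting.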
NOTE
Accuracy: 91% (HIGH)
00:11:40.500 --> 00:11:48.800
We must be very careful with interpretation of test scores, and remember that although they look like
00:11:48.800 --> 00:11:56.050
numbers they're very easy to misinterpret because they don't mean what actual numbers mean.
NOTE
Accuracy: 82% (HIGH)
00:11:56.050 --> 00:12:03.400
What I mean to say with that is that scaled scores are not on a ratio or interval scale.
NOTE
Accuracy: 90% (HIGH)
00:12:03.600 --> 00:12:13.349
First of all, distances aren't comparable. The difference between two people who have scaled scores of 115
NOTE
Accuracy: 84% (HIGH)
00:12:13.349 --> 00:12:22.900
and 105 in IQ, or intelligence quotient, and the difference between two people who have
00:12:22.900 --> 00:12:30.550
80 and 70 are not comparable. They both look like 10-point differences, and they are 10-point differences,
00:12:30.550 --> 00:12:36.400
but that doesn't mean they're equal differences. There is nothing that can be said about comparing
00:12:36.400 --> 00:12:42.800
these differences, because these numbers aren't on a scale that has distances.
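NOTE
Editorial illustration: one way to see that equal point differences are not equal in meaning is to express them as percentiles. This is an illustration added here, not the speaker's own argument, and it assumes normally distributed scores with a mean of 100 and a standard deviation of 15; percentile_gap is a hypothetical helper.
```python
from statistics import NormalDist
# Assumption: IQ scaled scores are normally distributed with mean 100 and SD 15.
iq_scale = NormalDist(mu=100, sigma=15)
def percentile_gap(higher, lower):
    # Share of the population, in percent, scoring between the two values.
    return 100 * (iq_scale.cdf(higher) - iq_scale.cdf(lower))
gap_upper = percentile_gap(115, 105)  # a 10-point difference above the mean
gap_lower = percentile_gap(80, 70)    # a 10-point difference below the mean
print(round(gap_upper), round(gap_lower))
```
Under these assumptions the same 10-point difference separates very different proportions of the population, depending on where on the scale it falls.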
NOTE
Accuracy: 74% (MEDIUM)
00:12:43.000 --> 00:12:52.600
And much more obviously, ratios are completely meaningless because there is no true zero on this scale, so
00:12:52.600 --> 00:13:01.100
there is no sense in which you can say that an IQ of 120 is 50% higher than an IQ of 80, or that an
00:13:01.100 --> 00:13:09.200
IQ of 150 is a hundred percent higher than an IQ of 75, or double that, or anything of that
00:13:09.200 --> 00:13:13.750
sort. These are complete nonsense statements.
NOTE
Accuracy: 91% (HIGH)
00:13:13.750 --> 00:13:20.600
The only valid interpretation of test scores is as percentiles,
NOTE
Accuracy: 88% (HIGH)
00:13:20.600 --> 00:13:28.400
and that's why you always need to have a conversion table handy so that you can convert the scale
00:13:28.400 --> 00:13:36.300
scores to z-scores, or directly to percentiles, and know what your test performance
00:13:36.300 --> 00:13:42.400
means in terms of proportion of the population that does better or worse.