WEBVTT Kind: captions; language: en-us
NOTE
Accuracy: 91% (HIGH)
00:00:00.000 --> 00:00:05.400
In this video we will talk about the histogram.
NOTE
Accuracy: 91% (HIGH)
00:00:07.100 --> 00:00:15.050
So far we have used this kind of graph to show all the measures that we have made.
NOTE
Accuracy: 91% (HIGH)
00:00:15.050 --> 00:00:23.700
For example, if we had a set of measurements for children's knowledge of letters, that is, how many
00:00:23.700 --> 00:00:32.500
letters children know the sounds or names of, and so for each child we have a count of how
00:00:32.500 --> 00:00:38.800
many letters they actually know like this. This means that
NOTE
Accuracy: 90% (HIGH)
00:00:38.800 --> 00:00:45.100
there were two children who knew two letters,
NOTE
Accuracy: 91% (HIGH)
00:00:45.700 --> 00:00:56.900
three children who knew three letters, there weren't any children who only knew four letters, there
00:00:56.900 --> 00:01:04.900
were five children who knew five letters, one child who knew six letters, and so on for each possible
00:01:04.900 --> 00:01:08.550
value of number of letters known.
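NOTE
A minimal Python sketch of how such a frequency plot is tallied: one count per observed value. The list letters_known is illustrative only; it was built from the counts read out in the captions and is not the full data set shown on screen.
```python
from collections import Counter
# Illustrative data only, reconstructed from the counts mentioned in the captions:
# two children knew 2 letters, three knew 3, none knew 4, five knew 5, one knew 6.
letters_known = [2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 6]
# Counter maps each value to how many children obtained it.
frequency = Counter(letters_known)
# One bar per possible value; a value nobody obtained gets a bar of height zero.
for value in range(min(letters_known), max(letters_known) + 1):
    print(f"{value} letters: {'#' * frequency[value]} ({frequency[value]})")
```
With many distinct values, most bars in such a plot have a height of zero or one, which is why it becomes hard to read.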
NOTE
Accuracy: 91% (HIGH)
00:01:08.550 --> 00:01:17.050
This kind of graph has a lot of information, but is not very informative if you want to get a sense
00:01:17.050 --> 00:01:25.900
for the distribution of your data, especially if you have very large samples, or many different
00:01:25.900 --> 00:01:30.500
possible values that your observations could take.
NOTE
Accuracy: 91% (HIGH)
00:01:30.900 --> 00:01:39.500
So this is another example where we have measured a bunch of children on their reading skill with
00:01:39.500 --> 00:01:48.400
the word building fluency test, and calculated a words per minute metric for each child. So there was
00:01:48.400 --> 00:01:55.300
one child who couldn't read at all and so got zero words per minute. And there were two children who got
00:01:55.300 --> 00:01:59.199
20 words per minute, and so on.
NOTE
Accuracy: 91% (HIGH)
00:01:59.199 --> 00:02:07.300
Because there are many different values, this graph is quite flat; it doesn't really tell us how the
00:02:07.300 --> 00:02:15.300
measures are distributed. And you can imagine if we had 500 children this would not look very
00:02:15.300 --> 00:02:17.100
interpretable.
NOTE
Accuracy: 90% (HIGH)
00:02:17.100 --> 00:02:27.050
Instead of showing every single value we obtained, what you do is put ranges of values
00:02:27.050 --> 00:02:37.400
in buckets. These buckets are called bins, and bins are formally defined as consecutive, non-overlapping
00:02:37.400 --> 00:02:46.500
intervals. So those are ranges of values that touch at their ends. So for a bin width of five, the
NOTE
Accuracy: 79% (HIGH)
00:02:46.500 --> 00:02:55.900
first bucket, the first bin, would be from a value of zero to a value of five words per minute.
NOTE
Accuracy: 91% (HIGH)
00:02:56.400 --> 00:03:09.500
So how many children had reading rates from 0 to 5 words per minute? Well, one, two, three. So these
00:03:09.500 --> 00:03:19.649
three children go into the first bin that represents values between 0 and 5, and there are three
00:03:19.649 --> 00:03:24.000
values inside this bin, this bucket.
NOTE
Accuracy: 91% (HIGH)
00:03:24.000 --> 00:03:29.450
The next bin would be between 5 and 10,
NOTE
Accuracy: 77% (HIGH)
00:03:29.450 --> 00:03:42.100
so one here, and five here; that is, six values were observed between 5 and 10. And this is our second
00:03:42.100 --> 00:03:46.300
bucket that contains six measurements.
NOTE
Accuracy: 91% (HIGH)
00:03:47.000 --> 00:03:53.450
And so the next one would be between 10 and 15,
NOTE
Accuracy: 88% (HIGH)
00:03:53.450 --> 00:04:05.500
we have only one child with a rate between 10 and 15 words per minute, so this is one. We have
NOTE
Accuracy: 70% (MEDIUM)
00:04:05.500 --> 00:04:12.750
two, plus three, plus two, plus two,
NOTE
Accuracy: 84% (HIGH)
00:04:12.750 --> 00:04:16.299
that is nine
NOTE
Accuracy: 91% (HIGH)
00:04:16.299 --> 00:04:29.049
values in the range between 15 and 20, and so on. Now one important thing here is what happens with
00:04:29.049 --> 00:04:37.600
the border values. We should not count them twice, because the total number of observations here must
00:04:37.600 --> 00:04:43.400
be equal to the total number of observations here, which is how many we have. So if we add up the
00:04:43.400 --> 00:04:46.800
heights of all these bars, the sum would be
NOTE
Accuracy: 91% (HIGH)
00:04:46.800 --> 00:04:50.100
how many children were assessed.
NOTE
Accuracy: 85% (HIGH)
00:04:50.100 --> 00:05:00.800
So it's important to be consistent. Usually the border value goes on the left, so 20 words per minute
00:05:00.800 --> 00:05:11.700
goes into the 15 to 20 box, and not into the 20 to 25 box. So 5 words per minute would be here, 10 words
00:05:11.700 --> 00:05:20.850
per minute would be here, and so on. And the result of this binning of values is taking this
NOTE
Accuracy: 86% (HIGH)
00:05:20.850 --> 00:05:28.100
representation, which is called a frequency plot, and creating this here, which is called a
00:05:28.100 --> 00:05:29.750
histogram.
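NOTE
A minimal Python sketch of the binning described above, assuming a bin width of 5 starting at 0 and the stated convention that a border value goes into the bin on its left. The function name bin_counts and the scores in wpm are hypothetical: the scores were made up to reproduce the counts read out in the captions (3, 6, 1, 9) and are not the actual data. Values are assumed to be at or above the start of the first bin.
```python
import math
def bin_counts(values, width, start=0):
    """Count values per bin; a border value goes into the bin on its left."""
    n_bins = max(1, math.ceil((max(values) - start) / width))
    counts = [0] * n_bins
    for v in values:
        # ceil(...) - 1 sends 5 to bin 0 and 10 to bin 1 (border goes left);
        # max(0, ...) keeps the lowest value (start itself) in the first bin.
        index = max(0, math.ceil((v - start) / width) - 1)
        counts[index] += 1
    return counts
# Hypothetical words-per-minute scores, chosen to match the counts in the captions.
wpm = [0, 3, 5, 7, 10, 10, 10, 10, 10, 12, 16, 16, 17, 17, 17, 18, 18, 20, 20]
counts = bin_counts(wpm, width=5)
for i, c in enumerate(counts):
    print(f"{i * 5}-{(i + 1) * 5}: {c}")
# Each child is counted exactly once, so the bar heights add up to the sample size.
assert sum(counts) == len(wpm)
```
Other tools may use the opposite convention (border value goes into the bin on its right); what matters is applying one rule consistently so that no observation is counted twice.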
NOTE
Accuracy: 91% (HIGH)
00:05:29.750 --> 00:05:38.000
The histogram is probably the most important graphical display there is. We always use it when we
00:05:38.000 --> 00:05:47.400
have quantitative variables, because it gives us at a glance a very good indication of the shape of
00:05:47.400 --> 00:05:57.050
our data distribution, how our data are distributed over values, in a way that is not so much affected
00:05:57.050 --> 00:06:00.299
by how many values there can be, or
NOTE
Accuracy: 91% (HIGH)
00:06:00.299 --> 00:06:09.000
how many measures we have made. And we can also easily see if there are peaks, if there are gaps, or if
00:06:09.000 --> 00:06:14.800
there are any outliers far from the other measurements using the histogram.
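NOTE
A rough Python sketch of reading peaks and gaps off a histogram, here drawn as text bars. The bin counts are hypothetical: the first four match the counts read out in the captions, and the last two are made up for illustration.
```python
# Hypothetical bin counts: the first four match the captions, the last two are made up.
bin_width = 5
counts = [3, 6, 1, 9, 0, 2]
labels = [f"{i * bin_width}-{(i + 1) * bin_width}" for i in range(len(counts))]
# Draw one text bar per bin.
for label, count in zip(labels, counts):
    print(f"{label:>5} | {'#' * count}")
# The tallest bar marks a peak; empty bins mark gaps in the distribution.
peak = labels[counts.index(max(counts))]
gaps = [label for label, count in zip(labels, counts) if count == 0]
print("tallest bar:", peak)
print("empty bins:", gaps)
```
A small group of values sitting beyond an empty bin, far from the rest, is the kind of pattern that would suggest outliers.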