WEBVTT Kind: captions; language: en-us
NOTE
Treffsikkerhet: 90% (H?Y)
00:00:00.000 --> 00:00:07.700
In this video we will see how to derive descriptive statistics in jamovi.
NOTE
Treffsikkerhet: 91% (H?Y)
00:00:07.700 --> 00:00:14.000
For this and many of the following examples we will be using a data set which you can download from
00:00:14.000 --> 00:00:15.700
Canvas,
NOTE
Treffsikkerhet: 80% (H?Y)
00:00:15.899 --> 00:00:27.049
which is called stats data. To open it for the first time, you have to go to Open in jamovi
00:00:27.049 --> 00:00:36.050
and then Browse, and go to the place on your computer where you have stored this file, stats data
00:00:36.050 --> 00:00:42.800
SPED4010 2020, and open it.
NOTE
Treffsikkerhet: 80% (H?Y)
00:00:44.100 --> 00:00:54.700
After you open this file once, it will appear down here on the left side and you will be able
00:00:54.700 --> 00:00:58.600
to directly open it by clicking here.
NOTE
Treffsikkerhet: 91% (H?Y)
00:00:58.600 --> 00:01:08.100
To go back to our file: every time you first load a file you should check your variables, what they
00:01:08.100 --> 00:01:12.100
are and how they are defined.
NOTE
Treffsikkerhet: 90% (H?Y)
00:01:12.300 --> 00:01:19.700
We will go into greater detail about this data set later; for now we will just talk about the basic
00:01:19.700 --> 00:01:27.700
properties of variables. The first variable is called ID and it's just a designation for each
00:01:27.700 --> 00:01:29.150
participant
NOTE
Treffsikkerhet: 90% (H?Y)
00:01:29.150 --> 00:01:34.050
so 11, 12, 13: these are all different children,
NOTE
Treffsikkerhet: 91% (H?Y)
00:01:34.050 --> 00:01:37.600
and there are 47 of them.
NOTE
Treffsikkerhet: 84% (H?Y)
00:01:37.600 --> 00:01:46.250
The second variable is sex and it includes males and females. The third variable is called home Lang
00:01:46.250 --> 00:01:52.850
and it refers to the home language, whether it is a majority language or a minority language.
NOTE
Treffsikkerhet: 91% (H?Y)
00:01:52.850 --> 00:02:00.500
The fourth variable is called condition and has to do with an intervention provided to some children.
NOTE
Treffsikkerhet: 85% (H?Y)
00:02:01.100 --> 00:02:10.400
And the next set of variables are measured in kindergarten for these children, so we have a
00:02:10.400 --> 00:02:17.600
measure called matrices which is a general cognitive ability test, we have a measure of letter
00:02:17.600 --> 00:02:22.399
knowledge, how many letters of the alphabet the children know,
NOTE
Treffsikkerhet: 91% (H?Y)
00:02:22.399 --> 00:02:31.200
and a measure of receptive vocabulary, how many words they know and can show by pointing to the
00:02:31.200 --> 00:02:38.400
right picture when they hear the corresponding word. In kindergarten we also have a measure of word
00:02:38.400 --> 00:02:45.800
reading fluency and as you may be able to see here, most of the values are 0 which means that kids in
00:02:45.800 --> 00:02:50.200
kindergarten don't know how to read any words yet.
NOTE
Treffsikkerhet: 80% (H?Y)
00:02:50.600 --> 00:02:59.300
There are also measures taken in grade 1, g1, and grade 2, g2, so we have vocabulary in grade 1
00:02:59.300 --> 00:03:07.500
and 2 and word fluency in grade 1 and 2. And there is also a measurement of word fluency taken after
00:03:07.500 --> 00:03:13.800
some reading intervention. Some of these data come from a real study; others are made up for the
00:03:13.800 --> 00:03:17.700
purpose of demonstrations in this course.
NOTE
Treffsikkerhet: 91% (H?Y)
00:03:17.700 --> 00:03:24.400
But otherwise they are very realistic data. So the first thing to check is the definition of the
00:03:24.400 --> 00:03:33.500
variables. The first four variables here are obviously not numbers, they are labels, so it's very easy
00:03:33.500 --> 00:03:40.600
to understand that they are actually measured on a nominal scale: there is no ordering in any of
00:03:40.600 --> 00:03:42.149
these, no rank,
NOTE
Treffsikkerhet: 77% (H?Y)
00:03:42.149 --> 00:03:50.550
there is a bunch of children here identified by their IDs, there are two categories in sex, two labels,
00:03:50.550 --> 00:03:58.200
there are two labels in home language, and two labels in condition. To check the definition, or to
00:03:58.200 --> 00:04:04.650
correct the definition of a variable, you double click on the variable name and this panel opens up
00:04:04.650 --> 00:04:10.500
and you can see here what the measurement scale is. This is at the nominal level, which is correct, so
00:04:10.500 --> 00:04:12.050
we're not going to change it.
NOTE
Treffsikkerhet: 91% (H?Y)
00:04:12.050 --> 00:04:17.250
We could set it to ordinal or continuous if it were a different kind of variable.
NOTE
Treffsikkerhet: 91% (H?Y)
00:04:17.250 --> 00:04:23.600
Also, jamovi has a specific type that's called ID, but we don't need to use it here; we'll just
00:04:23.600 --> 00:04:30.500
leave it at the actual nominal scale. And here you can see all the different levels, which in this
00:04:30.500 --> 00:04:37.550
case are the children. You can go to the next variable by clicking here or by clicking here.
NOTE
Treffsikkerhet: 81% (H?Y)
00:04:37.550 --> 00:04:46.300
The sex variable is also measured at the nominal level, we're not changing it; it is a text label and the
00:04:46.300 --> 00:04:49.000
possible labels are f and m.
NOTE
Treffsikkerhet: 91% (H?Y)
00:04:49.000 --> 00:04:52.150
The next variable
NOTE
Treffsikkerhet: 91% (H?Y)
00:04:52.150 --> 00:05:02.100
is again a nominal measurement of text type, and includes two levels: majority and minority language.
00:05:02.100 --> 00:05:09.799
And the next variable is condition, which has control and intervention values, measured at the nominal
00:05:09.799 --> 00:05:20.600
scale level. The next variable is matrices K, which is in fact the number of correct answers in
00:05:20.600 --> 00:05:22.300
this test.
NOTE
Treffsikkerhet: 91% (H?Y)
00:05:22.300 --> 00:05:26.150
It is actually measured at the ratio level,
NOTE
Treffsikkerhet: 91% (H?Y)
00:05:26.150 --> 00:05:33.000
because it is a real count. So although the data are integers, since you cannot have half a question
00:05:33.000 --> 00:05:42.700
answered correctly, you still have a natural zero which means zero correct answers. And so this
00:05:42.700 --> 00:05:49.400
variable is measured at the ratio level, and here it is marked as continuous, because that's what jamovi
00:05:49.400 --> 00:05:52.500
calls the number variables.
NOTE
Treffsikkerhet: 90% (H?Y)
00:05:52.700 --> 00:06:01.100
The next variable is also continuous, it's letter knowledge at kindergarten and this is the number of
00:06:01.100 --> 00:06:09.200
letters known by the child, so again it's a ratio level. Vocabulary at kindergarten is again
00:06:09.200 --> 00:06:14.900
continuous because it's the number of correct responses in the vocabulary test, so this is correctly
00:06:14.900 --> 00:06:21.850
assigned to be continuous. And then we have vocabulary at grade 1, again correct; vocabulary
NOTE
Treffsikkerhet: 69% (MEDIUM)
00:06:21.850 --> 00:06:24.850
at grade 2, correct,
NOTE
Treffsikkerhet: 91% (H?Y)
00:06:24.850 --> 00:06:35.800
Word fluency, this is measured as words per minute so it is again a ratio level variable which is
00:06:35.800 --> 00:06:38.600
marked as continuous in jamovi.
NOTE
Treffsikkerhet: 84% (H?Y)
00:06:40.400 --> 00:06:46.600
Word fluency in grade one, word fluency in grade 2.
NOTE
Treffsikkerhet: 91% (H?Y)
00:06:47.500 --> 00:06:56.100
So clicking on this will remove this variable definition panel, and now that we have checked all our
00:06:56.100 --> 00:07:05.200
variables and ensured that they are correctly defined, we can go on and look at some descriptives. To
00:07:05.200 --> 00:07:11.300
perform descriptive analysis we have to be on the analysis panel here, not the data panel, and click
00:07:11.300 --> 00:07:15.100
on Exploration, Descriptives.
NOTE
Treffsikkerhet: 83% (H?Y)
00:07:17.100 --> 00:07:26.300
And we'll go through this twice because what we can do with nominal level scales and ratio level
00:07:26.300 --> 00:07:36.300
scales are very different things. So first we should choose our nominal scales. We don't need to do
00:07:36.300 --> 00:07:46.600
any statistics on the IDs, unless we wanted to check for mistakes, for example if there
00:07:46.600 --> 00:07:47.900
are any children that
NOTE
Treffsikkerhet: 73% (MEDIUM)
00:07:47.900 --> 00:07:56.000
appear twice. So if we want to check that, we could add ID here and use a values table to
00:07:56.000 --> 00:08:03.200
see if any ID appears twice, because that would be a mistake. So I click here on Frequency tables, as the
00:08:03.200 --> 00:08:11.000
first thing to look at is the frequencies of all the values; frequency here means how many times a value
00:08:11.000 --> 00:08:17.750
is observed. We look at the frequencies. So let's start from the bottom, and you see
NOTE
Treffsikkerhet: 82% (H?Y)
00:08:17.750 --> 00:08:27.700
that each ID appears once, all of these appear once, so there are no mistakes in the IDs, none
00:08:27.700 --> 00:08:34.500
of the IDs appears twice. We're not really interested in this, so I'm sending it back.
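NOTE
For readers who prefer a script, the duplicate-ID check described above can be sketched in Python with pandas.
This is only an illustration under assumptions: the ID values below are made up (the real file has 47 children),
and the variable names ids, id_counts and duplicated_ids are hypothetical.
```python
import pandas as pd
# Hypothetical stand-in for the ID column; values are invented for illustration.
ids = pd.Series([11, 12, 13, 14, 15], name="ID")
# Frequency table: how many times each ID value is observed.
id_counts = ids.value_counts()
# Any ID observed more than once would point to a data-entry mistake.
duplicated_ids = id_counts[id_counts > 1].index.tolist()
print(id_counts)
print("IDs appearing more than once:", duplicated_ids)
```
An empty list means that every ID appears exactly once, which is what the frequency table in the video shows.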
NOTE
Treffsikkerhet: 91% (H?Y)
00:08:35.600 --> 00:08:45.100
What kind of descriptive statistics can we do with nominal level variables? Well, not very much: we
00:08:45.100 --> 00:08:54.600
cannot calculate means or medians, so I uncheck these; there is no minimum or maximum. We can count how
00:08:54.600 --> 00:09:03.800
many values we have, and how many missing values we have. You can see here that for our three nominal
NOTE
Treffsikkerhet: 91% (H?Y)
00:09:03.800 --> 00:09:15.500
variables we have no missing values, so there are 47 values of each. None of the other options apply to
00:09:15.500 --> 00:09:23.600
qualitative, or categorical, or nominal scale variables, so we don't check anything else and that's it
00:09:23.600 --> 00:09:25.350
for our
NOTE
Treffsikkerhet: 91% (H?Y)
00:09:25.350 --> 00:09:28.150
categorical variables.
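NOTE
As a rough script equivalent of the categorical descriptives above, the following pandas sketch counts valid and
missing values and builds a frequency table per variable. The labels (f/m, majority/minority, control/intervention)
follow the video; the rows themselves and the column name homelang are assumptions made for illustration.
```python
import pandas as pd
# Hypothetical rows; only the labels come from the video, the data are invented.
df = pd.DataFrame({
    "sex": ["f", "m", "f", "m", "f", "m"],
    "homelang": ["majority", "minority", "majority", "majority", "minority", "majority"],
    "condition": ["control", "intervention", "control", "intervention", "control", "control"],
})
# For nominal variables we only count: valid values, missing values, and frequencies.
n_valid = df.count()
n_missing = df.isna().sum()
freq_tables = {col: df[col].value_counts() for col in df.columns}
print(n_valid)
print(n_missing)
for col, table in freq_tables.items():
    print(table)
```
Means, medians, minimum and maximum are deliberately not computed here, because they are not defined for labels.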
NOTE
Treffsikkerhet: 82% (H?Y)
00:09:28.150 --> 00:09:40.100
Now we will do another run for the number, for the quantitative variables. And to go back I click here
00:09:40.100 --> 00:09:49.900
to close that analysis window, if I click here it comes back. And if I add or remove anything from
00:09:49.900 --> 00:09:54.750
here, or if I check anything, this result is affected,
NOTE
Treffsikkerhet: 91% (H?Y)
00:09:54.750 --> 00:10:02.200
but I don't want to affect this now, I want to start a new set of descriptives concerning the
00:10:02.200 --> 00:10:08.600
quantitative, the numeric variables, those that are marked as continuous in jamovi. So I want this to be
00:10:08.600 --> 00:10:09.700
done,
NOTE
Treffsikkerhet: 91% (H?Y)
00:10:09.700 --> 00:10:19.750
I could alternatively close it like that and start a new one. So again I go to Exploration, Descriptives,
00:10:19.750 --> 00:10:28.800
and here we have the beginning of a new set of descriptive analysis, and these will concern the
00:10:28.800 --> 00:10:34.849
numeric ones. So I clicked on the first variable,
NOTE
Treffsikkerhet: 83% (H?Y)
00:10:34.849 --> 00:10:39.700
I could double click, it makes no difference,
NOTE
Treffsikkerhet: 85% (H?Y)
00:10:39.700 --> 00:10:44.200
I could send it in here like that
NOTE
Treffsikkerhet: 91% (H?Y)
00:10:44.300 --> 00:10:54.900
and back. And to select a whole bunch of variables together I can shift-click: I'm pressing
00:10:54.900 --> 00:11:01.800
shift and clicking, and so all of these are selected. And I can send all of these here. I left this one
00:11:01.800 --> 00:11:07.550
out on purpose because it's the intervention variable and we don't care about it right now.
NOTE
Treffsikkerhet: 86% (H?Y)
00:11:07.550 --> 00:11:16.600
jamovi will not produce a table of values for quantitative, for numeric variables, because there
00:11:16.600 --> 00:11:23.500
could be very many different values, indeed each value might even appear only once, so this is not a
00:11:23.500 --> 00:11:31.100
very informative thing to do with a numeric variable. Instead we can have the indices of central
00:11:31.100 --> 00:11:36.849
tendency and dispersion that we have already seen. So first we have to check
NOTE
Treffsikkerhet: 90% (H?Y)
00:11:36.849 --> 00:11:43.600
how many values we have and if there are any missing ones; this is always important to check. In terms
00:11:43.600 --> 00:11:49.400
of central tendency we can get the median and the mean, and we can also ask for the mode when it
00:11:49.400 --> 00:11:57.400
exists. We're usually not interested in it; if we are, we click here and it appears in the list as well.
NOTE
Treffsikkerhet: 91% (H?Y)
00:11:57.800 --> 00:12:02.250
We can look at the individual quartiles,
NOTE
Treffsikkerhet: 91% (H?Y)
00:12:02.250 --> 00:12:12.800
we can look at the minimum and maximum values, the range of values, and the standard deviation.
NOTE
Treffsikkerhet: 91% (H?Y)
00:12:14.500 --> 00:12:22.200
And we can look at some other things that we will talk about later in the course. So these are the
00:12:22.200 --> 00:12:30.200
basic descriptive statistics that we are usually interested in: the mean and the median, the range, the
00:12:30.200 --> 00:12:37.400
standard deviation, the minimum and the maximum. And we're usually more interested in the minimum and
00:12:37.400 --> 00:12:44.000
maximum, which are indicative of potential problems, than in their difference, which is the range. So
00:12:44.000 --> 00:12:45.200
this is a
NOTE
Treffsikkerhet: 91% (H?Y)
00:12:45.200 --> 00:12:53.100
reasonable table of descriptives for a set of numeric variables. And now that we have completed this
00:12:53.100 --> 00:13:01.500
analysis we can hide the analysis panel, and we can save our results, or copy and paste the tables into
00:13:01.500 --> 00:13:11.400
our report. We have one section for the categorical variables and one section for the numeric
00:13:11.400 --> 00:13:13.000
variables.
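NOTE
The table of numeric descriptives built above (N, missing, mean, median, standard deviation, minimum, maximum,
range) can be approximated in Python with pandas. This is a sketch under assumptions: the column names letters_k
and vocab_k and all the values are invented, and the standard deviation uses the pandas default (sample SD, n - 1).
```python
import pandas as pd
# Hypothetical numeric variables; names and values are invented for illustration.
df = pd.DataFrame({
    "letters_k": [0, 5, 12, 20, 26],
    "vocab_k": [30, 42, 45, 51, 60],
})
# One row per variable, one column per descriptive statistic.
summary = pd.DataFrame({
    "N": df.count(),
    "missing": df.isna().sum(),
    "mean": df.mean(),
    "median": df.median(),
    "sd": df.std(),  # pandas default is the sample standard deviation (ddof=1)
    "min": df.min(),
    "max": df.max(),
})
# The range is simply the difference between maximum and minimum.
summary["range"] = summary["max"] - summary["min"]
print(summary)
```
Looking at the min and max columns first is a quick way to spot impossible values, as suggested in the video.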