WEBVTT Kind: captions; language: en-us
NOTE
Accuracy: 84% (HIGH)
00:00:00.100 --> 00:00:09.100
In this video we will talk about statistical significance. In particular, we will review the logic of
00:00:09.100 --> 00:00:16.700
statistical association testing and the process of statistical hypothesis testing. And we will set
00:00:16.700 --> 00:00:24.400
the stage for the specific statistical analyses that will follow in this course. The three important
00:00:24.400 --> 00:00:29.900
ingredients of statistical significance testing are indices of
NOTE
Accuracy: 82% (HIGH)
00:00:29.900 --> 00:00:38.200
association, sampling distributions for these indices, and confidence levels that we set for our
00:00:38.200 --> 00:00:40.300
decision making.
NOTE
Accuracy: 91% (HIGH)
00:00:40.400 --> 00:00:49.850
The logic of statistically examining relationships between variables is a little convoluted.
NOTE
Accuracy: 82% (HIGH)
00:00:49.850 --> 00:00:57.400
The first thing that we need to do when we have two variables and want to examine whether they are
00:00:57.400 --> 00:01:06.300
statistically associated is to compute an index of association, that is, some quantity that expresses the
00:01:06.300 --> 00:01:14.800
strength of their association. Remember, these words mean the same thing for us: association,
00:01:14.800 --> 00:01:20.199
relationship, predictability, and effect refer to the same
NOTE
Accuracy: 90% (HIGH)
00:01:20.199 --> 00:01:32.000
concept, the idea that two variables are related. We need to have a quantity that becomes larger when
00:01:32.000 --> 00:01:35.550
association strength increases.
NOTE
Accuracy: 91% (HIGH)
00:01:35.550 --> 00:01:42.600
And so we need different ways to calculate this depending on the types of variables we have, as we
00:01:42.600 --> 00:01:49.400
will see later. What is very important to take into account is the variability that is observed in
00:01:49.400 --> 00:01:58.700
the variables. Variables with larger variability will require stronger associations in order for them
00:01:58.700 --> 00:02:01.500
to be statistically discernible.
NOTE
Accuracy: 89% (HIGH)
00:02:02.600 --> 00:02:10.800
The second thing we need to do is to determine the sampling distribution of this index of
00:02:10.800 --> 00:02:12.650
association.
NOTE
Accuracy: 91% (HIGH)
00:02:12.650 --> 00:02:21.600
The sampling distribution always refers to the null hypothesis: how would this index be
00:02:21.600 --> 00:02:30.400
distributed if there had been no effect, no relationship between the variables? To determine the
00:02:30.400 --> 00:02:38.950
sampling distribution, one way is to generate many independent random samples of the desired size,
NOTE
Accuracy: 86% (HIGH)
00:02:38.950 --> 00:02:47.800
and then obtain a simulated distribution from a large set of samples. In our simple examples, using the
00:02:47.800 --> 00:02:54.300
mean as a test statistic instead of an index of association, we obtained the sampling
00:02:54.300 --> 00:03:00.800
distribution of the mean by repeatedly sampling from a population and calculating the mean of each
00:03:00.800 --> 00:03:02.100
sample.
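NOTE
The repeated-sampling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 normally distributed scores, the sample size of 25, and the function name sampling_distribution_of_mean are hypothetical and do not come from the course materials.

```python
import random
import statistics


def sampling_distribution_of_mean(population, sample_size, n_samples, seed=0):
    """Draw many random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population: 10,000 scores with mean about 100 and SD about 15.
population_rng = random.Random(1)
population = [population_rng.gauss(100, 15) for _ in range(10_000)]

means = sampling_distribution_of_mean(population, sample_size=25, n_samples=2_000)

# Centre and spread of the simulated sampling distribution of the mean.
print(round(statistics.fmean(means), 1))
print(round(statistics.stdev(means), 2))
```

The spread of the simulated means should come out near the population standard deviation divided by the square root of the sample size, which is the quantity usually called the standard error.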
NOTE
Accuracy: 80% (HIGH)
00:03:02.100 --> 00:03:10.200
When it comes to computing indices of association, we can do the same by sampling repeatedly from the
00:03:10.200 --> 00:03:16.600
null distribution, computing the relevant statistic, and finally obtaining its sampling
00:03:16.600 --> 00:03:24.900
distribution. In practice this is most often done mathematically. The mathematicians have derived the
00:03:24.900 --> 00:03:31.000
theoretically expected distributions for us, and we can just look up the values that we are interested
00:03:31.000 --> 00:03:32.050
in.
NOTE
Accuracy: 91% (HIGH)
00:03:32.050 --> 00:03:43.500
The final step is hypothesis testing, that is, evaluating the unexpectedness of a finding like the one
00:03:43.500 --> 00:03:48.050
we've got if we assume the null hypothesis.
NOTE
Accuracy: 84% (HIGH)
00:03:48.050 --> 00:03:57.900
And this is found by comparing our observed index to the distribution in random samples, so to the
00:03:57.900 --> 00:04:06.300
sampling distribution of our statistic, of our index of association. And this amounts to obtaining a
00:04:06.300 --> 00:04:15.250
p-value for the observed or a higher level of association. So the logic is: first we need a quantity
00:04:15.250 --> 00:04:18.050
that indicates how strongly
NOTE
Accuracy: 86% (HIGH)
00:04:18.050 --> 00:04:27.300
the variables appear to be related, and then we find out whether this strength of association would be
00:04:27.300 --> 00:04:35.200
unexpected to occur by random sampling. This is the same thing we did when flipping coins in
00:04:35.200 --> 00:04:39.050
comparison to examining how many children improved.
NOTE
Accuracy: 91% (HIGH)
00:04:39.050 --> 00:04:47.700
Now the actual process of getting this done involves the following steps. The first one is to
00:04:47.700 --> 00:04:54.900
determine our research question, which has to do with a relationship between specific variables that
00:04:54.900 --> 00:04:57.700
have to be fully defined.
NOTE
Accuracy: 86% (HIGH)
00:04:57.800 --> 00:05:06.400
Then, in contradiction to our research hypothesis, we posit the null hypothesis, which states that
00:05:06.400 --> 00:05:15.450
there is no relationship, and this means that any index of association we compute will be equal to 0
00:05:15.450 --> 00:05:22.000
according to the null hypothesis. So the null hypothesis states no relationship, which means zero
00:05:22.000 --> 00:05:23.400
index.
NOTE
Accuracy: 91% (HIGH)
00:05:23.400 --> 00:05:30.300
And then we compute the actual index of relationship from the data.
NOTE
Accuracy: 86% (HIGH)
00:05:30.800 --> 00:05:41.600
The following step is to assume the null hypothesis is true and compute the probability of an index
00:05:41.600 --> 00:05:49.100
as large as or greater than the one we observed, based on the sampling distribution from the null
00:05:49.100 --> 00:05:58.600
hypothesis. This amounts to asking: how likely is it that such an observation, or a more extreme one,
00:05:58.600 --> 00:06:01.250
might occur just by randomly
NOTE
Accuracy: 75% (MEDIUM)
00:06:01.250 --> 00:06:05.100
sampling from the null distribution?
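NOTE
The steps just described (compute an index from the data, build its sampling distribution under the null hypothesis, and ask how often an index at least as large occurs) can be sketched as a permutation test. This is a minimal sketch and not the method used later in the course: the Pearson correlation as the index, the shuffling approach to simulating the null distribution, the function names, and the ten data pairs are all assumptions made for illustration.

```python
import random
import statistics


def pearson_r(x, y):
    """Index of association for two quantitative variables."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_permutations=5_000, seed=0):
    """Share of null samples whose index is as large as or larger than the observed one."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # shuffling y removes any real x-y relationship
        if pearson_r(x, shuffled) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_permutations


# Hypothetical data in which y clearly rises with x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.4, 7.8, 9.1, 9.7]

observed_r, p_value = permutation_p_value(x, y)
print(round(observed_r, 3), p_value)
```

Because the comparison counts only indices as large as or larger than the observed one, this sketch is one-sided, matching the wording used in the lecture.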
NOTE
Accuracy: 89% (HIGH)
00:06:05.100 --> 00:06:14.700
This probability, called the p-value, is then compared to an arbitrary criterion that is set by
00:06:14.700 --> 00:06:26.600
convention. If the p-value is less than 0.05, which is the most commonly used criterion, then we call
00:06:26.600 --> 00:06:34.200
the relationship statistically significant. This is a term: statistically significant means that the
00:06:34.200 --> 00:06:35.950
p-value is less than
NOTE
Accuracy: 88% (HIGH)
00:06:35.950 --> 00:06:38.900
our comparison criterion.
NOTE
Accuracy: 91% (HIGH)
00:06:38.900 --> 00:06:46.600
And if that happens then what we do is we reject the null hypothesis,
NOTE
Accuracy: 82% (HIGH)
00:06:46.600 --> 00:06:55.650
and conclude that there is a relationship among our variables. So the null hypothesis is rejected
00:06:55.650 --> 00:07:04.700
when we judge an effect such as the one we obtained to be unlikely on the assumption of the null
00:07:04.700 --> 00:07:14.500
hypothesis. And unlikely means less than 5% likely, which is a rather loose criterion. You can use more
00:07:14.500 --> 00:07:17.800
stringent criteria in your studies.
NOTE
Accuracy: 91% (HIGH)
00:07:17.900 --> 00:07:28.250
If the p-value is greater than or equal to the criterion, here 0.05, then the relationship is called
00:07:28.250 --> 00:07:36.700
not statistically significant. Not statistically significant: remember, this is a term. Statistically
00:07:36.700 --> 00:07:43.500
significant is a term, and the opposite is not statistically significant. It's not statistically
00:07:43.500 --> 00:07:45.800
insignificant.
NOTE
Accuracy: 90% (HIGH)
00:07:46.100 --> 00:07:51.850
In this case we don't reject the null hypothesis
NOTE
Accuracy: 91% (HIGH)
00:07:51.850 --> 00:08:00.100
and we conclude that there is no discernible relationship between our variables.
NOTE
Accuracy: 91% (HIGH)
00:08:00.100 --> 00:08:08.500
There may be one, but we cannot see it, so we cannot reject the null hypothesis and cannot conclude
00:08:08.500 --> 00:08:11.150
that there is a relationship.
NOTE
Accuracy: 82% (HIGH)
00:08:11.150 --> 00:08:21.400
The criterion, also known as the alpha level, is called the level of significance. So for a given significance
00:08:21.400 --> 00:08:30.700
level we may reject or fail to reject the null hypothesis. These are very strictly used terms, and it
00:08:30.700 --> 00:08:37.200
is important to use them correctly. You're not allowed to rephrase as you like.
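NOTE
The decision rule and its fixed wording can be written down as a tiny function. This is a sketch only: the function name decide and the example p-values are hypothetical, and 0.05 is used as the default simply because it is the conventional criterion mentioned above.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value to the significance level and report the decision."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "statistically significant: reject the null hypothesis"
    # A p-value equal to alpha is NOT below the criterion.
    return "not statistically significant: fail to reject the null hypothesis"


print(decide(0.03))              # below the conventional 0.05 criterion
print(decide(0.05))              # equal to the criterion, so not below it
print(decide(0.03, alpha=0.01))  # same p-value, more stringent criterion
```

Note that the same p-value can lead to different decisions depending on the alpha level chosen, which is why the criterion should be stated together with the result.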
NOTE
Accuracy: 91% (HIGH)
00:08:38.600 --> 00:08:48.200
As a preview, here is a table with a simplified presentation of indices of association between pairs of
00:08:48.200 --> 00:08:56.800
variables. Do not worry about trying to learn any of this right now. The point I want to make at this
00:08:56.800 --> 00:09:05.000
point is that depending on the types of variables one has, so whether the independent and dependent
00:09:05.000 --> 00:09:06.350
variables are
NOTE
Accuracy: 90% (HIGH)
00:09:06.350 --> 00:09:13.800
quantitative or qualitative, there are different indices of association that it is possible to compute,
00:09:13.800 --> 00:09:20.900
because that depends on the kinds of data that you have. And there are specific and actually not
00:09:20.900 --> 00:09:27.100
complicated ways to compute them. In practice we never do that; the computer does it for us. We just
00:09:27.100 --> 00:09:36.300
have to understand what happens. The null hypothesis amounts to setting these indices to 0, which is
00:09:36.300 --> 00:09:36.600
the assumption
NOTE
Accuracy: 90% (HIGH)
00:09:36.600 --> 00:09:44.000
of no relationship, and then we have to look up our values for these indices in the sampling
00:09:44.000 --> 00:09:52.000
distributions with the associated degrees of freedom in each case, and then reach our decision based on
00:09:52.000 --> 00:09:55.600
the criterion, which is the alpha level.
NOTE
Accuracy: 88% (HIGH)
00:09:56.700 --> 00:10:05.800
What exactly do we do with the null hypothesis? We may reject the null hypothesis when the
00:10:05.800 --> 00:10:12.500
probability of our observed statistic, or a more extreme one, by randomly sampling from the null
00:10:12.500 --> 00:10:21.400
distribution is too low. Too low is determined by an arbitrary criterion set by convention.
NOTE
Accuracy: 91% (HIGH)
00:10:21.700 --> 00:10:31.300
This is called statistical significance. It doesn't mean importance. Significant in the statistical
00:10:31.300 --> 00:10:40.600
sense means exactly that the probability of the observed effect, or a larger one, is less than a
00:10:40.600 --> 00:10:47.900
conventional criterion for significance. That's what the word significant means in statistics. It
00:10:47.900 --> 00:10:52.150
doesn't mean important, it doesn't mean substantial,
NOTE
Accuracy: 86% (HIGH)
00:10:52.150 --> 00:11:01.600
it doesn't mean large. It only means that the p-value is lower than some criterion. And this
00:11:01.600 --> 00:11:08.200
statistical significance indicates a statistically significant difference, or a statistically
00:11:08.200 --> 00:11:15.200
significant association, or a statistically significant effect or prediction; all these words
00:11:15.200 --> 00:11:18.100
refer to the same concept.
NOTE
Accuracy: 91% (HIGH)
00:11:18.800 --> 00:11:29.100
The other case is when we cannot reject the null hypothesis. You should never say accept the null
00:11:29.100 --> 00:11:37.350
hypothesis; we never accept the null hypothesis. The null hypothesis may never be true. For all we know,
00:11:37.350 --> 00:11:45.900
there may always be some effect that is just so small that our sample size cannot discern it. So we
00:11:45.900 --> 00:11:48.650
never say accept the null hypothesis.
NOTE
Accuracy: 91% (HIGH)
00:11:48.650 --> 00:11:56.100
You may say that you retain the null hypothesis, which is the same as not rejecting it, but usually
00:11:56.100 --> 00:12:03.300
you just say we cannot reject, or we do not reject, or we fail to reject the null hypothesis. And we do
00:12:03.300 --> 00:12:10.599
that when the probability of our index is not low enough relative to the alpha level.
NOTE
Accuracy: 81% (HIGH)
00:12:10.599 --> 00:12:18.550
And the reason for that may be that we have insufficient power; we will never know.
NOTE
Accuracy: 90% (HIGH)
00:12:18.550 --> 00:12:26.000
When we cannot reject the null hypothesis, we have a situation that is called not statistically
00:12:26.000 --> 00:12:32.400
significant. So the association is not statistically significant, or the difference is not
00:12:32.400 --> 00:12:39.100
statistically significant, the effect or the prediction is not statistically significant. We can also
00:12:39.100 --> 00:12:45.100
say that the indices are statistically indistinguishable: two means may be statistically
00:12:45.100 --> 00:12:49.000
indistinguishable, or our index of association
NOTE
Accuracy: 60% (MEDIUM)
00:12:49.000 --> 00:12:52.800
may be statistically indistinguishable from zero.
NOTE
Accuracy: 91% (HIGH)
00:12:54.200 --> 00:13:03.400
Comparing null hypothesis significance testing and confidence intervals, we have to say
00:13:03.400 --> 00:13:10.500
that these are basically the same thing conceptually, although they don't look exactly the same and
00:13:10.500 --> 00:13:19.800
may not be equally difficult to understand. The first thing to note is that these two things are
00:13:19.800 --> 00:13:24.850
equivalent, so to say that the p-value is less than
NOTE
Accuracy: 84% (HIGH)
00:13:24.850 --> 00:13:34.050
alpha is the same as saying that zero is not within the confidence interval for that alpha level.
NOTE
Accuracy: 91% (HIGH)
00:13:34.050 --> 00:13:46.200
So p less than 0.05 is the same as saying that the 95% confidence interval does not include 0. Both
00:13:46.200 --> 00:13:55.200
of these are determined in the same way, by looking up the critical value in the same sampling
00:13:55.200 --> 00:13:56.900
distribution.
NOTE
Accuracy: 91% (HIGH)
00:13:56.900 --> 00:14:05.100
And therefore they are exactly equivalent, so you can either provide an alpha level confidence
00:14:05.100 --> 00:14:12.200
interval and check whether it includes 0 or not, or you can provide a p-value and check whether it is
00:14:12.200 --> 00:14:15.000
lower than the alpha level.
NOTE
Accuracy: 91% (HIGH)
00:14:16.400 --> 00:14:26.500
An effect is significant whenever it is greater than the margin of error. If you remember, the margin
00:14:26.500 --> 00:14:34.600
of error is calculated by taking into account the critical value that is based on the probability,
00:14:34.600 --> 00:14:41.900
which is your arbitrary criterion, and the standard error, which relates to the observed variability
00:14:41.900 --> 00:14:43.800
and your sample size.
NOTE
Accuracy: 78% (HIGH)
00:14:43.800 --> 00:14:49.800
So the margin of error already includes all the relevant information.
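NOTE
The equivalence between the p-value, the confidence interval, and the margin of error can be checked numerically. The sketch below makes several assumptions that are not in the lecture: it uses a normal (z) approximation instead of a distribution with degrees of freedom, a two-sided test of a mean against zero, a hypothetical function name z_test_summary, and twelve made-up difference scores.

```python
from statistics import NormalDist, fmean, stdev


def z_test_summary(sample, alpha=0.05, null_value=0.0):
    """Return (mean, margin_of_error, ci, p_value) under a normal approximation."""
    n = len(sample)
    mean = fmean(sample)
    standard_error = stdev(sample) / n ** 0.5        # variability and sample size
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # critical value set by alpha
    margin = critical * standard_error               # margin of error
    ci = (mean - margin, mean + margin)
    z = (mean - null_value) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided p-value
    return mean, margin, ci, p_value


# Hypothetical difference scores (for example, after minus before).
diffs = [1.2, -0.4, 2.5, 0.8, 1.9, -1.1, 0.3, 2.2, 1.5, 0.6, -0.2, 1.1]

for alpha in (0.05, 0.01, 0.001):
    mean, margin, ci, p_value = z_test_summary(diffs, alpha)
    significant = p_value < alpha
    excludes_zero = not (ci[0] <= 0.0 <= ci[1])
    exceeds_margin = abs(mean) > margin
    # The three checks should always agree with one another.
    print(alpha, significant, excludes_zero, exceeds_margin)
```

All three checks use the same critical value from the same sampling distribution, which is why they cannot disagree; only the alpha level changes the verdict.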
NOTE
Accuracy: 86% (HIGH)
00:14:50.200 --> 00:15:00.100
To say this a bit differently again, if the confidence interval excludes the null, which is almost
00:15:00.100 --> 00:15:08.700
always zero, if the confidence interval excludes the null, then you are essentially rejecting the null
00:15:08.700 --> 00:15:18.100
hypothesis. The confidence interval is an interval estimate of something. It can be a confidence
00:15:18.100 --> 00:15:20.950
interval for your mean, which is the simplest
NOTE
Accuracy: 70% (MEDIUM)
00:15:20.950 --> 00:15:28.150
statistic you can get, or it can be a confidence interval for your index of association, or anything else.
00:15:28.150 --> 00:15:32.600
You can always provide a confidence interval.
NOTE
Accuracy: 91% (HIGH)
00:15:33.700 --> 00:15:43.600
This will be equivalent to providing a point estimate, so one value as your main guess or your main
00:15:43.600 --> 00:15:51.849
result of calculation, plus the standard error that is associated with it. Now you know how you can
00:15:51.849 --> 00:15:59.000
obtain a confidence interval based on a point estimate and standard error, so these two are
00:15:59.000 --> 00:16:00.900
equivalent.
NOTE
Accuracy: 90% (HIGH)
00:16:01.800 --> 00:16:11.200
The same alpha level is relevant for both formulations. For the confidence interval, the alpha level
00:16:11.200 --> 00:16:18.800
goes into calculating the margin of error via the critical value, and for the significance of the
00:16:18.800 --> 00:16:25.000
point estimate it goes into comparing your p-value to this level.
NOTE
Accuracy: 90% (HIGH)
00:16:26.000 --> 00:16:34.700
The confidence interval may be a bit easier to understand, not in the sense that it is simpler to
00:16:34.700 --> 00:16:42.600
calculate, but in the sense that it is less likely to be misinterpreted than the p-value.
NOTE
Accuracy: 82% (HIGH)
00:16:42.700 --> 00:16:50.900
P-values are unfortunately very often misinterpreted, and you should be very careful when you state
00:16:50.900 --> 00:16:53.600
what your p-value means.
NOTE
Accuracy: 86% (HIGH)
00:16:54.900 --> 00:16:58.700
Here is what the p-value means:
NOTE
Accuracy: 80% (HIGH)
00:16:58.700 --> 00:17:08.400
it is the probability of a finding or a statistic, an index of association, like the one you observed
00:17:08.400 --> 00:17:17.900
from your data, or more extreme, if the null hypothesis is true. That's what the p-value is, and the
00:17:17.900 --> 00:17:24.150
p-value makes sense only in the context of the null hypothesis being true.
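NOTE
One consequence of this definition can be illustrated by simulation: when the null hypothesis really is true, p-values below 0.05 should turn up in roughly 5% of samples. The sketch below assumes a population with mean 0 and a known standard deviation of 1, a two-sided z-test, samples of size 30, and 10,000 repetitions; all of these choices, and the function name p_value_under_null, are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def p_value_under_null(rng, n=30):
    """Sample n values from a no-effect population and test the mean against 0."""
    sample = [rng.gauss(0, 1) for _ in range(n)]
    standard_error = 1 / n ** 0.5                 # population SD is known to be 1
    z = fmean(sample) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided p-value


rng = random.Random(42)
p_values = [p_value_under_null(rng) for _ in range(10_000)]

# Share of samples that would be called statistically significant at 0.05
# even though there is no effect in the population at all.
share_below = sum(p < 0.05 for p in p_values) / len(p_values)
print(share_below)
```

The printed share says nothing about whether any particular significant result came from a true null; it only describes how often such results arise when sampling from a distribution known to contain no effect.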
NOTE
Accuracy: 89% (HIGH)
00:17:24.150 --> 00:17:32.000
Here's what the p-value is not. It is not the probability of the null hypothesis or of the
00:17:32.000 --> 00:17:38.300
alternative hypothesis. It cannot be the probability of the null hypothesis because you assume that
00:17:38.300 --> 00:17:47.500
it's true in order to compute the p-value. And of course the alternative hypothesis, usually being the
00:17:47.500 --> 00:17:54.750
opposite of the null, also cannot be evaluated probabilistically because it's assumed to be false for
NOTE
Accuracy: 74% (MEDIUM)
00:17:54.750 --> 00:17:56.900
computing the p-value.
NOTE
Accuracy: 88% (HIGH)
00:17:57.300 --> 00:18:06.100
It is not the probability of the data we observe. The data have been observed; they have a probability
00:18:06.100 --> 00:18:15.050
of one, they are just there. It's not the same to say the probability of my data and the probability
00:18:15.050 --> 00:18:23.100
of getting a result like the one I've got. So it's the probability of getting a data pattern like the
00:18:23.100 --> 00:18:26.800
one you've observed by random sampling.
NOTE
Accuracy: 91% (HIGH)
00:18:26.800 --> 00:18:34.750
It's not the probability that we are right or wrong, obviously. No one can calculate this probability.
00:18:34.750 --> 00:18:41.200
We would love to have that probability, but it is not mathematically possible because it depends on
00:18:41.200 --> 00:18:51.500
many other things. It depends on what other hypotheses are possible or plausible; it depends on how
00:18:51.500 --> 00:18:55.000
frequently things happen in general.
NOTE
Accuracy: 91% (HIGH)
00:18:55.000 --> 00:19:00.200
You cannot say anything like that just based on your data.
NOTE
Accuracy: 91% (HIGH)
00:19:01.200 --> 00:19:10.300
It is not the probability that your findings, or my findings, occurred by chance or did not occur by
00:19:10.300 --> 00:19:20.200
chance. No one can tell how your data came about. We don't know why our observations occurred. We
00:19:20.200 --> 00:19:29.400
sampled the population; this means that we sampled from some distribution, and this entails a certain
00:19:29.400 --> 00:19:31.150
amount of variability.
NOTE
Accuracy: 68% (MEDIUM)
00:19:31.150 --> 00:19:39.300
A certain amount of sampling error. We may have been lucky to get a representative sample from the
00:19:39.300 --> 00:19:45.000
distribution, from the population we sampled from, or we may have been unlucky, and there is no
00:19:45.000 --> 00:19:51.100
observable difference that can help us distinguish these. It's just a sample.
NOTE
Accuracy: 91% (HIGH)
00:19:51.100 --> 00:19:58.400
There's no way to know if your findings occurred by chance. The only thing you can do is tell how
00:19:58.400 --> 00:20:07.200
likely it is to get findings like the one you got by random sampling from a distribution you know has
00:20:07.200 --> 00:20:09.200
no effect in it.
NOTE
Accuracy: 77% (HIGH)
00:20:09.200 --> 00:20:19.000
So you cannot conclude whether or not your result was due to chance based on your p-value,
NOTE
Accuracy: 91% (HIGH)
00:20:19.300 --> 00:20:24.550
and you cannot even say how confident you are.
NOTE
Accuracy: 91% (HIGH)
00:20:24.550 --> 00:20:33.200
Your p-value doesn't tell you how certain you can be that your results are not due to chance. It only
00:20:33.200 --> 00:20:40.750
tells you how likely you are to get such results by chance, but nobody knows anything about your
00:20:40.750 --> 00:20:48.500
results. So you can't say anything about this either. This is not what the p-value means.
NOTE
Accuracy: 87% (HIGH)
00:20:48.500 --> 00:20:56.650
These are very commonly seen, not only in master's theses but sometimes in articles.
NOTE
Accuracy: 83% (HIGH)
00:20:56.650 --> 00:21:04.300
These are typical misunderstandings of the p-value, and it is very important that you think hard
00:21:04.300 --> 00:21:09.700
about this idea and do not misinterpret the p-value.
NOTE
Accuracy: 91% (HIGH)
00:21:10.600 --> 00:21:20.100
Finally, we should note that this whole deal of the null hypothesis significance testing process is
00:21:20.100 --> 00:21:29.500
very weird. It is a roundabout method going in circles. It's perverse because it tells us something
00:21:29.500 --> 00:21:32.000
that we don't really care about.
NOTE
Accuracy: 91% (HIGH)
00:21:32.000 --> 00:21:39.000
It's not what we want to know. We don't really care so much what the chance is of getting something
00:21:39.000 --> 00:21:47.450
by random sampling. So why are we doing that? Well, because we can. So pretty much everyone, or almost
00:21:47.450 --> 00:21:53.600
everyone, is using null hypothesis significance testing because it's feasible. It's something we can
00:21:53.600 --> 00:22:00.350
actually calculate that is of relevance, even though it's not exactly what we want.
NOTE
Accuracy: 91% (HIGH)
00:22:00.350 --> 00:22:08.400
There are actually large movements in statistics calling for going away from null hypothesis
00:22:08.400 --> 00:22:15.199
significance testing because of this perversity, and because of the misinterpretations, and because
00:22:15.199 --> 00:22:17.750
it's not telling us what we want.
NOTE
Accuracy: 91% (HIGH)
00:22:17.750 --> 00:22:24.700
These movements all make very good arguments, but for the most part they have not been very
00:22:24.700 --> 00:22:32.250
successful, except to alert people to the problems, because there is not a very good alternative yet.
NOTE
Accuracy: 80% (HIGH)
00:22:32.250 --> 00:22:40.100
There is another kind of statistics called Bayesian statistics, which is gaining ground, but that
00:22:40.100 --> 00:22:47.000
approach has its own problems as well, although many think that it is an improvement on null
00:22:47.000 --> 00:22:55.000
hypothesis significance testing. Who knows, maybe it will prevail in the end, but at this time null
00:22:55.000 --> 00:23:02.449
hypothesis significance testing remains pervasive, so we must be familiar with it and must be
NOTE
Accuracy: 91% (HIGH)
00:23:02.449 --> 00:23:07.700
able to avoid the misinterpretations that often accompany it.
NOTE
Accuracy: 89% (HIGH)
00:23:10.000 --> 00:23:19.700
So what we can do and what this process allows us to do is to calculate a probability based on an
00:23:19.700 --> 00:23:26.700
assumed distribution, and that is the distribution based on the null hypothesis because this is
00:23:26.700 --> 00:23:32.550
sufficiently concrete and well-defined that we can actually calculate something.
NOTE
Accuracy: 91% (HIGH)
00:23:32.550 --> 00:23:40.700
Unfortunately it's not possible to calculate probabilities based on unknown distributions. The thing
00:23:40.700 --> 00:23:48.100
here is we don't really know, no one really knows, what the true distribution in your population is. If
00:23:48.100 --> 00:23:51.300
we knew that, we would not need to sample.
NOTE
Accuracy: 91% (HIGH)
00:23:51.300 --> 00:23:58.650
So if we don't know what the actual distribution is, then there's nothing we can calculate from it.
NOTE
Accuracy: 91% (HIGH)
00:23:58.650 --> 00:24:04.600
That's why we have to resort to the crazy idea of the null hypothesis.
NOTE
Accuracy: 91% (HIGH)
00:24:04.600 --> 00:24:13.700
And we cannot calculate probabilities of hypotheses based on data. We can't say how likely your idea
00:24:13.700 --> 00:24:21.700
is based on your observations, because we don't know other things about your idea and its general
00:24:21.700 --> 00:24:27.200
plausibility, and what other alternatives may exist.
NOTE
Accuracy: 91% (HIGH)
00:24:27.700 --> 00:24:37.000
And that's why we always posit a null hypothesis, and what we do by positing this is hope to be
00:24:37.000 --> 00:24:46.000
able to reject this hypothesis by comparing the index obtained from our data to this criterion for
00:24:46.000 --> 00:24:53.500
rejection that we set via the alpha level. So the null hypothesis that we state is essentially a
00:24:53.500 --> 00:24:58.000
hypothesis we seek to reject, and the question is:
NOTE
Accuracy: 82% (HIGH)
00:24:58.000 --> 00:25:03.900
will our data allow us to reject this hypothesis or not?