WEBVTT Kind: captions; language: en-us NOTE Confidence: 84% (HIGH) 00:00:00.100 --> 00:00:09.100 In this video we will talk about statistical significance. In particular, we will review the logic of 00:00:09.100 --> 00:00:16.700 statistical association testing and the process of statistical hypothesis testing. And we will set 00:00:16.700 --> 00:00:24.400 the stage for the specific statistical analyses that will follow in this course. The three important 00:00:24.400 --> 00:00:29.900 ingredients of statistical significance testing are indices of NOTE Confidence: 82% (HIGH) 00:00:29.900 --> 00:00:38.200 association, sampling distributions for these indices, and confidence levels that we set for our 00:00:38.200 --> 00:00:40.300 decision making. NOTE Confidence: 91% (HIGH) 00:00:40.400 --> 00:00:49.850 The logic of statistically examining relationships between variables is a little convoluted. NOTE Confidence: 82% (HIGH) 00:00:49.850 --> 00:00:57.400 The first thing that we need to do when we have two variables and want to examine whether they are 00:00:57.400 --> 00:01:06.300 statistically associated is to compute an index of association, that is, some quantity that expresses the 00:01:06.300 --> 00:01:14.800 strength of their association. Remember that for us these words mean the same thing: association, 00:01:14.800 --> 00:01:20.199 relationship, predictability, and effect all refer to the same NOTE Confidence: 90% (HIGH) 00:01:20.199 --> 00:01:32.000 concept, the idea that two variables are related. We need to have a quantity that becomes larger when 00:01:32.000 --> 00:01:35.550 association strength increases. NOTE Confidence: 91% (HIGH) 00:01:35.550 --> 00:01:42.600 And so we need different ways to calculate this depending on the types of variables we have, as we 00:01:42.600 --> 00:01:49.400 will see later. What is very important to take into account is the variability that is observed in 00:01:49.400 --> 00:01:58.700 the variables. 
Variables with larger variability will require stronger associations in order for them 00:01:58.700 --> 00:02:01.500 to be statistically discernible. NOTE Confidence: 89% (HIGH) 00:02:02.600 --> 00:02:10.800 The second thing we need to do is to determine the sampling distribution of this index of 00:02:10.800 --> 00:02:12.650 association. NOTE Confidence: 91% (HIGH) 00:02:12.650 --> 00:02:21.600 The sampling distribution always refers to the null hypothesis: how would this index be 00:02:21.600 --> 00:02:30.400 distributed if there had been no effect, no relationship between the variables? One way to determine the 00:02:30.400 --> 00:02:38.950 sampling distribution is to generate many independent random samples of the desired size NOTE Confidence: 86% (HIGH) 00:02:38.950 --> 00:02:47.800 and then obtain a simulated distribution from this large set of samples. In our simple examples using the 00:02:47.800 --> 00:02:54.300 mean as a test statistic instead of an index of association, we obtained the sampling 00:02:54.300 --> 00:03:00.800 distribution of the mean by repeatedly sampling from a population and calculating the mean of each 00:03:00.800 --> 00:03:02.100 sample. NOTE Confidence: 80% (HIGH) 00:03:02.100 --> 00:03:10.200 When it comes to computing indices of association we can do the same by sampling repeatedly from the 00:03:10.200 --> 00:03:16.600 null distribution, computing the relevant statistic, and finally obtaining its sampling 00:03:16.600 --> 00:03:24.900 distribution. In practice this is most often done mathematically. The mathematicians have derived the 00:03:24.900 --> 00:03:31.000 theoretically expected distributions for us and we can just look up the values that we are interested 00:03:31.000 --> 00:03:32.050 in. 
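The repeated-sampling idea just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Pearson correlation is chosen here as one possible index of association, the two variables are drawn as independent standard normal values so that the null hypothesis holds by construction, and the names `pearson_r` and `null_sampling_distribution` are hypothetical helpers, not something from the course.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation, used here as one possible index of association."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def null_sampling_distribution(n, n_samples=5000, seed=0):
    """Index values from many random samples in which x and y are unrelated."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of x
        values.append(pearson_r(x, y))
    return values


null_r = null_sampling_distribution(n=30)
print("mean of simulated index:", round(statistics.fmean(null_r), 3))
print("spread of simulated index:", round(statistics.stdev(null_r), 3))
```

The simulated values centre near zero, and their spread shows how large an index can get through sampling alone when there is no relationship at all.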
NOTE Confidence: 91% (HIGH) 00:03:32.050 --> 00:03:43.500 The final step is hypothesis testing: evaluating the unexpectedness of a finding like the one 00:03:43.500 --> 00:03:48.050 we've got if we assume the null hypothesis. NOTE Confidence: 84% (HIGH) 00:03:48.050 --> 00:03:57.900 And this is found by comparing our observed index to the distribution in random samples, that is, to the 00:03:57.900 --> 00:04:06.300 sampling distribution of our statistic, of our index of association. And this amounts to obtaining a 00:04:06.300 --> 00:04:15.250 p-value for the observed or a higher level of association. So the logic is: first we need a quantity 00:04:15.250 --> 00:04:18.050 that indicates how strongly NOTE Confidence: 86% (HIGH) 00:04:18.050 --> 00:04:27.300 the variables appear to be related, and then we find out whether this strength of association would be 00:04:27.300 --> 00:04:35.200 unexpected under random sampling. This is the same thing we did when flipping coins in 00:04:35.200 --> 00:04:39.050 comparison to examining how many children improved. NOTE Confidence: 91% (HIGH) 00:04:39.050 --> 00:04:47.700 Now the actual process of getting this done involves the following steps. The first one is to 00:04:47.700 --> 00:04:54.900 determine our research question, which has to do with a relationship between specific variables that 00:04:54.900 --> 00:04:57.700 have to be fully defined. NOTE Confidence: 86% (HIGH) 00:04:57.800 --> 00:05:06.400 Then, in contradiction to our research hypothesis, we posit the null hypothesis, which states that 00:05:06.400 --> 00:05:15.450 there is no relationship, and this means that any index of association we compute will be equal to 0 00:05:15.450 --> 00:05:22.000 according to the null hypothesis. So the null hypothesis states no relationship, which means zero 00:05:22.000 --> 00:05:23.400 index. 
NOTE Confidence: 91% (HIGH) 00:05:23.400 --> 00:05:30.300 And then we compute the actual index of relationship from the data. NOTE Confidence: 86% (HIGH) 00:05:30.800 --> 00:05:41.600 The following step is to assume the null hypothesis is true and compute the probability of an index 00:05:41.600 --> 00:05:49.100 as large as or greater than the one we observed, based on the sampling distribution from the null 00:05:49.100 --> 00:05:58.600 hypothesis. This amounts to asking: how likely is it that such an observation, or a more extreme one, 00:05:58.600 --> 00:06:01.250 might occur just by randomly NOTE Confidence: 75% (MEDIUM) 00:06:01.250 --> 00:06:05.100 sampling from the null distribution? NOTE Confidence: 89% (HIGH) 00:06:05.100 --> 00:06:14.700 This probability, called the p-value, is then compared to an arbitrary criterion that is set by 00:06:14.700 --> 00:06:26.600 convention. If the p-value is less than 0.05, which is the most commonly used criterion, then we call 00:06:26.600 --> 00:06:34.200 the relationship statistically significant. This is a term: statistically significant means that the 00:06:34.200 --> 00:06:35.950 p-value is less than NOTE Confidence: 88% (HIGH) 00:06:35.950 --> 00:06:38.900 our comparison criterion. NOTE Confidence: 91% (HIGH) 00:06:38.900 --> 00:06:46.600 And if that happens, then what we do is we reject the null hypothesis NOTE Confidence: 82% (HIGH) 00:06:46.600 --> 00:06:55.650 and conclude that there is a relationship among our variables. So the null hypothesis is rejected 00:06:55.650 --> 00:07:04.700 when we judge an effect such as the one we obtained to be unlikely on the assumption of the null 00:07:04.700 --> 00:07:14.500 hypothesis. And unlikely means less than 5% likely, which is a rather loose criterion; you can use more 00:07:14.500 --> 00:07:17.800 stringent criteria in your studies. 
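As a rough sketch of these steps, the snippet below computes an index from data, builds a null distribution, obtains a p-value for an index as large as or larger than the observed one, and compares it to the 0.05 criterion. Several things here are assumptions made for illustration and are not from the lecture: the data are made up, the Pearson correlation stands in for the index of association, and the null distribution is approximated by shuffling one variable (a permutation test) rather than by looking up a theoretical distribution. The helper names are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation, used here as one possible index of association."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Share of shuffled data sets whose index is at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # shuffling breaks any link between x and y
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    # Counting the observed arrangement itself keeps the estimate above zero.
    return (hits + 1) / (n_perm + 1)


ALPHA = 0.05  # conventional criterion (significance level)

# Made-up data in which y really does depend on x.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(40)]
y = [1.5 * a + data_rng.gauss(0, 1) for a in x]

p = permutation_p_value(x, y)
if p < ALPHA:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"
print(f"p = {p:.4f}: {decision}")
```

Note that the wording of the decision follows the strict terms used in the lecture: the only two outcomes are rejecting or failing to reject the null hypothesis.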
NOTE Confidence: 91% (HIGH) 00:07:17.900 --> 00:07:28.250 If the p-value is greater than or equal to the criterion, here 0.05, then the relationship is called 00:07:28.250 --> 00:07:36.700 not statistically significant. Not statistically significant: remember, this is a term. Statistically 00:07:36.700 --> 00:07:43.500 significant is a term, and the opposite is not statistically significant; it's not statistically 00:07:43.500 --> 00:07:45.800 insignificant. NOTE Confidence: 90% (HIGH) 00:07:46.100 --> 00:07:51.850 In this case we don't reject the null hypothesis NOTE Confidence: 91% (HIGH) 00:07:51.850 --> 00:08:00.100 and we conclude that there is no discernible relationship between our variables. NOTE Confidence: 91% (HIGH) 00:08:00.100 --> 00:08:08.500 There may be one but we cannot see it, so we cannot reject the null hypothesis and cannot conclude 00:08:08.500 --> 00:08:11.150 that there is a relationship. NOTE Confidence: 82% (HIGH) 00:08:11.150 --> 00:08:21.400 The criterion, also known as the alpha level, is called the level of significance. So for a given significance 00:08:21.400 --> 00:08:30.700 level we may reject or fail to reject the null hypothesis. These are very strictly used terms and it 00:08:30.700 --> 00:08:37.200 is important to use them correctly; you're not allowed to rephrase as you like. NOTE Confidence: 91% (HIGH) 00:08:38.600 --> 00:08:48.200 As a preview, here is a table with a simplified presentation of indices of association between pairs of 00:08:48.200 --> 00:08:56.800 variables. Do not worry about trying to learn any of this right now. The point I want to make 00:08:56.800 --> 00:09:05.000 here has to do with the types of variables one has, that is, whether the independent and dependent 00:09:05.000 --> 00:09:06.350 variables are NOTE Confidence: 90% (HIGH) 00:09:06.350 --> 00:09:13.800 quantitative or qualitative. 
There are different indices of association that it is possible to compute, 00:09:13.800 --> 00:09:20.900 because that depends on the kinds of data that you have, and there are specific and actually not 00:09:20.900 --> 00:09:27.100 complicated ways to compute them. In practice we never do that; the computer does it for us. We just 00:09:27.100 --> 00:09:36.300 have to understand what happens. The null hypothesis amounts to setting these indices to 0, which is 00:09:36.300 --> 00:09:36.600 the assumption NOTE Confidence: 90% (HIGH) 00:09:36.600 --> 00:09:44.000 of no relationship, and then we have to look up our values for these indices in the sampling 00:09:44.000 --> 00:09:52.000 distributions with the associated degrees of freedom in each case, and then reach our decision based on 00:09:52.000 --> 00:09:55.600 the criterion, which is the alpha level. NOTE Confidence: 88% (HIGH) 00:09:56.700 --> 00:10:05.800 What exactly do we do with the null hypothesis? We may reject the null hypothesis when the 00:10:05.800 --> 00:10:12.500 probability of our observed statistic, or a more extreme one, by randomly sampling from the null 00:10:12.500 --> 00:10:21.400 distribution is too low. Too low is determined by an arbitrary criterion set by convention. NOTE Confidence: 91% (HIGH) 00:10:21.700 --> 00:10:31.300 This is called statistical significance. It doesn't mean importance. Significant in the statistical 00:10:31.300 --> 00:10:40.600 sense means exactly that the probability of the observed effect, or a larger one, is less than a 00:10:40.600 --> 00:10:47.900 conventional criterion for significance. That's what the word significant means in statistics: it 00:10:47.900 --> 00:10:52.150 doesn't mean important, it doesn't mean substantial, NOTE Confidence: 86% (HIGH) 00:10:52.150 --> 00:11:01.600 it doesn't mean large. 
It only means that the p-value is lower than some criterion, and this 00:11:01.600 --> 00:11:08.200 statistical significance indicates a statistically significant difference, or a statistically 00:11:08.200 --> 00:11:15.200 significant association, or a statistically significant effect, or prediction: all these words 00:11:15.200 --> 00:11:18.100 refer to the same concept. NOTE Confidence: 91% (HIGH) 00:11:18.800 --> 00:11:29.100 The other case is when we cannot reject the null hypothesis, and you should never say accept the null 00:11:29.100 --> 00:11:37.350 hypothesis. We never accept the null hypothesis. The null hypothesis may never be true; for all we know, 00:11:37.350 --> 00:11:45.900 there may always be some effect that is just so small that our sample size cannot discern it. So we 00:11:45.900 --> 00:11:48.650 never say accept the null hypothesis. NOTE Confidence: 91% (HIGH) 00:11:48.650 --> 00:11:56.100 You may say that you retain the null hypothesis, which is the same as not rejecting it, but usually 00:11:56.100 --> 00:12:03.300 you just say we cannot reject, or we do not reject, or we fail to reject the null hypothesis, and we do 00:12:03.300 --> 00:12:10.599 that when the probability of our index is not low enough by the alpha level. NOTE Confidence: 81% (HIGH) 00:12:10.599 --> 00:12:18.550 And the reason for that may be that we have insufficient power; we will never know. NOTE Confidence: 90% (HIGH) 00:12:18.550 --> 00:12:26.000 When we cannot reject the null hypothesis we have a situation that is called not statistically 00:12:26.000 --> 00:12:32.400 significant: the association is not statistically significant, or the difference is not 00:12:32.400 --> 00:12:39.100 statistically significant, the effect or the prediction is not statistically significant. 
We can also 00:12:39.100 --> 00:12:45.100 say that the indices are statistically indistinguishable: two means may be statistically 00:12:45.100 --> 00:12:49.000 indistinguishable, or our index of association NOTE Confidence: 60% (MEDIUM) 00:12:49.000 --> 00:12:52.800 may be statistically indistinguishable from zero. NOTE Confidence: 91% (HIGH) 00:12:54.200 --> 00:13:03.400 Comparing null hypothesis significance testing and confidence intervals, we have to say 00:13:03.400 --> 00:13:10.500 that these are basically the same thing conceptually, although they don't look exactly the same and 00:13:10.500 --> 00:13:19.800 may not be equally difficult to understand. The first thing to note is that these two things are 00:13:19.800 --> 00:13:24.850 equivalent, so to say that the p-value is less than NOTE Confidence: 84% (HIGH) 00:13:24.850 --> 00:13:34.050 alpha is the same as saying that zero is not within the alpha-level confidence interval. NOTE Confidence: 91% (HIGH) 00:13:34.050 --> 00:13:46.200 So p less than 0.05 is the same as saying that the 95% confidence interval does not include 0. Both 00:13:46.200 --> 00:13:55.200 of these are determined in the same way, by looking up the critical value in the same sampling 00:13:55.200 --> 00:13:56.900 distribution. NOTE Confidence: 91% (HIGH) 00:13:56.900 --> 00:14:05.100 And therefore they are exactly equivalent, so you can either provide an alpha-level confidence 00:14:05.100 --> 00:14:12.200 interval and check whether it includes 0 or not, or you can provide a p-value and check whether it is 00:14:12.200 --> 00:14:15.000 lower than the alpha level.
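The equivalence stated above can be written out as a chain of conditions. This is a sketch for one common case only, assuming a two-sided test and an estimate whose sampling distribution is approximately normal; the symbols $\hat\theta$ (the estimated index), $\mathrm{SE}$ (its standard error), and $z_{1-\alpha/2}$ (the critical value) are notation introduced here, not taken from the lecture.

```latex
\[
p < \alpha
\;\iff\;
\frac{|\hat\theta|}{\mathrm{SE}} > z_{1-\alpha/2}
\;\iff\;
|\hat\theta| > z_{1-\alpha/2}\,\mathrm{SE}
\;\iff\;
0 \notin \bigl[\hat\theta - z_{1-\alpha/2}\,\mathrm{SE},\;\; \hat\theta + z_{1-\alpha/2}\,\mathrm{SE}\bigr]
\]
```

With $\alpha = 0.05$ the interval on the right is the 95% confidence interval, so the p-value falls below 0.05 exactly when that interval leaves out zero.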
NOTE Confidence: 91% (HIGH) 00:14:16.400 --> 00:14:26.500 An effect is significant whenever it is greater than the margin of error. If you remember, the margin 00:14:26.500 --> 00:14:34.600 of error is calculated by taking into account the critical value, which is based on the probability 00:14:34.600 --> 00:14:41.900 that is your arbitrary criterion, and the standard error, which relates to the observed variability 00:14:41.900 --> 00:14:43.800 and your sample size. NOTE Confidence: 78% (HIGH) 00:14:43.800 --> 00:14:49.800 So the margin of error already includes all the relevant information. NOTE Confidence: 86% (HIGH) 00:14:50.200 --> 00:15:00.100 To say this a bit differently again: if the confidence interval excludes the null, which is almost 00:15:00.100 --> 00:15:08.700 always zero, if the confidence interval excludes the null, then you are essentially rejecting the null 00:15:08.700 --> 00:15:18.100 hypothesis. The confidence interval is an interval estimate of something. It can be a confidence 00:15:18.100 --> 00:15:20.950 interval for your mean, which is the simplest NOTE Confidence: 70% (MEDIUM) 00:15:20.950 --> 00:15:28.150 statistic you can get, or it can be a confidence interval for your index of association, or anything else. 00:15:28.150 --> 00:15:32.600 You can always provide a confidence interval. NOTE Confidence: 91% (HIGH) 00:15:33.700 --> 00:15:43.600 This will be equivalent to providing a point estimate, that is, one value as your main guess or your main 00:15:43.600 --> 00:15:51.849 result of calculation, plus the standard error that is associated with it. Now you know how you can 00:15:51.849 --> 00:15:59.000 obtain a confidence interval based on a point estimate and standard error, so these two are 00:15:59.000 --> 00:16:00.900 equivalent.
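A small numerical sketch may help connect the point estimate, the standard error, the margin of error, and the p-value. It assumes a normal sampling distribution for the estimate (real analyses often use a t distribution with the appropriate degrees of freedom instead), and the two estimate and standard error pairs are made-up numbers for illustration. The function names are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, alpha=0.05):
    """Interval estimate: point estimate plus or minus the margin of error."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    margin = critical * std_error                   # margin of error
    return estimate - margin, estimate + margin


def two_sided_p_value(estimate, std_error):
    """Probability of an estimate this far from 0, or farther, under the null."""
    z = abs(estimate) / std_error
    return 2 * (1 - NormalDist().cdf(z))


ALPHA = 0.05
for estimate, std_error in [(0.50, 0.20), (0.30, 0.20)]:
    low, high = confidence_interval(estimate, std_error, ALPHA)
    p = two_sided_p_value(estimate, std_error)
    excludes_zero = low > 0 or high < 0
    print(f"estimate={estimate:.2f}  CI=({low:.3f}, {high:.3f})  p={p:.4f}  "
          f"p<alpha: {p < ALPHA}  CI excludes 0: {excludes_zero}")
```

In both rows the two checks agree: the p-value is below the alpha level exactly when the confidence interval excludes zero, which is the same as the estimate being larger than its margin of error.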
NOTE Confidence: 90% (HIGH) 00:16:01.800 --> 00:16:11.200 The same alpha level is relevant for both formulations. For the confidence interval, the alpha level 00:16:11.200 --> 00:16:18.800 goes into calculating the margin of error via the critical value, and for the significance of the 00:16:18.800 --> 00:16:25.000 point estimate it goes into comparing your p-value to this level. NOTE Confidence: 90% (HIGH) 00:16:26.000 --> 00:16:34.700 The confidence interval may be a bit easier to understand, not in the sense that it is simpler to 00:16:34.700 --> 00:16:42.600 calculate, but in the sense that it is less likely than the p-value to be misinterpreted. NOTE Confidence: 82% (HIGH) 00:16:42.700 --> 00:16:50.900 P-values are unfortunately very often misinterpreted, and you should be very careful when you state 00:16:50.900 --> 00:16:53.600 what your p-value means. NOTE Confidence: 86% (HIGH) 00:16:54.900 --> 00:16:58.700 Here is what the p-value means: NOTE Confidence: 80% (HIGH) 00:16:58.700 --> 00:17:08.400 it is the probability of a finding or a statistic, an index of association, like the one you observed 00:17:08.400 --> 00:17:17.900 from your data, or more extreme, if the null hypothesis is true. That's what the p-value is, and the 00:17:17.900 --> 00:17:24.150 p-value makes sense only in the context of the null hypothesis being true. NOTE Confidence: 89% (HIGH) 00:17:24.150 --> 00:17:32.000 Here's what the p-value is not: it is not the probability of the null hypothesis, or of the 00:17:32.000 --> 00:17:38.300 alternative hypothesis. 
It cannot be the probability of the null hypothesis because you assume that 00:17:38.300 --> 00:17:47.500 it's true in order to compute the p-value, and of course the alternative hypothesis, usually being the 00:17:47.500 --> 00:17:54.750 opposite of the null, also cannot be evaluated probabilistically because it's assumed to be false for NOTE Confidence: 74% (MEDIUM) 00:17:54.750 --> 00:17:56.900 computing the p-value. NOTE Confidence: 88% (HIGH) 00:17:57.300 --> 00:18:06.100 It is not the probability of the data we observe. The data have been observed; they have a probability 00:18:06.100 --> 00:18:15.050 of one, they are just there. It's not the same to say the probability of my data and the probability 00:18:15.050 --> 00:18:23.100 of getting a result like the one I've got. So it's the probability of getting a data pattern like the 00:18:23.100 --> 00:18:26.800 one you've observed by random sampling. NOTE Confidence: 91% (HIGH) 00:18:26.800 --> 00:18:34.750 It's not the probability that we are right or wrong. Obviously no one can calculate this probability. 00:18:34.750 --> 00:18:41.200 We would love to have that probability, but it is not mathematically possible because it depends on 00:18:41.200 --> 00:18:51.500 many other things. It depends on what other hypotheses are possible or plausible; it depends on how 00:18:51.500 --> 00:18:55.000 frequently things happen in general. NOTE Confidence: 91% (HIGH) 00:18:55.000 --> 00:19:00.200 You cannot say anything like that just based on your data. NOTE Confidence: 91% (HIGH) 00:19:01.200 --> 00:19:10.300 It is not the probability that your findings, or my findings, occurred by chance, or did not occur by 00:19:10.300 --> 00:19:20.200 chance. 
No one can tell how your data came about; we don't know why our observations occurred. We 00:19:20.200 --> 00:19:29.400 sample the population; this means that we sampled from some distribution, and this entails a certain 00:19:29.400 --> 00:19:31.150 amount of variability, NOTE Confidence: 68% (MEDIUM) 00:19:31.150 --> 00:19:39.300 a certain amount of sampling error. We may have been lucky to get a representative sample from the 00:19:39.300 --> 00:19:45.000 distribution, from the population we sampled from, or we may have been unlucky, and there is no 00:19:45.000 --> 00:19:51.100 observable difference that can help us distinguish these; it's just a sample. NOTE Confidence: 91% (HIGH) 00:19:51.100 --> 00:19:58.400 There's no way to know if your findings occurred by chance. The only thing you can do is tell how 00:19:58.400 --> 00:20:07.200 likely it is to get findings like the one you got by random sampling from a distribution you know has 00:20:07.200 --> 00:20:09.200 no effect in it. NOTE Confidence: 77% (HIGH) 00:20:09.200 --> 00:20:19.000 So you cannot conclude whether or not your result was due to chance based on your p-value, NOTE Confidence: 91% (HIGH) 00:20:19.300 --> 00:20:24.550 and you cannot even say how confident you are. NOTE Confidence: 91% (HIGH) 00:20:24.550 --> 00:20:33.200 Your p-value doesn't tell you how certain you can be that your results are not due to chance. It only 00:20:33.200 --> 00:20:40.750 tells you how likely you are to get such results by chance, but nobody knows anything about your 00:20:40.750 --> 00:20:48.500 results. So you can't say anything about this either; this is not what the p-value means. NOTE Confidence: 87% (HIGH) 00:20:48.500 --> 00:20:56.650 These are very commonly seen, not only in master's theses but sometimes in articles. 
NOTE Confidence: 83% (HIGH) 00:20:56.650 --> 00:21:04.300 These are typical misunderstandings of the p-value, and it is very important that you think hard 00:21:04.300 --> 00:21:09.700 about this idea and do not misinterpret the p-value. NOTE Confidence: 91% (HIGH) 00:21:10.600 --> 00:21:20.100 Finally, we should note that this whole deal of the null hypothesis significance testing process is 00:21:20.100 --> 00:21:29.500 very weird. It is a roundabout method going in circles; it's perverse because it tells us something 00:21:29.500 --> 00:21:32.000 that we don't really care about. NOTE Confidence: 91% (HIGH) 00:21:32.000 --> 00:21:39.000 It's not what we want to know. We don't really care so much what the chance is of getting something 00:21:39.000 --> 00:21:47.450 by random sampling, so why are we doing that? Well, because we can. So pretty much everyone, or almost 00:21:47.450 --> 00:21:53.600 everyone, is using null hypothesis significance testing because it's feasible; it's something we can 00:21:53.600 --> 00:22:00.350 actually calculate that is of relevance even though it's not exactly what we want. NOTE Confidence: 91% (HIGH) 00:22:00.350 --> 00:22:08.400 There are actually large movements in statistics calling for moving away from null hypothesis 00:22:08.400 --> 00:22:15.199 significance testing because of this perversity, and because of the misinterpretations, and because 00:22:15.199 --> 00:22:17.750 it's not telling us what we want. NOTE Confidence: 91% (HIGH) 00:22:17.750 --> 00:22:24.700 These movements all make very good arguments, but for the most part they have not been very 00:22:24.700 --> 00:22:32.250 successful except to alert people to the problems, because there is not a very good alternative yet. NOTE Confidence: 80% (HIGH) 00:22:32.250 --> 00:22:40.100 There is another kind of statistics called Bayesian statistics which is gaining ground, but that 00:22:40.100 --> 00:22:47.000 approach has its own problems as well. 
Many think, though, that it is an improvement on null 00:22:47.000 --> 00:22:55.000 hypothesis significance testing. Who knows, maybe it will prevail in the end, but at this time null 00:22:55.000 --> 00:23:02.449 hypothesis significance testing remains pervasive, so we must be familiar with it and must be careful NOTE Confidence: 91% (HIGH) 00:23:02.449 --> 00:23:07.700 to avoid the misinterpretations that often accompany it. NOTE Confidence: 89% (HIGH) 00:23:10.000 --> 00:23:19.700 So what we can do, and what this process allows us to do, is to calculate a probability based on an 00:23:19.700 --> 00:23:26.700 assumed distribution, and that is the distribution based on the null hypothesis, because this is 00:23:26.700 --> 00:23:32.550 sufficiently concrete and well-defined that we can actually calculate something. NOTE Confidence: 91% (HIGH) 00:23:32.550 --> 00:23:40.700 Unfortunately it's not possible to calculate probabilities based on unknown distributions. The thing 00:23:40.700 --> 00:23:48.100 here is we don't really know, no one really knows, what the true distribution in your population is. If 00:23:48.100 --> 00:23:51.300 we knew that we would not need to sample. NOTE Confidence: 91% (HIGH) 00:23:51.300 --> 00:23:58.650 So if we don't know what the actual distribution is then there's nothing we can calculate from it. NOTE Confidence: 91% (HIGH) 00:23:58.650 --> 00:24:04.600 That's why we have to resort to the crazy idea of the null hypothesis. NOTE Confidence: 91% (HIGH) 00:24:04.600 --> 00:24:13.700 And we cannot calculate probabilities of hypotheses based on data. We can't say how likely your idea 00:24:13.700 --> 00:24:21.700 is based on your observations because we don't know other things about your idea and its general 00:24:21.700 --> 00:24:27.200 plausibility, and what other alternatives may exist. 
NOTE Confidence: 91% (HIGH) 00:24:27.700 --> 00:24:37.000 And that's why we always posit a null hypothesis, and what we do by positing this is hope to be 00:24:37.000 --> 00:24:46.000 able to reject this hypothesis by comparing the index obtained from our data to the criterion for 00:24:46.000 --> 00:24:53.500 rejection that we set via the alpha level. So the null hypothesis that we state is essentially a 00:24:53.500 --> 00:24:58.000 hypothesis we seek to reject, and the question is: NOTE Confidence: 82% (HIGH) 00:24:58.000 --> 00:25:03.900 will our data allow us to reject this hypothesis or not?