WEBVTT Kind: captions; language: en-us
NOTE
Treffsikkerhet: 91% (H?Y)
00:00:00.000 --> 00:00:08.600
In this video we will go through some preliminary remarks regarding probability. In particular, we
00:00:08.600 --> 00:00:16.400
will consider the question: what is probability, and why should we care?
NOTE
Treffsikkerhet: 91% (H?Y)
00:00:16.400 --> 00:00:24.100
Let's start with the second question: why is probability important to understand?
NOTE
Treffsikkerhet: 88% (H?Y)
00:00:24.300 --> 00:00:33.300
First of all, probability is how we deal with variability. As we've said before, nothing is certain;
00:00:33.300 --> 00:00:41.800
things are variable, and we can never be completely sure. But not everything is equally likely: some things
00:00:41.800 --> 00:00:48.599
are likely and expected and some other uncertain things are less likely and unexpected.
NOTE
Treffsikkerhet: 89% (H?Y)
00:00:48.599 --> 00:00:56.500
And we need to be able to handle this variability so that we make good decisions based on justified
00:00:56.500 --> 00:01:03.000
expectations. When are our expectations justified? When they reflect the actual probability that
00:01:03.000 --> 00:01:05.099
something will happen.
NOTE
Treffsikkerhet: 90% (H?Y)
00:01:05.700 --> 00:01:13.050
More specifically, in a research context, there is the issue of variability coming from sampling.
00:01:13.050 --> 00:01:20.900
Sampling means that, because we can't measure everyone we're interested in, we have to sample and measure
00:01:20.900 --> 00:01:29.700
some people or some instances or some cases. And we want to be able to draw conclusions from our
00:01:29.700 --> 00:01:35.950
sample that will be valid for the whole population; that is, we want our
NOTE
Treffsikkerhet: 78% (H?Y)
00:01:35.950 --> 00:01:40.300
findings to be valid for all those we didn't happen to sample.
NOTE
Treffsikkerhet: 85% (H?Y)
00:01:40.300 --> 00:01:49.150
Would we get the same result if we had used another sample? If we run a study with 20 or 50 people
00:01:49.150 --> 00:01:57.600
and find something, would we have gotten the same finding with a different group of 20 or 50 people? This
00:01:57.600 --> 00:02:03.000
is very important in order to be able to make the right decision based on our study.
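NOTE
Editor's sketch, not part of the spoken lecture: one rough way to see how much a result
can change from one sample to another is to simulate it. The population below (10,000
scores with mean 100 and SD 15) is entirely made up, and the function name is hypothetical.
```python
import random
import statistics
def sample_mean_spread(population, sample_size, n_samples, rng):
    """Standard deviation of the means of repeated random samples."""
    means = [statistics.mean(rng.sample(population, sample_size)) for _ in range(n_samples)]
    return statistics.stdev(means)
rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Hypothetical population: made-up scores, only for illustration.
population = [rng.gauss(100, 15) for _ in range(10_000)]
spread_20 = sample_mean_spread(population, 20, 2_000, rng)
spread_50 = sample_mean_spread(population, 50, 2_000, rng)
print(f"spread of sample means, n=20: {spread_20:.2f}")
print(f"spread of sample means, n=50: {spread_50:.2f}")
```
Under these assumptions, different groups of 20 give noticeably different means, and
groups of 50 vary less, which is the sampling variability the question above is about.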
NOTE
Treffsikkerhet: 91% (H?Y)
00:02:03.000 --> 00:02:12.200
So in general we need to understand the variability of sampling, the variability that comes from sampling,
00:02:12.200 --> 00:02:18.500
what it implies for our conclusions, the implications of this sampling variability which are
00:02:18.500 --> 00:02:21.500
inherently probabilistic.
NOTE
Treffsikkerhet: 81% (H?Y)
00:02:22.700 --> 00:02:30.400
Now what exactly is probability? There are different philosophical approaches to this question and
00:02:30.400 --> 00:02:36.800
there are formal definitions, and we're not going to go into any of that. The question for us is
00:02:36.800 --> 00:02:44.200
how to think about probability so that we have reasonable intuitions and our thinking makes sense.
NOTE
Treffsikkerhet: 91% (H?Y)
00:02:44.200 --> 00:02:54.600
So the basic idea is that probability refers to how likely something is to happen, but this is not
00:02:54.600 --> 00:02:56.700
very easy to think about.
NOTE
Treffsikkerhet: 83% (H?Y)
00:02:56.700 --> 00:03:05.700
So the easier and equivalent way of thinking about it is how frequently something happens. So if
00:03:05.700 --> 00:03:11.800
there were many opportunities for an event to occur, how many of these would actually lead to the
00:03:11.800 --> 00:03:14.300
occurrence of the event?
NOTE
Treffsikkerhet: 85% (H?Y)
00:03:14.800 --> 00:03:23.100
High probability means that something happens often, so it's a frequent occurrence. This does not mean
00:03:23.100 --> 00:03:32.400
that it's certain; we can never be certain. But we can expect it to happen, so a high-probability event
00:03:32.400 --> 00:03:36.700
is something that doesn't surprise us when it happens.
NOTE
Treffsikkerhet: 91% (H?Y)
00:03:36.700 --> 00:03:44.000
What about low probability? Well, that's the opposite: it's something that doesn't happen very often, so
00:03:44.000 --> 00:03:51.900
it's an infrequent event. That does not mean that it's impossible; it actually will happen, but not
00:03:51.900 --> 00:03:59.600
very frequently. So a low-probability event is something unexpected, which tends to surprise us when it
00:03:59.600 --> 00:04:06.550
does happen. It doesn't surprise us that it happens at all, but it surprises us
NOTE
Treffsikkerhet: 73% (MEDIUM)
00:04:06.550 --> 00:04:16.600
more than a high-probability event because it's expected less. And the question then arises: when
00:04:16.600 --> 00:04:24.400
exactly are low-probability events expected? They're not zero probability, so they will happen at some
00:04:24.400 --> 00:04:27.900
point. Can we say something more about that?
NOTE
Treffsikkerhet: 91% (H?Y)
00:04:27.900 --> 00:04:35.300
Let's ask this kind of question in a slightly more concrete context. It's not a special needs education
00:04:35.300 --> 00:04:42.200
context, but it's something that's very easy to understand and think about, and then we'll go back to
00:04:42.200 --> 00:04:45.900
more relevant examples later in the course.
NOTE
Treffsikkerhet: 88% (H?Y)
00:04:46.300 --> 00:04:52.000
So the easy question is: why don't I win the lottery?
NOTE
Treffsikkerhet: 91% (H?Y)
00:04:52.200 --> 00:04:57.350
And there is also another question: if everybody
NOTE
Treffsikkerhet: 82% (H?Y)
00:04:57.350 --> 00:05:05.200
can ask themselves "why don't I win the lottery?", then how come anyone wins the lottery? If it's so
00:05:05.200 --> 00:05:10.900
unlikely for any one of us, why is it likely for someone?
NOTE
Treffsikkerhet: 89% (H?Y)
00:05:11.300 --> 00:05:16.000
So you're probably familiar with this type of lottery;
NOTE
Treffsikkerhet: 90% (H?Y)
00:05:16.000 --> 00:05:25.950
in this particular case there are 34 numbers and you are supposed to pick seven numbers
NOTE
Treffsikkerhet: 85% (H?Y)
00:05:25.950 --> 00:05:34.300
and note these seven numbers, and then there is some sort of draw that is a completely random event:
00:05:34.300 --> 00:05:44.900
there is some device that ensures that seven out of 34 numbers are selected. So in this Norwegian
00:05:44.900 --> 00:05:52.600
case and also in many other countries there is a rotating device that uses air or something else to
00:05:52.600 --> 00:05:56.250
just mix up these numbered balls
NOTE
Treffsikkerhet: 91% (H?Y)
00:05:56.250 --> 00:06:04.400
and some of them are caught and rolled out and then you have the winners. So there are seven numbers
00:06:04.400 --> 00:06:11.500
that are drawn at random, and if you happen to have picked exactly these seven numbers, then you can
00:06:11.500 --> 00:06:13.600
win a lot of money.
NOTE
Treffsikkerhet: 84% (H?Y)
00:06:13.600 --> 00:06:19.200
So that sounds enticing and that's why people actually pay to do this,
NOTE
Treffsikkerhet: 91% (H?Y)
00:06:19.300 --> 00:06:28.000
but as you probably know, you're not very likely to win. And why aren't you likely to win? Well, this
00:06:28.000 --> 00:06:33.900
can be answered by thinking of what it takes to have a winning sequence.
NOTE
Treffsikkerhet: 91% (H?Y)
00:06:34.700 --> 00:06:44.400
If you are to win this game, the following must happen: when all 34 balls are still inside
00:06:44.400 --> 00:06:53.700
that device, the one that comes out must be one of the seven you have picked, so the probability of getting
00:06:53.700 --> 00:07:03.200
the first ball correct is seven divided by 34. And after this the following must happen:
NOTE
Treffsikkerhet: 81% (H?Y)
00:07:03.200 --> 00:07:06.800
of the 33 balls that are remaining in there
NOTE
Treffsikkerhet: 81% (H?Y)
00:07:07.000 --> 00:07:15.000
whichever one comes out must be one of the six left that you have guessed, and because you want both of
00:07:15.000 --> 00:07:23.300
these to be true, these probabilities have to be multiplied. And then there are 32 balls in there and the one that
00:07:23.300 --> 00:07:31.600
comes out must be one of your five remaining choices, and so on and so forth. And the answer is that for
00:07:31.600 --> 00:07:35.950
all of this to happen so that you win, that you get the
NOTE
Treffsikkerhet: 82% (H?Y)
00:07:35.950 --> 00:07:45.500
seven balls correct by your single choice, the chances are one in about five and a half million.
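NOTE
Editor's sketch, not part of the spoken lecture: the ball-by-ball multiplication described
above can be checked with exact fractions. The variable name p_win is only illustrative.
```python
from fractions import Fraction
from math import comb
# Multiply the chances ball by ball: 7/34, then 6/33, then 5/32, and so on down to 1/28.
p_win = Fraction(1)
for k in range(7):
    p_win *= Fraction(7 - k, 34 - k)
print(p_win)  # 1/5379616
# The same number is the count of distinct seven-number sets out of 34.
print(comb(34, 7))  # 5379616
```
This agrees with the "one in about five and a half million" quoted in the video.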
NOTE
Treffsikkerhet: 89% (H?Y)
00:07:47.700 --> 00:07:51.450
To put this number in context
NOTE
Treffsikkerhet: 88% (H?Y)
00:07:51.450 --> 00:08:02.500
the chances of being hit by lightning in Norway are estimated to be about 1 in 150,000, and if
00:08:02.500 --> 00:08:13.500
you dial completely random numbers on your phone, the chance of calling someone you know can
00:08:13.500 --> 00:08:22.550
be estimated to be about one in 167,000 assuming that you know about 600 people on average
NOTE
Treffsikkerhet: 84% (H?Y)
00:08:22.550 --> 00:08:31.800
and this is how many numbers it is possible to ring in Norway. I didn't make up these estimates;
00:08:31.800 --> 00:08:39.799
there are some published estimates for that. So how likely do you think it is to actually reach
00:08:39.799 --> 00:08:42.600
someone you know by randomly dialing?
NOTE
Treffsikkerhet: 91% (H?Y)
00:08:42.600 --> 00:08:51.400
How likely do you think it is to be hit by lightning? Well, winning the lottery is way less likely
00:08:51.400 --> 00:08:56.200
than that, so that's why you don't win the lottery.
NOTE
Treffsikkerhet: 91% (H?Y)
00:08:58.100 --> 00:09:03.950
An important note in this context is that
NOTE
Treffsikkerhet: 91% (H?Y)
00:09:03.950 --> 00:09:11.400
this calculation doesn't take into account which numbers you selected, because it doesn't matter in
00:09:11.400 --> 00:09:20.200
any way, so any possible set of seven numbers has exactly the same probability of winning. This means
00:09:20.200 --> 00:09:28.500
that these numbers are exactly equally likely to win as these numbers.
NOTE
Treffsikkerhet: 91% (H?Y)
00:09:28.500 --> 00:09:37.200
And the reason I'm saying this is because these don't look very likely, these look more likely, these
00:09:37.200 --> 00:09:44.300
look more random, but that makes absolutely no difference: any set of seven numbers has exactly the
00:09:44.300 --> 00:09:45.900
same probability.
NOTE
Treffsikkerhet: 91% (H?Y)
00:09:45.900 --> 00:09:54.500
So the probability of winning the lottery, a lottery of this sort, is best thought of if you are
00:09:54.500 --> 00:10:02.250
thinking of this sequence. You get a more accurate feeling than by thinking about these.
NOTE
Treffsikkerhet: 91% (H?Y)
00:10:02.250 --> 00:10:10.600
Moreover, which ball comes out at any given instant does not depend on which one came out before
00:10:10.600 --> 00:10:19.650
either in the same or in a previous draw. The random choice of numbers is independent and memoryless.
NOTE
Treffsikkerhet: 91% (H?Y)
00:10:19.650 --> 00:10:27.000
Regardless of what happened before or what happened in the immediately previous draw, so even if
00:10:27.000 --> 00:10:36.150
these numbers came up last week, this doesn't affect their likelihood of coming up again this week.
NOTE
Treffsikkerhet: 89% (H?Y)
00:10:36.150 --> 00:10:42.800
So this should give you a better sense of how likely it is to win the lottery, although you already
00:10:42.800 --> 00:10:51.400
knew that you are not likely to win it at all. Which brings us to the next question: how come anyone
00:10:51.400 --> 00:10:54.300
wins if it is so unlikely?
NOTE
Treffsikkerhet: 91% (H?Y)
00:10:54.300 --> 00:11:03.800
To answer this question we have to think of a different approach. Suppose you were not just calling one
00:11:03.800 --> 00:11:12.800
random number, but dialed a million different random numbers: you had a lot of time, and for many
00:11:12.800 --> 00:11:17.650
years all you did was dial random numbers.
NOTE
Treffsikkerhet: 91% (H?Y)
00:11:17.650 --> 00:11:27.250
After 1 million dials, would you expect to have reached someone you know? Probably more than one.
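NOTE
Editor's sketch, not part of the spoken lecture: a rough check of the million-dials
thought experiment, using the 1 in 167,000 figure quoted earlier and assuming every dial
is an independent random number. Variable names are illustrative.
```python
p_known = 1 / 167_000   # chance per dial, the figure quoted in the video
n_dials = 1_000_000
expected_hits = n_dials * p_known
# Chance of at least one hit = 1 - chance of missing on every single dial.
p_at_least_one = 1 - (1 - p_known) ** n_dials
print(f"expected number of known people reached: {expected_hits:.1f}")
print(f"chance of reaching at least one: {p_at_least_one:.4f}")
```
Under these assumptions the expected count is about six, so "probably more than one" is a
fair summary.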
NOTE
Treffsikkerhet: 75% (MEDIUM)
00:11:27.250 --> 00:11:34.900
If you played a million different seven-number sets, would you expect to win the lottery? You wouldn't be
00:11:34.900 --> 00:11:38.300
certain, but your chances would be much better.
NOTE
Treffsikkerhet: 89% (H?Y)
00:11:38.300 --> 00:11:45.500
If you played all possible seven-number sets, then you'd be sure to win, but then of course you would lose
00:11:45.500 --> 00:11:52.200
money instead of winning; it would be too expensive to do that. But think of everything that's between
00:11:52.200 --> 00:11:59.300
selecting just one set of seven numbers and all possible sets of seven numbers. You move from
00:11:59.300 --> 00:12:03.000
very high uncertainty to near certainty.
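NOTE
Editor's sketch, not part of the spoken lecture: if you play several different
seven-number sets in one draw, the chance that one of them wins is simply the number of
sets played divided by the number of possible sets. The function name is hypothetical.
```python
from math import comb
total_sets = comb(34, 7)  # 5,379,616 possible seven-number sets
def p_win_with(n_sets):
    """Chance that one of n_sets different sets matches the draw."""
    return n_sets / total_sets
for n_sets in (1, 1_000, 1_000_000, total_sets):
    print(f"{n_sets:>9,} sets: {p_win_with(n_sets):.7f}")
```
A million different sets gives a bit under a one-in-five chance, and playing every
possible set gives certainty, which is the range described above.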
NOTE
Treffsikkerhet: 91% (H?Y)
00:12:03.000 --> 00:12:08.750
So whether or not you expect to win depends on how many times you try.
NOTE
Treffsikkerhet: 91% (H?Y)
00:12:08.750 --> 00:12:11.750
Okay now we're getting somewhere.
NOTE
Treffsikkerhet: 91% (H?Y)
00:12:11.750 --> 00:12:18.900
As for how many times you try, it doesn't have to be you; it can be different people. So whether you can
00:12:18.900 --> 00:12:27.000
expect a winning set depends on how many people play. It actually depends on how many seven-number sets are
00:12:27.000 --> 00:12:34.500
chosen, but if everyone chooses just one then the number of players determines how likely it is to
00:12:34.500 --> 00:12:36.400
have a winner.
NOTE
Treffsikkerhet: 91% (H?Y)
00:12:36.500 --> 00:12:47.700
So large numbers, in this case a large number of trials, lead to predictability in the patterns of random
00:12:47.700 --> 00:12:55.500
low chance events. So here we have low chance events that are completely unpredictable, but if you
00:12:55.500 --> 00:13:02.150
have a large number of trials, then you can predict how often you can expect someone to win,
00:13:02.150 --> 00:13:07.000
approximately. You can calculate if you can expect a
NOTE
Treffsikkerhet: 91% (H?Y)
00:13:07.000 --> 00:13:13.900
winner at every draw, if you know how many sets of seven numbers are played before the draw. You can
00:13:13.900 --> 00:13:21.100
never predict who will win, but you can predict how often someone will win. And for the research
00:13:21.100 --> 00:13:28.100
situation that we are interested in, this kind of probabilistic thinking is very, very useful, because
00:13:28.100 --> 00:13:34.800
we're not interested in individual events; we are interested in long-term outcomes of sets of
00:13:34.800 --> 00:13:36.400
events.
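NOTE
Editor's sketch, not part of the spoken lecture: given how many seven-number sets are
played before a draw, the chance that someone wins can be estimated. This assumes each
set is picked independently and at random, which real players may not do, and the
numbers of sets played below are made up.
```python
from math import comb
p_single = 1 / comb(34, 7)  # chance that one randomly chosen set wins
def p_some_winner(n_sets_played):
    """Chance of at least one winning set, assuming independent random picks."""
    return 1 - (1 - p_single) ** n_sets_played
def expected_winners(n_sets_played):
    """Average number of winning sets per draw under the same assumption."""
    return n_sets_played * p_single
# Hypothetical numbers of sets played before a draw (not from the video).
for n in (100_000, 1_000_000, 5_000_000, 20_000_000):
    print(f"{n:>10,} sets: P(some winner) = {p_some_winner(n):.3f}, expected winners = {expected_winners(n):.2f}")
```
No single ticket becomes more likely to win, yet with enough sets in play a winner at
most draws becomes the expected pattern.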
NOTE
Treffsikkerhet: 77% (H?Y)
00:13:37.300 --> 00:13:45.400
So what's the point of all this? Is this about gambling? Well, knowing about probability does help
00:13:45.400 --> 00:13:55.000
with gambling, but that's not all. Take outcomes of random events, or any kind of events that are variable: we
00:13:55.000 --> 00:14:02.000
don't control the variability, and we can think of them as being random. So any variability can be
00:14:02.000 --> 00:14:07.000
ascribed to a random process and we treat our variable data as being
NOTE
Treffsikkerhet: 91% (H?Y)
00:14:07.000 --> 00:14:14.600
random in some way, and then we can use ideas from probability to predict what can happen on the
00:14:14.600 --> 00:14:17.000
basis of these data.
NOTE
Treffsikkerhet: 91% (H?Y)
00:14:19.800 --> 00:14:30.349
A bit closer to home: suppose we have a study based on data from a sample, so we study a special population
00:14:30.349 --> 00:14:36.500
and we want to understand a clinical population for example, or we study the effects of an
00:14:36.500 --> 00:14:44.000
intervention. Either way, we don't study everyone; we use a sample of people and we measure something
00:14:44.000 --> 00:14:45.450
about them.
NOTE
Treffsikkerhet: 91% (H?Y)
00:14:45.450 --> 00:14:55.000
So how far from the population value can we expect to be based on our sample? We'd like to draw a
00:14:55.000 --> 00:15:02.800
conclusion that is good for everyone in the population, including everyone we didn't measure. If we
00:15:02.800 --> 00:15:09.849
only measure a sample, how far from that population value can we expect to be?
NOTE
Treffsikkerhet: 91% (H?Y)
00:15:09.849 --> 00:15:16.600
And how big a sample do we need in order to not be very far?
NOTE
Treffsikkerhet: 91% (H?Y)
00:15:16.600 --> 00:15:23.400
This kind of question you can first think of in terms of something else you're familiar with, which
00:15:23.400 --> 00:15:32.800
is polling of voting preferences or voting intentions. So polling companies go and ask people what
00:15:32.800 --> 00:15:39.300
they plan to vote, who they plan to vote for, and then they report the results and give a margin of
00:15:39.300 --> 00:15:46.200
error. They say this is the percentage of votes for each party with a margin of plus or minus for
NOTE
Treffsikkerhet: 53% (MEDIUM)
00:15:46.200 --> 00:15:50.550
example two percentage points, or one or five percentage points.
NOTE
Treffsikkerhet: 88% (H?Y)
00:15:50.550 --> 00:16:00.000
So they give a range within which they expect to be with respect to the actual voting intention for
00:16:00.000 --> 00:16:06.800
the rest of the population. And they can calculate how many people they have to go around and ask in
00:16:06.800 --> 00:16:14.000
order to have a range of their choice. Do they want to be certain to within one percentage point or
00:16:14.000 --> 00:16:17.200
is within five percentage points enough?
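NOTE
Editor's sketch, not part of the spoken lecture: a common textbook approximation for the
polling calculation, assuming a simple random sample, a 95% confidence level, and the
worst case of a 50/50 split. None of these assumptions is stated in the video, real
polling companies may use other methods, and the function names are hypothetical.
```python
from math import ceil, sqrt
Z_95 = 1.96  # normal quantile for 95% confidence (an assumed confidence level)
def margin_of_error(n, p=0.5):
    """Approximate margin for a proportion p from a simple random sample of size n."""
    return Z_95 * sqrt(p * (1 - p) / n)
def sample_size_for(margin, p=0.5):
    """Smallest n whose margin of error is at most `margin`."""
    # round() guards against floating-point noise before taking the ceiling
    return ceil(round(Z_95**2 * p * (1 - p) / margin**2, 6))
for margin in (0.05, 0.02, 0.01):
    n = sample_size_for(margin)
    print(f"±{margin:.0%}: about {n} people (check: ±{margin_of_error(n):.2%})")
```
Under these assumptions, going from five percentage points to one percentage point takes
roughly 25 times as many respondents, because the margin shrinks with the square root of
the sample size.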
NOTE
Treffsikkerhet: 89% (H?Y)
00:16:17.200 --> 00:16:25.900
So all of this comes from understanding the role of probability in affecting long-term outcomes of
00:16:25.900 --> 00:16:32.800
many events, and sampling is very, very important for us because all research is based on sampling.
00:16:32.800 --> 00:16:39.000
It's very rare that we can actually study the whole population that we're interested in.
NOTE
Treffsikkerhet: 88% (H?Y)
00:16:40.600 --> 00:16:50.100
And more specifically, it's understanding probability that helps us decide if the finding from our
00:16:50.100 --> 00:16:57.900
study can be trusted and should be interpreted and generalized, or whether it should rather be ignored, or
00:16:57.900 --> 00:17:06.300
augmented with more findings. So this is why probability is very important to understand, not just in
00:17:06.300 --> 00:17:11.300
the context of a statistics course, but in the context of understanding
NOTE
Treffsikkerhet: 91% (H?Y)
00:17:11.300 --> 00:17:15.300
research and practice in special needs education.