• Title: Discrete vs. Continuous Case

  • Series: Probability Theory

  • YouTube-Title: Probability Theory 3 | Discrete vs. Continuous Case

  • Bright video: https://youtu.be/XZHi51f-B9A

  • Dark video: https://youtu.be/ogqDWFrk6hg

  • Quiz: Test your knowledge

  • PDF: Download PDF version of the bright video

  • Dark-PDF: Download PDF version of the dark video

  • Print-PDF: Download printable PDF version

  • Thumbnail (bright): Download PNG

  • Thumbnail (dark): Download PNG

  • Subtitle on GitHub: pt03_sub_eng.srt

  • Timestamps

    00:00 Intro

    00:48 Introduction of cases

    02:17 Sample Space (discrete case)

    02:33 Sample Space (continuous case)

    03:14 Sigma algebra (discrete case)

    03:36 Sigma algebra (continuous case)

    03:59 Probability measure (discrete case)

    05:41 Probability measure (continuous case)

    07:46 Example (discrete case)

    08:44 Example (continuous case)

    10:38 Outro

    10:57 Endcard

  • Subtitle in English

    1 00:00:00,314 –> 00:00:03,586 Hello and welcome back to probability theory

    2 00:00:04,500 –> 00:00:08,114 and first I want to thank all the nice supporters on Steady and PayPal.

    3 00:00:08,714 –> 00:00:14,543 and in today’s part 3 we will talk about two important cases that can occur for stochastic problems.

    4 00:00:15,200 –> 00:00:21,740 For this please recall we have already introduced a lot of notions and we started with a general sample space omega.

    5 00:00:22,571 –> 00:00:27,772 Then we can look at chosen subsets in this omega which form a sigma algebra

    6 00:00:28,443 –> 00:00:31,474 and exactly these subsets we call events.

    7 00:00:32,186 –> 00:00:36,266 Then in the next step we want to measure the probability of such an event

    8 00:00:36,757 –> 00:00:40,800 and this leads us to a general probability measure we call “P”.

    9 00:00:41,600 –> 00:00:48,371 Now I can tell you it’s possible to deal with these objects in this abstract sense and then we get a general theory.

    10 00:00:48,900 –> 00:00:54,628 However in a lot of applications we find that two special cases are very important here.

    11 00:00:55,214 –> 00:00:59,886 So now we distinguish between the discrete case and the continuous case

    12 00:01:00,600 –> 00:01:05,400 To be more precise I would also speak of the absolutely continuous case.

    13 00:01:06,143 –> 00:01:12,814 Now all the other possibilities we can’t put into these two boxes we will ignore at least for this video.

    14 00:01:13,329 –> 00:01:18,671 Indeed often at the start of probability theory one focuses on discrete problems.

    15 00:01:19,300 –> 00:01:23,029 They are easy to explain, when we only have finitely many outcomes.

    16 00:01:23,743 –> 00:01:29,057 However I would also say that we have a discrete problem when we have infinitely many outcomes

    17 00:01:29,257 –> 00:01:32,700 if they still are countable like the natural numbers.

    18 00:01:33,471 –> 00:01:41,471 For example we could throw a die infinitely many times and we count how many throws we needed to get the first six

    19 00:01:42,057 –> 00:01:46,943 Then we are still in the discrete case, because we can count all the possible outcomes.
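
The first-six experiment can be sketched numerically. The numbers below are an assumption, since the video does not state them: with a fair die and independent throws, the probability that the first six appears on throw k is (5/6)^(k-1) * 1/6. The function name `first_six_pmf` is hypothetical and only serves this illustration.

```python
from fractions import Fraction

def first_six_pmf(k: int) -> Fraction:
    """Probability that the first six shows up on throw k (k = 1, 2, 3, ...).

    Assumption: fair die, independent throws.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return Fraction(5, 6) ** (k - 1) * Fraction(1, 6)

# The outcomes can be counted (1, 2, 3, ...), and the partial sums
# of their probabilities approach 1 as more outcomes are included.
partial_sum = sum(first_six_pmf(k) for k in range(1, 51))
print(float(partial_sum))
```

Exact fractions are used so that the partial sums can be compared without rounding errors.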

    20 00:01:47,567 –> 00:01:54,200 On the other hand in the absolutely continuous case we have infinitely many outcomes, but they are uncountable.

    21 00:01:54,667 –> 00:01:58,733 The typical example here would be a dart board where you throw a dart.

    22 00:01:59,500 –> 00:02:03,083 There all the values in the disk are possible outcomes.

    23 00:02:03,533 –> 00:02:09,999 Okay so that’s the rough idea for the two special cases here and now I would say let’s go into the details.

    24 00:02:10,950 –> 00:02:16,567 For this let’s do a table such that we can compare the discrete case and the absolutely continuous case.

    25 00:02:17,133 –> 00:02:19,617 First let’s start with the sample space omega.

    26 00:02:19,650 –> 00:02:23,017 Which is a finite or countable set in the discrete case.

    27 00:02:23,500 –> 00:02:27,650 For example, if you flip a coin, omega would be a set with two elements.

    28 00:02:28,267 –> 00:02:32,300 Or, to give an infinite example, omega could be the natural numbers.

    29 00:02:33,050 –> 00:02:38,150 On the other hand for our continuous case the sample space omega should be an uncountable set.

    30 00:02:38,717 –> 00:02:41,983 and usually we can choose it as a subset of “R^n”

    31 00:02:43,100 –> 00:02:46,850 To be even more precise omega should be a so-called Borel set.

    32 00:02:47,833 –> 00:02:51,467 So it’s an element of the Borel sigma algebra of “R^n”.

    33 00:02:52,033 –> 00:02:56,717 If you don’t know what the Borel sigma algebra is don’t worry. I have a whole video about it.

    34 00:02:57,317 –> 00:03:01,617 However at least for the sake of this video it’s not the most important thing to know here.

    35 00:03:02,450 –> 00:03:06,233 Just think of a common example. Omega could be the unit interval.

    36 00:03:06,900 –> 00:03:09,317 I choose it as closed here, but it could also be open.

    37 00:03:10,017 –> 00:03:13,417 Ok, that is what you should know about possible sample spaces.

    38 00:03:14,583 –> 00:03:17,650 Then in the next step let’s talk about the sigma algebras.

    39 00:03:18,583 –> 00:03:22,883 In the discrete case it’s very simple. You can just take the whole power set of omega.

    40 00:03:23,333 –> 00:03:30,200 Of course depending on the problem you could choose a smaller one, but there is no restriction for choosing the largest one, the power set.

    41 00:03:30,850 –> 00:03:35,467 Therefore in the discrete case we don’t have to care about the sigma algebra at all.

    42 00:03:36,217 –> 00:03:41,683 However we really need the notion of a sigma algebra on the right-hand side, in the continuous case.

    43 00:03:42,233 –> 00:03:48,533 There in general it’s not possible to choose the power set, but it’s always possible to take the Borel sigma algebra.

    44 00:03:49,400 –> 00:03:53,967 This means that we don’t have all the subsets of omega, but still a lot of them.

    45 00:03:54,550 –> 00:03:59,267 Therefore our probability measure can still give probabilities to a lot of events.

    46 00:03:59,850 –> 00:04:04,517 Speaking of probability measures this is the next thing we want to compare in both cases.

    47 00:04:05,183 –> 00:04:10,783 In the discrete case measuring a singleton, so a set with only one element, is very useful,

    48 00:04:11,433 –> 00:04:18,750 because if you know these numbers for all lowercase omega in the sample space omega you know the whole probability measure.

    49 00:04:19,733 –> 00:04:23,583 This immediately comes out of the sigma additivity of the probability measure.

    50 00:04:24,550 –> 00:04:28,667 As a reminder it’s this property here we discussed in the last video.

    51 00:04:29,450 –> 00:04:38,117 Now because of this property in the discrete case instead of the probability measure we can equivalently write down a probability mass function.

    52 00:04:38,868 –> 00:04:43,750 Usually one uses a lowercase “p” for this and omega is found in the index.

    53 00:04:44,417 –> 00:04:50,500 Of course in the end it should have the same meaning as this probability, but now this function is our starting point.

    54 00:04:51,200 –> 00:04:54,467 I say function, but usually we write it as a sequence.

    55 00:04:54,900 –> 00:04:59,322 Depending on the sample space omega, it is a finite sequence or a countable one.

    56 00:05:00,044 –> 00:05:05,383 Now because we want to use this for probabilities we require that this number is always non-negative.

    57 00:05:06,250 –> 00:05:10,867 and also the series or sum through all omegas should be exactly one.

    58 00:05:11,567 –> 00:05:16,899 If we have such a sequence that fulfills these two properties we call it a probability mass function

    59 00:05:17,517 –> 00:05:20,500 and with this we can define the probability measure.

    60 00:05:21,367 –> 00:05:29,917 For any event “A” we can set P(A) to be the sum or series over “p_omega” where omega goes through all the elements in “A”.

    61 00:05:30,650 –> 00:05:37,383 There you see the big advantage in the discrete case. We just have countably many numbers involved and also only sums.

    62 00:05:38,367 –> 00:05:41,100 So each probability can be written as such a sum.
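
The formulas shown on screen are not part of the subtitles. Written out from the spoken description, the discrete setup presumably reads as follows (the notation $p_\omega$ follows the remark that omega is found in the index):

$$
p_\omega \geq 0 \quad \text{for all } \omega \in \Omega, \qquad \sum_{\omega \in \Omega} p_\omega = 1, \qquad P(A) := \sum_{\omega \in A} p_\omega \quad \text{for every event } A \subseteq \Omega.
$$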

    63 00:05:41,817 –> 00:05:45,617 On the other hand that is not possible in the absolutely continuous case.

    64 00:05:46,133 –> 00:05:49,609 However for a probability measure there we have something similar,

    65 00:05:50,500 –> 00:05:56,383 because like in the example with the dartboard the probability of a single point is just zero.

    66 00:05:57,083 –> 00:06:01,383 There are just too many points to get a non-zero probability for a single one.

    67 00:06:02,217 –> 00:06:05,633 However we can say something about the density of the probabilities

    68 00:06:06,450 –> 00:06:12,667 or to put it in other words it’s no problem at all to measure the probability of a whole region instead of a single point.

    69 00:06:13,617 –> 00:06:18,767 Now this density function we simply call “f” and it’s defined on the whole sample space omega

    70 00:06:19,383 –> 00:06:24,517 and because we want to measure probabilities we have the same two properties as on the left hand side.

    71 00:06:25,350 –> 00:06:29,067 The first one is at each point we have a non-negative number

    72 00:06:29,533 –> 00:06:35,133 and you see, because in this case omega is a subset of “R^n”, we usually use the letter “x”,

    73 00:06:36,017 –> 00:06:38,767 but still here “x” is an element of omega.

    74 00:06:39,550 –> 00:06:44,633 Ok now for the second property we have to translate this sum into the continuous case.

    75 00:06:45,150 –> 00:06:49,900 This means that here we want that the integral of the function “f” is equal to 1.

    76 00:06:50,733 –> 00:06:54,167 It’s the integral where we integrate over the whole domain omega.

    77 00:06:55,083 –> 00:07:01,400 Please note here in our simple example it would be a one-dimensional integral, but in general we have an n-dimensional integral

    78 00:07:02,100 –> 00:07:05,950 but then you see it’s completely similar to the thing we wanted in the discrete case.

    79 00:07:06,683 –> 00:07:12,883 Therefore in the same way the probability measure can be defined, P(A) is the integral where we have the domain “A” .

    80 00:07:13,950 –> 00:07:17,760 Ok and with this you see this is our translation between both cases.

    81 00:07:18,633 –> 00:07:25,533 However to be honest I omitted one technical detail here on the right-hand side, because here we have to deal with sigma algebras.

    82 00:07:26,150 –> 00:07:30,550 For this reason this density function “f” here needs to be measurable.

    83 00:07:31,533 –> 00:07:35,050 It’s a property we need such that all the integrals here make sense.

    84 00:07:35,783 –> 00:07:40,217 If you don’t know this term measurable yet, don’t worry we will talk about it later.

    85 00:07:41,083 –> 00:07:45,467 At the moment I would say it’s sufficient that you know that we need some technical detail here.

    86 00:07:46,233 –> 00:07:51,333 Ok, I think that’s enough for the theory. Let’s talk about some practical examples.
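
As with the discrete case, the on-screen formulas are not in the subtitles. Written out from the spoken description, the absolutely continuous setup presumably reads as follows. The symbol $\mathcal{B}(\Omega)$ for the Borel sigma algebra on $\Omega$ is an assumed notation, and $f$ is required to be measurable:

$$
f(x) \geq 0 \quad \text{for all } x \in \Omega, \qquad \int_\Omega f(x) \, dx = 1, \qquad P(A) := \int_A f(x) \, dx \quad \text{for every event } A \in \mathcal{B}(\Omega).
$$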

    87 00:07:52,317 –> 00:07:56,333 In the discrete case let’s look again at the example of throwing one die.

    88 00:07:57,100 –> 00:08:00,133 However maybe this time let’s take an unfair die.

    89 00:08:00,983 –> 00:08:05,083 This means that we have different probabilities in the probability mass function.

    90 00:08:05,683 –> 00:08:10,917 So we can bring different non-negative probabilities in, but they have to sum up to 1.

    91 00:08:11,800 –> 00:08:18,950 For example we could set all the numbers one to five to one tenth and then the probability of getting a six would be one half.

    92 00:08:19,700 –> 00:08:25,550 So maybe as a test let’s calculate the probability of the event that we don’t get a 6.

    93 00:08:26,367 –> 00:08:34,683 Now by definition this would be the sum of omega going from 1 to 5 and there we have our “p_omega”.

    94 00:08:35,400 –> 00:08:43,883 Which in our case is exactly 1 over 10 for all omegas involved here and there you see we get out one half as expected.
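
The unfair die can be checked with a few lines of Python. This is a minimal sketch using the probabilities from the example (1/10 for the faces 1 to 5 and 1/2 for the face 6). The names `pmf` and `probability` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

# Probability mass function of the unfair die from the example:
# faces 1 to 5 have probability 1/10 each, face 6 has probability 1/2.
pmf = {face: Fraction(1, 10) for face in range(1, 6)}
pmf[6] = Fraction(1, 2)

def probability(event, pmf):
    """P(A) as the sum of p_omega over all omega in the event A."""
    return sum((pmf[omega] for omega in event), Fraction(0))

# Check the two defining properties of a probability mass function.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

# Event "we do not get a 6".
not_six = {1, 2, 3, 4, 5}
print(probability(not_six, pmf))  # 1/2
```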

    95 00:08:44,950 –> 00:08:48,450 Ok, then on the other side let’s look at a continuous example.

    96 00:08:49,133 –> 00:08:52,633 Ok, maybe let’s take the interval from 0 to 2 here.

    97 00:08:53,350 –> 00:08:56,850 So you see our dart board from before is one-dimensional now.

    98 00:08:57,533 –> 00:09:00,850 So we randomly throw one point into this interval.

    99 00:09:01,567 –> 00:09:05,617 Then the question would be what is now our probability density function “f”?

    100 00:09:06,583 –> 00:09:11,817 Now because we want to have a uniform probability here we need to take a constant function

    101 00:09:12,483 –> 00:09:17,333 and because we want to fulfill these two properties there is only one reasonable choice.

    102 00:09:18,233 –> 00:09:21,150 We take the constant function with value one half.

    103 00:09:21,967 –> 00:09:29,167 So maybe let’s check that the integral property is indeed fulfilled. So we have the integral 0 to 2 and f(x) inside.

    104 00:09:30,083 –> 00:09:35,583 Now we can pull the constant one half outside and then only the length of the interval here remains.

    105 00:09:36,217 –> 00:09:39,100 Which is of course 2 such that we get out 1.

    106 00:09:39,917 –> 00:09:45,233 Ok, then maybe the last question here is can you calculate the probability of a subset “A” here?

    107 00:09:45,950 –> 00:09:50,967 As we know the probability is defined as the integral where we integrate over the set “A” here.

    108 00:09:51,833 –> 00:09:56,883 Doing the same as before, we can pull out the constant and only the simple integral remains.

    109 00:09:58,017 –> 00:10:05,667 Now this one is what we call in general the volume of “A” or in this case, because it’s only one-dimensional we could call it the length of “A”.

    110 00:10:06,700 –> 00:10:09,842 or we could simply say it’s the Lebesgue measure of “A”.

    111 00:10:11,383 –> 00:10:16,805 This sounds now more complicated than it really is, because for intervals we can immediately calculate it.

    112 00:10:17,633 –> 00:10:24,763 For example, if we have the interval [a, b], we can calculate that its probability is 0.5*(b - a).

    113 00:10:25,317 –> 00:10:30,784 So you could say in this example we just calculate lengths, but then we have to normalize it.

    114 00:10:31,400 –> 00:10:34,309 Which means we divide by the full length which is 2.

    115 00:10:35,133 –> 00:10:37,142 and indeed then we get a probability.
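
The uniform example on the interval from 0 to 2 can also be sketched in Python. The closed form 0.5*(b - a) comes from the example. The midpoint rule is only an illustrative numerical check and is not part of the video, and all function names here are hypothetical.

```python
def density(x: float) -> float:
    """Uniform density on the sample space [0, 2]: constant value 1/2."""
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

def interval_probability(a: float, b: float) -> float:
    """Closed form from the example: P([a, b]) = 0.5 * (b - a)."""
    if not 0.0 <= a <= b <= 2.0:
        raise ValueError("expected 0 <= a <= b <= 2")
    return 0.5 * (b - a)

def midpoint_integral(f, a: float, b: float, steps: int = 1000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Length of [0.5, 1.5] divided by the full length 2.
print(interval_probability(0.5, 1.5))

# Numerical check that the density integrates to (approximately) 1.
print(midpoint_integral(density, 0.0, 2.0))
```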

    116 00:10:38,083 –> 00:10:43,340 Ok, with this I hope you now know how to distinguish between discrete and continuous cases.

    117 00:10:44,250 –> 00:10:48,185 They are indeed the typical examples that occur. Therefore it’s good to know them.

    118 00:10:48,800 –> 00:10:52,479 Then in the next videos we will look at more complicated examples.

    119 00:10:52,983 –> 00:10:56,333 Therefore I hope I see you there and have a nice day. Bye!

  • Quiz Content

    Q1: If $\Omega = \mathbb{N}$, do we have a discrete or continuous case?

    A1: Discrete.

    A2: Continuous.

    A3: Both cases are possible.

    Q2: In the discrete case, what is the common choice for the $\sigma$-algebra?

    A1: $\mathcal{A} = \Omega$

    A2: $\mathcal{A} = \{ \emptyset, \Omega \}$

    A3: $\mathcal{A} =\mathcal{P}(\Omega)$

    Q3: In the (absolutely) continuous case, we have a probability density function $f: \Omega \rightarrow \mathbb{R}$. What is not a property of this function?

    A1: $f(x) \geq 0$

    A2: $f(x) \leq 1$

    A3: $\int_\Omega f(x) \, dx = 1$
