• Title: Markov’s Inequality and Chebyshev’s Inequality

  • Series: Probability Theory

  • YouTube-Title: Probability Theory 26 | Markov’s Inequality and Chebyshev’s Inequality

  • Bright video: https://youtu.be/7oC7fpi2Bsg

  • Dark video: https://youtu.be/gAi1rp4YdpY

  • PDF: Download PDF version of the bright video

  • Dark-PDF: Download PDF version of the dark video

  • Print-PDF: Download printable PDF version

  • Thumbnail (bright): Download PNG

  • Thumbnail (dark): Download PNG

  • Subtitle on GitHub: pt26_sub_eng.srt

  • Timestamps

    00:00 Intro

    00:42 General assumption

    01:11 Markov’s inequality

    02:42 Visualization of Markov’s inequality

    04:32 Proof of Markov’s inequality

    07:10 Chebyshev’s inequality

    09:11 Proof of Chebyshev’s inequality

    11:42 Credits

  • Subtitle in English (see the worked summaries and the numerical check after the transcript)

    1 00:00:00,399 –> 00:00:05,640 Hello and welcome back to Probability

    2 00:00:03,120 –> 00:00:08,360 Theory, the video course where we learn

    3 00:00:05,640 –> 00:00:11,480 how to calculate in probability

    4 00:00:08,360 –> 00:00:14,639 spaces. And in today’s part 26 we will

    5 00:00:11,480 –> 00:00:16,760 talk about two famous inequalities named

    6 00:00:14,639 –> 00:00:19,960 after Markov and

    7 00:00:16,760 –> 00:00:23,160 Chebyshev. Both can be used in applications

    8 00:00:19,960 –> 00:00:25,599 because they hold in a general setting.

    9 00:00:23,160 –> 00:00:27,840 However before we start stating these

    10 00:00:25,599 –> 00:00:29,560 inequalities, I first want to thank all

    11 00:00:27,840 –> 00:00:32,120 the nice people who support the channel

    12 00:00:29,560 –> 00:00:34,719 on Steady, here on YouTube, or via other

    13 00:00:32,120 –> 00:00:37,360 means. And please don’t forget: supporting

    14 00:00:34,719 –> 00:00:39,559 me gives you benefits. For example, you

    15 00:00:37,360 –> 00:00:41,000 can download additional material with

    16 00:00:39,559 –> 00:00:43,200 the link in the

    17 00:00:41,000 –> 00:00:46,079 description. Okay then let’s start

    18 00:00:43,200 –> 00:00:48,960 formulating these two inequalities for

    19 00:00:46,079 –> 00:00:52,079 a probability space. So we have a sample

    20 00:00:48,960 –> 00:00:53,800 space Omega, a sigma algebra A, and a

    21 00:00:52,079 –> 00:00:56,840 probability measure

    22 00:00:53,800 –> 00:00:59,399 P. And there I can already tell you the

    23 00:00:56,840 –> 00:01:02,399 power of Markov’s inequality and Chebyshev’s

    24 00:00:59,399 –> 00:01:06,000 inequality lies in the fact that

    25 00:01:02,399 –> 00:01:08,920 they are valid in any probability space

    26 00:01:06,000 –> 00:01:12,080 and as we will see now they are also not

    27 00:01:08,920 –> 00:01:14,400 hard to prove at all. So let’s start with

    28 00:01:12,080 –> 00:01:15,880 the first one here. Let’s start with

    29 00:01:14,400 –> 00:01:18,600 Markov’s

    30 00:01:15,880 –> 00:01:21,479 inequality. This one holds for any random

    31 00:01:18,600 –> 00:01:24,640 variable defined on the probability

    32 00:01:21,479 –> 00:01:28,079 space. This means we consider a map X

    33 00:01:24,640 –> 00:01:30,840 from Omega into R so as always we

    34 00:01:28,079 –> 00:01:33,119 consider real valued random

    35 00:01:30,840 –> 00:01:36,240 variables. However what we actually need

    36 00:01:33,119 –> 00:01:39,280 for Markov’s inequality is a non-negative

    37 00:01:36,240 –> 00:01:42,759 random variable. Therefore what we do is

    38 00:01:39,280 –> 00:01:46,119 to go to the absolute value of X

    39 00:01:42,759 –> 00:01:49,079 because this one now has non-negative

    40 00:01:46,119 –> 00:01:50,439 values. And it satisfies what we call

    41 00:01:49,079 –> 00:01:53,159 Markov’s

    42 00:01:50,439 –> 00:01:56,280 inequality. And in short this one just

    43 00:01:53,159 –> 00:01:59,039 compares a probability with the

    44 00:01:56,280 –> 00:02:01,360 expectation. More precisely, we look at

    45 00:01:59,039 –> 00:02:04,560 the probability that the absolute value

    46 00:02:01,360 –> 00:02:07,560 of X is greater or equal than a given

    47 00:02:04,560 –> 00:02:09,759 Epsilon. So this is the given event which

    48 00:02:07,560 –> 00:02:12,200 we can measure with our probability

    49 00:02:09,759 –> 00:02:15,440 measure. And the probability that comes

    50 00:02:12,200 –> 00:02:17,879 out is less or equal than an

    51 00:02:15,440 –> 00:02:19,920 expectation. Indeed we can write

    52 00:02:17,879 –> 00:02:23,720 expectation of the random variable

    53 00:02:19,920 –> 00:02:26,879 absolute value of X to the power p. And

    54 00:02:23,720 –> 00:02:30,160 then we divide that by Epsilon to the

    55 00:02:26,879 –> 00:02:32,879 power p as well. So this is Markov’s

    56 00:02:30,160 –> 00:02:36,680 inequality and it holds no matter which

    57 00:02:32,879 –> 00:02:38,640 positive Epsilon or p we choose. In this

    58 00:02:36,680 –> 00:02:41,440 sense it’s very flexible and in

    59 00:02:38,640 –> 00:02:44,680 particular it also holds for p is equal

    60 00:02:41,440 –> 00:02:47,480 to 1. And exactly this case we can

    61 00:02:44,680 –> 00:02:50,319 immediately visualize. This is not

    62 00:02:47,480 –> 00:02:53,360 complicated at all: let’s sketch a graph

    63 00:02:50,319 –> 00:02:55,239 for our random variable X. Or as you

    64 00:02:53,360 –> 00:02:57,400 already know we actually want to have

    65 00:02:55,239 –> 00:03:00,560 the non-negative random variable

    66 00:02:57,400 –> 00:03:03,599 absolute value of X. Hence here the

    67 00:03:00,560 –> 00:03:04,680 x-axis just represents the whole sample

    68 00:03:03,599 –> 00:03:08,040 space

    69 00:03:04,680 –> 00:03:11,360 Omega. And then the values of X just give

    70 00:03:08,040 –> 00:03:14,840 us a graph in this picture. And now what

    71 00:03:11,360 –> 00:03:17,920 we can do is just fix an Epsilon here on

    72 00:03:14,840 –> 00:03:20,720 the Y-axis. And then we immediately

    73 00:03:17,920 –> 00:03:23,879 recognize the samples where the absolute

    74 00:03:20,720 –> 00:03:26,879 value of X is greater or equal than

    75 00:03:23,879 –> 00:03:30,159 Epsilon. More precisely we just find them

    76 00:03:26,879 –> 00:03:32,480 here and there. So in the picture we can

    77 00:03:30,159 –> 00:03:36,200 just project them back to the sample

    78 00:03:32,480 –> 00:03:39,159 space Omega. Hence the subset in Omega we

    79 00:03:36,200 –> 00:03:42,280 want to measure is this one combined

    80 00:03:39,159 –> 00:03:44,120 with that one. In other words by

    81 00:03:42,280 –> 00:03:46,760 measuring that with our probability

    82 00:03:44,120 –> 00:03:50,480 measure P we already have the left hand

    83 00:03:46,760 –> 00:03:53,480 side here. And now if you multiply this

    84 00:03:50,480 –> 00:03:56,680 by the value Epsilon, we get out the two

    85 00:03:53,480 –> 00:04:00,519 rectangles here. So you can say what we

    86 00:03:56,680 –> 00:04:03,120 have is these two areas added.

    87 00:04:00,519 –> 00:04:05,959 However on the other hand you know that

    88 00:04:03,120 –> 00:04:08,959 the expectation of a random variable is

    89 00:04:05,959 –> 00:04:11,319 given by the whole integral. Hence

    90 00:04:08,959 –> 00:04:13,959 obviously the area given by the whole

    91 00:04:11,319 –> 00:04:16,680 integral is bigger than just the two

    92 00:04:13,959 –> 00:04:19,040 rectangles here. Or in general we

    93 00:04:16,680 –> 00:04:22,600 immediately have the inequality with

    94 00:04:19,040 –> 00:04:25,400 less or equal here. So it’s easy to see

    95 00:04:22,600 –> 00:04:29,120 and this is the whole Markov’s inequality

    96 00:04:25,400 –> 00:04:32,240 for the case p is equal to 1. And in fact

    97 00:04:29,120 –> 00:04:35,120 this idea here already covers the whole

    98 00:04:32,240 –> 00:04:38,680 proof. The only thing we have to do now

    99 00:04:35,120 –> 00:04:40,600 is to extend it to any p. This means here

    100 00:04:38,680 –> 00:04:42,960 in the picture instead of the absolute

    101 00:04:40,600 –> 00:04:46,039 value of X we consider the absolute

    102 00:04:42,960 –> 00:04:48,280 value of X to the power p. This does not

    103 00:04:46,039 –> 00:04:52,240 change so much because the set we

    104 00:04:48,280 –> 00:04:54,320 calculate here is the same for any p. So

    105 00:04:52,240 –> 00:04:57,160 if we have a lowercase omega which

    106 00:04:54,320 –> 00:05:00,080 satisfies the one inequality, it also

    107 00:04:57,160 –> 00:05:02,680 satisfies the other one. Simply because

    108 00:05:00,080 –> 00:05:05,639 the power of p is a monotonically

    109 00:05:02,680 –> 00:05:08,240 increasing function. Therefore the

    110 00:05:05,639 –> 00:05:11,680 measured probability here is also

    111 00:05:08,240 –> 00:05:14,360 exactly the same. And now as before we

    112 00:05:11,680 –> 00:05:16,680 want to multiply the whole thing by the

    113 00:05:14,360 –> 00:05:19,440 value we have but now the value is

    114 00:05:16,680 –> 00:05:23,360 Epsilon to the power p. So let’s simply

    115 00:05:19,440 –> 00:05:25,479 multiply both sides by Epsilon to the power p.

    116 00:05:23,360 –> 00:05:27,800 And now here on the right hand side we

    117 00:05:25,479 –> 00:05:30,840 can do exactly the same estimate as

    118 00:05:27,800 –> 00:05:33,560 before in the picture but maybe as an

    119 00:05:30,840 –> 00:05:36,440 explanation we can put in one more step

    120 00:05:33,560 –> 00:05:39,400 in between namely we introduce the

    121 00:05:36,440 –> 00:05:42,199 indicator function. This one you know we

    122 00:05:39,400 –> 00:05:45,319 write it with a bold one where in the

    123 00:05:42,199 –> 00:05:48,440 index we find the corresponding set. And

    124 00:05:45,319 –> 00:05:51,720 here we have the set where X to the power p is

    125 00:05:48,440 –> 00:05:54,319 greater or equal than Epsilon to the power p

    126 00:05:51,720 –> 00:05:55,840 and for this indicator function we can

    127 00:05:54,319 –> 00:05:58,199 calculate the

    128 00:05:55,840 –> 00:06:00,560 expectation, which is exactly the

    129 00:05:58,199 –> 00:06:03,840 probability here on the left hand

    130 00:06:00,560 –> 00:06:06,400 side. So we still have equalities here

    131 00:06:03,840 –> 00:06:09,479 and in the next step we can pull in the

    132 00:06:06,400 –> 00:06:12,120 factor Epsilon to the power p which

    133 00:06:09,479 –> 00:06:15,039 means now we have an expectation of a

    134 00:06:12,120 –> 00:06:17,599 step function. And there you can look

    135 00:06:15,039 –> 00:06:20,680 back to the picture where we exactly

    136 00:06:17,599 –> 00:06:23,800 have this function there. So the graph

    137 00:06:20,680 –> 00:06:27,039 has a line here and there and otherwise

    138 00:06:23,800 –> 00:06:28,960 it’s at zero. However the crucial part

    139 00:06:27,039 –> 00:06:31,960 here is that the whole graph of the

    140 00:06:28,960 –> 00:06:35,199 function lies below the graph of the

    141 00:06:31,960 –> 00:06:38,000 function X to the power p, hence the

    142 00:06:35,199 –> 00:06:41,440 monotonicity of the integral or the

    143 00:06:38,000 –> 00:06:44,080 expectation tells us that we have the

    144 00:06:41,440 –> 00:06:47,080 inequality. So in the calculation here we

    145 00:06:44,080 –> 00:06:49,800 get that we have less or equal than the

    146 00:06:47,080 –> 00:06:52,840 expectation of the absolute value of X

    147 00:06:49,800 –> 00:06:55,039 to the power p. And now if you compare

    148 00:06:52,840 –> 00:06:57,240 this to the left hand side you see that

    149 00:06:55,039 –> 00:06:59,639 we have proven Markov’s

    150 00:06:57,240 –> 00:07:01,919 inequality so you see that was very

    151 00:06:59,639 –> 00:07:05,120 quick and the whole idea was in the

    152 00:07:01,919 –> 00:07:07,919 picture above. In other words what we see

    153 00:07:05,120 –> 00:07:10,759 is a very rough estimate in general

    154 00:07:07,919 –> 00:07:13,080 but we can use it whenever we want and

    155 00:07:10,759 –> 00:07:16,120 the first application we have is to use

    156 00:07:13,080 –> 00:07:18,560 it to prove the famous Chebyshev’s

    157 00:07:16,120 –> 00:07:21,840 inequality. This one can be used to

    158 00:07:18,560 –> 00:07:23,879 estimate how likely it is to be off from

    159 00:07:21,840 –> 00:07:26,840 the expectation of a given random

    160 00:07:23,879 –> 00:07:29,280 variable. And again the power of this

    161 00:07:26,840 –> 00:07:32,000 lies in the general setting you can use

    162 00:07:29,280 –> 00:07:34,599 this estimate for any random variable

    163 00:07:32,000 –> 00:07:36,280 where the expectation is defined.

    164 00:07:34,599 –> 00:07:38,440 Therefore this is the only assumption we

    165 00:07:36,280 –> 00:07:42,560 have here we have a chosen random

    166 00:07:38,440 –> 00:07:45,319 variable X and there we want that E of X

    167 00:07:42,560 –> 00:07:47,039 exists or as you know you can formulate

    168 00:07:45,319 –> 00:07:50,759 that as follows: the expectation of the

    169 00:07:47,039 –> 00:07:52,520 absolute value of X is finite. Indeed

    170 00:07:50,759 –> 00:07:54,720 here we need an expectation for the

    171 00:07:52,520 –> 00:07:57,599 random variable X because otherwise the

    172 00:07:54,720 –> 00:08:00,360 formulation would not make sense. And as

    173 00:07:57,599 –> 00:08:02,879 already said in the formulation we look

    174 00:08:00,360 –> 00:08:06,159 at the set where we deviate from the

    175 00:08:02,879 –> 00:08:09,240 expectation of X. More precisely we want

    176 00:08:06,159 –> 00:08:12,680 to see how likely it is to be off by a

    177 00:08:09,240 –> 00:08:15,080 given Epsilon. In other words these are

    178 00:08:12,680 –> 00:08:16,879 the points in the sample space Omega

    179 00:08:15,080 –> 00:08:19,840 where we are not in an Epsilon

    180 00:08:16,879 –> 00:08:23,759 neighbourhood of the expectation for the

    181 00:08:19,840 –> 00:08:25,800 values of X, and exactly this subset in

    182 00:08:23,759 –> 00:08:29,080 Omega we can measure with the

    183 00:08:25,800 –> 00:08:31,599 probability measure. And now the variance

    184 00:08:29,080 –> 00:08:33,760 of the random variable X gives us an

    185 00:08:31,599 –> 00:08:36,519 upper bound for this

    186 00:08:33,760 –> 00:08:41,599 probability. More concretely we have the

    187 00:08:36,519 –> 00:08:44,560 variance of X divided by Epsilon squared. Moreover you

    188 00:08:41,599 –> 00:08:48,399 immediately see that this inequality is

    189 00:08:44,560 –> 00:08:51,640 only useful if the variance of X is also

    190 00:08:48,399 –> 00:08:54,720 a finite number. But if we have that then

    191 00:08:51,640 –> 00:08:56,800 the variance here gives a nice estimate

    192 00:08:54,720 –> 00:08:59,920 for the probability that a random

    193 00:08:56,800 –> 00:09:02,360 variable deviates from its mean. And

    194 00:08:59,920 –> 00:09:05,959 indeed it does not matter which Epsilon

    195 00:09:02,360 –> 00:09:08,240 greater than zero you choose here. Okay

    196 00:09:05,959 –> 00:09:11,360 now with the statement in mind I would

    197 00:09:08,240 –> 00:09:14,040 say we can try to write down a proof for

    198 00:09:11,360 –> 00:09:16,560 it. And there what we can immediately do

    199 00:09:14,040 –> 00:09:18,600 to simplify the whole thing is to

    200 00:09:16,560 –> 00:09:22,079 consider a random variable where the

    201 00:09:18,600 –> 00:09:24,720 expectation is zero. So in other words we

    202 00:09:22,079 –> 00:09:27,920 just consider X tilde which is given as

    203 00:09:24,720 –> 00:09:29,880 X minus the expectation. This is really

    204 00:09:27,920 –> 00:09:33,680 helpful because it means that the

    205 00:09:29,880 –> 00:09:37,839 variance of X tilde is just given as the

    206 00:09:33,680 –> 00:09:39,720 expectation of X tilde squared. So here you see we

    207 00:09:37,839 –> 00:09:41,399 need the formulas we have for

    208 00:09:39,720 –> 00:09:44,040 calculating

    209 00:09:41,399 –> 00:09:47,120 variances. In particular you should see

    210 00:09:44,040 –> 00:09:50,079 that the variance of X and X tilde are

    211 00:09:47,120 –> 00:09:53,079 exactly the same. This is simply because

    212 00:09:50,079 –> 00:09:56,640 the variance as a function is additive

    213 00:09:53,079 –> 00:09:59,440 and the variance of a constant is always

    214 00:09:56,640 –> 00:10:02,320 zero. Okay so this is the whole idea to

    215 00:09:59,440 –> 00:10:05,000 simplify the inequality and now we can

    216 00:10:02,320 –> 00:10:07,600 write it down. So the left hand side here

    217 00:10:05,000 –> 00:10:10,320 is already very simple because we just

    218 00:10:07,600 –> 00:10:13,800 have the probability where absolute

    219 00:10:10,320 –> 00:10:16,120 value of X tilde is involved. But there you

    220 00:10:13,800 –> 00:10:19,360 should immediately see that this looks

    221 00:10:16,120 –> 00:10:21,680 exactly like the statement in Markov’s

    222 00:10:19,360 –> 00:10:24,839 inequality and since we have the power

    223 00:10:21,680 –> 00:10:27,839 of two here involved in the variance, we

    224 00:10:24,839 –> 00:10:30,920 should use Markov’s inequality for p is

    225 00:10:27,839 –> 00:10:33,440 equal to 2. And there you see this is the

    226 00:10:30,920 –> 00:10:37,680 reason why we have formulated Markov’s

    227 00:10:33,440 –> 00:10:39,839 inequality for all values p. Okay and now

    228 00:10:37,680 –> 00:10:41,600 on the right hand side here you see we

    229 00:10:39,839 –> 00:10:45,279 get a fraction where we have the

    230 00:10:41,600 –> 00:10:47,760 expectation of X tilde squared and Epsilon

    231 00:10:45,279 –> 00:10:50,000 squared in the denominator. And now

    232 00:10:47,760 –> 00:10:52,279 please note for the square it does not

    233 00:10:50,000 –> 00:10:55,200 matter if we have the absolute value or

    234 00:10:52,279 –> 00:10:58,360 not. In other words what we have in the

    235 00:10:55,200 –> 00:11:01,560 numerator here is just the variance of

    236 00:10:58,360 –> 00:11:04,000 our random variable X. So exactly what

    237 00:11:01,560 –> 00:11:07,279 Chebyshev’s inequality

    238 00:11:04,000 –> 00:11:10,959 stated. Therefore the proof here is

    239 00:11:07,279 –> 00:11:14,240 already finished. So our general result

    240 00:11:10,959 –> 00:11:17,480 here is that we have two inequalities

    241 00:11:14,240 –> 00:11:19,839 which hold in a very general context.

    242 00:11:17,480 –> 00:11:22,959 This means you can use this estimate

    243 00:11:19,839 –> 00:11:25,120 here for a lot of probability

    244 00:11:22,959 –> 00:11:27,880 distributions. Therefore we can use it

    245 00:11:25,120 –> 00:11:30,160 for proofs of general statements but

    246 00:11:27,880 –> 00:11:32,360 also in applications

    247 00:11:30,160 –> 00:11:35,920 and I would say we first should look at

    248 00:11:32,360 –> 00:11:38,680 some concrete examples. So let’s do that

    249 00:11:35,920 –> 00:11:42,020 with the next video. So I really hope we

    250 00:11:38,680 –> 00:11:48,029 meet there again and have a nice day.

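  • Worked summary: Markov’s inequality (supplementary)

    A compact restatement of what the video covers between 01:11 and 07:10. For a random variable X on the probability space (Omega, A, P) and any Epsilon > 0, p > 0, Markov’s inequality reads

    \[
    P\bigl(|X| \ge \varepsilon\bigr) \;\le\; \frac{\mathbb{E}\bigl(|X|^p\bigr)}{\varepsilon^p}.
    \]

    The proof from the video compresses into one chain of estimates, using the indicator function of the event:

    \[
    \varepsilon^p \, P\bigl(|X| \ge \varepsilon\bigr)
    \;=\; \mathbb{E}\Bigl(\varepsilon^p \, \mathbf{1}_{\{|X|^p \ge \varepsilon^p\}}\Bigr)
    \;\le\; \mathbb{E}\Bigl(|X|^p \, \mathbf{1}_{\{|X|^p \ge \varepsilon^p\}}\Bigr)
    \;\le\; \mathbb{E}\bigl(|X|^p\bigr).
    \]

    The first inequality holds because the step function \varepsilon^p \mathbf{1}_{\{|X|^p \ge \varepsilon^p\}} lies below |X|^p pointwise (this is the picture with the two rectangles), and the second uses the monotonicity of the expectation. Dividing both sides by \varepsilon^p gives the claim.
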
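  • Worked summary: Chebyshev’s inequality (supplementary)

    The statement proved from 07:10 onwards: for a random variable X with \mathbb{E}(|X|) < \infty and any Epsilon > 0,

    \[
    P\bigl(|X - \mathbb{E}(X)| \ge \varepsilon\bigr) \;\le\; \frac{\operatorname{Var}(X)}{\varepsilon^2}.
    \]

    Proof sketch, following the video: set \tilde{X} := X - \mathbb{E}(X). Then \mathbb{E}(\tilde{X}) = 0, so \operatorname{Var}(\tilde{X}) = \mathbb{E}(\tilde{X}^2); moreover, shifting X by the constant \mathbb{E}(X) does not change the variance, so \operatorname{Var}(\tilde{X}) = \operatorname{Var}(X). Applying Markov’s inequality to \tilde{X} with p = 2 gives

    \[
    P\bigl(|\tilde{X}| \ge \varepsilon\bigr)
    \;\le\; \frac{\mathbb{E}\bigl(|\tilde{X}|^2\bigr)}{\varepsilon^2}
    \;=\; \frac{\operatorname{Var}(X)}{\varepsilon^2}.
    \]
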
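  • Numerical sanity check (supplementary)

    The snippet below is a minimal sketch, not material from the video: it samples from an exponential distribution (an arbitrary choice with finite variance) and checks that the empirical probabilities stay below the Markov and Chebyshev bounds. The variable names, sample size, and parameters are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Arbitrary example distribution with finite variance: exponential with mean 2.
    n = 1_000_000
    x = rng.exponential(scale=2.0, size=n)

    eps = 3.0
    p = 2

    # Markov: P(|X| >= eps) <= E(|X|^p) / eps^p
    lhs_markov = np.mean(np.abs(x) >= eps)
    rhs_markov = np.mean(np.abs(x) ** p) / eps ** p
    print(f"Markov   : {lhs_markov:.4f} <= {rhs_markov:.4f}")

    # Chebyshev: P(|X - E(X)| >= eps) <= Var(X) / eps^2
    lhs_cheb = np.mean(np.abs(x - x.mean()) >= eps)
    rhs_cheb = x.var() / eps ** 2
    print(f"Chebyshev: {lhs_cheb:.4f} <= {rhs_cheb:.4f}")
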
  • Quiz Content (n/a)
  • Back to overview page