## A Probability “Paradox” as per the Previous Post

Last time I mentioned the paradoxical-sounding situation in which I said “pick a number at random under some given continuous probability distribution over the reals, and call this number $c$”.  But what was the probability of choosing $c$?  Well, a quick exercise shows that of course it would have to be zero.  In measure theory this can be stated as:  The event $\{c\}$ has Lebesgue (or probability) measure 0.

Ok, so there was no chance that I could have chosen $c$ and yet somehow I did?  Whaaaaaaattt?!?

The answer is a good thing to know, but is also somewhat boring.  There is just an abuse of terminology going on.  We should always be relating things back to measurable events.  So saying “$c$ is between $a$ and $b$ with positive probability” really translates to “the event $\{x : a < x < b\}$ has positive measure.”  And so although (as we will show) we do not lose anything by asserting that we have chosen an explicit $c$ and are comparing it to $a$ and $b$, we are actually describing much larger events and waving our hands over the fact that they amount to the same thing.

So why is this abuse so prevalent?  Any time you make such a choice and work with it, such as by comparing $c$ to $a$ and $b$, there are always more detailed events and probability spaces somewhere in the background.  But when you use them explicitly, the problem suddenly becomes much less intuitive and far more difficult.  (This example isn’t even as bad as many are.)  So it is often easier and more natural to ignore these “details”.

To make this explicit, below I typed out (somewhat thoroughly) what the very simple argument from last time would look like if you were really trying to make things analytically rigorous (= computationally annoying and boring).  But it is good for the soul to see this once before you forget it forever.

Let $\mu$ and $\nu$ be two probability distributions on $(\mathbb{R},\mathcal{B}_{\mathbb{R}})$ and let $\{X_1, X_2, Y\}$ be independent random variables such that $X_1 \overset{d}{=} X_2 \overset{d}{=} \mu$ and $Y \overset{d}{=} \nu$.  For an explicit model let $\Omega := \mathbb{R}^3, \mathcal{B} = {\mathcal{B}_{\mathbb{R}}}^{\otimes 3} = \mathcal{B}_{\mathbb{R}^3}, P = \mu\otimes\mu\otimes\nu,$ and $(X_1, X_2, Y) : \mathbb{R}^3 \to \mathbb{R}^3$ be the identity map.  Then the event that you win, given the algorithm from last time, is

$A = \{X_1 > X_2,\, Y < X_1\} \cup \{X_2 \geq X_1,\, Y \geq X_1\}.$

So the goal is to estimate the probability

$P(A) = P(X_1 > X_2, Y < X_1) + P(X_2 \geq X_1, Y \geq X_1).$

Using the obvious fact that $(X_1, X_2, Y) \overset{d}{=} (X_2, X_1, Y)$ we also have,

$P(A) = P(X_1 > X_2, Y < X_1) + P(X_1 \geq X_2, Y \geq X_2).$

Now, assuming that $\mu$ is continuous (so that the diagonal $\{x_1 = x_2\}$ has $\mu\otimes\mu$-measure $0$, letting us merge the two regions of integration) and using Fubini’s theorem, we find,

$\begin{aligned} P(A) &= \int_{x_1 > x_2}\nu(Y < x_1)\,d\mu(x_1)\,d\mu(x_2) + \int_{x_1 \geq x_2}\nu(Y \geq x_2)\,d\mu(x_1)\,d\mu(x_2)\\ &= \int_{x_1 > x_2}\left[\nu(Y < x_1) + \nu(Y \geq x_2)\right]d\mu(x_1)\,d\mu(x_2)\\ &= \int_{x_1 > x_2}\left[\nu(Y < x_1) + \nu(Y \geq x_1) + \nu(x_2 \leq Y < x_1)\right]d\mu(x_1)\,d\mu(x_2)\\ &= \int_{x_1 > x_2}\left[1 + \nu(x_2 \leq Y < x_1)\right]d\mu(x_1)\,d\mu(x_2)\\ &= \int_{x_1 > x_2}\left[1 + \nu([x_2, x_1))\right]d\mu(x_1)\,d\mu(x_2). \end{aligned}$
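As a quick numerical sanity check of the key identity $\nu(Y < x_1) + \nu(Y \geq x_2) = 1 + \nu([x_2, x_1))$ for $x_2 < x_1$, here is a tiny Python snippet taking $\nu$ to be the uniform distribution on $[0,1]$ (a concrete choice purely for illustration):

```python
# Check nu(Y < x1) + nu(Y >= x2) = 1 + nu([x2, x1))
# for nu = Uniform(0, 1) and a few concrete pairs x2 < x1 in (0, 1).
for x2, x1 in [(0.3, 0.7), (0.1, 0.9), (0.45, 0.55)]:
    lhs = x1 + (1 - x2)       # nu(Y < x1) + nu(Y >= x2)
    rhs = 1 + (x1 - x2)       # 1 + nu([x2, x1))
    assert abs(lhs - rhs) < 1e-12
```

The identity is just the splitting $\nu(Y \geq x_2) = \nu(Y \geq x_1) + \nu(x_2 \leq Y < x_1)$ followed by $\nu(Y < x_1) + \nu(Y \geq x_1) = 1$.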

Finally, if we assume that $\nu([a,b)) > 0$ for all real $a < b$, which holds by our choice of probability distribution (e.g. the normal distribution works), then we find that

$P(A) > \int_{x_1>x_2}1\,d\mu(x_1)\,d\mu(x_2) = \frac{1}{2},$

where the last equality holds because $\mu\otimes\mu$ puts equal mass on $\{x_1 > x_2\}$ and $\{x_2 > x_1\}$ and, by continuity of $\mu$, no mass on the diagonal.

$\square$
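For readers who prefer empirical evidence, here is a short Monte Carlo sketch of the whole argument.  The strategy encoded by the win event $A$ is: peek at $X_1$ and guess that $X_1$ is the larger draw exactly when $Y < X_1$.  Taking $\mu = \nu =$ standard normal (an illustrative assumption; any $\nu$ charging every interval works), the simulated win rate lands comfortably above $1/2$:

```python
import random

def play_round(rng):
    """One round of the game: two hidden draws X1, X2 ~ mu and a
    threshold Y ~ nu (all standard normal here, an illustrative
    choice).  We peek at X1 and guess that X1 is the larger draw
    exactly when Y < X1; otherwise we guess X2.  Returns True iff
    the guess is correct, i.e. iff the event
    A = {X1 > X2, Y < X1} u {X2 >= X1, Y >= X1} occurs."""
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y = rng.gauss(0, 1)
    guessed_x1_larger = y < x1
    return guessed_x1_larger == (x1 > x2)

def estimate_win_probability(n=200_000, seed=0):
    """Monte Carlo estimate of P(A) over n independent rounds."""
    rng = random.Random(seed)
    wins = sum(play_round(rng) for _ in range(n))
    return wins / n
```

With all three draws i.i.d. standard normal the true win probability is $2/3$ (the strategy loses exactly when $X_1$ is neither the maximum nor the minimum of the three draws in the relevant sense: each of "$X_1$ is the max" and "$X_1$ is the min" has probability $1/3$, and those are precisely the winning configurations), so the estimate should sit near $0.667$.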

So there you have it.  Math lives another day.  It is consistent and even much better to sweep these details under the rug.  So for these standard abuses, just go ahead and do’em without thinkin and yall’ll be juuuust fine.

I am a fifth year mathematics Ph.D. student at UCSD.