DRMacIver's Notebook
World counting as a tool for understanding probability
World counting as a tool for understanding probability
There are a number of basic conditional probability problems that seem counter-intuitive but become intuitive with an extremely basic technique.So basic that it’s almost embarrassing to describe it as a technique to be honest, but then I see people getting confused on problems trivially solvable by it and think it probably warrants that label after all. I thought it would be worth explaining this technique, as I think it’s both a good tool to have in your toolkit explicitly, and is potentially a good route in for teaching people to reason about probability.
The technique is “world counting”. You enumerate all the possible versions of the world, cross out any ones that have been ruled out by evidence, and then work out what fraction of the worlds remaining have the property of interest.
For example, say I roll two dice, and tell you only that I rolled a sum of 8 or more. What is the probability that at least one of those rolls was a 6?
First, we enumerate all the possible rolls: 1/1, 1/2, …, 2/1, …, 6/6.
Now, we cut out all of the ones where the sum is less than 8, and get the following combinations remaining: 2/6, 3/5, 3/6, 4/4, 4/5, 4/6, 5/3, 5/4, 5/5, 5/6, 6/2, 6/3, 6/4, 6/5, 6/6
That gives us 15 worlds, of which 9 have a 6 in them, so the probability of a 6 is \(\frac{9}{15} = 0.6\).
This works well with the dice example, but we run into a wrinkle with a slightly more complicated one.
Consider the following problem: I flip a coin. If it shows heads, I flip it again. If it shows tails, I don’t.
This gives us three worlds: HH, HT, T. Two where I got heads the first time, one where I got tails the second time.
Now, what is the probability of getting heads on the first toss? It’s half, obviously, but the world enumeration approach as I’ve described it so far gives \(\frac{2}{3}\) because there are twice as many worlds with heads as tails. This is clearly wrong.
One way to fix this is to imagine that we flip the second coin regardless but that flip is only “visible” if we got heads on the first flip. This gives us the worlds HH, HT, TH, TT, where to the observer TH and TT look identical, and we recover the right probability (\(\frac{1}{2}\)). For the purpose of getting the right answer, this is a perfectly valid approach and can sometimes be very useful, but it doesn’t help that well for understanding what’s going on and why it wasn’t valid to treat the HH, HT, and T worlds as equally likely.
The real problem here is that the world counting method I described only works for situations where every world is equally likely, and in this case that’s not true. The T world is twice as likely as either the HT or the HH worlds.
So, let’s generalise the technique slightly: A weighted world enumeration is a non-empty list of possible worlds together with some non-zero weight associated with each of them. In order to calculate the probability of something, you add up the weight of the worlds in which it’s true, and divide by the total weight of the worlds.
Now, the question is… how do you get weighted world enumerations?
You build them as follows:
- You start from a basic enumeration which is just a single world in which anything could happen, and some arbitrary weight (it doesn’t matter what. Conventionally it would be 1, but you can pick this for convenience)
- You can at any point split a world into one or more subworlds which are mutually incompatible. e.g. if I flip a coin, you could split the world into the subworlds where it comes up heads and the subworlds that it comes up tails. The weights you assign to those subworlds can be whatever you like, as long as they add up to the weight of the original world. If you think of the subworlds as equally likely you should split the weight between them evenly.Note that you’re allowed to throw away possibilities when you do this split. e.g. if you believe that something is a necessary consequence of one of your worlds, you can replace it with some more specialised condition without changing the weights.
- To get a conditional weighted world enumeration (i.e. the world enumeration you want after observing something that rules out some of the worlds), you just remove everything from the list that has been ruled out.
So, with our two coins example, we start with a world weighted at (say) 4, we first split it into (H, 2), (T, 2), and then again into (HH, 1), (HT, 1), (T, 2). This gets us the intuitive result that the T world is twice as big as each of the HH and HT worlds (but has the same weight as them combined).
Here’s the problem where I first realised quite how useful this technique is:Paraphrased from this tweet. In the original form it asks whether the probability that there’s a ball under the cup has gone up or down, and this doesn’t actually require you to calculate the probability, because of an interesting general principle: If \(A, B\) are disjoint events with non-zero probabilities of occurring, then necessarily \(P(A | \neg B) > P(A)\).
There are 10 opaque cups face down on a table. I flipped a fair coin, if it was heads I picked 1 cup uniformly at random and placed a ball under it.
You flip the first 9 cups. No ball.
What is the probability that there’s a ball under cup 10?
We can build up our world enumeration as follows:
First, for convenience, pick the initial weight to be 20.
Now, split this into two worlds based on the flip of a coin. Our world enumeration is now (H, 10), (T, 10). Half the world weight goes to each coin flip.
If we picked heads, we now pick a cup at random. This splits the H world into the 10 possible subworlds, one for each cup. So now we have (T, 10), (H1, 1), (H2, 1), …, (H10, 1).
Now, we want our conditional world enumeration, which has determined that there is no ball in the first 9 cups. This removes all the worlds in which we placed a ball there, so now our world enumeration is (T, 10), (H10, 1).
This means that the probability that the ball is in the 10th cup is \(\frac{1}{10 + 1} = \frac{1}{11}\), because the total weight of the worlds remaining is \(11\) and of that weight, \(1\) of it has the ball there.So to answer the original question, it goes up, because the original probability before revealing those balls is \(\frac{1}{20}\).
I think this approach is also useful for reasoning about the dreaded Monty Hall problem:
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?
We start with our initial world, which we’ll give weight 1 out of convention, and split it three ways based on which door has a car behind it: 1, 2, 3. We believe these to be equally likelyIf we didn’t believe that we could in our head shuffle the doors and relabel them to make this the case., so give them each weight \(\frac{1}{3}\).
Monty then picks a door, 2 or 3. This gives us 6 theoretically possible worlds. 12, 13, 22, 23, 32, 33.
Now, we know that Monty knows which door the car is behind, and thus presumably is deliberately picking a door which doesn’t have a car. This means that the worlds 22 and 33 are actually impossible, because Monte will never pick that door. Thus our actual list is 12, 13, 23, 32.
The worlds in which it’s an advantage to us to switch are 23 and 32, so to calculate how likely it is for switching to be good we need to know their total weight. We don’t know the weights of 12, 13, but we do know that the total weight stayed unchanged (as we’ve not removed any possible worlds, only impossible ones that we’d have had to assign weight zero to), so the total weight is \(1\) and the probability that switching is good for us is the sum of the weights of worlds 23 and 32, which is \(\frac{2}{3}\). So switching is good most of the time.
It’s interesting to compare this to the version in which Monty doesn’t know which door has the car behind it and picks randomly (with you just losing immediately if he picks a car). We now have all six of our possible worlds 12, 13, 22, 23, 32, 33, with each of these having an equal weight of \(\frac{1}{6}\). Now, we’ve observed a goat, which cuts out the worlds 22 and 33, so now we’ve got worlds 12, 13, 23, 32. But this time all of these have equal weight, so switching is to our advantage with probabilityh \(\frac{1}{2}\), and there is no benefit (or disadvantage) to switching.
You can extend this technique to general Bayesian reasoning, which is essentially a modification where rather than cutting out worlds that are impossible you reweight them based on how likely they are to show the relevant data, but that’s a topic for another time.