DRMacIver's Notebook

Algebra and insight

Algebra and insight

I was reading up on Markov Chains today, as one does, partially reminding myself about them and partly filling in some gaps in my knowledge.

One bit I was reading about was reversible distributions. If you’ve got some transition matrix \(P_{ij}\), a reversible distribution for it is a probability distribution \(q\) over the states such that \(q_i P_{ij} = q_j P_{ji}\). If you also lack any intuition about what that means, me too at the point at which it’s introduced. The broad intuition is that a Markov chain in a reversible distribution can be “run backwards in time” and you can’t tell the difference, but the main significance is that it’s an easy property to verify that has strong consequences.

Specifically, it’s a theorem that every reversible distribution is stationary. That is, if \(q\) is reversible then \(q_i = \sum\limits_j P_{ji} q_j\) for all \(i\) - the distribution is not changed by running the Markov chain forwards.

The proof of this presented in the text I was reading is as follows:

We can write \(q_i = q_i \sum\limits_j P_{ij}\) (because it’s a standard property of transition matrices that this sum is 1), so \(q_i = \sum\limits_j q_i P_{ij} = \sum\limits_j q_j P_{ji}\), so \(q\) is stationary.

I can follow this proof of course. It’s very simple algebra. But there’s something about it that feels like a magic trick. I don’t feel like I’ve got any actual insight into the nature of reversible distributions or have any sense of why they must be stationary from this. I’m sure there are plenty of mathematicians who could, and I’m not even sure that I need that insight, but it bugged me.

Thinking about it a bit more lead me to come up with the following alternative proofWhich I suspect I have cribbed in some sort of half-remembered manner from Geoffrey Grimmett’s “Probability on Graphs”, but if so it was useful to recreate it from memory anyway.

One way to think about distributions on markov chains is that you’ve got a certain amount of “stuff” (probability mass) floating around, and at each step that you run the Markov chain, this probability mass flows around the graph.

When you’ve got a probability distribution \(q\), all of \(q_i\) flows out of node \(i\), and \(\sum\limits_j q_j P_{ji}\) flows into it. That is, for each node \(j\), the mass in it flows out to every other node, and the fraction of it that goes to \(i\) is \(P_{ji}\).

A distribution is stationary if for all \(i\), \(q_i = \sum\limits_j P_{ji} q_j\), but equally we can write this as \(q_i - \sum\limits_j P_{ji} q_j = 0\). The net flow of mass out of \(i\) is 0.

We can write this flow of mass out of \(q_i\) as \(q_i = \sum\limits_j q_i P_{ij}\) - the sum of the mass flowing to each \(j\). So the net flow of mass out of \(i\) is \(\sum\limits_j P_{ij} q_i - P_{ji} q_j\). That is, there is a net flow along each edge, let’s say \(f_{ij} = P_{ij} q_i - P_{ji} q_j\). The probability of \(i\) remains unchanged if \(\sum\limits_j f_{ij} = 0\).

However, the reversability condition is precisely that all of the \(f_{ij} = 0\). That is, there is no net flow of mass along each edge. As a result, a reversible distribution is necessarily stationary, because a much stronger condition applies: Not only is the total net flow across the edges from \(i\) equal to 0, it’s 0 along each edge.

This proof is in some sense much worse. It’s longer and involves more algebra rather than less. But after reading this proof I felt like I understood why reversibility obviously implied a distribution was stationary, and also why it was a much stronger condition than being stationary.

I think mathematics - and many other things - benefits hugely from this sort of chewing on a problem and seeing it from different angles until it actually makes sense. If you just know that a result is true, you often find yourself unable to really use it. If you understand at an intuitive level why it’s true, it will stick with you and you can really see how to apply it, including in cases where it doesn’t quite hold but something similar does. Proof without developing insight alongside it is certainly better than no proof, but it’s in some sense fundamentally fragile and untrustworthy, and it’s often better to mull it over until you find the right way to break up the problem to actually allow it to make sense at an emotional level.