DRMacIver's Notebook

Boltzmann Samplers for Consistent Epistemic Actors

(This post will make little sense to you)

A question I have been wondering about in relation to judgement aggregation is how we should measure reliability on sets of connected propositions.

Suppose we have \(C = A \wedge B\), with the true state of the world being that \(A\) and \(B\) are both true. If we judge \(A\) and \(B\) independently, getting each right with probability \(p\), then we conclude \(C\) with probability \(p^2\). This leads to the result that a majority can be right about each of \(A\) and \(B\) but wrong about \(C = A \wedge B\) if \(0.5 < p < 0.5^{0.5} \approx 0.71\).
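
To make that range concrete, here is a rough Monte Carlo sketch (the panel size, \(p\), and function names are arbitrary choices for illustration): each judge gets \(A\) and \(B\) right independently with probability \(p\) and accepts \(C\) only when they accept both.

```python
import random

def trial(p, n_judges, rng):
    """One panel: each judge decides A and B independently (truth: both true)."""
    a_right = b_right = c_right = 0
    for _ in range(n_judges):
        a = rng.random() < p       # judge gets A right
        b = rng.random() < p       # judge gets B right
        a_right += a
        b_right += b
        c_right += a and b         # judge accepts C = A ∧ B only if they accept both
    half = n_judges / 2
    return a_right > half, b_right > half, c_right > half

def dilemma_rate(p=0.6, n_judges=101, trials=2000, seed=0):
    """Fraction of panels where majorities accept A and B but reject C."""
    rng = random.Random(seed)
    results = (trial(p, n_judges, rng) for _ in range(trials))
    return sum(a and b and not c for a, b, c in results) / trials

print(dilemma_rate())  # well above zero for p between 0.5 and roughly 0.71
```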

But what if the problem here is that we have a bad measure of reliability? Having \(p > 0.5\) is enough for a judge to get the right answer on each of \(A\) and \(B\) more often than not, but if they reach the wrong conclusion about \(C\) most of the time they are clearly not a very good judge. Certainly it would be nice to be able to assemble unreliable actors into a reliable whole, but it's hard to do that if we don't know what reliability looks like.

Suppose we measured the reliability of a judge on a set of propositions by the expected fraction they get right. For independent propositions this is just \(p\) as above, but if we add \(C\) into the set then the reliability is \(r(p) = \frac{2p + p^2}{3}\). Some basic calculation (for which I used sympy) shows that if \(p > 0.581\) then \(r(p) > 0.5\), and that when \(P(C) = 0.5\), \(r(p) \approx 0.64\). So under this notion of reliability we can still have a “reliable” judge who gets \(C\) wrong most of the time.
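
A sketch of that sympy calculation (reconstructed here, so treat it as illustrative rather than the original script):

```python
import sympy as sp

p = sp.symbols("p", positive=True)
r = (2 * p + p**2) / 3   # expected fraction of {A, B, C} the judge gets right

# Where does r(p) cross 1/2?
threshold = sp.solve(sp.Eq(r, sp.Rational(1, 2)), p)
print(threshold, [float(t) for t in threshold])  # positive root ≈ 0.581

# Reliability at the point where the judge gets C right half the time (p**2 = 1/2).
print(float(r.subs(p, sp.sqrt(sp.Rational(1, 2)))))  # ≈ 0.638
```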

What if the problem is now our generative model? Why are we picking \(A\) and \(B\) independently rather than considering \(C\)? Suppose we want a maximally uninformative distribution over possible beliefs given a fixed reliability; what should we do?

Well, this is a Boltzmann sampler.
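
Concretely, a minimal sketch of such a sampler for this belief space, weighting each consistent belief state by \(x^{\text{size}}\), where size is the number of propositions believed true (the code itself is mine, for illustration):

```python
import random

# Consistent belief states for C = A ∧ B, as sets of propositions believed true.
STATES = [frozenset(), frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B", "C"})]

def boltzmann_sample(x, k=1, rng=random):
    """Draw k belief states, each with probability proportional to x ** size."""
    weights = [x ** len(state) for state in STATES]
    return rng.choices(STATES, weights=weights, k=k)

# Empirical reliability (expected fraction of true beliefs) at a given x.
samples = boltzmann_sample(1.21, k=100_000)
print(sum(len(s) for s in samples) / (3 * len(samples)))  # ≈ 0.5 at x ≈ 1.21
```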

If size is the number of true propositions believed, the Boltzmann generating function for this problem is \(B(x) = 1 + 2x + x^3\), so the expected size of a Boltzmann sampler is \(E(x) = x \frac{B'(x)}{B(x)} = \frac{2x + 3x^3}{1 + 2x + x^3}\), and the reliability is \(E(x) / 3\). This has a reliability of \(0.5\) for \(x \approx 1.21\). At this value of \(x\) we have \(P(C) = \frac{x^3}{B(x)} \approx 0.34\). We have \(P(C) = 0.5\) for \(x \approx 1.62\), which corresponds to a reliability of \(\approx 0.63\). This is only a slightly lower reliability than in the model where we chose \(A\) and \(B\) independently.
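
The same calculations in sympy (again a reconstruction rather than the original script):

```python
import sympy as sp

x = sp.symbols("x", positive=True)
B = 1 + 2 * x + x**3        # Boltzmann generating function
E = x * sp.diff(B, x) / B   # expected number of true beliefs
reliability = E / 3         # expected fraction of the three propositions
P_C = x**3 / B              # probability the sampled judge believes C

x_half = sp.nsolve(reliability - sp.Rational(1, 2), x, 1.2)
print(x_half, float(P_C.subs(x, x_half)))    # x ≈ 1.21, P(C) ≈ 0.34

x_c = sp.nsolve(P_C - sp.Rational(1, 2), x, 1.6)
print(x_c, float(reliability.subs(x, x_c)))  # x ≈ 1.62, reliability ≈ 0.63
```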

Now, suppose we take the logical structure of the propositions into account in our voting as follows: for each voter we define the distance from their beliefs to a candidate outcome, and we choose the consistent set of beliefs that minimizes the total distance to the voters' beliefs, summed over all voters. (You could think of this as analogous to the Kemeny-Young rule.) What outcome does this give us?
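
A rough sketch of that rule in code (the distance is taken here to be the number of propositions two belief sets disagree on, which is an assumption on my part, and the names are just for illustration):

```python
# Belief sets are frozensets of the propositions believed true.
CONSISTENT = [frozenset(), frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B", "C"})]

def distance(beliefs, outcome):
    """Number of propositions (out of A, B, C) on which the two sets disagree."""
    return len(beliefs ^ outcome)

def aggregate(voter_beliefs):
    """Pick the consistent outcome minimising total distance to the voters."""
    return min(CONSISTENT, key=lambda outcome: sum(distance(v, outcome) for v in voter_beliefs))

# Seven voters: two accept only A, two accept only B, three accept A, B and C.
voters = 2 * [frozenset({"A"})] + 2 * [frozenset({"B"})] + 3 * [frozenset({"A", "B", "C"})]
print(aggregate(voters))  # frozenset({'A', 'B', 'C'})
```

In that example only three of the seven voters accept \(C\) outright, but majorities accept each of \(A\) and \(B\), and the aggregate outcome is the full conjunction.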

Well, we can calculate the expected distance of each possible outcome. The possible outcomes are \(v_1 = \neg A \wedge \neg B \wedge \neg C\), \(v_2 = A \wedge \neg B \wedge \neg C\), \(v_3 = \neg A \wedge B \wedge \neg C\), \(v_4 = A \wedge B \wedge C\).

Discounting the \(B(x)\) factor, we get expected scores \(S_1, \ldots, S_4\) for the four outcomes, proportional to the weighted sum of distances from each belief state to the corresponding outcome; a sketch of the calculation is below.
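
Under the same assumed distance as above (the number of propositions on which two belief sets disagree), the unnormalised scores come out of a short sympy computation:

```python
import sympy as sp

x = sp.symbols("x", positive=True)

# Consistent belief states v_1..v_4 and their Boltzmann weights x ** size.
states = [frozenset(), frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B", "C"})]
weight = {s: x ** len(s) for s in states}

def score(outcome):
    """Weighted total distance from every belief state to this outcome (B(x) discounted)."""
    return sp.expand(sum(weight[s] * len(s ^ outcome) for s in states))

for i, outcome in enumerate(states, start=1):
    print(f"S_{i} =", score(outcome))
# S_1 = 3*x**3 + 2*x, S_2 = S_3 = 2*x**3 + 2*x + 1, S_4 = 4*x + 3
```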

We have \(S_4 \leq \max(S_1, S_2, S_3)\) for \(x \geq 1.29\), which corresponds to a reliability of \(\approx 0.525\). This is much lower than the reliability we required to get the majority of people believing in \(C\) under any model where we voted on the propositions independently. Because we knew the logical structure of the problem space, we could take advantage of that to combine the independent support for \(A\) and \(B\) into support for \(C\).

One of the debates in judgement aggregation is about how to do this without dividing the set of propositions up into premises and conclusions, and this does that pretty well. My suspicion is that this sort of agreement maximisation will tend to produce relatively high-reliability results from relatively low-reliability inputs.