DRMacIver's Notebook

Thoughts from David R. MacIver


Coevolution and the bad take machine

This may come as a surprise to you, but there are some people on Twitter whose tweets are really bad. I don't just mean that they're not very good, I mean that their tweets are so bad that it feels like they sat down and twirled a metaphorical moustache and went "I wonder how I can make the most people mad today?". I have three particular examples in mind, but I'm sure you can come up with hundreds more.

Why are they like this? Two reasons, basically:

  1. They want to be.
  2. We've trained them to be good at it.

The first I feel should be obvious: If this level of negative response doesn't deter them, it's clear that they must be getting something out of it. Either they enjoy the negative reaction (annoying people is funny), they feel like they're "sparking important debate", or some other reason. At the very least if they found the reactions unpleasant they would have learned to stop doing this. It's mostly smart people doing it after all.

For the second... let's talk about evolution.

Life makes more sense when you view a lot of the world as systems of replicating patterns: Patterns mutate, successful patterns get copied, unsuccessful patterns die off, and the shape of the world is patterned after what worked, for a value of worked that means nothing more than "was able to replicate itself". We tend to talk about these patterns as "replicators", as if they were concrete things that reproduced themselves, and I will use that terminology, but I find thinking of them in terms of patterns is more helpful.

Genes are the classic replicator, but there are many others. Memes in the Dawkins sense (Dawkins-memes) are another, as are memes in the modern sense (cat-memes). Cat-memes are both driven by and drive Dawkins-memes, which is an important example of coevolution: We can look at cat-memes as replicators in their own right (cat-memes are copied and modified and selected for), but their replication will make much more sense when we consider the Dawkins-memes. The replication has a significant loop where Cat-memes replicate partly by generating Dawkins-memes which replicate and generate more cat-memes.

This is an oversimplified view of the process that ignores many important details and other loops, but I think it is still an illuminating one.

These coevolutionary processes are common. Culture and genes are another one: e.g. consider lactose tolerance. Lactose tolerance is widespread in the west, because of the coevolution of dairy farming techniques (culture) and genetics (the ability to consume milk as an adult without getting very ill). Each replicator drives the other.

Another replicator is behaviour. We typically model our behaviour on past behaviours of ours (new behaviours can of course be introduced, but most behaviour is derived from past behaviour). When a behaviour goes well, we do it more; when it goes badly, we do it less. This is classic operant conditioning (the thing with rewards and punishments, rather than the thing with the salivating dogs, which is classical conditioning), viewed as an evolutionary process.

Another thing that counts as a replicator is "things people are talking about on Twitter". People talk about stuff, so others engage with it, or find out what's going on and talk about it too, etc. So tweets are a replicator.

So when you have a bad take machine, you get the following processes:

  1. They make a bad take.
  2. People are outraged and talk about it.
  3. The bad take machine likes it and does more of that behaviour in future.

If, on the other hand, they make a take and nobody cares, they do not get reward and the behaviour is selected against.

The behaviours drove the spread of the outrage replicator, and the outrage replicator provides the selection mechanism for the behaviours. Thus, via the spread of our outrage on Twitter, we have operant conditioned the bad take machine into producing worse takes.

Which is to say, it's bad on purpose to make you replicate it.


We are surrounded by ghosts

(This is part of an attempt to get back to using my notebook to write more half formed thoughts)

I recently read "Epistemic injustice in mathematics" by Rittberg, Tanswell, and Van Bendegem. In it they talk about the idea of ghost theorems:

There are mathematical results which are taken as accepted in a mathematical community, relied upon in talks, discussion, and proving further results, but which cannot be traced to a concrete proof in the literature. These results are part of the expert knowledge one is expected to have in certain communities and we will present examples below. ... The kind of result we wish to discuss, i.e. the kind of result whose proof cannot be traced in the literature, thus seems to be a special kind of folk theorem for which we propose the term “ghost theorem”; these theorems are immaterial in the sense that they are not proven in the literature yet they “haunt” parts of daily mathematical life.

They highlight ghost theorems as a particular source of epistemic injustice against people trying to participate in mathematics as an epistemic community. They focus on an example of someone trying to get some results published, and having them be rejected because they were ghost theorems - even though their submission had never been published, it was "obvious". I think it would still have been an injustice of sorts without that gatekeeping, because presumably these results are useful, and there is no way to discover them without privileged access to the community. As someone who does a lot of solo reading in a variety of fields, a kind of feral fan for a variety of academic disciplines, this is an injustice that particularly matters to me: There are probably many things that are completely obvious if you've been trained in, say, phenomenology, that I as an outsider will never discover no matter how much of the literature on the subject I read.

I'd like to call the more general phenomenon, of which this is a specific instance, "ghost knowledge": It is knowledge that is present somewhere in the epistemic community, and is perhaps readily accessible to some central member of that community, but it is not really written down anywhere and it's not clear how to access it. Roughly what makes something ghost knowledge is two things:

  1. It is readily discoverable if you have trusted access to expert members of the community.
  2. It is almost completely inaccessible if you do not.

In this sense, most knowledge is ghost, particularly if you take an expansive view of what counts as an epistemic community.

A recent (more recent than the publishing of this paper) related example was "Eigenvectors from Eigenvalues: a survey of a basic identity in linear algebra" by Denton, Parke, Tao, and Zhang. A couple of physicists found an interesting identity (interesting to them, and apparently to others who care about such things. I confess I find it too technical to be interesting - I understand the mathematics well enough but don't work in this field so have a differently tuned sense of interest). The story of this as I understand it is roughly:

  1. A bunch of physicists discovered an interesting and, apparently, novel identity.
  2. They emailed Terence Tao and he almost immediately was able to provide multiple proofs of it.
  3. They published a paper together.
  4. It got a lot of press because Terence Tao is something of a big deal and discovering a new result in such a mature field is surprising.
  5. Turns out it was actually a result that was not so much "well known" as repeatedly rediscovered in myriad different places, but had never made a splash before.
  6. They wrote a new survey paper about this.

If you look at only parts (1-3) this is a perfect example of ghost knowledge. But the result already existed in the literature.

To quote a great philosopher:

“But the plans were on display…” “On display? I eventually had to go down to the cellar to find them.” “That’s the display department.” “With a flashlight.” “Ah, well, the lights had probably gone.” “So had the stairs.” “But look, you found the notice, didn’t you?” “Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”

It is very often the case that even when knowledge is written down somewhere, it's nearly impossible to find it. This has happened to me in mathematics before: A friend and I once discovered a result that we thought was very interesting (under certain conditions, the best possible continuous approximation to a discontinuous function exists), only to eventually discover that the result was known, it had just been published in an obscure mathematical journal, in the 1970s, in Poland, so virtually nobody knew about it. Fortunately we had never put in much serious effort into trying to publish it before discovering that (honestly we probably should have, but neither of us were professional mathematicians by that point), but it was still annoying.

The core problem that causes all of this is that there's a leaky pipeline of knowledge from epistemic communities to the outside world. In order for you to discover a piece of knowledge:

  1. It has to be interesting enough for someone to think it is worth writing down.
  2. It has to be interesting enough that it gets accepted (though if not, it may end up on a random blog post if you're lucky).
  3. It has to be interesting or well organised enough that it gets surfaced in a way you can find.
  4. It has to be accessible enough for you to be able to find it (e.g. it can't use super technical terms that you'll have no way to ever discover without access to an expert).

This pipeline is leaky enough that it would be very surprising if most knowledge produced by an epistemic community were accessible to you.

This may not seem like a big deal when you think of communities like mathematics, where most consumers of its contents are also members of the community, but it's a big deal when you consider two things:

  1. Everything is like this, including subjects that everyone would benefit from. I read a lot of therapy books for example and I'll bet that there's at least two orders of magnitude more ghost knowledge about therapy than I have access to.
  2. Every community is an epistemic community. Many lessons are e.g. learned over and over again inside companies, and are never written down anywhere, or when they are, they are so defanged that nobody can benefit from them. The pipeline is different, but the problem is the same.

Given this, we are surrounded by ghosts: The information that is written down and that we can access is so thin on the ground compared to what's actually out there, it's astonishing that we ever get anything done.


Notes on Conscious Experience

These are some notes on some of my more eccentric beliefs about consciousness and theory of mind, which I keep promising to write up for people. I kinda lost steam halfway through writing them, so they peter out a bit, but I am happy to take questions and write up any answers in more detail.

Apologies in advance: This is going to be very poorly cited, because I have synthesised these beliefs out of a wide variety of sources, over a long period of time. I have arrived at these opinions drawing heavily on other people's work, but I have somewhat lost track of which bits come from where, and of which bits are original to me. Most of the originality will be framing and synthesising things that are not normally connected.

The briefest possible summary of my position is bounded materialist panpsychism. That is, I hold that in some sense "everything" is conscious, with a bunch of caveats on this and what it means which causes my beliefs to fall short of traditional panpsychism, but that there is no supernatural mechanism behind this and that it is fully explainable using boring physical theories involving atoms and physics and suchlike.

More specifically what I believe is that consciousness is a distributed phenomenon, and that it does not really make sense to talk about the number of conscious entities, because conscious entities put together form composite conscious entities, and many examples (including humans) that we think of as a single conscious entity are themselves composite.

In the rest of these notes I'm going to argue for this position. I don't necessarily expect my argument to be convincing unless you are already sympathetic to it, it's more by way of explaining why I hold it.

In brief, my argument is:

  1. Cognition is an inherently distributed process with no natural "point of origin", and a smaller proportion of it happens in single people than we tend to assume.
  2. Consciousness extends into the world through the means by which we act in it and experience it.
  3. Our experience of cognition is an integral feature of our consciousness.
  4. Most of that cognition we experience is wholly or partly distributed.

Philosophical Notes

Before I get into the details, I'd like to explain a few things: I am going to use a bunch of terms which have many possible meanings and are hotly debated. I will try to give a precise idea of how I am using them, with a sort of over/under definition system: I will give a hand wavy ballpark definition of what I am trying to point to with them, and some examples of things that I think do and don't fit that definition.

In particular, I am going to be using the words consciousness and cognition.

By cognition I roughly mean anything that looks like a thought process, in a very broad and expansive sense. For example: I would consider both human thought and a chess program as examples of cognition. I would not consider a rock just sitting there an example of cognition. I am somewhat ambivalent as to whether I would consider a reflexive action (e.g. flinching away from pain) as cognition, but would lean towards no.

By consciousness I mean something like "the ability to have a subjective experience" - there being "something it is like to be this thing". This is closer to what philosophers would call phenomenal consciousness. A human is conscious, and so is a rabbit. A rock or a chess program are not conscious.

Secondly, I'm going to adopt the following general philosophical principles:

  1. If two things can be observed to behave differently then they are different.
  2. If two things can't be observed to behave differently then you should be suspicious of but not automatically reject claims that they are different.

In particular if you have examples of things that have one property but not another then those properties are not the same thing. So in the above, cognition and consciousness are not the same thing, because a chess program has cognition but not consciousness. It may or may not be possible to have consciousness without cognition - certain drug induced or meditative states could be examples of this - but whether it is or not is not an essential feature of my argument.

Cognition Combines

By "cognitive system" I mean "any assemblage of things, considered in its capacity as something that exhibits cognition". "Cognitive system" is a descriptive word rather than a category word - essentially everything is a cognitive system, but some things are more interesting to think of as cognitive systems than others. The term describes how we think of the system, not what the system is.

Most things are what you might think of as "the null cognitive system" - a thing you can consider as exhibiting no cognition. A rock is a null cognitive system, a human is a more interesting cognitive system. We will largely be concerned with the more interesting type of cognitive system.

A brain is certainly a cognitive system, but I'm going to argue that the brain is not the natural cognitive system of interest for human thought. This is because:

  1. Human thought is intrinsically embodied. Your body is very much an important participant in your thought processes.
  2. Human thought extends out into the world, and many human thoughts are not possible with an unassisted body.

The evidence for (1) is relatively strong from the neuroscience and psychology worlds: Your body certainly affects your thought processes (consider how differently you think when hungry. "Am I depressed, is everything terrible, or am I just hungry?" is a common ambiguous experience). In general, emotions are a very embodied process. Much of your emotional experience exists primarily in your body, and a functioning emotional experience seems to be fairly essential to making good decisions: Without emotions to guide you, you don't become a sensible being of pure logic, you become a disorganised mess who can't prioritise.

For an example of (2), consider mathematics. It is almost impossible to do non-trivial mathematics without external working memory. A mathematician with pencil and paper is capable of vastly more mathematics than a mathematician without them. The cognitive system that includes the pencil and paper is capable of different things than the base mathematician, so should be considered a different cognitive system in its own right.

Additionally, cognitive systems combine. Advanced Chess is the most commonly cited example of this: Both the human player and the chess program are cognitive systems in their own right, but the combined human/program chess player is capable of chess play that is fundamentally different than either can manage on their own.

Cognitive systems can also combine when the individual cognitive systems are conscious. A collaboration between two humans is often capable of things that neither human is capable of on their own - either due to not having the same skill sets or just by distributing work. A common example of the latter is transactive memory - we often "remember" things by delegating that memory to someone else, e.g. a partner or a coworker. We remember who is responsible for a category of information and ask them.

This, incidentally, illustrates a problem with a possible alternative interpretation: It is not enough to just think of these tools as cognitive aids. A human can extend the cognition of another human, but in doing so the converse is true too: Their cognition is being extended in turn. It doesn't make sense to ask which one is the "real" source of cognition: They both are! The combined system is not a centralised one with assistance, it is a collaboration resulting in a new cognitive system in its own right.

Cognitive systems also typically divide. You are a cognitive system divided up into your brain and body, but each of those subdivide - your gut contributes differently to your cognition than your foot does, different parts of your brain contribute differently. If part of it is physically removed or damaged, you may continue to be a cognitive system (not all brain damage still permits meaningful thought of course, but much of it still leaves you as someone who can think, albeit in an impaired manner).

Note however that cognitive systems are usually more than the sum of their parts - most of cognition lives in the interaction, not in the physical object. A pencil and paper is the null cognitive system (it exhibits no cognition on its own), and yet a mathematician with a pencil and paper is a different cognitive system than a mathematician on their own. If however you don't let the mathematician interact with the pencil and paper, the result is just a slightly frustrated sum of its parts. The interaction is important.

Given this tendency to combine and divide, I do not believe it makes sense to privilege any particular level of cognition as special. Thoughts occur at every level, and transmit between different parts of the system (albeit through different mechanisms).

Consciousness is Extended and Overlapping

Where does your consciousness extend out to? That is, what physical objects can be "part" of your conscious experience?

It is not totally clear what this question means of course, partly because it's not totally clear what a conscious experience is, but hopefully we can still provide partial answers to it based on things we think should be true of any reasonable interpretation of it.

For example, I think it is fair to assume that we are conscious of our body. Things that happen to it are part of our subjective experience, more or less (we may not be conscious of all of our body - e.g. most internal organs we don't have easy conscious access to unless they have an associated pain).

Equally it's clear that there are some things that are not part of our conscious experience. A teapot in orbit of Alpha Centauri is not something I have a subjective experience of (neither are most things closer to home of course, but I wanted an unambiguous example).

Much (most? all?) of consciousness - the ability to have a subjective experience - is consciousness of something. It has an aboutness, what philosophers (phenomenologists specifically) often call intentionality. I'd like to consider some distinctive features of this intentional consciousness. A lot of this will follow fairly standard phenomenology of perception work from Merleau-Ponty but full disclosure I have not actually read Merleau-Ponty.

Try running your hand over a couple of surfaces. Say a smooth surface and a rough one (I used a table and a couch for this experiment). Notice how it feels. It may help to close your eyes while doing this to focus on the sensation of touching the surface.

When you are doing this you are conscious of the texture of the surfaces, but you experience that texture through your hands. The table is (probably) not part of your consciousness, but your hands are. Which parts of your hands are part of that consciousness? The experience is primarily had through your nerves, but equally your skin is a major part of it - you are experiencing its give, and your conscious experience of the surface is an interactive process that depends on your muscles, and many other details besides. I think it is reasonable to describe this as a conscious process that involves your hands as a whole.

Now put on a pair of gloves (ideally a thin pair) and do it again. You probably had a broadly similar subjective experience with and without the gloves, but with the sensations somewhat muted. So the gloves muted your subjective experience of sensing the texture of the surface, but you still had a subjective experience of the texture of the surface, right?

But there's a key difference here: you never touched the surface. What your senses were directly experiencing was not the texture of the surface, but the inside of your gloves. Nevertheless, what you experienced was the texture of the surface.

Try it again a few times and see if you can experience it as feeling the inside of your gloves. It's hard. I can't.

In the same way that you were previously experiencing the texture of the surface through your hands, you are now experiencing it through your gloves (really through the combination of your gloves and your hands). So your conscious experience can extend to include not just your body, but the physical tools you are using.

You can see this with other tools that less directly map to your normal sensations. Try picking up a pen and running it along the surfaces. Maybe practice a bit more, using it to feel out the shape of other objects with your eyes closed. Try writing some things by hand like it's the twentieth century. After a while, you should have a similar experience as you get used to it: You are having a subjective experience not of the pen itself, but the things you are interacting with the pen through. Your consciousness has extended into the pen.

I want to reemphasise that this is not a mystical process. You have not acquired any extrasensory awareness of the pen, you're just manipulating it with your hands, and getting physical feedback through it. But you have a subjective experience of what you are doing that is experienced through the tools you are using to do it, while the tools themselves can fade into the background unless you focus explicitly on them.

Now go find another conscious entity who it wouldn't be weird to hold hands with. I used a cat, which she was not best pleased about. It would probably work better to use a human because they're more likely to cooperate, but I had a cat immediately to hand and I didn't have a human immediately to hand.

Now use their hand (paw) like a pen to feel some textures and objects. It will be weird, but just roll with it. After a while you can have a similar subjective experience as you did with the pen - their hand becomes an extension of your conscious experience which you can use to feel the world through. So as well as being able to extend your consciousness into inanimate objects, there is nothing stopping you from extending it into physical spaces that are "owned" by another conscious entity.

Note that there is something significantly different happening when you do this: While previously, only you were consciously experiencing the world through the pen, when you do it through someone else, there is a physical object which is contained in both of your conscious experiences. You are feeling the world through the cat's paw, while the cat is simultaneously feeling the world through her paw and also feeling very annoyed and affronted at the interfering human.

You can also see this with the pen. Ask your partner-in-subjectivity to grab hold of the other end of the pen, both close your eyes, and now move the pen around. You are each having a subjective experience of the other person's movements, mediated by the pen. The pen is now an object which you both experience as part of your consciousness.

Note that I am not suggesting that you are having the same conscious experience of the pen here. You are not. You each have your own distinct subjective experiences which are not shared with the other person, but the same physical object is acting as part of each of your consciousnesses, and there is no obstacle to it doing so.

Consciousness of Cognition

At this point I lost a bit of steam writing this so I'm just going to summarise the remainder of the argument:

A major part of what we are conscious of is our own thoughts, but our own thoughts are less "ours" than we think they are. Cognition is intrinsically distributed, and we routinely delegate our thinking to the distributed cognition that we are embedded in.

In doing so, we often subjectively experience that cognition as "ours". Knowledge construction is an intrinsically social activity, and most of our opinions and behaviours are copied from those around us (possibly with modification), but we still experience them as ours. Transactive memory mentioned above is ubiquitous, and we seem to treat things that are stored in transactive memory as if we know them (there's research that supposedly supports this, but I haven't followed up on the primary sources and replication crisis caveats apply), so our subjective experience of "our" thoughts is really our subjective experience of the distributed cognitive system that we're hooked up to. In the same way that we experience the world through the tools we interact with it, our experience of cognition is not bounded by our own skin, but extends into the whole of the cognitive system through which we have thoughts.


Almost every set is immune

A set \(A \subseteq \mathbb{N}\) is recursively enumerable if it can be enumerated by a computable function (formally, if there is some Turing machine that halts on an input if and only if it's in the set). If an infinite set contains no infinite recursively enumerable subsets, it is called immune. I was introduced to this concept by Gary Fredericks:

An "immune set" is an infinite set of natural numbers of which no computer program can print out any infinite subset.

The fact that they "exist" at all is pretty weird.

In fact they are not weird, they are normal. Almost every infinite subset of \(\mathbb{N}\) is immune.

To see this, consider the set \(C = \{0, 1\}^\mathbb{N}\). This can be mapped to the power set of the natural numbers by mapping a set to its indicator function \(1_A\), where \(1_A(x) = 1\) if \(x \in A\) and \(0\) otherwise. Every \(x \in C\) is the indicator function of \(A = \{n: x_n = 1\}\) so this mapping is a bijection.

\(C\) is a probability space, with the probability corresponding to the probability of observing an event with an infinite sequence of uniform coin tosses. This is equivalent to picking a set by randomly assigning every natural number to be an element of it with probability \(\frac{1}{2}\).

In this note I'll show that a randomly chosen \(x \in C\) corresponds to an immune set with probability one.

To see this, let \(A \subseteq \mathbb{N}\) be any infinite set. The probability that the set corresponding to a random element \(x \in C\) contains \(A\) as a subset must be zero, because this happens if and only if \(x_n = 1\) for all \(n \in A\). The probability of any \(n\) given coordinates all being \(1\) is \(2^{-n}\), so the probability of infinitely many specified coordinates being \(1\) must be \(\leq 2^{-n}\) for all \(n\), i.e. \(0\).
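
As a sanity check on the \(2^{-n}\) claim, here is a minimal Python sketch that enumerates every outcome of the first `k` coin tosses and counts the fraction in which the specified coordinates are all \(1\). The function name `fraction_containing` and the use of the even numbers as a stand-in for \(A\) are illustrative choices only, not part of the argument.

```python
from fractions import Fraction
from itertools import product


def fraction_containing(coords, k):
    """Exact probability that k fair coin tosses are all 1 on `coords`.

    `coords` holds distinct indices below k. All 2**k outcomes are
    enumerated, so this is only feasible for small k.
    """
    coords = list(coords)
    assert all(0 <= c < k for c in coords)
    hits = sum(
        1 for bits in product((0, 1), repeat=k) if all(bits[c] for c in coords)
    )
    return Fraction(hits, 2**k)


# The first n even numbers stand in for the first n elements of an infinite set A.
for n in range(1, 5):
    evens = [2 * i for i in range(n)]
    print(n, fraction_containing(evens, k=8))
```

Each extra specified coordinate halves the fraction, which is why demanding infinitely many of them leaves probability zero.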

There are only countably many infinite recursively enumerable sets (because each is defined by a Turing machine and there are only countably many Turing machines). Enumerate them as \(R_n\). There are also only countably many finite sets, each of which has probability zero, so a random set is infinite with probability one, and an infinite set fails to be immune exactly when it contains some \(R_n\). Hence \(P(x \text{ is not immune}) = P(\bigcup\limits_n \{1_B : R_n \subseteq B\}) \leq \sum\limits_n P(\{1_B: R_n \subseteq B\}) = \sum\limits_n 0 = 0\). So the probability of a randomly picked subset of \(\mathbb{N}\) not being immune is zero, and thus the probability of a randomly picked subset of \(\mathbb{N}\) being immune is \(1\).

Another way of seeing more or less the same thing (they are not actually the same theorem, but they have the same interpretation that the "typical" set is immune - it's just a different notion of typicality): the set of points containing some infinite set \(A\) is a closed set (because its complement is \(\bigcup\limits_{i \in A} \{x: x_i = 0\}\)) and has empty interior (because for any finite set of natural numbers there is some \(i \in A\) not in that set, so fixing the coordinates at those numbers is not enough to guarantee that a point contains \(A\)). The set of non-immune sets is a countable union of such sets, together with the countably many finite sets, and is therefore meager. The comeager sets are another type of "large" set, and in particular by the Baire Category Theorem comeager sets are necessarily uncountable.

This kind of reasoning might be a bit strange if you're not used to it, but it's fairly common in set theory - rather than constructing objects satisfying a particular property, we show that in some sense it is "typical" that objects satisfy that property, and the ones that don't that we normally concern ourselves with are a weird uncommon special case (this sort of reasoning is also common in more concrete subjects like combinatorics).


Speeding Up Conditional Random Sampling

This is an elaboration of an algorithm I described in an email to John Regehr, when talking to him about his recent blog post about generative fuzzers and its connection to Boltzmann sampling. It currently has no implementation and so I can't make any promises about how well it works, but it's interesting and I don't currently have time to work on it, so I wanted to write it down before the idea gets away from me.

The basic problem I'm interested in here is this: Suppose you have some random variable \(X\) which is implemented by making a series of non-deterministic binary choices (coin flips), and some condition \(f(X)\), and you want to sample from the conditional distribution \(X | f\) - i.e. the subset of values of \(X\) for which \(f\) holds, with probabilities proportional to their usual probabilities for \(X\).

The connection to John's problem of interest is that samplers based on binary choices already have the property that the probability of drawing any given sequence of choices \(s\) depends only on its length, and is proportional to \(2^{-|s|}\). Such random variables are called Boltzmann Samplers with parameter \(\frac{1}{2}\). In particular this means that the conditional distribution on any fixed size is already uniform, so if you pick \(f\) to be a size constraint then you will naturally get something approximating what John wants for the constrained generator.
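
To spell out why the fixed-size conditional distribution is uniform, here is a short derivation. It writes \(V_k\) for the set of complete choice sequences of length \(k\), which is notation introduced just for this step. For any \(s \in V_k\),

\[ P(s \mid |s| = k) = \frac{2^{-k}}{\sum_{t \in V_k} 2^{-|t|}} = \frac{2^{-k}}{|V_k| \cdot 2^{-k}} = \frac{1}{|V_k|}, \]

which is the same for every \(s\) of that length.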

The easy and normal way to do this is rejection sampling: generate independent copies \(X_1, X_2, \ldots\) until you find some \(i\) with \(f(X_i)\) and return that. The problem is that this takes on average \(\frac{1}{P(f(X))}\) generator calls, which can be very slow if \(f\) is hard to satisfy. I've been idly thinking for a while about how we can use the structure of random generation to speed it up. I think the following algorithm does it:

In order to do this it helps to think of \(X\) as a function \(c: \{0, 1\}^{\mathbb{N}} \to T\) taking a series of binary choices and turning them into a test case, with the property that \(c(u)\) only depends on some finite initial prefix of \(u\) (and we can observe what that prefix is by instrumenting the random generator). We can simulate \(X\) by lazily generating a uniform-at-random infinite binary sequence, but we can also just as easily try \(c\) on any other binary sequence (this observation is also how test-case reduction and a bunch of other things work in Hypothesis).

The algorithm works as follows: We maintain a list of starts \(S = \{s_1, \ldots, s_n\}\). This list is a set of finite bit strings in \(\{0, 1\}^{<\omega}\) with the following properties:

  1. It is prefix free - that is if \(i \neq j\) then \(s_i\) does not start with \(s_j\).
  2. If \(f(c(u))\) for some \(u\) then \(u\) starts with some (necessarily unique) \(s_i\).

We will update this set as we learn more about the shape of the problem. Initially we start with \(S\) containing just the empty string.

Suppose we have any such set \(S\). The following algorithm perfectly simulates the conditional distribution \(X|f\):

  1. Pick \(s_i \in S\) with probability proportional to \(2^{-|s_i|}\).
  2. Evaluate \(x = c(s_i u)\) for some uniformly chosen \(u \in \{0, 1\}^{\mathbb{N}}\).
  3. If \(f(x)\) then return \(x\), else go back to \(1\) and try again.

This is certainly true if we have the property that every \(u \in \{0, 1\}^{\mathbb{N}}\) starts with some \(s_i\), because we're picking \(s_i\) with the probability that a uniform at random string starts with it, and then we're drawing the rest uniformly at random, so our input sequence is a pure uniform at random series of choices.

If \(S\) does not have this property, imagine enlarging it by adding strings until it does. Every \(s\) added has the property that no extension of it will cause \(f(c(su))\) to be true (because by assumption such an \(su\) would have to start with one of the existing \(s_i\), which by prefix-freeness is incompatible with starting with \(s\)). This means step 3 will always fail, so the distribution is the same as the conditional distribution on always choosing from one of our original \(s_i\).
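Here is a minimal sketch of that prefix-restricted rejection sampler, to make the three steps concrete. Everything named here is an assumption for illustration: `c` is modelled as a function taking a `draw_bit` callback, prefixes are tuples of bits, and `geometric` is just a toy generator. It is not taken from Hypothesis or any other implementation.

```python
import random


def geometric(draw_bit):
    """Toy generator (hypothetical): count the 1s before the first 0."""
    n = 0
    while draw_bit():
        n += 1
    return n


def sample_conditional(c, f, starts, rng):
    """Rejection sampling restricted to a prefix-free set of starts.

    starts is a list of bit tuples, assumed to be prefix free and to
    cover every choice sequence whose result satisfies f.
    """
    weights = [2.0 ** -len(s) for s in starts]
    while True:
        # Step 1: pick a start with probability proportional to 2^-|s|.
        (prefix,) = rng.choices(starts, weights=weights)
        position = 0

        def draw_bit():
            # Replay the chosen prefix, then fall back to fresh coin flips.
            nonlocal position
            if position < len(prefix):
                bit = prefix[position]
            else:
                bit = rng.getrandbits(1)
            position += 1
            return bit

        # Steps 2 and 3: generate a value and retry unless f holds.
        x = c(draw_bit)
        if f(x):
            return x
```

With `starts = [()]` this is plain rejection sampling. With `starts = [(1, 1, 1)]` and the condition `x >= 3` on `geometric`, every draw succeeds first time, because the prefix already guarantees the condition.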

This algorithm is a speedup only if the chosen set \(S\) can prune a lot of possible prefixes. This actually does sometimes help because if the tree has a lot of shallow branches then we can evaluate them in full, determine whether they lead to a desired value, and if not discard them, but this will mostly not be the case. It does however motivate the next algorithm:

  1. Pick \(s_i \in S\) with probability proportional to \(2^{-|s_i|} P(f(c(s_i U)))\) where \(U\) is a uniform at random infinite bit sequence.
  2. Evaluate the sequence \(x_j = c(s_i u_j)\) for independent uniformly chosen \(u_j \in \{0, 1\}^{\mathbb{N}}\) until you find some \(x_j\) with \(f(x_j)\) being true, and return that.

This is essentially the same algorithm as before, but with the rejection sampling moved inside: We pick a prefix with probability weighted to how likely it is to work, and then we run rejection sampling until it works.

The easiest way for me to see that this works was to note that this is a disjoint union of Boltzmann generators weighted in the right way, but it's probably easy to check it from elementary principles.

There are two big problems with this algorithm:

  1. For many sets \(S\) it's no speedup at all. In particular when \(S = \{''\}\) it's literally just rejection sampling.
  2. We have no way to calculate those probabilities.

Fixing this is where the details of the algorithm get a bit fuzzy and would require me to actually have an implementation to test and see how it performs, as there are a bunch of tuning parameters and choices made in the design. The basic idea is that we maintain a distribution of beliefs about these probabilities, Bayesian style, and then use a Thompson Sampling style approach to build our sampler with a suitable explore/exploit tradeoff.

For each \(s_i\) we maintain counts of the number of times we've tried a sequence starting with it, \(n_i\), and the number of times that has resulted in satisfying \(f\), \(g_i\). Starting from a uniform prior, this gives us a posterior distribution for \(P(f(c(s_i U)))\) of \(\mathrm{Beta}(g_i + 1, n_i - g_i + 1)\) (this would not be true if our observations were not drawn uniformly at random but the algorithm will proceed in a way that guarantees that's always the case). We can record all draws we perform in a Patricia Trie so that these are easy to recalculate when we modify \(S\).

The algorithm now proceeds as follows:

  1. For each \(s_i\) draw a value \(q_i\) from its posterior distribution.
  2. Pick \(s_i\) with probability proportional to \(2^{-|s_i|} q_i\).
  3. Perform "some amount of" rejection sampling on \(s_i U\) to attempt to find an \(X\) satisfying \(f\) (updating the statistics for \(s_i\) as we do).
  4. If the rejection sampling succeeds, return that. If not, possibly split \(s_i\), setting \(S' = S \cup \{s_i 0, s_i 1\} \setminus \{s_i\}\) (possibly not adding any of these which define a complete prefix for \(c\) that doesn't satisfy \(f\)), then go back to \(1\).

The idea being that we sample according to what we would in some configuration of our estimate of the true probabilities. As the algorithm runs, we get more and more information about those probabilities, so this algorithm should converge on our exact algorithm above.
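Steps 1 and 2 of this loop can be sketched as follows. This is only an illustration under assumed names: `stats` is a hypothetical dictionary mapping each prefix (a tuple of bits) to its `[tries, successes]` counts, and the rejection sampling and splitting steps are left out entirely.

```python
import random


def choose_start(stats, rng):
    """Pick a prefix Thompson-sampling style.

    stats maps each prefix (tuple of bits) to [tries, successes].
    """
    starts = list(stats)
    weights = []
    for s in starts:
        tries, successes = stats[s]
        # Step 1: draw q from the Beta(g + 1, n - g + 1) posterior.
        q = rng.betavariate(successes + 1, tries - successes + 1)
        # Step 2: weight by the chance of reaching s times the drawn estimate.
        weights.append(2.0 ** -len(s) * q)
    (chosen,) = rng.choices(starts, weights=weights)
    return chosen
```

A prefix that has failed many times in a row gets a posterior concentrated near zero, so it is picked rarely but never ruled out completely, which is the explore/exploit behaviour we want.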

This leaves two questions:

  1. How much rejection sampling to perform? Our estimate of the probability might be wildly off, so we can't rejection sample indefinitely: We might even be at a prefix where the true probability is zero!
  2. When should we split a node?

This is something that I think needs careful experimentation and tuning and can't be figured out from first principles, but my guess is:

  1. The number of rejection sampling loops should be roughly tuned to how certain our estimate of the probability is. Something along the lines of updating the stats after drawing a failure, drawing a new \(p\) from the posterior, and terminating the loop with probability \(1 - \min(1, \frac{p}{q_i})\) where \(q_i\) is our initially drawn estimate of the probability is likely to work pretty well: If we start to think that it's really unlikely that we're going to succeed because we've got a large number of failures in a row, we'll bail out, but if we knew early on with fairly high certainty that we were in an unlikely scenario and thought it worth picking anyway, we'll keep going.
  2. We should split fairly aggressively once we've seen at least one success under a node, and fairly conservatively if we haven't. There might be something clever we can do with estimating the benefits of splitting based on observed data, but my suspicion is that a dumb heuristic to the tune of "Split a node once you've seen at least \(10\) data points under it, at least one of which has succeeded and one of which has failed" is probably likely to be pretty good.

Alternatively, here's another interesting possibility for really aggressive splitting: Every time we observe a success \(f(c(u))\), find the \(s_i \in S\) that starts \(u\) and split it.

The reason why these heuristics are a bit uncertain is that they're essentially a quality/performance tradeoff: splitting will almost always improve the quality of our generator (the "almost" qualifier is because each child node will have less data gathered for it, so we will tend to spend a little more time exploring them due to our uncertainty, which may be useless), so the goal of the splitting heuristic is mostly going to be to keep memory usage under control, and similarly running more rejection samples will almost always give us a higher quality result, but it will also cause us to be slower.

I don't know if this idea will work well. It has the nice property that it can't really work badly, in that its failure modes are essentially no worse than the status quo, but it may well be that they're also no better. My plan is to at some point (hopefully in the next few weeks but realistically probably not) implement it and see if I can use this in Hypothesis. It's particularly interesting for solving a problem that has long been a bit of a sore point in Hypothesis, which is size control.

QuickCheck has some special functionality for dealing with sizes of examples. Examples in QuickCheck are generated with both a random seed and a particular value of a size parameter. This is used to explore small examples first and then gradually work up to big ones, which also helps it with bounding the size of recursive subexamples and preventing cases where they don't terminate.

Hypothesis doesn't do this. Instead it bounds the maximum length of the choice sequence, effectively performing rejection sampling to keep the size under control, and biases the distribution in ways that make it more likely to fit within that cap. This has historically worked out pretty OK, but it would be nice to be able to explicitly fuzz in smaller sized regions first - it improves test performance and makes the shrinker's life easier.


Faster SAT model counting

Advance warning as this seems to be getting some attention: Please remember that this is my notebook blog where I just post random shitty things that are not fully thought out. Don't expect anything amazing and/or well justified out of things posted here.

Here's a trick I figured out a while ago and have yet to deploy in anger because it's so far outside my wheelhouse (sometimes I tackle problems I know I won't be able to solve just to see if anything interesting falls out and/or to learn more about the research in that area). As far as I can tell this is novel to me - I've read a bunch of the relevant research, and there are recent papers that should have used this trick if it were known. I should probably write a paper about it but, well, haven't. I'm almost certain it's not revolutionary in any meaningful sense, but it does seem to be a nicer approach than similar things I've seen in the literature, and it might be an interesting incremental improvement on part of the problem.

SAT, the boolean satisfiability problem, is the canonical NP-hard problem. #SAT is the canonical worse than NP hard problem. Rather than finding out if a set of formulae is satisfiable, you want to find out how many satisfying assignments it has (or, in the version I prefer, the probability of a uniform at random assignment satisfying it - these are obviously equivalent by just multiplying by \(2^n\) where \(n\) is the number of variables, but the nice thing about this is that it is not affected by adding extra "free" variables that don't appear in the formulae).

This is obviously at least as hard as SAT because SAT is equivalent to the number of satisfying assignments being non-zero, but in fact it is much harder than SAT, both in the complexity sense (assuming modest "these complexity classes we're pretty sure are inequivalent really are" assumptions. Certainly \(P \neq NP\) is enough), and also in the sense that most of the tricks used to make SAT-solving fast don't work for SAT-counting because they involve aggressively pruning the search space. e.g. one thing you do in a SAT solver is that whenever a variable appears only negated or never negated in formulae, you can unconditionally set it to a compatible value to prune out all the clauses it appears in. With model counting you need to consider both values.

Anyway, it turns out there is a trick that lets you speed up model counting in some circumstances. It doesn't speed things up massively in all circumstances by any means, but in situations where the problem has a nice underlying structure it can have a huge impact by cutting down the problem size significantly.

The way it works is by calculating two things:

  1. The backbone of the problem - the set of literals that will always be assigned true in any satisfying assignment.
  2. The literal equivalences of the problem - an equivalence relationship over literals such that every equivalent pair of literals will have the same value in any satisfying assignment.
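To make these two definitions concrete, here is a brute-force sketch that computes both by enumerating every assignment. It is only usable on tiny instances and is not the algorithm described in this post. The encoding is an assumption: clauses are lists of non-zero integers in the DIMACS style, where `-v` is the negation of variable `v`.

```python
from itertools import product


def satisfying_assignments(clauses, n):
    """Yield every assignment (dict variable -> bool) satisfying the clauses."""
    for values in product([False, True], repeat=n):
        assignment = dict(zip(range(1, n + 1), values))
        if all(
            any(assignment[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            yield assignment


def backbone_and_equivalences(clauses, n):
    """Return (backbone literals, equivalent literal pairs) by enumeration."""
    literals = [lit for v in range(1, n + 1) for lit in (v, -v)]
    models = list(satisfying_assignments(clauses, n))

    def value(lit, model):
        return model[abs(lit)] == (lit > 0)

    # A literal is in the backbone if it is true in every model.
    backbone = {lit for lit in literals if all(value(lit, m) for m in models)}
    # Two literals are equivalent if they agree in every model.
    equivalent = {
        (a, b)
        for a in literals
        for b in literals
        if abs(a) < abs(b) and all(value(a, m) == value(b, m) for m in models)
    }
    return backbone, equivalent
```

For the clauses `[[1], [-2, 3], [2, -3]]` over three variables, variable 1 is forced true and variables 2 and 3 must agree, so the backbone is `{1}` and the equivalent pairs are `(2, 3)` and `(-2, -3)`. Note that for an unsatisfiable problem every literal vacuously lands in the backbone.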

Combining these you can significantly reduce the complexity of the problem by merging all equivalent literals, assigning the backbone to true, and computing a reduced version of the problem. Every variable in the original is either:

  1. Forced (it is a backbone element, or is equivalent to some literal of strictly smaller variable number)
  2. Free (it is not equivalent to any strictly smaller variable number and does not appear in the reduced problem, so its value is irrelevant)
  3. Uniquely determined by an assignment of the reduced problem.

Therefore the probability of an assignment satisfying the clauses is \(2^{-k}\) times the probability of an assignment satisfying the reduced problem, where \(k\) is the number of forced variables.

If the reduced problem is much smaller than the original problem, which it often will be, this makes model counting tractable in cases where it was previously intractable.

Similarly, you can use this to speed up random sampling, because you can randomly sample by generating a random sample from the reduced problem, assigning all forced variables based on that, and uniformly at random assigning all free variables.

So how do we do this? Partition refinement!

We can simultaneously calculate both the backbone and the literal equivalence with at most \(n + 1\) SAT queries, where \(n\) is the number of variables, as follows.

We maintain three data structures:

  1. A union/find data structure for keeping track of merges, modified so that it understands the negation relationship (i.e. if \(x \sim y\) then \(\neg x \sim \neg y\)).
  2. A partition refinement data structure for keeping track of candidate literals for merging, using the literal with the smallest variable number as the canonical representative of a partition.
  3. A set of literals that might be members of the backbone.

We initialise the partition refinement data structure with the set of all literals (note: Not all variables, it's important to contain negations too - a variable might be equivalent to the negation of another variable). We initialise the backbone with the set of all literals.

Every time we calculate a new assignment for our clauses we:

  1. Update the partition refinement to be compatible with the assignment, so that every member of the same partition takes the same value in that assignment.
  2. We intersect the backbone set with it, so that only literals which are assigned true in this assignment remain in the backbone.

To start with, calculate a single satisfying assignment for the clauses, and update in the above way.

Now, for each variable \(a\), in increasing variable number order, do the following:

  1. We look it up in the partition refinement data structure to get the canonical representative of its partition, \(b\). If \(a = b\) then stop and do nothing for this variable (this will always be where we stop for variable \(1\)).
  2. If it is not, look for a satisfying assignment to the problem with the addition of the clauses \(a \vee b\) and \(\neg a \vee \neg b\). These extra clauses force \(a\) and \(b\) to be assigned distinct values.
    1. If there is such an assignment, update the backbone and equivalence as above.
    2. If on the other hand there is no such assignment, merge \(a\) and \(b\) in our union/find data structure.

At the end of this process, the potential backbone will either be empty, or every element of it will be equivalent. If it is non-empty pick an arbitrary element \(a\) of the backbone set and add \(\neg a\) to the clauses temporarily and look for an assignment to that. If there is one, the problem has no backbone. If there is not one, the current candidate backbone set is the actual backbone of the problem.

Additionally, we will have exactly the literal equivalence relationship in the union/find data structure: For every pair of literals, either we will have proven they can be distinct by demonstrating this with an assignment, or that they cannot, by finding that there is no satisfying assignment.

In addition to this core algorithm there are a bunch of good tricks you can do to speed it up by avoiding making SAT queries where the answer is more obvious.

The first is that there's a nice way of precomputing a lot of the equivalences from the 2-SAT structure of the problem, using Tarjan's algorithm for finding strongly connected components of the graph. Create a graph where the nodes are literals, and for every clause \(a \vee b\), add edges \(\neg a \to b\) and \(\neg b \to a\). Now compute the strongly connected components of the graph, and merge each strongly connected component together (because if \(a\) and \(b\) are in the same component then we know that each implies the other so they must be equivalent). If this results in any merges, replace any literals in the problem that have been merged. If this results in any new 2-clauses, repeat this process until there are no more merges.
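A single pass of this precomputation might look roughly like the following sketch. It assumes 2-clauses are given as pairs of non-zero integers with `-v` meaning the negation of `v`, uses a plain recursive version of Tarjan's algorithm (so it is only suitable for small graphs), and leaves out the rewrite-and-repeat loop.

```python
from collections import defaultdict


def equivalence_classes_from_2clauses(two_clauses):
    """Return the non-trivial strongly connected components of the
    implication graph, each as a set of mutually equivalent literals."""
    graph = defaultdict(list)
    nodes = set()
    for a, b in two_clauses:
        # The clause (a or b) gives implications not-a -> b and not-b -> a.
        graph[-a].append(b)
        graph[-b].append(a)
        nodes.update([a, -a, b, -b])

    index = {}
    lowlink = {}
    stack = []
    on_stack = set()
    components = []

    def visit(v):
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return [c for c in components if len(c) > 1]
```

For example, the clauses `(-1, 2)` and `(1, -2)` say that variables 1 and 2 imply each other, so the components `{1, 2}` and `{-1, -2}` come back as candidates for merging. If a literal and its negation ever appear in the same component, the problem is unsatisfiable.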

Additionally, we can partially compute the backbone as we go. As well as maintaining a set of backbone candidates, we can also maintain a set of things that are definitely in the backbone, by just running the unit propagation part of a SAT solver: We maintain a standard data structure where we keep track of two watched literals for each clause. Whenever we merge, we update the watches for the replaced literal. If this results in any new units, we run unit propagation, add every resulting literal to a known backbone set, and mark the relevant variables as forced and thus skippable in our iteration.

You can also use unit propagation to speed up the merge step in a common case by using a trick I've described before. By making a version of the watched literals data structure that does a lazy copy, you can easily calculate what the resulting set of units would be if you added a clause \(a\) to the current clauses. Thus when testing whether \(a\) and \(b\) are equivalent, you can first start by discovering whether each implies the other through unit propagation alone (this can also tell you that in fact one or both of them might always be false if unit propagation leads to a contradiction). If this is the case, you can merge them without doing a SAT query.

Things I should probably do but haven't:

  1. Bundle this all up as a convenient bit of code. I have some code for doing this, but it's all very entangled with a bunch of barely working code for sampling from SAT instances (barely working because slow. Turns out the thing that is a major research field is a hard problem who knew?!)
  2. See if this is interesting enough to people to get a paper out of it.


How To Make Good Coleslaw

I need to write a notebook post for beeminder reasons and don't have the bandwidth to make it good, so here have a piece of information that is apparently idiosyncratic to me: My coleslaw recipe.

The trick is to basically "lengthen" the mayonnaise into a dressing by adding more of the mayonnaise ingredients that aren't eggs and fat.

This way of doing it gets a good compromise between two coleslaw extremes - coleslaw which is just a bunch of shredded vegetables is too bland, coleslaw which is swimming in mayonnaise is just gross, but this kind of lengthened mayonnaise dressing lets you pack in a lot more flavour without ever being at risk of hitting that danger zone of disgusting supermarket mayonnaise soup. It's probably also a lot healthier, but that's not really why I do it, it just tastes much nicer this way.

There are no quantities per se, because I do it entirely by eye and taste, but the ingredients are:

Add a fairly generous amount of the mustard, then add salt and vinegar to taste, mixing thoroughly, until the vegetables taste sharp and moderately savoury (there's more salt coming from the mayonnaise, so don't oversalt it). Now add mayonnaise in a little bit at a time until the coleslaw "looks" right - it should look very lightly coated in mayonnaise, not swimming in it.

The resulting coleslaw tends to pool some of its liquid at the bottom, so if it's been sitting for a while (I often make a big batch and keep it in the fridge) before serving it needs to be tossed again to get the full flavour.


Expected Time To Hit A Score Difference

For reasons, I've recently found myself interested in the following problem: Suppose you have a series of Bernoulli random variables \(X_1, \ldots, X_n, \ldots\) independently distributed with identical parameter \(p > \frac{1}{2}\). What is the expected time until you have at least \(k\) more successes than failures?

I spent an annoyingly long time with it before realising in the shower this morning that it's actually very easy. The key to solving this is to realise the critical role that the \(k = 1\) version plays.

Let \(a_k\) be the expected time to get \(k\) more successes than failures. Then \(a_0 = 0\) clearly. The key observation is that for \(n \geq 1\) we have \(a_n = 1 + (1 - p)(a_1 + a_n) + p a_{n - 1}\).

Why does this hold?

Think of it as an infinite state Markov chain. When you are at state \(n\), you can either drop down to state \(n - 1\) with probability \(p\), or go up to state \(n + 1\) with probability \(1 - p\). If we go up a state, in order to return to the current state subsequent draws need to have exactly one more success than failure. By definition, this takes an expected time of \(a_1\).

We can now plug in \(a_1\) to this and do some simple algebra to get that \(a_1 = (2 p - 1)^{-1}\).

For \(n > 1\) we again do some algebra and get \(a_n = p^{-1}(1 + (1 - p)a_1) + a_{n - 1}\), or \(a_n = (2p - 1)^{-1} + a_{n - 1}\). Putting these all together we get that \(a_n = n (2p - 1)^{-1}\).

Which all told is a much nicer result than I was expecting.
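As a quick sanity check, here is a small sketch that verifies in exact rational arithmetic that \(a_n = n (2p - 1)^{-1}\) satisfies the recurrence above. The function names are made up for this illustration.

```python
from fractions import Fraction


def expected_time(n, p):
    """Closed form a_n = n / (2p - 1) for p > 1/2."""
    return n / (2 * p - 1)


def check_recurrence(p, max_n=20):
    """Check a_n = 1 + (1 - p)(a_1 + a_n) + p a_{n-1} for n = 1..max_n."""
    a = [expected_time(n, p) for n in range(max_n + 1)]
    return all(
        a[n] == 1 + (1 - p) * (a[1] + a[n]) + p * a[n - 1]
        for n in range(1, max_n + 1)
    )
```

For instance with \(p = \frac{2}{3}\) the closed form gives \(a_n = 3n\), and `check_recurrence(Fraction(2, 3))` confirms that this satisfies the recurrence exactly. Passing a `Fraction` rather than a float is what makes the equality comparison safe.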


A Fun Puzzle

From Twitter:

fun math game:

there are two players, and a machine that outputs a random number between 0.0 and 1.0 when you press a button

(inclusive, chosen uniformly and independently, from the reals, etc) player 1 pushes the button twice, and multiplies the two outputs together to get a score (e.g. 0.45 x 0.9=0.4).

then player 2 pushes the button once, and squares the result to get their score (e.g. 0.67 x 0.67 = 0.4489)

the higher score wins. which player wins more often?

The answer is that player two wins with probability \(\frac{5}{9}\).
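If you'd rather check that number empirically before trusting any integrals, a quick Monte Carlo sketch of the game does the job. The function name is made up, and Python's `random()` draws from \([0, 1)\) rather than the closed interval, which makes no practical difference here.

```python
import random


def estimate_player_two_wins(trials, rng):
    """Estimate the probability that player two's score beats player one's."""
    wins = 0
    for _ in range(trials):
        player_one = rng.random() * rng.random()  # two presses, multiplied
        player_two = rng.random() ** 2  # one press, squared
        if player_two > player_one:
            wins += 1
    return wins / trials
```

With a couple of hundred thousand trials the estimate should land within about a percentage point of \(\frac{5}{9} \approx 0.556\).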


I originally just brute-force solved the integral, but it's easier to see after a change of variables. The negative log of a uniform \((0, 1)\) random variable is exponentially distributed with parameter \(1\), and taking negative logs turns "player two's score is at least player one's" into \(X + Y \geq 2Z\). So this boils down to the claim that if \(X, Y, Z \sim \textrm{Exp}(1)\) are independent then \(P(X + Y \geq 2Z) = \frac{5}{9}\).

The proof of this is as follows:

\begin{align} P(X + Y \geq 2Z) &= P\left(Z \leq \frac{X + Y}{2}\right) \\ & = \int\limits_{x, y \geq 0} e^{-x - y} \int\limits_{0 \leq z \leq \frac{x + y}{2}} e^{-z} dz dx dy \\ & = \int\limits_{x, y \geq 0} e^{-x - y} \left(1 - e^{-\frac{x + y}{2}}\right) dx dy \\ & = \int\limits_{x, y \geq 0} e^{-x} e^{- y} dx dy - \int\limits_{x, y \geq 0} e^{-\frac{3}{2} x} e^{-\frac{3}{2} y} dx dy \\ & = \left(\int\limits_{x \geq 0} e^{-x} dx\right)^2 - \left(\int\limits_{x \geq 0} e^{-\frac{3}{2}x} dx\right)^2 \\ & = 1^2 - \left(\frac{2}{3}\right)^2 \\ & = \frac{5}{9} \\ \end{align}

I still feel like there should be a nicer solution than this. It's a step above just brute force solving the integral in tidiness, but I don't feel it makes it any more intuitive where the actual number comes from.

Update: OK, here is probably as nice as it's going to get.

For a random variable \(A\) define the moment generating function of \(A\) as \(F_A(t) = E(e^{tA})\). Note that if \(A, B\) are independent then \(F_{A + B}(t) = E(e^{t(A + B)}) = E(e^{tA} e^{tB}) = E(e^{tA}) E(e^{tB}) = F_A(t) F_B(t)\).

Note also that if \(Z \sim \mathrm{Exp}(\lambda)\) is independent of \(A\) and \(P(A \geq 0) = 1\) then \(P(Z \geq A) = E(P(Z \geq a | A = a)) = E(e^{-\lambda A}) = F_A(-\lambda)\).

Now, with \(X, Y, Z\) as above, we have:

  1. \(P(2Z \geq X + Y) = F_{X + Y}(-\frac{1}{2})\) (because if \(Z \sim \mathrm{Exp}(1)\) then \(2Z \sim \mathrm{Exp}(\frac{1}{2})\)).
  2. If \(t < 1\) then \(F_X(t) = F_Y(t) = \int\limits_0^\infty e^{(t - 1)x} dx = \frac{1}{1 - t}\)
  3. \(F_{X + Y}(t) = F_X(t) F_Y(t) = F_X(t)^2 = \frac{1}{(1 - t)^2}\)

Thus, plugging in the numbers, we have \(P(2Z \geq X + Y) = \left(\frac{1}{1 + \frac{1}{2}}\right)^2 = \frac{4}{9}\), and so \(P(X + Y \geq 2Z) = 1 - \frac{4}{9} = \frac{5}{9}\) as desired.

Update 2: Colin's solution is much better than mine.


Two Player TickTalk

Today's TickTalk had a high drop out rate so ended up with just two of us there, so we tinkered with the format to try to make it work.

The main thing we did here was that we did away with the cards and just passed it back and forth with each taking it in turns to talk about things. Often this would be digging more into something we'd just been talking about. Voting was just a question of "Shall we talk about this some more?" "Sure".

We did maintain talking object protocol. We weren't always very good at it, which I've noticed tends to happen in TickTalk in general (especially with small groups) but I still think it was worth having.

I found we often lost track of the timer and in particular whether we were in the second or first flip of it, so I think the way I would run a two person TickTalk in future is this:

  1. The person with the timer gets to pick the topic for this five minute session and puts it in front of them.
  2. When the timer runs out, the other person can choose to just flip it over and leave it in front of the current person to continue the topic.
  3. They can also claim the timer, flipping it and putting it in front of them and saying "I'd like to change the topic / focus on X / etc." (it should be very acceptable to change topic)

In particular it's perfectly fine to continue on with a topic for more than ten minutes in two player TickTalk. The format is naturally more flexible than with more than two players, which I think is good for a two person conversation.

This session was extremely good. Rachael (the other attendee) and I always have good conversations, so that's not necessarily indicative of it in general, but I did think it was an unusually good conversation with her even given that.

I won't write detailed notes on our conversation, but one thread in it that was very helpful was to help clarify what some of the value of TickTalk is: It creates an environment where a lot of the emotional work you have to do of managing a conversation (e.g. making sure you're not dominating it) is taken care of automatically by the structure of it, so that you can stop worrying about it and focus on having an actual conversation. These actual conversations are then very good at getting you to explain yourself to other people, which lets you use the group as a way of understanding and analysing your assumptions about your situation.

I've thought of TickTalk before as a way to get a group to have a conversation that is about as good a conversation as the group could have in principle but probably wouldn't in practice, and I do think that emotional management aspect is a big part of it. I think it also tends to push the structure more naturally towards the analytical mode, so among the optimal conversations the group has it will tend to select for more analytical ones, which is often very helpful to get some perspective on the problem.


Separating Sampling and Removal

This is a weird data structure that as far as I know I invented (it's plausibly a reinvention, but it might also just be too niche a set of requirements for anyone else to have bothered with) and I quite like. It uses a combination of two neat tricks to support all the following operations in O(1):

It does assume that the values in the collection are hashable, but honestly I've only ever wanted to use it with integers.

It builds on the following trick:

class LazySequenceCopy(object):
    """A "copy" of a sequence that works by inserting a mask in front
    of the underlying sequence, so that you can mutate it without changing
    the underlying sequence. Effectively behaves as if you could do list(x)
    in O(1) time. The full list API is not supported yet but there's no reason
    in principle it couldn't be."""

    def __init__(self, values):
        self.__values = values
        self.__len = len(values)
        self.__mask = None

    def __len__(self):
        return self.__len

    def pop(self):
        if len(self) == 0:
            raise IndexError("Cannot pop from empty list")
        result = self[-1]
        self.__len -= 1
        if self.__mask is not None:
            self.__mask.pop(self.__len, None)
        return result

    def __getitem__(self, i):
        i = self.__check_index(i)
        default = self.__values[i]
        if self.__mask is None:
            return default
        return self.__mask.get(i, default)

    def __setitem__(self, i, v):
        i = self.__check_index(i)
        if self.__mask is None:
            self.__mask = {}
        self.__mask[i] = v

    def __check_index(self, i):
        n = len(self)
        if i < -n or i >= n:
            raise IndexError("Index %d out of range [0, %d)" % (i, n))
        if i < 0:
            i += n
        assert 0 <= i < n
        return i

This allows us to make a copy of the initial sequence without actually copying it.

Now if we wanted to do sampling without replacement it's easy: We can just do a lazy Fisher-Yates shuffle:

def sample_without_replacement(random, sequence):
    i = random.randrange(0, len(sequence))
    j = len(sequence) - 1
    sequence[i], sequence[j] = sequence[j], sequence[i]
    return sequence.pop()

We pick a random element, swap it to the end, and then pop the last element. This lets us randomly sample and then remove in O(1) because we don't care about preserving the order of the sequence.

The tricky bit is what to do when you want to sample and then decide later whether you want to remove the value or not. I run into this requirement when, for example, randomly but exhaustively exploring a state space - sometimes we discover that a node is fully explored and we want to remove it, but we can't know that at the time of sampling.

The idea is to do deferred deletion. Rather than attempting to actually delete it at the time of deletion, we simply record that we wanted to delete it, and the next time we sample we remove any elements we encounter.

This works as follows:

from collections import Counter

class RemovableSampler(object):
    def __init__(self, values):
        self.__values = LazySequenceCopy(values)
        self.__deletions = Counter()

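    def sample(self, random):
        while True:
            i = random.randrange(0, len(self.__values))
            v = self.__values[i]
            if self.__deletions[v] > 0:
                # v was deleted earlier: swap it to the end, pop it,
                # repay one unit of deletion debt, and try again.
                j = len(self.__values) - 1
                self.__values[i], self.__values[j] = (
                    self.__values[j], self.__values[i])
                self.__values.pop()
                self.__deletions[v] -= 1
            else:
                return v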

    def delete(self, value):
        self.__deletions[value] += 1

i.e. we reuse the procedure from the lazy Fisher-Yates shuffle but we only remove if we're not going to return the value because we've previously deleted it.

We can see that this is amortized O(1) by using a "debt" model. Each call to delete increases the total debt by one, and each iteration of sample either returns or repays one debt, so the total work done is never more than O(deletes + samples).
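To see the sample-then-delete interaction in isolation, here is a self-contained variant of the same deferred deletion idea. It is a simplified sketch: it takes an eager O(n) copy of the input with `list(values)` instead of using the lazy copy, and the class name is made up for this example.

```python
import random
from collections import Counter


class SimpleRemovableSampler:
    def __init__(self, values):
        self.values = list(values)  # eager copy, unlike LazySequenceCopy
        self.deletions = Counter()

    def sample(self, rng):
        while True:
            i = rng.randrange(len(self.values))
            v = self.values[i]
            if self.deletions[v] == 0:
                return v
            # v has a pending deletion: overwrite it with the last
            # element, drop the last slot, and repay one unit of debt.
            self.values[i] = self.values[-1]
            self.values.pop()
            self.deletions[v] -= 1

    def delete(self, value):
        # O(1): just record that one copy of value should go away.
        self.deletions[value] += 1
```

After `delete(2)` and `delete(3)` on a sampler over `range(5)`, subsequent samples only ever return 0, 1 or 4, and the deleted values are physically removed the first time a sample happens to land on them. Once every value has been deleted and removed, `sample` raises `ValueError` from `randrange`.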


Notes on Disagreement

We did what I thought was a very good TickTalk session about disagreement today (actually we disagreed on whether it was about disagreement, in that everyone except me forgot that it was supposed to have a theme, but I ended up interpreting everything through a disagreement-themed lens. This worked surprisingly well).

We identified what I thought were some quite good themes / advice. In no particular order:


Notes on the Legibility War

There's a thesis that I've been mulling over for the last six months or so that I don't feel like I've got fully formed enough to properly write down, but equally have been thinking about too much to not write down, so this post is some partial notes on the subject. I don't know how much any of this will make sense, but it's at least a starting point.

The basic thesis is this: Underlying much of life and politics is the legibility war, a conflict over how we make people intelligible to each other.

A partial reading list for the Legibility War:

(You don't have to read these to understand this post, they're just some of my key influences here).

I'm also currently reading Discipline and Punish by Foucault, and I think it will probably be relevant to how this plays out. Certainly the notion of power I hold is already fairly Foucauldian.

The basic idea of legibility is that the act of making something comprehensible enough to control is itself an act that shapes the thing to be controlled, often with far greater consequences than the control itself. This is because it removes complexity that is deemed as irrelevant that makes it harder to control, and that complexity may be in some way essential to the health of the system.

Compare this with the idea of torque from "Sorting Things Out". Torque is the metaphorical force that acts between lives and classification systems, shaping each to the other - the boundaries of classification are negotiated, so that they work for at least most people, while people's lives are forced to fit into boxes when classification is linked to power. A minority may remain miscellaneous, and the majority may fit more or less well, but through the exercise of torque the lives and classification system are twisted to more or less meet each other. In the process of classification, people are made legible, so one way of looking at torque is as the legibilising force.

As I've discussed before, legibility is not just a thing inflicted by institutions on people, but by people on each other. We make other people explain themselves in terms we can understand, and we hire people who we can understand better. People are punished (even if only by making them work harder, but often more than that) when they fail to make sense to us, and rewarded when they make sense. But equally, the way we make sense of the world is shaped by the people we have to understand. In this way we all exert torque on each other.

Miranda Fricker proposed the notion of epistemic injustice, which is injustice done to someone in their capacity as an epistemic actor. That is to say, as someone who seeks to understand and know things about the world. The two types of epistemic injustice relevant to the legibility war are Fricker's notion of hermeneutical injustice and Dotson's notion of contributory injustice. These are both framed in terms of what Fricker calls hermeneutical resources - tools of interpretation, things that help you understand the world. e.g. concepts, words, stories you can use as analogies. Hermeneutical injustice is injustice which denies you the hermeneutical resources you need, contributory injustice is injustice in refusing to take on the hermeneutical resources you need to understand someone else's experience.

I'm less convinced than I used to be that "hermeneutical resource" is a good framing for this. It is possible to deny someone's ability to interpret the world without denying them hermeneutical resources (e.g. gaslighting), and it is possible to aid someone's ability to interpret the world without providing them with any (e.g. rubber ducking). I think a better way of looking at this is Bowker and Star's notion of an information infrastructure. Society is an infrastructure that each of us relies on to interpret the world, but access to and support from that infrastructure is not evenly distributed. Hermeneutical and contributory injustice can both be thought of as structural prejudices in the information infrastructure and how we interact with it.

The legibility war is thus the conflict we all participate in over this infrastructure, trying to shape society's collective interpretative faculties to work in our favour.


Why don't I understand combinatorics?

(This post is mostly my jotting down a half formed thought so that I don't get distracted by it now or forget it later).

A persistent mathematical failing of mine is that I'm not very good at combinatorics. This might seem like a minor issue (there are lots of areas of mathematics I'm not good at!) but there's increasingly often a sort of "combinatorial core" to problems I'm interested in, so it's started to be something of a problem, because it's not just that I'm not interested in being good at combinatorics (though my interest is almost entirely extrinsic rather than intrinsic). It's that I'm interested and have tried to be good at it (a bit) but I'm still not good at it.

An interesting question is why this is the case.

Thanks to The Fully General System For Learning To Do Hard Things there are only three reasons not to be good at something:

  1. I haven't done the work required.
  2. I haven't figured out the right way to decompose the problem so that I can do the work required.
  3. I have hit some blocker where it is literally impossible for me to get good at it.

I'm entirely sure it's not the third (it's almost never the third). It's definitely partially the first - I haven't done enough work, that's for sure - but I feel like I've made much better progress on other subjects with the amount of effort I've already invested in combinatorics. Some of that might be intrinsic. People laugh when I say I'm a mathematician who is bad at numbers, but a) That's because they don't realise that that's actually normal and b) I'm particularly bad at numbers for a mathematician. But I think it's worth considering that it might be the second. If nothing else, time spent thinking about how to decompose the problem is almost always productive for understanding the problem even if you don't succeed at decomposing it.

One of my favourite papers is Tim Gowers's Two Cultures of Mathematics. In it he suggests that the following two statements are useful for determining a mathematician's allegiance:

  1. The point of solving problems is to understand mathematics better.
  2. The point of understanding mathematics is to become better able to solve problems.

Which of these you agree with more divides mathematicians into roughly two groups: Theory builders and problem solvers. Every mathematician is a bit of both of course, but there's a big difference in where you put the emphasis.

He offers the following delightful definition in this paper:

I often use the word “combinatorics” not quite in its conventional sense, but as a general term to refer to problems that it is reasonable to attack more or less from first principles.

I'd like to suggest that if this is a reasonable definition of combinatorics then one of the reasons I struggle with combinatorics is the universal reason for struggling to decompose a subject: You were taught it badly.

Note that this doesn't necessarily mean badly in some absolute sense, only that the way it was taught was bad for you, though it often does mean that it's taught badly in an absolute sense.

One of the things that I am currently thinking is problematic about how combinatorics is taught is that it is poorly toolkit-ized.

What do I mean by that?

Well, a cognitive toolkit is... a general grab bag of techniques. A hammer you can use to hit a thorny problem with, a saw you can use to take it apart, that sort of thing.

In theory-builder mathematics the items in your toolkits are usually theorems and definitions. e.g. topology is an amazing selection of tools for tearing apart interesting problems. Fixed point theorems are another nice example - you want to show an object exists, you just construct a suitable function that you have a theorem about and show that the function necessarily has a fixed point and that a fixed point of the function has the desired property.

This theorems as tools feature is a fundamental feature of how most people do mathematics, and as a result of how mathematics is taught, but in combinatorics the tools are not theorems.

Example from Gowers's paper:

If a combinatorialist were to interrupt such a gathering and ask roughly how many subsets of \(\{1, 2, \ldots, n\}\) can be found such that the symmetric difference of any two of them has size at least \(\frac{n}{3}\), the response might very well be a little frosty. (This problem is very easy if and only if one knows the appropriate technique, which is to choose sets randomly and show that the chances of any given pair of them having a symmetric difference of size less than \(\frac{n}{3}\) are exponentially small. So the answer is \(e^{cn}\) for some constant \(c > 0\).)

There's not really a theorem here, there's just a general idea that it might be worth trying things at random. That general idea is much more powerful than any one theorem in combinatorics.
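To make the "try things at random" idea concrete, here is a small Python sketch of the technique Gowers describes. The particular values of `n`, `num_sets` and the seed are arbitrary assumptions for illustration, and this only demonstrates the phenomenon rather than proving the \(e^{cn}\) bound.

```python
import random
from itertools import combinations

n = 300          # size of the ground set {1, ..., n}
num_sets = 100   # how many random subsets to draw
rnd = random.Random(0)

# Represent each subset as an n-bit integer: bit i set means i is in the set.
# Drawing n uniform random bits picks each element independently with
# probability 1/2.
subsets = [rnd.getrandbits(n) for _ in range(num_sets)]


def symmetric_difference_size(a, b):
    # XOR leaves exactly the bits in which the two subsets differ.
    return bin(a ^ b).count("1")


smallest = min(
    symmetric_difference_size(a, b) for a, b in combinations(subsets, 2)
)

# The typical symmetric difference is around n / 2, and falling below n / 3
# is exponentially unlikely for any given pair.
assert smallest >= n / 3
```

Each pair's symmetric difference is a sum of `n` fair coin flips, so it concentrates around `n / 2`; that concentration is what lets you fit exponentially many sets in before any pair is likely to get too close.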

I've previously suggested that one of the more powerful learning loops is:

  1. I have a problem, how do I solve it?
  2. I have a solution, what problems can I apply it to?

In theory-builder mathematics a solution is usually the application of a particular theorem, so it makes sense to organise your learning around theorems, while in combinatorics theorems are more like an accidental byproduct of the problem solving process, and so organising your learning around them will make the whole subject seem weird and disconnected and difficult to learn.

On top of that, Gowers's definition of combinatorics as problems that are amenable to an elementary approach means that you don't necessarily need to learn a great deal of surrounding context and theory to tackle a problem. All you need is the definitions and a clever idea. So, instead of trying to learn about, say, combinatorics of finite sets, or permutations, or whatever, it might be worth organising your learning around these general principles: Rather than try to understand a particular class of combinatorial object, take a particular combinatorial trick and try applying it to a wide class of objects.

If you've already read the Gowers paper, you may now be thinking "Yes David, this is exactly what Gowers said" and in some sense this is true, but he was trying to justify the appeal of the subject. Somehow I had managed to miss the key insight that the natural organising principle of a subject might be a good thing to orient your learning around.

I would feel more embarrassed about this if it were not a failure mode shared by every combinatorics textbook I have ever read.


Further thoughts on Lashon Hara

Since my last post on Lashon Hara I've done a bit more reading, including an English translation of the Sefer Chofetz Chaim (which I don't particularly recommend), and "Jewish Laws of Speech: Towards a Multicultural Rhetoric" by Erika Falk, which I do.

There has also been an interesting test case of this in the Python community recently, which is njs's post about Kenneth Reitz. This is unambiguously Lashon Hara (well except in the sense that the concept doesn't apply because they're not as far as I know Jewish), and by repeating it and discussing it we're undoubtedly committing sins of spreading and believing Lashon Hara (same caveats). It turns out I'm OK with this.

The strong impression I got reading the Sefer Chofetz Chaim was that it was a set of norms that was really not very interested in minimizing harm to individuals, but was instead designed to promote community cohesion and conformance. I can see why that would be a useful thing in many contexts, but I don't think it is a set of structures I'd want to emulate. In particular, I don't think I could willingly accept the standards of evidence required, which require at least two first hand witnesses before an accusation is to be believed.

It's an interesting set of norms, and one that I think is worth thinking about when designing community norms, but ultimately I think I will continue to spread Lashon Hara and hope the fact that I'm only a bit Jewish will protect me from any divine retribution that may be forthcoming.

Another thing that was linked to as a result of the Lashon Hara discussions was the three vital questions:

  1. Does this need to be said?
  2. Does this need to be said by me?
  3. Does this need to be said by me now?

I like these much better as a framework to think about limiting your own speech, but will note that I might often find myself answering "No, but it would amuse me to do so, so I will".


Improving Binary Search by Guessing

The following code (lightly modified from Hypothesis code for clarity) is among the most useful things I have ever written:

def find_integer(f):
    """Finds a (hopefully large) integer n such that f(n) is True and f(n + 1)
    is False. Runs in O(log(n)).

    f(0) is assumed to be True and will not be checked. May not terminate unless
    f(n) is False for all sufficiently large n.
    """
    # We first do a linear scan over the small numbers and only start to do
    # anything intelligent if f(4) is true. This is because it's very hard to
    # win big when the result is small. If the result is 0 and we try 2 first
    # then we've done twice as much work as we needed to!
    for i in range(1, 5):
        if not f(i):
            return i - 1

    # We now know that f(4) is true. We want to find some number for which
    # f(n) is *not* true.
    # lo is the largest number for which we know that f(lo) is true.
    lo = 4

    # Exponential probe upwards until we find some value hi such that f(hi)
    # is not true. Subsequently we maintain the invariant that hi is the
    # smallest number for which we know that f(hi) is not true.
    hi = 5
    while f(hi):
        lo = hi
        hi *= 2

    # Now binary search until lo + 1 = hi. At that point we have f(lo) and not
    # f(lo + 1), as desired.
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid):
            lo = mid
        else:
            hi = mid
    return lo

This is loosely based on ideas from timsort.

The reason this code is so useful is that it lets you make many operations adaptive at zero cost: If you have an operation that it may be useful to do some large number of times, you can compose it and do only \(O(\log(n))\) checks to do \(O(n)\) work.

In particular the nice feature of this that justifies the term adaptive is that the amount of work it does is logarithmic in the size of the output. It either gives you a very large result or costs very little.
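To see the logarithmic cost in practice, here is a self-contained sketch that counts how often the test function is called. The threshold of 1000 and the `calls` counter are made up for this illustration, and the comments from the version above are trimmed for brevity.

```python
def find_integer(f):
    """Find n such that f(n) is True and f(n + 1) is False.

    f(0) is assumed to be True and is not checked.
    """
    # Linear scan over the small numbers first.
    for i in range(1, 5):
        if not f(i):
            return i - 1
    lo = 4
    hi = 5
    # Exponential probe upwards until f(hi) is False.
    while f(hi):
        lo = hi
        hi *= 2
    # Binary search between the known-True lo and known-False hi.
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid):
            lo = mid
        else:
            hi = mid
    return lo


calls = 0


def at_most_1000(n):
    global calls
    calls += 1
    return n <= 1000


result = find_integer(at_most_1000)
assert result == 1000
# A linear scan would need about a thousand calls; this needs a couple of dozen.
assert calls <= 25
```

The count splits into three parts: four calls for the linear scan, roughly \(\log_2(n)\) for the exponential probe, and roughly \(\log_2(n)\) again for the binary search.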

An example of how this gets used in test-case reduction is the following:

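def reduction_pass(ls, predicate):
    ls = list(ls)
    i = 0
    while i < len(ls):
        # Will delete a sequence of length n in O(log(n)). The bound on k stops
        # the search running past the end of the list, which could otherwise
        # never terminate if predicate(ls[:i]) holds.
        k = find_integer(
            lambda k: i + k <= len(ls) and predicate(ls[:i] + ls[i + k:])
        )
        del ls[i:i + k]
        i += 1
    return ls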

This allows for a test-case reduction pass that never does substantially more work than the naive greedy algorithm which tries deleting one item at a time, but can potentially do very large deletions with much less work than that when the opportunity arises.
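Here is a small worked run of this pass, as a sketch. The "interesting" predicate (the list contains 50) and the call counter are invented for the example, and the `i + k <= len(ls)` bound in the lambda is an addition made here so that the search always terminates.

```python
def find_integer(f):
    """Find n such that f(n) is True and f(n + 1) is False (f(0) assumed True)."""
    for i in range(1, 5):
        if not f(i):
            return i - 1
    lo = 4
    hi = 5
    while f(hi):
        lo = hi
        hi *= 2
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid):
            lo = mid
        else:
            hi = mid
    return lo


def reduction_pass(ls, predicate):
    ls = list(ls)
    i = 0
    while i < len(ls):
        # Try to delete k elements starting at i, for as large a k as we can find.
        k = find_integer(
            lambda k: i + k <= len(ls) and predicate(ls[:i] + ls[i + k:])
        )
        del ls[i:i + k]
        i += 1
    return ls


calls = 0


def contains_fifty(ls):
    # Hypothetical "still interesting" check: the test case must contain 50.
    global calls
    calls += 1
    return 50 in ls


reduced = reduction_pass(range(100), contains_fifty)
assert reduced == [50]
# Deleting one element at a time would need around 100 predicate calls.
assert calls < 50
```

The pass first deletes the 50 elements before the interesting one in a single adaptive step, then the 49 after it in another, which is where the saving over one-at-a-time deletion comes from.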

A neat variant I realised earlier is that you can use this function to define the following variant of binary search:

def binary_search_with_guess(f, lo, hi, guess=None):
    """Find n such that lo <= n < hi and f(lo) == f(n) != f(n + 1). It is
    assumed that f(hi) != f(lo) and will not be checked.

    ``guess`` is a prediction of the value of n and defaults to lo.
    This function runs in O(log(abs(guess - n))).
    """

    if guess is None:
        guess = lo

    assert lo <= guess < hi

    good = f(lo)

    if f(guess) == good:
        # Our guess was equivalent to lo, so we want to find some point after it.
        k = find_integer(lambda k: guess + k < hi and f(guess + k) == good)
        return guess + k
    else:
        # Our guess was equivalent to hi, so we want to find some point before it.
        k = find_integer(lambda k: guess - k >= lo and f(guess - k) != good)
        return guess - k - 1

This is a binary search with a twist, which is that you start with a guess as to what you expect the answer to be. The cost of the search is still logarithmic, but it's logarithmic with respect to how bad your guess is rather than how large the range of the search is. If you can rely on your guess being pretty good, this will sometimes let you do the binary search in \(O(1)\) instead of \(O(\log(n))\).

Naturally this can't improve on binary search (which is optimal) in the general case, so how useful this is depends entirely on how good your guess is. If your guess is maximally bad (i.e. you guess one end and the correct answer was the other), this can make up to twice as many calls to the test function as a classic binary search does, but as long as your guesses are on average pretty good it will tend to win out.
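The effect of guess quality is easy to measure. The sketch below is self-contained and counts calls to the test function for a good guess and for the default guess; the boundary at 700, the range of 1000, and the `make_counted` helper are all made up for this illustration.

```python
def find_integer(f):
    """Find n such that f(n) is True and f(n + 1) is False (f(0) assumed True)."""
    for i in range(1, 5):
        if not f(i):
            return i - 1
    lo = 4
    hi = 5
    while f(hi):
        lo = hi
        hi *= 2
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid):
            lo = mid
        else:
            hi = mid
    return lo


def binary_search_with_guess(f, lo, hi, guess=None):
    """Find n with lo <= n < hi and f(lo) == f(n) != f(n + 1)."""
    if guess is None:
        guess = lo
    assert lo <= guess < hi
    good = f(lo)
    if f(guess) == good:
        # The guess looks like lo, so search forwards from it.
        k = find_integer(lambda k: guess + k < hi and f(guess + k) == good)
        return guess + k
    else:
        # The guess looks like hi, so search backwards from it.
        k = find_integer(lambda k: guess - k >= lo and f(guess - k) != good)
        return guess - k - 1


def make_counted(boundary):
    # Hypothetical helper: a test function plus a mutable call counter.
    counter = {"calls": 0}

    def f(n):
        counter["calls"] += 1
        return n < boundary

    return f, counter


f_good, good_count = make_counted(700)
assert binary_search_with_guess(f_good, 0, 1000, guess=698) == 699

f_over, over_count = make_counted(700)
assert binary_search_with_guess(f_over, 0, 1000, guess=705) == 699

f_default, default_count = make_counted(700)
assert binary_search_with_guess(f_default, 0, 1000) == 699

# Guesses close to the answer need far fewer calls than guessing lo.
assert good_count["calls"] < default_count["calls"]
assert over_count["calls"] < default_count["calls"]
```

All three searches agree on the answer; only the number of calls changes, and it grows with the distance between the guess and the true boundary rather than with the size of the range.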

The reason this came up for me was that I'm working on a variant of Angluin's L* algorithm with the Rivest and Schapire modifications. Without going into details, you have a sequence of \(N\) elements. You know the first one is good, and the last one is bad, and you want to find a good element which is directly followed by a bad one, and do so using binary search. You then make a change to your model that makes that bad element good, but this can have an unknown knock-on effect on the other elements, so you need to repeat the procedure. The use of the guess here is this: Generally speaking, most elements are bad, and elements rarely go from good to bad as a result of our fix procedure (it absolutely can happen, it just usually doesn't). This means that a pretty reasonable guess is \(0\) on the first iteration, and the element we just made good on the subsequent ones (because the elements before it probably haven't changed from good to bad, and the element we just fixed is probably now good).

How much of an improvement this is would require more benchmarking than I've done (i.e. anything other than eyeballing the times), but anecdotally this guessing heuristic seems to be mostly very accurate, so in theory it should be a factor of five to ten performance improvement just based on the size of the examples (because it cuts out a \(\log(n)\) factor and the sequences are a few hundred items long).


TIL: Libguides

I happened to log into tumblr for the first time in ages and found this possibly lifechanging piece of information:

I’m sure it’s common knowledge that scholars and writers have academic specialties. The same is true for subject librarians! Most libraries use a tool called Libguides to amass and describe resources on a given topic, course, work, person, etc. (I use them for everything. All hail Libguides.) These resources can include: print and ebooks, databases, journals, full-text collections, films/video, leading scholars, data visualizations, recommended search terms, archival collections, digital collections, reliable web resources, oral histories, and professional organizations. ... An even better way to search for libguides?


Use the libguide community site and search by topic, institution, or even your friendly neighborhood librarian! (If you have a librarian or two who you trust to put you on the right path, you might be able to get that guidance even if you don’t have time to reach out directly!) If their site says “LibGuide” it’ll show up in THAT community somewhere!


Lashon Hara

This was a good Twitter thread:

People compliment me on my maturity in regards to how I engage with "twitter drama" and a lot of that is due to me refraining from Lashon Hara. So in this thread I'm gonna teach you about Lashon Hara and how, whether you're Jewish or not, you can (and should) refrain from it too. Lashon Hara (lit. "Evil Tongue") is to say negative things about someone, particularly about their misdeeds, which are completely true, but you're not trying to accomplish anything useful by saying it.

The following point from the twitter thread was particularly interesting:

Why is it forbidden by Jewish law to spread Lashon Hara? Because historically we lived in small close-knit communities, and in order to function and coexist and be healthy, we need to know that it's possible to atone for our misdeeds, be forgiven, and move on from it.

The wikipedia page on Lashon Hara is also good.

This makes me wonder whether there are other interesting social lessons to learn from Jewish culture. Wikipedia suggests the Chofetz Chaim might be a good source.


Short Reviews

I've read a whole bunch of good books recently that I've failed to review. Here are some very short reviews of them, pending possible longer ones later. This post is more by way of an IOU for longer reviews and a note to self.

A Slip of the Keyboard, by Terry Pratchett

A collection of nonfiction by Terry Pratchett. It is good, but probably only for Pratchett completionists. Main outcomes of my reading it are impulse buying a copy of Brewer's Encyclopedia of Phrase and Fable despite knowing full well that the last time I owned one I barely ever even opened it, and reminding me that I should have another go at Nation.

At some point I will have read everything Terry Pratchett ever wrote, and I do not look forward to the kick in the feels that day will bring, so I'm rather putting it off.

Teaching What You Don't Know by Therese Huston

This was good, but if you're going in looking for insights on a broader topic, you probably won't get them. This book is very specialised for its problem domain, and quite USA centric in many of the details. Still, this was what I needed (USA centricity aside), so I found it useful. It also contains a bunch of recommendations for other books that I'll be following up on.

The Communication Book: 44 Ideas for Better Conversations Every Day by Mikael Krogerus and Roman Tschäppeler

I don't remember why I own this book, but decided to finally sit down and read it cover to cover. It's fine. There are some neat ideas in there, and it's good for flicking through. I'll probably get some benefit out of it at some point and am going to keep it around, but nothing life changing.

Invitation to Personal Construct Psychology by Trevor Butt and Vivien Burr

This book is very good but also very expensive. I bought it second hand at a much lower price but all the cheap(ish) ones are gone and I'm not sure I can in good conscience recommend spending £44 on it.

Where Good Ideas Come From by Steven Johnson

Very good, strongly recommended. I agree with most of it, though I think he oversimplifies in places, and I'm suspicious of how well he really understands technology and/or evolutionary biology (some things where I feel like he's misrepresented the tech side to a degree larger than lies to children would permit, no specific misgivings about the biology side of things it's just a bit pat and that raises alarm bells).

That being said, the model of how idea generation works as an evolutionary process is spot on. It filled in some useful details for me and I think a lot of people would benefit from understanding this better.

The Current Affairs Mindset

A good book of left-wing political writing that isn't afraid of nuance or disagreeing with the consensus. I suspect a lot of people I know would read this and immediately decry some of the authors as horrible centrists, but I don't think that would be an accurate reading.


Book Review: Agnotology: The Making and Unmaking of Ignorance, edited by Robert N. Proctor and Londa Schiebinger

This was a good book and is going on my rereading shelf, but I would maybe only softly recommend it.

The book starts with a discussion of why they thought a new word was needed and the process by which they went about coining it, which automatically makes me like them more.

Agnotology is proposed as a sort of dual to epistemology. The claim (which I believe) is that ignorance is more helpfully viewed as a thing in its own right rather than a mere absence of knowledge. In particular, ignorance is something that can be produced actively in its own right (a subject previously discussed in my review of Trans Like Me).

I think maybe the main thing I got out of it is a lot more clarity of the process and history behind how science is (ab)used by corporations to manufacture uncertainty when it is in their interests - e.g. climate change denial and tobacco companies. It also prompted me to think a lot more about the intersection between power and how knowledge and ignorance are constructed, though I'm not sure that was necessarily a set of questions that were well posed by the book.

I had two major problems with the book:

The first is that it suffers the problems that many books which are collections of essays do, which is that by virtue of being a whole bunch of chapters written by different people it's both a bit lacking in coherence and incomplete. The chapters do not support each other well, and it often feels like there are gaps. For example there are lots of good empirical/historical chapters about ignorance being bad, and a number of chapters pointing out that ignorance isn't necessarily bad on theoretical grounds, but I felt like the theory on bad ignorance and empiricism on good ignorance were both quite thin on the ground in comparison.

The result is that it is a collection of very good essays rather than a very good collection of essays.

The second thing, which I think is more damning, is that it is incredibly North America, and in particular USA, centric. To the degree that it references Europe and Africa it is almost always as a precursor to the main story in North America (in fairness one chapter - on genetic engineering - is a bit more even handed). I increasingly do not trust theorising that comes out of the USA, because what inevitably happens is that it results in arguments from first principles and claims about the fundamental nature of humanity that describe patterns that you only really see in the United States because the USA is really fucking weird. If you want to know how ignorance is manufactured, I submit that looking at the way people build their understanding of the world and the nature of knowledge based entirely on examples centered around a single country would be an excellent topic of study, but instead this book is merely an example of that.

I also would have appreciated more coverage of the intersection of ignorance and privilege. The chapter on White Ignorance was pretty good for that, but I think there were a lot of things it did not cover, and it would have been interesting to see more about that. Maybe I just need to go read more Kristie Dotson instead though (this isn't exactly her area, but epistemic justice and social epistemology are pretty strongly adjacent).


Book Review: The Descent of Man by Grayson Perry

This book was one I ended up with as a result of asking the following question:

Suppose I wanted to read books written by men which are either about or in some way exemplify healthy models of masculinity, what would you recommend?

I don't feel it's an especially good answer to that question, as it's mostly yet more writing about the problems with traditional masculinity.

It's perfectly fine as an instance of that. It's short, well written, and I'm sure there are many men out there who would benefit from it, but I was very much a member of the choir being preached to, and I didn't really find it especially insightful or clarifying. I feel like everything useful I could have got out of it I'd already gotten better out of Brene Brown.

There were a few bits towards the end that I vaguely intend to take notes on, so I'm going to hang onto this book for a few weeks to see if I do, and then I'm probably going to dump it on the book exchange shelves at my tube station in the hope that someone else finds it who is more in need of it than I am.


Book Review: Illness (Art of Living) by Havi Carel

This was a good book, and is going on my rereadings shelf. I've already recommended it to several people before I finished it. I have maybe a few more caveats after having finished it, but I still think it was a helpful read.

The book is what happens when a professional philosopher discovers that she is terminally ill and decides to do philosophy to it. What results is an exploration of the phenomenology of being ill - the subjective and social experience of it. She does not deny that there is an objective physical component to illness of course, but explores how understanding illness on a purely objective, physical, level in many ways misses the point and that the subjective experience of the patient should be considered.

I'm not really sure what to do with the information from this book yet. I tried to summarise it in this review, and my summaries kept coming out as quite uncharitable, so I'm going to leave it at this: I liked the book and found it a useful perspective, but I'm not sure it's a perspective I'm able to or want to adopt. I disagree significantly with several of the arguments presented, but think they are probably still quite useful mental frameworks to adopt in some circumstances.


Book Review: "Daring Greatly" by Brene Brown

This was a pretty good book, and is going onto my rereadings shelf.

A friend described Brene Brown as "A bit too shiny and US American self-helpy" and that's not an unfair description, but I found the book fairly useful anyway.

The most useful aspect of this book for me was clarifying the distinction of shame vs guilt. Guilt is feeling that you have done something bad, shame is feeling that you are something bad. The former is often a productive emotion (within bounds), the latter almost never is, because thinking you are in some way intrinsically bad mostly causes you to lock down and blocks you from becoming better.

I also found the discussion of how shame plays out differently depending on gender helpful for reasons that I don't currently feel able to elaborate well.

There are a number of suggested interventions in this book. I've yet to try any of them and I'm not convinced that I'm going to, but I intend to revisit it and that may change when I do. Even without those interventions, I found it helpful for reshaping some of my perspectives and giving me better language to talk about it.


You're walking wrong

This started out life as an extended metaphor for how we treat people's support mechanisms in general, but I think it works pretty well on a literal reading too.

Note that I am essentially able-bodied (minor caveats only), but I did check this post with a cane user before publishing it.

A: Hey, you over there, the one walking! You're doing it wrong.

B: Excuse me?

A: You're using a cane! You shouldn't do that!

B: ...

A: You should be walking with your own two feet, like a normal person!

B: I will fall over if I do that.

A: Have you tried?

B: Yes. I fell over.

A: Maybe you should try harder?

B: That would cause me to fall over harder.

A: You just need more practice! The problem is, you're using the cane as a crutch.

B: Well, technically a cane and a crutch are different assistive devices. You see a crutch-

A: I mean you're using your cane to support your weight.

B: It's more about stability, but yes that's what it's for.

A: You should be using your legs to do that!

B: I literally can't. I have a torn ligament in my ankle. It will not hold my weight properly.

A: That's no excuse. You should fix that then!

B: I am on a waiting list for surgery, yes.

A: Well if you know how to fix it, why are you using that cane?

B: Because I haven't had the surgery yet and I need to walk now.

A: But what if your cane is made with conflict minerals?

B: Uh, is it?

A: I don't know, did you even check?

B: Not really, no. Who made your shoes?

A: (waves hand dismissively) oh who cares about that? Everyone wears shoes. You're wearing shoes. But your cane might be made with conflict minerals, and I don't have one of those.

B: In the grand scheme of things I don't think canes are a significant contributor to global conflict.

A: But it's all of our responsibility to care about these things! You should be more responsible in how you walk.

B: OK. If we're all responsible, would you like to pay for me to be able to have my surgery sooner?

A: No, it's your responsibility to do that too.

B: ...

A: Anyway, have you tried yoga?


A Boltzmann Agent with Very Bad Judgement

As per previous post, it can make sense when looking at a set of consistent propositions to consider agents as Boltzmann samplers over the set of valid consistent beliefs, with their reliability measured by the expected number of true beliefs.

A thing I hadn't previously realised is that this can cause an agent that is on average reliable to be reliably wrong for some propositions.

Consider a chain of propositions of the form \(P_1 \implies \ldots \implies P_n\). There are exactly \(n + 1\) possible consistent beliefs for this sampler (each defined by the first \(P_i\) that the agent believes), so the Boltzmann generating function is \(B(x) = 1 + \ldots + x^n\). Suppose \(n = 10\). Some simple maths (by which I mean I used sympy) shows that this agent ends up believing \(P_1\) with probability at least half only once \(x\) reaches about \(2\), which leads to the expected number of propositions believed being \(\approx 9\). So in order to achieve \(50\%\) reliability on the base proposition we have to achieve \(90\%\) overall reliability!
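
The numbers above can be rechecked without sympy. This is a minimal sketch using exact fractions; the function name `chain_stats` is made up for illustration, and it assumes (as in the text) that believing \(P_1\) means believing all \(n\) propositions.

```python
from fractions import Fraction


def chain_stats(n, x):
    """Return (P(P_1 is believed), expected number of propositions believed)."""
    x = Fraction(x)
    # Weight x**k for the belief state in which exactly k propositions are believed.
    weights = [x**k for k in range(n + 1)]
    total = sum(weights)  # this is B(x)
    p_first = weights[n] / total  # P_1 is believed only when all n are
    expected = sum(k * w for k, w in enumerate(weights)) / total
    return p_first, expected


p_first, expected = chain_stats(10, 2)
print(float(p_first), float(expected))
```

At \(x = 2\) the probability of believing \(P_1\) is just over a half, and the expected count is just over \(9\) out of \(10\).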

This isn't very surprising in some sense, but probably puts a bound on how good we can expect judgement aggregation to be in this case.


Boltzmann Samplers for Consistent Epistemic Actors

(This post will make little sense to you)

A question I have been wondering about in relation to judgement aggregation is how we should be measuring reliability on sets of connected propositions.

Suppose we have \(C = A \wedge B\), with the true state of the world being that \(A\) and \(B\) are both true. If we choose \(A\) and \(B\) independently with probability \(p\) then we conclude \(C\) with probability \(p^2\). This leads to the result that a majority can be right about each of \(A\) and \(B\) but wrong about \(C = A \wedge B\) if \(0.5 < p < 0.5^{0.5} \approx 0.71\).

But what if the problem here is that we have a bad measure of reliability? A judge with \(p > 0.5\) is reliable enough to get the right answer on each of \(A\) and \(B\), but as most of the time they get the wrong conclusion they're clearly not a very good judge. Certainly it would be nice to be able to assemble unreliable actors into reliable parts, but it's hard to do that if we don't know what reliability looks like.

Suppose we measured the reliability of a judge on a set of propositions with the expected fraction they got right. For independent propositions this is just \(p\) as above, but if we add \(C\) into the set then the reliability is \(r(p) = \frac{2p + p^2}{3}\). Some basic calculation (for which I used sympy) shows us that if \(p > 0.582\) then \(r(p) > 0.5\), and that when \(P(C) = 0.5\), \(r(p) \approx 0.64\). So under this notion of reliability we can still have that a "reliable" judge will get \(C\) wrong most of the time.
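
A quick sketch of that calculation in plain Python, in place of sympy. The names `reliability`, `p_half` and `p_c` are invented for this example; the closed forms come from solving \(p^2 + 2p = 1.5\) and \(p^2 = 0.5\) by hand.

```python
import math


def reliability(p):
    """Expected fraction of A, B and C = A and B judged correctly."""
    return (2 * p + p * p) / 3


# r(p) = 0.5 means p^2 + 2p - 1.5 = 0, whose positive root is sqrt(2.5) - 1.
p_half = math.sqrt(2.5) - 1

# P(C) = p^2 = 0.5 means p = sqrt(0.5).
p_c = math.sqrt(0.5)

print(p_half, reliability(p_half))
print(p_c, reliability(p_c))
```

The crossover point comes out at about \(0.581\), so any \(p\) above \(0.582\) is comfortably on the "reliable" side.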

What if the problem is now our generative model? Why are we picking \(A\) and \(B\) independently rather than considering \(C\)? Suppose we want to consider a maximally uninformative distribution of possible beliefs given a fixed reliability, what should we do?

Well, this is a Boltzmann sampler.

If size is number of true propositions believed, the Boltzmann generating function for this problem is \(B(x) = 1 + 2x + x^3\), so the expected size of a Boltzmann sampler is \(E(x) = x \frac{B'(x)}{B(x)} = \frac{2x + 3x^3}{1 + 2x + x^3}\). This has a reliability (that is, \(E(x) / 3\)) of \(0.5\) for \(x \approx 1.22\). At this value of \(x\) we have \(P(C) = \frac{x^3}{B(x)} \approx 0.35\). We have \(P(C) = 0.5\) for \(x \approx 1.62\), which corresponds to a reliability of \(\approx 0.63\). This is only a slightly lower reliability than in the model where we chose \(A\) and \(B\) independently.
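
For anyone wanting to reproduce these values, here is a rough sketch that finds both crossover points by bisection. It assumes reliability means expected size divided by the three propositions; the helper names (`boltzmann_b`, `reliability`, `p_c`, `bisect`) are made up for the example.

```python
def boltzmann_b(x):
    """Generating function: one state of size 0, two of size 1, one of size 3."""
    return 1 + 2 * x + x**3


def reliability(x):
    """Expected size x * B'(x) / B(x), as a fraction of the 3 propositions."""
    return x * (2 + 3 * x**2) / boltzmann_b(x) / 3


def p_c(x):
    """Probability of landing in the state where A, B and C are all believed."""
    return x**3 / boltzmann_b(x)


def bisect(f, lo, hi, tol=1e-12):
    """Root of an increasing function f with f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x_half = bisect(lambda x: reliability(x) - 0.5, 1, 2)
x_c = bisect(lambda x: p_c(x) - 0.5, 1, 2)

print(x_half, p_c(x_half))
print(x_c, reliability(x_c))
```

As a sanity check, \(P(C) = 0.5\) reduces to \(x^3 - 2x - 1 = 0\), which factors as \((x + 1)(x^2 - x - 1)\), so the second crossover is exactly the golden ratio.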

Now, suppose we take into account the logic structure of the propositions in our voting as follows: For each voter we define the distance of their beliefs to the chosen outcome, and we choose the consistent set of beliefs that minimizes the total distance to each voter's belief. (You could think of this as analogous to the Kemeny-Young rule) What outcome does this give us?
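
To make the rule concrete, here is a small sketch on a fixed set of voters. It assumes the distance between two belief sets is the Hamming distance (the number of propositions they disagree on), which is one natural choice but is an assumption of this example; the names `OUTCOMES`, `aggregate` and `majority` are likewise invented here.

```python
from itertools import product

# Belief states are (A, B, C) triples; only those with C == (A and B) are consistent.
OUTCOMES = [v for v in product([False, True], repeat=3) if v[2] == (v[0] and v[1])]


def hamming(u, v):
    """Number of propositions on which two belief states disagree."""
    return sum(a != b for a, b in zip(u, v))


def aggregate(voters):
    """Consistent outcome minimising total distance to the voters' beliefs."""
    return min(OUTCOMES, key=lambda o: sum(hamming(o, v) for v in voters))


def majority(voters):
    """Proposition-by-proposition majority vote, ignoring logical structure."""
    return tuple(2 * sum(v[i] for v in voters) > len(voters) for i in range(3))


voters = (
    [(True, True, True)] * 3
    + [(True, False, False)] * 2
    + [(False, True, False)] * 2
)

print(majority(voters))
print(aggregate(voters))
```

With these seven voters, only three believe \(C\), so the per-proposition majority accepts \(A\) and \(B\) while rejecting \(C\), which is not even a consistent set of beliefs. The distance-minimising rule picks the outcome where all three are true.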

Well, we can calculate the expected distance of each possible outcome. The possible outcomes are \(v_1 = \neg A \wedge \neg B \wedge \neg C, v_2 = A \wedge \neg B \wedge \neg C, v_3 = \neg A \wedge B \wedge \neg C, v_4 = A \wedge B \wedge C\).

Discounting the \(B(x)\) factor, we get expected scores proportional to:

We have \(S_4 \leq \max(S_1, S_2, S_3)\) for \(x \geq 1.29\), which corresponds to a reliability of \(\approx 0.525\). This is much lower than the reliability we required to get the majority of people believing in \(C\) under any model where we voted on the propositions independently. Because we knew the logical structure of the problem space, we could take advantage of that to combine the independent support for \(A\) and \(B\) into support for \(C\).

One of the debates in judgement aggregation is about how to do this without dividing the set of propositions up into premises and conclusions, and this does that pretty well. My suspicion is this sort of agreement maximisation will tend to produce relatively high reliability results from relatively low reliability inputs.


Book Review: The Body Keeps the Score by Bessel Van Der Kolk

This was an interesting book, and I intend to keep it for reference, but there was a lot I disliked about it. Given that plenty of people rave about how good it is, I'm mostly going to mention what I disliked, but bear in mind that I not only am glad that I read it but it's even staying on my shelves.

It's thought provoking and reasonably well written, and it helped put some personal stories on a number of aspects of trauma that I was not previously aware of. This was very helpful for fleshing out my intuitive sense of what trauma looks and feels like in other people, and I found that quite valuable.

Unfortunately, I have three fairly major complaints with the book, which sharply limited what I was able to get out of it beyond that. The first is about the content, the second is about the author, and the third is about the relationship between the two.

My complaint about the content is that this is not a book about therapy, this is a book of stories about people in therapy. Many of the stories in some way touch on an important feature of therapy but, and I cannot emphasise this enough, they do not illustrate a point of therapy. The types of therapy mentioned in this book are many and varied, and you will learn barely anything about any of them.

The second is that the author thinks he is very interesting and wants to make sure you know this. There is a type of didacticism where the primary thing you are trying to teach is that you are very clever and people should respect you, and this book is very much of that type. In particular he drops names so hard that the rate is almost names dropped per page, not per chapter.

The third is that he seems extremely credulous about the therapies he is writing about. The strongest statement of doubt he ever expresses is "This works really well but we have no idea why". If he is to be believed, we live in a world where there are a huge number of different therapies, all of which work wonders, and the only thing stopping us fixing everyone's problems is lack of resources. It seems plausible that this is at least somewhat true and that we are currently resource constrained and many more people should be in therapy than currently are, but it also seems plausible that actually many of the cures he thinks work magically well and should be widely distributed in fact don't work at all. And by "plausible" I mean "that's what the scientific literature seems to show".

All that being said, the book contains a number of interesting anecdotal observations about people in therapy, and I do intend to keep it around to refer back to, it's just that the process of referring back to it will be so that I can do follow up literature searches to find information from people who I like more and whose opinions I actually trust.


(Partial) Book Review: Focusing by Eugene T Gendlin

Let me tell you a thing that happened recently, as a series of events:

  1. Someone I casually knew from the internet told me that I should read a book to solve the problems with my life.
  2. I read the intro of the book and went "Wow this book is culty. I have been invited to join a literal cult."
  3. I read the rest of the book and went "Hmm yes the author is actually a very reasonable person and is making some good points."
  4. I went to a friend and said "I've just read a great book and I want you to read it, here let me buy you a copy."

Is this how it starts? Am I in a cult now?

I don't think I'm in a cult now.

This book is quite good. The introduction is terrible and goes full on woo "Your western science can't explain this!!!", to a degree where I almost put the book down and probably wouldn't have except that it's really short and TBH I really need something in this space, but it then settles down into being a slightly breathless self-helpy take that is well within the Overton window of normal body oriented psychotherapy, and is a fairly lucid presentation of what appears to be very good advice.

A lot of why I sent a copy to my friend Alex is because she actually studies this stuff and I wanted a second opinion on how much I should trust the book's advice - either she'd tell me it was complete rubbish, which if true would be useful information, or it would be highly relevant to her interests, so win-win either way. In conversation we discovered that Meg-John Barker's Staying with Feelings Zine actually recommends this approach, which also gave me a reasonable amount of faith that it's probably reasonable. Meg-John Barker is great.

Anyway, all of this came later. The process of reading the book was much more straightforward. It took me about three hours, and provoked a remarkably strong emotional reaction which is a bit too personal for me to talk about here. I'm still processing this, and I need to reread and study the book a bit more before I feel like I've fully understood it or "done it properly", but I think if I get nothing out of it other than the one partial insight I got from reading it, that's three hours and £7 well spent.

The book is basically an instruction manual for Focusing's Six Steps, which are a reflective technique for understanding your own emotions. The core feature of Focusing is that you stop trying to either ignore or analyze your emotions, and step over and let the other biocomputer that is an integral part of your existence, your body, have a chance to point out some useful things you've been ignoring.

At least as importantly as explaining the six steps, it also contains a remarkably detailed set of debugging instructions for when you struggle to follow the core instructions.

I need to reread the book, and I need to think more about this, but I will definitely be doing that. Until I've done that I'd say any recommendation I can make for it is distinctly qualified, but for now I weakly recommend reading this book, and expect that on further reflection the strength of that recommendation will increase.


Unkind / Unfair / Untrue

This is a classification system I use, mostly for humour: Ask if a statement is unkind, unfair, or untrue.

The optimal number is one or two. Really the optimal number is one, but if the joke is strong enough you can get away with two.

Zero doesn't really work, because kind, fair, and true statements are good and valuable and have important social and therapeutic roles, but they're rarely funny. At least not if you're as bitchy as I am (people somehow think I'm a nice person. I don't know where they get this idea).

Three doesn't work because then you're just telling lies, and lies aren't funny.

Honestly my favourite is unkind/fair/true, because you can just double down on it. There's no lie there, you're not even really misrepresenting anything, you're just highlighting its worst aspects.

An example from earlier:

In space, no one can hear you scream, but that doesn't really matter when Musk is involved because honestly it's implied.

Peak unkind/fair/true right there.


Book Review: Sorting Things Out: Classification and Its Consequences by Geoffrey C. Bowker and Susan Leigh Star

(This is a bit of a placeholder review because I've been putting off writing it).

This is a book about classification and its consequences, and in particular about the politics and decision making that goes on behind them. It makes the point that classification forms a complex set of interlocking infrastructure, and discusses how that plays out in a variety of real world settings.

I really liked this book, but I'm not sure how strongly to recommend it. It varies wildly between extremely readable and fairly obtuse in its prose. I don't know whether this reflects which bit was written by which author, or whether both authors have the style. Nevertheless, I'm keeping it on my "further reading" shelf and intend to return to it regularly.

A couple of interesting points:

This book definitely falls under the heading of books I think are secretly about combat epistemology.

I need to do some reading and comparative study before I have much more to say about it.


The Core Problem with Democracy

As far as things that are likely to trigger a massive crisis of faith go, flipping through a social epistemology textbook (Social Epistemology: Essential Readings) and reading a highly technical chapter with lots of numbers in it is probably not ranked very highly, but there we go.

The chapter in question was Group knowledge and group rationality: a judgement aggregation perspective, and it pointed out the following fairly fundamental problem.

Suppose you have a proposition \(A\), and people believe it independently with probability \(p\). Then for large populations, with overwhelmingly high probability the majority vote on whether it's true will be yes if \(p > 0.5\), no if \(p < 0.5\). How large the population has to be is mostly determined by how far \(p\) is from \(0.5\).
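
As a rough illustration of how quickly this kicks in, the sketch below computes the exact probability that a strict majority of \(n\) independent voters says yes. The function name `majority_yes_probability` and the specific population sizes are just choices for the example.

```python
from math import comb


def majority_yes_probability(n, p):
    """Probability that more than half of n independent voters say yes."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Odd population sizes, so that ties cannot happen.
for n in (11, 101, 501):
    print(n, majority_yes_probability(n, 0.6), majority_yes_probability(n, 0.4))
```

With \(p = 0.6\) the majority is already almost certain to say yes by a few hundred voters, and with \(p = 0.4\) it is almost certain to say no.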

Now suppose you have two true propositions \(A\) and \(B\), and people believe each with probability \(p\). As a result, they believe the proposition \(A \wedge B\) (\(A\) and \(B\)) with probability \(p^2\).

Now suppose we have \(0.5 < p < 0.7\). Then \(p > 0.5\), so the majority believe each of \(A\) and \(B\), but \(p^2 < 0.49 < 0.5\), so the majority disbelieve \(A \wedge B\). This means that if you put each of \(A\) and \(B\) to the vote separately you will conclude that \(A \wedge B\) is true, but if you put \(A \wedge B\) to the vote on its own you will conclude it is false.

The situation gets worse the more conjunctions you add. If you want to get the same answer for three propositions then you need \(p > 0.79\). Eventually, you need almost everyone to be entirely right about everything to get the correct outcome.
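
The threshold for \(n\) propositions comes from solving \(p^n = 0.5\), under the same independence assumption as above. A tiny sketch (the name `conjunction_threshold` is made up here):

```python
def conjunction_threshold(n):
    """Smallest p for which p**n reaches 0.5, i.e. the n-th root of a half."""
    return 0.5 ** (1 / n)


for n in (1, 2, 3, 5, 10, 100):
    print(n, conjunction_threshold(n))
```

This is where the \(0.79\) for three propositions comes from, and by ten propositions the required \(p\) is already above \(0.93\).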

This is only a model, and the underlying assumptions are certainly unrealistic, but I do think it points to a true problem: Individual people are much more likely to be correct about premises than conclusions. I was aware of this as a possibility before, but I hadn't previously appreciated that it wasn't just possible but overwhelmingly normal.

The problem is that almost everything we vote on is not a premise but a conclusion. Democracy has essentially two functions that are being conflated:

You are aggregating both values and judgement, and you are doing so with exactly the sort of conflation of things that we've just shown can't possibly be expected to work well.

What's the solution? I don't know. It's certainly not a dictatorship (which has its own worse problems) and markets have essentially the same issue. I need to think about this more.


The Lazy Collection Trick

The opening to a recent pull request:

Hey, psst, buddy, would you like to see some really weird code? Then boy do I have the PR for you.

A thing I've been working on right now is Hypothesis's memory consumption. This is as part of some work on driving Csmith with Hypothesis.

Unfortunately it turns out that the following facts are in tension:

The result was that Hypothesis ends up spending literal gigabytes on representing parse trees (not per parse tree, but it's not as far off that per parse tree as I'd like).

How do we make these two facts play well together?

Well, most of the parse trees are never used. Test case reduction is a fairly local process and only cares about small bits of it.

In the linked pull request, we are representing the test case as a list of blocks. A block is just an open interval in the underlying byte stream, with a bunch of metadata about it. Most of the time we only need one or two blocks. Sometimes we need all blocks. Sometimes we need none at all.

How do we solve this without being massively intrusive to calling code?

Easy! We implement it lazily. We store a compact representation of the endpoints of the block. From that and the underlying buffer we can build the Block object at any given index, and construct it on first access.

That's the basic idea of the lazy collection trick. There are lots of extra details involved in the implementing PR.
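
To give a flavour of the idea, here is a minimal sketch. It is not the code from the pull request: the class names `Block` and `LazyBlocks`, the choice of an `array` for the endpoints, and the convention that each block runs from the previous endpoint up to its own are all assumptions made for illustration.

```python
from array import array
from collections.abc import Sequence


class Block:
    """Hypothetical stand-in for the real Block: just an index and a span."""

    __slots__ = ("index", "start", "end")

    def __init__(self, index, start, end):
        self.index = index
        self.start = start
        self.end = end


class LazyBlocks(Sequence):
    """Looks like a list of Block objects but only builds them on first access."""

    def __init__(self, endpoints):
        # Compact storage: one unsigned int per block rather than one object.
        self._endpoints = array("I", endpoints)
        self._cache = {}

    def __len__(self):
        return len(self._endpoints)

    def __getitem__(self, i):
        if isinstance(i, slice):
            return [self[j] for j in range(*i.indices(len(self)))]
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError(i)
        try:
            return self._cache[i]
        except KeyError:
            pass
        # Each block starts where the previous one ended (assumed layout).
        start = self._endpoints[i - 1] if i > 0 else 0
        block = Block(i, start, self._endpoints[i])
        self._cache[i] = block
        return block


blocks = LazyBlocks([2, 5, 9])
second = blocks[1]
print(second.start, second.end, len(blocks._cache))
```

Because `Sequence` supplies iteration and membership tests on top of `__len__` and `__getitem__`, calling code can keep treating this as an ordinary list, which is what keeps the change unintrusive.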

The result is a huge memory saving for Hypothesis's access patterns and I'll probably be using this more inside Hypothesis in a few other places too.


Do numbers exist?

In What even is a number? I finished with:

There is a more foundational question, which is whether numbers "exist" in any meaningful sense. My answer is "Almost certainly not, but if so why would we care?"

I'm going to backtrack on that a bit. Obviously numbers exist.

The problem is that this is a bad question because there are no non-misleading answers to it. Both the statements "numbers exist" and "numbers do not exist" are more likely to cause you to believe false things than true ones, and which answer you prefer says more about what you mean by "exist" than it does about numbers.

Numbers don't exist, in that there is no platonic realm where you will find the authentic number 3. At least, there is no evidence that there is such a realm, there isn't even a suggestion of what such evidence would look like, and it doesn't seem to be especially useful to postulate that one exists, so there might as well not be. Insert your own atheism analogy here. Michelangelo's Stone: an Argument against Platonism in Mathematics is a good argument against the idea that one exists, basically by pointing out that if such a thing existed it would be too large to serve the role that its proponents want it to - it is basically the infinite library of Borges, with every random string written down in a book. All published works exist there, but finding them is a harder task than creating them would be.

Additionally, they don't exist because as I argued, numbers aren't really any one thing. There is not "the number 3", there is only "the value that corresponds to the number 3 in our current implementation of natural numbers." In What Numbers Could not Be Paul Benacerraf argues quite convincingly that it is a category error to equate the number 3 with any particular instantiation of it.

So numbers really don't exist in any literal physical sense. I promise.

And yet numbers obviously exist.


Well because all of the same arguments apply to Chess, and if you try telling someone that Chess doesn't exist they'll look at you like you've sprouted an extra head.

Imagine the conversation:

You: Chess does not exist.

Them: Um, yes it does. Look, here's a Chess board.

You: Sure, that's a Chess board, but that's not Chess, that's just the symbolic representation you use to play the game of Chess.

Them: Which doesn't exist.

You: That's correct, yes.

Them: We have literally played a game of Chess on this board. How can we play Chess if it doesn't exist?

You: It's easy, you just follow the rules of Chess.

Them: Which don't exist.

You: Oh, well, there are rules of Chess of course, but those are mere matters of human convention. They don't imply any sort of abstract platonic ideal of Chess that exists somewhere.

Them: Look, here is a printed out copy of the rules of Chess.

You: I see. And?

Them: That seems to suggest the rules of chess exist.

You: Why? Those aren't the rules of chess, they're just words on a page that describe how to play chess. The rules are a purely abstract concept that exist purely as a matter of human convention.

Them: This seems like a fairly spurious distinction.

You: I suppose it does rather.

Them: Shall we settle it over a game of Chess?

You: Sure, why not.

And yet these are exactly parallel to the arguments that say numbers don't exist. People are very attached to the idea that you can apply the word "exist" to abstract objects and, while I don't want to entirely defer to common usage of a word for its more specific meaning, I think it would require an extremely convincing argument to persuade me that they are wrong to do so. Countries are also purely abstract objects designed purely as a matter of human rules and convention, but if you told me that France did not exist I could only answer "It does, though".

So, any reasonable definition of the word "exist" should incorporate abstract concepts with rules and names defined purely by human convention. Numbers are such a thing, and as purely abstract concepts it is perfectly reasonable to say that they exist. They exist in the same way and for the same reasons that France and Chess exist: Because we say so.


What even is a number?

Context: A philosopher friend asked me this question. This is my attempt to answer.

Attention Conservation Notice: It may not be a very good attempt to answer.

Epistemic status: My philosophy of mathematics is slightly idiosyncratic, in the sense that most people appear to believe ridiculously wrong things so it is unusual to be correct about this. I'm aware that feeling this is a bad sign for one's reliability.

Content warning: Highly informal mathematics.

My glib answer to this question which is completely 100% true and is actually quite deep when unpacked is "Depends. What do you want it to be?".

This is an answer that took a few hundred years of very heated arguing and people defending their own aesthetic preferences about numbers to arrive at. You can see some of that history in what we call some of the different sorts of numbers: e.g. Negative, irrational, and imaginary numbers.

Each of these represents a bitter argument about whether a new type of number should be allowed, all of which were eventually won by the people on the side of the new numbers, but with a legacy of our distaste for them left in the name.

You can view this as a process of discovery, with people finding out new things that are also numbers. You would be factually incorrect in doing so, and misunderstanding the basic nature of mathematics, but you could.

The main problems with this viewpoint are that there isn't really a single thing that we mean by "number", and each of these types of number is better viewed as having been invented than discovered.

Anyway, before we talk about the fundamental nature of mathematics, let's talk about sheep.

Suppose you are a farmer and you want to keep track of your sheep. In particular you want to make sure that all of your sheep who left the enclosure in the morning return to it in the evening. Unfortunately, numbers haven't been invented yet. What do you do?

(This is not a true history and is only a parable. Also it is definitely not my own invention but I can't figure out who the original source for this story is. Plausibly it's based on a misremembering of Yan Tan Tethera. People keep telling me it's from Yudkowsky but that's definitely not where I got it from and Lucy Keer points out prior art that is also not where I got it from. At this point I think we might as well just label it folklore)

Well, there's a very easy trick! You keep a small bag. Every time a sheep leaves the enclosure in the morning, you put a pebble in the bag. Every time a sheep returns to the enclosure, you take a stone out of the bag.

There are a couple of scenarios here:

  1. If all the sheep are in the enclosure and the bag is empty then the same number of sheep have left as returned, yay.
  2. If all the sheep you see are in the enclosure and there are still stones in the bag, you're missing some sheep. For each stone remaining, there is a missing sheep out there somewhere.
  3. If the bag is empty and there are still sheep outside of the enclosure, something has gone a bit wrong. You seem to have picked up some extra sheep from somewhere.

To take a simple system and make it complicated, what we have done is create a correspondence between the set of stones in the bag and the set of sheep outside the enclosure, such that we can perform equivalent operations on each of these things. In this correspondence we have the following:

The thing that makes this correspondence work is that the set of stones and the set of sheep behave in the same way, and so we can set up parallel arguments based on the behaviour of these two different systems matching at every step of the way.

Our claim is that the sheep are all in the enclosure if and only if the bag is empty. How can we know that this is the case?

Let's represent a series of operations on one of these things (the bag or the sheep) by writing them down as follows: We write down a \(+\) every time something is added to the collection (the rocks in the bag, the sheep outside the enclosure), and a \(-\) every time something is removed. So for example if three sheep leave the enclosure and then two enter it (three rocks enter the bag, two are taken out of it), we would write that as \(+++--\).

These sequences obey the following rules:

  1. If the sequence starts with \(-\) then something has gone wrong - we've tried to take a rock out of an empty bag, or a sheep out of a world with no sheep in it.
  2. If the sequence ends with a \(+\) then the collection is not empty (we've just added something to it - we put a rock in the bag, a sheep left the enclosure).
  3. If there is a \(+-\) somewhere in the sequence (we put a rock in the bag then immediately took one out, a sheep left the enclosure and then a sheep returned) then the sequence of operations is equivalent to the same sequence with that \(+-\) removed (any two sheep and any two rocks are interchangeable for our purposes).

We can now determine whether the end state of any sequence of operations is empty solely from its representation in this notation by running the following procedure:

  1. If there are no operations, we are in the starting state of emptiness so the set is empty.
  2. If the last element is a \(+\) then by the above rules the set is not empty.
  3. If the first element is a \(-\) then we're in an error state. Otherwise the first element is a \(+\) and the last element is a \(-\), so the sequence contains at least one \(-\). Scan forward until we find the first \(-\). This must immediately follow a \(+\) (because it's the first \(-\), and the sequence starts with a \(+\)). Remove that \(+-\) pair from the sequence to get a shorter sequence, and run the procedure again on the new sequence.

In each step of this process one of two things happen:

  1. We determine the answer and stop.
  2. We replace the sequence with a strictly shorter sequence and try again.

This process of replacing the sequence with a strictly shorter one must eventually stop, because we can run it for no longer than it took us to build the sequence in the first place. This means we eventually end up either with an empty sequence (the set is empty) or with a sequence with only \(+\) operations in it (the set is not empty). Importantly, because these operations represent either sheep or stones, and because both sheep and stones obey the rules we laid out above, we know that the set of sheep in the outside world is empty if and only if the bag is empty: We can represent the current state of each by the same sequence of \(+\) and \(-\) operations, and we can determine solely from those operations whether the set is empty.
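
The procedure is short enough to write out directly. This is a sketch that follows the three steps in the order given, taking the sequence as a string of `+` and `-` characters; the function name `ends_empty` is invented for the example.

```python
def ends_empty(ops):
    """Return True if the collection is empty after the operations in ops."""
    while True:
        if not ops:
            return True  # step 1: no operations, so we are still empty
        if ops[-1] == "+":
            return False  # step 2: we just added something
        if ops[0] == "-":
            raise ValueError("removed something from an empty collection")
        # Step 3: the first '-' must sit right after a '+'; cancel the pair.
        i = ops.index("-")
        ops = ops[: i - 1] + ops[i + 1 :]


print(ends_empty("+++--"))
print(ends_empty("++--"))
```

Each pass through the loop either returns or shortens the string by two characters, which is the termination argument from the text.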

We can abstract this further.

Say something is a counter if you can do the following three things with it:

  1. Ask if it is currently empty.
  2. Increment it.
  3. If it is not currently empty, decrement it.

Say two counters are observationally equivalent if, after doing the same sequence of increment/decrement operations to each side, they are either both empty or both not-empty.

Now, counters must obey the following rules:

  1. After incrementing it, the counter is not empty.
  2. Incrementing it then decrementing it results in a counter that is observationally equivalent to the original counter (and in particular is empty if and only if it started empty).

The above argument provides the stronger conclusion: Any two empty counters are observationally equivalent. We can represent the sequences of operations with our \(+/-\) notation, and at the end we will have concluded that either both counters are empty or both counters are non-empty.

OK, what does this have to do with the original question about numbers?

A counter can be anything that implements those increment and decrement operations. You can ask "What is a counter?" and the answer isn't any one thing. It could be a flock of sheep, it could be a bag of stones. Both of those things are counters, because they allow you to implement the counter operations in a way that obeys the rules. The notion of a counter is not that of a single concrete class of object, it is an abstract description of the behaviour of a system providing certain operations satisfying certain rules.
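
In programming terms a counter is an interface rather than a class. The sketch below gives two deliberately different implementations and checks that they agree about emptiness after the same operations. All of the names here (`BagOfStones`, `SheepOutside`, `agree_after`) are invented for the example.

```python
class BagOfStones:
    """Counter backed by a list of interchangeable stones."""

    def __init__(self):
        self.stones = []

    def is_empty(self):
        return not self.stones

    def increment(self):
        self.stones.append("stone")

    def decrement(self):
        if self.is_empty():
            raise ValueError("cannot decrement an empty counter")
        self.stones.pop()


class SheepOutside:
    """Counter backed by the set of named sheep outside the enclosure."""

    def __init__(self):
        self.outside = set()
        self.next_name = 0

    def is_empty(self):
        return not self.outside

    def increment(self):
        self.outside.add(f"sheep-{self.next_name}")
        self.next_name += 1

    def decrement(self):
        if self.is_empty():
            raise ValueError("cannot decrement an empty counter")
        self.outside.pop()  # any sheep will do


def agree_after(ops, left, right):
    """Apply the same '+'/'-' operations to both, then compare emptiness."""
    for op in ops:
        for counter in (left, right):
            if op == "+":
                counter.increment()
            else:
                counter.decrement()
    return left.is_empty() == right.is_empty()


print(agree_after("+++--", BagOfStones(), SheepOutside()))
```

Nothing in `agree_after` knows or cares which implementation it has been handed, which is the sense in which the two are observationally equivalent.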

The nice thing about the counter rules is that to a certain degree we don't need to care what a counter is - the power of the behaviour is that we can know that whichever counter implementation we have, they will behave the same way. Because any two empty counters are observationally equivalent you can talk about the empty counter, but this is a polite fiction: There are an infinite variety of empty counters, we just don't care because they all behave the same way as far as counting is concerned.

This is also what a number is. A number is not any one thing, it is any one of any number of things that implement some operations (and there are different types of number depending on what operations you want to implement).

The numbers closest to the counters we've discussed are what are called the natural numbers: \(0, 1, 2, \ldots\). They don't include negative numbers like \(-3\) or fractional numbers like \(\frac{1}{2}\). We could also call them counting numbers, as these are the numbers which can be used to represent operations that correspond to counting things.

The operations you can perform on natural numbers are that you can take their successor (the next element), and you can test if two numbers are equal. Additionally, there is a special element called zero (or \(0\)). The rules they have to satisfy are the following:

  1. \(0\) is not the successor of any element.
  2. Every element has a successor element.
  3. If two elements are equal then their successors are equal (this is important to state because there may e.g. be irrelevant information in the representation. Think of how we represented counts with stones - two bags of stones represent the same count if they have the same number of stones, regardless of whether they're literally the same stones).
  4. If the successors of two elements are equal then those elements are also equal.
  5. Every number can be formed by writing down a series of successor operations from zero.

For example, one implementation of the natural numbers is our sequences of \(+\) operations for a counter. The empty sequence is our \(0\) element, taking the successor is adding a \(+\) to the end, two elements are equal if they are the same word.

In the same way that we reasoned about the behaviour of counters without having to know exactly what counter implementation we were using, we can reason about the behaviour of numbers without knowing how they work, and conclude properties that must hold for any implementations.

In particular, we can conclude from \(5\) that every element other than \(0\) is the successor of some other element: Suppose for an arbitrary number that we'll call \(n\), we write the successor of \(n\) as \(S(n)\). Then any number other than \(0\) can be written as \(S(S(\ldots S(0)\ldots))\) by rule \(5\), so is the successor of the bit inside the outermost brackets. By rule \(4\), every non-zero number has a unique element that it is the successor of. Call this its predecessor. Similarly if \(n\) is some arbitrary non-zero number then we write its predecessor as \(P(n)\).

This means that we can use any implementation of natural numbers to implement a counter as follows:

  1. A counter wraps a single number.
  2. It is empty if that number is \(0\).
  3. When incrementing the counter we replace the number with its successor.
  4. When decrementing the counter we replace the number with its predecessor (which is valid because the counter is non-empty if and only if it contains a non-zero number).
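As a minimal sketch of the construction above, here is one way it might look in Python. The names `PlusWord` and `NumberCounter` are made up for illustration, and the number implementation is the "words of \(+\)" one described earlier; any other implementation providing a successor, a predecessor, and equality would do just as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlusWord:
    """A natural number written as a word of '+' characters.

    Equality is equality of words, which the dataclass provides for us.
    """

    word: str = ""

    def successor(self):
        return PlusWord(self.word + "+")

    def predecessor(self):
        if self.word == "":
            raise ValueError("zero is not the successor of any element")
        return PlusWord(self.word[:-1])


class NumberCounter:
    """A counter that wraps a single number."""

    def __init__(self, zero):
        self.zero = zero
        self.number = zero

    def is_empty(self):
        return self.number == self.zero

    def increment(self):
        self.number = self.number.successor()

    def decrement(self):
        # Only valid on a non-empty counter, i.e. a non-zero number.
        if self.is_empty():
            raise ValueError("cannot decrement an empty counter")
        self.number = self.number.predecessor()


counter = NumberCounter(PlusWord())
for _ in range(3):
    counter.increment()
size = counter.number  # the word "+++"
```

Note that `NumberCounter` never looks inside the number it wraps, it only uses the operations, which is the whole point.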

Two of these counters are observationally equivalent if and only if they wrap equal numbers, because for any such counter you can produce a unique sequence of decrements that makes it empty (this isn't totally obvious, but the details are fiddly and unenlightening without getting much more formal than I want to).

This means that we can use any implementation of numbers to define the notion of the size of any given counter: It is the unique (up to equality) number that is observationally equivalent to the current counter.

The careful reader will notice a sleight of hand there: I haven't proven that there is always such a number, only that if there is one then it is unique. This is because there is not always such a number!

The following is a perfectly valid implementation of a counter:

  1. The counter is never empty.
  2. Incrementing the counter does nothing.
  3. Decrementing the counter does nothing.

Such a counter cannot be represented by a number, because for any number based counter implemented as above you can eventually run a sequence of decrement operations to make it empty. You could think of such a thing as representing an infinite collection, but note that this doesn't require any foundational assumptions about whether infinity is in some sense real or realisable, it's just a straightforward implementation of a set of rules.
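For concreteness, a sketch of that counter in Python. The class name and the `empties_within` helper are hypothetical, and of course a finite check like this can only illustrate the behaviour, not prove it.

```python
class InfiniteCounter:
    """A counter that is never empty and ignores every operation."""

    def is_empty(self):
        return False

    def increment(self):
        pass

    def decrement(self):
        pass


def empties_within(counter, max_decrements):
    """Return True if the counter becomes empty within max_decrements decrements."""
    for _ in range(max_decrements):
        if counter.is_empty():
            return True
        counter.decrement()
    return counter.is_empty()


never_empties = not empties_within(InfiniteCounter(), 1000)
```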

In fact, every counter as defined is equivalent to either this infinite counter or some number, but I'll leave proving that as an exercise for the interested reader.

OK, after all of that, what actually is a number?

Depends. What do you want it to be?

The problem with asking this question about abstract mathematical entities like numbers is that they are not any one thing. A number is not its representation, any more than a counter is a bag of rocks or a flock of sheep. A number is an entity that comes from some implementation of the operations that we have decided to consider distinctively numeric.

An interesting feature of something like the natural numbers is that we can choose the rules to uniquely constrain the behaviour of the implementation (with some caveats about formal systems that I won't go into here), but there are many other classes of mathematical objects that are also interesting where this is not the case.

There is a more foundational question, which is whether numbers "exist" in any meaningful sense. My answer is "Almost certainly not, but if so why would we care?". It may be that there is some pure platonic realm of numbers where the true numbers exist, but we don't need it to exist to do mathematics, and it's unclear what its existence would imply about how we do mathematics anyway - it's not like this view of mathematical objects as reasoning about abstract systems obeying rules stops working in that event, the platonic realm just becomes another system that we can model this way.


Caching interactions with an arbitrary interface

This is a trick I've figured out recently for Hypothesis. I've yet to decide whether it's something I want to use, but it's a neat trick. It's much more easily implemented in a dynamically typed language, though I expect you could make something like it work in a statically typed language relatively easily with macros and metaprogramming and suchlike.

Suppose you have some interface that has the following two properties:

  1. All of the arguments to its methods are immutable.
  2. All of the return values of its methods are both immutable and hashable (for the sake of simplicity I will assume that none of its methods throw exceptions, but this can easily be made to work if they do).

Now suppose you have a deterministic function that takes any implementation of this interface and returns an immutable value. You can cache that function, which may be a big win if it is very slow (as it may be if it's a complicated test function as in Hypothesis).

You do not need to know anything about the implementation of the interface in order to do this. The cache will work by returning the value of any prior implementation of the interface that it is called with that is observationally equivalent to the current implementation.

How could this be?

Because the function is deterministic, what it does next is determined entirely by what has happened so far, which is in turn determined entirely by the sequence of return values from the object it is called with. Thus every result of calling the function corresponds to a unique sequence of return values. Additionally, none of these sequences can be a prefix of another, because that would be non-deterministic behaviour - one time the function ran and called another method after this point, another time it didn't.

This structure makes it very easy to store the results in a tree. Each node either records which method was called and its arguments, or that no more methods were called and is a leaf storing the return value, or stores a sentinel value to record that we don't know what happens here and need to ask the underlying test function (this is helpful for the representation):

import attr
from collections import defaultdict

@attr.s
class TreeNode(object):
    """Node wrapping a previous observation. Having a mutable
    wrapper class around the value makes the implementation a lot
    easier."""
    # None means "unknown"; otherwise a Decision or a Result.
    observation = attr.ib(default=None)

@attr.s
class Result(object):
    """A previously observed return value."""
    value = attr.ib()

@attr.s
class Decision(object):
    """A previously observed call to the underlying implementation."""
    method = attr.ib()
    args = attr.ib()
    # Maps each observed return value of the call to the node that follows it.
    children = attr.ib(default=attr.Factory(lambda: defaultdict(TreeNode)))

To populate the tree you use the following wrapper around the implementation which proxies method calls to the underlying implementation and puts the observations in the tree:

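class WrapperImplementation(object):
    def __init__(self, tree, underlying):
        self.__tree = tree
        self.__underlying = underlying

    def __getattr__(self, name):
        base = getattr(self.__underlying, name)
        if not callable(base):
            return base

        def accept(*args):
            # Record the call the first time we reach this node, then
            # move to the child selected by the observed return value.
            if self.__tree.observation is None:
                self.__tree.observation = Decision(name, args)
            rv = base(*args)
            self.__tree = self.__tree.observation.children[rv]
            return rv

        return accept

    def record_result(self, value):
        """Mark the node we have reached as a leaf storing value."""
        self.__tree.observation = Result(value)

def call_recorded(fn, tree, implementation):
    """Call fn(implementation) and update the tree to reflect the
    calls it made and the value it returned."""
    wrapper = WrapperImplementation(tree, implementation)
    value = fn(wrapper)
    wrapper.record_result(value)
    return value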

We can then define a cached version of the test function as follows:

class UnknownResult(Exception):
    pass

def simulated_function(tree, implementation):
    """Either returns a previous result we've saved in the tree from
    calling this implementation, or raise UnknownResult."""

    while True:
        previous = tree.observation

        if previous is None:
            raise UnknownResult()

        if isinstance(previous, Result):
            return previous.value

        # Replay the recorded call and follow the branch for its return value.
        rv = getattr(implementation, previous.method)(*previous.args)
        tree = previous.children[rv]

In order to get a cached outcome of an arbitrary implementation it's a little complicated and I can't be bothered to sketch out the code: If you can reset the implementation to empty it's easy, you just call it with the simulated function, reset it, and then call it recorded with the real function. If you can't reset it then you can define a new wrapper implementation that replays the prefix that was observed in simulated_function without calling the underlying implementation, then starts calling the real implementation once you're in unknown territory. You can implement this in terms of a resettable wrapper around the implementation.
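For the easy (resettable) case, a self-contained sketch might look like the following. To keep it runnable with only the standard library it uses `dataclasses` rather than `attr`, and several names are hypothetical: `Recorder` plays the role of the wrapper, `cached_call` takes a zero-argument factory `make_implementation` as a stand-in for "resetting", and `FixedBits` and `count_ones` are a toy interface and a toy test function invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TreeNode:
    observation: Any = None  # None (unknown), a Decision, or a Result


@dataclass
class Result:
    value: Any


@dataclass
class Decision:
    method: str
    args: tuple
    children: dict = field(default_factory=lambda: defaultdict(TreeNode))


class UnknownResult(Exception):
    pass


class Recorder:
    """Proxy that forwards method calls and records them in the tree."""

    def __init__(self, tree, underlying):
        self.node = tree
        self.underlying = underlying

    def __getattr__(self, name):
        base = getattr(self.underlying, name)
        if not callable(base):
            return base

        def accept(*args):
            if self.node.observation is None:
                self.node.observation = Decision(name, args)
            rv = base(*args)
            self.node = self.node.observation.children[rv]
            return rv

        return accept


def call_recorded(fn, tree, implementation):
    recorder = Recorder(tree, implementation)
    value = fn(recorder)
    recorder.node.observation = Result(value)  # leaf: no more calls were made
    return value


def simulated_function(tree, implementation):
    while True:
        previous = tree.observation
        if previous is None:
            raise UnknownResult()
        if isinstance(previous, Result):
            return previous.value
        rv = getattr(implementation, previous.method)(*previous.args)
        tree = previous.children[rv]


def cached_call(fn, tree, make_implementation):
    """Try the cache first; fall back to a recorded real call on a fresh copy."""
    try:
        return simulated_function(tree, make_implementation())
    except UnknownResult:
        return call_recorded(fn, tree, make_implementation())


class FixedBits:
    """Toy implementation of a one-method interface (illustration only)."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.index = 0

    def draw_bit(self):
        bit = self.bits[self.index]
        self.index += 1
        return bit


invocations = []  # one entry per real call to the "slow" function


def count_ones(source):
    """Deterministic function: counts the bits drawn before the first 0."""
    invocations.append(1)
    total = 0
    while source.draw_bit():
        total += 1
    return total


tree = TreeNode()
first = cached_call(count_ones, tree, lambda: FixedBits([1, 1, 0, 1]))
second = cached_call(count_ones, tree, lambda: FixedBits([1, 1, 0, 0, 0]))
```

The two `FixedBits` objects differ, but `count_ones` cannot tell them apart because it stops reading at the first zero, so the second call is answered from the tree without invoking `count_ones` again.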

How useful is this technique? Unsure. It's pretty niche. I think it might help a lot in Hypothesis though, where we're currently having some significant scalability issues for large and hard to reduce examples, and having something like this will allow us to avoid keeping them around because of how easy it is to recreate the data without reinvoking the test function.


Talking to your imaginary friends as a writing prompt

This is an idea that has been bumping around in my head for a while and I've never quite got around to trying, but I still think it's a good idea.

A common problem is this: I want to write, but I don't know what to write about.

A possible solution to this is as follows: Make up a fictional character, and have a conversation with them about something.

Even if you don't do anything with this, it's still useful writing practice, but I suspect it will also result in things that might be usefully edited into a long form piece.

Possible prompts within that:

  1. Explain some problem you're having to the fictional person.
  2. The fictional person doesn't understand something and wants your advice on it.


Book Review: Marriage and Morals, by Bertrand Russell

This book is an interesting artifact. It contains many progressive ideas and conclusions that even today would be considered "a bit too modern" by many people. Along the way, it also contains a great deal of casual racism, more than a bit of casual sexism (though you get the feeling that most of the sexism he feels is the result of a culture of depriving women of adequate education, so I'll give him a partial pass on this), and outright advocacy of eugenics.

This makes more sense as a combination when you realise that the book is from 1929, and it's aged as badly as progressive politics usually do once the world has overtaken them. In general I got the sense that he was an intelligent and decent person, and that most of his more objectionable beliefs were some mix of factually wrong and failing to think through the consequences, but it still wasn't a wholly comfortable read.

There was also rather a lot of gender essentialism and some frankly weird assumptions about how people behave, but given the above complaining about those feels like quibbling.

Broadly, his main conclusions were:

  1. People are way too uptight about sex and now that we have contraception it's actively unhealthy for this to be the case.
  2. Marriage should be more explicitly an institution for the raising of children, and as a result there should be no expectation of faithfulness.
  3. A bunch of eugenics stuff that I don't feel inclined to summarise.

The first seems basically accurate, the second... has problems but isn't an unreasonable stance (there are many happily childfree marriages and I think they seem to work very well for the people involved. I don't think in principle what he is describing is bad, but I think the transition to it is fraught, particularly when LGBT rights issues come into play). I'm not touching the third.

The most interesting part of the book for me is the central thrust of his argument, which is basically that having moral codes that you know aren't going to be obeyed is actively damaging, especially for children. By teaching people that sex is shameful, given that they're going to have sex anyway, you create a culture of guilt and shame, and you teach people from very early on that deceit is necessary and that they can't trust their parents. This seems like a sound argument, and I think is a useful general principle for evaluating moral precepts on.

I am glad to have read it but don't feel like I could actively recommend it to others. Either way, this copy is not mine and will be returned to the friends I have borrowed it from.


Easy parallel test-case reduction

As part of my current work in Hypothesis I'm redesigning the reducer to be very fine grained. Rather than having a coarse reduction pass that may run many operations, each shrink pass is explicitly written as a function that generates a set of small-step reduction steps.

Each of these steps can be run on any test case, and is designed to run only a very small number of operations (typically they are adaptive passes that run \(O(1)\) test invocations in the event that they make no progress or \(O(\log(n))\) if they make progress).

An example of this would be that instead of a greedy pass that tries deleting each of \(N\) elements, it generates \(N\) steps where step \(i\) tries deleting the element at position \(i\) (actually a contiguous region around that element, which is where the adaptiveness comes in).

The idea is that logically the original coarse grained shrink pass is equivalent to generating all of the steps then running all of them in turn.

The reason for doing it this way is twofold:

  1. It makes it easy to run operations for all reduction passes in a random order, which is often useful.
  2. It lets you interleave the operations of two different reduction passes - this is particularly useful because often reduction passes get "stuck", and having other passes running at the same time as them means that you make progress anyway (which reduces the amount of work that the stuck pass has to do, so even if you run it to completion it takes less time than it would have if you'd done it all at once).

However, the thing I noticed earlier is that it also makes it very easy to run the operations in parallel!

The way this works is very simple:

  1. Maintain a single globally best known result.
  2. In some arbitrary order run each of the shrink steps with as much parallelism as you like.
  3. Each step takes the globally best known result at the beginning of its run, performs some test cases, and possibly finds a better test case.
  4. If a step does find a better test case, it attempts to atomically update the globally best test case as follows:
    1. If it hasn't changed since the start of its run, it just updates it, no problem.
    2. If it has changed since the start of its run but the value it currently has is worse than the one found, just update it.
    3. If it has changed since the start of its run and the new best value is better than the one found, restart the current shrink step (which has proven likely worth running) on the new global best.
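The steps above can be sketched roughly as follows. This is an illustration under assumptions rather than what Hypothesis does: test cases are lists, "better" means shorter and then lexicographically smaller, each step tries deleting a single element, and the names `GlobalBest`, `delete_at`, `run_step` and `reduce_in_parallel` are made up. Cases 1 and 2 of the update rule collapse into a single "install the candidate if it beats whatever is there now" check under the lock, and case 3 is the restart loop.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def key(test_case):
    """Smaller is better: shorter first, then lexicographically smaller."""
    return (len(test_case), test_case)


class GlobalBest:
    """The single globally best known result, updated atomically."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._value = initial

    def get(self):
        with self._lock:
            return self._value

    def offer(self, candidate):
        """Install candidate if it beats the current best.

        Returns False if something at least as good is already there,
        in which case the caller should restart on the new best.
        """
        with self._lock:
            if key(candidate) < key(self._value):
                self._value = candidate
                return True
            return False


def delete_at(i):
    """A small reduction step: try deleting the element at index i."""

    def step(test_case, is_interesting):
        if i >= len(test_case):
            return None
        candidate = test_case[:i] + test_case[i + 1:]
        return candidate if is_interesting(candidate) else None

    return step


def run_step(step, best, is_interesting):
    while True:
        start = best.get()
        candidate = step(start, is_interesting)
        if candidate is None or best.offer(candidate):
            return
        # Someone else installed something better while we were running:
        # this step has proven worth running, so retry it on the new best.


def reduce_in_parallel(initial, is_interesting, max_workers=4):
    best = GlobalBest(initial)
    while True:  # iterate to a fixed point
        before = best.get()
        steps = [delete_at(i) for i in range(len(before))]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            list(pool.map(lambda s: run_step(s, best, is_interesting), steps))
        if best.get() == before:
            return before


result = reduce_in_parallel([1, 7, 2, 9, 3], lambda xs: 7 in xs and 9 in xs)
```

Threads are used here only to keep the sketch short. For a CPU-bound test function in CPython you would presumably want processes instead, but the update logic would stay the same.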

In the case where the test case is already fully reduced, this is embarrassingly parallel. In the case where everything makes progress, this is quite heavily contended, but also everything is making progress which tends to be the fast path for a test case reducer. It's probably worth scaling back the concurrency in this case.

My suspicion is that in most cases this will run quite close to the embarrassingly parallel version - prior art on parallelising test case reducers (e.g. C-Reduce, halfempty) has based its concurrency on an assumption that test cases all fail.

One thing that is worth noting is that this parallel algorithm can cause some "dropped" work in the following sense: A and B run in parallel and both succeed, but A completes before B and B overwrites its work. If A ran again it would make further progress. Fortunately, this is actually fine, because in test case reduction we will iterate to a fixed point, so anything we drop will get an opportunity to run the next time: this kind of dropped work only occurs when we're making progress, and whenever we succeed at making progress we know we'll get another go.


Updating Random Samplers

I often struggle to refind the references for this, so this is mostly a note to self for future reference.

If you want to randomly sample from \(N\) items with probability of picking \(i\) proportional to \(w_i > 0\) you can use the Alias method for this and it works very well.

Suppose now you want to be able to update the weights. It turns out this is possible in morally-\(O(1)\) time (one of the algorithms linked is \(O(\log^*(n))\), where \(\log^*(n)\) is the smallest \(k\) such that applying \(\log\) to \(n\) \(k\) times gives a value of at most \(1\). This isn't quite Inverse Ackermann levels of morally-\(O(1)\) but it's pretty close).

The key observation that makes the above two papers work is that if you can guarantee that the ratio of the largest to smallest weight is bounded then the naive method of sampling by picking uniformly at random and accepting with probability \(\frac{w_i}{\max w_j}\) runs in \(O(1)\) time. When you can't guarantee this, you can instead bucket together all values with similar weights, then pick a bucket with probability proportional to its total weight using some other method, then sample from within that bucket.
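A rough sketch of the bucketing idea, to jog my memory of how the pieces fit together. It is not the algorithm from either paper: the class name `UpdatableSampler` is made up, buckets are powers of two, and the "some other method" for picking a bucket is just a linear scan via `random.choices`, which is only reasonable because the number of non-empty buckets is logarithmic in the ratio of largest to smallest weight. The running bucket totals are floats, so they will drift slightly under many updates.

```python
import math
import random
from collections import Counter


class UpdatableSampler:
    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.weights = {}   # item -> weight
        self.buckets = {}   # exponent e -> items with weight in [2**(e-1), 2**e)
        self.position = {}  # item -> index within its bucket
        self.totals = {}    # exponent e -> total weight of that bucket

    def remove(self, item):
        weight = self.weights.pop(item)
        e = math.frexp(weight)[1]
        bucket = self.buckets[e]
        i = self.position.pop(item)
        last = bucket.pop()
        if i < len(bucket):  # fill the hole with the former last element
            bucket[i] = last
            self.position[last] = i
        if bucket:
            self.totals[e] -= weight
        else:
            del self.buckets[e]
            del self.totals[e]

    def set_weight(self, item, weight):
        assert weight > 0
        if item in self.weights:
            self.remove(item)
        e = math.frexp(weight)[1]
        bucket = self.buckets.setdefault(e, [])
        self.position[item] = len(bucket)
        bucket.append(item)
        self.weights[item] = weight
        self.totals[e] = self.totals.get(e, 0.0) + weight

    def sample(self):
        # Pick a bucket with probability proportional to its total weight.
        exponents = list(self.totals)
        e = self.rng.choices(
            exponents, weights=[self.totals[x] for x in exponents]
        )[0]
        # Rejection sample within the bucket. Every weight in it is at
        # least half the bound, so each attempt succeeds with
        # probability at least 1/2.
        bucket = self.buckets[e]
        bound = math.ldexp(1.0, e)  # 2**e
        while True:
            item = bucket[self.rng.randrange(len(bucket))]
            if self.rng.random() * bound < self.weights[item]:
                return item


sampler = UpdatableSampler(random.Random(0))
sampler.set_weight("a", 1.0)
sampler.set_weight("b", 3.0)
sampler.set_weight("b", 1.0)  # update an existing weight
sampler.set_weight("c", 6.0)
counts = Counter(sampler.sample() for _ in range(20000))
```

With the final weights of 1, 1 and 6, roughly three quarters of the samples should be `"c"`.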


TV Show Review: A Series of Unfortunate Events

I nearly stopped watching this show quite early on, due to violently felt distress. My reaction appears to be idiosyncratic, and after persevering for a bit I mostly got past that. I still found the show quite unsettling - something about being surrounded by people who have power over you, are obviously wrong, and will not listen to anything you have to say on the subject is very stressful to me.

It's probably less stressful watching it as a child, as it's probably a fairly accurate rendition of their reality. It's a very dark show, but as the Chesterton line goes:

Fairy tales do not give the child his first idea of bogey. What fairy tales give the child is his first clear idea of the possible defeat of bogey. The baby has known the dragon intimately ever since he had an imagination. What the fairy tale provides for him is a St. George to kill the dragon.

In contrast, watching as an adult, you know that when St George valiantly fights dragons he is likely to die a violent fiery death.

Distress aside, it's an odd show. I think I liked it more for how peculiar it was than any real merit on its part. It felt like it had a lot of underexplored characters, which I think is partly because of how it was adapted from the books - one book per two episodes is not a lot of screen time to get to know people in.


Reducing the Reduction Pass Ordering Problem

Most test-case reducers are composed of many different passes, each performing a different class of operation. For example, you might have a pass that reorders values and a pass that deletes them.

The reduction pass ordering problem is that what order you run these passes in can have a huge impact on performance, and a moderate impact on the quality of end result.

I'm starting to be of the opinion however that this is an illusion, caused by the fact that our reduction passes are too large. If you have a reduction pass that, say, tries deleting each element, then when you start from \(N\) elements you could instead just have \(N\) reduction passes that try deleting a single element.

Once you've made your reduction passes this small, running any given pass is very cheap, and it seems like then most orderings do a pretty good job. For example, the following seems to work pretty well as the core loop of a reducer:

  1. Generate the set of all reduction passes that could have an effect on the current test case.
  2. Run them in a random order.

This works particularly well if your passes are adaptive, so that they make \(O(1)\) SUT calls when they fail to do anything and may make up to \(O(\log(k))\) calls to achieve progress equivalent to running \(k\) appropriate choices of pass from the same category (e.g. deleting an interval around the element).
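As an illustration of the loop and of what "adaptive" means here, a minimal sketch. The assumptions are that test cases are lists, the only category of pass is deletion, and the adaptiveness is a simple doubling of the deleted run; the names `adaptive_delete` and `reduce` are hypothetical, and a real reducer would have several categories of pass mixed together.

```python
import random


def adaptive_delete(test_case, i, is_interesting):
    """Try deleting a run of elements starting at index i.

    Makes one call if nothing can be deleted, and O(log(k)) calls when
    it ends up deleting k elements, by doubling the length of the run.
    """

    def attempt(k):
        candidate = test_case[:i] + test_case[i + k:]
        return candidate if is_interesting(candidate) else None

    if i >= len(test_case):
        return test_case
    best = attempt(1)
    if best is None:
        return test_case
    k = 2
    while i + k <= len(test_case):
        candidate = attempt(k)
        if candidate is None:
            break
        best = candidate
        k *= 2
    return best


def reduce(test_case, is_interesting, rng):
    while True:
        before = test_case
        # One tiny pass per position, run in a random order.
        positions = list(range(len(test_case)))
        rng.shuffle(positions)
        for i in positions:
            test_case = adaptive_delete(test_case, i, is_interesting)
        if test_case == before:
            return test_case


calls = []  # one entry per SUT call


def is_interesting(xs):
    calls.append(1)
    return 17 in xs and 42 in xs


result = reduce(list(range(100)), is_interesting, random.Random(0))
```

Once the test case is fully reduced, a run through the loop costs exactly one call per position, which is the \(O(1)\)-on-failure behaviour.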

It does seem possible to do better than uniformly at random for choosing these, but there doesn't really seem to be more than a factor of two variation regardless of how clever you are with your ordering (unless I'm missing an option that is better than anything I've tried so far).


Vegan Chartreuse Hot Chocolate

This is the hot chocolate I made for people over the weekend. It's very good, and also entirely vegan despite being very good.

The following makes one mug:

  1. Fill the mug \(1/5\) of the way up with green Chartreuse
  2. Fill it the rest of the way up with Alpro Cashew Nut Milk
  3. Add two (more if it's a big mug) squares of Lindt 90% dark chocolate to a pan with some of the liquid and stir on low heat until the chocolate is melted.
  4. Add the rest of the liquid and a pinch of salt and stir until everything is mixed and at the right temperature
  5. Serve

The result is a rich, boozy, vegan hot chocolate. I'm sure it would be better with real milk than cashew milk, but cashew milk is fatty and mild flavoured enough, and the rest of the flavours are rich enough, that unless you were told that it was vegan I'm not sure you'd notice.


Recipe Write-Up: Not really Soboro

Soboro is a Japanese sweet beef dish that a friend told me about. I've never actually had it, but it sounded good.

In a typical Western cook manner, I decided to take this recipe I have never tasted an authentic version of and mangle it completely, using none of the intended ingredients. The result can at best be described as "soboro inspired", but is sufficiently good that I don't feel too guilty about its inauthenticity.


There are no quantities in this recipe because I have no idea what quantities I used.

(I didn't have the correct ingredients so basically went for roughly the right flavour profile, and honey + soy sauce + brown rice vinegar did a pretty good job of getting what I imagine to be the correct umami/sweet/sour flavour. I think real soboro is sweeter than what I made but I wouldn't have wanted it much sweeter).

To give a ballpark idea I used about 1kg of ground beef and probably about 5 or 6 spring onions and a couple inches of ginger. The sauce I made filled a mug and wasn't quite enough so I added more of the liquid ingredients as I went, with roughly equal proportions of vinegar and soy sauce.


  1. Chop and fry the spring onions and ginger (traditionally you would put raw spring onion on top of the soboro. I hate raw allium in all its forms).
  2. Once those are well cooked, add the beef. Stir and break it up until it is mostly browned and well separated.
  3. Add the honey, brown rice vinegar, and soy sauce, stir, then reduce heat, cover and simmer. Keep an eye on it, stirring occasionally. It is done when the liquid is all absorbed.
  4. Serve. I served it with brown rice and steamed green beans because I serve everything with steamed green beans and I wanted to try out the brown rice I bought for Brexit prep (verdict: it's very good brown rice and I'll probably buy another bag as I expect to use most of this one before March).


Book Review: Every Cradle is a Grave by Sarah Perry

Content warning: Suicide. No, seriously. There's a lot of discussion of suicide in this post.

I'm not a huge fan of this book. I want to say "It would be better for this book not to have been born", but that's mostly for the burn rather than because I actually feel that way. I thought it was an interesting read, but I also disagree with it pretty heavily, and think it has mostly failed in its goal.

This is particularly surprising because I would have expected to be a sympathetic audience. Roughly the two main causes of the book are antinatalism (the idea that it is immoral to have children) and the right to suicide. My prior moral positions (which are essentially unchanged) on these issues are:

On top of that I am at least somewhat familiar with Sarah Perry as a writer and while I don't agree with many of her positions was still vaguely positively inclined towards her.

Given all of the above, I found the book quite disappointing. It is a decent counter to some of the arguments against antinatalism and suicide rights, but I do not feel that it makes a strong case for either. The case made for suicide rights is, I think, stronger than the case for antinatalism, but it was also the one I least felt needed the case made for it.

My principal objections to the actual content:

But my real objections are to the omissions. I felt there were a lot of unexamined assumptions, and the entire thing read like a cost-benefit analysis without the benefit analysis part. This is partly because Perry doesn't agree that it's valid to make a cost-benefit analysis here, but I think she's manifestly wrong in that regard and hasn't made more than a sketch of an argument as to why that position should be taken seriously.

I did find the book an interesting read, and there are a few important take homes from it for me - e.g. the section on suicide contagion has definitely weakened some of my beliefs in the actual mechanics and practicalities around suicide reporting, and the discussion around how you count the benefits to as yet non-existent people clarified some of my understanding of the debate - but ultimately I ended up if anything less sympathetic to the antinatalist position than I started out.

I'm unsure where this book will end up. On my shelves for now, and we'll see if I want to revisit it. I might lend or give it to friends who are interested in the arguments, but I wouldn't want to pass it on to strangers.


Book Review: A Paradise Built in Hell by Rebecca Solnit

This book was very good. The subject was interesting, and Rebecca Solnit remains an excellent writer. It took me a surprisingly long time to read despite not being particularly long by page count. It took me about three weeks, which means it's probably a multiple month book for people who read less or more slowly than me. I'm not entirely sure why, and this may be due to external factors distracting me, but if you are not a fast reader this means my recommendation is more qualified.

The claim of the book (which is supported by the reports she provides from a lot of individual disasters. It's mostly anecdotal / qualitative, but the reports are reasonably convincing nevertheless) is that the portrayal of how people behave in disasters is backwards: People on the ground don't panic and descend into chaos, instead they band together and look out for each other. It's the kind of anarchy that means bottom up organisation, not the kind of anarchy that means chaos.

There are some exceptions to this of course:

In general she is extremely negative about "looting" as a concept, which is fair enough. The point she makes in this regard is that it essentially covers two entirely different things: The sensible and ethically justified requisitioning of supplies in the breakdown of a market economy (if you're in the middle of a disaster and nothing is running then of course you should break into a supermarket to get food and medicines) and the use of a disaster for opportunistic theft. The latter is a relatively rare occurrence, a relatively minor problem, and tends only to occur in cases of extreme breakdown of society with high prior inequality. Further, attempts to prevent it are more likely to prevent the entirely justifiable requisitioning, and people who are there to "prevent" it are likely to engage in it themselves.

There is an interesting tie in between this and the "Production of Ignorance" discussed in CN Lester's "Trans Like Me" (see my prior review here), which is that much of the harm caused by disasters is caused by these false narratives of how people behave during them.

The other half of the point made in the book is that far from being a terrible experience for the people involved, often people who survive disasters remember it as a time of joy. Everyone bands together with a common cause, and the utopian communities they form to help one another out form a profound source of meaning, which is absent from day to day life.


Onion, Bacon, and Potato Pancakes

One of the things I miss now that I can't eat wheat and dairy is pancakes. I'm currently experimenting with fixing this.

The main thing I've learned is that potato starch works really well as a flour substitute for this, to the point where I almost prefer it to real flour. Unfortunately I haven't yet figured out how to make pancakes without dairy that taste normal and aren't a bit boring. The baseline is fine but a bit bland.

This was an attempt to pull the pancakes slightly in the direction of more conventional potato pancakes flavour wise. I would describe it as "worked pretty well but needs work".

Preparation steps:

Unlike my normal pancake recipe, using potato flour seems to demand a fairly well oiled pan and a much higher heat.

Things I would change next time:


Some female SFF authors to read

Elsewhere someone was noting that their bookshelf was looking very male dominated and asked for a recommendation of good books by women that they could read. Here's the list I gave them.

There are a lot more I could recommend as for a variety of reasons (mostly unintentional, but with some deliberate selection once I noticed the pattern) I tend to mostly read fiction written by women, but this is a pretty good "top pick" list.


Reducer Pass Budgeting

One of the hard problems in designing test-case reducers is pass ordering: You've split the reducer up into a number of different passes, now you want to know what order to run them in.

A lot of why this is tricky is that some passes you just really don't want to run until you have to because they're much more expensive. I'm currently exploring a solution to this problem that mostly avoids it by just making that not the case.

The solution is budgeting - running a reduction pass with a maximum number of calls it's allowed to make.

The following algorithm seems to work fairly well:

  1. Set the budget to infinity, mark all passes as active.
  2. For each active reduction pass:
    1. Run it, but halt it once it has made a number of unsuccessful calls that exceed the current budget.
    2. If it succeeded, set the budget to something appropriate. I'm currently trying double the mean number of calls made by a successful pass on this run through.
    3. If it failed, mark the pass as inactive.
  3. If the number of active passes is one or fewer, mark all passes as active.
  4. If no passes made progress, halt (note that the budget is infinite unless a pass has made progress, so this doesn't affect correctness). Otherwise go back to step 2.
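A minimal sketch of how this might be wired up, under several assumptions of mine: passes are functions that take the reducer and call its `check` method on candidates, test cases are lists ordered by length and then lexicographically, and the budget is reset to infinity at the start of every run through the passes (which is how I read the note in step 4). The names `BudgetedReducer`, `BudgetExceeded`, `delete_elements` and `zero_elements` are hypothetical.

```python
import math


class BudgetExceeded(Exception):
    pass


def key(test_case):
    return (len(test_case), test_case)


class BudgetedReducer:
    def __init__(self, initial, is_interesting, passes):
        self.best = initial
        self.is_interesting = is_interesting
        self.passes = passes
        self.budget = math.inf
        self.calls = 0
        self.failures = 0

    def check(self, candidate):
        """Called by passes to try a candidate; enforces the budget."""
        self.calls += 1
        if key(candidate) < key(self.best) and self.is_interesting(candidate):
            self.best = candidate
            return True
        self.failures += 1
        if self.failures > self.budget:
            raise BudgetExceeded()
        return False

    def run(self):
        active = set(range(len(self.passes)))
        while True:
            self.budget = math.inf  # assumption: reset on every run through
            successful = []  # calls made by each successful pass this run
            for i in sorted(active):
                before = self.best
                self.calls = self.failures = 0
                try:
                    self.passes[i](self)
                except BudgetExceeded:
                    pass
                if self.best != before:
                    successful.append(self.calls)
                    # Double the mean number of calls of a successful pass.
                    self.budget = 2 * sum(successful) / len(successful)
                else:
                    active.discard(i)
            if len(active) <= 1:
                active = set(range(len(self.passes)))
            if not successful:
                return self.best


def delete_elements(reducer):
    """Pass: try deleting each element, working from the end."""
    for i in reversed(range(len(reducer.best))):
        current = reducer.best
        reducer.check(current[:i] + current[i + 1:])


def zero_elements(reducer):
    """Pass: try replacing each non-zero element with 0."""
    for i in range(len(reducer.best)):
        current = reducer.best
        if current[i] > 0:
            reducer.check(current[:i] + [0] + current[i + 1:])


reducer = BudgetedReducer(
    [5, 3, 8, 1], lambda xs: 8 in xs, [delete_elements, zero_elements]
)
result = reducer.run()
```

In this toy run the deletion pass succeeds and sets the budget, the zeroing pass fails and is marked inactive, and the final run through (with an infinite budget) confirms that nothing makes progress.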

This seems to do a pretty decent job of keeping the badly behaved passes under control while still making good progress.

Other things I have tried:

  1. Starting the budget low and working upwards causes a lot of wasted work.
  2. Being more aggressive with lowering the budget on each pass does the same.
  3. Allowing just a single pass to remain active tends to result in a lot of pointless small shrinks and not much useful progress.


Brexit Prep

I've been operating on the assumption for a while that it makes sense to stock up on food for Brexit. I don't think anything too terrible is going to happen (above and beyond the fact that Brexit is intrinsically terrible), but I think the chances of food supply chain disruption are high enough that it's worth hedging bets and making sure that you don't starve if things go wrong. I don't really have an accurate number in my head, but if we say I think the chances are higher than 1%, and I can hedge against that by just front-loading on food purchases, it makes sense to do so.

I've been putting it off, but I finally ended up doing a huge Amazon order the other day to form the basis of my Brexit supplies. It's random and not very well thought out (I have no idea what I'm doing) and I've already spotted some problems with it, but a friend asked me to share it and it's better than nothing, so here it is.

General philosophy

Civilization is not ending but the supply chain is going to go through a rough patch. I expect it will be possible to buy food, but I have enough uncertainty on that that I would like to ensure that if I can't it's not a disaster.

I am assuming utilities will mostly not be too disrupted, but am hedging my bets a bit against the possibility of things not working properly for a day or so at a time.


Food is the main thing I'm stocking up on. The general goal is to mostly only buy things I'd like and use anyway, although I've screwed up a bit there in a few places.

The basic things I've bought are:

The goal here was to have a variety of things that would keep well, providing a range of interest and nutritional value. With these I should neither starve nor get scurvy or other nutritional deficiencies.

I've bought all of these in reasonably large quantities. I haven't really done the maths but I think I could easily survive for two months on just this food if it were just me I was feeding (it's not really, but I'm not actually intending this to be a few months' sole supply) and I had access to fresh water (and ideally power, but I have the means to make fire). Longer if I stretched it more carefully.



Recent Fiction Reading

My more recent non-fiction reading has been eating in to my fiction reading a bit, but here are some things I've been reading recently.

Books that I thought were decent but not great

Books I thought were pretty good

Books I thought were great


Book Review: Aphantasia: Experiences, Perceptions, and Insights, by Alan Kendle

I didn't find this book very interesting. You might find it more interesting if you don't have aphantasia and are one of the people who find its existence mind-blowing, but most of this book was very obvious to me as someone who has had aphantasia and talked to a bunch of other people with it.

The big problem with it for me was that it made no real attempt at science or synthesis. It was just a bunch of people talking about their personal experience. I can see why that seems interesting in principle, but the problem in practice is that it's a bunch of people trying to generalise about a vastly heterogeneous experience from their personal n=1 sample. As a result I didn't find it very informative.

This book will probably get passed on to some acquaintance who is more curious or less informed about other people's experiences of aphantasia than I am.


Using unsound A* search to improve a greedy algorithm

I figured out something quite neat today, which is that you can use improper \(A^*\) search to find a good set of test-case reduction pass orderings.

The basic idea is that you have some set of interesting test cases \(X\) and a number of reduction passes \(f_i: X \to (\mathbb{N}^+, X)\) with \(f_i(x)_1 \leq x\). Each pass returns a cost (i.e. the number of test evaluations it made) and a shrink (possibly the same value). We're looking for a minimum cost sequence of applications of the \(f_i\) that gets us to a fixed point.

This can be viewed as a graph search problem. Each \(f_i\) defines an edge of weight \(f_i(x)_0\) from \(x\) to \(f_i(x)_1\). You can run Dijkstra's algorithm to find an exact shortest path to a fixed point, but this takes basically forever.

\(A^*\) is an algorithm that speeds up Dijkstra by using a lower bound on the possible path length to preferentially select good directions. There's no way to run a proper version of \(A^*\) because the problem is not well enough posed to get a good lower bound on the cost of reduction, but you can run an improper one by setting that lower bound heuristic to whatever you like. Depending on how close it is to being a true lower bound you may still get a good path which may or may not be optimal (in my experiments it seems to be within about 10-20% of optimal, which is pretty good).

The trick for test-case reduction is that greedy search works pretty well but not brilliantly. You can set the lower bound heuristic to be the cost of running a greedy search (I'm using one which always selects the lowest cost edge that makes progress). This then ensures that your lower bound always corresponds to the cost of some path, even if it's not the optimal one. In particular, if you wanted it to be, this could become an anytime algorithm.
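
To make that concrete, here is a small sketch on a toy problem. Everything in it is invented for illustration: the "test cases" are just non-negative integers, and `decrement` and `halve` are hypothetical passes that return `(cost, result)`. As a simplification it ignores the cost of passes that make no progress. Because the greedy cost is really the cost of an actual path rather than a true lower bound, nothing here guarantees the path found is optimal.

```python
import heapq


def decrement(x):
    # Cheap pass that makes slow progress.
    return (1, x - 1) if x > 0 else (1, x)


def halve(x):
    # Expensive pass that makes fast progress.
    return (3, x // 2) if x > 1 else (3, x)


PASSES = {"decrement": decrement, "halve": halve}


def greedy_cost(x, passes):
    """Cost of always taking the lowest cost edge that makes progress."""
    total = 0
    while True:
        options = [(cost, y) for cost, y in (f(x) for f in passes.values()) if y != x]
        if not options:
            return total
        cost, x = min(options)
        total += cost


def improper_a_star(start, passes):
    """A*-style search using the greedy cost as an (inadmissible) heuristic."""
    best = {start: 0}
    queue = [(greedy_cost(start, passes), 0, start, [])]
    while queue:
        _, cost_so_far, x, path = heapq.heappop(queue)
        if cost_so_far > best[x]:
            continue  # stale entry, we already found a cheaper route to x
        edges = [(name, *f(x)) for name, f in passes.items()]
        if all(y == x for _, _, y in edges):
            return cost_so_far, x, path  # fixed point of every pass
        for name, cost, y in edges:
            new_cost = cost_so_far + cost
            if y != x and new_cost < best.get(y, float("inf")):
                best[y] = new_cost
                priority = new_cost + greedy_cost(y, passes)
                heapq.heappush(queue, (priority, new_cost, y, path + [name]))
    return None


cost, fixed_point, path = improper_a_star(20, PASSES)
```

Starting from 20, the pure greedy search only ever decrements and pays 20, while the search above finds a path of cost 11 that uses `halve` early on and `decrement` at the end.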

I suspect this trick generalises but I haven't quite figured out its essential characteristics yet.


Book Review: The Cooking Gene by Michael W. Twitty

I would describe this book as "good but not great", which is a shame because a lot about it is very good. This is a powerfully written book about the (black) author's experiences travelling around the south and learning about his ancestry. I learned a bunch of interesting and important details about:

Unfortunately, most of what I learned was details. The book meanders quite a lot, with very little in the way of unifying direction or synthesis, and often repeats itself. It felt more like a collection of essays that had loosely been edited together into a book, without any attempt to really tie them together or unify their content.

This would be fine in principle, except the book is quite long. The length is not in and of itself unwarranted - I would certainly have happily read a book of the same length on the same subject if I felt the length was well used - but it made me significantly more impatient with its structural flaws, and the result is a book that is locally very well written but globally fails to hold together, and as a result by the end of the book I was mostly impatient for it to be over despite mostly liking it.

I don't regret reading it, and I'm going to keep it around for a bit (mostly to try out a few of the recipes that are sprinkled throughout it), but I don't feel like I can really recommend it to anyone whose reading time or rate are more constrained than mine.


Derivative of a polynomial with real roots

Theorem: Let \(P\) be a polynomial with real roots. Its derivative, \(P'\), also has real roots.

Note that there is something essentially analytic going on here, because the theorem is false if you replace "real" with "rational": Consider the polynomial \((x^2 - 1)(x + 2) = x^3 + 2x^2 - x - 2\). This obviously has rational roots (they are \(-1, 1, -2\)), but its derivative is \(3x^2 + 4x - 1\), which by the quadratic formula has roots at \(\frac{-4 \pm \sqrt{28}}{6} = \frac{-2 \pm \sqrt{7}}{3}\), which are not rational.

This isn't hard to prove but I found my first proof of it a little unsatisfying - there's an easy bit and then an annoying thing you have to do to patch it up. I asked on Twitter for nice proofs, and there is indeed a nicer proof using an only modestly large hammer of the Gauss-Lucas theorem which I hadn't previously encountered and is indeed very neat. This states that the roots of the derivative of any polynomial lie in the convex hull of the roots of the polynomial. In particular because the reals form a convex set, if the roots are real then so are the roots of the derivative.

This is a note about the other... I don't really want to say it's more elementary, because in some sense it's less elementary, but the one that most people seemed to naturally reach for.

Lemma: Suppose \(P\) has a root at \(w\) with multiplicity \(k > 1\). Then \(P'\) has a root at \(w\) with multiplicity at least \(k - 1\).

Proof: Let \(P(x) = (x - w)^k Q(x)\). Then \(P'(x) = (x - w)^{k - 1}(k Q(x) + (x - w) Q'(x))\), so \(P'\) has a root at \(w\) of multiplicity at least \(k - 1\) as desired (as the second factor is a polynomial).

Lemma: Suppose \(P\) has distinct roots \(x_1 < \ldots < x_k\). Then there are \(y_1, \ldots, y_{k - 1}\) with \(x_1 < y_1 < x_2 < \ldots < y_{k - 1} < x_k\) and \(P'(y_i) = 0\).

Proof: Rolle's theorem. \(P(x_i) = P(x_{i + 1}) = 0\) so there is some \(y_i\) with \(x_i < y_i < x_{i + 1}\) and \(P'(y_i) = 0\).

Proof of theorem: Let \(n\) be the degree of \(P\) and let \(x_1 < \ldots < x_k\) be the roots of \(P\) with multiplicities \(n_i\), so that \(\sum n_i = n\). Then by Lemma 1 we have at least \(\sum (n_i - 1) = n - k\) roots of \(P'\) at the real values \(x_i\), and by Lemma 2 we have \(k - 1\) further real roots \(y_i\), giving us a total of \(n - k + k - 1 = n - 1\) real roots counted with multiplicity. \(P'\) is a polynomial of degree \(n - 1\) so only has \(n - 1\) roots, thus we have accounted for them all.
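
As a sanity check on the counting, here is a small worked example of my own choosing (it is not part of the original argument): a cubic with a double root at \(1\) and a simple root at \(-2\), so \(n = 3\) and \(k = 2\).

```latex
\begin{align*}
P(x)  &= (x - 1)^2 (x + 2), \qquad n = 3, \quad k = 2, \\
P'(x) &= 2(x - 1)(x + 2) + (x - 1)^2 \\
      &= (x - 1)(3x + 3) \\
      &= 3(x - 1)(x + 1).
\end{align*}
```

Lemma 1 accounts for the root of \(P'\) at \(1\) (that is the \(n - k = 1\) part), and Lemma 2 accounts for the root at \(-1\), which lies strictly between \(-2\) and \(1\) (that is the \(k - 1 = 1\) part). Together that is \(2 = n - 1\) real roots, which is all of them.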

As an amusing alternative you can use a density argument (the set of polynomials with distinct real roots is dense in the set of polynomials with real roots, the derivative function is continuous on the set of polynomials of degree at most \(n\), and the set of polynomials with real roots is closed), but this seems like overkill.


Flying Lessons

For a long time I've been using Douglas Adams's flying lessons in "So Long And Thanks For All The Fish" as a very useful analogy for a certain type of lesson.

In my head the instructions Arthur gives to Fenchurch on how to fly go roughly "First lift up your left foot, then lift up your right foot". The analogy being that I find that a lot of instructions (e.g. for meditation) consist of a sequence of individually reasonable steps but put together I can't help but feel like something is missing.

I've just looked up the actual instructions:

‘Ask me how I did that.’ ‘How... did you do that?’ ‘No idea. Not a clue.’ She shrugged in bewilderment. ‘So how can I . . . ?’ Arthur bobbed down a little lower and held out his hand. ‘I want you to try,’ he said, ‘to step onto my hand. Just one foot.’ ‘What?’ ‘Try it.’ Nervously, hesitantly, almost, she told herself, as if she was trying to step onto the hand of someone who was floating in front of her in mid-air, she stepped onto his hand. ‘Now the other.’ ‘What?’ ‘Take the weight off your back foot.’ ‘I can’t.’ ‘Try it.’ ‘Like this?’ ‘Like that.’ Nervously, hesitantly, almost, she told herself, as if - She stopped telling herself what what she was doing was like because she had a feeling she didn’t altogether want to know. She fixed her eyes very very firmly on the guttering of the roof of the decrepit warehouse opposite which had been annoying her for weeks because it was clearly going to fall off and she wondered if anyone was going to do anything about it or whether she ought to say something to somebody, and didn’t think for a moment about the fact that she was standing on the hands of someone who wasn’t standing on anything at all. ‘Now,’ said Arthur, ‘take your weight off your left foot.’ She thought that the warehouse belonged to the carpet company who had their offices round the corner, and took her weight off her left foot, so she should probably go and see them about the gutter. ‘Now,’ said Arthur, ‘take the weight off your right foot.’ ‘I can’t.’ ‘Try.’

I'm not sure if this works better or worse than the version I remembered. They're much more detailed, but they more directly state the core message of forgetting that what you're trying to do is impossible and just do it anyway. The big problem with that message is that what actually happens when you do that is that you fall down.


Test-case reduction as graph search

Attention conservation notice: Infodump of an idea I haven't tried yet. Very poorly explained.

This is an idea that's been bouncing around in my head a bunch and has recently crystallised into an interesting form.

Suppose you've got a test-case reducer consisting of passes \(r_1, \ldots, r_n\) and are trying to reduce some test-case to a simultaneous-fixed point of all of these passes. How do you decide what order to apply the passes in?

One difficulty is that there are two different things to optimise for: Cost of reduction, and quality of final results.

However, what I realised recently is that actually this is wrong. You shouldn't be trying to optimise for quality of final results at all. That should be implicit in your definition of the passes: any final result that you get that is a fixed point of all of your passes is somewhere you can get stuck. If you didn't want to get stuck there then you need to define more passes rather than worrying about how to carefully apply your passes so as not to get stuck.

Given that, our cost model is now easy: We just want to minimize the number of predicate calls.

And there's a nice way to do this! We can regard the set of test cases satisfying the predicate as a labelled weighted digraph, whose vertices are test cases satisfying our predicate. Each edge is labelled with a reduction pass, goes to the test case that that pass takes the current one to, and carries a cost equal to the number of predicate evaluations made while running that pass.

This means that we can basically run a shortest path graph search to find a fixed point of all of the reducer passes! Except that's not quite right, because we also need to include the cost of verifying that we are at a fixed point. So what we are trying to minimize is the total number of predicate calls made during running passes added to the cost of running each pass at our destination.
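
Here is a rough sketch of that cost model as a plain Dijkstra search, before any of the cleverness. It is purely illustrative: the test cases are integers, and `subtract_three` and `halve_even` are made-up passes that return `(cost, result)`, where a result equal to the input means the pass failed. The verification cost is modelled as one extra edge from a fixed point to a "verified" copy of itself, weighted by the cost of running every pass there.

```python
import heapq
from itertools import count


def subtract_three(x):
    return (1, x - 3) if x >= 3 else (1, x)


def halve_even(x):
    return (2, x // 2) if x > 0 and x % 2 == 0 else (2, x)


def cheapest_verified_fixed_point(start, passes):
    """Return (total cost, fixed point), including the cost of verification."""
    tiebreak = count()  # keeps heap entries comparable when costs tie
    best = {start: 0}
    queue = [(0, next(tiebreak), start, False)]
    while queue:
        cost, _, x, verified = heapq.heappop(queue)
        if verified:
            return cost, x
        if cost > best[x]:
            continue  # stale entry
        edges = [f(x) for f in passes]
        if all(y == x for _, y in edges):
            # Fixed point: pay for running every pass once to confirm it.
            total = cost + sum(c for c, _ in edges)
            heapq.heappush(queue, (total, next(tiebreak), x, True))
            continue
        for c, y in edges:
            if y != x and cost + c < best.get(y, float("inf")):
                best[y] = cost + c
                heapq.heappush(queue, (cost + c, next(tiebreak), y, False))
    return None


total_cost, fixed_point = cheapest_verified_fixed_point(10, [subtract_three, halve_even])
```

Starting from 10 this reaches the fixed point 1 via three applications of `subtract_three` (cost 3) and then pays 3 more to run both passes at 1 and confirm neither makes progress, for a total of 6. Note that this version cheats by knowing each edge for free, which is exactly what a real reducer can't do.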

As this stands, that's probably not a very useful thing to do, but some additional assumptions and tricks let us turn it into an improper version of \(A^*\) search which may well be quite efficient!

The main trick to note is that we can evaluate passes with other predicates. In particular we can evaluate them with a predicate that returns False for everything but its starting point but records its result.

This gives us two interesting things:

  1. By doing this for each pass we can work out what the cost of that node would be if it were a fixed point (by counting all of the values considered when running all of the passes).
  2. We can simultaneously work out a pessimistic speculative version of the graph, which makes the assumption that each pass pays its full cost for the version that always returns False and leads to the worst strict improvement.
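
A minimal sketch of the recording predicate, under some assumptions of mine: a pass is a function taking `(test_case, predicate)`, test cases are tuples, `delete_one` is a toy pass, and "worst strict improvement" is read as the largest recorded candidate in shortlex order.

```python
def delete_one(case, predicate):
    """Toy pass: try dropping each element in turn, keeping drops that work."""
    i = 0
    while i < len(case):
        candidate = case[:i] + case[i + 1:]
        if predicate(candidate):
            case = candidate
        else:
            i += 1
    return case


def probe(reduction_pass, start):
    """Run a pass against a predicate that rejects everything but `start`,
    recording every test case the pass asks about."""
    seen = []

    def recording_predicate(case):
        seen.append(case)
        return case == start

    reduction_pass(start, recording_predicate)
    return seen


def shortlex(case):
    # Assumed ordering on test cases: shorter first, then lexicographic.
    return (len(case), case)


seen = probe(delete_one, (1, 2, 3))
speculative_cost = len(seen)  # what this pass would cost at a fixed point
pessimistic_destination = max(seen, key=shortlex)  # least useful improvement
```

Here `seen` is `[(2, 3), (1, 3), (1, 2)]`, so the pass would cost 3 calls if `(1, 2, 3)` were a fixed point, and the pessimistic guess for where it leads is `(2, 3)`. None of this calls the real predicate at all.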

We now do a species of \(A^*\) search as follows:

  1. For each test-case satisfying our predicate we keep track of the shortest known path to it following the real graph.
  2. We maintain a priority queue of pairs test-case/reduction pass, keyed off the sum of the actual cost of reaching that test case, the speculative cost of the pass, and the speculative cost of its pessimistic destination.
  3. We maintain a counter for each test case of the number of reduction passes it is a fixed point of and a total cost of all reduction passes we've run on it.
  4. We maintain a current shortest path length, initially set to infinity.

We start by putting our initial test case along with each reduction pass into the queue. While the queue is non-empty we iterate the following algorithm:

  1. We pop the smallest item from the queue.
  2. If the current real cost is greater than or equal to the current shortest path length, we discard it.
  3. Similarly if the current real cost is greater than or equal to the current known shortest path to this test case, discard it.
  4. Otherwise we run the reduction pass on its test case, keeping track of the number of calls made. This gives us a new edge.
  5. If this edge is a self-loop, add the loop cost to the cost of this test case and increment the counter. If we now know it is a fixed point of all reducers as a result of that, set the current shortest path length to the cost of reaching this test case plus its loop cost.
  6. Otherwise, if this results in a new shortest path to the destination, put edges for it in the queue (use real ones when known).

Generally speaking as soon as a pass works we will tend to prioritise it, because it will be lower cost than our pessimistic assumptions, but once we've run several passes we will tend to backtrack a bit to see if any of those earlier passes were worth trying after all.


Etymology of property-based testing

Fun Fact: People say that property-based testing was invented in Haskell but this isn't actually true. Property-based testing was invented in Erlang.

QuickCheck was invented in Haskell, but they didn't start referring to it as property-based testing until the Erlang version of QuickCheck. The term was then widely adopted and came to include the original Haskell QuickCheck too.

The first time that the term "property-based testing" was used for this style of testing was in 2006, in "Testing telecoms software with Quviq QuickCheck". I asked John Hughes about this and he says that they were pushed into giving it a better name by Ulf Wiger.

Funnily, there's a much earlier usage of the term property-based testing, in the work of George Fink leading up to his PhD thesis. I didn't really look into his system in detail, and it doesn't seem to have taken off, but as far as I can tell there's not much relationship between the two concepts.


Another attempt at brownies.

Continuing to iterate on how to do brownies without wheat or dairy.

Today's attempted ratios:

The results are OK but they're too sweet (I know this seems unsurprising given the amount of sugar, but this recipe works if you use flour and butter) and lack structural integrity. I think the excessive sweetness is probably because I forgot the salt. Either way the last batch was definitely superior.


When is it OK for a book to be long?

I have a very strong preference for short books. In How do you read so many books? I suggested the following rules of thumb:

  • If the book is under a hundred pages I’m probably just going to read it. I can do that in two hours, no problem. I can always stop if I hate it.
  • If the book is between one and two hundred pages it’s worth spending a bit of time determining if it’s worthwhile, especially if it’s not that well written.
  • If the book is more than three hundred pages I am very unlikely to read it if it’s badly written. If it’s well written then I hold it to a higher standard of value but am still willing to read it.
  • If the book is more than five hundred pages then I basically want someone signing a promise in blood that it’s worth my time to read it.

I've noticed that there are some exceptions to this rule where I'm perfectly OK with longer books. The main one is textbooks. I don't expect textbooks to be short, but that's OK because I don't expect to read them cover to cover either.

There are some non textbook examples as well. I really liked Rewriting the Rules despite its 400 pages. I'm currently reading and enjoying Watching the English despite it being more than 500 pages and nobody having signed that promise in blood I asked for.

It occurs to me that there are two defining characteristics of an acceptable long book:

In contrast I really like the concepts from "Seeing Like a State" but I felt like the book did not justify its length because almost all of the length was composed of making the same point over and over again.


Privacy as Friction Reduction

This piece on privacy is interesting.

I now think privacy is important for maximizing self-awareness and self-transparency. The primary function of privacy is not to hide things society finds unacceptable, but to create an environment in which your own mind feels safe to tell you things.

It ties in to some thoughts I've been having recently about the role of private discussion spaces: People are generally willing to acknowledge and discuss things in much greater nuance in private than they are in public. What I'd previously noticed happening at the group level is that public sharing of discussion essentially forces you to bring everyone listening in up to the same level of shared understanding in order to have the conversation, and also you have to be constantly self-monitoring for the potential weaponisation of the things you're saying.

In some sense these are the same thing, applied at the level of the group mind.



One of the advantages of using beeminder is that it keeps me on track.

One of the disadvantages is that it can't judge quality of data points, so sometimes I phone it in...

For example by posting something like this in order to prevent my notebook goal from derailing.


Levy's theorem

I have this theorem from "Combinatorics of Permutations" by Miklós Bóna, where it's frustratingly referred to as Levy's theorem and neither proved nor referenced. Fortunately despite its incredibly odd looking nature it's not actually that hard to prove.

Theorem: Let \(a_0, \ldots, a_n\) be a sequence of non-negative real numbers and let \(A(z) = \sum a_k z^k\). If \(A(1) > 0\) then the following two statements are equivalent:

This theorem took me completely aback when I saw it but once you see the proof it's quite natural.

Let \(p_i\) be arbitrary probabilities. Without loss of generality assume that \(p_i > 0\) (if some of the \(p_i\) are zero this will just result in some of the high coefficients being 0). Let \(X_i\) be independent Bernoulli variables as above, and let \(d_k = P(\sum X_i = k)\).

Construct the generating function \(f(z) = \sum d_k z^k\). This can be written as \(f(z) = \sum\limits_{A \subseteq [n]} \prod\limits_{i \in A} z p_i \prod\limits_{i \not\in A} (1 - p_i)\), but this is \(\prod\limits_{i} (z p_i + 1 - p_i)\). This is a polynomial with roots at \(z = -\frac{1 - p_i}{p_i}\) as desired.
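
If you don't trust the algebra, it's easy to check on a small example. The sketch below uses three arbitrary probabilities that I picked for illustration, computes the \(d_k\) by brute-force enumeration of outcomes, and compares them with the coefficients of the product form, using exact fractions throughout.

```python
from fractions import Fraction
from itertools import product


def distribution_by_enumeration(ps):
    """d_k = P(sum X_i = k), summing over all 2^n outcomes."""
    d = [Fraction(0)] * (len(ps) + 1)
    for outcome in product([0, 1], repeat=len(ps)):
        prob = Fraction(1)
        for x, p in zip(outcome, ps):
            prob *= p if x else 1 - p
        d[sum(outcome)] += prob
    return d


def product_coefficients(ps):
    """Coefficients of prod_i (z p_i + 1 - p_i), lowest degree first."""
    coeffs = [Fraction(1)]
    for p in ps:
        new = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * (1 - p)  # constant term of the new factor
            new[k + 1] += c * p    # z term of the new factor
        coeffs = new
    return coeffs


def evaluate(coeffs, z):
    return sum(c * z**k for k, c in enumerate(coeffs))


ps = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]
d = distribution_by_enumeration(ps)
f = product_coefficients(ps)
roots = [-(1 - p) / p for p in ps]
```

For these values `d == f`, `evaluate(f, 1)` is exactly 1, and `evaluate(f, r)` is exactly 0 for each `r` in `roots` (which are \(-1\), \(-2\) and \(-\frac{1}{3}\)).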

This gives us our implication from the second to the first. By construction and the fact that \(f(1) = P(0 \leq \sum X_i \leq n) = 1\), \(A(z) = A(1) f(z)\), so the roots of \(A(z)\) are the roots of \(f\) and \(d_k = \frac{a_k}{A(1)}\).

To get the other implication we just need to show every possible \(A(z)\) arises this way. i.e. we can always find \(p_i\) such that \(A(z) = A(1) f(z)\).

Let \(A(z)\) be non-constant and have real roots. Note that these roots must be negative, because for any \(z > 0\) we have \(A(z) \geq a_0 + \min(z, z^n) \min\limits_{i \geq 1} a_i > 0\). So suppose these roots are \(r_1, \ldots, r_n\).

Because the formula for the roots of \(f\) takes the value \(0\) when \(p_i = 1\) and tends to \(-\infty\) as \(p_i \to 0\), we can choose our \(p_i\) so that \(r_i = -\frac{1 - p_i}{p_i}\). This ensures that \(A\) is proportional to \(f\), and again because \(f(1) = 1\) this means that \(A(z) = A(1) f(z)\) as desired.


A rather surprising (to me) implication of this theorem is that unless there are \(j, k\) such that \(a_i > 0\) if and only if \(j \leq i \leq k\), \(A(z)\) has at least one non-real root. There's presumably some elementary way of proving that, but it's direct from Levy's theorem because the probabilities of getting \(k\) successes have this property.

In general this seems to put strong constraints on the coefficients of polynomials with non-negative coefficients and real zeroes, but I haven't fully understood what they are yet (something to do with log-concavity of their coefficients?).


Wheat-and-Dairy-Free Brownies

Due to the restriction diet I can't eat wheat or dairy, so I'm experimenting with my brownie recipe to try to make it work without.

The following seems to work reasonably:

Mix all together, bake at 200C for 25 minutes.


These were pretty good but didn't work as well as they could have, I think partly due to my succumbing to the urge to tinker with too much at once. When I try it next I'll go back to the correct ratios of sugar and cocoa (2 cups sugar, 1 cup cocoa). I also feel like there wasn't quite enough liquid in them - I didn't want to put too much oil in because the almonds implicitly add a lot of fat, but it would maybe have benefited from a little bit more. This might also have been because the cocoa absorbs more liquid than the same amount of sugar would.


Book Review: How to Improve Your Foreign Language Immediately by Boris Shekhtman

I really like this book. It has a few flaws that annoy me slightly and are worth commenting on, but it's basically a very good book. I can't personally say if it works, as my foreign language skills are extremely poor and I'm not actively working on that (I have at various points spoken French and German badly, and I can mostly still puzzle them out in written form, but it would be a lie to say I really speak them), but even without that context I think it's a good book.

Fundamentally this is a book about how to have a conversation, and how to develop sets of concrete communication skills that allow that to work for you. This is particularly valuable when speaking in a language that you don't know very well, but I think the lessons are important for native speakers as well - in particular the advice on how to deal with stalled conversations is something that I think most people should learn.

The book is very short, so I won't try to summarise its lessons here, and I'd just recommend reading it instead, but I would add two caveats:

  1. Some of the specifics of his advice will come across as extremely rude. In particular he recommends changing the subject by saying "I'm not really interested in this subject". Don't do that. Say "I don't know very much about this subject" or similar. The former comes across as saying "Your interests are boring".
  2. The strategy of inserting colloquialisms into your speech in order to sound more natural is high variance. It is very obvious when a foreign speaker is doing this badly, and it starts to grate. As a native speaker it is my responsibility to have patience with foreign speakers (and when the roles are reversed I hope they would grant me the same), so by all means go for it, just be aware of this dynamic.

I suspect if you were to use this book as it is intended then it would be very helpful to practice all of these techniques with a trusted friend who is fluent (ideally native speaking) in the target language.

This book is staying on my shelves for circulation to friends who are interested in it and maybe the occasional reread (this was already my second reading of it).


Book Review: "The Tao of Pooh and the Te of Piglet" by Benjamin Hoff

The Tao of Pooh was mostly OK. I found it faintly annoying, but I can see how its advice would be useful to people.

The Te of Piglet can fuck right off. I'm not in any mood for your grumpy old man bullshit about Amazon Feminists who want to be men but insist on "nonsensical" gender-neutral language. I stopped reading at about that point and started skimming, but quickly decided that I wasn't even interested in keeping skimming.

I was also fairly unimpressed by the number of objectively false things that the author believed, including that there were Taoists who lived to more than two hundred years and that the fact that we don't believe the Taoist creation myths was the result of our reductive Western ways.

Anyway, this book is going in the firepit. It is not without its virtues, and if you find a solo copy of just the Tao of Pooh it might even be on balance worth reading, but I cannot in good conscience recommend it to anyone: Either the Te of Piglet will piss them off or worse it won't piss them off and they might take it seriously.


Democracy Can't Work

(Title somewhat overstating the case, but only somewhat)

A challenge for the interested reader: How do you reconcile the following facts?


Crispy Spicy Green Beans



  1. Put the green beans in a roasting dish
  2. Cover them with oil
  3. Add salt and chilli to taste.
  4. Toss until well coated
  5. Roast at 250C until crispy (I wasn't paying attention to time, but I think this was 10-20 minutes)


Oh gods these are like crack. They are so good.


Book Review: Learn to Write Badly by Michael Billig

I picked this up thinking that it would be a book of writing advice. Even before I bought it it was reasonably clear that wasn't what it was, but it seemed like it might be interesting anyway.

It was interesting, and I would cautiously recommend it, but it's something of a niche interest and was a surprising amount of work to read (it's not badly written - that would be a bit too ironic - but it is quite dense).

Roughly what the book is about is the pressures that social scientists find themselves under from the academic system, and the way they write in order to conform to those pressures. Billig argues that academics in the social sciences have a number of habits of language including the following:

  1. Use precise technical terms in imprecise ways.
  2. Use terms in ways that pun between their colloquial and technical meanings (e.g. significant).
  3. Omit necessary quantifiers from terms ("participants did X" instead of "all participants did X" vs "60% of participants did X").
  4. Use the passive voice to omit necessary details about who did what.
  5. Use noun phrases which are actually quite ambiguous (he had some good examples but these were early in the book and there was a gap in my reading, so I don't remember them and didn't easily refind them when writing this review).

The common thread of all of these is that they allow the authors to give the impression of being more precise than colloquial language would allow, while in fact being less precise. It allows for a language in which they can sound more impressive while being less impressive.

If this sounds a lot like cranky reactionary objections to postmodernism, it is worth noting that a) The author, Michael Billig, is very much himself a social scientist and b) The book consists of a large number of quite specific, measured, and well researched and cited criticisms.

I think I would need to reread it in pieces before I felt I fully appreciated the scope of his objections, but the main thing I was left with was that it reinforced two things I already believed and somewhat damaged my faith in something I quite like.

The two things I already believed:

The book is a pretty compelling argument for the former (although Billig mostly restricts himself to the social sciences, and his arguments have more strength there, I don't object to generalising even though I think he would encourage me to resist the urge to generalise), and thus implicitly for the latter. Or, at least, on the latter it is pretty good at demolishing the arguments that academic social scientists use to defend their terrible writing (which seems suspiciously similar to the ones that I've seen from other academics).

The thing that the book also implicitly argues against that I've previously believed, and still mostly believe, is roughly something along the lines of... "abstraction is good". I've previously argued that the way mathematicians use words has some powerful characteristics, but the book is mostly arguing that if you try to apply those ways in the social sciences then the way people actually use them in practice is quite unhealthy. I feel compelled to say "Yes but that's because they're doing it wrong", but this isn't actually a very good defence - if a tool makes it very easy to misuse, that is a fault of the tool and not the people misusing it. I need to think about this further.

A recurring argument that he makes in support of this is that when we turn something into a noun, we rhetorically sidestep a lot of the work of demonstrating that it exists. Nouns come with a sense of reality that isn't necessarily supported by the evidence. I could define a "tonon" as "The headache you get from being stared at by a cat" (this is not his example, but it's something I thought up in another context). This isn't a thing that actually happens, but it was still perfectly legitimate to make the definition. Only now I can write extensively about tonons and by the time you've read an entire essay about the role of tonons in popular culture it will be just that little bit harder to remember that this isn't actually a real thing. After all, there's a noun for it, and nouns are very real, solid sounding, words.

Billig is quite keen to make it clear that the book is an analysis of the problem and its consequences rather than actually a book about writing style, and he goes through most of the book without making much in the way of recommendations. He doesn't quite get away recommendation-free, though: the last couple of pages do suggest some things that might help (with a long preamble explaining why they probably won't). He mentions Orwell's Politics and the English Language, which makes the following six helpful suggestions:

  1. Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.
  2. Never use a long word where a short one will do.
  3. If it is possible to cut a word out, always cut it out.
  4. Never use the passive where you can use the active.
  5. Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.
  6. Break any of these rules sooner than say anything outright barbarous.

I'm not exactly Orwell's biggest fan but these all seem basically sensible rules.

Billig offers a somewhat overlapping set of advice in a similar vein. Paraphrased, his suggestions are:

  1. Use simple language and avoid technical terms as much as possible.
  2. Prefer the active voice over the passive, in order to give more specific information about what occurred.
  3. Prefer verb-heavy over noun-heavy sentences, especially avoiding phrases that consist of multiple nouns clumped together.
  4. Treat all of these suggestions as guidelines over rules.
  5. When writing about things concerning people, do so in a way that is populated - rather than talking about abstract systems and properties thereof, illustrate with examples of real people doing real things.
  6. Technical terms can be useful, but don't fall in love with them or become too attached.

This seems like broadly sensible advice, although I'm not sure (and am pretty sure that Billig doesn't believe) that some basically sensible advice can possibly counteract people's tendency to follow incentives, and there's an entire book about how the incentives push you away from doing any of these that I just read...


You Can Do Anything (But You Can't Do Everything)

I posted a link to On Fluency on Twitter the other day, which prompted a short conversation about the following paragraph:

Across these experiences, at a low level, my brain has recognized that it can understand things, that given the time and resources, I can figure out what is going on, in almost any domain!

I think this is true, but there is an important caveat: Most of the time you don't have the time and resources.

It still is a useful thing to believe, and makes for a useful reflective exercise.

Pick a skill it would be nice to learn. Now pretend you have unlimited time, money, and motivation to learn it. Can you learn it? Yes, you almost certainly can. Can you learn it to the standard you want to achieve? Almost certainly unless your standards are incredibly unrealistic.

In fact you have sharp limits on time, money, and motivation, but by imagining you don't you can start to construct a plan for how you would learn it if you did.

Say I wanted to learn to speak Japanese fluently (I don't especially. It's not even high up on my list of languages it would be nice to know). I might do the following:

After five years of that I think it's fair to say that I would be a fluent Japanese speaker with a reasonable accent.

Could I achieve the same result with less than that? Some of it is overkill. At that level of immersion I might be fluent in, say, two years (probably not, but let's be optimistic). After that, it's a matter of deciding my trade-offs. If I spent less time on the subject it would take longer. If I didn't hire the tutors it would require more work and also take longer.

By playing with these numbers I can get a reasonable idea of what sort of time investment would be required and get a rough range of the parameters involved. These don't have to be especially realistic predictions, they just have to give me a broad sense of what it might look like.

There are a couple things you can then do with this information, but in my case the answer is easy: I'm sure I can learn Japanese, but that is way more work than I want to invest in it, especially given that I don't particularly care about learning Japanese in the first place.

Because once you leave counterfactual land with your estimate of how much work learning the skill will actually take you, the correct decision is often that it's not worth that much work. This is because if you decide to spend those years working full time on your Japanese you are then not spending that time on other things you value more highly. As soon as you are resource constrained, you have to start prioritising.

I think this is a thing that it's often easy to miss: The fact that you can do anything doesn't mean that you can do everything at once, and deciding to do something necessarily means that there are many other things you are deciding not to do in its place.


Teach People How To Do Things

You know what makes me mad?

Well, everything makes me mad. It's kinda my brand.

But specifically a thing that has raised me to incandescent levels of rage recently is this: The idea that hard work is sufficient for success.

This idea is toxic and if you spread it you are hurting people. Cut it the fuck out.

This post is triggered somewhat by answers to a question I had about how to learn to write poetry, but it's not really the people giving those answers I'm mad at. This is mostly leftover rage from an awful book about writing.

Hard work is necessary for success. You will never get good at something without a lot of practice at it. "Natural talent" is an illusion mostly caused by natural interest - the people who are naturally good at something are the people who are constantly thinking about it and doing it. Even when some people are naturally better at some things than others, that does not change the necessity of hard work, only the effectiveness of the work you do.

But when you treat hard work as sufficient for success you are basically throwing people who have not figured out the right approach themselves under a bus. They may be working incredibly hard and you are telling them that the reason they are failing is because they're not working hard enough. This advice will burn people out and possibly literally kill them.

The correct way to get better at anything is the following:

  1. Figure out what aspect you need to improve.
  2. Figure out how to improve that aspect.
  3. Work hard on improving that aspect.

For example, the thing I really need to work on if I want my writing to improve is probably my editing skills. My writing quality has mostly plateaued, and I haven't prioritised fixing that, so I mostly haven't been working on improving it, but that's what I would do if I wanted to.

If you do not do the first two steps then step three will work only by luck. Many people are lucky and figure out the first two steps on their own, possibly without even noticing that they have done so, but many do not and are then made sad by your terrible awful dangerous advice that step three is sufficient on its own.

Please note: I will not accept your exceptionalist bullshit about why your subject is special and comes from the soul and these rules do not apply. Your subject is not fucking special. It is a skill.

Nobody can do your hard work for you, but steps 1 and 2 are absolutely something that experts can help you with, by giving you an idea of the shape of the answer and providing you with help on debugging your current weaknesses. There is no royal road to philosophy or any other skill, but it sure helps if someone hands you a machete and points you in the right direction rather than just dropping you in the middle of the forest and saying "I'm sure it's somewhere around here. Good luck, have fun!".

As a teacher your moral obligation is not to save people from having to work hard, it is to ensure that when they work hard they do so in a way that helps them improve.

This is something our education system reliably fails to do. Depending on the subject in question we vary from dismal to appalling, but it is very rare for the education system to reach the heady heights of mediocrity in this regard.

I do not know how to fix our education system. It's a mess. Instead I am doing damage control, trying to counterbalance its worst excesses. We live in a world where most people have been damaged by it, and I would greatly appreciate it if you would stop rubbing salt into their wounds by repeating the worst of its lies.


Book list from CUP.

I have a habit for stopping myself from buying too many books in book stores: When I see a book that I want but am unsure about, I go "Oh that looks interesting" and take a photo of the book instead of buying it. I try to make sure when doing this I always buy at least one or two books so I don't feel like I'm ripping the store off, but it curbs the worst of the spending spree impulse.

Anyway, I was in the Cambridge University Press book store yesterday. The following are the books I noted as looking interesting but probably available via my university library:

The books I actually bought were:

I'm currently about halfway through "Learn to write badly". It's both very interesting and not at all what it sounds like.


Building Friendships

A reframing that I've been finding helpful recently is to stop thinking in terms of "making friends" and move to thinking in terms of "building friendships". I don't think you should feel any particular obligation to do the same, but it's been helpful to me.


  1. Someone is not, in some binary sense, either your friend or not your friend. You don't just "make friends" with them and instantly become BFFs. This framing makes the growing nature of the friendship obvious.
  2. A friendship is a collaborative activity. You don't make someone your friend, you build a friendship together.
  3. Honestly "make" has weirdly coercive overtones to it that I don't like very much.

One of the things that has got me thinking along these lines is how you go about explicitly building (and maintaining) friendships. I wrote a little while back about how explicit decision making is often better, but one of the problems here is that making your decision making around friendships explicit inevitably comes off as creepy.

For example, here are two things that have come up:

  1. Maintaining a list of friends you want to prioritise and actively trying to spend more time with them.
  2. Having a "friendship onboarding" process where when you want to be friends with someone you can basically just invite them into the process without it seeming strange.

Of course, I think most people who are good at friendship are doing these things anyway, they're just not admitting (possibly even to themselves) that this is the case.

To be concrete, I don't do the former but have a friend who does, which I only found out because I mentioned to her that I was thinking about doing it. I don't really do the latter but I've been thinking about it - some things I do implicitly are in that general shape, and I'm starting to think about making it more explicit.

And in doing so, it sure feels creepy, even from the inside.

Part of this is due to something Granny Weatherwax said:

sin, young man, is when you treat people as things. Including yourself. That’s what sin is.

(I don't think most people got this lesson from Granny Weatherwax, but it's a widely shared belief)

Starting to build these explicit systems around people sounds very like treating people like things to me.

Friendships though? Friendships are things. Or rather they're systems - a complex set of implicit and explicit agreements and shared knowledge and understanding between people. Systems are, I think, OK to treat as things, as long as you do so with respect for the agency of the people who make them up.

It still feels manipulative. I think that goes away somewhat by being up front with people about it, but I don't yet know how to do that - even in a culture which successfully promotes overcommunication you still end up with this being a tricky subject. e.g. the list system is I think good, but once you've admitted its existence people will want to know if they're on it, and it's hard to do that without hurting people's feelings.


The people that live in your head

There's a good Twitter question about books that have changed your worldview for the better. A lot of the ones listed are fiction, including mine.

A related question I asked the other day was about which genre fiction provides good role models.

Although the primary purpose of fiction is of course entertainment, the primary benefit of fiction is that it fills your head with people.

There's a study from a few years back about how reading fiction improves theory of mind:

Kidd, David Comer, and Emanuele Castano. "Reading literary fiction improves theory of mind." Science 342.6156 (2013): 377-380.

Their conclusions were that literary fiction does improve theory of mind and popular fiction does not. I'm a little skeptical of this claim - it's not that I doubt their conclusions in the particular experiment, but I think the implications are far less general than they claim.

I think a more reasonable (but perhaps less exciting / conducive to sharpening the axe they want to grind) set of conclusions would be:

  1. Reading fiction can improve your theory of mind.
  2. Which fiction you read may have a dramatic effect on how well it achieves that.

For example I think reading Pratchett's witches books is likely to have a very strong impact on your theory of mind (it did for me), while reading, say, Xanth might go as far as having a negative impact on your theory of mind.

That being said, while quality is of course important, I think quantity has a virtue all of its own, as long as there is a reasonable diversity of failure modes in that quantity. The nice thing about genre fiction is that people can and do read a lot of it, which gives you a large number of people in your head.

This is one of the reasons why having a good range of diversity of authors and protagonists is helpful - it begins to give you a theory of mind that encompasses a wide range of human experience.


How to build a superintelligence

I tweeted the following:

It's a source of some frustration to me that we've had the capability to build useful functioning superintelligences for about 100,000 years and yet never seem to have taken this capability seriously.

I was asked to elaborate, so this is an elaboration.

Short version: Groups of people are superintelligences. They are reasoning entities that are capable of tasks significantly beyond those of individual humans.

We build groups as superintelligences all the time (albeit often very ad hoc and dumb ones), but what we don't seem to spend much time doing is explicitly designing them. This is a shame, because there is a lot of interesting low-hanging fruit in this space.

A particularly interesting/frustrating example is from Nigel Cross's "Can a Machine Design?". In this they experiment with building a "computer aided design" system in which humans take on the role of the computer. Either the "computer" half of the equation generates ideas and is critiqued by the humans, or vice versa.

This paper is interesting for two reasons:

  1. They come up with some interesting conclusions about human/machine collaboration. In particular they suggested that having humans critique computer generated work rather than vice versa was a significantly superior option. The entire industry then went and did the opposite.
  2. They built a system in a day that was in some ways significantly superior to anything we've used since, and then went "huh, that's interesting" and threw it away.

Humans are incredibly flexible components in system design, and we're very good at taking on specialised roles. We mostly don't use these capabilities, and I'm reasonably sure that if we did then we could very easily design reasoning systems that vastly outperform the ad hoc ones we build instead.

An interesting design point in this space is Liberating Structures. They don't explicitly frame their work this way, but it has many interesting characteristics for this problem.


Reading a Paper

I mentioned this system in passing before but thought it would be worth noting down explicitly and commenting on some of its reasoning.

When trying to decide whether to do a close reading of a paper I've started using a points based system. Ideally I should be able to evaluate this score in about 10 minutes of reading maximum.

I ask three questions:

Each question gets a score from -1 to 2, with -1 meaning unusually bad and 2 meaning among the best I've seen recently in that category. These scores are added together to get a total score for the paper. A paper with a score of 2 or higher gets read in more detail.

The scoring is slightly odd, but the reasoning is that it gives the following combinations worth reading:

In particular a paper has to be doing well on two of these characteristics or amazingly on one of them in order to be worth reading.
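The arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as described above (three scores, each from -1 to 2, added together, with a threshold of 2). The function names are made up for this example, and the three scores are left as anonymous numbers rather than tied to particular questions.

```python
from itertools import product

# Assumptions taken from the description above: three questions,
# each scored from -1 to 2, and a paper is read closely at a total of 2 or more.
POSSIBLE_SCORES = range(-1, 3)  # -1, 0, 1, 2
THRESHOLD = 2


def total_score(scores):
    """Add the three per-question scores together."""
    return sum(scores)


def worth_reading(scores):
    """True if the paper clears the threshold for a close reading."""
    return total_score(scores) >= THRESHOLD


def passing_combinations():
    """Every distinct combination of three scores that clears the threshold.

    Order is ignored, so (2, 0, 0) and (0, 2, 0) count as the same combination.
    """
    combos = {
        tuple(sorted(scores, reverse=True))
        for scores in product(POSSIBLE_SCORES, repeat=3)
        if worth_reading(scores)
    }
    return sorted(combos, reverse=True)


if __name__ == "__main__":
    for combo in passing_combinations():
        print(combo, "->", total_score(combo))
```

The weakest combinations that still pass are (1, 1, 0), (2, 0, 0) and (2, 1, -1): doing well on two questions, or amazingly on one without the others dragging the total back below 2.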

While doing the initial casual reading, one thing I make sure to do is note anywhere I'm confused and then move on as quickly as possible. When doing the second reading I start with those points.


Book Review: Rereadings, edited by Anne Fadiman

Unusually for a review, I didn't like this book. This is a shame as I really wanted to, which is why I finished it, but 'twas not to be.

It's a collection of essays by people talking about rereading books, usually ones they had read when they were in their teens or younger. They talked about the impact of rereading and how their views of the books changed as they got older.

The problem was that fundamentally all of these essays were really stories about the authors' lives and how their reading of a book intertwined with that and fundamentally... I just didn't care. I can see ways it could have worked for me, and I kept reading it hoping to find some gems, but as it stood it was just many brief glimpses into the lives of people I didn't care about viewed through the lens of books I didn't care about.

This is not to say that it's a bad book, but I think it will work well for you only if you are nostalgic for the same kinds of books that the authors are, and fundamentally my formative years were spent on a diet of science fiction and fantasy, while most of the books talked wistfully about here are the kinds that literature professors read when they want to feel like they're slumming it.


Miscellaneous Notes on Vegetarian Cooking

I'm not a vegetarian any more, but I was during a lot of my formative years learning to cook properly, and I think I'm a much better cook for it.

As a result, I get very bored of omnivores' anti-vegetarian-food takes. Vegetarian food is great. Sure, any individual vegetarian dish may not be, but I've seen some pretty disappointing meat-based dishes too.

The reasons people tend to dislike vegetarian food seem to fall into roughly three categories:

  1. They're stupid and wrong.
  2. There is a very common style of vegetarian cooking that really is strictly worse than meat-based cooking.
  3. They have mostly been exposed to vegetarian food made by people who don't know how to construct decent vegetarian dishes.

What's the common style? It's what tends to get referred to as "fake meat". For example, if you go to UK supermarkets, you may get the impression that vegetarians live mostly off Quorn. This does appear to be true of many vegetarians and vegetarianish people that I know. This is unfortunate because Quorn is made by people who hate joy and want your food to be an endless litany of bland suffering. Don't eat Quorn.

(If for whatever reason you feel that Quorn adds something of value to your life, by all means eat Quorn, I'm not the boss of you. I just personally think it's an awful substance, and I feel reasonably confident in saying that it's at the root of a lot of negative opinions about vegetarian food.)

In general my recommendation for dietary restrictions is: Unless you absolutely cannot avoid it, do not make any substitutions that you would not be happy to make if that restriction were not in place.

For example I like tofu, especially smoked, and seitan (though I suspect I can no longer eat seitan) a lot. They are interesting things in their own right which you can build a dish around, and they do not in any way pretend to be meat (yes you can get tofu sausages, but personally I recommend you don't. I will concede an affection for seitan based burgers, but they're a thing in their own right and don't taste especially like they're trying to be meat). You can get Quorn that pretends to be meat, and what you will get is a dish that would fundamentally taste better if you had used actual meat instead.

Restrictions are an excuse to try a variety of things that you would not otherwise have tried, and to find new and different points in culinary space. If you make things that are inferior copies of the unrestricted version of the diet then you will be constantly reminded of how much happier you could be without the restriction in place. If instead you make things that use the ingredients available to you in interesting and novel ways, you will eat tastier things which will not suffer by comparison.

Sometimes you have to substitute of course. If you can't or won't consume dairy and can't learn to enjoy your coffee black (which is fair enough - I like coffee both ways, but they're very different drinks), you're going to have to use a non-dairy milk, and unfortunately all non-dairy milks I've encountered work very poorly in coffee. I hear good things about some, but I've never actually encountered it working well myself. Similarly if you don't eat eggs, you will have to substitute in baking in order to get a good binding agent. Substituting isn't a point of shame, and it's not something to avoid religiously, but it shouldn't be a primary basis of your diet - you should do it when you can't avoid it or it produces something interesting and different, not just because it's easy.

One of the reasons why substituting works so poorly for vegetarian food is that meat and cheese based dishes have a fundamentally different character to good vegetarian food, which is that they are built around a single strong primary flavour, with all of the other flavours acting in support of that one. This is rarely the case for good vegetarian dishes. Most vegetarian food is built out of a set of mutually contrasting and supporting flavours, with no single one dominating. Instead of figuring out what goes well with the centerpiece of the dish, you construct a dish out of components that work well together.

This style is why I think learning to cook as a vegetarian greatly improved my skills as a cook, because it forced me to think about flavour combinations to a degree that would have been less obvious as a meat-based cook. It's not that these skills don't matter when cooking with meat, but you can get by with developing them to a significantly lesser degree, and most people do.

It's also why I think most vegetarian food cooked by omnivores is terrible, especially vegan food where the lazy option of piling cheese on it is not available: They still start with a centerpiece, but it's one that is not really suitable as such. For example, you see a lot of dishes that are basically "Help how do I make a cauliflower interesting?" and the result is something that would be fine as a side dish.

A related problem is that because people aren't thinking about the balance of the dish and the multiple ingredients that go into making it work, they also miss out crucial details. Protein is the major failure mode here. It's easy to forget that protein is something you need to explicitly take into account in your dish preparation when the centerpiece of your dish automatically guarantees you have enough of it. Vegetables, rice, etc. have more protein in them than is commonly recognised, but you should still add more in to your dishes on top of that, and omnivore cooks doing vegetarian food rarely realise that this is the case.


Book Review: Rewriting the Rules

I have a major problem with this book.

Before I explain what, some context for you. I have a very strong bias towards short books. This is especially true in recommending them to people. I'm aware that my reading rate is absurd (I was expecting to finish this book within a single 24 hour period, but a cold which made it hard for me to concentrate meant it was more like a 40 hour period). Even with careful curation, if I recommend all of the books I read and like to people then it's basically like putting them in front of a book firehose. Long books are particularly bad for this - a 200+ page book is a serious investment of most people's time. A 300+ page book is probably a month of wall clock time, minimum. So when I recommend a book of that length, I am basically creating a situation where either I am asking them to devote a huge chunk of their precious time to pursuing my special interest, or they will feel vaguely guilty for not doing so. As a result, these days I basically never recommend books this long.

Anyway, back to the book. This book is 335 pages and I really think you all should read it. I actually feel this strongly enough that I've bought a second copy so that I can shove it into the hands of multiple groups of people I think need it (the rest of you can buy your own copy).

Meg-John Barker is an excellent writer. This is the third book by them that I've read, and the previous ones I've read ("Queer: A Graphic History" and "How to understand your gender") have been good. This one is great.

I've been aware of it for a while, but I've always assumed it was basically a manual on how to do poly. I didn't really feel I needed that, so it took me a while to get around to reading it.

That impression wasn't completely wrong, but it was rather incomplete. There is a chapter about monogamy and non-monogamy, and it does discuss a variety of poly relationship types, but it's less a manual on how to do poly and more a manual on how to do people (in both euphemistic and non-euphemistic senses).

The core message of the book is this: The social world is made up of narratives that dictate certain rules of behaviour. Most of what we understand about ourselves, our relationships, and how our gender and sexuality bridge those things, is the product of those narratives rather than an intrinsic property of reality. Other narratives are possible, and while these other narratives might not be better (and it would not be any healthier to hold rigidly onto them), by maintaining an open and honest attitude, and by being willing to experiment, we can find new narratives that work better for ourselves.

Most of the book is an excellent practical account of how to find and analyse these narratives in our relationships - with ourselves, our friends, our families, and our lovers. It is a thoughtful and insightful book that draws on Buddhism, therapy, and a great deal of life experience with LGBT+ issues.

The most immediately useful things about the book for me:

I will attempt to write a more insightful post about this book when I have less of a head cold, but honestly just go read it.


Notes on Queer Life as Combat Epistemology

There are a bunch of concepts that my brain is still processing and I need to work on a bit more before I can turn them into a real essay, but here are some interim notes on them that may be useful. They're not fully formed and may be poorly explained.

The two concepts I want to talk about are the experience of being illegible, and marginalisation (with a particular focus on LGBT issues and neurodivergence, because that's what I know about) as a form of combat epistemology.

That sounds very esoteric but I promise you they're both extremely useful concepts.

The notion of legibility comes from James Scott, who is an anthropologist with a focus on anarchism. I know nothing about James Scott's sexuality or neurotypicality, but his analysis of the similarities between Soviet farm collectivization and 19th Century German scientific forestry turns out to be a very useful tool for understanding LGBT issues.

I'm not fucking with you, honest.

The legibility of something is how easily comprehensible it is in simple terms. Shared land ownership with complex covenants and rights is legible to the few people involved in the process, but illegible to a central government who doesn't really care about the land usage except to the degree that it can tax it. To make it legible, the government insists on a simple land ownership model where one person owns and has sole rights over that land, and they tax that person accordingly. This, being backed by government power, in turn changes the relationship that the locals have with the land, possibly in very negative ways.

There's a classic (i.e. mid 20th century) saying that "the map is not the territory", meaning that the terms and models we use to describe reality are not the same as the underlying reality they describe. This is true, but Scott's observation is that the map is the tool that those with power will use to reshape the territory, and they will do so by prioritising their need for comprehension over the needs of the people they have power over.

If you want to know more about the role of legibility in Scott's original analysis of it, I gave a keynote about this to a bunch of Python programmers back in 2017. There's also a written version, but they don't actually say the same thing because I got up on stage, saw 800 impressionable minds staring back at me, and panicked and all the words went out of my head.

What does this have to do with LGBT issues and neurodivergence?

Well, it comes back to some of the things I described in my review of Trans Like Me. One of the major experiences of being LGBT, neurodivergent, or (presumably) otherwise marginalised, is that the tools of interpretation that you need to understand the world are those of society. It is powerful to have a concept of nonbinary, or asexuality, or aphantasia, or dyspraxia, or any one of dozens of other hermeneutical resources (tools of interpretation) available to you. Having these will significantly improve your life if they are the concepts you need to understand your experiences.

After which, you will find that the fact that you have that concept does not mean that the society around you does, and they may be actively hostile to acquiring it (A contributory injustice).

What does this mean? It means that you are now illegible to them. You do not fit the simplified model of the world that they have in their head, and they will constantly be trying to fit you into that model anyway. Every time you find that the bathrooms only say male or female, or you encounter advice that works great if you're neurotypical and terribly for you and your brain, or you otherwise fail to fit into the right box, you are experiencing your own illegibility to society. Every time someone passes a bathroom bill, or refuses to make a venue accessible, or bans plastic straws, you are experiencing society's attempt to legibilise your role in it, possibly by excluding you altogether.

This is a shared experience of marginalisation: We do not make sense to the broader society, but in all our interactions with it they will try to make sense of us. They do this not by trying to understand our experience, but by trying to fit us into their own. This will fail in ways that harm us, because we are illegible to them.

It is also a shared experience of the non-marginalised to a lesser degree: Nobody conforms to society's expectations perfectly, and even those who are close are often in constant anxiety about it. A large part of that anxiety is driven by the fear of being weird, which is mostly about having your behaviour be illegible to others.

Now let's talk about combat epistemology (this is a deliberately somewhat tongue-in-cheek term and I'm not that wed to it).

As far as I know this is a term I've made up for what I'm about to describe. I've seen occasional usage of it, but mostly in reference to Charles Stross using it in one of his books. I think his use is compatible with but a subset of mine.

What do I mean by combat epistemology? Combat epistemology is the set of techniques that lets you fight epistemic injustice (injustice done to you in your capacity as someone who is attempting to understand the world). Combat epistemology is the construction of knowledge and understanding while embedded in an adversarial environment. You need to understand the world, but the world does not want you to understand it. They deny you information, they feed you lies. In this environment, how will you come to understand what you need to?

How do you do combat epistemology? That is an excellent question that I would very much like a good answer to, but I would like to suggest that LGBT+ and other marginalised communities contain a great many experts in practical combat epistemology, and identifying them and finding out exactly what it is they do would be a great way to work on this. For example, Meg-John Barker's "Rewriting the Rules" (which I'm currently about halfway through. Review to follow when I've finished) can be regarded as an excellent handbook in practical combat epistemology.

So, more research is required. In the meantime I would like to tentatively offer the following possible tactics for fighting an epistemic war:

  1. Build communities of knowledge around differences and marginalisations and actively discuss the ways in which your understanding of the world is under attack. Attempt to construct a better understanding of the world between you. (Hopefully you are already doing this).
  2. Err strongly on the side of over-communication, at least among people you trust. It is much easier to notice that this is happening if you share your experiences with others.
  3. Actively create tools for understanding the world. Have you noticed something important? Talk about it with other people. Build a language for describing it.
  4. By differing from mainstream society, you become a domain expert on a particular shared aspect of human experience. There are problems you experience more acutely than the norm, and which you have access to more perspectives on, but that doesn't mean that people who lack your shared experience are not experiencing those problems and suffering as a result. This means that you are in a very powerful learning environment for figuring out how to tackle a widely experienced problem. For example, asexual people often understand sexual attraction much better than sexual people. ADHD people know a great deal about attention management. Poly people know a great deal about communication in relationships. These are things everyone struggles to understand, but that some communities do dramatically better than others. Share those tools, both because they will help people and because with their success you will bring the broader society's understanding of the world some way towards your own.


Testing if nearness is an equivalence.

Suppose you have points \(X = \{x_1, \ldots, x_n\}\) and an arbitrary pseudometric \(d: X^2 \to \mathbb{R}^+\). Let \(\epsilon > 0\). Can you decide whether the relationship \(x \sim y \equiv d(x, y) \leq \epsilon\) is an equivalence relationship in better than \(O(n^3)\) time?

You certainly can't do it in less than \(O(n^2)\) time. Consider \(d(x_i, x_j) = 1_{(i, j) = (u, v)}\) and \(\epsilon = \frac{1}{2}\). Any algorithm that has not checked all pairs \(u, v\) will fail to distinguish one of these from the pseudometric that is always \(0\).

I'm reasonably confident the answer is that there is no better than \(O(n^3)\) solution but I haven't checked. I'm just writing this problem down because it popped into my head and I wanted to get it out of there.

Update: No I'm being stupid. There's a general \(O(n^2 \log(n))\) algorithm to check if any reflexive and symmetric relationship is an equivalence relationship. It works as follows:

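import itertools

def is_equivalence(points, r):
  # Assumes r is reflexive and symmetric; points must be hashable and distinct.
  vec_to_fingerprint = {}

  # Each point's row of r-values is mapped to a small integer: rows seen
  # before reuse their integer, new rows get the next unused one.
  fingerprints = {
    x: vec_to_fingerprint.setdefault(
      tuple(r(x, y) for y in points),
      len(vec_to_fingerprint))
    for x in points
  }

  # r is transitive iff every related pair has identical rows.
  return all(
    not r(u, v) or (fingerprints[u] == fingerprints[v])
    for u, v in itertools.combinations(points, 2)
  )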

We "fingerprint" each point by the set of points it is equivalent to, turning it into integers we can compare in \(O(\log(n))\) time. We now test whether every equivalent pair has the same fingerprint. If yes, then the equivalence relationship is just whether the two points have the same fingerprint. If no, that witnesses an intransitivity.

I don't know if you can get rid of the \(\log(n)\) factor, but now I really don't care.
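As a sanity check on the fingerprinting idea, here is a minimal self-contained sketch that compares it against the obvious \(O(n^3)\) transitivity test. The metric \(d(x, y) = |x - y|\), the helper names `near` and `is_transitive_brute_force`, and the two sample point sets are all assumptions chosen for illustration, not part of the original problem.

```python
import itertools


def is_equivalence(points, r):
    """Fingerprint check; assumes r is reflexive and symmetric."""
    vec_to_fingerprint = {}
    fingerprints = {}
    for x in points:
        row = tuple(r(x, y) for y in points)
        # Rows seen before reuse their integer; new rows get the next one.
        fingerprints[x] = vec_to_fingerprint.setdefault(
            row, len(vec_to_fingerprint)
        )
    return all(
        not r(u, v) or fingerprints[u] == fingerprints[v]
        for u, v in itertools.combinations(points, 2)
    )


def is_transitive_brute_force(points, r):
    """Reference O(n^3) check: every chain u ~ v ~ w must give u ~ w."""
    return all(
        r(u, w)
        for u, v, w in itertools.product(points, repeat=3)
        if r(u, v) and r(v, w)
    )


def near(epsilon):
    """Nearness relation for the (assumed) metric d(x, y) = |x - y|."""
    return lambda x, y: abs(x - y) <= epsilon


clustered = [0.0, 0.1, 0.2, 5.0, 5.1]  # two tight clumps, far apart
chain = [0.0, 0.4, 0.8]                # 0 ~ 0.4 ~ 0.8, but 0 is not near 0.8

for pts in (clustered, chain):
    r = near(0.5)
    # The fast check should always agree with the brute-force one.
    assert is_equivalence(pts, r) == is_transitive_brute_force(pts, r)
```

With \(\epsilon = 0.5\) the clustered points split cleanly into two classes, while the chain is the standard way nearness fails to be transitive: each step is small but the ends are too far apart.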


Book Review: 100 Ways to Improve Your Writing, by Gary Provost

I mostly bought this book because of how much I love the following passage:

This sentence has five words. Here are five more words. Five-word sentences are fine. But several together become monotonous. Listen to what is happening. The writing is getting boring. The sound of it drones. It’s like a stuck record. The ear demands some variety. Now listen. I vary the sentence length, and I create music. Music. The writing sings. It has a pleasant rhythm, a lilt, a harmony. I use short sentences. And I use sentences of medium length. And sometimes, when I am certain the reader is rested, I will engage him with a sentence of considerable length, a sentence that burns with energy and builds with all the impetus of a crescendo, the roll of the drums, the crash of the cymbals–sounds that say listen to this, it is important.

As you can see in this passage, Provost is very keen on writing in a style that works well when spoken. Fortunately, this is the objectively correct position, so that's a point in his favour. In fact, this book made me realise that actually reading out loud is very different from reading in your head, so I'm going to try to do that more often when writing.

This is one of the best passages in the book, but the rest of the book is also very good, and contains a large number of useful tips. It's also very short.

In general the book is very prescriptivist, but it's a good kind of prescriptivism - one that makes it clear that prescriptive suggestions are tools, not laws. He's quick to point out the limitations of the techniques he suggests, and that no rule is universal.

I also liked the following passage from the conclusion:

Writing is art, not science, and when I finish a piece of writing, I do not review every single one of my tips. I ask, have I communicated well? Have I pleased my readers, have I given them something that is a joy to read? Have I entertained them, informed them, persuaded them, and made my thoughts clear to them? Have I given them what they wanted?

I like this because it not only points out the limitations of the rest of the book, but also stands alone as an excellent piece of advice in its own right.

I don't agree with everything Provost suggests, but nobody's perfect. For example he hates footnotes (and parentheses), which I love. The material on research is also quite dated. Fortunately, because it's a tool box rather than a set of commandments, you are free to pick and choose whichever of its tools work for you, and ignore the ones which don't.

If you're looking to improve your writing, I recommend this book as a useful and accessible way to do so.

On which note, I will leave you with the following quote from the introduction:

If your writing does not improve after you have read this book, you have not failed. I have. It is the writer's job, not the reader's, to see that the writing accomplishes whatever goal the writer has set for it.


Not the Box

This is an analogy that's been bouncing around in my head waiting for something to glom onto for a few weeks.

I don't think they still do this, but for the longest time my parents had an amusing verbal quirk, which is that they would refer to the main tower of the desktop computer as "the CPU" (the CPU being actually just a single component of the computer). They conflated the thing that does the core part of the function of the computer with the entire system - ignoring the fact that a CPU without a motherboard, RAM, a hard drive, etc. wouldn't really do very much of interest.

People do this with brains too. Dualism isn't real, but you are not a brain, you are a body, and your mind doesn't live in the brain, the brain is just the CPU. The mind is the whole box.

(Yes I know brains have memory and long-term storage as well as processing. It's not a perfect analogy, and brains don't work like computers anyway)


Book Review: Trans Like Me

This book is very good and I think you should read it.

(I am tempted to leave the review at that, but there's interesting stuff in the book that I want to discuss).

This book is patiently but insistently written. It clearly spells out what the trans experience is like, and it pulls no punches in doing so, but even when CN's frustration with the world was palpable in their writing I never got the sense that they were frustrated with the reader. It was always "this is how the world is and how it looks to us, I wish to help you get up to speed on a shared understanding of that".

That view of the world is at times very personal, and coloured and shaped by CN's particular experience of the world as a nonbinary person in a particular time, place, and context. I don't think it could be otherwise, and they make no pretence that it is. They do their best to round it out by referring to the works and experiences of other trans people, and their best is very good. It is necessarily not "definitive", but no view is. I think this book nevertheless gives a valuable and balanced view that most people (especially people who are not already knowledgeable about trans issues) would benefit from reading.

A while ago I asked about good books about gender to recommend to cishet people, and didn't get any very satisfying answers, but I think this book is a good answer to that question. Perhaps it would look different to a reader who was actively transphobic, and such a reader would feel like the book was dripping with judgement of them, but I don't think so. CN is remarkably restrained in that regard. It always feels like they are saying "This is what they (you) are doing and this is how it is hurting us". There are remarkably few places in which I felt like the message was "this is why they are bad people".

The book will doubtless look different again to trans readers. I believe it would be helpful, especially for people who are just coming to terms with their identity and are thus not necessarily well versed in trans issues themselves, but it's hard for me to accurately say. I did do some careful googling to make sure that I wasn't about to recommend a book about the trans experience that trans people hated, and as far as I can tell the reception from trans readers has been generally very positive.

What did I, personally, get out of the book?

Well it wasn't news to me that these things happened. I found the description of dysphoria quite helpful - I vaguely understood what dysphoria was, and had heard people experiencing it describe it before, but CN describes their experience of it very clearly in a way that I think helped me to understand a bit better.

Beyond that the specifics were mostly things I already knew, with a few useful details added.

I tweeted the following paragraph from it (page 130 in my copy, in the chapter "Are trans people real?"):

I had, and most likely still have, a tendency toward didacticism. It made me feel superior, when most of my world told me I was wrong. I am so thankful to all the people who have helped me to unlearn the defense of believing my particular truth to be universal. They taught me to really listen to other people, and to accept the limits of my own knowledge. I have never really liked putting my self into words. Listening taught me that the labels that confined me could liberate others. That the right answer for one person could become the wrong answer for another, and that all we could do was lend support in our shared individuality.

This paragraph resonated particularly strongly for me. As a cis man I do not share CN's particular experiences, but this part I feel very personally.

The most important take home from this for me was CN's framing of how this all fits together, and that as a result of that this was an extremely good book on practical epistemic justice (I don't think this is something CN consciously knew they were writing, but perhaps that's what makes it such a good book on the subject). This is actually the context in which it was recommended to me. I'm not sure I would have consciously picked up on this without that context, but it's a very useful framing of the book.

Epistemic Justice is justice towards people in their role as epistemic actors - people who reason and know about the world, and participate in a community of knowledge. The concept comes from Miranda Fricker's book "Epistemic Injustice: Power and the Ethics of Knowing", although honestly I'd probably recommend just reading "A Cautionary Tale: On Limiting Epistemic Oppression" by Kristie Dotson. It builds on Fricker's work, contains some useful concepts that the book does not, and corrects some limitations in the book's view. It's also much shorter and easier to read.

Part of why I recommend the paper over Fricker's work is that it focuses on what I think is the more powerful of the two concepts in Fricker's work and expands upon it. Fricker introduces two concepts of epistemic injustice: Testimonial and hermeneutical. Testimonial injustice is the discounting of a person's testimony based on who they are rather than what they say. This problem is prevalent, real, and very serious, but I think it is already well known (among its targets) that this happens, and I think it is obviously bad even without the framework of epistemic injustice in which to place it. This is not to diminish it as a problem, only to say that I don't think that the view of epistemic justice is that helpful in framing it.

The two other types of epistemic injustice described in Dotson and Fricker's work are hermeneutical and contributory injustice. In order to explain those I need to tell you what a hermeneutical resource is.

Hermeneutics is the study of interpretation. In the context of epistemic injustice what we are specifically interested in is interpretation of the world, so the tools of interpretation are the tools that we use to construct narrative and meaning that helps us understand the world around us. A hermeneutical resource is one of these tools - for example a word that describes a concept. The idea of being nonbinary for example is a hermeneutical resource.

A hermeneutical injustice is an injustice done to someone by denying them the hermeneutical resources that they require in order to understand their experience. Growing up as a nonbinary person who has been denied the concept of nonbinary is a hermeneutical injustice. The fact that you had a hermeneutical injustice done to you without having the concept of hermeneutical injustice to explain how it harmed you is also a hermeneutical injustice.

A third type of epistemic injustice, proposed by Dotson as an example that Fricker missed, is contributory injustice. A contributory injustice is a refusal to add other people's hermeneutical resources to your own, which denies them the ability to participate as equals in your culture of knowledge. Imagine (you probably don't have to imagine) someone who refuses to accept the concept of nonbinary as valid. This is a contributory injustice (and a dick move) - they are refusing to acknowledge the validity of a hermeneutical resource that you use to interpret the world.

Much of "Trans Like Me" can be read as a very hands on account of the contributory and hermeneutical injustices done to trans people, alongside a fourth type of epistemic injustice that I'm not aware of being discussed in the philosophy community that CN refers to as the production of ignorance. The production of ignorance occurs when you don't just deny people the hermeneutical resources they require to interpret the world, but you provide them with ones that will cause them to interpret the world in a way that is incorrect or harmful. The mainstream media's presentation of trans rights reliably engages in the production of ignorance, with biased and outright transphobic stories that cause people to paint the experience of trans people in a very different light than is warranted.

As I said, I don't think this view of the book as a book about epistemic injustice is intended, but I think it is a very powerful way of reading it, and doing so has certainly improved my understanding about both the trans experience and epistemic justice, and I am very grateful to CN for that opportunity.


Notes from TickTalk

We ran the first play test of TickTalk recently. We ended up undoing a bunch of changes I'd made to it and implementing a couple more. It went extremely well and was some of the most insight dense conversation I have had in recent memory.

I didn't do note taking during the session because I was trying to focus on the system and also participating and I didn't have enough bandwidth. Others took notes, but I don't know if they're going to write them up.

Fortunately what I do have is the topics we discussed, so these are my notes based on those and memory. Note: I'm treating these notes as if we were under Chatham house rule, so I've not attributed anything to anybody except occasionally me. I don't think there's anything in here that would be at all embarrassing to identify, but better to err on the side of caution. Also because I'm going from memory I don't remember who every point or question came from.

These were:

I don't remember offhand which we discussed for five minutes and which we discussed for ten.

I'm going off memory and editorialising heavily, but the following are ideas I remember us generating.

What can we learn about meeting design from tabletop roleplaying games?

This turned out not to be a very good question. I intended it as a sort of troll question but annoyingly it was the first that came up. Some things I liked:

I'm not sure we learned anything very deep from this question though.

How do we tackle a conversation where you need to disappoint the expectations of your counterpart? Especially e.g. quitting your job.

There were two perspectives on this: shit sandwich (something nice, something bad, something nice) vs launch right into it.

We also talked about how much you want to forewarn someone. The use of "Can we have a meeting" gives someone time to prepare, but "I'd like to hand in my notice" is the equivalent of breaking up with someone over text.

We suggested that it helps to make clear that this isn't a decision you've come to lightly, and that you don't think that this is their fault (even if you do think that it is their fault). Unpack some of the reasons in your meeting, but don't overwhelm them with information - they need time to process. Offer to send them a written statement elaborating. It may be helpful to write that statement in advance to put your thoughts in order anyway.

Remember that, ultimately, people quitting is a thing that happens, and dealing with that is part of their job as a manager. Handling it well is a good thing for you to do, but if they handle it badly then ultimately that's their problem.

What can we use to refocus a meeting that has gone off-topic?

This went a bit all over the place and I don't remember it very well. Suggestions I remember:

How do you deal with challenging people/conversations in a group?

Depends on the context. It's better to take them aside afterwards and explain the problem than it is to try and confront them in the meeting, especially if they are someone with power, as they will tend to double down. Try to get them to buy in to the actual point of the meeting.

In a meetup group, if this doesn't work you can just kick them out.

Remote communication (e.g. via slack) - what are the benefits and drawbacks versus face to face communication?

Remote communication is good for low importance low interrupt stuff ("Can anyone help me with X?" "Hey, Y, when you have a minute I need Z but it's not urgent") so is good for not disrupting people, but it goes sour fast for things that require nuance or depth. Face to face communication is ridiculously high bandwidth in comparison (we didn't talk about voice calls - probably a good substitute when remote?).

The approach we suggested was to start conversations remote but quickly move them face to face as soon as they get in depth. We talked a bit about one couple who found they always got into major fights over whatsapp but never in person so dialled down their usage of it.

One question I have now writing this up is where things like IRC and Twitter fall in. I've had some amazing conversations on both that would never have worked in meatspace.

How do you introduce checkins (e.g. making sure that the friend you are interacting with is happy and feeling positive about what you're currently doing) without people feeling they have done something wrong?

A bunch of good suggestions here. I don't remember enough of them. I hope someone who took notes on this writes this one up.

From what I recall:

How do you decide what your career should be, or what you should do with your life?

Don't. Long-term planning is for suckers and vocations are mostly a lie. Instead work on watching for opportunities, experimenting, doing interesting things, and building a foundation for later.

We talked about a variety of ways of knowing when that's working for you - setting personal goals, journalling, deciding what you're actually getting out of it. Set milestones in the future to check in on your progress, and set yourself OKRs. Look into the future and tell yourself a story about how it goes if you keep doing what you're doing, then see how you feel about it. Keep a journal and use it for emotional processing.

There were some good suggestions here. I hope someone else writes up their notes, because I'm sure I'm forgetting some.

How do you notice what decisions you are procrastinating on making?

I think the answers we had here have merged in my head into the answers we had to the previous question.

How do you prioritise how you spend your free time? e.g. Learning things, free time with friends. What are the pros and cons?

This mostly became a question about calendaring and todo lists. I only remember two specifics:

  1. I proposed using a "GP appointment booking" system to prevent your calendar filling up too far in advance - have a certain number of evenings, weekends, etc. reserved for "emergency booking" that can only be scheduled closer to the time. Someone reported that they already implicitly do something similar by booking free days in their calendar explicitly and occasionally overriding them closer to the time.
  2. We should spend some of our free time doing more of these because it was great.

We also talked about the problem of cycles of overbooking and burnout leading to underbooking.

How do you talk to people about their privilege?

I thought this was a good discussion but frustratingly I don't remember the specific suggestions very well. I remember us suggesting:

I swear we had some other specific concrete things.

General Conclusions

I wish I had been taking more detailed notes, but this was otherwise very good. I think the main things I will take from it are:


How to write when writing is hard

(Lightly edited and expanded upon from a message to a friend. Will turn into something longer at some point)

Meta advice on how to do hard things: Find things that you can already do that are like the things that you can't do. Analyse what the difference is, and then try variations where you change one thing about them to make them more like the thing you can't do. Work on those variations until they are no longer hard.

As a specific application to writing, the most important thing if you are writing less than you want to be writing is to get comfortable with the basic process of writing. Write literally anything. Tweeting is writing. All writing counts.

If you struggle with the process of writing, try keeping a journal, where you write whatever you want (it doesn't have to be a diary. I am practically allergic to the concept of keeping a diary for some reason, but a journal where I just write stuff I'm thinking about down is fine). The journal is a place for fully unfiltered writing where there is no question of editing it or quality control.

I like journalling. The nice thing about journalling is it's just for me. There is no chance of me ever sharing my journal. I will eat my journal before I let you read it. This makes it much easier to shut out the inner perfectionist.

If unstructured journalling doesn't work for you, try coming up with some good prompts. I don't do structured journalling, so I don't have any great suggestions, but maybe start by taking notes on who you saw today and what you talked about.

Back to the meta advice, the following are things that it is common to struggle with in writing:

Not everyone struggles with all of these. Most people struggle with more than one of them.

I cannot say this strongly enough: If you are struggling with more than one of them and it is impacting your ability to write then work on a single one of them at a time. Find a form of writing where that is your only problem and do that until you are comfortable with it.

The problem is that if you are struggling with multiple hard things then a) You are playing on hard mode and b) You are playing on hard mode and learning less than you would be if you were playing on easy mode. Having multiple sources of difficulty means you are trying to figure out multiple problems at once and they're all getting in the way of each other, which means you are getting worse feedback than you would if you were only working on one problem at once. Work on one problem at a time and you will get better at it much faster than if you try to get better at many things at once.


Notes on Lagrangian Duality

A long standing failure of mine is that I've never really understood Lagrangian duality, and it's come up enough times that this is starting to get embarrassing. The big problem I've always had is that I could follow the logic but I didn't really get why this was at all a sensible thing to do, so it never really stuck in my head.

Today was dedicated study time for me, and the subject came up again, so I decided to have another crack at it. I think I've finally understood what's going on, and that the problem I was having was that people kept filling their explanations with distracting details. I'm sure someone must have explained it to me this way before, but for some reason this time I was able to frame it in exactly the right way for me and it clicked. As such, this may not resonate for you.

The trick for me was to make the problem more abstract to start with.

Suppose we've got some abstract set \(D\) and a function \(g: D \to \mathbb{R} \cup \{\infty\}\). Suppose further we've got some \(C \subseteq D\). The constrained minimization problem is that we want to find \(p^* = \inf\limits_{x \in C} g(x)\).

In general this is intractable, but if \(g\) and \(C\) have some special structure we may find it pretty easy. e.g. if \(g\) is a convex function and \(C\) is a convex set, you can just differentiate or do gradient descent or whatever.

One way to make optimisation problems more tractable is the idea of relaxations, where you replace the problem with one that is "easier" in the sense that any optimal solution to the harder problem can be turned into a solution to the easier problem that is at least as good, but not vice versa ("at least as good" is important. For example replacing \(g\) with \(g(x) - 1\) is a valid relaxation, though not a very useful one).

The utility of relaxations is two-fold:

The easiest relaxation is just to minimize \(g\) over all of \(D\) instead of \(C\), throwing away the constraints, but this doesn't give us a way of tightening up the relaxations easily. Other examples of relaxations include e.g. dropping some constraints, allowing an integer valued variable to take arbitrary reals, requiring the constraints to only be satisfied to within \(\epsilon\).

There is one particularly useful class of relaxations. Let \(\mathcal{F} = \left(\mathbb{R} \cup \{\infty\}\right)^D\) and \(f \in \mathcal{F}\) be such that \(f|_C \leq g|_C\). i.e. \(f\) can be arbitrary outside \(C\) but on \(C\) it cannot ever exceed \(g\). Let \(R_{g, C}\) be the set of such \(f\). Then necessarily \(\inf\limits_{x \in D} f(x) \leq \inf\limits_{x \in C} f(x) \leq \inf\limits_{x \in C} g(x)\) - i.e. such an \(f\) is a valid relaxation, and as such gives us a lower bound on \(p^*\).

Our goal is to choose \(f\) so that it is very large outside of \(C\) and very close to \(g\) inside of \(C\). If we can ensure that \(|f - g|(x) \leq \epsilon\) inside of \(C\) and that \(f\) attains within \(\epsilon\) of its global minimum in \(C\), we can get our approximation to within \(2 \epsilon\).

In fact, we can reduce \(\epsilon\) to zero! Pick \(f(x) = g(x)\) when \(x \in C\) and \(f(x) = \infty\) when \(x \in D \setminus C\). Then we have \(\inf\limits_{x \in D} f(x) = \inf\limits_{x \in C} f(x) = \inf\limits_{x \in C} g(x)\), and so the bound is exact.

Thus we have \(p^* = \max\limits_{f \in R_{g, C}}\inf\limits_{x \in D} f(x)\) - all such \(f\) give us a lower bound on \(p^*\), and there is at least one such \(f\) that achieves that bound.
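To make the lower-bound argument concrete, here is a small sketch on a finite domain where every infimum is just a `min`. The domain `D`, constraint set `C`, and objective `g` are hypothetical choices made only for this illustration; the two relaxations are "drop the constraints" and the exact "infinite outside \(C\)" choice described above.

```python
import math

D = range(-5, 6)                  # hypothetical finite domain
C = [x for x in D if x >= 2]      # hypothetical constraint set


def g(x):
    return (x - 1) ** 2           # hypothetical objective


p_star = min(g(x) for x in C)


def relaxation_bound(f):
    """inf of f over all of D; a lower bound on p_star when f <= g on C."""
    assert all(f(x) <= g(x) for x in C), "f must not exceed g on C"
    return min(f(x) for x in D)


def drop_constraints(x):
    return g(x)                   # f = g everywhere: valid but loose


def exact(x):
    return g(x) if x in C else math.inf   # f = g on C, infinity elsewhere


loose = relaxation_bound(drop_constraints)
tight = relaxation_bound(exact)

# Both are lower bounds; the second one is exact.
assert loose <= tight == p_star
```

Here dropping the constraints finds the unconstrained minimiser outside \(C\), so its bound is strictly below \(p^*\), while the infinite-penalty relaxation recovers \(p^*\) exactly.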

The problem with this observation is that it is useless on its own - this version isn't any easier to solve than the previous one!

The thing that makes it easier is that we can now consider a restricted subset of \(R_{g, C}\) where this problem is easier to solve. Unfortunately, this destroys our equality. Suppose we have \(M \subseteq R_{g, C}\), all we can now claim is \(p^* \geq \sup\limits_{f \in M}\inf\limits_{x \in D} f(x)\), as we might have thrown away the values which are sufficiently close to the maximum as to actually attain it.

The Lagrangian gives us such a family of nicer functions. Suppose we can define \(C\) as the solution set to \(\boldsymbol{h}(x) \leq 0\) for some vector valued \(\boldsymbol{h}\). For any \(\boldsymbol{\lambda} \geq 0\) the function \(f_{\boldsymbol{\lambda}}(x) = g(x) + \boldsymbol{\lambda} \cdot \boldsymbol{h}(x)\) is in \(R_{g, C}\): on \(C\) we know that \(\boldsymbol{h}(x) \leq 0\), and \(\boldsymbol{\lambda} \geq 0\), so \(\boldsymbol{\lambda} \cdot \boldsymbol{h}(x) \leq 0\) there and so \(f_{\boldsymbol{\lambda}} \leq g\) on \(C\).

Suppose we can calculate \(t(\boldsymbol{\lambda}) = \inf\limits_{x \in D} f_{\boldsymbol{\lambda}}(x)\) (literally we suppose that. In general we can't, but if \(g\) and \(\boldsymbol{h}\) have some nice form then this is now a simple unconstrained optimisation problem). By our above discussion we know that \(p^* \geq \sup\limits_{\boldsymbol{\lambda} \geq 0} t(\boldsymbol{\lambda})\). This latter quantity is called the dual solution, \(d^*\), and we have proved the primal-dual inequality that \(d^* \leq p^*\).
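A tiny worked example may help here. The problem below is one I've picked for illustration (it is not from any particular reference): take \(D = \mathbb{R}\), \(g(x) = x^2\) and a single constraint \(h(x) = 1 - x \leq 0\), so \(C = [1, \infty)\) and \(p^* = 1\).

```latex
\begin{align*}
f_\lambda(x) &= x^2 + \lambda (1 - x), \qquad \lambda \geq 0 \\
\frac{\partial f_\lambda}{\partial x} = 2x - \lambda = 0
  &\implies x = \tfrac{\lambda}{2} \\
t(\lambda) = \inf_{x \in \mathbb{R}} f_\lambda(x)
  &= \tfrac{\lambda^2}{4} + \lambda - \tfrac{\lambda^2}{2}
   = \lambda - \tfrac{\lambda^2}{4} \\
t'(\lambda) = 1 - \tfrac{\lambda}{2} = 0
  &\implies \lambda = 2, \qquad
  d^* = t(2) = 2 - 1 = 1 = p^*
\end{align*}
```

In this case \(t\) is a concave quadratic in \(\lambda\) and the bound happens to be tight, but nothing in the argument so far guarantees that in general; it only guarantees \(d^* \leq p^*\).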

Why is this a remotely useful thing to do?

Well the main reason is that calculating the dual solution can be much easier for two reasons:

  1. It has much simpler constraints - the only constraint is that \(\boldsymbol{\lambda} \geq 0\), so the domain of our optimisation is a particularly simple convex set.
  2. It is the maximization of a concave function. We defined it as the infimum of a number of linear functions (of \(\boldsymbol{\lambda}\). They may be highly non-linear in \(x\), but that doesn't matter!), so it is the infimum of a set of concave functions and thus concave.

What this means in particular is that what we have is a convex optimisation problem (the function is concave, but we're maximizing it so it's convex optimisation)! Convex optimisation is easy, so calculating the dual is hopefully easy once you've got to this point.

Note that this is true regardless of literally anything about the original problem - no structure or continuity is assumed (although good luck calculating \(t\) if you don't have some structure and continuity).
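
As a concrete illustration of the inequality \(d^* \leq p^*\), here is a minimal numerical sketch in Python. The toy problem (minimise \(g(x) = x^2\) subject to \(h(x) = 1 - x \leq 0\)), the finite grid standing in for \(D\), and all of the names are assumptions made up for this example rather than anything from the derivation above.

```python
# Toy problem (an assumption for illustration): minimise g(x) = x^2
# subject to h(x) = 1 - x <= 0, i.e. x >= 1.

def g(x):
    return x * x

def h(x):
    return 1.0 - x

# A finite grid standing in for the domain D = [-5, 5].
xs = [i / 100 for i in range(-500, 501)]
# Candidate multipliers lambda >= 0.
lams = [i / 10 for i in range(0, 61)]

def t(lam):
    """Dual function: unconstrained infimum over D of g(x) + lam * h(x)."""
    return min(g(x) + lam * h(x) for x in xs)

# Primal value: infimum of g over the feasible set C = {x : h(x) <= 0}.
p_star = min(g(x) for x in xs if h(x) <= 0)

# Dual value: supremum of t over lambda >= 0.
t_values = [t(lam) for lam in lams]
d_star = max(t_values)

def is_concave_on_grid(values, tol=1e-9):
    """Midpoint concavity check on an evenly spaced grid."""
    return all(
        values[i] >= (values[i - 1] + values[i + 1]) / 2 - tol
        for i in range(1, len(values) - 1)
    )

print(p_star, d_star, is_concave_on_grid(t_values))
```

For this particular problem the dual can also be done by hand: \(t(\lambda) = \lambda - \lambda^2 / 4\), which is maximised at \(\lambda = 2\), so the gap happens to be zero. In general only the inequality is guaranteed.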

Things I still do not understand:

  1. When the primal-dual gap is zero (i.e. when the inequality is an equality). I'm aware of some of the theorems about this but I need to study them.
  2. What the actual interpretation of the Lagrange multipliers (\(\boldsymbol{\lambda}\)) is. I've seen a bunch of things about "shadow prices" and I had some idea what that meant, but I don't yet understand how that fits in with this interpretation.


Dietary Experimentation

Content Warning: Food, weight.

Attention Conservation Notice: Person on a diet is talking about the diet he is on.

Disclaimer: I am not a doctor. Please do not take any of this as professional medical advice.

I'm fucking around with my diet a lot at the moment. This is a post about some of the things I'm doing, how I'm making them doable, and why I think it makes sense. It's 50% note to self, 50% explanation of what this weird thing I'm doing is.

This is still very early days, so although initial impressions are positive and I am hopeful about its effects, it is very much not advice to follow my specific diet. Even I don't intend to follow this diet indefinitely.


As you may or may not know I have a lot of low grade ongoing health problems. I've been considering the possibility that some of these might be diet related for a while but not really doing anything about that. I recently read Valerie Aurora's post on the subject in which she lists the following problems as possibly diet linked:

  • Skin problems (acne, blackheads, red spots, redness, scaly skin, oily skin, rashes, etc.)
  • Fatigue
  • Depression
  • Insomnia
  • Gastrointestinal problems (acid reflux, gas, IBS, diarrhea, constipation, etc.)
  • Joint pain
  • Muscle pain
  • Migraines or headaches
  • Asthma
  • Frequent colds and sinus infections

Remember when I said I have low grade health problems? What I mean is I experience literally all of these (except maybe asthma. My doctor is trying to convince me I have asthma, but I disbelieve their statistical methodology. I think I have some sort of non-asthma related breathing problem and all of the interventions they claim allegedly reversed some of it were noise they were pretending was signal).

Some of this is obviously the Barnum/Forer effect. These are all things that "everybody" has to some degree, but I'm an outlier on most of them. I don't consider this conclusive evidence of anything but it's at least a nudge that this is something I should be looking into.

I identified the following as candidates for suspicion:

On top of this I've already been trying to eat more vegetables, and I've recently decided to make a concerted effort to lose weight (although haven't been doing a great job) so I've been trying to reduce the amount of carbs I eat on top of that. While I experiment with the dietary restrictions these are not the priority, but it would be nice to keep them up if I can.


My approach is to change literally everything about my diet. This probably sounds like a terrible idea, but it's OK - I promise I know what I'm doing. It's not completely outside the realms of what people advise, and it's something that works well with my particular psychology and health level.

The point of changing everything is not to learn what is causing the problem or to fix things. The point of changing everything is to find out if there is a dietary intervention that will have a non-trivial effect. The new diet is not intended for long-term use, but is an existence proof.

I know that the new diet will be safe for me because I don't have anything really badly wrong, just a lot of low grade problems, so I'm free to experiment and basically shunt my entire diet onto a new and very different equilibrium point. Once I have reached that point and observed the results, I can triangulate between it and my prior diet - reintroducing things, eliminating things, and seeing what happens as I make incremental changes. I've alluded to this approach before.

So, I'm making the following dietary changes:

  1. Eliminate everything on the list of suspicious foods.
  2. Remove almost all of my normal restrictions - e.g. my diet is normally low in meat and contains almost no fish, and I try not to buy snack food.
  3. Generally aim for a high-protein, high-fat, high-fiber diet.
  4. Eat a lot more fruit.
  5. Meals should still be at least 50% vegetables.
  6. As long as I satisfy the above requirements I may eat whatever I want, and I am forbidden from feeling guilty about doing so.

Initial Results

How's it going so far? Well, it's a lot of fun.

Yes, really.

There are roughly three things going on here:

The first is what I'm terming "restriction as reward" for the moment, though that name isn't quite right. Because I have such strong restrictions in place I am explicitly giving myself permission to not feel guilty about what I do once those restrictions are met. I made meatballs recently! I bought chocolate! I'm going to experiment with variations on my brownie recipe to finally perfect a gluten-free dairy-free version, at which point I will nom the hell out of them. Although I am putting restrictions in place, the fact that I don't have to think about what I'm eating beyond those restrictions makes the whole thing a mini holiday - here David, have a delicious reward.

(If this causes me to eat too much sugar then I might have to dial it back a bit, but the meat and fish are a pretty big part of the reward aspect and they are an active part of the experiment).

The second is that this gives me an excuse to experiment. I like cooking, and have fallen into a bit of a rut, so this is forcing me to try all sorts of new things that I don't normally do. Expect a bunch of recipe posts on the notebook as a result. This aspect is also a useful learning experience for how to cook better for people I know with dietary restrictions, which is always nice (I would describe myself as "good but not great" at this normally. A large chunk of what I cook is naturally almost vegan and almost gluten free, so it's not too hard to adapt that, but it tends to result in food that is on the less interesting end of what I'd normally make).

The third is what you might think of as the "tube strike effect": When there's a tube strike, a lot of commutes permanently change afterwards, because the constraint forced people to try something different and they discovered they liked it better than their existing system (repeat after me: People are not optimizers). Trying out these restrictions gives me a kick to find out things that I would like and keep eating even if the restriction is lifted. For example, I have learned (or possibly been reminded - I can't imagine ever not knowing, but I might not have known that I knew) that fresh pineapple is ridiculously cheap and I really like it.

Is it helping? Hard to say so far, but I think so. Subjectively I feel maybe a bit better. The period since starting it has been incredibly productive, but it's hard to distinguish hypomania from health.

My bowel movements have definitely become more regular. I have dropped almost a kilo in the two days since starting this. I'd be alarmed by that fact except that I'm pretty sure I literally dropped it (and then flushed it), so it's probably just a happy side effect of whatever has made me more regular. All of this should be treated firmly as anecdote and not data.

Will report back later.


Book Review: The Knowledge Illusion

This book is 80% great and 20% red flags, but that doesn't mean it averages out to a book that is merely good, instead it is a great book that should be consumed well salted.

What do I mean by this?

Well, I really like the conclusions of the book and it has taught me a number of useful and interesting things, but it is fundamentally written in a popular science style. This means that its claims are stated with much higher confidence than I think is warranted, especially given the replication crisis, and it is often overly credulous about things outside of the authors' area of expertise - apparently blockchain is an amazing technology that is going to change the community of knowledge, and I nearly permanently put the book down when they started waxing lyrical about chaos theory and fractals early on. I feel like the authors should take their own advice and write causal explanations of the things they are referencing in their book in order to evaluate their own understanding.

But it's a really useful framing of how knowledge, cognition, and understanding work, and I think I would recommend it to everyone despite the above reservations.

The following are some of the interesting things in it:

The titular knowledge illusion is probably the main thing that I didn't know something about before reading this book, but I found the rest of it significantly increased my understanding of it. I think there is a decent chance that I will do a reread and take copious notes the second time through.


Pork Meatballs with Ground Almonds

I'm currently starting a restriction diet to cut out a bunch of things, which is about as hateful as it sounds. It also means that I'm temporarily giving up on my usual restrictions on eating almost no fish and not much meat, because it's too many restrictions to juggle at once. This combination means that my cooking is going to go a bit... weird over the near term future.

Anyway, here are the meatballs I made last night. They were pretty good:

Instructions: Mix all the ingredients together, smoosh them into meatballs, shallow fry in vegetable oil, turning over as they cook. They're ready when they uh look ready. I didn't exactly do science to that bit.


Doing Mathematics to People

It's very tempting to try to do mathematics to people - trying to model the world and the people in it as simple mathematical systems. It's tempting in large part because it is in some sense very effective, but it has a tendency to go quite poorly in a number of ways.

James Scott's notion of Legibility is a classic critique of this.

I've been thinking about this and the nature of mathematics a bunch recently, and the other night I managed to express this in a way that I think captures an important feature of my thoughts on the subject.

The problem is this: One of the basic operations of mathematics is that you don't think about an entire mathematical object, you ask what its important properties are - it's a group, it's a topological space, it has a distinguished element with some properties, etc. This allows you to cleanly manipulate the object by ignoring most of the details about it, by pruning it down to only its most relevant features. The problem when you do this to people is that when dealing with real world objects, the question of what is relevant or irrelevant is an intrinsically political one, so when doing mathematics to people you're not solving "the" problem, but instead the variant of the problem which is convenient to those with power.


Notes on "Towards a general mathematical theory of experimental science"

A friend linked to this paper and I thought it sounded interesting. These are my first-pass notes. The paper failed to clear my threshold for a close reading, which was a shame because I was quite into it in the abstract. I would potentially like to do a close reading of something longer on this subject, but I don't feel this paper would be worth the close reading.

Close Reading Questions

Scores go from -1 to +2. The threshold for a close reading is that the total score must be 2 or higher.

Total score: 1


The general goal here is to provide an abstract mathematical notion of what it means to do experimental science. I believe they have provided a reasonable basis for doing so, but it is unclear to me what the actual implications of this are and I feel like if they have really been working on it for as long as they say then one of the following must be true:

  1. There aren't any very interesting ones.
  2. The authors are weirdly bad at motivating their work.

Based on the fact that the paper is reasonably well written and that I have also thought about this issue and didn't come up with any very interesting implications, I am inclined to think that it is the former, but I hope to be wrong.

Roughly the model of the paper is the following:

We have some abstract set of statements associated with physical reality. These are closed under arbitrary boolean functions (i.e. if we have some set of statements \(U\) and a function \(f: \mathbb{B}^U \to \mathbb{B}\), where \(\mathbb{B} = \{\text{true}, \text{false}\}\), we can construct a statement \(f(U)\) which is interpreted as combining the truth values of all of the elements of \(U\) and passing them to \(f\)). This assumption gives the statements (up to equivalence) the structure of a complete boolean algebra.

They consider as a starting point statements which are verifiable - that is, there is some experimental procedure that would allow us to eventually determine that the statement is true. These procedures are allowed to run for infinite time if the statement is false, but must eventually terminate in finite time if the statement is true. The verifiable statements are closed under finite conjunction (i.e. if A and B are both verifiable then the statement "A and B" is also verifiable) and countably infinite disjunction (i.e. if \(A_n\) are all verifiable then \(A_1\) or \(A_2\) or \(\ldots\) is verifiable).
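
One way I find it helpful to think about these closure properties is as semi-decision procedures. The following is a rough Python sketch of that reading, not anything taken from the paper: a "procedure" is modelled as a generator that yields `False` while it is still running and `True` once it has verified its statement, and the names `verify_after`, `conjunction`, `disjunction` and `verified_within` are all made up for illustration. A finite "and" runs both procedures until both succeed, while a countable "or" dovetails, starting one more procedure each round.

```python
from itertools import count

def verify_after(n):
    """Hypothetical procedure: verifies after n steps, or never if n is None."""
    def run():
        step = 0
        while n is None or step < n:
            step += 1
            yield False  # still running, nothing verified yet
        yield True       # the statement has been verified
    return run

def conjunction(p, q):
    """Finite 'and': step both procedures until both have verified."""
    def run():
        a, b = p(), q()
        done_a = done_b = False
        while not (done_a and done_b):
            if not done_a:
                done_a = next(a)
            if not done_b:
                done_b = next(b)
            yield done_a and done_b
    return run

def disjunction(family):
    """Countable 'or' by dovetailing: family(i) is the i-th procedure."""
    def run():
        running = []
        for i in count():
            running.append(family(i)())  # start one more procedure per round
            for proc in running:
                if next(proc):
                    yield True
                    return
            yield False
    return run

def verified_within(procedure, budget):
    """Run a procedure for at most `budget` steps; True if it verified."""
    for _, result in zip(range(budget), procedure()):
        if result:
            return True
    return False

both = conjunction(verify_after(2), verify_after(5))
some = disjunction(lambda i: verify_after(7 if i == 4 else None))
print(verified_within(both, 10), verified_within(some, 100))
```

The same trick does not work for a countably infinite "and": there is no finite point at which every one of infinitely many procedures is known to have finished, which matches the asymmetry between conjunction and disjunction above.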

They make the interesting (to me) observation that it is useful to define an experimental science in terms of two sets of statements: An experimental domain, which is a set of statements we interpret as being verifiable by some set of experimental procedures and thus is closed under countable disjunction and finite conjunction, and a theoretical domain, which is the closure of the experimental domain under those operations plus negation. i.e. we can state theoretically things which we cannot verify but can only refute. This seems like a sensible and useful distinction, and their reasoning for doing so seems sound.

They then define the "possibilities" as the elements of the theoretical domain that are sufficient to determine the truth value of all other statements in the theoretical domain. These form the points that we are trying to distinguish by experimental procedure.

Probably the key observation of the paper is that these points acquire two natural structures - from the experimental domain they acquire a topological structure, while from the theoretical domain they acquire the structure of a \(\sigma\)-algebra. This is a nice observation.

Problems I have with the paper

My biggest problem is the one I already alluded to above - "OK but so what?". I think the formalism is interesting, but without seeing anything built on it it currently just stands as a pleasing intellectual curiosity. I don't have a problem with such, but this one doesn't really give me anything to get excited about beyond a sense of "Huh, neat", and I feel like the authors are treating this as a much deeper result than it actually justifies at present. If they can show some interesting implications from it, then I would revise this opinion.

I also found some of the formalism a bit sloppy. There are a bunch of places where I've written "???" or "suspish" on the paper that I would want to work through more carefully if I were to do a close reading. Notably I think they're playing quite fast and loose with the countability arguments, and I am rather suspicious of the claim that statements are closed under arbitrary boolean functions - I would have expected some continuity requirement, and I think the fact that they don't impose that calls into question some of their other claims about countable generation.

I think these formalism issues can probably be fixed, but as I said I don't plan to do a closer reading of the paper at present, so I haven't worked through the details of how.


Frequentist Statistics as a Tool of Critique

Epistemic Status: I'm pretty confident this is valid, but I'm not an expert in statistics and I'm even less an expert in philosophy of statistics, so I'm less sure that this is useful, and if it is then it's probably not novel.

The other day I figured out a framing of frequentist statistics that I quite like, which is that the core of frequentist statistics is something that you might call the model-based critique. The model-based critique goes roughly as follows:

  1. If we accepted your argument, we would also have to accept the following variant of it in an idealised model (where we know the ground truth).
  2. In said model, this undesirable outcome would happen with at least this probability.

For example, the structure of (two sided) significance testing can be thought of as follows:

  1. If we accept your argument that this estimator taking this value indicates that the true parameter must be non-zero, then certainly the parameter being larger should count as the same.
  2. Here is a model that produces data that is in some sense like the data that you are testing on, but where the true parameter is zero (the null hypothesis).
  3. Under data generated by this model, the threshold you have set would still report there being a real difference with probability \(p\) (the p-value).
  4. Therefore either you have to explain why the model we have proposed is missing some essential feature of your real process, or you accept that your argument will claim a real result where there is in fact only noise about that often. (which, depending on how often that is, might be fine!)

Thus significance testing (and model-based critique in general) is not a tool of inference, but instead a tool of critique - quantitative rhetoric if you will. It puts forth a counter-argument to a claim that the data is sufficient to support some conclusion or premise.
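
To make the shape of the critique concrete, here is a small simulation sketch in Python. Everything specific in it is an assumption chosen for illustration: the claim being critiqued is "the mean of 30 observations was at least 0.5 in absolute value, so the true mean is non-zero", the idealised model is independent standard normal noise with true mean zero, and `null_rejection_rate` is a made-up name. It estimates how often that argument would claim a real effect in a world where we know there is none.

```python
import random

def null_rejection_rate(n_obs, threshold, n_runs=10000, seed=0):
    """Fraction of simulated null datasets whose sample mean clears the threshold.

    The null model (an assumption for this sketch) is n_obs independent
    draws from a normal distribution with mean 0 and standard deviation 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
        # Two sided: anything at least as far from zero counts as the same claim.
        if abs(mean) >= threshold:
            hits += 1
    return hits / n_runs

rate = null_rejection_rate(n_obs=30, threshold=0.5)
print(rate)
```

The number printed plays the role of \(p\) in step 3: either the person making the argument explains why this noise model is a bad analogy for their real process, or they accept that their argument fires about that often on pure noise.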

The thing I like about this framing is that it makes the following three things much more apparent:

  1. This is a valid thing to do.
  2. It's even a useful thing to do.
  3. It's not, however, a tool of inference, but instead a tool of refutation. When something is "statistically significant" what that means is we have failed to convincingly argue against it, not that it is true (or even likely!).

It's also nice that it's much more explicit about the relationship between the model and the experiment. I think most normal framings of frequentist statistics pretend that the model is in some sense true, while that is not an important feature of the model-based critique - instead it merely has to be a convincing analogy.

This framing has also shifted my opinion of frequentist statistics. It's not that I like it more or less, but previously my attitude was mostly "Eh, whatever. It's not my favourite thing, but I don't think I have a strong opinion on this and it's not one of the battles I choose to fight" while now I think the following:

  1. Frequentist statistics is a useful and valid tool and most of the criticisms of it are, I think, treating it as trying to do something that it's not (and, in fairness, most of the proponents of it probably are too).
  2. Most of the ways that it's used in practice by people with a typical scientist's level of understanding of statistics are probably deeply flawed.
  3. It's probably easier to get people to stop using p-values than it is to get them to acquire the level of statistical sophistication that allows them to be a useful tool.


Book Review: The Life Changing Magic of Tidying Up

I'd been idly considering reading this for some time, and a little while back asked on Twitter what people thought of it. The review that tipped me over the edge was "It's probably much weirder than you're expecting". It was much weirder than I was expecting.

I enjoyed the book and will be changing my habits as a result of it, but I do not at present plan to follow the system. However, precommitment: If by May 18th 2019 (my 36th birthday) I have not got my room into a state where I can honestly say that I feel like it is reliably sufficiently tidy, I will follow the KonMari system and do a big clear out.


Well, three reasons:

The first is that I have a deep seated distrust of anyone who says "This Is The System And Thou Shalt Not Deviate From It". I acknowledge that given an optimal system this is the correct thing to do and the temptation to deviate from the system is stronger than the benefits of adapting it, but my priors are strongly against any given system being proposed being anywhere close to optimal.

The second is probably my biggest objection to the book: I am really deeply personally offended that someone could spend so much time developing a system centered around throwing away stuff without spending some time thinking about how to responsibly throw things away. At some point she talks about how she's been responsible for clients discarding 28,000 bags of stuff, which is probably 28,000 bags of stuff taking up space in a landfill somewhere (I don't know how garbage disposal works in Japan. Maybe it's incinerated instead, which would be better but is instead 28,000 bags of rubbish worth of carbon footprint). So I'm on board with the idea of getting rid of things but want to figure out how to do so responsibly.

The third is that I understand her argument about storage but still think she's wrong. A lot of the problem with my current setup is storage space - not that I don't have enough of it, but that what I have is poorly designed, and I do think I need to resolve that problem before I can get things to a decent state.

So what am I going to do?

I said I'm going to change things as a result of this book, so here is a list of things that I am changing as a result of this book:

  1. I am donating a lot of clothing. I have a large bag that I'll be taking to the Samaritans this morning (chosen honestly for no other reason than there's a Samaritans shop that takes donations easily walkable from here).
  2. I'm getting a weekly pill planner. I have a lot of pill related clutter right now due to a variety of daily supplements and medications I'm on, and this will help me pack all of that away in a better location.
  3. I'm going to dump a bunch of books on the popup library in my local underground station.
  4. I'm going to ask my university library if they take book donations - I have a bunch of weird textbooks that I'm realistically never going to use but that will just get pulped or ignored if I take them to a charity bookstore or the popup library.
  5. I'm getting a Kallax shelving unit with inserts. My current book storage is a pair of stacked vertical book shelves and honestly they look awful and are a terrible use of space. The Kallax unit will be a much more effective and attractive way of storing books and has enough additional storage as to remove all plausible deniability: if I'm running out of space after I have the Kallax unit then there is no way to pretend that it is because I have not enough storage rather than too much stuff.
  6. I am going to be getting rid of the desk in my room (which I literally never use) and a half-finished computer assembly project that I honestly am never going to whole-finish, and take the relevant peripherals to work where I would find them actually useful.
  7. After that we'll see. As per above, I precommit to following the KonMari method properly if a combination of the above and whatever naturally comes next does not successfully sort my shit out.


Notes from London Liberating Structures 2018-10-30

Yesterday was my second London Liberating Structures, where we did two different Liberating Structures. These are my notes from them.

Notes from Discovery & Action Dialogue

Discovery & Action Dialogue is a structure for identifying positive deviance: Given some widespread problem, some people fare better at it than others. Can we identify and copy those strategies? The question we asked was "How can we improve our London commute?".

DADs have seven structuring questions, but we combined the last two.

How do you know when problem X is present?

We identified three major types of problem we have with our commutes:

How do you contribute effectively to solving problem X?

We identified a number of strategies we used:

  1. Replace your "optimal" route with a slower but more predictable or pleasant approach. e.g. We talked about:
    • Using the Thames Clippers boat.
    • Alternate tube routes that involve slightly more walking (e.g. I rearranged my commute to involve walking from Lancaster Gate to work instead of Gloucester Road - it's a 15-20 minute walk instead of a 10 minute walk, but it's a bit faster so you hardly lose any time on it and the walk across Hyde Park is much nicer).
    • Taking the overground instead of the underground - less crowded, so you have more space to read a book.
    • Motor biking, biking, or even walking.
  2. When crowding means there's nothing else you can do, people suggested listening to podcasts.
  3. Be better about advance planning - e.g. think about your layers, balance of your bag, etc. so that it's easy to adjust your comfort and temperature level when you're on the tube.
  4. Working from home and flexible working hours - if you can avoid commuting during rush hour, the problem mostly goes away.
  5. Leave earlier when it's less crowded and schedule something before work - e.g. swimming, meeting someone for coffee, morning glory.

What prevents you from doing this or taking these actions all the time?

A mix of being bad at organisation and constraints outside our control: e.g. if your job is not one you can do from home then you're basically stuck as to when you commute. If you don't live and work near to the river you can't take the Thames Clippers.

Do you know anybody who is able to frequently solve problem X and overcome barriers? What behaviors or practices made their success possible?

We didn't really have any suggestions other than the ones we already said in "How do you contribute effectively to solving problem X?" (though there were a few things that got brought up here that were basically "Oh yeah I forgot to say...") and "people who are more organised than me".

Do you have any ideas?

The main useful ideas we had were:

We also talked about things councils could do - e.g. making more bus lanes suitable for motorbike use.

What needs to be done? Who needs to be involved?

This rather blended into the previous section to be honest - I'm not sure there was much of a noticeable difference between the sort of things we suggested in this and the previous section.

The ideas we suggested in this section were:

We made the observation at this point that almost all of our solutions were about mitigating or avoiding the problem rather than really solving it.

Notes from 15% Solutions

15% solutions is a structure for identifying immediate points of action - what can you do to improve the situation right now without requiring any additional resources or authority.

I don't have any detailed notes from this because I forgot to bring my journal with me. Fortunately Alex assures me that he will be writing up his (and we were in the same group).

We discussed the question "What can we do to address power imbalances in meetings?". What I remember is that my suggestions fell into roughly two categories: "better structures for meetings" and "develop ally skills".

In the former I suggested:

For ally skills I suggested the following:

Alex also suggested note-taking as a good tool here. There are two senses in which this is true: It can be very useful for making sure you give credit where credit is due, and also if there is some sort of formal minutes process where someone is expected to volunteer themselves, often that ends up going to a woman (being aware of this dynamic is part of why I almost always volunteer as note taker. The other part is that I'm decent at it and it helps me focus on the subject being discussed).

General Reflections

I'm afraid neither of these landed for me. I liked the idea of both and would be up for doing them again, but the specific instances as we did them didn't work.

Part of this is due to something I left as feedback for the organisers already: I do not think there is enough time to do two liberating structures in two hours. This is especially true because there's a break in the middle and because people arrive on London time, so two hours isn't really two hours, it's at best about an hour and a half, less if you don't count explanation time. The result was that the DAD felt a little rushed and the 15% solutions was too rushed to do anything meaningful with.

Personally, I think I would also like a more in depth discussion of the structures before we did them (in theory I had read the sections in the book beforehand, but in practice I didn't remember anything from what I read).

I also had some specific complaints with both structures.

  1. I thought DAD's structuring questions were too unstructured, and the timing was insufficiently clear. I also thought the problem we were talking about was not really shared reliably enough among the people involved - the problem was that everyone had such different constraints that we didn't have much in the way of solutions we could transfer.
  2. I did not think that 15% solutions worked without more careful timekeeping than we did, and by the time we got to the second round everyone had forgotten people's specific suggestions from the first round. I think it would have worked much better if people had given their suggestions and then immediately been asked the clarifying questions (with a time box per person). This makes it somewhat closer to Troika Consulting.
  3. I missed having the talking objects aspect from conversation cafe.

All that griping aside, I still found it an enjoyable evening and an interesting experience - although I didn't think the structures worked for their intended purpose, it was still useful to try them out and observe their failure modes.


Notes on Blocking Sets

Here's some stuff I'm playing with related to some problems in the combinatorics of test case reduction.

Let \(X = \{1, \ldots, n\}\).

Let \(S \subseteq X\) and \(\pi\) be a permutation of \(X\). Say \(S\) starts \(\pi\) if \(S = \{\pi_1, \ldots, \pi_{|S|}\}\).

Let \(B \subseteq \mathcal{P}(X)\). Say \(B\) blocks \(\pi\) if \(S\) starts \(\pi\) for some \(S \in B\). If \(B\) blocks every possible \(\pi\), say it is blocking.

Lemma: Suppose \(B\) is a blocking set such that \(k \leq |U| \leq n - k\) for all \(U \in B\). Then \(|B| \geq \binom{n}{k}\), and this bound is tight.


To see that the bound is tight, note that we can choose the set of all subsets of \(X\) of size \(k\), and this is a blocking set of the desired size.

Suppose now that \(|B| < \binom{n}{k}\). Pick a permutation \(\pi\) uniformly at random. For each size \(m\), \(\pi\) starts with each subset of \(X\) of size \(m\) with equal probability, so for any given subset of size \(m\) the probability that \(\pi\) starts with it is \(\frac{1}{\binom{n}{m}}\). Since \(k \leq |U| \leq n - k\) implies \(\binom{n}{|U|} \geq \binom{n}{k}\), the expected number of sets in \(B\) that start \(\pi\) is \(\sum\limits_{U \in B} \frac{1}{\binom{n}{|U|}} \leq \frac{|B|}{\binom{n}{k}} < 1\).

If the expected number of sets that start \(\pi\) is less than one, then the probability of it being equal to zero must be non-zero. In particular, there is some \(\pi\) that does not start with any of these sets, and thus is not blocked by \(B\). Therefore \(B\) is not blocking.
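
For small \(n\) these definitions are easy to check by brute force, which I find useful as a sanity check. The following Python sketch is only that, with function names (`starts`, `is_blocking`, `is_critical`) invented for the example. It enumerates all \(n!\) permutations, so it is only sensible for very small \(n\).

```python
from itertools import combinations, permutations
from math import comb

def starts(s, pi):
    """True if the set s is exactly the first |s| entries of the permutation pi."""
    return set(pi[:len(s)]) == set(s)

def is_blocking(b, n):
    """True if every permutation of {1, ..., n} is started by some set in b."""
    return all(
        any(starts(s, pi) for s in b)
        for pi in permutations(range(1, n + 1))
    )

def is_critical(s, b, n):
    """True if b is non-blocking but adding s makes it blocking."""
    return not is_blocking(b, n) and is_blocking(list(b) + [s], n)

n, k = 5, 2
all_k_subsets = [frozenset(c) for c in combinations(range(1, n + 1), k)]

# The family of all k-subsets attains the bound from the lemma...
print(len(all_k_subsets) == comb(n, k), is_blocking(all_k_subsets, n))
# ...and dropping any one of them leaves a non-blocking family,
# for which the dropped set is critical.
smaller = all_k_subsets[1:]
print(is_blocking(smaller, n), is_critical(all_k_subsets[0], smaller, n))
```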


The question I'm really interested in is the following:

Say \(S\) is a critical set for \(B\) if \(B\) is non-blocking but \(B \cup \{S\}\) is blocking.

Suppose \(B\) is some non-blocking set such that \(|U| \geq 2\) for all \(U \in B\), and suppose that \(B\) has a critical set of size \(2\). How small can \(B\) be?

I believe the answer is that it must have at least \(\binom{n - 1}{2}\) elements, but I keep failing to pin down the details.


The No Genies Conjecture

A thing I often end up explaining to people is that in testing you can go broad or you can go deep: Example-based testing tends to be deep, property-based testing tends to be broad. There's no real way to go broad and deep at once other than doing a stupid amount of work.

A similar dichotomy is between physical and digital tools. Physical tools tend to be flexible but limited in power, while digital tools tend to be powerful but inflexible. There are very few tools that are both powerful and flexible without you having to do an awful lot of work.

Roughly there's a "pick two" thing going on between:

  1. Powerful
  2. Flexible
  3. Easy to use

If you're lucky you get one of these. If you're really lucky you get two. You basically never get all three.

Except in fiction. A genie's wishes are powerful, flexible, and easy to use.

Thus, one might reasonably term the statement that you cannot have all three of these things at once as the conjecture that genies do not exist.


Book Review: "An Astronaut's Guide to Life on Earth" by Chris Hadfield

When I mentioned to a friend that I was reading this she said "Oh is that actually good? I assumed it was pop science trash". It is actually good and it's not pop science trash.

There are roughly two interesting bits to this book:

  1. Advocating for a particular attitude to life.
  2. A lot of anecdotes about space and what the life of an astronaut is like.

I found both sides of the book interesting, although I think the life advice would have fallen very flat without the space component, while the space component might have stood just fine on its own.

Some of the attitude stuff that is advocated for:

I've been thinking about trying to get better at strategic planning, and this book was a useful nudge in that direction. In particular there's a bunch of advice in there for constructively over-thinking things which might be useful. The main gist is that rather than worrying about everything that might go wrong and stressing about it as a result, worry about everything that might go wrong and plan what you will do when it does.


Book Review: Thinking in Systems by Donella H. Meadows

Decent introduction to systems thinking. Not super useful for me given that it was a bit preaching to the choir, but helped fill in some gaps and I'd recommend it as an intro for people who don't naturally think this way.

I found some of the examples a bit dated and the politics a little annoying in a few places. I also feel like it was less actionable than the author believes it to be. All told though it's a nice, clear, and enjoyable read.


Book Review: A Field Guide To Getting Lost by Rebecca Solnit

This is a lovely book that I can't really easily summarise. It's a melancholy exploration of our relationship to the world, through the lens of loss and what it means to be lost.

One thing that was interesting for me personally when reading it is how very clearly it is the sort of book which I could never possibly have written. I don't just mean that the author is a much better writer than me (although she absolutely is), but the nature of our relationships with memory is very different: The book is full of detail-rich memory of her life, with the themes she's discussing being woven in and out and illustrated with personal stories and how they played out in those cases.

I simply don't have that sort of memory, which makes it very hard for me to... relate is not the right word, because I can absolutely relate, but I cannot imagine what it would be like to have that level of access to my history. Usually this just feels normal, but being confronted with how wide that gap is is a strange (and, indeed, lost) feeling.


Deleting larger intervals in test case reduction

So here's a surprising fact I just noticed.

Suppose you've got some sequence \(S\) and a predicate \(p\) and you want to do test-case reduction on \(S\) with respect to \(p\).

Classically we try to get the result to be one-minimal. I'm pretty sure that by choosing \(p\) carefully you can always force this to require \({|S|\choose 2}\) evaluations of \(p\). You certainly can with the greedy algorithm. See this mathoverflow question I asked about generalising.

A stronger condition you might want is what you might call interval minimality. That is you want to find \(S'\) with \(p(S')\) such that \(p([S_1', \ldots, S_i', S_j', \ldots, S_n'])\) is false for any \(i < j\).

The funny thing is that these tasks have the same worst case complexity.

If you run the brute force algorithm that tries to delete each interval of length \(k\) in turn, increasing \(k\) when that fails, then deleting an interval of size \(m\) takes at worst \(m \frac{2n - m + 1}{2}\) evaluations, but it decreases the size of the sequence by \(m\). By doing induction and seeing where the maximum lies, this means that the actual number of predicate evaluations to get to a fixed point can also never exceed \({|S|\choose 2}\).

The typical case is generally not going to be as good, but it's interesting that the worst case is the same.
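Here is one way to read the brute force algorithm described above, as a minimal Python sketch. The function name interval_reduce and the choice to restart from length 1 after every successful deletion are my assumptions about the details. It also counts predicate calls, so you can compare against the bound experimentally.

```python
def interval_reduce(seq, predicate):
    """Delete contiguous intervals until no single interval can be removed.

    Tries every interval of length 1, then 2, and so on. After any
    successful deletion it starts again from length 1 on the shorter
    sequence. Returns the reduced sequence and the number of predicate
    calls made.
    """
    current = list(seq)
    calls = 0
    k = 1
    while k <= len(current):
        for i in range(len(current) - k + 1):
            candidate = current[:i] + current[i + k:]
            calls += 1
            if predicate(candidate):
                current = candidate
                k = 1  # the sequence changed, so go back to short intervals
                break
        else:
            k += 1  # no interval of this length could be deleted
    return current, calls


reduced, calls = interval_reduce(range(10), lambda xs: 3 in xs and 7 in xs)
print(reduced)  # [3, 7]
```

Note that this version also tries deleting the whole sequence as its last step, which the definition of interval minimality above does not strictly require.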


Vegetarian Lasagne Recipe

(Writing up a vegetarian lasagne recipe I made the other day because I've been getting requests).

This has four ingredients:

  1. Crown Prince Squash, cut into roughly 2cm cubes and roasted in butter (you can use oil if you like. In general this recipe has a lot of butter)
  2. Wholewheat lasagne noodles
  3. Grated Mozzarella
  4. Very Red Sauce (recipe for this below. This is the only labour intensive bit)

Crown Prince Squash are much much better than butternut squash and are thankfully in season at the moment. They look like this. I get mine from West Hampstead Farmers Market, which is open on Saturday 10:00-14:00, where they are fairly cheap (£2.50 each, which means £5 gets you more squash than you will probably want to eat in a week). I think a bunch of the middle-eastern and organic vegetable stores in that area also sell them on other days.

To make it you stack layers:

  1. Squash
  2. Very red sauce
  3. Noodles
  4. Very red sauce
  5. Mozzarella

I usually do two of these on top of each other, with extra mozzarella on top for extra crispy cheesiness.

Very Red Sauce

This is basically just a rich vegetarian, vegetable heavy, tomato sauce with some variations. Unfortunately there's no real formal recipe because I improvise it each time, so this is mostly reconstructed by guess work. Feel free to adapt it or substitute some standard recipe.

It's best made the day before and left to sit overnight on the counter (if you're not feeling brave you can skip this bit) and then in the fridge.

The way I make this is very dependent on having both a food processor and a pressure cooker. You can almost certainly make it differently without those.

The following makes enough for two large lasagnes:

  1. 4 medium sized onions
  2. 4 large carrots
  3. 4 medium sized beetroot
  4. 1 liter of tomato passata
  5. 1 can chopped tomatoes
  6. 250g salted butter (yes, you're reading that right)
  7. A lot of black pepper
  8. Two large spoons of marmite
  9. 5 or so bay leaves.

You shouldn't add any salt to this - between the butter and the marmite it's got plenty, believe me.

First roast the beetroot. Once it's roasted, peel it (I use kitchen gloves for this - the skin should just come off once it's roasted).

Dice the onions, and put them in the pressure cooker with the butter and lots of ground black pepper. Leave them on a medium heat to brown.

While those are browning, peel the carrots, then grate them and the beetroot (this is where the food processor comes in - I just use the grater extension). Add these to the onion, stir them together, and leave to cook for a while longer - say 20-30 minutes.

Once that's looking cooked (unfortunately you can't judge this in a pressure cooker, so I do guess work plus putting it back on if it's not ready), add the tomato, bay leaves, and marmite, stir thoroughly, and then cook on a low heat for an hour or so.

After that, the sauce is ready except for the waiting. Really do leave it overnight - it will be much better the next day.


Book Review: All The Lives I Want by Alana Massey

Reasonably strongly recommended.

This is a collection of essays about the lives of famous women, and the author's life, and the experience of being a woman.

Confession: When this book arrived from Amazon I went "Wait, why did I order this again?". Fortunately past me was wiser than other-past me and it was a good order. I'm pretty indifferent to a lot of popular culture, so many of the people referenced in this book were ones I didn't or barely recognised. That didn't really matter.

This is another book that is slightly meandering and personal in the way that I complained about for Writing to Learn. Unlike Writing to Learn, that was fine - I never felt that the book should get to the point, because the story it was telling was the point.

I don't have a good summary of the book, except perhaps that it's about giving a sympathetic eye to women who do not normally get one, and it does that very well. It's not exactly a light read - the content is at times fairly distressing - but it is well written and to the point in a way that means it also never feels like a heavy one.


Book Review: Writing to Learn

The biggest problem with this book for me is that it is not really about writing to learn, instead writing to learn is better thought of as its recurring theme.

The book exemplifies a style that it also advocates for: Personal, privileging people over knowledge. Every piece of information should be presented in the context of a story, and about how the people relate to that knowledge.

I'm not a huge fan of this style. I think it can work, and certainly personal books of the type it encourages are fine (and sometimes even great), but as a way of presenting knowledge I think it fails. The point being made is obfuscated by being tangled up in the story, while the story suffers because I'm constantly thinking "OK get to the point" and thus want to rush through the story.

As, frankly, happens in this book. The book is well written, and contains a lot of interesting examples of good writing. I think if I hadn't been wanting to learn more about its actual theme of "writing to learn" and just sat back to enjoy it as a piece of writing I would have liked it a lot more.

The actual structure of the book is roughly this:

  1. The fact that writing is left to the English department to teach does an injustice to the student and to the teachers - writing is better integrated across the curriculum.
  2. A couple of useful pieces of information about how to integrate that.
    • Get people to use writing to make their reasoning process explicit.
    • Require well-written essay answers on exams, but if an answer is correct but poorly written, give students an opportunity to revisit it, with the help of a writing workshop.
  3. A very large amount of evidence, in the form of anecdotes and extracts, for the fairly obvious proposition that yes you really can write well in any field.

I'm particularly disappointed by the third part because almost none of these felt like examples of people writing to learn - they were almost all examples of writing to teach.


Compare and Contrast: Big Capital vs Radical Markets

Copied and expanded upon from Twitter.

I said in my review of Big Capital that I didn't think I'd got a lot out of it other than being sad and angry, but actually one thing I've gotten out of it is that I think it's helped me put my finger on why the property model in Radical Markets is so dangerous. The problem is that you end up effectively charging people for community - people put down roots, form connections, etc. This creates an unwillingness to move that means that the value to the incumbent is significantly greater than the value to the purchaser. Either this is reflected in their personal valuation of the house (which means they are taxed more) or they inevitably lose value when someone buys their house from under them. As well as being very class correlated that's an awful incentive gradient you've got there.

The way this has played out in London is in the context of council housing, where people who have bought their homes under right to buy schemes have been met with mandatory purchase orders from the council where they want to redevelop the estate. The COST model proposed by Radical Markets is basically a form of mandatory purchase order - you don't have the same injustice over being paid below market rates (except in the sense that "market rates" are set by what people can afford to pay in their COST taxes, and estates typically house poorer people!), but other than that the effect is the same.

This destroys communities, and because these are the communities that people have built their lives in and around, that destroys lives.

Anyway, given that it has helped me make sense of something important, I tentatively revise my review from "Very good but I can't recommend it" to "Very good, will make you sad and angry, but may help you make sense of some things around property and is a useful counterpoint to Radical Markets, and I leave the cost-benefit analysis to you".


Big Capital: Who is London for? by Anna Minton

This was a good book that I don't think I can really recommend.

It's a compassionate, well-written, account of the extent of the problems with treating London property as an investment vehicle, the extent of the personal cost to people, and the degree of culpability of the councils in this problem. I knew much of this, but it filled in a lot of details for me.

I was very impressed with how level a tone the author managed to keep throughout the book. The style would have been almost painfully dry, except that the material has such a high emotional impact and the writing is sufficiently concise and to the point as to offset the worst of the dryness. Instead it is a very effective accounting of the situation we find ourselves in. I don't think I could have written anything like it without calling for open revolution.

Unfortunately, I don't know what to do with that information, so the book just made me angry and sad, and I was already angry and sad about this. Some concrete proposals were made in the last chapter, and they seem like good ones, but they're not ones where I feel like I have much agency, and to the degree that I have agency I would probably spend it on other things.


Book Review: We Should All Be Feminists by Chimamanda Ngozi Adichie

Short, persuasively written, and a lovely interweaving of the personal and the political.

Unless you haven't been paying attention (which is a lot of people), probably doesn't contain anything very new to you except some of the details of how gendered issues play out specifically in Nigeria, but still a good read.


Book Review: How to understand your gender

I thought this was a good book, and I would recommend it to people who either are trying to do just that, or to people who want to understand a bit more about what others go through in this space. It's well written, reasonably thorough, and provides sensible, compassionate, advice.

I didn't personally get a huge amount out of it. This is partly because I've already had a lot of these discussions and done a lot of the relevant reading, partly because I sped through it rather than doing the reflective exercises they encouraged, and partly because my own relationship with my own gender could perhaps best be described as "mild antipathy" - other people talk about gender as this deeply felt lived experience, and it's not something I can relate to at all (I think Ozy Frantz's notion of "cis by default" is helpful here). The result is that the book was written as if a lot of its content should have deep emotional impact and, while I think this was the correct way to write it, a lot of it failed to land for me.


Book Review: How to Talk about Books You Haven't Read, by Pierre Bayard

A surprisingly charming book. It sounds like it should be funny, but it's not (and I don't think is intended to be).

Instead the author, a literature professor, discusses the roles of books in our lives, and how they are interpreted and exist within our cultural framework and views on the world.

One of his key points is that there is not really a binary of books we have "read" versus "not read" - every book we have read, we immediately start to forget, so at some point even books we have ostensibly read may achieve the same status as books we have never read.

He uses a cute set of categorisations: UB (book that is unknown to me), SB (book I have skimmed), HB (book I have heard about), FB (book I have forgotten), coupled with a rating of --, -, +, ++ to indicate a negative or positive opinion about them.

I am glad to have read this book for, ironically perhaps, a reason that I think is the major... if not missed, at least brushed over, point, which is that I think his view of books is very fiction-centric. What he is saying is not wrong for non-fiction, but I feel it is incomplete and elides much of the impact that a book can have on your thinking. His point is still valuable here - there are many ways to achieve that impact other than a thorough in-depth reading of the book, and some of them will even work better than a close reading of the book (e.g. in-depth discussion interspersed with skimming and the occasional read) - but given that I think of this as one of the major points of books I would have appreciated more discussion.

On the other hand this is a bit of a "Why didn't you include...?" and if he had included everything that could be said on the subject the book would have been significantly less pleasingly short, so perhaps it is best left as is.

Possibly interesting to contrast with "How to Read a Book".



I asked two related questions on Twitter recently. The first was about short non-fiction books:

Seeking recommendations of books with the following properties:

  • Available in paperback form with < 300 pages (ideally < 200).
  • Non-fiction
  • Taught you something interesting
  • Accessible to a (reasonably well educated) general audience

At some point I realised that all of the books I had on my list for this and almost all of the recommendations were written by white men, so I asked a follow-up question:

Opening up the subquestion of this because it's made me go "wait a second...".

What are your favourite nonfiction books that are not written by white men? Bonus if they also meet the other criteria of the quoted tweet but not essential.

There are a lot of good answers to both tweets that I will not attempt to summarise here.

I then decided that if I was buying this many books in one sitting I might as well not give Amazon my money for this and decided to go to Foyles. I ended up leaving with uh, quite a lot of books. Book stores are very good for serendipitous discovery and purchase in a way that websites are not, especially with Amazon's dreadful recommendations.

There is some overlap with both the recommendations and the stated goals, but it's far from total. I ended up buying 15 books. Of these one was written by a man, and another co-authored by one. Of the women authors a couple were people of colour but I haven't checked carefully which.

The Foyles haul was:

There was no particular rhyme or reason to this selection. A bunch of these are books that were staff picks at Foyles, a few were ones from the recommendations I'd got. "How to understand your gender" has vaguely been on my to read list for a while. Mostly I just wandered around looking for things that caught my eye.

So once I've managed to plow through this pile (which due to reasons I have less than two months to do. Fun times) I should at least be a bit better in the gender balance of who I read nonfiction by! Race is still decidedly lacking though.


Teaching from worked examples

Mostly saving this for later so I can find it again.

How We Teach Introductory Computer Science Is Wrong.

In particular the following cites from it are interesting:

The general point being that getting people to try to solve the problem on their own as a starting point is not actually a good teaching model, and you're better off showing them a number of worked examples first. I find this counterintuitive personally, and I suspect they're framing it as a more certain thing than it really is, but I still think it's worth taking note of.

I wonder how this ties in with Sue Sentance's PyCon UK keynote suggestion of teaching programming by giving people something to modify. There I guess you are giving them a worked example first, so the two seem mutually supportive.


Beware Threshold Effects

Nyoom by Alicorn is a lovely piece about her new mobility scooter. There's a refrain throughout that hit me particularly hard:

Not like, you know, really disabled.


I wasn't disabled or anything!


Because I thought I wasn't disabled.

I hate this idea that "disabled" is a binary category that you either are or aren't. There's no magic threshold at which it's suddenly OK to need help. Everyone needs help, and if something helps you you should do it.

I don't think Alicorn disagrees with this, and I think this is probably a deliberate point of the piece, I just want to emphasise it.

The world is full of threshold effects where we take some continuous spectrum and declare that everything that is past this point counts and everything that's not doesn't. This is bullshit.

You see this a lot with psychological issues as well, and it ties in to the war on drugs. There are a lot of drugs that are currently prescription only that I suspect would be much more widely useful - ADHD medication or modafinil for example. Hell, I'd probably benefit from one or both of these - do I have ADHD? No, I am almost certainly not able to meet the clinical definition of ADHD, but I've had more than one person look at my output on the internet and go "So, David...", and I'm definitely at least halfway in that direction. Do I have a sleep disorder? Yes, almost certainly. Is it one of the two that you can get modafinil prescribed for? Nah.

We need to stop thinking of medication and assistive tech and, well, everything else, in terms of "Here is some discrete set of conditions that they fix" and more in terms of "this is the class of problems they help with". If you have a problem and there is something that would help you with it, the solution should be available to you regardless of whether you cross the threshold of "really needing it".


Communicating Knowledge

Copied over from Twitter so I don't lose this.

My brain is currently trying to synthesise some stuff together that I'm not sure I can articulate in less than a Chidi-thesis worth of content and it's very annoying.

The following combine in a way that I have not yet been able to tease out a small t thesis from:

  • "The Shock of the Old" - David Edgerton
  • "Shop Class as Soulcraft" - Matthew Crawford
  • "The Two Cultures of Mathematics" - Tim Gowers
  • "Telling is Listening" - Ursula Le Guin

Roughly the area I am attempting to sound out is the idea of knowledge which is embodied in a person or a group, and cannot be separated from that without essentially rebuilding it from scratch (possibly with some gentle editing from a teacher who already understands it).

Physical skills are very much like this (you can't just read a book to learn a martial art), but I think we treat cognitive skills as if they're not like this, when they absolutely are, and we privilege knowledge based on how close it is to the ideal of transmissibility.


Implicit vs Explicit

The Zen of Python:

The only one of these people seem to remember is "explicit is better than implicit". Unfortunately, it falls afoul of the problem that almost all "X is better than Y" advice has, which is that the important qualification "All else being equal" is left, well, implicit.

Explicit is more verbose than implicit. Explicit is more expensive than implicit. If you try to make everything explicit then you will never get anything done, because as with everything, being explicit has costs as well as benefits. Explicit may be better than implicit, but implicit is cheaper than explicit. Everything is a trade-off.

It can be worth adjusting where you are in trade off space though - just because everything is a trade off doesn't mean you're automatically in the right region of trade off space, and it's much easier to err in the direction of making things too explicit rather than too implicit.

Tying into past posts:


Beeminder for Enforced Participation

I'm a huge fan of beeminder. I was on a break with it for a while, not for any very good reason, but I've been back using it for a couple of months now, and it's really good to be back.

A lot of the things I've been using beeminder for recently are a sort of... "making the fundamental feature of Beeminder contagious". The fundamental feature of Beeminder for me is that you cannot simply decide to give up on a goal - because the goal will charge you if you don't satisfy it, and because you have to wait a week for changes to take effect, the choice to stop a goal has to be a deliberate decision planned out in advance.

This is fairly huge to me.

A lot of goals I've had recently are basically of the form "keep doing the thing". This allows me to create systems with this property - beeminder doesn't track my success at using the system, it just tracks that I'm doing it at all.


The notebook and day plan goals do have some quantity attached to them but it's not that important. For the journal and book goals, literally all they track is "Did I do the thing today?". If yes, I get a point. If no, I don't. I can do as much or as little as I like as long as I do something. The system itself is then what keeps me doing it well, "all" beeminder is doing is keeping me participating in it.


Stories about Data

Consider the difference between the following:

The result was non-significant but when we split it up into N groups it became significant for subgroup Y. Therefore there is a significant effect for group Y.


The result was non-significant but when we split it up into N groups it became significant for subgroup Y. Obviously this is invalid statistically, but that might be an interesting followup experiment to perform.

It's OK to tell whatever stories you want about your data, as long as you make it clear which ones are and aren't valid inferences.


Auto-parallelising test-case reduction

There's a new parallel test-case reducer called half-empty.

It uses what is a new-to-me approach to parallelisation but is apparently also what C-Reduce does, which they call pessimistic parallelism. The basic idea is that you parallelise based on the assumption that what you're going to try isn't actually going to work, you fork a background process to check if it does actually work, and the main calculation just proceeds as if it was false. If your assumption that it rarely works mostly holds true, this lets you turn what seem like highly sequential processes into highly parallel ones.

It occurred to me that you could fairly easily automate this in a way that lets you write a test-case reducer exactly as if it were sequential but have it magically automatically parallelised in the background.

The way it would work is this: You have some test case reducer state object which has a cached version of the predicate. You wrap a reduction pass in some special function (say a decorator in Python). Now when you call the predicate from within the reduction pass what happens is as follows:

  1. If the result has already been cached, use that.
  2. If the result hasn't been cached, return false and queue the result for background computation, which will update the cache when it is finished.
  3. If at any point a backgrounded job returns true instead of false, clear the queue, wait for the current computations to finish, and then restart the reduction pass from the beginning.

This makes a couple of assumptions:

  1. Running the full reduction pass is cheaper than running the predicate.
  2. The predicate will rarely return true.

Ways it might be useful to patch this:

  1. Keep the queue size bounded. When the test function calls the predicate and the queue is at capacity, have it block until the queue is emptier.
  2. If the pessimistic assumption does not hold, e.g. say if at least 5 of the last 10 predicate calls were true, run the predicate in the foreground instead of backgrounding it.
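The basic three-step scheme above (without either of the patches) might look something like the following in Python. This is a rough sketch under several assumptions of mine: ReducerState, Restart, pessimistic and delete_one_at_a_time are hypothetical names, the predicate is assumed to be safe to call from several threads, test cases are assumed to be sequences of hashable items, and I use a thread pool where a real reducer would more likely fork processes.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


class Restart(Exception):
    """A background job returned true, so a pessimistic guess was wrong."""


class ReducerState:
    """Hypothetical state object holding the cached predicate."""

    def __init__(self, predicate, max_workers=4):
        self._real_predicate = predicate
        self._cache = {}    # candidate -> known result
        self._pending = {}  # candidate -> Future still being computed
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def predicate(self, candidate):
        key = tuple(candidate)
        if key in self._cache:              # step 1: use the cache
            return self._cache[key]
        if key not in self._pending:        # step 2: queue it...
            self._pending[key] = self._pool.submit(
                self._real_predicate, list(candidate)
            )
        self.collect(block=False)
        return False                        # ...and guess false

    def collect(self, block):
        """Move finished jobs into the cache; restart if any was true."""
        found_true = False
        for key, future in list(self._pending.items()):
            if block or future.done():
                self._cache[key] = future.result()
                del self._pending[key]
                found_true = found_true or self._cache[key]
        if found_true:                      # step 3: clear the queue, restart
            for key, future in list(self._pending.items()):
                if not future.cancel():     # already running: keep its answer
                    self._cache[key] = future.result()
                del self._pending[key]
            raise Restart

    def close(self):
        self._pool.shutdown()


def pessimistic(reduction_pass):
    """Rerun the pass from the beginning whenever a guess turns out wrong."""
    @functools.wraps(reduction_pass)
    def run(state, test_case):
        while True:
            try:
                result = reduction_pass(state, test_case)
                state.collect(block=True)   # confirm every remaining guess
                return result
            except Restart:
                continue
    return run


@pessimistic
def delete_one_at_a_time(state, test_case):
    # Written exactly as if state.predicate were an ordinary blocking call.
    current = list(test_case)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if state.predicate(candidate):
            current = candidate
        else:
            i += 1
    return current


state = ReducerState(lambda xs: 3 in xs and 7 in xs)
result = delete_one_at_a_time(state, list(range(10)))
state.close()
print(result)  # [3, 7]
```

The toy predicate here returns true far more often than the second assumption allows, so this run mostly demonstrates that restarts converge on the sequential answer rather than showing any speed-up.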

If you want a really cheeky approach (I don't think this will work), here's a neat trick you could try: Use some sort of classifier (language inference, machine learning, whatever) to predict the result of the predicate and use that, invalidating when the parallel computation gets it wrong. You could even just be very lazy and just predict whichever outcome is the most common.

Actually speaking of language learning, this approach would also work well for L* with a bit more tracking of dependencies, which would give you a fully parallel language learner.


Try not to think about it

A useful general principle is "If you have to make a decision to do something, you will eventually get it wrong". As a result, it is often useful to arrange things to avoid decision making, even when this results in redundant work. This is especially true when one class of error is much more costly than the other.

Some examples:

  1. Always lock the car when you walk away from it, even if you don't think you need to. Locking the car when you don't need to is cheap, forgetting to lock the car when you need to is very expensive.
  2. Leave useful things (chargers, clothing, etc) in places where you are likely to need them (work, a partner's flat, your family home, etc), so that if you forget to bring one there is already a supply there. Having extra stuff is fairly cheap, but forgetting stuff you need is super annoying.
  3. Keep small stuff in your bag that is regularly useful, even if you probably don't need it on any given trip (e.g. I always carry a phone charger and a pencil case). Same reasons as above (although the added weight isn't entirely cheap).


Branch and Consolidate

Attention conservation notice: This note is even more note to self than they normally are, so may not make much sense.

The following implementation strategy has just occurred to me, for writing code that can be both a randomized algorithm and a dynamic programming solution for giving you the full distribution (you can also achieve this by just writing it abstracted by a suitable monad implementation of course).

Add two primitives:

  1. branch(n) conceptually generates a uniform random number in \(\{0, \ldots, n - 1\}\).
  2. consolidate(data) says "the rest of the computation is uniquely determined by this value".

In random generation, branch has the obvious implementation and consolidate does nothing.

When doing the dynamic programming, we use the standard trick for exhaustively enumerating a tree of unknown shape: Explore based on prefixes, filling with infinitely many zeroes for branches drawn past the prefix, and increment it lexicographically until we can't any more. The difference is that whenever we call consolidate with a value we have already seen, we raise an exception to terminate the process and add an entry that says to use the final value. Conceptually this is the same as just exhaustively enumerating all the possibilities, but with shortcuts.

At the end we just solve the obvious dynamic programming problem to calculate the probabilities.
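
A minimal sketch of how this might look in Python. Every name here (RandomContext, ReplayContext, Seen, enumerate_runs, distribution, coin_sum) is made up for illustration. It assumes that consolidate keys and results are hashable, and that the computation never comes back to a key whose continuation is still being explored (so no rejection-sampling style loops). Remembering the choice path at which each key first appeared is a detail of this sketch: it is how a replay through the first occurrence is told apart from a genuine repeat.

```python
import random
from collections import defaultdict
from fractions import Fraction


class RandomContext:
    """Random generation: branch is a real draw, consolidate does nothing."""

    def branch(self, n):
        return random.randrange(n)

    def consolidate(self, key):
        pass


class Seen(Exception):
    def __init__(self, key):
        super().__init__(key)
        self.key = key


class ReplayContext:
    """Replays a prefix of choices, then fills with zeroes."""

    def __init__(self, prefix, seen):
        self.prefix = prefix
        self.seen = seen  # key -> choice path where it first appeared
        self.choices = []  # (n, choice) for every branch call so far

    def branch(self, n):
        i = len(self.choices)
        choice = self.prefix[i] if i < len(self.prefix) else 0
        self.choices.append((n, choice))
        return choice

    def consolidate(self, key):
        path = tuple(c for _, c in self.choices)
        # Passing through the first occurrence again is fine; meeting the
        # key anywhere else means we already know how the rest plays out.
        if self.seen.setdefault(key, path) != path:
            raise Seen(key)


def enumerate_runs(f):
    seen, runs, prefix = {}, [], []
    while True:
        ctx = ReplayContext(prefix, seen)
        try:
            end = ("value", f(ctx))
        except Seen as e:
            end = ("key", e.key)
        runs.append((ctx.choices, end))
        # Lexicographic increment: bump the last choice that can still grow.
        i = len(ctx.choices) - 1
        while i >= 0 and ctx.choices[i][1] + 1 >= ctx.choices[i][0]:
            i -= 1
        if i < 0:
            return seen, runs
        prefix = [c for _, c in ctx.choices[:i]] + [ctx.choices[i][1] + 1]


def distribution(f):
    seen, runs = enumerate_runs(f)
    memo = {}

    def below(path):
        # Distribution of results given that the choices so far were `path`.
        out = defaultdict(Fraction)
        for choices, (kind, x) in runs:
            if tuple(c for _, c in choices[: len(path)]) != path:
                continue
            weight = Fraction(1)
            for n, _ in choices[len(path):]:
                weight /= n
            if kind == "value":
                out[x] += weight
            else:
                if x not in memo:
                    memo[x] = below(seen[x])
                for value, p in memo[x].items():
                    out[value] += weight * p
        return dict(out)

    return below(())


def coin_sum(ctx, flips=10):
    total = 0
    for i in range(flips):
        ctx.consolidate((i, total))
        total += ctx.branch(2)
    return total


print(coin_sum(RandomContext()))  # one random sample
print(distribution(coin_sum))  # the full distribution, as exact fractions
```

In the coin_sum example the consolidate key is (flip index, running total), so the exhaustive mode only has to run the function a few dozen times rather than once per each of the 1024 possible sequences of flips.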


Maybe these two great flavours go together

A problem I have is that when trying to fix stuff I want to do too many things at once, and then I don't know which of them worked.

A thing that just occurred to me is that I know rather a lot about test-case reduction.

Maybe these things could go together...

(You still have the problem that the things you try may interfere with each other but I think that's a lesser issue)

Really the proper frame for this is probably group testing rather than test case reduction, but use what you know.
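
A rough sketch of what borrowing from test-case reduction might look like here. The names reduce_changes and still_fixed are hypothetical: still_fixed stands in for whatever (probably manual) check tells you the problem is still solved with only the given changes applied. The sketch assumes that check is deterministic, and it shares the caveat above that changes may interfere with each other, so the result is a small set rather than a guaranteed minimal one.

```python
def reduce_changes(changes, still_fixed):
    """Drop chunks of changes for as long as the problem stays fixed."""
    kept = list(changes)
    chunk = len(kept)
    while chunk >= 1:
        i = 0
        while i < len(kept):
            # Try removing a whole chunk of changes at once.
            trial = kept[:i] + kept[i + chunk:]
            if still_fixed(trial):
                kept = trial
            else:
                i += chunk
        chunk //= 2
    return kept


# Hypothetical example: five things tried at once, only two mattered.
tried = ["new pillow", "less coffee", "earlier bedtime", "blackout blind", "no screens"]
print(reduce_changes(tried, lambda kept: "less coffee" in kept and "blackout blind" in kept))
```

Starting with large chunks and halving is the usual test-case reduction trick for getting rid of a lot of irrelevant stuff with few checks, which matters when each check costs a day of your life.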


Things you didn't know you can be bad at

Twitter thread by me:

I wonder how many things we're all going around doing badly because the idea of not knowing how to do them well seems too ridiculous to admit to. Prompted by the fact that I'm reading about Buteyko breathing (with a certain amount of skepticism). On the one hand "You're breathing badly" seems like a ridiculous claim. On the other... who ever taught you to breathe? Are you sure you haven't self-taught bad habits? But also prompted by recent conversations about conversation. You've probably never been taught to have a conversation. I've had exactly one class on it and it was in the last six months. I know damn well that many people have not self-taught this well... In general there's this entire class of implicit skills that we mostly don't think of as skills, that we're entirely self-taught on, and that we practice sufficiently non-demonstratively that we can't easily watch what other people do. The result is a very personal skill idiolect

Idiolect is a very good word BTW. Not enough people know and use it.

To unpack on this slightly from conversation in that thread, there are two things going on here:

Other people have pushed back on the notion of "bad". In some cases it's just "could be better". I do think in many cases bad is the right word though. For example if it's really true that overbreathing can cause or aggravate asthma, I think that would count as being bad at breathing.

Another example is that westerners are apparently bad at bending over.

Consider also the way this shows up in our use of language: Someone has bad posture rather than being bad at posture.

From ryan on Twitter

Things you do very frequently and are thus worth a counterintuitive amount of attention/optimization: Sleep, Sit, Walk, Work, Commute, Read, Eat, Drink water, Type, Decide who to spend time with, Browse distractions


I think I actually do spend a counterintuitive amount of time and attention working on most of these. The only ones I don't do much about in particular are "type" and "drink water", but I feel like I'm already way on the good end of the bell curve of those (actually I probably drink too much water if anything. I have observed this pattern, and I think it ties in to other things too much to be worth addressing on its own).


Vegetables and diet

This post by Sarah Constantin about the Fasting Mimicking Diet looks interesting (I generally trust her health recommendations, insofar as I trust anyone's).

In particular the concrete diet plan suggested is:

For the first five weekdays of every month, eat nothing but (non-potato) vegetables, cooked in fat if desired. The rest of the time, eat whatever you want.

I've previously written about trying to eat more vegetables, where I'm trying to bring my meals up to at least half vegetables by volume. I've been doing pretty well on this for most non-breakfast meals, with the exception of lunches when I go in to Imperial.

I'm quite tempted to try this diet, in large part because I think it will force me to sort out my bringing lunch to work situation.



I struggled to refind this comic about questions and answers so here's a link for later use.

Favourite quote:

Once you see that an answer is not serving its question properly anymore, it should be tossed away.


The Duties

I am generally deeply suspicious of normative ethical theories. I think ethics is hard, and attempts to simplify it to a set of rules inevitably have the problems that taking a complex space and making it legible always do. They're potentially a useful mental tool, but as soon as you start arguing about which one is true you've lost. Even if I was a moral realist, which I'm not, it seems obviously the case that the ethics that can be told is not the true ethics.

So obviously I thought it might be an interesting experiment to set out to build one. The following is part of what I came up with. This is something I at most weakly believe. I think it's a good model for evaluation of actions, but I don't think it's one I would actually attempt to follow dogmatically.

You have one duty: The world and all that is in it should thrive. Things should get better over time, not worse.

You have four duties: To yourself, to those around you, to humanity, and to the world.

Until you have fulfilled the earlier duties, you should not consider the later.

  1. Your duty to yourself is that you thrive.
  2. Your duty to those around you is that you help them thrive.
  3. Your duty to humanity is that your existence should help rather than hinder it in thriving.
  4. Your duty to the world is that your existence should help rather than hinder it in thriving.

Each duty supersedes the later ones: Ensure you thrive, then others around you, then humanity, then the world. This does not mean that you should always prioritise yourself above others, but it does mean that you should put your oxygen mask on first before helping others with theirs.

To live a good life is to discharge your duties to the best of your ability. There is no shame in failing to uphold these duties because you are unable, only because you are unwilling.

(I originally had a bunch of text explaining the reasoning behind this, but actually this is more intended as an interesting artifact than something I'd propose to defend, so I just deleted it).


You Can't Trust Lawful Good

Jack posted a link to this YouGov poll in which people were asked their D&D alignment and somehow almost nobody thought they were evil. I'm as surprised as you are given the last two years of British politics.

The particular thing that was surprising to him:

What the everloving fuck? I love the idea of mapping political positions this way, but you reckon "everyone is a bit racist" and "i like theresa may" are... lawful good? How about lawful evil? I think well-meaning fussy philosophical types vote libdem, not ukip.

My reply:

Paladins are literally cops, Jack.

So, uh, yeah. ALGCAP (All Lawful Good Characters Are Bastards).

I have technically played a lawful good paladin. My interpretation of him probably veered more chaotic good than was strictly accurate.

From an earlier discussion with some other people:

I have played a paladin and it worked pretty well. He was very good at smiting people with sarcasm (and then a sword). I think I just decided that traditional interpretations of lawful good as being humourless were too narrow and decided to have fun with the concept. After all, nothing incompatible about a deep burning righteous anger at the injustice of the world and a profound desire to fuck with people.


Shit, maybe I am the lawful good character.

(For the record, I am not the lawful good character. I'm very clearly Chaotic Good. Maybe Neutral Good on days when I'm too lazy to be properly chaotic).

Jack and I later discussed another aspect of the villain versus hero dynamic:

Me: Every time I find myself going "hmm the villain actually has a pretty good point it's a shame they're evil" I start to head canon that I'm consuming media from the "plucky ragtag heroes'" well funded propaganda arm.

Jack: What I eventually realised was that villains' motivations ranged from "bizarre" to "excellent" but that what made them villains was doing unjustifiably bad things in the pursuit of that.

Me: I think that is broadly true, but that it is remarkably convenient how the people with excellent motivations for changing the system always do unjustifiably bad things in the pursuit of that.

"Avatar: The Legend of Korra" is possibly the worst example of this I have ever seen: I broadly enjoyed the show, but every single villain was raising legitimate objections to the system, but fortunately they were evil so you could just punch them and move on without addressing the fundamental systemic inequalities that they were objecting to.

As I put it elsewhere in that thread:

Gotta love them underdogs and their defence of the status quo

Recommended reading/viewing on this subject:

Anyway, that's why I am SOMEBODY ELSE WHO IS NOT ME IS raising an army of crows.


On Formal Mathematics

Some thoughts on the question of the formalisability of mathematics.

Based on "Rigor and Structure" by John P. Burgess I tweeted the following a while back:

I'm reading "Rigor and Structure" by John P. Burgess at the moment, and a point he makes that I really like is that formal logic is best viewed as analogous to a sort of... physics or economics of mathematics. It is a (very good) theoretical model of Actual Mathematics. And one way the model breaks down is that it's a great model of deducibility, but a poor model of deduction - you can be reasonably confident that if a proposition is formally deducible then it's real-maths deducible, and we also believe the converse mostly on faith but it doesn't follow (and mostly isn't true) that the actual process of proof is well modelled by the formal logic - e.g. proof lengths in informal and formal mathematics are not well correlated, and the styles of proof that you adopt are radically different.

Ron Pressler took exception to this framing, and I believe in particular to the part that we believe mostly on faith that every informal proof is formalisable. I've tried several times to write what I believe Ron's argument against this is, but it kept coming out as such an implausible strawman that I must assume that I am misunderstanding some crucial aspect of his point.

Another informative text on the subject is Imre Lakatos's "Proofs and Refutations" which makes, I think, a compelling argument that understanding proof purely in terms of its formal content is a very limiting view. Burgess is arguing around (though not entirely for) the idea that "a proof is that which convinces", but Lakatos is arguing that the primary purpose of a proof is not to convince but to illuminate (this is a huge oversimplification).

The problem I have with the idea that we should naturally expect all informal proofs to be formalisable is as follows:

  1. It's obviously false. Informal proofs lead to formal proof schemas unless you pick your logic very carefully. In particular informal proofs can e.g. quantify over predicates.
  2. The arguments I have seen that we should expect it to be true from the deeper Church-Turing principle seem so obviously wrong that I can't even interpret them as coherent: The ability to simulate a human brain well enough to replicate an informal proof on a Turing machine doesn't tell you anything about formal provability of a proposition, it provides a formal proof that an informal proof exists.
  3. Many informal proofs' most interesting characteristic is that they demonstrate a contradiction or paradox in the natural language that goes away when you attempt to formalise them. For example the interesting number paradox or the Berry paradox. This means that the formalisation process in itself has interesting semantic content.
  4. It is typical for any informal proof to have so many logic gaps in it that finding a formal refinement of the proof really constitutes creating an entirely new proof. Therefore even if this is true, it is true in a fairly weak sense.

Do I think that any sufficiently precise informal proof of a statement whose natural language meaning has an unambiguous formal equivalent can be refined to a formal proof schema? Yes, almost certainly.

Do I think that the above is obviously true rather than a belief? No, and I don't think it would be possible for it to be obviously true - we don't have a sufficiently pinned down notion of what an informal proof even is.

Do I think this is all a pointless distraction from the actually interesting point made above, which is that formal proof is a much better model of provability than it is of proof? Yes, definitely.


Notes on Interviewing

Twitter thread from me:

The worst thing about all of these "diversity shouldn't get in the way of finding the best person!!" arguments (I know, it's a hard choice) is the complete and utter disconnect from reality required to believe that anyone has a clue about how to find the best person. Santa Claus isn't real, and your interview/talk selection/whatever process is probably almost entirely noise rather than signal. The good news about this is that it means you might as well decide on the outcome you want and then apply your process to select among the options within that outcome.

The bad news is I'm going to assume that you've done that, so the outcome you got is the outcome you wanted. These are legitimately hard problems, and I have more sympathy than most for the trade offs involved in them, so I'm not e.g. going to assume you're a terrible person because of a failure to hire diverse candidates, but I am going to assume you probably weren't trying very hard. Source: I've fucked up this way, and in retrospect I wasn't trying very hard.

The two books I recommend to people interviewing are The Halo Effect and Epistemology and the Psychology of Human Judgment. Neither will, unfortunately, teach you to interview well. I don't know of anything that will. If you have book recommendations on this subject then I would like to hear them. Instead what these books will teach you is to doubt your own judgement when interviewing, which I think is a pretty good start - the worst interview processes I've been involved in have failed due to people trusting their own judgement over the process.

Things I would like people to understand about interviewing:

  1. Your process has false positive and false negative rates. You mostly can't see the false negative rate, so it's probably very high.
  2. Without a better idea of what you're actually looking for, your false positive rate is basically meaningless anyway.
  3. You are not actually looking for the best person for the job. You are looking for a person who can do the job well. Trying to find the best person for the job would be extremely expensive in interviewing time.
  4. The job will change people, so even among a small candidate pool the person who is currently the best fit for the job may not be the same as the person who is the best fit in three months anyway.
  5. The "person who is best at the job" according to most easy to track measures may be very different from the person who brings the most to the team.

Given this, my advice to you is not to throw out your whole interview process. Not because I think your current interview process is good, but because throwing it out and replacing it with something else will cost a lot of political capital and you probably still won't create a good interview process because interviewing is bloody impossible.

Instead my advice to you is this:

  1. Keep an eye on your false negative rate. Maybe let a random or biased set of candidates through the early stage of your pipeline who you would otherwise have ignored.
  2. Do pay attention to stuff like what your job ads look like. I don't currently have a good link for advice on this but if someone sends me one I'll edit it in.
  3. Read Kristian's Blog Post
  4. Think in advance about what you actually want and what you would settle for.
  5. Try to make sure you're getting a broader and more diverse audience into your process in the first place. Watch out for filtering that happens before they reach you - e.g. based on recruiters, job platforms, etc.


What is a neural network?

These are some notes I put together in response to the question "How would you explain neural networks to business people?" I have moved them over to my blog as they seem to be getting popular and the main blog is a better format for such posts.


Principles of (Social) System Design

I think the following two points are under-appreciated when designing systems for people:

Therefore when designing systems the most important questions are:

  1. How can we eject bad actors?
  2. How can we retain trust in the system?
  3. How can the system help people to achieve the things that they already want to do?

I have found this to be very true when designing systems for myself (skipping the first step - I may be a bad actor, but I can't eject myself). I don't have enough practical experience at group system design to say for sure it applies there, but just based on what I've observed of other systems' failure modes I'm pretty confident that it does.


I attended the London Liberating Structures meetup the other day. I really enjoyed it. We did a conversation cafe which was an interesting format that I'm definitely going to borrow some ideas from.

The subject of our cafe was "Making Difficult Decisions". I found this really useful and have been wanting to write up my thoughts on our discussion in a proper article, but haven't been finding the time or spoons, so here are my notes on the topic so that I can remember what I wanted to say when I actually get around to that.

Note: Various of these points were made by various people. I'm going to make them without attribution, partly because the conversation was quite personal, and partly because I haven't recorded the attribution! I just want to make it clear that although I agree with all of this, it is not at all original to me. I've tried to make this a fair representation of what people said, but it's inevitably filtered and biased by my views on it and what I found interesting. Where my thoughts are things from writing this post rather than from the conversation itself, I have tried to mark them clearly by prefacing them with "Aside:".

A theme we hit on early is that it's rarely the decision that is difficult. Once you have got to the point where you understand that there is a decision to be made, we haven't actually found that it's very hard to make it, and once we've made it we feel an incredible sense of relief. The difficulty is getting to the point where we have anything as concrete and simple as a single decision.

Examples (these are all super paraphrased, I don't have the actual quotes written down):

In many of these cases, we reported that once we had identified and made the decision we felt great.

A key point that we identified in this "making decisions is easy" aspect is resilience - the ability to feel safe making these decisions. We knew that regardless of which we decided, nothing too bad was going to happen to us. We mostly talked about emotional resilience.

Aside: Financial resilience is also important, but we had a group of people from (I assume) fairly high-paying jobs by nature of the meetup, so this didn't end up really factoring in to the discussion.

Another thing that came up was the observation that often there was some sort of crystallising event that prompted the decision, e.g. being asked "Are you happy?", or some particularly bad event at work that forced them to realise that it was time to think about leaving.

Aside: I have two personal examples related to this that I didn't bring up. The first is that when I left Google it was prompted by reading the GCL ("GoogleGeneral Configuration Language") specification from cover to cover and going "fuck this shit". I didn't quit over GCL, but GCL was what prompted me to realise that I should. If you want an idea of what GCL is like, I refer you to flabbergast which is basically an open source implementation of it. The second is that I was recently asked "If you weren't working on this, what would you be working on?" RE Hypothesis, which was a very useful clarifying question for a number of reasons that alas this margin is too large to contain.

Some good points that were made in the course of the discussion:

Someone made the point that we were very focused on "making decisions in hard situations" and asked whether there were easy situations that had hard decisions. The joking example was that "chocolate or ice cream" is never a hard decision.

Aside: This is an instance of the Buridan's Ass problem. It's tempting to treat making a decision between very similar things as hard, when in fact it should be easy - just flip a coin because it doesn't matter very much.

One example we identified as a difficult decision that crops up without an accompanying difficult situation was "Should I flake on this event or not?" - not going feels like letting people down, even if you really don't have the energy or health to go.

Aside: Part of why this is a difficult decision is that even though (in most cases) the individual decision doesn't matter, the aggregate effect of it does. I have flaked on two events in a row with a friend recently, both for good reasons (one iron-clad, one merely good), but this feels much worse than twice as bad as flaking on one event, because it shows a pattern of flakiness. See also this Vox article on losing friends.

Some questions we finished with:


I made a comment in conversation on Twitter the other day that I like and need to think about more:

"A logically omniscient instance of homo economicus" is mostly just a good user persona to have in mind in your system design meetings

Despite being generally down on SEU, both as a normative and as a descriptive model, I think this might actually be a good use case for it: Treating your system design as if it contains such people and asking what they will do will tell you useful things, as long as you don't pretend they are the only people using the system.

It's not dissimilar to the principle of design on the assumption that an abusive ex will be using the system: Abusive exes are not your main users (hopefully!) but knowing how they will abuse the system tells you important things about what you need to do.

One reason that this is an important user persona to consider is that people will tend to approximate it increasingly well as they get used to the system, and increasingly well as the amount of time and money they have available to bring to bear on it grows, so you can think of homo economicus as the user persona for people who are willing to sit down and figure out how to actively game your system.

A useful feature of both of these personas is that there's a continuum between normal users and them. Someone doesn't have to be an abusive ex to behave like an asshole to another user, and the tools for dealing with an abusive ex help there too. Similarly, if the system pushes homo economicus to awful destructive behaviour that you want to avoid, it will probably nudge normal users in that direction too.


Group Decisions on Names

Here's a question I'm currently wondering about: How would you design a group decision making procedure for naming things?

For Sinister and Dexter we used Majority Judgment. We brainstormed pairs of names until we ran out, then we cast a vote on them.

The voting went as follows:

So how this played out was that in the first round "Sinister and Dexter" and "Lorem and Ipsum" both scored 4, and everything else scored less, so those were the two candidates that made it through to the second round. We then removed a 4 from each of their scores, and now "Sinister and Dexter" still scored 4 while "Lorem and Ipsum" scored 3, so the cats were named Sinister and Dexter.
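
The tie-break described above can be sketched in a few lines of Python. The actual ballots aren't recorded here, so the ones below are hypothetical, chosen only so that they reproduce the scores described (three voters is also an assumption). The sketch assumes a name's score is the lower median of its grades, and that every name gets the same number of grades; median_grade and majority_judgment are made-up names.

```python
def median_grade(grades):
    # Lower median, so an even number of grades resolves downwards.
    ordered = sorted(grades)
    return ordered[(len(ordered) - 1) // 2]


def majority_judgment(ballots):
    """Return the list of winners (more than one only on an exact tie)."""
    grades = {name: list(votes) for name, votes in ballots.items()}
    tied = list(grades)
    while True:
        best = max(median_grade(grades[name]) for name in tied)
        tied = [name for name in tied if median_grade(grades[name]) == best]
        if len(tied) == 1:
            return tied
        # Remove one copy of the shared median grade from each and go again.
        for name in tied:
            grades[name].remove(best)
        if not grades[tied[0]]:
            return tied  # ran out of grades: a genuine tie


# Hypothetical ballots from three voters, chosen to match the scores above.
ballots = {
    "Sinister and Dexter": [4, 4, 5],
    "Lorem and Ipsum": [3, 4, 5],
    "Some Other Pair": [2, 3, 4],
}
print(majority_judgment(ballots))
```

With these ballots the first two pairs both start on a median of 4, and after a 4 is removed from each they score 4 and 3 respectively, which is the shape of the result described above.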

Was this a good system? No, not really. I'm happy with the result, but the fact that you got a good outcome doesn't mean you had a good system.

There are a couple of problems with this. Firstly, the voting system. I don't have a problem with majority judgment (range voters don't @ me), but I think for this kind of very small group decision making any voting system has a legitimacy problem. For example, imagine we had an option where the votes were 5, 5, 1. This would win, because its initial score was 5, despite one of us hating the name.

The bigger problem with this though is that it treats naming as a closed list decision procedure: We decide on the names up front, then we vote on it, and use the outcome of that vote. This is nonsense. Naming is intrinsically open ended - generating new candidates is cheap.

For example the following might have been a better procedure:

  1. Everyone sits silently and writes down as many names as they like.
  2. People read out the names they've chosen and, if they like, explain their reasoning and origin. Anyone has the opportunity to veto a name. Vetoes are encouraged - the goal is to only leave names in the list where everyone is happy to use the name, even if it's not their favourite.
  3. We vote on the names as above using majority judgment.
  4. This gives us a candidate name.

If we have an existing candidate name, we now take a majority vote whether to replace it with the new one. We then take a vote as to whether to continue the process or use the current candidate name. If we continue, repeat as above, possibly after some break.

Note that I don't think this is a particularly good system, it's just a sketch - the point is to incorporate the deliberative process of naming things into the mechanism, and to treat voting as a guideline rather than a source of legitimacy.


Lightweight RPG Systems

This was a good thread about Fail Better Games's new RPG, which in particular links to a lot of other interesting lightweight systems.

Mostly putting this here to save for later because I struggled to refind it.

Another thing which isn't in this thread but was mentioned to me recently and looks a lot of fun is Lasers and Feelings.


A thing I've been noticing a lot recently in how I think about problems is what an essential role switching between different models of the system has.

For example, when thinking about groups of people, it's important to think about the systems - what incentives are at play, how does the group respond to events, etc. You treat the group as an abstracted object that is not made up of complex individuals but instead has a few very simple variables in play.

It's also important, both ethically and practically, to think about the group as a collection of individual people. People are complex and will surprise you, and if you neglect their individual needs then you will usually treat people badly.

It might be tempting to think that the ideal is a single unified view of the system which accounts for everything, but realistically that's almost always impossible, and switching between multiple very distinct models can often work nearly as well.

One thing this does is it combats the problem of legibility - the thing where the easiest way to understand something simply is to make it simple enough to understand, destroying much of its essential complexity and causing massive damage - but this seems to be true even without that. e.g. in mathematics it is often useful to switch between different representations because they make different features more salient.


I just read Richard Gabriel's The Structure of a Programming Language Revolution.

I don't think I have enough grounding in either lisp or the philosophy of science to fully understand it, and want to do a second and closer read with citation chasing, but one point really stood out for me:

I believe that, in general, this view of engineering and science is false: I believe engineering and science are intertwined, and for programming languages and software creation techniques, it’s often the case that engineering precedes science—and it’s very easy to see it.


One good example is the steam engine. Engineers began its development while scientists were making their way from the phlogiston theory of combustion to the caloric theory of heat, both today considered hilarious.

I think this observation is obviously true, and I'm kicking myself for the fact that I didn't think it was obviously true until it was pointed out to me.

This is also interesting in the context of the way pure and applied maths work. Often physicists are doing interesting mathematical things that are "completely wrong" until a pure mathematician comes along and provides a theory of how they could work.


Some of my favourite PyCon UK talks

The PyCon UK team are amazingly fast at uploading their videos, which means the entire conference is now online on YouTube (there was some problem with one video but I'm not sure whether that's been resolved, so maybe the entire conference bar one talk).

If I'm being honest, I don't go to PyCon UK for the talks - I go because it's an amazing community. In general I don't get a huge amount out of talks in most conferences I go to. However I thought there were some especially high quality talks this year, including some that I not just watched but am going to rewatch.

So, here are some of my favourite talks that I would actively recommend.

The two talks that were so good that I intend to rewatch them (mostly to mine for citations and talking points) are:

Sue Sentance's keynote basically made me go "Welp, I need to redesign all my workshops". Hannah's talk had a lot of interesting material on reading code, and I want to follow the references to read more about it - I don't currently need most of the specific advice on legacy code, but reading code is still a very useful skill, especially given Sue's emphasis on its importance for education!

Daniele also gave an impromptu talk about documentation that is going to significantly impact how I write documentation in future. It was not recorded, but I believe it was a variation on this talk at PyCon Australia (or possibly this one at PyCon US).

Talks I would recommend but was already too much of a member of the choir to get that much out of (all of these are by people I'd consider friends or at least friendly acquaintances. This probably isn't coincidental but wasn't deliberate in my selection except in the sense that I always go to friends' talks if they don't clash with anything else that grabs me):

Misc talks I enjoyed and got something out of but that I don't have any particularly insightful categorisation of:

I also thought the lightning talks this year were excellent:

If you have a burning urge to see me speak (which I'm mostly not doing this year), I gave a talk about voting systems. I'm mostly pretty happy with how this turned out.


Notes on Tweeting Too Much At Conferences

Well, PyCon UK, the best conference, is over for another year. Sad face.

This year I ended up doing something with a surprising amount of impact on my and others' experience of the conference: I tweeted a lot. Yes, I know, even by my standards. I essentially became the unofficial scribe of the conference. I won't even attempt to embed them, but here's a search query that will give you everything I tweeted on the conference hash tag for this conference.

Each day:

Why? Well, I'd been talking recently about how conference organisers put up with a lot and a point that got made in response to this is that a really helpful thing for attendees to do is tweet about the conference - it helps get more sponsors next year, promotes the ideas of the conference, and generally raises its profile. This seemed an easy enough thing to do, so I decided to give it a try and got a little bit carried away.

People seemed to really like me doing this. Especially the organisers - I heard from a lot of them that the running commentary helped them feel more in touch with the conference. So if this achieved nothing else then I'm happy with it. It was also appreciated by people who weren't able to make the conference, and in a few cases by those who would never have come because they weren't even programmers (though I of course still think they should come)!

People have asked me how I did it, but it's not really complicated: I had a laptop, I touchtype really fast, and I've wasted far too much of my life on Twitter. I had not previously thought the latter was a professional skill, but apparently.

I did have a couple of problems with doing it:

On the whole though, this level of live-tweeting seems to have been popular, and I will probably do it again at future conferences I attend.


Lean Coffee at PyCon UK

As part of the PyCon UK sprints I ran a variant lean coffee. It worked really well - we had a bit of an initial slow start, peaked at more people than the group could really handle, and gradually tapered down to a group of four by the end of the day. This was split over three sessions, during which we discussed 23 different cards.

The variant we ran was based on a previous proposal of mine to randomize lean coffee. Several people had reported that they ran lean coffees this way after my post, and thought that it worked much better, but I'd never actually got around to trying it, and I thought this was a good opportunity.

In my entirely unbiased opinion, I can confirm that it works much better. People seemed to really enjoy the format, and many of them reported that they would take it away and try to run it at work. For the third session I was basically wiped out (scribing, moderating, and discussing at the same time was something I could do, but it was very high energy), so I passed on the duties - one person took up moderating, another took up scribing, and there were no problems with doing so, so the system seems to be easy enough to transmit to other people, which is a win.

The Conversation

The point of the system is to provide a structured conversation about a large range of topics in a very short space of time. We select a card (more on that below). This has a discussion prompt (often a question) and the name of the person who proposed it. At this point anyone may veto discussing the card. People shouldn't veto cards just because they're uninterested: the veto system exists for topics that would make you unhappy or uncomfortable to have discussed. This never actually came up in practice. I don't know if that is because people didn't feel empowered to veto or because it was never necessary - I think it is the latter, but I don't feel like people would have necessarily been willing to veto if they needed to.

The proposer gets a (short!) opportunity to elaborate on the theme, then the group votes on whether they want to discuss it. If a strict majority raise their hands, a 5 minute clock is started, and the group discusses it until the time runs out. At that point, the group votes whether they want to continue it. If a strict majority does, a new 5 minute timer is started, and the discussion continues. The subject may not be extended a second time.

Selecting the Cards

At the beginning everyone writes down as many cards as they like and these are put in a central pile. These are shuffled, and cards are selected by drawing from the top of the pile. Anyone can insert a new card at any time they like, at which point the deck is reshuffled.

We adopted a system that I like in principle and think worked reasonably well but maybe ended up a bit too complicated - in order to ensure everyone got a good opportunity to seed the conversation, we deprioritised cards from people who had recently had their topic discussed.

The way this worked was that when a card had been discussed we put it face up on the table. If a card came up from someone whose name was already on the table, we put it aside. Once we had been through the whole deck, we stacked the cards that were face up so that they were no longer visible, shuffled the cards that had been put aside, and started the process again.
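As a rough illustration, the deprioritisation scheme can be sketched in a few lines of Python. This is a simplification under stated assumptions, not a record of what we actually did: it ignores vetoes, votes, and cards being added mid-session, and the names `run_round` and `discussion_order` are made up for the example.

```python
import random


def run_round(deck, rng):
    """One pass through a shuffled deck of (proposer, prompt) cards.

    A card is discussed unless its proposer already has a card face up
    on the table from this pass, in which case it is set aside.
    """
    deck = list(deck)
    rng.shuffle(deck)
    names_on_table = set()
    discussed, set_aside = [], []
    for proposer, prompt in deck:
        if proposer in names_on_table:
            set_aside.append((proposer, prompt))
        else:
            names_on_table.add(proposer)
            discussed.append((proposer, prompt))
    return discussed, set_aside


def discussion_order(cards, seed=0):
    """Repeat passes over the set-aside cards until every card is discussed."""
    rng = random.Random(seed)
    order = []
    remaining = list(cards)
    while remaining:
        discussed, remaining = run_round(remaining, rng)
        order.extend(discussed)
    return order


# Hypothetical example deck: A proposed three cards, B and C one each.
cards = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("C", "c1"), ("A", "a3")]
order = discussion_order(cards)
```

In the first pass each proposer gets exactly one card discussed, so someone who wrote many cards cannot crowd out everyone else early on.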


Things that didn't quite work

In Future

A lot of people came away from this going "This was great, I need to run some of these". Including me! Despite the fact that several people have used my variations on lean coffee before, this is actually the first time I have. I'd already been thinking I'd like to run one of these at Imperial, and now I'm even more sure I would like to do that.

We also talked about maybe running some of these earlier in PyCon UK next year. They were a great generator of high insight conversations, and I think provided some really nice social connections with people that it would have been great to form more than six hours before we had to say "Well, see you in a year I guess!".

In general, a lot of the things we talked about involved things that might be nice for the conference next year (not criticisms! Almost all of the form "PyCon UK is great, but here's an idea that might make it even greater"). I'm probably going to (finally) get involved in the organisation of PyCon UK next year and once people have decompressed a bit and are ready to receive feedback, I'm going to write a summary of what those were and circulate it.


This is a fairly involved example which I don't expect to convince anyone, and is just the result of me thinking through some things.

Suppose we have a bunch of propositions \(A_1, \ldots, A_n\). We know a priori that \(A_i \implies A_j\) is false for \(i > j\), but do not know whether there are any forwards implications. We have an "implication oracle" which acts as follows:

  1. It has access to a number of "primitive implications" of the form \(A_i \implies A_j\). These implications are considered to be unreliable: They are true with probability \(1 - \epsilon\), but with probability \(\epsilon\) they provide no information (i.e. the proposition may be true, we just don't know that). These errors are independent.
  2. We may query the proof oracle with any pair \(A_i, A_j\) and it tells us the probability of there being a valid proof of \(A_i \implies A_j\) given only a true set of primitive implications.

We also have a "plausibility oracle" that gives us our prior probabilities \(p_i\) of \(A_i\) being true.

Suppose we want to define an agent that chooses between these propositions, with a reward if the proposition chosen is true.

We can define a Bayesian agent that picks whichever of \(k \in \{i, j\}\) has the highest posterior probability \((1 - \epsilon) P(A_k | A_i \implies A_j) + \epsilon p_k\).

The problem with this agent is that it is not transitive!

Consider the following example: Let \(\epsilon = 0.11\), and suppose we have the primitive implications \(A_1 \implies A_2\), \(A_2 \implies A_3\) and the prior probabilities \(p_1 = 0.2\), \(p_2 = p_3 = 0.01\).

Some boring computation that I can't be bothered to carry over to text results in the above agent preferring \(A_2\) to \(A_1\), \(A_3\) to \(A_2\) and \(A_1\) to \(A_3\). The reason is that the implication \(A_1 \implies A_3\) is weaker than either of the individual implications: it needs both primitive implications to hold, so it fails with probability \(1 - (1 - \epsilon)^2 \approx 0.21\). Thus even though we still "believe" this implication, the weaker strength of it makes our prior probabilities overwhelm it.

Now, this paradox goes away if we have access to the inner workings of the implication oracle: If we know all of the primitive implications a priori then we can just calculate the "true" posterior probabilities across all possible combinations of whether the implications are valid or not, and pick the answer with the highest posterior, but this effectively requires us to know the entire space of propositions in advance.

I think that no strategy which has to decide based only on the answer of those two oracles on the current pair can dominate this strategy, because this is the dominant strategy for the case where there are only two propositions and the oracles are exactly correct about the probabilities, but I haven't checked the details of this argument.

So what this means in practice is that if some elements of your reasoning are "screened off" from you as black boxes, and you do not have full knowledge in advance of the set of available options, even a fully VNM-rational Bayesian reasoner will necessarily exhibit intransitive preferences.

However! Note that this does not mean that they can be Dutch Booked. The reason for this is that the proper Bayesian reasoner will update their posteriors about propositions as they are forced to make choices. This may in fact mean that the time varying preferences they make are actually transitive, at least in the limit, while their instantaneous preferences are not.


(Lightly edited) exchange from Twitter

Me: These days I think I'm in favour of a hung parliament. *checks dictionary* Oh that's not what it means. Never mind.

Jack: Are you fishing for "hanged parliament"?

Me: No, people are hanged. Animals are hung.

I don't actually subscribe to this worldview (I believe very strongly that dehumanizing your opponents is morally indefensible no matter how evil you think they are), and there are literally dozens of MPs who I would be sad if they were hanged, but sometimes my inner evil overlord just insists on coming out to play.

Also I really want to use this line in a story now.


What might a continuous rational agent look like?

In a previous post I said I didn't care much about this problem, which obviously nerd-sniped me into thinking about it.

So, the question is: We have a "rational" agent which is making choices over pairs of lotteries \(\mathcal{L}\), and it does this in terms of a function \(\tau : \mathcal{L}^2 \to [0, 1]\) where \(\tau(u, v)\) means "the probability of choosing \(u\) in preference to \(v\)".

We had the nice (ish) VNM theory for physically impossible discontinuous rational agents, but what should the axioms for a continuous rational agent look like?

The following seem like they should obviously hold:

  1. \(\tau(L, M) = 1 - \tau(M, L)\).
  2. If we define \(L \prec M\) as meaning \(\tau(L, M) = 1\) then \(\preceq\) should be a partial order.
  3. If we define \(L \sim M\) as meaning \(\tau(L, M) = \frac{1}{2}\) then whenever \(L \sim L', M \sim M'\) we have \(\tau(L, M) = \tau(L', M')\).
  4. \(\tau(L, pL + (1 - p) M)\) should be monotonic in \(p\), and whenever it is non-constant that monotonicity should be strict.
  5. For any pure lotteries (that is, lotteries which take a single outcome with probability \(1\)) \(\tau(L, M) \in \{0, 1, \frac{1}{2}\}\). i.e. for any concrete outcomes the agent either has a strict preference or is indifferent between them.
  6. \(\tau(L, pM + (1 - p)N) \geq \min(\tau(L, M), \tau(L, N))\)

Together with the continuity requirement, these give us roughly the equivalent of the first four VNM axioms in Wikipedia's ordering.

In contrast, the independence requirement obviously doesn't and cannot hold in any meaningful sense for such an agent. Pick two lotteries with \(L \prec M\). Consider \(\tau(pL + (1 - p)N, pM + (1 - p)N)\). This is a continuous function of \(p\), and when \(p = 0\) it is equal to \(\frac{1}{2}\), therefore for any \(\epsilon > 0\) there must be some \(0 < p < 1\) for which \(|\tau(pL + (1 - p)N, pM + (1 - p)N) - \frac{1}{2}| < \epsilon\), even though independence would require the preference between the two mixtures to be just as strict as the one between \(L\) and \(M\). This breaks independence in a very strong way.

I think this lack of independence is in some sense the "essential difference" between a continuous and a discrete rational agent.

I'm not sure the above are the full set of axioms required. They feel a bit weak - I think more may need to be said about the relationships between \(\tau\) values over convex combinations of lotteries.

However, the following two examples might be illuminating in terms of things that obviously should be considered rational agents:

Let \(\mu\) be some utility function over outcomes and let \(\alpha: [0, \infty) \to [0, 1]\) be monotonic decreasing with \(\alpha(0) = 1\) and \(\alpha(x) \to 0\) as \(x \to \infty\). If \(E(\mu(L)) > E(\mu(M))\) then let \(\tau(L, M) = 1 - \frac{\alpha(E(\mu(L)) - E(\mu(M)))}{2}\), so that \(\tau\) tends to \(\frac{1}{2}\) as the expected utilities approach each other and to \(1\) as they move apart. Otherwise extend according to the requirement that \(\tau(L, M) = 1 - \tau(M, L)\).

The idea is basically that \(\alpha\) acts as a decision procedure about whether it's worth finding out more information - it represents the probability of giving up and flipping a coin. You run this procedure by observing to increasingly high precision until you either know that alpha is large enough that you should give up (based on a non-deterministic choice of doing so) or which side of the border you're on.
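A minimal sketch of this first example, reading \(\alpha\) literally as the probability of flipping a coin: the better lottery is then chosen with probability \((1 - \alpha) + \frac{\alpha}{2} = 1 - \frac{\alpha}{2}\). The particular choice \(\alpha(x) = e^{-x}\), the dictionary representation of lotteries, and the function names are all assumptions made for illustration.

```python
import math


def expected_utility(lottery, utility):
    """lottery maps outcome -> probability; utility maps outcome -> value."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def alpha(x):
    """Hypothetical give-up probability: 1 at x = 0, decreasing to 0."""
    return math.exp(-x)


def tau(L, M, utility):
    """Probability of choosing lottery L in preference to lottery M."""
    diff = expected_utility(L, utility) - expected_utility(M, utility)
    if diff >= 0:
        # With probability alpha we flip a coin, otherwise we pick L.
        return 1 - alpha(diff) / 2
    # Extend by the requirement tau(L, M) = 1 - tau(M, L).
    return alpha(-diff) / 2


utility = {"bad": 0.0, "good": 1.0}
sure_good = {"good": 1.0}
sure_bad = {"bad": 1.0}
mostly_bad = {"good": 0.1, "bad": 0.9}
```

Here `tau(sure_good, sure_good, utility)` is exactly one half, and the preference for `sure_good` over `mostly_bad` is weaker than over `sure_bad` but still strictly better than a coin flip, which is the continuous behaviour the discrete choice function cannot have.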

Another procedure that I think can not be realised as an instance of that but should still be considered rational is how a logically omniscient Bayesian agent who is only able to access the lotteries through sampling might behave. You start with some prior distribution over lotteries (maybe an improper one) and query the sampler for each, with some cost function \(\alpha: \mathbb{N} \to [0, \infty)\) for how expensive it is to evaluate \(n\) samples (probably \(\alpha\) is a linear function). You stop and choose as soon as you hit a point where your expected value (under your posterior distributions) of acquiring more information is strictly less than the expected value of choosing one of the outcomes.

I don't want to suggest that either of the two above are the only possible rational agents in such circumstances. I suspect in fact that there's a much broader diversity of behaviour possible than for VNM rational agents, which might make any axiomatic classification hard.


Programming vs Mathematics

Programming: "Custom operators and single letter variable names? Why so terse? Bytes are cheap! Suspish. Not sure if want. Code should be optimised for reading, not writing!"

Mathematics: "Let \(\alpha, \gamma, \beta\) be as in theorem 17.1. If \(\gamma \wedge \beta\) is a normal R-domain, then \(\mu(\alpha \oplus \gamma) \dagger \beta\) is quasi-uniform."

(No, that's not an actual quote and those terms don't really mean anything)


Physical and Topological Limitations to Rational Choice

Epistemic status: Confident.

Attention conservation notice: So much inside baseball.

Context: This is something I've known about for a while but couldn't find a concise write-up of that I had previously written and still liked, so I thought I'd just rewrite it here.

The Von-Neumann-Morgenstern utility theorem states that if you ask people to choose between finite lotteries over outcomes, any "rational" behaviour looks like picking based on whichever gives the greatest expected utility according to some utility function over the outcomes.

I have a number of objections to this idea, but my main one is this: Regardless of the axioms you choose for rationality, it's physically impossible to implement an agent that can express the sort of total order over lotteries that VNM rationality is a theorem about, even if you grant the existence of logically omniscient agents (you can do it if you grant the existence of actually omniscient agents).


Well, suppose we have some total preorder \(\preceq \subseteq \mathcal{L}^2\), where \(\mathcal{L}\) is the set of lotteries over some finite set of outcomes. Take this total order and define the choice function \(\tau : \mathcal{L}^2 \to \{-1, 0, 1\}\) where \(\tau(u, v) = 0\) if \(u \preceq v\) and \(v \preceq u\), else if \(u \prec v\) then \(\tau(u, v) = -1 = -\tau(v, u)\). i.e. \(\tau\) is a function determining whether we strictly prefer one or are indifferent between the two.

Any choice that a physical agent makes must be based on a finite (but not necessarily bounded up front) set of observations. Each of those observations can only give you information about the world up to some non-zero (but potentially arbitrarily small) tolerance. In particular, if we have lotteries \(u_1 \preceq u_2\), there is some \(\epsilon > 0\) such that if \(d((u_1, u_2), (v_1, v_2)) < \epsilon\), we must have \(\tau(v_1, v_2) = \tau(u_1, u_2)\), because we only ever looked at \(u_1, u_2\) up to some finite precision.

In particular this means that \(\tau: \mathcal{L}^2 \to \{-1, 0, 1\}\) is a continuous function!

Unfortunately, \(\mathcal{L}^2\) is a connected topological space and \(\{-1, 0, 1\}\) is totally disconnected. Thus any continuous function must be constant. We know that \(\tau(u, u) = 0\), so we must have \(\tau(u, v) = 0\) for all \(u, v\). i.e. the only physically possible total preorder that we can express is the one that is totally indifferent between lotteries.

If the above argument makes no sense to you, another way to look at it is that you need to know \(u\) and \(v\) to infinite precision at the boundary. If you are on a boundary point where \(\tau(u, v) = 0\), moving slightly in the direction of \(u \prec v\) immediately forces your hand, so you cannot satisfy that continuity property at the boundary.

This problem can be made to go away by removing the indifferent set from the set of lotteries you consider - it's perfectly physically possible to distinguish the lotteries if you know a priori that you will definitely have a preference for one of them. Unfortunately there are several problems with this:

  1. Where does that a priori knowledge come from? It's obviously not true in general - you have to somehow avoid ever being asked to choose between a lottery and itself.
  2. The VNM axioms rely crucially on the use of indifference in the continuity axiom.
  3. The implementation of such a choice function is still physically fraught, because as you approach the boundary the amount of precision you require to decide tends to infinity.

It is possible that there is some cunning workaround that lets you rescue VNM choice theory, but I find it very implausible that this is the case. The basic requirement that you construct a discrete function of the world is intrinsically aphysical, and it seems very hard to rescue that.

I have yet to sit down and think through exactly what I would like in its place in any great depth, mostly because nobody except me cares about this problem and I don't care enough to pursue it solo, but my preferred primitive is to replace the choice function with a continuously varying choice probability, so instead you have a function \(\tau: \mathcal{L}^2 \to [0, 1]\), where \(\tau(u, v)\) is the probability of choosing \(u\) over \(v\), and \(\tau(v, u) = 1 - \tau(u, v)\). This neatly side steps all of the problems with using a discrete choice function, because you don't need to know the outcome probabilities with infinite precision in order to make your choice, and \([0, 1]\) is a connected topological space so, unlike discrete choices, it's perfectly possible to construct such functions.


I previously wrote a post about NP-hardness in decision theory. On rereading it, I think its tone really doesn't help its point at all, so I thought I'd quickly write up a more formal version.

The basic point is this: If you don't assume computation is free, NP-hard problems provide an interesting barrier to decision making that satisfies the classical "rationality" axioms.

Suppose you have an NP-hard problem (say a SAT instance), \(S\), and are offered a choice between the following three options:

  1. A certain reward \(R_1\).
  2. A reward \(R_2 > R_1\) if a particular solution \(x\) witnesses that \(S\) is satisfiable.
  3. A reward \(R_3 > R_2\) if \(S\) is satisfiable.

Arrange the values such that \(\alpha_2 \ll R_2 - R_1 < R_3 - R_1 < \alpha_3\), where \(\alpha_i\) is the cost of solving the computational problem that would allow you to determine the pay off of these choices. i.e. the difference in reward is big enough that it is worth evaluating one solution, but small enough that it's not worth solving the problem.

You should always strictly prefer \(3\) to \(2\), because under every circumstance where \(2\) pays anything, \(3\) pays a larger amount.

Additionally, you should prefer \(2\) to \(1\) if and only if \(x\) is a witness - because it's cheap enough to check, you just acquire the information and be done with it (if you want to quibble about expected payoffs, assume that \(\alpha_2\) is really really small).

However, you should also prefer \(1\) to \(3\), because the possible reward you can gain by solving the problem is not worth the cost, so you should take the certain reward instead.

This means that if \(x\) is chosen so that it is a witness, you have an intransitive preference \(3 > 2 > 1 > 3\).

Another way of thinking about this is that the choices over this problem are not subset consistent. The choice you make from \(\{1, 2, 3\}\) in the case that \(x\) is a solution is \(3\) - evaluating \(2\) tells you that it's worth choosing \(3\) over \(1\), so you can skip paying the cost and just choose the good option. In contrast, your choice when picking from \(\{1, 3\}\) would be \(1\) - removing a value that was not the chosen answer has caused your opinion to flip.
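To make the asymmetry concrete, here is a small Python sketch of the three options for a SAT instance. It is an illustration under assumptions rather than anything canonical: clauses are written as lists of non-zero integers (positive for a variable, negative for its negation), the function names are invented, and the agent is hard-coded to pay for checking a witness but never for solving the instance.

```python
from itertools import product


def satisfies(clauses, assignment):
    """Cheap: check a candidate witness in time linear in the formula size."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_satisfiable(clauses, n_vars):
    """Expensive: try all 2**n_vars assignments."""
    return any(
        satisfies(clauses, dict(zip(range(1, n_vars + 1), bits)))
        for bits in product([False, True], repeat=n_vars)
    )


def pairwise_choice(a, b, clauses, witness):
    """Preferred option out of {a, b}, for options numbered 1, 2, 3 as above."""
    pair = {a, b}
    if pair == {2, 3}:
        return 3  # 3 pays more in every case where 2 pays anything
    if pair == {1, 2}:
        return 2 if satisfies(clauses, witness) else 1  # checking is cheap
    if pair == {1, 3}:
        return 1  # solving costs too much, and no witness is on offer
    raise ValueError("expected two distinct options from 1, 2, 3")


# Hypothetical instance: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
witness = {1: True, 2: False, 3: True}
cycle = [
    pairwise_choice(3, 2, clauses, witness),
    pairwise_choice(2, 1, clauses, witness),
    pairwise_choice(1, 3, clauses, witness),
]
```

With a valid witness the three pairwise choices come out as 3, 2 and 1 respectively, which is exactly the cycle: each option beats the next one round the loop.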


Terry Tao has an interesting series of posts:

The idea of the "no self-defeating object" argument is, roughly, that if there were some object that "defeats" all objects, then it would also defeat itself, and thus cannot exist. It's a specific form of reductio ad absurdum, and can be applied to many different forms of "object" and notions of "defeat".


In the second post he outlines how we can almost always turn these arguments instead into "every object is defeated by some other object", and this often works better for people uncomfortable with proof by contradiction (which is most non-mathematicians).

The third post is especially interesting in the light of my recent post about the nature of mathematics, in that it observes that an unusual characteristic of mathematics is that mathematical statements are intended to have a precise meaning in a way that natural language statements typically are not.

This suggests the following modified definition:

Mathematics is the study of unambiguous statements about hypothetical objects


The laptop policy from Shriram Krishnamurthi's "Accelerated Introduction to Computer Science" class is an interesting collection of resources on laptop usage in class.

I've definitely found that it is true that longhand note taking improves my retention and focus while device usage immediately kills it. The point about device usage distracting other people around you is particularly interesting though.

I feel like imposing this sort of rule is a deeply unpopular move in my social group, but I think they're mostly wrong about that. OTOH this is very much a question of competing access needs and I'm not sure what the best way to resolve it is.


How Complex Systems Fail is very good. A lot of the citations have been on my reading stack for a while, and given that I'd already deprioritised them I'm now inclined to just not bother now that I've read the TLDR.


I've been trying to come up with a definition of mathematics that I like and think would be useful in the course of teaching people mathematics.

This is of course a big ask, as according to Wikipedia there is a great deal of spirited philosophical debate on the subject, but on the other hand I think most of those definitions are terrible, so I don't feel too bad about trying myself.

The one I dislike the least from that list is Eric Weisstein's:

Mathematics is a broad-ranging field of study in which the properties and interactions of idealized objects are examined.

It's a bit long-winded but mostly captures the sense I want. The phrasing I've been thinking of in preference is something more like:

Mathematics is the rigorous study of hypothetical objects.

The idea is that in mathematics we're not really concerned with real life physical objects, we can just say "Suppose there were objects satisfying the following properties, what can we reliably say about them?"

Sometimes those objects are ones that can easily be realised as real physical objects. For example the Mathematics of Chess studies hypothetical chess boards, but those hypothetical chess boards can easily be realised by going out and buying an actual physical chessboard. However, many of them can not be. There is no way to construct a real physical set of natural numbers, but from a mathematical point of view that's OK - we can reason about the properties of the hypothetical one perfectly well.

There are a couple axes of variation on which people differ about the nature of mathematics:

  1. Is informal mathematics legitimate, or should all mathematics be considered a (possibly bad) approximation to an entirely formal set of reasoning rules?
  2. Are some hypothetical objects privileged as the true platonic mathematical objects in a way that others are not?

I think this definition is more or less compatible with any combination of answers to these questions: Formalism is a question of what we count as "rigorous", and even if there are platonic mathematical objects, we still must study them as if they were hypothetical because by its very nature we cannot have access to the platonic realm.

Traditionally the answers to these questions have been correlated more than I think is logically required: The formalist position is that mathematics doesn't real and that everything is formal manipulation of symbols, while the platonist position is that we are seeking to discover truths about the ideal platonic realm and the truths are what matter regardless of how we reason about them.

I think there's room for a third position though, which is that formalism is interesting but not strictly required, but the objects we describe have no inherent reality and really are allowed to be purely hypothetical. I've historically self-described as a formalist, but I think this third position is closer to my true beliefs: I don't think Platonism is philosophically defensible, but I do think there is a lot of interesting mathematical content and activity that cannot be adequately captured by the formalist position.

In many ways this third position is that of Lakatos in his "Proofs and Refutations". Most of the interesting mathematics happens in a fuzzy middle-ground where you are making your definitions precise enough to be defensible. This could go all the way to formalism, but it doesn't have to.

The mathematics of chess is again an interesting test case here: Chess is a purely arbitrary set of rules. I think it would be hard to argue that there is a platonic game of chess that is in some essential way different than it would have been if, say, kings moved like knights or you could win by killing the queen or the king. These are both perfectly valid games that someone could play, and there is a perfectly valid mathematics in studying them, but we study the mathematics of chess in preference to them because that is the actual game people play.

Conversely, there really is a set of true statements about the game of chess (in an informal sense of chess), and while mechanising and formalising the study of them might be useful for determining what they are, I think it's fair to say that what actually matters is whether the statement is true of real games of chess, and the formalisation only matters to the degree that it helps us discover those truths.

I don't think the above definition is enough to fully reconstruct an idea of what mathematics is like, because it leaves open two big questions:

  1. How do we select which hypothetical objects to study?
  2. How do we study them?

The answer to the first is comparatively easy, which is that it's based on what I think of as "The Three Good Reasons To Do Things":

  1. It's useful.
  2. It's interesting.
  3. Some asshole is forcing you to do something useless and boring.

(Most people's encounters with mathematics are of type 3, sadly, which is why I always hear "Oh I hated mathematics at school" when I tell people I did mathematics at university)

Of course, point 2 is slightly subtle, because doing mathematics is much easier if other people have done similar mathematics, so you're constrained not just by what you think is interesting, but by what you can convince other people is interesting.

The second question is the hard part, and I think we currently do a very poor job of explaining it to people. I need to think further about it.


The thing I called the Feynman style relates to Tim Gowers's The Two Cultures of Mathematics, where he suggests that there are two cultures of mathematics: Theory building and problem solving. The latter tends to get organised not along the lines of general big ideas and broad theorems, but instead along heuristics and guiding principles.

Gowers refers to the areas of mathematics that are primarily problem-solving as combinatorial, but I feel like this kind of problem solving is one that it doesn't seem right to refer to as combinatorial - it's more... calculation?

There's a similar sense of being guided by heuristics and general ideas though.

For example, one general idea is "replace annoying terms with integrals over some new variables, then swap out the variable".

Suppose we didn't know what \(\sum\limits_{n = 1}^\infty (-1)^{n - 1} \frac{1}{n}\) was. How would we deal with this?

(Note: There's all sorts of playing fast and loose with convergence in this post that you can shore up later with some proper calculation but I'm not actually going to do. That's very common in this sort of proof).

Well, that \(\frac{1}{n}\) is an annoying term. Let's get rid of it. A classic way of doing this is to replace it with \(\int\limits_0^1 x^{n - 1} dx\).

We can now do the computation as follows:

\begin{align} \sum\limits_{n = 1}^\infty (-1)^{n - 1} \frac{1}{n} & = \sum\limits_{n = 1}^\infty (-1)^{n - 1} \int\limits_0^1 x^{n - 1} \, dx \\ & = \sum\limits_{n = 0}^\infty (-1)^n \int\limits_0^1 x^n \, dx \\ & = \int\limits_0^1 \sum\limits_{n = 0}^\infty (-x)^n \, dx \\ & = \int\limits_0^1 \frac{1}{1 + x} \, dx \\ & = \Big[\ln(1 + x)\Big]_0^1 \\ & = \ln 2 \end{align}

Roughly the steps here are:

  1. Try replacing tricky terms with integrals over simpler terms
  2. Use standard sums that you already know the answer for
  3. Try swapping sums and integrations
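Since the calculation plays fast and loose with convergence, a quick numerical sanity check is reassuring. The sketch below compares partial sums of the series with a midpoint-rule estimate of \(\int_0^1 \frac{1}{1 + x} dx\) and with \(\ln 2\), which is what that integral evaluates to. The function names and step counts are arbitrary choices for illustration.

```python
import math


def partial_sum(terms):
    """Sum of (-1)**(n - 1) / n for n = 1 .. terms."""
    return sum((-1) ** (n - 1) / n for n in range(1, terms + 1))


def integral_estimate(steps=100_000):
    """Midpoint-rule estimate of the integral of 1 / (1 + x) over [0, 1]."""
    h = 1.0 / steps
    return sum(h / (1 + (i + 0.5) * h) for i in range(steps))


series_value = partial_sum(100_000)
integral_value = integral_estimate()
target = math.log(2)
```

The series converges slowly (the error after \(N\) terms is roughly \(\frac{1}{2N}\)), while the integral estimate agrees with \(\ln 2\) to many more digits, which is part of why trading the sum for an integral is a good move.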

Another useful calculational heuristic is "try changing the variable".

e.g. what's the limit as \(n \to \infty\) of \(n \ln (1 + \frac{1}{n})\)?

Well, let \(x = \frac{1}{n}\). This expression is now \(\frac{\ln(1 + x)}{x} = \frac{\ln(1 + x) - \ln(1)}{x}\) as \(x \to 0\). i.e. it's the derivative of \(\ln\) at \(1\), i.e. \(1\).
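The same kind of numerical check works here. This small sketch just evaluates \(n \ln(1 + \frac{1}{n})\) for growing \(n\); the name `scaled_log` is made up, and `math.log1p` is used only because it is more accurate than `math.log(1 + x)` for small `x`.

```python
import math


def scaled_log(n):
    """Evaluate n * ln(1 + 1/n)."""
    return n * math.log1p(1.0 / n)


values = [scaled_log(n) for n in (10, 1_000, 1_000_000)]
```

The values increase towards \(1\) from below, with a gap of roughly \(\frac{1}{2n}\), matching the derivative argument.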

It's hard to explain exactly what the thought process is here. It's like solving a puzzle - you have a bunch of known tricks that you think might work and you try to apply them all. If you were to mechanize the process then it would end up looking like a brute force solver for the problem, but by using intuition you can kinda guide the way.

I think maybe one part of the split between problem-solving and theory building is how much of what you end up building escapes the head of the mathematician building it: Problem-solving skills are much harder to teach to another person than theory is (once that other person has built the skill of acquiring theory, which is also hard to teach).


Compare and contrast:

I think the role that "Aristotle" (AKA Rachel Barney) describes is probably quite a useful one in the right context. The problem is that Trolling, as defined in that paper, is by its nature not done in the right context.

There's a thing that happens in Vernor Vinge's "A Deepness in the Sky" where the Evil Overlord ™ is experimenting with different configurations you can put a group mind in. I sometimes think about this as an analogy for how to construct better modes of group problem solving (in a non-evil-overlord way that in no way involves my using the army of crows that I don't have to impose my will on the unsuspecting masses. Yes).

In particular, I think it's often actively useful for someone to explicitly take an adversarial role in a group discussion, and it improves the resulting group's intelligence. The difficulty is that you need to do this in a context where the group consents to this, and with a fairly explicit discussion in advance of boundaries. It also helps to be able to ask the adversary to step out of the adversarial role and clarify their position.


When come back bring pie(s)

There's a metaphor people use: Some people fight for a larger slice of the pie, others see that it's better to enlarge the pie.

I've seen this metaphor used for everything from intersectional feminism to the Patrician of Ankh-Morpork's extremely libertarian brand of despotism. Broadly the point is this: It's better to build a positive sum game where everyone benefits than it is to compete in a zero or negative sum game.

The relationship between this point and the metaphor is interesting. I agree with the thing that I am claiming to be the underlying point (but then I would), but I think what the actual metaphor demonstrates is also interesting: People don't understand how pies work.

What happens when you build a bigger pie?

  1. You run into scaling issues, limited both by the size of your oven and also (if you build a better oven) the square-cube law (actually I'm not sure if this is the square-cube law at work, as pies tend to be scaled horizontally faster than they are scaled vertically, but either way once your pie gets big enough it's very hard to ensure it's cooked all the way through - you end up with overdone outsides and an underdone middle).
  2. The same people who couldn't eat your smaller pie still can't eat your larger one.

The correct solution is not to enlarge the pie. It's to bake more pies, and also to provide tasty food that is not pie because not everyone likes pie.

If you've ever tried catering to a diverse group of dietary requirements, at some point you realise that it's much much easier to make multiple dishes than it is to try to create a single dish that can feed everyone. A vegan gluten free nut free diabetic friendly pie is certainly possible, but it is a pie that basically nobody is going to want to eat. In contrast, a wide variety of desserts can cater to each particular restriction that your group encounters, without attempting to shoehorn everyone into a one size fits all badly model.

The Unit of Caring has a notion she uses a lot of competing access needs. She explains it well here, but the important quote (to save you from Tumblr's giant GDPR screen) is:

Competing access needs is the idea that some people, in order to be able to participate in a community, need one thing, and other people need a conflicting thing, and instead of figuring out which need is ‘real’ we have to acknowledge that we can’t accommodate all valid needs. I originally encountered it in disability community conversations: for example, one person might need a space where they can verbally stim, and another person might need a space where there’s never multiple people talking at once. Both of these are valid, but you can’t accommodate them both in the same space.

Trying to build a space that works for everyone is more or less impossible, and what you will end up with is a space that works badly for everyone. Instead we need the ability to have multiple spaces which we acknowledge as valid and allow people to freely move between these spaces as long as they are prepared to accept the local rules.

In an interesting coincidence, this came up in a completely different context recently. A while back I sketched out a way of using randomization to improve the design of Lean Coffee meetups. This morning a friend reported:

I used to organize David-Style lean coffees at my previous job. (...) The interesting limitation we ran into is that toward the end, the attendence was two groups with mostly disjoint interests.

The nice thing about small-scale democratic processes like this is that splitting the union is a completely legitimate move. If you have two groups with disjoint interests, why not run them as two groups? Ideally at different times so that people who really are interested in both can attend both.

How to do this sort of thing at a larger scale seems to be one of the great unsolved problems of society.


Follow on to misc thoughts about voting design for talk scheduling.

Here's how a system that is much closer to classic STV could work. Assume everyone has a ranking of all the talks they wish to attend (this isn't actually reasonable to ask for, but you could get people to score talks according to some ordinal scores and then randomly tie break, or tie break in organiser preferred order or something).

The system has the following three parameters:

  1. The number of time slots.
  2. The number of talks per time slot.
  3. The minimum number of attendees required for a talk to be worthwhile (should be at least one). Call this the threshold.

You also need to pick a quota system. Either the Droop or the Hare quota are the obvious choices. My natural bias is to use the Hare quota, as it's better for minority interests and I think that's a nice feature to have in your conference talk selection (conferences have a tendency to have the same talks over and over again and I think this would help offset that).

The system could easily be adapted to more complicated constraints in which not all talk/time slot combinations are valid, but I'm going to ignore that.

Conceptually what happens is everyone is given one voting-buck, and a talk slot "costs" an amount of voting-bucks equal to the quota. People band together to form buying blocs and each spend the same percentage of their remaining pool of voting money to buy a slot (this is basically how normal STV works too).

The system involves running the following process to a fixed point:

  1. Set the list of eligible talks to all talks which have at least the threshold number of people voting for them.
  2. Give everyone exactly one vote (note: as the process evolves, people will have fractional votes).
  3. People vote for (talk, slot) pairs, where the slot has not already been filled and the talk is both eligible and not yet scheduled. They will vote for a pair if:
    1. The talk is their highest ranked talk among the available talks.
    2. If there are slots which have no talks they want to see in them, they will only vote for pairs in those slots. Otherwise they will vote for pairs where they prefer the talk to the one currently scheduled there. Note that a voter can vote for multiple (talk, slot) pairs.
  4. If there are no such pairs, we have scheduled all of the talks we can (even if there are still unfilled slots). Stop and report this as the schedule.
  5. If any of the pairs has a total number of votes exceeding the quota, pick the one with the most votes and schedule that. For each voter who voted for it, multiply their remaining vote by \(1 - \frac{q}{r}\), where \(q\) is the quota and \(r\) is the total vote for the elected slot (i.e. we've removed \(q\) from their total vote and everybody pays it equally).
  6. If no pair was elected, take the talk with the lowest maximum vote over all vote pairs, and remove it from the list of eligible talks.
  7. If a pair was elected, now check if any talks can no longer meet the threshold - i.e. for every slot you could schedule them in, count the number of people for whom that is their favourite talk in that slot. If there are no slots where this reaches the threshold, remove the talk from the eligible list.
  8. If we removed any talks from the eligible list, reset all of the state except the list of eligible talks and go back to step 2. Otherwise go back to step 3.
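The bookkeeping in step 5 is the part that is easiest to get subtly wrong, so here is a minimal Python sketch of just that step, together with the Hare quota. It is not an implementation of the whole process. The names `hare_quota` and `charge_supporters` are made up for illustration, and treating the number of "seats" as slots times talks per slot is my assumption about how the quota would be computed.

```python
from fractions import Fraction


def hare_quota(num_voters: int, num_slots: int, talks_per_slot: int) -> Fraction:
    """Hare quota: total votes divided by the number of places to fill.

    Assumption: the number of places is slots * talks per slot.
    """
    return Fraction(num_voters, num_slots * talks_per_slot)


def charge_supporters(weights, supporters, quota):
    """Step 5: everyone who voted for the elected pair pays the same fraction.

    weights maps each voter to their remaining (fractional) vote.
    Returns a new mapping and leaves the input untouched.
    """
    r = sum(weights[v] for v in supporters)  # total vote for the elected pair
    if not r > quota:
        raise ValueError("the elected pair must exceed the quota")
    factor = 1 - Fraction(quota) / r
    return {v: w * factor if v in supporters else w for v, w in weights.items()}


# 20 voters, 2 time slots with 2 talks each, so the Hare quota is 5.
weights = {f"voter{i}": Fraction(1) for i in range(20)}
quota = hare_quota(20, 2, 2)
supporters = {f"voter{i}" for i in range(8)}
after = charge_supporters(weights, supporters, quota)
print(quota, after["voter0"], after["voter19"])
```

Using `Fraction` rather than floats keeps the invariant exact: the supporters' combined remaining vote drops by precisely one quota.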

Most of this is just variant STV, with some of the specific details owing to specific types of STV. The main difference is that because the same voter may cast their vote for multiple options simultaneously, we need to be careful not to elect more than one "candidate" at once, plus the specialised drop-out rule for talks that fail to meet the threshold.

Most of my problems with it are the same as my problems with STV in general: It looks like an iterative optimisation process, but it's not at all clear what it is you are optimising for. So it might work well, but I'm not really sure how you would measure "well" in this context. It seems plausibly worth a try though.


Mechanisms for talk scheduling and voting

I've been thinking about mechanism design for conference scheduling again. I've previously argued that conference scheduling should be treated as an optimisation problem, but I no longer believe that's true.

In particular I think the following hold:

Let's see some examples in support of this.

Suppose you're running a Python conference, and 60% of the people attending are web developers and 40% are data scientists. You put together a set of talk proposals, people vote on them, and you take all of the top voted talks. What you end up with is of course a conference consisting entirely of web development talks.
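To make the failure concrete, here is a small Python sketch of that example. It assumes approval-style ballots in which every attendee votes for every talk on their own topic and nothing else, which is a deliberate oversimplification; the talk names and the number of proposals are made up.

```python
from collections import Counter

NUM_TALKS_TO_PICK = 10

# Hypothetical proposals: ten on each topic.
talks = {f"web-{i}": "web" for i in range(10)}
talks.update({f"data-{i}": "data" for i in range(10)})

# 60 web developers and 40 data scientists.
voters = ["web"] * 60 + ["data"] * 40

# Everyone approves of every talk on their own topic.
votes = Counter()
for interest in voters:
    for talk, topic in talks.items():
        if topic == interest:
            votes[talk] += 1

# Take the top voted talks, breaking ties by name.
selected = sorted(talks, key=lambda t: (-votes[t], t))[:NUM_TALKS_TO_PICK]
topics = Counter(talks[t] for t in selected)
print(topics)
```

Every web talk gets 60 votes and every data science talk gets 40, so the data scientists are shut out entirely despite being 40% of the audience.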

(Note: Despite the running Python example, this post is not actually about The PyCon UK Schedule, which I've barely looked at.)

For some contexts maybe that's OK, but given that a lot of the value in conferences is the hallway track, it's nice to be able to put together heterogeneous conferences. You could fix this by artificially selecting for certain subjects, but proportional representation seems like a much better approach because it doesn't require you to know all the ways in which your audience is heterogeneous in advance. So, in the above example, we would have roughly 60% web dev talks and 40% data science talks, but also if it turned out that about 10% of the audience were really excited about Flask, we could have about 10% Flask talks.

If the conference is single-track we're more or less done: Pick your favourite proportional voting system (non party-list based, so probably some variant of STV), use that to select your talks, and call it a day.

I'd like to pause here by saying that I'm increasingly a fan of single track conferences, so I think "do a single track conference and call it a day" might actually be the correct solution.

But let's suppose you're less on board with that and want a multi-track conference.

For simplicity, let's imagine that our Python conference now has two rooms, with talks running in the same time slots in each room, and attendees now have to choose which of the two to attend. Let's say it's a single day conference and there are five time slots, so ten talks.

According to our above PR argument, we should run six web dev talks, but does it really make sense for us to do so? There are only five time slots, so (by the pigeonhole principle if you want to get fancy about it) you're inevitably going to put two web dev talks in the same time slot. That might be OK - maybe you're scheduling a Django and a Flask talk against each other - but maybe there's a strict preference where there are five obviously best web dev talks and the sixth is pretty good (preferable by web devs to any data science talk) but not good enough (will not get any attendees when scheduled against any of the top five talks). What's the point in selecting that talk given that?

In the other direction, let's say we have 20% of the audience who are really interested in random forests, and so we select two random forests talks, which we then proceed to schedule in the same time slot. Now despite 20% representation at the talk level, they only have 10% representation at the time slot level!

(I want to draw an analogy to gerrymandering here but I don't think it quite works)

So, tracking creates an upper bound on how much proportional representation is worth doing, and also scheduling within those tracks affects the amount of proportionality you actually get.

So what to do about it?

Well, I'm not entirely sure. I started designing a whole complex system in support of this that this note was originally supposed to be about, but I decided I didn't like it very much.

The basic ideas were:

  1. Give each participant a "voting currency" - everyone starts with an equal amount, and talk slots effectively get auctioned off, with the proceeds distributed among everyone equally (possibly among everyone who still has any interest in attending remaining talks).
  2. Participants will only vote for talks in slots that are strictly better for them than the talks already scheduled in that slot.
  3. Define a threshold of "Minimum number of people required to be worth running a talk". Whenever a talk no longer would meet that requirement (because every slot it could be scheduled in has talks people prefer more), it is immediately excluded and the process restarts from the beginning. This is akin to how exclusions work in The Wright System of STV, and is designed to avoid "spoiler" talks, where people who preferred them effectively get screened off from voting in the process until the talk is excluded.

The details kinda became a weird hybrid of STV and the Vickrey-Clarke-Groves mechanism and the more I looked at it the less convinced I became that it was the right way to do things or that I actually understood how the VCG mechanism plays out in practice.

I do think the above examples are important to consider though.


My parents, Ayn Rand and God

From bazzalisk on Twitter:

“You know him better than I” and “You know him better than me” are both grammatically valid but mean different things

The former means "You know him better than I do", the latter means "You know him better than you know me".

The title of this note comes from the following probably-apocryphal book dedication, used as an argument for the Oxford comma:

This book is dedicated to my parents, Ayn Rand and God.

Without the Oxford comma, the implication is that the author's parents are Ayn Rand and God; with the Oxford comma, this is a dedication to four people (the author's parents, and also to Ayn Rand and God). Mental Floss has a bunch of similar ones.

Snopes think this probably never happened, but OTOH the following is part of their argument:

Since Rand was such an outspoken atheist, I find it hard to believe that anyone would mention both her and God as sources of inspiration.

And, well, this seems to ignore the existence of Paul Ryan and a significant chunk of the US political right. Also I'm now amused by the idea of Ayn Rand's atheism being a reaction to God being a deadbeat dad. Someone should write that fanfic, but it's not going to be me.

There is of course an entire book about comedic misinterpretations due to bad grammar, but that's not exactly what's going on here: Instead these are interesting grammatically valid examples that are right on the edge of ambiguity.

It's unclear to me whether this actually tells us anything useful. We could probably derive some normative advice about correct use of grammar from it, but this sort of thinking about things in terms of their edge cases is a very modern-mathematician view of the world, which doesn't come very naturally to others.

The general widely deployed solution to linguistic ambiguity is instead that we just guess or ask, and frankly that probably works better than trying to remove it.


Fiction for Kristian

This is a small collection of fiction I've written that I like enough to actively recommend and think count as "finished".

Fan fiction

The two pieces of Stargate fan fiction that I've written and would recommend are Stargate Physics 101 and Interview with a System Lord. Both are not only canon-compatible (more or less. Stargate Physics 101 doesn't quite line up with Stargate Universe, but I don't care about Universe), but are 100% my headcanons of how the universe works.

Completion status: 100% finished standalone pieces. I may write other Stargate fan fiction at some point, and if I do then as part of the universe's canon those will naturally be part of its backstory, but there will never be sequels per se to these pieces.

Recommendation strength: Stargate Physics 101 is one of my most popular pieces and works even if you have never watched Stargate. If you like any of infrastructure science fiction, software testing, or Stargate, it's worth reading. Interview with a System Lord is worth reading if and only if you like Stargate SG1 (and especially if you like Ba'al) and want a moderately amusing story exploring a weird headcanon. Warning: May cause mild sympathy for the devil.

The Rules of Wishing is a piece of fan fiction of Disney's Aladdin. Premise:

What if people were good at wishing? The Genie's rules have holes you could drive a herd of camels through, but they don't have to. Aladdin and Jafar's wishes are shallow and limited, and lack the foresight that really effective wishing entails, but wouldn't a battle between effective wishers be much more interesting? And while we're at it, why does Jasmine have so little agency and basically act as a prize to be won in a battle between two men when literally the entire point of her narrative is that she's not that?

It has been argued to be rational!fic though I'm not sure I agree with the classification. Jasmine in this is probably my joint favourite character I've ever written.

Completion status: Has a mini non-canon sequel The Consequences of Wishing that explains the divergence between this story and the film. May, but probably won't, spawn another sequel, but the current ending wraps it up entirely to my satisfaction and any sequel would be a new story in the same universe with the same characters rather than a continuation of this story.

Trigger warning: Moderately violent.

Recommendation strength: Honestly, you should read this if you like my fiction at all and are not put off by the trigger warning.

Counterparts is a crossover fic between Lucifer (the TV show) and Old Harry's Game (the radio show).

Completion status: Very standalone. It's not impossible I may do a followup involving The Good Place, but it stands on its own regardless of whether I do.

Recommendation strength: Well it amuses me. Based on feedback, if you like Lucifer it will probably also amuse you. Familiarity with Old Harry's Game is helpful but not strictly required.

Original Fiction

Programmer at Large is a story about gender, social anxiety, and legacy code. It seems to have a lot of fans.

Completion status: Abandoned, but it kinda works that way. It's a series of slice of life chapters, and the protagonist's life is never really "finished". However it definitely has some unsatisfying dangling plot threads that will never be resolved. However most of the strength of this story is at the chapter level anyway - it has some of my best writing in it, but as a whole story I do not feel that it works. I intend at some point to take it apart and refactor and modularise it into several smaller stories. I am fully aware of the irony of saying this about a story about legacy code.

Recommendation strength: Mixed. There's some stuff in there I really like, and a lot of people seem to love it, but like I said I don't feel that it hangs together in its current incarnation.

The Diaries of Vicky Frankenstein more normally AKA "The Vicky Stories". Series of short stories about Dr Vicky Frankenstein and her adventures in joining a biotech startup run by the vampire Ada Lovelace.

Completion Status: (Hopefully permanently) incomplete in the sense that I fully intend to keep writing Vicky stories (but don't write more than about one every six months), but each Vicky story is a complete standalone short story that happens to be set in the same world and use the same characters. There are minimal to no dangling plot threads between the stories.

Recommendation Status: I ♥ writing Vicky and think you should read these. Also, contains a (99% SFW) lesbian sex scene between two amoral monsters that reviewers describe as "ridiculously adorable", so there's that.


Compare and contrast two interesting links:

(Note that I've not read the latter two and should. I've only read digested versions of them).

In general, often the right way to judge an action is not on its immediate effects, but on what long-run effect it will have on the sort of people you will surround yourself with. This can make seemingly good actions harmful and seemingly bad or nonsensical ones quite useful.

I think about this a bunch in the context of codes of conduct: Often the benefit of the code of conduct is not whether it is ever enforced, but that it filters out people who don't like codes of conduct.


Notation for test-case reducers

A thing I've been noticing recently is that it's really useful to have compact notation for describing things. Usually this is equivalent to primitives + some combinators.

One thing that I think it would be useful to have such a notation for is (greedy) test-case reduction passes. They combine pretty well, and it makes it useful to discuss various things.

For example, if you have reducers \(A\), \(B\), you can define the reducer \(AB\) which runs \(A\), then runs \(B\) on its result. You can also define the reducer \(A^+\) which runs \(A\) to a fixed point.

Another interesting combinator is \(/\). \(A / B\) runs \(A\), then runs \(B\) if \(A\) didn't do anything.

There are a bunch of really basic algebraic relations that hold, like composition and \(/\) are associative, and \((A^+)^+ = A^+\), but not a huge amount beyond that.

A bunch of interesting questions about test-case reduction can be compactly expressed in this notation though. For example, suppose you want to reduce to something that is a fixed point of both \(A\) and \(B\). You could do \((AB)^+\), but you could also do \((A^+B^+)^+\), and it's quite natural to do this in some contexts. My suspicion, which I've yet to verify, is that it's almost never the right thing to do.

You can kinda regard the quadratic failure mode of greedy search as an instance of this problem: If \(\delta_i\) is the operation that deletes the element at position \(i\), the correct pass to run for greedy deletion is \((\delta_0^+ \ldots \delta_n^+)^+\), but if you start again at the beginning every time you succeed you are running \((\delta_0 / \ldots / \delta_n)^+\).


Modes of writing

Two posts on writing to contrast:

Devon posted the second on twitter and it reminded me of the first, which I struggled to refind, which is part of why I'm posting it here.

I've been finding having a paper journal very useful, but I'm also finding having this new notebook useful in an entirely different way. The contrast is very interesting.


Can a machine design?

Can a machine design? by Nigel Cross is an interesting paper about architecture (the real kind!) and its relation to automation. I found it via Adam Marshall Smith's PhD thesis Mechanizing exploratory game design (truthfully via this tweet about it from Max Kreminski), which is an excellent thesis on mechanically assisted creativity (I must admit I skimmed the technical content as less relevant to me - I care about the meta more than I care about game design qua game design).

Most relevant quote for me:

Despite this apparently easy pace of interaction, all of the designers reported that they found the experiments hard work and stressful. They reported that the main benefit of using the "computer" was increased work speed, principally by reducing uncertainty (i.e., they relatively quickly received answers to queries, which they accepted as reliable information). I also tried a few variations from my standard experiments. The most interesting was to reverse the normal set of expectations of the functions of the designer and the "computer." The "computer" was given the job of having to produce a design to the satisfaction of the observing designer. It immediately was apparent that, in this situation, there was no stress on the designer—in fact, it became quite fun—and it was the "computer" that found the experience to be hard work.

i.e. it's much more fun to tweak a computer's output than it is to be critiqued by one. An important observation for people in correctness research I think!


Some free user experience consulting for Google

I am not a UX expert. I've worked with people who are, and I'm probably a lot better than my otherwise utter incompetence at front-end work would suggest, but I'm at best OK.

Nevertheless, as a user I get to see a lot of the sharp edge of the problems, and I'm good enough at UX that I think I can see what the shape of the solution is.

The product I would like to offer Google some free advice on is the following: Google Maps's driving navigation.

On a related note, if you can recommend a good driving navigation app to me (iPhone, sadly), that would be delightful. It would be especially useful if it were one that understood features of English roads like "has roundabouts" and "is verrah verrah smol" that seem alien to people from the US (although given how much of Google maps is in Zurich, I'm still surprised by its failure to understand these).

Anyway, free UX consulting. User stories are cool I hear, so here are my two user stories for Google maps:

As a driver, I would like to survive my trip.


As a driver, I would like to be able to drive without a constant sense of paranoia.

Currently Google Maps fails both of these so hard that I have conjectured that I have somehow triggered a special murder-mode for ex-Googlers, because honestly if Google Maps treats most drivers like it treats me then either not many people can be using it or I would have expected a better publicised death toll from it. I am not actually being hyperbolic here (or even parabolic).

Google maps reliably does everything in its power to destroy my trust in it, which is not ideal in something that I have to use while driving.

As the most basic minimum that would be required to restore my trust, I would like to propose the following feature:

Google maps should never, under any circumstances, exit navigation without an audible confirmation that it has done so.

There is what is almost certainly a bug in Google maps where sometimes it just goes "lol, I'm done here" and exits navigation without telling me. This is functionally indistinguishable from the sort of confirmation Google maps uses to tell me to just keep going straight. As a result, whenever Google maps is silent for an extended period of time, I end up feeling a gnawing sense of paranoia that it's just not telling me what to do and I'm going in completely the wrong direction.

Almost all of the time this is not the case and the correct thing to do is to keep going straight (although Google maps's notion of what "keep going straight" is is often very funny and involves amusing interpretations of the word "going straight" that include things like "turning left" - it is not very good at actually knowing where the road markings are, and if the road follows around to the right it will often confuse a left turn with keep going straight. However, I will forgive it data problems, particularly on the weird back country roads I often drive), but this bug triggers just often enough (last incidence: about an hour ago) that the exceedingly common operation of driving in a straight line fills me with deep unease whenever I use Google maps for navigation.

Even if this bug were fixed, the damage is done, and I will never believe Google maps is still running if it is silent.

On top of that, I would like to propose the following feature:

Google maps should never be silent for an extended period of time.

I'll grant that if the last instructions were "Keep going for 500 miles" it doesn't need to give me a mile counter every five minutes, but if it could tell me every half hour or so "Yup, everything is cool, keep going" that would be great. In normal operation, every five minutes sounds about right.

The second source of paranoia is that Google maps gives absolutely no feedback as to when you have done something wrong. I know the whole nagging satnav going "Make a U-Turn. Make a U-Turn. Make a- *urk* (noise as satnav is thrown out window)" has a bad reputation, but there's a happy medium: When you do something Google maps does not expect, it should say something along the lines of "You missed a turn, I'm going to try to turn you around" or "You missed a turn, finding a new route".

Fun instances where it was very useful to have a second person in the car yesterday:

  1. When Google maps took me 30 miles up the wrong motorway before eventually turning me around.
  2. When Google maps was very upset that I didn't drive through the traffic cones blocking the route it wanted me to take and insistently tried to turn me around for another go.

Feedback that I had done the wrong thing would have been very helpful on the first, because I would have spent a lot of time confused without it. Feedback on the second that it was taking me around for another pass would also have been very helpful. I would have probably ignored its instructions even without Luke to assist me, but I would have felt much less certain about it.

Anyway, those are the main sources of paranoia. Let's talk about the other moderately important feature: Not dying and/or killing people.

This is a very simple issue: Google maps literally never gives you enough advance warning. This is especially true in the following two cases:

Giving this sort of last minute instruction is deeply unsafe, and needs to stop.

On top of that there's all sorts of data problems and things where Google maps just clearly doesn't understand UK roads, but I don't realistically expect those to be fixed, especially with the UK dooming itself to irrelevance next year and the only Google UK presence being in a city where you already have to embrace paranoia and risk loss of life and limb to drive in anyway, so I won't bother venting about those now.

In the meantime, I'm serious about that desire for recommendations of less murdery navigation apps. Please?


I'm a big fan of the Brzozowski derivative, introduced in "Derivatives of regular expressions" by Janusz A. Brzozowski.

The basic idea is that given some language \(L\) over an alphabet \(A\), and some string \(u\) over \(A\), you can define the derivative language \(\partial(L, u) = \{v: uv \in L\}\). We can extend this further (and it will be useful to do so below). If \(M\) is some other language, we can define \(\partial(L, M) = \{v: \exists u \in M, uv \in L\}\). I'm not currently sure if the derivative of a regular language by a regular language is regular in general. It is in the case we'll see later, and I suspect it is in general.

This seems like a pretty trivial observation until you realise the following three things:

  1. \(u \in L\) if and only if \(\epsilon \in \partial(L, u)\)
  2. \(uv \in L\) if and only if \(v \in \partial(L, u)\)
  3. For most common representations of languages, it's actually pretty easy to calculate a representation of their derivative.

Putting these together, you can use the Brzozowski derivative to calculate a deterministic (not necessarily finite!) automaton for almost any language that you can easily represent. You label states with descriptions of languages, a state is accepting if it matches the empty string, and transitions go to the states labelled by the derivatives.

Regular-expression derivatives reexamined by Owens et al. has some nice practical details of doing this in the context of functional programming.

To see this in action, consider the standard regular expression operators. For a single character \(u\), these satisfy the following identities:

  1. \(\partial(A | B, u) = \partial(A, u) | \partial(B, u)\)
  2. \(\partial(AB, u) = \partial(A, u)B | \nu(A) \partial(B, u)\), where \(\nu(A) = \epsilon\) if \(\epsilon \in A\) or \(\emptyset\) otherwise (i.e. the derivative can skip over \(A\) if and only if \(A\) contains the empty string)
  3. \(\partial(A^*, u) = \partial(A, u) A^*\)
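The identities above translate almost directly into code. This is a minimal Python sketch of a derivative-based matcher, with no normalisation of the resulting expressions at all, so the expressions grow as you match. The class and function names are my own rather than anything from the papers; `nullable(r)` plays the role of \(\nu\), returning a boolean instead of a language.

```python
from dataclasses import dataclass


class Regex:
    """Base class for regular expressions over single characters."""


@dataclass(frozen=True)
class Empty(Regex):
    """Matches nothing at all."""


@dataclass(frozen=True)
class Epsilon(Regex):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Char(Regex):
    c: str


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Cat(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r: Regex) -> bool:
    """Does the language of r contain the empty string?"""
    if isinstance(r, (Epsilon, Star)):
        return True
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    raise TypeError(f"not a regex: {r!r}")


def derivative(r: Regex, c: str) -> Regex:
    """The derivative of r with respect to the single character c."""
    if isinstance(r, (Empty, Epsilon)):
        return Empty()
    if isinstance(r, Char):
        return Epsilon() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(derivative(r.left, c), derivative(r.right, c))
    if isinstance(r, Cat):
        first = Cat(derivative(r.left, c), r.right)
        # We can only skip over the left hand side if it matches "".
        return Alt(first, derivative(r.right, c)) if nullable(r.left) else first
    if isinstance(r, Star):
        return Cat(derivative(r.inner, c), r)
    raise TypeError(f"not a regex: {r!r}")


def matches(r: Regex, s: str) -> bool:
    """s is in L if and only if the empty string is in the derivative of L by s."""
    for c in s:
        r = derivative(r, c)
    return nullable(r)


ab_star = Star(Cat(Char("a"), Char("b")))  # (ab)*
print(matches(ab_star, "abab"), matches(ab_star, "aba"))
```

To turn this into an automaton you would additionally need to simplify and deduplicate the expressions that label the states, which is what the normalisation rules discussed next are for.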

A result proved in Brzozowski's original paper (apparently. I can't currently seem to access it, and am going off the cite in "Regular-expression derivatives reexamined") is that a small number of reasonable normalisation rules over the representation of the language is enough to ensure that you only get finitely many states in the state machine generated by partial derivatives of regular expressions. It's certainly true that you only get finitely many if you have full equivalence for the regular languages labelling the states - the derivative automaton is actually the minimal automaton representing a language.

There are two very nice things about this representation of the language's automaton though:

  1. It can be done lazily. This means that even when your deterministic automaton has exponentially (or infinitely!) many states, you only ever need to explore the states that you walk when matching strings.
  2. It is very easy to extend with new operators.

An example of (2) is that "Regular-expression derivatives reexamined" actually does it for extended regular expressions with intersection and negation, because might as well, right? It's no harder than doing it with the normal ones, even though adding these to your regular expression language can cause exponential blowup in the size of the automata compiled from your regex.

But there are even more interesting ones if you're prepared to go for more esoteric operations!

Have you heard of the Levenshtein automaton? The set of strings within some finite edit distance of another string is a regular language and you can define a nice automaton matching it. But in fact, a stronger result is true: For any regular language \(L\) and natural number \(n\), the set \(E(L, n) = \{u: \exists v \in L, d(u, v) \leq n\}\) is a regular language. Why?

Well, we can calculate its derivative! The derivative of \(E\) is \(\partial(E(L, n), u) = E(\partial(L, u), n) | E(L, n - 1) | E(\partial(L, \cdot), n - 1) | \partial(E(\partial(L, \cdot), n - 1), u)\). That is, at each character we can either:

  1. Continue matching the original language (cost 0).
  2. Insert a new character in front of something in the original language (cost 1)
  3. Replace a character in the original language with \(u\) (cost 1)
  4. Drop a character from the original language and try again (cost 1)

In the course of doing this we apply the following rewrite rules:

  1. \(E(L, 0) = L\)
  2. \(E(\emptyset, n) = \emptyset\)

As long as the number of reachable representations for the original languages is finite, so is the number of reachable states in our Levenshtein construction: Every state is labelled by a set of languages of the form \(E(\partial(L, U), k)\) where \(U\) is a language defined by \(u_1 \ldots u_m\) with each \(u_i\) either a single character or a \(\cdot\), and \(m + k \leq n\). There are only finitely many such labels as long as there are only finitely many derivatives of \(L\), although in principle there may be exponentially many. Because of the laziness of our construction that often won't matter - you can still determine membership for a string of length \(k\) with only \(O(k)\) state traversals (though calculating those states could in principle require up to \(O(nm)\) work, where \(m\) is the number of states in the original automaton).
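To make the construction concrete, here is a rough sketch of it for plain regular expressions. Everything named here is an assumption of the sketch: the tuple encoding, the use of `None` to stand for the "any character" derivative \(\partial(L, \cdot)\), and the helpers `min_len`, `step` and `within`. A state is a set of pairs `(M, k)`, each standing for \(E(M, k)\), and it is accepting when some \(M\) contains a string of length at most \(k\).

```python
import math

# Regular expressions as nested tuples (an encoding chosen for this sketch).
EMPTY = ("empty",)  # matches nothing
EPS = ("eps",)      # matches only the empty string


def char(c):
    return ("char", c)


def alt(a, b):
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return a if a == b else ("alt", a, b)


def cat(a, b):
    if EMPTY in (a, b):
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("cat", a, b)


def star(a):
    return ("star", a)


def min_len(r):
    """Length of the shortest string in r (infinite for the empty language)."""
    tag = r[0]
    if tag == "empty":
        return math.inf
    if tag in ("eps", "star"):
        return 0
    if tag == "char":
        return 1
    if tag == "alt":
        return min(min_len(r[1]), min_len(r[2]))
    return min_len(r[1]) + min_len(r[2])  # cat


def deriv(r, c):
    """Derivative of r by c, where c=None means "any single character"."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "char":
        return EPS if c is None or c == r[1] else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return cat(deriv(r[1], c), r)
    left = cat(deriv(r[1], c), r[2])
    return alt(left, deriv(r[2], c)) if min_len(r[1]) == 0 else left


def step(lang, k, c):
    """Pairs (M, j), read as E(M, j), whose union is the derivative of E(lang, k) by c."""
    out = {(deriv(lang, c), k)}              # keep matching (cost 0)
    if k > 0:
        skipped = deriv(lang, None)
        out.add((lang, k - 1))               # c is an extra character (cost 1)
        out.add((skipped, k - 1))            # c replaces a character (cost 1)
        out |= step(skipped, k - 1, c)       # drop a character, try again (cost 1)
    return {(m, j) for m, j in out if m != EMPTY}  # E(empty, n) is empty


def within(lang, n, s):
    """True if s is within edit distance n of some string in lang."""
    state = {(lang, n)}
    for c in s:
        state = set().union(*(step(m, j, c) for m, j in state))
    return any(min_len(m) <= j for m, j in state)


CAT = cat(char("c"), cat(char("a"), char("t")))
AB_STAR = star(cat(char("a"), char("b")))
```

With these definitions `within(CAT, 1, "cut")` holds while `within(CAT, 1, "dog")` does not. The states are only ever computed for the characters actually seen, which is the laziness mentioned above.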

You can also use this to determine the minimum edit distance between two regular languages, because you can test whether \(E(L, n) \cap L' = \emptyset\) by calculating and walking the generated DFA for the left hand side, so this gives you a decision procedure for \(d(L, L') \leq n\).

Is this a practical algorithm? Not sure. I've played with it a little bit but not really put it to the test. Still, I think it's an interesting example of the flexibility of the Brzozowski derivative, and it was at least mildly surprising to me that the edit ball of a regular language is itself regular.


Mathjax and Python Markdown

I've been having an interesting time of things with this notebook and getting Python markdown and Mathjax to play well with each other. In particular I have not been enjoying the markdown extension API at all.

Anyway, it turns out that it is easy to do what I need, just slightly undocumented and with some annoyingly silent failure modes.

Here is the (slightly simplified) code from this notebook that makes MathJax work correctly:

import markdown
from markdown.inlinepatterns import HtmlPattern

LATEX_BLOCK = r"(\\begin{[^}]+}.+?\\end{[^}]+})"
LATEX_EXPR  = r"(\\\(.+?\\\))"

class MathJaxAlignExtension(markdown.Extension):
    def extendMarkdown(self, md, md_globals):
        # Needs to come before escape so that markdown doesn't break use of \ in LaTeX
        md.inlinePatterns.add('mathjaxblocks', HtmlPattern(LATEX_BLOCK, md), '<escape')
        md.inlinePatterns.add('mathjaxexprs', HtmlPattern(LATEX_EXPR, md), '<escape')

The HtmlPattern class takes an expression and treats anything matching that expression as something that the markdown processor should not touch further.
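To see what the two expressions pick out, they can be tried on their own with the standard `re` module. This is just an illustrative check outside of markdown: the sample text is made up, and how the markdown library itself compiles and applies the patterns is not shown here.

```python
import re

# The same two expressions as in the extension above.
LATEX_BLOCK = r"(\\begin{[^}]+}.+?\\end{[^}]+})"
LATEX_EXPR = r"(\\\(.+?\\\))"

# A made-up sample line containing one inline expression and one block.
SAMPLE = r"Euler says \(e^{i\pi} = -1\), and \begin{align} a &= b \end{align} is a block."

# Each pattern has a single capture group, so findall returns the whole
# LaTeX fragment, delimiters included.
inline_matches = re.findall(LATEX_EXPR, SAMPLE)
block_matches = re.findall(LATEX_BLOCK, SAMPLE)
```

Both matches are non-greedy, so two inline expressions on the same line are picked up separately rather than as one long span.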

Some caveats to note:


I'm going to start trying to port over some contents from my research notebook into here, as this is intended long-term to be a replacement for it. This will require some figuring out in terms of how to present maths.

As a starting point, here's a theorem:

\(H(m) = \sum\limits_{q = 1}^m {(-1)}^{q - 1} {m \choose q} \frac{1}{q}\)

Where \(H(m)\) is the m'th harmonic number \(H(m) = \sum\limits_{i = 1}^m \frac{1}{i}\).
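Before proving something like this it can be reassuring to check it numerically. A small sketch using exact rational arithmetic (the function names are just for this illustration):

```python
from fractions import Fraction
from math import comb


def harmonic(m):
    """H(m) as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, m + 1))


def alternating_sum(m):
    """The alternating binomial sum from the theorem, as an exact fraction."""
    return sum(
        Fraction((-1) ** (q - 1) * comb(m, q), q) for q in range(1, m + 1)
    )


def identity_holds_up_to(limit):
    """Check the identity for every m from 1 to limit inclusive."""
    return all(harmonic(m) == alternating_sum(m) for m in range(1, limit + 1))
```

Using `Fraction` rather than floats matters here: the binomial coefficients get large and alternate in sign, so floating point would lose precision quickly.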

This came up in "Birthday Paradox, Coupon Collectors, Caching Algorithms and Self-Organizing Search" by Flajolet et al. (which is excellent) where it was stated as "well known". It wasn't well known to me, so I set out to prove it.

The following is my proof:

The main idea is to use the standard trick of turning sums and integrals into other sums and integrals that happen to be easier to solve. We use the following standard results:

We then perform the following manipulations (don't worry if some of these are clear as mud. They kinda should be):

\begin{align} \sum\limits_{q = 1}^m {(-1)}^{q - 1} {m \choose q} \frac{1}{q} &= \sum\limits_{q = 1}^m {(-1)}^{q - 1} {m \choose q} \int\limits_0^1 x^{q - 1} dx\\ &= \int\limits_0^1 \sum\limits_{q = 1}^m {(-1)}^{q - 1} {m \choose q} x^{q - 1} dx\\ &= \int\limits_0^1 -x^{-1} \sum\limits_{q = 1}^m {m \choose q} {(-x)}^q dx\\ &= \int\limits_0^1 -x^{-1} \left( \sum\limits_{q = 0}^m {m \choose q} {(-x)}^q - 1 \right)dx \\ &= \int\limits_0^1 -x^{-1} \left( {(1 - x)}^m - 1 \right)dx \\ &= \int\limits_0^1 {(1 - x)}^{-1} (1 - x^m) dx \\ &= \int\limits_0^1 \sum\limits_{n = 0}^\infty x^n (1 - x^m) dx \\ &= \sum\limits_{n = 0}^\infty \int\limits_0^1 x^n (1 - x^m) dx \\ &= \sum\limits_{n = 0}^\infty \left( \frac{1}{n + 1} - \frac{1}{n + m + 1} \right) \\ &= \lim\limits_{k \to \infty} \left( H(m) - \sum\limits_{n = k + 1}^{k + m} \frac{1}{n} \right)\\ &= H(m) \end{align}

Notable magic tricks performed:

This is a style of calculation I think of as the Feynman style, because it's very good at seeming more clever than it actually is and because he was fond of smugly boasting about using this sort of trick in preference to contour integration. Given its prevalence prior to Feynman, my only defence of the terminology is that it's not really intended as a compliment.

I find the Feynman style completely unenlightening to read - the only way to read a Feynman style proof is to do it yourself, using the original as a guide when you get stuck.

I think that's in some ways its point. It's not a proof technique designed to leverage enlightenment, but instead it leans heavily on your puzzle solving skills. That can be useful sometimes when you just want to brute force your way through a problem and don't really care about understanding it on any sort of deeper level.

I was exposed to the Feynman style quite early on, due to reading Schaum's Outlines of Advanced Calculus prior to going to university (an earlier edition, though I'm not sure how early: the brown-covered one. I sadly gave away my copy, and the 1974 edition I ordered doesn't seem to be quite it). It has quite a lot of exercises using calculations like this, and afterwards I realised that this is what Feynman had been talking about in "Surely You're Joking, Mr. Feynman!" (I didn't understand what a contour integral was until a few years later).

Somehow despite this the Feynman style of brute force problem solving never really integrated into my mathematics, and it's only some years later I've come to appreciate its merits. I still prefer to achieve insight and make the problem trivial, but sometimes the problem isn't worth the insight and you're better off just putting in the hard work and solving it.

Putting in the hard work is also useful because sometimes it leads you to the insight you missed and you can throw away most of the work. This didn't happen here, but I think that's OK - it's not that interesting a problem, so I don't really feel upset by the lack of insight into it.


Notes on tiling with polyominoes

Gary Fredericks wrote about a backtracking algorithm for tiling a board with polyominoes.

His solution is roughly "turn the problem into exact cover and then apply a bunch of interesting optimisations in this context to the naive backtracking algorithm". The paper Dancing Links by Donald E. Knuth in fact studies this exact problem as an application of the exact cover algorithm.

I think some of the optimisations Gary performs are not ones that would be performed by a modern SAT solver, because they are actually too expensive to be worth it if you're good at the SAT problem (e.g. I know modern SAT solvers tend not to bother decomposing problems into independent problems because the cost is too high), but it's possible they synergise well enough to be worth it. For example, the number theory optimisation combined with the independent components may well be worth it, especially with the heuristic of prioritising moves that disconnect the board.

I've been doing a bit of casual reading about this class of problem recently. I thought I'd use the opportunity of this new notebook to collect some references. Ideally these would be proper cites, but I haven't got the citation part of the notebook system working yet.

Checker Boards and Polyominoes by Solomon W. Golomb is a classic here. It looks at the question of tiling the chessboard with a single square monomino and 21 trominoes of various shapes (1 + 21 × 3 = 64 squares). In particular it establishes:

How to Tile a Chessboard by Trupti Patel is a nice expository piece on this.

Golomb also wrote Tiling with Polyominoes, studying much more general questions of how to tile truncated chessboards with polyominoes.

A classic version of this is what Wikipedia refers to as the mutilated chessboard problem (apparently following Max Black):

Suppose a standard 8×8 chessboard has two diagonally opposite corners removed, leaving 62 squares. Is it possible to place 31 dominoes of size 2×1 so as to cover all of these squares?

The answer is no. In Tiling with Dominoes, N. S. Mendelsohn discusses two proofs:

First solution

From the checkerboard diagram, the region contains 30 black cells and 32 white cells. Since each domino covers 1 black and 1 white cell, tiling is impossible.

Second solution

When I was first shown the problem many years ago, it did not occur to me to colour the cells. The region itself had seven cells in the top and bottom rows and eight cells in the remaining rows. The same held for the columns. I proceeded to obtain information on how many dominoes pointed horizontally and how many vertically. The first count dealt with the vertical dominoes. If the region is tiled, the horizontal dominoes in the top row occupies an even number of cells. Hence, the cells in the top row that are not occupied by horizontal dominoes are odd in number. Thus there are an odd number of vertical dominoes between the first and second rows. Since the second row has eight cells, and an odd number are occupied by vertical dominoes coming down from the first row, there remain an odd number of cells in the second row. The same argument now shows there is an odd number of vertical dominoes from the second row to the third. Continuing this way, we see that there is an odd number of vertical dominoes between any pair of consecutive rows. Hence the total number of vertical dominoes is the sum of seven odd numbers, which is odd. In the same way, using columns instead of rows, there is an odd number of horizontal dominoes. Hence the total number of dominoes is even. Since there are 62 cells to cover, the number of dominoes required is 31, an odd number. Therefore, tiling is impossible.

He goes on to say:

Why do I produce two solutions to the puzzle? It is because I am interested in the question of which is the better solution. At first glance, it appears that the first solution is the better. It is much shorter and is easily understood by many people with virtually no knowledge of mathematics. But are there considerations that might judge the second solution to be the better one?

He then discusses whether the second one is better because it generalises better, when setting out to prove Gomory's theorem (which I've not been able to find a copy of the original of so far, but I haven't looked very hard): If you remove two squares of opposite colours, you can always tile the remainder with dominoes. The proof involves the construction of a Hamiltonian circuit on the adjacency graph, and seems fiddly but interesting. I've only skimmed it and would like to digest it further.
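Both the colouring argument and the claim about removing one square of each colour are small enough to check by brute force. The sketch below is not Mendelsohn's or Gomory's method: it treats a domino tiling as a perfect matching between the two colour classes and looks for one with the usual augmenting-path search. The function names and the coordinate convention are assumptions of this illustration.

```python
def colour(cell):
    """0 or 1 depending on the chessboard colour of the cell (row, column)."""
    return (cell[0] + cell[1]) % 2


def neighbours(cell, n):
    """Cells of the n x n board sharing an edge with the given cell."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < n and 0 <= c + dc < n:
            yield (r + dr, c + dc)


def tileable(n, removed):
    """True if the n x n board minus `removed` can be tiled with dominoes."""
    cells = {(r, c) for r in range(n) for c in range(n)} - set(removed)
    firsts = [x for x in cells if colour(x) == 0]
    if 2 * len(firsts) != len(cells):
        return False  # the colouring argument: the colour counts differ

    match = {}  # cell of colour 1 -> the colour 0 cell it shares a domino with

    def augment(cell, seen):
        # Try to place a domino on `cell`, re-placing earlier dominoes if needed.
        for other in neighbours(cell, n):
            if other in cells and other not in seen:
                seen.add(other)
                if other not in match or augment(match[other], seen):
                    match[other] = cell
                    return True
        return False

    return all(augment(cell, set()) for cell in firsts)


def opposite_colour_removals_all_tileable(n):
    """Check every way of removing one cell of each colour from the n x n board."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return all(
        tileable(n, [a, b])
        for a in cells if colour(a) == 0
        for b in cells if colour(b) == 1
    )
```

For the mutilated chessboard, `tileable(8, [(0, 0), (7, 7)])` is rejected immediately by the colour count, while `opposite_colour_removals_all_tileable(8)` runs the matching search for each of the 1024 ways of removing one square of each colour.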

However note that we saw a generalisation in a different direction in the first paper linked! Golomb's proof of the impossibility of tiling with straight trominoes unless the monomino was in a very specific location was also a colouring argument.

The Wikipedia page references "Across the Board: The Mathematics of Chessboard Problems" by John J. Watkins. I should probably look up a copy.



This is an experimental new blog intended for notes, thoughts, and whatever else I want to put here. It will likely be biased towards short notes rather than longform essays. It's loosely inspired by Mark Jason Dominus's shitposting blog and by my frustrations with WordPress, but I'm not really sure where it's going yet.

It's also a place where I'll be experimenting with notation, and generally trying to find a low friction way to express myself in a manner that I like. As such it's all a bit cobbled together out of spit, baling wire, and Python.

Notational Highlights

I kinda hate LaTeX, but it's the best typesetting language for mathematics that I know of, so this notebook supports it using mathjax.

Testing: \(e^{i\pi} = -1\)

A test of code highlighting.

class SomeClass(object):
    """A python class"""

    def method(self):
        """A method definition"""

As you've probably noticed, I'm using Tufte CSS. I'm not sure it's exactly what I want, but it's a lot closer to what I want than most other things I've tried. I will likely be messing around with this further.

I'm also using mako templates, and fully intend to define a metric tonne of macros to make this usable.

In general I expect the actual source code for this site to be totally unusable to anyone who is not me. If anything, if it's not then I probably haven't done enough customization for my brain.