DRMacIver's Notebook

Rejection Sampling in Hypothesis

Published: 2021-01-14

(This post assumes you know roughly how Hypothesis does test-case reduction and generation. See our ECOOP paper for a high level explanation)

Suppose you have some generator base and you want to sample from it conditionally. In Hypotheis you can do this with base.filter(condition).

Conceptually, this does rejection sampling, which works roughly like this:

class Filtered:
    def __init__(self, base, condition):
        self.__base = base
        self.__condition = condition

    def do_draw(self, data):
        while True:
            attempt = data.draw(self.__base)
            if self.__condition(attempt):
                return attempt

That is, we’re attempting to draw from the conditional distribution of values that come from base that satisfy condition and we do this by repeatedly drawing values from it until one satisfies the condition.

This has a couple of problems:

The underlying choice sequence contains a lot of probably useless garbage - we made a bunch of choices in each iteration of the loop when drawing from the base (and possibly even when calling the condition! This is allowed), and these are probably all irrelevant.
These loops are potentially very long, especially if the condition is hard or impossible to satisfy.

We adopt the following heuristic approach to deal with the second problem:

class Filtered:
    ...

    def do_draw(self, data):
        for _ in range(3):
            attempt = data.draw(self.__base)
            if self.__condition(attempt):
                return attempt
        data.mark_invalid()

That is, we terminate all rejection sampling at three iterations - it may well be possible to proceed from there, but if so we have a chance at trying again at the next generation, and spending longer seems largely not worth it.

(Why three you ask? Well, let me ask you a better question: Why not three?)

(That is to say, no good reason. It’s a fairly arbitrarily chosen number that seems to work OK)

The first problem we have a slightly more principled way to deal with: We want to be able to reduce the choice sequence by removing all the redundant draws.

(This is made slightly more complicated by the fact that the draws might not be redundant - Hypothesis generators are allowed to have side effects, such as inserting rows into a database, so it’s possible for a filtered generator to do important work in the initial loops).

Assuming for the moment that condition does not make any choices (which it mostly won’t, and we’re not that fussed about producing worse results if it does) this is actually relatively easy: The boundaries of the draw call are marked on the choice sequence, and the reducer can try deleting that region of the choice sequence. This allows it to delete the early iterations of the loop, resulting in an effect where in a reduced choice sequence we just get implausibly lucky and always satisfy the condition first time.

We can, and do (although it turns out this got silently broken at some point and I fixed it as a result of writing this blog post), speed this up further by adding special boundary markers that indicate that the region is probably irrelevant. This looks like the following:

class Filtered:
    ...

    def do_draw(self, data):
        for _ in range(3):
            data.start_example(FILTER_LABEL)
            attempt = data.draw(self.__base)
            if self.__condition(attempt):
                data.stop_example()
                return attempt
            data.stop_example(discard=True)
        data.mark_invalid()

The reason this helps is that we can gather together all regions of the choice sequence that have been marked as discarded and attempt to delete them simultaneously. This will, usually, work. In particular it allows us to make all filtered generators implausibly lucky simultaneously with one call.

This kind of annotation of the choice sequence with useful information about common patterns in generation is often quite helpful for Hypothesis’s test-case reduction. I think a lot of it is not that necessary any more, but this distinction between used and discarded regions seems to help a great deal and often allows for some significant performance improvements during reduction.