DRMacIver's Notebook

Representing Steps in Test-Case Reduction

In my last post on reducer design I included the following deletion algorithm:


def greedy_deletion(self):
    i = 0
    while i < len(self.best_test_case):
        attempt = self.best_test_case[:i] + self.best_test_case[i + 1:]
        # is_interesting updates self.best_test_case as a side effect when
        # the attempt succeeds, so we only advance past elements we
        # couldn't delete.
        if not self.is_interesting(attempt):
            i += 1

This iterates forward through a sequence and tries deleting each element of it in turn to see if it’s still interesting. It’s a foundation for a lot of useful test-case reduction passes.

It’s also in the wrong order. Really for most problems you probably want to write it as follows:


def greedy_deletion(self):
    i = len(self.best_test_case) - 1
    while i >= 0:
        # We're only calling this for the side effect of updating self.best_test_case
        self.is_interesting(self.best_test_case[:i] + self.best_test_case[i + 1:])
        i -= 1

The reason for this order is that later elements often depend on earlier ones (e.g. a variable declared earlier is used later), so an earlier element often only becomes deletable once the later elements that depend on it have gone. Deleting from the end handles that within a single pass, while going forwards you need to rerun the entire pass to delete the earlier bits that deleting the later bits has enabled, so the backwards order typically does less wasted work and takes fewer passes to reach a fixed point.
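
To make this concrete, here's a toy illustration: suppose "use" is only valid while "decl" is still present, standing in for a variable that is declared earlier and used later. A single forward pass gets stuck with "decl" left over and needs a second pass to finish, while a single backward pass deletes everything:

def forward_delete(test_case, is_valid):
    i = 0
    while i < len(test_case):
        attempt = test_case[:i] + test_case[i + 1:]
        if is_valid(attempt):
            test_case = attempt
        else:
            i += 1
    return test_case

def backward_delete(test_case, is_valid):
    for i in reversed(range(len(test_case))):
        attempt = test_case[:i] + test_case[i + 1:]
        if is_valid(attempt):
            test_case = attempt
    return test_case

def is_valid(test_case):
    # "use" may only appear if "decl" is still present.
    return "decl" in test_case or "use" not in test_case

print(forward_delete(["decl", "use"], is_valid))   # ['decl'] - a second pass is needed
print(backward_delete(["decl", "use"], is_valid))  # [] - done in one pass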

One problem with either of these is that they can lead to long stalls, where there’s a large undeletable region and you’re slowly going through all of it. This is particularly annoying because often if you’d made the test case much smaller before checking all of those bits it would be much faster to go through them, because the interestingness test is faster on smaller test cases.

In order to avoid these stalls it might be better to write it like this:

def greedy_deletion(self):
    indices = list(range(len(self.best_test_case)))
    while indices:
        i = pop_random(indices)
        # Might no longer be true because we've successfully reduced
        if i < len(self.best_test_case):
            # We're only calling this for the side effect of updating self.best_test_case
            self.is_interesting(self.best_test_case[:i] + self.best_test_case[i + 1:])
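
(I haven't defined pop_random here, but all it needs to do is remove and return a uniformly random element of the list; a minimal version might look like this.)

import random

def pop_random(values):
    # Swap the chosen element to the end so that removing it is O(1).
    i = random.randrange(len(values))
    values[i], values[-1] = values[-1], values[i]
    return values.pop()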

This iterates over the test case in a random order, which avoids such stalls: if a fraction \(p\) of the test case is deletable this will only stall for around \(\frac{1}{p}\) steps, while either deterministic version can potentially stall for \(O(n)\) calls.

(this isn’t a very good way of doing randomisation, and all of these are improved by adaptive delta debugging of course, but we’ll ignore that for now).

The problem of course is that all of these are basically the same algorithm with a lot of code duplication between them, and we would like to be able to flexibly swap between modes based on what’s working.

Another related problem is that often we want to get an idea of how good a pass is before running it. If we were to e.g. run the random one for ten steps and see if it did anything before we decide whether to run the full thing, that would help a lot with the pass selection problem (how we know which reduction pass we should run right now).

This example also makes it seem easier than it is. Consider the following reduction pass:

def put_in_order(self):
    for i in range(len(self.best_test_case)):
        for j in range(i + 1, min(len(self.best_test_case), i + 9)):
            if self.best_test_case[i] > self.best_test_case[j]:
                attempt = list(self.best_test_case)
                attempt[i], attempt[j] = attempt[j], attempt[i]
                # As before, is_interesting updates self.best_test_case
                # if the swapped version is still interesting.
                self.is_interesting(attempt)

This tries swapping all nearby out-of-order pairs in order to make the test case more sorted (which is a reduction in the lexicographic order). Here all the same considerations potentially apply, but there are two indices that vary instead of one. The question of what order to try things in is also even more complicated, as this pass might need many runs to reach convergence (there are \(n!\) possible permutations of a sequence, so I think it has the potential to take \(O(n!)\) calls to complete given an adversarial interestingness test).

All of this problem setup is a preamble for telling you how we solve the problem in Hypothesis, which is to treat a reduction pass as making a series of nondeterministic choices, all of which must be exhaustively tried before reduction is considered to have finished. The API for making these choices looks as follows:


class DeadBranch(Exception):
    """Raised when a reduction pass is unable to continue from
    the current point."""


class Chooser:
    """A source of nondeterminism for use in reduction passes."""

    def choose(self, values, condition=lambda x: True):
        """Returns a value ``x`` in ``values`` such that
        ``condition(x)`` is True, possibly raising ``DeadBranch``
        if there are no such values, or if it turns out we have
        already exhausted all remaining values. """

The reduction passes above now look as follows:


def greedy_deletion(self, chooser):
    i = chooser.choose(range(len(self.best_test_case)))
    self.is_interesting(self.best_test_case[:i] + self.best_test_case[i + 1:])

def put_in_order(self, chooser):
    i = chooser.choose(range(len(self.best_test_case)))

    # NB: Possibly no such j exists, in which case this raises.
    j = chooser.choose(
        range(i + 1, min(len(self.best_test_case), i + 9)),
        lambda j: self.best_test_case[j] < self.best_test_case[i]
    )

    attempt = list(self.best_test_case)
    attempt[i], attempt[j] = attempt[j], attempt[i]
    self.is_interesting(attempt)

The reducer is now free to try these in any order it likes, without the reduction passes overspecifying it.
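
Here's a rough sketch of one way the reducer might drive them, choosing uniformly at random at each choice point (RandomChooser and run_pass_randomly are just illustrative names, and this ignores the bookkeeping needed to know when every sequence of choices has been exhausted, which I'll describe below):

import random

class RandomChooser(Chooser):
    """Picks uniformly at random among the allowed values."""

    def choose(self, values, condition=lambda x: True):
        allowed = [x for x in values if condition(x)]
        if not allowed:
            raise DeadBranch()
        return random.choice(allowed)

def run_pass_randomly(self, reduction_pass, max_steps):
    # e.g. run_pass_randomly(reducer, greedy_deletion, 10)
    for _ in range(max_steps):
        try:
            reduction_pass(self, RandomChooser())
        except DeadBranch:
            continue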

Currently we try two different orders in different contexts:

  1. Uniformly at random.
  2. Iterating backwards in reverse lexicographic order.

We switch between the two according to some frankly fairly arbitrary internal heuristics that seem to work pretty well but are clearly suboptimal.

The thing that's worth noting, however, is the representation. You can see it in the relevant Python file, but the important features of it are basically:

  1. We maintain a rose tree, lazily built as we walk it, where each node has len(values) children, values being the collection passed to the choose call made at that point. Each node also maintains a count of how many “live” (that is, not fully explored) children it has.
  2. When we complete running a pass step, we mark the final node as dead, decrement the live count on its parent, and if that count hits zero we mark the parent as dead too and continue up the tree in the same way.
  3. Every time we successfully reduce the current target, we delete the tree and start over with a fresh one, but for deterministic iteration we start from the same point in the tree (to the greatest extent possible).

Essentially we are representing a step in a reduction pass as a sequence of integers representing nondeterministic choices, and stopping when the full tree has been explored for the given shrink target. As well as turning out to be quite a convenient representation, this is pleasingly similar to how Hypothesis represents test-case generation.
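
A very rough sketch of the live/dead bookkeeping described above (not the real implementation, which handles rather more detail than this) might look like:

class TreeNode:
    """A lazily created node in the choice tree. n_children is the number
    of values offered at this choice point, children maps a choice's index
    to the node it leads to, and live counts how many children have not
    yet been fully explored."""

    def __init__(self, n_children, parent=None):
        self.n_children = n_children
        self.children = {}
        self.live = n_children
        self.parent = parent
        self.dead = n_children == 0

def mark_dead(node):
    """Mark node as fully explored and propagate upwards: whenever a parent
    runs out of live children it is marked dead too. Once the root is dead
    the pass is exhausted for the current shrink target."""
    node.dead = True
    parent = node.parent
    while parent is not None:
        parent.live -= 1
        if parent.live > 0:
            break
        parent.dead = True
        parent = parent.parent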