DRMacIver's Notebook

Ensuring Downward Paths

I had some half-formed thoughts about test-case reduction that I wanted to work through, and this notebook is a place for working through half-formed thoughts, so here we go…

The thing I’ve been working on, on and off, for about a month now is using language inference to figure out new reduction passes.

It can be thought of as a way of taking an idea that doesn’t work (using L* for test case reduction) and combining it with an idea that works but doesn’t generalise well (learning a table of small string substitutions to make), and getting an idea that does work and does generalise.

This has me thinking: What, exactly, does it mean for an approach to test-case reduction to generalise?

Also, does this approach generalise well? The big question with it is basically whether it works well when you change the interestingness test. I think it will, but it probably depends a lot on the choice of interestingness test and how close your reducer is to normalising it already.

In the approach to test-case reduction I favour, test-case reduction starts from a set of test cases \(X\) and a total order \(\preceq\) over \(X\) called the reduction order, and a test-case reducer is a function \(r(U, x)\) where \(x \in U \subseteq X\), \(r(U, x) \in U\), and \(r(U, x) \preceq x\). For me, \(X\) is usually a set of strings and \(\preceq\) is the shortlex order, where \(x \preceq y\) if \(|x| < |y|\), or \(|x| = |y|\) and \(x \leq y\) lexicographically.
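As a minimal sketch, here is that order in Python, assuming test cases are represented as strings (the helper name `shortlex_key` is just illustrative, not from any particular library):

```python
# Key function for the shortlex order described above: shorter test
# cases come first, with ties broken lexicographically. Sorting by
# this key, or taking min() under it, respects the reduction order.
def shortlex_key(x: str) -> tuple:
    return (len(x), x)

# For example, min(["ba", "c", "ab"], key=shortlex_key) == "c".
```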

Your prototypical test-case reducer starts from a bunch of small-step reductions, which gives you a function \(t: X \to X^{<\omega}\) with \(t(x)_i \prec x\), and the reducer greedily tries each of the \(t(x)_i\) in order. If none of them are in \(U\) it returns \(x\); otherwise it recurses, returning \(r(U, t(x)_i)\) for the first \(i\) with \(t(x)_i \in U\).
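Here is a sketch of that prototypical reducer, written iteratively rather than recursively; the names `t` and `is_interesting` (standing in for membership of \(U\)) are placeholders rather than anything from a real reducer:

```python
from typing import Callable, Sequence

def greedy_reduce(
    is_interesting: Callable[[str], bool],  # membership test for U
    t: Callable[[str], Sequence[str]],      # small-step reductions, each < x
    x: str,
) -> str:
    """Greedily take the first interesting small-step reduction and
    restart from it, until no candidate in t(x) is interesting."""
    while True:
        for candidate in t(x):
            if is_interesting(candidate):
                x = candidate  # equivalent to recursing on r(U, t(x)_i)
                break
        else:
            return x  # no t(x)_i is in U, so x is returned unchanged
```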

More generally given such a function we might say that a reducer is \(t\)-local if it satisfies the following conditions:

  1. \(r(U, x) = x\) if and only if \(t(x)_i \not\in U\) for all \(i\).
  2. For all \(U, x\) there is some sequence \(x = x_0, \ldots, x_n = r(U, x)\) with \(x_{i + 1} \in t(x_i)\) (note that we don’t necessarily require that \(x_i \in U\)).

That is, a \(t\)-local reducer is basically one that looks like the prototypical reducer but may have some optimisations and fiddly bits. For example, delta debugging is a \(t\)-local reducer where \(t\) maps \(x\) to all test cases that can be formed by deleting a single element of \(x\) (see the sketch below). Delta debugging starts by trying large bulk combinations of these deletions and gradually narrows them down until it’s only trying the small ones. Another approach which uses the same \(t\) but a different algorithm is what I’ve called adaptive delta debugging.
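For concreteness, here is a sketch of that \(t\); delta debugging’s bulk phases are essentially optimisations over trying these one at a time:

```python
def single_deletions(x: str) -> list[str]:
    """All test cases formed by deleting one element of x. Note that
    |t(x)| = |x|, i.e. linear in the size of the test case."""
    return [x[:i] + x[i + 1:] for i in range(len(x))]
```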

Most reducers implemented in practice are roughly \(t\)-local for some \(t\) (often they fail to be strictly \(t\)-local because they hit some internal limit and stop when they could be making more progress, so rerunning the reducer may produce further reductions). I think this is good, because it allows you to reason in terms of what transformations they are capable of performing.

So let’s assume we have a \(t\)-local reducer. What makes it good?

Basically the tradeoff is this:

  1. The larger \(|t(x)|\) tends to be, the better you will reduce \(x\).
  2. The smaller \(|t(x)|\) is, the better your reducer tends to perform.

The ideal seems to be that you want \(|t(x)| = O(|x|)\), as per delta debugging, and the trick is to tweak the set within that constraint to try to get good results.

(Note that having \(|t(x)| = O(|x|)\) doesn’t mean that the overall reducer runs in \(O(|x|)\) membership checks: even if you reduce the size at every step you can easily be made to make \(O(|x|^2)\) membership checks, and if you only make lexicographic transformations it’s easy for performance to get exponentially bad, e.g. by being forced to count a number down one step at a time.)

The problem that you run into is that what you really want is not just for \(|t(x)|\) to be large, but for it to contain values that are in some sense “likely to be in \(U\)”. One way this can fail to happen is when there is some sort of precondition on the test cases we are interested in. The biggest example of this is when interesting test cases have to be syntactically valid according to some grammar. When this happens, you are effectively censoring \(t\) by replacing it with its syntactically valid subset. Often when something like delta debugging gets stuck it’s not because of any intrinsic limitation related to the interestingness test; it has just ended up at a point where \(t(x)\) contains no syntactically valid test cases.
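In code, the censoring looks something like the following sketch, where `is_syntactically_valid` is a hypothetical predicate for the format’s grammar:

```python
from typing import Callable, Sequence

def censor(
    t: Callable[[str], Sequence[str]],
    is_syntactically_valid: Callable[[str], bool],  # hypothetical grammar check
) -> Callable[[str], list[str]]:
    """Replace t with its syntactically valid subset. A t-local
    reducer gets stuck exactly when this censored set is empty,
    even if smaller interesting test cases still exist."""
    def censored_t(x: str) -> list[str]:
        return [y for y in t(x) if is_syntactically_valid(y)]
    return censored_t
```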

So, essentially what we want in order to get a good reducer for a format is to try to ensure that \(t(x)\) is always well populated with syntactically valid reductions of \(x\).

One way to do this might be to artificially construct interestingness tests that are basically designed to expose small reductions. I’m not quite sure what those would be. One thing I was thinking of is having interestingness tests where \(x\) is interesting if \(y \preceq x\) for some fixed \(y\); then, once those are normalised, gradually “censor” the tests by removing all intermediate parts, stopping when you can’t learn anything other than the direct jump to \(y\).
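A sketch of the first stage of that construction (before any censoring), with the shortlex comparison inlined so the example is self-contained:

```python
def interesting_iff_at_least(y: str):
    """Interestingness test where x is interesting iff y <= x in the
    shortlex order, so y is the unique minimal interesting test case
    and a good reducer should normalise every interesting x to y."""
    def is_interesting(x: str) -> bool:
        return (len(y), y) <= (len(x), x)
    return is_interesting
```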

This can be thought of as ensuring that whenever the final reduced result is \(y\), you have as rich a set of paths leading to it as the learning process can possibly find.

I don’t yet know how likely this is to work. I’m planning more experiments soon on getting this running on hypothesis-csmith, and we’ll see how well it can learn that, but I think there’s some interesting stuff to investigate here.