DRMacIver's Notebook
Adaptive parallel test-case reduction
There's a very simple algorithm I call "adaptive parallel deletion", which I think of as completely standard and thus often forget that literally nobody but me uses. I haven't written it down anywhere, so here's me finally getting around to doing that.
This algorithm is a component of test-case reduction that replaces the classic delta debugging algorithm: that is, it takes some sequence and produces a subsequence of it satisfying some condition, such that if you remove any one element from its output the condition is no longer satisfied.
The core features of adaptive parallel deletion are:
- When there are large reductions that can be made, it will usually run in O(log(n)).
- When there are no reductions that can be made, it will run in O(n), but with the work scaling to as much parallelism as you can give it.
- It will scale fairly gracefully in between the two.
Here it is:
def single_pass_delete(test_case, condition):
    """Returns a version of `test_case` that satisfies `condition` and is
    "one-minimal" in the sense that you cannot delete any single member of it
    and have it still satisfy the condition."""
    test_case = list(test_case)

    def can_delete(j, k):
        return condition(test_case[:j] + test_case[k:])

    i = 0
    while i < len(test_case):
        try:
            # Scan in parallel for the first location where we can delete
            # a single element of the list. If no more elements are deletable
            # this will run with as much parallelism as we can give it.
            i = find_first(
                range(i, len(test_case)),
                lambda j: can_delete(j, j + 1),
            )
        except NotFound:
            break
        del test_case[i]

        # Now that we've found an area where there's at least one deletable
        # element, we hope that it's part of a large chunk of deletable elements.
        # This finds an integer k so that we can delete the next k elements
        # and not the next k + 1 elements, running in O(log(k)). This hopefully
        # lets us delete a reasonably large chunk at a time.
        k = find_integer(lambda k: i + k <= len(test_case) and can_delete(i, i + k))
        del test_case[i:i + k]

        # Because we chose k such that k + 1 elements are not deletable, we know
        # that the element left at i (if any, we might have deleted the whole list)
        # is now not deletable, so we increment i to skip past it on the next
        # iteration. This is purely a performance optimisation, but in the worst
        # case might cause us to make twice as many calls to `condition` as we
        # need to if we skip it.
        i += 1
    return test_case
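For example, here's roughly how you might call it, with a toy condition I'm making up for illustration (it assumes the find_first and find_integer helpers discussed below are in scope; a real condition would normally rerun a failing test):

# Reduce a list of numbers to a one-minimal sublist that still contains
# 3, 7 and 12. With this condition the result is exactly those elements,
# since deletion preserves the order of the survivors.
result = single_pass_delete(range(20), lambda xs: {3, 7, 12} <= set(xs))
assert result == [3, 7, 12]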
The functions this uses are find_integer, which I've previously written about here and which is just an exponential probe and binary search (so it runs in O(log(k)) when it returns k), and find_first, which is defined as follows:
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


class NotFound(Exception):
    pass


def find_first(target, predicate):
    """Returns the first element of `target` that satisfies `predicate`,
    or raises `NotFound` if none do."""
    if not target:
        raise NotFound()
    parallelism = os.cpu_count() or 1
    if parallelism <= 1:
        for x in target:
            if predicate(x):
                return x
        raise NotFound()
    else:
        with ThreadPoolExecutor(max_workers=parallelism) as executor:
            it = iter(target)
            chunk_size = 1
            while True:
                # Evaluate the predicate over a chunk of candidates in
                # parallel, doubling the chunk size each time so that a long
                # unsuccessful scan becomes embarrassingly parallel.
                chunk = list(islice(it, chunk_size))
                if not chunk:
                    raise NotFound()
                for x, b in zip(chunk, executor.map(predicate, chunk)):
                    if b:
                        return x
                chunk_size *= 2
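For example, as a quick illustration of the contract:

# First element of the range divisible by 7.
assert find_first(range(10, 100), lambda n: n % 7 == 0) == 14

# Raises NotFound when no element satisfies the predicate.
try:
    find_first(range(5), lambda n: n > 100)
except NotFound:
    pass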
In particular find_first is designed so that when it does not find an answer it scales up rapidly to be embarrassingly parallel, but when it finds an answer early on it doesn't do much redundant work (you still have the overhead of the thread pool and such, but it doesn't actually run much extra code).
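For completeness, here's a rough sketch of what find_integer looks like. This is just the exponential probe plus binary search described above and may differ in detail from the version in the earlier post: it assumes f(0) is true, and finds some k such that f(k) is true and f(k + 1) is false, using O(log(k)) calls to f.

def find_integer(f):
    """Finds some k such that f(k) is true and f(k + 1) is false,
    assuming that f(0) is true. Makes O(log(k)) calls to f."""
    # Exponential probe: double until f fails, giving a bracket
    # with f(lo) true and f(hi) false.
    lo = 0
    hi = 1
    while f(hi):
        lo = hi
        hi *= 2
    # Binary search within the bracket, maintaining the invariant
    # that f(lo) is true and f(hi) is false.
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid):
            lo = mid
        else:
            hi = mid
    return lo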