DRMacIver's Notebook
Adaptive parallel test-case reduction
There’s a very simple algorithm I call “adaptive parallel deletion”, which I think of as completely standard and thus often forget that literally nobody but me uses. I haven’t written it down anywhere, so here’s me finally getting around to doing that.
This algorithm is a component of test-case reduction that replaces the classic delta debugging algorithm - that is, it takes some sequence and produces a subsequence of it satisfying some condition, such that if you remove any one element from its output the condition is no longer satisfied.
The core features of adaptive parallel deletion are:
- When there are large reductions that can be made, it will usually run in O(log(n)).
- When there are no reductions that can be made, it will run in O(n), but with the work scaling to as much parallelism as you can give it.
- It will scale fairly gracefully in between the two.
Here it is:
def single_pass_delete(test_case, condition):
    """Returns a version of `test_case` that satisfies `condition` and
    is "one-minimal" in the sense that you cannot delete any single
    member of it and have it still satisfy the condition."""
    test_case = list(test_case)

    def can_delete(j, k):
        return condition(test_case[:j] + test_case[k:])

    i = 0
    while i < len(test_case):
        try:
            # Scan in parallel for the first location where we can delete
            # a single element of the list. If no more elements are deletable
            # this will run with as much parallelism as we can give it.
            i = find_first(range(i, len(test_case)), lambda j: can_delete(j, j + 1))
        except NotFound:
            break
        del test_case[i]
        # Now that we've found an area where there's at least one deletable
        # element, we hope that it's part of a large chunk of deletable elements.
        # This finds an integer k so that we can delete the next k elements
        # and not the next k + 1 elements, running in O(log(k)). This hopefully
        # lets us delete a reasonably large chunk at a time.
        k = find_integer(lambda k: i + k <= len(test_case) and can_delete(i, i + k))
        del test_case[i:i + k]
        # Because we chose k such that k + 1 elements are not deletable, we know
        # that the element left at i (if any, we might have deleted the whole list)
        # is now not deletable, so we increment i to skip past it on the next
        # iteration. This is purely a performance optimisation, but in the worst
        # case might cause us to make twice as many calls to `condition` as we
        # need to if we skip it.
        i += 1

    return test_case
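To make the contract concrete, here's a small usage sketch (the condition - keep any list containing at least two even numbers - is my own illustrative choice, not from the original post):

def at_least_two_evens(xs):
    # Hypothetical "interestingness" condition, purely for illustration.
    return sum(x % 2 == 0 for x in xs) >= 2

result = single_pass_delete([1, 2, 3, 4, 5], at_least_two_evens)
assert at_least_two_evens(result)
# One-minimality: deleting any single element breaks the condition.
for j in range(len(result)):
    assert not at_least_two_evens(result[:j] + result[j + 1:])
print(result)  # [2, 4]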
The helper functions this uses are find_integer, which I've previously written about here and which is just an exponential probe followed by a binary search (so it runs in O(log(k)) when it returns k), and find_first, which is defined as follows:
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


class NotFound(Exception):
    pass


def find_first(target, predicate):
    """Returns the first element of `target` that satisfies `predicate`,
    or raises `NotFound` if none do."""
    if not target:
        raise NotFound()
    parallelism = os.cpu_count() or 1  # cpu_count() can return None
    if parallelism <= 1:
        for x in target:
            if predicate(x):
                return x
        raise NotFound()
    else:
        with ThreadPoolExecutor(max_workers=parallelism) as executor:
            it = iter(target)
            chunk_size = 1
            while True:
                chunk = list(islice(it, chunk_size))
                if not chunk:
                    raise NotFound()
                for x, b in zip(chunk, executor.map(predicate, chunk)):
                    if b:
                        return x
                chunk_size *= 2
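As a quick sanity check (my own example, not from the post), both the sequential and the parallel path return the first matching element:

# First n with n * n > 300: 17**2 = 289 fails, 18**2 = 324 succeeds.
assert find_first(range(100), lambda n: n * n > 300) == 18
try:
    find_first(range(10), lambda n: n > 100)
except NotFound:
    print("no match, after scanning every chunk")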
In particular, find_first is designed so that when it does not find an answer it scales up rapidly to be embarrassingly parallel, but when it finds an answer early on it doesn't do much redundant work (you still have the overhead of the thread pool and such, but it doesn't actually run much extra code).
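For completeness, here's a sketch of find_integer consistent with the description above - an exponential probe followed by a binary search - though the version in the post linked earlier is the authoritative one:

def find_integer(f):
    """Finds k such that f(k) is True but f(k + 1) is False, assuming
    f(0) is True. Makes O(log(k)) calls to f."""
    # Exponential probe: grow hi until f(hi) fails.
    lo = 0
    hi = 1
    while f(hi):
        lo = hi
        hi *= 2
    # Binary search the gap, maintaining f(lo) True and f(hi) False.
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid):
            lo = mid
        else:
            hi = mid
    return lo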