DRMacIver's Notebook

Improving Binary Search by Guessing

The following code (lightly modified from Hypothesis code for clarity) is among the most useful things I have ever written:


def find_integer(f):
    """Finds a (hopefully large) integer n such that f(n) is True and f(n + 1)
    is False. Runs in O(log(n)).

    f(0) is assumed to be True and will not be checked. May not terminate unless
    f(n) is False for all sufficiently large n.
    """
    # We first do a linear scan over the small numbers and only start to do
    # anything intelligent if f(4) is true. This is because it's very hard to
    # win big when the result is small. If the result is 0 and we try 2 first
    # then we've done twice as much work as we needed to!
    for i in range(1, 5):
        if not f(i):
            return i - 1

    # We now know that f(4) is true. We want to find some number for which
    # f(n) is *not* true.
    # lo is the largest number for which we know that f(lo) is true.
    lo = 4

    # Exponential probe upwards until we find some value hi such that f(hi)
    # is not true. Subsequently we maintain the invariant that hi is the
    # smallest number for which we know that f(hi) is not true.
    hi = 5
    while f(hi):
        lo = hi
        hi *= 2

    # Now binary search until lo + 1 = hi. At that point we have f(lo) and not
    # f(lo + 1), as desired.
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid):
            lo = mid
        else:
            hi = mid
    return lo

This is loosely based on ideas from timsort.
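
As a toy illustration of the interface (this example is mine, not something from Hypothesis), here is what a call looks like:

# Probes f at 1, 2, 3, 4, then 5, 10, 20, then binary searches between 10 and
# 20: about ten calls in total.
assert find_integer(lambda n: n * n <= 300) == 17  # 17 ** 2 = 289 <= 300 < 18 ** 2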

The reason this code is so useful is that it lets you make many operations adaptive at zero cost: if you have an operation that may be worth doing some large number of times, you can compose it with find_integer and get \(O(n)\) work done with only \(O(\log(n))\) checks.

In particular, the feature that justifies the term adaptive is that the amount of work it does is logarithmic in the size of the output: it either gives you a very large result or costs very little.

An example of how this gets used in test-case reduction is the following:

def reduction_pass(ls, predicate):
    ls = list(ls)
    i = 0
    while i < len(ls):
        # Will delete a sequence of length k in O(log(k)) calls to predicate.
        k = find_integer(lambda k: predicate(ls[:i] + ls[i + k:]))
        del ls[i:i + k]
        i += 1
    return ls

This allows for a test-case reduction pass that never does substantially more work than the naive greedy algorithm which tries deleting one item at a time, but can potentially do very large deletions with much less work than that when the opportunity arises.
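
To make that concrete, here is a toy invocation (the predicate is an artificial stand-in, not a real test-case reduction predicate):

# Shrink a list subject to the constraint that it still sums to at least 10.
# The first pass deletes [1, 2, 3, 4] in one go, leaving [5, 6].
assert reduction_pass([1, 2, 3, 4, 5, 6], lambda ls: sum(ls) >= 10) == [5, 6]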

A neat trick I realised earlier is that you can use this function to define the following variant of binary search:

def binary_search_with_guess(f, lo, hi, guess=None):
    """Find n such that lo <= n < hi and f(lo) == f(n) != f(n + 1). It is
    assumed that f(hi) != f(lo) and will not be checked.

    ``guess`` is a prediction of the value of n and defaults to lo.
    This function runs in O(log(abs(guess - n))).
    """

    if guess is None:
        guess = lo

    assert lo <= guess < hi

    good = f(lo)

    if f(guess) == good:
        # Our guess was equivalent to lo, so we want to find some point after it.
        k = find_integer(lambda k: guess + k < hi and f(guess + k) == good)
        return guess + k
    else:
        # Our guess was equivalent to hi, so we want to find some point before it.
        k = find_integer(lambda k: guess - k >= lo and f(guess - k) != good)
        return guess - k - 1

This is a binary search with a twist: you start with a guess of what you expect the answer to be. The cost of the search is still logarithmic, but it’s logarithmic in how bad your guess is rather than in how large the range of the search is. If you can rely on your guess being pretty good, this will sometimes let you do the binary search in \(O(1)\) instead of \(O(\log(n))\).
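
For example (again a toy of mine, not real Hypothesis code), suppose the answer is 999 in a range of a million and our guess is nearly right:

# f flips from True to False at n = 1000. With a guess of 998 this makes only
# four calls to f (at 0, 998, 999 and 1000) instead of the roughly twenty a
# classic binary search over [0, 10**6) would need.
assert binary_search_with_guess(lambda n: n < 1000, 0, 10**6, guess=998) == 999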

Naturally this can’t improve on binary search (which is optimal) in the general case, so how useful this is depends entirely on how good your guess is. If your guess is maximally bad (i.e. you guess one end and the correct answer was the other), this can make up to twice as many calls to the test function as a classic binary search does, but as long as your guesses are on average pretty good it will tend to win out.

The reason this came up for me was that I’m working on a variant of Angluin’s L* algorithm with the Rivest and Schapire modifications. Without going into details, you have a sequence of \(N\) elements. You know the first one is good and the last one is bad, and you want to find a good element which is directly followed by a bad one, and do so using binary search. You then make a change to your model that makes that bad element good, but this can have an unknown knock-on effect on the other elements, so you need to repeat the procedure. The use of the guess here is this: generally speaking, most elements are bad, and elements rarely go from good to bad as a result of our fix procedure (it absolutely can happen, it just usually doesn’t). This means that a pretty reasonable guess is \(0\) on the first iteration, and the element we just made good on the subsequent ones (because the elements before it probably haven’t changed from good to bad, and the element we just fixed is probably now good).
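
A rough sketch of how that guess gets threaded through the repeated searches might look something like this; is_good and fix are hypothetical stand-ins for the real model queries, and the real termination condition is more involved than just checking the last element:

def repair_sequence(n, is_good, fix):
    # Hypothetical driver loop, not the actual L* implementation. Assumes the
    # first element stays good throughout, as described above.
    guess = 0
    while not is_good(n - 1):
        # Good element immediately followed by a bad one, found cheaply when
        # the guess is close to the answer.
        i = binary_search_with_guess(is_good, 0, n - 1, guess=guess)
        fix(i + 1)  # make the bad element at i + 1 good
        # Elements before i + 1 probably stayed good and i + 1 is probably
        # good now, so it is usually an accurate guess for the next round
        # (clamped to stay strictly below hi).
        guess = min(i + 1, n - 2)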

Saying how much of an improvement this is would require more benchmarking than I’ve done (i.e. anything other than eyeballing the times), but anecdotally this guessing heuristic seems to be very accurate most of the time, so in theory it should be a factor of five to ten performance improvement just based on the size of the examples (because it cuts out a \(\log(n)\) factor and the sequences are a few hundred items long).