DRMacIver's Notebook

An Annoyingly Hard Algorithm to Beat

Published: 2020-07-22

I have a vice, and that vice is clever solutions. I love it when I can hit a problem with some giant theory hammer that cracks it right open. I don’t claim this is a good way to solve problems, but it sure is aesthetic.

So let me tell you about an algorithm I resent so much. It looks like this:


failures = 0

# 10 is arbitrary and the value that works best
# varies by problem.
while failures < 10:
    try_a_random_thing()

    if the_thing_succeeded():
        failures = 0
    elif the_thing_was_bad():
        failures += 1

I call these failure/success loops. As far as I know they’re idiosyncratic to me but I’m sure it’s a reinvention because they’re so obvious.

The basic idea is that you can use them to spend more time on things that have a high chance of succeeding. The parameter value (10 in this case) basically reduces variance at the cost of higher overhead, but either way you end up running things that have a higher chance of working for longer (I haven’t bothered to do the maths of the exact distribution, because I inevitably end up using these in situations where success changes the probabiltiy of future successes anyway).

Why do I resent the failure/success loop algorithm?

Well because it’s so dumb and it keeps beating better options.

For example suppose you have a multi-armed bandit problem, where you have \(n\) arms and you want to pull arms that succeed and not pull arms that fail. Try the following:


def bandit(arms, max_pulls):
    pulls = 0

    while True:
        for arm in arms:
            failures = 0
            # Why three? Who knows! But somehow three
            # often ends up being a good choice here.
            while failures < 3:
                result = arm.pull()
                pulls += 1
                if pulls >= max_pulls:
                    return
                if result:
                    failures = 0
                else:
                    failures += 1

This keeps pulling each arm until it’s failed three times in a row, then moves on to the next one.

Is this optimal? No, obviously not. This is a well studied problem with lots of interesting theory about regret bounds. From a theoretical point of view this algorithm is garbage.

And also every damn time I try to write a multiarmed bandit code and feel super pleased with myself at getting to use some clever theory I find rough edges, go “Hmm… What if I just insert a bit of pregaming to prime the bandit with better information?” and then before I know it I’ve deleted the bandit code and am left with something that looks like the above.

Why does this keep happening?

I think there are a couple of reasons.

The first is that the situations where they end up beating out clever algorithms for me tend to be ones where nothing is going to work especially well. For example the above bandit algorithm seems to work pretty well in cases where most of the bandit arms have no chance of working and a few of the arms have a not-terrible chance of working. What ends up happening in these cases is that clever algorithms end up not buying you much, occasionally get stuck in weird ways, and you pay high overhead for their complexity.

The second is that the virtue of this algorithm is it’s incredibly simple and thus it’s easy to tinker with - you can fiddle with the parameter on the fly, you can try other things in the middle, and it pretty much all works fine. There’s no complicated state to maintain, and there’s plenty of room to maneuver as the result of the simplicity, so even if the base algorithm doesn’t work something like it will.

In some sense this is perhaps evidence that the algorithm is less dumb than it seems, but that doesn’t mean I resent it any less.