DRMacIver's Notebook
Caching interactions with an arbitrary interface
Caching interactions with an arbitrary interface
This is a trick I’ve figured out recently for Hypothesis. I’ve yet to decide whether it’s something I want to use, but it’s a neat trick. It’s much more easily implemented in a dynamically typed language, though I expect you could make something like it work in a statically typed language relatively easily with macros and metaprogramming and suchlike.
Suppose you have some interface that has the following two properties:
- All of the arguments to its methods are immutable.
- All of the return values of its methods are both immutable and hashable (for the sake of simplicity I will assume that none of its methods throw exceptions, but this can easily be made to work if they do).
Now suppose you have a deterministic function that takes any implementation of this interface and returns an immutable value. You can cache that function, which may be a big win if it is very slow (as it may be if it’s a complicated test function as in Hypothesis).
You do not need to know anything about the implementation of the interface in order to do this. The cache will work by returning the value of any prior implementation of the interface that it is called with that is observationally equivalent to the current implementation.
How could this be?
Because the function is deterministic, what it does next is determined entirely by what has happened so far, which is in turn determined entirely by the sequence of return values from the object it is called with. Thus every result of calling the function corresponds to a unique sequence of return values. Additionally, none of these sequences can be a prefix of another, because that would be non-deterministic behaviour - once the function ran and called another method after this point, another time it didn’t.
This structure makes it very easy to store the results in a tree. Each node either records which method was called and its arguments, or that no more methods were called and is a leaf storing the return value, or stores a sentinel value to record that we don’t know what happens here and need to ask the underlying test function (this is helpful for the representation):
import attr
from collections import defaultdict
@attr.s()
class TreeNode(object):
"""Node wrapping a previous observation. Having a mutable
wrapper class around the value makes the implementation a lot
tidier."""
= attr.ib(default=None)
observation
@attr.s()
class Result(object):
"""A previously observed return value."""
= attr.ib()
value
@attr.s()
class Decision(object):
"""A previously observed call to the underlying implementation."""
= attr.ib()
method = attr.ib()
args = attr.ib(default=attr.Factory(lambda: defaultdict(TreeNode))) children
To populate the tree you use the following wrapper around the implementation which proxies method calls to the underlying implementation and puts the observations in the tree:
class WrapperImplementation(object):
def __init__(self, tree, underlying):
self.__tree = tree
self.__underlying = underlying
def __getattr__(self, name):
= getattr(self.__underlying, 'name')
base if not callable(base):
return base
def accept(*args):
if self.__tree.observation is None:
self.__tree.observation = Decision(name, args)
= base(*args)
rv self.__tree = self.__tree.observation.children[rv]
def call_recorded(fn, tree, implementation):
"""Call fn(implementation) and update the tree to reflect the
interaction."""
return fn(WrapperImplementation(tree, implementaiton))
We can then define a cached version of the test function as follows:
class UnknownResult(Exception):
pass
def simulated_function(tree, implementation)
"""Either returns a previous result we've saved in the tree from
calling this implementation, or raise UnknownResult."""
= tree.observation
previous
if previous is None:
raise UnknownResult()
if isinstance(previous, Result):
return previous.value
else:
= getattr(implementation, previous.method)(*args)
rv = previous.children[rv] tree
In order get a cached outcome of an arbitrary implementation it’s a
little complicated and I can’t be bothered to sketch out the code: If
you can reset the implementation to empty it’s easy, you just call it
with the simulated function, reset it, and then call it recorded with
the real function. If you can’t reset it then you can define a new
wrapper implementation that replays the prefix that was observed in
simulated_function
without calling the underlying
implementation, then starts calling the real implementation once you’re
in unknown territory. You can implement this in terms of a resettable
wrapper around the implementation.
How useful is this technique? Unsure. It’s pretty niche. I think it might help a lot in Hypothesis though, where we’re currently having some significant scalability issues for large and hard to reduce examples, and having something like this will allow us to avoid keeping them around because of how easy it is to recreate the data without reinvoking the test function.