DRMacIver's Notebook

What is probabilistic programming?

What is probabilistic programming?

Friends and family often ask me what probabilistic programming is. This isn’t very surprising, as I currently work in probabilistic programming. Unfortunately I haven’t had a very good answer for them, as this is quite a recent development for me and I haven’t fully understood the boundaries of the field.

I think also it’s hard to answer because the field is a little confused about what it is, and there’s a “What the field says it is” answer and “What the field actually is” answer.

So I’m going to try to give my own answer of what I think the field is. It will be a bit overbroad and claim some territory that could equally be put in other fields, but I think the overbroad definition is more informative than the overly narrow one.

First, let me give you a narrow answer that I think people would classically give:And the wikipedia page gives. Probabilistic programming is the field of developing probabilistic programming languages.

What’s a probabilistic programming language? Well it’s a particular type of tool for doing the thing I’m about to describe in the general answer.

So, this is what I think: Probabilistic programming is the field of constructing random samplers with precise distributional properties, centrally but not exclusivelyThat is, this is the origin of the field and the major driver of progress in it, but the techniques are more broadly useful and using them to do other things would still be considered probabilistic programming. with the goal of developing better Monte Carlo methods for doing statistics.

I will now unpack all of those terms so that this definition actually makes sense.

A random sampler is a program that, when run, generates some value, typically at random. You could imagine e.g. a coin flip, which randomly generates either “heads” or “tails” as your prototypical random sampler. The work I did in Hypothesis is also an example of a random sampler, where it generates test cases for checking your program with by making a bunch of choices at random.

Random samplers have the virtue of being very easy to construct - you can just write a simple program that makes a bunch of random choices and bam you’ve got a random sampler.

However, it’s often hard to reason about the behaviour of random samplers, and in particular to construct random samplers that behave in particular ways.

For example, suppose you were creating a random sampler to simulate some physical process - say, the weather - it’s quite important that if about 10% of your random samples say it’s going to rain tomorrow, then you should expect that about 10% of the time tomorrow there will be rain.What does that mean? Don’t worry about it too precisely. If you want to worry about it more precisely, read Probably enough probability for you and World counting as a tool for understanding probability

Having random samplers that have accurate distributional properties like this - e.g. that our random model of the weather accurately reflects what we should believe about the weather - is an essential property for doing monte carlo simulations, which just means that you run a bunch of random simulations and use those as a representative of the range for the true value you’re trying to calculate. e.g. if you want to know if it’s going to rain on Tuesday, run a few thousand weather simulations and see what fraction of them rain on Tuesday. If you want to know the total rainfall you should expect, average the total rainfall across all of your simulations, etc.

These are not the exact calculations that you might hope for in classical statistics - what you get is a random number, and if you reran the simulation you’d get a slightly different number, but it is a random number that you can count on being very close to the true number you want,Figuring out how close is part of the general body of theory needed and developed in doing probabilistic programming. and you can get it as close as you like by running more simulations.

Now, that’s all very well, but what this gives you is an accurate estimator of these values in your simulation. How do you get your simulation right?

This is where probabilistic programming and its toolkit for trying to control distributional properties of samplers comes in.

In particular it often comes in via trying to do Bayesian inference.

Bayesian inference works as follows:

  1. You start with some very general model of the world (a “prior distribution”), and what data you would expect to see given specific configurations of that world.
  2. You feed in a bunch of actual data.
  3. You get an updated model of the world (a “posterior distribution”) out.

A great deal of probabilistic programming is concerned with how to do this sort of Bayesian updating on random samplers.

This allows for a very general approach to doing statistics:

  1. You write a random sampler for a very broad world model.
  2. You feed in a bunch of data and do a Bayesian update on your sampler to take into account the data.
  3. You run Monte Carlo simulations and use them to make predictions about the world.

All of the hard part is in Step 2. This is an almost impossibly difficult problem to do in general. Many of the classic probabilistic programming tools try to do this, and what you get is tools that sometimes work very well and sometimes just run unusably slowly, with very little ability to predict which situation you’ll be in or much recourse for fixing it when it happens.

What the lab I’m in tries to do is to instead provide a toolkit that makes it easy to write your own algorithms for your specific problem, by providing an API for samples that you can perform a variety of manipulations on - either standard algorithms or ones of your own creation - that let you perform the update.

But, in general, doing this bit well is where most of the active work in probabilistic programming is, and why the tools have historically been a bit underused. We’re hopeful that a mix of new improvements in computing and new advances in probabilistic programming allow us to do a lot better than we’ve previously seen, but this is still an active research and development area, so we’ll see how that goes.