DRMacIver's Notebook

Frequentist Statistics as a Tool of Critique

Published: 2018-11-01

Epistemic Status: I’m pretty confident this is valid, but I’m not an expert in statistics and I’m even less an expert in philosophy of statistics, so I’m less sure that this is useful, and if it is then it’s probably not novel.

The other day I figured out a framing of frequentist statistics that I quite like, which is that the core of frequentist statistics is something that you might call the model-based critique. The model-based critique goes roughly as follows:

If we accepted your argument, we would also have to accept the following variant of it in an idealised model (where we know the ground truth).
In said model, this undesirable outcome would happen with at least this probability.

For example, the structure of (two sided) significance testing can be thought of as follows:

If we accept your argument that this estimator taking this value indicates that the true parameter must be non-zero, then certainly the estimator taking a more extreme value should count as the same.
Here is a model that produces data that is in some sense like the data that you are testing on, but where the true parameter is zero (the null hypothesis).
Under data generated by this model, the threshold you have set would still report there being a real difference with probability \(p\) (the p-value).
Therefore either you have to explain why the model we have proposed is missing some essential feature of your real process, or you accept that your argument will claim a real result where there is in fact only noise about that often (which, depending on how often that is, might be fine!).
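To make the steps of that critique concrete, here is a minimal sketch of it as a simulation. Everything in it is an assumption made for illustration: the estimator is a sample mean, the idealised model is Gaussian noise with a known standard deviation and a true parameter of zero, and the names `null_exceedance_rate`, `observed`, `n`, `sigma`, `trials` and `seed` are hypothetical.

```python
import random


def null_exceedance_rate(observed, n, sigma=1.0, trials=20_000, seed=0):
    """Estimate how often a zero-effect model yields a sample mean at
    least as extreme (two-sided) as the one that was observed."""
    rng = random.Random(seed)
    # Anything at least this far from zero would also count as "a result".
    threshold = abs(observed)
    hits = 0
    for _ in range(trials):
        # Idealised model (an assumption): same sample size as the real
        # experiment, Gaussian noise, and a true parameter of exactly 0.
        sample_mean = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        if abs(sample_mean) >= threshold:
            hits += 1
    return hits / trials


# Hypothetical experiment: 25 observations with an observed mean of 0.4.
rate = null_exceedance_rate(observed=0.4, n=25)
print(f"Pure noise clears the same threshold in about {rate:.1%} of runs")
```

With these made-up numbers the observed mean sits two standard errors from zero, so the simulated rate should land near the usual two-sided normal tail of roughly 4.6%. The critique is then exactly the choice above: either argue that this toy model is a bad analogy for the real process, or accept that the argument would be fooled by noise about that often.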

Thus significance testing (and model-based critique in general) is not a tool of inference, but instead a tool of critique - quantitative rhetoric if you will. It puts forth a counter-argument to a claim that the data is sufficient to support some conclusion or premise.

The thing I like about this framing is that it makes the following three things much more apparent:

This is a valid thing to do.
It’s even a useful thing to do.
It’s not, however, a tool of inference, but instead a tool of refutation. When something is “statistically significant” what that means is we have failed to convincingly argue against it, not that it is true (or even likely!).

It’s also nice that it’s much more explicit about the relationship between the model and the experiment. I think most normal framings of frequentist statistics pretend that the model is in some sense true, but that is not an important feature of the model-based critique. Instead, the model merely has to be a convincing analogy.

This framing has also shifted my opinion of frequentist statistics. It’s not that I like it more or less, but previously my attitude was mostly “Eh, whatever. It’s not my favourite thing, but I don’t think I have a strong opinion on this and it’s not one of the battles I choose to fight” while now I think the following:

Frequentist statistics is a useful and valid tool and most of the criticisms of it are, I think, treating it as trying to do something that it’s not (and, in fairness, most of the proponents of it probably are too).
Most of the ways that it’s used in practice by people with a typical scientist’s level of understanding of statistics are probably deeply flawed.
It’s probably easier to get people to stop using p-values than it is to get them to acquire the level of statistical sophistication that would make p-values a useful tool.