This is a weekend digression, but a timely and appropriate topic. For one thing, it’s Super Bowl weekend; for another, I am putting together the “Statistical Inference Part 3” handout on hypotheses and hypothesis testing for my research methods course.

You may not be aware of this, but Deflategate (the question of whether the Patriots intentionally deflated footballs) has kicked off a statistical analysis frenzy. Here is one analysis by Sharp Football Analysis that has the attention of a blog I follow. The blog I follow is Flowing Data – which also, appropriately, points to a post from Regressing called “Why Those Statistics About the Patriots’ Fumbling Are Mostly Junk.”

Flowing Data and Regressing offer fairly sophisticated commentary on the statistical assumptions made by the author at Sharp Football Analysis. I am not going to attempt any additional statistical commentary. The fact is, regardless of how you look at it, there is a difference between the Patriots’ fumbling and other teams’ fumbling in NFL games. The difference is overestimated in the Sharp Football Analysis piece, but it is there.

That difference is what I want to talk about in this post. As you know, a statistical test that puts forward the probability of a difference occurring is actually testing the probability of the data conditional on the assumption that no difference exists. That assumption is the null hypothesis. Again, the null hypothesis is that there is no difference, and the test is estimating:

Pr(data | null hypothesis is true) – that is what a p value reports: the conditional probability that the data would occur if the null hypothesis were true. And in the case of the Patriots, there is a low probability (how low depends on how you calculate it, and I think Regressing has done a nice job with it) that the data would occur if the null were true. With a small p value we can “reject the null,” so to speak. Let’s put aside the underlying assumption of random sampling for now – clearly these are not random samples, so the entire approach of using statistics based on sampling distributions to estimate sampling error should be questioned.
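To make that conditional probability concrete, here is a minimal sketch of a one-sided p-value computation for a fumble rate. All of the numbers below (the league-wide fumble rate, the play count, the observed fumble count) are hypothetical placeholders for illustration, not the actual NFL figures used in the analyses discussed above.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k) for X ~ Binomial(n, p): the chance of seeing
    k successes or fewer in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical placeholder numbers -- NOT the real NFL figures.
league_fumble_rate = 0.015   # assumed fumbles per offensive play, league-wide
patriots_plays = 1000        # assumed number of offensive plays
patriots_fumbles = 8         # assumed observed fumble count

# The p-value conditions on the null being true: IF the team fumbles
# at the league-average rate, how likely is a count this low (or lower)?
p_value = binom_cdf(patriots_fumbles, patriots_plays, league_fumble_rate)
print(f"Pr(this few fumbles or fewer | null is true) = {p_value:.4f}")
```

Note that a small value here licenses only “reject the null” – it says nothing at all about why the fumble count is low, which is the point of this post.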

When we reject the null hypothesis we are happy to put forward the “alternate” hypothesis – the hypothesis that “MUST BE” true, that “HAS TO BE TRUE.” After all, the alternate hypothesis is the opposite of the null, the null is wrong (we just rejected it), and we live in a world of non-contradiction: if the null is rejected, the alternate is accepted – taken as true. Does this work? Yes, but with limits based on what the alternate is actually saying.

Let’s accept that the null is rejected (there is **not no difference** between the Patriots and other teams in the “effect” – fumbling). I bolded my double negative because that is what we do when we reject the null. We say: there is not no difference, therefore there is a difference (an effect, an association). The alternate – that there is a difference – is accepted. Fine. Let’s accept that there is a difference. But that is all the alternate can say. All the alternate says is that there is a difference.

The next BIG question is: why is there a difference? In a controlled trial this is easier to answer. Things are controlled, even randomized, so the only accepted difference between the conditions is a particular suspected causal agent. But this is not a controlled trial; these teams have not been selected or assigned by randomization.

Deflategate is about one possible causal agent – deflated footballs. But the analyses done, the statistical probabilities reported, are not specifically testing that as an alternate hypothesis. The only thing they show is that there is a difference. Since this is not a controlled trial, we are left to “abduct” what the best explanation of the difference might be – but that is not tested at all by the statistics being cited for Deflategate, and it is discussed only a tiny bit by the analysts.

To accept the alternate (they are different) is NOT to accept a particular explanation of why the difference occurred (the balls were deflated). There are other perfectly reasonable explanations – THE PATRIOTS ARE A BETTER TEAM – and the better team will score more points, have fewer points scored against them, create more turnovers, and COMMIT FEWER TURNOVERS. The numbers being produced do not justify one of these explanations over another – that is the point of a discussion. But what I have been reading are completely biased explanations of the difference. As an example, when discussing great quarterbacks it is accepted that they throw significantly fewer interceptions; we reject the null, and we accept the alternate that they are different. BUT we use that data to explain why they are great quarterbacks; we do not assume they must be doing something to the footballs to throw fewer interceptions. So why not interpret a team with fewer fumbles as evidence that they are a better team? Here I defend Sharp a bit.

Two things in defense of Sharp Football Analysis. First, there is a reason to suspect deflated footballs as a possible explanation: under-inflated footballs were found in the AFC Championship game. So this possible explanation does not come out of the blue – there is a valid reason to put it forward. **But being put forward as a possible explanation does not make it THE explanation.** Second, they do put forward a series of other explanations (at the end of the post), which they discuss offhandedly and with a sense of disregard:

“Could the Patriots be so good that they just defy the numbers? As my friend theorized: Perhaps they’ve invented a revolutionary in-house way to protect the ball, or perhaps they’ve intentionally stocked their skill positions with players who don’t have a propensity to fumble. Or perhaps still, they call plays which intentionally result in a lower percentage of fumbles. Or maybe its just that they play with deflated footballs on offense. It could be any combination of the above.”

“But regardless of what, specifically, is causing these numbers, the fact remains: this is an extremely abnormal occurrence and is **NOT simply random fluctuation**.”

We can agree: not simply random fluctuation. But that does not mean we can infer deflated balls. They disregard these other explanations with no real justification other than the occurrence of deflated balls at one game. We can list a whole host of other reasons that could explain the statistical difference. After all, some team will be better than the rest at any given thing, and that team will appear extreme relative to the mean – and no one would say that is due to random fluctuation. But just because the difference is not random fluctuation, we cannot claim that it is because footballs were deflated.

The fact is, there is an extremely complex causal network behind the difference. To claim deflated balls as the cause, all the other causal paths to the effect of interest must first be considered, with data to rule them out, before one particular cause is inferred.

I am all for listening to the data. But let’s keep in mind that without our guidance, the data has very little to say.

Hi,

Great blog!

Actually, p is not Pr(data | null hypothesis is true) but Pr(data OR MORE EXTREME data | null hypothesis is true)

Looking forward to reading the rest of your blog!

Dear Wu – thank you for your very insightful comment! You raise an interesting point about the p value. Can you explain a bit more what “more extreme data” under a null hypothesis would be? I am glad you are enjoying the blog – hope you share more comments along the way.

Sean
