Multilevel Modeling in Plain Language - Karen Robson

Degrees of freedom and statistical significance

Degrees of freedom are a problem when using OLS to model multilevel relationships. When we use OLS and simply add group-level variables, such as region in the example above, we create a model that assumes individual-level degrees of freedom.

At this point you may well be wondering, ‘What are degrees of freedom?’ Fair enough. As the name implies, degrees of freedom are related to how many of the values in a formula are able to vary when the statistic is being calculated. Our example data contains 13,646 students in eight different regions, so we have 13,646 individual pieces of data, which we use to estimate statistical relationships. In general, each statistic that we need to estimate costs one degree of freedom, because once it is estimated it is no longer allowed to vary. Many equations contain the mean, for example. Once we calculate a group mean, it is no longer able to vary. Likewise, once we calculate a standard deviation, we lose another degree of freedom.

In our examples above, degrees of freedom are determined from individual data, but if we have group characteristics in this individual-level data set, OLS calculates the degrees of freedom as though they are simply related to characteristics of individuals. For group characteristics, the degrees of freedom should be based on the number of regions (8) rather than the number of pupils (13,646). The numbers, 13,646 versus 8, are obviously very different.

Degrees of freedom are integral to calculating tests of statistical significance. Using the wrong degrees of freedom in OLS calculations increases our likelihood of rejecting the null hypothesis when we should not. In other words, we are more likely to get statistically significant results when we shouldn’t if we use the individual-level degrees of freedom instead of the group-level degrees of freedom.
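The consequence described above can be illustrated with a small simulation. The following is a minimal sketch in Python, not the book's own code or data: the function names, the simulated scores, and the sizes (8 regions with 50 pupils each, rather than 13,646 pupils) are all assumptions chosen for illustration. It generates data in which a region-level characteristic has no effect at all, then tests that effect twice: once with pupil-level OLS, and once with a plain regression on the eight region means, used here only as a simple stand-in that works with region-level degrees of freedom.

```python
import random
import statistics


def slope_t(x, y):
    """t statistic for the slope in a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = (rss / (n - 2) / sxx) ** 0.5  # residual df = n - 2
    return slope / std_err


def false_positive_rates(n_sims=500, n_regions=8, pupils_per_region=50, seed=1):
    """Share of simulations in which a null region-level effect looks significant."""
    rng = random.Random(seed)
    # Two-sided 5% critical values of the t distribution,
    # hard-coded for the default sizes above (illustrative assumption):
    crit_pupil_df = 1.966   # df = 8 * 50 - 2 = 398 (pupil-level df)
    crit_region_df = 2.447  # df = 8 - 2 = 6 (region-level df)
    naive_hits = region_hits = 0
    for _ in range(n_sims):
        x_region = [rng.gauss(0, 1) for _ in range(n_regions)]  # region characteristic
        u_region = [rng.gauss(0, 1) for _ in range(n_regions)]  # unobserved region effect
        x_pupil, y_pupil, y_region_mean = [], [], []
        for xj, uj in zip(x_region, u_region):
            # The outcome never depends on xj, so the null hypothesis is true.
            scores = [uj + rng.gauss(0, 1) for _ in range(pupils_per_region)]
            x_pupil.extend([xj] * pupils_per_region)
            y_pupil.extend(scores)
            y_region_mean.append(statistics.fmean(scores))
        # Pupil-level OLS treats all 400 rows as independent observations.
        naive_hits += abs(slope_t(x_pupil, y_pupil)) > crit_pupil_df
        # Region-level regression uses only the 8 region means.
        region_hits += abs(slope_t(x_region, y_region_mean)) > crit_region_df
    return naive_hits / n_sims, region_hits / n_sims


if __name__ == "__main__":
    naive, by_region = false_positive_rates()
    print(f"False positives with pupil-level df:  {naive:.2f}")
    print(f"False positives with region-level df: {by_region:.2f}")
```

Because the null hypothesis is true by construction, a well-behaved test should reject it in about 5% of simulations. Under these assumptions the pupil-level analysis should reject far more often than that, while the region-level analysis should stay close to the nominal rate.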

Table 1.7

Adapted from Diez-Roux (2000: 173)

Table 1.7 summarizes the OLS ‘workarounds’ discussed above and their associated problems. The overarching problem is that when you use OLS models on data better suited to multilevel techniques you are very likely to underestimate standard errors and therefore increase the likelihood of results being statistically significant, possibly rejecting a null hypothesis when you should not. In other words, you are more likely to make a Type I error. If you correctly model your multilevel data then your results will be more accurately specified, more elegant, and more convincing, and your statistical techniques will match your conceptual model.

Multilevel modeling, both in general and in some of its specific aspects, has also come in for some criticism. As with many debates of this kind, there is unlikely to be a firm and final conclusion, but we do advocate that users of any technique be aware of the criticisms and current debates. We suggest that you start with this series of papers: Gorard (2003a, 2003b, 2007) and Plewis and Fielding (2003).