Applied Univariate, Bivariate, and Multivariate Statistics - Daniel J. Denis
2.28.3 The Issue of Standardized Testing: Are Students in Your School Achieving More Than the National Average?
To demonstrate how adjusting the inputs to $z_M$ can directly affect the obtained p‐value, consider a school psychologist who hypothesizes that, as a result of an intensified program implemented in her school, her students will on average have a higher achievement mean than the national average for students in the same grade. Suppose that the national average on a given standardized performance test is equal to 100. If the school psychologist is correct that her students are more advanced performance‐wise than the national average, then her students should, on average, score higher than the national mark of 100. She decides to sample 100 students from her school and obtains a sample achievement mean of $\bar{y} = 101$. Thus, the distance between means is equal to 101 – 100 = 1. She computes the estimated population standard deviation s equal to 10. Because she is estimating σ² with s², she computes a one‐sample t‐test rather than a z‐test. Her computation of the ensuing t is:

$$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{101 - 100}{10/\sqrt{100}} = \frac{1}{1} = 1$$
On degrees of freedom equal to n − 1 = 100 – 1 = 99, for a two‐tailed test, we require a t statistic at least as extreme as ±1.984 for the result to be statistically significant at a level of significance of 0.05. Hence, the obtained value of t = 1 is not statistically significant. That the result is not statistically significant is hardly surprising, since the sample mean of the psychologist's school is only 101, a single mean point higher than the national average of 100. The computation of t is thus telling us a story that is consistent with our intuition: the data give little reason to believe that the school's performance is higher than that of the national average in the population from which these sample data were drawn.
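The computation of t from the summary figures in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `one_sample_t` and the constant `CRITICAL_T_DF99` are hypothetical names chosen here, and the critical value is simply the ±1.984 quoted in the text rather than something the code derives.

```python
import math


def one_sample_t(sample_mean, mu0, s, n):
    """Return (t, df) for a one-sample t-test computed from summary statistics.

    Hypothetical helper for illustration; not taken from the book.
    """
    standard_error = s / math.sqrt(n)  # estimated standard error of the mean
    t = (sample_mean - mu0) / standard_error
    return t, n - 1


# Figures from the example: sample mean 101, national average 100, s = 10, n = 100.
t_stat, df = one_sample_t(101, 100, 10, 100)

# Two-tailed critical value at the 0.05 level for df = 99, as quoted in the text.
CRITICAL_T_DF99 = 1.984

print(f"t = {t_stat:.2f} on {df} degrees of freedom")
if abs(t_stat) >= CRITICAL_T_DF99:
    print("statistically significant at 0.05")
else:
    print("not statistically significant at 0.05")
```

With these inputs the standard error is 10/√100 = 1, so the script reports t = 1.00 on 99 degrees of freedom and a non-significant result, matching the hand computation.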
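Now, consider what would have happened had the psychologist collected a larger sample, say n = 500. Using our new sample size, and still assuming an estimated population standard deviation s equal to 10 and a distance between means equal to 1, we repeat the computation for t:

$$t = \frac{101 - 100}{10/\sqrt{500}} = \frac{1}{0.45} = 2.22$$

(The standard error is rounded to 0.45 here; carried at full precision, t ≈ 2.24.)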
What happened? The obtained value of t increased from 1 to 2.22 simply as a result of collecting a larger sample, nothing more. The actual distance between means remained the same (101 − 100 = 1). The degrees of freedom for the test have changed and are now equal to 499 (i.e., n − 1 = 500 − 1 = 499). Since our obtained t of 2.22 exceeds the two‐tailed critical t (approximately ±1.965 on 499 degrees of freedom), our statistic is deemed statistically significant at p < 0.05. What is important to realize is that we did not change the difference between the sample mean and the population mean μ0; it remained extremely small at only a single mean achievement point (i.e., 101 – 100 = 1). Even with the same distance between means, the obtained t of 2.22, now statistically significant at p < 0.05, means that we reject the null hypothesis and infer the alternative hypothesis that μ ≠ μ0. And because scientists have historically considered the infamous statement “p < 0.05” to be automatically and necessarily equivalent to something meaningful or important, the obvious danger is that the rejection of the null hypothesis at p < 0.05 is considered by some (or even most) a “positive” result, when in reality the difference in this case is nothing short of trivial.
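The effect of sample size alone can be made concrete with a short Python sketch that holds the means and s fixed and varies only n. Everything here is an assumption made for illustration: the helper names (`t_statistic`, `t_density`, `two_tailed_p`) are hypothetical, and the p‐value is approximated by Simpson's rule on the t density using only the standard library, which is a convenience chosen here and not a method from the text. Carried at full precision, the second t comes out near 2.24 rather than the rounded 2.22; the conclusion is unchanged.

```python
import math


def t_statistic(sample_mean, mu0, s, n):
    """One-sample t from summary statistics."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value via Simpson's rule on [0, |t|].

    steps must be even. Numerical approximation for illustration only.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - central_area)


# Same means and s as in the example; only the sample size changes.
sample_mean, mu0, s = 101, 100, 10
results = {}
for n in (100, 500):
    t = t_statistic(sample_mean, mu0, s, n)
    p = two_tailed_p(t, n - 1)
    results[n] = (t, p)
    print(f"n = {n}: distance = {sample_mean - mu0}, "
          f"distance / s = {(sample_mean - mu0) / s}, t = {t:.2f}, p = {p:.3f}")
```

Because t = ((ȳ − μ0)/s)·√n, the printed distance and distance/s are identical in both rows while t grows with √n; the p‐value drops below 0.05 at n = 500 even though the one‐point difference itself has not moved.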
The problem is not that the significance test is not useful and therefore should be banned. The problem is that too few are aware that the statement “p < 0.05,” in itself, scientifically (as opposed to statistically) may have little meaning in a given research context, and at worst, may be entirely misleading if automatically assigned any degree of scientific importance by the interpreter.