
Common Tests of Significance.


Results will be referred to as either statistically significant or not statistically significant. What does this mean? In hypothesis-testing research, a straw person argument is set up: we assume that a null hypothesis is true, and then we use the data to reject the null and thus support our research hypothesis. Statistical significance means that data like those collected would be unlikely if the null hypothesis were true. Nowhere in the research article will you see a statement of the null hypothesis; instead, you will see statements about how the research hypothesis was supported or not supported. These statements will look like this:

 With an alpha of .01, those wearing earplugs performed statistically significantly better (M = 35, SD = 1.32) than those who were not (M = 27, SD = 1.55), t(84) = 16.83, p = .002.

 The small difference in happiness between married (M = 231, SD = 9.34) and single individuals (M = 240, SD = 8.14) was not statistically significant, t(234) = 1.23, p = .21.

These statements appear in the results section and describe the means and standard deviations of the groups, followed by a statistical test of significance (a t test in both examples). In both statements, statistical significance is indicated by the italic p. This value is the p value. It is the probability of obtaining data at least as extreme as those observed if the null hypothesis were true. Because the null hypothesis is the opposite of the research hypothesis, we want this value to be low. The accepted convention is a p value lower than .05 or, better still, lower than .01. The results support the research hypothesis when the p value falls below the chosen alpha level; they do not support it when the p value is above that level. You may see a nonsignificant result reported as ns with no p value included.
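To connect these statements to an actual computation, here is a minimal sketch in Python of the kind of independent-samples t test that produces the t statistic and p value reported above. The scores are randomly generated stand-ins, not the data behind the earplug example, so the exact numbers will differ.

 # Minimal sketch: independent-samples t test on hypothetical data.
 # The scores below are simulated stand-ins, not the earplug study's data.
 import numpy as np
 from scipy import stats

 rng = np.random.default_rng(seed=1)
 earplugs = rng.normal(loc=35, scale=1.32, size=43)      # hypothetical group scores
 no_earplugs = rng.normal(loc=27, scale=1.55, size=43)   # hypothetical group scores

 t_stat, p_value = stats.ttest_ind(earplugs, no_earplugs)
 df = len(earplugs) + len(no_earplugs) - 2                # 43 + 43 - 2 = 84
 print(f"t({df}) = {t_stat:.2f}, p = {p_value:.3f}")

 # A p value below the chosen alpha (.05 or .01) is reported as statistically
 # significant, and the research hypothesis is taken to be supported.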

You will find a refresher on statistical inference, including a discussion of Type I and Type II errors and statistical power, in Chapter 4.

Researchers using inferential techniques draw inferences based on the outcome of a statistical significance test. There are numerous tests of significance, each appropriate to a particular research question and the measures used, as you will recall from your introductory statistics course. It is beyond the scope of our book to describe in detail all or even most of these tests. You might want to refresh your memory by perusing your statistics text, which of course you have kept, haven’t you? We offer a brief review of some of the most common tests of significance used by researchers in the “Basic Statistical Procedures” section of this chapter.

Going back to the results section of our example article, we see that the author has divided that section into a number of subsections. The first subsection, with the heading “Mood,” reports the effect of light on mood. It is only one sentence: “No significant results were obtained” (Knez, 2001, p. 204). Results sections are typically brief, but the author could have provided the group means and the statistical tests that were not statistically significant. The next subsection, titled “Perceived Room Light Evaluation,” reports a statistically significant effect. Knez (2001) reports a significant (meaning statistically significant) gender difference. He reports Wilks’ lambda, a statistic used in multivariate ANOVA (MANOVA; used when there is more than one DV), and the associated F statistic and p value for the gender difference, F(7, 96) = 3.21, p = .04. He also includes a figure showing the mean evaluations by men and women of the four light conditions and separate F statistics and p values for each condition.
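For readers who want to see where a Wilks’ lambda and its associated F and p values come from, the following is a rough sketch of a one-way MANOVA in Python using statsmodels. The data frame, variable names, and values are hypothetical; this is not a reanalysis of Knez (2001).

 # Rough sketch of a one-way MANOVA; the output includes Wilks' lambda.
 # All variable names and data here are hypothetical, not Knez's (2001) data.
 import numpy as np
 import pandas as pd
 from statsmodels.multivariate.manova import MANOVA

 rng = np.random.default_rng(seed=2)
 df = pd.DataFrame({
     "gender": ["M", "F"] * 20,             # grouping variable (IV)
     "pleasantness": rng.normal(5, 1, 40),  # first DV
     "brightness": rng.normal(4, 1, 40),    # second DV
 })

 # Listing more than one DV on the left-hand side is what makes the analysis
 # multivariate; the test table reports Wilks' lambda with its F and p.
 fit = MANOVA.from_formula("pleasantness + brightness ~ gender", data=df)
 print(fit.mv_test())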

In the subsections that follow, Knez (2001) reports the results and statistical tests for the effect of light condition on the various DVs. He reports one of the effects as a “weak tendency to a significant main effect” (p. 204) with a p value of .12. We would simply say that it was not statistically significant, ns. Indeed, many of Knez’s statistical tests produced p values greater than .05. We bring this to your attention as a reminder that even peer-reviewed journal articles need to be read with a critical eye. Don’t just accept everything you read. You need to pay attention to the p values and question any claimed effect when they are not below .05. You also need to examine the numbers carefully to discern the effect size.

What is noticeably missing from the results section of Knez (2001), our example article, is a calculation of effect size. Effect size gives us some indication of the strength of the effect (see Chapter 4 for more detail). Remember, statistical significance tells us that an effect was likely not due to chance and is probably a reliable effect. What statistical significance does not indicate is how large the effect is. If we inspect the numbers in Knez’s article, we see that the effects were not very large. For example, on the short-term recall task, the best performance was from the participants in the warm-lighting conditions. They had a mean score of 6.9 compared with the other groups, with a mean score of about 6.25. A difference of only 0.65 of a word on a recall task seems like a pretty small effect, but then again, one would hardly expect that lighting conditions would have a dramatic effect on performance.
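One common effect-size measure for a difference between two means is Cohen’s d, the mean difference divided by a pooled standard deviation. Here is a minimal sketch using the recall means mentioned above; because the article’s standard deviations are not reproduced here, the pooled SD is an assumed placeholder, so the resulting d is illustrative only.

 # Minimal sketch of Cohen's d from summary statistics. The means come from
 # the recall example above; the pooled SD is an assumed placeholder, not a
 # value reported by Knez (2001).
 def cohens_d(mean1: float, mean2: float, pooled_sd: float) -> float:
     """Standardized mean difference: (M1 - M2) / pooled SD."""
     return (mean1 - mean2) / pooled_sd

 assumed_pooled_sd = 2.0                      # placeholder assumption
 d = cohens_d(6.9, 6.25, assumed_pooled_sd)
 print(f"Cohen's d = {d:.2f}")                # about 0.33 with this assumed SD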

Once you have finished reading the introduction, method, and results sections, you should have a pretty good idea about what was done, to whom, and what was found. In the discussion section, you will read the researcher’s interpretation of the research, comments about unexpected findings, and speculations about the importance of the work or its application.
