Practical Field Ecology, by C. Philip Wheater

Asking questions about data


If we wish to ask specific questions of the data, then we are in the realm of inferential statistics. These usually involve testing hypotheses. It is standard practice to set up a null hypothesis alongside each question to be asked. The null hypothesis states that there is no significant difference between samples (or no relationship between variables, or no association between categories of variables), and the statistical test assesses how likely our results would be if that null hypothesis were true. So if we wish to know whether there is a difference between two samples (e.g. comparing the number of birds found in deciduous woodlands with the number found in coniferous woodlands), then we actually test the null hypothesis that there is no significant difference between the number of birds in deciduous and coniferous woodlands. Note that we are looking at ‘significant’ differences. These are differences that are unlikely to have resulted from random variation among the individual woodlands sampled. For this we need a method that tests the null hypothesis that there is no significant difference between the sample averages. In addition to tests for differences between samples, there are also tests for relationships between variables, and tests designed to examine associations between categories of variables. Table 1.3 summarises some commonly used, relatively simple statistical approaches to these research questions.
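
The logic of testing a null hypothesis can be made concrete with a permutation test. This is a swapped-in technique for illustration, not necessarily one of the tests listed in Table 1.3: if the null hypothesis is true, the labels "deciduous" and "coniferous" are arbitrary, so we can shuffle the counts between the two groups many times and see how often a difference in means at least as large as the observed one arises by chance alone. The bird counts below are hypothetical, and the function name `permutation_test` and its parameters are illustrative assumptions; this is a minimal sketch, not a definitive implementation.

```python
import random
from statistics import mean


def permutation_test(sample_a, sample_b, n_permutations=10_000, seed=1):
    """Two-sided permutation test (hypothetical helper) of the null hypothesis
    that both samples come from the same population, using the difference
    in sample means as the test statistic."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the woodland-type labels are arbitrary,
        # so reshuffle the pooled counts and split them into two new groups
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding one to numerator and denominator keeps the p-value above zero
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical bird counts per woodland (illustrative numbers only)
deciduous = [34, 41, 29, 38, 45, 36, 40, 33]
coniferous = [22, 27, 31, 19, 25, 28, 24, 30]

diff, p = permutation_test(deciduous, coniferous)
print(f"difference in means = {diff:.2f}, p = {p:.4f}")
```

The returned p-value estimates how often a difference this large would appear if the null hypothesis were true; a small value (conventionally below 0.05) is taken as evidence of a significant difference between the two woodland types.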

Since there are various questions that we might ask as part of an investigation, it is important to be clear about possible analysis methods before any sampling takes place. The choice of test depends not only on the question being asked, but also on the type of data being used. Where data are ranked but not measured (i.e. ordinal data, p. 27), a suite of tests called nonparametric tests may be used. The alternative is to use parametric tests, which are more powerful and generally preferred, but which require data on a measurement scale (i.e. interval/ratio data). Therefore, it is usually an advantage to obtain measurement data rather than ranked data wherever possible. Even where measurements are taken, parametric tests may not be the most appropriate, because most parametric tests require the data to conform to a type of distribution called a normal distribution. Briefly, this can be checked by examining histograms of the data (with the variable of interest plotted on the x axis and its frequency of occurrence on the y axis) to see whether they show a symmetrical, bell-shaped pattern (see Figure 1.6). For further details about the shape of distributions, and about which test to use, see Chapter 5. There are also different tests depending on whether the data are matched or unmatched (p. 305).
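
A rough first check of whether measurement data look approximately normal can be sketched with the Python standard library: group the values into equal-width bins, print a text histogram, and compute a moment-based skewness, which is close to zero for a symmetrical distribution. The skewness statistic, the helper names and the data values are assumptions added for illustration, not taken from the text. A symmetrical histogram alone does not guarantee normality; Chapter 5 covers distribution shape and the choice of test in detail.

```python
from statistics import mean, pstdev


def histogram_counts(values, n_bins=5):
    """Count values in equal-width bins spanning min(values) to max(values).
    Assumes the values are not all identical (otherwise the width is zero)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land just past the last bin, so clamp it in
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts


def skewness(values):
    """Moment-based skewness: about 0 when values are symmetrical about the
    mean, positive when there is a long tail to the right."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Hypothetical tree girths (cm): roughly symmetrical
girths = [10, 13, 13, 15, 15, 15, 17, 17, 20]
# Hypothetical counts per quadrat: many low values and a long right tail
counts_per_quadrat = [0, 0, 0, 1, 1, 2, 3, 5, 9, 15]

for name, data in [("girths", girths), ("counts", counts_per_quadrat)]:
    print(name)
    for c in histogram_counts(data):
        print("  " + "#" * c)  # one bar per bin, frequency shown as length
    print(f"  skewness = {skewness(data):.2f}")
```

The girth data give a histogram that rises and falls symmetrically, whereas the quadrat counts pile up in the lowest bin, the kind of pattern that would suggest a nonparametric test or a transformation.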

To illustrate some of the considerations in project design and data collection, we start with a research question that sounds relatively simple on the face of it: is there a relationship between the size of trees and the number of squirrels' dreys in their canopies? Ideally, we would measure tree height with some degree of accuracy. This would enable us to test whether the relationship exists using a parametric statistical technique called Pearson's product moment correlation analysis (p. 308). However, it may be difficult even to see the tops of very tall trees and of those obscured by other trees. Thus, we may instead estimate tree height, perhaps by assigning trees to several size classes. We can of course rank these data, but this means that we need an alternative method of analysis that is suitable for ordinal data. This is Spearman's rank correlation coefficient analysis, which is not quite as powerful as Pearson's method. The power of a test is its ability to detect a true relationship (or difference, or association) if one exists. If we expected any such relationship to be fairly weak, the less powerful technique might fail to reveal it, and by not measuring the trees accurately enough to obtain measurement data (and thus use the more powerful test) we could be wasting our time. Alternatively, if we are only interested in revealing strong relationships, then using ranked size classes to indicate tree height may be acceptable. Other complexities in this apparently simple question include keeping all other factors as constant as possible (e.g. tree species, surrounding landscape, density of the squirrel population).
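
The two correlation methods named above can be compared on the same set of trees. The sketch below is hand-written Python, assuming eight hypothetical trees with measured heights, hypothetical estimated size classes (1 to 4) and hypothetical drey counts; none of these figures come from the text, and the function names are illustrative. Spearman's coefficient is computed as Pearson's coefficient applied to the ranks, with tied values given their average rank. Only the coefficients are calculated here; testing their significance is a separate step.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient.
    Assumes neither variable is constant (otherwise division by zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data for eight trees (illustrative only)
heights_m = [8.5, 12.0, 15.2, 18.7, 21.3, 24.9, 27.4, 30.1]  # measured
height_class = [1, 1, 2, 2, 3, 3, 4, 4]  # estimated size classes
dreys = [0, 1, 1, 2, 2, 4, 3, 5]

print(f"Pearson r (measured heights):    {pearson_r(heights_m, dreys):.3f}")
print(f"Spearman rho (measured heights): {spearman_rho(heights_m, dreys):.3f}")
print(f"Spearman rho (size classes):     {spearman_rho(height_class, dreys):.3f}")
```

Grouping heights into four classes creates many tied ranks, which discards some of the information in the measurements. On Python 3.12 or later, `statistics.correlation(x, y)` returns Pearson's coefficient and `statistics.correlation(x, y, method='ranked')` returns Spearman's, so the hand-written functions above are mainly there to show what each method computes.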


Figure 1.6 Data set approximating to a normal distribution.

