Analysing Quantitative Data - Raymond A Kent - Page 19


Values

Values are what researchers actually record as a result of the process of assessing properties. Such records may relate either to variables or to set memberships. The values recorded for variables arise from one or more of the activities of classifying, ordering, ranking, counting or calibrating the characteristics of cases. All variables at a minimum classify cases into one of two values, but there may be many or even an infinite number of possible values.

The range of values deployed to record a case property may either consist of a defined number of categories that are mutually exclusive (they do not overlap) and exhaustive of all the possibilities, or result from applying a metric. Variables of the first kind may be called ‘categorical’ or ‘non-metric’ variables, the simplest of which are binary variables. These consist of a record of the presence or absence of a property. Thus an organization may be commercial or non-commercial; a nation-state may be democratic or non-democratic; an individual may be married or not married. Some variables are naturally binary, for example a product is either on the shelf in a supermarket or it is not. Some variables with a limited number of possible values may be readily converted into binary sets, for example employment status. If the possibilities are ‘employed full time’, ‘employed part time’ or ‘unemployed’ then these can become either ‘employed full time/not employed full time’ or ‘employed, either full or part time/unemployed’.

However, there are nearly always complicating issues. Even the apparently simple distinction between married and not married becomes complicated by decisions about whether ‘not married’ includes divorced, separated, widowed or in a civil partnership. Despite these issues, much of our thinking is binary in nature. We often think that people are ‘right’ or ‘wrong’, that propositions are ‘true’ or ‘false’, that countries are ‘democratic’ or ‘undemocratic’. Binary logic, furthermore, has been at the forefront of developments in electronic circuits, computer science and computer engineering, which are all based on binary language.
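The two binary recodings of employment status described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the three labels are simply those used in the example.

```python
# Labels taken from the employment-status example in the text.
EMPLOYMENT_VALUES = ("employed full time", "employed part time", "unemployed")


def is_full_time(status: str) -> bool:
    """Recode 1: 'employed full time' / 'not employed full time'."""
    if status not in EMPLOYMENT_VALUES:
        raise ValueError(f"unknown employment status: {status!r}")
    return status == "employed full time"


def is_employed(status: str) -> bool:
    """Recode 2: 'employed, either full or part time' / 'unemployed'."""
    if status not in EMPLOYMENT_VALUES:
        raise ValueError(f"unknown employment status: {status!r}")
    return status != "unemployed"


# Three illustrative cases (made up for this sketch).
statuses = ["employed full time", "unemployed", "employed part time"]
full_time_flags = [is_full_time(s) for s in statuses]
employed_flags = [is_employed(s) for s in statuses]
```

The same three cases produce different binary records depending on where the researcher draws the line, which is the kind of decision the text goes on to call a complicating issue.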

Where cases are classified not into the presence or absence of a characteristic, but into contrasting groups, then we have a nominal variable. Dichotomies, for example, consist of two categories that represent two contrasting groups like black/white, male/female, or they may be polar opposites like rich/poor, fast/slow, instrumental/expressive. Such variables may be better treated as two binary variables, ‘rich/not rich’ and ‘poor/not poor’, or better still as fuzzy sets with degrees of membership of these categories. Fuzzy sets are explained below. Strictly speaking, dichotomies are not binary in the sense that white, for example, is not the absence of black, and female is not the absence of male. Cases are either type A or type B rather than A or not A. If there really are only two categories, as with gender, then it does no harm to treat such variables as if they are binary. However, if a yes/no answer in a questionnaire is treated as binary, then ‘no’ really means ‘not yes’ and may, for example, include those who refused to answer, those who did not have an answer, and those for whom the question was inapplicable. In a dichotomy, ‘no’ means that the answer ‘no’ was given and these other possibilities are excluded.

Nominal variables are sometimes converted into binary variables so that, for example, the dichotomy A/B becomes A/not A and B/not B. A trichotomy becomes A/not A, B/not B and C/not C. Statisticians sometimes call these dummy variables and they are useful because they have particular properties and can be used in some statistical procedures where nominal variables are inappropriate.
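As a minimal sketch of this conversion, the following Python function turns a list of nominal values into one 0/1 dummy variable per category. The function name `to_dummies` and the sample data are hypothetical and serve only to illustrate the A/not A, B/not B, C/not C idea.

```python
def to_dummies(values):
    """Return one 0/1 list per category: 1 where the case is in it, else 0."""
    categories = sorted(set(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# A trichotomy recorded for four illustrative cases.
observations = ["A", "C", "B", "A"]
dummies = to_dummies(observations)

for category, column in dummies.items():
    print(f"{category}/not {category}: {column}")
```

Each case scores 1 on exactly one dummy variable, because the original categories are mutually exclusive and exhaustive.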

A key feature of nominal variables is that where there are three or more categories, the order in which the values appear in a table makes no difference to any statistical calculations that may appropriately be applied to the data. The values do need to be listed in some sequence (which might, for example, be alphabetical), but that sequence is not a graduated series from ‘high’ to ‘low’ or ‘large’ to ‘small’. Some variables, however, define the relationships between values not just in terms of categories that are exhaustive and mutually exclusive; the categories are also arranged in relationships of greater than or less than, although there is no metric that will indicate by how much. Thus product usage can be classified into ‘Heavy’, ‘Medium’, ‘Light’ and ‘Non-user’; there is an implied order, but no measure of the actual usage involved. The various social classes, social grades or socio-economic groups used in various European countries are good examples of such ordered category variables. The individual items used to generate summated rating scales such as the Likert scale, which were explained in the previous section, are also common examples of ordered categories.

In ordered category variables there is usually a limited number of categories into which researchers may map a large number of cases. So, 200 students might be mapped onto five degrees of the extent to which they say they have been bullied at school. However, in other situations it may be possible to rank-order each respondent. In ranked variables each case being measured is given its own ranking. Thus 30 hospital patients may be ranked 1–30 on the basis of their summated ratings of a hospital radio programme, or schools are ranked according to the performance of their pupils in examinations. We would normally rank-order only a fairly limited number of people or objects. To rank 300 people 1–300 would be rather cumbersome. Alternatively, respondents in a survey may be asked to rank a number of items; for example, customers may be asked to rank seven brands 1–7 in terms of value for money. Respondents may find this tricky, so paired comparisons may be used. If, for example, seven brands of beer are to be ranked, then respondents are asked to say which of two brands they prefer, taking each possible pair in turn. For n items there are n(n − 1)/2 pairs, so seven brands give 21 pairs. The results can be converted into a rank order by counting the number of times each brand is preferred.
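The paired-comparison procedure can be sketched in Python as follows. The brand labels, the respondent's hidden preference order and the function names are all hypothetical; the sketch only shows the counting step described above, and it assumes a single respondent who answers every pair.

```python
from itertools import combinations


def count_pairs(n: int) -> int:
    """Number of distinct pairs among n items: n(n - 1)/2."""
    return n * (n - 1) // 2


def rank_from_paired_choices(items, prefer):
    """Rank items by how often each is preferred across all pairs.

    `prefer(a, b)` must return whichever of a and b the respondent prefers.
    Ties (possible when preferences are inconsistent) keep the input order.
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        wins[prefer(a, b)] += 1
    ranking = sorted(items, key=lambda item: wins[item], reverse=True)
    return ranking, wins


# Seven hypothetical beer brands and one respondent's made-up preferences.
brands = ["A", "B", "C", "D", "E", "F", "G"]
hidden_order = ["D", "A", "G", "B", "F", "C", "E"]  # most to least preferred


def prefer(a, b):
    return a if hidden_order.index(a) < hidden_order.index(b) else b


ranking, wins = rank_from_paired_choices(brands, prefer)
```

With consistent preferences the win counts run from 6 down to 0 and recover the respondent's full order; with inconsistent answers some brands would tie, and a rule for breaking ties would be needed.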

Binary, nominal, ordered category and ranked variables all use sets of values that are usually labelled in words. However, in order to be able to enter these values into data analysis software like IBM SPSS (which is introduced in the next chapter), the labels need to be identified by numbers which are used as codes that ‘stand for’ each value. There are few rules that suggest how this should be achieved. For nominal values the codes may be assigned arbitrarily, and it does not matter if we assign 1 = male and 2 = female, or 1 = female and 2 = male, or even 26 = male and 39 = female. What we certainly cannot do is treat these codes as quantities. If, for example, we take 1 = male and 2 = female and we have 60 males and 40 females, we cannot meaningfully calculate the ‘average sex’ as 1.4! As we will see later, any self-respecting computer will happily perform this calculation for you: the trick is to realize that the result is total nonsense. At the ordered category level, again we can assign the numbers arbitrarily, but they must preserve the order.
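A short Python sketch makes both points concrete. It uses the 60 males and 40 females from the example; the helper `preserves_order` and the usage codes are hypothetical, and the helper assumes that larger codes are meant to stand for categories later in the stated order.

```python
# 60 males and 40 females, as in the example in the text.
sexes = ["male"] * 60 + ["female"] * 40

coding_a = {"male": 1, "female": 2}
coding_b = {"male": 26, "female": 39}  # an equally valid arbitrary coding


def mean_of_codes(labels, codes):
    """Arithmetic mean of the numeric codes: computable, but meaningless here."""
    return sum(codes[label] for label in labels) / len(labels)


mean_a = mean_of_codes(sexes, coding_a)  # 1.4
mean_b = mean_of_codes(sexes, coding_b)  # 31.2: same data, different 'average'


def preserves_order(ordered_labels, codes):
    """True if the codes increase strictly along the given category order."""
    values = [codes[label] for label in ordered_labels]
    return all(low < high for low, high in zip(values, values[1:]))


usage_order = ["Non-user", "Light", "Medium", "Heavy"]
good_codes = {"Non-user": 1, "Light": 5, "Medium": 7, "Heavy": 20}
bad_codes = {"Non-user": 2, "Light": 1, "Medium": 3, "Heavy": 4}
```

The ‘average’ changes from 1.4 to 31.2 when only the arbitrary codes change, which shows that it describes the coding scheme and not the cases. For ordered categories, `good_codes` is acceptable even though its gaps are uneven, whereas `bad_codes` is not, because it reverses ‘Non-user’ and ‘Light’.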

Metric variables arise when either there is a metric like age in years that can be used to calibrate distances between recorded values, or the values are a result of counting the number of instances involved as a measure of size. Think about how we might measure the size of a car park. We could either measure the area in square metres, or count up the number of parking spaces it provides. The first procedure might give any value in square metres and fractions of a square metre up to however many decimal places are required. The second method will produce only whole numbers or integers. In statistical parlance, the first is usually called a continuous variable and the second a discrete variable. The values for metric variables are numeric rather than in words, as with categorical variables.

Table 1.1 lists the variables used in the alcohol marketing study, the values recorded and the types of measure. Notice that only two of the variables are listed as continuous metric: the age at which respondents first had an alcoholic drink and the total units of alcohol last consumed. It is quite common in survey research that most of the variables are binary, nominal, ordered category or discrete metric.

Variables may be seen as containers, and each case has a place in each container (one container for each property) either in one of two or more compartments or at a certain ‘level’ inside the container. Set memberships, by contrast, focus on whether or not (or the extent to which) cases ‘belong’ in a container. Cases, then, may be members of some sets, but not others. Thus a nation-state may be a member of the sets ‘democratic’, ‘having strong trade unions’ and ‘low crime rate’, but not of the sets ‘unregulated press’ or ‘strict controls on the possession of guns by individuals’. The focus is then on which combinations of memberships characterize each case. Sets are based on notions of inclusion or exclusion; boundaries are defined in a way that creates containers into which cases may or may not be assigned.

Set memberships may be crisp or fuzzy. With crisp sets, cases are unambiguously members or not members of a set. Thus the UK is a member of the EU but not a member of the eurozone. Crisp sets are identical to binary variables in that they record the presence or absence of a characteristic, but they are allocated not codes but set membership values. For this reason, the latter are, in this text, indicated in square brackets. Full membership is always indicated with a value of [1] and non-membership with a value of [0]. Crisp sets have only these two values. Crisp sets are at the base of set theory and what has become known as Boolean logic. George Boole (1847) was a nineteenth-century mathematician and logician who developed an algebra suitable for properties with only two possible values. Set theory and Boolean logic are explained in more detail in Chapter 7 on configurational approaches to data analysis.

In reality, the world and large parts of social science phenomena do not come naturally in binary form. Membership of the category ‘democratic country’ or ‘profitable organization’ may be a matter of degree. Fuzzy sets record degrees of membership of a defined category by permitting membership values in the interval between [1] and [0]. Cases that are ‘more in’ a set than out of it are given values above [0.5], for example a value of [0.8] to indicate that a case is mostly in a set, while cases that are more out of a set than in it are given values below [0.5]. The crossover point of [0.5] indicates cases that are neither in nor out of a set. It is the point of maximum ambiguity. The crossover point, Ragin (2000) emphasizes, should be conceptually defined according to the theory or theories being applied, or according to empirical evidence, research findings or researcher understanding of the cases involved. The researcher has to decide, for example, what being a ‘heavy’ viewer of television entails in terms of hours viewing per day or per week, and at what point a viewer is no longer in the set ‘heavy viewer’. This should not be an arithmetical mean or average, which is driven by the particular dataset used, but an absolute value that is not affected by other values in the set. The assessment of crisp and fuzzy sets is explained in detail in Chapter 7.
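How membership values are read can be sketched as a small Python function. The function name and the wording of the returned descriptions are hypothetical; only the reference points [0], [0.5] and [1] come from the text, and the sketch does not attempt to show how membership values are assessed in the first place.

```python
def describe_membership(score: float) -> str:
    """Interpret a set membership value in the interval from 0 to 1."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("membership values must lie between 0 and 1")
    if score == 1.0:
        return "full member"
    if score == 0.0:
        return "non-member"
    if score > 0.5:
        return "more in than out"
    if score < 0.5:
        return "more out than in"
    return "crossover: neither in nor out (maximum ambiguity)"


def is_crisp(score: float) -> bool:
    """Crisp sets allow only the two values 0 and 1."""
    return score in (0.0, 1.0)


# Illustrative membership values for five cases (made up for this sketch).
for value in (1.0, 0.8, 0.5, 0.3, 0.0):
    print(f"[{value}] {describe_membership(value)}")
```

A crisp set is then simply the special case in which every case scores either [1] or [0].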
