Read the book: Analysing Quantitative Data - Raymond A Kent - Page 47
Open-ended questions
Most surveys contain one or more open-ended questions where responses are recorded as words, phrases, sentences or even more extended text. To be used in quantitative data analysis, the responses need to be categorized and each category given a code. The result should be either a binary or a nominal measure such that the values are exhaustive and mutually exclusive, or a fuzzy set giving degrees of membership of a defined category.
The approach to coding can be split into two situations. In the first situation, the open-ended question is being used to capture factual information, since listing all the options for responses in a closed question would take up too much space. Where respondents can give their answer in numerical form, for example putting in their age, then no additional coding is necessary. The actual age can simply be put into the data matrix. Where responses are in words, like brand purchased last time, then coding will involve creating a list of all the possible answers, assigning a code to each and recording a code for each respondent’s answer. It may be necessary to develop coding rules which specify codes to be allocated when the answer does not fit any of the obvious categories. For example, if respondents are asked ‘Not counting yourself, how many other people were you with?’ then most will give a clear number, but some may say ‘30–40’ or ‘a lot’. In this situation, one rule might be to give the mid-point of a range of values, so the answer ‘30–40’ will be coded as 35.
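The mid-point rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `code_count_answer` and the code `-9` for answers that no rule can turn into a number (such as 'a lot') are hypothetical, and a real study would write its own rule for such answers.

```python
import re

# Hypothetical code for answers that no coding rule can turn into a number.
UNCODABLE = -9.0


def code_count_answer(answer: str) -> float:
    """Code a free-text answer to a 'how many people' question as a number."""
    text = answer.strip()

    # A single whole number, e.g. "4", is recorded as given.
    if re.fullmatch(r"\d+", text):
        return float(text)

    # A range such as "30-40" or "30–40" is coded as its mid-point.
    match = re.fullmatch(r"(\d+)\s*[-–]\s*(\d+)", text)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        return (low + high) / 2

    # Anything else (for example "a lot") gets the assumed uncodable code.
    return UNCODABLE
```

Calling `code_count_answer("30–40")` gives 35.0, matching the example in the text.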
Where open-ended questions are being used not to capture factual information but to record respondent opinions, attitudes, views, knowledge, and so on, then creating a sensible code frame is the most important part of the analysis. By definition this is likely to get quite complex – if it were easy then the question could no doubt be pre-coded! The aim is to formulate a set of categories that accurately represents the answers and where each category includes an appreciable number of responses. Ideally, the set of categories should be exhaustive, mutually exclusive and minimize the loss of information. Furthermore, they should be meaningful, consistent and relatively straightforward to apply. There may also need to be separate codes for ‘No response’, ‘Not applicable’ and ‘Don’t know’. Where the information is very detailed there may need to be many codes.
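Two of the properties listed above can be checked mechanically once codes have been assigned: whether every assigned code belongs to the frame (exhaustiveness) and whether each substantive category holds an appreciable number of responses. The sketch below is one possible check, not a standard procedure; the code frame, the codes 97 to 99 for the three special categories, and the threshold `min_count` are all assumptions made for illustration. Mutual exclusivity is taken for granted here because each response carries exactly one code.

```python
from collections import Counter

# Hypothetical code frame: numeric code -> category label.
CODE_FRAME = {
    1: "Price",
    2: "Quality",
    3: "Convenience",
    97: "Don't know",
    98: "Not applicable",
    99: "No response",
}

# Assumed special codes; these are not expected to attract many responses.
MISSING_CODES = {97, 98, 99}


def check_frame(assigned_codes, frame, min_count):
    """Return (codes used but absent from the frame, sparse categories)."""
    # Codes that appear in the data but not in the frame: the frame is
    # not exhaustive for these responses.
    unknown = sorted({code for code in assigned_codes if code not in frame})

    # Substantive categories with fewer than min_count responses are
    # candidates for merging with a similar category.
    counts = Counter(assigned_codes)
    sparse = [
        code for code in frame
        if code not in MISSING_CODES and counts[code] < min_count
    ]
    return unknown, sparse
```

For example, `check_frame([1, 1, 2, 1, 99, 4, 2, 1], CODE_FRAME, 2)` reports code 4 as missing from the frame and category 3 as sparse.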
Developing a frame may require several ‘passes’ over the data. It is probably a good idea to have all the comments collected and typed out, but this may not be possible. A method of constant comparison is probably best. Begin by looking at a few of the comments and see whether they should be put into separate categories. Then look at a few more and see if some can be put into the same category or whether more categories will need to be developed. When too many categories begin to emerge, look for similarities so that some categories can be brought together. If there are a large number of responses then it may not be sensible to look through all of them to develop the frame, but take a sample. Thus if there are 500 cases, a sample of 50–100 should enable the frame to be finalized. It also helps if more than one person develops a code frame separately; they should then work together on a final code frame. This maximizes the validity and reliability of the process.
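Two steps in this process lend themselves to a short sketch: drawing a sample of responses on which to develop the frame, and comparing the codes that two people assigned independently. The function names are hypothetical, and simple percentage agreement is used here only as an assumed, easily computed comparison; the text does not prescribe any particular agreement measure.

```python
import random


def draw_development_sample(responses, size, seed=0):
    """Draw a random sample of responses for developing the code frame."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(responses, min(size, len(responses)))


def percent_agreement(codes_a, codes_b):
    """Share of responses to which two coders gave the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same responses.")
    if not codes_a:
        raise ValueError("No coded responses to compare.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)
```

With 500 responses, `draw_development_sample(responses, 75)` returns 75 distinct responses, within the 50–100 range suggested above. Responses on which the two coders disagree are natural starting points when they sit down to agree the final frame.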
It helps if the researcher sets up the objectives for which the code frame is to be used before beginning the process. Thus if the objective is to look for positive and negative statements about a situation or a product then answers will be coded along this dimension, perhaps with categories of very positive, vaguely positive, mixed, vaguely negative and very negative. Sometimes answers to open-ended questions can be coded in several ways according to different dimensions. Thus a study of injuries following an earthquake could look at the way injuries occurred, the parts of the body affected, where the injury occurred, what the person was doing at the time, and so on. Each of these aspects may need to be recorded separately in a different variable.
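Recording each aspect in its own variable might look like the following sketch, which uses the earthquake example. The class name, field names and numeric codes are hypothetical; the point is only that a single open-ended answer yields several coded variables, one for each dimension.

```python
from dataclasses import dataclass, asdict


@dataclass
class InjuryCodes:
    """One respondent's answer, coded separately on each dimension."""
    respondent_id: int
    cause: int      # how the injury occurred
    body_part: int  # part of the body affected
    location: int   # where the injury occurred
    activity: int   # what the person was doing at the time


# A single open-ended answer becomes one row with four coded variables.
# The code values below are invented for illustration.
row = asdict(InjuryCodes(respondent_id=1, cause=2, body_part=3,
                         location=1, activity=4))
```

Each key of `row` then corresponds to one column of the data matrix, so the dimensions can be analysed separately or cross-tabulated.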
At one time researchers had to code all open-ended questions before data entry could begin. With modern survey analysis packages like SPSS, however, this may be done after all the pre-coded questions have been entered. This is a big advantage because researchers are not always sure how responses to open-ended questions should be coded until they have started analysis of the data. In short, it is sometimes better to delay coding of open-ended responses until they are needed for analysis.