Handbook of Web Surveys - Jelke Bethlehem - Page 53
EXAMPLE 3.1 A metadata database: variables definitions
The Eurostat website offers a metadata database that includes the Euro‐SDMX Metadata Structure (ESMS) (a set of international standards for the exchange of statistical information between organizations), classifications, legislation and methodology, concepts and definitions (CODED, Eurostat's Concepts and Definitions Database, and other online glossaries relating to survey statistics), a glossary, national methodologies, and standard code lists.
A section reports the description of the variables in different sources. A variable description is a detailed explanation of the researcher's intended meaning of the variable in the questionnaire, and it is one example of basic metadata. For example, for the purposes of the Labour Force Survey, the following definition is used: “Employees are defined as persons who work for a public or private employer and who receive compensation in the form of wages, salaries, fees, gratuities, payment by results or payment in kind; non‐conscripted members of the armed forces are also included.”
In structural business statistics, employees are defined as “those persons who work for an employer and who have an employment contract and receive compensation in the form of wages, salaries, fees, gratuities, piecework pay or remuneration in kind.”
Furthermore, a worker is a wage or salary earner of a particular unit if he or she receives a wage or salary from the unit, regardless of where he or she works (in or outside the production unit). A worker from a temporary employment agency is considered to be an employee of the temporary employment agency, not of the unit (customer) in which he or she works. Metadata states that “employees include part‐time workers, seasonal workers, person on strike or on short‐term leave, but excludes those persons on long‐term leave. Employees does not include voluntary worker.”
If the variable is not precisely defined, respondents could complete the questionnaire according to different concepts; one could exclude part‐time workers, whereas another could include them. In such a survey, the unclear variable definition would therefore cause measurement error or a high number of nonresponses to the specific question (item nonresponse).
Most statistical offices, both NSIs and various research bodies, present a section on metadata. Research institutes, marketing research societies, and every business or institution collecting survey data should provide a clear metadata definition and communicate it to the users.
The third step is Designing the mobile web or web‐only survey; this may be broken down into sub‐steps. The first two basic sub‐steps to consider are as follows: (1) decide if the study should be experimental or observational, and (2) decide the mode of data collection.
Regarding the sub‐step Decide if the study should be experimental or observational (sub‐step 1), it should be kept in mind that an experimental study tries to capture how different factors affect the results; thus, the task is to highlight relationships between factors and the results (or outputs). There is no special interest in estimating the values of the variables at the target population level. Observational studies, on the contrary, aim at estimating the values of the variables at the target population level. Designing a survey for an experimental study does not necessarily require a probability‐based sample, because the major task is to obtain a sort of case study for investigating causal relationships. For example, in‐the‐moment surveys, which typically reach the interviewee on the smartphone, often do not meet probability sampling criteria; thus, they mostly have value only as experimental studies that capture emotions and opinions while the individual is experiencing some event or action. Observational studies focus on estimating the variables; therefore, the probability‐based sampling technique is crucial, and the sampling design is an important step. Socioeconomic surveys in general aim at estimating values for the whole target population.
Sub‐step 2, Decide the mode of data collection, is important because it verifies whether organizing a survey only via the web (or mobile web) is feasible and effective. Criteria for mode selection are general and related to several aspects of the research environment and the specific issue. A mobile web survey, or a web survey in general, because it is self‐completed, is extremely well suited to sensitive research questions and/or to short and simple questionnaires. Complex questionnaires may also be implemented efficiently; this happens particularly in official statistics. In this case, however, a web‐only or mobile web survey is more problematic, and a mixed‐mode approach is preferable. One relevant constraint in the use of a probability‐based mobile web survey is the availability of an adequate sampling frame. Thus, the choice of the mode depends on many factors, and a critical one is the sampling frame availability. An inadequate mode choice might give rise to many types of errors (coverage errors, extremely high unit nonresponse, and so forth), leading to a poor‐quality result. Due to the importance of an adequate mode selection for a probability‐based mobile web survey, Thorsdottir and Biffignandi present a flowchart to show the major steps driving the mode choice. Figure 3.2 presents the actions and the decisions to be undertaken when choosing the mode of data collection.
Moving to Figure 3.2, when selecting the mode, the first problem is deciding whether it is possible to draw a probability‐based sample from the target population under study; that is, the question is whether everyone has an e‐mail address. If everyone has an e‐mail address and a complete list (good sampling frame) is available, then it is possible to proceed with the web‐only survey. However, if the list of e‐mail addresses is incomplete (bad sampling frame) or does not exist, the surveyor must decide whether an alternative sampling frame is available, for example, a sampling frame of telephone numbers or postal addresses. If the alternative sampling frame is available, a mixed‐mode approach should be adopted. The surveyor should select a contact mode (telephone or mail) and approach the sampled interviewees to ask them whether they will participate in the survey and whether they can provide an e‐mail address. If the researcher intends to conclude the survey via the web, he or she can provide a personal computer and Internet access (with an e‐mail address) to those without Internet access and an e‐mail address. In this case, the data collection takes place via a web or mobile web survey. Thus, from this step of Figure 3.2, it is possible to follow the decision steps of the flowchart in Figure 3.1. If the researcher does not want to provide Internet access, or if interviewees do not agree to participate via the web or do not provide an e‐mail address, the interview should be administered using an alternative mode. In such a case, the surveyor must run a mixed‐mode survey with a web component (see Chapter 9). In this case also, from this step of Figure 3.2, it is possible to follow the decision steps of the flowchart in Figure 3.1.
Figure 3.2 Sub‐steps for deciding the mode of data collection
If no alternative sampling frame exists, only a non‐probability approach is possible, or an RDD telephone contact could be undertaken.3
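The decisions described for Figure 3.2 can be sketched as a small function. This is only a simplified illustration of the reasoning in the text, not a reproduction of the flowchart itself: the names `Mode` and `choose_mode` and the three yes/no inputs are assumptions introduced here, and `all_reachable_via_web` condenses several separate checks (agreement to participate via the web, availability or provision of Internet access and an e‐mail address) into one flag.

```python
from enum import Enum


class Mode(Enum):
    # Labels are illustrative; they paraphrase the outcomes described in the text.
    WEB_ONLY = "web-only survey"
    WEB_AFTER_CONTACT = "web survey after telephone or mail contact"
    MIXED_MODE = "mixed-mode survey with a web component"
    NON_PROBABILITY_OR_RDD = "non-probability approach or RDD telephone contact"


def choose_mode(complete_email_frame: bool,
                alternative_frame: bool,
                all_reachable_via_web: bool = False) -> Mode:
    """Return the data collection approach suggested by the decision steps.

    complete_email_frame: a complete e-mail list of the target population exists.
    alternative_frame: a frame of telephone numbers or postal addresses exists.
    all_reachable_via_web: (assumed simplification) every contacted person agrees
        to answer via the web and has, or is given, Internet access and e-mail.
    """
    if complete_email_frame:
        return Mode.WEB_ONLY
    if not alternative_frame:
        return Mode.NON_PROBABILITY_OR_RDD
    # An alternative frame exists: contact by telephone or mail first.
    if all_reachable_via_web:
        return Mode.WEB_AFTER_CONTACT
    return Mode.MIXED_MODE
```

For instance, `choose_mode(False, True)` returns `Mode.MIXED_MODE`, which corresponds to the case where some interviewees must be interviewed in an alternative mode.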
There are considerable advantages associated with web/mobile web surveys compared with face‐to‐face surveys, mainly in terms of cost and timeliness. However, the web mode performs more poorly in terms of coverage and participation. For this reason, researchers sometimes consider using a mixed‐mode approach including web, even if no problem exists in terms of the sampling frame (see experiments and comments in Jäckle, Lynn, and Burton, 2015).
If the conditions for a web survey are met, the surveyor will return to the steps and sub‐steps of the Figure 3.1 flowchart.
At this stage, the sub‐step to be faced is to Define the sampling frame and the sampling approach. This refers only to probability‐based surveys. The identification of the sampling frame requires a complete e‐mail list of the target population; thus, the entire target population should be online and listed (e‐mail list) without under‐coverage or duplications.4 Once the sampling frame and the sampling approach are decided, the sample selection takes place (see Chapter 4 about sampling problems and methods). Sampling strategies do not differ from those of traditional surveys.
Other sub‐steps are the Questionnaire design and the Design paradata methodology. It is possible to work on them in parallel. Paradata are usually described as “administrative data about the survey,” and they are gathered during the data collection. Web survey paradata include how many times the respondent accessed the questionnaire, item nonresponses, editing errors, and the time the questionnaire was completed. Thus, there are paradata about each observation in the survey. When considering mobile web surveys, a variety of devices could be used to complete the questionnaire; therefore, paradata include the type of device, possibly both the device on which the survey contact/questionnaire is opened and the one from which the completed questionnaire is submitted. Paradata are useful for understanding problems in the questionnaire. For example, questions that are ambiguous or not clear enough can be identified. Unclear questions usually show a large amount of item nonresponse and longer response times, and/or they induce back‐and‐forth navigation in the questionnaire. When monitoring survey participation, paradata are useful because they allow for additional strategies in the solicitation process. For example, by looking at the characteristics of the respondents during the data collection, it is possible to use adaptive design (see Chapter 8). This leads to higher response rates and limited costs.
Paradata also provide insights into the under‐coverage and over‐coverage. In summary, paradata are an interesting information source that can improve the survey process and identify errors and their interrelations.
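As an illustration of how paradata can point to unclear questions, the following minimal sketch flags questions with a high item nonresponse rate or an unusually long response time. The record layout, the function name `flag_unclear_questions`, and both thresholds are assumptions chosen for the example; in practice they would depend on the survey software and on the questionnaire.

```python
from statistics import median


def flag_unclear_questions(records, max_nonresponse=0.10, time_factor=1.5):
    """Flag questions whose paradata suggest they may be unclear.

    records: list of dicts with the (assumed) keys
        'question' (str), 'answered' (bool), 'seconds' (float),
        one dict per respondent and question.
    A question is flagged if its item nonresponse rate exceeds
    max_nonresponse, or if its median response time exceeds time_factor
    times the median of the per-question medians.
    """
    by_question = {}
    for rec in records:
        by_question.setdefault(rec["question"], []).append(rec)

    stats = {}
    for question, rows in by_question.items():
        nonresponse = sum(not r["answered"] for r in rows) / len(rows)
        stats[question] = (nonresponse, median(r["seconds"] for r in rows))

    typical_time = median(t for _, t in stats.values())
    return sorted(
        q for q, (nonresponse, seconds) in stats.items()
        if nonresponse > max_nonresponse or seconds > time_factor * typical_time
    )
```

Flagged questions are only candidates for review; a long response time may also reflect a question that is simply demanding rather than ambiguous.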
According to Callegaro (2013), in web surveys, it is possible to distinguish between device‐type paradata and questionnaire navigation paradata. Device‐type paradata provide information regarding the kind of device used to complete the interview (e.g., tablet or desktop). They also provide information about the technical features of the device (browser, screen resolution, IP address, and several other characteristics). Questionnaire navigation paradata describe the full set of activities undertaken in completing the questionnaire, for example, mouse clicks, forward and backward movements along the questionnaire, number of error messages generated, time spent per question, and the last question answered before dropping out (if dropout occurs). Other authors (for example, Heerwegh, 2011) distinguish between client‐side paradata (which include mouse clicks and everything related to the activities of the respondent) and server‐side paradata (which include everything collected by the server hosting the survey). The literature also proposes other classifications, and technological evolution is going to offer new types of paradata. Capturing paradata is one of the main challenges, and survey software is improving greatly and constantly. Traditionally, most programs collected only a few server‐side paradata, and not every program records client‐side paradata. Technological innovation and commitment to this important task have increased the number of programs that collect paradata and transform the often unintelligible strings into useful data sets. Because innovation in this field is rapid, software quickly becomes outdated; some discussion of the topic is found in Olson and Parkhurst (2013) and in Kreuter (2013).
Currently, it is clear that paradata are useful for several different functions, such as monitoring nonresponse and measurement profiles, checking for measurement error and bias, improving questionnaire usability, and fixing many other problems. Because of their potential usefulness in helping to understand relationships between different errors and in improving the data collection process and the quality of results, paradata require a dedicated decision sub‐step to plan their structure and data collection.
Regarding the Design questionnaire sub‐step, several decisions must be made. First, decisions about the Questions’ wording and the Response format take place. Question wording rules are similar to those of the other modes, except that the sentences should be especially simple and short; this recommendation is stronger for web surveys than for the other modes. As regards the Response format, even if most of the basic criteria are like those of traditional modes, the general rules and response formats in web surveys are different and specific to a self‐completed questionnaire and to the digital format used (see Chapter 7). The criteria for mobile web surveys are similar to the ones for web surveys. Some specific requirements are due to the technical structure of the devices, in particular the small size of the screen, especially in smartphones. A questionnaire that is poorly structured and technically inadequate for mobile phones could critically affect the quality of the survey; errors could arise in terms of response rate, item nonresponse, and estimates. An important issue arises when a mixed‐mode approach is adopted. In this case, whether to adopt the optimal approach for each mode or a unimode approach is debated (see Dillman, Smyth, and Christian 2014). Recently it has been suggested to strike a balance between the basic presumption that survey questions should be as identical as possible between modes (unimode approach) and the consideration that mode effects might be reduced by optimizing each questionnaire for the corresponding mode (mode‐specific approach).
Another important sub‐step related to the questionnaire is Interactivity. A web process provides the opportunity for automatically interacting with the respondent. For example, pop‐up windows with a needed definition or other forms of metadata could appear. Some questions (one or more) are compulsory by design, meaning that the questionnaire's progress is blocked until they are answered (Example 3.2, Figure 3.3). Furthermore, the researcher could allow respondents to move back and forth through the questions or impose a strict question‐by‐question (top‐down) approach. When a compulsory question is left unanswered and the respondent moves from one page of the questionnaire to the next, an error message is given, and the uncompleted question appears again. In some cases, an automatic check must be activated; for example, if the question asks for a percentage composition, the answer is checked while the questionnaire is being completed, and an error message appears if the check fails (Example 3.3, Figure 3.4). The error must be corrected before the respondent can continue with the questionnaire.
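To make the distinction between device‐type and questionnaire navigation paradata more concrete, the sketch below groups the items listed above into two record types. The class and field names are assumptions made for illustration; they do not correspond to the output format of any specific survey software.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DeviceParadata:
    """Device-type paradata: what the questionnaire was completed on."""
    device_type: str          # e.g., "smartphone", "tablet", "desktop"
    browser: str
    screen_resolution: str    # e.g., "390x844"
    ip_address: str


@dataclass
class NavigationParadata:
    """Questionnaire navigation paradata: what the respondent did."""
    mouse_clicks: int = 0
    backward_moves: int = 0
    error_messages: int = 0
    seconds_per_question: Dict[str, float] = field(default_factory=dict)
    last_question_answered: Optional[str] = None  # relevant when dropout occurs


@dataclass
class ParadataRecord:
    """All paradata for one respondent (hypothetical layout)."""
    respondent_id: str
    device: DeviceParadata
    navigation: NavigationParadata

    def total_seconds(self) -> float:
        # Total time spent on the questions that were displayed.
        return sum(self.navigation.seconds_per_question.values())
```

Keeping the two groups separate mirrors the classification in the text and makes it easy to analyze, for example, response times by device type.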
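The two automatic checks mentioned here, compulsory questions and a percentage composition that must sum to 100, can be sketched as a page‐level validation function. This is a minimal illustration under assumed conventions: the function name `validate_page`, the question identifiers, and the wording of the error messages are hypothetical, and real survey software would normally run such checks in its own scripting environment.

```python
def validate_page(answers, required, percentage_groups=(), tolerance=0.01):
    """Return a list of error messages for one questionnaire page.

    answers: dict mapping question identifiers to the values entered.
    required: identifiers of compulsory questions on this page.
    percentage_groups: tuples of identifiers whose values must sum to 100.
    An empty list means the respondent may move on to the next page.
    """
    errors = []

    # Compulsory questions: block progress until they are answered.
    for question in required:
        value = answers.get(question)
        if value is None or str(value).strip() == "":
            errors.append(f"{question}: an answer is required")

    # Percentage composition: the parts must add up to 100.
    for group in percentage_groups:
        values = [answers.get(question) for question in group]
        if any(v is None for v in values):
            continue  # incomplete groups are left to the compulsory check
        total = sum(values)
        if abs(total - 100) > tolerance:
            errors.append(
                f"{', '.join(group)}: percentages sum to {total:g}, not 100"
            )
    return errors
```

The page is redisplayed with the messages as long as the returned list is not empty, which corresponds to correcting the error before continuing with the questionnaire.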