Using Stata for Quantitative Analysis - Kyle C. Longest

Merging Data


There are two typical variants of the second combination scenario. The first occurs when you conduct a follow-up survey on a similar set of cases, such as in a pre-test–post-test design. Here you would want to add the new variables (i.e., post-test responses) to the initial data set from the pre-test. In this situation, your new data would look like those presented in Figure 1.10.


FIGURE 1.10 • NEW VARIABLES FOR ORIGINAL RESPONDENTS

If you compare these new data to the original, you will notice that these are the same 10 cases, identified by their matching ids values. Also, none of their gender identifications have changed. But all of their ages have increased by a year (as if this follow-up survey were conducted 1 year after the initial survey), and several of their employment status and religion responses have changed. In this case, you would want to attach this new information to the original data so that you could examine, for example, why some respondents shift employment categories or religions.

A second variant that involves the same data combination process arises when you want to add new variables for existing cases that capture some other information about those cases. For example, perhaps you have a set of survey responses from adults who recently visited a hospital. You may want to bring in new variables that describe the particular hospital each case visited. Or, following the example we have been using, you may want to bring in information about the religious denomination with which each respondent affiliates. In this situation, your new data might look like those shown in Figure 1.11.


FIGURE 1.11 • DENOMINATION SPECIFIC DATA

In these data, you will notice that the information pertains to the particular religion, not the respondents. The variables therefore record things such as how many total Baptists there are, or whether Mormonism would be considered an evangelical denomination. Of course, in a real situation, you could have a great deal more information about each denomination that may be useful in analyzing your survey data. Notice that not every denomination present in your original data appears in these new data. This situation can occur with this type of combination and will not cause a problem for Stata.

Both of these situations are referred to as “merging” in Stata because you are bringing in new information about the existing cases. As you may have guessed, then, the command to complete the combination is –merge–. The key difference between the two types of merges is what exactly you are merging on, and understanding this difference is essential to completing the merge correctly. In the first merge example, you would be adding new information about the cases, which means you would merge on the ids variable. It is the ids variable that links the original data to the new data. The second situation, however, would require that you merge on the religoth variable because it is the link between the two data sets. You may have realized that doing the latter means that several cases in your combined data will have the exact same values for the new denomination-based variables. That is, every respondent who identifies as Baptist will receive the exact same values for the totalmembers and evangelical variables. This commonality is exactly what you are looking for when you incorporate this type of information.
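The first kind of merge, on ids, might be sketched as follows. This is only an illustration under stated assumptions: the three cases, their values, and the variable names age, employ, age2, and employ2 are invented for the sketch and are not the data shown in the figures. The follow-up variables are given different names here because, by default, Stata keeps the original (master) values when both data sets contain a variable with the same name.

```stata
* Hypothetical toy data; every value below is invented for illustration.
* Run this as a do-file so that the temporary file name persists.

* Follow-up (post-test) data, saved to a temporary file
clear
input ids age2 str12 employ2
1 31 "full-time"
2 46 "part-time"
3 28 "unemployed"
end
tempfile posttest
save `posttest'

* Original (pre-test) data, now in memory as the master data set
clear
input ids age str12 employ
1 30 "full-time"
2 45 "full-time"
3 27 "part-time"
end

* One-to-one merge: each ids value identifies exactly one row in each data set
merge 1:1 ids using `posttest'

* Show the combined data: original and follow-up variables side by side
list ids age age2 employ employ2
```

With real data, the name of a saved .dta file would take the place of the temporary file after using.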

Once you have identified the variable on which you will merge the two data sets (i.e., the variable that allows you to link the two data sets), the –merge– command is relatively straightforward. Again, following along with the Stata Help Files section of Chapter 8 will help you understand exactly how to complete this combination for your particular needs. It may also be helpful here to see what the final product looks like to have a better sense of exactly what the –merge– command does and whether it may be what you need. Figures 1.12 and 1.13 display the final data after completing a merge first with the post-test data shown in Figure 1.10 and then completing a different merge with the denomination data from Figure 1.11.


FIGURE 1.12 • NEW MERGED DATA WITH NEW OBSERVATIONS


FIGURE 1.13 • NEW MERGED DATA WITH DENOMINATION INFORMATION

In the first merge (Figure 1.12), you can see that the final data set still contains the original 10 cases, but now the information from their follow-up survey is connected to their original responses. Again, some information (i.e., gender) has remained constant, whereas other data have changed, presumably as the respondents' lives have changed.

In the second merge (Figure 1.13), the same original cases are present, but information that pertains to their response on the religoth variable is now included. Because the data set with information about each denomination did not include some of the particular denominations that the respondents reported, several cases now have missing information on these new variables. But the new information that is provided may be helpful in analyzing why belonging to specific denominations may be related to particular behaviors or trajectories.
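A merge on religoth, including a respondent whose denomination is absent from the denomination data, might be sketched as follows. Again, this is an illustration under stated assumptions: the four cases, the "Methodist" entry, and the totalmembers and evangelical values are invented placeholders, not the contents of Figure 1.11 and not real membership counts or classifications.

```stata
* Hypothetical toy data; every value below is an invented placeholder.
* Run this as a do-file so that the temporary file name persists.

* Denomination-level data: one row per denomination
clear
input str12 religoth totalmembers evangelical
"Baptist" 100 1
"Mormon"  200 0
end
tempfile denom
save `denom'

* Respondent-level data: several respondents can share one denomination
clear
input ids str12 religoth
1 "Baptist"
2 "Baptist"
3 "Mormon"
4 "Methodist"
end

* Many-to-one merge: many respondents link to one denomination row
merge m:1 religoth using `denom'

* _merge records, for each row, whether a match was found
tabulate _merge
list ids religoth totalmembers evangelical _merge
```

In this sketch, cases 1 and 2 receive identical denomination values, while case 4 has no matching row and is therefore left with missing values on totalmembers and evangelical. The _merge variable that Stata creates marks that case as found only in the original data.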
