Using Stata for Quantitative Analysis, by Kyle C. Longest

Types of Variables in Data Files

At this point, you should feel comfortable with the basic structure of data files. Each row holds the information for one case, and each column is a different variable. With this knowledge, you are almost ready to start analyzing your data. There is, however, one important distinction among the types of variables that a data file can contain.

To help illustrate this difference, consider the NSYR variable gender in the Chapter 1 Data.dta file. This variable came from the following question asked of all respondents:

Are you

1 Male?

2 Female?

If you were entering the responses to this question into a Stata data set, you could record them in one of two ways. First, the actual answer “Male” or “Female” could be recorded for each case. Second, you could use a number to represent each answer. For example, you could choose to enter 0 for all respondents reporting “Male” and 1 for all respondents reporting “Female.”
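
As a minimal sketch, the two approaches can be compared by typing a few cases directly into Stata with `input`. The variable names `gender_str` and `gender_num`, the value-label name `genderlbl`, and the four cases are hypothetical and invented for illustration; they are not taken from the Chapter 1 Data.dta file, and the sketch does not assume how gender is actually coded there.

```stata
* Hypothetical example data: the same four answers stored two ways
clear
input str6 gender_str gender_num
"Male"   0
"Female" 1
"Female" 1
"Male"   0
end

* Attach value labels so the numeric codes still display as words
label define genderlbl 0 "Male" 1 "Female"
label values gender_num genderlbl

* The storage types differ: str6 for the string version, float for the numeric one
describe gender_str gender_num
```

The value label is optional, but it lets the numeric version display "Male" and "Female" while still storing numbers underneath.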

If you record the responses the first way, the result is what Stata calls a string variable: a variable whose contents are text (words or other characters) rather than numbers. String variables can be very useful for many purposes. For example, you can enter verbatim answers to questions directly into Stata, as was done for the variable religoth in the Chapter 1 Data.dta file.

The drawback of storing a variable such as gender as a string variable is that some statistical operations require numbers. For example, to calculate the mean (i.e., the mathematical average) of a variable, each category must be assigned a numeric value. For this reason, it is generally advisable, when possible, to use the second method and enter variables as numeric variables, in which each response is stored as an actual number.
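
The following sketch, again using four hypothetical cases and hypothetical variable names, shows the difference in practice. `summarize` lists a string variable but reports 0 observations and no mean for it, whereas the 0/1 numeric version has a mean. Because the values are only 0 and 1, that mean is simply the proportion of cases coded 1 (here, 2 of 4, or 0.5).

```stata
* Hypothetical data: gender stored as text and as 0 = Male, 1 = Female
clear
input str6 gender_str gender_num
"Male"   0
"Female" 1
"Female" 1
"Male"   0
end

* String version: listed with 0 observations and no statistics
summarize gender_str

* Numeric version: the mean of a 0/1 variable is the proportion coded 1
summarize gender_num
```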

Fortunately, many of the Stata commands that will be discussed in this book operate similarly with numeric and string variables. The commands that work only with numeric variables are those that perform statistical operations requiring numbers, for example, calculating a mean or fitting a linear regression. Because numeric variables are used in the vast majority of data analyses, this book focuses on applying its commands to numeric variables (keeping in mind that many operate identically with string variables). The primary commands that are specific to string variables, including methods for converting a string variable to a numeric variable, are addressed in the Data Management: Using String Variables section in Chapter 3.
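
As an illustration of a command that accepts either type, here is a sketch with four hypothetical cases: `tabulate` produces the same counts for the string and the numeric versions of the same answers (the rows may be ordered differently, alphabetically for the string and by numeric code for the numeric version). By contrast, an estimation command such as `mean gender_str` would stop with an error because the variable is a string. The variable and label names are hypothetical.

```stata
* Hypothetical data: the same answers stored as text and as labeled numbers
clear
input str6 gender_str gender_num
"Male"   0
"Female" 1
"Female" 1
"Male"   0
end
label define genderlbl 0 "Male" 1 "Female"
label values gender_num genderlbl

* tabulate works with either storage type and reports the same counts
tabulate gender_str
tabulate gender_num
```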

As discussed earlier, you will often be using data that you did not enter yourself, so you may not have had a choice in, or even know, how the variables were stored. There are several ways to determine whether a variable is a numeric or a string variable. The most straightforward way is to open the Data Browser window. In Stata 10 and later, string variables are shown in red, whereas numeric variables are shown in black (or in blue, if the variable has value labels). In the Chapter 1 Data.dta file, you will see that only the variable religoth is a string variable.
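
The same check can also be done from the Command window or a do-file. A minimal sketch using `ds` with the `has(type string)` option, which lists only the string variables in memory, is shown below on four hypothetical cases with hypothetical variable names.

```stata
* Hypothetical data with one string and one numeric variable
clear
input str6 gender_str gender_num
"Male"   0
"Female" 1
"Female" 1
"Male"   0
end

* List only the variables stored as strings (the list is saved in r(varlist))
ds, has(type string)
```

With the book's file loaded instead (for example, `use "Chapter 1 Data.dta", clear`, assuming the file is in the current working directory), the same `ds` command should list only religoth.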

Another way to see which variables are string variables is to click on a particular variable in the Variables window. The Properties window will then show an entry for Type. If the type starts with the letters “str” (for example, str10), the variable is stored as a string variable.
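
The “starts with str” rule can be checked in code as well. This sketch, using four hypothetical cases and hypothetical variable names, stores each variable's storage type in a local macro with the `: type` extended macro function and tests its first three letters; `confirm string variable` performs the same check and stops with an error if the variable is not a string. Run it as a single do-file so the local macros stay in scope.

```stata
* Hypothetical data with one string and one numeric variable
clear
input str6 gender_str gender_num
"Male"   0
"Female" 1
"Female" 1
"Male"   0
end

* Store each variable's storage type (as shown under Type in the Properties window)
local t1 : type gender_str
local t2 : type gender_num

* A type beginning with "str" marks a string variable (1 = yes, 0 = no)
display "gender_str: `t1'  string? " (substr("`t1'", 1, 3) == "str")
display "gender_num: `t2'  string? " (substr("`t2'", 1, 3) == "str")

* Stops with an error if gender_str were not stored as a string
confirm string variable gender_str
```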
