Читать книгу A Gentle Introduction to Statistics Using SAS Studio - Ron Cody - Страница 11
ОглавлениеChapter 5: Importing Data into SAS
Importing Data from an Excel Workbook
Importing an Excel Workbook with Invalid SAS Variable Names
Importing an Excel Workbook That Does Not Have Column Headings
Importing Data from a CSV File
Shared Folders (Accessing Data from Anywhere on Your Hard Drive)
Introduction
Now that you have learned how to perform operations on built-in SASHELP data sets, it’s time to see how to import your own data into a SAS library.
SAS data sets contain two parts: the first part is called the data descriptor, also known as metadata. Metadata is a fancy word for data about your data. In the case of a SAS data set, this portion of the data set contains such information as the number of rows and columns in the table, the column names, the data type for each column (SAS has only two data types—character and numeric), and other information about when the data set was created.
The second part of a SAS data set contains the actual data values. If you tried to examine a SAS data set using another program such as Word or Notebook, it would show up as nonsense. Only SAS can read, write, and analyze data in a SAS data set. If you have data in Excel workbooks or text files, you need to convert that data into a SAS data set before you can use SAS to modify or analyze the data.
In this chapter, you will see how easy it is to import your own data from Excel workbooks, CSV files, and many other file formats such as Microsoft Access and SPSS, and create SAS data sets.
Exploring the Utilities Tab
Start by clicking the Tasks and Utilities tab in the navigation pane. It looks like this.
Figure 5.1: The Tasks and Utilities Tab in the Navigation Pane
When you click this tab, you see three separate tabs, one labeled My Tasks, another labeled Tasks, the last labeled Utilities. Expanding the Utilities tab displays three sub-tabs: Import Data, Query, and SAS Program. (See Figure 5.2 below.)
Figure 5.2: Expanding the Utilities Tab
The Import Data task is used to import data in a variety of formats and to create SAS data sets. A complete list of supported file types is shown in Figure 5.3.
Figure 5.3: List of Supported Files
As you can see in Figure 5.3, this import data utility can import data from many of the most common PC data formats. Because Excel workbooks and CSV files are so popular, let’s use them to demonstrate how SAS converts various file formats into SAS data sets.
Importing Data from an Excel Workbook
Your virtual machine is running a Linux operating system where naming conventions for files are different from the naming conventions used on Microsoft or Apple computers. Filenames in Linux are case sensitive, and folders and subfolders are separated by forward slashes. Filenames on Microsoft platforms are not case sensitive, and folders and subfolders are separated by backward slashes. To help resolve these file-naming conventions, you set up shared folders in your virtual machine that allow your SAS programs to read and write files to the hard drive on your computer.
There are slight differences in how you create shared folders, depending on whether you are running VirtualBox, VMware Workstation Player, or VMware Fusion. The easiest way to read and write data between your SAS Studio session and your hard drive is to place your data files in a specific location on your Windows hard drive—\SASUniversityEdition\myfolders. If you followed the installation directions for your choice of virtualization software, this location on your hard drive is mapped to a shared folder called /folders/myfolders in SAS Studio.
For most of the examples in this book, the location c:\SASUniversityEdition\myfolders is the folder where your data files and SAS data sets are located. All the programs and data files that you place in \SASUniversityEdition\myfolders will show up when you click the My Folders tab in the Navigation pane. Remember that this folder (or an equivalent folder on other operating systems) was created when you installed and configured SAS University Edition.
Let’s use the workbook Grades.xlsx (located in the folder c:\SASUniversityEdition\myfolders) for this demonstration.
If you go to the SAS author site (support.sas.com/cody) and scroll down to this book, you will see some choices listed, including one that reads, “Example Code and Data.” If you click on this link, you can download a ZIP file that contains some programs and data sets. Find the program Create_Datasets.sas and extract it. If you are using SAS Studio with SAS University Edition, a good place to put the files that you downloaded is in a folder called:
c:\SASUniversityEdition\Myfolders
It you do that, you can access the programs and data in the Server Files and Folders tab on the left side of the navigation screen.
Next, open up SAS Studio. In the option to edit the Autoexec.sas file (click on the icon to the left of the question mark (?) on the top line of SAS Studio and select “Edit Autoexec File”), add a line similar to the one below:
libname Stats ‘/folders/Myfolders’;
If you are using SAS Studio in another environment (such as the SAS Windowing Environment or SAS on Demand), you will be placing your files in different locations and modifying the LIBNAME statement shown above.
If you open this workbook in Excel, it looks like this.
Figure 5.4: Excel Workbook Grades.xlsx
The first row of the worksheet contains column names (also known as variable names). The remaining rows contain data on three students (yes, it was a very small class). The worksheet name was not changed so that it has the default name Sheet1.
The first step to import this data into a SAS data set is to double-click the Import Data task.
Figure 5.5: Double-Clicking the Import Data Task
You have two ways to select which file you want to import. One is to click the Select File button on the right side of the screen—the other method is to click the Server Files and Folders tab in the Navigation pane (on the left), find the file, and drag it to the Drag and Drop area. Depending on how you set up your SAS Studio session, you might find your files under Folder shortcuts then myfolder.
Using the first method and clicking Select File, brings up a window where you can select a file to import. Here it is:
Figure 5.6: Clicking on the Select File Button
Select the file that you want to import and click Open. This brings up the Import Window:
Figure 5.7: The Import Window
Use the mouse to enlarge the top half of the Import window or use the scroll bar on the right to reveal the entire window. The figure below shows the expanded view of the Import window:
Figure 5.8: Expanded View of the Import Window
The top part of the window shows information about the file that you want to import. You can enter a Worksheet Name (if there are multiple worksheets). But because you only have one worksheet, you do not have to enter a worksheet name.
The OPTIONS pulldown menu enables you to select a file type. However, if your file has the appropriate extension (for example, XLSX, XLS, or CSV), you can leave the default actions (based on the file extension) to decide how to import the data.
Because the first row of the spreadsheet contains column names, leave the check on the “Generate SAS variable names” option. This tells the import utility to use the first row of the worksheet to generate variable names.
You probably want to change the name of the output SAS data set. Clicking the Change button in the Output Data area of the screen brings up a list of SAS libraries (below).
Figure 5.9: Changing the Name of the SAS Data Set
The WORK library is used to create a temporary SAS data set (that disappears when you close your SAS Session). For now, let’s select the WORK library and name the data set Grades. Click the Save button at the bottom of the screen to complete the file selection process.
When all is ready, click the Run icon (Figure 5.10).
Figure 5.10: Click the Run Icon
You are done! Here is a section of the results.
Figure 5.11: Variable List for the Work.Grades SAS Data Set
Here you see a list of the variable names (note: you may have to scroll down through several pages to see this), whether they are stored as numeric or character, along with some other information that we don’t need at this time. Notice that the import utility correctly reads Name as character and the other variables as numeric.
Listing the SAS Data Set
A quick way to see a listing of the Grades data set is to select the Libraries tab in the navigation pane, open the WORK library, and double-click Grades. It looks like this:
Figure 5.12: Data Set Grades in the Work Library
You can use your mouse to scroll to the right to see the rest of the table. To create a better-looking report, click the Tasks and Utilities tab of the navigation pane and select Tasks, then Data, followed by List Data. (See Figure 5.13.)
Figure 5.13: The List Data Task
Double-click List Data and select the Grades data set in the WORK library in the Data selection box. Then click the Run icon. You will be presented with a nice-looking list of the Grades data set. (See Figure 5.14 below.)
Figure 5.14: List of the Grades Data Set
Importing an Excel Workbook with Invalid SAS Variable Names
What if your Excel worksheet has column headings that are not valid SAS variable names?
Valid SAS variable names are up to 32 characters long. The first character must be a letter or underscore—the remaining characters can be letters, digits, or underscores. You are free to use upper- or lowercase letters.
As an example of a worksheet with invalid variable names, look at the worksheet Grades2 shown in Figure 5.15.
Figure 5.15: List of Excel Workbook Grades2
Most of the column headings in this spreadsheet are not valid SAS variable names. Six of them contain a blank in the middle of the name, and the last column (2015Final) starts with a digit. What happens when you import this worksheet? Because you now know how to use the Import Data task, it is not necessary to describe the import task again. All you really need to see is the final list of variables in the data set. Here they are:
Figure 5.16: Variable Names in the Grades2 SAS Data Set
SAS replaced all the blanks with underscores and added an underscore as the first character in the 2015Final name to create valid SAS variable names. Note: the option to use column labels as column headings was deselected so that you could see the actual variable names.
Importing an Excel Workbook That Does Not Have Column Headings
What if the first row of your worksheet does not contain column headings (variable names)? You have two choices: First, you can edit the worksheet and insert a row with column headings (probably the best option). The other option is to deselect “Create Variable Names” in the OPTIONS section in the Import Window (see Figure 5.17) and let SAS create variable names for you.
Figure 5.17: Uncheck the Create Variable Names Option
Here is the result:
Figure 5.18: Variable Names Generated by SAS
SAS used the Excel column identifiers (A through F) as variable names. You can leave these variable names as they are or change them using DATA step programming. Another option is to use PROC DATASETS, a SAS procedure that enables you to alter various attributes of a SAS data set without having to create a new copy of the data set.
When you import a CSV file without variable names, you will see variable names VAR1, VAR2, and so on, that are generated by SAS.
Importing Data from a CSV File
Comma-separated values (CSV) files are a popular format for external data files. As the name implies, CSV files use commas as data delimiters. Many websites enable you to download data as CSV files. As with Excel workbooks, your CSV file may or may not contain variable names at the beginning of the file. If the file does contain variable names, make sure that the “Generate SAS Variable Names” options box is checked; if not, deselect this option.
For example, look at the CSV file called Grades.csv in Figure 5.19 below:
Figure 5.19: CSV File Grades.csv
This CSV file contains the same data as the Excel Workbook Grades.xlsx. Notice that variable names are included in the file. You can import this file and create a SAS data set using the same steps that you used to import the Excel workbook. The import facility will automatically use the correct code to import this data file because of the CSV file extension. The resulting SAS data set is identical to the one shown in Figure 5.14.
Shared Folders (Accessing Data from Anywhere on Your Hard Drive)
When you follow the instructions in setting up SAS Studio, a default folder referred to in SAS Studio as /folders/myfolders allows you to read data from the folder called \SASUniversityEdition\myfolders on your hard drive. If this is the only place where you plan to read data, you do not need to create any other shared folders in your virtual computer.
If you need to read data from other locations on your hard drive, please see the relevant sections in SAS documentation.
Conclusion
In this chapter, you saw how to import data from Excel workbooks and CSV files. Importing data from any of the other choices displayed in Figure 5.3 follows the same basic procedure. If you need to read data from text files, you will need to learn some basic SAS programming, especially how to use the INPUT statement, one of the most powerful and versatile components of the SAS system.