Exercises and Projects for The Little SAS Book, Sixth Edition - Lora D. Delwiche

Chapter 2 Accessing Your Data

Contents

Multiple Choice

Short Answer

Programming Exercises

Multiple Choice

1. What type of data files are not considered raw data?

a. ASCII files

b. CSV files

c. Text files

d. SQL tables

2. In the SAS windowing environment, the Viewtable window is useful for which of the following actions?

a. Combining existing SAS data sets

b. Entering data into a SAS data set

c. Exporting data to another file type

d. Importing data from another file type

3. Which of the following is a valid libref name that can be used to create a permanent SAS data set?

a. working

b. 365days

c. permanent

d. All of the above

4. Which DATA statement will create a permanent SAS data set called DOGS assuming that all SAS libraries have been properly defined?

a. DATA dogs.sas7bdat;

b. DATA dogs;

c. DATA sasdata.dogs;

d. None of the above

5. When using the following direct reference to create a permanent SAS data set in the Windows operating environment, what does the name dogs refer to?

DATA 'c:\MySASLib\dogs';

a. The drive

b. The directory

c. The filename

d. The libref

6. The descriptor portion of a SAS data set includes which of the following?

a. The SAS engine with which the data set was created

b. The date on which the data set was created

c. The number of variables and observations

d. All of the above

7. Which optional statement in PROC IMPORT tells SAS to expect a column that contains both character and numeric values?

a. GETNAMES = NO

b. GETNAMES = YES

c. MIXED = NO

d. MIXED = YES

8. Which PROC IMPORT option identifies the type of Microsoft Excel file to be read?

a. DELIMITER=

b. DBMS=

c. DATAROWS=

d. None of the above

9. If your raw data file contains only data and no variable names, which PROC IMPORT option should you use?

a. DELIMITER=

b. GUESSINGROWS=

c. GETNAMES=

d. OUT=
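This is a minimal sketch of reading a file that has no header row with PROC IMPORT. The temporary fileref raw and the data values are hypothetical. With GETNAMES=NO, PROC IMPORT treats the first row as data and assigns generic variable names (VAR1, VAR2, and so on).

```sas
* Write a small headerless CSV file to a temporary location (hypothetical data);
FILENAME raw TEMP;
DATA _NULL_;
   FILE raw;
   PUT '101,Ann,3.5';
   PUT '102,Raj,3.9';
RUN;

* GETNAMES=NO: the first row is data, so SAS names the columns VAR1-VAR3;
PROC IMPORT DATAFILE=raw OUT=work.noheader DBMS=CSV REPLACE;
   GETNAMES=NO;
RUN;

PROC PRINT DATA=work.noheader;
RUN;
```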

10. The data in the following program are an example of what type of data?

DATA readme;
   INPUT Place $ Code $;
   DATALINES;
AG 5678
SLO 1234
PB 3456
;
RUN;

a. Character

b. Instream

c. Internal raw data

d. All of the above

11. Which statement is synonymous with a DATALINES statement?

a. DATA

b. INFILE

c. CARDS

d. INPUT

12. Which SAS statement enables you to refer to an external raw data file?

a. DATALINES

b. DATA

c. INFILE

d. INPUT

13. Which of the following types of data cannot be read with list input?

a. Missing data indicated by a period

b. Date and time values

c. Standard numeric data

d. All of the above

14. Assuming that the raw data are arranged in neat columns, what is an advantage of column input?

a. It can read missing data indicated by spaces

b. It can read embedded blanks

c. It can read character data longer than eight characters

d. All of the above
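The sketch below illustrates column input on neatly aligned data. The park names, codes, and fees are made-up values. Because each value is read from fixed columns, a name with an embedded blank and a blank (missing) fee are both handled.

```sas
* Column input reads each value from fixed columns, so values with
  embedded blanks and blank (missing) fields are read correctly;
DATA parks;
   INPUT Name $ 1-15 Code $ 17-18 Fee 20-22;
   DATALINES;
Blue Lake       AB  15
Red Rock Park   CD 120
Oakwood         EF
;
RUN;
```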

15. With column input, you cannot do which of the following?

a. Read data separated by spaces

b. Specify an informat in the INPUT statement

c. Read numeric data in scientific notation

d. All of the above

16. Given this note in the SAS log, what could you add to fix the INPUT statement so that the ID variable would be read correctly including all digits and hyphens?

INPUT ID GPA Age;

NOTE: Invalid data for ID in line 1 1-9.
RULE:     ----+----1----+----2
1         5437-2212 3.84 21
ID=. GPA=3.84 Age=21 _ERROR_=1 _N_=1

a. A dollar sign

b. A column range

c. An informat

d. None of the above

17. Which of the following data values would not require an informat?

a. 44.5E2

b. $1,689

c. 08/18/1920

d. 4,928

18. Which informat would be appropriate to read the value 07/04/1776?

a. MMDDYY8.

b. MMDDYY10.

c. DATE8.

d. DATE10.

19. Which input style is the best for reading date values from raw data?

a. List

b. Column

c. Formatted

d. All input styles can read date values
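This is a minimal sketch of formatted input for dates and numbers that contain commas, using hypothetical values. The @n pointer controls move to a starting column, and each informat tells SAS how to interpret the characters it reads. The FORMAT statement only changes how the stored date is displayed.

```sas
* Formatted input: @n moves the column pointer, and the informat
  tells SAS how to read the value (hypothetical data);
DATA events;
   INPUT @1 EventDate MMDDYY10. @12 Amount COMMA6.;
   * Dates are stored as days since January 1, 1960, so add a format;
   FORMAT EventDate DATE9.;
   DATALINES;
03/15/2021 1,250
11/02/2022    980
;
RUN;
```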

20. Select the INPUT statement that would be appropriate for reading data values for the variables Name, Salary, and Age in the following raw data.

----+----1----+----2----+----3
Sally $64,350 41
Marian $55,500 38
Oprah $75,000,000 59

a. INPUT Name $ Salary & DOLLAR11. Age;

b. INPUT Name $ Salary :DOLLAR11. Age;

c. INPUT Name $ @10 Salary DOLLAR11. Age;

d. INPUT Name $ @'$' Salary Age;

21. Which of the following tells SAS to go to the next line when reading in data?

a. @

b. @@

c. /

d. +n
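As an illustration of a line pointer, here is a sketch in which each observation spans two lines of raw data. The names and phone numbers are hypothetical. The slash (/) moves the input pointer to column 1 of the next line within the same INPUT statement.

```sas
* Each observation spans two lines; the slash (/) moves the
  input pointer to column 1 of the next line (hypothetical data);
DATA contacts;
   INPUT Name $ 1-15 /
         Phone $ 1-12;
   DATALINES;
Ann Lee
555-123-4567
Raj Patel
555-987-6543
;
RUN;
```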

22. Which input style can be used with a double trailing @?

a. Column

b. List

c. Both can be used

d. Neither can be used

23. A record that is being held by a trailing @ will be released for which of the following reasons?

a. The current loop through the DATA step completes

b. SAS finds a subsequent INPUT statement with no line-hold specifier

c. Both of these

d. Neither of these

24. A record that is being held by a double trailing @ will be released for which of the following reasons?

a. The current loop through the DATA step completes

b. SAS finds a subsequent INPUT statement with no line-hold specifier

c. Both of these

d. Neither of these
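This is a minimal sketch of the double trailing @ with list input, using made-up student scores. The line is held across iterations of the DATA step, so one line of raw data can yield several observations.

```sas
* The double trailing @ holds the line across iterations of the
  DATA step, so several observations can be read from one line;
DATA scores;
   INPUT Student $ Score @@;
   DATALINES;
Ann 91 Raj 78 Lee 85
Kim 88 Sam 70
;
RUN;
```

These two lines of raw data produce five observations.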

25. Which of the following is a valid INFILE option that tells SAS to stop reading after the fifth line of raw data?

a. FIRSTOBS = 5

b. OBS = 5

c. TOTALOBS = 5

d. N = 5

26. Select the INFILE option that specifies that the raw data use a comma as the delimiter.

a. DLM = ','

b. DLM = ,

c. DLM = COMMA

d. DLM = COMMA.
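The sketch below combines several INFILE options on hypothetical comma-delimited data. Naming DATALINES as the file lets INFILE options apply to instream data. OBS= gives the number of the last line to read, not a count of lines after FIRSTOBS=. DSD treats consecutive commas as a missing value and allows a quoted value to contain a comma.

```sas
* FIRSTOBS=2 skips the header line and OBS=4 stops after line 4.
  DSD removes the quotation marks around "Smith, Jo" and treats
  the two consecutive commas on line 4 as a missing value;
DATA orders;
   INFILE DATALINES DLM=',' DSD FIRSTOBS=2 OBS=4;
   INPUT OrderID Customer :$20. Total;
   DATALINES;
OrderID,Customer,Total
1,"Smith, Jo",25.50
2,Lee,8
3,,12
4,Kim,30
;
RUN;
```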

Short Answer

27. Discuss the advantages of using LIBNAME statements versus direct referencing for creating permanent SAS data sets.
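To make the comparison concrete, here is a sketch of both styles side by side. It assumes a Windows folder c:\MySASLib that already exists, and the libref mylib and data set names are hypothetical. With a LIBNAME statement the path appears once and later steps use two-level names. With a direct reference the full path is repeated in quotation marks wherever the data set is used.

```sas
* LIBNAME approach: define a libref once, then use two-level names.
  The folder path is hypothetical and must already exist;
LIBNAME mylib 'c:\MySASLib';

DATA mylib.pets;
   INPUT Name $ Age;
   DATALINES;
Rex 4
Tia 7
;
RUN;

* Direct reference: the path itself replaces the libref. This
  creates c:\MySASLib\pets2.sas7bdat without a LIBNAME statement;
DATA 'c:\MySASLib\pets2';
   SET mylib.pets;
RUN;
```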

28. Suppose that you inherit a program that reads data from the raw data file called NationalParks.dat into a permanent SAS data set called NATIONALPARKS. Would this cause SAS to overwrite the original raw data file?

29. Explain the reasons that you might choose to use internal versus external raw data.

30. Explain the difference between using a LIBNAME statement versus using an INFILE statement.

31. List five examples of data values that cannot be read with list input.

32. Write an INPUT statement for the following raw data with variables named Year, City, Name1, and Name2.

----+----1----+----2----+----3----+----4
18 San Diego Rebecca Marian
19 San Francisco Kathy Ginger
20 Long Beach Scott Sally
21 Las Vegas Cynthia MaryAnne
22 San Jose Ethan Frank

33. In the preceding raw data, some of the values for the variable City are longer than eight characters. Explain why using a LENGTH statement with list input is not sufficient to read City correctly for these data.

34. Describe one advantage of using formatted input over column input.

35. Write an INPUT statement for the following raw data with variables named Brand, Qty, and Amount.

----+----1----+----2----+----3
Pampers 42 $44.99
Huggies 7 $34.99
Seventh Generation 7 $39.99
Nature Babycare 4 $41.99

36. Explain why it would be a good idea to use an informat when reading data using the & modifier.
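This is a minimal sketch of the & modifier paired with an informat, using hypothetical city names and populations. The & modifier reads a value until it reaches two consecutive blanks, so each name is followed by at least two blanks. The $18. informat sets the variable length so that longer names are not cut off at the default of eight characters.

```sas
* The ampersand (&) lets list input read values containing single
  embedded blanks; two or more blanks end the value. The informat
  $18. sets the length so longer names are not truncated;
DATA cities;
   INPUT City & $18. Population :COMMA9.;
   DATALINES;
Port Elm  12,400
North Bay Harbor  1,250,300
;
RUN;
```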

37. When reading raw data files, by default, the colon modifier cannot read character data with embedded blanks. Explain why and suggest a type of raw data file that would allow SAS to read embedded blanks using a colon modifier.

38. Examine the following raw data that contain the genus, species, and quantity of plants at a local nursery. Would a line pointer work to read this data file into SAS? Explain why or why not.

----+----1----+----2----+----3
Rosa
multiflora 49
canina 38
Narcissus
papyraceus 15
Dendrobium
kingianum 8
nobile 5
phalaenopsis 12

39. Examine the following raw data, which contain a patient ID and group designation (A, B, or C) with multiple observations per line. Write the SAS statements that will read the data into variables named ID and Group using a line-hold specifier, and then keep only those patients in groups A and C.

----+----1----+----2----+----3
4165 A 2255 B 3312 C 5689 C
1287 A 5454 A 6672 C 8521 B
8936 C 5764 B

40. Suppose that you have a raw data file from a national bank that contains millions of transactions from branches across the country. Reading in the entire data set takes too much processing time, and you are only interested in the records that correspond to your branch. Discuss how you can modify the following DATA step to decrease the processing time while reading this raw data file.

DATA transaction;
   INFILE 'c:\MyRawData\BankTrans.csv' DLM = ',';
   INPUT Branch_Name $ Branch_ID Trans_ID Account
         Date :MMDDYY8. Start_Time :TIME8.
         End_Time :TIME8. Amount Balance;
RUN;
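For comparison, here is a sketch of subsetting records while they are read, using a hypothetical branch number (17) and made-up transactions. The single trailing @ holds the line so that one field can be tested before the remaining fields are read. DELETE ends the iteration for records that do not match, which releases the held line without reading the rest of it.

```sas
* Read Branch_ID first and hold the line with a trailing @.
  Records from other branches are discarded before their
  remaining fields are read (branch 17 is hypothetical);
DATA mybranch;
   INFILE DATALINES DLM=',';
   INPUT Branch_ID @;
   IF Branch_ID NE 17 THEN DELETE;
   INPUT Trans_ID Amount;
   DATALINES;
17,1001,250.00
22,1002,75.10
17,1003,19.99
;
RUN;
```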

41. Explain the difference between the TRUNCOVER and MISSOVER options for the INFILE statement.
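To show the behavior, the sketch below reads the same short record with each option. It first writes a small temporary file, because instream data lines are padded with blanks and that padding can hide the difference. The names and scores are hypothetical. The second record ends in column 12, before the end of the column range 11-13 for Score.

```sas
* Write a file whose second record is shorter than the column
  range for Score (hypothetical data);
FILENAME raw TEMP;
DATA _NULL_;
   FILE raw;
   PUT @1 'Alice' @11 '100';
   PUT @1 'Bob' @11 '85';
RUN;

* MISSOVER: a value that runs past the end of the record is set to missing;
DATA miss;
   INFILE raw MISSOVER;
   INPUT Name $ 1-10 Score 11-13;
RUN;

* TRUNCOVER: the partial value at the end of the record is kept;
DATA trunc;
   INFILE raw TRUNCOVER;
   INPUT Name $ 1-10 Score 11-13;
RUN;
```

In this sketch, Bob's Score is missing in MISS and 85 in TRUNC.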

42. Suppose that you have a raw data file that contains data values with embedded commas and uses tabs as a delimiter. Explain why it would or would not be necessary to enclose the data values in quotes and use the DSD option.

43. Write an INFILE statement that tells SAS to read the raw data file c:\MyRawData\Records.csv, whose data values are separated by commas, and that allows for missing data at the end of a record.

Programming Exercises

44. Annual attendance for the top 10 amusement parks in North America is listed in the raw data file ParkAttendance.dat. For each park, the data include the ranking, park name, location, and four years of attendance.

a. Open the raw data file ParkAttendance.dat in a simple editor such as WordPad. In a comment in your program, state the number of variables and observations.

b. Use the IMPORT procedure to read the raw data file into SAS. View the log to verify that your data set has the same number of variables and observations as you stated in part a).

c. Print the data set.

45. The file CancerRates.dat contains data on the top 10 cancer sites in the United States from the Centers for Disease Control and Prevention (CDC) website. These statistics are combined across genders and races. The variables are ranking, cancer site, and incidence rate per 100,000 people.

a. Open the raw data file CancerRates.dat in a simple editor such as WordPad. In a comment in your program, state the number of variables and observations.

b. Read the raw data file into SAS. View the log to verify that your data set has the same number of variables and observations as you stated in part a).

c. Print the data set.

d. Copy the raw data file CancerRates.dat to a different location, such as your desktop or a flash drive, and read it into SAS a second time from that new location.

46. The American Kennel Club (AKC) reports rankings of dog breeds by year based on the number of registrations. These data are found in the raw data file AKCbreeds.dat. For each breed, the data include the name of the breed and its ranking for each of four years. Breeds with missing ranks were not recognized by the AKC during that year.

a. Open the raw data file AKCbreeds.dat in a simple editor such as WordPad. In a comment in your program, state the number of variables and observations.

b. Read the raw data file into SAS. View the log to verify that your data set has the same number of variables and observations as you stated in part a).

c. Print the data set.

47. The World Health Organization (WHO) monitors vaccine recommendations in countries around the world. The raw data file Vaccines.dat contains the recommended vaccines for a sample of 13 countries. The variables in this file are vaccine name, mode of disease transmission, worldwide incidence, worldwide deaths, and recommendations (stored in 13 individual columns for the respective countries of Chile, Cuba, United States, United Kingdom, Finland, Germany, Saudi Arabia, Ethiopia, Botswana, India, Australia, China, and Japan).

a. Open the raw data file Vaccines.dat in a simple editor such as WordPad. In a comment in your program, state the number of variables and observations.

b. Read the raw data file into SAS. View the log to verify that your data set has the same number of variables and observations as you stated in part a).

c. Print the data set.

48. Each year, Forbes magazine publishes a list of the world’s 100 biggest companies. Each company receives a score using four metrics: sales, profits, assets, and market value. The final overall ranking is based on a composite score of these metrics. The variables in the raw data file BigCompanies.dat are ranking, company name, country, sales (billions), profits (billions), assets (billions), and market value (billions).

a. Open the raw data file BigCompanies.dat in a simple editor such as WordPad. In a comment in your program, state which variables must be read in as character and which variables should be read in as numeric.

b. Read the raw data file into SAS.

c. Print the data set.

49. Crayola crayons were introduced in 1903, and since then numerous standard colors have been released. Each crayon has a unique name, which corresponds to a hexadecimal code and RGB triplet. The raw data file Crayons.dat contains information on these standard crayon colors with variables corresponding to crayon number, color name, hexadecimal code, RGB triplet, pack size, year issued, and year retired.

a. Open the raw data file Crayons.dat in a simple editor such as WordPad. In a comment in your program, state which variables must be read in as character and which variables should be read in as numeric.

b. Read the raw data file into a permanent SAS data set.

c. Print the data set.

50. The tallest mountains in the world are located in central and southern Asia. The raw data file Mountains.dat contains information on mountains over 7,200 meters (23,622 ft). Researchers measure the prominence of a mountain as the height above the highest saddle connecting it to a higher summit. The variables in this file are mountain name, height (m), height (ft), year of first ascent, and prominence (m).

a. Open the raw data file Mountains.dat in a simple editor such as WordPad. In a comment in your program, state which variables must be read in as character and which variables should be read in as numeric.

b. Read the raw data file into SAS.

c. Print the data set.

51. Information Technology Services (ITS) at Central State University has a computing service called “the Grid,” which is offered to faculty, staff, and students. This supercomputer is a cluster of 10 computers that, if programmed correctly in a grid environment, can process jobs much faster by distributing the work across the 10 machines. University users who would like to use the Grid computing environment must register with ITS. The raw data file CompUsers.dat contains the variables user ID, classification group (faculty, staff, or student), first name, last name, email address, campus phone number, and department.

a. Examine the raw data file CompUsers.dat and read it into SAS.

b. Print the data set.

c. Write another DATA step to read the raw data file and remove the student records. Do this as efficiently as possible by testing the classification group as it is being read in with the INPUT statement.

d. Print the data set.

52. The World Health Organization (WHO) collected data in countries across the world regarding the outbreak of swine flu cases and deaths in 2009. The data in the file SwineFlu2009.dat include counts per country by month during the epidemic. There are many variables in the raw data file with the following descriptions:

By date, ID for sorting by first case date

By continent, ID (X.YY) for sorting by first case date within a continent, where X identifies the continent and YY indicates that the country was the YYth in that continent to report a first case

Country

Date of first case reported

Number of cumulative cases reported on the first day of the month for April, May, June, July, and August (across the columns, respectively)

Last reported cumulative number of cases reported to WHO as of August 9, 2009

By date, ID for sorting by first death date

By continent, ID (X.YY) for sorting by first death date within a continent, where X identifies the continent and YY indicates that the country was the YYth in that continent to report a first death

Date of first death

Number of cumulative deaths reported on the first day of the month for May, June, July, August, September, October, November, and December (across the columns, respectively)

a. Examine the raw data file SwineFlu2009.dat and read it into SAS.

b. Print a report that describes the contents of the data set including attributes of the variables.

53. The data in the file BenAndJerrys.dat represent various ice cream flavors and their nutritional information. The variables in the raw data file are flavor name, portion size (g), calories, calories from fat, fat (g), saturated fat (g), trans fat (g), cholesterol (mg), sodium (mg), total carbohydrate (g), dietary fiber (g), sugars (g), protein (g), year introduced, year retired, content description, and notes.

a. Examine the raw data file BenAndJerrys.dat and read it into SAS using a DATA step.

b. Read the raw data file using PROC IMPORT.

c. Create reports that describe the contents for each data set.

d. Note any differences between the two data sets as a comment in your program.

54. Data on previous winners of the Oscars are stored in a Microsoft Excel file named Oscars.xlsx. The variables in this file are ID, year, host, best picture, best actor, best actress, best director, and best screenplay.

a. Examine the Microsoft Excel file Oscars.xlsx and read it into a permanent SAS data set using the IMPORT procedure.

b. Print a report that describes the contents of the data set including the attributes of the variables and data set.

c. In a comment in your program, discuss any limitations of the functionality of the resulting data set.

d. Print the Oscars.xlsx data file using the XLSX LIBNAME engine. In a comment in your program, discuss any limitations of using this method to read in the data.
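As a self-contained illustration of the XLSX LIBNAME engine, the sketch below avoids relying on a real workbook. It first writes SASHELP.CLASS to a hypothetical workbook, demo.xlsx, in the WORK folder, and then reads the sheet back. It assumes SAS/ACCESS Interface to PC Files is licensed. Each worksheet is referenced like a SAS data set through the libref.

```sas
* Requires SAS/ACCESS Interface to PC Files. For a self-contained
  demo, write a workbook from SASHELP.CLASS into the WORK folder;
LIBNAME xl XLSX "%SYSFUNC(PATHNAME(WORK))/demo.xlsx";
DATA xl.class;
   SET sashelp.class;
RUN;
LIBNAME xl CLEAR;

* Reassign the libref and treat the worksheet like a SAS data set;
LIBNAME xl XLSX "%SYSFUNC(PATHNAME(WORK))/demo.xlsx";
PROC PRINT DATA=xl.class;
RUN;

* Copy the worksheet into an ordinary SAS data set;
DATA work.class_copy;
   SET xl.class;
RUN;
LIBNAME xl CLEAR;
```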

55. Researchers randomly assigned subjects either to a treatment group taking a cholesterol-lowering medication daily or to a control group taking a placebo daily. The difference in total cholesterol was measured after four months. The variables in the Tchol.dat file are subject ID, treatment group, difference in cholesterol, pre-treatment total cholesterol, and post-treatment total cholesterol.

a. Examine the raw data file Tchol.dat and read it into SAS.

b. Print the data set.

c. Create a new DATA step and read in the data for only the subjects assigned to the treatment group. Do this as efficiently as possible by testing the treatment group variable as it is being read in with the INPUT statement.

d. Print the data set.

56. A gourmet pizza restaurant is considering adding new toppings to its menu. Each month they survey 10 customers about their preferences for three different toppings. They want data on several different toppings, so they don’t always ask about the same three toppings. Customers rate each topping on a scale of 1 (would never order) to 5 (would order often). The restaurant wants to compute average ratings for all toppings, so the ratings variables need to be numeric. The raw data file Pizza.csv has variables for the respondent’s survey number, and the ratings for five different toppings: arugula, pine nuts, roasted butternut squash, shrimp, and grilled eggplant. The first two digits in the survey number correspond to the month of the survey.

a. Examine the raw data file Pizza.csv and read it into SAS using the IMPORT procedure.

b. Print the data set.

c. Print a report that describes the contents of the data set to make sure all the variables are the correct type.

d. Open the raw data file in a simple editor like WordPad and compare the data values to the output from parts b) and c) to make sure that they were read correctly into SAS. In a comment in your program, identify any problems with the SAS data set that cannot be resolved using the IMPORT procedure.

e. Read the same raw data file, Pizza.csv, this time using a DATA step. Be sure to resolve any issues identified in part d).

f. Print the data set.

57. The Microsoft Excel file named CarTalk.xlsx contains information regarding episodes of the automotive repair radio talk show Car Talk. Variables in this file include episode number, air date, title, and a description of the show.

a. Examine the Microsoft Excel file CarTalk.xlsx by printing the Excel spreadsheet using the XLSX LIBNAME engine.

b. Read the Microsoft Excel file CarTalk.xlsx into a SAS data set using the XLSX LIBNAME engine.

c. Read the Microsoft Excel file into a SAS data set using PROC IMPORT.

d. Print the two SAS data sets.

e. Read the rows of the Excel file that correspond to the month of May into SAS using the IMPORT procedure. Print the data set.
