Exercises and Projects for The Little SAS Book, Sixth Edition, by Lora D. Delwiche
Chapter 3 Working with Your Data
Multiple Choice
1. Which DATA step will not overwrite a temporary SAS data set called TOYS?
a. DATA WORK.toys; SET WORK.toys; RUN;
b. DATA 'c:\MySASLib\toys'; SET 'c:\MySASLib\toys'; RUN;
c. DATA toys; SET toys; RUN;
d. None of the above
2. Which SAS statement can be used to read a SAS data set?
a. SET
b. INFILE
c. INPUT
d. All of the above
3. Which of the following assignment statements is valid for the numeric variable Score?
a. Score / 100;
b. Score = Score / 100;
c. Score = 'Score' / 100;
d. Score = 'Score / 100';
4. Given the following raw data and program, what will be the value of Total1 for the second observation in the resulting SAS data set?
----+----1----+----2
1 160 50 20
2 150 55 .
3 120 40 30
4 140 50 25
DATA cholesterol;
INFILE 'c:\MyRawData\Patients.dat';
INPUT ID Ldl Hdl Vldl;
Total1 = Ldl + Hdl + Vldl;
RUN;
a. 230
b. 205
c. .
d. 215
5. Given the following raw data and program, what will be the value of Total2 for the second observation in the resulting SAS data set?
----+----1----+----2
1 160 50 20
2 150 55 .
3 120 40 30
4 140 50 25
DATA cholesterol;
INFILE 'c:\MyRawData\Patients.dat';
INPUT ID Ldl Hdl Vldl;
Total2 = SUM(Ldl,Hdl,Vldl);
RUN;
a. 230
b. 205
c. .
d. 215
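One way to check your answers to questions 4 and 5 is to run both calculations side by side. The following is only a sketch: it assumes the four raw data lines can be read in-stream with DATALINES rather than from the Patients.dat file, and it computes Total1 and Total2 in a single DATA step for comparison.

```sas
* Sketch for checking questions 4 and 5. Assumption: the data are read;
* in-stream with DATALINES instead of from c:\MyRawData\Patients.dat;
DATA cholesterol;
  INPUT ID Ldl Hdl Vldl;
  Total1 = Ldl + Hdl + Vldl;      * arithmetic operator;
  Total2 = SUM(Ldl, Hdl, Vldl);   * SUM function;
  DATALINES;
1 160 50 20
2 150 55 .
3 120 40 30
4 140 50 25
;
RUN;

* Print the data set and compare the two totals for each observation;
PROC PRINT DATA = cholesterol;
RUN;
```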
6. Which function can be used to replace text?
a. TRIM
b. INDEX
c. TRANWRD
d. PROPCASE
7. Which of the following is a valid function for finding the average of X1, X2, and X3?
a. AVERAGE(X1,X2,X3)
b. AVG(X1,X2,X3)
c. MEAN(X1,X2,X3)
d. MU(X1,X2,X3)
8. What will SAS return for the value of X?
X = MIN(SUM(1,2,3),56/8,N(8));
a. 1
b. 6
c. 8
d. .
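To check your answer to question 8, you could evaluate the expression in a DATA _NULL_ step and write the result to the SAS log. This is a minimal sketch; it creates no data set.

```sas
* Evaluate the nested function call and write the value of X to the log;
DATA _NULL_;
  X = MIN(SUM(1,2,3), 56/8, N(8));
  PUT X=;
RUN;
```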
9. Which of the following IF-THEN statements will not assign a value of 1 to the variable named Flag for patients with an eye color of blue or brown?
a. IF EyeColor = 'blue' OR 'brown' THEN Flag = 1;
b. IF EyeColor = 'blue' OR EyeColor = 'brown' THEN Flag = 1;
c. IF EyeColor IN ('blue','brown') THEN Flag = 1;
d. All of the above will work
10. Which set of IF-THEN/ELSE statements will run without errors?
a. IF 0 <= Age <= 50 THEN Group = 'A';
ELSE 50 < Age <=70 THEN Group = 'B';
ELSE Age > 70 THEN Group = 'C';
b. IF 0 <= Age <= 50 THEN Group = 'A';
ELSE IF 50 < Age <= 70 THEN Group = 'B';
ELSE Age > 70 THEN Group = 'C';
c. IF 0 <= Age <= 50 THEN Group = 'A';
ELSE IF 50 < Age <= 70 THEN Group = 'B';
ELSE IF Age > 70 THEN Group = 'C';
d. All of the above will work
11. Given the following raw data and program, how many observations will be in the resulting SAS data set?
----+----1----+----2
41 25 male
32 79 female
36 52 female
74 63 male
DATA pts;
INFILE 'c:\MyRawData\Measures.dat';
INPUT ID Age Gender $;
IF Age < 75;
IF Age < 50 AND Gender = 'female' THEN
Guideline = 'Inv4a';
ELSE IF Age >= 50 AND Gender = 'female' THEN
Guideline = 'Inv4b';
ELSE Guideline = 'n/a';
RUN;
a. 1
b. 2
c. 3
d. 4
12. How many clauses are in the following SQL step?
PROC SQL;
SELECT Name, Address, Phone, Email
FROM contacts;
QUIT;
a. 1
b. 2
c. 3
d. 4
13. When creating a table using PROC SQL, which of the following clauses would select only rows that have a value greater than 10 for the column called Age?
a. WHERE Age > 10
b. IF Age > 10
c. SELECT Age > 10
d. None of the above
14. Given the following program and SAS data set ANIMALS, what will be the value of the variable DogYears for the second observation in the resulting SAS data set called DOGS?
ANIMALS
Name   | Type   | Breed             | Age
Mina   | Canine | German Shepherd   | 5
Bailey | Feline | Norwegian Forest  | 9
Sammy  | Canine | Shetland Sheepdog | 10
Taco   | Canine | Terrier           | 14
DATA dogs;
SET animals;
DogYears = Age * 7;
IF Type = 'Canine' THEN OUTPUT;
RUN;
a. .
b. 35
c. 63
d. 70
15. How many observations will be produced with the following program?
DATA new;
DO p = 1 TO 5;
OUTPUT;
END;
RUN;
a. 0
b. 1
c. 5
d. 6
16. Suppose that the YEARCUTOFF= option is set to 1950, and your raw data file has the following date that is read using the MMDDYY8. informat and then printed using the MMDDYY10. format. How would the resulting date appear in the output?
----+----1----+----2
01/01/1920
a. 01/01/1920
b. 01/01/1919
c. 01/01/2019
d. 01/01/2020
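To check your answer to question 16, you could read the date in-stream and write the formatted value to the log. This is a sketch under assumptions: the variable name RawDate is hypothetical, the raw data line is supplied with DATALINES rather than a raw data file, and the OPTIONS statement changes YEARCUTOFF= for the rest of the SAS session.

```sas
* Set the cutoff year described in the question;
OPTIONS YEARCUTOFF = 1950;

* RawDate is a hypothetical variable name used only for this check;
DATA _NULL_;
  INPUT RawDate MMDDYY8.;      * informat from the question;
  PUT RawDate= MMDDYY10.;      * format from the question;
  DATALINES;
01/01/1920
;
RUN;
```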
17. What is the SAS date value that corresponds to December 25, 1959?
a. -25
b. -7
c. 25
d. 359
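To check your answer to question 17, you could build the date with the MDY function and write it to the log both as a raw SAS date value and with a date format. This is a minimal sketch, and the variable name DateValue is hypothetical.

```sas
* DateValue is a hypothetical variable name used only for this check;
DATA _NULL_;
  DateValue = MDY(12, 25, 1959);
  PUT DateValue=;            * unformatted SAS date value;
  PUT DateValue= DATE9.;     * same value shown with a date format;
RUN;
```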
18. What will be the value of Quarter in the following statement?
Quarter = QTR(MDY(04,05,2063));
a. 1
b. 2
c. 3
d. 4
19. Which type of DATA step statement can be used to initialize a variable to a specified value?
a. sum
b. RETAIN
c. Both of the above
d. Neither of the above
20. Which of the following is considered a sum statement in the DATA step?
a. X = A + B;
b. X = SUM(A,B);
c. A + B;
d. All of the above
21. The raw data file called Class.dat contains three test scores for each of two students in a class. If you submit the following SAS program, what will be the value of the variable represented by p(i) for the first observation after the second time through the iterative DO group?
----+----1----+----2----+----3
222 Jimmy 95 85 75
333 Ulric 90 80 70
DATA score;
INFILE 'c:\MyRawData\Class.dat';
INPUT ID Name $ Test1 Test2 Test3;
ARRAY t(3) Test1 - Test3;
ARRAY p(3) Prop1 - Prop3;
DO i = 1 TO 3;
p(i) = t(i) / 100;
END;
Total = SUM(Test1 - Test3);
RUN;
a. 0.85
b. 0.80
c. 0.75
d. 0.70
22. Referring to the preceding raw data and SAS program, what will be the value of Total for the second observation?
a. 255
b. 240
c. 160
d. 20
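To check your answers to questions 21 and 22, you could trace the program in the SAS log. The following is only a sketch: it assumes the two raw data lines are read in-stream with DATALINES instead of from Class.dat, and it adds PUT statements (not part of the original program) that write the loop counter and the computed variables on every pass.

```sas
* Sketch for tracing questions 21 and 22. Assumption: the data are read;
* in-stream, and the PUT statements are added only for tracing;
DATA score;
  INPUT ID Name $ Test1 Test2 Test3;
  ARRAY t(3) Test1 - Test3;
  ARRAY p(3) Prop1 - Prop3;
  DO i = 1 TO 3;
    p(i) = t(i) / 100;
    PUT _N_= i= Prop1= Prop2= Prop3=;   * state after each pass of the loop;
  END;
  Total = SUM(Test1 - Test3);
  PUT _N_= Total=;                      * value of Total for each observation;
  DATALINES;
222 Jimmy 95 85 75
333 Ulric 90 80 70
;
RUN;
```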
Short Answer
23. Discuss a situation where it would not be a good idea to overwrite a permanent SAS data set by specifying the same name in the DATA and SET statements.
24. Describe why you would not use a SET statement and an INFILE statement to refer to the same data in a DATA step.
25. Explain why the following assignment statement is incorrect for creating a numeric variable X that has a missing value.
X = '.';
26. Is there a difference between calculating the mean of three variables using a function compared to calculating the mean using an assignment statement as shown in the following code? Explain your answer.
Avg1 = MEAN(X1,X2,X3);
Avg2 = (X1 + X2 + X3) / 3;
27. Would there be any advantage to using the UPCASE, LOWCASE, or PROPCASE functions when working with messy character data? Explain your answer.
28. An elementary school is holding a public fun run for children and adults as a fundraiser. Runners will start at different times based on age, and must be at least four years old. The following code classifies runners into three groups. Rewrite the code so that once a runner is assigned to a group, SAS will skip the rest of the statements. In addition, make sure that anyone who does not fit into one of the age groups or has a missing value for age is assigned to a fourth group of entrants who require follow-up.
** Assign runners to groups 1-3 based on age;
IF 4 <= Age < 9 THEN Group = 1;
IF 9 <= Age < 13 THEN Group = 2;
IF Age >= 13 THEN Group = 3;
29. The following portion of code was used to classify patients into stroke risk groups based on their smoking status and blood pressure measurements. Rewrite the code so that it is less repetitive and will keep SAS from checking every condition for every observation. In addition, make sure that patients who fall into more than one group, based on their systolic blood pressure and diastolic blood pressure, will be placed in the group with the highest risk. Add code that will create an unknown risk group for patients with any data that do not fall into the specified ranges.
** for smokers;
IF Smoke > 0 AND (0 < Sbp < 120 AND 0 < Dbp < 80)
THEN Risk = 'Medium';
IF Smoke > 0 AND (120 <= Sbp < 140 OR
80 <= Dbp < 90)
THEN Risk = 'High';
IF Smoke > 0 AND (Sbp >= 140 OR Dbp >= 90)
THEN Risk = 'Severe';
** for non-smokers;
IF Smoke = 0 AND (0 < Sbp < 120 AND 0 < Dbp < 80)
THEN Risk = 'Low';
IF Smoke = 0 AND (120 <= Sbp < 140 OR
80 <= Dbp < 90)
THEN Risk = 'Medium';
IF Smoke = 0 AND (Sbp >= 140 OR Dbp >= 90)
THEN Risk = 'High';
30. Suppose that you have an extremely large data set that contains banking transaction records for branches across the United States, with the majority of records coming from the northeastern states. Your task is to group the records into regions based on the state where the transaction occurred. Discuss how you can accomplish this grouping as efficiently as possible.
31. Describe one potential pitfall of using an ELSE statement instead of an ELSE IF statement.
32. How could the following code be rewritten so that it is more efficient? Explain why this might be important with a very large data set and then rewrite the DATA step.
DATA mtn;
INFILE 'c:\MyRawData\UnitedStates.dat';
INPUT State $ Pop2000 Pop2010;
IF State IN ('Arizona','Colorado','Idaho',
'Montana','Nevada',
'New Mexico','Utah','Wyoming')
THEN Region = 'Mountain';
PopDiff = Pop2010 - Pop2000;
IF Region = 'Mountain';
RUN;
33. In the following code, the observations will have a value of missing for the variable HeightCM. Explain why, and how to fix this problem.
DATA femaleheight;
SET height;
IF Gender = 'Female' THEN OUTPUT;
HeightCM = HeightIN * 2.54;
RUN;
34. In the following code, the data for the 7th, 14th, 21st, and 28th observations in the DETAIL data set are very similar to the resulting observations in the SUMMARY data set. Explain any differences for these observations between the two data sets and why these differences occur.
DATA summary detail;
DO Weeks = 1 TO 4;
DO Days = 1 TO 7;
Count + 1;
OUTPUT detail;
END;
OUTPUT summary;
END;
RUN;
35. Explain the difference between a DO statement and an iterative DO statement.
36. In the following code, you were expecting the data set TALLY to have 10 observations, but instead no observations were written. Explain why, and how to fix this problem.
DATA tally;
Sum = 0;
DO WHILE (Sum >= 10);
Sum + 1;
OUTPUT;
END;
RUN;
37. Explain why it is useful to use a date format when working with SAS dates.