This chapter will introduce the basics of the propensity score and focus on the process for estimating the propensity score using real world data. It is organized as follows. First, we will introduce the theoretical properties of the propensity score. Second, we will discuss best practice guidance for estimating the propensity score and provide associated SAS code. This guidance includes the selection of an appropriate statistical model for propensity score estimation, the covariates included in the estimation model, the methods to address missing covariate values, and the assessment of quality of the estimated propensity score. Based on the guidance, propensity score will be estimated for the simulated REFLECTIONS data (described in Chapter 3). The estimated propensity scores will be further used to adjust for confounding in analyzing simulated REFLECTIONS data via matching (Chapter 6), stratification (Chapter 7) and weighting (Chapter 8). Those chapters focus on the scenario of comparing two interventions and we leave the discussion on comparing multiple (>2) interventions using propensity scores to Chapter 10. For simplicity, the term “treatment” refers to the intervention whose causal effect is of research interest and the term “control” indicates the intervention that is compared to the treatment. Note also throughout this book, the terms “treatment” and “cohort” and “interventions” are used interchangeably to denote general groups of patients identified by their treatment selection or other patient factors.

In Chapter 2, we discussed the concept of using randomized experiments to assess causal treatment effects and the difficulties in estimating such effects without randomization. The existence of confounders can bias the causal treatment effect estimates in observational studies. Thus, to analyze observational data for causal treatment effects, the most important methodological challenge is to control bias due to lack of randomization. Cochran (1972) summarized three basic methods – matching, standardization and covariance adjustments via modeling – that attempt to reduce the bias due to confounders (which he termed as “extraneous variables”) in non-randomized settings and these methods set the stage for developing bias control methods in observational studies. Over the past decades, new methods have been proposed to deal with the rising challenge of analyzing more complex observational data, and the propensity score has been the foundation for many of these approaches.

In 1983, Rubin and Rosenbaum proposed the use of the propensity score in analyzing observational data to obtain unbiased causal treatment effect estimates. Since then, bias control methods based on propensity score have become widely accepted. They have been used in many research fields such as economics, epidemiology, health care and the social sciences. To define the propensity score, we introduce the following notation: let represent confounders that are measured prior to intervention initiation (referred as “baseline confounders” below), then is a vector of the value of the confounders for the th subject. Let represent the available interventions, with indicating the subject is in the treated group and meaning the subject in the control group. For the th subject, the propensity score is the conditional probability of being in the treated group given their measured baseline confounders,

Intuitively, conditioning on the propensity score, each subject has the same chance of receiving treatment. Thus, propensity score is a tool to mimic randomization when randomization is not available.

Like other statistical methods, the validity of the propensity scoring methods is not without assumptions. For causal inference using the propensity score, the following assumptions are necessary:

● Stable Unit Treatment Value Assumption (SUTVA): the potential outcomes (see Chapter 2) for any subject do not vary with the intervention assigned to other subjects, and, for each subject, there are no different forms or versions of each intervention level, which lead to different potential outcomes.

● Positivity: the probability of assignment to either intervention for each subject is strictly between 0 and 1.

● Unconfoundedness: the assignment to treatment for each subject is independent of the p otential outcomes, given a set of pre-intervention covariates.

If these assumptions hold, then the propensity score is a balancing score, which means the treatment assignment is independent of the potential outcome, given the propensity score. Conditioning on the propensity score, the distributions of measured baseline confounders are similar between treatment and control groups. However, unless in a randomized clinical trial, the true propensity score of a subject is unknown. Thus, if the researchers plan to use propensity score to control for bias when estimating causal treatment effects, proper estimation of propensity score is critical. For the remainder of this chapter, we will discuss the key considerations in estimating propensity scores, along with SAS code for implementation.

4.2 Estimate Propensity Score

In this section, we discuss four important issues in estimating propensity scores, (1) selection of covariates in the estimation model; (2) addressing missing covariates values; (3) selection of an appropriate modeling approach; (4) assessment of quality of the estimated propensity score. Keep in mind that the purpose of using a propensity score in observational studies is to create the balance in distributions of the baseline confounders between interventions, so that estimating the causal treatment effect can be similar to randomized clinical trials. A “good” propensity score estimate should always induce balance in baseline confounders between treatment and control groups. In section 4.2.4, we will discuss the standard of “good” propensity scores in a more formal way by introducing statistical approaches for assessing the quality of propensity scores.

4.2.1 Selection of Covariates

As the true propensity score of each subject is unknown in any observational study, in practice, models are always used to estimate the propensity score. The selection of which covariates to include in the estimation models is an important step. First, the unconfoundedness assumption requires all baseline confounders are identified and included appropriately. Thus, failure to include a confounder in the estimation model will most likely result in a biased estimate of the causal treatment effect. However, blindly including every possible covariate into the model might also not be a good strategy. If certain type of covariates, for instance, “colliders” (Pearl, 2000), are included, it might exacerbate the bias of the treatment effect estimate, which is contrary to the purpose of using propensity score. Ding et al. (2017) also found that instrumental variables should not be included in the propensity score estimation as including instrumental variables can increase the bias in estimating the causal treatment effect rather than reduce it. Rubin (2001) suggested that if a covariate is neither associated with the treatment selection nor the outcome, then it should not be included in the models for propensity score estimation. Notice that the candidate covariates must be measured prior to intervention initiation to ensure they were not influenced by the interventions.

In general, there are three sets of covariates that we can consider for inclusion in the estimation model:

a. Covariates that are predictive of treatment assignment

b. Covariates that are associated with the outcome variable

c. Covariates that are predictive of both treatment assignment and the outcome

Given that only variables in category c are true confounders, it might be assumed that we should follow option c for selecting variables for the estimation model. Brookhart et al. (2006) conducted simulation studies to evaluate which variables to include and their results suggested c is the “optimal” one among the three choices. However, in a letter responding to their publication (Shrier et al. 2007), the authors argue that including covariates in categories b and c has advantages. For instance, if a variable is not really a true confounder (but is strongly associated with outcome, for example, is in category b), the random imbalance seen in that variable will result in bias that could have been addressed by including the variable in the propensity score. In real data analysis, identifying which covariate belongs to category b or c can be difficult, unless researchers were accompanied with some prior knowledge on the relationship between covariates and interventions as well as between covariates and outcomes.

Directed acyclic graphs (DAGs), introduced in Chapter 2, can be a useful tool to guide the selection of covariates because a causal diagram is able to identify covariates that are prognostically important or that confound the treatment-outcome relationship (Austin and Stuart, 2015). A DAG is a graph whose nodes (vertices) are random variables with directed edges (arrows) and no directed cycles. The nodes in a DAG correspond to random variables and the edges represent the relationships between random variables, and an arrow from node A to node B can be interpreted as a direct causal effect of A on B (relative to other variables on the graph). DAGs help identify covariates one should adjust for (for example, in categories b or c above) and covariates that should NOT be included (for example, collider, covariates on causal pathway).

Figure 4.1 is a DAG created based on the simulated REFLECTIONS data analyses that will be conducted in Chapters 6 through 10. In these analyses, the interest was in estimating the causal effect of initiating opioid treatment (relative to initiating other treatments) on the change or the endpoint score in Brief Pain Inventory (BPI) pain severity scores from point of treatment initiation to one year following initiation. The covariates here were grouped into those that influence treatment selection only (insurance, region, income) and those that are confounders (influence treatment selection and outcome). Based on this DAG, the propensity score models in Section 4.3 contain all 15 of the confounding covariates (those influencing both treatment selection and the pain outcome measure).

Figure 4.1: DAG for Simulated REFLECTIONS Data Analysis

For developing a DAG, the choice of covariates should be based on expert opinion and prior research. In theory, one could use the outcome data in the current study to confirm any associations between outcome and the pre-baseline covariates. However, we suggest following the idea of “outcome-free design” in conducting observational studies, which means the researchers should avoid using any outcome data before finalizing the study design, including all analytic models.

There are other proposals in the literature for selecting covariates in estimating propensity score and we list them here for reference purposes. Rosenbaum (2002) proposed a selection method based on the significance level of difference of covariates between the two groups. He suggested including all baseline covariates on which group differences meet a low threshold for significance (for example, |t| > 1.5) in the propensity score estimation models. Imbens and Rubin (2015) developed an iterative approach of identifying covariates for the estimation model. First, covariates believed to be associated with intervention assignments according to experts’ opinion or prior evidence will be included. Second, regression models will be built separately between the intervention indicator and each of the remaining covariates. If the value of the likelihood estimate of the model exceeds a pre-specified value, then that covariate will be included.

In applied research, it may also be important to consider the impact of temporal effect in the estimation model. For instance, in a study comparing the effect of an older intervention with that of a newer intervention, subjects who entered the study in an earlier period might be more likely to receive the older intervention, whereas subjects who entered the study in a later period might be more likely to receive the newer intervention. Similarly, when a drug is first introduced on the market, physicians only try the new medication in patients who have exhausted other treatment options and then gradually introduce to a broader population later. In these cases, time does influence the intervention assignment and should be considered for the propensity model. In epidemiological research, this situation is called “channeling bias” (Petri and Urquhart 1991) and calendar time-specific propensity score methods (Mack et al. 2013, Dusetzina et al. 2013) were proposed to incorporate temporal period influence on the intervention assignment.

Hansen (2008) took a different approach than propensity score to improve the quality of causal inference in non-randomized studies by introducing the use of the prognostic score. Unlike propensity scores, whose purpose is to replicate the intervention assignment generating process, prognostic scores aim to replicate the outcome generation process. While the propensity score is a single measure of the covariates’ influence on the probability of treatment assignment, the prognostic score is based on a model of covariates’ influence on the outcome variable. Thus, to estimate the prognostic score, the model will include covariates that are highly predictive of the outcome.

The greatest strength of the propensity score is to help separating the design and analysis stages, but it is not without limitations. A recent study suggested that failure to include in the propensity score model a variable that is highly predictive of the outcome but not associated with treatment status can lead to increased bias and decreased precision in treatment effect estimates in some settings. To date, the use of prognostic score or the combination of propensity score and prognostic score still receives only limited attention. Leacy and Stuart (2014) conducted simulation studies to compare the combination use of propensity score and prognostic score versus single use of either score for matching and stratification-based analyses. Their simulation results suggested the combination use exhibited strong-to-superior performance in terms of root mean square error across all simulation settings and scenarios. Furthermore, they found “[m]ethods combining propensity and prognostic scores were no less robust to model misspecification than single-score methods even when both score models were incorrectly specified.”

Recently, Nguyen and Debray (2019) extended the use of prognostic scores with multiple intervention comparison and propose estimators for different estimands of interest and empirically verified their validity through a series of simulations. While not directly addressed further in this book, the use of prognostic scores is of potential value, and research is needed to further evaluate and provide best practices on the use of prognostic scores for causal inference in applied settings.

4.2.2 Address Missing Covariates Values in Estimating Propensity Score

In large, real world health care databases such as administrative claims databases, missing covariates values are not uncommon. As the propensity score of a subject is the conditional probability of treatment given all observed covariates, missing data for any covariates can make the propensity score estimation more challenging. To address this issue, the following methods can be considered.

The first and the simplest approach is to use only the observations without missing covariates values. This is called the complete case (CC) method. Clearly, ignoring patients with at least one missing covariate value is not a viable strategy with even moderate levels of missing data. Information from patients with any amount of missing data is ignored and one must assume the generalizability of using only a select subset of patients for the analysis. This method could result in biased estimates when the data are not missing completely at random (MCAR). Even when the data are MCAR, the complete case analysis results in reduced power.

The second way to handle missing data is to treat the missing value of each categorical variable as an additional outcome category and impute the missing value of each continuous variable with the marginal mean while adding a dummy variable to indicate it is an imputed value. However, this approach ignores the correlations among original covariate values and thus is not an efficient approach.

The third method also imputes the missing covariates values, but not by simply creating a new “missing category” or using marginal means. The method is called multiple imputation (MI), which Rubin (1978) first proposed. The key step in the MI method is to randomly impute any missing values multiple times with sampling from the posterior predictive distribution of the missing values given the observed values of the same covariate, thereby creating a series of “complete” data sets. One advantage of this method is that each “complete” data set in the imputation can be analyzed separately to estimate the treatment effect, and the pooled (averaged) treatment effect estimate can be considered as the estimate of the causal treatment effect. Another approach is to use the averaged propensity score estimates from each “complete” data set as the propensity score estimates of the subjects in the analysis. There is no consensus on which of these two approaches is more effective, as evidenced in the simulations of Hill (2004). However, a recent study (Mitra and Reiter, 2011) found the second approach would result in less biased treatment effect estimates than the first one. Therefore, we will incorporate the averaged propensity scores approach when implementing the MI method. In addition, MI procedures allow us to include variables that are not included in the estimation of propensity score and therefore might contain useful information about missing values in important covariates.

Another method is to fit separate regressions in estimation of the propensity score for each distinct missingness pattern (MP) (D’Agostino 2001). For illustrative purposes, assume there are only two confounding covariates, denoted by and. Use a binary indicator “Y/N” if the corresponding covariate value is missing/non-missing for a subject, then the possible missing patterns are shown in Table 4.1.

Table 4.1: Possible Missing Patterns

Missing Pattern	X1	X2
1	N	N
2	Y	N
3	N	Y
4	Y	Y

According to Table 4.1, if there are two covariates, for one subject, there are 4 possible missing patterns: (1) both covariate values were missing; (2 and 3) either one covariate value was missing; (4) neither of covariate values were missing. Notice these are “possible” missing patterns, which means the patterns may or may not exist in a real data analysis. To generalize, if there are n confounding covariates, then the number of possible missing patterns is 2^n.

The MP approach includes all nonmissing values for those subjects with the same missing pattern. However, as the subjects of each missing pattern is only a subgroup of the original population, the variability of estimated propensity scores increases because the number of subjects included in each propensity score model is smaller. In practice, to reduce the variability induced by the small numbers in some missing patterns, we suggest pooling the missing patterns with less than 100 subjects iteratively until the pooled missing pattern has at least 100 observations.

For reference, a much more complicated and computationally intensive approach is to jointly model the propensity score and the missingness and then use the EM/ECM algorithm (Ibrahim, et al., 1999) or Gibbs sampling (D’Agostino et al., 2000) to estimate parameters and propensity scores. Due to its complexity, we will not implement this approach in SAS.

Qu and Lipkovich (2009) combined the MI and MP methods and developed a new method called multiple imputation missingness pattern (MIMP) to estimate the propensity scores. In this approach, missing data are imputed using a multiple imputation procedure. Then, the propensity scores are estimated from a logistic regression model including the covariates (with missing values imputed) and a factor (a set of indicator variables) indicating the missingness pattern for each observation. A simulation study showed that MIMP performs as well as MI and better than MP when the missingness mechanism is either completely at random or missing at random, and it performs better than MI when data are missing not at random (Qu and Lipkovich, 2009).

In Programs 4.1 through 4.4, we provide SAS code for the MI, MP and MIMP imputation methods. These programs are similar to the code in Chapter 5 of Faries et al. (2010) but use a new SAS procedure, PROC PSMATCH, for the propensity score estimation. The code is based on the simulated REFLECTIONS data. Note that in the REFLECTIONS data, among all confounders identified by the DAG, only duration of disease (DxDur) has missing values.

Programs 4.1a and 4.1b use the MI procedure in SAS to implement multiple imputation. 100 imputed data sets are generated and PROC PSMATCH then estimates the propensity score for each imputed data set. The macro variable VARLIST contains the list of variables to be included in the later propensity score estimations. The BPIPain_LOCF variable is included in Programs 4.1a and 4.1b as an example of a variable that can be in the multiple imputation model but not the propensity model.

Program 4.1a: Multiple Imputation (MI)

**********************************************************************;

* NIMPUTE: number of imputed data, suggested minimum is 100;

* SEED: random seed in multiple imputation

**********************************************************************;

%let VARLIST=

Age BMI_B BPIInterf_B BPIPain_B CPFQ_B FIQ_B GAD7_B ISIX_B PHQ8_B

PhysicalSymp_B SDS_B DXdur;

PROC MI DATA = REFL2 ROUND=.001 NIMPUTE=100 SEED=123456 OUT=DAT_MI NOPRINT;

VAR &VARLIST BPIPain_LOCF;

RUN;

PROC SORT DATA=DAT_MI;

BY _IMPUTATION_;

RUN;

PROC PSMATCH DATA = DAT_MI REGION=ALLOBS;

CLASS COHORT Gender Race Dr_Rheum Dr_PrimCare;

PSMODEL COHORT(TREATED=’OPIOID’) = &VARLIST Gender Race Dr_Rheum Dr_PrimCare;

OUTPUT OUT = DAT_PS PS = _PS_;

BY _IMPUTATION_;

RUN;

In our case, the covariate with missing values is a continuous variable. Therefore, we used the code in Program 4.1a, where one critical assumption is made: the variables in PROC MI are jointly and individually normally distributed. If there exist categorical covariates with missing values, an alternative approach is to use full conditional model in PROC MI. An attractive feature of this method is that it does not require a multivariate normal assumption. We provide the SAS code in Program 4.1b to implement this approach, assuming that the variable Gender has missing values.

Program 4.1b: Optional PROC MI Alternative for Categorical Covariates

PROC MI DATA = REFL2 ROUND=.001 NIMPUTE=100 SEED=123456 OUT=DAT_MI_FCS NOPRINT;

Class Gender;

VAR &VARLIST Gender BPIPain_LOCF;

fcs discrim(Gender /classeffects=include)

nbiter=100;

RUN;

Programs 4.2 and 4.3 provides code to implement missing pattern approach for estimating propensity scores in SAS. Program 4.2 assigns missing value patterns to the analysis data and pool missing patterns that contain less than 100 subjects. After missing patterns are assigned, program 4.3 uses PROC PSMATCH to estimate the propensity score.

Program 4.2: Assign Missing Patterns and Pool Missing Patterns with Small Number of Observations

******************************************************************************

Macro: MP_ASSIGN

Purpose: Find and create pooled missing value patterns

******************************************************************************;

* Input parameters:

* indata = input data set

* outdata = output data set

* varlist = a list of variables to be included in the propensity score

* estimation. Notice the variable type should be the same.

* M_MP_MIN = minimum number of observations for each missing pattern.

* Missing patterns with less than MIN_MP observations will be pooled.

******************************************************************************;

%MACRO MP_ASSIGN(MSDATA = , OUTDATA =, VARLIST =, N_MP_MIN = 100);

/* Determine how many variables to include in the propensity score estimation */

%LET N = 1;

%LET VARINT = ;

%DO %UNTIL(%QSCAN(&VARLIST., &N. , %STR( )) EQ %STR( ));

%LET VAR = %QSCAN(&VARLIST. , &N. , %STR( ));

%LET VARINT = &VARINT &VAR.*MP;

%LET N = %EVAL(&N. + 1);

%END;

%LET KO = %EVAL(&N-1);

%LET M_MISSING = %EVAL(&N-1);

%PUT &VARINT;

%PUT &KO;

%PUT &M_MISSING;

/* Create indicators for missing values and missingness patterns */

DATA MS;

SET &MSDATA;

ARRAY MS{&M_MISSING} M1-M&M_MISSING.;

ARRAY X{&M_MISSING} &VARLIST;

MV = 0;

DO I = 1 TO &M_MISSING;

IF X{I} = . THEN MS{I} = 1;

ELSE MS{I} = 0;

MV = 2*MV + MS{I};

END;

MV = MV + 1;

DROP I;

RUN;

/* Only keep one record for each missingness pattern */

PROC SORT DATA = MS OUT = PATTERN NODUPKEY;

BY MV;

RUN;

/* Calculate the number of observations in each missingness pattern */

PROC FREQ DATA = MS NOPRINT;

TABLES MV / OUT = M_MP(KEEP = MV COUNT);

RUN;

DATA PATTERN;

MERGE PATTERN M_MP;

BY MV;

RUN;

PROC SORT DATA = PATTERN;

BY DESCENDING COUNT;

RUN;

/* Assign missingness pattern to new index from the largest to the smallest */

DATA PATTERN;

RETAIN M1-M&M_MISSING MV COUNT MV_S;

SET PATTERN;

KEEP M1-M&M_MISSING MV COUNT MV_S;

MV_S = _N_;

RUN;

PROC IML;

USE PATTERN;

READ ALL INTO A;

CLOSE PATTERN;

MS = A[, 1:&M_MISSING];

MV = A[, 1+&M_MISSING];

N_MP = A[, 2+&M_MISSING];

MV_S = A[, 3+&M_MISSING];

M_MP = NROW(MS);

M = NCOL(MS);

/* Calculate the distance between missingness patterns */

DISTANCE = J(M_MP, M_MP, 0);

DO I = 1 TO M_MP;

DO J = 1 TO I-1;

D = 0;

DO L = 1 TO M;

D = D + ( (MS[I,L]-MS[J,L])*(MS[I,L]-MS[J,L]) );

END;

DISTANCE[I,J] = D;

DISTANCE[J,I] = D;

END;

I = 0;

K_MV_POOL = 0;

MV_POOL = J(M_MP, 1, 0);

/*Pooling small missingness patterns according to their similarities to reach a prespecified minimum number of observations (&N_MP_MIN) in each pattern */

DO WHILE( I < M_MP);

I = I + 1;

IF MV_POOL[I] = 0 THEN

DO;

K_MV_POOL = K_MV_POOL + 1;

N_MP_POOL = N_MP[I];

IF N_MP_POOL >= &N_MP_MIN THEN

DO;

MV_POOL[I] = K_MV_POOL;

END;

ELSE

DO;

IF I < M_MP THEN

DO;

A = DISTANCE[(I+1):M_MP, I];

B = MV[(I+1):M_MP];

C = N_MP[(I+1):M_MP];

D = MV_S[(I+1):M_MP];

E = MV_POOL[(I+1):M_MP];

TT = A || B || C || D || E;

CALL SORT( TT, {1 3});

J = 0;

DO WHILE( (N_MP_POOL < &N_MP_MIN) & (I+J < M_MP) );

J = J+1;

IF (TT[J,5] = 0) THEN

DO;

N_MP_POOL = N_MP_POOL + TT[J,3];

TT[J,5] = K_MV_POOL;

END;

IF ( N_MP_POOL >= &N_MP_MIN ) THEN

DO;

MV_POOL[I] = K_MV_POOL;

DO K = 1 TO J;

MV_POOL[TT[K,4]] = K_MV_POOL;

END;

ELSE

DO J = I TO M_MP;

SGN_TMP = 0;

K = 1;

DO WHILE(SGN_TMP = 0 & K <= M_MP);

DO L = 1 TO M_MP;

IF (DISTANCE[J,L] = K) & (MV_POOL[J]=0) &

(MV_POOL[L]>0) THEN

DO;

MV_POOL[J] = MV_POOL[L];

SGN_TMP = 1;

END;

K = K + 1;

END;

MV_FINAL = MV || MV_POOL;

VARNAMES={‘MV’ ‘MV_POOL’};

CREATE MVPOOL FROM MV_FINAL[COLNAME=VARNAMES];

APPEND FROM MV_FINAL;

QUIT;

PROC SORT DATA = MVPOOL;

BY MV;

RUN;

PROC SORT DATA = MS;

BY MV;

RUN;

/* The variable MVPOOL in the &OUTDATA set indicates the pooled missingness

pattern */

DATA &OUTDATA(RENAME=(MV=MP_ORIG MV_POOL=MP));

MERGE MS MVPOOL;

BY MV;

RUN;

%MEND MP_ASSIGN;

Program 4.3: The Missingness Pattern (MP) Imputation

************************************************************************

* MISSINGNESS PATTERN (MP) METHOD *

* This macro uses Proc PSMATCH to estimate propensity scores using the *

* missing pattern approach. This code calls the macro MP_ASSIGN *

* (Program 4.2) which produces the dataset DAT_MP with the pooled *

* missing patterns. *

************************************************************************;

%let VARLIST=

Age BMI_B BPIInterf_B BPIPain_B CPFQ_B FIQ_B GAD7_B ISIX_B PHQ8_B

PhysicalSymp_B SDS_B DXdur;

%MP_ASSIGN(MSDATA = REFL2, OUTDATA = DAT_MP, VARLIST = &VARLIST, N_MP_MIN = 100);

PROC MEANS DATA = DAT_MP NOPRINT;

VAR &VARLIST;

OUTPUT OUT = MN MEAN = XM1-XM12;

BY MP;

RUN;

DATA TEMP;

MERGE DAT_MP MN;

BY MP;

RUN;

DATA TEMP;

SET TEMP;

ARRAY X{12} &VARLIST;

ARRAY XM{12} XM1-XM12;

DO I = 1 TO 12;

IF X{I} = . THEN X{I} = XM{I};

END;

DROP I;

RUN;

PROC SORT DATA = TEMP;

BY MP;

RUN;

PROC PSMATCH DATA = TEMP REGION=ALLOBS;

CLASS COHORT Gender Race Dr_Rheum Dr_PrimCare;

PSMODEL COHORT(TREATED=’OPIOID’) = Gender Race Dr_Rheum Dr_PrimCare

Age BMI_B BPIInterf_B BPIPain_B CPFQ_B FIQ_B GAD7_B ISIX_B PHQ8_B

PhysicalSymp_B SDS_B DXdur;

OUTPUT OUT = DAT_PS_MP PS = _PS_;

WHERE MP=1;

RUN;

PROC PSMATCH DATA = TEMP REGION=ALLOBS;

CLASS COHORT Gender Race Dr_Rheum Dr_PrimCare;

PSMODEL COHORT(TREATED=’OPIOID’) = Gender Race Dr_Rheum Dr_PrimCare

Age BMI_B BPIInterf_B BPIPain_B CPFQ_B FIQ_B GAD7_B ISIX_B PHQ8_B

PhysicalSymp_B SDS_B;

OUTPUT OUT = DAT_PS_MP2 PS = _PS_;

WHERE MP=2;

RUN;

DATA DAT_PS_MP;

SET DAT_PS_MP1 DAT_PS_MP2;

RUN;

Programs 4.2 and 4.4 allow implementation of the MIMP approach for propensity score estimation. After missing patterns were created using Program 4.2, Program 4.4 uses PROC MI to impute missing covariates values and PROC PSMATCH to estimate the propensity score. Note the variable MP in the PSMODEL statement, which is the key to implementing this MIMP approach.

Program 4.4: Multiple Imputation Missing Pattern (MIMP) Imputation

**********************************************************************;

* Multiple Imputation Missingness Pattern (MIMP) Method;

**********************************************************************;

PROC MI DATA = DAT_MP ROUND=.001 NIMPUTE=100 SEED=123456 OUT=DAT_MIMP NOPRINT;

VAR &VARLIST BPIPain_LOCF;

RUN;

PROC PSMATCH DATA = DAT_MIMP REGION=ALLOBS;

CLASS COHORT MP GENDER RACE;

PSMODEL COHORT(TREATED=’OPIOID’) = &VARLIST MP;

OUTPUT OUT = DAT_PS_MIMP PS = _PS_;

BY _IMPUTATION_;

RUN;

4.2.3 Selection of Propensity Score Estimation Model

Once the covariates have been selected and methods for addressing any missing covariate data have been applied, several statistical models can be used to estimate the propensity scores. The most common approach has been the use of logistic regression to model the binary intervention (treated or control) selection as a function of the measured covariates:

Where is the propensity score of the th subject, () represents a vector of values of the observed covariates of the th subject, and is a normally distributed error term. Notice, it is a simplified model that only contains main effect of the covariates, but interaction terms could be added. Furthermore, nonparametric models could also be used. In this section, we will introduce three different modeling approaches for estimating propensity score and provide SAS code for implementation – a priori logistic regression modeling, automatic parametric model selection, and nonparametric modeling.

A Priori Logistic Regression Model

The first approach is to fit a logistic regression model a priori, that is, identify the covariates in the model and fix the model before estimating the propensity score. The main advantage of an a priori model is that it allows researchers to incorporate knowledge external to the data into the model building. For example, if there is evidence that a covariate is correlated to the treatment assignment, then this covariate should be included in the model even if the association between this covariate and the treatment is not strong in the current data. In addition, the a priori model is easy to interpret. The DAG approach could be very informative in building a logistic propensity score model a priori, as it clearly points out the relationship between covariates and interventions. The correlation structure between each covariate and the intervention selection is pre-specified and in a fixed form. However, one main challenge of the a priori modeling approach is that it might not provide the optimal balance between treatment and control groups.

To build an a priori model for propensity score estimation in SAS, we can use either PROC PSMATCH or PROC LOGISTIC as shown in Program 4.5. In both cases, the input data set is a one observation per patient data set containing the treatment and baseline covariates (from the simulated REFLECTIONS study – see Chapter 3). Also, in both cases the code will produce an output data set containing the original data set with the additional estimated propensity score for each patient (_ps_).

Program 4.5: Propensity Score Estimation: A priori Logistic Regression

PROC PSMATCH DATA=REFL2 REGION=ALLOBS;

CLASS COHORT GENDER RACE DR_RHEUM DR_PRIMCARE;

PSMODEL COHORT(TREATED=’OPIOID’)= GENDER RACE AGE BMI_B BPIINTERF_B BPIPAIN_B

CPFQ_B FIQ_B GAD7_B ISIX_B PHQ8_B PHYSICALSYMP_B SDS_B DR_RHEUM

DR_PRIMCARE;

OUTPUT OUT=PS PS=_PS_;

RUN;

PROC LOGISTIC DATA=REFL2;

CLASS COHORT GENDER RACE DR_RHEUM DR_PRIMCARE;

MODEL COHORT = GENDER RACE AGE BMI_B BPIINTERF_B BPIPAIN_B CPFQ_B FIQ_B GAD7_B

ISIX_B PHQ8_B PHYSICALSYMP_B SDS_B DR_RHEUM DR_PRIMCARE;

OUTPUT OUT=PS PREDICTED=PS;

RUN;

Before building a logistic model in SAS, we suggest examining the distribution of the intervention indicator at each level of the categorical variable to rule out the possibility of “complete separation” (or “perfect prediction”), which means that for subjects at some level of some categorical variable, they would all receive one intervention but not the other. Complete separation can occur for several reasons and one common example is when using several categorical variables whose categories are coded by indicators. When the logistic regression model is fit, the estimate of the regression coefficients s is based on the maximum likelihood estimation, and MLEs under logistic regression modeling do not have a closed form. In other words, the MLE cannot be written as a function of and . Thus, the MLE of s are obtained using some numerical analysis algorithms such as the Newton-Raphson method. However, if there is a covariate that can completely separate the interventions, then the procedure will not converge in SAS. If PROC LOGISTIC was used, the following warning message will be issued.

WARNING: There is a complete separation of data points. The maximum likelihood estimate does not exist.

WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.

Notice that SAS will continue to finish the computation despite issuing warning messages. However, the estimate of such s are incorrect, and so are the estimated propensity scores. If after examining the intervention distribution at each level of the categorical variables complete separation is found, then efforts should be made to address this issue. One possible solution is to collapse the categorical variable causing the problem. That is, combine the different outcome categories such that the complete separation no longer exists. Another possible solution is to use Firth logistic regression. It uses a penalized likelihood estimation method. Firth bias-correction is considered an ideal solution to the separation issue for logistic regression (Heinze and Schemper, 2002). In PROC LOGISTIC, we can add an option to run the Firth logistic regression as shown in Program 4.6.

Program 4.6: Firth Logistic Regression

PROC LOGISTIC DATA=REFL2;

CLASS COHORT GENDER RACE DR_RHEUM DR_PRIMCARE;

MODEL COHORT = GENDER RACE DR_RHEUM DR_PRIMCARE BPIInterf_B BPIPain_B

CPFQ_B FIQ_B GAD7_B ISIX_B PHQ8_B PhysicalSymp_B SDS_B / FIRTH;

OUTPUT OUT=PS PREDICTED=PS;

RUN;

Automatic Parametric Model Selection

A second parametric approach to estimate the propensity score is the use of an automated model building process that ensures the balance of confounders between interventions. The idea originated from Rubin and Rosenbaum (Rubin and Rosenbaum, 1984) and was later developed and proposed in detail by Dehejia and Wahba (Dehejia and Wahba, 1998, 1999). This approach also has broad application in other areas such as psychology (Stuart, 2003) and economics (Marco and Kopeinig, 2008). It is an iterative approach using logistic regression and we suggest using the following steps for implementation:

1. Estimate propensity scores using a logistic regression model that included the treatment indicator as the dependent variable and other measured covariates as explanatory variables. No interaction terms or high-order terms of those covariates are included at this step.

2. Order the estimated propensity score in step 1 from low to high, then divide the propensity scores into strata such that each stratum holds an approximately equal number of treated individuals. Some studies (Stuart, 2003) suggested five strata would be a reasonable choice to avoid too few comparison subjects within a stratum.

3. Evaluate the balance of the measured covariates and all of their two-way interaction-terms within each stratum from step 2. Balance can be quantified using the standardized bias (or standardized mean difference, see Chapter 5). For continuous covariates, the standardized bias is the difference in means of the covariate between the treated group and the comparison group divided by the standard deviation of the treatment group. For categorical covariates, at each level of the categorical covariate, the standardized bias is the difference in proportions of each level of the measured covariate divided by the standard deviation in the treatment group. To be more precise:

◦ For interactions between continuous covariates, create a new variable that is the product of the two variables.

◦ For interactions between a categorical and a continuous covariate, calculate the standardized difference per level of the categorical variable.

◦ For interactions between two categorical covariates A and B we calculate the following:

- For each level of A, the difference in proportions for each level of B divided by the standard deviation in the treatment group.

- For each level of B, the difference in proportions for each level of A divided by the standard deviation in the treatment group.

A covariate is considered “balanced” if the standardized bias is less than 0.25. Although some suggest a stricter threshold, < 0.1, which is preferred in propensity score-based analysis. If the standardized bias for an interaction is above 0.25 across two or more strata, then that interaction term is considered “imbalanced.”

4. After adding each interaction term to the model separately, keep the one that reduces the number of imbalanced interaction terms the most (in other words, improve the balance the most). Then fit all remaining interaction terms again separately, repeat steps 2 and 3, and again add the one that improves balance most. Repeat until no more improvement.

Program 4.9 provides the macros to implement the automatic model selection for estimating propensity score in SAS. For preparation, Program 4.7 provides a macro to automatically create binary indicator for all categorical variables specified in Programs 4.9 and 4.10. Notice that Program 4.7 will create (n-1) dummy binary variables for a categorical variable with n categories if this variable is a main effect term in the logistic regression model, and will create n dummy binary variables if this variable is part of an interaction term in the logistic model. Program 4.8 provides a macro to calculate the standardized bias of the covariates and their interaction terms included in the model. This macro will be called in program 4.8.

Program 4.7: Macro to Create a Binary Indicator for Multi-categorical Variables

***************************************************************************

Macro: _ps_indic

****************************************************************************;

%MACRO _ps_indic (in =, out =, full = NO);

PROC CONTENTS DATA = &in (KEEP = &classvars) NOPRINT OUT = _cont; RUN;

DATA _NULL_;

SET _cont (KEEP = name label type format) END = last;

CALL SYMPUT(COMPRESS(‘_cvar’||PUT(_N_, BEST.)), TRIM(LEFT(name)));

IF label ^= ‘’ THEN

CALL SYMPUT(COMPRESS(‘_clab’||PUT(_N_, BEST.)), TRIM(LEFT(label)));

ELSE CALL SYMPUT(COMPRESS(‘_clab’||PUT(_N_, BEST.)),

TRIM(LEFT(name)));

CALL SYMPUT(COMPRESS(‘_ctype’||PUT(_N_, BEST.)), type);

CALL SYMPUT(COMPRESS(‘_cfmt’||PUT(_N_, BEST.)), format);

IF last THEN

CALL SYMPUT(‘_ncvar’, COMPRESS(PUT(_n_, BEST.))); RUN;

%* Get the number of categorical covariates (_nnvar) and name and label

of separate categorical covariatest (_nvar# and nlab#).*;

%LET classvars_bin =;

DATA &out;

SET &in; RUN;

%DO iloop = 1 %TO &_ncvar;

%* Create indicator (0/1) variables for all categorical

covariates and put their names in macro var CLASSVARS_BIN *;

PROC SQL;

CREATE TABLE _cvar&iloop AS SELECT DISTINCT

TRIM(LEFT(&&_cvar&iloop)) AS &&_cvar&iloop FROM &in WHERE NOT

MISSING(&&_cvar&iloop);

QUIT;

%IF %SUBSTR(%QUPCASE(&full), 1, 1) = Y %THEN

%LET _n&iloop = &sqlobs;

%ELSE %LET _n&iloop = %EVAL(&sqlobs - 1);

DATA _NULL_;

SET _cvar&iloop;

%IF &&_ctype&iloop = 2 %THEN

%DO;

%IF %BQUOTE(&&_cfmt&iloop) ^= %THEN

CALL SYMPUT (‘_vlab_’||COMPRESS(PUT(_N_, BEST.)),

“&&_clab&iloop “||TRIM(LEFT(PUT(&&_cvar&iloop,

&&_cfmt&iloop...))));

%ELSE CALL SYMPUT (‘_vlab_’||COMPRESS(PUT(_N_, BEST.)),

“&&_clab&iloop “||TRIM(LEFT(&&_cvar&iloop)));

;

%END;

%ELSE

%DO;

%IF %BQUOTE(&&_cfmt&iloop) ^= %THEN

CALL SYMPUT (‘_vlab_’||COMPRESS(PUT(_N_, BEST.)),

“&&_clab&iloop “||TRIM(LEFT(PUT(&&_cvar&iloop,

&&_cfmt&iloop...))));

%ELSE CALL SYMPUT (‘_vlab_’||COMPRESS(PUT(_N_, BEST.)),

“&&_clab&iloop “||TRIM(LEFT(PUT(&&_cvar&iloop, BEST.))));

;

%END; RUN;

PROC TRANSPOSE DATA = _cvar&iloop OUT = _cvar&iloop;

VAR &&_cvar&iloop; RUN;

DATA &out;

IF _N_ = 1 THEN SET _cvar&iloop;

SET &out;

%DO jloop = 1 %TO &&_n&iloop;

%LET classvars_bin = &classvars_bin &&_cvar&iloop.._&jloop;

IF &&_cvar&iloop = col&jloop THEN

&&_cvar&iloop.._&jloop = 1;

ELSE IF NOT MISSING(&&_cvar&iloop) THEN

&&_cvar&iloop.._&jloop = 0;

%LET _label&iloop._&jloop = &&_vlab_&jloop;

LABEL &&_cvar&iloop.._&jloop = “&&_vlab_&jloop”;

DROP col&jloop;

%END;

DROP _name_ %IF %SUBSTR(%QUPCASE(&full), 1, 1) ^= Y %THEN

col%EVAL(&&_n&iloop + 1);;

RUN;

%END;

%MEND _ps_indic;

Program 4.8: Macro to Calculate the Standardized Bias

****************************************************************************

Macro: _ps_stddiff_apmb

****************************************************************************;

%MACRO _ps_stddiff_apmb (indata = , interactions = YES);

%_ps_indic(in = &indata, out = _indata_int, full = YES);

%* Get the number of binary categorical covariates as well as their

separate names *;

DATA _NULL_;

vars = “&classvars_bin”;

i = 1;

var = SCAN(vars, i);

DO WHILE (var ^= ‘’);

CALL SYMPUT(‘_cvar’||COMPRESS(PUT(i, BEST.)), TRIM(LEFT(var)));

i + 1;

var = SCAN(vars, i);

END;

CALL SYMPUT(‘_ncvar’, COMPRESS(PUT(i - 1, BEST.))); RUN;

%* Create interaction variables for continuous covariates *;

PROC CONTENTS DATA = _indata_int (KEEP = &contvars) NOPRINT OUT = _cont;

RUN;

DATA _NULL_;

SET _cont (KEEP = name label) END = last;

CALL SYMPUT(COMPRESS(‘_nvar’||PUT(_N_, BEST.)), TRIM(LEFT(name)));

IF label ^= ‘’ THEN

CALL SYMPUT(COMPRESS(‘_nlab’||PUT(_N_, BEST.)), TRIM(LEFT(label)));

ELSE CALL SYMPUT(COMPRESS(‘_nlab’||PUT(_N_, BEST.)), TRIM(LEFT(name)));

IF last THEN CALL SYMPUT(‘_nnvar’, COMPRESS(PUT(_n_, BEST.)));

RUN;

%LET interactionscont =;

DATA _indata_int;

SET _indata_int;

%DO contloop = 1 %TO %EVAL(&_nnvar - 1);

%DO contloop2 = %EVAL(&contloop + 1) %TO &_nnvar;

int_n&contloop._n&contloop2 = &&_nvar&contloop * &&_nvar&contloop2;

LABEL int_n&contloop._n&contloop2 = “&&_nlab&contloop *

&&_nlab&contloop2”;

%LET interactionscont = &interactionscont int_n&contloop._n&contloop2;

%END;

RUN;

PROC FORMAT;

VALUE $cont

%DO iloop = 1 %TO &_nnvar;

“n&iloop” = “&&_nvar&iloop”

%END;

;

RUN;

%* Get the number of interactions between continuous covariates as well

as their separate names *;

DATA _NULL_;

vars = “&interactionscont”;

i = 1;

var = SCAN(vars, i);

DO WHILE (var ^= ‘’);

CALL SYMPUT(‘_nint’||COMPRESS(PUT(i, BEST.)), TRIM(LEFT(var)));

i + 1;

var = SCAN(vars, i);

END;

CALL SYMPUT(‘_nnint’, COMPRESS(PUT(i - 1, BEST.))); RUN;

%* Calculate Standardized Bias for continuous covariates and

interactions between continuous variables;

PROC SUMMARY DATA = _indata_int (WHERE = (_strata_ ^= .)) NWAY;

CLASS _strata_ _cohort;

VAR &contvars &interactionscont;

OUTPUT OUT = _mean MEAN = STD = /AUTONAME; RUN;

PROC TRANSPOSE DATA=_mean (DROP=_type_ _freq_) OUT=_mean_t PREFIX=trt_;

BY _strata_;

ID _cohort; RUN;

PROC SORT DATA = _mean_t;

BY _strata_ _name_; RUN;

DATA _mean;

LENGTH _label_ $200;

MERGE _mean_t;

BY _strata_ _name_;

_stat = SCAN(_name_, -1, ‘_’);

IF UPCASE(_stat) = ‘MEAN’ THEN _statn = 1;

ELSE _statn = 3;

_name_ = REVERSE(SUBSTR(REVERSE(_name_), INDEX(REVERSE(_name_), ‘_’)

+ 1)); RUN;

PROC SORT DATA = _mean;

BY _strata_ _name_ _statn; RUN;

DATA _stddiff;

SET _mean;

BY _strata_ _name_ _statn;

RETAIN stddiff;

IF UPCASE(_stat) = ‘MEAN’ THEN

DO;

stddiff = trt_1 - trt_0;

END;

ELSE IF UPCASE(_stat) = ‘STDDEV’ THEN

DO;

stddiff = stddiff / trt_1;

END;

IF LAST._name_; RUN;

DATA _stddiff;

LENGTH variable1 variable2 $32;

SET _stddiff;

IF UPCASE(_name_) =: ‘INT_’ THEN

DO;

variable1 = UPCASE(PUT(SCAN(_name_, 2, ‘_’), $cont.));

variable2 = UPCASE(PUT(SCAN(_name_, 3, ‘_’), $cont.));

END;

ELSE variable1 = _name_;

IF variable1 ^= ‘’;

KEEP variable1 variable2 stddiff _strata_; RUN;

%* Now for every (binary) categorical covariate we calculate per strata the standardized bias for the covariate and for all interactions between the covariate and continuous covariates and all levels of all other categorical covariates;

DATA _mean;

STOP; RUN;

DATA _meancont;

STOP; RUN;

DATA _meanclass;

STOP; RUN;

%DO iloop = 1 %TO &_ncvar;

PROC SUMMARY DATA = _indata_int (WHERE = (_strata_ ^= .)) NWAY;

CLASS _strata_ _cohort;

VAR &&_cvar&iloop;

OUTPUT OUT = _mean0 MEAN = mean /AUTONAME; RUN;

DATA _mean;

LENGTH variable1 $32;

SET _mean _mean0 (IN = in);

IF in THEN

variable1 = UPCASE(“&&_cvar&iloop”); RUN;

PROC SUMMARY DATA = _indata_int (WHERE = (_strata_ ^= .)) NWAY;

WHERE &&_cvar&iloop;

CLASS _strata_ _cohort;

VAR &contvars;

OUTPUT OUT = _mean1 MEAN = STD = /AUTONAME; RUN;

DATA _meancont;

LENGTH variable1 $32;

SET _meancont _mean1 (IN = in);

IF in THEN

variable1 = UPCASE(“&&_cvar&iloop”); RUN;

PROC SUMMARY DATA = _indata_int (WHERE = (_strata_ ^= .)) NWAY;

WHERE &&_cvar&iloop;

CLASS _strata_ _cohort;

VAR &classvars_bin;

OUTPUT OUT = _mean2 MEAN =; RUN;

DATA _meanclass;

LENGTH variable1 $32;

SET _meanclass _mean2 (IN = in);

IF in THEN

variable1 = UPCASE(“&&_cvar&iloop”); RUN;

%END;

PROC SORT DATA = _meancont;

BY variable1 _strata_; RUN;

PROC TRANSPOSE DATA = _meancont (DROP = _type_ _freq_) OUT =

_meancont_t PREFIX = trt_;

BY variable1 _strata_;

ID _cohort; RUN;

PROC SORT DATA = _meancont_t;

BY variable1 _strata_ _name_; RUN;

DATA _meancont;

SET _meancont_t;

_stat = SCAN(_name_, -1, ‘_’);

IF UPCASE(_stat) = ‘MEAN’ THEN _statn = 1;

ELSE _statn = 3;

_name_ = REVERSE(SUBSTR(REVERSE(_name_), INDEX(REVERSE(_name_), ‘_’) + 1));

RUN;

PROC SORT DATA = _meancont;

BY variable1 _strata_ _name_ _statn;

RUN;

DATA stddiff1_ (RENAME = (_name_ = variable2));

SET _meancont;

BY variable1 _strata_ _name_ _statn;

RETAIN stddiff;

IF UPCASE(_stat) = ‘MEAN’ THEN

DO;

stddiff = trt_1 - trt_0;

END;

ELSE IF UPCASE(_stat) = ‘STDDEV’ THEN

DO;

IF trt_1 ^= 0 THEN stddiff = stddiff / trt_1;

ELSE stddiff = .;

END;

IF LAST._name_;

KEEP variable1 _name_ stddiff _strata_; RUN;

PROC SORT DATA = _mean;

BY variable1 _strata_; RUN;

PROC TRANSPOSE DATA=_mean (DROP=_type_ _freq_) OUT=_mean_t PREFIX=trt_;

BY variable1 _strata_;

ID _cohort; RUN;

PROC SORT DATA = _mean_t;

BY variable1 _strata_; RUN;

DATA stddiff0_;

SET _mean_t;

trt_1 = FUZZ(trt_1);

var1 = trt_1 * ( 1 - trt_1);

IF var1 ^= 0 AND trt_0 NOT IN (0 1) THEN

stddiff = (trt_1 - trt_0) / SQRT(var1);

KEEP variable1 stddiff _strata_; RUN;

PROC SORT DATA = _meanclass;

BY variable1 _strata_;

RUN;

PROC TRANSPOSE DATA = _meanclass (DROP = _type_ _freq_) OUT =

_meanclass_t PREFIX = trt_;

BY variable1 _strata_;

ID _cohort; RUN;

PROC SORT DATA = _meanclass_t;

BY variable1 _strata_ _name_; RUN;

DATA stddiff2_ (RENAME = (_name_ = variable2));

SET _meanclass_t;

trt_1 = FUZZ(trt_1);

var1 = trt_1 * ( 1 - trt_1);

IF var1 ^= 0 AND trt_0 NOT IN (0 1) THEN

stddiff = (trt_1 - trt_0) / SQRT(var1);

KEEP variable1 _name_ stddiff _strata_; RUN;

DATA _stddiff;

SET _stddiff stddiff0_ (IN = in0) stddiff1_ (IN = in1) stddiff2_(IN = in2);

IF in0 THEN

DO;

vartype2 = ‘ ‘;

variable1 = UPCASE(variable1);

vartype1 = ‘C’;

END;

IF in1 THEN

DO;

variable2 = UPCASE(variable2);

_var = UPCASE(REVERSE(variable1));

IF NOT(variable2 =: REVERSE(SUBSTR(_var, INDEX(_var, ‘_’) + 1)));

vartype2 = ‘ ‘;

variable1 = UPCASE(variable1);

vartype1 = ‘C’;

END;

IF in2 THEN

DO;

variable2 = UPCASE(variable2);

_var = UPCASE(REVERSE(variable1));

IF NOT(variable2 =: REVERSE(SUBSTR(_var, INDEX(_var, ‘_’) + 1)));

vartype2 = ‘C’;

variable1 = UPCASE(variable1);

vartype1 = ‘C’;

END;

KEEP variable1 variable2 stddiff _strata_ vartype1 vartype2; RUN;

%endmac:

%MEND _ps_stddiff_apmb;

Program 4.9: Automatic Propensity Score Estimation Model

*************************************************************************

* Macro Name: PS_CALC_APMB *

* Parameters: *

* INDATA Name of the input dataset containing propensity scores *

* OUTDATA Name of the output dataset [&indata] *

* COHORT Name of variable containing the cohort/treatment variable*

* TREATED A string with the value of &COHORT denoting treated *

* patients

* CONTVARS List of continuous covariates to include in this table *

* CLASSVARS List of categorical covariates to include in this table *

* PS Name of variable to contain the propensity scores *

* CRIT The criteria to be used to consider an interaction term *

* balanced *

* MAXITER Maximum number of iterations allowed [800] *

* NSTRATA Number of strata [5] *

* IMBAL_STRATA_CRIT The criteria to be used to consider a strata *

* balanced *

* IMBAL_NSTRATA_CRIT Minimum number of imbalanced strata for a term *

* to be considered imbalanced. *

* ENTRY_NSTRATA_CRIT Minimum number of imbalanced strata for an *

* interaction term to be considered for entry into the model. *

* ALWAYS_INT Interaction term that are to always be included into the *

* model *

* DEBUG To reduce the amount of information written to the SAS log, *

* this parameter is set to NO by default [NO] | YES. *

************************************************************************;

%MACRO ps_calc_apmb (

indata = ,

outdata = ,

cohort = _cohort,

contvars = ,

classvars = ,

ps = ps,

treated = _treated,

n_mp_min = 100,

maxiter = 800,

imbal_strata_crit = 0.25,

imbal_nstrata_crit = 2,

entry_nstrata_crit = 2,

nstrata = 5,

debug = NO,

always_int =

);

%PUT Now executing macro %UPCASE(&sysmacroname);

%_ps_util;

%LET _notes = %SYSFUNC(GETOPTION(NOTES));

%LET _mprint = %SYSFUNC(GETOPTION(MPRINT));

%LET _mlogic = %SYSFUNC(GETOPTION(MLOGIC));

%LET _symbolgen = %SYSFUNC(GETOPTION(SYMBOLGEN));

ODS HTML CLOSE;

ODS LISTING CLOSE;

%* Checking parameter specifications;

%IF %BQUOTE(&indata) = %THEN

%DO;

%PUT %STR(ER)ROR: No input dataset (INDATA) has been specified!

Execution of this macro will be stopped!;

%LET _errfound = 1;

%GOTO endmac;

%END;

%IF %BQUOTE(&contvars) = AND %BQUOTE(&classvars) = %THEN

%DO;

%PUT %STR(ER)ROR: Neither CLASSVARS nor CONTVARS have been

specified! Execution of this macro will be stopped!;

%LET _errfound = 1;

%GOTO endmac;

%END;

DATA _NULL_;

CALL SYMPUT(‘_indata’, SCAN(“&indata”, 1, ‘ ()’));

RUN;

%IF %BQUOTE(&outdata) = %THEN

%LET outdata = &indata;

%IF %SYSFUNC(EXIST(&_indata)) = 0 %THEN

%DO;

%LET _errfound = 1;

%PUT %STR(ER)ROR: Input dataset does not exist! Execution of

this macro will be stopped!;

%GOTO endmac;

%END;

DATA _indata;

SET &indata;

RUN;

%IF %BQUOTE(&treated) = %THEN

%DO;

%PUT %STR(ER)ROR: Parameter TREATED has been specified as

blank! Execution of this macro will be stopped!;

%LET _errfound = 1;

%GOTO endmac;

%END;

%ELSE

%DO;

DATA _NULL_;

IF SUBSTR(SYMGET(“treated”), 1, 1) ^= ‘”’ AND

SUBSTR(SYMGET(“treated”), 1, 1) ^= “’” THEN

CALL SYMPUT(“_treatedvar”, “1”);

ELSE CALL SYMPUT(“_treatedvar”, “0”);

RUN;

%END;

PROC CONTENTS DATA = &indata OUT = _contents_psm NOPRINT;

RUN;

%LET _ps_exist = 0;

%LET _cohort_exist = 0;

%LET _treated_exist = 0;

%LET __cohort_exist = 0;

DATA _NULL_;

SET _contents_psm;

IF UPCASE(name) = UPCASE(“&cohort”) THEN

DO;

CALL SYMPUT(‘_cohort_exist’, ‘1’);

CALL SYMPUT(‘_coh_tp’, COMPRESS(PUT(type, BEST.)));

CALL SYMPUT(‘_coh_fmt’, COMPRESS(format));

CALL SYMPUT(‘_coh_lab’, TRIM(LEFT(label)));

END;

ELSE IF UPCASE(name) = UPCASE(“_cohort”) THEN

CALL SYMPUT(‘__cohort_exist’, ‘1’);

ELSE IF UPCASE(name) = UPCASE(“&ps”) THEN

CALL SYMPUT(‘_ps_exist’, ‘1’);

%IF &_treatedvar = 1 %THEN

%DO;

ELSE IF UPCASE(name) = UPCASE(“&treated”) THEN

CALL SYMPUT(‘_treated_exist’, ‘1’);

%END;

RUN;

%IF &_ps_exist = 1 %THEN

%DO;

%PUT %STR(WAR)NING: PS variable &ps already exists in dataset

&indata! This variable will be overwritten!;

DATA _indata;

SET _indata (DROP = &ps);

RUN;

%END;

%IF &_cohort_exist = 0 %THEN

%DO;

%LET _errfound = 1;

%PUT %STR(ER)ROR: Cohort variable &cohort not found in dataset

&indata! Execution of this macro will be stopped!;

%GOTO endmac;

%END;

%IF &_treated_exist = 0 AND &_treatedvar = 1 %THEN

%LET _treatedvar = 0;

%IF &_treatedvar = 1 %THEN

%DO;

PROC SQL NOPRINT;

SELECT DISTINCT &treated INTO: _treated FROM _indata;

QUIT;

%LET treated = “&_treated”;

%IF &sqlobs > 1 %THEN

%DO;

%PUT %STR(ER)ROR: More than one value found for

variable &treated! Execution of this macro

will be stopped!;

%GOTO endmac;

%END;

PROC SQL NOPRINT;

CREATE TABLE _NULL_ AS SELECT DISTINCT &cohort FROM &indata WHERE

&cohort = &treated;

QUIT;

%LET _fnd_treated = &sqlobs;

%IF &_fnd_treated = 0 %THEN

%DO;

%LET _errfound = 1;

%PUT %STR(ER)ROR: Value &treated not found for variable

&cohort! Execution of this macro will be stopped!;

%GOTO endmac;

%END;

PROC SQL NOPRINT;

CREATE TABLE _cohort_psm AS SELECT DISTINCT &cohort FROM &indata WHERE

NOT MISSING(&cohort);

QUIT;

%LET _n_cohort = &sqlobs;

%IF &_n_cohort > 2 %THEN

%DO;

%LET _errfound = 1;

%PUT %STR(ER)ROR: More than 2 values for variable &cohort

found! Execution of this macro will be stopped!;

%GOTO endmac;

%END;

%ELSE %IF &_n_cohort < 2 %THEN

%DO;

%LET _errfound = 1;

%PUT %STR(ER)ROR: Less than 2 values for variable &cohort

found! Execution of this macro will be stopped!;

%GOTO endmac;

%END;

%LET _errfound = 0;

%* Creating a derived variable _COHORT with value 1 for treated and 0 for

control;

DATA _indata_keep (%IF &__cohort_exist = 1 %THEN DROP = _cohort;

RENAME = (_new_cohort = _cohort));

SET &indata;

%IF &_coh_tp = 2 %THEN

%DO;

IF &cohort = &treated THEN

_new_cohort = 1;

ELSE IF &cohort ^= ‘’ THEN

_new_cohort = 0;

%END;

%ELSE %IF &_coh_tp = 1 %THEN

%DO;

IF &cohort = &treated THEN

_new_cohort = 1;

ELSE IF &cohort ^= . THEN

_new_cohort = 0;

%END;

_mergekey = _N_;

RUN;

DATA _indata;

SET _indata_keep;

WHERE NOT MISSING(&cohort);

RUN;

%LET _errfound = 0;

%GLOBAL classvars_bin interactionscont;

%* Create binary indicator variable for all class variables and impute

missing values if required;

%_ps_indic(in = _indata, out = _indata_ps, full = NO);

%LET classvars_bin_model = &classvars_bin;

%* Run PSMATCH to create PS and derive _strata_ for step 0 - model without

interactions *;

PROC PSMATCH DATA = _indata_ps REGION = ALLOBS;

CLASS _cohort &classvars_bin_model;

PSMODEL _cohort(Treated = “1”) = &contvars &classvars_bin_model

&always_int;

OUTPUT OUT = ps PS = _ps_;

RUN;

PROC SUMMARY DATA = ps NWAY;

CLASS _mergekey _cohort;

VAR _ps_;

OUTPUT OUT = ps MEAN =;

RUN;

%IF %SUBSTR(%UPCASE(&debug, 1, 1)) ^= Y %THEN

OPTIONS NONOTES NOMPRINT NOMLOGIC;;

PROC PSMATCH DATA = ps REGION = ALLOBS;

CLASS _cohort;

PSDATA TREATVAR = _cohort(Treated = “1”) PS = _ps_;

STRATA NSTRATA = &nstrata KEY = TOTAL;

OUTPUT OUT (OBS = REGION) = ps;

RUN;

DATA ps;

MERGE _indata ps;

BY _mergekey;

RUN;

%* Calculate standardized bias for step 0 - model without interactions;

%_ps_stddiff_apmb (indata = ps);

%* Calculate IMBALANCE as ABS(stddiff) > &imbal_strata_crit and count the mean

and number of imbalanced over strata per term (main and interaction).;

DATA _stddiff;

SET _stddiff;

stddiff = ABS(stddiff);

IF stddiff > &imbal_strata_crit THEN

imbalance = 1;

ELSE imbalance = 0;

IF vartype1 = ‘C’ THEN

DO;

_var1 = UPCASE(REVERSE(variable1));

_var1 = REVERSE(SUBSTR(_var1, INDEX(_var1, ‘_’) + 1));

END;

ELSE _var1 = variable1;

IF vartype2 = ‘C’ THEN

DO;

_var2 = UPCASE(REVERSE(variable2));

_var2 = REVERSE(SUBSTR(_var2, INDEX(_var2, ‘_’) + 1));

END;

ELSE _var2 = variable2;

RUN;

PROC SORT DATA = _stddiff;

BY _var1 _var2;

RUN;

PROC SUMMARY DATA = _stddiff NWAY MISSING;

CLASS variable1 _var1 variable2 _var2;

VAR imbalance stddiff;

OUTPUT OUT = imbalance SUM = imbalance dum1 MEAN = dum2 stddiff;

RUN;

%* For interaction involving class variable the maximum number and maximum

mean over categories is taken;

PROC SUMMARY DATA = imbalance NWAY MISSING;

CLASS _var1 _var2;

VAR imbalance stddiff;

OUTPUT OUT = imbalance (DROP = _freq_ _type_) MAX = imbalance max;

RUN;

%* Macro variable _N_IMBAL with number of terms (main and interaction) with

more than &imbal_nstrata_crit imbalanced strata is created;

PROC SQL NOPRINT;

SELECT MEAN(max) INTO: _max FROM imbalance;

SELECT COMPRESS(PUT(COUNT(max), BEST.)) INTO: _n_imbal FROM imbalance

WHERE (imbalance >= &imbal_nstrata_crit);

QUIT;

%PUT STEP 0: #imbalanced: &_n_imbal;

%LET count = 0;

%* Select only the interaction terms and sort on number of imbalanced and

mean std. bias. Select the last record. This will contain the

interaction term to be added next;

PROC SORT DATA = imbalance (WHERE = (_var2 ^= ‘’)) OUT = imbalance_new;

BY imbalance max;

RUN;

DATA imbalance_new;

SET imbalance_new END = last;

IF last;

RUN;

%* If interaction term involves one or two class variable, get all indicator

variables to add to model;

PROC SORT NODUPKEY DATA = _stddiff (KEEP = _var1 variable1 _var2 variable2

vartype:) OUT = _vars;

BY _var1 _var2 variable1 variable2;

RUN;

DATA imbalance_new;

MERGE _vars imbalance_new (IN = in);

BY _var1 _var2;

IF in;

RUN;

DATA imbalance_new;

SET imbalance_new;

BY _var1 _var2 variable1 variable2;

IF vartype2 = ‘C’ AND LAST.variable1 THEN

DELETE;

RUN;

PROC SORT DATA = imbalance_new;

BY _var2 _var1 variable2 variable1;

RUN;

DATA imbalance_new;

SET imbalance_new;

BY _var2 _var1 variable2 variable1;

IF vartype1 = ‘C’ AND LAST.variable2 THEN

DELETE;

RUN;

PROC SORT DATA = imbalance_new;

BY _var1 variable1 _var2 variable2;

RUN;

%* Dataset IMBALANCE is to contain all interaction terms and whether they are

in the model;

DATA imbalance;

MERGE imbalance (WHERE = (_var2 ^= ‘’)) imbalance_new (KEEP = _var1

_var2 IN = in0 OBS = 1);

BY _var1 _var2;

iter = 0;

out = 0;

in = 0;

IF in0 THEN

in = 1;

RUN;

%* Dataset ALLINTER is the dataset contain all interaction terms already in

the model plus the one to be added.;

DATA allinter;

SET imbalance_new (IN = in0);

IF in0 THEN

iter = &count + 1;

RUN;

%LET n_inter = 0;

%LET new_n_inter = 1;

%LET _n_imbal_new = &_n_imbal;

%LET _n_imbal_start = &_n_imbal;

%* Add interaction terms to model and recalculate PS, _strata and

standardized bias until no more interaction terms have standardized

bias of more than &imbal_strata_crit and are not already in the model;

%DO %WHILE (&new_n_inter > 0 AND &count < &maxiter AND &_n_imbal_new ^= 0);

%LET count = %EVAL(&count + 1);

%LET n_inter = &new_n_inter;

%* Fill INTERACTIONSIN with all interaction to be fitted to the model

of this step;

DATA _NULL_;

SET allinter END = last;

CALL SYMPUT(‘_ibint’||COMPRESS(PUT(_n_, BEST.)),

COMPRESS(variable1||’*’||variable2));

IF last THEN

CALL SYMPUT(‘_nibint’, COMPRESS(PUT(_n_, BEST.)));

RUN;

%LET interactionsin =;

%DO iloop = 1 %TO &_nibint;

%LET interactionsin = &interactionsin &&_ibint&iloop;

%END;

%* Run PSMATCH to create PS and derive _strata_ *;

PROC PSMATCH DATA = _indata_ps REGION = ALLOBS;

CLASS _cohort &classvars_bin_model;

PSMODEL _cohort(Treated = “1”) = &contvars &classvars_bin_model

&always_int &interactionsin;

OUTPUT OUT = ps PS = _ps_;

RUN;

PROC SUMMARY DATA = ps NWAY;

CLASS _mergekey _cohort;

VAR _ps_;

OUTPUT OUT = ps MEAN =;

RUN;

PROC PSMATCH DATA = ps REGION = ALLOBS;

CLASS _cohort;

PSDATA TREATVAR = _cohort(Treated = “1”) PS = _ps_;

STRATA NSTRATA = &nstrata KEY = TOTAL;

OUTPUT OUT (OBS = REGION) = ps;

RUN;

DATA ps;

MERGE _indata ps;

BY _mergekey;

RUN;

%* Calculate standardized bias;

%_ps_stddiff_apmb (indata = ps);

%* Calculate IMBALANCE as ABS(stddiff) > &imbal_strata_crit and count

the number of imbalanced over strata per interaction.;

DATA _stddiff;

SET _stddiff;

stddiff = ABS(stddiff);

IF stddiff > &imbal_strata_crit THEN

imbalance = 1;

ELSE imbalance = 0;

IF vartype1 = ‘C’ THEN

DO;

_var1 = UPCASE(REVERSE(variable1));

_var1 = REVERSE(SUBSTR(_var1, INDEX(_var1, ‘_’) +

1));

END;

ELSE _var1 = variable1;

IF vartype2 = ‘C’ THEN

DO;

_var2 = UPCASE(REVERSE(variable2));

_var2 = REVERSE(SUBSTR(_var2, INDEX(_var2, ‘_’) +

1));

END;

ELSE _var2 = variable2;

RUN;

PROC SORT DATA = _stddiff;

BY _var1 _var2;

RUN;

DATA imbalance_old;

SET imbalance_new;

RUN;

PROC SUMMARY DATA = _stddiff NWAY MISSING;

CLASS variable1 _var1 variable2 _var2;

VAR imbalance stddiff;

OUTPUT OUT = imbalance_new SUM = imbalance dum1 MEAN = dum2

stddiff;

RUN;

%* For interaction involving class variable the maximum number and

maximum mean over categories is taken;

PROC SUMMARY DATA = imbalance_new NWAY MISSING;

CLASS _var1 _var2;

VAR imbalance stddiff;

OUTPUT OUT = imbalance_new MAX = imbalance max;

RUN;

%* Macro variable _N_IMBAL_NEW with number of terms (main and

interaction) with more than &imbal_nstrata_crit imbalanced strata is

created;

PROC SQL NOPRINT;

SELECT MEAN(max) INTO: _max_new FROM imbalance_new;

SELECT COMPRESS(PUT(COUNT(max), BEST.)) INTO: _n_imbal_new FROM

imbalance_new WHERE (imbalance >= &imbal_nstrata_crit);

QUIT;

%* If no improvement since last step then remove the term from the

existing terms by removing from dataset ALLINTER and setting

variables IN = 0, OUT = 1 in dataset IMBALANCE.

Select the record from dataset IMBALANCE with the next highest number

of imbalanced strata and the highest mean standard bias. This term

will be added in next step;

%IF NOT(&&_n_imbal_new < &_n_imbal) %THEN

%DO;

%LET _added = NOT ADDED;

DATA allinter;

SET allinter;

IF iter ^= &count;

RUN;

DATA imbalance_out;

SET imbalance_old (OBS = 1);

in = 0;

out = 1;

KEEP _var1 _var2 in out;

RUN;

DATA imbalance;

MERGE imbalance imbalance_out;

BY _var1 _var2;

RUN;

PROC SORT DATA = imbalance;

BY out in DESCENDING imbalance DESCENDING max;

RUN;

DATA imbalance_new;

SET imbalance (WHERE = (imbalance >=

&entry_nstrata_crit AND NOT in

AND NOT out) OBS = 1);

IF NOT(in OR out);

DROP in out;

RUN;

%END;

%* If improvement since last step then add term to the terms to stay

in the model. In dataset IMBALANCE var IN is set to 1.

Macro variable _N_IMBAL is updated to &_N_IMBAL_NEW. Dataset

IMBALANCE_NEW is created with the next term to be added.;

%ELSE

%DO;

%LET _added = ADDED;

DATA imbalance_keep;

SET imbalance_new;

step = &count;

RUN;

DATA imbalance;

MERGE imbalance (DROP = max imbalance)

imbalance_new (KEEP = _var1 _var2 max

imbalance WHERE = (_var2 ^= ‘’))

imbalance_old (KEEP = _var1 _var2 IN =

innew OBS = 1);

BY _var1 _var2;

out = .;

IF innew THEN

in = 1;

RUN;

%LET _n_imbal = &_n_imbal_new;

%LET _max = &&_max_new;

PROC SORT DATA = imbalance (WHERE = (in OR out)) OUT =

imbalance_prev (KEEP = _var1 _var2) NODUPKEY;

BY _var1 _var2;

RUN;

DATA imbalance_new;

MERGE imbalance_prev (IN = inp) imbalance_new

(WHERE = (_var2 ^= ‘’ AND imbalance >=

&entry_nstrata_crit));

BY _var1 _var2;

IF NOT inp;

keep = _var1;

_var1 = _var2;

_var2 = keep;

DROP keep;

RUN;

PROC SORT DATA = imbalance_new;

BY _var1 _var2;

RUN;

DATA imbalance_new;

MERGE imbalance_prev (IN = inp) imbalance_new

(WHERE = (_var2 ^= ‘’ AND imbalance >=

&entry_nstrata_crit));

BY _var1 _var2;

IF NOT inp;

keep = _var1;

_var1 = _var2;

_var2 = keep;

DROP keep;

RUN;

%* Select the interaction with the highest sum of

std.diffs. This one is the one to add;

PROC SORT DATA = imbalance_new;

BY imbalance max;

RUN;

DATA imbalance_new;

SET imbalance_new END = last;

IF last;

RUN;

%END;

%* If interaction term involves one or two class variable, get all

indicator variables to add to model;

PROC SORT NODUPKEY DATA = _stddiff (KEEP = _var1 variable1 _var2

variable2 vartype: WHERE = (_var2 ^= ‘’)) OUT = _vars;

BY _var1 _var2 variable1 variable2;

RUN;

DATA imbalance_new;

MERGE _vars imbalance_new (IN = in);

BY _var1 _var2;

IF in;

RUN;

DATA imbalance_new;

SET imbalance_new;

BY _var1 _var2 variable1 variable2;

IF vartype2 = ‘C’ AND LAST.variable1 THEN

DELETE;

RUN;

PROC SORT DATA = imbalance_new;

BY _var2 _var1 variable2 variable1;

RUN;

DATA imbalance_new;

SET imbalance_new;

BY _var2 _var1 variable2 variable1;

IF vartype1 = ‘C’ AND LAST.variable2 THEN

DELETE;

RUN;

PROC SORT DATA = imbalance_new;

BY _var1 variable1 _var2 variable2;

RUN;

PROC SORT DATA = imbalance;

BY _var1 _var2;

RUN;

* Finalize IMBALANCE_NEW and check if there is any more terms to add;

%LET new_n_inter = 0;

DATA imbalance_new;

SET imbalance_new END = last;

IF last THEN

CALL SYMPUT(‘new_n_inter’, COMPRESS(PUT(_n_, BEST.)));

RUN;

%* Dataset ALLINTER contains all interaction terms to be added in the

next step;

DATA allinter;

SET allinter imbalance_new (IN = in);

IF in THEN

iter = &count + 1;

RUN;

%PUT STEP &count: #imbalanced: &_n_imbal - &&_ibint&_nibint &_added;

%END;

%* Check whether convergence is met, i.e. no more new interactions term

available for selection;

%IF &new_n_inter > 0 %THEN

%DO;

%PUT %STR(ERR)OR: Maximum number of iteration reached and no

convergence yet!;

%GOTO endmac;

%END;

%* Run PSMATCH for final model to create PS *;

DATA _NULL_;

SET allinter END = last;

CALL SYMPUT(‘_ibint’||COMPRESS(PUT(_n_, BEST.)),

COMPRESS(variable1||’*’||variable2));

IF last THEN

CALL SYMPUT(‘_nibint’, COMPRESS(PUT(_n_, BEST.)));

RUN;

%LET interactionsin =;

%DO iloop = 1 %TO &_nibint;

%LET interactionsin = &interactionsin &&_ibint&iloop;

%END;

OPTIONS &_notes &_mprint &_mlogic &_symbolgen;

PROC PSMATCH DATA = _indata_ps REGION = ALLOBS;

CLASS _cohort &classvars_bin_model;

PSMODEL _cohort(Treated = “1”) = &contvars &classvars_bin_model

&always_int &interactionsin;

OUTPUT OUT = ps PS = _ps_;

RUN;

PROC SUMMARY DATA = ps NWAY;

CLASS _mergekey;

VAR _ps_;

OUTPUT OUT = ps (DROP = _type_ _freq_) MEAN =;

RUN;

%*If convergence has been reached then create output dataset with propensity

score and information about the method used.;

PROC SORT DATA = imbalance (WHERE = (in AND NOT out)) OUT = imb NODUPKEY;

BY _var1 _var2;

RUN;

PROC CONTENTS DATA = _indata_keep NOPRINT OUT = _cont;

RUN;

PROC SQL;

CREATE TABLE _inter1 AS SELECT a.name AS _var1, b._var2

FROM _cont AS a, imb AS b

WHERE UPCASE(a.name) = b._var1;

CREATE TABLE _inter AS SELECT b._var1, a.name AS _var2

FROM _cont AS a, _inter1 AS b

WHERE UPCASE(a.name) = b._var2;

QUIT;

DATA _NULL_;

SET _inter END = last;

CALL SYMPUT(‘_int’||COMPRESS(PUT(_N_, BEST.)),

COMPRESS(_var1||’*’||_var2));

IF last THEN

CALL SYMPUT(‘_n_int’, COMPRESS(PUT(_N_, BEST.)));

RUN;

%LET interactions =;

%DO iloop = 1 %TO &_n_int;

%LET interactions = &interactions &&_int&iloop;

%END;

PROC SUMMARY DATA = imbalance_keep NWAY;

VAR max;

OUTPUT OUT = stat min = min mean = mean median = median max =;

RUN;

DATA _NULL_;

SET stat;

CALL SYMPUT(‘_stats’, COMPBL(‘Standardized Bias: MEAN: ‘||PUT(mean,

8.2)||’; MIN: ‘||PUT(min, 8.2)||’; MEDIAN: ‘||PUT(median,

8.2)||’; MAX: ‘||PUT(max, 8.2)||’.’));

RUN;

PROC SUMMARY DATA = imbalance_keep (WHERE = (_var2 = ‘’)) NWAY;

VAR max;

OUTPUT OUT = stat_main min = min mean = mean median = median max =;

RUN;

DATA _NULL_;

SET stat_main;

CALL SYMPUT(‘_stats_main’, COMPBL(‘Standardized Bias: MEAN:

‘||PUT(mean, 8.2)||’; MIN: ‘||PUT(min, 8.2)||’; MEDIAN:

‘||PUT(median, 8.2)||’; MAX: ‘||PUT(max, 8.2)||’.’));

RUN;

DATA &outdata;

MERGE _indata_keep %IF &_ps_exist = 1 %THEN (DROP = &ps &ps._:);

ps (RENAME = (_ps_ = &ps));

BY _mergekey;

DROP _mergekey;

&ps._details = “Propensity Scores Calculation Details: Method:

Automatic Parametric Model Building.”;

&ps._cohort = “&cohort”;

&ps._treated = &treated;

&ps._details_settings = COMPBL(“Imbalance criterion:

&imbal_nstrata_crit strata (Entry &entry_nstrata_crit) >

&imbal_strata_crit; “||

“#Strata: &nstrata; Key: TOTAL; Region: ALLOBS”);

&ps._details_stats = COMPBL(“Number imbalanced at start:

&_n_imbal_start; Number imbalance at end: &_n_imbal; Number of

steps: &count; Standardardized bias summary for all terms:

&_stats; Standardized bias summary for main terms only

&_stats_main.”);

%IF %BQUOTE(&classvars) ^= %THEN

&ps._classvars = “Categorical covariates used for propensity

scores: %TRIM(&classvars).”;;

%IF %BQUOTE(&contvars) ^= %THEN

&ps._contvars = “Continuous covariates used for propensity

scores: %TRIM(&contvars).”;;

%IF %BQUOTE(&interactions) ^= %THEN

&ps._interactions = “Interactions used for propensity scores:

%TRIM(&interactions).”;;

RUN;

%endmac:

%* Clean-up;

ODS LISTING;

/*~~ PROC DATASETS LIBRARY = work NODETAILS NOLIST;

DELETE imbalance imb imbalance_new imbalance_old imbalance_prev imbalance_out stddiff1_ stddiff2_ stddiff0_ _cohort_psm _cont _contents_psm _indata _indata_int _indata_mi _indata_ps _inter _inter1 _mean _mean0 _mean1 _mean2 _meanclass _meanclass_t _meancont _meancont_t _mean_t _nmiss _stddiff ps allinter imbalance_keep stat stat_main _indata_keep _vars;

QUIT;*/

%MEND ps_calc_apmb;

There are also other proposed model selection methods for the propensity score estimation. For instance, Hirano and Imbens (2001) proposed one model selection algorithm that combines propensity score weighting and linear regression modeling that adjusts for covariates. This algorithm selects the propensity score model by testing the strength of association between a single covariate (or a single higher-order term or a single interaction) and intervention options, and pre-specified t-statistic values were used to measure the strength. The terms strongly associated with the intervention options are included in the final propensity score model.

Imbens and Rubin (2015) proposed an iterative approach in constructing the propensity score model. First, the covariates that are viewed as important for explaining the intervention assignment and possibly related to the outcomes will be included. Second, the remaining covariates will be added to the model iteratively based on likelihood ratio statistics that test the hypothesis whether the added single covariate would have a coefficient of 0. Last, the higher-order and interactions of the single covariates selected in the second step will be added to the existing model iteratively and will be included if the likelihood ratio statistics exceed a pre-specified value.

However, for these two methods, the authors do not provide specific guidelines for selecting values for the t-statistic values or the likelihood ratio statistic values. Instead, they consider a range of values and a range of the corresponding estimated treatment effects. Those issues made it difficult to implement this approach as an automatic model selection approach for propensity score estimation.

Nonparametric Models

In parametric modeling, we always assume a data model with unknown parameters and use the data to estimate those model parameters. Therefore, a mis-specified model can cause significant bias in estimating propensity scores. Contrary to the parametric approach, nonparametric models build the relationship between an outcome and predictors through a learning algorithm without an a priori data model. Classification and regression trees (CART) are a well-known example of a nonparametric approach. To estimate propensity score, they partition a data set into regions such that within each region, observations are as homogeneous as possible so that they will have similar probabilities of receiving treatment. CART has advantageous properties, including the ability to handle missing data without imputation and is insensitive to outliers. Additionally, interactions and non-linearities are modeled naturally as a result of the partitioning process instead of a priori specification. However, CART has difficulty in modeling smooth functions and is sensitive to overfitting.

To remedy these limitations, several approaches have been proposed, such as the pruned CART to address overfitting. Bootstrap aggregated (bagged) CART involves fitting a CART to a bootstrap sample with replacement and of the original sample size, repeated many times. For each observation, the number of times it is classified into a category by the set of trees is counted, with the final assignment of the treatment based on an average or majority vote over all the trees. Random forests are similar to bagging, but they use a random subsample of predictors in the construction of each CART.

Another approach, boosted CART, has been shown to outperform alternative methods in terms of prediction error. The boosted CART goes through multiple iterations of tree fitting on random subsets of the data like the bagged CART or random forest. However, with each iteration, a new tree gives greater priority to the data points that were incorrectly classified with the previous tree. This method adds together many simple functions to estimate a smooth function of a large number of covariates. While each individual simple function might be a poor approximation to the function of interest, together they are able to approximate a smooth function just as a sequence of linear segments can approximate a smooth curve.

As McCaffrey et al. (2004) suggested, the gradient boosting algorithm should stop at the number of iterations that minimizes the average standardized absolute mean difference (ASAM) in the covariates. The operating characteristics of these algorithms depends on hyper-parameter values that guide the model development process. The default values of these hyper-parameters might be suitable for some but not for other applications. While xgboost (Chen, 2015, 2016) has been in the open-source community for several years, SAS Viya provides its own gradient boosting CAS action (gbtreetrain) and accompanying procedure (PROC GRADBOOST). Both are similar to xgboost, yet have some nice enhancements sprinkled throughout. One huge bonus is the auto-tuning feature, which is the AUTOTUNE statement in GRADBOOST, and it could help identifying the best settings for those hyper-parameters in each individual user cases, so that researchers do not need to manually adjust the hyper-parameters. Notice that PROC GRADBOOST aims to minimize the prediction error but not ASAM, and more research need to be done to understand how to optimize PROC GRADBOOST when the criteria is ASAM in the covariates. Program 4.10 illustrates how to use GRADBOOST for building the boosted CART model.

Program 4.10: Gradient Boosting Model for Propensity Score Estimation

* gradient boosting for PS estimation: tune hyper-parameters, fit the tuned model, and obtain PS;

proc gradboost data=REFL seed=117 earlystop(stagnation=10);

autotune kfold=5 popsize=30;

id subjid cohort;

target cohort/level=nominal;

input

Gender

Race

DrSpecialty/level=nominal;

input

DxDur

Age

BMI_B

BPIInterf_B

BPIPain_B

CPFQ_B

FIQ_B

GAD7_B

ISIX_B

PHQ8_B

PhysicalSymp_B

SDS_B/level=interval;

output out=mycas.dps;

run;

* our focus is on PS=P(opioid);

data lib.dps;

set mycas.dps;

PS=P_Cohortopioid;

run;

4.2.4 The Criteria of “Good” Propensity Score Estimate

A natural question to ask is which of the three propensity score estimation approaches should be used in a particular study – and there is no definitive answer to this question. Parametric models are easier to interpret, and the a priori model approach allows researchers to incorporate the knowledge outside the data to the model building, for example, clinical evidence on which variable should be included. However, the risk of model mis-specification is not ignorable. The nonparametric CART approach performs well in predicting the treatment given the data, especially the boosted CART approach. In addition, CARTs handle the missing data naturally in the partition process so that they don’t require imputation of the missing covariate values. However, the CART approach is not as interpretable as the parametric modeling approach and prior knowledge is difficult to incorporate as the CARTs are a data-driven process. We suggest the researchers assess the quality of the propensity score estimates and use the desired quality to drive the model selection. For the remaining of this section, we will discuss some proposed criteria of evaluating the quality of propensity score estimates.

As a reminder of what we presented earlier in this chapter, the ultimate goal of using propensity scores in observational studies is to create the balance in distributions of the confounding covariates between the treatment and control groups. Thus, a “good” propensity score estimate should be able to induce good balance between the comparison groups. Imbens and Rubin (2015) provided the following approach in assessing such balance.

1. Stratify the subjects based on their estimated propensity score. For more details, see section 13.5 of Imbens and Rubin (2015).

2. Assess the global balance for each covariate across strata. Calculate the sampling mean and variance of the difference of the th covariate between treatment and control group within each strata, then use the weighted mean and variance of to form a test statistic for the null hypothesis that the weighted average mean is 0. Under the null hypothesis, the test statistic is normally distributed. Therefore, if the z-values are substantially larger than the absolute value of 1, then the balance of is achieved.

3. Assess the balance for each covariate within each stratum (for all strata). Calculate the sample mean of the th covariate of the control group and its difference with the treatment group for the th stratum (), further calculate the weighted sum of the mean of the th covariate for the treatment and control groups. An F statistic can then be constructed to test the null hypothesis that the mean for the treated subpopulation is identical to that of the mean of the control subpopulation in the th stratum.

4. Assess the balance within each stratum for each covariate. Similar to the first step, but construct the statistic for covariates and J strata. Therefore, a total test statistic value will be generated, and it will be useful to present Q-Q plots, comparing with those values. If the covariates are well-balanced, we would expect the Q-Q plots to be flatter than a 45⁰ line.

In general, it is not clear how to find the “best” set of propensity score estimates for a real world study. In some cases, the balance assessments might show one model to be clearly better than another. However, in other cases, some models may balance better for some covariates and less well for others (and not all covariates are equally important in controlling bias). As a general rule, as long as the estimated propensity scores induce reasonable balance between the treated and control group, it can be considered as a “good” estimate, and the researchers should be able to use the estimated propensity score to control the bias caused by the confounding covariates in estimating causal treatment effect. In Chapter 5, we will discuss those statistics as quality check of propensity score estimates in more detail.

4.3 Example: Estimate Propensity Scores Using the Simulated REFLECTIONS Data

In the REFLECTIONS study described in Chapter 3, the researchers were interested in comparing the BPI pain score at one year after intervention initiation between patients initiating opioids versus those who initiated all other (non-opioid) interventions. Based on the DAG assessment presented earlier, the following covariates were considered as important confounders: age, gender, race, body mass index (BMI), doctor specialty, and baseline scores for pain severity (BPI-S), pain interference (BPI-I), disease impact (FIQ), disability score (SDS), depression severity (PHQ-8), physical symptoms (PHQ-15), anxiety severity (GAD-7), insomnia severity (ISI), cognitive functioning (MGH-CPFQ), and time since initial therapy (DxDur). The propensity score is the probability of receiving opioid given these covariates. For covariates with missing values, after initial assessment, only the variable DxDur had missing values and the number of subjects with missing DxDur value is over 100 (133 out of 1000). For demonstration purposes in this chapter, we will only use MI to impute the missing DxDur for propensity score estimation. Readers are able to implement MP and MIMP methods for missing data imputation using the SAS code presented earlier in this chapter. The sections demonstrate the a priori, automated model building, and gradient boosting approaches to estimating the propensity scores. Only histograms of the propensity scores are presented, and a full evaluation of the quality of the models is withheld until Chapter 5.

4.3.1 A Priori Logistic Model

First, a logistic regression model with the previously described covariates as main effects was constructed to estimate propensity score. No interactions or other high order terms were added to the model since there is no strong clinical evidence suggesting the existence of those terms. Program 14.1 implements an a priori model such as this. The estimated propensity score distribution between opioid and non-opioid group is shown in Figure 4.2. Note, code for this mirrored histogram plot is presented in Chapter 5.

Figure 4.2: The Distribution of Estimated Propensity Score Using an a Priori Logistic Model

From the histogram of the distributions, we can see that the opioid group has higher estimated propensity scores compared with those of non-opioid group. It is not surprising because the propensity score estimated is the probability of receiving opioid, so in theory the opioid group should have higher probability of receiving the opioid as long as there are factors related to treatment selection in the model. For the opioid group, there are very few subjects with very little chance of receiving opioid (propensity score < 0.1). For the non-opioid group, quite a few subjects had very little chance of receiving opioid, which skewed the distribution of estimated propensity score to 0, that is, less likely to receive an opioid.

4.3.2 Automatic Logistic Model Selection

Second, an automatic logistic model selection approach was implemented. (See Program 4.10.) In addition to the main effect, interactions were added to the model iteratively if the added one was able to reduce the number of total imbalanced strata. In this example, we select five strata and determine a covariate is imbalanced if the standardized difference is more than 0.25 on 2 or more strata. An interaction term was added if it was still imbalanced at current iteration. The iterative process will stop if the added interactions cam no longer reduce the number of total imbalanced strata. The estimated propensity score distribution between opioid and non-opioid group is shown in Figure 4.3.

Figure 4.3: The Distribution of Estimated Propensity Score Using Automatic Logistic Model Selection

The automatic logistic model selection resulted in similar distributions of the estimated propensity scores compared with the ones generated by the a priori logistic model, although the number of subjects who had very low propensity score estimates (< 0.1) in the non-opioid group increased a little bit. The output data from Program 4.8 provides all the details about this automatic model selection procedure (not shown here). In our case, six interactions are included in the final propensity score estimation model, and the model was able to reduce the number of imbalanced interaction terms from 48 to 34.

4.3.3 Boosted CART Model

Lastly, a boosted CART propensity score model was constructed with PROC GRADBOOST. (See Program 4.9.) The cross validated (5 folds) tuning of the hyper-parameters was done using genetic algorithm (population size=30) with the misclassification error as the objective function. The early stopping rule has been applied in order to stop model fitting if there is no improvement on objective function in 10 iterations. The missing data are handled by default, so no imputation is needed.

The gradient boosting model resulted in similar distributions of the estimated propensity scores compared with the ones generated by a priori logistic model or automatic selected logistic model, as shown in

Figure 4.4.

Figure 4.4: The Distribution of Estimated Propensity Score Using Gradient Boosting

4.4 Summary

This chapter introduced the propensity score, a commonly used method when estimating causal treatment effects in non-randomized studies. This included a brief presentation of its theoretical properties to explain why the use of propensity scores is able to reduce bias in causal treatment effect estimates. Key assumptions of propensity scores were provided so that researchers can better evaluate the validity of the analysis results if propensity score methods were used. If some assumptions were violated, sensitivity analysis should be considered to assess the impact of such violation. Later in the book in Chapter 13, we will discuss the existence of unmeasured confounding and the appropriate approach to address this issue.

The main focus of the chapter was providing guidance and SAS code for implementation for estimating propensity scores – the true propensity score of a subject is usually unknown in observational research. Key steps covered in the discussion included: (1) selection of covariates included in the model, (2) addressing missing covariates values, (3) selection of an appropriate modeling approach, and (4) assessment of quality of the estimated propensity score. For each element, possible approaches were discussed and recommendations made. We also provided SAS code to implement the best practices. We applied selected methods to estimate propensity scores of the intervention groups using the simulated real world REFLECTIONS data. The propensity score estimates will be further used to control for confounding bias in estimating the causal treatment effect between opioid and non-opioid groups via matching (Chapter 6), stratification (Chapter 7), and weighting (Chapter 8).

References

Albert A, Anderson JA (1984). On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71, 1.

Brookhart MA, et al. (2006). Variable selection for propensity score models. American journal of epidemiology 163(12): 1149-1156.

Caliendo M, Kopeinig S (2008). Some practical guidance for the implementation of propensity score matching. Journal of economic surveys 22.1: 31-72.

Chen T, Guestrin C (2015). XGBoost: Reliable Large-scale Tree Boosting System. http://learningsys.org/papers/LearningSys_2015_paper_32.pdf. Accessed Nov. 14, 2019.

Chen T, Guestrin C (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining – KDD ’16. https://arxiv.org/abs/1603.02754.

Cochran WG (1972). Observational studies. Statistical Papers in Honor of George W. Snedecor, ed. T.A. Bancroft. Iowa State University Press, pp. 77-90.

D’Agostino R, Lang W, Walkup M, Morgon T (2001). Examining the Impact of Missing Data on Propensity Score Estimation in Determining the Effectiveness of Self-Monitoring of Blood Glucose (SMBG). Health Services & Outcomes Research Methodology 2:291–315.

D’Agostino Jr, RB, Rubin DB (2000). Estimating and using propensity scores with partially missing data. Journal of the American Statistical Association 95.451: 749-759.

Dehejia, RH, Wahba S (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association 94.448: 1053-1062.

Dehejia, RH, Wahba S (2002). Propensity score-matching methods for nonexperimental causal studies. Review of Economics and Statistics 84.1: 151-161.

Dusetzina, SB, Mack CD, Stürmer T (2013). Propensity score estimation to address calendar time-specific channeling in comparative effectiveness research of second generation antipsychotics. PloS one 8.5: e63973.

Hansen, BB (2008). The prognostic analogue of the propensity score. Biometrika 95.2: 481-488.

Heinze G, Schemper M (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21.16: 2409-2419.

Hill J (2004). Reducing bias in treatment effect estimation in observational studies suffering from missing data. ISERP Working Papers, 04-01.

Hirano K, Imbens GW (2001). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services and Outcomes research methodology 2.3-4: 259-278.

Ibrahim J, Lipitz S, Chen M (1999). Missing covariates in generalized linear models when the missing data mechanism is nonignorable, Journal of the Royal Statistical Society. Series B (Statistical Methodology) 61:173-190.

Leacy, FP, Stuart EA (2014).”On the joint use of propensity and prognostic scores in estimation of the average treatment effect on the treated: a simulation study. Statistics in medicine 33.20: 3488-3508.

Mack, CD et al. (2013). Calendar time‐specific propensity scores and comparative effectiveness research for stage III colon cancer chemotherapy. Pharmacoepidemiology and drug safety 22.8: 810-818.

McCaffrey DF, Ridgeway G, Morral AR (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods 9.4: 403.

Mitra R, Reiter JP (2011). “Estimating propensity scores with missing covariate data using general location mixture models.” Statistics in Medicine 30.6: 627-641.

Nguyen T, Debray TPA (2019). “The use of prognostic scores for causal inference with general treatment regimes.” Statistics in Medicine 38.11: 2013-2029.

Pearl J (2000). Causality: models, reasoning and inference. Vol. 29. Cambridge: MIT Press.

Petri H, Urquhart J (1991). Channeling bias in the interpretation of drug effects. Statistics in Medicine 10(4): 577-581.

Qu Y, Lipkovich I (2009). “Propensity score estimation with missing values using a multiple imputation missingness pattern (MIMP) approach.” Statistics in Medicine28.9: 1402-1414.

Rosenbaum PR and Rubin DB (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70: 41-55.

Rosenbaum PR, Rubin DB (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association 79.387: 516-524.

Rubin DB (1978). Multiple imputations in sample surveys-a phenomenological Bayesian approach to nonresponse. Proceedings of the survey research methods section of the American Statistical Association. Vol. 1. American Statistical Association, 1978.

Rubin DB (2001). Using propensity scores to help design observational studies: application to the tobacco litigation. Health Services and Outcomes Research Methodology 2.3-4: 169-188.

Shrier I, Platt RW, Steele RJ (2007). Re: Variable selection for propensity score models. American journal of epidemiology 166(2): 238-239.

Подняться наверх