BACKGROUND: Criticisms of propensity score matching (PSM) have accumulated in the literature. For example, it has been suggested that continued deletion of matched sets in decreasing order of propensity score distance may increase bias in the effect estimate. We compare PSM with propensity score fine stratification (FS) and with coarsened exact matching (CEM), two alternatives that the literature suggests may be superior to PSM.
OBJECTIVES: To compare PSM with FS and CEM with respect to the validity and precision of effect estimates obtained from claims data.
METHODS: We used data from the Pharmaceutical Assistance Contract for the Elderly database (PACE, n = 49 653, 19 pre‐specified confounders) to assess the association between NSAIDs vs COX‐2 inhibitors and gastrointestinal complications, and data from the Medicaid Analytic eXtract database (MAX, n = 886 996, 20 pre‐specified confounders) to assess the association between statins vs no statins and congenital malformations. PACE was also analyzed with 50 and with 100 additional empirical confounders selected by a high‐dimensional propensity score algorithm. Three techniques were applied to each dataset: (1) 1:1 PSM using a nearest‐neighbor matching algorithm, (2) FS using 10, 50, and 100 strata defined by the propensity score distribution of the exposed, after deleting observations from non‐overlapping propensity score regions, and (3) CEM using an auto‐coarsening technique. Our strategy generated 20 analytic datasets. For each analytic dataset, we compared the resulting relative risks (RR) and standard errors (SE) from weighted log binomial models, as well as the number of units remaining.
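The fine stratification step described in the METHODS can be illustrated with a minimal sketch. It is not the code used in the study: the function name, the use of NumPy, and the weighting scheme (exposed units weighted 1, unexposed units weighted by the exposed-to-unexposed ratio within their stratum) are assumptions chosen for illustration. The abstract does not specify how the weights for the log binomial models were constructed.

```python
import numpy as np

def fine_stratification_weights(ps, exposed, n_strata=50):
    """Hypothetical helper: trim non-overlap, stratify on exposed PS quantiles,
    and return one weight per unit (0 means the unit is dropped)."""
    ps = np.asarray(ps, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)

    # Delete observations from non-overlapping propensity score regions.
    lo = max(ps[exposed].min(), ps[~exposed].min())
    hi = min(ps[exposed].max(), ps[~exposed].max())
    keep = (ps >= lo) & (ps <= hi)

    # Stratum boundaries are quantiles of the PS among retained exposed units.
    edges = np.quantile(ps[keep & exposed], np.linspace(0, 1, n_strata + 1))
    # Interior edges map each PS value to a stratum index in 0..n_strata-1.
    stratum = np.searchsorted(edges[1:-1], ps, side="right")

    weights = np.zeros(ps.shape)
    for s in range(n_strata):
        in_s = keep & (stratum == s)
        n_exp = np.sum(in_s & exposed)
        n_unexp = np.sum(in_s & ~exposed)
        if n_exp == 0 or n_unexp == 0:
            continue  # strata lacking either group contribute no units
        # Assumed weighting: exposed get 1, unexposed are reweighted so that
        # their weighted count in the stratum equals the exposed count.
        weights[in_s & exposed] = 1.0
        weights[in_s & ~exposed] = n_exp / n_unexp
    return weights
```

Because units are reweighted rather than discarded, nearly everyone inside the region of overlap stays in the analysis, which is consistent with the high retention reported for FS.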
RESULTS: For each PACE dataset, FS resulted in a larger analytic dataset (>90% of the original dataset) and a lower SE than PSM and CEM. The RRs from PSM and FS were similar (indicating a ~10% increase in risk with NSAIDs) and consistent with prior evidence from experimental studies, while CEM resulted in larger effect estimates in all cases (indicating up to a 150% increase in risk). The MAX analyses led to similar findings.
CONCLUSIONS: FS performed best in our analyses, combining high retention of study size with low SEs. CEM appears sub‐optimal for claims data, likely because of the large number of binary confounders. As next steps, we will explore different coarsening strategies for CEM, and we will generate plasmode‐simulated datasets, which allow for clearer comparisons based on known effect sizes.
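The point about binary confounders can be made concrete with a short calculation. A binary variable cannot be coarsened further, so each one doubles the number of cells in which CEM requires an exact match. The sketch below assumes, purely for illustration, that all 19 pre‐specified PACE confounders are binary; the abstract does not state this, and the function name is hypothetical.

```python
import math

def max_cem_cells(levels_per_covariate):
    """Upper bound on the number of exact-match cells: product of level counts."""
    return math.prod(levels_per_covariate)

# Assumption for illustration only: every one of the 19 confounders is binary.
n_binary = 19
cells = max_cem_cells([2] * n_binary)
cohort_size = 49653  # PACE sample size reported in the METHODS

print(cells)                # 524288 possible cells
print(cells > cohort_size)  # True: more possible cells than patients
```

Under this assumption the number of possible cells exceeds the cohort size, so many occupied cells would hold only exposed or only unexposed patients, and those patients would be dropped.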