Background: Researchers often use the National Death Index (NDI) to trace deaths among study subjects. NDI can return multiple potential matches for an individual, and manual review may be required to determine the most likely match. This ongoing drug safety study uses claims data maintained in a distributed common data model by 12 large health insurers/data partners (DPs), with central programming provided to the DPs for analyses. DPs submit a large number of study subjects to NDI and may receive a very large number of potential matching deaths overall. Central programming of a common automated algorithm (AA) designed to select the most likely true match for each patient provides an opportunity to make the review of NDI results transparent, consistent, and efficient.
Objectives: To describe the AA used, evaluate the efficiency of this approach, and contrast this approach with NDI’s standard approach to selecting the most likely true match.
Methods: DPs will submit identifiers for study subjects to NDI for death tracing. Each DP will then apply a centrally developed AA to the full set of possible matches returned by NDI and select the most likely true match for each patient. The AA is based on an algorithm used widely by US cancer registries.
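For illustration only, the selection step can be sketched in Python as follows. This is not the study's AA or the cancer registry algorithm it is based on, neither of which is specified here. The record fields (ssn_agrees, dob_agrees, name_agrees, score), the agreement weights, and the min_points threshold are all hypothetical assumptions chosen to show the general shape of picking one candidate per subject.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Candidate:
    """One potential death record returned for a submitted subject.

    All field names are hypothetical; they do not mirror an actual NDI file layout.
    """
    subject_id: str
    death_record_id: str
    ssn_agrees: bool
    dob_agrees: bool
    name_agrees: bool
    score: float  # assumed probabilistic match score, higher is better


def agreement_points(c: Candidate) -> int:
    """Sum illustrative weights for each identifier that agrees (weights are assumptions)."""
    return 3 * c.ssn_agrees + 2 * c.dob_agrees + 1 * c.name_agrees


def select_matches(
    candidates: Iterable[Candidate], min_points: int = 3
) -> Dict[str, Optional[Candidate]]:
    """Return the single most likely candidate per subject, or None if none qualifies."""
    by_subject: Dict[str, List[Candidate]] = defaultdict(list)
    for c in candidates:
        by_subject[c.subject_id].append(c)

    selected: Dict[str, Optional[Candidate]] = {}
    for subject_id, group in by_subject.items():
        # Drop candidates with too little identifier agreement.
        eligible = [c for c in group if agreement_points(c) >= min_points]
        if not eligible:
            selected[subject_id] = None
            continue
        # Rank by agreement first, then by the assumed probabilistic score.
        selected[subject_id] = max(
            eligible, key=lambda c: (agreement_points(c), c.score)
        )
    return selected
```

Because the same deterministic rule is applied to every candidate record, each DP running this kind of routine on its own NDI output would resolve identical inputs in the same way, which is the consistency that central programming is intended to provide.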
Results: We will describe the number of patients submitted by type (vital status known vs. unknown); the average number of potential matches NDI returned per patient; the total number of true matches by type of user record identified using the AA vs. the NDI matching methodology; and the proportion of AA-defined matches for which cause of death was provided (i.e., NDI assigned the match a probabilistic score high enough to be considered a "true match; assumed dead"). We will also discuss logistical challenges and whether the AA eliminates the need for manual review.
Conclusions: The application of a standard AA to assess many potential NDI matches represents a promising approach for increasing the efficiency, reducing the research burden, and standardizing the evaluation of NDI results in the context of large, multisite, complex research collaborations.