BACKGROUND: Multinational studies can be performed across a distributed network of data sources by converting the original data into a common data model (CDM) and then running a common analysis script at each site. Multiple CDMs have been successfully deployed, but analysis scripts designed to run against one CDM cannot work directly with another CDM. It is therefore useful to develop scripts that can support multiple CDMs. In a past study, the analytic pipelines of multiple CDMs, including OMOP and Sentinel, were conceptually mapped to a sequence of transformation steps: (T1) conversion to the CDM, resulting in a CDM instance; (T2) study variable creation, resulting in datasets of observations on the study population (this is the first step of a study script); (T3) application of the study design, resulting in analytic datasets; and (T4) statistical analysis, resulting in datasets of study results. In practice, scripts can be structured according to this sequence of steps. In such scripts, only step T2 needs to be adapted to the different CDMs.
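The T1 to T4 decomposition can be sketched as follows. This is a minimal Python illustration, not the MINERVA script (which is written in R): the table and column names are an assumed, simplified subset of OMOP-style and ConcePTION-style layouts, and `StudyPerson`, `T2_BY_CDM`, `run_study` and the toy T3/T4 steps are hypothetical names. Only the T2 functions know about CDM layouts; T3 and T4 consume the common `StudyPerson` rows and are written once.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class StudyPerson:
    """Common T2 output: one row per person, whatever the source CDM."""
    person_id: str
    sex: str          # "M" or "F"
    birth_date: date
    entry_date: date  # start of observation in the data source
    exit_date: date   # end of observation in the data source


def t2_omop_like(tables: dict) -> List[StudyPerson]:
    """T2 for an assumed, simplified OMOP-style layout."""
    sex_map = {8507: "M", 8532: "F"}  # OMOP standard concepts for male/female
    periods = {op["person_id"]: op for op in tables["observation_period"]}
    return [
        StudyPerson(
            person_id=str(p["person_id"]),
            sex=sex_map[p["gender_concept_id"]],
            birth_date=date(p["year_of_birth"], p["month_of_birth"], p["day_of_birth"]),
            entry_date=periods[p["person_id"]]["observation_period_start_date"],
            exit_date=periods[p["person_id"]]["observation_period_end_date"],
        )
        for p in tables["person"]
    ]


def t2_conception_like(tables: dict) -> List[StudyPerson]:
    """T2 for an assumed, simplified ConcePTION-style layout ('YYYYMMDD' strings)."""
    def ymd(s: str) -> date:
        return date(int(s[:4]), int(s[4:6]), int(s[6:8]))

    periods = {op["person_id"]: op for op in tables["OBSERVATION_PERIODS"]}
    return [
        StudyPerson(
            person_id=p["person_id"],
            sex=p["sex_at_instance_creation"],
            birth_date=date(int(p["year_of_birth"]), int(p["month_of_birth"]), int(p["day_of_birth"])),
            entry_date=ymd(periods[p["person_id"]]["op_start_date"]),
            exit_date=ymd(periods[p["person_id"]]["op_end_date"]),
        )
        for p in tables["PERSONS"]
    ]


# Only T2 depends on the CDM; T3 and T4 below are written once.
T2_BY_CDM: Dict[str, Callable[[dict], List[StudyPerson]]] = {
    "OMOP": t2_omop_like,
    "ConcePTION": t2_conception_like,
}


def t3_observed_in_year(persons: List[StudyPerson], year: int) -> List[StudyPerson]:
    """Toy T3 (study design): keep persons observed at some point during `year`."""
    return [p for p in persons if p.entry_date.year <= year <= p.exit_date.year]


def t4_count_by_sex(analytic: List[StudyPerson]) -> Dict[str, int]:
    """Toy T4 (analysis): number of persons by sex."""
    return dict(sorted(Counter(p.sex for p in analytic).items()))


def run_study(cdm: str, tables: dict, year: int) -> Dict[str, int]:
    return t4_count_by_sex(t3_observed_in_year(T2_BY_CDM[cdm](tables), year))


# Tiny hypothetical example: the same two persons expressed in both layouts.
omop = {
    "person": [
        {"person_id": 1, "gender_concept_id": 8532, "year_of_birth": 1980, "month_of_birth": 6, "day_of_birth": 15},
        {"person_id": 2, "gender_concept_id": 8507, "year_of_birth": 1950, "month_of_birth": 1, "day_of_birth": 1},
    ],
    "observation_period": [
        {"person_id": 1, "observation_period_start_date": date(2015, 3, 1),
         "observation_period_end_date": date(2017, 12, 31)},
        {"person_id": 2, "observation_period_start_date": date(2016, 7, 1),
         "observation_period_end_date": date(2020, 1, 1)},
    ],
}
conception = {
    "PERSONS": [
        {"person_id": "1", "sex_at_instance_creation": "F", "year_of_birth": "1980", "month_of_birth": "6", "day_of_birth": "15"},
        {"person_id": "2", "sex_at_instance_creation": "M", "year_of_birth": "1950", "month_of_birth": "1", "day_of_birth": "1"},
    ],
    "OBSERVATION_PERIODS": [
        {"person_id": "1", "op_start_date": "20150301", "op_end_date": "20171231"},
        {"person_id": "2", "op_start_date": "20160701", "op_end_date": "20200101"},
    ],
}
# run_study("OMOP", omop, 2016) and run_study("ConcePTION", conception, 2016)
# both return {"F": 1, "M": 1}.
```

Under this design, supporting another CDM, such as Nordic or TheShinISS, would mean registering one more T2 function in `T2_BY_CDM` while leaving T3 and T4 untouched.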
OBJECTIVES: To evaluate the feasibility of retrieving quantitative metadata (age and sex distribution) from data sources mapped to different CDMs.
METHODS: In the European project MINERVA, we simulated a data source containing sex, dates of birth, and dates of data source entry and exit. We converted the data source to 4 commonly used CDMs (OMOP, ConcePTION, Nordic, and TheShinISS) and developed an R analysis script to calculate the annual sex and age distributions of the population. Four versions of step T2 were programmed, one per CDM, each designed to generate the same output. The subsequent steps (T3 and T4) were designed to run on the output of T2 and were programmed only once. The script was run against the 4 conversions of the simulated data source, and the resulting 4 outputs were merged to test whether they were identical. Finally, the script was run against two real instances, CPRD and ARS, converted to the ConcePTION and TheShinISS CDMs, respectively.
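The common T3/T4 computation of annual age and sex distributions from dates of birth, entry and exit might look like the following Python sketch. The definitions are assumptions for illustration, not those of the MINERVA R script: a person is counted in every calendar year their observation overlaps, age is measured in completed years on the first observed day of that year, and ages are grouped into hypothetical 20-year bands with an open top band. The function names and example data are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple

# (person_id, sex, birth_date, entry_date, exit_date): the assumed T2 output.
Person = Tuple[str, str, date, date, date]


def age_in_years(birth: date, on: date) -> int:
    """Age in completed years on a given day."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def age_band(age: int, width: int = 20, top: int = 60) -> str:
    """Hypothetical banding: 0-19, 20-39, 40-59, 60+."""
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def annual_age_sex_distribution(
    persons: Iterable[Person], first_year: int, last_year: int
) -> Dict[Tuple[int, str, str], int]:
    """Count persons observed in each calendar year by sex and age band.

    A person counts in every year their observation period overlaps; age is
    measured on the first observed day of that year (an assumed convention).
    """
    counts: Counter = Counter()
    for _pid, sex, birth, entry, exit_ in persons:
        for year in range(max(first_year, entry.year), min(last_year, exit_.year) + 1):
            first_day = max(date(year, 1, 1), entry)
            counts[(year, sex, age_band(age_in_years(birth, first_day)))] += 1
    return dict(sorted(counts.items()))


# Hypothetical example data (not MINERVA data).
population = [
    ("1", "F", date(1980, 6, 15), date(2015, 3, 1), date(2017, 12, 31)),
    ("2", "M", date(1950, 1, 1), date(2016, 7, 1), date(2020, 1, 1)),
]
for stratum, n in annual_age_sex_distribution(population, 2015, 2017).items():
    print(stratum, n)
```

Because this function only sees the common T2 tuples, the same code would run unchanged whichever CDM the data came from.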
RESULTS: The script took a few hours to develop and is available in a GitHub repository (https://github.com/ARS-toscana/MINERVA_samplescript). The 4 runs on the simulated data source generated the same output dataset (distributions). The script ran successfully against the two real data sources and correctly calculated their annual age and sex distributions.
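One way to check that runs against different CDM conversions produce the same output is to merge them by stratum and flag any stratum whose counts differ. The Python sketch below assumes each run's output is a mapping from a (year, sex, age band) stratum to a count; `merge_and_compare` and the example counts are hypothetical, and a stratum absent from one run is treated as a count of 0 there.

```python
from typing import Dict, List, Tuple

Stratum = Tuple[int, str, str]  # assumed key: (year, sex, age band)
Output = Dict[Stratum, int]     # stratum -> count, for one run


def merge_and_compare(
    outputs: Dict[str, Output]
) -> Tuple[Dict[Stratum, Dict[str, int]], List[Stratum]]:
    """Merge per-CDM outputs on stratum and list strata whose counts disagree.

    A stratum missing from a run is treated as a count of 0 in that run.
    """
    strata = sorted(set().union(*outputs.values()))
    merged = {s: {run: out.get(s, 0) for run, out in outputs.items()} for s in strata}
    mismatches = [s for s, row in merged.items() if len(set(row.values())) > 1]
    return merged, mismatches


# Hypothetical outputs of the same script run on 4 conversions.
runs = {
    "OMOP": {(2016, "F", "20-39"): 1, (2016, "M", "60+"): 1},
    "ConcePTION": {(2016, "F", "20-39"): 1, (2016, "M", "60+"): 1},
    "Nordic": {(2016, "F", "20-39"): 1, (2016, "M", "60+"): 1},
    "TheShinISS": {(2016, "F", "20-39"): 1, (2016, "M", "60+"): 1},
}
merged, mismatches = merge_and_compare(runs)
print("identical" if not mismatches else f"differences in {mismatches}")
```

Treating missing strata as 0 makes a stratum dropped by one T2 implementation show up as a mismatch rather than being silently ignored.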
CONCLUSIONS: It is possible to structure study scripts as a common sequence of steps. This minimizes the effort needed to adapt them to multiple CDMs, because only one step (T2) requires adaptation. Structuring scripts in this way has the potential to support collaboration in studies by enabling the use of multiple CDMs.