BACKGROUND: One of the promises of the 'big data' era for real-world evidence generation is that we will finally be able to harness unstructured data. Tools and algorithms such as natural language processing and machine learning are relatively new to pharmacoepidemiology, yet they have garnered widespread attention because of claims that they will allow researchers to efficiently extract meaning from content generated by patients, clinicians, regulators, reimbursement agencies, and researchers. While some see these tools as critical to unlocking new data sources, others have urged strong caution in moving away from the existing paradigm. There are many options on the continuum between static code lists developed from structured data and fully automated, dynamic solutions for unstructured data. A discussion is warranted to understand the methodological and operational trade-offs of various approaches to natural language processing and machine learning, and a systematic approach is needed to assess the options.
OBJECTIVES: The goal of this session is to engage skeptics and believers in a dialogue to identify and vet a potential path toward the best use of structured and unstructured data for observational research, including discussion of how validation methods could be developed and updated for rapidly evolving technology.
DESCRIPTION: This symposium will provide a short primer on algorithms and phenotypes, give examples of cross-functional development, present a potential framework for evaluating the feasibility of various approaches, and invite audience participation in vetting the framework as potential guidance for moving forward.