Decentralized differentially-private methods for dynamic data release and analysis

Project Summary

Large data sets are important for developing and evaluating artificial intelligence (AI) and statistical learning models that predict morbidity, mortality, and other important health outcomes. Healthcare institutions are stewards of their patients' data and want to contribute to the development, evaluation, and use of predictive analytics tools. However, they also know that simple "de-identification" under HIPAA rules is not sufficient to protect patient privacy. Other factors, such as protection of market share, lack of control over who uses shared data and for what purposes, and concerns about how patients will react to their data being shared without explicit consent, make initiatives such as certain registries and centralized repositories difficult to implement. We have shown that it is possible to decompose algorithms so that they run on data that stays at each healthcare center, which mitigates concerns about control and potential misuse.

In the first phase of this project, we concentrated on demonstrating the accuracy and performance of these algorithms for the study of chronic diseases in which (1) new knowledge about the condition accumulates slowly (i.e., the disease is well understood, so scientific discoveries are not being published at a rapid pace); and (2) the incidence and presentation of the disease do not vary dramatically from place to place or from person to person. In this competitive renewal, we propose decentralized predictive models that meet all the requirements for chronic diseases and are also applicable to rapidly evolving acute conditions such as COVID-19. We also propose new approaches that allow sites missing certain patient profiles or certain variables to still participate in model learning, evaluation, and implementation.
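To illustrate what "decomposing an algorithm so that data stays at each site" can mean, the sketch below fits a logistic regression by Newton-Raphson, where each site computes only its local gradient and Hessian and a coordinator sums them. This is a minimal sketch under stated assumptions, not the project's actual method: the model choice, the function names `local_summaries` and `fit_decentralized`, and the synthetic data are all hypothetical, and the sketch does not add differential-privacy noise to the shared summaries.

```python
import numpy as np

def local_summaries(X, y, beta):
    """Run inside one site (hypothetical helper): gradient and Hessian of the
    logistic log-likelihood at the current coefficients, from local rows only."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    grad = X.T @ (y - p)
    hess = -(X.T * (p * (1.0 - p))) @ X
    return grad, hess

def fit_decentralized(sites, n_features, max_iter=25, tol=1e-8):
    """Coordinator (hypothetical): sum per-site summaries and take Newton steps.
    Only d-vectors and d x d matrices cross site boundaries, never patient rows."""
    beta = np.zeros(n_features)
    for _ in range(max_iter):
        grad = np.zeros(n_features)
        hess = np.zeros((n_features, n_features))
        for X, y in sites:  # in a real deployment, each call runs at a remote site
            g, h = local_summaries(X, y, beta)
            grad += g
            hess += h
        step = np.linalg.solve(hess, grad)  # Hessian is negative definite
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic demo data (hypothetical; no real patient data).
rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 1.0, -2.0])

def make_site(n):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
    return X, y

sites = [make_site(n) for n in (300, 500, 200)]
beta_decentralized = fit_decentralized(sites, n_features=3)

# Reference fit on pooled rows, which the decentralized protocol never assembles.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = fit_decentralized([(X_all, y_all)], n_features=3)
print(beta_decentralized, beta_pooled)
```

Because the log-likelihood's gradient and Hessian are sums over patients, adding up the site-level summaries gives the same Newton step as the pooled computation, so the two estimates agree up to floating-point rounding.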
The new AI algorithms proposed here will permit supervised and unsupervised learning across institutions, using data from multiple modalities (e.g., imaging, genomes, laboratory tests), and will allow privacy-protecting record linkage. We will test these algorithms and approaches on data from three highly diverse medical centers across the US: Emory University in Atlanta, the University of Texas Health Science Center at Houston, and the University of California, San Diego.
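One common way to link records without exchanging raw identifiers is keyed hashing. Each site normalizes a few identifying fields and computes an HMAC with a secret key shared among the sites but not with the party doing the matching, which then compares only the resulting tokens. The sketch below assumes this exact-match approach purely for illustration; the source does not specify the project's linkage method. The field choice, the key, and the helper names (`normalize`, `link_token`, `link_records`) are hypothetical. Exact matching after normalization misses typos, which is why practical systems often use fuzzier encodings.

```python
import hashlib
import hmac
import unicodedata

def normalize(value: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so trivially
    different spellings of the same value produce the same string."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())

def link_token(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Computed at each site: HMAC-SHA256 over the normalized fields.
    Without the key, the linker cannot recompute tokens from guessed identities."""
    message = "|".join(normalize(f) for f in (first, last, dob)).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def link_records(tokens_a: dict, tokens_b: dict) -> list:
    """Run by the linker: pair local record IDs whose tokens are identical."""
    index_b = {tok: rid for rid, tok in tokens_b.items()}
    return [(rid, index_b[tok]) for rid, tok in tokens_a.items() if tok in index_b]

# Hypothetical key and records for illustration only.
KEY = b"hypothetical-shared-secret"
site_a = {
    "A-001": link_token(KEY, "José", "García", "1970-01-02"),
    "A-002": link_token(KEY, "Ann", "Lee", "1985-05-05"),
}
site_b = {
    "B-917": link_token(KEY, " jose ", "GARCIA", "1970-01-02"),
    "B-918": link_token(KEY, "Bob", "Kim", "1990-09-09"),
}
print(link_records(site_a, site_b))
```

The linker sees only 64-character hex tokens and local record IDs, so it learns which records correspond without seeing names or birth dates.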
