Decentralized differentially-private methods for dynamic data release and analysis
- Funded by National Institutes of Health (NIH)
- Total publications: 0
Grant number: 9R01LM013712-05A1
Key facts
Disease
COVID-19
Start & end year
2022 - 2022
Known Financial Commitments (USD)
$647,096
Funder
National Institutes of Health (NIH)
Principal Investigator
Xiaoqian Jiang
Research Location
United States of America
Lead Research Institution
UNIVERSITY OF CALIFORNIA, SAN DIEGO
Research Priority Alignment
N/A
Research Category
Health Systems Research
Research Subcategory
Health information systems
Special Interest Tags
Data Management and Data Sharing
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Project Summary
Large data sets are important in the development and evaluation of artificial intelligence (AI) and statistical learning models that predict morbidity, mortality, and other important health outcomes. Healthcare institutions are stewards of their patients' data and want to contribute to the development, evaluation, and utilization of predictive analytics tools. However, they also know that simple "de-identification" per HIPAA rules is not sufficient to protect patient privacy. Additionally, other factors, such as protection of market share, lack of control over who uses shared data and for what purposes, and concerns about patients' reactions to having their data shared without explicit consent, make initiatives such as certain registries and centralized repositories difficult to implement.
We have shown that it is possible to decompose algorithms so that they can run on data that stays at each healthcare center, thus mitigating the concerns about control and potential misuse. In the first phase of this project, we concentrated on demonstrating the accuracy and performance of these algorithms for the study of chronic diseases in which (1) acquisition of new knowledge about the condition is slow (i.e., the disease is well understood, so scientific discoveries are not being published at a rapid pace); and (2) the incidence and presentation of the disease do not vary dramatically from place to place or from person to person.
In this competitive renewal, we propose to develop decentralized predictive models that meet all requirements for chronic diseases, using methods that are also applicable to rapidly evolving acute conditions such as COVID-19. We propose new approaches that allow sites missing certain patient profiles or certain variables to still participate in model learning, evaluation, and implementation.
These new AI algorithms will permit supervised and unsupervised learning across institutions, using data from multiple modalities (e.g., imaging, genomes, laboratory tests), and will allow privacy-protecting record linkage. We will test these algorithms and approaches in data from three highly diverse medical centers across the US: Emory University in Atlanta, University of Texas Health Science Center at Houston, and University of California, San Diego.
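As a rough illustration of what "decomposing an algorithm so that it can run on data that stays at each healthcare center" can look like, the sketch below performs one gradient step of logistic regression across several sites. It is not the project's algorithm: the function names, the per-record gradient clipping, and the Gaussian noise parameter `noise_std` are all assumptions chosen for illustration, and calibrating the noise to a formal differential-privacy budget is left out.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_noisy_gradient(weights, records, clip_norm, noise_std, rng):
    """Runs at one site. Returns (noisy gradient sum, record count).

    Raw records never leave this function; only the clipped and
    noised gradient sum is shared with the coordinator.
    """
    dim = len(weights)
    total = [0.0] * dim
    for features, label in records:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        grad = [(pred - label) * x for x in features]
        # Clip each record's gradient so no single patient dominates.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(dim):
            total[j] += grad[j] * scale
    # Hypothetical noise level; a real system would derive it from
    # a privacy budget, which this sketch does not attempt.
    noisy = [t + rng.gauss(0.0, noise_std) for t in total]
    return noisy, len(records)


def federated_step(weights, sites, lr, clip_norm, noise_std, rng):
    """Runs at the coordinator: combines site summaries into one update."""
    agg = [0.0] * len(weights)
    n_total = 0
    for records in sites:
        noisy, n = local_noisy_gradient(weights, records, clip_norm, noise_std, rng)
        agg = [a + g for a, g in zip(agg, noisy)]
        n_total += n
    return [w - lr * a / n_total for w, a in zip(weights, agg)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy "sites", each holding (features, label) pairs.
    sites = [
        [([1.0, 0.0], 1), ([1.0, 1.0], 1)],
        [([0.0, 1.0], 0), ([0.0, 0.5], 0)],
    ]
    w = [0.0, 0.0]
    for _ in range(50):
        w = federated_step(w, sites, lr=0.5, clip_norm=1.0, noise_std=0.1, rng=rng)
    print(w)
```

The coordinator only ever sees one noisy vector and one count per site, which is the property the abstract relies on to address concerns about control and misuse of shared data.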