Decentralized differentially-private methods for dynamic data release and analysis

  • Funded by National Institutes of Health (NIH)
  • Total publications: 0

Grant number: 5R01LM013712-07

Key facts

  • Disease

    COVID-19
  • Start & end year

    2023
    2025
  • Known Financial Commitments (USD)

    $608,109
  • Funder

    National Institutes of Health (NIH)
  • Principal Investigator

    Xiaoqian Jiang
  • Research Location

    United States of America
  • Lead Research Institution

    YALE UNIVERSITY
  • Research Priority Alignment

    N/A
  • Research Category

    Health Systems Research

  • Research Subcategory

    Health information systems

  • Special Interest Tags

    Data Management and Data Sharing

  • Study Type

    Non-Clinical

  • Clinical Trial Details

    N/A

  • Broad Policy Alignment

    Pending

  • Age Group

    Not applicable

  • Vulnerable Population

    Not applicable

  • Occupations of Interest

    Not applicable

Abstract

Project Summary

Large data sets are important in the development and evaluation of artificial intelligence (AI) and statistical learning models to predict morbidity, mortality, and other important health outcomes. Healthcare institutions are stewards of their patients' data and want to contribute to the development, evaluation, and utilization of predictive analytics tools. However, they also know that simple "de-identification" per HIPAA rules is not sufficient to protect patient privacy. Additionally, other factors make initiatives such as certain registries and centralized repositories difficult to implement: protection of market share, lack of control over who uses shared data and for what purposes, and concerns about patients' reactions to having their data shared without explicit consent. We have shown that it is possible to decompose algorithms so that they can run on data that stays at each healthcare center, thus mitigating the concerns about control and potential misuse. In the first phase of this project, we concentrated on demonstrating the accuracy and performance of these algorithms for the study of chronic diseases in which (1) acquisition of new knowledge about the condition is slow (i.e., the disease is well understood, so scientific discoveries are not being published at a rapid pace); and (2) the incidence and presentation of the disease do not vary dramatically from place to place or from person to person. In this competitive renewal, we propose to develop decentralized predictive models that meet all requirements for chronic diseases, using methods that are also applicable to rapidly evolving acute conditions such as COVID-19. We propose new approaches that allow sites that are missing certain patient profiles or certain variables to still participate in model learning, evaluation, and implementation.
These new AI algorithms will permit supervised and unsupervised learning across institutions, using data from multiple modalities (e.g., imaging, genomes, laboratory tests), and will allow privacy-protecting record linkage. We will test these algorithms and approaches on data from three highly diverse medical centers across the US: Emory University in Atlanta, University of Texas Health Science Center at Houston, and University of California, San Diego.
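
As an illustration of what "decomposing an algorithm so that it runs on data that stays at each healthcare center" can mean, the following is a minimal sketch and not the project's actual method. It assumes a plain logistic regression trained by gradient descent, where each site returns only a summed gradient and a row count to a coordinator. The function names, the toy data, and the learning rate are all hypothetical, and the sketch adds no differential-privacy noise.

```python
import math


def local_gradient(weights, rows):
    """Sum of logistic-loss gradients over one site's rows.

    Only this sum and the row count are returned; the rows stay at the site.
    """
    grad = [0.0] * len(weights)
    for features, label in rows:
        z = sum(w * x for w, x in zip(weights, features))
        pred = 1.0 / (1.0 + math.exp(-z))
        for j, x in enumerate(features):
            grad[j] += (pred - label) * x
    return grad, len(rows)


def federated_step(weights, sites, lr=0.1):
    """One gradient-descent step; the coordinator sees only per-site sums."""
    total = [0.0] * len(weights)
    n = 0
    for rows in sites:
        grad, count = local_gradient(weights, rows)
        total = [t + g for t, g in zip(total, grad)]
        n += count
    return [w - lr * t / n for w, t in zip(weights, total)]


# Hypothetical toy data: ([bias term, one feature], binary label).
site_a = [([1.0, 0.2], 0), ([1.0, 1.5], 1)]
site_b = [([1.0, 0.9], 1), ([1.0, -0.4], 0), ([1.0, 2.0], 1)]

# Decentralized training: two sites, data never pooled.
weights = [0.0, 0.0]
for _ in range(200):
    weights = federated_step(weights, [site_a, site_b])

# Reference run on pooled data, for comparison only.
pooled = [0.0, 0.0]
for _ in range(200):
    pooled = federated_step(pooled, [site_a + site_b])
```

Because the logistic-loss gradient is a sum over rows, the decentralized and pooled runs agree up to floating-point rounding. That additivity is the property this sketch relies on; algorithms without it need a different decomposition.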