Statistical Methods for Information Synthesizing Using Multiple Existing Longitudinal Cohort Studies
- Funded by National Institutes of Health (NIH)
- Total publications: 0
Grant number: 1R01HL173153-01A1
Key facts
Disease
COVID-19
Start & end year
2025-2029
Known Financial Commitments (USD)
$500,660
Funder
National Institutes of Health (NIH)
Principal Investigator
Yifei Sun
Research Location
United States of America
Lead Research Institution
COLUMBIA UNIVERSITY HEALTH SCIENCES
Research Priority Alignment
N/A
Research Category
13
Research Subcategory
N/A
Special Interest Tags
N/A
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
PROJECT SUMMARY Building on the valuable groundwork laid by the Collaborative Cohort of Cohorts for COVID-19 Research (C4R), this research project aims to advance the statistical methods used in pooled cohort studies. Pooled cohort studies are a powerful tool in clinical and epidemiological research, enabling the detection of subtle effects and interactions and improving the generalizability of findings through increased sample diversity. However, they pose unique challenges, particularly systematic missing data and potential heterogeneity across studies. Our goal is to address these challenges and improve the robustness of pooled cohort studies. To achieve this goal, we have structured four specific aims. Under Aim 1, we propose a novel Generalized Method of Moments (GMM) framework for robust statistical inference across multiple studies with systematically missing data. We will investigate the missing data mechanism across multiple samples, employing density ratio weighting to handle heterogeneity in covariate and outcome distributions. Under Aim 2, we propose nonparametric predictive models that leverage data from multiple studies with systematically missing data. We will develop a gradient boosting algorithm for a versatile prediction model that accommodates predictors of varying detail. Additionally, we will design algorithms for cohort-specific prediction models that take advantage of information from other cohorts. Aim 3 extends the proposed methods for systematically missing data and cohort heterogeneity to right-censored time-to-event data. Under Aim 4, we will perform comprehensive evaluations through simulations and real data analyses, and develop user-friendly analytical pipelines for the proposed methods. Our research design and methods are centered on developing, testing, and refining these new statistical methods. These methods will then be evaluated both via simulation and real-world application to the C4R data.
The long-term objective is to establish reliable tools for integrating multiple cohorts and conducting individual participant data meta-analysis. The development of these robust statistical methods and a systematic pipeline for the pooled analysis of systematically missing data will provide valuable tools for researchers working with pooled cohort data. This project will enhance the validity and reliability of findings from the C4R study, and thereby contribute to a more accurate understanding of risk and resilience factors for COVID-19 severity and outcomes. Our findings will be disseminated widely, including the development of user-friendly software to facilitate the application of our proposed methods.