CAREER: Advancing the Role of Ontologies for Data Science in Biomedicine
- Funded by National Science Foundation (NSF)
- Total publications: 7
Grant number: 2047001
Key facts
Disease
COVID-19
Start & end year
2021-2026
Known Financial Commitments (USD)
$213,408
Funder
National Science Foundation (NSF)
Principal Investigator
Licong Cui
Research Location
United States of America
Lead Research Institution
The University of Texas Health Science Center at Houston
Research Priority Alignment
N/A
Research Category
13
Research Subcategory
N/A
Special Interest Tags
Data Management and Data Sharing
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
An ontology is a formal representation of concepts (or classes), properties, and relationships between concepts within a knowledge domain. Ontologies and terminologies have played a vital role in biomedical research for coding, managing, sharing, and exchanging the vast amounts of heterogeneous biomedical data that are continuously generated, such as those in Electronic Health Records (EHRs). EHRs have been widely used in translational research to learn predictive models for discovery and disease management across varying patient cohorts. The very first step in such EHR-based applications often concerns patient cohort identification. Cohort identification involves the specification of a collection of eligibility criteria that need to be transformed into a computable representation using the EHR's semantic backbone (i.e., coding systems or ontologies) before queries can run against the EHR database. However, there are two critical barriers to performing effective cohort identification from large-scale EHRs. The first is data (or semantic) heterogeneity, caused by the mixed use of coding systems. The second is the quality of the semantic backbone or ontology hierarchy, which is essential for translating patient eligibility criteria into executable database queries. To address these challenges, this project will develop new methods for ontology matching and for ontology quality enhancement that directly impact data science practice in biomedicine, such as patient cohort identification. In addition, this project will incorporate the proposed computational aspects into data science-based courses to train the next generation of data scientists.
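As a rough illustration of why the ontology hierarchy matters when an eligibility criterion is turned into a database query, the following minimal sketch expands one concept to all of its subclasses and builds a parameterized SQL query from the result. Everything in it is an assumption made for illustration: the concept labels, the `diagnosis` table, and its `patient_id` and `code` columns are hypothetical and do not come from the award or from any real coding system.

```python
from collections import defaultdict

# Hypothetical toy hierarchy given as (child, parent) subclass pairs.
# The labels are invented for illustration and are not real codes.
SUBCLASS_OF = [
    ("viral_pneumonia", "pneumonia"),
    ("covid19_pneumonia", "viral_pneumonia"),
    ("bacterial_pneumonia", "pneumonia"),
    ("pneumonia", "lung_disease"),
]


def descendants(concept, subclass_pairs):
    """Return the concept plus every concept below it in the hierarchy."""
    children = defaultdict(set)
    for child, parent in subclass_pairs:
        children[parent].add(child)
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def cohort_query(concept, subclass_pairs):
    """Build a parameterized query over a hypothetical `diagnosis` table."""
    codes = sorted(descendants(concept, subclass_pairs))
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        "SELECT DISTINCT patient_id FROM diagnosis "
        f"WHERE code IN ({placeholders})"
    )
    return sql, codes


sql, params = cohort_query("viral_pneumonia", SUBCLASS_OF)
print(sql)
print(params)
```

In this toy setting, a missing or wrong subclass link would silently drop or add codes in the query, which is one way hierarchy quality can affect the resulting cohort.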
This project consists of three research objectives. In Objective 1, the PI will develop new graph neural network (GNN)-based learning methods for matching biomedical ontologies by harnessing knowledge embedded in sources such as the Unified Medical Language System. This will address the heterogeneity issue and achieve semantic interoperability. In Objective 2, the PI will develop learning-based methods for detecting quality defects in subclass relations. This will address the quality issue and achieve continued enhancement of ontology hierarchies. In Objective 3, the PI will develop an ontology-based COVID-19 query engine for patient cohort identification, which is a real-world application of enhancing semantic interoperability for supporting data-driven COVID-19 research. To evaluate the proposed methods, domain experts will validate the resulting matched concepts and detected quality issues. The PI will communicate validated quality issues to the respective ontology owners for correction in subsequent ontology versions.
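To make the notion of a quality defect in subclass relations concrete, here is a minimal sketch of one simple structural check: flagging a subclass link that is already implied by a longer path through another parent. This is a rule-based stand-in chosen for illustration, not the learning-based method the project proposes, and the concept labels are hypothetical.

```python
from collections import defaultdict


def redundant_subclass_edges(subclass_pairs):
    """Return (child, parent) links already implied by a longer subclass path."""
    parents = defaultdict(set)
    for child, parent in subclass_pairs:
        parents[child].add(parent)

    def ancestors(start):
        # Collect every concept reachable upward from `start`.
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for p in parents.get(node, ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    redundant = []
    for child, parent in subclass_pairs:
        # The link is redundant if another direct parent of `child`
        # already has `parent` among its ancestors.
        for other in parents[child] - {parent}:
            if parent in ancestors(other):
                redundant.append((child, parent))
                break
    return sorted(redundant)


# Hypothetical example: the direct link to "pneumonia" duplicates the
# path that goes through "viral_pneumonia".
edges = [
    ("covid19_pneumonia", "viral_pneumonia"),
    ("viral_pneumonia", "pneumonia"),
    ("covid19_pneumonia", "pneumonia"),
]
flagged = redundant_subclass_edges(edges)
print(flagged)
```

A check like this only finds one narrow class of issues; missing or incorrect subclass links, which cannot be derived from structure alone, are the harder cases that would still need expert validation.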
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.