LEAPS-MPS: Statistical Learning on Next Generation Sequencing of T/B Cell Receptor Repertoire Data
- Funded by National Science Foundation (NSF)
- Total publications: 0
Grant number: 2137983
Key facts
Disease
COVID-19
Start & end year
2022-2024
Known Financial Commitments (USD)
$245,681
Funder
National Science Foundation (NSF)
Principal Investigator
Tao He
Research Location
United States of America
Lead Research Institution
San Francisco State University
Research Priority Alignment
N/A
Research Category
Pathogen: natural history, transmission and diagnostics
Research Subcategory
Immunity
Special Interest Tags
N/A
Study Type
Non-Clinical
Clinical Trial Details
N/A
Broad Policy Alignment
Pending
Age Group
Not Applicable
Vulnerable Population
Not applicable
Occupations of Interest
Not applicable
Abstract
Two crucial components of the adaptive immune system are T cells and B cells, whose function is to identify and respond to "body invaders" such as coronaviruses or cancer cells. Following each identify-and-respond process, B cells and T cells leave a lifelong legacy on cell surfaces, known as B/T cell receptors (BCRs/TCRs), which the body uses to respond quickly and strongly once the pathogen is detected again. BCR/TCR repertoires, which are continually shaped throughout an individual's lifetime in response to diseases and infections, can also serve as a fingerprint of one's current immunological profile. Recently, new technologies have enabled the profiling of the BCR/TCR repertoire from a single sample of blood or tissue. However, due to the complex nature of repertoire data, there is a need for novel statistical machine learning approaches and computational tools for immune repertoire data analysis. This project will produce statistical analysis methods that will not only help us understand how the immune system responds to disease or infection, but also help advance precision medicine and immunotherapy, where treatments are developed and tailored to an individual for greater efficacy. This research has been designed to engage undergraduate and graduate students of mathematics and statistics, exposing them to the excitement of scientific discovery and preparing them for success in advanced degree programs and careers in academia and industry. By focusing on recruiting and training students from underrepresented groups, the PI will contribute to the diversification of the scientific workforce.
T cells and B cells represent a crucial component of the adaptive immune system and have been shown to mediate anti-humoral immunity and the immune response to respiratory coronavirus. Next generation sequencing of the T and B cell receptors (TCRs and BCRs) can be used as a platform to profile the TCR/BCR repertoire. Because repertoire data are complex (heterogeneous, high-dimensional, and presenting three layers of information: gene usage, abundance, and clone network), very few statistical models and inference tools exist in the literature. Current analysis tools lack the ability to identify the repertoire signatures associated with the outcome of interest or to integrate multiple layers of information. The main goal of this project is to develop advanced statistical and machine learning methods to 1) identify the genes and gene families associated with the outcome using the gene usage layer of the repertoire; 2) prioritize the network properties associated with the outcome using the network layer of the repertoire; 3) integrate the multiple layers of the repertoire to evaluate the joint effect of the heterogeneous repertoire profile on the outcome. In particular, a Bayesian hierarchical model will be developed to identify differential gene usage, a permutation-assisted group lasso will be developed to prioritize both local and global properties for network analysis, and various kernel methods will be used to model the complex relationship between repertoire features and outcome. Simulation studies and real data analysis on a publicly available COVID-19 database will be performed to demonstrate the methods.
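To make the three layers of repertoire information concrete, the following is a minimal Python sketch on a made-up toy repertoire. It is an illustration only, not the project's method: the clone records, the function names, and the rule that links two clones when their CDR3 sequences differ at no more than one position (Hamming distance) are all assumptions introduced here.

```python
from collections import Counter
from itertools import combinations

# Toy repertoire (hypothetical): (V gene, CDR3 amino-acid sequence, read count).
clones = [
    ("TRBV5-1", "CASSLGQYF", 50),
    ("TRBV5-1", "CASSLGQFF", 30),
    ("TRBV7-9", "CASSPGQYF", 15),
    ("TRBV7-9", "CATWDRYEQ", 5),
]

def gene_usage(clones):
    """Gene usage layer: fraction of reads attributed to each V gene."""
    total = sum(count for _, _, count in clones)
    usage = Counter()
    for gene, _, count in clones:
        usage[gene] += count
    return {gene: n / total for gene, n in usage.items()}

def abundance(clones):
    """Abundance layer: relative frequency of each clone."""
    total = sum(count for _, _, count in clones)
    return {seq: count / total for _, seq, count in clones}

def hamming(a, b):
    """Number of mismatched positions; None when the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def clone_network(clones, max_dist=1):
    """Network layer (assumed rule): link clones within max_dist mismatches."""
    seqs = [seq for _, seq, _ in clones]
    edges = []
    for s, t in combinations(seqs, 2):
        d = hamming(s, t)
        if d is not None and d <= max_dist:
            edges.append((s, t))
    return edges

def degrees(clones, edges):
    """A simple local network property: number of neighbors per clone."""
    deg = {seq: 0 for _, seq, _ in clones}
    for s, t in edges:
        deg[s] += 1
        deg[t] += 1
    return deg

usage = gene_usage(clones)
freq = abundance(clones)
edges = clone_network(clones)
deg = degrees(clones, edges)
```

Per-sample summaries of this kind (gene usage proportions, clone frequencies, node degrees and other network properties) are the sort of features that the proposed gene usage, network, and integrative analyses would relate to an outcome.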
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.