Statistical Methods for T Cell Receptor (TCR) Analysis

Project Summary/Abstract

The T cell receptor (TCR) repertoire of a subject resembles a huge book with millions of records, each record being a TCR. Some of these records are generated by a random process and carry no clinically relevant information. Others, however, encode the current or past history of immune-related diseases or exposure to pathogens. Accurately decoding a TCR book therefore yields valuable information about the health record of the corresponding subject. Such information can be collected through a single blood draw, which is much more convenient than conducting separate tests for different diseases or exposures, and is thus well suited to screening or monitoring a large population. In addition, T cells are sensitive enough to detect even very small amounts of antigen, and T cell memory lasts many years after the initial immune response. These features make the TCR repertoire an attractive source for constructing biomarkers for many immune-related diseases. An important factor that shapes the TCRs in a repertoire is the Human Leukocyte Antigen (HLA) type of the corresponding subject. Each human has up to 16 unique HLA alleles, drawn from the thousands of HLA alleles present in the human population. Ignoring HLA information reduces the accuracy of TCR-based biomarkers, particularly for subjects with relatively rare HLA alleles. However, previous work has ignored HLA because it is highly polymorphic. We propose a statistical framework that combines powerful computational tools, such as neural networks, with rigorous statistical models to fill this critical unmet need. Our methods deliver HLA-specific associations between TCRs and disease status, and use TCRs to predict the disease status of a subject while conditioning on the subject's HLA alleles. We will evaluate our methods by predicting infection with cytomegalovirus or SARS-CoV-2, although the methods are general and can be applied to many other diseases or conditions that induce a T cell response.
