Our Research

We are a computational group (“dry lab”). We develop methods to study the human immune response as a biological system by integrating transcriptome, cellular, and epigenome data. We perform systems biology analyses within and across diseases using data from patient populations across the world that represent real-world biological and technical heterogeneity.


Multi-Cohort Analysis

A typical biological experiment is a controlled experiment that explicitly limits biological, clinical, and technical heterogeneity. Although these controlled experiments have improved our understanding of biology, a controlled experiment does not represent the real-world heterogeneity. Therefore, its results are nearly impossible to translate into clinical practice immediately.

The Khatri Lab has developed a multi-cohort analysis framework for integrating heterogeneous “dirty data” in a consistent manner that is representative of the biological, clinical, and technical heterogeneity observed in the real-world patient population. We have shown that, compared to single-cohort analysis, following the general guidelines of our framework, implemented in the R packages MetaIntegrator or bayesMetaIntegrator, significantly improves reproducibility by integrating data from multiple independent cohorts, even when controlling for sample size. We have repeatedly demonstrated the utility of our framework in a broad spectrum of diseases, including organ transplant rejection, infectious diseases (sepsis, bacterial infections, viral infections, tuberculosis, dengue), autoimmune diseases (systemic sclerosis, IBD, lupus), cancers (lung cancer, KRAS-associated cancers), pan-organ fibrosis, and vaccination, for identifying signatures that are diagnostic, prognostic, therapeutic, and mechanistic.
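To make the idea of combining independent cohorts concrete, here is a minimal sketch in Python of one common approach: compute a standardized effect size (Hedges' g) for a gene in each cohort, then pool the cohorts with a DerSimonian-Laird random-effects model. This is an illustration under stated assumptions, not the MetaIntegrator API; the function names and the toy expression values are hypothetical.

```python
import math


def hedges_g(cases, controls):
    """Bias-corrected standardized mean difference and its variance for one cohort."""
    n1, n2 = len(cases), len(controls)
    m1, m2 = sum(cases) / n1, sum(controls) / n2
    v1 = sum((x - m1) ** 2 for x in cases) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in controls) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def random_effects_summary(effects, variances):
    """Pool per-cohort effects with the DerSimonian-Laird estimator.

    Returns (summary effect, standard error, between-cohort variance tau^2).
    """
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Cochran's Q measures how much cohorts disagree beyond sampling error.
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    summary = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    return summary, math.sqrt(1 / sum(w_star)), tau2


# Hypothetical expression values for one gene in three independent cohorts.
cohorts = [
    ([7.1, 7.9, 8.4, 7.6], [6.2, 6.8, 6.5, 7.0]),
    ([5.9, 6.6, 6.1, 6.8, 6.3], [5.2, 5.8, 5.5, 5.1, 5.6]),
    ([9.0, 8.2, 8.8], [8.1, 7.7, 8.3]),
]
per_cohort = [hedges_g(cases, controls) for cases, controls in cohorts]
summary, se, tau2 = random_effects_summary(
    [g for g, _ in per_cohort], [v for _, v in per_cohort]
)
print(f"summary effect = {summary:.2f}, SE = {se:.2f}, tau^2 = {tau2:.2f}")
```

The random-effects model is a natural fit here because it treats between-cohort differences as a quantity to be estimated (tau^2) rather than as noise to be removed.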

Cell Frequency Deconvolution

Using heterogeneous data can also improve the development of new methods. We have created immunoStates, a basis matrix, for estimating percentages of 20 immune cell types using transcriptome data. We have demonstrated that immunoStates is not affected by biological and technological differences in blood transcriptome data, irrespective of the method used. We have shown that immunoStates can be used to perform in silico cellular phenotyping across multiple independent cohorts to analyze changes in immune profiles that identify hitherto unknown biology and biomarkers in tuberculosis, pan-organ fibrosis, pan-viral infections, and influenza infection.
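As an illustration of how a basis matrix is used, the sketch below models a bulk expression profile as a non-negative mixture of cell-type reference profiles and recovers the proportions with non-negative least squares, solved here by projected gradient descent. It is a toy example under stated assumptions: the 4-gene, 3-cell-type matrix and its values are made up (this is not immunoStates), the function name is hypothetical, and non-negative least squares is only one of several deconvolution methods in use.

```python
def deconvolve(basis, mixture, n_iter=2000):
    """Estimate cell-type proportions f >= 0 such that basis * f ~ mixture.

    basis:   genes x cell types reference matrix (list of rows)
    mixture: bulk expression value per gene
    """
    n_genes, n_types = len(basis), len(basis[0])
    # The squared Frobenius norm bounds the largest eigenvalue of B^T B,
    # so this step size guarantees the iteration converges.
    step = 1.0 / sum(v * v for row in basis for v in row)
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(basis[g][c] * f[c] for c in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(basis[g][c] * residual[g] for g in range(n_genes))
            for c in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        f = [max(0.0, f[c] - step * grad[c]) for c in range(n_types)]
    total = sum(f)
    return [v / total for v in f] if total > 0 else f


# Hypothetical reference profiles: rows are genes, columns are cell types.
cell_types = ["monocyte", "NK cell", "CD4 T cell"]
basis = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 9.0],
    [2.0, 2.0, 2.0],
]
true_fractions = [0.5, 0.3, 0.2]
# Simulate a noise-free bulk sample from the known fractions.
mixture = [sum(b * f for b, f in zip(row, true_fractions)) for row in basis]

estimated = deconvolve(basis, mixture)
for name, frac in zip(cell_types, estimated):
    print(f"{name}: {frac:.3f}")
```

With real data the mixture is noisy and the basis matrix has many more genes than cell types, so the recovered proportions are estimates rather than exact values.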

Deep Learning with Biomedical Datasets

We are applying techniques from deep learning to enhance drug discovery and learn about the relationships between diseases by integrating multiple types of data (from EHRs to high-throughput sequencing). We are combining advances in technology with advances in machine learning to ask and answer novel questions in biology that were previously impossible to ask. For example, histone modifications interact with each other. However, although we can create networks of gene, protein, and metabolite interactions, creating networks of histone modification interactions at a system level has been impossible. We developed a novel mass cytometry-based technology, EpiTOF, to measure multiple histone modifications at single-cell resolution. Then, we developed a neural processes-based machine learning method to leverage these data and identify which histone modifications interact with each other and the direction of their interactions. We can now infer networks of interactions between histone modifications and how they change following a perturbation (e.g., vaccine or disease).

Reproducible Research

Biomedical research is facing a ‘reproducibility crisis.’ Contributing to this problem is the fact that most studies are conducted on small, homogeneous samples which do not reflect the natural heterogeneity of a given disease. To address this, the Khatri lab has developed the multi-cohort analysis platform, called MetaIntegrator, and used it to analyze over 50,000 human samples with approximately 1.5 billion data points from 103 diseases. We’ve made these results available to the public at metasignature.stanford.edu, with the goal of promoting data-driven hypothesis generation.

The Khatri lab is also interested in studying the systemic challenges to reproducible research. For example, we have carried out analyses highlighting how gene annotation can impede biomedical research and how interpretation of the same experiment changes over time. The most differentially expressed genes (determined by our comprehensive analysis above) are often poorly annotated and studied.

EpiTOF – single-cell epigenome profiling

In collaboration with the Utz lab, we have developed a mass cytometry-based technology for measuring epigenetic profiles at a single-cell level, called Epigenetic profiling using Cytometry Time of Flight (EpiTOF). EpiTOF allows measuring 8 classes of histone modifications and 4 histone variants along with various immune cell lineage markers. Using EpiTOF, we have demonstrated a profound effect of aging on the epigenome while accounting for genetics, a novel epigenetic mechanism for monocyte-to-macrophage differentiation, and memory in innate immune cells following influenza vaccination. We are also developing novel machine learning methods to leverage these unique data. For example, using EpiTOF, our neural processes-based method can now infer which histone modifications interact with each other and the direction of their interaction. Download data from these papers here.

Electronic Health Records

We integrate large datasets that span every level of molecular characterization, from epigenetic data to patient records. We integrate the “science of medicine” (molecular data) with the “practice of medicine” (electronic health records (EHR) and claims data) to draw connections and conclusions about factors that impact patient health. For example, we used cellular deconvolution of publicly available transcriptome data to predict increased monocytes as a prognostic marker of poor outcomes in patients with a fibrotic disease. We then used EHRs from different hospitals and claims data to validate our prediction, which has since been independently validated by multiple groups.

We have also integrated molecular data with EHRs to repurpose FDA-approved drugs. In a recent analysis, we integrated transcriptome profiles of Ulcerative Colitis (UC) patients across 11 independent cohorts and predicted that atorvastatin, a lipid-lowering FDA-approved drug, can be repurposed to treat UC. Using EHRs from Stanford and Optum, we showed that patients with UC who received atorvastatin had reduced rates of colectomy, hospitalization, and steroid prescription. As another example, based on a multi-cohort analysis of solid organ transplant, we predicted two drugs that could be repurposed to treat organ transplant patients. After demonstrating that treatment with our predicted drugs reduces allograft injury in a mouse model, we used EHRs of patients with kidney transplant to demonstrate that one of our predicted drugs substantially reduces graft failures.

Translational Research

We believe computational researchers have an integral role to play in translating basic science research into clinical practice. We collaborate with clinicians to prospectively validate our computational discoveries in independent cohorts, with basic science researchers to understand the mechanisms underlying the results of our analyses, and with industry to translate our research into clinical practice. Several host-response-based diagnostics we developed are currently being translated into point-of-care tests for the diagnosis of tuberculosis and sepsis.

Computational & Systems Immunology

We are interested in any research question related to our immune system. We believe that almost every human disease has an immune component that must be addressed to cure the disease. But the immune system has many components that complement and work with each other. Each immune cell type, innate or adaptive, has a different function. Immune cell pathways could also differ based on where an immune cell is – in circulation in the periphery or at the site of a disease (e.g., infection or tumor). Therefore, instead of focusing on a specific immune cell type in a specific disease, we are interested in studying our immune response as a whole, i.e., as a system. Of course, the role of a computational scientist is once again central in this research. We have developed methods and frameworks to integrate data from different modalities across different cohorts and tissues to understand how different components of the immune system work together in a given inflammatory condition. We have developed and applied our computational frameworks for analysis of infections and vaccines. Using systems immunology analysis, we have shown that changes in peripheral levels of NK cells inform disease progression and treatment responses in tuberculosis and that immune responses in the peripheral blood and granulomas of tuberculosis patients are correlated, and we have identified a conserved immune response to pan-viral infections that predicts severity.

We are building a phylogeny of host response by “reading the immune response”

The Issue of Heterogeneity

“P-hacking” and animal models that are not representative of human biology have been associated with the “reproducibility crisis” in biomedical science. However, we believe there is another very important factor contributing to this “reproducibility crisis” – data used for biomedical discovery are not representative of the real-world patient population. Specifically, these data lack the biological and technical heterogeneity observed in the real-world patient population. In the Khatri lab, we focus on developing methods and performing analyses such that both biological and technical heterogeneity are represented in the data in order to identify robust biological signals.

Our ongoing research is also highlighting another aspect of reproducibility – the street lamp effect. We as researchers tend to focus on ideas (drugs, genes, diseases, etc.) that are already well-studied. This leaves many potential breakthroughs in the dark. The Khatri lab uses a data-driven approach to shed light on them. We have shown that there is substantial ‘research bias’ in the literature that results in different interpretations of the same biological experiment over time.