A typical biological experiment is a controlled experiment that explicitly limits biological and technical heterogeneity: samples come from the same tissue, are drawn from patients who are treated similarly, and are profiled using the same technology. These controlled experiments have improved our understanding of biology. However, a controlled experiment does not represent real-world heterogeneity, so its results are nearly impossible to translate immediately into clinical practice.
The Khatri Lab has developed a novel framework for integrating relatively “Small Data” (datasets with a few tens of samples) into “Big Data” (a few hundred to thousands of samples) in a consistent manner that is representative of the biological and technical heterogeneity observed in the real-world patient population. We have shown that, compared to single-cohort analysis, following the general guidelines of our framework, which is implemented in the R package MetaIntegrator, significantly improves reproducibility by integrating data from multiple independent cohorts, even when controlling for sample size. We have repeatedly demonstrated the utility of our framework for identifying diagnostic, prognostic, therapeutic, and mechanistic signatures across a broad spectrum of diseases and conditions, including organ transplant rejection, infectious diseases (sepsis, bacterial infections, viral infections, tuberculosis, dengue), autoimmune diseases (systemic sclerosis, IBD, lupus), cancers (lung cancer, KRAS-associated cancers), pan-organ fibrosis, and vaccination.