Multi-omics Data Integration, Interpretation, and Its Application

Abstract

To study complex biological processes holistically, it is imperative to take an integrative approach that combines multi-omics data to highlight the interrelationships of the involved biomolecules and their functions. With the advent of high-throughput techniques and the availability of multi-omics data generated from large sets of samples, several promising tools and methods have been developed for data integration and interpretation. In this review, we collect the tools and methods that adopt an integrative approach to analyzing multiple omics data and summarize their ability to address applications such as disease subtyping, biomarker prediction, and deriving insights into the data. We provide the methodology, use cases, and limitations of these tools; a brief account of multi-omics data repositories and visualization portals; and the challenges associated with multi-omics data integration.

Keywords: multi-omics, data integration, disease subtyping, biomarker prediction, data repositories

Introduction

A comprehensive understanding of human health and disease requires interpreting molecular intricacy and variation at multiple levels, such as the genome, epigenome, transcriptome, proteome, and metabolome. With the advent of sequencing technology, biology has become increasingly dependent on data generated at these levels, which together are called “multi-omics” data. The availability of multi-omics data has revolutionized medicine and biology by creating avenues for integrated system-level approaches.

Analysis of multi-omics data together with clinical information has taken a front seat in deriving useful insights into cellular functions. Integrating multi-omics data, which provide information on biomolecules from different layers, is a promising way to understand complex biology systematically and holistically.1 Integrated approaches combine individual omics data, in a sequential or simultaneous manner, to understand the interplay of molecules.2 They help in assessing the flow of information from one omics level to another and thus help bridge the gap from genotype to phenotype. Because they study biological phenomena holistically, integrative approaches can improve the prognostic and predictive accuracy for disease phenotypes and hence can eventually aid in better treatment and prevention.1,3

In recent years, various studies have shown that combining omics data sets yields a better understanding and a clearer picture of the system under study. For instance, integrative analysis of ChIP-Seq and RNA-Seq data from head and neck squamous cell carcinoma (HNSCC) cell lines showed that the cancer-specific histone marks H3K4me3 and H3K27ac are associated with transcriptional changes in the HNSCC driver genes epidermal growth factor receptor (EGFR), FGFR1, and FOXA1.4 Zhang et al5 showed the importance of integrating proteomics data with genomic and transcriptomic data to prioritize driver genes in colon and rectal cancers. Their results showed that the chromosome 20q amplicon was associated with the largest global changes at both the messenger RNA (mRNA) and protein levels. Integration of proteomics data helped identify potential 20q candidates, including HNF4A (hepatocyte nuclear factor 4, alpha), TOMM34 (translocase of outer mitochondrial membrane 34), and SRC (SRC proto-oncogene, nonreceptor tyrosine kinase).5 In another study, integrating metabolomics and transcriptomics revealed molecular perturbations underlying prostate cancer. In that study, the metabolite sphingosine demonstrated high specificity and sensitivity for distinguishing prostate cancer from benign prostatic hyperplasia. Downstream of sphingosine, impaired sphingosine-1-phosphate receptor 2 signaling represents loss of a tumor suppressor gene and a potential key oncogenic pathway for therapeutic targeting.6

These studies demonstrate the value of integrating multi-omics data over single-omics analysis. The adoption of multi-omics approaches has led to the development of various tools, methods, and platforms for multi-omics data analysis, visualization, and interpretation. Several review articles cover the importance of multi-omics approaches from different perspectives. Some summarize multi-omics data integration methodologies categorized by their underlying mathematical aspects.2,7-9 Yan et al1 summarize the network-based approaches used for multi-omics data analysis, whereas Tini et al10 benchmark unsupervised clustering methods in data integration.

In this review, we focus on the tools and methods that integrate multiple omics data and discuss in detail their applications in understanding complex human biology. The tools were chosen based on the following criteria:

The approach must perform an integrative step in which multiple data sets are analyzed simultaneously (parallel rather than sequential integration of data sets). Platforms such as Galaxy11 and O-Miner12, which help in analyzing multi-omics data but do so for each data set individually, are not part of this review.
The approach must integrate at least 2 omics data sets derived from samples that have at least partial overlap.
The method or approach should be readily available as a tool or package so that it can be run on any data set.
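
To make the first two criteria concrete, the following minimal Python sketch finds the samples shared between two omics layers and then joins their features into a single table per sample. It is only an illustration under stated assumptions: the function names, the sample IDs, and the toy values are hypothetical, and simple feature concatenation is just one elementary form of simultaneous integration, not the method of any tool covered in this review.

```python
def shared_samples(*omics_layers):
    """Return the sample IDs present in every omics layer, sorted."""
    common = set(omics_layers[0])
    for layer in omics_layers[1:]:
        common &= set(layer)
    return sorted(common)


def concatenate_features(sample_ids, **named_layers):
    """Join the features of all layers into one record per sample.

    Feature names are prefixed with the layer name so that a gene
    measured at both the RNA and protein level stays distinguishable.
    """
    combined = {}
    for sample_id in sample_ids:
        record = {}
        for layer_name, layer in named_layers.items():
            for feature, value in layer[sample_id].items():
                record[f"{layer_name}:{feature}"] = value
        combined[sample_id] = record
    return combined


# Hypothetical toy data: sample ID -> {feature: measurement}.
transcriptome = {
    "S1": {"EGFR": 8.2, "FOXA1": 3.1},
    "S2": {"EGFR": 7.5, "FOXA1": 2.9},
    "S3": {"EGFR": 9.0, "FOXA1": 4.4},
}
proteome = {
    "S2": {"EGFR": 1.4},
    "S3": {"EGFR": 2.2},
    "S4": {"EGFR": 0.7},
}

# Criterion 2: the layers only need to overlap partially in samples.
common = shared_samples(transcriptome, proteome)

# Criterion 1: both layers enter a single, joint representation.
integrated = concatenate_features(common, rna=transcriptome, protein=proteome)
```

Only the overlapping samples contribute to the joint table here; how a given tool treats samples missing from one layer is a separate design decision that this sketch does not address.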

In the following sections, the tools/methods are classified based on their ability to address diverse biological case studies showcased in their publications using multi-omics data. We also provide a detailed account of various portals that allow visualization of multi-omics data sets along with analysis that aids in understanding the correlation between the omics data sets.

Omics Data Types and Repositories

Multi-omics data broadly cover data generated from the genome, proteome, transcriptome, metabolome, and epigenome. The spectrum of omics can be further extended to other biological data such as the lipidome, phosphoproteome, and glycoproteome. Multi-omics data generated for the same set of samples can provide useful insights into the flow of biological information at multiple levels and thus can help unravel the mechanisms underlying the biological condition of interest. A few publicly available databases, listed in Table 1, provide multi-omics data sets of patients.

Table 1.

List of multi-omics data repositories.