The authors have declared that no competing interests exist.

Conceived and designed the experiments: OA. Performed the experiments: PS TES KAA OA. Analyzed the data: PS TES KAA OA. Contributed reagents/materials/analysis tools: PS TES KAA OA. Wrote the paper: PS TES KAA OA. Proved mathematical theorems: TES OA.

‡ These authors contributed equally to this work.

The number of large-scale high-dimensional datasets recording different aspects of a single disease is growing, accompanied by a need for frameworks that can create one coherent model from multiple tensors of matched columns, e.g., patients and platforms, but independent rows, e.g., probes. We define and prove the mathematical properties of a novel tensor generalized singular value decomposition (GSVD), which can simultaneously find the similarities and dissimilarities, i.e., patterns of varying relative significance, between any two such tensors. We demonstrate the tensor GSVD in comparative modeling of patient- and platform-matched but probe-independent ovarian serous cystadenocarcinoma (OV) tumor, mostly high-grade, and normal DNA copy-number profiles, across each chromosome arm, and combination of two arms, separately. The modeling uncovers previously unrecognized patterns of tumor-exclusive platform-consistent co-occurring copy-number alterations (CNAs). We find, first, and validate that each of the patterns across only 7p and Xq, and the combination of 6p+12p, is correlated with a patient’s prognosis, is independent of the tumor’s stage, the best predictor of OV survival to date, and together with stage makes a better predictor than stage alone. Second, these patterns include most known OV-associated CNAs that map to these chromosome arms, as well as several previously unreported, yet frequent focal CNAs. Third, differential mRNA, microRNA, and protein expression consistently map to the DNA CNAs. A coherent picture emerges for each pattern, suggesting roles for the CNAs in OV pathogenesis and personalized therapy. In 6p+12p, deletion of the p21-encoding

The growing number of large-scale high-dimensional datasets recording different aspects of a single disease promises to enhance basic understanding of life on the molecular level as well as medical diagnosis, prognosis, and treatment. This is accompanied by a fundamental need for mathematical frameworks that can create one coherent model from multiple datasets arranged in multiple order-matched, column-matched, and row-independent tensors, i.e., tensors of the same number of dimensions each, with one-to-one mappings among the columns across all but one of the corresponding dimensions among the tensors, but not necessarily among the rows across the one remaining dimension in each tensor. Consider, e.g., the structure of the DNA copy-number datasets in the Cancer Genome Atlas (TCGA) [

The higher-order generalized singular value decomposition (HO GSVD) is the only simultaneous decomposition to date of more than two such column-matched but row-independent datasets that is by definition exact, and whose mathematical properties allow interpreting its variables and operations in terms of the similar as well as dissimilar, e.g., biomedical reality among the datasets [

The GSVD and HO GSVD, however, are limited to datasets arranged in second-order tensors, i.e., matrices. We define, therefore, a novel tensor GSVD, i.e., an exact simultaneous decomposition of two datasets, arranged in two higher-than-second-order tensors of matched column dimensions but independent row dimensions. The tensor GSVD factors or separates the pair of tensors into corresponding pairs of “subtensors”, i.e., pairs of outer products or combinations of a paired set of patterns each: patterns, one across each of the matched column dimensions, which are identical for both tensors, combined with one pattern across the independent row dimension of either one of the two tensors. The pairs of subtensors are of varying relative mathematical significance, i.e., the significance of one subtensor in a pair in the corresponding tensor relative to the significance of the second subtensor in the second tensor varies among the pairs of subtensors. We prove that the tensor GSVD extends the GSVD and the tensor higher-order singular value decomposition (HOSVD) [

We demonstrate the tensor GSVD in comparative modeling of patient- and platform-matched but probe-independent ovarian serous cystadenocarcinoma (OV) tumor and normal DNA copy-number profiles from TCGA. Most of the tumors, i.e., >95%, are high-grade tumors [

By using survival analyses of the discovery and, separately, validation set of patients, as well as only the platinum-based chemotherapy patients in the discovery and validation sets, we find, first, and validate that each of the patterns across only the chromosome arms 7p and Xq, and across only the combination of the two chromosome arms 6p+12p (but not 6p nor 12p separately), is correlated with an OV patient’s prognosis and response to platinum-based chemotherapy, is independent of stage, and together with stage makes a better predictor than stage alone. By using survival analyses of only the > 95% patients with high-grade tumors, we find and validate that these patterns are also independent of the OV tumor’s grade. We observe three groups of significantly different prognoses among the patients classified by a combination of the 6p+12p, 7p, and Xq tensor GSVD classifications, suggesting a possible implementation of the patterns in a pathology laboratory test. Second, by using segmentation of the 6p+12p, 7p, and Xq patterns, we find that the amplifications and deletions identified by these patterns include most known OV-associated CNAs that map to these chromosome arms [

Taken together, a coherent picture emerges for each of these previously unrecognized chromosome arm-wide patterns of tumor-exclusive and platform-consistent co-occurring alterations, suggesting roles for the DNA CNAs in OV pathogenesis in addition to personalized diagnosis, prognosis, and treatment. In 6p+12p, loss of the p21-encoding

We selected primary OV tumor and normal DNA copy-number profiles of a set of 249 TCGA patients [_{1} and 𝒟_{2}, of _{1}-tumor and _{2}-normal probes × _{1} and _{2}, where _{1}, _{2} ≥

We define, therefore, a novel tensor GSVD that simultaneously separates the paired datasets into weighted sums of _{1,a}, or the corresponding normal-specific pattern across the normal probes, i.e., the “normal arraylet” _{2,a}, combined with one pattern of copy-number variation across the patients, i.e., an “_{a} _{i}, ×_{b} _{x} and ×_{c} _{y} denote tensor-matrix multiplications, which contract the _{i} with those of _{i}, _{x}, and _{y}, respectively, and where ⊗ denotes an outer product.
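The equivalence between the tensor-matrix multiplications and the weighted sum of outer products can be checked numerically. The following is a minimal NumPy sketch, not the authors' Mathematica code. The names `U`, `Vx`, `Vy`, and `R`, the sizes `K`, `L`, `M`, and the random data are assumptions introduced only for illustration; `U` stands in for the arraylets of one dataset, and `Vx` and `Vy` for the patterns across the patients and the platforms.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, M = 6, 3, 2      # hypothetical numbers of probes, patients, platforms
N = L * M              # number of column basis vectors (arraylets)

# Hypothetical factors: orthonormal arraylets and two square pattern matrices.
U = np.linalg.qr(rng.standard_normal((K, N)))[0]   # K x N, orthonormal columns
Vx = rng.standard_normal((L, L))                    # columns: patterns across patients
Vy = rng.standard_normal((M, M))                    # columns: patterns across platforms
R = rng.standard_normal((N, L, M))                  # core tensor R[a, b, c]

# Tensor-matrix multiplications: contract index a with U, b with Vx, c with Vy.
D_modes = np.einsum('abc,ka,lb,mc->klm', R, U, Vx, Vy)

# The same tensor as a weighted sum of subtensors, each an outer product
# of one column of U, one column of Vx, and one column of Vy.
D_sum = np.zeros((K, L, M))
for a in range(N):
    for b in range(L):
        for c in range(M):
            S_abc = np.einsum('k,l,m->klm', U[:, a], Vx[:, b], Vy[:, c])
            D_sum += R[a, b, c] * S_abc

print(np.allclose(D_modes, D_sum))
```

Writing the contraction as a single `einsum` keeps the index bookkeeping explicit; the triple loop is only there to make the sum of subtensors visible.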

For each chromosome arm or combination of two chromosome arms, the structure of the tumor and normal discovery datasets (_{1} and _{2}) is that of two third-order tensors with one-to-one mappings between the column dimensions but different row dimensions. The patients, platforms, probes, and tissue types, each represent a degree of freedom. Unfolded into a single matrix, some of the degrees of freedom are lost and much of the information in the datasets might also be lost. We define a tensor GSVD that simultaneously separates the paired datasets into weighted sums of paired subtensors, i.e., combinations or outer products of three patterns each: Either one tumor-specific pattern of copy-number variation across the tumor probes, i.e., a tumor arraylet (a column basis vector of _{1}), or the corresponding normal-specific arraylet (a column basis vector of _{2}), combined with one pattern of variation across the patients, i.e., an _{1}) is a combination of (

Suppose that unfolding (or matricizing) both tensors 𝒟_{i} into matrices, each preserving the _{i}-row dimension, e.g., by appending the _{i,:lm} of the corresponding tensor, gives two full column-rank matrices _{i} ∈ ℝ^{Ki×LM}. We obtain the column bases vectors _{i} from the GSVD of _{i} [_{i} into matrices, each preserving the _{i} _{i} _{ix} ∈ ℝ^{Ki M×L} (or _{iy} ∈ ℝ^{Ki L×M}). We obtain the _{ix} (or _{iy}), i.e., the _{x} and _{y} are invertible. The column bases vectors are normalized and orthogonal, i.e., uncorrelated, such that

The generalized singular values are positive, and are arranged in Σ_{i}, Σ_{ix}, and Σ_{iy} in decreasing orders of the corresponding “GSVD angular distances”, i.e., decreasing orders of the ratios _{1,a}/_{2,a}, _{1x,b}/_{2x,b}, and _{1y,c}/_{2y,c}, respectively. We then compute the core tensors ℛ_{i} by contracting the row-, _{i} with those of the matrices _{i}, _{i,abc} tabulated in the core tensors are real but not necessarily positive. Our tensor GSVD construction generalizes the GSVD to higher orders in analogy with the generalization of the singular value decomposition (SVD) by the HOSVD [

We prove that our tensor GSVD exists for two tensors of any order because it is constructed from the GSVDs of the tensors unfolded into full column-rank matrices (Lemma A in _{i,a} and the row bases vectors _{i,a}, _{ix,b}, and _{iy,c}, respectively, and up to phase factors of ±1, such that each vector captures both parallel and antiparallel patterns (Lemma B in _{1} ∈ ℝ^{LM×L×M}, which row mode unfolding gives the identity matrix _{1} = ^{LM×LM}, and a tensor 𝒟_{2} of the same column dimensions reduces to the HOSVD of 𝒟_{2} (Theorem A in

The significance of the subtensor 𝒮_{i}(a, b, c) in the corresponding tensor 𝒟_{i} is defined proportional to the magnitude of the corresponding tensor generalized singular value ℛ_{i,abc} (Fig. C in the supporting information). The significance of 𝒮_{1}(a, b, c) in 𝒟_{1} relative to that of 𝒮_{2}(a, b, c) in 𝒟_{2} is defined by the “tensor GSVD angular distance” Θ_{abc} as a function of the ratio ℛ_{1,abc}/ℛ_{2,abc}. This is in analogy with, e.g., the row mode GSVD angular distance θ_{a}, which defines the significance of the column basis vector u_{1,a} in the matrix D_{1} relative to that of u_{2,a} in D_{2} as a function of the ratio σ_{1,a}/σ_{2,a},
Θ_{abc} = arctan(ℛ_{1,abc}/ℛ_{2,abc}) − π/4,
θ_{a} = arctan(σ_{1,a}/σ_{2,a}) − π/4.
Because σ_{1,a}/σ_{2,a} ∈ [0, ∞), the row mode GSVD angular distances satisfy θ_{a} ∈ [−π/4, π/4]. The maximum (or minimum) angular distance, i.e., θ_{a} = π/4, which corresponds to σ_{1,a}/σ_{2,a} ≫ 1 (or θ_{a} = −π/4, which corresponds to σ_{1,a}/σ_{2,a} ≪ 1), indicates that the row basis vector paired with the column basis vectors u_{1,a} in D_{1} and u_{2,a} in D_{2} is exclusive to D_{1} (or D_{2}). An angular distance of θ_{a} = 0, which corresponds to σ_{1,a}/σ_{2,a} = 1, indicates a row basis vector of equal significance in D_{1} and D_{2}.

Thus, while the ratio σ_{1,a}/σ_{2,a} indicates the significance of u_{1,a} in D_{1} relative to the significance of u_{2,a} in D_{2}, this relative significance is defined, as previously described, by the angular distance θ_{a}, a function of the ratio σ_{1,a}/σ_{2,a}, which is antisymmetric in D_{1} and D_{2}. Note also that while other functions of the ratio σ_{1,a}/σ_{2,a} exist that are antisymmetric in D_{1} and D_{2}, the angular distance θ_{a}, which is a function of the arctangent of the ratio, i.e., arctan(σ_{1,a}/σ_{2,a}), is the natural function to use, because the GSVD is related to the cosine-sine (CS) decomposition, as previously described, and σ_{1,a} and σ_{2,a} are related to the sine and the cosine functions of the angle θ_{a}, respectively.
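The properties attributed to the angular distance are easy to verify numerically. The sketch below assumes the form arctan(σ_{1}/σ_{2}) − π/4, which is the usual definition in the GSVD literature; the text above only states that it is a function of the arctangent of the ratio, so the π/4 offset and the function name `angular_distance` are assumptions.

```python
import math

def angular_distance(s1, s2):
    """Angular distance of a pair of nonnegative generalized singular values.

    Assumed form: arctan(s1 / s2) - pi/4. atan2 is used so that s2 == 0
    (a pattern exclusive to the first dataset) is handled without dividing.
    """
    return math.atan2(s1, s2) - math.pi / 4

# Equal significance in both datasets gives zero.
print(angular_distance(1.0, 1.0))

# Swapping the two datasets flips the sign (antisymmetry).
print(angular_distance(3.0, 1.0), angular_distance(1.0, 3.0))

# Patterns exclusive to one dataset reach the bounds +pi/4 and -pi/4.
print(angular_distance(1.0, 0.0), angular_distance(0.0, 1.0))
```

The ratio σ_{1}/σ_{2} itself is unbounded and not antisymmetric under the exchange of the two datasets, which is why a bounded, antisymmetric function of it is more convenient for ranking patterns.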

The tensor GSVD angular distance equals the row mode GSVD angular distance, i.e., Θ_{abc} = θ_{a}.

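Unfolding the core tensors ℛ_{i} into matrices R_{i}, in the same way that the tensors 𝒟_{i} are unfolded into the matrices D_{i}, which preserve the row dimensions, and substituting the row mode GSVD D_{i} = U_{i}Σ_{i}V^{T}, gives
R_{i} = Σ_{i}V^{T}(V_{x}^{−1} ⊗ V_{y}^{−1})^{T},
where ⊗ here denotes a Kronecker product. Since Σ_{i} are positive diagonal matrices, it follows that ℛ_{1,abc}/ℛ_{2,abc} = σ_{1,a}/σ_{2,a}. Substituting this in the definition of the tensor GSVD angular distance gives Θ_{abc} = θ_{a}. Note that the proof holds for tensors of higher-than-third order.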

From this it follows that the tensor GSVD angular distance satisfies ∣Θ_{abc}∣ ≤ π/4, since ℛ_{1,abc}/ℛ_{2,abc} > 0, even though ℛ_{1,abc} and ℛ_{2,abc} are not necessarily positive. It also follows that Θ_{abc} = ±π/4 indicates a subtensor exclusive to the tensor 𝒟_{1} or 𝒟_{2}, respectively, and that Θ_{abc} = 0 indicates a subtensor common to both.
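The construction and the equality of the angular distances can be checked on small random tensors. The following NumPy sketch is an illustration under stated assumptions, not the authors' implementation: the function names `gsvd` and `tensor_gsvd`, the tensor sizes, and the random data are hypothetical; the matrix GSVD is computed by a standard QR-plus-SVD route swapped in for convenience; and the angular distance is assumed to be arctan(ratio) − π/4.

```python
import numpy as np

def gsvd(A, B):
    """GSVD of two full column-rank matrices with the same number of columns.

    Returns U1, s1, U2, s2, Vt with A = U1 @ diag(s1) @ Vt and
    B = U2 @ diag(s2) @ Vt, where U1 and U2 have orthonormal columns and
    s1 / s2 is in decreasing order. Assumes A has at least as many rows
    as columns.
    """
    K1 = A.shape[0]
    Q, Rq = np.linalg.qr(np.vstack([A, B]))        # reduced QR of the stacked matrix
    Q1, Q2 = Q[:K1], Q[K1:]
    U1, s1, Wt = np.linalg.svd(Q1, full_matrices=False)
    Q2W = Q2 @ Wt.T                                # orthogonal columns of norm s2
    s2 = np.linalg.norm(Q2W, axis=0)
    return U1, s1, Q2W / s2, s2, Wt @ Rq

def tensor_gsvd(T1, T2):
    """Sketch of a tensor GSVD of two third-order tensors K1 x L x M and K2 x L x M."""
    (K1, L, M), K2 = T1.shape, T2.shape[0]
    # Row mode: unfold to K_i x LM and take the column bases and singular values.
    U1, s1, U2, s2, _ = gsvd(T1.reshape(K1, L * M), T2.reshape(K2, L * M))
    # x mode: unfold to K_i M x L; y mode: unfold to K_i L x M.
    x_unfold = lambda T: T.transpose(0, 2, 1).reshape(-1, L)
    y_unfold = lambda T: T.reshape(-1, M)
    Vx = gsvd(x_unfold(T1), x_unfold(T2))[4].T
    Vy = gsvd(y_unfold(T1), y_unfold(T2))[4].T
    Vx_inv, Vy_inv = np.linalg.inv(Vx), np.linalg.inv(Vy)
    # Core tensors: contract with U^T (left inverse of U), Vx^{-1}, and Vy^{-1}.
    core = lambda T, U: np.einsum('klm,ka,bl,cm->abc', T, U, Vx_inv, Vy_inv)
    return (U1, s1, core(T1, U1)), (U2, s2, core(T2, U2)), Vx, Vy

rng = np.random.default_rng(0)
T1 = rng.standard_normal((8, 3, 2))     # hypothetical "tumor" tensor
T2 = rng.standard_normal((10, 3, 2))    # hypothetical "normal" tensor
(U1, s1, R1), (U2, s2, R2), Vx, Vy = tensor_gsvd(T1, T2)

# The decomposition is exact for both tensors.
T1_rebuilt = np.einsum('abc,ka,lb,mc->klm', R1, U1, Vx, Vy)
T2_rebuilt = np.einsum('abc,ka,lb,mc->klm', R2, U2, Vx, Vy)
exact = np.allclose(T1_rebuilt, T1) and np.allclose(T2_rebuilt, T2)

# The tensor angular distance depends only on the row index a.
theta = np.arctan(s1 / s2) - np.pi / 4
Theta = np.arctan(R1 / R2) - np.pi / 4
theorem_holds = np.allclose(Theta, theta[:, None, None])
print(exact, theorem_holds)
```

Because the two core tensors differ only by the row scaling with the generalized singular values, the individual core entries can be negative while their ratio stays positive, which is what bounds the angular distance.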

Note that since the generalized singular values are arranged in Σ_{i} of _{a}, the most tumor-exclusive tumor subtensors, i.e., 𝒮_{1}(_{a} of _{2}(_{a}, correspond to

We compute the tensor GSVD of the tumor and normal discovery datasets for each chromosome arm and each combination of two chromosome arms, separately (_{1}(_{1,abc} of

We, first, require the subtensor to be tumor-exclusive and platform-consistent: include the tumor arraylet _{1,a} that is the most exclusive to the tumor dataset, i.e., _{1,1}, as well as a

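(Figure caption; panel labels and images were not recovered.) The survival analyses give P-values of order 10^{−2}, with a univariate Cox proportional hazard ratio of 1.7 in one analysis and of 1.9 in a second analysis. The second analysis validates the survival analyses of the discovery set of 249 patients.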

We find that each of the tensor GSVDs of only the chromosome arms 7p and Xq, and only the combination of the two chromosome arms 6p+12p (but not 6p nor 12p separately), uncovers a pattern of tumor-exclusive and platform-consistent co-occurring CNAs that is correlated with an OV patient’s prognosis in the discovery and, separately, validation set of patients.

To date, the best predictor of OV survival has remained the tumor’s stage at diagnosis [

We find and validate, by using survival analyses of the discovery and, separately, validation set of patients, as well as only the 88% and 95% platinum-based chemotherapy patients in the discovery and validation sets, respectively (Fig. F in

We also find and validate that each of these three tensor GSVDs is independent of each of the additional standard indicators (Tables A and B in

Note that while the discovery set of patients reflects the general OV patient population, with approximately 5%, 7%, 76%, and 12% of the patients diagnosed at stages I, II, III, and IV, respectively, the validation set reflects the high-stage OV patient population, with approximately 20% and 80% of the patients diagnosed at stages III and IV, respectively. The 6p+12p, 7p, and Xq tensor GSVDs, therefore, predict survival in both the general and the high-stage OV patient populations. Note also that the discovery and validation sets each include mostly, i.e., >95%, high-grade tumors, i.e., tumors of grades 2 and higher. Tumor grade does not correlate with survival in either the discovery or the validation set of patients. Survival analyses of only the >95% of patients with high-grade tumors in the discovery and, separately, validation set give qualitatively the same and quantitatively similar results to those of the analyses of 100% of the patients in each set, respectively. The 6p+12p, 7p, and Xq tensor GSVDs, therefore, predict survival in the high-grade OV patient population, and are independent of the OV tumor’s grade as well as the molecular distinctions between high- and low-grade OV tumors [

We observe three groups of significantly different prognoses among the discovery and, separately, validation set of patients, as well as only the platinum-based chemotherapy patients, classified by a combination of the three, i.e., 6p+12p, 7p, and Xq, tensor GSVD classifications, each of which is binomial (

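(Figure caption; panel labels and images were not recovered.) The survival analyses give P-values of order 10^{−3} in two of the panels. This validates the survival analyses of the discovery set of 249 patients.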

This suggests a possible implementation of the 6p+12p, 7p, and Xq patterns in a pathology laboratory test, where a patient’s survival and response to platinum-based chemotherapy are predicted based upon the combination of the correlations of the OV tumor’s DNA copy-number profile with the 6p+12p, 7p, and Xq patterns.
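How such a test might combine the three correlations can be sketched as follows. This is a hypothetical illustration only: the cutoff `THRESHOLD`, the function names, and the synthetic profiles are assumptions, and the rule by which the three binary calls map onto prognosis groups is not specified here and is deliberately left out.

```python
import numpy as np

# Hypothetical cutoff; the text above does not state the thresholds used.
THRESHOLD = 0.0

def classify(profile, pattern, threshold=THRESHOLD):
    """Return the Pearson correlation of a tumor profile with a pattern,
    and a binary call (True = high correlation)."""
    r = float(np.corrcoef(profile, pattern)[0, 1])
    return r, r > threshold

def combined_classification(profiles, patterns):
    """One binary call per chromosome-arm pattern."""
    return {arm: classify(profiles[arm], patterns[arm])[1] for arm in patterns}

# Synthetic stand-ins for the three arm-wide patterns and one patient's profiles.
rng = np.random.default_rng(1)
patterns = {arm: rng.standard_normal(50) for arm in ("6p+12p", "7p", "Xq")}
profiles = {
    "6p+12p": 2.0 * patterns["6p+12p"] + 1.0,   # perfectly correlated
    "7p": -patterns["7p"],                      # perfectly anticorrelated
    "Xq": patterns["Xq"].copy(),                # identical to the pattern
}

calls = combined_classification(profiles, patterns)
n_high = sum(calls.values())
print(calls, n_high)
```

Pearson correlation is insensitive to the overall scale and offset of a profile, which is convenient when profiles come from different microarray platforms.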

OV tumors exhibit significant CNA variation among them, much more so than, e.g., GBM brain tumors [

We find, by using segmentation [

We also find that the three arraylet patterns include novel frequent focal CNAs (segments < 125 probes). Among these, four amplifications and two deletions are significantly correlated with OV survival (Fig. J in

We find, by using gene ontology enrichment analyses of the OV tumor mRNA expression profiles of the patients [

The differential mRNA expression of genes from these enriched ontologies that are located on any one of the chromosome arms is consistent with the CNAs across that arm (Fig. K in

The genes, which are significantly (Mann-Whitney-Wilcoxon ^{−3}) in the ontologies of cellular response to ionizing radiation (GO:0071479), and major histocompatibility (MHC) protein complex (GO:0042611). Most of the GO:0071479 genes are underexpressed, including the p21 cyclin-dependent kinase inhibitor-encoding

One of only two GO:0071479 overexpressed genes is the

Note that while the 6p+12p pattern of CNAs is correlated with survival in the discovery and, separately, validation sets, neither the 6p nor the 12p pattern alone is correlated with survival. Indeed, experiments studying the conditions for the transformation of human normal to tumor cells indicate that cells in which both p21 and p38 are inactive are susceptible to Ras-mediated transformation [

In addition, p21 and p38 are necessary for p53-mediated cell cycle arrest [

Taken together, previously unrecognized co-occurring deletion of

The genes that are significantly differentially expressed between the 7p tensor GSVD classes are enriched (hypergeometric ^{−10}) in the ontology of DNA strand elongation involved in DNA replication (GO:0006271). Most of these genes are overexpressed, including the DNA polymerase delta subunit 2-encoding

Taken together, previously unrecognized co-occurring deletion and underexpression of

The genes that are differentially expressed between the Xq tensor GSVD classes are enriched (hypergeometric ^{−6}) in the ontology of antigen processing and presentation of peptide antigen (GO:0048002). Most of these genes are overexpressed, including the B-cell receptor-associated protein 31-encoding

Taken together, previously unrecognized co-occurring deletion of

We defined a novel tensor GSVD, an exact simultaneous decomposition of two datasets, arranged in two higher-than-second-order tensors of matched column dimensions but independent row dimensions. We showed that the mathematical properties of the tensor GSVD allow interpreting its variables and operations in terms of the similar as well as dissimilar, e.g., biomedical reality between the datasets. We demonstrated the tensor GSVD in comparative modeling of patient- and platform-matched but probe-independent OV tumor and normal DNA copy-number profiles from TCGA. The modeling resulted in new insights into the poorly understood relations between an OV tumor’s genome and a patient’s survival phenotype. Three previously unrecognized chromosome arm-wide patterns of tumor-exclusive and platform-consistent co-occurring alterations were uncovered, across 6p+12p, 7p, and Xq. These patterns are correlated with an OV patient’s survival and response to platinum-based chemotherapy, and suggest possible roles for the alterations in OV pathogenesis as well as a possible implementation in a pathology laboratory test for personalized OV diagnosis, prognosis, and treatment.

Note that unlike previous analyses of the TCGA OV DNA copy-number data, notably by TCGA [

Unlike recent approaches to the integrative modeling of different types of large-scale molecular biological profiles from the same set of patients, notably clustering [

Additional possible applications of the tensor GSVD in personalized medicine include comparative modeling of two patient- and tissue-matched datasets, each corresponding to (

(PDF)

A PDF format file, readable by Adobe Acrobat Reader. The corresponding Mathematica 9.0.1 code file, executable by Mathematica and readable by Mathematica Player, is available at

(PDF)

A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing TCGA annotations of the discovery set of 249 patients. The tumor and normal profiles of the discovery set of patients measured by each of the two DNA microarray platforms, tabulating relative copy-number variation across the 6p+12p, 7p, and Xq tumor and normal probes, are available in tab-delimited text format files at

(TXT)

A tab-delimited text format file reproducing TCGA annotations of the validation set of 148 patients. The tumor profiles of the validation set of patients, tabulating relative copy-number variation across the 6p+12p, 7p, and Xq tumor probes, are available in tab-delimited text format files at

(TXT)

A tab-delimited text format file tabulating the segments of the first, most tumor-exclusive tumor arraylets computed by tensor GSVD of the discovery set of patients across 6p+12p, 7p, or Xq.

(TXT)

A tab-delimited text format file tabulating differential expression of 11,457 autosomal and X chromosome mRNAs in the 6p+12p, 7p, and Xq tensor GSVD classes. The mRNA expression profiles of 394 of the 397 patients in the discovery and validation sets are available in tab-delimited text format files at

(TXT)

A tab-delimited text format file tabulating differential expression of 639 autosomal and X chromosome microRNAs in the 6p+12p, 7p, and Xq tensor GSVD classes. The microRNA expression profiles of 395 patients are available in tab-delimited text format files at

(TXT)

A tab-delimited text format file tabulating differential expression of 175 antibodies that probe for 136 autosomal and X chromosome proteins in the 6p+12p, 7p, and Xq tensor GSVD classes. The protein expression profiles of 282 patients are available in tab-delimited text format files at

(TXT)

We thank RA Horn for thoughtful discussions of matrix analysis in general, and the tensor GSVD in particular. We thank DDL Bowtell and MM Janát-Amsbury for useful notes on OV in general, and the molecular distinctions between high- and low-grade OV tumors in particular. We also thank RA Weinberg for helpful comments on the hallmarks of cancer in general, and the transformation of human normal to tumor cells in particular.
