
A multivariate statistical analysis of the effects of styrene maleic acid encapsulated RL71 in a xenograft model of triple negative breast cancer

Orleans N.K. Martey1, Khaled Greish2, Paul F. Smith1, Rhonda J. Rosengren1*
1Department of Pharmacology and Toxicology, School of Biomedical Sciences, University of Otago, Dunedin 9045, New Zealand
2Department of Molecular Medicine, College of Medicine and Medical Sciences, Arabian Gulf University, Manama, Kingdom of Bahrain
*Corresponding author: Rhonda J. Rosengren, Email: rhonda.rosengren@otago.ac.nz
Competing interests: The authors have declared that no competing interests exist.
Abbreviations used: AUC, area under the curve; CA, cluster analyses; DFS, disease-free survival; ECL, enhanced chemiluminescence; GLM, general linear model; IHC, immunohistochemical; IVs, independent variables; LDA, linear discriminant analysis; LOO, leave-one-out; MVAs, multivariate statistical analyses; MVD, microvessel density; MLR, multiple linear regression; OOB, out-of-bag; OS, overall survival; RFC, random forest classification; ROC, receiver-operating characteristic; rHTK1, recombinant human TK1 protein; SMA, styrene maleic acid; Tk1, thymidine kinase 1; TNBC, triple negative breast cancer
Received June 3, 2019; Revision received September 8, 2019; Accepted October 7, 2019; Published December 16, 2019
Abstract
We have previously shown that the curcumin derivative 3,5-bis(3,4,5-trimethoxybenzylidene)-1-methylpiperidine-4-one (RL71), when encapsulated in styrene maleic acid micelles (SMA-RL71), significantly suppressed the growth of MDA-MB-231 xenografts by 67%. Univariate statistical analysis showed that pEGFR/EGFR, pAkt/Akt, pmTOR/mTOR and p4EBP1/4EBP1 were all significantly decreased in tumors from treated mice compared to SMA controls. In this study, multivariate statistical analyses (MVAs) were performed to identify the molecular networks that worked together to drive tumor suppression, with the aim of determining whether this analysis could also be used to predict treatment outcome. Linear discriminant analysis correctly predicted, with 100% accuracy, mice that received SMA-RL71 treatment. Additionally, results from multiple linear regression showed that the expression of Ki67, PKC-α, PP2AA-α, PP2AA-β and CaD1 networked together to drive tumor growth suppression. Overall, the MVAs provided evidence for a molecular network of signaling proteins that drives tumor suppression in response to SMA-RL71 treatment, which should be explored further in animal studies of cancer.
Keywords: breast tumor, RL71, multivariate statistics, data mining

INTRODUCTION

Multivariate statistical analyses (MVAs) have been widely applied to cancer genomics and proteomics in humans but rarely in the context of experimental cancer studies in animal models. In this study, we sought to investigate their value in the context of understanding the tumor suppressive actions of a 2nd-generation curcumin (diferuloylmethane) analogue, 3,5-bis(3,4,5-trimethoxybenzylidene)-1-methylpiperidine-4-one (RL71) [1-9], encapsulated in styrene maleic acid (SMA) micelles (SMA-RL71; [10-13]), in animals bearing a xenograft model of triple negative breast cancer (TNBC). In this model of TNBC, SMA-RL71 (10 mg/kg, i.v.) was previously shown to decrease tumor growth by 67% and to modulate the expression of EGFR, Akt, mTOR, and 4EBP1 [13]. However, only univariate statistical analyses were conducted on this data set. The mechanism of action of SMA-RL71 is more likely to involve complex interactions amongst various pathways. Therefore, in this current study, the range of proteins examined was extended and multivariate statistical and data mining analyses were used, in order to identify a network of signaling proteins involved in tumor suppression [14]. We employed a combination of multiple linear regression (MLR), linear discriminant analysis (LDA), random forest classification (RFC) and cluster analyses (CA) to achieve this aim. The specific objective of using LDA and RFC was to determine whether a combination of measurements of tumor-related independent variables could be used to correctly classify animals as having received drug treatment or not. This type of classification analysis is useful in the development of biomarkers for cancer and drug-responsiveness in the treatment of cancer [15].
The specific objectives of using MLR and CA were to determine whether a combination of measurements of similar independent variables could be used to predict the value of a continuous variable, such as tumor growth, in the case of MLR, or to reveal the association between different continuous variables, in the case of CA. Again, such multivariate statistical methods have been used in the field of clinical cancer, but rarely in animal studies of cancer.

MATERIALS AND METHODS

The MVAs reported here are based on data partly reported previously [13]. Below is an abbreviated description of those methods. Analyses were based on n = 11 mice in the vehicle control group and n = 11 mice in the drug-treated group.

Preparation of SMA-RL71 micelles and xenograft model of TNBC

SMA-RL71 micelles were prepared as described previously [11]. SMA was used as a vehicle control by dissolving in NaOH and adjusting the pH to 7.4.
Female SCID mice (7–8 weeks old, 8/group) were inoculated s.c. into the rear flank with MDA-MB-231 cells (1 × 10⁶/0.1 ml Matrigel 50%). Once tumors reached 100 mm³, the mice were randomly allocated into treatment groups. The mice received SMA-RL71 (10 mg/kg, i.v.) or SMA control twice a week for 3 weeks via the tail vein. Two independent measurements of tumor volume were performed bi-weekly using electronic calipers. The mice were euthanized 24 d after treatment began and full necropsies were performed.

Immunohistochemistry of tumor sections

Tissue sections were analyzed for both microvessel density (MVD) via CD105 staining and apoptosis via the ApopTag kit as previously described [13]. Briefly, tumors were embedded in cryomatrix, sectioned (6 µm), and fixed in acetone. When slide preparation was complete, the slides were scanned with an Aperio Image ScanScope System (Leica, Chicago, IL) and analyzed by an individual who was blinded to the treatment groups. The microvessel analysis algorithm was used to quantify the MVD at a dark- and light-staining threshold of 185 and 210, respectively. The nuclear image analysis algorithm was used to quantify apoptotic stained cells as the percentage of positively stained nuclei.
To add more proteins to the data set, proliferation was quantified by determining the number of cells with positive Ki67 nuclear staining. Sections were pre-treated with antigen retrieval solution (10 mM citrate buffer with 0.05% Tween 20, pH 6.0) for 20 min at 95°C in a pre-heated jar after blocking endogenous peroxidases. Sections were then incubated with the blocking buffer in a humidified chamber for 1 h and stained with a monoclonal mouse anti-human Ki67 antibody (1:100) containing biotin, overnight at 4°C in a humidified chamber. Sections were then treated with polyclonal goat anti-mouse IgG (11 mg/L) secondary antibody for 30 min at room temperature. Negative controls were generated by substituting antigens with PBS. Sections were counterstained with haematoxylin QS and dehydrated, and cover slips were mounted with DPX mounting medium. The nuclear image analysis algorithm of the system was used to quantify the percentage (Pᵢ) and classify the intensity (i) of positively stained proliferative nuclei, and the result was expressed as HScore = ∑ Pᵢ(i + 1) [16].
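As an illustration of the scoring formula above, the following minimal Python sketch computes HScore from per-class percentages. The intensity classes (1 = weak, 2 = moderate, 3 = strong) and the example percentages are assumptions chosen for illustration; they are not taken from the study data or from the image analysis software.

```python
def h_score(percent_by_intensity):
    """Return sum of P_i * (i + 1) over the intensity classes i.

    percent_by_intensity maps an intensity class i to P_i, the
    percentage (0-100) of nuclei assigned to that class.
    """
    total = sum(percent_by_intensity.values())
    if total > 100.0 + 1e-9:
        raise ValueError("percentages sum to more than 100")
    return sum(p * (i + 1) for i, p in percent_by_intensity.items())


# Hypothetical section: 30% weak (i = 1), 20% moderate (i = 2),
# 10% strong (i = 3); the remaining nuclei are unstained.
example_section = {1: 30.0, 2: 20.0, 3: 10.0}
example_score = h_score(example_section)  # 30*2 + 20*3 + 10*4
```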

Detection of plasma thymidine kinase 1 by dot-blot assay and tumor lysate preparation and immunoblot analysis

Plasma thymidine kinase 1 (TK1) levels in mice were measured by the enhanced chemiluminescent dot blot assay. Three µl of serum from control and SMA-RL71-treated mice, as well as the recombinant human TK1 protein (rHTK1) standard (0.00056–0.18 µM), were applied to a nitrocellulose membrane and allowed to air dry. Membranes were then blocked with 10% non-fat milk in TBS for 1 h and washed 3× with TBS for 5 min. Membranes were incubated with anti-TK1 monoclonal antibody (1:500 in 5% BSA) at room temperature for 2 h. Membranes were then incubated (1 h) with biotinylated anti-mouse secondary antibody (1:1000) at room temperature, washed with TBST (5×), incubated with streptavidin-conjugated HRP for 30 min and washed 5× with TBST. Membranes were then developed with SuperSignal substrate. Signal intensity was visualized on radiographic film and quantified with a GS-710 densitometer (Bio-Rad). The plasma TK1 concentration was determined by linear regression of signal intensity against the concentration of rHTK1. Protein extracts from tumors were prepared as previously described [13,16]. The density of each band was normalized to a β-tubulin loading control.
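The standard-curve step described above can be sketched as follows. This is a minimal example assuming an ordinary least-squares line fitted to the rHTK1 standards and then inverted for an unknown sample; the concentrations and intensities are hypothetical, and the function names are introduced here for illustration only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_intensity(intensity, slope, intercept):
    """Invert the standard curve to estimate a sample concentration."""
    return (intensity - intercept) / slope


# Hypothetical rHTK1 standards (µM) and their densitometry readings.
standards_um = [0.01, 0.05, 0.10, 0.15]
intensities = [20.0, 60.0, 110.0, 160.0]

slope, intercept = fit_line(standards_um, intensities)
# Hypothetical plasma sample with a reading of 85 intensity units.
sample_um = concentration_from_intensity(85.0, slope, intercept)
```

Samples whose readings fall outside the range of the standards would be extrapolated by this line and should be interpreted with caution.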

MVAs

MLR: Backward MLR was used to predict tumor volume. The validity of the MLR was based on an adjusted R², which indicated the strength of the prediction [17-19]. In backward MLR, independent variables (IVs) are removed one at a time, starting with the least significant, to determine how the adjusted R² changes. Backward MLR is preferred over some other methods because it allows for the examination of the interaction between variables. Therefore, the interaction between IVs was investigated. MLR is prone to artifacts, one being that the unadjusted R² increases with the number of IVs, even if they are not meaningful. The use of an ‘adjusted R²’ partly controls for this, but nonetheless, too many IVs and excessive correlation amongst them can lead to overfitting and multicollinearity [14,20-23]. Multicollinearity was tested using a variance inflation factor and autocorrelation using the Durbin-Watson statistic [14,20-23].
Because MLR is part of the general linear model (GLM), it assumes that the data conform to multivariate normality and homogeneity of the covariance matrices, and that the observations are independent. Diagnostic plots in the R program were used to assess these assumptions. The overall significance of the MLR was tested using the regression ANOVA, and the significance of the individual predictors was tested using t tests [17-19,22]. All MLR analyses were performed using the programs SPSS 25 and R (version 3.6.1).
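For readers less familiar with the diagnostics named above, the sketch below gives their standard textbook definitions in Python. It is not the SPSS or R code used in this study, and the numerical inputs are hypothetical. Here n is the number of observations, k is the number of IVs, and r2_j is the R² obtained when one IV is regressed on all the others.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def tolerance_and_vif(r2_j):
    """Tolerance (1 - R_j^2) and variance inflation factor (1 / tolerance)."""
    tolerance = 1.0 - r2_j
    return tolerance, 1.0 / tolerance


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical model: R^2 = 0.90 with 22 observations and 13 predictors.
example_adjusted = adjusted_r2(0.90, n=22, k=13)
# Hypothetical predictor that is 75% explained by the other predictors.
example_tolerance, example_vif = tolerance_and_vif(0.75)
# Hypothetical residuals that alternate in sign (negative autocorrelation).
example_dw = durbin_watson([1.0, -1.0, 1.0, -1.0])
```

The first function shows why the adjustment matters with many IVs and few animals: the penalty term (n − 1)/(n − k − 1) grows quickly as k approaches n.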
LDA: LDA is used to predict categorical variables, and was used here to predict whether the animals had received drug treatment. LDA is also part of the GLM, and therefore assumes that the data are normally distributed and are independent [14]. A form of LDA was used in which the IVs were entered together, since stepwise LDA is prone to artifacts [24]. The analysis yielded a specific LDA and a standardized canonical discriminant function that indicated which IVs are important in their relationship to the dependent variable. The statistical significance of the LDA was tested using Wilks’ λ and its validity was tested using cross-validation. Cross-validation for the LDA in this study was conducted using a leave-one-out (LOO) procedure. Simulation studies by Zavorka and Perret [25] suggest that, with k = 4 predictor variables, as was the case here, and low-moderate bivariate correlation, sample sizes in the range of n1 = n2 = 7–14 are sufficient (see also Lachenbruch [26]).
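The LOO procedure can be illustrated with a small, self-contained sketch. For brevity it uses only two predictors, equal prior probabilities and a pooled within-group covariance matrix; these simplifications, the function names and the data are assumptions for illustration and do not reproduce the SPSS analysis reported here.

```python
def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_cov(group0, group1):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group0, group1):
        m = mean_vec(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = len(group0) + len(group1) - 2
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]


def fit_lda(group0, group1):
    """Return discriminant weights w and cut-off c (equal priors)."""
    m0, m1 = mean_vec(group0), mean_vec(group1)
    s = pooled_cov(group0, group1)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    c = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, c


def predict(model, x):
    w, c = model
    return 1 if w[0] * x[0] + w[1] * x[1] > c else 0


def loo_accuracy(group0, group1):
    """Refit without each case in turn, then classify the held-out case."""
    data = [(x, 0) for x in group0] + [(x, 1) for x in group1]
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        t0 = [r for r, lab in train if lab == 0]
        t1 = [r for r, lab in train if lab == 1]
        correct += predict(fit_lda(t0, t1), x) == label
    return correct / len(data)


# Hypothetical measurements (two predictors per animal).
control = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.2), (1.1, 2.1)]
treated = [(0.3, 0.9), (0.4, 1.1), (0.2, 1.0), (0.5, 0.8)]
accuracy = loo_accuracy(control, treated)
```

Because each case is classified by a function fitted without it, the LOO accuracy is generally equal to or lower than the accuracy obtained when all cases are used for both fitting and classification.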
RFC: Decision trees have been routinely used in data mining [27-29]. Their underlying principle is that a flow-like series of questions is applied to each variable, subdividing the sample into groups that have maximal similarity by minimizing the within group variance. In the case of random forests, hundreds or thousands of decision trees can be generated simultaneously and their results combined to increase the precision of the prediction. RFC can predict a categorical variable, such as whether animals received drug treatment, from other IVs, where each tree provides a “vote” for the categorical membership, and the majority vote “wins”. RFC is largely assumption-free, and therefore it is a powerful alternative to LDA when the GLM assumptions are violated [27-30]. Furthermore, simulation studies have demonstrated that RFC can provide robust classification even with high dimensional data with small sample sizes [31]. We used RFC with 500 trees generated using 4 variables at each split. Although Breiman et al. [27] have suggested that RFC does not overfit, this view has been challenged (e.g., [32]). We tried to avoid over-fitting by increasing the number of trees to 500 as well as by optimizing “m”, the number of variables at each split, by testing a range of values for “m”. We chose m = 4, the √p [29], as the optimal value. We used cross-validation by splitting the data into training and test data sets (70:30) and calculated out-of-bag (OOB) error based on observations that were excluded from a subset of the training data (the ‘bag’) used to produce the decision trees, and also a classification matrix in which the model based on the training data was used to predict group membership in the test data set, blind to their actual membership. The test error was not greater than the training error, which further convinced us that our model did not overfit. All RFC analyses were performed using R and the R package, Rattle [30].
In R (version 3.6.1), RFC analyses provide graphs of error as a function of the number of trees, a list of predictor variables in order of importance, as well as OOB receiver-operating characteristic (ROC) curves.
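The voting and OOB ideas described above can be illustrated with a deliberately simplified sketch. It bags single-split trees (decision stumps) on one predictor, so it demonstrates bootstrap sampling, OOB evaluation and majority voting only; it is not a full random forest (there is no random selection of m variables at each split) and it is not the Rattle implementation used in this study. The data and function names are hypothetical.

```python
import random


def fit_stump(rows):
    """Fit a one-split tree to (x, label) rows by minimizing training errors."""
    xs = sorted({x for x, _ in rows})
    # Candidate thresholds: midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in thresholds:
        for high_label in (0, 1):
            errors = sum(
                (high_label if x > t else 1 - high_label) != y for x, y in rows
            )
            if best is None or errors < best[0]:
                best = (errors, t, high_label)
    return best[1], best[2]


def stump_predict(stump, x):
    threshold, high_label = stump
    return high_label if x > threshold else 1 - high_label


def oob_error(rows, n_trees=500, seed=0):
    """Majority-vote error using only trees that did not see each case."""
    rng = random.Random(seed)
    n = len(rows)
    votes = [[0, 0] for _ in rows]  # votes[i][c]: OOB votes for class c
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(bag)
        stump = fit_stump([rows[i] for i in bag])
        for i in range(n):
            if i not in in_bag:  # case i is out-of-bag for this tree
                votes[i][stump_predict(stump, rows[i][0])] += 1
    scored = [(v, rows[i][1]) for i, v in enumerate(votes) if sum(v) > 0]
    if not scored:
        raise ValueError("no case was ever out-of-bag; use more trees")
    wrong = sum((1 if v[1] > v[0] else 0) != y for v, y in scored)
    return wrong / len(scored)


# Hypothetical, well-separated predictor values for two groups.
rows = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (1.1, 1), (1.2, 1), (1.3, 1), (1.4, 1)]
error_rate = oob_error(rows)
```

Each case is left out of roughly one third of the bootstrap samples, which is why the OOB error can serve as an internal estimate of classification error without a separate test set.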

Cluster analyses

CA is a type of non-parametric analysis that is used to determine associations between variables, with no pre-determined dependent variable [14]. It does not make assumptions about the normality of the data or homogeneity of the covariance matrices; CA uses measures of ‘distance’ between variables in order to group them according to their degree of association, which is usually shown on a ‘dendrogram’ in which similarity increases as an inverse function of the y axis value [14]. The squared Euclidean distance was used with a hierarchical, Ward minimal linkage algorithm, in which clusters are formed based on the minimization of variance [14]. The data were transformed into z scores first, in order to minimize the effects of differences in scales of measurement for the different IVs.
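The three ingredients named above (z-score transformation, squared Euclidean distance and Ward's variance-minimizing criterion) can be written out in a few lines. This is a sketch of the standard definitions, assuming the sample standard deviation (n − 1 denominator) for the z scores; it is not the SPSS routine used in this study, and the example values are hypothetical.

```python
import math


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_merge_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if two clusters merge."""
    na, nb = len(cluster_a), len(cluster_b)
    dim = len(cluster_a[0])
    centroid_a = [sum(p[j] for p in cluster_a) / na for j in range(dim)]
    centroid_b = [sum(p[j] for p in cluster_b) / nb for j in range(dim)]
    return na * nb / (na + nb) * squared_euclidean(centroid_a, centroid_b)


# Two hypothetical variables measured on different scales in three animals.
variable_1 = z_scores([1.0, 2.0, 3.0])
variable_2 = z_scores([10.0, 20.0, 30.0])
distance = squared_euclidean(variable_1, variable_2)
```

After standardization the two hypothetical variables have identical profiles, so their distance is zero even though their raw units differ tenfold. At each step of the agglomeration, Ward's method merges the pair of clusters with the smallest merge cost.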

RESULTS

Effect of SMA-RL71 on tumor proteins

Previous work showed that SMA-RL71 decreased MVD, as shown by CD105 staining, and also increased ApopTag-TUNEL staining and cleaved caspase-3 protein expression in tumors from treated mice [13]. However, immunohistochemical (IHC) staining of tumor sections with Ki67, a nuclear antigen expressed in proliferating cells, showed no significant difference compared to SMA controls (Fig. 1A and 1B). This was confirmed following examination of TK1 levels in the plasma, where there was no difference between the treatment groups (Fig. 1C and 1D). Thus, univariate analysis showed no difference in cell proliferation in the tumor between SMA-RL71 treatment and controls, even though tumor volume was decreased by 67% [13].
Figure 1. Effect of SMA-RL71 micelles on cell proliferation in tumors from treated mice. A. Representative photomicrographs of Ki67-positive proliferative cells by immunohistochemistry staining. B. Quantification using IHC nuclear image algorithm. C. Representative ECL dot blot of plasma TK1 and rHTK1 proteins. D. Scanning densitometry quantification of plasma TK1 and concentration determined from a linear regression of rHTK1 proteins. Results represent the mean ± SEM of 8 mice per group. None were significantly different.
While we had previously reported the role of EGFR, Akt, mTOR, and 4EBP1 proteins [13], as individually contributing to drug-mediated tumor suppression, more proteins were examined in tumors from treated mice in order to conduct MVA. The results showed that treatment with SMA-RL71 had no significant effect on the expression of Wnt between the treatment groups (Fig. 2A and 2B) but caused a 60% decrease in the expression of β-catenin (Fig. 2A and 2C). Further investigation showed that SMA-RL71 caused a 78% decrease in the expression of PKC-α in tumors (Fig. 2A and 2D), but had no significant effect on CaD1 (Fig. 2A and 2E), and PP2AA (Fig. 2A, 2F and 2G) compared to vehicle control.
Figure 2. Tumor protein expression levels of Wnt5a/b, β-catenin, PKC-α, CaD1 and PP2AA following drug treatment. A. Representative western blots of the various proteins from individual mice. Scanning densitometry of western blots of Wnt5a/b (B), β-catenin (C), PKC-α (D), CaD1 (E), and PP2AA (F and G). Bars represent the mean ± SEM from 8 mice per group. Significance was determined with a one-way ANOVA coupled with a Bonferroni post-hoc test. *Significantly different compared to SMA control, P < 0.05.

MLR

The MLR adjusted R² for the prediction of tumor growth was 0.896, which was significant according to an ANOVA (F(13,8) = 14.98, P ≤ 0.0001). The Durbin-Watson statistic was 2.12, which indicated a lack of autocorrelation [23]. Multicollinearity becomes a concern when the tolerance value is < 0.1 [23]. Significant predictor variables that had a tolerance ≥ 0.1, and were therefore not likely to generate multicollinearity, were drug treatment (P ≤ 0.001), Ki67 (P ≤ 0.009), PKC-α (P ≤ 0.0001), PP2AA-α (P ≤ 0.0001), PP2AA-β (P ≤ 0.0001) and CaD1 (P ≤ 0.001). The validity of this regression model was checked independently using best subsets regression and the results were similar, with R² values in the range of 86%–90% and with multicollinearity controlled for using Mallows’ Cp index.

LDA

An LDA with a canonical correlation value of 0.88 was obtained, which was significant (Wilks’ λ (4,17) = 14.86, P ≤ 0.0001). The standardized canonical discriminant function coefficients were: tumor volume, 0.56; pAkt/Akt, 0.83; pEGFR, 0.71; and β-catenin, 0.57. Cross-validation demonstrated that this linear discriminant function was 100% successful in classifying the animals to the correct treatment group (Table 1). The analysis was repeated using stepwise regression and similar results were obtained.

RFC

RFC was used as an alternative to LDA to predict the classification of the animals to the correct treatment groups. Five hundred trees with 4 variables at each split were used. The OOB estimate of error was only 6.25% and the error became reasonably stable after the first 400 trees. The ROC curve showed a good hit versus false alarm rate with an area under the curve (AUC) value of 0.95 (Fig. 3). In terms of variable importance, the most important variables for classifying the animals into treatment groups were pEGFR, PKC-α, PP2AA-β, pAkt, pAkt/Akt, Apoptag, β-catenin and tumor volume (Fig. 4).
Figure 3. Area under ROC curve from the random forest model, predicting group membership. The diagonal line represents a random classifier, showing ROC curve above and an AUC > 0.5. RFC of signaling protein expression was performed using R 3.4.3 and the R package Rattle.
Figure 4. Rank order of the specific proteins important to tumor suppression by the random forest model. The OOB estimation method was used to classify the rank list in order of highest score of relative importance. RFC of signaling protein expression was performed using R 3.4.3 and the R package Rattle. Mean decrease accuracy: average measure of obstruction in classification if the variable is removed from the model. Mean decrease Gini: average measure of the difference in split nodes from individual variables over all trees in the model to predict variable importance.
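The AUC reported for the ROC curve has a simple interpretation: it is the probability that a randomly chosen animal from one group receives a higher classifier score than a randomly chosen animal from the other group. The sketch below computes the AUC directly from this pairwise definition. The scores are hypothetical vote fractions, not values from the present model, and the function name is introduced here for illustration.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical fractions of trees voting "treated" for each animal.
treated_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]
auc = roc_auc(treated_scores, control_scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to the diagonal line of a random classifier, and an AUC of 1.0 to perfect separation of the two groups.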

Cluster analysis

Hierarchical cluster analysis showed that pEGFR clustered closely with tumor volume and Ki67, pAkt with CD105, NF-κB with PKC-α and β-catenin, and PP2AA-α with PP2AA-β, EGFR and Wnt5ab. The two largest clusters were of pEGFR, tumor volume, Ki67, pAkt and CD105 on the one hand, and Apoptag, CaD1, NF-κB, PKC-α, β-catenin, PP2AA-α, PP2AA-β, EGFR, Wnt5ab and Akt, on the other (Fig. 5).
Figure 5. Dendrogram of clustered signaling proteins involved in tumor suppression. The hierarchically clustered variables were calculated by an agglomerative Ward’s linkage method and using squared Euclidean distance. Clustering was performed by SPSS 24.
Table 1. Classification of treatment groups by linear discriminant analysis.
                                           Predicted group membership
                            Treatment      SMA      SMA-RL71    Total
Original (b)         Count  SMA            11       0           11
                            SMA-RL71       0        11          11
                     %      SMA            100      0           100
                            SMA-RL71       0        100         100
Cross-validated (a)  Count  SMA            9        2           11
                            SMA-RL71       0        11          11
                     %      SMA            81.8     18.2        100
(a) Cross-validation is conducted only for those cases in the analysis. In cross-validation, each case is classified by the functions derived from all cases other than that case.
(b) 100% of original grouped cases correctly classified.
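The percentages in Table 1 follow directly from the counts. As a worked check, the short sketch below recomputes the per-group and overall classification rates from the count rows of the table; the function and variable names are introduced here for illustration.

```python
def classification_rates(matrix):
    """Return (overall %, per-group %) from matrix[actual][predicted] counts."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[group][group] for group in matrix)
    per_group = {
        group: 100.0 * matrix[group][group] / sum(matrix[group].values())
        for group in matrix
    }
    return 100.0 * correct / total, per_group


# Count rows copied from Table 1 (actual group -> predicted group).
original = {
    "SMA": {"SMA": 11, "SMA-RL71": 0},
    "SMA-RL71": {"SMA": 0, "SMA-RL71": 11},
}
cross_validated = {
    "SMA": {"SMA": 9, "SMA-RL71": 2},
    "SMA-RL71": {"SMA": 0, "SMA-RL71": 11},
}

original_overall, original_by_group = classification_rates(original)
cv_overall, cv_by_group = classification_rates(cross_validated)
```

With these counts, all 22 original cases are classified correctly. Under cross-validation, 9 of 11 SMA controls (81.8%) and 11 of 11 SMA-RL71-treated animals (100%) are classified correctly, which is 20 of 22 cases (90.9%) overall.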

DISCUSSION

Most animal studies investigating the role of cancer-related proteins only perform univariate analyses. As useful as this can be, differences in related proteins which function as a network, can be missed. In this study, we adopted a different approach, using MVAs to identify a network of proteins critical to growth suppression of TNBC tumors.
Using MLR, the growth of the tumor itself was strongly predicted by whether the animals had received treatment with SMA-RL71, as well as the expression of Ki67, PKC-α, PP2AA-α, PP2AA-β and CaD1. The adjusted R² of 0.896 indicated that these variables accounted for almost 90% of the variation in the growth of the tumor. The prediction potential of this group of proteins was interesting given that Ki67 (P = 0.4175), CaD1 (P = 0.2178), PP2AA-α (P = 0.1295) and PP2AA-β (P = 0.3901) were not significantly changed when analyzed by univariate methods.
It is important to note that although studies have established a relationship between Ki67 and overall survival (OS) and disease-free survival (DFS) [33-35], the predictive value of Ki67 as an indicator of chemotherapy benefit is controversial. For example, in the BR9601 adjuvant chemotherapy trial, eight cycles of CMF (cyclophosphamide, methotrexate and fluorouracil) or epirubicin-CMF (four cycles of epirubicin, followed by four cycles of the same CMF regimen) were administered to patients every 21 d. The primary end points were relapse-free survival (RFS) (HR = 0.36) and OS (HR = 0.30). However, Ki67 levels did not significantly interact with OS (P = 0.247) and RFS (P = 0.736) and therefore did not predict treatment outcome [36]. Similarly, in a study by the International Breast Cancer Study Group (IBCSG) IX, patients were randomly assigned to receive three 28-day courses of adjuvant CMF chemotherapy (cyclophosphamide on days 1–14, methotrexate on days 1 and 8, and 5-fluorouracil on days 1 and 8), followed by tamoxifen (57 months) or tamoxifen alone (60 months) [37,38]. CMF-tamoxifen resulted in 5-year DFS and OS of 87% and 89%, respectively, while tamoxifen alone resulted in 5-year DFS and OS of 69% and 81%, respectively [37,38]. However, there was no interaction between Ki67 levels and the response to the treatment [36,38]. Thus, Ki67 as an independent prognostic factor does not always reflect the response to treatment. However, our MVA results show that Ki67 forms part of a biological network of molecular targets in the prediction of treatment outcome.
The role of CaD1 as a cancer metastasis-associated protein was confirmed by Hou et al. [39]. Notably, based on IHC staining intensity and Western blotting, CaD1 expression was lower in metastatic gastric cancer cells (MKN7 and AZ521) than in primary cancer cells (AGS and FU97) [39]. Additionally, IHC staining of gastric cancer tissues supported the in vitro expression of CaD1 and showed a significant decrease in CaD1 staining in lymph node metastasized tissues compared to the primary gastric tumors [39]. Furthermore, siRNA knockdown of CaD1 in AGS cells elicited an ~150% and ~50% increase in migration and invasion, respectively. In contrast, overexpression of CaD1 in AZ521 resulted in a ~50% decrease in both migration and invasion [39]. Similarly, low expression levels are found in colon cancer HCA7 cells and human breast cancer MB435S cells, and 4-fold and 7-fold increases in invasion were observed in siRNA-CaD1-treated cells compared to control cells [40]. Thus, it is possible that, had other studies performed MVA, CaD1 would have contributed to a tumor suppressor network.
Using LDA to predict which animals had received SMA-RL71 treatment, we found a linear discriminant function that was statistically significant and 100% successful in correctly classifying the animals. The most important variables appeared to be tumor volume, the ratio of pAkt/Akt, pEGFR and β-catenin. A different form of classification analysis, RFC, found that pEGFR, PKC-α, PP2AA-β, pAkt, pAkt/Akt, Apoptag, β-catenin and tumor volume were the most important predictor variables of the treatment group, although it is notable that pEGFR, pAkt/Akt and tumor volume were common to these two forms of analysis. For the RFC, the OOB estimate of error was only 6.25% and the AUC for the ROC curve, showing the hit versus false alarm rate, was 0.95. Thus, the expression of these signaling proteins was able to accurately distinguish tumor bearing mice receiving drug from those receiving vehicle with a specificity and sensitivity of 95%.
PKC-α may also be an important therapeutic target for SMA-RL71 in TNBC, as it was identified in the prediction of the SMA-treatment outcome during interaction with other signaling proteins in the MVAs and when analyzed as a single protein via univariate analysis. Inhibition of PKC phosphorylates Wnt/β-catenin signaling through direct phosphorylation of β-catenin at Ser45 and promotes β-catenin degradation [41]. In the present study, PKC-α was decreased by 78% following SMA-RL71 treatment in vivo. Similarly, the curcumin analogue, J1, inhibited the phosphorylation of PKC-theta in MDA-MB-231 and MCF7 cells by approximately 89% and 91%, respectively, after 12 h [42]. Studies have also shown that the presence of heat shock protein 105 recruits PP2A, which in turn prevents the phosphorylation of the β-catenin degradation complex and subsequently activates Wnt-signaling that leads to cell proliferation and inhibition of apoptosis [43,44]. Shieh et al. [45] reported a time-dependent suppression of PP2A signaling by demethoxycurcumin in MDA-MB-231 cells with complete suppression at 48 h. Similarly, curcumin decreased the expression of PP2A by 50% in human rhabdomyosarcoma after 24 h, which led to an activation of mitogen-activated protein kinases and death in tumor cells [43]. Thus, there is evidence to suggest that the protein network identified does work in concert to modulate tumor growth.
CA are ‘unsupervised’ (i.e., there is no specific dependent variable to be predicted) but they explore the natural groupings of the signaling proteins. In a study by Dieninger et al. [46], 28 statistically significant m/z species were differentially expressed from 15 metastasized and 17 non-metastasized Barrett’s adenoma cases. Hierarchical clustering was performed on 10 of the most significant m/z species of the tumor tissues, which distinguished Barrett’s adenocarcinoma with lymph node metastasis from those without lymph node metastasis with 81% accuracy, and with a specificity and sensitivity of 77% and 94%, respectively [42]. Also, hierarchical clustering of mass spectra of proteins identified by matrix-assisted laser desorption ionization imaging correctly separated gastric cancer and non-neoplastic mucosa [42]. CA in our studies showed that pEGFR clustered closely with tumor volume, which is consistent with the results of the LDA and RFC analyses. Also, Ki67 (anti-proliferation) clustered closely with tumor volume, which also confirmed its important role in tumor suppression. Overall, pAkt clustered with CD105 (anti-angiogenesis) while CaD1 clustered closely with Apoptag (pro-apoptosis).
We were concerned to ensure that the MVAs were valid and were not undermined by the relatively small sample size (n = 22), multicollinearity or overfitting. For the MLRs, the Durbin-Watson statistic was 2.12, which indicated a lack of autocorrelation [23]. Multicollinearity becomes a concern when the tolerance value is < 0.1 [23]; however, all of the significant variables had a tolerance ≥ 0.1. We tested the validity of the MLR by repeating the analysis using a best subsets regression, which generated similar results. The validity of the LDA was tested using a ‘leave-one-out’ (LOO) cross-validation procedure, which makes over-fitting less likely, and simulation studies suggest that the sample sizes should have been sufficient given the number of predictor variables [25,26]. We tested the validity of the LDA by repeating the analysis using stepwise regression and similar results were obtained. For RFC, simulation studies have demonstrated that RFC can provide robust classification with small sample sizes [31]. We tried to avoid over-fitting by increasing the number of trees to 500 as well as by optimizing ‘m’, the number of variables at each split, by testing a range of values for ‘m’. Finally, we chose m = 4, the √p, which is recommended [29].
Statistical analysis is an integral part of drug efficacy studies, but research investigating the role of signaling pathways usually reports analysis of one protein at a time. Given the complexity of the cell signaling network in cancer, changes in one protein may not accurately reflect treatment outcome. Additionally, the interactions between signaling proteins may be missed. In this study, the MVAs have identified biological networks that drive tumor growth suppression mediated by SMA-RL71. These results should encourage others to include MVA as part of their data analysis when using in vivo cancer models.

Acknowledgments

The authors would like to thank M. Nimick and S. Taurin for technical assistance. We would also like to thank Ms. Lucy Thomsen for preparing the graphical abstract. The work was funded by a grant from the New Zealand Breast Cancer Foundation (RJR/KG) and a PhD scholarship from the University of Otago (ONKM).

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  1. Somers-Edgar TJ, Taurin S, Larsen L, Chandramouli A, Nelson MA, et al. (2009) Mechanisms for the activity of heterocyclic cyclohexanone curcumin derivatives in estrogen receptor negative human breast cancer cell lines. Invest New Drugs 29: 87-97. doi: 10.1007/s10637-009-9339-0.
  2. Yadav B, Taurin S, Larsen L, Rosengren RJ (2012) RL66 a second-generation curcumin analog has potent in vivo and in vitro anticancer activity in ER-negative breast cancer models. Int J Oncol 41: 1723-1732. doi: 10.3892/ijo.2012.1625.
  3. Yadav B, Taurin S, Larsen L, Rosengren RJ (2012) RL71, a second-generation curcumin analog, induces apoptosis and downregulates Akt in ER-negative breast cancer cells. Int J Oncol 41: 1119-1127. doi: 10.3892/ijo.2012.1521.
  4. Yadav B, Taurin S, Rosengren RJ, Schumacher M, Diederich M, et al. (2010) Synthesis and cytotoxic potential of heterocyclic cyclohexanone analogues of curcumin. Bioorg Med Chem 18: 6701-6707. doi: 10.1016/j.bmc.2010.07.063.
  5. Bisht S, Schlesinger M, Rupp A, Schubert R, Nolting J, et al. (2016) A liposomal formulation of the synthetic curcumin analog EF24 (Lipo-EF24) inhibits pancreatic cancer progression: towards future combination therapies. J Nanobiotechnology 14: 57. doi: 10.1186/s12951-016-0209-6.
  6. Nalli M, Ortar G, Schiano Moriello A, Di Marzo V, De Petrocellis L (2017) Effects of curcumin and curcumin analogues on TRP channels. Fitoterapia 122: 126-131. doi: 10.1016/j.fitote.2017.09.007.
  7. Padhye S, Banerjee S, Chavan D, Pandye S, Swamy KV, et al. (2009) Fluorocurcumins as cyclooxygenase-2 inhibitor: molecular docking, pharmacokinetics and tissue distribution in mice. Pharm Res 26: 2438-2445. doi: 10.1007/s11095-009-9955-6.
  8. Qiu X, Du Y, Lou B, Zuo Y, Shao W, et al. (2010) Synthesis and identification of new 4-arylidene curcumin analogues as potential anticancer agents targeting nuclear factor-κB signaling pathway. J Med Chem 53: 8260-8273. doi: 10.1021/jm1004545.
  9. Vyas A, Dandawate P, Padhye S, Ahmad A, Sarkar F (2013) Perspectives on new synthetic curcumin analogs and their potential anticancer properties. Curr Pharm Des 19: 2047-2069.
  10. Greish K (2010) Enhanced permeability and retention (EPR) effect for anticancer nanomedicine drug targeting. Methods Mol Biol 624: 25-37. doi: 10.1007/978-1-60761-609-2_3.
  11. Taurin S, Nehoff H, Diong J, Larsen L, Rosengren RJ, et al. (2013) Curcumin-derivative nanomicelles for the treatment of triple negative breast cancer. J Drug Target 21: 675-683. doi: 10.3109/1061186X.2013.796955.
  12. Angelova N, Yordanov G (2014) Nanopharmaceutical formulations based on poly(styrene-co-maleic acid). Bulg J Chem 3: 33-43.
  13. Martey O, Nimick M, Taurin S, Sundararajan V, Greish K, et al. (2017) Styrene maleic acid-encapsulated RL71 micelles suppress tumor growth in a murine xenograft model of triple negative breast cancer. Int J Nanomedicine 12: 7225-7237. doi: 10.2147/IJN.S148908.
  14. Manly BFJ (2005) Multivariate statistical methods: A primer. 3rd Edition. London: Chapman and Hall/CRC. 208 p.
  15. Kitbumrungrat K (2012) Comparison logistic regression and discriminant analysis in classification groups for breast cancer. Int J Comput Sci Netw 12: 111-115.
  16. Somers-Edgar TJ, Scandlyn MJ, Stuart EC, Le Nedelec MJ, Valentine SP, et al. (2008) The combination of epigallocatechin gallate and curcumin suppresses ER alpha-breast cancer cell growth in vitro and in vivo. Int J Cancer 122: 1966-1971. doi: 10.1002/ijc.23328.
  17. Brook RJ, Arnold GC (1985) Applied regression analysis and experimental design. Boca Raton: Chapman and Hall/CRC. 256 p.
  18. Ryan TP (2009) Modern regression methods. New Jersey: Wiley-Interscience. 672 p.
  19. Stevens JP (2009) Applied multivariate statistics for the social sciences. 5th Edition. Hillsdale NJ: Lawrence Erlbaum.
  20. Næs T, Mevik BH (2001) Understanding the collinearity problem in regression and discriminant analysis. J Chemometrics 15: 413-426.
  21. Babyak MA (2004) What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med 66: 411-421. doi: 10.1097/01.psy.0000127692.23278.a9.
  22. Vittinghoff E, Glidden DV, Shiboski SC, McCulloch CE (2005) Regression methods in biostatistics: Linear, logistic, survival and repeated measures models. New York: Springer.
  23. Field A (2011) Discovering statistics using SPSS. Los Angeles: Sage.
  24. Smith PF (2018) On the application of multivariate statistical and data mining analyses to data in neuroscience. J Undergrad Neurosci Educ 16.
  25. Zavorka S, Perrett JJ (2014) Minimum sample size considerations for two-group linear and quadratic discriminant analysis with rare populations. Communications in Statistics-Simulation and Computation 43: 1726-1739. doi: 10.1080/03610918.2012.744041.
  26. Lachenbruch PA (1968) On expected probabilities of misclassification in discriminant analysis, necessary sample size and a relation with the multiple correlation coefficient. Biometrics 24: 823-834. doi: 10.2307/2528873.
  27. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and Regression Trees. 1st Edition. Boca Raton: CRC Press.
  28. Pang H, Lin A, Holford M, Enerson BE, Lu B, et al. (2006) Pathway analysis using random forests classification and regression. Bioinformatics 22: 2028-2036. doi: 10.1093/bioinformatics/btl344.
  29. Hastie T, Tibshirani R, Friedman J (2009) Elements of statistical learning: Data mining, inference and prediction. 2nd Edition. Heidelberg: Springer. 745 p.
  30. Williams GJ (2011) Data mining with Rattle and R. New York: Springer. 396 p.
  31. Gunduz N, Fokoue E (2015) Robust classification of high dimension low sample size data. arXiv: 1501.00592v1
  32. Cueto-López N, García-Ordás MT, Dávila-Batista V, Moreno V, Aragonés N, et al. (2019) A comparative study on feature selection for a risk prediction model for colorectal cancer. Comput Methods Programs Biomed 177: 219-229. doi: 10.1016/j.cmpb.2019.06.001.
  33. de Azambuja E, Cardoso F, de Castro G Jr, Colozza M, Mano MS, et al. (2007) Ki-67 as prognostic marker in early breast cancer: a meta-analysis of published studies involving 12,155 patients. Br J Cancer 96: 1504-1513. doi: 10.1038/sj.bjc.6603756.
  34. Soliman NA, Yussif SM (2016) Ki-67 as a prognostic marker according to breast cancer molecular subtype. Cancer Biol Med 13: 496-504. doi: 10.20892/j.issn.2095-3941.2016.0066.
  35. Xie Y, Chen L, Ma X, Li H, Gu L, et al. (2017) Prognostic and clinicopathological role of high Ki-67 expression in patients with renal cell carcinoma: a systematic review and meta-analysis. Sci Rep 7: 44281. doi: 10.1038/srep44281.
  36. Yerushalmi R, Woods R, Ravdin PM, Hayes MM, Gelmon KA (2010) Ki67 in breast cancer: prognostic and predictive potential. Lancet Oncol 11: 174-183. doi: 10.1016/S1470-2045(09)70262-1.
  37. International Breast Cancer Study Group (2002) Endocrine responsiveness and tailoring adjuvant therapy for postmenopausal lymph node-negative breast cancer: a randomized trial. J Natl Cancer Inst 94: 1054-1065. doi: 10.1093/jnci/94.14.1054.
  38. Viale G, Regan MM, Mastropasqua MG, Maffini F, Maiorano E, et al. (2008) Predictive value of tumor Ki-67 expression in two randomized trials of adjuvant chemoendocrine therapy for node-negative breast cancer. J Natl Cancer Inst 100: 207-212. doi: 10.1093/jnci/djm289.
  39. Hou Q, Tan HT, Lim KH, Lim TK, Khoo A, et al. (2013) Identification and functional validation of caldesmon as a potential gastric cancer metastasis-associated protein. J Proteome Res 12: 980-990. doi: 10.1021/pr3010259.
  40. Yoshio T, Morita T, Kimura Y, Tsujii M, Hayashi N, et al. (2007) Caldesmon suppresses cancer cell invasion by regulating podosome/invadopodium formation. FEBS Lett 581: 3777-3782. doi: 10.1016/j.febslet.2007.06.073.
  41. Shang S, Hua F, Hu Z (2017) The regulation of β-catenin activity and function in cancer: therapeutic opportunities. Oncotarget 8: 33972-33989. doi: 10.18632/oncotarget.15687.
  42. Badr G, Gul HI, Yamali C, Mohamed AAM, Badr BM, et al. (2018) Curcumin analogue 1,5-bis(4-hydroxy-3-((4-methylpiperazin-1-yl)methyl)phenyl)penta-1,4-dien-3-one mediates growth arrest and apoptosis by targeting the PI3K/AKT/mTOR and PKC-theta signaling pathways in human breast carcinoma cells. Bioorg Chem 78: 46-57. doi: 10.1016/j.bioorg.2018.03.006.
  43. Han X, Xu B, Beevers CS, Odaka Y, Chen L, et al. (2012) Curcumin inhibits protein phosphatases 2A and 5, leading to activation of mitogen-activated protein kinases and death in tumor cells. Carcinogenesis 33: 868-875. doi: 10.1093/carcin/bgs029.
  44. Yu N, Kakunda M, Pham V, Lill JR, Du P, et al. (2015) HSP105 recruits protein phosphatase 2A to dephosphorylate β-catenin. Mol Cell Biol 35: 1390-1400. doi: 10.1128/MCB.01307-14.
  45. Shieh J, Chen Y, Lin Y, Lin J, Chen W, et al. (2013) Demethoxycurcumin inhibits energy metabolic and oncogenic signaling pathways through AMPK activation in triple-negative breast cancer cells. J Agric Food Chem 61: 6366-6375. doi: 10.1021/jf4012455.
  46. Deininger S, Ebert MP, Fütterer A, Gerhard M, Röcken C (2008) MALDI imaging combined with hierarchical clustering as a new tool for the interpretation of complex human cancers. J Proteome Res 7: 5230-5236. doi: 10.1021/pr8005777.