2.2. Study population and clinicopathological variables
Samples were randomly separated by cohort into training and validation sets (7:3) for prognostic analyses, so that the models could be identified and then evaluated, as we described before, using the “createDataPartition” function of the “caret” package. Clinical information was retrieved using the “GEOquery” package for GEO datasets and the “IMvigor” package for the IMvigor cohort. The endpoints analysed in this study were RFS, defined as the interval between the date of diagnosis and the date of tumour relapse, and OS, defined as the interval between the date of diagnosis and death.
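As an illustration only, a 7:3 split drawn separately within each cohort can be sketched as follows. This is a plain Python stand-in, not the “caret” implementation; the function name `stratified_split`, the sample identifiers, the cohort labels, and the rounding rule are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(sample_ids, strata, train_fraction=0.7, seed=0):
    """Randomly split samples into training/validation sets within each stratum."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_stratum = defaultdict(list)
    for sample_id, stratum in zip(sample_ids, strata):
        by_stratum[stratum].append(sample_id)

    train, validation = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum][:]
        rng.shuffle(members)
        # Assumed rounding rule; other implementations may round differently.
        n_train = round(train_fraction * len(members))
        train.extend(members[:n_train])
        validation.extend(members[n_train:])
    return train, validation

# Hypothetical example: 20 samples from two cohorts of 10 samples each.
ids = [f"S{i}" for i in range(20)]
cohorts = ["cohortA"] * 10 + ["cohortB"] * 10
train_ids, validation_ids = stratified_split(ids, cohorts)
```

Splitting within each cohort keeps the 7:3 ratio in every cohort, rather than only in the pooled sample.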
2.3. Robust tumour microenvironment prognostic gene identification
TME-relevant genes were obtained from 12 published studies [16–27], which provided transcriptomic signatures for multiple immune and stromal cell populations. Robust prognostic genes were identified in two steps. First, we assessed the association between the relative expression value (z-transformed) of each gene and RFS via univariate Cox regression analysis in the entire cohort, and genes with P < 0·05 were selected for further analysis. Next, we used bootstrapping to test the robustness of the genes that passed this initial filter: 70% of patients were randomly drawn from the training cohort, and the survival impact of each gene was assessed in that subset. This procedure was repeated 1000 times, and genes that achieved P < 0·05 in 70% of the resampling runs were considered robust prognostic genes and selected for further analysis.
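The resampling step can be sketched as below, under stated assumptions. The callback `p_value_fn` is a hypothetical placeholder for the univariate Cox regression P value of a gene in a given patient subset; the toy function used here returns fixed numbers and is not a statistical test. Reading “in 70% of runs” as “in at least 70% of runs” is also an assumption.

```python
import random

def robust_genes(patients, genes, p_value_fn, n_runs=1000,
                 subsample_fraction=0.7, alpha=0.05, min_frequency=0.7, seed=0):
    """Keep genes that reach P < alpha in at least min_frequency of resampling runs."""
    rng = random.Random(seed)
    n_draw = round(subsample_fraction * len(patients))
    hits = {gene: 0 for gene in genes}
    for _ in range(n_runs):
        subset = rng.sample(patients, n_draw)  # draw 70% of patients without replacement
        for gene in genes:
            if p_value_fn(gene, subset) < alpha:
                hits[gene] += 1
    return [gene for gene in genes if hits[gene] / n_runs >= min_frequency]

# Toy stand-in for the per-gene survival test (hypothetical, for illustration only).
def toy_p_value(gene, subset):
    if gene == "geneA":
        return 0.001  # significant in every subset
    if gene == "geneC":
        # significant only when two particular patients are both drawn
        return 0.01 if {"P0", "P1"} <= set(subset) else 0.5
    return 0.5        # never significant

patients = [f"P{i}" for i in range(10)]
selected = robust_genes(patients, ["geneA", "geneB", "geneC"], toy_p_value)
```

In the toy example only the gene that is significant in every subset is retained; a gene whose significance depends on which patients are drawn falls below the frequency threshold.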
2.4. TMRS gene panel generation using LASSO Cox regression
The Cox regression model with a least absolute shrinkage and selection operator (LASSO) penalty was implemented to reduce dimensionality and select the most useful prognostic markers among the robust prognostic genes identified above [28,29]. Notably, all gene expression values were dichotomised before entering the LASSO Cox regression; the “surv_cutpoint” function of the “survminer” R package was used to determine the optimal cut-off point of each gene based on maximally selected log-rank statistics. Moreover, we set the “minprop” parameter of the “surv_cutpoint” function (the minimal proportion of observations per group) to 30% to avoid groups with too few patients. Genes retained at the optimal value of the penalty parameter λ, which was determined by ten-fold cross-validation, constituted the TMRS panel in this study. Risk scores based on the TMRS panel were then constructed from the dichotomised expression values of the selected genes (a value of one or two was assigned to represent an expression value higher or lower than the cut-off value) via Cox regression analysis in the training cohort. The risk score was named TMRS-RFS when RFS was the endpoint used to generate the cut-off values and fit the Cox regression, and TMRS-OS when OS was the endpoint.
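To make the dichotomisation step concrete, the following sketch scans candidate cut-offs, skips any that would leave fewer than 30% of patients in either group, and keeps the cut-off with the largest two-group log-rank statistic. It is a simplified stand-in written in plain Python, not the “surv_cutpoint” algorithm itself. The function names, the toy data, the coding of 1 for values above the cut-off and 2 otherwise, and the coefficients in the risk-score example are assumptions for illustration.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (high vs. low expression)."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_high = sum(1 for i in at_risk
                     if times[i] == t and events[i] and in_high[i])
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0

def best_cutpoint(expression, times, events, minprop=0.3):
    """Cut-off maximising the log-rank statistic, with >= minprop patients per group."""
    n = len(expression)
    best_cut, best_stat = None, -1.0
    for cut in sorted(set(expression))[:-1]:  # "high" means expression > cut
        in_high = [x > cut for x in expression]
        n_high = sum(in_high)
        if min(n_high, n - n_high) / n < minprop:
            continue  # one group would be too small
        stat = logrank_chi2(times, events, in_high)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut

def dichotomise(expression, cut):
    """Assumed coding: 1 = above the cut-off, 2 = at or below it."""
    return [1 if x > cut else 2 for x in expression]

def risk_score(codes, coefficients):
    """Linear predictor: sum of coefficient * dichotomised value over panel genes."""
    return sum(coefficients[gene] * codes[gene] for gene in coefficients)

# Toy data (hypothetical): ten patients, all with an observed event.
expression = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
times = [20, 19, 18, 17, 16, 1, 2, 3, 4, 5]
events = [1] * 10
cut = best_cutpoint(expression, times, events)
codes = dichotomise(expression, cut)
score = risk_score({"geneA": 1, "geneB": 2}, {"geneA": 0.5, "geneB": -0.25})
```

The `minprop` check matters because an unconstrained search tends to favour extreme cut-offs that isolate a handful of patients.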
2.5. Estimation of immune infiltration
Immune infiltration was estimated using the “Microenvironment Cell Populations-counter (MCP-counter)” method, which allows robust quantification of the absolute abundance of ten immune and stromal cell populations in heterogeneous tissues from transcriptomic data. The R package “MCPcounter” was applied to transform mRNA data into levels of non-tumour cell infiltration in the tumour microenvironment, and gene expression profiles were prepared using standard annotation files prior to MCP-counter analysis. The stromal score, immune score, and tumour purity were estimated by applying the “ESTIMATE” R package developed by Yoshihara et al. (https://sourceforge.net/projects/estimateproject/).
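A marker-based abundance score of this kind can be sketched as follows, assuming that the score of a cell population in a sample is the mean log2 expression of that population's marker genes. The gene names, population name, and values are placeholders rather than the published MCP-counter signatures, and the code is a plain Python illustration, not the “MCPcounter” package.

```python
from statistics import mean

def population_scores(log2_expression, markers):
    """Mean log2 expression of each population's marker genes, per sample.

    log2_expression: {gene: {sample: value}}
    markers: {population: [marker genes]}
    """
    samples = sorted(next(iter(log2_expression.values())))
    scores = {}
    for population, genes in markers.items():
        present = [g for g in genes if g in log2_expression]  # ignore missing markers
        if not present:
            continue
        scores[population] = {
            s: mean(log2_expression[g][s] for g in present) for s in samples
        }
    return scores

# Placeholder data: two marker genes measured in two samples.
expression = {
    "G1": {"s1": 2.0, "s2": 4.0},
    "G2": {"s1": 4.0, "s2": 6.0},
}
markers = {"popA": ["G1", "G2", "G_missing"]}
scores = population_scores(expression, markers)
```

Scores computed this way are comparable between samples for a given population, but not between populations within a sample.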