Development of a risk prediction model for endometrial carcinoma among postmenopausal women in the Western province of Sri Lanka

Introduction: Globally, endometrial carcinoma is the most common reproductive tract cancer among women. A risk prediction model is a simple, low-cost tool to identify women at increased risk of developing endometrial carcinoma. Objectives: The aim of the study was to develop a model to predict the risk of endometrial carcinoma among postmenopausal women in Sri Lanka. Methods: A case-control study was conducted. Cases and controls were defined as postmenopausal women who had and had not been diagnosed with endometrial carcinoma, respectively, based on histological confirmation. Variables were selected considering the objectivity and feasibility of the measurements in addition to statistical criteria. A scoring system [0-9] was designed based on the weighted score of each risk predictor. Predictive validity of the model was tested by calibration and discrimination. The Receiver Operator Characteristic (ROC) curve was used to determine the cut-off value. Results: The developed model consisted of six predictors: age >55 years, never conceived, age at menarche  11 years, ever experienced postmenopausal bleeding, family history of any type of cancer among first-degree relatives, and generalized obesity. Discrimination of the model was measured by the area under the ROC curve (0.92, 95% Confidence Interval: 0.88-0.95). Calibration, assessed by the Hosmer and Lemeshow goodness-of-fit test (p=0.72), was satisfactory. The tool demonstrated good predictive ability, with a sensitivity of 79.5% (CI: 68.9%-87.3%) and a specificity of 90.7% (CI: 86.8%-93.5%) at the cut-off point of 4.5. Conclusions: The model demonstrated good discrimination and calibration. It can be used to screen women at high risk of developing endometrial carcinoma.


Introduction
Endometrial carcinoma (EC) is generally diagnosed at an early stage (nearly 80%) in developed countries due to early presentation to a health care facility [1]. Early-stage endometrial carcinoma has a good prognosis; therefore, early detection of the disease is vital. The five-year survival rates for stages I, II, III and IV of the disease are nearly 85%, 70-75%, 45% and <30% respectively [2]. However, there is hardly any evidence on the proportion of women diagnosed at an early stage, or on their prognosis, in developing countries.
The most common presentation of endometrial carcinoma is postmenopausal bleeding, defined as an episode of bleeding 12 months after the last menstrual period. Evidence shows that 10% of women with postmenopausal bleeding have EC as the diagnosis [3,4]. Transvaginal Ultrasound Scanning (TVS) provides the best screening test for early detection of EC [5]. With its introduction, the survival of women with endometrial carcinoma has improved dramatically over the years [6]. However, the use of TVS is limited to symptomatic women. Establishing an organized EC screening programme in a country is less feasible due to financial, human resource and infrastructure shortages [7], but screening only high-risk women may be an alternative for resource-limited countries.
Cancer risk prediction provides an estimate of the risk for developing cancer and can be used in reducing the disease burden. Prediction of cancer risk using a risk prediction model can help to identify individuals at high risk for any cancer, and follow them up closely with periodic screening and counsel them on behavioural changes to reduce risk. It is a less invasive and low-cost mechanism to reduce morbidity and mortality related to EC with early detection of the cancer [8].
(Index words: endometrial cancer, high-risk population, risk prediction model, postmenopausal women)

Original article
Risk prediction models are developed based on several risk factors associated with individual characteristics that are proven to be linked with the health event of interest [9]. A comprehensive analysis of risk factors should ideally be carried out through a prospective study, i.e. a cohort study. However, long follow-up and low feasibility discourage prospective studies and favour retrospective designs, i.e. case-control studies, in developing risk prediction models [10][11][12][13][14]. Predictors can range from subject characteristics such as age and sex, history and physical examination to more advanced techniques such as imaging, electrophysiology, blood and urine tests, or even genetic markers.
Several risk prediction models are available to predict the risk of a woman developing endometrial carcinoma. However, the risk predictors included in the models, and the weight each model assigns to each predictor, are specific to the setting in which the models were developed, which limits their use in other settings [4,13,15,16]. Most of the models have been developed to predict the risk of endometrial carcinoma among symptomatic women with a combination of risk factor assessment, ultrasound imaging and presence of biomarkers. A limitation inherent to these tools is that they have been developed based on country-specific risk factors and cannot be applied in another population without proper external validation. None of the available risk prediction models was developed or validated in a developing country. Therefore, the objective of this study was to develop a country-specific model to predict the risk of endometrial carcinoma among postmenopausal women in Sri Lanka.

Methods
A hospital-based unmatched case-control study was carried out from September 2016 to March 2017 in the Western Province. The required sample size was calculated for several common risk factors of endometrial carcinoma, and the largest calculated value was selected as the required sample size for the study. The values for the odds ratios (OR) of the different risk factors and their community prevalence were based on the available literature. The case to control ratio was taken as 1:4 and the power as 80%. The largest sample size calculated was 79; after adding 5% for non-response, the study included 83 newly diagnosed, histologically confirmed EC cases and 332 controls, confirmed as not having EC by hysteroscopy, endometrial biopsy or curettage. Participants were recruited from fourteen secondary and tertiary hospitals in the government sector where a Consultant Obstetrician and Gynaecologist and a Consultant Pathologist were available. Details of the methods have been published elsewhere [17]. The outcome of interest was defined as whether the patient was diagnosed with EC with histological confirmation. Assessment of the outcome variable was blinded to information about predictors, as the disease status had already been assessed before the study commenced.
An interviewer-administered questionnaire was used to assess potential risk predictors. It included, firstly, potential risk factors consistently identified in the literature [17][18][19][20][21][22][23][24] and, secondly, a few additional risk predictors agreed by consensus of a panel of experts. Information on socio-demographic, reproductive, lifestyle-related, biological, genetic and co-morbid factors was obtained, with verification through medical records when necessary. Five trained pre-intern medical officers collected the data. Data collectors were not blinded to the outcome while collecting predictor data.
Quality of diet was classified as optimal or sub-optimal consumption of energy-dense food and of food containing dietary fibre and anti-oxidants, using a validated Food Frequency Questionnaire. Overconsumption of energy-dense food and inadequate consumption of food containing dietary fibre and anti-oxidants were considered sub-optimal consumption. Physical activity was defined as lifetime total physical activity in terms of occupational, household, sports and exercise activities. It was assessed with a validated Life Time Total Physical Activity Questionnaire by calculating the average Metabolic Equivalent of Task (MET) value for total physical activities for a week during a year. MET values below the 25th percentile were considered low, values above the 75th percentile high, and values between the 25th and 75th percentiles average lifetime physical activity. Exposure to electromagnetic fields was defined as a distance of less than 100 m from home to high-tension wires or telecommunication towers. Exposure to outdoor air pollution was defined as having the house within 100 m of a main road or having a large industry within one km. Long-term illness was defined as a diagnosis of diabetes mellitus, hypertension or hyperlipidaemia. Generalized obesity was categorized based on Body Mass Index (BMI) according to the World Health Organization (WHO) definitions for adult Asians: a BMI of 25 kg/m² or more was categorized as obese. Central obesity was defined as a waist circumference of 80 cm or more, based on American Diabetes Association criteria.
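The arithmetic behind the recruited numbers can be sketched as follows. This is only an illustration: it assumes the 5% non-response allowance was applied by multiplying the required 79 cases by 1.05 and rounding up, which the text does not state explicitly, and the function name is hypothetical.

```python
import math

def inflate_for_non_response(n_required: int, non_response_rate: float) -> int:
    """Inflate a required sample size by a non-response allowance, rounding up.

    Assumption: the allowance is applied multiplicatively (n * (1 + rate)).
    """
    return math.ceil(n_required * (1 + non_response_rate))

# Values taken from the text: 79 required cases, 5% non-response, 1:4 ratio.
cases = inflate_for_non_response(79, 0.05)
controls = cases * 4
total = cases + controls

print(cases, controls, total)  # 83 332 415
```

Under this assumption the figures agree with the 83 cases, 332 controls and 415 participants reported in the study.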
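The categorisation rules described above can be illustrated with a short sketch. It is not the study's analysis code (the analysis was done in SPSS); the function names are hypothetical, the percentile method is an assumption, and values falling exactly on the 25th or 75th percentile are assumed to count as average.

```python
from statistics import quantiles

def classify_physical_activity(met_values):
    """Label each MET value as low, average or high using the sample's
    25th and 75th percentiles (assumption: boundary values are 'average')."""
    q1, _, q3 = quantiles(met_values, n=4, method="inclusive")
    labels = []
    for value in met_values:
        if value < q1:
            labels.append("low")
        elif value > q3:
            labels.append("high")
        else:
            labels.append("average")
    return labels

def is_generally_obese(weight_kg: float, height_m: float) -> bool:
    """Generalized obesity: BMI of 25 kg/m^2 or more (cut-off from the text)."""
    return weight_kg / height_m ** 2 >= 25.0

def is_centrally_obese(waist_cm: float) -> bool:
    """Central obesity: waist circumference of 80 cm or more (from the text)."""
    return waist_cm >= 80.0

# Hypothetical MET values, for illustration only.
print(classify_physical_activity([10, 20, 30, 40, 50]))
print(is_generally_obese(70, 1.60), is_centrally_obese(78))
```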
Data analysis was done using the Statistical Package for the Social Sciences (SPSS) version 16. Multiple logistic regression (LR) was performed to identify the independent risk predictors for endometrial carcinoma. The independent variables entered into the LR analysis were those that showed an association with endometrial carcinoma at a significance level of <0.2 in the bivariate analysis. Logistic regression analysis was carried out using the purposeful selection of variables method. Goodness of fit of the LR model was assessed by the Hosmer and Lemeshow test. The Cox and Snell R Square and Nagelkerke R Square values gave an indication of the amount of variation in the dependent variable explained by the model. The model with the best goodness-of-fit in the LR analysis was selected as the final LR model. Selection of the risk predictors in the model was based on the feasibility and reliability of the independent variables retained in the multivariable analysis. Variables prone to recall bias (e.g. X-ray exposure) or liable to change over time (e.g. income) were not included in the model. Two models were developed, each including six variables as risk predictors. In the first model, BMI and age were included as categorical variables and in the second model as continuous variables. However, at a later stage of development, the continuous predictors were categorised into several categories. Two scoring methods were used to calculate the individual score for each predictor variable.
Vol. 67, No. 4, December 2022
Four models were developed and assessed for model performance by discrimination and calibration. The discriminative performance of the model was assessed by Receiver Operator Characteristic (ROC) curve. Calibration curve and Hosmer-Lemeshow goodness-of-fit test were used to assess the calibration of the model [25].
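Discrimination as measured by the area under the ROC curve can be illustrated with a minimal sketch. It uses the rank-based interpretation of the AUC (the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half); this is an equivalent formulation chosen for brevity, not necessarily the procedure SPSS applies, and the scores below are hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the proportion of case-control pairs in which the case has
    the higher risk score; tied pairs contribute one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical total risk scores, for illustration only.
example_auc = auc_from_scores([5, 6, 7], [1, 2, 5])
print(round(example_auc, 3))  # 0.944
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls.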
In the final risk prediction model, each risk predictor carried a weighted score. This weight was based on the coefficient obtained for that variable in the LR model. Of the two scoring methods used to assign the weighted scores, the better one was selected based on the ROC values. The first method converted the value of each regression coefficient to the closest integer, and the second calculated point values relative to the smallest regression coefficient [25]. Finally, a single total risk score predicting the overall risk of an individual was calculated by adding up the weighted scores of all risk predictors present in the individual to whom the risk prediction model was applied.
Based on the scores assigned to each predictor in the model, every individual was assigned a total risk score. The ROC curve was plotted, and the point on the curve with the minimum distance (d²) to the upper left corner (0,1) of the ROC plane was used to determine the optimal cut-off point to discriminate cases from non-cases. The at-risk and not at-risk categories defined by the risk prediction model were tested against the true presence (cases) or absence (controls) of endometrial carcinoma. Sensitivity, specificity, likelihood ratios and predictive values were presented with 95% confidence intervals. Reliability of the developed risk prediction model was assessed by re-administering it to the cases and the controls using the test-retest method.
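The two scoring methods can be sketched as below. The coefficients and predictor names are hypothetical placeholders, not the study's values, and the second method is assumed to divide each coefficient by the smallest one and round the result to the nearest integer.

```python
def scores_nearest_integer(coefficients):
    """Method 1: round each regression coefficient to the closest integer."""
    return {name: round(beta) for name, beta in coefficients.items()}

def scores_relative_to_smallest(coefficients):
    """Method 2 (assumed form): divide each coefficient by the smallest
    coefficient and round to the closest integer."""
    smallest = min(abs(beta) for beta in coefficients.values())
    return {name: round(beta / smallest) for name, beta in coefficients.items()}

def total_risk_score(scores, predictors_present):
    """Add up the weighted scores of the predictors present in one woman."""
    return sum(scores[name] for name in predictors_present)

# Hypothetical coefficients, for illustration only.
coefficients = {"predictor_a": 0.6, "predictor_b": 1.4, "predictor_c": 2.3}

method_1 = scores_nearest_integer(coefficients)
method_2 = scores_relative_to_smallest(coefficients)
print(method_1)  # {'predictor_a': 1, 'predictor_b': 1, 'predictor_c': 2}
print(method_2)  # {'predictor_a': 1, 'predictor_b': 2, 'predictor_c': 4}
print(total_risk_score(method_2, ["predictor_a", "predictor_c"]))  # 5
```

With these placeholder values the second method spreads the predictors over a wider range of points than the first, which is one reason the two methods can yield different ROC values.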
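The cut-off selection rule can be sketched as follows, taking d² = (1 - sensitivity)² + (1 - specificity)² as the squared distance from a ROC point to the corner (0,1). The scores are hypothetical, the function names are illustrative, and a woman is assumed to be classified as at-risk when her total score is at or above the cut-off.

```python
def roc_points(case_scores, control_scores, cutoffs):
    """Return (cutoff, sensitivity, specificity) for each candidate cut-off.

    Assumption: a total score >= cutoff is classified as 'at-risk'.
    """
    points = []
    for cutoff in cutoffs:
        sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
        specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
        points.append((cutoff, sensitivity, specificity))
    return points

def squared_distance_to_corner(sensitivity, specificity):
    """d^2 from the ROC point (1 - specificity, sensitivity) to (0, 1)."""
    return (1 - sensitivity) ** 2 + (1 - specificity) ** 2

def optimal_cutoff(points):
    """Pick the point with the smallest d^2."""
    return min(points, key=lambda p: squared_distance_to_corner(p[1], p[2]))

# Hypothetical total risk scores, for illustration only.
points = roc_points([5, 6, 7, 3], [1, 2, 5, 2], [2.5, 4.5, 5.5])
best = optimal_cutoff(points)
print(best)  # (2.5, 1.0, 0.75)
```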

Results
The total study sample for the development study consisted of 415 postmenopausal women, of whom 83 were cases diagnosed with endometrial carcinoma and 332 were controls who were free of endometrial carcinoma at the time of recruitment. The response rate of both cases and controls was 100%. A majority of the sample was above 55 years (n=223, 53.7%), Sinhalese (n=364, 87.7%), Buddhist (n=333, 80.2%), and had a family income of more than twenty thousand rupees (n=229, 55.2%).
The risk predictors included in the model were age 55 years or more, never conceived, age at menarche 11 years, family history of any type of cancer, generalized obesity and ever experienced postmenopausal bleeding. The unadjusted odds ratios of the risk predictors included in the model are shown in Table 1. The sample included in the multivariable analysis comprised 415 postmenopausal women.

Selection of best risk prediction model
The two models, with BMI and age as categorical variables and as continuous variables, are shown in Table 2 and Table 3 respectively. At a later stage of development, the continuous predictors were categorised into several categories, as shown in Table 4. Table 2 also shows the final scores allocated to each variable by the two scoring methods in model one. The total score was 11 for method I and 9 for method II. The final scores of model two by the two scoring methods are shown in Table 4; the maximum total scores of methods I and II were 13 and 20 respectively. The area under the curve (AUC) values of the ROC curves drawn for model I and model II with each scoring system are given in Table 5. The highest AUC was 0.92 (95% CI: 0.88, 0.95) (Figure 1), corresponding to model I (categorical variables) with scoring system II. Therefore, it was selected as the final risk prediction model. The Hosmer-Lemeshow goodness-of-fit test gave a significance of 0.72. The calibration curve, which provides insight into the calibration of the model, is shown in Figure 2.

Development of a cut-off point to predict the risk of developing endometrial carcinoma
Cut-off values of the total risk score ranged from -1 to 9. The shortest distance (d²) on the ROC curve for the final model was 0.051 (Table 6). It corresponded to a total risk score of 4.5, indicating 4.5 to be the optimal cut-off value for categorizing each participant as "at-risk" or "not at-risk" of developing endometrial carcinoma. Table 7 shows the validity indicators of the final risk prediction model estimated at the calculated cut-off of 4.5. The sensitivity of the model was 79.5% (95% CI: 68.9, 87.3) and the specificity was 90.7% (95% CI: 86.8, 93.5). The positive likelihood ratio, which indicates how much a positive test result increases the probability of having the disease, was 8.55 (95% CI: 5.99, 12.11), and the negative likelihood ratio, which indicates how much a negative test result decreases that probability, was 0.23 (95% CI: 0.15, 0.35). These estimates indicated good predictive ability of the risk prediction model in the development sample. The model demonstrated good test-retest reliability, with a correlation coefficient of 0.85 at the 0.05 significance level.
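As a consistency check, the likelihood ratios and the distance d² can be recomputed from the reported sensitivity and specificity using the standard definitions LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity and d² = (1 - sensitivity)² + (1 - specificity)². The sketch below uses the rounded percentages quoted in the text, so it is only an approximate check; the function names are illustrative.

```python
def likelihood_ratio_positive(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)

def likelihood_ratio_negative(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity

def squared_distance_to_corner(sensitivity, specificity):
    """Squared distance from the ROC point to the upper left corner (0, 1)."""
    return (1 - sensitivity) ** 2 + (1 - specificity) ** 2

# Rounded values reported in the text at the cut-off of 4.5.
sensitivity = 0.795
specificity = 0.907

print(round(likelihood_ratio_positive(sensitivity, specificity), 2))   # 8.55
print(round(likelihood_ratio_negative(sensitivity, specificity), 2))   # 0.23
print(round(squared_distance_to_corner(sensitivity, specificity), 3))  # 0.051
```

The recomputed values match the reported LR+ of 8.55, LR- of 0.23 and d² of 0.051.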

Discussion
The aim of the present study was to develop a simple, low-cost and user-friendly risk prediction model for endometrial carcinoma. Predictors included in the final risk prediction model were age more than 55 years, history of never having conceived, age at menarche 11 years, a family history of any type of cancer, postmenopausal bleeding and generalized obesity with a BMI of 25 kg/m² or more.
In the present study, a prognostic model was developed to predict the risk of a postmenopausal woman developing endometrial carcinoma in the future. The preferred study design for developing a prediction model is a prospective cohort study, as it facilitates optimal measurement of predictors and outcomes [26]. The model should ideally be developed using individuals in good health [27]. In the current study, however, a case-control design was used because of limited time and other logistic constraints, which is a limitation of the present study. Nevertheless, the case-control design is an alternative to longitudinal designs in developing risk prediction models [14] and has been used to develop risk prediction models for other carcinomas in the literature [10][11][12].
The present study was based on a scientifically estimated sample size to minimize the role of chance in sampling. A case to control ratio of 1:4 was used in the sample size calculation to strengthen the statistical power of the study. Several studies have suggested estimating the sample size based on the rule of 'at least ten events per variable' [26,28,29]. In this study, 83 cases of endometrial carcinoma were used to develop a final risk prediction model with only six variables, indicating the adequacy of the sample size.
The primary objective of developing the risk prediction model was to identify the 'at risk' population using a simple, quick and low-cost tool. Therefore, it was ensured that the developed model was user-friendly and easy to apply in a community or clinic setting as a screening tool to identify postmenopausal women 'at risk' of developing endometrial carcinoma. Ideally, a risk prediction model intended to screen high-risk women before symptoms occur should include only risk factors of endometrial carcinoma, because including presenting symptoms inflates the predictive value of the tool. In this study, postmenopausal bleeding, one of the common presenting symptoms of endometrial carcinoma, was included as a risk predictor. This model was therefore developed as a detection tool to improve the detection rate of endometrial carcinoma among women, with or without symptoms, who had not sought medical care. Further, the risk predictors in the current study have also been found in other risk prediction models for endometrial carcinoma [4,13,15]. Endometrial thickness was a common predictor in many of these models [4,13,15], and models incorporating endometrial thickness were more accurate in predicting endometrial carcinoma. However, in a low-resource country like Sri Lanka, using a risk prediction model that requires ultrasound imaging in the community, or even at a clinic, is not feasible. Hence, a model based on clinical characteristics would be a better option for early triage of high-risk women. Potential bias introduced by predictor assessment contaminated by knowledge of the outcome is also a limitation of the study.
The methods used to assess model performance in the present study were consistent with those of similar studies developing risk prediction models for EC. An AUC of 0.92 indicated good discriminative power, compared to other models, in differentiating individuals who are at-risk from those not at-risk of developing EC. The calibration was also assessed by calculating the agreement between the predicted and observed probabilities of developing EC. Ideally, the model should be highly sensitive as well as highly specific, though this is not possible because of the trade-off between sensitivity and specificity. The model developed in the present study showed high specificity at the expense of sensitivity. It is well accepted that the specificity of a model should be increased at the expense of sensitivity when the cost and the risk associated with further investigations are considerable [30]. Women identified by the model as being at increased risk of developing EC need to be closely followed up and to undergo invasive and costly investigations for definitive diagnosis. Hence, a model with higher specificity than sensitivity is justifiable.
[Figure: ROC curve; axes: 1 - Specificity (x), Sensitivity (y)]
The present study was carried out in the Western province, where the socio-economic status of residents differs from that of residents of the other provinces. The district of Colombo, being highly commercial and exposed to a mixture of modern influences, shapes the lifestyle of residents of the Western province. Exposure to several lifestyle and environmental risk factors among women of the Western province may therefore differ from that of women in other provinces. Hence, the magnitude of the risk factors identified in the risk profile for endometrial carcinoma cannot be generalized to women in the rest of the country. This was identified as a limitation of the study.
The developed risk prediction model can be incorporated into the existing healthcare system in both community and clinical settings. Well Women Clinics (WWC), Healthy Lifestyle Centres (HLC) and Medical Officer of Health (MOH) offices are common community settings that postmenopausal women attend to obtain health services in Sri Lanka. In addition, gynaecology clinics, medical clinics, eye clinics, outpatient departments and primary healthcare units are the most common clinical settings in which the model can be applied to identify postmenopausal women at increased risk of developing EC in the Sri Lankan healthcare setting.
The risk prediction model developed to predict the risk of EC among postmenopausal women comprised six risk predictors. The model demonstrated good discrimination and calibration. It has many features of a model that can be applied in the community as well as in the clinical setting at low cost and with minimal time.
The performance of a predictive model is overestimated when it is determined only on the sample of subjects used to develop the model [31]. After developing a prediction model, it is strongly recommended to evaluate its performance in a set of participants other than that used for model development [32]. Therefore, temporal validation was carried out as a form of external validation of the developed risk prediction model before its application in practice [33].