A Goodness of Fit Test for a Survival and Count Bayesian Joint Model: In the Presence of Clusters

Bayesian statistical model fitting was an uncommon approach until recently, which caused a lack of assessment techniques for such models. However, with improved computational facilities and advanced estimation techniques, Bayesian models have become popular. Although well-developed goodness of fit (GOF) tests are available for classical multilevel models, including joint models of mixed responses, there is no suitable model-based GOF test for such a model fitted under a Bayesian framework. Therefore, this study focused on developing a suitable GOF test for multilevel Bayesian joint models with survival and count responses, two frequently occurring data types in many fields. The novel test is developed mainly from four classical GOF tests, including the well-known Hosmer-Lemeshow test, together with Bayesian concepts such as credible intervals and credible regions. In addition, a simulation study was used to examine the properties of the GOF test, alongside an application to a real-life example. The novel test performed well in terms of power and acceptably in terms of Type I error rates. Overall, the test performed well with small sample sizes.


Introduction
The research is conducted with the aim of suggesting a method to assess the goodness of fit (GOF) of a joint Bayesian model of bivariate responses in the presence of cluster variation (i.e., under multilevel modelling). Multilevel modelling, which introduces random effects into the model, has become a popular area of statistical modelling due to its applicability in various disciplines. It is widely used in the medical, biological, educational and social sciences to overcome the consequences of ignoring hierarchical data structures, such as underestimated standard errors of regression coefficients (Agresti, 2003). Furthermore, of the two main statistical modelling paradigms, the classical (frequentist) and the Bayesian, the popularity of Bayesian modelling has risen in recent decades with the vast growth in computational power (Dunson & Herring, 2005; Bello et al., 2009; Chen et al., 2016). A special advantage of Bayesian modelling is that it provides precise estimates of posterior distributions for small samples when informative priors are used (Gelman et al., 2004). Thus, the study focuses mainly on Bayesian modelling in the presence of a hierarchical data structure, where estimation is achieved by a sampling technique, namely the Markov chain Monte Carlo (MCMC) approach. In addition, the study is built on a joint model of two commonly occurring mixed responses, a survival time and a count. Although the joint modelling of these two variables is challenging due to their different distributional nature, it usually outperforms separate univariate analyses because of the natural correlation between the variables (Sunethra & Sooriyarachchi, 2020; Hapugoda & Sooriyarachchi, 2018). The literature suggests that there is no model-based GOF test to assess the adequacy of multilevel joint Bayesian models fitted to survival and count responses.
Perera, Sooriyarachchi, and Wickramasuriya (2016) proposed a GOF test for multilevel logistic models with a single binary response. More recently, Adhikari and Sooriyarachchi (2020) developed a GOF test for clustered survival and count data based on the tests of Hosmer and Lemeshow (1980), Lipsitz et al. (1996) and Perera et al. (2016). All these tests were developed for classical statistical models and thus cannot be used to assess the GOF of a joint Bayesian model. Therefore, the main objective of this research is to propose a GOF test for such models. The scope of the study was narrowed to small samples, since Bayesian analysis works well even in small-sample situations; the novel development thus brings GOF assessment into the Bayesian paradigm, which is more practical and better equipped for modelling small samples. The test was developed based on the joint Bayesian model of Hapugoda and Sooriyarachchi (2018), which combines a Discrete Time Hazard Model for the survival data with a Normal model for the log count. Moreover, to assess the performance of the developed test, two properties, the Type I error and the power of the test, were examined using a simulation study. To generate data for the simulation study, the MCMC procedure in SAS software version 9.4 was used. As a further objective, the developed test was applied to a real-life dataset to investigate how well it performs in practice.
The paper is organized into five sections. The first section provides an introduction to the study and reviews literature related to the research problem. Section 2 discusses the methodologies associated with the study and the novel GOF test procedure. Section 3 describes the simulation studies used to assess the properties of the test. Section 4 gives an application of the developed test. Finally, Section 5 concludes the paper.

Generalized Linear Mixed Models
To deal with the multilevel data structure of the study, generalized linear mixed modelling (GLMM), an extension of generalized linear modelling (GLM) that allows cluster-specific components, is used. Let y_ij represent observation j in cluster i, where j = 1, 2, ..., n_i; the number of observations can vary from cluster to cluster. Let x_ij be the vector of explanatory variables for the fixed-effect parameters β. Let u_i be the vector of random effects for cluster i, which is common to all observations within a cluster, and let z_ij be the covariate vector for the random effects. A GLMM extends an ordinary GLM by modelling the conditional expectation, g(E(y_ij | u_i)) = x_ij'β + z_ij'u_i, where g is the link function and u_i is assumed to follow a normal distribution with zero mean and constant variance (Agresti, 2003).
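As an illustration, a two-level random-intercept logistic GLMM of the kind described above can be simulated as follows; the coefficient values, cluster counts and sizes here are arbitrary choices for the sketch, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_glmm_binary(n_clusters=5, cluster_size=20, beta=(-0.5, 0.8), sigma_u=1.0):
    """Simulate a two-level random-intercept logistic GLMM:
    logit(P(y_ij = 1 | u_i)) = beta0 + beta1 * x_ij + u_i, with u_i ~ N(0, sigma_u^2)."""
    beta0, beta1 = beta
    rows = []
    for i in range(n_clusters):
        u_i = rng.normal(0.0, sigma_u)                 # cluster-specific random effect
        x = rng.normal(2.0, 0.2, size=cluster_size)    # covariate for the cluster members
        eta = beta0 + beta1 * x + u_i                  # linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))                 # inverse logit link
        y = rng.binomial(1, p)                         # binary responses
        rows.append((i, x, y))
    return rows

data = simulate_glmm_binary()
```

Because u_i is shared by all observations in cluster i, responses within a cluster are correlated even though they are conditionally independent given the random effect.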

Bayesian Approach and Estimation Methods
In the Bayesian approach, parameters are random variables and are treated as degrees of belief. As the name implies, Bayesian methods use the well-known Bayes' theorem as their baseline. Thus, conclusions about an unknown parameter θ are made in terms of conditional probability statements given the observed values of y: p(θ|y), where p(θ|y) ∝ p(θ)p(y|θ) for a prior distribution p(θ) and a data distribution p(y|θ) (Goldstein, 2011). Two types of prior distributions are used in Bayesian modelling, namely non-informative priors and informative priors (Browne & Draper, 2006). In this study, non-informative priors are used when fitting the models, as even with only a little prior information about the parameters and the data, these can keep the Bayesian estimation in the correct direction (Goldstein, 2011).
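The relation p(θ|y) ∝ p(θ)p(y|θ) can be made concrete with a minimal conjugate example, a Beta prior on a success probability combined with a Binomial likelihood; the numbers below are illustrative only.

```python
# Conjugate illustration of p(theta | y) ∝ p(theta) p(y | theta):
# a Beta(a, b) prior on a success probability theta with a Binomial likelihood
# yields a Beta(a + successes, b + failures) posterior.
def beta_binomial_posterior(a, b, successes, failures):
    """Return the parameters of the Beta posterior after observing the data."""
    return a + successes, b + failures

# A flat (non-informative) Beta(1, 1) prior updated with 7 successes, 3 failures:
a_post, b_post = beta_binomial_posterior(1, 1, 7, 3)
post_mean = a_post / (a_post + b_post)   # posterior mean of theta
```

With a flat prior the posterior is driven entirely by the data, which is the sense in which a non-informative prior keeps the estimation "in the correct direction".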

Bayesian Hierarchical Models
Due to the hierarchical behaviour of the data in the study, multiple parameters are involved, as in many statistical applications. This implies the need for a joint probability model to reflect the dependence between these parameters (Gelman et al., 2004). Suppose θ_j is the parameter of interest for the jth group, j = 1, 2, ..., J, and θ has a prior distribution with parameter vector ϕ (the hyper-parameters). A hierarchical structure arises for these models since ϕ is not known and has its own prior distribution p(ϕ). Thus, the joint prior distribution for the vector (ϕ, θ) is p(ϕ, θ) = p(ϕ) p(θ|ϕ), and the joint posterior distribution is p(ϕ, θ | y) ∝ p(ϕ, θ) p(y | ϕ, θ) = p(ϕ, θ) p(y | θ), since p(y | ϕ, θ) depends only on θ; y is affected by the hyper-parameters only through θ (Gelman et al., 2004).

Markov chain Monte Carlo Simulations
Markov chain Monte Carlo (MCMC) is a simulation procedure used to fit Bayesian models, which draws values of a parameter θ from approximating distributions and corrects the draws to better approximate a target posterior distribution. The samples are drawn sequentially, with each draw depending only on the previous one; hence the notion of a Markov chain. To converge to a specific target distribution, the approximating distributions are improved at each step of the simulation, using the Markov property (Gelman et al., 2004). Among the various Markov chain simulation algorithms, the Gibbs sampler and the Metropolis-Hastings (MH) algorithm are widely used. A recent comparative study of Bayesian MCMC methods in the multilevel context by Karunarasan, Sooriyarachchi, and Pinto (2021), comparing two-level random intercept models, showed that the MH algorithm is superior to the Gibbs sampler for small sample sizes. Concepts associated with MCMC algorithms such as burn-in, thinning and chain length are explained in Karunarasan et al. (2021).
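A minimal sketch of the random-walk Metropolis-Hastings algorithm, including the burn-in and thinning concepts mentioned above, might look as follows; the standard-normal target and all tuning constants are illustrative assumptions, not the settings used in the study.

```python
import numpy as np

def metropolis_hastings(log_target, init, n_iter=5000, step=1.0, burn_in=1000, thin=5, seed=0):
    """Random-walk Metropolis-Hastings sampler with burn-in and thinning."""
    rng = np.random.default_rng(seed)
    theta = init
    draws = []
    for t in range(n_iter):
        proposal = theta + rng.normal(0.0, step)           # symmetric proposal
        log_ratio = log_target(proposal) - log_target(theta)
        if np.log(rng.uniform()) < log_ratio:              # accept with prob min(1, ratio)
            theta = proposal
        if t >= burn_in and (t - burn_in) % thin == 0:     # keep post-burn-in, thinned draws
            draws.append(theta)
    return np.array(draws)

# Target: a standard normal posterior (log density up to an additive constant)
samples = metropolis_hastings(lambda x: -0.5 * x * x, init=0.0)
```

Discarding the burn-in removes draws made before the chain reaches the target region, and thinning reduces the autocorrelation between the retained draws.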

Summarizing Posterior Inference
The Bayesian summary statistics of the posterior distribution used in this study are Bayesian credible intervals and the Deviance Information Criterion (DIC). Bayesian Credible Intervals: In this study, a particular type of Bayesian credible interval, the highest posterior density (HPD) interval, is used to assess model adequacy, since p-values are not part of a Bayesian analysis. The key purpose of these intervals is to describe and summarise the uncertainty of the unknown parameters to be estimated (SAS/STAT 15.1 User's Guide); the 95% HPD interval contains the 95% most credible parameter values. Joint Bayesian model assessment based on 95% credible intervals is discussed in section 2.5. Deviance Information Criterion (DIC): The DIC is used as the Bayesian model assessment tool for selecting the best joint model in the application of the study (section 4.2). The smaller the DIC, the better the fit of the model.
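As an illustration of what an HPD interval is, the shortest interval containing a given proportion of posterior draws can be computed directly from an MCMC sample; this is a generic sketch, not the SAS procedure used in the study.

```python
import numpy as np

def hpd_interval(draws, cred=0.95):
    """Highest posterior density interval: the shortest interval
    containing a proportion `cred` of the posterior draws."""
    sorted_draws = np.sort(np.asarray(draws))
    n = len(sorted_draws)
    m = int(np.ceil(cred * n))                       # draws inside the interval
    widths = sorted_draws[m - 1:] - sorted_draws[: n - m + 1]
    j = int(np.argmin(widths))                       # start of the shortest window
    return sorted_draws[j], sorted_draws[j + m - 1]

rng = np.random.default_rng(1)
draws = rng.normal(0.0, 1.0, size=20000)             # stand-in for posterior draws
lo, hi = hpd_interval(draws, 0.95)                   # ≈ (-1.96, 1.96) for a standard normal
```

For a symmetric unimodal posterior the HPD interval coincides with the equal-tailed interval; for skewed posteriors it is strictly shorter.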

Joint Modelling of Survival and Count data
Modelling of survival and count variables as a bivariate response is limited, especially when the data have a hierarchical structure. In such scenarios, it is better to model these variables jointly using random effects, which account for the clustering effect as well as the correlation among responses (Hapugoda & Sooriyarachchi, 2018; Sunethra & Sooriyarachchi, 2020). One such development is the joint survival and count model of Hapugoda and Sooriyarachchi (2018), which combines a Discrete Time Hazard Model (DTHM) with Poisson regression. To overcome the difficulties arising from the nature of the two response variables (i.e., the survival variable is continuous while the count is discrete), the DTHM is used for the survival time by converting it to an appropriate discrete time scale. For a more complex method of fitting a joint model of these mixed responses, using different random effects and complex continuous survival distributions, see Sunethra and Sooriyarachchi (2020). The joint Bayesian model fitted for these responses by Hapugoda and Sooriyarachchi (2018) is explained in the following section.

The joint Bayesian model used as the basis for this study
The joint model of the DTHM with the Poisson model, for survival and count responses using Bayesian methods (Hapugoda & Sooriyarachchi, 2018), is used as the basis for the current GOF test development. For simplicity, the model formulation is given here using the same notation as in the upcoming sections. As there are two responses, the subscripts 1 and 2 (i = 1, 2) are used to distinguish them. The notation g indicates that the event of interest happens in time interval g. Let Y_1gjk denote the binary (survival) response measured on the jth individual in the kth cluster, with probability of success π_1gjk, and let Y_2jk denote the count response measured on the jth individual in the kth cluster, with expected count λ_2jk. X_jk denotes the explanatory variables. For Y_1gjk ~ Binary(π_1gjk) and Y_2jk ~ Poisson(λ_2jk), the marginal models are l_1(π_1gjk) = logit(π_1gjk) = η_1gjk and l_2(λ_2jk) = log(λ_2jk) = η_2jk, where l_i denotes the GLM link function for response i (l_1 is the logit link and l_2 is the log link) and η denotes the corresponding linear predictor in X_jk and the random effects. Note: here the second response variable Y_2jk has been modelled directly as the log of the count (log(Y_2jk) = LY_2jk), which is normally distributed with mean µ_2jk; hence l'_2(µ_2jk) = µ_2jk, where l'_2 denotes the identity link.

The steps of novel goodness of fit test of a joint Bayesian model for survival and count responses
The notation used in the following steps has the same meaning as in the definition of the joint Bayesian model for survival and count responses in section 2.4.
Step 1: The joint Bayesian model is fitted as defined in section 2.4, and the model parameters are estimated using the MCMC method defining appropriate prior distributions.
Step 2: Using the model, for each jth individual nested within the kth cluster, the fitted values of the binary survival data and log count data (π̂_1gjk, µ̂_2jk) are obtained.
Step 3: Following Perera et al. (2016), the Hosmer and Lemeshow (1980) approach is applied within clusters: the fitted survival probabilities π̂_1gjk are sorted in ascending order within each cluster, the sorted values are grouped into 5 groups within each cluster, and the ranks G = 1, 2, ..., 5 are assigned.
Step 4: To capture the G = 5 groups, four indicator variables are introduced as follows: I_G = 1 if the observation belongs to group G, and I_G = 0 otherwise, where G = 2, 3, 4, 5, as group 1 is taken as the reference group. Afterwards, the observations are re-arranged by observation ID (i.e., the data set is restored to its order before sorting). This means the ranks are pooled over the clusters, defining the overall 5 groups assigned to the data (Perera et al., 2016).
Step 5: Similarly, the estimated log counts µ̂_2jk are sorted in ascending order within each cluster, and the sorted values are grouped into 5 groups within each cluster. Subsequently, the ranks G = 1, 2, ..., 5 are assigned, and four indicator variables are introduced.
where G = 2, 3, 4, 5 and the first group is taken as the reference group. Then, the data set is re-arranged as per the observation ID.
Step 6: The four indicator variables are included in each marginal model of the joint model of interest, resulting in the following alternative model. For the binary survival response, logit(π_1gjk) = η_1gjk + γ_2_1 I_2 + γ_3_1 I_3 + γ_4_1 I_4 + γ_5_1 I_5, and for the log count, µ_2jk = η_2jk + γ_2_2 I_2 + γ_3_2 I_3 + γ_4_2 I_4 + γ_5_2 I_5, where η denotes the linear predictor of the original joint model. The indicator variables thus introduce new parameters (γ_2_1, γ_3_1, ..., γ_4_2, γ_5_2) into the joint model. The revised model is estimated using the MCMC method.
Step 7: If all these parameters are simultaneously equal to zero, then the joint Bayesian model is correctly specified. Hence, the hypotheses to be tested are: H_0: γ_2_1 = γ_3_1 = γ_4_1 = γ_5_1 = γ_2_2 = γ_3_2 = γ_4_2 = γ_5_2 = 0 vs. H_1: at least one γ_G_i (for G = 2, ..., 5 and for i = 1, 2) is not equal to zero.
Step 8: The Bayesian 95% credible intervals for the 4 indicators in each marginal model are used to test the above hypotheses, at an overall α = 0.05 × 4 = 0.2, corresponding to a credible region of 0.8 (this is further explained in section 3.3). If zero is included in every credible interval belonging to the 4 indicator variables, we do not reject H_0, implying that all the parameters are simultaneously equal to zero; therefore, the fitted joint Bayesian model is adequate. If zero is excluded from at least one credible interval, we reject H_0, implying that at least one of the parameters is not equal to zero; therefore, the fitted joint Bayesian model is not adequate.
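The grouping and decision steps above (Steps 3-5 and Step 8) can be sketched in code as follows; the function names and toy data are illustrative assumptions, and the MCMC refitting of Step 6 is not shown.

```python
import numpy as np

def quintile_indicators(fitted, clusters):
    """Steps 3-5: within each cluster, rank the fitted values, split them into
    G = 5 groups, then build 4 indicator columns (group 1 = reference).
    Returns an (n, 4) 0/1 array aligned with the original observation order."""
    fitted = np.asarray(fitted)
    clusters = np.asarray(clusters)
    groups = np.empty(len(fitted), dtype=int)
    for c in np.unique(clusters):
        idx = np.where(clusters == c)[0]
        order = idx[np.argsort(fitted[idx])]       # sort within the cluster
        splits = np.array_split(order, 5)          # 5 (roughly equal) groups per cluster
        for g, members in enumerate(splits, start=1):
            groups[members] = g
    return np.column_stack([(groups == g).astype(int) for g in range(2, 6)])

def model_adequate(hpd_intervals):
    """Step 8: do not reject H0 (model adequate) iff every credible
    interval for the indicator coefficients contains zero."""
    return all(lo <= 0.0 <= hi for lo, hi in hpd_intervals)

# Toy check: 2 clusters of 10 fitted survival probabilities each
rng = np.random.default_rng(0)
fitted = rng.uniform(size=20)
clusters = np.repeat([1, 2], 10)
I = quintile_indicators(fitted, clusters)
```

The same `quintile_indicators` call is applied once per response (fitted probabilities and fitted log counts), giving the 8 indicator columns of the alternative model.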

Simulation Study
The objective of the simulation study is to assess Type I error and power of the developed test. The data for the simulation study are generated using SAS macros based on the MCMC Procedure in SAS. Different parameters and combinations are considered, and for each scenario, 1000 datasets are generated. The two properties are assessed using the parameter estimates and their credible intervals, which are derived from these datasets (Goldstein, 2011). The design of the simulation procedure and the model fitting is similar for each of the combinations used in the study.

Parameters and Combinations used for the Simulations
Let the binary (survival) response be denoted by Y_1gjk with probability of success π_1gjk. The log count is denoted by LY_2jk with expected log count µ_2jk, where the count response is denoted by Y_2jk with expected count λ_2jk. Note that the subscripts 1 and 2 refer to the survival model and the log count model, respectively, and j and k denote the second and third levels. The common explanatory variable is represented by X_jk, and the survival time indicators in the survival model are represented by T_1 and T_2.
To simulate the binary (Y_1gjk) and Poisson (Y_2jk) data before fitting the joint model, the model coefficients were set either by trial and error or by directly using values obtained in similar studies (Perera et al., 2016; Adhikari & Sooriyarachchi, 2020). The fixed effect coefficients were chosen as β_01 = 0.5694, β_02 = −0.25, β_1 = −0.725, β_2 = −0.2, α_1 = 0.02, and α_2 = 0.025, where β_01 and β_02 are the two intercepts, β_1 and β_2 are the slope parameters, and α_1 and α_2 are the coefficients of T_1 and T_2. To simulate X_jk, a normal distribution with mean 2.0 and standard deviation 0.2 is employed (Perera et al., 2016). T_1 and T_2 are created as binary variables with probabilities 0.2 and 0.4, respectively, while the third indicator, T_3, with probability 0.6, is taken as the reference level. Using these coefficients and parameters, Y_1gjk and Y_2jk are simulated by defining the expected values π_1gjk and λ_2jk, respectively, considering one combination at a time as described below. After generating the Poisson data for the second response, the log of the count (LY_2jk) is obtained. Hence, in the model fitting, the distribution of the survival response is binary, while the distribution of the log count is defined as normal.
The main reason for using a joint model for the survival and count responses is the dependency between these two responses. To account for the correlation between the responses of the same individual and the correlation within individuals in the same cluster, two levels of random effects are used in the simulation study (Sunethra & Sooriyarachchi, 2020). The random effect for the individuals (second level) is denoted by u_0jk, and the random effect for the clusters (third level) is denoted by v_0k. One individual represents two responses, which defines the first level of the model. The random effects are generated as u_0jk ~ N(0, σ_u²) and v_0k ~ N(0, σ_v²). Three different combinations of standard deviations (SD) for u_0jk and v_0k are considered such that σ_u² > σ_v²: (σ_u, σ_v) = (1, 0.75), (1, 0.5), and (0.75, 0.5). The cluster sizes are chosen to be multiples of 5 as this simplifies the implementation of the grouping strategy in the simulations. The different combinations of the two main criteria for clusters are as follows: 1) number of clusters (K) = 3, 5, 7, 10; 2) number of individuals per cluster (n_k) = 10, 15, 20, 25. If n denotes the total sample size, then n = n_k × K. Thus, the study considers 4 × 4 = 16 sample size combinations. Note that there are 16 combinations of sample sizes (n_k × K) for each of the 3 combinations of standard deviations; in total, this gives 3 × 4 × 4 = 48 combinations.
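One plausible reading of this data-generating design can be sketched as follows. The exact form of the linear predictors is not fully specified in the text, so their additive composition here, and the +1 guard before taking logs of counts, are assumptions of the sketch.

```python
import numpy as np

def simulate_joint_data(K=5, n_k=10, sigma_u=1.0, sigma_v=0.75, seed=0):
    """Sketch of the simulation design: generates the binary survival response
    Y1 and the log count LY2 for K clusters of n_k individuals, with individual
    (u) and cluster (v) random effects and the paper's fixed-effect values.
    The exact linear predictors are an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    b01, b02, b1, b2, a1, a2 = 0.5694, -0.25, -0.725, -0.2, 0.02, 0.025
    recs = []
    for k in range(K):
        v_k = rng.normal(0.0, sigma_v)                 # cluster (third-level) effect
        for j in range(n_k):
            u_jk = rng.normal(0.0, sigma_u)            # individual (second-level) effect
            x = rng.normal(2.0, 0.2)                   # common covariate X_jk
            t1 = rng.binomial(1, 0.2)                  # survival time indicators
            t2 = rng.binomial(1, 0.4)
            eta1 = b01 + b1 * x + a1 * t1 + a2 * t2 + u_jk + v_k
            p1 = 1.0 / (1.0 + np.exp(-eta1))           # P(event) via logit link
            y1 = rng.binomial(1, p1)
            lam = np.exp(b02 + b2 * x + u_jk + v_k)    # Poisson mean via log link
            y2 = rng.poisson(lam)
            ly2 = np.log(y2 + 1.0)                     # log count (+1 guards zero counts)
            recs.append((k, j, y1, ly2))
    return recs

data = simulate_joint_data()
```

Sharing u_jk and v_k between the two linear predictors is what induces the within-individual and within-cluster dependence that motivates the joint model.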
To fit the Bayesian marginal models, the following MCMC parameters are used: number of posterior simulation iterations (NMC) = 15000, thinning rate (THIN) = 10, maximum number of autocorrelation lags (ACLAG) = 1000, and number of tuning iterations (NTU) = 1000. The MCMC sample size (N) retained after thinning is 1500.

Prior distributions used in the Simulations
To fit hierarchical models using PROC MCMC, prior distributions for the fixed-effect parameters and hyperprior distributions for the random-effect parameters were specified. Normal priors with large constant variances are used for estimating the fixed effects of the joint model, as such normal priors are usually considered non-informative or weakly informative. The prior distribution for the variance of a random effect is taken to be inverse-gamma, since using the conjugate inverse-gamma distribution as the non-informative prior for the variance of a normally distributed random effect is more efficient than the alternatives (Chen et al., 2016).
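The efficiency of the conjugate inverse-gamma prior comes from its closed-form posterior update. For a normal sample with known mean, the update can be written down directly; the prior shape and scale values below are illustrative weakly-informative choices, not those of the study.

```python
import numpy as np

def invgamma_posterior(a, b, data, mu=0.0):
    """Conjugate update: for x_i ~ N(mu, sigma^2) with known mean and an
    inverse-gamma(a, b) prior on sigma^2, the posterior is
    inverse-gamma(a + n/2, b + sum((x_i - mu)^2)/2)."""
    data = np.asarray(data)
    n = len(data)
    return a + n / 2.0, b + np.sum((data - mu) ** 2) / 2.0

rng = np.random.default_rng(3)
x = rng.normal(0.0, 2.0, size=500)                      # true sigma^2 = 4
a_post, b_post = invgamma_posterior(0.001, 0.001, x)    # weakly-informative prior
post_mean_var = b_post / (a_post - 1.0)                 # inverse-gamma mean = b / (a - 1)
```

Because the posterior stays in the inverse-gamma family, a Gibbs-style update of the variance needs no tuning, which is the efficiency gain referred to above.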

Proportion of rejections under null hypothesis
The hypotheses of interest are H_0: the fitted model is adequate vs. H_1: the fitted model is not adequate. As discussed in section 2.5, according to the development of the novel goodness-of-fit (GOF) test, the hypotheses are restated as H_0: γ_2_1 = γ_3_1 = γ_4_1 = γ_5_1 = γ_2_2 = γ_3_2 = γ_4_2 = γ_5_2 = 0 vs. H_1: at least one γ_G_i (for G = 2, ..., 5 and for i = 1, 2) is not equal to zero, where the γ_G_i are the coefficients of the indicator variables in the alternative model (equations (6) and (7)). For each combination, 1000 datasets were generated under the null hypothesis, and the rejection proportion of the null hypothesis is calculated as an estimate of the Type I error. To generate the data under this scenario, the priors of the γ_G_i were defined as γ_G_i ~ normal(0, 10^4). The steps were followed as described in section 2.5.

Region of Type I error
Bayesian statistics uses credible intervals to describe and summarize the uncertainty related to unknown parameters. An unobserved parameter value falls within a credible interval with a specific probability, and the generalization to multivariate problems is known as a credible region. In this study, which involves multiple Bayesian credible intervals, the probability of rejecting the null hypothesis (α) is inflated by the multiple testing involved. Therefore, following the Bonferroni correction for multiple comparisons (Bland & Altman, 1995), the probability outside the credible region is taken as 0.05 × 4 = 0.2, since each marginal model contains 4 parameters with 95% credibility. Thus, to assess the proportion of rejections under H_0, α = 0.2 is used. However, due to random variation, it is examined whether the calculated proportion lies within the 95% probability interval, defined as α ± Z_2.5% √(α(1 − α)/n), where n is the number of generated datasets. For n = 1000, the 95% probability interval for α = 0.2 is [0.175, 0.225].
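The probability interval for the observed rejection proportion can be computed directly from the formula above:

```python
import math

def rejection_proportion_interval(alpha, n_datasets, z=1.96):
    """95% probability interval for the observed rejection proportion under H0:
    alpha ± z * sqrt(alpha * (1 - alpha) / n)."""
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_datasets)
    return alpha - half_width, alpha + half_width

lo, hi = rejection_proportion_interval(0.2, 1000)   # ≈ (0.175, 0.225)
```

An observed rejection proportion outside this interval indicates a Type I error rate inconsistent with the nominal α = 0.2.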

Proportion of rejections under alternative hypothesis
It is important to maximize the ability of a test to reject the null hypothesis when the alternative hypothesis is true, which is referred to as the power of the test. Therefore, to assess this property, the data are simulated under the alternative hypothesis, and the number of times the test rejects the null hypothesis is counted. For this step, the priors of the γ_G_i were defined differently from those used in the Type I error simulations of section 3.3. Specifically, the prior distribution was defined as γ_G_i ~ normal(200, 10^4), where the mean value is well above zero. This mean could be any value different from zero; for this study, the power analysis is conducted only for the selected mean value. The other steps were followed as described in section 2.5.

Simulation results
The simulation results are presented in Table 1 for all 48 combinations of the number of clusters, cluster sizes, and random effect variances. It is important to note that almost all of the 1000 datasets generated for each combination converged.
The following two points describe the main findings of the simulation study in terms of rejection proportions under H 0 .
• When analyzing the results for a small number of clusters (3 and 5), the proportions tend to lie within the limits except for a few combinations. Moreover, the results are better for small cluster sizes (10 and 15) even with large numbers of clusters; that is, combinations of a large number of clusters and a large cluster size tend to give poorer results.
• Observing the results for the 2nd (1, 0.5) and 3rd (0.75, 0.5) SD combinations separately, most of the proportions are smaller than for the first SD combination (1, 0.75). As a result, for 7 clusters under SD combinations 2 and 3, more proportions fall within the limits.
In the simulations under H_1 (power analysis), almost all the results are above 90%, and the variation in the results is small (min = 0.941, max = 0.995). The main findings in terms of rejection proportions under H_1 are as follows.
• Among all SD combinations, the proportion is largest for the smallest number of clusters (3) and smallest for the largest number of clusters (10); that is, the proportions decrease slightly as the number of clusters increases. However, there is no clear pattern in these proportions with respect to cluster size within each number of clusters.
• For most combinations, the results for SD combinations 2 and 3 are smaller than for SD combination 1; that is, for low random-effect standard deviations, the power values are low.

The Description of the Dataset
The dataset used for the example was obtained from a multi-centre randomized controlled trial of epilepsy patients (Marson et al., 2002; Sunethra & Sooriyarachchi, 2020). The trial was conducted to compare two antiepileptic treatments at 81 hospitals, and follow-up data were obtained from 1380 epilepsy patients (Marson et al., 2002). The dataset consists of variables related to patients' demographic information, such as age and gender; clinical features, such as epilepsy type and treatment type; and statuses of previous neurological disorders and previous history of seizures. The selected survival variable is 'time to first seizure after randomization' and the count variable is 'number of seizures experienced by the patient'. The variables selected for the best joint model are defined in section 4.2.

Sample selection
Since the simulations were carried out so as to apply the GOF test to small samples with small cluster sizes, a random sample matching those size limits was selected from the original dataset. In this example, a hospital represents a cluster. Among the clusters with more than 10 individuals, 5 clusters/hospitals were selected randomly, since small numbers of clusters provided better results in the simulations. For large clusters, random samples of 25 individuals were drawn. The cluster sizes of the selected random sample of 5 hospitals are 24, 16, 20, 25 and 25. Thus, the total sample size is 108.

Creating a Binary Survival response
The survival time used for the study is the 'time to first seizure' experienced by a patient, which is, as usual, a continuous variable. To model this survival time with the Discrete Time Hazard Model, the survival time must be discretized into pre-determined time intervals, forming a binary response variable (Goldstein, 2011; Hapugoda & Sooriyarachchi, 2018). Following suggestions in the literature, the survival time 'Time_seizure' was discretized into three time intervals: 1-7 days (T_1), 8 days-1 year (T_2) and > 1 year (T_3). The DTHM outcome was then created by linking these time intervals so that the survival time is represented by a binary outcome variable (Hapugoda & Sooriyarachchi, 2018).
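The discretization into person-period records for the DTHM can be sketched as follows; the function and its record layout are illustrative assumptions consistent with the three intervals defined above.

```python
def to_person_period(time_days, event, cut_points=(7, 365)):
    """Expand one continuous survival time into discrete-time records for
    the intervals T1 = 1-7 days, T2 = 8 days-1 year, T3 = > 1 year.
    Each record is (interval_index, y), where y = 1 only in the interval
    where the event occurs; censored subjects get y = 0 throughout."""
    boundaries = list(cut_points) + [float("inf")]
    records = []
    for g, upper in enumerate(boundaries, start=1):
        if time_days <= upper:
            records.append((g, 1 if event else 0))   # terminal interval
            break
        records.append((g, 0))                        # survived this interval
    return records

# A patient whose first seizure occurred on day 30: survives T1, event in T2
print(to_person_period(30, event=True))   # -> [(1, 0), (2, 1)]
```

The stacked binary records from all patients form the outcome of the DTHM, which is then fitted with the logit link as in section 2.4.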

Model fitting and Application of the test
In order to identify the variables that have a significant impact on the responses of interest, separate univariate models for the two responses were fitted, including fixed and random effects. The joint model, fitted with the selected significant variables, is then used to assess model adequacy via the novel GOF test. To identify the most significant explanatory variables, backward elimination was performed, with the exclusion of variables decided mainly on the Deviance Information Criterion (DIC) of the fitted model at each stage. The least significant variables were removed until all remaining variables appeared significant according to the DIC, and the significance of the selected variables was re-checked using the HPD intervals of the model coefficients. In the joint model, a shared random effect is used for the marginal log count model (Sunethra & Sooriyarachchi, 2020). For the cluster-specific random effect v_0k, where v_0k ~ normal(0, σ_v²), the shared cluster effects for each marginal model can be written as φ_1k = v_0k for the survival model and φ_2k = h·v_0k for the log count model, where h links the two random effects. The selected joint model includes Treatment type = 'treat' (Deferred/Immediate), Status of previous head injury = 'pr_dis_headinj', and the other covariates appearing in the model equations; the best joint model can be defined as follows.
For Survival ~ Binary(π_1jk), logit(π_1jk) = α_0 + α_1 treat + α_2 pr_dis_othnd + α_3 his_sei_febr + α_4 asleep + α_5 T1 + α_6 T2, and for Log(count) ~ Normal(µ_2jk, σ²) where σ² = exp(µ_2jk)/count², µ_2jk = β_0 + β_1 pr_dis_headinj + β_2 pr_dis_menin + β_3 pr_dis_othnd + β_4 his_sei_febr + β_5 his_sei_relat.
For the joint Bayesian model above, the fitted values (π̂_1jk and µ̂_2jk) for each observation in the dataset are calculated for survival and log count. Then, taking one response at a time, the fitted values are sorted and collapsed within each cluster into 5 groups, and four indicators (I_2, I_3, I_4 and I_5) are defined. These five groups are not always of the same size (only roughly equal) since most of the cluster sizes in the example dataset are not divisible by 5 (Fernando & Sooriyarachchi, 2020). For each response there are 4 indicator variables, giving overall 8 new parameters (γ_G_i for G = 2, 3, 4, 5 and i = 1, 2). The model is then re-fitted including these indicator variables through the MCMC method. Finally, from the output of the model with indicator variables, the goodness of fit of the joint Bayesian model is assessed using the 95% highest posterior density (HPD) intervals of the indicator variables. For ease of reference, the output (Table 2) of posterior summaries and HPD intervals is given only for the coefficients (γ_G_i) of the four indicators for each of the two responses. Since all the 95% HPD intervals obtained for the coefficients of the indicator variables in each marginal model (survival and log count) contain zero, each indicator variable is insignificant. This indicates that H_0 should not be rejected. Therefore, based on the novel GOF test, the multilevel joint Bayesian model fitted to the Epilepsy survival and count data is adequate.

Conclusions
The study mainly focused on assessing the model adequacy of a multilevel joint Bayesian model for survival and count responses. As this was the primary objective, a suitable test was developed by drawing on past studies (Hosmer and Lemeshow, 1980; Lipsitz et al., 1996; Perera et al., 2016; Adhikari & Sooriyarachchi, 2020) and on theoretical concepts relevant to Bayesian statistics, such as Bayesian credible intervals, for assessing model adequacy.
Secondly, a simulation study was conducted to evaluate the performance of the test by assessing the proportions of rejections under the null and alternative hypotheses of the novel test. The rejection proportions under H_0 were assessed by checking whether they lie within a probability interval. According to those results, the novel GOF test developed under the Bayesian framework performed well overall for small numbers of clusters, while for large numbers of clusters the results stayed within an acceptable region when the cluster size was small. Moreover, the Type I error probabilities showed a decreasing pattern as the random-effect standard deviations became small.
A power analysis was then conducted under the alternative hypothesis, where the datasets were deliberately generated under incorrect prior specifications. The rejection proportions under H_1 were slightly higher for small numbers of clusters than for large numbers of clusters for most combinations, and tended to decrease as the random-effect standard deviations became small, irrespective of the number of clusters and cluster sizes. Overall, the power of the test is highly satisfactory for all combinations of the number of clusters, cluster sizes, and random effect variances.
To assess how well the test performs in practical scenarios and to illustrate clearly how the developed test can be applied, the test was demonstrated on a real-life Epilepsy dataset. Through this real-world application, it is concluded that the test provides accurate and reliable results for practical applications. Moreover, the test can be applied with many explanatory variables and unequal cluster sizes without difficulty.