The authors have declared that no competing interests exist.
Conceived and designed the experiments: THW CTF. Analyzed the data: ICN. Contributed reagents/materials/analysis tools: THW CTF. Wrote the paper: ICN THW CTF. Interpreted data and revised the manuscript: JYW.
Tuberculosis (TB) disease can be caused by either recent transmission from infectious patients or reactivation of remote latent infection. Spatial dependency (correlation between nearby geographic areas) in TB incidence is a signature of chains of recent transmission with geographic diffusion. To understand the contribution of recent transmission to endemic TB in Taiwan, where reactivation has been assumed to be the predominant mode of pathogenesis, we used spatial regression analysis to examine whether there was spatial dependency between the TB incidence in each township and that in its neighbors. A total of 90,661 TB cases from 349 townships in 2003–2008 were included in this analysis. After adjusting for the effects of confounding socioeconomic variables, including the percentage of aboriginals and average household income, the spatial lag parameter remained positive and significant (0.43, p<0.001), which indicates that the TB incidence of neighboring townships had an effect on the TB incidence in each township. Townships with substantial spatial spillover effects were mainly located in the northern, western and eastern parts of Taiwan. Spatial dependency implies that recent transmission plays a significant role in the pathogenesis of TB in Taiwan. Therefore, in addition to the current focus on improving the cure rate under directly observed therapy programs, more resources need to be allocated to active case finding in order to break the chain of transmission.
Human tuberculosis (TB) is an airborne infectious disease caused by
Taiwan is a middle-burden country with an annual TB incidence remaining around 70 per 100,000 people from 1997 through 2005
Because the cumulative effect of local TB transmission among communities will cause geographic diffusion, we hypothesize that, if recent transmission plays a significant role in endemic TB in Taiwan, we should be able to observe spatial dependency (the correlation between nearby geographical areas) in TB incidence between neighboring townships after adjusting for the spatial autocorrelation of the underlying sociodemographic and ethnic factors that influence the incidence of TB and TB reactivation (i.e., age, economic status, human immunodeficiency virus (HIV) infection, and aboriginal ethnicity)
To understand the role of recent transmission in endemic TB in Taiwan, we applied spatial regression analyses to examine whether spatial dependency exists for TB incidence at the township level, after adjusting for the effects of socioeconomic geography.
Pulmonary TB is a notifiable disease that must be reported in Taiwan. Anonymized data on TB cases were obtained from the Notifiable Infectious Disease Statistics System
Taiwan Centers for Disease Control (Taipei, Taiwan) approved the use of data for the present study. The study procedure was reviewed and approved by the Institutional Review Board (IRB) of National Taiwan University Hospital (Taipei, Taiwan). The IRB approved the exemption of informed consent because the data on TB and HIV cases had been anonymized by the Notifiable Infectious Disease Statistics System.
Taiwan Census data included the population density, average household income, average number of persons per household, average years of education, and percentages of the population that were elderly (>60 years), aboriginal, Southeast Asian brides, and Southeast Asian laborers for each township. The average household income and average years of education were analyzed by quartiles using dummy variables (see
Variable  Definition  Mean (SD)  Regression coefficient, dependent variable ln (TB_INCI)  Regression coefficient, dependent variable ln (TB_INCI_6)
TB_INCI  2003–2008 TB cumulative incidence  0.0052 (0.0035)  –  –
TB_INCI_6  2006–2008 TB cumulative incidence  0.0024 (0.0016)  –  –
ABOR_P  % of population that are aborigines  0.0775 (0.1966)
BRIDE_P  % of population that are brides from Southeast Asia  0.0001 (0.0002)
DENSITY  Township population/area (m^{2})  0.0029 (0.0061)
EDU1  8.2 < education years ≤ 8.7 (lower middle)  –
EDU2  8.7 < education years ≤ 9.5 (middle)  –
EDU3  Education years > 9.5 (high)  –
ELDER_P  % of population >60 years old  0.1413 (0.0398)
HIV_INCI  1984–2002 HIV cumulative incidence  0.0001 (0.0001)
HOU_PERS  Average number of persons per household  3.5027 (0.4262)
INCOME1  320 < average household income ≤ 440 (lower middle)  –  0.08  0.06
INCOME2  440 < average household income ≤ 560 (middle)  –
INCOME3  Average household income > 560 (high)  –
LABOR_P  % of population that are laborers from Southeast Asia  0.0109 (0.0152)
*p<0.05; **p<0.01; ***p<0.001
Spatial autocorrelation identifies the patterns of spatial dependency by calculating the correlation of a variable with itself within a geographic space, meaning that the value of a variable is associated with the values of the same variable in nearby areas. If spatial autocorrelation exists, general statistical methods that assume observations are independent may be invalid for further analysis. Spatial autocorrelation can be positive or negative. Positive spatial autocorrelation implies that the values of neighboring areas are similar to one another, while negative autocorrelation implies that they are dissimilar. The statistic used in this study to measure spatial autocorrelation is Moran’s I, which is used for variables at interval or ratio scales. The value of Moran’s I is calculated from the deviations of pairs of neighboring values from the mean:
I = (N/Σ_{i}Σ_{j}w_{ij}) × [Σ_{i}Σ_{j}w_{ij}(x_{i}−x̄)(x_{j}−x̄)] / [Σ_{i}(x_{i}−x̄)^{2}]
where N is the number of townships, x_{i} and x_{j} are the values of the variable in townships i and j, x̄ is the mean of the variable, and w_{ij} is the spatial weight between townships i and j.
Spatial neighbors can be defined by a spatial weight matrix that is created in accordance with the neighbor definition chosen. We first calculated the mean distances between population centers of townships. Townships with shorter distances between their population centers were defined as neighbors. The geospatial relationships between pairs of the 349 townships were stored in a 349×349 matrix. The weight of each cell was the inverse of the distance between the two neighbors.
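As an illustration of the weight matrix and Moran’s I described above, the following is a minimal Python sketch, not the GeoDa procedure used in the study. The distance cutoff `threshold`, the function names, and the one-sided permutation test are assumptions made for the example; the study does not state the cutoff it used, and the weights here are not row-standardized.

```python
import numpy as np

def inverse_distance_weights(coords, threshold):
    """Build an n x n weight matrix: 1/d_ij for neighbors, 0 otherwise.

    Two areas are treated as neighbors when the distance between their
    population centers is at most `threshold` (an assumed cutoff).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = np.zeros_like(dist)
    mask = (dist > 0) & (dist <= threshold)  # excludes the diagonal
    w[mask] = 1.0 / dist[mask]
    return w

def morans_i(x, w):
    """Moran's I: (N / sum(w)) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_p_value(x, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    sims = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    return (np.sum(sims >= observed) + 1) / (n_perm + 1)

# Four hypothetical areas on a line, one distance unit apart.
coords = [[0, 0], [1, 0], [2, 0], [3, 0]]
w = inverse_distance_weights(coords, threshold=1.0)
print(morans_i([1, 2, 3, 4], w))    # smooth gradient: positive I
print(morans_i([1, -1, 1, -1], w))  # alternating values: negative I
```

For the real analysis, `coords` would hold the 349 township population centers, giving the 349×349 matrix described in the text.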
We inspected the residuals from the ordinary least squares (OLS) regression model to identify the spatial dependency of the residuals. If spatial dependency exists, it violates the assumption that the error terms of individual observations are independent of each other in the OLS regression; therefore a model that considers spatial autocorrelation is necessary
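The residual check described above can be sketched as follows. This is a hedged illustration in Python with NumPy rather than the SAS and GeoDa workflow of the study; the helper names and the toy data are hypothetical.

```python
import numpy as np

def ols_residuals(X, y):
    """Fit y = b0 + X b by least squares and return (residuals, coefficients)."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef, coef

def morans_i(x, w):
    """Moran's I of x under spatial weight matrix w."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

# Toy example: four areas in a chain, each adjacent to the next.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([-3.0, -1.0, 1.0, 3.0])
pattern = np.array([1.0, -1.0, -1.0, 1.0])  # unexplained spatial structure
y = 2.0 + 0.5 * x + pattern

resid, coef = ols_residuals(x, y)
# A Moran's I of the residuals far from 0 suggests the independence
# assumption of OLS is violated and a spatial model should be considered.
print(morans_i(resid, w))
```

In practice the significance of the residual Moran’s I would be assessed, for example by permutation, before choosing a spatial model.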
We used a spatially lagged y model, y = ρWy+Xβ+ε, where W is the spatial weight matrix, Wy is the spatial lag of the dependent variable, and ρ is the spatial lag parameter
Because y appears on both sides of the regression equation, spatial dynamics create a feedback effect between townships, in which a township’s level of TB incidence has an effect on its neighbors’, and the neighbors’ neighbors are also affected, throughout all connected townships
The spatial multiplier, (I−ρW)^{−1}, shows how much a change in independent variable x in one township “spills over” onto the surrounding townships. This “spillover” then affects y through the effect of its spatial lag
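The multiplier follows from solving the spatial lag model for y: moving ρWy to the left-hand side gives (I−ρW)y = Xβ+ε, so y = (I−ρW)^{−1}(Xβ+ε). The Python sketch below illustrates this under stated assumptions: the weights are row-standardized (the study does not say whether they were), the function names are hypothetical, and "spillover" is taken here as the off-diagonal column sum of the multiplier, which may differ from the exact quantity mapped in the study.

```python
import numpy as np

def row_standardize(w):
    """Scale each row of w to sum to 1 (rows with no neighbors stay 0)."""
    w = np.asarray(w, dtype=float)
    row_sums = w.sum(axis=1, keepdims=True)
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

def spatial_multiplier(w, rho):
    """Return (I - rho W)^{-1}, the equilibrium feedback matrix."""
    n = w.shape[0]
    return np.linalg.inv(np.eye(n) - rho * w)

def spillover(w, rho):
    """Effect of a unit shock in area j on all OTHER areas (assumed definition)."""
    m = spatial_multiplier(w, rho)
    return m.sum(axis=0) - np.diag(m)

# Three hypothetical areas in a chain; rho = 0.43 is the estimate reported in the study.
w = row_standardize([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]])
m = spatial_multiplier(w, rho=0.43)
print(m.sum(axis=1))       # each row sums to 1 / (1 - rho) for row-standardized W
print(spillover(w, 0.43))  # the central area spills over the most
```

With row-standardized weights and |ρ|<1, the multiplier equals the series I + ρW + ρ²W² + …, which is the neighbor-of-neighbor feedback described in the text.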
We used maps and a histogram to illustrate the variability in the spatial spillover (diffusion) of each township. These figures present the spillover (diffusion) at equilibrium of TB incidence into the surrounding townships with one-unit changes in the explanatory variables.
We were also interested in determining whether the previous TB incidence of a township’s neighbors was associated with that township’s future TB incidence. The space-time model is y_{t} = ρWy_{t−1}+Xβ+ε, where we set y_{t} as the TB incidence from 2006–2008 and y_{t−1} as the TB incidence from 2003–2005.
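Because Wy_{t−1} is built from the earlier period, it can be treated as an ordinary regressor. The Python sketch below fits the space-time model by least squares on that assumption; it is an illustration only, the explicit intercept and the function name are assumptions, and the study’s own estimation was done in GeoDa.

```python
import numpy as np

def fit_space_time_lag(y_t, y_prev, X, w):
    """Least-squares fit of y_t = b0 + rho * (W y_prev) + X beta + error.

    The intercept b0 is an assumption of this sketch; in the model as
    written it would be part of X beta.
    """
    y_t = np.asarray(y_t, dtype=float)
    lag = w @ np.asarray(y_prev, dtype=float)  # neighbors' earlier incidence
    design = np.column_stack([np.ones(len(y_t)), lag, X])
    coef, *_ = np.linalg.lstsq(design, y_t, rcond=None)
    return {"intercept": coef[0], "rho": coef[1], "beta": coef[2:]}

# Hypothetical data: six areas in a chain, one explanatory variable.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0
y_prev = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0])
x = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])
y_t = 0.5 + 2.0 * (w @ y_prev) + 1.5 * x  # noiseless, so the fit is exact

fit = fit_space_time_lag(y_t, y_prev, x, w)
print(fit["rho"], fit["beta"])
```

A positive and significant ρ in such a fit corresponds to the interpretation in the text: a township’s later incidence is associated with its neighbors’ earlier incidence.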
The associations between socioeconomic variables and TB incidence were analyzed using a linear regression model. TB incidence was transformed using the natural logarithm to better satisfy the assumption of a normal distribution. Stepwise regression modeling was conducted using SAS version 9.2 (SAS Institute, Cary, North Carolina). Moran’s I statistic calculation, the permutation process, and the spatial regression analysis were performed using GeoDa® version 0.9.5-i
We performed linear regression to identify the socioeconomic variables associated with higher TB incidence. Univariate analyses were performed for each independent variable, and they showed that most socioeconomic variables were significant, except for lower middle education (EDU1), the percentage of elderly (ELDER_P), and lower middle income (INCOME1) (
Variables  ABOR_P  BRIDE_P  DENSITY  EDU1  EDU2  EDU3  ELDER_P  HIV_INCI  HOU_PERS  INCOME1  INCOME2  INCOME3  LABOR_P
TB_INCI  …  +0.07  …
ABOR_P  1  …
BRIDE_P  1  …  +0.05
DENSITY  1  …  +0.02  …  +0.00
EDU1  1  …  +0.02  …
EDU2  1  …  +0.05  +0.08  …  +0.11
EDU3  1  …  +0.01  …  +0.05
ELDER_P  1  …
HIV_INCI  1  …
HOU_PERS  1  +0.09  +0.08  …
INCOME1  1  …
INCOME2  1  …
INCOME3  1  …
LABOR_P  1
p<0.05 coefficient >0.3 or<
We again used Moran’s I statistic to test if there was still spatial autocorrelation for the residuals of OLS regression. The Moran’s I statistic was 0.18, indicating that the independent variables in the OLS model did not account for all spatial dependence in the outcome variable. These results confirmed the need to conduct spatial regression.
Spatial lag regression was conducted using the distance between population centers of polygons as the spatial weight. These results are shown in
Variable  OLS model^{∧}  Spatial lag model^{†}  Spatial-time lag model
ABOR_P  1.38  1.19  1.15
INCOME2
INCOME3
Spatial lag (Wy)  –  0.43  –
Spatial-time lag (Wy_{t−1})  –  –  64.63
Adjusted R^{2}  0.53  –  0.42
Log likelihood
AIC  202.08  167.05  284.42
*p<0.05; **p<0.01; ***p<0.001
Dependent variable: ln (TB_INCI)
Dependent variable: ln (TB_INCI_6).
The log likelihood and Akaike’s information criterion (AIC) showed that the spatial lag model had a better fit than the OLS model. The Moran’s I statistic for the residuals of the spatial lag model was 0.05, which was very close to 0. This demonstrated that the spatial parameter could eliminate the effect of spatial autocorrelation in the regression model.
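For readers less familiar with AIC, the comparison above can be sketched in Python. The formula AIC = 2k − 2 ln L is the standard definition, where k is the number of estimated parameters; that the software computed it in exactly this form is an assumption. The two AIC values are those reported for the OLS and spatial lag models, and the function name is hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# AIC values as reported for the two models fitted to ln(TB_INCI).
aic_ols = 202.08
aic_spatial_lag = 167.05

# A positive difference favors the spatial lag model.
delta = aic_ols - aic_spatial_lag
print(round(delta, 2))
```

Both models here share the same dependent variable, ln (TB_INCI), which is what makes their AIC values comparable; the spatial-time lag model uses ln (TB_INCI_6), so its AIC is not directly comparable with these two.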
The spatial multiplier for the spatial lag model was calculated for each township and presented in
We further considered a spatial-time lag model. The model was y_{t} = Xβ+ρWy_{t−1}+ε, where we set y_{t} as the log-transformed TB incidence from 2006–2008 (under national DOT programs) and y_{t−1} as the log-transformed TB incidence from 2003–2005 (before national DOT programs). Using y_{t} as the dependent variable, univariate analyses were performed; these analyses are presented in
Our geospatial analysis of the countrywide TB data for Taiwan indicated that the TB incidence in a township was significantly affected by the TB incidence in neighboring townships, which implies that recent transmission plays a significant role in endemic TB in Taiwan. Therefore, in addition to the current focus on improving the cure rate under DOT programs, more resources need to be allocated to active case finding in order to break the chain of transmission.
Using spatial regression modeling, we demonstrated that there is spatial dependency in township-level TB incidence in Taiwan, after adjusting for the effects of confounding socioeconomic variables, including the percentage of aboriginals and average household income. Furthermore, when we considered the temporality of the infectious processes, the spatial-time lag model indicated that a township’s TB incidence from 2006–2008 was affected by its neighbors’ TB incidence from 2003–2005, as would be expected from the cumulative effects of local TB transmission with contagious diffusion among the community.
The geospatial findings in the present study are consistent with molecular epidemiologic findings
Our analysis showed that the percentage of aborigines is an independent risk factor for higher TB incidence after adjusting for the effects of spatial dependency and household income. This finding was in agreement with previous studies of TB incidence in Taiwan
Consistent with previous observations that TB is a disease of the deprived and the poor
HIV infection weakens the immunity of patients and increases the risk of rapid progression to active TB disease after infection
The resolution of the geospatial analysis in the present study was limited to the township level because further details on the residential addresses of TB patients were kept confidential by the Notifiable Infectious Disease Statistics System. Therefore, we were unable to use spatial point analysis methods to identify localized spatial clustering of TB cases. Another limitation of this study is the lack of data on the molecular genotype of clinical isolates, the host factors of individual persons, and social networks, which restricts our inferences to the ecological level. The last limitation is that, if a spatially autocorrelated determinant of reactivated latent TB cases has been overlooked, our conclusions could be incorrect. We took into consideration a range of important socioeconomic factors, but it is still possible that an important variable is missing. Our findings justify further large-scale genotyping-geospatial correlation studies to provide more insight into TB epidemiology in Taiwan.
In conclusion, our results add to the evidence that recent transmission plays a significant role in TB incidence in Taiwan and highlight the importance of taking a geospatial perspective in TB epidemiology.