
The authors have declared that no competing interests exist.

Estimating the percentages of undiagnosed and asymptomatic patients is essential for controlling the outbreak of SARS-CoV-2 and for assessing any strategy for controlling the disease. In this paper, we propose a novel analysis based on the birth-death process with recursive full tracing. We estimated the number of undiagnosed symptomatic patients and the lower bound of the total number of infected individuals per diagnosed patient before and after the declaration of the state of emergency in Hokkaido, Japan. The median estimated number of undiagnosed symptomatic patients per diagnosed patient decreased from 1.7 to 0.77 after the declaration, and the median estimated lower bound of the total number of infected individuals per diagnosed patient decreased from 4.2 to 2.4. We also discuss the limitations and possible expansions of the model.

The novel coronavirus (SARS-CoV-2) spread to the most populated areas of the world in the first few months of 2020. In Japan, the first case was reported on January 16th, 2020; by March 31st, the number of cases had increased to 2122 [

The histograms of the date of onset (above) and the date of diagnosis (below) are shown.

To effectively control the spread of the infection, we need to know several parameter values characterizing the infection, such as the basic reproduction number, R_{0}, the percentage of asymptomatic patients, and the fatality rate. One of the factors that complicate decision making in disease control is the uncertainty in the percentage of asymptomatic patients. Several lines of evidence indicate that the virus can be transmitted by asymptomatic patients [

It would be useful if these values could be estimated from the information available in the early phase of the outbreak. Contact tracing is considered to be an effective measure, and health officials have traced the contacts of infected patients to prevent the spread of infection during outbreaks of new or reemerging infections [

One promising model for contact tracing is a stochastic model based on the birth-death process, which is a formulation of branching processes [

A network with two clusters is shown on the left side, and the network after the progression of events is shown on the right side. The nodes represent symptomatic patients, and two nodes are connected by an edge if one has infected the other. The nodes can recover and be removed from the network (dotted circle/lines), or infect and connect to a new node (green circle). A new node without edges can be generated in the network (blue circle). A node can be diagnosed (gray filled circle), and the nodes in the same connected component (gray open circles) are removed from the network and counted among diagnosed symptomatic clusters (dashed rounded rectangle). Nodes and edges removed from the network are indicated in gray, and those newly generated in the network are indicated by bold lines.

This paper is organized as follows. In the Methods section, we summarize the SARS-CoV-2 cases in Hokkaido and divide them into those before and after the declaration of the state of emergency. We classify patients as symptomatic or asymptomatic, and as diagnosed or undiagnosed. We explain what corresponds to diagnosed and undiagnosed patients in the present model. We describe the formulation of the model and the details of the simulations. In the Results section, we estimate the parameter values and the numbers of asymptomatic and undiagnosed patients. In the Discussion section, we relate our results to previous studies and discuss the limitations and possible expansions of the model.

This paper reports the analysis of the SARS-CoV-2 infection in Hokkaido, Japan [

We represent the patients with nodes and their contacts with edges in a network. If two patients were in close contact with each other, the corresponding nodes are connected by an edge. The network consists of distinct connected components, which we refer to as clusters. Sporadic patients are regarded as size-1 clusters.
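The cluster definition above can be sketched in a few lines. This is a minimal illustration, assuming the contact data are available as pairs of case IDs; the function name `find_clusters` and the toy case IDs are hypothetical and are not taken from the datasets.

```python
def find_clusters(case_ids, contacts):
    """Group cases into clusters (connected components) with union-find.

    case_ids: iterable of case identifiers.
    contacts: iterable of (a, b) pairs of cases that were in close contact.
    """
    parent = {c: c for c in case_ids}

    def find(x):
        # Follow parent links to the representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in contacts:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for c in case_ids:
        clusters.setdefault(find(c), []).append(c)
    # Largest clusters first; ties broken by member IDs for reproducibility.
    return sorted(clusters.values(), key=lambda members: (-len(members), members))


# Toy example: cases 1-2-3 form a chain, 4-5 are a pair, 6 is sporadic.
cases = [1, 2, 3, 4, 5, 6]
contacts = [(1, 2), (2, 3), (4, 5)]
clusters = find_clusters(cases, contacts)
sizes = [len(members) for members in clusters]
average_cluster_size = sum(sizes) / len(sizes)
```

In this toy example the sporadic case 6 is returned as a size-1 cluster, matching the convention stated above.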

The white and gray circles represent the patients in dataset 1 and 2, respectively. The numbers in the circles are the case IDs. Two circles are connected by an edge if these two patients were in close contact with each other. Only the clusters with sizes larger than 2 are shown. There are 59 size-1 clusters and 12 size-2 clusters in dataset 1 and 20 size-1 clusters and 8 size-2 clusters in dataset 2.

All cases were divided into datasets 1 and 2 according to the cluster they belong to. If the earliest onset of the cases in a cluster was prior to the declaration of the state of emergency, this cluster was included in dataset 1; if the earliest onset was between the declaration and the lifting thereof, it was included in dataset 2. Datasets 1 and 2 contain 78 and 30 clusters, respectively. Because the declaration of the state of emergency might have changed the behavior of residents and health officials in Hokkaido, we compared the data before the declaration, dataset 1, and the data between the declaration and the lifting thereof, dataset 2.
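The assignment rule can be written down compactly. The sketch below is only an illustration: the function name `assign_dataset` is hypothetical, the declaration date of 28 February 2020 is an assumption not stated in this section, the lifting date of 19 March 2020 is inferred from the later remark that March 20th was one day after the lifting, and treating the lifting date as inclusive is also an assumption.

```python
from datetime import date

# Assumed dates (see the lead-in): declaration and lifting of the state of emergency.
DECLARED = date(2020, 2, 28)
LIFTED = date(2020, 3, 19)


def assign_dataset(onset_dates, declared=DECLARED, lifted=LIFTED):
    """Return 1 or 2 for a cluster, based on the earliest onset among its cases.

    Returns None when the earliest onset falls after the lifting date,
    because such clusters belong to neither dataset.
    """
    earliest = min(onset_dates)
    if earliest < declared:
        return 1
    if earliest <= lifted:  # inclusive upper bound is an assumption
        return 2
    return None


# A cluster whose first onset precedes the declaration goes to dataset 1,
# even if later cases in the same cluster had their onset afterwards.
example = assign_dataset([date(2020, 3, 2), date(2020, 2, 20)])
```

Because the rule looks only at the earliest onset, a whole cluster is never split between the two datasets.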

Statistic | Dataset 1 | Dataset 2 | Datasets 1 & 2 |
---|---|---|---|
Patients | 126 | 43 | 169 |
Clusters | 78 | 30 | 108 |
Average time from onset to diagnosis of the patient diagnosed first in a cluster (days) | 9.3 | 6.6 | 8.5 |
Average time from onset to diagnosis of all diagnosed patients (days) | 8.4 | 6.2 | 7.8 |
Average cluster size | 1.6 | 1.4 | 1.6 |
Largest degree | 8 | 3 | 8 |
Average degree | 0.87 | 0.74 | 0.84 |
Average clustering coefficient | 0.070 | 0.093 | 0.076 |
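As a consistency check on the table, the average cluster size is the number of patients divided by the number of clusters. The symbol \bar{s} is introduced here for illustration only.

```latex
\[
\bar{s}_{1} = \frac{126}{78} \approx 1.6, \qquad
\bar{s}_{2} = \frac{43}{30} \approx 1.4, \qquad
\bar{s}_{1\&2} = \frac{169}{108} \approx 1.6 .
\]
```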

The patients included in datasets 1 and 2 were all diagnosed and mostly symptomatic. However, not all of the individuals infected with SARS-CoV-2 were diagnosed and symptomatic; they can be classified into diagnosed symptomatic, diagnosed asymptomatic, undiagnosed symptomatic, and undiagnosed asymptomatic groups. The diagnosed symptomatic group consists of those who developed symptoms and were diagnosed, or those who were found in contact tracing. All individuals belonging to this group were covered by our datasets. Although the diagnosed asymptomatic group was also included in the datasets, we ignored it because it included only two individuals. The undiagnosed symptomatic group comprises individuals who were infected and developed symptoms, but recovered or died without being diagnosed. This group is not directly observable, and thus its percentage is one of the quantities we tried to estimate with the model, which takes this group into account. The undiagnosed asymptomatic group is not directly observable either. It has been suggested that a percentage of SARS-CoV-2 carriers do not develop symptoms but can infect others [

Birth-death processes have been used to model infectious diseases and population dynamics [

We modeled the contact tracing of SARS-CoV-2 with a variant of the continuous-time birth-death process, referred to as the birth-death process with recursive full tracing [

The birth-death process with recursive full tracing incorporates the diagnosis and quarantine of patients in addition to the features of continuous-time birth-death processes. The time from infection to diagnosis of a node is a random variable drawn from the exponential distribution with the scale parameter 1/

The simulation of the model is implemented as follows. Nodes without edges are generated in the network according to a Poisson point process with the stationary rate λ = 10^{−5} (
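The event structure of the model (arrival of isolated nodes, infection, recovery, and diagnosis followed by removal of the whole connected component) can be sketched as a Gillespie-style simulation. This is a minimal sketch under stated assumptions, not the implementation used for the results: the names `beta`, `gamma`, and `delta` for the per-node infection, recovery, and diagnosis rates are hypothetical, the rate values in the usage lines are arbitrary, and no initial clusters are discarded.

```python
import random
from collections import deque


def simulate_tracing(beta, gamma, delta, lam=1e-5, n_clusters=200, seed=0):
    """Simulate until n_clusters diagnosed clusters have been removed.

    beta, gamma, delta: assumed per-node infection, recovery, diagnosis rates.
    lam: rate at which new nodes without edges appear in the network.
    Returns (cluster sizes, infection-to-diagnosis delays of the first
    diagnosed node of each cluster, number of nodes recovered undiagnosed).
    """
    rng = random.Random(seed)
    neighbours = {}  # node id -> set of adjacent node ids
    born = {}        # node id -> time the node entered the network
    t, next_id = 0.0, 0
    cluster_sizes, index_delays = [], []
    recovered_undiagnosed = 0

    while len(cluster_sizes) < n_clusters:
        n = len(neighbours)
        total_rate = lam + n * (beta + gamma + delta)
        t += rng.expovariate(total_rate)
        if n == 0 or rng.random() * total_rate < lam:
            # A new node without edges is generated.
            neighbours[next_id] = set()
            born[next_id] = t
            next_id += 1
            continue
        node = rng.choice(list(neighbours))
        u = rng.random() * (beta + gamma + delta)
        if u < beta:
            # Infection: a new node is connected to the infector.
            neighbours[next_id] = {node}
            neighbours[node].add(next_id)
            born[next_id] = t
            next_id += 1
        elif u < beta + gamma:
            # Recovery without diagnosis: the node and its edges are removed.
            for other in neighbours.pop(node):
                neighbours[other].discard(node)
            del born[node]
            recovered_undiagnosed += 1
        else:
            # Diagnosis: trace and remove the whole connected component.
            index_delays.append(t - born[node])
            component, queue = {node}, deque([node])
            while queue:
                for other in neighbours[queue.popleft()]:
                    if other not in component:
                        component.add(other)
                        queue.append(other)
            for member in component:
                del neighbours[member]
                del born[member]
            cluster_sizes.append(len(component))
    return cluster_sizes, index_delays, recovered_undiagnosed


# Arbitrary illustrative rates, not estimates from the paper.
sizes, delays, lost = simulate_tracing(beta=0.1, gamma=0.1, delta=0.1)
mean_cluster_size = sum(sizes) / len(sizes)
mean_delay = sum(delays) / len(delays)
undiagnosed_per_diagnosed = lost / sum(sizes)
```

The last three lines compute the two summary statistics and the ratio of nodes that recovered undiagnosed to nodes removed in diagnosed clusters. The delay is measured here from the time a node enters the network.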

Let us note that R_{0} is given by

We performed the approximate Bayesian computation of the posterior distribution of

We chose these two summary statistics, the average cluster size and the average time from onset to diagnosis of clusters, to fit the model parameters for the following reasons. First, these can be obtained without using sophisticated techniques. Second, these allow the precise determination of

In each run of the simulation, we removed _{0} + _{0} = 100 clusters were discarded to eliminate the dependence of the results on the initial condition and used the following

The ratio of undiagnosed symptomatic patients to diagnosed symptomatic patients can be estimated by the number of nodes that recover without being diagnosed in a period divided by the number of diagnosed nodes that recover in the same period. We used the period between the removal of the _{0}-th cluster and the removal of the _{0} +

We performed simulations with 100,000 randomly generated parameter sets and accepted the parameter sets that replicated the average cluster size and the average time from onset to diagnosis of clusters. Before applying the parameter estimation to datasets 1 and 2, we tested whether the present model can successfully estimate the parameter values of artificial data. The artificial data were generated by a simulation run of the model with
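The accept/reject step described here is the rejection form of approximate Bayesian computation. The sketch below illustrates only that step. `toy_simulator` is a deliberately simple stand-in and is not the tracing model, and the uniform prior ranges, the 20% relative tolerance, and the parameter names `beta` and `delta` are assumptions chosen for illustration. The observed values 1.6 and 9.3 are the dataset 1 entries of the table.

```python
import random
import statistics


def toy_simulator(beta, delta, rng, n_clusters=100):
    """Stand-in simulator (NOT the tracing model).

    Cluster sizes are geometric and delays are exponential, which is just
    enough structure to produce the two summary statistics.
    """
    p_stop = delta / (beta + delta)
    sizes, delays = [], []
    for _ in range(n_clusters):
        size = 1
        while rng.random() > p_stop:
            size += 1
        sizes.append(size)
        delays.append(rng.expovariate(delta))
    return statistics.fmean(sizes), statistics.fmean(delays)


def abc_rejection(observed, n_draws=2000, tolerance=0.2, seed=0):
    """Keep prior draws whose simulated statistics are all within a
    relative tolerance of the observed statistics."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        beta = rng.uniform(0.0, 1.0)    # assumed prior range
        delta = rng.uniform(0.01, 1.0)  # assumed prior range
        simulated = toy_simulator(beta, delta, rng)
        if all(abs(s - o) <= tolerance * o for s, o in zip(simulated, observed)):
            accepted.append((beta, delta))
    return accepted


# Observed summary statistics for dataset 1: average cluster size and
# average onset-to-diagnosis time of the first diagnosed patient (days).
accepted = abc_rejection(observed=(1.6, 9.3))
```

The accepted draws approximate the posterior distribution; posterior medians can then be taken over `accepted` when it is non-empty. Replacing `toy_simulator` with a simulation of the tracing model gives the procedure described in the text.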

The parameter sets that replicated both the average cluster size and the average time from onset to diagnosis of clusters of a simulation run with

To examine the number of undiagnosed symptomatic patients, we calculated the number of nodes that recovered without being diagnosed in the target period.

In this paper, we have formulated a model to describe the spreading of infection and the quarantine of infected individuals, and estimated the numbers of undiagnosed symptomatic and asymptomatic COVID-19 patients in Hokkaido. The estimated percentages of undiagnosed symptomatic and asymptomatic patients were consistent with those reported in previous studies [

We chose the cases in Hokkaido as the subject of this paper for several reasons. Hokkaido is an island isolated from the other regions of Japan. In other words, we can assume that a relatively small percentage of the population commutes between Hokkaido and other parts of the world. This makes Hokkaido an ideal subject for the investigation. Until March 20th, one day after the lifting of the state of emergency, 1549 out of 1707 individuals tested with RT-PCR turned out to be negative for SARS-CoV-2 [

The claim by the local government that test capability was strengthened after the declaration of the state of emergency [

Although the Hokkaido datasets were an ideal subject for the present model, the model is not necessarily suitable for other datasets. Kyushu and Shikoku, both of which are islands in Japan, contain several prefectures and, consequently, several local governments. Because the cases reported by these local governments must be merged, the analysis of the spread of SARS-CoV-2 in these islands is much more difficult than that in Hokkaido. If a nation-wide dataset were available, it might be an ideal subject for the present model because of the recent travel restrictions. However, the present model cannot be applied to a dataset that exhibits the superspreader phenomenon because the number of individuals directly infected by an individual follows a geometric distribution (
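The geometric form can be seen from a short calculation. As an illustrative simplification, suppose a node infects others at a constant rate \beta and is removed at a constant total rate \mu. Both symbols are assumptions introduced here, and in the full model removal by tracing is not a constant per-node rate. Each event involving the node is then an infection with probability \beta/(\beta+\mu), so the number K of individuals it infects before removal satisfies

```latex
\[
P(K = k) = \frac{\mu}{\beta + \mu}\left(\frac{\beta}{\beta + \mu}\right)^{k},
\qquad k = 0, 1, 2, \dots
\]
\[
\mathrm{E}[K] = \frac{\beta}{\mu},
\qquad
\mathrm{Var}[K] = \frac{\beta}{\mu}\left(1 + \frac{\beta}{\mu}\right).
\]
```

The variance is therefore fixed once the mean is fixed, so the model has no free parameter with which to reproduce the heavier-tailed offspring distributions associated with superspreading.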

One of the features of the present model is its simplicity. The model has only three essential parameters. This simplicity allowed us to estimate the numbers of asymptomatic and undiagnosed patients without relying on a large number of parameter values estimated in previous studies. In the early phase of the spread of an infectious disease, this simple model can enable the estimation of the numbers of asymptomatic and undiagnosed patients despite limited data. Our approach can estimate the number of patients without using costly and time-consuming techniques such as RT-PCR.

Also, its simplicity might allow for an analytical solution. The model is an extension of birth-death processes, which have been studied intensively. The birth-death process with contact tracing is analytically tractable [

The simplicity of the present model allows expansions in several ways. First, we assumed that
