The authors have declared that no competing interests exist.

Forecasting epidemics like COVID-19 is of crucial importance: it helps not only governments but also medical practitioners to anticipate the future trajectory of the spread, which can inform treatment, precautionary measures, and protection. In this study, the popular autoregressive integrated moving average (ARIMA) model is used to forecast the cumulative number of confirmed cases, recovered cases, and deaths from COVID-19 in Pakistan from June 25, 2020 to July 04, 2020 (a 10-day-ahead forecast).

To meet the desired objectives, data for this study were taken from the website of the Ministry of National Health Services of Pakistan, covering February 27, 2020 to June 24, 2020. Two different ARIMA models are used to obtain 10-day-ahead point forecasts and 95% interval forecasts of the cumulative confirmed cases, recovered cases, and deaths. The statistical software RStudio, with the “forecast”, “ggplot2”, “tseries”, and “seasonal” packages, was used for data analysis.

The forecasted cumulative confirmed cases, recovered cases, and deaths up to July 04, 2020 are 231239 with a 95% prediction interval of (219648, 242832), 111616 with a prediction interval of (101063, 122168), and 5043 with a 95% prediction interval of (4791, 5295), respectively. Two statistical measures, root mean square error (RMSE) and mean absolute error (MAE), are used to assess model accuracy. The analysis shows that the ARIMA and seasonal ARIMA models are more accurate than the other time series models considered, and they are therefore recommended for forecasting epidemics like COVID-19.

This study concludes that the forecasting accuracy of ARIMA models, in terms of RMSE and MAE, is better than that of the other time series models considered; ARIMA can therefore be regarded as a good tool for forecasting the spread of, recoveries from, and deaths due to the current COVID-19 outbreak. The study can also help decision-makers develop short-term strategies based on the expected number of cases until an appropriate medication is developed.

Coronavirus disease (COVID-19), caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been declared a global epidemic by the WHO. The emergence of COVID-19 was first reported after a cluster of severe pneumonia cases was identified by officials in Wuhan, China in December 2019 [

The WHO has declared COVID-19 a global threat and has repeatedly asked the international community to take it seriously. The ability to identify the rate at which the epidemic is spreading is essential for fighting it and for informing governments’ public planning and policy-making so that the consequences of the disease are properly addressed. The key motivation behind the current research is to accurately forecast the spread of COVID-19 in Pakistan, which could help government officials plan better to minimize its impact.

So far, several studies have been conducted to predict the spread of the COVID-19 pandemic using various mathematical and statistical models. The ARIMA model has been commonly used in the literature to analyze and predict the spread of the disease. To evaluate the prevalence of COVID-19, ARIMA (1, 0, 4) was selected as the best ARIMA model, while ARIMA (1, 0, 3) was recommended for the prediction of COVID-19 [

The paper is organized into five sections. The first section is the introduction above. The second section reviews related work on forecasting COVID-19 with time series models. Section 3 describes the materials and methods, with a focus on the data source, the models, and the methods employed for analyzing the time series data. Section 4 presents the results and discussion. Lastly, Section 5 concludes the paper.

This section presents research relevant to this study. First, it covers studies that employ time series models to capture trends and patterns in events associated with infectious diseases. Second, it focuses on methods aimed specifically at predicting epidemiological variables such as cumulative cases, deaths, and recoveries from the current COVID-19 pandemic.

Time series models have been effectively implemented in the literature to forecast infectious diseases. For the prediction of infectious diseases that occur in cyclical patterns such as influenza, similar approaches have been used and are widely published [

With the emergence of COVID-19, there has been a tremendous rise in published research on forecasting the disease during the last few months. Roosa et al. [

Recently, simple mean-field models were used to assess a quantifiable picture of the COVID-19 pandemic spreading in China, Italy, and France [

Similarly, an attempt was made to forecast COVID-19 in China from February 5th to February 24th, 2020 using the generalized logistic model, the Richards growth model, and a sub-epidemic wave model, with quantified uncertainty [_{0} will significantly reduce the spread in the ship [

Forecasting is an important tool that allows us to understand the present scenario and plan for the future in the best possible way. For this purpose, the current study focuses on forecasting the cumulative confirmed cases, recovered cases, and deaths from COVID-19 using time series models. With these models, the aim is to produce a 10-day forecast of the cumulative number of confirmed cases, recovered cases, and deaths in Pakistan, as well as to estimate the overall trajectory of the pandemic in the country.

The data for the current study on the number of confirmed cases reported, the number of deaths and recoveries for the COVID-19 were collected from the website of the Ministry of National Health Services, Pakistan [

Various studies have demonstrated that time series forecasting models, which focus on the past behavior of a random phenomenon, can capture its underlying trends and patterns. The optimum model is then employed to predict the future behavior of the underlying random variable. Over the past few years, a great deal of work has been done on developing time series models for forecasting pandemics. In this article, several forecasting techniques are implemented, including single exponential smoothing, Holt’s linear trend method, and the Holt-Winters method, along with the most popular and widely used model, the auto-regressive integrated moving average (ARIMA) model, which was originally developed for economic applications [

Time series models have previously been used to forecast several epidemics and infectious diseases, including SARS, Ebola, influenza, and dengue [

In this study, the cumulative data were used to forecast the confirmed cases, deaths, and recoveries. The reason for using cumulative cases is that the available data are limited and strongly affected by day-to-day variation. The orders $(p, d, q)(P, D, Q)_{m}$ of the ARIMA and SARIMA models are selected using the autocorrelation function (ACF) and partial autocorrelation function (PACF). The mathematical structure of the ARIMA model is given in the following equation.
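For reference, an ARIMA(p, d, q) model is commonly written in backshift-operator form as sketched here. The symbols are the conventional ones and are assumed rather than taken from the source: $B$ is the backshift operator ($B y_{t} = y_{t-1}$), $\phi_{i}$ and $\theta_{j}$ are the autoregressive and moving-average coefficients, $c$ is a constant, and $\varepsilon_{t}$ is white noise.

$$\left(1 - \sum_{i=1}^{p} \phi_{i} B^{i}\right)(1 - B)^{d}\, y_{t} = c + \left(1 + \sum_{j=1}^{q} \theta_{j} B^{j}\right)\varepsilon_{t}$$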

Where

The mean method of forecasting is very simple and effective, especially when the time series does not have very complex behavior. In this method, the simple mean of the historical data is taken as the forecast for all future periods. Suppose that the historical values of a time series are denoted by $y_{1}, y_{2}, \ldots, y_{t}$, then the

The naïve method is a simple forecasting technique in which the actual value of the last period is used as the forecast for future periods. The naïve method performs well in certain circumstances and sometimes even outperforms more complicated methods. It can be expressed mathematically as follows:

$$F_{t} = D_{t-1}$$

Where $F_{t}$ is the forecast for the current period, which equals the previous actual value $D_{t-1}$.

The seasonal naïve method is very similar to the naïve method and is useful when the data are highly seasonal. In this method, each forecasted value is equal to the last observed value from the same season of the year. An

Where

The drift method of forecasting is simply a linear extrapolation. First, a line is drawn between the first and the last points of the data; this line is then extended to obtain future forecasts. One advantage of this method over others is that it is very simple, does not require any complicated mathematical calculation, and can even be carried out by hand. The forecasted value at time
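The four benchmark methods described so far (mean, naïve, seasonal naïve, and drift) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the toy series is made up, and the seasonal period `m` must not exceed the length of the history.

```python
from statistics import mean


def mean_forecast(y, h):
    """Every future value equals the mean of the historical data."""
    return [mean(y)] * h


def naive_forecast(y, h):
    """Every future value equals the last observed value."""
    return [y[-1]] * h


def seasonal_naive_forecast(y, h, m):
    """Each forecast equals the last observed value from the same season.

    m is the seasonal period (e.g. 7 for a weekly pattern in daily data).
    """
    n = len(y)
    return [y[n - m + (k % m)] for k in range(h)]


def drift_forecast(y, h):
    """Extend the line through the first and last observations."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + (k + 1) * slope for k in range(h)]


# Toy data (hypothetical, for illustration only).
history = [2, 4, 6, 8, 10, 12]
print(mean_forecast(history, 2))
print(naive_forecast(history, 3))
print(seasonal_naive_forecast(history, 4, m=3))
print(drift_forecast(history, 3))
```

On a steadily growing cumulative series, the mean and seasonal naïve methods necessarily lag far behind the latest level, which is consistent with their large errors in the accuracy tables.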

The simple exponential smoothing (SES) method is one of the most common and simplest methods of forecasting. It is a good choice for forecasting future values when the data have no clear trend or seasonal component. Consider an observed time series $y_{1}, y_{2}, y_{3}, \ldots, y_{t}$; then the mathematical structure of SES takes the following form.

$$\hat{y}_{t+1} = \alpha\, y_{t} + (1 - \alpha)\, \hat{y}_{t}$$

Where $y_{t}$ is the current value at time $t$ and $\hat{y}_{t}$ is the most recent forecast. The method assigns a weight of ‘α’ to the current value $y_{t}$ and a weight of ‘1-α’ to the most recent forecast $\hat{y}_{t}$.
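The SES recursion can be sketched in plain Python as follows. This is a minimal illustration, not the authors' R code: the function name `ses_forecast` is hypothetical, and initialising the level with the first observation is an assumption (forecasting packages typically estimate both the initial level and α from the data).

```python
def ses_forecast(y, alpha, h=1):
    """Simple exponential smoothing with a flat h-step forecast.

    alpha is the smoothing weight on the most recent observation;
    (1 - alpha) is the weight on the previous smoothed value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    level = y[0]  # assumed initialisation: first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    # SES has no trend term, so every horizon gets the same forecast.
    return [level] * h


# Toy data (hypothetical, for illustration only).
print(ses_forecast([10, 20, 30], alpha=0.5, h=2))
```

With `alpha=1` the method reduces to the naïve forecast, which helps explain why SES and naïve errors are close for a strongly trending cumulative series.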

Holt’s linear trend method is a two-parameter model, also known as the linear exponential smoothing model, that can efficiently forecast data with a trend component. Holt’s method uses three separate equations that together produce the final forecast. The mathematical structure of these equations is given below.

Where $\ell_{t}$ denotes an estimate of the level, $b_{t}$ represents an estimate of the trend, ∂ is the smoothing parameter for the level, and ∅* is the smoothing parameter for the trend. The values of both smoothing parameters lie between ‘0’ and ‘1’.
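A minimal Python sketch of Holt’s linear trend method is given here, using the standard level, trend, and forecast recursions. It is an illustration under stated assumptions rather than the authors' implementation: the function name `holt_forecast` is hypothetical, `alpha` and `beta` play the roles of the level and trend smoothing parameters, and the initial level and trend are set from the first two observations (packages usually estimate them).

```python
def holt_forecast(y, alpha, beta, h):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level = y[0]           # assumed initial level
    trend = y[1] - y[0]    # assumed initial trend
    for obs in y[1:]:
        prev_level = level
        # Level equation: blend the new observation with the one-step projection.
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend equation: blend the latest change in level with the old trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecast equation: last level plus k steps of the last trend.
    return [level + k * trend for k in range(1, h + 1)]


# Toy data (hypothetical): a perfectly linear series is extrapolated exactly.
print(holt_forecast([1, 3, 5, 7], alpha=0.5, beta=0.5, h=3))
```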

The method of Holt’s exponential smoothing was extended by [

Holt-Winter’s additive method:

Where

Holt-Winter multiplicative method:

Exponential smoothing methods are not restricted to the Holt-Winters additive and multiplicative methods. Nine different combinations of trend and seasonal components are possible. Each method is labelled by a pair of letters giving its trend and seasonal components, with A_{d} denoting an additive damped trend, as listed in the following table.

Trend Component | Seasonal Component: N (None) | Seasonal Component: A (Additive) | Seasonal Component: M (Multiplicative) |
---|---|---|---|
N (None) | (N,N) | (N,A) | (N,M) |
A (Additive) | (A,N) | (A,A) | (A,M) |
A_{d} (Additive damped) | (A_{d},N) | (A_{d},A) | (A_{d},M) |

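Forecast evaluation of competing models is one of the important criteria in time series analysis. To test the robustness and generalizability of the different models for the COVID-19 outbreak in Pakistan, two forecast accuracy measures are employed in this study: root mean square error (RMSE) and mean absolute error (MAE). Their mathematical equations are as follows:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i}\right)^{2}}, \qquad \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i} - \hat{y}_{i}\right|$$

Where $y_{i}$ and $\hat{y}_{i}$ are the actual and forecasted values, respectively, and $n$ is the number of observations.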
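Both accuracy measures are straightforward to compute. The sketch below uses the standard definitions of RMSE and MAE; the function names and the toy numbers are hypothetical and are not taken from the study's data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared forecast error."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute forecast errors."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Toy data (hypothetical, for illustration only).
observed = [3, 5, 7]
forecast = [2, 5, 9]
print(rmse(observed, forecast), mae(observed, forecast))
```

Because RMSE squares the errors before averaging, it penalises large misses more heavily than MAE, so the two measures can rank models differently.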

The forecast accuracy of the non-seasonal and seasonal ARIMA models, in terms of RMSE and MAE, is better than that of the other time series models for predicting the cumulative number of confirmed cases, recovered cases, and deaths from COVID-19 in Pakistan; these models are therefore recommended for forecasting [

The estimated seasonal ARIMA model is given in

The values of p, d, and q in the estimated ARIMA model are 0, 2, and 1. Similarly, the values of P, D, and Q are 1, 0, and 0, and the value of m is 7, i.e. a 7-day seasonality.
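With these orders, the seasonal model corresponds to ARIMA(0,2,1)(1,0,0) with period 7. Under the common backshift convention, and assuming no constant term, it can be written as shown here. The symbols are assumptions for illustration: $B$ is the backshift operator, $\Phi_{1}$ is the seasonal AR coefficient, $\theta_{1}$ is the non-seasonal MA coefficient, and $\varepsilon_{t}$ is white noise; no estimated coefficient values are implied.

$$\left(1 - \Phi_{1} B^{7}\right)(1 - B)^{2}\, y_{t} = \left(1 + \theta_{1} B\right)\varepsilon_{t}$$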

The estimated ARIMA models for the cumulative recovered cases and cumulative deaths are given in Eqs

The values of p, d, and q in the estimated ARIMA models are 0, 2, and 1, which means that there are zero AR terms and one MA term, and that the series is differenced twice to make it stationary.
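To make the structure of ARIMA(0,2,1) concrete, the sketch below computes point forecasts from the recursion implied by $(1 - B)^{2} y_{t} = (1 + \theta B)\varepsilon_{t}$. It assumes no constant term and this sign convention for the MA coefficient; the function name is hypothetical, and `theta` and `last_residual` are placeholders that would come from a fitted model, not values from this study.

```python
def arima021_forecast(y, theta, last_residual, h):
    """Point forecasts for ARIMA(0,2,1) without a constant.

    Expanding (1 - B)^2 gives y_t = 2*y_{t-1} - y_{t-2} + e_t + theta*e_{t-1}.
    Future shocks have expected value zero, so the MA term only enters
    the one-step-ahead forecast.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    prev, curr = y[-2], y[-1]
    forecasts = []
    for step in range(h):
        ma_term = theta * last_residual if step == 0 else 0.0
        nxt = 2 * curr - prev + ma_term
        forecasts.append(nxt)
        prev, curr = curr, nxt
    return forecasts


# Toy data (hypothetical): with theta = 0 the forecast is a straight-line
# continuation of the last two observations.
print(arima021_forecast([7, 10, 13], theta=0.0, last_residual=0.0, h=3))
print(arima021_forecast([7, 10, 13], theta=0.5, last_residual=2.0, h=3))
```

After the first step the forecasts continue along a straight line, which is why a twice-differenced model suits a cumulative series over a short horizon but cannot anticipate a change in growth rate.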

It can be seen from the results produced in

Method | RMSE | MAE |
---|---|---|
SES | 2499.41 | 1621.75 |
Mean | 53559.24 | 42203.94 |
Naïve | 2509.73 | 1635.32 |
Seasonal Naïve | 17138.84 | 11261.96 |
Drift | 1903.80 | 1535.86 |
Holt’s Linear Trend | 422.97 | 269.17 |
Holt’s Linear Damped Trend | 502.91 | 305.60 |
Holt-Winter’s Seasonal Additive | 534.92 | 372.66 |
Holt-Winter’s Seasonal Multiplicative | 975.92 | 684.93 |
ETS(A,A,N) | 422.97 | 269.16 |

The point forecast of cumulative confirmed cases up to July 04, 2020 is 231239, with a 95% prediction interval of (219648, 242832). The prediction interval means that, under the fitted model, the actual number of cumulative confirmed cases is expected to fall within this range with 95% probability; the upper limit of 242832 can therefore be read as the estimated maximum number of cumulative confirmed cases up to July 04, 2020. The actual and forecasted confirmed cases are also shown in

In this article, we focus not only on the number of confirmed cases but also on the number of recovered cases. Forecasting the cumulative number of recovered cases is as important as forecasting the cumulative number of confirmed cases: it helps both medical professionals and government officials take further necessary actions in the coming days to combat COVID-19. The procedure for forecasting the cumulative number of recovered cases is very similar to that for cumulative confirmed cases. The accuracy of the different time series models is calculated in terms of RMSE and MAE. It can be seen from

Method | RMSE | MAE |
---|---|---|
SES | 1404.38 | 683.31 |
Mean | 20490.11 | 15684.08 |
Naïve | 1410.22 | 689.04 |
Seasonal Naïve | 7480.11 | 4435.43 |
Drift | 1230.43 | 757.05 |
Holt’s Linear Trend | 890.40 | 295.45 |
Damped Trend | 944.86 | 342.67 |
Holt-Winter’s Additive | 891.31 | 367.00 |
Damped Holt-Winter’s Multiplicative | 1096.68 | 511.64 |
ETS(A,A,N) | 890.45 | 294.40 |
ARIMA(0,2,1) |

The procedure for forecasting cumulative deaths is similar to that for cumulative confirmed and recovered cases: the accuracy of the different time series models is calculated in terms of RMSE and MAE for model assessment. It can be seen from

Method | RMSE | MAE |
---|---|---|
SES | 50.66 | 32.81 |
Mean | 1059.08 | 843.06 |
Naïve | 50.87 | 33.08 |
Seasonal Naïve | 334.86 | 223.71 |
Drift | 38.64 | 30.13 |
Holt’s Linear Trend | 12.95 | 7.25 |
Damped Trend | 14.67 | 7.99 |
Holt-Winter’s Additive | 12.90 | 7.54 |
Damped Holt-Winter’s Multiplicative | 1096.68 | 511.64 |
ETS(A,A,N) | 12.88 | 7.25 |

Based on the forecasted values, the cumulative numbers of confirmed cases, recovered cases, and deaths up to July 04, 2020 will be 231239 with a 95% prediction interval of (219648, 242832), 111616 with a prediction interval of (101063, 122168), and 5043 with a 95% prediction interval of (4791, 5295), respectively. Based on these forecasted values, the number of active COVID-19 cases in Pakistan over the next 10 days (confirmed cases excluding recoveries and deaths) is estimated to be 114580, with a maximum of 115369. The government of Pakistan eased the lockdown during religious festivals and allowed all shopping centers to open, which resulted in more spread of and deaths from COVID-19. If the current government policy continues, the coming months could bring a disaster: the actual number of cumulative confirmed cases may exceed the projections, front-line medical professionals may be unable to cope, and not only hospitals but also all the places declared as quarantine centers in different cities of Pakistan will be overcrowded. It is time for the government to revise its policy on easing restrictions and reopening businesses in order to flatten the curve. Otherwise, the situation may become worse than in the countries most affected by the COVID-19 pandemic.
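The active-case figures follow from the three forecasts by simple subtraction, as this short check illustrates. The numbers are the point forecasts and upper interval limits reported in the text; the variable and function names are hypothetical. Note that the stated maximum appears to combine the three upper limits, which reproduces the reported figure but is not itself a 95% prediction interval bound for active cases.

```python
# Point forecasts and upper 95% limits for July 04, 2020, as reported in the text.
point = {"confirmed": 231239, "recovered": 111616, "deaths": 5043}
upper = {"confirmed": 242832, "recovered": 122168, "deaths": 5295}


def active_cases(forecast):
    """Active cases = cumulative confirmed minus recoveries minus deaths."""
    return forecast["confirmed"] - forecast["recovered"] - forecast["deaths"]


print(active_cases(point))  # 114580
print(active_cases(upper))  # 115369
```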

The results show the advantages of these algorithms in supporting strategists and decision-makers in developing short-term policies regarding disease prevalence. The forecast models will help the government and health staff prepare for forthcoming circumstances and improve the readiness of healthcare systems. It is worth noting that forecasting is a complex matter, and some tailored models might not be universally applicable owing to the complex societal and economic circumstances of different nations. The models and predictions proposed in this article do not reflect local demography, and the real statistics can vary owing to numerous governmental actions such as the intensity of lockdown, the isolation strategy, and health facilities. Thus, readers should be careful when interpreting these forecasts.

In this research study, an attempt has been made to predict the cumulative number of confirmed cases, deaths, and recoveries from COVID-19 in Pakistan. Because the cumulative data follow an upward or exponential trend, the ARIMA model is used for forecasting. However, ARIMA may perform poorly if the daily deaths, confirmed cases, and recovered cases follow a nonlinear trend. In that case, autoregressive conditional heteroscedasticity (ARCH) models can be used to forecast the current COVID-19 pandemic. In addition to time series models, machine learning and deep learning tools such as support vector machines (SVM), convolutional neural networks (CNN), and recurrent neural networks (RNN) can also be used to forecast COVID-19.