1 Introduction

Governments have been taking strict measures to slowdown the COVID-19 pandemic or to “flatten the curve”. The main reason is funded on avoiding a shortfall of health care resources while buying time to learn about the disease. In the other hand, current lockdowns are provoking collateral problems, especially in fragile economies, which concerns about the actual sustainability of the system.
Under this context, insurance companies need to have a comprehensive and cross-functional view in order to go in line with public decisions and adapt to the society’s reponse. It is critical to constantly evaluate the spread of the virus and assess potential scenarios that may arise during this crisis.
With the goal of representing the virus evolution in the following weeks, we propose a dynamic probabilistic framework to estimate the number of confirmed cases in several countries. The structure is based on a Gompertz model whose parameters describe outbreak delay, growth rate and final point. We rely on Bayesian inference that provides a mathematical architecture to 1) propagate uncertainty from model assumptions to predictions, 2) partially-pool estimates towards a common clusterized behavior of the pandemic (“borrowing strength” property).

2 Data Source

Confirmed Cases

Sources : JHU CSSE - Johns Hopkins School of Public Health

3 Model Estimates

Estimation of the evolution of the Sars-Cov-2

Cluster 1




Cluster 2




Cluster 3




Cluster 4




Cluster 5

4 Model Specification

The previous figures, predicted means and credible intervals, are generated by the model developed in this subsection.
Let \(y_{tj}\) be the cumulative number of confirmed cases for time \(t\) and country \(j\). We assume that \(y_{tj}\) is Lognormal distributed

\[ y_{tj} \sim Lognormal(\mu_{tjl},\sigma)\\ \] where \(\mu_{tjl}\) and \(\sigma\) are the mean and standard deviation of the associated normal, respectively, for time \(t\), country \(j\) and cluster \(l\).
The residual error \(\sigma\) does not depend on the country. The latter, in a hierarchical model structure, is understood as the within-group residual variability. In this context, it can be originated by different intensities of social distancing measures, different capabilities of governments to stop the spread or spatial/demographic variation across populations, among others. Based on these factors and using demographic proxy variables, we defined different groups of countries that share common attributes. Therefore, variation among clusters is captured by a higher level constructed over \(\mu_{tjl}\), specifically in the parameteres that define it.
Under the Gompertz model, \(\mu_{tj}\) is denoted as

\[ \mu_{tjl}=a_j \text{ } exp ({-b_j \times exp({-c_{tjl} \times t}}))\\ \] where \(a_j\) represents the reaching point, \(b_j\) the delay of the virus to start spreading, \(c_{tjl}\) the growth rate on time \(t\). In order to address the dynamics of the curve changes, we introduced a time-based \(c_{tjl}\).

\[ c_{tjl} = \frac{c_{jl}}{h_{jl} + exp(-g_{jl} \times (t - k_{jl}))} \]

where \(c_{jl}\), \(h_{jl}\) and \(g_{jl}\) are pulled towards the cluster mean and \(k_{jl}\) has its own mean for country \(j\). The b-parameter in equation (1) has the form \(b_j= b_{China} + \beta_j \times \sigma_b\).
The prior distributions are defined as follows

\[\begin{gather*} \sigma \sim Normal^+(0,10)\\ a_j \sim Normal^+(11,5)\\ b_{China} \sim Normal^+(5,10)\\ \beta_j \sim Normal^+(0,1)\\ \sigma_b \sim Cauchy^+(0,25)\\ c_{jl} \sim Normal^+(\gamma_l,\sigma_c)\\ h_{jl} \sim Normal^+(\theta_l,\sigma_c)\\ g_{jl} \sim Normal^+(\alpha_l,\sigma_c)\\ k_{jl} \sim Normal^+(70,70)\\ \end{gather*}\]

where \(j=1,...,J\) represents the country, \(l=1,...,L\) represents the cluster for country \(j\). The hierarchical structure is introduced through \(c_{jl}\), \(h_{jl}\) and \(g_{jl}\), i.e., country \(j\) shares information with the cluster through the hyperparameters \(\gamma_l\),\(\theta_l\) and \(\alpha_l\). The hyperpriors for the latter are

\[\begin{gather*} \gamma_l \sim Normal^+(0,1)\\ \theta_l \sim Normal^+(0,10)\\ \alpha_l \sim Normal^+(0,0.5) \end{gather*}\]

It is interesting to recover \(P(c_{tjl} \mid y)\) to analyze the change on the growth rate in time for the country \(j\). Since \(c_{tjl}\) represents a relative growth rate with respect to the reaching point \(a_{j}\), a higher \(c_{tjl}\) means a higher relative increase at that point t. The interpretation is not actually the same as the Gompertz model with \(c_{j}\).

5 Stan Code

To sample from the posterior densities, we wrote the model in Stan, a probabilistic programming language especially built for Bayesian inference. Stan uses Hamiltonean Monte Carlo, a family of Markov Chain Monte Carlo (MCMC) algorithms. The final code of the proposed model is the following


data {
  int<lower=0> N;
  int<lower=0> M;
  vector[N] y;
  vector[N] day;
  vector[M] day_projected;
  int<lower=0> J; //number of countries
  int<lower=0> L; //number of clusters
  matrix[N,J] design_matrix_1;
  matrix[J,L] design_matrix_2;
  matrix[N,J-1] design_matrix_b;
  matrix[M,J] design_matrix_proj;
  matrix[M,J-1] design_matrix_b_proj;
}
parameters {
  
  vector[J] a_j;

  vector<lower=0>[J-1] b_j;
  real<lower=0> b_intercept;
  real<lower=0> sigma_b;
  
  vector<lower=0>[J] c_j;
  vector<lower=0>[J] c_h;
  vector<lower=0>[J] c_b;
  vector<lower=0>[J] c_a;
  real<lower=0> sigma_c_j;  
  real<lower=0> sigma_c_h;  
  real<lower=0> sigma_c_b;  
  real<lower=0> sigma_c_a; 
  
  vector<lower=0>[L] c_cluster_1;
  vector<lower=0>[L] c_cluster_2;
  vector<lower=0>[L] c_cluster_3;  
  vector<lower=0>[L] c_cluster_4;  

  real<lower=0> sigma;
}
transformed parameters{
  
  vector[N] a_index=  design_matrix_1 * a_j;
  vector[N] b_index = b_intercept + design_matrix_b * b_j * sigma_b;
  vector<lower=0>[N] c_index = design_matrix_1 * c_j ./ (design_matrix_1 * c_h + exp(-design_matrix_1 * c_b .* (day - design_matrix_1 * c_a)));

  vector[N] mu= a_index .* exp(- b_index .* exp(-c_index .* day));
//  vector[N] sigma= design_matrix_sigma * sigma;

  vector[M] a_index_proj= design_matrix_proj * a_j;
  vector[M] b_index_proj= b_intercept + design_matrix_b_proj * b_j * sigma_b;
  vector<lower=0>[M] c_index_proj = design_matrix_proj * c_j ./ (design_matrix_proj * c_h + exp(-design_matrix_proj * c_b .* (day_projected- design_matrix_proj *c_a )));

  
  vector[M] mu_proj= a_index_proj .* exp(-b_index_proj .* exp(-c_index_proj .* day_projected));
}
model {
//priors

target+= normal_lpdf(a_j|11,6);

target+= normal_lpdf(b_j|0,1);
target+= normal_lpdf(b_intercept|5,10);
target+= cauchy_lpdf(sigma_b|0,25);

target+= normal_lpdf(c_j|design_matrix_2 * c_cluster_1,sigma_c_j);
target+= normal_lpdf(c_h|design_matrix_2 * c_cluster_2,sigma_c_h);
target+= normal_lpdf(c_b|design_matrix_2 * c_cluster_3,sigma_c_b);
target+= normal_lpdf(c_a|design_matrix_2 * c_cluster_4,sigma_c_a);

target+= normal_lpdf(c_cluster_1|0,1);
target+= normal_lpdf(c_cluster_2|0,10);
target+= normal_lpdf(c_cluster_3|0,0.5);
target+= normal_lpdf(c_cluster_4|80,20);

target+= cauchy_lpdf(sigma_c_j|0,1);
target+= cauchy_lpdf(sigma_c_h|0,10);
target+= cauchy_lpdf(sigma_c_b|0,1);
target+= cauchy_lpdf(sigma_c_a|0,70);

target+= normal_lpdf(sigma|0,10);

target+= lognormal_lpdf(y|mu,sigma);
}
generated quantities{
 real yrep[N]= lognormal_rng(mu,sigma);
 real yproj[M]= lognormal_rng(mu_proj,sigma);
}