BLOG 3: COVID-19 prediction models: to be handled with care

Published on: 04/07/2020

Since the start of the COVID-19 outbreak, researchers from all over the world have developed scores of predictive models to support policy makers in their decisions on how to mitigate its effects. However, although the contribution of models and modellers is indisputable, anyone using predictive models to inform policy making needs to be aware of their limitations, especially given the relative scarcity of data and the novelty of the phenomenon at hand. In this regard, as argued by Koerth et al., several caveats stem from the data and the modelling assumptions, particularly when the phenomena studied are still ongoing.

Consider for example the simplest SIR (Susceptible-Infected-Recovered) model: in principle, the number of deaths from an infectious disease is given by the susceptible population times the infection rate times the fatality rate. Starting from the fatality rate, it is difficult to settle on a single average figure, as it depends on the age of individuals and the presence of comorbidities, and therefore changes from cohort to cohort and from country to country. Furthermore, even within the same subset of individuals there are many uncertainties. The fatality rate is the ratio of the number of people who have died from the disease to the number of people infected with it. First of all, it is difficult to state with certainty how many people died from COVID-19 rather than with COVID-19, in particular in the presence of comorbidities. As reported, among others, by the BBC, countries differ in how they record COVID-19 deaths. Secondly, it is extremely difficult to determine the number of people who are infected at any given moment: some studies suggest that many people infected with COVID-19 do not display symptoms and so go uncounted, which would mean that the true fatality rate is lower than the figures currently reported in many countries.
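
The arithmetic above can be made concrete with a minimal Python sketch. All figures are hypothetical and chosen only for illustration, and the function names are not taken from any of the models discussed here. The sketch shows how the estimated fatality rate, and with it the expected number of deaths, shrinks as the assumed number of undetected infections grows.

```python
def fatality_rate(deaths, infections):
    """Ratio of people who died from the disease to people infected with it."""
    return deaths / infections


def expected_deaths(susceptible, infection_rate, fatality):
    """Susceptible population times infection rate times fatality rate."""
    return susceptible * infection_rate * fatality


# Hypothetical figures, for illustration only.
recorded_deaths = 300
confirmed_cases = 10_000

# Assume 0, 1 or 9 undetected (e.g. asymptomatic) infections per confirmed case.
for undetected_per_confirmed in (0, 1, 9):
    infections = confirmed_cases * (1 + undetected_per_confirmed)
    rate = fatality_rate(recorded_deaths, infections)
    deaths = expected_deaths(1_000_000, 0.2, rate)
    print(f"{undetected_per_confirmed} undetected per confirmed case: "
          f"fatality rate {rate:.2%}, expected deaths {deaths:,.0f}")
```

The same number of recorded deaths is compatible with very different fatality rates, depending on an input (the true number of infections) that cannot be observed directly.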

On the other hand, several studies suggest a higher death toll from the COVID-19 outbreak by looking at “excess mortality”, i.e. the gap between the total number of people who died from any cause and the historical average for the same place and time of year. These studies also suggest that many individuals were killed by conditions that might normally have been treated, had hospitals not been overwhelmed by a surge of patients needing intensive care (as argued for instance by The Economist). Further, it is not easy to estimate to what extent the fatality rate is influenced by hospital capacity, e.g. access to the best care (ICU), although it is clear that when the healthcare capacity of a country (or a region) is overwhelmed, the fatality rate goes up. It is also difficult to estimate precisely the symptomaticity ratio, i.e. how many infected people are symptomatic versus asymptomatic.
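
As a rough illustration of the excess mortality definition above, the following Python sketch subtracts a historical average from the observed all-cause deaths for the same place and period. The function name and the numbers are hypothetical, and published estimates typically use more refined baselines than a plain average.

```python
def excess_mortality(observed_deaths, historical_deaths):
    """Observed all-cause deaths minus the historical average.

    historical_deaths holds all-cause death counts for the same place
    and time of year in previous years.
    """
    baseline = sum(historical_deaths) / len(historical_deaths)
    return observed_deaths - baseline


# Hypothetical weekly all-cause deaths for one region.
previous_years = [1000, 1100, 900]
this_year = 1500
print(excess_mortality(this_year, previous_years))
```

The result counts deaths from any cause, so it captures both unrecorded COVID-19 deaths and deaths from other conditions that went untreated.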

There is also a lot of uncertainty concerning the infection rate. In principle, it depends on the basic reproduction number, which is the average number of new infections traced back to each infected person in a population where everyone is susceptible to the disease. This number is influenced by the rate of contact, i.e. how many people an infected person interacts with in a given period of time, which depends on the circumstances, and by the rate of transmission per contact, i.e. how many of the people an infected person meets will become infected themselves. In turn, other variables influence the infection rate: how long the virus can survive on a given surface, how far it can be flung through the air, the duration of infectiousness, and the extent to which asymptomatic individuals are infectious in comparison with symptomatic ones. Finally, all these dimensions are influenced by interventions such as social distancing and school closures, and their estimates depend on the modelling technique and the stage of the epidemic.
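
The factors listed above are often combined in a simple textbook decomposition: the basic reproduction number as the product of the contact rate, the transmission probability per contact and the duration of infectiousness. The Python sketch below assumes that decomposition. The parameter values are hypothetical and are not estimates for COVID-19.

```python
def basic_reproduction_number(contacts_per_day, transmission_prob, infectious_days):
    """Textbook decomposition: contacts x transmission per contact x duration."""
    return contacts_per_day * transmission_prob * infectious_days


# Hypothetical values, for illustration only.
baseline = basic_reproduction_number(contacts_per_day=10,
                                     transmission_prob=0.05,
                                     infectious_days=5)

# Social distancing is modelled here simply as fewer contacts per day.
distanced = basic_reproduction_number(contacts_per_day=3,
                                      transmission_prob=0.05,
                                      infectious_days=5)

print(f"baseline: {baseline:.2f}, with distancing: {distanced:.2f}")
```

Because the three factors multiply, uncertainty in any one of them carries over directly into the estimate, and an intervention that changes only the contact rate can move the number from above one to below one.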

Setting aside the uncertainty in the underlying data, differences in assumptions and modelling approaches can also lead to different results and policy recommendations. In that regard, following Rofer, an interesting comparison can be made between top-down and bottom-up approaches.

The top-down approach consists in fitting a curve to the data set and then extrapolating future data points. A bottom-up approach consists in modelling a series of components that mimic the progress of the epidemic, such as social distancing, which makes it possible to separate the different mechanisms of the transmission process. A set of models by Imperial College (1-2-3) is based on the bottom-up approach: they model the ways in which the virus can be transmitted, and then assess how social distancing and transportation influence the process. On the other hand, the model by IHME fits curves representing deaths in various locations with a series of parameters, and then extrapolates the numbers of deaths and the need for hospitalization and equipment. This leads to uncertainty at the beginning of the outbreak, when less location-specific data is available. Another important issue is that the IHME model assumes that the US has had a lockdown as strict as Wuhan's, but this seems not to be the case. Further, only one location, Wuhan, has had a generalized epidemic, and therefore fitting the US curve on that location is difficult, especially because the timing and extent of social distancing are difficult to mimic. As more US data become available, the model will become more precise. Further, even though the model takes into account age structure, some other factors are not modelled, such as the prevalence of multi- and co-morbidities, chronic lung disease, use of public transport, pollution and population density. On top of that, the reduction in healthcare quality due to overload is not explicitly taken into account.
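
The contrast between the two approaches can be sketched with toy code. The Python example below is not the Imperial College or the IHME model: it pairs a minimal discrete-time SIR simulation (bottom-up) with an exponential curve fitted by least squares to early data (top-down). All parameter values and function names are assumptions made for illustration.

```python
import math


def simulate_sir(population, initial_infected, beta, gamma, days):
    """Bottom-up sketch: discrete-time SIR with daily steps.

    beta is the transmission rate per day, gamma the recovery rate per day.
    Returns the number of currently infected people for each day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    infected = [i]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        infected.append(i)
    return infected


def fit_exponential(counts):
    """Top-down sketch: least-squares fit of log(count) = a + b * day."""
    n = len(counts)
    days = range(n)
    logs = [math.log(c) for c in counts]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
             / sum((x - mean_x) ** 2 for x in days))
    intercept = mean_y - slope * mean_x
    return intercept, slope


def extrapolate(intercept, slope, day):
    """Value of the fitted exponential curve on a given day."""
    return math.exp(intercept + slope * day)


# Hypothetical parameters, for illustration only.
trajectory = simulate_sir(population=1_000_000, initial_infected=10,
                          beta=0.3, gamma=0.1, days=120)

# Pretend only the first 15 days have been observed, then extrapolate.
a, b = fit_exponential(trajectory[:15])
for day in (30, 60, 90):
    print(f"day {day}: simulated {trajectory[day]:,.0f}, "
          f"curve fit {extrapolate(a, b, day):,.0f}")
```

In this toy setting the fitted curve tracks the simulation early on but keeps growing after the simulated epidemic slows down, since it has no notion of the susceptible population running out. A mechanistic model, in turn, is only as good as the values chosen for parameters such as beta and gamma.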

A further consideration is that different models lead to widely diverging recommendations. For instance, the first version (16 March) of the Imperial College model made grim predictions about the death toll in the US and the UK (up to 2.2 million and about 500K deaths respectively) and the strain on ICU capacity, prompting the government to put in place mitigation measures. On the other hand, a model by the University of Oxford suggests that the new coronavirus may already have infected far more people in the UK than scientists had previously estimated (maybe half of the population), and that the mortality rate from the virus is therefore much lower than generally thought, as the vast majority of infected individuals develop mild symptoms or none at all.

However, both models are built on a series of extreme assumptions. For the Imperial College model, these concern the value of the reproduction number, the rate of death, the length of incubation, and the period in which infected and asymptomatic individuals can be infectious. For the Oxford model, they include the suggestion that the infection had reached the UK by December or January, and the figure that only one in 1,000 infections will need hospitalization, which is far removed from reality. Clearly the two models provide different recommendations: the Oxford model recommends putting more effort into achieving herd immunity, and concludes that the country had already acquired substantial herd immunity through the unrecognised spread of COVID-19 over more than two months, while the Imperial College model recommends putting more effort into containment measures. However, both models agree with the social distancing measures put into place by the UK government, and the only point of disagreement concerns the timing of removing such restrictions. In that regard, the crucial information unavailable to the modellers is the number of people who have been infected without showing symptoms. A reliable test for this would be a game changer for modellers, as it might significantly alter the predicted path of the pandemic. A final consideration is linked to the availability of data and the data collection activity, where there is a huge difference across countries. Very interestingly, the German central register for ICU beds, which is based on voluntary contributions from all hospitals, seems to be a unique platform and maybe something to replicate in other countries.