Repository: Freie Universität Berlin, Math Department

Characterising information loss due to aggregating epidemic model outputs

Sherratt, Katharine and Srivastava, Ajitesh and Ainslie, Kylie and Singh, David E and Cublier, Aymar and Marinescu, Maria Cristina and Carretero, Jesus and Garcia, Alberto Cascajo and Franco, Nicolas and Willem, Lander and Abrams, Steven and Faes, Christel and Beutels, Philippe and Hens, Niel and Mueller, Sebastian and Charlton, Billy and Ewert, Ricardo and Paltra, Sydney and Rakow, Christian and Rehmann, Jakob and Conrad, T. O. F. and Schuette, Christof and Nagel, Kai and Grah, Rok and Niehus, Rene and Prasse, Bastian and Sandmann, Frank and Funk, Sebastian (2023) Characterising information loss due to aggregating epidemic model outputs. medRxiv. (Submitted)

Full text not available from this repository.

Background. Collaborative comparisons and combinations of multiple epidemic models are used as policy-relevant evidence during epidemic outbreaks. Typically, each modeller summarises their own distribution of simulated trajectories using descriptive statistics at each modelled time step. We explored information losses compared to directly collecting a sample of the simulated trajectories, in terms of key epidemic quantities, ensemble uncertainty, and performance against data.

Methods. We compared July 2022 projections from the European COVID-19 Scenario Modelling Hub. Using shared scenario assumptions, five modelling teams contributed up to 100 simulated trajectories projecting incidence in Belgium, the Netherlands, and Spain. First, we compared epidemic characteristics including incidence, peaks, and cumulative totals. Second, we drew a set of quantiles from the sampled trajectories for each model at each time step. We created an ensemble as the median across models at each quantile, and compared this to an ensemble of quantiles drawn from all available trajectories at each time step. Third, we compared each trajectory to between 4 and 29 weeks of observed data, using the mean absolute error to weight trajectories in consecutive ensembles.

Results. We found that collecting models’ simulated trajectories, as opposed to collecting models’ quantiles at each time point, enabled us to show additional epidemic characteristics, a wider range of uncertainty, and performance against data. Sampled trajectories contained a right-skewed distribution which was poorly captured by an ensemble of models’ quantile intervals. Ensembles weighted by predictive performance narrowed the range of plausible incidence over time, excluding some epidemic shapes altogether.

Conclusions. Understanding potential information loss when collecting model projections can support the accuracy, reliability, and communication of collaborative infectious disease modelling efforts.
The importance of different information losses may vary with each collaboration’s aims, with lesser impact on short term predictions compared to assessing threshold risks and longer term uncertainty.

Competing Interest Statement
The authors have declared no competing interest.

Funding Statement
KS, SF funded by ECDC and Wellcome (210758/Z/18/Z). AS funded by National Science Foundation Award 2135784, 2223933. KA funded by Netherlands Ministry of Health, Welfare and Sport, and European Union Horizon 2020 research and innovation programme, project EpiPose (grant agreement number 101003688). DES, AC, MM, JC, ACG funded by U3CM, Instituto de Salud Carlos III, Gobierno de España, European Commission. NF, LW, SA, CF, PB, NH funded by European Union Horizon 2020 research and innovation programme (grant number 101003688, EpiPose project). SM, BC, RE, SP, CR, JR, TC, CS, KN funded by Ministry of research and education (BMBF) Germany (grants number 031L0300D, 031L0302A). RG, RN, BP, FS funded by ECDC.

Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The study used openly available data.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes

All code and data available on GitHub: covid19-forecast-hub-europe/aggregation-info-loss
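
The two ensemble constructions and the error-based weighting described in the Methods can be illustrated with a minimal sketch. This is not the authors' code (which is in the GitHub repository named above); the function names, the quantile levels, and the choice of inverse-MAE weights are assumptions made here for illustration only. Each model is assumed to supply an array of shape (n_trajectories, n_time_steps), and models may supply different numbers of trajectories.

```python
import numpy as np

# Assumed quantile levels; the abstract does not state which levels were used.
QUANTILE_LEVELS = np.array([0.05, 0.25, 0.5, 0.75, 0.95])


def ensemble_of_model_quantiles(trajectories_by_model, levels=QUANTILE_LEVELS):
    """Median across models of each model's own quantiles, per time step.

    Returns an array of shape (n_levels, n_time_steps).
    """
    # Summarise each model separately, then stack: (n_models, n_levels, n_steps).
    per_model = np.stack(
        [np.quantile(traj, levels, axis=0) for traj in trajectories_by_model]
    )
    return np.median(per_model, axis=0)


def ensemble_of_pooled_trajectories(trajectories_by_model, levels=QUANTILE_LEVELS):
    """Quantiles drawn from all available trajectories pooled together.

    Returns an array of shape (n_levels, n_time_steps).
    """
    pooled = np.concatenate(trajectories_by_model, axis=0)
    return np.quantile(pooled, levels, axis=0)


def mae_weights(trajectories, observed):
    """Normalised weights from each trajectory's mean absolute error.

    Assumption: weights are proportional to 1 / MAE over the observed
    weeks. The abstract only says MAE was used to weight trajectories.
    """
    n_obs = len(observed)
    mae = np.mean(np.abs(trajectories[:, :n_obs] - observed), axis=1)
    inverse = 1.0 / np.maximum(mae, 1e-12)  # guard against a perfect fit
    return inverse / inverse.sum()
```

The first function only ever sees per-model summaries, so a model with many trajectories counts the same as a model with one, and the tails of individual models are averaged away by the median. The second function works on the pooled sample directly, which is the comparison the abstract draws.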

Item Type: Article
Subjects: Mathematical and Computer Sciences > Statistics > Applied Statistics
Mathematical and Computer Sciences > Statistics > Applied Statistics > Medical Statistics
Mathematical and Computer Sciences > Statistics > Statistical Modelling
Divisions: Department of Mathematics and Computer Science > Institute of Mathematics
Department of Mathematics and Computer Science > Institute of Mathematics > BioComputing Group
Department of Mathematics and Computer Science > Institute of Mathematics > Comp. Proteomics Group
ID Code: 3018
Deposited By: Admin Administrator
Deposited On: 11 Jul 2023 12:54
Last Modified: 11 Jul 2023 12:54
