A Markov Chain Modeling Approach for Predicting Relative Risks of Spatial Clusters in Public Health
Abstract
Predicting relative risk (RR) of spatial clusters is a complex task in public health that can be achieved through various statistical and machine-learning methods for different time intervals. However, high-resolution longitudinal data is often unavailable to successfully apply such methods. The goal of the present study is to further develop and test a new methodology proposed in our previous work for accurate sequential RR predictions in the case of limited longitudinal data. In particular, we first use a well-known likelihood ratio test to identify significant spatial clusters over user-defined time intervals. Then we apply a Markov chain modeling approach to predict RR values for each time interval. Our findings demonstrate that the proposed approach yields better performance with COVID-19 morbidity data compared to the previous study on mortality data. Additionally, increasing the number of time intervals enhances the accuracy of the proposed Markov chain modeling method.
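As an illustration of the cluster-detection step, the sketch below computes the log likelihood ratio and the relative risk for a single candidate cluster, assuming the standard Poisson form of the spatial scan statistic. The function name and arguments are hypothetical, and the significance assessment (typically done by Monte Carlo replication in scan-statistic software) is not shown.

```python
import math


def poisson_scan_stats(cases_in, expected_in, total_cases):
    """Return (log likelihood ratio, relative risk) for one candidate cluster.

    cases_in:     observed cases inside the candidate cluster
    expected_in:  expected cases inside under the null hypothesis, scaled so
                  that expectations over the whole region sum to total_cases
    total_cases:  observed cases in the whole study region
    """
    if not 0 < expected_in < total_cases:
        raise ValueError("expected_in must lie strictly between 0 and total_cases")
    if not 0 <= cases_in <= total_cases:
        raise ValueError("cases_in must lie between 0 and total_cases")

    # Relative risk: observed/expected inside divided by observed/expected outside.
    outside_ratio = (total_cases - cases_in) / (total_cases - expected_in)
    inside_ratio = cases_in / expected_in
    rr = math.inf if outside_ratio == 0 else inside_ratio / outside_ratio

    # Only clusters with more cases than expected score above zero.
    if cases_in <= expected_in:
        return 0.0, rr

    llr = cases_in * math.log(inside_ratio)
    if cases_in < total_cases:
        # The outside term vanishes when every case falls inside the cluster.
        llr += (total_cases - cases_in) * math.log(outside_ratio)
    return llr, rr
```

The candidate cluster with the largest log likelihood ratio is the most likely cluster; its RR is the quantity that the prediction step then tracks across time intervals.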
Summary
This paper addresses the problem of predicting relative risks (RR) of spatial clusters in public health, focusing on COVID-19 morbidity data. The authors extend their previous work by applying a Markov chain modeling approach to predict RR values across multiple time intervals, even when high-resolution longitudinal data is limited. The methodology has two main steps. First, statistically significant spatial clusters are identified with a likelihood ratio test based on a Poisson spatial scan model implemented in SaTScan. Second, the RR for the next time interval is predicted with a predictor-corrector approach rooted in Markov chain modeling. This approach combines multiple linear regression and exponential smoothing to estimate the corrected future relative risk, using a weighting parameter to balance recent observations with corrected estimates. The key findings indicate that the proposed approach performs better with COVID-19 morbidity data than it did with mortality data in the previous study. The model achieved a high coefficient of determination (R^2 = 0.95682) and a low mean squared error (MSE = 0.00344) when predicting the RR for the seventh time interval. It also outperformed multiple linear regression alone (R^2 = 0.892) and exponential smoothing alone (R^2 = 0.810), which demonstrates the benefit of combining the two methods. The study further found that increasing the number of time intervals improved the accuracy of the Markov chain modeling. This research is important because it offers a more accurate and reliable method for predicting disease risk in spatial clusters, which can aid public health planning and intervention strategies, particularly when data is limited.
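The prediction step can be illustrated with a minimal sketch. It is not the authors' exact formulation: it assumes the regression predictor uses a fixed number of lagged RR values as covariates, and that the corrector is a convex blend in which `alpha` weights the latest observed RR and `1 - alpha` weights the regression prediction. The function names, the `lags` parameter, and the array layout (one row per cluster, one column per time interval) are all hypothetical.

```python
import numpy as np


def fit_lag_regression(rr, lags):
    """Fit RR_t ~ intercept + RR_{t-lags..t-1} by least squares, pooled over clusters."""
    n_intervals = rr.shape[1]
    features, targets = [], []
    for t in range(lags, n_intervals):
        features.append(rr[:, t - lags:t])  # lagged RR values, oldest first
        targets.append(rr[:, t])            # RR at interval t
    x = np.vstack(features)
    y = np.concatenate(targets)
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_next_rr(rr, lags=2, alpha=0.0):
    """Return (regression prediction, blended prediction) for the next interval.

    alpha = 1 reproduces the latest observed RR; alpha = 0 uses the
    regression prediction only (an assumed convention for this sketch).
    """
    rr = np.asarray(rr, dtype=float)
    if rr.shape[1] <= lags:
        raise ValueError("need more time intervals than lags")
    coef = fit_lag_regression(rr, lags)
    predicted = coef[0] + rr[:, -lags:] @ coef[1:]
    corrected = alpha * rr[:, -1] + (1.0 - alpha) * predicted
    return predicted, corrected
```

In practice `alpha` would be tuned on held-out intervals, for example by scanning a grid of values and keeping the one with the lowest mean squared error.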
Key Insights
- The proposed balanced Markov chain approach effectively predicts the relative risk of spatial clusters by combining multiple linear regression and exponential smoothing.
- The model achieved a high coefficient of determination (R^2 = 0.95682) and a low mean squared error (MSE = 0.00344) when predicting the relative risk for the seventh time interval, indicating strong predictive accuracy.
- The combined approach outperformed multiple linear regression alone (R^2 = 0.892) and exponential smoothing alone (R^2 = 0.810), demonstrating the benefit of the hybrid method.
- The optimal weighting parameter (α* = 0) indicates that the model relies heavily on past predicted values, suggesting that incorporating historical trends is crucial for accurate predictions.
- The coefficient of variation (CV) of the predicted relative risks (29.48%) was lower than that of the observed relative risks (32.36%), suggesting that the model smooths out extreme fluctuations and provides more stable risk estimates.
- Using morbidity data rather than mortality data leads to more precise identification of spatial clusters and more reliable estimates of relative risk, because morbidity datasets are larger and capture early transmission patterns.
- A limitation is that the spatial scan statistic assumes constant risk within each cluster, which may affect prediction accuracy.
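The evaluation metrics quoted in this list can be computed as in the sketch below. It assumes R^2 is defined as one minus the ratio of residual to total sum of squares, and that the CV uses the sample standard deviation (`ddof=1`); the paper may use different conventions, and the function names are illustrative only.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted RR."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    values = np.asarray(values, dtype=float)
    return float(100.0 * np.std(values, ddof=1) / np.mean(values))
```

Comparing `coefficient_of_variation` on the observed and on the predicted RR vectors gives the kind of stability comparison reported above: a lower value for the predictions means they vary less around their mean than the observations do.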
Practical Implications
- The model can be used by public health officials to predict areas at heightened risk of disease outbreaks, enabling targeted interventions and resource allocation.
- The methodology can be adapted and applied to predict the relative risks of spatial clusters for other diseases or health outcomes beyond COVID-19.
- Practitioners can use this approach to improve the accuracy of disease risk assessments, leading to more effective public health strategies.
- Future research can focus on incorporating additional covariates, such as population density, healthcare access, and vaccination coverage, to further enhance the model's predictive performance.
- The model's limitations, such as the assumption of constant risk within clusters and reliance on publicly available data, highlight areas for future refinement and data integration.