Dpto. Estadística, Informática y Matemáticas - Estatistika, Informatika eta Matematika Saila [desde mayo 2018 / 2018ko maiatzetik]
See also the departments that existed before May 2018:
Dpto. Automática y Computación - Automatika eta Konputazioa Saila
Dpto. Estadística e Investigación Operativa - Estatistika eta Ikerketa Operatiboa Saila
Dpto. Ingeniería Matemática e Informática - Matematika eta Informatika Ingeniaritza Saila
Dpto. Matemáticas - Matematika Saila
Browsing Dpto. Estadística, Informática y Matemáticas - Estatistika, Informatika eta Matematika Saila [desde mayo 2018 / 2018ko maiatzetik] by Author "Adin Urtasun, Aritz"
Publication Open Access
Alleviating confounding in spatio-temporal areal models with an application on crimes against women in India (SAGE Publications, 2021)
Adin Urtasun, Aritz; Goicoa Mangado, Tomás; Hodges, James S.; Schnell, Patrick M.; Ugarte Martínez, María Dolores; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística, Informática y Matemáticas
Assessing associations between a response of interest and a set of covariates in spatial areal models is the leitmotiv of ecological regression. However, the presence of spatially correlated random effects can mask or even bias estimates of such associations due to confounding effects if they are not carefully handled. Though potentially harmful, confounding issues have often been ignored in practice leading to wrong conclusions about the underlying associations between the response and the covariates. In spatio-temporal areal models, the temporal dimension may emerge as a new source of confounding, and the problem may be even worse. In this work, we propose two approaches to deal with confounding of fixed effects by spatial and temporal random effects, while obtaining good model predictions. In particular, restricted regression and an apparently (though in fact not) equivalent procedure using constraints are proposed within both fully Bayes and empirical Bayes approaches. The methods are compared in terms of fixed-effect estimates and model selection criteria.
The techniques are used to assess the association between dowry deaths and certain socio-demographic covariates in the districts of Uttar Pradesh, India.

Publication Open Access
Big problems in spatio-temporal disease mapping: methods and software (Elsevier, 2023)
Orozco Acosta, Erick; Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA20001
Background and objective: Fitting spatio-temporal models for areal data is crucial in many fields such as cancer epidemiology. However, when data sets are very large, many issues arise. The main objective of this paper is to propose a general procedure to analyze high-dimensional spatio-temporal areal data, with special emphasis on mortality/incidence relative risk estimation. Methods: We present a pragmatic and simple idea that permits hierarchical spatio-temporal models to be fitted when the number of small areas is very large. Model fitting is carried out using integrated nested Laplace approximations over a partition of the spatial domain. We also use parallel and distributed strategies to speed up computations in a setting where Bayesian model fitting is generally prohibitively time-consuming or even unfeasible. Results: Using simulated and real data, we show that our method outperforms classical global models. We implement the methods and algorithms that we develop in the open-source R package bigDM where specific vignettes have been included to facilitate the use of the methodology for non-expert users.
Conclusions: Our scalable methodology proposal provides reliable risk estimates when fitting Bayesian hierarchical spatio-temporal models for high-dimensional data.

Publication Open Access
Exploring disease mapping models in big data contexts: some new proposals (2023)
Orozco Acosta, Erick; Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA2001
Disease mapping is a highly relevant research area within spatial statistics (areal data), as it provides important support for public health decision-making. Because of the high variability of classical risk estimators, such as the standardized mortality ratio (SMR), the use of complex statistical models is essential to obtain a more coherent representation of the underlying disease risk. Over the last decades, several statistical models have been proposed in the literature to smooth spatio-temporal risks, most of them incorporating random effects with conditional autoregressive (CAR) prior distributions, building on the seminal work of Besag et al. (1991). However, the scalability of these models, specifically their feasibility in scenarios where the number of small areas increases significantly, has not been sufficiently studied. Therefore, the main objective of this thesis is to propose new scalable Bayesian modelling methods to smooth incidence/mortality risks (or rates) in high-dimensional spatial and spatio-temporal areal data. The methodology is based on the "divide and conquer" principle. Specifically, this thesis addresses the objectives described below.
The first objective is to review the most recent literature on the main contributions in the spatial and spatio-temporal settings that are relevant to the objectives of this research. Chapter 1 gives an overview of model fitting and inference, focusing on the INLA technique, based on nested Laplace approximations and numerical integration, which is widely used for latent Gaussian models within the Bayesian paradigm (Rue et al., 2009). This chapter also provides approximations of model selection criteria based on the Bayesian deviance and the predictive distribution under the new scalable model proposals. A brief description of the R package bigDM, which implements all the algorithms and models proposed in this dissertation, is also included.
The second objective of this thesis is to propose a scalable Bayesian modelling method for high-dimensional spatial areal data. Chapter 2 provides a comprehensive description of a new risk-smoothing methodology. A multi-scenario simulation study including almost 8,000 Spanish municipalities is also carried out to compare the proposed method with a global CAR-type model in terms of goodness of fit and accuracy in estimating the risk surface. In addition, the behaviour of the scalable models is illustrated by analysing male colorectal cancer mortality data for Spanish municipalities using two different strategies for partitioning the spatial domain.
The third objective is to extend the scalable Bayesian modelling approach to smooth high-dimensional spatio-temporal mortality or incidence risks. Chapter 3 presents a comprehensive description of the spatio-temporal CAR models originally proposed by Knorr-Held (2000), which are the basis of the new modelling proposal for analysing spatio-temporal areal data. The chapter also explains the parallelization and distributed computing strategies implemented in the bigDM package to speed up computations using the R package future (Bengtsson, 2021). A simulation study is conducted to compare the new scalable proposal, with two different merging strategies, against traditional spatio-temporal CAR models, using the map of Spanish municipalities as a template. The new proposal is also evaluated in terms of computational time. Finally, all the approaches described in this chapter are illustrated and compared by analysing the spatio-temporal evolution of male lung cancer mortality in Spanish municipalities during the period 1991-2015.
The fourth objective is to assess the suitability of the method developed in Chapter 3 for short-term forecasting of high spatial resolution data. Chapter 4 presents the spatio-temporal CAR model that incorporates missing observations in the response variable for the time periods to be forecast. Additionally, a validation study is carried out to assess the predictive ability of the models for one-, two-, and three-period-ahead predictions using real lung cancer mortality data in Spanish municipalities. This chapter also compares the predictive ability of the models using cross-validation measures (leave-one-out and leave-group-out) (Liu and Rue, 2022).
The fifth objective cuts across all chapters: to develop an open-source R package called bigDM (Adin et al., 2023b) that consolidates all the methods proposed in this dissertation, making them readily available to the scientific community. The thesis ends with the main conclusions of this work and outlines future lines of research.

Publication Open Access
Flexible Bayesian P-splines for smoothing age-specific spatio-temporal mortality patterns (SAGE, 2019)
Goicoa Mangado, Tomás; Adin Urtasun, Aritz; Etxeberria Andueza, Jaione; Militino, Ana F.; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2
In this paper, age-space-time models based on one- and two-dimensional P-splines with B-spline bases are proposed for smoothing mortality rates, where both fixed relative scale and scale invariant two-dimensional penalties are examined. Model fitting and inference are carried out using integrated nested Laplace approximations (INLA), a recent Bayesian technique that speeds up computations compared to MCMC methods. The models will be illustrated with Spanish breast cancer mortality data during the period 1985-2010, where a general decline in breast cancer mortality has been observed in Spanish provinces in the last decades.
The results reveal that mortality rates for the oldest age groups do not decrease in all provinces.

Publication Open Access
High-dimensional order-free multivariate spatial disease mapping (Springer, 2023)
Vicente Fuenzalida, Gonzalo; Adin Urtasun, Aritz; Goicoa Mangado, Tomás; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA2001
Despite the amount of research on disease mapping in recent years, the use of multivariate models for areal spatial data remains limited due to difficulties in implementation and computational burden. These problems are exacerbated when the number of areas is very large. In this paper, we introduce an order-free multivariate scalable Bayesian modelling approach to smooth mortality (or incidence) risks of several diseases simultaneously. The proposal partitions the spatial domain into smaller subregions, fits multivariate models in each subdivision and obtains the posterior distribution of the relative risks across the entire spatial domain. The approach also provides posterior correlations among the spatial patterns of the diseases in each partition that are combined through a consensus Monte Carlo algorithm to obtain correlations for the whole study region. We implement the proposal using integrated nested Laplace approximations (INLA) in the R package bigDM and use it to jointly analyse colorectal, lung, and stomach cancer mortality data in Spanish municipalities. The new proposal allows for the analysis of large datasets and yields superior results compared to fitting a single multivariate model.
Additionally, it facilitates statistical inference through local homogeneous models, which may be more appropriate than a global homogeneous model when dealing with a large number of areas.

Publication Open Access
Identifying extreme COVID-19 mortality risks in English small areas: a disease cluster approach (Springer, 2022)
Adin Urtasun, Aritz; Congdon, P.; Santafé Rodrigo, Guzmán; Ugarte Martínez, María Dolores; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística, Informática y Matemáticas
The COVID-19 pandemic is having a huge impact worldwide and has highlighted the extent of health inequalities between countries but also in small areas within a country. Identifying areas with high mortality is important both for public health mitigation in COVID-19 outbreaks and for longer-term efforts to tackle social inequalities in health. In this paper we consider different statistical models and an extension of a recent method to analyze COVID-19 related mortality in English small areas during the first wave of the epidemic in the first half of 2020. We seek to identify hotspots, and where they are most geographically concentrated, taking account of observed area factors as well as spatial correlation and clustering in regression residuals, while also allowing for spatial discontinuities. Results show an excess of COVID-19 mortality cases in small areas surrounding London and in other small areas in the North-East and North-West of England.
Models alleviating spatial confounding show ethnic isolation, air quality and area morbidity covariates having a significant and broadly similar impact on COVID-19 mortality, whereas nursing home location seems to be slightly less important.

Publication Open Access
Online relative risks/rates estimation in spatial and spatio-temporal disease mapping (Elsevier, 2019)
Adin Urtasun, Aritz; Goicoa Mangado, Tomás; Ugarte Martínez, María Dolores; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística, Informática y Matemáticas
Background and objective: Spatial and spatio-temporal analyses of count data are crucial in epidemiology and other fields to unveil spatial and spatio-temporal patterns of incidence and/or mortality risks. However, fitting spatial and spatio-temporal models is not easy for non-expert users. The objective of this paper is to present an interactive and user-friendly web application (named SSTCDapp) for the analysis of spatial and spatio-temporal mortality or incidence data. Although SSTCDapp is simple to use, the underlying statistical theory is well founded and all key issues such as model identifiability, model selection, and several spatial priors and hyperpriors for sensitivity analyses are properly addressed. Methods: The web application is designed to fit an extensive range of fairly complex spatio-temporal models to smooth the very often extremely variable standardized incidence/mortality risks or crude rates. The application is built with the R package shiny and relies on the well founded integrated nested Laplace approximation technique for model fitting and inference. Results: The use of the web application is shown through the analysis of Spanish spatio-temporal breast cancer data. Different possibilities for the analysis regarding the type of model, model selection criteria, and a range of graphical as well as numerical outputs are provided.
Conclusions: Unlike other software used in disease mapping, SSTCDapp makes it easier for non-expert users to fit complex statistical models without installing any software on their own computers, since all the analyses and computations are performed on a powerful remote server. In addition, a desktop version is also available to run the application locally in those cases in which data confidentiality is a serious issue.

Publication Open Access
A scalable approach for short-term disease forecasting in high spatial resolution areal data (Wiley-VCH, 2023)
Orozco Acosta, Erick; Riebler, Andrea; Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
Short-term disease forecasting at specific discrete spatial resolutions has become a high-impact decision-support tool in health planning. However, when the number of areas is very large, obtaining predictions can be computationally intensive or even unfeasible using standard spatiotemporal models. The purpose of this paper is to provide a method for short-term predictions in high-dimensional areal data based on a newly proposed "divide-and-conquer" approach. We assess the predictive performance of this method and other classical spatiotemporal models in a validation study that uses cancer mortality data for the 7907 municipalities of continental Spain. The new proposal outperforms traditional models in terms of mean absolute error, root mean square error, and interval score when forecasting cancer mortality 1, 2, and 3 years ahead.
Models are implemented in a fully Bayesian framework using the well-known integrated nested Laplace estimation technique.

Publication Open Access
Scalable Bayesian modeling for smoothing disease mapping risks in large spatial data sets using INLA (Elsevier, 2021)
Orozco Acosta, Erick; Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2
Several methods have been proposed in the spatial statistics literature to analyse big data sets in continuous domains. However, new methods for analysing high-dimensional areal data are still scarce. Here, we propose a scalable Bayesian modelling approach for smoothing mortality (or incidence) risks in high-dimensional data, that is, when the number of small areas is very large. The method is implemented in the R add-on package bigDM and it is based on the idea of “divide and conquer”. Although this proposal could possibly be implemented using any Bayesian fitting technique, we use INLA here (integrated nested Laplace approximations) as it is now a well-known technique, computationally efficient, and easy for practitioners to handle. We analyse the proposal’s empirical performance in a comprehensive simulation study that considers two model-free settings.
Finally, the methodology is applied to analyse male colorectal cancer mortality in Spanish municipalities, showing its benefits with regard to the standard approach in terms of goodness of fit and computational time.

Publication Open Access
Space-time analysis of ovarian cancer mortality rates by age groups in Spanish provinces (1989-2015) (BioMed Central, 2020)
Trandafir, Paula Camelia; Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2
Background: Ovarian cancer is a silent and largely asymptomatic cancer, leading to late diagnosis and worse prognosis. Late-stage detection and low survival rates make the study of the space-time evolution of ovarian cancer particularly relevant. In addition, research on this cancer in small areas (like provinces or counties) is still scarce. Methods: The study presented here covers all ovarian cancer deaths for women over 50 years of age in the provinces of Spain during the period 1989-2015. Spatio-temporal models have been fitted to smooth ovarian cancer mortality rates in age groups [50,60), [60,70), [70,80), and [80,+), borrowing information from spatial and temporal neighbours. Model fitting and inference have been carried out using the Integrated Nested Laplace Approximation (INLA) technique. Results: Large differences in ovarian cancer mortality among the age groups have been found, with higher mortality rates in the older age groups. Striking differences are observed between northern and southern Spain. The global temporal trends (by age group) reveal that the evolution of ovarian cancer over the whole of Spain has remained nearly constant since the early 2000s. Conclusion: Differences in ovarian cancer mortality exist among the Spanish provinces, years, and age groups.
As the exact causes of ovarian cancer remain unknown, spatio-temporal analyses by age groups are essential to discover inequalities in ovarian cancer mortality. Women over 60 years of age should be the focus of follow-up studies, as their mortality rates have remained constant since 2002. High-mortality provinces should also be monitored to look for specific risk factors.

Publication Open Access
Two-level resolution of relative risk of dengue disease in a hyperendemic city of Colombia (Public Library of Science, 2018)
Adin Urtasun, Aritz; Martínez Bello, Daniel Adyro; López Quílez, Antonio; Ugarte Martínez, María Dolores; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística, Informática y Matemáticas
Risk maps of dengue disease offer public health officers a tool to model disease risk in space and time. We analyzed the geographical distribution of the relative incidence risk of dengue disease in a high-incidence city in Colombia, and its evolution in time during the period from January 2009 to December 2015, identifying regional effects at different levels of spatial aggregation. Cases of dengue disease were geocoded and spatially allocated to census sectors, and temporally aggregated by epidemiological periods. The census sectors are nested in administrative divisions defined as communes, configuring two levels of spatial aggregation for the dengue cases. Spatio-temporal models including census sector and commune-level spatially structured random effects were fitted to estimate dengue incidence relative risks using the integrated nested Laplace approximation (INLA) technique. The final selected model included two-level spatial random effects, a global structured temporal random effect, and a census sector-level interaction term. Risk maps by epidemiological period and risk profiles by census sector were generated from the modeling process, showing the transmission dynamics of the disease.
All the census sectors in the city displayed high risk in at least one epidemiological period during the outbreak periods. Relative risk estimation of dengue disease using INLA offered a quick and powerful method for parameter estimation and inference.