Adin Urtasun, Aritz
Last Name
Adin Urtasun
First Name
Aritz
Department
Estadística, Informática y Matemáticas
Institute
InaMat2. Instituto de Investigación en Materiales Avanzados y Matemáticas
- Publications
Now showing 1 - 10 of 19
Publication Open Access
Automatic cross-validation in structured models: is it time to leave out leave-one-out?
(Elsevier, 2024-07-01) Adin Urtasun, Aritz; Krainski, Elias Teixeira; Lenzi, Amanda; Liu, Zhedong; Martínez-Minaya, Joaquín; Rue, Håvard; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
Standard techniques such as leave-one-out cross-validation (LOOCV) might not be suitable for evaluating the predictive performance of models incorporating structured random effects. In such cases, the correlation between the training and test sets could have a notable impact on the model's prediction error. To overcome this issue, an automatic group construction procedure for leave-group-out cross-validation (LGOCV) has recently emerged as a valuable tool for enhancing predictive performance measurement in structured models. The purpose of this paper is (i) to compare LOOCV and LGOCV within structured models, emphasizing model selection and predictive performance, and (ii) to provide real data applications in spatial statistics using complex structured models fitted with INLA, showcasing the utility of the automatic LGOCV method. First, we briefly review the key aspects of the recently proposed LGOCV method for automatic group construction in latent Gaussian models. We also demonstrate the effectiveness of this method for selecting the model with the highest predictive performance by simulating extrapolation tasks in both temporal and spatial data analyses.
Finally, we provide insights into the effectiveness of the LGOCV method in modeling complex structured data, encompassing spatio-temporal multivariate count data, spatial compositional data, and spatio-temporal geospatial data.

Publication Open Access
Análisis espacio-temporal de los accidentes mortales con tractor en España durante el período 2010-2019 [Spatio-temporal analysis of fatal tractor accidents in Spain during the period 2010-2019]
(Interempresas Media, 2023) Arazuri Garín, Silvia; Ibarrola, Alicia; Mangado Ederra, Jesús; Adin Urtasun, Aritz; Arnal Atarés, Pedro; López Maestresalas, Ainara; Jarén Ceballos, Carmen; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Ingeniería; Ingeniaritza
The agricultural and construction sectors have the highest incidence rates of fatal occupational accidents in Spain, according to data collected by the National Institute for Safety and Health at Work (Instituto Nacional de Seguridad y Salud en el Trabajo, INSST) (2021), which reports to the Ministry of Labour and Social Economy (Cirauqui, 2022). Looking at how these rates have evolved, the agricultural sector is the only one whose rate has not improved since Law 31/1995 on the prevention of occupational risks came into force, and its accident rate continues to rise (Fundación Mapfre, 2020). But what happens when the accident victim does not fit the legal definition of a worker? Such accidents are not considered 'occupational accidents' and therefore escape all official INSST statistics and data. This is often the case for accidents suffered by retired people, children under 16, family members who help out, and others who are not linked to the work activity as defined in the legislation. According to Arana et al. (2010), of a total of 388 fatal accidents involving agricultural machinery in Spain during 2004-2008, only 61.85% were officially recorded.
Older people were the population group at greatest risk, followed by children and people from outside the agricultural sector. Most deaths were due to the overturning of tractors without protective structures.

Publication Open Access
In spatio-temporal disease mapping models, identifiability constraints affect PQL and INLA results
(Springer, 2018) Goicoa Mangado, Tomás; Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Hodges, James S.; Institute for Advanced Materials and Mathematics - INAMAT2
Disease mapping studies the distribution of relative risks or rates in space and time, and typically relies on generalized linear mixed models (GLMMs) including fixed effects and spatial, temporal, and spatio-temporal random effects. These GLMMs are typically not identifiable and constraints are required to achieve sensible results. However, automatic specification of constraints can sometimes lead to misleading results. In particular, the penalized quasi-likelihood fitting technique automatically centers the random effects even when this is not necessary. In the Bayesian approach, the recently introduced integrated nested Laplace approximations computing technique can also produce wrong results if constraints are not well specified. In this paper the spatial, temporal, and spatio-temporal interaction random effects are reparameterized using the spectral decompositions of their precision matrices to establish the appropriate identifiability constraints.
Breast cancer mortality data from Spain is used to illustrate the ideas.

Publication Open Access
Scalable Bayesian modeling for smoothing disease mapping risks in large spatial data sets using INLA
(Elsevier, 2021) Orozco Acosta, Erick; Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2
Several methods have been proposed in the spatial statistics literature to analyse big data sets in continuous domains. However, new methods for analysing high-dimensional areal data are still scarce. Here, we propose a scalable Bayesian modelling approach for smoothing mortality (or incidence) risks in high-dimensional data, that is, when the number of small areas is very large. The method is implemented in the R add-on package bigDM and is based on the idea of “divide and conquer”. Although this proposal could possibly be implemented using any Bayesian fitting technique, we use INLA (integrated nested Laplace approximations) here, as it is now a well-known technique, computationally efficient, and easy for practitioners to handle. We analyse the proposal’s empirical performance in a comprehensive simulation study that considers two model-free settings. Finally, the methodology is applied to analyse male colorectal cancer mortality in Spanish municipalities, showing its benefits over the standard approach in terms of goodness of fit and computational time.

Publication Open Access
Dealing with risk discontinuities to estimate cancer mortality risks when the number of small areas is large
(SAGE, 2021-02-17) Santafé Rodrigo, Guzmán; Adin Urtasun, Aritz; Lee, Duncan; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2
Many statistical models have been developed in recent years to smooth risks in disease mapping.
However, most of these modeling approaches do not take possible local discontinuities into consideration or, if they do, they are computationally prohibitive or simply do not work when the number of small areas is large. In this paper, we propose a two-step method to deal with discontinuities and to smooth noisy risks in small areas. In the first stage, a novel density-based clustering algorithm is used. In contrast to previous proposals, this algorithm is able to automatically detect the number of spatial clusters, thus providing a single cluster structure. In the second stage, a Bayesian hierarchical spatial model that takes the cluster configuration into account is fitted, which accounts for the discontinuities in disease risk. To evaluate the performance of this new procedure in comparison to previous proposals, a simulation study has been conducted. Results show competitive risk estimates at a much lower computational cost. The new methodology is used to analyze stomach cancer mortality data in Spanish municipalities.

Publication Open Access
Bayesian modeling approach in Big Data contexts: an application in spatial epidemiology
(IEEE, 2020) Orozco Acosta, Erick; Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística, Informática y Matemáticas
In this work we propose a novel scalable Bayesian modeling approach to smooth mortality risks, borrowing information from neighbouring regions in high-dimensional spatial disease mapping contexts. The method is based on the well-known divide and conquer approach, so that the spatial domain is divided into D subregions where local spatial models can be fitted simultaneously. Model fitting and inference have been carried out using the integrated nested Laplace approximation (INLA) technique. Male colorectal cancer mortality data in the municipalities of continental Spain have been analyzed using the new model proposals.
Results show that the new modeling approach is very competitive in terms of model fitting criteria when compared with a global spatial model, and it is computationally much more efficient.

Publication Open Access
Hierarchical and spline-based models in space-time disease mapping
(2017) Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Goicoa Mangado, Tomás; Estadística e Investigación Operativa; Estatistika eta Ikerketa Operatiboa
Disease mapping is a research area of great interest in epidemiology and public health. The high variability inherent in classical risk estimation measures, such as the standardized mortality ratio, makes it necessary to use statistical techniques that stabilize these ratios. In recent years many statistical models have been developed to study the geographical distribution of a disease and its evolution over time. However, the availability of high-quality data collected across many regions and over long periods of time, together with the emergence of new and increasingly sophisticated models, has revealed new difficulties that need to be investigated in depth. Chapter 1 describes some spatio-temporal models relevant to the remaining chapters of the thesis and details the constraints needed to solve the identifiability problems of these models. Chapter 1 also describes the Bayesian inference technique used throughout the thesis, based on Laplace approximations and numerical integration (known as INLA), and its implementation in R. Chapter 2 compares five spatio-temporal models used in disease mapping. To compare the different terms of these models, a decomposition of the logarithm of the estimated risks is computed, defining posterior spatial, temporal, and spatio-temporal patterns.
The results are illustrated with brain cancer mortality data in the Spanish provinces during the period 1986-2010. In addition, a simulation study is carried out to compare the performance of the models in terms of sensitivity (the ability to detect true high-risk regions) and specificity (the ability to discard false high-risk regions). It is concluded that when the number of expected cases is very small (common when analyzing rare diseases or very small domains such as municipalities), P-spline models perform better at detecting high-risk areas. Chapter 3 proposes a new family of spatio-temporal models that include random effects for two spatial levels, making it possible to model spatial and spatio-temporal effects at different levels of aggregation (for example, municipalities within provinces, or health zones affected by similar health policies). These models are used to analyze mortality data in the municipalities of the Basque Country and Navarre during the period 1986-2008. A simulation study shows that, when different levels of spatial aggregation exist, the new two-level models perform better than earlier models proposed in the literature. Chapter 4 presents new P-spline models that include spatial and temporal correlations from a fully Bayesian approach. Specifically, it describes models with one-dimensional temporal B-splines that may (or may not) be spatially correlated, as well as models with two-dimensional spatial B-splines that may (or may not) be temporally correlated. The results are illustrated with breast cancer mortality data in peninsular Spain during the period 1990-2010.
In general, using different temporal B-splines for each area gives better results in terms of fit. However, as the number of areas increases, these models may not be computationally feasible. In contrast, three-dimensional P-spline models (previously proposed in the literature and formulated in this thesis from a fully Bayesian point of view) are a promising alternative, providing accurate risk estimates in much shorter computing times.

Publication Open Access
Big problems in spatio-temporal disease mapping: methods and software
(Elsevier, 2023) Orozco Acosta, Erick; Adin Urtasun, Aritz; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA20001
Background and objective: Fitting spatio-temporal models for areal data is crucial in many fields such as cancer epidemiology. However, when data sets are very large, many issues arise. The main objective of this paper is to propose a general procedure to analyze high-dimensional spatio-temporal areal data, with special emphasis on mortality/incidence relative risk estimation. Methods: We present a pragmatic and simple idea that permits hierarchical spatio-temporal models to be fitted when the number of small areas is very large. Model fitting is carried out using integrated nested Laplace approximations over a partition of the spatial domain. We also use parallel and distributed strategies to speed up computations in a setting where Bayesian model fitting is generally prohibitively time-consuming or even unfeasible. Results: Using simulated and real data, we show that our method outperforms classical global models.
We implement the methods and algorithms that we develop in the open-source R package bigDM, where specific vignettes have been included to facilitate the use of the methodology for non-expert users. Conclusions: Our scalable methodology proposal provides reliable risk estimates when fitting Bayesian hierarchical spatio-temporal models for high-dimensional data.

Publication Open Access
Alleviating confounding in spatio-temporal areal models with an application on crimes against women in India
(SAGE Publications, 2021) Adin Urtasun, Aritz; Goicoa Mangado, Tomás; Hodges, James S.; Schnell, Patrick M.; Ugarte Martínez, María Dolores; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística, Informática y Matemáticas
Assessing associations between a response of interest and a set of covariates in spatial areal models is the leitmotiv of ecological regression. However, the presence of spatially correlated random effects can mask or even bias estimates of such associations due to confounding effects if they are not carefully handled. Though potentially harmful, confounding issues have often been ignored in practice, leading to wrong conclusions about the underlying associations between the response and the covariates. In spatio-temporal areal models, the temporal dimension may emerge as a new source of confounding, and the problem may be even worse. In this work, we propose two approaches to deal with confounding of fixed effects by spatial and temporal random effects, while obtaining good model predictions. In particular, restricted regression and an apparently (though in fact not) equivalent procedure using constraints are proposed within both fully Bayes and empirical Bayes approaches. The methods are compared in terms of fixed-effect estimates and model selection criteria.
The techniques are used to assess the association between dowry deaths and certain socio-demographic covariates in the districts of Uttar Pradesh, India.

Publication Open Access
Identifying extreme COVID-19 mortality risks in English small areas: a disease cluster approach
(Springer, 2022) Adin Urtasun, Aritz; Congdon, P.; Santafé Rodrigo, Guzmán; Ugarte Martínez, María Dolores; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística, Informática y Matemáticas
The COVID-19 pandemic is having a huge impact worldwide and has highlighted the extent of health inequalities not only between countries but also between small areas within a country. Identifying areas with high mortality is important both for public health mitigation in COVID-19 outbreaks and for longer-term efforts to tackle social inequalities in health. In this paper we consider different statistical models and an extension of a recent method to analyze COVID-19 related mortality in English small areas during the first wave of the epidemic in the first half of 2020. We seek to identify hotspots, and where they are most geographically concentrated, taking account of observed area factors as well as spatial correlation and clustering in regression residuals, while also allowing for spatial discontinuities. Results show an excess of COVID-19 mortality cases in small areas surrounding London and in other small areas in the North-East and North-West of England. Models alleviating spatial confounding show ethnic isolation, air quality and area morbidity covariates having a significant and broadly similar impact on COVID-19 mortality, whereas nursing home location seems to be slightly less important.