Open Access
Logistic regression versus XGBoost for detecting burned areas using satellite images
(Springer, 2024) Militino, Ana F.; Goyena Baroja, Harkaitz; Pérez Goya, Unai; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
Classical statistical methods prove advantageous for small datasets, whereas machine learning algorithms can excel with larger datasets. Our paper challenges this conventional wisdom by addressing a highly significant problem: the identification of burned areas through satellite imagery, which is a clear example of imbalanced data. The methods are illustrated in North-Central Portugal and North-West Spain in October 2017 within a multi-temporal setting of satellite imagery. Daily satellite images are taken from Moderate Resolution Imaging Spectroradiometer (MODIS) products. Our analysis shows that a classical logistic regression (LR) model performs on par with, if not better than, a widely employed machine learning algorithm, extreme gradient boosting (XGBoost), within this particular domain.
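A minimal sketch of why imbalanced data needs care when comparing classifiers such as LR and XGBoost. The toy labels and the choice of metrics are assumptions made for illustration and are not taken from the paper: with few burned pixels, a classifier that never predicts "burned" still reaches high accuracy, so precision, recall and F1 are more informative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels: 1 = burned, 0 = unburned."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(y_true, y_pred):
    """Accuracy, precision, recall and F1; ratios with a zero denominator are set to 0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy scene: 95 unburned pixels and 5 burned pixels.
y_true = [0] * 95 + [1] * 5

# A trivial classifier that labels every pixel as unburned.
always_unburned = [0] * 100
baseline = scores(y_true, always_unburned)

# A classifier that finds 4 of the 5 burned pixels with 2 false alarms.
detector = [1] * 2 + [0] * 93 + [1] * 4 + [0]
detected = scores(y_true, detector)
```

Here `baseline` has high accuracy but zero recall, whereas `detected` trades a small accuracy change for a much higher recall on the rare class.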
Open Access
Machine learning procedures for daily interpolation of rainfall in Navarre (Spain)
(Springer, 2023) Militino, Ana F.; Ugarte Martínez, María Dolores; Pérez Goya, Unai; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2
Kriging is by far the best-known and most widely used statistical method for interpolating data in spatial random fields. The main reason is that it provides the best linear unbiased predictor and it is an exact interpolator when normality is assumed. The robustness of this method allows small departures from normality. However, many meteorological, pollutant and environmental variables have extremely asymmetrical distributions, and Kriging cannot be used. Machine learning techniques such as neural networks, random forest, and k-nearest neighbor can be used instead, because they do not require specific distributional assumptions. The drawback is that they do not take account of the spatial dependence, and for optimal performance in spatial random fields more complex machine learning techniques could be considered. These techniques also require a relatively large amount of training data and are computationally challenging to implement. For a reduced number of observations, we illustrate the performance of the aforementioned procedures using daily rainfall data from manual meteorological gauge stations in Navarre, where the only auxiliary variables available are the spatial coordinates and the altitude. The quality of the predictions is carefully checked through three versions of the relative root mean squared error (RRMSE). The conclusion is that when we cannot use Kriging, random forest and neural networks outperform the k-nearest neighbor technique and provide reliable predictions of daily rainfall data with scarce auxiliary information.
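The abstract does not say which three versions of the RRMSE are used. As a rough sketch, one common definition divides the RMSE by the mean of the observed values; this normalisation is an assumption here and may differ from the paper's versions. The function names and the toy rainfall values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rrmse(observed, predicted):
    """RMSE divided by the mean observed value (assumed normalisation)."""
    error = rmse(observed, predicted)
    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("mean of observed values is zero")
    return error / mean_observed


# Hypothetical daily rainfall (mm) at three gauges and their predictions.
observed = [2.0, 4.0, 6.0]
predicted = [2.0, 4.0, 9.0]
relative_error = rrmse(observed, predicted)
```

Dividing by the mean makes errors comparable across days with very different rainfall totals, which is one reason a relative measure may be preferred for this kind of data.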
Open Access
Unpaired spatio-temporal fusion of image patches (USTFIP) from cloud covered images
(Elsevier, 2023) Goyena Baroja, Harkaitz; Pérez Goya, Unai; Montesino San Martín, Manuel; Militino, Ana F.; Wang, Qunming; Atkinson, Peter M.; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2
Spatio-temporal image fusion aims to increase the frequency and resolution of multispectral satellite sensor images in a cost-effective manner. However, practical constraints on input data requirements and computational cost prevent a wider adoption of these methods in real case studies. We propose an ensemble of strategies to eliminate the need for cloud-free matching pairs of satellite sensor images. The new methodology, called Unpaired Spatio-Temporal Fusion of Image Patches (USTFIP), is tested in situations where classical requirements are progressively more difficult to meet. Overall, the study shows that USTFIP reduces the root mean square error by 2 to 13% relative to the state-of-the-art Fit-FC fusion method, due to an efficient use of the available information. Implementation of USTFIP through parallel computing saves up to 40% of the computational time required for Fit-FC.
Open Access
Stochastic spatio-temporal models for analysing NDVI distribution of GIMMS NDVI3g images
(MDPI, 2017) Militino, Ana F.; Ugarte Martínez, María Dolores; Pérez Goya, Unai; Estatistika eta Ikerketa Operatiboa; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística e Investigación Operativa; Gobierno de Navarra / Nafarroako Gobernua: Project PI015, 2016
The normalized difference vegetation index (NDVI) is an important indicator for evaluating vegetation change, monitoring land surface fluxes or predicting crop models. Due to the great availability of images provided by different satellites in recent years, much attention has been devoted to testing trend changes with time series of individual NDVI pixels. However, the spatial dependence inherent in these data is usually lost unless global scales are analyzed. In this paper, we propose incorporating both the spatial and the temporal dependence among pixels using a stochastic spatio-temporal model for estimating the NDVI distribution thoroughly. The stochastic model is a state-space model that uses meteorological data of the Climatic Research Unit (CRU TS3.10) as auxiliary information. The model is estimated with the Expectation-Maximization (EM) algorithm. The result is a set of smoothed images providing an overall analysis of the NDVI distribution across space and time, where fluctuations generated by atmospheric disturbances, fire events, land-use/cover changes or engineering problems from image capture are treated as random fluctuations. The illustration is carried out with the third generation of NDVI images, termed NDVI3g, of the Global Inventory Modeling and Mapping Studies (GIMMS) in continental Spain. These data are taken in bimonthly periods from January 2011 to December 2013, but the model can be applied to many other variables, countries or regions with different resolutions.
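For readers unfamiliar with the index, NDVI is computed per pixel from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The short sketch below shows only this standard definition, not the state-space model of the paper; the function name and the handling of a zero denominator are assumptions.

```python
def ndvi(nir, red):
    """NDVI from near-infrared and red reflectance.

    Returns None when both bands are zero, where the index is undefined
    (an assumed convention for this sketch).
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical reflectance values for a vegetated pixel and a bare pixel.
vegetated = ndvi(0.5, 0.1)
bare = ndvi(0.3, 0.3)
```

For non-negative reflectances the index lies between -1 and 1, with higher values indicating denser green vegetation.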
Open Access
Estimación del desempleo por comarcas en Navarra [Estimation of unemployment by district in Navarre]
(Gobierno de Navarra, Departamento de Economía y Hacienda, 2005) Ugarte Martínez, María Dolores; Militino, Ana F.; González Ramajo, Begoña; Goicoa Mangado, Tomás; Sagaseta López, M.; Estadística e Investigación Operativa; Estatistika eta Ikerketa Operatiboa
Knowledge of unemployment in a region is a powerful indicator of the pace of growth of an economy, since it indirectly measures the economy's capacity to generate employment. The Instituto de Estadística de Navarra (Statistical Institute of Navarre) is committed to providing, in the near future, unemployment estimates at an increasingly disaggregated level. The heterogeneity of the Navarrese districts (comarcas) and the interest shown by local administrations and trade unions make it necessary to know the unemployment situation at the district level, thus avoiding reliance solely on the overall result for the whole of Navarre as provided by the Encuesta de Población Activa (EPA), the Spanish labour force survey. The task is complex, but it is also embedded in one of the priority objectives of the European project EURAREA, in which the Instituto Nacional de Estadística (INE) has taken part, and hence so has the Instituto de Estadística de Navarra. That is, there is a real interest in Europe in providing estimates at the district level. In Navarre this task has already begun, and at this conference we present some of the results obtained. In particular, we illustrate the preliminary estimates derived from applying several design-based estimators to obtain the proportion of unemployed people by sex in the seven districts of Navarre. The behaviour of the estimators is also compared in terms of relative bias and relative mean squared error. The estimators presented also make it possible to estimate the number of employed and inactive people, as well as the corresponding rates.
Open Access
Locally adaptive change-point detection (LACPD) with applications to environmental changes
(Springer, 2021) Moradi, Mohammad Mehdi; Montesino San Martín, Manuel; Ugarte Martínez, María Dolores; Militino, Ana F.; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística, Informática y Matemáticas
We propose an adaptive-sliding-window approach (LACPD) for the problem of change-point detection in a set of time-ordered observations. The proposed method is combined with sub-sampling techniques to compensate for the lack of enough data near the time series’ tails. Through a simulation study, we analyse its behaviour in the presence of an early/middle/late change-point in the mean, and compare its performance with some of the frequently used and recently developed change-point detection methods in terms of power, type I error probability, area under the ROC curves (AUC), absolute bias, variance, and root-mean-square error (RMSE). We conclude that LACPD outperforms other methods by maintaining a low type I error probability. Unlike some other methods, the performance of LACPD does not depend on the time index of change-points, and it generally has lower bias than alternative methods. Moreover, in terms of variance and RMSE, it outperforms other methods when change-points are close to the time series’ tails, whereas its performance is similar to (sometimes slightly poorer than) that of other methods when change-points are close to the middle of the time series. Finally, we apply our proposal to two sets of real data: the well-known example of the annual flow of the Nile river at Aswan, Egypt, from 1871 to 1970, and a novel remote sensing data application consisting of a 34-year time series of satellite images of the Normalised Difference Vegetation Index in the Wadi As-Sirhan valley, Saudi Arabia, from 1986 to 2019. We conclude that LACPD performs well in detecting the presence of a change as well as the time and magnitude of change in real conditions.
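To make the sliding-window idea concrete, the sketch below scans a series with a fixed window and reports the index where the mean after the index differs most from the mean before it. This is a deliberately simplified stand-in, not LACPD itself: it has no adaptive window, no sub-sampling near the tails and no significance test, and the function name and toy series are hypothetical.

```python
def window_mean_shift(series, window):
    """Return (index, shift) maximising |mean(after) - mean(before)|.

    'before' is series[index - window:index] and 'after' is
    series[index:index + window], so index is the first observation
    of the new regime.
    """
    if window < 1 or len(series) < 2 * window:
        raise ValueError("series must hold at least two full windows")
    best_index, best_shift = None, 0.0
    for t in range(window, len(series) - window + 1):
        before = series[t - window:t]
        after = series[t:t + window]
        shift = sum(after) / window - sum(before) / window
        if best_index is None or abs(shift) > abs(best_shift):
            best_index, best_shift = t, shift
    return best_index, best_shift


# Hypothetical series with a jump in the mean at position 10.
series = [0.0] * 10 + [5.0] * 10
change_index, change_size = window_mean_shift(series, window=4)
```

A fixed window cannot evaluate candidates within `window` observations of either end, which illustrates the tail problem that the abstract says LACPD addresses with sub-sampling.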
Open Access
Checking unimodality using isotonic regression: an application to breast cancer mortality rates
(Springer, 2016) Rueda, Cristina; Ugarte Martínez, María Dolores; Militino, Ana F.; Estatistika eta Ikerketa Operatiboa; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística e Investigación Operativa
In some diseases it is well known that a unimodal mortality pattern exists. A clear example in developed countries is breast cancer, where mortality increased sharply until the nineties and then decreased. This clear unimodal pattern is not necessarily applicable to all regions within a country. In this paper, we develop statistical tools to check whether the unimodality pattern persists within regions using order-restricted inference. Break points as well as confidence intervals are also provided. In addition, a new test for checking monotonicity against unimodality is derived, making it possible to discriminate between a simple increasing pattern and an up-then-down response pattern. A comparison with the widely used joinpoint regression technique under unimodality is provided. We show that the joinpoint technique could fail when the underlying function is not piecewise linear. Results are illustrated using age-specific breast cancer mortality data from Spain in the period 1975-2005.
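As an illustration of order-restricted fitting, the sketch below implements isotonic regression with the pool-adjacent-violators algorithm and builds a least-squares unimodal fit by trying every split point (non-decreasing fit on the left, non-increasing fit on the right). This is a generic textbook construction with equal weights, not necessarily the procedure or the test derived in the paper; all names and the toy rates are hypothetical.

```python
def isotonic_increasing(y):
    """Least-squares non-decreasing fit (pool adjacent violators, equal weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


def isotonic_decreasing(y):
    """Least-squares non-increasing fit, obtained by negating the data."""
    return [-v for v in isotonic_increasing([-v for v in y])]


def unimodal_fit(y):
    """Return (sse, split, fit) for the best up-then-down least-squares fit."""
    best = None
    for k in range(len(y) + 1):
        fit = isotonic_increasing(y[:k]) + isotonic_decreasing(y[k:])
        sse = sum((a - b) ** 2 for a, b in zip(y, fit))
        if best is None or sse < best[0]:
            best = (sse, k, fit)
    return best


# Hypothetical mortality rates that rise and then fall, with some noise.
rates = [1, 3, 2, 4, 6, 5, 3, 4, 1]
sse, split, fitted = unimodal_fit(rates)
```

Comparing the error of the unimodal fit with that of `isotonic_increasing(rates)` gives an informal sense of the monotone-versus-unimodal question; a formal test would need a reference distribution, which this sketch does not provide.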
Open Access
Improving the quality of satellite imagery based on ground-truth data from rain gauge stations
(MDPI, 2018) Militino, Ana F.; Ugarte Martínez, María Dolores; Pérez Goya, Unai; Estatistika eta Ikerketa Operatiboa; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística e Investigación Operativa; Gobierno de Navarra / Nafarroako Gobernua
Multitemporal imagery is by and large geometrically and radiometrically accurate, but the residual noise arising from cloud removal and other atmospheric and electronic effects can produce outliers that must be mitigated to properly exploit the remote sensing information. In this study, we show how ground-truth data from rain gauge stations can improve the quality of satellite imagery. To this end, a simulation study is conducted wherein outlier outbreaks of different sizes are randomly introduced in the normalized difference vegetation index (NDVI) and the day and night land surface temperature (LST) of composite images from Navarre (Spain) between 2011 and 2015. To remove outliers, a new method called thin-plate splines with covariates (TpsWc) is proposed. This method consists of smoothing the median anomalies with a thin-plate spline model, whereby transformed ground-truth data are the external covariates of the model. The performance of the proposed method is measured with the root mean square error (RMSE), calculated as the square root of the mean of the pixel-by-pixel squared differences between the original data and the data predicted with the TpsWc model and with a state-space model with and without covariates. The study shows that the use of ground-truth data reduces the RMSE in both the TpsWc model and the state-space model used for comparison purposes. The new method successfully removes the abnormal data while preserving the phenology of the raw data. The RMSE reduction percentage varies according to the derived variables (NDVI or LST), but reductions of up to 20% are achieved with the new proposal.
Open Access
Flexible Bayesian P-splines for smoothing age-specific spatio-temporal mortality patterns
(SAGE, 2019) Goicoa Mangado, Tomás; Adin Urtasun, Aritz; Etxeberria Andueza, Jaione; Militino, Ana F.; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2
In this paper age-space-time models based on one and two-dimensional P-splines with B-spline bases are proposed for smoothing mortality rates, where both fixed relative scale and scale invariant two-dimensional penalties are examined. Model fitting and inference are carried out using integrated nested Laplace approximations (INLA), a recent Bayesian technique that speeds up computations compared to McMC methods. The models will be illustrated with Spanish breast cancer mortality data during the period 1985-2010, where a general decline in breast cancer mortality has been observed in Spanish provinces in the last decades. The results reveal that mortality rates for the oldest age groups do not decrease in all provinces.
Open Access
Using RGISTools to estimate water levels in reservoirs and lakes
(MDPI, 2020) Militino, Ana F.; Montesino San Martín, Manuel; Pérez Goya, Unai; Ugarte Martínez, María Dolores; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Estadística, Informática y Matemáticas
The combination of freely accessible satellite imagery from multiple programs improves the spatio-temporal coverage of remote sensing data, but it faces barriers regarding the variety of web services, file formats, and data standards. R is an open-source software environment with state-of-the-art statistical packages for the analysis of optical imagery. However, it lacks tools for providing unified access to multi-program archives to customize and process time series of images. This manuscript introduces RGISTools, a new software package that solves these issues, and provides a working example on water mapping, which is a socially and environmentally relevant research field. The case study uses a digital elevation model and a rarely assessed combination of Landsat-8 and Sentinel-2 imagery to determine the water level of a reservoir in Northern Spain. The case study demonstrates how to acquire and process time series of surface reflectance data in an efficient manner. Our method achieves reasonably accurate results, with a root mean squared error of 0.90 m. Future improvements of the package involve the expansion of the workflow to cover the processing of radar images. This should counteract the limitation that cloud coverage imposes on multi-spectral images.