Open Access
Guidelines to compare semantic segmentation maps at different resolutions
(IEEE, 2024) Ayala Lauroba, Christian; Aranda, Carlos; Galar Idoate, Mikel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
Choosing the proper ground sampling distance (GSD) is a vital decision in remote sensing, which can determine the success or failure of a project. Higher resolutions may be more suitable for accurately detecting objects, but they also come with higher costs and require more computing power. Semantic segmentation is a common task in remote sensing where GSD plays a crucial role. In semantic segmentation, each pixel of an image is classified into a predefined set of classes, resulting in a semantic segmentation map. However, comparing the results of semantic segmentation at different GSDs is not straightforward. Unlike scene classification and object detection tasks, which are evaluated at scene and object level, respectively, semantic segmentation is typically evaluated at pixel level. This makes it difficult to match elements across different GSDs, resulting in a range of methods for computing metrics, some of which may not be adequate. For this reason, the purpose of this work is to set out a clear set of guidelines for fairly comparing semantic segmentation results obtained at various spatial resolutions. Additionally, we propose to complement the commonly used scene-based pixel-wise metrics with region-based pixel-wise metrics, allowing for a more detailed analysis of the model performance. The set of guidelines together with the proposed region-based metrics are illustrated with building and swimming pool detection problems. The experimental study demonstrates that by following the proposed guidelines and the proposed region-based pixel-wise metrics, it is possible to fairly compare segmentation maps at different spatial resolutions and gain a better understanding of the model's performance. To promote the usage of these guidelines and ease the computation of the new region-based metrics, we create the seg-eval Python library and make it publicly available at https://github.com/itracasa/ seg-eval.
Open Access
Diffusion models for remote sensing imagery semantic segmentation
(IEEE, 2023-10-20) Ayala Lauroba, Christian; Sesma Redín, Rubén; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA25-2022
Denoising Diffusion Probabilistic Models have exhibited impressive performance for generative modelling of images. This paper aims to explore the potential of diffusion models for semantic segmentation tasks in the context of remote sensing. The major challenge of employing these models for semantic segmentation tasks is the generative nature of the model, which produces an arbitrary segmentation mask from a random noise input. Therefore, the diffusion process needs to be constrained to produce a segmentation mask that matches the target image. To address this issue, the denoising process is conditioned by utilizing the input image as a reference. In the experimental study, the proposed model is compared against other state-of-the-art semantic segmentation architectures using the Massachusetts Buildings Aerial dataset. The results of this study provide valuable insights into the potential of diffusion models for semantic segmentation tasks in the field of remote sensing.
Open Access
Towards fine-grained road maps extraction using sentinel-2 imagery
(Copernicus, 2021) Ayala Lauroba, Christian; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua
Nowadays, it is highly important to keep road maps up-to-date since a great deal of services rely on them. However, to date, these labours have demanded a great deal of human attention due to their complexity. In the last decade, promising attempts have been carried out to fully-automatize the extraction of road networks from remote sensing imagery. Nevertheless, the vast majority of methods rely on aerial imagery (< 1 m), whose costs are not yet affordable for maintaining up-to-date maps. This work proves that it is also possible to accurately detect roads using high resolution satellite imagery (10 m). Accordingly, we have relied on Sentinel-2 imagery considering its freely availability and the higher revisit times compared to aerial imagery. It must be taken into account that the lack of spatial resolution of this sensor drastically increases the difficulty of the road detection task, since the feasibility to detect a road depends on its width, which can reach sub-pixel size in Sentinel-2 imagery. For that purpose, a new deep learning architecture which combines semantic segmentation and super-resolution techniques is proposed. As a result, fine-grained road maps at 2.5 m are generated from Sentinel-2 imagery.
Open Access
A deep learning approach to an enhanced building footprint and road detection in high-resolution satellite imagery
(MDPI, 2021) Ayala Lauroba, Christian; Sesma Redín, Rubén; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua
The detection of building footprints and road networks has many useful applications including the monitoring of urban development, real-time navigation, etc. Taking into account that a great deal of human attention is required by these remote sensing tasks, a lot of effort has been made to automate them. However, the vast majority of the approaches rely on very high-resolution satellite imagery (<2.5 m) whose costs are not yet affordable for maintaining up-to-date maps. Working with the limited spatial resolution provided by high-resolution satellite imagery such as Sentinel-1 and Sentinel-2 (10 m) makes it hard to detect buildings and roads, since these labels may coexist within the same pixel. This paper focuses on this problem and presents a novel methodology capable of detecting building and roads with sub-pixel width by increasing the resolution of the output masks. This methodology consists of fusing Sentinel-1 and Sentinel-2 data (at 10 m) together with OpenStreetMap to train deep learning models for building and road detection at 2.5 m. This becomes possible thanks to the usage of OpenStreetMap vector data, which can be rasterized to any desired resolution. Accordingly, a few simple yet effective modifications of the U-Net architecture are proposed to not only semantically segment the input image, but also to learn how to enhance the resolution of the output masks. As a result, generated mappings quadruplicate the input spatial resolution, closing the gap between satellite and aerial imagery for building and road detection. To properly evaluate the generalization capabilities of the proposed methodology, a data-set composed of 44 cities across the Spanish territory have been considered and divided into training and testing cities. Both quantitative and qualitative results show that high-resolution satellite imagery can be used for sub-pixel width building and road detection following the proper methodology.
Open Access
Multi-class strategies for joint building footprint and road detection in remote sensing
(MDPI, 2021) Ayala Lauroba, Christian; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua, 0011-1408-2020-000008
Building footprints and road networks are important inputs for a great deal of services. For instance, building maps are useful for urban planning, whereas road maps are essential for disaster response services. Traditionally, building and road maps are manually generated by remote sensing experts or land surveying, occasionally assisted by semi-automatic tools. In the last decade, deep learning-based approaches have demonstrated their capabilities to extract these elements automatically and accurately from remote sensing imagery. The building footprint and road network detection problem can be considered a multi-class semantic segmentation task, that is, a single model performs a pixel-wise classification on multiple classes, optimizing the overall performance. However, depending on the spatial resolution of the imagery used, both classes may coexist within the same pixel, drastically reducing their separability. In this regard, binary decomposition techniques, which have been widely studied in the machine learning literature, are proved useful for addressing multiclass problems. Accordingly, the multi-class problem can be split into multiple binary semantic segmentation sub-problems, specializing different models for each class. Nevertheless, in these cases, an aggregation step is required to obtain the final output labels. Additionally, other novel approaches, such as multi-task learning, may come in handy to further increase the performance of the binary semantic segmentation models. Since there is no certainty as to which strategy should be carried out to accurately tackle a multi-class remote sensing semantic segmentation problem, this paper performs an in-depth study to shed light on the issue. For this purpose, open-access Sentinel-1 and Sentinel-2 imagery (at 10 m) are considered for extracting buildings and roads, making use of the well-known U-Net convolutional neural network. It is worth stressing that building and road classes may coexist within the same pixel when working at such a low spatial resolution, setting a challenging problem scheme. Accordingly, a robust experimental study is developed to assess the benefits of the decomposition strategies and their combination with a multi-task learning scheme. The obtained results demonstrate that decomposing the considered multi-class remote sensing semantic segmentation problem into multiple binary ones using a One-vs-All binary decomposition technique leads to better results than the standard direct multi-class approach. Additionally, the benefits of using a multi-task learning scheme for pushing the performance of binary segmentation models are also shown.