Open Access
Generative adversarial networks for bitcoin data augmentation
(IEEE, 2020) Zola, Francesco; Bruse, Jan Lukas; Etxeberria Barrio, Xabier; Galar Idoate, Mikel; Orduna Urrutia, Raúl; Institute of Smart Cities - ISC
In Bitcoin entity classification, results are strongly conditioned by the ground-truth dataset, especially when applying supervised machine learning approaches. However, these ground-truth datasets are frequently affected by significant class imbalance as generally they contain much more information regarding legal services (Exchange, Gambling), than regarding services that may be related to illicit activities (Mixer, Service). Class imbalance increases the complexity of applying machine learning techniques and reduces the quality of classification results, especially for underrepresented, but critical classes.In this paper, we propose to address this problem by using Generative Adversarial Networks (GANs) for Bitcoin data augmentation as GANs recently have shown promising results in the domain of image classification. However, there is no 'one-fits-all' GAN solution that works for every scenario. In fact, setting GAN training parameters is non-trivial and heavily affects the quality of the generated synthetic data. We therefore evaluate how GAN parameters such as the optimization function, the size of the dataset and the chosen batch size affect GAN implementation for one underrepresented entity class (Mining Pool) and demonstrate how a 'good' GAN configuration can be obtained that achieves high similarity between synthetically generated and real Bitcoin address data. To the best of our knowledge, this is the first study presenting GANs as a valid tool for generating synthetic address data for data augmentation in Bitcoin entity classification.
Open Access
Extensions of fuzzy sets in image processing: an overview
(EUSFLAT, 2011) Pagola Barrio, Miguel; Barrenechea Tartas, Edurne; Bustince Sola, Humberto; Fernández Fernández, Francisco Javier; Galar Idoate, Mikel; Jurío Munárriz, Aránzazu; López Molina, Carlos; Paternain Dallo, Daniel; Sanz Delgado, José Antonio; Couto, Pedro; Melo-Pinto, Pedro; Automática y Computación; Automatika eta Konputazioa
This work presents a valuable review for the interested reader of the recent Works using extensions of fuzzy sets in image processing. The chapter is divided as follows: first we recall the basics of the extensions of fuzzy sets, i.e. Type 2 fuzzy sets, interval-valued fuzzy sets and Atanassov’s intuitionistic fuzzy sets. In sequent sections we review the methods proposed for noise removal (sections 3), image enhancement (section 4), edge detection (section 5) and segmentation (section 6). There exist other image segmentation tasks such as video de-interlacing, stereo matching or object representation that are not described in this work.
Open Access
A supervised fuzzy measure learning algorithm for combining classifiers
(Elsevier, 2023) Uriz Martín, Mikel Xabier; Paternain Dallo, Daniel; Bustince Sola, Humberto; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
Fuzzy measure-based aggregations allow taking interactions among coalitions of the input sources into account. Their main drawback when applying them in real-world problems, such as combining classifier ensembles, is how to define the fuzzy measure that governs the aggregation and specifies the interactions. However, their usage for combining classifiers has shown its advantage. The learning of the fuzzy measure can be done either in a supervised or unsupervised manner. This paper focuses on supervised approaches. Existing supervised approaches are designed to minimize the mean squared error cost function, even for classification problems. We propose a new fuzzy measure learning algorithm for combining classifiers that can optimize any cost function. To do so, advancements from deep learning frameworks are considered such as automatic gradient computation. Therefore, a gradient-based method is presented together with three new update policies that are required to preserve the monotonicity constraints of the fuzzy measures. The usefulness of the proposal and the optimization of cross-entropy cost are shown in an extensive experimental study with 58 datasets corresponding to both binary and multi-class classification problems. In this framework, the proposed method is compared with other state-of-the-art methods for fuzzy measure learning.
Open Access
Metrics for dataset demographic bias: a case study on facial expression recognition
(IEEE, 2024) Domínguez Catena, Iris; Paternain Dallo, Daniel; Galar Idoate, Mikel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Universidad Pública de Navarra - Nafarroako Unibertsitate Publikoa
Demographic biases in source datasets have been shown as one of the causes of unfairness and discrimination in the predictions of Machine Learning models. One of the most prominent types of demographic bias are statistical imbalances in the representation of demographic groups in the datasets. In this paper, we study the measurement of these biases by reviewing the existing metrics, including those that can be borrowed from other disciplines. We develop a taxonomy for the classification of these metrics, providing a practical guide for the selection of appropriate metrics. To illustrate the utility of our framework, and to further understand the practical characteristics of the metrics, we conduct a case study of 20 datasets used in Facial Emotion Recognition (FER), analyzing the biases present in them. Our experimental results show that many metrics are redundant and that a reduced subset of metrics may be sufficient to measure the amount of demographic bias. The paper provides valuable insights for researchers in AI and related fields to mitigate dataset bias and improve the fairness and accuracy of AI models.
Open Access
Network traffic analysis through node behaviour classification: a graph-based approach with temporal dissection and data-level preprocessing
(Elsevier, 2022) Zola, Francesco; Segurola-Gil, Lander; Bruse, Jan Lukas; Galar Idoate, Mikel; Orduna Urrutia, Raúl; Institute of Smart Cities - ISC
Network traffic analysis is an important cybersecurity task, which helps to classify anomalous, potentially dangerous connections. In many cases, it is critical not only to detect individual malicious connections, but to detect which node in a network has generated malicious traffic so that appropriate actions can be taken to reduce the threat and increase the system's cybersecurity. Instead of analysing connections only, node behavioural analysis can be performed by exploiting the graph information encoded in a connection network. Network traffic, however, is temporal data and extracting graph information without a fixed time scope may only unveil macro-dynamics that are less related to cybersecurity threats. To address these issues, a threefold approach is proposed here: firstly, temporal dissection for extracting graph-based information is applied. As the resulting graphs are typically affected by class imbalance (i.e. malicious nodes are under-represented), two novel graph data-level preprocessing techniques - R-hybrid and SM-hybrid - are introduced, which focus on exploiting the most relevant graph substructures. Finally, a Neural Network (NN) and two Graph Convolutional Network (GCN) approaches are compared when performing node behaviour classification. Furthermore, we compare the node classification performance of these supervised models with traditional unsupervised anomaly detection techniques. Results show that temporal dissection parameters affected classification performance, while the data-level preprocessing strategies reduced class imbalance and led to improved supervised node behaviour classification, outperforming anomaly detection models. In particular, Neural Network (NN) outperformed Graph Convolutional Network (GCN) approaches for two attack families and was less affected by class imbalance, yet one GCN performed best overall. The presented study successfully applies a temporal graph-based approach for malicious actor detection in network traffic data.
Open Access
Additional feature layers from ordered aggregations for deep neural networks
(IEEE, 2020) Domínguez Catena, Iris; Paternain Dallo, Daniel; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
In the last years we have seen huge advancements in the area of Machine Learning, specially with the use of Deep Neural Networks. One of the most relevant examples is in image classification, where convolutional neural networks have shown to be a vital tool, hard to replace with any other techniques. Although aggregation functions, such as OWA operators, have been previously used on top of neural networks, usually to aggregate the outputs of different networks or systems (ensembles), in this paper we propose and explore a new way of using OWA aggregations in deep learning. We implement OWA aggregations as a new layer inside a convolutional neural network. These layers are used to learn additional order-based information from the feature maps of a certain layer, and then the newly generated information is used as a complement input for the following layers. We carry out several tests introducing the new layer in a VGG13-based reference network and show that this layer introduces new knowledge into the network without substantially increasing training times.
Open Access
Pushing the limits of Sentinel-2 for building footprint extraction
(IEEE, 2022) Ayala Lauroba, Christian; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC
Building footprint maps are of high importance nowadays since a wide range of services relies on them to work. However, activities to keep these maps up-to-date are costly and time-consuming due to the great deal of human intervention required. Several automation attempts have been carried out in the last decade aiming at fully automatizing them. However, taking into account the complexity of the task and the current limitations of semantic segmentation deep learning models, the vast majority of approaches rely on aerial imagery (<1 m). As a result, prohibitive costs and high revisit times prevent the remote sensing community from maintaining up-to-date building maps. This work proposes a novel deep learning architecture to accurately extract building footprints from high resolution satellite imagery (10 m). Accordingly, super-resolution and semantic segmentation techniques have been fused to make it possible not only to improve the building's boundary definition but also to detect buildings with sub-pixel width. As a result, fine-grained building maps at 2.5 m are generated using Sentinel-2 imagery, closing the gap between satellite and aerial semantic segmentation.
Open Access
A survey on fingerprint minutiae-based local matching for verification and identification: taxonomy and experimental evaluation
(Elsevier, 2015) Peralta, Daniel; Galar Idoate, Mikel; Triguero, Isaac; Paternain Dallo, Daniel; García, Salvador; Barrenechea Tartas, Edurne; Benítez, José Manuel; Bustince Sola, Humberto; Herrera, Francisco; Automática y Computación; Automatika eta Konputazioa
Fingerprint recognition has found a reliable application for verification or identification of people in biometrics. Globally, fingerprints can be viewed as valuable traits due to several perceptions observed by the experts; such as the distinctiveness and the permanence on humans and the performance in real applications. Among the main stages of fingerprint recognition, the automated matching phase has received much attention from the early years up to nowadays. This paper is devoted to review and categorize the vast number of fingerprint matching methods proposed in the specialized literature. In particular, we focus on local minutiae-based matching algorithms, which provide good performance with an excellent trade-off between efficacy and efficiency. We identify the main properties and differences of existing methods. Then, we include an experimental evaluation involving the most representative local minutiae-based matching models in both verification and evaluation tasks. The results obtained will be discussed in detail, supporting the description of future directions.
Open Access
A deep learning approach to an enhanced building footprint and road detection in high-resolution satellite imagery
(MDPI, 2021) Ayala Lauroba, Christian; Sesma Redín, Rubén; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua
The detection of building footprints and road networks has many useful applications including the monitoring of urban development, real-time navigation, etc. Taking into account that a great deal of human attention is required by these remote sensing tasks, a lot of effort has been made to automate them. However, the vast majority of the approaches rely on very high-resolution satellite imagery (<2.5 m) whose costs are not yet affordable for maintaining up-to-date maps. Working with the limited spatial resolution provided by high-resolution satellite imagery such as Sentinel-1 and Sentinel-2 (10 m) makes it hard to detect buildings and roads, since these labels may coexist within the same pixel. This paper focuses on this problem and presents a novel methodology capable of detecting building and roads with sub-pixel width by increasing the resolution of the output masks. This methodology consists of fusing Sentinel-1 and Sentinel-2 data (at 10 m) together with OpenStreetMap to train deep learning models for building and road detection at 2.5 m. This becomes possible thanks to the usage of OpenStreetMap vector data, which can be rasterized to any desired resolution. Accordingly, a few simple yet effective modifications of the U-Net architecture are proposed to not only semantically segment the input image, but also to learn how to enhance the resolution of the output masks. As a result, generated mappings quadruplicate the input spatial resolution, closing the gap between satellite and aerial imagery for building and road detection. To properly evaluate the generalization capabilities of the proposed methodology, a data-set composed of 44 cities across the Spanish territory have been considered and divided into training and testing cities. Both quantitative and qualitative results show that high-resolution satellite imagery can be used for sub-pixel width building and road detection following the proper methodology.
Open Access
Towards fine-grained road maps extraction using sentinel-2 imagery
(Copernicus, 2021) Ayala Lauroba, Christian; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua
Nowadays, it is highly important to keep road maps up-to-date since a great deal of services rely on them. However, to date, these labours have demanded a great deal of human attention due to their complexity. In the last decade, promising attempts have been carried out to fully-automatize the extraction of road networks from remote sensing imagery. Nevertheless, the vast majority of methods rely on aerial imagery (< 1 m), whose costs are not yet affordable for maintaining up-to-date maps. This work proves that it is also possible to accurately detect roads using high resolution satellite imagery (10 m). Accordingly, we have relied on Sentinel-2 imagery considering its freely availability and the higher revisit times compared to aerial imagery. It must be taken into account that the lack of spatial resolution of this sensor drastically increases the difficulty of the road detection task, since the feasibility to detect a road depends on its width, which can reach sub-pixel size in Sentinel-2 imagery. For that purpose, a new deep learning architecture which combines semantic segmentation and super-resolution techniques is proposed. As a result, fine-grained road maps at 2.5 m are generated from Sentinel-2 imagery.