Open Access
On the influence of interval normalization in IVOVO fuzzy multi-class classifier
(Springer, 2019) Uriz Martín, Mikel Xabier; Paternain Dallo, Daniel; Bustince Sola, Humberto; Galar Idoate, Mikel; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Estadística, Informática y Matemáticas; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA13
IVOVO stands for Interval-Valued One-Vs-One and is the combination of the IVTURS fuzzy classifier and the One-Vs-One strategy. This method is designed to improve the performance of IVTURS in multi-class problems by dividing the original problem into simpler binary ones. The key issue with IVTURS is that interval-valued confidence degrees for each class are returned and, consequently, they have to be normalized before applying a One-Vs-One strategy. However, there is no consensus on which normalization method should be used with intervals. In IVOVO, the normalization method based on the upper bounds was considered, as it maintains the admissible order between intervals and also the proportion of ignorance, but no further study was developed. In this work, we aim to extend this analysis by considering several normalizations from the literature. We study both their main theoretical properties and their empirical performance in the final results of IVOVO.
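As a rough illustration of the upper-bound normalization mentioned in the abstract, the sketch below divides both bounds of every interval by the sum of the upper bounds. This is one plausible reading of "normalization based on the upper bounds", not necessarily the exact formula used in IVOVO; the function name and the tuple representation of intervals are assumptions made for this example.

```python
def normalize_by_upper_bounds(intervals):
    """Scale (lower, upper) intervals by the sum of their upper bounds.

    Assumption: this is one reading of "upper-bound normalization";
    the paper may define it differently.
    """
    total_upper = sum(upper for _, upper in intervals)
    if total_upper == 0:
        # Nothing to scale by: return degenerate intervals.
        return [(0.0, 0.0) for _ in intervals]
    return [(lower / total_upper, upper / total_upper)
            for lower, upper in intervals]


# Interval-valued confidences for the two classes of one binary problem.
pair = [(0.2, 0.6), (0.1, 0.2)]
normalized = normalize_by_upper_bounds(pair)
```

Because every bound is divided by the same positive constant, the relative width of each interval (width divided by upper bound) is unchanged, which is consistent with the abstract's remark about preserving the proportion of ignorance.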
Open Access
Metrics for dataset demographic bias: a case study on facial expression recognition
(IEEE, 2024) Domínguez Catena, Iris; Paternain Dallo, Daniel; Galar Idoate, Mikel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Universidad Pública de Navarra - Nafarroako Unibertsitate Publikoa
Demographic biases in source datasets have been shown to be one of the causes of unfairness and discrimination in the predictions of Machine Learning models. One of the most prominent types of demographic bias is statistical imbalance in the representation of demographic groups in the datasets. In this paper, we study the measurement of these biases by reviewing the existing metrics, including those that can be borrowed from other disciplines. We develop a taxonomy for the classification of these metrics, providing a practical guide for the selection of appropriate metrics. To illustrate the utility of our framework, and to further understand the practical characteristics of the metrics, we conduct a case study of 20 datasets used in Facial Emotion Recognition (FER), analyzing the biases present in them. Our experimental results show that many metrics are redundant and that a reduced subset of metrics may be sufficient to measure the amount of demographic bias. The paper provides valuable insights for researchers in AI and related fields to mitigate dataset bias and improve the fairness and accuracy of AI models.
Open Access
A study of different families of fusion functions for combining classifiers in the one-vs-one strategy
(Springer, 2018) Uriz Martín, Mikel Xabier; Paternain Dallo, Daniel; Jurío Munárriz, Aránzazu; Bustince Sola, Humberto; Galar Idoate, Mikel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
In this work we study the usage of different families of fusion functions for combining classifiers in a multiple classifier system of One-vs-One (OVO) classifiers. OVO is a decomposition strategy used to deal with multi-class classification problems, where the original multi-class problem is divided into as many problems as there are pairs of classes. In a multiple classifier system, classifiers coming from different paradigms such as support vector machines, rule induction algorithms or decision trees are combined. In the literature, several works have addressed the usage of classifier selection methods for these kinds of systems, where the best classifier for each pair of classes is selected. In this work, we look at the problem from a different perspective, aiming to analyze the behavior of different families of fusion functions for combining the classifiers. In fact, a multiple classifier system of OVO classifiers can be seen as a multi-expert decision making problem. In this context, for the fusion functions depending on weights or fuzzy measures, we propose to obtain these parameters from data. Backed up by a thorough experimental analysis, we show that the fusion function to be considered is a key factor in the system. Moreover, those based on weights or fuzzy measures can allow one to better model the aggregation problem.
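The OVO decomposition and the fusion step described above can be sketched as follows. This is a minimal illustration only: the weighted arithmetic mean stands in for the families of fusion functions studied in the paper, and all function names and the data layout (`pair_confidences`) are hypothetical.

```python
from itertools import combinations


def ovo_pairs(classes):
    """One binary problem per unordered pair of classes: k * (k - 1) / 2."""
    return list(combinations(classes, 2))


def fuse(confidences, weights=None):
    """Weighted arithmetic mean, used here as a stand-in fusion function."""
    if weights is None:
        weights = [1.0 / len(confidences)] * len(confidences)
    return sum(w * c for w, c in zip(weights, confidences))


def predict(classes, pair_confidences, weights=None):
    """pair_confidences[(a, b)] holds one confidence per base classifier,
    each in favour of class a over class b (assumed to lie in [0, 1])."""
    scores = {c: 0.0 for c in classes}
    for (a, b), confidences in pair_confidences.items():
        fused = fuse(confidences, weights)
        scores[a] += fused
        scores[b] += 1.0 - fused
    # Weighted voting: the class with the largest accumulated score wins.
    return max(scores, key=scores.get)


classes = ["a", "b", "c"]
pair_confidences = {
    ("a", "b"): [0.9, 0.7],  # two base classifiers per binary problem
    ("a", "c"): [0.6, 0.8],
    ("b", "c"): [0.4, 0.5],
}
winner = predict(classes, pair_confidences)
```

Replacing `fuse` with another aggregation, for example one whose weights are learned from data, changes how the experts are combined without touching the decomposition itself.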
Open Access
A survey on fingerprint minutiae-based local matching for verification and identification: taxonomy and experimental evaluation
(Elsevier, 2015) Peralta, Daniel; Galar Idoate, Mikel; Triguero, Isaac; Paternain Dallo, Daniel; García, Salvador; Barrenechea Tartas, Edurne; Benítez, José Manuel; Bustince Sola, Humberto; Herrera, Francisco; Automática y Computación; Automatika eta Konputazioa
Fingerprint recognition has found a reliable application for verification or identification of people in biometrics. Globally, fingerprints can be viewed as valuable traits due to several perceptions observed by the experts, such as their distinctiveness and permanence in humans and their performance in real applications. Among the main stages of fingerprint recognition, the automated matching phase has received much attention from the early years to the present day. This paper is devoted to reviewing and categorizing the vast number of fingerprint matching methods proposed in the specialized literature. In particular, we focus on local minutiae-based matching algorithms, which provide good performance with an excellent trade-off between efficacy and efficiency. We identify the main properties and differences of existing methods. Then, we include an experimental evaluation involving the most representative local minutiae-based matching models in both verification and identification tasks. The results obtained are discussed in detail, supporting the description of future directions.
Open Access
Bitcoin and cybersecurity: temporal dissection of blockchain data to unveil changes in entity behavioral patterns
(MDPI, 2019) Zola, Francesco; Bruse, Jan Lukas; Eguimendia, María; Galar Idoate, Mikel; Orduna Urrutia, Raúl; Institute of Smart Cities - ISC
The Bitcoin network not only is vulnerable to cyber-attacks but currently represents the most frequently used cryptocurrency for concealing illicit activities. Typically, Bitcoin activity is monitored by decreasing anonymity of its entities using machine learning-based techniques, which consider the whole blockchain. This entails two issues: first, it increases the complexity of the analysis requiring higher efforts and, second, it may hide network micro-dynamics important for detecting short-term changes in entity behavioral patterns. The aim of this paper is to address both issues by performing a 'temporal dissection' of the Bitcoin blockchain, i.e., dividing it into smaller temporal batches to achieve entity classification. The idea is that a machine learning model trained on a certain time-interval (batch) should achieve good classification performance when tested on another batch if entity behavioral patterns are similar. We apply cascading machine learning principles (a type of ensemble learning applying stacking techniques), introducing a 'k-fold cross-testing' concept across batches of varying size. Results show that blockchain batch size used for entity classification could be reduced for certain classes (Exchange, Gambling, and eWallet) as classification rates did not vary significantly with batch size, suggesting that behavioral patterns did not change significantly over time. Mixer and Market class detection, however, can be negatively affected. A deeper analysis of Mining Pool behavior showed that models trained on recent data perform better than models trained on older data, suggesting that 'typical' Mining Pool behavior may be represented better by recent data. This work provides a first step towards uncovering entity behavioral changes via temporal dissection of blockchain data.
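The idea of training on one temporal batch and testing on another can be sketched as below. The abstract does not spell out the 'k-fold cross-testing' protocol, so this example simply assumes that a model is trained on each batch and evaluated on every other batch. The record layout (`time`, `label`) and the toy majority-class model are hypothetical and stand in for the cascading ensemble used in the paper.

```python
from collections import Counter


def temporal_batches(records, batch_size):
    """Sort records by time and cut them into consecutive batches."""
    ordered = sorted(records, key=lambda r: r["time"])
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]


def cross_test(batches, train, evaluate):
    """Train on each batch and test on every other batch (assumed protocol)."""
    results = {}
    for i, train_batch in enumerate(batches):
        model = train(train_batch)
        for j, test_batch in enumerate(batches):
            if i != j:
                results[(i, j)] = evaluate(model, test_batch)
    return results


def train_majority(batch):
    """Toy model: always predict the most frequent label of the batch."""
    return Counter(r["label"] for r in batch).most_common(1)[0][0]


def accuracy(model, batch):
    return sum(r["label"] == model for r in batch) / len(batch)


# Synthetic records: the dominant entity type changes after time step 3.
records = [{"time": t, "label": "Exchange" if t < 4 else "Mixer"}
           for t in range(6)]
batches = temporal_batches(records, batch_size=2)
results = cross_test(batches, train_majority, accuracy)
```

In this toy setting, a score that stays high between two batches suggests similar behavior across those periods, while a drop hints at a change in behavioral patterns, which is the intuition the abstract describes.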
Open Access
A supervised fuzzy measure learning algorithm for combining classifiers
(Elsevier, 2023) Uriz Martín, Mikel Xabier; Paternain Dallo, Daniel; Bustince Sola, Humberto; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
Fuzzy measure-based aggregations allow taking interactions among coalitions of the input sources into account. Their main drawback when applying them in real-world problems, such as combining classifier ensembles, is how to define the fuzzy measure that governs the aggregation and specifies the interactions. However, their usage for combining classifiers has shown its advantage. The learning of the fuzzy measure can be done either in a supervised or unsupervised manner. This paper focuses on supervised approaches. Existing supervised approaches are designed to minimize the mean squared error cost function, even for classification problems. We propose a new fuzzy measure learning algorithm for combining classifiers that can optimize any cost function. To do so, advancements from deep learning frameworks are considered such as automatic gradient computation. Therefore, a gradient-based method is presented together with three new update policies that are required to preserve the monotonicity constraints of the fuzzy measures. The usefulness of the proposal and the optimization of cross-entropy cost are shown in an extensive experimental study with 58 datasets corresponding to both binary and multi-class classification problems. In this framework, the proposed method is compared with other state-of-the-art methods for fuzzy measure learning.
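To make the role of the fuzzy measure concrete, the sketch below computes a discrete Choquet integral, a well-known fuzzy measure-based aggregation. The abstract does not state which aggregation the paper uses, so the choice of the Choquet integral is an assumption, as is the representation of the measure as a dictionary keyed by `frozenset` coalitions. The learning algorithm itself (gradient-based updates with monotonicity-preserving policies) is not reproduced here.

```python
def choquet(values, measure):
    """Discrete Choquet integral of `values` with respect to `measure`.

    `measure` maps frozensets of input indices to [0, 1]. It is assumed
    to be monotone, with measure of the full set equal to 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, previous = 0.0, 0.0
    for k, idx in enumerate(order):
        # Coalition of all sources whose value is at least values[idx].
        coalition = frozenset(order[k:])
        total += (values[idx] - previous) * measure[coalition]
        previous = values[idx]
    return total


# Two classifiers; the measure of each coalition encodes its importance.
measure = {
    frozenset({0}): 0.3,
    frozenset({1}): 0.5,
    frozenset({0, 1}): 1.0,
}
fused = choquet([0.2, 0.8], measure)
```

When the measure is additive, this reduces to a weighted mean. Non-additive values for the coalitions are what let the aggregation model interactions among sources.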
Open Access
A scalable and flexible Open Source Big Data architecture for small and medium-sized enterprises
(Springer, 2021) Iñiguez Jiménez, Luis; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
The advancements of Big Data, Internet of Things and Artificial Intelligence are causing the industrial revolution known as Industry 4.0. For automated factories, adopting the necessary technologies for its implementation involves a series of challenges such as the lack of a proper infrastructure, financial limitations, coordination problems or a low understanding of Industry 4.0 implications. Additionally, many implementations focus on solving specific problems without taking other future or parallel projects into account, leading to continuous restructuring and increased complexity, that is, increasing costs. A lack of a global view when implementing Industry 4.0 solutions can cause difficulties in its adoption, leading to future problems that may be unaffordable for Small and Medium-sized Enterprises (SMEs). Traditional Big Data architectures offer remarkable solutions to complex data issues, but do not cover the complete flow of information that is required in Industry 4.0 applications. Therefore, there is a need to create solutions for the difficulties that this new digital transformation brings to avoid future problems, making it affordable also for SMEs. In this work we propose a flexible and scalable Big Data architecture that is well-suited for SMEs with automated factories, taking the aforementioned difficulties into account.
Open Access
A deep learning approach to an enhanced building footprint and road detection in high-resolution satellite imagery
(MDPI, 2021) Ayala Lauroba, Christian; Sesma Redín, Rubén; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua
The detection of building footprints and road networks has many useful applications including the monitoring of urban development, real-time navigation, etc. Taking into account that a great deal of human attention is required by these remote sensing tasks, a lot of effort has been made to automate them. However, the vast majority of the approaches rely on very high-resolution satellite imagery (<2.5 m) whose costs are not yet affordable for maintaining up-to-date maps. Working with the limited spatial resolution provided by high-resolution satellite imagery such as Sentinel-1 and Sentinel-2 (10 m) makes it hard to detect buildings and roads, since these labels may coexist within the same pixel. This paper focuses on this problem and presents a novel methodology capable of detecting buildings and roads with sub-pixel width by increasing the resolution of the output masks. This methodology consists of fusing Sentinel-1 and Sentinel-2 data (at 10 m) together with OpenStreetMap to train deep learning models for building and road detection at 2.5 m. This becomes possible thanks to the usage of OpenStreetMap vector data, which can be rasterized to any desired resolution. Accordingly, a few simple yet effective modifications of the U-Net architecture are proposed to not only semantically segment the input image, but also to learn how to enhance the resolution of the output masks. As a result, the generated mappings quadruple the input spatial resolution, closing the gap between satellite and aerial imagery for building and road detection. To properly evaluate the generalization capabilities of the proposed methodology, a dataset composed of 44 cities across the Spanish territory has been considered and divided into training and testing cities. Both quantitative and qualitative results show that high-resolution satellite imagery can be used for sub-pixel width building and road detection following the proper methodology.
Open Access
Towards fine-grained road maps extraction using sentinel-2 imagery
(Copernicus, 2021) Ayala Lauroba, Christian; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua
Nowadays, it is highly important to keep road maps up-to-date since a great many services rely on them. However, to date, this work has demanded a great deal of human attention due to its complexity. In the last decade, promising attempts have been made to fully automate the extraction of road networks from remote sensing imagery. Nevertheless, the vast majority of methods rely on aerial imagery (< 1 m), whose costs are not yet affordable for maintaining up-to-date maps. This work shows that it is also possible to accurately detect roads using high-resolution satellite imagery (10 m). Accordingly, we have relied on Sentinel-2 imagery considering its free availability and its more frequent revisits compared to aerial imagery. It must be taken into account that the limited spatial resolution of this sensor drastically increases the difficulty of the road detection task, since the feasibility of detecting a road depends on its width, which can reach sub-pixel size in Sentinel-2 imagery. For that purpose, a new deep learning architecture which combines semantic segmentation and super-resolution techniques is proposed. As a result, fine-grained road maps at 2.5 m are generated from Sentinel-2 imagery.
Open Access
CFM-BD: a distributed rule induction algorithm for building compact fuzzy models in Big Data classification problems
(IEEE, 2020) Elkano Ilintxeta, Mikel; Sanz Delgado, José Antonio; Barrenechea Tartas, Edurne; Bustince Sola, Humberto; Galar Idoate, Mikel; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Estadística, Informática y Matemáticas
Interpretability has always been a major concern for fuzzy rule-based classifiers. The usage of human-readable models allows them to explain the reasoning behind their predictions and decisions. However, when it comes to Big Data classification problems, fuzzy rule-based classifiers have not been able to maintain the good trade-off between accuracy and interpretability that has characterized these techniques in non-Big-Data environments. The most accurate methods build models composed of a large number of rules and fuzzy sets that are too complex, while those approaches focusing on interpretability do not provide state-of-the-art discrimination capabilities. In this paper, we propose a new distributed learning algorithm named CFM-BD to construct accurate and compact fuzzy rule-based classification systems for Big Data. This method has been specifically designed from scratch for Big Data problems and does not adapt or extend any existing algorithm. The proposed learning process consists of three stages: preprocessing based on the probability integral transform theorem; rule induction inspired by the CHI-BD and Apriori algorithms; and rule selection by means of a global evolutionary optimization. We conducted a complete empirical study to test the performance of our approach in terms of accuracy, complexity, and runtime. The results obtained were compared and contrasted with four state-of-the-art fuzzy classifiers for Big Data (FBDT, FMDT, Chi-Spark-RS, and CHI-BD). According to this study, CFM-BD is able to provide competitive discrimination capabilities using significantly simpler models composed of a few rules of fewer than three antecedents, employing five linguistic labels for all variables.
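The probability integral transform behind the preprocessing stage states that applying a continuous variable's own cumulative distribution function to it yields a uniformly distributed variable on [0, 1]. The sketch below uses the empirical CDF of a training sample as a simple approximation. It is not the distributed implementation used in CFM-BD, and the function name is hypothetical.

```python
from bisect import bisect_right


def empirical_pit(train_values):
    """Return a function mapping x to the empirical CDF of the sample at x.

    Assumption: a plain single-machine empirical CDF stands in for
    whatever CDF estimate the actual algorithm uses.
    """
    ordered = sorted(train_values)
    n = len(ordered)

    def transform(x):
        # Fraction of training values that are less than or equal to x.
        return bisect_right(ordered, x) / n

    return transform


# A skewed variable becomes roughly uniform on [0, 1] after the transform.
to_uniform = empirical_pit([1, 2, 2, 3, 50, 400, 10000, 90000])
transformed = [to_uniform(x) for x in (2, 50, 90000)]
```

A plausible motivation, not stated in the abstract, is that evenly spaced linguistic labels cover uniformly distributed values more evenly than they would cover the original skewed variable.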