Open Access
CFM-BD: a distributed rule induction algorithm for building compact fuzzy models in Big Data classification problems
(IEEE, 2020) Elkano Ilintxeta, Mikel; Sanz Delgado, José Antonio; Barrenechea Tartas, Edurne; Bustince Sola, Humberto; Galar Idoate, Mikel; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Estadística, Informática y Matemáticas
Interpretability has always been a major concern for fuzzy rule-based classifiers. The usage of human-readable models allows them to explain the reasoning behind their predictions and decisions. However, when it comes to Big Data classification problems, fuzzy rule-based classifiers have not been able to maintain the good tradeoff between accuracy and interpretability that has characterized these techniques in non-Big-Data environments. The most accurate methods build models composed of a large number of rules and fuzzy sets that are too complex, while those approaches focusing on interpretability do not provide state-of-the-art discrimination capabilities. In this paper, we propose a new distributed learning algorithm named CFM-BD to construct accurate and compact fuzzy rule-based classification systems for Big Data. This method has been specifically designed from scratch for Big Data problems and does not adapt or extend any existing algorithm. The proposed learning process consists of three stages: preprocessing based on the probability integral transform theorem; rule induction inspired by the CHI-BD and Apriori algorithms; and rule selection by means of a global evolutionary optimization. We conducted a complete empirical study to test the performance of our approach in terms of accuracy, complexity, and runtime. The results obtained were compared and contrasted with four state-of-the-art fuzzy classifiers for Big Data (FBDT, FMDT, Chi-Spark-RS, and CHI-BD). According to this study, CFM-BD is able to provide competitive discrimination capabilities using significantly simpler models composed of a few rules of fewer than three antecedents, employing five linguistic labels for all variables.
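As an illustration of what "five linguistic labels" can look like in practice, the following minimal sketch builds a uniform partition of [0, 1] with five triangular fuzzy sets. The triangular shape, the label names, and the function names are assumptions made for this example; the abstract does not specify them, and this is not the CFM-BD implementation.

```python
from typing import List

# Hypothetical label names; the abstract only states that five labels are used.
LABELS = ["very low", "low", "medium", "high", "very high"]


def triangular_memberships(x: float, n_labels: int = 5) -> List[float]:
    """Membership degrees of x in [0, 1] for a uniform triangular partition."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    step = 1.0 / (n_labels - 1)  # distance between adjacent peaks
    degrees = []
    for i in range(n_labels):
        peak = i * step
        # Membership falls linearly from 1 at the peak to 0 one step away.
        degrees.append(max(0.0, 1.0 - abs(x - peak) / step))
    return degrees


def best_label(x: float) -> str:
    """Name of the label with the highest membership degree for x."""
    degrees = triangular_memberships(x)
    return LABELS[degrees.index(max(degrees))]
```

With this kind of partition, a rule antecedent such as "x is low" is evaluated by reading the corresponding membership degree, and at most two adjacent labels are active for any input value.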
Open Access
A survey of fingerprint classification Part I: taxonomies on feature extraction methods and learning models
(Elsevier, 2015) Galar Idoate, Mikel; Derrac, Joaquín; Peralta, Daniel; Triguero, Isaac; Paternain Dallo, Daniel; López Molina, Carlos; García, Salvador; Benítez, José Manuel; Pagola Barrio, Miguel; Barrenechea Tartas, Edurne; Bustince Sola, Humberto; Herrera, Francisco; Automática y Computación; Automatika eta Konputazioa
This paper reviews the fingerprint classification literature looking at the problem from a double perspective. We first deal with feature extraction methods, including the different models considered for singular point detection and for orientation map extraction. Then, we focus on the different learning models considered to build the classifiers used to label new fingerprints. Taxonomies and classifications for the feature extraction, singular point detection, orientation extraction and learning methods are presented. A critical view of the existing literature has led us to present a discussion on the existing methods and their drawbacks, such as difficulty in their reimplementation, lack of details or major differences in their evaluation procedures. On this account, an experimental analysis of the most relevant methods is carried out in the second part of this paper, and a new method based on their combination is presented.
Open Access
Extensions of fuzzy sets in image processing: an overview
(EUSFLAT, 2011) Pagola Barrio, Miguel; Barrenechea Tartas, Edurne; Bustince Sola, Humberto; Fernández Fernández, Francisco Javier; Galar Idoate, Mikel; Jurío Munárriz, Aránzazu; López Molina, Carlos; Paternain Dallo, Daniel; Sanz Delgado, José Antonio; Couto, Pedro; Melo-Pinto, Pedro; Automática y Computación; Automatika eta Konputazioa
This work presents a valuable review, for the interested reader, of recent works using extensions of fuzzy sets in image processing. The chapter is divided as follows: first we recall the basics of the extensions of fuzzy sets, i.e. Type 2 fuzzy sets, interval-valued fuzzy sets and Atanassov’s intuitionistic fuzzy sets. In subsequent sections we review the methods proposed for noise removal (section 3), image enhancement (section 4), edge detection (section 5) and segmentation (section 6). There exist other image segmentation tasks such as video de-interlacing, stereo matching or object representation that are not described in this work.
Open Access
Addressing the overlapping data problem in classification using the one-vs-one decomposition strategy
(IEEE, 2019) Sáez, José Antonio; Galar Idoate, Mikel; Krawczyk, Bartosz; Institute of Smart Cities - ISC
Learning good-performing classifiers from data with easily separable classes is not usually a difficult task for most of the algorithms. However, problems affecting classifier performance may arise when samples from different classes share similar characteristics or are overlapped, since the boundaries of each class may not be clearly defined. In order to address this problem, the majority of existing works in the literature propose to either adapt well-known algorithms to reduce the negative impact of overlapping or modify the original data by introducing/removing features which decrease the overlapping region. However, these approaches may present some drawbacks: the changes in specific algorithms may not be useful for other methods and modifying the original data can produce variable results depending on data characteristics and the technique used later. An unexplored and interesting research line to deal with the overlapping phenomenon consists of decomposing the problem into several binary subproblems to reduce its complexity, diminishing the negative effects of overlapping. Based on this novel idea in the field of overlapping data, this paper proposes the usage of the One-vs-One (OVO) strategy to alleviate the presence of overlapping, without modifying existing algorithms or data conformations as suggested by previous works. To test the suitability of the OVO approach with overlapping data, and due to the lack of proposals in the specialized literature, this research also introduces a novel scheme to artificially induce overlapping in real-world datasets, which enables us to simulate different types and levels of overlapping among the classes. The results obtained show that the methods using the OVO achieve better performances when considering data with overlapped classes than those dealing with all classes at the same time.
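The One-vs-One decomposition described above can be sketched as follows: one binary model is trained for each pair of classes, and a new example is assigned to the class that wins the most pairwise votes. The nearest-centroid base learner and all function names are assumptions chosen to keep the sketch self-contained; the paper's base classifiers and aggregation scheme may differ.

```python
from collections import Counter
from itertools import combinations


def train_centroid(points):
    """Mean point of a list of equally sized numeric tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_ovo(X, y):
    """Train one binary nearest-centroid model per pair of classes."""
    classes = sorted(set(y))
    models = {}
    for a, b in combinations(classes, 2):
        # Each binary subproblem only sees the examples of its two classes.
        pts_a = [x for x, label in zip(X, y) if label == a]
        pts_b = [x for x, label in zip(X, y) if label == b]
        models[(a, b)] = (train_centroid(pts_a), train_centroid(pts_b))
    return models


def predict_ovo(models, x):
    """Majority vote over all pairwise models."""
    votes = Counter()
    for (a, b), (centroid_a, centroid_b) in models.items():
        winner = a if sq_dist(x, centroid_a) <= sq_dist(x, centroid_b) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]
```

A problem with k classes yields k(k-1)/2 binary models, so each model only has to separate two classes at a time.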
Open Access
FUZZ-EQ: a data equalizer for boosting the discrimination power of fuzzy classifiers
(Elsevier, 2020) Uriz Martín, Mikel Xabier; Elkano Ilintxeta, Mikel; Bustince Sola, Humberto; Galar Idoate, Mikel; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Estadística, Informática y Matemáticas; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA13
The definition of linguistic terms is a critical part of the construction of any fuzzy classifier. Fuzzy partitioning methods (FPMs) range from simple uniform partitioning to sophisticated optimization algorithms. In this paper we present FUZZ-EQ, a preprocessing algorithm that facilitates the construction of meaningful fuzzy partitions regardless of the FPM used. The proposed approach is radically different from any existing FPM: instead of adjusting the fuzzy sets to the training data, FUZZ-EQ adjusts the training data to a hypothetical uniform partition before applying any FPM. To do so, the original data distribution is transformed into a uniform distribution by applying the probability integral transform. FUZZ-EQ allows FPMs to provide classifiers with more granularity on high density regions, increasing the overall discrimination capability. Additionally, we describe the procedure to reverse this transformation and recover the interpretability of linguistic terms. To assess the effectiveness of our proposal, we conducted an extensive empirical study consisting of 41 classification tasks and 9 fuzzy classifiers with different FPMs, rule induction algorithms, and rule structures. We also tested the scalability of FUZZ-EQ in Big Data classification problems such as HIGGS, with 11 million examples. Experimental results reveal that FUZZ-EQ significantly boosted the classification performance of those classifiers using the same linguistic terms for all rules, including state-of-the-art classifiers such as FARC-HD or IVTURS.
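The probability integral transform mentioned above maps a value to its cumulative probability, so the transformed data are approximately uniform on [0, 1]. The sketch below estimates the CDF with the empirical distribution of the training values and reverses the mapping with a quantile lookup. The use of the empirical CDF and the names `fit_ecdf`, `to_uniform`, and `from_uniform` are assumptions for this example; the abstract does not state how FUZZ-EQ estimates the distribution.

```python
import math
from bisect import bisect_right


def fit_ecdf(train_values):
    """Store the sorted training values; they define the empirical CDF."""
    return sorted(train_values)


def to_uniform(sorted_train, x):
    """Empirical CDF at x: fraction of training values <= x, in [0, 1]."""
    return bisect_right(sorted_train, x) / len(sorted_train)


def from_uniform(sorted_train, u):
    """Approximate inverse: the training value found at quantile u."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    n = len(sorted_train)
    idx = min(n - 1, max(0, math.ceil(u * n) - 1))
    return sorted_train[idx]


# Skewed values are spread evenly over [0, 1] after the transform.
skewed = fit_ecdf([1, 2, 4, 8, 100])
uniform = [to_uniform(skewed, v) for v in skewed]  # 0.2, 0.4, 0.6, 0.8, 1.0
```

Mapping a cut point of a uniform partition back through `from_uniform` gives a boundary expressed in the original units, which is the sense in which the transformation can be reversed to keep linguistic terms interpretable.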
Open Access
An empirical study on supervised and unsupervised fuzzy measure construction methods in highly imbalanced classification
(IEEE, 2020) Uriz Martín, Mikel Xabier; Paternain Dallo, Daniel; Bustince Sola, Humberto; Galar Idoate, Mikel; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Estadística, Informática y Matemáticas; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
The design of an ensemble of classifiers involves the definition of an aggregation mechanism that produces a single response obtained from the information provided by the classifiers. A specific aggregation methodology that has been studied in the literature is the use of fuzzy integrals, such as the Choquet or the Sugeno integral, where the associated fuzzy measure tries to represent the interaction existing between the classifiers of the ensemble. However, defining the large number of coefficients of a fuzzy measure is not a trivial task and therefore, many different algorithms have been proposed. These can be split into supervised and unsupervised, each class having different learning mechanisms and particularities. Since there is no clear knowledge about the correct method to be used, in this work we propose an experimental study for comparing the performance of eight different learning algorithms under the same framework of imbalanced datasets. Moreover, we also compare the specific fuzzy integrals (Choquet and Sugeno) and their synergies with the different fuzzy measure construction methods.
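For readers unfamiliar with fuzzy integrals, the following sketch computes the discrete Choquet and Sugeno integrals of a set of classifier scores with respect to a given fuzzy measure. The dictionary-based representation of the measure and the example values are assumptions made for illustration; the measure construction methods compared in the paper are not reproduced here.

```python
def choquet(scores, measure):
    """Discrete Choquet integral.

    scores: dict mapping each source (classifier) to a value in [0, 1].
    measure: dict mapping frozensets of sources to their measure in [0, 1].
    """
    order = sorted(scores, key=scores.get)  # sources by ascending score
    total, previous = 0.0, 0.0
    for i, source in enumerate(order):
        coalition = frozenset(order[i:])  # sources scoring at least this much
        total += (scores[source] - previous) * measure[coalition]
        previous = scores[source]
    return total


def sugeno(scores, measure):
    """Discrete Sugeno integral: max over i of min(score, coalition measure)."""
    order = sorted(scores, key=scores.get)
    return max(
        min(scores[source], measure[frozenset(order[i:])])
        for i, source in enumerate(order)
    )


# Hypothetical two-classifier ensemble.
scores = {"c1": 0.5, "c2": 0.75}
measure = {
    frozenset({"c1", "c2"}): 1.0,
    frozenset({"c1"}): 0.25,
    frozenset({"c2"}): 0.5,
}
```

A measure over n classifiers needs a coefficient for every non-empty subset, that is 2^n - 1 values, which is why their definition is not trivial. Setting every singleton measure to 1 makes the Choquet integral return the maximum score, and setting them to 0 makes it return the minimum.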
Open Access
A scalable and flexible Open Source Big Data architecture for small and medium-sized enterprises
(Springer, 2021) Iñiguez Jiménez, Luis; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
The advancements of Big Data, Internet of Things and Artificial Intelligence are causing the industrial revolution known as Industry 4.0. For automated factories, adopting the necessary technologies for its implementation involves a series of challenges such as the lack of a proper infrastructure, financial limitations, coordination problems or a low understanding of Industry 4.0 implications. Additionally, many implementations focus on solving specific problems without taking other future or parallel projects into account, leading to continuous restructuring and increased complexity, that is, increasing costs. A lack of a global view when implementing Industry 4.0 solutions can cause difficulties in its adoption, leading to future problems that may be unaffordable for Small and Medium-sized Enterprises (SMEs). Traditional Big Data architectures offer remarkable solutions to complex data issues, but do not cover the complete flow of information that is required in Industry 4.0 applications. Therefore, there is a need to create solutions for the difficulties that this new digital transformation brings to avoid future problems, making it affordable also for SMEs. In this work we propose a flexible and scalable Big Data architecture that is well-suited for SMEs with automated factories, taking the aforementioned difficulties into account.
Open Access
Network traffic analysis through node behaviour classification: a graph-based approach with temporal dissection and data-level preprocessing
(Elsevier, 2022) Zola, Francesco; Segurola-Gil, Lander; Bruse, Jan Lukas; Galar Idoate, Mikel; Orduna Urrutia, Raúl; Institute of Smart Cities - ISC
Network traffic analysis is an important cybersecurity task, which helps to classify anomalous, potentially dangerous connections. In many cases, it is critical not only to detect individual malicious connections, but to detect which node in a network has generated malicious traffic so that appropriate actions can be taken to reduce the threat and increase the system's cybersecurity. Instead of analysing connections only, node behavioural analysis can be performed by exploiting the graph information encoded in a connection network. Network traffic, however, is temporal data and extracting graph information without a fixed time scope may only unveil macro-dynamics that are less related to cybersecurity threats. To address these issues, a threefold approach is proposed here: firstly, temporal dissection for extracting graph-based information is applied. As the resulting graphs are typically affected by class imbalance (i.e. malicious nodes are under-represented), two novel graph data-level preprocessing techniques - R-hybrid and SM-hybrid - are introduced, which focus on exploiting the most relevant graph substructures. Finally, a Neural Network (NN) and two Graph Convolutional Network (GCN) approaches are compared when performing node behaviour classification. Furthermore, we compare the node classification performance of these supervised models with traditional unsupervised anomaly detection techniques. Results show that temporal dissection parameters affected classification performance, while the data-level preprocessing strategies reduced class imbalance and led to improved supervised node behaviour classification, outperforming anomaly detection models. In particular, Neural Network (NN) outperformed Graph Convolutional Network (GCN) approaches for two attack families and was less affected by class imbalance, yet one GCN performed best overall. 
The presented study successfully applies a temporal graph-based approach for malicious actor detection in network traffic data.
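The idea of temporal dissection can be illustrated with a minimal sketch: connection records are grouped into fixed-length time windows, and one directed graph is built per window so that node features are computed within a bounded time scope. The record format `(timestamp, source, destination)`, the window length, and the function names are assumptions for this example and are not taken from the paper.

```python
from collections import defaultdict


def dissect(connections, window_seconds):
    """Group (timestamp, src, dst) records into per-window directed edge sets."""
    graphs = defaultdict(set)
    for timestamp, src, dst in connections:
        window = int(timestamp // window_seconds)  # index of the time window
        graphs[window].add((src, dst))
    return dict(graphs)


def out_degree(edges):
    """Number of distinct outgoing edges per source node in one window."""
    degree = defaultdict(int)
    for src, _dst in edges:
        degree[src] += 1
    return dict(degree)


# Hypothetical traffic: timestamps in seconds, nodes as short names.
connections = [(0, "a", "b"), (30, "a", "c"), (61, "b", "a"), (125, "a", "b")]
graphs = dissect(connections, window_seconds=60)
```

Per-window features such as `out_degree(graphs[0])` could then feed a node classifier; a shorter window exposes short-lived behaviour, while a longer one aggregates it.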
Open Access
A deep learning approach to an enhanced building footprint and road detection in high-resolution satellite imagery
(MDPI, 2021) Ayala Lauroba, Christian; Sesma Redín, Rubén; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua
The detection of building footprints and road networks has many useful applications including the monitoring of urban development, real-time navigation, etc. Taking into account that a great deal of human attention is required by these remote sensing tasks, a lot of effort has been made to automate them. However, the vast majority of the approaches rely on very high-resolution satellite imagery (<2.5 m) whose costs are not yet affordable for maintaining up-to-date maps. Working with the limited spatial resolution provided by high-resolution satellite imagery such as Sentinel-1 and Sentinel-2 (10 m) makes it hard to detect buildings and roads, since these labels may coexist within the same pixel. This paper focuses on this problem and presents a novel methodology capable of detecting buildings and roads with sub-pixel width by increasing the resolution of the output masks. This methodology consists of fusing Sentinel-1 and Sentinel-2 data (at 10 m) together with OpenStreetMap to train deep learning models for building and road detection at 2.5 m. This becomes possible thanks to the usage of OpenStreetMap vector data, which can be rasterized to any desired resolution. Accordingly, a few simple yet effective modifications of the U-Net architecture are proposed to not only semantically segment the input image, but also to learn how to enhance the resolution of the output masks. As a result, generated mappings quadruple the input spatial resolution, closing the gap between satellite and aerial imagery for building and road detection. To properly evaluate the generalization capabilities of the proposed methodology, a dataset composed of 44 cities across the Spanish territory has been considered and divided into training and testing cities. Both quantitative and qualitative results show that high-resolution satellite imagery can be used for sub-pixel width building and road detection following the proper methodology.
Open Access
Towards fine-grained road maps extraction using sentinel-2 imagery
(Copernicus, 2021) Ayala Lauroba, Christian; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua
Nowadays, it is highly important to keep road maps up-to-date since a great deal of services rely on them. However, to date, these tasks have demanded a great deal of human attention due to their complexity. In the last decade, promising attempts have been carried out to fully automate the extraction of road networks from remote sensing imagery. Nevertheless, the vast majority of methods rely on aerial imagery (< 1 m), whose costs are not yet affordable for maintaining up-to-date maps. This work proves that it is also possible to accurately detect roads using high resolution satellite imagery (10 m). Accordingly, we have relied on Sentinel-2 imagery considering its free availability and its shorter revisit times compared to aerial imagery. It must be taken into account that the lack of spatial resolution of this sensor drastically increases the difficulty of the road detection task, since the feasibility of detecting a road depends on its width, which can reach sub-pixel size in Sentinel-2 imagery. For that purpose, a new deep learning architecture which combines semantic segmentation and super-resolution techniques is proposed. As a result, fine-grained road maps at 2.5 m are generated from Sentinel-2 imagery.