Open Access
A supervised fuzzy measure learning algorithm for combining classifiers
(Elsevier, 2023) Uriz Martín, Mikel Xabier; Paternain Dallo, Daniel; Bustince Sola, Humberto; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
Fuzzy measure-based aggregations allow taking interactions among coalitions of the input sources into account. Their main drawback when applying them in real-world problems, such as combining classifier ensembles, is how to define the fuzzy measure that governs the aggregation and specifies the interactions. However, their usage for combining classifiers has shown its advantage. The learning of the fuzzy measure can be done either in a supervised or unsupervised manner. This paper focuses on supervised approaches. Existing supervised approaches are designed to minimize the mean squared error cost function, even for classification problems. We propose a new fuzzy measure learning algorithm for combining classifiers that can optimize any cost function. To do so, advancements from deep learning frameworks are considered such as automatic gradient computation. Therefore, a gradient-based method is presented together with three new update policies that are required to preserve the monotonicity constraints of the fuzzy measures. The usefulness of the proposal and the optimization of cross-entropy cost are shown in an extensive experimental study with 58 datasets corresponding to both binary and multi-class classification problems. In this framework, the proposed method is compared with other state-of-the-art methods for fuzzy measure learning.
Open Access
A survey of fingerprint classification Part I: taxonomies on feature extraction methods and learning models
(Elsevier, 2015) Galar Idoate, Mikel; Derrac, Joaquín; Peralta, Daniel; Triguero, Isaac; Paternain Dallo, Daniel; López Molina, Carlos; García, Salvador; Benítez, José Manuel; Pagola Barrio, Miguel; Barrenechea Tartas, Edurne; Bustince Sola, Humberto; Herrera, Francisco; Automática y Computación; Automatika eta Konputazioa
This paper reviews the fingerprint classification literature looking at the problem from a double perspective. We first deal with feature extraction methods, including the different models considered for singular point detection and for orientation map extraction. Then, we focus on the different learning models considered to build the classifiers used to label new fingerprints. Taxonomies and classifications for the feature extraction, singular point detection, orientation extraction and learning methods are presented. A critical view of the existing literature have led us to present a discussion on the existing methods and their drawbacks such as difficulty in their reimplementation, lack of details or major differences in their evaluations procedures. On this account, an experimental analysis of the most relevant methods is carried out in the second part of this paper, and a new method based on their combination is presented.
Open Access
A survey of fingerprint classification Part II: experimental analysis and ensemble proposal
(Elsevier, 2015) Galar Idoate, Mikel; Derrac, Joaquín; Peralta, Daniel; Triguero, Isaac; Paternain Dallo, Daniel; López Molina, Carlos; García, Salvador; Benítez, José Manuel; Pagola Barrio, Miguel; Barrenechea Tartas, Edurne; Bustince Sola, Humberto; Herrera, Francisco; Automática y Computación; Automatika eta Konputazioa
In the first part of this paper we reviewed the fingerprint classification literature from two different perspectives: the feature extraction and the classifier learning. Aiming at answering the question of which among the reviewed methods would perform better in a real implementation we end up in a discussion which showed the difficulty in answering this question. No previous comparison exists in the literature and comparisons among papers are done with different experimental frameworks. Moreover, the difficulty in implementing published methods was stated due to the lack of details in their description, parameters and the fact that no source code is shared. For this reason, in this paper we will go through a deep experimental study following the proposed double perspective. In order to do so, we have carefully implemented some of the most relevant feature extraction methods according to the explanations found in the corresponding papers and we have tested their performance with different classifiers, including those specific proposals made by the authors. Our aim is to develop an objective experimental study in a common framework, which has not been done before and which can serve as a baseline for future works on the topic. This way, we will not only test their quality, but their reusability by other researchers and will be able to indicate which proposals could be considered for future developments. Furthermore, we will show that combining different feature extraction models in an ensemble can lead to a superior performance, significantly increasing the results obtained by individual models.
Open Access
Addressing the overlapping data problem in classification using the one-vs-one decomposition strategy
(IEEE, 2019) Sáez, José Antonio; Galar Idoate, Mikel; Krawczyk, Bartosz; Institute of Smart Cities - ISC
Learning good-performing classifiers from data with easily separable classes is not usually a difficult task for most of the algorithms. However, problems affecting classifier performance may arise when samples from different classes share similar characteristics or are overlapped, since the boundaries of each class may not be clearly defined. In order to address this problem, the majority of existing works in the literature propose to either adapt well-known algorithms to reduce the negative impact of overlapping or modify the original data by introducing/removing features which decrease the overlapping region. However, these approaches may present some drawbacks: the changes in specific algorithms may not be useful for other methods and modifying the original data can produce variable results depending on data characteristics and the technique used later. An unexplored and interesting research line to deal with the overlapping phenomenon consists of decomposing the problem into several binary subproblems to reduce its complexity, diminishing the negative effects of overlapping. Based on this novel idea in the field of overlapping data, this paper proposes the usage of the One-vs-One (OVO) strategy to alleviate the presence of overlapping, without modifying existing algorithms or data conformations as suggested by previous works. To test the suitability of the OVO approach with overlapping data, and due to the lack of proposals in the specialized literature, this research also introduces a novel scheme to artificially induce overlapping in real-world datasets, which enables us to simulate different types and levels of overlapping among the classes. The results obtained show that the methods using the OVO achieve better performances when considering data with overlapped classes than those dealing with all classes at the same time.
Open Access
Unsupervised fuzzy measure learning for classifier ensembles from coalitions performance
(IEEE, 2020) Uriz Martín, Mikel Xabier; Paternain Dallo, Daniel; Domínguez Catena, Iris; Bustince Sola, Humberto; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA13
In Machine Learning an ensemble refers to the combination of several classifiers with the objective of improving the performance of every one of its counterparts. To design an ensemble two main aspects must be considered: how to create a diverse set of classifiers and how to combine their outputs. This work focuses on the latter task. More specifically, we focus on the usage of aggregation functions based on fuzzy measures, such as the Sugeno and Choquet integrals, since they allow to model the coalitions and interactions among the members of the ensemble. In this scenario the challenge is how to construct a fuzzy measure that models the relations among the members of the ensemble. We focus on unsupervised methods for fuzzy measure construction, review existing alternatives and categorize them depending on their features. Furthermore, we intend to address the weaknesses of previous alternatives by proposing a new construction method that obtains the fuzzy measure directly evaluating the performance of each possible subset of classifiers, which can be efficiently computed. To test the usefulness of the proposed fuzzy measure, we focus on the application of ensembles for imbalanced datasets. We consider a set of 66 imbalanced datasets and develop a complete experimental study comparing the reviewed methods and our proposal.
Open Access
INFFC: an iterative class noise filter based on the fusion of classifiers with noise sensitivity control
(Elsevier, 2015) Sáez, José Antonio; Galar Idoate, Mikel; Luengo, Julián; Herrera, Francisco; Automática y Computación; Automatika eta Konputazioa
In classification, noise may deteriorate the system performance and increase the complexity of the models built. In order to mitigate its consequences, several approaches have been proposed in the literature. Among them, noise filtering, which removes noisy examples from the training data, is one of the most used techniques. This paper proposes a new noise filtering method that combines several filtering strategies in order to increase the accuracy of the classification algorithms used after the filtering process. The filtering is based on the fusion of the predictions of several classifiers used to detect the presence of noise. We translate the idea behind multiple classifier systems, where the information gathered from different models is combined, to noise filtering. In this way, we consider the combination of classifiers instead of using only one to detect noise. Additionally, the proposed method follows an iterative noise filtering scheme that allows us to avoid the usage of detected noisy examples in each new iteration of the filtering process. Finally, we introduce a noisy score to control the filtering sensitivity, in such a way that the amount of noisy examples removed in each iteration can be adapted to the necessities of the practitioner. The first two strategies (use of multiple classifiers and iterative filtering) are used to improve the filtering accuracy, whereas the last one (the noisy score) controls the level of conservation of the filter removing potentially noisy examples. The validity of the proposed method is studied in an exhaustive experimental study. We compare the new filtering method against several state-of-the-art methods to deal with datasets with class noise and study their efficacy in three classifiers with different sensitivity to noise.