EUSC: a clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification

Date
2021
Version
Open access
Type
Article
Version
Accepted version
DOI
10.1016/j.asoc.2020.107033
Abstract
Learning from imbalanced datasets is in high demand in real-world applications, yet it remains a challenge for standard classifiers, which tend to be biased towards the classes that hold the majority of the examples. Undersampling approaches reduce the size of the majority class to balance the class distributions. Evolutionary approaches are prominent among them, treating undersampling as a binary optimisation problem that determines which examples are removed. However, their use is limited to small datasets because of the cost of fitness evaluation. This work proposes a two-stage clustering-based surrogate model that enables evolutionary undersampling to compute fitness values faster. The main novelty lies in the development of a surrogate model for binary optimisation that is based on the meaning (phenotype) of candidate solutions rather than their binary representation (genotype). We conduct an evaluation on 44 imbalanced datasets, showing that, in comparison with the original evolutionary undersampling, we can save up to 83% of the runtime without significantly deteriorating the classification performance.
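As a rough illustration of the binary formulation described in the abstract, the sketch below encodes a candidate solution as a bit mask over the majority class (genotype), decodes it into the training subset it selects (phenotype), and scores it. This is not the authors' implementation: the function names, the leave-one-out 1-NN classifier, the geometric-mean score, the balance penalty of 0.2 and the toy data are all assumptions made for the example. It also hints at why fitness evaluation is costly, since every candidate mask requires classifying the whole training set.

```python
import math

def decode(mask, majority, minority):
    """Phenotype of a chromosome: kept majority examples plus all minority ones."""
    kept = [(("maj", i), x, 0)
            for i, (bit, x) in enumerate(zip(mask, majority)) if bit]
    return kept + [(("min", j), x, 1) for j, x in enumerate(minority)]

def predict_1nn(subset, key, point):
    """Label of the nearest neighbour in subset, skipping the query point itself."""
    candidates = [(x, label) for k, x, label in subset if k != key]
    return min(candidates, key=lambda c: math.dist(c[0], point))[1]

def fitness(mask, majority, minority, penalty=0.2):
    """Assumed fitness: geometric mean of per-class accuracy minus a balance penalty."""
    subset = decode(mask, majority, minority)
    # Every evaluation classifies the full training set, which is the costly part.
    tn = sum(predict_1nn(subset, ("maj", i), x) == 0
             for i, x in enumerate(majority))
    tp = sum(predict_1nn(subset, ("min", j), x) == 1
             for j, x in enumerate(minority))
    g_mean = math.sqrt((tp / len(minority)) * (tn / len(majority)))
    n_kept = sum(mask)
    imbalance = abs(1 - len(minority) / n_kept) if n_kept else 1.0
    return g_mean - penalty * imbalance

# Hypothetical toy data: six majority points, two minority points.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.8, 0.8), (0.2, 0.2)]
minority = [(1.0, 1.0), (1.0, 0.9)]

keep_all = [1, 1, 1, 1, 1, 1]   # no undersampling
balanced = [1, 0, 0, 1, 0, 0]   # keeps two majority examples

score_all = fitness(keep_all, majority, minority)
score_balanced = fitness(balanced, majority, minority)
```

On this toy data the balanced mask scores higher than keeping every majority example, because both masks classify the same points correctly while only the full mask pays the balance penalty. An evolutionary search would call `fitness` for every individual in every generation, which is the cost a surrogate model aims to reduce.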
Subject
Data preprocessing,
Evolutionary undersampling,
Fitness approximation,
Imbalanced classification,
Surrogate models
Publisher
Elsevier
Published in
Applied Soft Computing, 101 (2021) 107033
Department
Universidad Pública de Navarra. Departamento de Automática y Computación /
Nafarroako Unibertsitate Publikoa. Automatika eta Konputazioa Saila
Publisher version
Sponsorship
The work of H. Lam Le was funded by a Ph.D. scholarship from the School of Computer Science of the University of Nottingham, United Kingdom.