EUSC: a clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification
Date
2021
Version
Open access
Type
Article
Version
Accepted version
Impact
10.1016/j.asoc.2020.107033
Abstract
Learning from imbalanced datasets is in high demand in real-world applications and is a challenge for standard classifiers, which tend to be biased towards the classes with the majority of the examples. Undersampling approaches reduce the size of the majority class to balance the class distributions. Evolutionary-based approaches are prominent, treating undersampling as a binary optimisation problem that determines which examples are removed. However, their use is limited to small datasets due to fitness evaluation costs. This work proposes a two-stage clustering-based surrogate model that enables evolutionary undersampling to compute fitness values faster. The main novelty lies in the development of a surrogate model for binary optimisation which is based on the meaning (phenotype) of the solutions rather than their binary representation (genotype). We conduct an evaluation on 44 imbalanced datasets, showing that, in comparison with the original evolutionary undersampling, we can save up to 83% of the runtime without significantly deteriorating the classification performance.
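To make the binary-optimisation view in the abstract concrete, here is a minimal sketch of how an undersampling candidate could be encoded and scored. It is not the authors' implementation: the fitness used here (geometric mean of per-class recall under a leave-one-out 1-nearest-neighbour classifier) and the names `sq_dist` and `fitness` are assumptions chosen for illustration. The surrogate model itself is not shown.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(mask, majority, minority):
    """Score one candidate (illustrative fitness, not taken from the paper).

    mask[i] == 1 keeps majority[i]; mask[i] == 0 removes it (the genotype).
    The reduced training set it selects is the phenotype.
    Assumes at least two minority examples, so a neighbour always exists.
    """
    data = [(x, 0) for x in majority] + [(x, 1) for x in minority]
    # Phenotype: indices of kept majority examples plus every minority example.
    kept = [i for i, bit in enumerate(mask) if bit]
    kept += list(range(len(majority), len(data)))
    hits = [0, 0]  # correctly classified examples per class
    for i, (point, label) in enumerate(data):
        # Leave-one-out: an example may not be its own nearest neighbour.
        neighbours = [j for j in kept if j != i]
        nearest = min(neighbours, key=lambda j: sq_dist(point, data[j][0]))
        if data[nearest][1] == label:
            hits[label] += 1
    tnr = hits[0] / len(majority)  # recall on the majority class
    tpr = hits[1] / len(minority)  # recall on the minority class
    return math.sqrt(tpr * tnr)

# Toy data: one majority point sits close to the minority class.
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (2.6, 2.6)]
minority = [(3, 3), (3, 4)]
keep_all = fitness([1, 1, 1, 1, 1], majority, minority)
drop_borderline = fitness([1, 1, 1, 1, 0], majority, minority)
print(keep_all, drop_borderline)
```

In this sketch every evaluation compares each example with every kept example, so the cost of a single fitness call grows roughly quadratically with dataset size. That cost is the kind of bottleneck the abstract refers to when it says evolutionary undersampling is limited to small datasets.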
Subjects
Data preprocessing,
Evolutionary undersampling,
Fitness approximation,
Imbalanced classification,
Surrogate models
Publisher
Elsevier
Published in
Applied Soft Computing, 101 (2021) 107033
Department
Universidad Pública de Navarra. Departamento de Automática y Computación /
Nafarroako Unibertsitate Publikoa. Automatika eta Konputazioa Saila
Publisher's version
Funding bodies
The work of H. Lam Le was funded by a Ph.D. scholarship from the School of Computer Science of the University of Nottingham, United Kingdom.