Data stream clustering: introducing recursively extendable aggregation functions for incremental cluster fusion processes

Date

2025-03-07

Authors

Camargo, Heloisa A.
Asmus, Tiago da Cruz
Schick, L.
Andreu-Pérez, Javier
Dimuro, Graçaliz Pereira

Publisher

IEEE
Open access
Article
Accepted version

Project identifier

  • AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136627NB-I00/ES/

Abstract

In data stream (DS) learning, the system has to extract knowledge from data generated continuously, usually at high speed and in large volumes, making it impossible to store the entire set of data to be processed in batch mode. Hence, machine learning models must be built incrementally by processing the incoming examples, as data arrive, while updating the model to be compatible with the current data. In fuzzy DS clustering, the model can either absorb incoming data into existing clusters or initiate a new cluster. As the volume of data increases, there is a possibility that the clusters will overlap to the point where it is convenient to merge two or more clusters into one. Then, a cluster comparison measure (CM) should be applied, to decide whether such clusters should be combined, also in an incremental manner. This defines an incremental fusion process based on aggregation functions that can aggregate the incoming inputs without storing all the previous inputs. The objective of this article is to solve the fuzzy DS clustering problem of incrementally comparing fuzzy clusters on a formal basis. First, we formalize and operationalize incremental fusion processes of fuzzy clusters by introducing recursively extendable (RE) aggregation functions, studying construction methods and different classes of such functions. Second, we propose two approaches to compare clusters: 1) similarity and 2) overlapping between clusters, based on RE aggregation functions. Finally, we analyze the effect of those incremental CMs on the online and offline phases of the well-known fuzzy clustering algorithm d-FuzzStream, showing that our new approach outperforms the original algorithm and presents better or comparable performance to other state-of-the-art DS clustering algorithms found in the literature.
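The idea of aggregating incoming inputs without storing the previous ones can be illustrated with a minimal Python sketch. It is not the article's formal definition of RE aggregation functions, nor the d-FuzzStream implementation. The class names `RunningMean` and `RunningMax`, the helper `aggregate_stream`, and the sample values are hypothetical and chosen only for illustration. The sketch assumes inputs are degrees in [0, 1] and that each aggregator keeps a small fixed-size state that is updated as each value arrives.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """Arithmetic mean updated one input at a time (hypothetical helper)."""
    count: int = 0
    value: float = 0.0

    def update(self, x: float) -> float:
        # The new mean depends only on the old mean, the count, and x.
        self.count += 1
        self.value += (x - self.value) / self.count
        return self.value


@dataclass
class RunningMax:
    """Maximum updated one input at a time (hypothetical helper)."""
    value: float = 0.0  # assumes inputs lie in [0, 1]

    def update(self, x: float) -> float:
        # The new maximum depends only on the old maximum and x.
        self.value = max(self.value, x)
        return self.value


def aggregate_stream(aggregator, stream):
    """Feed values one by one; no past input is ever stored."""
    result = None
    for x in stream:
        result = aggregator.update(x)
    return result


# Illustrative values only, not data from the article.
degrees = [0.2, 0.4, 0.9]
mean_result = aggregate_stream(RunningMean(), degrees)
max_result = aggregate_stream(RunningMax(), degrees)
```

Both aggregators use constant memory regardless of how many values have been seen, which is the property that makes this style of fusion suitable for streams.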

Keywords

Data streams, Fuzzy clustering, Similarity measures, Overlap indices, Aggregation functions

Department

Statistics, Computer Science and Mathematics / Institute of Smart Cities - ISC

Citation

Urio-Larrea, A., Camargo, H., Lucca, G., Asmus, T., Marco-Detchart, C., Schick, L., Lopez-Molina, C., Andreu-Perez, J., Bustince, H., Dimuro, G. P. (2025). Data stream clustering: introducing recursively extendable aggregation functions for incremental cluster fusion processes. IEEE Transactions on Cybernetics, 55(3), 1421-1435. https://doi.org/10.1109/TCYB.2025.3527862.

Rights

© 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work.

Documents in Academica-e are protected by copyright, with all rights reserved, unless otherwise indicated.