Temporal analysis of distribution shifts in malware classification for digital forensics

Zola, Francesco; Bruse, Jan Lukas; Galar Idoate, Mikel

doi:10.1109/EuroSPW59978.2023.00054

dc.creator	Zola, Francesco	es_ES
dc.creator	Bruse, Jan Lukas	es_ES
dc.creator	Galar Idoate, Mikel	es_ES
dc.date.accessioned	2023-11-15T13:18:58Z
dc.date.available	2023-11-15T13:18:58Z
dc.date.issued	2023
dc.identifier.citation	Zola, F., Bruse, J. L., Galar, M. (2023) Temporal analysis of distribution shifts in malware classification for digital forensics. In 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW) 439-450. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/EuroSPW59978.2023.00054.	en
dc.identifier.isbn	979-8-3503-2720-5
dc.identifier.uri	https://hdl.handle.net/2454/46779
dc.description.abstract	In recent years, malware diversity and complexity have increased substantially, so the detection and classification of malware families have become one of the key objectives of information security. Machine learning (ML)-based approaches have been proposed to tackle this problem. However, most of these approaches focus on achieving high classification performance scores in static scenarios, without taking into account a key feature of malware: it is constantly evolving. As a result, ML models become outdated and perform poorly after only a few months, leaving stakeholders exposed to potential security risks. With this work, our aim is to highlight the issues that may arise when applying ML-based classification to malware data. We propose a three-step approach to carry out a forensic exploration of model failures. In the first step, we evaluate and compare the concept drift generated by models trained using a rolling-window approach for selecting the training dataset. In the second step, we evaluate model drift based on the amount of temporal information used in the training dataset. Finally, we perform an in-depth misclassification and feature analysis to emphasize the interpretation of the results and to highlight drift causes. We conclude that caution is warranted when training ML models for malware analysis, as concept drift and clear performance drops were observed even for models trained on larger datasets. Based on our results, it may be more beneficial to train models on less but more recent data and re-train them after a few months in order to maintain performance.	en
dc.description.sponsorship	This work has been partially supported by the European Union’s Horizon 2020 Research and Innovation Programme under the project STARLIGHT (Grant Agreement No. 101021797).	en
dc.format.mimetype	application/pdf	en
dc.language.iso	eng	en
dc.publisher	IEEE	en
dc.relation.ispartof	2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). Piscataway: Institute of Electrical and Electronics Engineers Inc.; 2023. p.439-450 979-8-3503-2720-5	en
dc.rights	© 2023, Francesco Zola. Under license to IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work	en
dc.subject	Concept drift	en
dc.subject	Explainability	en
dc.subject	Forensic exploration	en
dc.subject	Malware classification	en
dc.subject	Temporal analysis	en
dc.title	Temporal analysis of distribution shifts in malware classification for digital forensics	en
dc.type	Conference contribution	en
dc.type	info:eu-repo/semantics/conferenceObject	en
dc.date.updated	2023-11-15T13:11:33Z
dc.contributor.department	Institute of Smart Cities - ISC	en
dc.rights.accessRights	Open access	en
dc.rights.accessRights	info:eu-repo/semantics/openAccess	en
dc.identifier.doi	10.1109/EuroSPW59978.2023.00054
dc.relation.projectID	info:eu-repo/grantAgreement/European Commission/Horizon 2020 Framework Programme/101021797	en
dc.relation.publisherversion	https://doi.org/10.1109/EuroSPW59978.2023.00054
dc.type.version	Accepted version	en
dc.type.version	info:eu-repo/semantics/acceptedVersion	en
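
The rolling-window selection of training data mentioned in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `rolling_windows`, the monthly granularity, and the window lengths are all assumptions made for the example.

```python
from datetime import date


def month_index(d):
    """Map a date to a running month counter so months can be compared."""
    return d.year * 12 + d.month - 1


def rolling_windows(dates, train_months, test_months, step_months=1):
    """Yield (train_indices, test_indices) pairs over time-ordered windows.

    Hypothetical helper: each training window covers `train_months`
    consecutive months and is evaluated on the `test_months` that
    immediately follow it. The window then slides by `step_months`.
    """
    keys = [month_index(d) for d in dates]
    start, last = min(keys), max(keys)
    # Stop once the test window would extend past the last observed month.
    while start + train_months + test_months - 1 <= last:
        train_end = start + train_months
        test_end = train_end + test_months
        train = [i for i, k in enumerate(keys) if start <= k < train_end]
        test = [i for i, k in enumerate(keys) if train_end <= k < test_end]
        yield train, test
        start += step_months


if __name__ == "__main__":
    # Assumed toy data: one sample per month, January to June 2020.
    sample_dates = [date(2020, m, 1) for m in range(1, 7)]
    for train, test in rolling_windows(sample_dates, train_months=3, test_months=1):
        print("train:", train, "test:", test)
```

Training one model per window and scoring it only on later months keeps future samples out of the training set, which is what makes a drop in performance over time visible.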

Files in this item

Name: Zola_TemporalAnalysis.pdf
Size: 682.1Kb
Format: PDF

This item appears in the following collection(s)

Conference communications and papers [808]
ISC conference communications and papers [227]
Research funded by the European Union (OpenAire) [260]