Temporal analysis of distribution shifts in malware classification for digital forensics
dc.creator | Zola, Francesco | es_ES |
dc.creator | Bruse, Jan Lukas | es_ES |
dc.creator | Galar Idoate, Mikel | es_ES |
dc.date.accessioned | 2023-11-15T13:18:58Z | |
dc.date.available | 2023-11-15T13:18:58Z | |
dc.date.issued | 2023 | |
dc.identifier.citation | Zola, F., Bruse, J. L., Galar, M. (2023) Temporal analysis of distribution shifts in malware classification for digital forensics. In 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 439-450. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/EuroSPW59978.2023.00054. | en |
dc.identifier.isbn | 979-8-3503-2720-5 | |
dc.identifier.uri | https://hdl.handle.net/2454/46779 | |
dc.description.abstract | In recent years, malware diversity and complexity have increased substantially, so the detection and classification of malware families have become one of the key objectives of information security. Machine learning (ML)-based approaches have been proposed to tackle this problem. However, most of these approaches focus on achieving high classification performance scores in static scenarios, without taking into account a key feature of malware: it is constantly evolving. This leads to ML models being outdated and performing poorly after only a few months, leaving stakeholders exposed to potential security risks. With this work, our aim is to highlight the issues that may arise when applying ML-based classification to malware data. We propose a three-step approach to carry out a forensic exploration of model failures. In particular, in the first step, we evaluate and compare the concept drift generated by models trained using a rolling-window approach for selecting the training dataset. In the second step, we evaluate model drift based on the amount of temporal information used in the training dataset. Finally, we perform an in-depth misclassification and feature analysis to emphasize the interpretation of the results and to highlight drift causes. We conclude that caution is warranted when training ML models for malware analysis, as concept drift and clear performance drops were observed even for models trained on larger datasets. Based on our results, it may be more beneficial to train models on fewer but recent data and re-train them after a few months in order to maintain performance. | en |
dc.description.sponsorship | This work has been partially supported by the European Union’s Horizon 2020 Research and Innovation Programme under the project STARLIGHT (Grant Agreement No. 101021797). | en |
dc.format.mimetype | application/pdf | en |
dc.language.iso | eng | en |
dc.publisher | IEEE | en |
dc.relation.ispartof | 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). Piscataway: Institute of Electrical and Electronics Engineers Inc.; 2023. p. 439-450. ISBN 979-8-3503-2720-5 | en |
dc.rights | © 2023, Francesco Zola. Under license to IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work. | en |
dc.subject | Concept drift | en |
dc.subject | Explainability | en |
dc.subject | Forensic exploration | en |
dc.subject | Malware classification | en |
dc.subject | Temporal analysis | en |
dc.title | Temporal analysis of distribution shifts in malware classification for digital forensics | en |
dc.type | Contribución a congreso / Biltzarrerako ekarpena | es |
dc.type | info:eu-repo/semantics/conferenceObject | en |
dc.date.updated | 2023-11-15T13:11:33Z | |
dc.contributor.department | Institute of Smart Cities - ISC | en |
dc.rights.accessRights | Acceso abierto / Sarbide irekia | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | en |
dc.identifier.doi | 10.1109/EuroSPW59978.2023.00054 | |
dc.relation.projectID | info:eu-repo/grantAgreement/European Commission/Horizon 2020 Framework Programme/101021797 | en |
dc.relation.publisherversion | https://doi.org/10.1109/EuroSPW59978.2023.00054 | |
dc.type.version | Versión aceptada / Onetsi den bertsioa | es |
dc.type.version | info:eu-repo/semantics/acceptedVersion | en |