
dc.creator: Zola, Francesco
dc.creator: Bruse, Jan Lukas
dc.creator: Galar Idoate, Mikel
dc.date.accessioned: 2023-11-15T13:18:58Z
dc.date.available: 2023-11-15T13:18:58Z
dc.date.issued: 2023
dc.identifier.citation: Zola, F., Bruse, J. L., Galar, M. (2023). Temporal analysis of distribution shifts in malware classification for digital forensics. In 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 439-450. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/EuroSPW59978.2023.00054
dc.identifier.isbn: 979-8-3503-2720-5
dc.identifier.uri: https://hdl.handle.net/2454/46779
dc.description.abstract: In recent years, malware diversity and complexity have increased substantially, so the detection and classification of malware families have become one of the key objectives of information security. Machine learning (ML)-based approaches have been proposed to tackle this problem. However, most of these approaches focus on achieving high classification performance scores in static scenarios, without taking into account a key feature of malware: it is constantly evolving. This leads to ML models becoming outdated and performing poorly after only a few months, leaving stakeholders exposed to potential security risks. With this work, our aim is to highlight the issues that may arise when applying ML-based classification to malware data. We propose a three-step approach to carry out a forensic exploration of model failures. In the first step, we evaluate and compare the concept drift generated by models trained using a rolling-window approach for selecting the training dataset. In the second step, we evaluate model drift based on the amount of temporal information used in the training dataset. Finally, we perform an in-depth misclassification and feature analysis to emphasize the interpretation of the results and to highlight drift causes. We conclude that caution is warranted when training ML models for malware analysis, as concept drift and clear performance drops were observed even for models trained on larger datasets. Based on our results, it may be more beneficial to train models on less but more recent data and re-train them after a few months in order to maintain performance.
dc.description.sponsorship: This work has been partially supported by the European Union's Horizon 2020 Research and Innovation Programme under the project STARLIGHT (Grant Agreement No. 101021797).
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: IEEE
dc.relation.ispartof: 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). Piscataway: Institute of Electrical and Electronics Engineers Inc.; 2023. pp. 439-450. ISBN 979-8-3503-2720-5
dc.rights: © 2023, Francesco Zola. Under license to IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.subject: Concept drift
dc.subject: Explainability
dc.subject: Forensic exploration
dc.subject: Malware classification
dc.subject: Temporal analysis
dc.title: Temporal analysis of distribution shifts in malware classification for digital forensics
dc.type: Conference contribution
dc.type: info:eu-repo/semantics/conferenceObject
dc.date.updated: 2023-11-15T13:11:33Z
dc.contributor.department: Institute of Smart Cities - ISC
dc.rights.accessRights: Open access
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.identifier.doi: 10.1109/EuroSPW59978.2023.00054
dc.relation.projectID: info:eu-repo/grantAgreement/European Commission/Horizon 2020 Framework Programme/101021797
dc.relation.publisherversion: https://doi.org/10.1109/EuroSPW59978.2023.00054
dc.type.version: Accepted version
dc.type.version: info:eu-repo/semantics/acceptedVersion

