dc.contributor.advisor [es_ES]: Galar Idoate, Mikel
dc.contributor.advisor [es_ES]: Bruse, Jan Lukas
dc.creator [es_ES]: Zola, Francesco
dc.description.abstract [en]: The main goal of this thesis is to improve behavioral cybersecurity analysis using machine learning by exploiting graph structures and temporal dissection and by addressing class imbalance problems. This main objective is divided into four specific goals.

OBJ1: To study the influence of temporal resolution on highlighting micro-dynamics in the entity behavior classification problem. In real use cases, time-series information alone may not be enough to describe entity behavior. For this reason, we plan to exploit graph structures to integrate both structured and unstructured data into a representation of entities and their relationships. In this way, it will be possible to capture not only single temporal communications but also the overall behavior of these entities. Nevertheless, entity behaviors evolve over time, so a static graph may not be enough to describe all these changes. For this reason, we propose to use temporal dissection to create temporal subgraphs and to analyze how the temporal resolution influences graph creation and the entity behaviors within it. Furthermore, we propose to study how temporal granularity should be used to highlight network micro-dynamics and short-term behavioral changes, which can be a hint of suspicious activity.

OBJ2: To develop novel sampling methods that work with disconnected graphs, addressing imbalanced problems without changing component topology. The graph imbalance problem is common and challenging, and traditional graph sampling techniques that work directly on these structures cannot be used without modifying the graph's intrinsic information or introducing bias. Furthermore, existing techniques have proven limited when applied to disconnected graphs. For this reason, new resampling methods are needed that balance the number of nodes, can be applied directly to disconnected graphs, and do not alter component topologies. In particular, we propose to take advantage of the disconnected structure to detect and replicate the most relevant graph components without changing their topology, while applying traditional data-level strategies to the entity behaviors within them.

OBJ3: To study the usefulness of generative adversarial networks (GANs) for addressing the class imbalance problem in cybersecurity applications. Although traditional data-level preprocessing techniques have been effective at addressing class imbalance, they have also shown drawbacks on highly variable datasets, which are typical in cybersecurity. For this reason, new techniques that can exploit the overall data distribution to learn highly variable behaviors should be investigated. GANs have shown promising results in the image and video domains; however, extending them to tabular data is not trivial. We therefore propose to adapt GANs to cybersecurity data and exploit their ability to learn and reproduce the input distribution as an oversampling technique for the class imbalance problem. Furthermore, since no single GAN solution works for every scenario, we propose to study several GAN architectures under several training configurations to determine the best option for a cybersecurity application.

OBJ4: To analyze temporal data trends and performance drift to enhance cyber threat analysis. Temporal dynamics and newly arriving data can degrade prediction quality and compromise model reliability, so models can become outdated without anyone noticing. It is therefore very important to extract more insightful information from the application domain by analyzing data trends, learning processes, and performance drift over time. For this reason, we propose to develop a systematic approach for analyzing how data quality and quantity affect the learning process. Moreover, in the context of cyber threat intelligence (CTI), we propose to study the relations between temporal performance drift and the input data distribution to detect possible model limitations and enhance cyber threat analysis.
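
OBJ1 above describes cutting a timestamped communication graph into temporal subgraphs and studying how the window length, i.e. the temporal resolution, changes what an entity's behavior looks like. The Python sketch below is illustrative only and is not the thesis's implementation: it assumes a hypothetical list of (source, destination, timestamp) events, fixed non-overlapping windows, and a single behavioral feature, the number of distinct neighbors per window. The function names `temporal_dissection` and `degree_series` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# Hypothetical event format: (source entity, destination entity, timestamp in seconds)
Edge = Tuple[str, str, float]
# One undirected subgraph per window index, stored as adjacency sets
Subgraphs = Dict[int, Dict[str, Set[str]]]


def temporal_dissection(edges: List[Edge], window: float) -> Subgraphs:
    """Split timestamped edges into non-overlapping windows of `window` seconds
    and build one undirected subgraph per window."""
    if window <= 0:
        raise ValueError("window must be positive")
    buckets: DefaultDict[int, DefaultDict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for src, dst, t in edges:
        idx = int(t // window)  # index of the window this event falls into
        buckets[idx][src].add(dst)
        buckets[idx][dst].add(src)
    # Plain dicts, ordered by window index
    return {idx: dict(adj) for idx, adj in sorted(buckets.items())}


def degree_series(subgraphs: Subgraphs, node: str) -> List[int]:
    """Distinct neighbors of `node` in every window between the first and last
    non-empty one (0 where the node is inactive)."""
    if not subgraphs:
        return []
    return [
        len(subgraphs.get(i, {}).get(node, set()))
        for i in range(min(subgraphs), max(subgraphs) + 1)
    ]


if __name__ == "__main__":
    # A host talks to one peer every ten minutes, then contacts four new peers within a minute
    events = [("h", "p1", t) for t in (0, 600, 1200, 1800, 2400)]
    events += [("h", f"x{i}", 3000 + 10 * i) for i in range(4)]
    print(degree_series(temporal_dissection(events, 3600), "h"))  # [5]
    print(degree_series(temporal_dissection(events, 600), "h"))   # [1, 1, 1, 1, 1, 4]
```

At a one-hour resolution the burst of new contacts is absorbed into a single degree-5 snapshot, whereas ten-minute windows show a stable degree of 1 followed by a jump to 4, the kind of short-term behavioral change that OBJ1 associates with suspicious activity.
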
dc.format.extent: 160 p.
dc.rights [es_ES]: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.subject [en]: Machine learning
dc.subject [en]: Graph representation
dc.subject [en]: Class imbalance
dc.subject [en]: Temporal dissection
dc.title [en]: Behavioral analysis in cybersecurity using machine learning: a study based on graph representation, class imbalance and temporal dissection
dc.type [es]: Doctoral thesis
dc.contributor.department [es_ES]: Estadística, Informática y Matemáticas (Statistics, Computer Science and Mathematics)
dc.contributor.department [eu]: Estatistika, Informatika eta Matematika
dc.rights.accessRights [es]: Open access
dc.description.doctorateProgram [es_ES]: Programa de Doctorado en Ciencias y Tecnologías Industriales (RD 99/2011); in English, Doctoral Program in Industrial Sciences and Technologies
dc.description.doctorateProgram [eu]: Industria Zientzietako eta Teknologietako Doktoretza Programa (ED 99/2011)
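
OBJ2 in the abstract proposes balancing node classes on a disconnected graph by replicating its most relevant components instead of editing edges. The Python sketch below is one illustrative reading of that idea, not the method developed in the thesis: it assumes every node carries a class label, uses a hypothetical relevance score (the share of minority-class nodes in a component), a hypothetical `#copyN` naming scheme for replicated nodes, and leaves out the data-level handling of node features that the abstract also mentions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets


def connected_components(graph: Graph) -> List[Set[str]]:
    """Breadth-first search from every unvisited node."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            for neighbor in graph[queue.popleft()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def oversample_components(
    graph: Graph, labels: Dict[str, str], minority: str, target: int
) -> Tuple[Graph, Dict[str, str]]:
    """Copy whole components that contain `minority` nodes until the graph holds at
    least `target` of them. Each copy reuses the component's edge pattern on renamed
    nodes, so no existing component is modified and every copy has the same topology."""
    graph = {node: set(nbrs) for node, nbrs in graph.items()}  # do not mutate the input
    labels = dict(labels)

    def minority_count(component: Set[str]) -> int:
        return sum(labels[n] == minority for n in component)

    # Assumed relevance score: share of minority nodes in the component
    ranked = sorted(
        (c for c in connected_components(graph) if minority_count(c) > 0),
        key=lambda c: minority_count(c) / len(c),
        reverse=True,
    )
    total = sum(label == minority for label in labels.values())
    copy_id = 0
    while ranked and total < target:
        for component in ranked:
            if total >= target:
                break
            copy_id += 1
            rename = {n: f"{n}#copy{copy_id}" for n in component}  # hypothetical naming scheme
            for n in component:
                graph[rename[n]] = {rename[nb] for nb in graph[n]}
                labels[rename[n]] = labels[n]
            total += minority_count(component)
    return graph, labels


if __name__ == "__main__":
    g = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c", "e"}, "e": {"d"}, "f": set()}
    y = {"a": "attack", "b": "attack", "d": "attack", "c": "normal", "e": "normal", "f": "normal"}
    new_g, new_y = oversample_components(g, y, "attack", 6)
    print(sum(v == "attack" for v in new_y.values()), len(connected_components(new_g)))  # 6 5
```

Because copies are disjoint from the original components, the number of connected components grows by one per copy while every original component keeps its exact structure; in the example, three minority nodes become six and the graph goes from three to five components.
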

Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
