Browsing by Author "Galar Idoate, Mikel"
Publication Open Access Additional feature layers from ordered aggregations for deep neural networks (IEEE, 2020) Domínguez Catena, Iris; Paternain Dallo, Daniel; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
In recent years we have seen huge advancements in the area of Machine Learning, especially with the use of Deep Neural Networks. One of the most relevant examples is image classification, where convolutional neural networks have proven to be a vital tool, hard to replace with any other technique. Although aggregation functions, such as OWA operators, have previously been used on top of neural networks, usually to aggregate the outputs of different networks or systems (ensembles), in this paper we propose and explore a new way of using OWA aggregations in deep learning. We implement OWA aggregations as a new layer inside a convolutional neural network. These layers are used to learn additional order-based information from the feature maps of a certain layer, and the newly generated information is then used as a complementary input for the following layers. We carry out several tests introducing the new layer in a VGG13-based reference network and show that this layer introduces new knowledge into the network without substantially increasing training times.

Publication Open Access Addressing the overlapping data problem in classification using the one-vs-one decomposition strategy (IEEE, 2019) Sáez, José Antonio; Galar Idoate, Mikel; Krawczyk, Bartosz; Institute of Smart Cities - ISC
Learning good-performing classifiers from data with easily separable classes is not usually a difficult task for most algorithms. However, problems affecting classifier performance may arise when samples from different classes share similar characteristics or overlap, since the boundaries of each class may not be clearly defined.
In order to address this problem, the majority of existing works in the literature propose to either adapt well-known algorithms to reduce the negative impact of overlapping or modify the original data by introducing/removing features which decrease the overlapping region. However, these approaches may present some drawbacks: the changes in specific algorithms may not be useful for other methods and modifying the original data can produce variable results depending on data characteristics and the technique used later. An unexplored and interesting research line to deal with the overlapping phenomenon consists of decomposing the problem into several binary subproblems to reduce its complexity, diminishing the negative effects of overlapping. Based on this novel idea in the field of overlapping data, this paper proposes the usage of the One-vs-One (OVO) strategy to alleviate the presence of overlapping, without modifying existing algorithms or data conformations as suggested by previous works. To test the suitability of the OVO approach with overlapping data, and due to the lack of proposals in the specialized literature, this research also introduces a novel scheme to artificially induce overlapping in real-world datasets, which enables us to simulate different types and levels of overlapping among the classes. 
The results obtained show that the methods using OVO achieve better performance on data with overlapped classes than those dealing with all classes at the same time.

Publication Open Access Aggregation functions to combine RGB color channels in stereo matching (Optical Society of America, 2013) Galar Idoate, Mikel; Jurío Munárriz, Aránzazu; López Molina, Carlos; Sanz Delgado, José Antonio; Paternain Dallo, Daniel; Bustince Sola, Humberto; Automática y Computación; Automatika eta Konputazioa; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
In this paper we present a comparison study between different aggregation functions for the combination of RGB color channels in the stereo matching problem. We introduce color information from images into the stereo matching algorithm by aggregating the similarities of the RGB channels, which are calculated independently. We compare the accuracy of different stereo matching algorithms and aggregation functions. We show experimentally that the best function depends on the stereo matching algorithm considered, but the dual of the geometric mean excels as the most robust aggregation.

Publication Open Access Algoritmo evolutivo para la optimización y generación de rutas de recogida selectiva de basura. Ahorro en costes y emisiones de Co2 (2016) Rodríguez Alfaro, Iosu; Sanz Delgado, José Antonio; Galar Idoate, Mikel; Escuela Técnica Superior de Ingenieros Industriales y de Telecomunicación; Telekomunikazio eta Industria Ingeniarien Goi Mailako Eskola Teknikoa
Selective collection of refuse and urban waste provides a very important service to society. Every day, refuse collection vehicles cover many routes. However, because of the vehicles' high fuel consumption, their pollutant gas emissions are a negative factor for society.
Today, making these collection routes as low-polluting as possible is a success factor for both citizens and the company that provides the service. This master's thesis (TFM) studies an evolutionary algorithm for optimizing existing routes, as well as an algorithm capable of generating new routes from a set of containers. The goal of these algorithms is always to obtain routes that reduce route cost and fuel consumption as well as CO2 emissions, and therefore yield economic savings.

Publication Open Access Análisis de sentimientos en armonías y melodías mediante Deep Learning (2019) Dendarieta Sarries, Xabier; Galar Idoate, Mikel; Escuela Técnica Superior de Ingenieros Industriales y de Telecomunicación; Telekomunikazio eta Industria Ingeniarien Goi Mailako Eskola Teknikoa
This project aims to determine the ability of deep neural networks (Deep Learning) to analyze the sentiment conveyed by music. Music conveys feelings in a complex way and is therefore difficult to handle. To carry out this work, the audio signal is transformed into a frequency-based representation (spectrogram), which the neural networks use to classify each audio fragment into one of the established sentiments. We first study the sentiment classes to consider, and then create our own database on which to apply the Deep Learning algorithms. We divide the problem into two parts: classifying the energy a song conveys, and classifying the pleasure it conveys. We run tests with several types of recurrent neural networks, obtaining good results for energy classification and poor results for pleasure classification. We also test a convolutional network in order to compare the different network types.
We believe the main problem lies in the database we created, in both its size and its consistency. The technology as such looks promising despite most of the results.

Publication Embargo Análisis, diseño e implemetación de un set-up de captura masiva de imágenes para el entrenamiento de modelos de Deep learning. (2021) Hulsman Bordonaba, Iñaki; Galar Idoate, Mikel; Paternain Dallo, Daniel; Escuela Técnica Superior de Ingeniería Industrial, Informática y de Telecomunicación; Industria, Informatika eta Telekomunikazio Ingeniaritzako Goi Mailako Eskola Teknikoa
This work is part of the "Emotional Films" project at the Universidad Pública de Navarra. Emotional Films aims to use Deep Learning to develop an emotion detector able to identify the emotion a viewer expresses while watching a film and, based on that emotion, modify certain elements of the footage in real time. In this way the film always adapts to the viewer's tastes. This work covers the analysis, design and implementation of the multi-camera set-up for image capture; the captured images will be used to train the emotion detection model.
Among other things, it includes: a literature review of image capture set-ups similar to the one to be implemented, an initial design of the structure and camera layout, the preparation of several automated lighting environments, the design and implementation of the main desktop application for image capture, the study and programming of a depth-sensing camera and its corresponding libraries, and other first steps belonging to the initial phase of the project.

Publication Open Access Análisis, diseño y despliegue de una base de datos orientada a grafos para la investigación de Derivaciones de Responsabilidades (2019) Carabantes Guerrero, Iván; Galar Idoate, Mikel; Escuela Técnica Superior de Ingenieros Industriales y de Telecomunicación; Telekomunikazio eta Industria Ingeniarien Goi Mailako Eskola Teknikoa
The main purpose of the project is to develop a system for investigating Derivaciones de Responsabilidades (derivations of liability) for the Debt Management department of the company Tracasa Instrumental S.L, which collaborates with the Hacienda Tributaria de Navarra (HTN). Currently, the investigation process is manual and costly, and there is no tool to support it. The process is based on investigating the possible subjects of a derivation of liability, so the relationships (legal, family...) between persons (natural, legal) need to be captured together with indicators of those persons' financial capacity. To meet this need, the deployment of a graph-oriented database has been proposed.
This involves surveying the databases of this type available on the market, producing a new analysis and design of the Hacienda data model, and devising an approach for supplying the data, among many other tasks.

Publication Open Access Aprendizaje de distancias basadas en disimilitudes para el algoritmo de clasificación KNN (2015) Uriz Martín, Mikel Xabier; Galar Idoate, Mikel; Escuela Técnica Superior de Ingenieros Industriales y de Telecomunicación; Telekomunikazio eta Industria Ingeniarien Goi Mailako Eskola Teknikoa
The goal of this project is to try to improve the KNN (k nearest neighbors) algorithm by replacing the classical Euclidean distance with parameterized dissimilarities that are tuned using a genetic algorithm. The idea is for the genetic algorithm to learn different parameters, which are then used to compute the distances between instances instead of other classical distances such as the Euclidean one. We also consider performing instance and attribute selection, so that the genetic algorithm can exclude noisy instances. This technique also speeds up the distance computation, since reducing the number of instances and attributes means fewer calculations are required. Finally, we compare the resulting variants with the original KNN algorithm to see whether classification improves.

Publication Open Access Attacking bitcoin anonymity: generative adversarial networks for improving bitcoin entity classification (Springer, 2022) Zola, Francesco; Segurola-Gil, Lander; Bruse, Jan Lukas; Galar Idoate, Mikel; Orduna Urrutia, Raúl; Institute of Smart Cities - ISC
Classification of Bitcoin entities is an important task to help Law Enforcement Agencies reduce anonymity in the Bitcoin blockchain network and to detect classes more tied to illegal activities.
However, this task is strongly conditioned by a severe class imbalance in Bitcoin datasets. Existing approaches for addressing the class imbalance problem can be improved considering generative adversarial networks (GANs) that can boost data diversity. However, GANs are mainly applied in computer vision and natural language processing tasks, but not in Bitcoin entity behaviour classification where they may be useful for learning and generating synthetic behaviours. Therefore, in this work, we present a novel approach to address the class imbalance in Bitcoin entity classification by applying GANs. In particular, three GAN architectures were implemented and compared in order to find the most suitable architecture for generating Bitcoin entity behaviours. More specifically, GANs were used to address the Bitcoin imbalance problem by generating synthetic data of the less represented classes before training the final entity classifier. The results were used to evaluate the capabilities of the different GAN architectures in terms of training time, performance, repeatability, and computational costs. Finally, the results achieved by the proposed GAN-based resampling were compared with those obtained using five well-known data-level preprocessing techniques. Models trained with data resampled with our GAN-based approach achieved the highest accuracy improvements and were among the best in terms of precision, recall and f1-score. Together with Random Oversampling (ROS), GANs proved to be strong contenders in addressing Bitcoin class imbalance and consequently in reducing Bitcoin entity anonymity (overall and per-class classification performance). To the best of our knowledge, this is the first work to explore the advantages and limitations of GANs in generating specific Bitcoin data and “attacking” Bitcoin anonymity. 
The proposed methods ultimately demonstrate that in Bitcoin applications, GANs are indeed able to learn the data distribution and generate new samples starting from a very limited class representation, which leads to better detection of classes related to illegal activities.

Publication Open Access Behavioral analysis in cybersecurity using machine learning: a study based on graph representation, class imbalance and temporal dissection (2022) Zola, Francesco; Galar Idoate, Mikel; Bruse, Jan Lukas; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
The main goal of this thesis is to improve behavioral cybersecurity analysis using machine learning, exploiting graph structures and temporal dissection, and addressing imbalance problems. This main objective is divided into four specific goals:
OBJ1: To study the influence of the temporal resolution on highlighting micro-dynamics in the entity behavior classification problem. In real use cases, time-series information may not be enough for the entity behavior classification problem. For this reason, we plan to exploit graph structures for integrating both structured and unstructured data in a representation of entities and their relationships. In this way, it will be possible to appreciate not only the single temporal communication but the whole behavior of these entities. Nevertheless, entity behaviors evolve over time and therefore a static graph may not be enough to describe all these changes. For this reason, we propose to use a temporal dissection for creating temporal subgraphs and thereby analyze the influence of the temporal resolution on the graph creation and the entity behaviors within. Furthermore, we propose to study how the temporal granularity should be used for highlighting network micro-dynamics and short-term behavioral changes, which can be a hint of suspicious activities.
OBJ2: To develop novel sampling methods that work with disconnected graphs for addressing imbalanced problems while avoiding component topology changes. The graph imbalance problem is very common and challenging, and traditional graph sampling techniques that work directly on these structures cannot be used without modifying the graph's intrinsic information or introducing bias. Furthermore, existing techniques have proven to be limited when disconnected graphs are used. For this reason, novel resampling methods for balancing the number of nodes that can be directly applied over disconnected graphs, without altering component topologies, need to be introduced. In particular, we propose to take advantage of the existence of disconnected graphs to detect and replicate the most relevant graph components without changing their topology, while considering traditional data-level strategies for handling the entity behaviors within.
OBJ3: To study the usefulness of generative adversarial networks for addressing the class imbalance problem in cybersecurity applications. Although traditional data-level pre-processing techniques have proven effective for addressing class imbalance problems, they have also shown drawbacks when highly variable datasets are used, as happens in cybersecurity. For this reason, new techniques that can exploit the overall data distribution for learning highly variable behaviors should be investigated. In this sense, GANs have shown promising results in the image and video domain; however, their extension to tabular data is not trivial. For this reason, we propose to adapt GANs for working with cybersecurity data and exploit their ability to learn and reproduce the input distribution for addressing the class imbalance problem (as an oversampling technique).
Furthermore, since it is not possible to find a unique GAN solution that works for every scenario, we propose to study several GAN architectures with several training configurations to detect which is the best option for a cybersecurity application.
OBJ4: To analyze temporal data trends and performance drift for enhancing cyber threat analysis. Temporal dynamics and incoming new data can affect the quality of the predictions, compromising model reliability. This phenomenon causes models to become outdated without anyone noticing. In this sense, it is very important to be able to extract more insightful information from the application domain by analyzing data trends, learning processes, and performance drifts over time. For this reason, we propose to develop a systematic approach for analyzing how the quality and amount of data affect the learning process. Moreover, in the context of CTI, we propose to study the relations between temporal performance drifts and the input data distribution for detecting possible model limitations, enhancing cyber threat analysis.

Publication Open Access Bitcoin and cybersecurity: temporal dissection of blockchain data to unveil changes in entity behavioral patterns (MDPI, 2019) Zola, Francesco; Bruse, Jan Lukas; Eguimendia, María; Galar Idoate, Mikel; Orduna Urrutia, Raúl; Institute of Smart Cities - ISC
The Bitcoin network is not only vulnerable to cyber-attacks but currently represents the most frequently used cryptocurrency for concealing illicit activities. Typically, Bitcoin activity is monitored by decreasing the anonymity of its entities using machine learning-based techniques, which consider the whole blockchain. This entails two issues: first, it increases the complexity of the analysis, requiring greater effort, and, second, it may hide network micro-dynamics important for detecting short-term changes in entity behavioral patterns.
The aim of this paper is to address both issues by performing a 'temporal dissection' of the Bitcoin blockchain, i.e., dividing it into smaller temporal batches to achieve entity classification. The idea is that a machine learning model trained on a certain time-interval (batch) should achieve good classification performance when tested on another batch if entity behavioral patterns are similar. We apply cascading machine learning principles (a type of ensemble learning applying stacking techniques), introducing a 'k-fold cross-testing' concept across batches of varying size. Results show that the blockchain batch size used for entity classification could be reduced for certain classes (Exchange, Gambling, and eWallet), as classification rates did not vary significantly with batch size, suggesting that behavioral patterns did not change significantly over time. Mixer and Market class detection, however, can be negatively affected. A deeper analysis of Mining Pool behavior showed that models trained on recent data perform better than models trained on older data, suggesting that 'typical' Mining Pool behavior may be represented better by recent data. This work provides a first step towards uncovering entity behavioral changes via temporal dissection of blockchain data.

Publication Open Access Capas basadas en operadores OWA para Redes Neuronales Convolucionales (2020) Domínguez Catena, Iris; Galar Idoate, Mikel; Paternain Dallo, Daniel; Escuela Técnica Superior de Ingeniería Industrial, Informática y de Telecomunicación; Industria, Informatika eta Telekomunikazio Ingeniaritzako Goi Mailako Eskola Teknikoa
In this work we explore a new way of extending the capacity of Convolutional Neural Networks.
Specifically, we propose a new technique for generating additional information from the output of a convolutional block of a Convolutional Neural Network, using channel-level OWA operators, and using this new information to extend the input of the following layers of the network. We run several tests with this new technique, checking how different parameters affect the results, including the insertion point of the new information, the number of OWA operators applied, and the type of metric used to order the original information channels.

Publication Open Access CFM-BD: a distributed rule induction algorithm for building compact fuzzy models in Big Data classification problems (IEEE, 2020) Elkano Ilintxeta, Mikel; Sanz Delgado, José Antonio; Barrenechea Tartas, Edurne; Bustince Sola, Humberto; Galar Idoate, Mikel; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Estadística, Informática y Matemáticas
Interpretability has always been a major concern for fuzzy rule-based classifiers. The usage of human-readable models allows them to explain the reasoning behind their predictions and decisions. However, when it comes to Big Data classification problems, fuzzy rule-based classifiers have not been able to maintain the good tradeoff between accuracy and interpretability that has characterized these techniques in non-Big-Data environments. The most accurate methods build models composed of a large number of rules and fuzzy sets that are too complex, while those approaches focusing on interpretability do not provide state-of-the-art discrimination capabilities. In this paper, we propose a new distributed learning algorithm named CFM-BD to construct accurate and compact fuzzy rule-based classification systems for Big Data. This method has been specifically designed from scratch for Big Data problems and does not adapt or extend any existing algorithm.
The proposed learning process consists of three stages: preprocessing based on the probability integral transform theorem; rule induction inspired by the CHI-BD and Apriori algorithms; and rule selection by means of a global evolutionary optimization. We conducted a complete empirical study to test the performance of our approach in terms of accuracy, complexity, and runtime. The results obtained were compared and contrasted with four state-of-the-art fuzzy classifiers for Big Data (FBDT, FMDT, Chi-Spark-RS, and CHI-BD). According to this study, CFM-BD is able to provide competitive discrimination capabilities using significantly simpler models composed of a few rules of fewer than three antecedents, employing five linguistic labels for all variables.

Publication Open Access Construction of capacities from overlap indexes (Springer, 2017) Sanz Delgado, José Antonio; Galar Idoate, Mikel; Mesiar, Radko; Bustince Sola, Humberto; Fernández Fernández, Francisco Javier; Automatika eta Konputazioa; Institute of Smart Cities - ISC; Automática y Computación
In this chapter, we show how the concepts of overlap function and overlap index can be used to define fuzzy measures which depend on the specific data of each considered problem.

Publication Open Access Creación de un sistema para la aplicación de redes neuronales convolucionales en un entorno de visión artificial (2020) Errea López, Adrián; Galar Idoate, Mikel; Escuela Técnica Superior de Ingeniería Industrial, Informática y de Telecomunicación; Industria, Informatika eta Telekomunikazio Ingeniaritzako Goi Mailako Eskola Teknikoa
The work consists of building an application capable of integrating the Python programming language and all its functionality into a C++ environment for real-time image capture and processing from a camera used in machine vision production environments.
Specifically, an application is built that can connect to an industrial camera, configure it and capture photos, serving as the basis for applying Deep Learning to those images. Then, through the integration of Python in C++, convolutional neural networks are applied to the image obtained by the application in order to produce a result for each image (classification).

Publication Open Access d-Choquet integrals: Choquet integrals based on dissimilarities (Elsevier, 2020) Bustince Sola, Humberto; Mesiar, Radko; Fernández Fernández, Francisco Javier; Galar Idoate, Mikel; Paternain Dallo, Daniel; Altalhi, A. H.; Pereira Dimuro, Graçaliz; Bedregal, Benjamin; Takáč, Zdenko; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Estadística, Informática y Matemáticas; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA13
The paper introduces a new class of functions from [0,1]^n to [0,n] called d-Choquet integrals. These functions are a generalization of the 'standard' Choquet integral obtained by replacing the difference in the definition of the usual Choquet integral by a dissimilarity function. In particular, the class of all d-Choquet integrals encompasses the class of all 'standard' Choquet integrals, but the use of dissimilarities provides higher flexibility and generality. We show that some d-Choquet integrals are aggregation/pre-aggregation/averaging functions and some of them are not.
The conditions under which this happens are stated and other properties of the d-Choquet integrals are studied.

Publication Open Access A deep learning approach to aerial LiDAR point cloud segmentation (2021) Gutiérrez Lancho, Christian; Galar Idoate, Mikel; Escuela Técnica Superior de Ingeniería Industrial, Informática y de Telecomunicación; Industria, Informatika eta Telekomunikazio Ingeniaritzako Goi Mailako Eskola Teknikoa
Thanks to the technological progress of recent years, increasingly powerful hardware is available at lower prices. As a result, the information we are able to process is increasingly complex; for example, we can now work with images without difficulty. However, there is a type of data somewhat more complex than images that is steadily gaining ground, a specific kind of 3D data: point clouds.

Publication Open Access A deep learning approach to an enhanced building footprint and road detection in high-resolution satellite imagery (MDPI, 2021) Ayala Lauroba, Christian; Sesma Redín, Rubén; Aranda, Carlos; Galar Idoate, Mikel; Institute of Smart Cities - ISC; Gobierno de Navarra / Nafarroako Gobernua
The detection of building footprints and road networks has many useful applications including the monitoring of urban development, real-time navigation, etc. Taking into account that a great deal of human attention is required by these remote sensing tasks, a lot of effort has been made to automate them. However, the vast majority of the approaches rely on very high-resolution satellite imagery (<2.5 m) whose costs are not yet affordable for maintaining up-to-date maps. Working with the limited spatial resolution provided by high-resolution satellite imagery such as Sentinel-1 and Sentinel-2 (10 m) makes it hard to detect buildings and roads, since these labels may coexist within the same pixel.
This paper focuses on this problem and presents a novel methodology capable of detecting buildings and roads with sub-pixel width by increasing the resolution of the output masks. This methodology consists of fusing Sentinel-1 and Sentinel-2 data (at 10 m) together with OpenStreetMap to train deep learning models for building and road detection at 2.5 m. This becomes possible thanks to the usage of OpenStreetMap vector data, which can be rasterized to any desired resolution. Accordingly, a few simple yet effective modifications of the U-Net architecture are proposed to not only semantically segment the input image, but also to learn how to enhance the resolution of the output masks. As a result, generated mappings quadruple the input spatial resolution, closing the gap between satellite and aerial imagery for building and road detection. To properly evaluate the generalization capabilities of the proposed methodology, a dataset composed of 44 cities across the Spanish territory has been considered and divided into training and testing cities. Both quantitative and qualitative results show that high-resolution satellite imagery can be used for sub-pixel width building and road detection following the proper methodology.

Publication Open Access A Deep Learning approach to land use classification in high resolution satellite imagery (2020) Ayala Lauroba, Christian; Galar Idoate, Mikel; Escuela Técnica Superior de Ingeniería Industrial, Informática y de Telecomunicación; Industria, Informatika eta Telekomunikazio Ingeniaritzako Goi Mailako Eskola Teknikoa
Over recent years, interest in and the need for reliable, up-to-date information on land use and land cover have grown, and numerous local, national and international projects aim to create and update land use and land occupation databases.
In recent years there have been important technological advances in the remote sensing and satellite image processing sector. In Europe, research in Earth observation has been boosted by the Copernicus programme managed by the European Space Agency (ESA). This project focuses on fine-tuning a methodology for monitoring the degree of consolidation of land areas under urban development in cities. To that end, satellite images from the Copernicus programme are semantically segmented by applying innovative Deep Learning techniques. The results obtained were compared with those obtained through a semi-automatic process carried out by remote sensing professionals.

Publication Open Access Demographic bias in machine learning: measuring transference from dataset bias to model predictions (2024) Domínguez Catena, Iris; Galar Idoate, Mikel; Paternain Dallo, Daniel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
As artificial intelligence (AI) systems increasingly influence critical decisions in society, ensuring fairness and avoiding bias have become pressing challenges. This dissertation investigates demographic bias in machine learning, with a particular focus on measuring how bias transfers from datasets to model predictions. Using Facial Expression Recognition (FER) as a primary case study, we develop novel metrics and methodologies to quantify and analyze bias at both the dataset and model levels. The thesis makes several key contributions to the field of algorithmic fairness. We propose a comprehensive taxonomy of types of dataset bias and metrics available for each type. Through extensive evaluation on FER datasets, we demonstrate the effectiveness and limitations of these metrics in capturing different aspects of demographic bias.
Additionally, we introduce DSAP (Demographic Similarity from Auxiliary Profiles), a novel method for comparing datasets based on their demographic properties. DSAP enables interpretable bias measurement and analysis of demographic shifts between datasets, providing valuable insights for dataset curation and model development. Our research includes in-depth experiments examining the propagation of representational and stereotypical biases from datasets to FER models. Our findings reveal that while representational bias tends to be mitigated during model training, stereotypical bias is more likely to persist in model predictions. Furthermore, we present a framework for measuring bias transference from datasets to models across various bias induction scenarios. This analysis uncovers complex relationships between dataset bias and resulting model bias, highlighting the need for nuanced approaches to bias mitigation. Throughout the dissertation, we emphasize the importance of considering both representational and stereotypical biases in AI systems. Our work demonstrates that these biases can manifest and propagate differently, necessitating tailored strategies for detection and mitigation. By providing robust methodologies for quantifying and analyzing demographic bias, this research contributes to the broader goal of developing fairer and more equitable AI systems. The insights and tools presented here have implications beyond FER, offering valuable approaches for addressing bias in various machine learning applications. This dissertation paves the way for future work in algorithmic fairness, emphasizing the need for continued research into bias measurement, mitigation strategies, and the development of more inclusive AI technologies.