Domínguez Catena, Iris

Last Name: Domínguez Catena
First Name: Iris
Department: Estadística, Informática y Matemáticas
Institute: ISC. Institute of Smart Cities

Search Results

Now showing 1 - 5 of 5
  • Publication (Open Access)
    Less can be more: representational vs. stereotypical gender bias in facial expression recognition
    (Springer, 2024-10-14) Domínguez Catena, Iris; Paternain Dallo, Daniel; Jurío Munárriz, Aránzazu; Galar Idoate, Mikel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
    Machine learning models can inherit biases from their training data, leading to discriminatory or inaccurate predictions. This is particularly concerning with the increasing use of large, unsupervised datasets for training foundational models. Traditionally, demographic biases within these datasets have not been well understood, limiting our ability to understand how they propagate to the models themselves. To address this issue, this paper investigates the propagation of demographic biases from datasets into machine learning models. We focus on the gender demographic component, analyzing two types of bias: representational and stereotypical. For our analysis, we consider the domain of facial expression recognition (FER), a field known to exhibit biases in most popular datasets. We use AffectNet, one of the largest FER datasets, as our baseline for carefully designing and generating subsets that incorporate varying strengths of both representational and stereotypical bias. Subsequently, we train several models on these biased subsets, evaluating their performance on a common test set to assess the propagation of bias into the models' predictions. Our results show that representational bias has a weaker impact than expected. Models exhibit a good generalization ability even in the absence of one gender in the training dataset. Conversely, stereotypical bias has a significantly stronger impact, primarily concentrated on the biased class, although it can also influence predictions for unbiased classes. These results highlight the need for a bias analysis that differentiates between types of bias, which is crucial for the development of effective bias mitigation strategies. (An illustrative sketch of the biased-subset construction is included after this publication list.)
  • Publication (Open Access)
    DSAP: analyzing bias through demographic comparison of datasets
    (Elsevier, 2024-10-29) Domínguez Catena, Iris; Paternain Dallo, Daniel; Galar Idoate, Mikel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa; Gobierno de Navarra / Nafarroako Gobernua
    In the last few years, Artificial Intelligence (AI) systems have become increasingly widespread. Unfortunately, these systems can share many biases with human decision-making, including demographic biases. Often, these biases can be traced back to the data used for training, where large uncurated datasets have become the norm. Despite our awareness of these biases, we still lack general tools to detect, quantify, and compare them across different datasets. In this work, we propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of datasets. First, DSAP uses existing demographic estimation models to extract a dataset's demographic profile. Second, it applies a similarity metric to compare the demographic profiles of different datasets. While these individual components are well-known, their joint use for demographic dataset comparison is novel and has not been previously addressed in the literature. This approach allows three key applications: the identification of demographic blind spots and bias issues across datasets, the measurement of demographic bias, and the assessment of demographic shifts over time. DSAP can be used on datasets with or without explicit demographic information, provided that demographic information can be derived from the samples using auxiliary models, such as those for image or voice datasets. To show the usefulness of the proposed methodology, we consider the Facial Expression Recognition task, where demographic bias has previously been found. The three applications are studied over a set of twenty datasets with varying properties. The code is available at https://github.com/irisdominguez/DSAP. (A minimal two-step sketch of the methodology is included after this publication list.)
  • Publication (Open Access)
    Metrics for dataset demographic bias: a case study on facial expression recognition
    (IEEE, 2024) Domínguez Catena, Iris; Paternain Dallo, Daniel; Galar Idoate, Mikel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Universidad Pública de Navarra - Nafarroako Unibertsitate Publikoa
    Demographic biases in source datasets have been shown to be one of the causes of unfairness and discrimination in the predictions of Machine Learning models. One of the most prominent types of demographic bias is statistical imbalance in the representation of demographic groups in the datasets. In this paper, we study the measurement of these biases by reviewing the existing metrics, including those that can be borrowed from other disciplines. We develop a taxonomy for the classification of these metrics, providing a practical guide for the selection of appropriate metrics. To illustrate the utility of our framework, and to further understand the practical characteristics of the metrics, we conduct a case study of 20 datasets used in Facial Emotion Recognition (FER), analyzing the biases present in them. Our experimental results show that many metrics are redundant and that a reduced subset of metrics may be sufficient to measure the amount of demographic bias. The paper provides valuable insights for researchers in AI and related fields to mitigate dataset bias and improve the fairness and accuracy of AI models. (An example imbalance metric is sketched after this publication list.)
  • Publication (Open Access)
    Demographic bias in machine learning: measuring transference from dataset bias to model predictions
    (2024) Domínguez Catena, Iris; Galar Idoate, Mikel; Paternain Dallo, Daniel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
    As artificial intelligence (AI) systems increasingly influence critical decisions in society, ensuring fairness and avoiding bias have become pressing challenges. This dissertation investigates demographic bias in machine learning, with a particular focus on measuring how bias transfers from datasets to model predictions. Using Facial Expression Recognition (FER) as a primary case study, we develop novel metrics and methodologies to quantify and analyze bias at both the dataset and model levels. The thesis makes several key contributions to the field of algorithmic fairness. We propose a comprehensive taxonomy of types of dataset bias and metrics available for each type. Through extensive evaluation on FER datasets, we demonstrate the effectiveness and limitations of these metrics in capturing different aspects of demographic bias. Additionally, we introduce DSAP (Demographic Similarity from Auxiliary Profiles), a novel method for comparing datasets based on their demographic properties. DSAP enables interpretable bias measurement and analysis of demographic shifts between datasets, providing valuable insights for dataset curation and model development. Our research includes in-depth experiments examining the propagation of representational and stereotypical biases from datasets to FER models. Our findings reveal that while representational bias tends to be mitigated during model training, stereotypical bias is more likely to persist in model predictions. Furthermore, we present a framework for measuring bias transference from datasets to models across various bias induction scenarios. This analysis uncovers complex relationships between dataset bias and resulting model bias, highlighting the need for nuanced approaches to bias mitigation. Throughout the dissertation, we emphasize the importance of considering both representational and stereotypical biases in AI systems. Our work demonstrates that these biases can manifest and propagate differently, necessitating tailored strategies for detection and mitigation. By providing robust methodologies for quantifying and analyzing demographic bias, this research contributes to the broader goal of developing fairer and more equitable AI systems. The insights and tools presented here have implications beyond FER, offering valuable approaches for addressing bias in various machine learning applications. This dissertation paves the way for future work in algorithmic fairness, emphasizing the need for continued research into bias measurement, mitigation strategies, and the development of more inclusive AI technologies.
  • Publication (Open Access)
    Gender stereotyping impact in facial expression recognition
    (Springer, 2023) Domínguez Catena, Iris; Paternain Dallo, Daniel; Galar Idoate, Mikel; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute of Smart Cities - ISC; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
    Facial Expression Recognition (FER) uses images of faces to identify the emotional state of users, allowing for a closer interaction between humans and autonomous systems. Unfortunately, as the images naturally integrate some demographic information, such as apparent age, gender, and race of the subject, these systems are prone to demographic bias issues. In recent years, machine learning-based models have become the most popular approach to FER. These models require training on large datasets of facial expression images, and their generalization capabilities are strongly related to the characteristics of the dataset. In publicly available FER datasets, overall apparent gender representation is usually mostly balanced, but the representation within individual labels is not, embedding social stereotypes into the datasets and generating a potential for harm. Although this type of bias has been overlooked so far, it is important to understand the impact it may have in the context of FER. To do so, we use a popular FER dataset, FER+, to generate derivative datasets with different amounts of stereotypical bias by altering the gender proportions of certain labels. We then proceed to measure the discrepancy between the performance of the models trained on these datasets for the apparent gender groups. We observe a discrepancy in the recognition of certain emotions between genders of up to 29% under the worst bias conditions. Our results also suggest a safety range for stereotypical bias in a dataset that does not appear to produce stereotypical bias in the resulting model. Our findings support the need for a thorough bias analysis of public datasets in problems like FER, where a global balance of demographic representation can still hide other types of bias that harm certain demographic groups. (An illustrative sketch of the per-gender discrepancy measurement is included after this publication list.)
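
The subset-generation idea in "Less can be more" (first entry above) can be pictured with a small sampling helper: fixing the apparent-gender mix of a single emotion class induces stereotypical bias, while applying the same ratio to every class would shift the overall representation instead. The sketch below is an editorial illustration that assumes a sample layout of (image, emotion_label, apparent_gender); it is not the sampling procedure used in the paper.

```python
# Minimal sketch, assuming each sample is a tuple (image, emotion_label, apparent_gender);
# this is not the paper's released sampling code.
import random

def stereotypically_biased_subset(samples, target_label, female_ratio,
                                  class_size, seed=0):
    """Resample `target_label` to `class_size` items with the requested
    female/male mix; all other classes are returned untouched."""
    rng = random.Random(seed)
    in_class = [s for s in samples if s[1] == target_label]
    females = [s for s in in_class if s[2] == "female"]
    males = [s for s in in_class if s[2] == "male"]

    n_female = min(int(round(class_size * female_ratio)), len(females))
    n_male = min(class_size - n_female, len(males))
    biased_class = rng.sample(females, n_female) + rng.sample(males, n_male)

    others = [s for s in samples if s[1] != target_label]
    return others + biased_class
```

Applying the same ratio to every label rather than a single target label would alter the dataset-wide gender balance, which corresponds to the representational-bias condition the paper contrasts against.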
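
For the DSAP entry above, a minimal sketch of the two-step methodology (profile extraction with an auxiliary demographic model, then profile comparison) might look as follows. Helper names are hypothetical and the similarity shown is one illustrative choice, not necessarily the metric used in the paper; the actual implementation is the linked repository.

```python
# Minimal two-step sketch with hypothetical helper names; the authors' released
# implementation lives at https://github.com/irisdominguez/DSAP.
from collections import Counter

def demographic_profile(samples, estimate_group):
    """Step 1: run an auxiliary demographic model over every sample and return
    the normalized group distribution (the dataset's demographic profile)."""
    counts = Counter(estimate_group(s) for s in samples)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def profile_similarity(p, q):
    """Step 2: compare two profiles. Shown here: 1 - total variation distance,
    so 1.0 means identical composition and 0.0 means fully disjoint groups
    (an illustrative choice of similarity)."""
    groups = set(p) | set(q)
    return 1.0 - 0.5 * sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in groups)

# Example usage with any per-sample demographic estimator `predict_group`:
# s = profile_similarity(demographic_profile(dataset_a, predict_group),
#                        demographic_profile(dataset_b, predict_group))
```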
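
As a concrete example of the kind of representational-imbalance metric surveyed in "Metrics for dataset demographic bias", the sketch below computes normalized Shannon evenness over demographic group counts. This is an editorial example of one plausible member of such a taxonomy, not a statement of which metrics the paper recommends.

```python
# One candidate representational-imbalance measure: normalized Shannon evenness
# over demographic group counts (editorial example).
import math
from collections import Counter

def shannon_evenness(group_labels):
    """1.0 = groups perfectly balanced; values near 0 = one group dominates."""
    counts = Counter(group_labels)
    if len(counts) < 2:
        return 1.0  # a single observed group is trivially "even" by this convention
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))

print(shannon_evenness(["female"] * 70 + ["male"] * 30))  # ~0.88
```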
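
The per-gender discrepancy measurement described in "Gender stereotyping impact in facial expression recognition" can be pictured as a recall gap for a single emotion between apparent-gender groups. The sketch below is an illustrative reading of that idea; the group names and the exact gap definition are chosen for the example rather than taken from the paper.

```python
# Editorial sketch of a per-gender recall gap for one emotion class.
def recall_gap(y_true, y_pred, genders, emotion, groups=("female", "male")):
    """Absolute difference in recall of `emotion` between two apparent-gender
    groups. y_true/y_pred hold emotion labels; genders holds apparent gender."""
    def recall(group):
        idx = [i for i, (true, g) in enumerate(zip(y_true, genders))
               if true == emotion and g == group]
        if not idx:
            return float("nan")
        return sum(y_pred[i] == emotion for i in idx) / len(idx)
    return abs(recall(groups[0]) - recall(groups[1]))

# Example: recall_gap(labels, predictions, apparent_gender, emotion="fear")
```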