Otazu Redín, Judit
2024-10-07
2024
https://academica-e.unavarra.es/handle/2454/52031

In response to the exponential growth and increasing adoption of Multi-Modal Large Language Models (MM-LLMs), this project explores their application in a critical field: the verification and validation of identity documents. These models, which integrate image, text, video, and audio processing, are proposed as potential improvements over traditional systems specialized in single tasks. The research compares the effectiveness of MM-LLMs against dedicated models, including both commercial and open-source solutions, in key tasks such as document classification, image quality assessment, fraud detection, OCR (Optical Character Recognition), and entity mapping. Additionally, the explainability of these multimodal models is analyzed, offering a transparent alternative to the opaque 'black box' typically associated with artificial intelligence. The study also recognizes and addresses the challenges arising from the substantial hardware demands and potential latency issues inherent in these systems.

application/pdf
eng
Large Language Model; Multi-Modal Large Language Model; Natural Language Processing; Computer Vision; Transformers; Embeddings; CNN; OCR (Optical Character Recognition); Document Authenticity; Anti-spoofing; Prompting
Research on Multi-Modal Large Language Models and their application for the verification and validation of identity documents
info:eu-repo/semantics/masterThesis
2024-10-07
info:eu-repo/semantics/embargoedAccess