Machine learning for the classification of texts in hispanic literature by authors

dc.contributor.advisorTFEPagola Barrio, Miguel
dc.contributor.affiliationEscuela Técnica Superior de Ingeniería Industrial, Informática y de Telecomunicaciónes_ES
dc.contributor.affiliationIndustria, Informatika eta Telekomunikazio Ingeniaritzako Goi Mailako Eskola Teknikoaeu
dc.contributor.authorPeñas Escribano, Lucas
dc.date.accessioned2025-02-18T15:26:40Z
dc.date.available2025-02-18T15:26:40Z
dc.date.issued2025
dc.date.updated2025-02-18T13:33:01Z
dc.description.abstractWhen we talk about a person's style, we have performed a probably unconscious exercise in pattern recognition that lets us assert something like “he wouldn't do that, it isn't his style”. With this in mind, it is interesting to link a human and artistic question such as “what is the style of this author?” with state-of-the-art language models in an attempt to give that question an objective answer. In this project, I combine techniques learnt throughout the degree (tokenization via bag of words, multiple classification methods…) with new ones learnt by investigating how some recently popular language models work (pre-trained BERT model, bidirectionality…). I chose this project for two main reasons. First, to go deeper into a kind of problem that I have faced before, learning new techniques and putting them to the test alongside the earlier ones. Second, curiosity about how machine learning approaches a problem that I have seen solved by humans. The two main objectives are to obtain a method that classifies texts by author with a sufficiently high level of confidence, and to extract the traits of each author that the classifier relied on to make its predictions. In short, this project combines gathering data, applying already known procedures, investigating and updating the process with more advanced techniques, comparing and analyzing results, and finally trying to reach a conclusion about how long the bridge that separates humans and machines on this topic is.en
dc.description.degreeGraduado o Graduada en Ingeniería Informática por la Universidad Pública de Navarra (Programa Internacional)es_ES
dc.description.degreeInformatika Ingeniaritzan Graduatua Nafarroako Unibertsitate Publikoan (Nazioarteko Programa)eu
dc.format.mimetypeapplication/pdfen
dc.identifier.urihttps://academica-e.unavarra.es/handle/2454/53459
dc.language.isoeng
dc.rights.accessRightsinfo:eu-repo/semantics/openAccess
dc.subjectPythonen
dc.subjectMachine learningen
dc.subjectLanguage modelen
dc.subjectText classificationen
dc.subjectBag of wordsen
dc.subjectBERTen
dc.subjectBidirectionalityen
dc.subjectNeural networken
dc.subjectRandom foresten
dc.subjectSupport Vector Machineen
dc.subjectStochastic Gradient Descenten
dc.subjectExtreme Gradient Boostingen
dc.titleMachine learning for the classification of texts in hispanic literature by authorsen
dc.typeinfo:eu-repo/semantics/bachelorThesis
dspace.entity.typePublication
relation.isAdvisorTFEOfPublicatione5ab14f5-4f2e-4000-a415-0a7c3b28ec78
relation.isAdvisorTFEOfPublication.latestForDiscoverye5ab14f5-4f2e-4000-a415-0a7c3b28ec78

Files

Original bundle
Name:
ML_for_the_classification_of_texts_by_authors.pdf
Size:
2.25 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed to upon submission