Machine learning for the classification of texts in Hispanic literature by authors
dc.contributor.advisorTFE | Pagola Barrio, Miguel | |
dc.contributor.affiliation | Escuela Técnica Superior de Ingeniería Industrial, Informática y de Telecomunicación | es_ES |
dc.contributor.affiliation | Industria, Informatika eta Telekomunikazio Ingeniaritzako Goi Mailako Eskola Teknikoa | eu |
dc.contributor.author | Peñas Escribano, Lucas | |
dc.date.accessioned | 2025-02-18T15:26:40Z | |
dc.date.available | 2025-02-18T15:26:40Z | |
dc.date.issued | 2025 | |
dc.date.updated | 2025-02-18T13:33:01Z | |
dc.description.abstract | When we talk about a person's style, we have carried out a probably unconscious exercise in pattern recognition that lets us assert something like “he wouldn't do that, it isn't his style”. With this in mind, it would be interesting to link a more human and artistic question, “what is the style of this author?”, with state-of-the-art language models in order to give an objective answer. In this project, I combine techniques learnt throughout the degree (tokenization via bag of words, multiple classification methods…) with new ones learnt by investigating how some recently popular language models work (the pre-trained BERT model, bidirectionality…). I chose this project for two main reasons. First, to go deeper into a kind of problem that I have faced before, learning new techniques and putting them to the test alongside the earlier ones. Second, curiosity about how machine learning approaches a problem that I have seen solved by humans. The two main objectives are to obtain a method that classifies texts by author with a sufficiently high level of confidence, and to extract the traits of each author that the classifier relied on to make its predictions. In conclusion, this project combines gathering data, applying already known procedures, investigating and updating the processes with more advanced techniques, comparing and analyzing results, and finally trying to reach a conclusion that tells us how long the bridge separating humans and machines is on this topic. | en |
dc.description.degree | Graduado o Graduada en Ingeniería Informática por la Universidad Pública de Navarra (Programa Internacional) | es_ES |
dc.description.degree | Informatika Ingeniaritzan Graduatua Nafarroako Unibertsitate Publikoan (Nazioarteko Programa) | eu |
dc.format.mimetype | application/pdf | en |
dc.identifier.uri | https://academica-e.unavarra.es/handle/2454/53459 | |
dc.language.iso | eng | |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | |
dc.subject | Python | en |
dc.subject | Machine learning | en |
dc.subject | Language model | en |
dc.subject | Text classification | en |
dc.subject | Bag of words | en |
dc.subject | BERT | en |
dc.subject | Bidirectionality | en |
dc.subject | Neural network | en |
dc.subject | Random forest | en |
dc.subject | Support Vector Machine | en |
dc.subject | Stochastic Gradient Descent | en |
dc.subject | Extreme Gradient Boosting | en |
dc.title | Machine learning for the classification of texts in Hispanic literature by authors | en |
dc.type | info:eu-repo/semantics/bachelorThesis | |
dspace.entity.type | Publication | |
relation.isAdvisorTFEOfPublication | e5ab14f5-4f2e-4000-a415-0a7c3b28ec78 | |
relation.isAdvisorTFEOfPublication.latestForDiscovery | e5ab14f5-4f2e-4000-a415-0a7c3b28ec78 |