Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion
dc.contributor.author | Pascual Casas, Rubén | |
dc.contributor.author | Maiza Coupin, Adrián Mikel | |
dc.contributor.author | Sesma Sara, Mikel | |
dc.contributor.author | Paternain Dallo, Daniel | |
dc.contributor.author | Galar Idoate, Mikel | |
dc.contributor.department | Estadística, Informática y Matemáticas | es_ES |
dc.contributor.department | Estatistika, Informatika eta Matematika | eu |
dc.contributor.department | Institute of Smart Cities - ISC | en |
dc.contributor.funder | Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA2023-11377 | |
dc.date.accessioned | 2025-02-24T07:39:47Z | |
dc.date.available | 2025-02-24T07:39:47Z | |
dc.date.issued | 2024-09-09 | |
dc.date.updated | 2025-02-24T07:33:39Z | |
dc.description.abstract | This paper addresses the challenge of generating an unlimited number of new and distinct characters that capture the style and shared visual characteristics of a limited set of human-designed characters. This is a relevant problem in the audiovisual industry, as the ability to rapidly produce original characters that adhere to specific characteristics greatly expands the possibilities in the production of movies, series, or video games. Our solution is built upon DreamBooth, a widely used fine-tuning method for text-to-image models. We propose an adaptation that addresses two main challenges: the impracticality of relying on detailed image prompts to describe each character, and the few-shot learning scenario in which only a limited set of characters is available for training. To solve these issues, we introduce additional character-specific tokens into DreamBooth training and remove its class-specific regularization dataset. To generate an unlimited number of characters, we propose using random tokens and random embeddings. The proposal is tested on two specialized datasets, and the results show our method's capability to produce diverse characters that adhere to a given style and set of visual characteristics. An ablation study analyzing the contribution of each proposed modification is also presented. | en |
dc.description.sponsorship | This work has been funded by MCIN/AEI/10.13039/501100011033/FEDER, UE, with the project PID2022-136627NB-I00, by the Government of Navarre under the project 0011-1365-2022-000130, and by the Public University of Navarra under the project PJUPNA2023-11377. The Scary and Virus datasets are designed by Freepik. This research received support from an FPU grant (Formación de Profesorado Universitario) awarded by the Spanish Ministry of Science and Innovation (MCINN) to Rubén Pascual. | |
dc.format.mimetype | application/pdf | en |
dc.identifier.citation | Pascual, R., Maiza, A., Sesma-Sara, M., Paternain, D., Galar, M. (2024) Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion. In Poggio, T., Comminiello, D., Morabito, F. C., Vellasco, M., Uncini, A., Scarpiniti, M., Hammer, B., Chen, B., Gori, M., Dauwels, J., Kuh, A., Tian, Z., Tanaka, T., Grassucci, E., Took, C. C., Ricci, E., Scardapane, S., Mitsufuji, Y., Silvestri, F., Squartini, S., Venayagamoorthy, G. K., Principi, E., Zhou, J., Soda, P., Xu, Z., Ji, H., Liwicki, M., Amerini, I., Roy, A., Príncipe, J. C., Sperduti, A., Duro, R., Tobar, F., Bacciu, D., Qin, K., Guarrasi, V., Ludermir, T. B., Hirose, A., Kasabov, N., Jayne, C. 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE. https://doi.org/10.1109/IJCNN60899.2024.10651300 | |
dc.identifier.doi | 10.1109/IJCNN60899.2024.10651300 | |
dc.identifier.isbn | 979-8-3503-5931-2 | |
dc.identifier.uri | https://academica-e.unavarra.es/handle/2454/53539 | |
dc.language.iso | eng | |
dc.publisher | IEEE | |
dc.relation.ispartof | In Poggio, T.; Comminiello, D.; Morabito, F. C.; Vellasco, M.; Uncini, A.; Scarpiniti, M.; Hammer, B.; Chen, B.; Gori, M.; Dauwels, J.; Kuh, A.; Tian, Z.; Tanaka, T.; Grassucci, E.; Took, C. C.; Ricci, E.; Scardapane, S.; Mitsufuji, Y.; Silvestri, F.; Squartini, S.; Venayagamoorthy, G. K.; Principi, E.; Zhou, J.; Soda, P.; Xu, Z.; Ji, H.; Liwicki, M.; Amerini, I.; Roy, A.; Príncipe, J. C.; Sperduti, A.; Duro, R.; Tobar, F.; Bacciu, D.; Qin, K.; Guarrasi, V.; Ludermir, T. B.; Hirose, A.; Kasabov, N.; Jayne, C. 2024 International Joint Conference on Neural Networks (IJCNN). IEEE; 2024. p. 1-8 | |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136627NB-I00/ES/ | |
dc.relation.projectID | info:eu-repo/grantAgreement/Gobierno de Navarra//0011-1365-2022-000130/ | |
dc.relation.publisherversion | https://doi.org/10.1109/IJCNN60899.2024.10651300 | |
dc.rights | © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work. | |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | |
dc.subject | Training | en |
dc.subject | Industries | en |
dc.subject | Visualization | en |
dc.subject | Video games | en |
dc.subject | Neural networks | en |
dc.subject | Text to image | en |
dc.subject | Focusing | en |
dc.title | Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion | en |
dc.type | info:eu-repo/semantics/conferenceObject | |
dc.type.version | info:eu-repo/semantics/acceptedVersion | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | fc089a3b-9c89-4fd7-8685-ac5a2e812a27 | |
relation.isAuthorOfPublication | 3a541442-8e82-49d5-903d-60e0aedbc1f6 | |
relation.isAuthorOfPublication | ca16c024-51e4-4f8f-b457-dc5307be32d9 | |
relation.isAuthorOfPublication | 44c7a308-9c21-49ef-aa03-b45c2c5a06fd | |
relation.isAuthorOfPublication.latestForDiscovery | fc089a3b-9c89-4fd7-8685-ac5a2e812a27 |