Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion

dc.contributor.author: Pascual Casas, Rubén
dc.contributor.author: Maiza Coupin, Adrián Mikel
dc.contributor.author: Sesma Sara, Mikel
dc.contributor.author: Paternain Dallo, Daniel
dc.contributor.author: Galar Idoate, Mikel
dc.contributor.department: Estadística, Informática y Matemáticas (es_ES)
dc.contributor.department: Estatistika, Informatika eta Matematika (eu)
dc.contributor.department: Institute of Smart Cities - ISC (en)
dc.contributor.funder: Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA2023-11377
dc.date.accessioned: 2025-02-24T07:39:47Z
dc.date.available: 2025-02-24T07:39:47Z
dc.date.issued: 2024-09-09
dc.date.updated: 2025-02-24T07:33:39Z
dc.description.abstract: This paper addresses the challenge of generating unlimited new and distinct characters that share the style and visual characteristics of a limited set of human-designed characters. This is a relevant problem in the audiovisual industry, as the ability to rapidly produce original characters that adhere to specific characteristics greatly increases the possibilities in the production of movies, series, or video games. Our solution is built upon DreamBooth, a widely used fine-tuning method for text-to-image models. We propose an adaptation focusing on two main challenges: the impracticality of relying on detailed image prompts for character description and the few-shot learning scenario with a limited set of characters available for training. To solve these issues, we introduce additional character-specific tokens to DreamBooth training and remove its class-specific regularization dataset. To generate an unlimited number of characters, we propose using random tokens and random embeddings. This proposal is tested on two specialized datasets, and the results show our method's capability to produce diverse characters that adhere to a style and visual characteristics. An ablation study analyzing the contributions of the proposed modifications is also presented. (en)
dc.description.sponsorship: This work has been funded by MCIN/AEI/10.13039/501100011033/FEDER, UE, with the project PID2022-136627NB-I00, by the Government of Navarre under the project 0011-1365-2022-000130, and by the Public University of Navarra under the project PJUPNA2023-11377. The Scary and Virus datasets are designed by Freepik. This research received support from an FPU grant (Formación de Profesorado Universitario) awarded by the Spanish Ministry of Science and Innovation (MCINN) to Rubén Pascual.
dc.format.mimetype: application/pdf (en)
dc.identifier.citation: Pascual, R., Maiza, A., Sesma-Sara, M., Paternain, D., Galar, M. (2024) Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion. In Poggio, T., Comminiello, D., Morabito, F. C., Vellasco, M., Uncini, A., Scarpiniti, M., Hammer, B., Chen, B., Gori, M., Dauwels, J., Kuh, A., Tian, Z., Tanaka, T., Grassucci, E., Took, C. C., Ricci, E., Scardapane, S., Mitsufuji, Y., Silvestri, F., Squartini, S., Venayagamoorthy, G. K., Principi, E., Zhou, J., Soda, P., Xu, Z., Ji, H., Liwicki, M., Amerini, I., Roy, A., Príncipe, J. C., Sperduti, A., Duro, R., Tobar, F., Bacciu, D., Qin, K., Guarrasi, V., Ludermir, T. B., Hirose, A., Kasabov, N., Jayne, C., 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE. https://doi.org/10.1109/IJCNN60899.2024.10651300
dc.identifier.doi: 10.1109/IJCNN60899.2024.10651300
dc.identifier.isbn: 979-8-3503-5931-2
dc.identifier.uri: https://academica-e.unavarra.es/handle/2454/53539
dc.language.iso: eng
dc.publisher: IEEE
dc.relation.ispartof: In Poggio, T.; Comminiello, D.; Morabito, F. C.; Vellasco, M.; Uncini, A.; Scarpiniti, M.; Hammer, B.; Chen, B.; Gori, M.; Dauwels, J.; Kuh, A.; Tian, Z.; Tanaka, T.; Grassucci, E.; Took, C. C.; Ricci, E.; Scardapane, S.; Mitsufuji, Y.; Silvestri, F.; Squartini, S.; Venayagamoorthy, G. K.; Principi, E.; Zhou, J.; Soda, P.; Xu, Z.; Ji, H.; Liwicki, M.; Amerini, I.; Roy, A.; Príncipe, J. C.; Sperduti, A.; Duro, R.; Tobar, F.; Bacciu, D.; Qin, K.; Guarrasi, V.; Ludermir, T. B.; Hirose, A.; Kasabov, N.; Jayne, C. 2024 International Joint Conference on Neural Networks (IJCNN). IEEE; 2024. p. 1-8
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136627NB-I00/ES/
dc.relation.projectID: info:eu-repo/grantAgreement/Gobierno de Navarra//0011-1365-2022-000130/
dc.relation.publisherversion: https://doi.org/10.1109/IJCNN60899.2024.10651300
dc.rights: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work.
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.subject: Training (en)
dc.subject: Industries (en)
dc.subject: Visualization (en)
dc.subject: Video games (en)
dc.subject: Neural networks (en)
dc.subject: Text to image (en)
dc.subject: Focusing (en)
dc.title: Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion (en)
dc.type: info:eu-repo/semantics/conferenceObject
dc.type.version: info:eu-repo/semantics/acceptedVersion
dspace.entity.type: Publication
relation.isAuthorOfPublication: fc089a3b-9c89-4fd7-8685-ac5a2e812a27
relation.isAuthorOfPublication: 3a541442-8e82-49d5-903d-60e0aedbc1f6
relation.isAuthorOfPublication: ca16c024-51e4-4f8f-b457-dc5307be32d9
relation.isAuthorOfPublication: 44c7a308-9c21-49ef-aa03-b45c2c5a06fd
relation.isAuthorOfPublication.latestForDiscovery: fc089a3b-9c89-4fd7-8685-ac5a2e812a27

Files

Original bundle
Name: Pascual_Enhancing.pdf
Size: 13.67 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission