Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion

Pascual Casas, Rubén; Maiza Coupin, Adrián Mikel; Sesma Sara, Mikel; Paternain Dallo, Daniel; Galar Idoate, Mikel

dc.contributor.author	Pascual Casas, Rubén
dc.contributor.author	Maiza Coupin, Adrián Mikel
dc.contributor.author	Sesma Sara, Mikel
dc.contributor.author	Paternain Dallo, Daniel
dc.contributor.author	Galar Idoate, Mikel
dc.contributor.department	Estadística, Informática y Matemáticas	es_ES
dc.contributor.department	Estatistika, Informatika eta Matematika	eu
dc.contributor.department	Institute of Smart Cities - ISC	en
dc.contributor.funder	Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa, PJUPNA2023-11377
dc.date.accessioned	2025-02-24T07:39:47Z
dc.date.available	2025-02-24T07:39:47Z
dc.date.issued	2024-09-09
dc.date.updated	2025-02-24T07:33:39Z
dc.description.abstract	This paper addresses the challenge of generating unlimited new and distinct characters that encompass the style and shared visual characteristics of a limited set of human-designed characters. This is a relevant problem in the audiovisual industry, as the ability to rapidly produce original characters that adhere to specific characteristics greatly increases the possibilities in the production of movies, series, or video games. Our solution is built upon DreamBooth, a widely extended fine-tuning method for text-to-image models. We propose an adaptation focusing on two main challenges: the impracticality of relying on detailed image prompts for character description and the few-shot learning scenario with a limited set of characters available for training. To solve these issues, we introduce additional character-specific tokens to DreamBooth training and remove its class-specific regularization dataset. For an unlimited generation of characters, we propose the usage of random tokens and random embeddings. This proposal is tested on two specialized datasets, and the results show our method's capability to produce diverse characters that adhere to a style and visual characteristics. An ablation study to analyze the contributions of the proposed modifications is also developed.	en
dc.description.sponsorship	This work has been funded by MCIN/AEI/10.13039/501100011033/FEDER, UE, with the project PID2022-136627NB-I00, by the Government of Navarre under the project 0011-1365-2022-000130, and by the Public University of Navarra under the project PJUPNA2023-11377. The Scary and Virus datasets are designed by Freepik. This research received support from an FPU grant (Formación de Profesorado Universitario) awarded by the Spanish Ministry of Science and Innovation (MCINN) to Rubén Pascual.
dc.format.mimetype	application/pdf	en
dc.identifier.citation	Pascual, R., Maiza, A., Sesma-Sara, M., Paternain, D., Galar, M. (2024) Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion. In Poggio, T., Comminiello, D., Morabito, F. C., Vellasco, M., Uncini, A., Scarpiniti, M., Hammer, B., Chen, B., Gori, M., Dauwels, J., Kuh, A., Tian, Z., Tanaka, T., Grassucci, E., Took, C. C., Ricci, E., Scardapane, S., Mitsufuji, Y., Silvestri, F., Squartini, S., Venayagamoorthy, G. K., Principi, E., Zhou, J., Soda, P., Xu, Z., Ji, H., Liwicki, M., Amerini, I., Roy, A., Príncipe, J. C., Sperduti, A., Duro, R., Tobar, F., Bacciu, D., Qin, K., Guarrasi, V., Ludermir, T. B., Hirose, A., Kasabov, N., Jayne, C., 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE. https://doi.org/10.1109/IJCNN60899.2024.10651300
dc.identifier.doi	10.1109/IJCNN60899.2024.10651300
dc.identifier.isbn	979-8-3503-5931-2
dc.identifier.uri	https://academica-e.unavarra.es/handle/2454/53539
dc.language.iso	eng
dc.publisher	IEEE
dc.relation.ispartof	In Poggio, T.; Comminiello, D.; Morabito, F. C.; Vellasco, M.; Uncini, A.; Scarpiniti, M.; Hammer, B.; Chen, B.; Gori, M.; Dauwels, J.; Kuh, A.; Tian, Z.; Tanaka, T.; Grassucci, E.; Took, C. C.; Ricci, E.; Scardapane, S.; Mitsufuji, Y.; Silvestri, F.; Squartini, S.; Venayagamoorthy, G. K.; Principi, E.; Zhou, J.; Soda, P.; Xu, Z.; Ji, H.; Liwicki, M.; Amerini, I.; Roy, A.; Príncipe, J. C.; Sperduti, A.; Duro, R.; Tobar, F.; Bacciu, D.; Qin, K.; Guarrasi, V.; Ludermir, T. B.; Hirose, A.; Kasabov, N.; Jayne, C. 2024 International Joint Conference on Neural Networks (IJCNN). IEEE; 2024. p. 1-8
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136627NB-I00/ES/
dc.relation.projectID	info:eu-repo/grantAgreement/Gobierno de Navarra//0011-1365-2022-000130/
dc.relation.publisherversion	https://doi.org/10.1109/IJCNN60899.2024.10651300
dc.rights	© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work.
dc.rights.accessRights	info:eu-repo/semantics/openAccess
dc.subject	Training	en
dc.subject	Industries	en
dc.subject	Visualization	en
dc.subject	Video games	en
dc.subject	Neural networks	en
dc.subject	Text to image	en
dc.subject	Focusing	en
dc.title	Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion	en
dc.type	info:eu-repo/semantics/conferenceObject
dc.type.version	info:eu-repo/semantics/acceptedVersion
dspace.entity.type	Publication
relation.isAuthorOfPublication	fc089a3b-9c89-4fd7-8685-ac5a2e812a27
relation.isAuthorOfPublication	3a541442-8e82-49d5-903d-60e0aedbc1f6
relation.isAuthorOfPublication	ca16c024-51e4-4f8f-b457-dc5307be32d9
relation.isAuthorOfPublication	44c7a308-9c21-49ef-aa03-b45c2c5a06fd
relation.isAuthorOfPublication.latestForDiscovery	fc089a3b-9c89-4fd7-8685-ac5a2e812a27

Files

Original bundle

Name: Pascual_Enhancing.pdf
Size: 13.67 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission

Collections

Comunicaciones y ponencias de congresos DEIM - EIMS Biltzarretako komunikazioak eta txostenak
Comunicaciones y ponencias de congresos - Biltzarrak eta Argitalpenak
Comunicaciones y ponencias de congresos ISC - ISC biltzarretako komunikazioak eta txostenak