Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion

Date

2024-09-09

Publisher

IEEE
Open access
Conference contribution
Accepted version

Project identifier

AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136627NB-I00/ES/
Gobierno de Navarra//0011-1365-2022-000130

Abstract

This paper addresses the challenge of generating an unlimited number of new and distinct characters that share the style and visual characteristics of a limited set of human-designed characters. This is a relevant problem in the audiovisual industry, as the ability to rapidly produce original characters that adhere to specific characteristics greatly increases the possibilities in the production of movies, series, or video games. Our solution is built upon DreamBooth, a widely used fine-tuning method for text-to-image models. We propose an adaptation focusing on two main challenges: the impracticality of relying on detailed image prompts for character description, and the few-shot learning scenario in which only a limited set of characters is available for training. To solve these issues, we introduce additional character-specific tokens to DreamBooth training and remove its class-specific regularization dataset. For unlimited generation of characters, we propose the use of random tokens and random embeddings. The proposal is tested on two specialized datasets, and the results show our method's capability to produce diverse characters that adhere to a style and visual characteristics. An ablation study analyzing the contributions of the proposed modifications is also presented.

Keywords

Training, Industries, Visualization, Video games, Neural networks, Text to image, Focusing

Department

Statistics, Computer Science and Mathematics / Institute of Smart Cities - ISC

Citation

Pascual, R., Maiza, A., Sesma-Sara, M., Paternain, D., Galar, M. (2024). Enhancing DreamBooth with LoRA for generating unlimited characters with stable diffusion. In Poggio, T., Comminiello, D., Morabito, F. C., Vellasco, M., Uncini, A., Scarpiniti, M., Hammer, B., Chen, B., Gori, M., Dauwels, J., Kuh, A., Tian, Z., Tanaka, T., Grassucci, E., Took, C. C., Ricci, E., Scardapane, S., Mitsufuji, Y., Silvestri, F., Squartini, S., Venayagamoorthy, G. K., Principi, E., Zhou, J., Soda, P., Xu, Z., Ji, H., Liwicki, M., Amerini, I., Roy, A., Príncipe, J. C., Sperduti, A., Duro, R., Tobar, F., Bacciu, D., Qin, K., Guarrasi, V., Ludermir, T. B., Hirose, A., Kasabov, N., Jayne, C., 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE. https://doi.org/10.1109/IJCNN60899.2024.10651300

Rights

© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other work.

Documents in Academica-e are protected by copyright, with all rights reserved, unless otherwise indicated.