Open Access
Mateda-2.0: estimation of distribution algorithms in MATLAB
(Foundation for Open Access Statistics, 2010-07-26) Larrañaga, Pedro; Santana, Roberto; Bielza, Concha; Lozano, José Antonio; Echegoyen Arruti, Carlos; Mendiburu, Alexander; Armañanzas, Rubén; Shakya, Siddartha; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
This paper describes Mateda-2.0, a MATLAB package for estimation of distribution algorithms (EDAs). This package can be used to solve single and multi-objective discrete and continuous optimization problems using EDAs based on undirected and directed probabilistic graphical models. The implementation contains several methods commonly employed by EDAs. It is also conceived as an open package to allow users to incorporate different combinations of selection, learning, sampling, and local search procedures. Additionally, it includes methods to extract, process and visualize the structures learned by the probabilistic models. This way, it can unveil previously unknown information about the optimization problem domain. Mateda-2.0 also incorporates a module for creating and validating function models based on the probabilistic models learned by EDAs.
Open Access
On the limits of effectiveness in estimation of distribution algorithms
(IEEE, 2011) Echegoyen Arruti, Carlos; Zhang, Qingfu; Mendiburu, Alexander; Santana, Roberto; Lozano, José Antonio; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
Which problems a search algorithm can effectively solve is a fundamental issue that plays a key role in understanding and developing algorithms. In order to study the ability limit of estimation of distribution algorithms (EDAs), this paper experimentally tests three different EDA implementations on a sequence of additively decomposable functions (ADFs) with an increasing number of interactions among binary variables. The results show that the ability of EDAs to solve problems could be lost immediately when the degree of variable interaction is larger than a threshold. We argue that this phase-transition phenomenon is closely related with the computational restrictions imposed in the learning step of this type of algorithms. Moreover, we demonstrate how the use of unrestricted Bayesian networks rapidly becomes inefficient as the number of sub-functions in an ADF increases. The study conducted in this paper is useful in order to identify patterns of behavior in EDAs and, thus, improve their performances.
Open Access
Large-scale unsupervised spatio-temporal semantic analysis of vast regions from satellite images sequences
(Springer, 2024) Echegoyen Arruti, Carlos; Pérez, Aritz; Santafé Rodrigo, Guzmán; Pérez Goya, Unai; Ugarte Martínez, María Dolores; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika; Institute for Advanced Materials and Mathematics - INAMAT2; Universidad Pública de Navarra / Nafarroako Unibertsitate Publikoa
Temporal sequences of satellite images constitute a highly valuable and abundant resource for analyzing regions of interest. However, the automatic acquisition of knowledge on a large scale is a challenging task due to different factors such as the lack of precise labeled data, the definition and variability of the terrain entities, or the inherent complexity of the images and their fusion. In this context, we present a fully unsupervised and general methodology to conduct spatio-temporal taxonomies of large regions from sequences of satellite images. Our approach relies on a combination of deep embeddings and time series clustering to capture the semantic properties of the ground and its evolution over time, providing a comprehensive understanding of the region of interest. The proposed method is enhanced by a novel procedure specifically devised to refine the embedding and exploit the underlying spatio-temporal patterns. We use this methodology to conduct an in-depth analysis of a 220 km region in northern Spain in different settings. The results provide a broad and intuitive perspective of the land where large areas are connected in a compact and well-structured manner, mainly based on climatic, phytological, and hydrological factors.
Open Access
On the taxonomy of optimization problems under estimation of distribution algorithms
(MIT Press Journals, 2013-09-13) Echegoyen Arruti, Carlos; Mendiburu, Alexander; Santana, Roberto; Lozano, José Antonio; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
Understanding the relationship between a search algorithm and the space of problems is a fundamental issue in the optimization field. In this paper, we lay the foundations to elaborate taxonomies of problems under estimation of distribution algorithms (EDAs). By using an infinite population model and assuming that the selection operator is based on the rank of the solutions, we group optimization problems according to the behavior of the EDA. Throughout the definition of an equivalence relation between functions it is possible to partition the space of problems in equivalence classes in which the algorithm has the same behavior. We show that only the probabilistic model is able to generate different partitions of the set of possible problems and hence, it predetermines the number of different behaviors that the algorithm can exhibit. As a natural consequence of our definitions, all the objective functions are in the same equivalence class when the algorithm does not impose restrictions to the probabilistic model. The taxonomy of problems, which is also valid for finite populations, is studied in depth for a simple EDA that considers independence among the variables of the problem. We provide the sufficient and necessary condition to decide the equivalence between functions and then we develop the operators to describe and count the members of a class. In addition, we show the intrinsic relation between univariate EDAs and the neighborhood system induced by the Hamming distance by proving that all the functions in the same class have the same number of local optima and that they are in the same ranking positions. Finally, we carry out numerical simulations in order to analyze the different behaviors that the algorithm can exhibit for the functions defined over the search space {0,1}^3
Open Access
The impact of exact probabilistic learning algorithms in EDAs based on bayesian networks
(Springer, 2008) Echegoyen Arruti, Carlos; Santana, Roberto; Lozano, José Antonio; Larrañaga, Pedro; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
This paper discusses exact learning of Bayesian networks in estimation of distribution algorithms. The estimation of Bayesian network algorithm (EBNA) is used to analyze the impact of learning the optimal (exact) structure in the search. By applying recently introduced methods that allow learning optimal Bayesian networks, we investigate two important issues in EDAs. First, we analyze the question of whether learning more accurate (exact) models of the dependencies implies a better performance of EDAs. Secondly, we are able to study the way in which the problem structure is translated into the probabilistic model when exact learning is accomplished. The results obtained reveal that the quality of the problem information captured by the probability model can improve when the accuracy of the learning algorithm employed is increased. However, improvements in model accuracy do not always imply a more efficient search.
Open Access
Analyzing the k most probable solutions in EDAs based on bayesian networks
(Springer, 2010) Echegoyen Arruti, Carlos; Mendiburu, Alexander; Santana, Roberto; Lozano, José Antonio; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
Estimation of distribution algorithms (EDAs) have been successfully applied to a wide variety of problems but, for themost complex approaches, there is no clear understanding of the way these algorithms complete the search. For that reason, in this work we exploit the probabilistic models that EDAs based on Bayesian networks are able to learn in order to provide new information about their behavior. Particularly, we analyze the k solutions with the highest probability in the distributions estimated during the search. In order to study the relationship between the probabilistic model and the fitness function, we focus on calculating, for the k most probable solutions (MPSs), the probability values, the function values and the correlation between both sets of values at each step of the algorithm. Furthermore, the objective functions of the k MPSs are contrasted with the k best individuals in the population. We complete the analysis by calculating the position of the optimum in the k MPSs during the search and the genotypic diversity of these solutions. We carry out the analysis by optimizing functions of different natures such as Trap5, two variants of Ising spin glass and Max-SAT. The results not only show information about the relationship between the probabilistic model and the fitness function, but also allow us to observe characteristics of the search space, the quality of the setup of the parameters and even distinguish between successful and unsuccessful runs.
Open Access
Comprehensive characterization of the behaviors of estimation of distribution algorithms
(Elsevier, 2015-04-20) Echegoyen Arruti, Carlos; Santana, Roberto; Mendiburu, Alexander; Lozano, José Antonio; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
Estimation of distribution algorithms (EDAs) are a successful example of how to use machine learning techniques for designing robust and efficient heuristic search algorithms. Understanding the relationship between EDAs and the space of optimization problems is a fundamental issue for the successful application of this type of algorithms. A step forward in this matter is to create a taxonomy of optimization problems according to the different behaviors that an EDA can exhibit. This paper substantially extends previous work in the proposal of a taxonomy of problems for univariate EDAs, mainly by generalizing those results to EDAs that are able to deal with multivariate dependences among the variables of the problem. Through the definition of an equivalence relation between functions, it is possible to partition the space of problems into equivalence classes in which the algorithm has the same behavior. We provide a sufficient and necessary condition to determine the equivalence between functions. This condition is based on a set of matrices which provides a novel encoding of the relationship between the function and the probabilistic model used by the algorithm. The description of the equivalent functions belonging to a class is studied in depth for EDAs whose probabilistic model is given by a chordal Markov network. Assuming this class of factorization, we unveil the intrinsic connection between the behaviors of EDAs and neighborhood systems defined over the search space. In addition, we carry out numerical simulations that effectively reveal the different behaviors of EDAs for the injective functions defined over the search space {0,1}^3 . Finally, we provide a novel approach to extend the analysis of equivalence classes to non-injective functions.
Open Access
Toward understanding EDAs based on bayesian networks through a quantitative analysis
(IEEE, 2011-06-28) Echegoyen Arruti, Carlos; Mendiburu, Alexander; Santana, Roberto; Lozano, José Antonio; Estadística, Informática y Matemáticas; Estatistika, Informatika eta Matematika
The successful application of estimation of distribution algorithms (EDAs) to solve different kinds of problems has reinforced their candidature as promising black-box optimization tools. However, their internal behavior is still not completely understood and therefore it is necessary to work in this direction in order to advance their development. This paper presents a methodology of analysis which provides new information about the behavior of EDAs by quantitatively analyzing the probabilistic models learned during the search. We particularly focus on calculating the probabilities of the optimal solutions, the most probable solution given by the model and the best individual of the population at each step of the algorithm. We carry out the analysis by optimizing functions of different nature such as Trap5, two variants of Ising spin glass and Max-SAT. By using different structures in the probabilistic models, we also analyze the impact of the structural model accuracy in the quantitative behavior of EDAs. In addition, the objective function values of our analyzed key solutions are contrasted with their probability values in order to study the connection between function and probabilistic models. The results not only show information about the internal behavior of EDAs, but also about the quality of the optimization process and setup of the parameters, the relationship between the probabilistic model and the fitness function, and even about the problem itself. Furthermore, the results allow us to discover common patterns of behavior in EDAs and propose new ideas in the development of this type of algorithms.