Searching for novel clustering programs

Conferences Papers
  1. Leonardo Trujillo, Enrique Naredo and Yuliana Martínez. Preliminary Study of Bloat in Genetic Programming with Behavior-Based Search. pages 293–305, Springer International Publishing, 2013. URL, DOI BibTeX

    	author = "Trujillo, Leonardo and Naredo, Enrique and Mart{\'i}nez, Yuliana",
    	title = "Preliminary Study of Bloat in Genetic Programming with Behavior-Based Search",
    	booktitle = "EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation IV: International Conference held at Leiden University, July 10-13, 2013",
    	year = 2013,
    	publisher = "Springer International Publishing",
    	address = "Heidelberg",
    	pages = "293--305",
    	abstract = "Bloat is one of the most interesting theoretical problems in genetic programming (GP), and one of the most important pragmatic limitations in the development of real-world GP solutions. Over the years, many theories regarding the causes of bloat have been proposed and a variety of bloat control methods have been developed. It seems that one of the underlying causes of bloat is the search for fitness; as the fitness-causes-bloat theory states, selective bias towards fitness seems to unavoidably lead the search towards programs with a large size. Intuitively, however, abandoning fitness does not appear to be an option. This paper, studies a GP system that does not require an explicit fitness function, instead it relies on behavior-based search, where programs are described by the behavior they exhibit and selective pressure is biased towards unique behaviors using the novelty search algorithm. Initial results are encouraging, the average program size of the evolving population does not increase with novelty search; i.e., bloat is avoided by focusing on novelty instead of quality.",
    	isbn = "978-3-319-01128-8",
    	doi = "10.1007/978-3-319-01128-8_19",
    	url = ""

Novelty search (NS) is an open-ended evolutionary algorithm that eliminates the need for an explicit objective function. Instead, NS focuses selective pressure on the search for novel solutions. NS has produced intriguing results in specialized domains, but has not been applied in most machine learning areas. The key component of NS is that each individual is described by the behavior it exhibits, and this description is used to determine how novel each individual is with respect to what the search has produced thus far. However, describing individuals in behavioral space is not trivial, and care must be taken to properly define a descriptor for a particular domain. This paper applies NS to a mainstream pattern analysis area: data clustering. To do so, a descriptor of clustering performance is proposed and tested on several problems, and compared with two control methods, Fuzzy C-means and K-means. Results show that NS can effectively be applied to data clustering in some circumstances. NS performance is quite poor on simple or easy problems, achieving basically random performance. Conversely, as the problems get harder NS performs better, and outperforming the control methods. It seems that the search space exploration induced by NS is fully exploited only when generating good solutions is more challenging.

Published in
GECCO '13 Proceeding of the fifteenth annual conference on Genetic and evolutionary computation conference
Pages 1093-1100
Date of conference
03-05 Abril 2013
Last modified on%AM, %08 %231 %2013 %04:%Oct
(0 votes)
Read 3417 times