Abstract: We develop an algorithm to evolve sets of probabilistically significant multivariate feature interactions, with co-evolved feature ranges, for classification in large, complex datasets. The datasets may include nominal, ordinal, and/or continuous features, missing data, imbalanced classes, and other complexities. Our age-layered evolutionary algorithm generates conjunctive clauses to model multivariate interactions in datasets that are too large to be analyzed using traditional methods such as logistic regression. Using a novel hypergeometric probability mass function for fitness evaluation, the algorithm automatically archives conjunctive clauses that are probabilistically significant at a given threshold, thus identifying strong complex multivariate interactions. The method is validated on two synthetic epistatic datasets and applied to a complex real-world survey dataset aimed at determining the drivers of household infestation for an insect that transmits Chagas disease. We identify a set of 178,719 predictive feature interactions that are associated with household infestation, thus dramatically reducing the size of the search space for future analysis.
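As a rough illustration of how a hypergeometric probability mass function can score a conjunctive clause, the sketch below computes the probability of seeing exactly k target-class samples among the n samples a clause matches, given K target-class samples in a dataset of N. It is not the authors' fitness function, whose exact form the abstract does not give. The names `hypergeom_pmf`, `clause_matches`, and `clause_fitness`, the representation of a clause as a mapping from feature to an inclusive range, and the choice to treat a missing value as a non-match are all assumptions made for this example.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n of N samples without replacement,
    where K of the N samples belong to the target class."""
    # Outside the support of the distribution the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def clause_matches(row, clause):
    """Hypothetical clause format: {feature: (low, high)}.
    Every listed feature must fall inside its inclusive range."""
    for feature, (low, high) in clause.items():
        value = row.get(feature)
        # Assumption: a missing value never satisfies a range.
        if value is None or not (low <= value <= high):
            return False
    return True


def clause_fitness(rows, labels, clause, target):
    """Smaller values mean the observed class mix among matched
    samples is less likely to arise by chance."""
    N = len(rows)
    K = sum(1 for y in labels if y == target)
    matched = [y for row, y in zip(rows, labels) if clause_matches(row, clause)]
    n = len(matched)
    k = sum(1 for y in matched if y == target)
    return hypergeom_pmf(k, N, K, n)


if __name__ == "__main__":
    rows = [{"a": 1, "b": 5}, {"a": 2, "b": 7}, {"a": 9, "b": 6}, {"a": 8, "b": None}]
    labels = [1, 1, 0, 0]
    print(clause_fitness(rows, labels, {"a": (0, 3), "b": (4, 8)}, target=1))
```

A clause with a low enough value could then be archived against a chosen threshold, which is the role the abstract gives to probabilistic significance.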
Abstract: We propose NM landscapes as a new class of tunably rugged benchmark problems. NM landscapes are well defined on alphabets of any arity, including both discrete and real-valued alphabets, include epistasis in a natural and transparent manner, are proven to have a global maximum of known value and location and, with some additional constraints, are proven to also have a known global minimum. Empirical studies illustrate that, when coefficients are selected from a recommended distribution, the ruggedness of NM landscapes is smoothly tunable and correlates with several measures of search difficulty. We discuss why these properties make NM landscapes preferable to both NK landscapes and Walsh polynomials as benchmark landscape models with tunable epistasis.
Abstract: For the past 25 years, NK landscapes have been the classic benchmarks for modeling combinatorial fitness landscapes with epistatic interactions between up to K+1 of N binary features. However, the ruggedness of NK landscapes grows in large discrete jumps as K increases, and computing the global optimum of unrestricted NK landscapes is an NP-complete problem. Walsh polynomials are a superset of NK landscapes that solve some of the problems. In this paper, we propose a new class of benchmarks called NM landscapes, where M refers to the Maximum order of epistatic interactions between N features. NM landscapes are much more smoothly tunable in ruggedness than NK landscapes and the location and value of the global optima are trivially known. For a subset of NM landscapes the location and magnitude of global minima are also easily computed, enabling proper normalization of fitnesses. NM landscapes are simpler than Walsh polynomials and can be used with alphabets of any arity, from binary to real-valued. We discuss several advantages of NM landscapes over NK landscapes and Walsh polynomials as benchmark problems for evaluating search strategies.
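For readers unfamiliar with the NK baseline that this abstract compares against, the following is a minimal sketch of a standard NK landscape, in which each of N binary loci contributes a random value that depends on its own state and the states of K other loci. It does not implement NM landscapes, whose construction the abstract does not specify. The function name `make_nk_landscape`, the circular adjacent-neighbor scheme, and the uniform random contribution tables are assumptions chosen for brevity; other neighborhood schemes are equally common.

```python
import random


def make_nk_landscape(n, k, seed=0):
    """Return a fitness function over length-n lists of 0/1 values."""
    rng = random.Random(seed)
    # One lookup table per locus, with an entry for every joint state
    # of that locus and its k neighbors (2**(k+1) combinations).
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            # Pack locus i and its k circular right-hand neighbors
            # into an integer index into the locus's table.
            index = 0
            for j in range(k + 1):
                index = (index << 1) | genotype[(i + j) % n]
            total += tables[i][index]
        # Mean contribution, so fitness always lies in [0, 1).
        return total / n

    return fitness


if __name__ == "__main__":
    f = make_nk_landscape(n=8, k=2, seed=1)
    print(f([0, 1, 0, 1, 0, 1, 0, 1]))
```

With k = 0 each locus contributes independently and the landscape is additive; each increment of k doubles the size of every lookup table, which is one way to see why ruggedness changes in discrete steps.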
Abstract: Widespread unexplained variations in clinical practices and patient outcomes suggest major opportunities for improving the quality and safety of medical care. However, there is little consensus regarding how best to identify and disseminate healthcare improvements and a dearth of theory to guide the debate. Many consider multicenter randomized controlled trials to be the gold standard of evidence-based medicine, although results are often inconclusive or may not be generally applicable due to differences in the contexts within which care is provided. Increasingly, others advocate the use of 'quality improvement collaboratives', in which multi-institutional teams share information to identify potentially better practices that are subsequently evaluated in the local contexts of specific institutions, but there is concern that such collaborative learning approaches lack the statistical rigor of randomized trials. Using an agent-based model, we show how and why a collaborative learning approach almost invariably leads to greater improvements in expected patient outcomes than more traditional approaches in searching simulated clinical fitness landscapes. This is due to a combination of greater statistical power and more context-dependent evaluation of treatments, especially in complex terrains where some combinations of practices may interact in affecting outcomes. The results of our simulations are consistent with observed limitations of randomized controlled trials and provide important insights into probable reasons for the effectiveness of quality improvement collaboratives in the complex socio-technical environments of healthcare institutions. Our approach illustrates how modeling the evolution of medical practice as search on a clinical fitness landscape can aid in identifying and understanding strategies for improving the quality and safety of medical care.