Abstract: The Amazon droughts of 2005 and 2010 have raised serious concern about the future of the rainforest. Amazon forests are crucial because of their role as the largest carbon sink in the world; a decrease in their photosynthetic activity would affect global warming. In particular, after the decline in plant growth across 1.68 million km² of forest during the once-in-a-century drought of 2010, it is of primary importance to understand the relationship between different climatic variables and vegetation. In an earlier study, we showed that non-linear models capture the relationship between vegetation and climate variables such as temperature and precipitation better than linear models do. In this research, we learn precise models relating vegetation to climatic variables (temperature, precipitation) under normal conditions in the Amazon region, using genetic programming based symbolic regression. To do so, we remove high-elevation and drought-affected areas and treat the slope of the region as one of the important factors when building the model. The learned model reveals new and interesting ways in which historical and current climate variables affect the vegetation at any location. MAIAC data is used as a vegetation surrogate in our study. For precipitation and temperature, we use the TRMM and MODIS Land Surface Temperature data sets, respectively, while learning the non-linear regression model. However, to make the model independent of the data source, we perform transfer learning: we use regularized least squares regression to learn the parameters of the non-linear model from other data sources, such as the precipitation and temperature data from the Climatic Research Unit (CRU). The new model is very similar in structure and performance to the originally learned model and supports the same claims about how vegetation in the Amazon region depends on these climate variables.
As a result of this study, we are able to learn, for the first time, exactly how different climate factors influence vegetation at any location in the Amazon rainforest, independent of the specific sources from which the data were obtained.
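The transfer-learning step described above, refitting the parameters of a fixed model structure by regularized least squares, can be illustrated with a minimal sketch. The abstract does not give the form of the learned model, so everything here is an assumption: each row of `rows` stands for the values of the model's (hypothetical) terms evaluated on the new data source, `y` is the vegetation index, and `lam` is the regularization strength. The function name `ridge_fit` is illustrative only.

```python
def ridge_fit(rows, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for w.

    rows: list of feature rows (hypothetical model terms, one row per sample)
    y:    list of target values
    lam:  non-negative regularization strength (assumed, not from the abstract)
    """
    n = len(rows[0])
    # Augmented normal-equation matrix [X^T X + lam * I | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (a[i][n] - tail) / a[i][i]
    return w


# Toy data: y = 2 + 3x, with terms [1, x] standing in for the model's structure.
rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 5.0, 8.0, 11.0]
unregularized = ridge_fit(rows, y, 0.0)
regularized = ridge_fit(rows, y, 10.0)
```

With `lam = 0` the sketch reduces to ordinary least squares; a positive `lam` shrinks the parameters, which is the usual motivation for regularizing when the new data source is noisier or smaller than the original one.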
Abstract: Satellite imagery and remote sensing provide explanatory variables at relatively high resolutions for modeling geospatial phenomena, yet regional summaries are often desirable for analysis and actionable insight. In this paper, we propose a novel method of inducing spatial aggregations as a component of the machine learning process, yielding regional model features whose construction is driven by model prediction performance rather than prior assumptions. Our results demonstrate that Genetic Programming is particularly well suited to this type of feature construction because it can automatically synthesize appropriate aggregations and incorporate them into predictive models better than the other regression methods we tested. In our experiments we consider a specific problem instance and real-world dataset relevant to predicting snow properties in high-mountain Asia.
Abstract: In recent years, a number of methods have been proposed that attempt to improve the performance of genetic programming by exploiting information about program semantics. One of the most important developments in this area is semantic backpropagation. The key idea of this method is to decompose a program into two parts, a subprogram and a context, and to calculate the desired semantics of the subprogram that would make the entire program correct, assuming that the context remains unchanged. In this paper we introduce Forward Propagation Mutation, a novel operator that relies on the opposite assumption: instead of preserving the context, it retains the subprogram and attempts to place it in the semantically right context. We empirically compare the performance of semantic backpropagation and forward propagation operators on a set of symbolic regression benchmarks. The experimental results demonstrate that semantic forward propagation produces smaller programs that achieve significantly higher generalization performance.
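The idea behind semantic backpropagation, as summarized above, can be sketched for a tiny arithmetic function set. This is a simplified illustration, not the operator evaluated in the paper: it assumes the context is given as the path of operations from the program root down to the subprogram, that each operation has one already-evaluated sibling argument, and that a single desired value per fitness case is enough (cases with no unique inverse are marked `None`). The names `backpropagate` and `invert` are hypothetical.

```python
def invert(op, hole_side, desired, sibling):
    """Return the value the hole must take so that the node outputs `desired`.

    op:        '+', '-' or '*'
    hole_side: 'left' or 'right', the argument position of the subprogram
    sibling:   value of the other argument for this fitness case
    """
    if desired is None:
        return None  # an earlier step already had no unique inverse
    if op == '+':
        return desired - sibling
    if op == '-':
        # hole - sibling = desired  or  sibling - hole = desired
        return desired + sibling if hole_side == 'left' else sibling - desired
    if op == '*':
        # Multiplying by zero has no unique inverse; mark the case as undetermined.
        return desired / sibling if sibling != 0 else None
    raise ValueError(f"unsupported operation: {op}")


def backpropagate(target, context):
    """Propagate the target semantics through the context down to the hole.

    target:  desired program output, one value per fitness case
    context: list of (op, hole_side, sibling_semantics) from the root downward
    """
    desired = list(target)
    for op, hole_side, sibling in context:
        desired = [invert(op, hole_side, d, s) for d, s in zip(desired, sibling)]
    return desired


# Program (hole * 2) + 1 with target outputs [3, 5, 7]:
# the hole should produce [1.0, 2.0, 3.0].
context = [('+', 'left', [1, 1, 1]), ('*', 'left', [2, 2, 2])]
desired_semantics = backpropagate([3, 5, 7], context)
```

The resulting vector is what a replacement subprogram would need to output on each fitness case for the whole program to hit the target, provided the context is left unchanged.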
Abstract: Maintaining population diversity has long been considered fundamental to the effectiveness of evolutionary algorithms. Recently, with the advent of novelty search, there has been an increasing interest in sustaining behavioral diversity by using both fitness and behavioral novelty as separate search objectives. However, since the novelty objective explicitly rewards diverging from other individuals, it can antagonize the original fitness objective, which rewards convergence toward the solution(s). As a result, fostering behavioral diversity may prevent proper exploitation of the most interesting regions of the behavioral space, and thus adversely affect the overall search performance. In this paper, we argue that an antagonism between behavioral diversity and fitness can indeed exist in semantic genetic programming applied to symbolic regression. Minimizing error draws individuals toward the target semantics, but promoting novelty, defined as a distance in the semantic space, scatters them away from it. We introduce a less conflicting novelty metric, defined as an angular distance between two program semantics with respect to the target semantics. The experimental results show that this metric, in contrast to the other diversity-promoting objectives considered, consistently improves the performance of genetic programming regardless of whether it employs a syntactic or a semantic search operator.
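One plausible reading of "angular distance between two program semantics with respect to the target semantics" is the angle between the two semantics vectors as seen from the target, that is, after subtracting the target from each. The sketch below follows that reading; the exact definition used in the paper may differ, and the handling of a program that matches the target exactly is an assumption made here for illustration.

```python
import math


def angular_distance(sem_a, sem_b, target):
    """Angle in radians between sem_a and sem_b, measured from `target`.

    Each argument is a semantics vector: one output value per fitness case.
    """
    diff_a = [a - t for a, t in zip(sem_a, target)]
    diff_b = [b - t for b, t in zip(sem_b, target)]
    norm_a = math.sqrt(sum(x * x for x in diff_a))
    norm_b = math.sqrt(sum(x * x for x in diff_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a program that hits the target has no direction, so use 0.
        return 0.0
    cosine = sum(x * y for x, y in zip(diff_a, diff_b)) / (norm_a * norm_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


target = [0.0, 0.0]
same_direction = angular_distance([1.0, 0.0], [2.0, 0.0], target)  # 0.0
orthogonal = angular_distance([1.0, 0.0], [0.0, 1.0], target)      # pi / 2
```

Under this reading, two programs that approach the target from the same direction have zero angular distance however far each is from it, so rewarding a large angle does not push individuals away from the target the way a plain distance-based novelty objective can.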