Abstract: An open question in ensemble-based active learning is how to choose one classifier type, or an appropriate combination of multiple classifier types, to construct ensembles for a given task. While existing approaches typically commit to a single classifier type, this paper presents a method, termed adaptive heterogeneous ensembles (AHE), that trains and adapts multiple instances of multiple classifier types toward an appropriate ensemble during active learning. Experimental evaluations show that AHE constructs heterogeneous ensembles that outperform homogeneous ensembles composed of any one of the classifier types, as well as bagging, boosting and the random subspace method with random sampling. We also show in this paper that the advantage of AHE over other methods increases if (1) the overall size of the ensemble also adapts during learning; and (2) the target data set contains more than two class labels. Through analysis we show that AHE outperforms other methods because it automatically discovers complementary classifiers: for each instance in the data set, members of the classifier type best suited to that instance vote together, while members of the other, less appropriate classifier types disagree, thereby producing a correct overall majority vote.
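The complementary-voting effect described above can be illustrated with a minimal sketch. The "classifiers" here are toy hand-written rules of two hypothetical types (threshold rules and a parity rule), not the classifier types used in the paper; the point is only that a plurality vote is decided by whichever type agrees internally on a given instance.

```python
from collections import Counter

def majority_vote(ensemble, x):
    """Predict by plurality vote over all ensemble members."""
    votes = Counter(clf(x) for clf in ensemble)
    return votes.most_common(1)[0][0]

# Toy members of two types (hypothetical): threshold rules and a parity rule.
threshold_clfs = [lambda x, t=t: int(x > t) for t in (0.3, 0.4, 0.5)]
parity_clfs = [lambda x: int(round(x * 10) % 2)] * 2
ensemble = threshold_clfs + parity_clfs

# On x = 0.8 the three threshold members agree on 1, the two parity
# members vote 0, so the well-suited type carries the majority.
print(majority_vote(ensemble, 0.8))
```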
Abstract: An ensemble is a set of learned models that make decisions collectively. Although an ensemble is usually more accurate than a single learner, existing ensemble methods often construct unnecessarily large ensembles, which increases memory consumption and computational cost. Ensemble pruning tackles this problem by selecting a subset of ensemble members to form subensembles that consume fewer resources and respond faster, with accuracy similar to or better than the original ensemble. In this paper, we analyze the accuracy/diversity trade-off and prove that classifiers that are more accurate and that make more predictions in the minority group are more important for subensemble construction. Based on the gained insights, a heuristic metric that considers both accuracy and diversity is proposed to explicitly evaluate each individual classifier’s contribution to the whole ensemble. By incorporating ensemble members in decreasing order of their contributions, subensembles are formed such that users can select the top p percent of ensemble members for prediction, depending on their resource availability and tolerable waiting time. Experimental results on 26 UCI data sets show that subensembles formed by the proposed EPIC (Ensemble Pruning via Individual Contribution ordering) algorithm outperform the original ensemble and a state-of-the-art ensemble pruning method, Orientation Ordering (OO).
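The ordering-then-truncation scheme above can be sketched as follows. The contribution score here is a deliberately simplified stand-in for the paper's metric, not a reproduction of it: it rewards correct predictions and gives extra weight to predictions that are correct while the ensemble majority is wrong (correct votes in the minority group), then keeps the top p percent of members.

```python
def contribution(preds, ensemble_preds, labels):
    """Simplified contribution score (assumption, not the EPIC metric):
    +1 for a correct vote, +2 when correct against a wrong majority,
    -1 for an incorrect vote."""
    score = 0
    for i, y in enumerate(labels):
        votes = [p[i] for p in ensemble_preds]
        majority = max(set(votes), key=votes.count)
        if preds[i] == y:
            score += 2 if majority != y else 1
        else:
            score -= 1
    return score

def prune(ensemble_preds, labels, p):
    """Order members by decreasing contribution, keep the top p percent."""
    order = sorted(range(len(ensemble_preds)),
                   key=lambda k: contribution(ensemble_preds[k],
                                              ensemble_preds, labels),
                   reverse=True)
    keep = max(1, round(len(order) * p / 100))
    return order[:keep]

# Three members voting on three instances with labels [0, 1, 1]:
preds = [[0, 1, 1],   # member 0: always correct, saves the wrong majority on i=1
         [0, 0, 1],   # member 1: correct twice
         [1, 0, 0]]   # member 2: always wrong
print(prune(preds, [0, 1, 1], 34))
```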
Abstract: Many approaches to active learning involve periodically training one classifier and choosing the data points on which it has the lowest confidence, but designing a confidence measure is nontrivial. An alternative approach is to periodically choose data instances that maximize disagreement among the label predictions across an ensemble of classifiers. Many classifiers with different underlying structures fit this framework, but some ensembles are more suitable for some data sets than others. The question then arises as to how to find the most suitable ensemble for a given data set. In this work we introduce adaptive informative sampling, a method that begins with a heterogeneous ensemble composed of multiple instances of different classifier types. The algorithm periodically adds data points to the training set, adapts the ratio of classifier types in the heterogeneous ensemble in favor of the better classifier type, and optimizes the classifiers in the ensemble using stochastic methods. Experimental results show that the proposed method performs consistently better than homogeneous ensembles. Comparison with random sampling and uncertainty sampling shows that the algorithm effectively draws informative data points for training.
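The disagreement-based selection step above can be sketched with vote entropy, one standard disagreement measure for committees (the abstract does not specify which measure is used, so this choice is an assumption). The committee members below are toy threshold rules for illustration.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the label distribution voted by the committee."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def most_informative(pool, committee):
    """Pick the unlabeled point whose predicted labels disagree most."""
    return max(pool, key=lambda x: vote_entropy([clf(x) for clf in committee]))

# Toy committee of threshold rules (hypothetical members):
committee = [lambda x: int(x > 0.3), lambda x: int(x > 0.5), lambda x: int(x > 0.7)]
pool = [0.1, 0.6, 0.9]

# 0.1 and 0.9 get unanimous votes; 0.6 splits the committee 2-1.
print(most_informative(pool, committee))
```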
Abstract: One common approach to active learning is to iteratively train a single classifier by choosing data points based on its uncertainty, but it is nontrivial to design uncertainty measures that are not biased by the choice of classifier. Query by committee suggests that, given an ensemble of diverse but accurate classifiers, the most informative data points are those that cause maximal disagreement among the predictions of the ensemble members. However, how to find an ensemble appropriate to a given data set remains an open question. In this paper, the random subspace method is combined with active learning to create multiple instances of different classifier types, and an algorithm is introduced that adapts the ratio of the different classifier types in the ensemble towards better overall accuracy. Here we show that the proposed algorithm outperforms C4.5 with uncertainty sampling, Naive Bayes with uncertainty sampling, bagging, boosting and the random subspace method with random sampling. To the best of our knowledge, our work is the first to adapt the ratio of classifiers in a heterogeneous ensemble for active learning.
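One simple way to adapt the ratio of classifier types, as described above, is to periodically replace a member of the weaker-performing type with a fresh instance of the stronger type. The sketch below assumes this replace-one-per-round policy and a user-supplied `spawn` factory; both are illustrative assumptions, not the paper's exact update rule.

```python
import random

def adapt_ratio(ensemble, accuracy_by_type, spawn):
    """Shift the type ratio: replace one member of the weakest type
    with a new instance of the strongest type.

    ensemble: list of (type_name, model) pairs
    accuracy_by_type: validation accuracy per classifier type
    spawn: factory producing a new model of a given type (assumed helper)
    """
    best = max(accuracy_by_type, key=accuracy_by_type.get)
    worst = min(accuracy_by_type, key=accuracy_by_type.get)
    if best == worst:
        return ensemble  # nothing to adapt
    losers = [i for i, (t, _) in enumerate(ensemble) if t == worst]
    if losers:
        ensemble[random.choice(losers)] = (best, spawn(best))
    return ensemble

# Start 2-2; after one round the stronger type holds 3 of 4 slots.
e = [("tree", None), ("tree", None), ("nb", None), ("nb", None)]
e = adapt_ratio(e, {"tree": 0.9, "nb": 0.6}, lambda t: "fresh-" + t)
print(sum(1 for t, _ in e if t == "tree"))
```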