Abstract: Organizing or clustering data into natural groups is one of the most fundamental aspects of understanding and mining information. The recent explosion in sensor networks and data storage associated with hydrological monitoring has created a huge potential for automating data analysis and classification of large, high-dimensional data sets. In this work, we develop a new classification tool that couples a Naïve Bayesian classifier with a neural network clustering algorithm, the Kohonen Self-Organizing Map (SOM). The combined Bayesian-SOM algorithm reduces classification error by combining the Bayesian classifier's ability to accommodate parameter uncertainty with the SOM's ability to reduce high-dimensional data to lower dimensions. The resulting algorithm is data-driven, nonparametric, and, owing to its parallel architecture, as computationally efficient as a Naïve Bayesian classifier. We apply, evaluate, and test the Bayesian-SOM network using two real-world hydrological data sets. The first uses genetic data to classify the state of disease in native fish populations in the upper Madison River, MT, USA. The second uses stream geomorphic and water quality data measured at ∼2500 Vermont stream reaches to predict habitat conditions. The new classification tool has substantial benefits over traditional classification methods because it can dynamically update prior information, assess the uncertainty/confidence of the posterior probability values, and visualize both the input data and the resulting probabilistic clusters on two-dimensional maps to better assess nonlinear mappings between the two.