SVM Tools
for ArcGIS 10.1+


An Image Classification Toolbox



Downloads:
Toolbox (version 1.5)
Last Updated:
January 8, 2015

Change Log

1.5:
fixed bug with interpreting version number of 10.2.X
1.4:
fixed issues with filepaths (as reported)
1.3:
fixed issue with error assessment tool

Contents

This ArcGIS 10.1+ toolbox provides tools for the supervised classification of raster imagery using a Support Vector Machine (SVM) with a radial basis function (RBF) kernel. Three tools are provided.

To use these tools, training and testing data should be single band integer rasters with known pixels given values in the label set {1, ..., n} and unknown pixels given a value of 0. For example, the classification scheme for four land cover types might be coded {1 = water, 2 = forest, 3 = field, 4 = urban/built-up} and so training pixels corresponding to water would be given a value of 1, forest a value of 2, etc. The training and testing rasters should have the exact same extent and gridding as the multiband raster image you want to classify. Usually this multiband raster is a satellite image (e.g. Landsat, MODIS, etc.), but it might also be a stacked image including additional information like elevation or texture.
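The labeling convention above can be illustrated with a small NumPy array standing in for a single-band training raster. The array and class counts below are hypothetical, using the four-class scheme from the example:

```python
import numpy as np

# Hypothetical 5x5 single-band training raster for a four-class scheme:
# 0 = unknown, 1 = water, 2 = forest, 3 = field, 4 = urban/built-up.
training = np.array([
    [0, 0, 1, 1, 0],
    [0, 2, 2, 0, 0],
    [0, 2, 0, 3, 3],
    [4, 0, 0, 3, 0],
    [4, 4, 0, 0, 0],
], dtype=np.int32)

# Only the non-zero pixels are treated as labeled training samples.
labels, counts = np.unique(training[training > 0], return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

In an actual workflow this grid would be a raster with the same extent and cell size as the multiband image being classified, not an in-memory array.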

You can ensure your training and testing data match your raster image by setting the Extent, Snap Raster, and Cell Size options or environments appropriately in ArcGIS when producing them. Additionally, note that rasters with low pixel depth can produce unexpected results due to how such files are interpreted by ArcPy's RasterToNumPyArray function, which is used internally to obtain pixel values during classification. In such cases, promote the pixel depth of the input data using the Copy Raster tool provided in the ArcGIS Data Management toolbox.
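The Copy Raster step itself requires ArcGIS, but the underlying hazard of low pixel depth can be sketched with NumPy alone. The values below are made up; the point is that 8-bit arithmetic wraps around silently, while a promoted pixel type does not:

```python
import numpy as np

# A low-depth (8-bit unsigned) band, as an array of pixel values might
# come back from a low-pixel-depth raster.
band_u8 = np.array([[10, 200], [250, 30]], dtype=np.uint8)

# Arithmetic in uint8 wraps around modulo 256, which can corrupt any
# derived quantities; e.g. 200 + 250 does not give 450 here.
wrapped = band_u8[0, 1] + band_u8[1, 0]

# Promoting the pixel depth first (analogous to writing the raster out
# with a higher pixel type via Copy Raster) avoids the wraparound.
band_i32 = band_u8.astype(np.int32)
safe = band_i32[0, 1] + band_i32[1, 0]
print(int(wrapped), int(safe))
```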

The provided classification tools do not perform random sampling or statistical summarization of the input data. In particular, each non-zero pixel in the training and testing datasets is treated independently as a training sample. This means that non-point methods of generating training data may lead to poor classification results when the samples are simply rasterized to put the data in the format required by this tool. For example, a polygon training sample covering a region of forest typically includes pixels that are not primarily composed of trees, such as understory or bare ground pixels depending on the season. The spectral signature of these pixels will be different from the primarily tree-containing pixels. Consequently, simply rasterizing the polygon will incorrectly label these understory and bare ground pixels with the aggregate forest class given by the polygon.

Incorrectly labelled training samples will have a strongly negative effect on the classification if they remain in the training dataset, so they should be removed (by reclassification to the value 0) or correctly labelled prior to classification. Accordingly, when there is a choice in how training sample data are generated, a randomized point-sampling scheme is the preferred method for producing data to be used with this tool.
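The contrast between rasterizing a whole polygon and point sampling can be sketched as follows. This is an illustrative NumPy mock-up, not toolbox code: the polygon, class code, and sample size are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rasterized polygon: a 10x10 block of pixels all labeled
# "forest" (2), even though some pixels inside the polygon may really be
# understory or bare ground.
labels = np.zeros((20, 20), dtype=np.int32)
labels[5:15, 5:15] = 2                      # 100 polygon pixels

# Randomized point sampling: keep a small random subset of the labeled
# pixels and reset the rest to 0 (unknown), so each retained sample can
# be inspected and relabeled or discarded individually.
rows, cols = np.nonzero(labels)
keep = rng.choice(len(rows), size=10, replace=False)
sampled = np.zeros_like(labels)
sampled[rows[keep], cols[keep]] = labels[rows[keep], cols[keep]]

print(int((labels > 0).sum()), int((sampled > 0).sum()))
```

Checking 10 points by hand is feasible; checking 100 polygon pixels usually is not, which is why mislabeled pixels tend to survive the polygon approach.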

Before running the SVM tools, it's recommended that you disable pyramid building and statistics calculation in the Raster Storage environment settings. Leaving them enabled can noticeably slow processing.

Background

SVM are state-of-the-art classifiers that have consistently achieved superior classification accuracy compared to most other non-contextual classifiers in a variety of situations. In remote sensing, SVM have most often been applied to land cover classification. Huang et al. (2002) assessed the viability of SVM for this purpose, comparing SVM with Maximum Likelihood Classification (MLC), Decision Tree Classification (DTC), and Neural Network Classification (NNC). They found SVM more accurate than MLC, generally more accurate than DTC, and comparable in accuracy to NNC, with SVM performing better on higher-dimensional input data. Foody and Mathur (2004) also assessed the viability of SVM for this purpose, but substituted discriminant analysis (DA) for MLC in their comparison. The results were similar, with SVM outperforming DTC, generally outperforming DA, and comparable with NNC.

For the binary (two-class) classification problem, SVM learn a unique, optimally placed decision boundary to separate the input dataset. This boundary is found by maximizing the margin between a separating hyperplane and the nearest training samples of each class. Non-linear decision boundaries can be induced by applying a non-linear mapping of the input data to a higher-dimensional space and fitting the optimally separating linear decision boundary in that space; the linear boundary in the transformed space then corresponds to a non-linear boundary in the original space. A kernel function specifies how to perform this mapping. Because SVM choose a max-margin hyperplane as the decision boundary, only a portion of the training data, known as the support vectors, is needed to describe its position. This sparse representation of the training data, together with the non-linear mapping, is known to provide robust classification results compared to those produced by other methods.
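The RBF kernel used by this toolbox has the standard form K(x, y) = exp(-γ·||x − y||²). A minimal NumPy sketch, written in the common γ parameterization (the toolbox's G parameter, described in the Method section, controls this bandwidth, though its exact scaling is not restated here):

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Identical points always map to a similarity of 1; more distant points
# decay toward 0, and the decay is faster for larger gamma.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))   # 1.0
```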

To interface SVM with ArcGIS, this project makes use of the LIBSVM classification library for support vector machines. LIBSVM was developed by Chih-Chung Chang and Chih-Jen Lin out of the Machine Learning and Data Mining Group at National Taiwan University. It is freely available and specifically developed to enhance SVM usage in other scientific fields. LIBSVM provides a Python interface, enabling ease of use in a Python scripting environment.

This software is not affiliated with or endorsed by LIBSVM or its authors.

Method

Two functions, named "train" and "predict" and found in cl_svm.py, were created to handle the two phases of SVM classification. The Radial Basis Function (RBF) kernel was chosen for classification because it has shown high performance on remote sensing data. Other kernels could be chosen, but are not yet implemented here.

The purpose of the train function is to train an SVM model for use in classification. To do this, two parameter values, called C and G here, must be chosen, and training data must be provided. C is a regularization parameter: conceptually, it indicates the level of trust the classifier should place in the accuracy of the training data, with higher values indicating greater trust. G corresponds to the bandwidth of the RBF kernel: conceptually, it determines the smoothness of the function approximated with the kernel, with higher values indicating greater degrees of smoothing. Smoothing here refers to the decision surface produced during SVM training and does not indicate any spatial property of the output classification. If these values are known for a given dataset, they may be passed directly to the train function. More often, however, they are not known, so the function also implements an n-fold cross-validated grid search for automatic selection of these parameters. An arbitrary number of folds (commonly 3 or 5) may be used in the search, and the grain and extent of the search may also be specified. The C and G values with the best cross-validated training accuracy are chosen.
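The "grain and extent" of such a search can be pictured as an exponential grid of candidate (C, G) pairs. The ranges and step below are illustrative of the common LIBSVM practice, not the toolbox's actual defaults:

```python
# Candidate values on exponential scales: the extent is the range of
# exponents searched, and the grain is the exponent step (2 here).
# These specific ranges are illustrative, not the toolbox defaults.
c_values = [2.0 ** e for e in range(-5, 16, 2)]   # 2^-5, 2^-3, ..., 2^15
g_values = [2.0 ** e for e in range(-15, 4, 2)]   # 2^-15, 2^-13, ..., 2^3

# Every (C, G) pair would be scored by n-fold cross-validation on the
# training data, and the best-scoring pair kept.
grid = [(c, g) for c in c_values for g in g_values]
print(len(grid))
```

A finer grain or wider extent increases the number of cross-validation runs multiplicatively, which is the main cost of automatic parameter selection.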

The purpose of the predict function is to perform classification using a given SVM model. Any valid LIBSVM model may be used by this procedure, but the train function only produces models with an RBF kernel; to use other models, other training functions would need to be written. The predict function optionally allows class conditional probability (CCP) estimates to be produced for a given classification. This output indicates the probability that a given pixel should be assigned a given class. Because SVM are not probabilistic classifiers, these probabilities are estimated using a procedure called pairwise coupling. In general, the class with the highest conditional probability estimate will be the one assigned to a pixel, but because these probabilities are not (and cannot be) produced by the SVM itself, this is not guaranteed; when discrepancies exist, they are due to the probability estimation procedure. For more information on pairwise coupling, see the 2004 paper by Wu, Lin, and Weng.
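The relationship between the CCP output and the classified output can be sketched as follows. The probability values here are invented for illustration; each row is one pixel, each column one class, and the usual (though not guaranteed) prediction is the row-wise argmax:

```python
import numpy as np

# Hypothetical pairwise-coupled class probabilities for three pixels and
# four classes (each row sums to 1).
ccp = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.60, 0.25, 0.10],
    [0.20, 0.20, 0.35, 0.25],
])

# The class usually assigned to each pixel is the most probable one;
# +1 converts 0-based column indices to the {1, ..., n} class codes.
predicted = ccp.argmax(axis=1) + 1
print(predicted.tolist())
```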

Actual training and prediction is performed using the LIBSVM classification library.

Cited By

If you use this toolbox in your research, I'd like to know about it!

References