Advanced Classification Library for ArcGIS


Beta Version | December 2012



Introduction

Esri provides only one supervised classification tool in ArcGIS 10.0: Maximum Likelihood Classification (MLC), available through the Multivariate toolset in the Spatial Analyst extension. This classification toolset is limited compared to the current state of image classification research. This project enhances the software's classification abilities by providing a toolset for performing supervised classification with support vector machines (SVM) and for improving classification results in multitemporal studies using an MRF-based contextual classification procedure.

Incorporating contextual information into classification is particularly important for accuracy in the analysis of land cover change. Without it, analyses that rely on the quantitative comparison of land cover types between multiple dates lose much of their value: even though the accuracy of individual images may be high, the accuracy of the land cover transitions between images will be much lower because of low spatial and temporal coherence within and between images.

Background on SVM and STM

SVM are state-of-the-art classifiers that have consistently demonstrated superior classification accuracy compared to most other non-contextual classifiers in a variety of situations. In remote sensing, SVM have most often been applied to land cover classification. Huang et al. (2002) assessed the viability of SVM for this purpose, comparing SVM with MLC, Decision Tree Classification (DTC), and Neural Network Classification (NNC). They found SVM was more accurate than MLC, generally more accurate than DTC, and comparable in accuracy to NNC, with SVM performing better on higher dimensional input data. Foody and Mathur (2004) also assessed the viability of SVM for this purpose, but substituted discriminant analysis (DA) for MLC in their comparison. The results were similar, with SVM outperforming DTC, generally outperforming DA, and comparable to NNC.

For the binary classification problem, SVM fit a maximum-margin hyperplane to separate input data after implicitly mapping it to a higher dimensional space. By means of the kernel trick, this mapping never has to be computed explicitly. Because this hyperplane creates an optimally separating decision boundary, and because this boundary is created in a higher dimensional space where data is more likely to be linearly separable, SVM tend to yield stable classification accuracies even with pixels of mixed spectral response and classes that are spectrally heterogeneous. SVM is inherently a binary classifier, but it is extended to multiclass problems through winner-takes-all or maximum-vote strategies. To interface SVM with ArcGIS, this project makes use of the LIBSVM library for support vector machines. LIBSVM was developed by Chih-Chung Chang and Chih-Jen Lin of the Machine Learning and Data Mining Group at National Taiwan University. It is freely available and was developed specifically to ease the use of SVM in other scientific fields. It provides a Python interface, enabling ease of use in a Python scripting environment.

The Spatial-Temporal Modeling Method (STM) is a contextual classification procedure proposed by Liu and Cai (2012). It uses a maximum a posteriori Markov random field (MAP-MRF) framework to determine the optimal class label for a pixel based on its spectral, spatial, and temporal contextual information. The aim of this procedure is to improve the coherence of class labels in both the spatial and spatial-temporal domains. It adopts a mutual approach to sharing temporal information, so that the classification result of a pixel at one date incorporates information about its neighbors at both the preceding and following dates. It also incorporates expert knowledge about allowed class transitions in the study area through the specification of illogical transitions. Applying the procedure to a sequence of classified images involves iteratively updating the classification of each pixel based on the minimization of an energy function calculated over its spatial and spatial-temporal neighbors. As in Liu and Cai (2012), this implementation uses the well-known Iterated Conditional Modes (ICM) algorithm to achieve this energy minimization. In contrast to the simulated annealing method used in the paper, however, MRF parameters are here estimated with the least-squares minimum energy perturbation method proposed by Melgani and Serpico (2003), chosen for its computational efficiency.

An MRF-based contextual classification was chosen for implementation because of the excellent results it has shown in improving spatial and spatial-temporal coherence for multitemporal imagery. Additionally, the method's true multitemporal applicability, incorporation of expert knowledge into the classification process, and firm theoretical foundations were judged to be advantageous.

Methods

A collection of Python scripts was produced to implement SVM and STM functionality for ArcGIS. The workhorses of the collection are two scripts, each focusing on one of the methods. Within these scripts, the model building and classifying steps of each method were separated to facilitate modularity. To load rasters into a usable format for processing and to write them back out to display results, a raster object class was also produced, implementing functionality to load, save, and block process multiband raster data as NumPy arrays. This class uses ArcPy methods to convert rasters to and from arrays, but alternate methods could be implemented without affecting the abilities of the SVM and STM workhorse scripts.

Raster Class

The raster class loads, saves, scales, and block processes raster data. If the path to a raster file is provided to the initialization procedure of this class, the specified raster is loaded into memory using ArcPy's RasterToNumPyArray method and made accessible as a NumPy data array. For multiband rasters, each band is loaded sequentially, on the assumption that bands may be accessed by the standard path syntax of the ArcGIS environment, e.g. "raster.tif\Band_1". If a path is not provided, a NumPy array containing image data and an ArcPy RasterBand description object for that array may be provided instead. Additional rasters may be added as bands to the class's data array, which is useful for manipulating multitemporal image data or, potentially, incorporating multisource or derived data into the classification process. The raster band data is directly accessible in the data array, which has the structure of a set of 2D NumPy arrays stacked along a third dimension. The array is thus 3D, with the third dimension used to access 2D slices of data corresponding to individual rasters. Besides slicing, portions of this array may be accessed by data block using a generator. These data blocks are 3D in shape and based on a spatial subsetting of the first two dimensions of the array.
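
The block-access pattern described above can be sketched as a simple generator over the first two (spatial) dimensions of a 3D array. This is an illustrative sketch, not the library's actual API; the function name and the (rows, cols, bands) array layout are assumptions.

```python
import numpy as np

def iter_blocks(data, block_rows, block_cols):
    """Yield 3D sub-blocks of a (rows, cols, bands) array.

    Blocks tile the first two (spatial) dimensions; the band
    dimension is always passed through whole. Edge blocks may be
    smaller than the requested block size.
    """
    rows, cols, _ = data.shape
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            yield data[r:r + block_rows, c:c + block_cols, :]

# Example: a 5x4 raster with 3 bands, processed in 2x2 spatial blocks.
stack = np.zeros((5, 4, 3))
blocks = list(iter_blocks(stack, 2, 2))
```

Because the generator yields views rather than copies, downstream code can process each block without duplicating the full array in memory.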

Because of the 3D array structure used to store raster data, when multiple data sources are stacked, each source must have the same spatial extent and cell size, such that the number of rows and columns in its data matrix matches that of every other source. Additionally, because each raster is loaded completely into memory, care must be taken not to load or process data that exceeds available computer memory. More memory-efficient methods could be used to populate and manipulate the data structure, but are not yet implemented here.

Saving data arrays as rasters is accomplished using ArcPy's NumPyArrayToRaster method. However, as this method only supports 2D arrays, saving multiband rasters is implemented by saving each band individually and then using the Composite Bands tool to merge the bands into a single raster.

The raster class also allows the data array to be linearly scaled, which is useful because the SVM classifier is not scale-invariant; scaling is therefore necessary to improve classification results. The scaling function operates by scaling the first band of a raster to the range [0, 1] based on the minimum and maximum values found in that band. It then scales each additional band using those same values. This ensures that every band is scaled over the same range, which is necessary because band values are not independent of one another. As a result, the first band of a scaled raster will always have values between 0 and 1, while additional bands may have values slightly outside this range. Other methods for scaling are possible, but were not implemented here.
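
The scaling rule above can be written in a few lines of NumPy. This is a minimal sketch assuming the (rows, cols, bands) layout used elsewhere; the function name is hypothetical.

```python
import numpy as np

def scale_stack(data):
    """Linearly scale a (rows, cols, bands) stack.

    The first band is scaled to [0, 1] using its own minimum and
    maximum; every other band is scaled with those same two values,
    so bands keep their relative ranges but later bands may fall
    slightly outside [0, 1].
    """
    lo = data[:, :, 0].min()
    hi = data[:, :, 0].max()
    return (data - lo) / (hi - lo)

# Band 1 spans [0, 10]; band 2 slightly exceeds that range.
stack = np.dstack([np.array([[0.0, 10.0]]), np.array([[-1.0, 11.0]])])
scaled = scale_stack(stack)
```

After scaling, the first band lies exactly in [0, 1] while the second band's values of -1 and 11 map to -0.1 and 1.1, illustrating the slight out-of-range behavior described above.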

SVM

Two functions, named "train" and "predict," were created to handle the two phases of SVM classification. The Radial Basis Function (RBF) kernel was chosen for classification because this kernel has shown high performance for remote sensing data. Other kernels could be chosen, but are not yet implemented here.

The purpose of the train function is to train a SVM model for use in classification. In order to do this, two parameter values, here called C and G, must be chosen, and training data must be provided. C is a regularization parameter, conceptually indicating the level of trust the classifier should place in the accuracy of the training data; higher values indicate more trust. G corresponds to the Gaussian width used for the RBF kernel. Conceptually, it determines the smoothness of the function approximated with the kernel, with higher values indicating greater degrees of smoothing. Smoothing here refers to the decision surface produced during SVM training and does not indicate any spatial property of the output classification. If these values are known for a given dataset, they may be used directly by the train function. More often, however, they are not known, so the function also implements an n-fold cross-validated grid search for automatic selection of these parameters. An arbitrary number of folds (commonly 3 or 5) may be used in the search, and the grain and extent of the search may also be specified. The C and G values with the best cross-validated training accuracy are chosen.
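
The grid search described above can be sketched as an exhaustive scan over (C, G) pairs. Here `cv_accuracy` is a hypothetical stand-in for the n-fold cross-validation call (LIBSVM exposes this via the "-v" training option); the grid values below are illustrative only.

```python
import itertools

def grid_search(cv_accuracy, c_values, g_values):
    """Exhaustive search over (C, G) pairs.

    cv_accuracy(c, g) should return the cross-validated training
    accuracy for that parameter pair; the pair with the highest
    accuracy is returned.
    """
    return max(itertools.product(c_values, g_values),
               key=lambda cg: cv_accuracy(*cg))

# Example with a dummy scorer whose accuracy peaks at C=8, G=0.5.
scores = {(8, 0.5): 0.95, (2, 0.5): 0.90, (8, 2.0): 0.85, (2, 2.0): 0.80}
best_c, best_g = grid_search(lambda c, g: scores[(c, g)],
                             [2, 8], [0.5, 2.0])
```

In practice the grids are usually exponential (e.g. powers of 2), which is what the "grain and extent" of the search control.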

The purpose of the predict function is to perform classification using a given SVM model. Any valid LIBSVM model may be used by this procedure, but only a model using a RBF kernel is produced by the train function; to use other models, other training functions would need to be written. The predict function optionally allows class conditional probability (CCP) estimates to be produced for a given classification. This output indicates the probability that a given pixel should be assigned a given class, and it is used as an input to the STM procedure described below. Classification itself proceeds in a blockwise fashion in order to increase processing speed and reduce memory requirements, as only portions of the raster data array need be fed to the classifier rather than the entire array at once.
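
Blockwise prediction amounts to classifying each spatial block independently and stitching the label arrays back together. In this sketch the classifier is injectable (a threshold stands in for the SVM) and the function name is illustrative, not the tool's actual API.

```python
import numpy as np

def predict_blockwise(data, classify, block_rows):
    """Classify a (rows, cols, bands) stack one row-block at a time.

    classify(pixels) takes an (n, bands) array of feature vectors
    and returns n integer labels; only one block of pixels is held
    in flattened form at any moment.
    """
    rows, cols, bands = data.shape
    labels = np.empty((rows, cols), dtype=int)
    for r in range(0, rows, block_rows):
        block = data[r:r + block_rows]
        flat = block.reshape(-1, bands)
        labels[r:r + block_rows] = classify(flat).reshape(block.shape[:2])
    return labels

# Example: a 3x2 raster with 2 bands; thresholding band 1 stands in
# for the trained SVM model.
stack = np.arange(12, dtype=float).reshape(3, 2, 2)
out = predict_blockwise(stack, lambda x: (x[:, 0] > 5).astype(int), 2)
```

Because each block is classified independently, block size trades memory use against per-call overhead without changing the output labels.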

Actual training and prediction is performed using the LIBSVM classification library described earlier.

STM

Two functions, named "estimate" and "update," were created to handle the two phases of STM classification. Additional internal functions for block-based neighborhood processing were also created.

The purpose of the estimate function is to determine how to weight the various contextual contributions in the energy function used in the energy minimization procedure described previously. It also computes the global transition probabilities for the study area, which are necessary for computation with any Markov model. These probabilities are computed globally for both past-to-future and future-to-past steps, since a mutual approach to sharing contextual information is adopted in STM. The method of computation is found in Liu et al. (2008). The estimate function accepts as input a time-ordered sequence of classifications, a set of time-ordered CCP estimates, training site locations and correct class labels, an illogical transition matrix, and an optional specification of the neighborhood system to use during processing. The illogical transition matrix specifies which class transitions are illogical for the study area given the length of time between image dates or the processes of change at work; it incorporates expert knowledge into the classification process. Zeroing out this matrix will remove its contribution to the classification process.

Estimation of weights proceeds by the minimum energy perturbation method of Melgani and Serpico (2003). In this method, the spectral, spatial, and temporal energies at each training site are computed for the correct classification of each pixel. Assuming each energy is an equal contributor, the energies are then summed to derive a total energy. At the same time, the minimum total energy at each training site among the other possible class labels is found. The difference between these total energies is then used to perturb the total energy of a correct class labeling. The individual spectral, spatial, and temporal energies, as well as this perturbed total, are stored in matrix form, together specifying an over-determined system of equations once all training sites have been examined. This system is then solved by the pseudo-inverse method to determine weights for each energy contribution, which is equivalent to obtaining a least-squares estimate of these parameter values. The estimate function returns these parameter estimates, the input illogical transition matrix, and the past and future global transition probabilities, which together specify the STM model for the input set of classifications.
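
The final solve step above can be sketched with an ordinary least-squares routine, which is equivalent to applying the pseudo-inverse. The array layout (one row per training site, one column per energy term) is an assumption for illustration.

```python
import numpy as np

def estimate_weights(energies, perturbed_totals):
    """Least-squares estimate of energy-term weights.

    energies is an (n_sites, 3) matrix whose columns hold the
    spectral, spatial, and temporal energies at each training site;
    perturbed_totals is the (n_sites,) vector of perturbed total
    energies. Solving the over-determined system A w = b in the
    least-squares sense gives the weight vector w.
    """
    w, *_ = np.linalg.lstsq(energies, perturbed_totals, rcond=None)
    return w

# Example: targets constructed from known weights (1.0, 2.0, 3.0),
# so the solver should recover them exactly.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
b = A @ np.array([1.0, 2.0, 3.0])
w = estimate_weights(A, b)
```

With real training data the system is inconsistent, and the same call returns the weights minimizing the squared residual rather than an exact solution.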

The purpose of the update function is to relabel a set of classifications based on the energy minimization procedure described previously. It uses the parameter estimates, the illogical transition matrix, and the past and future global transition probabilities produced by the estimate function. The update function accepts as input a time-ordered sequence of classifications, a set of time-ordered CCP estimates, and criteria for stopping iteration of the algorithm. Because it includes an option to save intermediate classification results, the update function also requires paths for saving classification outputs. The algorithm must store information about the classification results at both the current and previous iterations. Consequently, the function does not make use of the block processing functionality of the provided raster class, but instead uses it only to initially read the raster data and then write classification results. Processing itself takes place with two large 3D NumPy arrays, so the memory requirements of this method are more substantial than for SVM. At each iteration of the algorithm, the total energy for each pixel is computed for each potential class label given its last class label, and the potential class label with the lowest total energy is then assigned to the pixel. By design, over iteration, all class labels will settle so that the classifications reach a low energy configuration. At each iteration, the percentage of pixels whose label changed is computed and compared with a provided stopping criterion to determine whether the algorithm has run its course.
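
The ICM update loop described above can be sketched as follows for a single image and a generic per-pixel energy function. The temporal terms are omitted for brevity, the energy is injectable, and the names are illustrative rather than the tool's actual API.

```python
import numpy as np

def icm(labels, energy, n_classes, stop_fraction, max_iters=50):
    """Iterated Conditional Modes over an integer label array.

    energy(labels, c) returns, for every pixel, the total energy of
    assigning class c given the current labeling. At each iteration
    every pixel takes its lowest-energy label; iteration stops when
    the fraction of changed pixels falls below stop_fraction.
    """
    for _ in range(max_iters):
        # Energy of every candidate class at every pixel.
        stacked = np.stack([energy(labels, c) for c in range(n_classes)])
        new_labels = stacked.argmin(axis=0)
        changed = np.mean(new_labels != labels)
        labels = new_labels
        if changed < stop_fraction:
            break
    return labels

# Toy energy: cost of disagreeing with a fixed target labeling, so
# ICM converges to that target.
target = np.array([[0, 1], [1, 0]])
start = np.zeros((2, 2), dtype=int)
result = icm(start, lambda lab, c: (target != c).astype(float), 2, 0.01)
```

In the actual procedure the energy combines the weighted spectral, spatial, and temporal terms, and two label arrays (current and previous iteration) are kept, which is the source of the memory cost noted above.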

User's Manual

The following is information about the tools contained in this library. Specific information about the parameters each tool takes and the outputs it produces is provided in the help documentation for each tool. This section gives an overview of the purpose of each tool and how they fit into the classification procedure. It also describes the datasets that are used as input for the tools in each toolset.

Note that all rasters used by these tools must have a pixel depth of at least 8 bits due to limitations of the ArcPy methods used to read the data. A tool to promote the pixel depth of a set of rasters is provided. Alternatively, the Copy Raster tool packaged with ArcGIS may be used to promote the pixel depth of a single raster.

SVM Tools

There are four tools.

SVM Datasets

Two datasets are needed.

  1. The first is a multispectral raster image.

  2. The second is a training raster. The training raster should have the same extent and cell size as the image such that the number of rows and columns of its data matrix matches that of the image. It should be accurately coregistered with the image. Training pixels should be assigned integer labels 1, ..., n indicating the true classes of those pixels. Pixels that are not labeled should have a value of 0.

STM Tools

There are three tools.

STM Datasets

Four datasets are needed.

  1. The first is a sequence of t classifications. Each image should be coregistered to the others. Each should have the same extent and cell size such that the number of rows and columns of its data matrix matches that of all the other images. Each image should use the classification scheme 1, ..., n with every pixel given a classification.

  2. The second is a sequence of t class conditional probability rasters. Each image should be coregistered with its corresponding classification. Each should have the same extent and cell size as its corresponding classification such that the number of rows and columns of its data matrix matches that of its corresponding classification. Each should have values in the interval [0, 1]. Each should have n bands where the position of the band specifies the class it describes. The first band corresponds to class 1, the second to class 2, and so on.

  3. The third is a sequence of t training rasters. Each image should be coregistered with its corresponding classification. Each should have the same extent and cell size as its corresponding classification such that the number of rows and columns of its data matrix matches that of its corresponding classification. Training pixels should be assigned integer labels 1, ..., n indicating the true classes of those pixels. Pixels that are not labeled should have a value of 0.

  4. The fourth is an illogical transition matrix of size n-by-n, where rows indicate the class the transition is from and columns indicate the class the transition is to. The value of a matrix cell may be 1 or 0, indicating whether the transition is illogical or logical, respectively. Other values for illogical transitions are possible, but not recommended. The matrix should be written in the form "0,0,...,0;...;0,0,...,0".
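
The matrix string format above can be parsed into a NumPy array in a few lines; this parser is a sketch of the format, not the tool's actual input handler.

```python
import numpy as np

def parse_transition_matrix(text):
    """Parse an n-by-n matrix written as "0,1,...;...;1,0,...".

    Semicolons separate rows; commas separate values within a row.
    Rows index the from-class, columns the to-class.
    """
    rows = [[int(v) for v in row.split(",")] for row in text.split(";")]
    return np.array(rows)

# Example: two classes where only the class 1 -> class 2 transition
# is marked illogical.
m = parse_transition_matrix("0,1;0,0")
```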

Utilities

There are two tools.

The accuracy assessment tool uses two datasets. The first is a classification. The second is a testing raster. The classification should use the scheme 1, ..., n. The testing raster should be coregistered with the classification. It should have the same extent and cell size such that the number of rows and columns of its data matrix matches that of the classification. Test pixels should be assigned integer labels 1, ..., n indicating the true classes of those pixels. Pixels that are not labeled should have a value of 0.
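
Accuracy assessment of this kind reduces to a confusion matrix computed over the labeled test pixels. This sketch assumes the 0-means-unlabeled convention above; the function name is illustrative.

```python
import numpy as np

def confusion_matrix(classified, reference, n_classes):
    """Confusion matrix over labeled test pixels.

    Pixels with reference label 0 are unlabeled and ignored. Rows
    index the reference (true) class, columns the classified class;
    both use the 1-based labels of the input rasters.
    """
    mask = reference > 0
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for true, pred in zip(reference[mask], classified[mask]):
        cm[true - 1, pred - 1] += 1
    return cm

# Example: three labeled test pixels, one misclassified.
classified = np.array([[1, 2], [2, 2]])
reference = np.array([[1, 2], [0, 1]])
cm = confusion_matrix(classified, reference, 2)
overall_accuracy = np.trace(cm) / cm.sum()
```

Overall accuracy is the trace of the matrix divided by its sum; per-class producer's and user's accuracies follow from the row and column sums.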

Future

In the future, more options could be incorporated into the SVM classification tool. Currently, the toolset only explicitly allows the use of an RBF kernel for classification. Other kernels, such as polynomial and composite kernels, are sometimes used in remote sensing classification, and providing the ability to use them would expand the utility of the SVM toolset. Additionally, untying the raster class from ArcPy would make the provided functions easier to use in a general scripting environment. ArcPy was chosen for raster file input and output because the goal of the project was to bring the SVM and STM methods to ArcGIS; however, other Python libraries such as GDAL could be used instead for this purpose. Furthermore, a module such as PyTables or memory-mapping methods could be incorporated to allow the processing of memory-prohibitive datasets. Computational time could also be decreased through multithreading and parallelization. Alternatively, the STM algorithm could potentially be made faster by writing it in a lower level language and binding the result to Python. Both tools might benefit from allowing masking, so that not all pixels in a scene must be classified.

In the future, other classifiers could be incorporated into this toolset to expand its functionality. Particularly useful might be a neural network classifier; as noted earlier, NNCs are competitive with SVM, particularly for low dimensional data. A semi-supervised classifier, reducing the burden of selecting training samples, may also be useful. Additional contextual methods, particularly ones that are purely spatial or not MRF-based, might be useful as well. Finally, classifiers working specifically with vector data and their attributes might be interesting.

References