Chapter 5 Supervised Classification

In supervised classification, we have prior knowledge about some of the land-cover types. We need areas with identified land-cover types to use as a training set for the classification algorithm. In the following example we will use a Classification and Regression Trees (CART) classifier to predict land cover classes in the study area. We will perform the following steps:

- Generate sample sites based on a reference raster
- Extract values from Landsat data for the sample sites
- Train the classifier using training samples
- Classify the Landsat data using the trained model
- Evaluate the accuracy of the model

5.2 Generate sample sites

In this part we split the NLCD reference RasterLayer into training and validation sets. Before that, we generate the spatial points of the sample sites by stratified random sampling, which draws the same number of points from each land use class.

## class       : SpatialPointsDataFrame 
## features    : 1600 
## extent      : -121.9257, -121.4225, 37.85415, 38.18536  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## variables   : 2
## names       :    cell, nlcd2011 
## min values  :     413,        1 
## max values  : 2307837,        9
## 
##   1   2   3   4   5   7   8   9 
## 200 200 200 200 200 200 200 200
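The idea behind stratified random sampling can be sketched in base R without any raster data. This is only an illustration under stated assumptions: `cellclass` is a synthetic stand-in for the class value of each raster cell, and the variable names are hypothetical. The raster package offers `sampleStratified` for doing this directly on a RasterLayer.

```r
set.seed(99)

# Synthetic stand-in for the reference raster: one class code per cell
# (assumption: 50000 cells, same class codes as in the output above)
classes <- c(1:5, 7:9)
cellclass <- sample(classes, size = 50000, replace = TRUE)

# Draw the same number of cells from every class (the strata)
n_per_class <- 200
cells <- unlist(lapply(classes, function(cl) {
  sample(which(cellclass == cl), n_per_class)
}))

# Sample table: cell index plus the class found at that cell
samp <- data.frame(cell = cells, nlcd2011 = cellclass[cells])
table(samp$nlcd2011)
```

Every class ends up with exactly 200 sampled cells, regardless of how common the class is in the raster.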

We can plot the sampled points to visualize the distribution of sampling locations.

## Loading required package: lattice
## Loading required package: latticeExtra

5.3 Extract values for sites

We load the Landsat data.

We extract the raster values of each Landsat band at the sampled points, whose land cover classes are known. Then we create a data frame containing the band values (our predictor variables) and the corresponding classes (our response variable).
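As a rough sketch of what this table looks like, the following base R code builds it from synthetic values. The matrix `bands`, the band names, and the sample table `samp` are all hypothetical stand-ins, since the actual Landsat data and extraction code are not shown here.

```r
set.seed(99)

# Hypothetical image: one row per cell, one column per band
ncell <- 1000
bands <- matrix(runif(ncell * 3), ncol = 3,
                dimnames = list(NULL, c("band1", "band2", "band3")))

# Hypothetical sample sites: cell index and known class
samp <- data.frame(cell = sample(ncell, 20),
                   class = sample(c(1, 2, 3), 20, replace = TRUE))

# Look up the band values at the sampled cells and attach the class:
# the response comes first, followed by one predictor column per band
sampdata <- data.frame(classvalue = samp$class,
                       bands[samp$cell, , drop = FALSE])
head(sampdata)
```

Each row of `sampdata` is one sample site, which is the shape most classifiers in R expect for training.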

5.5 Classify

Now we have a trained classification model that we can use to classify the cells in the Landsat RasterStack.

## class      : RasterLayer 
## dimensions : 1230, 1877, 2308710  (nrow, ncol, ncell)
## resolution : 0.0002694946, 0.0002694946  (x, y)
## extent     : -121.9258, -121.42, 37.85402, 38.1855  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## source     : memory
## names      : layer 
## values     : 1, 9  (min, max)
## attributes :
##        ID value
##  from:  1     1
##   to :  8     9

We can plot the classification results.

We can plot the reference and predicted land cover side by side to assess the accuracy of the classification model visually.

5.6 Model Evaluation

Now we have to assess the accuracy of the model. Two metrics are widely used in remote sensing: “overall accuracy” and “kappa”.

5.6.1 K-fold cross validation

We will use k-fold cross validation for the model evaluation. This technique consists of splitting the data used to fit the model into k groups. In turn, each group is used for model testing, while the rest of the data is used for model training.

## j
##   1   2   3   4   5 
## 320 320 320 320 320
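A fold assignment like the one tabulated above can be sketched in base R. The version below is a minimal illustration, assuming 200 samples for each of the eight classes and assigning folds within each class so that every fold keeps the class balance. The names `classvalue` and `j` are illustrative.

```r
set.seed(99)
k <- 5

# Assumed sample layout: 200 samples for each of the eight class codes
classvalue <- rep(c(1:5, 7:9), each = 200)

# Assign fold numbers 1..k at random, separately within each class
j <- integer(length(classvalue))
for (cl in unique(classvalue)) {
  idx <- which(classvalue == cl)
  j[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
}

table(j)              # 320 samples in each of the five folds
table(classvalue, j)  # 40 samples per class in every fold
```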

Now we train and test the model five times, each time computing a confusion matrix that we store in a list.
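The loop can be sketched as follows. This is not the chapter's data: the built-in `iris` data set stands in for the sample table, and `rpart` is used as one CART implementation, both as assumptions so that the example runs on its own.

```r
library(rpart)
set.seed(99)

k <- 5
# Random fold number for every row of the stand-in data set
j <- sample(rep(seq_len(k), length.out = nrow(iris)))

x <- vector("list", k)
for (i in seq_len(k)) {
  train <- iris[j != i, ]   # k - 1 folds for training
  test  <- iris[j == i, ]   # the held-out fold for testing
  cart  <- rpart(Species ~ ., data = train, method = "class")
  pred  <- predict(cart, newdata = test, type = "class")
  # Confusion matrix of this fold: rows observed, columns predicted
  x[[i]] <- table(observed = test$Species, predicted = pred)
}

x[[1]]
```

Because every row is held out exactly once, the five matrices together cover the whole data set.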

Now we combine the five list elements into a single data frame.

##                     predicted
## observed             Water Developed Barren Forest Shrubland Herbaceous
##   Water                175         6      0      3         0          0
##   Developed              2        90     51      8        10         22
##   Barren                 7        39     82      4        19         38
##   Forest                 0         2      1    106        57          1
##   Shrubland              0         3      5     59       102         12
##   Herbaceous             0         9     36     10        27        109
##   Planted/Cultivated     0         7     11     34        42         19
##   Wetlands              18        10      6     36        29          5
##                     predicted
## observed             Planted/Cultivated Wetlands
##   Water                               7        9
##   Developed                          11        6
##   Barren                              5        6
##   Forest                              6       27
##   Shrubland                          12        7
##   Herbaceous                          8        1
##   Planted/Cultivated                 69       18
##   Wetlands                           33       63
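The two summary metrics named at the start of this section can be computed directly from the confusion matrix above. The sketch below re-enters the matrix by hand and applies the usual definitions: overall accuracy is the share of samples on the diagonal, and kappa compares it with the agreement expected by chance from the row and column totals. Variable names are illustrative.

```r
classes <- c("Water", "Developed", "Barren", "Forest", "Shrubland",
             "Herbaceous", "Planted/Cultivated", "Wetlands")

# Confusion matrix copied from the output above (rows observed, columns predicted)
conmat <- matrix(c(
  175,  6,  0,   3,   0,   0,  7,  9,
    2, 90, 51,   8,  10,  22, 11,  6,
    7, 39, 82,   4,  19,  38,  5,  6,
    0,  2,  1, 106,  57,   1,  6, 27,
    0,  3,  5,  59, 102,  12, 12,  7,
    0,  9, 36,  10,  27, 109,  8,  1,
    0,  7, 11,  34,  42,  19, 69, 18,
   18, 10,  6,  36,  29,   5, 33, 63),
  nrow = 8, byrow = TRUE,
  dimnames = list(observed = classes, predicted = classes))

n <- sum(conmat)                         # total number of samples (1600)

# Overall accuracy: correctly classified samples over all samples
OA <- sum(diag(conmat)) / n              # 0.4975

# Agreement expected by chance, from the marginal totals
expAccuracy <- sum(rowSums(conmat) * colSums(conmat)) / n^2   # 0.125

# Kappa: improvement over chance, scaled to the maximum possible improvement
kappa_hat <- (OA - expAccuracy) / (1 - expAccuracy)           # about 0.4257
```

With eight equally sized classes, chance agreement is 0.125, so an overall accuracy of about 0.50 corresponds to a kappa of about 0.43.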

5.6.4 Producer and user accuracy

##                    producerAccuracy userAccuracy
## Water                     0.8663366        0.875
## Developed                 0.5421687        0.450
## Barren                    0.4270833        0.410
## Forest                    0.4076923        0.530
## Shrubland                 0.3566434        0.510
## Herbaceous                0.5291262        0.545
## Planted/Cultivated        0.4569536        0.345
## Wetlands                  0.4598540        0.315
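The values in this table can be reproduced from the diagonal and the marginal totals of the confusion matrix. In the sketch below the totals are entered by hand from that matrix, and the column names are deliberately neutral. The column labelled producerAccuracy above equals the correct count divided by the predicted (column) total, and the column labelled userAccuracy equals the correct count divided by the observed (row) total. Many remote sensing references assign the two names the other way around for a matrix with observed classes in rows, so it is worth checking the denominators rather than relying on the labels.

```r
classes <- c("Water", "Developed", "Barren", "Forest", "Shrubland",
             "Herbaceous", "Planted/Cultivated", "Wetlands")

# Diagonal of the confusion matrix: correctly classified samples per class
correct <- c(175, 90, 82, 106, 102, 109, 69, 63)

# Column sums: number of samples predicted as each class
predicted_total <- c(202, 166, 192, 260, 286, 206, 151, 137)

# Row sums: number of samples observed in each class (200 per class)
observed_total <- rep(200, 8)

acc <- data.frame(
  correct_over_predicted = correct / predicted_total,  # "producerAccuracy" column above
  correct_over_observed  = correct / observed_total,   # "userAccuracy" column above
  row.names = classes)

round(acc, 4)
```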