Assessing the model

 
It is considered good practice to train the model on one set of data and assess how well the fitted result performs by testing it against data that was not used for training. Provided there is sufficient data, this is most often achieved by randomly selecting half of the available observations for training and using the other half for testing.
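For example, a random 50/50 split can be done with a shuffled index. The following is a minimal sketch in Python; the array occurrence_records and the seed are placeholders, not part of any particular modelling package:

    import numpy as np

    rng = np.random.default_rng(seed=42)

    # occurrence_records: hypothetical array with one entry per observed
    # presence record (262 IDs here, so each half contains 131).
    occurrence_records = np.arange(262)

    # Shuffle the indices, then split them in half: one half trains the
    # model, the other half is held back for testing.
    shuffled = rng.permutation(len(occurrence_records))
    half = len(occurrence_records) // 2
    train_records = occurrence_records[shuffled[:half]]
    test_records = occurrence_records[shuffled[half:]]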
 
The testing process is, therefore:

Step 1

The result from the model is a probability surface. We can turn this into a predicted potential distribution for the species by applying a threshold. If the probability generated by the fitted model is at or above this threshold for a given square then the species is predicted to be able to survive there.

 

In this probability map, light blue represents a low probability, shading through to red for the highest probability.

Apply threshold to results
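As a sketch, assuming a hypothetical 2-D array probability_surface holding the fitted probability for each grid square, the thresholding step is a single comparison:

    import numpy as np

    # probability_surface: hypothetical 2-D array of fitted probabilities,
    # one value per grid square, each between 0 and 1.
    probability_surface = np.random.default_rng(1).random((100, 100))

    threshold = 0.5  # example threshold
    # True = predicted present, False = predicted absent at this threshold
    predicted_present = probability_surface >= threshold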

Step 2

We then compare the presence or absence predicted at this particular threshold with actual observations, using the randomly selected half of the observations that were NOT used to train the model.

Compare prediction at threshold with observations

Step 3

This comparison has four possible outcomes for each grid square:

  • Predicted to be present by the model and it was observed there,
  • Predicted to be present by the model but it was not observed,
  • Predicted to be absent by the model but it was observed there,
  • Predicted to be absent by the model and it was not observed.
Four possible outcomes of the comparison

Step 4

Count the number of grid squares falling into each of these categories and tabulate the results. This produces a confusion matrix.
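A minimal sketch of the tabulation, assuming two boolean grids of the same shape (predicted_present from the thresholding step and a hypothetical observed_present marking the held-back test records):

    import numpy as np

    def confusion_counts(predicted_present, observed_present):
        # Count the four outcomes over all grid squares.
        tp = np.sum(predicted_present & observed_present)    # predicted present, observed
        fp = np.sum(predicted_present & ~observed_present)   # predicted present, not observed
        fn = np.sum(~predicted_present & observed_present)   # predicted absent, observed
        tn = np.sum(~predicted_present & ~observed_present)  # predicted absent, not observed
        return tp, fp, fn, tn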

 

We can calculate two important results from this table:

  • The True positive rate = the proportion of observations that were correctly predicted by the model.

    In the example, 127 out of 131 observations were correctly predicted, giving a True positive rate of 127/131 = 96.95%.
  • The False positive rate = the proportion of grid squares where the species was not observed, but where the model predicts presence.

    In the example, the model predicted presence in 1,005 squares out of a sample of 10,000 for which there were no observations. The False positive rate is therefore 1,005/10,000 = 10.05%.
Confusion matrix
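The two rates follow directly from these counts. Plugging in the figures from the worked example (a sketch; the counts of 4 false negatives and 8,995 true negatives are implied by the totals quoted above):

    def rates(tp, fp, fn, tn):
        true_positive_rate = tp / (tp + fn)    # proportion of observations correctly predicted
        false_positive_rate = fp / (fp + tn)   # proportion of unobserved squares predicted present
        return true_positive_rate, false_positive_rate

    # 127 of 131 observations correctly predicted; presence predicted in
    # 1,005 of 10,000 squares with no observations.
    tpr, fpr = rates(tp=127, fp=1_005, fn=4, tn=8_995)
    print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")   # TPR = 96.95%, FPR = 10.05%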

Step 5

Plot the False positive rate (x-axis) against the True positive rate (y-axis). Repeat Steps 1-4 across the full range of possible thresholds, from 0 to 1, plotting a new point for each threshold.

 

The resulting plot is known as the receiver operating characteristic (ROC). The Area Under the Curve (AUC) gives a good summary of the overall fit of the model across the whole range of thresholds. A perfect fit (i.e. the model always predicts the observations exactly) would give an AUC of 1.0, whilst a random prediction would give an AUC of 0.5 (the green-blue line on the plot).
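In practice the threshold sweep and AUC calculation are usually done with library routines. A sketch using scikit-learn's roc_curve and auc, where the 1-D arrays test_observed and test_probs are stand-ins for the held-back observations and the model's probabilities at those squares:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, auc

    # Hypothetical test data: 0/1 observations and fitted probabilities.
    rng = np.random.default_rng(0)
    test_observed = rng.integers(0, 2, size=500)
    test_probs = np.clip(test_observed * 0.6 + rng.random(500) * 0.5, 0, 1)

    # roc_curve sweeps the threshold and returns the false/true positive
    # rate at each value (Steps 1-4 repeated automatically).
    fpr, tpr, thresholds = roc_curve(test_observed, test_probs)
    test_auc = auc(fpr, tpr)

    plt.plot(fpr, tpr, label=f"Test ROC (AUC = {test_auc:.3f})")
    plt.plot([0, 1], [0, 1], "--", label="Random prediction (AUC = 0.5)")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()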

 

We are primarily interested in the blue line - the ROC plot for test data that was not used to train the model. In this example the Test AUC is 0.918 - a pretty good result! As a rule of thumb, an AUC of 0.85 or over is generally considered to be a good fit, whilst an AUC of 0.9 or over is very good. 

 

The ROC curve for the training data nearly always has a higher AUC than that for the test data. This is what we would expect: the model was fitted to the training data, so it ought to perform well when tested against it!

ROC plot
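To see this effect, the same AUC calculation can be run on both halves of the data. A sketch using scikit-learn's roc_auc_score, with made-up arrays standing in for the two halves:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Hypothetical 0/1 observations and fitted probabilities for each half.
    rng = np.random.default_rng(0)
    train_observed = rng.integers(0, 2, size=500)
    train_probs = np.clip(train_observed * 0.7 + rng.random(500) * 0.4, 0, 1)
    test_observed = rng.integers(0, 2, size=500)
    test_probs = np.clip(test_observed * 0.6 + rng.random(500) * 0.5, 0, 1)

    # roc_auc_score collapses the whole ROC calculation into one call.
    train_auc = roc_auc_score(train_observed, train_probs)
    test_auc = roc_auc_score(test_observed, test_probs)

    # With real data the training AUC is usually the larger of the two,
    # because the model has already "seen" the training observations.
    print(f"Training AUC = {train_auc:.3f}, Test AUC = {test_auc:.3f}")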