Workflow for fitting models

 
Modelling works on data presented on a regular grid. For modelling within Great Britain, the Ordnance Survey National Grid is convenient, and modelling has been carried out using species presence and environmental data in 1 km squares.
 

Step 1

Assemble data about the known occurrences of the species of interest. We need observations whose grid reference is at 1 km square precision or better, and we need to decide how recent an observation must be for us to use it.

Occurrences
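As a rough illustration of Step 1, the sketch below reduces Ordnance Survey grid references to their 1 km square and discards records that are too coarse or too old. The function names, the sample records and the cut-off year are all hypothetical, and the parser assumes the common "two letters plus an even number of digits" form only (tetrad and other notations are simply rejected).

```python
def to_1km_square(grid_ref):
    """Return the 1 km square (e.g. 'TQ3080') containing an OS grid
    reference, or None if the reference is malformed or coarser than 1 km."""
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha() or not digits.isdigit():
        return None
    # An even digit count is required; fewer than 4 digits means 10 km or coarser.
    if len(digits) % 2 != 0 or len(digits) < 4:
        return None
    half = len(digits) // 2
    easting, northing = digits[:half], digits[half:]
    # Keep the first two digits of each half to truncate to the 1 km square.
    return letters + easting[:2] + northing[:2]


def usable_squares(records, earliest_year):
    """Collect the 1 km squares of records that are precise and recent enough.

    records: iterable of (grid_ref, year) pairs (an assumed, illustrative layout).
    """
    squares = set()
    for grid_ref, year in records:
        square = to_1km_square(grid_ref)
        if square is not None and year >= earliest_year:
            squares.add(square)
    return squares


# Hypothetical records: (grid reference, year observed)
records = [
    ("TQ 301 805", 2015),   # 100 m precision, recent: kept
    ("SU12345678", 1962),   # 10 m precision, but too old for this cut-off
    ("TQ38", 2018),         # 10 km precision: too coarse
]
print(usable_squares(records, earliest_year=2000))
```

The cut-off year is a modelling decision rather than a fixed rule, so it is passed in as a parameter.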

Step 2

Decide which environmental variables are appropriate. A vast number of possible environmental variables are available!

 

In the rainfall and elevation maps, blue represents a low value, shading through to red, which represents the highest value. The colours in the soil map are arbitrary and simply distinguish different soil types.

Examples of environmental variables
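Because a square can only be modelled when every chosen variable has a value there, it can help to find the set of squares that are complete across all layers before training. The following is a minimal sketch, assuming each layer is held as a dictionary from 1 km square to value; the layer names echo the maps above, but the data structure and the values are invented for illustration.

```python
def squares_with_all_variables(layers):
    """Return the squares that have a value in every environmental layer.

    layers: dict mapping variable name -> {square: value}; a value of None
    is treated as missing (an assumed convention).
    """
    complete = None
    for values in layers.values():
        present = {sq for sq, v in values.items() if v is not None}
        complete = present if complete is None else complete & present
    return complete or set()


# Hypothetical layers keyed by 1 km square
layers = {
    "rainfall":  {"TQ3080": 610.0, "TQ3081": 615.0, "TQ3082": 620.0},
    "elevation": {"TQ3080": 12.0,  "TQ3081": None,  "TQ3082": 30.0},
    "soil":      {"TQ3080": "clay", "TQ3082": "loam"},
}
print(sorted(squares_with_all_variables(layers)))
```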

Step 3

Train the model using a randomly selected half of the available observations. The result is a map giving, for each 1 km square for which all environmental variables were available, a probability that indicates the relative suitability of that square for the species.

 

In this probability map, light blue represents a low probability, shading through to red, which represents the highest probability.

Model result
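The random half-and-half division of observations used for training can be sketched as follows. This is only an illustration: the function name and the use of a fixed seed (so that the split can be reproduced) are assumptions, not details given in the workflow.

```python
import random


def split_in_half(observations, seed=0):
    """Randomly divide observations into a training half and a testing half."""
    shuffled = sorted(observations)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(shuffled)    # seeded shuffle, independent of global state
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


# Hypothetical occupied 1 km squares
occupied = ["TQ3080", "TQ3081", "TQ3180", "TQ3281", "TQ3382", "TQ3483"]
training, testing = split_in_half(occupied)
print(len(training), len(testing))
```

With an odd number of observations the testing half receives the extra record.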

Step 4

Assess the model by testing it against the other half of the observations, which were not used for training.

ROC plot
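A ROC plot is commonly summarised by the area under the curve (AUC). The sketch below computes it as the probability that a randomly chosen test presence scores higher than a randomly chosen comparison square, with ties counted as half. The workflow does not say what the test presences are compared against; comparing them with a sample of background squares is an assumption made here, and the scores are invented.

```python
def auc(presence_scores, background_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the fraction of (presence, background) pairs in which the
    presence square has the higher modelled probability; ties count 0.5.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical modelled probabilities
test_presences = [0.9, 0.7, 0.4]        # squares holding the withheld observations
background = [0.8, 0.3, 0.2, 0.1]       # assumed sample of other squares
print(round(auc(test_presences, background), 3))
```

A value near 0.5 means the model ranks test presences no better than chance, while a value near 1 means they are almost always ranked above the comparison squares.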

Step 5

The probability surface resulting from the model can be turned into a map of the potential distribution predicted for the species by applying a suitable threshold. The species is predicted to be present in grid squares where the modelled probability is at or above this threshold. Deciding on a suitable threshold is the challenge!

Potential distribution after applying a threshold
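Applying a threshold is simple once one has been chosen. The sketch below shows the "at or above" rule described in Step 5, together with one possible way of picking a threshold: the highest value that still keeps a given fraction of the known presences. That rule is only one of several conventions and is not stated in the workflow; the function names and probabilities are likewise hypothetical.

```python
import math


def apply_threshold(probabilities, threshold):
    """Return the squares predicted present: probability at or above threshold."""
    return {sq for sq, p in probabilities.items() if p >= threshold}


def threshold_keeping_fraction(presence_probabilities, fraction):
    """Highest threshold that still includes `fraction` of the known presences.

    An illustrative rule only; other ways of choosing a threshold exist.
    """
    ranked = sorted(presence_probabilities, reverse=True)
    keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[keep - 1]


# Hypothetical modelled probabilities per 1 km square
probabilities = {"TQ3080": 0.9, "TQ3081": 0.6, "TQ3180": 0.8,
                 "TQ3281": 0.3, "TQ3382": 0.1}
known_presences = ["TQ3080", "TQ3081", "TQ3180", "TQ3281"]

threshold = threshold_keeping_fraction(
    [probabilities[sq] for sq in known_presences], fraction=0.75)
print(threshold, sorted(apply_threshold(probabilities, threshold)))
```

Lowering the threshold enlarges the predicted distribution and captures more known presences, while raising it does the reverse, which is why the choice is a judgement call.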