Species distribution modelling


The problem of modelling the geographic distribution of a given animal or plant species has received much attention recently. It is a critical problem in conservation biology, because one needs to know where a species prefers to live, and what its requirements are, before conservation action can be taken. The data available typically consists of a list of occurrences, i.e. a set of geographic coordinates for locations where the species has been observed. In addition, we have access to data on environmental variables, such as climate, elevation and land use, which have been measured or estimated across the region of interest. The goal is to predict which areas within the region satisfy the requirements of the species and thus form part of its potential distribution, that is, the area where conditions are suitable for its survival. The actual, or realised, distribution is often somewhat smaller than the potential distribution, either because the species cannot reach all the areas that could support it (e.g. because of some barrier to dispersal) or because it has been eliminated from some areas by human exploitation, pollution, competition with other species, etc.
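To make the shape of this data concrete, the sketch below pairs a short list of occurrence coordinates with one gridded environmental variable and looks up the value of that variable at each occurrence. It is only an illustration: the Grid class, the sample_environment function, the coordinates and the elevation values are all hypothetical, and the layout is not the file format of any particular modelling package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    """One environmental variable on a regular lon/lat grid (hypothetical layout)."""
    west: float       # longitude of the western edge of the grid
    south: float      # latitude of the southern edge of the grid
    cell_size: float  # width and height of one cell, in degrees
    values: list      # rows of cell values, ordered from north to south

    def value_at(self, lon, lat):
        """Return the value of the cell containing (lon, lat), or None if outside."""
        n_rows = len(self.values)
        n_cols = len(self.values[0])
        col = int((lon - self.west) // self.cell_size)
        row_from_south = int((lat - self.south) // self.cell_size)
        if not (0 <= col < n_cols and 0 <= row_from_south < n_rows):
            return None
        # Rows are stored north to south, so flip the row index.
        return self.values[n_rows - 1 - row_from_south][col]


def sample_environment(occurrences, layers):
    """Attach the value of every environmental layer to each occurrence record."""
    records = []
    for lon, lat in occurrences:
        record = {"lon": lon, "lat": lat}
        for name, grid in layers.items():
            record[name] = grid.value_at(lon, lat)
        records.append(record)
    return records


# Made-up elevation layer: 3 rows x 4 columns of 1-degree cells.
elevation = Grid(
    west=-6.0,
    south=50.0,
    cell_size=1.0,
    values=[
        [300, 250, 120, 40],
        [200, 150, 80, 20],
        [100, 60, 30, 10],
    ],
)

# Made-up presence records as (longitude, latitude) pairs; the last one
# falls outside the gridded region.
occurrences = [(-4.5, 51.2), (-2.1, 52.9), (-7.0, 51.0)]

records = sample_environment(occurrences, {"elevation": elevation})
for record in records:
    print(record)
```

Note that every record produced this way is a presence. Nothing in the table says where the species was looked for and not found.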


Natural history museum and herbarium collections, together with field observations from volunteers (collated by National Recording Schemes and Local Record Centres), provide a rich source of such occurrences. In the UK, this type of data is becoming increasingly accessible via the National Biodiversity Network. However, there is typically little or no information about failures to observe the species at any given location, and many locations have not been surveyed at all. Consequently, it is usually the case that only presence data is available to indicate the occurrence of the species. In addition, for many species in more obscure groups (e.g. many lower plants and invertebrates), even this data is quite sparse.


This sort of scattered, presence-only data is much more difficult to deal with than systematically collected presence/absence or abundance data. Statistical methods to analyse and model it are a recent and rapidly developing field of study.


The most successful modelling methods so far are based on machine learning techniques. The computer packages Maxent and DesktopGarp (GARP: Genetic Algorithm for Rule-set Production) are both freely downloadable and use the same formats for their environmental and species observation data. The most comprehensive model comparison to date was provided by Elith et al. (2006), who compared 16 modelling methods using 226 species across six regions of the world. These analyses found differences between the predictions of alternative methods, but also found that some methods consistently outperformed others. Maxent came out as the top-rated method, narrowly ahead of GARP. On practical grounds, however, Maxent is much easier and quicker to use, and it is the main modelling system that has been used here. It is described in detail by Phillips et al. (2006). The GARP algorithm is described by Stockwell & Peters (1999).




Elith, J., Graham, C.H., & the NCEAS Species Distribution Modelling Group (2006) Novel methods improve prediction of species' distributions from occurrence data. Ecography, 29, 129-151.
Phillips, S.J., Anderson, R.P., & Schapire, R.E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231-259.
Stockwell, D.R.B. & Peters, D.P. (1999) The GARP modelling system: problems and solutions to automated spatial prediction. International Journal of Geographical Information Science, 13, 143-158.