Associate Professor, McGill University, Quebec, Canada
Abstract: Species distribution models (SDMs) are widely used for ecological inference and conservation. However, they (1) demonstrably produce incorrect projections, often mistaking the distribution of observers (i.e. observation bias) for species habitat requirements, and (2) are difficult to project across spatial scales because of spatial autocorrelation. The first issue, observation bias, arises when the presence-only data used to fit SDMs are more abundant in certain environments, for example when citizen science records are concentrated near cities. The second issue, scaling SDM predictions, is rooted in the fact that SDMs estimate distributions from a few measured environmental variables, whereas species’ distributions are often also shaped by unmeasured (latent) environmental variation, biotic interactions, and dispersal. All of these factors can cause species occurrences to cluster together more than the SDM predicts. Nearby SDM predictions are therefore not independent, which makes it difficult to aggregate small-scale predictions into predictions at larger spatial scales.
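Why non-independence matters for aggregation can be illustrated with a small calculation. This is a minimal sketch under a hypothetical latent-patch model that is not described in the abstract: the park is split into blocks, each block is suitable in the latent environment with probability q, and within a suitable block each cell is occupied with probability p / q. The per-cell probability is then still p, exactly what the SDM predicts, but occupied cells are clustered. The function names `park_prob_independent` and `park_prob_clustered` are illustrative only.

```python
import math

def park_prob_independent(cell_probs):
    """P(species occurs in at least one cell of the park), treating the
    cell-level SDM predictions as independent."""
    return 1.0 - math.prod(1.0 - p for p in cell_probs)

def park_prob_clustered(p, n_blocks, cells_per_block, q):
    """Hypothetical latent-patch model. The park is split into n_blocks
    blocks; each block is suitable in the latent environment with
    probability q, and within a suitable block each cell is occupied
    independently with probability p / q. The marginal per-cell
    probability is therefore still p, matching the SDM, but occupied
    cells are clustered into the suitable blocks."""
    if not 0.0 < p <= q <= 1.0:
        raise ValueError("need 0 < p <= q <= 1")
    within = p / q
    # Probability that a single block contains at least one occurrence.
    p_block = q * (1.0 - (1.0 - within) ** cells_per_block)
    # The park is occupied if at least one (independent) block is.
    return 1.0 - (1.0 - p_block) ** n_blocks

cells = [0.02] * 100  # 100 cells, each with SDM probability 0.02
print(park_prob_independent(cells))            # independence assumption
print(park_prob_clustered(0.02, 10, 10, 0.1))  # 10 latent patches of 10 cells
print(park_prob_clustered(0.02, 2, 50, 0.1))   # coarser clustering
```

With these illustrative numbers, the independence assumption gives a park-level probability of about 0.87, whereas the clustered model gives about 0.61 with 10 patches and about 0.19 with 2 larger patches. Setting q = 1 (no latent structure) makes the two calculations agree.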
In this study, we show how to use species checklists from national parks to recalibrate SDMs built with presence-only data and to correct for observation bias. Our method accounts for latent clustering, which allows us to translate SDMs across spatial scales and incorporate the (larger-scale) park checklist data. We test the model by simulating species occurrences across an environmentally heterogeneous landscape with both measured and latent environmental variables, and then fitting SDMs to a spatially biased sample of those occurrences.
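One way to translate cell-level predictions to the park scale, and to calibrate that translation against checklist data, is sketched below. The functional form is an assumption made for illustration, not necessarily the one used in this work: a hypothetical clustering coefficient c treats a park of n cells as if it contained n / c independent cells, so the park-level probability is 1 − ∏(1 − p_i)^(1/c). The names `park_prob` and `fit_clustering_coefficient` are hypothetical, c is estimated by grid-search maximum likelihood against 0/1 checklist records for a single species, and the sketch does not attempt to model observation bias.

```python
import math

def park_prob(cell_probs, c):
    """Hypothetical clustering-coefficient form: the park behaves like
    n / c independent cells, so P = 1 - prod(1 - p_i) ** (1 / c).
    c = 1 recovers the independence assumption; c > 1 lowers P."""
    log_absent = sum(math.log1p(-p) for p in cell_probs)
    return 1.0 - math.exp(log_absent / c)

def fit_clustering_coefficient(parks, checklists, grid=None):
    """Grid-search maximum-likelihood estimate of c for one species.
    parks: list of per-park lists of cell-level occurrence probabilities.
    checklists: list of 0/1 flags, 1 if the species is on that park's list."""
    if grid is None:
        grid = [1.0 + 0.1 * k for k in range(91)]  # c from 1.0 to 10.0
    best_c, best_ll = None, -math.inf
    for c in grid:
        ll = 0.0
        for cells, listed in zip(parks, checklists):
            # Clamp to avoid log(0) for extreme predictions.
            p = min(max(park_prob(cells, c), 1e-12), 1.0 - 1e-12)
            ll += math.log(p) if listed else math.log1p(-p)
        if ll > best_ll:
            best_c, best_ll = c, ll
    return best_c

# Ten identical parks of 100 cells (p = 0.02 each); half list the species.
parks = [[0.02] * 100 for _ in range(10)]
checklists = [1, 0] * 5
print(fit_clustering_coefficient(parks, checklists))
```

Because half of the parks list the species, the likelihood is maximized where the park-level probability is about 0.5, which requires 1 − 0.98^(100/c) = 0.5, i.e. c ≈ 2.9. A grid search keeps the sketch dependency-free; a continuous optimizer would serve equally well.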
We demonstrate that latent clustering causes SDMs to overestimate the probability of species occurrence at larger spatial scales. This effect is stronger when the clustering is coarser and when the latent environment is a more important driver of species distributions. Large-scale predictions can be corrected by fitting a simple “clustering coefficient”. Our model successfully distinguished error caused by clustering from error caused by sampling bias, and thus produced unbiased SDMs. The approach also yields absolute rather than relative probabilities, allowing meaningful estimates of species richness. This method provides a framework for translating SDM predictions to different spatial scales and for robustly integrating diverse data sources.
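The link between absolute probabilities and species richness can be made concrete. By linearity of expectation, the expected number of species in a park is the sum of each species’ absolute probability of occurring there; relative suitability scores from a presence-only SDM have an arbitrary scale, so their sum is not a species count. A minimal sketch, with hypothetical function names; the variance formula additionally assumes that species occur independently of one another.

```python
def expected_richness(park_probs):
    """Expected number of species present in a park. By linearity of
    expectation this is the sum of each species' absolute occurrence
    probability; no independence assumption is needed."""
    return sum(park_probs)

def richness_variance(park_probs):
    """Variance of richness, assuming (hypothetically) that species occur
    independently of one another, i.e. a Poisson-binomial count."""
    return sum(p * (1.0 - p) for p in park_probs)

absolute = [0.9, 0.5, 0.1]          # calibrated park-level probabilities
relative = [9.0, 5.0, 1.0]          # same ranking, arbitrary scale
print(expected_richness(absolute))  # about 1.5 species
print(richness_variance(absolute))
print(expected_richness(relative))  # depends on the arbitrary scale
```

Rescaling the relative scores changes their sum without changing the ranking of species, which is why only calibrated, absolute probabilities support richness estimates.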