The European Distribution of Sus scrofa: Model Outputs from the Project Described within the Poster "Where are All the Boars? An Attempt to Gain a Continental Perspective"

Wild boar is a host of a number of arthropod-vectored diseases, and its numbers are on the rise in mainland Europe. The species potentially impacts ecosystems, humans and farming practices, so its distribution is of interest to policy makers in a number of fields beyond the primarily epidemiological goal of this study. Three statistical model outputs describing the distribution and abundance of Sus scrofa (wild boar) are included in this data package. The dataset covers continental Europe. These data were presented as a poster [1] at the conference Genes, Ecosystems and Risk of Infection (GERI 2015). The first of the three models provides a European map of the probability of presence of Sus scrofa, which can be used to describe the likely geographical distribution of the species. The second and third models provide two indices to help describe the likely abundance across the continent: "the proportion of suitable habitat where presence is estimated", and a simple classification of boar abundance across Europe using quantiles of existing abundance data and proxies.


Overview
Introduction/Study Description
Wild boar (Sus scrofa) are an important component of the ecological and epidemiological systems within which vector-borne diseases persist. Wild boar are hosts to a number of vector species, and they can therefore influence disease cycles as reservoirs of pathogens. Information on wild boar distribution and abundance could therefore make an important contribution to models of vector-borne disease risk.
With a single exception [2], the many studies that have focussed on the distribution, abundance and habitat use of wild boar were carried out in relatively small areas, such as national parks, or at most at country level. Because effective advice on European disease-management policy requires a broader, continental scale, an attempt has been made to produce a continental-scale distribution and abundance map.
This study combines a review of the existing literature with abundance-related data from a range of sources, including national hunting organisations and international and national distribution databases, to provide a continental dataset and perspective on boar distribution and abundance.
To create the final European 1 km resolution boar map, the combined quantitative data described above were constrained using a habitat suitability mask derived from the GlobCover land cover database and informed by published descriptions of habitat preference as well as expert opinion. A number of spatial distribution modelling tools from the VECMAP [3] modelling suite were then used to produce three final modelled distribution outputs for Europe using the Random Forest approach. These comprise a 1 km probability of presence/absence layer, a 1 km abundance index based on presence and habitat availability, and a 1 km ranked abundance map based on regional abundance studies and national hunting figures.
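The Random Forest approach named above can be illustrated with a deliberately minimal sketch. This is not the VECMAP implementation: here each "tree" is a depth-1 decision stump, every stump is fitted to a bootstrap resample of the training points using one randomly chosen predictor, and the forest predicts by majority vote, with the share of votes serving as a probability of presence. The two predictors in the usage lines (an NDVI-like index and a temperature) and all values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 tree: the single threshold split with the fewest misclassifications."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=100, max_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random predictor subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        features = rng.sample(range(p), max_features)  # random subset of predictors
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def vote(forest, row):
    """Label predicted by each stump for one row."""
    return [left if row[f] <= t else right for f, t, left, right in forest]


def predict(forest, row):
    """Majority vote across the forest."""
    return Counter(vote(forest, row)).most_common(1)[0][0]


def predict_proba(forest, row, label=1):
    """Fraction of stumps voting for `label` (here 1 = presence)."""
    return sum(v == label for v in vote(forest, row)) / len(forest)


# Hypothetical training points: [ndvi_like_index, mean_temperature_c]; label 1 = presence
X = [[0.1, 10], [0.2, 12], [0.3, 11], [0.7, 20], [0.8, 22], [0.9, 21]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y, n_trees=100)  # forest size 100, the default reported in this paper
print(predict(forest, [0.95, 25]), predict_proba(forest, [0.95, 25]))
```

A real Random Forest grows deep trees and tunes the number of predictors tried per split; stumps are used here only to keep the bootstrap-and-vote idea visible in a few lines.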

Steps
Binary presence and absence
Five independent sets of distribution data were combined to produce a single presence/absence mask. The data sets used were as follows:
• The EMMA Database [4]: mapping Europe's mammals using data from the Atlas of European Mammals.
• The Global Biodiversity Information Facility (GBIF) [5].
• The National Biodiversity Network [7] UK 10k data.
• Spanish Ministry of Agriculture National Inventory of Biodiversity [8].
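The combination step can be sketched as a union of presence records. This assumes, as the text does not state, that each source has already been rasterised onto a common grid and that a cell counts as present if any one source records the species; the source names, cells and grid in the example are hypothetical. A 0 at this stage only means "not recorded", which is why the habitat step that follows is needed to introduce absences.

```python
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]  # hypothetical (row, col) index on a shared grid


def combine_presence(sources: Dict[str, Iterable[Cell]], grid_cells: Iterable[Cell]) -> Dict[Cell, int]:
    """Return 1 for cells recorded as present by at least one source, else 0 (logical OR)."""
    present: Set[Cell] = set()
    for cells in sources.values():
        present.update(cells)
    return {cell: int(cell in present) for cell in grid_cells}


grid = [(r, c) for r in range(2) for c in range(2)]
mask = combine_presence({"EMMA": [(0, 0), (0, 1)], "GBIF": [(1, 1), (0, 0)]}, grid)
print(mask)  # {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1}
```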

Habitat definition
For much of the indicated range, the distributions detailed above were, by their nature, indications of current presence limits. Within these boundaries there was no indication of absence. To introduce absences within these limits, suitability masks were defined using species-specific habitat preferences derived from land cover classes, using GLOBCOVER [9] at 1 km resolution, downloaded from the EDENext Data Portal [10]. These suitability definitions are recorded in Table 1. The presence/absence data described in the previous section were combined with the suitability layer and aggregated to a 10 km grid as a proportion of suitable habitat. These values were then sampled and supplied to the Random Forest modelling framework within VECMAP [3], outlined later in this paper.
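The combination and aggregation step can be sketched as follows. The sketch assumes binary 1 km presence and suitability grids with identical alignment and defines each 10 km value as the share of suitable 1 km cells where presence is recorded, following the wording used for this index in the abstract. Returning None for 10 km cells with no suitable habitat is an assumption of the sketch, and the function name is hypothetical.

```python
from typing import List, Optional


def proportion_suitable_present(presence: List[List[int]], suitable: List[List[int]],
                                block: int = 10) -> List[List[Optional[float]]]:
    """Aggregate fine (e.g. 1 km) cells into block x block coarse (e.g. 10 km) cells.

    Each coarse value = (# fine cells both suitable and present) / (# suitable fine cells).
    Presence in unsuitable cells is ignored; coarse cells with no suitable habitat get None.
    """
    rows, cols = len(presence), len(presence[0])
    out = []
    for r0 in range(0, rows, block):
        out_row = []
        for c0 in range(0, cols, block):
            n_suitable = n_present = 0
            for r in range(r0, min(r0 + block, rows)):
                for c in range(c0, min(c0 + block, cols)):
                    if suitable[r][c]:
                        n_suitable += 1
                        n_present += 1 if presence[r][c] else 0
            out_row.append(n_present / n_suitable if n_suitable else None)
        out.append(out_row)
    return out


# 2 x 4 toy grid aggregated with block=2: cell (0, 2) is present but unsuitable, so it is ignored
print(proportion_suitable_present([[1, 1, 1, 0], [1, 0, 0, 0]],
                                  [[1, 1, 0, 0], [1, 1, 0, 0]], block=2))  # [[0.75, None]]
```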

Boar Abundance Inputs
A comprehensive literature review of Sus scrofa abundance studies was undertaken, which unearthed a piecemeal collection of abundance data focused mainly on small areas such as national parks or, in some cases, whole countries. These data were recorded by different methods and across different time periods, and their spatial coverage across Europe was far from regular. A notable exception was a recent review of wild boar population trends in 18 European countries, based on hunting statistics [2].
To complement these abundance data, hunting figures were also identified for a number of countries at both national and sub-national level [34][35][36][37][38]. After discussion with boar specialists, it was agreed that, at least within a single country, hunting data could be considered a valid proxy for abundance. To obtain the most complete coverage across the continent, the available data were converted to relative abundance indices that could be compared across countries, by normalising the available numbers according to known national abundance figures.
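One possible reading of the normalisation described above is sketched below. It assumes, hypothetically, that within a country abundance is proportional to hunting bag, so each region's bag is rescaled by the ratio of the national abundance estimate to the national bag and then divided by the region's area, giving densities that can be compared across countries. The function and parameter names are illustrative and are not taken from the study.

```python
def relative_density(regional_bags, region_areas_km2, national_bag, national_abundance):
    """Convert regional hunting bags into estimated boar densities (animals per km2).

    Assumption: within one country, abundance is proportional to hunting bag,
    so every region's bag is scaled by national_abundance / national_bag.
    """
    scale = national_abundance / national_bag
    return {region: bag * scale / region_areas_km2[region]
            for region, bag in regional_bags.items()}


# Hypothetical country: 300 boar shot nationally, national population estimate of 600
print(relative_density({"A": 200, "B": 100}, {"A": 100.0, "B": 100.0},
                       national_bag=300, national_abundance=600))  # {'A': 4.0, 'B': 2.0}
```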
The data were then categorised into quartiles, with a fifth category of 0 (none or negligible boar numbers) where this was known or inferred in areas defined as unsuitable habitat. The resulting database provided categorical boar abundance ranging from 0 to 4 (0 = none/negligible boar abundance, 4 = high abundance).
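The classification can be sketched with Python's `statistics.quantiles`, which returns the three cut points separating four quartile classes. The sketch assumes that quartiles are computed over the pooled densities of all suitable cells and that unsuitable cells are assigned class 0; `statistics.quantiles` uses its default "exclusive" interpolation, which may differ from the quantile definition used in the original workflow.

```python
import bisect
import statistics


def classify_abundance(densities, suitable):
    """Return an abundance class 0-4 for each cell.

    0 = unsuitable habitat (none/negligible boar); 1-4 = quartile class of the
    cell's density among all suitable cells (4 = highest quartile).
    Assumes several suitable cells are available to compute the quartiles.
    """
    pooled = [d for d, ok in zip(densities, suitable) if ok]
    cuts = statistics.quantiles(pooled, n=4)  # three cut points -> four classes
    return [1 + bisect.bisect_right(cuts, d) if ok else 0
            for d, ok in zip(densities, suitable)]


print(classify_abundance([1, 2, 3, 4, 5, 6, 7, 8, 99], [True] * 8 + [False]))
# [1, 1, 2, 2, 3, 3, 4, 4, 0]
```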

Model predictor suite
A suite of spatial covariate layers of environmental data was used by the VECMAP [3] model tools to define statistical relationships with the variable to be modelled. This predictor suite included a wide range of remotely sensed variables. The models produced a regression (continuous) output and, for the classified boar abundance index, a Random Forest (RF) categorical output.

Sampling strategy
Sample points were extracted for input into the three Random Forest models from a 10 km matrix defining each of the three input variables within known distributions. Overall, ~12,000 random points were used across Europe. The following VECMAP [3] default parameters were used to define the Random Forest prediction for each of the models:
• Prediction forest size: 100.
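The sampling step can be sketched as simple random sampling without replacement from the 10 km cells that carry a value for the modelled variable. Representing cells outside the known distributions as None, sorting the candidates, and fixing the seed for reproducibility are assumptions of this sketch rather than documented VECMAP behaviour; the function name is hypothetical.

```python
import random


def sample_training_points(grid_values, n_points, seed=42):
    """Draw up to n_points distinct cells that have a value for the modelled variable.

    grid_values: dict mapping (row, col) -> value, or None outside known distributions.
    Returns a list of ((row, col), value) pairs.
    """
    candidates = sorted(cell for cell, v in grid_values.items() if v is not None)
    rng = random.Random(seed)  # fixed seed so the same points are drawn on every run
    chosen = rng.sample(candidates, min(n_points, len(candidates)))
    return [(cell, grid_values[cell]) for cell in chosen]


# Hypothetical 10 x 10 grid where every third diagonal is outside the known distribution
grid = {(r, c): (None if (r + c) % 3 == 0 else 0.1 * c) for r in range(10) for c in range(10)}
points = sample_training_points(grid, 20)
print(len(points))  # 20
```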

Quality Control
These models are a first attempt at quantifying boar distribution at this scale, and there has been no ground-truth validation of these maps so far. All the model outputs do, however, meet standard accuracy metrics (R squared or Cohen's kappa coefficient, where relevant), supporting their statistical reliability. The model outputs have also been informally reviewed by project boar experts.
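For reference, the two accuracy metrics named above can be computed as follows. Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from the class frequencies, kappa = (p_o - p_e) / (1 - p_e). R squared is computed here as the coefficient of determination, 1 - SS_res / SS_tot; which variant of R squared the modelling software reports is not stated, so this choice is an assumption.

```python
from collections import Counter


def cohens_kappa(observed, predicted):
    """Cohen's kappa for two equal-length sequences of class labels."""
    n = len(observed)
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n  # observed agreement
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    p_e = sum(obs_counts[k] * pred_counts[k] for k in obs_counts) / (n * n)  # chance agreement
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when observed values are constant")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0
```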

Constraints
There were no constraints involved in data production.

N/A.
Data type
Primary data, processed data, interpretation of data.

Dataset creators
As per author list.

Language
English.

Programming language
None.

Accessibility criteria
All three layers are also provided as quick-look maps in JPEG format that can be viewed in any image viewer.
The data themselves are distributed as GIS raster data in two formats: GeoTIFF, a widely used standard GIS raster format, and GeoJP2 (JPEG 2000 format), a non-proprietary format. GeoTIFF and GeoJP2 files can be read directly by most GIS software and by some other software packages, including proprietary software (ESRI ArcGIS) and open-source alternatives such as Quantum GIS (QGIS) [48] or the R-project [47] raster package. Users without suitable software already installed can download the open-source QGIS software free of charge from http://www.qgis.org to view these data.

Reuse potential
Wild boar is a large mammal whose numbers and distribution are increasing in mainland Europe. The species' potential impact on the environment, human activities and farming practices means that the model outputs will be of interest to ecologists, human and animal health authorities, and policy makers in a number of fields beyond the epidemiological goal of this study.

Competing Interests
The authors declare that they have no competing interests.