CN113220810B - Multi-source species distribution data processing method and device - Google Patents
- Publication number
- CN113220810B (application number CN202110410212.7A)
- Authority
- CN
- China
- Legal status: Active (as listed by Google Patents; not a legal conclusion)
Classifications
- G06F16/29: Geographical information databases
- G06F16/284: Relational databases
- G06Q50/26: Government or public services
- G06T7/10: Segmentation; Edge detection
Abstract
The invention provides a multi-source species distribution data processing method comprising the following steps. Step 1: collect environmental data for the species distribution area, and determine the environmental data layers for each species according to the conditions the species requires for survival. Step 2: collect multi-source species distribution data. Step 3: generate a prior distribution range map, a distribution restriction area map and a distribution preference area map of the species from the area data, the point data and the descriptive data. Step 4: use the species point data and the environmental data layers as input data of a species distribution model, and run the model to obtain a species probability distribution map. Step 5: determine a species distribution threshold from the prior distribution range map, the known species occurrence points and the species probability distribution map, and generate a species binary distribution map. Step 6: clip or erase the binary distribution map using the prior distribution range map, the distribution restriction area map and the distribution preference area map to obtain the species distribution map. The method is generally applicable across species.
Description
Technical Field
The invention relates to the technical field of biogeography research, and in particular to a multi-source species distribution data processing method and device.
Background
Whether a single species is endangered is often directly related to the extent of its spatial distribution, and assessing species richness in a region depends on spatial distribution data for the many species found there. Determining how species are distributed in space is therefore a key problem in ecology and conservation biology (Yoan F, 2014), and it is the basis for further evaluating the survival status of species and planning their protection. However, field surveys of species spatial distribution are technically demanding and consume enormous amounts of time and money. As a result, knowledge of the geographical distribution of life on Earth is still very limited (Jetz, W et al, 2012), and, apart from flagship species of particular interest that have been systematically surveyed, spatial distribution data for most species are very poor.
To address the scarcity of species distribution data and the heavy workload of field surveys, researchers in China and abroad have proposed simulating species distributions with Species Distribution Models (SDMs), in the hope of inferring the spatial distribution of a species from a small number of distribution points when systematic survey data are absent. The principle is to establish a relationship between species distribution samples and the corresponding environmental variables, and then project that relationship onto the geographic space under study to estimate the spatial distribution of the target species. Species distribution models have received sustained attention over the last 30 years, and more than 40 models have been proposed (Liu Xiao Tong, 2019). During this development, however, it became apparent that the models are affected by sampling bias (Merow et al, 2016), small sample sizes (Merow et al, 2016) and insufficient consideration of non-environmental factors (Greg J et al, 2012), which calls the reliability of their simulation results into question. The underlying reason is that the present-day spatial distribution of a species results from an extremely complex and subtle process of interaction between the species and its environment and among species. However complex a species distribution model's algorithm is, it simplifies this process, so its results can reflect only certain aspects of what influences the species distribution.
Therefore, advances in algorithms alone are not sufficient to obtain the most accurate species distribution possible; moreover, recent work has shown that several of the most popular presence-only species distribution modeling methods are equivalent or nearly equivalent (William fixian, 2015). The reliability problems that sampling bias, small samples and insufficient consideration of non-environmental factors cause in traditional species distribution models cannot be solved by improving the model algorithm alone. A reasonable solution is to integrate data related to species distribution: use the strengths of one data type to offset the weaknesses of another, use the strengths of combined data to offset the weaknesses of any single dataset, let the data complement each other, and obtain the best probability estimate for each species from the maximum amount of available information.
Species distribution models have shortcomings in data, threshold determination, model validation and other aspects. Many researchers have proposed solutions from different perspectives, but most are not systematic. The invention of application No. 201711463990.2 discloses a data processing method and apparatus that works as follows: it runs a species distribution model on collected species records and environmental data to obtain a species distribution probability map; rasterizes the probability map to obtain a species distribution threshold; binarizes the probability map to obtain a species distribution layer; and finally clips the species distribution layer by species habitat type to obtain a species habitat distribution map. However, that invention has the following defects. First, its method for determining the species distribution threshold lacks a basis, cannot be verified, and does not reflect the complementary roles of multi-source data. Second, the species distribution map is clipped only by species habitat type, which lacks generality: the method cannot be used to predict the distribution of species that lack habitat type data.
Determining the species distribution threshold is key to obtaining species distribution data. Common methods include random selection, empirical selection and the like. The invention of application No. 201711463990.2 proposes testing and verifying N commonly used and suitable types of species distribution threshold. For example, 200 endangered bird species are used as test data, the species distribution probability map of each bird is binarized with each of the N threshold types, and the threshold whose result has the smallest range while still including all known bird distribution record points is selected as the optimal threshold. First, this method involves a heavy workload: a relatively suitable threshold may be obtained only after testing the distribution thresholds of more than 200 endangered bird species. Second, because each species' distribution ratio differs, there is insufficient basis for deriving the optimal threshold from the thresholds of many similar species. Clipping the binary distribution map according to a species' habitat preferences can further improve the accuracy of the species distribution range. Multi-source species distribution data include distribution restriction data describing a species' limits on annual mean temperature, maximum temperature, minimum temperature or altitude, and distribution preference data on habitat type, soil type, water source conditions and the like. However, because basic research has reached different depths for different species, the available habitat restriction or preference data vary from species to species. The invention of application No. 201711463990.2 clips only by habitat type, which is highly limiting: on the one hand, species without habitat selection data cannot be clipped by that method; on the other hand, other environmental restriction or selection data are not used, so the accuracy of the distribution map cannot be further improved and the advantages of multi-source data are not fully exploited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method and a device that use multi-source data to improve the reliability of species distribution prediction.
A multi-source species distribution data processing method comprises the following steps:
Step 1: collect environmental data for the species distribution area, and determine the environmental data layers for each species according to the conditions the species requires for survival;
the environmental data comprise four types: climate, terrain, surface cover and human activities;
Step 2: collect multi-source species distribution data, comprising species area data, species point data and species descriptive data;
the species area data comprise expert range maps and local range maps; the species point data comprise species occurrence data and species non-occurrence data; the species descriptive data comprise ecological restriction data and habitat preference data;
Step 3: generate a prior distribution range map, a distribution restriction area map and a distribution preference area map of the species from the species area data, the species point data and the species descriptive data;
Step 4: use the species point data and the environmental data layers as input data of a species distribution model, and run the model to obtain a species probability distribution map;
Step 5: determine a species distribution threshold from the prior distribution range map, the known species occurrence points and the species probability distribution map, and generate a species binary distribution map;
Step 6: clip or erase the binary distribution map using the prior distribution range map, the distribution restriction area map and the distribution preference area map to obtain the species distribution map.
Further, in the above multi-source species distribution data processing method, the prior distribution range map is generated as follows: a minimum convex hull algorithm is applied to the species occurrence point data to find the convex polygon of minimum area that contains all known species occurrence points, and the species area data are merged with this minimum convex polygon to form the prior distribution range map of the species.
Further, in the above multi-source species distribution data processing method, the distribution restriction area map is generated by converting the ecological restriction data into corresponding spatial data.
Further, in the above multi-source species distribution data processing method, the distribution preference area map is generated by converting the habitat preference data into corresponding spatial data.
Further, in the above multi-source species distribution data processing method, obtaining the species distribution map in step 6 comprises:
first, clipping the binary distribution map with the prior distribution range map;
second, erasing the species distribution restriction area map from the clipped binary distribution map;
and finally, clipping the erased binary distribution map with the species distribution preference area map to obtain the species distribution map.
A multi-source species distribution data processing apparatus comprising:
a data preprocessing module, configured to collect environmental data for the species distribution area and determine the environmental data layers for each species according to the conditions the species requires for survival;
the environmental data comprise four types: climate, terrain, surface cover and human activities;
a data collection module, configured to collect multi-source species distribution data, comprising species area data, species point data and species descriptive data;
the species area data comprise expert range maps and local range maps; the species point data comprise species occurrence data and species non-occurrence data; the species descriptive data comprise ecological restriction data and habitat preference data;
a data processing module, configured to generate a prior distribution range map, a distribution restriction area map and a distribution preference area map of the species from the species area data, the species point data and the species descriptive data;
a model calling module, configured to run a species distribution model with the species point data and the environmental data layers as input data, to obtain a species probability distribution map;
a threshold determining module, configured to determine a species distribution threshold from the prior distribution range map, the known species occurrence points and the species probability distribution map, and to generate a species binary distribution map;
and a clipping module, configured to clip or erase the binary distribution map using the prior distribution range map, the distribution restriction area map and the distribution preference area map to obtain the species distribution map.
Further, in the above multi-source species distribution data processing apparatus, the prior distribution range map is generated as follows: a minimum convex hull algorithm is applied to the species occurrence point data to find the convex polygon of minimum area that contains all known species occurrence points, and the species area data are merged with this minimum convex polygon to form the prior distribution range map of the species.
Further, in the above multi-source species distribution data processing apparatus, the distribution restriction area map is generated by converting the ecological restriction data into corresponding spatial data.
Further, in the above multi-source species distribution data processing apparatus, the distribution preference area map is generated by converting the habitat preference data into corresponding spatial data.
Further, in the above multi-source species distribution data processing apparatus, the clipping module comprises:
a first clipping unit, configured to clip the binary distribution map with the prior distribution range map;
an erasing unit, configured to erase the species distribution restriction area map from the clipped binary distribution map;
and a second clipping unit, configured to clip the erased binary distribution map with the species distribution preference area map to obtain the species distribution map.
Beneficial effects:
(1) The species distribution threshold determination method, which draws on the complementary strengths of multi-source species distribution data, solves the problem of converting a species probability map into a species binary distribution map. The method uses the prior distribution range data of a single species to binarize the species distribution probability map computed by the species distribution model: the threshold adopted for species binarization is the one that gives the smallest total predicted area while ensuring that more than 75% of the predicted distribution range lies within the prior distribution area and that more than 90% of the species' known distribution points are included. The method is generally applicable and has a clear biological meaning.
(2) The method of the invention solves the problem of converting the species binary distribution map into the species distribution map. In practice, the depth of basic research differs between species, and so does the available data on species distribution restrictions or preferences.
Drawings
FIG. 1 is a flow chart of a multi-source species distribution data processing method of the present invention;
FIG. 2 is a schematic diagram of a multi-source species distribution data processing apparatus according to the present invention;
FIG. 3 is a graph of climate class (BIO1-BIO19) environmental background data;
FIG. 4 shows terrain, surface cover and human activity environmental background data: panel 1 is elevation, 2 is slope, 3 is aspect, 4 is landform type, 5 is topographic relief, 6 is terrain roughness, 7 is the human activity footprint index, 8 is land cover type, and 9 is land use type;
FIG. 5 shows data on the distribution of the Yunnan golden monkey: 1 is the expert range map; 2 is the local (protected area) range map; 3 is the occurrence point map;
FIG. 6 shows the integrated distribution data of the Yunnan golden monkey: 1 is the prior distribution area map; 2 is the elevation restriction area map; 3 is the map of preferred and rejected habitat types;
FIG. 7 is the probability distribution map of the Yunnan golden monkey;
FIG. 8-1 shows binary distribution maps of the Yunnan golden monkey under different probability thresholds;
FIG. 8-2 shows, for each threshold, the ratio of the species distribution area inside the prior region to that outside it;
FIG. 8-3 is the binary distribution map obtained with the optimal threshold of 0.65;
FIG. 9-1 is the distribution area clipped by the prior distribution area;
FIG. 9-2 is the distribution area after processing with the prior distribution area and the elevation restriction area;
FIG. 9-3 is the distribution area after processing with the prior distribution area, the elevation restriction area and the habitat types;
FIG. 10 overlays the predicted species distribution area on the TNC field survey distribution area.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention are described clearly and completely below, and it is obvious that the described embodiments are some, not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the present invention provides a method for processing data of multi-source species distribution, which comprises the following steps:
Step 1: collect environmental data for the species distribution area, and determine the environmental data layers for each species according to the conditions the species requires for survival. The environmental data comprise four types: climate, terrain, surface cover and human activities.
Specifically, environmental background data for the study area are obtained in four types: climate (BIO1-BIO19, with the relevant temperature data taken from WorldClim global climate data, http://www.worldclim.org/), terrain (altitude, landform type, terrain roughness), surface cover (surface cover type) and human activities (human activity intensity). The environmental background data for each species are selected and combined from these four types according to the species' characteristics; that is, the environmental layers for each species may be determined by the conditions it requires for survival. For example, if amphibian species A has relatively demanding temperature and humidity requirements, its environmental layers may include a temperature layer and a humidity layer. The layers for each species may also include any other environmental layer, such as altitude, terrain, land use type or human activity intensity, and are not limited to those reflecting the species' survival requirements; the invention does not restrict this. Most species use the same environmental layers, but a few have special requirements for certain survival conditions, so the invention provides for including such special environmental layers.
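As a minimal sketch of this per-species layer selection, assuming layers are identified by hypothetical string names (the patent fixes no naming scheme), a shared base set built from the four data types can be extended with species-specific layers:

```python
# Hypothetical layer names; only the categories come from the description above.
BASE_LAYERS = (
    [f"BIO{i}" for i in range(1, 20)]                      # climate: BIO1-BIO19
    + ["elevation", "landform_type", "terrain_roughness"]  # terrain
    + ["surface_cover_type"]                               # surface cover
    + ["human_activity_intensity"]                         # human activities
)


def layers_for_species(extra=(), exclude=()):
    """Return the environmental layers for one species.

    Starts from the shared base set, drops any layers listed in `exclude`,
    then appends special layers the species depends on (e.g. humidity for
    an amphibian). Order is preserved and duplicates are skipped.
    """
    excluded = set(exclude)
    selected = [name for name in BASE_LAYERS if name not in excluded]
    for name in extra:
        if name not in selected:
            selected.append(name)
    return selected


# Usage: a humidity-sensitive amphibian gets one extra (hypothetical) layer.
amphibian_layers = layers_for_species(extra=["humidity"])
```

Keeping the base set shared and recording only the per-species additions mirrors the observation that most species use the same layers.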
Step 2: collecting species multi-source distribution data; the species multi-source distribution data comprises: species area data, species point data, and species descriptive data.
Specifically, data related to the species distribution are acquired and stored by category. The species area data comprise expert range maps (expert range map) and local range maps (local range map); the species point data comprise species occurrence data (occurrence data) and species non-occurrence data (absence data); the species descriptive data comprise restriction factors (restrictive factor) and preference factors (preference factor).
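One way to hold the categorized data is a small container per species. The sketch below is a hypothetical structure (the class and field names are illustrative, not from the patent) that also reports which of the three categories are still empty, since the depth of available data varies between species:

```python
from dataclasses import dataclass, field


@dataclass
class SpeciesSources:
    """Multi-source distribution data for one species (hypothetical container)."""
    name: str
    # Area data: range polygons, each a list of (x, y) vertices.
    expert_ranges: list = field(default_factory=list)
    local_ranges: list = field(default_factory=list)
    # Point data: (x, y) coordinates of occurrences and absences.
    occurrences: list = field(default_factory=list)
    absences: list = field(default_factory=list)
    # Descriptive data, e.g. {"elevation": (min_m, max_m)} or {"habitat": {codes}}.
    restrictions: dict = field(default_factory=dict)
    preferences: dict = field(default_factory=dict)

    def missing_categories(self):
        """List the data categories that are still empty for this species."""
        checks = {
            "area": self.expert_ranges or self.local_ranges,
            "point": self.occurrences or self.absences,
            "descriptive": self.restrictions or self.preferences,
        }
        return [category for category, data in checks.items() if not data]
```

For example, `SpeciesSources("species A", occurrences=[(1.0, 2.0)]).missing_categories()` reports that area and descriptive data are still missing.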
Sources of multi-source species data include: species distribution information obtained from field surveys; information published or updated in species databases; information published or updated on websites; information published or updated in the literature; species distribution information from specimen museums; and information published or updated on the personal home pages of biology enthusiasts.
Step 3: integrate the multi-source species distribution data by generating a prior distribution range map, a distribution restriction area map and a distribution preference area map of the species from the species area data, the species point data and the species descriptive data.
Specifically, a minimum convex hull algorithm is applied to the species occurrence point data to find the convex polygon of minimum area that contains all known species occurrence points. The species area data (i.e. the expert range and the local range) are merged with this minimum convex polygon to form the prior distribution range map of the species;
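The prior-range step can be sketched on a raster grid as follows. This assumes Andrew's monotone-chain algorithm as the convex hull method (the patent does not name one), represents the expert and local ranges as boolean arrays aligned with the grid, and tests cell centres against the hull; all function names are hypothetical.

```python
import numpy as np


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_mask(hull, xs, ys):
    """Boolean grid (rows follow ys, columns follow xs): True where the cell
    centre lies inside or on the hull. Fewer than 3 vertices gives an empty mask."""
    X, Y = np.meshgrid(xs, ys)
    inside = np.zeros(X.shape, dtype=bool)
    if len(hull) < 3:
        return inside
    inside[:] = True
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        # Counter-clockwise polygon: interior points lie left of every edge.
        inside &= (x1 - x0) * (Y - y0) - (y1 - y0) * (X - x0) >= 0
    return inside


def prior_range(occurrences, area_masks, xs, ys):
    """Prior range = occurrence-point hull OR any expert/local range mask."""
    prior = hull_mask(convex_hull(occurrences), xs, ys)
    for mask in area_masks:
        prior |= mask
    return prior
```

A production workflow would more likely do the hull and the merge on vector geometries in a GIS library and rasterize once; the array version keeps the example self-contained.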
the distribution restriction data in the species descriptive data are converted into corresponding spatial data to form the distribution restriction area map (for example, if the species' altitude range is restricted, the spatial extent of the specified altitude zone is extracted from the digital elevation model);
and the distribution preference data in the species descriptive data are converted into corresponding spatial data to form the distribution preference area map (for example, for a habitat type preference, the spatial extent of the specified habitat types is extracted from the habitat type data).
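Both conversions reduce to simple raster tests. The sketch below assumes the restriction area is the set of cells outside a permitted elevation band (i.e. the area to be erased later) and that habitat types are integer codes; the function names and the code values are hypothetical.

```python
import numpy as np


def restriction_mask(dem, min_elev, max_elev):
    """Restricted area: cells outside the permitted elevation band (assumed)."""
    return (dem < min_elev) | (dem > max_elev)


def preference_mask(habitat, preferred_codes):
    """Preferred area: cells whose habitat-type code is a preferred type."""
    return np.isin(habitat, list(preferred_codes))


# Usage with toy rasters: a 2x2 DEM in metres and a 2x2 habitat-code grid.
dem = np.array([[100, 1500], [3000, 4500]])
habitat = np.array([[1, 2], [3, 1]])
restricted = restriction_mask(dem, 1000, 4000)
preferred = preference_mask(habitat, {1, 3})
```

Other descriptive limits (annual mean temperature, minimum temperature and so on) follow the same pattern with the corresponding environmental layer in place of the DEM.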
Step 4: use the species point data and the environmental data layers as input data of a species distribution model, and run the model to obtain a species probability distribution map.
Specifically, the species distribution point records and the environmental data layers are used as the model inputs, and running the model yields the species probability distribution map;
commonly used species distribution model algorithms include the Maximum Entropy model (MaxEnt), Random Forest (RF) and the Generalized Linear Model (GLM). According to related research, these models take the same input data (species occurrence points plus environmental background layers), and their algorithms do not differ in essence. Therefore, the invention does not restrict which species distribution model algorithm is used for the species distribution simulation.
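As one illustration of an interchangeable model, the sketch below fits a presence/background logistic regression (a GLM with a logit link) by plain gradient descent. It is a hypothetical stand-in, not MaxEnt or RF: it assumes the layers are stacked into one array, occurrence points are given as (row, col) cells, and background cells are drawn at random. Its output is a relative suitability score rather than a calibrated occurrence probability.

```python
import numpy as np


def fit_logistic_sdm(layers, presence_idx, n_background=500, steps=2000, lr=0.1, seed=0):
    """Presence/background logistic regression on stacked environmental layers.

    layers: array (n_layers, rows, cols); presence_idx: list of (row, col).
    Returns a suitability raster in (0, 1) with shape (rows, cols).
    """
    rng = np.random.default_rng(seed)
    n_layers, rows, cols = layers.shape
    flat = layers.reshape(n_layers, -1).T               # (cells, n_layers)
    mu, sd = flat.mean(axis=0), flat.std(axis=0)
    sd[sd == 0] = 1.0                                   # guard constant layers
    z = (flat - mu) / sd                                # standardise each layer
    pres = np.ravel_multi_index(tuple(np.asarray(presence_idx).T), (rows, cols))
    back = rng.choice(rows * cols, size=n_background, replace=True)
    X = np.vstack([z[pres], z[back]])
    X = np.hstack([np.ones((X.shape[0], 1)), X])        # intercept column
    y = np.concatenate([np.ones(len(pres)), np.zeros(len(back))])
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y)) / len(y)              # mean log-loss gradient
    Z = np.hstack([np.ones((z.shape[0], 1)), z])
    return (1.0 / (1.0 + np.exp(-(Z @ w)))).reshape(rows, cols)


# Usage: one layer increasing west to east, occurrences in the two easternmost columns.
layers = np.tile(np.arange(10, dtype=float), (10, 1))[np.newaxis]
presence = [(r, c) for r in range(10) for c in (8, 9)]
suitability = fit_logistic_sdm(layers, presence)
```

Because any of these algorithms consumes the same inputs and returns a raster of scores, the later threshold and clipping steps do not depend on which one is used.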
The species probability distribution map expresses the species' preference for habitat as a probability; the result can be interpreted as the species' probability of occurrence, its habitat suitability or similar, and is stored in ASCII grid (.asc) format.
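The ASCII grid format is a plain-text header followed by rows of cell values. A minimal writer might look like the sketch below; the function name, the NODATA value of -9999 and the four-decimal precision are assumptions, and row 0 of the array is written first, i.e. treated as the northernmost row.

```python
import io

import numpy as np


def to_ascii_grid(grid, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Serialise a 2-D probability array as ESRI ASCII grid (.asc) text.

    NaN cells are written as the NODATA value.
    """
    nrows, ncols = grid.shape
    header = (
        f"ncols {ncols}\nnrows {nrows}\n"
        f"xllcorner {xllcorner}\nyllcorner {yllcorner}\n"
        f"cellsize {cellsize}\nNODATA_value {nodata}\n"
    )
    body = np.where(np.isnan(grid), nodata, grid)
    buf = io.StringIO()
    np.savetxt(buf, body, fmt="%.4f")   # one text line per raster row
    return header + buf.getvalue()


text = to_ascii_grid(np.array([[0.1, np.nan], [0.5, 0.9]]), 100.0, 25.0, 0.01)
```

Writing the text to a file with an `.asc` extension gives a grid that common GIS tools can open.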
Step 5: Determine a species distribution threshold from the prior distribution range map, the known species occurrence points and the species probability distribution map, and generate a species binary distribution map.
Specifically, let the species distribution threshold be N, with N ranging from 0 to 1 in steps of 0.01. For each value of N, count in the resulting binary distribution map the pixels with value 1 inside the prior distribution range and the pixels with value 1 outside it, and also count how many known species occurrence points fall on pixels with value 1. (A pixel takes the value 1 when its distribution probability is greater than the current threshold N; the species is then considered present in that pixel.)
The optimal distribution probability threshold of the species is the value of N at which the number of value-1 pixels inside the prior distribution range is more than 3 times the number outside it (i.e., more than 75% of the value-1 binary distribution area lies within the prior distribution range) and, at the same time, 90% of the known species occurrence points lie within the value-1 binary distribution area. If no threshold satisfies both conditions simultaneously, the model is considered insufficiently accurate and the prediction fails.
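The threshold search in Step 5 can be sketched as a plain grid sweep. This is a minimal illustration, not the patent's implementation: it assumes the probability map and the prior range are already aligned grids (nested lists), that each known occurrence point has been converted to a (row, column) cell index, and that when several thresholds pass both tests the highest one is kept. The patent does not state a tie-breaking rule, so that choice, like the names `evaluate_threshold` and `select_threshold`, is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def evaluate_threshold(prob: Grid, prior: Mask, occ_cells: Sequence[Tuple[int, int]],
                       n: float) -> Tuple[int, int, float]:
    """Binarize `prob` at threshold n; return (inside, outside, occurrence hit fraction)."""
    inside = outside = 0
    for r, row in enumerate(prob):
        for c, p in enumerate(row):
            if p > n:  # pixel value 1: probability greater than the threshold
                if prior[r][c]:
                    inside += 1
                else:
                    outside += 1
    hits = sum(1 for r, c in occ_cells if prob[r][c] > n)
    return inside, outside, hits / len(occ_cells)


def select_threshold(prob: Grid, prior: Mask, occ_cells: Sequence[Tuple[int, int]],
                     ratio: float = 3.0, min_hit: float = 0.90) -> Optional[float]:
    """Sweep N = 0.00, 0.01, ..., 1.00; return the highest N passing both tests, else None."""
    feasible = []
    for i in range(101):
        n = i / 100
        inside, outside, hit = evaluate_threshold(prob, prior, occ_cells, n)
        # More than `ratio` times as many value-1 pixels inside the prior range as outside,
        # and at least `min_hit` of the known occurrences on value-1 pixels.
        if inside > ratio * outside and hit >= min_hit:
            feasible.append(n)
    # Assumption: keep the most conservative (highest) feasible threshold.
    return max(feasible) if feasible else None  # None means the prediction fails
```

Returning `None` mirrors the rule that a model with no threshold satisfying both conditions is rejected. Lowering `ratio` or `min_hit` relaxes the two tests if a different trade-off is wanted.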
Step 6: Clip or erase the binary distribution map using the prior distribution range map, the distribution limitation area map and the distribution preference area map to obtain the species distribution map.
Specifically, the species binary distribution area is first clipped with the prior distribution range. It is then erased with the species distribution limitation layer. Finally, it is clipped with the species habitat preference area to obtain the final species distribution map.
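Once all layers are rasterized onto the same grid, the three operations of Step 6 reduce to cell-wise Boolean logic: clipping keeps presence cells inside a region (AND), and erasing removes presence cells inside a region (AND NOT). The sketch below assumes the limitation layer has already been encoded as the cells to remove (for example, cells outside the permitted altitude band) and that area is computed from pixel counts at a fixed cell size; both encodings are assumptions, and the helper names are hypothetical.

```python
from typing import List

Mask = List[List[bool]]


def clip(binary: Mask, region: Mask) -> Mask:
    """Keep presence cells that lie inside `region` (cell-wise AND)."""
    return [[b and r for b, r in zip(brow, rrow)] for brow, rrow in zip(binary, region)]


def erase(binary: Mask, region: Mask) -> Mask:
    """Remove presence cells that lie inside `region` (cell-wise AND NOT)."""
    return [[b and not r for b, r in zip(brow, rrow)] for brow, rrow in zip(binary, region)]


def refine(binary: Mask, prior: Mask, excluded: Mask, preferred: Mask) -> Mask:
    """Step 6 order: clip to prior range, erase excluded cells, clip to preferred habitat."""
    return clip(erase(clip(binary, prior), excluded), preferred)


def area_km2(mask: Mask, cell_size_km: float = 1.0) -> float:
    """Area of presence cells, assuming square cells with side `cell_size_km`."""
    return sum(sum(row) for row in mask) * cell_size_km ** 2
```

If the preference layer marks every cell that is not a rejected land-use class as preferred, clipping to it is equivalent to erasing the rejected classes, which is how the worked example below phrases the final step.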
As shown in fig. 2, the present invention further provides a multi-source species distribution data processing apparatus, comprising:
the data preprocessing module is used for collecting environmental data of the species distribution area and determining the environmental data layers corresponding to each species according to the conditions required for the species' survival;
the environmental data comprise four types of data: climate, terrain, land cover and human activity;
the data collection module is used for collecting species multi-source distribution data; the species multi-source distribution data comprise species area data, species point data and species descriptive data;
the species area data comprise expert distribution maps and local distribution maps; the species point data comprise species occurrence data and species non-occurrence data; the species descriptive data comprise ecological limitation data and habitat preference data;
the data processing module is used for generating a prior distribution range map, a distribution limitation area map and a distribution preference area map of the species from the species area data, the species point data and the species descriptive data;
the model calling module is used for running a species distribution model with the species point data and the environmental data layers as input data to obtain a species probability distribution map;
a threshold determining module, configured to determine a species distribution threshold according to the prior distribution range map, the known species occurrence points, and the species probability distribution map, and generate a species binary distribution map;
and the clipping module is used for clipping or erasing the binary distribution map according to the prior distribution range map, the distribution limitation area map and the distribution preference area map to obtain a species distribution map.
Further, the clipping module as described above includes:
the first clipping unit is used for clipping the binary distribution map with the prior distribution range map;
the erasing unit is used for erasing the clipped binary distribution map with the species distribution limitation area map;
and the second clipping unit is used for clipping the erased binary distribution map with the species distribution preference area map to obtain the final species distribution map.
Example:
The invention is illustrated using Yunnan Province as the study area and the prediction of the spatial distribution of the Yunnan golden monkey (Rhinopithecus bieti) as an example.
(1) Collect environmental data for the species distribution area, comprising four types of data: climate, terrain, land cover and human activity.
Spatial data on the climate, terrain, land cover and human activity of Yunnan Province were collected. FIG. 3 shows the climate data, i.e. the bioclimatic variables BIO1-BIO19 (WorldClim global climate data, http://www.worldclim.org/); in FIG. 4, panels 1-9 are elevation, slope, aspect, landform type, relief, roughness, human footprint index, land cover type and land use type, respectively.
All data were converted to ASC format and clipped to the Yunnan provincial boundary to unify their spatial extent. All data use the CGCS2000_3_Degree_GK_CM_99E projected coordinate system at a spatial resolution of 1 km and are stored in the same folder.
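Because every layer is exported as an ASC (ESRI ASCII grid) file clipped to the same boundary at 1 km resolution, a quick consistency check is to confirm that all headers describe the same grid before modelling. The sketch below assumes the standard ASC header keys (`ncols`, `nrows`, `xllcorner`/`xllcenter`, `yllcorner`/`yllcenter`, `cellsize`, optional `NODATA_value`). The projection is not part of the ASC header (it is typically kept in a separate .prj file), so it is not checked here. Function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner",
               "xllcenter", "yllcenter", "cellsize", "nodata_value"}


def read_asc(text: str) -> Tuple[Dict[str, float], List[List[float]]]:
    """Parse ESRI ASCII grid text into (header, rows); row 0 is the northernmost row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header: Dict[str, float] = {}
    i = 0
    # Header lines are "key value" pairs; the first other line starts the cell values.
    while i < len(lines):
        parts = lines[i].split()
        if len(parts) == 2 and parts[0].lower() in HEADER_KEYS:
            header[parts[0].lower()] = float(parts[1])
            i += 1
        else:
            break
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    values = [float(v) for ln in lines[i:] for v in ln.split()]
    if len(values) != ncols * nrows:
        raise ValueError(f"expected {ncols * nrows} cells, found {len(values)}")
    return header, [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]


def _origin(h: Dict[str, float]) -> Tuple[Optional[float], Optional[float]]:
    # Assumes all files use the same corner-or-centre convention.
    return h.get("xllcorner", h.get("xllcenter")), h.get("yllcorner", h.get("yllcenter"))


def same_grid(headers: List[Dict[str, float]]) -> bool:
    """True if every header has the same size, origin and cell size as the first."""
    first = headers[0]
    return all(all(h.get(k) == first.get(k) for k in ("ncols", "nrows", "cellsize"))
               and _origin(h) == _origin(first) for h in headers)
```

NODATA cells are returned as their raw value (for example -9999), so downstream code should mask them explicitly.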
(2) Collect multi-source species distribution data, including species occurrence/non-occurrence points, expert distribution maps, local distribution maps, ecological limitation data and habitat preference data.
Data on the distribution of the Yunnan golden monkey were collected. In FIG. 5, panels 1-3 are, respectively, the expert distribution map of the Yunnan golden monkey (from the International Union for Conservation of Nature, IUCN), the local distribution map (the main distribution area of the Yunnan golden monkey: Baima Snow Mountain National Nature Reserve) and the species occurrence point data (from the Global Biodiversity Information Facility, GBIF); in addition, descriptive data on the Yunnan golden monkey's altitude selection (2500-.
The Yunnan golden monkey occurrence points were arranged in a format that the species distribution model can read: a CSV data file (editable in Excel; hereinafter the "CSV file") was created in which the first line is a header and each subsequent line is one species occurrence record with three columns: species name, longitude, latitude.
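The CSV layout described above can be produced and read back with Python's standard `csv` module. This is a sketch only: the header names `species`, `longitude`, `latitude` and the six-decimal coordinate formatting are assumptions, and the coordinates in the usage example are made-up placeholders, not real occurrence records.

```python
import csv
import io
from typing import Iterable, List, TextIO, Tuple

Record = Tuple[str, float, float]  # (species name, longitude, latitude)


def write_occurrences(records: Iterable[Record], stream: TextIO) -> None:
    """Write a header row, then one 'species, longitude, latitude' row per record."""
    writer = csv.writer(stream)
    writer.writerow(["species", "longitude", "latitude"])  # hypothetical header names
    for name, lon, lat in records:
        writer.writerow([name, f"{lon:.6f}", f"{lat:.6f}"])


def read_occurrences(stream: TextIO) -> List[Record]:
    """Read the file back, skipping the header row and any blank lines."""
    reader = csv.reader(stream)
    next(reader)  # header
    return [(row[0], float(row[1]), float(row[2])) for row in reader if row]


if __name__ == "__main__":
    buf = io.StringIO()
    # Placeholder coordinates for illustration only.
    write_occurrences([("Rhinopithecus bieti", 99.0, 28.0)], buf)
    buf.seek(0)
    print(read_occurrences(buf))
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends, so that line endings are not doubled on Windows.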
(3) Integrate the multi-source species distribution data, and generate the prior distribution range map, the distribution limitation area map and the distribution preference area map of the species from the multi-source data.
Taking the species occurrence points as input, the minimum convex hull algorithm yields the smallest-area convex polygon containing all known species occurrence points. The convex polygon, the Yunnan golden monkey expert distribution map and the local distribution map are merged as layers to obtain the prior distribution area of the species (i.e., the range within which, according to the multi-source data, the Yunnan golden monkey is distributed), shown as panel 1 in FIG. 6.
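The layer merge in this step is a union. On a raster, one way to sketch it is to mark the grid cells whose centres fall inside the convex polygon and then OR that mask with the rasterized expert and local range maps. The sketch assumes the hull has at least three vertices listed counter-clockwise (as Andrew's monotone-chain algorithm, for example, returns them), that row 0 is the northernmost row as in ASC files, and that the range maps are already rasterized on the same grid. A GIS polygon union followed by rasterization would be an equally valid route. Names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Mask = List[List[bool]]


def inside_convex(poly: Sequence[Point], x: float, y: float) -> bool:
    """True if (x, y) lies inside or on a convex polygon with counter-clockwise vertices."""
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        # A negative cross product puts the point to the right of this edge, i.e. outside.
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
            return False
    return True


def rasterize_convex(poly: Sequence[Point], nrows: int, ncols: int,
                     xll: float, yll: float, cellsize: float) -> Mask:
    """Mark cells whose centres fall in the polygon; row 0 is the northernmost row."""
    return [[inside_convex(poly, xll + (c + 0.5) * cellsize, yll + (nrows - r - 0.5) * cellsize)
             for c in range(ncols)]
            for r in range(nrows)]


def union(*masks: Mask) -> Mask:
    """Cell-wise OR of any number of equally sized masks."""
    return [[any(cells) for cells in zip(*rows)] for rows in zip(*masks)]
```

Testing cell centres is a simple rasterization rule; a cell only partly covered by the hull is included only if its centre is inside.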
According to descriptive data from basic research on the Yunnan golden monkey, the species is distributed in the elevation interval of 2500-. The corresponding altitude zone was selected from the Digital Elevation Model (DEM) of the study area, shown as panel 2 in FIG. 6 (the black area is 2500-. According to descriptive data from basic research on the Yunnan golden monkey, the species tends to avoid land-use types with high human-disturbance intensity, such as farmland and construction land. Using the land-use type data of the study area, the habitat types preferred and avoided by the Yunnan golden monkey were selected, shown as panel 3 in FIG. 6 (black areas are preferred habitat; gray areas are avoided).
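The limitation and preference maps in this step amount to simple reclassifications of the DEM and the land-use raster. In the sketch below, the altitude bounds are parameters because only the lower bound (2500) is legible in the text above; the land-use codes and the NODATA value (-9999) are hypothetical and depend on the classification actually used. For the erase operation in Step 6, the limitation layer would be the complement of the band mask.

```python
from typing import Iterable, List

Grid = List[List[float]]
Mask = List[List[bool]]


def band_mask(dem: Grid, lower: float, upper: float, nodata: float = -9999.0) -> Mask:
    """Cells whose elevation lies within [lower, upper]; NODATA cells are excluded."""
    return [[v != nodata and lower <= v <= upper for v in row] for row in dem]


def habitat_mask(landuse: List[List[int]], rejected_codes: Iterable[int]) -> Mask:
    """Cells whose land-use code is not in the rejected set (e.g. farmland, built-up land)."""
    rejected = set(rejected_codes)
    return [[code not in rejected for code in row] for row in landuse]
```

Keeping the bounds and rejected codes as arguments lets the same functions serve other species whose descriptive data give different limits.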
(4) Run a species distribution model on the data from steps (1) and (2) to obtain a species probability distribution map.
Using the environmental background data set from step (1) and the species occurrence CSV file from step (2), a species distribution model is called to predict the distribution, yielding the probability distribution map of the Yunnan golden monkey shown in FIG. 7. From white to black, the distribution probability ranges from 0 to 1.
(5) Determine the species distribution threshold from the prior distribution range map obtained in step (3), the known species occurrence points and the species probability distribution map obtained in step (4), and generate the species binary distribution map.
Let the species distribution threshold be N, with N ranging from 0 to 1 in steps of 0.01. For each value of N, count the value-1 pixels inside and outside the prior distribution area in the resulting binary distribution map, and count how many known species occurrence points fall on value-1 pixels. (A pixel value of 1 indicates that the pixel's distribution probability is greater than the current threshold N, and the species is considered present in that pixel.) FIG. 8-1 shows example binary distribution maps obtained with different thresholds.
The optimal distribution probability threshold of the species is the value of N at which the number of value-1 pixels inside the prior distribution range is more than 3 times the number outside it (i.e., more than 75% of the value-1 binary distribution area lies within the prior distribution area) and 90% of the known species occurrence points lie within the value-1 binary distribution area, as shown in FIG. 8-2.
Applying these two conditions (an inside-to-outside area ratio of at least 3 for the value-1 area, and 90% of known species occurrence points within the value-1 area), the distribution threshold for the Yunnan golden monkey was determined to be 0.65; the resulting binary distribution map is shown in FIG. 8-3.
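The first condition can be stated either as a ratio or as a share of the value-1 area, which is why the text moves between "3 times" and "75%". Writing A_in(N) and A_out(N) for the numbers of value-1 pixels inside and outside the prior range at threshold N, k(N) for the number of known occurrence points on value-1 pixels, and K for the total number of known occurrence points (notation introduced here, not in the original), the two conditions read:

```latex
A_{\text{in}}(N) > 3\,A_{\text{out}}(N)
\iff 4\,A_{\text{in}}(N) > 3\,\bigl(A_{\text{in}}(N) + A_{\text{out}}(N)\bigr)
\iff \frac{A_{\text{in}}(N)}{A_{\text{in}}(N) + A_{\text{out}}(N)} > \frac{3}{4},
\qquad A_{\text{in}}(N) + A_{\text{out}}(N) > 0,

\frac{k(N)}{K} \ge 0.9, \qquad N \in \{0, 0.01, 0.02, \dots, 1\}.
```

The same steps with a non-strict inequality give the "ratio of at least 3" form, which corresponds to at least 75% of the value-1 area lying inside the prior range.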
(6) Clip or erase the binary distribution map obtained in step (5) using the prior distribution map, the species distribution limitation map and the distribution preference map obtained in step (3) to obtain the species distribution map.
First, the species binary distribution area is clipped with the prior distribution area; as shown in FIG. 9-1, the total predicted distribution area after clipping is 4474 km². Second, the area outside the altitude zone is erased using the altitude limit (2500-2. Finally, land types such as farmland, construction land and large water bodies are erased according to the Yunnan golden monkey's habitat preferences, giving the final Yunnan golden monkey distribution map shown in FIG. 9-3, with a total predicted distribution area of 3955 km².
Comparison of experimental results with the ground-truth distribution
The field survey data on the Yunnan golden monkey's distribution area collected by The Nature Conservancy (TNC) in 2004 are taken as ground truth. Overlaying this ground-truth distribution on the distribution area predicted by this scheme (FIG. 10) shows that the prediction is broadly consistent with the distribution obtained from the TNC field survey.
The main parameters are as follows:
TNC 2004 survey area: 867.15 km²
Distribution area predicted by this scheme: 3955 km²
True positive rate (proportion of the surveyed distribution area correctly predicted): 78%
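The true positive rate above is the share of the surveyed distribution area that the prediction also covers. The sketch below assumes both the TNC survey polygons and the prediction have been rasterized onto the same grid (the patent does not say how the overlay was computed); a raster overlay only approximates polygon areas to the cell size. The function name is hypothetical.

```python
from typing import Dict, List

Mask = List[List[bool]]


def overlay_stats(predicted: Mask, surveyed: Mask, cell_area_km2: float = 1.0) -> Dict[str, float]:
    """Compare a predicted presence mask with a surveyed (ground-truth) mask on the same grid."""
    survey_cells = predicted_cells = overlap_cells = 0
    for prow, srow in zip(predicted, surveyed):
        for p, s in zip(prow, srow):
            survey_cells += s
            predicted_cells += p
            overlap_cells += p and s
    return {
        "survey_area_km2": survey_cells * cell_area_km2,
        "predicted_area_km2": predicted_cells * cell_area_km2,
        # Share of the surveyed area that the prediction also covers.
        "true_positive_rate": overlap_cells / survey_cells if survey_cells else float("nan"),
    }
```

Because the metric is normalized by the surveyed area only, a prediction can score well on it while covering a much larger area than the survey, so the predicted area is worth reporting alongside it, as the parameters above do.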
Analysis of experimental results
The experimental results were analyzed and show the following. (1) The proposed threshold-determination method offers a new, biologically meaningful way to set the distribution threshold for a single species: it selects prediction results in which more than 75% of the distribution area lies within the prior distribution area and more than 90% of known species occurrence points lie within the distribution area. (2) The proposed multi-source data processing method improves the accuracy of the species distribution by exploiting the complementary strengths of the different data sources. First, clipping the species binary distribution area with the prior distribution area confines the predicted Yunnan golden monkey distribution strictly to the long, narrow zone between the Lancang River and the Jinsha River, which fully agrees with current knowledge of the species' range; traditional species distribution models cannot achieve this because they do not adequately account for biological factors such as the species' dispersal ability. Secondly, the altitude limit (2500-.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (8)
1. A multi-source species distribution data processing method is characterized by comprising the following steps:
step 1: collecting environmental data of the species distribution area, and determining an environmental data map layer corresponding to each species according to conditions required by species survival;
the environmental data comprise four types of data: climate, terrain, land cover and human activity;
step 2: collecting species multi-source distribution data; the species multi-source distribution data comprise species area data, species point data and species descriptive data;
the species area data comprise expert distribution maps and local distribution maps; the species point data comprise species occurrence data and species non-occurrence data; the species descriptive data comprise ecological limitation data and habitat preference data;
step 3: generating a prior distribution range map, a distribution limitation area map and a distribution preference area map of the species from the species area data, the species point data and the species descriptive data;
step 4: using the species point data and the environmental data layers as input data of a species distribution model, and running the model to obtain a species probability distribution map;
step 5: determining a species distribution threshold from the prior distribution range map, the known species occurrence points and the species probability distribution map, and generating a species binary distribution map;
step 6: clipping or erasing the binary distribution map according to the prior distribution range map, the distribution limitation area map and the distribution preference area map to obtain a species distribution map;
wherein the prior distribution range map is generated as follows: the minimum convex hull algorithm is applied to the species occurrence point data to obtain the convex polygon of smallest area containing all known species occurrence points, and the species area data are combined with this minimum convex polygon of the occurrence points to form the prior distribution range map of the species.
2. The multi-source species distribution data processing method according to claim 1, wherein the distribution limitation area map is generated by converting the ecological limitation data into corresponding spatial data.
3. The multi-source species distribution data processing method according to claim 1, wherein the distribution preference area map is generated by converting the habitat preference data into corresponding spatial data.
4. The multi-source species distribution data processing method according to claim 1, wherein obtaining the species distribution map in step 6 comprises:
first, clipping the binary distribution map with the prior distribution range map;
second, erasing the clipped binary distribution map with the species distribution limitation area map;
and finally, clipping the erased binary distribution map with the species distribution preference area map to obtain the species distribution map.
5. A multi-source species distribution data processing apparatus, comprising:
the data preprocessing module is used for collecting environmental data of the species distribution area and determining the environmental data layers corresponding to each species according to the conditions required for the species' survival;
the environmental data comprise four types of data: climate, terrain, land cover and human activity;
the data collection module is used for collecting species multi-source distribution data; the species multi-source distribution data comprise species area data, species point data and species descriptive data;
the species area data comprise expert distribution maps and local distribution maps; the species point data comprise species occurrence data and species non-occurrence data; the species descriptive data comprise ecological limitation data and habitat preference data;
the data processing module is used for generating a prior distribution range map, a distribution limitation area map and a distribution preference area map of the species from the species area data, the species point data and the species descriptive data;
the model calling module is used for running a species distribution model with the species point data and the environmental data layers as input data to obtain a species probability distribution map;
a threshold determining module, configured to determine a species distribution threshold according to the prior distribution range map, the known species occurrence points, and the species probability distribution map, and generate a species binary distribution map;
the clipping module is used for clipping or erasing the binary distribution map according to the prior distribution range map, the distribution limitation area map and the distribution preference area map to obtain a species distribution map;
wherein the prior distribution range map is generated as follows: the minimum convex hull algorithm is applied to the species occurrence point data to obtain the convex polygon of smallest area containing all known species occurrence points, and the species area data are combined with this minimum convex polygon of the occurrence points to form the prior distribution range map of the species.
6. The multi-source species distribution data processing apparatus of claim 5, wherein the distribution limitation area map is generated by converting the ecological limitation data into corresponding spatial data.
7. The multi-source species distribution data processing apparatus of claim 5, wherein the distribution preference area map is generated by converting the habitat preference data into corresponding spatial data.
8. The multi-source species distribution data processing apparatus of claim 5, wherein the clipping module comprises:
the first clipping unit, used for clipping the binary distribution map with the prior distribution range map;
the erasing unit, used for erasing the clipped binary distribution map with the species distribution limitation area map;
and the second clipping unit, used for clipping the erased binary distribution map with the species distribution preference area map to obtain the final species distribution map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110410212.7A (CN113220810B) | 2021-04-16 | 2021-04-16 | Multi-source species distribution data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113220810A | 2021-08-06 |
CN113220810B | 2022-02-18 |
Family ID: 77087845
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |