WO2023168781A1 - Soil cadmium risk prediction method based on spatial-temporal interaction relationship

Info

Publication number
WO2023168781A1
Authority
WO
WIPO (PCT)
Prior art keywords
soil
risk
polluting
data
enterprise
Application number
PCT/CN2022/086484
Other languages
French (fr)
Chinese (zh)
Inventor
史舟
江叶枫
陈颂超
周炼清
贾晓琳
伍温强
Original Assignee
浙江大学
Application filed by 浙江大学
Publication of WO2023168781A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Definitions

  • the invention belongs to the field of soil cadmium (Cd) risk prediction, and specifically relates to a soil Cd risk prediction method based on spatiotemporal interaction.
  • Cd: soil cadmium
  • the purpose of the present invention is to solve the problems existing in the prior art and provide a soil Cd risk prediction method based on spatiotemporal interaction.
  • a soil cadmium risk prediction method based on spatiotemporal interaction, the steps of which are as follows:
  • the data set related to soil Cd pollution includes soil-related enterprise data, POI data, historical industrial enterprise data and multi-period high-resolution remote sensing images related to soil Cd pollution, as well as soil sampling point data from different periods. Soil-related enterprise data include the names of enterprises related to soil Cd pollution and the cities they belong to; POI data include enterprise names and their longitude and latitude; historical industrial enterprise data include enterprise names and years of production activity; soil sampling point data include the soil Cd concentration and soil pH at each sampling point.
  • the method for obtaining the spatial point distribution data of the soil Cd pollution enterprises is:
  • the enterprise name is segmented using the word segmentation engine jieba into entities at four levels: administrative division, trade name, industry, and organizational form. Similarity matching is then performed by calculating the edit distance. If the entities at all four levels of two enterprise names match exactly, the two names are deemed to match.
  • the image semantic segmentation model based on deep learning is the U-Net convolutional neural network model; when extracting building features from high-resolution remote sensing images, the remote sensing imagery around each point is obtained from the longitude and latitude of the polluting enterprises in different historical periods and the information on the corresponding remote sensing images; the acquired images are preprocessed, the corresponding annotation data are generated, and the U-Net convolutional neural network model is trained on them; the trained U-Net model is then used to segment the high-resolution remote sensing images, and each pixel in the returned result map belongs to one of two classes, building or non-building; the returned result map for a given production year is compared with the polluting enterprise data.
  • the specific method of S3 is as follows: for each historical period, after the soil Cd risk levels have been classified, the extent enclosed by the minimum bounding rectangle of the target prediction area is obtained from the area's boundary; a grid and the corresponding grid points are then generated starting from one vertex of that rectangle; the grid points are used to extract the kernel density of polluting enterprises and the soil Cd risk level value for each historical period, which represent, respectively, the degree of aggregation of polluting enterprises and the soil Cd risk level within the boundary in that period; bivariate local Moran analysis is performed on the grid points carrying the soil Cd risk level and the polluting enterprise density value to obtain the spatiotemporal interaction between the two.
  • the specific formula is as follows:
  • Variables a and b respectively represent the soil Cd risk level and the polluting enterprise density in different historical periods, evaluated at grid i and grid j;
  • W_ij is the spatial weight matrix of grid i and grid j, obtained from the Euclidean distance weight between grid i and grid j;
  • I_ab represents the local Moran index between attribute a at grid i and attribute b at grid j.
  • the soil Cd risk area is divided into three risk control areas: high, medium and low, and risk uncertainty areas.
  • the receptor vulnerability is population density.
  • the patch-generating land use simulation model extracts the change characteristics of the soil Cd risk areas of each level between historical periods and uses a random forest algorithm to mine the influencing factors of soil Cd risk evolution one by one, obtaining the probability of change for the soil Cd risk areas of each level and the contribution of each influencing factor to the change characteristics. Finally, under the constraint of the change probabilities and based on the latest influencing factors of soil Cd risk evolution, the future soil Cd risk areas in the target prediction area are predicted.
  • the specific steps by which the patch-generating land use simulation model predicts future soil Cd risk areas in the target prediction area are as follows: first, overlay analysis is performed on the soil Cd risk areas of each level in the two historical periods to extract the change characteristics of the risk areas of each level; second, based on the random forest model, the probability of change for the soil Cd risk areas of each level is established and the contribution rate of each influencing factor of soil Cd risk evolution to each land use change is calculated, the change probability being used to estimate the dynamic trend of the soil Cd risk areas of each level within a grid cell; finally, the future level of the soil Cd risk areas is predicted by integrating the time series of soil Cd risk area data with the combined probability generated by the model, where the following formula estimates the combined probability that grid cell p will be occupied by a soil Cd risk area of level k in the future:
  • the present invention has the following beneficial effects:
  • This invention starts from the public information on soil-related enterprises and matches it against the pollution-related enterprise data investigated and disclosed by the local government to obtain data on pollution-related soil-related enterprises. There is no subsequent need to classify the enterprise data to determine whether an enterprise is a polluting enterprise: data cleaning and modeling are performed directly on the polluting enterprise data disclosed by the government of the target prediction area. In addition, the present invention obtains density values from kernel density analysis of the polluting enterprises in different periods and performs bivariate local Moran analysis against the soil Cd risk level of the corresponding period to analyze the spatiotemporal interaction between the two, so that the spatiotemporal interaction can be explored more accurately from discrete Cd pollution point data and polluting enterprise point data. Finally, from the soil Cd risk zoning characteristics and the influencing factors of risk zoning evolution in different periods, a patch-generating land use simulation model is used to model the soil Cd risk zoning of the existing periods and to predict future soil Cd risk zoning characteristics. This expands the existing methods and ideas for soil Cd zoning, has important theoretical and practical significance for the further management and prevention of soil Cd pollution, and is of value for promotion and application.
  • Figure 1 is a partial display of the results in the embodiment, including (a) soil Cd risk level zoning in 2002, (b) population density zoning in 2002, (c) polluting enterprise density distribution in 2002, (d) local Moran spatial interaction between soil Cd and polluting enterprises in 2002;
  • Figure 2 is a partial display of the results in the embodiment, including (a) soil Cd risk level zoning in 2012, (b) population density zoning in 2012, (c) polluting enterprise density distribution in 2012, (d) local Moran spatial interaction between soil Cd and polluting enterprises in 2012;
  • Figure 3 is a partial display of the results in the embodiment, including (a) schematic diagram of soil Cd risk management zoning in 2002, (b) schematic diagram of soil Cd risk management zoning in 2012, (c) schematic diagram of soil Cd risk management zoning in 2012 as predicted by the patch-generating land use simulation model, (d) schematic diagram of soil Cd risk management zoning in 2022 as predicted by the patch-generating land use simulation model;
  • Figure 4 shows the accuracy of the patch-generating land use simulation model in the embodiment when predicting the 2012 risk zoning from the 2002 risk zoning.
  • This invention uses a similarity matching algorithm based on multi-source data to spatially identify polluting enterprises. It mainly establishes a similarity calculation method for data matching based on multi-source enterprise data. At the same time, it uses the U-Net model to establish remote sensing verification for identifying polluting enterprises.
  • a soil Cd risk prediction method based on spatiotemporal interaction is provided.
  • the specific implementation steps are as follows:
  • Obtain data sets related to soil Cd pollution, including soil-related enterprise data, POI data, historical industrial enterprise data and multi-period high-resolution remote sensing images related to soil Cd pollution, as well as soil sampling point data from different periods. Soil-related enterprise data include the names of enterprises related to soil Cd pollution and the cities they belong to; AutoNavi POI data include enterprise names and their longitude and latitude; historical industrial enterprise data include enterprise names and years of production activity; soil sampling point data include the soil Cd concentration and soil pH at each sampling point.
  • the pollution-related soil-related enterprise data, POI data, historical industrial enterprise data and multi-period high-resolution remote sensing images can be obtained over the HTTP protocol from public web pages.
  • the POI data used are Amap POI data. Soil sampling point data come from field sampling and can be queried from soil census data or obtained by other means. Each sampling point should record the soil Cd concentration and soil pH at its location. Soil pH is included because the Cd risk in soil is closely related to pH, so soil Cd concentration and soil pH must be considered together to determine the soil Cd risk area.
  • AutoNavi POI data and historical industrial enterprise data should try to cover enterprises within the target prediction area, so as to facilitate subsequent matching of more polluting enterprise data as much as possible.
  • the company names in the POI data are segmented separately, entities at different levels in the company names are extracted, and then similarity matching is performed.
  • the soil-related enterprises whose names match exactly at every entity level are regarded as polluting enterprises; the year of production activities from the historical industrial enterprise data and the geographical location from the POI data are associated with each polluting enterprise, yielding polluting enterprise data that include geographical location and year of production activities.
  • lev_{a,b}(i, j) represents the edit distance between the first i characters of string a and the first j characters of string b.
  • the soil Cd concentration risk screening values corresponding to different pH ranges can be determined based on relevant standards and specifications or expert experience.
  • “Soil Environmental Quality - Agricultural Land Soil Pollution Risk Management and Control Standards” (GB 15618-2018) is used to determine the soil Cd concentration risk screening values corresponding to different pH ranges; the screening value applicable to each point is then used as the benchmark to determine whether that point exceeds the Cd risk screening value, and the soil Cd risk levels are classified according to the result.
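
The screening step described in the last two items can be sketched as a lookup of the screening value by pH band followed by a comparison. This is a minimal sketch only: the function names are hypothetical, and the band limits and values below are placeholders for illustration that should be taken from GB 15618-2018 (or from expert judgment) for the land use in question.

```python
from bisect import bisect_left

def screening_value(ph, ph_upper_bounds, values):
    """Return the screening value of the pH band that contains `ph`.

    ph_upper_bounds: ascending, inclusive upper limits of every band except the last.
    values: one screening value per band, so len(values) == len(ph_upper_bounds) + 1.
    """
    return values[bisect_left(ph_upper_bounds, ph)]

def exceeds_screening(cd, ph, ph_upper_bounds, values):
    """True if the Cd concentration is above the screening value for its pH band."""
    return cd > screening_value(ph, ph_upper_bounds, values)

# Placeholder bands: pH <= 5.5, 5.5 < pH <= 6.5, 6.5 < pH <= 7.5, pH > 7.5.
BOUNDS = [5.5, 6.5, 7.5]
# Placeholder screening values in mg/kg; verify against the standard before use.
VALUES = [0.3, 0.3, 0.3, 0.6]

flag_neutral = exceeds_screening(0.45, 7.0, BOUNDS, VALUES)   # compared with VALUES[2]
flag_alkaline = exceeds_screening(0.45, 8.0, BOUNDS, VALUES)  # compared with VALUES[3]
```

The same concentration can therefore be flagged at one pH and not at another, which is why the method interpolates Cd concentration and pH together before classifying risk.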

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Remote Sensing (AREA)
  • Primary Health Care (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed in the present invention is a soil cadmium (Cd) risk prediction method based on a spatial-temporal interaction relationship, comprising: firstly, obtaining data sets related to soil Cd pollution in a target prediction area for a plurality of different historical periods, and extracting and cross-verifying the data to obtain the spatial point distribution data of soil Cd polluting enterprises in the target prediction area in the different historical periods, as well as the soil Cd concentration and soil pH at each sampling point in the target prediction area in the different historical periods; then, performing bivariate local spatial autocorrelation analysis on the soil Cd risk level and the density value of polluting enterprises in each period to obtain the spatial-temporal interaction relationship between soil Cd and the polluting enterprises; and finally, identifying the soil Cd risk areas in the target prediction area in the different periods, and predicting the future soil Cd risk category by using a patch-generating land use simulation model. The present invention expands the existing methods and ideas for soil Cd zoning, and has important theoretical and practical significance for the further management and prevention of soil Cd pollution.

Description

A soil cadmium risk prediction method based on spatiotemporal interaction
Technical field
The invention belongs to the field of soil cadmium (Cd) risk prediction, and specifically relates to a soil Cd risk prediction method based on spatiotemporal interaction.
Background
Because of the theoretical source-sink relationship between polluting enterprises and the soil pollution of agricultural land, a growing number of researchers have begun to study the relationship between soil heavy metals and polluting enterprises, so as to provide a chain of evidence for tracing the sources of soil heavy metal pollution. However, traditional source apportionment methods, such as multivariate and geostatistical analysis, principal component analysis and positive matrix factorization, ignore to some extent the spatiotemporal characteristics of soil heavy metal pollution; in some areas they may even yield opposite results, which makes it difficult to guide soil heavy metal management effectively and precisely. In addition, soil heavy metal pollution is often closely related to human activities, especially polluting enterprises. Traditional source apportionment methods neither provide a reliable spatiotemporal coupling relationship between sources and sinks nor handle the spatiotemporal variability of pollution well, which makes the prevention and control of soil heavy metal pollution caused by changes in polluting enterprises quite difficult.
At present, some technical regulations or standards on the zoning management of soil heavy metal pollution have been issued, but these zoning approaches consider only the current level of soil heavy metal pollution and ignore its future risk, such as the soil pollution risk caused by polluting enterprises. Soil pollution management based on future risk can serve as an important reference for future soil management, and simulating the dynamic changes of future soil Cd risk probability zones is of great significance for effective decision-making in current and future soil Cd pollution management. How to predict the future risk of soil heavy metal pollution, however, remains a technical problem in urgent need of a solution.
Summary of the invention
The purpose of the present invention is to solve the problems existing in the prior art and to provide a soil Cd risk prediction method based on spatiotemporal interaction.
The specific technical solution adopted by the present invention is as follows:
A soil cadmium risk prediction method based on spatiotemporal interaction, the steps of which are as follows:
S1. Obtain data sets related to soil Cd pollution for multiple historical periods within the target prediction area and, through data extraction and cross-validation, obtain the spatial point distribution data of soil Cd polluting enterprises in the target prediction area for each historical period, together with the soil Cd concentration and soil pH at each sampling point in the target prediction area for each historical period;
S2. For each historical period, apply the kernel density method to the spatial point distribution data of soil Cd polluting enterprises in the target prediction area to obtain the polluting enterprise density value; at the same time, from the soil Cd concentration and soil pH at each sampling point in that period, obtain the spatial distributions of soil Cd concentration and soil pH in the target prediction area by interpolation and, combining them with the soil Cd concentration risk screening values corresponding to different pH ranges, determine the soil Cd risk level at each location in the target prediction area;
S3. For each historical period, grid the target prediction area, collect the soil Cd risk level and the polluting enterprise density value in each grid cell, and perform bivariate local spatial autocorrelation analysis on the soil Cd risk level and the polluting enterprise density value of that period to obtain the spatiotemporal interaction between soil Cd and polluting enterprises;
S4. For each historical period, identify the soil Cd risk areas of different levels in the target prediction area according to a preset risk area grading standard, based on the corresponding soil Cd risk level, receptor vulnerability and spatiotemporal interaction;
S5. Obtain the influencing factors of soil Cd risk evolution for each historical period in the target prediction area and, combining them with the soil Cd risk area identification results of each historical period, use the Patch-generating Land Use Simulation (PLUS) model to predict the future soil Cd risk areas in the target prediction area.
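
The kernel density analysis in S2 can be illustrated with a small stand-alone estimator. This is a sketch under stated assumptions: the text does not specify the kernel or bandwidth, so a Gaussian kernel and a user-supplied bandwidth are assumed here, coordinates are taken to be in a projected (metric) system, and the function name is hypothetical.

```python
import math

def kernel_density(points, query, bandwidth):
    """2-D Gaussian kernel density at `query` estimated from enterprise locations.

    points: list of (x, y) enterprise coordinates in a projected coordinate system.
    query: (x, y) location at which the density is evaluated.
    bandwidth: smoothing radius, in the same units as the coordinates (assumed).
    """
    if not points:
        return 0.0
    qx, qy = query
    total = 0.0
    for px, py in points:
        # Squared distance scaled by the bandwidth.
        u2 = ((qx - px) ** 2 + (qy - py) ** 2) / bandwidth ** 2
        total += math.exp(-0.5 * u2) / (2.0 * math.pi)
    return total / (len(points) * bandwidth ** 2)

# Three enterprises clustered near the origin and one far away (made-up coordinates).
enterprises = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (50.0, 50.0)]
near = kernel_density(enterprises, (0.5, 0.5), bandwidth=2.0)
far = kernel_density(enterprises, (25.0, 25.0), bandwidth=2.0)
```

Evaluating the function at every grid point of a period yields the polluting enterprise density surface that S3 later pairs with the soil Cd risk level.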
Preferably, in S1, the data sets related to soil Cd pollution include soil-related enterprise data, POI data, historical industrial enterprise data and multi-period high-resolution remote sensing images related to soil Cd pollution, as well as soil sampling point data from different periods. Soil-related enterprise data include the names of enterprises related to soil Cd pollution and the cities they belong to; POI data include enterprise names and their longitude and latitude; historical industrial enterprise data include enterprise names and years of production activity; soil sampling point data include the soil Cd concentration and soil pH at each sampling point.
Further, in S1, the spatial point distribution data of the soil Cd polluting enterprises are obtained as follows:
S11. Convert the unstructured data in the soil-related enterprise data related to soil Cd pollution into structured data to obtain the names of the soil-related enterprises related to soil Cd pollution; segment these names, and the enterprise names in the historical industrial enterprise data and the POI data, into words; extract the entities at the different levels of each enterprise name; then perform similarity matching. Soil-related enterprises whose names match exactly at every entity level are regarded as polluting enterprises, and the year of production activities from the historical industrial enterprise data and the geographical location from the POI data are associated with each polluting enterprise, yielding polluting enterprise data that include geographical location and year of production activities.
S12. From the multi-period high-resolution remote sensing imagery, extract the high-resolution images of each polluting enterprise and its surrounding site for each year of production activities; then use a deep-learning image semantic segmentation model to extract building features from the imagery of each polluting enterprise and judge whether a building or factory exists in the image area in each year of production activities. This provides remote sensing verification of the spatial point distribution of polluting enterprises in different years. Polluting enterprise data that fail the remote sensing verification are removed, and the remaining data are divided by production year to obtain the spatial point distribution data of soil Cd polluting enterprises in the target prediction area for the different historical periods.
Further, in S11, the enterprise name is segmented using the word segmentation engine jieba into entities at four levels: administrative division, trade name, industry, and organizational form. Similarity matching is then performed by calculating the edit distance; if the entities at all four levels of two enterprise names match exactly, the two names are deemed to match.
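
The matching rule in S11 can be sketched with a standard Levenshtein edit distance applied level by level. This is an illustrative sketch, not the patent's implementation: it assumes the names have already been segmented (for example with jieba) into the four entity levels, and the dictionary keys and sample names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Hypothetical keys for the four entity levels named in the text.
LEVELS = ("division", "trade_name", "industry", "form")

def names_match(entities_a, entities_b):
    """Two names match only if every entity level has edit distance 0."""
    return all(edit_distance(entities_a[k], entities_b[k]) == 0 for k in LEVELS)

# Made-up, pre-segmented enterprise names.
soil_related = {"division": "杭州", "trade_name": "某某", "industry": "电镀", "form": "有限公司"}
poi_same = {"division": "杭州", "trade_name": "某某", "industry": "电镀", "form": "有限公司"}
poi_other = {"division": "杭州", "trade_name": "某某", "industry": "电子", "form": "有限公司"}

matched = names_match(soil_related, poi_same)
unmatched = names_match(soil_related, poi_other)
```

Comparing per level rather than on the whole string keeps a difference in one entity, such as the industry term, from being masked by agreement in the others.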
Further, in S12, the deep-learning image semantic segmentation model is the U-Net convolutional neural network model. When extracting building features from the high-resolution remote sensing images, the imagery around each point is obtained from the longitude and latitude of the polluting enterprises in different historical periods and the information on the corresponding remote sensing images; the acquired images are preprocessed, the corresponding annotation data are generated, and the U-Net model is trained on them; the trained U-Net model is then used to segment the high-resolution remote sensing images, and each pixel in the returned result map belongs to one of two classes, building or non-building. The returned result map for a given production year is compared with the polluting enterprise data: if a building exists at the enterprise's location in the result map, the polluting enterprise at that point is taken to exist; if only non-building pixels exist at the location, the polluting enterprise at that point is taken not to exist, and its information is submitted for a secondary review to determine whether a polluting enterprise exists at that point.
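
The comparison between the segmentation result and the enterprise points can be sketched as a window lookup on a binary mask. The sketch assumes the U-Net output is already available as a grid of 0 (non-building) and 1 (building) and that enterprise coordinates have been converted to pixel row and column; the square search window and its radius are assumptions, as are all names used.

```python
def has_building(mask, row, col, radius=1):
    """True if any building pixel (value 1) lies within `radius` pixels of (row, col)."""
    n_rows = len(mask)
    n_cols = len(mask[0]) if mask else 0
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            if mask[r][c] == 1:
                return True
    return False

def verify_enterprises(mask, enterprises, radius=1):
    """Split enterprises into remotely verified ones and ones needing secondary review."""
    verified, review = [], []
    for name, (row, col) in enterprises.items():
        target = verified if has_building(mask, row, col, radius) else review
        target.append(name)
    return verified, review

# Made-up 4 x 4 segmentation result for one production year.
mask_2012 = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
points_2012 = {"enterprise_A": (1, 1), "enterprise_B": (3, 3)}
verified, review = verify_enterprises(mask_2012, points_2012)
```

Enterprises in the review list are not discarded outright; as in the text, they go to a secondary check before being removed from the period's point data.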
作为优选,所述S3的具体方法如下:针对每一个历史时期,对所述土壤Cd风险级别进行划分后,根据目标预测区域的边界,得到其最小外接矩形所围成的范围;然后以该最小外接矩形的某个顶点开始,生成格网和对应的网格点;运用 网格点提取不同历史时期污染企业核密度和土壤Cd风险分级值,分别代表边界中不同历史时期污染企业的聚集程度和土壤Cd风险水平;将带有土壤Cd风险级别与污染企业密度值的网格点进行双变量局部莫兰分析,得到二者的时空交互关系,具体公式如下:As a preferred method, the specific method of S3 is as follows: for each historical period, after dividing the soil Cd risk level, according to the boundary of the target prediction area, obtain the range enclosed by its minimum circumscribed rectangle; then use the minimum Starting from a certain vertex of the circumscribed rectangle, a grid and corresponding grid points are generated; the grid points are used to extract the nuclear density and soil Cd risk classification values of polluting enterprises in different historical periods, which respectively represent the aggregation degree and concentration of polluting enterprises in different historical periods in the boundary. Soil Cd risk level; perform bivariate local Moran analysis on the grid points with soil Cd risk level and polluting enterprise density value, and obtain the spatio-temporal interaction relationship between the two. The specific formula is as follows:
Figure PCTCN2022086484-appb-000001
where
Figure PCTCN2022086484-appb-000002
and
Figure PCTCN2022086484-appb-000003
are the values of variables a and b in grid i and grid j, respectively, with a and b denoting the soil Cd risk level and the polluting-enterprise density within a grid in each historical period; $W_{ij}$ is the spatial weight matrix for grid i and grid j, derived from the Euclidean-distance weights between them; $I_{ab}$ is the local Moran's I between attribute a at grid i and attribute b at grid j. When $I_{ab}$ is significantly positive, the soil Cd risk level at grid i has a significant local positive correlation with the polluting-enterprise density at grid j; when $I_{ab}$ is significantly negative, the two have a significant local negative correlation; when $I_{ab}$ is not significant, the soil Cd risk level at grid i is considered to have no evident spatio-temporal interaction with the polluting-enterprise density at grid j.
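For orientation, the bivariate local Moran's I is commonly written in the standardized form below. This is the textbook form and is given here only as an assumption; the formula in the figure above may differ in notation or normalization. The symbols $z$, $\bar{x}$ and $\sigma$ (standardized value, mean and standard deviation over all grids) are introduced for illustration and do not appear in the original text.

```latex
I_{ab}^{\,i} = z_a^{\,i} \sum_{j \neq i} W_{ij}\, z_b^{\,j},
\qquad
z_a^{\,i} = \frac{x_a^{\,i} - \bar{x}_a}{\sigma_a},
\qquad
z_b^{\,j} = \frac{x_b^{\,j} - \bar{x}_b}{\sigma_b}
```

Under this form, a positive value at grid i means that a high (or low) risk level there is surrounded by correspondingly high (or low) enterprise density, which matches the interpretation given in the text.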
Preferably, in S4, the preset risk-zone grading standard divides soil Cd risk zones into high-, medium- and low-risk control zones and a risk-uncertain zone.
Preferably, the receptor vulnerability is population density.
Preferably, in S5, the patch-generating land use simulation model extracts the change characteristics of each level of soil Cd risk zone between historical periods and uses a random forest algorithm to mine the influencing factors of soil Cd risk evolution one by one, obtaining the probability of change for each level of soil Cd risk zone and the contribution of each influencing factor to the change characteristics; finally, constrained by the change probabilities and based on the latest influencing factors of soil Cd risk evolution, it predicts the future soil Cd risk zones within the target prediction area.
Preferably, the patch-generating land use simulation model predicts the future soil Cd risk zones within the target prediction area as follows: first, the soil Cd risk zones of each level from two historical periods are overlaid, and the change characteristics of each level of soil Cd risk zone are extracted; second, a random forest model is used to establish the probability of change for each level of soil Cd risk zone and to calculate the contribution rate of each influencing factor of soil Cd risk evolution to each type of land-use change, the probability being used to estimate the dynamic trend of each level of soil Cd risk zone in a grid cell; finally, future soil Cd risk zones are predicted by combining the time series of soil Cd risk zone data with the combined probability produced by the model, where the combined probability that grid cell p will be occupied by a level-k soil Cd risk zone in the future is estimated by the following formula:
Figure PCTCN2022086484-appb-000004
where
Figure PCTCN2022086484-appb-000005
denotes the combined probability that grid cell p converts from its original soil Cd risk zone level to the target level k at iteration time t; $P_{p,k}$ denotes the probability that a level-k soil Cd risk zone occurs on grid cell p;
Figure PCTCN2022086484-appb-000006
denotes the neighborhood effect of the level-k soil Cd risk zone on grid cell p at iteration time t;
Figure PCTCN2022086484-appb-000007
denotes the inertia coefficient of the level-k soil Cd risk zone at iteration time t; $sc_{c \to k}$ denotes the conversion cost from the original level-c soil Cd risk zone to the target level-k soil Cd risk zone. The patch-generating land use simulation model uses a roulette-wheel selection mechanism to determine which land-use type occupies a grid cell.
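The four quantities defined above and the roulette-wheel step can be sketched as follows. The multiplicative combination with a `(1 - conversion_cost)` factor is an assumption borrowed from related cellular automaton land-use models, since the formula itself is only available as a figure; the function names and all numbers are hypothetical.

```python
import random


def combined_probability(p_occurrence, neighborhood, inertia, conversion_cost):
    """Assumed form: P_{p,k} * neighborhood effect * inertia * (1 - sc_{c->k})."""
    return p_occurrence * neighborhood * inertia * (1.0 - conversion_cost)


def roulette_select(probabilities, rng):
    """Pick one key with a chance proportional to its non-negative weight."""
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("at least one level needs a positive probability")
    threshold = rng.random() * total
    cumulative = 0.0
    for level, weight in probabilities.items():
        cumulative += weight
        if threshold < cumulative:
            return level
    return level  # guards against floating-point round-off on the last item


# Hypothetical values for a single grid cell and three candidate levels.
candidates = {
    "high": combined_probability(0.6, 0.5, 1.0, 0.2),
    "medium": combined_probability(0.3, 0.4, 1.1, 0.0),
    "low": combined_probability(0.1, 0.1, 0.9, 0.5),
}
chosen = roulette_select(candidates, random.Random(0))
```

Roulette-wheel selection keeps the simulation stochastic: the level with the largest combined probability is the most likely outcome but is not guaranteed to win.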
Compared with the prior art, the present invention has the following beneficial effects:
The present invention starts from publicly available information on soil-related enterprises and matches it against the pollution-related enterprise data surveyed and published by local governments, thereby obtaining data on soil-related enterprises associated with pollution. There is consequently no need to classify the enterprise data afterwards or to determine whether an enterprise is a polluting enterprise: data cleaning and modeling can be carried out directly on the polluting-enterprise data published by the government of the target prediction area. In addition, the invention derives density values from kernel density analysis of the polluting enterprises in each period and performs bivariate local Moran analysis between these values and the soil Cd risk levels of the corresponding period, so that the spatio-temporal interaction between discrete Cd pollution point data and polluting-enterprise point data can be explored with reasonable accuracy. Finally, using the soil Cd risk zoning characteristics of different periods and the factors influencing the evolution of the risk zones, the patch-generating land use simulation model is applied to model the soil Cd risk zones of the existing periods and to predict the characteristics of future soil Cd risk zones. This extends existing methods and approaches to soil Cd zoning, is of theoretical and practical significance for the further management and prevention of soil Cd pollution, and has value for wider application.
Description of the Drawings
Figure 1 shows part of the results of the embodiment: (a) soil Cd risk level zoning in 2002, (b) population density zoning in 2002, (c) density distribution of polluting enterprises in 2002, (d) local Moran spatial interaction between soil Cd and polluting enterprises in 2002;
Figure 2 shows part of the results of the embodiment: (a) soil Cd risk level zoning in 2012, (b) population density zoning in 2012, (c) density distribution of polluting enterprises in 2012, (d) local Moran spatial interaction between soil Cd and polluting enterprises in 2012;
Figure 3 shows part of the results of the embodiment: (a) soil Cd risk management zones in 2002, (b) soil Cd risk management zones in 2012, (c) soil Cd risk management zones for 2012 predicted by the patch-generating land use simulation model, (d) soil Cd risk management zones for 2022 predicted by the patch-generating land use simulation model;
Figure 4 shows the accuracy of the patch-generating land use simulation model in the embodiment when predicting the 2012 risk zones from the 2002 risk zones.
Detailed Description
The present invention is further elaborated and described below with reference to the drawings and specific embodiments. The technical features of the various embodiments of the invention may be combined with one another provided they do not conflict.
The present invention identifies the spatial locations of polluting enterprises using a similarity matching algorithm based on multi-source data: a similarity calculation is established for matching multi-source enterprise data, and a U-Net model provides remote sensing verification of the identified polluting enterprises. The resulting polluting-enterprise data and soil Cd pollution data for different periods are analyzed with the bivariate local Moran's I to construct the spatio-temporal interaction relationship between polluting enterprises and soil Cd risk levels. Source-sink theory and this interaction relationship are then used to determine the soil Cd risk zoning characteristics of each period, and finally the patch-generating land use simulation model is used to predict future soil Cd risk zones, providing guidance for research on soil Cd pollution management zoning.
A preferred embodiment of the present invention provides a soil Cd risk prediction method based on spatio-temporal interaction relationships, implemented by the following steps:
S1. Obtain data sets related to soil Cd pollution for multiple historical periods within the target prediction area and, through data extraction and cross-validation, obtain the spatial point distribution data of soil Cd polluting enterprises within the target prediction area for each historical period, together with the soil Cd concentration and soil pH at each sampling point within the target prediction area for each historical period.
In this embodiment, step S1 is implemented as follows:
1) Data acquisition:
Obtain data sets related to soil Cd pollution, including soil-related enterprise data associated with soil Cd pollution, POI data, historical industrial enterprise data, multi-period high-resolution remote sensing images, and soil sampling point data for different periods. The soil-related enterprise data include the names of enterprises associated with soil Cd pollution and the prefecture-level cities to which they belong; the Amap POI data include enterprise names and their longitude and latitude; the historical industrial enterprise data include enterprise names and years of production activity; the soil sampling point data include the soil Cd concentration and soil pH at each sampling point.
In this embodiment, the pollution-related soil-related enterprise data, POI data, historical industrial enterprise data and multi-period high-resolution remote sensing images can be obtained from public web pages over the HTTP protocol, and the POI data are Amap (AutoNavi) POI data. The soil sampling points come from field sampling and can be retrieved from soil survey data or obtained by other means; each sampling point should carry the soil Cd concentration and soil pH at its location. Soil pH is included because the Cd risk in soil is closely related to pH, so soil Cd concentration and soil pH must be considered together when determining soil Cd risk zones. In addition, the Amap POI data and the historical industrial enterprise data should cover the enterprises within the target prediction area as completely as possible, so that as many polluting-enterprise records as possible can be matched later.
2) Data preprocessing:
Convert the unstructured data in the soil-related enterprise data associated with soil Cd pollution into structured data to obtain the names of the soil-related enterprises associated with soil Cd pollution. Apply word segmentation separately to these names and to the enterprise names in the historical industrial enterprise data and the POI data, extract the entities at each level of the enterprise names, and then perform similarity matching. Soil-related enterprises whose name entities match completely at every level are taken as polluting enterprises; the years of production activity from the historical industrial enterprise data and the geographic location from the POI data are attached to each such polluting enterprise, yielding polluting-enterprise data that include geographic location and years of production activity.
In this embodiment, information on pollution-related soil-related enterprises exists as unstructured documents in PDF, Word or image format. Optical character recognition (OCR) is used to recognize these documents and save them in the compressed file format based on the Office Open XML standard, thereby converting them into structured data.
In this embodiment, enterprise names can be segmented with the jieba word segmentation engine into entities at four levels: administrative division, trade name, industry and organizational form. Similarity matching is then performed by computing the edit distance; two enterprise names are considered to match if their entities match completely at all four levels. jieba is a dictionary-based segmenter, so a custom dictionary is needed to recognize out-of-vocabulary words well; adding administrative division names and generic-name feature words to the custom dictionary makes the segmentation of names more accurate. Similarity matching in this embodiment computes the edit distance between two strings to measure how different they are: the smaller the difference, the higher the similarity. The edit distance is the minimum number of edit operations needed to transform one string into the other. The permitted character operations are: a. deleting a character, b. inserting a character, c. substituting a character. The core idea is to obtain one string from the other using the fewest single-character edits. The algorithm is defined mathematically as follows:
Figure PCTCN2022086484-appb-000008
where $\mathrm{lev}_{a,b}(i,j)$ denotes the edit distance between the first i characters of string a and the first j characters of string b. When $\min(i,j)=0$, the edit distance between a and b is $\max(i,j)$; when $\min(i,j)\neq 0$, it is the minimum of the following three cases: 1. $\mathrm{lev}_{a,b}(i-1,j)+1$, 2. $\mathrm{lev}_{a,b}(i,j-1)+1$, and 3.
Figure PCTCN2022086484-appb-000009
which correspond to deleting $a_i$, inserting $b_j$ and substituting $b_j$, respectively;
Figure PCTCN2022086484-appb-000010
is an indicator function that equals 0 when $a_i = b_j$ and 1 when $a_i \neq b_j$.
Based on the edit distance, the similarity between two strings is calculated as follows:
Figure PCTCN2022086484-appb-000011
where |a| and |b| are the lengths of strings a and b, max(|a|,|b|) is the larger of the two lengths, and $\mathrm{lev}_{a,b}(i,j)$ is the minimum number of edit operations needed to transform a into b.
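The recurrence and the similarity measure described above can be sketched as follows. The edit distance follows the three cases given in the text; the similarity is assumed to take the common normalized form 1 - lev / max(|a|, |b|), because the exact expression is only available as a figure. The function names and the sample company names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using deletion, insertion and substitution."""
    previous = list(range(len(b) + 1))  # lev(0, j) = j
    for i, char_a in enumerate(a, start=1):
        current = [i]  # lev(i, 0) = i
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete a_i
                current[j - 1] + 1,                   # insert b_j
                previous[j - 1] + substitution_cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - lev / max(|a|, |b|), so 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical enterprise names that differ by two deleted characters.
distance = levenshtein("杭州某某化工有限公司", "杭州某某化工公司")
score = similarity("杭州某某化工有限公司", "杭州某某化工公司")
```

Only two rows of the dynamic-programming table are kept at a time, which is enough because each cell depends only on the current and previous rows.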
3) Remote sensing verification:
From the multi-period high-resolution remote sensing imagery, extract high-resolution images of each polluting enterprise and its surrounding site for each year of production activity. A deep-learning-based image semantic segmentation model is then used to extract building features from the imagery for each polluting enterprise, in order to determine whether buildings or factory premises are present in the image area in each year of production activity. This provides remote sensing verification of the spatial point distribution of polluting enterprises in different years. Records of polluting enterprises that fail the verification are removed, and the remaining records are divided by production year to obtain the spatial point distribution data of soil Cd polluting enterprises within the target prediction area for each historical period.
In this embodiment, the deep-learning-based image semantic segmentation model may be a U-Net convolutional neural network. When building features are extracted from the high-resolution remote sensing images, the imagery surrounding each site is obtained from the longitude and latitude of the polluting enterprises in the different historical periods and the information on the corresponding remote sensing images. The acquired imagery is then preprocessed, the corresponding annotation data are generated, and a U-Net convolutional neural network model is trained. The trained U-Net model is used to segment the high-resolution remote sensing images, so that every pixel of the returned result map is classified as either building or non-building. Finally, the returned result map is compared with the polluting-enterprise data of the same production year. If the returned result map contains buildings at the location of a polluting enterprise, the polluting enterprise at that site is taken to exist; if it contains only non-building pixels at that location, the polluting enterprise at that site is taken not to exist, and the record of that enterprise undergoes a secondary review to determine whether a polluting enterprise is present at the site. The secondary review may be carried out by manual inspection or in another form.
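The comparison between the segmentation result and the enterprise locations can be sketched as follows, assuming the result map is a binary grid (1 for building, 0 for non-building) and each enterprise has already been converted to a pixel position. The search radius, the function names and the sample data are hypothetical; the text only states that buildings must be present at the enterprise location.

```python
def has_building(mask, row, col, radius=1):
    """Return True if any building pixel (value 1) lies within `radius` pixels."""
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            if mask[r][c] == 1:
                return True
    return False


def verify_enterprises(mask, enterprises, radius=1):
    """Split enterprises into verified ones and ones sent to secondary review."""
    verified, needs_review = [], []
    for name, (row, col) in enterprises.items():
        if has_building(mask, row, col, radius):
            verified.append(name)
        else:
            needs_review.append(name)
    return verified, needs_review


# Hypothetical 5 x 5 segmentation result for one production year.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
enterprises = {"plant_a": (1, 1), "plant_b": (4, 4)}
verified, needs_review = verify_enterprises(mask, enterprises)
```

Using a small window rather than a single pixel tolerates minor geocoding offsets between the POI coordinates and the imagery.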
The U-Net convolutional neural network model is prior art. It extracts image features mainly through convolution and pooling steps and restores the feature maps to the size of the original image through deconvolution steps. The convolution and pooling calculations are given by the following formulas:
Figure PCTCN2022086484-appb-000012
where k denotes a convolutional layer of the model and
Figure PCTCN2022086484-appb-000013
denotes the j-th feature map produced in layer k. The feature maps $x_i^{k-1}$ are convolved with the elements
Figure PCTCN2022086484-appb-000014
of the corresponding convolution kernels and summed, and b denotes the corresponding bias. F(X) is the activation function of the layer; ReLU is used here.
Figure PCTCN2022086484-appb-000015
where down() is the downsampling function. In the average pooling step, all pixels within a fixed-size pixel region are summed, and the resulting feature map is 1/n of its original size. The detailed structure and training procedure of the U-Net convolutional neural network model are not repeated here.
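A single convolution-plus-ReLU step followed by average pooling can be sketched in plain Python as below. This is a toy illustration of the two operations for one input channel, not the U-Net itself; the pooling divides each block sum by the block size (the usual definition of average pooling), and the kernel, image and function names are hypothetical.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Valid sliding-window product (as in CNN libraries), plus bias, then ReLU."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = bias
            for dr in range(k_rows):
                for dc in range(k_cols):
                    total += image[r + dr][c + dc] * kernel[dr][dc]
            row.append(max(0.0, total))  # ReLU
        output.append(row)
    return output


def average_pool(feature_map, size=2):
    """Non-overlapping size x size average pooling (the down() step)."""
    pooled = []
    for r in range(len(feature_map) // size):
        pooled_row = []
        for c in range(len(feature_map[0]) // size):
            block = [
                feature_map[r * size + dr][c * size + dc]
                for dr in range(size)
                for dc in range(size)
            ]
            pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 3, 2, 1],
    [2, 0, 1, 0, 2],
    [1, 3, 0, 1, 0],
    [0, 1, 2, 0, 1],
]
kernel = [[1, 0], [0, -1]]
features = conv2d_relu(image, kernel)  # 4 x 4 feature map
pooled = average_pool(features, 2)     # 2 x 2 after pooling
```

With a pooling size of 2, each side of the feature map is halved, which is the size reduction that the deconvolution steps later undo.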
S2. For each historical period, apply the kernel density method to the spatial point distribution data of soil Cd polluting enterprises within the target prediction area to obtain polluting-enterprise density values. At the same time, based on the soil Cd concentration and soil pH at each sampling point within the target prediction area in that period, obtain the spatial distributions of soil Cd concentration and soil pH within the target prediction area by interpolation and, in combination with the soil Cd concentration risk screening values for different pH ranges, determine the soil Cd risk level at each location in the target prediction area.
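The kernel density step of S2 can be sketched as follows. A Gaussian kernel and a fixed bandwidth are assumed here; the text does not specify the kernel function or the search radius, and the function name and coordinates are hypothetical.

```python
import math


def kernel_density(points, query, bandwidth):
    """Sum of 2-D Gaussian kernels centered on `points`, evaluated at `query`.

    The result is an enterprise density per unit area in the coordinate units.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    query_x, query_y = query
    normalizer = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for x, y in points:
        squared_distance = (x - query_x) ** 2 + (y - query_y) ** 2
        total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return normalizer * total


# Hypothetical enterprise coordinates (projected, in kilometres).
enterprise_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (9.0, 9.0)]
density_near_cluster = kernel_density(enterprise_points, (1.0, 1.0), bandwidth=1.0)
density_far_away = kernel_density(enterprise_points, (5.0, 5.0), bandwidth=1.0)
```

The bandwidth controls how far the influence of each enterprise spreads, so it should be chosen with the spacing of the analysis grid in mind.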
The soil Cd concentration risk screening values for different pH ranges can be determined from relevant standards and specifications or from expert experience. In this embodiment, the standard "Soil Environmental Quality: Risk Control Standard for Soil Contamination of Agricultural Land" (GB 15618-2018) is used to determine the screening values for the different pH ranges. The screening value applicable to each site then serves as the benchmark for judging whether that site exceeds the Cd risk screening value, and the soil Cd risk levels are classified from the results.
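The pH-dependent screening check can be sketched as a simple lookup. The thresholds in `EXAMPLE_SCREENING_TABLE` are illustrative values only and should be verified against GB 15618-2018 (which also distinguishes land-use types) before any real use; the function names are hypothetical.

```python
# (upper pH bound, Cd screening value in mg/kg). Illustrative numbers only;
# check the standard for the values that apply to the land-use type at hand.
EXAMPLE_SCREENING_TABLE = [
    (5.5, 0.3),
    (6.5, 0.3),
    (7.5, 0.3),
    (float("inf"), 0.6),
]


def screening_value(ph, table=EXAMPLE_SCREENING_TABLE):
    """Return the Cd screening value for the pH range that contains `ph`."""
    for upper_bound, value in table:
        if ph <= upper_bound:
            return value
    raise ValueError("pH value is not covered by the screening table")


def exceeds_screening_value(cd_concentration, ph, table=EXAMPLE_SCREENING_TABLE):
    """True if the interpolated Cd concentration is above the screening value."""
    return cd_concentration > screening_value(ph, table)


# Hypothetical interpolated values at two grid locations.
acidic_site_exceeds = exceeds_screening_value(0.45, 5.0)
alkaline_site_exceeds = exceeds_screening_value(0.45, 8.0)
```

Because the same concentration can exceed the benchmark in acidic soil but not in alkaline soil, the Cd and pH surfaces have to be interpolated on the same grid before this check is applied.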
S3. For each historical period, after gridding the target prediction area, compile the soil Cd risk level and polluting-enterprise density value in each grid, then perform bivariate local spatial autocorrelation analysis on the soil Cd risk levels and polluting-enterprise density values of that period to obtain the spatio-temporal interaction relationship between soil Cd and polluting enterprises.
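The gridding step of S3 can be sketched as follows: the bounding rectangle of the area boundary is computed and grid points are generated from one of its vertices. An axis-aligned rectangle, a start at the lower-left vertex and points at cell centers are all assumptions, as are the function names, the cell size and the sample boundary.

```python
import math


def bounding_rectangle(boundary):
    """Axis-aligned bounding rectangle (min_x, min_y, max_x, max_y) of a polygon."""
    xs = [x for x, _ in boundary]
    ys = [y for _, y in boundary]
    return min(xs), min(ys), max(xs), max(ys)


def grid_points(boundary, cell_size):
    """Cell-center points of a square grid covering the bounding rectangle."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    min_x, min_y, max_x, max_y = bounding_rectangle(boundary)
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    return [
        (min_x + (col + 0.5) * cell_size, min_y + (row + 0.5) * cell_size)
        for row in range(n_rows)  # rows run upward from the lower-left vertex
        for col in range(n_cols)
    ]


# Hypothetical boundary of a target prediction area.
boundary = [(0.0, 0.0), (4.0, 1.0), (3.0, 3.0), (1.0, 2.0)]
points = grid_points(boundary, cell_size=1.0)
```

Each grid point would then be used to sample the kernel density surface and the risk level surface, with points outside the actual boundary discarded.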
In this embodiment, step S3 is implemented as follows: for each historical period, after the soil Cd risk levels have been classified, the extent enclosed by the minimum bounding rectangle of the target prediction area is obtained from the area's boundary; a grid and its corresponding grid points are then generated starting from one vertex of that rectangle; the grid points are used to extract the kernel density of polluting enterprises and the soil Cd risk level value for each historical period, which represent, respectively, the degree of clustering of polluting enterprises and the soil Cd risk level within the boundary in each period; bivariate local Moran analysis is then performed on the grid points carrying the soil Cd risk level and the polluting-enterprise density value to obtain the spatio-temporal interaction relationship between the two, using the following formula:
Figure PCTCN2022086484-appb-000016
where
Figure PCTCN2022086484-appb-000017
and
Figure PCTCN2022086484-appb-000018
are the values of variables a and b in grid i and grid j, respectively, with a and b denoting the soil Cd risk level and the polluting-enterprise density within a grid in each historical period; $W_{ij}$ is the spatial weight matrix for grid i and grid j, derived from the Euclidean-distance weights between them; $I_{ab}$ is the local Moran's I between attribute a at grid i and attribute b at grid j. When $I_{ab}$ is significantly positive, the soil Cd risk level at grid i has a significant local positive correlation with the polluting-enterprise density at grid j; when $I_{ab}$ is significantly negative, the two have a significant local negative correlation; when $I_{ab}$ is not significant, the soil Cd risk level at grid i is considered to have no evident spatio-temporal interaction with the polluting-enterprise density at grid j.
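A bivariate local Moran computation of this kind can be sketched as follows. The sketch assumes the common standardized form (z-score of a at grid i times the weighted average of z-scores of b at the other grids) with row-normalized inverse-Euclidean-distance weights, since the exact formula is only available as a figure. It does not include the significance test, which in practice is usually done by random permutation. All names and data are hypothetical, and grid points are assumed to be distinct.

```python
import math


def standardize(values):
    """Z-scores using the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("values must not all be identical")
    return [(v - mean) / std for v in values]


def inverse_distance_weights(coords):
    """Row-normalized inverse-Euclidean-distance weights with a zero diagonal."""
    weights = []
    for i, point_i in enumerate(coords):
        row = [
            0.0 if i == j else 1.0 / math.dist(point_i, point_j)
            for j, point_j in enumerate(coords)
        ]
        row_total = sum(row)
        weights.append([w / row_total for w in row])
    return weights


def bivariate_local_moran(risk, density, coords):
    """Local I for each grid: z_risk[i] * sum_j W[i][j] * z_density[j]."""
    z_risk = standardize(risk)
    z_density = standardize(density)
    weights = inverse_distance_weights(coords)
    return [
        z_risk[i] * sum(weights[i][j] * z_density[j] for j in range(len(coords)))
        for i in range(len(coords))
    ]


# Hypothetical grids: two high-risk cells next to dense industry, two far away.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
risk_level = [3, 3, 1, 1]
enterprise_density = [9.0, 8.0, 1.0, 2.0]
local_i = bivariate_local_moran(risk_level, enterprise_density, coords)
```

In this toy data every local value is positive, which corresponds to the high-high and low-low clusters described in the text.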
S4. For each historical period, identify the soil Cd risk zones of different levels within the target prediction area according to the preset risk-zone grading standard, on the basis of the corresponding soil Cd risk levels, receptor vulnerability and spatio-temporal interaction relationship.
In this embodiment, the population density within the target prediction area in each historical period is used as the receptor vulnerability indicator: where the population density exceeds 10 times China's national average population density for that year, vulnerability is considered high; otherwise it is considered low. The specific levels of soil Cd risk zone can be set according to actual needs, and the criteria for each level can be adjusted according to relevant guidelines, specifications or expert experience. In this embodiment, the preset risk-zone grading standard divides soil Cd risk zones into high-, medium- and low-risk control zones and a risk-uncertain zone. Source-sink theory and the spatio-temporal interaction relationship determined in the previous step are used to determine the level of the soil Cd risk zones in each period, with the following risk-zone grading standard:
Table 1 Risk-zone grading standard
Figure PCTCN2022086484-appb-000019
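The receptor vulnerability rule stated before Table 1 can be sketched as a one-line threshold test. The rule is read here as "population density greater than 10 times the national average for that year"; the function name and the numbers are hypothetical. The grading rules of Table 1 itself are only available as a figure and are not reproduced.

```python
def receptor_vulnerability(population_density, national_average, multiple=10.0):
    """Classify a grid as 'high' or 'low' vulnerability from population density."""
    if national_average <= 0:
        raise ValueError("national_average must be positive")
    if population_density > multiple * national_average:
        return "high"
    return "low"


# Hypothetical densities in persons per square kilometre.
urban_grid = receptor_vulnerability(1500.0, 140.0)
rural_grid = receptor_vulnerability(900.0, 140.0)
```

Keeping the multiple as a parameter makes it easy to adjust the criterion when guidelines or expert judgement call for a different threshold.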
S5. Obtain the influencing factors of soil Cd risk evolution for each historical period within the target prediction area and, in combination with the soil Cd risk zone identification results for each historical period, use the patch-generating land use simulation model (PLUS) to predict the future soil Cd risk zones within the target prediction area.
In the present invention, the patch-generating land use simulation model PLUS is existing model software, available for download at https://github.com/HPSCIL/Patch-generating_Land_Use_Simulation_Model; the PLUS software runs standalone in a 64-bit Windows Vista/7/8 environment. The PLUS model extracts the change characteristics of each level of soil Cd risk zone between historical periods and uses a random forest algorithm to mine the influencing factors of soil Cd risk evolution one by one, obtaining the probability of change for each level of soil Cd risk zone and the contribution of each influencing factor to the change characteristics; finally, constrained by the change probabilities and based on the latest influencing factors of soil Cd risk evolution, it predicts the future soil Cd risk zones within the target prediction area. As a land-use change simulation model, PLUS integrates a land expansion analysis strategy with a cellular automaton model based on multi-type random patch seeds and is an effective method for simulating the dynamics of land-use types.
Therefore, on the basis of source-sink theory and taking into account soil Cd pollution levels and receptor vulnerability levels, the bivariate local Moran's I is used to establish the spatio-temporal interaction between soil Cd and polluting enterprises, and the patch-generating land use simulation model is used to predict future soil Cd risk. This is of pressing practical significance for scientifically sound zoned management of soil Cd pollution risk and for guiding policymakers and stakeholders in controlling soil Cd pollution.
In this embodiment, the PLUS model predicts future soil Cd risk zones in the target prediction area as follows. First, the soil Cd risk zones of each level from two historical periods are overlaid, and the change characteristics of each level of soil Cd risk zone are extracted. Second, a random forest model is used to establish the probability of change for each level of soil Cd risk zone and to calculate the contribution rate of each influencing factor to each land use change; the occurrence probability is used to estimate the dynamic trend of each level of soil Cd risk zone in a grid cell. Finally, future soil Cd risk zones of each level are predicted by integrating the time series of soil Cd risk zone data with the combined probability produced by the model, where the combined probability that grid cell p will be occupied by a level-k soil Cd risk zone in the future is estimated by the following formula:
Figure PCTCN2022086484-appb-000020
where
Figure PCTCN2022086484-appb-000021
denotes the combined probability that grid cell p converts from its original soil Cd risk zone level to the target level k at iteration time t; P_{p,k} denotes the probability that a level-k soil Cd risk zone occurs on grid cell p;
Figure PCTCN2022086484-appb-000022
denotes the neighborhood effect of the level-k soil Cd risk zone on grid cell p at iteration time t;
Figure PCTCN2022086484-appb-000023
denotes the inertia coefficient of the level-k soil Cd risk zone at iteration time t; sc_{c→k} denotes the cost of converting from the original level-c soil Cd risk zone to the target level-k soil Cd risk zone. The PLUS model uses a roulette wheel selection mechanism to determine which land use type (here, which soil Cd risk zone level) will occupy the grid cell.
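The formula itself survives only as an image placeholder, so its exact form cannot be recovered from this text. As a hedged sketch, the four quantities defined above are commonly combined as a product in cellular-automaton land use models of this family; the symbol names TP, Ω, and Inertia below are assumptions, not taken from the patent:

```latex
\[
TP_{p,k}^{t} \;=\; P_{p,k} \times \Omega_{p,k}^{t} \times \mathit{Inertia}_{k}^{t} \times \left(1 - sc_{c \to k}\right)
\]
```

Under this reading, a higher conversion cost sc_{c→k} lowers the combined probability, while a stronger neighborhood effect or inertia coefficient raises it.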
Naturally, the dominant soil Cd risk zone level with the highest combined probability on a grid cell is allocated first, but other levels with relatively low combined probability still have a chance of being allocated, even if that chance is small. To achieve this, the PLUS model uses a roulette wheel selection mechanism to determine which soil Cd risk zone level will occupy the grid cell. In roulette wheel selection, the sector of the wheel occupied by each level represents its allocation probability: the larger the sector occupied by a soil Cd risk zone level, the greater its probability of being allocated, without depriving the other levels of the chance of being allocated.
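The roulette wheel mechanism described above can be sketched as follows. This is a minimal illustration, not the PLUS implementation; the function name `roulette_select` and the level labels are hypothetical.

```python
import random


def roulette_select(probabilities, rng=random):
    """Pick a risk zone level with probability proportional to its combined probability.

    probabilities: dict mapping a level label to a non-negative combined probability.
    rng: any object with a random() method (hypothetical parameter, for reproducibility).
    """
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    threshold = rng.random() * total  # where the wheel stops
    cumulative = 0.0
    for level, prob in probabilities.items():
        cumulative += prob  # each level owns a sector of width prob
        if threshold < cumulative:
            return level
    return level  # guard against floating-point round-off
```

Levels with larger combined probability win more often, but every level with a non-zero probability can still be drawn.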
Below, the soil cadmium risk prediction method based on spatiotemporal interaction described in the above embodiment is applied to a specific example to further demonstrate the technical effects achievable by the present invention, so that those skilled in the art can better understand its substance. The framework of implementation steps in the following example is S1 to S5 as described above and is not repeated in full; the following mainly presents the implementation details and technical effects.
Example
A region on the southeastern coast of China was selected as the study area, and the method of S1 to S5 in the foregoing embodiment of the present invention was used to predict soil cadmium risk. The specific steps are as follows:
First, pollution-related soil-related enterprise data, Amap (Gaode) POI data, historical industrial enterprise data, and multi-period high-resolution remote sensing images for the study area were obtained over HTTP from public web pages. The soil-related enterprise data contain polluting enterprise information; the Amap POI data, downloaded through the Amap API, contain latitude and longitude in the GCJ-02 coordinate system; the historical industrial enterprise data come from the China Industrial Enterprise Historical Database; and the high-resolution remote sensing images come from the Resource and Environment Science and Data Center of the Chinese Academy of Sciences (https://www.resdc.cn/). Soil Cd concentration and pH data were obtained from field sampling points in the study area.
Information in the pollution-related soil-related enterprise data that exists as unstructured documents (PDF, Word, or image formats) is recognized with OCR into a compressed file format based on the Office Open XML standard and converted into structured data. The enterprise names in the polluting enterprise data, the historical industrial enterprise data, and the Amap POI data are each segmented with the jieba word segmentation engine into four parts: "city" + "enterprise name" + "industry name" + "suffix". If all four parts of the segmented enterprise names in two datasets match exactly, the matched information is written back to the polluting enterprise data, yielding polluting enterprise data that include geographic location and the year production activity began. At the same time, the GCJ-02 coordinates of the POI data are converted to the WGS-84 geographic coordinate system.
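The matching step above can be sketched as a join on the four-part name. The sketch assumes the names have already been segmented (for example by jieba) into four fields; the field names `city`, `name`, `industry`, `suffix`, `start_year`, `lon`, and `lat` are hypothetical and not taken from the patent.

```python
def name_key(record):
    """Build the four-part key: city, enterprise name, industry name, suffix."""
    return (record["city"], record["name"], record["industry"], record["suffix"])


def link_enterprises(polluters, historical, poi):
    """Attach start year and location to polluters whose four parts match exactly."""
    years = {name_key(r): r["start_year"] for r in historical}
    coords = {name_key(r): (r["lon"], r["lat"]) for r in poi}
    linked = []
    for rec in polluters:
        key = name_key(rec)
        # Keep a polluter only if it is found in both auxiliary datasets.
        if key in years and key in coords:
            lon, lat = coords[key]
            linked.append({**rec, "start_year": years[key], "lon": lon, "lat": lat})
    return linked
```

Exact tuple equality mirrors the "all four parts match" rule; a looser rule would need a similarity measure in place of the dictionary lookup.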
Next, based on each polluting enterprise and the remote sensing image information at its location, the remote sensing imagery surrounding that location is extracted. The acquired imagery is preprocessed, corresponding annotation data are generated, and a U-Net convolutional neural network model is trained. The details are as follows:
The remote sensing images of the corresponding locations and their annotation maps are cut into 256×256 tiles, and data augmentation operations such as flipping and cropping are applied to increase data diversity. The model is trained with a batch size of 10, an initial learning rate of 5e-5, and binary cross-entropy as the loss function. After 200K iterations, the model essentially reaches a local optimum and converges. In subsequent model validation, overall accuracy, recall, and F1-score are used to evaluate the accuracy of the segmentation results.
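The tiling and flip augmentation described above can be sketched on plain nested lists. This is an illustration only; the function names are hypothetical, and discarding incomplete edge tiles is an assumption the source does not state.

```python
def tile_image(image, size=256):
    """Split a 2D grid (list of rows) into non-overlapping size x size tiles.

    Rows and columns that do not fill a complete tile are dropped (assumption).
    """
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            tiles.append([row[c:c + size] for row in image[r:r + size]])
    return tiles


def flip_horizontal(tile):
    """Mirror each row left to right."""
    return [row[::-1] for row in tile]


def flip_vertical(tile):
    """Reverse the order of the rows."""
    return tile[::-1]


def augment(image, mask, size=256):
    """Return (image_tile, mask_tile) pairs plus their flipped copies."""
    pairs = []
    for img_t, msk_t in zip(tile_image(image, size), tile_image(mask, size)):
        pairs.append((img_t, msk_t))
        pairs.append((flip_horizontal(img_t), flip_horizontal(msk_t)))
        pairs.append((flip_vertical(img_t), flip_vertical(msk_t)))
    return pairs
```

The image and its annotation map are always transformed together so that each pixel keeps its label.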
Figure PCTCN2022086484-appb-000024
Figure PCTCN2022086484-appb-000025
Figure PCTCN2022086484-appb-000026
where TP is the number of pixels correctly classified as building; TN is the number of pixels correctly classified as background; FN is the number of building pixels misclassified as background; and FP is the number of background pixels misclassified as building.
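The three metric formulas are available only as image placeholders. The sketch below uses the standard definitions of overall accuracy, recall, and F1-score in terms of TP, TN, FP, and FN; that the patent uses exactly these forms is an assumption.

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Return (overall_accuracy, recall, f1) from pixel counts.

    Standard definitions, assumed to match the patent's formulas.
    Assumes tp + fn > 0 and tp + fp > 0.
    """
    overall_accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return overall_accuracy, recall, f1
```

For example, with 80 true positives, 90 true negatives, 10 false positives, and 20 false negatives, the overall accuracy is 0.85 and the recall is 0.8.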
Then, the trained U-Net model is tested and validated in the study area. The model segments the image at each location, and each pixel in the returned result map belongs to one of only two classes, building or non-building. If buildings are present in the returned result map, the enterprise at that location actually exists; if only non-building pixels are present, the enterprise at that location does not exist. This result is compared with the polluting enterprise data: if the two agree, the record is automatically retained as valid polluting enterprise data; if they differ, the enterprise information undergoes a secondary review to determine whether an enterprise exists at that location.
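The screening rule above reduces to a check for building pixels in the segmentation result. A minimal sketch follows; the function name and the `min_building_pixels` threshold are hypothetical (the source only distinguishes "buildings present" from "only non-building").

```python
def verify_enterprise(mask, min_building_pixels=1):
    """Classify a listed enterprise from its segmentation mask.

    mask: 2D list with 1 for building pixels and 0 for non-building pixels.
    Returns "valid" when buildings are present, otherwise "needs_review".
    """
    building_pixels = sum(sum(row) for row in mask)
    return "valid" if building_pixels >= min_building_pixels else "needs_review"
```

Every record in the polluting enterprise list claims that an enterprise exists, so a mask without buildings is the disagreement case that goes to secondary review.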
After that, the kernel density method is used to spatially interpolate the polluting enterprise point data obtained above for the different periods, with the output resolution set to 100 m, giving spatial distribution density maps of polluting enterprises in the study area for the different periods (Figure 1c and Figure 2c); this completes the kernel density analysis. Inverse distance weighting is used to predict the spatial distribution of soil Cd concentration in the study area for the different periods, and the soil Cd risk levels are classified according to the "Soil Environmental Quality: Risk Control Standard for Soil Contamination of Agricultural Land" (Figure 1a and Figure 2a). A grid with 100 m resolution and the corresponding grid points are generated from the boundary of the study area, and the soil Cd risk level and polluting enterprise density within each grid cell are extracted at the grid points for the different periods. Bivariate local spatial autocorrelation analysis is then performed on the soil Cd risk levels and polluting enterprise density values for the different periods, with K-nearest neighbors as the spatial neighbor relation, giving the spatiotemporal interaction between soil Cd and polluting enterprises for the different periods (Figure 1d and Figure 2d).
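Inverse distance weighting, used above for the soil Cd concentration surface, can be sketched as follows. The power parameter of 2 is a common default and an assumption here; the source does not state the value used.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: list of (sample_x, sample_y, value) tuples, e.g. soil Cd in mg/kg.
    power: distance exponent (assumed default, not given in the source).
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value  # the target coincides with a sample point
        weight = 1.0 / dist ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator
```

Evaluating `idw` at the centre of every 100 m grid cell would give the interpolated concentration surface that is then classified into risk levels.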
Finally, as shown in Figure 2, soil Cd risk in the study area was identified for the different periods (Figure 3a and Figure 3b) according to the soil Cd risk level, receptor vulnerability (using population density as an example; Figure 1b and Figure 2b), and the spatiotemporal interaction, with reference to Table 1. The factors influencing soil Cd risk evolution in the study area for the different periods were then collected (digital elevation model, polluting enterprise density, PM2.5, mean annual precipitation, mean annual temperature, population density, and nighttime lights), and the 2002 risk management zoning together with the influencing factors was used to predict the 2012 risk management zoning and build the model (Figure 3c). Overall accuracy, the kappa coefficient, and the producer's and user's accuracy of each level of risk control zone were used as evaluation metrics to obtain the accuracy assessment of the model (Figure 4). Finally, the PLUS model was used to predict future soil Cd risk categories in the study area (Figure 3d) for subsequent monitoring and management.
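The evaluation metrics named above can all be derived from a confusion matrix between the observed and the simulated 2012 zoning. The sketch below uses the standard definitions; the function name and the matrix orientation (rows are reference classes, columns are predicted classes) are assumptions.

```python
def accuracy_report(matrix):
    """Return (overall_accuracy, kappa, producers_accuracy, users_accuracy).

    matrix[i][j]: number of grid cells with reference class i predicted as class j.
    Assumes every class occurs at least once in both reference and prediction.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    overall = diagonal / total
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    producers = [matrix[i][i] / row_totals[i] for i in range(n)]
    users = [matrix[i][i] / col_totals[i] for i in range(n)]
    return overall, kappa, producers, users
```

Producer's accuracy measures how much of each reference class was recovered, while user's accuracy measures how reliable each predicted class is.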
The embodiment described above is only one preferred solution of the present invention and is not intended to limit the present invention. Those of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, any technical solution obtained by equivalent substitution or equivalent transformation falls within the protection scope of the present invention.

Claims (10)

  1. A soil cadmium risk prediction method based on spatiotemporal interaction, characterized by the following steps:
    S1. Obtain multiple datasets related to soil Cd pollution for different historical periods in the target prediction area and, through data extraction and cross-validation, obtain the spatial point distribution data of soil Cd polluting enterprises in the target prediction area for the different historical periods, together with the soil Cd concentration and soil pH at each sampling point in the target prediction area for the different historical periods;
    S2. For each historical period, apply the kernel density method to the spatial point distribution data of soil Cd polluting enterprises in the target prediction area to obtain polluting enterprise density values; at the same time, from the soil Cd concentration and soil pH at each sampling point, obtain the spatial distributions of soil Cd concentration and soil pH in the target prediction area by interpolation, and determine the soil Cd risk level at each location in the target prediction area by reference to the soil Cd concentration risk screening values corresponding to different pH ranges;
    S3. For each historical period, after gridding the target prediction area, extract the soil Cd risk level and polluting enterprise density value within each grid cell, and then perform bivariate local spatial autocorrelation analysis on the soil Cd risk levels and polluting enterprise density values of that period to obtain the spatiotemporal interaction between soil Cd and polluting enterprises;
    S4. For each historical period, identify the soil Cd risk zones of different levels in the target prediction area according to the corresponding soil Cd risk level, receptor vulnerability, and spatiotemporal interaction, following a preset risk zone grading standard;
    S5. Obtain the factors influencing soil Cd risk evolution for each historical period in the target prediction area and, combined with the soil Cd risk zone identification results of each historical period, use the patch-generating land use simulation (PLUS) model to predict future soil Cd risk zones in the target prediction area.
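Step S2 assigns a risk level by comparing the interpolated Cd concentration with a pH-dependent screening value. The sketch below illustrates that lookup. The three-level scheme and all threshold values are illustrative assumptions; they should be replaced with the values from the applicable standard before any real use.

```python
# Upper pH bound of each bin (assumed bins: <=5.5, 5.5-6.5, 6.5-7.5, >7.5).
PH_UPPER = [5.5, 6.5, 7.5, float("inf")]
# Illustrative thresholds in mg/kg, one per pH bin (assumptions, verify before use).
SCREENING = [0.3, 0.3, 0.3, 0.6]
CONTROL = [1.5, 2.0, 3.0, 4.0]


def cd_risk_level(cd, ph, screening=SCREENING, control=CONTROL):
    """Return 0 (at or below screening), 1 (between), or 2 (above control)."""
    for index, upper in enumerate(PH_UPPER):
        if ph <= upper:
            break
    if cd <= screening[index]:
        return 0
    if cd <= control[index]:
        return 1
    return 2
```

Because the thresholds depend on pH, the same concentration can fall into different levels at different locations, which is why S2 interpolates soil pH alongside soil Cd.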
  2. The soil cadmium risk prediction method based on spatiotemporal interaction according to claim 1, characterized in that, in S1, the datasets related to soil Cd pollution include soil-related enterprise data related to soil Cd pollution, POI data, historical industrial enterprise data, multi-period high-resolution remote sensing images, and soil sampling point data for different periods; wherein the soil-related enterprise data include the names of enterprises related to soil Cd pollution and the prefecture-level cities to which they belong; the POI data include enterprise names and their latitude and longitude; the historical industrial enterprise data include enterprise names and years of production activity; and the soil sampling point data include soil Cd concentration and soil pH at the different sampling points.
  3. The soil cadmium risk prediction method based on spatiotemporal interaction according to claim 2, characterized in that, in S1, the spatial point distribution data of soil Cd polluting enterprises are obtained as follows:
    S11. Convert the unstructured data in the soil-related enterprise data related to soil Cd pollution into structured data to obtain the names of the soil-related enterprises related to soil Cd pollution; segment the soil-related enterprise names and the enterprise names in the historical industrial enterprise data and the POI data, extract the entities at different levels of each enterprise name, and then perform similarity matching; take as polluting enterprises the soil-related enterprises whose entities match exactly at every level, and associate with each such polluting enterprise its years of production activity from the historical industrial enterprise data and its geographic location from the POI data, obtaining polluting enterprise data that include geographic location and years of production activity.
    S12. From the multi-period high-resolution remote sensing imagery, extract the high-resolution images of each polluting enterprise and its surrounding site for each year of production activity; then use a deep-learning-based image semantic segmentation model to extract building features from the high-resolution imagery of each polluting enterprise, in order to determine whether buildings or factories exist in the image area in each year of production activity, thereby verifying by remote sensing the spatial point distribution of polluting enterprises in different years; remove the polluting enterprise records that fail remote sensing verification, and divide the remaining records by production year to obtain the spatial point distribution data of soil Cd polluting enterprises in the target prediction area for the different historical periods.
  4. The soil cadmium risk prediction method based on spatiotemporal interaction according to claim 3, characterized in that, in S11, enterprise names are segmented with the jieba word segmentation engine into entities at four levels: administrative division, trade name, industry, and organizational form; similarity matching is then performed by computing the edit distance, and two enterprise names are deemed to match if their entities at all four levels match exactly.
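The edit distance matching in this claim can be sketched with the classic Levenshtein dynamic program. The sketch assumes each name has already been split into its four entities; treating "match exactly" as an edit distance of zero at every level is an interpretation, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def names_match(parts_a, parts_b):
    """True when both names have four entities and every pair has distance 0."""
    if len(parts_a) != 4 or len(parts_b) != 4:
        return False
    return all(edit_distance(x, y) == 0 for x, y in zip(parts_a, parts_b))
```

Computing the distance per entity, instead of on the whole name, keeps a difference in one level (for example the organizational form) from being masked by agreement in the others.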
  5. The soil cadmium risk prediction method based on spatiotemporal interaction according to claim 3, characterized in that, in S12, the deep-learning-based image semantic segmentation model is a U-Net convolutional neural network model; when extracting building features from the high-resolution remote sensing imagery, the remote sensing imagery surrounding each location is obtained from the latitude and longitude of the polluting enterprises in the different historical periods and the information on the corresponding imagery; the obtained imagery is preprocessed, corresponding annotation data are generated, and the U-Net model is trained; the trained U-Net model is then used to segment the high-resolution remote sensing imagery, and each pixel in the returned result map belongs to one of two classes, building or non-building; the returned result map for a given production year is compared with the polluting enterprise data for the same year: if buildings are present at the location of the polluting enterprise in the returned result map, the polluting enterprise at that location actually exists; if only non-building pixels are present at the location of the polluting enterprise, this indicates that the polluting enterprise at that location does not exist, and the polluting enterprise information undergoes a secondary review to determine whether a polluting enterprise exists at that location.
  6. The soil Cd risk prediction method based on spatiotemporal interaction according to claim 1, characterized in that S3 is carried out as follows: for each historical period, after the soil Cd risk levels have been classified, the extent enclosed by the minimum bounding rectangle of the target prediction area is obtained from the area's boundary; starting from one vertex of that minimum bounding rectangle, a grid and the corresponding grid points are generated; the grid points are used to extract the polluting enterprise kernel density and the soil Cd risk level value for the different historical periods, which represent, respectively, the degree of clustering of polluting enterprises and the soil Cd risk level within the boundary in those periods; bivariate local Moran analysis is performed on the grid points carrying the soil Cd risk level and polluting enterprise density values to obtain the spatiotemporal interaction between the two, using the following formula:
    Figure PCTCN2022086484-appb-100001
    where
    Figure PCTCN2022086484-appb-100002
    and
    Figure PCTCN2022086484-appb-100003
    are the values of variables a and b on grid i and grid j, respectively, where a and b denote the soil Cd risk level and the polluting enterprise density within a grid in the different historical periods; W_ij is the spatial weight matrix of grid i and grid j, obtained from the Euclidean distance weights between grid i and grid j; I_ab denotes the local Moran's I between attribute a at grid i and attribute b at grid j; when I_ab is significantly positive, the soil Cd risk level at grid i has a significant local positive correlation with the polluting enterprise density at grid j; when I_ab is significantly negative, the soil Cd risk level at grid i has a significant local negative correlation with the polluting enterprise density at grid j; when I_ab is not significant, there is no evident spatiotemporal interaction between the soil Cd risk level at grid i and the polluting enterprise density at grid j.
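The formula in this claim is available only as an image. As a hedged sketch, the commonly used bivariate local Moran statistic multiplies the standardized value of a at grid i by the spatially weighted average of the standardized values of b at its neighbors. That form, the row-standardized K-nearest-neighbor weights, and all function names below are assumptions; significance testing (for example by permutation) is omitted.

```python
import math


def zscores(values):
    """Standardize values to zero mean and unit variance (assumes variance > 0)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def knn_weights(points, k):
    """Row-standardized weights: each of the k nearest grid points gets 1/k."""
    weights = []
    for i, (xi, yi) in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
        row = [0.0] * len(points)
        for j in others[:k]:
            row[j] = 1.0 / k
        weights.append(row)
    return weights


def bivariate_local_moran(a, b, weights):
    """I_i = z_a[i] * sum_j w_ij * z_b[j] for every grid point i."""
    za, zb = zscores(a), zscores(b)
    return [za[i] * sum(w * zb[j] for j, w in enumerate(weights[i]))
            for i in range(len(a))]
```

A positive value at a grid point means a high (or low) risk level there sits next to similarly high (or low) enterprise density; a negative value indicates a mismatch between the two.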
  7. The soil Cd risk prediction method based on spatiotemporal interaction according to claim 1, characterized in that, in S4, under the preset risk zone grading standard, soil Cd risk zones are divided into high, medium, and low risk control zones and a risk-uncertain zone.
  8. The soil Cd risk prediction method based on spatiotemporal interaction according to claim 1, characterized in that the receptor vulnerability is population density.
  9. The soil Cd risk prediction method based on spatiotemporal interaction according to claim 1, characterized in that, in S5, the patch-generating land use simulation model extracts the change characteristics of each level of soil Cd risk zone between historical periods and uses the random forest algorithm to mine the factors influencing soil Cd risk evolution one by one, obtaining the change probability of each level of soil Cd risk zone and the contribution of each influencing factor to the change characteristics; finally, under the constraint of the change probabilities and based on the latest influencing factors, it predicts future soil Cd risk zones in the target prediction area.
  10. The soil Cd risk prediction method based on spatiotemporal interaction according to claim 1, characterized in that the patch-generating land use simulation model predicts future soil Cd risk zones in the target prediction area as follows: first, the soil Cd risk zones of each level from two historical periods are overlaid, and the change characteristics of each level of soil Cd risk zone are extracted; second, a random forest model is used to establish the probability of change for each level of soil Cd risk zone and to calculate the contribution rate of each influencing factor to each land use change, the occurrence probability being used to estimate the dynamic trend of each level of soil Cd risk zone in a grid cell; finally, future soil Cd risk zones of each level are predicted by integrating the time series of soil Cd risk zone data with the combined probability produced by the model, where the combined probability that grid cell p will be occupied by a level-k soil Cd risk zone in the future is estimated by the following formula:
    Figure PCTCN2022086484-appb-100004
    where
    Figure PCTCN2022086484-appb-100005
    denotes the combined probability that grid cell p converts from its original soil Cd risk zone level to the target level k at iteration time t; P_{p,k} denotes the probability that a level-k soil Cd risk zone occurs on grid cell p;
    Figure PCTCN2022086484-appb-100006
    denotes the neighborhood effect of the level-k soil Cd risk zone on grid cell p at iteration time t;
    Figure PCTCN2022086484-appb-100007
    denotes the inertia coefficient of the level-k soil Cd risk zone at iteration time t; sc_{c→k} denotes the cost of converting from the original level-c soil Cd risk zone to the target level-k soil Cd risk zone; the patch-generating land use simulation model uses a roulette wheel selection mechanism to determine which land use type (here, which soil Cd risk zone level) will occupy the grid cell.
PCT/CN2022/086484 2022-03-10 2022-04-13 Soil cadmium risk prediction method based on spatial-temporal interaction relationship WO2023168781A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210234132.5A CN114742272A (en) 2022-03-10 2022-03-10 Soil cadmium risk prediction method based on space-time interaction relation
CN202210234132.5 2022-03-10

Publications (1)

Publication Number Publication Date
WO2023168781A1 true WO2023168781A1 (en) 2023-09-14

Family

ID=82275948

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/086484 WO2023168781A1 (en) 2022-03-10 2022-04-13 Soil cadmium risk prediction method based on spatial-temporal interaction relationship

Country Status (2)

Country Link
CN (1) CN114742272A (en)
WO (1) WO2023168781A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424143B (en) * 2022-08-29 2024-05-28 南方海洋科学与工程广东省实验室(广州) Water source pollution tracing method and device, storage medium and computer equipment
CN116242991B (en) * 2023-05-08 2023-07-21 北京建工环境修复股份有限公司 Device and method for monitoring pollutants in soil
CN116773465B (en) * 2023-08-25 2023-10-27 北京建工环境修复股份有限公司 Perfluoro compound pollution on-line monitoring method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021226976A1 (en) * 2020-05-15 2021-11-18 安徽中科智能感知产业技术研究院有限责任公司 Soil available nutrient inversion method based on deep neural network
CN111707490A (en) * 2020-06-24 2020-09-25 湘潭大学 Method for staged and zoned sampling of agricultural land soil pollution survey
CN112288247A (en) * 2020-10-20 2021-01-29 浙江大学 Soil heavy metal risk identification method based on space interaction relation
CN114062649A (en) * 2021-10-27 2022-02-18 生态环境部南京环境科学研究所 Soil pollution trend analysis method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Master's Thesis", 15 April 2021, BEIJING UNIVERSITY OF SCIENCE AND TECHNOLOGY INFORMATION, CN, article HE, YUNSHAN: "Research and Application of Prediction Model of Regional Soil Heavy Metal Pollution", pages: 1 - 76, XP009548641, DOI: 10.26966/d.cnki.gbjjc.2021.000025 *
KUMAR VINOD; THAKUR ROUSHAN KUMAR; KUMAR PANKAJ: "Assessment of heavy metals uptake by cauliflower (Brassica oleracea var. botrytis) grown in integrated industrial effluent irrigated soils: A prediction modeling study", SCIENTIA HORTICULTURAE, ELSEVIER, AMSTERDAM, NL, vol. 257, 24 July 2019 (2019-07-24), XP085762020, ISSN: 0304-4238, DOI: 10.1016/j.scienta.2019.108682 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117151557A (en) * 2023-11-01 2023-12-01 甘肃蓝曦环保科技有限公司 Quality monitoring method and system based on industrial wastewater monitoring data
CN117151557B (en) * 2023-11-01 2024-01-09 甘肃蓝曦环保科技有限公司 Quality monitoring method and system based on industrial wastewater monitoring data
CN118095612A (en) * 2023-12-15 2024-05-28 广东惠利通环境科技有限公司 Industrial park soil and groundwater pollution early warning grade evaluation method
CN117437254A (en) * 2023-12-21 2024-01-23 北京英视睿达科技股份有限公司 Grid division method, device, equipment and medium based on environment space-time data
CN117437254B (en) * 2023-12-21 2024-05-03 北京英视睿达科技股份有限公司 Grid division method, device, equipment and medium based on environment space-time data
CN117589646A (en) * 2024-01-19 2024-02-23 中国科学院空天信息创新研究院 Method, device, equipment and medium for monitoring concentration of atmospheric fine particulate matters
CN117589646B (en) * 2024-01-19 2024-04-26 中国科学院空天信息创新研究院 Method, device, equipment and medium for monitoring concentration of atmospheric fine particulate matters
CN117609928A (en) * 2024-01-23 2024-02-27 华北电力科学研究院有限责任公司 Pollutant emission amount anomaly identification method and device based on power data
CN117609928B (en) * 2024-01-23 2024-04-05 华北电力科学研究院有限责任公司 Pollutant emission amount anomaly identification method and device based on power data
CN117935081A (en) * 2024-03-21 2024-04-26 泰安市金土地测绘整理有限公司 Cultivated land change monitoring method and system based on remote sensing satellite data
CN118069774A (en) * 2024-04-19 2024-05-24 四川省地质矿产勘查开发局成都综合岩矿测试中心(国土资源部成都矿产资源监督检测中心) Ecological pollution migration path analysis method and medium based on soil heavy metals
CN118069774B (en) * 2024-04-19 2024-06-25 四川省地质矿产勘查开发局成都综合岩矿测试中心(国土资源部成都矿产资源监督检测中心) Ecological pollution migration path analysis method and medium based on soil heavy metals

Also Published As

Publication number Publication date
CN114742272A (en) 2022-07-12

Similar Documents

Publication Publication Date Title
WO2023168781A1 (en) Soil cadmium risk prediction method based on spatial-temporal interaction relationship
Wang et al. Machine learning-based regional scale intelligent modeling of building information for natural hazard risk management
CN112288247B (en) Soil heavy metal risk identification method based on space interaction relationship
CN109508360B (en) Geographical multivariate stream data space-time autocorrelation analysis method based on cellular automaton
CN111222661A (en) Urban planning implementation effect analysis and evaluation method
Khoshnood Motlagh et al. Analysis and prediction of land cover changes using the land change modeler (LCM) in a semiarid river basin, Iran
CN109189917B (en) City functional zone division method and system integrating landscape and social characteristics
Zhou et al. Spatiotemporal change footprint pattern discovery: an inter‐disciplinary survey
Djan'na et al. Impact of the accuracy of land cover data sets on the accuracy of land cover change scenarios in the Mono River Basin, Togo, West Africa
Huang et al. Research on urban modern architectural art based on artificial intelligence and GIS image recognition system
Fan et al. Understanding spatial-temporal urban expansion pattern (1990–2009) using impervious surface data and landscape indexes: a case study in Guangzhou (China)
Amarsaikhan et al. Applications of remote sensing and geographic information systems for urban land-cover change studies in Mongolia
Wahyudi et al. Combining Landsat and landscape metrics to analyse large-scale urban land cover change: A case study in the Jakarta Metropolitan Area
Lu et al. Assessing the impact of land surface temperature on urban net primary productivity increment based on geographically weighted regression model
CN114661744B (en) Terrain database updating method and system based on deep learning
Muttaqin et al. MaxEnt (Maximum Entropy) model for predicting prehistoric cave sites in Karst area of Gunung Sewu, Gunung Kidul, Yogyakarta
Liu et al. Landslide susceptibility mapping with the fusion of multi-feature SVM model based FCM sampling strategy: A case study from Shaanxi Province
CN115019163A (en) City factor identification method based on multi-source big data
CN113688940A (en) Suspected pollution industrial enterprise identification method based on public data
Duan et al. Urban flood vulnerability Knowledge-Graph based on remote sensing and textual bimodal data fusion
CN114511787A (en) Neural network-based remote sensing image ground feature information generation method and system
Oki et al. Model for estimation of building structure and built year using building façade images and attributes obtained from a real estate database
Sun et al. Automatic building age prediction from street view images
CN115497006A (en) Urban remote sensing image change depth monitoring method and system based on dynamic hybrid strategy
CN105808715B (en) Method for establishing map per location

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22930413; Country of ref document: EP; Kind code of ref document: A1)