CN115374709A

CN115374709A - A land analysis method and system based on deep forest model and FLUS model

Info

Publication number: CN115374709A
Application number: CN202211045930.XA
Authority: CN
Inventors: 刘小平; 庄浩铭
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2022-08-29
Filing date: 2022-08-29
Publication date: 2022-11-22
Anticipated expiration: 2042-08-29
Also published as: CN115374709B

Abstract

The invention discloses a land analysis method and system based on a deep forest model and the FLUS model. Driving-factor data that drive land use change and multi-period land use distribution data are acquired. The multi-period land use distribution data are sampled according to a land expansion analysis strategy and combined with the driving-factor data to obtain a training sample set. The training sample set is input into a deep forest model for training to obtain development suitability data for each type of land change. The development suitability data, the neighborhood effect, the conversion cost and the inertia coefficient are input into the FLUS model for simulation to obtain simulated land use distribution data. The driving-factor data set is processed by the deep forest model to obtain the contribution weight of each driving factor to each type of land change. The method effectively improves the accuracy of FLUS land use change simulation. It is less affected by randomness and therefore more stable, and it enables the FLUS model to analyze the driving mechanisms of land use accurately.

Description

A land analysis method and system based on deep forest model and FLUS model

Technical Field

The invention relates to the technical field of geographic information science, and in particular to a land analysis method and system based on a deep forest model and the FLUS model.

Background

Land is a key arena in which social and natural systems interact. Human activities have profoundly shaped land use patterns, and land change in turn affects the natural environment and human society. Studying historical and future land change is an important basis for assessing its impacts and formulating effective mitigation measures, and therefore for achieving sustainable development. Cellular automata (CA) models can be used to explore the interaction between land change and natural-environmental and socioeconomic factors, simulate the complex spatio-temporal dynamics of land, and investigate the driving mechanisms of land change. They are widely used in land change simulation. The FLUS model adds an adaptive inertia mechanism to CA and is one of the most widely used CA models today. It has been successfully applied to land use change simulation at regional and global scales.

Modeling the complex nonlinear relationships between spatial and non-spatial driving factors and land change, and thereby mining accurate land change rules, is a key component of a CA model. Early studies mined CA land change rules with traditional data mining methods, including machine learning, bio-inspired and ensemble learning methods. However, these methods have limited expressive power, which results in low simulation accuracy.

More recent studies have applied deep neural networks to mine CA land change rules. Deep neural networks are sufficiently expressive, but they have drawbacks. They are complex, computationally inefficient and sensitive to hyperparameters, which limits their usability. They are also difficult to use for analyzing the driving mechanisms of land change.

The prior art therefore still needs further improvement.

Summary of the Invention

Purpose of the invention: to provide a land analysis method and system based on a deep forest model and the FLUS model. The method and system overcome the limited expressive power of traditional machine learning models and the excessive complexity of deep neural networks, and they improve the accuracy of FLUS land use change simulation. They also make it possible to analyze the importance of the driving factors of land use change, and they enable the FLUS model to analyze the driving mechanisms of land use accurately.

To achieve this purpose, the invention provides a land analysis method and system based on the deep forest model and the FLUS model.

In a first aspect, the invention provides a land analysis method based on a deep forest model and the FLUS model. The method comprises: acquiring driving-factor data that drive land use change and multi-period land use distribution data;

sampling the multi-period land use distribution data according to a land expansion analysis strategy and combining the samples with the driving-factor data to obtain a training sample set;

inputting the training sample set into a deep forest model for training to obtain development suitability data for each type of land change;

inputting the development suitability data, the neighborhood effect, the conversion cost and the inertia coefficient into the FLUS model for simulation to obtain simulated land use distribution data; and

processing the driving-factor data set with the deep forest model to obtain the contribution weight of each driving factor to each type of land change.

In a further embodiment, the step of sampling the multi-period land use distribution data according to the land expansion analysis strategy and combining the samples with the driving-factor data to obtain the training sample set further comprises:

extracting expansion areas from the multi-period historical land use distribution data by overlay analysis;

performing stratified random sampling within the expansion areas to obtain sample locations; and

constructing the training sample set from the driving-factor data and the land use data at the sample locations.
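The overlay step can be sketched as follows. This minimal example assumes two land use rasters of equal shape held as nested lists. The class codes (1 cultivated, 2 forest, 3 construction, 4 water) and the function name `expansion_labels` are hypothetical. Each changed pixel is labeled with its new land use type only, and the old type is ignored, in line with the expansion strategy. Changes into a type outside the target set are excluded.

```python
from typing import List, Optional, Sequence

Raster = List[List[int]]

# Hypothetical class codes for this sketch (not taken from the source data set).
CULTIVATED, FOREST, CONSTRUCTION, WATER = 1, 2, 3, 4
TARGET_TYPES = (CULTIVATED, FOREST, CONSTRUCTION)


def expansion_labels(lu_t1: Raster, lu_t2: Raster,
                     target_types: Sequence[int] = TARGET_TYPES
                     ) -> List[List[Optional[int]]]:
    """Overlay two land use rasters of equal shape.

    Returns 0 for unchanged pixels, the new class code for pixels that
    changed into one of target_types, and None for pixels to exclude.
    """
    if len(lu_t1) != len(lu_t2) or any(
            len(a) != len(b) for a, b in zip(lu_t1, lu_t2)):
        raise ValueError("rasters must have the same shape")
    labels: List[List[Optional[int]]] = []
    for row1, row2 in zip(lu_t1, lu_t2):
        out_row: List[Optional[int]] = []
        for old, new in zip(row1, row2):
            if old == new:
                out_row.append(0)           # unchanged
            elif new in target_types:
                out_row.append(new)         # record only the new type
            else:
                out_row.append(None)        # e.g. change into water: excluded
        labels.append(out_row)
    return labels
```

For example, overlaying `[[1, 1, 2], [4, 2, 3]]` (period 1) with `[[1, 3, 3], [4, 1, 3]]` (period 2) labels two pixels as new construction land and one as new cultivated land.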

In a further embodiment, the step of inputting the training sample set into the deep forest model for training to obtain the development suitability data for each type of land change further comprises the following.

Each training sample is represented as:

$\{x_1, x_2, \ldots, x_i, \ldots, x_N, y_j\}$, where $x_i$ is the value of the i-th driving factor, $N$ is the number of driving factors, $y_j$ ($j = 1, \ldots, M$) is the land use type corresponding to the land change, and $M$ is the number of land use types.

The nonlinear mapping between the driving factors and land change is modeled with the deep forest model. The mean of the class probability vectors output by the base forests in the last layer of the deep forest model is taken as the development suitability value of the land use change.

The class probability vector is expressed as:

$\{P_1, P_2, \ldots, P_j, \ldots, P_M\}$

Each class probability is calculated as the mean vote over the decision trees:

$P_j = \frac{1}{T}\sum_{t=1}^{T} I\big(H_t(X) = y_j\big)$

where $P_j$ is the classification probability of land use type $y_j$, $H_t(X)$ is the classification result of the t-th decision tree, $X$ is the feature vector formed by the driving factors $x_i$ of a sample, $T$ is the number of decision trees in a base forest, and $I(\cdot)$ is the indicator function.
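The voting formula and the last-layer averaging can be sketched directly. This example assumes each fitted decision tree $H_t$ is a callable that returns a class index. The `Tree` alias, the function names and the stub trees in the usage note are hypothetical. Libraries such as scikit-learn average the trees' leaf class proportions rather than counting hard votes, so their probabilities can differ slightly from this indicator form.

```python
from typing import Callable, List, Sequence

# Hypothetical: a fitted tree H_t maps a feature vector X to a class index j.
Tree = Callable[[Sequence[float]], int]


def forest_probabilities(trees: Sequence[Tree], x: Sequence[float],
                         n_classes: int) -> List[float]:
    """P_j = (1/T) * sum_t I(H_t(X) == y_j) for one base forest."""
    counts = [0] * n_classes
    for tree in trees:
        counts[tree(x)] += 1                # indicator I(H_t(X) == y_j)
    n_trees = len(trees)
    return [c / n_trees for c in counts]


def suitability(last_layer: Sequence[Sequence[Tree]], x: Sequence[float],
                n_classes: int) -> List[float]:
    """Mean of the class probability vectors of the last-layer base forests."""
    vectors = [forest_probabilities(f, x, n_classes) for f in last_layer]
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(n_classes)]
```

Usage: consider one forest whose four stub trees vote 0, 1, 1, 1 and another whose two trees vote 1, 2. These give [0.25, 0.75, 0.0] and [0.0, 0.5, 0.5], whose mean [0.125, 0.625, 0.25] is the suitability vector.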

In a further embodiment, the step of processing the driving-factor data set with the deep forest model to obtain the contribution weight of each driving factor to each type of land change further comprises the following.

The deep forest model has a multi-layer cascade structure. Each layer contains several base forest models, and each base forest consists of decision trees.

A split rule is selected, and the Gini coefficient of the node holding the driving-factor data set is computed according to it. The split rule is denoted $\theta = (i, t)$, where $i$ is a driving-factor variable and $t$ is the split threshold. The node holding the data set is denoted $Node$, and its Gini coefficient is denoted $G(Node)$.
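The following formula assumes the standard CART definition of the Gini coefficient. Here $p_j$ is the share of samples in $Node$ whose land use type is $y_j$; this symbol is introduced for illustration and does not appear in the original text.

```latex
G(Node) = 1 - \sum_{j=1}^{M} p_j^{2},
\qquad
p_j = \frac{\lvert \{(X, y) \in Node : y = y_j\} \rvert}{\lvert Node \rvert}
```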

The driving-factor data set is split into two new nodes by the following rule:

$Node^{left} = \{(X, y) \mid x_i \le t\}$

$Node^{right} = Node \setminus Node^{left}$

where $Node^{left}$ and $Node^{right}$ are the two new nodes, containing $m^{left}$ and $m^{right}$ samples respectively.

The combined Gini coefficient of the two new nodes is then computed and denoted $G(Node \mid \theta)$.
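The formula below assumes the usual CART convention, in which each child node is weighted by its share of the samples, with $m = m^{left} + m^{right}$. Under this assumption the combined coefficient is:

```latex
G(Node \mid \theta) = \frac{m^{left}}{m}\, G(Node^{left}) + \frac{m^{right}}{m}\, G(Node^{right})
```

A quick check: a node holding two samples of each of two types has $G(Node) = 1 - (0.5^2 + 0.5^2) = 0.5$. Suppose a rule sends each type to its own child. Then both children are pure, $G(Node \mid \theta) = 0$, and the impurity decrease is $0.5$.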

The impurity decrease of split rule $\theta$ is computed as:

$D(Node \mid \theta) = G(Node) - G(Node \mid \theta)$

To build the decision tree, each node is split with the rule that maximizes the impurity decrease. This optimal rule is $\theta^* = \arg\max_{\theta} D(Node \mid \theta)$.
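The split search for a single node can be sketched by brute force. This example assumes the CART-style weighted Gini above and uses every observed value of $x_i$ as a candidate threshold $t$, which matches the rule $x_i \le t$. Production implementations sort each feature once and sweep it, and they often place thresholds at midpoints. The function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini coefficient G = 1 - sum_j p_j^2 of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X: Sequence[Sequence[float]], y: Sequence[int]
               ) -> Tuple[Optional[Tuple[int, float]], float]:
    """Return theta* = (i, t) maximising D(Node|theta), and that decrease."""
    m = len(y)
    g_parent = gini(y)
    best_theta: Optional[Tuple[int, float]] = None
    best_d = 0.0
    for i in range(len(X[0])):
        # Candidate thresholds: every distinct observed value of factor i.
        for t in sorted({row[i] for row in X}):
            left = [y[k] for k in range(m) if X[k][i] <= t]
            right = [y[k] for k in range(m) if X[k][i] > t]
            if not left or not right:
                continue                    # both new nodes must be non-empty
            g_split = (len(left) / m) * gini(left) + (len(right) / m) * gini(right)
            d = g_parent - g_split          # D(Node|theta)
            if d > best_d:
                best_theta, best_d = (i, t), d
    return best_theta, best_d
```

With four samples `[[0.1, 0.9], [0.2, 0.8], [0.7, 0.85], [0.9, 0.95]]` labeled `[0, 0, 1, 1]`, the best rule is $\theta^* = (0, 0.2)$. It separates the two types perfectly and reduces impurity by 0.5.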

In a further embodiment, the maximized impurity decrease is used as the contribution weight of the driving factor to evaluate its contribution to each type of land change.

In a further embodiment, using the maximized impurity decrease as the contribution weight of the driving factor further comprises:

training each base forest model in each layer of the deep forest model to obtain the contribution weight of each driving factor in each base forest;

computing the mean contribution weight of each driving factor in each base forest, and the mean contribution weight of each driving factor in each layer; and

averaging the per-layer mean contribution weights of each driving factor across layers to obtain the integrated contribution, which forms the contribution weight data of each driving factor to each type of land change.

In a further embodiment, the simulated land use distribution data are evaluated. The real land use distribution data are compared with the land use distribution simulated by the DF-FLUS model (the FLUS model driven by deep forest suitability), which yields the accuracy of the DF-FLUS simulation.
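The comparison between real and simulated maps can be sketched with two common agreement measures. This example assumes both rasters have been flattened into equal-length sequences of class codes. Overall accuracy is the measure reported in Fig. 4. Cohen's kappa is added here as an assumption, since the text does not name its metrics beyond accuracy.

```python
from collections import Counter
from typing import Sequence


def overall_accuracy(actual: Sequence[int], simulated: Sequence[int]) -> float:
    """Share of cells whose simulated class equals the real class."""
    if len(actual) != len(simulated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == s for a, s in zip(actual, simulated)) / len(actual)


def kappa(actual: Sequence[int], simulated: Sequence[int]) -> float:
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(actual)
    p_o = overall_accuracy(actual, simulated)
    count_a, count_s = Counter(actual), Counter(simulated)
    # Expected chance agreement from the two maps' class frequencies.
    p_e = sum(count_a[k] * count_s[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)
```

For real cells `[1, 1, 2, 2]` and simulated cells `[1, 2, 2, 2]`, overall accuracy is 0.75. Chance agreement is 0.5, so kappa is 0.5.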

In a second aspect, the invention provides a land analysis system based on a deep forest model and the FLUS model. The system comprises:

a data collection module for acquiring driving-factor data that drive land use change and multi-period land use distribution data;

a data sampling module for sampling the multi-period land use distribution data according to the land expansion analysis strategy and combining the samples with the driving-factor data to obtain the training sample set;

a model training module for inputting the training sample set into the deep forest model for training to obtain development suitability data for each type of land change;

a data processing module for inputting the development suitability data, the neighborhood effect, the conversion cost and the inertia coefficient into the FLUS model for simulation to obtain simulated land use distribution data; and

a data evaluation module for processing the driving-factor data set with the deep forest model to obtain the contribution weight of each driving factor to each type of land change.

In a third aspect, the invention further provides a computer device comprising a processor and a memory connected to the processor. The memory stores a computer program, and the processor executes the program stored in the memory so that the computer device carries out the steps of the above method.

In a fourth aspect, the invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above method.

Compared with the prior art, the land analysis method and system based on a deep forest model and the FLUS model provided by the invention have the following beneficial effects:

Modeling the relationship between driving factors and land use change with a deep forest model yields more accurate land use change rules than traditional machine learning methods. The model is also easier to use than existing deep neural networks. It effectively improves the accuracy of FLUS land use change simulation, is less affected by randomness and is more stable. It also enables the FLUS model to analyze the driving mechanisms of land use accurately.

Description of Drawings

Fig. 1 is a flow chart of a land analysis method based on a deep forest model and the FLUS model provided by an embodiment of the invention;

Fig. 2 is a schematic diagram of the land use change simulation and driving-mechanism analysis method based on a deep forest model and the FLUS model according to an embodiment of the invention;

Fig. 3 is a framework diagram of the deep forest model provided by an embodiment of the invention;

Fig. 4 compares the simulation accuracy of the DF-FLUS model of the invention with that of conventional methods, according to an embodiment of the invention;

Fig. 5 compares the actual 2010 land use pattern of the study area with the land use patterns simulated by the invention and by conventional methods, according to an embodiment of the invention;

Fig. 6 shows the driving-factor contributions to land use change in the study area from 2000 to 2010, as calculated by the invention;

Fig. 7 compares the random fluctuation of the driving-factor contribution results of the invention and of a conventional method;

Fig. 8 is a block diagram of a land analysis system based on a deep forest model and the FLUS model provided by an embodiment of the invention;

Fig. 9 is a schematic structural diagram of a computer device provided by an embodiment of the invention.

Detailed Description

Embodiments of the invention are described in detail below with reference to the drawings. The embodiments are given for illustration only and shall not be construed as limiting the invention. The drawings are for reference and explanation only and do not limit the scope of patent protection, since many changes may be made to the invention without departing from its spirit and scope.

In one embodiment, as shown in Fig. 1, the invention provides a land analysis method based on a deep forest model and the FLUS model. The method comprises:

S1. Acquire driving-factor data that drive land use change and multi-period land use distribution data, where:

the driving-factor data include natural-environment factors and socioeconomic factors;

the natural-environment factors are elevation, slope, distance to rivers, mean annual temperature, mean annual precipitation, temperature seasonality and precipitation seasonality;

the socioeconomic factors are distance to the provincial center, distance to the city center, distance to the county center, distance to airports, distance to expressways and railways, and distance to general roads;

and the land use distribution data cover cultivated land, forest land, water bodies and construction land.

S2. Sample the multi-period land use distribution data according to the land expansion analysis strategy and combine the samples with the driving-factor data to obtain a training sample set.

Preferably, this step further comprises the following.

The land expansion analysis strategy extracts "expansion" areas from the multi-period historical land use distribution data by overlay analysis and performs stratified random sampling within those areas to obtain sample locations. Training samples are then constructed from the driving-factor data and land use data. This sampling records only the new land use type and ignores the previous type. It therefore captures the characteristics of land use change while effectively reducing the complexity of sampling and model training.

S3. Input the training sample set into the deep forest model for training to obtain development suitability data for each type of land change.

Preferably, this step further comprises taking as the development suitability the mean of the class probability vectors output by the base forests in the last layer of the deep forest model:

$\{P_1, P_2, \ldots, P_j, \ldots, P_M\}$, with $P_j = \frac{1}{T}\sum_{t=1}^{T} I\big(H_t(X) = y_j\big)$

where $P_j$ is the classification probability of land use type $y_j$, $H_t(X)$ is the classification result of the t-th decision tree, $X$ is the feature vector formed by the driving factors $x_i$ of a sample, $T$ is the number of decision trees in a base forest, and $I(\cdot)$ is the indicator function.

The deep forest model is a deep learning method based on an ensemble learning strategy. It has few hyperparameters, transforms features within the model, processes features layer by layer, and adapts its complexity automatically. The deep forest model is used to model the nonlinear mapping between driving factors and land change and to compute the development suitability of each type of land change. A deep forest is an ensemble forest model with a cascade structure of K layers. Each layer consists of F base forest models, which may be, for example, random forests or extremely randomized forests. Each base forest in a layer receives all features output by the previous layer and outputs M high-level features, one class probability per land use type. The high-level features output by the F base forests (F × M in total) are then concatenated with the original input features (N in total). The resulting feature vector is the input to the next layer.
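A minimal sketch of the cascade's forward pass follows. It assumes each fitted base forest can be called as a function that returns M class probabilities; the `Forest` alias, `cascade_predict` and the stub forests in the test are hypothetical. The sketch shows only the feature bookkeeping described above (N original features plus F × M high-level features per layer) and the final averaging. Training, the k-fold generation of class vectors and the adaptive choice of the number of layers K are omitted.

```python
from typing import Callable, List, Sequence

# Hypothetical: a fitted base forest maps a feature vector to M class probabilities.
Forest = Callable[[Sequence[float]], List[float]]


def cascade_predict(x: Sequence[float],
                    layers: Sequence[Sequence[Forest]]) -> List[float]:
    """Forward pass of a K-layer cascade with F base forests per layer."""
    original = list(x)                      # N original driving-factor values
    features = original
    for k, layer in enumerate(layers):
        outputs = [forest(features) for forest in layer]   # F vectors of length M
        if k == len(layers) - 1:
            # Last layer: suitability = mean of the F class probability vectors.
            m = len(outputs[0])
            return [sum(o[j] for o in outputs) / len(outputs) for j in range(m)]
        # Next layer input: N original features + F * M high-level features.
        features = original + [p for o in outputs for p in o]
    raise ValueError("the cascade needs at least one layer")
```

With N = 3 factors, F = 2 forests and M = 2 classes, the second layer receives 3 + 2 × 2 = 7 features.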

Mining CA land change rules with a deep forest model, which is both highly expressive and easy to use, can significantly improve the accuracy of FLUS land use change simulation.

S4. Input the development suitability data, the neighborhood effect, the conversion cost and the inertia coefficient into the FLUS model for simulation to obtain simulated land use distribution data.

Preferably, the simulated land use distribution data are evaluated. The real land use distribution data are compared with the land use distribution simulated by the DF-FLUS model, which yields the accuracy of the DF-FLUS simulation.

The deep forest model determines its complexity adaptively. With default parameters it settles on a model size matched to the problem, which avoids the loss of fitting accuracy caused by a CNN that is too large or too small. Comparative analysis shows that the land use pattern simulated by the invention is closer to the actual pattern than those simulated by conventional methods. This demonstrates the high accuracy of the DF-FLUS model: the deep forest model effectively improves the accuracy of FLUS land use change simulation.

S5. Process the driving-factor data set with the deep forest model to obtain the contribution weight of each driving factor to each type of land change.

Preferably, this step further comprises splitting the data set at each node with a rule $\theta = (i, t)$ into two new nodes,

$Node^{left} = \{(X, y) \mid x_i \le t\}$

$Node^{right} = Node \setminus Node^{left}$

and computing the impurity decrease $D(Node \mid \theta) = G(Node) - G(Node \mid \theta)$;

splitting each node with the rule that maximizes the impurity decrease to build the decision tree, where the optimal rule is $\theta^* = \arg\max_{\theta} D(Node \mid \theta)$;

and using the maximized impurity decrease as the contribution weight of the driving factor to evaluate its contribution to each type of land change.

Using the maximized impurity decrease as the contribution weight further comprises:

averaging the per-layer mean contribution weights of each driving factor across layers to obtain the integrated contribution, which forms the contribution weight data of each driving factor to each type of land change.

In one embodiment, variable importance analysis is used. An important variable separates samples into more definite classes, so the resulting nodes have lower impurity. The mean decrease in impurity (MDI) of a forest model can therefore be used to evaluate variable importance. A variable's MDI is the weighted average of the impurity decreases at all nodes that split on that variable, across all decision trees in the forest. Because the MDI values of all variables are normalized, MDI can also be used to evaluate the contribution weight of each factor.
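The MDI computation can be sketched from a fitted tree's split records. The sketch assumes the convention used by scikit-learn's impurity-based `feature_importances_`. Each split adds its sample-weighted impurity decrease $(n/n_{root})\,D(Node \mid \theta)$ to the factor it splits on. The per-tree values are normalized, averaged over trees and normalized again. The `Split` record and the function names are hypothetical, and the patent's exact weighting may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Split:
    """Hypothetical record of one internal node of a fitted tree."""
    feature: int        # index i of the driving factor used at this node
    n: int              # samples reaching the node
    gini: float         # G(Node)
    n_left: int
    gini_left: float
    n_right: int
    gini_right: float


def tree_mdi(splits: Sequence[Split], n_features: int, n_root: int) -> List[float]:
    """Normalised MDI of one tree."""
    imp = [0.0] * n_features
    for s in splits:
        # n * D(Node|theta) = n*G - n_left*G_left - n_right*G_right, weighted by 1/n_root.
        decrease = (s.n * s.gini - s.n_left * s.gini_left
                    - s.n_right * s.gini_right) / n_root
        imp[s.feature] += decrease
    total = sum(imp)
    return [v / total for v in imp] if total > 0 else imp


def forest_mdi(trees: Sequence[Tuple[Sequence[Split], int]],
               n_features: int) -> List[float]:
    """Average the per-tree MDI vectors and renormalise."""
    per_tree = [tree_mdi(splits, n_features, n_root) for splits, n_root in trees]
    mean = [sum(t[i] for t in per_tree) / len(per_tree) for i in range(n_features)]
    total = sum(mean)
    return [v / total for v in mean] if total > 0 else mean
```

In the test's first tree, factor 2 splits the root (decrease 0.3125) and factor 0 splits a child (decrease 0.1875). The tree's normalized MDI is therefore [0.375, 0.0, 0.625].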

Because each layer of the deep forest model is made up of forest models, the variable importance computed by its internal base forests can be used to analyze the contribution weights of the driving factors.

Each base forest n in layer 1 is trained k times, yielding k internal models {F_{n.1}, F_{n.2}, …, F_{n.k}}. Each model computes a contribution vector during training, which gives k variable contributions {C_{n.1}, C_{n.2}, …, C_{n.k}}. The k contributions of a forest are averaged to obtain C_n. Repeating this for all random forests and extremely randomized forests in layer 1 gives {C_1, C_2, …, C_n}, whose mean is taken as the final integrated contribution. The method makes full use of all the MDI information computed while the deep forest model is built, which effectively reduces the effect of randomness on Gini-based importance. Variable importance is also computed during deep forest training, so it adds no extra time cost.
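The two-stage averaging can be sketched as follows. This example assumes the normalized MDI vectors have already been collected. `layer1[n][f]` holds $C_{n.f}$, the vector from the f-th of the k training runs of base forest n, with one entry per driving factor. The function names are hypothetical. Each input vector sums to 1, so every averaged vector also sums to 1.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def integrated_contribution(layer1: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """layer1[n][f] is the contribution vector C_{n.f} of run f of base forest n."""
    per_forest = [mean_vector(runs) for runs in layer1]   # C_n: mean over k runs
    return mean_vector(per_forest)                         # mean over the forests
```

With two factors, forest 1 runs [0.5, 0.5] and [0.75, 0.25] average to [0.625, 0.375], and forest 2 gives [0.25, 0.75]. The integrated contribution is [0.4375, 0.5625].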

By making full use of all the MDI information computed while the deep forest model is built, the method effectively reduces the effect of randomness on importance evaluation and analyzes the driving mechanisms of land use change more accurately.

Modeling the relationship between driving factors and land use change with a deep forest model yields more accurate land use change rules (development suitability) than traditional machine learning methods. The model is also easier to use than existing deep neural networks, and it effectively improves the accuracy of FLUS land use change simulation. The variable importance analysis method of the invention is less affected by randomness and is more stable. It enables the FLUS model to analyze the driving mechanisms of land use change accurately.

In one embodiment, the study area is the Pearl River Delta region of China, and the following data are used:

- CLUD land use distribution data for 2000 and 2010, classified into four categories (cultivated land, forest land, water bodies and construction land);
- natural-environment factors: elevation, slope, distance to rivers, mean annual temperature, mean annual precipitation, temperature seasonality and precipitation seasonality;
- socioeconomic factors: distance to the provincial center, distance to the city center, distance to the county center, distance to airports, distance to expressways and railways, and distance to general roads.

All factors are normalized to [0, 1] to remove scale effects. All data are resampled to a uniform spatial resolution of 100 m.
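The text does not state which normalization maps the factors to [0, 1]. The sketch below assumes min-max scaling per factor layer. It treats `None` as a hypothetical NoData marker that passes through unchanged, and it maps a constant layer to 0.

```python
from typing import List, Optional, Sequence


def min_max_normalise(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Scale one factor layer to [0, 1]; None (NoData) is passed through."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("layer contains no valid cells")
    lo, hi = min(valid), max(valid)
    span = hi - lo
    return [None if v is None else (0.0 if span == 0 else (v - lo) / span)
            for v in values]
```

For example, an elevation layer `[0, 50, 100, None]` becomes `[0.0, 0.5, 1.0, None]`.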

As shown in Fig. 2, the method of the invention comprises the following steps:

Step 1: Collect the 2000 and 2010 CLUD land use distribution data and a spatial vector database. Use GIS software to calculate the distance from each pixel in the area to the features, generating distance driving-factor layers at 100 m resolution. Collect elevation data at 100 m resolution and use GIS software to calculate the slope of each pixel, generating the slope driving-factor layer. Collect climate raster data, then reproject them and resample them to 100 m resolution. This yields 13 driving factors in total.
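The distance layers in Step 1 are produced with GIS software. The brute-force sketch below only illustrates what such a layer contains. It assumes the vector features (rivers, roads and so on) have already been rasterized into a boolean mask on the 100 m grid, and it measures Euclidean distance between cell centers. Real GIS tools use efficient distance transforms instead of this O(cells × feature cells) loop. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def distance_raster(feature_mask: Sequence[Sequence[bool]],
                    cell_size: float = 100.0) -> List[List[float]]:
    """Euclidean distance (metres) from each cell centre to the nearest feature cell."""
    targets = [(r, c) for r, row in enumerate(feature_mask)
               for c, flag in enumerate(row) if flag]
    if not targets:
        raise ValueError("no feature cells in the mask")
    out: List[List[float]] = []
    for r, row in enumerate(feature_mask):
        out.append([cell_size * min(math.hypot(r - tr, c - tc) for tr, tc in targets)
                    for c in range(len(row))])
    return out
```

For a 2 × 3 grid with a single road cell in the top-left corner, the first row reads 0, 100 and 200 m. The cell diagonally adjacent to the road is about 141.4 m away.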

Step 2: Apply the expansion analysis strategy to the 2000 and 2010 land use data and draw stratified random samples of four types of locations: "0: unchanged", "1: changed to cultivated land", "2: changed to forest land" and "3: changed to construction land". In this example, water bodies are assumed not to change. For types 1, 2 and 3, 5% of the locations of each type are sampled. For type 0, the number of samples equals the total number of samples of the other three types. Sample data are then extracted from the driving-factor data and land use distribution data at the sampled locations.
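The sampling rule of Step 2 can be sketched as follows. The example assumes a label raster in which 0 means unchanged, 1 to 3 give the new land use type, and `None` marks excluded cells. It draws 5% of each change type, then draws as many unchanged cells as all change samples combined. The one-sample minimum per change type and the seeded `random.Random` are assumptions added for the illustration, and the function name is hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def stratified_sample(labels: Sequence[Sequence[Optional[int]]],
                      change_rate: float = 0.05,
                      seed: int = 0) -> Dict[int, List[Cell]]:
    """Stratified random sample of cell locations, keyed by label."""
    rng = random.Random(seed)
    strata: Dict[int, List[Cell]] = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab is not None:
                strata.setdefault(lab, []).append((r, c))
    sample: Dict[int, List[Cell]] = {}
    for lab, cells in strata.items():
        if lab != 0:
            # Assumption: keep at least one sample for every change type.
            k = max(1, round(change_rate * len(cells)))
            sample[lab] = rng.sample(cells, k)
    # Unchanged cells: as many as all change samples combined (if available).
    n_changed = sum(len(v) for v in sample.values())
    unchanged = strata.get(0, [])
    sample[0] = rng.sample(unchanged, min(n_changed, len(unchanged)))
    return sample
```

The returned locations are then used to read the 13 factor values and the label at each cell, which gives the training samples.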

第3步：将样本输入深度森林模型中进行训练，拟合驱动因子和土地利用变化类型间的关系，深度森林的参数基础森林数目F设置为2，每一个基础森林的决策树数目为100，皆为默认参数。其它对比模型也使用默认参数进行训练。模型训练完成后，将完整的驱动因子数据集输入，获得研究区的土地利用变化发展适宜性面；Step 3: Input samples into the deep forest model for training, and fit the relationship between driving factors and land use change types. The parameter base forest number F of the deep forest is set to 2, and the number of decision trees in each base forest is 100. All are default parameters. Other comparison models are also trained with default parameters. After the model training is completed, input the complete data set of driving factors to obtain the development suitability surface of land use change in the study area;

Step 4: As shown in Figure 3, combine the obtained land use change development suitability surfaces with the neighborhood effect, conversion cost and inertia coefficient to calculate the total conversion probability of each land use change, and then use roulette-wheel selection to determine the new land use type of each pixel. Repeat this step until the amount of each land use type reaches the demand, then stop the simulation. In this example, the 2010 land demand is used as the termination condition of the simulation;
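
The core of one Step 4 iteration can be sketched as follows, using the combination often written for FLUS: total probability = suitability × neighborhood effect × inertia (applied only to the pixel's current class) × (1 - conversion cost). The 3×3 window, the per-class weights `weights` and the conversion-cost matrix `cost` are assumptions, and the demand-driven stopping loop and the adaptive inertia update are omitted.

```python
import numpy as np

def neighborhood_effect(lu, n_classes, weights, size=3):
    """Omega[k]: share of class-k cells in the size x size window (centre excluded) times weights[k]."""
    pad = size // 2
    padded = np.pad(lu, pad, mode="constant", constant_values=-1)  # -1 marks outside the map
    rows, cols = lu.shape
    omega = np.zeros((n_classes, rows, cols))
    for k in range(n_classes):
        is_k = (padded == k).astype(float)
        counts = np.zeros((rows, cols))
        for dr in range(size):
            for dc in range(size):
                counts += is_k[dr:dr + rows, dc:dc + cols]
        counts -= (lu == k)                      # do not count the centre cell itself
        omega[k] = counts / (size * size - 1) * weights[k]
    return omega

def total_probability(suit, omega, inertia, cost, lu):
    """TP[k] = suitability * neighborhood * inertia (current class only) * (1 - cost)."""
    n_classes = suit.shape[0]
    tp = suit * omega
    is_current = np.arange(n_classes)[:, None, None] == lu[None]
    tp = np.where(is_current, tp * inertia[:, None, None], tp)
    # cost[c, k] is the cost, in [0, 1], of converting class c to class k.
    return tp * (1.0 - np.moveaxis(cost[lu], -1, 0))

def roulette(tp, lu, rng):
    """Draw a new class per pixel with probability proportional to TP."""
    n_classes = tp.shape[0]
    flat = tp.reshape(n_classes, -1)
    cum = np.cumsum(flat, axis=0)
    totals = cum[-1]
    r = (1.0 - rng.random(totals.size)) * totals         # r lies in (0, total]
    choice = np.minimum((cum < r).sum(axis=0), n_classes - 1)
    choice = np.where(totals > 0, choice, lu.ravel())     # no candidate: keep class
    return choice.reshape(lu.shape)
```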

As shown in Figure 4, the present invention considerably improves overall accuracy compared with the common method. In particular, the traditional deep learning model CNN, run with default parameters and without parameter tuning, achieves only low simulation accuracy, whereas the deep forest model determines its complexity adaptively: even with default parameters it arrives at a model size suited to the scale of the problem, avoiding the loss of fitting accuracy caused by a CNN network that is too large or too small;

As shown in Figures 5 and 6, the actual land use pattern of the Pearl River Delta in 2010 is compared with the patterns simulated by the present invention and by the common method; the pattern simulated by the present invention is closer to the actual pattern than that of the common method.
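
The comparisons against the actual 2010 pattern rely on accuracy measures. As an illustration only, the sketch below computes overall accuracy, which the text names, and the figure of merit (FoM), a change-focused measure commonly reported for land use simulations; the patent does not say that FoM was used, and the function names are hypothetical.

```python
import numpy as np

def overall_accuracy(observed, simulated):
    """Share of pixels whose simulated class equals the observed class."""
    return float(np.mean(np.asarray(observed) == np.asarray(simulated)))

def figure_of_merit(initial, observed, simulated):
    """FoM = B / (A + B + C + D).

    A: observed change simulated as persistence
    B: observed change simulated as change to the correct class
    C: observed change simulated as change to a wrong class
    D: observed persistence simulated as change
    """
    initial, observed, simulated = map(np.asarray, (initial, observed, simulated))
    obs_change = observed != initial
    sim_change = simulated != initial
    a = np.sum(obs_change & ~sim_change)
    b = np.sum(obs_change & sim_change & (simulated == observed))
    c = np.sum(obs_change & sim_change & (simulated != observed))
    d = np.sum(~obs_change & sim_change)
    denom = a + b + c + d
    return float(b / denom) if denom else 0.0
```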

Step 5: Use the MDI (mean decrease in impurity) information saved during training of the deep forest model to calculate the contribution of each driving factor to land change. Specifically, the normalized MDI values output by the k-fold cross-validation training of all base forests in the first layer of the deep forest model are averaged.
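
The Step 5 averaging can be written in a few lines. The sketch assumes the raw MDI vector of every base forest in every cross-validation fold of the first layer has already been saved; with F = 2 and k folds that is 2k vectors (k is not specified in the text). Scikit-learn tree ensembles, for instance, expose an already-normalized MDI vector as `feature_importances_`. Function names are hypothetical.

```python
import numpy as np

def normalize(mdi):
    """Scale a raw MDI vector so that its entries sum to 1."""
    mdi = np.asarray(mdi, dtype=float)
    total = mdi.sum()
    return mdi / total if total > 0 else np.zeros_like(mdi)

def first_layer_contribution(mdi_runs):
    """Average normalized MDI over every (base forest, fold) run of layer 1.

    mdi_runs: iterable of raw MDI vectors, one per base forest per fold,
    each of length n_factors.
    """
    return np.mean([normalize(m) for m in mdi_runs], axis=0)
```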

As shown in Figure 7, the contribution of each driving factor to land use change in the study area during 2000-2010 is calculated using the present invention;

The variability of the driving factor contribution rankings over repeated calculations was compared between the present invention and the common method. The method of the present invention makes full use of all the MDI information computed while the deep forest model is built, which effectively reduces the influence of randomness on the importance evaluation and allows the driving mechanisms of land use change to be analyzed more precisely.

Based on the above land analysis method based on a deep forest model and a FLUS model, and as shown in Figure 8, an embodiment of the present invention provides a land analysis system based on a deep forest model and a FLUS model, the system comprising:

a data collection module 101, configured to obtain driving factor data that drive land use distribution and multi-period land use distribution data;

a data sampling module 102, configured to sample the multi-period land use distribution data according to the land expansion analysis strategy and combine it with the driving factor data to obtain a training sample set;

a model training module 103, configured to input the training sample set into the deep forest model for training to obtain development suitability data for each land change;

a data processing module 104, configured to input the land change development suitability data, neighborhood effect, conversion cost and inertia coefficient into the FLUS model for simulation to obtain simulated land use distribution data;

a data evaluation module 105, configured to compute the driving factor data set through the deep forest model to obtain contribution weight data of each driving factor to each land change.

Based on the above land analysis method and system based on a deep forest model and a FLUS model, and as shown in Figure 9, an embodiment of the present invention provides a computer device comprising a memory, a processor and a transceiver connected by a bus. The memory stores a set of computer program instructions and data and can transfer the stored data to the processor, and the processor executes the program instructions stored in the memory to perform the steps of the above method.

The memory may include volatile memory, non-volatile memory, or both; the processor may be a central processing unit, a microprocessor, an application-specific integrated circuit, a programmable logic device, or a combination thereof. By way of illustration and not limitation, the programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.

In addition, the memory may be a physically independent unit or may be integrated with the processor.

Those of ordinary skill in the art will understand that the structure shown in Figure 9 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer-readable storage medium is provided on which a computer program is stored; when the computer program is executed by a processor, the steps of the above method are implemented.

In summary, the embodiments of the present invention provide a land analysis method and system based on a deep forest model and a FLUS model. By using the deep forest model to model the relationship between driving factors and land use change, land use change rules (development suitability) can be obtained more accurately than with traditional machine learning methods, and the model is easier to use than existing deep neural networks. This effectively improves the accuracy of FLUS land use change simulation, makes the results less affected by randomness and more stable, and gives the FLUS model the ability to analyze the driving mechanisms of land use accurately.

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and substitutions without departing from the technical principle of the present invention, and these improvements and substitutions should also be regarded as falling within the protection scope of the present invention.

Claims

1. A land analysis method based on a deep forest model and a FLUS model, characterized in that the method comprises:

Obtain driving factor data and multi-period land use distribution data that drive land use change;

According to the land expansion analysis strategy, the multi-period land use distribution data is sampled and combined with the driving factor data to obtain a training sample set;

The training sample set is input into the deep forest model for training to obtain development suitability data for each land change;

Input the land change development suitability data, neighborhood effect, conversion cost and inertia coefficient into the FLUS model for simulation to obtain simulated land use distribution data;

The driving factor data set is calculated through the deep forest model to obtain the contribution weight data of each driving factor to each land change.

2. The land analysis method based on a deep forest model and a FLUS model according to claim 1, characterized in that sampling the multi-period land use distribution data according to the land expansion analysis strategy and combining the driving factor data to obtain the training sample set further comprises:

The expansion areas are extracted from the multi-period historical land use distribution data by overlay analysis;

Stratified random sampling is performed in the expansion areas to obtain sample positions;

The training sample set is constructed according to the driving factor data and the land use distribution corresponding to the sample location.

3. The land analysis method based on a deep forest model and a FLUS model according to claim 1, characterized in that inputting the training sample set into the deep forest model for training to obtain the development suitability data for each land change further comprises:

The training sample set is represented as follows:

$\{x_1, x_2, \dots, x_i, \dots, x_N, y_j\}$, where $x_i$ is the value of the $i$-th driving factor, $N$ is the number of driving factors, $y_j$ ($j = 1, \dots, M$) is the land use type corresponding to the land change, and $M$ is the number of land use types;

The nonlinear mapping between driving factors and land changes is modeled by the deep forest model, and the average of the classification probability vectors output by the base forests in the last layer of the deep forest model is used as the land use change development suitability value;

The classification probability vector is expressed in the following way:

$\{P_1, P_2, \dots, P_j, \dots, P_M\}$

Each classification probability is calculated as the mean of the tree votes using the following formula:

$$P_j = \frac{1}{T}\sum_{t=1}^{T} I\left(H_t(X) = y_j\right)$$

where $P_j$ is the classification probability of land use type $y_j$, $H_t(X)$ is the classification result of a single decision tree, $X$ is the feature vector formed by the driving factors $x_i$ in the training sample set, $T$ is the number of decision trees in a base forest, and $I$ is the indicator function.
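
The probability defined in claim 3, with $P_j$ as the share of the $T$ trees voting for $y_j$ and the suitability as the average over the base forests of the last layer, can be illustrated as follows. This sketch takes hard per-tree predictions and applies the indicator function literally; many random forest libraries instead average per-tree leaf class frequencies, so library results may differ slightly. Function names are hypothetical.

```python
import numpy as np

def forest_probability(tree_predictions, n_classes):
    """P_j = (1/T) * sum_t I(H_t(X) = y_j) for one base forest.

    tree_predictions: (T, n_samples) integer class labels from the T trees.
    Returns (n_samples, n_classes).
    """
    preds = np.asarray(tree_predictions)
    classes = np.arange(n_classes)
    votes = preds[:, :, None] == classes     # (T, n_samples, n_classes) indicator values
    return votes.mean(axis=0)

def suitability(last_layer_predictions, n_classes):
    """Average the class-probability vectors of all base forests in the last layer."""
    return np.mean([forest_probability(p, n_classes) for p in last_layer_predictions], axis=0)
```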

4. The land analysis method based on a deep forest model and a FLUS model according to claim 3, characterized in that computing the driving factor data set through the deep forest model to obtain the contribution weight data of each driving factor to each land change further comprises:

The deep forest model has a multi-layer cascade structure, each layer contains several base forests, and each base forest is composed of decision trees;

A division rule is selected, and the Gini coefficient of the node containing the driving factor data set is calculated. The division rule is defined as $\theta = (i, t)$, where $i$ denotes a driving factor variable and $t$ denotes the division threshold; the node containing the driving factor data set is denoted $Node$, and its Gini coefficient $G$ is:

$$G(Node) = 1 - \sum_{j=1}^{M} p_j^2$$

where $p_j$ is the proportion of samples in $Node$ whose land use type is $y_j$;

According to the division rule $\theta$, the driving factor data set is divided into two new nodes:

$$Node^{left} = \{(X, y) \mid x_i \le t\}$$

$$Node^{right} = Node \setminus Node^{left}$$

where $Node^{left}$ and $Node^{right}$ are the two new nodes after division, containing $m^{left}$ and $m^{right}$ samples respectively;

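The weighted sum of the Gini coefficients of the two new nodes is calculated as:

$$G(Node \mid \theta) = \frac{m^{left}}{m} G(Node^{left}) + \frac{m^{right}}{m} G(Node^{right}), \quad m = m^{left} + m^{right}$$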

The impurity reduction value of the division rule $\theta$ is calculated as:

$$D(Node \mid \theta) = G(Node) - G(Node \mid \theta)$$

Nodes are divided with the rule that gives the maximum impurity reduction value to construct the decision tree, the optimal division rule being $\theta^{*} = \arg\max_{\theta} D(Node \mid \theta)$.
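
The split search of claim 4 can be sketched as an exhaustive search over candidate thresholds. This is a minimal NumPy illustration assuming integer-coded land use types `y` and a driving factor matrix `X`; the function names are hypothetical, and practical tree learners use faster sorted or binned searches.

```python
import numpy as np

def gini(y, n_classes):
    """G(Node) = 1 - sum_j p_j^2 over the land use types in the node."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y, minlength=n_classes) / len(y)
    return 1.0 - float(np.sum(p ** 2))

def best_split(X, y, n_classes):
    """Exhaustive search for theta* = (i, t) maximising D(Node | theta)."""
    m = len(y)
    g_node = gini(y, n_classes)
    best = (None, None, 0.0)                     # (factor i, threshold t, D)
    for i in range(X.shape[1]):
        # Splitting at the largest value would leave the right node empty.
        for t in np.unique(X[:, i])[:-1]:
            left = X[:, i] <= t
            m_left = left.sum()
            m_right = m - m_left
            g_split = (m_left / m) * gini(y[left], n_classes) + \
                      (m_right / m) * gini(y[~left], n_classes)
            d = g_node - g_split
            if d > best[2]:
                best = (i, float(t), d)
    return best
```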

5. The land analysis method based on a deep forest model and a FLUS model according to claim 4, characterized in that the maximum impurity reduction value is used as the contribution weight of the driving factor to evaluate the contribution of the driving factor to each land change.

6. The land analysis method based on a deep forest model and a FLUS model according to claim 5, characterized in that using the maximum impurity reduction value as the contribution weight of the driving factor further comprises:

Each base forest in each layer of the deep forest model is trained to obtain the contribution weight of each driving factor in that base forest;

The average contribution weight of each driving factor is calculated within each base forest, and then within each layer of the deep forest model;

The per-layer average contribution weights of each driving factor are averaged to obtain the integrated contribution degree, which constitutes the contribution weight data of each driving factor to each land change.
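
Claims 5 and 6 can be sketched as bottom-up averaging from trees to base forests to layers. Each internal node is assumed to be recorded as `(factor i, share of the tree's samples reaching the node, D)`; weighting D by that share follows the usual mean-decrease-in-impurity convention and is an assumption, as are the function names.

```python
import numpy as np

def tree_contribution(splits, n_factors):
    """Sum of sample-weighted impurity reductions D per driving factor in one tree.

    splits: list of (factor i, node_fraction, D) for every internal node, where
    node_fraction = samples at the node / samples at the root.
    """
    w = np.zeros(n_factors)
    for i, node_fraction, d in splits:
        w[i] += node_fraction * d
    return w

def integrated_contribution(model_splits, n_factors):
    """model_splits[layer][forest][tree] is a tree's split list; average tree -> forest -> layer -> model."""
    layer_means = []
    for layer in model_splits:
        forest_means = [
            np.mean([tree_contribution(s, n_factors) for s in forest], axis=0)
            for forest in layer
        ]
        layer_means.append(np.mean(forest_means, axis=0))
    return np.mean(layer_means, axis=0)
```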

7. The land analysis method based on a deep forest model and a FLUS model according to claim 1, characterized in that the real land use distribution data are compared with the land use distribution data simulated by the DF-FLUS model to evaluate the accuracy of the DF-FLUS simulation.

8. A land analysis system based on a deep forest model and a FLUS model, characterized in that the system comprises:

The data collection module is used to obtain the driving factor data and multi-period land use distribution data that drive land use change;

The data sampling module is used to sample the multi-period land use distribution data according to the land expansion analysis strategy and combine it with the driving factor data to obtain the training sample set;

The model training module is used to input the training sample set into the deep forest model for training to obtain development suitability data for each land change;

The data processing module is used to input the land change development suitability data, neighborhood effect, conversion cost and inertia coefficient into the FLUS model for simulation to obtain simulated land use distribution data;

The data evaluation module is used to calculate the driving factor data set through the deep forest model, and obtain the contribution weight data of each driving factor to each land change.

9. A computer device, characterized by comprising a processor and a memory, the processor being connected to the memory, the memory being used to store a computer program, and the processor being used to execute the computer program stored in the memory, so that the computer device executes the method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is run, the method according to any one of claims 1 to 7 is implemented.