CN114358192B - Multi-source heterogeneous landslide data monitoring and fusing method

Info

Publication number: CN114358192B
Application number: CN202210013094.0A
Authority: CN
Inventors: 王利; 张懿恺; 许豪; 赵超英; 刘万林; 成伟
Original assignee: Changan University
Current assignee: Changan University
Priority date: 2022-01-06
Filing date: 2022-01-06
Publication date: 2022-11-25
Anticipated expiration: 2042-01-06
Also published as: CN114358192A

Abstract

The invention discloses a multi-source heterogeneous landslide data monitoring and fusion method. The maximal information coefficient (MIC) is combined with gray relational analysis (GRA) to obtain a weighted correlation degree that reflects both the influence of each multi-source heterogeneous landslide monitoring variable on landslide deformation displacement and how closely their trends agree. Characteristic factors are then selected according to the weighted correlation degree, stepwise regression is fitted to the selected factors to obtain the corresponding regression coefficients, and the final regression equation and fusion result are calculated from them. The reliability and effectiveness of the fusion result are evaluated by landslide stage judgment and trend prediction. By fusing multi-source heterogeneous data, the method analyzes and processes multivariate data effectively, yields a more reliable and accurate fusion result, and provides reference information for landslide prediction, thereby improving prediction accuracy.

Description

A Multi-source Heterogeneous Landslide Data Monitoring Fusion Method

Technical Field

The invention relates to the technical field of data fusion, and in particular to a multi-source heterogeneous landslide data monitoring and fusion method.

Background Art

As an emerging interdisciplinary field, multi-source data fusion has been widely applied to landslide deformation monitoring after more than ten years of exploration and development. With the growing number of sensors, how to extract comprehensive information from multi-source heterogeneous sensors and fuse it effectively is a current research difficulty. In landslide monitoring, the information from a single sensor cannot fully reflect the deformation characteristics of a landslide, and predictions based on it are not very reliable. Multi-source heterogeneous sensor information therefore needs to be combined through effective feature extraction and comprehensive analysis to eliminate redundancy and mutual exclusivity among the data, so that more reliable and accurate predictions can be obtained.

At present, feature extraction and comprehensive analysis of multi-source heterogeneous sensor information still suffer from the one-sidedness and unreliability of single-source landslide monitoring information, which makes predictions inaccurate.

Summary of the Invention

An embodiment of the present invention provides a multi-source heterogeneous landslide data monitoring and fusion method, comprising:

obtaining multi-source heterogeneous monitoring variable data;

dividing the multi-source heterogeneous monitoring variables into dependent variables and characteristic variables;

calculating the maximal information coefficient (MIC) of every pair of multi-source heterogeneous landslide monitoring variables, and screening out the characteristic variables that have the greatest influence on landslide deformation;

taking the single-point displacement sequence that reflects the deformation characteristics of the landslide as the reference sequence, and the data sequences composed of the factors that influence landslide deformation as the comparison sequences;

calculating the gray correlation coefficient and the gray correlation degree between the reference sequence and the comparison sequences;

calculating the weighted correlation degree from the MIC and the gray correlation degree;

performing feature selection according to the weighted correlation degree, and screening out the final characteristic variables;

constructing a feature-level data fusion model that combines feature selection based on the weighted correlation degree with stepwise regression; and

using this model to fuse the multi-source heterogeneous information, so as to provide effective auxiliary information for landslide prediction and forecasting.

Further, the method also includes preprocessing the multi-source heterogeneous monitoring variable data:

outlier elimination, missing value completion, and data smoothing and denoising.
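The three preprocessing operations can be sketched as follows. The text does not say which algorithms are used, so the choices here are assumptions for illustration only: a median/MAD rule for outliers, linear interpolation for missing values, and a centered moving average for smoothing. The function names and default parameters are hypothetical.

```python
from statistics import median


def remove_outliers(values, k=3.0):
    """Mark samples farther than k scaled MADs from the median as missing (None)."""
    present = [v for v in values if v is not None]
    med = median(present)
    # 1.4826 * MAD approximates the standard deviation for Gaussian data.
    scale = 1.4826 * median(abs(v - med) for v in present)
    if scale == 0:
        return list(values)
    return [None if v is not None and abs(v - med) > k * scale else v
            for v in values]


def fill_missing(values):
    """Fill None gaps by linear interpolation; edge gaps copy the nearest sample."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid samples")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def preprocess(values, k=3.0, window=3):
    """Outlier elimination, then gap filling, then smoothing."""
    return moving_average(fill_missing(remove_outliers(values, k)), window)
```

For example, `preprocess([1.0, 1.1, None, 1.3, 50.0, 1.5, 1.6])` drops the spike at 50.0, fills both gaps by interpolation, and returns a smoothed series of the same length.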

Further, the step of calculating the MIC of every pair of multi-source heterogeneous landslide monitoring variables includes:

given grid dimensions i and j, partitioning the scatter plot of the two variables into a grid of i columns and j rows, and finding the maximum mutual information value;

normalizing the maximum mutual information value;

selecting the maximum normalized mutual information over the different grid scales as the MIC value; and

obtaining the characteristic variable with the highest degree of correlation with the dependent variable.

Further, the gray correlation coefficient is calculated as:

$$\xi_i(k) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}$$

where $\rho$ is the resolution coefficient, $0 < \rho < 1$; the smaller $\rho$ is, the greater the difference between correlation coefficients and the stronger the discrimination ability, and $\rho$ is usually taken as 0.5; $|x_0(k) - x_i(k)|$ is the absolute difference between corresponding elements of each comparison sequence and the reference sequence; and $\min_i \min_k |x_0(k) - x_i(k)|$ and $\max_i \max_k |x_0(k) - x_i(k)|$ are the two-level minimum difference and the two-level maximum difference, respectively.

Further, the weighted correlation degree is calculated by a formula in which n is the total number of characteristic variables to be selected and $\mathrm{MIC}(A, B_i)$ denotes the maximal information coefficient between characteristic variable A and characteristic variable $B_i$.

Further, the feature selection step includes:

sorting the calculated weighted correlation degrees in descending order;

screening the characteristic variables in that order;

calculating the preferred feature weight of each sorted feature; and

stopping the screening when the preferred feature weight $\omega_j$ falls below the threshold $\alpha$, which gives the final characteristic variables;

where $J_S$ is the sum of the weighted correlation degrees of the characteristic variables, $J_j$ is the weighted correlation degree of the j-th characteristic variable to be screened, $\omega_j$ is the j-th preferred feature weight, and $\alpha$ is a given threshold.

Further, the method also includes analyzing and comparing the fusion result of the weighted-correlation-degree feature selection and stepwise regression model with the fusion result of a BP neural network:

establishing a BP neural network fusion model with the independent variables as system inputs and the dependent variable as the system output;

establishing the BP neural network fusion model as a multi-input single-output network with two hidden layers;

evaluating the stages given by the feature selection and stepwise regression fusion model and by the BP neural network data fusion model using two indicators, the improved tangent angle and the deformation rate; and

using a long short-term memory (LSTM) neural network to make predictions from the feature selection and stepwise regression fusion data and from the BP neural network fusion data, respectively, and comparing the predictions.
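The comparison model, a multi-input single-output BP network with two hidden layers, can be sketched in plain Python as below. This is a minimal illustration, not the patent's implementation: the sigmoid hidden activations, the linear output node, the squared-error loss, the layer sizes and the learning rate are all assumptions, and the class name `BPNetwork` is hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """Multi-input single-output network with two hidden layers."""

    def __init__(self, n_in, n_h1, n_h2, seed=0):
        rng = random.Random(seed)
        sizes = [n_in, n_h1, n_h2, 1]
        # weights[l][i][j]: weight from node j of layer l to node i of layer l + 1
        self.weights = [[[rng.uniform(-0.5, 0.5) for _ in range(sizes[l])]
                         for _ in range(sizes[l + 1])] for l in range(3)]
        self.biases = [[0.0] * sizes[l + 1] for l in range(3)]

    def forward(self, x):
        """Forward feedforward stage: return the outputs of every layer."""
        outputs = [list(x)]
        for l in range(3):
            prev = outputs[-1]
            layer = []
            for w_row, b in zip(self.weights[l], self.biases[l]):
                s = sum(w * p for w, p in zip(w_row, prev)) + b
                layer.append(s if l == 2 else sigmoid(s))  # linear output node
            outputs.append(layer)
        return outputs

    def predict(self, x):
        return self.forward(x)[-1][0]

    def train_step(self, x, y, eta=0.1):
        """Backward error-correction stage: one gradient-descent update
        on the squared error 0.5 * (prediction - y) ** 2."""
        outputs = self.forward(x)
        delta = [outputs[-1][0] - y]  # error signal at the output node
        for l in (2, 1, 0):
            prev = outputs[l]
            if l > 0:
                # Propagate the error backward before the weights change.
                back = [sum(self.weights[l][i][j] * delta[i]
                            for i in range(len(delta)))
                        * prev[j] * (1.0 - prev[j]) for j in range(len(prev))]
            for i, d in enumerate(delta):
                for j, p in enumerate(prev):
                    self.weights[l][i][j] -= eta * d * p
                self.biases[l][i] -= eta * d
            if l > 0:
                delta = back
```

In use, the selected characteristic variables would be the inputs `x` and the displacement the target `y`; repeated calls to `train_step` over the training samples fit the network, and `predict` gives the fused value.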

Compared with the prior art, the multi-source heterogeneous landslide data monitoring and fusion method provided by the embodiment of the present invention has the following beneficial effects:

1. MIC measures the degree of association between two variables. Compared with other correlation analysis methods, MIC is applicable to both linear and nonlinear data, offers generality, equitability and symmetry, and achieves high accuracy.

2. The mutual information weight is combined with the gray correlation degree, and the resulting weighted correlation degree measures the importance of each characteristic factor to landslide deformation. The preferred feature weights are calculated and the characteristic factors are screened against a threshold. Because feature selection draws on the characteristics of both mutual information and gray correlation, its result is more reliable.

Description of the Drawings

Fig. 1 is a flow chart of feature selection based on the weighted correlation degree in the fusion method of the present invention;

Fig. 2 is a diagram of the RNN model used in the evaluation analysis of the present invention;

Fig. 3 shows the hidden-layer cell structure of the RNN model used in the evaluation analysis of the present invention;

Fig. 4 shows the hidden-layer cell structure of the LSTM model used in the evaluation analysis of the present invention;

Fig. 5 is the distribution map of monitoring points in the experimental study area of the present invention;

Fig. 6 shows the fusion result of feature selection based on the weighted correlation degree and stepwise regression in the experimental part of the present invention;

Fig. 7 shows the fusion result of the BP neural network model in the experimental part of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

Referring to Figs. 1 to 7, an embodiment of the present invention provides a multi-source heterogeneous landslide data monitoring and fusion method, which comprises:

S1. Maximal information coefficient calculation. This patent uses the minepy library for Python to implement the MIC algorithm. The calculation has four steps: 1) given grid dimensions i and j, partition the scatter plot of the two variables into i columns and j rows and find the maximum mutual information value; 2) normalize the maximum mutual information value; 3) select the maximum normalized mutual information over the different grid scales as the MIC value; 4) take the characteristic variables with the highest degree of correlation with the dependent variable for subsequent regression prediction. MIC measures the degree of association between two variables. Compared with other correlation analysis methods, MIC is applicable to both linear and nonlinear data, offers generality, equitability and symmetry, and achieves high accuracy.

S2. Gray correlation degree calculation. The single-point displacement sequence that reflects the deformation characteristics of the landslide is taken as the reference sequence, and the data sequences composed of the factors that influence landslide deformation are taken as the comparison sequences. The reference and comparison sequences are made dimensionless, and the gray correlation coefficient and gray correlation degree between them are then obtained.
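The gray correlation calculation in S2 can be sketched as below. The text does not state which dimensionless transform is used, so dividing each sequence by its mean is an assumption (it requires a nonzero mean); the function names are hypothetical. The coefficient follows the standard gray relational form with resolution coefficient ρ, and the degree is the mean of the coefficients over all samples.

```python
def mean_normalize(seq):
    """Make a sequence dimensionless by dividing by its mean (assumed transform)."""
    mean = sum(seq) / len(seq)
    return [v / mean for v in seq]


def gray_relational_degrees(reference, comparisons, rho=0.5):
    """Return the gray correlation degree of each comparison sequence."""
    x0 = mean_normalize(reference)
    xs = [mean_normalize(c) for c in comparisons]
    # Absolute differences |x0(k) - xi(k)| for every sequence i and sample k.
    diffs = [[abs(a - b) for a, b in zip(x0, x)] for x in xs]
    d_min = min(min(row) for row in diffs)  # two-level minimum difference
    d_max = max(max(row) for row in diffs)  # two-level maximum difference
    if d_max == 0:
        return [1.0] * len(xs)  # every sequence coincides with the reference
    coeffs = [[(d_min + rho * d_max) / (d + rho * d_max) for d in row]
              for row in diffs]
    # The degree is the mean coefficient over the samples of each sequence.
    return [sum(row) / len(row) for row in coeffs]
```

A comparison sequence that is proportional to the reference gets degree 1, while a sequence that trends in the opposite direction gets a markedly lower degree.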

S3. Feature selection by weighted correlation. The mutual information value measures the influence of a feature on landslide deformation, and its weight reflects the effectiveness of the feature, while the gray correlation degree quantifies how consistently the feature varies with landslide deformation. The mutual information weight is combined with the gray correlation degree, and the resulting weighted correlation degree measures the importance of each characteristic factor to landslide deformation. The preferred feature weights are calculated and the characteristic factors are screened against a threshold. Because feature selection draws on the characteristics of both mutual information and gray correlation, its result is more reliable.

S4. Stepwise regression analysis. The characteristic factors are introduced into the model one by one. After each explanatory variable is introduced, an F test is performed, and the explanatory variables already selected are t-tested one by one; a previously introduced explanatory variable that is no longer significant is deleted. This ensures that the regression equation contains only significant variables before each new variable is introduced. The procedure is repeated until no significant explanatory variable remains to be added to the regression equation and no insignificant explanatory variable remains to be removed from it.
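The introduce-and-remove loop of stepwise regression can be sketched with the textbook elimination (sweep) transformation on the augmented correlation matrix. This is an illustration under assumptions, not the patent's exact procedure: partial F statistics are used for both removal and introduction, the thresholds `f_in` and `f_out` are fixed at an arbitrary value of 4.0, and all function names are hypothetical.

```python
import math


def sweep(R, k):
    """Elimination transformation of matrix R on pivot k (returns a new matrix)."""
    size, pivot = len(R), R[k][k]
    new = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == k and j == k:
                new[i][j] = 1.0 / pivot
            elif i == k:
                new[i][j] = R[k][j] / pivot
            elif j == k:
                new[i][j] = -R[i][k] / pivot
            else:
                new[i][j] = R[i][j] - R[i][k] * R[k][j] / pivot
    return new


def stepwise_regression(X, y, f_in=4.0, f_out=4.0):
    """Return (selected column indices, coefficients, intercept)."""
    n, m = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(m)] + [list(y)]
    means = [sum(c) / n for c in cols]
    cen = [[v - mu for v in c] for c, mu in zip(cols, means)]
    # l[i][j]: centered cross-products; index m refers to y.
    l = [[sum(a * b for a, b in zip(ci, cj)) for cj in cen] for ci in cen]
    R = [[l[i][j] / math.sqrt(l[i][i] * l[j][j]) for j in range(m + 1)]
         for i in range(m + 1)]
    selected = []
    for _ in range(4 * m + 4):  # safety cap on the number of passes
        f = len(selected)
        if selected and R[m][m] > 0:
            # Removal: test the selected factor with the smallest contribution.
            contrib = {j: R[j][m] ** 2 / R[j][j] for j in selected}
            j0 = min(contrib, key=contrib.get)
            if contrib[j0] * (n - f - 1) / R[m][m] <= f_out:
                R = sweep(R, j0)
                selected.remove(j0)
                continue
        # Introduction: test the unselected factor with the largest contribution.
        cand = {k: R[k][m] ** 2 / R[k][k] for k in range(m)
                if k not in selected and R[k][k] > 1e-12}
        if cand and n - f - 2 > 0:
            k0 = max(cand, key=cand.get)
            rest = R[m][m] - cand[k0]
            f_stat = math.inf if rest <= 0 else cand[k0] * (n - f - 2) / rest
            if f_stat > f_in:
                R = sweep(R, k0)
                selected.append(k0)
                continue
        break  # nothing to remove and nothing to introduce
    coef = {j: R[j][m] * math.sqrt(l[m][m] / l[j][j]) for j in selected}
    intercept = means[m] - sum(b * means[j] for j, b in coef.items())
    return sorted(selected), coef, intercept
```

Applying the same sweep to a pivot that is already in the model removes it again, which is why one function serves both introduction and removal.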

S5. Evaluation and analysis. Step 1: stage comparison of the fusion results. To evaluate the reliability of the feature-level data fusion that combines weighted-correlation-degree feature selection with stepwise regression, its fusion result is compared with that of a BP neural network through stage discrimination analysis. First, a BP neural network fusion model is established as a multi-input single-output network with two hidden layers, with the independent variables as system inputs and the dependent variable as the system output. The two indicators of improved tangent angle and deformation rate are then used to evaluate the stages given by the two fusion models, to demonstrate the effectiveness of the fusion result of the present invention. Step 2: prediction comparison of the fusion results. An LSTM (long short-term memory) neural network is used to make predictions from the single-point dependent-variable data of the GNSS monitoring point, from the fusion data of the feature selection and stepwise regression model, and from the BP neural network fusion data, and the predictions are compared. The LSTM model is built with Keras in Python, and prediction accuracy is assessed with two indicators, MRE (mean relative error) and MAE (mean absolute error), to demonstrate the reliability of the fusion result of the present invention.
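The two accuracy indicators named above can be sketched as follows. The definitions are the usual ones, assumed here because the text gives only the names: MAE averages the absolute errors, and MRE averages the absolute errors relative to the observed values, which must therefore be nonzero.

```python
def mean_absolute_error(actual, predicted):
    """MAE: mean of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """MRE: mean of |actual - predicted| / |actual| (actual values must be nonzero)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

Computing both for the predictions made from the single-point data and from each fusion result puts the three data sources on a common scale for comparison.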

Specifically:

S1. Maximal information coefficient calculation. The multi-source heterogeneous sensors on a landslide body are correlated, and any one sensor is influenced by several others. The correlation degree of the multi-source heterogeneous landslide monitoring variables therefore needs to be calculated so that the characteristic factors with the greatest influence on landslide deformation can be screened out for subsequent fusion and prediction. Computing the MIC (maximal information coefficient) first requires an understanding of information entropy and mutual information. Mutual information expresses the degree of association between two random variables, and the average amount of information that remains after redundancy is removed is called "information entropy". Screening by MI (mutual information) therefore selects the characteristic factors that influence landslide deformation while carrying less redundant information. Compared with MI, MIC is more accurate and is not restricted to a particular type of function when measuring the association between variables. If two variables are associated, the set of their data points is distributed in a two-dimensional space. The data space is partitioned by an m × n grid, and the frequency of data points falling in cell (x, y) is used as the estimate of P(x, y):

$$P(x, y) \approx \frac{n_{x,y}}{n}$$

where $n_{x,y}$ is the number of data points falling in cell (x, y) and n is the total number of data points; the estimates of P(x) and P(y) are obtained in the same way. The mutual information between the random variables is then calculated. Because an m × n grid can partition the data points in more than one way, the partition that maximizes the mutual information is sought, and a normalization factor maps the mutual information into the interval (0, 1). Finally, the grid resolution that maximizes the normalized mutual information gives the MIC value. The grid resolution is limited to m × n < B, where B = f(data_size) = n^0.6 and n in this expression is the total number of data points, not the grid dimension. The MIC is calculated as

$$\mathrm{MIC}(X; Y) = \max_{m \times n < B} \frac{I(X; Y)}{\log \min(m, n)}$$

where $I(X; Y)$ is the maximum mutual information over the partitions of the m × n grid.

The specific calculation steps are as follows: 1) Calculate the maximum mutual information value. Given i and j, partition the scatter plot of the two variables X and Y into a grid of i columns and j rows and find the maximum mutual information value. For given i and j there are many possible partitions, so the mutual information of each is calculated and the partition with the largest mutual information is chosen. 2) Normalize the maximum mutual information value by dividing it by log(min(i, j)). 3) Select the maximum normalized mutual information over the different scales as the MIC value. The features with the greater influence on landslide deformation are then selected and the features carrying little information are eliminated, so that the variables used for modeling are more representative.
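The three steps above can be illustrated with a simplified sketch. It is not the full MIC algorithm and not the minepy implementation the patent uses: for each grid size it evaluates only one partition (equal-width bins) instead of searching for the partition with the largest mutual information, so the result is an approximation. The function names and the equal-width binning are assumptions for illustration.

```python
import math
from collections import Counter


def _bin_indices(values, bins):
    """Assign each value to one of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]


def grid_mutual_information(x, y, cols, rows):
    """Mutual information (in nats) of x and y on a cols-by-rows grid."""
    n = len(x)
    bx, by = _bin_indices(x, cols), _bin_indices(y, rows)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # P(x, y) is estimated by the cell frequency n_xy / n.
    return sum(c / n * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def approx_mic(x, y, exponent=0.6):
    """Largest normalized grid mutual information with cols * rows < n ** exponent."""
    budget = len(x) ** exponent
    best = 0.0
    cols = 2
    while cols * 2 < budget:
        rows = 2
        while cols * rows < budget:
            mi = grid_mutual_information(x, y, cols, rows)
            best = max(best, mi / math.log(min(cols, rows)))  # normalization
            rows += 1
        cols += 1
    return best
```

For a strictly linear relation the sketch returns a value of about 1, and for a constant second variable it returns 0; a production analysis would rely on a tested MIC implementation rather than this approximation.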

S2. Gray correlation degree calculation. Selecting features from MIC results alone is not convincing. MIC analysis, which expresses the degree of influence, can be combined with gray correlation analysis, which expresses the degree of consistency, to obtain characteristic factors better suited to data fusion. The multi-source heterogeneous landslide monitoring feature sequences are made dimensionless, and the correlation coefficients and correlation degrees are then calculated as follows:

1) Determine the reference sequence and the comparison sequences. Let the reference sequence be Y = {Y(k) | k = 1, 2, …, m} and the comparison sequences be X_i = {X_i(k) | k = 1, 2, …, m}, i = 1, 2, …, n.

2) Make the variables dimensionless. The characteristic factors have different dimensions and cannot be compared directly, so they are made dimensionless. The dimensionless sequences, written $x_0$ for the reference sequence and $x_1, \ldots, x_n$ for the comparison sequences, form a matrix.

3) Calculate, element by element, the absolute difference between each comparison sequence (the indicator sequence of each evaluated object) and the reference sequence, that is $|x_0(k) - x_i(k)|$ (k = 1, …, m; i = 1, …, n), where n is the number of evaluated objects.

4) Determine the two-level minimum difference $\min_i \min_k |x_0(k) - x_i(k)|$ and the two-level maximum difference $\max_i \max_k |x_0(k) - x_i(k)|$.

5) Calculate the correlation coefficient:

$$\xi_i(k) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \rho \max_i \max_k |x_0(k) - x_i(k)|}$$

where $\rho$ is the resolution coefficient, $0 < \rho < 1$; the smaller $\rho$ is, the greater the difference between correlation coefficients and the stronger the discrimination ability. $\rho$ is usually taken as 0.5.

6) Calculate the correlation degree. For each evaluated object (comparison sequence), the mean of the correlation coefficients between its indicators and the corresponding elements of the reference sequence is calculated to reflect the association between that object and the reference sequence. This mean is called the correlation degree:

$$r_i = \frac{1}{m} \sum_{k=1}^{m} \xi_i(k)$$

S3. Weighted correlation degree calculation. The maximal-information weight is combined with the gray correlation degree to obtain the weighted correlation degree of each feature, which reflects the importance of the feature: the greater the weighted correlation degree, the more important the feature. In the calculation formula, n is the total number of features to be selected.
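One plausible reading of this combination is sketched below. The exact formula is not reproduced in this text, so the form used here is an assumption: each feature's MIC is normalized by the sum of the MIC values of all n candidate features to give a weight, and that weight multiplies the feature's gray correlation degree. The function name is hypothetical.

```python
def weighted_correlation(mic_values, gray_degrees):
    """Assumed form: (MIC_i / sum of all MIC values) * gray correlation degree_i."""
    total = sum(mic_values)
    if total <= 0:
        raise ValueError("MIC values must have a positive sum")
    return [mic / total * degree for mic, degree in zip(mic_values, gray_degrees)]
```

With MIC values 0.6, 0.3 and 0.1 and gray correlation degrees 0.9, 0.8 and 0.7, this gives weighted correlation degrees of 0.54, 0.24 and 0.07.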

S4. Feature selection. The calculated weighted correlation degrees are sorted in descending order. The feature with the largest weighted correlation degree is added to the optimized set and removed from the candidate set. Screening proceeds in this order, from largest to smallest, and the preferred feature weight is calculated at each step. In the weight formula, $J_S$ is the sum of the weighted correlation degrees of the features in the optimized set, $J_j$ is the weighted correlation degree of the j-th feature to be screened, and $\omega_j$ is the j-th preferred feature weight. When $\omega_j$ is less than $\alpha$, feature screening is considered complete.
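The screening loop can be sketched as follows. The weight formula itself is not reproduced in this text, so the sketch assumes ω_j = J_j / J_S, with J_S taken as the sum over the optimized set including the candidate being tested. The threshold value, the feature names and the function name are hypothetical.

```python
def select_features(weighted, alpha=0.1):
    """Pick features in descending weighted correlation until omega_j < alpha.

    `weighted` maps a feature name to its weighted correlation degree J_j.
    Assumed weight: omega_j = J_j / J_S, J_S including the candidate itself.
    """
    order = sorted(weighted, key=weighted.get, reverse=True)
    selected, j_s = [], 0.0
    for name in order:
        j = weighted[name]
        if selected:
            if j_s + j == 0:
                break
            omega = j / (j_s + j)
            if omega < alpha:
                break  # remaining features contribute too little
        selected.append(name)
        j_s += j
    return selected
```

For the weighted correlation degrees `{"rain": 0.54, "level": 0.24, "tilt": 0.07, "temp": 0.01}` and `alpha=0.1`, the loop keeps `rain` and `level` and stops at `tilt`, whose weight 0.07 / 0.85 is below the threshold.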

S5. Stepwise regression analysis. The basic idea of stepwise regression is to introduce variables into the model one by one. After each explanatory variable is introduced, an F test is performed, and the explanatory variables already selected are t-tested one by one; a previously introduced explanatory variable that is no longer significant is deleted. This ensures that the regression equation contains only significant variables before each new variable is introduced. The process is repeated until no significant explanatory variable remains to be added to the regression equation and no insignificant explanatory variable remains to be removed from it. The specific steps of stepwise regression are as follows:

Step 1: Build the augmented matrix.

Calculate $l_{ij}$, $l_{iy}$, $l_{yy}$ and $r_{ij}$, $r_{iy}$, from which the augmented matrix is obtained:

$$R^{(0)} = \begin{pmatrix} R & r_y \\ r_y' & r_{yy} \end{pmatrix}$$

where $R = (r_{ij})_{m \times m}$, $r_{yy} = 1$, and $r_y = (r_{1y}, r_{2y}, \ldots, r_{my})'$.

Step 2: Perform the elimination transformation for step s to obtain the transformed matrix.

Step 3: Factor elimination.

① Select $j_0$, the index of the factor to be tested for elimination.

② Calculate the test statistic F for that factor.

③ If $F > F_{\text{out}}$, go to step 4; otherwise, perform the (s+1)-th elimination transformation and return to steps 2 and 3.

Step 4: Introduce a regression factor. Let s, {j}, and f be as defined in step 2.

① Select $k_0$, the index of the factor to be tested for introduction.

② Calculate the test statistic F for that factor.

③ If $F < F_{\text{in}}$, go to step 5; otherwise, perform the (s+1)-th elimination transformation to introduce the $k_0$-th regression factor, and return to steps 2, 3 and 4.

Step 5: At this point no variable can be introduced or eliminated. The final regression equation is $\hat{y} = \hat{b}_0 + \sum_{j \in \{j\}} \hat{b}_j x_j$.

S6. Evaluation and analysis.

Step 1: stage comparison of the fusion results. To evaluate the reliability of the feature-level data fusion that combines weighted-correlation-degree feature selection with stepwise regression, the fusion result of this model is compared with that of a BP neural network through stage discrimination analysis. First, a BP neural network fusion model is established as a multi-input single-output network with two hidden layers, with the independent variables as system inputs and the dependent variable as the system output. The topology of a BP neural network comprises three parts, the input layer, the hidden layers and the output layer, and its operation comprises two processes, a forward multi-layer feedforward stage and a backward error-correction stage. The forward stage starts from the input layer and calculates the actual input and output of each node layer by layer. Its mathematical model is expressed in terms of the output value of the i-th node in layer l, the activation value of the i-th node in layer l, the connection weight from the j-th node in layer l−1 to the i-th node in layer l, the threshold of the i-th node in layer l, and the number of nodes $N_l$ in layer l. To reduce the error of the output-layer neurons, the error is propagated backward with a gradient descent algorithm: the connection weights between the neurons of adjacent layers are adjusted, with η as the learning rate, so that the overall error changes in the decreasing direction.

The two indicators of improved tangent angle (see Xu Qiang, "An improved tangent angle and the corresponding landslide early-warning criteria") and deformation rate are used to evaluate the stages given by the weighted-correlation-degree feature selection and stepwise regression fusion model and by the BP neural network data fusion model. Table 2 gives the comprehensive early-warning criteria based on the deformation rate threshold and the deformation process.

Table 2. Comprehensive early-warning criteria based on deformation rate threshold and deformation process
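The improved tangent angle can be sketched as below. The criteria of Table 2 are not reproduced in this text, so the sketch computes only the angle, following the commonly cited definition: the cumulative displacement is divided by the displacement rate of the uniform deformation stage so that both axes carry time units, and the angle is the arctangent of the slope of the rescaled curve. This definition, the argument `uniform_rate` and the function name are assumptions here and are not quoted from the patent.

```python
import math


def improved_tangent_angles(times, displacements, uniform_rate):
    """Tangent angle in degrees between successive monitoring epochs.

    uniform_rate: displacement rate during the uniform deformation stage,
    in displacement units per time unit (assumed to be known in advance).
    """
    # Rescale displacement so that both axes have the dimension of time.
    scaled = [s / uniform_rate for s in displacements]
    return [math.degrees(math.atan((scaled[i] - scaled[i - 1])
                                   / (times[i] - times[i - 1])))
            for i in range(1, len(times))]
```

Under this definition an angle near 45° corresponds to deformation at the uniform rate, and angles above 45° indicate acceleration relative to it.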

Step 2: prediction comparison of the fusion results. Feature-level fusion analyzes and processes the multi-source heterogeneous information obtained from landslide monitoring in order to improve the accuracy of prediction and forecasting. To examine how effectively feature-level fusion improves landslide prediction accuracy, an LSTM (long short-term memory) neural network is used to make predictions from the single-point dependent-variable data of the GNSS monitoring point, from the fusion data of the weighted-correlation-degree feature selection and stepwise regression model, and from the BP neural network fusion data, and the predictions are compared. The LSTM model is built with Keras in Python. LSTM is a neural network algorithm improved from the ordinary recurrent neural network (RNN): the RNN cells in the hidden layer are replaced by LSTM cells, which effectively overcomes the problem that gradients may vanish quickly during backpropagation, gives the network long-term memory, and allows it to process long time series. Compared with the RNN model, the LSTM cell contains three gates, as shown in Figure 3, where i is the input gate, f is the forget gate, c is the cell state, o is the output gate, and σ and tanh are the sigmoid and hyperbolic tangent activation functions, respectively.
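A single forward step of an LSTM cell with these three gates can be sketched in plain Python. The patent builds its model with Keras; this sketch only illustrates the standard gate equations, and the parameter dictionary layout and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W . v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = list(h_prev) + list(x_t)  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], z)]  # forget gate
    i = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], z)]  # input gate
    c_new = [math.tanh(v) for v in affine(params["W_C"], params["b_C"], z)]  # candidate
    o = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], z)]  # output gate
    # Cell state: keep part of the old state, add part of the candidate.
    c_t = [ft * cp + it * cn for ft, cp, it, cn in zip(f, c_prev, i, c_new)]
    h_t = [ot * math.tanh(ct) for ot, ct in zip(o, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved at each step; this makes a convenient sanity check before training.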

The forget gate inspects h_{t-1} and x_t and uses a sigmoid unit to output a vector with entries between 0 and 1. Each entry indicates how much of the corresponding information in the cell state C_{t-1} is retained or discarded: 0 means nothing is retained and 1 means everything is retained. f_t = σ(W_f·[h_{t-1}, x_t] + b_f). The input gate is used to update the cell state. The previous hidden state and the current input are first passed to the sigmoid function, whose output between 0 and 1 decides which information to update (0 means unimportant, 1 means important). At the same time, the hidden state and the current input are passed to the tanh function, which compresses the values to between -1 and 1 to regulate the network. The tanh output is then multiplied by the sigmoid output, and the sigmoid output determines which information in the tanh output is important and must be retained. i_t = σ(W_i·[h_{t-1}, x_t] + b_i), ~C_t = tanh(W_C·[h_{t-1}, x_t] + b_C). The output gate controls the value of the next hidden state, which can be used for prediction. The previous hidden state and the current input are first passed to the sigmoid function, and the newly obtained cell state is passed to the tanh function; the tanh output is then multiplied by the sigmoid output to give the new hidden state, which is output as the output value of the current unit. Finally, the new cell state and hidden state are carried over to the next time step. o_t = σ(W_o·[h_{t-1}, x_t] + b_o), h_t = o_t * tanh(C_t). The LSTM model is trained with the classic backpropagation algorithm in four steps: (1) calculate the output values of the LSTM cells by the forward calculation method,

are the first and second derivatives of the loss function l(), respectively, and the final objective function is:

W and b are the corresponding weight coefficient matrices and bias terms, respectively. (2) Calculate the error term of each LSTM cell backwards, in two backpropagation directions: through time and through the network layers. (3) Calculate the gradient of each weight from the corresponding error term. (4) Apply a gradient-based optimization algorithm to update the weights.
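The gate equations above can be illustrated with a minimal pure-Python sketch of one forward step of an LSTM cell. It is not the Keras implementation used in the invention. The cell-state update C_t = f_t * C_{t-1} + i_t * ~C_t is the standard LSTM update and is assumed here, since it is not written out in the text; the function names and the `params` layout are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def affine(W, v, b):
    """Return W·v + b for a list-of-rows matrix W, vector v and bias b."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM cell.

    params maps "f", "i", "c", "o" to a (W, b) pair; W has one row per
    hidden unit and len(h_prev) + len(x_t) columns (hypothetical layout).
    """
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a) for a in affine(W, z, b)]

    f_t = gate("f", sigmoid)      # forget gate
    i_t = gate("i", sigmoid)      # input gate
    c_hat = gate("c", math.tanh)  # candidate cell state ~C_t
    o_t = gate("o", sigmoid)      # output gate

    # Assumed standard update: keep part of the old state, add new candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Usage with all-zero weights: every sigmoid gate outputs 0.5.
zero_W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zero_b = [0.0, 0.0]
params = {name: (zero_W, zero_b) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -1.0], params)
```

With zero weights the forget gate halves the previous cell state, which makes the role of each gate easy to check by hand before real trained weights are substituted.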

Prediction accuracy is evaluated with two indicators, MRE (mean relative error) and MAE (mean absolute error).

Both indicators measure the distance between the expected value and the actual value and thus quantify the prediction accuracy.
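As a hedged illustration of the two indicators, the sketch below uses their common textbook definitions: MAE as the mean of |predicted - actual| and MRE as the mean of |predicted - actual| / |actual|. The exact formulas of the invention are not reproduced in the text, so these definitions and the function names are assumptions.

```python
def mae(actual, predicted):
    """Mean absolute error, in the unit of the data (e.g. mm)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def mre(actual, predicted):
    """Mean relative error as a fraction; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical displacement values in mm, for illustration only.
observed = [100.0, 200.0]
forecast = [110.0, 190.0]
mae_value = mae(observed, forecast)
mre_value = mre(observed, forecast)
```

Multiplying the MRE by 100 gives the percentage form in which relative errors are usually reported.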

Example:

The experimental data of the present invention are the monitoring data of the Dangchuan 7# landslide at Heifangtai, Yongjing County, Gansu Province, from March 28 to October 4, 2019, sampled daily for a total of 191 d. They comprise two sets of GNSS monitoring data (HF06, HF07), three sets of displacement-meter monitoring data (DCF11, DCF14, DCF15) and three kinds of meteorological data (humidity, temperature and rainfall). The landslide body failed at 04:00 on October 5, 2019, and all five groups of monitoring equipment recorded the displacement changes of this deformation, so the multi-source heterogeneous sensor data can be used for data fusion and accuracy assessment. The distribution of the monitoring points in the experimental area is shown in Figure 5.

The present invention first preprocesses the multi-sensor variables and environmental factors. Preprocessing comprises outlier removal, missing-value completion, and data smoothing and denoising. The MIC (maximum mutual information coefficient) is then calculated on the preprocessed data and the mutual-information weights are determined, which identifies the feature factors with the greatest influence on landslide deformation. Next, the gray correlation degree is calculated, and finally the weighted correlation formula is applied to obtain the weighted correlation degree; the final feature factors are determined by calculating the feature selection weights. Table 3 lists the MIC weights, gray correlation degrees and weighted correlation degrees between GNSS monitoring point HF06 and the other GNSS monitoring data, the displacement-meter monitoring data, rainfall, temperature and humidity. A larger weighted correlation degree means that the feature has a stronger influence on, and a closer trend to, the landslide deformation.
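The gray correlation step of this pipeline can be sketched as follows. The sketch assumes the standard gray relational analysis formula, with resolution coefficient ρ = 0.5 and the two-level minimum and maximum differences taken over all comparison sequences; it assumes the sequences have already been made dimensionless, and it leaves out the MIC weighting, whose formula is not reproduced in the text. The function name is hypothetical.

```python
def gray_relational_degrees(reference, comparisons, rho=0.5):
    """Return one gray relational degree per comparison sequence.

    reference   -- reference sequence x_0 (e.g. displacement at one point)
    comparisons -- list of comparison sequences x_i, same length as reference
    rho         -- resolution coefficient, 0 < rho < 1
    """
    diffs = [[abs(r - c) for r, c in zip(reference, seq)] for seq in comparisons]
    d_min = min(min(row) for row in diffs)  # two-level minimum difference
    d_max = max(max(row) for row in diffs)  # two-level maximum difference
    if d_max == 0:
        # Every comparison sequence coincides with the reference.
        return [1.0 for _ in comparisons]
    degrees = []
    for row in diffs:
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coefficients) / len(coefficients))
    return degrees

# Hypothetical dimensionless sequences, for illustration only.
degrees = gray_relational_degrees([1.0, 2.0, 3.0],
                                  [[1.0, 2.0, 3.0], [2.0, 3.0, 5.0]])
```

A sequence identical to the reference receives a degree of 1, and the degree decreases as the sequence departs from the reference.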

Table 3 Weighted correlation calculation

The results of the weighted correlation method, which combines MIC and gray correlation, are ranked as follows: GNSS monitoring point HF07 > displacement meter DCF11 > displacement meter DCF14 > displacement meter DCF15 > cumulative rainfall of the preceding 48 hours > humidity > temperature > rainfall. That is, the displacement-meter data, the GNSS monitoring data, the cumulative rainfall of the preceding 48 hours, temperature and humidity have the larger influence on the landslide. Table 4 lists the feature selection weights calculated from the weighted correlation degrees; feature selection is carried out using these values.

Table 4 Feature selection results

The weighted correlation degrees are sorted and the selected-feature weights are calculated. The threshold α is set to 0.1: when the weight of a feature is less than 0.1, its influence on landslide deformation is considered negligible and the selection of feature factors is complete. The factors affecting landslide deformation that were selected by weighted correlation degree are then subjected to stepwise regression fitting analysis. Following the selection results, the GNSS monitoring point HF07 data, the displacement-meter monitoring data, the cumulative rainfall of the preceding 48 hours, temperature and humidity are taken as independent variables, and the GNSS monitoring point HF06 data are taken as the dependent variable. Stepwise regression analysis yields the corresponding regression coefficients, from which the final features and the fusion result are calculated. During the stepwise regression analysis, the optimal model is obtained by comparing the correlation coefficient, residual variance, F value and significance of the different models. The resulting regression coefficients are shown in Table 5.
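The threshold rule at the start of the paragraph above can be sketched as follows. The sketch assumes that the weight of a feature is its weighted correlation degree divided by the sum of all weighted correlation degrees; this reading of the weight, the function name and the sample scores are assumptions made for illustration.

```python
def select_features(weighted_degrees, alpha=0.1):
    """Rank features by weighted correlation degree and keep those whose
    normalised weight is at least alpha.

    weighted_degrees -- dict mapping feature name to weighted correlation degree
    alpha            -- threshold below which a feature is treated as negligible
    """
    total = sum(weighted_degrees.values())
    if total <= 0:
        raise ValueError("sum of weighted correlation degrees must be positive")
    ranked = sorted(weighted_degrees.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    for name, degree in ranked:
        weight = degree / total  # assumed definition of the feature weight
        if weight < alpha:
            break  # all remaining features have even smaller weights
        selected.append((name, weight))
    return selected

# Hypothetical scores; these are not the values of Table 3 or Table 4.
scores = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
chosen = [name for name, _ in select_features(scores, alpha=0.1)]
```

Because the features are visited in descending order, the loop can stop at the first weight below α.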

Table 5 Regression coefficients

The stepwise regression model of the surface displacement of the landslide body at this site is: surface displacement of the landslide body = (GNSS monitoring point HF07 data × 0.587) + (displacement meter DCF11 data × 0.036) + (displacement meter DCF14 data × 0.519) - (displacement meter DCF15 data × 0.159) + (temperature × 0.028) + (humidity × 0.026) - (cumulative rainfall of the preceding 48 hours × 0.010). This gives the feature-level fusion result of feature selection-stepwise regression based on the weighted correlation degree, as shown in Figure 6.
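The regression expression above can be evaluated for one daily sample with a short sketch. The coefficients are those stated in the expression; the dictionary keys, the function name and the sample values are hypothetical, and no intercept is included because none appears in the expression.

```python
# Coefficients taken from the stepwise regression expression in the text.
COEFFICIENTS = {
    "hf07": 0.587,         # GNSS monitoring point HF07
    "dcf11": 0.036,        # displacement meter DCF11
    "dcf14": 0.519,        # displacement meter DCF14
    "dcf15": -0.159,       # displacement meter DCF15
    "temperature": 0.028,
    "humidity": 0.026,
    "rain_48h": -0.010,    # cumulative rainfall of the preceding 48 hours
}

def fused_displacement(sample):
    """Return the fused surface displacement for one sample.

    sample -- dict with one value per key of COEFFICIENTS
    """
    missing = set(COEFFICIENTS) - set(sample)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return sum(coef * sample[name] for name, coef in COEFFICIENTS.items())

# Hypothetical sample with every input equal to 1, for a hand-checkable sum.
unit_sample = {name: 1.0 for name in COEFFICIENTS}
fused = fused_displacement(unit_sample)
```

With every input set to 1 the result equals the sum of the coefficients, which is a quick check that the signs were entered correctly.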

The fusion result of feature selection-stepwise regression based on the weighted correlation degree is compared with the BP neural network fusion result by stage discrimination analysis. The BP neural network fusion model is established first: the GNSS monitoring point HF07 data, the displacement-meter monitoring data, the cumulative rainfall of the preceding 48 hours, temperature and humidity serve as the input data of the model, and the GNSS monitoring point HF06 data serve as the expected output. With reference to the MIC analysis results, a multi-input, single-output BP neural network fusion model with two hidden layers is built. The feature-level fusion result of the BP neural network obtained from the experiment is shown in Figure 7. According to the literature, when the tangent angle exceeds 80° the landslide is already in the medium-acceleration stage. In this experiment the two fusion results are compared only for the tangent angle immediately before sliding. The tangent-angle comparison is shown in Table 6 and the deformation-rate comparison in Table 7.

Table 6 Improved tangent angle analysis of the two fusion results

Table 7 Deformation rate analysis of the two fusion results

The comparison of the two stage indicators shows that the improved tangent angle obtained from feature selection-stepwise regression based on the weighted correlation degree lies closer to the moment of landslide failure, so stage discrimination based on it matches the actual development of the landslide more closely. This indicates that the fusion result of feature selection-stepwise regression based on the weighted correlation degree is reliable and accurate for landslide stage discrimination and is the better of the two fusion results.
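For readers unfamiliar with the indicator, the improved tangent angle can be sketched as below. The sketch assumes the definition commonly attributed to the cited reference: the cumulative displacement S is divided by the displacement rate v of the uniform-deformation stage to give a quantity T with the dimension of time, and the angle is arctan(ΔT/Δt). This definition, the function name and the sample values are assumptions and are not taken from the text of the invention.

```python
import math

def improved_tangent_angles(times, displacements, v_uniform):
    """Return the improved tangent angle (degrees) between consecutive samples.

    times         -- strictly increasing observation times
    displacements -- cumulative displacement at each time
    v_uniform     -- displacement rate of the uniform-deformation stage (> 0)
    """
    if v_uniform <= 0:
        raise ValueError("v_uniform must be positive")
    # Rescale displacement so that both axes carry the dimension of time.
    T = [s / v_uniform for s in displacements]
    angles = []
    for k in range(1, len(T)):
        dt = times[k] - times[k - 1]
        angles.append(math.degrees(math.atan((T[k] - T[k - 1]) / dt)))
    return angles

# Hypothetical series: uniform deformation at rate 2, then a sharp acceleration.
angles = improved_tangent_angles([0, 1, 2, 3], [0.0, 2.0, 4.0, 16.0], 2.0)
```

Under this definition a series deforming at the uniform rate gives 45°, and the last interval of the hypothetical series exceeds the 80° level mentioned above.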

The LSTM network algorithm is then applied to the GNSS monitoring point HF06 data, the fusion data from feature selection-stepwise regression based on the weighted correlation degree, and the BP neural network fusion data to predict the landslide trend for each, and the accuracy of the predictions is compared using the two indicators MRE and MAE.

Table 8 Comparison of the prediction accuracy of the two fusion results

Table 8 shows that the MAE and MRE of the prediction from the feature selection-stepwise regression fusion result based on the weighted correlation degree are 9.9 mm and 3.46%, respectively; those of the prediction from the BP neural network fusion result are 15.1 mm and 4.33%; and those of the prediction from the GNSS monitoring point HF06 data are 19.9 mm and 4.51%. The feature selection-stepwise regression fusion result based on the weighted correlation degree therefore gives the highest prediction accuracy, which also shows that this feature-level fusion result is more accurate and reliable.

The above discloses only a few specific embodiments of the present invention. Those skilled in the art may make various changes and modifications to the embodiments without departing from the spirit and scope of the invention. The embodiments of the invention are not limited thereto, and any variation conceivable to those skilled in the art shall fall within the scope of protection of the present invention.

Claims

1. A multi-source heterogeneous landslide data monitoring and fusion method, characterized by comprising:

obtaining multi-source heterogeneous monitoring variable data;

dividing the multi-source heterogeneous monitoring variables into a dependent variable and feature variables;

calculating the maximum mutual information coefficient MIC between the dependent variable and the feature variables, and screening out the feature variables that most affect landslide deformation;

taking the single-point displacement sequence that reflects the deformation characteristics of the landslide as the reference sequence, and the data sequences composed of the factors that affect landslide deformation as the comparison sequences;

calculating the gray correlation coefficient and the gray correlation degree between the reference sequence and the comparison sequences;

calculating the weighted correlation degree from the maximum mutual information coefficient MIC and the gray correlation degree;

performing feature selection according to the weighted correlation degree, and screening out the final feature variables;

performing stepwise regression fitting analysis on the selected feature variables;

constructing a feature-level data fusion model of feature selection-stepwise regression based on the weighted correlation degree;

using the feature-level data fusion model of feature selection-stepwise regression based on the weighted correlation degree to fuse the multi-source heterogeneous information, so as to provide effective auxiliary information for landslide prediction and forecasting;

wherein the gray correlation coefficient is calculated by a formula comprising:

where ρ is the resolution coefficient, 0 < ρ < 1; the smaller ρ is, the greater the difference between the correlation coefficients and the stronger the discriminating power; ρ is usually taken as 0.5; |x_0(k) - x_i(k)| is the absolute difference between corresponding elements of the comparison sequence and the reference sequence; min_i min_k |x_0(k) - x_i(k)| and max_i max_k |x_0(k) - x_i(k)| respectively represent the two-level minimum difference and the two-level maximum difference; n is the number of evaluated objects;

calculation of the correlation degree: for each evaluated object, the average of the correlation coefficients between its indices and the corresponding elements of the reference sequence is calculated to reflect the correlation between that object and the reference sequence; this average is called the correlation degree and is recorded as:

The calculation formula of the weighted correlation degree includes:

where n is the total number of feature variables to be selected, and MIC(A, B _i ) represents the maximum mutual information coefficient MIC between feature variable A and feature variable B _i ;

The feature selection steps comprise:

sorting the calculated weighted correlation degrees in descending order;

ranking and screening the feature variables according to the weighted correlation degree;

calculating the weight of each selected feature after sorting;

when the selected-feature weight ω_j is less than the given threshold α, stopping the screening and obtaining the final feature variables;

where J_S is the sum of the weighted correlation degrees of the feature variables, J_j is the weighted correlation degree of the j-th feature variable to be screened, ω_j is the j-th selected-feature weight, and α is a given threshold;

The stepwise regression fitting analysis comprises: introducing the feature factors into the model one by one; performing an F test after each explanatory variable is introduced, and performing t tests one by one on the explanatory variables already selected; when an explanatory variable introduced earlier is no longer significant because of the introduction of later explanatory variables, deleting it, so as to ensure that the regression equation contains only significant variables before each new variable is introduced; and iterating until no significant explanatory variable remains to be added to the regression equation and no insignificant explanatory variable remains to be removed from it.

2. The multi-source heterogeneous landslide data monitoring and fusion method as claimed in claim 1, characterized by further comprising preprocessing the multi-source heterogeneous monitoring variable data, the preprocessing comprising:

outlier removal, missing-value completion, and data smoothing and denoising.

3. The multi-source heterogeneous landslide data monitoring and fusion method as claimed in claim 1, characterized in that the step of calculating the maximum mutual information coefficient MIC between the dependent variable and the feature variables comprises:

given i and j, gridding the scatter plot formed by the two variables into i columns and j rows, and finding the maximum mutual information value;

normalizing the maximum mutual information value;

selecting the maximum of the mutual information values at different scales as the MIC value;

obtaining the feature variable with the highest degree of correlation with the dependent variable.

4. The multi-source heterogeneous landslide data monitoring and fusion method as claimed in claim 1, characterized by further comprising analyzing and comparing the fusion result of feature selection-stepwise regression based on the weighted correlation degree with a BP neural network fusion result, which comprises:

establishing a BP neural network fusion model, with the independent variables as the system input variables and the dependent variable as the system output variable;

establishing a multi-input, single-output BP neural network fusion model with two hidden layers;

using the two indicators of improved tangent angle and deformation rate to evaluate the feature selection-stepwise regression fusion model based on the weighted correlation degree and the BP neural network data fusion model.

5. The multi-source heterogeneous landslide data monitoring and fusion method as claimed in claim 4, characterized by further comprising:

using a long short-term memory (LSTM) neural network to perform prediction and comparative analysis on the fusion data of feature selection-stepwise regression based on the weighted correlation degree and on the BP neural network fusion data, respectively.