CN108038081B - Landslide disaster logistic regression analysis method based on characteristic function spatial filtering value - Google Patents

Publication number
CN108038081B
CN108038081B CN201711425595.5A CN201711425595A CN108038081B CN 108038081 B CN108038081 B CN 108038081B CN 201711425595 A CN201711425595 A CN 201711425595A CN 108038081 B CN108038081 B CN 108038081B
Authority
CN
China
Prior art keywords
landslide
logistic regression
spatial
value
regression model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711425595.5A
Other languages
Chinese (zh)
Other versions
CN108038081A (en
Inventor
陈玉敏
李慧芳
周江
杨家鑫
张静祎
陈娒杰
方涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201711425595.5A
Publication of CN108038081A
Application granted
Publication of CN108038081B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a logistic regression method for landslide disaster analysis based on spatial filter values. For landslide disaster analysis, the idea of the spatial filter value is introduced into the ordinary logistic regression model, and a landslide regression analysis algorithm is designed that comprises selecting non-landslide points, obtaining and grading disaster-causing factor values, constructing an adjacency matrix, calculating eigenvalues and eigenvectors, selecting eigenvectors by stepwise regression, and regression modeling. The method addresses the low model accuracy caused by the influence of spatial autocorrelation among variables on the logistic regression model. The selected eigenvectors are used to construct a filter operator that is added to the logistic regression model, which effectively filters out the autocorrelation of the residuals, improves the goodness of fit and prediction accuracy of the regression model, and enables accurate simulation and prediction of landslide disasters.

Description

Landslide disaster logistic regression analysis method based on characteristic function spatial filtering value
Technical Field
The invention belongs to the field of geostatistics and spatial analysis, and particularly relates to a landslide disaster logistic regression analysis method based on a characteristic function spatial filtering value.
Background
Landslides are among the most common geological disasters. Landslide disaster analysis comprises qualitative and quantitative analysis (Yalcin et al., 2011, see background document 1). In qualitative analysis, researchers in the relevant fields analyze and evaluate landslide disasters on the basis of professional knowledge and a thorough understanding and investigation of the study area; it is mainly applied to small areas or specific events, and its main methods are the expert experience method, the weighted linear sum method and the analytic hierarchy process. Quantitative landslide analysis mostly rests on a well-developed theoretical basis and studies landslide disasters from a macroscopic perspective; its main methods are the fixed-value design method, artificial intelligence methods and multivariate statistical methods.
The logistic regression model is one of the most widely used quantitative methods for landslide disaster analysis. As a generalized linear statistical method, it takes the probability that an event occurs as the dependent variable and the factors influencing the event as independent variables, and it is suitable for both binary and multi-class variables. In landslide disaster research the dependent variable is categorical (a landslide occurred or did not occur), so an ordinary linear regression model cannot analyze it. The advantage of the logistic regression model is that the binary variable can be converted into a logit variable that can be regressed on several independent variables (Bewick et al., 2005, see background document 2), so the model can be applied to landslide disaster analysis. Many scholars (Bai et al., 2010, see background document 3; Das et al., 2010, see background document 4; Mousavi et al., 2011, see background document 5; Budimir et al., 2015, see background document 6) have established GIS-based logistic regression models, studied landslide susceptibility in different regions, evaluated the models, and verified the applicability of logistic regression to landslide analysis. In addition, comparative studies between the logistic regression model and models such as neural networks, frequency ratio, decision trees, weights of evidence and information value (Yesilnacar et al., 2005, see background document 7; Wang et al., 2016, see background document 8; Chen et al., 2016, see background document 9) also show that the logistic regression model performs well in applicability, model accuracy and evaluation effect compared with other methods.
However, the first law of geography states that geographic objects or attributes are related to one another in their spatial distribution, which may be clustered, random or regular, and that the closer two objects are, the stronger their correlation (Miller, 2004, see background document 10). In the traditional logistic regression model, the spatial correlation among variables propagates through the errors and shows up in the residuals of the model; the Moran's I value of the residuals is usually used as the measure (Waldhör, 1996, see background document 11). This often leads to misspecification of the model and reduces its accuracy. To solve this problem, the effect of spatial autocorrelation needs to be eliminated.
The main methods for eliminating the influence of spatial autocorrelation are geographically weighted regression and spatial filtering. Spatial filtering was first proposed by Getis (Getis, 1995, see background document 12) and Griffith (Griffith, 2000, see background document 13). Its core idea is to decompose the variables in the model into a spatial part and a non-spatial part; once the spatial part has been extracted and filtered out, the analysis can be carried out with an ordinary regression method. The spatial filtering method proposed by Getis uses the local Gi statistic to transform the independent variables through a formula, thereby filtering out the spatial part in the residuals.
The characteristic function spatial filtering method proposed by Griffith selects eigenvectors and adds them to the independent variables to construct a filter operator that replaces the autocorrelated part of the model residuals, so that the remaining residuals are affected only by random error and the influence of spatial autocorrelation is eliminated (Getis and Griffith, 2002, see background document 14). Because the filter operator is equivalent to the autocorrelated part of the residuals, it must capture the spatial relationships among the geographic units. The spatial weight matrix expresses the spatial correlation of the geographic units effectively by encoding the binary relations among them, so the filter operator can be constructed from the spatial weight matrix. Adding the filter operator constructed from the selected eigenvectors to a linear regression model effectively reduces the model misspecification caused by spatial autocorrelation of the residuals. Patuelli (Patuelli et al., 2012, see background document 15) used the spatial filtering method to study unemployment in Germany, found that adding the spatial filter improved the accuracy with which the regression model predicted unemployment, and thereby verified the effectiveness of the method empirically.
Murakami and Griffith (Murakami et al., 2015, see background document 16) analyzed the problem that the random-effects characteristic function spatial filtering method cannot handle spatial autocorrelation effectively once spatial confounding is taken into account. The article sets out two main shortcomings of random-effects ESF in detail and proposes adding the maximum likelihood estimation of the residuals to an extended model to solve the problem, further widening the range of application of the spatial filtering method. Chun (Chun et al., 2016, see background document 17) pointed out that although spatial filtering handles spatial autocorrelation effectively and suits many research fields, the algorithm itself is complex and its computational efficiency needs improvement; they therefore proposed a faster and more effective way of generating the eigenvector subset, which greatly improved the efficiency of the method. Spatial filtering is thus becoming increasingly mature.
Background literature:
[1] Yalcin A, Reis S, Aydinoglu A C, et al. A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey[J]. Catena, 2011, 85(3): 274-287.
[2] Bewick V, Cheek L, Ball J. Statistics review 14: Logistic regression[J]. Critical Care, 2005, 9(1): 112.
[3] Bai S B, Jian W, Zhou P G, et al. GIS-based logistic regression for landslide susceptibility mapping of the Zhongxian segment in the Three Gorges area, China[J]. Geomorphology, 2010, 115(1–2): 23-31.
[4] Das I, Sahoo S, Westen C V, et al. Landslide susceptibility assessment using logistic regression and its comparison with a rock mass classification system, along a road section in the northern Himalayas (India)[J]. Geomorphology, 2010, 114(4): 627-637.
[5] Mousavi S Z, Kavian A, Soleimani K, et al. GIS-based spatial prediction of landslide susceptibility using logistic regression model[J]. Geomatics Natural Hazards & Risk, 2011, 2(1): 33-50.
[6] Budimir M E A, Atkinson P M, Lewis H G. A systematic review of landslide probability mapping using logistic regression[J]. Landslides, 2015, 12(3): 419-436.
[7] Yesilnacar E, Topal T. Landslide susceptibility mapping: A comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey)[J]. Engineering Geology, 2005, 79(3–4): 251-266.
[8] Wang L J, Guo M, Sawada K, et al. A comparative study of landslide susceptibility maps using logistic regression, frequency ratio, decision tree, weights of evidence and artificial neural network[J]. Geosciences Journal, 2016, 20(1): 117-136.
[9] Chen T, Niu R, Jia X. A comparison of information value and logistic regression models in landslide susceptibility mapping by using GIS[J]. Environmental Earth Sciences, 2016, 75(10): 1-16.
[10] Miller H J. Tobler's First Law and Spatial Analysis[J]. Annals of the Association of American Geographers, 2004, 94(2): 284–289.
[11] Waldhör T. The spatial autocorrelation coefficient Moran's I under heteroscedasticity[J]. Statistics in Medicine, 1996, 15(7-9): 887.
[12] Getis A. Spatial Filtering in a Regression Framework: Examples Using Data on Urban Crime, Regional Inequality, and Government Expenditures[M]//New Directions in Spatial Econometrics. 1995: 172-185.
[13] Griffith D A. A linear regression solution to the spatial autocorrelation problem[J]. Journal of Geographical Systems, 2000, 2(2): 141-156.
[14] Getis A, Griffith D A. Comparative Spatial Filtering in Regression Analysis[J]. Geographical Analysis, 2002, 34(2): 130–140.
[15] Patuelli R, Griffith D A, Tiefelsdorf M, et al. Spatial Filtering Methods For Tracing Space-Time Developments In An Open Regional System: Experiments with German Unemployment Data[M]//Societies in Motion: Innovation, Migration and Regional Transformation. 2012.
[16] Murakami D, Griffith D A. Random effects specifications in eigenvector spatial filtering: a simulation study[J]. Journal of Geographical Systems, 2015, 17(4): 1-21.
[17] Chun Y, Griffith D A, Lee M, et al. Eigenvector selection with stepwise regression techniques to construct eigenvector spatial filters[J]. Journal of Geographical Systems, 2016, 18(1): 67-85.
Disclosure of Invention
To solve the problem that, when the logistic regression model is applied to landslide disaster analysis, model accuracy is low because of spatial autocorrelation among variables, the invention provides a landslide disaster logistic regression analysis method based on a characteristic function spatial filtering value.
The technical scheme adopted by the invention is a landslide disaster logistic regression analysis method based on a characteristic function spatial filtering value, comprising the following steps:
Step 1, selecting and processing landslide sample data, which comprise landslide point samples and non-landslide point samples: obtaining the landslide point samples together with their spatial position attributes and landslide area attributes, and selecting the same number of non-landslide point samples as landslide point samples;
Step 2, obtaining and grading the disaster-causing factor values corresponding to the landslide sample data obtained in step 1;
Step 3, constructing Thiessen polygons from the landslide sample points obtained in step 1, judging the spatial adjacency relations among the sample points to obtain the corresponding spatial adjacency matrix W, and centering W to obtain a matrix C;
Step 4, calculating the eigenvalues and eigenvectors of the matrix C obtained in step 3;
Step 5, for logistic regression, selecting eigenvectors by stepwise regression from the eigenvectors and eigenvalues obtained in step 4, implemented as follows,
Step 5.1, preliminary screening of the eigenvectors: calculating the Moran's I value of each eigenvector from its corresponding eigenvalue, and selecting the eigenvectors whose Moran's I value exceeds a preset threshold as the candidate eigenvector set E_n for the subsequent eigenvector selection;
Step 5.2, for the logistic regression model, adding each of the n candidate eigenvectors in the candidate set E_n obtained in step 5.1 in turn to the regression model without a filter operator to obtain n new regression models, calculating the likelihood ratio test statistic LRT between each new model and the old model, selecting the eigenvector with the largest LRT statistic, adding it to the regression model, and removing it from E_n;
Step 5.3, testing the significance of the selected eigenvector: if the result is not significant, removing the eigenvector from the model and returning to step 5.2; if the result is significant, executing step 5.4;
Step 5.4, testing the significance of the spatial autocorrelation of the residuals of the new model with the eigenvector added: if the result is significant, returning to steps 5.2 and 5.3; if it is not significant, ending the eigenvector selection;
Step 6, adding the eigenvectors selected in step 5 to the logistic regression model as independent variables to construct the landslide disaster logistic regression model based on the characteristic function spatial filtering value.
In step 1, a buffer zone is created with each landslide disaster point as its center and the landslide influence distance as its radius, giving the landslide influence area; the landslide influence area is subtracted from the whole landslide study area to give the selection area for non-landslide points, and a number of non-landslide point samples substantially equal to the number of landslide point samples is randomly selected within it.
Four indicators, namely the residual Moran's I, Prob > chi2, Pseudo R2 and the AUC value of the ROC curve, are selected as evaluation parameters to evaluate the landslide disaster logistic regression model based on the characteristic function spatial filtering value.
For landslide disaster analysis, the invention introduces the idea of the spatial filter value into the ordinary logistic regression model and designs a logistic regression landslide disaster analysis method based on the characteristic function spatial filtering value. The method addresses the low model accuracy caused by autocorrelation of the residuals of the logistic regression model: it filters out the residual autocorrelation, effectively improves the goodness of fit and prediction accuracy of the regression model, and enables accurate simulation and prediction of landslide disasters.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a sub-flowchart of step 1 according to an embodiment of the present invention.
FIG. 3 is a sub-flowchart of step 2 according to an embodiment of the present invention.
FIG. 4 is a sub-flowchart of step 5 according to an embodiment of the present invention.
Detailed Description
To help those of ordinary skill in the art understand and implement the invention, it is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the embodiments described here serve only to illustrate and explain the invention and do not limit it.
The core problem solved by the invention is as follows: when the logistic regression model is applied to landslide disaster analysis, the characteristic function spatial filtering method is used to eliminate the influence of spatial autocorrelation among variables on model accuracy and goodness of fit.
Referring to fig. 1, the following steps are performed:
Step 1: selecting and processing landslide sample data (the landslide point samples together with the non-landslide point samples), including obtaining the landslide point samples and their spatial position and landslide area attributes according to actual conditions, and selecting the same number of non-landslide point samples as landslide point samples;
Referring to FIG. 2, in a specific implementation the same number of non-landslide point samples as landslide point samples is selected. The selection of non-landslide point samples should follow two principles: first, the non-landslide point samples should lie at a certain distance from areas where landslides have occurred; second, they should be distributed as uniformly as possible, to avoid model errors caused by clustering. The selection proceeds as follows. First, the landslide influence area is calculated from the existing landslide point samples and their landslide areas: a buffer zone is created with each landslide disaster point as its center and the corresponding landslide influence distance as its radius, where the landslide influence distance is calculated from the landslide area by the following formula:
[Formula image BDA0001523916470000061, not reproduced here: R expressed in terms of A and ρ]
where R is the landslide influence distance, A is the landslide area, and ρ is a proportionality constant that can be determined according to the specific situation. The landslide influence area is subtracted from the whole landslide study area to obtain the selection area for non-landslide point samples, and an equal number of non-landslide point samples is randomly generated within the selection area. The non-landslide point samples obtained and the known landslide point samples together form the landslide sample data.
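The buffer-and-sample procedure of step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the formula for R is not reproduced in this text, so `influence_radius` assumes the radius of a circle of area A scaled by ρ; the study area is assumed to be a rectangle; and all function and parameter names are hypothetical.

```python
import math
import random

def influence_radius(area, rho=1.0):
    # ASSUMPTION: R = rho * sqrt(A / pi); the source formula is not available.
    return rho * math.sqrt(area / math.pi)

def sample_non_landslide_points(landslides, bounds, rho=1.0, seed=0,
                                max_tries=100000):
    """landslides: list of (x, y, area); bounds: (xmin, ymin, xmax, ymax).

    Returns as many random points as there are landslides, each lying
    outside every landslide influence buffer (rejection sampling).
    """
    rng = random.Random(seed)
    radii = [influence_radius(a, rho) for _, _, a in landslides]
    xmin, ymin, xmax, ymax = bounds
    samples, tries = [], 0
    while len(samples) < len(landslides):
        tries += 1
        if tries > max_tries:
            raise RuntimeError("selection area too small for the request")
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # Keep the point only if it is outside all buffers.
        if all(math.hypot(x - lx, y - ly) > r
               for (lx, ly, _), r in zip(landslides, radii)):
            samples.append((x, y))
    return samples
```

Uniform random draws do not by themselves guarantee the even spatial distribution asked for above; a stratified or minimum-spacing scheme could be substituted if clustering is a concern.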
Step 2: obtaining and grading the disaster-causing factor values corresponding to the landslide sample data (point data) obtained in step 1;
Referring to FIG. 3, in a specific implementation the factors that strongly induce landslide disasters in the study area are selected as the disaster-causing factors. The data sources for the disaster-causing factor values are mainly raster data and vector data, and the vector data may include line data and polygon data. Obtaining the factor values for the landslide sample data therefore mainly involves extracting raster values at point features, calculating the distance from point features to line features, and determining the positional relation between points and polygons. For example, the elevation value corresponding to the landslide sample data may be obtained from a digital elevation model (DEM).
Raster value extraction at point features computes the row and column coordinates in the disaster-causing factor raster image by inverse calculation from the geographic coordinates of the point; reading the raster value at that position gives the factor value of the landslide sample. The main ways of relating the geographic reference coordinate system to raster positions are ground control points (GCPs, positioning by multiple control points) and affine transformation.
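The affine case of this inverse calculation can be sketched as follows. The sketch assumes a six-coefficient affine transform in the order used by GDAL geotransforms (x = gt[0] + col·gt[1] + row·gt[2], y = gt[3] + col·gt[4] + row·gt[5]) and a raster held as a list of rows; the function names are hypothetical.

```python
import math

def world_to_pixel(gt, x, y):
    """Invert the affine transform to get the (row, col) containing (x, y)."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return math.floor(row), math.floor(col)

def sample_raster(raster, gt, x, y, nodata=None):
    """Read the factor value at a sample point, or nodata if outside."""
    row, col = world_to_pixel(gt, x, y)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return nodata
```

For a north-up raster the rotation terms gt[2] and gt[4] are zero and the inversion reduces to a shift and a division by the pixel size.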
The distance from a point feature to a line feature reduces to distances from the point to line segments: the distance from the point to each segment making up the line feature is calculated in turn, and the minimum is taken as the distance from the point feature to the line feature. There are three specific algorithms for the point-to-segment distance: the classical geometric analysis algorithm, the area algorithm and the vector algorithm.
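As an illustration of the vector algorithm, the sketch below projects the point onto each segment, clamps the projection to the segment, and takes the minimum over all segments of the line. Planar coordinates are assumed and the function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to segment ab using vector projection."""
    px, py = p
    ax, ay = a
    bx, by = b
    abx, aby = bx - ax, by - ay
    len2 = abx * abx + aby * aby
    if len2 == 0.0:                       # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p on the line through a and b,
    # clamped to [0, 1] so the closest point stays on the segment.
    t = ((px - ax) * abx + (py - ay) * aby) / len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))

def point_polyline_distance(p, vertices):
    """Minimum distance from p to the consecutive segments of a polyline."""
    return min(point_segment_distance(p, vertices[i], vertices[i + 1])
               for i in range(len(vertices) - 1))
```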
Determining the relative position of a point and a polygon identifies the polygon that contains the landslide point, whose attribute value is then read to obtain the corresponding disaster-causing factor value. The main algorithms for deciding whether a point lies inside a polygon are the area-sum method, the angle-sum method and the ray casting method.
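A minimal sketch of the ray-based test: a horizontal ray is cast from the point and the number of polygon edges it crosses is counted, an odd count meaning the point is inside. Simple polygons given as vertex lists are assumed, points exactly on an edge are not treated specially, and the names `point_in_polygon` and `factor_value_at` are hypothetical.

```python
def point_in_polygon(p, polygon):
    """Even-odd ray casting test for a simple polygon [(x, y), ...]."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edge straddles the horizontal line through p?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:               # crossing lies to the right of p
                inside = not inside
    return inside

def factor_value_at(p, zones, default=None):
    """zones: list of (polygon, attribute value); returns the value of the
    first polygon containing p."""
    for polygon, value in zones:
        if point_in_polygon(p, polygon):
            return value
    return default
```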
The disaster-causing factor values of the landslide sample data are graded according to the type of data. Quantitative data are graded mainly by the natural breaks method, which maximizes the differences between classes; non-quantitative data are graded mainly according to their influence on landslide disasters, with reference to existing qualitative research.
Step 3: constructing Thiessen polygons to judge the spatial adjacency relations among the sample points of the landslide sample data obtained in step 1, obtaining the corresponding spatial adjacency matrix W, and centering W to obtain a matrix C;
In a specific implementation, Thiessen polygons are constructed from the sample points, each polygon corresponding to one discrete landslide sample point, so that the adjacency judgment for points becomes an adjacency judgment for polygons. If two spatial units are adjacent, the weight between them is 1; otherwise it is 0. This finally yields an n × n matrix, the spatial adjacency matrix W. The matrix W is symmetric about its diagonal. Eigenvectors computed directly from W are mutually orthogonal but are not guaranteed to be orthogonal to the constant term, so they can be correlated with one another, which may cause multicollinearity and misspecify the model. The matrix W therefore needs to be centered, using the following formula:
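$$C = \left(I - \frac{\mathbf{1}\mathbf{1}^{T}}{n}\right) W \left(I - \frac{\mathbf{1}\mathbf{1}^{T}}{n}\right)$$
where C is the centered matrix, I is the n × n identity matrix, $\mathbf{1}\mathbf{1}^{T}$ is the n × n matrix whose elements are all 1, and n is the number of rows (equal to the number of columns) of the adjacency matrix.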
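The centering step can be sketched with NumPy as below. The sketch assumes the standard double-centering form C = (I − 11ᵀ/n) W (I − 11ᵀ/n) used in eigenvector spatial filtering; the function name and the small chain-shaped adjacency matrix are hypothetical examples, not data from the embodiment.

```python
import numpy as np

def center_adjacency(W):
    """Return C = M W M with M = I - 11^T / n (double centering)."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n   # projection that removes the mean
    return M @ W @ M

# Hypothetical example: four sample points adjacent along a chain 0-1-2-3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
C = center_adjacency(W)
```

After centering, every row and column of C sums to zero, so eigenvectors of C with non-zero eigenvalues have zero mean.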
Step 4: calculating the eigenvalues and eigenvectors of the matrix C obtained in step 3;
In a specific implementation, the eigenvalues and eigenvectors of the centered matrix C are calculated with numerical analysis methods implemented as computer programs. Existing techniques can be used: the algorithms commonly used at present for eigenvalues and eigenvectors are the power method, the inverse power method, the Jacobi iteration method and the QR algorithm. Many software packages and open-source libraries can calculate eigenvalues and eigenvectors, the more common being MATLAB, the Eigen library and the math library that ships with C#.
Step 5: for logistic regression, selecting eigenvectors by stepwise regression from the eigenvectors and eigenvalues obtained in step 4, with the following specific selection steps:
Referring to FIG. 4, the implementation is as follows:
Step 5.1: preliminary screening of the eigenvectors, comprising calculating the Moran's I value of each eigenvector from its corresponding eigenvalue and selecting the eigenvectors whose Moran's I value exceeds a preset threshold (preferably 0.25) as the candidate eigenvector set E_n for the subsequent eigenvector selection; the Moran's I value denotes the Moran index;
In a specific implementation, the eigenvectors should have autocorrelation consistent with that of the residuals. The autocorrelation of an eigenvector is measured by its Moran's I value: the larger the value, the better the eigenvector represents the autocorrelation of the residuals. Eigenvectors with Moran's I > 0.25 can be selected as the candidate subset to improve the efficiency of the subsequent selection. The Moran's I value of an eigenvector can be calculated from its corresponding eigenvalue by the following formula:
$$I_i = \frac{n}{\mathbf{1}^{T} W \mathbf{1}} \lambda_i$$
where $I_i$ is the Moran's I value of the i-th eigenvector, $\lambda_i$ is the corresponding eigenvalue, n is the number of rows (and columns) of the matrix, W is the original adjacency matrix, and $\mathbf{1}$ is the n × 1 vector whose elements are all 1.
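Steps 4 and 5.1 can be sketched together with NumPy. The sketch assumes the relation I_i = n / (1ᵀW1) · λ_i between each eigenvalue of the centered matrix and the Moran's I of its eigenvector, uses `numpy.linalg.eigh` because the centered matrix is symmetric, and applies the 0.25 threshold from the text; the function name is hypothetical.

```python
import numpy as np

def candidate_eigenvectors(W, threshold=0.25):
    """Return (Moran's I values, eigenvectors) of the centered adjacency
    matrix for eigenvectors whose Moran's I exceeds the threshold."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    C = M @ W @ M
    eigvals, eigvecs = np.linalg.eigh(C)   # symmetric matrix: real eigenpairs
    moran = n / W.sum() * eigvals          # I_i = n / (1^T W 1) * lambda_i
    keep = np.where(moran > threshold)[0]
    keep = keep[np.argsort(-moran[keep])]  # strongest autocorrelation first
    return moran[keep], eigvecs[:, keep]
```

Each returned column can then be offered to the stepwise selection as one candidate E_i.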
Step 5.2: for the logistic regression model, let the candidate eigenvector set E_n obtained in step 5.1 contain n candidate eigenvectors. Each candidate eigenvector E_i (i ∈ {1, 2, …, n}) in E_n is added in turn to the regression model Y = aX + b, replacing the independent variables X with X = (X + E_i), where the original independent variables X are the graded disaster-causing factor values of the landslide sample data obtained in step 2. This gives n new regression models. The likelihood ratio test statistic LRT between each new model and the old model is calculated, the eigenvector with the largest LRT statistic is selected and added to the regression model, and the selected eigenvector is removed from E_n;
In a specific implementation, the null hypothesis is that the added eigenvector is not significant for the regression model. The hypothesis is tested using the likelihood function as the criterion, and the likelihood ratio test statistic LRT is constructed:
$$LRT = -2\ln\frac{L_0}{L_1} = 2\left(\ln L_1 - \ln L_0\right)$$
where $L_0$ is the maximum value of the likelihood function of the initial regression model and $L_1$ is the maximum value of the likelihood function of the new regression model after the eigenvector is added. The larger the LRT value, the more the added eigenvector changes the maximum of the likelihood function and the stronger the grounds for rejecting the null hypothesis, so the eigenvector with the largest LRT statistic is selected.
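The selection rule of step 5.2 can be sketched as follows, assuming the usual form LRT = 2(ln L1 − ln L0) and assuming that the maximized log-likelihood of each candidate model has already been obtained from some logistic regression fit. The function names and the numbers in the usage note are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given fitted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def likelihood_ratio_statistic(loglik_old, loglik_new):
    """LRT = -2 ln(L0 / L1) = 2 (ln L1 - ln L0)."""
    return 2.0 * (loglik_new - loglik_old)

def best_candidate(loglik_old, candidate_logliks):
    """candidate_logliks maps a candidate eigenvector label to the maximized
    log-likelihood of the model with that eigenvector added."""
    scores = {name: likelihood_ratio_statistic(loglik_old, ll)
              for name, ll in candidate_logliks.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For instance, with an old log-likelihood of −120.0 and candidates at −118.5, −112.0 and −119.9, the second candidate is chosen with LRT = 16.0.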
Step 5.3: testing the significance of the selected eigenvector. If the result is not significant, the eigenvector is removed, the process returns to step 5.2, and an eigenvector is selected again from the remaining candidate set; if the result is significant, step 5.4 below is executed;
In a specific implementation, the significance test checks whether the regression parameter of the selected eigenvector is significant. Because the logistic regression model is a nonlinear regression model, this embodiment uses the Wald chi-square test to judge the significance of the regression parameter of the selected eigenvector and hence whether the eigenvector needs to be removed. The null hypothesis of the test is that the parameter of the selected eigenvector in the regression model is 0. The p value of the chi-square statistic is calculated from its probability distribution under the null hypothesis; if the p value is below the significance level, generally 0.05, the null hypothesis is rejected and the eigenvector does not need to be removed. Otherwise the eigenvector is removed and another is selected.
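A minimal sketch of the Wald chi-square decision for a single coefficient, assuming the estimate and its standard error come from a fitted logistic regression. With one degree of freedom the chi-square tail probability equals erfc(√(W/2)), which keeps the sketch within the standard library; the function names are hypothetical.

```python
import math

def wald_chi_square(beta, std_err):
    """Wald statistic W = (beta / se)^2 and its p value (chi-square, 1 df)."""
    w = (beta / std_err) ** 2
    p = math.erfc(math.sqrt(w / 2.0))     # P(chi2_1 > w)
    return w, p

def keep_eigenvector(beta, std_err, alpha=0.05):
    """True if the null hypothesis beta = 0 is rejected at level alpha,
    i.e. the eigenvector stays in the model."""
    _, p = wald_chi_square(beta, std_err)
    return p < alpha
```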
Step 5.4: A significance test is performed on the spatial autocorrelation of the residuals of the new model to which the feature vector has been added. If the result is significant, the residuals still have significant spatial autocorrelation after the selected feature vector is added to the regression model, so further suitable feature vectors must be selected and added, and the procedure returns to step 5.2 and step 5.3; if the result is not significant, the selection of feature vectors ends;
In general, the goal is reached after several feature vectors have been added to the model. If the residual spatial autocorrelation still cannot be eliminated after all feature vectors have been traversed, the method is not suitable for the case.
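The control flow of steps 5.2 to 5.4 can be sketched in Python as a loop over the candidate set. This is only an outline under stated assumptions: the three callables are hypothetical placeholders for the model refits and tests, and the keep-or-discard rule follows the Wald test description (a feature vector is kept when its p-value is below the significance level).

```python
from typing import Callable, Hashable, Iterable, List


def select_feature_vectors(
    candidates: Iterable[Hashable],
    lrt_of: Callable[[List[Hashable], Hashable], float],
    wald_p_of: Callable[[List[Hashable], Hashable], float],
    residual_p_of: Callable[[List[Hashable]], float],
    alpha: float = 0.05,
) -> List[Hashable]:
    """Stepwise selection; the callables wrap model fitting and are assumptions."""
    remaining = list(candidates)
    selected: List[Hashable] = []
    while remaining:
        # Step 5.2: take the candidate with the largest LRT and drop it from the set.
        best = max(remaining, key=lambda ev: lrt_of(selected, ev))
        remaining.remove(best)
        # Step 5.3: discard the candidate if its coefficient is not significant.
        if wald_p_of(selected, best) >= alpha:
            continue
        selected.append(best)
        # Step 5.4: stop once residual spatial autocorrelation is no longer significant.
        if residual_p_of(selected) > alpha:
            break
    # If the loop exhausts all candidates, autocorrelation may remain in the residuals.
    return selected


# Stub statistics standing in for real model fits.
lrt_table = {"e1": 5.0, "e2": 9.0, "e3": 1.0}
wald_table = {"e1": 0.01, "e2": 0.20, "e3": 0.01}
result = select_feature_vectors(
    ["e1", "e2", "e3"],
    lrt_of=lambda sel, ev: lrt_table[ev],
    wald_p_of=lambda sel, ev: wald_table[ev],
    residual_p_of=lambda sel: 0.01 if len(sel) < 2 else 0.30,
)
print(result)
```

Passing the tests in as callables keeps the loop independent of any particular regression library.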
In a specific implementation, the Logistic regression residual is calculated as the Pearson residual, which is the most widely used. In this embodiment, whether to stop or to continue selecting feature vectors is preferably decided by the significance of the spatial autocorrelation of the Pearson residual of the new model obtained in the preceding steps. First, the Pearson residual e and the Moran's I value of the residual are calculated and denoted P_real and I_real respectively. Since there is only one Pearson residual and one Moran's I value for it, the values in the Pearson residual vector are randomly permuted N times (the default is 999; randomly permuting the rows of the single e vector yields 999 one-dimensional e vectors), and a new Pearson residual P_rnd is calculated with each randomly permuted residual vector as the dependent variable. The Moran's I values I_rnd of the N new Pearson residuals are calculated, and the proportion of them that are greater than I_real is the p-value:
p = (number of I_rnd values greater than I_real) / N
If p is greater than 0.05, the spatial filtering algorithm ends.
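The permutation test can be sketched in plain Python as follows. This is a simplified illustration, not the patented procedure: it shuffles the residual vector directly and recomputes Moran's I, without refitting a model on each permuted vector. The chain-shaped adjacency matrix, the function names, and the sample values are assumptions made for the example.

```python
import math
import random


def pearson_residuals(y, p):
    """Pearson residuals (y - p) / sqrt(p (1 - p)) for a binary response."""
    return [(yi - pi) / math.sqrt(pi * (1.0 - pi)) for yi, pi in zip(y, p)]


def morans_i(values, weights):
    """Moran's I for a value vector and a spatial weight matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def permutation_p_value(residuals, weights, n_perm=999, seed=0):
    """Share of permutations whose Moran's I exceeds the observed value."""
    rng = random.Random(seed)
    i_real = morans_i(residuals, weights)
    shuffled = list(residuals)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) > i_real:
            count += 1
    return i_real, count / n_perm


def chain_weights(n):
    """Toy adjacency matrix: each point neighbours the previous and next point."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
i_real, p_value = permutation_p_value([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], w)
print(i_real, p_value)
```

With the stopping rule stated above, selection would end once the returned p-value exceeds 0.05.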
Step 6: The feature vectors selected in step 5 are added as independent variables to the Logistic regression model to construct the landslide disaster Logistic regression model based on the characteristic function spatial filtering value. That is, a regression model is constructed from the feature vectors selected in step 5 and the original independent variables.
In a specific implementation, the logistic regression model after all selected feature vectors have been added can be expressed as:
logit(x) = w_0 + w_1 x_1 + … + w_n x_n + Eα
where x_1, …, x_n are the graded disaster-causing factor values corresponding to the landslide sample data, and E is the set of selected feature vectors. The parameters of the Logistic regression model can be solved by maximum likelihood estimation, yielding the Logistic regression model based on the characteristic function spatial filtering value, which is used for the research and analysis of landslide disasters.
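As an illustration of the model form, the following Python sketch fits logit(x) = w_0 + w_1 x_1 + a e_1 to a toy data set by maximising the log-likelihood with plain gradient ascent. The data, the single factor x_1, the single feature vector component e_1, the learning rate, and the iteration count are all assumptions chosen for the example; in practice a statistics package with a Newton-type solver would normally be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, rows, y):
    """Bernoulli log-likelihood of weights w for design rows and 0/1 labels."""
    ll = 0.0
    for row, yi in zip(rows, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, row)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


def fit_logistic(rows, y, lr=0.1, n_iter=5000):
    """Maximum likelihood estimate by gradient ascent on the mean log-likelihood."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for row, yi in zip(rows, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, row)))
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Toy data: graded factor x_1, one selected feature vector component e_1, label y.
x1 = [1, 1, 2, 2, 3, 3, 1, 3]
e1 = [-0.5, 0.5, -0.5, 0.5, -0.5, 0.5, 0.3, -0.3]
y = [0, 0, 0, 1, 1, 1, 1, 0]
rows = [[1.0, float(a), b] for a, b in zip(x1, e1)]  # intercept, x_1, e_1

w = fit_logistic(rows, y)
probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) for row in rows]
print(w, log_likelihood(w, rows, y))
```

The fitted probabilities in `probs` play the role of the landslide susceptibility predicted for each sample point.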
Step 7: Model evaluation. Four indexes, namely the residual Moran's I, Prob > chi2, Pseudo R2 and the AUC value of the ROC curve, are selected as evaluation parameters to evaluate the Logistic regression model based on the characteristic function spatial filtering value constructed in step 6.
In a specific implementation, the residual Moran's I, Prob > chi2, Pseudo R2 and AUC values are calculated for both the ordinary Logistic model and the Logistic regression model based on the characteristic function spatial filtering value. The residual Moran's I evaluates how well the spatial filtering handles the autocorrelation of the residuals; when the residual Moran's I is below the threshold of 0.05, the residuals can be judged to have no spatial autocorrelation. Prob > chi2 is the probability that the null hypothesis of the chi-square statistic holds, that is, the probability that the independent variables have no effect on the dependent variable; it is used to evaluate whether the regression model parameters are meaningful. Typically 0.05 or 0.01 is taken as the significance level; when Prob > chi2 is less than 0.05 or 0.01, the independent variables have a significant effect on the dependent variable and the parameters are considered meaningful. Pseudo R2, also known as the pseudo R-squared, is a test quantity proposed for the logistic regression model by analogy with the R-squared of the linear regression model; it evaluates the goodness of fit of the model, and the larger its value, the better the regression model fits. The ROC curve, also called the receiver operating characteristic curve, is a comprehensive index combining specificity, sensitivity and misjudgment rate; the area under the curve is the AUC value, which evaluates the prediction accuracy of the regression model, and the larger the AUC value, the higher the prediction accuracy of the model.
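Two of these indexes can be computed with a few lines of Python. The sketch below assumes that Pseudo R2 refers to McFadden's definition, 1 - ln L_model / ln L_null, which the text does not state explicitly, and it computes the AUC as the share of landslide and non-landslide pairs that the model ranks correctly. The function names and input values are hypothetical.

```python
def mcfadden_pseudo_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden pseudo R-squared from the fitted and intercept-only log-likelihoods."""
    return 1.0 - loglik_model / loglik_null


def auc(labels, scores) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical log-likelihoods and predicted probabilities.
print(mcfadden_pseudo_r2(-40.0, -50.0))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Computing both values for the ordinary model and for the spatially filtered model allows the two to be compared on the same samples.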
A model that passes the evaluation can be applied to the whole study area, and the landslide susceptibility of the whole area can be predicted by regression.
It should be understood that parts of the specification not described in detail belong to the prior art.
The above description is only one embodiment of the present invention, and is not intended to limit the present invention. Any modification, improvement or the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. A landslide disaster logistic regression analysis method based on a characteristic function spatial filtering value, characterized by comprising the following steps:
step 1, selecting and processing landslide sample data, wherein the landslide sample data comprises landslide point samples and non-landslide point samples; the landslide point samples and their corresponding spatial position attributes and landslide area attributes are obtained, and non-landslide point samples equal in number to the landslide point samples are selected;
step 2, obtaining the disaster-causing factor values corresponding to the landslide sample data obtained in step 1 and grading them;
step 3, constructing Thiessen polygons on the landslide sample points obtained in step 1, judging the spatial adjacency relations among the sample points to obtain the corresponding spatial adjacency matrix W, and centering the spatial adjacency matrix W to obtain a matrix C;
step 4, calculating the eigenvalues and eigenvectors of the matrix C obtained in step 3;
step 5, for Logistic regression, selecting feature vectors by stepwise regression according to the eigenvectors and eigenvalues obtained in step 4, implemented as follows,
step 5.1, preliminary screening of the feature vectors, comprising calculating the Moran's I value of each feature vector from its corresponding eigenvalue, and selecting the feature vectors whose Moran's I value is greater than the corresponding preset threshold as the candidate feature vector set E_n for subsequent feature vector selection;
step 5.2, for the Logistic regression model, adding each of the n candidate feature vectors in the candidate feature vector set E_n obtained in step 5.1 to the regression model without a filtering operator to obtain n new regression models, calculating the likelihood ratio test statistic LRT between the new and old models, selecting the feature vector with the largest LRT statistic, adding it to the regression model, and removing the selected feature vector from E_n;
step 5.3, performing a significance test on the selected feature vector; if the result is significant, removing the feature vector and returning to step 5.2; if the result is not significant, executing step 5.4;
step 5.4, performing a significance test on the residual spatial autocorrelation of the new model to which the feature vector has been added; if the result is significant, returning to step 5.2 and step 5.3; if the result is not significant, ending the selection of feature vectors;
step 6, adding the feature vectors selected in step 5 as independent variables to the Logistic regression model to construct the landslide disaster Logistic regression model based on the characteristic function spatial filtering value.
2. The landslide disaster logistic regression analysis method based on a characteristic function spatial filtering value according to claim 1, characterized in that: in step 1, a buffer zone is created with each landslide disaster point as the center and the landslide influence range as the radius to obtain the landslide influence area; the landslide influence area is subtracted from the whole landslide study area to obtain the selection area for non-landslide points, and non-landslide point samples equal in number to the landslide point samples are randomly selected in the selection area.
3. The landslide disaster logistic regression analysis method based on a characteristic function spatial filtering value according to claim 1 or 2, characterized in that: four indexes, namely the residual Moran's I, Prob > chi2, Pseudo R2 and the AUC value of the ROC curve, are selected as evaluation parameters to evaluate the landslide disaster Logistic regression model based on the characteristic function spatial filtering value.
CN201711425595.5A 2017-12-25 2017-12-25 Landslide disaster logistic regression analysis method based on characteristic function spatial filtering value Active CN108038081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711425595.5A CN108038081B (en) 2017-12-25 2017-12-25 Landslide disaster logistic regression analysis method based on characteristic function spatial filtering value

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711425595.5A CN108038081B (en) 2017-12-25 2017-12-25 Landslide disaster logistic regression analysis method based on characteristic function spatial filtering value

Publications (2)

Publication Number Publication Date
CN108038081A CN108038081A (en) 2018-05-15
CN108038081B true CN108038081B (en) 2020-02-11

Family

ID=62101191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711425595.5A Active CN108038081B (en) 2017-12-25 2017-12-25 Landslide disaster logistic regression analysis method based on characteristic function spatial filtering value

Country Status (1)

Country Link
CN (1) CN108038081B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784552B (en) * 2018-12-29 2022-12-13 武汉大学 Re-ESF algorithm-based construction method of space variable coefficient PM2.5 concentration estimation model
CN110188324B (en) * 2019-05-17 2023-05-30 武汉大学 Traffic accident poisson regression analysis method based on feature vector space filtering value
CN110569554B (en) * 2019-08-13 2020-11-10 成都垣景科技有限公司 Landslide susceptibility evaluation method based on spatial logistic regression and geographic detector
CN112070366B (en) * 2020-08-19 2022-03-29 核工业湖州勘测规划设计研究院股份有限公司 Regional landslide risk quantitative measuring and calculating method based on multi-source monitoring data correlation analysis
CN112181642B (en) * 2020-09-16 2024-02-02 武汉大学 Artificial intelligence optimization method for space calculation operation
CN113468477B (en) * 2020-12-23 2023-11-24 南方科技大学 Sensitive data investigation analysis method, storage medium and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103364830A (en) * 2013-07-24 2013-10-23 北京师范大学 Predication method of happening position of slump disaster after earthquake based on multiple factors
CN106600578A (en) * 2016-11-22 2017-04-26 武汉大学 Remote-sensing-image-based parallelization method of regression model of characteristic function space filter value

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Accounting for Spatial Autocorrelation in Linear Regression Models Using Spatial Filtering with Eigenvectors;Jonathan B.Thayn 等;《Annals of the Association of American Geographers》;20120620;第103卷(第1期);47-66 *

Also Published As

Publication number Publication date
CN108038081A (en) 2018-05-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant