CN115510945A - Geological disaster probability forecasting method based on principal component and Logistic analysis - Google Patents


Info

Publication number
CN115510945A
Authority
CN
China
Prior art keywords
rainfall
disaster
factors
probability
geological disaster
Legal status
Granted
Application number
CN202210928334.XA
Other languages
Chinese (zh)
Other versions
CN115510945B (en)
Inventor
许凤雯
李宇梅
宋巧云
包红军
狄靖月
刘海知
Current Assignee
National Meteorological Center Central Meteorological Station
Original Assignee
National Meteorological Center Central Meteorological Station
Application filed by National Meteorological Center Central Meteorological Station
Priority to CN202210928334.XA
Publication of CN115510945A
Application granted
Publication of CN115510945B
Status: Active


Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a geological disaster probability forecasting method based on principal components and Logistic analysis. The method comprises: extracting disaster-point rainfall factors from a rainfall and information-value database, where the selected factors are the geological disaster susceptibility and 6 factors characterizing rainfall, and the rainfall factors, which represent rainfall amount and rainfall persistence, include the current-day rainfall, early-stage effective rainfall, number of rainfall days, longest continuous rainfall and maximum rainfall in the last 3 days; establishing a binary data series from the disaster-point rainfall factors and the susceptibility information value; obtaining principal components by principal component analysis and determining the main influencing factors; performing Logistic regression on different principal components and on the main influencing factors; comparing the verification parameters; and obtaining the optimal Logistic probability model. Because regression is carried out after the significance of the factors has been studied, the method is more scientific, and in model verification it improves the hit rate by about 30 percentage points compared with the original model.

Description

Geological disaster probability forecasting method based on principal component and Logistic analysis
Technical Field
The invention relates to a geological disaster probability forecasting method based on principal components and Logistic analysis, and belongs to the technical field of meteorological research.
Background
Rainfall-induced geological disaster forecasting techniques fall into two main categories. The first is statistical: the relationship between rainfall and the probability of geological disaster occurrence, or monitored displacement, is studied using a large number of geological disaster samples. The second is the dynamic-mechanism approach, in which a dynamic forecast based on mathematical-physical criterion equations is established by considering how the geological body changes during rainfall. Of the two, the statistical approach is more widely used in operational services because it is more operable and practical. Common statistical methods include multiple regression, neural network models, Bayesian methods and frequency methods. These methods compile rainfall statistics before disaster occurrence, such as accumulated rainfall, rainfall duration and rainfall intensity, and some also consider geological disaster susceptibility; these factors are used as independent variables to establish their relationship with the timing or scale of landslides. Among statistical methods, Logistic regression places no special requirements on the factors and gives good early-warning performance, so it is widely applied in regional and local geological disaster early warning.
However, factor selection in geological disaster probability models built with Logistic regression is subjective. The selected factors cover precipitation and geological and geomorphological characteristics and number from two to more than twenty, yet most studies provide no method for selecting them; as a result, the models run slowly and the reference value of their results is reduced.
Disclosure of Invention
To solve these technical problems, the invention provides a geological disaster probability forecasting method based on principal components and Logistic analysis, in which factors are selected on a scientific basis and the model is chosen according to its verification performance. In addition to in-sample verification, a long-time-series verification is performed on out-of-sample data, and the model is compared with a rainfall statistical model and a regional Logistic model, which makes the verification more representative. The specific technical scheme is as follows:
A geological disaster probability forecasting method based on principal components and Logistic analysis comprises the following steps:
Step 1: extract the disaster-point rainfall factors from a rainfall and information-value database;
Step 2: extract the geological disaster probability;
Step 3: establish a binary data series from the disaster-point rainfall factors and the susceptibility information value. Non-occurrence samples are obtained by shifting the occurrence dates of part of the disaster records earlier to dates on which no disaster occurred, while rejecting shifted samples that fall on disaster-occurrence dates;
Step 4: obtain principal components by principal component analysis and determine the main influencing factors;
Step 5: perform Logistic regression on different principal components;
Step 6: perform Logistic regression on the main influencing factors;
Step 7: divide the sample data into training data and test data, build models from the training data through steps 5 and 6, apply the models to forecast the test data, and check their accuracy against the observed situation; the verification parameters comprise accuracy and the ROC curve;
Step 8: compare the accuracy and ROC curves of the established models on the test data to obtain the optimal Logistic probability model.
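Step 3's construction of non-occurrence samples can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes disaster records are (site, date) pairs, and the helper name `build_binary_samples` and the shift offset `shift_days` are hypothetical, since the text does not say how far occurrence dates are moved.

```python
from datetime import date, timedelta


def build_binary_samples(disaster_events, shift_days=10):
    """Return a list of (site, date, label) tuples.

    disaster_events: iterable of (site, date) pairs on which a disaster occurred.
    shift_days: hypothetical offset used to move each disaster date earlier.
    """
    events = set(disaster_events)
    # Every recorded disaster is an occurrence sample (label 1).
    samples = [(site, day, 1) for site, day in sorted(events)]
    negatives = set()
    for site, day in sorted(events):
        candidate = (site, day - timedelta(days=shift_days))
        # Reject shifted dates on which a disaster actually occurred at that site.
        if candidate not in events:
            negatives.add(candidate)
    # The remaining shifted dates become non-occurrence samples (label 0).
    samples.extend((site, day, 0) for site, day in sorted(negatives))
    return samples
```

The rainfall factors for each sample would then be extracted for the listed site and date before the regression steps.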
Furthermore, in step 1, the disaster-point rainfall factors comprise the geological disaster susceptibility and 6 factors characterizing rainfall; the rainfall factors, which characterize rainfall amount and rainfall persistence, include the current-day rainfall, early-stage effective rainfall, number of rainfall days, longest continuous rainfall and maximum rainfall in the previous 3 days.
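The text names the rainfall factors but not how each is computed. The sketch below is one plausible reading under stated assumptions, not the patent's definition: early-stage effective rainfall uses the common decay form $R_e = \sum_i k^i R_i$ with a hypothetical coefficient $k = 0.8$; a day counts as wet at 0.1 mm or more (also hypothetical); "longest continuous rainfall" is taken as the total of the longest run of wet days (its length is returned as well); and "maximum rainfall in the previous 3 days" is read as the largest daily value over the 3 days before the current day.

```python
def rainfall_factors(daily_mm, decay=0.8, wet_threshold=0.1):
    """Compute illustrative rainfall factors from a daily series.

    daily_mm: daily rainfall (mm), oldest first; the last value is the current day.
    decay: hypothetical decay coefficient for early-stage effective rainfall.
    wet_threshold: hypothetical minimum rainfall (mm) for a day to count as wet.
    """
    if not daily_mm:
        raise ValueError("daily_mm must not be empty")
    current = daily_mm[-1]
    antecedent = daily_mm[:-1]
    # Effective antecedent rainfall: sum of decay**i * R_i, i = days before today.
    effective = sum(decay ** i * r for i, r in enumerate(reversed(antecedent), start=1))
    wet = [r >= wet_threshold for r in daily_mm]
    # Longest run of consecutive wet days and the rainfall total of that run.
    best_len, best_sum, run_len, run_sum = 0, 0.0, 0, 0.0
    for r, is_wet in zip(daily_mm, wet):
        if is_wet:
            run_len += 1
            run_sum += r
            if run_len > best_len or (run_len == best_len and run_sum > best_sum):
                best_len, best_sum = run_len, run_sum
        else:
            run_len, run_sum = 0, 0.0
    return {
        "current_day": current,
        "effective_antecedent": effective,
        "rain_days": sum(wet),
        "longest_run_days": best_len,
        "longest_run_total": best_sum,
        "max_previous_3_days": max(antecedent[-3:], default=0.0),
    }
```

A 15-day window, as used operationally later in the text, would simply be the last 15 values of the series.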
Further, before step 3, the 6 factors characterizing precipitation are first normalized to ensure the continuity of the factors; second, because these 6 factors and the geological disaster susceptibility differ in attributes and order of magnitude, the data must be standardized.
Further, the normalization consists of taking the cube root of each of the 6 factors characterizing precipitation;
since different variables often have different units and different degrees of variation, and differing units make the coefficients hard to interpret in practice, the data must be standardized to remove the influence of dimensions and of the variables' own variability and magnitude. The standardization is performed as follows: the mean is subtracted from the raw data and the result is divided by the standard deviation, so that the processed data have a mean of 0 and a standard deviation of 1. The conversion function is $y' = (y - \mu)/\sigma$, where $y'$ is the standardized value, $y$ is the raw value, and $\mu$ and $\sigma$ are the mean and standard deviation of the raw data of the geological disaster susceptibility and the 6 factors characterizing precipitation.
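The two preprocessing steps above (cube-root normalization of the rainfall factors, then z-score standardization of all factors) can be sketched with NumPy. The function name `preprocess` and the column layout (susceptibility first) are assumptions for illustration; the sketch uses the population standard deviation.

```python
import numpy as np


def preprocess(susceptibility, rain_factors):
    """Cube-root the rainfall factors, then z-score every column.

    susceptibility: shape (n,) susceptibility information values.
    rain_factors: shape (n, m) raw rainfall factors (non-negative).
    Returns an (n, m + 1) array with susceptibility in column 0.
    """
    rain = np.cbrt(np.asarray(rain_factors, dtype=float))  # normalization step
    x = np.column_stack([np.asarray(susceptibility, dtype=float), rain])
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)  # population standard deviation (ddof=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    return (x - mu) / sigma  # y' = (y - mu) / sigma, column by column
```

Dividing by the standard deviation, not the variance, is what gives each column unit spread.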
Furthermore, principal component analysis is a statistical dimension-reduction method. Through an orthogonal transformation, it recombines a set of possibly correlated original variables into a new set of linearly uncorrelated composite variables, called principal components, and a smaller number of these is retained as needed so as to reflect as much of the information in the original variables as possible. The specific process of step 4 is as follows:
to study the relationship between geological disaster occurrence and precipitation and geological disaster susceptibility, principal components are retained according to the cumulative contribution rate, keeping those with eigenvalues greater than 1. With $k$ principal components selected, the principal component $F_k$ is expressed as:
$$F_k = a_{1k}Z_1 + a_{2k}Z_2 + \cdots + a_{ik}Z_i + \cdots + a_{pk}Z_p \qquad (1)$$
where $Z_i$ is the standardized value of the $i$-th original variable, $p$ is the number of original variables, and $a_{ij}$ is the factor score coefficient of factor $i$ in the $j$-th principal component, i.e. the contribution of the $i$-th factor to the $j$-th principal component. Combining $a_{ij}$ with the variance contribution rate $E_j$ of the $j$-th principal component gives the weight $W_i$ of the $i$-th environmental factor, expressed by the following formula:
[Formula image not reproduced: weight $W_i$ of the $i$-th factor in terms of $a_{ij}$ and $E_j$]
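A minimal NumPy sketch of step 4, assuming the input is already standardized. It computes the correlation matrix, orders its eigenvalues, and keeps either the components with eigenvalue greater than 1 or, if a cumulative threshold is given, the smallest number of components reaching it. Using eigenvector entries as the coefficients $a_{ij}$ is one common convention and an assumption here; the patent's factor score coefficients may be scaled differently, and the weight formula for $W_i$ is not reproduced.

```python
import numpy as np


def principal_components(z, min_eigenvalue=1.0, cum_threshold=None):
    """PCA on standardized data z of shape (n, p).

    By default keeps components with eigenvalue > min_eigenvalue; if
    cum_threshold is given (e.g. 0.80), keeps instead the smallest number of
    components whose cumulative contribution reaches it.
    Returns (scores, coefficients, contribution_rates).
    """
    z = np.asarray(z, dtype=float)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # ascending for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()  # E_j: variance contribution rate
    if cum_threshold is None:
        k = max(1, int(np.sum(eigvals > min_eigenvalue)))
    else:
        k = int(np.searchsorted(np.cumsum(contribution), cum_threshold)) + 1
    k = min(k, len(eigvals))
    coeffs = eigvecs[:, :k]  # column j holds the coefficients a_ij of Eq. (1)
    scores = z @ coeffs  # F_j = sum_i a_ij * Z_i
    return scores, coeffs, contribution[:k]
```

The signs of the eigenvectors are arbitrary, so the sign of each component score may differ between implementations without changing the fit.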
furthermore, Logistic regression analysis is a generalized linear regression model used to study binary (two-class) dependent variables or the occurrence rate of certain events. It is commonly used to explore disaster-causing factors and to predict the probability of a disaster from risk factors. Since geological disaster occurrence results from the combined influence of precipitation and geological disaster susceptibility, the Logistic model is suitable for forecasting its probability;
the logistic regression equation is:
$$p = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}$$
where p is the occurrence probability, x denotes the relevant independent variables, and β is the coefficient corresponding to each x. Two approaches are used for the Logistic regression: in the first, the independent variables are the leading factors among the geological disaster susceptibility and the 6 factors characterizing precipitation, used directly; in the second, the independent variables are the principal components obtained from the analysis. Because the exponential function is always positive and the numerator is smaller than the denominator, p always lies between 0 and 1.
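The probability in the Logistic equation can be evaluated as below. This is a generic sketch; `logistic_probability` is a hypothetical helper, and the two variants in the text differ only in whether `x` holds the original factors or principal-component scores.

```python
import math


def logistic_probability(x, beta0, betas):
    """p = exp(eta) / (1 + exp(eta)) with eta = beta0 + sum(beta_i * x_i)."""
    if len(x) != len(betas):
        raise ValueError("x and betas must have the same length")
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x))
    # Two algebraically equal forms, chosen to avoid overflow for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)
```

Since the denominator always exceeds the positive numerator, the returned value stays between 0 and 1, matching the argument in the text.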
Further, the verification process of step 7 is as follows: the area within 20 km of the county is taken as the extraction area for forecast values. When the probability of a geological disaster exceeds 60%, a geological disaster warning of yellow level or above is issued to the public. To quantify the verification, the maximum and average probability values within the disaster range are used as verification parameters: when the maximum probability in the surrounding area is greater than or equal to 60%, the disaster counts as a hit, and when it is less than 60%, as a missed report. The hit rate and missed-report rate are expressed as follows:
hit rate:
$$\text{hit rate} = \frac{NA}{NA + NC} \times 100\%$$
missed-report rate:
$$\text{missed-report rate} = \frac{NC}{NA + NC} \times 100\%$$
where NA is the number of correct forecasts (hits) and NC is the number of missed forecasts.
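The hit and missed-report rates can be computed as below, assuming the maximum forecast probability within the 20 km extraction area around each observed disaster has already been extracted. The function name is hypothetical, and probabilities are expressed as fractions (0.60 for 60%).

```python
def hit_and_miss_rates(max_probs, threshold=0.60):
    """Return (hit rate %, missed-report rate %) for a list of observed disasters.

    max_probs: for each observed disaster, the maximum forecast probability
    (0-1) within the extraction area around it.
    A disaster is a hit if its maximum probability >= threshold, otherwise a miss.
    """
    if not max_probs:
        raise ValueError("max_probs must not be empty")
    na = sum(1 for p in max_probs if p >= threshold)  # NA: hits
    nc = len(max_probs) - na  # NC: missed reports
    total = na + nc
    return 100.0 * na / total, 100.0 * nc / total
```

Because every observed disaster is either a hit or a miss, the two rates always sum to 100%.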
The beneficial effects of the invention are as follows:
based on an analysis of the factors inducing geological disasters in Central China with the principal component method, different variable sets are designed for Logistic regression to establish a geological disaster probability model; the model is tested on the 2019 disasters, and the results of the Logistic model and the rainfall statistical model are compared and evaluated.
Because the significance of the factors is studied before regression, the method is more scientific. In model testing, the accuracy on the test sample exceeds 80%, and a test on the 2019 disasters, which are independent of the modeling data, shows that the hit rate of the method is more than 30 percentage points higher than that of the original statistical model.
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a schematic view of an embodiment of the invention;
FIG. 3 shows the geological disaster susceptibility and the distribution of geological disaster points in Central China;
FIG. 4 shows the ROC curves of the Logistic regression model tests using the first 4 principal components and the 4 main variables in the embodiment;
FIG. 5 shows the maximum forecast probability (unit: %) at geological disaster points in Central-South China, May to October 2019;
FIG. 6 shows the average forecast probability (unit: %) at geological disaster points in Central-South China, May to October 2019;
FIG. 7 is a comparison of the forecast results of the precipitation statistics, Logistic regression and geological disaster models;
FIG. 8 is a proof of application of the present invention.
Detailed Description
As shown in FIG. 1, the method of the present invention comprises:
Step 1: extract the disaster-point rainfall factors from a rainfall and information-value database. The selected factors are the geological disaster susceptibility and 6 factors characterizing precipitation; the precipitation factors, which characterize precipitation amount and persistence, include the current-day precipitation, early-stage effective precipitation, number of precipitation days, longest continuous precipitation amount and maximum precipitation in the previous 3 days.
Step 2: first, the 6 factors characterizing precipitation are normalized to ensure the continuity of the factors; second, because the variables differ in attributes and order of magnitude, the raw data need to be standardized.
The normalization consists of taking the cube root of each of the 6 factors characterizing precipitation.
Since different variables often have different units and different degrees of variation, and differing units make the coefficients hard to interpret in practice, the data must be standardized to remove the influence of dimensions and of the variables' own variability and magnitude. The standardization is performed as follows: the mean is subtracted from the raw data and the result is divided by the standard deviation, so that the processed data have a mean of 0 and a standard deviation of 1. The conversion function is $y' = (y - \mu)/\sigma$, where $y'$ is the standardized value, $y$ is the raw value, and $\mu$ and $\sigma$ are the mean and standard deviation of the raw data of the geological disaster susceptibility and the 6 factors characterizing precipitation.
Step 3: establish a binary data series from the disaster-point rainfall factors and the susceptibility information value;
Step 4: obtain principal components by principal component analysis and determine the main influencing factors.
Principal component analysis is a statistical dimension-reduction method. Through an orthogonal transformation, it recombines a set of possibly correlated original variables into a new set of linearly uncorrelated composite variables, called principal components, and a smaller number of these is retained as needed so as to reflect as much of the information in the original variables as possible.
To study the relationship between geological disaster occurrence and precipitation and geological disaster susceptibility, principal components are retained according to the cumulative contribution rate, keeping those with eigenvalues greater than 1. With $k$ principal components selected, the principal component $F_k$ is expressed as:
$$F_k = a_{1k}Z_1 + a_{2k}Z_2 + \cdots + a_{ik}Z_i + \cdots + a_{pk}Z_p \qquad (1)$$
where $Z_i$ is the standardized value of the $i$-th original variable, $p$ is the number of original variables, and $a_{ij}$ is the factor score coefficient of factor $i$ in the $j$-th principal component, i.e. the contribution of the $i$-th factor to the $j$-th principal component. Combining $a_{ij}$ with the variance contribution rate $E_j$ of the $j$-th principal component gives the weight $W_i$ of the $i$-th environmental factor, expressed by the following formula:
[Formula image not reproduced: weight $W_i$ of the $i$-th factor in terms of $a_{ij}$ and $E_j$]
Step 5: perform Logistic regression on different principal components;
Step 6: perform Logistic regression on the main influencing factors;
Logistic regression analysis is a generalized linear regression model used to study binary (two-class) dependent variables or the occurrence rate of certain events. It is commonly used to explore disaster-causing factors and to predict the probability of a disaster from risk factors. Since geological disaster occurrence results from the combined influence of precipitation and geological disaster susceptibility, the Logistic model is suitable for forecasting its probability;
the logistic regression equation is:
$$p = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}$$
where p is the occurrence probability, x denotes the relevant independent variables, and β is the coefficient corresponding to each x. Two approaches are used for the Logistic regression: in the first, the independent variables are the leading factors among the geological disaster susceptibility and the 6 factors characterizing precipitation, used directly; in the second, the independent variables are the principal components obtained from the analysis. Because the exponential function is always positive and the numerator is smaller than the denominator, p always lies between 0 and 1.
Step 7: compare the verification parameters.
The area within 20 km of the county is taken as the extraction area for forecast values. When the probability of a geological disaster exceeds 60%, a geological disaster warning of yellow level or above is issued to the public. To quantify the verification, the maximum and average probability values within the disaster range are used as verification parameters: when the maximum probability in the surrounding area is greater than or equal to 60%, the disaster counts as a hit, and when it is less than 60%, as a missed report. The hit rate and missed-report rate are expressed as follows:
hit rate:
$$\text{hit rate} = \frac{NA}{NA + NC} \times 100\%$$
missed-report rate:
$$\text{missed-report rate} = \frac{NC}{NA + NC} \times 100\%$$
where NA is the number of correct forecasts (hits) and NC is the number of missed forecasts.
Step 8: obtain the optimal Logistic probability model.
The method of this patent was applied with Central China as the study area.
1. Overview of the study area
Central China lies in a transition zone between eastern and western China and between northern and southern China, and its climate is complex and variable. Annual precipitation is unevenly distributed and decreases gradually from south to north: most of Henan receives 600 to 1000 mm, most of Hubei 800 to 1400 mm, and most of Hunan more than 1400 mm, with extreme precipitation occurring frequently in summer. Prolonged overcast rain in spring, rainstorms in summer and autumn rain in western China occur frequently in the region, so rainfall-induced geological disasters occur throughout the year and are most severe in summer, particularly in the west; the number of geological disasters is second only to that of Sichuan nationwide. Central China generally comprises Henan, Hubei and Hunan provinces; considering the similarity of geological disasters and their ranges of influence, the study area is defined as Hubei, Hunan and Jiangxi.
After the studied disaster points are overlaid on the geological disaster susceptibility (FIG. 3), the information value characterizing susceptibility in Central China mostly lies between 0.0 and 0.6. The areas where geological disasters are most densely concentrated are the junction of Hu and Jiangxi, northwestern and southwestern Hunan, and an area in southeastern Hunan.
2. Establishment of probability model of geological disasters
2.1 Principal component analysis of geological disasters and main influencing factors
The geological disaster susceptibility and the 6 factors characterizing precipitation, all related to geological disaster occurrence, are selected for analysis. The precipitation factors, which represent precipitation amount and persistence, include the current-day precipitation, early-stage effective precipitation, number of precipitation days, longest continuous precipitation amount and maximum precipitation in the previous 3 days. Because Logistic regression requires the variables to be independent, principal component analysis is performed first. The variables also differ in attributes and order of magnitude, so PCA is carried out after the raw data have been standardized. In addition, to ensure the continuity of the factors, the precipitation-related factors are normalized. The main results are shown in Table 1.
Table 1 Principal component contribution rates and eigenvalues
[Table image not reproduced]
The results show that the first principal component explains 56% of the variance (the explained proportion is the share of the data variance that each principal component accounts for), the first 3 principal components together explain 87%, and the first 4 explain 93%, so the first 4 principal components essentially capture the information of all indicators. Using a cumulative variance contribution threshold of 80% (the variance contribution rate is the share of total variation attributable to a single common factor and indicates that factor's influence on the variables), 3 principal components are extracted, and the variables with high factor loadings are the main indicators influencing geological disaster occurrence (Table 2).
Table 2 Factor loadings of the first 3 principal components
[Table image not reproduced]
The results show that the first principal component mainly comprises early-stage effective precipitation, current-day precipitation and the longest number of consecutive precipitation days; the second mainly comprises current-day precipitation, susceptibility, early-stage effective precipitation and the longest number of consecutive precipitation days; and the third comprises susceptibility. Accordingly, early-stage effective precipitation, current-day precipitation, susceptibility and the longest number of consecutive precipitation days are adopted as the index system for determining whether geological disasters occur in Central-South China.
2.2 Logistic regression tests
Disaster records provide only samples in which disasters occurred, but Logistic modeling also requires non-occurrence samples, and model performance depends strongly on how the 0-value samples are chosen. In this study, 0-value samples were selected on the conditions that no precipitation-induced disaster occurred and no precipitation preceded the date, with antecedent precipitation extracted in the same way as for disaster points, followed by a de-duplication check. After the sample set was determined, 80% of the samples were used for modeling and 20% for testing. To identify the method and parameters that perform better, the leading principal components and the main variables were each used in regression for comparative tests; the test design is shown in Table 3.
Table 3 Logistic regression test design
[Table image not reproduced]
After the samples were fixed, each test was run and its accuracy calculated, giving Table 4. Test B has lower accuracy when three factors are selected, whereas tests A and C perform relatively better. When two factors are selected, the model built on principal components is more accurate; when 3 or 4 factors are selected, the two methods differ little in accuracy, but Logistic regression on the first 4 principal components performs relatively better, with accuracy reaching 0.812.
[Table image not reproduced]
ROC curves were drawn for tests B and C. FIG. 4 shows that in test B, Logistic regression directly on the main variables scores better on the test data than regression on the principal-component variables, whereas in test C the two differ little, and the area under the ROC curve for the principal-component variables, 0.915, is relatively better.
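The area under the ROC curve quoted above can be computed without plotting, as the probability that a randomly chosen disaster sample receives a higher score than a randomly chosen non-disaster sample. The sketch below is a generic pairwise (Mann-Whitney) implementation with a hypothetical name, not the tool used in the patent; it is quadratic in sample size and intended only for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for disaster occurrence, 0 otherwise; scores: predicted probabilities.
    Ties between a positive and a negative sample count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```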
To remove the influence of sample selection on the test, the parameters were determined by five-fold cross-validation using the first 4 variables and the first 4 principal components from the principal component analysis: the accuracy of regression on the principal-component variables is 0.818, and that of regression on the main variables is 0.804. All things considered, the principal-component variables are used for modeling; the modeling formula and main parameters are as follows:
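The five-fold comparison above can be sketched as follows. The patent does not state how the Logistic model was fitted; batch gradient descent, its learning rate and epoch count, and the 0.5 decision threshold are all assumptions, and `fit_logistic` and `cv_accuracy` are hypothetical names.

```python
import numpy as np


def fit_logistic(x, y, lr=0.1, epochs=2000):
    """Fit Logistic regression by batch gradient descent; returns (beta0, betas)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta0, betas = 0.0, np.zeros(x.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(beta0 + x @ betas)))
        err = p - y  # gradient of the mean log-loss w.r.t. the linear predictor
        beta0 -= lr * err.mean()
        betas -= lr * (x.T @ err) / len(y)
    return beta0, betas


def cv_accuracy(x, y, folds=5, seed=0):
    """Mean accuracy over k-fold cross-validation with a 0.5 decision threshold."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(y))
    accs = []
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        beta0, betas = fit_logistic(x[train], y[train])
        p = 1.0 / (1.0 + np.exp(-(beta0 + x[test] @ betas)))
        accs.append(np.mean((p >= 0.5).astype(int) == y[test]))
    return float(np.mean(accs))
```

For example, calling `cv_accuracy` once with the first 4 principal-component scores and once with the 4 main standardized variables reproduces the structure of the comparison, though not the reported figures.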
[Formula image not reproduced: Logistic probability model expressed in terms of $B$]
$B = -a + bX_1 + cX_2 + dX_3 + eX_4$, where $a$ is the intercept and $b$, $c$, $d$, $e$ are the regression coefficients; the listed values of the intercept and coefficients are [-0.092, 1.085, 0.979, 0.188, -1.153]. $X_1$ to $X_4$ are the 1st to 4th principal components, respectively.
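Applying the fitted model to new data can be sketched as below. Two assumptions are labeled in the code: the lost formula image is taken to be the standard form $P = 1/(1 + e^{-B})$, and the listed value -0.092 is treated as the intercept term entering $B$ directly, because the sign convention of $-a$ in the text is ambiguous. The function name is hypothetical.

```python
import math

# Coefficients as listed in the text: intercept, then PC1..PC4.
# Assumption: -0.092 is the intercept value that enters B directly.
COEFFS = (-0.092, 1.085, 0.979, 0.188, -1.153)


def disaster_probability(pc_scores, coeffs=COEFFS):
    """Return P = 1 / (1 + exp(-B)) for the first four principal-component scores.

    Assumption: the probability formula is the standard Logistic form in B.
    """
    if len(pc_scores) != 4:
        raise ValueError("expected scores for the first 4 principal components")
    intercept, *slopes = coeffs
    b = intercept + sum(c * x for c, x in zip(slopes, pc_scores))
    return 1.0 / (1.0 + math.exp(-b))
```

The scores passed in must come from the same standardization and principal-component coefficients used when the model was fitted.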
Table 4 Precision, recall and F1 score on the test sample for the principal-component Logistic regression
[Table image not reproduced]
Precision is the proportion of samples predicted as positive that are truly positive, and recall is the proportion of truly positive samples that are predicted as positive. The F1 score is the harmonic mean of precision and recall: it is high only when both are high, reaching its best value at 1 and its worst at 0. The calculation shows that for the no-disaster class, precision is higher than recall, with an average of 0.818, while for the geological-disaster class, recall is higher than precision, with an average of 0.817. However, these results come from the sample test; in practical application, verification over a long time series is needed.
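The per-class metrics can be computed as follows (a generic sketch; `class_report` is a hypothetical name). In a binary problem, if precision exceeds recall for the no-disaster class, recall must exceed precision for the disaster class, because one class's false negatives are the other class's false positives.

```python
def class_report(y_true, y_pred):
    """Precision, recall and F1 for classes 0 (no disaster) and 1 (disaster)."""
    report = {}
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```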
Taking the precipitation of the previous 15 days, forecast precipitation and geological disaster susceptibility as inputs, a probability model is established with the Logistic regression parameters, producing a daily geological disaster probability forecast product at 5 km resolution for each 24 h period within the next 72 h.
3. Verification of the 2019 geological disaster probability model products in Central-South China
The maximum and average probability values corresponding to each geological disaster in 2019 were extracted, and the probability values of multiple disasters at the same point were averaged to obtain the distribution of probabilities at geological disaster points. In terms of the maximum probability, high forecast probabilities at disaster points are mainly distributed in western Hubei, central-southern Hunan, western Hunan, southern Hunan and central-northern Jiangxi, generally at 65% to 95%, while lower values are mainly found in eastern Hubei, western Jiangxi and eastern Hunan.
In terms of the average probability around disaster points, values are 20% to 40% lower than the maximum probability; high values are mainly distributed in Hunan and Jiangxi, and elsewhere the values are 10% to 40% with no obvious pattern. The large difference between the average and maximum probabilities suggests that geological conditions within 20 km of a disaster point may be complex and prone to geological disasters, which can serve as a reference for refined geological disaster meteorological forecasting.
Analysis of the verification results for the three provinces of Central China shows that the hit rate exceeds 52% in all three; Hunan has the highest hit rate, 63%, and correspondingly the lowest missed-report rate, 37%, while Hunan and Jiangxi are relatively similar. In addition, cases with forecast probabilities of 50% to 60% account for a relatively high proportion, 12%, in Hubei and Hunan, and these could be forecast under certain conditions. Thus, the forecastable proportion in the three provinces can exceed 60%, which gives the method good guiding significance.
Table 5.2 Forecast performance of the geological disaster probability model in the central-south region, May-October 2019 (unit:)
The above results come from back-calculating the model with quantitative precipitation estimates: the hit rate in the three provinces is generally above 52%, with the highest, 63%, in Hunan. Because that test relies on observed precipitation, the model was further back-calculated with antecedent precipitation combined with forecast precipitation, and its results were compared with the forecasts of the National Meteorological Center's statistical precipitation-Logistic regression geological disaster model, giving Fig. 7. The model's hit rate is 51%, 36 percentage points higher than the 15% hit rate of that Logistic statistical model for site forecasts.
As shown in Fig. 8, the method has been in operational use at the Central Meteorological Station since June 2019, running stably with high forecast accuracy and good results in practice.
4 Effects of this patent
(1) The method applies principal component analysis to 7 factors related to geological disaster occurrence; the cumulative contribution rate of the first 4 principal components reaches 93%. The factor loadings of these components show that antecedent effective rainfall, same-day rainfall, susceptibility and the longest run of consecutive rainy days are the main factors determining whether geological disasters occur in the central-south region.
(2) Comparison tests using 2-4 principal components and the same numbers of original variables were carried out on a fixed sample: accuracy is lower with 3 factors and relatively better with 4. Accuracy differs little between principal-component inputs and direct variables, and Logistic regression on the first 4 principal components performs relatively better. Five-fold cross-validation with 4 variables shows that the principal-component variables improve accuracy by 0.014. All things considered, the model is built from Logistic regression parameters fitted to the first 4 principal components.
(3) The geological disaster probability model was back-calculated using quantitative precipitation estimation (QPE) and quantitative precipitation forecast (QPF): the hit rate in the three provinces is generally above 52%, with the highest, 63%, in Hunan. Compared with the National Meteorological Center's statistical precipitation-Logistic regression geological disaster model, the hit rate is more than 30 percentage points higher.
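Effect (2) rests on five-fold cross-validation of Logistic regression models, and step 7 of the claims names accuracy and the ROC curve as the verification parameters. The NumPy sketch below runs such a cross-validation and reports mean accuracy and mean ROC AUC over the folds. It is a hypothetical illustration: the plain gradient-descent fit, learning rate, iteration count and 0.5 decision threshold are assumptions, since the patent does not describe its fitting procedure.

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, y, lr=0.5, n_iter=3000):
    """Batch gradient descent on the mean log-loss; returns [beta0, beta1, ...]."""
    Xb = add_intercept(X)
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ beta) - y) / len(y)
        beta -= lr * grad
    return beta

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count one half)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def cross_validate(X, y, k=5, seed=0):
    """Mean accuracy and mean ROC AUC of Logistic regression over k folds."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    accs, aucs = [], []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta = fit_logistic(X[train], y[train])
        p = sigmoid(add_intercept(X[test]) @ beta)
        accs.append(np.mean((p >= 0.5) == y[test]))
        aucs.append(roc_auc(y[test], p))
    return float(np.mean(accs)), float(np.mean(aucs))
```

To mirror the comparison in effect (2), `cross_validate` would be called once on the first 4 standardized factors and once on the first 4 principal-component scores, and the mean accuracies compared; this sketch does not reproduce the reported 0.014 gain.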
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (7)

1. A geological disaster probability forecasting method based on principal component and Logistic analysis, characterized in that the method comprises the following steps:
step 1: extracting the disaster-point rainfall factors from a rainfall and information-value database;
step 2: extracting the geological disaster susceptibility;
step 3: building a binary data series from the disaster-point rainfall factors and the susceptibility information value, wherein non-disaster samples are determined by moving the occurrence dates of part of the disaster records to dates on which no disaster occurred, while samples on which a disaster occurred are removed from them;
step 4: obtaining principal components by principal component analysis and determining the main influencing factors;
step 5: performing Logistic regression on different principal components;
step 6: performing Logistic regression on the main influencing factors;
step 7: dividing the test data into training data and validation data, building models from the training data through steps 5 and 6, applying the models to forecast the validation data, and checking their accuracy against the observed outcomes, the verification parameters comprising accuracy and the ROC curve;
step 8: comparing the accuracy and ROC curves of the established models on the validation data to obtain the optimal Logistic probability model.
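Step 3 builds non-disaster samples by moving part of the disaster records to dates without disasters. One possible reading is sketched below in plain Python. The shift offsets, the fraction of records shifted, the function name, and the rule that a shifted sample is dropped when the same point already has a disaster (or an earlier non-disaster sample) on the new date are all assumptions; the patent does not give these details.

```python
import datetime as dt
import random

def build_binary_samples(disasters, offsets_days=(-30, 30), fraction=0.5, seed=0):
    """Return (point_id, date, label) tuples: label 1 = disaster, 0 = shifted non-disaster.

    disasters: list of (point_id, datetime.date) records.
    offsets_days, fraction: hypothetical shifting parameters.
    """
    rng = random.Random(seed)
    taken = set(disasters)  # (point, date) pairs that already carry a sample
    samples = [(pid, day, 1) for pid, day in disasters]
    chosen = rng.sample(disasters, k=int(len(disasters) * fraction))
    for pid, day in chosen:
        new_day = day + dt.timedelta(days=rng.choice(offsets_days))
        if (pid, new_day) in taken:
            continue  # a disaster (or an earlier negative) already sits on this point and date
        taken.add((pid, new_day))
        samples.append((pid, new_day, 0))
    return samples
```

With a single offset of +30 days, a disaster on 1 June at a point that also had a disaster on 1 July yields no negative sample, because the shifted date coincides with a real occurrence.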
2. The geological disaster probability forecasting method based on principal component and Logistic analysis according to claim 1, wherein in step 1 the disaster-point factors comprise the geological disaster susceptibility and 6 factors characterizing precipitation, the precipitation factors being chosen to represent rainfall amount and rainfall persistence: same-day rainfall, antecedent effective rainfall, number of rainy days, longest continuous rainfall, and maximum rainfall over the previous 3 days.
3. The geological disaster probability forecasting method based on principal component and Logistic analysis according to claim 2, wherein, before step 3, first, the 6 factors characterizing precipitation are normalized to ensure the continuity of the factors; second, because the geological disaster susceptibility and the 6 precipitation factors differ in attributes and order of magnitude, the susceptibility and the 6 precipitation factors are standardized.
4. The geological disaster probability forecasting method based on principal component and Logistic analysis according to claim 3, wherein the normalization consists of taking the cube root of each of the 6 factors characterizing precipitation;
and the standardization consists of subtracting the mean from the raw data and dividing by the standard deviation, so that the processed data have a mean of 0 and a standard deviation of 1. The conversion function is $y' = (y - \mu)/\sigma$, where $y'$ is the standardized data, $y$ is the raw data before processing, and $\mu$ and $\sigma$ are the mean and standard deviation of the raw data of the geological disaster susceptibility and of each of the 6 factors characterizing precipitation.
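The normalization and standardization steps can be sketched with NumPy as a cube-root transform of the precipitation factors followed by column-wise z-scores of all 7 factors. The function names are hypothetical, and the sketch assumes the susceptibility is a length-`n` vector and the precipitation factors an `(n, 6)` array; it divides by the (population) standard deviation, since only that yields unit spread.

```python
import numpy as np

def cube_root(rain_factors):
    """Element-wise cube root of the precipitation factors (the normalization step)."""
    return np.cbrt(np.asarray(rain_factors, dtype=float))

def z_score(data):
    """Column-wise y' = (y - mu) / sigma, sigma being the population standard deviation."""
    arr = np.asarray(data, dtype=float)
    mu = arr.mean(axis=0)
    sigma = arr.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # constant column: avoid division by zero
    return (arr - mu) / sigma

def preprocess(susceptibility, rain_factors):
    """Stack susceptibility (n,) with cube-rooted rain factors (n, 6); standardize all columns."""
    sus = np.asarray(susceptibility, dtype=float).reshape(-1, 1)
    return z_score(np.hstack([sus, cube_root(rain_factors)]))
```

The cube root compresses the long right tail typical of rainfall amounts before the z-score puts every factor on a common scale.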
5. The geological disaster probability forecasting method based on principal component and Logistic analysis according to claim 3, wherein principal component analysis is a statistical method that recombines the original variables into a new set of mutually uncorrelated composite variables and, according to actual needs, keeps a smaller number of them that reflect as much of the original information as possible. Through an orthogonal transformation, a set of possibly correlated variables is converted into a set of linearly uncorrelated variables, called principal components; it is thus a mathematical dimensionality-reduction method. The specific process of step 4 is as follows:
to study the relationship between geological disaster occurrence and both precipitation and geological disaster susceptibility, principal components are retained according to the cumulative contribution rate such that their eigenvalues are greater than 1; $k$ principal components are selected, and principal component $F_k$ is expressed as:
$$F_k = a_{1k} Z_1 + a_{2k} Z_2 + \cdots + a_{ik} Z_i + \cdots + a_{pk} Z_p \qquad (1)$$
where $Z_i$ is the standardized value of the $i$-th original variable and $a_{ij}$ is the factor score coefficient of factor $i$ in the $j$-th principal component, i.e. the contribution of the $i$-th factor to the $j$-th principal component (in formula (1), $j = k$). Combining $a_{ij}$ with the variance contribution rate $E_j$ of the $j$-th principal component gives the weight $W_i$ of the $i$-th environmental factor:
[Formula for $W_i$: image not reproduced in the source text]
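Step 4 can be sketched with NumPy as an eigen-decomposition of the correlation matrix of the standardized factors, retention of components whose eigenvalue exceeds 1, and the variance and cumulative contribution rates. Assumptions: the coefficients $a_{ik}$ are taken to be the unit-eigenvector entries (one common convention; statistical packages may scale factor score coefficients differently), the function name is hypothetical, and the weight $W_i$ is not computed because its formula is not recoverable from the text.

```python
import numpy as np

def principal_components(Z, min_eigenvalue=1.0):
    """PCA of standardized factors Z (n_samples x p).

    Returns (scores, coefficients, contribution, cumulative) for retained components.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.corrcoef(Z, rowvar=False)        # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()   # variance contribution rate E_j
    cumulative = np.cumsum(contribution)     # cumulative contribution rate
    k = max(1, int(np.sum(eigvals > min_eigenvalue)))  # keep eigenvalues > 1
    coefficients = eigvecs[:, :k]            # column j holds a_1j ... a_pj
    scores = Z @ coefficients                # F_j = sum_i a_ij * Z_i, as in formula (1)
    return scores, coefficients, contribution[:k], cumulative[:k]
```

On factors that form two tightly correlated pairs, two components survive the eigenvalue-greater-than-1 rule and together explain almost all of the variance.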
6. The geological disaster probability forecasting method based on principal component and Logistic analysis according to claim 3, wherein Logistic regression analysis is a generalized linear regression model used to study a dichotomous dependent variable or the incidence of an event; it is commonly used to explore disaster-causing factors and to predict the probability of a disaster from risk factors. Since geological disasters result from the combined influence of precipitation and geological disaster susceptibility, the Logistic model is suitable for their probability prediction;
the logistic regression equation is:
$$p = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}$$
where $p$ is the occurrence probability, $x$ are the independent variables, $\beta_0$ is the intercept and $\beta$ is the coefficient corresponding to each $x$. The Logistic regression is carried out in two ways: in one, the independent variables are the first several of the 6 factors characterizing geological disaster susceptibility and precipitation, used directly; in the other, they are the principal components obtained from the analysis. Because the exponential function is positive and the numerator is smaller than the denominator, $p$ lies between 0 and 1.
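The probability formula can be evaluated in a numerically stable way as sketched below in plain Python. The function name and the `beta0` argument are illustrative; fitting the coefficients is a separate step that this sketch does not cover.

```python
import math

def logistic_probability(x, beta, beta0=0.0):
    """p = exp(z) / (1 + exp(z)) with z = beta0 + sum(beta_i * x_i).

    For z >= 0 the algebraically equal form 1 / (1 + exp(-z)) is used, so exp()
    is only ever called on a non-positive argument and cannot overflow.
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

A zero linear predictor gives $p = 0.5$; very negative predictors underflow gracefully towards 0 instead of raising an overflow error.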
7. The geological disaster probability forecasting method based on principal component and Logistic analysis according to claim 3, wherein the verification in step 7 comprises: disasters are tallied by county, and the 20 km range around it is taken as the area from which forecast values are extracted; a geological disaster warning of yellow level or above is issued to the public when the occurrence probability exceeds 60%; to quantify the verification, the maximum and mean probability values within the disaster area are used as verification parameters, a case being a hit when the maximum probability in the surrounding area is greater than or equal to 60% and a miss when it is less than 60%; the hit rate and the miss rate are expressed as:
$$\text{hit rate} = \frac{NA}{NA + NC} \times 100\%$$
$$\text{miss rate} = \frac{NC}{NA + NC} \times 100\%$$
where $NA$ is the number of correct forecasts (hits) and $NC$ is the number of missed forecasts.
CN202210928334.XA 2022-08-03 2022-08-03 Geological disaster probability forecasting method based on principal component and Logistic analysis Active CN115510945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210928334.XA CN115510945B (en) 2022-08-03 2022-08-03 Geological disaster probability forecasting method based on principal component and Logistic analysis

Publications (2)

Publication Number Publication Date
CN115510945A 2022-12-23
CN115510945B 2024-05-28

Family

ID=84501218

Country Status (1)

Country Link
CN (1) CN115510945B (en)

Also Published As

Publication number Publication date
CN115510945B (en) 2024-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant