CN111639878A

CN111639878A - Landslide risk prediction method and system based on knowledge graph construction

Info

Publication number: CN111639878A
Application number: CN202010516705.4A
Authority: CN
Inventors: 马连博; 王经纬; 王兴伟; 朱万成; 张鹏海
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2020-06-09
Filing date: 2020-06-09
Publication date: 2020-09-08
Anticipated expiration: 2040-06-09
Also published as: CN111639878B

Abstract

A landslide risk prediction method and system based on knowledge graph construction. The system comprises a training unit and a risk prediction unit. The training unit comprises a data acquisition module, a data processing and generating module, a feature combination module, a relation feature vector generating module, a triple data set constructing module and a relation judging module.

The method comprises the following steps:
- acquiring the landslide events that have occurred in a research area to form a basic attribute data set;
- processing the data;
- constructing a positive sample set;
- constructing a negative sample set to generate a new basic attribute data set;
- training a feature combination model with the new basic attribute data set and using it to combine the feature vectors into 39-dimensional combined feature vectors;
- generating the feature vectors of the relations in the knowledge graph;
- constructing a triple data set;
- training a knowledge graph model with the triple data set;
- predicting the landslide occurrence probability of a place to be assessed.

Constructing a knowledge graph and applying it to landslide risk prediction yields higher landslide prediction accuracy.

Description

Landslide risk prediction method and system based on knowledge graph construction

Technical Field

The invention belongs to the technical field of landslide risk prediction, and in particular relates to a landslide risk prediction method and system based on knowledge graph construction.

Background

Landslides are a common natural geological hazard and pose a serious threat to people's lives and property. Predicting the risk that a landslide will occur at a given place is therefore important work, and it is an active research direction for specialists.

Landslide forecasting is divided into spatial forecasting and temporal forecasting. Sites where landslides occur usually share certain characteristic geographic and geological factors. Studying these factors reveals the environmental conditions under which landslides occur, and this can then be used to predict the probability of a landslide at a given site.

In recent years, researchers have proposed many prediction methods, for example:
- using Bayesian probability formulas to calculate the probability of a landslide after superimposing various factors;
- predicting the probability of a landslide with probabilistic prediction models such as support vector machines and likelihood ratio models;
- using artificial neural networks to automatically learn the internal numerical relationships between the factors and landslide occurrence, then predicting the probability of a landslide from the attribute values.

Applying probabilistic models (such as Bayesian probability calculation, support vector machines and likelihood ratio models) to landslide risk prediction has a drawback. The degree to which each factor influences landslide occurrence must either be set by manual assumption or cannot be determined at all, and this ultimately limits the accuracy of the prediction.

Methods based on artificial neural networks only let the network learn the influence of each individual factor automatically. They do not consider the interrelations among the factors. Consequently, no reasonable, effective and sufficiently accurate method for landslide risk prediction is currently available.

Knowledge graph construction belongs to the field of natural language processing. A knowledge graph represents the associations among things in the real world and expresses them in the form of a graph, so that all objectively existing relationships, both explicit and hidden, are presented. In a knowledge graph, entities represent specific, objectively existing objects in the real world, and relationships represent the associations between entities.

Knowledge graph construction based on representation learning is one way of building a knowledge graph. Representation learning represents textual information as vectors. The association between words becomes a relationship between vectors, different words are distinguished by their vectors, and the meaning each word expresses is well represented.

In a representation-learning-based knowledge graph, each relationship is expressed as a triple of head entity, relation and tail entity. Each element of the triple is replaced by its corresponding feature vector. A model is then built and trained toward an objective so that correct relationships can be identified, which constructs the knowledge graph.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a landslide risk prediction method and system based on knowledge graph construction. The invention addresses spatial prediction and forecasting, and predicts landslide risk using machine learning together with the idea of knowledge graph construction.

Specifically, it designs a method and system that predict the probability of a landslide at a place from attribute values describing the local geographic conditions. The design follows the idea of knowledge graph construction based on representation learning. The method combines representation-learning-based knowledge graph construction with the landslide risk prediction task. It judges the likelihood of a landslide at a place by determining the relationships and analogies between places described by different geographic conditions.

A landslide risk prediction method based on knowledge graph construction comprises the following steps:

Step 1: obtain the landslide events that have occurred in the research area. For each landslide event, record the attribute values of the relevant local geographic and geological factors at the time it occurred. The recorded attributes are:
- front and rear edge elevation;
- sliding mass volume;
- sliding mass average thickness;
- sliding mass material composition;
- sliding bed lithology;
- rock stratum attitude;
- slope direction;
- slope type;
- slope form;
- slope gradient.

The value of an attribute is its attribute value. Each landslide event corresponds to one data group containing the attribute values of these 10 geological factors, and the data groups of all landslide events together form the basic attribute data set;

Step 2: process the data in the basic attribute data set;

according to the characteristics of each attribute, process each attribute value so that character-type (text) attribute values are expressed as numbers, and split or combine attribute values that contain several numbers. The specific method is as follows:

front and rear edge elevation attribute value: contains two numerical items, which are used as two basic feature values;

sliding mass volume attribute value: contains one numerical item, which is used as one basic feature value;

sliding mass average thickness attribute value: used as one basic feature value; if it is given as two numerical items (a range), their average is taken;

sliding mass material composition attribute value: a character-type attribute with 20 categories in total, numbered 1-20, where the number denotes the category;

sliding bed lithology attribute value: a character-type attribute with 13 categories in total, numbered 1-13, where the number denotes the category;

rock stratum attitude attribute value: contains two numerical items, which are used as two basic feature values;

slope direction attribute value: contains one numerical item, which is used as one basic feature value;

slope type attribute value: a character-type attribute with 3 categories in total, numbered 1-3, where the number denotes the category;

slope form attribute value: a character-type attribute with 4 categories in total, numbered 1-4, where the number denotes the category;

slope gradient attribute value: contains two numerical items, which are used as two basic feature values;

all data groups in the basic attribute data set are processed in this way, so that each new data group contains 13 basic feature values;
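The processing rules of step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation:
- The record field names, the `encode_record` function and the codebook dictionaries are hypothetical.
- Only the category codes used in the worked example of the detailed description come from the source: 6 for the example material composition, 1 for J1x, 1 for forward slope and 1 for stepped. The full 20/13/3/4-entry codebooks are not given.
- The feature order follows that worked example.
- The attitude is assumed to be written as dip direction ∠ dip angle.

```python
from typing import Dict, List

# Hypothetical, partial codebooks: only the codes used in the worked example are known.
MATERIAL_CODES: Dict[str, int] = {
    "layered quartz sandstone and silty-sand block fractured rock": 6,
}
LITHOLOGY_CODES: Dict[str, int] = {"J1x": 1}
SLOPE_TYPE_CODES: Dict[str, int] = {"forward slope": 1}
SLOPE_FORM_CODES: Dict[str, int] = {"stepped": 1}


def encode_record(rec: dict) -> List[float]:
    """Turn one landslide record (hypothetical field names) into 13 basic feature values."""
    front, rear = rec["edge_elevation"]          # two numbers -> two features
    thickness = rec["avg_thickness"]
    if isinstance(thickness, tuple):             # a range is averaged into one feature
        thickness = sum(thickness) / len(thickness)
    dip_direction, dip_angle = rec["attitude"]   # assumed "dip direction ∠ dip angle"
    gradient_min, gradient_max = rec["gradient"]
    return [
        front, rear,
        rec["volume"],
        thickness,
        MATERIAL_CODES[rec["material"]],         # categorical -> category number
        LITHOLOGY_CODES[rec["lithology"]],
        dip_direction, dip_angle,
        rec["slope_direction"],
        SLOPE_TYPE_CODES[rec["slope_type"]],
        SLOPE_FORM_CODES[rec["slope_form"]],
        gradient_min, gradient_max,
    ]


example = {
    "edge_elevation": (135, 432), "volume": 16000000, "avg_thickness": 40,
    "material": "layered quartz sandstone and silty-sand block fractured rock",
    "lithology": "J1x", "attitude": (10, 36), "slope_direction": 340,
    "slope_type": "forward slope", "slope_form": "stepped", "gradient": (25, 45),
}
print(encode_record(example))
# [135, 432, 16000000, 40, 6, 1, 10, 36, 340, 1, 1, 25, 45]
```

The printed vector matches the 13-value result given for the worked example in the detailed description.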

Step 3: record each landslide event counted in step 1 as an entity in the knowledge graph, named p₁, p₂, …, p_x, …, p_y, …, p_n. These entities represent landslide events and are defined as positive samples. They form the positive sample set, denoted pos = {p₁, p₂, …, p_x, …, p_y, …, p_n}. Each data group in the basic attribute data set is recorded as the feature vector of the corresponding entity in the positive sample set;

Step 4: take the sliding mass material composition as the independent variable. Take the attribute values of the geological factors directly correlated with it as dependent variables: the front and rear edge elevation, the sliding mass volume, the sliding mass average thickness, the slope gradient and the slope type. Group the data groups that share the same sliding mass material composition, and compute the value range of each dependent variable for each group. The statistical method for each dependent variable is as follows:

front and rear edge elevation attribute value: contains two values, the front edge elevation and the rear edge elevation. The maximum and minimum of all these values are computed over all data groups with the same sliding mass material. Front and rear edge elevations are not distinguished; each is treated as an individual value. The result is a value range;

sliding mass volume attribute value: compute the maximum and minimum volume values; the result is a value range;

sliding mass average thickness attribute value: compute the maximum and minimum average thickness values; the result is a value range;

slope gradient attribute value: contains two values, the maximum and minimum gradient of the slope. The maximum and minimum of all these values are computed over all data groups with the same sliding mass material. The maximum and minimum gradients are not distinguished; each is treated as an individual value. The result is a value range;

slope type attribute value: a character-type attribute with 3 categories (forward slope, oblique slope and reverse slope), represented by the numbers 1-3 after step 2. If all data groups in the group have the same slope type, that type number is recorded; otherwise nothing is recorded;
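The grouping and range statistics of step 4 can be sketched as below. The sketch assumes each data group has already been turned into the 13-value vector of step 2, in the order of the worked example. Under that layout the material code sits at index 4, the slope type at index 9, and so on. The index constants and the `value_ranges` function are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed positions in the 13-value vector (order of the worked example in step 2).
ELEVATION = (0, 1)      # front and rear edge elevation
VOLUME, THICKNESS, MATERIAL, SLOPE_TYPE = 2, 3, 4, 9
GRADIENT = (11, 12)     # minimum and maximum slope gradient


def _span(values) -> Tuple[float, float]:
    values = list(values)
    return min(values), max(values)


def value_ranges(rows: Sequence[Sequence[float]]) -> Dict[int, dict]:
    """Group positive data groups by material code and compute the step-4 statistics."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row in rows:
        groups[int(row[MATERIAL])].append(row)

    stats: Dict[int, dict] = {}
    for code, members in groups.items():
        slope_types = {row[SLOPE_TYPE] for row in members}
        stats[code] = {
            # front and rear elevations are pooled, not distinguished
            "elevation": _span(row[i] for row in members for i in ELEVATION),
            "volume": _span(row[VOLUME] for row in members),
            "thickness": _span(row[THICKNESS] for row in members),
            # maximum and minimum gradients are pooled, not distinguished
            "gradient": _span(row[i] for row in members for i in GRADIENT),
            # recorded only when every member of the group has the same slope type
            "slope_type": next(iter(slope_types)) if len(slope_types) == 1 else None,
        }
    return stats
```

Each dictionary entry holds the (minimum, maximum) pair that step 5 uses as the lower bound for perturbation.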

Step 5: using the value ranges computed in step 4 and the attribute values recorded in step 1, generate one new data group for each landslide event counted in step 1. Each new data group represents a no-landslide event, which is also recorded as an entity in the knowledge graph. These entities are named s₁, s₂, …, s_x, …, s_y, …, s_n and are defined as negative samples. They form the negative sample set, denoted neg = {s₁, s₂, …, s_x, …, s_y, …, s_n}. Each new data group is recorded as the feature vector of the corresponding entity in the negative sample set. The specific method is as follows:

for each data group in the basic attribute data set, randomly select one of the five dependent variables from step 4 and change it, leaving all other attribute values unchanged, to generate a new data group. The processing for each selected dependent variable is as follows:

front and rear edge elevation attribute value: randomly generate two values below the minimum of the group's value range, representing the front edge elevation and the rear edge elevation, and use them to replace the two corresponding values of the original data group;

sliding mass volume attribute value: randomly generate one value below the minimum of the group's value range, and use it to replace the corresponding value of the original data group;

sliding mass average thickness attribute value: randomly generate one value below the minimum of the group's value range, and use it to replace the corresponding value of the original data group;

slope gradient attribute value: randomly generate two values below the minimum of the group's value range, representing the maximum and minimum gradient of the slope, and use them to replace the two corresponding values of the original data group;

slope type attribute value: if step 4 recorded a value for this item, replace it in the original data group with one of the other two codes, chosen at random. If no value was recorded, re-select one of the front and rear edge elevation, the sliding mass volume, the sliding mass average thickness and the slope gradient, and process it as required for that dependent variable;

in this way each data group in the basic attribute data set obtained in step 1 generates one new data group. All new data groups are added to that basic attribute data set to form a new basic attribute data set;
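A sketch of the negative-sample generation in step 5 follows. It makes these assumptions:
- The 13-value layout of step 2 in the worked-example order.
- A per-material statistics dictionary with keys `elevation`, `volume`, `thickness`, `gradient` and `slope_type`, each range stored as (minimum, maximum).
- Slope type codes 1-3.
- A lower bound of 0 for the new values. The patent only says they lie in a range below the minimum, so this bound is an assumption.

The names `make_negative` and `augment` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

ELEVATION, GRADIENT = (0, 1), (11, 12)
VOLUME, THICKNESS, MATERIAL, SLOPE_TYPE = 2, 3, 4, 9
SLOPE_TYPE_CODES = (1, 2, 3)
DEPENDENT = ("elevation", "volume", "thickness", "gradient", "slope_type")


def make_negative(row: Sequence[float], stats: Dict[int, dict],
                  rng: random.Random) -> List[float]:
    """Copy a positive data group and push exactly one dependent variable out of range."""
    s = stats[int(row[MATERIAL])]
    new = list(row)
    pick = rng.choice(DEPENDENT)
    if pick == "slope_type" and s["slope_type"] is None:
        pick = rng.choice(DEPENDENT[:-1])          # no common slope type: re-select
    if pick == "slope_type":
        others = [c for c in SLOPE_TYPE_CODES if c != s["slope_type"]]
        new[SLOPE_TYPE] = rng.choice(others)
    elif pick in ("elevation", "gradient"):
        low = s[pick][0]
        for i in (ELEVATION if pick == "elevation" else GRADIENT):
            new[i] = rng.uniform(0, low)           # assumed range [0, minimum)
    else:
        index = VOLUME if pick == "volume" else THICKNESS
        new[index] = rng.uniform(0, s[pick][0])
    return new


def augment(rows: Sequence[Sequence[float]], stats: Dict[int, dict],
            seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Return the new basic attribute data set and labels (1 = landslide, 0 = none)."""
    rng = random.Random(seed)
    negatives = [make_negative(row, stats, rng) for row in rows]
    return [list(r) for r in rows] + negatives, [1] * len(rows) + [0] * len(negatives)
```

Seeding the generator makes the negative set reproducible between runs, which helps when comparing trained models.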

Step 6: train a feature combination model with the data groups of the new basic attribute data set generated in step 5. Then use the trained model to perform feature combination on each data group. After feature combination, the dimension of each feature vector in the new basic attribute data set increases from 13 to 39; the 39-dimensional vector is the combined feature vector of the entity;

the feature combination model is a gradient boosting decision tree model;
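The patent does not say how the tree model turns 13 features into 39. One common reading, assumed here, uses the leaf reached in each boosted tree as a new feature. The 13 original values are kept and one leaf indicator is appended per tree, so 26 trees give 13 + 26 = 39 dimensions.

To avoid assuming any third-party library, the sketch implements gradient boosting from scratch. It uses depth-1 trees (stumps) on the logistic loss, with Newton-step leaf values. The following are all illustrative choices, not the patent's:
- the names `fit_gbdt` and `combine_features`;
- the tree depth;
- the learning rate;
- the 26-tree count.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def _sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _fit_stump(X: Sequence[Sequence[float]], residual: List[float],
               hess: List[float]) -> Stump:
    """Pick the split that best fits the residuals (squared error); Newton-step leaves."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            if not left or not right:
                continue
            sse = 0.0
            for side in (left, right):
                mean = sum(residual[i] for i in side) / len(side)
                sse += sum((residual[i] - mean) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, j, thr, left, right)
    if best is None:                              # all rows identical: one leaf only
        value = sum(residual) / (sum(hess) + 1e-12)
        return 0, math.inf, value, value
    _, j, thr, left, right = best

    def leaf_value(side: List[int]) -> float:
        return sum(residual[i] for i in side) / (sum(hess[i] for i in side) + 1e-12)

    return j, thr, leaf_value(left), leaf_value(right)


def fit_gbdt(X: Sequence[Sequence[float]], y: Sequence[int],
             n_trees: int = 26, lr: float = 0.1) -> List[Stump]:
    """Gradient boosting on the logistic loss; y uses 1 = landslide, 0 = none."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    scores = [math.log(p0 / (1 - p0))] * len(X)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        probs = [_sigmoid(f) for f in scores]
        residual = [yi - pi for yi, pi in zip(y, probs)]   # negative gradient
        hess = [pi * (1 - pi) for pi in probs]
        j, thr, left_val, right_val = _fit_stump(X, residual, hess)
        stumps.append((j, thr, left_val, right_val))
        scores = [f + lr * (left_val if row[j] <= thr else right_val)
                  for f, row in zip(scores, X)]
    return stumps


def combine_features(row: Sequence[float], stumps: Sequence[Stump]) -> List[float]:
    """13 original values + one leaf indicator per stump (0 = left leaf, 1 = right leaf)."""
    return list(row) + [0.0 if row[j] <= thr else 1.0 for j, thr, _, _ in stumps]
```

A library gradient boosting implementation with leaf-index output could fill the same role. The from-scratch version only keeps the sketch dependency-free.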

Step 7: generate the feature vectors of the relations in the knowledge graph;

the knowledge graph constructed by the method has two relations, similar and dissimilar. The feature vector of each relation is initialized randomly and has the same dimension as the combined feature vector of an entity, namely 39;

if two entities are similar, both are landslide events or both are no-landslide events; if they are dissimilar, one is a landslide event and the other is not;

Step 8: construct the triple data set;

each triple represents a relationship that exists in the real world and is written as (head entity, tail entity, relation), where the relation holds between the head entity and the tail entity. Triple samples are generated in three ways:

randomly draw an entity p_x from the positive sample set pos and an entity s_y from the negative sample set neg. Their relationship is dissimilar, and the triple is (p_x, s_y, dissimilar) or (s_y, p_x, dissimilar);

randomly draw two entities p_x and p_y from the positive sample set pos. Their relationship is similar, and the triple is (p_x, p_y, similar);

randomly draw two entities s_x and s_y from the negative sample set neg. Their relationship is similar, and the triple is (s_x, s_y, similar);

triple samples are generated in this way so that the number of similar samples equals the number of dissimilar samples, with at least 1000 samples for each relation. The samples are then mixed to form the triple data set;
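The three ways of building triples can be sketched as follows. Entities are represented by name strings such as `p1` and `s1`, and the (head, tail, relation) order follows the text. The text does not say how the similar triples are split between the positive and negative sample sets. The alternating split, the random head/tail order for dissimilar triples and the `build_triples` name are therefore assumptions.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head entity, tail entity, relation)


def build_triples(pos: Sequence[str], neg: Sequence[str],
                  n_per_relation: int = 1000, seed: int = 0) -> List[Triple]:
    """Balanced triple data set: n similar and n dissimilar triples, shuffled together."""
    if n_per_relation < 1000:
        raise ValueError("step 8 asks for at least 1000 triples per relation")
    rng = random.Random(seed)
    triples: List[Triple] = []
    for _ in range(n_per_relation):
        p, s = rng.choice(pos), rng.choice(neg)        # one landslide, one no-landslide
        triples.append((p, s, "dissimilar") if rng.random() < 0.5 else (s, p, "dissimilar"))
    for k in range(n_per_relation):
        pool = pos if k % 2 == 0 else neg              # assumed: alternate the two sources
        x, y = rng.sample(list(pool), 2)               # two distinct entities from one set
        triples.append((x, y, "similar"))
    rng.shuffle(triples)
    return triples
```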

Step 9: construct the knowledge graph model and train it with the triple data set;

step 9.1: input each triple, together with the error triple generated from it, into the knowledge graph model;

every triple in the triple data set is a correct triple, and an error triple is generated from each correct triple;

the error triples are generated as follows:

if the head entity of the triple belongs to the positive sample set pos and the relation is similar, randomly select an entity from the negative sample set neg to replace the original tail entity;

if the head entity of the triple belongs to the positive sample set pos and the relation is dissimilar, randomly select an entity from the positive sample set pos to replace the original tail entity;

if the head entity of the triple belongs to the negative sample set neg and the relation is similar, randomly select an entity from the positive sample set pos to replace the original tail entity;

if the head entity of the triple belongs to the negative sample set neg and the relation is dissimilar, randomly select an entity from the negative sample set neg to replace the original tail entity;

both triples are then input into the knowledge graph model;
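The four corruption rules reduce to one test. For a similar triple, the replacement tail comes from the set the head does not belong to. For a dissimilar triple, it comes from the head's own set. A minimal sketch follows, with entities given as name strings and the hypothetical function name `corrupt`.

```python
import random
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]  # (head entity, tail entity, relation)


def corrupt(triple: Triple, pos: Sequence[str], neg: Sequence[str],
            rng: random.Random) -> Triple:
    """Build the error triple of step 9.1 by replacing only the tail entity."""
    head, _, relation = triple
    head_is_positive = head in pos
    # similar -> draw from the opposite set; dissimilar -> draw from the head's own set
    draw_from_pos = head_is_positive == (relation == "dissimilar")
    new_tail = rng.choice(pos if draw_from_pos else neg)
    return head, new_tail, relation
```

Because only the tail changes, the correct triple and its error triple share the same head entity and relation vector during training.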

Step 9.2: calculate the error distance to obtain a prediction result;

The knowledge graph model calculates the error distance of each of the two triples. The error distance of a triple is obtained by a vector operation on the combined feature vectors of its two entities and the feature vector of its relation, using the following formula:

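$$\mathrm{dis} = \left\lVert \mathbf{h} \odot \mathbf{w} + \mathbf{r} - \mathbf{t} \odot \mathbf{w} \right\rVert \qquad (1)$$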

where:
- w is a 39-dimensional weight vector, randomly initialized, that multiplies the entity feature vectors element-wise;
- h is the combined feature vector of the head entity in the triple;
- t is the combined feature vector of the tail entity in the triple;
- r is the feature vector of the relation in the triple;

the error distances of the two triples are then compared to obtain the prediction result: the knowledge graph model predicts the triple with the smaller error distance to be the correct triple;

Step 9.3: calculate the loss value;

An error value is calculated from the predicted result and the actual result, and is defined as the loss value. The error distance dis_pos of the correct triple and the error distance dis_neg of the error triple are calculated with formula (1). If the error distance of the correct triple is smaller than that of the error triple, the prediction of the knowledge graph model is correct and no loss is incurred. Otherwise the judgment of the knowledge graph is considered wrong, and the difference between the error distances is taken as the loss value:

$$\mathrm{loss} = \max\{\mathrm{dis}_{pos} - \mathrm{dis}_{neg},\; 0\} \qquad (2)$$

where loss is the loss value, dis_pos is the error distance of the correct triple, and dis_neg is the error distance of the error triple;

Step 9.4: automatically adjust the parameters of the knowledge graph model;

repeat steps 9.1 to 9.4 for each triple sample in the triple data set obtained in step 8 until the loss value no longer decreases. Training then ends, and construction of the knowledge graph model is complete;
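Steps 9.2 to 9.4 can be sketched as one stochastic (sub)gradient update per pair of correct and error triples. The sketch makes these assumptions:
- The products in formula (1) are element-wise.
- The norm in formula (1) is the L1 norm.
- Only w and the relation vectors are adjusted; the combined entity vectors from step 6 are held fixed.
- "Automatically adjusting parameters" is realized as plain gradient descent with a fixed learning rate.

Because the error triple only replaces the tail, both triples share the head h and relation r. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vec = List[float]


def error_distance(h: Sequence[float], r: Sequence[float],
                   t: Sequence[float], w: Sequence[float]) -> float:
    """Formula (1) with element-wise products and an L1 norm (both assumed)."""
    return sum(abs(hk * wk + rk - tk * wk) for hk, rk, tk, wk in zip(h, r, t, w))


def margin_loss(dis_pos: float, dis_neg: float) -> float:
    """Formula (2): zero when the correct triple is already the closer one."""
    return max(dis_pos - dis_neg, 0.0)


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def sgd_step(h: Sequence[float], t_pos: Sequence[float], t_neg: Sequence[float],
             r: Vec, w: Vec, lr: float = 0.01) -> float:
    """Update w and r in place by a subgradient step; return the loss before the update."""
    loss = margin_loss(error_distance(h, r, t_pos, w), error_distance(h, r, t_neg, w))
    if loss > 0:
        for k in range(len(w)):
            s_pos = _sign(h[k] * w[k] + r[k] - t_pos[k] * w[k])
            s_neg = _sign(h[k] * w[k] + r[k] - t_neg[k] * w[k])
            grad_w = s_pos * (h[k] - t_pos[k]) - s_neg * (h[k] - t_neg[k])
            grad_r = s_pos - s_neg
            w[k] -= lr * grad_w
            r[k] -= lr * grad_r
    return loss


def train(samples: Sequence[Tuple[Vec, Vec, Vec, str]], relations: Dict[str, Vec],
          w: Vec, lr: float = 0.01, max_epochs: int = 100) -> float:
    """samples: (h, t_pos, t_neg, relation name). Stop once the epoch loss stops decreasing."""
    best = float("inf")
    for _ in range(max_epochs):
        total = sum(sgd_step(h, tp, tn, relations[rel], w, lr) for h, tp, tn, rel in samples)
        if total >= best:
            break
        best = total
    return best
```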

After training, the knowledge graph model judges the relationship between a pair of entities as follows. The combined feature vectors of the pair are input into the model, which uses formula (1) to calculate the error distance of the pair under each of the two relations, similar and dissimilar. The relation with the smaller error distance is the relationship between the two entities predicted by the knowledge graph model.
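The relation judgment can be sketched as picking whichever relation vector gives the smaller formula-(1) distance. The sketch assumes element-wise products and an L1 norm in formula (1). `judge_relation` and the toy vectors are illustrative, not trained values.

```python
from typing import Dict, Sequence


def error_distance(h, r, t, w) -> float:
    """Formula (1) with element-wise products and an L1 norm (both assumptions)."""
    return sum(abs(hk * wk + rk - tk * wk) for hk, rk, tk, wk in zip(h, r, t, w))


def judge_relation(h: Sequence[float], t: Sequence[float],
                   relations: Dict[str, Sequence[float]], w: Sequence[float]) -> str:
    """Return the relation name ('similar' or 'dissimilar') with the smaller error distance."""
    return min(relations, key=lambda name: error_distance(h, relations[name], t, w))


relations = {"similar": [0.0, 0.0], "dissimilar": [5.0, 5.0]}
w = [1.0, 1.0]
print(judge_relation([1.0, 2.0], [1.0, 2.0], relations, w))  # similar
print(judge_relation([0.0, 0.0], [5.0, 5.0], relations, w))  # dissimilar
```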

Step 10: use the trained knowledge graph model to predict the probability of a landslide at a place to be assessed, where the place lies within the research area of step 1;

obtain the values of the geographic and geological attributes recorded in step 1 for the place to be assessed. The probability of a landslide at the place is then determined as follows:

step 10.1: process the attribute values of the place to be assessed in the same way as in step 2 to obtain the feature vector of a new entity;

step 10.2: apply the feature combination model trained in step 6 to the feature vector of the new entity to obtain its combined feature vector;

step 10.3: pair the new entity with every entity in the positive sample set, and input the combined feature vectors of each pair into the knowledge graph model. The model outputs the relationship between the new entity and each positive entity. Let a be the number of similar relationships and n the number of entities in the positive sample set; the ratio a/n is taken as the probability of a landslide at the place to be assessed.
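Step 10.3 can be sketched as below. The sketch assumes:
- the new entity's combined feature vector is already available from steps 10.1-10.2;
- the new entity is placed in the head position, since the text does not fix the order;
- formula (1) uses element-wise products and an L1 norm.

`landslide_probability` is a hypothetical name.

```python
from typing import Dict, Sequence


def error_distance(h, r, t, w) -> float:
    """Formula (1) with element-wise products and an L1 norm (both assumptions)."""
    return sum(abs(hk * wk + rk - tk * wk) for hk, rk, tk, wk in zip(h, r, t, w))


def landslide_probability(new_vec: Sequence[float], positives: Sequence[Sequence[float]],
                          relations: Dict[str, Sequence[float]],
                          w: Sequence[float]) -> float:
    """Fraction a/n of positive entities judged 'similar' to the new entity."""
    a = 0
    for p in positives:
        # new entity as head, positive entity as tail (ordering assumed)
        best = min(relations, key=lambda name: error_distance(new_vec, relations[name], p, w))
        a += best == "similar"
    return a / len(positives)
```

For example, if 2 of 3 positive entities are judged similar to the new entity, the returned probability is 2/3.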

A system implementing the landslide risk prediction method based on knowledge graph construction comprises a training unit and a risk prediction unit;

the training unit constructs the knowledge graph model and trains it on existing historical landslide data, producing a knowledge graph model that judges relationships. The risk prediction unit uses this model and the historical landslide data from the training unit to calculate the probability of a landslide at a new place in the same region, based on the attribute values of that place's relevant geographic conditions.

The training unit comprises a data collection module, a data processing and generating module, a feature combination module, a relation feature vector generating module, a triple data set constructing module and a relation judgment model module;

the data collection module inputs the geographic and geological attribute values related to each landslide event in all landslide records of the research area to form the basic attribute data set, and passes the result to the data processing and generating module;

the data processing and generating module works as follows:
- it processes the data in the basic attribute data set and generates the positive sample set, recording each data group in the basic attribute data set as the feature vector of the corresponding entity in the positive sample set;
- from the data group of each landslide event it generates a corresponding new data group, which produces the negative sample set;
- each new data group is recorded as the feature vector of the corresponding entity in the negative sample set and added to the original basic attribute data set to form the new basic attribute data set;
- the positive and negative sample sets are passed to the triple data set constructing module, and the new basic attribute data set is passed to the feature combination module;

the feature combination module trains the feature combination model, uses it to combine the features of each entity's feature vector, and passes the result to the relation judgment model module;

the relation feature vector generating module generates the feature vectors of the relations in the knowledge graph and passes the result to the relation judgment model module;

the triple data set constructing module constructs the triple data set and passes the result to the relation judgment model module;

the relation judgment model module trains the knowledge graph model that judges the relationship between entities.

The beneficial effects of the above technical solution are as follows. The method and system apply knowledge graph construction technology to landslide risk prediction, so prediction is no longer limited to building a probability model and calculating a final probability value. This is an entirely new research direction.

A machine learning method automatically discovers, from historical landslide data, the degree to which each relevant geographic and geological factor influences landslide occurrence, so the influence weights of the factors are not set manually.

In addition, a feature combination model is introduced. Compared with prior-art methods that simply apply an artificial neural network, this accounts for the nonlinear relationships among the factors and further explores their influence on landslide events. The scheme can therefore achieve higher landslide prediction accuracy.

Drawings

FIG. 1 is a schematic diagram of the vector representation of a correct relationship, in terms of the head entity feature vector, the relation feature vector and the tail entity feature vector, in a representation-learning-based knowledge graph according to an embodiment of the present invention;

FIG. 2 is a schematic illustration of landslide events in the form of a knowledge graph in accordance with an embodiment of the present invention;

FIG. 3 is a flow chart of a landslide risk prediction method based on knowledge graph construction in an embodiment of the present invention;

FIG. 4 is a schematic diagram of a feature combination method used in the present invention according to an embodiment of the present invention;

FIG. 5 is a schematic illustration of an operation of predicting a landslide risk at a location in accordance with an embodiment of the present invention;

FIG. 6 is a schematic diagram of a landslide risk prediction system based on knowledge graph construction according to an embodiment of the present invention.

Detailed Description

In landslide risk prediction, spatial prediction is more important than temporal prediction: if the locations where landslides are likely to occur cannot be determined, the time of landslide occurrence cannot be predicted either. Research results on spatial landslide risk prediction are still relatively scarce.

The main theoretical basis of the invention is the following approach to spatial landslide risk prediction. Based on existing landslide data, the probability of a landslide at a place to be assessed is judged by comparing the geographic and geological conditions of places where landslides have already occurred with those of the place to be assessed.

Knowledge graph construction based on representation learning is a popular direction in current knowledge graph research. The TransE model is its most representative model, and other representation-learning-based knowledge graph models are improvements built on the theory of TransE.

The main idea of TransE is to express the associations between entities and relations as numerical computations, with each relationship expressed as a triple. It assumes that every triple satisfies the following: the head entity feature vector plus the relation feature vector equals the tail entity feature vector, as shown schematically in vector space in FIG. 1. Based on this assumption, the feature vectors of all head entities, relations and tail entities are trained by machine learning to obtain a vector representation that satisfies the assumption as closely as possible.

TransE opened a new research approach in the knowledge graph field, namely knowledge graph construction based on representation learning, and this approach has been shown to work well.

An entity in a knowledge graph is a specific object in the real world, and the knowledge graph shows the associations among real-world entities. Both landslide and no-landslide events can be treated as entities. The relationship between a landslide event and a no-landslide event is dissimilar, and the relationship between two landslide events is similar. FIG. 2 is a schematic diagram of a knowledge graph with 8 entities e1-e8, in which the connections between entities are their relationships, either similar or dissimilar.

This matches the spatial landslide risk prediction approach described above: judging, from existing landslide data, the probability of a landslide at a place to be assessed by comparing the geographic and geological conditions of places where landslides have occurred with those of that place. Under this premise, the knowledge graph idea can be combined with the landslide risk prediction task. A risk prediction system is built by processing the existing data and constructing and training machine learning models, and it performs the risk prediction task.

The present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which embodiments of the invention are shown by way of illustration and not of limitation.

The landslide risk prediction method and system based on knowledge graph construction are described below; the specific flow chart is shown in fig. 3. The method comprises the following steps:

The selected study area is the typical Jurassic red-bed area of the Three Gorges Reservoir region. Latitude, climatic conditions and human activities differ from area to area and strongly influence the local geological environment, so the same attribute may carry a different weight in landslide occurrence in different areas. The study area therefore has to be determined first.

The Ziguo guo River landslide occurred in this area, and its geographic and geological attribute values are recorded as follows: leading and trailing edge elevations of 135 m and 432 m; sliding mass volume of 16000000 m³; average sliding mass thickness of 40 m; sliding mass composed of layered quartz sandstone and silty sandstone blocky fractured rock; sliding bed lithology J1x; rock stratum attitude 10°∠36°; slope aspect 340°; slope type consequent slope; slope shape stepped; slope gradient 25° to 45°. The data group corresponding to the Ziguo guo River landslide is therefore (135-432, 16000000, 40, layered quartz sandstone and silty sandstone blocky fractured rock, J1x, 10°∠36°, 340°, consequent slope, stepped, 25°-45°).

Step 1: counting the values of each attribute in the data set. Some numerical attributes contain several values; for example, the attribute "leading and trailing edge elevation" contains two values, the leading edge elevation and the trailing edge elevation.

Step 2: processing data in the basic attribute data set;

The Ziguo guo River landslide data group (135-432, 16000000, 40, layered quartz sandstone and silty sandstone blocky fractured rock, J1x, 10°∠36°, 340°, consequent slope, stepped, 25°-45°) is processed as follows: 135-432 is split into 135 and 432; 16000000 keeps its original value; 40 keeps its original value; the sliding mass composition "layered quartz sandstone and silty sandstone blocky fractured rock" is encoded as the number 6; J1x is encoded as 1; 10°∠36° is split into 10 and 36; 340 keeps its original value; consequent slope is encoded as 1; stepped is encoded as 1; and 25°-45° is split into 25 and 45. The final result is (135, 432, 16000000, 40, 6, 1, 10, 36, 340, 1, 1, 25, 45).
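The step 2 processing above can be sketched as a small encoder. This is a minimal illustration, not the patent's implementation. The code tables contain only codes that appear in the worked examples (the full tables, such as all 20 sliding mass materials, are not given in the text), and the record field names are hypothetical.

```python
# Hypothetical code tables holding only the codes used in the worked examples.
MATERIAL_CODES = {
    "layered quartz sandstone and silty sandstone blocky fractured rock": 6,
    "collapse-and-slide accumulation: silty clay with crushed stone": 1,
}
LITHOLOGY_CODES = {"J1x": 1, "J2S": 8}
SLOPE_TYPE_CODES = {"consequent slope": 1, "oblique slope": 2, "reverse slope": 3}
SLOPE_SHAPE_CODES = {"stepped": 1, "straight": 2}


def split_pair(text, sep):
    """'135-432' -> [135.0, 432.0]; '10∠36' -> [10.0, 36.0]."""
    return [float(part) for part in text.split(sep)]


def encode(record):
    """Turn one raw data group into the 13 basic feature values of step 2."""
    return [
        *split_pair(record["elevation"], "-"),  # leading, trailing edge
        record["volume"],                       # kept as is
        record["thickness"],                    # kept as is
        MATERIAL_CODES[record["material"]],
        LITHOLOGY_CODES[record["lithology"]],
        *split_pair(record["attitude"], "∠"),   # dip direction, dip angle
        record["aspect"],                       # kept as is
        SLOPE_TYPE_CODES[record["slope_type"]],
        SLOPE_SHAPE_CODES[record["slope_shape"]],
        *split_pair(record["gradient"], "-"),   # minimum, maximum gradient
    ]


ZIGUO = {
    "elevation": "135-432", "volume": 16000000, "thickness": 40,
    "material": "layered quartz sandstone and silty sandstone blocky fractured rock",
    "lithology": "J1x", "attitude": "10∠36", "aspect": 340,
    "slope_type": "consequent slope", "slope_shape": "stepped", "gradient": "25-45",
}
features = encode(ZIGUO)
```

Splitting the ranges into separate numbers keeps every entry numeric. This is what lets the later tree model and vector arithmetic consume the data group directly.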

This landslide event is recorded as an entity in the knowledge graph named p₁, and entity p₁ is added to the positive sample set pos.

The composition of the sliding mass material is taken as the independent variable, and there are 20 sliding mass material types. Taking "collapse-and-slide accumulation: silty clay with crushed stone" as an example, the data groups of all landslide events from step 1 whose sliding mass material is of this type are counted. The value ranges of the 5 dependent variables are shown in the following table:

Slope type attribute value: if step 4 recorded a value for this statistical item, the corresponding value in the original data group of the basic attribute data set is replaced by one of the other two slope type codes, chosen at random. If step 4 recorded no value for this item, one of the leading and trailing edge elevation, sliding mass volume, average sliding mass thickness and slope gradient is selected instead, and the selected value is processed according to the rule for that dependent variable attribute.

In this way, each data group in the basic attribute data set obtained in step 1 generates one corresponding new data group, and all new data groups are added to the basic attribute data set obtained in step 1 to produce a new basic attribute data set.

Take one landslide event counted in step 1: the geographic and geological factor attribute data of the Tenn von domestic wharf landslide are (135-250, 3600000, 15, collapse-and-slide accumulation: silty clay with crushed stone, J2S, 315°∠18°, 310°, consequent slope, straight, 20°-20°). Processing them as in step 2 gives (135, 250, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20). The sliding mass material is the independent variable, and its attribute value 1 is the code for "collapse-and-slide accumulation: silty clay with crushed stone". The value ranges of the dependent variable attribute values counted in step 4 are as follows:

One of the following is selected at random: the leading and trailing edge elevation values, the sliding mass volume value, the average sliding mass thickness value, the slope type value, or the slope gradient values.

If the leading and trailing edge elevation values are selected, two values less than 70 are generated at random. With generated values 45 and 52, the new data group is (45, 52, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20).

If the sliding mass volume value is selected, one value less than 2160000 is generated at random. With generated value 1160000, the new data group is (135, 250, 1160000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20).

If the average sliding mass thickness value is selected, one value less than 8 is generated at random. With generated value 5, the new data group is (135, 250, 3600000, 5, 1, 8, 315, 18, 310, 1, 2, 20, 20).

If the slope type value is selected, the value 1 denotes a consequent slope, so one of the other two slope types, reverse slope or oblique slope, is selected at random. If the reverse slope is selected, its code is 3, and the new data group is (135, 250, 3600000, 15, 1, 8, 315, 18, 310, 3, 2, 20, 20).

If the slope gradient values are selected, two values less than 8 are generated at random. With generated values 1 and 5, the new data group is (135, 250, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 1, 5).

The new data group obtained in whichever of the five cases applies is used as the feature vector of an entity in the knowledge graph. This entity represents a non-landslide event and is named s₁, and entity s₁ is added to the negative sample set neg.
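The negative sample generation of step 5 can be sketched as follows. This is a hedged illustration. It assumes that the step 4 statistic for each numeric dependent variable acts as an upper limit for the random draw, because the worked example draws values "less than" 70, 2160000, 8 and 8. Under that assumption, the limits below are copied from the example. The field positions follow the 13-value order of step 2, and all names are hypothetical.

```python
import random

# Positions of the dependent variables in a 13-value data group (step 2 order).
ELEV = (0, 1)        # leading and trailing edge elevation
VOLUME = (2,)        # sliding mass volume
THICKNESS = (3,)     # average sliding mass thickness
GRADIENT = (11, 12)  # slope gradient range
SLOPE_TYPE = 9       # 1 consequent, 2 oblique, 3 reverse


def make_negative(group, bounds, slope_type=None, rng=random):
    """Return a perturbed copy of one landslide data group (step 5 sketch).

    bounds: numeric field -> upper limit for the random draw (assumed to be
    the step 4 statistic). slope_type: the single code recorded in step 4,
    or None, in which case only the numeric fields are candidates.
    """
    new = list(group)
    candidates = list(bounds) + ([SLOPE_TYPE] if slope_type is not None else [])
    pick = rng.choice(candidates)
    if pick == SLOPE_TYPE:
        # Replace with one of the other two slope type codes.
        new[SLOPE_TYPE] = rng.choice([c for c in (1, 2, 3) if c != slope_type])
    else:
        # Draw integers strictly below the limit; sorting keeps lower value first.
        values = sorted(rng.randrange(bounds[pick]) for _ in pick)
        for idx, value in zip(pick, values):
            new[idx] = value
    return new


landslide = [135, 250, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20]
limits = {ELEV: 70, VOLUME: 2160000, THICKNESS: 8, GRADIENT: 8}  # from the example
negative = make_negative(landslide, limits, slope_type=1, rng=random.Random(0))
```

Only one dependent variable is changed per data group, so each negative sample stays close to its landslide counterpart except in the factor that was pushed outside the observed range.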

This process is illustrated in fig. 4: the data in the new basic attribute data set are used to train a gradient random decision tree model. This model is a feature combination model that combines a gradient boosting framework with the decision tree algorithm. It tries different combinations of the basic feature values and selects the most suitable one, so that feature combination is carried out accurately and efficiently. After training, the model performs feature combination on the data group of each entity in the new basic attribute data set to obtain the combined feature vector of each entity.
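The text does not state how the 39 combined features are derived from the tree model. One widely used way to combine features with a gradient-boosted tree ensemble, as in the well-known GBDT plus logistic regression pipeline, is to record the index of the leaf each tree assigns to a sample. Under that encoding, an ensemble of 39 trees would give a 39-dimensional vector. The sketch below assumes this reading and uses two hand-written toy trees in place of a trained model. All names are hypothetical.

```python
# A toy tree is either a leaf id (int) or a split tuple
# (feature_index, threshold, left_subtree, right_subtree).
# These two trees are hand-written for illustration, not trained models.
TREES = [
    (3, 10, 0, 1),                    # split on average thickness
    (2, 2000000, 0, (0, 100, 1, 2)),  # volume, then leading edge elevation
]


def leaf_index(tree, x):
    """Walk one tree with feature vector x and return the leaf id reached."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def combine_features(trees, x):
    """Combined feature vector: one leaf id per tree in the ensemble."""
    return [leaf_index(tree, x) for tree in trees]


landslide = [135, 250, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20]
thin = [135, 250, 3600000, 5, 1, 8, 315, 18, 310, 1, 2, 20, 20]
```

Each leaf corresponds to a conjunction of threshold tests along its path, which is why leaf ids can serve as combined features.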

Step 7: generating a feature vector of a relation in the knowledge graph;

If the relationship between two entities is similar, the two entities are either both landslide events or both non-landslide events. If the relationship is dissimilar, one entity is a landslide event and the other is a non-landslide event.

A knowledge graph based on representation learning performs vector operations during computation, so its relationships are also represented as vectors. Each relation feature vector has the same dimension as the combined feature vectors that represent entities, namely 39. The initial values of the relation feature vectors can be generated randomly from a standard normal distribution.
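Step 7 and the labelling rule above can be sketched as follows. This is a minimal illustration in which the dimension 39 and the standard normal initialization come from the text, while the function names are hypothetical.

```python
import random

DIM = 39  # same dimension as the combined entity feature vectors


def init_relations(rng=random):
    """Initial 'similar' and 'dissimilar' vectors drawn from N(0, 1)."""
    return {
        "similar": [rng.gauss(0.0, 1.0) for _ in range(DIM)],
        "dissimilar": [rng.gauss(0.0, 1.0) for _ in range(DIM)],
    }


def true_relation(is_landslide_a, is_landslide_b):
    """Ground-truth label: similar iff both entities share the landslide status."""
    return "similar" if is_landslide_a == is_landslide_b else "dissimilar"


relations = init_relations(random.Random(7))
```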

The knowledge graph presents the objective connections that exist among real-world things. If one entity represents a landslide event, another entity representing a landslide event has a similar relationship with it, and an entity representing a non-landslide event has a dissimilar relationship with it. In practice, the constructed knowledge graph contains many entities representing landslide events and many representing non-landslide events, so exactly two relationships exist among the entities of the whole graph: similar and dissimilar.

Step 8: constructing a triple data set;
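Claim 6 requires equal numbers of similar and dissimilar triples, at least 1000 of each, mixed into one data set. The sketch below follows those constraints. How the entity pairs are sampled is not specified in the text, so the sampling here is an assumption, and the function name is hypothetical.

```python
import random


def build_triples(pos, neg, per_relation=1000, rng=random):
    """Balanced, shuffled triple set: per_relation triples for each relation.

    pos / neg are lists of entity names such as "p1" or "s1". The pair
    sampling scheme is an assumption; the text does not specify it.
    """
    triples = []
    for _ in range(per_relation):
        # similar: two landslide entities, or two non-landslide entities
        h, t = rng.sample(rng.choice([pos, neg]), 2)
        triples.append((h, t, "similar"))
    for _ in range(per_relation):
        # dissimilar: one landslide and one non-landslide entity, either order
        a, b = rng.choice(pos), rng.choice(neg)
        h, t = (a, b) if rng.random() < 0.5 else (b, a)
        triples.append((h, t, "dissimilar"))
    rng.shuffle(triples)  # mix the two relation types
    return triples


pos = [f"p{i}" for i in range(1, 11)]
neg = [f"s{i}" for i in range(1, 11)]
triples = build_triples(pos, neg, rng=random.Random(42))
```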

The method for generating the erroneous triples is as follows:

The two triples are input into the knowledge graph model;

step 9.2: calculating the error distance to obtain a prediction result

dis = |h*w + r - t*w|   (1)

Step 9.3: calculating the loss value

An error value, defined as the loss value, is calculated from the predicted result and the actual result. The error distance dis_pos of the correct triple and the error distance dis_neg of the erroneous triple are each calculated with formula (1). If the error distance of the correct triple is smaller than that of the erroneous triple, the prediction of the knowledge graph model is correct and no loss is generated. Otherwise, the judgment of the knowledge graph is wrong, and the difference between the two error distances is used as the loss value, calculated as follows:

loss = max{dis_pos - dis_neg, 0}   (2)

Step 9.4: automatically adjusting the parameters of the knowledge graph model;
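Steps 9.2 to 9.4 can be sketched as one gradient update using formulas (1) and (2). The text does not define w or the norm, so this sketch makes several assumptions. It reads w as a weight vector applied element-wise to both head and tail, and it takes |·| as the L1 norm. It holds the entity vectors fixed (they come from the tree model), so only r and w are adjusted, by plain subgradient descent. Function names and the learning rate are hypothetical.

```python
def l1(v):
    return sum(abs(x) for x in v)


def dist(h, r, t, w):
    """Formula (1), dis = |h*w + r - t*w|, with * read element-wise (assumption)."""
    return l1([hi * wi + ri - ti * wi for hi, ri, ti, wi in zip(h, r, t, w)])


def sign(x):
    return (x > 0) - (x < 0)


def sgd_step(correct, wrong, r, w, lr=0.01):
    """One update on a (correct, erroneous) pair of triples sharing relation r.

    correct and wrong are (h, t) pairs of combined feature vectors.
    Returns loss (2); r and w are updated in place when the loss is positive.
    """
    (hp, tp), (hn, tn) = correct, wrong
    loss = max(dist(hp, r, tp, w) - dist(hn, r, tn, w), 0.0)
    if loss == 0.0:
        return 0.0  # correct triple already closer: nothing to adjust
    for i in range(len(r)):
        sp = sign(hp[i] * w[i] + r[i] - tp[i] * w[i])
        sn = sign(hn[i] * w[i] + r[i] - tn[i] * w[i])
        grad_r = sp - sn                                      # d loss / d r_i
        grad_w = sp * (hp[i] - tp[i]) - sn * (hn[i] - tn[i])  # d loss / d w_i
        r[i] -= lr * grad_r
        w[i] -= lr * grad_w
    return loss


# Usage: a pair the model currently gets wrong; the loss shrinks after one step.
r, w = [0.0], [1.0]
correct, wrong = ([2.0], [0.0]), ([0.0], [0.0])
first = sgd_step(correct, wrong, r, w)
second = sgd_step(correct, wrong, r, w)
```

Formula (2) as given has no margin term, so training stops adjusting a pair as soon as the correct triple is even slightly closer than the erroneous one. The sketch keeps that behaviour rather than adding a margin.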

After training, the knowledge graph model can judge the relationship between a pair of entities as follows. The combined feature vectors of the pair are input into the model, which uses formula (1) to compute the error distance of the pair under each of the two relationships, similar and dissimilar. The relationship with the smaller error distance is the relationship predicted by the model.

During training of the knowledge graph, a loss value must be calculated to represent the error between the correct result and the incorrect result. The loss value is propagated back to the model, which adjusts its parameters using the loss value and a machine learning method, so that the parameters converge to a suitable set of values that minimizes the loss. The purpose of model training is to find the parameter values at which the loss is minimized.

Example of the loss value calculation. Consider the triple (p₁, s₂, dissimilar), where p₁ belongs to the positive sample set pos. An entity p₄ is randomly selected from pos to replace the tail entity, generating the erroneous triple (p₁, p₄, dissimilar). Both triples are passed to the knowledge graph model, which calculates their error distances internally.

The combined feature vector of entity p₁ is h₁, that of entity s₂ is t₁, and the feature vector of the dissimilar relationship is r₁. The error distance dis₁ of the triple (p₁, s₂, dissimilar) is then calculated with formula (1), and the error distance dis₂ of the erroneous triple (p₁, p₄, dissimilar) is calculated in the same way.

If dis₁ is greater than dis₂, the error distance that the knowledge graph model computes for the triple (p₁, s₂, dissimilar) exceeds that of the erroneous triple (p₁, p₄, dissimilar). In other words, the knowledge graph treats the erroneous triple as the correct one, so the model's prediction differs from the actual result, and by formula (2) the loss value is dis₁ - dis₂.

If dis₁ is less than dis₂, the error distance of the triple (p₁, s₂, dissimilar) is smaller than that of the erroneous triple (p₁, p₄, dissimilar). In other words, the knowledge graph treats the correct triple as correct, so the model's prediction agrees with the actual result, and the loss value is 0.

Example of judging the relationship between two entities with the trained knowledge graph model. The task is to determine the relationship between entity p₅ and entity p₆. The feature vector of p₅ is h₃, the feature vector of p₆ is t₃, the feature vector of the dissimilar relationship is r₁, and that of the similar relationship is r₂. The error distance dis₃ of the triple (p₅, p₆, dissimilar) and the error distance dis₄ of the triple (p₅, p₆, similar) are calculated with formula (1). If dis₃ < dis₄, the model predicts that p₅ and p₆ are dissimilar; if dis₃ > dis₄, it predicts that they are similar. In practice, the case dis₃ = dis₄ does not occur.
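The decision rule above, which picks the relation with the smaller error distance, can be sketched as follows. This sketch assumes an element-wise reading of * in formula (1) and the L1 norm, neither of which the text defines. The vectors are toy values, and the names are hypothetical.

```python
def dist(h, r, t, w):
    """Formula (1), |h*w + r - t*w|, assuming element-wise * and the L1 norm."""
    return sum(abs(hi * wi + ri - ti * wi) for hi, ri, ti, wi in zip(h, r, t, w))


def predict_relation(h, t, relations, w):
    """Return the relation name whose error distance for (h, t) is smallest."""
    return min(relations, key=lambda name: dist(h, relations[name], t, w))


# Toy values: r for "similar" fits the pair (h3, t3) exactly.
w = [1.0, 1.0]
relations = {"similar": [0.0, 0.0], "dissimilar": [5.0, 5.0]}
h3, t3 = [1.0, 0.0], [1.0, 0.0]
```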

Step 10: using the trained knowledge graph model to predict the probability of a landslide at the place to be assessed, which lies within the study area of step 1.

The landslide probability at a given place in the same area is predicted as illustrated in fig. 5. The future event at the place to be assessed is regarded as an entity in the knowledge graph, named x. First, data are collected for the geographic and geological factor attributes listed in step 1 to obtain the basic attribute data of the place. These data are then processed as in step 2 to obtain the feature vector of entity x. This feature vector is passed to the gradient random decision tree model trained in step 6, which performs feature combination on it to obtain the combined feature vector of x. Finally, the combined feature vector of each entity in the positive sample set is input into the trained knowledge graph model together with the combined feature vector of x, and the number of pairs predicted as similar is recorded.

Suppose the positive sample set contains 100 entities, p₁, p₂, …, p₁₀₀. Each of them forms an entity pair with x: (p₁, x), (p₂, x), …, (p₁₀₀, x). Taking the pair (p₁, x) as an example, the combined feature vectors of p₁ and x are passed to the knowledge graph model trained in step 9, which predicts the relationship between them. The relationships of all entity pairs are predicted in this way. If 90 pairs are predicted as similar, the landslide probability is 90/100.
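The probability estimate of step 10 is the fraction of positive-sample pairs judged similar. The sketch below reproduces the 90/100 example. The `predict` argument stands for the trained relation judgment and is a hypothetical callable; a stub replaces the real model here.

```python
def landslide_probability(pos_vectors, x_vector, predict):
    """Fraction of positive-sample entities predicted "similar" to entity x.

    predict(h, t) is a hypothetical callable standing in for the trained
    knowledge graph model; it returns "similar" or "dissimilar".
    """
    hits = sum(1 for p in pos_vectors if predict(p, x_vector) == "similar")
    return hits / len(pos_vectors)


def stub_predict(p, x):
    """Stub reproducing the worked example: 90 of 100 pairs judged similar."""
    return "similar" if p < 90 else "dissimilar"


positives = list(range(100))
probability = landslide_probability(positives, "x", stub_predict)
```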

A system for implementing the landslide risk prediction method based on knowledge graph construction, shown in fig. 6, comprises a training unit and a risk prediction unit.

The training unit constructs a knowledge graph model and trains it with existing historical landslide data, yielding a knowledge graph model that can judge relationships. The risk prediction unit uses this model and the historical landslide data from the training unit to calculate the landslide probability of a new place in the same area from that place's geographic condition attribute values.

The data collection module collects the geographic and geological attribute values of all recorded landslide events in the study area to form the basic attribute data set, and passes the result to the data processing and generating module;

The relation judgment model module trains a knowledge graph model that can judge the relationship between entities.

A schematic diagram of the data transfer between the parts of the system is shown in fig. 6. The output of the training unit is passed to the risk prediction unit. Within the training unit, the data flows are as follows. The data collection module passes its output to the data processing and generating module. The data processing and generating module passes the positive and negative sample sets it produces to the triple data set construction module, and passes the new basic attribute data set it generates to the feature combination module. The outputs of the feature combination module, the relation feature vector generation module and the triple data set construction module are all passed to the relation judgment model module.

Claims

1. A landslide risk prediction method based on knowledge graph construction, characterized by comprising the following steps:

Step 2: processing data in the basic attribute data set;

Step 3: recording each landslide event counted in step 1 as an entity in the knowledge graph, the entities being named p₁, p₂, …, p_x, …, p_y, …, p_n; these entities, representing landslide events, are defined as positive samples and form a positive sample set, denoted pos = {p₁, p₂, …, p_x, …, p_y, …, p_n}; and recording each data group in the basic attribute data set as the feature vector of the corresponding entity in the positive sample set;

Step 4: setting the composition of the sliding mass material as the independent variable, and setting the geological factor attribute values directly correlated with it, namely the leading and trailing edge elevation, sliding mass volume, average sliding mass thickness, slope gradient and slope type attribute values, as dependent variables; grouping the data groups with the same sliding mass composition attribute value into one group, and counting the value range of each dependent variable group by group;

Step 5: according to the value ranges counted in step 4 and the geographic and geological factor attribute values counted in step 1, generating one new data group for each landslide event counted in step 1; each new data group is also recorded as an entity in the knowledge graph, the entities being named s₁, s₂, …, s_x, …, s_y, …, s_n; these entities, representing non-landslide events, are defined as negative samples and form a negative sample set, denoted neg = {s₁, s₂, …, s_x, …, s_y, …, s_n}; and recording each new data group as the feature vector of the corresponding entity in the negative sample set;

adding all new data groups to the basic attribute data set obtained in step 1 to generate a new basic attribute data set;

Step 7: generating a feature vector of a relation in the knowledge graph;

Step 8: constructing a triple data set;

2. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the step 2 further comprises:

processing all data groups in the basic attribute data set according to the processing rule for each geological factor attribute value, each newly obtained data group having 13 basic feature values.

3. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the step 4 further comprises:

the statistical method for the value range of each dependent variable is as follows:

slope type attribute value: a character-type attribute with 3 types, namely consequent slope, oblique slope and reverse slope, represented by the numbers 1 to 3 after step 2 is executed; if all data groups in the group have the same slope type, the code of that type is recorded, otherwise nothing is recorded.

4. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the new data group corresponding to each landslide event is generated in step 5 by the following method:

5. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the feature combination model trained in step 6 is a gradient stochastic decision tree model.

6. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the step 8 further comprises:

generating triple samples according to the above method, wherein the number of samples with the similar relationship equals the number with the dissimilar relationship, at least 1000 triple samples are generated for each relationship, and the samples are mixed to form the triple data set.

7. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the step 9 further comprises:

the method for generating the erroneous triples is as follows:

inputting the two triples into the knowledge graph model;

step 9.2: calculating the error distance to obtain a prediction result

dis = |h*w + r - t*w|   (1)

Step 9.3: calculating the loss value

calculating an error value, defined as the loss value, from the predicted result and the actual result; the error distance dis_pos of the correct triple and the error distance dis_neg of the erroneous triple are each calculated with formula (1); if the error distance of the correct triple is smaller than that of the erroneous triple, the prediction of the knowledge graph model is correct and no loss is generated; otherwise, the judgment of the knowledge graph is wrong, and the difference between the two error distances is used as the loss value, calculated as follows:

loss = max{dis_pos - dis_neg, 0}   (2)

Step 9.4: automatically adjusting the parameters of the knowledge graph model;

8. The landslide risk prediction method based on knowledge graph construction of claim 1 wherein said step 10 further comprises:

9. The system for realizing the landslide risk prediction method based on knowledge graph construction according to claim 1, comprising: a training unit and a risk prediction unit;

10. The system for implementing the knowledge-graph-based landslide risk prediction method according to claim 1, wherein the training unit comprises: a data collection module, a data processing and generating module, a feature combination module, a relation feature vector generating module, a triple data set constructing module and a relation judgment model module;