CN111639878A - Landslide risk prediction method and system based on knowledge graph construction - Google Patents

Landslide risk prediction method and system based on knowledge graph construction

Info

Publication number
CN111639878A
CN111639878A CN202010516705.4A CN202010516705A CN111639878A CN 111639878 A CN111639878 A CN 111639878A CN 202010516705 A CN202010516705 A CN 202010516705A CN 111639878 A CN111639878 A CN 111639878A
Authority
CN
China
Prior art keywords
value
landslide
attribute
knowledge graph
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010516705.4A
Other languages
Chinese (zh)
Other versions
CN111639878B (en)
Inventor
马连博
王经纬
王兴伟
朱万成
张鹏海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China
Priority to CN202010516705.4A
Publication of CN111639878A
Application granted
Publication of CN111639878B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

A landslide risk prediction method and system based on knowledge graph construction. The system comprises a training unit and a risk prediction unit, wherein the training unit comprises: a data acquisition module, a data processing and generating module, a feature combination module, a relation feature vector generating module, a triple data set constructing module and a relation judging module. The method comprises the following steps: acquiring the landslide events that have occurred in a research area to form a basic attribute data set; processing the data; constructing a positive sample set; constructing a negative sample set to generate a new basic attribute data set; training a feature combination model with the new basic attribute data set, and performing feature combination on the feature vectors to generate 39-dimensional combined feature vectors; generating the feature vectors of the relations in the knowledge graph; constructing a triple data set; training a knowledge graph model with the triple data set; and predicting the landslide occurrence probability of the place to be tested. By constructing a knowledge graph and applying it to landslide risk prediction, higher landslide prediction accuracy is obtained.

Description

Landslide risk prediction method and system based on knowledge graph construction
Technical Field
The invention belongs to the technical field of landslide risk prediction, and particularly relates to a landslide risk prediction method and system based on knowledge graph construction.
Background
Landslides are a common natural geological disaster and pose a great threat to the safety of people's lives and property. Predicting the risk that a landslide event occurs at a given place is important work and an active research direction for professionals. Landslide forecasting is divided into spatial forecasting and temporal forecasting. The site of a landslide event usually has some distinctive geographic and geological factors; studying these factors reveals the environmental conditions under which landslide events occur, so that the probability of a landslide event at a given site can be predicted. In recent years, researchers have proposed many prediction methods, such as: using the Bayesian probability formula to calculate the probability of a landslide event after various factors are superposed; predicting the occurrence probability of a landslide event with probabilistic prediction models such as the support vector machine and the likelihood ratio model; and using artificial neural networks to automatically find the numerical relations by which various factors lead to landslide events, and predicting the probability of landslide occurrence from the attribute values.
Applying probabilistic models (such as the Bayesian probability method, the support vector machine and the likelihood ratio model) to landslide risk prediction has some problems: the degree to which each factor influences the occurrence of a landslide event must be set by manual assumption or cannot be determined at all, which ultimately affects the accuracy of the prediction result. Methods based on artificial neural networks only let the network automatically learn the influence of each factor and do not consider the interrelations among the factors. Therefore, there is currently no reasonable, effective and sufficiently accurate method for landslide risk prediction.
Knowledge graph construction belongs to the field of natural language processing. A knowledge graph represents the associations among things in the real world and expresses them in the form of a graph, so as to present all the explicit and hidden relations that objectively exist. In a knowledge graph, an entity represents a specific object that objectively exists in the real world, and a relation represents an association between entities. Representation learning is one way to construct a knowledge graph: text information is represented as vectors, the associations between words are converted into relations between vectors, different words are distinguished by their vectors, and the vectors represent the meaning of the words well. A knowledge graph based on representation learning expresses each relation in the graph as a triple of head entity, relation and tail entity, replaces the head entity, relation and tail entity with their corresponding feature vectors, and builds a model that identifies correct relations according to the objective, thereby constructing the knowledge graph.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a landslide risk prediction method and system based on knowledge graph construction. The invention belongs to spatial forecasting and predicts landslide risk by using machine learning together with the construction idea of the knowledge graph. Specifically, based on the idea of knowledge graph construction by representation learning, a method and a system are designed that predict the occurrence probability of a landslide event from attribute values describing the local geographic conditions. The method combines the construction idea of the representation-learning knowledge graph with the task of landslide risk prediction, and judges the possibility of a landslide event at a place by judging the relation between, and drawing an analogy among, places described by different geographic conditions.
A landslide risk prediction method based on knowledge graph construction comprises the following steps:
step 1: obtaining the landslide events that have occurred in a research area, and recording, for each landslide event, the attribute values of the relevant local geographic and geological factors at the time of the event, wherein the recorded attributes comprise: front and rear edge elevation, sliding mass material volume, sliding mass average thickness, sliding mass material composition, sliding bed lithology, rock stratum attitude, slope direction, slope type, slope form and slope gradient; the value corresponding to an attribute is an attribute value, each landslide event corresponds to one data group containing the attribute values of these 10 geological factors, and the data groups of all landslide events form a basic attribute data set;
step 2: processing the data in the basic attribute data set;
according to the characteristics of each attribute, each attribute value is processed so that text-type attribute values are expressed in numerical form and numerical attribute values containing several numbers are split or merged, specifically as follows:
front and rear edge elevation attribute value: contains two numerical items, which are taken as two basic feature values;
sliding mass material volume attribute value: contains one numerical item, which is taken as one basic feature value;
sliding mass average thickness attribute value: contains two numerical items, whose average is taken as one basic feature value;
sliding mass material composition attribute value: a text-type attribute value with 20 categories in total, numbered 1-20, the number representing the category;
sliding bed lithology attribute value: a text-type attribute value with 13 categories in total, numbered 1-13, the number representing the category;
rock stratum attitude attribute value: contains two numerical items, which are taken as two basic feature values;
slope direction attribute value: contains one numerical item, which is taken as one basic feature value;
slope type attribute value: a text-type attribute value with 3 categories in total, numbered 1-3, the number representing the category;
slope form attribute value: a text-type attribute value with 4 categories in total, numbered 1-4, the number representing the category;
slope gradient attribute value: contains two numerical items, which are taken as two basic feature values;
all data groups in the basic attribute data set are processed in this way, and each newly obtained data group has 13 basic feature values;
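The encoding of step 2 can be illustrated with a minimal Python sketch. The record field names, the ordering of the 13 values and the category tables below are assumptions made for illustration only; the text fixes just the number of categories (20, 13, 3 and 4) and, for slope type, the three names forward, oblique and reverse.

```python
# Partial, hypothetical code tables. Only the category counts come from the text.
MATERIAL = {"gravelly soil": 1, "clay": 2}
LITHOLOGY = {"mudstone": 1, "sandstone": 2}
SLOPE_TYPE = {"forward": 1, "oblique": 2, "reverse": 3}
SLOPE_FORM = {"convex": 1, "concave": 2, "straight": 3, "stepped": 4}


def encode(record):
    """Turn one raw landslide record into 13 basic feature values."""
    front, rear = record["edge_elevation"]            # two values, kept as two features
    thick_a, thick_b = record["average_thickness"]    # two values, merged into their mean
    attitude_a, attitude_b = record["attitude"]       # two values, kept as two features
    grad_a, grad_b = record["gradient"]               # two values, kept as two features
    return [
        float(front), float(rear),
        float(record["volume"]),
        (thick_a + thick_b) / 2.0,
        float(MATERIAL[record["material"]]),
        float(LITHOLOGY[record["lithology"]]),
        float(attitude_a), float(attitude_b),
        float(record["slope_direction"]),
        float(SLOPE_TYPE[record["slope_type"]]),
        float(SLOPE_FORM[record["slope_form"]]),
        float(grad_a), float(grad_b),
    ]


# Invented example record, used only to exercise the function.
example = {
    "edge_elevation": (420.0, 480.0),
    "volume": 12000.0,
    "average_thickness": (4.0, 6.0),
    "material": "gravelly soil",
    "lithology": "mudstone",
    "attitude": (135.0, 30.0),
    "slope_direction": 180.0,
    "slope_type": "forward",
    "slope_form": "convex",
    "gradient": (25.0, 40.0),
}
features = encode(example)
```

The two-valued attributes contribute 2 + 2 + 2 values, the thickness pair collapses to one, and the remaining six attributes contribute one value each, which gives the 13 basic feature values named in the text.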
step 3: recording each landslide event counted in step 1 as an entity in the knowledge graph, the entities being named p_1, p_2, …, p_x, …, p_y, …, p_n; these entities representing landslide events are defined as positive samples, and the positive samples form a positive sample set, denoted pos = {p_1, p_2, …, p_x, …, p_y, …, p_n}; each data group in the basic attribute data set is recorded as the feature vector of the corresponding entity in the positive sample set;
step 4: setting the sliding mass material composition as the independent variable, and setting the attribute values of the geological factors directly correlated with it, namely the front and rear edge elevation attribute value, the sliding mass material volume attribute value, the sliding mass average thickness attribute value, the slope gradient attribute value and the slope type attribute value, as dependent variables; grouping the data groups in the data set that share the same sliding mass material composition attribute value, and counting the value range of each dependent variable group by group, the statistical method for each dependent variable being as follows:
front and rear edge elevation attribute value: contains two values, representing the front edge elevation and the rear edge elevation; over all data groups with the same sliding mass material composition, the maximum and minimum of all these values are counted, the front edge and rear edge elevations not being distinguished and each being treated simply as a value; the result is a value range;
sliding mass material volume attribute value: the maximum and minimum of the volume values are counted; the result is a value range;
sliding mass average thickness attribute value: the maximum and minimum of the average thickness values are counted; the result is a value range;
slope gradient attribute value: contains two values, representing the maximum gradient and the minimum gradient of the slope; over all data groups with the same sliding mass material composition, the maximum and minimum of all these values are counted, the maximum and minimum gradients not being distinguished and each being treated simply as a value; the result is a value range;
slope type attribute value: a text-type attribute value with 3 categories, namely forward slope, oblique slope and reverse slope, represented by the numbers 1-3 after step 2; if the slope type is the same in all data groups of the group, the number of that type is recorded; otherwise nothing is recorded;
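The grouping and range statistics of step 4 can be sketched as follows. The column positions assume one particular ordering of the 13 basic feature values, and the sample rows are invented for illustration; neither is specified in the text.

```python
from collections import defaultdict

# Assumed positions inside the 13-value vector (hypothetical ordering).
FRONT, REAR, VOLUME, THICKNESS, MATERIAL = 0, 1, 2, 3, 4
SLOPE_TYPE, GRAD_A, GRAD_B = 9, 11, 12


def dependent_ranges(vectors):
    """Group vectors by material code and collect (min, max) per dependent variable."""
    groups = defaultdict(list)
    for vector in vectors:
        groups[vector[MATERIAL]].append(vector)

    ranges = {}
    for material, rows in groups.items():
        # Front and rear elevations are pooled, as are the two gradient values.
        elevation = [x for row in rows for x in (row[FRONT], row[REAR])]
        gradient = [x for row in rows for x in (row[GRAD_A], row[GRAD_B])]
        volume = [row[VOLUME] for row in rows]
        thickness = [row[THICKNESS] for row in rows]
        slope_types = {row[SLOPE_TYPE] for row in rows}
        ranges[material] = {
            "elevation": (min(elevation), max(elevation)),
            "volume": (min(volume), max(volume)),
            "thickness": (min(thickness), max(thickness)),
            "gradient": (min(gradient), max(gradient)),
            # Recorded only when every row of the group has the same slope type.
            "slope_type": next(iter(slope_types)) if len(slope_types) == 1 else None,
        }
    return ranges


rows = [
    [420.0, 480.0, 12000.0, 5.0, 1, 1, 135.0, 30.0, 180.0, 1, 1, 25.0, 40.0],
    [300.0, 390.0, 8000.0, 3.0, 1, 2, 90.0, 20.0, 170.0, 1, 2, 20.0, 35.0],
    [500.0, 560.0, 20000.0, 7.0, 2, 3, 45.0, 10.0, 90.0, 2, 3, 30.0, 45.0],
    [510.0, 600.0, 15000.0, 6.0, 2, 3, 50.0, 15.0, 95.0, 3, 3, 28.0, 50.0],
]
ranges = dependent_ranges(rows)
```

In this invented data the group with material code 1 has a single slope type, so its code is recorded, while the group with material code 2 mixes two slope types and records nothing.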
step 5: according to the value ranges counted in step 4 and the attribute values of the relevant geographic and geological factors counted in step 1, generating, for the data group of each landslide event counted in step 1, one corresponding new data group; each new data group represents an event in which no landslide occurs and is likewise recorded as an entity in the knowledge graph, the entities being named s_1, s_2, …, s_x, …, s_y, …, s_n; these entities representing no-landslide events are defined as negative samples, and the negative samples form a negative sample set, denoted neg = {s_1, s_2, …, s_x, …, s_y, …, s_n}; each new data group is recorded as the feature vector of the corresponding entity in the negative sample set; the specific method is as follows:
for each data group in the basic attribute data set, one of the five dependent variables counted in step 4 is selected at random and changed, while the attribute values of the other attributes are left unchanged, generating a new data group; the processing for each selectable dependent variable is as follows:
front and rear edge elevation attribute value: two values are randomly generated in the range below the minimum of the value range counted in step 4, representing the front edge elevation and the rear edge elevation, and replace the two corresponding values of the original data group;
sliding mass material volume attribute value: one value is randomly generated in the range below the minimum of the value range counted in step 4 and replaces the corresponding value of the original data group;
sliding mass average thickness attribute value: one value is randomly generated in the range below the minimum of the value range counted in step 4 and replaces the corresponding value of the original data group;
slope gradient attribute value: two values are randomly generated in the range below the minimum of the value range counted in step 4, representing the maximum gradient and the minimum gradient of the slope, and replace the two corresponding values of the original data group;
slope type attribute value: if step 4 recorded a value for this item, the value is replaced by one of the other two codes chosen at random, which replaces the corresponding value of the original data group; if step 4 recorded no value, one of the front and rear edge elevation, the sliding mass material volume, the sliding mass average thickness and the slope gradient is selected instead and processed as required for that dependent variable;
in this way, each data group in the basic attribute data set obtained in step 1 generates one new data group, and all new data groups are added to the basic attribute data set obtained in step 1 to produce a new basic attribute data set;
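The negative-sample generation of step 5 can be sketched as below. Two points are assumptions rather than statements of the text: the lower bound of "the range below the minimum" is taken to be 0, and the column positions follow a hypothetical ordering of the 13 basic feature values. The `ranges` argument is the per-group statistics of step 4 for the material group the vector belongs to.

```python
import random

# Assumed positions of the numeric dependent variables (hypothetical ordering).
COLUMNS = {"elevation": (0, 1), "volume": (2,), "thickness": (3,), "gradient": (11, 12)}
SLOPE_TYPE = 9


def make_negative(vector, ranges, rng):
    """Copy a positive vector and push one dependent variable out of its observed range."""
    negative = list(vector)
    choice = rng.choice(list(COLUMNS) + ["slope_type"])
    if choice == "slope_type" and ranges["slope_type"] is None:
        # No slope type was recorded for this group: pick one of the other four.
        choice = rng.choice(list(COLUMNS))
    if choice == "slope_type":
        others = [code for code in (1, 2, 3) if code != ranges["slope_type"]]
        negative[SLOPE_TYPE] = rng.choice(others)
    else:
        minimum = ranges[choice][0]
        for column in COLUMNS[choice]:
            # Assumption: draw uniformly from [0, minimum), i.e. below the group minimum.
            negative[column] = rng.random() * minimum
    return negative


rng = random.Random(0)
positive = [420.0, 480.0, 12000.0, 5.0, 1, 1, 135.0, 30.0, 180.0, 1, 1, 25.0, 40.0]
group_ranges = {
    "elevation": (300.0, 480.0),
    "volume": (8000.0, 12000.0),
    "thickness": (3.0, 5.0),
    "gradient": (20.0, 40.0),
    "slope_type": 1,
}
negative = make_negative(positive, group_ranges, rng)
```

Exactly one dependent variable differs between `positive` and `negative`; all other attribute values are carried over unchanged.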
step 6: training a feature combination model with the data groups in the new basic attribute data set generated in step 5, and then performing feature combination on each data group with the trained model; after feature combination, each feature vector in the new basic attribute data set changes from the original 13 dimensions to 39 dimensions, and the 39-dimensional vector is the combined feature vector of an entity;
the feature combination model is a gradient random decision tree model;
step 7: generating the feature vectors of the relations in the knowledge graph;
the knowledge graph constructed by the method has two relations: similar and dissimilar; the feature vector of each relation is initialized with randomly generated values, and its dimension equals that of the combined feature vector of an entity, namely 39;
if the relation between two entities is similar, both entities are landslide events or both are no-landslide events; if the relation is dissimilar, one of the two entities is a landslide event and the other is not;
step 8: constructing a triple data set;
each triple represents a relation that exists in the real world and is written as (head entity, tail entity, relation), the relation holding between the head entity and the tail entity; triple samples are generated in three ways:
randomly taking an entity p_x from the positive sample set pos and an entity s_y from the negative sample set neg, the relation being dissimilar, the triple is (p_x, s_y, dissimilar) or (s_y, p_x, dissimilar);
randomly taking two entities p_x and p_y from the positive sample set pos, the relation being similar, the triple is (p_x, p_y, similar);
randomly taking two entities s_x and s_y from the negative sample set neg, the relation being similar, the triple is (s_x, s_y, similar);
triple samples are generated in this way such that the number of samples with the similar relation equals the number with the dissimilar relation, at least 1000 triple samples are generated for each relation, and the samples are mixed to form the triple data set;
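A minimal sketch of the triple construction in step 8 follows. Entities are represented by string identifiers, and the choice to split the similar triples at random between positive pairs and negative pairs is an assumption; the text only requires that the similar and dissimilar counts be equal and at least 1000 each.

```python
import random


def build_triples(pos, neg, per_relation, rng):
    """Return a shuffled list of (head, tail, relation) with balanced relations."""
    triples = []
    for _ in range(per_relation):
        p, s = rng.choice(pos), rng.choice(neg)
        # Either ordering of a positive and a negative entity is dissimilar.
        triples.append((p, s, "dissimilar") if rng.random() < 0.5 else (s, p, "dissimilar"))
    for _ in range(per_relation):
        # Assumption: pick the source set of each similar pair at random.
        source = pos if rng.random() < 0.5 else neg
        head, tail = rng.sample(source, 2)
        triples.append((head, tail, "similar"))
    rng.shuffle(triples)
    return triples


pos = ["p1", "p2", "p3", "p4", "p5"]
neg = ["s1", "s2", "s3", "s4", "s5"]
triples = build_triples(pos, neg, 1000, random.Random(0))
```

`rng.sample` draws two distinct entities, so a similar triple never pairs an entity with itself in this sketch.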
step 9: constructing a knowledge graph model, and training it with the triple data set;
step 9.1: inputting a triple and the erroneous triple generated from it into the knowledge graph model;
each triple in the triple data set is a correct triple, and an erroneous triple is generated from each correct triple;
the erroneous triple is generated as follows:
if the head entity of the triple belongs to the positive sample set pos and the relation is similar, an entity is randomly selected from the negative sample set neg to replace the original tail entity;
if the head entity of the triple belongs to the positive sample set pos and the relation is dissimilar, an entity is randomly selected from the positive sample set pos to replace the original tail entity;
if the head entity of the triple belongs to the negative sample set neg and the relation is similar, an entity is selected from the positive sample set pos to replace the original tail entity;
if the head entity of the triple belongs to the negative sample set neg and the relation is dissimilar, an entity is selected from the negative sample set neg to replace the original tail entity;
the two triples are then input into the knowledge graph model;
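The four replacement rules of step 9.1 reduce to one observation: the new tail is drawn from whichever set contradicts the stated relation. A minimal sketch, with entities as string identifiers (an illustrative choice, not part of the text):

```python
import random


def corrupt(triple, pos, neg, rng):
    """Build the erroneous triple that accompanies a correct (head, tail, relation)."""
    head, _tail, relation = triple
    head_is_positive = head in pos
    if relation == "similar":
        # A similar tail must come from the head's own set, so take the opposite one.
        pool = neg if head_is_positive else pos
    else:
        # A dissimilar tail must come from the opposite set, so take the head's own.
        pool = pos if head_is_positive else neg
    return (head, rng.choice(pool), relation)


pos = ["p1", "p2", "p3"]
neg = ["s1", "s2", "s3"]
rng = random.Random(0)
wrong = corrupt(("p1", "p2", "similar"), pos, neg, rng)
```

Only the tail entity changes; the head and the relation label of the correct triple are kept.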
step 9.2: calculating the error distances to obtain a prediction result;
the knowledge graph model calculates the error distance of each of the two triples; the error distance of a triple is obtained by a vector operation on the combined feature vectors of its two entities and the feature vector of its relation, with the following formula:
dis = |h*w + r - t*w| (1)
wherein w is a 39-dimensional weight vector whose initial value is randomly generated, h is the combined feature vector of the head entity in the triple, t is the combined feature vector of the tail entity in the triple, and r is the feature vector of the relation in the triple;
the error distances of the two triples are then compared to obtain the prediction result: the knowledge graph model predicts the triple with the smaller error distance to be the correct triple;
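Formula (1) can be sketched in plain Python. Two readings are assumed here because the text does not state them: `*` is taken as the element-wise product of two vectors of equal length, and `|·|` is taken as the sum of absolute values (L1 norm) so that the distance is a single number. The 3-dimensional vectors are toy values standing in for the 39-dimensional ones.

```python
def error_distance(h, t, r, w):
    """dis = |h*w + r - t*w|, read element-wise and summed as an L1 norm (assumption)."""
    return sum(abs(hi * wi + ri - ti * wi) for hi, ti, ri, wi in zip(h, t, r, w))


h = [1.0, 2.0, 3.0]
t = [2.0, 2.0, 1.0]
w = [1.0, 1.0, 1.0]
r_fit = [1.0, 0.0, -2.0]    # chosen so that h*w + r equals t*w exactly
r_zero = [0.0, 0.0, 0.0]

dis_fit = error_distance(h, t, r_fit, w)
dis_zero = error_distance(h, t, r_zero, w)
```

With `r_fit` the weighted head plus the relation lands exactly on the weighted tail, so the distance is zero; with `r_zero` the residual is (-1, 0, 2), so the distance is 3 under the L1 reading.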
step 9.3: calculating the loss value;
an error value is calculated from the predicted result and the actual result, and is defined as the loss value; the error distance dis_pos of the correct triple and the error distance dis_neg of the erroneous triple are each calculated by formula (1); if the error distance of the correct triple is smaller than that of the erroneous triple, the prediction of the knowledge graph model is considered correct and no loss is incurred; otherwise the judgment of the knowledge graph model is considered wrong, and the difference of the error distances is taken as the loss value, with the following formula:
loss = max{dis_pos - dis_neg, 0} (2)
where loss is the loss value, dis_pos is the error distance of the correct triple, and dis_neg is the error distance of the erroneous triple;
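Formula (2) is a hinge on the difference of the two error distances. A minimal sketch, where the function name `pair_loss` is illustrative:

```python
def pair_loss(dis_pos, dis_neg):
    """loss = max{dis_pos - dis_neg, 0}: zero whenever the correct triple is closer."""
    return max(dis_pos - dis_neg, 0.0)


loss_when_right = pair_loss(1.0, 3.0)   # correct triple has the smaller distance
loss_when_wrong = pair_loss(3.0, 1.0)   # erroneous triple has the smaller distance
```

As written in formula (2), the loss has no margin term: it becomes zero as soon as the correct triple is no farther than the erroneous one.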
step 9.4: automatically adjusting the parameters of the knowledge graph model;
steps 9.1 to 9.4 are repeated for each triple sample in the triple data set obtained in step 8 until the loss value no longer decreases, at which point training ends and the construction of the knowledge graph model is complete;
after training, the knowledge graph model can judge the relation between a pair of entities as follows: the combined feature vectors of the pair are input into the knowledge graph model, which calculates by formula (1) the error distance of the pair under each of the two relations, similar and dissimilar, and compares the two error distances; the relation with the smaller error distance is the relation between the two entities predicted by the knowledge graph model.
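The relation judgment described above amounts to picking the relation whose vector gives the smaller error distance. The sketch below again assumes an element-wise product and an L1 norm for formula (1), and uses invented 2-dimensional vectors in place of trained 39-dimensional ones.

```python
def error_distance(h, t, r, w):
    """Formula (1) under an assumed element-wise, L1-norm reading."""
    return sum(abs(hi * wi + ri - ti * wi) for hi, ti, ri, wi in zip(h, t, r, w))


def predict_relation(h, t, relations, w):
    """Return the name of the relation with the smallest error distance."""
    return min(relations, key=lambda name: error_distance(h, t, relations[name], w))


# Toy stand-ins for trained parameters (illustrative values only).
relations = {"similar": [1.0, -1.0], "dissimilar": [0.0, 0.0]}
w = [1.0, 1.0]
predicted = predict_relation([1.0, 2.0], [2.0, 1.0], relations, w)
```

Here the similar vector maps the head exactly onto the tail (distance 0), while the dissimilar vector leaves a residual of (-1, 1) (distance 2), so the pair is judged similar.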
step 10: predicting the probability that a landslide occurs at a place to be tested with the trained knowledge graph model, the place to be tested lying within the research area of step 1;
the attribute values of the place to be tested are obtained for the geographic and geological attributes counted in step 1; the probability of a landslide at the place is determined as follows:
step 10.1: processing the attribute values of the place to be tested in the manner described for each attribute value in step 2, to obtain the feature vector of the new entity;
step 10.2: performing feature combination on the feature vector of the new entity with the feature combination model trained in step 6, to obtain the combined feature vector of the new entity;
step 10.3: pairing the new entity with every entity in the positive sample set and passing the combined feature vectors of each entity pair to the knowledge graph model, which gives the relation between the new entity and each entity in the positive sample set; the number of similar relations is counted and denoted a, the number of entities in the positive sample set is n, and the ratio a/n is taken as the probability that a landslide occurs at the place to be tested.
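Step 10.3 can be sketched as a simple vote over the positive sample set. The relation judgment is passed in as a callable named `judge` (a hypothetical name) so that the sketch does not depend on how the trained model is implemented; the stand-in judge and the 1-dimensional vectors below are invented for illustration.

```python
def landslide_probability(new_vector, positive_vectors, judge):
    """Return a/n: the share of positive entities judged similar to the new entity."""
    a = sum(1 for p in positive_vectors if judge(new_vector, p) == "similar")
    n = len(positive_vectors)
    return a / n


def toy_judge(u, v):
    """Stand-in for the trained knowledge graph model (not the patent's model)."""
    return "similar" if abs(u[0] - v[0]) < 1.0 else "dissimilar"


positives = [[0.0], [0.5], [3.0], [4.0]]
probability = landslide_probability([0.2], positives, toy_judge)
```

Two of the four positive entities are judged similar to the new entity, so a = 2, n = 4 and the estimated probability is 0.5.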
A system for implementing the landslide risk prediction method based on knowledge graph construction comprises: a training unit and a risk prediction unit;
the training unit is used for constructing a knowledge graph model and training it with existing historical landslide data, to obtain a knowledge graph model for judging relations; the risk prediction unit uses the model obtained by the training unit and the historical landslide data to calculate, from the attribute values of the relevant geographic conditions of a new place in the same region, the probability that a landslide occurs at that place.
The training unit comprises: a data collection module, a data processing and generating module, a feature combination module, a relation feature vector generating module, a triple data set constructing module and a relation judgment model module;
the data collection module is used for inputting the geographic and geological attribute values related to the landslide events in all landslide event records of the research area to form a basic attribute data set, and transmitting the result to the data processing and generating module;
the data processing and generating module is used for processing the data in the basic attribute data set, generating the positive sample set, and recording each data group in the basic attribute data set as the feature vector of the corresponding entity in the positive sample set; generating, from the data group of each landslide event, one corresponding new data group; generating the negative sample set, recording each new data group as the feature vector of the corresponding entity in the negative sample set, and adding the new data groups to the original basic attribute data set to produce a new basic attribute data set; the positive and negative sample sets are transmitted to the triple data set constructing module, and the new basic attribute data set is transmitted to the feature combination module;
the feature combination module is used for training the feature combination model, then performing feature combination on the feature vector of each entity with that model, and transmitting the result to the relation judgment model module;
the relation feature vector generating module is used for generating the feature vectors of the relations in the knowledge graph, and transmitting the result to the relation judgment model module;
the triple data set constructing module is used for constructing the triple data set, and transmitting the result to the relation judgment model module;
the relation judgment model module is used for training the knowledge graph model that judges the relation between entities.
The beneficial effects of the above technical scheme are as follows: the landslide risk prediction method and system based on knowledge graph construction apply knowledge graph construction technology to the field of landslide risk prediction, so that landslide risk prediction is no longer limited to building a probability model and finally computing a probability value, which is a brand-new research direction. Machine learning is used to find automatically, from historical landslide data, the degree of influence of each relevant geographic and geological factor when a landslide occurs, so the influence weight of each factor is not set manually. A feature combination model is introduced which, compared with the prior-art approach of simply applying an artificial neural network, takes the nonlinear relations among the factors into account and thus further explores the influence of each factor on landslide events, so that applying this scheme can yield higher landslide prediction accuracy.
Drawings
FIG. 1 is a schematic diagram of the vector representation of a correct relation among the head entity feature vector, the relation feature vector and the tail entity feature vector in a representation-learning knowledge graph, according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of landslide events in the form of a knowledge graph in an embodiment of the present invention;
FIG. 3 is a flow chart of the landslide risk prediction method based on knowledge graph construction in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the feature combination method used in an embodiment of the present invention;
FIG. 5 is a schematic illustration of the operation of predicting the landslide risk at a location in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the landslide risk prediction system based on knowledge graph construction in an embodiment of the present invention.
Detailed Description
In the field of landslide risk prediction, spatial prediction is more important than temporal prediction: if the location where a landslide is likely to occur cannot be determined, the time of its occurrence cannot be predicted either. Nevertheless, there are few research results on spatial landslide risk prediction.
The main theoretical basis of the invention is the approach used for spatial landslide risk prediction: based on existing landslide data, the probability that a landslide may occur at a place to be predicted is judged by comparing the geographic and geological conditions of places where landslides have already occurred with those of the place to be predicted.
Knowledge graph construction based on representation learning is a popular direction in current knowledge graph research. TransE is the most representative model, and other representation-learning-based knowledge graph models are improvements built on its theory. The main idea of TransE is to express the association between entities and relations numerically, with each fact written as a triple, and to assume that every triple satisfies the following relation: the sum of the head entity feature vector and the relation feature vector equals the tail entity feature vector, as shown schematically in vector space in FIG. 1. Based on this assumption, the feature vectors of all head entities, relations and tail entities are trained by a machine learning method to obtain vector representations that better satisfy the assumption. TransE opened up a new research approach in the knowledge graph field, namely knowledge graph construction based on representation learning, and this approach has been shown to work well.
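The TransE assumption can be written out as a short worked equation. The two-dimensional vectors below are hypothetical values chosen only to make the arithmetic easy to follow; they do not come from the landslide data.

```latex
% TransE assumption for a correct triple (head h, relation r, tail t):
\mathbf{h} + \mathbf{r} \approx \mathbf{t}

% Toy two-dimensional example (hypothetical values):
\mathbf{h} = (1,\ 2), \qquad \mathbf{r} = (0.5,\ -1)
\quad\Rightarrow\quad \mathbf{h} + \mathbf{r} = (1.5,\ 1)

% A candidate tail t = (1.5, 1) satisfies the assumption exactly,
% whereas t' = (3, 3) leaves the residual
\mathbf{h} + \mathbf{r} - \mathbf{t}' = (-1.5,\ -2)
```

Training moves the vectors so that residuals of this kind become small for correct triples.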
An entity in a knowledge graph refers to a specific object in the real world, and the knowledge graph shows the associations among such entities. Both landslide events and non-landslide events can be treated as entities: the relation between a landslide event and a non-landslide event is defined as dissimilar, and the relation between two landslide events is defined as similar. FIG. 2 is a schematic diagram of a knowledge graph with 8 entities e1 to e8, in which each connection between entities is a relation, either similar or dissimilar. This matches the conventional approach to spatial landslide risk prediction described above, in which the probability that a landslide may occur at a place to be predicted is judged from existing landslide data by comparing the geographic and geological conditions of places where landslides have occurred with those of the place to be predicted. On this premise, the knowledge graph idea can be combined with the landslide risk prediction task: a risk prediction system is built by processing the existing data and by constructing and training a machine learning model, thereby completing the risk prediction task.
The present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which embodiments of the invention are shown by way of illustration and not of limitation.
A landslide risk prediction method based on knowledge graph construction is described below; its flow chart is shown in FIG. 3, and the method comprises the following steps:
Step 1: obtain the landslide events that have occurred in a research area, and for each landslide event record the attribute values of the relevant local geographic and geological factors at the time of the event. The recorded attributes are: front and rear edge elevation, sliding mass volume, sliding mass average thickness, sliding mass material composition, sliding bed lithology, rock stratum attitude, slope direction, slope type, slope form and slope gradient. The value corresponding to an attribute is its attribute value. Each landslide event corresponds to one data group containing the attribute values of these 10 geological factors, and the data groups of all landslide events form a basic attribute data set;
The research area selected is the typical Jurassic red-bed area of the Three Gorges reservoir region. Latitude, climatic conditions and local living habits all strongly affect the geological environment of a region, and the same attribute may carry different weights in different regions when a landslide occurs, so the research area must be determined first.
Taking the Ziguo guo river landslide, which occurred in this area, as an example, the relevant geographic and geological attribute values recorded for this landslide are: front and rear edge elevation 135 m-432 m; sliding mass volume 16000000 m³; sliding mass average thickness 40 m; sliding mass material composition layered quartz sandstone and silty block fractured rock; sliding bed lithology J1x; rock stratum attitude 10°∠36°; slope direction 340°; slope type consequent slope; slope form stepped; slope gradient 25°-45°. The data group corresponding to the Ziguo guo river landslide is (135-432, 16000000, 40, layered quartz sandstone and silty block fractured rock, J1x, 10°∠36°, 340°, consequent slope, stepped, 25°-45°).
In step 1, some numerical attributes recorded in a data group contain more than one value. For example, the attribute "front and rear edge elevation" contains two values, the front edge elevation and the rear edge elevation.
Step 2: processing data in the basic attribute data set;
According to the characteristics of each attribute, each attribute value is processed so that character-type attribute values are expressed in numerical form and numerical attribute values containing several numbers are split or merged. The specific method is as follows:
front and rear edge elevation attribute value: contains two numerical items, used as two basic feature values;
sliding mass volume attribute value: contains one numerical item, used as one basic feature value;
sliding mass average thickness attribute value: contains two numerical items, whose average is used as one basic feature value;
sliding mass material composition attribute value: a character-type attribute with 20 categories in total, numbered 1-20, the number representing the category;
sliding bed lithology attribute value: a character-type attribute with 13 categories in total, numbered 1-13, the number representing the category;
rock stratum attitude attribute value: contains two numerical items, used as two basic feature values;
slope direction attribute value: contains one numerical item, used as one basic feature value;
slope type attribute value: a character-type attribute with 3 categories in total, numbered 1-3, the number representing the category;
slope form attribute value: a character-type attribute with 4 categories in total, numbered 1-4, the number representing the category;
slope gradient attribute value: contains two numerical items, used as two basic feature values;
All data groups in the basic attribute data set are processed in this way, and each resulting data group has 13 basic feature values;
For the Ziguo guo river landslide data group (135-432, 16000000, 40, layered quartz sandstone and silty block fractured rock, J1x, 10°∠36°, 340°, consequent slope, stepped, 25°-45°), the processing is as follows: 135-432 is split into 135 and 432; 16000000 is kept as is; 40 is kept as is; "layered quartz sandstone and silty block fractured rock" is converted to the number 6; J1x is converted to the number 1; 10°∠36° is split into 10 and 36; 340 is kept as is; consequent slope is converted to the number 1; stepped is converted to the number 1; 25°-45° is split into 25 and 45. The final result is (135, 432, 16000000, 40, 6, 1, 10, 36, 340, 1, 1, 25, 45).
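The processing of step 2 can be sketched as a small encoding function. This is a minimal illustration, not the implementation of the invention: the dictionary layout of a raw record and all field names are hypothetical, the category names are English renderings of the source terms, and the code tables contain only the codes quoted in the text, since the complete tables are not given.

```python
# Partial code tables: only the codes quoted in the text are included.
# The complete 20-, 13-, 3- and 4-entry tables are not given in the source.
MATERIAL_CODES = {
    "layered quartz sandstone and silty block fractured rock": 6,
    "collapse and slide accumulation silty clay with crushed rock": 1,
}
LITHOLOGY_CODES = {"J1x": 1, "J2S": 8}
SLOPE_TYPE_CODES = {"consequent": 1, "reverse": 3}
SLOPE_FORM_CODES = {"stepped": 1, "linear": 2}


def encode_record(rec):
    """Turn one raw record (hypothetical dict layout) into 13 basic feature values."""
    front, rear = rec["edge_elevation"]        # split into two values
    thickness = sum(rec["thickness"]) / len(rec["thickness"])  # merged by averaging
    attitude_a, attitude_b = rec["attitude"]   # split into two values
    grad_lo, grad_hi = rec["gradient"]         # split into two values
    return (
        front, rear,
        rec["volume"],
        thickness,
        MATERIAL_CODES[rec["material"]],
        LITHOLOGY_CODES[rec["lithology"]],
        attitude_a, attitude_b,
        rec["direction"],
        SLOPE_TYPE_CODES[rec["slope_type"]],
        SLOPE_FORM_CODES[rec["slope_form"]],
        grad_lo, grad_hi,
    )


# The example landslide from the text, written in the hypothetical layout.
example = {
    "edge_elevation": (135, 432),
    "volume": 16000000,
    "thickness": (40,),
    "material": "layered quartz sandstone and silty block fractured rock",
    "lithology": "J1x",
    "attitude": (10, 36),
    "direction": 340,
    "slope_type": "consequent",
    "slope_form": "stepped",
    "gradient": (25, 45),
}
features = encode_record(example)
```

The resulting tuple has 13 entries, in the same order as the processed data group given in the text.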
Step 3: record each landslide event counted in step 1 as an entity in the knowledge graph, naming the entities p_1, p_2, …, p_x, …, p_y, …, p_n. These entities, which represent landslide events, are defined as positive samples and form the positive sample set pos = {p_1, p_2, …, p_x, …, p_y, …, p_n}. Each data group in the basic attribute data set is recorded as the feature vector of the corresponding entity in the positive sample set;
For example, the landslide event above is recorded as an entity in the knowledge graph, named p_1, and entity p_1 is added to the positive sample set pos.
Step 4: set the sliding mass material composition as the independent variable, and set the geological factor attribute values that are directly correlated with it, namely the front and rear edge elevation, sliding mass volume, sliding mass average thickness, slope gradient and slope type attribute values, as dependent variables. Group the data groups in the data set that share the same sliding mass material composition, and for each group determine the value range of every dependent variable. The statistical method for each dependent variable is as follows:
front and rear edge elevation attribute value: contains two values, the front edge elevation and the rear edge elevation. Over all data groups with the same sliding mass material, the maximum and minimum of all these values are found, without distinguishing front edge values from rear edge values (each is treated simply as one value); the result is a value range;
sliding mass volume attribute value: the maximum and minimum of the volume values are found; the result is a value range;
sliding mass average thickness attribute value: the maximum and minimum of the average thickness values are found; the result is a value range;
slope gradient attribute value: contains two values, the maximum and the minimum gradient of the slope. Over all data groups with the same sliding mass material, the maximum and minimum of all these values are found, without distinguishing maximum gradient values from minimum gradient values (each is treated simply as one value); the result is a value range;
slope type attribute value: a character-type attribute with 3 categories, namely consequent slope, oblique slope and reverse slope, represented by the numbers 1-3 after step 2. If the slope type is the same in all data groups of the group, the number of that type is recorded; otherwise nothing is recorded;
The sliding mass material composition is the independent variable and has 20 categories. Taking "collapse and slide accumulation silty clay with crushed rock" as an example, the data groups of all landslide events in step 1 whose sliding mass material is "collapse and slide accumulation silty clay with crushed rock" are collected, and the value ranges of the 5 dependent variables are shown in the following table:
Figure BDA0002530393470000101
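The statistics of step 4 can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name and the dictionary keys are hypothetical, the feature vectors use the 13-value order produced in step 2, and the second sample record is invented so that the minima agree with the values 70, 2160000, 8 and 8 quoted in the worked example of step 5.

```python
from collections import defaultdict


def value_ranges(vectors):
    """Group 13-value feature vectors by material code (index 4) and summarise each group."""
    groups = defaultdict(list)
    for vec in vectors:
        groups[vec[4]].append(vec)

    result = {}
    for material, rows in groups.items():
        elevations = [x for row in rows for x in row[0:2]]   # front and rear pooled
        gradients = [x for row in rows for x in row[11:13]]  # both gradient values pooled
        volumes = [row[2] for row in rows]
        thicknesses = [row[3] for row in rows]
        slope_types = {row[9] for row in rows}
        result[material] = {
            "elevation": (min(elevations), max(elevations)),
            "volume": (min(volumes), max(volumes)),
            "thickness": (min(thicknesses), max(thicknesses)),
            "gradient": (min(gradients), max(gradients)),
            # Recorded only when every data group in the group shares one slope type.
            "slope_type": slope_types.pop() if len(slope_types) == 1 else None,
        }
    return result


records = [
    (135, 250, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20),  # processed data group from the text
    (70, 180, 2160000, 8, 1, 8, 300, 20, 305, 1, 1, 8, 30),     # hypothetical second record
]
ranges = value_ranges(records)
```

Each value range is stored as a (minimum, maximum) pair per material code, and the slope type entry is `None` when the group contains more than one slope type.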
Step 5: according to the value ranges obtained in step 4 and the attribute values recorded in step 1, generate for each landslide event counted in step 1 one new data group representing a non-landslide event. Each non-landslide event is also recorded as an entity in the knowledge graph, the entities being named s_1, s_2, …, s_x, …, s_y, …, s_n. These entities, which represent non-landslide events, are defined as negative samples and form the negative sample set neg = {s_1, s_2, …, s_x, …, s_y, …, s_n}. Each new data group is recorded as the feature vector of the corresponding entity in the negative sample set. The specific method is as follows:
For each data group in the basic attribute data set, randomly select one of the five dependent variables counted in step 4 and change its value, leaving the attribute values of all other attributes unchanged, to generate a new data group. The processing for each possible choice of dependent variable is as follows:
front and rear edge elevation attribute value: randomly generate two values in the range below the minimum, representing the front edge elevation and the rear edge elevation, and use them to replace the two corresponding values of the original data group in the basic attribute data set;
sliding mass volume attribute value: randomly generate one value in the range below the minimum, and use it to replace the corresponding value of the original data group in the basic attribute data set;
sliding mass average thickness attribute value: randomly generate one value in the range below the minimum, and use it to replace the corresponding value of the original data group in the basic attribute data set;
slope gradient attribute value: randomly generate two values in the range below the minimum, representing the maximum and minimum gradient of the slope, and use them to replace the two corresponding values of the original data group in the basic attribute data set;
slope type attribute value: if step 4 recorded a value for this item, randomly choose one of the other two code values and use it to replace the corresponding value of the original data group in the basic attribute data set; if no value was recorded, select again from the front and rear edge elevation, sliding mass volume, sliding mass average thickness and slope gradient, and process the selected dependent variable as described above.
In this way, each data group in the basic attribute data set obtained in step 1 generates one new data group, and all the new data groups are added to the basic attribute data set obtained in step 1 to form a new basic attribute data set.
Take one landslide event counted in step 1 as an example: the geographic and geological attribute data of the Tenn von domestic wharf landslide are (135-250, 3600000, 15, collapse and slide accumulation silty clay with crushed rock, J2S, 315°∠18°, 310°, consequent slope, linear, 20°-20°), which after the processing of step 2 become (135, 250, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20). The sliding mass material is the independent variable; its attribute value is 1, which corresponds to "collapse and slide accumulation silty clay with crushed rock". The value ranges of the dependent variable attribute values counted in step 4 are as follows:
Figure BDA0002530393470000111
One of the front and rear edge elevation, sliding mass volume, sliding mass average thickness, slope type and slope gradient is selected at random.
If the front and rear edge elevation is selected, two values are randomly generated in the range below 70; with generated values 45 and 52, the new data group is (45, 52, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20).
If the sliding mass volume is selected, one value is randomly generated in the range below 2160000; with generated value 1160000, the new data group is (135, 250, 1160000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20).
If the sliding mass average thickness is selected, one value is randomly generated in the range below 8; with generated value 5, the new data group is (135, 250, 3600000, 5, 1, 8, 315, 18, 310, 1, 2, 20, 20).
If the slope type is selected, the value 1 in the value range represents a consequent slope, so one of the other two slope types, reverse slope and oblique slope, is randomly selected; with reverse slope selected, whose number is 3, the new data group is (135, 250, 3600000, 15, 1, 8, 315, 18, 310, 3, 2, 20, 20).
If the slope gradient is selected, two values are randomly generated in the range below 8; with generated values 1 and 5, the new data group is (135, 250, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 1, 5).
The new data group is obtained in one of these five cases. It is used as the feature vector of an entity in the knowledge graph that represents a non-landslide event; the entity is named s_1 and is added to the negative sample set neg.
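The generation of one negative sample in step 5 can be sketched as below. Several details are assumptions made only for illustration: the function and key names are hypothetical, replacement values are drawn as integers from 1 up to (but excluding) the group minimum because the text states only that they lie below the minimum, and a generated pair is stored in ascending order.

```python
import random

# Index positions of the numerical dependent variables in the 13-value feature vector.
NUMERIC = {"elevation": (0, 1), "volume": (2,), "thickness": (3,), "gradient": (11, 12)}
SLOPE_TYPE_INDEX = 9


def make_negative(vector, ranges, rng=random):
    """Return a copy of `vector` with exactly one dependent variable pushed out of range."""
    new = list(vector)
    choice = rng.choice(list(NUMERIC) + ["slope_type"])
    if choice == "slope_type":
        common = ranges["slope_type"]
        if common is not None:
            # Replace the common slope type with one of the other two codes.
            new[SLOPE_TYPE_INDEX] = rng.choice([c for c in (1, 2, 3) if c != common])
            return tuple(new)
        choice = rng.choice(list(NUMERIC))  # no common slope type recorded: select again
    minimum = ranges[choice][0]
    # Assumption: integer draws in [1, minimum); requires an integer minimum above 1.
    values = sorted(rng.randrange(1, minimum) for _ in NUMERIC[choice])
    for index, value in zip(NUMERIC[choice], values):
        new[index] = value
    return tuple(new)


positive = (135, 250, 3600000, 15, 1, 8, 315, 18, 310, 1, 2, 20, 20)
group_ranges = {  # minima as quoted in the text; maxima are hypothetical
    "elevation": (70, 250), "volume": (2160000, 3600000),
    "thickness": (8, 15), "gradient": (8, 30), "slope_type": 1,
}
negative = make_negative(positive, group_ranges, random.Random(0))
```

Passing a seeded `random.Random` instance makes the draw reproducible; the default uses the module-level generator.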
Step 6: train a feature combination model using the data groups in the new basic attribute data set generated in step 5, then use the trained model to perform feature combination on each data group. After feature combination, the dimension of each feature vector in the new basic attribute data set changes from the original 13 to 39, and the 39-dimensional vector is the combined feature vector of an entity;
the feature combination model used is a gradient boosting decision tree (GBDT) model;
This process is illustrated in FIG. 4: the data in the new basic attribute data set are used to train the gradient boosting decision tree model. This model is one kind of feature combination model. It combines the gradient boosting framework with the decision tree algorithm, tries different combinations of the basic feature values and selects the most suitable one, which allows the process to be carried out accurately and efficiently. After the model is trained, it is used to perform feature combination on the data group of each entity in the new basic attribute data set, giving the combined feature vector of each entity.
Step 7: generate the feature vectors of the relations in the knowledge graph;
The knowledge graph constructed by this method has two relations: similar and dissimilar. The initial value of each relation feature vector is randomly generated, and its dimension is 39, the same as that of the combined feature vector of an entity;
if the relation between two entities is similar, both are landslide events or both are non-landslide events; if the relation is dissimilar, one of them is a landslide event and the other is a non-landslide event;
A knowledge graph based on representation learning performs vector operations in its calculations, so the relations in the knowledge graph are also represented as vectors, with the same dimension, 39, as the combined feature vectors representing the entities. The initial values of the relation feature vectors may be randomly generated from a standard normal distribution.
A knowledge graph can represent objective connections among real-world things: if one entity represents a landslide event, any other entity representing a landslide event has a similar relation to it, and any entity representing a non-landslide event has a dissimilar relation to it. In practice the constructed knowledge graph contains many entities representing landslide events and many representing non-landslide events, so two relations, similar and dissimilar, exist among the entities of the whole knowledge graph.
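The initialization of the two relation feature vectors can be sketched in a few lines. The function name and the optional seed parameter are hypothetical; only the dimension of 39 and the standard normal distribution come from the text.

```python
import random

DIM = 39  # same dimension as the combined feature vector of an entity


def init_relation_vectors(dim=DIM, seed=None):
    """Draw one feature vector per relation from a standard normal distribution."""
    rng = random.Random(seed)
    return {
        name: [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for name in ("similar", "dissimilar")
    }


relation_vectors = init_relation_vectors(seed=0)
```

These values are only a starting point; they are adjusted together with the other model parameters during the training of step 9.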
Step 8: construct the triple data set;
Each triple represents a relation that exists in the real world and is written (head entity, tail entity, relation), meaning that the relation holds between the head entity and the tail entity. Triple samples are generated in three ways:
randomly draw an entity p_x from the positive sample set pos and an entity s_y from the negative sample set neg; the relation is dissimilar, and the triple is (p_x, s_y, dissimilar) or (s_y, p_x, dissimilar);
randomly draw two entities p_x and p_y from the positive sample set pos; the relation is similar, and the triple is (p_x, p_y, similar);
randomly draw two entities s_x and s_y from the negative sample set neg; the relation is similar, and the triple is (s_x, s_y, similar);
Triple samples are generated by these methods so that the number of samples with the similar relation equals the number with the dissimilar relation, with at least 1000 triple samples generated for each relation; the samples are then mixed to form the triple data set;
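The three generation rules of step 8 can be sketched as follows. The function name is hypothetical, entities are represented by their names only, and two choices are assumptions not fixed by the text: the head and tail order of a dissimilar triple is chosen with equal probability, and similar triples are drawn from the positive or the negative sample set with equal probability.

```python
import random


def build_triples(pos, neg, per_relation=1000, rng=random):
    """Build a shuffled list of (head, tail, relation) triples, balanced across relations."""
    triples = []
    for _ in range(per_relation):
        p, s = rng.choice(pos), rng.choice(neg)
        # One positive and one negative entity: dissimilar, in either order.
        triples.append((p, s, "dissimilar") if rng.random() < 0.5 else (s, p, "dissimilar"))
    for _ in range(per_relation):
        source = pos if rng.random() < 0.5 else neg
        a, b = rng.sample(source, 2)  # two distinct entities from the same set: similar
        triples.append((a, b, "similar"))
    rng.shuffle(triples)
    return triples


pos = ["p1", "p2", "p3", "p4", "p5"]
neg = ["s1", "s2", "s3", "s4", "s5"]
triple_data = build_triples(pos, neg, per_relation=1000, rng=random.Random(1))
```

With `per_relation=1000` the result contains 2000 triples, half of them similar and half dissimilar, which meets the minimum stated in the text.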
Step 9: construct a knowledge graph model and train it with the triple data set;
Step 9.1: input a triple and the error triple generated from it into the knowledge graph model;
Each triple in the triple data set is a correct triple, and an error triple is generated from each correct triple;
The error triple is generated as follows:
if the head entity of the triple belongs to the positive sample set pos and the relation is similar, randomly select an entity from the negative sample set neg to replace the original tail entity;
if the head entity of the triple belongs to the positive sample set pos and the relation is dissimilar, randomly select an entity from the positive sample set pos to replace the original tail entity;
if the head entity of the triple belongs to the negative sample set neg and the relation is similar, randomly select an entity from the positive sample set pos to replace the original tail entity;
if the head entity of the triple belongs to the negative sample set neg and the relation is dissimilar, randomly select an entity from the negative sample set neg to replace the original tail entity;
Both triples are then input into the knowledge graph model;
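The four replacement rules of step 9.1 reduce to a single choice of which sample set the new tail entity is drawn from. The sketch below is illustrative: the function name is hypothetical, entities are represented by their names, and excluding the head entity from the candidates is an added assumption that the text does not state.

```python
import random


def make_error_triple(triple, pos, neg, rng=random):
    """Replace the tail entity so that the stated relation no longer holds."""
    head, _tail, relation = triple
    head_is_positive = head in pos
    if relation == "similar":
        # A similar triple becomes wrong when the tail comes from the opposite set.
        pool = neg if head_is_positive else pos
    else:
        # A dissimilar triple becomes wrong when the tail comes from the same set.
        pool = pos if head_is_positive else neg
    # Assumption: the head itself is not used as the replacement tail.
    candidates = [entity for entity in pool if entity != head]
    return (head, rng.choice(candidates), relation)


pos = ["p1", "p2", "p3", "p4"]
neg = ["s1", "s2", "s3", "s4"]
error_triple = make_error_triple(("p1", "s2", "dissimilar"), pos, neg, random.Random(2))
```

For the triple (p1, s2, dissimilar) the replacement tail is drawn from the positive sample set, as in the worked example of the loss calculation.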
Step 9.2: calculate the error distance to obtain a prediction result
The knowledge graph model calculates the error distance of each of the two triples. The error distance of a triple is obtained by a vector operation on the combined feature vectors of its two entities and the relation feature vector, using the following formula:
dis = |h*w + r - t*w|    (1)
where w is a 39-dimensional weight vector whose initial value is randomly generated, h is the combined feature vector of the head entity of the triple, t is the combined feature vector of the tail entity of the triple, and r is the feature vector of the relation in the triple.
The error distances of the two triples are then compared to obtain the prediction result: the knowledge graph model predicts the triple with the smaller error distance to be the correct triple;
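Formula (1) can be illustrated with a short function. Two readings are assumed here because the text does not fix them: h*w and t*w are taken as element-wise products, and |·| is taken as the sum of absolute values (the L1 norm). The three-dimensional vectors are hypothetical and stand in for the 39-dimensional vectors of the method.

```python
def error_distance(h, r, t, w):
    """Error distance of formula (1), read as the L1 norm of h*w + r - t*w (element-wise)."""
    return sum(abs(hi * wi + ri - ti * wi) for hi, ri, ti, wi in zip(h, r, t, w))


# Hypothetical 3-dimensional vectors.
h = [2.0, 4.0, 6.0]
t = [4.0, 2.0, 6.0]
w = [0.5, 0.5, 1.0]
r_fit = [1.0, -1.0, 0.0]   # chosen so that h*w + r equals t*w
r_zero = [0.0, 0.0, 0.0]

dis_fit = error_distance(h, r_fit, t, w)
dis_zero = error_distance(h, r_zero, t, w)
```

Here `dis_fit` is smaller than `dis_zero`, so a model comparing the two would prefer the triple that uses `r_fit`.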
Step 9.3: calculate the loss value
An error value, defined as the loss value, is calculated from the prediction result and the actual result. The error distance dis_pos of the correct triple and the error distance dis_neg of the error triple are each calculated by formula (1). If the error distance of the correct triple is smaller than that of the error triple, the prediction of the knowledge graph model is correct and no loss is considered to be generated; otherwise the prediction of the knowledge graph model is considered wrong, and the difference between the error distances is taken as the loss value. The formula is:
loss = max{dis_pos - dis_neg, 0}    (2)
where loss is the loss value, dis_pos is the error distance of the correct triple, and dis_neg is the error distance of the error triple;
Step 9.4: automatically adjust the parameters of the knowledge graph model;
Steps 9.1 to 9.4 are repeated for each triple sample in the triple data set obtained in step 8 until the loss value no longer decreases, at which point training ends and the construction of the knowledge graph model is complete;
After training, the knowledge graph model can be used to judge the relation between a pair of entities as follows: the combined feature vectors of the pair are input into the knowledge graph model, which uses formula (1) to calculate the error distance of the pair under each of the two relations, similar and dissimilar, and compares them; the relation with the smaller error distance is the relation between the two entities predicted by the knowledge graph model.
During the training of the knowledge graph model, a loss value must be calculated, representing the error between the predicted result and the actual result. The loss value is propagated back through the model, and the model uses it, together with a machine learning method, to adjust its parameters to a suitable set of values so that the loss value is minimized. The purpose of model training is to find the parameter values at which the loss value is minimal.
Example of the loss value calculation: given a triple (p_1, s_2, dissimilar), p_1 belongs to the positive sample set pos and the relation is dissimilar, so an entity p_4 is randomly selected from the positive sample set pos to replace the tail entity, generating the error triple (p_1, p_4, dissimilar). The two triples are passed to the knowledge graph model, which calculates their error distances internally.
Let the combined feature vector of entity p_1 be h_1, the combined feature vector of entity s_2 be t_1, and the feature vector of the dissimilar relation be r_1. The error distance dis_1 of the triple (p_1, s_2, dissimilar) can then be calculated by formula (1). The error distance dis_2 of the error triple (p_1, p_4, dissimilar) is calculated in the same way.
If dis_1 is greater than dis_2, i.e. the error distance calculated by the knowledge graph model for the triple (p_1, s_2, dissimilar) is greater than that of the error triple (p_1, p_4, dissimilar), the knowledge graph model considers the error triple to be correct. This means that the result predicted by the model differs from the actual result, and the loss value given by formula (2) is dis_1 - dis_2.
If dis_1 is less than dis_2, i.e. the error distance calculated by the knowledge graph model for the triple (p_1, s_2, dissimilar) is less than that of the error triple (p_1, p_4, dissimilar), the knowledge graph model considers the correct triple to be correct. The prediction is the same as the actual result, and the loss value is 0.
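The two cases of the loss calculation follow directly from formula (2). In the sketch below the function name is hypothetical and the distance values are invented for illustration; they are not results of the method.

```python
def triple_loss(dis_pos, dis_neg):
    """Loss of formula (2): positive only when the correct triple is farther than the error triple."""
    return max(dis_pos - dis_neg, 0.0)


# Case 1: the correct triple has the larger error distance, so the model is wrong.
loss_wrong = triple_loss(dis_pos=3.0, dis_neg=1.5)

# Case 2: the correct triple has the smaller error distance, so the model is right.
loss_right = triple_loss(dis_pos=1.0, dis_neg=2.5)
```

In the first case the loss equals the difference of the two distances; in the second it is zero and gives no reason to adjust the parameters.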
Example of judging the relation between two entities with the trained knowledge graph model: to determine the relation between entity p_5 and entity p_6, let the combined feature vector of entity p_5 be h_3, the combined feature vector of entity p_6 be t_3, the feature vector of the dissimilar relation be r_1, and the feature vector of the similar relation be r_2. Formula (1) is used to calculate the error distance dis_3 of the triple (p_5, p_6, dissimilar) and the error distance dis_4 of the triple (p_5, p_6, similar). If dis_3 < dis_4, the knowledge graph model predicts that the relation between p_5 and p_6 is dissimilar; if dis_3 > dis_4, it predicts that the relation is similar. In practice, the case dis_3 = dis_4 does not arise.
Step 10: use the trained knowledge graph model to predict the probability that a landslide occurs at a place to be tested, where the place to be tested belongs to the research area of step 1.
The attribute values of the place to be tested are obtained for the geographic and geological factor attributes listed in step 1. The steps for determining the probability of a landslide at this place are as follows:
Step 10.1: process the set of attribute values of the place to be tested in the manner of step 2 to obtain the feature vector of a new entity;
Step 10.2: perform feature combination on the feature vector of the new entity using the feature combination model trained in step 6 to obtain the combined feature vector of the new entity;
Step 10.3: pair the new entity with every entity in the positive sample set, and pass the combined feature vectors of each pair to the knowledge graph model, which gives the relation between the new entity and each entity in the positive sample set. Let a be the number of similar relations and n the number of entities in the positive sample set; the ratio a/n is taken as the probability that a landslide occurs at the place to be tested.
Example of predicting the probability of a landslide at a place in the same region: this process is illustrated in FIG. 5. The event that may occur at the place to be tested is treated as an entity in the knowledge graph, named x. First, data are collected for the geographic and geological factor attributes listed in step 1 to obtain the basic attribute data of the place to be tested, which are then processed in the manner of step 2 to obtain the feature vector of entity x. The feature vector of entity x is passed to the gradient boosting decision tree model trained in step 6, which performs feature combination on it to obtain the combined feature vector of entity x. The combined feature vector of each entity in the positive sample set is then input, together with the combined feature vector of entity x, into the trained knowledge graph model, and the number of times the predicted relation is similar is recorded.
Suppose the positive sample set contains 100 entities p_1, p_2, …, p_100. Each forms an entity pair with entity x: (p_1, x), (p_2, x), …, (p_100, x). Taking the pair (p_1, x) as an example, the combined feature vector of p_1 and the combined feature vector of x are passed to the knowledge graph model trained in step 9, which predicts the relation between the pair. The relations of all entity pairs are predicted in this way; if 90 pairs are predicted to be similar, the landslide probability is 90/100.
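Relation judgment and the a/n probability of step 10.3 can be sketched together. As assumptions for illustration, formula (1) is read as an L1 norm with element-wise products, the function names are hypothetical, and two-dimensional toy vectors with invented values replace the 39-dimensional combined feature vectors and trained parameters.

```python
def error_distance(h, r, t, w):
    """Formula (1), read as the L1 norm of h*w + r - t*w (element-wise products)."""
    return sum(abs(hi * wi + ri - ti * wi) for hi, ri, ti, wi in zip(h, r, t, w))


def predict_relation(h, t, w, relations):
    """Return the relation name whose feature vector gives the smaller error distance."""
    distances = {name: error_distance(h, r, t, w) for name, r in relations.items()}
    return min(distances, key=distances.get)


def landslide_probability(x, positives, w, relations):
    """Ratio a/n: share of positive samples predicted to be similar to the new entity x."""
    a = sum(1 for p in positives if predict_relation(p, x, w, relations) == "similar")
    return a / len(positives)


# Toy values standing in for trained parameters and combined feature vectors.
w = [1.0, 1.0]
relations = {"similar": [0.0, 0.0], "dissimilar": [-5.0, -5.0]}
positives = [[1.0, 1.0], [2.0, 2.0], [1.0, 2.0], [10.0, 10.0]]
x = [1.5, 1.5]

probability = landslide_probability(x, positives, w, relations)
```

In this toy setting three of the four positive samples are judged similar to x, so the ratio a/n is 3/4.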
A system for implementing the landslide risk prediction method based on knowledge graph construction, as shown in fig. 6, includes a training unit and a risk prediction unit.
The training unit constructs a knowledge graph model and trains it on existing historical landslide data to obtain a knowledge graph model capable of judging relations. The risk prediction unit uses this model and the historical landslide data from the training unit to calculate the probability of a landslide at a new location in the same region from the attribute values of that location's relevant geographic conditions.
The training unit comprises: a data collection module, a data processing and generating module, a feature combination module, a relation feature vector generation module, a triple data set construction module and a relation judgment model module;
the data collection module collects the geographic and geological attribute values related to the landslide events in all landslide event records of a research area to form a basic attribute data set, and passes the result to the data processing and generating module;
the data processing and generating module processes the data in the basic attribute data set; generates the positive sample set, recording each data group in the basic attribute data set as the feature vector of the corresponding entity in the positive sample set; generates, from the data group of each landslide event, one new data group corresponding to that event; generates the negative sample set, recording each new data group as the feature vector of the corresponding entity in the negative sample set; and adds the new data groups to the original basic attribute data set to form a new basic attribute data set; the positive sample set and the negative sample set are passed to the triple data set construction module, and the new basic attribute data set is passed to the feature combination module;
the feature combination module trains a model for feature combination, then uses that model to perform feature combination on the feature vector of each entity, and passes the result to the relation judgment model module;
the relation feature vector generation module generates the feature vectors of the relations in the knowledge graph, and passes the result to the relation judgment model module;
the triple data set construction module constructs the triple data set, and passes the result to the relation judgment model module;
the relation judgment model module trains a knowledge graph model capable of judging the relation between entities.
A schematic diagram of the data transfer between the parts of the system is shown in fig. 6. The output of the training unit is passed to the risk prediction unit. Within the training unit, the output of the data collection module is passed to the data processing and generating module; the positive sample set and the negative sample set generated by the data processing and generating module are passed to the triple data set construction module, and the new basic attribute data set it generates is passed to the feature combination module; the outputs of the feature combination module, the relation feature vector generation module and the triple data set construction module are each passed to the relation judgment model module.

Claims (10)

1. A landslide risk prediction method based on knowledge graph construction, characterized by comprising the following steps:
step 1: obtaining the landslide events that have occurred in a research area, and recording, for each landslide event, the attribute values of the relevant local geographic and geological factors at the time of the event, wherein the recorded attributes comprise: front and rear edge elevation, sliding mass material volume, sliding mass average thickness, sliding mass material composition, sliding bed lithology, rock stratum attitude, slope direction, slope type, slope form and slope gradient; the value corresponding to an attribute is its attribute value, each landslide event corresponds to one data group containing the attribute values of these 10 geological factors, and the data groups of all landslide events form a basic attribute data set;
step 2: processing data in the basic attribute data set;
step 3: recording each landslide event collected in step 1 as an entity in the knowledge graph, the entities being named p1, p2, …, px, …, py, …, pn; these entities representing landslide events are defined as positive samples, and the positive samples constitute a positive sample set, denoted pos = {p1, p2, …, px, …, py, …, pn}; each data group in the basic attribute data set is recorded as the feature vector of the corresponding entity in the positive sample set;
step 4: setting the sliding mass material composition as the independent variable, and setting the attribute values of the geological factors directly correlated with it, namely the front and rear edge elevation, sliding mass material volume, sliding mass average thickness, slope gradient and slope type attribute values, as dependent variables; grouping together the data groups in the data set that have the same sliding mass material composition attribute value, and determining the value range of each dependent variable group by group;
step 5: according to the value ranges determined in step 4 and the geographic and geological factor attribute values collected in step 1, generating, for the data group of each landslide event collected in step 1, one corresponding new data group; each new data group represents a no-landslide event, which is likewise recorded as an entity in the knowledge graph, these entities being named s1, s2, …, sx, …, sy, …, sn; the entities representing no-landslide events are defined as negative samples and constitute a negative sample set, denoted neg = {s1, s2, …, sx, …, sy, …, sn}; each new data group is recorded as the feature vector of the corresponding entity in the negative sample set;
adding all new data groups into the basic attribute data set obtained in the step 1 to generate a new basic attribute data set;
step 6: training a feature combination model using the data groups in the new basic attribute data set generated in step 5, then performing feature combination on each data group using the trained feature combination model; after feature combination, the dimension of each feature vector in the new basic attribute data set changes from the original 13 to 39, and the 39-dimensional vector is the combined feature vector of an entity;
step 7: generating the feature vectors of the relations in the knowledge graph;
the knowledge graph constructed by the method has two relations: similar and dissimilar; the initial values of the feature vector of each relation are randomly generated, and its dimension is 39, the same as that of the combined feature vector of an entity;
if the relation between two entities is similar, both entities are landslide events or both are no-landslide events; if the relation between two entities is dissimilar, one of them is a landslide event and the other is a no-landslide event;
step 8: constructing a triple data set;
step 9: constructing a knowledge graph model, and training the knowledge graph model using the triple data set;
step 10: predicting the probability of a landslide at a location under test using the trained knowledge graph model, wherein the location under test belongs to the research area of step 1.
2. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the step 2 further comprises:
according to the characteristics of each attribute, each attribute value is processed so that character-type attribute values are expressed in numerical form and numerical attribute values containing several numbers are split or merged, specifically as follows:
front and rear edge elevation attribute value: contains two numerical items, used as two basic feature values;
sliding mass material volume attribute value: contains one numerical item, used as one basic feature value;
sliding mass average thickness attribute value: contains two numerical items, whose average is used as one basic feature value;
sliding mass material composition attribute value: a character-type attribute value with 20 classes in total, numbered 1 to 20, each number representing its class;
sliding bed lithology attribute value: a character-type attribute value with 13 classes in total, numbered 1 to 13, each number representing its class;
rock stratum attitude attribute value: contains two numerical items, used as two basic feature values;
slope direction attribute value: contains one numerical item, used as one basic feature value;
slope type attribute value: a character-type attribute value with 3 classes in total, numbered 1 to 3, each number representing its class;
slope form attribute value: a character-type attribute value with 4 classes in total, numbered 1 to 4, each number representing its class;
slope gradient attribute value: contains two numerical items, used as two basic feature values;
all data groups in the basic attribute data set are processed in the above manner for each geological factor attribute value, and each resulting data group has 13 basic feature values.
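The encoding described in this claim can be sketched as below. The record layout, the field names and the category names in the lookup tables are assumptions made for illustration; the claim states only the number of classes per attribute (the three slope type names are taken from claim 3), and it does not say what the two rock stratum attitude values are.

```python
from statistics import mean
from typing import Any, Dict, List

# Hypothetical, partial lookup tables. Only the class counts (20, 13, 3, 4)
# come from the claim; the material, lithology and slope form names are made up.
MATERIAL_CODES = {"gravelly soil": 1, "silty clay": 2}
LITHOLOGY_CODES = {"mudstone": 1, "sandstone": 2}
SLOPE_TYPE_CODES = {"forward": 1, "oblique": 2, "reverse": 3}
SLOPE_FORM_CODES = {"convex": 1, "concave": 2, "straight": 3, "stepped": 4}


def encode_record(rec: Dict[str, Any]) -> List[float]:
    """Turn one landslide record (10 attributes) into 13 basic feature values."""
    front, rear = rec["edge_elevation"]        # two values kept separately
    attitude_a, attitude_b = rec["attitude"]   # two values kept separately
    grad_max, grad_min = rec["gradient"]       # two values kept separately
    return [
        float(front), float(rear),
        float(rec["volume"]),
        float(mean(rec["thickness"])),         # two values merged into their average
        float(MATERIAL_CODES[rec["material"]]),
        float(LITHOLOGY_CODES[rec["lithology"]]),
        float(attitude_a), float(attitude_b),
        float(rec["slope_direction"]),
        float(SLOPE_TYPE_CODES[rec["slope_type"]]),
        float(SLOPE_FORM_CODES[rec["slope_form"]]),
        float(grad_max), float(grad_min),
    ]


sample = {
    "edge_elevation": (820.0, 905.0), "volume": 12000.0,
    "thickness": (6.0, 9.0), "material": "gravelly soil",
    "lithology": "sandstone", "attitude": (135.0, 30.0),
    "slope_direction": 140.0, "slope_type": "forward",
    "slope_form": "convex", "gradient": (35.0, 20.0),
}
features = encode_record(sample)
print(len(features), features)
```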
3. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the step 4 further comprises:
the method of determining the value range of each dependent variable is as follows:
front and rear edge elevation attribute value: contains two values, representing the front edge elevation and the rear edge elevation respectively; over the data groups having the same sliding mass material composition, the maximum and minimum of all these values are found, front edge and rear edge elevation values not being distinguished but each treated simply as a value; the result is the value range;
sliding mass material volume attribute value: the maximum and minimum of the volume values are found, and the result is the value range;
sliding mass average thickness attribute value: the maximum and minimum of the average thickness values of the sliding mass are found, and the result is the value range;
slope gradient attribute value: contains two values, representing the maximum gradient and the minimum gradient of the slope respectively; over the data groups having the same sliding mass material composition, the maximum and minimum of all these values are found, maximum and minimum gradient values not being distinguished but each treated simply as a value; the result is the value range;
slope type attribute value: a character-type attribute value with 3 classes, namely forward slope, oblique slope and reverse slope, represented by the numbers 1 to 3 after step 2 has been executed; if the slope type is the same in all data groups of the group, the number of that type is recorded; otherwise nothing is recorded.
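A possible sketch of this per-group statistic is given below. It assumes each data group has already been encoded as a 13-value list in the attribute order of claim 2 (so that, counting from 0, index 4 is the material composition code and index 9 the slope type code); the index layout and the names `value_ranges` and `NUMERIC` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

MATERIAL = 4      # column of the sliding mass material composition code
SLOPE_TYPE = 9    # column of the slope type code
NUMERIC = {       # dependent variable -> columns holding its value(s)
    "elevation": (0, 1),
    "volume": (2,),
    "thickness": (3,),
    "gradient": (11, 12),
}


def value_ranges(dataset: Sequence[Sequence[float]]) -> Dict[float, dict]:
    """Group rows by material code and compute (min, max) per dependent variable."""
    groups: Dict[float, List[Sequence[float]]] = defaultdict(list)
    for row in dataset:
        groups[row[MATERIAL]].append(row)

    stats: Dict[float, dict] = {}
    for material, rows in groups.items():
        entry: dict = {}
        for name, cols in NUMERIC.items():
            # Two-valued attributes are pooled: front/rear (or max/min)
            # values are not distinguished when taking the range.
            values = [row[c] for row in rows for c in cols]
            entry[name] = (min(values), max(values))
        types = {row[SLOPE_TYPE] for row in rows}
        # Record the slope type only when the whole group shares it.
        entry["slope_type"] = types.pop() if len(types) == 1 else None
        stats[material] = entry
    return stats


rows = [
    [820, 905, 12000, 7.5, 1, 2, 135, 30, 140, 1, 1, 35, 20],
    [700, 760, 8000, 5.0, 1, 3, 120, 25, 100, 1, 2, 40, 25],
    [500, 650, 3000, 4.0, 2, 3, 90, 10, 80, 2, 2, 30, 15],
    [400, 450, 2000, 3.0, 2, 1, 90, 10, 80, 3, 2, 28, 12],
]
ranges = value_ranges(rows)
print(ranges[1])
```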
4. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein a new data set corresponding to the landslide event is generated in the step 5, and the specific method is as follows:
for each data group in the basic attribute data set, one of the five dependent variables counted in step 4 is selected at random and changed, while the attribute values of the other attributes are left unchanged, thereby generating a new data group; the processing for each possible selected dependent variable is as follows:
front and rear edge elevation attribute value: two values are randomly generated in the range below the minimum value, representing the front edge elevation and the rear edge elevation respectively, and are substituted for the two corresponding values of the original data group;
sliding mass material volume attribute value: one value is randomly generated in the range below the minimum value and substituted for the corresponding value of the original data group;
sliding mass average thickness attribute value: one value is randomly generated in the range below the minimum value and substituted for the corresponding value of the original data group;
slope gradient attribute value: two values are randomly generated in the range below the minimum value, representing the maximum gradient and the minimum gradient of the slope respectively, and are substituted for the two corresponding values of the original data group;
slope type attribute value: if a value was recorded for this item in step 4, one of the other two code values is selected at random and substituted for the corresponding value of the original data group; if no value was recorded, one of the front and rear edge elevation, sliding mass material volume, sliding mass average thickness and slope gradient is selected instead and processed as specified for that dependent variable.
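The negative-sample generation of this claim might look like the sketch below. Two points are assumptions not fixed by the claim: the lower bound of the "range below the minimum value" is taken to be 0, and the replacement values are drawn uniformly. The column layout follows the attribute order of claim 2, and `make_negative` is a hypothetical name.

```python
import random
from typing import Dict, List, Sequence

SLOPE_TYPE = 9    # column of the slope type code
NUMERIC = {       # dependent variable -> columns holding its value(s)
    "elevation": (0, 1),
    "volume": (2,),
    "thickness": (3,),
    "gradient": (11, 12),
}


def make_negative(row: Sequence[float], ranges: Dict[str, object],
                  rng: random.Random) -> List[float]:
    """Copy a positive row and push one dependent variable out of its range."""
    new = list(row)
    choice = rng.choice(list(NUMERIC) + ["slope_type"])
    if choice == "slope_type":
        recorded = ranges["slope_type"]
        if recorded is not None:
            # Replace with one of the other two slope type codes.
            new[SLOPE_TYPE] = rng.choice([c for c in (1, 2, 3) if c != recorded])
            return new
        # No shared slope type was recorded: fall back to a numeric variable.
        choice = rng.choice(list(NUMERIC))
    low = ranges[choice][0]
    for col in NUMERIC[choice]:
        # Assumption: draw uniformly between 0 and the group minimum.
        new[col] = rng.uniform(0.0, low)
    return new


row = [820.0, 905.0, 12000.0, 7.5, 1, 2, 135.0, 30.0, 140.0, 1, 1, 35.0, 25.0]
group_ranges = {"elevation": (700.0, 905.0), "volume": (8000.0, 12000.0),
                "thickness": (5.0, 7.5), "gradient": (20.0, 40.0),
                "slope_type": 1}
negative = make_negative(row, group_ranges, random.Random(0))
print(negative)
```

Passing an explicit `random.Random` instance keeps the generation reproducible, which is convenient when the positive and negative sets have to be regenerated consistently.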
5. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the feature combination model trained in step 6 is a gradient stochastic decision tree model.
6. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the step 8 further comprises:
each triple represents a relationship existing in the real world and is denoted (head entity, tail entity, relation), the relation being the one that holds between the head entity and the tail entity; triple samples are generated in three ways:
randomly drawing an entity px from the positive sample set pos and an entity sy from the negative sample set neg; the relation is dissimilar, and the triple is (px, sy, dissimilar) or (sy, px, dissimilar);
randomly drawing two entities px and py from the positive sample set pos; the relation is similar, and the triple is (px, py, similar);
randomly drawing two entities sx and sy from the negative sample set neg; the relation is similar, and the triple is (sx, sy, similar);
triple samples are generated by the above methods such that the number of samples with the similar relation equals the number of samples with the dissimilar relation and at least 1000 triple samples are generated for each relation; the samples are then mixed to form the triple data set.
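The three generation rules and the balance requirement can be sketched as follows. Entities are represented by their names only; the even split between the two dissimilar orderings and between the two similar sources is an assumption, since the claim does not specify the proportions, and `build_triples` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (head entity, tail entity, relation)


def build_triples(pos: Sequence[str], neg: Sequence[str],
                  per_relation: int, rng: random.Random) -> List[Triple]:
    """Generate a shuffled, balanced set of 'similar' and 'dissimilar' triples."""
    triples: List[Triple] = []
    for _ in range(per_relation):
        p, s = rng.choice(pos), rng.choice(neg)
        # Either ordering of a positive/negative pair is a dissimilar triple.
        triples.append((p, s, "dissimilar") if rng.random() < 0.5
                       else (s, p, "dissimilar"))
    for _ in range(per_relation):
        # Two distinct entities from the same set form a similar triple.
        pool = pos if rng.random() < 0.5 else neg
        a, b = rng.sample(list(pool), 2)
        triples.append((a, b, "similar"))
    rng.shuffle(triples)
    return triples


pos = [f"p{i}" for i in range(1, 101)]
neg = [f"s{i}" for i in range(1, 101)]
triples = build_triples(pos, neg, per_relation=1000, rng=random.Random(0))
print(len(triples), triples[:3])
```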
7. The landslide risk prediction method based on knowledge graph construction according to claim 1, wherein the step 9 further comprises:
step 9.1: inputting each triple, together with the erroneous triple generated from it, into the knowledge graph model;
each triple in the triple data set is a correct triple, and an erroneous triple is generated from each correct triple;
the erroneous triple is generated as follows:
if the head entity of the triple belongs to the positive sample set pos and the relation is similar, an entity is randomly selected from the negative sample set neg to replace the original tail entity;
if the head entity of the triple belongs to the positive sample set pos and the relation is dissimilar, an entity is randomly selected from the positive sample set pos to replace the original tail entity;
if the head entity of the triple belongs to the negative sample set neg and the relation is similar, an entity is selected from the positive sample set pos to replace the original tail entity;
if the head entity of the triple belongs to the negative sample set neg and the relation is dissimilar, an entity is selected from the negative sample set neg to replace the original tail entity;
the two triples are input into the knowledge graph model;
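The four replacement rules of step 9.1 can be sketched as below. Entities are represented by name, and `corrupt` is a hypothetical helper; as in the claim, nothing prevents the replacement tail from coinciding with the head when both come from the same set.

```python
import random
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]  # (head entity, tail entity, relation)


def corrupt(triple: Triple, pos: Sequence[str], neg: Sequence[str],
            rng: random.Random) -> Triple:
    """Replace the tail so that the stated relation no longer holds."""
    head, _tail, relation = triple
    head_is_positive = head in pos
    if relation == "similar":
        # A similar triple becomes wrong when the tail comes from the other set.
        pool = neg if head_is_positive else pos
    else:
        # A dissimilar triple becomes wrong when the tail comes from the same set.
        pool = pos if head_is_positive else neg
    return (head, rng.choice(list(pool)), relation)


pos = ["p1", "p2", "p3"]
neg = ["s1", "s2", "s3"]
rng = random.Random(0)
print(corrupt(("p1", "p2", "similar"), pos, neg, rng))
print(corrupt(("s1", "p2", "dissimilar"), pos, neg, rng))
```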
step 9.2: calculating the error distances to obtain a prediction result;
the knowledge graph model calculates the error distance of each of the two triples; the error distance of a triple is obtained by a vector operation on the combined feature vectors of its two entities and the feature vector of its relation, according to the following formula:
dis = |h*w + r - t*w| (1)
where w is a 39-dimensional weight vector whose initial values are randomly generated, h is the combined feature vector of the head entity of the triple, t is the combined feature vector of the tail entity of the triple, and r is the feature vector of the relation of the triple;
the error distances of the two triples are then compared to obtain the prediction result: the knowledge graph model predicts the triple with the smaller error distance to be the correct triple;
step 9.3: calculating the loss value;
an error value, defined as the loss value, is calculated from the predicted result and the actual result; the error distance dis_pos of the correct triple and the error distance dis_neg of the erroneous triple are each calculated by formula (1); if the error distance of the correct triple is smaller than that of the erroneous triple, the prediction of the knowledge graph model is correct and no loss value is considered to be generated; otherwise the prediction of the knowledge graph model is considered wrong, and the difference between the error distances is taken as the loss value, calculated as follows:
loss = max{dis_pos - dis_neg, 0} (2)
where loss is the loss value, dis_pos is the error distance of the correct triple, and dis_neg is the error distance of the erroneous triple;
step 9.4: automatically adjusting the parameters of the knowledge graph model;
steps 9.1 to 9.4 are repeated for each triple sample in the triple data set obtained in step 8 until the loss value no longer decreases, at which point training ends and the construction of the knowledge graph model is complete;
after training, the knowledge graph model judges the relation between a pair of entities as follows: the combined feature vectors of the pair of entities are input into the knowledge graph model, which calculates, according to formula (1), the error distance of the pair under each of the two relations, similar and dissimilar, and compares the two error distances; the relation with the smaller error distance is the relation between the two entities predicted by the knowledge graph model.
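Formulas (1) and (2) and the relation judgment can be sketched as below. The claim does not spell out the vector operations, so two assumptions are made here: `*` is read as an element-wise product, and `| |` as the sum of absolute values of the components (an L1 norm). The parameter update of step 9.4 is not shown, and 3-dimensional toy vectors stand in for the 39-dimensional ones.

```python
from typing import Dict, Sequence

Vector = Sequence[float]


def error_distance(h: Vector, t: Vector, r: Vector, w: Vector) -> float:
    """Formula (1): dis = |h*w + r - t*w|, read element-wise with an L1 norm."""
    return sum(abs(hi * wi + ri - ti * wi) for hi, ti, ri, wi in zip(h, t, r, w))


def margin_loss(dis_pos: float, dis_neg: float) -> float:
    """Formula (2): zero when the correct triple is closer than the erroneous one."""
    return max(dis_pos - dis_neg, 0.0)


def judge_relation(a: Vector, b: Vector,
                   relations: Dict[str, Vector], w: Vector) -> str:
    """Return the relation whose error distance for the pair (a, b) is smaller."""
    return min(relations, key=lambda name: error_distance(a, b, relations[name], w))


# Toy parameters (assumed values, not trained ones).
w = [1.0, 1.0, 1.0]
relations = {"similar": [0.0, 0.0, 0.0], "dissimilar": [1.0, 1.0, 1.0]}

h = [1.0, 2.0, 3.0]
print(judge_relation(h, [1.0, 2.0, 3.0], relations, w))
print(judge_relation(h, [2.0, 3.0, 4.0], relations, w))
print(margin_loss(dis_pos=3.0, dis_neg=0.0))
```

With these toy values, identical vectors are closest under the zero relation vector, while a tail shifted by one in every component is closest under the all-ones relation vector.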
8. The landslide risk prediction method based on knowledge graph construction of claim 1 wherein said step 10 further comprises:
obtaining the attribute values of the location under test for the geographic and geological factor attributes recorded in step 1; the probability of a landslide at the location is determined as follows:
step 10.1: processing this group of attribute values of the location under test in the manner specified for each attribute value in step 2, to obtain the feature vector of the new entity;
step 10.2: performing feature combination on the feature vector of the new entity using the feature combination model trained in step 6, to obtain the combined feature vector of the new entity;
step 10.3: forming an entity pair from the new entity and each entity in the positive sample set, and passing the combined feature vectors of both entities of each pair to the knowledge graph model; the knowledge graph model gives the relation between the new entity and each entity in the positive sample set; the number of pairs predicted as similar is counted and denoted a, the number of entities in the positive sample set is denoted n, and the ratio a/n is taken as the probability of a landslide at the location under test.
9. A system for implementing the landslide risk prediction method based on knowledge graph construction according to claim 1, comprising: a training unit and a risk prediction unit;
the training unit constructs a knowledge graph model and trains it on existing historical landslide data to obtain a knowledge graph model for judging relations; the risk prediction unit uses this model and the historical landslide data from the training unit to calculate the probability of a landslide at a new location in the same region from the attribute values of that location's relevant geographic conditions.
10. The system for implementing the landslide risk prediction method based on knowledge graph construction of claim 1, wherein the training unit comprises: a data collection module, a data processing and generating module, a feature combination module, a relation feature vector generation module, a triple data set construction module and a relation judgment model module;
the data collection module is used for inputting the geographic and geological attribute values related to the landslide events in all landslide event records of a research area to form a basic attribute data set, and passes the result to the data processing and generating module;
the data processing and generating module processes the data in the basic attribute data set; generates the positive sample set, recording each data group in the basic attribute data set as the feature vector of the corresponding entity in the positive sample set; generates, from the data group of each landslide event, one new data group corresponding to that event; generates the negative sample set, recording each new data group as the feature vector of the corresponding entity in the negative sample set; and adds the new data groups to the original basic attribute data set to form a new basic attribute data set; the positive sample set and the negative sample set are passed to the triple data set construction module, and the new basic attribute data set is passed to the feature combination module;
the feature combination module trains a model for feature combination, then uses that model to perform feature combination on the feature vector of each entity, and passes the result to the relation judgment model module;
the relation feature vector generation module generates the feature vectors of the relations in the knowledge graph, and passes the result to the relation judgment model module;
the triple data set construction module constructs the triple data set, and passes the result to the relation judgment model module;
the relation judgment model module trains a knowledge graph model for judging the relation between entities.
CN202010516705.4A 2020-06-09 2020-06-09 Landslide risk prediction method and system based on knowledge graph construction Active CN111639878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010516705.4A CN111639878B (en) 2020-06-09 2020-06-09 Landslide risk prediction method and system based on knowledge graph construction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010516705.4A CN111639878B (en) 2020-06-09 2020-06-09 Landslide risk prediction method and system based on knowledge graph construction

Publications (2)

Publication Number Publication Date
CN111639878A true CN111639878A (en) 2020-09-08
CN111639878B CN111639878B (en) 2023-05-26

Family

ID=72330791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010516705.4A Active CN111639878B (en) 2020-06-09 2020-06-09 Landslide risk prediction method and system based on knowledge graph construction

Country Status (1)

Country Link
CN (1) CN111639878B (en)


Also Published As

Publication number Publication date
CN111639878B (en) 2023-05-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant