CN113742489A

CN113742489A - Comprehensive influence compensation method based on time sequence knowledge graph

Info

Publication number: CN113742489A
Application number: CN202110894317.4A
Authority: CN
Inventors: 王彬; 李哲辉; 王炜智
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2021-12-03
Anticipated expiration: 2041-08-05

Abstract

The invention discloses a comprehensive influence compensation method based on a time sequence knowledge graph. The method expresses the acquired triple knowledge in the knowledge graph as connections between nodes, so that the whole data set can be described as a graph network, which makes historical events easier to analyze. In step S2, on the constructed knowledge graph, the history is divided into segments by time slicing, and an adjacency matrix is constructed for the event subnet of each time slice. The event occurrence time is used to propose a quadruple event representation, and, since the event network in the knowledge graph changes dynamically over time, a time decay function is proposed to fit the decaying trend of the correlation influence of historical events. In step S3, time span intervals are divided so as to compensate the total historical influence and obtain a more accurate comprehensive historical influence.

Description

Comprehensive influence compensation method based on time sequence knowledge graph

Technical Field

The invention relates to a comprehensive influence compensation method based on a time sequence knowledge graph, and belongs to the technical field of time sequence knowledge graphs.

Background

A knowledge graph (KG) is a knowledge system that stores knowledge in structured form as a graph database; in essence, it is a semantic network. Because knowledge graphs have strong expressive power, carry logical meaning and rules, and allow flexible modeling, they have attracted the attention of researchers and are widely used in many industries, for example in information retrieval, intelligent question-answering systems and recommendation systems.

Representation learning applied to knowledge graphs expresses each object to be described as a low-dimensional dense vector. Such a distributed representation effectively alleviates the problem of data sparsity and makes computation in a low-dimensional semantic space convenient.

The distributed vector representation model TransE, which is based on entities and relations, and the knowledge translation models improved from it for multiple relations, such as TransH, TransR and TransD, describe static knowledge by vector translation and space mapping. In the real world, however, knowledge often carries a time stamp and can change over time. Time sequence knowledge graphs that take the time factor into account have therefore begun to attract the attention of researchers, and a quadruple knowledge representation (head entity, time, relation, tail entity) has been proposed.

Prior art [1] (Liu J, Zhang Q, Fu L, et al. Evolving Knowledge Graphs [C]// IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. IEEE, 2019.) proposes EvolveKG, a time influence model based on a time decay function, to describe the influence of historical events on a current event. Building on this, prior art [2] (Zhan Weiwei. Research of improved event KG Method Based on Comprehensive information Model [J]. Application Research of Computers, 2020, 37(S1): 159-162.) takes the different influence weights of events into account and proposes a comprehensive influence evaluation method for the entity prediction task.

However, current time sequence knowledge graph reasoning algorithms ignore the influence of the time span. The larger the time span before the current event, the more historical events related to the current event it contains, and the larger their accumulated share of the comprehensive influence. Recent events, which are more strongly correlated with the current event, are few in number, so their influence is diluted. As a result, the evaluation of the comprehensive influence of historical events is distorted.

Disclosure of Invention

The invention provides a comprehensive influence compensation method based on a time sequence knowledge graph, which obtains the compensated comprehensive influence of historical events; this influence can further be combined with a training model to perform link prediction on the time sequence knowledge graph.

The technical scheme of the invention is as follows: a comprehensive influence compensation method based on a time sequence knowledge graph comprises the following steps:

S1, cleaning the data set; extracting the triple knowledge (h, r, t) from the cleaned data set together with the time T at which the event represented by each triple occurred; dividing the extracted data, each item consisting of a triple (h, r, t) and its event time T, into a training set and a test set; and constructing a knowledge graph from the training set, in which the relation r of each triple serves as the relation between nodes. All head entities and tail entities in the training set and the test set are collected and, after de-duplication, represented as the entity set E = {E₁, E₂, E₃, ..., E_N}; all relations in the training set and the test set are collected and, after de-duplication, represented as the relation set R = {R₁, R₂, R₃, ..., R_M}; where E_N denotes the Nth entity (a head or tail entity), N being the total number of entities, and R_M denotes the Mth relation, M being the total number of relations;

S2, on the knowledge graph constructed in step S1, dividing the historical time axis into time slices of fixed length d, so that the events on the time axis are divided into {G₁, G₂, ..., G_n}, where G_n denotes the nth event subnet; constructing the adjacency matrices {A(G₁), A(G₂), ..., A(G_n)} corresponding to the subnets; calculating, through the adjacency matrices and a similarity index, the correlation between node pairs that have common neighbor nodes, and then fusing the time factor to obtain the correlation influence with the time factor fused; the obtained influence is taken as the historical correlation comprehensive influence of the time slice on the current event, given that the current event is determined;

S3, for the divided time slices, dividing time span intervals according to the span between each time slice and the current time node, and assigning different span factors to calculate the compensated comprehensive influence of historical events.
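As a minimal sketch of the de-duplication in step S1, assume each cleaned record is already a Python tuple (h, r, t, T) with string entity and relation names; this layout and the hypothetical helper `build_vocab` are illustrations only, not taken from the text. The entity set E and relation set R are collected over both splits in first-seen order:

```python
from typing import List, Tuple

# Hypothetical record layout: (head, relation, tail, event time).
Quad = Tuple[str, str, str, int]

def build_vocab(train: List[Quad], test: List[Quad]) -> Tuple[List[str], List[str]]:
    """Collect the de-duplicated entity set E and relation set R over both splits."""
    entities: List[str] = []
    relations: List[str] = []
    seen_e, seen_r = set(), set()
    for h, r, t, _T in train + test:
        for e in (h, t):  # head and tail entities both belong to E
            if e not in seen_e:
                seen_e.add(e)
                entities.append(e)
        if r not in seen_r:
            seen_r.add(r)
            relations.append(r)
    return entities, relations  # E_1..E_N and R_1..R_M in first-seen order

train = [("A", "meet", "B", 1), ("B", "criticize", "C", 2)]
test = [("A", "meet", "C", 3)]
E, R = build_vocab(train, test)
print(len(E), len(R))  # 3 2
```

Keeping a list alongside a set preserves a stable index for each entity, which later steps can use as the row and column index of the adjacency matrices.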

The compensated comprehensive influence of historical events is integrated into a knowledge representation model as a weight, and the vector representations of the entities and relations with the time factor fused are obtained iteratively; using the trained vector representations, a link prediction task is performed on the test set and evaluated by score ranking and performance indexes.

Step S2 specifically includes:

S2.1, on the knowledge graph constructed in step S1, dividing the historical time axis into time slices of fixed length d, so that the events on the time axis are divided into {G₁, G₂, ..., G_n}, where G_n denotes the nth event subnet;

S2.2, constructing the adjacency matrices {A(G₁), A(G₂), ..., A(G_n)} corresponding to the event subnets, where A(G_n) denotes the adjacency matrix of event subnet G_n;

S2.3, for each adjacency matrix, counting the common neighbor nodes of all node pairs;

S2.4, counting the node degree of each common neighbor node of each node pair and taking it as a measure of that neighbor's contribution to the indirect connection; then calculating, through the Adamic-Adar index, the correlation S_AB between the node pair from the importance of all its common neighbor nodes;

S2.5, adding time to the triple representation as a fourth element of knowledge, so that each event is represented as a positive quadruple (h, r, t, T); for the current event (A, r, B, T2) occurring at the current time point T2, traversing the node-pair correlations obtained in step S2.4 according to the head entity A and tail entity B of the current event, and fusing the S_AB whose head entity is A and tail entity is B with the time decay function to obtain the correlation influence SIM(A, B) with the time factor fused; SIM(A, B) serves as the historical correlation comprehensive influence of the time slice on the current event that occurs at the current time point with head entity A and tail entity B.

The time decay function is f(T1) = e^{-λ(T2 - T1)}, where T1 is the time point of the historical event involving nodes A and B in the knowledge graph, T2 is the time point of the current event whose head entity is A and tail entity is B, λ is the decay factor, and f(T1) expresses the degree to which the influence of a historical event occurring at time point T1 on the current event has decayed.
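Steps S2.2 and S2.3 can be illustrated with a small sketch. It assumes, for illustration only, that each event subnet is a list of (h, r, t, T) tuples, that entities are mapped to integer indices, and that the subnet is treated as undirected with relation labels ignored; none of these choices is specified in the text, and `adjacency` and `common_neighbours` are hypothetical helper names.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, int]  # hypothetical layout: (h, r, t, T)

def adjacency(subnet: List[Quad], index: Dict[str, int]) -> List[List[int]]:
    """0/1 adjacency matrix A(G_k) of one event subnet, treated as undirected."""
    n = len(index)
    A = [[0] * n for _ in range(n)]
    for h, _r, t, _T in subnet:
        i, j = index[h], index[t]
        if i != j:  # ignore self-loops
            A[i][j] = A[j][i] = 1
    return A

def common_neighbours(A: List[List[int]]) -> Dict[Tuple[int, int], List[int]]:
    """Map each node pair (a, b), a < b, that has common neighbours to the list of them."""
    result = {}
    for a, b in combinations(range(len(A)), 2):
        shared = [z for z in range(len(A)) if A[a][z] and A[b][z]]
        if shared:  # keep only pairs that are indirectly connected
            result[(a, b)] = shared
    return result

index = {"A": 0, "B": 1, "C": 2}
A = adjacency([("A", "meet", "B", 3), ("B", "visit", "C", 5)], index)
print(common_neighbours(A))  # {(0, 2): [1]}: A and C share neighbour B
```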

Step S3 specifically includes:

S3.1, dividing the historical time axis of the data set from step S1 into time span intervals by the equal-area division method of the normal distribution; each time span interval contains one or more time slices, and each time slice belongs to exactly one time span interval;

S3.2, given the current event, accumulating the historical correlation comprehensive influences of the time slices contained in each time span interval to obtain the comprehensive influence of that time span interval on the current event;

S3.3, assigning different span factors to different time span intervals and calculating the compensated comprehensive influence of historical events on the current event;

S3.4, integrating the compensated comprehensive influence of historical events on the current event into a knowledge representation model as a weight; constructing an equal number of negative quadruples from the positive quadruples and using them as model input for training, to obtain the vector representations {E¹, E², E³, ..., E^N} and {R¹, R², R³, ..., R^M} of entities and relations with the time factor fused, where E^N is the vector representation of entity E_N and R^M is the vector representation of relation R_M, each with the time factor fused;

S3.5, replacing the head entity or the tail entity of every quadruple in the test set; both replacements work the same way, so head replacement is described: the head entity of the quadruple representing each test event is replaced in turn by each of the N collected entities to construct N candidate quadruples; the score of each of the N candidates is calculated by the score function, and the rank of the candidate identical to the original test event is determined; the ranks of all test events are then collected, and the effect of the link prediction task is judged by the Mean Rank and Hits@ indexes.

The score function is f_r(h, t) = ||E^h + R^r - E^t||_L2, where E^h is the vector of the head entity in entity set E with the time factor fused, R^r is the vector of the relation with the time factor fused, E^t is the vector of the tail entity in entity set E with the time factor fused, and L2 denotes the L2 norm.
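The score function above can be sketched in plain Python, assuming the embeddings are equal-length lists of floats (a hypothetical representation chosen for illustration). Under the usual TransE convention, a smaller distance marks a more plausible quadruple.

```python
import math
from typing import Sequence

def score(e_h: Sequence[float], r_r: Sequence[float], e_t: Sequence[float]) -> float:
    """f_r(h, t) = ||E^h + R^r - E^t||_2; smaller means more plausible."""
    if not (len(e_h) == len(r_r) == len(e_t)):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(e_h, r_r, e_t)))

print(score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0, a perfect translation
print(score([0.0, 0.0], [3.0, 0.0], [0.0, 4.0]))  # 5.0
```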

The comprehensive influence of a time span interval on the current event in step S3.2 is:

$$ l_w = \sum_{i=1}^{q_w} \mathrm{SIM}_i(A, B) $$

where l_w is the comprehensive influence of the wth time span interval on the current event, q_w is the number of time slices contained in the wth time span interval, and SIM_i(A, B) is the historical correlation comprehensive influence of the ith time slice in the interval.
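Step S3.2 accumulates the slice influences of each interval. Assuming that accumulation is a plain sum, l_w = Σ SIM_i(A, B) over the q_w slices of interval w, a minimal sketch follows; representing intervals as lists of 1-based slice indices and treating slices without a recorded SIM value as contributing 0 are assumptions, and `interval_influence` is a hypothetical name.

```python
from typing import Dict, List, Sequence

def interval_influence(sim_by_slice: Dict[int, float],
                       intervals: Sequence[Sequence[int]]) -> List[float]:
    """l_w = sum of SIM_i(A, B) over the q_w time slices assigned to interval w."""
    # Slices with no recorded SIM value for (A, B) contribute 0 (an assumption).
    return [sum(sim_by_slice.get(i, 0.0) for i in slices) for slices in intervals]

sim = {1: 0.2, 2: 0.3, 3: 0.5, 4: 1.0}           # hypothetical SIM_i(A, B) per slice
print(interval_influence(sim, [[1, 2, 3], [4]]))  # [1.0, 1.0]
```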

In step S3.3, the W time span intervals are assigned different span factors, and the interval influences l_w, weighted by their span factors, are accumulated to give the compensated comprehensive influence of historical events on the current event, where w = 1, 2, ..., W; W is the total number of time span intervals; q_w is the number of time slices in the wth time span interval; l_w is the comprehensive influence of the wth time span interval on the current event; and l is the compensated comprehensive influence of historical events on the current event with head entity A and tail entity B.
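The text does not give the formula for the span factors, so the sketch below only shows the accumulation step, l = Σ α_w · l_w, with hypothetical factors α_w supplied by the caller. It is an illustration under that assumption, not the patent's factor definition; `compensated_influence` is a hypothetical name.

```python
from typing import Sequence

def compensated_influence(l_w: Sequence[float], alpha: Sequence[float]) -> float:
    """l = sum over w of alpha_w * l_w, with one span factor alpha_w per time span interval."""
    if len(l_w) != len(alpha):
        raise ValueError("exactly one span factor per time span interval is required")
    return sum(a * x for a, x in zip(alpha, l_w))

# Hypothetical values: the distant interval accumulated more influence (3.0) simply because
# it holds more slices, so it is given a smaller factor than the recent interval.
print(compensated_influence([3.0, 1.0], [0.5, 2.0]))  # 3.5
```

A larger factor on recent intervals is one way to counter the dilution of recent, highly correlated events described in the Background.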

The model training in step S3.4 minimizes a margin-based loss function L, in which S is the positive quadruple set, S' is the negative quadruple set, l_pos is the comprehensive influence of a positive quadruple calculated in S3.3, f_r(h, t) is the score of the positive quadruple, l_neg is the comprehensive influence of a negative quadruple calculated in S3.3, f_r(h', t') is the score of the negative quadruple, and γ is the margin term; training is the process of minimizing L, and its output is the vector representation of all entities and relations.
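Since the loss formula itself is not reproduced here, the following is only a sketch of one plausible reading: a TransE-style hinge [γ + l_pos·f_r(h, t) - l_neg·f_r(h', t')]_+ summed over (positive, negative) pairs. The multiplicative weighting and the pair layout are assumptions, and `margin_loss` is a hypothetical name; γ = 1.0 follows the value given in Example 1.

```python
from typing import Iterable, Tuple

def margin_loss(pairs: Iterable[Tuple[float, float, float, float]], gamma: float = 1.0) -> float:
    """L = sum of [gamma + l_pos * f_pos - l_neg * f_neg]_+ over (positive, negative) pairs.

    Each pair holds (l_pos, f_pos, l_neg, f_neg): the compensated influences from S3.3 and
    the L2 scores of the positive and negative quadruple. Weighting the scores
    multiplicatively is an assumption of this sketch.
    """
    return sum(max(0.0, gamma + lp * fp - ln * fn) for lp, fp, ln, fn in pairs)

# The first pair is already separated by more than the margin, so only the second contributes.
print(margin_loss([(1.0, 0.5, 1.0, 2.0), (1.0, 1.0, 1.0, 1.2)]))  # about 0.8
```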

The beneficial effects of the invention are as follows. A comprehensive influence compensation model for historical events is designed based on the time sequence knowledge graph; the model can effectively mine and capture the influence of historical events on the present situation, so that more accurate knowledge representations are obtained. On the basis of dividing historical events into time slices, the invention considers not only the decay of event influence over time but also the influence of the neighborhood network information within an event subnet on future events; on this basis, it further divides time span intervals to compensate the comprehensive influence, which helps to obtain more accurate knowledge representations. Tests on several data sets show that the method generalizes well and can be combined with a static vector training model to perform link prediction on a time sequence knowledge graph.

Specifically, the acquired triple knowledge is represented in the knowledge graph by connections between nodes, so the data can be depicted as a whole in graph-network form and historical events can be analyzed more conveniently. In step S2, on the constructed knowledge graph, the history is divided into segments by time slicing, and an adjacency matrix is constructed for the event subnet of each time slice, which facilitates calculating the correlation influence of historical events and analyzing the different influences arising in different time slices. The event occurrence time is used to propose a quadruple event representation; combining the occurrence time with the fact that the event network in the knowledge graph changes dynamically over time, a time decay function is proposed to fit the decaying trend of the correlation influence of historical events, which better matches how events develop. Recent historical events have a large influence, but because there are few of them, their influence is diluted when the accumulated historical influence is calculated; step S3 addresses this by dividing time span intervals, which compensates the total historical influence and yields a more accurate comprehensive historical influence. The compensated comprehensive influence is integrated into a knowledge representation model, entity and relation vectors fused with time information are obtained by training, and link prediction experiments are then performed on the test set data, where the evaluation indexes show improved link prediction in subsequent tasks.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flowchart of a method of compensating for integrated influence;

FIG. 3 is a flow chart of a training experiment;

FIG. 4 is a flow chart of a prediction experiment.

Detailed Description

The invention will be further described with reference to the following figures and examples, without however restricting the scope of the invention thereto.

Example 1: as shown in FIGS. 1 to 4, a comprehensive influence compensation method based on a time sequence knowledge graph includes:

S1, cleaning the time sequence structured event data set (removing records with missing information); extracting the triple knowledge (h, r, t) from the cleaned data set together with the time T at which the event represented by each triple occurred; dividing the extracted data, each item consisting of a triple (h, r, t) and its event time T, into a training set and a test set; and constructing a knowledge graph from the training set, in which the relation r of each triple serves as the relation between nodes. All head entities and tail entities in the training set and the test set are collected and, after de-duplication, represented as the set E = {E₁, E₂, E₃, ..., E_N}; all relations in the training set and the test set are collected and, after de-duplication, represented as the set R = {R₁, R₂, R₃, ..., R_M}; where E_N denotes the Nth entity (a head or tail entity), N being the total number of entities, and R_M denotes the Mth relation, M being the total number of relations;

S2, on the knowledge graph constructed in step S1, dividing the historical time axis into time slices of fixed length d (for example, a month or a week), so that the events on the time axis are divided into {G₁, G₂, ..., G_n}, where G_n denotes the nth event subnet; constructing the adjacency matrices {A(G₁), A(G₂), ..., A(G_n)} corresponding to the subnets; calculating, through the adjacency matrices and a similarity index (such as the Adamic-Adar index), the correlation between node pairs that have common neighbor nodes, and then fusing the time factor to obtain the correlation influence with the time factor fused; the obtained influence is taken as the historical correlation comprehensive influence of the time slice on the current event, given that the current event is determined;

S3, for the divided time slices, dividing time span intervals according to their span from the current time node and assigning different span factors to obtain the compensated comprehensive influence of historical events; this compensates the comprehensive influence of recent historical events and weakens the influence of distant ones.
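The fixed-length slicing of step S2 can be sketched as follows, assuming event times are integer day indices and slices start at a chosen origin t0; both assumptions, like the hypothetical helper `slice_by_length`, are illustrative only. A week corresponds to d = 7 here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, int]  # hypothetical layout: (h, r, t, T), T an integer day index

def slice_by_length(quads: List[Quad], d: int, t0: int) -> Dict[int, List[Quad]]:
    """Group quadruples into event subnets G_1, G_2, ... of fixed length d starting at time t0."""
    if d <= 0:
        raise ValueError("slice length d must be positive")
    subnets: Dict[int, List[Quad]] = defaultdict(list)
    for q in quads:
        k = (q[3] - t0) // d + 1  # 1-based index: times t0..t0+d-1 fall into G_1
        subnets[k].append(q)
    return dict(subnets)

quads = [("A", "meet", "B", 0), ("B", "visit", "C", 6), ("A", "meet", "C", 7)]
G = slice_by_length(quads, d=7, t0=0)  # weekly slices
print(sorted(G))  # [1, 2]
```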

Further, the compensated comprehensive influence of historical events may be integrated into the knowledge representation model as a weight, and the vector representations {E¹, E², E³, ..., E^N} and {R¹, R², R³, ..., R^M} of entities and relations with the time factor fused are obtained iteratively. Using the trained vector representations, a link prediction task is performed on the test set and evaluated by performance indexes such as score ranking and hit rate. The link prediction task predicts which relation may occur between two nodes at the current time by analyzing the historical correlations of the two nodes.

Further, step S2 may specifically be as follows:

S2.3, for each adjacency matrix, counting the common neighbor nodes of all node pairs (nodes are the form in which entities exist in the knowledge graph, and any two nodes form a node pair);

S2.5, since the correlation S_AB between node pairs calculated in step S2.4 decays over time, time is added to the triple representation as a fourth element of knowledge, and each event is represented as a positive quadruple (h, r, t, T). For the current event (A, r, B, T2) occurring at the current time point T2, the node-pair correlations obtained in step S2.4 are traversed according to the head entity and tail entity of the current event to find the S_AB whose head entity is A and tail entity is B; this S_AB is fused with (that is, multiplied by) the time decay function to obtain the correlation influence SIM(A, B) with the time factor fused, which serves as the historical correlation comprehensive influence of the time slice on the current event occurring at the current time point with head entity A and tail entity B.

In step S2.4, the correlation S_AB between a node pair is calculated through the Adamic-Adar index:

$$ S_{AB} = \sum_{z \in \Gamma(A) \cap \Gamma(B)} \frac{1}{\log k(z)} $$

where Γ(A) is the set of neighbor nodes of node A, Γ(B) is the set of neighbor nodes of node B, z is a common neighbor node of A and B, and k(z) is the degree of the common neighbor node z. The correlation S_AB between nodes A and B is thus the sum, over all common neighbors, of the reciprocal of the logarithm of each neighbor's degree; S_AB characterizes how much the common neighbors contribute to the correlation between A and B.
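A minimal sketch of the Adamic-Adar index over a 0/1 adjacency matrix follows; the list-of-lists matrix, the natural logarithm and the helper name `adamic_adar` are assumptions made for illustration. A common neighbour of two distinct nodes always has degree at least 2, so log k(z) is never zero.

```python
import math
from typing import List

def adamic_adar(A: List[List[int]], a: int, b: int) -> float:
    """S_AB = sum over common neighbours z of A and B of 1 / log k(z)."""
    s = 0.0
    for z, (in_a, in_b) in enumerate(zip(A[a], A[b])):
        if in_a and in_b and z not in (a, b):
            k = sum(A[z])  # degree k(z); a common neighbour of two nodes has k(z) >= 2
            s += 1.0 / math.log(k)
    return s

# Star graph: node 0 is linked to nodes 1, 2 and 3, so k(0) = 3.
A = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(round(adamic_adar(A, 1, 2), 4))  # 0.9102, i.e. 1 / ln 3
```

Low-degree common neighbours contribute more than hubs, which matches the text's use of node degree as a measure of a neighbour's contribution.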

Further, the time decay function may be set as f(T1) = e^{-λ(T2 - T1)}, where T1 is the time point of the historical event involving nodes A and B in the knowledge graph, T2 is the time point of the current event whose head entity is A and tail entity is B, λ is the decay factor, and f(T1) expresses the degree to which the influence of a historical event occurring at time point T1 on the current event has decayed; the time decay function in step S2.5 is a negative exponential function intended to fit the decaying trend of event influence.
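The fusion in step S2.5 multiplies S_AB by the decay function. A minimal sketch follows; using λ = 0.01 (the value given later in this example) and time measured in arbitrary units are assumptions, and `sim` is a hypothetical name.

```python
import math

def sim(s_ab: float, T1: float, T2: float, lam: float = 0.01) -> float:
    """SIM(A, B) = S_AB * f(T1), with f(T1) = exp(-lam * (T2 - T1))."""
    if T1 > T2:
        raise ValueError("the historical slice must not be later than the current event")
    return s_ab * math.exp(-lam * (T2 - T1))

# A slice whose events occurred 100 time units before the current event keeps e^-1 of S_AB.
print(round(sim(2.0, T1=0, T2=100), 4))  # 0.7358
```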

If P{(h, r, t, T2)} denotes the probability that the event (h, r, t, T2) occurs, the following conditions are satisfied:

if entity h has no new related event within a period of time, the probability of the event remains unchanged until the end of that period; if related historical events of entity h occurred within a certain time range, the probability of the event at the end of that range is greater than in the case where no related event occurred; if related historical events of entity h occurred within a certain time range, the closer their dates are to the current event, the higher the probability at the end of that range; and if related historical events of entity h occurred within a certain time range, the more of them there are, the higher the probability at the end of that range.

Specifically, if nothing happens in the interval from T1 to T2 (T1 = T2 - ΔT), the probability remains unchanged:

P{(h, r, t, T2)} = P{(h, r, t, T1)}

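If facts from some set occur in the interval from T1 to T2, the probability satisfies the following conditions:

R1: the probability is greater than in the case where no related event occurs;

R2: for related events occurring at times T3 and T4, where T2 ≥ T3 ≥ T4 ≥ T1, the more recent event (at T3) raises the probability more;

R3: the more related events occur, the greater the probability.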

Therefore, a historical event has a certain effect on the current event, but its influence keeps decreasing as time passes after it occurs. In general, the change over time of the influence of a historical event on the current event can be expressed by the time decay function f(T1) = e^{-λ(T2 - T1)}, where T1 is the time point of the historical event involving nodes A and B in the knowledge graph, T2 is the time point of the current event whose head entity is A and tail entity is B, λ is the decay factor, taken as 0.01, and f(T1) expresses the degree to which the influence of a historical event occurring at T1 on the current event has decayed once the target of the current event is determined (that is, once the head entity A and tail entity B of the current event are determined).
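The decay function with λ = 0.01 can be tabulated with a short sketch; measuring T in days is an assumption (the text does not fix the time unit), and `time_decay` is a hypothetical name.

```python
import math

def time_decay(T1: float, T2: float, lam: float = 0.01) -> float:
    """f(T1) = exp(-lam * (T2 - T1)): remaining weight of an event at T1 for a current event at T2."""
    if T1 > T2:
        raise ValueError("a historical event cannot occur after the current event")
    return math.exp(-lam * (T2 - T1))

# Remaining influence after gaps of 0, 30, 100 and 365 (assumed) days.
for gap in (0, 30, 100, 365):
    print(gap, round(time_decay(0, gap), 3))
```

With this λ the influence halves roughly every ln 2 / 0.01 ≈ 69 time units, so a year-old event retains only a few percent of its weight.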

Further, step S3 may specifically be as follows:

S3.1, dividing the historical time axis of the time sequence structured event data set from step S1 into time span intervals by the equal-area division method of the normal distribution; each time span interval contains one or more time slices, and each time slice belongs to exactly one time span interval;

S3.3, assigning different span factors to different time span intervals and calculating the compensated comprehensive influence of historical events;

S3.4, integrating the compensated comprehensive influence of historical events into a knowledge representation model (for example, the TransE model) as a weight; at the same time, constructing an equal number of negative quadruples from the positive quadruples and using them as model input for training, to obtain the vector representations {E¹, E², E³, ..., E^N} and {R¹, R², R³, ..., R^M} of entities and relations with the time factor fused, where E^N is the vector representation of entity E_N with the time factor fused, E¹ is that of entity E₁, and R^M is that of relation R_M;

The positive quadruples are constructed from the true data set obtained in step S1, whereas the negative quadruples constructed in this step form a non-true data set.

S3.5, replacing the head entity or the tail entity of every quadruple in the test set; both replacements work the same way, so head replacement is described: the head entity of the quadruple representing each test event is replaced in turn by each of the N collected entities to construct N candidate quadruples; the score of each of the N candidates is calculated by the score function, and the rank of the candidate identical to the original test event is determined; the ranks of all test events are then collected, and the effect of the link prediction task is judged by the Mean Rank and Hits@ indexes. For example, if N = 10000 entities are counted, each event in the test set becomes 10000 candidate quadruples after replacement, one of which is identical to the replaced test event. This is done for every event in the test set.
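The head-replacement evaluation of step S3.5 can be sketched as below. The scorer is passed in as a function, lower scores are treated as better (the L2-distance convention), ties are resolved in favour of the true quadruple, and no filtering of other true facts is applied; these choices, the helper names and the default k = 10 for Hits@k are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Quad = Tuple[str, str, str, int]  # hypothetical layout: (h, r, t, T)

def head_rank(q: Quad, entities: Sequence[str], score: Callable[[Quad], float]) -> int:
    """Rank (1 = best) of the true quadruple among its N head-replaced candidates."""
    h, r, t, T = q
    true_score = score(q)
    # Count candidates that score strictly better (lower) than the true quadruple.
    better = sum(1 for e in entities if score((e, r, t, T)) < true_score)
    return better + 1

def mean_rank_and_hits(ranks: List[int], k: int = 10) -> Tuple[float, float]:
    """Mean Rank and Hits@k over the ranks of all test events."""
    mean_rank = sum(ranks) / len(ranks)
    hits = sum(1 for x in ranks if x <= k) / len(ranks)
    return mean_rank, hits

# Toy scorer: each head entity has a fixed score (purely illustrative).
toy = {"A": 0.1, "B": 0.5, "C": 0.3}
rank = head_rank(("C", "meet", "B", 4), list(toy), lambda q: toy[q[0]])
print(rank)  # 2, only "A" scores better than the true head "C"
print(mean_rank_and_hits([2, 1, 15], k=10))
```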

Further, the score function may be set as f_r(h, t) = ||E^h + R^r - E^t||_L2, where E^h is the vector of the head entity in entity set E with the time factor fused, R^r is the vector of the relation with the time factor fused, E^t is the vector of the tail entity in entity set E with the time factor fused, and L2 denotes the L2 norm.

Further, the division in step S3.1 may follow the equal-area method of the normal distribution: using the area integral of the normal density, the historical time axis is divided into several time span intervals such that the area under the density from t₁ to t₂ is the same for every interval, where t₁ is the start and t₂ is the end of a time span interval.
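The equal-area division can be sketched with the standard library's `statistics.NormalDist`. The text does not state the mean or standard deviation of the normal distribution, nor how the time axis is mapped onto it; placing the mean at the current time and the values of σ and W below are hypothetical choices, and `equal_area_bounds` is a hypothetical name. The density is truncated to the history range so that every interval carries the same share of the mass.

```python
from statistics import NormalDist
from typing import List

def equal_area_bounds(t_start: float, t_end: float, W: int, mu: float, sigma: float) -> List[float]:
    """Boundaries splitting [t_start, t_end] into W intervals of equal area under N(mu, sigma)."""
    nd = NormalDist(mu, sigma)
    lo, hi = nd.cdf(t_start), nd.cdf(t_end)
    bounds = [t_start]
    for w in range(1, W):
        # The w-th inner boundary leaves w/W of the truncated mass to its left.
        bounds.append(nd.inv_cdf(lo + (hi - lo) * w / W))
    bounds.append(t_end)
    return bounds

# Hypothetical setting: 12 monthly slices, mean placed at the current time (t = 12).
b = equal_area_bounds(0.0, 12.0, W=4, mu=12.0, sigma=4.0)
print([round(x, 2) for x in b])
```

With the mean at the current time, intervals near the present are narrow and distant ones are wide; whether this matches the intended mapping is an assumption.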

Further, the comprehensive influence of a time span interval on the current event in step S3.2 may be calculated as:

$$ l_w = \sum_{i=1}^{q_w} \mathrm{SIM}_i(A, B) $$

where l_w is the comprehensive influence of the wth time span interval on the current event (that is, the accumulated historical correlation comprehensive influence of the time slices in the wth time span interval), q_w is the number of time slices contained in the wth time span interval, and SIM_i(A, B) is the historical correlation comprehensive influence of the ith time slice.

Further, in step S3.3, the W time span intervals may be assigned different span factors, where w = 1, 2, ..., W; W is the total number of time span intervals, q_w is the number of time slices in the wth time span interval, and l is the compensated comprehensive influence of historical events on the current event with head entity A and tail entity B.

Further, the model training in step S3.4 may be expressed as minimizing a margin-based loss function L, where S is the positive quadruple set, S' is the negative quadruple set, l_pos is the comprehensive influence of a positive quadruple calculated in S3.3, f_r(h, t) = ||E^h + R^r - E^t||_L2 is the score of a positive quadruple, l_neg is the comprehensive influence of a negative quadruple calculated in S3.3, f_r(h', t') = ||E^h' + R^r - E^t'||_L2 is the score of a negative quadruple, and γ is the margin term, taken as 1.0; training is the process of minimizing L, and its output is the vector representation of all entities and relations. E^h and E^h' are the vector representations of head entities h and h' in entity set E with the time factor fused, R^r is the vector representation of the relation with the time factor fused, E^t and E^t' are the vector representations of tail entities t and t' in entity set E with the time factor fused, and the subscript + denotes taking the maximum of the bracketed value and 0.
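The negative set S' is built as an equal number of corrupted copies of the positive quadruples. A minimal sketch follows, assuming uniform head-or-tail replacement with rejection of corruptions that are themselves known true facts (a common TransE-style choice, not confirmed by the text); the seed, the retry limit and the name `corrupt` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Quad = Tuple[str, str, str, int]  # hypothetical layout: (h, r, t, T)

def corrupt(pos: List[Quad], entities: Sequence[str], seed: int = 0,
            max_tries: int = 1000) -> List[Quad]:
    """Build one negative quadruple per positive one by replacing its head or its tail."""
    rng = random.Random(seed)
    true_set = set(pos)
    neg: List[Quad] = []
    for h, r, t, T in pos:
        for _ in range(max_tries):
            e = rng.choice(entities)
            # Replace the head or the tail with equal probability (an assumption).
            cand = (e, r, t, T) if rng.random() < 0.5 else (h, r, e, T)
            if cand not in true_set:  # a negative must not be a known true fact
                neg.append(cand)
                break
        else:
            raise RuntimeError(f"no false corruption found for {(h, r, t, T)}")
    return neg

pos = [("A", "meet", "B", 1), ("B", "visit", "C", 2)]
neg = corrupt(pos, ["A", "B", "C", "D"])
print(len(neg) == len(pos))  # True: an equal number of negatives
```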

Example 2: link prediction on the time sequence knowledge graph is performed using the ICEWS2014 and ICEWS2017 data of the Integrated Crisis Early Warning System (ICEWS) as examples; the statistics of the experimental data attributes are shown in Table 1.

Table 1 Statistics of the attributes of the ICEWS experimental data

A comprehensive influence compensation method based on a time sequence knowledge graph comprises the following steps:

S1, cleaning the data set of the Integrated Crisis Early Warning System; extracting the triple knowledge (h, r, t) from the cleaned data set together with the time T at which the event represented by each triple occurred (the number of items extracted is chosen according to actual needs); dividing the extracted data, each item consisting of a triple (h, r, t) and its event time T, into a training set and a test set; and constructing a knowledge graph on the training set, in which the relation r of each triple serves as the relation between nodes. All head entities and tail entities in the training set and the test set are collected and, after de-duplication, represented as the set E = {E₁, E₂, E₃, ..., E_N}; all relations in the training set and the test set are collected and, after de-duplication, represented as the set R = {R₁, R₂, R₃, ..., R_M}; where E_N denotes the Nth entity (a head or tail entity), N being the total number of entities, and R_M denotes the Mth relation, M being the total number of relations;

S2, dividing the constructed knowledge graph into time slices of fixed length d (one month in this example) along the historical time axes 2014-1-1 to 2014-12-31 and 2017-1-1 to 2017-12-31, so that the events on each time axis are divided into n = 12 event subnets $\{G_1,G_2,\dots,G_n\}$; constructing the adjacency matrix corresponding to each subnet, $\{A(G_1),A(G_2),\dots,A(G_n)\}$; calculating the correlation between node pairs that share common neighbor nodes through the adjacency matrices and the Adamic-Adar index, and fusing time decay to obtain the time-decayed influence of the historical events, which serves as the comprehensive influence of each time slice on the event (A, B) at the current time point;

S3, grouping the divided time slices into time span intervals according to their span from the current time node and assigning different span factors to the intervals, so as to compensate the comprehensive influence of recent historical events; integrating the resulting weights into a vector representation model and iteratively obtaining the time-fused vector representations of the entities and relations, $\{E^1,E^2,E^3,\dots,E^N\}$ and $\{R^1,R^2,R^3,\dots,R^M\}$; and performing the link prediction task on the test set data with the trained vector representations, evaluated by score ranking and hit rate.

The specific method for acquiring the comprehensive influence in the step S2 is as follows:

S2.1, on the knowledge graph constructed in step S1, dividing the historical time axis into time slices of fixed length d, so that the events on the time axis are divided into n event subnets $\{G_1,G_2,\dots,G_n\}$;
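
A sketch of the time slicing in S2.1, under the embodiment's choice of one-month slices (n = 12 per year), plus a variant with a fixed length of d days. Dates are assumed to be `datetime.date` objects, and the function names are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Quad = Tuple[str, str, str, date]  # (h, r, t, T)


def slice_by_month(quads: List[Quad], year: int) -> List[List[Quad]]:
    """Split one year's events into n = 12 monthly event subnets G_1..G_12."""
    subnets: List[List[Quad]] = [[] for _ in range(12)]
    for quad in quads:
        day = quad[3]
        if day.year == year:  # events outside this historical time axis are ignored
            subnets[day.month - 1].append(quad)
    return subnets


def slice_fixed_length(quads: List[Quad], start: date, end: date, d: int) -> List[List[Quad]]:
    """Variant with slices of a fixed length of d days between start and end (inclusive)."""
    n = (end - start).days // d + 1
    subnets: List[List[Quad]] = [[] for _ in range(n)]
    for quad in quads:
        day = quad[3]
        if start <= day <= end:
            subnets[(day - start).days // d].append(quad)
    return subnets
```

Calendar months differ in length, so the monthly version groups by month number instead of counting days.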

S2.2, constructing the adjacency matrix corresponding to each event subnet, $\{A(G_1),A(G_2),\dots,A(G_n)\}$;
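
A possible construction of $A(G_i)$ for one subnet is sketched below. It assumes that edges are undirected and binary and that relation types are ignored, the usual setting for common-neighbor indices such as Adamic-Adar. Indexing over the whole entity set E keeps all subnet matrices the same size. The helper name is hypothetical.

```python
from typing import Dict, List, Tuple


def adjacency_matrix(subnet: List[Tuple[str, str, str, str]], entity_index: Dict[str, int]) -> List[List[int]]:
    """Binary, symmetric adjacency matrix A(G) of one event subnet.

    ASSUMPTION: edges are undirected and relation types are ignored.
    """
    n = len(entity_index)
    A = [[0] * n for _ in range(n)]
    for h, _r, t, _time in subnet:
        i, j = entity_index[h], entity_index[t]
        if i != j:  # self-loops do not create neighbors
            A[i][j] = 1
            A[j][i] = 1
    return A
```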

S2.3, traversing all entities in each adjacency matrix and finding, for each entity, all nodes that share common neighbors with it, to obtain the common neighbor set of each node pair;
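
Step S2.3 can be sketched as follows: read each node's neighbor set from its row of the adjacency matrix and intersect the sets for every unordered pair. This is an illustrative helper with a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def common_neighbor_sets(A: List[List[int]]) -> Dict[Tuple[int, int], Set[int]]:
    """Map each unordered node pair (i, j), i < j, to its non-empty set of common neighbors."""
    n = len(A)
    neighbors = [{k for k in range(n) if A[i][k]} for i in range(n)]
    result: Dict[Tuple[int, int], Set[int]] = {}
    for i, j in combinations(range(n), 2):
        common = neighbors[i] & neighbors[j]
        if common:  # keep only pairs that are indirectly connected
            result[(i, j)] = common
    return result
```

On a four-node cycle 0-1-2-3-0, for example, only the diagonal pairs (0, 2) and (1, 3) have common neighbors.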

S2.4, counting the degree of each common neighbor of every node pair and using it to measure that neighbor's contribution to the indirect connection; calculating the correlation $S_{AB}$ between the two nodes of each pair from the contributions of all their common neighbors through the Adamic-Adar index;
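
The standard Adamic-Adar index is $S_{AB}=\sum_{z\in\Gamma(A)\cap\Gamma(B)}\frac{1}{\ln k_z}$, where $\Gamma(\cdot)$ is a node's neighbor set and $k_z$ is the degree of the common neighbor z, so low-degree neighbors contribute more. A minimal sketch over an adjacency matrix (the function name is hypothetical):

```python
import math
from typing import List


def adamic_adar(A: List[List[int]], a: int, b: int) -> float:
    """S_AB = sum over common neighbors z of 1 / ln(k_z), with k_z the degree of z."""
    if a == b:
        raise ValueError("the index is defined for two distinct nodes")
    n = len(A)
    common = {k for k in range(n) if A[a][k] and A[b][k]}
    total = 0.0
    for z in common:
        k_z = sum(A[z])  # z neighbors both a and b, so k_z >= 2 and ln(k_z) > 0
        total += 1.0 / math.log(k_z)
    return total
```

On the four-node cycle 0-1-2-3-0, nodes 0 and 2 share neighbors 1 and 3, each of degree 2, giving $S_{02}=2/\ln 2$.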

S2.5, since the correlation $S_{AB}$ between node pairs calculated in step S2.4 decays over time, time is added to the triple knowledge representation as a fourth element, so that an event is represented as a positive quadruple (h, r, t, T); for a current event (A, r, B, T2) occurring at the current time point T2, the node-pair correlations obtained in step S2.4 are traversed according to the head and tail entities of the current event, and each $S_{AB}$ whose head entity is A and tail entity is B at the current time point is fused with (i.e., multiplied by) the time decay function to obtain the time-fused correlation influence SIM(A, B), which serves as the historical correlation comprehensive influence of the time slice on the current event with head entity A and tail entity B occurring at the current time point.
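
The fusion in S2.5 multiplies $S_{AB}$ by the decay function $f(T1)=e^{-\lambda(T2-T1)}$ given in claim 4. The sketch below assumes that time is measured in slice indices (slice i sits at T1 = i) and that λ is supplied by the caller; both choices are illustrative, not taken from the text.

```python
import math
from typing import List


def decay(t1: float, t2: float, lam: float) -> float:
    """Time decay f(T1) = exp(-lam * (T2 - T1)) for a historical time T1 <= current time T2."""
    if t1 > t2:
        raise ValueError("historical time T1 must not be later than current time T2")
    return math.exp(-lam * (t2 - t1))


def sim_per_slice(s_ab_by_slice: List[float], t2: int, lam: float) -> List[float]:
    """SIM_i(A, B) = S_AB of slice i times the decay; ASSUMPTION: slice i sits at T1 = i."""
    return [s * decay(i, t2, lam) for i, s in enumerate(s_ab_by_slice)]
```

An event in the current slice keeps its full correlation, while one two slices earlier with λ = 0.5 is scaled by $e^{-1}$.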

The step S3 specifically includes:

S3.1, dividing the historical time axis into time span intervals by the equal-area division method of the normal distribution, according to the area integral formula:

the historical time axis is divided into several time span intervals (3 in this embodiment). The intervals are chosen so that the total influence of events occurring at different historical times on the current event is relatively balanced in quantity and time, satisfying $P(t_1\le T1\le t_2)\approx P(t_2\le T1\le t_3)\approx\dots\approx P(t_W\le T1\le t_{W+1})$, where the probabilities are obtained from the probability density function of the normal distribution; equal-area division is performed according to the chosen number of time span intervals, and the resulting intervals are mapped onto the time axis of the data;
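
A sketch of the equal-area division using `statistics.NormalDist` from the Python standard library. The text does not give the mean, spread, or orientation of the normal curve on the time axis, so this example assumes, hypothetically, a standard normal truncated to [-3, 3] and mapped linearly onto a time axis of n slices. Each returned interval then carries the same probability mass.

```python
from statistics import NormalDist
from typing import List


def equal_area_boundaries(num_intervals: int, num_slices: int, z_range: float = 3.0) -> List[float]:
    """Boundaries t_1 < ... < t_{W+1} (in slice units) with equal normal-curve area per interval.

    ASSUMPTION: standard normal truncated to [-z_range, z_range], mapped linearly onto [0, num_slices].
    """
    nd = NormalDist()  # mu = 0, sigma = 1
    lo, hi = nd.cdf(-z_range), nd.cdf(z_range)
    bounds = []
    for k in range(num_intervals + 1):
        p = lo + (hi - lo) * k / num_intervals  # equal steps in cumulative probability
        z = nd.inv_cdf(p)
        bounds.append((z + z_range) / (2 * z_range) * num_slices)
    return bounds
```

With W = 3 and 12 monthly slices, the middle interval comes out narrowest, because the density is highest near the center of the curve.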

S3.2, assigning different span factors to the different time span intervals, counting the time slices contained in each interval, and calculating the comprehensive influence of each interval on the current event;

S3.3, weighting the comprehensive influence obtained for each time span interval by its span factor and summing the results to obtain the compensated comprehensive influence of the historical events;
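
Steps S3.2 and S3.3 can be sketched together: per-interval influences $l_w$ with slice counts $q_w$, then a span-factor-weighted sum. The formulas for $l_w$ and for the span factors are not reproduced in this text. This sketch therefore assumes that $l_w$ is the plain sum of $SIM_i(A,B)$ over the slices in the interval and that the caller supplies the span factors; both assumptions are hypothetical.

```python
from typing import List, Sequence, Tuple


def compensated_influence(
    sim_per_slice: Sequence[float],
    bounds: Sequence[float],
    span_factors: Sequence[float],
) -> Tuple[float, List[float], List[int]]:
    """Return (l, [l_w], [q_w]) for W = len(bounds) - 1 time span intervals.

    ASSUMPTIONS: slice i sits at position i on the time axis, l_w is a plain sum
    over the slices with bounds[w] <= i < bounds[w + 1], and l = sum_w factor_w * l_w.
    """
    W = len(bounds) - 1
    if len(span_factors) != W:
        raise ValueError("one span factor is needed per time span interval")
    l_w = [0.0] * W
    q_w = [0] * W
    for i, sim in enumerate(sim_per_slice):
        for w in range(W):
            if bounds[w] <= i < bounds[w + 1]:
                l_w[w] += sim
                q_w[w] += 1
                break
    total = sum(a * lw for a, lw in zip(span_factors, l_w))
    return total, l_w, q_w
```

For instance, 12 slices with SIM = 1, boundaries [0, 5, 7, 12] and hypothetical factors [0.5, 1.0, 2.0] give $l_w$ = [5, 2, 5] and a compensated total of 14.5.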

S3.4, integrating the compensated comprehensive influence of the historical events as a weight into a knowledge representation model (for example, the TransE model), constructing an equal number of negative quadruples from the positive quadruples, and training the model with them as input together with the positive quadruples, to obtain the time-fused vector representations of the entities and relations, $\{E^1,E^2,E^3,\dots,E^N\}$ and $\{R^1,R^2,R^3,\dots,R^M\}$; here $E^N$ is the time-fused vector representation of entity $E_N$, $E^1$ is that of entity $E_1$, and $R^M$ is that of relation $R_M$;
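
One common way to build an equal number of negative quadruples is head-or-tail corruption, sketched below. The 50/50 choice between head and tail, the rejection of corrupted quadruples that are themselves true facts, and the helper name are assumptions; the text only states that negatives are built from the positives.

```python
import random
from typing import List, Sequence, Tuple

Quad = Tuple[str, str, str, int]  # (h, r, t, T)


def make_negatives(quads: List[Quad], entities: Sequence[str], rng: random.Random,
                   max_tries: int = 100) -> List[Quad]:
    """One negative per positive: replace the head or the tail by a random entity,
    rejecting candidates that already occur among the positive quadruples."""
    positives = set(quads)
    negatives: List[Quad] = []
    for h, r, t, T in quads:
        for _ in range(max_tries):
            e = rng.choice(entities)
            cand = (e, r, t, T) if rng.random() < 0.5 else (h, r, e, T)
            if cand not in positives:
                negatives.append(cand)
                break
        else:  # no break: every attempt produced a true quadruple
            raise RuntimeError(f"no valid negative found for {(h, r, t, T)}")
    return negatives
```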

The positive quadruples are constructed from the real data set obtained in step S1, while the negative quadruples constructed in this step form a non-real data set. The loss function is defined as:

S3.5, replacing the head entity or the tail entity of every quadruple in the test set data to construct candidate quadruples, ranking them by score, and evaluating the link prediction task by the MeanRank and Hits@k metrics. The candidate quadruples are constructed by replacing the head or tail entity of each test quadruple, one at a time, with every entity in the entity set E, so the candidates include the original quadruple; each candidate is passed through the score function $f_r(h,t)=\lVert E^h+R^r-E^t\rVert_{L2}$ to obtain its error value, the error values of all candidates are ranked, and the rank of the original quadruple is recorded. Averaging these ranks over all test data gives the MeanRank value, and the proportions of test data ranked first, within the top ten and within the top fifty give the Hits@1, Hits@10 and Hits@50 values, which together evaluate the link prediction task.
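
The evaluation in S3.5 reduces to two small computations: the rank of the original quadruple among its candidates by ascending error, and the MeanRank and Hits@k statistics over all test quadruples. In the sketch below, ties are resolved in favor of the original quadruple (an assumption), and Hits@k is reported as a percentage to match the tables. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def rank_of_true(errors: Sequence[float], true_idx: int) -> int:
    """1-based rank of the original quadruple among candidates sorted by ascending error."""
    true_error = errors[true_idx]
    return 1 + sum(1 for e in errors if e < true_error)  # ties favor the original


def mean_rank_and_hits(ranks: List[int], ks: Tuple[int, ...] = (1, 10, 50)) -> Tuple[float, Dict[int, float]]:
    """MeanRank and Hits@k, the latter as the percentage of ranks <= k."""
    mean_rank = sum(ranks) / len(ranks)
    hits = {k: 100.0 * sum(1 for r in ranks if r <= k) / len(ranks) for k in ks}
    return mean_rank, hits
```

For ranks [1, 3, 12, 60], MeanRank is 19.0 and Hits@1, Hits@10 and Hits@50 are 25.0, 50.0 and 75.0.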

Tables two to seven show the link prediction results of the invention on the real-world ICEWS data sets ICEWS2014 and ICEWS2017. The Trans series are traditional methods that do not consider time decay. Because the data sets used in the invention contain many complex multi-relational facts and TransD optimizes its spatial computation for such relations, the head entity Hits@50 of TransD on the ICEWS2014 data set (38.55) is slightly higher than that of the invention (35.39), but all other TransD metrics are worse than those of the invention. For the MeanRank metric, the results of the invention on both ICEWS2014 and ICEWS2017 improve on the traditional Trans algorithms. For the Hits@k metrics, the results of the invention on ICEWS2014 and ICEWS2017 differ little, whereas some other methods differ greatly between the two data sets (TransH, for example); the invention improves on both data sets and generally outperforms the other methods, indicating better generalization ability across data sets.

Taking time decay into account, the invention is also compared with EvlovingKG (prior art 1 referred to in the background) and EvlovingKG_weight (prior art 2 referred to in the background). On the ICEWS2014 and ICEWS2017 data sets, for the MeanRank metric, the mean head entity prediction result of the invention is reduced by 73.6% relative to EvlovingKG and by 59.2% relative to EvlovingKG_weight (taking EvlovingKG as an example, the mean value refers to

; the other values are obtained similarly). The mean tail entity prediction result is reduced by 74.8% and 60.2%, respectively. For the Hits@1, Hits@10 and Hits@50 metrics on the ICEWS2014 and ICEWS2017 data sets, the mean head entity prediction results of the invention are improved by 113.7%, 51.2% and 33.2%, respectively, compared with those of EvlovingKG_weight (i.e.,

), and the mean tail entity prediction results are improved by 57.6%, 23.7% and 44.4%, respectively.
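
The averaging formula behind these percentages is not reproduced in the text. One definition that agrees with several of the reported figures, for example 74.8% (tail MeanRank against EvlovingKG) and 113.7% (head Hits@1 against EvlovingKG_weight), averages each method over ICEWS2014 and ICEWS2017 first and then takes the relative change. The sketch below assumes that definition, which may not be the one actually used, and takes its inputs from Tables two to seven.

```python
from typing import Sequence


def mean_relative_change(baseline: Sequence[float], ours: Sequence[float], lower_is_better: bool) -> float:
    """Percentage change of the two-data-set average of `ours` relative to that of `baseline`.

    ASSUMED definition: (baseline - ours) / baseline for MeanRank (lower is better),
    (ours - baseline) / baseline for Hits@k (higher is better), on the averages.
    """
    b = sum(baseline) / len(baseline)
    o = sum(ours) / len(ours)
    change = (b - o) / b if lower_is_better else (o - b) / b
    return 100.0 * change


# Tail entity MeanRank vs EvlovingKG, values for (ICEWS2014, ICEWS2017)
print(round(mean_relative_change([6397, 6319], [1325, 1878], lower_is_better=True), 1))  # 74.8
```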

Table two: comparison of head and tail entity link prediction MeanRank results on the ICEWS2014 data set

Method	Head entity MeanRank	Tail entity MeanRank
TransE	6583	6144
TransH	4527	5386
TransD	1434	1397
TransR	8127	7847
EvlovingKG	6154	6397
EvlovingKG_weight	4123	4104
The invention	1347	1325

Table three: comparison of head and tail entity link prediction MeanRank results on the ICEWS2017 data set

Method	Head entity MeanRank	Tail entity MeanRank
TransE	6199	6324
TransH	7919	8314
TransD	2206	2107
TransR	9815	8173
EvlovingKG	6361	6319
EvlovingKG_weight	3971	3947
The invention	1951	1878

Table four: comparison of head entity link prediction Hits@1, Hits@10 and Hits@50 results on the ICEWS2014 data set

Method	Hits@1 (%)	Hits@10 (%)	Hits@50 (%)
TransE	0.97	5.56	13.96
TransH	2.08	12.62	23.11
TransD	0.2	18.62	38.55
TransR	0.67	1.21	2.78
EvlovingKG	1.16	2.8	4.45
EvlovingKG_weight	1.33	12.49	27.72
The invention	3.53	19.42	35.39

Table five: comparison of tail entity link prediction Hits@1, Hits@10 and Hits@50 results on the ICEWS2014 data set

Method	Hits@1 (%)	Hits@10 (%)	Hits@50 (%)
TransE	0.79	4.47	13.58
TransH	2.31	13.2	21.3
TransD	0.45	16.37	32.42
TransR	0.78	1.4	3.32
EvlovingKG	1.54	5.61	7.67
EvlovingKG_weight	1.83	13.9	24.51
The invention	2.86	17.1	33.38

Table six: comparison of head entity link prediction Hits@1, Hits@10 and Hits@50 results on the ICEWS2017 data set

Method	Hits@1 (%)	Hits@10 (%)	Hits@50 (%)
TransE	1.07	6.64	14.5
TransH	0.14	0.39	0.97
TransD	1.31	11.62	25.45
TransR	0.15	0.3	0.93
EvlovingKG	0.16	0.89	1.5
EvlovingKG_weight	1.44	9.83	21.5
The invention	2.39	14.33	30.17

Table seven: comparison of tail entity link prediction Hits@1, Hits@10 and Hits@50 results on the ICEWS2017 data set

Method	Hits@1 (%)	Hits@10 (%)	Hits@50 (%)
TransE	1.28	5.49	13.19
TransH	0.37	0.7	1.82
TransD	1.6	12.46	26.63
TransR	0.14	0.51	1.37
EvlovingKG	0.67	1.17	2.8
EvlovingKG_weight	1.5	10.9	20.69
The invention	2.39	15.4	31.9

Although the embodiments of the present invention have been described in detail above with reference to the drawings, the present invention is not limited to these embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims

1. A comprehensive influence compensation method based on a time sequence knowledge graph, characterized by comprising the following steps:

S2, on the knowledge graph constructed in step S1, dividing the historical time axis into time slices of fixed length d, so that the events on the time axis are divided into $\{G_1,G_2,\dots,G_n\}$, where $G_n$ denotes the n-th event subnet; constructing the adjacency matrix corresponding to each subnet, $\{A(G_1),A(G_2),\dots,A(G_n)\}$; calculating the correlation between node pairs that share common neighbor nodes through the adjacency matrices and a similarity index, and then fusing time factors to obtain the time-fused correlation influence; taking the obtained influence as the historical correlation comprehensive influence of the time slice on the current event under the conditions determined by the current event;

2. The comprehensive influence compensation method based on a time sequence knowledge graph according to claim 1, characterized in that: the compensated comprehensive influence of the historical events is integrated into a knowledge representation model as a weight, and the time-fused vector representations of the entities and relations are obtained iteratively; and the link prediction task is performed on the test set with the trained vector representations, evaluated by score ranking and performance metrics.

3. The comprehensive influence compensation method based on a time sequence knowledge graph according to claim 1 or 2, characterized in that step S2 specifically includes:

S2.5, adding time as a fourth element of knowledge information to the triple knowledge representation, so that an event is represented as a positive quadruple (h, r, t, T); for a current event (A, r, B, T2) occurring at the current time point, traversing the node-pair correlations obtained in step S2.4 according to the head and tail entities of the current event, fusing each $S_{AB}$ whose head entity is A and tail entity is B at the current time point with a time decay function to obtain the time-fused correlation influence SIM(A, B), and taking SIM(A, B) as the historical correlation comprehensive influence of the time slice on the current event with head entity A and tail entity B occurring at the current time point.

4. The comprehensive influence compensation method based on a time sequence knowledge graph according to claim 3, characterized in that the time decay function is $f(T1)=e^{-\lambda(T2-T1)}$, where T1 is the time point of the historical event involving node A and node B in the knowledge graph, T2 is the time point of the current event with head entity A and tail entity B, λ is the decay factor, and f(T1) expresses the degree to which the influence of a historical event occurring at time point T1 on the current event has decayed.

5. The comprehensive influence compensation method based on a time sequence knowledge graph according to claim 1 or 2, characterized in that step S3 specifically includes:

S3.4, integrating the compensated comprehensive influence of the historical events on the current event into a knowledge representation model as a weight, constructing an equal number of negative quadruples from the positive quadruples, and training with them as model input to obtain the time-fused vector representations of the entities and relations, $\{E^1,E^2,E^3,\dots,E^N\}$ and $\{R^1,R^2,R^3,\dots,R^M\}$, where $E^N$ is the time-fused vector representation of entity $E_N$ and $R^M$ is the time-fused vector representation of relation $R_M$;

6. The comprehensive influence compensation method based on a time sequence knowledge graph according to claim 5, characterized in that the score function is $f_r(h,t)=\lVert E^h+R^r-E^t\rVert_{L2}$,

where $E^h$ is the time-fused vector representation of the head entity in the entity set E, $R^r$ is the time-fused vector representation of the relation, $E^t$ is the time-fused vector representation of the tail entity in the entity set E, and L2 denotes the L2 norm.

7. The comprehensive influence compensation method based on a time sequence knowledge graph according to claim 5, characterized in that the comprehensive influence of a time span interval on the current event in step S3.2 is:

where $l_w$ is the comprehensive influence of the w-th time span interval on the current event, $q_w$ is the number of time slices contained in the w-th time span interval, and $SIM_i(A,B)$ is the historical correlation comprehensive influence of the i-th time slice in the time span interval.

8. The comprehensive influence compensation method based on a time sequence knowledge graph according to claim 5, characterized in that in step S3.3 the W time span intervals are given different span factors:

where w = 1, 2, ..., W; W is the total number of time span intervals, $q_w$ is the number of time slices in the w-th time span interval, $l_w$ is the comprehensive influence of the w-th time span interval on the current event, and l is the compensated comprehensive influence of the historical events on the current event with head entity A and tail entity B.

9. The comprehensive influence compensation method based on a time sequence knowledge graph according to claim 5, characterized in that the model training process in step S3.4 is expressed as:

where S is the set of positive quadruples, S' is the set of negative quadruples, $l_{pos}$ is the comprehensive influence of a positive quadruple calculated in S3.3, $f_r(h,t)$ is the score function of a positive quadruple, $l_{neg}$ is the comprehensive influence of a negative quadruple calculated in S3.3, $f_r(h',t')$ is the score function of a negative quadruple, and $\gamma$ is a margin constant; the training process minimizes the loss function L, and its output is the vector representation of all entities and relations.