CN112860918B - Sequential knowledge graph representation learning method based on collaborative evolution modeling - Google Patents

Publication number
CN112860918B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110305818.4A
Other languages
Chinese (zh)
Other versions
CN112860918A (en
Inventor
张嘉昇
梁爽
邵杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Artificial Intelligence Research Institute Yibin
Original Assignee
Sichuan Artificial Intelligence Research Institute Yibin
Application filed by Sichuan Artificial Intelligence Research Institute Yibin
Priority to CN202110305818.4A
Publication of CN112860918A
Publication of CN112860918B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a sequential (temporal) knowledge graph representation learning method based on collaborative evolution modeling, belonging to the technical field of temporal knowledge graphs. The method initializes the model parameters and the embedded representations of all entities and relationships according to the temporal knowledge graph to be represented; calculates the occurrence probability of each known fact and obtains the evolution loss of the local structure by maximizing these probabilities; calculates the soft modularity of the graph structure of each temporal knowledge graph snapshot and obtains the evolution loss of the global structure by maximizing the soft modularity; combines the two into the overall loss function of the model; and iteratively optimizes the overall loss function by gradient descent until the model converges. The invention addresses the problem that earlier work, by ignoring the evolutionary nature of temporal knowledge graphs, could not obtain accurate embedded representations.

Description

Sequential knowledge graph representation learning method based on collaborative evolution modeling
Technical Field
The invention belongs to the technical field of temporal knowledge graphs, and in particular relates to a temporal knowledge graph representation learning method based on collaborative evolution modeling.
Background
A knowledge graph is a knowledge base system with semantic attributes that is widely used to store and manage structured data in many fields, such as dynamic social interaction. A knowledge graph can be represented as a heterogeneous directed graph in which nodes represent real-world entities and concepts and labeled directed edges represent the relationships between them. Although many knowledge graph representation learning methods have been proposed, they rarely consider the dynamics of knowledge graphs and, in particular, ignore their evolutionary nature. Knowledge is continually updated, and these updates appear in the knowledge graph as the appearance and disappearance of entities or the establishment and removal of relationships, so a knowledge graph is both time-varying and evolving. Because existing work ignores this temporal nature, the embedded representations it learns are inaccurate and unreasonable.
In recent years, some work has attempted to learn embedded representations for such time-varying knowledge graphs, a task known as temporal knowledge graph representation learning. This work falls mainly into four types of methods: methods based on temporal dependencies between relationships, which incorporate time information by constraining the objective order of occurrence between relationships; methods based on temporal hyperplanes, which learn the embedded representation at each time separately by mapping the knowledge at different times onto different hyperplanes; methods based on time-dependent entity embedding, which treat the embedded representation of an entity as a time-dependent nonlinear function; and methods based on tensor decomposition, which learn the embedded representation of a temporal knowledge graph through a low-rank decomposition of its adjacency matrices.
However, the above works either learn the embedded representation for each time instant independently, ignoring the evolutionary nature of the temporal knowledge graph, or reduce that nature to the nonlinear dynamics of individual entities, which cannot reflect the detailed evolution mechanism of the temporal knowledge graph. From the perspective of the local structure, relationships are continuously established and released between entities as time progresses, and this drives the evolution of the temporal knowledge graph. From the perspective of the global structure, the establishment and release of a large number of relationships together produce a slow evolution of the community structure in the temporal knowledge graph. The two kinds of evolution are not independent: local structure evolution is the internal mechanism of global structure evolution, and global structure evolution is the external driving factor of local structure evolution. A more accurate embedded representation of the temporal knowledge graph can therefore be learned by considering the collaborative evolution of the local and global structures, which the prior art does not do.
Disclosure of Invention
To address the above deficiencies of the prior art, the invention provides a temporal knowledge graph representation learning method based on collaborative evolution modeling. Its innovations are that it models the evolution of temporal knowledge from the perspectives of the local structure and the global structure simultaneously, and that it proposes a new soft modularity for measuring community structure.
To achieve the above purpose, the invention adopts the following technical scheme:
The scheme provides a temporal knowledge graph representation learning method based on collaborative evolution modeling, comprising the following steps:
S1, initializing the parameters of the model and the embedded representations of all entities and relationships according to the temporal knowledge graph to be represented;
S2, inputting the known facts of the temporal knowledge graph in the order of their timestamps, calculating the occurrence probability of each known fact, and obtaining the evolution loss of the local structure by maximizing the occurrence probability of the known facts;
S3, inputting, in chronological order, the snapshot of the temporal knowledge graph at each timestamp, calculating the soft modularity of the graph structure of each snapshot, and obtaining the evolution loss of the global structure by maximizing the soft modularity;
S4, calculating the overall loss function of the model from the evolution loss of the local structure and the evolution loss of the global structure;
S5, iteratively optimizing the overall loss function of the model by gradient descent, and updating the parameters of the model and the embedded representations of the entities and relationships;
S6, judging whether the model has converged; if so, obtaining the final embedded representations of the entities and relationships and completing the temporal knowledge graph representation learning; otherwise, returning to step S1.
The beneficial effects of the invention are: the invention designs a novel temporal knowledge graph representation learning method based on co-evolution, which models the evolution of knowledge from the two aspects of local evolution and global evolution and captures the internal mechanism of knowledge evolution, thereby learning more accurate representation vectors and improving the performance of downstream tasks such as event prediction. Compared with prior methods, the method provided by the invention runs more efficiently and can adapt to online environments with streaming data.
Further, the embedded representation u_e^τ of any entity e under the timestamp τ is initialized in step S1 by the following expression:
Figure BDA0002987658900000031
where θ_e, ω_e and v_e all denote vectors specific to the current entity.
The beneficial effects of the above further scheme are: the different evolution patterns of different entities can be fully considered, such as periodic evolution, non-periodic evolutionary trends, and static attributes.
Still further, the step S2 includes the following steps:
S201, inputting the known facts of the current temporal knowledge graph in the order of their timestamps τ, and calculating, from the participants of any known fact (s, r, o, τ), the intensity with which the fact occurs spontaneously
Figure BDA0002987658900000032
where the participants are the entities s and o and the relation r contained in the known fact;
S202, dividing the excitation that the historical facts occurring at time τ_i exert on the current dynamic fact
Figure BDA0002987658900000033
into two parts:
Figure BDA0002987658900000041
Figure BDA0002987658900000042
Figure BDA0002987658900000043
Figure BDA0002987658900000044
where η_{s,r}(τ_i) and η_{o,r}(τ_i) denote the influence on the current dynamic fact of the historical facts at time τ_i that involve the head entity s and the tail entity o, respectively;
Figure BDA0002987658900000045
denotes the set of relationships that entity e has at time τ_i;
Figure BDA0002987658900000046
denotes the relationship-level attention;
Figure BDA0002987658900000047
and Z_r denote the embedded representations of the relationships in the historical facts;
Figure BDA0002987658900000048
denotes a relation contained in the historical event; V denotes a parameter matrix measuring the similarity between relation vectors; h denotes an entity that has a relationship with entity e at time τ_i; β_{h,x} denotes the entity-level attention;
Figure BDA0002987658900000049
denotes the vector representation of h at time τ_i;
Figure BDA00029876589000000410
denotes the vector representation of x at time τ_i; x denotes the entity that has a relationship with entity e in the current dynamic fact; r' denotes any one of the relationships that entity e has at time τ_i; h' denotes one particular entity h;
Figure BDA00029876589000000411
denotes the set of entities that have a relationship with entity e at time τ_i; and
Figure BDA00029876589000000412
denotes the vector representation of h' at time τ_i;
S203, based on the intensity with which the fact occurs spontaneously
Figure BDA00029876589000000413
and the two-part excitation exerted on the current dynamic fact
Figure BDA00029876589000000414
calculating the occurrence intensity of the known fact (s, r, o, τ)
Figure BDA00029876589000000415
S204, according to the occurrence intensity of the known fact (s, r, o, τ)
Figure BDA00029876589000000416
calculating the occurrence probability p(s, r, o | I(τ)) of each known fact;
S205, according to the occurrence probability of each known fact, obtaining the evolution loss L_local of the local structure by maximizing the occurrence probability of the facts
Figure BDA00029876589000000417
where I(τ) denotes the set of historical events before time τ.
The beneficial effects of the further scheme are as follows: adaptive importance weighting is applied to different historical events to flexibly account for different effects of different historical facts on the current fact.
Further, the intensity with which the fact occurs spontaneously in step S201
Figure BDA0002987658900000051
is expressed as follows:
Figure BDA0002987658900000052
where
Figure BDA0002987658900000053
and
Figure BDA0002987658900000054
denote the embedded representations of the head entity s and the tail entity o of the fact under the timestamp, respectively; Z_r denotes the embedded representation of the relation r; and w denotes a learnable parameter matrix measuring the similarity between vectors.
The beneficial effects of the further scheme are as follows: the spontaneous fact at any moment can be effectively identified.
Still further, the occurrence intensity of the known fact (s, r, o, τ) in step S203
Figure BDA0002987658900000055
is expressed as follows:
Figure BDA0002987658900000056
Figure BDA0002987658900000057
where
Figure BDA0002987658900000058
denotes the raw occurrence intensity of the fact; θ denotes a hyperparameter;
Figure BDA0002987658900000059
denotes the excitation effect of a historical fact on the current fact; τ denotes the occurrence time of the current fact; τ_i denotes the occurrence time of the historical event; and k(τ - τ_i) denotes a time decay function.
The beneficial effects of the further scheme are as follows: the spontaneous intensity of a fact and the influence of historical facts on it are considered together, so the occurrence intensity of the fact can be fully modeled.
Still further, the occurrence probability p(s, r, o | I(τ)) of each known fact in step S204 is expressed as follows:
Figure BDA00029876589000000510
where
Figure BDA00029876589000000511
denotes the occurrence intensity of the candidate fact (e, r, o, τ);
Figure BDA00029876589000000512
denotes the occurrence intensity of the candidate fact (s, r, e, τ); e denotes any entity in the entity set; ε denotes the entity set of the temporal knowledge graph; I(τ) denotes the set of historical events before time τ; s denotes the head entity of the current fact; r denotes the relation of the current fact; and o denotes the tail entity of the current fact.
The beneficial effects of the above further scheme are: the probability of occurrence of valid facts is substantially maximized.
Still further, the step S3 includes the following steps:
S301, inputting, in chronological order, the snapshot of the temporal knowledge graph at each timestamp, and calculating the connection strength between two entities
Figure BDA0002987658900000061
S302, according to the connection strength
Figure BDA0002987658900000062
calculating the soft modularity of the graph structure of each temporal knowledge graph snapshot, where each element of the soft modularity
Figure BDA0002987658900000063
is expressed as follows:
Figure BDA0002987658900000064
where
Figure BDA0002987658900000065
and
Figure BDA0002987658900000066
denote the degrees of entity i and entity j at timestamp τ, respectively, and m_τ denotes the total number of relationships existing in the temporal knowledge graph at timestamp τ;
S303, calculating the community allocation vector of each entity
Figure BDA0002987658900000067
S304, according to the community allocation vector of each entity, maximizing the soft modularity to obtain the evolution loss L_global of the global structure.
The beneficial effects of the further scheme are as follows: the dynamics and the heterogeneity of the time sequence knowledge graph can be fully considered.
Still further, the connection strength between two entities in step S301
Figure BDA0002987658900000068
is expressed as follows:
Figure BDA0002987658900000069
where r denotes any relation in the set
Figure BDA00029876589000000610
;
Figure BDA00029876589000000611
denotes the set of relationships existing between entity i and entity j at timestamp τ; Z_r denotes the vector of the relation r; the parameter vector in the formula measures the connection strength of different relations; and
Figure BDA00029876589000000612
denotes a nonlinear activation function.
The beneficial effects of the above further scheme are: different connection strengths between entities brought by different relationships can be flexibly considered.
Still further, the community allocation vector of each entity in step S303
Figure BDA0002987658900000071
is expressed as follows:
Figure BDA0002987658900000072
where F denotes a parameter matrix that maps the embedded representation of an entity to its community allocation vector;
Figure BDA0002987658900000073
denotes the embedded representation of entity i under the timestamp τ; and
Figure BDA0002987658900000074
denotes the embedded representation of the community to which entity i belonged at the previous timestamp.
The beneficial effects of the further scheme are as follows: the community division of the entity can be calculated based on the topological structure of the time sequence knowledge graph under the current timestamp and the slow evolution characteristic of the community.
Still further, the evolution loss L_global of the global structure in step S304 is expressed as follows:
Figure BDA0002987658900000075
where T denotes the transpose; m_τ denotes the total number of relationships existing in the temporal knowledge graph at timestamp τ; tr(·) denotes the trace of a matrix; H_τ denotes the community allocation matrix at timestamp τ;
Figure BDA0002987658900000076
denotes the soft modularity matrix; and norm(·) denotes two-norm regularization.
The beneficial effects of the further scheme are as follows: the method can simplify the maximization process of the soft modularity and accelerate the convergence of the model.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flowchart of a method applied to a dynamic social network.
Detailed Description
The following describes specific embodiments of the invention to help those skilled in the art understand it. The invention is not limited to the scope of these embodiments: to those of ordinary skill in the art, various changes are apparent as long as they remain within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.
Examples
As shown in FIG. 1, the invention provides a temporal knowledge graph representation learning method based on collaborative evolution modeling, which is implemented as follows:
S1, initializing the parameters of the model and the embedded representations of all entities and relationships according to the temporal knowledge graph to be represented;
In this embodiment, the embedded representation of any entity e under the timestamp τ
Figure BDA0002987658900000081
is initialized by the following expression:
Figure BDA0002987658900000082
where θ_e, ω_e and v_e denote vectors specific to the current entity.
S2, inputting the known facts of the temporal knowledge graph in the order of their timestamps, calculating the occurrence probability of each known fact, and obtaining the evolution loss of the local structure by maximizing the occurrence probability of the known facts, which is implemented as follows:
S201, inputting the known facts of the current temporal knowledge graph in the order of their timestamps τ, and calculating, from the participants of any known fact (s, r, o, τ), the intensity with which the fact occurs spontaneously
Figure BDA0002987658900000083
where the participants are the entities s and o and the relation r contained in the known fact;
S202, dividing the excitation that the historical facts occurring at time τ_i exert on the current dynamic fact
Figure BDA0002987658900000084
into two parts;
S203, based on the intensity with which the fact occurs spontaneously
Figure BDA0002987658900000085
and the two-part excitation exerted on the current dynamic fact
Figure BDA0002987658900000086
calculating the occurrence intensity of the known fact (s, r, o, τ)
Figure BDA0002987658900000087
S204, according to the occurrence intensity of the known fact (s, r, o, τ)
Figure BDA0002987658900000088
calculating the occurrence probability p(s, r, o | I(τ)) of each known fact;
S205, according to the occurrence probability of each known fact, obtaining the evolution loss L_local of the local structure by maximizing the occurrence probability of the facts.
In this embodiment, to account for the influence of historical facts on the occurrence probability of the current fact, the invention first decomposes the influence that the historical facts occurring at time τ_i have on the current fact into two parts:
Figure BDA0002987658900000091
where η_{s,r}(τ_i) and η_{o,r}(τ_i) denote the influence on the current dynamic fact of the historical facts at time τ_i that involve the head entity s and the tail entity o, respectively. For each entity, different historical facts influence the current fact differently, because they connect the entity to different entities through different relationships. For this reason, the invention treats all historical facts of entity e under the timestamp τ_i as a hierarchical structure and quantifies their influence on the current fact as follows:
Figure BDA0002987658900000092
where e denotes the entity (s or o) whose historical facts are being considered; x denotes the target entity that has a relationship with e in the current fact (when e is s, x is o);
Figure BDA0002987658900000093
denotes the set of relationships that entity e has at time τ_i;
Figure BDA0002987658900000094
denotes the entities with which the entity has a relationship under the timestamp τ_i; and
Figure BDA0002987658900000095
denotes a parameter matrix measuring the similarity between relation vectors. To model the different importance of different historical facts to the current fact, the invention uses a hierarchical attention mechanism that computes the relationship-level attention
Figure BDA0002987658900000096
and the entity-level attention β_{h,x} separately. The relationship-level attention is calculated as follows:
Figure BDA0002987658900000097
where
Figure BDA0002987658900000098
and Z_r denote the embedded representations of the relationships in the historical facts. The entity-level attention is calculated as follows:
Figure BDA0002987658900000099
where
Figure BDA00029876589000000910
and
Figure BDA00029876589000000911
are the embedded representations, under the corresponding timestamp, of the target entities in the historical facts.
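The two-level attention described in this embodiment can be sketched as follows. This is only an illustration under stated assumptions: the formulas themselves are not reproduced in this text, so the bilinear relation score z_r^T V z_r', the dot-product entity score, the softmax normalization, and the way the weighted scores are summed are all assumed here, and the names `softmax`, `bilinear`, `dot` and `hierarchical_influence` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def bilinear(a, M, b):
    """Compute a^T M b for list-based vectors a, b and a list-of-lists matrix M."""
    return sum(a[i] * M[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hierarchical_influence(z_r, u_x, history, V):
    """Assumed influence of one entity's historical facts at tau_i on the current fact.

    z_r     : embedding of the relation in the current fact
    u_x     : embedding of the target entity x at tau_i
    history : maps each historical relation r' to (z_r', [u_h, ...]), i.e. the
              relation embedding and the embeddings of the entities h reached via r'
    V       : parameter matrix scoring similarity between relation vectors
    """
    relations = list(history)
    # Relation-level attention (assumed): softmax over bilinear relation similarity.
    alpha = softmax([bilinear(z_r, V, history[r][0]) for r in relations])
    influence = 0.0
    for weight, r in zip(alpha, relations):
        # Entity-level attention (assumed): softmax over similarity between h and x.
        scores = [dot(u_h, u_x) for u_h in history[r][1]]
        beta = softmax(scores)
        influence += weight * sum(b * s for b, s in zip(beta, scores))
    return influence
```

With a single historical relation and a single neighbour, both attention weights equal 1 and the influence reduces to the similarity between h and x, which is a quick sanity check on the weighting.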
In this embodiment, from the intensity with which the fact occurs spontaneously
Figure BDA0002987658900000101
and the two-part influence exerted on the current fact
Figure BDA0002987658900000102
the occurrence intensity of the known fact (s, r, o, τ) is calculated:
Figure BDA0002987658900000103
Since the above expression may take negative values, whereas an occurrence probability is a positive number no greater than 1, the invention converts the above occurrence intensity into a positive number with an exponential function:
Figure BDA0002987658900000104
where
Figure BDA0002987658900000105
denotes the raw occurrence intensity of the fact; θ denotes a hyperparameter;
Figure BDA0002987658900000106
denotes the excitation effect of a historical fact on the current fact; τ denotes the occurrence time of the current fact; τ_i denotes the occurrence time of the historical event; and k(τ - τ_i) denotes a time decay function.
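A minimal sketch of this intensity computation is given below, assuming the raw intensity is the spontaneous term plus θ times the decayed sum of historical excitations, followed by the exponential mapping described above. The exponential kernel `exp_decay`, its `rate` parameter, and the function names are assumptions for illustration; the text only states that k is a time decay function.

```python
import math

def exp_decay(dt, rate=1.0):
    """Assumed time decay kernel k(dt) = exp(-rate * dt)."""
    return math.exp(-rate * dt)

def occurrence_intensity(spontaneous, history, tau, theta=0.5, kernel=exp_decay):
    """Positive occurrence intensity of a fact at time tau.

    spontaneous : intensity with which the fact occurs spontaneously (may be negative)
    history     : list of (tau_i, excitation_i) pairs for historical facts
    theta       : hyperparameter weighting the historical excitation
    """
    # Only facts strictly before tau can excite the current fact.
    excitation = sum(exc * kernel(tau - tau_i) for tau_i, exc in history if tau_i < tau)
    raw = spontaneous + theta * excitation
    # The exponential maps a possibly negative raw intensity to a positive number.
    return math.exp(raw)
```

For example, `occurrence_intensity(-3.0, [], 0.0)` is still positive even though the raw intensity is negative, which is the property the exponential is introduced to guarantee.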
The occurrence probability p(s, r, o | I(τ)) of each known fact can therefore be obtained:
Figure BDA0002987658900000107
where
Figure BDA0002987658900000108
denotes the occurrence intensity of the candidate fact (e, r, o, τ);
Figure BDA0002987658900000109
denotes the occurrence intensity of the candidate fact (s, r, e, τ); e denotes any entity in the entity set; ε denotes the entity set of the temporal knowledge graph; I(τ) denotes the set of historical events before time τ; s denotes the head entity of the current fact; r denotes the relation of the current fact; and o denotes the tail entity of the current fact.
In this embodiment, the occurrence probability of each known fact is maximized by minimizing the following loss function:
Figure BDA00029876589000001010
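As a hedged illustration of how intensities could be turned into a probability and a loss: the sketch below assumes the fact's intensity is normalized by the summed intensities of all head-replaced candidates (e, r, o) and tail-replaced candidates (s, r, e), and that the local loss is the negative log-likelihood of the known facts. Both choices are assumptions, since the exact formulas are not reproduced in this text; `fact_probability` and `local_loss` are hypothetical names.

```python
import math

def fact_probability(intensity, s, r, o, entities):
    """Assumed normalization of a fact's intensity over candidate facts.

    intensity : callable (head, relation, tail) -> positive float
    entities  : iterable of all entities in the knowledge graph
    """
    entities = list(entities)
    head_candidates = sum(intensity(e, r, o) for e in entities)  # candidates (e, r, o)
    tail_candidates = sum(intensity(s, r, e) for e in entities)  # candidates (s, r, e)
    return intensity(s, r, o) / (head_candidates + tail_candidates)

def local_loss(facts, intensity, entities):
    """Assumed local loss: negative log-likelihood of the known facts.

    Minimizing this value maximizes the occurrence probability of every known fact.
    """
    entities = list(entities)
    return -sum(math.log(fact_probability(intensity, s, r, o, entities))
                for s, r, o in facts)
```

Under this assumed normalization the true fact is counted in both candidate sums, so its probability stays below 1 and grows as its intensity dominates the candidates.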
S3, inputting, in chronological order, the snapshot of the temporal knowledge graph at each timestamp, calculating the soft modularity of the graph structure of each snapshot, and obtaining the evolution loss of the global structure by maximizing the soft modularity, which is implemented as follows:
S301, inputting, in chronological order, the snapshot of the temporal knowledge graph at each timestamp, and calculating the connection strength between two entities
Figure BDA00029876589000001011
S302, according to the connection strength
Figure BDA0002987658900000111
calculating the soft modularity of the graph structure of each temporal knowledge graph snapshot;
S303, calculating the community allocation vector of each entity
Figure BDA0002987658900000112
S304, according to the community allocation vector of each entity, maximizing the soft modularity to obtain the evolution loss L_global of the global structure.
In this embodiment, when modeling the community structure of the temporal knowledge graph, entities connected through different relationships may be connected with different strengths, so the connection strength between two entities is first calculated by the following formula:
Figure BDA0002987658900000113
where r denotes any relation in the set
Figure BDA0002987658900000114
;
Figure BDA0002987658900000115
denotes the set of relationships existing between entity i and entity j at timestamp τ; Z_r denotes the vector of the relation r; the parameter vector in the formula measures the connection strength of different relations; and
Figure BDA0002987658900000116
denotes a nonlinear activation function.
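One way to read this definition is sketched below, assuming the parameter vector is dotted with each relation vector, the scores are summed over the relations between the two entities, and a sigmoid serves as the nonlinear activation. The sigmoid, the summation, the zero strength for unconnected pairs, and the names `sigmoid` and `connection_strength` are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def connection_strength(relation_vectors, a):
    """Assumed connection strength between two entities at one timestamp.

    relation_vectors : list of vectors Z_r, one per relation between the two entities
    a                : parameter vector scoring how strongly each relation connects
    """
    if not relation_vectors:
        return 0.0  # assumption: entities with no relation are not connected
    score = sum(sum(ai * zi for ai, zi in zip(a, z)) for z in relation_vectors)
    return sigmoid(score)
```

Because the score depends on which relations are present, two entity pairs linked by different relations receive different strengths, which is the flexibility the embodiment describes.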
On this basis, the soft modularity matrix of the temporal knowledge graph under each timestamp can be obtained, and each element of the matrix is given by:
Figure BDA0002987658900000117
where
Figure BDA0002987658900000118
and
Figure BDA0002987658900000119
denote the degrees of entity i and entity j at timestamp τ, respectively, and m_τ denotes the total number of relationships existing in the temporal knowledge graph at timestamp τ.
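The matrix can be sketched in the style of the classical modularity matrix, B_ij = A_ij - k_i k_j / (2m), with the connection strength taking the place of the adjacency entry A_ij. The factor 2m and the use of row sums of the strength matrix as degrees are assumptions borrowed from standard modularity, not confirmed by this text; `soft_modularity_matrix` is a hypothetical name.

```python
def soft_modularity_matrix(A, m):
    """Assumed soft modularity matrix for one snapshot.

    A : symmetric list-of-lists matrix of connection strengths between entities
    m : total number of relationships in the snapshot
    """
    n = len(A)
    degrees = [sum(row) for row in A]  # assumption: degree = summed connection strength
    return [[A[i][j] - degrees[i] * degrees[j] / (2.0 * m) for j in range(n)]
            for i in range(n)]

# Two entities joined by one relationship of strength 1.
B = soft_modularity_matrix([[0.0, 1.0], [1.0, 0.0]], m=1)
```

Each entry compares the observed connection strength with the strength expected from the degrees alone, so positive entries mark pairs that are more tightly connected than chance.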
In order to maximize the soft modularity of the time-series knowledge-graph at each timestamp, the invention needs to obtain the community allocation vector of each entity. Considering that entities in the time-series knowledge graph have multiple types, and the same entity may belong to multiple different communities at the same time, soft community allocation is allowed to be performed on the entities, and the community allocation of each entity is obtained through the following formula:
Figure BDA00029876589000001110
wherein F represents a parameter matrix for mapping the embedded representation of the entity to a community allocation vector of the entity,
Figure BDA0002987658900000121
represents the embedded representation of entity i at timestamp τ, and
Figure BDA0002987658900000122
represents the embedded representation corresponding to the community to which entity i belonged at the previous timestamp.
In this embodiment, the soft modularity of the timing knowledge graph under each timestamp is finally maximized by minimizing the following loss function:
Figure BDA0002987658900000123
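The soft community allocation and the modularity-based loss can be sketched as follows. This is a minimal sketch under stated assumptions: the allocation is taken to be a softmax over the product of the parameter matrix F and the entity embedding (the term involving the previous community embedding is left out), and the loss is taken to be -tr(HᵀBH) / (2m) without any regularization term. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def community_allocation(embedding: List[float], F: Matrix) -> List[float]:
    """Map one entity embedding to a soft allocation over communities.

    Each row of F scores one community (assumed layout).
    """
    return softmax([sum(f * x for f, x in zip(row, embedding)) for row in F])


def soft_modularity_loss(H: Matrix, B: Matrix, m: float) -> float:
    """Return -tr(H^T B H) / (2m); minimizing it maximizes soft modularity.

    H[i][c]: allocation of entity i to community c.
    B: modularity matrix of the snapshot; m: total number of relations.
    """
    n, k = len(H), len(H[0])
    trace = 0.0
    for c in range(k):
        for i in range(n):
            for j in range(n):
                trace += H[i][c] * B[i][j] * H[j][c]
    return -trace / (2.0 * m)
```

With the two-entity example (one relation), placing both entities in the same community yields a lower loss than separating them, which is the behaviour the loss is meant to reward.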
S4, calculating the overall loss function L of the model from the evolution loss of the local structure and the evolution loss of the global structure:
L = L_local + L_global
S5, iteratively optimizing the overall loss function of the model by gradient descent, and updating the parameters of the model and the embedded representations of the entities and relations;
S6, judging whether the model has converged; if so, obtaining the final embedded representations of the entities and relations and completing the time-series knowledge graph representation learning; otherwise, returning to step S1.
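The optimization loop of steps S4 to S6 can be sketched as plain gradient descent over a flat parameter list. This is an illustrative sketch only: the two loss callables, the learning rate, the convergence test on the change in total loss, and the name `train` are all assumptions, and in practice the parameters would include the model parameters and all embeddings.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# A loss callable returns (loss value, gradient with respect to the parameters).
LossFn = Callable[[Vector], Tuple[float, Vector]]


def train(local_loss: LossFn, global_loss: LossFn, params: Vector,
          lr: float = 0.1, tol: float = 1e-8,
          max_iters: int = 10_000) -> Tuple[Vector, float]:
    """Minimize L = L_local + L_global by gradient descent until convergence."""
    previous = float("inf")
    total = previous
    for _ in range(max_iters):
        l_local, g_local = local_loss(params)
        l_global, g_global = global_loss(params)
        total = l_local + l_global              # step S4: overall loss
        if abs(previous - total) < tol:         # step S6: convergence check
            break
        params = [p - lr * (a + b)              # step S5: gradient update
                  for p, a, b in zip(params, g_local, g_global)]
        previous = total
    return params, total
```

For example, with the toy losses (x - 1)² and (x - 3)² standing in for the local and global terms, the loop settles near x = 2, where their sum is smallest.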
By simultaneously modeling the local structure evolution and the global structure evolution of the time-series knowledge graph, the model learns embedded representations that effectively capture the evolutionary nature of the graph. The temporal point process based on hierarchical attention accounts for the various evolution patterns of entity semantics and assigns different influences to different historical events, thereby effectively modeling the establishment of relations between entities. Based on soft modularity, the model can effectively model dynamic community division in the time-series knowledge graph and learn its evolution at the macroscopic level. Table 1 compares the experimental results.
TABLE 1
Figure BDA0002987658900000131
Example 2
The present invention is further described below.
For any dynamic social network in the real world, the network is first represented, by means such as entity disambiguation and relation extraction, as a time-series knowledge graph describing the relationships between entities. The resulting time-series knowledge graph is input into the proposed model, and the embedded representations of the social entities and relations are obtained through gradient descent optimization. These embedded representations are then used in a score function that describes fact credibility, so as to measure the credibility of each candidate fact, and the most credible facts are selected to supplement the original dynamic social network. As shown in fig. 2, the implementation method is as follows:
A1, initializing the parameters of the model and the embedded representation of any social entity and relation according to the current dynamic social knowledge graph to be represented;
In this embodiment, the embedded representation of any social entity e under timestamp τ
Figure BDA0002987658900000132
is initialized with the following expression:
Figure BDA0002987658900000133
wherein θ_e, ω_e and v_e represent vectors specific to the current social entity.
A2, inputting the known social facts of the current dynamic social knowledge graph in the order of their timestamps, calculating the occurrence probability of each known social fact, and obtaining the evolution loss of the local structure by maximizing the occurrence probability of the known social facts, implemented as follows:
A201, inputting the known social facts of the current dynamic social knowledge graph in the order of their timestamps τ, and calculating, from the participants of any known social fact (s, r, o, τ), the spontaneous occurrence intensity of the social fact
Figure BDA0002987658900000141
wherein the participants are the social entities s and o and the social relation r contained in the known social fact;
A202, dividing into two parts the excitation exerted on the current dynamic social fact by the historical social facts that occurred at time τ_i
Figure BDA0002987658900000142
A203, according to the spontaneous occurrence intensity of the social fact
Figure BDA0002987658900000143
and the two parts of the excitation on the current dynamic social fact
Figure BDA0002987658900000144
calculating the occurrence intensity of the known social fact (s, r, o, τ)
Figure BDA0002987658900000145
A204, according to the occurrence intensity of the known social fact (s, r, o, τ)
Figure BDA0002987658900000146
calculating the probability p(s, r, o | I(τ)) of occurrence of each known social fact;
A205, according to the occurrence probability of each known social fact, obtaining the evolution loss L_local of the local structure by maximizing the occurrence probability of the facts.
In this embodiment, to account for the influence of historical facts on the occurrence probability of the current fact, the invention first decomposes the influence of the historical facts that occurred at time τ_i on the current fact into two parts:
Figure BDA0002987658900000147
wherein η_{s,r}(τ_i) and η_{o,r}(τ_i) respectively represent the influence, on the current dynamic social fact, of the historical social facts at time τ_i of the head social entity s and the tail social entity o of the current dynamic social fact. For each entity, different historical facts connect it to different entities through different relations and therefore affect the current fact differently. For this reason, the invention treats all historical facts of entity e under the τ_i timestamp as a hierarchy and quantifies their influence on the current fact as follows:
Figure BDA0002987658900000148
wherein e represents the social entity (s or o) for which the influence of historical social facts is considered, x represents the target entity in the historical facts (when e is s, x is o),
Figure BDA0002987658900000151
represents the set of relations that entity e has at time τ_i,
Figure BDA0002987658900000152
represents the set of entities that have a relation with entity e under the τ_i timestamp, and
Figure BDA0002987658900000153
represents a parameter matrix for measuring the similarity between relation vectors. To model the different importance of different historical facts to the current fact, the invention uses a hierarchical attention mechanism to calculate, respectively, the relation-level attention
Figure BDA0002987658900000154
and the social-entity-level attention β_{h,x}. The relation-level attention is calculated as follows:
Figure BDA0002987658900000155
wherein
Figure BDA0002987658900000156
and Z_r represent the embedded representations of the relations in the historical facts. The social-entity-level attention is calculated as follows:
Figure BDA0002987658900000157
wherein
Figure BDA0002987658900000158
and
Figure BDA0002987658900000159
are the embedded representations of the target entities in the historical facts under the corresponding timestamp.
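The two attention levels can be illustrated with the following Python sketch. It assumes that the relation-level attention is a softmax over bilinear similarities between each historical relation vector and the current relation vector (through the parameter matrix), and that the entity-level attention is a softmax over dot products of entity embeddings. Both scoring forms and all function names are assumptions made for illustration; the formulas above define the actual computation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_attention(history_relations: List[Vector], V: Matrix,
                       current_relation: Vector) -> List[float]:
    """Weight each historical relation by its similarity to the current one.

    Assumed score: z_hist^T V z_current, normalized with a softmax.
    """
    scores = []
    for z in history_relations:
        score = sum(z[a] * V[a][b] * current_relation[b]
                    for a in range(len(z))
                    for b in range(len(current_relation)))
        scores.append(score)
    return softmax(scores)


def entity_attention(neighbor_embeddings: List[Vector],
                     target_embedding: Vector) -> List[float]:
    """Weight each historical neighbor h by its similarity to the target x.

    Assumed score: dot product of the two embeddings, normalized with a softmax.
    """
    scores = [sum(h_k * x_k for h_k, x_k in zip(h, target_embedding))
              for h in neighbor_embeddings]
    return softmax(scores)
```

In this sketch, a historical relation that matches the current relation, or a historical neighbor whose embedding is close to the target entity, receives the larger share of the attention mass.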
In this embodiment, according to the spontaneous occurrence intensity of the fact
Figure BDA00029876589000001510
and the two parts of the excitation exerted on the current fact by historical facts
Figure BDA00029876589000001511
the occurrence intensity of the known fact (s, r, o, τ) is calculated:
Figure BDA00029876589000001512
Since the above equation may yield negative values, whereas an occurrence probability must be a positive number no greater than 1, the invention converts the above occurrence intensity into a positive number by an exponential function:
Figure BDA00029876589000001513
wherein
Figure BDA00029876589000001514
represents the original occurrence intensity of the fact, θ represents a hyper-parameter,
Figure BDA00029876589000001515
represents the excitation effect of the historical facts on the current fact, τ represents the occurrence time of the current fact, τ_i represents the occurrence time of the historical event, and k(τ - τ_i) represents a time decay function.
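The intensity computation can be sketched in Python as follows. The sketch assumes a Hawkes-style form: the raw intensity is the spontaneous intensity plus the excitation of each earlier event weighted by an exponential time decay k(Δ) = exp(-θ · Δ), and the positive intensity is the exponential of the raw value. The role given to θ, the decay form, and the function names are assumptions for illustration only.

```python
import math
from typing import List, Tuple


def time_decay(delta: float, theta: float) -> float:
    """Assumed decay k(delta) = exp(-theta * delta)."""
    return math.exp(-theta * delta)


def raw_intensity(base: float, history: List[Tuple[float, float]],
                  now: float, theta: float) -> float:
    """Spontaneous intensity plus decayed excitation from earlier events.

    history: (excitation, event_time) pairs; events not earlier than
    `now` are ignored.
    """
    excitation = sum(eta * time_decay(now - t_i, theta)
                     for eta, t_i in history if t_i < now)
    return base + excitation


def positive_intensity(raw: float) -> float:
    """Map a possibly negative raw intensity to a positive number."""
    return math.exp(raw)
```

With θ > 0, an event that happened long ago contributes less than a recent one, and the exponential mapping keeps the final intensity positive even when the raw value is negative.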
Thus, the probability p (s, r, o | I (τ)) that each known social fact occurs can be found:
Figure BDA0002987658900000161
wherein
Figure BDA0002987658900000162
representing the occurrence intensity of the candidate social facts (e, r, o, τ),
Figure BDA0002987658900000163
representing the occurrence strength of the candidate social facts (s, r, e, τ), e representing any social entity in the set of entities, epsilon representing the set of entities of a social knowledge graph, I (τ) representing the set of historical events before the time τ, s representing the head social entity contained by the current fact, r representing the social relationship contained by the current fact, and o representing the tail social entity contained by the current fact.
In this embodiment, the probability of each known fact occurring is maximized by minimizing a loss function:
Figure BDA0002987658900000164
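To illustrate how occurrence intensities turn into probabilities and a loss, the following Python sketch normalizes the intensity of a known fact by the intensities of all candidate facts obtained by replacing its head or its tail with any entity, and then sums negative log-probabilities. The exact normalization is an assumption, as are the names `fact_probability`, `local_loss` and the `intensity` callable.

```python
import math
from typing import Callable, List, Tuple

Fact = Tuple[str, str, str]                    # (head, relation, tail)
Intensity = Callable[[str, str, str], float]   # positive occurrence intensity


def fact_probability(s: str, r: str, o: str, entities: List[str],
                     intensity: Intensity) -> float:
    """Assumed form: intensity of (s, r, o) over all head/tail replacements."""
    numerator = intensity(s, r, o)
    denominator = (sum(intensity(e, r, o) for e in entities)
                   + sum(intensity(s, r, e) for e in entities))
    return numerator / denominator


def local_loss(facts: List[Fact], entities: List[str],
               intensity: Intensity) -> float:
    """Negative log-likelihood of the known facts.

    Minimizing this value maximizes the occurrence probability of each fact.
    """
    return -sum(math.log(fact_probability(s, r, o, entities, intensity))
                for s, r, o in facts)
```

The same probability can serve as a credibility score when ranking candidate facts, since a higher intensity relative to the alternatives yields a higher probability.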
A3, inputting, in time order, the social knowledge graph snapshots of the current dynamic social knowledge graph under each timestamp, calculating the corresponding soft modularity for the graph structure of each social knowledge graph snapshot, and maximizing the soft modularity to obtain the evolution loss of the global structure, implemented as follows:
A301, inputting, in time order, the social knowledge graph snapshots of the current dynamic social knowledge graph under each timestamp, and calculating the connection strength between two social entities
Figure BDA0002987658900000165
A302, according to the connection strength
Figure BDA0002987658900000166
calculating the soft modularity corresponding to the graph structure of each social knowledge graph snapshot;
A303, calculating the community distribution vector of each social entity
Figure BDA0002987658900000167
A304, according to the community distribution vector of each social entity, maximizing the soft modularity to obtain the evolution loss L_global of the global structure.
In this embodiment, in the process of modeling the community structure of the time-series knowledge graph, considering that different connection strengths may be brought by using different relationship connections between entities, the connection strength between two entities is first calculated according to the following formula:
Figure BDA0002987658900000168
wherein r represents any social relation in the set
Figure BDA0002987658900000171
where
Figure BDA0002987658900000172
represents the set of social relations that exist between social entity i and social entity j at timestamp τ, Z_r represents the vector representation of the social relation r, a parameter vector measures the connection strength of different relations, and
Figure BDA0002987658900000173
represents a non-linear activation function.
Based on the above, the soft modularity matrix of the time-series knowledge graph at each timestamp can be obtained, and each element of the matrix is computed as follows:
Figure BDA0002987658900000174
wherein
Figure BDA0002987658900000175
and
Figure BDA0002987658900000176
respectively represent the degrees of social entity i and social entity j at timestamp τ, and m_τ represents the total number of relations that exist in the social knowledge graph at timestamp τ.
In order to maximize the soft modularity of the timing knowledge graph at each timestamp, the present invention requires obtaining a community allocation vector for each social entity. Considering that the entities in the time-series knowledge graph have multiple types, and the same social entity may belong to multiple different communities at the same time, soft community allocation is allowed for the social entity, and community allocation of each entity is obtained through the following formula:
Figure BDA0002987658900000177
wherein F represents a parameter matrix for mapping the embedded representation of the social entity to the community allocation vector of the social entity,
Figure BDA0002987658900000178
representing an embedded representation of a social entity i under a timestamp tau,
Figure BDA0002987658900000179
and representing the embedded representation corresponding to the community to which the social entity i belongs in the last timestamp.
In this embodiment, the soft modularity of the timing knowledge graph under each timestamp is finally maximized by minimizing the following loss function:
Figure BDA00029876589000001710
A4, calculating the overall loss function L of the model from the evolution loss of the local structure and the evolution loss of the global structure:
L = L_local + L_global
A5, iteratively optimizing the overall loss function of the model by gradient descent, and updating the parameters of the model and the embedded representations of the social entities and relations;
A6, judging whether the model has converged; if so, obtaining the final embedded representations of the social entities and relations and completing the time-series knowledge graph representation learning; otherwise, returning to step A1.

Claims (6)

1. A time sequence knowledge graph representation learning method based on co-evolution modeling is characterized by comprising the following steps:
S1, initializing parameters of a model and the embedded representation of any social entity and relationship according to the current dynamic social timing knowledge graph to be represented;
S2, inputting the known social facts of the current dynamic social timing knowledge graph in the order of their timestamps in the social timing knowledge graph, calculating the occurrence probability of each known social fact, and obtaining the evolution loss of the local structure by maximizing the occurrence probability of the known social facts;
S3, inputting, in time order, the social timing knowledge graph snapshots of the current dynamic social timing knowledge graph under each timestamp, calculating the corresponding soft modularity for the graph structure of each social timing knowledge graph snapshot, and maximizing the soft modularity to obtain the evolution loss of the global structure;
the step S3 includes the steps of:
S301, inputting, in time order, the social timing knowledge graph snapshots of the current dynamic social timing knowledge graph under each timestamp, and calculating the connection strength between two social entities
Figure FDA0004070142580000011
S302, according to the connection strength
Figure FDA0004070142580000012
calculating the soft modularity corresponding to the graph structure of each social timing knowledge graph snapshot, wherein each element of the soft modularity matrix
Figure FDA0004070142580000013
is expressed as follows:
Figure FDA0004070142580000014
wherein
Figure FDA0004070142580000015
and
Figure FDA0004070142580000016
respectively represent the degrees of social entity i and social entity j at timestamp τ, and m_τ represents the total number of relations that exist in the social timing knowledge graph at timestamp τ;
S303, calculating the community distribution vector of each social entity
Figure FDA0004070142580000017
S304, according to the community distribution vector of each social entity, maximizing the soft modularity to obtain the evolution loss L_global of the global structure;
the connection strength between the two entities in the step S301
Figure FDA0004070142580000018
is expressed as follows:
Figure FDA0004070142580000021
wherein r represents any social relation in the set
Figure FDA0004070142580000022
where
Figure FDA0004070142580000023
represents the set of social relations that exist between social entity i and social entity j at timestamp τ, Z_r represents the vector representation of the social relation r, a parameter vector measures the connection strength of different relations, and
Figure FDA0004070142580000024
represents a non-linear activation function;
the community allocation vector of each entity in the step S303
Figure FDA0004070142580000025
is expressed as follows:
Figure FDA0004070142580000026
wherein F represents a parameter matrix for mapping the embedded representation of the social entity to the community allocation vector of the social entity,
Figure FDA0004070142580000027
representing an embedded representation of a social entity i under a timestamp tau,
Figure FDA0004070142580000028
representing an embedded representation corresponding to the community to which the social entity i belongs in the last timestamp;
the evolution loss L_global of the global structure in the step S304 is expressed as follows:
Figure FDA0004070142580000029
wherein T represents the transposition symbol, m_τ represents the total number of relations existing in the time-sequence knowledge graph at timestamp τ, tr(·) represents the trace of a matrix, H_τ represents the community allocation matrix at timestamp τ,
Figure FDA00040701425800000210
represents the soft modularity matrix, and norm(·) represents two-norm regularization;
S4, calculating the overall loss function of the model from the evolution loss of the local structure and the evolution loss of the global structure;
S5, iteratively optimizing the overall loss function of the model by gradient descent, and updating the parameters of the model and the embedded representations of the social entities and social relations;
S6, judging whether the model has converged; if so, obtaining the final embedded representations of the social entities and social relations and completing the time sequence knowledge graph representation learning; otherwise, returning to the step S1.
2. The method for learning sequential knowledge graph representation based on co-evolution modeling according to claim 1, wherein in the step S1 the embedded representation of any social entity e under the timestamp τ
Figure FDA00040701425800000211
is initialized with the following expression:
Figure FDA0004070142580000031
wherein θ_e, ω_e and v_e all represent vectors specific to the current social entity.
3. The method for learning sequential knowledge graph representation based on co-evolution modeling according to claim 1, wherein the step S2 comprises the following steps:
S201, inputting the known social facts of the current dynamic social timing knowledge graph in the order of their timestamps τ in the social timing knowledge graph, and calculating, from the participants of any known social fact (s, r, o, τ), the spontaneous occurrence intensity of the social fact
Figure FDA0004070142580000032
wherein the participants are the social entities s and o and the social relation r contained in the known social fact;
S202, dividing into two parts the excitation exerted on the current dynamic social fact by the historical social facts that occurred at time τ_i
Figure FDA0004070142580000033
as follows:
Figure FDA0004070142580000034
Figure FDA0004070142580000035
Figure FDA0004070142580000036
Figure FDA0004070142580000037
wherein η_{s,r}(τ_i) and η_{o,r}(τ_i) respectively represent the influence, on the current dynamic social fact, of the historical social facts at time τ_i of the head social entity s and the tail social entity o of the current dynamic social fact,
Figure FDA0004070142580000038
represents the set of relations that social entity e has at time τ_i,
Figure FDA0004070142580000039
represents the relation-level attention,
Figure FDA00040701425800000310
and Z_r represent embedded representations of relations in historical social facts,
Figure FDA00040701425800000311
represents a relation contained in the historical event, V represents a parameter matrix for measuring the similarity between relation vectors, h represents a social entity that has a relation with social entity e at time τ_i, β_{h,x} represents the entity-level attention,
Figure FDA00040701425800000312
represents the vector representation of h at time τ_i,
Figure FDA00040701425800000313
represents the vector representation of x at time τ_i, x represents the entity with which social entity e has a relation in the current dynamic social fact, r' represents any one of the social relations that social entity e has at time τ_i, h' represents a specific one of the entities h,
Figure FDA0004070142580000041
represents the set of entities that have a social relation with social entity e at time τ_i, and
Figure FDA0004070142580000042
represents the vector representation of h' at time τ_i;
S203, according to the spontaneous occurrence intensity of the social fact
Figure FDA0004070142580000043
and the two parts of the excitation on the current dynamic social fact
Figure FDA00040701425800000416
calculating the occurrence intensity of the known social fact (s, r, o, τ)
Figure FDA0004070142580000044
S204, according to the occurrence intensity of the known social fact (s, r, o, τ)
Figure FDA0004070142580000045
calculating the probability p(s, r, o | I(τ)) of occurrence of each known social fact;
S205, according to the occurrence probability of each known social fact, obtaining the evolution loss L_local of the local structure by maximizing the occurrence probability of the facts:
Figure FDA0004070142580000046
wherein I(τ) represents the set of historical events before time τ.
4. The method for learning sequential knowledge graph representation based on co-evolution modeling according to claim 3, wherein the spontaneous occurrence intensity of the fact in the step S201
Figure FDA0004070142580000047
is expressed as follows:
Figure FDA0004070142580000048
wherein
Figure FDA0004070142580000049
and
Figure FDA00040701425800000410
respectively represent the embedded representations of the head social entity s and the tail social entity o of the social fact under the timestamp, Z_r represents the embedded representation corresponding to the social relation r, and w represents a learnable parameter matrix for measuring the similarity between the vectors.
5. The method for learning sequential knowledge graph representation based on co-evolution modeling according to claim 3, wherein the occurrence intensity of the known fact (s, r, o, τ) in the step S203
Figure FDA00040701425800000411
is expressed as follows:
Figure FDA00040701425800000412
Figure FDA00040701425800000413
wherein
Figure FDA00040701425800000414
represents the original occurrence intensity of the fact, θ represents a hyper-parameter,
Figure FDA00040701425800000415
represents the excitation effect of the historical facts on the current fact, τ represents the occurrence time of the current fact, τ_i represents the occurrence time of the historical event, and k(τ - τ_i) represents a time decay function.
6. The method for learning representation of time-series knowledge graph based on co-evolution modeling according to claim 3, wherein the probability p(s, r, o | I(τ)) of occurrence of each known fact in step S204 is expressed as follows:
Figure FDA0004070142580000051
wherein
Figure FDA0004070142580000052
representing the occurrence intensity of the candidate social facts (e, r, o, τ),
Figure FDA0004070142580000053
representing the occurrence strength of the candidate social facts (s, r, e, τ), e representing any social entity in the set of entities, epsilon representing the set of entities of a social timing knowledge graph, I (τ) representing the set of historical events before τ, s representing the head social entity contained by the current fact, r representing the social relationship contained by the current fact, and o representing the tail social entity contained by the current fact.
CN202110305818.4A 2021-03-23 2021-03-23 Sequential knowledge graph representation learning method based on collaborative evolution modeling Active CN112860918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110305818.4A CN112860918B (en) 2021-03-23 2021-03-23 Sequential knowledge graph representation learning method based on collaborative evolution modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110305818.4A CN112860918B (en) 2021-03-23 2021-03-23 Sequential knowledge graph representation learning method based on collaborative evolution modeling

Publications (2)

Publication Number Publication Date
CN112860918A CN112860918A (en) 2021-05-28
CN112860918B true CN112860918B (en) 2023-03-14

Family

ID=75992217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110305818.4A Active CN112860918B (en) 2021-03-23 2021-03-23 Sequential knowledge graph representation learning method based on collaborative evolution modeling

Country Status (1)

Country Link
CN (1) CN112860918B (en)

Also Published As

Publication number Publication date
CN112860918A (en) 2021-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant