CN115809340A - Entity updating method and system of knowledge graph - Google Patents


Publication number
CN115809340A
Authority
CN
China
Prior art keywords
entity
sentence
similarity
knowledge graph
representing
Prior art date
Legal status
Pending
Application number
CN202211047396.6A
Other languages
Chinese (zh)
Inventor
罗旺
娄超
朱成诚
席丁鼎
俞弦
高德荃
赵子岩
来风刚
韩圣亚
马超
Current Assignee
State Grid Information and Telecommunication Co Ltd
Nari Information and Communication Technology Co
State Grid Electric Power Research Institute
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Nari Information and Communication Technology Co
State Grid Electric Power Research Institute
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd, Nari Information and Communication Technology Co, State Grid Electric Power Research Institute, and Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority to CN202211047396.6A
Publication of CN115809340A
Legal status: Pending


Abstract

The invention discloses an entity updating method and system for a knowledge graph: a new knowledge graph and an original knowledge graph are acquired; a name attribute similarity matrix is calculated based on the name attributes of the two graphs; an entity relationship similarity matrix is calculated based on their entity relationship structure triples; and the two matrices are fused to obtain the entities corresponding to the new knowledge graph, which are then updated into the original knowledge graph. Advantages: on the basis of research on multi-attention entity alignment, a knowledge graph entity alignment method for the fault diagnosis field is provided that combines long-text name attributes with relation structure similarity calculation. A knowledge graph updating tool developed on this method has, through case testing and actual use, effectively improved the accuracy of entity alignment and the efficiency of knowledge graph updating.

Description

Entity updating method and system of knowledge graph
Technical Field
The invention relates to an entity updating method and system of a knowledge graph, and belongs to the technical field of cloud data center diagnosis.
Background
With the development of knowledge graph technology in the field of intelligent operation and maintenance, technologies such as intelligent diagnosis, reasoning, and knowledge recommendation based on knowledge graphs have attracted researchers' attention. Because the knowledge in a fault diagnosis knowledge graph must be updated according to the actual cloud data center topology and operating conditions, an automatic knowledge graph updating tool is needed to fuse new knowledge with the original knowledge graph.
A cloud data center fault diagnosis knowledge graph is a domain knowledge graph. It is characterized by a clear relation structure and a large amount of information in the entity NAME attributes. In the knowledge updating process, entities cannot be aligned accurately by NAME attribute similarity calculation alone. Current entity alignment work focuses on entity structure and attribute information, but an entity's attribute information is confined to its own node and cannot interact with the entity's neighborhood structure during learning.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art and provide a method and a system for updating an entity of a knowledge graph.
In order to solve the above technical problem, the present invention provides a method for updating an entity of a knowledge graph, including:
acquiring a new knowledge graph and an original knowledge graph;
based on the name attributes of the new knowledge graph and the original knowledge graph, calculating a name attribute similarity matrix;
calculating an entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph;
and fusing the name attribute similarity matrix and the entity relationship similarity matrix to obtain an entity corresponding to the new knowledge graph, and updating the entity corresponding to the new knowledge graph into the original knowledge graph.
Further, the calculating a name attribute similarity matrix based on the name attributes of the new knowledge graph and the original knowledge graph includes:
extracting, by looking up a character embedding matrix, the word vector s_Ai and the character vector x_Ai of each sentence S_Ai in the new knowledge graph;
extracting, by looking up a character embedding matrix, the word vector s_Bi and the character vector x_Bi of each sentence S_Bi in the original knowledge graph;
calculating the word vector similarity between sentence S_Ai and sentence S_Bi from the word vectors s_Ai and s_Bi;
calculating the character vector similarity between sentence S_Ai and sentence S_Bi from the character vectors x_Ai and x_Bi;
summing and averaging the word vector similarity and the character vector similarity of sentence S_Ai with respect to sentence S_Bi to obtain their name attribute similarity S_namei;
and acquiring the name attribute similarity of each sentence in the new knowledge graph with respect to each sentence in the original knowledge graph to obtain the name attribute similarity matrix.
Further, the calculating an entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph includes:
and acquiring an entity relationship structure triple of each sentence in the new knowledge map, inputting the entity relationship structure triple into an entity relationship structure similarity model obtained by training the relationship structure triple based on the original knowledge map in advance, obtaining the entity structure similarity of each sentence, and obtaining an entity structure similarity matrix according to the entity structure similarity of each sentence.
Further, the training process of the entity relationship structure similarity model obtained by training the entity relationship structure triples based on the original knowledge graph includes:
constructing the entity relationship structure similarity model to be trained, expressed as:
e_i^(l+1) = σ( Σ_{j∈N_i} α_ij^(l) W^(l) e_j^(l) )
α_ij^(l) = exp(c_ij^(l)) / Σ_{k∈N_i} exp(c_ik^(l))
c_ij^(l) = LeakyReLU( u^T [ W^(l) e_i^(l) ‖ W^(l) e_j^(l) ] )
wherein e_i^(l) and e_i^(l+1) are the entity vectors input to and output from the l-th neighborhood attention layer, the input covering entity e_i and all its neighbors; σ is the sigmoid activation function; N_i is the set of entities connected to entity e_i, with e_j and e_k denoting neighbors of e_i; α_ij^(l) is the normalized neighborhood attention coefficient at layer l; c_ij^(l) is the information fusion result of entity e_i with neighbor j, and c_ik^(l) the result with neighbor k; exp(·) is the exponential function with natural base e; LeakyReLU(·) is an activation function; u ∈ R^{2d(l+1)×1} and W^(l) ∈ R^{d(l+1)×d(l)} are learnable parameter matrices; d(l) and d(l+1) are the network embedding dimensions of layers l and l+1; superscript T denotes matrix transposition;
constructing a pre-aligned entity seed set and positive/negative example triples;
constructing the entity alignment loss function L_A for training the entity relationship structure similarity model, expressed as:
L_A = L_0 + L_a
L_a = Σ_{(e,e')∈S} Σ_{e_∈NS(e), e'_∈NS(e')} [ d(e, e') + γ - d(e_, e'_) ]_+
wherein L_a is the entity alignment loss of the entity relationship structure similarity model and L_0 is the orthogonalization loss of the parameter matrix W; S is the pre-aligned entity seed set; the negative sampling set e_ of entity e and the negative sampling set e'_ of its aligned entity e' are constructed by the nearest neighbor sampling method NS(e); d(·,·) = 1 - cos(·,·) is the cosine distance between entities; [·]_+ = max{·, 0}; γ is a hyperparameter;
L_0 = Σ_{l=1}^{m} ‖ W^(l)T W^(l) - I ‖_2^2
wherein W^(l) is the parameter matrix of the l-th layer; m is the number of attention network embedding layers; ‖·‖_2^2 denotes the squared 2-norm of a matrix;
constructing the relation structure loss function L_R for training the entity relationship structure similarity model, expressed as:
L_R = Σ_{(h,r,t)∈T_1} Σ_{(h',r,t')∈T_1'} [ f(h,r,t) + γ' - f(h',r,t') ]_+
wherein f(h,r,t) = ‖h + r - t‖_2 is the scoring function of the relation triple (h,r,t), used to calculate the confidence of the triple; h and t are head and tail entity vectors from the global structure embedding layer, and r is the relation vector to be modeled and learned; γ' is a hyperparameter; T_1 is the positive example triple set, and T_1' = {(h', r, t) | h' ∈ E} ∪ {(h, r, t') | t' ∈ E} is the negative example triple set, where h' is a head entity and t' a tail entity of the negative example global structure embedding layer, and E is the set containing all negative example entities;
training the entity alignment loss function L_A and the relation structure loss function L_R with the pre-aligned entity seed set and the positive/negative example triples respectively, determining the final model parameters of the entity relationship structure similarity model, and updating the model with these parameters to obtain the trained entity relationship structure similarity model.
Further, the fusing the name attribute similarity matrix and the entity relationship similarity matrix includes:
respectively standardizing the name attribute similarity matrix and the entity relationship similarity matrix;
and averaging the normalized name attribute similarity matrix and the normalized entity relationship similarity matrix to obtain the fused final entity similarity matrix.
A system for entity updating of knowledge-graphs, comprising:
the acquisition module is used for acquiring a new knowledge graph and an original knowledge graph;
the first calculation module is used for calculating a name attribute similarity matrix based on the name attributes of the new knowledge graph and the original knowledge graph;
the second calculation module is used for calculating an entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph;
and the updating module is used for fusing the name attribute similarity matrix and the entity relationship similarity matrix to obtain an entity corresponding to the new knowledge graph and updating the entity corresponding to the new knowledge graph into the original knowledge graph.
Further, the first calculating module is used for
extracting, by looking up a character embedding matrix, the word vector s_Ai and the character vector x_Ai of each sentence S_Ai in the new knowledge graph;
extracting, by looking up a character embedding matrix, the word vector s_Bi and the character vector x_Bi of each sentence S_Bi in the original knowledge graph;
calculating the word vector similarity between sentence S_Ai and sentence S_Bi from the word vectors s_Ai and s_Bi;
calculating the character vector similarity between sentence S_Ai and sentence S_Bi from the character vectors x_Ai and x_Bi;
summing and averaging the word vector similarity and the character vector similarity of sentence S_Ai with respect to sentence S_Bi to obtain their name attribute similarity S_namei;
And obtaining the name attribute similarity of each sentence in the new knowledge graph corresponding to each sentence in the original knowledge graph to obtain a name attribute similarity matrix.
Further, the second computing module is used for
acquiring the entity relationship structure triple of each sentence in the new knowledge graph, inputting it into an entity relationship structure similarity model trained in advance on the relationship structure triples of the original knowledge graph, obtaining the entity structure similarity of each sentence, and assembling the entity structure similarities of all sentences into an entity structure similarity matrix.
Further, the second calculation module is further used for
constructing the entity relationship structure similarity model to be trained, expressed as:
e_i^(l+1) = σ( Σ_{j∈N_i} α_ij^(l) W^(l) e_j^(l) )
α_ij^(l) = exp(c_ij^(l)) / Σ_{k∈N_i} exp(c_ik^(l))
c_ij^(l) = LeakyReLU( u^T [ W^(l) e_i^(l) ‖ W^(l) e_j^(l) ] )
wherein e_i^(l) and e_i^(l+1) are the entity vectors input to and output from the l-th neighborhood attention layer, the input covering entity e_i and all its neighbors; σ is the sigmoid activation function; N_i is the set of entities connected to entity e_i, with e_j and e_k denoting neighbors of e_i; α_ij^(l) is the normalized neighborhood attention coefficient at layer l; c_ij^(l) is the information fusion result of entity e_i with neighbor j, and c_ik^(l) the result with neighbor k; exp(·) is the exponential function with natural base e; LeakyReLU(·) is an activation function; u ∈ R^{2d(l+1)×1} and W^(l) ∈ R^{d(l+1)×d(l)} are learnable parameter matrices; d(l) and d(l+1) are the network embedding dimensions of layers l and l+1; superscript T denotes matrix transposition;
constructing a pre-aligned entity seed set and positive/negative case triples;
constructing the entity alignment loss function L_A for training the entity relationship structure similarity model, expressed as:
L_A = L_0 + L_a
L_a = Σ_{(e,e')∈S} Σ_{e_∈NS(e), e'_∈NS(e')} [ d(e, e') + γ - d(e_, e'_) ]_+
wherein L_a is the entity alignment loss of the entity relationship structure similarity model and L_0 is the orthogonalization loss of the parameter matrix W; S is the pre-aligned entity seed set; the negative sampling set e_ of entity e and the negative sampling set e'_ of its aligned entity e' are constructed by the nearest neighbor sampling method NS(e); d(·,·) = 1 - cos(·,·) is the cosine distance between entities; [·]_+ = max{·, 0}; γ is a hyperparameter;
L_0 = Σ_{l=1}^{m} ‖ W^(l)T W^(l) - I ‖_2^2
wherein W^(l) is the parameter matrix of the l-th layer; m is the number of attention network embedding layers; ‖·‖_2^2 denotes the squared 2-norm of a matrix;
constructing the relation structure loss function L_R for training the entity relationship structure similarity model, expressed as:
L_R = Σ_{(h,r,t)∈T_1} Σ_{(h',r,t')∈T_1'} [ f(h,r,t) + γ' - f(h',r,t') ]_+
wherein f(h,r,t) = ‖h + r - t‖_2 is the scoring function of the relation triple (h,r,t), used to calculate the confidence of the triple; h and t are head and tail entity vectors from the global structure embedding layer, and r is the relation vector to be modeled and learned; γ' is a hyperparameter; T_1 is the positive example triple set, and T_1' = {(h', r, t) | h' ∈ E} ∪ {(h, r, t') | t' ∈ E} is the negative example triple set, where h' is a head entity and t' a tail entity of the negative example global structure embedding layer, and E is the set containing all negative example entities;
training the entity alignment loss function L_A and the relation structure loss function L_R with the pre-aligned entity seed set and the positive/negative example triples respectively, determining the final model parameters of the entity relationship structure similarity model, and updating the model with these parameters to obtain the trained entity relationship structure similarity model.
Further, the update module is used for
Respectively standardizing the name attribute similarity matrix and the entity relationship similarity matrix;
and averaging the normalized name attribute similarity matrix and the normalized entity relationship similarity matrix to obtain the fused final entity similarity matrix.
A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods.
A computing device, comprising:
one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods.
The invention achieves the following beneficial effects:
on the basis of research on multi-attention entity alignment, a knowledge graph entity alignment method for the fault diagnosis field is provided that combines long-text name attributes with relation structure similarity calculation. A knowledge graph updating tool developed on this method has, through case testing and actual use, effectively improved the accuracy of entity alignment and the efficiency of knowledge graph updating.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a block diagram of the overall framework of the method for entity update of a knowledge-graph of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As shown in fig. 1, the present invention discloses an entity updating method of a knowledge graph, which comprises:
acquiring a new knowledge graph and an original knowledge graph;
based on the name attributes of the new knowledge graph and the original knowledge graph, calculating a name attribute similarity matrix;
calculating an entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph;
and fusing the name attribute similarity matrix and the entity relationship similarity matrix to obtain an entity corresponding to the new knowledge graph, and updating the entity corresponding to the new knowledge graph into the original knowledge graph.
Fig. 2 is the overall framework of the entity updating method of the knowledge graph. The structural channel converts the structural features contained in an entity's relation triples into graph entity feature vectors, and an entity similarity matrix is obtained through similarity calculation. The attribute channel calculates an entity attribute similarity matrix with cosine similarity based on the name attributes. Finally, the entities corresponding to the new knowledge are obtained by fusing the structural similarity and the attribute similarity; after manual inspection, the new knowledge is updated into the original graph, modifying or adding attributes and relation content in it.
The method specifically comprises the following steps:
First, vector generation. The proposed model fuses word segmentation feature vectors with character vectors. Its input is a sentence together with all self-matching words in the sentence, where a self-matching word of a character is a word containing that character. Let s = {c_1, c_2, ..., c_n} denote the sentence, where c_i is the i-th character. By looking up a character embedding matrix, each character in the sentence is represented as a vector x_i:
x_i = e_c(c_i)
wherein e_c is the character embedding lookup table and c_i is a character in the sentence.
The model segments the sentence with a word segmentation tool and labels the data in the training set to construct word segmentation features; the resulting character representation containing word boundary information is:
c_i = [x_i ⊕ s_i]
wherein x_i is the character vector of the character, s_i is the word segmentation feature vector of the character, ⊕ denotes vector concatenation, and c_i is the fused character vector representation;
Second, self-matching word vector generation. To represent the semantic information of words, a vector representation of each self-matching word is obtained. The vocabulary that the input sentence can match is denoted l = {z_1, z_2, ..., z_m}; by looking up a pre-trained word embedding matrix, each word is represented as a semantic vector z_i:
z_i = e_w(l_i)
Finally, the character vectors and the word vectors are spliced to obtain the final output representation of the embedding layer:
Node_f = [v_1, v_2, ..., v_n] = [c_1, c_2, ..., c_n, z_1, z_2, ..., z_m]
wherein v_i is the final vector representation, c_i is a fused character vector representation, and z_i is a self-matching word vector representation;
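As a minimal sketch of the embedding layer above, the character lookup, boundary-feature concatenation, and self-matching word lookup can be written as follows (the vocabularies, dimensions, and random embedding tables are illustrative stand-ins, not the patent's trained matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lookup tables: e_c (characters), e_w (words), and boundary features.
char_vocab = {ch: i for i, ch in enumerate("云数据中心故障")}
word_vocab = {"数据中心": 0, "故障": 1}
d_char, d_word, d_seg = 8, 8, 4
E_c = rng.normal(size=(len(char_vocab), d_char))   # character embedding matrix
E_w = rng.normal(size=(len(word_vocab), d_word))   # pre-trained word embedding matrix
E_seg = rng.normal(size=(4, d_seg))                # B/M/E/S word-boundary features

def embed_sentence(sentence, matched_words):
    # c_i = [x_i ⊕ s_i]: character vector from lookup plus a segmentation
    # feature vector (every character gets the same toy feature here).
    chars = [np.concatenate([E_c[char_vocab[c]], E_seg[3]]) for c in sentence]
    # z_i: self-matching word vectors via the word embedding lookup.
    words = [E_w[word_vocab[w]] for w in matched_words]
    return chars + words   # Node_f = [c_1..c_n, z_1..z_m]

node_f = embed_sentence("数据中心故障", ["数据中心", "故障"])
print(len(node_f))  # 6 characters + 2 matched words = 8 vectors
```

The fused character vectors (dimension d_char + d_seg) and the word vectors (dimension d_word) together form the embedding-layer output Node_f.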
Third, attribute similarity calculation. After the entity name attribute is converted into character vectors and word vectors, the similarity of two name attributes is calculated with cosine similarity; the closer the value is to 1, the closer the included angle is to 0, i.e. the more similar the two vectors. The calculation formula is:
sim_cos(Node_A, Node_B) = (Node_A · Node_B) / (‖Node_A‖ ‖Node_B‖)
wherein Node_A and Node_B are the vectors of the two name attributes to be matched. Based on this formula, the attribute similarity between the entity to be updated and each entity name in the original graph is obtained.
In order to avoid the influence of a large number of repeated professional vocabulary items on the similarity calculation, the Jaccard similarity of the two name attributes is also calculated:
sim_Jac(NAME_A, NAME_B) = |NAME_A ∩ NAME_B| / |NAME_A ∪ NAME_B|
wherein NAME_A and NAME_B are the word sets of the two NAME attributes to be matched; |· ∩ ·| denotes the number of elements in the intersection of the two sets and |· ∪ ·| the number of elements in their union.
The NAME attribute similarity is defined as the mean of the cosine similarity and the Jaccard similarity:
S_name = (sim_cos + sim_Jac) / 2
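The cosine/Jaccard combination above can be sketched as follows (the tokenization of a name into a word set is an assumption for illustration):

```python
import numpy as np

def cosine_sim(a, b):
    # sim = a·b / (|a||b|); a value near 1 means a small angle between vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_sim(words_a, words_b):
    # |A ∩ B| / |A ∪ B| over the word sets of the two NAME attributes.
    inter = len(words_a & words_b)
    union = len(words_a | words_b)
    return inter / union if union else 0.0

def name_similarity(vec_a, vec_b, words_a, words_b):
    # S_name is the mean of the cosine and Jaccard similarities.
    return 0.5 * (cosine_sim(vec_a, vec_b) + jaccard_sim(words_a, words_b))

a = np.array([1.0, 0.0])
b = np.array([1.0, 0.0])
s = name_similarity(a, b, {"disk", "failure"}, {"disk", "timeout"})
print(round(s, 3))  # cosine = 1.0, Jaccard = 1/3, mean = 0.667
```

Averaging the two measures damps the effect of repeated domain terms that inflate pure vector similarity.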
Fourth, improved GAT. The relation triple sequence of an entity is input, and attention coefficients between the entity and each of its neighbor entities are calculated following the idea of GAT; the coefficients serve as weights for aggregating neighbor entity features. GAT is adopted to learn entity embeddings over the relationship structure:
e_i^(l+1) = σ( Σ_{j∈N_i} α_ij^(l) W^(l) e_j^(l) )
α_ij^(l) = exp(c_ij^(l)) / Σ_{k∈N_i} exp(c_ik^(l))
c_ij^(l) = LeakyReLU( u^T [ W^(l) e_i^(l) ‖ W^(l) e_j^(l) ] )
wherein e_i^(l) and e_i^(l+1) are the entity vectors input to and output from the l-th neighborhood attention layer; N_i is the set of entities connected to entity e_i; α_ij^(l) is the normalized neighborhood attention coefficient at layer l; u ∈ R^{2d(l+1)×1} and W^(l) ∈ R^{d(l+1)×d(l)} are learnable parameter matrices; d(l) is the network embedding dimension of the l-th layer.
On this basis, the traditional GAT model is improved by imposing an orthogonalization constraint on the transformation matrix W and learning a W orthogonalization loss, so as to preserve the relative distribution among entities during network embedding and transformation and retain more of the true entity structure information. The W orthogonalization loss is calculated as:
L_0 = Σ_{l=1}^{m} ‖ W^(l)T W^(l) - I ‖_2^2
wherein W^(l) is the parameter matrix of the l-th layer; m is the number of attention network embedding layers; ‖·‖_2^2 denotes the squared 2-norm of a matrix;
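A minimal numpy sketch of one neighborhood attention layer and the W orthogonalization loss (the toy graph, dimensions, and random parameters are illustrative; the patent's trained model is not reproduced):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gat_layer(E, neighbors, W, u):
    """One layer: e_i^(l+1) = sigmoid(sum_j alpha_ij W e_j^(l))."""
    WE = E @ W.T                      # project every entity: W^(l) e^(l)
    out = np.zeros_like(WE)
    for i, nbrs in neighbors.items():
        # c_ij = LeakyReLU(u^T [W e_i || W e_j]) for each neighbor j
        c = np.array([leaky_relu(u @ np.concatenate([WE[i], WE[j]]))
                      for j in nbrs])
        alpha = np.exp(c) / np.exp(c).sum()   # softmax over the neighborhood
        out[i] = sigmoid((alpha[:, None] * WE[nbrs]).sum(axis=0))
    return out

def ortho_loss(Ws):
    # L_0 = sum_l || W^(l)T W^(l) - I ||_2^2
    return sum(np.linalg.norm(W.T @ W - np.eye(W.shape[1])) ** 2 for W in Ws)

rng = np.random.default_rng(1)
E = rng.normal(size=(4, 6))           # 4 entities, d(l) = 6
W = rng.normal(size=(5, 6))           # d(l+1) = 5
u = rng.normal(size=10)               # 2 * d(l+1)
nbrs = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
H = gat_layer(E, nbrs, W, u)
print(H.shape)                        # (4, 5)
print(ortho_loss([np.eye(3)]))        # 0.0 for an orthogonal matrix
```

The orthogonalization term is zero exactly when each W^(l) is column-orthonormal, which is what preserves relative distances between entity embeddings across layers.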
Fifth, alignment loss. A loss function is trained on the pre-aligned entity seed set and the positive/negative example triples (the entity seed set is a set containing entities of the knowledge graph, and a positive example triple contains an entity together with its neighbor entities and their relations). The loss function for training entity alignment over the attribute path is:
L_A = L_0 + L_a
L_a = Σ_{(e,e')∈S} Σ_{e_∈NS(e), e'_∈NS(e')} [ d(e, e') + γ - d(e_, e'_) ]_+
wherein S is the pre-aligned entity seed set; NS(e) denotes the negative example sampling set of entity e, constructed by the nearest neighbor sampling method; d(·,·) = 1 - cos(·,·) is the cosine distance between entities; [·]_+ = max{·, 0}; γ is a hyperparameter.
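The alignment loss can be sketched as a margin hinge over seed pairs with sampled negatives (the sampling strategy and margin value here are illustrative):

```python
import numpy as np

def cos_dist(a, b):
    # d(.,.) = 1 - cos(.,.)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def alignment_loss(seeds, negatives, gamma=0.5):
    """L_a = sum over aligned pairs (e, e') and sampled negative pairs
    (e_, e'_) of [d(e, e') + gamma - d(e_, e'_)]_+ ."""
    loss = 0.0
    for (e, e_pos), (e_neg, e_neg2) in zip(seeds, negatives):
        loss += max(cos_dist(e, e_pos) + gamma - cos_dist(e_neg, e_neg2), 0.0)
    return loss

e = np.array([1.0, 0.0])
aligned = np.array([1.0, 0.1])                       # near-identical: small d
neg = (np.array([1.0, 0.0]), np.array([-1.0, 0.2]))  # dissimilar: large d
print(alignment_loss([(e, aligned)], [neg]))         # hinge inactive: 0.0
```

When aligned pairs are already much closer than negative pairs (by more than the margin γ), the hinge contributes nothing; otherwise it pulls seed pairs together and pushes negatives apart.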
Sixth, relation structure modeling. The TransE model idea is adopted to calculate the entity relationship structure similarity. Given a relation triple (h, r, t), the model is trained on the entity vectors embedded by the global structure so that h + r ≈ t, which further constrains the embedded representations of the head and tail entities while modeling the relation structure.
The loss function of the relation structure training part is calculated as:
L_R = Σ_{(h,r,t)∈T_1} Σ_{(h',r,t')∈T_1'} [ f(h,r,t) + γ' - f(h',r,t') ]_+
wherein f(h,r,t) = ‖h + r - t‖_2 is the scoring function of the triple, used to calculate its confidence; h and t are head and tail entity vectors from the global structure embedding layer, and r is the relation vector to be modeled and learned; γ' is a hyperparameter; T_1 is the positive example triple set, and T_1' = {(h', r, t) | h' ∈ E} ∪ {(h, r, t') | t' ∈ E} is the negative example triple set, constructed by randomly replacing head and tail entities within the same relation type. During training, positive triples are given lower scores and negative triples higher scores, and the maximum margin is used to separate them, so that entities are fused with the relation structure during embedding and the model's ability to distinguish entities is improved. The alignment loss and the relation structure modeling loss are trained simultaneously on the pre-aligned entity seed set and the positive/negative example triples, updating the model parameters in the global structure embedding layer and the local semantic optimization layer, so that the influence of both the topological structure and the relation structure on entity embedding is learned through the entity alignment task. The updated entity feature vectors yield the entity similarity matrix under the structural channel. The overall loss for training entity alignment in the structural channel is:
L_s = L_A + L_R
by modeling the relationship structure similarity, the entity structure similarity can be calculated to obtain a structure similarity matrix S relation
Seventh, layer fusion. The attribute channel and the structural channel yield the name attribute similarity matrix S_name and the structure relationship similarity matrix S_relation respectively. The two matrices are first standardized to eliminate the influence of differing feature scales; they carry equal weight in the fusion, and the final entity similarity is the mean of the two. The normalization and fusion are:
S̃_name = norm(S_name)
S̃_relation = norm(S_relation)
S_final = (S̃_name + S̃_relation) / 2
wherein norm(·) denotes the standardization of a similarity matrix.
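A sketch of the channel fusion (min-max scaling is used here as one plausible reading of the "standardization" step; the patent's exact normalization formula is not recoverable from the text):

```python
import numpy as np

def normalize(S):
    # Scale a similarity matrix to [0, 1] so the attribute channel and the
    # structural channel contribute on the same scale.
    lo, hi = S.min(), S.max()
    return (S - lo) / (hi - lo) if hi > lo else np.zeros_like(S)

def fuse(S_name, S_relation):
    # Equal weights: the final similarity is the mean of both channels.
    return 0.5 * (normalize(S_name) + normalize(S_relation))

# Rows: entities of the new graph; columns: entities of the original graph.
S_name = np.array([[0.9, 0.1], [0.2, 0.8]])
S_relation = np.array([[4.0, 1.0], [0.0, 3.0]])
S_final = fuse(S_name, S_relation)
print(S_final.round(3))
# The row-wise argmax picks the aligned candidate for each new entity.
print(S_final.argmax(axis=1))  # [0 1]
```

The row-wise maximum of S_final then proposes, for each new entity, its aligned counterpart in the original graph, subject to the manual inspection described above.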
the following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Based on the method provided by the invention, with both structural similarity and attribute name similarity taken into account, substituting the structural relation <C_new1, cause, bug> of the target entity gives a similarity calculation result of 0.75, greatly improving the recognition of similar entities. Substituting the relation structure <C_new2, cause, communication interruption> of C_new2 gives 0.51. The gap in similarity between the target entity and other unrelated cause entities is thus clearly widened.
Correspondingly, the invention also provides an entity updating system of the knowledge graph, which comprises the following steps:
the acquisition module is used for acquiring a new knowledge graph and an original knowledge graph;
the first calculation module is used for calculating a name attribute similarity matrix based on the name attributes of the new knowledge graph and the original knowledge graph;
the second calculation module is used for calculating an entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph;
and the updating module is used for fusing the name attribute similarity matrix and the entity relationship similarity matrix to obtain an entity corresponding to the new knowledge graph and updating the entity corresponding to the new knowledge graph into the original knowledge graph.
Further, the first calculating module is used for
extracting, by looking up a character embedding matrix, the word vector s_Ai and the character vector x_Ai of each sentence S_Ai in the new knowledge graph;
extracting, by looking up a character embedding matrix, the word vector s_Bi and the character vector x_Bi of each sentence S_Bi in the original knowledge graph;
calculating the word vector similarity between sentence S_Ai and sentence S_Bi from the word vectors s_Ai and s_Bi;
calculating the character vector similarity between sentence S_Ai and sentence S_Bi from the character vectors x_Ai and x_Bi;
summing and averaging the word vector similarity and the character vector similarity of sentence S_Ai with respect to sentence S_Bi to obtain their name attribute similarity S_namei;
And acquiring the name attribute similarity of each sentence in the new knowledge graph corresponding to each sentence in the original knowledge graph to obtain a name attribute similarity matrix.
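The per-sentence name attribute similarity can be sketched as follows; cosine similarity is assumed as the per-vector similarity measure, which the text does not pin down, and the toy vectors stand in for embedding-matrix lookups:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def name_attribute_similarity(s_a, x_a, s_b, x_b):
    """S_namei: mean of the character-vector similarity and the
    word-vector similarity of the sentence pair (S_Ai, S_Bi)."""
    return 0.5 * (cosine(s_a, s_b) + cosine(x_a, x_b))

# toy character vectors (s_*) and word vectors (x_*) for one sentence pair
s_a, x_a = np.array([1.0, 0.0]), np.array([1.0, 1.0])
s_b, x_b = np.array([1.0, 0.0]), np.array([1.0, 1.0])
sim = name_attribute_similarity(s_a, x_a, s_b, x_b)
```

Computing this value for every sentence pair (S_Ai, S_Bi) fills in the name attribute similarity matrix entry by entry.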
Further, the second calculation module is used for:
obtaining the entity relationship structure triple of each sentence in the new knowledge graph, inputting it into an entity relationship structure similarity model trained in advance on the relationship structure triples of the original knowledge graph, obtaining the entity structure similarity of each sentence, and obtaining the entity structure similarity matrix from the entity structure similarities of all the sentences.
Further, the second calculation module is used for:
Constructing an entity relationship structure similarity model to be trained, expressed as follows:

h_i^(l+1) = σ( Σ_{j∈N_i} α_ij^(l) W^(l) h_j^(l) )

α_ij^(l) = exp(c_ij^(l)) / Σ_{k∈N_i} exp(c_ik^(l))

c_ij^(l) = LeakyReLU( u^T [ W^(l) h_i^(l) || W^(l) h_j^(l) ] )

wherein h_i^(l) and h_i^(l+1) represent the entity vectors input to and output from the l-th domain attention layer, respectively; h_j^(l) represents an entity vector input to the l-th domain attention layer, covering the entity e_i and all of its neighbors; σ represents the sigmoid activation function; N_i represents the set of entities connected to the entity e_i, with e_j and e_k ranging over the neighbors of e_i; α_ij^(l) represents the entity domain attention coefficient after normalization at the l-th layer; c_ij^(l) represents the information fusion result of the entity e_i with neighbor j, and c_ik^(l) the information fusion result of the entity e_i with neighbor k; exp(·) represents the exponential function with the natural constant e as base; LeakyReLU(·) represents an activation function; u ∈ R^(2d(l+1)×1) and W^(l) ∈ R^(d(l+1)×d(l)) are learnable parameter matrices; d(l) represents the network embedding dimension of the l-th layer and d(l+1) that of the (l+1)-th layer; the superscript T represents matrix transposition; || denotes vector concatenation.
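A minimal numpy sketch of one domain attention layer of the kind the model describes (a standard graph-attention forward pass; the projection and attention shapes follow the stated dimensions of W^(l) and u, while the function names and toy sizes are illustrative):

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def domain_attention_layer(H, neighbors, W, u):
    """One domain-attention (GAT-style) layer.
    H: (n, d_l) entity vectors; neighbors[i]: indices in N_i (incl. e_i itself);
    W: (d_l1, d_l) learnable projection; u: (2*d_l1,) attention vector."""
    WH = H @ W.T                                   # project into layer l+1 space
    out = np.zeros((H.shape[0], W.shape[0]))
    for i, N_i in enumerate(neighbors):
        # c_ij = LeakyReLU(u^T [W h_i || W h_j]) for each neighbor j
        c = np.array([leaky_relu(u @ np.concatenate([WH[i], WH[j]]))
                      for j in N_i])
        alpha = np.exp(c - c.max())                # softmax over N_i
        alpha /= alpha.sum()
        # h_i^(l+1) = sigmoid(sum_j alpha_ij * W h_j)
        out[i] = sigmoid(sum(a * WH[j] for a, j in zip(alpha, N_i)))
    return out

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))                        # 3 toy entities, d(l) = 4
W = rng.normal(size=(4, 4))                        # d(l+1) = 4
u = rng.normal(size=(8,))                          # 2 * d(l+1)
H1 = domain_attention_layer(H, [[0, 1], [0, 1, 2], [1, 2]], W, u)
```

Stacking M such layers gives the attention network whose embedding layers are counted by M in the loss terms below; a real implementation would batch this rather than loop over entities.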
Constructing a pre-aligned entity seed set and positive/negative example triples.
Constructing the loss function L_A for entity alignment used to train the entity relationship structure similarity model, expressed as:

L_A = L_0 + L_a

L_a = Σ_{(e,e')∈S} Σ_{e_∈NS(e), e'_∈NS(e')} [ d(e, e') + γ − d(e_, e'_) ]_+

wherein L_a represents the entity alignment loss function of the entity relationship structure similarity model, and L_0 represents the orthogonalization loss function of the parameter matrices W; the negative sampling set e_ of an entity e and the negative sampling set e'_ of its aligned entity e' are constructed by the nearest neighbor sampling method NS(e); S denotes the pre-aligned entity seed set; d(·,·) = 1 − cos(·,·) represents the cosine distance between entities; [·]_+ = max{·, 0}; γ is a hyperparameter;

L_0 = Σ_{l=1}^{M} || W^(l)T W^(l) − I ||_2^2

wherein W^(l) represents the parameter matrix of the l-th layer; M is the number of attention network embedding layers; ||·||_2^2 represents taking the 2-norm of a matrix and squaring it; I is the identity matrix.
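A hedged numpy sketch of the two alignment-loss terms; the margin form of L_a and the Frobenius reading of the squared matrix 2-norm in L_0 are assumptions consistent with the surrounding definitions:

```python
import numpy as np

def cos_dist(a, b):
    """d(a, b) = 1 - cos(a, b), the inter-entity cosine distance."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def alignment_loss(seed_pairs, neg_pairs, gamma=1.0):
    """L_a: margin loss pulling pre-aligned seed pairs together and
    pushing nearest-neighbour negative samples apart; [x]_+ = max(x, 0)."""
    loss = 0.0
    for (e, e2), (e_neg, e2_neg) in zip(seed_pairs, neg_pairs):
        loss += max(cos_dist(e, e2) + gamma - cos_dist(e_neg, e2_neg), 0.0)
    return loss

def orthogonal_loss(Ws):
    """L_0: sum over layers of ||W^T W - I||^2, keeping each W near-orthogonal."""
    return sum(np.linalg.norm(W.T @ W - np.eye(W.shape[1])) ** 2 for W in Ws)

# one perfectly aligned seed pair, one well-separated negative pair
seeds = [(np.array([1.0, 0.0]), np.array([1.0, 0.0]))]
negs  = [(np.array([1.0, 0.0]), np.array([0.0, 1.0]))]
la = alignment_loss(seeds, negs, gamma=1.0)
lo = orthogonal_loss([np.eye(2)])
```

With an identity weight matrix and already-aligned seeds, both terms sit at their minimum of zero, which is the behaviour training drives toward.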
Constructing the loss function L_R of the relationship structure used to train the entity relationship structure similarity model, expressed as:

L_R = Σ_{(h,r,t)∈T_1} Σ_{(h',r,t')∈T_1'} [ γ' + f(h, r, t) − f(h', r, t') ]_+

wherein f(h, r, t) = ||h + r − t||_2 represents the scoring function of a relation triple (h, r, t), used to calculate the confidence of the relation triple, h and t being the head and tail entity vectors from the global structure embedding layer, and r the relation vector to be modeled and learned; γ' is a hyperparameter; T_1 represents the positive example triple set, and T_1'(h,r,t) = {(h', r, t) | h' ∈ E} ∪ {(h, r, t') | t' ∈ E} represents the negative example triple set, where h' represents a head entity of the negative example global structure embedding layer, t' represents a tail entity of the negative example global structure embedding layer, and E represents the set containing all negative example entities.
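The relationship-structure loss can be sketched as follows; the TransE-style margin form over corrupted triples is an assumption consistent with the scoring function f(h, r, t) = ||h + r − t||_2 given above:

```python
import numpy as np

def score(h, r, t):
    """f(h, r, t) = ||h + r - t||_2 — lower means a more credible triple."""
    return float(np.linalg.norm(h + r - t))

def relation_loss(pos_triples, neg_triples, gamma_r=1.0):
    """L_R: margin loss over positive triples (T_1) and their
    head- or tail-corrupted counterparts (T_1')."""
    loss = 0.0
    for (h, r, t), (hn, rn, tn) in zip(pos_triples, neg_triples):
        loss += max(gamma_r + score(h, r, t) - score(hn, rn, tn), 0.0)
    return loss

# toy positive triple satisfying h + r = t exactly, plus a corrupted tail
h = np.array([0.0, 0.0])
r = np.array([1.0, 0.0])
t = np.array([1.0, 0.0])
t_neg = np.array([-5.0, 0.0])
L_R = relation_loss([(h, r, t)], [(h, r, t_neg)], gamma_r=1.0)
```

Because the corrupted triple scores far worse than the positive one, the margin is already satisfied and the loss contribution is zero.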
Respectively training the entity alignment loss function L_A and the relationship structure loss function L_R with the pre-aligned entity seed set and the positive/negative example triples, determining the final model parameters of the entity relationship structure similarity model, and updating the entity relationship structure similarity model with the final model parameters to obtain the trained entity relationship structure similarity model.
Further, the updating module is used for:
standardizing the name attribute similarity matrix and the entity relationship similarity matrix respectively;
and averaging the standardized name attribute similarity matrix and the standardized entity relationship similarity matrix to obtain the fused final entity similarity matrix.
The present invention accordingly also provides a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods described.
The invention also provides a computing device, comprising,
one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be considered as the protection scope of the present invention.

Claims (12)

1. A method for entity update of a knowledge graph, comprising:
acquiring a new knowledge graph and an original knowledge graph;
based on the name attributes of the new knowledge graph and the original knowledge graph, calculating a name attribute similarity matrix;
calculating an entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph;
and fusing the name attribute similarity matrix and the entity relationship similarity matrix to obtain an entity corresponding to the new knowledge graph, and updating the entity corresponding to the new knowledge graph into the original knowledge graph.
2. The method for entity updating of a knowledge graph of claim 1, wherein the calculating of the name attribute similarity matrix based on the name attributes of the new knowledge graph and the original knowledge graph comprises:
extracting, by looking up a character embedding matrix, the character vector s_Ai and the word vector x_Ai of each sentence S_Ai in the new knowledge graph;
extracting, by looking up a character embedding matrix, the character vector s_Bi and the word vector x_Bi of each sentence S_Bi in the original knowledge graph;
calculating the character vector similarity of sentence S_Ai with the corresponding sentence S_Bi from the character vectors s_Ai and s_Bi;
calculating the word vector similarity of sentence S_Ai with the corresponding sentence S_Bi from the word vectors x_Ai and x_Bi;
summing and averaging the character vector similarity and the word vector similarity of sentence S_Ai with the corresponding sentence S_Bi to obtain the name attribute similarity S_namei of sentence S_Ai with respect to sentence S_Bi;
and obtaining the name attribute similarity of each sentence in the new knowledge graph with respect to each sentence in the original knowledge graph, yielding the name attribute similarity matrix.
3. The method for entity updating of a knowledge graph according to claim 1, wherein the calculating of the entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph comprises:
obtaining the entity relationship structure triple of each sentence in the new knowledge graph, inputting it into an entity relationship structure similarity model trained in advance on the relationship structure triples of the original knowledge graph, obtaining the entity structure similarity of each sentence, and obtaining the entity structure similarity matrix from the entity structure similarities of all the sentences.
4. The method for entity updating of a knowledge graph according to claim 3, wherein the training process of the entity relationship structure similarity model trained on the relationship structure triples of the original knowledge graph comprises:
constructing an entity relationship structure similarity model to be trained, expressed as follows:

h_i^(l+1) = σ( Σ_{j∈N_i} α_ij^(l) W^(l) h_j^(l) )

α_ij^(l) = exp(c_ij^(l)) / Σ_{k∈N_i} exp(c_ik^(l))

c_ij^(l) = LeakyReLU( u^T [ W^(l) h_i^(l) || W^(l) h_j^(l) ] )

wherein h_i^(l) and h_i^(l+1) represent the entity vectors input to and output from the l-th domain attention layer, respectively; h_j^(l) represents an entity vector input to the l-th domain attention layer, covering the entity e_i and all of its neighbors; σ represents the sigmoid activation function; N_i represents the set of entities connected to the entity e_i, with e_j and e_k ranging over the neighbors of e_i; α_ij^(l) represents the entity domain attention coefficient after normalization at the l-th layer; c_ij^(l) represents the information fusion result of the entity e_i with neighbor j, and c_ik^(l) the information fusion result of the entity e_i with neighbor k; exp(·) represents the exponential function with the natural constant e as base; LeakyReLU(·) represents an activation function; u ∈ R^(2d(l+1)×1) and W^(l) ∈ R^(d(l+1)×d(l)) are learnable parameter matrices; d(l) represents the network embedding dimension of the l-th layer and d(l+1) that of the (l+1)-th layer; the superscript T represents matrix transposition; || denotes vector concatenation;
constructing a pre-aligned entity seed set and positive/negative example triples;
constructing the loss function L_A for entity alignment used to train the entity relationship structure similarity model, expressed as:

L_A = L_0 + L_a

L_a = Σ_{(e,e')∈S} Σ_{e_∈NS(e), e'_∈NS(e')} [ d(e, e') + γ − d(e_, e'_) ]_+

wherein L_a represents the entity alignment loss function of the entity relationship structure similarity model, and L_0 represents the orthogonalization loss function of the parameter matrices W; the negative sampling set e_ of an entity e and the negative sampling set e'_ of its aligned entity e' are constructed by the nearest neighbor sampling method NS(e); S denotes the pre-aligned entity seed set; d(·,·) = 1 − cos(·,·) represents the cosine distance between entities; [·]_+ = max{·, 0}; γ is a hyperparameter;

L_0 = Σ_{l=1}^{M} || W^(l)T W^(l) − I ||_2^2

wherein W^(l) represents the parameter matrix of the l-th layer; M is the number of attention network embedding layers; ||·||_2^2 represents taking the 2-norm of a matrix and squaring it; I is the identity matrix;
constructing the loss function L_R of the relationship structure used to train the entity relationship structure similarity model, expressed as:

L_R = Σ_{(h,r,t)∈T_1} Σ_{(h',r,t')∈T_1'} [ γ' + f(h, r, t) − f(h', r, t') ]_+

wherein f(h, r, t) = ||h + r − t||_2 represents the scoring function of a relation triple (h, r, t), used to calculate the confidence of the relation triple, h and t being the head and tail entity vectors from the global structure embedding layer, and r the relation vector to be modeled and learned; γ' is a hyperparameter; T_1 represents the positive example triple set, and T_1'(h,r,t) = {(h', r, t) | h' ∈ E} ∪ {(h, r, t') | t' ∈ E} represents the negative example triple set, where h' represents a head entity of the negative example global structure embedding layer, t' represents a tail entity of the negative example global structure embedding layer, and E represents the set containing all negative example entities;
respectively training the entity alignment loss function L_A and the relationship structure loss function L_R with the pre-aligned entity seed set and the positive/negative example triples, determining the final model parameters of the entity relationship structure similarity model, and updating the entity relationship structure similarity model with the final model parameters to obtain the trained entity relationship structure similarity model.
5. The method for updating an entity of a knowledge graph of claim 4, wherein the fusing the name attribute similarity matrix and the entity relationship similarity matrix comprises:
standardizing the name attribute similarity matrix and the entity relationship similarity matrix respectively;
and averaging the standardized name attribute similarity matrix and the standardized entity relationship similarity matrix to obtain the fused final entity similarity matrix.
6. An entity update system for a knowledge graph, comprising:
the acquisition module is used for acquiring a new knowledge graph and an original knowledge graph;
the first calculation module is used for calculating a name attribute similarity matrix based on the name attributes of the new knowledge graph and the original knowledge graph;
the second calculation module is used for calculating an entity relationship similarity matrix based on the entity relationship structure triples of the new knowledge graph and the original knowledge graph;
and the updating module is used for fusing the name attribute similarity matrix and the entity relationship similarity matrix to obtain an entity corresponding to the new knowledge graph and updating the entity corresponding to the new knowledge graph into the original knowledge graph.
7. The entity update system of the knowledge graph of claim 6, wherein the first calculation module is configured to
extract, by looking up a character embedding matrix, the character vector s_Ai and the word vector x_Ai of each sentence S_Ai in the new knowledge graph;
extract, by looking up a character embedding matrix, the character vector s_Bi and the word vector x_Bi of each sentence S_Bi in the original knowledge graph;
calculate the character vector similarity of sentence S_Ai with the corresponding sentence S_Bi from the character vectors s_Ai and s_Bi;
calculate the word vector similarity of sentence S_Ai with the corresponding sentence S_Bi from the word vectors x_Ai and x_Bi;
sum and average the character vector similarity and the word vector similarity of sentence S_Ai with the corresponding sentence S_Bi to obtain the name attribute similarity S_namei of sentence S_Ai with respect to sentence S_Bi;
and obtain the name attribute similarity of each sentence in the new knowledge graph with respect to each sentence in the original knowledge graph, yielding the name attribute similarity matrix.
8. The entity update system of the knowledge graph of claim 6, wherein the second calculation module is configured to
obtain the entity relationship structure triple of each sentence in the new knowledge graph, input it into an entity relationship structure similarity model trained in advance on the relationship structure triples of the original knowledge graph, obtain the entity structure similarity of each sentence, and obtain the entity structure similarity matrix from the entity structure similarities of all the sentences.
9. The entity update system of the knowledge graph of claim 8, wherein the second calculation module is configured to
construct an entity relationship structure similarity model to be trained, expressed as follows:

h_i^(l+1) = σ( Σ_{j∈N_i} α_ij^(l) W^(l) h_j^(l) )

α_ij^(l) = exp(c_ij^(l)) / Σ_{k∈N_i} exp(c_ik^(l))

c_ij^(l) = LeakyReLU( u^T [ W^(l) h_i^(l) || W^(l) h_j^(l) ] )

wherein h_i^(l) and h_i^(l+1) represent the entity vectors input to and output from the l-th domain attention layer, respectively; h_j^(l) represents an entity vector input to the l-th domain attention layer, covering the entity e_i and all of its neighbors; σ represents the sigmoid activation function; N_i represents the set of entities connected to the entity e_i, with e_j and e_k ranging over the neighbors of e_i; α_ij^(l) represents the entity domain attention coefficient after normalization at the l-th layer; c_ij^(l) represents the information fusion result of the entity e_i with neighbor j, and c_ik^(l) the information fusion result of the entity e_i with neighbor k; exp(·) represents the exponential function with the natural constant e as base; LeakyReLU(·) represents an activation function; u ∈ R^(2d(l+1)×1) and W^(l) ∈ R^(d(l+1)×d(l)) are learnable parameter matrices; d(l) represents the network embedding dimension of the l-th layer and d(l+1) that of the (l+1)-th layer; the superscript T represents matrix transposition; || denotes vector concatenation;
construct a pre-aligned entity seed set and positive/negative example triples;
construct the loss function L_A for entity alignment used to train the entity relationship structure similarity model, expressed as:

L_A = L_0 + L_a

L_a = Σ_{(e,e')∈S} Σ_{e_∈NS(e), e'_∈NS(e')} [ d(e, e') + γ − d(e_, e'_) ]_+

wherein L_a represents the entity alignment loss function of the entity relationship structure similarity model, and L_0 represents the orthogonalization loss function of the parameter matrices W; the negative sampling set e_ of an entity e and the negative sampling set e'_ of its aligned entity e' are constructed by the nearest neighbor sampling method NS(e); S denotes the pre-aligned entity seed set; d(·,·) = 1 − cos(·,·) represents the cosine distance between entities; [·]_+ = max{·, 0}; γ is a hyperparameter;

L_0 = Σ_{l=1}^{M} || W^(l)T W^(l) − I ||_2^2

wherein W^(l) represents the parameter matrix of the l-th layer; M is the number of attention network embedding layers; ||·||_2^2 represents taking the 2-norm of a matrix and squaring it; I is the identity matrix;
construct the loss function L_R of the relationship structure used to train the entity relationship structure similarity model, expressed as:

L_R = Σ_{(h,r,t)∈T_1} Σ_{(h',r,t')∈T_1'} [ γ' + f(h, r, t) − f(h', r, t') ]_+

wherein f(h, r, t) = ||h + r − t||_2 represents the scoring function of a relation triple (h, r, t), used to calculate the confidence of the relation triple, h and t being the head and tail entity vectors from the global structure embedding layer, and r the relation vector to be modeled and learned; γ' is a hyperparameter; T_1 represents the positive example triple set, and T_1'(h,r,t) = {(h', r, t) | h' ∈ E} ∪ {(h, r, t') | t' ∈ E} represents the negative example triple set, where h' represents a head entity of the negative example global structure embedding layer, t' represents a tail entity of the negative example global structure embedding layer, and E represents the set containing all negative example entities;
respectively train the entity alignment loss function L_A and the relationship structure loss function L_R with the pre-aligned entity seed set and the positive/negative example triples, determine the final model parameters of the entity relationship structure similarity model, and update the entity relationship structure similarity model with the final model parameters to obtain the trained entity relationship structure similarity model.
10. The entity update system of the knowledge graph of claim 9, wherein the updating module is configured to
standardize the name attribute similarity matrix and the entity relationship similarity matrix respectively;
and average the standardized name attribute similarity matrix and the standardized entity relationship similarity matrix to obtain the fused final entity similarity matrix.
11. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-5.
12. A computing device, comprising,
one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods of claims 1-5.
CN202211047396.6A 2022-08-29 2022-08-29 Entity updating method and system of knowledge graph Pending CN115809340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211047396.6A CN115809340A (en) 2022-08-29 2022-08-29 Entity updating method and system of knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211047396.6A CN115809340A (en) 2022-08-29 2022-08-29 Entity updating method and system of knowledge graph

Publications (1)

Publication Number Publication Date
CN115809340A true CN115809340A (en) 2023-03-17

Family

ID=85482426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211047396.6A Pending CN115809340A (en) 2022-08-29 2022-08-29 Entity updating method and system of knowledge graph

Country Status (1)

Country Link
CN (1) CN115809340A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150405A (en) * 2023-04-19 2023-05-23 中电科大数据研究院有限公司 Heterogeneous data processing method for multiple scenes
CN116150405B (en) * 2023-04-19 2023-06-27 中电科大数据研究院有限公司 Heterogeneous data processing method for multiple scenes
CN116226541A (en) * 2023-05-11 2023-06-06 湖南工商大学 Knowledge graph-based network hotspot information recommendation method, system and equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination