CN115658927A - Time sequence knowledge graph-oriented unsupervised entity alignment method and device - Google Patents


Publication number
CN115658927A
Authority
CN
China
Prior art keywords
time
graph
entity
matrix
entity alignment
Prior art date
Legal status
Granted
Application number
CN202211461066.1A
Other languages
Chinese (zh)
Other versions
CN115658927B (en)
Inventor
陈璐
王嘉琪
高云君
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN202211461066.1A
Publication of CN115658927A
Application granted
Publication of CN115658927B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a temporal knowledge graph-oriented unsupervised entity alignment method and device. The method comprises the following steps: acquiring two temporal knowledge graphs, each comprising a plurality of quadruples containing time information; constructing two time feature matrices by graph-convolution forward propagation, according to the entities and corresponding time information in each temporal knowledge graph, generating two entity alignment matrices using a bidirectional strategy, and obtaining pre-aligned pseudo labels without supervision by matching across the two matrices; training a graph neural network model extended with time information, using the quadruples of the temporal knowledge graphs as the training data set and the pre-aligned pseudo labels as the training data labels, to obtain a relation feature matrix; and fusing the relation feature matrix and the two time feature matrices in a weighted manner to obtain the distance between the two temporal knowledge graphs, and obtaining the corresponding entity alignment matrix by minimizing the distance, thereby obtaining the entity alignment result.

Description

Time sequence knowledge graph-oriented unsupervised entity alignment method and device
Technical Field
The invention belongs to the technical field of knowledge graph entity alignment, and particularly relates to a time sequence knowledge graph-oriented unsupervised entity alignment method and device.
Background
In recent years, knowledge graphs, as a tool for representing structured information about real-world objects, have been applied more and more widely in semantic search, recommendation systems, and question answering systems. To fuse knowledge graphs from different sources and compensate for their incompleteness, entities from different knowledge graphs that refer to the same real-world object must first be aligned, a task known as "entity alignment".
A temporal knowledge graph extends the traditional knowledge graph by introducing time information, and has recently received increasing attention. Most existing embedding-based entity alignment methods ignore the additional time information in temporal knowledge graphs, which easily leads to the misalignment of entities that have similar neighborhood structures but correspond to different time information. Incorporating time information into the entity alignment process can significantly improve the performance of entity alignment on temporal knowledge graphs. Therefore, designing an efficient entity alignment method for temporal knowledge graphs has become an urgent need in academia and industry.
In the process of implementing the invention, the inventors found that the prior art has at least the following problems:
First, existing research only creates an embedding for each piece of time information to enhance the graph learning process, and does not fully exploit the time information in temporal knowledge graphs, which limits entity alignment accuracy. In addition, existing methods ignore the fact that time information representing real time periods is naturally aligned across temporal knowledge graphs; they rely heavily on pre-aligned entity pairs as training data, and preparing such pairs requires a large amount of manual labor, which makes entity alignment inefficient.
Disclosure of Invention
In view of the shortcomings of the prior art, the embodiments of the present application provide an unsupervised entity alignment method and apparatus for temporal knowledge graphs, which achieve accurate and efficient entity alignment without additional pre-aligned data.
According to a first aspect of the embodiments of the present application, there is provided a temporal knowledge graph-oriented unsupervised entity alignment method, including:
S11: acquiring two temporal knowledge graphs, wherein each temporal knowledge graph comprises a plurality of quadruples containing time information;
S12: constructing two time feature matrices by graph-convolution forward propagation, according to the entities in each temporal knowledge graph and the corresponding time information;
S13: generating two entity alignment matrices from the two time feature matrices using a bidirectional strategy, and obtaining pre-aligned pseudo labels without supervision by matching across the two entity alignment matrices;
S14: extending a graph neural network model with the time information, and training the extended graph neural network model with the quadruples of the temporal knowledge graphs as the training data set and the pre-aligned pseudo labels as the training data labels, to obtain a relation feature matrix;
S15: fusing the relation feature matrix and the two time feature matrices in a weighted manner to obtain a fused normalized entity alignment matrix;
S16: obtaining the distance between the two temporal knowledge graphs using the normalized entity alignment matrix, and obtaining the corresponding entity alignment matrix by minimizing the distance, thereby obtaining the entity alignment result.
Further, in step S11, each quadruple (subject entity, relation, object entity, time interval) indicates that the subject entity has the relation with the object entity during the time interval.
Further, in step S12, the following operations are performed on each temporal knowledge graph to construct the two time feature matrices:
S21: extracting a bipartite graph of entities and time as a preliminary time feature matrix;
S22: constructing a weighted relation adjacency matrix for the knowledge graph according to the proportions of the different relation types;
S23: aggregating information from neighbor entities through graph-convolution forward propagation, based on the time feature matrix and the relation adjacency matrix, to supplement the time features and obtain an aggregated time feature matrix.
Further, step S13 includes:
S31: preliminarily deriving two entity alignment matrices, one in the forward direction and one in the reverse direction, from the two time feature matrices;
S32: identifying, in each of the two entity alignment matrices, the entity with the highest similarity to each entity, thereby obtaining a plurality of entity pairs, and taking an entity pair as a pre-aligned pseudo label if it is matched in both directions of the bidirectional strategy.
Further, step S14 includes:
S41: initializing learnable embedding vectors;
S42: constructing a loss function using negative sampling;
S43: constructing, from the embedding vectors, a multi-layer graph neural network model that learns the structural features of the knowledge graph, by extending a graph neural network model with time information;
S44: taking the pre-aligned pseudo labels as the training data labels, and training the multi-layer graph neural network model until the loss function converges, to obtain a relation feature matrix.
Further, in step S15, the fused normalized entity alignment matrix is defined in terms of a fusion weight, the time feature matrices of the two temporal knowledge graphs, and the relation feature matrices of the two temporal knowledge graphs obtained after splitting by entity (the formula is given as an image in the original publication).
Further, step S16 includes:
obtaining a relation distance and a time distance from the relation features and the time features, using the normalized entity alignment matrix;
setting weights for the relation distance and the time distance using the Weisfeiler-Lehman (WL) graph kernel algorithm, and obtaining the distance between the two temporal knowledge graphs by weighted summation;
searching, within a preset range, for the fusion weight that minimizes the distance, thereby determining the corresponding entity alignment matrix and obtaining the entity alignment result.
According to a second aspect of the embodiments of the present application, there is provided a temporal knowledge graph-oriented unsupervised entity alignment apparatus, including:
an acquisition module, configured to acquire two temporal knowledge graphs, each comprising a plurality of quadruples containing time information;
a construction module, configured to construct two time feature matrices by graph-convolution forward propagation, according to the entities in each temporal knowledge graph and the corresponding time information;
a pre-alignment module, configured to generate two entity alignment matrices from the two time feature matrices using a bidirectional strategy, and to obtain pre-aligned pseudo labels without supervision by matching across the two entity alignment matrices;
a training module, configured to extend a graph neural network model with the time information, and to train the extended graph neural network model with the quadruples of the temporal knowledge graphs as the training data set and the pre-aligned pseudo labels as the training data labels, to obtain a relation feature matrix;
a fusion module, configured to fuse the relation feature matrix and the two time feature matrices in a weighted manner to obtain a fused normalized entity alignment matrix;
an alignment module, configured to obtain the distance between the two temporal knowledge graphs using the normalized entity alignment matrix, and to obtain the corresponding entity alignment matrix by minimizing the distance, thereby obtaining the entity alignment result.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
According to the above embodiments, the application establishes, for temporal knowledge graphs, an unsupervised entity alignment method that makes full use of time information. The method converts the entity alignment problem of temporal knowledge graphs into a graph matching problem, and encodes the time features and the relation features independently into separate embedding matrices. On the one hand, encoding the time features fully exploits the time information in the temporal knowledge graphs and improves the accuracy of entity alignment. On the other hand, the time feature matrices are used to generate pre-aligned pseudo labels for the two knowledge graphs without supervision, and the graph neural network model is then trained on these labels to encode the relation features, so that no known aligned entity pairs need to be labeled manually, which improves the efficiency of entity alignment.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application.
FIG. 1 is a flow diagram illustrating a method for temporal-knowledge-graph-oriented unsupervised entity alignment, according to an example embodiment.
FIG. 2 is a flowchart illustrating the sub-steps performed on each temporal knowledge graph in step S12 according to an exemplary embodiment.
Fig. 3 is a schematic diagram illustrating an unsupervised entity alignment process in accordance with an exemplary embodiment.
Fig. 4 is a flowchart illustrating step S13 according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating step S14 according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating step S16 according to an exemplary embodiment.
FIG. 7 is a block diagram illustrating a temporal-knowledge-graph-oriented unsupervised entity alignment apparatus in accordance with an exemplary embodiment.
FIG. 8 is a schematic diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
FIG. 1 is a flowchart of a temporal knowledge graph-oriented unsupervised entity alignment method according to an exemplary embodiment. As shown in FIG. 1, the method may include the following steps:
S11: acquiring two temporal knowledge graphs, wherein each temporal knowledge graph comprises a plurality of quadruples containing time information;
S12: constructing two time feature matrices by graph-convolution forward propagation, according to the entities in each temporal knowledge graph and the corresponding time information;
S13: generating two entity alignment matrices from the two time feature matrices using a bidirectional strategy, and obtaining pre-aligned pseudo labels without supervision by matching across the two entity alignment matrices;
S14: extending a graph neural network model with the time information, and training the extended graph neural network model with the quadruples of the temporal knowledge graphs as the training data set and the pre-aligned pseudo labels as the training data labels, to obtain a relation feature matrix;
S15: fusing the relation feature matrix and the two time feature matrices in a weighted manner to obtain a fused normalized entity alignment matrix;
S16: obtaining the distance between the two temporal knowledge graphs using the normalized entity alignment matrix, and obtaining the corresponding entity alignment matrix by minimizing the distance, thereby obtaining the entity alignment result.
According to the above embodiments, the application establishes, for temporal knowledge graphs, an unsupervised entity alignment method that makes full use of time information. The method converts the entity alignment problem of temporal knowledge graphs into a graph matching problem, and encodes the time features and the relation features independently into separate embedding matrices. On the one hand, encoding the time features fully exploits the time information in the temporal knowledge graphs and improves the accuracy of entity alignment. On the other hand, the time feature matrices are used to generate pre-aligned pseudo labels for the two knowledge graphs without supervision, and the graph neural network model is then trained on these labels to encode the relation features, so that no known aligned entity pairs need to be labeled manually, which improves the efficiency of entity alignment.
In the specific implementation of S11, two temporal knowledge graphs are acquired, where each temporal knowledge graph comprises a plurality of quadruples containing time information;
Specifically, the invention can be applied to many practical fields, such as information integration for international events and medical events. Without loss of generality, a temporal knowledge graph is represented by a set of entities, a set of relations, a set of time intervals, and a set of quadruples. Each quadruple (subject entity, relation, object entity, time interval) indicates that the subject entity has the relation with the object entity during the time interval. A time interval consists of a start time point and an end time point, which may or may not be equal. For example, in the international event domain, the entities may be international figures and countries, and a quadruple may be (person A, visit, country B, [2019.12.9, 2019.12.9]); in the medical event domain, the entities may be patients, departments, and hospitals, and a quadruple may be (Xiaoming, hospitalization, hospital C, [2015.10.1, 2015.10.6]). With this representation defined, a source temporal knowledge graph and a target temporal knowledge graph are taken as input. Both are expressed over a shared set of time intervals, namely the set of time intervals that overlap between the two knowledge graphs. The task of entity alignment using time information is to find a one-to-one entity mapping from the source graph to the target graph, that is, the pairs of source and target entities that refer to the same real-world object.
In the specific implementation of S12, two time feature matrices are constructed by graph-convolution forward propagation, according to the entities and the corresponding time information in each temporal knowledge graph;
Specifically, some entities are isolated or lie in connected components that contain no pre-aligned entity pair, which makes it difficult for alignment information to propagate into the embeddings of such entities. Because the source and target temporal knowledge graphs to be aligned share the same set of time intervals, the invention makes the following assumption: if the quadruples involving two entities, one from each graph, overlap in many time intervals, the two entities probably refer to the same real-world object. In this step, as shown in FIG. 2, the following sub-steps may be performed on each temporal knowledge graph to construct the two time feature matrices:
S21: extracting a bipartite graph of entities and time as a preliminary time feature matrix;
In an embodiment, as shown in (a) of FIG. 3, sparse features are mined from the entities and time intervals of the knowledge graph, and an entity-time bipartite graph is extracted. Each entry of the bipartite graph, indexed by an entity and a time interval, is the number of quadruples that contain both that entity and that time interval. The resulting bipartite graph serves as the time adjacency matrix of the entity set.
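As an illustrative sketch only, the counting described for S21 could be written in Python as follows. The function name `build_time_matrix`, the tuple layout (subject, relation, object, interval), and the sample data are assumptions made for illustration and do not come from the original publication.

```python
from collections import Counter


def build_time_matrix(quads, entities, intervals):
    """Count, for each (entity, interval) pair, the quadruples containing both."""
    entity_index = {e: i for i, e in enumerate(entities)}
    interval_index = {t: j for j, t in enumerate(intervals)}
    counts = Counter()
    for subj, _rel, obj, interval in quads:
        # A set is used so that a quadruple with subj == obj is counted once.
        for entity in {subj, obj}:
            counts[(entity_index[entity], interval_index[interval])] += 1
    matrix = [[0] * len(intervals) for _ in entities]
    for (i, j), count in counts.items():
        matrix[i][j] = count
    return matrix


# Hypothetical sample data: intervals are (start, end) pairs.
intervals = [("2019-12-09", "2019-12-09"), ("2020-01-01", "2020-01-05")]
entities = ["A", "B", "C"]
quads = [
    ("A", "visit", "B", intervals[0]),
    ("A", "meet", "C", intervals[0]),
    ("C", "visit", "B", intervals[1]),
]
time_matrix = build_time_matrix(quads, entities, intervals)
```

Each row of `time_matrix` corresponds to one entity and each column to one time interval, so the matrix is sparse whenever most entities appear in only a few intervals.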
S22: constructing a weighted relation adjacency matrix for the knowledge graph according to the proportions of the different relation types;
Specifically, to account for the influence of different relation types in the knowledge graph, a relation adjacency matrix is constructed according to the proportions of the different relation types. The entry for each pair of entities is computed from the set of entities adjacent to the first entity, the set of relations between the two entities, the total number of quadruples, and the number of quadruples containing each relation (the formula is given as an image in the original publication).
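The exact weighting formula of S22 is not recoverable from the text, so the following Python sketch uses one plausible choice as an assumption: each edge is weighted by the summed inverse frequency log(total quadruples / quadruples containing the relation) of the relations on it, and each row is normalized over the entity's neighbors. The function name, the undirected treatment of edges, and the sample data are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_relation_adjacency(quads, entities):
    """Row-normalized adjacency with assumed inverse-frequency relation weights."""
    index = {e: i for i, e in enumerate(entities)}
    relation_count = Counter(rel for _s, rel, _o, _t in quads)
    total = len(quads)
    # Collect the relations linking each entity pair (treated as undirected).
    pair_relations = defaultdict(set)
    for subj, rel, obj, _t in quads:
        pair_relations[(index[subj], index[obj])].add(rel)
        pair_relations[(index[obj], index[subj])].add(rel)
    size = len(entities)
    adjacency = [[0.0] * size for _ in range(size)]
    for (i, j), rels in pair_relations.items():
        # Rare relation types receive larger weights than frequent ones.
        adjacency[i][j] = sum(math.log(total / relation_count[r]) for r in rels)
    for row in adjacency:
        row_sum = sum(row)
        if row_sum > 0:
            row[:] = [value / row_sum for value in row]
    return adjacency


# Hypothetical sample data; the time field is unused here.
quads = [
    ("A", "visit", "B", 0),
    ("A", "meet", "C", 0),
    ("C", "visit", "B", 1),
]
adjacency = build_relation_adjacency(quads, ["A", "B", "C"])
```

In this toy graph the rare relation "meet" gives the A-C edge a larger weight than the A-B edge, which carries the more frequent relation "visit".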
S23: aggregating information from neighbor entities through graph-convolution forward propagation, based on the time feature matrix and the relation adjacency matrix, to supplement the time features and obtain an aggregated time feature matrix;
Specifically, to exploit neighborhood information together with the time features, information from neighboring entities is aggregated to supplement the time features. In one embodiment, as shown in (a) of FIG. 3, the aggregated time feature matrix is obtained by the forward propagation of an L-layer graph convolution (the formula is given as an image in the original publication), where L is a hyperparameter denoting the number of graph convolution layers, typically set between 1 and 3, and the multi-hop relation adjacency matrices used in the propagation are obtained directly from the quadruples of the knowledge graph.
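A minimal Python sketch of a parameter-free L-layer propagation is given below. It assumes that the features propagated over 1 to L hops are concatenated per entity; the original formula may combine the hops differently. The helper names and the toy matrices are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def propagate_time_features(adjacency, features, num_layers=2):
    """Concatenate features propagated over 1..num_layers hops (assumed scheme)."""
    hops = []
    current = features
    for _ in range(num_layers):
        current = matmul(adjacency, current)  # one additional hop
        hops.append(current)
    # For each entity, join its per-hop feature rows into one long row.
    return [sum((hop[i] for hop in hops), []) for i in range(len(features))]


# Toy example: two entities that are each other's only neighbor.
adjacency = [[0, 1], [1, 0]]
features = [[2, 0], [0, 3]]
aggregated = propagate_time_features(adjacency, features, num_layers=2)
```

Because no weights are learned in this propagation, it can run before any training and therefore before any labels exist, which is what allows the time features to seed the pseudo labels.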
In the specific implementation of S13, two entity alignment matrices are generated from the two time feature matrices using a bidirectional strategy, and pre-aligned pseudo labels are obtained without supervision by matching across the two entity alignment matrices;
Specifically, as shown in FIG. 4, this step may include the following sub-steps:
S31: preliminarily deriving two entity alignment matrices, one in the forward direction and one in the reverse direction, from the two time feature matrices;
Specifically, the aggregated time feature matrices of the two temporal knowledge graphs are taken as input, and an entity alignment matrix is preliminarily derived in each of the forward and reverse directions.
S32: identifying, in each of the two entity alignment matrices, the entity with the highest similarity to each entity, thereby obtaining a plurality of entity pairs, and taking an entity pair as a pre-aligned pseudo label if it is matched in both directions of the bidirectional strategy;
Specifically, in the forward and reverse entity alignment matrices, the entity in the other knowledge graph with the highest similarity to each entity is identified. If an entity pair obtained from one matrix is also obtained from the other, that is, the two entities select each other, the entity pair is taken as a pre-aligned pseudo label.
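The bidirectional matching of S31 and S32 can be sketched in Python as a mutual nearest-neighbor test. The sketch assumes that each alignment matrix is a plain dot-product similarity between time feature rows; the original publication may derive the matrices differently. All names and data are hypothetical.

```python
def similarity(source, target):
    """Dot-product similarity between every source row and every target row."""
    return [[sum(a * b for a, b in zip(u, v)) for v in target] for u in source]


def mutual_pseudo_labels(source_features, target_features):
    """Keep only pairs (i, j) where i and j select each other as best match."""
    forward = similarity(source_features, target_features)
    backward = similarity(target_features, source_features)
    best_forward = [max(range(len(row)), key=row.__getitem__) for row in forward]
    best_backward = [max(range(len(row)), key=row.__getitem__) for row in backward]
    return [(i, j) for i, j in enumerate(best_forward) if best_backward[j] == i]


# Toy time features: three source entities, two target entities.
source_features = [[2, 0], [0, 3], [1, 1]]
target_features = [[0, 3], [2, 0]]
pseudo_labels = mutual_pseudo_labels(source_features, target_features)
```

The third source entity prefers target 0, but target 0 prefers source 1, so that pair is discarded. Requiring agreement in both directions trades recall for precision, which suits labels that will later be used for training.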
In the specific implementation of S14, a graph neural network model is extended with the time information, and the extended graph neural network model is trained with the quadruples of the temporal knowledge graphs as the training data set and the pre-aligned pseudo labels as the training data labels, to obtain a relation feature matrix;
Specifically, as shown in FIG. 5, this step may include the following sub-steps:
S41: initializing learnable embedding vectors;
Specifically, to speed up convergence, the Glorot initialization method is used to initialize, for each entity, a learnable embedding vector of a fixed dimension.
S42: constructing a loss function using negative sampling;
Specifically, the loss function is constructed using negative sampling (the formula is given as an image in the original publication). It applies LogSumExp (LSE) smoothing, controlled by a smoothing factor, to a normalized triplet loss.
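For orientation only, the following Python sketch shows how LogSumExp smoothing can be applied to triplet terms computed from sampled negatives. It is not the loss of the original publication: the normalization step is omitted, and the margin, the default smoothing factor, and the function name are assumptions.

```python
import math


def lse_triplet_loss(pos_dist, neg_dists, margin=1.0, smooth=10.0):
    """Smooth maximum (LogSumExp) of the terms margin + d_pos - d_neg.

    pos_dist: distance between the two entities of a pseudo-labeled pair.
    neg_dists: distances to sampled negative entities.
    """
    terms = [smooth * (margin + pos_dist - d) for d in neg_dists]
    # Subtract the largest term before exponentiating for numerical stability.
    largest = max(terms)
    lse = largest + math.log(sum(math.exp(t - largest) for t in terms))
    return lse / smooth


single = lse_triplet_loss(0.5, [2.0])
hard_and_easy = lse_triplet_loss(0.5, [2.0, 0.2])
```

With a large smoothing factor the result approaches the triplet term of the hardest negative, so the gradient concentrates on the negatives that are most easily confused with the positive.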
S43: constructing, from the embedding vectors, a multi-layer graph neural network model that learns the structural features of the knowledge graph, by extending a graph neural network model with time information;
Specifically, the original entity and relation embedding model is extended with additional time embeddings to form a unified three-part embedding method, and an L-layer graph neural network model is constructed to learn the structural features of the knowledge graph, jointly learning entity, relation, and time information. Each embedding vector is computed from the vectors of the corresponding entities, relations, and time intervals, aggregated over the set of relations and the set of time intervals around the entity (the formula is given as an image in the original publication).
S44: taking the pre-aligned pseudo labels as the training data labels, and training the multi-layer graph neural network model until the loss function converges, to obtain a relation feature matrix.
Specifically, in the unsupervised approach of the present application, no entity pairs are known to be aligned in advance. Therefore, in one embodiment, as shown in (b) of FIG. 3, the quadruples of the temporal knowledge graphs are used as the training data set, the pre-aligned pseudo labels generated in step S13 are used as the training data labels, and the graph neural network model is trained until the loss function converges. The relation feature matrix is the output of the last round of training.
In the specific implementation of S15, the relation feature matrix and the two time feature matrices are fused in a weighted manner to obtain a fused normalized entity alignment matrix;
Specifically, because the time feature matrices and the relation feature matrix come from different encoders and influence the entity alignment result differently, a fusion weight is introduced to balance the two kinds of features, and the fused normalized entity alignment matrix is defined in terms of this weight, the two time feature matrices, and the two relation feature matrices (the formula is given as an image in the original publication). The two relation feature matrices are obtained by splitting the relation feature matrix from step S14 according to the entities of the two knowledge graphs.
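One way to picture the weighted fusion of S15 is the Python sketch below. It assumes a convex combination of a relation-based and a time-based dot-product similarity, followed by a row-wise softmax as the normalization; which term carries the weight and how the matrix is normalized in the original publication are not recoverable, so both choices are assumptions, as are all names and data.

```python
import math


def dot_similarity(source, target):
    """Dot-product similarity between every source row and every target row."""
    return [[sum(a * b for a, b in zip(u, v)) for v in target] for u in source]


def row_softmax(matrix):
    """Normalize each row into a probability distribution."""
    result = []
    for row in matrix:
        largest = max(row)
        exps = [math.exp(value - largest) for value in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def fuse_alignment(rel_src, rel_tgt, time_src, time_tgt, weight):
    """Blend relation and time similarities with an assumed convex weight."""
    rel_sim = dot_similarity(rel_src, rel_tgt)
    time_sim = dot_similarity(time_src, time_tgt)
    fused = [
        [weight * r + (1.0 - weight) * t for r, t in zip(rel_row, time_row)]
        for rel_row, time_row in zip(rel_sim, time_sim)
    ]
    return row_softmax(fused)


# Toy features for two source and two target entities.
alignment = fuse_alignment(
    rel_src=[[1, 0], [0, 1]], rel_tgt=[[0, 1], [1, 0]],
    time_src=[[2, 0], [0, 3]], time_tgt=[[0, 3], [2, 0]],
    weight=0.5,
)
```

Setting `weight` to 1.0 or 0.0 reduces the fused matrix to the relation-only or time-only alignment, which makes the contribution of each encoder easy to inspect.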
In the specific implementation of S16, the distance between the two temporal knowledge graphs is obtained using the normalized entity alignment matrix, and the corresponding entity alignment matrix is obtained by minimizing the distance, thereby obtaining the entity alignment result.
Specifically, as shown in FIG. 6, this step may include the following sub-steps:
Step S61: obtaining a relation distance and a time distance from the relation features and the time features, using the normalized entity alignment matrix;
Specifically, the relation distance and the time distance measure the distance between the two knowledge graphs by relation features and by time features, respectively (the formulas are given as images in the original publication). They are computed from the relation adjacency matrices of the source and target temporal knowledge graphs and the time adjacency matrices of the source and target temporal knowledge graphs, all of which are obtained in steps S21-S23.
Step S62: setting weights for the relation distance and the time distance respectively by using a WL graph kernel algorithm, and obtaining the distance between the two time sequence knowledge graphs through weighted summation;
Specifically, since the two knowledge graphs may be non-isomorphic, and the relational adjacency matrices
Figure 507787DEST_PATH_IMAGE069
Figure 353252DEST_PATH_IMAGE070
and the time adjacency matrices
Figure 315392DEST_PATH_IMAGE071
Figure 537295DEST_PATH_IMAGE072
are constructed separately, the two distances are assigned different weights, and the final distance between the knowledge graphs can be expressed as:
Figure 370122DEST_PATH_IMAGE073
where
Figure 527434DEST_PATH_IMAGE074
and
Figure 835924DEST_PATH_IMAGE075
are the weights calculated by the WL graph kernel algorithm according to the isomorphism of the adjacency matrices.
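To make the role of the WL (Weisfeiler-Lehman) graph kernel concrete, the sketch below computes a normalized WL subtree-kernel similarity between two graphs and combines two distances by weighted summation. The text does not spell out how the kernel value is turned into a weight, so using the similarity of the two relational (or time) adjacency matrices directly as the weight is an assumption, as are square, unweighted adjacency matrices, degree-based initial labels, and the hypothetical names `wl_histogram`, `wl_similarity`, and `combined_distance`.

```python
import math
from collections import Counter

import numpy as np

def wl_histogram(adj, iterations=2):
    """Count node labels produced by rounds of 1-WL label refinement."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    neighbours = [np.flatnonzero(adj[i]).tolist() for i in range(n)]
    labels = [len(nb) for nb in neighbours]  # initial label: node degree
    hist = Counter(labels)
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels.
        labels = [
            (labels[i], tuple(sorted(labels[j] for j in neighbours[i])))
            for i in range(n)
        ]
        hist.update(labels)
    return hist

def wl_similarity(adj_a, adj_b, iterations=2):
    """Normalized WL subtree kernel in [0, 1]; 1 for isomorphic graphs."""
    ha = wl_histogram(adj_a, iterations)
    hb = wl_histogram(adj_b, iterations)
    dot = sum(ha[k] * hb[k] for k in ha.keys() & hb.keys())
    norm = math.sqrt(sum(v * v for v in ha.values())
                     * sum(v * v for v in hb.values()))
    return dot / norm if norm else 0.0

def combined_distance(d_rel, d_time, w_rel, w_time):
    """Weighted summation of the relational and time distances."""
    return w_rel * d_rel + w_time * d_time
```

Under this assumption, a pair of adjacency matrices that look structurally alike receives a larger weight, so the distance measured on that view contributes more to the final distance.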
Step S63: searching, within a preset range, for the fusion weight that minimizes the distance, thereby determining the corresponding entity alignment matrix and obtaining the entity alignment result.
Specifically, within the range
Figure 471305DEST_PATH_IMAGE076
a search is performed for
Figure 424217DEST_PATH_IMAGE077
such that the defined distance
Figure 136784DEST_PATH_IMAGE078
is minimal. In one embodiment, as shown in FIG. 3 (c), the optimal value of the fusion weight
Figure 276778DEST_PATH_IMAGE007
and the entity alignment matrix
Figure 247008DEST_PATH_IMAGE079
are obtained simultaneously. Finally, the entity alignment result is determined by finding the maximum value corresponding to each entity in the entity alignment matrix
Figure 647903DEST_PATH_IMAGE079
.
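The search in step S63 can be sketched as a simple grid search over candidate fusion weights. The candidate list, the linear fusion form, and the toy `structure_distance` (a Frobenius norm between the source adjacency matrix and the target adjacency matrix mapped through a hard assignment) are assumptions for illustration; they stand in for the preset range and the distance defined in the application, and all function names are hypothetical.

```python
import numpy as np

def structure_distance(adj_src, adj_tgt, alignment):
    """Toy stand-in distance: compare graph structure under a hard matching."""
    n_src = alignment.shape[0]
    hard = np.zeros_like(alignment, dtype=float)
    hard[np.arange(n_src), alignment.argmax(axis=1)] = 1.0
    return float(np.linalg.norm(adj_src - hard @ adj_tgt @ hard.T))

def search_fusion_weight(sim_rel, sim_time, distance_fn, candidates):
    """Return the candidate weight with the smallest distance, the fused
    alignment matrix for that weight, and the best match of each entity."""
    best = None
    for omega in candidates:
        fused = omega * sim_rel + (1.0 - omega) * sim_time
        d = distance_fn(fused)
        if best is None or d < best[0]:
            best = (d, omega, fused)
    _, omega, fused = best
    # Alignment result: for each source entity, the highest-scoring target.
    return omega, fused, fused.argmax(axis=1)
```

A call might look like `search_fusion_weight(sim_rel, sim_time, lambda s: structure_distance(adj_src, adj_tgt, s), [0.0, 0.25, 0.5, 0.75, 1.0])`, where the five candidates are an arbitrary illustrative grid.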
In the field of information integration for medical events, the present method can be used to align knowledge graphs from different sources. For example, the knowledge graph from the registration department and the knowledge graph from the cardiothoracic surgery department contain multiple identical quadruples about the patient Xiaoming, such as (Xiaoming, visit, cardiothoracic surgery, [2015.10.1, 2015.10.1]) and (Xiaoming, hospitalization, Hospital C, [2015.10.1, 2015.10.6]), so the entities of the two knowledge graphs can ultimately be aligned by the method. Moreover, the method prevents entities that share a name and have the same treatment records at different times from being wrongly aligned.
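The medical example can be made concrete with a small sketch of the quadruple data structure and a purely time-based comparison. This only illustrates why time information separates same-name entities; it is not the feature construction of steps S21-S23. The class `Quad`, the helpers `time_profile` and `jaccard`, the day-level granularity, and the 2018 records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Quad:
    """One quadruple: (subject, relation, object, [start, end])."""
    subject: str
    relation: str
    obj: str
    start: date
    end: date

def time_profile(quads):
    """Map each entity to the set of days on which it takes part in a fact."""
    profile = {}
    for q in quads:
        span = (q.end - q.start).days + 1
        days = {q.start + timedelta(days=k) for k in range(span)}
        profile.setdefault(q.subject, set()).update(days)
        profile.setdefault(q.obj, set()).update(days)
    return profile

def jaccard(a, b):
    """Overlap of two day sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def records(year):
    """The two example facts about Xiaoming, placed in a given year."""
    return [
        Quad("Xiaoming", "visit", "cardiothoracic surgery",
             date(year, 10, 1), date(year, 10, 1)),
        Quad("Xiaoming", "hospitalization", "Hospital C",
             date(year, 10, 1), date(year, 10, 6)),
    ]

registration = time_profile(records(2015))
surgery = time_profile(records(2015))
namesake = time_profile(records(2018))  # same records, different time
```

Here `jaccard(registration["Xiaoming"], surgery["Xiaoming"])` is 1.0, while the comparison with `namesake["Xiaoming"]` is 0.0, so the two same-name patients differ in their time profiles even though their relations and objects are identical.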
Corresponding to the foregoing embodiments of the time-series-knowledge-graph-oriented unsupervised entity alignment method, the present application also provides embodiments of a time-series-knowledge-graph-oriented unsupervised entity alignment apparatus.
FIG. 7 is a block diagram illustrating a time-series-knowledge-graph-oriented unsupervised entity alignment apparatus according to an example embodiment. Referring to fig. 7, the apparatus may include:
the acquisition module 21 is configured to acquire two time sequence knowledge maps, where each time sequence knowledge map includes a plurality of quadruples including time information;
a constructing module 22, configured to construct two time feature matrices in a graph-convolution forward transmission manner according to the entity and the corresponding time information in each time sequence knowledge graph;
the pre-alignment module 23 is configured to generate two entity alignment matrices by using a bidirectional policy according to the two time feature matrices, and obtain pre-aligned pseudo labels in a matching manner without supervision through the two entity alignment matrices;
the training module 24 is configured to extend the graph neural network model with the time information, and to train the extended graph neural network model using the quadruples of the time-series knowledge graphs as the training data set and the pre-aligned pseudo labels as the training data labels, so as to obtain a relational feature matrix;
a fusion module 25, configured to fuse the relationship feature matrix and the two time feature matrices in a weighting manner to obtain a fused normalized entity alignment matrix;
an alignment module 26, configured to obtain a distance between the two timing knowledge graphs by using the normalized entity alignment matrix, and obtain a fusion weight by minimizing the distance, so as to obtain an entity alignment result.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Correspondingly, the present application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the time-series-knowledge-graph-oriented unsupervised entity alignment method described above. Fig. 8 is a hardware structure diagram of a device with data processing capability in which the time-series-knowledge-graph-oriented unsupervised entity alignment method according to an embodiment of the present invention is implemented. In addition to the processor, the memory, and the network interface shown in fig. 8, the device with data processing capability in which the apparatus of the embodiment is located may also include other hardware according to its actual function, which is not described again here.
Accordingly, the present application further provides a computer-readable storage medium on which computer instructions are stored, wherein the instructions, when executed by a processor, implement the time-series-knowledge-graph-oriented unsupervised entity alignment method described above. The computer-readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the foregoing embodiments. The computer-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash memory card (Flash Card) provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer-readable storage medium is used for storing the computer program and other programs and data required by the device with data processing capability, and may also be used for temporarily storing data that has been output or is to be output.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of what is disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the application and include departures from the present disclosure that come within known or customary practice in the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof.

Claims (10)

1. A time sequence knowledge graph-oriented unsupervised entity alignment method is characterized by comprising the following steps:
S11: acquiring two time sequence knowledge graphs, wherein each time sequence knowledge graph comprises a plurality of quadruples containing time information;
S12: constructing two time feature matrices in a graph-convolution-style forward propagation manner according to the entities in each time sequence knowledge graph and the corresponding time information;
S13: generating two entity alignment matrices by adopting a bidirectional strategy according to the two time feature matrices, and obtaining pre-aligned pseudo labels from the two entity alignment matrices by matching, without supervision;
S14: extending the graph neural network model with the time information, and training the extended graph neural network model using the quadruples of the time sequence knowledge graphs as the training data set and the pre-aligned pseudo labels as the training data labels, to obtain a relational feature matrix;
S15: fusing the relational feature matrix and the two time feature matrices in a weighted manner to obtain a fused normalized entity alignment matrix;
S16: obtaining the distance between the two time sequence knowledge graphs by using the normalized entity alignment matrix, and obtaining a corresponding entity alignment matrix by minimizing the distance, thereby obtaining an entity alignment result.
2. The method of claim 1, wherein in step S11, the quadruple
Figure 590285DEST_PATH_IMAGE001
indicates that the subject entity
Figure 76630DEST_PATH_IMAGE002
has, within the time interval
Figure 053813DEST_PATH_IMAGE003
and with respect to the object entity
Figure 574793DEST_PATH_IMAGE004
the relation
Figure 510388DEST_PATH_IMAGE005
.
3. The method of claim 1, wherein in step S12, the following operations are performed on each time-series knowledge-graph, so as to construct two time feature matrices:
S21: extracting a bipartite graph of entities and time as a preliminary time feature matrix;
S22: constructing a weighted relational adjacency matrix for the knowledge graph according to the proportions of the different relation types;
S23: aggregating information from neighbor entities through graph-convolution forward propagation, based on the time feature matrix and the relational adjacency matrix, to supplement the time features and obtain an aggregated time feature matrix.
4. The method according to claim 1, wherein step S13 comprises:
S31: preliminarily deriving two entity alignment matrices from the two time feature matrices, one in the forward direction and one in the reverse direction;
S32: identifying, in each of the two entity alignment matrices, the corresponding entities with the highest entity similarity so as to obtain a plurality of entity pairs, and taking an entity pair as a pre-aligned pseudo label if it is matched in both directions of the bidirectional strategy.
5. The method according to claim 1, wherein step S14 comprises:
S41: initializing learnable embedding vectors;
S42: constructing a loss function by using a negative sampling method;
S43: using the embedding vectors, building a multilayer graph neural network model, extended with time information, for learning the structural features of the knowledge graph;
S44: taking the pre-aligned pseudo labels as training data labels, and training the multilayer graph neural network model until the loss function has fully converged, to obtain a relational feature matrix.
6. The method of claim 1, wherein in step S15, the fused normalized entity alignment matrix
Figure 218450DEST_PATH_IMAGE006
Wherein
Figure 395397DEST_PATH_IMAGE007
is the fusion weight,
Figure 505304DEST_PATH_IMAGE008
and
Figure 939696DEST_PATH_IMAGE009
respectively are time characteristic matrixes of two time sequence knowledge graphs,
Figure 541579DEST_PATH_IMAGE010
and
Figure 719620DEST_PATH_IMAGE011
respectively are the relationship characteristic matrixes of the two time sequence knowledge graphs after the entity is split.
7. The method according to claim 1, wherein step S16 comprises:
obtaining a relation distance and a time distance through the relation characteristic and the time characteristic by using the normalized entity alignment matrix;
setting weights for the relation distance and the time distance respectively by using a WL graph kernel algorithm, and obtaining the distance between the two time sequence knowledge graphs through weighted summation;
and searching the fusion weight which enables the distance to be minimum in a preset range, thereby determining a corresponding entity alignment matrix and obtaining an entity alignment result.
8. An unsupervised entity alignment apparatus oriented to a time series knowledge graph, comprising:
the acquisition module is used for acquiring two time sequence knowledge graphs, and each time sequence knowledge graph comprises a plurality of quadruples containing time information;
the construction module is used for constructing two time characteristic matrixes in a graph convolution type forward transmission mode according to the entity in each time sequence knowledge graph and the corresponding time information;
the pre-alignment module is used for generating two entity alignment matrixes by adopting a bidirectional strategy according to the two time characteristic matrixes and unsupervised obtaining pre-aligned pseudo labels in a matching mode through the two entity alignment matrixes;
the training module is used for extending the graph neural network model with the time information, and training the extended graph neural network model using the quadruples of the time sequence knowledge graphs as the training data set and the pre-aligned pseudo labels as the training data labels, to obtain a relational feature matrix;
the fusion module is used for fusing the relation characteristic matrix and the two time characteristic matrices in a weighting mode to obtain a fused normalized entity alignment matrix;
and the alignment module is used for obtaining the distance between the two time sequence knowledge graphs by utilizing the normalized entity alignment matrix, and obtaining a corresponding entity alignment matrix by minimizing the distance so as to obtain an entity alignment result.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, perform the steps of the method according to any one of claims 1-7.
CN202211461066.1A 2022-11-17 2022-11-17 Unsupervised entity alignment method and unsupervised entity alignment device for time sequence knowledge graph Active CN115658927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211461066.1A CN115658927B (en) 2022-11-17 2022-11-17 Unsupervised entity alignment method and unsupervised entity alignment device for time sequence knowledge graph

Publications (2)

Publication Number Publication Date
CN115658927A true CN115658927A (en) 2023-01-31
CN115658927B CN115658927B (en) 2023-04-11

Family

ID=85019096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211461066.1A Active CN115658927B (en) 2022-11-17 2022-11-17 Unsupervised entity alignment method and unsupervised entity alignment device for time sequence knowledge graph

Country Status (1)

Country Link
CN (1) CN115658927B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609903A (en) * 2019-08-01 2019-12-24 华为技术有限公司 Information presentation method and device
CN112364174A (en) * 2020-10-21 2021-02-12 山东大学 Patient medical record similarity evaluation method and system based on knowledge graph
WO2021151325A1 (en) * 2020-09-09 2021-08-05 平安科技(深圳)有限公司 Method and apparatus for triage model training based on medical knowledge graphs, and device
CN114461812A (en) * 2022-01-12 2022-05-10 浙江大学 Large-scale knowledge graph-oriented multi-channel entity alignment method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱继召; 乔建忠; 林树宽: "Entity alignment algorithm for knowledge graphs based on representation learning" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117174319A (en) * 2023-11-03 2023-12-05 神州医疗科技股份有限公司 Sepsis time sequence prediction method and system based on knowledge graph
CN117174319B (en) * 2023-11-03 2024-03-01 神州医疗科技股份有限公司 Sepsis time sequence prediction method and system based on knowledge graph

Also Published As

Publication number Publication date
CN115658927B (en) 2023-04-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant