CN113688249A - Knowledge graph embedding method and system based on relation cognition - Google Patents


Info

  • Publication number: CN113688249A
  • Application number: CN202010420480.2A (priority date claimed from this application)
  • Authority: CN (China)
  • Prior art keywords: relationship, knowledge, graph, matrix, tra
  • Legal status: Pending
  • Other languages: Chinese (zh)
  • Inventor: 姚权铭
  • Current assignee: 4Paradigm Beijing Technology Co Ltd
  • Original assignee: 4Paradigm Beijing Technology Co Ltd
  • Application filed by: 4Paradigm Beijing Technology Co Ltd


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

Provided are a knowledge graph embedding method and system based on relationship cognition, wherein the method comprises the following steps: dividing a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group; determining a scoring function corresponding to each relationship group based on the division result, thereby obtaining a set of scoring functions for the plurality of relationship groups; training an embedding model of the knowledge graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge graph using the embedding model.

Description

Knowledge graph embedding method and system based on relation cognition
Technical Field
The present application relates to knowledge graph embedding technology in the field of artificial intelligence, and more particularly, to a knowledge graph embedding method and system based on relationship cognition.
Background
With the rapid development of information network technology, network data content is growing explosively. Such content is typically large-scale, heterogeneous and loosely organized, which makes it challenging for people to acquire information and knowledge effectively. A Knowledge Graph (KG) is a semantic-network knowledge base that can describe knowledge resources and their carriers using visualization techniques, and can mine, analyze, construct, draw and display knowledge and the interrelations among knowledge resources and their carriers.
A knowledge graph is a special graph structure that takes entities as nodes and relationships as directed edges, and it has recently attracted considerable interest. In a knowledge graph, each edge is represented as a triple (h, r, t) of the form (head entity, relationship, tail entity), indicating that the two entities h (i.e., the head entity) and t (i.e., the tail entity) are connected by a relationship r; for example, (New York, isLocatedIn, USA) may indicate that New York is located in the USA. Many large knowledge graphs have been built over the last decades, such as WordNet, Freebase, DBpedia and YAGO. They improve various downstream applications such as structured search, question answering and entity recommendation.
In a knowledge graph, one basic problem is how to quantify the similarity of a given triple (h, r, t) so that subsequent applications can be performed. Recently, Knowledge Graph Embedding (KGE) has emerged and developed as a method for this purpose. Knowledge graph embedding aims at finding low-dimensional vector representations (i.e., embeddings) of entities and relationships so that their similarity can be quantified. In particular, given a set of observed facts (i.e., triples), knowledge graph embedding attempts to learn low-dimensional vector representations of the entities and relationships in the triples so that the similarity of the triples can be quantified. This similarity is measured by a Scoring Function (SF), which builds, based on a given relationship, a model for measuring the similarity between entities. To construct a knowledge graph embedding model, the most important step is to design and select a suitable SF. Since different SFs have their own strengths and weaknesses in capturing similarity, the selection of the SF is crucial to the performance of knowledge graph embedding.
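As a point of reference for how a scoring function quantifies triple similarity, the classical DistMult SF scores a triple by a three-way dot product of the embeddings. The sketch below is purely illustrative (the embedding values are made up) and is not the scoring function proposed in this application:

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult scoring function: <h, r, t> = sum_i h_i * r_i * t_i.

    A higher score means the triple (h, r, t) is considered more plausible.
    """
    return float(np.sum(h * r * t))

# Toy 4-dimensional embeddings (illustrative values only).
h = np.array([0.5, -0.1, 0.3, 0.2])   # head entity, e.g. "New York"
r = np.array([1.0, 0.2, -0.5, 0.7])   # relation, e.g. "isLocatedIn"
t = np.array([0.4, 0.6, -0.2, 0.1])   # tail entity, e.g. "USA"

score = distmult_score(h, r, t)
```

In practice such a score is compared across candidate tail entities, and the entities with the highest scores are returned as predictions.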
In general, SFs are not task-aware, so they can hardly achieve optimal performance across various data sets at all times. Pioneering work on designing task-dependent SFs is the automatic scoring function (AutoSF), which uses automated machine learning techniques to specify an SF for a given task. In this way, AutoSF has become the state-of-the-art SF for knowledge graph embedding.
However, neither the performance nor the efficiency of AutoSF is as good as expected. First, the SF should also be relationship-aware, since different relationships exhibit different patterns. Furthermore, a single training of the model is already costly, whereas AutoSF requires hundreds or even thousands of trainings. Therefore, there is a need for a method that can efficiently determine a relationship-aware SF for a given task without multiple rounds of model training, thereby improving both performance and efficiency.
Disclosure of Invention
According to an embodiment of the present invention, a knowledge graph embedding method based on relationship cognition is provided, the method comprising the following steps: dividing a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group; determining a scoring function corresponding to each relationship group based on the division result, thereby obtaining a set of scoring functions for the plurality of relationship groups; training an embedding model of the knowledge graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge graph using the embedding model.
Alternatively, the scoring function may be expressed as:

$$f_d(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \sum_{m=1}^{K}\sum_{n=1}^{K}\left\langle \mathbf{h}_m,\ g_d(\mathbf{r})^{(m,n)},\ \mathbf{t}_n \right\rangle \tag{1}$$

wherein the plurality of relationships in the knowledge graph are divided into a plurality of relationship groups $\{CU_d\}$; $f_d(h, r, t)$ is the scoring function of relationship group $CU_d$, $1 \le d \le D$, and $D$ is the number of the plurality of relationship groups; $\mathbf{h}$, $\mathbf{t}$ and $\mathbf{r}$ respectively represent the embedding vectors of the head entity $h$, the tail entity $t$ and the relationship $r$ between $h$ and $t$ in a triple $(h, r, t)$ of the knowledge graph, and $\mathbf{h}$, $\mathbf{r}$ and $\mathbf{t}$ are each divided, in the same manner, into $K$ sub-embedding vectors $\mathbf{h}_1$ to $\mathbf{h}_K$, $\mathbf{r}_1$ to $\mathbf{r}_K$ and $\mathbf{t}_1$ to $\mathbf{t}_K$, where $1 \le m \le K$, $1 \le n \le K$, and $K$ is a positive integer; $g_d(\mathbf{r})$ is the $K \times K$ relational block matrix corresponding to the scoring function $f_d(h, r, t)$; and $g_d(\mathbf{r})^{(m,n)}$ represents the block in the $m$-th row and $n$-th column of the relational block matrix, with $g_d(\mathbf{r})^{(m,n)} = o_k$, where $o_k$ is the $k$-th operator in the operator set $\mathcal{O} \equiv \{0, \mathbf{r}_1, \ldots, \mathbf{r}_K, -\mathbf{r}_1, \ldots, -\mathbf{r}_K\}$ and $1 \le k \le 2K+1$.
Optionally, the step of determining the scoring function corresponding to each relationship group may comprise: determining a corresponding matrix structure of the relational block matrix corresponding to each scoring function, wherein the matrix structure indicates the distribution of non-zero blocks in the corresponding relational block matrix; and determining the corresponding relational block matrix based on the determined matrix structure, and obtaining the corresponding scoring function based on the determined relational block matrix.
Alternatively, the corresponding matrix structure of the relational block matrix corresponding to each scoring function may be determined based on the following expression (2):

$$\{g_d^*\} = \arg\min_{g_d \in \mathcal{G}}\ \frac{1}{|S_{val}|}\sum_{i=1}^{|S_{val}|} \mathcal{L}\!\left(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*;\ S_{val}^{(i)}\right) \tag{2}$$

$$\text{s.t.}\qquad X^* = \arg\min_{X}\ \frac{1}{|S_{tra}|}\sum_{j=1}^{|S_{tra}|} \mathcal{L}\!\left(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j;\ S_{tra}^{(j)}\right)$$

wherein $g_d$ is the matrix structure corresponding to $g_d(\mathbf{r})$; $\mathcal{G}$ is a structure search space comprising a plurality of matrix structures; $S_{val}$ is a validation set, $S_{tra}$ is a training set, and $S_{val}$ and $S_{tra}$ are subsets of the triple set of the knowledge graph; $|S_{val}|$ and $|S_{tra}|$ respectively represent the numbers of triples in the validation set $S_{val}$ and the training set $S_{tra}$; $S_{val}^{(i)}$ represents the $i$-th triple $(h_i, r_i, t_i)$ in the validation set $S_{val}$, and $(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*)$ is the embedding vector corresponding to the triple $(h_i, r_i, t_i)$; $S_{tra}^{(j)}$ represents the $j$-th triple $(h_j, r_j, t_j)$ in the training set $S_{tra}$, and $(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j)$ is the embedding vector corresponding to the triple $(h_j, r_j, t_j)$, with $1 \le i \le |S_{val}|$, $1 \le j \le |S_{tra}|$, and $i$ and $j$ integers; $\mathcal{L}(\mathbf{h}, \mathbf{r}, \mathbf{t};\ S)$ is a loss function for measuring the loss of a given embedding vector $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ over the corresponding data $S$; $X^*$ is the embedding vector parameter of the embedding model that has the smallest loss on the training set $S_{tra}$, obtained by model training of the embedding model of the knowledge graph using the training set $S_{tra}$ based on a matrix structure $g_d$ in the structure search space $\mathcal{G}$; and $\{g_d^*\}$ is the set of matrix structures in the structure search space $\mathcal{G}$ for which the scoring functions of the embedding model based on the embedding vector parameters $X^*$ have the smallest loss on the validation set $S_{val}$, each matrix structure in the set respectively indicating the corresponding matrix structure of the relational block matrix corresponding to each scoring function.
Optionally, the step of determining the scoring function corresponding to each relationship group may comprise: determining a corresponding structural weight matrix of the relational block matrix corresponding to each scoring function, wherein the structural weight matrix indicates the structural weight of each block in the corresponding relational block matrix; and determining the corresponding relational block matrix based on the determined structural weight matrix, and obtaining the corresponding scoring function based on the determined relational block matrix.
Alternatively, the corresponding structural weight matrix of the relational block matrix corresponding to each scoring function may be determined based on the following expression (3):

$$\{A_d^*\} = \arg\min_{\{A_d\}}\ \frac{1}{|S_{val}|}\sum_{i=1}^{|S_{val}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_i^* \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_i^*, \mathrm{SP}(\mathbf{r}_i^*, A_d), \mathbf{t}_i^*;\ S_{val}^{(i)}\right) \tag{3}$$

$$\text{s.t.}\qquad X^* = \arg\min_{X}\ \frac{1}{|S_{tra}|}\sum_{j=1}^{|S_{tra}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_j \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_j, \mathrm{SP}(\mathbf{r}_j, A_d), \mathbf{t}_j;\ S_{tra}^{(j)}\right)$$

wherein $A_d$ represents the structural weight matrix corresponding to the relational block matrix $g_d(\mathbf{r})$; $S_{val}$ is a validation set, $S_{tra}$ is a training set, and $S_{val}$ and $S_{tra}$ are subsets of the triple set of the knowledge graph; $|S_{val}|$ and $|S_{tra}|$ respectively represent the numbers of triples in the validation set $S_{val}$ and the training set $S_{tra}$; $S_{val}^{(i)}$ represents the $i$-th triple $(h_i, r_i, t_i)$ in the validation set $S_{val}$, and $(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*)$ is the embedding vector corresponding to the triple $(h_i, r_i, t_i)$; $S_{tra}^{(j)}$ represents the $j$-th triple $(h_j, r_j, t_j)$ in the training set $S_{tra}$, and $(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j)$ is the embedding vector corresponding to the triple $(h_j, r_j, t_j)$, with $1 \le i \le |S_{val}|$, $1 \le j \le |S_{tra}|$, and $i$ and $j$ integers; $\mathrm{SP}(\mathbf{r}, A_d)$ represents the relational block matrix $g_d(\mathbf{r})$ having the structural weights indicated by $A_d$; $\mathbb{1}(\mathbf{r}_i^* \in CU_d)$ indicates whether $\mathbf{r}_i^*$ belongs to relationship group $CU_d$, and $\mathbb{1}(\mathbf{r}_j \in CU_d)$ indicates whether $\mathbf{r}_j$ belongs to relationship group $CU_d$; $\mathcal{L}(\mathbf{h}, \mathbf{r}, \mathbf{t};\ S)$ is a loss function for measuring the loss of a given embedding vector $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ over the corresponding data $S$; $X^*$ is the embedding vector parameter of the embedding model that has the smallest loss on the training set $S_{tra}$, obtained by model training of the embedding model of the knowledge graph using the training set $S_{tra}$ based on the structural weight matrices $A_d$; and $\{A_d^*\}$ is the set of structural weight matrices for which the scoring functions of the embedding model based on the embedding vector parameters $X^*$ have the smallest loss on the validation set $S_{val}$, each structural weight matrix in the set respectively corresponding to the relational block matrix corresponding to each scoring function.
Optionally, the step of dividing the plurality of relationships in the knowledge-graph into a plurality of relationship groups may comprise: the plurality of relationships are partitioned into a plurality of relationship groups using a clustering method.
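The clustering-based partition can be sketched as follows. The patent does not prescribe a specific clustering algorithm, so plain k-means over the current relation embeddings (via the hypothetical helper `group_relations`) is used here as one possible instantiation:

```python
import numpy as np

def group_relations(relation_embeddings, D, n_iter=20, seed=0):
    """Assign each relation to exactly one of D groups with plain k-means.

    Returns an array `assign` with assign[t] = group index of relation t,
    so every relation belongs to exactly one relationship group.
    """
    rng = np.random.default_rng(seed)
    R = relation_embeddings
    centers = R[rng.choice(len(R), size=D, replace=False)]
    for _ in range(n_iter):
        # distance of every relation embedding to every group center
        dist = np.linalg.norm(R[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for d in range(D):
            if np.any(assign == d):
                centers[d] = R[assign == d].mean(axis=0)
    return assign

# Six toy relation embeddings forming two obvious clusters.
R = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
assign = group_relations(R, D=2)
```

Because the relation embeddings change during training, this partition can be recomputed between iterations, which is exactly where the assignment-update step of the iterative procedure below fits in.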
Alternatively, $\{A_d^*\}$ may be determined based on expression (3) by the following operations: determining an initial embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$, a set of structural weight matrices $\{A_d\}$ and relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$, wherein $1 \le t \le |R|$ and $t$ is an integer, $R$ is the set of relationships of the knowledge graph, $|R|$ represents the number of relationships of the knowledge graph, and $\mathbb{1}(r_t \in CU_d)$ indicates whether relationship $r_t$ in the knowledge graph belongs to relationship group $CU_d$; based on the initial embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$, the set of structural weight matrices $\{A_d\}$ and the relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$, updating $\{A_d\}$ by performing at least one iterative operation; and determining the discrete samples $\{\bar{A}_d\}$ used in the last iterative operation as $\{A_d^*\}$. Each iterative operation may include the following operations: sampling the set of structural weight matrices $\{A_d\}$ to obtain a set of discrete structural weight matrices $\{\bar{A}_d\}$; updating the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ based on the obtained $\{\bar{A}_d\}$; updating the relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$; and updating $\{A_d\}$.
Optionally, the step of sampling the set of structural weight matrices $\{A_d\}$ may comprise: repeatedly drawing discrete samples of $A_d$ with $B_d$, determined by expression (4), as the sampling probability, to obtain a set of discrete structural weight matrices $\{\bar{A}_d\}$ whose block weight vectors $\bar{a}^{(m,n)}$ satisfy the constraint $\bar{a}^{(m,n)} \in C_1 \cap C_2$:

$$B_d^{(m,n)} = \frac{\exp\!\left(a^{(m,n)} / \tau\right)}{\mathbf{1}^{\top}\exp\!\left(a^{(m,n)} / \tau\right)} \tag{4}$$

wherein $\tau \in [0, 1]$ is a preset hyper-parameter; $a^{(m,n)}$ is the structural weight vector of the block in the $m$-th row and $n$-th column of $A_d$ over the operator set $\mathcal{O} \equiv \{0, \mathbf{r}_1, \ldots, \mathbf{r}_K, -\mathbf{r}_1, \ldots, -\mathbf{r}_K\}$; $\mathbf{1}$ represents an all-ones column vector; $C_1 \equiv \{a^{(m,n)} \mid a^{(m,n)} \in \{0, 1\}^{2K+1}\}$; and $C_2 = \{a^{(m,n)} \mid \|a^{(m,n)}\|_0 = 1\}$, so that each sampled block weight vector is a one-hot vector that selects exactly one operator from the operator set.
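A minimal sketch of this sampling step, assuming the softmax-with-temperature form of expression (4) and one-hot samples satisfying the ||a||_0 = 1 constraint; `sample_block` is an illustrative helper, not code from the application:

```python
import numpy as np

def sample_block(a, tau, rng):
    """Sample a one-hot operator choice for one block of A_d.

    B = softmax(a / tau) is used as the sampling probability; the drawn
    vector is one-hot, so it selects exactly one operator per block.
    Lower tau sharpens the distribution toward the largest weight.
    """
    z = a / tau
    z = z - z.max()                       # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    k = rng.choice(len(a), p=p)
    one_hot = np.zeros_like(a)
    one_hot[k] = 1.0
    return one_hot

rng = np.random.default_rng(0)
a = np.array([2.0, 0.1, -1.0, 0.1, 0.1])  # weights over 2K+1 = 5 operators
sample = sample_block(a, tau=0.1, rng=rng)
```

With a small temperature the sampling is nearly deterministic around the strongest weight, while a larger temperature keeps more exploration early in the search.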
Optionally, the step of updating the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ based on the obtained $\{\bar{A}_d\}$ may include: selecting a predetermined number of triples from the training set $S_{tra}$ as a mini-batch set $B_{tra}$; and updating the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ according to the following expression (5):

$$X \leftarrow X - \eta\, \nabla_{X}\ \frac{1}{|B_{tra}|}\sum_{j'=1}^{|B_{tra}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_{j'} \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_{j'}, \mathrm{SP}(\mathbf{r}_{j'}, \bar{A}_d), \mathbf{t}_{j'};\ B_{tra}^{(j')}\right) \tag{5}$$

wherein $\eta$ is a preset step size; $|B_{tra}|$ represents the number of triples in the mini-batch set $B_{tra}$; $B_{tra}^{(j')}$ represents the $j'$-th triple $(h_{j'}, r_{j'}, t_{j'})$ of $B_{tra}$, and $(\mathbf{h}_{j'}, \mathbf{r}_{j'}, \mathbf{t}_{j'})$ is the embedding vector corresponding to the triple $(h_{j'}, r_{j'}, t_{j'})$; $\mathbb{1}(\mathbf{r}_{j'} \in CU_d)$ indicates whether $\mathbf{r}_{j'}$ belongs to relationship group $CU_d$; and $1 \le j' \le |B_{tra}|$ with $j'$ an integer.
Optionally, the step of updating the relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$ may comprise: assigning the plurality of relationships in the knowledge graph to different relationship groups using a clustering method; and updating $\mathbb{1}(r_t \in CU_d)$ based on the assignment result.
Alternatively, the step of updating $\{A_d\}$ may comprise: selecting a predetermined number of triples from the validation set $S_{val}$ as a mini-batch set $B_{val}$; and updating each matrix in the set $\{A_d\}$ according to the following expression (6):

$$A_d \leftarrow A_d - \varepsilon\, \nabla_{A_d}\ \frac{1}{|B_{val}|}\sum_{i'=1}^{|B_{val}|} \mathbb{1}\!\left(\mathbf{r}_{i'}^* \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_{i'}^*, \mathrm{SP}(\mathbf{r}_{i'}^*, A_d), \mathbf{t}_{i'}^*;\ B_{val}^{(i')}\right) \tag{6}$$

wherein $\varepsilon$ is a preset step size; $|B_{val}|$ represents the number of triples in the mini-batch set $B_{val}$; $B_{val}^{(i')}$ represents the $i'$-th triple $(h_{i'}, r_{i'}, t_{i'})$ of $B_{val}$, and $(\mathbf{h}_{i'}^*, \mathbf{r}_{i'}^*, \mathbf{t}_{i'}^*)$ is the embedding vector corresponding to the triple $(h_{i'}, r_{i'}, t_{i'})$; $\mathbb{1}(\mathbf{r}_{i'}^* \in CU_d)$ indicates whether $\mathbf{r}_{i'}^*$ belongs to relationship group $CU_d$; and $1 \le i' \le |B_{val}|$ with $i'$ an integer.
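The alternating schedule that combines expressions (4) to (6) can be sketched end to end with toy stand-ins. Everything below (a single relationship group, a scalar "embedding", quadratic toy losses) is a deliberately simplified assumption whose only purpose is to show the order of the four updates in each iteration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: D = 1 relationship group, one block with 2K+1 = 5 operator
# weights, a scalar "embedding" X, and quadratic toy losses. This only
# illustrates the alternating schedule, not a real KGE model.
A = rng.normal(size=5)     # structural weights A_d
X = 0.0                    # embedding parameters
eta, eps, tau = 0.1, 0.1, 0.5

for step in range(200):
    # 1) sample a discrete structure with softmax(A / tau), as in expression (4)
    z = A / tau - (A / tau).max()
    p = np.exp(z) / np.exp(z).sum()
    k = int(rng.choice(5, p=p))
    # 2) embedding update on a training mini-batch, expression (5);
    #    here the toy "loss" is (X - 1)^2 with gradient 2 * (X - 1)
    X -= eta * 2.0 * (X - 1.0)
    # 3) the relationship re-clustering step would run here
    # 4) structure update on a validation mini-batch, expression (6);
    #    here the toy "loss" is 0.01 * sum(A^2) with gradient 0.02 * A
    A -= eps * 0.02 * A
```

The essential design choice visible even in this toy loop is that embeddings descend on training loss while structure weights descend on validation loss, which is what keeps the structure search from overfitting the training triples.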
According to another embodiment of the present invention, there is provided a relationship-awareness-based knowledge-graph embedding system, including: a relationship dividing means configured to divide a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group; a search device configured to determine a scoring function corresponding to each relationship group based on the division result, and obtain a scoring function set for the plurality of relationship groups; an embedded model training device configured to train an embedded model of the knowledge-graph based on the obtained set of scoring functions; and a representation means configured to obtain an embedded representation of the knowledge-graph using the embedded model.
Alternatively, the scoring function may be expressed as:
Figure BDA0002496675690000064
wherein the plurality of relationships in the knowledge-graph can be divided into a plurality of relationship groups (CU)d},fd(h, r, t) is the relation group CUdD is more than or equal to 1 and less than or equal to D, D is the number of the plurality of relation groups, h, t and r respectively represent the head entity h, the tail entity t and the embedded vector of the relation r between h and t in the triplet (h, r, t) of the knowledge graph, and h, t and r are respectively divided into K sub-embedded vectors h according to the same division mode1To hK、r1To rKAnd t1To tKM is 1. ltoreq. K, n is 1. ltoreq. K, and K is a positive integer, gd(r) is a function f of the scored(h, r, t) corresponding to a KxK relational block matrix,
Figure BDA0002496675690000065
a block representing an mth row and an nth column in the relational block matrix,
Figure BDA0002496675690000066
C1≡{0,1},okis a set of operators
Figure BDA0002496675690000067
The (k) th operator in (2),
Figure BDA0002496675690000068
1≤k≤2K+1。
Alternatively, the search means may be configured to determine the scoring function corresponding to each relationship group by: determining a corresponding matrix structure of the relational block matrix corresponding to each scoring function, wherein the matrix structure indicates the distribution of non-zero blocks in the corresponding relational block matrix; determining the corresponding relational block matrix based on the determined matrix structure; and obtaining the corresponding scoring function based on the determined relational block matrix.
Optionally, the search means is configured to determine the corresponding matrix structure of the relational block matrix corresponding to each scoring function based on the following expression (2):

$$\{g_d^*\} = \arg\min_{g_d \in \mathcal{G}}\ \frac{1}{|S_{val}|}\sum_{i=1}^{|S_{val}|} \mathcal{L}\!\left(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*;\ S_{val}^{(i)}\right) \tag{2}$$

$$\text{s.t.}\qquad X^* = \arg\min_{X}\ \frac{1}{|S_{tra}|}\sum_{j=1}^{|S_{tra}|} \mathcal{L}\!\left(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j;\ S_{tra}^{(j)}\right)$$

wherein $g_d$ is the matrix structure corresponding to $g_d(\mathbf{r})$; $\mathcal{G}$ is a structure search space comprising a plurality of matrix structures; $S_{val}$ is a validation set, $S_{tra}$ is a training set, and $S_{val}$ and $S_{tra}$ are subsets of the triple set of the knowledge graph; $|S_{val}|$ and $|S_{tra}|$ respectively represent the numbers of triples in the validation set $S_{val}$ and the training set $S_{tra}$; $S_{val}^{(i)}$ represents the $i$-th triple $(h_i, r_i, t_i)$ in the validation set $S_{val}$, and $(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*)$ is the embedding vector corresponding to the triple $(h_i, r_i, t_i)$; $S_{tra}^{(j)}$ represents the $j$-th triple $(h_j, r_j, t_j)$ in the training set $S_{tra}$, and $(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j)$ is the embedding vector corresponding to the triple $(h_j, r_j, t_j)$, with $1 \le i \le |S_{val}|$, $1 \le j \le |S_{tra}|$, and $i$ and $j$ integers; $\mathcal{L}(\mathbf{h}, \mathbf{r}, \mathbf{t};\ S)$ is a loss function for measuring the loss of a given embedding vector $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ over the corresponding data $S$; $X^*$ is the embedding vector parameter of the embedding model that has the smallest loss on the training set $S_{tra}$, obtained by model training of the embedding model of the knowledge graph using the training set $S_{tra}$ based on a matrix structure $g_d$ in the structure search space $\mathcal{G}$; and $\{g_d^*\}$ is the set of matrix structures in the structure search space $\mathcal{G}$ for which the scoring functions of the embedding model based on the embedding vector parameters $X^*$ have the smallest loss on the validation set $S_{val}$, each matrix structure in the set respectively indicating the corresponding matrix structure of the relational block matrix corresponding to each scoring function.
Alternatively, the search means may be configured to determine the scoring function corresponding to each relationship group by: determining a corresponding structural weight matrix of the relational block matrix corresponding to each scoring function, wherein the structural weight matrix indicates the structural weight of each block in the corresponding relational block matrix; determining the corresponding relational block matrix based on the determined structural weight matrix; and obtaining the corresponding scoring function based on the determined relational block matrix.
Alternatively, the search means may be configured to determine the corresponding structural weight matrix of the relational block matrix corresponding to each scoring function based on the following expression (3):

$$\{A_d^*\} = \arg\min_{\{A_d\}}\ \frac{1}{|S_{val}|}\sum_{i=1}^{|S_{val}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_i^* \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_i^*, \mathrm{SP}(\mathbf{r}_i^*, A_d), \mathbf{t}_i^*;\ S_{val}^{(i)}\right) \tag{3}$$

$$\text{s.t.}\qquad X^* = \arg\min_{X}\ \frac{1}{|S_{tra}|}\sum_{j=1}^{|S_{tra}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_j \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_j, \mathrm{SP}(\mathbf{r}_j, A_d), \mathbf{t}_j;\ S_{tra}^{(j)}\right)$$

wherein $A_d$ represents the structural weight matrix corresponding to the relational block matrix $g_d(\mathbf{r})$; $S_{val}$ is a validation set, $S_{tra}$ is a training set, and $S_{val}$ and $S_{tra}$ are subsets of the triple set of the knowledge graph; $|S_{val}|$ and $|S_{tra}|$ respectively represent the numbers of triples in the validation set $S_{val}$ and the training set $S_{tra}$; $S_{val}^{(i)}$ represents the $i$-th triple $(h_i, r_i, t_i)$ in the validation set $S_{val}$, and $(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*)$ is the embedding vector corresponding to the triple $(h_i, r_i, t_i)$; $S_{tra}^{(j)}$ represents the $j$-th triple $(h_j, r_j, t_j)$ in the training set $S_{tra}$, and $(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j)$ is the embedding vector corresponding to the triple $(h_j, r_j, t_j)$, with $1 \le i \le |S_{val}|$, $1 \le j \le |S_{tra}|$, and $i$ and $j$ integers; $\mathrm{SP}(\mathbf{r}, A_d)$ represents the relational block matrix $g_d(\mathbf{r})$ having the structural weights indicated by $A_d$; $\mathbb{1}(\mathbf{r}_i^* \in CU_d)$ indicates whether $\mathbf{r}_i^*$ belongs to relationship group $CU_d$, and $\mathbb{1}(\mathbf{r}_j \in CU_d)$ indicates whether $\mathbf{r}_j$ belongs to relationship group $CU_d$; $\mathcal{L}(\mathbf{h}, \mathbf{r}, \mathbf{t};\ S)$ is a loss function for measuring the loss of a given embedding vector $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ over the corresponding data $S$; $X^*$ is the embedding vector parameter of the embedding model that has the smallest loss on the training set $S_{tra}$, obtained by model training of the embedding model of the knowledge graph using the training set $S_{tra}$ based on the structural weight matrices $A_d$; and $\{A_d^*\}$ is the set of structural weight matrices for which the scoring functions of the embedding model based on the embedding vector parameters $X^*$ have the smallest loss on the validation set $S_{val}$, each structural weight matrix in the set respectively corresponding to the relational block matrix corresponding to each scoring function.
Alternatively, the relationship dividing means may be configured to divide the plurality of relationships into a plurality of relationship groups using a clustering method.
Alternatively, the search means may be configured to determine by the following operation based on expression (3)
Figure BDA00024966756900000811
An initial embedding vector X is determined as { h, r, t }, a set of structural weight matrices { A }dAnd relationship group assignment indication
Figure BDA00024966756900000812
Wherein 1 ≦ t ≦ R | and t is an integer, R is a set of relationships of the knowledge-graph, and | R | represents a number of relationships of the knowledge-graph,
Figure BDA00024966756900000813
indicating relationships r in the knowledge-graphtWhether or not to belong to a relationship group CUd(ii) a Based on the initial embedded vector X ═ { h, r, t }, the set of structure weight matrices { AdAnd relationship group assignment indication
Figure BDA0002496675690000091
Updating { A ] by performing at least one iterative operationd}; to be used in the last iteration operationdDiscrete samples of } are determined as
Figure BDA0002496675690000092
Each iteration of the operation may include the following operations: set of structural weight matrices { A } by the search meansdSampling to obtain a set of discrete structure weight matrices
Figure BDA0002496675690000093
Based on obtained by the searching means
Figure BDA0002496675690000094
Updating the embedding vector X ═ { h, r, t }; assigning indications to relationship groups by a relationship partitioning means
Figure BDA0002496675690000095
Updating is carried out; bySearch device pair { AdThe update is performed.
Alternatively, the search means may be configured to sample the set of structural weight matrices $\{A_d\}$ by: repeatedly drawing discrete samples of $A_d$ with $B_d$, determined by expression (4), as the sampling probability, to obtain a set of discrete structural weight matrices $\{\bar{A}_d\}$ whose block weight vectors $\bar{a}^{(m,n)}$ satisfy the constraint $\bar{a}^{(m,n)} \in C_1 \cap C_2$:

$$B_d^{(m,n)} = \frac{\exp\!\left(a^{(m,n)} / \tau\right)}{\mathbf{1}^{\top}\exp\!\left(a^{(m,n)} / \tau\right)} \tag{4}$$

wherein $\tau \in [0, 1]$ is a preset hyper-parameter; $a^{(m,n)}$ is the structural weight vector of the block in the $m$-th row and $n$-th column of $A_d$ over the operator set $\mathcal{O} \equiv \{0, \mathbf{r}_1, \ldots, \mathbf{r}_K, -\mathbf{r}_1, \ldots, -\mathbf{r}_K\}$; $\mathbf{1}$ represents an all-ones column vector; $C_1 \equiv \{a^{(m,n)} \mid a^{(m,n)} \in \{0, 1\}^{2K+1}\}$; and $C_2 = \{a^{(m,n)} \mid \|a^{(m,n)}\|_0 = 1\}$, so that each sampled block weight vector is a one-hot vector that selects exactly one operator from the operator set.
Alternatively, the search means may be configured to update the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ by: selecting a predetermined number of triples from the training set $S_{tra}$ as a mini-batch set $B_{tra}$; and updating the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ according to the following expression (5):

$$X \leftarrow X - \eta\, \nabla_{X}\ \frac{1}{|B_{tra}|}\sum_{j'=1}^{|B_{tra}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_{j'} \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_{j'}, \mathrm{SP}(\mathbf{r}_{j'}, \bar{A}_d), \mathbf{t}_{j'};\ B_{tra}^{(j')}\right) \tag{5}$$

wherein $\eta$ is a preset step size; $|B_{tra}|$ represents the number of triples in the mini-batch set $B_{tra}$; $B_{tra}^{(j')}$ represents the $j'$-th triple $(h_{j'}, r_{j'}, t_{j'})$ of $B_{tra}$, and $(\mathbf{h}_{j'}, \mathbf{r}_{j'}, \mathbf{t}_{j'})$ is the embedding vector corresponding to the triple $(h_{j'}, r_{j'}, t_{j'})$; $\mathbb{1}(\mathbf{r}_{j'} \in CU_d)$ indicates whether $\mathbf{r}_{j'}$ belongs to relationship group $CU_d$; and $1 \le j' \le |B_{tra}|$ with $j'$ an integer.
Alternatively, the relationship dividing means may be configured to update the relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$ by: assigning the plurality of relationships in the knowledge graph to different relationship groups using a clustering method; and updating $\mathbb{1}(r_t \in CU_d)$ based on the assignment result.
Alternatively, the search means may be configured to update $\{A_d\}$ by: selecting a predetermined number of triples from the validation set $S_{val}$ as a mini-batch set $B_{val}$; and updating each matrix in the set $\{A_d\}$ according to the following expression (6):

$$A_d \leftarrow A_d - \varepsilon\, \nabla_{A_d}\ \frac{1}{|B_{val}|}\sum_{i'=1}^{|B_{val}|} \mathbb{1}\!\left(\mathbf{r}_{i'}^* \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_{i'}^*, \mathrm{SP}(\mathbf{r}_{i'}^*, A_d), \mathbf{t}_{i'}^*;\ B_{val}^{(i')}\right) \tag{6}$$

wherein $\varepsilon$ is a preset step size; $|B_{val}|$ represents the number of triples in the mini-batch set $B_{val}$; $B_{val}^{(i')}$ represents the $i'$-th triple $(h_{i'}, r_{i'}, t_{i'})$ of $B_{val}$, and $(\mathbf{h}_{i'}^*, \mathbf{r}_{i'}^*, \mathbf{t}_{i'}^*)$ is the embedding vector corresponding to the triple $(h_{i'}, r_{i'}, t_{i'})$; $\mathbb{1}(\mathbf{r}_{i'}^* \in CU_d)$ indicates whether $\mathbf{r}_{i'}^*$ belongs to relationship group $CU_d$; and $1 \le i' \le |B_{val}|$ with $i'$ an integer.
According to another embodiment of the present invention, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the aforementioned relationship-awareness based knowledge-graph embedding method.
According to another embodiment of the present invention, there is provided a system comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the aforementioned relationship-awareness based knowledge-graph embedding method.
Advantageous effects
By applying the knowledge graph embedding method and system based on relationship cognition according to the exemplary embodiments of the present invention, a relationship-aware SF can be determined efficiently for a given task without multiple rounds of model training, achieving excellent results in terms of both performance and efficiency.
Drawings
These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a block diagram illustrating a relationship awareness based knowledge-graph embedding system according to an exemplary embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a relationship awareness based knowledge-graph embedding method according to an exemplary embodiment of the present disclosure.
Detailed Description
In order that those skilled in the art will better understand the present invention, exemplary embodiments thereof will be described in further detail below with reference to the accompanying drawings and detailed description.
Before describing the inventive concept below, for ease of understanding, the various parameters used in the present application and their notation are explained first:
Vectors are represented in lower-case bold and matrices in upper-case bold. For a knowledge graph, its sets of entities and relationships are represented by ε and R, respectively. The triples in the knowledge graph are represented by (h, r, t), where h ∈ ε and t ∈ ε are the head and tail entities, respectively, and r ∈ R is the relationship.
The parameters of the knowledge graph embedding model are denoted as e (for each entity) and r (for each relationship). For simplicity, in the following, an embedding (sometimes also referred to below as an embedding vector) is represented by the bold-faced form of the corresponding parameter; e.g., **a** is the embedding of a.
⟨a, b, c⟩ is a dot product; for real-valued vectors it equals Σ_i a_i b_i c_i, whereas for complex-valued vectors it is the Hermitian product.
diag(b) is the diagonal matrix whose diagonal entries are the components of b.
Furthermore, in the context of the present disclosure, parameters having the same expression have the same definition.
f(h, r, t) is a scoring function that returns a real value reflecting the similarity of the triple (h, r, t), with a higher score representing greater similarity.
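For ease of understanding, the tri-linear dot product used in the notation above can be sketched as follows for real-valued embeddings (a minimal NumPy illustration; the toy vectors are invented for the example):

```python
import numpy as np

def score(h, r, t):
    """Tri-linear dot product <h, r, t> = sum_i h_i * r_i * t_i
    for real-valued embedding vectors."""
    return float(np.sum(h * r * t))

# Toy 4-dimensional embeddings for a triple (h, r, t).
h = np.array([1.0, 0.0, 2.0, -1.0])
r = np.array([0.5, 1.0, 1.0, 2.0])
t = np.array([2.0, 3.0, 0.5, 1.0])

print(score(h, r, t))  # 1*0.5*2 + 0 + 2*1*0.5 + (-1)*2*1 = 0.0
```

A higher value of this product indicates greater similarity of the triple under the given embeddings.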
Fig. 1 is a block diagram illustrating a relationship awareness based knowledge-graph embedding system 100 according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, the knowledge-graph embedding system 100 based on relationship recognition may include a relationship dividing means 110, a searching means 120, an embedding model training means 130, and a representing means 140.
The relationship dividing apparatus 110 according to an exemplary embodiment of the present disclosure may divide the plurality of relationships in the knowledge graph into a plurality of relationship groups such that each relationship in the knowledge graph belongs to only one relationship group.
In an exemplary embodiment of the present invention, the relationship dividing apparatus 110 may divide the plurality of relationships in the knowledge graph into a plurality of relationship groups using a clustering method, however, it should be understood that the method of dividing the plurality of relationships in the knowledge graph is not limited thereto, and any other suitable method may be used to divide the relationships.
Hereinafter, for ease of understanding, the plurality of relationship groups may be represented as {CU_d}, where CU_d represents the d-th relationship group, each CU_d ⊆ R with CU_d ∩ CU_{d′} = ∅ for any d′ ≠ d, R is the set of relationships in the knowledge graph, 1 ≤ d ≤ D, and D is the number of the plurality of relationship groups.
The search means 120 may determine a scoring function corresponding to each relationship group based on the division result, and obtain a set of scoring functions for the plurality of relationship groups.
In an exemplary embodiment of the present invention, the form of the scoring function may be expressed as follows:

f_d(h, r, t) = Σ_{m=1}^{K} Σ_{n=1}^{K} ⟨h_m, g_d^{(m,n)}(r), t_n⟩    (1)

Here, f_d(h, r, t) is the scoring function corresponding to the relationship group CU_d, and h, t, and r respectively represent the embedding vectors of the head entity h, the tail entity t, and the relationship r between h and t in a triple (h, r, t) of the knowledge graph. In expression (1), h, r, and t can each be divided, in the same division manner, into K sub-embedding vectors h_1 to h_K, r_1 to r_K, and t_1 to t_K, where 1 ≤ m ≤ K, 1 ≤ n ≤ K, and K is a positive integer. g_d(r) is the K×K relational block matrix corresponding to the scoring function f_d(h, r, t), and g_d^{(m,n)}(r) represents the block in the m-th row and n-th column of the relational block matrix, where g_d^{(m,n)}(r) = Σ_{k=1}^{2K+1} a_k^{(m,n)} o_k with a^{(m,n)} ∈ C_1^{2K+1}, C_1 ≡ {0, 1}, and o_k is the k-th operator in the operator set O ≡ {0, r_1, −r_1, …, r_K, −r_K}, 1 ≤ k ≤ 2K+1.
In the embodiment of the present invention, being divided in the same division manner means that, among the K sub-embedding vectors h_1 to h_K, r_1 to r_K, and t_1 to t_K obtained by dividing the embedding vectors h, r, and t, corresponding sub-embedding vectors have the same dimension; i.e., h_1, r_1, and t_1 have the same dimension, h_2, r_2, and t_2 have the same dimension, and so on. Furthermore, in embodiments of the present invention, the embedding vectors h, r, and t may be partitioned uniformly (i.e., every sub-embedding vector has the same dimension; e.g., the sub-embedding vectors h_1 to h_K all have the same dimension) or non-uniformly (i.e., the dimensions of the individual sub-embedding vectors are not all the same; e.g., the dimensions of the sub-embedding vectors h_1 to h_K are not all the same).
As can be seen from the above representation of the scoring function, the main difference between the relational block matrices g_d(r) of different scoring functions is the distribution of the sub-embedding vectors ±r_1, …, ±r_K. Thus, after the embedding vectors h, r, and t are partitioned, a plurality of candidate scoring functions can be designed based on the distribution of the non-zero blocks (i.e., the sub-embedding vectors ±r_1, …, ±r_K) in the K×K block matrix g_d(r), thereby constituting a scoring function search space. The process by which the search means 120 determines the scoring function corresponding to each relationship group may then be a process of searching, in this scoring function search space, for the scoring function corresponding to each relationship group.

Therefore, in the embodiment of the present invention, in combination with the above-mentioned method of constructing the scoring function search space, searching for an appropriate scoring function in the scoring function search space can in fact be converted into finding a suitable matrix structure g_d for the relational block matrix g_d(r) of the scoring function, where g_d indicates the distribution of the non-zero blocks in the relational block matrix g_d(r) (i.e., the distribution structure of the sub-embedding vectors ±r_1, …, ±r_K in the block matrix). The relational block matrix g_d(r) can then be determined based on the matrix structure g_d, and the corresponding scoring function can be obtained from the determined relational block matrix g_d(r).
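As an illustration of how a candidate scoring function arises from the distribution of non-zero blocks, the following sketch evaluates a block-matrix scoring function for K = 2 under one hypothetical structure. The structure, toy vectors, and the encoding of blocks as signed indices (entry ±k meaning the block ±r_k, entry 0 meaning a zero block) are assumptions made for the example, not the patent's exact representation:

```python
import numpy as np

# Hypothetical structure for K = 2: entry s in {0, +1, -1, +2, -2} at
# position (m, n) means the block g_d^(m,n)(r) equals 0 or ±r_|s|.
structure = [[1, 0],
             [0, -2]]

def f_d(h, r, t, structure, K=2):
    """Score f_d(h, r, t) = sum_{m,n} <h_m, g_d^(m,n)(r), t_n>,
    with h, r, t each split evenly into K sub-embedding vectors."""
    h_parts = np.split(h, K)
    r_parts = np.split(r, K)
    t_parts = np.split(t, K)
    total = 0.0
    for m in range(K):
        for n in range(K):
            s = structure[m][n]
            if s != 0:  # zero blocks contribute nothing
                sign = 1.0 if s > 0 else -1.0
                total += sign * np.sum(h_parts[m] * r_parts[abs(s) - 1] * t_parts[n])
    return total

h = np.array([1.0, 2.0, 3.0, 4.0])
r = np.array([1.0, 1.0, 2.0, 2.0])
t = np.array([1.0, 1.0, 1.0, 1.0])
# block (1,1): +r_1 -> 1*1*1 + 2*1*1 = 3
# block (2,2): -r_2 -> -(3*2*1 + 4*2*1) = -14
print(f_d(h, r, t, structure))  # 3 - 14 = -11.0
```

Different choices of `structure` yield different candidate scoring functions over the same embeddings, which is exactly what makes the space searchable.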
Merely by way of example, in an exemplary embodiment of the present invention, the search apparatus 120 may determine the corresponding matrix structure of the relational block matrix corresponding to each scoring function using the following expression (2):

{g_d*} = argmin_{g_d ∈ 𝒢} Σ_{i=1}^{|S_val|} ℓ(h_i, r_i, t_i; S_val^(i))    (2)

s.t. {h, r, t}* = argmin_{h, r, t} Σ_{j=1}^{|S_tra|} ℓ(h_j, r_j, t_j; S_tra^(j))

In expression (2), g_d is the matrix structure corresponding to g_d(r), and 𝒢 is a structure search space including a plurality of matrix structures, corresponding to the above scoring function search space. S_val is a validation set, S_tra is a training set, and S_val and S_tra are both subsets of the triple set of the knowledge graph; |S_val| and |S_tra| respectively represent the numbers of triples in the validation set S_val and the training set S_tra. S_val^(i) represents the i-th triple (h_i, r_i, t_i) in the validation set S_val, and (h_i, r_i, t_i) is the embedding vector corresponding to the triple (h_i, r_i, t_i); S_tra^(j) represents the j-th triple (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triple (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers.
In addition, ℓ(h, r, t; S) is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S, and {h, r, t}* is the set of embedding vector parameters of the embedding model that has the least loss on the training set S_tra, obtained by performing model training of the embedding model of the knowledge graph on the training set S_tra based on a matrix structure g_d in the structure search space 𝒢. In exemplary embodiments of the present invention, the above-described model training may be performed using various suitable model training methods that are already known to those skilled in the art or may appear in the future, and therefore, for the sake of brevity, they will not be described redundantly.
{g_d*} is the set of matrix structures in the structure search space 𝒢 for which the embedding model based on the trained embedding vector parameters {h, r, t}* has the least loss on the validation set S_val. The searching apparatus 120 may determine each matrix structure in this set as the corresponding matrix structure of the relational block matrix corresponding to each scoring function, respectively.
Further, in the exemplary embodiment of the present invention, since g_d^{(m,n)}(r) = Σ_{k=1}^{2K+1} a_k^{(m,n)} o_k, the distribution of the non-zero blocks (i.e., the sub-embedding vectors ±r_1, …, ±r_K) in the relational block matrix g_d(r) actually depends on the structure weights a^{(m,n)}. Therefore, in an exemplary embodiment of the present invention, searching for an appropriate scoring function in the scoring function search space may be further converted into finding a suitable structure weight matrix A_d for the relational block matrix g_d(r) of the scoring function, where A_d indicates the structure weights of the individual blocks in the relational block matrix g_d(r). The relational block matrix g_d(r) may then be determined based on the structure weight matrix, thereby enabling the corresponding scoring function to be obtained from the determined relational block matrix g_d(r).
Merely by way of example, in an exemplary embodiment of the present invention, the search apparatus 120 may determine the corresponding structure weight matrix of the relational block matrix corresponding to each scoring function based on the following expression (3):

{A_d*} = argmin_{{A_d}} Σ_{d=1}^{D} Σ_{i=1}^{|S_val|} I(r_i ∈ CU_d) ℓ(h_i, r_i, t_i; S_val^(i))    (3)

s.t. {h, r, t}* = argmin_{h, r, t} Σ_{d=1}^{D} Σ_{j=1}^{|S_tra|} I(r_j ∈ CU_d) ℓ(h_j, r_j, t_j; S_tra^(j))

In expression (3), the loss is computed with the scoring function whose relational block matrix is SP(r, A_d), where SP(r, A_d) denotes the relational block matrix g_d(r) having the structure weights indicated by A_d; I(r_i ∈ CU_d) indicates whether r_i belongs to the relationship group CU_d, and I(r_j ∈ CU_d) indicates whether r_j belongs to the relationship group CU_d.
ℓ(h, r, t; S) is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S, and {h, r, t}* is the set of embedding vector parameters of the embedding model that has the least loss on the training set S_tra, obtained by performing model training of the embedding model of the knowledge graph on the training set S_tra based on the structure weight matrices A_d. In exemplary embodiments of the present invention, the above-described model training may be performed using various suitable model training methods that are already known to those skilled in the art or may appear in the future, and therefore, for the sake of brevity, they will not be described redundantly. {A_d*} is the set of structure weight matrices for which the embedding model based on the embedding vector parameters {h, r, t}* has the least loss on the validation set S_val; the searching apparatus 120 may determine each structure weight matrix in this set as the corresponding structure weight matrix of the relational block matrix corresponding to each scoring function, respectively.
Further, as shown above, expressions (2) and (3) in fact involve two levels of optimization. It should also be understood that, for example, when K = 4 (i.e., the embedding vectors h, r, and t are each divided into 4 sub-embedding vectors), the 4×4 block matrix g_d has 9^16 possible structures (9 choices for each sub-block, i.e., 0, ±r_1, …, ±r_4); that is, the structure search space 𝒢 includes 9^16 different structures, and, likewise, A_d has many possible forms. Therefore, when the set of matrix structures {g_d*} or the set of structure weight matrices {A_d*} is searched for directly using the above expressions (2) and (3), the search process can be quite complex and time-consuming.
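The size of the structure search space cited above for K = 4 can be checked directly:

```python
# Each of the K x K = 16 sub-blocks of g_d has 2K + 1 = 9 choices
# (0, ±r_1, ..., ±r_4), giving 9^16 possible structures in total.
K = 4
choices_per_block = 2 * K + 1           # 9
num_structures = choices_per_block ** (K * K)
print(num_structures)  # 9**16 = 1853020188851841
```

At roughly 1.85 × 10^15 candidate structures, exhaustive search is clearly infeasible, which motivates the gradient-based search below.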
Thus, by way of example only, in exemplary embodiments of the present invention, the search apparatus 120 may determine {A_d*} in expression (3) based on the following operations:

(A) Determining an initial embedding vector X = {h, r, t}, a set of structure weight matrices {A_d}, and a relationship group assignment indication {I(r_t ∈ CU_d)}, wherein 1 ≤ t ≤ |R| and t is an integer, R is the set of relationships of the knowledge graph, |R| represents the number of relationships of the knowledge graph, and I(r_t ∈ CU_d) indicates whether the relationship r_t in the knowledge graph belongs to the relationship group CU_d. Here, the search apparatus 120 may determine the initial embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)} using Gaussian initialization or the like. It should be understood, however, that the present application is not limited thereto, and any other suitable initialization method may be used to obtain the initial embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)}.
(B) Based on the initial embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)}, updating {A_d} by performing at least one iterative operation, and determining the discrete samples of {A_d} used in the last iterative operation as {A_d*}.

By way of example only, {A_d*} in expression (3) above may be determined based on the following example Algorithm 1. (Algorithm 1 appears as an image in the original publication; its steps are described below.)
Referring to Algorithm 1, in an exemplary embodiment of the present invention, each iterative operation may include the following operations (B1) to (B4):

(B1) The search means 120 samples the set of structure weight matrices {A_d} to obtain a set of discrete structure weight matrices {Ā_d} (step 3 in Algorithm 1).
Here, the search means 120 may determine B_d by the following expression (4) and, using B_d as the sampling probability, repeatedly sample A_d to obtain a set of discrete structure weight matrices {Ā_d} satisfying the constraint a^{(m,n)} ∈ C_2:

B_d = (1 − τ)A_d + (τ/|O|)·1    (4)

In expression (4), τ ∈ [0, 1] is a preset hyper-parameter, |O| indicates the size of the operator set O ≡ {0, r_1, −r_1, …, r_K, −r_K} (i.e., |O| = 2K+1), and 1 indicates a column vector of all 1s. Furthermore, C_2 = {a^{(m,n)} | ‖a^{(m,n)}‖_0 = 1}; i.e., each sampled structure weight vector a^{(m,n)} selects exactly one operator.
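A sketch of the sampling in operation (B1), under the assumption that each row of A_d is a probability distribution over the 2K + 1 operators and that the sampling probability mixes A_d with a uniform term weighted by τ (this mixing form is a reconstruction, since the patent gives expression (4) only as an image):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_discrete(A_d, tau=0.1):
    """Sample a discrete structure weight matrix: one one-hot row per block.

    Rows of A_d index the K*K block positions (m, n); columns index the
    2K + 1 operators in O = {0, ±r_1, ..., ±r_K}. Each row of A_d is
    assumed to be a probability distribution over the operators.
    """
    n_ops = A_d.shape[1]
    B_d = (1.0 - tau) * A_d + tau / n_ops   # mixed sampling probabilities
    sampled = np.zeros_like(A_d)
    for m in range(A_d.shape[0]):
        k = rng.choice(n_ops, p=B_d[m])     # draw one operator per block
        sampled[m, k] = 1.0                 # one-hot: ||a^(m,n)||_0 = 1
    return sampled

# K = 2: four block positions, 2K + 1 = 5 operators per block.
A_d = np.full((4, 5), 0.2)
A_bar = sample_discrete(A_d)
```

The τ term keeps every operator reachable during sampling even when A_d concentrates on a few choices, which helps exploration during the search.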
(B2) The search means 120 updates the embedding vector X = {h, r, t} based on the obtained {Ā_d}.
In an exemplary embodiment of the present invention, when updating the embedding vector X = {h, r, t}, a predetermined number of triples may first be selected (e.g., randomly selected) from the training set S_tra as a mini-batch set B_tra (step 4 in Algorithm 1), and then the embedding vector X = {h, r, t} may be updated according to the following expression (5) (step 5 in Algorithm 1):

X ← X − η Σ_{d=1}^{D} Σ_{j′=1}^{|B_tra|} I(r_{j′} ∈ CU_d) ∇_X ℓ(h_{j′}, r_{j′}, t_{j′}; B_tra^(j′))    (5)

where η is a preset step size, |B_tra| denotes the number of triples in the mini-batch set B_tra, B_tra^(j′) represents the j′-th triple (h_{j′}, r_{j′}, t_{j′}) in the mini-batch set B_tra, (h_{j′}, r_{j′}, t_{j′}) is the embedding vector corresponding to the triple (h_{j′}, r_{j′}, t_{j′}), I(r_{j′} ∈ CU_d) indicates whether r_{j′} belongs to the relationship group CU_d, 1 ≤ j′ ≤ |B_tra|, and j′ is an integer.
(B3) The relationship dividing means 110 updates the relationship group assignment indication {I(r_t ∈ CU_d)}.

In an exemplary embodiment of the present invention, the relationship dividing apparatus 110 may assign the plurality of relationships in the knowledge graph to different relationship groups using a clustering method, and then update {I(r_t ∈ CU_d)} based on the assignment result. Here, once {I(r_t ∈ CU_d)} has been determined, the indications I(r_{j′} ∈ CU_d) and I(r_{i′} ∈ CU_d) used in the preceding expressions can be determined accordingly.
For example only, the relationship dividing apparatus 110 may implement the clustering according to the following expression (6) (step 6 in Algorithm 1):

min_{{c_d}, {b_dt}} Σ_{t=1}^{|R|} Σ_{d=1}^{D} b_dt ‖r_t − c_d‖²    (6)

where r_t is the embedding vector corresponding to the relationship r_t in the relationship set R of the knowledge graph, c_d is a vector representation of the relationship group CU_d, and b_dt represents the degree of membership between CU_d and r_t.

In an exemplary implementation of the present invention, the EM algorithm may be used to obtain a solution to the above expression (6) to determine the assignment of the relationships. Specifically, in the EM algorithm, the E-step shown in the following expression (7) and the M-step shown in the following expression (8) may be iteratively performed until the cluster groups (i.e., the relationship groups) converge, thereby obtaining the assignment result of the relationships:

b_dt = exp(−‖r_t − c_d‖²) / Σ_{d′=1}^{D} exp(−‖r_t − c_{d′}‖²)    (7)

c_d = Σ_{t=1}^{|R|} b_dt r_t / Σ_{t=1}^{|R|} b_dt    (8)
After the clustering is completed, based on the clustering result {b_dt}: if d = argmax_{d′} b_{d′t}, then I(r_t ∈ CU_d) = 1; otherwise, I(r_t ∈ CU_d) = 0. The updated {I(r_t ∈ CU_d)} can thereby be obtained.
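Operation (B3) can be sketched as an EM-style soft clustering of the relation embeddings followed by a hard argmax assignment. The concrete E-step and M-step forms below are plausible reconstructions, since expressions (6) to (8) appear only as images in the original; the toy embeddings are invented for the example:

```python
import numpy as np

def cluster_relations(R_emb, D, n_iter=50):
    """Soft-cluster relation embeddings into D relationship groups via EM.

    E-step: memberships b[t, d] proportional to exp(-||r_t - c_d||^2);
    M-step: centers c_d as membership-weighted means of the embeddings.
    Returns the hard assignment d = argmax_d' b_d't per relation.
    """
    centers = R_emb[:D].astype(float).copy()  # simple deterministic init
    b = None
    for _ in range(n_iter):
        # E-step: squared distances of every embedding to every center
        d2 = ((R_emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        b = np.exp(-d2)
        b /= b.sum(axis=1, keepdims=True)
        # M-step: membership-weighted means
        centers = (b.T @ R_emb) / b.sum(axis=0)[:, None]
    return b.argmax(axis=1)

# Six toy relation embeddings forming two well-separated groups.
R_emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
groups = cluster_relations(R_emb, D=2)
```

Because the relation embeddings themselves change during training, repeating this clustering inside the outer loop lets the group assignment track the evolving embeddings.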
(B4) The search apparatus 120 updates {A_d}.

In an exemplary embodiment of the present invention, when updating {A_d}, a predetermined number of triples may first be selected (e.g., randomly selected) from the validation set S_val as a mini-batch set B_val (step 7 in Algorithm 1), and then each matrix in the set {A_d} may be updated according to the following expression (9) (steps 8 to 10 in Algorithm 1):

A_d ← A_d − ε Σ_{i′=1}^{|B_val|} I(r_{i′} ∈ CU_d) ∇_{A_d} ℓ(h_{i′}, r_{i′}, t_{i′}; B_val^(i′))    (9)

Here, ε is a preset step size, |B_val| denotes the number of triples in the mini-batch set B_val, B_val^(i′) represents the i′-th triple (h_{i′}, r_{i′}, t_{i′}) in the mini-batch set B_val, (h_{i′}, r_{i′}, t_{i′}) is the embedding vector corresponding to the triple (h_{i′}, r_{i′}, t_{i′}), I(r_{i′} ∈ CU_d) indicates whether r_{i′} belongs to the relationship group CU_d, 1 ≤ i′ ≤ |B_val|, and i′ is an integer.
In an exemplary embodiment of the invention, the {Ā_d} determined in step 3 of the last iteration of Algorithm 1 can be determined as the finally obtained {A_d*}.
Further, although the above description shows that, in each iterative update, the embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)} are updated in sequence, the present invention is not limited thereto; the update order of the embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)} can be set arbitrarily and is not limited to the order in Algorithm 1 above.

Furthermore, in an exemplary embodiment of the present invention, when updating the embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)}, the values of these parameters involved in an update may be either the update results from the previous iterative operation or the update results of the corresponding parameters in the current iterative operation. For example, in Algorithm 1 shown above, the embedding vectors h, r, t used in expression (9) when updating the set {A_d} in step 9 may be the embedding vectors updated in the previous iterative operation, or the embedding vectors updated in step 5 of the current iterative operation.
It should be understood that the various algorithms and the specific calculation methods shown in the form of expressions in the above steps (B1) to (B4) of the iterative operation are merely examples listed for ease of understanding the present application; the present application is not limited thereto, and the operations of steps (B1) to (B4) may be accomplished using other methods.
Further, it should also be understood that, in the exemplary embodiment of the present invention, the searching means 120 may search for the corresponding scoring function for each of the divided relationship groups after the relationship dividing means 110 completes the division of the relationship groups. However, the present application is not limited thereto; as shown in Algorithm 1 above, the relationship dividing means 110 may also iteratively update the division of the relationship groups based on the optimization results of the searching means 120 during the search for the scoring functions. Through such iterative updates, the relationship-awareness based knowledge-graph embedding system 100 of the present application can search out the optimal scoring function for each relationship group while obtaining the optimal relationship group division.
After searching out the scoring functions of the respective relationship groups, the embedded model training device 130 may train the embedded model of the knowledge-graph based on the obtained set of scoring functions, and the representing device 140 may obtain the embedded representation of the knowledge-graph using the embedded model.
Further, although not shown in fig. 1, the knowledge-graph embedding system 100 based on relationship awareness according to an exemplary embodiment of the present disclosure may further include: a machine learning model training unit (not shown) for training a machine learning model based on the obtained embedded representation of the knowledge graph to obtain a target machine learning model for performing at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution; and a prediction unit (not shown) for performing a prediction task using the target machine learning model, wherein the prediction task includes at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution.
Fig. 2 is a flowchart illustrating a relationship awareness based knowledge-graph embedding method 200 according to an exemplary embodiment of the present disclosure.
As shown in fig. 2, in step S210, the plurality of relationships in the knowledge-graph may be divided into a plurality of relationship groups by the relationship dividing unit 110 such that each relationship in the knowledge-graph belongs to only one relationship group.
In step S220, the search unit 120 may determine a scoring function corresponding to each relationship group based on the division result, obtaining a set of scoring functions for the plurality of relationship groups.
Thereafter, the obtained set of scoring functions may be used by the embedded model training unit 130 to train the embedded model of the knowledge-graph at step S230, and the representation unit 140 may obtain an embedded representation of the knowledge-graph using the embedded model at step S240.
It should be understood that, in the exemplary embodiment of the present invention, the respective scoring functions may be searched for the divided relationship groups in step S220 after the division of the relationship groups is completed in step S210. However, as shown in Algorithm 1 above, the execution order of step S210 and step S220 is not limited thereto: the two steps may also be executed together, and the division of the relationship groups may be iteratively updated based on the optimization results obtained while searching for the scoring functions. Through such iterative updates, the relationship-awareness based knowledge-graph embedding method 200 of the present application can search out the optimal scoring function for each relationship group while obtaining the optimal relationship group division.
The specific processes of detailed operations performed by the above-mentioned components of the knowledge-graph embedding system 100 based on relationship awareness according to the exemplary embodiment of the present disclosure have been described in detail above with reference to fig. 1, and therefore, for brevity, will not be described again here.
Furthermore, the knowledge graph embedding method based on relationship awareness according to the exemplary embodiment of the present disclosure may train a machine learning model based on the embedded representation of the knowledge graph obtained in step S240, obtain a target machine learning model for performing at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution, and may perform a prediction task using the target machine learning model, wherein the prediction task includes at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution.
That is, the knowledge-graph embedding method and system based on relationship awareness of the exemplary embodiments of the present disclosure may be applied to various fields, such as relationship retrieval, semantic retrieval, smart recommendation, smart question answering, personalized recommendation, anti-fraud, content distribution, and the like.
By way of example only, among the various application scenarios of the knowledge graph embedding method and system based on relationship awareness according to exemplary embodiments of the present disclosure, for retrieval (such as relationship retrieval and semantic retrieval), the relationship between two entities, or a corresponding other entity, may be retrieved by inputting two keywords. For example, inputting (China, Beijing) may retrieve that the relationship between them is "capital" (i.e., Beijing is the capital of China), and inputting (Zhang San, mother) may retrieve the other entity "Li Si" (Zhang San's mother).
For example, for intelligent question answering, inputting "Where is the capital of China?" can accurately return "Beijing", so that the user's intention can be truly understood through the knowledge graph.
For example, for anti-fraud, when information about a borrower (entity) is added to the knowledge-graph, it may be determined whether there is a risk of fraud by reading the relationship between the borrower and others in the knowledge-graph, or whether the information they share is consistent.
For example, for intelligent recommendation (e.g., personalized recommendation), similar content may be recommended to the entities of triples having similar relationships. For example, given the triple (Zhang San, student, a certain high school) (i.e., Zhang San is a student at that high school), recommendations may be made to Zhang San based on information about other students of the same school in the knowledge graph.
In the different applications of the knowledge graph above, the evaluation indexes for judging whether the knowledge graph has been properly applied also differ. For example, for retrieval applications, the evaluation indexes are generally the recall rate and accuracy of the retrieval; for anti-fraud, the evaluation indexes are generally credit, probability of fraud, and the like; and for intelligent question answering and intelligent recommendation, the evaluation indexes are satisfaction, accuracy, and the like. Therefore, the evaluation index of a knowledge-graph embedding model is generally determined according to the application scenario of the model, and a corresponding scoring function is designed accordingly, so that a better embedding model of the knowledge graph can be trained by utilizing a better scoring function. With the scoring function searched out according to the exemplary embodiment of the present invention, the best scoring function can be found by automatically incorporating the evaluation indexes into the search process, eliminating the inconvenience of manually designing scoring functions. In addition, since the scoring function search space can cover all possible scoring function forms, it is favorable for expanding the search range so as to find a better scoring function for the knowledge graph.
By applying the above knowledge graph embedding method and system based on relationship cognition, the relation-aware scoring function can be effectively determined for a given task without multiple rounds of model training, achieving excellent performance and efficiency.
A relationship awareness based knowledge graph embedding method and system according to an exemplary embodiment of the present disclosure has been described above with reference to fig. 1 to 2. However, it should be understood that: the apparatus and systems shown in the figures may each be configured as software, hardware, firmware, or any combination thereof that performs the specified function. For example, the systems and apparatuses may correspond to an application-specific integrated circuit, a pure software code, or a module combining software and hardware. Further, one or more functions implemented by these systems or apparatuses may also be performed collectively by components in a physical entity device (e.g., a processor, a client, or a server, etc.).
Further, the above method may be implemented by instructions recorded on a computer-readable storage medium, for example, according to an exemplary embodiment of the present application, there may be provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the steps of: dividing a plurality of relations in a knowledge graph into a plurality of relation groups, wherein each relation in the knowledge graph only belongs to one relation group; determining a scoring function corresponding to each relationship group based on the division result, and obtaining a scoring function set aiming at the plurality of relationship groups; training an embedded model of the knowledge-graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge-graph using the embedded model.
The instructions stored in the computer-readable storage medium can be executed in an environment deployed in a computer device such as a client, a host, a proxy device, a server, and the like, and it should be noted that the instructions can also be used to perform additional steps other than the above steps or perform more specific processing when the above steps are performed, and the contents of the additional steps and the further processing are mentioned in the description of the related method with reference to fig. 1 to 2, and therefore will not be described again here to avoid repetition.
It should be noted that the relationship-awareness-based knowledge-graph embedding system according to the exemplary embodiments of the present disclosure may rely entirely on the execution of computer programs or instructions to realize its functions; that is, each device corresponds to a step in the functional architecture of the computer program, so that the whole system is invoked through a dedicated software package (e.g., a lib library) to realize the corresponding functions.
On the other hand, when the system and apparatus shown in fig. 1 are implemented in software, firmware, middleware, or microcode, the program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that at least one processor or at least one computing device can perform those operations by reading and executing the corresponding program code or code segments.
For example, according to an exemplary embodiment of the present application, a system may be provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the steps of: dividing a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group; determining a scoring function corresponding to each relationship group based on the division result, and obtaining a set of scoring functions for the plurality of relationship groups; training an embedding model of the knowledge-graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge-graph using the embedding model.
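For the kind of block-structured scoring function the claims make precise — a K×K relationship block matrix whose blocks are each empty or a signed sub-embedding vector of r — a hypothetical evaluator can be written directly. The encoding of the block layout below (`0` for an empty block, `±(k+1)` for ±r_k) is our own illustrative convention, not the claimed representation.

```python
import numpy as np

# Illustrative sketch: the embedding vectors h, r, t are split into K
# sub-vectors, and a K x K "relationship block matrix" assigns either
# nothing or a signed sub-vector of r to each block position. The
# layout passed in is an arbitrary example, not a searched structure.

def block_score(h, r, t, structure):
    """structure[m][n] is 0 (empty block) or ±(k+1), standing for ±r_k."""
    K = len(structure)
    hs, rs, ts = (np.array_split(v, K) for v in (h, r, t))
    total = 0.0
    for m in range(K):
        for n in range(K):
            s = structure[m][n]
            if s != 0:
                k = abs(s) - 1
                # elementwise triple product of h_m, ±r_k and t_n
                total += np.sign(s) * float(np.sum(hs[m] * rs[k] * ts[n]))
    return total
```

With the identity-like layout `[[1, 0], [0, 2]]`, the score reduces to a DistMult-style product over the two sub-vector pairs.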
In particular, the above-described system may be deployed in a server or a client, or on a node in a distributed network environment. Further, the system may be a PC, a tablet device, a personal digital assistant, a smartphone, a web appliance, or any other device capable of executing the above set of instructions. The system may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). In addition, all components of the system may be connected to each other via a bus and/or a network.
The system here need not be a single entity; it can be any collection of devices or circuits capable of executing the above instructions (or instruction sets), individually or jointly. The system may also be part of an integrated control system or a system manager, or may be configured as a portable electronic device that interfaces with a local or remote system (e.g., via wireless transmission).
In the system, the at least one computing device may comprise a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the at least one computing device may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like. The computing device may execute instructions or code stored in one of the storage devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The storage device may be integrated with the computing device, for example by arranging RAM or flash memory within an integrated-circuit microprocessor or the like. Alternatively, the storage device may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage device and the computing device may be operatively coupled, or may communicate with each other, for example through I/O ports or network connections, so that the computing device can read the instructions stored in the storage device.
While exemplary embodiments of the present application have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present application is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present application. Therefore, the protection scope of the present application shall be subject to the scope of the claims.

Claims (10)

1. A relationship awareness-based knowledge graph embedding method, the method comprising:
dividing a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group;
determining a scoring function corresponding to each relationship group based on the division result, and obtaining a set of scoring functions for the plurality of relationship groups;
training an embedding model of the knowledge-graph based on the obtained set of scoring functions; and
obtaining an embedded representation of the knowledge-graph using the embedding model.
2. The method of claim 1, wherein the scoring function is represented as the following expression (1):

f_d(h, r, t) = Σ_{m=1}^{K} Σ_{n=1}^{K} h_m^⊤ diag([g_d(r)]_{mn}) t_n    (1)

wherein the plurality of relationships in the knowledge-graph are divided into the plurality of relationship groups {CU_d}, f_d(h, r, t) is the scoring function corresponding to the relationship group CU_d, 1 ≤ d ≤ D, and D is the number of the plurality of relationship groups;
h, t and r respectively denote the embedding vectors of the head entity h, the tail entity t and the relationship r between h and t in a triplet (h, r, t) of the knowledge-graph, and h, t and r are each divided, in the same division manner, into K sub-embedding vectors h_1 to h_K, r_1 to r_K and t_1 to t_K, where 1 ≤ m ≤ K, 1 ≤ n ≤ K, and K is a positive integer;
g_d(r) is the K×K relationship block matrix corresponding to the scoring function f_d(h, r, t), [g_d(r)]_{mn} denotes the block in the m-th row and n-th column of the relationship block matrix, and each block is of the form

[g_d(r)]_{mn} = c · o_k, with c ∈ C1 ≡ {0, 1},

wherein o_k is the k-th operator in the operator set O ≡ {±r_1, …, ±r_K}, so that each block of g_d(r) is either the zero block or a signed sub-embedding vector of r.
3. The method of claim 2, wherein determining the scoring function corresponding to each relationship group comprises:
determining a corresponding matrix structure of the relationship block matrix corresponding to each scoring function, wherein the matrix structure indicates the distribution of non-zero blocks in the corresponding relationship block matrix; and
determining the corresponding relationship block matrix based on the determined matrix structure, and obtaining the corresponding scoring function based on the determined relationship block matrix.
4. The method according to claim 3, wherein the corresponding matrix structure of the relationship block matrix corresponding to each scoring function is determined based on the following expression (2):

{g_d*} = arg min_{g_d ∈ 𝒢} Σ_{i=1}^{|S_val|} ℒ((h_i*, r_i*, t_i*); S_val^(i))    (2)

s.t. (h*, r*, t*) = arg min_{(h, r, t)} Σ_{j=1}^{|S_tra|} ℒ((h_j, r_j, t_j); S_tra^(j)),

wherein g_d is the matrix structure corresponding to g_d(r), and 𝒢 is a structure search space comprising a plurality of matrix structures;
S_val is a validation set, S_tra is a training set, S_val and S_tra are both subsets of the triplet set of the knowledge-graph, and |S_val| and |S_tra| respectively denote the numbers of triplets in the validation set S_val and the training set S_tra; S_val^(i) denotes the i-th triplet (h_i, r_i, t_i) in the validation set S_val, and (h_i*, r_i*, t_i*) is the embedding vector corresponding to the triplet (h_i, r_i, t_i); S_tra^(j) denotes the j-th triplet (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triplet (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers;
ℒ((h, r, t); S) is a loss function measuring the loss of a given embedding vector (h, r, t) over the corresponding data S;
(h*, r*, t*) are the embedding-vector parameters of the embedding model that, for a matrix structure g_d in the structure search space 𝒢, have the smallest loss on the training set S_tra after the embedding model of the knowledge-graph is trained using the training set S_tra; and
{g_d*} is the set of matrix structures in the structure search space 𝒢 whose scoring functions, for the embedding model based on those embedding-vector parameters, have the smallest loss on the validation set S_val, and each matrix structure in the set respectively indicates the corresponding matrix structure of the relationship block matrix corresponding to each scoring function.
5. The method of claim 2, wherein determining the scoring function corresponding to each relationship group comprises:
determining a corresponding structural weight matrix of the relationship block matrix corresponding to each scoring function, wherein the structural weight matrix indicates the structural weight of each block in the corresponding relationship block matrix; and
determining the corresponding relationship block matrix based on the determined structural weight matrix, and obtaining the corresponding scoring function based on the determined relationship block matrix.
6. The method according to claim 5, wherein the corresponding structural weight matrix of the relationship block matrix corresponding to each scoring function is determined based on the following expression (3):

{A_d*} = arg min_{A_d} Σ_{i=1}^{|S_val|} 𝕀(r_i ∈ CU_d) · ℒ((h_i*, r_i*, t_i*); S_val^(i))    (3)

s.t. (h*, r*, t*) = arg min_{(h, r, t)} Σ_{j=1}^{|S_tra|} 𝕀(r_j ∈ CU_d) · ℒ((h_j, r_j, t_j); S_tra^(j)),

wherein A_d is the structural weight matrix corresponding to the relationship block matrix g_d(r);
S_val is a validation set, S_tra is a training set, S_val and S_tra are both subsets of the triplet set of the knowledge-graph, and |S_val| and |S_tra| respectively denote the numbers of triplets in the validation set S_val and the training set S_tra; S_val^(i) denotes the i-th triplet (h_i, r_i, t_i) in the validation set S_val, and (h_i*, r_i*, t_i*) is the embedding vector corresponding to the triplet (h_i, r_i, t_i); S_tra^(j) denotes the j-th triplet (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triplet (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers;
SP(r, A_d) denotes the relationship block matrix g_d(r) having the structural weights indicated by A_d;
𝕀(r_i ∈ CU_d) indicates whether the relationship r_i belongs to the relationship group CU_d, and 𝕀(r_j ∈ CU_d) indicates whether the relationship r_j belongs to the relationship group CU_d;
ℒ((h, r, t); S) is a loss function measuring the loss of a given embedding vector (h, r, t) over the corresponding data S;
(h*, r*, t*) are the embedding-vector parameters of the embedding model that, based on the structural weight matrix A_d, have the smallest loss on the training set S_tra after the embedding model of the knowledge-graph is trained using the training set S_tra; and
{A_d*} is the set of structural weight matrices for which the scoring functions of the embedding model based on those embedding-vector parameters have the smallest loss on the validation set S_val, and each structural weight matrix in the set respectively corresponds to the relationship block matrix corresponding to each scoring function.
7. The method of claim 1 or 6, wherein the step of dividing the plurality of relationships in the knowledge-graph into a plurality of relationship groups comprises: dividing the plurality of relationships into the plurality of relationship groups using a clustering method.
8. A relationship-awareness-based knowledge-graph embedding system, the system comprising:
a relationship dividing device configured to divide a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group;
a search device configured to determine a scoring function corresponding to each relationship group based on the division result, and obtain a scoring function set for the plurality of relationship groups;
an embedding-model training device configured to train an embedding model of the knowledge-graph based on the obtained set of scoring functions; and
a representation device configured to obtain an embedded representation of the knowledge-graph using the embedding model.
9. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 7.
10. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 7.
CN202010420480.2A 2020-05-18 2020-05-18 Knowledge graph embedding method and system based on relation cognition Pending CN113688249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010420480.2A CN113688249A (en) 2020-05-18 2020-05-18 Knowledge graph embedding method and system based on relation cognition

Publications (1)

Publication Number Publication Date
CN113688249A true CN113688249A (en) 2021-11-23

Family

ID=78575549


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649550A (en) * 2016-10-28 2017-05-10 浙江大学 Joint knowledge embedded method based on cost sensitive learning
KR20180092194A (en) * 2017-02-08 2018-08-17 경북대학교 산학협력단 Method and system for embedding knowledge gragh reflecting logical property of relations, recording medium for performing the method
CN110796254A (en) * 2019-10-30 2020-02-14 南京工业大学 Knowledge graph reasoning method and device, computer equipment and storage medium
US20200065668A1 (en) * 2018-08-27 2020-02-27 NEC Laboratories Europe GmbH Method and system for learning sequence encoders for temporal knowledge graph completion
CN110851614A (en) * 2019-09-09 2020-02-28 中国电子科技集团公司电子科学研究院 Relation prediction deduction method of knowledge graph and dynamic updating method of knowledge graph


Similar Documents

Publication Publication Date Title
CN111602148B (en) Regularized neural network architecture search
US20190278600A1 (en) Tiled compressed sparse matrix format
US20190164084A1 (en) Method of and system for generating prediction quality parameter for a prediction model executed in a machine learning algorithm
US20210150412A1 (en) Systems and methods for automated machine learning
CN111858947A (en) Automatic knowledge graph embedding method and system
WO2020224220A1 (en) Knowledge graph-based question answering method, electronic device, apparatus, and storage medium
CN112905809B (en) Knowledge graph learning method and system
CN110837567A (en) Method and system for embedding knowledge graph
CN112420125A (en) Molecular attribute prediction method and device, intelligent equipment and terminal
CN113377964A (en) Knowledge graph link prediction method, device, equipment and storage medium
US20220383119A1 (en) Granular neural network architecture search over low-level primitives
Pan et al. Context-aware entity typing in knowledge graphs
CN114692889A (en) Meta-feature training model for machine learning algorithm
CN113569018A (en) Question and answer pair mining method and device
JP2022032703A (en) Information processing system
CN113688249A (en) Knowledge graph embedding method and system based on relation cognition
US20240005129A1 (en) Neural architecture and hardware accelerator search
CN113010687B (en) Exercise label prediction method and device, storage medium and computer equipment
CN111506742A (en) Method and system for constructing multivariate relational knowledge base
WO2022249415A1 (en) Information provision device, information provision method, and information provision program
US11609936B2 (en) Graph data processing method, device, and computer program product
US20240004912A1 (en) Hierarchical topic model with an interpretable topic hierarchy
WO2023238258A1 (en) Information provision device, information provision method, and information provision program
CN114328940A (en) Method and system for constructing multivariate relational knowledge base
WO2023009293A9 (en) Antibody competition model using hidden variable affinities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination