CN113688249A - Knowledge graph embedding method and system based on relation cognition - Google Patents
Knowledge graph embedding method and system based on relation cognition
- Publication number
- CN113688249A (application number CN202010420480.2A)
- Authority
- CN
- China
- Prior art keywords
- relationship
- knowledge
- graph
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
Provided are a relation cognition-based knowledge graph embedding method and system, wherein the method comprises the following steps: dividing a plurality of relations in a knowledge graph into a plurality of relation groups, wherein each relation in the knowledge graph belongs to only one relation group; determining a scoring function corresponding to each relation group based on the division result, thereby obtaining a set of scoring functions for the plurality of relation groups; training an embedded model of the knowledge graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge graph using the embedded model.
Description
Technical Field
The present application relates to knowledge graph embedding technology in the field of artificial intelligence, and more particularly, to a knowledge graph embedding method and system based on relationship recognition.
Background
With the rapid development of information network technology, information network data content is growing explosively. Such content is generally characterized by large scale, heterogeneity and a loose organizational structure, which poses challenges for people seeking to effectively acquire information and knowledge. A Knowledge Graph (KG) is a knowledge base organized as a semantic network; it can describe knowledge resources and their carriers using visualization technology, and can mine, analyze, construct, draw and display knowledge and the interrelations among knowledge resources and their carriers.
The knowledge graph is a special graph structure that takes entities as nodes and relations as directed edges, and it has recently attracted wide interest. In a knowledge graph, each edge is represented as a triple (h, r, t) in the form of (head entity, relationship, tail entity), indicating that two entities h (i.e., the head entity) and t (i.e., the tail entity) are connected by a relationship r; for example, (New York, isLocatedIn, USA) may indicate that New York is located in the USA. Many large knowledge graphs have been established over the last decades, such as WordNet, Freebase, DBpedia and YAGO. They improve various downstream applications such as structured search, question answering, and entity recommendation.
In a knowledge graph, one basic problem is how to quantify the similarity of a given triple (h, r, t) so that subsequent applications can be performed. Recently, Knowledge Graph Embedding (KGE) has emerged and developed as a method for this purpose. Knowledge graph embedding aims at finding low-dimensional vector representations (i.e., embeddings) of entities and relationships so that their similarity can be quantified. In particular, given a set of observed facts (i.e., triples), knowledge graph embedding attempts to learn low-dimensional vector representations of the entities and relationships in the triples so that the similarity of the triples can be quantified. This similarity can be measured by a Scoring Function (SF), which, given a relationship, builds a model for measuring the similarity between entities. To construct a knowledge graph embedding model, it is most important to design and select a suitable SF. Since different SFs have their own strengths and weaknesses in capturing similarity, the selection of the SF is crucial to the performance of knowledge graph embedding.
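To make the role of an SF concrete, the sketch below evaluates DistMult, one well-known scoring function (used here purely as an illustration, not as the SF proposed in this application), on made-up toy embeddings; the returned real value is the similarity score of the triple:

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult scoring function: <h, r, t> = sum_i h_i * r_i * t_i.
    Higher scores indicate a more plausible triple (h, r, t)."""
    return float(np.sum(h * r * t))

# Toy 4-dimensional embeddings (illustrative values only).
h = np.array([0.5, 1.0, -0.2, 0.3])
r = np.array([1.0, 0.5, 0.0, 2.0])
t = np.array([0.4, 0.8, 0.1, -0.1])

score = distmult_score(h, r, t)
```

A ranking-based embedding model would then train these vectors so that observed triples score higher than corrupted ones.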
In general, SFs are not task-aware, so it is difficult for them to always obtain optimal performance across various data sets. Pioneering work in designing task-dependent SFs is the automatic scoring function (AutoSF), which uses automated machine learning techniques to specify an SF for a given task. In this way, AutoSF has become the state-of-the-art SF in knowledge graph embedding.
However, neither the performance nor the efficiency of AutoSF is as good as expected. First, an SF should also be relationship-aware, since different relationships exhibit different patterns. Furthermore, a single training of the model is already costly, whereas AutoSF requires hundreds or even thousands of trainings. Therefore, there is a need for a method that can efficiently determine a relationship-aware SF for a given task without multiple rounds of model training, thereby improving both performance and efficiency.
Disclosure of Invention
According to an embodiment of the invention, a knowledge graph embedding method based on relationship cognition is provided, and the method comprises the following steps: dividing a plurality of relations in a knowledge graph into a plurality of relation groups, wherein each relation in the knowledge graph belongs to only one relation group; determining a scoring function corresponding to each relationship group based on the division result, thereby obtaining a set of scoring functions for the plurality of relationship groups; training an embedded model of the knowledge graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge graph using the embedded model.
Alternatively, the scoring function may be expressed as the following expression (1):

    f_d(h, r, t) = Σ_{m=1}^{K} Σ_{n=1}^{K} ⟨h_m, a^{(m,n)}, t_n⟩    (1)

wherein the plurality of relationships in the knowledge graph are divided into a plurality of relationship groups {CU_d}; f_d(h, r, t) is the scoring function corresponding to the relationship group CU_d, where 1 ≤ d ≤ D and D is the number of the plurality of relationship groups; h, t and r respectively represent the embedding vectors of the head entity h, the tail entity t and the relationship r between h and t in a triple (h, r, t) of the knowledge graph; h, t and r are each divided, in the same manner, into K sub-embedding vectors h_1 to h_K, r_1 to r_K and t_1 to t_K, where 1 ≤ m ≤ K, 1 ≤ n ≤ K and K is a positive integer; g_d(r) is the K×K relationship block matrix corresponding to the scoring function f_d(h, r, t); a^{(m,n)} is the block in the m-th row and n-th column of the relationship block matrix, expressed as a^{(m,n)} = Σ_{k=1}^{2K+1} c_k^{(m,n)} o_k with c_k^{(m,n)} ∈ C1 ≡ {0, 1}; and o_k is the k-th operator in the operator set O = {0, r_1, …, r_K, −r_1, …, −r_K}, 1 ≤ k ≤ 2K+1.
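A minimal sketch of evaluating such a block-structured scoring function, assuming each block holds either the zero operator or ±r_k (the integer encoding of the structure and the function name are illustrative assumptions, not notation from this application):

```python
import numpy as np

def block_score(h, r, t, structure, K):
    """Evaluate a block-structured SF: sum over blocks of <h_m, o, t_n>.
    structure[m][n] encodes the operator of block (m, n):
    0 -> zero operator, s in 1..K -> +r_s, s in K+1..2K -> -r_{s-K}."""
    hs, rs, ts = np.split(h, K), np.split(r, K), np.split(t, K)
    score = 0.0
    for m in range(K):
        for n in range(K):
            s = structure[m][n]
            if s == 0:
                continue  # zero block contributes nothing
            op = rs[s - 1] if s <= K else -rs[s - K - 1]
            score += float(np.sum(hs[m] * op * ts[n]))
    return score

# K = 2, dimension 4: the diagonal structure [[1, 0], [0, 2]]
# recovers a DistMult-like SF (block (m, m) uses operator r_m).
h = np.array([1.0, 2.0, 3.0, 4.0])
r = np.ones(4)
t = np.ones(4)
s = block_score(h, r, t, [[1, 0], [0, 2]], K=2)
```

With all-ones r and t this diagonal structure reduces to summing the entries of h, which makes the block bookkeeping easy to check by hand.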
Optionally, the step of determining a scoring function corresponding to each relationship group may comprise: determining a matrix structure of the relationship block matrix corresponding to each scoring function, wherein the matrix structure indicates the distribution of non-zero blocks in the corresponding relationship block matrix; determining the corresponding relationship block matrix based on the determined matrix structure; and obtaining the corresponding scoring function based on the determined relationship block matrix.
Alternatively, the matrix structure of the relationship block matrix corresponding to each scoring function may be determined based on the following expression (2):

    {g_d*} = argmin_{g_d ∈ 𝒢} (1/|S_val|) Σ_{i=1}^{|S_val|} ℒ(h_i, r_i, t_i; S_val^(i))    (2)
    s.t.  X* = argmin_X (1/|S_tra|) Σ_{j=1}^{|S_tra|} ℒ(h_j, r_j, t_j; S_tra^(j))

wherein g_d is the matrix structure corresponding to g_d(r); 𝒢 is a structure search space comprising a plurality of matrix structures; S_val is a verification set, S_tra is a training set, and S_val and S_tra are subsets of the triple set of the knowledge graph; |S_val| and |S_tra| respectively represent the numbers of triples in the verification set S_val and the training set S_tra; S_val^(i) represents the i-th triple (h_i, r_i, t_i) in the verification set S_val, and (h_i, r_i, t_i) is the embedding vector corresponding to the triple (h_i, r_i, t_i); S_tra^(j) represents the j-th triple (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triple (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers; ℒ is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S; X* is the embedding vector parameter that, based on a matrix structure g_d in the structure search space 𝒢 and obtained by model training of the embedded model of the knowledge graph with the training set S_tra, minimizes the loss of the embedded model on the training set S_tra; and {g_d*} is the set of matrix structures in the structure search space 𝒢 for which the embedded model based on the embedding vector parameter X* has the smallest loss on the verification set S_val, each matrix structure in the set respectively indicating the matrix structure of the relationship block matrix corresponding to each scoring function.
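The bilevel search above reduces, in schematic form, to an outer loop over candidate structures: each candidate is used to fit embedding parameters on the training set (the inner problem), and the candidate whose fitted model has the smallest verification loss is kept. A sketch with hypothetical callables standing in for model training and evaluation:

```python
def search_structures(candidates, train_fn, val_loss_fn):
    """Outer loop of a bilevel structure search (schematic).
    candidates  : iterable of candidate matrix structures.
    train_fn    : structure -> trained embedding parameters (inner problem).
    val_loss_fn : (structure, params) -> loss on the verification set.
    Returns the candidate with the smallest verification loss."""
    best, best_loss = None, float("inf")
    for g in candidates:
        params = train_fn(g)           # fit X* on the training set
        loss = val_loss_fn(g, params)  # score on the verification set
        if loss < best_loss:
            best, best_loss = g, loss
    return best
```

Enumerating the space this way costs one model training per candidate, which is exactly the expense the weighted relaxation of expression (3) is meant to avoid.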
Optionally, the step of determining a scoring function corresponding to each relationship group may comprise: determining a structural weight matrix of the relationship block matrix corresponding to each scoring function, wherein the structural weight matrix indicates the structural weight of each block in the corresponding relationship block matrix; determining the corresponding relationship block matrix based on the determined structural weight matrix; and obtaining the corresponding scoring function based on the determined relationship block matrix.
Alternatively, the structural weight matrix of the relationship block matrix corresponding to each scoring function may be determined based on the following expression (3):

    {A_d*} = argmin_{{A_d}} (1/|S_val|) Σ_{i=1}^{|S_val|} Σ_{d=1}^{D} I(r_i ∈ CU_d) ℒ(h_i, SP(r_i, A_d), t_i; S_val^(i))    (3)
    s.t.  X* = argmin_X (1/|S_tra|) Σ_{j=1}^{|S_tra|} Σ_{d=1}^{D} I(r_j ∈ CU_d) ℒ(h_j, SP(r_j, A_d), t_j; S_tra^(j))

wherein A_d represents the structural weight matrix corresponding to the relationship block matrix g_d(r); S_val is a verification set, S_tra is a training set, and S_val and S_tra are subsets of the triple set of the knowledge graph; |S_val| and |S_tra| respectively represent the numbers of triples in the verification set S_val and the training set S_tra; S_val^(i) represents the i-th triple (h_i, r_i, t_i) in the verification set S_val, and (h_i, r_i, t_i) is the embedding vector corresponding to the triple (h_i, r_i, t_i); S_tra^(j) represents the j-th triple (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triple (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers; SP(r, A_d) represents the relationship block matrix g_d(r) having the structural weights indicated by A_d; I(r_i ∈ CU_d) indicates whether r_i belongs to the relationship group CU_d, and I(r_j ∈ CU_d) indicates whether r_j belongs to the relationship group CU_d; ℒ is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S; X* is the embedding vector parameter that, based on the structural weight matrices {A_d} and obtained by model training of the embedded model of the knowledge graph with the training set S_tra, minimizes the loss of the embedded model on the training set S_tra; and {A_d*} is the set of structural weight matrices for which the embedded model based on the embedding vector parameter X* has the smallest loss on the verification set S_val, each structural weight matrix in the set respectively corresponding to the relationship block matrix corresponding to each scoring function.
Optionally, the step of dividing the plurality of relationships in the knowledge-graph into a plurality of relationship groups may comprise: the plurality of relationships are partitioned into a plurality of relationship groups using a clustering method.
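One simple way to realize the clustering step is plain k-means over the relation embeddings, so that each relation lands in exactly one group CU_d. The sketch below is a self-contained toy implementation (the choice of k-means specifically, and all numeric values, are assumptions for illustration):

```python
import numpy as np

def cluster_relations(rel_embeddings, D, iters=20, seed=0):
    """Partition |R| relation embeddings into D disjoint groups
    with a basic k-means loop; returns one group index per relation."""
    rng = np.random.default_rng(seed)
    X = np.asarray(rel_embeddings, dtype=float)
    centers = X[rng.choice(len(X), size=D, replace=False)]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each relation to its nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # recompute centers, keeping the old center for empty groups
        for d in range(D):
            if (assign == d).any():
                centers[d] = X[assign == d].mean(axis=0)
    return assign

# Two well-separated clusters of toy relation embeddings in 2-D.
rels = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
groups = cluster_relations(rels, D=2)
```

Any other assignment method that makes the groups disjoint would serve the same purpose.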
Alternatively, {A_d*} may be determined based on expression (3) by the following operations: determining an initial embedding vector X = {h, r, t}, a set of structural weight matrices {A_d} and a relationship group assignment indication {I(r_t ∈ CU_d)}, wherein 1 ≤ t ≤ |R| and t is an integer, R is the set of relationships of the knowledge graph, |R| represents the number of relationships of the knowledge graph, and I(r_t ∈ CU_d) indicates whether a relationship r_t in the knowledge graph belongs to the relationship group CU_d; updating {A_d} by performing at least one iterative operation based on the initial embedding vector X = {h, r, t}, the set of structural weight matrices {A_d} and the relationship group assignment indication; and determining the discrete samples {Ā_d} of {A_d} used in the last iterative operation as {A_d*}. Each iterative operation may include the following operations: sampling the set of structural weight matrices {A_d} to obtain a set of discrete structural weight matrices {Ā_d}; updating the embedding vector X = {h, r, t} based on the obtained {Ā_d}; updating the relationship group assignment indication; and updating {A_d}.
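The alternating loop above can be sketched abstractly as follows, with the four per-iteration steps as callable placeholders (all names are illustrative; the actual update rules are given by expressions (4) to (6)):

```python
def alternating_search(A_list, X, sample_fn, update_X_fn, regroup_fn,
                       update_A_fn, num_iters):
    """One pass of the iterative procedure: sample discrete structures,
    update embeddings on training data, re-assign relation groups, then
    update the continuous structure weights on verification data."""
    groups = None
    for _ in range(num_iters):
        A_bar = sample_fn(A_list)                # discrete sample {A_d}-bar
        X = update_X_fn(X, A_bar)                # training-set step
        groups = regroup_fn(X)                   # re-cluster relations
        A_list = update_A_fn(A_list, X, groups)  # verification-set step
    return A_list, sample_fn(A_list)             # weights + final sample
```

Because the structure weights are updated continuously and only sampled discretely, a single training run suffices instead of one run per candidate structure.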
Optionally, the step of sampling the set of structural weight matrices {A_d} may comprise: with B_d determined by expression (4) as the sampling probability, repeatedly sampling A_d to obtain a set of discrete structural weight matrices {Ā_d} satisfying the constraint ā^{(m,n)} ∈ C2:

    B_d^{(m,n)} = exp(A_d^{(m,n)}/τ) / (1^T exp(A_d^{(m,n)}/τ))    (4)

wherein τ ∈ [0, 1] is a preset hyper-parameter, A_d^{(m,n)} denotes the weights of the block in the m-th row and n-th column over the operator set O, 1 indicates a column vector of all ones, and C2 = {a^{(m,n)} | ‖a^{(m,n)}‖_0 = 1}.
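Under one plausible reading of expression (4) (a temperature-scaled softmax over each block's operator weights; this specific form is an assumption), sampling one operator per block so that ‖a^{(m,n)}‖_0 = 1 can be sketched as:

```python
import numpy as np

def sample_block(weights, tau, rng):
    """Sample one operator index for a single block.
    softmax(weights / tau) gives the sampling probabilities; exactly one
    operator is kept, so the sampled block satisfies ||a||_0 = 1 (C2)."""
    w = np.asarray(weights, dtype=float)
    p = np.exp(w / tau)
    p /= p.sum()
    return int(rng.choice(len(w), p=p))

rng = np.random.default_rng(0)
# 2K+1 = 5 operator weights for one block; a small tau sharpens the
# distribution toward the largest weight.
idx = sample_block([0.1, 3.0, 0.2, 0.1, 0.0], tau=0.1, rng=rng)
```

Repeating this over all K×K blocks of every A_d yields one discrete structure sample {Ā_d}.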
Optionally, the step of updating the embedding vector X = {h, r, t} based on the obtained {Ā_d} may comprise: selecting a predetermined number of triples from the training set S_tra as a mini-batch set B_tra; and updating the embedding vector X = {h, r, t} according to the following expression (5):

    X ← X − η ∇_X (1/|B_tra|) Σ_{j'=1}^{|B_tra|} Σ_{d=1}^{D} I(r_{j'} ∈ CU_d) ℒ(h_{j'}, SP(r_{j'}, Ā_d), t_{j'}; B_tra^(j'))    (5)

wherein η is a preset step size, |B_tra| denotes the number of triples in the mini-batch set B_tra, B_tra^(j') represents the j'-th triple (h_{j'}, r_{j'}, t_{j'}) in the mini-batch set B_tra, (h_{j'}, r_{j'}, t_{j'}) is the embedding vector corresponding to the triple (h_{j'}, r_{j'}, t_{j'}), I(r_{j'} ∈ CU_d) indicates whether r_{j'} belongs to the relationship group CU_d, 1 ≤ j' ≤ |B_tra|, and j' is an integer.
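The update in expression (5) is a standard mini-batch gradient step on the embedding parameters. A sketch with a toy loss standing in for ℒ (the quadratic loss and all values below are illustrative only):

```python
import numpy as np

def sgd_update_embeddings(X, grad_fn, batch, eta):
    """One mini-batch SGD step: move the embedding parameters X
    against the averaged per-example gradient with step size eta."""
    grads = [grad_fn(X, example) for example in batch]
    return X - eta * np.mean(grads, axis=0)

# Toy quadratic loss ||X - target||^2 per example; gradient is 2(X - target).
X = np.array([1.0, 1.0])
batch = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
grad_fn = lambda X, target: 2.0 * (X - target)
X_new = sgd_update_embeddings(X, grad_fn, batch, eta=0.1)
```

Here the two example gradients cancel, so X is already stationary for the averaged objective and the step leaves it unchanged.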
Optionally, the step of updating the relationship group assignment indication may comprise: assigning the plurality of relationships in the knowledge graph to different relationship groups using a clustering method; and updating the relationship group assignment indication based on the assignment result.
Alternatively, the step of updating {A_d} may comprise: selecting a predetermined number of triples from the verification set S_val as a mini-batch set B_val; and updating each matrix in the set {A_d} according to the following expression (6):

    A_d ← A_d − ε ∇_{A_d} (1/|B_val|) Σ_{i'=1}^{|B_val|} I(r_{i'} ∈ CU_d) ℒ(h_{i'}, SP(r_{i'}, A_d), t_{i'}; B_val^(i'))    (6)

wherein ε is a preset step size, |B_val| denotes the number of triples in the mini-batch set B_val, B_val^(i') represents the i'-th triple (h_{i'}, r_{i'}, t_{i'}) in the mini-batch set B_val, (h_{i'}, r_{i'}, t_{i'}) is the embedding vector corresponding to the triple (h_{i'}, r_{i'}, t_{i'}), I(r_{i'} ∈ CU_d) indicates whether r_{i'} belongs to the relationship group CU_d, 1 ≤ i' ≤ |B_val|, and i' is an integer.
According to another embodiment of the present invention, there is provided a relationship-awareness-based knowledge-graph embedding system, including: a relationship dividing means configured to divide a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group; a search device configured to determine a scoring function corresponding to each relationship group based on the division result, and obtain a scoring function set for the plurality of relationship groups; an embedded model training device configured to train an embedded model of the knowledge-graph based on the obtained set of scoring functions; and a representation means configured to obtain an embedded representation of the knowledge-graph using the embedded model.
Alternatively, the scoring function may be expressed as the following expression (1):

    f_d(h, r, t) = Σ_{m=1}^{K} Σ_{n=1}^{K} ⟨h_m, a^{(m,n)}, t_n⟩    (1)

wherein the plurality of relationships in the knowledge graph are divided into a plurality of relationship groups {CU_d}; f_d(h, r, t) is the scoring function corresponding to the relationship group CU_d, where 1 ≤ d ≤ D and D is the number of the plurality of relationship groups; h, t and r respectively represent the embedding vectors of the head entity h, the tail entity t and the relationship r between h and t in a triple (h, r, t) of the knowledge graph; h, t and r are each divided, in the same manner, into K sub-embedding vectors h_1 to h_K, r_1 to r_K and t_1 to t_K, where 1 ≤ m ≤ K, 1 ≤ n ≤ K and K is a positive integer; g_d(r) is the K×K relationship block matrix corresponding to the scoring function f_d(h, r, t); a^{(m,n)} is the block in the m-th row and n-th column of the relationship block matrix, expressed as a^{(m,n)} = Σ_{k=1}^{2K+1} c_k^{(m,n)} o_k with c_k^{(m,n)} ∈ C1 ≡ {0, 1}; and o_k is the k-th operator in the operator set O = {0, r_1, …, r_K, −r_1, …, −r_K}, 1 ≤ k ≤ 2K+1.
Alternatively, the search means may be configured to determine the scoring function corresponding to each relationship group by: determining a matrix structure of the relationship block matrix corresponding to each scoring function, wherein the matrix structure indicates the distribution of non-zero blocks in the corresponding relationship block matrix; determining the corresponding relationship block matrix based on the determined matrix structure; and obtaining the corresponding scoring function based on the determined relationship block matrix.
Optionally, the search means is configured to determine the matrix structure of the relationship block matrix corresponding to each scoring function based on the following expression (2):

    {g_d*} = argmin_{g_d ∈ 𝒢} (1/|S_val|) Σ_{i=1}^{|S_val|} ℒ(h_i, r_i, t_i; S_val^(i))    (2)
    s.t.  X* = argmin_X (1/|S_tra|) Σ_{j=1}^{|S_tra|} ℒ(h_j, r_j, t_j; S_tra^(j))

wherein g_d is the matrix structure corresponding to g_d(r); 𝒢 is a structure search space comprising a plurality of matrix structures; S_val is a verification set, S_tra is a training set, and S_val and S_tra are subsets of the triple set of the knowledge graph; |S_val| and |S_tra| respectively represent the numbers of triples in the verification set S_val and the training set S_tra; S_val^(i) represents the i-th triple (h_i, r_i, t_i) in the verification set S_val, and (h_i, r_i, t_i) is the embedding vector corresponding to the triple (h_i, r_i, t_i); S_tra^(j) represents the j-th triple (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triple (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers; ℒ is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S; X* is the embedding vector parameter that, based on a matrix structure g_d in the structure search space 𝒢 and obtained by model training of the embedded model of the knowledge graph with the training set S_tra, minimizes the loss of the embedded model on the training set S_tra; and {g_d*} is the set of matrix structures in the structure search space 𝒢 for which the embedded model based on the embedding vector parameter X* has the smallest loss on the verification set S_val, each matrix structure in the set respectively indicating the matrix structure of the relationship block matrix corresponding to each scoring function.
Alternatively, the search means may be configured to determine the scoring function corresponding to each relationship group by: determining a structural weight matrix of the relationship block matrix corresponding to each scoring function, wherein the structural weight matrix indicates the structural weight of each block in the corresponding relationship block matrix; determining the corresponding relationship block matrix based on the determined structural weight matrix; and obtaining the corresponding scoring function based on the determined relationship block matrix.
Alternatively, the search means may be configured to determine the structural weight matrix of the relationship block matrix corresponding to each scoring function based on the following expression (3):

    {A_d*} = argmin_{{A_d}} (1/|S_val|) Σ_{i=1}^{|S_val|} Σ_{d=1}^{D} I(r_i ∈ CU_d) ℒ(h_i, SP(r_i, A_d), t_i; S_val^(i))    (3)
    s.t.  X* = argmin_X (1/|S_tra|) Σ_{j=1}^{|S_tra|} Σ_{d=1}^{D} I(r_j ∈ CU_d) ℒ(h_j, SP(r_j, A_d), t_j; S_tra^(j))

wherein A_d represents the structural weight matrix corresponding to the relationship block matrix g_d(r); S_val is a verification set, S_tra is a training set, and S_val and S_tra are subsets of the triple set of the knowledge graph; |S_val| and |S_tra| respectively represent the numbers of triples in the verification set S_val and the training set S_tra; S_val^(i) represents the i-th triple (h_i, r_i, t_i) in the verification set S_val, and (h_i, r_i, t_i) is the embedding vector corresponding to the triple (h_i, r_i, t_i); S_tra^(j) represents the j-th triple (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triple (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers; SP(r, A_d) represents the relationship block matrix g_d(r) having the structural weights indicated by A_d; I(r_i ∈ CU_d) indicates whether r_i belongs to the relationship group CU_d, and I(r_j ∈ CU_d) indicates whether r_j belongs to the relationship group CU_d; ℒ is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S; X* is the embedding vector parameter that, based on the structural weight matrices {A_d} and obtained by model training of the embedded model of the knowledge graph with the training set S_tra, minimizes the loss of the embedded model on the training set S_tra; and {A_d*} is the set of structural weight matrices for which the embedded model based on the embedding vector parameter X* has the smallest loss on the verification set S_val, each structural weight matrix in the set respectively corresponding to the relationship block matrix corresponding to each scoring function.
Alternatively, the relationship dividing means may be configured to divide the plurality of relationships into a plurality of relationship groups using a clustering method.
Alternatively, the search means may be configured to determine {A_d*} based on expression (3) by the following operations: determining an initial embedding vector X = {h, r, t}, a set of structural weight matrices {A_d} and a relationship group assignment indication {I(r_t ∈ CU_d)}, wherein 1 ≤ t ≤ |R| and t is an integer, R is the set of relationships of the knowledge graph, |R| represents the number of relationships of the knowledge graph, and I(r_t ∈ CU_d) indicates whether a relationship r_t in the knowledge graph belongs to the relationship group CU_d; updating {A_d} by performing at least one iterative operation based on the initial embedding vector X = {h, r, t}, the set of structural weight matrices {A_d} and the relationship group assignment indication; and determining the discrete samples {Ā_d} of {A_d} used in the last iterative operation as {A_d*}. Each iterative operation may include the following operations: sampling, by the search means, the set of structural weight matrices {A_d} to obtain a set of discrete structural weight matrices {Ā_d}; updating, by the search means, the embedding vector X = {h, r, t} based on the obtained {Ā_d}; updating, by the relationship dividing means, the relationship group assignment indication; and updating, by the search means, {A_d}.
Alternatively, the search means may be configured to sample the set of structural weight matrices {A_d} by: with B_d determined by expression (4) as the sampling probability, repeatedly sampling A_d to obtain a set of discrete structural weight matrices {Ā_d} satisfying the constraint ā^{(m,n)} ∈ C2:

    B_d^{(m,n)} = exp(A_d^{(m,n)}/τ) / (1^T exp(A_d^{(m,n)}/τ))    (4)

wherein τ ∈ [0, 1] is a preset hyper-parameter, A_d^{(m,n)} denotes the weights of the block in the m-th row and n-th column over the operator set O, 1 indicates a column vector of all ones, and C2 = {a^{(m,n)} | ‖a^{(m,n)}‖_0 = 1}.
Alternatively, the search means may be configured to update the embedding vector X = {h, r, t} by: selecting a predetermined number of triples from the training set S_tra as a mini-batch set B_tra; and updating the embedding vector X = {h, r, t} according to the following expression (5):

    X ← X − η ∇_X (1/|B_tra|) Σ_{j'=1}^{|B_tra|} Σ_{d=1}^{D} I(r_{j'} ∈ CU_d) ℒ(h_{j'}, SP(r_{j'}, Ā_d), t_{j'}; B_tra^(j'))    (5)

wherein η is a preset step size, |B_tra| denotes the number of triples in the mini-batch set B_tra, B_tra^(j') represents the j'-th triple (h_{j'}, r_{j'}, t_{j'}) in the mini-batch set B_tra, (h_{j'}, r_{j'}, t_{j'}) is the embedding vector corresponding to the triple (h_{j'}, r_{j'}, t_{j'}), I(r_{j'} ∈ CU_d) indicates whether r_{j'} belongs to the relationship group CU_d, 1 ≤ j' ≤ |B_tra|, and j' is an integer.
Alternatively, the relationship partitioning apparatus may be configured to update the relationship group assignment indication by: assigning the plurality of relationships in the knowledge graph to different relationship groups using a clustering method; and updating the relationship group assignment indication based on the assignment result.
Alternatively, the search apparatus may be configured to update {A_d} by: selecting a predetermined number of triples from the verification set S_val as a mini-batch set B_val; and updating each matrix in the set {A_d} according to the following expression (6):

    A_d ← A_d − ε ∇_{A_d} (1/|B_val|) Σ_{i'=1}^{|B_val|} I(r_{i'} ∈ CU_d) ℒ(h_{i'}, SP(r_{i'}, A_d), t_{i'}; B_val^(i'))    (6)

wherein ε is a preset step size, |B_val| denotes the number of triples in the mini-batch set B_val, B_val^(i') represents the i'-th triple (h_{i'}, r_{i'}, t_{i'}) in the mini-batch set B_val, (h_{i'}, r_{i'}, t_{i'}) is the embedding vector corresponding to the triple (h_{i'}, r_{i'}, t_{i'}), I(r_{i'} ∈ CU_d) indicates whether r_{i'} belongs to the relationship group CU_d, 1 ≤ i' ≤ |B_val|, and i' is an integer.
According to another embodiment of the present invention, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the aforementioned relationship-awareness based knowledge-graph embedding method.
According to another embodiment of the present invention, there is provided a system comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the aforementioned relationship-awareness based knowledge-graph embedding method.
Advantageous effects
By applying the knowledge graph embedding method and system based on relationship cognition according to the exemplary embodiments of the present invention, a relationship-aware SF can be determined efficiently for a given task without multiple rounds of model training, achieving excellent results in terms of both performance and efficiency.
Drawings
These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a block diagram illustrating a relationship awareness based knowledge-graph embedding system according to an exemplary embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a relationship awareness based knowledge-graph embedding method according to an exemplary embodiment of the present disclosure.
Detailed Description
In order that those skilled in the art will better understand the present invention, exemplary embodiments thereof will be described in further detail below with reference to the accompanying drawings and detailed description.
Before starting the description of the inventive concept below, for the sake of understanding, the various parameters and their expressions used in the present application will be explained first:
vectors are represented by lower case bold and matrices by upper case bold.
For a knowledge graph, its sets of entities and relationships are represented by ε and R, respectively. A triple in the knowledge graph is represented by (h, r, t), where h ∈ ε and t ∈ ε are the head and tail entities, respectively, and r ∈ R is the relationship.
The parameters of the knowledge graph embedding model are denoted as e (for each entity) and r (for each relationship). For simplicity, in the following, an embedding (sometimes also referred to as an embedding vector) is represented by the bold-faced form of the corresponding parameter; e.g., h is the embedding of h.
⟨a, b, c⟩ is a (generalized) dot product; for real-valued vectors it equals Σ_i a_i b_i c_i, whereas for complex-valued vectors it is the Hermitian product.
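A small sketch of this triple product, with the complex case handled as Re(Σ_i a_i b_i conj(c_i)) — the convention used by ComplEx-style models, and an assumption here as to which Hermitian form is meant:

```python
import numpy as np

def triple_product(a, b, c):
    """<a, b, c>: sum_i a_i*b_i*c_i for real vectors; for complex
    vectors, the real part of sum_i a_i*b_i*conj(c_i) (one common
    KGE convention; assumed here)."""
    if any(np.iscomplexobj(v) for v in (a, b, c)):
        return float(np.real(np.sum(a * b * np.conj(c))))
    return float(np.sum(a * b * c))

v = triple_product(np.array([1.0, 2.0]),
                   np.array([3.0, 4.0]),
                   np.array([5.0, 6.0]))
```

For real inputs this is exactly the DistMult-style score mentioned earlier; the complex branch only differs in conjugating the tail embedding.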
furthermore, in the context of the present disclosure, parameters having the same expression have the same definition.
f (h, r, t) is a scoring function that returns a real number value reflecting the similarity of the triples (h, r, t), with higher scores representing more similarity.
Fig. 1 is a block diagram illustrating a relationship awareness based knowledge-graph embedding system 100 according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, the knowledge-graph embedding system 100 based on relationship recognition may include a relationship dividing means 110, a searching means 120, an embedding model training means 130, and a representing means 140.
The relationship dividing apparatus 110 according to an exemplary embodiment of the present disclosure may divide the plurality of relationships in the knowledge graph into a plurality of relationship groups such that each relationship in the knowledge graph belongs to only one relationship group.
In an exemplary embodiment of the present invention, the relationship dividing apparatus 110 may divide the plurality of relationships in the knowledge graph into a plurality of relationship groups using a clustering method. However, it should be understood that the method of dividing the plurality of relationships in the knowledge graph is not limited thereto, and any other suitable method may be used to divide the relationships.
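By way of example only, one such clustering-based division may be sketched as follows; plain k-means over (hypothetical) relation embedding vectors is an illustrative choice, not the specific method mandated by the present application:

```python
import numpy as np

def divide_relations(rel_embeddings, num_groups, n_iters=20, seed=0):
    """Assign each relation to exactly one relationship group via k-means.

    rel_embeddings: (|R|, dim) array, one row per relation.
    Returns an array of |R| group indices in [0, num_groups).
    """
    rng = np.random.default_rng(seed)
    init = rng.choice(len(rel_embeddings), size=num_groups, replace=False)
    centers = rel_embeddings[init].astype(float)
    labels = np.zeros(len(rel_embeddings), dtype=int)
    for _ in range(n_iters):
        # hard assignment: each relation goes to its nearest group center
        dists = np.linalg.norm(rel_embeddings[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # recompute centers; keep the old center if a group went empty
        for d in range(num_groups):
            if np.any(labels == d):
                centers[d] = rel_embeddings[labels == d].mean(axis=0)
    return labels
```

Each relation receives exactly one group index, so each relation belongs to only one relationship group, as required above.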
Hereinafter, for ease of understanding, the plurality of relationship groups may be represented as {CU_d}, where CU_d represents the d-th relationship group and CU_d ⊆ R for any d, R is the set of relationships in the knowledge graph, D is greater than or equal to 1 and less than or equal to D, and D is the number of the plurality of relationship groups.
The search means 120 may determine a scoring function corresponding to each relationship group based on the division result, and obtain a set of scoring functions for the plurality of relationship groups.
In an exemplary embodiment of the present invention, the form of the scoring function may be expressed as follows:

f_d(h, r, t) = Σ_{m=1}^{K} Σ_{n=1}^{K} ⟨h_m, g_d^(m,n)(r), t_n⟩    (1)
Here, f_d(h, r, t) is the scoring function corresponding to the relationship group CU_d, and h, t and r respectively represent the embedding vectors of the head entity h, the tail entity t and the relationship r between h and t in the triple (h, r, t) of the knowledge graph. h, t and r in expression (1) can each be divided, in the same division manner, into K sub-embedding vectors h_1 to h_K, r_1 to r_K and t_1 to t_K, where 1 ≤ m ≤ K, 1 ≤ n ≤ K, and K is a positive integer. g_d(r) is the K×K relational block matrix corresponding to the scoring function f_d(h, r, t), g_d^(m,n)(r) denotes the block in the m-th row and n-th column of the relational block matrix, and g_d^(m,n)(r) ∈ {0, ±r_1, ..., ±r_K}. C_1 ≡ {0, 1}, and o_k is the k-th operator in the operator set O = {0, ±r_1, ..., ±r_K}, 1 ≤ k ≤ 2K+1.
In the embodiment of the present invention, being divided in the same division manner means that, among the K sub-embedding vectors h_1 to h_K, r_1 to r_K and t_1 to t_K obtained by dividing the embedding vectors h, r and t, the corresponding sub-embedding vectors have the same dimension, i.e., h_1, r_1 and t_1 have the same dimension, h_2, r_2 and t_2 have the same dimension, and so on. Furthermore, in embodiments of the present invention, the embedding vectors h, r and t may be partitioned uniformly (i.e., every sub-embedding vector has the same dimension, e.g., the sub-embedding vectors h_1 to h_K all have the same dimension) or non-uniformly (i.e., the dimensions of the individual sub-embedding vectors are not all the same, e.g., the dimensions of the sub-embedding vectors h_1 to h_K are not all the same).
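By way of example only, a uniform partition and the resulting block-based score of expression (1) may be sketched as follows; the integer encoding of the block operators (0 for a zero block, ±k for ±r_k) is an illustrative assumption:

```python
import numpy as np

def block_score(h, r, t, structure, K):
    """Score f_d(h, r, t) = sum_{m,n} <h_m, g^(m,n)(r), t_n>.

    h, r, t: embedding vectors of equal dimension, split uniformly into
    K sub-vectors.  `structure` is a K x K integer matrix: entry
    s = structure[m][n] selects the operator of block (m, n):
    0 -> zero block, +k -> +r_k, -k -> -r_k (1-indexed k).
    """
    h_parts = np.split(h, K)
    r_parts = np.split(r, K)
    t_parts = np.split(t, K)
    score = 0.0
    for m in range(K):
        for n in range(K):
            s = structure[m][n]
            if s == 0:
                continue  # zero block contributes nothing
            sign = 1.0 if s > 0 else -1.0
            # triple dot product <h_m, ±r_k, t_n>
            score += sign * np.sum(h_parts[m] * r_parts[abs(s) - 1] * t_parts[n])
    return score
```

For instance, a diagonal structure [[1, 0], [0, 2]] with K = 2 scores only the (h_1, r_1, t_1) and (h_2, r_2, t_2) blocks.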
As can be seen from the above representation of the scoring functions, the main difference between the relational block matrices g_d(r) of different scoring functions is the distribution of the sub-embedding vectors ±r_1, ..., ±r_K. Thus, after the embedding vectors h, r and t are partitioned, a plurality of candidate scoring functions can be designed based on the distribution of the non-zero blocks (i.e., the sub-embedding vectors ±r_1, ..., ±r_K) in the K×K block matrix g_d(r), thereby constituting a scoring function search space. The process by which the search means 120 determines the scoring function corresponding to each relationship group may then be a process of searching the scoring function search space for the scoring function corresponding to each relationship group.
Therefore, in the embodiment of the present invention, with the above method of constructing the scoring function search space, searching the scoring function search space for a suitable scoring function can actually be converted into finding, for the relational block matrix g_d(r) corresponding to the scoring function, a suitable matrix structure ḡ_d that indicates the distribution of the non-zero blocks in g_d(r) (i.e., the distribution of the sub-embedding vectors ±r_1, ..., ±r_K in the block matrix). The relational block matrix g_d(r) may then be determined based on the matrix structure ḡ_d, and the corresponding scoring function may be obtained from the determined g_d(r).
Merely by way of example, in an exemplary embodiment of the present invention, the search apparatus 120 may determine the corresponding matrix structures of the relational block matrices corresponding to the respective scoring functions using the following expression (2):

{ḡ_d*} = arg min_{ḡ_d ∈ 𝒢} Σ_{i=1}^{|S_val|} L(h_i*, r_i*, t_i*; S_val^(i))    (2)

s.t. (h*, r*, t*) = arg min_{(h, r, t)} Σ_{j=1}^{|S_tra|} L(h_j, r_j, t_j; S_tra^(j))

In expression (2), ḡ_d is the matrix structure corresponding to g_d(r), 𝒢 is a structure search space including a plurality of matrix structures corresponding to the above scoring function search space, S_val is a verification set, S_tra is a training set, and both S_val and S_tra are subsets of the triple set of the knowledge graph. |S_val| and |S_tra| respectively represent the numbers of triples in the verification set S_val and the training set S_tra, S_val^(i) represents the i-th triple (h_i, r_i, t_i) in the verification set S_val, (h_i, r_i, t_i) is the embedding vector corresponding to the triple (h_i, r_i, t_i), S_tra^(j) represents the j-th triple (h_j, r_j, t_j) in the training set S_tra, (h_j, r_j, t_j) is the embedding vector corresponding to the triple (h_j, r_j, t_j), 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers.
In addition to this, the present invention is,is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S,is based on a structural search spaceMatrix structure g in (1)dUsing training set StraIn-training set S obtained by model training of embedded models of knowledge graphtraThe embedding vector parameters of the embedding model with the least loss. In exemplary embodiments of the present invention, the above-described model training may be performed using various suitable model training methods that are already known to those skilled in the art and may appear in the future, and thus will not be redundantly described for the sake of brevity.
{ḡ_d*} is the set of matrix structures in the structure search space 𝒢 for which the embedding model based on the embedding vector parameters (h*, r*, t*) has the least loss on the verification set S_val. The searching apparatus 120 may determine each matrix structure in the set as the corresponding matrix structure of the relational block matrix corresponding to each scoring function.
Further, in the exemplary embodiment of the present invention, the distribution of the non-zero blocks (i.e., the sub-embedding vectors ±r_1, ..., ±r_K) in the relational block matrix g_d(r) actually depends on the structural weights of its blocks. Therefore, in an exemplary embodiment of the present invention, searching the scoring function search space for a suitable scoring function may further be converted into finding, for the relational block matrix g_d(r) corresponding to the scoring function, a suitable structure weight matrix A_d that indicates the structural weights of the individual blocks in g_d(r). The relational block matrix g_d(r) may then be determined based on the structure weight matrix, and the corresponding scoring function may be obtained from the determined g_d(r).
Merely by way of example, in an exemplary embodiment of the present invention, the search apparatus 120 may determine the corresponding structure weight matrices of the relational block matrices corresponding to the respective scoring functions based on the following expression (3):

{A_d*} = arg min_{{A_d}} Σ_{i=1}^{|S_val|} Σ_{d=1}^{D} I(r_i ∈ CU_d) · L(h_i*, r_i*, t_i*; S_val^(i))    (3)

s.t. (h*, r*, t*) = arg min_{(h, r, t)} Σ_{j=1}^{|S_tra|} Σ_{d=1}^{D} I(r_j ∈ CU_d) · L(h_j, r_j, t_j; S_tra^(j))
In expression (3), SP(r, A_d) denotes the relational block matrix g_d(r) with the structural weights indicated by A_d, I(r_i ∈ CU_d) indicates whether r_i belongs to the relationship group CU_d, and I(r_j ∈ CU_d) indicates whether r_j belongs to the relationship group CU_d. L(h, r, t; S) is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S, and (h*, r*, t*) is the embedding vector parameter of the embedding model with the least loss on the training set S_tra, obtained by performing model training of the embedding model of the knowledge graph using the training set S_tra based on the structure weight matrices A_d. In exemplary embodiments of the present invention, the above model training may be performed using various suitable model training methods that are already known to those skilled in the art or may appear in the future, and thus, for brevity, it will not be described redundantly.
{A_d*} is the set of structure weight matrices for which the embedding model based on the embedding vector parameters has the least loss on the verification set S_val. The searching apparatus 120 may determine each structure weight matrix in the set as the structure weight matrix corresponding to the relational block matrix of each scoring function, respectively.
Further, as shown in expressions (2) and (3), the above expressions (2) and (3) actually involve two levels of optimization. It should be understood that, for example, when K is 4 (i.e., the embedding vectors h, r and t are each divided into 4 sub-embedding vectors), the 4×4 block matrix ḡ_d has 9^16 possible structures (9 choices for each sub-block, i.e., 0, ±r_1, ..., ±r_4), i.e., the structure search space 𝒢 includes 9^16 structures; likewise, A_d also has many possible forms. Therefore, when the matrix structure set {ḡ_d*} or the structure weight matrix set {A_d*} is searched directly using the above expressions (2) and (3), the search process can be quite complex and time-consuming.
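The size of the search space quoted above can be checked directly: each of the K×K blocks independently selects one of the 2K+1 operators 0, ±r_1, ..., ±r_K, giving (2K+1)^(K·K) structures:

```python
def search_space_size(K):
    """Number of possible block-matrix structures for K sub-vectors:
    each of the K*K blocks picks one of 2K+1 operators."""
    return (2 * K + 1) ** (K * K)

print(search_space_size(4))  # 9**16 = 1853020188851841 structures for K = 4
```

Even for K = 4 the space holds roughly 1.85 × 10^15 candidates, which motivates the gradient-based relaxation described next rather than exhaustive search.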
Thus, by way of example only, in exemplary embodiments of the present invention, the search apparatus 120 may determine {A_d*} in expression (3) based on the following operations:
(A) Determine an initial embedding vector X = {h, r, t}, a set of structure weight matrices {A_d}, and relationship group assignment indications {I(r_t ∈ CU_d)}, where 1 ≤ t ≤ |R| and t is an integer, R is the set of relationships of the knowledge graph, |R| represents the number of relationships of the knowledge graph, and I(r_t ∈ CU_d) indicates whether the relationship r_t in the knowledge graph belongs to the relationship group CU_d. Here, the search apparatus 120 may determine the initial embedding vector X = {h, r, t}, the set of structure weight matrices {A_d} and the relationship group assignment indications {I(r_t ∈ CU_d)} using Gaussian initialization or the like. It should be understood, however, that the present application is not limited thereto, and any other suitable initialization method may be used to obtain them.
(B) Based on the initial embedding vector X = {h, r, t}, the set of structure weight matrices {A_d} and the relationship group assignment indications {I(r_t ∈ CU_d)}, update {A_d} by performing at least one iteration operation, and determine the discrete samples of {A_d} used in the last iteration operation as {A_d*}.
By way of example only, {A_d*} in expression (3) above may be determined based on the following example Algorithm 1.
Referring to algorithm 1, in an exemplary embodiment of the present invention, each iteration operation may include the following operations (B1) to (B4):
(B1) The search means 120 samples the set of structure weight matrices {A_d} to obtain a set of discrete structure weight matrices {Ā_d} (step 3 in Algorithm 1).
Here, the search means 120 may use B_d determined by the following expression (4) as the sampling probability and repeatedly sample A_d to obtain a set of discrete structure weight matrices {Ā_d} satisfying the constraint ā^(m,n) ∈ C_2:
In expression (4), τ ∈ [0, 1] is a preset hyper-parameter, O indicates the operator set {0, ±r_1, ..., ±r_K}, and 1 indicates a column vector of all 1s. Furthermore, C_2 = {a^(m,n) | ‖a^(m,n)‖_0 = 1}, i.e., each sampled block selects exactly one operator.
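By way of example only, the sampling of discrete structure weight matrices in step (B1) may be sketched as follows. A temperature-scaled softmax over per-block operator weights stands in for expression (4), whose exact form is not reproduced here, and the (K, K, 2K+1) layout of A_d is an illustrative assumption:

```python
import numpy as np

def sample_discrete_structure(A_d, tau=0.5, seed=0):
    """Sample a discrete structure weight matrix from a continuous A_d.

    A_d: (K, K, 2K+1) array of per-block operator weights.  Each block's
    weights are turned into sampling probabilities via a temperature-
    scaled softmax (a stand-in for expression (4)) and one operator is
    drawn, so every sampled block a^(m,n) satisfies ||a^(m,n)||_0 = 1,
    i.e., it lies in the set C_2.
    """
    rng = np.random.default_rng(seed)
    K, _, n_ops = A_d.shape
    sampled = np.zeros_like(A_d)
    for m in range(K):
        for n in range(K):
            logits = A_d[m, n] / tau
            p = np.exp(logits - logits.max())  # numerically stable softmax
            p /= p.sum()
            k = rng.choice(n_ops, p=p)         # operator index for block (m, n)
            sampled[m, n, k] = 1.0
    return sampled
```

Lowering tau concentrates the sampling probability on the highest-weight operator of each block.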
(B2) The embedding vector X = {h, r, t} is updated. In an exemplary embodiment of the present invention, when updating the embedding vector X = {h, r, t}, a predetermined number of triples may first be selected (e.g., randomly selected) from the training set S_tra as the mini-batch set B_tra (step 4 in Algorithm 1), and the embedding vector X = {h, r, t} is then updated according to the following expression (5) (step 5 in Algorithm 1):
where η is a preset step size, |B_tra| denotes the number of triples in the mini-batch set B_tra, B_tra^(j') represents the j'-th triple (h_j', r_j', t_j') in the mini-batch set B_tra, (h_j', r_j', t_j') is the embedding vector corresponding to the triple (h_j', r_j', t_j'), I(r_j' ∈ CU_d) indicates whether r_j' belongs to the relationship group CU_d, 1 ≤ j' ≤ |B_tra|, and j' is an integer.
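By way of example only, the mini-batch update of expression (5) may be sketched as the generic averaged-gradient descent step below; the loss gradient itself is left abstract (grad_fn is a hypothetical stand-in), since the concrete loss L is not fixed by the surrounding text:

```python
import numpy as np

def sgd_update_embeddings(X, grad_fn, batch, eta):
    """One mini-batch gradient step in the spirit of expression (5).

    X: dict mapping parameter names to embedding arrays.
    grad_fn(X, triple): returns a dict of gradients with the same keys
    (a hypothetical stand-in for the gradient of the loss L).
    eta: the preset step size.
    """
    # accumulate gradients averaged over the mini-batch B_tra
    grads = {k: np.zeros_like(v) for k, v in X.items()}
    for triple in batch:
        g = grad_fn(X, triple)
        for k, gv in g.items():
            grads[k] += gv / len(batch)
    # descent step: X <- X - eta * averaged gradient
    return {k: X[k] - eta * grads[k] for k in X}
```

Returning a new dict rather than mutating X keeps the previous iteration's embeddings available, which matters for the update-order flexibility discussed later in this section.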
(B3) The relationship dividing means 110 updates the relationship group assignment indications {I(r_t ∈ CU_d)}.
In an exemplary embodiment of the present invention, the relationship dividing apparatus 110 may assign the plurality of relationships in the knowledge graph to different relationship groups using a clustering method, and then update {I(r_t ∈ CU_d)} based on the assignment result. Here, once {I(r_t ∈ CU_d)} has been determined, the indications I(r_i ∈ CU_d) and I(r_j ∈ CU_d) used in the preceding expressions can be determined accordingly.
For example only, the relationship dividing apparatus 110 may implement the clustering (step 6 in Algorithm 1) according to the following expression (6):
wherein r_t is the embedding vector corresponding to the relationship r_t in the relationship set R of the knowledge graph, c_d is the vector representation of the relationship group CU_d, and b_dt represents the degree of membership between CU_d and r_t.
In an exemplary implementation of the present invention, the EM algorithm may be used to solve the above expression (6) so as to determine the assignment of the relationships. Specifically, in the EM algorithm, the E step shown in the following expression (7) and the M step shown in the following expression (8) may be performed iteratively until the cluster groups (i.e., the relationship groups) converge, thereby obtaining the assignment result of the relationships:
After the clustering is completed, based on the clustering result {b_dt}, if d = arg max_d' b_d't, then I(r_t ∈ CU_d) is set to 1; otherwise, I(r_t ∈ CU_d) is set to 0, thereby obtaining the updated {I(r_t ∈ CU_d)}.
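By way of example only, the EM-style clustering and the subsequent hard arg-max assignment described above may be sketched as follows; using a softmax over negative squared distances for the membership degrees b_dt in the E step is an illustrative choice, not necessarily the exact expressions (7) and (8):

```python
import numpy as np

def em_assign(rel_emb, centers, n_iters=10, temp=1.0):
    """Soft EM clustering of relation embeddings, then hard assignment.

    E step: membership degrees b_dt via a softmax over negative squared
    distances (an illustrative choice).  M step: group vectors c_d as
    membership-weighted means.  Finally I(r_t in CU_d) = 1 iff
    d = argmax_d' b_d't, as in the hard assignment described above.
    """
    for _ in range(n_iters):
        # E step: b has shape (|R|, D)
        d2 = ((rel_emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        logits = -d2 / temp
        b = np.exp(logits - logits.max(axis=1, keepdims=True))
        b /= b.sum(axis=1, keepdims=True)
        # M step: membership-weighted cluster means
        centers = (b.T @ rel_emb) / b.sum(axis=0)[:, None]
    labels = b.argmax(axis=1)                                  # hard argmax
    indicator = np.eye(centers.shape[0], dtype=int)[labels].T  # (D, |R|) one-hot
    return labels, indicator
```

Each column of the indicator is one-hot, so every relationship ends up in exactly one relationship group.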
(B4) The search apparatus 120 updates {A_d}.
In an exemplary embodiment of the present invention, when updating {A_d}, a predetermined number of triples may first be selected (e.g., randomly selected) from the verification set S_val as the mini-batch set B_val (step 7 in Algorithm 1), and each matrix in the set {A_d} is then updated according to the following expression (9) (steps 8 to 10 in Algorithm 1):
Here, ε is a preset step size, |B_val| denotes the number of triples in the mini-batch set B_val, B_val^(i') represents the i'-th triple (h_i', r_i', t_i') in the mini-batch set B_val, (h_i', r_i', t_i') is the embedding vector corresponding to the triple (h_i', r_i', t_i'), I(r_i' ∈ CU_d) indicates whether r_i' belongs to the relationship group CU_d, 1 ≤ i' ≤ |B_val|, and i' is an integer.
In an exemplary embodiment of the invention, the set {Ā_d} determined in step 3 of the last iteration of Algorithm 1 can be determined as the finally obtained {A_d*}.
Further, although the above description shows that the embedding vector X = {h, r, t}, the set of structure weight matrices {A_d} and the relationship group assignment indications {I(r_t ∈ CU_d)} are updated sequentially in each iteration, the present invention is not limited thereto; the update order of the three can be set arbitrarily and is not limited to the order in Algorithm 1 above.
Furthermore, in an exemplary embodiment of the present invention, when updating the embedding vector X = {h, r, t}, the set of structure weight matrices {A_d} and the relationship group assignment indications {I(r_t ∈ CU_d)}, the parameters involved in an update may be either the update results from the previous iteration operation or the update results of the corresponding parameters in the current iteration operation. For example, in Algorithm 1 shown above, the embedding vectors h, r, t used in expression (9) when updating the set {A_d} in step 9 may be the embedding vectors updated in the previous iteration operation, or the embedding vectors updated in step 5 of the current iteration operation.
It should be understood that the various algorithms used in the above steps (B1) to (B4) of the iterative operations or the various specific calculation methods shown in the form of expressions are only examples listed for the convenience of understanding the present application, the present application is not limited thereto, and the operations of the steps (B1) to (B4) may be accomplished using other methods.
Further, it should also be understood that, in the exemplary embodiment of the present invention, the searching means 120 may search for the corresponding scoring function for each of the divided relationship groups after the relationship dividing means 110 completes the division of the relationship groups. However, the present application is not limited thereto. As shown in Algorithm 1 above, the relationship dividing means 110 may also iteratively update the division of the relationship groups based on the optimization results of the searching means 120 while the searching means 120 searches for the scoring functions. Through such iterative updating, the relationship-awareness-based knowledge graph embedding system 100 of the present application can search out the optimal scoring function for each relationship group while obtaining the optimal relationship group division.
After searching out the scoring functions of the respective relationship groups, the embedded model training device 130 may train the embedded model of the knowledge-graph based on the obtained set of scoring functions, and the representing device 140 may obtain the embedded representation of the knowledge-graph using the embedded model.
Further, although not shown in fig. 1, the knowledge-graph embedding system 100 based on relationship awareness according to an exemplary embodiment of the present disclosure may further include: a machine learning model training unit (not shown) for training a machine learning model based on the obtained embedded representation of the knowledge graph to obtain a target machine learning model for performing at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution; and a prediction unit (not shown) for performing a prediction task using the target machine learning model, wherein the prediction task includes at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution.
Fig. 2 is a flowchart illustrating a relationship awareness based knowledge-graph embedding method 200 according to an exemplary embodiment of the present disclosure.
As shown in fig. 2, in step S210, the plurality of relationships in the knowledge-graph may be divided into a plurality of relationship groups by the relationship dividing unit 110 such that each relationship in the knowledge-graph belongs to only one relationship group.
In step S220, the search unit 120 may determine a scoring function corresponding to each relationship group based on the division result, obtaining a set of scoring functions for the plurality of relationship groups.
Thereafter, the obtained set of scoring functions may be used by the embedded model training unit 130 to train the embedded model of the knowledge-graph at step S230, and the representation unit 140 may obtain an embedded representation of the knowledge-graph using the embedded model at step S240.
It should be understood that, in the exemplary embodiment of the present invention, the respective scoring functions may be searched for the divided relationship groups in step S220 after the division of the relationship groups is completed in step S210. However, as shown in Algorithm 1 above, the execution order of step S210 and step S220 is not limited thereto: the two steps may also be executed together, and the division of the relationship groups may be iteratively updated based on the results of the scoring function search while the scoring functions are being searched. Through such iterative updating, the relationship-awareness-based knowledge graph embedding method 200 of the present application can search out the optimal scoring function for each relationship group while obtaining the optimal relationship group division.
The specific processes of detailed operations performed by the above-mentioned components of the knowledge-graph embedding system 100 based on relationship awareness according to the exemplary embodiment of the present disclosure have been described in detail above with reference to fig. 1, and therefore, for brevity, will not be described again here.
Furthermore, the knowledge graph embedding method based on relationship awareness according to the exemplary embodiment of the present disclosure may train a machine learning model based on the embedded representation of the knowledge graph obtained in step S240, obtain a target machine learning model for performing at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution, and may perform a prediction task using the target machine learning model, wherein the prediction task includes at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution.
That is, the knowledge-graph embedding method and system based on relationship awareness of the exemplary embodiments of the present disclosure may be applied to various fields, such as relationship retrieval, semantic retrieval, smart recommendation, smart question answering, personalized recommendation, anti-fraud, content distribution, and the like.
By way of example only, among the various application scenarios of the knowledge graph embedding method and system based on relationship awareness according to exemplary embodiments of the present disclosure, for retrieval (such as relationship retrieval, semantic retrieval, intelligent retrieval, etc.), the relationship between two entities, or a corresponding other entity, may be retrieved by inputting two keywords. For example, inputting (China, Beijing) may retrieve that the relationship between them is "capital" (i.e., Beijing is the capital of China), or inputting (Zhang San, mother) may retrieve another entity "Li Si" (Zhang San's mother).
For example, for intelligent question answering, inputting "Where is the capital of China?" can accurately return "Beijing", so that the user's intention can be truly understood through the knowledge graph.
For example, for anti-fraud, when information about a borrower (entity) is added to the knowledge-graph, it may be determined whether there is a risk of fraud by reading the relationship between the borrower and others in the knowledge-graph, or whether the information they share is consistent.
For example, for intelligent recommendation (e.g., personalized recommendation), similar content may be recommended to entities of triples having similar relationships. For example, for the triple (Zhang San, student, a certain high school) (i.e., Zhang San is a student of that high school), recommendations may be made to Zhang San based on the information of other students of the same high school in the knowledge graph.
In the different applications of the knowledge graph above, the evaluation indexes for judging whether the knowledge graph has been properly applied also differ. For example, for retrieval applications, the evaluation indexes are generally the recall rate and accuracy of retrieval; for anti-fraud, the evaluation indexes are generally the credit score, probability of fraud, etc.; and for intelligent question answering and intelligent recommendation, the evaluation indexes are satisfaction, accuracy, etc. Therefore, the evaluation index of the knowledge graph embedding model is generally determined according to the different application scenarios of the knowledge graph embedding model, and a corresponding scoring function is designed accordingly, so that a better embedding model of the knowledge graph can be trained using a better scoring function. With the scoring function search according to the exemplary embodiment of the present invention, the best scoring function can be found by automatically incorporating the evaluation indexes in the search process, eliminating the inconvenience of manually designing the scoring function. In addition, since the scoring function search space can cover all possible scoring function forms, the search range is expanded, which helps find a better scoring function for the knowledge graph.
By applying the knowledge graph embedding method and system based on relationship cognition described above, a relationship-aware scoring function can be effectively determined for a given task without multiple rounds of model training, achieving outstanding results in terms of both performance and efficiency.
A relationship awareness based knowledge graph embedding method and system according to an exemplary embodiment of the present disclosure has been described above with reference to fig. 1 to 2. However, it should be understood that: the apparatus and systems shown in the figures may each be configured as software, hardware, firmware, or any combination thereof that performs the specified function. For example, the systems and apparatuses may correspond to an application-specific integrated circuit, a pure software code, or a module combining software and hardware. Further, one or more functions implemented by these systems or apparatuses may also be performed collectively by components in a physical entity device (e.g., a processor, a client, or a server, etc.).
Further, the above method may be implemented by instructions recorded on a computer-readable storage medium, for example, according to an exemplary embodiment of the present application, there may be provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the steps of: dividing a plurality of relations in a knowledge graph into a plurality of relation groups, wherein each relation in the knowledge graph only belongs to one relation group; determining a scoring function corresponding to each relationship group based on the division result, and obtaining a scoring function set aiming at the plurality of relationship groups; training an embedded model of the knowledge-graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge-graph using the embedded model.
The instructions stored in the computer-readable storage medium can be executed in an environment deployed in a computer device such as a client, a host, a proxy device, a server, and the like, and it should be noted that the instructions can also be used to perform additional steps other than the above steps or perform more specific processing when the above steps are performed, and the contents of the additional steps and the further processing are mentioned in the description of the related method with reference to fig. 1 to 2, and therefore will not be described again here to avoid repetition.
It should be noted that the knowledge-graph embedding system based on relationship awareness according to the exemplary embodiments of the present disclosure may fully rely on the execution of computer programs or instructions to realize corresponding functions, i.e., each device corresponds to each step in the functional architecture of the computer programs, so that the whole system is called by a special software package (e.g., a lib library) to realize the corresponding functions.
On the other hand, when the system and apparatus shown in fig. 1 are implemented in software, firmware, middleware or microcode, program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that at least one processor or at least one computing device may perform the corresponding operations by reading and executing the corresponding program code or code segments.
For example, according to an exemplary embodiment of the present application, a system may be provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the steps of: dividing a plurality of relations in a knowledge graph into a plurality of relation groups, wherein each relation in the knowledge graph only belongs to one relation group; determining a scoring function corresponding to each relationship group based on the division result, and obtaining a scoring function set aiming at the plurality of relationship groups; training an embedded model of the knowledge-graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge-graph using the embedded model.
In particular, the above-described system may be deployed in a server or client or on a node in a distributed network environment. Further, the system may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the set of instructions. In addition, the system may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). In addition, all components of the system may be connected to each other via a bus and/or a network.
The system here need not be a single system, but can be any collection of devices or circuits capable of executing the above instructions (or sets of instructions) either individually or in combination. The system may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the system, the at least one computing device may comprise a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the at least one computing device may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like. The computing device may execute instructions or code stored in one of the storage devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory device may be integrated with the computing device, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage device may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage device and the computing device may be operatively coupled or may communicate with each other, such as through I/O ports, network connections, etc., so that the computing device can read instructions stored in the storage device.
While exemplary embodiments of the present application have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present application is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present application. Therefore, the protection scope of the present application shall be subject to the scope of the claims.
Claims (10)
1. A relationship awareness-based knowledge graph embedding method, the method comprising:
dividing a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group;
determining a scoring function corresponding to each relationship group based on the division result, and obtaining a set of scoring functions for the plurality of relationship groups;
training an embedded model of the knowledge-graph based on the obtained set of scoring functions; and
obtaining an embedded representation of the knowledge-graph using the embedded model.
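The four steps of claim 1 can be outlined in code. This is a minimal illustrative sketch, not the patented implementation: every name here (partition_relations, embed_knowledge_graph, the make_scorer and train callables) is hypothetical, and the round-robin grouping merely stands in for the clustering of claim 7.

```python
def partition_relations(relations, num_groups):
    # Step 1: assign each relationship to exactly one group.
    # Round-robin is a placeholder for the clustering of claim 7.
    groups = [[] for _ in range(num_groups)]
    for i, rel in enumerate(relations):
        groups[i % num_groups].append(rel)
    return groups

def embed_knowledge_graph(triples, relations, num_groups, make_scorer, train):
    groups = partition_relations(relations, num_groups)          # step 1: disjoint relationship groups
    scorers = {d: make_scorer(g) for d, g in enumerate(groups)}  # step 2: one scoring function per group
    model = train(triples, scorers)                              # step 3: train the embedding model
    return model                                                 # step 4: read out the embedded representation
```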
2. The method of claim 1, wherein the scoring function is represented as:
wherein the plurality of relationships in the knowledge-graph are divided into a plurality of relationship groups {CU_d}, f_d(h, r, t) is the scoring function corresponding to the relationship group CU_d, 1 ≤ d ≤ D, and D is the number of the plurality of relationship groups,
h, t and r respectively represent the embedding vectors of a head entity h, a tail entity t and a relationship r between h and t in a triplet (h, r, t) of the knowledge-graph, and h, t and r are each divided, in the same manner, into K sub-embedding vectors h_1 to h_K, r_1 to r_K and t_1 to t_K, where 1 ≤ m ≤ K, 1 ≤ n ≤ K, and K is a positive integer,
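The division of h, r and t into K sub-embedding vectors described in claim 2 can be sketched as follows. This is a toy illustration: split_embedding is a hypothetical name, and equal-length contiguous blocks are an assumption the claim does not fix.

```python
import numpy as np

def split_embedding(v, K):
    """Split an embedding vector into K equal-length sub-embedding vectors
    v_1 ... v_K, mirroring the division of h, r and t in claim 2."""
    v = np.asarray(v, dtype=float)
    assert v.size % K == 0, "embedding dimension must be divisible by K"
    return np.split(v, K)

h = np.arange(6.0)               # toy head-entity embedding of dimension 6
h_sub = split_embedding(h, K=3)  # h_1 = [0, 1], h_2 = [2, 3], h_3 = [4, 5]
```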
3. the method of claim 2, wherein determining a scoring function corresponding to each relationship group comprises:
determining a corresponding matrix structure of a relationship block matrix corresponding to each scoring function, wherein the matrix structure indicates the distribution of non-zero blocks in the corresponding relationship block matrix;
determining a corresponding relationship block matrix based on the determined matrix structure, and obtaining a corresponding scoring function based on the determined relationship block matrix.
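One way to read claims 2 and 3 together is as a block-structured bilinear score: a K x K pattern of non-zero blocks selects which head/tail sub-vector interactions contribute. The sketch below assumes each active block (m, n) acts as diag(r_n); the patent leaves the block contents to the searched scoring function, so this is only an illustrative instance with hypothetical names.

```python
import numpy as np

def block_score(h, r, t, structure, K):
    """Bilinear score assembled from a relationship block matrix.

    `structure` is a K x K 0/1 pattern marking the non-zero blocks of
    claim 3; an active block (m, n) is assumed here to act as diag(r_n),
    coupling head sub-vector h_m with tail sub-vector t_n."""
    hs, rs, ts = (np.split(np.asarray(v, dtype=float), K) for v in (h, r, t))
    score = 0.0
    for m in range(K):
        for n in range(K):
            if structure[m][n]:
                score += float(hs[m] @ (rs[n] * ts[n]))
    return score
```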
4. The method according to claim 3, wherein the corresponding matrix structure of the relationship block matrix corresponding to each scoring function is determined based on the following expression (2):
s.t.
wherein g_d is the matrix structure corresponding to g_d(r) and is selected from a structure search space comprising a plurality of matrix structures; S_val is a verification set, S_tra is a training set, and S_val and S_tra are subsets of the triplet set of the knowledge-graph; |S_val| and |S_tra| respectively represent the numbers of triplets in the verification set S_val and the training set S_tra; S_val^(i) represents the i-th triplet (h_i, r_i, t_i) in the verification set S_val, with (h_i, r_i, t_i) the corresponding embedding vectors; S_tra^(j) represents the j-th triplet (h_j, r_j, t_j) in the training set S_tra, with (h_j, r_j, t_j) the corresponding embedding vectors; 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers;
the loss function measures the loss of a given embedding vector (h, r, t) over the corresponding data S; the inner-level solution of expression (2) is the embedding vector parameters that, for a matrix structure g_d in the structure search space, minimize the loss on the training set S_tra when the embedding model of the knowledge-graph is trained using S_tra;
the outer-level solution of expression (2) is the set of matrix structures for which the embedding model based on those trained embedding vector parameters has the smallest loss on the verification set S_val, and each matrix structure in the set respectively indicates the corresponding matrix structure of the relationship block matrix corresponding to the each scoring function.
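Expression (2) in claim 4 is a bilevel search: for each candidate matrix structure, the inner problem trains embedding parameters on the training set, and the outer problem keeps the structure whose trained model has the lowest verification loss. A brute-force sketch, where train_fn and loss_fn are hypothetical callables standing in for the patent's unspecified training routine and loss function:

```python
def search_structure(search_space, train_fn, loss_fn, S_tra, S_val):
    """Brute-force version of the bilevel search in expression (2).

    Inner level: params = train_fn(g, S_tra) fits embeddings for a fixed
    structure g by minimizing the training loss. Outer level: keep the
    structure whose trained model has the smallest loss on S_val."""
    best_g, best_loss = None, float("inf")
    for g in search_space:
        params = train_fn(g, S_tra)           # inner: argmin of the training loss
        val_loss = loss_fn(params, g, S_val)  # outer: loss on the verification set
        if val_loss < best_loss:
            best_g, best_loss = g, val_loss
    return best_g
```

In practice the patent's search space is too large to enumerate; this sketch only shows the objective being optimized.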
5. The method of claim 2, wherein determining a scoring function corresponding to each relationship group comprises:
determining a corresponding structural weight matrix of a relationship block matrix corresponding to each scoring function, wherein the structural weight matrix indicates the structural weight of each block in the corresponding relationship block matrix;
determining a corresponding relationship block matrix based on the determined structural weight matrix, and obtaining a corresponding scoring function based on the determined relationship block matrix.
6. The method according to claim 5, wherein the corresponding structural weight matrix of the relationship block matrix corresponding to each scoring function is determined based on the following expression (3):
s.t.
wherein S_val is a verification set, S_tra is a training set, and S_val and S_tra are subsets of the triplet set of the knowledge-graph; |S_val| and |S_tra| respectively represent the numbers of triplets in the verification set S_val and the training set S_tra; S_val^(i) represents the i-th triplet (h_i, r_i, t_i) in the verification set S_val, with (h_i, r_i, t_i) the corresponding embedding vectors; S_tra^(j) represents the j-th triplet (h_j, r_j, t_j) in the training set S_tra, with (h_j, r_j, t_j) the corresponding embedding vectors; 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers;
SP(r, A_d) denotes the relationship block matrix g_d(r) having the structural weights indicated by A_d; one indicator function indicates whether r_i belongs to the relationship group CU_d, and another indicates whether r_j belongs to the relationship group CU_d;
the loss function measures the loss of a given embedding vector (h, r, t) over the corresponding data S; the inner-level solution of expression (3) is the embedding vector parameters that, for a structural weight matrix A_d, minimize the loss on the training set S_tra when the embedding model of the knowledge-graph is trained using S_tra;
the outer-level solution of expression (3) is the set of structural weight matrices for which the embedding model based on those trained embedding vector parameters has the smallest loss on the verification set S_val, and each structural weight matrix in the set respectively corresponds to the relationship block matrix corresponding to the each scoring function.
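Claims 5 and 6 replace the discrete 0/1 structure of claims 3 and 4 with a structural weight matrix A_d, making the block pattern continuous so it can be optimized jointly with the embeddings. A sketch of such a weighted block score, with the block contents again assumed to act as diag(r_n) and all names hypothetical:

```python
import numpy as np

def weighted_block_score(h, r, t, A, K):
    """Score with a structural weight matrix A (the A_d of claims 5-6):
    each block (m, n) contributes with continuous weight A[m][n] instead
    of a hard 0/1 choice. Block contents are assumed to act as diag(r_n)."""
    hs, rs, ts = (np.split(np.asarray(v, dtype=float), K) for v in (h, r, t))
    return float(sum(A[m][n] * (hs[m] @ (rs[n] * ts[n]))
                     for m in range(K) for n in range(K)))
```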
7. The method of claim 1 or 6, wherein the step of partitioning the plurality of relationships in the knowledge-graph into a plurality of relationship groups comprises: the plurality of relationships are partitioned into a plurality of relationship groups using a clustering method.
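The clustering step of claim 7 could, for instance, run k-means over per-relationship feature vectors (e.g., averaged embeddings of each relationship's head/tail entities; the claim does not fix the features, so this choice is an assumption). A minimal NumPy sketch with hypothetical names:

```python
import numpy as np

def cluster_relations(rel_features, num_groups, iters=10, seed=0):
    """Minimal k-means sketch for claim 7: group relationships by the
    similarity of their feature vectors, so that each relationship ends
    up in exactly one relationship group."""
    X = np.asarray(rel_features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), num_groups, replace=False)]  # random initial centers
    for _ in range(iters):
        # assign each relationship to its nearest center, then recompute centers
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(num_groups):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels
```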
8. A relationship-awareness-based knowledge-graph embedding system, the system comprising:
a relationship dividing means configured to divide a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group;
a search device configured to determine a scoring function corresponding to each relationship group based on the division result, and obtain a scoring function set for the plurality of relationship groups;
an embedded model training device configured to train an embedded model of the knowledge-graph based on the obtained set of scoring functions; and
a representation device configured to obtain an embedded representation of the knowledge-graph using the embedded model.
9. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 7.
10. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010420480.2A CN113688249B (en) | 2020-05-18 | 2020-05-18 | Knowledge graph embedding method and system based on relational cognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113688249A true CN113688249A (en) | 2021-11-23 |
CN113688249B CN113688249B (en) | 2024-06-14 |
Family
ID=78575549
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010420480.2A Active CN113688249B (en) | 2020-05-18 | 2020-05-18 | Knowledge graph embedding method and system based on relational cognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113688249B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649550A (en) * | 2016-10-28 | 2017-05-10 | 浙江大学 | Joint knowledge embedded method based on cost sensitive learning |
KR20180092194A (en) * | 2017-02-08 | 2018-08-17 | 경북대학교 산학협력단 | Method and system for embedding knowledge gragh reflecting logical property of relations, recording medium for performing the method |
CN110796254A (en) * | 2019-10-30 | 2020-02-14 | 南京工业大学 | Knowledge graph reasoning method and device, computer equipment and storage medium |
US20200065668A1 (en) * | 2018-08-27 | 2020-02-27 | NEC Laboratories Europe GmbH | Method and system for learning sequence encoders for temporal knowledge graph completion |
CN110851614A (en) * | 2019-09-09 | 2020-02-28 | 中国电子科技集团公司电子科学研究院 | Relation prediction deduction method of knowledge graph and dynamic updating method of knowledge graph |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111602148B (en) | Regularized neural network architecture search | |
JP7343568B2 (en) | Identifying and applying hyperparameters for machine learning | |
US20190278600A1 (en) | Tiled compressed sparse matrix format | |
US20190164084A1 (en) | Method of and system for generating prediction quality parameter for a prediction model executed in a machine learning algorithm | |
US20210150412A1 (en) | Systems and methods for automated machine learning | |
CN110837567A (en) | Method and system for embedding knowledge graph | |
CN111858947A (en) | Automatic knowledge graph embedding method and system | |
US20220383119A1 (en) | Granular neural network architecture search over low-level primitives | |
CN112905809B (en) | Knowledge graph learning method and system | |
JP7504192B2 | Method and apparatus for searching images | |
CN112420125A (en) | Molecular attribute prediction method and device, intelligent equipment and terminal | |
CN113377964A (en) | Knowledge graph link prediction method, device, equipment and storage medium | |
JP2022032703A (en) | Information processing system | |
US20240005129A1 (en) | Neural architecture and hardware accelerator search | |
CN114692889A (en) | Meta-feature training model for machine learning algorithm | |
CN113569018A (en) | Question and answer pair mining method and device | |
CN113688249B (en) | Knowledge graph embedding method and system based on relational cognition | |
US11609936B2 (en) | Graph data processing method, device, and computer program product | |
CN113010687B (en) | Exercise label prediction method and device, storage medium and computer equipment | |
CN111506742A (en) | Method and system for constructing multivariate relational knowledge base | |
CN114357138A (en) | Question and answer identification method and device, electronic equipment and readable storage medium | |
WO2022249415A1 (en) | Information provision device, information provision method, and information provision program | |
WO2023238258A1 (en) | Information provision device, information provision method, and information provision program | |
US20240004912A1 (en) | Hierarchical topic model with an interpretable topic hierarchy | |
CN114328940A (en) | Method and system for constructing multivariate relational knowledge base |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||