CN113688249A - Knowledge graph embedding method and system based on relation cognition - Google Patents


Info

  • Publication number: CN113688249A
  • Application number: CN202010420480.2A (priority date claimed from this application)
  • Authority: CN (China)
  • Prior art keywords: relationship, knowledge, graph, matrix, tra
  • Legal status: Pending
  • Other languages: Chinese (zh)
  • Inventor: 姚权铭
  • Current assignee: 4Paradigm Beijing Technology Co Ltd
  • Original assignee: 4Paradigm Beijing Technology Co Ltd
  • Application filed by: 4Paradigm Beijing Technology Co Ltd


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

Provided are a knowledge graph embedding method and system based on relationship cognition, wherein the method comprises the following steps: dividing a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group; determining a scoring function corresponding to each relationship group based on the division result, thereby obtaining a set of scoring functions for the plurality of relationship groups; training an embedding model of the knowledge graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge graph using the embedding model.

Description

Knowledge graph embedding method and system based on relation cognition
Technical Field
The present application relates to knowledge graph embedding technology in the field of artificial intelligence, and more particularly, to a knowledge graph embedding method and system based on relationship cognition.
Background
With the rapid development of information network technology, network data content is growing explosively. Such content is typically large-scale, heterogeneous and loosely organized, which makes it challenging for people to acquire information and knowledge effectively. A Knowledge Graph (KG) is a semantic-network knowledge base that can describe knowledge resources and their carriers using visualization techniques, and can mine, analyze, construct, draw and display knowledge and the interrelations among knowledge resources and their carriers.
A knowledge graph is a special graph structure that takes entities as nodes and relationships as directed edges, and it has recently attracted considerable interest. In a knowledge graph, each edge is represented as a triple (h, r, t) of the form (head entity, relationship, tail entity), indicating that the two entities h (i.e., the head entity) and t (i.e., the tail entity) are connected by a relationship r; for example, (New York, isLocatedIn, USA) may indicate that New York is located in the USA. Many large knowledge graphs have been built over the last decades, such as WordNet, Freebase, DBpedia and YAGO. They improve various downstream applications such as structured search, question answering and entity recommendation.
In a knowledge graph, one basic problem is how to quantify the similarity of a given triple (h, r, t) so that subsequent applications can be performed. Recently, Knowledge Graph Embedding (KGE) has emerged and developed as a method for this purpose. Knowledge graph embedding aims at finding low-dimensional vector representations (i.e., embeddings) of entities and relationships so that their similarity can be quantified. In particular, given a set of observed facts (i.e., triples), knowledge graph embedding attempts to learn low-dimensional vector representations of the entities and relationships in the triples so that the similarity of the triples can be quantified. This similarity is measured by a Scoring Function (SF), which builds, based on a given relationship, a model for measuring the similarity between entities. To construct a knowledge graph embedding model, the most important step is to design and select a suitable SF. Since different SFs have their own strengths and weaknesses in capturing similarity, the selection of the SF is crucial to the performance of knowledge graph embedding.
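As a point of reference for how a scoring function quantifies triple similarity, the classical DistMult SF scores a triple by a three-way dot product of the embeddings. The sketch below is purely illustrative (the embedding values are made up) and is not the scoring function proposed in this application:

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult scoring function: <h, r, t> = sum_i h_i * r_i * t_i.

    A higher score means the triple (h, r, t) is considered more plausible.
    """
    return float(np.sum(h * r * t))

# Toy 4-dimensional embeddings (illustrative values only).
h = np.array([0.5, -0.1, 0.3, 0.2])   # head entity, e.g. "New York"
r = np.array([1.0, 0.2, -0.5, 0.7])   # relation, e.g. "isLocatedIn"
t = np.array([0.4, 0.6, -0.2, 0.1])   # tail entity, e.g. "USA"

score = distmult_score(h, r, t)
```

In practice such a score is compared across candidate tail entities, and the entities with the highest scores are returned as predictions.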
In general, SFs are not task-aware, so they can hardly achieve optimal performance across various data sets at all times. Pioneering work on designing task-dependent SFs is the automatic scoring function (AutoSF), which uses automated machine learning techniques to specify an SF for a given task. In this way, AutoSF has become the state-of-the-art SF for knowledge graph embedding.
However, neither the performance nor the efficiency of AutoSF is as good as expected. First, the SF should also be relationship-aware, since different relationships exhibit different patterns. Furthermore, a single training of the model is already costly, whereas AutoSF requires hundreds or even thousands of trainings. Therefore, there is a need for a method that can efficiently determine a relationship-aware SF for a given task without multiple rounds of model training, thereby improving both performance and efficiency.
Disclosure of Invention
According to an embodiment of the present invention, a knowledge graph embedding method based on relationship cognition is provided, the method comprising the following steps: dividing a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group; determining a scoring function corresponding to each relationship group based on the division result, thereby obtaining a set of scoring functions for the plurality of relationship groups; training an embedding model of the knowledge graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge graph using the embedding model.
Alternatively, the scoring function may be expressed as:

$$f_d(\mathbf{h}, \mathbf{r}, \mathbf{t}) = \sum_{m=1}^{K}\sum_{n=1}^{K}\left\langle \mathbf{h}_m,\ g_d(\mathbf{r})^{(m,n)},\ \mathbf{t}_n \right\rangle \tag{1}$$

wherein the plurality of relationships in the knowledge graph are divided into a plurality of relationship groups $\{CU_d\}$; $f_d(h, r, t)$ is the scoring function of relationship group $CU_d$, $1 \le d \le D$, and $D$ is the number of the plurality of relationship groups; $\mathbf{h}$, $\mathbf{t}$ and $\mathbf{r}$ respectively represent the embedding vectors of the head entity $h$, the tail entity $t$ and the relationship $r$ between $h$ and $t$ in a triple $(h, r, t)$ of the knowledge graph, and $\mathbf{h}$, $\mathbf{r}$ and $\mathbf{t}$ are each divided, in the same manner, into $K$ sub-embedding vectors $\mathbf{h}_1$ to $\mathbf{h}_K$, $\mathbf{r}_1$ to $\mathbf{r}_K$ and $\mathbf{t}_1$ to $\mathbf{t}_K$, where $1 \le m \le K$, $1 \le n \le K$, and $K$ is a positive integer; $g_d(\mathbf{r})$ is the $K \times K$ relational block matrix corresponding to the scoring function $f_d(h, r, t)$; and $g_d(\mathbf{r})^{(m,n)}$ represents the block in the $m$-th row and $n$-th column of the relational block matrix, with $g_d(\mathbf{r})^{(m,n)} = o_k$, where $o_k$ is the $k$-th operator in the operator set $\mathcal{O} \equiv \{0, \mathbf{r}_1, \ldots, \mathbf{r}_K, -\mathbf{r}_1, \ldots, -\mathbf{r}_K\}$ and $1 \le k \le 2K+1$.
Optionally, the step of determining the scoring function corresponding to each relationship group may comprise: determining a corresponding matrix structure of the relational block matrix corresponding to each scoring function, wherein the matrix structure indicates the distribution of non-zero blocks in the corresponding relational block matrix; and determining the corresponding relational block matrix based on the determined matrix structure, and obtaining the corresponding scoring function based on the determined relational block matrix.
Alternatively, the corresponding matrix structure of the relational block matrix corresponding to each scoring function may be determined based on the following expression (2):

$$\{g_d^*\} = \arg\min_{g_d \in \mathcal{G}}\ \frac{1}{|S_{val}|}\sum_{i=1}^{|S_{val}|} \mathcal{L}\!\left(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*;\ S_{val}^{(i)}\right) \tag{2}$$

$$\text{s.t.}\qquad X^* = \arg\min_{X}\ \frac{1}{|S_{tra}|}\sum_{j=1}^{|S_{tra}|} \mathcal{L}\!\left(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j;\ S_{tra}^{(j)}\right)$$

wherein $g_d$ is the matrix structure corresponding to $g_d(\mathbf{r})$; $\mathcal{G}$ is a structure search space comprising a plurality of matrix structures; $S_{val}$ is a validation set, $S_{tra}$ is a training set, and $S_{val}$ and $S_{tra}$ are subsets of the triple set of the knowledge graph; $|S_{val}|$ and $|S_{tra}|$ respectively represent the numbers of triples in the validation set $S_{val}$ and the training set $S_{tra}$; $S_{val}^{(i)}$ represents the $i$-th triple $(h_i, r_i, t_i)$ in the validation set $S_{val}$, and $(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*)$ is the embedding vector corresponding to the triple $(h_i, r_i, t_i)$; $S_{tra}^{(j)}$ represents the $j$-th triple $(h_j, r_j, t_j)$ in the training set $S_{tra}$, and $(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j)$ is the embedding vector corresponding to the triple $(h_j, r_j, t_j)$, with $1 \le i \le |S_{val}|$, $1 \le j \le |S_{tra}|$, and $i$ and $j$ integers; $\mathcal{L}(\mathbf{h}, \mathbf{r}, \mathbf{t};\ S)$ is a loss function for measuring the loss of a given embedding vector $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ over the corresponding data $S$; $X^*$ is the embedding vector parameter of the embedding model that has the smallest loss on the training set $S_{tra}$, obtained by model training of the embedding model of the knowledge graph using the training set $S_{tra}$ based on a matrix structure $g_d$ in the structure search space $\mathcal{G}$; and $\{g_d^*\}$ is the set of matrix structures in the structure search space $\mathcal{G}$ for which the scoring functions of the embedding model based on the embedding vector parameters $X^*$ have the smallest loss on the validation set $S_{val}$, each matrix structure in the set respectively indicating the corresponding matrix structure of the relational block matrix corresponding to each scoring function.
Optionally, the step of determining the scoring function corresponding to each relationship group may comprise: determining a corresponding structural weight matrix of the relational block matrix corresponding to each scoring function, wherein the structural weight matrix indicates the structural weight of each block in the corresponding relational block matrix; and determining the corresponding relational block matrix based on the determined structural weight matrix, and obtaining the corresponding scoring function based on the determined relational block matrix.
Alternatively, the corresponding structural weight matrix of the relational block matrix corresponding to each scoring function may be determined based on the following expression (3):

$$\{A_d^*\} = \arg\min_{\{A_d\}}\ \frac{1}{|S_{val}|}\sum_{i=1}^{|S_{val}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_i^* \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_i^*, \mathrm{SP}(\mathbf{r}_i^*, A_d), \mathbf{t}_i^*;\ S_{val}^{(i)}\right) \tag{3}$$

$$\text{s.t.}\qquad X^* = \arg\min_{X}\ \frac{1}{|S_{tra}|}\sum_{j=1}^{|S_{tra}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_j \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_j, \mathrm{SP}(\mathbf{r}_j, A_d), \mathbf{t}_j;\ S_{tra}^{(j)}\right)$$

wherein $A_d$ represents the structural weight matrix corresponding to the relational block matrix $g_d(\mathbf{r})$; $S_{val}$ is a validation set, $S_{tra}$ is a training set, and $S_{val}$ and $S_{tra}$ are subsets of the triple set of the knowledge graph; $|S_{val}|$ and $|S_{tra}|$ respectively represent the numbers of triples in the validation set $S_{val}$ and the training set $S_{tra}$; $S_{val}^{(i)}$ represents the $i$-th triple $(h_i, r_i, t_i)$ in the validation set $S_{val}$, and $(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*)$ is the embedding vector corresponding to the triple $(h_i, r_i, t_i)$; $S_{tra}^{(j)}$ represents the $j$-th triple $(h_j, r_j, t_j)$ in the training set $S_{tra}$, and $(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j)$ is the embedding vector corresponding to the triple $(h_j, r_j, t_j)$, with $1 \le i \le |S_{val}|$, $1 \le j \le |S_{tra}|$, and $i$ and $j$ integers; $\mathrm{SP}(\mathbf{r}, A_d)$ represents the relational block matrix $g_d(\mathbf{r})$ having the structural weights indicated by $A_d$; $\mathbb{1}(\mathbf{r}_i^* \in CU_d)$ indicates whether $\mathbf{r}_i^*$ belongs to relationship group $CU_d$, and $\mathbb{1}(\mathbf{r}_j \in CU_d)$ indicates whether $\mathbf{r}_j$ belongs to relationship group $CU_d$; $\mathcal{L}(\mathbf{h}, \mathbf{r}, \mathbf{t};\ S)$ is a loss function for measuring the loss of a given embedding vector $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ over the corresponding data $S$; $X^*$ is the embedding vector parameter of the embedding model that has the smallest loss on the training set $S_{tra}$, obtained by model training of the embedding model of the knowledge graph using the training set $S_{tra}$ based on the structural weight matrices $A_d$; and $\{A_d^*\}$ is the set of structural weight matrices for which the scoring functions of the embedding model based on the embedding vector parameters $X^*$ have the smallest loss on the validation set $S_{val}$, each structural weight matrix in the set respectively corresponding to the relational block matrix corresponding to each scoring function.
Optionally, the step of dividing the plurality of relationships in the knowledge-graph into a plurality of relationship groups may comprise: the plurality of relationships are partitioned into a plurality of relationship groups using a clustering method.
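The clustering-based partition can be sketched as follows. The patent does not prescribe a specific clustering algorithm, so plain k-means over the current relation embeddings (via the hypothetical helper `group_relations`) is used here as one possible instantiation:

```python
import numpy as np

def group_relations(relation_embeddings, D, n_iter=20, seed=0):
    """Assign each relation to exactly one of D groups with plain k-means.

    Returns an array `assign` with assign[t] = group index of relation t,
    so every relation belongs to exactly one relationship group.
    """
    rng = np.random.default_rng(seed)
    R = relation_embeddings
    centers = R[rng.choice(len(R), size=D, replace=False)]
    for _ in range(n_iter):
        # distance of every relation embedding to every group center
        dist = np.linalg.norm(R[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for d in range(D):
            if np.any(assign == d):
                centers[d] = R[assign == d].mean(axis=0)
    return assign

# Six toy relation embeddings forming two obvious clusters.
R = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
assign = group_relations(R, D=2)
```

Because the relation embeddings change during training, this partition can be recomputed between iterations, which is exactly where the assignment-update step of the iterative procedure below fits in.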
Alternatively, $\{A_d^*\}$ may be determined based on expression (3) by the following operations: determining an initial embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$, a set of structural weight matrices $\{A_d\}$ and relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$, wherein $1 \le t \le |R|$ and $t$ is an integer, $R$ is the set of relationships of the knowledge graph, $|R|$ represents the number of relationships of the knowledge graph, and $\mathbb{1}(r_t \in CU_d)$ indicates whether relationship $r_t$ in the knowledge graph belongs to relationship group $CU_d$; based on the initial embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$, the set of structural weight matrices $\{A_d\}$ and the relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$, updating $\{A_d\}$ by performing at least one iterative operation; and determining the discrete samples $\{\bar{A}_d\}$ used in the last iterative operation as $\{A_d^*\}$. Each iterative operation may include the following operations: sampling the set of structural weight matrices $\{A_d\}$ to obtain a set of discrete structural weight matrices $\{\bar{A}_d\}$; updating the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ based on the obtained $\{\bar{A}_d\}$; updating the relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$; and updating $\{A_d\}$.
Optionally, the step of sampling the set of structural weight matrices $\{A_d\}$ may comprise: repeatedly drawing discrete samples of $A_d$ with $B_d$, determined by expression (4), as the sampling probability, to obtain a set of discrete structural weight matrices $\{\bar{A}_d\}$ whose block weight vectors $\bar{a}^{(m,n)}$ satisfy the constraint $\bar{a}^{(m,n)} \in C_1 \cap C_2$:

$$B_d^{(m,n)} = \frac{\exp\!\left(a^{(m,n)} / \tau\right)}{\mathbf{1}^{\top}\exp\!\left(a^{(m,n)} / \tau\right)} \tag{4}$$

wherein $\tau \in [0, 1]$ is a preset hyper-parameter; $a^{(m,n)}$ is the structural weight vector of the block in the $m$-th row and $n$-th column of $A_d$ over the operator set $\mathcal{O} \equiv \{0, \mathbf{r}_1, \ldots, \mathbf{r}_K, -\mathbf{r}_1, \ldots, -\mathbf{r}_K\}$; $\mathbf{1}$ represents an all-ones column vector; $C_1 \equiv \{a^{(m,n)} \mid a^{(m,n)} \in \{0, 1\}^{2K+1}\}$; and $C_2 = \{a^{(m,n)} \mid \|a^{(m,n)}\|_0 = 1\}$, so that each sampled block weight vector is a one-hot vector that selects exactly one operator from the operator set.
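A minimal sketch of this sampling step, assuming the softmax-with-temperature form of expression (4) and one-hot samples satisfying the ||a||_0 = 1 constraint; `sample_block` is an illustrative helper, not code from the application:

```python
import numpy as np

def sample_block(a, tau, rng):
    """Sample a one-hot operator choice for one block of A_d.

    B = softmax(a / tau) is used as the sampling probability; the drawn
    vector is one-hot, so it selects exactly one operator per block.
    Lower tau sharpens the distribution toward the largest weight.
    """
    z = a / tau
    z = z - z.max()                       # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    k = rng.choice(len(a), p=p)
    one_hot = np.zeros_like(a)
    one_hot[k] = 1.0
    return one_hot

rng = np.random.default_rng(0)
a = np.array([2.0, 0.1, -1.0, 0.1, 0.1])  # weights over 2K+1 = 5 operators
sample = sample_block(a, tau=0.1, rng=rng)
```

With a small temperature the sampling is nearly deterministic around the strongest weight, while a larger temperature keeps more exploration early in the search.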
Optionally, the step of updating the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ based on the obtained $\{\bar{A}_d\}$ may include: selecting a predetermined number of triples from the training set $S_{tra}$ as a mini-batch set $B_{tra}$; and updating the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ according to the following expression (5):

$$X \leftarrow X - \eta\, \nabla_{X}\ \frac{1}{|B_{tra}|}\sum_{j'=1}^{|B_{tra}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_{j'} \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_{j'}, \mathrm{SP}(\mathbf{r}_{j'}, \bar{A}_d), \mathbf{t}_{j'};\ B_{tra}^{(j')}\right) \tag{5}$$

wherein $\eta$ is a preset step size; $|B_{tra}|$ represents the number of triples in the mini-batch set $B_{tra}$; $B_{tra}^{(j')}$ represents the $j'$-th triple $(h_{j'}, r_{j'}, t_{j'})$ of $B_{tra}$, and $(\mathbf{h}_{j'}, \mathbf{r}_{j'}, \mathbf{t}_{j'})$ is the embedding vector corresponding to the triple $(h_{j'}, r_{j'}, t_{j'})$; $\mathbb{1}(\mathbf{r}_{j'} \in CU_d)$ indicates whether $\mathbf{r}_{j'}$ belongs to relationship group $CU_d$; and $1 \le j' \le |B_{tra}|$ with $j'$ an integer.
Optionally, the step of updating the relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$ may comprise: assigning the plurality of relationships in the knowledge graph to different relationship groups using a clustering method; and updating $\mathbb{1}(r_t \in CU_d)$ based on the assignment result.
Alternatively, the step of updating $\{A_d\}$ may comprise: selecting a predetermined number of triples from the validation set $S_{val}$ as a mini-batch set $B_{val}$; and updating each matrix in the set $\{A_d\}$ according to the following expression (6):

$$A_d \leftarrow A_d - \varepsilon\, \nabla_{A_d}\ \frac{1}{|B_{val}|}\sum_{i'=1}^{|B_{val}|} \mathbb{1}\!\left(\mathbf{r}_{i'}^* \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_{i'}^*, \mathrm{SP}(\mathbf{r}_{i'}^*, A_d), \mathbf{t}_{i'}^*;\ B_{val}^{(i')}\right) \tag{6}$$

wherein $\varepsilon$ is a preset step size; $|B_{val}|$ represents the number of triples in the mini-batch set $B_{val}$; $B_{val}^{(i')}$ represents the $i'$-th triple $(h_{i'}, r_{i'}, t_{i'})$ of $B_{val}$, and $(\mathbf{h}_{i'}^*, \mathbf{r}_{i'}^*, \mathbf{t}_{i'}^*)$ is the embedding vector corresponding to the triple $(h_{i'}, r_{i'}, t_{i'})$; $\mathbb{1}(\mathbf{r}_{i'}^* \in CU_d)$ indicates whether $\mathbf{r}_{i'}^*$ belongs to relationship group $CU_d$; and $1 \le i' \le |B_{val}|$ with $i'$ an integer.
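The alternating schedule that combines expressions (4) to (6) can be sketched end to end with toy stand-ins. Everything below (a single relationship group, a scalar "embedding", quadratic toy losses) is a deliberately simplified assumption whose only purpose is to show the order of the four updates in each iteration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: D = 1 relationship group, one block with 2K+1 = 5 operator
# weights, a scalar "embedding" X, and quadratic toy losses. This only
# illustrates the alternating schedule, not a real KGE model.
A = rng.normal(size=5)     # structural weights A_d
X = 0.0                    # embedding parameters
eta, eps, tau = 0.1, 0.1, 0.5

for step in range(200):
    # 1) sample a discrete structure with softmax(A / tau), as in expression (4)
    z = A / tau - (A / tau).max()
    p = np.exp(z) / np.exp(z).sum()
    k = int(rng.choice(5, p=p))
    # 2) embedding update on a training mini-batch, expression (5);
    #    here the toy "loss" is (X - 1)^2 with gradient 2 * (X - 1)
    X -= eta * 2.0 * (X - 1.0)
    # 3) the relationship re-clustering step would run here
    # 4) structure update on a validation mini-batch, expression (6);
    #    here the toy "loss" is 0.01 * sum(A^2) with gradient 0.02 * A
    A -= eps * 0.02 * A
```

The essential design choice visible even in this toy loop is that embeddings descend on training loss while structure weights descend on validation loss, which is what keeps the structure search from overfitting the training triples.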
According to another embodiment of the present invention, there is provided a relationship-awareness-based knowledge-graph embedding system, including: a relationship dividing means configured to divide a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group; a search device configured to determine a scoring function corresponding to each relationship group based on the division result, and obtain a scoring function set for the plurality of relationship groups; an embedded model training device configured to train an embedded model of the knowledge-graph based on the obtained set of scoring functions; and a representation means configured to obtain an embedded representation of the knowledge-graph using the embedded model.
Alternatively, the scoring function may be expressed as:
Figure BDA0002496675690000064
wherein the plurality of relationships in the knowledge-graph can be divided into a plurality of relationship groups (CU)d},fd(h, r, t) is the relation group CUdD is more than or equal to 1 and less than or equal to D, D is the number of the plurality of relation groups, h, t and r respectively represent the head entity h, the tail entity t and the embedded vector of the relation r between h and t in the triplet (h, r, t) of the knowledge graph, and h, t and r are respectively divided into K sub-embedded vectors h according to the same division mode1To hK、r1To rKAnd t1To tKM is 1. ltoreq. K, n is 1. ltoreq. K, and K is a positive integer, gd(r) is a function f of the scored(h, r, t) corresponding to a KxK relational block matrix,
Figure BDA0002496675690000065
a block representing an mth row and an nth column in the relational block matrix,
Figure BDA0002496675690000066
C1≡{0,1},okis a set of operators
Figure BDA0002496675690000067
The (k) th operator in (2),
Figure BDA0002496675690000068
1≤k≤2K+1。
Alternatively, the search means may be configured to determine the scoring function corresponding to each relationship group by: determining a corresponding matrix structure of the relational block matrix corresponding to each scoring function, wherein the matrix structure indicates the distribution of non-zero blocks in the corresponding relational block matrix; determining the corresponding relational block matrix based on the determined matrix structure; and obtaining the corresponding scoring function based on the determined relational block matrix.
Optionally, the search means is configured to determine the corresponding matrix structure of the relational block matrix corresponding to each scoring function based on the following expression (2):

$$\{g_d^*\} = \arg\min_{g_d \in \mathcal{G}}\ \frac{1}{|S_{val}|}\sum_{i=1}^{|S_{val}|} \mathcal{L}\!\left(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*;\ S_{val}^{(i)}\right) \tag{2}$$

$$\text{s.t.}\qquad X^* = \arg\min_{X}\ \frac{1}{|S_{tra}|}\sum_{j=1}^{|S_{tra}|} \mathcal{L}\!\left(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j;\ S_{tra}^{(j)}\right)$$

wherein $g_d$ is the matrix structure corresponding to $g_d(\mathbf{r})$; $\mathcal{G}$ is a structure search space comprising a plurality of matrix structures; $S_{val}$ is a validation set, $S_{tra}$ is a training set, and $S_{val}$ and $S_{tra}$ are subsets of the triple set of the knowledge graph; $|S_{val}|$ and $|S_{tra}|$ respectively represent the numbers of triples in the validation set $S_{val}$ and the training set $S_{tra}$; $S_{val}^{(i)}$ represents the $i$-th triple $(h_i, r_i, t_i)$ in the validation set $S_{val}$, and $(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*)$ is the embedding vector corresponding to the triple $(h_i, r_i, t_i)$; $S_{tra}^{(j)}$ represents the $j$-th triple $(h_j, r_j, t_j)$ in the training set $S_{tra}$, and $(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j)$ is the embedding vector corresponding to the triple $(h_j, r_j, t_j)$, with $1 \le i \le |S_{val}|$, $1 \le j \le |S_{tra}|$, and $i$ and $j$ integers; $\mathcal{L}(\mathbf{h}, \mathbf{r}, \mathbf{t};\ S)$ is a loss function for measuring the loss of a given embedding vector $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ over the corresponding data $S$; $X^*$ is the embedding vector parameter of the embedding model that has the smallest loss on the training set $S_{tra}$, obtained by model training of the embedding model of the knowledge graph using the training set $S_{tra}$ based on a matrix structure $g_d$ in the structure search space $\mathcal{G}$; and $\{g_d^*\}$ is the set of matrix structures in the structure search space $\mathcal{G}$ for which the scoring functions of the embedding model based on the embedding vector parameters $X^*$ have the smallest loss on the validation set $S_{val}$, each matrix structure in the set respectively indicating the corresponding matrix structure of the relational block matrix corresponding to each scoring function.
Alternatively, the search means may be configured to determine the scoring function corresponding to each relationship group by: determining a corresponding structural weight matrix of the relational block matrix corresponding to each scoring function, wherein the structural weight matrix indicates the structural weight of each block in the corresponding relational block matrix; determining the corresponding relational block matrix based on the determined structural weight matrix; and obtaining the corresponding scoring function based on the determined relational block matrix.
Alternatively, the search means may be configured to determine the corresponding structural weight matrix of the relational block matrix corresponding to each scoring function based on the following expression (3):

$$\{A_d^*\} = \arg\min_{\{A_d\}}\ \frac{1}{|S_{val}|}\sum_{i=1}^{|S_{val}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_i^* \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_i^*, \mathrm{SP}(\mathbf{r}_i^*, A_d), \mathbf{t}_i^*;\ S_{val}^{(i)}\right) \tag{3}$$

$$\text{s.t.}\qquad X^* = \arg\min_{X}\ \frac{1}{|S_{tra}|}\sum_{j=1}^{|S_{tra}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_j \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_j, \mathrm{SP}(\mathbf{r}_j, A_d), \mathbf{t}_j;\ S_{tra}^{(j)}\right)$$

wherein $A_d$ represents the structural weight matrix corresponding to the relational block matrix $g_d(\mathbf{r})$; $S_{val}$ is a validation set, $S_{tra}$ is a training set, and $S_{val}$ and $S_{tra}$ are subsets of the triple set of the knowledge graph; $|S_{val}|$ and $|S_{tra}|$ respectively represent the numbers of triples in the validation set $S_{val}$ and the training set $S_{tra}$; $S_{val}^{(i)}$ represents the $i$-th triple $(h_i, r_i, t_i)$ in the validation set $S_{val}$, and $(\mathbf{h}_i^*, \mathbf{r}_i^*, \mathbf{t}_i^*)$ is the embedding vector corresponding to the triple $(h_i, r_i, t_i)$; $S_{tra}^{(j)}$ represents the $j$-th triple $(h_j, r_j, t_j)$ in the training set $S_{tra}$, and $(\mathbf{h}_j, \mathbf{r}_j, \mathbf{t}_j)$ is the embedding vector corresponding to the triple $(h_j, r_j, t_j)$, with $1 \le i \le |S_{val}|$, $1 \le j \le |S_{tra}|$, and $i$ and $j$ integers; $\mathrm{SP}(\mathbf{r}, A_d)$ represents the relational block matrix $g_d(\mathbf{r})$ having the structural weights indicated by $A_d$; $\mathbb{1}(\mathbf{r}_i^* \in CU_d)$ indicates whether $\mathbf{r}_i^*$ belongs to relationship group $CU_d$, and $\mathbb{1}(\mathbf{r}_j \in CU_d)$ indicates whether $\mathbf{r}_j$ belongs to relationship group $CU_d$; $\mathcal{L}(\mathbf{h}, \mathbf{r}, \mathbf{t};\ S)$ is a loss function for measuring the loss of a given embedding vector $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ over the corresponding data $S$; $X^*$ is the embedding vector parameter of the embedding model that has the smallest loss on the training set $S_{tra}$, obtained by model training of the embedding model of the knowledge graph using the training set $S_{tra}$ based on the structural weight matrices $A_d$; and $\{A_d^*\}$ is the set of structural weight matrices for which the scoring functions of the embedding model based on the embedding vector parameters $X^*$ have the smallest loss on the validation set $S_{val}$, each structural weight matrix in the set respectively corresponding to the relational block matrix corresponding to each scoring function.
Alternatively, the relationship dividing means may be configured to divide the plurality of relationships into a plurality of relationship groups using a clustering method.
Alternatively, the search means may be configured to determine by the following operation based on expression (3)
Figure BDA00024966756900000811
An initial embedding vector X is determined as { h, r, t }, a set of structural weight matrices { A }dAnd relationship group assignment indication
Figure BDA00024966756900000812
Wherein 1 ≦ t ≦ R | and t is an integer, R is a set of relationships of the knowledge-graph, and | R | represents a number of relationships of the knowledge-graph,
Figure BDA00024966756900000813
indicating relationships r in the knowledge-graphtWhether or not to belong to a relationship group CUd(ii) a Based on the initial embedded vector X ═ { h, r, t }, the set of structure weight matrices { AdAnd relationship group assignment indication
Figure BDA0002496675690000091
Updating { A ] by performing at least one iterative operationd}; to be used in the last iteration operationdDiscrete samples of } are determined as
Figure BDA0002496675690000092
Each iteration of the operation may include the following operations: set of structural weight matrices { A } by the search meansdSampling to obtain a set of discrete structure weight matrices
Figure BDA0002496675690000093
Based on obtained by the searching means
Figure BDA0002496675690000094
Updating the embedding vector X ═ { h, r, t }; assigning indications to relationship groups by a relationship partitioning means
Figure BDA0002496675690000095
Updating is carried out; bySearch device pair { AdThe update is performed.
Alternatively, the search means may be configured to sample the set of structural weight matrices $\{A_d\}$ by: repeatedly drawing discrete samples of $A_d$ with $B_d$, determined by expression (4), as the sampling probability, to obtain a set of discrete structural weight matrices $\{\bar{A}_d\}$ whose block weight vectors $\bar{a}^{(m,n)}$ satisfy the constraint $\bar{a}^{(m,n)} \in C_1 \cap C_2$:

$$B_d^{(m,n)} = \frac{\exp\!\left(a^{(m,n)} / \tau\right)}{\mathbf{1}^{\top}\exp\!\left(a^{(m,n)} / \tau\right)} \tag{4}$$

wherein $\tau \in [0, 1]$ is a preset hyper-parameter; $a^{(m,n)}$ is the structural weight vector of the block in the $m$-th row and $n$-th column of $A_d$ over the operator set $\mathcal{O} \equiv \{0, \mathbf{r}_1, \ldots, \mathbf{r}_K, -\mathbf{r}_1, \ldots, -\mathbf{r}_K\}$; $\mathbf{1}$ represents an all-ones column vector; $C_1 \equiv \{a^{(m,n)} \mid a^{(m,n)} \in \{0, 1\}^{2K+1}\}$; and $C_2 = \{a^{(m,n)} \mid \|a^{(m,n)}\|_0 = 1\}$, so that each sampled block weight vector is a one-hot vector that selects exactly one operator from the operator set.
Alternatively, the search means may be configured to update the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ by: selecting a predetermined number of triples from the training set $S_{tra}$ as a mini-batch set $B_{tra}$; and updating the embedding vector $X = \{\mathbf{h}, \mathbf{r}, \mathbf{t}\}$ according to the following expression (5):

$$X \leftarrow X - \eta\, \nabla_{X}\ \frac{1}{|B_{tra}|}\sum_{j'=1}^{|B_{tra}|}\sum_{d=1}^{D} \mathbb{1}\!\left(\mathbf{r}_{j'} \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_{j'}, \mathrm{SP}(\mathbf{r}_{j'}, \bar{A}_d), \mathbf{t}_{j'};\ B_{tra}^{(j')}\right) \tag{5}$$

wherein $\eta$ is a preset step size; $|B_{tra}|$ represents the number of triples in the mini-batch set $B_{tra}$; $B_{tra}^{(j')}$ represents the $j'$-th triple $(h_{j'}, r_{j'}, t_{j'})$ of $B_{tra}$, and $(\mathbf{h}_{j'}, \mathbf{r}_{j'}, \mathbf{t}_{j'})$ is the embedding vector corresponding to the triple $(h_{j'}, r_{j'}, t_{j'})$; $\mathbb{1}(\mathbf{r}_{j'} \in CU_d)$ indicates whether $\mathbf{r}_{j'}$ belongs to relationship group $CU_d$; and $1 \le j' \le |B_{tra}|$ with $j'$ an integer.
Alternatively, the relationship dividing means may be configured to update the relationship group assignment indicators $\mathbb{1}(r_t \in CU_d)$ by: assigning the plurality of relationships in the knowledge graph to different relationship groups using a clustering method; and updating $\mathbb{1}(r_t \in CU_d)$ based on the assignment result.
Alternatively, the search means may be configured to update $\{A_d\}$ by: selecting a predetermined number of triples from the validation set $S_{val}$ as a mini-batch set $B_{val}$; and updating each matrix in the set $\{A_d\}$ according to the following expression (6):

$$A_d \leftarrow A_d - \varepsilon\, \nabla_{A_d}\ \frac{1}{|B_{val}|}\sum_{i'=1}^{|B_{val}|} \mathbb{1}\!\left(\mathbf{r}_{i'}^* \in CU_d\right)\, \mathcal{L}\!\left(\mathbf{h}_{i'}^*, \mathrm{SP}(\mathbf{r}_{i'}^*, A_d), \mathbf{t}_{i'}^*;\ B_{val}^{(i')}\right) \tag{6}$$

wherein $\varepsilon$ is a preset step size; $|B_{val}|$ represents the number of triples in the mini-batch set $B_{val}$; $B_{val}^{(i')}$ represents the $i'$-th triple $(h_{i'}, r_{i'}, t_{i'})$ of $B_{val}$, and $(\mathbf{h}_{i'}^*, \mathbf{r}_{i'}^*, \mathbf{t}_{i'}^*)$ is the embedding vector corresponding to the triple $(h_{i'}, r_{i'}, t_{i'})$; $\mathbb{1}(\mathbf{r}_{i'}^* \in CU_d)$ indicates whether $\mathbf{r}_{i'}^*$ belongs to relationship group $CU_d$; and $1 \le i' \le |B_{val}|$ with $i'$ an integer.
According to another embodiment of the present invention, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the aforementioned relationship-awareness based knowledge-graph embedding method.
According to another embodiment of the present invention, there is provided a system comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the aforementioned relationship-awareness based knowledge-graph embedding method.
Advantageous effects
By applying the knowledge graph embedding method and system based on relationship cognition according to the exemplary embodiments of the present invention, a relationship-aware SF can be determined efficiently for a given task without multiple rounds of model training, achieving excellent results in terms of both performance and efficiency.
Drawings
These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a block diagram illustrating a relationship awareness based knowledge-graph embedding system according to an exemplary embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a relationship awareness based knowledge-graph embedding method according to an exemplary embodiment of the present disclosure.
Detailed Description
In order that those skilled in the art will better understand the present invention, exemplary embodiments thereof will be described in further detail below with reference to the accompanying drawings and detailed description.
Before describing the inventive concept below, for ease of understanding, the various parameters used in the present application and their notation are explained first:
Vectors are represented in lower-case bold and matrices in upper-case bold. For a knowledge graph, its sets of entities and relationships are represented by ε and R, respectively. The triples in the knowledge graph are represented by (h, r, t), where h ∈ ε and t ∈ ε are the head and tail entities, respectively, and r ∈ R is the relationship.
The parameters of the knowledge graph embedding model are denoted as e (for each entity) and r (for each relationship). For simplicity, in the following, an embedding (sometimes also referred to below as an embedding vector) is represented by the bold-faced form of the corresponding parameter; e.g., **a** is the embedding of a.
⟨a, b, c⟩ is a dot product; for real-valued vectors it equals Σ_i a_i b_i c_i, whereas for complex-valued vectors it is the Hermitian product.
diag(b) is the diagonal matrix whose diagonal entries are the components of b.
Furthermore, in the context of the present disclosure, parameters having the same expression have the same definition.
f(h, r, t) is a scoring function that returns a real value reflecting the similarity of the triple (h, r, t), with a higher score representing greater similarity.
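For ease of understanding, the tri-linear dot product used in the notation above can be sketched as follows for real-valued embeddings (a minimal NumPy illustration; the toy vectors are invented for the example):

```python
import numpy as np

def score(h, r, t):
    """Tri-linear dot product <h, r, t> = sum_i h_i * r_i * t_i
    for real-valued embedding vectors."""
    return float(np.sum(h * r * t))

# Toy 4-dimensional embeddings for a triple (h, r, t).
h = np.array([1.0, 0.0, 2.0, -1.0])
r = np.array([0.5, 1.0, 1.0, 2.0])
t = np.array([2.0, 3.0, 0.5, 1.0])

print(score(h, r, t))  # 1*0.5*2 + 0 + 2*1*0.5 + (-1)*2*1 = 0.0
```

A higher value of this product indicates greater similarity of the triple under the given embeddings.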
Fig. 1 is a block diagram illustrating a relationship awareness based knowledge-graph embedding system 100 according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, the knowledge-graph embedding system 100 based on relationship recognition may include a relationship dividing means 110, a searching means 120, an embedding model training means 130, and a representing means 140.
The relationship dividing apparatus 110 according to an exemplary embodiment of the present disclosure may divide the plurality of relationships in the knowledge graph into a plurality of relationship groups such that each relationship in the knowledge graph belongs to only one relationship group.
In an exemplary embodiment of the present invention, the relationship dividing apparatus 110 may divide the plurality of relationships in the knowledge graph into a plurality of relationship groups using a clustering method, however, it should be understood that the method of dividing the plurality of relationships in the knowledge graph is not limited thereto, and any other suitable method may be used to divide the relationships.
Hereinafter, for ease of understanding, the plurality of relationship groups may be represented as {CU_d}, where CU_d represents the d-th relationship group, each CU_d ⊆ R with CU_d ∩ CU_{d′} = ∅ for any d′ ≠ d, R is the set of relationships in the knowledge graph, 1 ≤ d ≤ D, and D is the number of the plurality of relationship groups.
The search means 120 may determine a scoring function corresponding to each relationship group based on the division result, and obtain a set of scoring functions for the plurality of relationship groups.
In an exemplary embodiment of the present invention, the form of the scoring function may be expressed as follows:

f_d(h, r, t) = Σ_{m=1}^{K} Σ_{n=1}^{K} ⟨h_m, g_d^{(m,n)}(r), t_n⟩    (1)

Here, f_d(h, r, t) is the scoring function corresponding to the relationship group CU_d, and h, t, and r respectively represent the embedding vectors of the head entity h, the tail entity t, and the relationship r between h and t in a triple (h, r, t) of the knowledge graph. In expression (1), h, r, and t can each be divided, in the same division manner, into K sub-embedding vectors h_1 to h_K, r_1 to r_K, and t_1 to t_K, where 1 ≤ m ≤ K, 1 ≤ n ≤ K, and K is a positive integer. g_d(r) is the K×K relational block matrix corresponding to the scoring function f_d(h, r, t), and g_d^{(m,n)}(r) represents the block in the m-th row and n-th column of the relational block matrix, where g_d^{(m,n)}(r) = Σ_{k=1}^{2K+1} a_k^{(m,n)} o_k with a^{(m,n)} ∈ C_1^{2K+1}, C_1 ≡ {0, 1}, and o_k is the k-th operator in the operator set O ≡ {0, r_1, −r_1, …, r_K, −r_K}, 1 ≤ k ≤ 2K+1.
In the embodiment of the present invention, being divided in the same division manner means that, among the K sub-embedding vectors h_1 to h_K, r_1 to r_K, and t_1 to t_K obtained by dividing the embedding vectors h, r, and t, corresponding sub-embedding vectors have the same dimension; i.e., h_1, r_1, and t_1 have the same dimension, h_2, r_2, and t_2 have the same dimension, and so on. Furthermore, in embodiments of the present invention, the embedding vectors h, r, and t may be partitioned uniformly (i.e., every sub-embedding vector has the same dimension; e.g., the sub-embedding vectors h_1 to h_K all have the same dimension) or non-uniformly (i.e., the dimensions of the individual sub-embedding vectors are not all the same; e.g., the dimensions of the sub-embedding vectors h_1 to h_K are not all the same).
As can be seen from the above representation of the scoring function, the main difference between the relational block matrices g_d(r) of different scoring functions is the distribution of the sub-embedding vectors ±r_1, …, ±r_K. Thus, after the embedding vectors h, r, and t are partitioned, a plurality of candidate scoring functions can be designed based on the distribution of the non-zero blocks (i.e., the sub-embedding vectors ±r_1, …, ±r_K) in the K×K block matrix g_d(r), thereby constituting a scoring function search space. The process by which the search means 120 determines the scoring function corresponding to each relationship group may then be a process of searching, in this scoring function search space, for the scoring function corresponding to each relationship group.

Therefore, in the embodiment of the present invention, in combination with the above-mentioned method of constructing the scoring function search space, searching for an appropriate scoring function in the scoring function search space can in fact be converted into finding a suitable matrix structure g_d for the relational block matrix g_d(r) of the scoring function, where g_d indicates the distribution of the non-zero blocks in the relational block matrix g_d(r) (i.e., the distribution structure of the sub-embedding vectors ±r_1, …, ±r_K in the block matrix). The relational block matrix g_d(r) can then be determined based on the matrix structure g_d, and the corresponding scoring function can be obtained from the determined relational block matrix g_d(r).
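As an illustration of how a candidate scoring function arises from the distribution of non-zero blocks, the following sketch evaluates a block-matrix scoring function for K = 2 under one hypothetical structure. The structure, toy vectors, and the encoding of blocks as signed indices (entry ±k meaning the block ±r_k, entry 0 meaning a zero block) are assumptions made for the example, not the patent's exact representation:

```python
import numpy as np

# Hypothetical structure for K = 2: entry s in {0, +1, -1, +2, -2} at
# position (m, n) means the block g_d^(m,n)(r) equals 0 or ±r_|s|.
structure = [[1, 0],
             [0, -2]]

def f_d(h, r, t, structure, K=2):
    """Score f_d(h, r, t) = sum_{m,n} <h_m, g_d^(m,n)(r), t_n>,
    with h, r, t each split evenly into K sub-embedding vectors."""
    h_parts = np.split(h, K)
    r_parts = np.split(r, K)
    t_parts = np.split(t, K)
    total = 0.0
    for m in range(K):
        for n in range(K):
            s = structure[m][n]
            if s != 0:  # zero blocks contribute nothing
                sign = 1.0 if s > 0 else -1.0
                total += sign * np.sum(h_parts[m] * r_parts[abs(s) - 1] * t_parts[n])
    return total

h = np.array([1.0, 2.0, 3.0, 4.0])
r = np.array([1.0, 1.0, 2.0, 2.0])
t = np.array([1.0, 1.0, 1.0, 1.0])
# block (1,1): +r_1 -> 1*1*1 + 2*1*1 = 3
# block (2,2): -r_2 -> -(3*2*1 + 4*2*1) = -14
print(f_d(h, r, t, structure))  # 3 - 14 = -11.0
```

Different choices of `structure` yield different candidate scoring functions over the same embeddings, which is exactly what makes the space searchable.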
Merely by way of example, in an exemplary embodiment of the present invention, the search apparatus 120 may determine the corresponding matrix structure of the relational block matrix corresponding to each scoring function using the following expression (2):

{g_d*} = argmin_{g_d ∈ 𝒢} Σ_{i=1}^{|S_val|} ℓ(h_i, r_i, t_i; S_val^(i))    (2)

s.t. {h, r, t}* = argmin_{h, r, t} Σ_{j=1}^{|S_tra|} ℓ(h_j, r_j, t_j; S_tra^(j))

In expression (2), g_d is the matrix structure corresponding to g_d(r), and 𝒢 is a structure search space including a plurality of matrix structures, corresponding to the above scoring function search space. S_val is a validation set, S_tra is a training set, and S_val and S_tra are both subsets of the triple set of the knowledge graph; |S_val| and |S_tra| respectively represent the numbers of triples in the validation set S_val and the training set S_tra. S_val^(i) represents the i-th triple (h_i, r_i, t_i) in the validation set S_val, and (h_i, r_i, t_i) is the embedding vector corresponding to the triple (h_i, r_i, t_i); S_tra^(j) represents the j-th triple (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triple (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers.
In addition, ℓ(h, r, t; S) is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S, and {h, r, t}* is the set of embedding vector parameters of the embedding model that has the least loss on the training set S_tra, obtained by performing model training of the embedding model of the knowledge graph on the training set S_tra based on a matrix structure g_d in the structure search space 𝒢. In exemplary embodiments of the present invention, the above-described model training may be performed using various suitable model training methods that are already known to those skilled in the art or may appear in the future, and therefore, for the sake of brevity, they will not be described redundantly.
{g_d*} is the set of matrix structures in the structure search space 𝒢 for which the embedding model based on the trained embedding vector parameters {h, r, t}* has the least loss on the validation set S_val. The searching apparatus 120 may determine each matrix structure in this set as the corresponding matrix structure of the relational block matrix corresponding to each scoring function, respectively.
Further, in the exemplary embodiment of the present invention, since g_d^{(m,n)}(r) = Σ_{k=1}^{2K+1} a_k^{(m,n)} o_k, the distribution of the non-zero blocks (i.e., the sub-embedding vectors ±r_1, …, ±r_K) in the relational block matrix g_d(r) actually depends on the structure weights a^{(m,n)}. Therefore, in an exemplary embodiment of the present invention, searching for an appropriate scoring function in the scoring function search space may be further converted into finding a suitable structure weight matrix A_d for the relational block matrix g_d(r) of the scoring function, where A_d indicates the structure weights of the individual blocks in the relational block matrix g_d(r). The relational block matrix g_d(r) may then be determined based on the structure weight matrix, thereby enabling the corresponding scoring function to be obtained from the determined relational block matrix g_d(r).
Merely by way of example, in an exemplary embodiment of the present invention, the search apparatus 120 may determine the corresponding structure weight matrix of the relational block matrix corresponding to each scoring function based on the following expression (3):

{A_d*} = argmin_{{A_d}} Σ_{d=1}^{D} Σ_{i=1}^{|S_val|} I(r_i ∈ CU_d) ℓ(h_i, r_i, t_i; S_val^(i))    (3)

s.t. {h, r, t}* = argmin_{h, r, t} Σ_{d=1}^{D} Σ_{j=1}^{|S_tra|} I(r_j ∈ CU_d) ℓ(h_j, r_j, t_j; S_tra^(j))

In expression (3), the loss is computed with the scoring function whose relational block matrix is SP(r, A_d), where SP(r, A_d) denotes the relational block matrix g_d(r) having the structure weights indicated by A_d; I(r_i ∈ CU_d) indicates whether r_i belongs to the relationship group CU_d, and I(r_j ∈ CU_d) indicates whether r_j belongs to the relationship group CU_d.
ℓ(h, r, t; S) is a loss function for measuring the loss of a given embedding vector (h, r, t) over the corresponding data S, and {h, r, t}* is the set of embedding vector parameters of the embedding model that has the least loss on the training set S_tra, obtained by performing model training of the embedding model of the knowledge graph on the training set S_tra based on the structure weight matrices A_d. In exemplary embodiments of the present invention, the above-described model training may be performed using various suitable model training methods that are already known to those skilled in the art or may appear in the future, and therefore, for the sake of brevity, they will not be described redundantly. {A_d*} is the set of structure weight matrices for which the embedding model based on the embedding vector parameters {h, r, t}* has the least loss on the validation set S_val; the searching apparatus 120 may determine each structure weight matrix in this set as the corresponding structure weight matrix of the relational block matrix corresponding to each scoring function, respectively.
Further, as shown above, expressions (2) and (3) in fact involve two levels of optimization. It should also be understood that, for example, when K = 4 (i.e., the embedding vectors h, r, and t are each divided into 4 sub-embedding vectors), the 4×4 block matrix g_d has 9^16 possible structures (9 choices for each sub-block, i.e., 0, ±r_1, …, ±r_4); that is, the structure search space 𝒢 includes 9^16 different structures, and, likewise, A_d has many possible forms. Therefore, when the set of matrix structures {g_d*} or the set of structure weight matrices {A_d*} is searched for directly using the above expressions (2) and (3), the search process can be quite complex and time-consuming.
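The size of the structure search space cited above for K = 4 can be checked directly:

```python
# Each of the K x K = 16 sub-blocks of g_d has 2K + 1 = 9 choices
# (0, ±r_1, ..., ±r_4), giving 9^16 possible structures in total.
K = 4
choices_per_block = 2 * K + 1           # 9
num_structures = choices_per_block ** (K * K)
print(num_structures)  # 9**16 = 1853020188851841
```

At roughly 1.85 × 10^15 candidate structures, exhaustive search is clearly infeasible, which motivates the gradient-based search below.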
Thus, by way of example only, in exemplary embodiments of the present invention, the search apparatus 120 may determine {A_d*} in expression (3) based on the following operations:

(A) Determining an initial embedding vector X = {h, r, t}, a set of structure weight matrices {A_d}, and a relationship group assignment indication {I(r_t ∈ CU_d)}, wherein 1 ≤ t ≤ |R| and t is an integer, R is the set of relationships of the knowledge graph, |R| represents the number of relationships of the knowledge graph, and I(r_t ∈ CU_d) indicates whether the relationship r_t in the knowledge graph belongs to the relationship group CU_d. Here, the search apparatus 120 may determine the initial embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)} using Gaussian initialization or the like. It should be understood, however, that the present application is not limited thereto, and any other suitable initialization method may be used to obtain the initial embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)}.
(B) Based on the initial embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)}, updating {A_d} by performing at least one iterative operation, and determining the discrete samples of {A_d} used in the last iterative operation as {A_d*}.

By way of example only, {A_d*} in expression (3) above may be determined based on the following example Algorithm 1. (Algorithm 1 appears as an image in the original publication; its steps are described below.)
Referring to Algorithm 1, in an exemplary embodiment of the present invention, each iterative operation may include the following operations (B1) to (B4):

(B1) The search means 120 samples the set of structure weight matrices {A_d} to obtain a set of discrete structure weight matrices {Ā_d} (step 3 in Algorithm 1).
Here, the search means 120 may determine B_d by the following expression (4) and, using B_d as the sampling probability, repeatedly sample A_d to obtain a set of discrete structure weight matrices {Ā_d} satisfying the constraint a^{(m,n)} ∈ C_2:

B_d = (1 − τ)A_d + (τ/|O|)·1    (4)

In expression (4), τ ∈ [0, 1] is a preset hyper-parameter, |O| indicates the size of the operator set O ≡ {0, r_1, −r_1, …, r_K, −r_K} (i.e., |O| = 2K+1), and 1 indicates a column vector of all 1s. Furthermore, C_2 = {a^{(m,n)} | ‖a^{(m,n)}‖_0 = 1}; i.e., each sampled structure weight vector a^{(m,n)} selects exactly one operator.
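A sketch of the sampling in operation (B1), under the assumption that each row of A_d is a probability distribution over the 2K + 1 operators and that the sampling probability mixes A_d with a uniform term weighted by τ (this mixing form is a reconstruction, since the patent gives expression (4) only as an image):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_discrete(A_d, tau=0.1):
    """Sample a discrete structure weight matrix: one one-hot row per block.

    Rows of A_d index the K*K block positions (m, n); columns index the
    2K + 1 operators in O = {0, ±r_1, ..., ±r_K}. Each row of A_d is
    assumed to be a probability distribution over the operators.
    """
    n_ops = A_d.shape[1]
    B_d = (1.0 - tau) * A_d + tau / n_ops   # mixed sampling probabilities
    sampled = np.zeros_like(A_d)
    for m in range(A_d.shape[0]):
        k = rng.choice(n_ops, p=B_d[m])     # draw one operator per block
        sampled[m, k] = 1.0                 # one-hot: ||a^(m,n)||_0 = 1
    return sampled

# K = 2: four block positions, 2K + 1 = 5 operators per block.
A_d = np.full((4, 5), 0.2)
A_bar = sample_discrete(A_d)
```

The τ term keeps every operator reachable during sampling even when A_d concentrates on a few choices, which helps exploration during the search.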
(B2) The search means 120 updates the embedding vector X = {h, r, t} based on the obtained {Ā_d}.
In an exemplary embodiment of the present invention, when updating the embedding vector X = {h, r, t}, a predetermined number of triples may first be selected (e.g., randomly selected) from the training set S_tra as a mini-batch set B_tra (step 4 in Algorithm 1), and then the embedding vector X = {h, r, t} may be updated according to the following expression (5) (step 5 in Algorithm 1):

X ← X − η Σ_{d=1}^{D} Σ_{j′=1}^{|B_tra|} I(r_{j′} ∈ CU_d) ∇_X ℓ(h_{j′}, r_{j′}, t_{j′}; B_tra^(j′))    (5)

where η is a preset step size, |B_tra| denotes the number of triples in the mini-batch set B_tra, B_tra^(j′) represents the j′-th triple (h_{j′}, r_{j′}, t_{j′}) in the mini-batch set B_tra, (h_{j′}, r_{j′}, t_{j′}) is the embedding vector corresponding to the triple (h_{j′}, r_{j′}, t_{j′}), I(r_{j′} ∈ CU_d) indicates whether r_{j′} belongs to the relationship group CU_d, 1 ≤ j′ ≤ |B_tra|, and j′ is an integer.
(B3) The relationship dividing means 110 updates the relationship group assignment indication {I(r_t ∈ CU_d)}.

In an exemplary embodiment of the present invention, the relationship dividing apparatus 110 may assign the plurality of relationships in the knowledge graph to different relationship groups using a clustering method, and then update {I(r_t ∈ CU_d)} based on the assignment result. Here, once {I(r_t ∈ CU_d)} has been determined, the indications I(r_{j′} ∈ CU_d) and I(r_{i′} ∈ CU_d) used in the preceding expressions can be determined accordingly.
For example only, the relationship dividing apparatus 110 may implement the clustering according to the following expression (6) (step 6 in Algorithm 1):

min_{{c_d}, {b_dt}} Σ_{t=1}^{|R|} Σ_{d=1}^{D} b_dt ‖r_t − c_d‖²    (6)

where r_t is the embedding vector corresponding to the relationship r_t in the relationship set R of the knowledge graph, c_d is a vector representation of the relationship group CU_d, and b_dt represents the degree of membership between CU_d and r_t.

In an exemplary implementation of the present invention, the EM algorithm may be used to obtain a solution to the above expression (6) to determine the assignment of the relationships. Specifically, in the EM algorithm, the E-step shown in the following expression (7) and the M-step shown in the following expression (8) may be iteratively performed until the cluster groups (i.e., the relationship groups) converge, thereby obtaining the assignment result of the relationships:

b_dt = exp(−‖r_t − c_d‖²) / Σ_{d′=1}^{D} exp(−‖r_t − c_{d′}‖²)    (7)

c_d = Σ_{t=1}^{|R|} b_dt r_t / Σ_{t=1}^{|R|} b_dt    (8)
After the clustering is completed, based on the clustering result {b_dt}: if d = argmax_{d′} b_{d′t}, then I(r_t ∈ CU_d) = 1; otherwise, I(r_t ∈ CU_d) = 0. The updated {I(r_t ∈ CU_d)} can thereby be obtained.
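Operation (B3) can be sketched as an EM-style soft clustering of the relation embeddings followed by a hard argmax assignment. The concrete E-step and M-step forms below are plausible reconstructions, since expressions (6) to (8) appear only as images in the original; the toy embeddings are invented for the example:

```python
import numpy as np

def cluster_relations(R_emb, D, n_iter=50):
    """Soft-cluster relation embeddings into D relationship groups via EM.

    E-step: memberships b[t, d] proportional to exp(-||r_t - c_d||^2);
    M-step: centers c_d as membership-weighted means of the embeddings.
    Returns the hard assignment d = argmax_d' b_d't per relation.
    """
    centers = R_emb[:D].astype(float).copy()  # simple deterministic init
    b = None
    for _ in range(n_iter):
        # E-step: squared distances of every embedding to every center
        d2 = ((R_emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        b = np.exp(-d2)
        b /= b.sum(axis=1, keepdims=True)
        # M-step: membership-weighted means
        centers = (b.T @ R_emb) / b.sum(axis=0)[:, None]
    return b.argmax(axis=1)

# Six toy relation embeddings forming two well-separated groups.
R_emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
groups = cluster_relations(R_emb, D=2)
```

Because the relation embeddings themselves change during training, repeating this clustering inside the outer loop lets the group assignment track the evolving embeddings.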
(B4) The search apparatus 120 updates {A_d}.

In an exemplary embodiment of the present invention, when updating {A_d}, a predetermined number of triples may first be selected (e.g., randomly selected) from the validation set S_val as a mini-batch set B_val (step 7 in Algorithm 1), and then each matrix in the set {A_d} may be updated according to the following expression (9) (steps 8 to 10 in Algorithm 1):

A_d ← A_d − ε Σ_{i′=1}^{|B_val|} I(r_{i′} ∈ CU_d) ∇_{A_d} ℓ(h_{i′}, r_{i′}, t_{i′}; B_val^(i′))    (9)

Here, ε is a preset step size, |B_val| denotes the number of triples in the mini-batch set B_val, B_val^(i′) represents the i′-th triple (h_{i′}, r_{i′}, t_{i′}) in the mini-batch set B_val, (h_{i′}, r_{i′}, t_{i′}) is the embedding vector corresponding to the triple (h_{i′}, r_{i′}, t_{i′}), I(r_{i′} ∈ CU_d) indicates whether r_{i′} belongs to the relationship group CU_d, 1 ≤ i′ ≤ |B_val|, and i′ is an integer.
In an exemplary embodiment of the invention, the {Ā_d} determined in step 3 of the last iteration of Algorithm 1 can be determined as the finally obtained {A_d*}.
Further, although the above description shows that, in each iterative update, the embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)} are updated in sequence, the present invention is not limited thereto; the update order of the embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)} can be set arbitrarily and is not limited to the order in Algorithm 1 above.

Furthermore, in an exemplary embodiment of the present invention, when updating the embedding vector X = {h, r, t}, the set of structure weight matrices {A_d}, and the relationship group assignment indication {I(r_t ∈ CU_d)}, the values of these parameters involved in an update may be either the update results from the previous iterative operation or the update results of the corresponding parameters in the current iterative operation. For example, in Algorithm 1 shown above, the embedding vectors h, r, t used in expression (9) when updating the set {A_d} in step 9 may be the embedding vectors updated in the previous iterative operation, or the embedding vectors updated in step 5 of the current iterative operation.
It should be understood that the various algorithms and the specific calculation methods shown in the form of expressions in the above steps (B1) to (B4) of the iterative operation are merely examples listed for ease of understanding the present application; the present application is not limited thereto, and the operations of steps (B1) to (B4) may be accomplished using other methods.
Further, it should also be understood that, in the exemplary embodiment of the present invention, the searching means 120 may search for the corresponding scoring function for each of the divided relationship groups after the relationship dividing means 110 completes the division of the relationship groups. However, the present application is not limited thereto; as shown in Algorithm 1 above, the relationship dividing means 110 may also iteratively update the division of the relationship groups based on the optimization results of the searching means 120 during the search for the scoring functions. Through such iterative updates, the relationship-awareness based knowledge-graph embedding system 100 of the present application can search out the optimal scoring function for each relationship group while obtaining the optimal relationship group division.
After searching out the scoring functions of the respective relationship groups, the embedded model training device 130 may train the embedded model of the knowledge-graph based on the obtained set of scoring functions, and the representing device 140 may obtain the embedded representation of the knowledge-graph using the embedded model.
Further, although not shown in fig. 1, the knowledge-graph embedding system 100 based on relationship awareness according to an exemplary embodiment of the present disclosure may further include: a machine learning model training unit (not shown) for training a machine learning model based on the obtained embedded representation of the knowledge graph to obtain a target machine learning model for performing at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution; and a prediction unit (not shown) for performing a prediction task using the target machine learning model, wherein the prediction task includes at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution.
Fig. 2 is a flowchart illustrating a relationship awareness based knowledge-graph embedding method 200 according to an exemplary embodiment of the present disclosure.
As shown in fig. 2, in step S210, the plurality of relationships in the knowledge-graph may be divided into a plurality of relationship groups by the relationship dividing unit 110 such that each relationship in the knowledge-graph belongs to only one relationship group.
In step S220, the search unit 120 may determine a scoring function corresponding to each relationship group based on the division result, obtaining a set of scoring functions for the plurality of relationship groups.
Thereafter, the obtained set of scoring functions may be used by the embedded model training unit 130 to train the embedded model of the knowledge-graph at step S230, and the representation unit 140 may obtain an embedded representation of the knowledge-graph using the embedded model at step S240.
It should be understood that, in the exemplary embodiment of the present invention, the respective scoring functions may be searched for the divided relationship groups in step S220 after the division of the relationship groups is completed in step S210. However, as shown in Algorithm 1 above, the execution order of step S210 and step S220 is not limited thereto: the two steps may also be executed together, and the division of the relationship groups may be iteratively updated based on the optimization results obtained while searching for the scoring functions. Through such iterative updates, the relationship-awareness based knowledge-graph embedding method 200 of the present application can search out the optimal scoring function for each relationship group while obtaining the optimal relationship group division.
The specific processes of detailed operations performed by the above-mentioned components of the knowledge-graph embedding system 100 based on relationship awareness according to the exemplary embodiment of the present disclosure have been described in detail above with reference to fig. 1, and therefore, for brevity, will not be described again here.
Furthermore, the knowledge graph embedding method based on relationship awareness according to the exemplary embodiment of the present disclosure may train a machine learning model based on the embedded representation of the knowledge graph obtained in step S240, obtain a target machine learning model for performing at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution, and may perform a prediction task using the target machine learning model, wherein the prediction task includes at least one of relationship retrieval, semantic retrieval, intelligent recommendation, intelligent question answering, personalized recommendation, and content distribution.
That is, the knowledge-graph embedding method and system based on relationship awareness of the exemplary embodiments of the present disclosure may be applied to various fields, such as relationship retrieval, semantic retrieval, smart recommendation, smart question answering, personalized recommendation, anti-fraud, content distribution, and the like.
By way of example only, among the various application scenarios of the knowledge graph embedding method and system based on relationship awareness according to exemplary embodiments of the present disclosure, for retrieval (such as relationship retrieval and semantic retrieval), the relationship between two entities, or a corresponding other entity, may be retrieved by inputting two keywords. For example, inputting (China, Beijing) may retrieve that the relationship between them is "capital" (i.e., Beijing is the capital of China), and inputting (Zhang San, mother) may retrieve the other entity "Li Si" (Zhang San's mother).
For example, for intelligent question answering, inputting "Where is the capital of China?" can accurately return "Beijing", so that the user's intention can be truly understood through the knowledge graph.
For example, for anti-fraud, when information about a borrower (entity) is added to the knowledge-graph, it may be determined whether there is a risk of fraud by reading the relationship between the borrower and others in the knowledge-graph, or whether the information they share is consistent.
For example, for intelligent recommendation (e.g., personalized recommendation), similar content may be recommended to the entities of triples having similar relationships. For example, given the triple (Zhang San, student, a certain high school) (i.e., Zhang San is a student at that high school), recommendations may be made to Zhang San based on information about other students of the same school in the knowledge graph.
In the different applications of the knowledge graph above, the evaluation indexes for judging whether the knowledge graph has been properly applied also differ. For example, for retrieval applications, the evaluation indexes are generally the recall rate and accuracy of the retrieval; for anti-fraud, the evaluation indexes are generally credit, probability of fraud, and the like; and for intelligent question answering and intelligent recommendation, the evaluation indexes are satisfaction, accuracy, and the like. Therefore, the evaluation index of a knowledge-graph embedding model is generally determined according to the application scenario of the model, and a corresponding scoring function is designed accordingly, so that a better embedding model of the knowledge graph can be trained by utilizing a better scoring function. With the scoring function searched out according to the exemplary embodiment of the present invention, the best scoring function can be found by automatically incorporating the evaluation indexes into the search process, eliminating the inconvenience of manually designing scoring functions. In addition, since the scoring function search space can cover all possible scoring function forms, it is favorable for expanding the search range so as to find a better scoring function for the knowledge graph.
By applying the above knowledge graph embedding method and system based on relationship cognition, the relation-aware scoring function can be effectively determined for a given task without multiple rounds of model training, achieving excellent performance and efficiency.
A relationship awareness based knowledge graph embedding method and system according to an exemplary embodiment of the present disclosure has been described above with reference to fig. 1 to 2. However, it should be understood that: the apparatus and systems shown in the figures may each be configured as software, hardware, firmware, or any combination thereof that performs the specified function. For example, the systems and apparatuses may correspond to an application-specific integrated circuit, a pure software code, or a module combining software and hardware. Further, one or more functions implemented by these systems or apparatuses may also be performed collectively by components in a physical entity device (e.g., a processor, a client, or a server, etc.).
Further, the above method may be implemented by instructions recorded on a computer-readable storage medium, for example, according to an exemplary embodiment of the present application, there may be provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the steps of: dividing a plurality of relations in a knowledge graph into a plurality of relation groups, wherein each relation in the knowledge graph only belongs to one relation group; determining a scoring function corresponding to each relationship group based on the division result, and obtaining a scoring function set aiming at the plurality of relationship groups; training an embedded model of the knowledge-graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge-graph using the embedded model.
The instructions stored in the computer-readable storage medium can be executed in an environment deployed in a computer device such as a client, a host, a proxy device, a server, and the like, and it should be noted that the instructions can also be used to perform additional steps other than the above steps or perform more specific processing when the above steps are performed, and the contents of the additional steps and the further processing are mentioned in the description of the related method with reference to fig. 1 to 2, and therefore will not be described again here to avoid repetition.
It should be noted that the relationship-awareness-based knowledge-graph embedding system according to the exemplary embodiments of the present disclosure may rely entirely on the execution of computer programs or instructions to realize its functions; that is, each device corresponds to a step in the functional architecture of the computer program, so that the whole system is invoked through a dedicated software package (e.g., a lib library) to realize the corresponding functions.
On the other hand, when the system and apparatus shown in fig. 1 are implemented in software, firmware, middleware, or microcode, the program code or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that at least one processor or at least one computing device can perform those operations by reading and executing the corresponding program code or code segments.
For example, according to an exemplary embodiment of the present application, a system may be provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the steps of: dividing a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group; determining a scoring function corresponding to each relationship group based on the division result, and obtaining a set of scoring functions for the plurality of relationship groups; training an embedding model of the knowledge-graph based on the obtained set of scoring functions; and obtaining an embedded representation of the knowledge-graph using the embedding model.
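For the kind of block-structured scoring function the claims make precise — a K×K relationship block matrix whose blocks are each empty or a signed sub-embedding vector of r — a hypothetical evaluator can be written directly. The encoding of the block layout below (`0` for an empty block, `±(k+1)` for ±r_k) is our own illustrative convention, not the claimed representation.

```python
import numpy as np

# Illustrative sketch: the embedding vectors h, r, t are split into K
# sub-vectors, and a K x K "relationship block matrix" assigns either
# nothing or a signed sub-vector of r to each block position. The
# layout passed in is an arbitrary example, not a searched structure.

def block_score(h, r, t, structure):
    """structure[m][n] is 0 (empty block) or ±(k+1), standing for ±r_k."""
    K = len(structure)
    hs, rs, ts = (np.array_split(v, K) for v in (h, r, t))
    total = 0.0
    for m in range(K):
        for n in range(K):
            s = structure[m][n]
            if s != 0:
                k = abs(s) - 1
                # elementwise triple product of h_m, ±r_k and t_n
                total += np.sign(s) * float(np.sum(hs[m] * rs[k] * ts[n]))
    return total
```

With the identity-like layout `[[1, 0], [0, 2]]`, the score reduces to a DistMult-style product over the two sub-vector pairs.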
In particular, the above-described system may be deployed in a server or a client, or on a node in a distributed network environment. Further, the system may be a PC, a tablet device, a personal digital assistant, a smartphone, a web appliance, or any other device capable of executing the above set of instructions. The system may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). In addition, all components of the system may be connected to each other via a bus and/or a network.
The system here need not be a single entity; it can be any collection of devices or circuits capable of executing the above instructions (or instruction sets), individually or jointly. The system may also be part of an integrated control system or a system manager, or may be configured as a portable electronic device that interfaces with a local or remote system (e.g., via wireless transmission).
In the system, the at least one computing device may comprise a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the at least one computing device may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like. The computing device may execute instructions or code stored in one of the storage devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The storage device may be integrated with the computing device, for example by arranging RAM or flash memory within an integrated-circuit microprocessor or the like. Alternatively, the storage device may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage device and the computing device may be operatively coupled, or may communicate with each other, for example through I/O ports or network connections, so that the computing device can read the instructions stored in the storage device.
While exemplary embodiments of the present application have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present application is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present application. Therefore, the protection scope of the present application shall be subject to the scope of the claims.

Claims (10)

1. A relationship awareness-based knowledge graph embedding method, the method comprising:
dividing a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group;
determining a scoring function corresponding to each relationship group based on the division result, and obtaining a set of scoring functions for the plurality of relationship groups;
training an embedding model of the knowledge-graph based on the obtained set of scoring functions; and
obtaining an embedded representation of the knowledge-graph using the embedding model.
2. The method of claim 1, wherein the scoring function is represented as the following expression (1):

f_d(h, r, t) = Σ_{m=1}^{K} Σ_{n=1}^{K} h_m^⊤ diag([g_d(r)]_{mn}) t_n    (1)

wherein the plurality of relationships in the knowledge-graph are divided into the plurality of relationship groups {CU_d}, f_d(h, r, t) is the scoring function corresponding to the relationship group CU_d, 1 ≤ d ≤ D, and D is the number of the plurality of relationship groups;
h, t and r respectively denote the embedding vectors of the head entity h, the tail entity t and the relationship r between h and t in a triplet (h, r, t) of the knowledge-graph, and h, t and r are each divided, in the same division manner, into K sub-embedding vectors h_1 to h_K, r_1 to r_K and t_1 to t_K, where 1 ≤ m ≤ K, 1 ≤ n ≤ K, and K is a positive integer;
g_d(r) is the K×K relationship block matrix corresponding to the scoring function f_d(h, r, t), [g_d(r)]_{mn} denotes the block in the m-th row and n-th column of the relationship block matrix, and each block is of the form

[g_d(r)]_{mn} = c · o_k, with c ∈ C1 ≡ {0, 1},

wherein o_k is the k-th operator in the operator set O ≡ {±r_1, …, ±r_K}, so that each block of g_d(r) is either the zero block or a signed sub-embedding vector of r.
3. The method of claim 2, wherein determining the scoring function corresponding to each relationship group comprises:
determining a corresponding matrix structure of the relationship block matrix corresponding to each scoring function, wherein the matrix structure indicates the distribution of non-zero blocks in the corresponding relationship block matrix; and
determining the corresponding relationship block matrix based on the determined matrix structure, and obtaining the corresponding scoring function based on the determined relationship block matrix.
4. The method according to claim 3, wherein the corresponding matrix structure of the relationship block matrix corresponding to each scoring function is determined based on the following expression (2):

{g_d*} = arg min_{g_d ∈ 𝒢} Σ_{i=1}^{|S_val|} ℒ((h_i*, r_i*, t_i*); S_val^(i))    (2)

s.t. (h*, r*, t*) = arg min_{(h, r, t)} Σ_{j=1}^{|S_tra|} ℒ((h_j, r_j, t_j); S_tra^(j)),

wherein g_d is the matrix structure corresponding to g_d(r), and 𝒢 is a structure search space comprising a plurality of matrix structures;
S_val is a validation set, S_tra is a training set, S_val and S_tra are both subsets of the triplet set of the knowledge-graph, and |S_val| and |S_tra| respectively denote the numbers of triplets in the validation set S_val and the training set S_tra; S_val^(i) denotes the i-th triplet (h_i, r_i, t_i) in the validation set S_val, and (h_i*, r_i*, t_i*) is the embedding vector corresponding to the triplet (h_i, r_i, t_i); S_tra^(j) denotes the j-th triplet (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triplet (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers;
ℒ((h, r, t); S) is a loss function measuring the loss of a given embedding vector (h, r, t) over the corresponding data S;
(h*, r*, t*) are the embedding-vector parameters of the embedding model that, for a matrix structure g_d in the structure search space 𝒢, have the smallest loss on the training set S_tra after the embedding model of the knowledge-graph is trained using the training set S_tra; and
{g_d*} is the set of matrix structures in the structure search space 𝒢 whose scoring functions, for the embedding model based on those embedding-vector parameters, have the smallest loss on the validation set S_val, and each matrix structure in the set respectively indicates the corresponding matrix structure of the relationship block matrix corresponding to each scoring function.
5. The method of claim 2, wherein determining the scoring function corresponding to each relationship group comprises:
determining a corresponding structural weight matrix of the relationship block matrix corresponding to each scoring function, wherein the structural weight matrix indicates the structural weight of each block in the corresponding relationship block matrix; and
determining the corresponding relationship block matrix based on the determined structural weight matrix, and obtaining the corresponding scoring function based on the determined relationship block matrix.
6. The method according to claim 5, wherein the corresponding structural weight matrix of the relationship block matrix corresponding to each scoring function is determined based on the following expression (3):

{A_d*} = arg min_{A_d} Σ_{i=1}^{|S_val|} 𝕀(r_i ∈ CU_d) · ℒ((h_i*, r_i*, t_i*); S_val^(i))    (3)

s.t. (h*, r*, t*) = arg min_{(h, r, t)} Σ_{j=1}^{|S_tra|} 𝕀(r_j ∈ CU_d) · ℒ((h_j, r_j, t_j); S_tra^(j)),

wherein A_d is the structural weight matrix corresponding to the relationship block matrix g_d(r);
S_val is a validation set, S_tra is a training set, S_val and S_tra are both subsets of the triplet set of the knowledge-graph, and |S_val| and |S_tra| respectively denote the numbers of triplets in the validation set S_val and the training set S_tra; S_val^(i) denotes the i-th triplet (h_i, r_i, t_i) in the validation set S_val, and (h_i*, r_i*, t_i*) is the embedding vector corresponding to the triplet (h_i, r_i, t_i); S_tra^(j) denotes the j-th triplet (h_j, r_j, t_j) in the training set S_tra, and (h_j, r_j, t_j) is the embedding vector corresponding to the triplet (h_j, r_j, t_j); 1 ≤ i ≤ |S_val|, 1 ≤ j ≤ |S_tra|, and i and j are integers;
SP(r, A_d) denotes the relationship block matrix g_d(r) having the structural weights indicated by A_d;
𝕀(r_i ∈ CU_d) indicates whether the relationship r_i belongs to the relationship group CU_d, and 𝕀(r_j ∈ CU_d) indicates whether the relationship r_j belongs to the relationship group CU_d;
ℒ((h, r, t); S) is a loss function measuring the loss of a given embedding vector (h, r, t) over the corresponding data S;
(h*, r*, t*) are the embedding-vector parameters of the embedding model that, based on the structural weight matrix A_d, have the smallest loss on the training set S_tra after the embedding model of the knowledge-graph is trained using the training set S_tra; and
{A_d*} is the set of structural weight matrices for which the scoring functions of the embedding model based on those embedding-vector parameters have the smallest loss on the validation set S_val, and each structural weight matrix in the set respectively corresponds to the relationship block matrix corresponding to each scoring function.
7. The method of claim 1 or 6, wherein the step of dividing the plurality of relationships in the knowledge-graph into a plurality of relationship groups comprises: dividing the plurality of relationships into the plurality of relationship groups using a clustering method.
8. A relationship-awareness-based knowledge-graph embedding system, the system comprising:
a relationship dividing device configured to divide a plurality of relationships in a knowledge graph into a plurality of relationship groups, wherein each relationship in the knowledge graph belongs to only one relationship group;
a search device configured to determine a scoring function corresponding to each relationship group based on the division result, and obtain a scoring function set for the plurality of relationship groups;
an embedding-model training device configured to train an embedding model of the knowledge-graph based on the obtained set of scoring functions; and
a representation device configured to obtain an embedded representation of the knowledge-graph using the embedding model.
9. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 7.
10. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 7.
CN202010420480.2A 2020-05-18 2020-05-18 Knowledge graph embedding method and system based on relation cognition Pending CN113688249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010420480.2A CN113688249A (en) 2020-05-18 2020-05-18 Knowledge graph embedding method and system based on relation cognition

Publications (1)

Publication Number Publication Date
CN113688249A true CN113688249A (en) 2021-11-23

Family

ID=78575549


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649550A (en) * 2016-10-28 2017-05-10 浙江大学 Joint knowledge embedded method based on cost sensitive learning
KR20180092194A (en) * 2017-02-08 2018-08-17 경북대학교 산학협력단 Method and system for embedding knowledge gragh reflecting logical property of relations, recording medium for performing the method
CN110796254A (en) * 2019-10-30 2020-02-14 南京工业大学 Knowledge graph reasoning method and device, computer equipment and storage medium
US20200065668A1 (en) * 2018-08-27 2020-02-27 NEC Laboratories Europe GmbH Method and system for learning sequence encoders for temporal knowledge graph completion
CN110851614A (en) * 2019-09-09 2020-02-28 中国电子科技集团公司电子科学研究院 Relation prediction deduction method of knowledge graph and dynamic updating method of knowledge graph


Similar Documents

Publication Publication Date Title
CN111602148B (en) Regularized neural network architecture search
US20190278600A1 (en) Tiled compressed sparse matrix format
US20190164084A1 (en) Method of and system for generating prediction quality parameter for a prediction model executed in a machine learning algorithm
US20210150412A1 (en) Systems and methods for automated machine learning
CN111858947A (en) Automatic knowledge graph embedding method and system
WO2020224220A1 (en) Knowledge graph-based question answering method, electronic device, apparatus, and storage medium
CN112905809B (en) Knowledge graph learning method and system
CN110837567A (en) Method and system for embedding knowledge graph
CN112420125A (en) Molecular attribute prediction method and device, intelligent equipment and terminal
CN113377964A (en) Knowledge graph link prediction method, device, equipment and storage medium
US20220383119A1 (en) Granular neural network architecture search over low-level primitives
Pan et al. Context-aware entity typing in knowledge graphs
CN114692889A (en) Meta-feature training model for machine learning algorithm
CN113569018A (en) Question and answer pair mining method and device
JP2022032703A (en) Information processing system
CN113688249A (en) Knowledge graph embedding method and system based on relation cognition
US20240005129A1 (en) Neural architecture and hardware accelerator search
CN113010687B (en) Exercise label prediction method and device, storage medium and computer equipment
CN111506742A (en) Method and system for constructing multivariate relational knowledge base
WO2022249415A1 (en) Information provision device, information provision method, and information provision program
US11609936B2 (en) Graph data processing method, device, and computer program product
US20240004912A1 (en) Hierarchical topic model with an interpretable topic hierarchy
WO2023238258A1 (en) Information provision device, information provision method, and information provision program
CN114328940A (en) Method and system for constructing multivariate relational knowledge base
WO2023009293A9 (en) Antibody competition model using hidden variable affinities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination