CN113449047A - Method and device for complementing knowledge graph - Google Patents

Info

Publication number
CN113449047A
Authority
CN
China
Prior art keywords
convolution
completion
triples
triplet
model
Prior art date
Legal status
Pending
Application number
CN202110774126.4A
Other languages
Chinese (zh)
Inventor
霍宏
段昊
刘文轩
李增
刘京东
潘鼎
方涛
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110774126.4A
Publication of CN113449047A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval of structured data, e.g. relational data
    • G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 - Relational databases
    • G06F 16/288 - Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

A method and apparatus for completing a knowledge graph. A 1 × 1 standard convolution is introduced to scale the feature matrix by a fixed ratio, improving the precision of the feature representation; circular convolution and dilated convolution extract a large number of correlation features between entities and relations, so that the model learns more deep-level features, optimizing the completion model and improving the accuracy of knowledge graph completion; and residual learning is introduced to fuse in the original feature information, further improving the accuracy of knowledge graph completion.

Description

Method and device for complementing knowledge graph
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and a device for completing a knowledge graph.
Background
A knowledge graph is a structured representation of facts: each real-world fact is expressed as a triple of two entities and the relation between them. As knowledge graphs grow, they become increasingly incomplete, containing many missing and hidden facts. Knowledge graph completion aims to find the missing parts of a graph and mine the hidden entities and the relations between them, making the information contained in the graph richer and more complete. Fig. 1 is an example of knowledge graph completion. As shown in Fig. 1, from the two known facts (wind turbine generator set fault, includes, blade fault) and (blade breakage, causes, blade fault), completion discovers a "causes" relation between "blade breakage" and "wind turbine generator set fault", i.e., the hidden fact (blade breakage, causes, wind turbine generator set fault) (the dashed arrows in Fig. 1 denote new knowledge discovered through completion). Building on the known fact (blade breakage, produces, abnormal vibration signal) and the mined fact (blade breakage, causes, wind turbine generator set fault), completion can further mine the hidden fact (abnormal vibration signal, indicates, wind turbine generator set fault), further enriching the knowledge graph.
An incomplete knowledge graph seriously limits its further application, so it must be completed. Some existing knowledge graph completion methods (such as the TransE family) do not consider the semantic relevance between entities and relations and capture few or no correlation features between them, so their completion performance is poor.
Disclosure of Invention
Aiming at the high computational cost and low completion accuracy of existing knowledge graph completion methods, the invention provides a knowledge graph completion method and device that solve the problem of incomplete knowledge graphs.
The invention is realized by the following technical scheme:
the invention relates to a knowledge graph complementing method, which comprises the following steps:
step 1: all triples of the knowledge graph to be complemented are marked as positive triples, namely a correct triplet set consisting of two head entities and tail entities which are determined to be connected by a certain relationship and the relationship. Constructing a corresponding negative triple according to the positive triple set, namely randomly replacing the head entity or the tail entity of the positive triple with other entities to obtain an erroneous triple set, and finally generating a training set and a testing set for training and testing the knowledge graph spectrum completion model;
the triplet is (h, r, t), wherein: h and t are respectively a head entity and a tail entity, r is the relation between the two entities, and all triples existing in the default knowledge graph are positive triples.
For example, (blade breakage, causes, wind turbine generator set fault) and (abnormal vibration signal, indicates, wind turbine generator set fault) are positive triples, while (blade breakage, causes, generator fault) is a negative triple.
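The negative-triple construction of step 1 can be sketched as follows (a minimal illustration; the function name `corrupt_triples`, the entity list, and the 50/50 head-vs-tail choice are assumptions, and candidates are filtered against the positive set so no true triple is accidentally labeled negative):

```python
import random

def corrupt_triples(positives, entities, seed=0):
    """For each positive triple (h, r, t), replace the head or the tail
    with a random entity to produce one negative triple."""
    rng = random.Random(seed)
    positive_set = set(positives)
    negatives = []
    for h, r, t in positives:
        while True:
            e = rng.choice(entities)
            # corrupt the head or the tail with equal probability
            cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
            if cand not in positive_set:  # skip accidental true triples
                negatives.append(cand)
                break
    return negatives

entities = ["blade breakage", "blade fault", "wind turbine fault", "generator fault"]
positives = [("blade breakage", "causes", "blade fault"),
             ("wind turbine fault", "includes", "blade fault")]
negatives = corrupt_triples(positives, entities)
```

One negative is generated per positive here; in practice the ratio is a tunable choice.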
The step 1 specifically comprises:
step 1.1: mark all triples of the knowledge graph to be completed as the positive triplet set G+, and divide G+ into a training set G+_Train and a test set G+_Test;
step 1.2: construct the corresponding negative triplet sets G-_Train and G-_Test from the positive triples in G+_Train and G+_Test, respectively;
step 1.3: generate the final training set G_Train and test set G_Test, where: G_Train = G+_Train ∪ G-_Train and G_Test = G+_Test ∪ G-_Test;
step 2: initialize embedding vectors e_h, e_r, e_t ∈ R^k for the head entity, relation, and tail entity of every triplet in the training set, input the triplet embeddings into the completion model, compute each triplet's similarity score with the model, use the score to compute each triplet's correctness probability, and train the model, specifically as follows:
step 2.1: initialize embedding vectors e_h, e_r, e_t ∈ R^k for all triplet head entities, relations, and tail entities of the training set, where k denotes the dimension of the embedding vectors;
step 2.2: to pair
Figure BDA00031537515400000213
Performing two-dimensional remodeling, and converting the two-dimensional remodeling into two-dimensional matrixes with the same size
Figure BDA00031537515400000214
Then will be
Figure BDA00031537515400000215
The two matrices are connected into a new matrix,
Figure BDA00031537515400000216
the matrix is on the top of the new matrix,
Figure BDA00031537515400000217
the matrix is at the lower part of the new matrix and is represented as
Figure BDA00031537515400000218
[;]Represents a matrix connection, k ═ mxn;
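The reshape-and-stack of step 2.2 is a plain array operation; a minimal sketch (k = 200, m = 20, n = 10 as in the embodiment; the function name is illustrative):

```python
import numpy as np

def stack_embeddings(e_h, e_r, m, n):
    """Reshape the head-entity and relation embeddings (length k = m*n)
    into m-by-n matrices M_h, M_r and stack them vertically: [M_h; M_r]."""
    M_h = e_h.reshape(m, n)
    M_r = e_r.reshape(m, n)
    return np.concatenate([M_h, M_r], axis=0)  # shape (2m, n)

k, m, n = 200, 20, 10  # k = m * n, as in the embodiment
rng = np.random.default_rng(0)
M = stack_embeddings(rng.standard_normal(k), rng.standard_normal(k), m, n)
```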
step 2.3: convolve the matrix obtained above with multi-class convolution to capture the correlation features between the head entity and the relation;
the multi-class convolution comprises standard convolution, circular convolution, and dilated (hole) convolution, where: standard convolution is ordinary convolution, and circular convolution and dilated convolution are variants of it; ordinary convolution is called standard convolution to distinguish the three. Specifically, the first layer is a 1 × 1 standard convolution with 3 × 3 pooling; the second layer is a 4 × 4 circular convolution; the third layer is a 5 × 5 standard convolution and 3 × 3 dilated convolutions. The steps are as follows:
step 2.3.1: in the first layer, apply three parallel 1 × 1 standard convolutions to [M_h; M_r] and average-pool each result with a 3 × 3 window, obtaining C_i = w_i ∗ [M_h; M_r] and P_i = p(C_i), i = 1, 2, 3, where w_1, w_2, w_3 are the kernels of the three 1 × 1 standard convolutions, ∗ denotes the standard convolution operation, and p(·) denotes the average pooling operation.
Compared with standard convolution, circular convolution also convolves across the top and bottom edges and the left and right edges of the matrix, so it can extract more features. Dilated convolution convolves in the same way as standard convolution, but its kernel is a standard convolution kernel filled with zeros at a fixed interval, which enlarges the receptive field and extracts features over a wider range without increasing the number of parameters.
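The two convolution variants can be sketched in plain NumPy (an illustrative single-channel, single-kernel implementation, not the patent's exact layer configuration): circular convolution wrap-pads the input so the kernel also covers the matrix edges, and dilated convolution samples the input at a fixed stride inside the window, enlarging the receptive field with no extra parameters.

```python
import numpy as np

def conv2d(x, kernel, dilation=1):
    """'Valid' cross-correlation; dilation > 1 gives dilated (hole) convolution."""
    kh, kw = kernel.shape
    eh = (kh - 1) * dilation + 1  # effective receptive-field height
    ew = (kw - 1) * dilation + 1  # effective receptive-field width
    H, W = x.shape
    out = np.empty((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def circular_conv2d(x, kernel):
    """Wrap-pad the input so the kernel also convolves the matrix edges."""
    kh, kw = kernel.shape
    xp = np.pad(x, ((0, kh - 1), (0, kw - 1)), mode="wrap")
    return conv2d(xp, kernel)

x = np.arange(36.0).reshape(6, 6)
k3 = np.ones((3, 3))
same = circular_conv2d(x, k3)    # wrap-around: output keeps the 6x6 input size
dil = conv2d(x, k3, dilation=2)  # 3x3 kernel, effective 5x5 field -> 2x2 output
```

Note the dilated kernel still has 9 parameters even though its receptive field is 5 × 5.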
step 2.3.2: in the second layer, apply three parallel circular convolution operations to the feature matrices obtained from the first layer's convolution and pooling, followed by the nonlinear activation function ReLU, denoted f_1, obtaining Q_i = f_1(c_i Θ P_i), i = 1, 2, 3, where c_1, c_2, c_3 are the kernels of the second layer's three 4 × 4 circular convolutions and Θ denotes the circular convolution operation. The feature matrix C_1 obtained from the first 1 × 1 standard convolution of the first layer is treated as original information: no further convolution is applied to it, and it is reserved for residual learning.
Because C_1 retains the original feature information, residual learning can merge that information into the final feature matrix by connecting C_1 with the feature matrices produced by the third layer's convolutions.
step 2.3.3: in the third layer, apply a 3 × 3 dilated convolution, a 5 × 5 standard convolution, and a 3 × 3 dilated convolution, respectively, to the feature matrices obtained from the second layer, followed by the nonlinear activation function ReLU, denoted f_2, obtaining H_1 = f_2(d_1 ⊛ Q_1), H_2 = f_2(d_2 ∗ Q_2), H_3 = f_2(d_3 ⊛ Q_3), where d_1, d_2, d_3 are the kernels of the third layer's 3 × 3 dilated, 5 × 5 standard, and 3 × 3 dilated convolutions and ⊛ denotes the dilated convolution operation;
step 2.4: perform residual learning between the feature matrix C_1 obtained from the first 1 × 1 standard convolution in step 2.3.1 and the feature matrices obtained from the third layer in step 2.3.3 to obtain the final entity-relation correlation feature matrix, denoted H. Residual learning is implemented as a connection operation: H = [C_1; H_1; H_2; H_3], where C_1 is leftmost in H, H_1 is to its right, H_2 is to the right of H_1, and H_3 is to the right of H_2, i.e., rightmost in H;
step 2.5: vectorize the feature matrix H obtained in step 2.4 by reshaping it into a one-dimensional vector Vec(H), then transform it into a k-dimensional vector through a randomly initialized transformation matrix, obtaining f_3(Vec(H) · W) ∈ R^k, where f_3 is the nonlinear activation function ReLU and W is the transformation matrix;
step 2.6: take the inner product of the vector obtained in step 2.5 and the tail-entity embedding e_t of the triplet to obtain the triplet's similarity score, then convert the score into a prediction probability p with a sigmoid function, specifically: score = f_3(Vec(H) · W) · e_t and p = g(score), where g denotes the sigmoid function;
step 2.7: to improve parameter efficiency in step 2.6, take the inner product of the vector obtained in step 2.5 and the entity embedding matrix E ∈ R^(N×k) composed of the N entity embedding vectors of the training set, obtaining the correctness probabilities of N triplets at once: P = g(f_3(Vec(H) · W) · E^T);
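Steps 2.6-2.7 amount to one inner product against the whole entity embedding matrix followed by a sigmoid; a minimal shape-level sketch (function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score_against_all(v, E):
    """Score one (head, relation) feature vector v of shape (k,) against all
    N candidate tail-entity embeddings E of shape (N, k) in one product."""
    return sigmoid(E @ v)  # (N,) predicted correctness probabilities

rng = np.random.default_rng(0)
N, k = 5, 200
probs = score_against_all(rng.standard_normal(k), rng.standard_normal((N, k)))
```

Scoring all N candidates in one matrix product is what makes the 1-vs-N training of step 2.7 cheaper than scoring triplets one at a time.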
step 2.8: use the triplet correctness probabilities obtained in step 2.7 to compute the loss over all triplets with a loss function; if the loss value falls within the set threshold range, model training is finished and the method proceeds to step 3; otherwise, return to step 2 and repeat steps 2.1-2.8.
The loss function is the standard cross-entropy loss function.
step 3: test the trained model with the triplets in the test set to measure the completion model's performance, evaluating it with the mean reciprocal rank and with the proportions of correct triplets ranked in the top ten, top three, and first place, specifically as follows:
step 3.1: take the embedding vectors of the head entity and relation of each triplet in the test set as input to the trained model, complete with the N entities of the test set as candidate tail entities, and finally take the triplet with the highest prediction probability as the correctly completed triplet;
step 3.2: compute the MRR, Hits@10, Hits@3, and Hits@1 indexes from the completion results to evaluate model performance, where: MRR ranks the triplets from high to low by score and then averages the reciprocals of the ranks of all triplets, a higher MRR meaning higher completion accuracy; Hits@10, Hits@3, and Hits@1 are the proportions of correct triplets ranked in the top ten, top three, and first place, respectively, with higher values meaning higher completion accuracy.
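The evaluation indexes of step 3.2 can be computed as follows (a minimal sketch; function names are illustrative):

```python
import numpy as np

def rank_of(scores, true_idx):
    """Rank of the correct candidate when scores are sorted high to low (1 = best)."""
    order = np.argsort(-scores)
    return int(np.where(order == true_idx)[0][0]) + 1

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and Hits@k over a list of per-triplet ranks."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))                   # mean reciprocal rank
    hits = {k: float(np.mean(ranks <= k)) for k in ks}  # Hits@k
    return mrr, hits

mrr, hits = mrr_and_hits([1, 2, 10])
# mrr = (1 + 1/2 + 1/10) / 3; hits[1] = 1/3; hits[10] = 1.0
```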
step 4: take all negative triplet sets as the triplet set to be verified, input its triples into the completion model to compute their correctness probabilities, and complete the knowledge graph according to the computed results, specifically as follows:
step 4.1: merge the training-set negative triplet set G-_Train and the test-set negative triplet set G-_Test from step 1 into an overall negative triplet set G- = G-_Train ∪ G-_Test;
step 4.2: input the triples of G- into the completion model and compute each triplet's correctness probability with the model. If the computed probability is greater than or equal to a preset confidence value, the triplet is considered correct and the original knowledge graph is completed with it; if the computed probability is below the preset confidence value, the triplet is considered wrong and no completion is performed.
The invention also relates to a knowledge graph completion device, which comprises:
a generating unit, for generating the training set and test set used to train and test the knowledge graph completion model from the triples of the knowledge graph to be completed;
a training unit, for training the completion model;
a testing unit, for testing the performance of the trained completion model;
and a completion unit, for verifying the correctness of the triplet set to be verified and completing the knowledge graph to be completed according to the verification results.
Technical effects
Compared with the prior art, the knowledge graph completion method and device provided by the invention introduce a 1 × 1 standard convolution to scale the feature matrix by a fixed ratio, improving the precision of the feature representation; extract a large number of correlation features between entities and relations with circular convolution and dilated convolution, so that the model learns more deep-level features, optimizing the completion model and improving the accuracy of knowledge graph completion; and introduce residual learning to fuse in the original feature information, further improving the accuracy of knowledge graph completion.
Drawings
FIG. 1 is a diagram of an example knowledge graph completion scheme provided in this embodiment;
fig. 2 is a schematic structural diagram of the knowledge-graph completion model provided in this embodiment;
FIG. 3 is a schematic diagram of a training process of the knowledge-graph completion model provided in this embodiment;
FIG. 4 is a schematic flow chart of a knowledge-graph completion method according to this embodiment;
fig. 5 is a schematic structural diagram of the knowledge-graph completion device provided in this embodiment.
Detailed Description
As shown in fig. 4, the knowledge graph completion method according to this embodiment includes:
step 1: record all triples of the knowledge graph to be completed as the positive triplet set, construct the corresponding negative triplet set from it, and finally generate the training set and test set used to train and test the knowledge graph completion model;
step 1 in this embodiment uses existing public knowledge graph completion data sets, such as FB15k-237 and WN18RR, to construct the triplet sets required for training and testing the completion model; specifically:
step 1.1: partition the positive triples of the data set into a training set and a test set, where: the training set constructed from the FB15k-237 data set contains 272115 positive triples and its test set contains 37981 positive triples; the training set constructed from the WN18RR data set contains 86835 positive triples and its test set contains 6168 positive triples.
step 1.2: for each positive triple in the training-set and test-set positive triplet sets, randomly replace the head entity h or the tail entity t with another entity to obtain the corresponding negative triplet set. The final training set is generated from the training-set positive triples and the constructed negatives, and the final test set is generated from the test-set positive triples and the constructed negatives.
Step 1.3: extracting a fixed number of positive triples and corresponding negative triples from the training set triplet set to form a batch training set triplet required by iterationbatch. Wherein: batch is the number of triples per batch, typically 64, 128, 256, etc., 128 being taken in the specific experiment.
Step 2: initializing embedded vectors for all triplet head, relationship, and tail entities in a training set
Figure BDA0003153751540000051
Inputting the embedded vector of the training set triplet into a completion model, calculating the similarity score of each triplet through the completion model, further calculating the accuracy of each triplet by using the similarity score, and training the model;
fig. 2 is a schematic structural diagram of a knowledge graph completion model provided in this embodiment, and fig. 3 is a schematic training flow diagram of the knowledge graph completion model provided in this embodiment, as shown in fig. 2 and fig. 3; in the specific implementation, the method comprises the following steps:
step 2.1: initialization of batch training set Triple with the xavier _ normal _ methodbatchEmbedded vectors of all triples (h, r, t) in (c)
Figure BDA0003153751540000052
k represents the dimension of the embedded vector, and is generally 50,100,200 and the like, and in a specific experiment, k is 200;
step 2.2: reshape e_h and e_r into two two-dimensional matrices of the same size, M_h, M_r ∈ R^(m×n), then join the two matrices into a new matrix [M_h; M_r] ∈ R^(2m×n), with M_h on the upper part and M_r on the lower part, where [;] denotes matrix concatenation and k = m × n; since k = 200 in the experiment, m = 20 and n = 10;
step 2.3: convolve the matrix obtained above with multi-class convolution to capture the correlation features between the head entity and the relation; the multi-class convolution comprises standard convolution, circular convolution, and dilated (hole) convolution, where standard convolution is ordinary convolution and circular convolution and dilated convolution are variants of it; ordinary convolution is called standard convolution to distinguish the three;
the multi-class convolution is divided into three layers: the first layer is a 1 × 1 standard convolution with 3 × 3 pooling; the second layer is a 4 × 4 circular convolution; the third layer is a 5 × 5 standard convolution and 3 × 3 dilated convolutions; specifically:
step 2.3.1: in the first layer, apply three parallel 1 × 1 standard convolutions to [M_h; M_r] and average-pool each result with a 3 × 3 window, obtaining C_i = w_i ∗ [M_h; M_r] and P_i = p(C_i), i = 1, 2, 3, where w_1, w_2, w_3 are the kernels of the three 1 × 1 standard convolutions (with 32, 1, and 1 kernels, respectively, in the specific experiment), ∗ denotes the standard convolution operation, and p(·) denotes the average pooling operation;
step 2.3.2: in the second layer, apply three parallel circular convolution operations to the feature matrices obtained from the first layer's convolution and pooling, followed by the nonlinear activation function ReLU, denoted f_1, obtaining Q_i = f_1(c_i Θ P_i), i = 1, 2, 3, where c_1, c_2, c_3 are the kernels of the second layer's three 4 × 4 circular convolutions (32 kernels each in the specific experiment) and Θ denotes the circular convolution operation; the feature matrix C_1 obtained from the first 1 × 1 standard convolution of the first layer is treated as original information, receives no further convolution, and is used for residual learning;
step 2.3.3: in the third layer, apply a 3 × 3 dilated convolution, a 5 × 5 standard convolution, and a 3 × 3 dilated convolution, respectively, to the feature matrices obtained from the second layer, followed by the nonlinear activation function ReLU, denoted f_2, obtaining H_1 = f_2(d_1 ⊛ Q_1), H_2 = f_2(d_2 ∗ Q_2), H_3 = f_2(d_3 ⊛ Q_3), where d_1, d_2, d_3 are the kernels of the third layer's 3 × 3 dilated, 5 × 5 standard, and 3 × 3 dilated convolutions (32 kernels each in the specific experiment) and ⊛ denotes the dilated convolution operation;
step 2.4: perform residual learning between the feature matrix C_1 obtained from the first 1 × 1 standard convolution in step 2.3.1 and the feature matrices obtained from the third layer in step 2.3.3 to obtain the final entity-relation correlation feature matrix, denoted H; residual learning is implemented as the connection H = [C_1; H_1; H_2; H_3], where C_1 is leftmost in H, H_1 is to its right, H_2 is to the right of H_1, and H_3 is to the right of H_2, i.e., rightmost in H;
step 2.5: vectorize the feature matrix H obtained in step 2.4 by reshaping it into a one-dimensional vector Vec(H), then transform it into a k-dimensional vector through a randomly initialized transformation matrix, obtaining f_3(Vec(H) · W) ∈ R^k, where f_3 is the nonlinear activation function ReLU and W is the transformation matrix;
step 2.6: take the inner product of the vector obtained in step 2.5 and the tail-entity embedding e_t of the triplet to obtain the triplet's similarity score, then convert the score into a prediction probability p with a sigmoid function, specifically: score = f_3(Vec(H) · W) · e_t and p = g(score), where g denotes the sigmoid function;
step 2.7: to improve parameter efficiency in step 2.6, take the inner product of the vector obtained in step 2.5 and the entity embedding matrix E ∈ R^(N×k) composed of the N entity embedding vectors of the training set, obtaining the correctness probabilities of N triplets at once: P = g(f_3(Vec(H) · W) · E^T);
step 2.8: use the triplet correctness probabilities obtained in step 2.7 to compute the loss over all triplets with a loss function; if the loss value falls within the set threshold range, model training is finished and the method proceeds to step 3; otherwise, return to step 2 and repeat steps 2.1-2.8.
The loss function is the standard cross-entropy loss function
L = -(1/N) Σ_{i=1}^{N} [ t_i log(p_i) + (1 - t_i) log(1 - p_i) ],
where N is the number of entity embedding vectors, p_i is the predicted probability of the i-th triplet, and t_i = 1 if the input triplet is a positive triplet, otherwise t_i = 0.
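The standard cross-entropy loss can be sketched directly (a minimal NumPy version; the clipping constant is an assumption that guards against log(0)):

```python
import numpy as np

def cross_entropy(p, t, eps=1e-12):
    """Binary cross-entropy between predicted probabilities p and labels t
    (t_i = 1 for positive triples, 0 for negative ones)."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

loss = cross_entropy(np.array([0.9, 0.2]), np.array([1.0, 0.0]))
```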
In the specific experiment, the parameters are updated with the Adam optimization algorithm; training is best stopped when the loss value of the loss function converges, which after repeated tests corresponds to roughly 1000 training iterations (epochs).
step 3: test the trained model with the triplets in the test set to measure the completion model's performance, and evaluate it with the MRR, Hits@10, Hits@3, and Hits@1 indexes according to the completion results. The specific implementation comprises the following steps:
step 3.1: initialize the embedding vectors e_h, e_r, e_t of each triplet in the test set with the xavier_normal_ method, take the triplet embeddings as input to the trained completion model, output the completion results, and finally take the triplet with the highest probability as the correctly completed triplet, thereby testing the model's completion accuracy.
Step 3.2: and calculating the performance of performance index evaluation models such as MRR, Hits @10, Hits @3, hit @1 and the like according to the completion result, wherein: MRR is average reciprocal ranking, which is to rank the triples from high to low according to the scores of the triples, and then calculate the average value of the sum of the reciprocals of the ranking of all the triples, wherein the higher the MRR value is, the higher the completion accuracy is; the term "hit @ 10" refers to the proportion of the number of the correct triples in the top ten triples, and similarly, the terms "Hits @ 3" and "Hits @ 1" indicate that the higher the number of the triples, i.e., the higher the "Hits @ 10", the "Hits @ 3" and the "Hits @ 1" indicates the higher the completion accuracy. The completion performance index values of different knowledge-graph completion models are shown in table 1:
TABLE 1 comparison of the present Process with the existing Process
(The values of Table 1 appeared as an image in the source and are not reproduced here.)
As can be seen from the table, the method performs better than the other methods on most performance indexes. Specifically: on the FB15k-237 data set it is slightly worse than ConvKB only on the Hits@3 index and better than the other methods on all other indexes; on the WN18RR data set it is slightly worse than ConvTransE on certain indexes and slightly worse than ConvE and ConvKB on the Hits@10 index, and better on all other indexes. Combining the results on the two data sets, the method is superior to the other methods for knowledge graph completion.
step 4: take all negative triplet sets as the triplet set to be verified, input its triples into the completion model to compute their correctness probabilities, and complete the knowledge graph according to the computed results; the specific implementation comprises the following steps:
step 4.1: merge the training-set negative triplet set G-_Train and the test-set negative triplet set G-_Test from step 1 into an overall negative triplet set G- = G-_Train ∪ G-_Test;
step 4.2: input the triples of G- into the completion model and compute each triplet's correctness probability with the model; if the computed probability is greater than or equal to a preset confidence value, the triplet is considered correct and the original knowledge graph is completed with it; if it is below the preset confidence value, the triplet is considered wrong and no completion is performed; the confidence value may be set according to the specific experimental results, e.g., 0.8.
As shown in fig. 5, the present embodiment relates to a knowledge graph completion device for implementing the above method, comprising:
the generating unit is used for generating a training set and a test set for training and testing the knowledge graph completion model according to the triples of the knowledge graph to be completed;
the training unit is used for training the completion model;
the testing unit is used for testing the performance of the trained completion model;
and the completion unit is used for verifying the correctness of the triplet set to be verified and completing the knowledge graph to be completed according to the verification result.
Specifically, the generating unit is configured to divide the positive triples of the knowledge graph to be complemented into a training set and a test set, construct a corresponding negative triplet set according to the positive triples in the training set and the test set, and finally generate the training set and the test set for training and testing the knowledge graph complementing model;
the training unit is used for training the knowledge graph complementing model by using the training set generated in the generating unit;
specifically, the training unit comprises an initialization subunit, a feature extraction subunit and a calculation subunit;
the initialization subunit is used for initializing the embedding vectors of the head entities, relations and tail entities of all triples in the training set;
the characteristic extraction subunit is used for extracting correlation characteristics between entities and relations in the triplets and original characteristics of the entities and the relations by using multi-class convolution and residual learning;
the calculation subunit is used for finally calculating the correct rate of each triple in the training set and the loss values of all the triples;
the testing unit is used for testing the trained completion model in the training unit by using the test set generated in the generating unit, and testing the completion performance of the completion model;
specifically, the test unit comprises a model test subunit and a performance evaluation subunit;
the model testing subunit is used for inputting the triples in the test set into the trained completion model, outputting a completion result and testing the completion accuracy of the completion result;
the performance evaluation subunit is used for calculating performance indexes such as MRR, Hits@10, Hits@3 and Hits@1 according to the completion result and evaluating the performance of the completion model;
and the completion unit is used for verifying whether all the negative triples generated in the generation unit are correct, performing corresponding completion on the knowledge graph to be completed according to the verification result, and finally completing the whole graph.
Compared with the prior art, the method further improves the accuracy of knowledge graph completion, achieves a better technical effect, and helps to address missing and incomplete knowledge graphs. The knowledge graph completion model provided by the invention improves the precision of the feature representation by introducing 1×1 standard convolutions to scale the feature matrix by a certain proportion; it extracts a large number of correlation features between entities and relations using circular convolution and hole convolution, so that the model learns more deep-level features, the performance of the completion model is optimized, and the accuracy of knowledge graph completion is improved; and it introduces residual learning to fuse the original feature information, further improving the accuracy of knowledge graph completion.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (9)

1. A method for supplementing a knowledge graph, comprising:
step 1: marking all triples of the knowledge graph to be completed as positive triples, i.e. the set of correct triples, each consisting of a head entity and a tail entity determined to be connected by a certain relation together with that relation; constructing corresponding negative triples according to the positive triplet set, i.e. randomly replacing the head entity or tail entity of a positive triplet with another entity to obtain a set of wrong triples; and finally generating a training set and a test set for training and testing the knowledge graph completion model;
the triplet is (h, r, t), wherein: h, t is a head entity and a tail entity respectively, r is the relation between the two entities, and all triples existing in the default knowledge graph are positive triples;
step 2: initializing embedding vectors e_h, e_r, e_t for the head entities, relations and tail entities of all triples in a training set;
Inputting the embedded vector of the training set triples into a completion model, calculating the similarity score of each triplet through the completion model, further calculating the accuracy of each triplet by using the similarity score, and training the model;
and step 3: testing the trained model with the triples in the test set to evaluate the performance of the completion model, using the mean reciprocal rank and the proportions of correct triples ranked in the top ten, top three and first positions according to the completion result;
and step 4: taking all the negative triplet sets as the triplet set to be verified, inputting the triples of this set into the completion model to calculate their accuracy, and completing the knowledge graph according to the calculation result.
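The negative-triple construction described in step 1 can be illustrated with a short Python sketch; the function name `corrupt`, the toy triples and the entity list are assumptions for illustration only, not part of the claim.

```python
import random

# Sketch of negative-triple construction: randomly replace the head or the
# tail of each positive triple with another entity, keeping only candidates
# that are not already positive (i.e., genuinely wrong triples).

def corrupt(positive_triples, entities, seed=0):
    """Return one negative triple per positive triple."""
    rng = random.Random(seed)
    positives = set(positive_triples)
    negatives = []
    for h, r, t in positive_triples:
        while True:
            e = rng.choice(entities)
            # Replace head or tail at random.
            cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
            if cand not in positives:   # ensure the triple is actually wrong
                negatives.append(cand)
                break
    return negatives

pos = [("A", "likes", "B"), ("B", "likes", "C")]
neg = corrupt(pos, ["A", "B", "C", "D"])
```

Each returned triple differs from every known positive triple, matching the claim's definition of the wrong-triple set.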
2. The method for supplementing a knowledge-graph according to claim 1, wherein the step 1 specifically comprises:
step 1.1: marking all triples of the knowledge graph to be completed as a positive triplet set G⁺, and dividing G⁺ into a training set G⁺_Train and a test set G⁺_Test;
Step 1.2: respectively in the training set
Figure FDA0003153751530000014
And test set
Figure FDA0003153751530000015
According to the positive triple set, constructing a corresponding negative triple set
Figure FDA0003153751530000016
And
Figure FDA0003153751530000017
step 1.3: generating the final training set G_Train and test set G_Test, wherein: G_Train = G⁺_Train ∪ G⁻_Train and G_Test = G⁺_Test ∪ G⁻_Test.
3. The method for completing a knowledge graph according to claim 1, wherein the step 2 specifically comprises:
step 2.1: initializing embedding vectors e_h, e_r, e_t ∈ R^k for the head entities, relations and tail entities of all triples of the training set, wherein: k represents the dimension of the embedding vectors;
step 2.2: performing two-dimensional reshaping on e_h and e_r, converting them into two-dimensional matrices M_h, M_r ∈ R^{m×n} of the same size, and then connecting the two matrices into a new matrix [M_h; M_r], with M_h at the upper part of the new matrix and M_r at the lower part, wherein: [;] represents matrix connection and k = m×n;
step 2.3: performing convolution on the matrix obtained in the step 2.2 by using multi-class convolution to capture correlation characteristics between the head entity and the relation;
step 2.4: performing residual learning on the feature matrix obtained by the first 1×1 standard convolution of the first layer in step 2.3.1 (denoted M1) and the feature matrices H1, H2, H3 obtained by the third-layer convolution in step 2.3.3, so as to obtain the final entity-relation correlation feature matrix, denoted H; the residual learning is implemented as a connection operation, obtaining: H = [M1, H1, H2, H3], wherein: M1 is at the leftmost part of H, H1 is to the right of M1, H2 is to the right of H1, and H3 is to the right of H2, at the rightmost part of H;
step 2.5: vectorizing the feature matrix H obtained in step 2.4 by reshaping it into a one-dimensional vector, and transforming the vector into a k-dimensional vector through a randomly initialized transformation matrix, obtaining: f3(Vec(H)·W) ∈ R^k, wherein: f3 is the nonlinear activation function ReLU, Vec(·) denotes vectorization, and W is the transformation matrix;
step 2.6: taking the inner product of the vector obtained in step 2.5 and the tail-entity embedding e_t of the triple to obtain the similarity score of the triple, and converting the score into a prediction probability p using a sigmoid function, specifically: score = f3(Vec(H)·W)·e_t and p = g(score), wherein: g represents the sigmoid function;
step 2.7: in step 2.6, to improve model parameter efficiency, the vector obtained in step 2.5 is instead multiplied with the entity embedding matrix E ∈ R^{N×k} composed of the N entity embedding vectors of the training set, so that the correctness probabilities of N triples are obtained at once, wherein: p = g(f3(Vec(H)·W)·Eᵀ);
step 2.8: calculating the loss values of all triples through a loss function using the triplet correctness probabilities obtained in step 2.7; if the loss value is within the set threshold range, model training is finished and the method proceeds to step 3; otherwise, returning to step 2.1.
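Steps 2.2–2.7 can be summarized in a small NumPy sketch. Here the multi-class convolution block of step 2.3 is abstracted to an identity mapping for brevity, and all shapes and values (m, n, N, the random initialization) are illustrative assumptions rather than the claimed model.

```python
import numpy as np

# Scoring path of the completion model (convolutions abstracted away):
# reshape -> concatenate -> (convolution block) -> vectorize -> transform
# -> inner product with all entity embeddings -> sigmoid.

rng = np.random.default_rng(0)
m, n = 4, 5
k = m * n                  # embedding dimension, k = m x n (step 2.2)
N = 7                      # number of candidate entities

e_h = rng.normal(size=k)   # head-entity embedding (step 2.1)
e_r = rng.normal(size=k)   # relation embedding
M = np.vstack([e_h.reshape(m, n), e_r.reshape(m, n)])  # [M_h; M_r]

H = M                      # stand-in for the multi-class convolution output
v = H.reshape(-1)          # Vec(H), step 2.5
W = rng.normal(size=(v.size, k))      # randomly initialized transform matrix
z = np.maximum(v @ W, 0.0)            # f3(Vec(H) . W), ReLU activation

E = rng.normal(size=(N, k))           # entity embedding matrix, step 2.7
p = 1.0 / (1.0 + np.exp(-(z @ E.T)))  # probabilities for N triples at once
```

The single matrix product with E is what makes the 1-to-N scoring of step 2.7 more parameter-efficient than scoring each tail entity separately.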
4. The method of knowledge graph completion according to claim 3, wherein said multiple classes of convolution in step 2.3 comprise standard convolution, circular convolution and hole convolution, wherein: the standard convolution is ordinary convolution, while the circular convolution and the hole convolution are variants of it; ordinary convolution is called standard convolution here for distinction; the multi-class convolution specifically comprises: the first layer is 1×1 standard convolution and 3×3 pooling; the second layer is 4×4 circular convolution; the third layer is 5×5 standard convolution and 3×3 hole convolution; the standard convolution is used for extracting features of the matrix; compared with the standard convolution, the circular convolution also convolves across the upper and lower edges and the left and right edges of the matrix, so that more features can be extracted; the hole convolution convolves in the same way as the standard convolution, but zeros are inserted into the standard convolution kernel at fixed intervals, so that the receptive field is enlarged and features over a wider range are extracted without increasing the number of parameters.
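The two convolution variants named in this claim can be sketched with NumPy: `circular_conv2d` emulates circular convolution by wrap-padding the matrix before an ordinary valid convolution, so the opposite edges are convolved together, and `dilate` inserts zeros into a kernel at fixed intervals as described for the hole convolution. The function names and toy inputs are assumptions for illustration, not the patented implementation.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Ordinary (standard) valid 2-D convolution, no padding."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def circular_conv2d(x, kernel):
    """Circular convolution: wrap opposite edges so borders are convolved too."""
    kh, kw = kernel.shape
    xp = np.pad(x, ((kh - 1, 0), (kw - 1, 0)), mode="wrap")
    return conv2d_valid(xp, kernel)   # output keeps the input's spatial size

def dilate(kernel, rate=2):
    """Hole (dilated) kernel: zeros inserted at fixed intervals, no new weights."""
    kh, kw = kernel.shape
    out = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1))
    out[::rate, ::rate] = kernel      # larger receptive field, same parameters
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
y = circular_conv2d(x, np.ones((2, 2)))   # same 4x4 size as the input
d = dilate(np.ones((3, 3)))               # 3x3 kernel -> 5x5 dilated kernel
```

Note how the wrap padding lets the kernel see, e.g., the bottom-right and top-left corners at once, which a standard valid convolution never does.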
5. The method of knowledge-graph completion according to claim 3 or 4, wherein said step 2.3 specifically comprises:
step 2.3.1: in the first layer, performing three parallel 1×1 standard convolutions on the concatenated matrix [M_h; M_r] and an average pooling operation with a 3×3 window on [M_h; M_r], obtaining M1, M2, M3 and Mp, wherein: Mi = wi * [M_h; M_r] (i = 1, 2, 3) and Mp = p([M_h; M_r]); w1, w2, w3 respectively represent the convolution kernels of the three 1×1 standard convolutions, * represents the standard convolution operation, and p(·) represents the average pooling operation;
step 2.3.2: the second layer performs three parallel 4×4 circular convolution operations on the feature matrices obtained after the convolution and pooling of the first layer, introducing the nonlinear activation function ReLU, denoted f1; the feature matrix M1 obtained by the first 1×1 standard convolution of the first layer is taken as original information, is not convolved, and is reserved for residual learning; the remaining first-layer outputs are convolved, obtaining: C1 = f1(w′1 ⊛ M2), C2 = f1(w′2 ⊛ M3), C3 = f1(w′3 ⊛ Mp), wherein: w′1, w′2, w′3 respectively represent the convolution kernels of the three 4×4 circular convolutions of the second layer, and ⊛ represents the circular convolution operation; because the feature matrix M1 retains the original feature information, it is connected with the feature matrices after the third-layer convolution by residual learning, so that the original feature information is merged into the final feature matrix;
step 2.3.3: the third layer performs convolution operations on the feature matrices obtained after the second-layer convolution, using a 3×3 hole convolution, a 5×5 standard convolution and a 3×3 hole convolution respectively, introducing the nonlinear activation function ReLU, denoted f2, obtaining: H1 = f2(w″1 * C1), H2 = f2(w″2 * C2) and H3 = f2(w″3 * C3), wherein: w″1, w″2, w″3 respectively represent the convolution kernels of the 3×3 hole convolution, the 5×5 standard convolution and the 3×3 hole convolution of the third layer.
6. The method of knowledge graph completion according to claim 1, wherein said step 3 specifically comprises:
step 3.1: taking the embedding vectors of the head entity and the relation of each triplet in the test set as the input of the trained model, taking the N entities in the test set as candidate tail entities for completion, and taking the triplet with the highest prediction probability as the correctly completed triplet;
step 3.2: calculating the MRR, Hits@10, Hits@3 and Hits@1 indexes according to the completion result to evaluate the performance of the model, wherein: MRR ranks the triples from high to low according to their scores and then takes the mean of the reciprocal ranks of all triples; the higher the MRR value, the higher the completion accuracy; Hits@10, Hits@3 and Hits@1 are respectively the proportions of correct triples ranked in the top ten, top three and first positions; the higher Hits@10, Hits@3 and Hits@1 are, the higher the completion accuracy.
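The evaluation indexes of this claim can be computed directly from the rank assigned to each test triple's correct entity; a minimal Python sketch follows, where the example ranks are made up for illustration.

```python
# MRR: mean of the reciprocal ranks of all test triples.
# Hits@k: fraction of test triples whose correct entity ranks in the top k.
# Higher values of either index mean higher completion accuracy.

def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 12, 2, 1]   # illustrative ranks of five test triples
scores = (mrr(ranks), hits_at(ranks, 10), hits_at(ranks, 3), hits_at(ranks, 1))
```

For these ranks, MRR is (1 + 1/3 + 1/12 + 1/2 + 1)/5 = 7/12, and four of the five triples land in the top ten.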
7. The method of knowledge graph completion according to claim 1, wherein said step 4 specifically comprises:
step 4.1: combining the training set negative triplet set G⁻_Train from step 1 and the test set negative triplet set G⁻_Test into an overall negative triplet set G⁻;
Step 4.2: g is to be-Inputting the triples into a completion model, and calculating the accuracy of each triplet through the completion model; if the calculated accuracy is greater than or equal to a preset confidence value, the triple is considered to be correct, and then completion is carried out on the original knowledge graph according to the triple; and if the calculated accuracy is lower than the preset confidence value, the triple is considered to be wrong, and the completion is not carried out.
8. An apparatus for implementing the knowledge-graph complementing method of any one of the preceding claims, comprising:
the generating unit is used for generating a training set and a test set for training and testing the knowledge graph completion model according to the triples of the knowledge graph to be completed;
the training unit is used for training the knowledge graph complementing model by using the training set generated in the generating unit;
the testing unit is used for testing the trained completion model in the training unit by using the test set generated in the generating unit, and testing the completion performance of the completion model;
and the completion unit is used for verifying whether all the negative triples generated in the generation unit are correct, performing corresponding completion on the knowledge graph to be completed according to the verification result, and finally completing the whole graph.
9. The apparatus of claim 8, wherein the generating unit is configured to divide the positive triples of the knowledge-graph to be complemented into a training set and a testing set, construct a corresponding negative triplet set according to the positive triples in the training set and the testing set, and finally generate the training set and the testing set for training and testing the knowledge-graph-complemented model;
the training unit comprises an initialization subunit, a feature extraction subunit and a calculation subunit, wherein: the initialization subunit is used for initializing embedded vectors of all the three tuple head entities, the relations and the tail entities in the training set; the characteristic extraction subunit is used for extracting correlation characteristics between entities and relations in the triplets and original characteristics of the entities and the relations by using multi-class convolution and residual learning; the calculation subunit is used for finally calculating the correct rate of each triple in the training set and the loss values of all the triples;
the test unit comprises a model test subunit and a performance evaluation subunit, wherein: the model testing subunit is used for inputting the triples in the test set into the trained completion model, outputting a completion result and testing its completion accuracy; and the performance evaluation subunit is used for calculating performance indexes such as MRR, Hits@10, Hits@3 and Hits@1 according to the completion result and evaluating the performance of the completion model.
CN202110774126.4A 2021-07-08 2021-07-08 Method and device for complementing knowledge graph Pending CN113449047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110774126.4A CN113449047A (en) 2021-07-08 2021-07-08 Method and device for complementing knowledge graph


Publications (1)

Publication Number Publication Date
CN113449047A true CN113449047A (en) 2021-09-28

Family

ID=77815562



Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618098A (en) * 2022-09-08 2023-01-17 淮阴工学院 Cold-chain logistics recommendation method and device based on knowledge enhancement and hole convolution


Similar Documents

Publication Publication Date Title
CN110619123B (en) Machine reading understanding method
Alayrac et al. Unsupervised learning from narrated instruction videos
CN109783817B (en) Text semantic similarity calculation model based on deep reinforcement learning
CN109614979A (en) A kind of data augmentation method and image classification method based on selection with generation
CN112199536A (en) Cross-modality-based rapid multi-label image classification method and system
CN112527993B (en) Cross-media hierarchical deep video question-answer reasoning framework
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
CN114491039B (en) Primitive learning few-sample text classification method based on gradient improvement
CN114780748A (en) Priori weight enhancement-based completion method of knowledge graph
CN110851584A (en) Accurate recommendation system and method for legal provision
CN111694977A (en) Vehicle image retrieval method based on data enhancement
CN116883545A (en) Picture data set expansion method, medium and device based on diffusion model
CN113392191B (en) Text matching method and device based on multi-dimensional semantic joint learning
CN115270752A (en) Template sentence evaluation method based on multilevel comparison learning
CN113449047A (en) Method and device for complementing knowledge graph
CN116935057A (en) Target evaluation method, electronic device, and computer-readable storage medium
CN111079840A (en) Complete image semantic annotation method based on convolutional neural network and concept lattice
CN116342938A (en) Domain generalization image classification method based on mixture of multiple potential domains
Ni et al. Enhanced knowledge distillation for face recognition
CN116232699A (en) Training method of fine-grained network intrusion detection model and network intrusion detection method
Wang et al. Learning pseudo metric for intelligent multimedia data classification and retrieval
Chen et al. Even the simplest baseline needs careful re-investigation: A case study on XML-CNN
Bazzaz et al. Active Learning for Classifying 2D Grid-Based Level Completability
CN117971357B (en) Finite state automaton verification method and device, electronic equipment and storage medium
CN115130383A (en) Link prediction method and system based on decoupling representation learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210928