CN111324776A - Method and device for training graph embedding model, computing equipment and readable medium

Info

Publication number
CN111324776A
Authority
CN
China
Prior art keywords
negative sample, negative, cache, training, triplet
Legal status
Pending
Application number
CN201811526711.7A
Other languages
Chinese (zh)
Inventor
姚权铭 (Yao Quanming)
张永祺 (Zhang Yongqi)
涂威威 (Tu Weiwei)
Current Assignee
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN201811526711.7A
Publication of CN111324776A

Abstract

The invention provides a training method and apparatus for a graph embedding model, a computing device and a readable medium. The method comprises the following steps: acquiring a triple set of a knowledge graph for training; during each of at least some of the iterative training processes: updating the negative sample caches respectively corresponding to the triples and then selecting, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model, or first selecting, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model and then updating the negative sample caches respectively corresponding to the triples; wherein the negative sample caches respectively corresponding to the triples are updated such that negative samples with higher scores are retained in the negative sample caches. With the technical scheme of the invention, the trained graph embedding model achieves higher embedding efficiency and better accuracy.

Description

Method and device for training graph embedding model, computing equipment and readable medium
Technical Field
The invention relates to the technical field of computer applications, and in particular to a training method and apparatus for a graph embedding model, a computing device and a readable medium.
Background
A knowledge graph is a very special graph-type data structure in which each node is an entity and each edge represents a relationship. For its physical representation, a knowledge graph is therefore typically stored as a collection of triples, e.g., (h, r, t), where h ∈ ε is the head entity, r is the relation, and t ∈ ε is the tail entity, with ε denoting the set of entities.
the graph embedding (graph embedding) is a very important knowledge graph processing mode and plays a vital role in the use of a subsequent algorithm, and the graph embedding processing is mainly realized by adopting a graph embedding model.
In an existing knowledge graph, every triple actually exists, i.e., every triple is a positive sample, while negative samples are generally absent. How to find suitable negative samples (negative sampling) is therefore a critical problem in training a graph embedding model. Any triple that does not exist in the knowledge graph can serve as a negative sample, so the number of potential negative samples is enormous; how effective negative samples are selected not only strongly affects the efficiency of the graph embedding algorithm, but effective negative samples can also significantly improve the effect of the graph embedding model. In the prior art, negative samples are mostly extracted by uniform random sampling, and the graph embedding model is trained with the extracted negative samples.
However, with the uniform random sampling adopted in the prior art, the extracted negative samples may perform poorly: the evaluation function assigns them low scores, which causes the gradient to go to zero too quickly during training.
Disclosure of Invention
Based on the above technical problem, the present invention provides a method and an apparatus for training a graph-embedded model, a computing device and a readable medium, so as to improve the accuracy of the trained graph-embedded model.
The invention provides a training method of a graph embedding model, wherein the method comprises the following steps:
acquiring a triple set of a knowledge graph for training;
during each of at least some of the iterative training processes: updating the negative sample caches respectively corresponding to the triples and then selecting, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model; or first selecting, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model and then updating the negative sample caches respectively corresponding to the triples;
wherein the negative sample caches respectively corresponding to the triples are updated such that negative samples with higher scores are retained in the negative sample caches.
The invention also provides a training device for the graph embedding model, wherein the device comprises:
the acquisition module is used for acquiring a triple set of the knowledge graph for training;
a training module configured to, during each of at least some of the iterative training processes: update the negative sample caches respectively corresponding to the triples and then select, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model; or first select, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model and then update the negative sample caches respectively corresponding to the triples;
wherein the negative sample caches respectively corresponding to the triples are updated such that negative samples with higher scores are retained in the negative sample caches.
The present invention also provides a computing device comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
The invention also provides a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform a method as described above.
According to the training method and apparatus for a graph embedding model, the computing device and the readable medium provided by the present invention, the negative sample caches corresponding to the triples are updated so that negative samples with higher scores are retained in the caches, and corresponding negative samples are then selected from the updated caches to participate in training of the graph embedding model. Negative samples with higher scores can thus be used to train the graph embedding model; such negative samples prevent the gradient function from going to zero too quickly during training, which improves the training effect, so that the trained graph embedding model has higher embedding efficiency and better accuracy.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 is a flowchart illustrating a method for training a graph embedding model according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating another embodiment of a method for training a graph embedding model according to the present invention.
FIG. 3 is a flowchart illustrating a method for training a graph embedding model according to yet another embodiment of the present invention.
FIG. 4 is a block diagram of an embodiment of a training apparatus for graph-embedded models according to the present invention.
FIG. 5 is a block diagram of another embodiment of the training apparatus for graph-embedded models of the present invention.
FIG. 6 is a schematic structural diagram of a computing device that can be used to implement the above-described training method for graph embedding models according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The training of graph embedding models is a very important task for knowledge graphs. It involves two key points: (A) the selection of an evaluation function and (B) negative sample sampling. The evaluation function models the relationship among h, r and t in a triple (h, r, t), and its evaluation result determines the optimization target of the graph embedding algorithm. Negative sampling, in turn, is the key to the optimization computation. Since all triples in the knowledge graph used as training data are real, they can be used as positive samples during training, while any triple not belonging to the knowledge graph can be a negative sample. The number of potential negative samples is therefore huge, and how effective negative samples are selected not only strongly affects the efficiency of the graph embedding algorithm but also significantly affects the embedding effect of the graph embedding model.
For example, given a triple (h, r, t) in the training data set, its set of candidate negative samples can be expressed as:

S'_(h,r,t) = {(h', r, t) | h' ∈ ε, h' ≠ h} ∪ {(h, r, t') | t' ∈ ε, t' ≠ t}

where ∪ denotes the union operation. As this expression shows, a negative sample can be constructed by replacing the head or tail entity of a triple in the training data set with any other entity, yielding a triple that is not present in the training data.
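For illustration only (not part of the original disclosure), the following minimal Python sketch enumerates such candidate negative triples by replacing the head or the tail entity; all names are hypothetical, and in practice one would sample from this space rather than enumerate it.

```python
def candidate_negatives(triple, entities, known_triples):
    """Enumerate the candidate negative set S'_(h,r,t) by corrupting the head
    or the tail of a positive triple, keeping only triples absent from the
    training data (consistent with the expression above)."""
    h, r, t = triple
    corrupted_heads = {(h2, r, t) for h2 in entities if h2 != h}
    corrupted_tails = {(h, r, t2) for t2 in entities if t2 != t}
    return (corrupted_heads | corrupted_tails) - set(known_triples)


# Toy usage with hypothetical entities and relations.
triples = {("alice", "works_at", "acme"), ("bob", "works_at", "acme")}
entities = {"alice", "bob", "acme"}
negatives = candidate_negatives(("alice", "works_at", "acme"), entities, triples)
```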
Table 1 below shows an implementation flow of the graph embedding model algorithm.
TABLE 1
[Table 1 is reproduced as an image in the original publication; it lists the general iterative training flow of the graph embedding algorithm.]
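As a rough, non-authoritative sketch of the kind of baseline flow Table 1 refers to, the following Python loop trains an embedding model with uniform random negative sampling; the `model.sgd_step` interface and all other names are assumptions for illustration, not the patent's API.

```python
import random

def train_uniform(triples, entities, model, epochs=100, batch_size=128, lr=0.01):
    """Baseline training loop with uniform random negative sampling.

    `model` is assumed to expose sgd_step(positive, negative, lr); this name
    is a placeholder, not the patent's interface.
    """
    triples = list(triples)
    entities = list(entities)
    for _ in range(epochs):
        random.shuffle(triples)
        for i in range(0, len(triples), batch_size):
            for (h, r, t) in triples[i:i + batch_size]:
                # Corrupt either the head or the tail uniformly at random.
                if random.random() < 0.5:
                    negative = (random.choice(entities), r, t)
                else:
                    negative = (h, r, random.choice(entities))
                model.sgd_step((h, r, t), negative, lr)
```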
Uniform random sampling is the most commonly used negative sampling method. In addition, the prior art also discloses using a Generative Adversarial Network (GAN) to replace uniform random sampling and achieve a better embedding effect. The most important observation behind this is that most negative samples have very small scores (the scores obtained from the evaluation function). Therefore, with uniform random sampling the extracted negative samples are likely to have only small scores, which causes the gradient to go to zero too quickly during training of the graph embedding model and terminates the training of the algorithm prematurely. Thus, in negative sampling, how to quickly find negative samples with high scores is a core problem. GAN-based methods address this problem by training, alongside the original model, a generator constructed from a neural network. Specifically, the generator gives a score distribution over negative samples, and negative samples are drawn based on this distribution. Because the generator can well simulate the distribution of negative samples, it can effectively provide negative samples with higher scores and avoid the problem of the gradient going to zero too quickly during training.
However, the existing work overlooks the fact that the complex global distribution of negative samples does not need to be modeled. Especially in the later stages of training, only the negative samples with large scores are useful, and they account for only a small proportion of all negative samples. GAN can model the distribution as a whole, but in doing so it spends a great deal of time and parameters on low-score negative samples that are not useful; GAN-based models are therefore inefficient. In addition, the selection of negative samples is a discrete optimization problem, so a GAN-based graph embedding model must be trained with reinforcement learning, which is inherently unstable and slow. The GAN-based approach is therefore also difficult to train.
Based on the above problems, the present invention realizes efficient, fast learning of the graph embedding model by using a computer cache that retains only high-score negative samples. Specifically, a cache is established for each triple present in the training data; this cache is dynamically updated as the algorithm iterates during model training, and negative samples are drawn from it. Because only high-score negative samples remain in the cache, the problem of the gradient going to zero too quickly can be effectively avoided. In addition, the mechanism of the present invention introduces no redundant training parameters and can be trained in the same manner as the original model (since there are not many additional parameters to train). Therefore, the cache-based training mechanism can achieve an effect similar to or better than that of GAN while being faster and more stable to train.
In addition, to achieve the above effects, the present invention also needs to explore the space formed by all negative samples as much as possible, so as to reduce the bias caused by sample selection. Overall, to obtain a good learning effect, the cache design has two goals:
(1) retaining negative samples with high scores (to avoid the gradient going to zero too quickly during training);
(2) exploring the sample space consisting of all negative samples (i.e., S'_(h,r,t)) as thoroughly as possible.
The training scheme of the graph embedding model of the present invention is developed in the above background, and the following embodiments may be referred to for specific implementation.
FIG. 1 is a flowchart illustrating a method for training a graph embedding model according to an embodiment of the present invention. As shown in fig. 1, the training method of the graph embedding model in this embodiment may specifically include the following steps:
s100, acquiring a triple set of a knowledge graph for training;
s101, in each iterative training process in at least part of iterative training processes: updating the negative sample caches respectively corresponding to the triples, and then selecting corresponding negative sample participation graph embedding model training for the triples from the corresponding negative sample caches, or selecting corresponding negative sample participation graph embedding model training for the triples from the corresponding negative sample caches, and then updating the negative sample caches respectively corresponding to the triples;
and updating the negative sample caches corresponding to the triples respectively, so that the negative samples with higher corresponding scores are reserved in the negative sample caches.
The executing body of the graph embedding model training method of this embodiment may be a graph embedding model training apparatus. The apparatus may be an electronic device having a physical entity, capable of receiving an input triple set of a knowledge graph for training and performing training based on each triple in the triple set. For example, during each of at least some of the iterative training processes, the negative sample caches respectively corresponding to the triples may be updated first and corresponding negative samples then selected from the caches for the triples to participate in training of the graph embedding model; or corresponding negative samples may first be selected from the caches for the triples to participate in training of the graph embedding model and the negative sample caches respectively corresponding to the triples then updated. In the latter case, the updated negative sample caches are used to directly obtain the corresponding negative samples that participate in training of the graph embedding model during the next iteration.
In addition, the training apparatus of the graph embedding model in this embodiment may also be an integrated software application; the implementation manner is the same.
In this embodiment, a negative sample cache may be set in advance for each triplet in the knowledge graph. For example, the negative sample buffer corresponding to each triplet may include: a head cache and/or a tail cache; the header cache is used for storing N1 negative samples of header entities inconsistent with corresponding triples; the tail cache is used for storing M1 negative samples of tail entities inconsistent with corresponding triples; n1 and M1 are both positive integers.
Specifically, for each triplet, the header cache is configured to store N1 negative samples whose header entities are inconsistent with the header entities of the triplets, and since only the header entities of the negative samples are different from the corresponding triplets, in order to save the storage space of the header cache, only N1 entities inconsistent with the header entities of the triplets may be stored in the header cache. For example, for N1 negative samples, only N1 unit-length buffer spaces are allocated to store the head entities of the corresponding negative samples.
Similarly, for each triplet, the tail cache is configured to store M1 negative samples whose tail entities are inconsistent with the tail entity of the triplet, and since only the tail entities of these negative samples differ from the corresponding triplet, in order to save the storage space of the tail cache, only the M1 entities inconsistent with the tail entity of the triplet may be stored in the tail cache. For example, for M1 negative samples, only M1 unit-length buffer spaces are allocated to store the tail entities of the corresponding negative samples.
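A minimal sketch of this per-triple cache layout, assuming the caches store only entity identifiers and using the triple itself as the lookup key (which plays the role of the unique identifier discussed below); the class and function names are illustrative, not from the patent.

```python
import random
from dataclasses import dataclass, field

@dataclass
class NegativeCache:
    """Per-triple negative sample cache: only entity identifiers are kept,
    so each triple costs roughly N1 + M1 unit-length slots."""
    head_cache: list = field(default_factory=list)  # N1 entities that corrupt the head
    tail_cache: list = field(default_factory=list)  # M1 entities that corrupt the tail

def init_caches(triples, entities, n1, m1):
    """Give every triple a randomly initialized head cache and tail cache
    (the initialization step mentioned later in the description)."""
    entities = list(entities)
    caches = {}
    for (h, r, t) in triples:
        caches[(h, r, t)] = NegativeCache(
            head_cache=random.sample([e for e in entities if e != h], n1),
            tail_cache=random.sample([e for e in entities if e != t], m1),
        )
    return caches
```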
In specific training, only the negative samples in the head cache or only the negative samples in the tail cache may be selected according to a preset setting, or the negative samples may also be randomly selected from the head cache and the tail cache.
For example, in step S101 of this embodiment, "respectively select a corresponding negative sample from the corresponding negative sample cache for each triplet to participate in the training of the graph embedding model", the following method may be adopted in the specific implementation:
for a triple, when the triple has a corresponding head cache and a corresponding tail cache, a first negative sample set is randomly selected from the corresponding head cache, a second negative sample set is randomly selected from the corresponding tail cache, and then a preset number of negative samples are randomly selected from the first negative sample set and the second negative sample set as the negative samples of the triple to participate in the training of the graph embedding model.
In this implementation, where negative samples are randomly selected from both the head cache and the tail cache, a first negative sample set may be randomly selected from the head cache corresponding to the triple and a second negative sample set randomly selected from the tail cache corresponding to the triple, and a preset number of negative samples are then randomly selected from the first and second negative sample sets as the negative samples of the triple to participate in training of the graph embedding model. The number of negative samples in the first negative sample set, the number in the second negative sample set and the preset number may each be set according to the chosen evaluation function: different classes of evaluation functions have different gradient functions, and the number of negative samples required by the gradient function also differs. Therefore, the preset number may be set with reference to the gradient function corresponding to the class of the evaluation function, and the numbers of negative samples in the first and second negative sample sets may then be set according to the preset number. Since the preset number of negative samples finally participating in training are randomly selected from the first and second negative sample sets, the total number of negative samples in these two sets must be no smaller than the preset number.
For example, in practical applications, when only one negative sample needs to be selected to participate in training, both the first negative sample set and the second negative sample set may include only one negative sample, and in this case, one of the first negative sample set and the second negative sample set may be randomly selected to participate in training. When a plurality of negative samples need to be selected to participate in training, a plurality of negative samples can be arranged in both the first negative sample set and the second negative sample set so as to be randomly selected.
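Building on the cache sketch above, the following hedged Python example shows one way the first set, the second set and the preset-number selection could be combined; the set sizes passed in are assumed to have been chosen according to the evaluation function as described, and all names are illustrative.

```python
import random

def sample_negatives(triple, cache, first_size, second_size, preset_number):
    """Draw a first negative sample set from the head cache and a second set from
    the tail cache, then randomly keep `preset_number` of them, as described above."""
    h, r, t = triple
    first_set = [(h2, r, t) for h2 in random.sample(cache.head_cache, first_size)]
    second_set = [(h, r, t2) for t2 in random.sample(cache.tail_cache, second_size)]
    # The two set sizes are assumed to be chosen so that their total is at
    # least `preset_number`.
    return random.sample(first_set + second_set, preset_number)
```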
Optionally, before step S101 of "each of at least some of the iterative training processes" in the above implementation, the method may further include: indexing the corresponding negative sample cache with the unique identifier of each triple, so that the negative sample cache preset for each triple can be looked up and conveniently updated later. For example, any one of the following three ways may be used:
the first mode is as follows: and adopting the address of each triple to index the corresponding negative sample cache. The storage address of each triplet can uniquely identify each triplet, and therefore the address of each triplet can be used to index the corresponding negative sample cache.
The second mode is as follows: performing string hash on the characters of the triples in the memory, and indexing the corresponding negative sample cache according to the result of the string hash;
the third party is: and carrying out character string hashing on the specific physical meaning of each triple, and indexing the corresponding negative sample cache by using the result of the character string hashing. In one embodiment of the invention, the physical meaning of a triple may specifically be the semantic information expressed by the triple.
Each triplet (h, r, t) ∈ S in the training data set has a unique identifier, which may be represented by an integer value denoted I(h, r, t). Since a negative sample in S'_(h,r,t) is obtained by replacing either the head entity or the tail entity, two caches can be designed for each triple (h, r, t), as described in the above embodiments: a head cache H and a tail cache T.

Head cache H: retains high-score negative samples obtained by replacing the head, i.e., samples selected from {(h', r, t) | h' ∈ ε, h' ≠ h}. The head cache is indexed using I(h, r, t) as its address, and since only the head entity is replaced, the entry H_I(h,r,t) stores only the head entities of these negative samples.

Similarly, tail cache T: retains high-score negative samples obtained by replacing the tail, i.e., samples selected from {(h, r, t') | t' ∈ ε, t' ≠ t}. T is likewise indexed using I(h, r, t) as its address, and since only the tail entity is replaced, the entry T_I(h,r,t) stores only the tail entities of these negative samples.

Moreover, in this example, what is actually retained in H_I(h,r,t) and T_I(h,r,t) are only the identifiers of the corresponding entities h', t' ∈ ε. Thus, the total memory consumed by the head cache H and the tail cache T of each triple (h, r, t) is only 2N1 unit-length entries (taking M1 = N1).
Similarly, the bytes of each triple in memory can be string-hashed and the corresponding negative sample cache indexed based on the hash result; or the specific physical meaning of each triple can be string-hashed and the corresponding negative sample cache likewise indexed based on the hash result. For example, the specific physical meaning of a triple may be written as (employee a, hired, company b), and string hashing may be performed on that representation.
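A small illustrative sketch of the three indexing options, assuming Python objects and standard-library hashing; these helpers are hypothetical and only meant to make the three ways concrete.

```python
import hashlib
import pickle

def index_by_address(triple):
    # Way 1: use the stored triple's (unique) object address as the cache index.
    return id(triple)

def index_by_memory_hash(triple):
    # Way 2: hash the triple's in-memory byte representation.
    return hashlib.md5(pickle.dumps(triple)).hexdigest()

def index_by_meaning_hash(triple_text):
    # Way 3: hash the human-readable meaning, e.g. "(employee a, hired, company b)".
    return hashlib.md5(triple_text.encode("utf-8")).hexdigest()
```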
According to the training method of the graph embedding model of this embodiment, the negative sample caches corresponding to the triples are updated so that negative samples with higher scores are retained in the caches, and corresponding negative samples are then selected from the updated caches to participate in training of the graph embedding model. Negative samples with higher scores can thus be used to train the graph embedding model; they prevent the gradient function from going to zero too quickly during training, which improves the training effect, so that the trained graph embedding model has higher embedding efficiency and better accuracy.
FIG. 2 is a flowchart illustrating another embodiment of the method for training a graph embedding model according to the present invention. This embodiment further details the embodiment shown in FIG. 1. As shown in FIG. 2, when the negative sample cache corresponding to each triple includes the head cache, the step S101 of "updating the negative sample caches respectively corresponding to the triples" may specifically include the following steps:
s200, for each triple, randomly sampling N2 negative samples of which the head entities are inconsistent with the triple in the negative sample space of the triple, wherein N2 is a positive integer;
for example, referring to the above embodiment that only the header entity different from the corresponding triplet may be included in the header buffer, for each triplet, since only the header is different, N2 negative samples whose header entity is inconsistent with the triplet are randomly sampled from the negative sample space of the triplet, and this may also be achieved by randomly selecting N2 header entities that are inconsistent with the header entity of the triplet.
S201, selecting N1 negative samples from N1 negative samples existing in a head cache of the triple and N2 sampled negative samples;
at this time, N1 header entities may be selected from the N1 header entities in the header buffer of the triplet and the N2 header entities selected randomly, correspondingly.
And S202, updating the head buffer of the triplet by using the selected N1 negative samples.
Correspondingly, the header cache of the triplet may be updated with the selected N1 header entities.
In addition, it should be noted that, in this embodiment, before the at least part of the iterative training process starts, the method may further include: for each triple, randomly selecting N1 head entities inconsistent with the head entity of the triple and placing them in the head cache of the triple, so as to initialize the head cache. At initialization, the head cache of a triple may not yet contain the head entities of high-scoring negative samples, but through the updates of this embodiment the head cache comes to retain only head entities of high-scoring negative samples.
For example, in step S201, "select N1 negative samples from N1 negative samples existing in the header buffer of the triplet and N2 negative samples of the sample", specifically, the method may include: calculating the scores of the N1+ N2 negative samples of the triple by adopting an evaluation function; then, according to the scores of the N1+ N2 negative samples, N1 negative samples are selected from the N1+ N2 negative samples.
Still further, the step of "selecting N1 negative examples from the N1+ N2 negative examples according to the scores of the N1+ N2 negative examples" may include either of the following two ways:
(a) selecting the top N1 negative samples with the highest scores from the N1+ N2 negative samples;
(b) for each negative example of the N1+ N2 negative examples, calculating an extraction probability for that negative example based on the scores of the N1+ N2 negative examples; and sequentially extracting N1 negative samples from the N1+ N2 negative samples according to the extraction probability corresponding to the N1+ N2 negative samples.
For example, the extraction probability of a negative sample in this embodiment can be expressed by the following formula:

p(h' | r, t) = e^{f(h', r, t)} / Σ_{h ∈ H1} e^{f(h, r, t)}

where p(h' | r, t) is the extraction probability of the negative sample (h', r, t), f(h', r, t) is the score assigned by the evaluation function to the negative sample (h', r, t), f(h, r, t) is the score assigned by the evaluation function to the triple formed by any head entity h in the head cache H1, and Σ_{h ∈ H1} e^{f(h, r, t)} denotes computing e^{f(h, r, t)} for the triple corresponding to each head entity in the head cache H1 and summing the results. Of course, in practical applications, other formulas may be adopted to set the extraction probability of negative samples, as long as negative samples with higher scores are extracted with higher probability.
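A hedged sketch of drawing from a head cache with this extraction probability (a softmax over the evaluation-function scores); `score_fn` stands in for the evaluation function f, sampling is done with replacement for simplicity, and the max-subtraction is only for numerical stability.

```python
import math
import random

def draw_from_head_cache(head_cache, score_fn, r, t, k):
    """Draw k corrupting head entities with probability proportional to
    exp(f(h', r, t)), matching the formula above."""
    scores = [score_fn(h2, r, t) for h2 in head_cache]
    max_score = max(scores)  # subtracted only for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    return random.choices(head_cache, weights=weights, k=k)
```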
According to the training method of the graph embedding model of this embodiment, the head caches corresponding to the triples are updated by the above scheme so that negative samples with higher scores are retained in the head caches, and corresponding negative samples are then selected from the updated head caches to participate in training of the graph embedding model. Negative samples with higher scores can thus be used to train the graph embedding model; they prevent the gradient function from going to zero too quickly during training, which improves the training effect, so that the trained graph embedding model has higher embedding efficiency and better accuracy.
FIG. 3 is a flowchart illustrating a method for training a graph embedding model according to yet another embodiment of the present invention. This embodiment further details the embodiment shown in FIG. 1. As shown in FIG. 3, when the negative sample cache corresponding to each triple includes the tail cache, the step S101 of "updating the negative sample caches respectively corresponding to the triples" may specifically include the following steps:
s300, for each triple, randomly sampling M2 negative samples of which tail entities are inconsistent with the triple from the negative sample space of the triple, wherein M2 is a positive integer;
similar to the embodiment shown in fig. 2, the tail buffer in the above-mentioned embodiment may only include tail entities different from corresponding triples, and for each triplet, since only the tail portions are different, M2 negative samples in which the tail entities are inconsistent with the triplet are randomly sampled from the negative sample space of the triplet, or M2 tail entities that are inconsistent with the tail entities of the triplet are randomly selected.
S301, selecting M1 negative samples from M1 negative samples existing in the tail buffer of the triple and M2 sampled negative samples;
similarly, correspondingly, M1 tail entities may be selected from M1 tail entities in the tail buffer of the triplet and M2 tail entities selected randomly.
And S302, updating the tail buffer of the triple by using the selected M1 negative samples.
Correspondingly, the tail cache of the triplet may be updated with the selected M1 tail entities.
Similarly, it should be noted that, in this embodiment, before the at least part of the iterative training process starts, the method may further include: for each triple, randomly selecting M1 tail entities inconsistent with the tail entity of the triple and placing them in the tail cache of the triple, so as to initialize the tail cache. Likewise, at initialization the tail cache of a triple may not yet contain the tail entities of high-scoring negative samples, but through the updates of this embodiment the tail cache comes to retain only tail entities of high-scoring negative samples.
For example, in step S301, "selecting M1 negative samples from M1 negative samples existing in the tail buffer of the triplet and M2 sampled negative samples," may specifically include: calculating the scores of the M1+ M2 negative samples of the triple by adopting an evaluation function; according to the scores of M1+ M2 negative samples, M1 negative samples are selected from M1+ M2 negative samples.
Still further, the step "selecting M1 negative examples from M1+ M2 negative examples according to the scores of M1+ M2 negative examples" may include the following two ways:
(A) selecting the top M1 negative samples with the highest scores from the M1+ M2 negative samples;
(B) for each negative example of the M1+ M2 negative examples, calculating an extraction probability for the negative example based on the scores of the M1+ M2 negative examples; and sequentially extracting M1 negative samples from the M1+ M2 negative samples according to the extraction probability corresponding to the M1+ M2 negative samples.
The negative sample extraction probability may be implemented by using the formula of the related negative sample extraction probability in the embodiment shown in fig. 2, or may be implemented by using other formulas, so that the probability that the negative sample with a high score is extracted is higher, and details are not repeated here.
According to the training method of the graph embedding model of this embodiment, the tail caches corresponding to the triples are updated by the above scheme so that negative samples with higher scores are retained in the tail caches, and corresponding negative samples are then selected from the updated tail caches to participate in training of the graph embedding model. Negative samples with higher scores can thus be used to train the graph embedding model; they prevent the gradient function from going to zero too quickly during training, which improves the training effect, so that the trained graph embedding model has higher embedding efficiency and better accuracy.
Based on the embodiments shown in FIGS. 1-3, algorithm iteration is performed over the two caches of each triple described above, i.e., the head cache H_I(h,r,t) and the tail cache T_I(h,r,t), according to the algorithm flow shown in Table 2 below. Specifically, the algorithm iterates K times in total, and K × m is much larger than the number of triples, so each triple is guaranteed to be visited multiple times. In each iteration, a batch of sample data S_batch of size m is first selected (step 24); then, for each positive sample in S_batch, negative sampling is performed based on the cache (steps 26-28); the embedding model parameters are updated based on the sampled positive and negative samples (step 29); and finally the caches are updated (step 210). It should be noted that, in this embodiment, all operations and descriptions for the head cache H_I(h,r,t) apply equally to the tail cache T_I(h,r,t); for brevity, only the operations on the head cache H_I(h,r,t) are described below.
TABLE 2
[Table 2 is reproduced as an image in the original publication; it lists the cache-based iterative training algorithm whose steps are described in the surrounding text.]
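Since Table 2 itself is only available as an image, the following Python sketch approximates the described flow under stated assumptions: `model.score` and `model.sgd_step` are assumed interfaces (not the patent's API), one negative sample is drawn per positive sample, and the cache-update helpers are the ones sketched after Table 3 below.

```python
import random

def train_with_cache(triples, caches, entities, model, epochs, batch_size, lr, n2, m2):
    """Cache-based training loop in the spirit of Table 2: sample a mini-batch,
    draw negatives uniformly from each triple's cache (to keep exploration),
    update the embedding model, then refresh the caches."""
    triples = list(triples)
    for _ in range(epochs):
        random.shuffle(triples)
        for i in range(0, len(triples), batch_size):
            for (h, r, t) in triples[i:i + batch_size]:
                cache = caches[(h, r, t)]
                # Steps 26-28: uniform random pick from the cache.
                if random.random() < 0.5:
                    negative = (random.choice(cache.head_cache), r, t)
                else:
                    negative = (h, r, random.choice(cache.tail_cache))
                # Step 29: update embeddings from the positive/negative pair.
                model.sgd_step((h, r, t), negative, lr)
                # Step 210: refresh this triple's head and tail caches
                # (helpers sketched after Table 3 below).
                update_head_cache(cache, (h, r, t), entities, model.score, n2)
                update_tail_cache(cache, (h, r, t), entities, model.score, m2)
```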
Step 27 uses uniform random sampling for the following reason: the core point is that, when selecting negative samples, the whole sample space should be explored as much as possible while keeping the scores high. The head cache H_I(h,r,t) already retains the negative samples with higher scores, and uniform random sampling over the cache ensures exploration. Step 210 updates the existing head cache and tail cache; the detailed steps for updating the head cache are shown in Table 3 below, and in practical applications the tail cache may be updated in the same manner.
TABLE 3
[Table 3 is reproduced as an image in the original publication; it lists the detailed head cache update procedure whose steps are described in the surrounding text.]
Here again, it is desirable to retain high-score negative samples while exploring the negative sample space as much as possible. Therefore, before updating H_I(h,r,t), N2 entities are first sampled from the whole sample space to form a set R_m, and this set is united with H_I(h,r,t) (steps 33-34). Then, N1 head entities are resampled from this superset H'_I(h,r,t) as the updated cache. Note that steps 36-310 of Table 3 perform importance sampling according to the scores; in fact, the N1 entities with the highest scores could simply be selected, but that would not provide enough exploration and would reduce the embedding performance.
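Since Table 3 is likewise only available as an image, here is a hedged sketch of the described head-cache update (with the mirrored tail-cache update), using importance sampling over the union of the existing cache and freshly sampled candidates; `score_fn` stands in for the evaluation function, and duplicates may appear because this simplification samples with replacement.

```python
import math
import random

def update_head_cache(cache, triple, entities, score_fn, n2):
    """Table 3-style head cache update: sample n2 fresh corrupting head entities
    (the set R_m), take the union with the current cache (steps 33-34), then
    importance-sample N1 entities back according to exp(f(h', r, t)) rather than
    taking a plain top-N1 (steps 36-310), so the sample space keeps being explored."""
    h, r, t = triple
    n1 = len(cache.head_cache)
    fresh = random.sample([e for e in entities if e != h], n2)
    superset = list(set(cache.head_cache) | set(fresh))
    scores = [score_fn(h2, r, t) for h2 in superset]
    max_score = max(scores)
    weights = [math.exp(s - max_score) for s in scores]
    cache.head_cache = random.choices(superset, weights=weights, k=n1)

def update_tail_cache(cache, triple, entities, score_fn, m2):
    """Mirror update for the tail cache, corrupting the tail entity instead."""
    h, r, t = triple
    m1 = len(cache.tail_cache)
    fresh = random.sample([e for e in entities if e != t], m2)
    superset = list(set(cache.tail_cache) | set(fresh))
    scores = [score_fn(h, r, t2) for t2 in superset]
    max_score = max(scores)
    weights = [math.exp(s - max_score) for s in scores]
    cache.tail_cache = random.choices(superset, weights=weights, k=m1)
```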
According to the training method of the graph embedding model of this embodiment, the negative sample caches corresponding to the triples are updated so that negative samples with higher scores are retained in the caches, and corresponding negative samples are then selected from the updated caches to participate in training of the graph embedding model. Negative samples with higher scores can thus be used to train the graph embedding model; they prevent the gradient function from going to zero too quickly during training, which improves the training effect, so that the trained graph embedding model has higher embedding efficiency and better accuracy.
In the flow of the graph embedding algorithm shown in table 2, in each iteration process, a negative sample is selected from the negative sample buffer for the current iteration training (steps 27 to 29), and then the negative sample buffer is updated (step 210). However, in another embodiment of the present invention, the negative sample buffer may be updated first in each iteration process, and then the negative sample is selected from the negative sample buffer to perform the iterative training.
FIG. 4 is a block diagram of an embodiment of a training apparatus for graph-embedded models according to the present invention. As shown in fig. 4, the training apparatus for graph-embedded models in this embodiment may specifically include:
an obtaining module 10, configured to obtain a triple set of a knowledge graph for training;
the training module 11 is configured to, during each of at least some of the iterative training processes: update the negative sample caches respectively corresponding to the triples in the triple set acquired by the acquisition module 10 and then select, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model; or first select, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model and then update the negative sample caches respectively corresponding to the triples;
wherein the negative sample caches respectively corresponding to the triples are updated such that negative samples with higher scores are retained in the negative sample caches.
The implementation principle and technical effect of the training apparatus for implementing the graph embedding model by using the modules in the embodiment are the same as those of the related method embodiment, and details of the related method embodiment may be referred to and are not repeated herein.
FIG. 5 is a block diagram of another embodiment of the training apparatus for graph-embedded models of the present invention. As shown in fig. 5, the training apparatus for graph-embedded models according to the present embodiment further describes the technical solution of the present invention in more detail based on the technical solution of the embodiment shown in fig. 4.
In the training apparatus for graph embedding model in this embodiment, the caching of the negative sample corresponding to each triplet includes: a head cache and/or a tail cache;
the header cache is used for storing N1 negative samples of header entities inconsistent with corresponding triples; the tail cache is used for storing M1 negative samples of tail entities inconsistent with corresponding triples; n1 and M1 are both positive integers.
Further optionally, in the training apparatus for graph-embedded models of this embodiment, the training module 11 is configured to:
for a triple, when the triple has a corresponding head cache and a corresponding tail cache, a first negative sample set is randomly selected from the corresponding head cache, a second negative sample set is randomly selected from the corresponding tail cache, and then a preset number of negative samples are randomly selected from the first negative sample set and the second negative sample set as the negative samples of the triple to participate in the training of the graph embedding model.
Further optionally, as shown in fig. 5, the graph embedding model training apparatus of this embodiment further includes: the indexing module 12 is configured to index the corresponding negative sample cache using the unique identifier of each triplet.
Correspondingly, the training module 11 is configured to, in each of at least some of the iterative training processes, first update the negative sample caches, indexed by the indexing module 12, that correspond to the triples in the acquired triple set.
Further optionally, the indexing module 12 is configured to: adopting the address of each triple to index the corresponding negative sample cache; or, performing string hash on the characters of the triples in the memory, and indexing the corresponding negative sample cache according to the result of the string hash; or carrying out character string hashing on the specific physical meaning of each triple, and indexing the corresponding negative sample cache by using the result of the character string hashing.
Further optionally, in the training apparatus for graph-embedded models of this embodiment, the training module 11 is configured to: if the negative sample cache corresponding to each triplet comprises a header cache, for each triplet, randomly sampling N2 negative samples of which the header entities are inconsistent with the triplet from the negative sample space of the triplet, wherein N2 is a positive integer; selecting N1 negative samples from N1 negative samples existing in the head buffer of the triple and the N2 sampled negative samples; the header buffer for the triplet is updated with the selected N1 negative samples.
Further optionally, in the training apparatus for graph-embedded models of this embodiment, the training module 11 is configured to: if only the head entity of the negative sample is stored in the head cache, randomly selecting N2 head entities inconsistent with the head entity of the triple; selecting N1 header entities from the N1 header entities and the randomly selected N2 header entities in the header buffer of the triple; the header cache of the triplet is updated with the N1 header entities selected.
Further optionally, as shown in fig. 5, the graph embedding model training apparatus of this embodiment further includes: the first initialization module 13 is configured to, for each triple, randomly select N1 head entities that are inconsistent with the head entity of the triple to be placed in the head buffer of the triple, so as to initialize the head buffer of the triple.
Correspondingly, the indexing module 12 may index the corresponding head cache based on the initialization performed by the first initialization module 13.
Further optionally, in the training apparatus for graph-embedded models of this embodiment, the training module 11 is configured to: calculating the scores of the N1+ N2 negative samples of the triplet by adopting an evaluation function; and selecting N1 negative samples from the N1+ N2 negative samples according to the scores of the N1+ N2 negative samples.
Further optionally, in the training apparatus for graph-embedded models of this embodiment, the training module 11 is configured to: selecting the top N1 negative samples with highest scores from the N1+ N2 negative samples; or, for each negative example of the N1+ N2 negative examples, calculating the extraction probability of the negative example based on the scores of the N1+ N2 negative examples; and sequentially extracting N1 negative samples from the N1+ N2 negative samples according to the extraction probability corresponding to the N1+ N2 negative samples.
Further optionally, in the training apparatus for graph-embedded models of this embodiment, the training module 11 is configured to: for each triplet, randomly sampling M2 negative samples of which tail entities are inconsistent with the triplet from the negative sample space of the triplet, wherein M2 is a positive integer; selecting M1 negative samples from M1 negative samples existing in the tail buffer of the triple and the M2 sampled negative samples; the tail buffer for this triplet is updated with the selected M1 negative samples.
Further optionally, in the training apparatus for graph-embedded models of this embodiment, the training module 11 is configured to: if only the tail entity of the negative sample is stored in the tail cache, randomly selecting M2 tail entities inconsistent with the tail entity of the triple; selecting M1 tail entities from M1 tail entities in the tail buffer of the triple and M2 tail entities selected randomly; the tail cache of the triplet is updated with the selected M1 tail entities.
Further optionally, as shown in fig. 5, the graph embedding model training apparatus of this embodiment further includes: the second initialization module 14 is configured to, for each triple, randomly select M1 tail entities that are inconsistent with the tail entity of the triple to be placed in the tail buffer of the triple, so as to initialize the tail buffer of the triple.
Correspondingly, the indexing module 12 may index the corresponding tail cache based on the initialization performed by the second initialization module 14.
Further optionally, in the training apparatus for graph-embedded models of this embodiment, the training module 11 is configured to: calculating the scores of the M1+ M2 negative samples of the triplet by adopting an evaluation function; according to the scores of the M1+ M2 negative samples, M1 negative samples are selected from the M1+ M2 negative samples.
Further optionally, in the training apparatus for graph-embedded models of this embodiment, the training module 11 is configured to: selecting the top M1 negative samples with the highest scores from the M1+ M2 negative samples; or, for each negative sample of the M1+ M2 negative samples, calculating an extraction probability for the negative sample based on the scores of the M1+ M2 negative samples; and sequentially extracting M1 negative samples from the M1+ M2 negative samples according to the extraction probability corresponding to each of the M1+ M2 negative samples.
The implementation principle and technical effect of the training apparatus for implementing the graph embedding model by using the modules in the embodiment are the same as those of the related method embodiment, and details of the related method embodiment may be referred to and are not repeated herein.
FIG. 6 is a schematic structural diagram of a computing device that can be used to implement the above-described training method for graph embedding models according to an embodiment of the present invention.
Referring to fig. 6, the computing device 1000 includes a memory 1010 and a processor 1020.
The processor 1020 may be a multi-core processor or may include multiple processors. In some embodiments, processor 1020 may include a general-purpose host processor and one or more special purpose coprocessors such as a Graphics Processor (GPU), Digital Signal Processor (DSP), or the like. In some embodiments, processor 1020 may be implemented using custom circuits, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 1010 may include various types of storage units, such as system memory, read-only memory (ROM) and permanent storage. The ROM may store static data or instructions needed by the processor 1020 or other modules of the computer. The permanent storage device may be a readable and writable, non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the permanent storage device. In other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a readable and writable memory device or a volatile readable and writable memory device, such as dynamic random access memory, and may store instructions and data that some or all of the processors require at runtime. Further, the memory 1010 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), and magnetic and/or optical disks. In some embodiments, the memory 1010 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM or dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., an SD card, a mini SD card or a Micro-SD card), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 1010 has stored thereon executable code that, when processed by the processor 1020, causes the processor 1020 to perform the above-described method of training the graph embedding model.
The training of the graph embedding model according to the present invention has been described in detail above with reference to the drawings.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the above-mentioned steps defined in the above-mentioned method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method for training a graph embedding model, wherein the method comprises:
acquiring a triple set of a knowledge graph for training;
during each of at least some of the iterative training processes: updating the negative sample caches respectively corresponding to the triples and then selecting, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model; or first selecting, for each triple, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model and then updating the negative sample caches respectively corresponding to the triples;
wherein the negative sample caches respectively corresponding to the triples are updated such that negative samples with higher scores are retained in the negative sample caches.
2. The method of claim 1, wherein,
the negative sample cache corresponding to each triplet comprises: a head cache and/or a tail cache;
the header cache is used for storing N1 negative samples of header entities inconsistent with corresponding triples; the tail cache is used for storing M1 negative samples of tail entities inconsistent with corresponding triples; n1 and M1 are both positive integers.
3. The method of claim 2, wherein the selecting the corresponding negative sample from the corresponding negative sample cache for each triplet respectively to participate in the training of the graph embedding model comprises:
for a triple, when the triple has a corresponding head cache and a corresponding tail cache, a first negative sample set is randomly selected from the corresponding head cache, a second negative sample set is randomly selected from the corresponding tail cache, and then a preset number of negative samples are randomly selected from the first negative sample set and the second negative sample set as the negative samples of the triple to participate in the training of the graph embedding model.
4. The method of claim 1, wherein, prior to each iteration in at least a portion of the iterative training process, the method comprises:
indexing the corresponding negative sample cache by the unique identifier of each triplet.
5. The method of claim 4, wherein indexing the corresponding negative sample cache by the unique identifier of each triplet comprises:
indexing the corresponding negative sample cache by the memory address of each triplet;
or performing a string hash on the in-memory character representation of each triplet, and indexing the corresponding negative sample cache by the result of the string hash;
or performing a string hash on the specific physical meaning of each triplet, and indexing the corresponding negative sample cache by the result of the string hash.
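As a sketch of one of the indexing options listed in claim 5, the following hashes a string form of a triplet and uses the digest as the key of that triplet's negative sample cache; the choice of hashlib and a Python dict is an illustrative assumption, not prescribed by the claim.

```python
import hashlib

# Hash the triplet's string representation and use the digest to index its
# negative sample cache (one of the options in claim 5).

def cache_key(triplet):
    h, r, t = triplet
    return hashlib.md5(f"{h}|{r}|{t}".encode()).hexdigest()

caches = {}
trip = ("Beijing", "capital_of", "China")
caches.setdefault(cache_key(trip), [])       # look up (or create) this triplet's cache
```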
6. The method of claim 2, wherein, when the negative sample cache corresponding to each triplet includes the head cache, updating the negative sample cache corresponding to each triplet comprises:
for each triplet, randomly sampling, from the negative sample space of the triplet, N2 negative samples whose head entities are inconsistent with the triplet, N2 being a positive integer;
selecting N1 negative samples from the N1 negative samples already present in the head cache of the triplet and the N2 sampled negative samples; and
updating the head cache of the triplet with the selected N1 negative samples.
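An illustrative sketch of the head-cache update in claims 6 and 7, assuming only head-entity ids are cached. The claim does not fix how the N1 negatives are chosen from the merged set; in line with claim 1, this sketch keeps the N1 with the highest score, and the scoring function is a placeholder.

```python
import random

# Claim 6's head-cache update, sketched: sample N2 corrupted-head negatives,
# merge them with the negatives already cached, and keep N1 of them (here,
# the N1 with the highest model score, so hard negatives are retained).

def update_head_cache(head_cache, triplet, entities, score_fn, n1, n2):
    h, r, t = triplet
    sampled = [e for e in random.sample(entities, n2) if e != h]   # corrupted heads
    merged = list(set(head_cache) | set(sampled))
    merged.sort(key=lambda e: score_fn(e, r, t), reverse=True)
    return merged[:n1]

entities = list(range(50))
toy_score = lambda h, r, t: -abs(h + r - t)                        # TransE-style stand-in
new_cache = update_head_cache([4, 9], (1, 0, 2), entities, toy_score, n1=5, n2=10)
```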
7. The method of claim 6, wherein only the head entities of the negative samples are stored in the head cache;
randomly sampling, from the negative sample space of the triplet, the N2 negative samples whose head entities are inconsistent with the triplet comprises: randomly selecting N2 head entities inconsistent with the head entity of the triplet;
selecting the N1 negative samples from the N1 negative samples already present in the head cache of the triplet and the N2 sampled negative samples comprises: selecting N1 head entities from the N1 head entities in the head cache of the triplet and the N2 randomly selected head entities; and
updating the head cache of the triplet with the selected N1 negative samples comprises: updating the head cache of the triplet with the N1 selected head entities.
8. A training apparatus for graph embedding models, wherein the apparatus comprises:
an acquisition module, configured to acquire a set of triplets of a knowledge graph for training;
a training module, configured to, during each of at least a portion of the iterative training processes: update the negative sample cache corresponding to each triplet and then select, for each triplet, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model; or select, for each triplet, corresponding negative samples from its negative sample cache to participate in training of the graph embedding model and then update the negative sample cache corresponding to each triplet;
wherein the negative sample cache corresponding to each triplet is updated such that negative samples with higher scores are retained in the negative sample cache.
9. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any of claims 1-7.
10. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-7.
CN201811526711.7A 2018-12-13 2018-12-13 Method and device for training graph embedding model, computing equipment and readable medium Pending CN111324776A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811526711.7A CN111324776A (en) 2018-12-13 2018-12-13 Method and device for training graph embedding model, computing equipment and readable medium


Publications (1)

Publication Number Publication Date
CN111324776A true CN111324776A (en) 2020-06-23

Family

ID=71164947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811526711.7A Pending CN111324776A (en) 2018-12-13 2018-12-13 Method and device for training graph embedding model, computing equipment and readable medium

Country Status (1)

Country Link
CN (1) CN111324776A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130064444A1 (en) * 2011-09-12 2013-03-14 Xerox Corporation Document classification using multiple views
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Method for detecting human face and detection means based on multitask concatenated convolutional neutral net
US20180157989A1 (en) * 2016-12-02 2018-06-07 Facebook, Inc. Systems and methods for online distributed embedding services
CN108446769A (en) * 2018-01-23 2018-08-24 深圳市阿西莫夫科技有限公司 Knowledge mapping relation inference method, apparatus, computer equipment and storage medium
CN108830188A (en) * 2018-05-30 2018-11-16 西安理工大学 Vehicle checking method based on deep learning


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085093A (en) * 2020-09-08 2020-12-15 第四范式(北京)技术有限公司 Training method and device of collaborative filtering model, readable medium and system
CN112215604A (en) * 2020-10-15 2021-01-12 支付宝(杭州)信息技术有限公司 Method and device for identifying information of transaction relationship
CN112215604B (en) * 2020-10-15 2022-06-07 支付宝(杭州)信息技术有限公司 Method and device for identifying transaction mutual-party relationship information


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination