CN117891957B - Knowledge graph completion method based on pre-training language model - Google Patents

Knowledge graph completion method based on pre-training language model

Info

Publication number
CN117891957B
CN117891957B (application CN202410289201.1A)
Authority
CN
China
Prior art keywords
entity
training
language model
sample
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410289201.1A
Other languages
Chinese (zh)
Other versions
CN117891957A (en)
Inventor
雷逸舒
王家兵
文贵华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202410289201.1A priority Critical patent/CN117891957B/en
Publication of CN117891957A publication Critical patent/CN117891957A/en
Application granted granted Critical
Publication of CN117891957B publication Critical patent/CN117891957B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a knowledge graph completion method based on a pre-training language model. The file format of an existing knowledge graph is processed, and the mapping and formatting between the numbers and names of entities and relations are established, so that knowledge triplet data suitable for the model is obtained; the pre-training language model is then used to extract embeddings of the entities and relations in the triplets for training and learning. The invention designs a countermeasure learning method based on relative transformation and an entity fine-granularity representation method, which can remarkably improve the learning efficiency of the model and further improve the effect of knowledge graph completion.

Description

Knowledge graph completion method based on pre-training language model
Technical Field
The invention relates to the technical field of knowledge graph completion, in particular to a knowledge graph completion method based on a pre-training language model.
Background
Knowledge graph completion is a challenging field, mainly because of the massive scale of knowledge graph data, which contains a large number of entities and a complex network structure. Modeling a knowledge graph completion model is therefore a complex undertaking.
With the development of deep learning and pre-training language models, a brand-new direction has emerged in knowledge graph completion, namely text-based knowledge graph completion models. Such models mainly use a pre-training language model to model the knowledge graph completion task and achieve better results.
However, most text-based knowledge graph completion work focuses only on modeling the triplet matching task with a language model and ignores the characterization information of the massive entities, so the training information available to the model is insufficiently rich; meanwhile, the model suffers from a 'shortcut' problem in the contrast learning task.
Disclosure of Invention
In view of this, in order to at least partially solve the above technical problems, the present invention provides a knowledge graph completion method based on a pre-training language model that uses countermeasure learning, and aims to introduce a relative transformation method based on an attention mechanism, so as to alleviate the 'shortcut' problem that appears in the model during contrast learning and make the training of the model more stable.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a knowledge graph completion method based on a pre-training language model, comprising:
Acquiring known knowledge graph data, constructing an input sequence,
Constructing a pre-training language model for extracting feature vectors of the input sequence,
Carrying out knowledge graph completion according to the matching score of the feature vector; wherein,
Determining a contrast loss function based on a learnable disturbance factor matrix, and training a pre-training language model according to the contrast loss function; the contrast loss function is:
Wherein N is the number of negative samples in the current batch, τ is a temperature coefficient, γ is a fixed positive constant, β is a hyperparameter, RTAM hrt is the scalar value of a given positive sample in the anti-disturbance factor matrix, RTAM hrt′ is the scalar value of a given negative sample in the anti-disturbance factor matrix, and φ(·) is the triplet evaluation function.
Preferably, the triplet evaluation function employs cosine similarity, i.e., φ(h, r, t) = cos(e hr, e t).
Preferably, the known knowledge-graph data is obtained and then preprocessed, including text normalization and relationship normalization,
The text normalization is to remove the underscore characters present in the text data of the knowledge graph's knowledge base;
The relationship normalization is to check whether relations with the same surface form have been normalized into different forms.
Preferably, the input sequence is constructed as follows:
Seqhr = (h||htext [SEP] r||rtext)
Seqt = (t||ttext)
Wherein h represents a head entity, r represents a relationship, t represents a tail entity, and h text, r text and t text represent a text description of the head entity, a text description of the relationship and a text description of the tail entity, respectively; || represents a concatenation operation, [SEP] represents a special separator, Seq hr represents the input sequence of the head entity-relation, and Seq t represents the input sequence of the tail entity.
Preferably, the pre-trained language model includes BERT hr, BERT t and BERT A;
the step of extracting feature vectors of the input sequence comprises:
The input of Seq hr into BERT hr yields vector e hr, the input of Seq t into BERT t yields vector e t, and the input of Seq t into BERT A yields vector Ae t.
Preferably, a pre-trained BERT model is built, and BERT hr, BERT t and BERT A are initialized with the same weights.
Preferably, the expression of each vector is:
ehr = normalize(mean_pool(BERThr(Seqhr)))
et = normalize(mean_pool(BERTt(Seqt)))
Aet = normalize(mean_pool(BERTA(Seqt)))
wherein mean_pool represents the average pooling operation and normalize represents the normalization operation.
Preferably, the known knowledge-graph data is negatively sampled; to perform feature learning of head entity-relationship pairs and tail entities using a contrast learning paradigm, the negative sampling strategy comprising:
Negative sampling in a batch, and replacing tail entities in a training sample currently input into the model by other tail entities in the batch;
performing negative sampling before batch, and using the head entity-relation feature vector and the tail entity feature vector obtained by the previous round of training as negative samples of the current batch of training samples;
self negative sampling is performed, and the tail entity in the current triplet is replaced by the head entity.
Preferably, the process of obtaining the learnable anti-disturbance factor matrix includes:
calculating the distance between each group of queries (h, r) and all other candidate tail entities (t) to obtain a relative transformation matrix;
inputting the relative transformation matrix into a multi-layer perceptron to obtain attention weight;
Multiplying the relative transformation matrix by the attention weight to obtain an anti-disturbance factor matrix.
Preferably, when the relative transformation matrix is calculated, the distance formula adopted is the cosine distance, where n is the batch size of the current batch training.
Preferably, fine-grained entity characterization features are learned based on the entity characterization countermeasure learning method; the process includes:
Constructing an antagonism learning virtual sample library;
the vector Ae t interacts with the sample library, and the positive and negative samples in the sample library are updated; the update steps are as follows:
where k i denotes the update rule for negative samples in the sample library, k i+ denotes the update rule for positive samples in the sample library, i+ denotes the selection strategy for positive samples, η is the learning rate for updating the positive and negative samples in the repository, τ′ is the temperature coefficient for memory-bank sample updates, p(·) denotes the probability that a sample in the current sample library is a positive sample relative to the current Ae t, and K is the size of the sample library.
Preferably, the pre-training language model is trained by using the loss function determined based on the learnable anti-disturbance factor matrix, L TMT, and the loss function based on the entity characterization countermeasure learning method, L Aler, combined as Loss = LTMT + θ·LAler, where θ is a weight coefficient.
Compared with the prior art, the knowledge graph completion method based on the pre-training language model introduces a relative transformation method based on an attention mechanism and uses the relations among samples to obtain a learnable anti-disturbance factor matrix, thereby alleviating the 'shortcut learning' problem that occurs when training the model with the original InfoNCE loss and improving the training efficiency and accuracy of the knowledge graph completion model.
Another beneficial effect of the invention is that a fine-granularity entity characterization countermeasure learning module is introduced, which further exploits the rich characterization information of individual entities in the knowledge graph and learns fine-grained entity representation features, thereby optimizing the representation space of entities in the knowledge graph and further improving the learning efficiency of the training process and the accuracy and efficiency of knowledge graph completion. The method provided by the invention offers high accuracy and low resource consumption.
A further beneficial effect of the invention is the combination of the attention-based relative transformation method and the Aler module; experiments show that, with fewer computing resources, the model can reach the performance that the original knowledge graph completion model achieves only with substantially more computing resources, thereby improving the computational efficiency of the computing device and bringing practical benefits of carbon reduction and energy saving.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a knowledge graph completion method based on a pre-training language model provided by the invention;
FIG. 2 is a diagram illustrating an exemplary training process of a pre-training language model provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
The embodiment of the application discloses a knowledge graph completion method based on a pre-training language model and countermeasure learning; the knowledge graph completion method disclosed by the application can be applied to the completion of acupuncture knowledge graphs and can also be applied to the completion of knowledge graphs in other fields.
The complement method steps are as shown in fig. 1, and include:
Acquiring known knowledge graph data, constructing an input sequence,
Constructing a pre-training language model for extracting feature vectors of the input sequence,
Carrying out knowledge graph completion according to the matching score of the feature vector;
determining a contrast loss function based on a learnable disturbance factor matrix, and training a pre-training language model according to the contrast loss function; the contrast loss function is:
Wherein N is the number of negative samples in the current batch, τ is a temperature coefficient, γ is a fixed positive constant, β is a hyperparameter, RTAM hrt is the scalar value of a given positive sample in the anti-disturbance factor matrix, RTAM hrt′ is the scalar value of a given negative sample in the anti-disturbance factor matrix, and φ(·) is the triplet evaluation function.
In this embodiment, the triplet evaluation function employs cosine similarity, i.e., φ(h, r, t) = cos(e hr, e t).
The following describes the execution of each step in detail.
Firstly, acquiring known knowledge graph data and constructing an input sequence;
in this embodiment, preprocessing is performed after acquiring known knowledge graph data, where the preprocessing step includes text normalization and relationship normalization; wherein,
Text normalization refers to removing the underscore characters present in the text data of the knowledge graph's knowledge base;
Relationship normalization refers to checking whether relations with the same surface form have been normalized into different forms.
Then constructing an input sequence, wherein in the application, the sequence construction refers to splicing text descriptions of corresponding entities and relations in the knowledge graph with the current entities and relations to form a desired input sequence;
in the application, the knowledge graph is composed of triples and is expressed by (h, r, t), wherein h represents a head entity, r represents a relation, and t represents a tail entity. Corresponding text descriptions, namely h text,rtext and t text, respectively represent the text description of the head entity, the text description of the relation and the text description of the tail entity of the three elements in the knowledge graph.
Assuming that the current knowledge graph triplet relates to (acupoint name, efficacy, acupoint efficacy), the corresponding h is the acupoint name, h text is the text description of the acupoint name, r is efficacy, t is the efficacy of the acupoint, which is generally a short abbreviated text, and t text is the text description of the acupoint's specific efficacy, which is generally a longer text.
As an exemplary embodiment, there are two construction formats of the input sequence, respectively:
Seqhr = (h||htext [SEP] r||rtext)
Seqt = (t||ttext)
Wherein || represents a concatenation operation, [SEP] represents a special separator, Seq hr represents the input sequence of the head entity-relation, and Seq t represents the input sequence of the tail entity.
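For illustration only, a minimal Python sketch of this sequence construction follows; the acupuncture strings are placeholders rather than data from the patent, and the assumption that the tokenizer later treats the literal "[SEP]" as the special separator token is noted in the comments.

```python
# Minimal sketch of the two input-sequence formats described above.
# The example strings are illustrative placeholders, not data from the patent.

def build_sequences(h, h_text, r, r_text, t, t_text):
    # Seq_hr = (h || h_text [SEP] r || r_text); "[SEP]" is assumed to be mapped to the
    # BERT separator token by the downstream tokenizer.
    seq_hr = f"{h} {h_text} [SEP] {r} {r_text}"
    # Seq_t = (t || t_text)
    seq_t = f"{t} {t_text}"
    return seq_hr, seq_t

seq_hr, seq_t = build_sequences(
    h="Yangxi", h_text="an acupoint of the large intestine meridian",
    r="efficacy", r_text="the therapeutic effect of an acupoint",
    t="clears heat", t_text="clears heat and benefits the throat",
)
print(seq_hr)
print(seq_t)
```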
A second step of constructing a pre-training language model for extracting feature vectors of the input sequence,
In the invention, a pre-training language model (BERT-base-uncased) is used for constructing an encoder module of a knowledge-graph triplet, which comprises a head entity-relation encoder BERT hr, a tail entity encoder BERT t and an auxiliary entity encoder BERT A;
Preferably, a pre-trained BERT model is built, and BERT hr, BERT t and BERT A are initialized with the same weights; BERT hr does not share the weights obtained from subsequent training updates with BERT t and BERT A.
In this embodiment, the head entity-relation pair and the tail entity are respectively encoded by using the constructed pre-training language model to obtain respective feature vectors, which specifically includes the following steps:
Inputting the obtained Seq hr into BERT hr and then applying the pooling layer and normalization yields the head entity-relation encoding vector e hr; inputting the obtained Seq t into BERT t and BERT A and then applying the pooling layer and normalization yields the tail entity encoding vector e t and the auxiliary entity encoding vector Ae t, respectively.
Preferably, the encoding vectors obtained in the encoder module are average-pooled and normalized, i.e.,
ehr = normalize(mean_pool(BERThr(Seqhr)))
et = normalize(mean_pool(BERTt(Seqt)))
Aet = normalize(mean_pool(BERTA(Seqt)))
Wherein mean_pool represents the average pooling operation and normalize represents the normalization operation.
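As an illustration of this encoder step, a short PyTorch sketch is given below; it assumes the HuggingFace transformers implementation of bert-base-uncased, and the mask-aware mean pooling is one common way to realize the mean_pool operation (the patent does not fix these implementation details).

```python
# Sketch of extracting e_hr, e_t and Ae_t with three BERT encoders initialized from the
# same pre-trained weights; mean pooling over valid tokens followed by L2 normalization.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert_hr = AutoModel.from_pretrained("bert-base-uncased")  # head entity-relation encoder
bert_t = AutoModel.from_pretrained("bert-base-uncased")   # tail entity encoder
bert_a = AutoModel.from_pretrained("bert-base-uncased")   # auxiliary entity encoder

def encode(model, texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state                # (B, L, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # (B, L, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean_pool over real tokens
    return F.normalize(pooled, dim=-1)                       # normalize to unit length

seq_hr = "Yangxi an acupoint of the large intestine meridian [SEP] efficacy the therapeutic effect"
seq_t = "clears heat clears heat and benefits the throat"
e_hr = encode(bert_hr, [seq_hr])
e_t = encode(bert_t, [seq_t])
ae_t = encode(bert_a, [seq_t])
print(e_hr.shape, e_t.shape, ae_t.shape)  # each (1, 768)
```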
In one embodiment, the pre-training language model is trained using the contrast learning paradigm; as shown in fig. 2, this includes:
1) Negative sampling is carried out on the known knowledge graph data;
Specifically, three negative sampling strategies, namely intra-batch negative sampling, pre-batch negative sampling and self-negative sampling, are adopted to carry out negative sample sampling, and are used for carrying out feature learning of a head entity-relation pair and a tail entity based on a comparison learning paradigm; wherein,
In-batch negative sampling: for the training sample (h, r, t) currently input to the model, taken as the positive sample, other tail entities in the same batch replace t in (h, r, t) to obtain negative samples. A possible in-batch negative sample is (Yangxi, located, at the center of the nipple).
Pre-batch negative sampling: the head entity-relation feature vectors and tail entity feature vectors obtained in the previous training round are used as negative samples for the current batch. A possible pre-batch negative sample is (Ruzhong, located, at the center of the nipple); this is in fact a correct triplet, but because the sequence of (Ruzhong, located) was encoded into e hr and the sequence of (at the center of the nipple) was encoded into e t in different training batches, the pair is still treated as a negative sample.
Self negative sampling: for the current positive sample (h, r, t), the tail entity in the current triplet is replaced with the head entity, yielding the negative sample (h, r, h). A possible self-negative sample is (Yangxi, located, Yangxi).
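A compact sketch of how these three kinds of negatives can populate a score matrix is shown below; the dot-product score on normalized embeddings, the FIFO queue holding pre-batch tails, and the extra batch e_h of head entities encoded in the tail-entity format (for the self-negatives) are assumptions about the implementation, not details stated in the patent.

```python
# Sketch: assembling in-batch, pre-batch and self-negative scores for one batch.
import torch

def build_scores(e_hr, e_t, e_h, prebatch_queue):
    # In-batch: every other tail in the batch is a negative; the diagonal holds positives.
    in_batch = e_hr @ e_t.t()                        # (B, B)
    # Pre-batch: tail embeddings from previous training rounds act as extra negatives.
    pre_batch = e_hr @ prebatch_queue.t()            # (B, Q)
    # Self-negative: score of (h, r) against its own head entity encoded as a tail.
    self_neg = (e_hr * e_h).sum(-1, keepdim=True)    # (B, 1)
    return torch.cat([in_batch, pre_batch, self_neg], dim=1)

B, H, Q = 4, 768, 8
norm = torch.nn.functional.normalize
scores = build_scores(
    norm(torch.randn(B, H), dim=-1),   # e_hr
    norm(torch.randn(B, H), dim=-1),   # e_t
    norm(torch.randn(B, H), dim=-1),   # e_h (heads encoded as tail sequences)
    norm(torch.randn(Q, H), dim=-1),   # pre-batch queue
)
print(scores.shape)  # torch.Size([4, 13])
```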
2) The traditional contrast learning paradigm adopts the InfoNCE loss function as the training loss function.
Existing research shows that the traditional InfoNCE loss function can cause the problem of 'shortcut learning' during model training, so the invention uses an attention-based relative transformation method to alleviate the 'shortcut learning' induced by the InfoNCE loss function.
The method comprises the following specific steps:
Calculating the distance between each group of queries (h, r) and all candidate tail entities (t) in the batch to obtain the relative transformation matrix RTM n*n, whose (i, j)-th element is CD(e hr(i), e t(j)), the distance between the i-th query vector and the j-th candidate tail vector;
Inputting RTM n*n into a multi-layer perceptron to obtain the attention weight α = MLP(RTM n*n);
Multiplying RTM n*n by α to obtain the anti-disturbance factor matrix RTAM n*n = RTM n*n * α, namely the relative transformation matrix based on the attention mechanism;
Wherein CD(·) is the distance function, MLP is the multi-layer perceptron, and n is the batch size of the current batch training; preferably, the distance function adopted when calculating RTM n*n is the cosine distance.
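The following PyTorch sketch mirrors the claim-level formulas (RTM from pairwise distances, α = MLP(RTM), RTAM = RTM · α); the MLP architecture, the softmax applied to the attention weights, and the "1 − cosine similarity" convention for the cosine distance are assumptions that the text does not fix.

```python
# Sketch of the attention-based relative transformation producing the RTAM matrix.
import torch
import torch.nn as nn

class RTAMModule(nn.Module):
    def __init__(self, n):
        super().__init__()
        # Row-wise MLP over the n distances of each query (h, r); width is an assumption.
        self.mlp = nn.Sequential(nn.Linear(n, n), nn.ReLU(), nn.Linear(n, n))

    def forward(self, e_hr, e_t):
        # e_hr, e_t: (n, H), already L2-normalized.
        cos = e_hr @ e_t.t()                            # cosine similarity matrix, (n, n)
        rtm = 1.0 - cos                                 # assumed cosine-distance convention
        alpha = torch.softmax(self.mlp(rtm), dim=-1)    # attention weights alpha = MLP(RTM)
        return rtm * alpha                              # RTAM = RTM * alpha, (n, n)

n, h = 128, 768
module = RTAMModule(n)
norm = torch.nn.functional.normalize
rtam = module(norm(torch.randn(n, h), dim=-1), norm(torch.randn(n, h), dim=-1))
print(rtam.shape)  # torch.Size([128, 128])
```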
Finally, by integrating the attention-based relative transformation method, the contrast loss function determined by the learnable anti-disturbance factor matrix is obtained as follows:
Wherein N is the number of negative samples in the current batch, τ is a temperature coefficient, γ is a fixed positive constant, β is a hyperparameter, RTAM hrt is the scalar value of a given positive sample in the anti-disturbance factor matrix, RTAM hrt′ is the scalar value of a given negative sample in the anti-disturbance factor matrix, and φ(·) is the triplet evaluation function. In this embodiment, the triplet evaluation function employs cosine similarity, i.e., φ(h, r, t) = cos(e hr, e t).
In this embodiment, n=128, γ=0.2, and β=0.5 are set.
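The loss formula itself does not survive in this rendering of the patent. Purely as a hedged reconstruction, one plausible form, assuming the β·RTAM terms enter additively inside the exponents of an additive-margin InfoNCE objective (the exact signs and placement are not recoverable here), is:

```latex
\mathcal{L}_{TMT} = -\log
\frac{\exp\big((\phi(h,r,t) - \gamma + \beta\,\mathrm{RTAM}_{hrt})/\tau\big)}
     {\exp\big((\phi(h,r,t) - \gamma + \beta\,\mathrm{RTAM}_{hrt})/\tau\big)
      + \sum_{i=1}^{N}\exp\big((\phi(h,r,t'_i) + \beta\,\mathrm{RTAM}_{hrt'_i})/\tau\big)}
```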
3) The application further provides an entity characterization fine-granularity learning module based on countermeasure learning (the Aler module), which further optimizes the representation space of entities and yields the loss of entity characterization fine-granularity learning;
Specifically, the process of learning fine-grained entity characterization features based on an entity characterization challenge learning method includes:
Constructing a countermeasure learning virtual sample library, which serves as the sample library of the entity characterization fine-granularity learning module based on countermeasure learning;
The vector Ae t interacts with the sample library to obtain feedback values for updating the positive and negative samples in the library, and the iterative updating keeps the characterization information of the samples in the repository synchronized during training; the update steps are as follows:
where k i denotes the update rule for negative samples in the sample library, k i+ denotes the update rule for positive samples in the sample library, i+ denotes the selection strategy for positive samples, η is the learning rate for updating the positive and negative samples in the repository, τ′ is the temperature coefficient for memory-bank sample updates, p(·) denotes the probability that a sample in the current sample library is a positive sample relative to the current Ae t, and K is the size of the sample library.
In one embodiment, K = 65536 and τ′ = 0.04.
The loss function of the entity characterization fine-granularity learning finally obtained by the application is denoted L Aler.
Further, in order that the entity characterization fine-granularity learning module based on countermeasure learning can exploit the triplet structured information obtained when the other modules are trained, part of the parameters in BERT t are shared with BERT A, so that BERT A has the capability of perceiving the triplet structured information.
Because the BERT A used in Aler obtains part of its parameters from BERT t through sharing, the other relevant modules are implicitly updated by the loss function of the Aler module when the final gradient is back-propagated according to the chain rule;
Specifically, BERT hr is updated only by the triplet matching training module, BERT t is updated by both loss terms, and BERT A is updated through the parameters shared from BERT t and by the Aler loss;
Before each round of loss back-propagation, the parameter sharing BERT t -> BERT A is performed, and after back-propagation, the parameter update driven by L Aler is carried out;
All parameters are shared according to a hyperparameter-controlled proportion, i.e., param_BERTA = gamma * param_BERTA + (1 - gamma) * param_BERTt.
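A minimal sketch of this proportional parameter sharing follows; the sharing ratio value and its name g are placeholders for the gamma hyperparameter above, whose value the text does not fix.

```python
# Sketch of param_BERT_A = g * param_BERT_A + (1 - g) * param_BERT_t, performed before
# each round of loss back-propagation; g stands for the sharing hyperparameter gamma.
import torch

@torch.no_grad()
def share_parameters(bert_t: torch.nn.Module, bert_a: torch.nn.Module, g: float = 0.9):
    # Blend BERT_t's weights into BERT_A so that BERT_A can perceive triplet structure.
    for p_t, p_a in zip(bert_t.parameters(), bert_a.parameters()):
        p_a.data.mul_(g).add_(p_t.data, alpha=1.0 - g)
```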
According to the application, the Aler module is introduced, so that the characterization information of rich entity individuals in the knowledge graph is further utilized, and the entity representation characteristics of fine granularity are learned, thereby optimizing the representation space of the entities in the knowledge graph, and further improving the accuracy and efficiency of knowledge graph completion.
In a preferred embodiment, the two loss values are weighted and summed to obtain the final loss function for model training, Loss = LTMT + θ·LAler, and this loss function is used to train the pre-training language model.
Where θ represents a weight coefficient; specifically, θ = 1.4 is selected as the best embodiment.
Thirdly, carrying out knowledge graph completion according to the matching score of the feature vector;
The matching score of the feature vectors obtained through the normalization operation can be computed in simplified form. The matching score is originally the cosine similarity between vectors,
score(e x, e y) = (e x · e y) / (‖e x‖ · ‖e y‖),
Wherein e x and e y represent two embedding vectors, and ‖e x‖ and ‖e y‖ represent the modulo lengths of the two embedding vectors, respectively. Since the vectors obtained from the encoders are processed via normalize, the formula for calculating the matching score can be simplified to the dot product between vectors:
score(e x, e y) = e x · e y.
That is, given a query (h, r), the answer t predicted by the model can be expressed as
t* = argmax_{t ∈ ɛ} (e hr · e t),
wherein ɛ is the candidate entity corpus.
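As an illustration of this inference step, a short sketch follows; precomputing and stacking all candidate tail-entity embeddings into one matrix is an implementation assumption.

```python
# Sketch of inference: with normalized embeddings, the matching score reduces to a dot
# product, and the predicted tail is the argmax over the candidate entity set.
import torch

def predict_tail(e_hr, candidate_e_t):
    # e_hr: (H,) query embedding; candidate_e_t: (|E|, H) embeddings of all candidates.
    scores = candidate_e_t @ e_hr          # dot product equals cosine, since both normalized
    return torch.argmax(scores).item()     # index of the predicted tail entity

H, num_entities = 768, 10000
norm = torch.nn.functional.normalize
e_hr = norm(torch.randn(H), dim=-1)
candidates = norm(torch.randn(num_entities, H), dim=-1)
print(predict_tail(e_hr, candidates))
```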
Example two
The present embodiment provides a computer readable storage medium, where the storage medium may be a storage medium such as a ROM, a RAM, a magnetic disk, or an optical disk, and the storage medium stores one or more programs, where the programs, when executed by a processor, implement the knowledge graph complement method based on the pre-training language model disclosed in the first embodiment.
Example III
The present embodiment provides a computing device, which may be a desktop computer, a notebook computer, a smart phone, a PDA handheld terminal, a tablet computer, or other terminal devices with display functions, where the computing device includes a processor and a memory, where the memory stores one or more programs, and when the processor executes the programs stored in the memory, the knowledge graph completing method based on the pre-training language model in the first embodiment is implemented.
In the present specification, each embodiment is described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for identical and similar parts among the embodiments, reference may be made to one another. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A knowledge graph completion method based on a pre-training language model, comprising:
Acquiring known knowledge graph data, wherein the knowledge graph consists of triples and is expressed by (h, r, t), wherein h represents a head entity, r represents a relation, and t represents a tail entity; and the input sequence is constructed in the following format,
Seqhr=(h||htext[SEP]r||rtext)
Seqt=(t||ttext)
Wherein h text, r text and t text represent a text description of the head entity, a text description of the relationship and a text description of the tail entity, respectively; || represents a concatenation operation, [SEP] represents a special separator, Seq hr represents the input sequence of the head entity-relation, and Seq t represents the input sequence of the tail entity;
Constructing a pre-training language model, and constructing an encoder module of a knowledge-graph triplet by using the pre-training language model, wherein the encoder module comprises a head entity-relation encoder BERT hr, a tail entity encoder BERT t and an auxiliary entity encoder BERT A; the method for extracting the feature vector of the input sequence comprises the following steps:
Inputting Seq hr into BERT hr to obtain vector e hr, inputting Seq t into BERT t to obtain vector e t, and inputting Seq t into BERT A to obtain vector Ae t;
Carrying out knowledge graph completion according to the matching score of the feature vector; wherein,
Determining a contrast loss function based on a learnable disturbance factor matrix, and training a pre-training language model according to the contrast loss function; the contrast loss function is:
Wherein N is the number of negative samples in the current batch, τ is a temperature coefficient, γ is a fixed positive constant, β is a hyperparameter, RTAM hrt is the scalar value of a given positive sample in the anti-disturbance factor matrix, RTAM hrt′ denotes the scalar value of a given negative sample in the anti-disturbance factor matrix, and φ(·) represents the triplet evaluation function;
the learning anti-disturbance factor matrix obtaining process comprises the following steps:
Calculating the distance between each group of queries (h, r) and all other candidate tail entities (t) to obtain a relative transformation matrix; when the relative transformation matrix is calculated, the distance formula adopted is the cosine distance,
Wherein RTM n*n is the relative transformation matrix, n is the batch size of the current batch training, and CD(·) is the distance function,
Inputting the relative transformation matrix into a multi-layer perceptron to obtain attention weight;
α=MLP(RTMn*n)
alpha is attention weight, and MLP is multi-layer perceptron;
multiplying the relative transformation matrix with the attention weight to obtain an anti-disturbance factor matrix;
RTAMn*n=RTMn*n*α.
2. The knowledge graph completion method based on a pre-training language model according to claim 1, wherein preprocessing is performed after the known knowledge graph data is obtained, including text normalization and relationship normalization,
The text normalization is to remove the underscore characters present in the text data of the knowledge graph's knowledge base;
The relationship normalization is to check whether relations with the same surface form have been normalized into different forms.
3. The knowledge-graph completion method based on a pre-training language model according to claim 1, wherein known knowledge-graph data is negatively sampled; to perform feature learning of head entity-relationship pairs and tail entities using a contrast learning paradigm, the negative sampling strategy includes:
Negative sampling in a batch, and replacing tail entities in a training sample currently input into the model by other tail entities in the batch;
performing negative sampling before batch, and using the head entity-relation feature vector and the tail entity feature vector obtained by the previous round of training as negative samples of the current batch of training samples;
self negative sampling is performed, and the tail entity in the current triplet is replaced by the head entity.
4. The knowledge-graph completion method based on a pre-training language model according to claim 1, wherein the learning of fine-grained entity characterization features based on an entity characterization countermeasure learning method comprises:
Constructing an antagonism learning virtual sample library;
the vector Ae t interacts with the sample library, and the positive and negative samples in the sample library are updated as follows:
Where K i represents the update policy of the negative sample in the sample library, K i + represents the update policy of the positive sample in the sample library, i + represents the selection policy of the positive sample, η is the learning rate of the positive and negative sample update in the memory library, τ' is the temperature coefficient of the memory library sample update, p (·) represents the probability that the sample in the current sample library is the positive sample relative to the current Ae t, and K is the size of the sample library.
5. The knowledge-graph completion method based on a pre-training language model according to claim 4, wherein the pre-training language model is trained by using the loss function determined based on the learnable anti-disturbance factor matrix and the loss function based on the entity characterization countermeasure learning method, comprising
Loss=LTMT+θ·LAler
Where θ represents a weight coefficient, L TMT represents the loss function determined based on the learnable anti-disturbance factor matrix, and L Aler represents the loss function based on the entity characterization countermeasure learning method.
6. The knowledge-graph completion method based on a pre-training language model according to claim 5, wherein the loss function of the countermeasure learning method based on entity characterization is expressed as:
CN202410289201.1A 2024-03-14 2024-03-14 Knowledge graph completion method based on pre-training language model Active CN117891957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410289201.1A CN117891957B (en) 2024-03-14 2024-03-14 Knowledge graph completion method based on pre-training language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410289201.1A CN117891957B (en) 2024-03-14 2024-03-14 Knowledge graph completion method based on pre-training language model

Publications (2)

Publication Number Publication Date
CN117891957A CN117891957A (en) 2024-04-16
CN117891957B (en) 2024-05-07

Family

ID=90645997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410289201.1A Active CN117891957B (en) 2024-03-14 2024-03-14 Knowledge graph completion method based on pre-training language model

Country Status (1)

Country Link
CN (1) CN117891957B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022135120A1 (en) * 2020-12-21 2022-06-30 浙江大学 Adaptive knowledge graph representation learning method combining graph structure with text information
CN116383401A (en) * 2023-03-09 2023-07-04 大连理工大学 Knowledge graph completion method integrating text description and graph convolution mechanism
CN117035080A (en) * 2023-08-18 2023-11-10 西南交通大学 Knowledge graph completion method and system based on triplet global information interaction
CN117273134A (en) * 2023-09-28 2023-12-22 东南大学 Zero-sample knowledge graph completion method based on pre-training language model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11593560B2 (en) * 2020-10-21 2023-02-28 Beijing Wodong Tianjun Information Technology Co., Ltd. System and method for relation extraction with adaptive thresholding and localized context pooling

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022135120A1 (en) * 2020-12-21 2022-06-30 浙江大学 Adaptive knowledge graph representation learning method combining graph structure with text information
CN116383401A (en) * 2023-03-09 2023-07-04 大连理工大学 Knowledge graph completion method integrating text description and graph convolution mechanism
CN117035080A (en) * 2023-08-18 2023-11-10 西南交通大学 Knowledge graph completion method and system based on triplet global information interaction
CN117273134A (en) * 2023-09-28 2023-12-22 东南大学 Zero-sample knowledge graph completion method based on pre-training language model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Knowledge Reasoning Research Based on Neural Networks; 张仲伟; 曹雷; 陈希亮; 寇大磊; 宋天挺; Computer Engineering and Applications; 2019-03-25 (No. 12); full text *

Also Published As

Publication number Publication date
CN117891957A (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN111737476B (en) Text processing method and device, computer readable storage medium and electronic equipment
CN111755078A (en) Drug molecule attribute determination method, device and storage medium
CN113837370B (en) Method and apparatus for training a model based on contrast learning
WO2021238333A1 (en) Text processing network, neural network training method, and related device
CN110990596B (en) Multi-mode hash retrieval method and system based on self-adaptive quantization
CN111753190A (en) Meta learning-based unsupervised cross-modal Hash retrieval method
CN114491039B (en) Primitive learning few-sample text classification method based on gradient improvement
CN115687571B (en) Depth unsupervised cross-modal retrieval method based on modal fusion reconstruction hash
CN115080749B (en) Weak supervision text classification method, system and device based on self-supervision training
CN116932722A (en) Cross-modal data fusion-based medical visual question-answering method and system
CN115482395A (en) Model training method, image classification method, device, electronic equipment and medium
CN115827954A (en) Dynamically weighted cross-modal fusion network retrieval method, system and electronic equipment
CN116383422A (en) Non-supervision cross-modal hash retrieval method based on anchor points
CN115329120A (en) Weak label Hash image retrieval framework with knowledge graph embedded attention mechanism
CN113420552B (en) Biomedical multi-event extraction method based on reinforcement learning
CN113869005A (en) Pre-training model method and system based on sentence similarity
CN112182144B (en) Search term normalization method, computing device, and computer-readable storage medium
CN112132269B (en) Model processing method, device, equipment and storage medium
CN116720519B (en) Seedling medicine named entity identification method
CN117891957B (en) Knowledge graph completion method based on pre-training language model
Qin et al. Deep top similarity hashing with class-wise loss for multi-label image retrieval
CN116796749A (en) Medical named entity recognition robustness enhancement method and system
CN113204679B (en) Code query model generation method and computer equipment
CN115273110A (en) Text recognition model deployment method, device, equipment and storage medium based on TensorRT
CN117371447A (en) Named entity recognition model training method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant