CN116431831A

CN116431831A - Supervised relation extraction method based on label contrast learning

Info

Publication number: CN116431831A
Application number: CN202310410923.3A
Authority: CN
Inventors: 赵亚慧; 王苑儒; 金国哲; 崔荣一; 刘帆; 任一平; 徐培焱; 李永恒; 孟嘉; 王乐; 孙烨
Original assignee: Yanbian University
Current assignee: Yanbian University
Priority date: 2023-04-18
Filing date: 2023-04-18
Publication date: 2023-07-14
Anticipated expiration: 2043-04-18
Also published as: CN116431831B

Abstract

The invention discloses a supervised relation extraction method based on label contrast learning, comprising: obtaining the sentences in a sample set on which relation extraction is to be performed, adding special symbols, and encoding the processed sentences through an encoding layer to obtain vector representations that include the special symbols; using the entities in each sentence as markers, selecting the representation of the special symbol preceding each entity and concatenating these representations to obtain a first relation vector representation; passing the first relation vector representation through a fully connected layer to obtain a second relation vector representation; constructing positive and negative examples based on the second relation vector representation; and determining the loss function from the positive and negative examples and carrying out contrast training, yielding an encoder that identifies relation representations more accurately. The invention provides a supervised contrast learning model that constructs label-based positive and negative examples from both the global and the local perspective; it takes into account the correctness of the selected positive and negative examples and also ensures that error-prone examples are trained on, thereby obtaining richer and more accurate relation representations.

Description

Supervised relation extraction method based on label contrast learning

Technical Field

The invention belongs to the field of natural language processing within computer intelligent information processing, and particularly relates to a supervised relation extraction method based on label contrast learning.

Background

The rapid growth of the internet has led to an explosion of information, and making efficient use of this information is a major task of information extraction (IE) technology. Relation extraction (RE) is a main task in the field of information extraction; it aims to identify the semantic relations between target entities in unstructured text and to apply them to downstream tasks such as event extraction, machine translation, knowledge graphs and sentence matching. The problem to be solved in supervised relation extraction is how to use a limited amount of supervised data more efficiently. The strategy commonly adopted at present is to pre-train on large unsupervised or semi-supervised data sets and then fine-tune on supervised data sets. This training approach fails to exploit fully the label information in the data set, and the training objectives of the pre-trained model and of the downstream task are disjoint.

For the supervised relation extraction task, the head and tail entities in each sentence have been annotated and the relation label of the sentence is known. The task can therefore be seen as a multi-class classification problem over annotated sentences, and the key question is how to obtain a more correct and richer relation representation from each sentence for relation classification.

Zhang et al. use an RNN for feature extraction to accomplish the relation extraction task: a Bi-LSTM serves as the feature extractor for text features and is combined with an attention mechanism that captures the important features in the text. The information captured by relation representations learned in this way is limited, whereas pre-trained language models trained on large-scale data open up more possibilities for relation extraction. For example, Wu et al. use the pre-trained language model BERT for feature extraction, but they simply concatenate the representations of the special symbol [CLS], the head entity and the tail entity as input and train through a fully connected layer followed by softmax; this training mode cannot fully mine the relation information contained in the sentence. Chen et al. construct the positive and negative examples for contrast learning by combining the bag level and the sentence level: the original sentence is augmented by replacing or inserting words with low TF-IDF scores, the augmented sentence and the original sentence form a positive pair, and the representations of other, randomly selected bags form negative pairs with the original sentence. This approach requires relation representations at two levels, sentence level and bag level, which are complex to construct and difficult to make interact effectively, and the bag-level representation may lose much of the information in the sentence-level representations.

The core idea of contrast learning is to learn the similarities and differences between samples. A common contrast learning pipeline is as follows: first, data augmentation is applied to the original sentence to obtain an augmented sentence; second, the original sentence and the augmented sentence are fed into the model; finally, the original sentence and the augmented sentence are taken as a positive pair, and the original sentence and other sentences as negative pairs, for contrast learning training.

Based on this structure, a number of classical contrast learning models have emerged. For example, SimCLR improves contrast learning by using larger batch sizes and data augmentation; MoCo builds a dynamic dictionary for contrast learning, which increases the number of positive and negative examples taking part in each training step without increasing the burden on the model, and thereby obtains a better-trained encoder. Contrast learning is also widely applied to the relation extraction task. For example, HiCLRE combines data augmentation with multi-granularity representations at three levels (bag level, sentence level and entity level); it performs contrast learning training at each of the three levels and lets the levels interact, so that richer and more accurate representations integrating different granularities are obtained. HiURE obtains two augmented sentences through Random Span data augmentation; on this basis, hierarchical clustering yields the representations that belong to the same category as the training sentence and those that belong to different categories, and these form positive and negative pairs with the training sentence for contrast learning training, giving a more accurate relation representation.

These models all adopt the common way of constructing contrast learning examples, namely building positive and negative examples through data augmentation. To ensure that the augmented sentence and the original sentence belong to the same relation, the two are kept very close in sentence representation. As a result, the range of relation expressions learned in this way is not wide enough, and other sentences that belong to the same relation as the training sample are easily, and wrongly, trained as negative examples. For this reason, when Khosla et al. apply contrast learning to supervised data, positive and negative examples are selected from the same batch according to the labels. However, examples that are prone to misclassification, such as positive examples with low similarity to the training sample and negative examples with high similarity to it, are still difficult to train on.

Disclosure of Invention

The invention aims to provide a supervised relation extraction method based on label contrast learning, so as to solve the problems in the prior art.

To achieve the above object, the present invention provides a supervised relation extraction method based on label contrast learning, including:

obtaining the sentences in a sample set on which relation extraction is to be performed, adding special symbols, and encoding the processed sentences through an encoding layer to obtain vectors that include the special symbols;

using the entities in each sentence as markers, selecting from these vectors the representation of the special symbol preceding each entity and concatenating the selected representations to obtain a first relation vector representation; passing the first relation vector representation through a fully connected layer to obtain a second relation vector representation;

constructing positive and negative examples based on the second relation vector representation;

and determining the loss function from the positive and negative examples, carrying out contrast training to obtain an encoder that identifies relation representations, and performing relation extraction with the encoder.
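The second step above (selecting the marker representations, concatenating them, and applying a fully connected layer) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the toy encoder output, the marker positions, the output size `out_dim` and the absence of an activation function are all hypothetical choices made here.

```python
import random

def first_relation_vector(token_vectors, e1_marker_pos, e2_marker_pos):
    """Concatenate the encoder outputs at the two entity-start marker positions."""
    return list(token_vectors[e1_marker_pos]) + list(token_vectors[e2_marker_pos])

def fully_connected(vector, weights, bias):
    """Plain dense layer: one output per row of `weights` (no activation assumed)."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

# Toy encoder output: 6 tokens, hidden size 4 (values are made up).
rng = random.Random(0)
hidden = 4
token_vectors = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(6)]

# Hypothetical positions of the two entity-start markers in the token sequence.
h_first = first_relation_vector(token_vectors, e1_marker_pos=1, e2_marker_pos=4)

out_dim = 3  # hypothetical size of the second relation vector
weights = [[rng.uniform(-1, 1) for _ in range(2 * hidden)] for _ in range(out_dim)]
bias = [0.0] * out_dim
h_second = fully_connected(h_first, weights, bias)

print(len(h_first), len(h_second))
```

The first relation vector has twice the encoder's hidden size because two marker vectors are concatenated; the fully connected layer then maps it to whatever size the second relation vector is chosen to have.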

Optionally, the process of constructing the positive and negative examples includes: calculating the similarity between all samples, constructing a global candidate dictionary of positive and negative examples from the global perspective according to the similarity, and constructing local positive and negative examples according to the sample labels in the batch.

Optionally, the process of constructing the global candidate dictionary of positive and negative examples further includes: calculating the cosine similarity between the second relation vector representation and the relation vector representations of the other samples; sorting the other samples that belong to the same relation as the sentence on which relation extraction is to be performed by similarity from low to high to obtain positive samples; sorting the other samples that belong to different relations by similarity from high to low to obtain negative samples; and thereby constructing global positive and negative examples from the global perspective.

Optionally, the first loss function in the contrast training process is L_LabelsCL:

where Total(i) is the total set of samples for training sample i, N_{y_i} is the total number of samples in the batch whose relation label is y_i, Φ(·) denotes the output representation of a sentence after model encoding, τ > 0 is an adjustable scalar temperature parameter, and N is the total number of samples in the batch.

Optionally, the contrast training process includes: training with the contrast learning loss function on the positive and negative examples, and training with the cross-entropy loss function on the second relation vector representation after it has been processed by a multi-class classifier.

Optionally, the second loss function is expressed as follows:

where y_{i,c} denotes the true relation label of the i-th sentence, and the model output term denotes the predicted probability that the i-th sentence belongs to the c-th relation.

Optionally, the total loss function is:

L_total = (1 - λ) L_CE + λ L_LabelsCL

where λ is a scalar weighting hyperparameter.

Optionally, adding the special symbols includes: first adding the special symbols [CLS] and [SEP] before and after the sentence respectively, and then adding special symbols at both ends of each entity in the sentence.
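As an illustration of the symbol-adding step, the sketch below wraps two entity spans in marker symbols and then adds [CLS] and [SEP]. It assumes whitespace-tokenised input and two non-overlapping spans given as (start, end-exclusive) token indices; the function name `add_special_symbols` and the example sentence are hypothetical.

```python
def add_special_symbols(tokens, e1_span, e2_span):
    """Wrap two non-overlapping entity spans in marker symbols and add [CLS]/[SEP].

    Spans are (start, end) token indices with `end` exclusive.
    """
    starts = {e1_span[0]: "<e1_start>", e2_span[0]: "<e2_start>"}
    ends = {e1_span[1]: "<e1_end>", e2_span[1]: "<e2_end>"}
    out = ["[CLS]"]
    for i in range(len(tokens) + 1):
        if i in ends:            # close an entity that ends just before token i
            out.append(ends[i])
        if i in starts:          # open an entity that starts at token i
            out.append(starts[i])
        if i < len(tokens):
            out.append(tokens[i])
    out.append("[SEP]")
    return out

# Made-up example sentence with entities "engine" and "car".
tokens = "the engine of the car failed".split()
marked = add_special_symbols(tokens, e1_span=(1, 2), e2_span=(4, 5))
print(" ".join(marked))
# [CLS] the <e1_start> engine <e1_end> of the <e2_start> car <e2_end> failed [SEP]
```

In a real pipeline the marker strings would also have to be registered with the tokenizer so that each is kept as a single token; that step depends on the tokenizer used and is left out here.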

The invention has the technical effects that:

the invention directly uses the labels of the supervised data to construct the positive and negative examples for contrast learning from the global and the local perspective, which greatly reduces training time and cost; the invention provides a supervised contrast learning model that constructs label-based positive and negative examples from both the global and the local perspective, which takes into account the correctness of the selected positive and negative examples and also ensures that error-prone examples are trained on, thereby obtaining richer and more accurate relation representations.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:

FIG. 1 is a flow chart of a method in an embodiment of the invention;

FIG. 2 shows the F1-micro values corresponding to different values of k^- in an embodiment of the invention;

FIG. 3 shows F1-micro values corresponding to different batch_size values in an embodiment of the present invention;

FIG. 4 is a t-SNE distribution plot of the sample representations before training in an embodiment of the present invention;

FIG. 5 is a t-SNE distribution plot of the sample representations after training in an embodiment of the present invention.

Detailed Description

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

Example 1

As shown in fig. 1-5, the present embodiment provides a supervised relationship extraction method based on label contrast learning, which includes:

Step one: the special symbols [CLS] and [SEP] are added before and after the sentence, and the special symbols <e1_start>, <e1_end>, <e2_start> and <e2_end> are added before and after the two entities in the sentence; the processed sentence is passed through the embedding layer to obtain the sentence vector representation;

Step two: the sentence is passed through the BERT encoding layer, the vector representations of the special symbols preceding the two entities are taken as h_e1start and h_e2start, and these two special-symbol representations are concatenated as the initial relation vector representation;

Step three: a denser relation representation vector h' is obtained through the fully connected layer; the cosine similarity between h' and the relation representations of the other samples is computed; finally, the other samples that belong to the same relation as the training sample are sorted by similarity from low to high, the other samples that belong to different relations are sorted by similarity from high to low, and two candidate dictionaries, one for positive and one for negative examples, are created from the global perspective;

Step four: training is performed with the contrast learning loss function and the cross-entropy loss function, so that a more accurate relation representation is obtained.
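The candidate dictionaries of step three can be sketched as follows. This is a minimal, brute-force illustration that assumes all relation vectors h' fit in memory; the function names, the toy vectors and the labels "A" and "B" are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def build_global_candidates(vectors, labels):
    """For every sample i, return candidate index lists (positives, negatives).

    Positives: other samples with the same label, similarity low -> high.
    Negatives: samples with a different label, similarity high -> low.
    """
    pos, neg = {}, {}
    for i, (h_i, y_i) in enumerate(zip(vectors, labels)):
        sims = [(cosine(h_i, h_j), j) for j, h_j in enumerate(vectors) if j != i]
        same = [(s, j) for s, j in sims if labels[j] == y_i]
        diff = [(s, j) for s, j in sims if labels[j] != y_i]
        pos[i] = [j for _, j in sorted(same)]
        neg[i] = [j for _, j in sorted(diff, reverse=True)]
    return pos, neg

# Toy relation vectors h' and relation labels (made up for illustration).
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8], [1.0, 0.1]]
labels = ["A", "A", "B", "B", "A"]
pos, neg = build_global_candidates(vectors, labels)

print(pos[0], neg[0])  # [1, 4] [3, 2]
```

For sample 0, the hardest positive is `pos[0][0]` (same relation, lowest similarity) and the hardest negatives are `neg[0][:k]` for a chosen number k of global negatives.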

In existing supervised contrast learning, positive and negative examples are usually constructed only from the local perspective: the samples in the batch that belong to the same relation as the training sample are selected as positive examples, and all other samples except the training sample are selected as negative examples. Specifically, a batch contains N samples {x_i, y_i}, i = 1, ..., N, and N_{y_i} denotes the total number of samples in the batch whose relation label is y_i. For a training sample x_i, the samples x_j in the batch that belong to the same relation as x_i (i.e. y_i = y_j), including x_i itself, number N_{y_i} in total. The training sample forms a positive pair with each other sample x_j of the same relation, so there are N_{y_i} - 1 positive pairs. The negative pairs are formed by the training sample x_i with each of the other samples x_k in the batch, giving N - 1 pairs. Φ(·) denotes the output representation of a sentence obtained by model encoding, and τ > 0 is an adjustable scalar temperature parameter. The contrast learning training objective L_base-CL built on this idea is as follows:
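A form consistent with the definitions above, assuming the supervised contrastive loss of Khosla et al. cited in the Background as the starting point and the dot product as the similarity measure (both assumptions, not confirmed by this text), would be:

```latex
L_{base\text{-}CL} = \sum_{i=1}^{N} \frac{-1}{N_{y_i}-1}
  \sum_{\substack{j=1 \\ j \neq i}}^{N} \mathbb{1}_{y_i = y_j}
  \log \frac{\exp\left(\Phi(x_i)\cdot\Phi(x_j)/\tau\right)}
            {\sum_{k=1,\, k \neq i}^{N} \exp\left(\Phi(x_i)\cdot\Phi(x_k)/\tau\right)}
```

In this form the inner sum runs over the N_{y_i} - 1 positive pairs and the denominator over the N - 1 other samples in the batch. When a training sample has no other sample of the same relation in its batch (N_{y_i} = 1), the term is undefined.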

Because the training samples in a batch are selected at random and the selection is affected by the batch-size hyperparameter, constructing positive and negative examples from the local perspective alone is not stable: a training sample may find no positive example of the same relation in its batch, in which case the contrast learning loss cannot be trained for it; and samples that are easily misclassified (that is, positive examples with low similarity to the training sample and negative examples with high similarity to it) are unlikely to fall in the same batch as the training sample and be selected for contrast learning. The invention therefore adds the construction of positive and negative examples from the global perspective on top of the conventional supervised construction. The resulting contrast learning model, based on both the global and the local perspective, preserves the randomness of positive and negative examples from the local perspective, ensures from the global perspective that every training sample takes part in contrast learning training, and deliberately selects easily misclassified samples to take part in that training.

Specifically, the sample that belongs to the same relation as the training sample and has the lowest similarity to it is added to the positive examples, so the number of positive pairs rises from N_{y_i} - 1 to N_{y_i}. More importantly, this design ensures that every training sample in the batch, regardless of batch size and sampling randomness, has a corresponding positive example and can therefore take part in training the contrast learning loss function. For the negative examples, according to the setting of the hyperparameter k^-, the top k^- samples that belong to relations different from that of the training sample and have the highest similarity to it are added. The global positive example is also placed among the samples the training sample is compared against, so that the model learns to separate the positive example from the other samples and thus learns correct positive and negative examples; with the global positive example included the number of such pairs is N, and the k^- global negative examples raise it to N + k^-. Adding positive and negative examples from the global perspective means that the loss function is no longer restricted to a single batch; at the same time, easily misclassified samples are deliberately selected as positive and negative examples for training the contrast learning loss function. From the global perspective, the information in correct samples can be learned in a more targeted way and the erroneous information contained in the existing relation representations can be corrected, so that the model both widens the coverage of each relation representation and safeguards its correctness. The final contrast learning loss function is shown below:
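A form consistent with the surrounding definitions would be the following. It assumes the Khosla-style supervised contrastive loss as the base and the dot product as the similarity measure. P(i) is notation assumed here for the positive set of x_i (the in-batch samples with label y_i other than x_i, plus the global positive example), so that P(i) contains N_{y_i} elements.

```latex
L_{LabelsCL} = \sum_{i=1}^{N} \frac{-1}{N_{y_i}}
  \sum_{x_j \in P(i)}
  \log \frac{\exp\left(\Phi(x_i)\cdot\Phi(x_j)/\tau\right)}
            {\sum_{x_k \in Total(i)} \exp\left(\Phi(x_i)\cdot\Phi(x_k)/\tau\right)}
```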

where Total(i) comprises three parts: 1) the global positive sample; 2) the top k^- global negative samples, which belong to different relations and have the highest similarity; 3) the other samples in the batch apart from the training sample, x_1, ..., x_{i-1}, x_{i+1}, ..., x_N. Thus, Total(i) contains k^- + N samples in total.

Furthermore, to inherit the strong language understanding capability of BERT, a cross-entropy loss function is also added to the model to help it learn the correct relation representation. This training objective is denoted L_CE, where y_{i,c} is the true relation label of the i-th sentence and the model output term is the predicted probability that the i-th sentence belongs to the c-th relation. The sentence is passed through the pre-trained BERT model and the fully connected layer to obtain the relation representation vector h', which is then passed through the multi-class Softmax classifier to obtain this predicted probability.
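In the standard multi-class form, and writing the predicted probability as \hat{y}_{i,c} and the number of relation classes as C (both symbols are assumptions made here), the cross-entropy objective reads as follows; whether the source additionally averages over the batch is not stated:

```latex
L_{CE} = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \, \log \hat{y}_{i,c}
```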

In LabelsCL, the training objective consists of two parts: the cross-entropy loss and the contrast learning loss. λ is a scalar weighting hyperparameter that adjusts the weights of the two training objectives. The total loss function L_total is a weighted average of the two loss functions:

L_total = (1 - λ) L_CE + λ L_LabelsCL
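The weighted combination can be illustrated with a small sketch. The softmax and cross-entropy helpers use the standard definitions; the logits, the loss values and the value of λ are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, true_class):
    """Cross-entropy of one sample given classifier logits and the true class index."""
    return -math.log(softmax(logits)[true_class])

def total_loss(l_ce, l_labels_cl, lam):
    """L_total = (1 - lam) * L_CE + lam * L_LabelsCL."""
    return (1 - lam) * l_ce + lam * l_labels_cl

l_ce = cross_entropy([2.0, 0.5, -1.0], true_class=0)  # toy logits for 3 relations
print(total_loss(2.0, 4.0, lam=0.25))  # 2.5
```

With λ = 0 only the cross-entropy term is trained, and with λ = 1 only the contrast learning term.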

The experiments were carried out on an RTX 5000 graphics card with 16 GB of video memory. The operating system is Ubuntu 20.04, the development language is Python 3.7, and the deep learning framework is PyTorch 1.8. The specific parameter settings are shown in Table 1.

TABLE 1

The invention uses the English data set SemEval-2010 Task 8 as the data set for supervised relation extraction. The data set was downloaded from OpenNRE, and its details are shown in Table 2. The data set has 9+1 relations. Nine of them are directional, for example "Component-Whole(e1,e2)" and "Component-Whole(e2,e1)": when the relation is the same but the order of the head and tail entities is swapped, the two are treated as different relation types. The special relation "Other" has no direction, so its type does not change with the order of the entities.

TABLE 2

In addition, to simulate situations in which data is scarcer, the training set is further divided by proportion, using 1%, 10% and 100% of the training data; the specific numbers of samples are shown in Table 3.
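One simple way to draw such subsets is seeded random sampling, sketched below. The text does not state how the subsets were actually drawn (for example, whether sampling was stratified by relation), so the procedure, the function name `subsample` and the toy training set are assumptions.

```python
import random

def subsample(train_set, fraction, seed):
    """Return a reproducible random subset holding `fraction` of the training set."""
    rng = random.Random(seed)
    k = max(1, round(len(train_set) * fraction))
    return rng.sample(train_set, k)

toy_train = list(range(1000))  # stand-in for the real training samples
subsets = {f: subsample(toy_train, f, seed=42) for f in (0.01, 0.1, 1.0)}
print({f: len(s) for f, s in subsets.items()})  # {0.01: 10, 0.1: 100, 1.0: 1000}
```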

TABLE 3

The invention uses the P, R and F values as evaluation indices to analyse and evaluate the experimental results, where P is precision, R is recall and F is the F1 value. The F1 value is the harmonic mean of P and R, computed with micro-averaging, and reflects the overall performance of the model; the formula is F1-micro = 2 × P × R / (P + R).
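A minimal sketch of a micro-averaged P/R/F1 computation follows. Whether the "Other" relation is excluded from the counts is not stated in the text; the optional `ignore_label` parameter is an assumption added here so that both conventions can be tried, and the gold and predicted labels are made up.

```python
def micro_prf(gold, pred, ignore_label=None):
    """Micro-averaged precision, recall and F1 over single-label predictions.

    If `ignore_label` is given, predictions or gold labels equal to it do not
    count as positives (an assumed convention, not taken from the source).
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != ignore_label)
    pred_pos = sum(1 for p in pred if p != ignore_label)
    gold_pos = sum(1 for g in gold if g != ignore_label)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["A", "B", "Other", "A"]
pred = ["A", "Other", "A", "A"]
p, r, f1 = micro_prf(gold, pred, ignore_label="Other")
print(p, r, f1)
```

Without an ignored label, micro-averaged precision, recall and F1 all coincide with accuracy for single-label classification, which is why the treatment of "Other" matters for the reported numbers.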

The model of the invention, LabelsCL, was compared with MTB, CP, BERT, RoBERTa, ERICA_BERT, ERICA_RoBERTa and FineCL. For fairness, the invention follows the same strategy as ERICA and FineCL: the final result is the average of 5 runs with identical seed settings, the seeds being 42, 43, 44, 45 and 46. When the proportion of training data is 10% and 100%, the model outperforms the other models, with average F1 values of 81.7% and 88.9% respectively; with 1% of the training data, however, the model shows no clear advantage. The model of the invention does not use the strategy of pre-training on a large-scale unsupervised or semi-supervised relation extraction data set and then fine-tuning on SemEval-2010; instead it obtains the relation representation directly from the pre-trained BERT model and fine-tunes directly on SemEval-2010. It is therefore presumed that with 1% of the training data the amount of data is insufficient and the range covered by the learned representations is not wide enough, which leads to the unsatisfactory result. The comparison results of each model on the different proportions of training data are shown in Table 4.

TABLE 4

The invention designs a contrast learning model that selects positive and negative examples from two perspectives: 1) the global perspective, in which positive and negative examples are selected by computing extreme similarity values; 2) the local perspective, in which the samples in the batch that have the same relation as the training sample are selected as positive examples, and all other samples in the batch except the training sample are selected as negative examples.

TABLE 5

All experimental results in Table 5 were obtained with the same seed settings. With the combined global + local selection of positive and negative examples, the F1 value is 88.9%. When only the local mode is used, the F1 value drops by 0.7%; when only the global mode is used, it drops by 0.3%. The global perspective aims to classify correctly the samples that are most prone to misclassification, while the local perspective aims to classify correctly any randomly selected sample; the representations learned from the two perspectives therefore differ.

The purpose of this experiment was to observe the effect of the hyperparameters on the results. k^- is the number of negative examples selected in the global mode, that is, the top k^- samples that do not belong to the same relation as the training sample but have the highest similarity to it. With the global positive example fixed as the single sample of the same relation that has the lowest similarity to the training sample, the optimal number of global negative examples was explored. As shown in FIG. 2, the result is best when k^- = 2. The experimental results show that a larger k^- is not always better: when k^- is too large, the model trained on the training set may overfit and generalise poorly; when k^- is too small, the model does not learn sufficiently and cannot separate the different relation representations well.

The batch size largely determines the number of positive and negative examples selected from the local perspective. However, as the experimental results in FIG. 3 show, a larger batch size does not necessarily give a better model. The F1 value at batch size 16 is 0.2% higher than at batch size 32, so the model can learn well even with a smaller batch size, which demonstrates its practicality.

To observe whether the training objective is reached after training on the SemEval-2010 data set with the contrast learning model designed by the invention, experiment four uses t-SNE to visualise the relation representations before and after learning, so that the distribution of the representations before and after training can be seen intuitively.

The invention selects four relations in the SemEval-2010 data set for display. The four relations, taken from the file 'rel2id.json', are "Component-Whole(e1,e2)": 1, "Component-Whole(e2,e1)": 12, "Member-Collection(e1,e2)": 3 and "Member-Collection(e2,e1)": 9; the numbers 1, 3, 9 and 12 in FIGS. 4 and 5 correspond to these four relations.

In the corresponding triples (head entity, relation, tail entity), relations 1 and 12 express the same relation but with the head and tail entities in opposite order. As shown in FIG. 4, the initial representations obtained before training (that is, the representations produced by the pre-trained BERT model) are very close and easily confused. After training with the contrast learning model designed by the invention (see FIG. 5), this problem is largely resolved.

The invention provides a supervised relation extraction model based on label contrast learning, which directly uses the labels of the supervised data to construct the positive and negative examples for contrast learning from both the global and the local perspective, thereby greatly reducing training time and cost.

The invention provides a supervised contrast learning model that constructs label-based positive and negative examples from both the global and the local perspective, which takes into account the correctness of the selected positive and negative examples and also ensures that error-prone examples are trained on, thereby obtaining richer and more accurate relation representations.

The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A supervised relation extraction method based on label contrast learning, characterized by comprising the following steps:

2. The supervised relationship extraction method based on label contrast learning as recited in claim 1,

the process of constructing the positive and negative examples comprises: calculating the similarity between all samples, constructing a global candidate dictionary of positive and negative examples from the global perspective according to the similarity, and constructing local positive and negative examples according to the sample labels in the batch.

3. The supervised relationship extraction method based on label contrast learning as recited in claim 2,

the process of constructing the global candidate dictionary of positive and negative examples further comprises: calculating the cosine similarity between the second relation vector representation and the relation vector representations of the other samples; sorting the other samples that belong to the same relation as the sentence on which relation extraction is to be performed by similarity from low to high to obtain positive samples; sorting the other samples that belong to different relations by similarity from high to low to obtain negative samples; and thereby constructing global positive and negative examples from the global perspective.

4. The supervised relationship extraction method based on label contrast learning as recited in claim 1,

the first loss function in the contrast training process is L_LabelsCL:

where Total(i) is the total set of samples for training sample i, N_{y_i} is the total number of samples in the batch whose relation label is y_i, Φ(·) denotes the output representation of a sentence after model encoding, τ > 0 is an adjustable scalar temperature parameter, and N is the total number of samples in the batch.

5. The supervised relationship extraction method based on label contrast learning as recited in claim 1,

the contrast training process comprises: training with the contrast learning loss function on the positive and negative examples, and training with the cross-entropy loss function on the second relation vector representation after it has been processed by a multi-class classifier.

6. The supervised relationship extraction method based on label contrast learning as recited in claim 4,

the second loss function is expressed as follows:

where y_{i,c} denotes the true relation label of the i-th sentence,

7. The supervised relationship extraction method based on label contrast learning as recited in claim 6,

the total loss function is:

L_total = (1 - λ) L_CE + λ L_LabelsCL

where λ is a scalar weighting hyperparameter.

8. The supervised relationship extraction method based on label contrast learning as recited in claim 1,

the process of adding special symbols comprises: first adding the special symbols [CLS] and [SEP] before and after the sentence respectively, and then adding special symbols at both ends of each entity in the sentence.