CN116431831A - Supervised relation extraction method based on label contrast learning - Google Patents
- Publication number: CN116431831A
- Application number: CN202310410923.3A
- Authority: CN (China)
- Prior art keywords
- relation
- positive
- training
- samples
- supervised
- Legal status: Granted (as listed by Google Patents, which notes that the status is an assumption, not a legal conclusion)
Classifications
- G06F16/367: Ontology
- G06F16/35: Clustering; Classification
- G06F18/22: Matching criteria, e.g. proximity measures
- G06F18/28: Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
- G06N3/0455: Auto-encoder networks; Encoder-decoder networks
- G06N3/088: Non-supervised learning, e.g. competitive learning
- G06N3/0895: Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06N3/09: Supervised learning
- G06N3/0985: Hyperparameter optimisation; Meta-learning; Learning-to-learn
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Animal Behavior & Ethology (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a supervised relation extraction method based on label contrastive learning, comprising the following steps: obtaining the sentences in a sample set on which relation extraction is to be performed, adding special symbols, and encoding the processed sentences through an encoding layer to obtain vector representations that include the special symbols; using the entities in each sentence as anchors, selecting the special-symbol representation that precedes each entity and concatenating them to obtain a first relation vector representation; passing the first relation vector representation through a fully connected layer to obtain a second relation vector representation; constructing positive and negative examples based on the second relation vector representation; and determining the loss function from the positive and negative examples and performing contrastive training, which yields an encoder that identifies relation representations more accurately. The invention provides a supervised contrastive learning model that uses labels to construct positive and negative examples from both a global and a local perspective; it not only ensures that positive and negative examples are selected correctly but also ensures that error-prone examples are trained, thereby obtaining richer and more accurate relation representations.
Description
Technical Field
The invention belongs to the field of natural language processing within computer intelligent information processing, and in particular relates to a supervised relation extraction method based on label contrastive learning.
Background
The rapid development of the Internet has led to explosive growth of information, and making efficient use of this information is a major task of information extraction (IE) technology. Relation extraction (RE) is a main task in the field of information extraction: it aims to identify the semantic relations between target entities in unstructured text and to apply them to downstream tasks such as event extraction, machine translation, knowledge graphs and sentence matching. The problem to be solved in supervised relation extraction is how to use a limited amount of labeled data more efficiently. The commonly adopted strategy is to pre-train on large unsupervised or semi-supervised datasets and then fine-tune on a supervised dataset. This approach fails to fully exploit the label information in the dataset, and the training objectives of the pre-trained model and of the downstream task are disconnected.
In supervised relation extraction, the head and tail entities in each sentence are already labeled and the label class of the sentence is known. The task can therefore be viewed as multi-class classification of annotated sentences, and the key problem is how to obtain more correct and richer relation representations from sentences for relation classification.
Zhang et al. used an RNN for feature extraction to accomplish the relation extraction task: a Bi-LSTM serves as the feature extractor for text features, and an attention mechanism captures the important features in the text. Relation representations obtained in this way through deep learning capture limited information, whereas pre-trained language models trained on large-scale data open up more possibilities for relation extraction. For example, Wu et al. used the pre-trained language model BERT for feature extraction, but simply concatenated the representations of the special symbol [CLS], the head entity and the tail entity as input and trained with a fully connected + softmax model; this training mode cannot fully mine the relation information contained in sentences. Chen et al. constructed positive and negative examples for contrastive learning by combining the bag level and the sentence level: the original sentence is augmented by replacing or inserting words with low TF-IDF scores, the augmented sentence and the original sentence form a positive pair, and randomly selected representations of other bags form negative pairs with the original sentence. This approach requires relation representations at two levels, sentence level and bag level, which are complex to construct and difficult to make interact effectively, and the bag-level representation may lose much of the information in the sentence-level representations.
The core idea of contrastive learning is to learn the similarities and differences between samples. A common contrastive learning pipeline is: first, augment the original sentence to obtain an augmented sentence; second, feed both the original and the augmented sentence into the model; finally, train with the original and augmented sentences as a positive pair and the original sentence paired with other sentences as negative pairs.
Many classic contrastive learning models are built on this structure. For example, SimCLR improves contrastive learning by using larger batch sizes and data augmentation; MoCo builds a dynamic dictionary that increases the number of positive and negative examples involved in each training step without increasing the model's burden, yielding a better-trained encoder. Contrastive learning has also been widely applied to relation extraction. For example, HiCLRE combines data augmentation with multi-granularity representations at three levels (bag, sentence and entity), performs contrastive training at each level while letting the three levels interact, and thereby obtains richer and more accurate representations that fuse different granularities; HiURE obtains two augmented sentences with the Random Span augmentation, then uses hierarchical clustering to find representations that are semantically in the same class as, or in different classes from, the training sentence, pairs them with the training sentence as positive and negative examples respectively for contrastive training, and obtains more accurate relation representations.
These models adopt a common construction: positive and negative examples are built through data augmentation. To ensure that the augmented sentence expresses the same relation as the original sentence, the two must be very close in representation. As a result, the range of relation representations learned in this way is not broad enough, and other sentences that actually express the same relation as the training sample are easily, and wrongly, treated as negative examples. Therefore, when Khosla et al. applied contrastive learning to supervised data, they selected positive and negative examples from the same batch according to the labels. However, examples that are prone to misclassification, such as positives with low similarity to the training sample and negatives with high similarity, are still difficult to train.
Disclosure of Invention
The invention aims to provide a supervised relation extraction method based on label contrastive learning, so as to solve the above problems in the prior art.
To achieve the above object, the invention provides a supervised relation extraction method based on label contrastive learning, comprising:
obtaining the sentences in a sample set on which relation extraction is to be performed, adding special symbols, and encoding the processed sentences through an encoding layer to obtain vector representations that include the special symbols;
using the entities in each sentence as anchors, selecting from these vectors the special-symbol representation that precedes each entity and concatenating them to obtain a first relation vector representation; passing the first relation vector representation through a fully connected layer to obtain a second relation vector representation;
constructing positive and negative examples based on the second relation vector representation;
determining the loss function based on the positive and negative examples, performing contrastive training to obtain an encoder that identifies relation representations, and using the encoder to extract relations.
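The second step above (concatenating the entity-start marker representations and applying the fully connected layer) can be sketched in plain Python as follows, assuming the encoder output is available as one vector per token and the positions of the two entity-start markers are known. The function names `relation_vector` and `dense` are hypothetical, and no activation is applied after the fully connected layer because the text does not specify one.

```python
from typing import List, Sequence

Vector = List[float]


def relation_vector(hidden: Sequence[Sequence[float]], e1_start: int, e2_start: int) -> Vector:
    """First relation vector: concatenate the encoder outputs at the two entity-start markers."""
    return list(hidden[e1_start]) + list(hidden[e2_start])


def dense(x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """Fully connected layer y = W x + b; each row of `weight` produces one output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


# Toy example: 4 token vectors of size 2; the entity-start markers sit at positions 1 and 3.
hidden = [[0.1, 0.2], [1.0, 2.0], [0.5, 0.5], [3.0, 4.0]]
h = relation_vector(hidden, 1, 3)          # first relation vector (size 4)
weight = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
h_prime = dense(h, weight, [0.0, 0.5])     # second relation vector (size 2)
print(h, h_prime)
```

In a real model the weights would be learned jointly with the encoder; here they are fixed by hand only to make the data flow visible.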
Optionally, constructing the positive and negative examples includes: computing the similarity between all samples, constructing global positive and negative candidate dictionaries from a global perspective according to the similarity, and constructing local positive and negative examples according to the sample labels within the batch.
Optionally, constructing the global positive and negative candidate dictionaries further includes: computing the cosine similarity between the second relation vector representation of the sentence and the relation vector representations of the other samples; sorting the other samples that belong to the same relation as the sentence by similarity from low to high to obtain positive samples, and sorting the other samples that belong to different relations by similarity from high to low to obtain negative samples, thereby constructing global positive and negative examples from a global perspective.
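A minimal sketch of the global candidate dictionaries in plain Python, assuming the relation vectors of all training samples fit in memory and are compared pairwise (O(n²)); the function names are hypothetical, and how often the dictionaries are refreshed during training is not stated in the text. The global positive for sample i would then be `positives[i][0]` and the global negatives `negatives[i][:k_neg]`, where the hypothetical `k_neg` plays the role of the hyperparameter k^- from the embodiment.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_global_candidates(
    reps: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """For every sample i, rank the other samples by cosine similarity to i.

    positives[i]: same-relation samples, least similar first (hardest positive at index 0).
    negatives[i]: different-relation samples, most similar first (hardest negatives first).
    """
    positives: Dict[int, List[int]] = {}
    negatives: Dict[int, List[int]] = {}
    for i, rep in enumerate(reps):
        scored = [(cosine(rep, other), j) for j, other in enumerate(reps) if j != i]
        same = [(s, j) for s, j in scored if labels[j] == labels[i]]
        diff = [(s, j) for s, j in scored if labels[j] != labels[i]]
        positives[i] = [j for _, j in sorted(same)]                # low -> high similarity
        negatives[i] = [j for _, j in sorted(diff, reverse=True)]  # high -> low similarity
    return positives, negatives


reps = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
labels = [0, 0, 1, 1]
pos, neg = build_global_candidates(reps, labels)
k_neg = 1
print(pos[0][:1], neg[0][:k_neg])  # global positive and top-k_neg global negatives of sample 0
```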
Optionally, the first loss function L_LabelsCL used in the contrastive training process is:
where Total(i) is the set of samples contrasted with sample i, N_{y_i} is the number of samples in the batch whose label is y_i, Φ(·) denotes the output representation of a sentence after model encoding, τ > 0 is an adjustable scalar temperature parameter, and N is the total number of samples in the batch.
Optionally, the contrastive training process includes: training with the contrastive learning loss function on the positive and negative examples, and training with the cross-entropy loss function after the second relation vector representation is processed by a multi-class classifier.
Optionally, the second loss function is expressed as follows:
where y_{i,c} denotes the true relation label of the i-th sentence, and ŷ_{i,c} denotes the model's output probability that the i-th sentence belongs to the c-th relation.
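Assuming the standard multi-class cross-entropy, which is consistent with the symbols defined above (y_{i,c} a one-hot indicator over C relation classes and ŷ_{i,c} the Softmax output), the second loss function would take the form below. Whether it is summed or averaged over the N sentences of a batch cannot be determined from the text; a sum is shown.

```latex
L_{CE} = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}
```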
Optionally, the total loss function is:
$$L_{total} = (1-\lambda)\,L_{CE} + \lambda\,L_{LabelsCL}$$
where λ is a scalar weighting hyperparameter.
Optionally, adding the special symbols includes: first adding the special symbols [CLS] and [SEP] before and after the sentence, respectively, and then adding special symbols at both ends of each entity in the sentence.
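The symbol insertion can be sketched on a pre-tokenized sentence as follows. The literal marker strings (`<e1_start>` and so on), the inclusive span convention and the function name are illustrative assumptions; in a real BERT pipeline the markers would also have to be registered as new tokens in the tokenizer vocabulary, which is not shown.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (first token index, last token index), inclusive


def add_special_symbols(tokens: Sequence[str], e1: Span, e2: Span) -> List[str]:
    """Wrap the sentence in [CLS]/[SEP] and each entity in start/end markers.

    Assumes the two entity spans do not overlap.
    """
    opens = {e1[0]: "<e1_start>", e2[0]: "<e2_start>"}
    closes = {e1[1]: "<e1_end>", e2[1]: "<e2_end>"}
    out = ["[CLS]"]
    for idx, tok in enumerate(tokens):
        if idx in opens:
            out.append(opens[idx])
        out.append(tok)
        if idx in closes:
            out.append(closes[idx])
    out.append("[SEP]")
    return out


marked = add_special_symbols("the engine is part of the car".split(), (1, 1), (6, 6))
# Positions whose encoder outputs would give h_e1start and h_e2start.
print(marked.index("<e1_start>"), marked.index("<e2_start>"))
```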
The technical effects of the invention are as follows:
the invention directly uses the labels of the supervised data to construct positive and negative examples for contrastive learning from both global and local perspectives, greatly reducing training time and cost; the invention provides a supervised contrastive learning model that uses labels to construct positive and negative examples from both global and local perspectives, which not only ensures that positive and negative examples are selected correctly but also ensures that error-prone examples are trained, thereby obtaining richer and more accurate relation representations.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flow chart of a method in an embodiment of the invention;
FIG. 2 shows the F1-micro values corresponding to different values of k^- in an embodiment of the invention;
FIG. 3 shows F1-micro values corresponding to different batch_size values in an embodiment of the present invention;
FIG. 4 is a t-SNE visualization of the sample representations before training in an embodiment of the invention;
FIG. 5 is a t-SNE visualization of the sample representations after training in an embodiment of the invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Example 1
As shown in fig. 1-5, the present embodiment provides a supervised relationship extraction method based on label contrast learning, which includes:
Step one: the special symbols [CLS] and [SEP] are added before and after the sentence, and the special symbols <e1_start>, <e1_end>, <e2_start> and <e2_end> are added before and after the two entities in the sentence; the processed sentence is passed through the embedding layer to obtain its vector representation;
Step two: the sentence is passed through the BERT encoding layer, the representations of the special symbols preceding the two entities, h_{e1start} and h_{e2start}, are taken, and the two are concatenated as the initial relation vector representation;
Step three: a denser relation representation vector h' is obtained through the fully connected layer; the cosine similarity between h' and the relation representations of the other samples is computed; finally, the other samples belonging to the same relation as the training sample are sorted by similarity from low to high, the other samples belonging to different relations are sorted by similarity from high to low, and two candidate dictionaries, one for positive and one for negative examples, are created from the global perspective;
Step four: the model is trained with the contrastive learning loss function and the cross-entropy loss function, so that more accurate relation representations are obtained.
Existing supervised contrastive learning usually constructs positive and negative examples only from a local perspective: samples in the batch that belong to the same relation as the training sample are selected as positives, and all samples other than the training sample are selected as negatives. Specifically, a batch contains N samples {x_i, y_i}, i = 1, ..., N, and N_{y_i} denotes the total number of samples in the batch with label y_i. For a training sample x_i, the batch contains N_{y_i} samples x_j that belong to the same relation as x_i (i.e., y_i = y_j), including x_i itself; each of the other same-relation samples x_j forms a positive pair with the training sample, giving N_{y_i} − 1 positive pairs. Negative pairs are formed by the training sample x_i with each of the other samples x_k in the batch, giving N − 1 pairs. Φ(·) denotes the output representation of a sentence after model encoding, and τ > 0 is an adjustable scalar temperature parameter. The contrastive learning training objective L_base-CL built on this idea is as follows:
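A minimal sketch of this local objective in plain Python, assuming it follows the supervised contrastive (SupCon) loss of Khosla et al. (in-batch positives in the numerator, all other batch samples in the denominator, temperature τ) and that the per-anchor losses are summed; the function names are hypothetical. The `continue` branch illustrates the weakness discussed in the next paragraph: an anchor with no same-relation sample in its batch simply drops out of the loss.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def base_supcon_loss(reps: Sequence[Sequence[float]], labels: Sequence[int], tau: float = 0.1) -> float:
    """Local supervised contrastive loss over one batch (SupCon-style, summed over anchors).

    reps are assumed to be L2-normalised encoder outputs Phi(x_i).
    """
    n = len(reps)
    loss = 0.0
    for i in range(n):
        others = [k for k in range(n) if k != i]                   # N - 1 contrast terms
        positives = [j for j in others if labels[j] == labels[i]]  # N_{y_i} - 1 positives
        if not positives:
            continue  # no in-batch positive: this anchor contributes nothing
        log_denom = math.log(sum(math.exp(dot(reps[i], reps[k]) / tau) for k in others))
        loss -= sum(dot(reps[i], reps[j]) / tau - log_denom for j in positives) / len(positives)
    return loss


aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # same-relation samples coincide
mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]    # same-relation samples orthogonal
labels = [0, 0, 1, 1]
print(base_supcon_loss(aligned, labels, tau=1.0) < base_supcon_loss(mixed, labels, tau=1.0))  # True
```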
Because samples are assigned to batches randomly and the batch size is a hyperparameter, constructing positive and negative examples only from the local perspective is unstable. A training sample may find no positive example of the same relation in its batch, in which case it cannot take part in training of the contrastive loss; moreover, samples that are easily misclassified (positives with low similarity to the training sample and negatives with high similarity) are rarely guaranteed to land in the same batch as the training sample and be selected for contrastive learning. The invention therefore adds global positive and negative examples on top of the conventional supervised contrastive positives and negatives. The resulting contrastive model, built on both global and local perspectives, preserves the randomness of positive and negative examples from the local perspective, ensures from the global perspective that every training sample participates in contrastive training, and deliberately selects easily misclassified samples to participate in that training.
Specifically, the positive set is extended with the sample that belongs to the same relation as the training sample but has the lowest similarity to it, so the number of positives grows from N_{y_i} − 1 to N_{y_i}. More importantly, this design ensures that every training sample in the batch has a corresponding positive regardless of the batch size and sample randomness, so that it can participate in training of the contrastive loss. For the negatives, the Top-k^- samples that belong to relations different from the training sample's and have the highest similarity to it are added according to the hyperparameter k^-. The global positive is also added to the contrast set alongside the negatives, so that it is learned accurately against them; the number of contrasted pairs thus grows from N − 1 to N + k^-. With these global positives and negatives, the loss is no longer confined to a single batch, and easily misclassified samples are deliberately chosen as positives and negatives for the contrastive loss. The model can therefore learn the information in correct samples more specifically from the global perspective and correct erroneous information contained in existing relation representations, further widening the coverage of each relation representation while keeping it correct. The final contrastive learning loss function is shown below:
where Total(i) comprises three parts: 1) the global positive sample; 2) the Top-k^- negative samples that belong to different relations and have the highest similarity; 3) the other samples in the batch apart from the training sample, x_1, ..., x_{i-1}, x_{i+1}, ..., x_N. Total(i) therefore contains k^- + N samples in total.
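Under the same SupCon-style assumption, the per-anchor objective with the global samples added can be sketched as follows. The positive set is the N_{y_i} − 1 in-batch positives plus the global positive (N_{y_i} in total), and the contrast set is Total(i) with its three parts as listed above. Averaging over positives, and treating the global positive as a separate entry even if the same sample also appears in the batch, are assumptions; the names are hypothetical.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _logit(a: Vec, b: Vec, tau: float) -> float:
    return sum(x * y for x, y in zip(a, b)) / tau


def labelscl_anchor_loss(
    anchor: Vec,
    anchor_label: int,
    batch_others: Sequence[Vec],   # the N - 1 other samples of the batch
    batch_labels: Sequence[int],   # their labels
    global_pos: Vec,               # least similar same-relation sample (global positive)
    global_negs: Sequence[Vec],    # top-k^- most similar different-relation samples
    tau: float = 0.1,
) -> float:
    """Contrastive loss for one anchor with local + global positives/negatives (sketch)."""
    positives = [v for v, y in zip(batch_others, batch_labels) if y == anchor_label] + [global_pos]
    total_i = list(batch_others) + [global_pos] + list(global_negs)  # |Total(i)| = N + k^-
    log_denom = math.log(sum(math.exp(_logit(anchor, t, tau)) for t in total_i))
    return -sum(_logit(anchor, p, tau) - log_denom for p in positives) / len(positives)


anchor = [1.0, 0.0]
easy = labelscl_anchor_loss(anchor, 0, [[0.0, 1.0]], [1], [0.8, 0.6], [[0.0, 1.0]], tau=1.0)
hard = labelscl_anchor_loss(anchor, 0, [[0.0, 1.0]], [1], [0.8, 0.6], [[0.9, 0.436]], tau=1.0)
print(easy < hard)  # True
```

In this usage the anchor has no same-relation sample in its batch, yet its loss is still defined thanks to the global positive, and a more similar (harder) global negative yields a larger loss.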
In addition, to inherit the strong language understanding of BERT, a cross-entropy loss is also added to the model to help it learn correct relation representations. This training objective is denoted L_CE, where y_{i,c} denotes the true relation label of the i-th sentence and ŷ_{i,c} denotes the model's output probability that the i-th sentence belongs to the c-th relation. The sentence is passed through the pre-trained BERT model and the fully connected layer to obtain the relation representation vector h', which is then passed through a multi-class Softmax classifier to obtain these probabilities.
In LabelsCL, the training objective consists of two parts: the cross-entropy loss and the contrastive learning loss. λ is a scalar weighting hyperparameter that adjusts the weights of the two training objectives. The total loss L_total is the following weighted average of the two losses:
$$L_{total} = (1-\lambda)\,L_{CE} + \lambda\,L_{LabelsCL}$$
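A plain-Python sketch of how the two objectives could be combined, assuming the Softmax probabilities are computed from per-class logits derived from h' and that L_CE is averaged over the batch (the text does not say whether it is averaged or summed). The function names are hypothetical and the value `lam=0.5` is purely illustrative, not the setting used in the experiments.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable Softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(batch_logits: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """L_CE averaged over the batch; gold[i] is the index of sentence i's true relation."""
    return -sum(math.log(softmax(z)[g]) for z, g in zip(batch_logits, gold)) / len(gold)


def total_loss(l_ce: float, l_labelscl: float, lam: float) -> float:
    """L_total = (1 - lambda) * L_CE + lambda * L_LabelsCL."""
    return (1.0 - lam) * l_ce + lam * l_labelscl


l_ce = cross_entropy([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]], [0, 2])
print(total_loss(l_ce, l_labelscl=1.2, lam=0.5))
```

Setting λ = 0 reduces training to plain cross-entropy fine-tuning, and λ = 1 to pure contrastive training.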
The experiments were run on an RTX5000 graphics card with 16 GB of video memory, under Ubuntu 20.04, using Python 3.7 and PyTorch 1.8. The specific parameter settings are shown in Table 1.
TABLE 1
The invention uses the English dataset SemEval-2010 Task 8 as the dataset for supervised relation extraction. The dataset was downloaded from OpenNRE, and its details are shown in Table 2. It contains 9+1 relation types: the 9 ordinary relations are directional, e.g. "Component-Whole(e1,e2)" and "Component-Whole(e2,e1)", so the same relation with head and tail entities swapped is counted as two relation types, while the special relation "Other" is undirected and swapping the entity order does not change its type. This gives 2 × 9 + 1 = 19 relation labels.
TABLE 2
Meanwhile, to simulate scarcer data, the training set of the dataset is further subsampled at proportions of 1%, 10% and 100%; the specific numbers of samples are shown in Table 3.
TABLE 3
The experimental results are analyzed and evaluated with precision (P), recall (R) and F1. F1 is the harmonic mean of P and R and reflects the overall performance of the model; it is micro-averaged and computed as F1-micro = 2 × P × R / (P + R).
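To make the micro-averaged F1 concrete, a minimal sketch assuming per-relation true-positive, false-positive and false-negative counts are available. Which relation classes enter the average (for example, whether "Other" is counted) is not stated in the text, and the counts below are invented for illustration, not results from the experiments.

```python
from typing import Dict, Tuple


def micro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 from per-relation (tp, fp, fn) counts.

    Counts are pooled over all relations first; then P, R and F1 = 2PR/(P+R) are computed.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


p, r, f1 = micro_prf({"Component-Whole(e1,e2)": (8, 2, 2), "Member-Collection(e1,e2)": (2, 0, 2)})
print(round(p, 4), round(r, 4), round(f1, 4))
```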
The proposed model LabelsCL is compared with MTB, CP, BERT, RoBERTa, ERICA_BERT, ERICA_RoBERTa and FineCL. For fairness, the same strategy as ERICA and FineCL is adopted: each reported result is the average of 5 runs with identical seed settings, namely 42, 43, 44, 45 and 46. The comparative results of each model, trained on different proportions of the training data, are shown in Table 4. With 10% and 100% of the training data, the model outperforms the other models, with average F1 values of 81.7% and 88.9% respectively; with 1% of the training data, however, its advantage is not evident. This is because the model does not adopt the strategy of pre-training on a large unsupervised or semi-supervised relation extraction dataset and then fine-tuning on SemEval-2010; instead it obtains relation representations directly from the pre-trained BERT model and fine-tunes directly on SemEval-2010. It is therefore presumed that with only 1% of the training data, the data are insufficient and the learned representations do not cover a wide enough range, which leads to the unsatisfactory results.
TABLE 4
The invention designs a contrastive learning model that selects positive and negative examples from two perspectives: 1) the global perspective, in which positive and negative examples are selected by computing similarities, taking the least similar same-relation sample as the positive and the most similar different-relation samples as negatives; 2) the local perspective, in which samples in the batch with the same relation as the training sample are selected as positives, and all other samples in the batch except the training sample are selected as negatives.
TABLE 5
All experimental results in table 5 were completed at the same seed setting. The overall positive and negative example selection pattern F1 value for global + is 88.9%. When only a local mode is adopted, the F1 value is reduced by 0.7 percent; when only global mode is used, the F1 value is reduced by 0.3%. The overall situation is that samples which are most prone to be subjected to error classification are correctly classified, and the overall situation and the local learned representation are different; local is to correctly sort any sample that is randomly selected.
The purpose of this experiment was to observe the effect of hyper-parameters on the results. k (k) - Representing the number of negative examples selected in a global mode, namely the top k which does not belong to the same relation with the training sample but has the highest similarity - Samples. Under the premise that only the sample with the lowest similarity in the same relation with the training sample is selected as the positive example in the global angle, the optimal negative example number of the global angle is explored. As shown in FIG. 2, when k - The result is best when=2. As can be seen from the experimental results, not k - The greater the effect, the better, when k - When the model is oversized, the model trained through the training set may have the phenomenon of overfitting, and does not have good generalization capability; when k is - If the model is too small, the model is not learned sufficiently, and different relational expressions cannot be classified well.
The size of the Batch size greatly affects the number of samples that are selected from a local perspective, positive and negative examples. However, as can be seen from the experimental results in FIG. 3, the larger the not-Batch size, the better the model effect. The F1 value when the Batch size=16 is 0.2% higher than the F1 value when the Batch size=32, so that the model can learn a good effect under the condition of smaller Batch size, and the practicability of the model is shown.
In Experiment 4, to check whether training the contrastive learning model of the invention on the SemEval2010 data set achieves the training objective, t-SNE is used to visualize the relation representations before and after training, so that the change in their distribution can be seen directly.
Four relations from the SemEval2010 data set are selected for display. In FIGS. 4 and 5, the labels 1, 3, 9 and 12 correspond to the following entries in the file "rel2id.json": "Component-Whole(e1,e2)": 1, "Member-Collection(e1,e2)": 3, "Member-Collection(e2,e1)": 9, and "Component-Whole(e2,e1)": 12.
In the corresponding (head entity, relation, tail entity) triples, relations 1 and 12 are the same relation type, but with the head and tail entities in opposite order. As shown in FIG. 4, their initial representations before training (i.e., the representations produced by the pre-trained BERT model) lie very close together and are easily confused. After training with the contrastive learning model of the invention (see FIG. 5), this problem is well resolved.
The invention provides a supervised relation extraction model based on label contrastive learning. It uses the labels of the supervised data directly to construct positive and negative examples for contrastive learning from both a global and a local perspective, which greatly reduces training time and cost.
The supervised contrastive learning model of the invention constructs positive and negative examples from the labels at both the global and the local level. It not only ensures that positive and negative examples are selected correctly but also ensures that error-prone examples are trained on, yielding richer and more accurate relation representations.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (8)
1. A supervised relation extraction method based on label contrast learning, characterized by comprising the following steps:
obtaining the sentences in a sample set on which relation extraction is to be performed, adding special symbols to them, and encoding the processed sentences with an encoding layer to obtain vectors that include the special symbols;
using the entities in each sentence as markers, selecting from these vectors the representation of the special symbol preceding each entity and concatenating them to obtain a first relation vector representation; and passing the first relation vector representation through a fully connected layer to obtain a second relation vector representation;
constructing positive and negative examples based on the second relation vector representation;
and determining a loss function based on the positive and negative examples, performing contrastive relation training to obtain an encoder for identifying relation representations, and extracting relations with the encoder.
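The second step of claim 1 (selecting the representations of the special symbols preceding the two entities, concatenating them, and applying a fully connected layer) can be sketched in plain Python. The sketch assumes the encoder outputs (e.g. from BERT) are already available as one vector per token and that the marker positions are known; the function name `relation_representation` is hypothetical, and the fully connected layer is taken to be purely linear ($y = Wx + b$) because the claim names no activation.

```python
from typing import List, Sequence

Vector = List[float]


def relation_representation(
    hidden: Sequence[Sequence[float]],
    e1_start: int,
    e2_start: int,
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Vector:
    """Hypothetical sketch of the first/second relation vector representations.

    hidden[t] is the assumed encoder output for token t; e1_start and e2_start
    are the positions of the special symbols placed before each entity.
    """
    # First relation vector representation: concatenation of the two marker vectors.
    first = list(hidden[e1_start]) + list(hidden[e2_start])
    if any(len(row) != len(first) for row in weight):
        raise ValueError("each weight row must match the concatenated dimension")
    # Second relation vector representation: linear layer W @ first + b.
    return [sum(w * x for w, x in zip(row, first)) + b for row, b in zip(weight, bias)]


hidden = [[0.1, 0.2], [1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.5]
print(relation_representation(hidden, 1, 3, W, b))
```

Here the marker vectors at positions 1 and 3 are concatenated to `[1.0, 2.0, 3.0, 4.0]` and mapped to `[1.0, 4.0, 10.5]`. In a real model the weights would be learned jointly with the encoder.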
2. The supervised relationship extraction method based on label contrast learning as recited in claim 1,
the process of constructing the positive and negative examples comprises: calculating the similarity between all samples and, from the global perspective, constructing a global positive and negative example candidate dictionary according to the similarity; and constructing local positive and negative examples according to the sample labels within the batch.
3. The supervised relationship extraction method based on label contrast learning as recited in claim 2,
the process of constructing the global positive and negative example candidate dictionary further comprises: calculating the cosine similarity between the second relation vector representation of a sentence on which relation extraction is to be performed and the relation vector representations of the other samples; sorting the other samples that belong to the same relation as the sentence in ascending order of similarity to obtain the positive samples; sorting the samples that belong to different relations in descending order of similarity to obtain the negative samples; and thereby constructing global positive and negative examples from the global perspective.
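Claim 3's global candidate dictionary can be sketched as follows, assuming each sample's second relation vector representation is a list of floats. The function `build_global_candidates` and the dictionary keys `"positive"` and `"negative"` are hypothetical names; the sort orders follow the claim (same-relation samples by ascending similarity, other-relation samples by descending similarity), so the hardest examples come first in each list.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_global_candidates(
    reps: List[Sequence[float]], labels: List[int]
) -> Dict[int, Dict[str, List[int]]]:
    """Hypothetical global candidate dictionary keyed by sample index."""
    table: Dict[int, Dict[str, List[int]]] = {}
    for i, ri in enumerate(reps):
        sim = {j: cosine(ri, rj) for j, rj in enumerate(reps) if j != i}
        same = [j for j in sim if labels[j] == labels[i]]
        diff = [j for j in sim if labels[j] != labels[i]]
        table[i] = {
            "positive": sorted(same, key=sim.__getitem__),                # low -> high
            "negative": sorted(diff, key=sim.__getitem__, reverse=True),  # high -> low
        }
    return table


reps = [[1.0, 0.0], [0.9, 0.1], [0.6, 0.8], [0.8, 0.6], [0.0, 1.0]]
labels = [0, 0, 0, 1, 2]
print(build_global_candidates(reps, labels)[0])
```

Taking the first element of `"positive"` and the first $k^-$ elements of `"negative"` then yields the global examples for a sample. The dictionary only needs recomputing when the representations change, for example once per epoch (an assumption; the claim does not state a refresh schedule).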
4. The supervised relationship extraction method based on label contrast learning as recited in claim 1,
the first loss function $L_{LabelsCL}$ in the contrastive relation training process is:
In the formula, Total(i) is the total number of samples in the sample set; $N_{y_i}$ is the number of samples in the batch whose label is the same relation $y_i$; $\Phi(\cdot)$ is the output representation of a sentence after model encoding; $\tau > 0$ is an adjustable scalar temperature parameter; and $N$ is the total number of samples in the batch.
5. The supervised relationship extraction method based on label contrast learning as recited in claim 1,
the contrastive relation training process comprises: training with a contrastive learning loss function on the positive and negative examples; and training with a cross-entropy loss function on the output of a multi-class classifier applied to the second relation vector representation.
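The cross-entropy branch of claim 5 can be sketched as below. The multi-class classifier is assumed to output one unnormalized score (logit) per relation for each second relation vector representation. The function names are hypothetical, and averaging over the batch is an assumption, since the source does not give the second loss function's formula.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """-log softmax(logits)[target], computed stably with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def batch_cross_entropy(batch_logits: List[Sequence[float]], targets: List[int]) -> float:
    """Mean cross-entropy over a batch (assumed reduction)."""
    return sum(cross_entropy(l, t) for l, t in zip(batch_logits, targets)) / len(targets)


print(batch_cross_entropy([[0.0, 0.0], [0.0, 0.0, 0.0]], [0, 2]))
```

With uniform logits the loss for a $C$-way choice is $\log C$, so the example above averages $\log 2$ and $\log 3$.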
6. The supervised relationship extraction method based on label contrast learning as recited in claim 4,
the second loss function is expressed as follows:
7. The supervised relationship extraction method based on label contrast learning as recited in claim 6,
the total loss function is:
$$L_{total} = (1-\lambda)L_{CE} + \lambda L_{LabelsCL}$$
where $\lambda$ is a scalar weighting hyper-parameter.
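The weighting in claim 7 is a convex combination when $0 \le \lambda \le 1$, which is assumed here (the claim itself only calls $\lambda$ a scalar weighting hyper-parameter). A short sketch with a hypothetical function name:

```python
def total_loss(l_ce: float, l_labels_cl: float, lam: float) -> float:
    """L_total = (1 - lam) * L_CE + lam * L_LabelsCL.

    Assumption: lam is restricted to [0, 1] so the result is a convex combination.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is expected in [0, 1]")
    return (1.0 - lam) * l_ce + lam * l_labels_cl


print(total_loss(2.0, 4.0, 0.25))
```

`total_loss(2.0, 4.0, 0.25)` gives 0.75 × 2.0 + 0.25 × 4.0 = 2.5; $\lambda = 0$ reduces to pure cross-entropy training and $\lambda = 1$ to pure contrastive training.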
8. The supervised relationship extraction method based on label contrast learning as recited in claim 1,
the process of adding special symbols comprises: first adding the special symbols [CLS] and [SEP] before and after the sentence, respectively, and then adding special symbols at both ends of each entity in the sentence.
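Claim 8's preprocessing can be sketched on a whitespace-tokenized sentence. The marker strings `[E1]`, `[/E1]`, `[E2]`, `[/E2]`, the half-open span convention, and the function name `add_special_symbols` are assumptions; the claim only says special symbols are added at both ends of each entity. The function also returns the positions of the markers placed before each entity, i.e. the positions whose encoder outputs claim 1 concatenates.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # assumed half-open token span [start, end)


def add_special_symbols(tokens: List[str], e1: Span, e2: Span) -> Tuple[List[str], int, int]:
    """Wrap the sentence in [CLS] ... [SEP] and put (hypothetical) markers around both entities.

    Returns the marked tokens plus the indices of [E1] and [E2] in the output.
    """
    for s, e in (e1, e2):
        if not 0 <= s < e <= len(tokens):
            raise ValueError("each entity span must be non-empty and inside the sentence")
    if not (e1[1] <= e2[0] or e2[1] <= e1[0]):
        raise ValueError("entity spans must not overlap")
    out = ["[CLS]"]
    marker_pos = {}
    for i in range(len(tokens) + 1):
        # Close an entity before opening the next one when the spans are adjacent.
        if i == e1[1]:
            out.append("[/E1]")
        if i == e2[1]:
            out.append("[/E2]")
        if i == len(tokens):
            break
        if i == e1[0]:
            marker_pos["E1"] = len(out)
            out.append("[E1]")
        if i == e2[0]:
            marker_pos["E2"] = len(out)
            out.append("[E2]")
        out.append(tokens[i])
    out.append("[SEP]")
    return out, marker_pos["E1"], marker_pos["E2"]


print(add_special_symbols("the engine is part of the car".split(), (1, 2), (6, 7)))
```

For the Component-Whole example above, the output is `[CLS] the [E1] engine [/E1] is part of the [E2] car [/E2] [SEP]` with marker positions 2 and 9. With a subword tokenizer such as BERT's, the markers would additionally have to be registered as special tokens so they are not split.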
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310410923.3A CN116431831B (en) | 2023-04-18 | 2023-04-18 | Supervised relation extraction method based on label contrast learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116431831A (en) | 2023-07-14 |
CN116431831B (en) | 2023-09-22 |
Family ID: 87088738
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310410923.3A Active CN116431831B (en) | 2023-04-18 | 2023-04-18 | Supervised relation extraction method based on label contrast learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116431831B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112084790A (en) * | 2020-09-24 | 2020-12-15 | 中国民航大学 | Relation extraction method and system based on pre-training convolutional neural network |
CN113011427A (en) * | 2021-03-17 | 2021-06-22 | 中南大学 | Remote sensing image semantic segmentation method based on self-supervision contrast learning |
CN114386437A (en) * | 2022-01-13 | 2022-04-22 | 延边大学 | Chinese-Korean translation quality estimation method and system based on cross-language pre-training model |
CN115270761A (en) * | 2022-07-28 | 2022-11-01 | 中国人民解放军国防科技大学 | Relation extraction method fusing prototype knowledge |
CN115496072A (en) * | 2022-09-19 | 2022-12-20 | 重庆中国三峡博物馆 | Relation extraction method based on comparison learning |
CN115630164A (en) * | 2022-10-14 | 2023-01-20 | 匀熵智能科技(无锡)有限公司 | Remote supervision relation extraction method based on positive and negative direction joint learning and prototype representation |
Non-Patent Citations (2)
Title |
---|
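Meng Xianyan (孟先艳): "Multilingual text classification method based on bidirectional long short-term memory units and conv... neural networks" (基于双向长短时记忆单元和卷...经网络的多语种文本分类方法), Application Research of Computers (《计算机应用研究》), vol. 37, no. 9 * |
Wang Dingqian (王鼎乾): "Comparative study of supervised entity relation extraction methods based on deep learning" (基于深度学习的有监督实体关系抽取方法对比研究), Journal of Computer Applications (《计算机应用》), no. 7 * |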
Also Published As
Publication number | Publication date |
---|---|
CN116431831B (en) | 2023-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xie et al. | Representation learning of knowledge graphs with hierarchical types. | |
Gao et al. | Neural snowball for few-shot relation learning | |
CN108829801B (en) | Event trigger word extraction method based on document level attention mechanism | |
CN111950269A (en) | Text statement processing method and device, computer equipment and storage medium | |
CN111325264A (en) | Multi-label data classification method based on entropy | |
CN111309918A (en) | Multi-label text classification method based on label relevance | |
CN113434858A (en) | Malicious software family classification method based on disassembly code structure and semantic features | |
CN117273134A (en) | Zero-sample knowledge graph completion method based on pre-training language model | |
Wu et al. | Knowledge representation via joint learning of sequential text and knowledge graphs | |
CN112948588B (en) | Chinese text classification method for quick information editing | |
CN111191033A (en) | Open set classification method based on classification utility | |
Gao et al. | REPRESENTATION LEARNING OF KNOWLEDGE GRAPHS USING CONVOLUTIONAL NEURAL NETWORKS. | |
Yue et al. | Similarity Makes Difference: SSHTN for Generalized Zero-Shot Industrial Fault Diagnosis by Leveraging Auxiliary Set | |
CN116431831B (en) | Supervised relation extraction method based on label contrast learning | |
Deng et al. | Chinese triple extraction based on bert model | |
Liu et al. | Learning term embeddings for lexical taxonomies | |
Xu et al. | Multi text classification model based on bret-cnn-bilstm | |
Fan et al. | Multi-label Chinese question classification based on word2vec | |
Wang et al. | Dior: Learning to hash with label noise via dual partition and contrastive learning | |
Dyballa et al. | A separability-based approach to quantifying generalization: which layer is best? | |
CN115034213B (en) | Joint learning-based method for recognizing prefix and suffix negative words | |
Su et al. | Ensemble learning for question classification | |
Wu et al. | Intelligent Text Location Based on Multi Model Fusion | |
Tang et al. | None-Negative Graph Contrastive Learning for Knowledge-Driven Zero-Shot Learning | |
Wang et al. | Supervised Relation Extraction Based on Labels Contrastive Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |