CN116431831A - Supervised relation extraction method based on label contrast learning - Google Patents

Supervised relation extraction method based on label contrast learning

Info

Publication number
CN116431831A
Authority
CN
China
Prior art keywords
relation
positive
training
samples
supervised
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310410923.3A
Other languages
Chinese (zh)
Other versions
CN116431831B (en)
Inventor
赵亚慧
王苑儒
金国哲
崔荣一
刘帆
任一平
徐培焱
李永恒
孟嘉
王乐
孙烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanbian University
Original Assignee
Yanbian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanbian University
Priority to CN202310410923.3A
Publication of CN116431831A
Application granted
Publication of CN116431831B
Active (legal status)
Anticipated expiration (legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/28 Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a supervised relation extraction method based on label contrast learning, which comprises the following steps: obtaining the sentences to be subjected to relation extraction in a sample set, adding special symbols, and encoding the processed sentences through a coding layer to obtain vector representations that contain the special symbols; taking the entities in each sentence as anchors, selecting the special-symbol representation immediately before each entity and concatenating them to obtain a first relation vector representation; passing the first relation vector representation through a fully connected layer to obtain a second relation vector representation; constructing positive and negative examples based on the second relation vector representation; and defining a loss function based on the positive and negative examples and performing contrastive training, thereby obtaining an encoder that identifies relation representations more accurately. The invention provides a supervised contrastive learning model that constructs positive and negative examples from both a global and a local perspective based on labels; it not only accounts for the correctness of positive and negative example selection but also ensures that error-prone examples are trained on, thereby yielding richer and more accurate relation representations.

Description

Supervised relation extraction method based on label contrast learning
Technical Field
The invention belongs to the field of natural language processing within computer intelligent information processing, and particularly relates to a supervised relation extraction method based on label contrast learning.
Background
The rapid growth of the internet has led to an explosive growth of information; how to make efficient use of this information is a major task of information extraction (Information Extraction, IE) technology. Relation extraction (Relation Extraction, RE) is a main task in the field of information extraction. It aims to identify the semantic relationships between target entities in unstructured text and to apply them to downstream tasks such as event extraction, machine translation, knowledge graph construction and sentence matching. The problem to be solved for supervised relation extraction is how to use a limited amount of supervised data more efficiently. The strategy commonly adopted at present is to pre-train on large unsupervised or semi-supervised datasets and to fine-tune on supervised datasets. This training approach fails to fully exploit the label information in the dataset and suffers from a mismatch between the training objective of the pre-trained model and that of the downstream task.
For the supervised relation extraction task, the head and tail entities in a sentence are already marked and the label class of the sentence is known. The supervised relation extraction task can therefore be regarded as a multi-class classification problem over annotated sentences. The key to the problem is how to obtain a more correct and richer relation representation from sentences for relation classification.
Zhang et al. use an RNN for feature extraction to accomplish the relation extraction task; their method uses a Bi-LSTM as the feature extractor and combines it with an attention mechanism to capture the important features in the text. The relation representations obtained through such deep learning models capture limited information, and pre-trained language models trained on large-scale data offer more possibilities for the relation extraction task. For example, Wu et al. use the pre-trained language model BERT for feature extraction: the representations of the special symbol [CLS], the head entity and the tail entity are simply concatenated as input and trained through a fully connected layer plus Softmax, a training scheme that cannot fully mine the relation information contained in the sentence. Chen et al. construct positive and negative examples for contrastive learning by combining the bag level and the sentence level: the original sentence is augmented by replacing/inserting words with low TF-IDF scores, the augmented sentence and the original sentence form a positive pair, and the representations of randomly selected other bags and the original sentence form negative pairs. This approach requires building relation representations at two levels, sentence level and bag level, which are complex to construct and hard to make interact effectively, and the bag-level representation may lose much of the information contained in the sentence-level representation.
The core idea of contrastive learning is to learn the similarities and differences between samples. The common implementation flow is: first, apply data augmentation to the original sentence to obtain an augmented sentence; second, feed both the original and the augmented sentence into the model; finally, treat the original sentence and its augmented version as a positive pair, treat the original sentence and other sentences as negative pairs, and train with a contrastive objective.
Several classical contrastive learning models build on this structure. For example, SimCLR improves contrastive learning by using larger batch sizes and stronger data augmentation; MoCo builds a dynamic dictionary so that more positive and negative examples participate in each training step without increasing the burden on the model, thereby obtaining a better-trained encoder. Contrastive learning has also been widely applied to the relation extraction task. For example, HiCLRE combines data augmentation with multi-granularity representations at the bag, sentence and entity levels, performs contrastive training at each of the three levels and lets the levels interact, yielding richer and more accurate representations that fuse different granularities; HiURE obtains two augmented sentences through Random Span data augmentation, then uses hierarchical clustering to find representations that semantically belong to the same or to different categories as the training sentence, forms positive and negative pairs with the training sentence for contrastive training, and obtains a more accurate relation representation.
These models adopt the common construction scheme for contrastive learning, namely building positive and negative examples through data augmentation. To guarantee that the augmented sentence belongs to the same relation as the original one, the two must remain very close in sentence representation. As a result, the range of relation representations obtained by such training is not wide enough, and other sentences that belong to the same relation as the training sample are easily, and wrongly, treated as negatives. Khosla et al. therefore select positive and negative examples within the same batch by label when applying contrastive learning to supervised data. However, examples that are prone to misclassification, such as positives with low similarity to the training sample and negatives with high similarity to it, are still hard to guarantee to be trained on.
Disclosure of Invention
The invention aims to provide a supervised relation extraction method based on label contrast learning, so as to solve the problems existing in the prior art.
In order to achieve the above object, the present invention provides a supervised relation extraction method based on label contrast learning, comprising:
obtaining the sentences to be subjected to relation extraction in a sample set, adding special symbols, and encoding the processed sentences through a coding layer to obtain vector representations containing the special symbols;
taking the entities in the sentence as anchors, selecting the special-symbol representation immediately before each entity and concatenating them to obtain a first relation vector representation; passing the first relation vector representation through a fully connected layer to obtain a second relation vector representation;
constructing positive and negative examples based on the second relation vector representation;
and defining the loss function based on the positive and negative examples, performing contrastive training to obtain an encoder for identifying relation representations, and extracting relations with this encoder.
Optionally, the process of constructing the positive and negative examples comprises: computing the similarity between all samples, constructing a global positive/negative example candidate dictionary from the global perspective according to the similarity, and constructing local positive and negative examples according to the sample labels within the batch.
Optionally, the process of constructing the global positive/negative example candidate dictionary further comprises: computing the cosine similarity between the second relation vector representation and the relation vector representations of the other samples; sorting the other samples that belong to the same relation as the sentence to be subjected to relation extraction from low to high similarity to obtain positive sample candidates; sorting the other samples that belong to different relations from high to low similarity to obtain negative sample candidates; and thereby constructing global positive and negative examples from the global perspective.
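For illustration, the following is a minimal sketch of how such global positive/negative candidate dictionaries could be built from the second relation vector representations. It is written in PyTorch; the function and variable names (build_global_candidates, reps, k_neg) are illustrative and not taken from the patent.

```python
import torch
import torch.nn.functional as F

def build_global_candidates(reps, labels, k_neg):
    """Build global positive/negative candidate dictionaries over all samples.

    reps   : (M, d) tensor of second relation vector representations h' for all M samples
    labels : (M,)   tensor of relation labels
    k_neg  : number of hardest negatives (Top-k^-) kept per sample
    Returns dicts mapping sample index -> global positive index / list of negative indices.
    """
    reps = F.normalize(reps, dim=-1)
    sim = reps @ reps.t()                                   # pairwise cosine similarity, (M, M)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)       # same-relation mask
    pos_dict, neg_dict = {}, {}
    for i in range(reps.size(0)):
        same_i = same[i].clone()
        same_i[i] = False                                   # exclude the sample itself
        diff_i = ~same[i]
        if same_i.any():                                    # global positive: same relation, LOWEST similarity
            pos_dict[i] = int(sim[i].masked_fill(~same_i, float("inf")).argmin())
        if diff_i.any():                                    # global negatives: different relation, HIGHEST similarity
            k = min(k_neg, int(diff_i.sum()))
            neg_dict[i] = sim[i].masked_fill(~diff_i, float("-inf")).topk(k).indices.tolist()
    return pos_dict, neg_dict
```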
Optionally, the first loss function L_LabelsCL in the contrastive training process is:

$$L_{LabelsCL}=\sum_{i=1}^{N}\frac{-1}{N_{y_i}}\sum_{x_p\in P(i)}\log\frac{\exp\big(\Phi(x_i)\cdot\Phi(x_p)/\tau\big)}{\sum_{x_a\in Total(i)}\exp\big(\Phi(x_i)\cdot\Phi(x_a)/\tau\big)}$$

where Total(i) is the set of samples contrasted against sample x_i, P(i) is its set of positive examples, N_{y_i} denotes the number of samples in the batch whose label is the same relation y_i, Φ(·) denotes the output representation of the sentence after model encoding, τ > 0 is an adjustable scalar temperature parameter, and N is the total number of samples in the batch.
Optionally, the training process of the contrastive relation comprises: training the contrastive learning loss function with the positive and negative examples, and training the cross entropy loss function on the second relation vector representation after it is processed by the multi-class classifier.
Optionally, the second loss function is expressed as follows:

$$L_{CE}=-\sum_{i=1}^{N}\sum_{c=1}^{C}y_{i,c}\log\hat{y}_{i,c}$$

where y_{i,c} is the true relation label of the i-th sentence (1 if the sentence belongs to the c-th relation, 0 otherwise), ŷ_{i,c} is the probability output by the model that the i-th sentence belongs to the c-th relation, and C is the number of relation classes.
Optionally, the total loss function is:

L_total = (1 − λ)L_CE + λL_LabelsCL

where λ is a scalar weighting hyper-parameter.
Optionally, adding the special symbols comprises: first adding the special symbols [CLS] and [SEP] before and after the sentence respectively, and then adding special symbols at both ends of each entity in the sentence.
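The marker-insertion step can be sketched as follows; this assumes the token spans of the two entities are already known, and the helper name add_entity_markers and the exact marker strings are illustrative (they follow the naming of the embodiment described below).

```python
def add_entity_markers(tokens, head_span, tail_span):
    """Insert entity markers around the head and tail entities of a tokenised sentence.

    tokens    : list of tokens
    head_span : (start, end) token indices of the head entity, end exclusive
    tail_span : (start, end) token indices of the tail entity, end exclusive
    """
    # insert at the rightmost span first so the earlier indices stay valid
    spans = sorted(
        [(head_span, "<e1_start>", "<e1_end>"), (tail_span, "<e2_start>", "<e2_end>")],
        key=lambda s: s[0][0],
        reverse=True,
    )
    out = list(tokens)
    for (start, end), m_start, m_end in spans:
        out.insert(end, m_end)
        out.insert(start, m_start)
    return ["[CLS]"] + out + ["[SEP]"]

# e.g. add_entity_markers(["the", "engine", "powers", "the", "car"], (1, 2), (4, 5))
# -> ['[CLS]', 'the', '<e1_start>', 'engine', '<e1_end>', 'powers',
#     'the', '<e2_start>', 'car', '<e2_end>', '[SEP]']
```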
The technical effects of the invention are as follows:
the invention directly utilizes the label with the supervision data to construct the positive and negative examples of the contrast learning from the global and local angles, thereby greatly reducing the training time and the cost; the invention provides a supervised comparison learning model for constructing positive and negative examples from two angles of global and local based on labels, which not only considers the correctness of selection of the positive and negative examples, but also ensures that error-prone examples are trained, thereby obtaining richer and more accurate relation expression.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flow chart of a method in an embodiment of the invention;
FIG. 2 shows the F1-micro values corresponding to different values of k⁻ in an embodiment of the present invention;
FIG. 3 shows F1-micro values corresponding to different batch_size values in an embodiment of the present invention;
FIG. 4 is a t-SNE distribution plot of the sample representations before training in an embodiment of the present invention;
FIG. 5 is a t-SNE distribution plot of the sample representations after training in an embodiment of the present invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that shown here.
Example 1
As shown in FIGS. 1-5, the present embodiment provides a supervised relation extraction method based on label contrast learning, which comprises:
Step one: the special symbols [CLS] and [SEP] are added before and after the sentence, the special symbols <e1_start>, <e1_end>, <e2_start>, <e2_end> are added before and after the two entities in the sentence, and the processed sentence is passed through the embedding layer to obtain the sentence vector representation;
Step two: the sentence is passed through the BERT coding layer, the vector representations of the special symbols before the two entities are taken as h_e1start and h_e2start, and these two special-symbol representations are concatenated together as the initial relation vector representation h = [h_e1start; h_e2start];
Step three: a denser relation representation vector h' is obtained after a fully connected layer; the cosine similarity between h' and the relation representations of the other samples is computed; finally, the other samples belonging to the same relation as the training sample are sorted from low to high similarity, the other samples belonging to different relations are sorted from high to low similarity, and two candidate dictionaries, one for positive and one for negative examples, are created from the global perspective;
Step four: training with the contrastive learning loss function and the cross entropy loss function is performed, so as to obtain a more accurate relation representation.
For existing supervised contrastive learning, positive and negative examples are usually constructed only from the local perspective: the samples in the batch that belong to the same relation as the training sample are chosen as positives, and all other samples in the batch except the training sample are chosen as negatives. Specifically, a batch contains N samples {x_i, y_i}_{i=1,...,N}, and N_{y_i} denotes the number of samples in the batch whose label is the same relation y_i, including x_i itself. For a training sample x_i, every sample x_j in the batch that belongs to the same relation (i.e. y_j = y_i) can form a positive pair with it, so there are N_{y_i} − 1 positive pairs; negative pairs are formed by pairing x_i with every other sample x_k in the batch, giving N − 1 pairs. Here Φ(·) denotes the output representation of a sentence obtained by model encoding and τ > 0 is an adjustable scalar temperature parameter. The contrastive training objective built on this idea, L_{base-CL}, is:

$$L_{base\text{-}CL}=\sum_{i=1}^{N}\frac{-1}{N_{y_i}-1}\sum_{\substack{j=1,\,j\neq i\\ y_j=y_i}}^{N}\log\frac{\exp\big(\Phi(x_i)\cdot\Phi(x_j)/\tau\big)}{\sum_{k=1,\,k\neq i}^{N}\exp\big(\Phi(x_i)\cdot\Phi(x_k)/\tau\big)}$$

Because the samples that fall into one batch are random and are affected by the batch-size hyper-parameter, constructing positives and negatives only from the local perspective is not stable: a training sample may find no positive of the same relation within its batch and then cannot take part in training the contrastive loss at all, and the samples prone to misclassification (namely the positives with low similarity to the training sample and the negatives with high similarity to it) are hard to guarantee to appear in the same batch as the training sample and be selected for contrastive learning. The invention therefore adds the construction of positives and negatives from the global perspective on top of the conventional supervised contrastive positives and negatives. The resulting contrastive training model, built from both the global and the local perspective, preserves the randomness of positives and negatives at the local level, ensures from the global level that every training sample participates in contrastive training, and at the same time deliberately selects the samples that are easy to misclassify to take part in contrastive training.
Specifically, the same-relation sample with the lowest similarity to the training sample is added as a global positive, so the number of positive pairs grows from N_{y_i} − 1 to N_{y_i}. More importantly, this design ensures that every training sample in the batch has a corresponding positive regardless of the batch size and of sampling randomness, so that it can always participate in training the contrastive loss. On the negative side, according to the setting of the hyper-parameter k⁻, the Top-k⁻ samples that belong to different relations from the training sample and have the highest similarity to it are added as negatives. The global positive is also placed in the contrast set, which forces the model to strip the true positive away from these similar samples and to learn correct and accurate positives and negatives; the set of samples contrasted against, Total(i), thus contains N + k⁻ samples. By adding the global positives and negatives, the selection for the loss function is no longer confined to one batch, and the samples that are easy to misclassify are deliberately chosen to form positive and negative pairs for contrastive training; the model can thus learn the information in the correct samples in a more targeted way from the global perspective, correct the erroneous information contained in the current relation representations, widen the coverage of each relation representation, and guarantee its correctness. The final contrastive learning loss function is as follows:
$$L_{LabelsCL}=\sum_{i=1}^{N}\frac{-1}{N_{y_i}}\sum_{x_p\in P(i)}\log\frac{\exp\big(\Phi(x_i)\cdot\Phi(x_p)/\tau\big)}{\sum_{x_a\in Total(i)}\exp\big(\Phi(x_i)\cdot\Phi(x_a)/\tau\big)}$$

where P(i) is the set of positives of sample x_i, namely the in-batch samples of the same relation together with the global positive, and Total(i) comprises three parts: 1) the global positive sample; 2) the Top-k⁻ negative samples that belong to different relations and have the highest similarity; 3) the other samples x_1, ..., x_{i−1}, x_{i+1}, ..., x_N in the batch apart from the training sample. Total(i) therefore contains k⁻ + N samples in total.
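A possible PyTorch realisation of this loss, under the description above (in-batch same-label positives plus one global positive per sample; the other in-batch samples, the global positive and the Top-k⁻ global negatives in the denominator), might look as follows. The tensor layout and the function name labels_cl_loss are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def labels_cl_loss(batch_reps, batch_labels, global_pos, global_neg, tau=0.05):
    """Label-based contrastive loss with local (in-batch) and global positives/negatives.

    batch_reps   : (N, d)    relation representations h' of the current batch
    batch_labels : (N,)      relation labels of the batch
    global_pos   : (N, d)    one global positive per sample (lowest-similarity same-relation sample)
    global_neg   : (N, K, d) Top-k^- highest-similarity different-relation samples per sample
    """
    z = F.normalize(batch_reps, dim=-1)
    zp = F.normalize(global_pos, dim=-1)
    zn = F.normalize(global_neg, dim=-1)
    N = z.size(0)
    eye = torch.eye(N, dtype=torch.bool, device=z.device)

    # similarities against in-batch samples, the global positive and the global negatives
    sim_batch = z @ z.t() / tau                              # (N, N)
    sim_gpos = (z * zp).sum(-1, keepdim=True) / tau          # (N, 1)
    sim_gneg = torch.einsum("nd,nkd->nk", z, zn) / tau       # (N, K)
    logits = torch.cat([sim_batch, sim_gpos, sim_gneg], dim=1)

    # the sample itself is excluded from the contrast set Total(i)
    self_mask = torch.zeros_like(logits, dtype=torch.bool)
    self_mask[:, :N] = eye
    logits = logits.masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # positives P(i): same-label in-batch samples (excluding self) plus the global positive
    pos_mask = torch.zeros_like(logits, dtype=torch.bool)
    pos_mask[:, :N] = (batch_labels.unsqueeze(0) == batch_labels.unsqueeze(1)) & ~eye
    pos_mask[:, N] = True

    n_pos = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / n_pos)
    return loss.mean()
```

Note that this sketch averages over the batch where the formula above sums over i; this scaling does not change the optimisation direction.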
Furthermore, to inherit the strong language understanding capability of BERT, a cross entropy loss is also added to the model to help it learn the correct relation representation; this training objective is denoted L_CE. Here y_{i,c} is the true relation label of the i-th sentence and ŷ_{i,c} is the probability, output by the model, that the i-th sentence belongs to the c-th relation. The sentence is passed through the pre-trained BERT model and the fully connected layer to obtain the relation representation vector h'_i, and through the multi-class classifier Softmax to obtain ŷ_i:

$$L_{CE}=-\sum_{i=1}^{N}\sum_{c=1}^{C}y_{i,c}\log\hat{y}_{i,c}$$
In LabelsCL, the training objective consists of two parts: the cross entropy loss and the contrastive learning loss. λ is a scalar weighting hyper-parameter used to adjust the weights of the two objectives. The total loss function L_total is their weighted combination:

L_total = (1 − λ)L_CE + λL_LabelsCL
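The two objectives can be combined in a single training step as sketched below; the classifier head, the λ value and the number of relation classes are illustrative, and labels_cl_loss refers to the loss sketch given earlier.

```python
import torch.nn as nn

num_relations = 19                           # e.g. SemEval-2010 Task 8: 2 x 9 directed relations + "Other"
classifier = nn.Linear(256, num_relations)   # Softmax head over h' (256 matches the encoder sketch above)
ce_loss_fn = nn.CrossEntropyLoss()           # cross entropy loss L_CE (Softmax applied internally)
lam = 0.5                                    # scalar weighting hyper-parameter lambda (illustrative value)

def training_step(h_prime, labels, global_pos, global_neg):
    logits = classifier(h_prime)
    l_ce = ce_loss_fn(logits, labels)
    l_cl = labels_cl_loss(h_prime, labels, global_pos, global_neg)
    return (1 - lam) * l_ce + lam * l_cl     # L_total = (1 - λ) L_CE + λ L_LabelsCL
```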
the invention is carried out under the hardware environment of display card RTX5000 and display memory of 16G. The system is Ubuntu20.04, the development language is Python3.7, and the deep learning framework is Pytorch1.8. The specific parameter settings are shown in table 1.
TABLE 1 (specific parameter settings; the table content appears only as an image in the original document)
The English dataset SemEval-2010 Task 8 is adopted as the dataset for supervised relation extraction. The dataset was downloaded from OpenNRE; its specific details are shown in Table 2. The dataset has 9+1 relations: 9 relations are directional, e.g. 'Component-Whole(e1,e2)' and 'Component-Whole(e2,e1)' are counted as two relation types when the relation is the same but the order of the head and tail entities is swapped, while the special relation 'Other' is undirected and its type does not change when the entity order is swapped.
TABLE 2 (details of the SemEval-2010 Task 8 dataset; the table content appears only as an image in the original document)
Meanwhile, in order to simulate more data-scarce situations, the training set is further divided by proportion into 1%, 10% and 100% subsets; the specific sample numbers are shown in Table 3.
TABLE 3 (sample counts of the 1%, 10% and 100% training subsets; the table content appears only as an image in the original document)
The PRF evaluation metrics are adopted to analyse the experimental results, where P is precision, R is recall and F is the F1 value. The F1 value is the harmonic mean of P and R, micro-averaged over classes to reflect the overall performance of the model: F1-micro = 2 × P × R / (P + R).
The model of the invention, LabelsCL, is compared with MTB, CP, BERT, RoBERTa, ERICA_BERT, ERICA_RoBERTa and FineCL. For fairness, the same strategy as ERICA and FineCL is adopted: the final result is the average over 5 runs with exactly the same seed settings, the seed values being 42, 43, 44, 45 and 46. The comparative results of the models, trained on different proportions of the training data, are shown in Table 4. When the training data proportion is 10% and 100%, the model outperforms the other models, reaching average F1 values of 81.7% and 88.9% respectively; however, its advantage is not obvious when only 1% of the training data is used. This is because the model does not follow the strategy of pre-training on a large-scale unsupervised or semi-supervised relation extraction dataset and then fine-tuning on SemEval-2010; instead, the relation representation is obtained directly through the BERT pre-trained model and fine-tuned directly on SemEval-2010. It is therefore presumed that with 1% of the training data the amount of data is insufficient and the coverage of the representations is not wide enough, which leads to the unsatisfactory result.
TABLE 4 (comparison of LabelsCL with the baseline models; the table content appears only as an image in the original document)
The invention designs a contrastive learning model that selects positive and negative examples from two perspectives: 1) the global perspective, in which positives and negatives are selected according to similarity extremes (the lowest-similarity same-relation sample and the highest-similarity different-relation samples); 2) the local perspective, in which the samples in the batch that have the same relation as the training sample are taken as positives and all other samples in the batch except the training sample are taken as negatives.
TABLE 5 (ablation of the global and local positive/negative selection schemes; the table content appears only as an image in the original document)
All experimental results in Table 5 were obtained under the same seed settings. The F1 value of the combined global + local positive/negative selection scheme is 88.9%. When only the local scheme is adopted, the F1 value drops by 0.7%; when only the global scheme is used, the F1 value drops by 0.3%. The global scheme focuses on correctly classifying the samples most prone to misclassification, while the local scheme aims at correctly classifying randomly selected samples, so the representations learned by the two schemes differ.
The purpose of this experiment is to observe the effect of the hyper-parameters on the results. k⁻ denotes the number of negatives selected in the global scheme, namely the Top-k⁻ samples that do not belong to the same relation as the training sample but have the highest similarity to it. With only the lowest-similarity same-relation sample chosen as the global positive, the optimal number of global negatives is explored. As shown in FIG. 2, the result is best when k⁻ = 2. The experimental results show that a larger k⁻ is not always better: when k⁻ is too large, the model trained on the training set may overfit and lose generalisation ability; when k⁻ is too small, the model is under-trained and cannot separate the different relation representations well.
The batch size greatly affects the number of positive and negative examples selected from the local perspective. However, as can be seen from the experimental results in FIG. 3, a larger batch size does not always give a better model: the F1 value at batch size = 16 is 0.2% higher than that at batch size = 32, so the model learns well even with a smaller batch size, which demonstrates its practicality.
In order to observe whether the training objective is reached on the SemEval-2010 dataset after training with the contrastive learning model designed by the invention, experiment four uses t-SNE to visualise the relation representations before and after learning, so that the distribution of the representations before and after training can be seen intuitively.
Four relations in the SemEval-2010 dataset are selected for display, taken from the file 'rel2id.json' as follows: the four ids 1, 3, 9 and 12 in FIGS. 4 and 5 correspond to the relations "Component-Whole(e1,e2)": 1, "Component-Whole(e2,e1)": 12, "Member-Collection(e1,e2)": 3 and "Member-Collection(e2,e1)": 9.
In the corresponding triples (head entity, relation, tail entity), relations 1 and 12 express the same relation with the order of the head and tail entities reversed. As shown in FIG. 4, their initial representations before training (i.e. the representations obtained from the BERT pre-trained model) are very close and easy to confuse. After training with the contrastive learning model designed by the invention (see FIG. 5), this problem is solved well.
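A t-SNE projection like the ones in FIG. 4 and FIG. 5 can be produced with a few lines of scikit-learn; the relation ids kept here are the four mentioned above, and the function name plot_tsne and the variable names in the usage comments are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(reps, labels, keep_ids=(1, 3, 9, 12), title="relation representations"):
    """Project relation representations to 2-D with t-SNE and colour them by relation id."""
    labels = np.asarray(labels)
    mask = np.isin(labels, keep_ids)
    points = TSNE(n_components=2, random_state=42).fit_transform(np.asarray(reps)[mask])
    plt.scatter(points[:, 0], points[:, 1], c=labels[mask], s=8)
    plt.title(title)
    plt.show()

# plot_tsne(reps_before_training, relation_ids, title="before training")   # cf. FIG. 4
# plot_tsne(reps_after_training, relation_ids, title="after training")     # cf. FIG. 5
```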
The invention provides a supervised relation extraction model based on label contrast learning, which directly uses the labels of the supervised data to construct the positive and negative examples for contrastive learning from both the global and the local perspective, thereby greatly reducing training time and cost.
The invention provides a supervised contrastive learning model that constructs positive and negative examples from both a global and a local perspective based on labels, which not only accounts for the correctness of positive and negative example selection but also ensures that error-prone examples are trained on, thereby obtaining richer and more accurate relation representations.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A supervised relation extraction method based on label contrast learning, characterized by comprising the following steps:
obtaining the sentences to be subjected to relation extraction in a sample set, adding special symbols, and encoding the processed sentences through a coding layer to obtain vector representations containing the special symbols;
taking the entities in the sentence as anchors, selecting the special-symbol representation immediately before each entity and concatenating them to obtain a first relation vector representation; passing the first relation vector representation through a fully connected layer to obtain a second relation vector representation;
constructing positive and negative examples based on the second relation vector representation;
and defining the loss function based on the positive and negative examples, performing contrastive training to obtain an encoder for identifying relation representations, and extracting relations with the encoder.
2. The supervised relation extraction method based on label contrast learning as recited in claim 1, wherein
the process of constructing the positive and negative examples comprises: computing the similarity between all samples, constructing a global positive/negative example candidate dictionary from the global perspective according to the similarity, and constructing local positive and negative examples according to the sample labels within the batch.
3. The supervised relation extraction method based on label contrast learning as recited in claim 2, wherein
the process of constructing the global positive/negative example candidate dictionary further comprises: computing the cosine similarity between the second relation vector representation and the relation vector representations of the other samples, sorting the other samples belonging to the same relation as the sentence to be subjected to relation extraction from low to high similarity to obtain positive sample candidates, sorting the other samples belonging to different relations from high to low similarity to obtain negative sample candidates, and thereby constructing global positive and negative examples from the global perspective.
4. The supervised relation extraction method based on label contrast learning as recited in claim 1, wherein
the first loss function L_LabelsCL in the contrastive training process is:

$$L_{LabelsCL}=\sum_{i=1}^{N}\frac{-1}{N_{y_i}}\sum_{x_p\in P(i)}\log\frac{\exp\big(\Phi(x_i)\cdot\Phi(x_p)/\tau\big)}{\sum_{x_a\in Total(i)}\exp\big(\Phi(x_i)\cdot\Phi(x_a)/\tau\big)}$$

where Total(i) is the set of samples contrasted against sample x_i, P(i) is its set of positive examples, N_{y_i} denotes the number of samples in the batch whose label is the same relation y_i, Φ(·) denotes the output representation of the sentence after model encoding, τ > 0 is an adjustable scalar temperature parameter, and N is the total number of samples in the batch.
5. The supervised relation extraction method based on label contrast learning as recited in claim 1, wherein
the training process of the contrastive relation comprises: training the contrastive learning loss function with the positive and negative examples, and training the cross entropy loss function on the second relation vector representation after it is processed by the multi-class classifier.
6. The supervised relation extraction method based on label contrast learning as recited in claim 4, wherein
the second loss function is expressed as follows:

$$L_{CE}=-\sum_{i=1}^{N}\sum_{c=1}^{C}y_{i,c}\log\hat{y}_{i,c}$$

where y_{i,c} is the true relation label of the i-th sentence and ŷ_{i,c} is the probability, output by the model, that the i-th sentence belongs to the c-th relation.
7. The supervised relation extraction method based on label contrast learning as recited in claim 6, wherein
the total loss function is:
L_total = (1 − λ)L_CE + λL_LabelsCL
where λ is a scalar weighting hyper-parameter.
8. The supervised relation extraction method based on label contrast learning as recited in claim 1, wherein
the process of adding the special symbols comprises: first adding the special symbols [CLS] and [SEP] before and after the sentence respectively, and then adding special symbols at both ends of each entity in the sentence.
CN202310410923.3A 2023-04-18 2023-04-18 Supervised relation extraction method based on label contrast learning Active CN116431831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310410923.3A CN116431831B (en) 2023-04-18 2023-04-18 Supervised relation extraction method based on label contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310410923.3A CN116431831B (en) 2023-04-18 2023-04-18 Supervised relation extraction method based on label contrast learning

Publications (2)

Publication Number Publication Date
CN116431831A true CN116431831A (en) 2023-07-14
CN116431831B CN116431831B (en) 2023-09-22

Family

ID=87088738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310410923.3A Active CN116431831B (en) 2023-04-18 2023-04-18 Supervised relation extraction method based on label contrast learning

Country Status (1)

Country Link
CN (1) CN116431831B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084790A (en) * 2020-09-24 2020-12-15 中国民航大学 Relation extraction method and system based on pre-training convolutional neural network
CN113011427A (en) * 2021-03-17 2021-06-22 中南大学 Remote sensing image semantic segmentation method based on self-supervision contrast learning
CN114386437A (en) * 2022-01-13 2022-04-22 延边大学 Mid-heading translation quality estimation method and system based on cross-language pre-training model
CN115270761A (en) * 2022-07-28 2022-11-01 中国人民解放军国防科技大学 Relation extraction method fusing prototype knowledge
CN115496072A (en) * 2022-09-19 2022-12-20 重庆中国三峡博物馆 Relation extraction method based on comparison learning
CN115630164A (en) * 2022-10-14 2023-01-20 匀熵智能科技(无锡)有限公司 Remote supervision relation extraction method based on positive and negative direction joint learning and prototype representation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084790A (en) * 2020-09-24 2020-12-15 中国民航大学 Relation extraction method and system based on pre-training convolutional neural network
CN113011427A (en) * 2021-03-17 2021-06-22 中南大学 Remote sensing image semantic segmentation method based on self-supervision contrast learning
CN114386437A (en) * 2022-01-13 2022-04-22 延边大学 Mid-heading translation quality estimation method and system based on cross-language pre-training model
CN115270761A (en) * 2022-07-28 2022-11-01 中国人民解放军国防科技大学 Relation extraction method fusing prototype knowledge
CN115496072A (en) * 2022-09-19 2022-12-20 重庆中国三峡博物馆 Relation extraction method based on comparison learning
CN115630164A (en) * 2022-10-14 2023-01-20 匀熵智能科技(无锡)有限公司 Remote supervision relation extraction method based on positive and negative direction joint learning and prototype representation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孟先艳 (Meng Xianyan): "Multilingual text classification method based on bidirectional long short-term memory units and convolutional neural networks" (title abridged in the original), Application Research of Computers (《计算机应用研究》), vol. 37, no. 9 *
王鼎乾 (Wang Dingqian): "A comparative study of supervised entity relation extraction methods based on deep learning", Journal of Computer Applications (《计算机应用》), no. 7 *

Also Published As

Publication number Publication date
CN116431831B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
Xie et al. Representation learning of knowledge graphs with hierarchical types.
Gao et al. Neural snowball for few-shot relation learning
CN108829801B (en) Event trigger word extraction method based on document level attention mechanism
CN111950269A (en) Text statement processing method and device, computer equipment and storage medium
CN111325264A (en) Multi-label data classification method based on entropy
CN111309918A (en) Multi-label text classification method based on label relevance
CN113434858A (en) Malicious software family classification method based on disassembly code structure and semantic features
CN117273134A (en) Zero-sample knowledge graph completion method based on pre-training language model
Wu et al. Knowledge representation via joint learning of sequential text and knowledge graphs
CN112948588B (en) Chinese text classification method for quick information editing
CN111191033A (en) Open set classification method based on classification utility
Gao et al. REPRESENTATION LEARNING OF KNOWLEDGE GRAPHS USING CONVOLUTIONAL NEURAL NETWORKS.
Yue et al. Similarity Makes Difference: SSHTN for Generalized Zero-Shot Industrial Fault Diagnosis by Leveraging Auxiliary Set
CN116431831B (en) Supervised relation extraction method based on label contrast learning
Deng et al. Chinese triple extraction based on bert model
Liu et al. Learning term embeddings for lexical taxonomies
Xu et al. Multi text classification model based on bret-cnn-bilstm
Fan et al. Multi-label Chinese question classification based on word2vec
Wang et al. Dior: Learning to hash with label noise via dual partition and contrastive learning
Dyballa et al. A separability-based approach to quantifying generalization: which layer is best?
CN115034213B (en) Joint learning-based method for recognizing prefix and suffix negative words
Su et al. Ensemble learning for question classification
Wu et al. Intelligent Text Location Based on Multi Model Fusion
Tang et al. None-Negative Graph Contrastive Learning for Knowledge-Driven Zero-Shot Learning
Wang et al. Supervised Relation Extraction Based on Labels Contrastive Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant