CN117764160A - Training method and device for relation extraction model, electronic equipment and storage medium - Google Patents
- Publication number
- CN117764160A (application CN202311651011.1A)
- Authority
- CN
- China
- Prior art keywords
- sample
- noise
- sample text
- extraction model
- relation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention provides a training method and apparatus for a relation extraction model, an electronic device and a storage medium. Noise sample text and pure sample text are determined from a sample data set; based on an initial relation extraction model, noise hiding features of the noise sample text and pure hiding features of the pure sample text are respectively determined, and a predicted entity relation of the noise sample text is determined based on the noise hiding features; parameter iteration is then performed on the initial relation extraction model based on the predicted entity relation of the noise sample text together with the pure hiding features and sample entity relations of the pure sample text, yielding the trained relation extraction model. This overcomes the defect of current schemes that discarding noise samples loses useful information and leaves model training insufficient and its effect poor: the information in the noise sample text is fully exploited through pseudo labels, so that the model learns relation features better, the training effect is optimized, and model performance is improved.
Description
Technical Field
The present invention relates to the field of machine learning technologies, and in particular, to a training method and apparatus for a relationship extraction model, an electronic device, and a storage medium.
Background
Relationship extraction, a key link in the knowledge graph construction process, mines knowledge by predicting the relationships between entities. However, most supervised relation extraction techniques require a large amount of labeled data, which is difficult to obtain manually. To address this problem, remote supervision has been proposed, which automatically generates a labeled text corpus by aligning plain text with a knowledge base. However, because of this automatic annotation process, remotely supervised training data often suffers from a high rate of label noise.
To mitigate the noise impact of remote supervision, relation extraction models are currently trained with a multi-instance learning framework, or a modification of it, so that they can identify relation labels at the bag level; however, such models are still poor at precisely mapping each sentence in a bag to a sentence-level label. Remotely supervised relation extraction models focusing on sentence-level improvement have therefore been proposed, but they share a serious problem: when facing noise samples, they simply discard them all, and directly filtering out noise samples loses useful information, so that the model learns insufficiently and the training effect is poor.
Disclosure of Invention
The invention provides a training method and apparatus for a relation extraction model, an electronic device and a storage medium, which are used to remedy the defect in the prior art that discarding noise samples loses useful information, leaving model training insufficient and its effect poor.
The invention provides a training method of a relation extraction model, which comprises the following steps:
determining a sample data set, and determining noise sample text and pure sample text from the sample data set, wherein the sample data set comprises a plurality of sample texts and sample entity relations corresponding to the sample texts;
based on an initial relation extraction model, respectively determining noise hiding features of the noise sample text and pure hiding features of the pure sample text, and based on the noise hiding features, determining a predicted entity relation of the noise sample text;
and carrying out parameter iteration on the initial relation extraction model based on the predicted entity relation of the noise sample text and the pure hidden characteristic and the sample entity relation of the pure sample text to obtain a relation extraction model.
According to the training method of the relation extraction model provided by the invention, the initial relation extraction model comprises a feature extraction model, a first classification model and a second classification model, and the predicted entity relation comprises a first predicted entity relation and a second predicted entity relation;
the method for extracting the model based on the initial relation respectively determines the noise hiding characteristic of the noise sample text and the pure hiding characteristic of the pure sample text, and determines the predicted entity relation of the noise sample text based on the noise hiding characteristic comprises the following steps:
respectively determining noise hiding features of the noise sample text and pure hiding features of the pure sample text based on a feature extraction model in the initial relation extraction model;
determining a first predicted entity relationship of the noise sample text based on a first classification model in the initial relationship extraction model and the noise concealment feature;
a second predicted entity relationship of the noise sample text is determined based on a second classification model in the initial relationship extraction model and the noise concealment feature.
According to the training method of the relation extraction model provided by the invention, the first predicted entity relation of the noise sample text is determined based on the first classification model in the initial relation extraction model and the noise hiding characteristic; determining a second predicted entity relationship for the noise sample text based on a second classification model in the initial relationship extraction model and the noise concealment feature, comprising:
Determining a first predicted entity relationship for the noise sample text based on the first classification model and a first enhancement feature;
determining a second predicted entity relationship for the noise sample text based on the second classification model and a second enhancement feature;
the first enhancement feature and the second enhancement feature are noise concealment features determined by the feature extraction model based on the noise sample text at a first neuron loss rate and a second neuron loss rate, respectively;
the first neuron loss rate is less than the second neuron loss rate.
According to the training method of the relation extraction model provided by the invention, the predicted entity relation based on the noise sample text, and the pure hidden feature and the sample entity relation of the pure sample text, the parameter iteration is carried out on the initial relation extraction model to obtain the relation extraction model, and the method comprises the following steps:
determining a label-free sample loss based on a first predicted entity relationship and a second predicted entity relationship of the predicted entity relationships;
feature mixing is carried out on the pure hidden features of different pure sample texts to obtain a plurality of pure mixed features;
Determining a contrast loss based on each pure mixing feature and the corresponding sample entity relationship thereof;
and carrying out parameter iteration on the initial relation extraction model based on the label-free sample loss and the contrast loss to obtain a relation extraction model.
According to the training method of the relation extraction model provided by the invention, the unlabeled sample loss is determined based on the following formula:

$$\mathcal{L}_u=\frac{1}{\mu B}\sum_{b=1}^{\mu B}\mathbb{1}\left(\max(q_b)>\tau_t(\hat{q}_b)\right)\,\ell\left(\hat{q}_b,\ \psi_t\left(\theta_{s,t}(u_b)\right)\right)$$

where $\mathcal{L}_u$ is the unlabeled sample loss; $\mu B$ is the number of noise sample texts $u$; $q_b$ represents the first predicted entity relationship of the $b$-th noise sample text and $\hat{q}_b=\arg\max(q_b)$ is the pseudo label derived from it; $\theta_{s,t}$ is the feature extraction model of the $t$-th iteration applied at the second neuron loss rate; $\psi_t$ is the second classification model of the $t$-th iteration, so that $\psi_t(\theta_{s,t}(u_b))$ represents the second predicted entity relationship; $\ell$ is the cross entropy loss function; and $\sigma_t(c)$ represents the number of noise sample texts whose prediction confidence in the first predicted entity relation exceeds the threshold $\tau$ for relation class $c$, from which the class-wise threshold $\tau_t(c)$ is derived.
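Under the definitions above, the unlabeled sample loss is a confidence-filtered cross entropy between the first classifier's pseudo labels and the second classifier's predictions. A minimal NumPy sketch using a fixed (rather than class-adaptive) threshold, with all names being illustrative assumptions:

```python
import numpy as np

def unlabeled_loss(probs_weak, logits_strong, tau=0.95):
    """Cross entropy between confident pseudo labels (first classifier on the
    weak view) and the second classifier's predictions on the strong view,
    averaged over all muB noise sample texts."""
    pseudo = probs_weak.argmax(axis=1)            # pseudo label per noise sample
    keep = probs_weak.max(axis=1) > tau           # confidence mask
    z = logits_strong - logits_strong.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_softmax[np.arange(len(pseudo)), pseudo]
    return float((ce * keep).sum() / len(pseudo))

# One confident and one unconfident noise sample: only the first contributes.
probs_weak = np.array([[0.99, 0.01], [0.60, 0.40]])
logits_strong = np.array([[5.0, 0.0], [0.0, 5.0]])
loss = unlabeled_loss(probs_weak, logits_strong)
```

Lowering the threshold admits the second, poorly matched sample and the loss grows accordingly.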
According to the training method of the relation extraction model provided by the invention, the contrast loss is determined based on the following formula:

$$z_i=\lambda z_a+(1-\lambda)z_b,\qquad y_i\in\{y_a,y_b\}$$

$$\mathcal{L}_c=-\frac{1}{N}\sum_{i=1}^{N}\frac{1}{N_{y_i}-1}\sum_{\substack{j\neq i\\ y_j=y_i}}\log\frac{\exp\left(z_i\cdot z_j/\tau\right)}{\sum_{r\neq i}\exp\left(z_i\cdot z_r/\tau\right)}$$

where $\mathcal{L}_c$ is the contrast loss; $N$ is the number of clean sample texts; $\lambda\in[0,1]$ is randomly sampled from a Beta distribution, $\lambda\sim\mathrm{Beta}(\alpha_m,\alpha_m)$, with $\alpha_m$ a hyper-parameter; $z_i$ is the pure mixed feature obtained by mixing $z_a$ and $z_b$, which are the pure hidden features of two different clean sample texts; $y_a$ is the sample entity relationship of the clean sample text corresponding to $z_a$, $y_b$ is the sample entity relationship of the clean sample text corresponding to $z_b$, and both $y_a$ and $y_b$ serve as sample entity relationships of the pure mixed feature $z_i$; $N_{y_i}$ represents the number of clean sample texts in a batch with sample entity relationship $y_i$; $z_j$ is the clean hidden feature of a clean sample text with sample entity relationship $y_j$; $z_r$ is the clean hidden feature of a clean sample text with sample entity relationship $y_r$, where $y_r=y_i$ or $y_r\neq y_i$; and $\tau$ is the temperature hyper-parameter.
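The feature mixing and contrast loss can be sketched as follows: `mix_features` draws λ from Beta(α_m, α_m) and interpolates two pure hidden features, and `contrast_loss` is a standard supervised contrastive form consistent with the definitions above (function names and hyper-parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def mix_features(z_a, z_b, alpha_m=0.75):
    # lambda ~ Beta(alpha_m, alpha_m); the mixed feature keeps both labels y_a, y_b
    lam = rng.beta(alpha_m, alpha_m)
    return lam * z_a + (1.0 - lam) * z_b

def contrast_loss(Z, y, tau=0.5):
    """Supervised contrastive loss over clean hidden features Z with labels y."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sim = np.exp(Z @ Z.T / tau)
    np.fill_diagonal(sim, 0.0)                 # exclude r = i from the denominator
    total = 0.0
    for i in range(len(y)):
        pos = (y == y[i]) & (np.arange(len(y)) != i)
        if pos.any():
            total += -np.log(sim[i, pos] / sim[i].sum()).mean()
    return total / len(y)
```

Features sharing a sample entity relationship are pulled together, so a labeling consistent with the feature clusters yields a lower loss than an inconsistent one.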
According to the training method of the relation extraction model provided by the invention, the noise sample text and the clean sample text are determined from the sample data set, and the training method comprises the following steps:
determining hidden features of each sample text in the sample dataset based on the initial relationship extraction model;
determining a K-nearest-neighbor graph corresponding to each sample text based on the hidden features of each sample text;
and determining noise sample text and clean sample text from the sample texts based on the K-nearest-neighbor graph, and discarding the sample entity relations of the noise sample text.
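A minimal sketch of K-nearest-neighbor noise screening in hidden-feature space; here a simple majority-vote rule over each sample's k neighbours stands in for the K-nearest-neighbor graph, and the rule and names are illustrative assumptions:

```python
import numpy as np

def knn_noise_flags(H, labels, k=3):
    """Flag a sample as noisy when its label disagrees with the majority label
    among its k nearest neighbours in hidden-feature space."""
    d = np.linalg.norm(H[:, None, :] - H[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                   # a sample is not its own neighbour
    flags = np.zeros(len(H), dtype=bool)
    for i in range(len(H)):
        nn = np.argsort(d[i])[:k]
        votes = np.bincount(labels[nn], minlength=labels.max() + 1)
        flags[i] = votes[labels[i]] < votes.max() # label loses the neighbourhood vote
    return flags
```

Flagged samples keep their text but drop their sample entity relation, becoming the unlabeled noise set.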
The invention also provides a training device of the relation extraction model, which comprises the following steps:
the system comprises a determining unit, a determining unit and a processing unit, wherein the determining unit is used for determining a sample data set and determining noise sample text and pure sample text from the sample data set, and the sample data set comprises a plurality of sample texts and sample entity relations corresponding to the sample texts;
the prediction unit is used for respectively determining noise hiding features of the noise sample text and pure hiding features of the pure sample text based on an initial relation extraction model, and determining a predicted entity relation of the noise sample text based on the noise hiding features;
and the training unit is used for carrying out parameter iteration on the initial relation extraction model based on the predicted entity relation of the noise sample text, the pure hidden characteristic of the pure sample text and the sample entity relation to obtain a relation extraction model.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the training method of the relation extraction model according to any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of training a relational extraction model as described in any of the above.
According to the training method and apparatus, electronic device and storage medium for a relation extraction model provided by the invention, noise sample text and pure sample text are determined from a sample data set; noise hiding features of the noise sample text and pure hiding features of the pure sample text are respectively determined through an initial relation extraction model, and the predicted entity relation of the noise sample text is determined from the noise hiding features; parameter iteration is then performed on the initial relation extraction model according to the predicted entity relation of the noise sample text together with the pure hiding features and sample entity relations of the pure sample text, yielding the relation extraction model. This overcomes the defect that discarding noise samples, as current schemes do, loses useful information and leaves model training insufficient and its effect poor: the information in the noise sample text is fully exploited through pseudo labels, the model learns relation features better, the training effect is optimized, and model performance is improved.
Drawings
In order to more clearly illustrate the technical solutions of the invention or of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. It is obvious that the drawings in the following description show some embodiments of the invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a training method of a relation extraction model provided by the invention;
FIG. 2 is a schematic diagram of a training process of a relational extraction model provided by the invention;
FIG. 3 is a schematic diagram of a training device for a relational extraction model according to the present invention;
fig. 4 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Currently, most supervised relation extraction techniques require a large amount of labeled data, which is often difficult to obtain manually. To address this problem, remote supervision has been proposed: it automatically generates a labeled text corpus by aligning plain text with a knowledge base. For example, if a sentence contains both the subject s and the object o of a relation triplet <s, r, o> (<subject, relation, object>), the remote supervision method regards the sentence as a valid sample of that relation triplet; otherwise, if no relation triplet applies, the sentence is labeled "NA". However, because of this automatic annotation process, remotely supervised datasets often suffer from a high rate of label noise in the training data.
To mitigate the effects of the noise labels caused by remote supervision, multi-instance learning (Multiple Instance Learning, MIL) frameworks, or modified MIL, are currently used to train the relation extraction model. However, although MIL-based techniques can identify bag-level relation labels, they are still poor at precisely mapping each sentence in a bag to a sentence-level label. In this regard, some studies have focused on sentence-level improvement and have empirically demonstrated the inadequacy of bag-level approaches under sentence-level evaluation. These methods nevertheless share a serious problem: when facing noise samples, they simply discard them all and train the relation extraction model on the selected samples only; directly filtering out the noise samples loses useful information, so the model learns insufficiently and its training effect is poor.
In this regard, the invention provides a training method of a relation extraction model under remote supervision of semi-supervised learning, which aims at generating a pseudo tag aiming at a noise sample text, and fully utilizing information in the noise sample text through the pseudo tag so as to enable the model to learn relation characteristics better, thereby optimizing training effect and realizing improvement of performance of the model. Fig. 1 is a flow chart of a training method of a relation extraction model provided by the invention, as shown in fig. 1, the method includes:
step 110, determining a sample data set, and determining noise sample text and pure sample text from the sample data set, wherein the sample data set comprises a plurality of sample texts and sample entity relations corresponding to the sample texts;
step 120, determining noise hiding features of the noise sample text and pure hiding features of the pure sample text based on the initial relation extraction model, respectively, and determining a predicted entity relation of the noise sample text based on the noise hiding features;
and 130, carrying out parameter iteration on the initial relation extraction model based on the predicted entity relation of the noise sample text and the pure hidden characteristics and the sample entity relation of the pure sample text to obtain a relation extraction model.
Specifically, before model training is performed, the training data must first be determined, that is, the sample texts required for model training and the label corresponding to each sample text, where the label is the entity relationship between the entity pair of a relation triplet, i.e. the sample entity relationship.
Here, the sample text required for training can be obtained through a remote supervision method, that is, the text corpus can be collected in advance, and then the text corpus is aligned with the knowledge base to obtain the text with the label, namely, the sample text, and the label, namely, the sample entity relation of the sample text.
Labeling is performed by matching the text corpus against the existing relation triples in the knowledge base, so that the entity relation of a matching triplet is assigned to the text, yielding labeled text, i.e. the sample text and its sample entity relation.
Here, the text corpus may consist of news reports, research reports and the like, which may be collected from news sites, journals, Wikipedia and other sources. The corpus may be of a single type or domain, or of multiple types or domains; the embodiment of the present invention places no particular limitation on this.
After the sample text and the sample entity relation are obtained, a training data set can be constructed accordingly and used for model training so as to obtain a relation extraction model after training is completed. Here, the training data set is a sample data set, which includes a plurality of sample texts, and a label of each sample text, i.e. a sample entity relationship.
Further, the sample data set obtained by the remote supervision method generally suffers from a high rate of noisy labels: the automatic labeling may introduce noise, for example because there may be multiple entity relationships, or no entity relationship at all, between two entities, and such noisy labels adversely affect subsequent model training. Therefore, in the embodiment of the invention, in order to remove the influence of the noise labels caused by remote supervision, the acquired sample data set can be processed to distinguish noisy data from clean data, yielding the noise sample text and the pure sample text.
Here, the noise sample text refers to sample text whose sample entity relationship is deviated or wrongly assigned; the pure sample text is the opposite, namely sample text whose sample entity relationship corresponds correctly, i.e. completely, to the text.
In the traditional scheme, when facing noise data, all noise samples are usually discarded outright to avoid their interference in subsequent model training; but directly filtering out the noise samples loses useful information, so that learning in subsequent training is insufficient and the training effect is poor. For this reason, in the embodiment of the invention, after the noise sample text and the pure sample text are distinguished, the labels of the noise sample text can be discarded while the samples themselves are retained, yielding unlabeled noise samples; this avoids interference while completely preserving the effective information in the noise samples.
To distinguish the noise sample text from the pure sample text, a K-nearest-neighbor method may be adopted: noisy data and clean data are separated within the sample data set, which is thereby divided into a labeled data set and an unlabeled data set.
The noise robust semi-supervised learning can then be used to learn the relationship extraction capability from the labeled and unlabeled datasets.
It can be understood that, after the noise sample text and the pure sample text are obtained, the initial model can be made to learn from both, so that it learns better relation features and predicts entity relations more accurately; model training is thereby completed and a trained relation extraction model is obtained.
Specifically, after distinguishing the noise samples from the pure samples, the noise sample text and the pure sample text may first be processed with an initial model, i.e. their hidden features are extracted by the initial model. This initial model is the initial relation extraction model, which is built on a semi-supervised learning framework and oriented toward sentence-level remotely supervised relation extraction.
Specifically, the initial relation extraction model is used to extract features from the noise sample text and the pure sample text respectively, so as to capture the effective text information they contain for subsequent relation prediction. That is, the noise sample text and the pure sample text can each be input into the initial relation extraction model, which performs feature extraction on the input sample texts, yielding the hidden features of the noise sample text, namely the noise hidden features, and the hidden features of the pure sample text, namely the pure hidden features.
In the embodiment of the invention, a pseudo tag can be generated for the noise sample text so as to fully utilize the information in the noise sample text through the pseudo tag, thereby better performing model training.
Specifically, after the noise hiding feature is extracted, pseudo tag generation can be performed according to the noise hiding feature, so as to obtain a pseudo tag of the noise sample text, namely a predicted entity relation of the noise sample text; here, specifically, after obtaining the noise hiding feature of the noise sample text, an initial relation extraction model is applied to perform relation prediction so as to predict the relation between the entities in the noise sample text, thereby obtaining a predicted entity relation.
Then, model training can be carried out based on the predicted entity relation of the noise sample text together with the pure hidden features and sample entity relations of the pure sample text, to obtain a trained entity relation extraction model. Here, the model learns from the labeled data set in a supervised manner, which makes it more robust to noise, and learns from the unlabeled data set in an unsupervised manner, i.e. through the pseudo labels, so that the model can fully utilize the information in the noise sample text, fully learn the relation features, optimize training and improve performance.
Specifically, the predicted entity relation of the noise sample text, and the pure hidden features and sample entity relations of the pure sample text, can be used to measure the model's learning loss on the labeled data set and the unlabeled data set respectively, yielding a supervised loss and an unsupervised loss. Parameter iteration is then performed on the initial relation extraction model according to these losses, so that the loss of the adjusted model in both the supervised and the unsupervised learning processes becomes as small as possible and more accurate relation features are learned; in subsequent application, these features allow the model to output the correct entity relation for an input text corpus.
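The parameter-iteration step just described — jointly minimizing a supervised loss on pure samples and a pseudo-label loss on noise samples — can be illustrated with a toy linear classifier trained by gradient descent (the data, sizes and learning rate here are hypothetical, not the invention's architecture):

```python
import numpy as np

rng = np.random.default_rng(2)
D, C, lr = 8, 3, 0.1                       # toy feature size, classes, step size

W = rng.normal(scale=0.1, size=(D, C))     # stand-in for the model parameters

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss_grad(X, y, W):
    # cross-entropy loss and its gradient for a linear classifier
    p = softmax(X @ W)
    loss = -np.log(p[np.arange(len(y)), y]).mean()
    p[np.arange(len(y)), y] -= 1.0
    return loss, X.T @ p / len(y)

X_pure = rng.normal(size=(16, D))                  # pure sample features
y_pure = rng.integers(0, C, size=16)               # sample entity relations
X_noise = rng.normal(size=(16, D))                 # unlabeled noise sample features
pseudo = softmax(X_noise @ W).argmax(axis=1)       # pseudo labels for noise samples

losses = []
for _ in range(100):                               # parameter iteration
    l_sup, g_sup = ce_loss_grad(X_pure, y_pure, W)
    l_unsup, g_unsup = ce_loss_grad(X_noise, pseudo, W)
    losses.append(l_sup + l_unsup)
    W -= lr * (g_sup + g_unsup)                    # descend on the combined loss
```

The combined loss falls over the iterations, showing how the supervised and pseudo-label terms jointly drive the parameter updates.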
According to the training method of the relation extraction model provided by the invention, noise sample text and pure sample text are determined from a sample data set; the noise hiding features of the noise sample text and the pure hiding features of the pure sample text are respectively determined through an initial relation extraction model, and the predicted entity relation of the noise sample text is determined from the noise hiding features; parameter iteration is then performed on the initial relation extraction model according to the predicted entity relation of the noise sample text together with the pure hiding features and sample entity relations of the pure sample text, yielding the relation extraction model. This overcomes the defect that discarding noise samples, as current schemes do, loses useful information and leaves model training insufficient and its effect poor: the information in the noise sample text is fully exploited through pseudo labels, the model learns relation features better, the training effect is optimized, and model performance is improved.
Based on the above embodiment, the initial relation extraction model includes a feature extraction model, a first classification model, and a second classification model, and the predicted entity relation includes a first predicted entity relation and a second predicted entity relation; step 120 includes:
Respectively determining noise hiding features of the noise sample text and pure hiding features of the pure sample text based on a feature extraction model in the initial relation extraction model;
determining a first predicted entity relationship of the noise sample text based on a first classification model in the initial relationship extraction model and the noise concealment feature;
a second predicted entity relationship of the noise sample text is determined based on a second classification model in the initial relationship extraction model and the noise concealment feature.
Specifically, in step 120, according to the initial relation extraction model, noise concealment features of the noise sample text and pure concealment features of the pure sample text are respectively determined, and a process of determining a predicted entity relation of the noise sample text based on the noise concealment features specifically includes:
Fig. 2 is a schematic diagram of the training process of the relation extraction model provided by the invention. As shown in fig. 2, the initial relation extraction model includes a feature extraction model θ, a first classification model φ and a second classification model ψ, where the feature extraction model θ is used to extract hidden features of the noise sample text u and the pure sample text x, yielding the noise hidden features of the noise sample text u and the pure hidden features of the pure sample text x.
Correspondingly, in contrast to the classification heads of a traditional model, the initial relation extraction model in the embodiment of the invention comprises two different classification models, namely a first classification model φ and a second classification model ψ, where φ is used for training on the labeled data and for generating pseudo labels for the unlabeled data, and ψ is used for training on the unlabeled data through the pseudo labels.
Here, the first classification model φ is used to determine a first predicted entity relationship of the noise sample text u based on the noise concealment features, and the second classification model ψ is used to determine a second predicted entity relationship of the noise sample text u based on the noise concealment features. The first predicted entity relationship and the second predicted entity relationship together constitute the predicted entity relationship of the noise sample text u.
Specifically, after the noise sample text u and the pure sample text x are determined from the sample data set, the noise hiding features of the noise sample text u and the pure hiding features of the pure sample text x can first be determined through the feature extraction model θ in the initial relation extraction model. That is, the noise sample text u and the pure sample text x are respectively input into the feature extraction model θ, so that θ performs feature extraction on the input sample texts, thereby obtaining the noise hiding features (Weak augmented feature and Strong augmented feature) and the pure hiding features (Hidden features) output by θ.
Then, entity relation prediction can be performed based on the noise hiding features output by θ. That is, based on the noise hiding features, the first classification model φ is applied to perform entity relation prediction, so as to obtain the first predicted entity relationship of the noise sample text u output by φ; meanwhile, based on the noise hiding features, the second classification model ψ can be applied to perform entity relation prediction, so as to obtain the second predicted entity relationship of the noise sample text u output by ψ. Combining the first predicted entity relationship and the second predicted entity relationship, the predicted entity relationship of the noise sample text u can be determined.
Based on the above embodiment, the determining of a first predicted entity relationship of the noise sample text based on a first classification model in the initial relation extraction model and the noise concealment feature, and the determining of a second predicted entity relationship of the noise sample text based on a second classification model in the initial relation extraction model and the noise concealment feature, includes:
determining a first predicted entity relationship for the noise sample text based on the first classification model and the first enhancement feature;
determining a second predicted entity relationship of the noise sample text based on the second classification model and the second enhancement feature;
The first enhancement feature and the second enhancement feature are noise concealment features determined by the feature extraction model based on noise sample text at a first neuron loss rate and a second neuron loss rate, respectively; the first neuron loss rate is less than the second neuron loss rate.
Specifically, the process of determining the first predicted entity relationship of the noise sample text according to the first classification model and the noise hiding feature, and determining the second predicted entity relationship of the noise sample text according to the second classification model and the noise hiding feature specifically includes:
In the process of extracting features from the noise sample text u through the feature extraction model θ to obtain the noise hiding features, different augmentations can be applied in order to enhance the robustness of the model, making it more robust to noise and improving its generalization performance.
Specifically, when the feature extraction model θ is used to extract the noise hiding features, different neuron loss rates (dropout rates) may be set, so that the feature extraction model θ outputs different noise hiding features under the different dropout rates, i.e., noise hiding features of different augmentation strengths: here, a first enhancement feature (Weak augmented feature) and a second enhancement feature (Strong augmented feature). The augmentation strength of the first enhancement feature is weaker than that of the second enhancement feature; in other words, the first neuron loss rate of the feature extraction model θ corresponding to the first enhancement feature is smaller than the second neuron loss rate of the feature extraction model θ corresponding to the second enhancement feature.
Thereafter, entity relation prediction can be performed with the first enhancement feature and the second enhancement feature respectively, so as to obtain the first predicted entity relationship and the second predicted entity relationship. That is, based on the first enhancement feature, the first classification model φ can be applied to perform entity relation prediction, so as to obtain the first predicted entity relationship of the noise sample text u output by the first classification model φ; and based on the second enhancement feature, the second classification model ψ can be applied to perform entity relation prediction, so as to obtain the second predicted entity relationship of the noise sample text u output by the second classification model ψ.
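The dual-branch prediction described above can be illustrated with a minimal sketch. All names here (encode, classify, the toy weights) are illustrative stand-ins rather than the implementation of the invention; the point is that two different dropout rates applied to the same encoder yield a weakly and a strongly augmented feature, which are then fed to the two classification heads:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, drop_rate, rng):
    """Toy feature extractor: a fixed linear map followed by inverted dropout.
    Different dropout (neuron loss) rates yield differently augmented features."""
    W = np.ones((4, 4))  # stand-in for learned encoder weights
    h = x @ W
    mask = rng.random(h.shape) >= drop_rate   # drop each neuron with prob. drop_rate
    return h * mask / (1.0 - drop_rate)       # inverted dropout preserves the scale

def classify(h, W_head):
    """Softmax classification head over relation categories."""
    logits = h @ W_head
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

u = rng.random((2, 4))        # a batch of two noise-sample inputs
W_phi = rng.random((4, 3))    # head for the first classification model
W_psi = rng.random((4, 3))    # head for the second classification model

weak = encode(u, drop_rate=0.1, rng=rng)     # first enhancement feature (weak)
strong = encode(u, drop_rate=0.5, rng=rng)   # second enhancement feature (strong)

p1 = classify(weak, W_phi)    # first predicted entity relationship (distribution)
p2 = classify(strong, W_psi)  # second predicted entity relationship (distribution)
```

The only difference between the two branches is the dropout rate, so the strong branch sees a more heavily perturbed view of the same text.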
Based on the above embodiment, step 130 includes:
determining a label-free sample loss based on a first predicted entity relationship and a second predicted entity relationship of the predicted entity relationships;
feature mixing is carried out on the pure hidden features of different pure sample texts to obtain a plurality of pure mixed features;
determining a contrast loss based on each pure mixing feature and the corresponding sample entity relationship thereof;
and carrying out parameter iteration on the initial relation extraction model based on the label-free sample loss and the contrast loss to obtain the relation extraction model.
Specifically, in step 130, according to the predicted entity relationship of the noise sample text, and the pure hidden feature of the pure sample text and the sample entity relationship, performing parameter iteration on the initial relationship extraction model to obtain a relationship extraction model, which specifically may include:
First, according to the predicted entity relationship of the noise sample text, the loss of the initial relation extraction model in the unsupervised learning process can be measured, so as to obtain the unsupervised loss, i.e., the loss of learning from the unlabeled data set in an unsupervised manner, also called the unlabeled sample loss. Specifically, the difference between the first predicted entity relationship predicted by the first classification model and the second predicted entity relationship predicted by the second classification model during unsupervised learning is measured through the first predicted entity relationship and the second predicted entity relationship among the predicted entity relationships, thereby obtaining the unlabeled sample loss of the model.
Meanwhile, the loss of the initial relation extraction model in the supervised learning process can be measured using the pure hidden features of the pure sample text and the sample entity relationship, so as to obtain the supervised loss, i.e., the loss of learning from the labeled data set in a supervised manner. However, considering that the clean sample text obtained by screening is not perfect, i.e., it is not truly free of noise and a small portion of noise may still be present, direct model training in the presence of noise may lead to noise memorization.
In this regard, in the embodiment of the invention, when supervised learning is performed, mixed supervised contrastive learning can be adopted to learn relation features from the labeled data, and the loss of the initial relation extraction model in the supervised learning process is measured by the loss function of supervised contrastive learning, so as to obtain the contrast loss of the initial relation extraction model.
Specifically, feature mixing may be performed first to obtain a plurality of different mixed features. That is, feature mixing (mixup) may be performed on the pure hidden features of different pure sample texts to obtain a plurality of different pure mixed features (Mixed features), and the sample entity relationships of the pure sample texts whose pure hidden features participate in the mixing are taken as the sample entity relationships corresponding to the resulting pure mixed features. Then, the loss of the model in the mixed supervised contrastive learning process is measured according to each pure mixed feature and its corresponding sample entity relationship, thereby obtaining the contrast loss of the model.
Then, the initial relation extraction model is trained according to its losses in the supervised and unsupervised learning processes, to obtain the trained relation extraction model. That is, combining the unlabeled sample loss and the contrast loss, the overall loss in the model training process is determined; specifically, the overall loss of the model can be calculated by weighted summation of the unlabeled sample loss and the contrast loss. Parameter iteration is then carried out on the initial relation extraction model according to the overall loss, specifically by standard gradient descent, so that the loss of the model after each parameter update in both the supervised and unsupervised learning processes becomes as small as possible and more accurate relation features are learned. Finally, the trained relation extraction model is obtained, so that in subsequent application the accurate relation features learned by the relation extraction model can be output as the entity relationship corresponding to the input text corpus.
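As a toy illustration of the weighted-sum objective and gradient-descent update described above (the helper names and the quadratic example are assumptions for demonstration, not the invention's actual training loop):

```python
def total_loss(contrast_loss, unlabeled_loss, lambda_u=1.0):
    # overall objective: contrast loss plus a weighted unlabeled sample loss
    return contrast_loss + lambda_u * unlabeled_loss

def sgd_step(params, grads, lr=0.1):
    # standard gradient descent: move each parameter against its gradient
    return [p - lr * g for p, g in zip(params, grads)]

# sanity check: minimizing f(w) = w**2 (gradient 2w) drives the loss toward zero
w = 3.0
for _ in range(50):
    (w,) = sgd_step([w], [2.0 * w])
```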
In addition, it should be noted that, after the relationship extraction model is obtained through training of the sample data set, in order to verify the performance of the trained relationship extraction model, in the embodiment of the present invention, test verification may also be performed. The specific process comprises the following steps: s1, acquiring a verification data set and a test data set, wherein the verification data set can be acquired by a remote supervision method, and the test data set can be determined by manual labeling, so that the accuracy of the test can be ensured; s2, performing model verification by using a verification data set, and selecting a model with optimal performance on the verification data set as a final model according to the accuracy of prediction of the model on the verification data set; and S3, testing the model on a test set, and evaluating the prediction precision and accuracy. The following table is the test results of the relationship extraction model:
Where Prec (Precision) and Rec (Recall) represent the precision and recall of the model, respectively, and the F1 Score is the harmonic mean of precision and recall.
Here, the precision is used to measure how accurate the model's positive-class predictions are; the recall is used to measure the model's ability to find all positive samples; and the F1 score balances precision and recall: when the F1 score is higher, the model maintains both higher precision and better recall.
It can be seen that the relation extraction model provided by the invention exceeds the current mainstream predictive models on most indicators of the test data sets NYT10m and Wiki20m.
Based on the above embodiment, the unlabeled exemplar loss is determined based on the following formula:
wherein, the quantity on the left of the formula is the unlabeled sample loss; μB is the number of noise sample texts u in each batch; θ s,t is the feature extraction model of the t-th iteration used at the second neuron loss rate, and ψ t is the second classification model of the t-th iteration, so that ψ t (θ s,t (u)) gives the second predicted entity relationship, while the pseudo label produced by the first classification model gives the first predicted entity relationship; l is the cross entropy loss function; and σ t (c) represents the number of noise sample texts whose prediction confidence is greater than the threshold τ among the first predicted entity relationships corresponding to relation category c.
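The thresholded pseudo-label objective described by these symbols can be sketched as follows. This is a minimal numpy sketch assuming a FixMatch-style confidence filter, with illustrative function names; it does not reproduce the exact formula of the embodiment:

```python
import numpy as np

def unlabeled_loss(p_weak, p_strong, tau=0.95):
    """Confidence-thresholded pseudo-label loss.
    p_weak:   softmax of the first head on weakly augmented features
    p_strong: softmax of the second head on strongly augmented features
    Only samples whose weak-branch confidence reaches tau contribute."""
    pseudo = p_weak.argmax(axis=1)                 # hard pseudo labels from the first head
    mask = p_weak.max(axis=1) >= tau               # confidence filter
    if not mask.any():
        return 0.0
    ce = -np.log(p_strong[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float((ce * mask).sum() / len(pseudo))  # average over the batch
```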
Based on the above embodiment, the first predicted entity relationship is determined based on the following formula:
wherein φ is the first classification model; θ w,t is the feature extraction model of the t-th iteration used at the first neuron loss rate; u is the noise sample text; and ∘ is the function composition operator: function composition takes the output of one function as the input of another. For example, for functions f(m) and g(m), g∘f means that f(m) is first computed for m, and then g is applied to the result f(m), giving g(f(m)).
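The function composition operator can be demonstrated directly:

```python
def compose(g, f):
    """Return the composite g∘f: the output of f becomes the input of g."""
    return lambda m: g(f(m))

f = lambda m: m + 1   # inner function, applied first
g = lambda m: m * 2   # outer function, applied to f's result

h = compose(g, f)     # h(m) = g(f(m))
```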
Based on the above embodiment, the contrast loss is determined based on the following formula:
z i =λz a +(1-λ)z b
y i ∈{y a ,y b }
wherein, the quantity on the left of the formula is the contrast loss; N is the number of clean sample texts; λ is randomly sampled from the Beta distribution, λ ∈ [0,1] ~ Beta(α m , α m ), where α m is a hyperparameter; z i is the pure mixed feature obtained by mixing z a and z b ; z a and z b are respectively the pure hidden features of different pure sample texts; y a is the sample entity relationship of the pure sample text corresponding to the pure hidden feature z a , and y b is the sample entity relationship of the pure sample text corresponding to the pure hidden feature z b , so that y a and y b are both sample entity relationships corresponding to the pure mixed feature z i ; the normalizing term counts the number of pure sample texts in a batch whose sample entity relationship is y i ; z j is the pure hidden feature of a pure sample text with sample entity relationship y j , where y j is identical to y i ; z r is the pure hidden feature of a pure sample text with sample entity relationship y r , where y r may be the same as or different from y i ; and τ is the temperature hyperparameter.
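A minimal sketch of the mixed supervised contrastive computation, under the assumptions that features are L2-normalised and that the naming here is illustrative rather than the embodiment's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(z_a, z_b, alpha=0.5):
    """Mix two pure hidden features with lam drawn from Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    return lam * z_a + (1 - lam) * z_b, lam

def sup_contrast_loss(z, y, temp=0.5):
    """Supervised contrastive loss over L2-normalised features z with labels y:
    for each anchor, the positives are the other samples sharing its label."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = np.exp(z @ z.T / temp)
    n = len(y)
    loss = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and y[j] == y[i]]
        if not pos:
            continue
        denom = sim[i].sum() - sim[i, i]      # sum over all r != i
        loss += -np.mean([np.log(sim[i, j] / denom) for j in pos])
    return loss / n
```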
Based on the above embodiment, in step 110, determining noise sample text and clean sample text from the sample dataset includes:
determining hidden features of each sample text in the sample data set based on the initial relation extraction model;
determining a K nearest neighbor diagram corresponding to each sample text based on the hidden characteristics of each sample text;
and determining noise sample texts and clean sample texts from the sample texts based on the K nearest neighbor diagram, and discarding sample entity relations of the noise sample texts.
Specifically, in step 110, the process of determining noise sample text and clean sample text from the sample dataset may specifically include:
First, the initial relation extraction model can be used to determine the hidden features of each sample text in the sample data set. Specifically, the initial relation extraction model performs feature extraction on all sample texts in the sample data set to extract the valid text information they contain, thereby obtaining the hidden features of each sample text; that is, each sample text can be input into the initial relation extraction model, which performs feature extraction on each input sample text to obtain its hidden features.
Then, the K nearest neighbor graph corresponding to each sample text is determined through the hidden features of each sample text, that is, a K-NN (K-Nearest Neighbor Graph) graph of all sample texts is constructed using the hidden features of each sample text. Then, the noise sample texts and the clean sample texts are respectively obtained from the sample texts through the K-NN graph, that is, noise samples and clean samples are identified from the graph structure, so as to obtain the noise sample texts and the clean sample texts.
After obtaining the noise sample text and the clean sample text, labels of the noise sample text can be filtered from the sample data set, labels of the clean sample text are reserved, namely, sample entity relations of the noise sample text are discarded, so that a label-free data set (Unlabeled instance) without labels is constructed, and sample entity relations of the clean sample text are reserved, so that a label-carrying data set (Labeled instance) with labels is constructed.
Based on the above embodiment, the screening process of the clean sample text can be expressed by the following formula:
wherein, the prediction for sample text s i is obtained based on the hidden features of its K nearest neighbors; s k is a sample text in the K nearest neighbors of s i , with its corresponding sample entity relationship label; and c represents a relation category. The cross entropy loss function is used to measure the disagreement between each sample's neighbor-based prediction and its own label, so as to distinguish noise samples from clean samples and to construct a labeled data set, expressed as:
wherein, the set on the left of the formula is the labeled data set constructed from the clean sample texts corresponding to relation category c; l is the cross entropy loss function applied to sample text s i ; γ c is the threshold corresponding to relation category c and is a hyperparameter; and C represents the set of all relation categories, comprising a plurality of relation categories.
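The K-nearest-neighbor screening described by the above formulas can be sketched as follows. This is a simplified illustration with assumed names: Euclidean distance on hidden features, a neighbor-vote label distribution, and a single threshold gamma shared across categories, whereas the embodiment uses a per-category threshold γ c :

```python
import numpy as np

def split_clean_noisy(features, labels, n_classes, k=2, gamma=1.0):
    """Sketch of K-NN-based noise screening: estimate each sample's label
    distribution from its k nearest neighbours, then keep samples whose
    cross entropy against their own label falls below the threshold gamma."""
    labels = np.asarray(labels)
    n = len(features)
    d = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a sample is not its own neighbour
    clean, noisy = [], []
    for i in range(n):
        nbrs = np.argsort(d[i])[:k]           # indices of the k nearest neighbours
        q = np.bincount(labels[nbrs], minlength=n_classes) / k
        ce = -np.log(q[labels[i]] + 1e-12)    # disagreement with the sample's own label
        (clean if ce < gamma else noisy).append(i)
    return clean, noisy
```

A sample whose neighbours disagree with its own label gets a large cross entropy and is routed to the unlabeled (noise) set.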
The following describes the training device of the relation extraction model provided by the invention, and the training device of the relation extraction model described below and the training method of the relation extraction model described above can be correspondingly referred to each other.
Fig. 3 is a schematic structural diagram of a training device for a relational extraction model according to the present invention, as shown in fig. 3, the device includes:
a determining unit 310, configured to determine a sample data set, and determine noise sample text and clean sample text from the sample data set, where the sample data set includes a plurality of sample texts, and a sample entity relationship corresponding to each sample text;
A prediction unit 320, configured to determine noise concealment features of the noise sample text and pure concealment features of the pure sample text based on an initial relation extraction model, and determine a predicted entity relation of the noise sample text based on the noise concealment features;
the training unit 330 is configured to perform parameter iteration on the initial relationship extraction model based on the predicted entity relationship of the noise sample text, and the pure hidden feature and the sample entity relationship of the pure sample text, so as to obtain a relationship extraction model.
The invention provides a training device for a relation extraction model, which determines noise sample texts and clean sample texts from a sample data set, determines the noise hiding features of the noise sample text and the pure hiding features of the pure sample text respectively through an initial relation extraction model, and determines the predicted entity relationship of the noise sample text according to the noise hiding features; parameter iteration is then carried out on the initial relation extraction model according to the predicted entity relationship of the noise sample text and the pure hidden features and sample entity relationship of the pure sample text, to obtain the relation extraction model. This overcomes the defect that discarding noise samples loses useful information and leads to insufficient training and poor effect; the information in the noise sample texts is fully utilized through pseudo labels, so that the model learns relation features better, the training effect is optimized, and the model performance is improved.
Based on the above embodiment, the initial relation extraction model includes a feature extraction model, a first classification model and a second classification model, and the predicted entity relation includes a first predicted entity relation and a second predicted entity relation; the prediction unit 320 is configured to:
respectively determining noise hiding features of the noise sample text and pure hiding features of the pure sample text based on a feature extraction model in the initial relation extraction model;
determining a first predicted entity relationship of the noise sample text based on a first classification model in the initial relation extraction model and the noise concealment feature;
a second predicted entity relationship of the noise sample text is determined based on a second classification model in the initial relation extraction model and the noise concealment feature.
Based on the above embodiment, the prediction unit 320 is configured to:
determining a first predicted entity relationship for the noise sample text based on the first classification model and a first enhancement feature;
determining a second predicted entity relationship of the noise sample text based on the second classification model and a second enhancement feature;
the first enhancement feature and the second enhancement feature are noise concealment features determined by the feature extraction model based on the noise sample text at a first neuron loss rate and a second neuron loss rate, respectively;
The first neuron loss rate is less than the second neuron loss rate.
Based on the above embodiment, the training unit 330 is configured to:
determining a label-free sample loss based on a first predicted entity relationship and a second predicted entity relationship of the predicted entity relationships;
feature mixing is carried out on the pure hidden features of different pure sample texts to obtain a plurality of pure mixed features;
determining a contrast loss based on each pure mixing feature and the corresponding sample entity relationship thereof;
and carrying out parameter iteration on the initial relation extraction model based on the label-free sample loss and the contrast loss to obtain a relation extraction model.
Based on the above embodiment, the unlabeled exemplar loss is determined based on the following formula:
wherein, the quantity on the left of the formula is the unlabeled sample loss; μB is the number of noise sample texts u; θ s,t is the feature extraction model of the t-th iteration used at the second neuron loss rate, and ψ t is the second classification model of the t-th iteration, so that ψ t (θ s,t (u)) gives the second predicted entity relationship, while the pseudo label produced by the first classification model gives the first predicted entity relationship; l is the cross entropy loss function; and σ t (c) represents the number of noise sample texts with prediction confidence greater than the threshold τ among the first predicted entity relationships corresponding to relation category c.
Based on the above embodiment, the contrast loss is determined based on the following formula:
z i =λz a +(1-λ)z b
y i ∈{y a ,y b }
wherein, the quantity on the left of the formula is the contrast loss; N is the number of clean sample texts; λ is randomly sampled from the Beta distribution, λ ∈ [0,1] ~ Beta(α m , α m ), where α m is a hyperparameter; z i is the pure mixed feature obtained by mixing z a and z b ; z a and z b are respectively the pure hidden features of different pure sample texts; y a is the sample entity relationship of the pure sample text corresponding to the pure hidden feature z a , and y b is the sample entity relationship of the pure sample text corresponding to the pure hidden feature z b , so that y a and y b are both sample entity relationships corresponding to the pure mixed feature z i ; the normalizing term counts the number of pure sample texts in a batch whose sample entity relationship is y i ; z j is the pure hidden feature of a pure sample text with sample entity relationship y j ; z r is the pure hidden feature of a pure sample text with sample entity relationship y r , where y r may be equal to or different from y i ; and τ is the temperature hyperparameter.
Based on the above embodiment, the determining unit 310 is configured to:
determining hidden features of each sample text in the sample dataset based on the initial relationship extraction model;
determining a K nearest neighbor diagram corresponding to each sample text based on the hidden characteristics of each sample text;
And determining noise sample text and clean sample text from the sample texts based on the K nearest neighbor diagram, and discarding sample entity relations of the noise sample text.
Fig. 4 illustrates a physical schematic diagram of an electronic device, as shown in fig. 4, which may include: processor 410, communication interface (Communications Interface) 420, memory 430 and communication bus 440, wherein processor 410, communication interface 420 and memory 430 communicate with each other via communication bus 440. Processor 410 may invoke logic instructions in memory 430 to perform a training method for a relational extraction model, the method comprising: determining a sample data set, and determining noise sample text and pure sample text from the sample data set, wherein the sample data set comprises a plurality of sample texts and sample entity relations corresponding to the sample texts; based on an initial relation extraction model, respectively determining noise hiding features of the noise sample text and pure hiding features of the pure sample text, and based on the noise hiding features, determining a predicted entity relation of the noise sample text; and carrying out parameter iteration on the initial relation extraction model based on the predicted entity relation of the noise sample text and the pure hidden characteristic and the sample entity relation of the pure sample text to obtain a relation extraction model.
Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially, or in a part contributing to the prior art, or in part, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a method of training a relational extraction model provided by the methods described above, the method comprising: determining a sample data set, and determining noise sample text and pure sample text from the sample data set, wherein the sample data set comprises a plurality of sample texts and sample entity relations corresponding to the sample texts; based on an initial relation extraction model, respectively determining noise hiding features of the noise sample text and pure hiding features of the pure sample text, and based on the noise hiding features, determining a predicted entity relation of the noise sample text; and carrying out parameter iteration on the initial relation extraction model based on the predicted entity relation of the noise sample text and the pure hidden characteristic and the sample entity relation of the pure sample text to obtain a relation extraction model.
In yet another aspect, the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform a training method of a relation extraction model provided by the above methods, the method comprising: determining a sample data set, and determining noise sample text and pure sample text from the sample data set, wherein the sample data set comprises a plurality of sample texts and sample entity relations corresponding to the sample texts; based on an initial relation extraction model, respectively determining noise hiding features of the noise sample text and pure hiding features of the pure sample text, and based on the noise hiding features, determining a predicted entity relation of the noise sample text; and carrying out parameter iteration on the initial relation extraction model based on the predicted entity relation of the noise sample text and the pure hidden characteristic and the sample entity relation of the pure sample text to obtain a relation extraction model.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method of training a relational extraction model, comprising:
determining a sample data set, and determining noise sample text and pure sample text from the sample data set, wherein the sample data set comprises a plurality of sample texts and sample entity relations corresponding to the sample texts;
based on an initial relation extraction model, respectively determining noise hiding features of the noise sample text and pure hiding features of the pure sample text, and based on the noise hiding features, determining a predicted entity relation of the noise sample text;
and carrying out parameter iteration on the initial relation extraction model based on the predicted entity relation of the noise sample text and the pure hidden characteristic and the sample entity relation of the pure sample text to obtain a relation extraction model.
2. The method of claim 1, wherein the initial relationship extraction model comprises a feature extraction model, a first classification model, and a second classification model, and the predicted entity relationship comprises a first predicted entity relationship and a second predicted entity relationship;
the method for extracting the model based on the initial relation respectively determines the noise hiding characteristic of the noise sample text and the pure hiding characteristic of the pure sample text, and determines the predicted entity relation of the noise sample text based on the noise hiding characteristic comprises the following steps:
determining noise hiding features of the noise sample text and pure hiding features of the pure sample text respectively, based on a feature extraction model in the initial relation extraction model;
determining a first predicted entity relationship of the noise sample text based on a first classification model in the initial relation extraction model and the noise hiding features;
determining a second predicted entity relationship of the noise sample text based on a second classification model in the initial relation extraction model and the noise hiding features.
3. The method of claim 2, wherein the determining a first predicted entity relationship of the noise sample text based on a first classification model in the initial relation extraction model and the noise hiding features, and determining a second predicted entity relationship of the noise sample text based on a second classification model in the initial relation extraction model and the noise hiding features, comprises:
determining a first predicted entity relationship of the noise sample text based on the first classification model and a first enhancement feature;
determining a second predicted entity relationship of the noise sample text based on the second classification model and a second enhancement feature;
wherein the first enhancement feature and the second enhancement feature are noise hiding features determined by the feature extraction model from the noise sample text at a first neuron dropout rate and a second neuron dropout rate, respectively; and
the first neuron dropout rate is less than the second neuron dropout rate.
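The two enhancement features of claim 3 amount to encoding the same noise sample twice at different dropout rates. A minimal sketch under that reading, with illustrative rates and inverted-dropout rescaling (the patent does not specify either):

```python
import random

def encode_with_dropout(features, drop_rate, rng):
    # Inverted dropout: each hidden unit is zeroed with probability
    # drop_rate; surviving units are rescaled by 1 / (1 - drop_rate).
    keep = 1.0 - drop_rate
    return [x / keep if rng.random() >= drop_rate else 0.0 for x in features]

rng = random.Random(0)
hidden = [0.5, -1.2, 0.3, 0.8]                  # noise hiding feature from the encoder
weak = encode_with_dropout(hidden, 0.1, rng)    # first enhancement feature (low rate)
strong = encode_with_dropout(hidden, 0.5, rng)  # second enhancement feature (high rate)
```

Keeping the first rate below the second makes the first prediction more reliable, which is why claim 5 uses it as the pseudo-label source.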
4. A method of training a relation extraction model according to any one of claims 1 to 3, wherein the performing parameter iteration on the initial relation extraction model based on the predicted entity relation of the noise sample text and on the pure hiding features and sample entity relations of the pure sample text, to obtain a relation extraction model, comprises:
determining an unlabeled sample loss based on the first predicted entity relationship and the second predicted entity relationship in the predicted entity relations;
performing feature mixing on the pure hiding features of different pure sample texts to obtain a plurality of pure mixed features;
determining a contrast loss based on each pure mixed feature and its corresponding sample entity relation;
and performing parameter iteration on the initial relation extraction model based on the unlabeled sample loss and the contrast loss to obtain the relation extraction model.
5. The method of training a relation extraction model of claim 4, wherein the unlabeled sample loss is determined based on the following formula:
wherein: the left-hand side denotes the unlabeled sample loss; μB is the number of noise sample texts u; the first prediction term represents a first predicted entity relationship; θ_{s,t} denotes the feature extraction model of the t-th iteration applied at the second neuron dropout rate; ψ_t denotes the second classification model of the t-th iteration; the second prediction term represents a second predicted entity relationship; l is a cross-entropy loss function; and σ_t(c) denotes the number of noise sample texts whose prediction confidence exceeds the threshold τ among the first predicted entity relationships corresponding to relation class c.
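The loss of claim 5 is a confidence-thresholded consistency loss: the first (low-dropout) prediction supplies a pseudo-label, and cross entropy is applied to the second (high-dropout) prediction only when the first prediction is confident. A simplified sketch (the per-class count σ_t(c) from the claim is not reproduced; `tau` is illustrative):

```python
import math

def cross_entropy(probs, label):
    # l in the claim: cross entropy against a hard pseudo-label.
    return -math.log(max(probs[label], 1e-12))

def unlabeled_loss(first_preds, second_preds, tau=0.95):
    # Average loss over noise samples whose first (low-dropout) prediction
    # clears the confidence threshold tau; that confident prediction is the
    # pseudo-label for the second (high-dropout) prediction.
    total, count = 0.0, 0
    for first, second in zip(first_preds, second_preds):
        confidence = max(first)
        if confidence > tau:
            pseudo = first.index(confidence)
            total += cross_entropy(second, pseudo)
            count += 1
    return total / max(count, 1)

# Only the first pair clears tau, so only it contributes to the loss.
loss = unlabeled_loss([[0.98, 0.02], [0.6, 0.4]], [[0.9, 0.1], [0.5, 0.5]])
```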
6. The method of training a relation extraction model of claim 4, wherein the contrast loss is determined based on the following formula:
z i =λz a +(1-λ)z b
y i ∈{y a ,y b }
wherein: the left-hand side denotes the contrast loss; N is the number of pure sample texts; λ is randomly sampled from a Beta distribution, λ ∈ [0,1], λ ~ Beta(α_m, α_m), where α_m is a hyperparameter; z_i is the pure mixed feature obtained by mixing z_a and z_b; z_a and z_b are the pure hiding features of different pure sample texts; y_a is the sample entity relation of the pure sample text corresponding to z_a, and y_b is the sample entity relation of the pure sample text corresponding to z_b; y_a and y_b are both sample entity relations corresponding to the pure mixed feature z_i; the count term denotes the number of pure sample texts in a batch whose sample entity relation is y_i; z_j is the pure hiding feature of the pure sample text with sample entity relation y_j; z_r is the pure hiding feature of the pure sample text with sample entity relation y_r, where y_r = y_i or y_r ≠ y_i; and τ is the temperature hyperparameter.
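Claim 6 combines mixup of pure hiding features with a supervised contrastive objective. A hedged sketch of both pieces (the dot-product similarity, default α_m, and default temperature are illustrative choices, not taken from the patent):

```python
import math
import random

def mixup(z_a, z_b, alpha_m=0.75, rng=random.Random(0)):
    # z_i = lam * z_a + (1 - lam) * z_b, with lam ~ Beta(alpha_m, alpha_m).
    lam = rng.betavariate(alpha_m, alpha_m)
    return [lam * a + (1 - lam) * b for a, b in zip(z_a, z_b)], lam

def sup_contrast(z_i, y_i, batch, tau=0.1):
    # batch: list of (pure hiding feature, sample entity relation) pairs.
    # Pull z_i toward same-relation features and push it from the rest.
    def sim(a, b):
        return sum(x * y for x, y in zip(a, b)) / tau
    positives = sum(math.exp(sim(z_i, z)) for z, y in batch if y == y_i)
    everything = sum(math.exp(sim(z_i, z)) for z, y in batch)
    return -math.log(positives / everything)
```

Since y_i ∈ {y_a, y_b}, the mixed feature contributes to the contrast loss once for each of its two source relations.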
7. A method of training a relation extraction model according to any one of claims 1 to 3, wherein the determining noise sample text and pure sample text from the sample data set comprises:
determining hidden features of each sample text in the sample data set based on the initial relation extraction model;
determining a K-nearest-neighbor graph corresponding to each sample text based on the hidden features of each sample text;
and determining noise sample text and pure sample text from the sample texts based on the K-nearest-neighbor graph, and discarding the sample entity relations of the noise sample text.
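One common way to turn a K-nearest-neighbor graph into a noise/pure split, sketched below, is to mark a sample as noise when its label disagrees with the majority label of its k nearest neighbors in hidden-feature space; the patent's concrete criterion may differ:

```python
def knn_noise_split(features, labels, k=1):
    # A sample is pure when at least half of its k nearest neighbors
    # (by squared Euclidean distance over hidden features) share its label.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    clean, noise = [], []
    for i, (f, y) in enumerate(zip(features, labels)):
        neighbors = sorted((j for j in range(len(features)) if j != i),
                           key=lambda j: dist(f, features[j]))[:k]
        agreeing = sum(1 for j in neighbors if labels[j] == y)
        (clean if 2 * agreeing >= k else noise).append(i)
    return clean, noise

# Two tight clusters; the mislabeled pair around (5, 5) is flagged as noise.
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
clean_idx, noise_idx = knn_noise_split(feats, [0, 0, 1, 0], k=1)
```

The sample entity relations of the indices in `noise_idx` are then discarded, and those samples are used only through the unlabeled loss of claim 5.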
8. A training device for a relation extraction model, comprising:
a determining unit, configured to determine a sample data set and determine noise sample text and pure sample text from the sample data set, wherein the sample data set comprises a plurality of sample texts and sample entity relations corresponding to the sample texts;
a prediction unit, configured to determine noise hiding features of the noise sample text and pure hiding features of the pure sample text respectively based on an initial relation extraction model, and to determine a predicted entity relation of the noise sample text based on the noise hiding features;
and a training unit, configured to perform parameter iteration on the initial relation extraction model based on the predicted entity relation of the noise sample text and on the pure hiding features and sample entity relations of the pure sample text, to obtain a relation extraction model.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method of training a relation extraction model according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of training a relation extraction model according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311651011.1A CN117764160A (en) | 2023-12-04 | 2023-12-04 | Training method and device for relation extraction model, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311651011.1A CN117764160A (en) | 2023-12-04 | 2023-12-04 | Training method and device for relation extraction model, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117764160A true CN117764160A (en) | 2024-03-26 |
Family
ID=90313490
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311651011.1A Pending CN117764160A (en) | 2023-12-04 | 2023-12-04 | Training method and device for relation extraction model, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117764160A (en) |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |