CN114970857A - Optimization method for relational extraction model - Google Patents
- Publication number
- CN114970857A (application CN202210753408.0A)
- Authority
- CN
- China
- Prior art keywords
- entity
- loss
- vector
- encoder
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention relates to the technical field of natural language processing, in particular to an optimization method for a relation extraction model. The relation extraction model comprises a main model, an entity recognition learning module and a discriminator, and the main model comprises a shared encoder and a relation extraction encoder. The method comprises the following steps: acquiring an input sentence through the shared encoder in the main model, encoding the sentence, and outputting a hidden vector of each word in the sentence; inputting the hidden vector into the relation extraction encoder, the entity recognition learning module and the discriminator to respectively obtain the relation type loss, the entity label loss and the discriminator loss; combining the relation type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain the overall loss; and preliminarily optimizing the relation extraction model through the overall loss. The optimization method provided by the invention enhances the model's understanding of entity meanings through entity recognition learning, thereby improving the model's performance on the relation extraction task.
Description
[ technical field ]
The invention relates to the technical field of natural language processing, in particular to an optimization method for a relation extraction model.
[ background of the invention ]
The relation extraction task aims to extract (predict) the relationship between two given entities in a given sentence. Understanding the meaning of the entities is crucial to the accuracy of relationship prediction; however, existing methods often neglect the modeling of entities and thus understand entity meanings insufficiently.
[ summary of the invention ]
In order to solve the problem that existing methods do not sufficiently understand the meaning of entities, the invention provides an optimization method for a relation extraction model.
The present invention provides an optimization method for a relation extraction model to solve the above technical problem, where the relation extraction model includes a main model, an entity recognition learning module and a discriminator, and the main model includes a shared encoder and a relation extraction encoder. The method includes the following steps: acquiring an input sentence through the shared encoder in the main model, encoding the sentence, and outputting a hidden vector of each word in the sentence; inputting the hidden vector into the relation extraction encoder, the entity recognition learning module and the discriminator to respectively obtain the relation type loss, the entity label loss and the discriminator loss; combining the relation type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain an overall loss; and preliminarily optimizing the relation extraction model through the overall loss.
Preferably, the obtaining of the relationship type loss by the main model comprises the following steps: inputting the hidden vector into a relation extraction encoder in the main model, and further encoding the hidden vector by the relation extraction encoder to obtain a relation extraction hidden vector of each word; calculating the hidden vector of each word and the relation extraction hidden vector through a preset second algorithm to obtain a predicted relation type; and comparing the predicted relationship type with a preset standard for calculation to obtain the relationship type loss.
Preferably, the preset second algorithm comprises the steps of: calculating the relation extraction hidden vector through a preset vector algorithm to obtain a vector representation of a first entity, a vector representation of a second entity and a sentence representation of a relation extraction encoder, and simultaneously applying the preset vector algorithm to the hidden vector to obtain a sentence representation of a shared encoder; connecting the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder and the sentence representation of the sharing encoder in series to obtain an intermediate vector; and the intermediate vector is sent into a SoftMax classifier after passing through a full connection layer, so that the predicted relation type is obtained.
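The second algorithm described above can be sketched in code. This is a hedged illustration, not the patent's implementation: it assumes max pooling as the preset vector algorithm (as a later embodiment states), hand-written fully connected weights, and Python lists for vectors; the function names, spans and dimensions are illustrative assumptions.

```python
import math

def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors (assumed pooling)."""
    return [max(v[i] for v in vectors) for i in range(len(vectors[0]))]

def softmax(logits):
    """Numerically stable SoftMax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_relation(re_hidden, shared_hidden, e1_span, e2_span, weights, bias):
    """re_hidden / shared_hidden: per-word vectors from the two encoders;
    e1_span / e2_span: (start, end) word indices of the two entities."""
    v_e1 = max_pool(re_hidden[e1_span[0]:e1_span[1]])   # first-entity representation
    v_e2 = max_pool(re_hidden[e2_span[0]:e2_span[1]])   # second-entity representation
    s_re = max_pool(re_hidden)                          # RE-encoder sentence representation
    s_shared = max_pool(shared_hidden)                  # shared-encoder sentence representation
    o = v_e1 + v_e2 + s_re + s_shared                   # concatenation (list +) -> intermediate vector
    logits = [sum(w * x for w, x in zip(row, o)) + b    # fully connected layer
              for row, b in zip(weights, bias)]
    probs = softmax(logits)                             # SoftMax classifier
    return probs.index(max(probs)), probs               # predicted relation type index
```

With 2-dimensional hidden vectors and two relation types, `predict_relation` returns the arg-max relation index together with its probability distribution.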
Preferably, the entity identification learning module comprises an entity encoder, and the step of obtaining the entity tag loss by the entity identification learning module comprises the following steps: inputting the hidden vector into an entity encoder in the entity recognition learning module, and further encoding the hidden vector by the entity encoder to obtain an entity recognition hidden vector of each word; converting the hidden vector of each word and the entity identification hidden vector to obtain a predicted entity identification label; and comparing the predicted entity identification labels of all words with a preset standard for calculation to obtain the entity label loss.
Preferably, the conversion process comprises the steps of: and connecting the hidden vector of each word with the entity identification hidden vector in series, and sending the vector obtained after the connection in series into a full connection layer and a SoftMax classifier so as to obtain the predicted entity identification label.
Preferably, the preset standard is an actual entity label, the predicted entity identification labels of all the words are compared with the actual entity label, and the comparison result is calculated through a cross entropy loss function to obtain the entity label loss.
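The cross entropy calculation above can be sketched minimally: the per-word loss is the negative log probability the model assigned to the actual label, averaged over the sentence. The label indices and probabilities here are illustrative assumptions, not the patent's data.

```python
import math

def entity_tag_loss(pred_probs, actual_labels):
    """pred_probs: per-word probability lists over the label set;
    actual_labels: per-word gold label indices. Returns mean cross entropy."""
    total = 0.0
    for probs, gold in zip(pred_probs, actual_labels):
        total += -math.log(probs[gold])   # penalty shrinks as p(gold) -> 1
    return total / len(actual_labels)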
Preferably, the actual entity label of each word is determined by the position of the entity in the input, and when a word does not belong to a certain entity, the entity label is a first label; if a word is the beginning of an entity, the entity label is a second label; and if one word is in the middle or at the end of a certain entity, the entity label is a third label.
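The three-label scheme above can be sketched as follows; the concrete labels O/B/I follow the example given later in the detailed description, and the span representation is an assumption.

```python
def bio_labels(num_words, entity_spans):
    """entity_spans: list of (start, end) word-index ranges, end exclusive."""
    labels = ["O"] * num_words                 # first label: not in any entity
    for start, end in entity_spans:
        labels[start] = "B"                    # second label: entity beginning
        for i in range(start + 1, end):
            labels[i] = "I"                    # third label: middle or end
    return labels
```

For the sentence "An air force pilot is back" with entities "air force" (words 1-2) and "pilot" (word 3), this yields O B I B O O.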
Preferably, the obtaining of the discriminator loss by the discriminator includes the steps of: the discriminator obtains a result of comparison between the predicted entity identification label and a preset standard to obtain target output, and the value of the target output is 0 or 1; sending the hidden vector into a full connection layer and a SoftMax classifier, and obtaining a 2-dimensional vector for each word, wherein each dimension of the 2-dimensional vector corresponds to the distribution probability of target output on 0 and 1; and calculating to obtain the loss of the discriminator according to the distribution probability.
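The discriminator loss above can be sketched under stated assumptions: the target output is taken to be 1 when the predicted entity label matched the preset standard and 0 otherwise, and the loss is the mean negative log probability the 2-dimensional SoftMax output assigns to that target. The exact loss form in the patent may differ; this is only one plausible reading.

```python
import math

def discriminator_loss(two_dim_probs, targets):
    """two_dim_probs: per-word [p(target=0), p(target=1)] from the SoftMax;
    targets: per-word 0/1 target outputs."""
    total = 0.0
    for (p0, p1), t in zip(two_dim_probs, targets):
        total += -math.log(p1 if t == 1 else p0)   # negative log-likelihood of target
    return total / len(targets)
```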
Preferably, adjustable control parameters are introduced when calculating the overall loss; the control parameters are used to control the contributions of the entity recognition learning module and of the discriminator for adversarial learning to model training.
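One plausible form of the preset first algorithm with control parameters is a weighted sum of the three losses. The additive form and the weight names `lambda_ner` and `lambda_d` are assumptions for illustration — the patent only states that a first algorithm combines the losses and that the parameters control the contributions of entity recognition and adversarial learning.

```python
def overall_loss(l_re, l_ner, l_d, lambda_ner=1.0, lambda_d=1.0):
    """Combine relation type, entity label and discriminator losses
    with adjustable control parameters (assumed weighted-sum form)."""
    return l_re + lambda_ner * l_ner + lambda_d * l_d
```

Setting `lambda_ner=0` removes the entity recognition contribution entirely, which is the kind of emphasis shifting the control parameters enable.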
Preferably, the method further comprises the following step: after the relation extraction model is optimized through the overall loss, adjusting the parameters of some modules in the relation extraction model and performing a secondary optimization update on the model parameters of the relation extraction encoder and the fully connected layer.
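The secondary optimization above can be sketched conceptually: parameters of modules outside the relation extraction encoder and fully connected layer are held fixed while only those two are updated. Treating the other modules as frozen, the module names, and the plain gradient step are all illustrative assumptions, not the patent's procedure.

```python
def secondary_update(params, grads, lr=0.01,
                     trainable=("re_encoder", "fc_layer")):
    """params/grads: {module_name: [parameter values]};
    only modules listed in `trainable` receive a gradient step."""
    new_params = {}
    for module, values in params.items():
        if module in trainable:
            new_params[module] = [v - lr * g
                                  for v, g in zip(values, grads[module])]
        else:
            new_params[module] = list(values)      # frozen: unchanged copy
    return new_params
```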
Compared with the prior art, the optimization method for the relational extraction model provided by the invention has the following beneficial effects:
1. The embodiment of the invention provides an optimization method for a relation extraction model, wherein the relation extraction model includes a main model, an entity recognition learning module and a discriminator, and the main model includes a shared encoder and a relation extraction encoder. The optimization method includes the following steps: acquiring an input sentence through the shared encoder in the main model, encoding the sentence, and outputting a hidden vector of each word in the sentence; inputting the hidden vector into the relation extraction encoder, the entity recognition learning module and the discriminator to respectively obtain the relation type loss, the entity label loss and the discriminator loss; combining the relation type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain the overall loss; and preliminarily optimizing the relation extraction model through the overall loss. It can be understood that, by introducing the entity recognition learning module in the preliminary optimization, the shared encoder in the main model learns the entities in the text, which enhances the main model's ability to model entities and thereby improves the performance of the relation extraction model on the relation extraction task.
2. In the optimization method for the relation extraction model provided by the embodiment of the present invention, obtaining the relation type loss from the main model includes the following steps: inputting the hidden vector into the relation extraction encoder in the main model, the relation extraction encoder further encoding the hidden vector to obtain the relation extraction hidden vector of each word; processing the hidden vector and the relation extraction hidden vector of each word through a preset second algorithm to obtain the predicted relation type; and comparing the predicted relation type with a preset standard to calculate the relation type loss. Understandably, processing the hidden vector and the encoded relation extraction hidden vector through the second algorithm yields the relation type currently predicted by the main model, and comparing it with the preset standard yields the relation type loss, which characterizes the ability of the relation extraction model to predict relation types. During the preliminary optimization through the overall loss, the relation type loss optimizes the relation-prediction ability of the relation extraction model, guaranteeing the reliability of the optimization method.
3. In the optimization method for the relation extraction model provided by the embodiment of the invention, the preset second algorithm includes the following steps: processing the relation extraction hidden vector through a preset vector algorithm to obtain the vector representation of the first entity, the vector representation of the second entity and the sentence representation of the relation extraction encoder, and meanwhile applying the preset vector algorithm to the hidden vector to obtain the sentence representation of the shared encoder; concatenating the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder and the sentence representation of the shared encoder to obtain an intermediate vector; and passing the intermediate vector through a fully connected layer and then a SoftMax classifier to obtain the predicted relation type. These representations serve as intermediate variables reflecting the vector representations of the entities and the sentence at different positions in the relation extraction model. After the concatenated intermediate vector undergoes the classification and normalization of the fully connected layer and SoftMax, the relation type (a scalar) predicted by the main model for the two entities is obtained; embodying the predicted relation type as a scalar facilitates the subsequent calculation of the relation type loss.
4. In the optimization method for the relation extraction model provided by the embodiment of the present invention, the entity recognition learning module includes an entity encoder, and obtaining the entity label loss through the entity recognition learning module includes the following steps: inputting the hidden vector into the entity encoder in the entity recognition learning module, the entity encoder further encoding the hidden vector to obtain the entity recognition hidden vector of each word; converting the hidden vector and the entity recognition hidden vector of each word to obtain the predicted entity recognition label; and comparing the predicted entity recognition labels of all words with a preset standard to calculate the entity label loss. It is to be understood that the further encoding by the entity encoder yields an entity recognition hidden vector for each word, and the conversion of the hidden vector and the entity recognition hidden vector yields a predicted entity recognition label (a scalar) indicating the approximate position of a word within an entity. Embodying the predicted entity recognition label in scalar form makes the subsequent calculation of the entity recognition loss easier.
5. In an optimization method for a relational extraction model provided in an embodiment of the present invention, a conversion process includes the following steps: and connecting the hidden vector of each word with the entity identification hidden vector in series, and sending the vector obtained after the connection in series into a full connection layer and a SoftMax classifier so as to obtain the predicted entity identification label. It can be understood that the mode of classifying the vectors after being connected in series through the full connection layer and the SoftMax classifier and converting the vectors into the predicted entity identification tags is the same as the mode of converting the intermediate vectors into the predicted relationship classes in the main model, and the two modes are unified, so that the establishment, optimization and maintenance of the relationship extraction model are facilitated.
6. In the optimization method for the relation extraction model provided by the embodiment of the invention, the preset standard is the actual entity label; the predicted entity recognition labels of all words are compared with the actual entity labels, and the comparison result is calculated through a cross entropy loss function to obtain the entity label loss. It can be understood that calculating the predicted entity labels against the actual entity labels with the cross entropy loss function yields the difference between the predicted and actual values, namely the entity label loss; further optimizing the relation extraction model through the entity label loss enhances the model's understanding of entity meanings and improves its ability to predict entity labels.
7. In the optimization method for the relation extraction model provided by the embodiment of the invention, the actual entity label of each word is determined by the position of the entity in the input: when a word does not belong to any entity, its entity label is the first label; if a word is the beginning of an entity, its entity label is the second label; and if a word is in the middle or at the end of an entity, its entity label is the third label. Understandably, by labeling the words in sentences, the entity recognition learning module can be trained to find entities in text, thereby enhancing the relation extraction model's comprehension of entities.
8. In the optimization method for the relation extraction model provided by the embodiment of the invention, obtaining the discriminator loss through the discriminator includes the following steps: the discriminator acquires the result of comparing the predicted entity recognition label with the preset standard to obtain the target output, whose value is 0 or 1; the hidden vector is sent into a fully connected layer and a SoftMax classifier to obtain, for each word, a 2-dimensional vector whose dimensions correspond to the probabilities of the target output being 0 and 1; and the discriminator loss is calculated from these probabilities. Understandably, the discriminator effectively controls the degree to which the shared encoder learns the entity recognition task and avoids overfitting the shared encoder to it: without the discriminator loss, the overall loss would be optimized toward the entity recognition loss during the preliminary optimization, harming the performance of the relation extraction model on the main task of relation extraction. It can be seen that the discriminator loss guarantees that the main performance of the relation extraction model is not overwhelmed when the model is preliminarily optimized through the overall loss.
9. In the optimization method for the relation extraction model provided by the embodiment of the invention, adjustable control parameters are introduced when calculating the overall loss; the control parameters control the contributions of the entity recognition learning module and of the discriminator for adversarial learning to model training. It can be understood that the control parameters make the calculation of the overall loss adjustable: by tuning them when optimizing the relation extraction model, the emphasis of the optimization can be controlled. Setting the control parameters therefore improves the practicability of the optimization method.
10. The optimization method for the relation extraction model provided by the embodiment of the invention further includes the following step: after the relation extraction model is optimized through the overall loss, the parameters of some modules in the relation extraction model are adjusted, and a secondary optimization update is performed on the model parameters of the relation extraction encoder and the fully connected layer. Understandably, the relation extraction model obtains better performance after the secondary optimization. Meanwhile, the model after secondary optimization is used in the same way as a baseline relation extraction model: only the sentence and the entities are needed as input, with no additional inputs, so the performance is enhanced without additional usage overhead compared with the baseline model.
[ description of the drawings ]
Fig. 1 is a schematic diagram of steps of an optimization method for a relational extraction model according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a relational extraction model of an optimization method for a relational extraction model according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of logic steps of an optimization method for a relational extraction model according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a step of calculating a relationship type loss by the optimization method for the relationship extraction model according to the embodiment of the present invention.
Fig. 5 is a schematic diagram of a step of calculating an entity identification tag loss by the optimization method for the relational extraction model according to the embodiment of the present invention.
Fig. 6 is a schematic diagram of quadratic optimization of an optimization method for a relational extraction model according to an embodiment of the present invention.
The attached drawings indicate the following:
1. an optimization method;
10. a relation extraction model;
100. a main model; 101. an entity recognition learning module; 102. a discriminator;
1000. a shared encoder; 1001. a relation extraction encoder; 1002. a first fully connected layer; 1003. a first SoftMax classifier; 1010. an entity encoder; 1011. a second fully connected layer; 1012. a second SoftMax classifier; 1020. a third fully connected layer; 1021. a third SoftMax classifier.
[ detailed description of the embodiments ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and implementation examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1 to fig. 3, a first embodiment of the present invention provides an optimization method 1 for a relation extraction model 10, including the following steps: obtaining an input sentence through a shared encoder 1000 (Shared Encoder) in a preset main model 100 (RE), encoding the sentence, and outputting a hidden vector h_i of each word in the sentence. The relation extraction model 10 includes the main model 100, an entity recognition learning module 101 (NER) and a discriminator 102 (The Discriminator), and the main model 100 further includes a relation extraction encoder 1001 (RE Encoder). The hidden vectors h_i are input into the relation extraction encoder 1001, the entity recognition learning module 101 and the discriminator 102 to respectively obtain the relation type loss L_RE, the entity label loss L_NER and the discriminator loss L_D. The relation type loss L_RE, the entity label loss L_NER and the discriminator loss L_D are combined through a preset first algorithm to obtain the overall loss L, and the relation extraction model 10 is preliminarily optimized through the overall loss L. It can be understood that, by introducing the entity recognition learning module 101 in the preliminary optimization, the shared encoder 1000 in the main model 100 learns the entities in the text, which enhances the entity-modeling capability of the main model 100 and thereby improves the performance of the relation extraction model 10 on the relation extraction task.
Referring to fig. 2 and 4, in some embodiments, obtaining the relation type loss from the main model 100 includes the following steps: the hidden vectors h_i are input into the relation extraction encoder 1001 in the main model 100, and the relation extraction encoder 1001 further encodes the hidden vectors h_i to obtain the relation extraction hidden vector h_i^RE of each word. The hidden vector h_i and the relation extraction hidden vector h_i^RE of each word are processed through a preset second algorithm to obtain the predicted relation type ŷ, and the predicted relation type ŷ is compared with a preset standard to calculate the relation type loss L_RE. Understandably, processing the hidden vectors h_i and the encoded relation extraction hidden vectors h_i^RE through the second algorithm yields the relation type ŷ currently predicted by the main model 100; comparing ŷ with the preset standard yields the relation type loss L_RE, which characterizes the ability of the relation extraction model 10 to predict relation types. During the preliminary optimization of the relation extraction model 10 through the overall loss L, the relation type loss L_RE optimizes the relation-prediction ability of the model, guaranteeing the reliability of the optimization method 1.
Referring to fig. 2, in some embodiments, the preset second algorithm includes the following steps: the relation extraction hidden vectors h_i^RE are processed through a preset vector algorithm to obtain the vector representation v_E1 of the first entity, the vector representation v_E2 of the second entity and the sentence representation s_RE of the relation extraction encoder 1001; meanwhile, the preset vector algorithm is applied to the hidden vectors h_i to obtain the sentence representation s_shared of the shared encoder 1000. The representations v_E1, v_E2, s_RE and s_shared are concatenated to obtain an intermediate vector o; the intermediate vector o is passed through a preset first fully connected layer 1002 and then sent into a first SoftMax classifier 1003 to obtain the predicted relation type ŷ. Understandably, v_E1 and v_E2 (E_1 and E_2 denote the two entities), s_RE and s_shared serve as intermediate variables reflecting the vector representations of the entities and the sentence at different positions in the relation extraction model 10. After the intermediate vector o obtained by concatenating these representations undergoes the classification and normalization of the first fully connected layer 1002 and the first SoftMax classifier 1003, the relation type ŷ (a scalar) predicted by the main model 100 for the two entities is obtained; embodying the predicted relation type ŷ as a scalar facilitates the subsequent calculation of the relation type loss L_RE.
In some embodiments, the preset vector algorithm is the MaxPooling algorithm. The concrete method is as follows:
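Since the formula that followed was not preserved here, this is a hedged sketch of the usual reading of the MaxPooling algorithm named above: an element-wise maximum over the hidden vectors of a word span.

```python
def max_pooling(vectors):
    """Element-wise maximum over a list of equal-length vectors:
    output dimension j holds the maximum of dimension j across all inputs."""
    dim = len(vectors[0])
    return [max(v[j] for v in vectors) for j in range(dim)]
```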
Referring to fig. 2 and 5, in some embodiments, the entity recognition learning module 101 includes an entity encoder 1010, and the entity recognition learning module 101 obtains the entity label loss through the following steps: the hidden vectors h_i are input into the entity encoder 1010 in the entity recognition learning module 101, and the entity encoder 1010 further encodes the hidden vectors h_i to obtain the entity recognition hidden vector h_i^NER of each word. The hidden vector h_i and the entity recognition hidden vector h_i^NER of each word undergo a conversion process to obtain the predicted entity recognition label t̂_i, and the predicted entity recognition labels t̂_i of all words are compared with a preset standard to calculate the entity label loss L_NER. Understandably, the further encoding of the hidden vectors by the entity encoder 1010 yields an entity recognition hidden vector h_i^NER for each word, and converting h_i and h_i^NER yields the predicted entity recognition label t̂_i (a scalar) indicating the approximate position of a word within an entity. It should be understood that a scalar here refers to a quantity without direction, whose content may be a number or a character. As can be seen, embodying the predicted entity recognition labels t̂_i in scalar form makes the subsequent calculation of the entity recognition loss easier.
Referring to fig. 2, in some embodiments, the conversion process includes the following steps: the hidden vector h_i of each word is concatenated with the entity recognition hidden vector h_i^NER, and the concatenated vector is sent into a second fully connected layer 1011 and a second SoftMax classifier 1012 to obtain the predicted entity recognition label t̂_i. Understandably, the manner of classifying the concatenated vector through the second fully connected layer 1011 and the second SoftMax classifier 1012 and converting it into the predicted entity recognition label t̂_i is the same as the manner in which the intermediate vector o is converted into the predicted relation type ŷ in the main model 100; unifying the two makes the establishment, optimization and maintenance of the relation extraction model 10 more convenient.
In some embodiments, the preset standard is the actual entity label: the predicted entity identification labels of all words are compared with the actual entity labels, and the comparison result is calculated through a cross-entropy loss function to obtain the entity label loss L_NER. Understandably, calculating the predicted entity identification label against the actual entity label through the cross-entropy loss function yields the difference between the predicted value and the actual value, namely the entity label loss L_NER; further optimizing the relationship extraction model 10 through the entity label loss L_NER enhances the model's understanding of entity meaning and improves its ability to predict entity labels.
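A minimal sketch of the cross-entropy calculation described above, assuming the predicted label distribution for each word is available; integer-coded gold labels and averaging over words are illustrative choices, not mandated by the patent:

```python
import numpy as np

def entity_label_loss(probs, gold):
    """Cross-entropy between predicted label distributions and actual entity labels.

    probs: (n_words, n_labels) SoftMax outputs; gold: integer actual label per word.
    """
    picked = probs[np.arange(len(gold)), gold]   # probability assigned to the actual label
    return float(-np.mean(np.log(picked)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
gold = np.array([0, 1])
loss = entity_label_loss(probs, gold)   # -(log 0.7 + log 0.8) / 2
```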
In some embodiments, the actual entity label of each word is determined by the position of the entity input into the shared encoder 1000: when a word does not belong to any entity, its entity label is the first label; if a word is the beginning of an entity, its entity label is the second label; if a word is in the middle or at the end of an entity, its entity label is the third label. Specifically, the first label is O, the second label is B, and the third label is I. Referring to FIG. 1, an exemplary input sentence is "An air force pilot is back", whose two input entities are "air force" and "pilot"; the actual entity labels of the words in this input sentence are therefore O B I B O O. Understandably, by labeling the words in the input sentence and comparing the predicted entity identification labels with the actual entity labels, the entity recognition learning module 101 can be trained to find entities in text, thereby enhancing the relationship extraction model 10's understanding of entities.
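The labeling scheme in the example above can be reproduced with a small helper; the function name and the (start, end)-exclusive span convention are illustrative assumptions:

```python
def bio_labels(words, entity_spans):
    """Assign O/B/I entity labels given entity spans as (start, end) word indices, end exclusive."""
    labels = ["O"] * len(words)                 # first label O: word belongs to no entity
    for start, end in entity_spans:
        labels[start] = "B"                     # second label B: beginning of an entity
        for i in range(start + 1, end):
            labels[i] = "I"                     # third label I: middle or end of an entity
    return labels

sentence = ["An", "air", "force", "pilot", "is", "back"]
spans = [(1, 3), (3, 4)]                        # "air force" and "pilot"
labels = bio_labels(sentence, spans)            # ['O', 'B', 'I', 'B', 'O', 'O']
```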
Continuing to refer to fig. 2, in some embodiments, the step of the discriminator 102 obtaining the discriminator 102 loss includes: the discriminator 102 obtains the result of comparing the predicted entity identification label with the preset standard to obtain a target output, the value of which is 0 or 1. The specific manner is as follows:
the hidden vector is fed into a preset third fully-connected layer 1020 and a third SoftMax classifier 1021 to obtain a 2-dimensional vector for each word, each dimension of the 2-dimensional vector corresponding to the distribution probability of the target output over 0 and 1 (for example, the predicted distribution probability at 0 is denoted as P(0|X)); the discriminator 102 loss is then obtained by calculation from the distribution probabilities.
In some embodiments, for each word, a negative log-likelihood loss is calculated, and the losses of all words are summed to obtain the discriminator 102 loss L_D. Specifically, the calculation takes the form L_D = -Σ_i log P(y_i | X), where y_i is the target output (0 or 1) for the i-th word and the sum runs over all words in the sentence.
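Under this reading, the discriminator loss is a per-word negative log-likelihood summed over the sentence; the variable names and example numbers below are illustrative:

```python
import numpy as np

def discriminator_loss(probs2, targets):
    """Sum over words of -log P(target output).

    probs2: (n_words, 2) output of the third fully-connected layer + SoftMax,
    one distribution over {0, 1} per word; targets: 0 or 1 per word.
    """
    idx = np.arange(len(targets))
    return float(-np.log(probs2[idx, targets]).sum())

probs2 = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
targets = np.array([0, 1])
loss_d = discriminator_loss(probs2, targets)   # -(log 0.9 + log 0.8)
```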
it can be understood that the provision of the discriminator 102 effectively controls the degree to which the shared encoder 1000 learns the entity recognition task and avoids overfitting of the shared encoder 1000 to that task. Without the discriminator 102 loss provided by the discriminator 102, the overall loss would be optimized more toward the entity recognition loss during the preliminary optimization, thereby affecting the performance of the relationship extraction model 10 on the main task, relationship extraction. It can be seen that the discriminator 102 loss provided by the discriminator 102 ensures that, when the preliminary optimization is performed through the overall loss, the main performance of the relationship extraction model 10 is not erroneously compromised.
In some embodiments, when calculating the overall loss, an adjustable control parameter λ is introduced, which is used to control the contribution of the entity recognition learning module 101 and the adversarial-learning discriminator 102 to model training. It can be understood that the control parameter λ provides an adjustable option for calculating the overall loss L; when optimizing the relationship extraction model 10, the emphasis of the optimization can be controlled by adjusting the control parameter λ. It can be seen that the provision of the control parameter improves the practicality of the optimization method 1.
Specifically, the overall loss L is calculated by the following formula:
L = L_RE + λ × (L_D + L_NER)
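One plausible sketch of this combination, assuming (consistent with claim 9, where the control parameter governs the contribution of both the entity recognition learning module and the discriminator) that λ scales both auxiliary losses; the grouping and the numbers are illustrative:

```python
def overall_loss(l_re, l_ner, l_d, lam):
    """Overall loss combining the relationship type loss with the two auxiliary
    losses; lam controls the auxiliary contribution (the grouping is an assumed reading)."""
    return l_re + lam * (l_d + l_ner)

total = overall_loss(l_re=1.0, l_ner=2.0, l_d=3.0, lam=0.5)   # 1.0 + 0.5 * 5.0 = 3.5
```

Setting lam to 0 recovers pure relation-extraction training, which matches the stated role of λ as a knob on how strongly the auxiliary modules steer optimization.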
After the overall loss L is calculated, all parameters in the relationship extraction model 10 are updated through a back-propagation algorithm.
Referring to fig. 2 and fig. 6, in some embodiments, the optimization method 1 further includes the following steps: after the relationship extraction model 10 is optimized through the overall loss L, a second optimization update is performed on the model parameters of the relation extraction encoder 1001 and of all the fully-connected layers by adjusting the parameters of some modules in the relationship extraction model 10. It can be understood that, by performing the second optimization update, the relationship extraction model 10 after the second optimization achieves better performance. Meanwhile, the relationship extraction model 10 after the second optimization is used in the same way as the baseline relation extraction model: it requires only sentences and entities as input and relies on no additional input, so compared with the baseline model its performance is enhanced without incurring any additional cost in use.
In some embodiments, the second optimization update includes the following steps: initializing the shared encoder 1000 and the relation extraction encoder 1001 in the main model 100, the first, second, and third fully-connected layers, and the SoftMax classifiers according to the parameters of the relationship extraction model 10 optimized through the overall loss L;
calculating the optimized relationship type loss L_RE in the same way as in the preliminary optimization;
and according to the optimized relationship type loss L_RE, performing the second optimization update on the model parameters of the relation extraction encoder 1001 and of the first, second, and third fully-connected layers through a back-propagation algorithm, to obtain the final optimized relationship extraction model 10.
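The second optimization update amounts to a gradient step that leaves the shared encoder's parameters untouched while updating the relation extraction encoder and the fully-connected layers. The module names and the plain gradient-descent update below are illustrative assumptions:

```python
import numpy as np

def secondary_update(params, grads, lr=0.01, frozen=("shared_encoder",)):
    """Apply a gradient step only to the modules selected for the second optimization;
    modules listed in `frozen` keep the parameters obtained from the preliminary optimization."""
    return {name: (w if name in frozen else w - lr * grads[name])
            for name, w in params.items()}

params = {"shared_encoder": np.ones(3),
          "relation_encoder": np.ones(3),
          "fc_layers": np.ones(3)}
grads = {name: np.full(3, 2.0) for name in params}
new_params = secondary_update(params, grads, lr=0.5)
# shared_encoder is unchanged; the other modules take the step 1.0 - 0.5 * 2.0 = 0.0
```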
It can be understood that the performance of the relationship extraction model 10 on the relationship extraction task is generally expressed by the F value. On an English relation extraction dataset, before optimization with the optimization method 1, the F value of the relationship extraction model 10 is 77.04; after optimization with the optimization method 1, the F value of the relationship extraction model 10 is 77.70. Therefore, introducing the entity recognition learning module 101 enhances the model's ability to model entities, thereby improving the performance of the relationship extraction model 10 on the relationship extraction task.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood, however, that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also appreciate that the embodiments described in this specification are exemplary and alternative embodiments, and that the acts and modules illustrated are not required in order to practice the invention.
In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not imply an inevitable order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Compared with the prior art, the optimization method for the relation extraction model provided by the invention has the following beneficial effects:
1. the embodiment of the invention provides an optimization method for a relation extraction model, wherein the relation extraction model comprises a main model, an entity recognition learning module and a discriminator, the main model comprises a sharing encoder and a relation extraction encoder, and the optimization method comprises the following steps: acquiring an input statement through a shared encoder in the main model, encoding the statement, and outputting a hidden vector of each word in the statement; inputting the hidden vector into a relation extraction encoder, an entity recognition learning module and a discriminator to respectively obtain relation type loss, entity label loss and discriminator loss; calculating the relationship type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain the overall loss; and carrying out primary optimization on the relation extraction model through the overall loss. It can be understood that, by setting the entity identification module in the preliminary optimization, the shared encoder in the main model realizes the learning of the entity in the text, and enhances the modeling capability of the main model on the entity, thereby improving the performance of the relationship extraction model on the relationship extraction task.
2. In the optimization method for the relational extraction model provided by the embodiment of the present invention, obtaining the relationship type loss by the main model includes the following steps: inputting the hidden vector into the relation extraction encoder in the main model, and further encoding the hidden vector by the relation extraction encoder to obtain a relation extraction hidden vector for each word; calculating the hidden vector and the relation extraction hidden vector of each word through a preset second algorithm to obtain a predicted relationship type; and comparing the predicted relationship type with a preset standard and performing calculation to obtain the relationship type loss. Understandably, the hidden vector and the encoded relation extraction hidden vector are processed through the second algorithm to obtain the relationship type currently predicted by the main model, and comparing the predicted relationship type with the preset standard yields the relationship type loss, which characterizes the model's ability to predict relationship types. When the relationship extraction model is preliminarily optimized through the overall loss, the relationship type loss optimizes the model's relation prediction ability, ensuring the reliability of the optimization method.
3. In the optimization method for the relational extraction model provided by the embodiment of the invention, the preset second algorithm includes the following steps: calculating the relation extraction hidden vector through a preset vector algorithm to obtain a vector representation of the first entity, a vector representation of the second entity, and a sentence representation of the relation extraction encoder, and simultaneously applying the preset vector algorithm to the hidden vector to obtain a sentence representation of the shared encoder; concatenating the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder, and the sentence representation of the shared encoder to obtain an intermediate vector; and feeding the intermediate vector through a fully-connected layer into a SoftMax classifier to obtain the predicted relationship type. Understandably, the hidden vector and the relation extraction hidden vector are calculated through the preset vector algorithm to obtain the vector representations of the two entities and the sentence representations of the relation extraction encoder and the shared encoder; these intermediate variables reflect vector representations of the entities or the sentence at different positions in the relation extraction model. The intermediate vector obtained by concatenating these representations is then classified and normalized by the fully-connected layer and the SoftMax classifier to yield the relationship type (a scalar) predicted by the main model for the two entities. Because the predicted relationship type is embodied in scalar form, subsequent calculation of the relationship type loss is facilitated.
4. The embodiment of the invention provides an optimization method for a relationship extraction model, wherein the entity recognition learning module includes an entity encoder, and obtaining the entity label loss by the entity recognition learning module includes the following steps: inputting the hidden vector into the entity encoder in the entity recognition learning module, and further encoding the hidden vector by the entity encoder to obtain an entity recognition hidden vector for each word; converting the hidden vector and the entity recognition hidden vector of each word to obtain a predicted entity identification label; and comparing the predicted entity identification labels of all words with a preset standard and performing calculation to obtain the entity label loss. It is to be understood that further encoding the hidden vector by the entity encoder yields an entity recognition hidden vector for each word, and the hidden vector and the entity recognition hidden vector of each word are converted into a predicted entity identification label (a scalar) indicating the approximate position of the word within an entity. Because the predicted entity identification label is embodied in scalar form, the entity label loss is easier to calculate subsequently.
5. In an optimization method for a relational extraction model provided in an embodiment of the present invention, a conversion process includes the following steps: and connecting the hidden vector of each word with the entity identification hidden vector in series, and sending the vector obtained after the connection in series into a full connection layer and a SoftMax classifier so as to obtain the predicted entity identification tag. It can be understood that the mode of classifying the vectors after being connected in series through the full connection layer and the SoftMax classifier and converting the vectors into the predicted entity identification tags is the same as the mode of converting the intermediate vectors into the predicted relationship classes in the main model, and the two modes are unified, so that the establishment, optimization and maintenance of the relationship extraction model are facilitated.
6. In the optimization method for the relationship extraction model provided by the embodiment of the invention, the preset standard is an actual entity label: the predicted entity identification labels of all words are compared with the actual entity label, and the comparison result is calculated through a cross-entropy loss function to obtain the entity label loss. It can be understood that calculating the predicted entity label against the actual entity label through the cross-entropy loss function yields the difference between the predicted value and the actual value, namely the entity label loss; further optimizing the relationship extraction model through the entity label loss enhances the model's understanding of entity meaning and improves its ability to predict entity labels.
7. In the optimization method for the relationship extraction model provided by the embodiment of the invention, the actual entity label of each word is determined by the position of the entity in the input, and when a word does not belong to a certain entity, the entity label is a first label; if a word is the beginning of an entity, the entity label is a second label; if a word is in the middle or at the end of an entity, the entity label is a third label. Understandably, by tagging words in sentences, the trainable entity recognition learning module can find the entity ability from the text, thereby enhancing the comprehension ability of the relationship extraction model to the entity.
8. In the optimization method for the relational extraction model provided by the embodiment of the invention, the step of the discriminator obtaining the discriminator loss includes the following steps: the discriminator obtains the result of comparing the predicted entity identification label with a preset standard to obtain a target output, the value of which is 0 or 1; the hidden vector is fed into a fully-connected layer and a SoftMax classifier to obtain a 2-dimensional vector for each word, each dimension of the 2-dimensional vector corresponding to the distribution probability of the target output over 0 and 1; and the discriminator loss is obtained by calculation from the distribution probabilities. Understandably, the provision of the discriminator effectively controls the degree to which the shared encoder learns the entity recognition task and avoids overfitting of the shared encoder to that task; without the discriminator loss provided by the discriminator, the overall loss would be optimized toward the entity recognition loss during the preliminary optimization, thereby affecting the performance of the relation extraction model on the main task, relation extraction. It can be seen that the discriminator loss provided by the discriminator ensures that the main performance of the relation extraction model is not erroneously compromised when the preliminary optimization is performed through the overall loss.
9. In the optimization method for the relation extraction model provided by the embodiment of the invention, when calculating the overall loss, an adjustable control parameter is introduced, which is used to control the contribution of the entity recognition learning module and the adversarial-learning discriminator to model training. It can be understood that the control parameter provides an adjustable option for the calculation of the overall loss, and when optimizing the relation extraction model, the emphasis of the optimization can be controlled by adjusting the control parameter. Therefore, the provision of the control parameter improves the practicality of the optimization method.
10. The optimization method for the relation extraction model provided by the embodiment of the invention further comprises the following steps: and after the relation extraction model is optimized through the whole loss, secondary optimization updating is carried out on the relation extraction encoder and the model parameters of the full connection layer by adjusting the parameters of partial modules in the relation extraction model. Understandably, by performing the second optimization updating, the relationship extraction model after the second optimization can obtain better performance. Meanwhile, the relation extraction model after secondary optimization is the same as the baseline model of relation extraction in use, only sentences and entities are needed to be used as input, additional input is not needed, and compared with the baseline model, the performance is enhanced while additional use expense is not caused.
The optimization method for the relation extraction model disclosed by the embodiment of the invention is described in detail, a specific example is applied in the description to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for the persons skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present description should not be construed as a limitation to the present invention, and any modification, equivalent replacement, and improvement made within the principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. An optimization method for a relationship extraction model, the relationship extraction model comprising a main model, an entity recognition learning module and a discriminator, the main model comprising a sharing encoder and a relationship extraction encoder, characterized in that: the method comprises the following steps:
acquiring an input statement through a shared encoder in the main model, encoding the statement, and outputting a hidden vector of each word in the statement;
inputting the hidden vector into the relation extraction encoder, the entity identification learning module and the discriminator to respectively obtain the relation type loss, the entity label loss and the discriminator loss;
calculating the relationship type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain an overall loss;
and carrying out primary optimization on the relation extraction model through the overall loss.
2. The method of claim 1, wherein: the obtaining of the relationship type loss by the main model comprises the following steps:
inputting the hidden vector into a relation extraction encoder in the main model, and further encoding the hidden vector by the relation extraction encoder to obtain a relation extraction hidden vector of each word;
calculating the hidden vector of each word and the relation extraction hidden vector through a preset second algorithm to obtain a predicted relation type;
and comparing the predicted relationship type with a preset standard for calculation to obtain the relationship type loss.
3. The method of claim 2, wherein: the preset second algorithm comprises the following steps:
calculating the relation extraction hidden vector through a preset vector algorithm to obtain a vector representation of a first entity, a vector representation of a second entity and a sentence representation of a relation extraction encoder, and simultaneously applying the preset vector algorithm to the hidden vector to obtain a sentence representation of a shared encoder;
connecting the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder and the sentence representation of the sharing encoder in series to obtain an intermediate vector;
and the intermediate vector is sent into a preset first SoftMax classifier after passing through a preset first full connection layer, so that the predicted relation type is obtained.
4. The method of claim 1, wherein: the entity identification learning module comprises an entity encoder, and the step of obtaining the entity label loss by the entity identification learning module comprises the following steps:
inputting the hidden vector into an entity encoder in the entity recognition learning module, and further encoding the hidden vector by the entity encoder to obtain an entity recognition hidden vector of each word;
converting the hidden vector of each word and the entity identification hidden vector to obtain a predicted entity identification label;
and comparing the predicted entity identification labels of all words with a preset standard for calculation to obtain the entity label loss.
5. The method of claim 4, wherein: the conversion process includes the steps of:
and connecting the hidden vector of each word with the entity identification hidden vector in series, and sending the vector obtained after the connection in series into a preset second full connection layer and a preset second SoftMax classifier so as to obtain the predicted entity identification tag.
6. The method of claim 4, wherein:
and the preset standard is an actual entity label, the predicted entity identification labels of all the words are compared with the actual entity label, and the comparison result is calculated through a cross entropy loss function to obtain the entity label loss.
7. The method of claim 6, wherein: the actual entity label of each word is determined by the position of the entity in the input, and when one word does not belong to a certain entity, the entity label is a first label; if a word is the beginning of an entity, the entity label is a second label; and if one word is in the middle or at the end of a certain entity, the entity label is a third label.
8. The method of claim 6, wherein: the step of the arbiter obtaining the arbiter loss comprises the steps of:
the discriminator obtains a result of comparison between the predicted entity identification label and a preset standard to obtain target output, and the value of the target output is 0 or 1;
sending the hidden vector to a preset third full connection layer and a third SoftMax classifier, and obtaining a 2-dimensional vector for each word, wherein each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output on 0 and 1;
and calculating the loss of the discriminator according to the distribution probability.
9. The method of claim 1, wherein: when calculating the overall loss, an adjustable control parameter is introduced, and the control parameter is used for controlling the contribution of the entity recognition learning module and the discriminator for adversarial learning to model training.
10. The method of claim 5, wherein: the method further comprises the steps of:
and after the relation extraction model is optimized through the overall loss, secondary optimization updating is carried out on the relation extraction encoder and the model parameters of the full connection layer by adjusting the parameters of partial modules in the relation extraction model.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210753408.0A CN114970857A (en) | 2022-06-29 | 2022-06-29 | Optimization method for relational extraction model |
PCT/CN2022/128623 WO2024000966A1 (en) | 2022-06-29 | 2022-10-31 | Optimization method for natural language model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210753408.0A CN114970857A (en) | 2022-06-29 | 2022-06-29 | Optimization method for relational extraction model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114970857A true CN114970857A (en) | 2022-08-30 |
Family
ID=82966585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210753408.0A Pending CN114970857A (en) | 2022-06-29 | 2022-06-29 | Optimization method for relational extraction model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114970857A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024000966A1 (en) * | 2022-06-29 | 2024-01-04 | 苏州思萃人工智能研究所有限公司 | Optimization method for natural language model |
CN117610562A (en) * | 2024-01-23 | 2024-02-27 | 中国科学技术大学 | Relation extraction method combining combined category grammar and multi-task learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344391B (en) | Multi-feature fusion Chinese news text abstract generation method based on neural network | |
CN110765966B (en) | One-stage automatic recognition and translation method for handwritten characters | |
CN109829299B (en) | Unknown attack identification method based on depth self-encoder | |
CN108416058B (en) | Bi-LSTM input information enhancement-based relation extraction method | |
CN114970857A (en) | Optimization method for relational extraction model | |
WO2020140487A1 (en) | Speech recognition method for human-machine interaction of smart apparatus, and system | |
CN109492113B (en) | Entity and relation combined extraction method for software defect knowledge | |
CN115145551A (en) | Intelligent auxiliary system for machine learning application low-code development | |
CN110781290A (en) | Extraction method of structured text abstract of long chapter | |
CN115311687A (en) | Natural language pedestrian retrieval method and system combining token and feature alignment | |
CN113673535A (en) | Image description generation method of multi-modal feature fusion network | |
CN115168541A (en) | Chapter event extraction method and system based on frame semantic mapping and type perception | |
CN114168754A (en) | Relation extraction method based on syntactic dependency and fusion information | |
CN117058667A (en) | End-to-end scene text recognition method based on CLIP | |
CN117036706A (en) | Image segmentation method and system based on multi-modal dialogue language model | |
CN115034228A (en) | Optimization method for emotion analysis model | |
CN113282714A (en) | Event detection method based on differential word vector representation | |
CN113628288B (en) | Controllable image subtitle generation optimization method based on coder-decoder structure | |
CN114529908A (en) | Offline handwritten chemical reaction type image recognition technology | |
WO2024000966A1 (en) | Optimization method for natural language model | |
CN116561325B (en) | Multi-language fused media text emotion analysis method | |
CN116958700A (en) | Image classification method based on prompt engineering and contrast learning | |
CN113342982B (en) | Enterprise industry classification method integrating Roberta and external knowledge base | |
CN114254080A (en) | Text matching method, device and equipment | |
CN113505587A (en) | Entity extraction method, related device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||