CN114970857A - Optimization method for relational extraction model - Google Patents
- Publication number
- CN114970857A (application CN202210753408.0A)
- Authority
- CN
- China
- Prior art keywords
- entity
- loss
- vector
- encoder
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention relates to the technical field of natural language processing, in particular to an optimization method for a relation extraction model. The relation extraction model comprises a main model, an entity recognition learning module and a discriminator, and the main model comprises a shared encoder and a relation extraction encoder. The method comprises the following steps: acquiring an input sentence through the shared encoder in the main model, encoding the sentence, and outputting a hidden vector of each word in the sentence; inputting the hidden vector into the relation extraction encoder, the entity recognition learning module and the discriminator to respectively obtain the relation type loss, the entity label loss and the discriminator loss; combining the relation type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain the overall loss; and preliminarily optimizing the relation extraction model through the overall loss. The optimization method provided by the invention enhances the model's understanding of entity meanings through entity recognition learning, thereby improving the model's performance on the relation extraction task.
Description
[ technical field ]
The invention relates to the technical field of natural language processing, in particular to an optimization method for a relation extraction model.
[ background of the invention ]
The relation extraction task aims to extract (predict) the relationship between two given entities in a given sentence. Understanding the meaning of the entities is crucial to the accuracy of relationship prediction; however, existing methods often neglect the modeling of entities and thus understand entity meanings insufficiently.
[ summary of the invention ]
In order to solve the problem that existing methods do not sufficiently understand the meaning of entities, the invention provides an optimization method for a relation extraction model.
The present invention provides an optimization method for a relation extraction model to solve the above technical problem, where the relation extraction model includes a main model, an entity recognition learning module and a discriminator, and the main model includes a shared encoder and a relation extraction encoder. The method includes the following steps: acquiring an input sentence through the shared encoder in the main model, encoding the sentence, and outputting a hidden vector of each word in the sentence; inputting the hidden vector into the relation extraction encoder, the entity recognition learning module and the discriminator to respectively obtain the relation type loss, the entity label loss and the discriminator loss; combining the relation type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain an overall loss; and preliminarily optimizing the relation extraction model through the overall loss.
Preferably, the obtaining of the relationship type loss by the main model comprises the following steps: inputting the hidden vector into a relation extraction encoder in the main model, and further encoding the hidden vector by the relation extraction encoder to obtain a relation extraction hidden vector of each word; calculating the hidden vector of each word and the relation extraction hidden vector through a preset second algorithm to obtain a predicted relation type; and comparing the predicted relationship type with a preset standard for calculation to obtain the relationship type loss.
Preferably, the preset second algorithm comprises the steps of: calculating the relation extraction hidden vector through a preset vector algorithm to obtain a vector representation of a first entity, a vector representation of a second entity and a sentence representation of a relation extraction encoder, and simultaneously applying the preset vector algorithm to the hidden vector to obtain a sentence representation of a shared encoder; connecting the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder and the sentence representation of the sharing encoder in series to obtain an intermediate vector; and the intermediate vector is sent into a SoftMax classifier after passing through a full connection layer, so that the predicted relation type is obtained.
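The second algorithm described above can be sketched in code. This is a hedged illustration, not the patent's implementation: it assumes max pooling as the preset vector algorithm (as a later embodiment states), hand-written fully connected weights, and Python lists for vectors; the function names, spans and dimensions are illustrative assumptions.

```python
import math

def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors (assumed pooling)."""
    return [max(v[i] for v in vectors) for i in range(len(vectors[0]))]

def softmax(logits):
    """Numerically stable SoftMax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_relation(re_hidden, shared_hidden, e1_span, e2_span, weights, bias):
    """re_hidden / shared_hidden: per-word vectors from the two encoders;
    e1_span / e2_span: (start, end) word indices of the two entities."""
    v_e1 = max_pool(re_hidden[e1_span[0]:e1_span[1]])   # first-entity representation
    v_e2 = max_pool(re_hidden[e2_span[0]:e2_span[1]])   # second-entity representation
    s_re = max_pool(re_hidden)                          # RE-encoder sentence representation
    s_shared = max_pool(shared_hidden)                  # shared-encoder sentence representation
    o = v_e1 + v_e2 + s_re + s_shared                   # concatenation (list +) -> intermediate vector
    logits = [sum(w * x for w, x in zip(row, o)) + b    # fully connected layer
              for row, b in zip(weights, bias)]
    probs = softmax(logits)                             # SoftMax classifier
    return probs.index(max(probs)), probs               # predicted relation type index
```

With 2-dimensional hidden vectors and two relation types, `predict_relation` returns the arg-max relation index together with its probability distribution.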
Preferably, the entity identification learning module comprises an entity encoder, and the step of obtaining the entity tag loss by the entity identification learning module comprises the following steps: inputting the hidden vector into an entity encoder in the entity recognition learning module, and further encoding the hidden vector by the entity encoder to obtain an entity recognition hidden vector of each word; converting the hidden vector of each word and the entity identification hidden vector to obtain a predicted entity identification label; and comparing the predicted entity identification labels of all words with a preset standard for calculation to obtain the entity label loss.
Preferably, the conversion process comprises the steps of: and connecting the hidden vector of each word with the entity identification hidden vector in series, and sending the vector obtained after the connection in series into a full connection layer and a SoftMax classifier so as to obtain the predicted entity identification label.
Preferably, the preset standard is an actual entity label, the predicted entity identification labels of all the words are compared with the actual entity label, and the comparison result is calculated through a cross entropy loss function to obtain the entity label loss.
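The cross entropy calculation above can be sketched minimally: the per-word loss is the negative log probability the model assigned to the actual label, averaged over the sentence. The label indices and probabilities here are illustrative assumptions, not the patent's data.

```python
import math

def entity_tag_loss(pred_probs, actual_labels):
    """pred_probs: per-word probability lists over the label set;
    actual_labels: per-word gold label indices. Returns mean cross entropy."""
    total = 0.0
    for probs, gold in zip(pred_probs, actual_labels):
        total += -math.log(probs[gold])   # penalty shrinks as p(gold) -> 1
    return total / len(actual_labels)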
Preferably, the actual entity label of each word is determined by the position of the entity in the input, and when a word does not belong to a certain entity, the entity label is a first label; if a word is the beginning of an entity, the entity label is a second label; and if one word is in the middle or at the end of a certain entity, the entity label is a third label.
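The three-label scheme above can be sketched as follows; the concrete labels O/B/I follow the example given later in the detailed description, and the span representation is an assumption.

```python
def bio_labels(num_words, entity_spans):
    """entity_spans: list of (start, end) word-index ranges, end exclusive."""
    labels = ["O"] * num_words                 # first label: not in any entity
    for start, end in entity_spans:
        labels[start] = "B"                    # second label: entity beginning
        for i in range(start + 1, end):
            labels[i] = "I"                    # third label: middle or end
    return labels
```

For the sentence "An air force pilot is back" with entities "air force" (words 1-2) and "pilot" (word 3), this yields O B I B O O.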
Preferably, the obtaining of the discriminator loss by the discriminator includes the steps of: the discriminator obtains a result of comparison between the predicted entity identification label and a preset standard to obtain target output, and the value of the target output is 0 or 1; sending the hidden vector into a full connection layer and a SoftMax classifier, and obtaining a 2-dimensional vector for each word, wherein each dimension of the 2-dimensional vector corresponds to the distribution probability of target output on 0 and 1; and calculating to obtain the loss of the discriminator according to the distribution probability.
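The discriminator loss above can be sketched under stated assumptions: the target output is taken to be 1 when the predicted entity label matched the preset standard and 0 otherwise, and the loss is the mean negative log probability the 2-dimensional SoftMax output assigns to that target. The exact loss form in the patent may differ; this is only one plausible reading.

```python
import math

def discriminator_loss(two_dim_probs, targets):
    """two_dim_probs: per-word [p(target=0), p(target=1)] from the SoftMax;
    targets: per-word 0/1 target outputs."""
    total = 0.0
    for (p0, p1), t in zip(two_dim_probs, targets):
        total += -math.log(p1 if t == 1 else p0)   # negative log-likelihood of target
    return total / len(targets)
```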
Preferably, adjustable control parameters are introduced when calculating the overall loss; the control parameters are used to control the contributions of the entity recognition learning module and of the discriminator for adversarial learning to model training.
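One plausible form of the preset first algorithm with control parameters is a weighted sum of the three losses. The additive form and the weight names `lambda_ner` and `lambda_d` are assumptions for illustration — the patent only states that a first algorithm combines the losses and that the parameters control the contributions of entity recognition and adversarial learning.

```python
def overall_loss(l_re, l_ner, l_d, lambda_ner=1.0, lambda_d=1.0):
    """Combine relation type, entity label and discriminator losses
    with adjustable control parameters (assumed weighted-sum form)."""
    return l_re + lambda_ner * l_ner + lambda_d * l_d
```

Setting `lambda_ner=0` removes the entity recognition contribution entirely, which is the kind of emphasis shifting the control parameters enable.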
Preferably, the method further comprises the following step: after the relation extraction model is optimized through the overall loss, adjusting the parameters of some modules in the relation extraction model and performing a secondary optimization update on the model parameters of the relation extraction encoder and the fully connected layer.
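The secondary optimization above can be sketched conceptually: parameters of modules outside the relation extraction encoder and fully connected layer are held fixed while only those two are updated. Treating the other modules as frozen, the module names, and the plain gradient step are all illustrative assumptions, not the patent's procedure.

```python
def secondary_update(params, grads, lr=0.01,
                     trainable=("re_encoder", "fc_layer")):
    """params/grads: {module_name: [parameter values]};
    only modules listed in `trainable` receive a gradient step."""
    new_params = {}
    for module, values in params.items():
        if module in trainable:
            new_params[module] = [v - lr * g
                                  for v, g in zip(values, grads[module])]
        else:
            new_params[module] = list(values)      # frozen: unchanged copy
    return new_params
```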
Compared with the prior art, the optimization method for the relational extraction model provided by the invention has the following beneficial effects:
1. The embodiment of the invention provides an optimization method for a relation extraction model, wherein the relation extraction model includes a main model, an entity recognition learning module and a discriminator, and the main model includes a shared encoder and a relation extraction encoder. The optimization method includes the following steps: acquiring an input sentence through the shared encoder in the main model, encoding the sentence, and outputting a hidden vector of each word in the sentence; inputting the hidden vector into the relation extraction encoder, the entity recognition learning module and the discriminator to respectively obtain the relation type loss, the entity label loss and the discriminator loss; combining the relation type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain the overall loss; and preliminarily optimizing the relation extraction model through the overall loss. It can be understood that, by introducing the entity recognition learning module in the preliminary optimization, the shared encoder in the main model learns the entities in the text, which enhances the main model's ability to model entities and thereby improves the performance of the relation extraction model on the relation extraction task.
2. In the optimization method for the relation extraction model provided by the embodiment of the present invention, obtaining the relation type loss from the main model includes the following steps: inputting the hidden vector into the relation extraction encoder in the main model, the relation extraction encoder further encoding the hidden vector to obtain the relation extraction hidden vector of each word; processing the hidden vector and the relation extraction hidden vector of each word through a preset second algorithm to obtain the predicted relation type; and comparing the predicted relation type with a preset standard to calculate the relation type loss. Understandably, processing the hidden vector and the encoded relation extraction hidden vector through the second algorithm yields the relation type currently predicted by the main model, and comparing it with the preset standard yields the relation type loss, which characterizes the ability of the relation extraction model to predict relation types. During the preliminary optimization through the overall loss, the relation type loss optimizes the relation-prediction ability of the relation extraction model, guaranteeing the reliability of the optimization method.
3. In the optimization method for the relation extraction model provided by the embodiment of the invention, the preset second algorithm includes the following steps: processing the relation extraction hidden vector through a preset vector algorithm to obtain the vector representation of the first entity, the vector representation of the second entity and the sentence representation of the relation extraction encoder, and meanwhile applying the preset vector algorithm to the hidden vector to obtain the sentence representation of the shared encoder; concatenating the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder and the sentence representation of the shared encoder to obtain an intermediate vector; and passing the intermediate vector through a fully connected layer and then a SoftMax classifier to obtain the predicted relation type. These representations serve as intermediate variables reflecting the vector representations of the entities and the sentence at different positions in the relation extraction model. After the concatenated intermediate vector undergoes the classification and normalization of the fully connected layer and SoftMax, the relation type (a scalar) predicted by the main model for the two entities is obtained; embodying the predicted relation type as a scalar facilitates the subsequent calculation of the relation type loss.
4. In the optimization method for the relation extraction model provided by the embodiment of the present invention, the entity recognition learning module includes an entity encoder, and obtaining the entity label loss through the entity recognition learning module includes the following steps: inputting the hidden vector into the entity encoder in the entity recognition learning module, the entity encoder further encoding the hidden vector to obtain the entity recognition hidden vector of each word; converting the hidden vector and the entity recognition hidden vector of each word to obtain the predicted entity recognition label; and comparing the predicted entity recognition labels of all words with a preset standard to calculate the entity label loss. It is to be understood that the further encoding by the entity encoder yields an entity recognition hidden vector for each word, and the conversion of the hidden vector and the entity recognition hidden vector yields a predicted entity recognition label (a scalar) indicating the approximate position of a word within an entity. Embodying the predicted entity recognition label in scalar form makes the subsequent calculation of the entity recognition loss easier.
5. In an optimization method for a relational extraction model provided in an embodiment of the present invention, a conversion process includes the following steps: and connecting the hidden vector of each word with the entity identification hidden vector in series, and sending the vector obtained after the connection in series into a full connection layer and a SoftMax classifier so as to obtain the predicted entity identification label. It can be understood that the mode of classifying the vectors after being connected in series through the full connection layer and the SoftMax classifier and converting the vectors into the predicted entity identification tags is the same as the mode of converting the intermediate vectors into the predicted relationship classes in the main model, and the two modes are unified, so that the establishment, optimization and maintenance of the relationship extraction model are facilitated.
6. In the optimization method for the relation extraction model provided by the embodiment of the invention, the preset standard is the actual entity label; the predicted entity recognition labels of all words are compared with the actual entity labels, and the comparison result is calculated through a cross entropy loss function to obtain the entity label loss. It can be understood that calculating the predicted entity labels against the actual entity labels with the cross entropy loss function yields the difference between the predicted and actual values, namely the entity label loss; further optimizing the relation extraction model through the entity label loss enhances the model's understanding of entity meanings and improves its ability to predict entity labels.
7. In the optimization method for the relation extraction model provided by the embodiment of the invention, the actual entity label of each word is determined by the position of the entity in the input: when a word does not belong to any entity, its entity label is the first label; if a word is the beginning of an entity, its entity label is the second label; and if a word is in the middle or at the end of an entity, its entity label is the third label. Understandably, by labeling the words in sentences, the entity recognition learning module can be trained to find entities in text, thereby enhancing the relation extraction model's comprehension of entities.
8. In the optimization method for the relation extraction model provided by the embodiment of the invention, obtaining the discriminator loss through the discriminator includes the following steps: the discriminator acquires the result of comparing the predicted entity recognition label with the preset standard to obtain the target output, whose value is 0 or 1; the hidden vector is sent into a fully connected layer and a SoftMax classifier to obtain, for each word, a 2-dimensional vector whose dimensions correspond to the probabilities of the target output being 0 and 1; and the discriminator loss is calculated from these probabilities. Understandably, the discriminator effectively controls the degree to which the shared encoder learns the entity recognition task and avoids overfitting the shared encoder to it: without the discriminator loss, the overall loss would be optimized toward the entity recognition loss during the preliminary optimization, harming the performance of the relation extraction model on the main task of relation extraction. It can be seen that the discriminator loss guarantees that the main performance of the relation extraction model is not overwhelmed when the model is preliminarily optimized through the overall loss.
9. In the optimization method for the relation extraction model provided by the embodiment of the invention, adjustable control parameters are introduced when calculating the overall loss; the control parameters control the contributions of the entity recognition learning module and of the discriminator for adversarial learning to model training. It can be understood that the control parameters make the calculation of the overall loss adjustable: by tuning them when optimizing the relation extraction model, the emphasis of the optimization can be controlled. Setting the control parameters therefore improves the practicability of the optimization method.
10. The optimization method for the relation extraction model provided by the embodiment of the invention further includes the following step: after the relation extraction model is optimized through the overall loss, the parameters of some modules in the relation extraction model are adjusted, and a secondary optimization update is performed on the model parameters of the relation extraction encoder and the fully connected layer. Understandably, the relation extraction model obtains better performance after the secondary optimization. Meanwhile, the model after secondary optimization is used in the same way as a baseline relation extraction model: only the sentence and the entities are needed as input, with no additional inputs, so the performance is enhanced without additional usage overhead compared with the baseline model.
[ description of the drawings ]
Fig. 1 is a schematic diagram of steps of an optimization method for a relational extraction model according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a relational extraction model of an optimization method for a relational extraction model according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of logic steps of an optimization method for a relational extraction model according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a step of calculating a relationship type loss by the optimization method for the relationship extraction model according to the embodiment of the present invention.
Fig. 5 is a schematic diagram of a step of calculating an entity identification tag loss by the optimization method for the relational extraction model according to the embodiment of the present invention.
Fig. 6 is a schematic diagram of quadratic optimization of an optimization method for a relational extraction model according to an embodiment of the present invention.
The attached drawings indicate the following:
1. an optimization method;
10. a relation extraction model;
100. a main model; 101. an entity recognition learning module; 102. a discriminator;
1000. a shared encoder; 1001. a relation extraction encoder; 1002. a first fully connected layer; 1003. a first SoftMax classifier; 1010. an entity encoder; 1011. a second fully connected layer; 1012. a second SoftMax classifier; 1020. a third fully connected layer; 1021. a third SoftMax classifier.
[ detailed description of the embodiments ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and implementation examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1 to fig. 3, a first embodiment of the present invention provides an optimization method 1 for a relation extraction model 10, including the following steps: obtaining an input sentence through a shared encoder 1000 (Shared Encoder) in a preset main model 100 (RE), encoding the sentence, and outputting a hidden vector h_i of each word in the sentence. The relation extraction model 10 includes the main model 100, an entity recognition learning module 101 (NER) and a discriminator 102 (The Discriminator), and the main model 100 further includes a relation extraction encoder 1001 (RE Encoder). The hidden vectors h_i are input into the relation extraction encoder 1001, the entity recognition learning module 101 and the discriminator 102 to respectively obtain the relation type loss L_RE, the entity label loss L_NER and the discriminator loss L_D. The relation type loss L_RE, the entity label loss L_NER and the discriminator loss L_D are combined through a preset first algorithm to obtain the overall loss L, and the relation extraction model 10 is preliminarily optimized through the overall loss L. It can be understood that, by introducing the entity recognition learning module 101 in the preliminary optimization, the shared encoder 1000 in the main model 100 learns the entities in the text, which enhances the entity-modeling capability of the main model 100 and thereby improves the performance of the relation extraction model 10 on the relation extraction task.
Referring to fig. 2 and 4, in some embodiments, obtaining the relation type loss from the main model 100 includes the following steps: the hidden vectors h_i are input into the relation extraction encoder 1001 in the main model 100, and the relation extraction encoder 1001 further encodes the hidden vectors h_i to obtain the relation extraction hidden vector h_i^RE of each word. The hidden vector h_i and the relation extraction hidden vector h_i^RE of each word are processed through a preset second algorithm to obtain the predicted relation type ŷ, and the predicted relation type ŷ is compared with a preset standard to calculate the relation type loss L_RE. Understandably, processing the hidden vectors h_i and the encoded relation extraction hidden vectors h_i^RE through the second algorithm yields the relation type ŷ currently predicted by the main model 100; comparing ŷ with the preset standard yields the relation type loss L_RE, which characterizes the ability of the relation extraction model 10 to predict relation types. During the preliminary optimization of the relation extraction model 10 through the overall loss L, the relation type loss L_RE optimizes the relation-prediction ability of the model, guaranteeing the reliability of the optimization method 1.
Referring to fig. 2, in some embodiments, the preset second algorithm includes the following steps: the relation extraction hidden vectors h_i^RE are processed through a preset vector algorithm to obtain the vector representation v_E1 of the first entity, the vector representation v_E2 of the second entity and the sentence representation s_RE of the relation extraction encoder 1001; meanwhile, the preset vector algorithm is applied to the hidden vectors h_i to obtain the sentence representation s_shared of the shared encoder 1000. The representations v_E1, v_E2, s_RE and s_shared are concatenated to obtain an intermediate vector o; the intermediate vector o is passed through a preset first fully connected layer 1002 and then sent into a first SoftMax classifier 1003 to obtain the predicted relation type ŷ. Understandably, v_E1 and v_E2 (E_1 and E_2 denote the two entities), s_RE and s_shared serve as intermediate variables reflecting the vector representations of the entities and the sentence at different positions in the relation extraction model 10. After the intermediate vector o obtained by concatenating these representations undergoes the classification and normalization of the first fully connected layer 1002 and the first SoftMax classifier 1003, the relation type ŷ (a scalar) predicted by the main model 100 for the two entities is obtained; embodying the predicted relation type ŷ as a scalar facilitates the subsequent calculation of the relation type loss L_RE.
In some embodiments, the preset vector algorithm is the MaxPooling algorithm. The concrete method is as follows:
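Since the formula that followed was not preserved here, this is a hedged sketch of the usual reading of the MaxPooling algorithm named above: an element-wise maximum over the hidden vectors of a word span.

```python
def max_pooling(vectors):
    """Element-wise maximum over a list of equal-length vectors:
    output dimension j holds the maximum of dimension j across all inputs."""
    dim = len(vectors[0])
    return [max(v[j] for v in vectors) for j in range(dim)]
```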
Referring to fig. 2 and 5, in some embodiments, the entity recognition learning module 101 includes an entity encoder 1010, and the entity recognition learning module 101 obtains the entity label loss through the following steps: the hidden vectors h_i are input into the entity encoder 1010 in the entity recognition learning module 101, and the entity encoder 1010 further encodes the hidden vectors h_i to obtain the entity recognition hidden vector h_i^NER of each word. The hidden vector h_i and the entity recognition hidden vector h_i^NER of each word undergo a conversion process to obtain the predicted entity recognition label t̂_i, and the predicted entity recognition labels t̂_i of all words are compared with a preset standard to calculate the entity label loss L_NER. Understandably, the further encoding of the hidden vectors by the entity encoder 1010 yields an entity recognition hidden vector h_i^NER for each word, and converting h_i and h_i^NER yields the predicted entity recognition label t̂_i (a scalar) indicating the approximate position of a word within an entity. It should be understood that a scalar here refers to a quantity without direction, whose content may be a number or a character. As can be seen, embodying the predicted entity recognition labels t̂_i in scalar form makes the subsequent calculation of the entity recognition loss easier.
Referring to fig. 2, in some embodiments, the conversion process includes the following steps: the hidden vector h_i of each word is concatenated with the entity recognition hidden vector h_i^NER, and the concatenated vector is sent into a second fully connected layer 1011 and a second SoftMax classifier 1012 to obtain the predicted entity recognition label t̂_i. Understandably, the manner of classifying the concatenated vector through the second fully connected layer 1011 and the second SoftMax classifier 1012 and converting it into the predicted entity recognition label t̂_i is the same as the manner in which the intermediate vector o is converted into the predicted relation type ŷ in the main model 100; unifying the two makes the establishment, optimization and maintenance of the relation extraction model 10 more convenient.
In some embodiments, the preset standard is the actual entity label: the predicted entity identification labels of all words are compared with the actual entity labels, and the comparison result is calculated through a cross-entropy loss function to obtain the entity label loss L_NER. Understandably, calculating the predicted entity identification label against the actual entity label through the cross-entropy loss function yields the difference between the predicted value and the actual value, namely the entity label loss L_NER; further optimizing the relationship extraction model 10 through the entity label loss L_NER enhances the model's understanding of entity meaning and improves its ability to predict entity labels.
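A minimal sketch of the cross-entropy calculation described above, assuming the predicted label distribution for each word is available; integer-coded gold labels and averaging over words are illustrative choices, not mandated by the patent:

```python
import numpy as np

def entity_label_loss(probs, gold):
    """Cross-entropy between predicted label distributions and actual entity labels.

    probs: (n_words, n_labels) SoftMax outputs; gold: integer actual label per word.
    """
    picked = probs[np.arange(len(gold)), gold]   # probability assigned to the actual label
    return float(-np.mean(np.log(picked)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
gold = np.array([0, 1])
loss = entity_label_loss(probs, gold)   # -(log 0.7 + log 0.8) / 2
```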
In some embodiments, the actual entity label of each word is determined by the position of the entity input into the shared encoder 1000: when a word does not belong to any entity, its entity label is the first label; if a word is the beginning of an entity, its entity label is the second label; if a word is in the middle or at the end of an entity, its entity label is the third label. Specifically, the first label is O, the second label is B, and the third label is I. Referring to FIG. 1, an exemplary input sentence is "An air force pilot is back", whose two input entities are "air force" and "pilot"; the actual entity labels of the words in this input sentence are therefore O B I B O O. Understandably, by labeling the words in the input sentence and comparing the predicted entity identification labels with the actual entity labels, the entity recognition learning module 101 can be trained to find entities in text, thereby enhancing the relationship extraction model 10's understanding of entities.
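The labeling scheme in the example above can be reproduced with a small helper; the function name and the (start, end)-exclusive span convention are illustrative assumptions:

```python
def bio_labels(words, entity_spans):
    """Assign O/B/I entity labels given entity spans as (start, end) word indices, end exclusive."""
    labels = ["O"] * len(words)                 # first label O: word belongs to no entity
    for start, end in entity_spans:
        labels[start] = "B"                     # second label B: beginning of an entity
        for i in range(start + 1, end):
            labels[i] = "I"                     # third label I: middle or end of an entity
    return labels

sentence = ["An", "air", "force", "pilot", "is", "back"]
spans = [(1, 3), (3, 4)]                        # "air force" and "pilot"
labels = bio_labels(sentence, spans)            # ['O', 'B', 'I', 'B', 'O', 'O']
```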
Continuing to refer to fig. 2, in some embodiments, the step of the discriminator 102 obtaining the discriminator 102 loss includes: the discriminator 102 obtains the result of comparing the predicted entity identification label with the preset standard to obtain a target output, the value of which is 0 or 1. The specific manner is as follows:
the hidden vector is fed into a preset third fully-connected layer 1020 and a third SoftMax classifier 1021 to obtain a 2-dimensional vector for each word, each dimension of the 2-dimensional vector corresponding to the distribution probability of the target output over 0 and 1 (for example, the predicted distribution probability at 0 is denoted as P(0|X)); the discriminator 102 loss is then obtained by calculation from the distribution probabilities.
In some embodiments, for each word, a negative log-likelihood loss is calculated, and the losses of all words are summed to obtain the discriminator 102 loss L_D. Specifically, the calculation takes the form L_D = -Σ_i log P(y_i | X), where y_i is the target output (0 or 1) for the i-th word and the sum runs over all words in the sentence.
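Under this reading, the discriminator loss is a per-word negative log-likelihood summed over the sentence; the variable names and example numbers below are illustrative:

```python
import numpy as np

def discriminator_loss(probs2, targets):
    """Sum over words of -log P(target output).

    probs2: (n_words, 2) output of the third fully-connected layer + SoftMax,
    one distribution over {0, 1} per word; targets: 0 or 1 per word.
    """
    idx = np.arange(len(targets))
    return float(-np.log(probs2[idx, targets]).sum())

probs2 = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
targets = np.array([0, 1])
loss_d = discriminator_loss(probs2, targets)   # -(log 0.9 + log 0.8)
```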
it can be understood that the provision of the discriminator 102 effectively controls the degree to which the shared encoder 1000 learns the entity recognition task and avoids overfitting of the shared encoder 1000 to that task. Without the discriminator 102 loss provided by the discriminator 102, the overall loss would be optimized more toward the entity recognition loss during the preliminary optimization, thereby affecting the performance of the relationship extraction model 10 on the main task, relationship extraction. It can be seen that the discriminator 102 loss provided by the discriminator 102 ensures that, when the preliminary optimization is performed through the overall loss, the main performance of the relationship extraction model 10 is not erroneously compromised.
In some embodiments, when calculating the overall loss, an adjustable control parameter λ is introduced, which is used to control the contribution of the entity recognition learning module 101 and the adversarial-learning discriminator 102 to model training. It can be understood that the control parameter λ provides an adjustable option for calculating the overall loss L; when optimizing the relationship extraction model 10, the emphasis of the optimization can be controlled by adjusting the control parameter λ. It can be seen that the provision of the control parameter improves the practicality of the optimization method 1.
Specifically, the overall loss L is calculated by the following formula:
L = L_RE + λ × (L_D + L_NER)
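One plausible sketch of this combination, assuming (consistent with claim 9, where the control parameter governs the contribution of both the entity recognition learning module and the discriminator) that λ scales both auxiliary losses; the grouping and the numbers are illustrative:

```python
def overall_loss(l_re, l_ner, l_d, lam):
    """Overall loss combining the relationship type loss with the two auxiliary
    losses; lam controls the auxiliary contribution (the grouping is an assumed reading)."""
    return l_re + lam * (l_d + l_ner)

total = overall_loss(l_re=1.0, l_ner=2.0, l_d=3.0, lam=0.5)   # 1.0 + 0.5 * 5.0 = 3.5
```

Setting lam to 0 recovers pure relation-extraction training, which matches the stated role of λ as a knob on how strongly the auxiliary modules steer optimization.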
After the overall loss L is calculated, all parameters in the relationship extraction model 10 are updated through a back-propagation algorithm.
Referring to fig. 2 and fig. 6, in some embodiments, the optimization method 1 further includes the following steps: after the relationship extraction model 10 is optimized through the overall loss L, a second optimization update is performed on the model parameters of the relation extraction encoder 1001 and of all the fully-connected layers by adjusting the parameters of some modules in the relationship extraction model 10. It can be understood that, by performing the second optimization update, the relationship extraction model 10 after the second optimization achieves better performance. Meanwhile, the relationship extraction model 10 after the second optimization is used in the same way as the baseline relation extraction model: it requires only sentences and entities as input and relies on no additional input, so compared with the baseline model its performance is enhanced without incurring any additional cost in use.
In some embodiments, the second optimization update includes the following steps: initializing the shared encoder 1000 and the relation extraction encoder 1001 in the main model 100, the first, second, and third fully-connected layers, and the SoftMax classifiers according to the parameters of the relationship extraction model 10 optimized through the overall loss L;
calculating the optimized relationship type loss L_RE in the same way as in the preliminary optimization;
and according to the optimized relationship type loss L_RE, performing the second optimization update on the model parameters of the relation extraction encoder 1001 and of the first, second, and third fully-connected layers through a back-propagation algorithm, to obtain the final optimized relationship extraction model 10.
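The second optimization update amounts to a gradient step that leaves the shared encoder's parameters untouched while updating the relation extraction encoder and the fully-connected layers. The module names and the plain gradient-descent update below are illustrative assumptions:

```python
import numpy as np

def secondary_update(params, grads, lr=0.01, frozen=("shared_encoder",)):
    """Apply a gradient step only to the modules selected for the second optimization;
    modules listed in `frozen` keep the parameters obtained from the preliminary optimization."""
    return {name: (w if name in frozen else w - lr * grads[name])
            for name, w in params.items()}

params = {"shared_encoder": np.ones(3),
          "relation_encoder": np.ones(3),
          "fc_layers": np.ones(3)}
grads = {name: np.full(3, 2.0) for name in params}
new_params = secondary_update(params, grads, lr=0.5)
# shared_encoder is unchanged; the other modules take the step 1.0 - 0.5 * 2.0 = 0.0
```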
It can be understood that the performance of the relationship extraction model 10 on the relationship extraction task is generally expressed by the F value. On an English relation extraction dataset, before optimization with the optimization method 1, the F value of the relationship extraction model 10 is 77.04; after optimization with the optimization method 1, the F value of the relationship extraction model 10 is 77.70. Therefore, introducing the entity recognition learning module 101 enhances the model's ability to model entities, thereby improving the performance of the relationship extraction model 10 on the relationship extraction task.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood, however, that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also appreciate that the embodiments described in this specification are exemplary and alternative embodiments, and that the acts and modules illustrated are not required in order to practice the invention.
In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not imply an inevitable order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Compared with the prior art, the optimization method for the relation extraction model provided by the invention has the following beneficial effects:
1. the embodiment of the invention provides an optimization method for a relation extraction model, wherein the relation extraction model comprises a main model, an entity recognition learning module and a discriminator, the main model comprises a sharing encoder and a relation extraction encoder, and the optimization method comprises the following steps: acquiring an input statement through a shared encoder in the main model, encoding the statement, and outputting a hidden vector of each word in the statement; inputting the hidden vector into a relation extraction encoder, an entity recognition learning module and a discriminator to respectively obtain relation type loss, entity label loss and discriminator loss; calculating the relationship type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain the overall loss; and carrying out primary optimization on the relation extraction model through the overall loss. It can be understood that, by setting the entity identification module in the preliminary optimization, the shared encoder in the main model realizes the learning of the entity in the text, and enhances the modeling capability of the main model on the entity, thereby improving the performance of the relationship extraction model on the relationship extraction task.
2. In the optimization method for the relational extraction model provided by the embodiment of the present invention, obtaining the relationship type loss by the main model includes the following steps: inputting the hidden vector into the relation extraction encoder in the main model, and further encoding the hidden vector by the relation extraction encoder to obtain a relation extraction hidden vector for each word; calculating the hidden vector and the relation extraction hidden vector of each word through a preset second algorithm to obtain a predicted relationship type; and comparing the predicted relationship type with a preset standard and performing calculation to obtain the relationship type loss. Understandably, the hidden vector and the encoded relation extraction hidden vector are processed through the second algorithm to obtain the relationship type currently predicted by the main model, and comparing the predicted relationship type with the preset standard yields the relationship type loss, which characterizes the model's ability to predict relationship types. When the relationship extraction model is preliminarily optimized through the overall loss, the relationship type loss optimizes the model's relation prediction ability, ensuring the reliability of the optimization method.
3. In the optimization method for the relational extraction model provided by the embodiment of the invention, the preset second algorithm includes the following steps: calculating the relation extraction hidden vector through a preset vector algorithm to obtain a vector representation of the first entity, a vector representation of the second entity, and a sentence representation of the relation extraction encoder, and simultaneously applying the preset vector algorithm to the hidden vector to obtain a sentence representation of the shared encoder; concatenating the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder, and the sentence representation of the shared encoder to obtain an intermediate vector; and feeding the intermediate vector through a fully-connected layer into a SoftMax classifier to obtain the predicted relationship type. Understandably, the hidden vector and the relation extraction hidden vector are calculated through the preset vector algorithm to obtain the vector representations of the two entities and the sentence representations of the relation extraction encoder and the shared encoder; these intermediate variables reflect vector representations of the entities or the sentence at different positions in the relation extraction model. The intermediate vector obtained by concatenating these representations is then classified and normalized by the fully-connected layer and the SoftMax classifier to yield the relationship type (a scalar) predicted by the main model for the two entities. Because the predicted relationship type is embodied in scalar form, subsequent calculation of the relationship type loss is facilitated.
4. The embodiment of the invention provides an optimization method for a relationship extraction model, wherein the entity recognition learning module includes an entity encoder, and obtaining the entity label loss by the entity recognition learning module includes the following steps: inputting the hidden vector into the entity encoder in the entity recognition learning module, and further encoding the hidden vector by the entity encoder to obtain an entity recognition hidden vector for each word; converting the hidden vector and the entity recognition hidden vector of each word to obtain a predicted entity identification label; and comparing the predicted entity identification labels of all words with a preset standard and performing calculation to obtain the entity label loss. It is to be understood that further encoding the hidden vector by the entity encoder yields an entity recognition hidden vector for each word, and the hidden vector and the entity recognition hidden vector of each word are converted into a predicted entity identification label (a scalar) indicating the approximate position of the word within an entity. Because the predicted entity identification label is embodied in scalar form, the entity label loss is easier to calculate subsequently.
5. In an optimization method for a relational extraction model provided in an embodiment of the present invention, a conversion process includes the following steps: and connecting the hidden vector of each word with the entity identification hidden vector in series, and sending the vector obtained after the connection in series into a full connection layer and a SoftMax classifier so as to obtain the predicted entity identification tag. It can be understood that the mode of classifying the vectors after being connected in series through the full connection layer and the SoftMax classifier and converting the vectors into the predicted entity identification tags is the same as the mode of converting the intermediate vectors into the predicted relationship classes in the main model, and the two modes are unified, so that the establishment, optimization and maintenance of the relationship extraction model are facilitated.
6. In the optimization method for the relationship extraction model provided by the embodiment of the invention, the preset standard is an actual entity label: the predicted entity identification labels of all words are compared with the actual entity label, and the comparison result is calculated through a cross-entropy loss function to obtain the entity label loss. It can be understood that calculating the predicted entity label against the actual entity label through the cross-entropy loss function yields the difference between the predicted value and the actual value, namely the entity label loss; further optimizing the relationship extraction model through the entity label loss enhances the model's understanding of entity meaning and improves its ability to predict entity labels.
7. In the optimization method for the relationship extraction model provided by the embodiment of the invention, the actual entity label of each word is determined by the position of the entity in the input, and when a word does not belong to a certain entity, the entity label is a first label; if a word is the beginning of an entity, the entity label is a second label; if a word is in the middle or at the end of an entity, the entity label is a third label. Understandably, by tagging words in sentences, the trainable entity recognition learning module can find the entity ability from the text, thereby enhancing the comprehension ability of the relationship extraction model to the entity.
8. In the optimization method for the relational extraction model provided by the embodiment of the invention, the step of the discriminator obtaining the discriminator loss includes the following steps: the discriminator obtains the result of comparing the predicted entity identification label with a preset standard to obtain a target output, the value of which is 0 or 1; the hidden vector is fed into a fully-connected layer and a SoftMax classifier to obtain a 2-dimensional vector for each word, each dimension of the 2-dimensional vector corresponding to the distribution probability of the target output over 0 and 1; and the discriminator loss is obtained by calculation from the distribution probabilities. Understandably, the provision of the discriminator effectively controls the degree to which the shared encoder learns the entity recognition task and avoids overfitting of the shared encoder to that task; without the discriminator loss provided by the discriminator, the overall loss would be optimized toward the entity recognition loss during the preliminary optimization, thereby affecting the performance of the relation extraction model on the main task, relation extraction. It can be seen that the discriminator loss provided by the discriminator ensures that the main performance of the relation extraction model is not erroneously compromised when the preliminary optimization is performed through the overall loss.
9. In the optimization method for the relation extraction model provided by the embodiment of the invention, when calculating the overall loss, an adjustable control parameter is introduced, which is used to control the contribution of the entity recognition learning module and the adversarial-learning discriminator to model training. It can be understood that the control parameter provides an adjustable option for the calculation of the overall loss, and when optimizing the relation extraction model, the emphasis of the optimization can be controlled by adjusting the control parameter. Therefore, the provision of the control parameter improves the practicality of the optimization method.
10. The optimization method for the relation extraction model provided by the embodiment of the invention further comprises the following steps: and after the relation extraction model is optimized through the whole loss, secondary optimization updating is carried out on the relation extraction encoder and the model parameters of the full connection layer by adjusting the parameters of partial modules in the relation extraction model. Understandably, by performing the second optimization updating, the relationship extraction model after the second optimization can obtain better performance. Meanwhile, the relation extraction model after secondary optimization is the same as the baseline model of relation extraction in use, only sentences and entities are needed to be used as input, additional input is not needed, and compared with the baseline model, the performance is enhanced while additional use expense is not caused.
The optimization method for the relation extraction model disclosed by the embodiment of the invention is described in detail, a specific example is applied in the description to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for the persons skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present description should not be construed as a limitation to the present invention, and any modification, equivalent replacement, and improvement made within the principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. An optimization method for a relationship extraction model, the relationship extraction model comprising a main model, an entity recognition learning module and a discriminator, the main model comprising a sharing encoder and a relationship extraction encoder, characterized in that: the method comprises the following steps:
acquiring an input statement through a shared encoder in the main model, encoding the statement, and outputting a hidden vector of each word in the statement;
inputting the hidden vector into the relation extraction encoder, the entity identification learning module and the discriminator to respectively obtain the relation type loss, the entity label loss and the discriminator loss;
calculating the relationship type loss, the entity label loss and the discriminator loss through a preset first algorithm to obtain an overall loss;
and carrying out primary optimization on the relation extraction model through the overall loss.
2. The method of claim 1, wherein: the obtaining of the relationship type loss by the main model comprises the following steps:
inputting the hidden vector into a relation extraction encoder in the main model, and further encoding the hidden vector by the relation extraction encoder to obtain a relation extraction hidden vector of each word;
calculating the hidden vector of each word and the relation extraction hidden vector through a preset second algorithm to obtain a predicted relation type;
and comparing the predicted relationship type with a preset standard for calculation to obtain the relationship type loss.
3. The method of claim 2, wherein: the preset second algorithm comprises the following steps:
calculating the relation extraction hidden vector through a preset vector algorithm to obtain a vector representation of a first entity, a vector representation of a second entity and a sentence representation of a relation extraction encoder, and simultaneously applying the preset vector algorithm to the hidden vector to obtain a sentence representation of a shared encoder;
connecting the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder and the sentence representation of the sharing encoder in series to obtain an intermediate vector;
and the intermediate vector is sent into a preset first SoftMax classifier after passing through a preset first full connection layer, so that the predicted relation type is obtained.
4. The method of claim 1, wherein: the entity identification learning module comprises an entity encoder, and the step of obtaining the entity label loss by the entity identification learning module comprises the following steps:
inputting the hidden vector into an entity encoder in the entity recognition learning module, and further encoding the hidden vector by the entity encoder to obtain an entity recognition hidden vector of each word;
converting the hidden vector of each word and the entity identification hidden vector to obtain a predicted entity identification label;
and comparing the predicted entity identification labels of all words with a preset standard for calculation to obtain the entity label loss.
5. The method of claim 4, wherein: the conversion process includes the steps of:
and connecting the hidden vector of each word with the entity identification hidden vector in series, and sending the vector obtained after the connection in series into a preset second full connection layer and a preset second SoftMax classifier so as to obtain the predicted entity identification tag.
6. The method of claim 4, wherein:
and the preset standard is an actual entity label, the predicted entity identification labels of all the words are compared with the actual entity label, and the comparison result is calculated through a cross entropy loss function to obtain the entity label loss.
7. The method of claim 6, wherein: the actual entity label of each word is determined by the position of the entity in the input, and when one word does not belong to a certain entity, the entity label is a first label; if a word is the beginning of an entity, the entity label is a second label; and if one word is in the middle or at the end of a certain entity, the entity label is a third label.
8. The method of claim 6, wherein: the step of the arbiter obtaining the arbiter loss comprises the steps of:
the discriminator obtains a result of comparison between the predicted entity identification label and a preset standard to obtain target output, and the value of the target output is 0 or 1;
sending the hidden vector to a preset third full connection layer and a third SoftMax classifier, and obtaining a 2-dimensional vector for each word, wherein each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output on 0 and 1;
and calculating the loss of the discriminator according to the distribution probability.
9. The method of claim 1, wherein: when calculating the overall loss, an adjustable control parameter is introduced, and the control parameter is used for controlling the contribution of the entity recognition learning module and the discriminator for adversarial learning to model training.
10. The method of claim 5, wherein: the method further comprises the steps of:
and after the relation extraction model is optimized through the overall loss, secondary optimization updating is carried out on the relation extraction encoder and the model parameters of the full connection layer by adjusting the parameters of partial modules in the relation extraction model.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210753408.0A CN114970857A (en) | 2022-06-29 | 2022-06-29 | Optimization method for relational extraction model |
PCT/CN2022/128623 WO2024000966A1 (en) | 2022-06-29 | 2022-10-31 | Optimization method for natural language model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210753408.0A CN114970857A (en) | 2022-06-29 | 2022-06-29 | Optimization method for relational extraction model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114970857A true CN114970857A (en) | 2022-08-30 |
Family
ID=82966585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210753408.0A Pending CN114970857A (en) | 2022-06-29 | 2022-06-29 | Optimization method for relational extraction model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114970857A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024000966A1 (en) * | 2022-06-29 | 2024-01-04 | 苏州思萃人工智能研究所有限公司 | Optimization method for natural language model |
CN117610562A (en) * | 2024-01-23 | 2024-02-27 | 中国科学技术大学 | Relation extraction method combining combined category grammar and multi-task learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109344391B (en) | Multi-feature fusion Chinese news text abstract generation method based on neural network | |
CN110765966B (en) | One-stage automatic recognition and translation method for handwritten characters | |
CN109829299B (en) | Unknown attack identification method based on depth self-encoder | |
CN108416058B (en) | Bi-LSTM input information enhancement-based relation extraction method | |
CN114970857A (en) | Optimization method for relational extraction model | |
WO2020140487A1 (en) | Speech recognition method for human-machine interaction of smart apparatus, and system | |
CN109492113B (en) | Entity and relation combined extraction method for software defect knowledge | |
CN115145551A (en) | Intelligent auxiliary system for machine learning application low-code development | |
CN110781290A (en) | Extraction method of structured text abstract of long chapter | |
CN115311687A (en) | Natural language pedestrian retrieval method and system combining token and feature alignment | |
CN113673535A (en) | Image description generation method of multi-modal feature fusion network | |
CN115168541A (en) | Chapter event extraction method and system based on frame semantic mapping and type perception | |
CN114168754A (en) | Relation extraction method based on syntactic dependency and fusion information | |
CN117058667A (en) | End-to-end scene text recognition method based on CLIP | |
CN117036706A (en) | Image segmentation method and system based on multi-modal dialogue language model | |
CN115034228A (en) | Optimization method for emotion analysis model | |
CN113282714A (en) | Event detection method based on differential word vector representation | |
CN113628288B (en) | Controllable image subtitle generation optimization method based on coder-decoder structure | |
CN114529908A (en) | Offline handwritten chemical reaction type image recognition technology | |
WO2024000966A1 (en) | Optimization method for natural language model | |
CN116561325B (en) | Multi-language fused media text emotion analysis method | |
CN116958700A (en) | Image classification method based on prompt engineering and contrast learning | |
CN113342982B (en) | Enterprise industry classification method integrating Roberta and external knowledge base | |
CN114254080A (en) | Text matching method, device and equipment | |
CN113505587A (en) | Entity extraction method, related device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||