WO2024000966A1

WO2024000966A1 - Optimization method for natural language model

Info

Publication number: WO2024000966A1
Application number: PCT/CN2022/128623
Authority: WO
Inventors: 宋彦; 田元贺; 李世鹏
Original assignee: 苏州思萃人工智能研究所有限公司
Priority date: 2022-06-29
Filing date: 2022-10-31
Publication date: 2024-01-04

Abstract

Provided in the present application is an optimization method for a natural language model. The natural language model comprises a main model, an enhancement module and a discriminator, wherein the main model comprises a first encoder and a second encoder. The method comprises: acquiring an input statement by means of a first encoder, coding the statement, and outputting an implicit vector of each term in the statement; inputting the implicit vectors into a second encoder, an enhancement module and a discriminator, so as to respectively obtain a target result loss, an enhancement loss and a discriminator loss; by means of a preset first algorithm, performing calculation on the target result loss, the enhancement loss and the discriminator loss, so as to obtain an overall loss; and preliminarily optimizing a natural language model according to the overall loss.

Description

Optimization methods for natural language models

This application claims priority to the Chinese patent application with application number 202210753408.0 submitted to the China Patent Office on June 29, 2022, and claims priority to the Chinese patent application with application number 202210753407.6 submitted to the China Patent Office on June 29, 2022. The entire contents of both applications are hereby incorporated by reference.

Technical field

This application relates to the technical field of natural language processing, for example, to an optimization method for a natural language model.

Background technique

Natural language processing, such as the aspect-based sentiment analysis (Aspect-Based Sentiment Analysis, ABSA) task aims to predict the sentiment polarity for a specific aspect term, and the relationship extraction task aims to extract two entities from a given sentence. , extract (predict) the relationship between these two given entities. Among them, understanding the meaning of aspect words and entities themselves is very important for emotion prediction and relationship prediction. However, general methods often ignore the modeling of aspect words and entities themselves, resulting in insufficient understanding of the meaning of aspect words and entities themselves.

Contents of the invention

This application provides an optimization method for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator. The main model includes a first encoder and a second encoder. The method includes:

Obtain the input sentence through the first encoder, encode the sentence, and output the hidden vector of each word in the sentence;

Input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively;

The overall loss is calculated through a preset first algorithm for the target result loss, the enhancement loss and the discriminator loss;

The natural language model is initially optimized through the overall loss.

In one embodiment, the natural language model is a sentiment analysis model, the enhancement module is a sentence restoration module, the first encoder is a task-independent encoder, the second encoder is a sentiment analysis encoder, and the The main module also includes a supplementary learning encoder, the target result loss is an emotional polarity loss, and the enhancement loss is a supplementary learning loss;

Input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively, including:

Input the latent vector into the supplementary learning encoder to obtain a supplementary learning latent vector, and calculate the supplementary learning latent vector through a vector algorithm to obtain a vector representation of the aspect word;

Input the latent vector and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment polarity loss;

Input the supplementary learning latent vector into the sentence restoration module to obtain the intermediate result and the supplementary learning loss;

The latent vector and the intermediate result are input into the discriminator to obtain the discriminator loss.

In one embodiment, the natural language model is a relationship extraction model, the enhancement module is an entity recognition learning module, the first encoder is a shared encoder, the second encoder is a relationship extraction encoder, and the The target result loss is a relationship type loss, and the enhancement loss is an entity label loss;

The latent vector is input into the relationship extraction encoder, the entity recognition learning module and the discriminator to obtain the relationship type loss, the entity label loss and the discriminator loss respectively.

This application also provides an optimization device for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator. The main model includes a first encoder and a second encoder. The device includes:

The first module is configured to obtain the input sentence through the first encoder, encode the sentence, and output the latent vector of each word in the sentence;

The second module is configured to input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively;

The third module is configured to calculate the overall loss through a preset first algorithm for the target result loss, the enhancement loss, and the discriminator loss;

The fourth module is configured to perform preliminary optimization on the natural language model through the overall loss.

This application also provides an electronic device, including a processor and a memory. When the computer program in the memory is executed by the processor, the above-mentioned optimization method for a natural language model is implemented.

This application also provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, the above-mentioned optimization method for a natural language model is implemented.

Description of drawings

Figure 1 is a schematic diagram of the steps of an optimization method for a natural language model provided by an embodiment of the present application;

Figure 2 is a schematic diagram of the steps of another optimization method for a natural language model provided by an embodiment of the present application;

Figure 3 is a schematic structural diagram of a sentiment analysis model used in an optimization method for a natural language model provided by an embodiment of the present application;

Figure 4 is a schematic diagram of the logical steps of an optimization method for a natural language model provided by an embodiment of the present application;

Figure 5 is a schematic diagram of steps for calculating emotional polarity loss in an optimization method for a natural language model provided by an embodiment of the present application;

Figure 6 is a schematic diagram of the steps for calculating supplementary learning loss in an optimization method for a natural language model provided by an embodiment of the present application;

Figure 7 is a schematic diagram of the steps of another optimization method for a natural language model provided by an embodiment of the present application;

Figure 8 is a schematic structural diagram of a relationship extraction model used in an optimization method for a natural language model provided by an embodiment of the present application;

Figure 9 is a schematic diagram of the logical steps of another optimization method for a natural language model provided by an embodiment of the present application;

Figure 10 is a schematic diagram of the steps for calculating relationship type loss in an optimization method for a natural language model provided by an embodiment of the present application;

Figure 11 is a schematic diagram of the steps for calculating entity recognition label loss in an optimization method for a natural language model provided by an embodiment of the present application;

Figure 12 is a schematic diagram of secondary optimization of an optimization method for a natural language model provided by an embodiment of the present application;

Figure 13 is a schematic structural diagram of an optimization device for a natural language model provided by an embodiment of the present application;

Figure 14 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed ways

The present application will be described below with reference to the accompanying drawings and implementation examples. The specific embodiments described herein are merely illustrative of the present application.

As shown in Figure 1, the embodiment of the present application provides an optimization method for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator. The main model includes a first encoder and a second encoder. This method include:

The input sentence is obtained through the first encoder, the sentence is encoded, and the hidden vector of each word in the sentence is output; the hidden vector is input to the second encoder, the enhancement module and the The discriminator obtains the target result loss, the enhancement loss and the discriminator loss respectively; the target result loss, the enhancement loss and the discriminator loss are calculated to obtain the overall loss through the preset first algorithm; through the overall loss The loss performs preliminary optimization on the natural language model.

The enhancement module in this embodiment is used to enhance the main model's understanding and modeling capabilities of input objects, thereby enhancing model performance.

In one embodiment, the natural language model is a sentiment analysis model, the enhancement module is a sentence restoration module, the first encoder is a task-independent encoder, the second encoder is a sentiment analysis encoder, and the main module also includes a supplementary learning encoder. The target The result loss is the emotional polarity loss, and the enhancement loss is the supplementary learning loss.

Input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively, including: inputting the latent vector into the supplementary learning encoder to obtain the supplementary learning latent vector, and The supplementary learning latent vector is calculated through vector algorithm to obtain the vector representation of the aspect word; the latent vector and the vector representation of the aspect word are input into the sentiment analysis encoder to obtain the emotional polarity loss; the supplementary learning latent vector is input into the sentence restoration module to obtain the intermediate result And supplementary learning loss; input the latent vector and intermediate results into the discriminator to obtain the discriminator loss.

Please refer to Figures 2 to 4. The sentiment analysis model includes a main model 100 (The sentiment Classifier), a sentence restoration module 101 (re-constructed sentence) and a discriminator 102 (Discriminator). The main model 100 includes a task-independent encoder 1000 (task -free encoder), sentiment analysis encoder 1001 (ABSA encoder) and supplementary learning encoder 1002 (CL encoder), where: the method includes the following steps: obtain the input sentence X through the task-free encoder 1000, and encode the sentence, obtaining The latent vector of each word in the sentence

will the hidden vector

Input the supplementary learning encoder 1002 to obtain the supplementary learning latent vector

and will supplement the learning of latent vectors

The vector representation of aspect words is calculated through vector algorithm.

(A represents the aspect word); convert the hidden vector

Vector representation with aspect words

Input the sentiment analysis encoder 1001 to obtain the sentiment polarity loss L _SA ; the hidden vector will be supplemented to learn

Input the sentence restoration module 101 to obtain the intermediate results and supplementary learning loss L _CL ; convert the latent vector

And the intermediate results are input into the discriminator 102 to obtain the discriminator loss _LD ; the emotional polarity loss L _SA , the supplementary learning loss L _CL and the discriminator loss _LD are calculated according to the preset first algorithm to obtain the overall loss L; through the overall loss L performs preliminary optimization on the sentiment analysis model. By setting up the sentence restoration module 101 in the preliminary optimization, the task-independent encoder 1000 and the supplementary learning encoder 1002 realize the learning of aspect words in the text, thereby enhancing the main model 100's understanding and modeling ability of aspect words, thereby Improved the performance of sentiment analysis models on sentiment analysis tasks.

Please combine Figure 3 to Figure 5. In some embodiments, obtaining the emotional polarity loss _LSA includes the following steps: convert the latent vector

Vector representation with aspect words

Input the sentiment analysis encoder 1001 to obtain the sentiment analysis vector representation of the vocabulary

And sentiment analysis vector representation of aspect words

latent vector

Sentiment analysis vector representation of words

And aspect word sentiment analysis vector representation

Calculation is performed according to the preset second algorithm to obtain the emotional polarity loss L _SA . will the hidden vector

Sentiment analysis vector representation of words

And aspect word sentiment analysis vector representation

Calculated according to the preset second algorithm, the emotional polarity loss L _SA used to characterize the emotional polarity prediction ability of the emotional analysis model can be obtained. The emotional polarity loss L _SA can be used to initially optimize the emotional analysis model based on the overall loss L. In the process, the emotional polarity prediction ability of the sentiment analysis model is optimized to ensure the reliability of the optimization method.

In some embodiments, the preset second algorithm includes the following steps:

Task-independent sentence representations are calculated according to the preset vector algorithm.

Sentiment analysis vector representation of words

And sentiment analysis vector representation of aspect words

Sentiment analysis sentence representation is calculated based on the preset vector algorithm.

Represent task-irrelevant sentences

Sentence Representation with Sentiment Analysis

The intermediate vector is obtained through series connection; the intermediate vector is input into the preset first fully connected layer 1003, and the output result of the first fully connected layer 1003 is input into the first SoftMax classifier 1004 to obtain the predicted emotional polarity.

Will predict the sentiment polarity of

The emotional polarity loss L _SA is calculated by comparing it with the preset standard. Hidden vectors are processed using preset vector algorithms

Sentiment analysis vector representation of words

And sentiment analysis vector representation of aspect words

Perform calculations to obtain task-independent sentence representations

Sentence Representation with Sentiment Analysis

As an intermediate variable, it reflects the vector representation of the different positions of the input sentence After processing, the predicted emotional polarity of the main model 100 pairs of words in the input sentence X is obtained.

(scalar) that will predict the sentiment polarity of

It is expressed in the form of a scalar to facilitate the subsequent calculation of the emotional polarity loss _LSA .

In some embodiments, the preset vector algorithm is the MaxPooling algorithm. Here's how:

Please combine Figure 3 and Figure 6. In some embodiments, the sentence restoration module 101 includes a supplementary learning decoder (The Specific Decoder) 1010. Obtaining the supplementary learning loss includes the following steps: adding the supplementary learning latent vector

The input supplementary learning decoder 1010 reconstructs the input sentence X and obtains the predicted vocabulary

will predict the vocabulary

The intermediate result is obtained by comparing it with the vocabulary x _t in the input sentence X, and the supplementary learning loss L _CL is calculated according to the preset third algorithm. The supplementary learning latent vector encoded by the supplementary learning encoder 1002 by the supplementary learning decoder 1010

Decoding and reconstruction are performed, and the predicted vocabulary x _t (scalar) is obtained. It can be seen that embodying the predicted vocabulary in the form of a scalar makes it easier to calculate the supplementary learning loss L _CL later.

In some embodiments, the preset third algorithm includes the following steps: calculating the negative log natural loss of the word at one position, and summing the negative log natural loss of the words at all positions to obtain the supplementary learning loss. By summing the losses of each word individually, the deviation of the model from the actual situation in the task of word prediction can be more accurately reflected, which is more conducive to the subsequent optimization of the word prediction ability of the sentiment analysis model. Illustratively, the supplementary learning loss is calculated by the following formula:

Among them, n represents the number of words in the input sentence X.

Please continue to refer to Figure 3. In some embodiments, obtaining the discriminator loss includes the following steps: according to the hidden vector

and the intermediate results to obtain the target output of the discriminator 102

output to target

Calculate according to the preset fourth algorithm to obtain the discriminator loss _LD . The discriminator 102 of adversarial learning effectively controls the learning degree of the task-irrelevant encoder 1000 and the supplementary learning encoder 1002 on the supplementary learning task-the sentence reconstruction task, and avoids the task-irrelevant encoder 1000 and the supplementary learning encoder 1002 on the supplementary learning. The over-fitting of the task ensures the optimization degree of fitting of the main model 100 to the main task-sentiment analysis task.

In some embodiments, the target output

The value is 0 or 1, the method is:

The preset fourth algorithm includes the following steps: will supplement the learning latent vector

Input the preset second fully connected layer 1020, and input the output result of the second fully connected layer 1020 to the preset second SoftMax classifier 1021, and obtain a 2-dimensional vector for each word, and each of the 2-dimensional vectors Dimensions correspond to target output

Distribution probability over 0 and 1, for example,

The predicted distribution probability on 0 is denoted as P(0|X).

The discriminator loss is obtained based on the distribution probability. The discriminator loss is calculated based on the distribution probability of the target output of the discriminator 102 on 0 and 1. The value of the discriminator loss can be used to intuitively determine whether the supplementary learning task is overfitting. The process is simple and takes up less computing resources. At the same time, it has extremely high computing efficiency.

In some embodiments, for each word, logarithmic natural loss is calculated, and the losses of all words are summed to obtain the discriminator loss _LD . Calculated using the following formula:

In some embodiments, when calculating the overall loss L, an adjustable control parameter λ is introduced. The control parameter λ is used to control the contribution of the sentence restoration module 101 and the adversarial learning discriminator 102 to model training. The control parameter λ provides an adjustable option for the calculation of the overall loss L. When optimizing the sentiment analysis model, by adjusting the control parameter λ, the focus of optimizing the sentiment analysis model can be controlled. It can be seen that the setting of the control parameter λ improves the practicality of the optimization method.

Calculate the overall loss L by the following formula:

L＝L _SA +L _D ×(λ*L _CL )

And after calculating the overall loss L, all parameters in the sentiment analysis model are updated through the back propagation algorithm.

In some embodiments, the optimization method also includes the following steps: after optimizing the sentiment analysis model through the overall loss L, and then performing a secondary optimization update on the parameters in the sentiment analysis model by adjusting the parameters of some modules in the sentiment analysis model. . By performing secondary optimization updates, the sentiment analysis model after secondary optimization can achieve better performance. At the same time, the sentiment analysis model after secondary optimization is used the same as the baseline model of sentiment analysis. It only requires sentences and entities as input and does not rely on additional input. Compared with the baseline model, it enhances performance without causing additional usage costs.

In some embodiments, the secondary optimization includes the following steps: optimizing the parameters of the sentiment analysis model through the overall loss L, applying the task-independent encoder 1000, the supplementary learning encoder 1002, the sentiment analysis encoder 1001, and the preset vector algorithm Initialize with the first and second fully connected layers 1020 and SoftMax classifier; calculate the optimized emotional polarity loss in the same way as in the preliminary optimization

According to the optimized emotional polarity loss

The model parameters in the sentiment analysis encoder 1001, the first fully connected layer 1003 and the second fully connected layer 1020 are optimized and updated twice through the back propagation algorithm to obtain the final optimized sentiment analysis model. Based on the sentiment analysis model after preliminary optimization, only some modules inside the model are optimized in the secondary optimization process, because some modules inside the preliminary optimization model are already close to the optimal optimization threshold after preliminary optimization. If it is optimized, the effect will not be very good, that is, continuing to perform secondary optimization on the entire sentiment analysis model will cause a waste of computing resources. Therefore, in the process of secondary optimization, only the sentiment analysis encoder 1001 and the first full connection are The model parameters in layer 1003 and the second fully connected layer 1020 are optimized and updated twice, which not only saves computing resources, but also achieves good optimization results.

Regarding the performance of the sentiment analysis model on sentiment analysis tasks, it is generally expressed by the F value. For the English relationship extraction data set, before using the optimization method for optimization, the F value of the sentiment analysis model was 77.04; using the optimization method for optimization Finally, the F-value of the sentiment analysis model is 77.70. It can be seen that by introducing the sentence restoration module 101, the model's modeling ability of aspect words is enhanced, thereby improving the performance of the sentiment analysis model on sentiment analysis tasks.

Compared with related technologies, the optimization method for natural language models provided by this application can achieve:

1. In the optimization method for natural language models provided by the embodiments of this application, the natural language model can be an emotion analysis model. The emotion analysis model includes a main model, a sentence restoration module and a discriminator. The main model includes a task-independent encoder, emotion Analytical encoder and supplementary learning encoder, wherein: the method includes the following steps: obtain the input sentence through the task-independent encoder, encode the sentence, and obtain the latent vector of each word in the sentence; input the latent vector into the supplementary learning encoder, Obtain the supplementary learning latent vector, and calculate the supplementary learning latent vector through the vector algorithm to obtain the vector representation of the aspect word; input the latent vector and the vector representation of the aspect word into the sentiment analysis encoder to obtain the emotional polarity loss; input the supplementary learning latent vector The sentence restoration module obtains the intermediate results and supplementary learning loss; inputs the latent vector and intermediate results into the discriminator to obtain the discriminator loss; calculates the emotional polarity loss, supplementary learning loss, and discriminator loss according to the preset first algorithm to obtain the overall Loss; preliminary optimization of the sentiment analysis model through overall loss. By setting up the sentence restoration module in the preliminary optimization, the task-independent encoder and the supplementary learning encoder realize the learning of aspect words in the text, which enhances the main model's understanding and modeling ability of aspect words, thereby improving sentiment analysis. Model performance on sentiment analysis tasks.

2. In the optimization method for natural language models provided by the embodiments of this application, obtaining the emotional polarity loss includes the following steps: input the latent vector and the vector representation of the aspect word into the sentiment analysis encoder, and obtain the sentiment analysis vector representation of the vocabulary; Sentiment analysis vector representation of aspect words; the latent vector, sentiment analysis vector representation of words and aspect word sentiment analysis vector representation are calculated according to the preset second algorithm to obtain the emotional polarity loss. Calculate the latent vectors, sentiment analysis vector representations of words, and sentiment analysis vector representations of aspect words according to the preset second algorithm, and you can obtain the emotional polarity loss, which is used to characterize the emotional polarity prediction ability of the emotional analysis model. The loss can be used to optimize the emotional polarity prediction ability of the emotional analysis model during the preliminary optimization of the emotional analysis model through the overall loss, ensuring the reliability of the optimization method.

3. In the optimization method for natural language models provided by the embodiments of this application, the preset second algorithm includes the following steps: calculating latent vectors according to the preset vector algorithm to obtain task-independent sentence representations; performing emotional analysis on vocabulary The vector representation and the sentiment analysis vector representation of the aspect words are calculated according to the preset vector algorithm to obtain the sentiment analysis sentence representation; the task-irrelevant sentence representation and the sentiment analysis sentence representation are concatenated to obtain an intermediate vector; the intermediate vector is input into the preset first full connection layer, and input the output result of the first fully connected layer into the preset SoftMax classifier to obtain the predicted emotional polarity; compare the predicted emotional polarity with the preset standard to calculate the emotional polarity loss. The preset vector algorithm is used to calculate latent vectors, sentiment analysis vector representations of words, and sentiment analysis vector representations of aspect words, and obtain task-independent sentence representations and sentiment analysis sentence representations as intermediate variables, reflecting the input sentence in the sentiment analysis model. After concatenating the two vector representations, and after the classification and normalization processing of the first fully connected layer and the first SoftMax classifier, the prediction pole of the vocabulary in the input sentence by the main model is obtained. (scalar), which embodies the predicted polarity in the form of a scalar to facilitate the subsequent calculation of emotional polarity loss.

4. In the optimization method for natural language models provided by the embodiments of this application, the sentence restoration module includes a supplementary learning decoder, and obtaining the supplementary learning loss includes the following steps: inputting the supplementary learning latent vector into the supplementary learning decoder to reconstruct the input sentence, Obtain the predicted vocabulary; compare the predicted vocabulary with the vocabulary in the input sentence to obtain the intermediate result, and calculate the supplementary learning loss according to the preset third algorithm. The predicted vocabulary (scalar) is obtained by decoding and reconstructing the supplementary learning latent vector encoded by the supplementary learning encoder through the supplementary learning encoder. It can be seen that embodying the predicted vocabulary in the form of a scalar makes it easier to calculate the supplementary learning loss later.

5. In the optimization method for natural language models provided by the embodiment of this application, the preset third algorithm includes the following steps: calculate the negative log natural loss of the word in one position, and add the negative logarithm of the word in all positions. The word summation of the natural loss is obtained by supplementing the learned loss. By summing the losses of each word individually, the deviation of the model from the actual situation in the word prediction task can be more accurately reflected, which is more conducive to the subsequent optimization of the word prediction ability of the sentiment analysis model.

6. In the optimization method for natural language models provided by the embodiment of this application, obtaining the discriminator loss also includes the following steps: obtaining the target output of the discriminator based on the latent vector and the intermediate result; and calculating the target output according to the preset fourth The algorithm performs calculations and obtains the discriminator loss. The discriminator of adversarial learning effectively controls the learning degree of the task-independent encoder and the supplementary learning encoder on the supplementary learning task-the sentence reconstruction task, and avoids the overfitting of the task-independent encoder and the supplementary learning encoder on the supplementary learning task. , ensuring the degree of fitting optimization of the main model for the main task-sentiment analysis task.

7. In the optimization method for natural language model provided by the embodiment of the present application, the value of the target output is 0 or 1, and the preset fourth algorithm includes the following steps: input the supplementary learning latent vector into the preset second full algorithm. connection layer, and input the output result of the second fully connected layer to the preset second SoftMax classifier, and obtain a 2-dimensional vector for each word. Each dimension of the 2-dimensional vector corresponds to the target output between 0 and 1. Distribution probability; the discriminator loss is obtained based on the distribution probability. Calculate the discriminator loss based on the distribution probability of the discriminator's target output on 0 and 1. You can intuitively judge whether the supplementary learning task is overfitting through the value of the discriminator loss. The process is simple and takes up less computing resources. At the same time, it has extremely high computing efficiency.

8. In the optimization method for natural language models provided by the embodiments of this application, when calculating the overall loss, adjustable control parameters are introduced. The control parameters are used to control the contribution of the sentence restoration module and the adversarial learning discriminator to model training. . The control parameters provide adjustable options for the calculation of the overall loss. When optimizing the sentiment analysis model, by adjusting the control parameters, you can control the focus of optimizing the sentiment analysis model. It can be seen that the setting of control parameters improves the practicality of the optimization method.

9. The optimization method for the natural language model provided by the embodiment of the present application also includes the following steps: after optimizing the sentiment analysis model through the overall loss, and then adjusting the parameters of some modules in the sentiment analysis model to perform the sentiment analysis. The parameters in the model are updated through secondary optimization. Understandably, by performing secondary optimization updates, the secondary optimized relationship extraction model can achieve better performance. At the same time, the relationship extraction model after secondary optimization is used the same as the baseline model of relationship extraction. It only requires sentences and entities as input and does not rely on additional input. Compared with the baseline model, it enhances performance without causing additional usage costs.

10. In the optimization method for natural language models provided by the embodiments of this application, the secondary optimization includes the following steps: optimizing parameters of the sentiment analysis model through overall loss, task-independent encoder, supplementary learning encoder, sentiment analysis The encoder, the preset vector algorithm, the first and second fully connected layers and the SoftMax classifier are initialized; the optimized emotional polarity loss is calculated in the same way as in the preliminary optimization; based on the optimized emotional polarity loss , the model parameters in the sentiment analysis encoder, the first fully connected layer and the second fully connected layer are optimized and updated twice through the back propagation algorithm to obtain the final optimized sentiment analysis model. Based on the sentiment analysis model after preliminary optimization, only some modules inside the model are optimized in the secondary optimization process, because some modules inside the preliminary optimization model are already close to the optimal optimization threshold after preliminary optimization. If it is optimized, the effect will not be very good. That is, continuing to perform secondary optimization on the entire sentiment analysis model will cause a waste of computing resources. Therefore, in the process of secondary optimization, only the sentiment analysis encoder and the first fully connected layer will be optimized. Performing a secondary optimization update with the model parameters in the second fully connected layer can achieve good optimization results while saving computing resources.

In one embodiment, the natural language model is a relation extraction model, the enhancement module is an entity recognition learning module, the first encoder is a shared encoder, the second encoder is a relation extraction encoder, the target result loss is a relation type loss, and the enhancement loss is the entity label loss.

Input the latent vector into the second encoder, enhancement module and discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively, including:

Input the latent vector into the relationship extraction encoder, entity recognition learning module and discriminator to obtain the relationship type loss, entity label loss and discriminator loss respectively.

Please combine Figure 7 to Figure 9. An optimization method for a natural language model includes the following steps: through the shared encoder 2000 (Shared Encoder) in the preset main model 200 (Relationship extraction (RE)) Get the input sentence, encode the sentence, and output the hidden vector of each word in the sentence

The relationship extraction model includes a main model 200, an entity recognition learning module 201 (NER) and a discriminator 202 (THE Discriminator). The main model 200 also includes a relationship extraction encoder 2001 (RE encoder), which converts the latent vector into

Input the relationship extraction encoder 2001, the entity recognition learning module 201 and the discriminator 202 to obtain the relationship type loss L _RE , the entity label loss L _NER and the discriminator 202 loss respectively; combine the relationship type loss L _RE , the entity label loss L _NER and the discriminant loss The loss L _D of the device 202 is calculated by the preset first algorithm to obtain the overall loss L; the relationship extraction model is initially optimized through the overall loss L. By setting up the entity recognition learning module 201 in the preliminary optimization, the shared encoder 2000 in the main model 200 realizes the learning of entities in the text, which enhances the modeling ability of the main model 200 for entities, thus improving the relationship extraction model. Performance on relation extraction tasks.

Please combine Figure 8 and Figure 10. In some embodiments, the main model 200 obtains the relationship type loss including the following steps: convert the latent vector

Input the relationship extraction encoder 2001 in the main model 200, and the relationship extraction encoder 2001 pairs the latent vector

Encode to obtain the relationship extraction latent vector of each word

Through the latent vector of each word

and relationship extraction latent vector

Perform calculation processing through the preset second algorithm to obtain the predicted relationship type

The type of relationship that will be predicted

Compare and calculate with the preset standard to obtain the relationship type loss L _RE . will the hidden vector

Extract latent vectors from the encoded relationship

Through processing by the second algorithm, the predicted relationship type of the current main model 200 can be obtained

The type of relationship that will be predicted

Compare and calculate with the preset standard to obtain the relationship type loss L _RE that is used to characterize the ability of the relationship extraction model to predict relationship types. The relationship type loss L _RE can be used to initially optimize the relationship extraction model through the overall loss L. The relationship prediction ability of the extracted model is optimized to ensure the reliability of the optimization method.

Please refer to Figure 8. In some embodiments, the preset second algorithm includes the following steps: extracting latent vectors for relationships through a preset vector algorithm.

Perform calculations to obtain the vector representation of the first entity

Vector representation of the second entity

and Sentence Representation with Relation Extraction Encoder 2001

At the same time, the preset vector algorithm is applied to the hidden vector to obtain the sentence representation of the shared encoder 2000

Represent the vector of the first entity

Vector representation of the second entity

Sentence Representation for Relation Extraction Encoder 2001

and shared encoder 2000 sentence representation

concatenate to obtain the intermediate vector o; the intermediate vector o passes through the preset first fully connected layer 2002 and then is sent to the first SoftMax classifier 2003 to obtain the predicted relationship type.

The vector representation of the first entity is obtained by calculating the implicit vector and the relationship extraction implicit vector through the preset vector algorithm.

Vector representation of the second entity

(E ₁ and E ₂ represent two entities respectively), sentence representation of relation extraction encoder 2001

and shared encoder 2000 sentence representation

As an intermediate variable, it reflects the vector representation of the different positions of entities or sentences in the relationship extraction model. The above-mentioned vector representation is the intermediate vector o obtained after concatenation, after passing through the first fully connected layer 2002 and the first SoftMax classifier 2003 After classification and normalization, the relationship type predicted by the main model 200 for two entities is obtained.

(scalar), the type of relationship that will be predicted

It is expressed in the form of a scalar to facilitate the subsequent calculation of the relationship type loss L _RE .

Please combine Figure 8 and Figure 11. In some embodiments, the entity recognition learning module 201 includes an entity encoder 2010. The entity recognition learning module 201 obtains the entity label loss including the following steps: converting the latent vector

Input to the entity encoder 2010 in the entity recognition learning module 201, the entity encoder 2010 encodes the latent vector

Further encoding, the entity recognition latent vector of each word is obtained.

Through the latent vector of each word

and entity recognition latent vector

Perform conversion processing to obtain predicted entity recognition tags

Convert all words’ predicted entity recognition labels to

Compare and calculate with the preset standard to obtain the entity label loss L _NER . Latent vectors via Entity Encoder 2010

The encoding obtains the entity recognition latent vector of each word

And the hidden vector of each word

and entity recognition latent vector

Convert to a predicted entity recognition label for the approximate location of a word in an entity

(Scalar), the scalar here refers to a quantity without direction, and its content can be numbers or characters. It can be seen that the predicted entity recognition label will be

Reflected in the form of a scalar, it is easier to calculate the entity recognition loss later.

Referring to Figure 8, in some embodiments, the conversion process includes the following steps: transforming the latent vector of each word into

and entity recognition latent vector

Concatenate, and send the vector obtained after concatenation into the second fully connected layer 2011 and the second SoftMax classifier 2012 to obtain the predicted entity recognition label

Understandably, the concatenated vectors are classified and converted into predicted entity recognition labels through the second fully connected layer 2011 and the second SoftMax classifier 2012

Convert the intermediate vector o to the predicted relation class in the main model 200

The methods are the same, and unifying the two methods makes it easier to establish, optimize and maintain the relationship extraction model.

In some embodiments, the preset criteria are actual entity tags

Convert all words’ predicted entity recognition labels to

with actual entity tags

Compare and calculate the comparison results through the cross-entropy loss function to obtain the entity label loss L _NER . By placing the predicted entity label

with actual entity tags

Calculated using the cross-entropy loss function, the difference between the predicted value and the actual value can be obtained, that is, the entity label loss L _NER . Optimizing the relationship extraction model through the entity label loss L _NER can enhance the model's understanding of the meaning of entities and improve the model's understanding of entity labels. predictive ability.

In some embodiments, the actual entity label of each word is determined by the location of the entity input into the shared encoder 2000. When a word does not belong to an entity, the entity label is the first label; if a word is an entity If a word is at the beginning of an entity, the entity tag is the second tag; if a word is in the middle or end of an entity, the entity tag is the third tag. The first label is O, the second label is B, and the third label is I. Please refer to Figure 8. For example, the input sentence is An air force pilot is back, and the two input entities are air force and pilot . Then the actual entity label of each word in the input sentence

Just O B I B O O. By labeling the words in the input sentence and labeling the predicted entities

with actual entity tags

By comparison, the entity recognition learning module 201 can be trained to find entities from text, thereby enhancing the relationship extraction model's ability to understand entities.

Please continue to refer to Figure 8. In some embodiments, the discriminator 202 obtaining the discriminator 202 loss includes the following steps: the discriminator 202 obtains the predicted entity recognition label

The result is compared with the preset standard to obtain the target output.

target output

The value is 0 or 1 in the following way:

will the hidden vector

The preset third fully connected layer 2020 and the third SoftMax classifier 2021 are sent to obtain a 2-dimensional vector for each word. Each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output on 0 and 1 (for example,

The predicted distribution probability on 0 is recorded as P(0|X)); the discriminator 202 loss is obtained by calculating the distribution probability.

In some embodiments, for each word, a logarithmic natural loss is calculated and the losses for all words are summed to obtain the discriminator 202 loss _LD . Calculated using the following formula:

The setting of the discriminator 202 effectively controls the learning degree of the shared encoder 2000 for the entity recognition task and avoids the overfitting of the shared encoder 2000 for the entity recognition task. If there is no loss of the discriminator 202 provided by the discriminator 202, the overall loss will be In the initial optimization, entity recognition loss will be preferred for optimization, thus affecting the performance of the relationship extraction model for the main task-relation extraction. It can be seen that when the discriminator 202 loss provided by the discriminator 202 is initially optimized through the overall loss, it ensures that the main performance of the relationship extraction model is not incorrectly covered.

In some embodiments, when calculating the overall loss, an adjustable control parameter λ is introduced, and the control parameter λ is used to control the contribution of the entity recognition learning module 201 and the adversarial learning discriminator 202 to model training. The control parameter λ provides an adjustable option for the calculation of the overall loss L. When optimizing the relationship extraction model, by adjusting the control parameter λ, the focus of optimizing the relationship extraction model can be controlled. It can be seen that the setting of control parameters improves the practicality of the optimization method.

Calculate the overall loss L by the following formula:

L＝L _RE +L _D ×(λ·L _NER );

And after calculating the overall loss L, all parameters in the relationship extraction model are updated through the back propagation algorithm.

Please combine Figure 8 and Figure 12. In some embodiments, the optimization method also includes the following steps: after optimizing the relationship extraction model through the overall loss L, and then adjusting the parameters of some modules in the relationship extraction model to encode the relationship extraction 2001 and all fully connected layer model parameters are optimized and updated twice. Understandably, by performing secondary optimization updates, the secondary optimized relationship extraction model can achieve better performance. At the same time, the relationship extraction model after secondary optimization is used the same as the baseline model of relationship extraction. It only requires sentences and entities as input and does not rely on additional input. Compared with the baseline model, it enhances performance without causing additional usage costs.

In some embodiments, the secondary optimization update includes the following steps: using the overall loss L to optimize the parameters of the relation extraction model, and comparing the shared encoder 2000 and the relation extraction encoder 2001 in the main model 200 with the first, second and The third fully connected layer 2020 and SoftMax classifier are initialized;

Calculate the post-optimization relationship type loss L _RE in the same way as in the preliminary optimization;

According to the optimized relationship type loss L _RE , the model parameters of the relationship extraction encoder 2001 and the first, second and third fully connected layers are optimized and updated twice through the back propagation algorithm to obtain the final optimized relationship extraction model.

Regarding the performance of the relationship extraction model in the relationship extraction task, it is generally expressed by the F value. For the English relationship extraction data set, before using the optimization method for optimization, the F value of the relationship extraction model is 77.04; using the optimization method for optimization Finally, the F value of the relationship extraction model is 77.70. It can be seen that by introducing the entity recognition learning module 201, the model's ability to model entities is enhanced, thereby improving the performance of the relationship extraction model in the relationship extraction task.

In the embodiments provided in this application, it should be understood that "B corresponding to A" means that B is associated with A, and B can be determined based on A. However, it should also be understood that determining B based on A does not mean determining B only based on A. B can also be determined based on A and/or other information.

Compared with related technologies, the optimization method for natural language models provided in this application can achieve:

1. In an optimization method for a natural language model provided by the embodiment of this application, the natural language model can be a relationship extraction model. The relationship extraction model includes a main model, an entity recognition learning module and a discriminator. The main model includes a shared encoder. and the relation extraction encoder, including the following steps: obtain the input sentence through the shared encoder in the main model, encode the sentence, and output the latent vector of each word in the sentence; input the latent vector into the relation extraction encoder and entity recognition The learning module and discriminator obtain the relationship type loss, entity label loss and discriminator loss respectively; the relationship type loss, entity label loss and discriminator loss are calculated through the preset first algorithm to obtain the overall loss; the relationship is extracted through the overall loss The model is initially optimized. By setting up the entity recognition module in the preliminary optimization, the shared encoder in the main model realizes the learning of entities in the text, enhances the main model's modeling ability of entities, and thus improves the relationship extraction model in the relationship extraction task. performance.

2. In an optimization method for natural language models provided by the embodiment of this application, obtaining the relationship type loss from the main model includes the following steps: inputting the latent vector into the relationship extraction encoder in the main model, and the relationship extraction encoder Encoding, obtain the relationship extraction latent vector of each word; calculate and process the latent vector of each word and the relationship extraction latent vector through the preset second algorithm to obtain the predicted relationship type; compare the predicted relationship type with the preset The standard is compared and calculated to obtain the relationship type loss. The latent vector and the encoded relationship extraction latent vector are processed through the second algorithm to obtain the predicted relationship type of the current main model. The predicted relationship type is compared and calculated with the preset standard to obtain the predicted relationship used to represent the relationship extraction model. Relationship type loss of type ability. Relationship type loss can optimize the relationship prediction ability of the relationship extraction model in the process of preliminary optimization of the relationship extraction model through overall loss, ensuring the reliability of the optimization method.

3. In an optimization method for natural language models provided by the embodiment of the present application, the preset second algorithm includes the following steps: calculating the relationship extraction latent vector through the preset vector algorithm to obtain the vector of the first entity representation, the vector representation of the second entity, and the sentence representation of the relation extraction encoder. At the same time, the preset vector algorithm is applied to the latent vector to obtain the sentence representation of the shared encoder; the vector representation of the first entity, the vector of the second entity The representation, the vector representation of the sentence and the sentence representation of the shared encoder are concatenated to obtain an intermediate vector; the intermediate vector is sent to the SoftMax classifier after passing through a fully connected layer to obtain the predicted relationship type. The latent vector and the relationship extraction latent vector are calculated through the preset vector algorithm, and the obtained vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relationship extraction encoder and the sentence representation of the shared encoder are used as intermediate variables , which reflects the vector representation of different positions of entities or sentences in the relationship extraction model. The above-mentioned vector representations are concatenated and the intermediate vectors are obtained. After classification and normalization by the fully connected layer and SoftMax, the main vector is obtained. The relationship type (scalar) predicted by the model for two entities is reflected in the form of a scalar to facilitate subsequent calculation of the relationship type loss.

4. In an optimization method for a natural language model provided by the embodiment of the present application, the entity recognition learning module includes an entity encoder, and obtaining the entity label loss by the entity recognition learning module includes the following steps: inputting the latent vector into the entity recognition learning module The entity encoder in The predicted entity recognition label of the word is compared with the preset standard to obtain the entity label loss. The entity recognition latent vector of each word is obtained by encoding the latent vector by the entity recognition encoder, and the latent vector of each word and the entity recognition latent vector are converted into the approximate position of a word in the entity. Predicted entity identification label (scalar). It can be seen that embodying the predicted entity recognition label in the form of a scalar makes it easier to calculate the entity recognition loss later.

5. In an optimization method for natural language models provided by the embodiment of this application, the conversion process includes the following steps: concatenate the latent vector of each word with the entity recognition latent vector, and send the concatenated vector into a Fully connected layer and SoftMax classifier to obtain predicted entity recognition labels. The way in which the concatenated vectors are classified and converted into predicted entity recognition labels through the fully connected layer and SoftMax classifier is the same as the way in which the intermediate vectors are converted into predicted relationship classes in the main model. It is easier to unify the two methods. Establishment, optimization and maintenance of relationship extraction models.

6. In an optimization method for natural language models provided by the embodiment of this application, the preset standard is the actual entity label, the predicted entity identification labels of all words are compared with the actual entity labels, and the comparison results are The entity label loss is obtained by calculating the cross-entropy loss function. By calculating the predicted entity label and the actual entity label using the cross entropy loss function, the gap between the predicted value and the actual value can be obtained, that is, the entity label loss. Optimizing the relationship extraction model through the entity label loss can enhance the model's understanding of the entity meaning. Understand and improve the model's prediction ability for entity labels.

7. In an optimization method for natural language models provided by the embodiment of this application, the actual entity label of each word is determined by the location of the entity in the input. When a word does not belong to an entity, the entity label is the One label; if a word is the beginning of an entity, the entity label is the second label; if a word is in the middle or end of an entity, the entity label is the third label. By tagging words in sentences, the entity recognition learning module can be trained to find entities in the text, thereby enhancing the relationship extraction model's ability to understand entities.

8. In an optimization method for natural language models provided by the embodiment of the present application, the discriminator obtaining the discriminator loss includes the following steps: the discriminator obtains the result of comparing the predicted entity recognition label with the preset standard, and obtains the target output. , the target output value is 0 or 1; send the latent vector to a fully connected layer and SoftMax classifier, and obtain a 2-dimensional vector for each word. Each dimension of the 2-dimensional vector corresponds to the target output between 0 and 1. Distribution probability; the discriminator loss is obtained by calculating the distribution probability. The setting of the discriminator effectively controls the learning degree of the shared encoder for the entity recognition task and avoids the overfitting of the shared encoder for the entity recognition task. Without the discriminator loss provided by the discriminator, the overall loss will be reduced during the initial optimization. It prefers entity recognition loss to optimize, thus affecting the performance of the relationship extraction model for the main task-relation extraction. It can be seen that the discriminator loss provided by the discriminator ensures that the main performance of the relationship extraction model is not incorrectly covered by the overall loss.

9. In an optimization method for a natural language model provided by the embodiment of this application, when calculating the overall loss, adjustable control parameters are introduced, and the control parameters are used to control the entity recognition learning module and the discriminator pair model of adversarial learning. The size of the training contribution. The control parameters provide adjustable options for the calculation of the overall loss. When optimizing the relationship extraction model, by adjusting the control parameters, you can control the focus of optimizing the relationship extraction model. It can be seen that the setting of control parameters improves the practicality of the optimization method.

10. An optimization method for relational natural language provided by the embodiment of the present application also includes the following steps: after optimizing the above-mentioned relation extraction model through overall loss, and then adjusting the parameters of some modules in the relation extraction model to optimize the relation Extract the model parameters of the encoder and fully connected layer for secondary optimization and update. By performing secondary optimization updates, the secondary optimized relationship extraction model can achieve better performance. At the same time, the relationship extraction model after secondary optimization is used the same as the baseline model of relationship extraction. It only requires sentences and entities as input and does not rely on additional input. Compared with the baseline model, it enhances performance without causing additional usage costs.

As shown in Figure 13, the embodiment of the present application also provides an optimization device for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator. The main model includes a first encoder and a second Encoder, the device includes: a first module 610, configured to obtain an input sentence through the first encoder, encode the sentence, and output the latent vector of each word in the sentence; a second module 620 , is configured to input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively; the third module 630 is configured to input the target result loss, enhancement loss and discriminator loss respectively. The result loss, the enhancement loss and the discriminator loss are calculated through a preset first algorithm to obtain the overall loss; the fourth module 640 is configured to perform preliminary optimization of the natural language model through the overall loss.

In one embodiment, the natural language model is a sentiment analysis model, the enhancement module is a sentence restoration module, the first encoder is a task-independent encoder, the second encoder is a sentiment analysis encoder, and the The main module also includes a supplementary learning encoder, the target result loss is an emotional polarity loss, and the enhancement loss is a supplementary learning loss; the second module 620 is set to:

Input the latent vector into the supplementary learning encoder to obtain a supplementary learning latent vector, and calculate the supplementary learning latent vector through a vector algorithm to obtain a vector representation of the aspect word; combine the latent vector with the vector of the aspect word The representation is input into the sentiment analysis encoder to obtain the emotional polarity loss; the supplementary learning latent vector is input into the sentence restoration module to obtain the intermediate result and the supplementary learning loss; the latent vector and the intermediate The result is input into the discriminator and the discriminator loss is obtained.

In one embodiment, the second module 620 is configured as:

The latent vector and the vector representation of the aspect word are input into the sentiment analysis encoder to obtain the sentiment analysis vector representation of the vocabulary and the sentiment analysis vector representation of the aspect word; the latent vector and the sentiment analysis vector of the vocabulary are obtained The representation and the aspect word sentiment analysis vector representation are calculated according to a preset second algorithm to obtain the sentiment polarity loss.

In one embodiment, the preset second algorithm includes:

The latent vector is calculated according to the preset vector algorithm to obtain a task-independent sentence representation; the sentiment analysis vector representation of the vocabulary and the sentiment analysis vector representation of the aspect word are calculated according to the preset vector algorithm to obtain the sentiment analysis sentence representation. ; Connect the task-independent sentence representation and the sentiment analysis sentence representation in series to obtain an intermediate vector; input the intermediate vector into the preset first fully connected layer, and input the output result of the first fully connected layer into the preset Set the SoftMax classifier to obtain the predicted emotional polarity; compare the predicted emotional polarity with the preset standard to calculate the emotional polarity loss.

In one embodiment, the sentence restoration module includes a supplementary learning decoder, and the second module 620 is configured as:

The supplementary learning latent vector is input into the supplementary learning decoder to reconstruct the input sentence to obtain the predicted vocabulary; the predicted vocabulary is compared with the vocabulary in the input sentence to obtain the intermediate result, and the intermediate result is obtained according to A preset third algorithm calculates the supplementary learning loss.

In one embodiment, the preset third algorithm includes:

Calculate the negative log natural loss of words at one position and sum the negative log natural loss of words at all positions to obtain the supplementary learning loss.

In one embodiment, the second module 620 is configured as:

According to the latent vector and the intermediate result, the target output of the discriminator is obtained; the target output is calculated according to a preset fourth algorithm to obtain the discriminator loss.

In one embodiment, the value of the target output is 0 or 1, and the second module 620 is set to:

Input the supplementary learning latent vector into the preset second fully connected layer, and input the output result of the second fully connected layer into the preset second SoftMax classifier to obtain a 2-dimensional vector for each word, Each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output on 0 and 1; the discriminator loss is obtained based on the distribution probability.

In one embodiment, when calculating the overall loss, adjustable control parameters are introduced, and the control parameters are used to control the contribution of the sentence restoration module and the discriminator of adversarial learning to model training.

In one embodiment, a fifth module is also included, configured as:

After the preliminary optimization of the natural language model through the overall loss, a second optimization and update of the parameters in the sentiment analysis model is performed by adjusting the parameters of some modules in the sentiment analysis model.

In one embodiment, the fifth module is configured as:

The optimized parameters of the sentiment analysis model through the overall loss, the task-independent encoder, the supplementary learning encoder, the sentiment analysis encoder, the preset vector algorithm, the first fully connected layer, The second fully connected layer and the SoftMax classifier are initialized; the optimized emotional polarity loss is calculated in the same way as in the preliminary optimization; based on the optimized emotional polarity loss, the emotional polarity is calculated through the back propagation algorithm. The model parameters in the analysis encoder, the first fully connected layer and the second fully connected layer are optimized and updated twice to obtain the final optimized sentiment analysis model.

In one embodiment, the natural language model is a relationship extraction model, the enhancement module is an entity recognition learning module, the first encoder is a shared encoder, the second encoder is a relationship extraction encoder, and the The target result loss is a relationship type loss, and the enhanced loss is an entity label loss; the second module 620 is set to:

In one embodiment, the relationship type loss is obtained in the following manner:

The latent vector is input into the relationship extraction encoder in the main model, and the relationship extraction encoder encodes the latent vector to obtain the relationship extraction latent vector of each word; the latent vector of each word and the relationship The latent vector is extracted and processed through a preset second algorithm to obtain a predicted relationship type; the predicted relationship type is compared and calculated with a preset standard to obtain the relationship type loss.

In one embodiment, the preset second algorithm includes:

The relationship extraction latent vector is calculated through the preset vector algorithm to obtain the vector representation of the first entity, the vector representation of the second entity and the sentence representation of the relationship extraction encoder, and the preset vector algorithm is applied to the latent vector , obtain the sentence representation of the shared encoder; concatenate the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relationship extraction encoder and the sentence representation of the shared encoder to obtain the intermediate Vector; the intermediate vector passes through the preset first fully connected layer and then is sent to the preset first SoftMax classifier to obtain the predicted relationship type.

In one embodiment, the entity recognition learning module includes an entity encoder, and the entity label loss is obtained as follows:

The latent vector is input to the entity encoder in the entity recognition learning module, and the entity encoder encodes the latent vector to obtain the entity recognition latent vector of each word; by encoding the latent vector of each word Perform conversion processing with entity recognition latent vectors to obtain predicted entity recognition labels; compare and calculate the predicted entity recognition labels of all words with preset standards to obtain the entity label loss.

In one embodiment, the second module 620 is configured as:

Concatenate the latent vector of each word with the entity recognition latent vector, and send the concatenated vector to the preset second fully connected layer and the preset second SoftMax classifier to obtain the predicted entity recognition label.

In one embodiment, the second module 620 is set to::

Compare the predicted entity recognition labels of all words with the actual entity labels, and calculate the comparison results through a cross-entropy loss function to obtain the entity label loss.

In one embodiment, the actual entity label of each word is determined by the location of the entity in the input. When a word does not belong to an entity, the actual entity label of a word is the first label; when a word If it is the beginning of an entity, the actual entity tag of the word is the second tag; if the word is in the middle or end of an entity, the actual entity tag of the word is the third tag. .

In one embodiment, the discriminator loss is obtained as follows:

The discriminator obtains the result of comparing the predicted entity recognition label with the preset standard, and obtains a target output. The target output value is 0 or 1; the latent vector is sent to the preset third fully connected layer and the third SoftMax classifier, a 2-dimensional vector is obtained for each word, where each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output on 0 and 1; calculated through the distribution probability The discriminator loss.

In one embodiment, when calculating the overall loss, adjustable control parameters are introduced, and the control parameters are used to control the contribution of the entity recognition learning module and the discriminator of adversarial learning to model training.

In one embodiment, a sixth module is also included, configured as:

After the preliminary optimization of the natural language model through the overall loss, by adjusting the parameters of some modules in the relationship extraction model, the model parameters of the relationship extraction encoder and the fourth fully connected layer are adjusted. Perform secondary optimization updates.

The device provided by the embodiment of the present application can implement the method steps in the above method embodiment and has the same technical effect.

As shown in Figure 14, this embodiment of the present application also provides an electronic device, including a processor 710 and a memory 720. When the computer program in the memory 720 is executed by the processor 710, the above-mentioned method for the natural language model is implemented. Optimization.

Embodiments of the present application also provide a computer-readable storage medium. A computer program is stored on the computer-readable storage medium. When the computer program is executed by a processor, the above-mentioned optimization method for a natural language model is implemented.

Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic associated with the embodiment is included in at least one embodiment of the present application. Thus, appearances of "in one embodiment" or "in an embodiment" throughout this text are not necessarily referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also know that the embodiments described in this article are optional embodiments, and the actions and modules involved are not necessarily necessary for this application.

In the various embodiments of the present application, the size of the serial numbers of the above-mentioned finalization process does not necessarily mean the order of execution. The execution order of the finalization process should be determined by its functions and internal logic, and should not be used in the implementation of the embodiments of the present application. The process constitutes any limitation.

The flowcharts and block diagrams in the drawings of this application illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of this application. In this regard, each block in the flowchart or block diagram may represent a module, segment, or portion of code that contains one or more logic functions that implement the specified executable instructions. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending upon the functionality involved. Each block in the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, may be implemented by special purpose hardware-based systems that perform the specified functions or operations, or may be implemented using special purpose hardware implemented in combination with computer instructions.

Claims

An optimization method for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator. The main model includes a first encoder and a second encoder. The method includes:

Obtain the input sentence through the first encoder, encode the sentence, and output the hidden vector of each word in the sentence;

Input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively;

The overall loss is calculated through a preset first algorithm for the target result loss, the enhancement loss and the discriminator loss;

The natural language model is initially optimized through the overall loss.
The optimization method of claim 1, wherein the natural language model is a sentiment analysis model, the enhancement module is a sentence restoration module, the first encoder is a task-independent encoder, and the second encoder is Sentiment analysis encoder, the main module also includes a supplementary learning encoder, the target result loss is an emotional polarity loss, and the enhancement loss is a supplementary learning loss;

Input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively, including:

Input the latent vector into the supplementary learning encoder to obtain a supplementary learning latent vector, and calculate the supplementary learning latent vector through a vector algorithm to obtain a vector representation of the aspect word;

Input the latent vector and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment polarity loss;

Input the supplementary learning latent vector into the sentence restoration module to obtain the intermediate result and the supplementary learning loss;

The latent vector and the intermediate result are input into the discriminator to obtain the discriminator loss.
The optimization method according to claim 2, wherein inputting the latent vector and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment polarity loss includes:

Input the latent vector and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment analysis vector representation of the vocabulary and the sentiment analysis vector representation of the aspect word;

The latent vector, the sentiment analysis vector representation of the vocabulary, and the sentiment analysis vector representation of the aspect word are calculated according to a preset second algorithm to obtain the sentiment polarity loss.
The optimization method as claimed in claim 3, wherein the preset second algorithm includes:

Calculate the latent vector according to a preset vector algorithm to obtain a task-independent sentence representation;

The sentiment analysis vector representation of the vocabulary and the sentiment analysis vector representation of the aspect word are calculated according to the preset vector algorithm to obtain the sentiment analysis sentence representation;

Concatenate the task-irrelevant sentence representation and the sentiment analysis sentence representation to obtain an intermediate vector;

Input the intermediate vector into the preset first fully connected layer, and input the output result of the first fully connected layer into the preset SoftMax classifier to obtain the predicted emotional polarity;

The emotional polarity loss is calculated by comparing the predicted emotional polarity with a preset standard.
The optimization method according to claim 2, wherein the sentence restoration module includes a supplementary learning decoder, and the supplementary learning latent vector is input into the sentence restoration module to obtain an intermediate result and the supplementary learning loss, including :

Input the supplementary learning latent vector into the supplementary learning decoder to reconstruct the input sentence and obtain the predicted vocabulary;

The predicted vocabulary is compared with the vocabulary in the input sentence to obtain the intermediate result, and the supplementary learning loss is calculated according to a preset third algorithm.
The optimization method according to claim 5, wherein the preset third algorithm includes:

Calculate the negative log natural loss of the word at one position, and sum the negative log natural loss of the word at all positions to obtain the supplementary learning loss.
The optimization method according to claim 2, wherein inputting the latent vector and the intermediate result into the discriminator to obtain the discriminator loss includes:

According to the latent vector and the intermediate result, the target output of the discriminator is obtained;

The target output is calculated according to a preset fourth algorithm to obtain the discriminator loss.
The optimization method according to claim 7, wherein the value of the target output is 0 or 1, and the target output is calculated according to a preset fourth algorithm to obtain the discriminator loss, including:

Input the supplementary learning latent vector into the preset second fully connected layer, and input the output result of the second fully connected layer into the preset second SoftMax classifier to obtain a 2-dimensional vector for each word, Each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output on 0 and 1;

The discriminator loss is obtained based on the distribution probability.
The optimization method according to claim 2, wherein in the case of calculating the overall loss, an adjustable control parameter is introduced, the control parameter is used to control the sentence restoration module and the discriminator pair of adversarial learning. The size of the model training contribution.
The optimization method according to claim 2, after the preliminary optimization of the natural language model through the overall loss, further comprising:

By adjusting the parameters of some modules in the sentiment analysis model, the parameters in the sentiment analysis model are optimized and updated twice.
The optimization method according to claim 10, wherein the second optimization of the parameters in the sentiment analysis model by adjusting the parameters of some modules in the sentiment analysis model includes:

The optimized parameters of the sentiment analysis model through the overall loss, the task-independent encoder, the supplementary learning encoder, the sentiment analysis encoder, the preset vector algorithm, the first fully connected layer, The second fully connected layer and SoftMax classifier are initialized;

Calculate the optimized emotional polarity loss in the same way as in the preliminary optimization;

According to the optimized emotional polarity loss, the model parameters in the emotional analysis encoder, the first fully connected layer and the second fully connected layer are optimized and updated twice through the back propagation algorithm, and we obtain The final optimized sentiment analysis model.
The optimization method according to claim 1, wherein the natural language model is a relationship extraction model, the enhancement module is an entity recognition learning module, the first encoder is a shared encoder, and the second encoder is Relation extraction encoder, the target result loss is a relationship type loss, and the enhancement loss is an entity label loss;

Input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively, including:

The latent vector is input into the relationship extraction encoder, the entity recognition learning module and the discriminator to obtain the relationship type loss, the entity label loss and the discriminator loss respectively.
The method of claim 12, wherein the relationship type loss is obtained as follows:

Input the latent vector into the relationship extraction encoder in the main model, and the relationship extraction encoder encodes the latent vector to obtain the relationship extraction latent vector of each word;

The latent vector and relationship extraction latent vector of each word are calculated and processed through the preset second algorithm to obtain the predicted relationship type;

The predicted relationship type is compared and calculated with a preset standard to obtain the relationship type loss.
The method of claim 13, wherein the preset second algorithm includes:

The relationship extraction latent vector is calculated through the preset vector algorithm to obtain the vector representation of the first entity, the vector representation of the second entity and the sentence representation of the relationship extraction encoder, and the preset vector algorithm is applied to the latent vector , obtain the sentence representation of the shared encoder;

Concatenate the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relationship extraction encoder and the sentence representation of the shared encoder to obtain an intermediate vector;

The intermediate vector passes through the preset first fully connected layer and then is sent to the preset first SoftMax classifier to obtain the predicted relationship type.
The method of claim 12, wherein the entity recognition learning module includes an entity encoder, and the entity label loss is obtained as follows:

Input the latent vector into the entity encoder in the entity recognition learning module, and the entity encoder encodes the latent vector to obtain the entity recognition latent vector of each word;

By converting the latent vector of each word and the entity recognition latent vector, the predicted entity recognition label is obtained;

The predicted entity recognition labels of all words are compared and calculated with preset standards to obtain the entity label loss.
The method of claim 15, wherein the predicted entity recognition label is obtained by converting the latent vector of each word and the entity recognition latent vector, including:

Concatenate the latent vector of each word with the entity recognition latent vector, and send the concatenated vector to the preset second fully connected layer and the preset second SoftMax classifier to obtain the predicted entity recognition label.
The method of claim 15, wherein the preset standard is an actual entity label, and the predicted entity recognition labels of all words are compared and calculated with the preset standard to obtain the entity label loss, including :

Compare the predicted entity recognition labels of all words with the actual entity labels, and calculate the comparison results through a cross-entropy loss function to obtain the entity label loss.
The method of claim 17, wherein the actual entity label of each word is determined by the location of the entity in the input. When a word does not belong to an entity, the actual entity label of a word is the A label; when a word is the beginning of an entity, the actual entity label of the word is the second label; when a word is in the middle or end of an entity, the actual entity label of the word The entity tag is the third tag.
The method of claim 17, wherein the discriminator loss is obtained as follows:

The discriminator obtains the result of comparing the predicted entity identification label with a preset standard to obtain a target output, and the target output value is 0 or 1;

The latent vector is sent to the preset third fully connected layer and the third SoftMax classifier, and a 2-dimensional vector is obtained for each word, where each dimension of the 2-dimensional vector corresponds to the target output at 0 and the distribution probability on 1;

The discriminator loss is calculated through the distribution probability.
The method of claim 12, wherein in the case of calculating the overall loss, adjustable control parameters are introduced, the control parameters are used to control the entity recognition learning module and the discriminator of adversarial learning for model training. The size of the contribution.
The method of claim 16, after preliminary optimizing the natural language model through the overall loss, further comprising:

By adjusting the parameters of some modules in the relationship extraction model, the model parameters of the relationship extraction encoder and the fourth fully connected layer are optimized and updated twice.
An optimization device for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator. The main model includes a first encoder and a second encoder. The device includes:

The first module is configured to obtain the input sentence through the first encoder, encode the sentence, and output the latent vector of each word in the sentence;

The second module is configured to input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively;

The third module is configured to calculate the overall loss through a preset first algorithm for the target result loss, the enhancement loss and the discriminator loss;

The fourth module is configured to perform preliminary optimization on the natural language model through the overall loss.
An electronic device includes a processor and a memory. When the computer program in the memory is executed by the processor, the optimization method for a natural language model according to any one of claims 1 to 21 is implemented.
A computer-readable storage medium stores a computer program. When the computer program is executed by a processor, the optimization method for a natural language model as described in any one of claims 1 to 21 is implemented.