WO2024000966A1 - Optimization method for natural language model - Google Patents


Info

Publication number
WO2024000966A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss
vector
encoder
entity
discriminator
Application number
PCT/CN2022/128623
Other languages
French (fr)
Chinese (zh)
Inventor
宋彦
田元贺
李世鹏
Original Assignee
苏州思萃人工智能研究所有限公司
Priority claimed from CN202210753407.6A external-priority patent/CN115034228A/en
Priority claimed from CN202210753408.0A external-priority patent/CN114970857A/en
Application filed by 苏州思萃人工智能研究所有限公司
Publication of WO2024000966A1 publication Critical patent/WO2024000966A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; classification
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/12: Use of codes for handling textual entities
    • G06F 40/126: Character encoding
    • G06F 40/151: Transformation
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/279: Recognition of textual entities
    • G06F 40/30: Semantic analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods

Definitions

  • This application relates to the technical field of natural language processing, for example, to an optimization method for a natural language model.
  • Natural language processing tasks such as the aspect-based sentiment analysis (ABSA) task aim to predict the sentiment polarity of a specific aspect term, while the relationship extraction task aims to predict the relationship between two given entities in a sentence.
  • Understanding the meaning of aspect words and entities themselves is very important for sentiment prediction and relationship prediction.
  • However, general methods often ignore the modeling of aspect words and entities themselves, resulting in an insufficient understanding of their meaning.
  • the natural language model includes a main model, an enhancement module and a discriminator.
  • the main model includes a first encoder and a second encoder.
  • the method includes:
  • the overall loss is calculated from the target result loss, the enhancement loss and the discriminator loss through a preset first algorithm;
  • the natural language model is preliminarily optimized through the overall loss.
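To make the flow concrete, here is a minimal sketch of one preliminary optimization step in PyTorch, assuming the main model, enhancement module and discriminator are module objects with the illustrative attribute names used below, and assuming the preset first algorithm is a weighted sum (the application leaves its exact form to the embodiments):

```python
def preliminary_step(main_model, enhancement_module, discriminator,
                     optimizer, sentence, lam=1.0):
    """One preliminary optimization step; `lam` is a hypothetical stand-in
    for the control parameter described later in the application."""
    # First encoder: latent vector for each word in the sentence.
    latents = main_model.first_encoder(sentence)
    # Second encoder, enhancement module and discriminator each yield a loss.
    target_loss = main_model.second_encoder(latents)
    enhance_loss, intermediate = enhancement_module(latents)
    disc_loss = discriminator(latents, intermediate)
    # Preset "first algorithm": assumed here to be a weighted sum.
    overall = target_loss + lam * (enhance_loss + disc_loss)
    optimizer.zero_grad()
    overall.backward()   # back-propagate the overall loss
    optimizer.step()     # preliminary optimization of the model parameters
    return overall.item()
```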
  • the natural language model is a sentiment analysis model
  • the enhancement module is a sentence restoration module
  • the first encoder is a task-independent encoder
  • the second encoder is a sentiment analysis encoder
  • the main model also includes a supplementary learning encoder
  • the target result loss is a sentiment polarity loss
  • the enhancement loss is a supplementary learning loss
  • the latent vector and the intermediate result are input into the discriminator to obtain the discriminator loss.
  • the natural language model is a relationship extraction model
  • the enhancement module is an entity recognition learning module
  • the first encoder is a shared encoder
  • the second encoder is a relationship extraction encoder
  • the The target result loss is a relationship type loss
  • the enhancement loss is an entity label loss
  • the latent vector is input into the relationship extraction encoder, the entity recognition learning module and the discriminator to obtain the relationship type loss, the entity label loss and the discriminator loss respectively.
  • the natural language model includes a main model, an enhancement module and a discriminator.
  • the main model includes a first encoder and a second encoder.
  • the device includes:
  • the first module is configured to obtain the input sentence through the first encoder, encode the sentence, and output the latent vector of each word in the sentence;
  • the second module is configured to input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively;
  • the third module is configured to calculate the overall loss through a preset first algorithm for the target result loss, the enhancement loss, and the discriminator loss;
  • the fourth module is configured to perform preliminary optimization on the natural language model through the overall loss.
  • This application also provides an electronic device, including a processor and a memory.
  • the computer program in the memory is executed by the processor, the above-mentioned optimization method for a natural language model is implemented.
  • This application also provides a computer-readable storage medium that stores a computer program.
  • the computer program is executed by a processor, the above-mentioned optimization method for a natural language model is implemented.
  • Figure 1 is a schematic diagram of the steps of an optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 2 is a schematic diagram of the steps of another optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 3 is a schematic structural diagram of a sentiment analysis model used in an optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 4 is a schematic diagram of the logical steps of an optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 5 is a schematic diagram of the steps for calculating the sentiment polarity loss in an optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 6 is a schematic diagram of the steps for calculating the supplementary learning loss in an optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 7 is a schematic diagram of the steps of another optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 8 is a schematic structural diagram of a relationship extraction model used in an optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 9 is a schematic diagram of the logical steps of another optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 10 is a schematic diagram of the steps for calculating the relationship type loss in an optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 11 is a schematic diagram of the steps for calculating the entity recognition label loss in an optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 12 is a schematic diagram of the secondary optimization in an optimization method for a natural language model provided by an embodiment of the present application;
  • Figure 13 is a schematic structural diagram of an optimization device for a natural language model provided by an embodiment of the present application;
  • Figure 14 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the embodiment of the present application provides an optimization method for a natural language model.
  • the natural language model includes a main model, an enhancement module and a discriminator.
  • the main model includes a first encoder and a second encoder. The method includes:
  • obtaining the input sentence through the first encoder, encoding the sentence, and outputting the latent vector of each word in the sentence; inputting the latent vectors into the second encoder, the enhancement module and the discriminator to obtain the target result loss, the enhancement loss and the discriminator loss respectively; calculating the overall loss from the target result loss, the enhancement loss and the discriminator loss through the preset first algorithm; and performing preliminary optimization on the natural language model through the overall loss.
  • the enhancement module in this embodiment is used to enhance the main model's understanding and modeling capabilities of input objects, thereby enhancing model performance.
  • the natural language model is a sentiment analysis model
  • the enhancement module is a sentence restoration module
  • the first encoder is a task-independent encoder
  • the second encoder is a sentiment analysis encoder
  • the main model also includes a supplementary learning encoder.
  • the target result loss is the sentiment polarity loss
  • the enhancement loss is the supplementary learning loss.
  • the sentiment analysis model includes a main model 100 (the sentiment classifier), a sentence restoration module 101 (reconstructed sentence) and a discriminator 102 (discriminator).
  • the main model 100 includes a task-independent encoder 1000 (task-free encoder), a sentiment analysis encoder 1001 (ABSA encoder) and a supplementary learning encoder 1002 (CL encoder). The method includes the following steps: obtain the input sentence X through the task-independent encoder 1000, encode the sentence, and obtain the latent vector of each word in the sentence; input the latent vectors into the supplementary learning encoder 1002 to obtain the supplementary learning latent vectors; calculate the vector representation of the aspect word A from the supplementary learning latent vectors through a preset vector algorithm; input the latent vectors and the vector representation of the aspect word into the sentiment analysis encoder 1001 to obtain the sentiment polarity loss L_SA; input the supplementary learning latent vectors into the sentence restoration module 101 to obtain the intermediate result and the supplementary learning loss L_CL; input the latent vectors and the intermediate result into the discriminator 102 to obtain the discriminator loss L_D; calculate the sentiment polarity loss L_SA, the supplementary learning loss L_CL and the discriminator loss L_D according to the preset first algorithm to obtain the overall loss L; perform preliminary optimization on the sentiment analysis model through the overall loss L.
  • In this way, the task-independent encoder 1000 and the supplementary learning encoder 1002 learn the aspect words in the text, which enhances the main model 100's understanding and modeling of aspect words and thereby improves the performance of the sentiment analysis model on sentiment analysis tasks.
  • obtaining the sentiment polarity loss L_SA includes the following steps: input the latent vectors and the vector representation of the aspect word into the sentiment analysis encoder 1001 to obtain the sentiment analysis vector representations of the words and the sentiment analysis vector representation of the aspect word; calculate the latent vectors, the sentiment analysis vector representations of the words and the sentiment analysis vector representation of the aspect word according to the preset second algorithm to obtain the sentiment polarity loss L_SA.
  • In this way, the sentiment polarity loss L_SA, which characterizes the sentiment polarity prediction ability of the sentiment analysis model, can be obtained.
  • During the preliminary optimization of the sentiment analysis model through the overall loss L, the sentiment polarity loss L_SA can be used to optimize the model's sentiment polarity prediction ability, ensuring the reliability of the optimization method.
  • the preset second algorithm includes the following steps: calculate a task-independent sentence representation from the latent vectors according to the preset vector algorithm; calculate a sentiment analysis sentence representation from the sentiment analysis vector representations of the words and of the aspect word according to the preset vector algorithm; concatenate the task-independent sentence representation and the sentiment analysis sentence representation to obtain an intermediate vector; input the intermediate vector into the preset first fully connected layer 1003, and input the output of the first fully connected layer 1003 into the first SoftMax classifier 1004 to obtain the predicted sentiment polarity; compare the predicted sentiment polarity with the preset standard to calculate the sentiment polarity loss L_SA.
  • The latent vectors, the sentiment analysis vector representations of the words and the sentiment analysis vector representation of the aspect word are processed with the preset vector algorithm to obtain the task-independent sentence representation and the sentiment analysis sentence representation as intermediate variables, which reflect the vector representations of the different positions of the input sentence. After processing, the main model 100's predicted sentiment polarity for the input sentence X is obtained as a scalar, which facilitates the subsequent calculation of the sentiment polarity loss L_SA.
  • the preset vector algorithm is the MaxPooling algorithm, which takes the element-wise maximum over the vectors at all positions.
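A minimal sketch of this second algorithm, assuming PyTorch, a hidden size of 128 and a three-way polarity (all illustrative); CrossEntropyLoss fuses the SoftMax classifier with the comparison against the preset standard:

```python
import torch
import torch.nn as nn

def max_pool(vectors: torch.Tensor) -> torch.Tensor:
    # MaxPooling: element-wise maximum over the word positions (dim 0).
    return vectors.max(dim=0).values

d = 128                          # illustrative hidden size
fc = nn.Linear(2 * d, 3)         # first fully connected layer 1003 (3 polarities assumed)
loss_fn = nn.CrossEntropyLoss()  # SoftMax classifier + negative log-likelihood

def sentiment_polarity_loss(latents, sa_word_vecs, sa_aspect_vecs, gold_polarity):
    """latents, sa_word_vecs, sa_aspect_vecs: (num_positions, d) tensors;
    gold_polarity: length-1 long tensor holding the preset standard."""
    s_task_free = max_pool(latents)                                    # task-independent sentence rep.
    s_sa = max_pool(torch.cat([sa_word_vecs, sa_aspect_vecs], dim=0))  # sentiment analysis sentence rep.
    o = torch.cat([s_task_free, s_sa], dim=-1)                         # intermediate vector
    logits = fc(o).unsqueeze(0)                                        # scores for classifier 1004
    return loss_fn(logits, gold_polarity)                              # sentiment polarity loss
```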
  • the sentence restoration module 101 includes a supplementary learning decoder (The Specific Decoder) 1010.
  • Obtaining the supplementary learning loss includes the following steps: input the supplementary learning latent vectors into the supplementary learning decoder 1010 to reconstruct the input sentence X and obtain the predicted word at each position; compare the predicted word with the word x_t in the input sentence X to obtain the intermediate result, and calculate the supplementary learning loss L_CL according to the preset third algorithm.
  • The supplementary learning decoder 1010 decodes and reconstructs the supplementary learning latent vectors encoded by the supplementary learning encoder 1002, obtaining each predicted word as a scalar; expressing the predicted words as scalars makes the subsequent calculation of the supplementary learning loss L_CL easier.
  • the preset third algorithm includes the following steps: calculate the negative log-likelihood loss of the word at each position, and sum the negative log-likelihood losses of the words at all positions to obtain the supplementary learning loss: $L_{CL} = -\sum_{t=1}^{n} \log p(x_t \mid X)$, where n is the number of words in the input sentence X and $p(x_t \mid X)$ is the probability the supplementary learning decoder assigns to the word $x_t$.
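Under that reading, a direct sketch of the third algorithm (the tensor layout is an assumption):

```python
import torch

def supplementary_learning_loss(word_probs: torch.Tensor) -> torch.Tensor:
    """word_probs[t] is the decoder's probability of the true word x_t at
    position t (shape: n). Sums the negative natural-log loss over positions."""
    return -torch.log(word_probs).sum()
```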
  • obtaining the discriminator loss includes the following steps: obtain the target output of the discriminator 102 from the latent vectors and the intermediate result; calculate the target output according to the preset fourth algorithm to obtain the discriminator loss L_D.
  • the adversarial-learning discriminator 102 effectively controls the degree to which the task-independent encoder 1000 and the supplementary learning encoder 1002 learn the supplementary learning task (the sentence reconstruction task), avoiding overfitting of the two encoders to the supplementary learning task and ensuring the degree to which the main model 100 fits the main task, the sentiment analysis task.
  • the value of the target output is 0 or 1; for example, the target output at a position may be 1 when the reconstructed word in the intermediate result matches the corresponding word of the input sentence, and 0 otherwise.
  • the preset fourth algorithm includes the following steps: input the supplementary learning latent vectors into the preset second fully connected layer 1020, and input the output of the second fully connected layer 1020 into the preset second SoftMax classifier 1021 to obtain a 2-dimensional vector for each word; each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output over 0 and 1 (for example, the predicted distribution probability on 0 is denoted P(0)); the discriminator loss is obtained from these distribution probabilities.
  • The discriminator loss is calculated from the distribution probabilities of the discriminator 102's target output over 0 and 1.
  • The value of the discriminator loss can be used to judge intuitively whether the supplementary learning task is overfitting.
  • The process is simple, occupies few computing resources and is computationally very efficient.
  • For each word, the negative log-likelihood of the target output is calculated, and the losses of all words are summed to obtain the discriminator loss L_D: $L_D = -\sum_{t=1}^{n} \log P(y_t)$, where $y_t \in \{0, 1\}$ is the target output at position t and $P(y_t)$ is the probability the discriminator assigns to it.
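A sketch of this fourth algorithm, assuming the 2-dimensional output of the second fully connected layer 1020 is normalized by SoftMax and scored against the 0/1 target outputs (dimensions illustrative):

```python
import torch
import torch.nn as nn

d = 128                 # illustrative hidden size
fc2 = nn.Linear(d, 2)   # second fully connected layer 1020

def discriminator_loss(supp_latents: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """supp_latents: (n, d) supplementary learning latent vectors;
    targets: (n,) target outputs in {0, 1}."""
    probs = torch.softmax(fc2(supp_latents), dim=-1)    # second SoftMax classifier 1021
    # P(y_t): probability assigned to the target output at each position.
    p_target = probs.gather(1, targets.view(-1, 1)).squeeze(1)
    return -torch.log(p_target).sum()                   # summed negative log-likelihood
```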
  • When calculating the overall loss L, an adjustable control parameter λ is introduced.
  • the control parameter λ is used to control the contribution of the sentence restoration module 101 and the adversarial-learning discriminator 102 to model training.
  • the control parameter λ provides an adjustable option for the calculation of the overall loss L.
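The application does not spell out the first algorithm at this point; one plausible form, consistent with a single control parameter λ weighting the auxiliary terms, would be

$L = L_{SA} + \lambda \, (L_{CL} + L_D)$

where larger λ gives the sentence restoration module and the discriminator more influence on training. This is an assumption for illustration, not a formula stated in the application.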
  • the optimization method also includes the following steps: after the sentiment analysis model has been optimized through the overall loss L, the parameters in the sentiment analysis model are updated by a secondary optimization in which the parameters of some modules of the sentiment analysis model are adjusted.
  • The sentiment analysis model after secondary optimization can achieve better performance.
  • the sentiment analysis model after secondary optimization is used in the same way as the baseline sentiment analysis model: it only requires sentences and aspect terms as input and does not rely on additional input, so compared with the baseline model it enhances performance without additional usage cost.
  • the secondary optimization includes the following steps: after the parameters of the sentiment analysis model have been optimized through the overall loss L, initialize the task-independent encoder 1000, the supplementary learning encoder 1002, the sentiment analysis encoder 1001, the preset vector algorithm, the first and second fully connected layers and the SoftMax classifiers with the optimized parameters; calculate the optimized sentiment polarity loss in the same way as in the preliminary optimization;
  • based on the optimized sentiment polarity loss, the model parameters in the sentiment analysis encoder 1001, the first fully connected layer 1003 and the second fully connected layer 1020 are updated a second time through the back-propagation algorithm to obtain the final optimized sentiment analysis model.
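A sketch of what this secondary optimization could look like in PyTorch, assuming illustrative attribute names; only the sentiment analysis encoder and the two fully connected layers receive gradient updates, while the other initialized parameters stay fixed:

```python
import torch

def secondary_optimization(model, batches, lr=1e-4):
    # Freeze everything first; the preliminarily optimized weights stay fixed.
    for p in model.parameters():
        p.requires_grad = False
    # Only these modules are updated a second time (names are illustrative).
    trainable = [model.sa_encoder, model.fc1, model.fc2]
    params = [p for m in trainable for p in m.parameters()]
    for p in params:
        p.requires_grad = True
    optimizer = torch.optim.Adam(params, lr=lr)
    for batch in batches:
        # Same sentiment polarity loss computation as in the preliminary stage
        # (hypothetical method name).
        loss = model.sentiment_polarity_loss(batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```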
  • Without this optimization method, the F score of the sentiment analysis model is 77.04; after optimization with this method, the F score of the sentiment analysis model is 77.70. It can be seen that introducing the sentence restoration module 101 enhances the model's modeling of aspect words and thereby improves the performance of the sentiment analysis model on sentiment analysis tasks.
  • the natural language model can be a sentiment analysis model.
  • the sentiment analysis model includes a main model, a sentence restoration module and a discriminator, and the main model includes a task-independent encoder, a sentiment analysis encoder and a supplementary learning encoder.
  • the method includes the following steps: obtain the input sentence through the task-independent encoder, encode the sentence, and obtain the latent vector of each word in the sentence; input the latent vectors into the supplementary learning encoder to obtain the supplementary learning latent vectors, and calculate the vector representation of the aspect word from the supplementary learning latent vectors through the vector algorithm; input the latent vectors and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment polarity loss; input the supplementary learning latent vectors into the sentence restoration module to obtain the intermediate result and the supplementary learning loss; input the latent vectors and the intermediate result into the discriminator to obtain the discriminator loss; calculate the sentiment polarity loss, the supplementary learning loss and the discriminator loss according to the preset first algorithm to obtain the overall loss, and perform preliminary optimization on the sentiment analysis model through the overall loss.
  • the task-independent encoder and the supplementary learning encoder learn the aspect words in the text, which enhances the main model's understanding and modeling of aspect words and thereby improves the sentiment analysis model's performance on sentiment analysis tasks.
  • obtaining the sentiment polarity loss includes the following steps: input the latent vectors and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment analysis vector representations of the words and the sentiment analysis vector representation of the aspect word; calculate the latent vectors and these sentiment analysis vector representations according to the preset second algorithm to obtain the sentiment polarity loss. The sentiment polarity loss characterizes the sentiment polarity prediction ability of the sentiment analysis model and can be used, during the preliminary optimization of the model through the overall loss, to optimize that prediction ability, ensuring the reliability of the optimization method.
  • the preset second algorithm includes the following steps: calculate the latent vectors according to the preset vector algorithm to obtain a task-independent sentence representation; calculate the sentiment analysis vector representations of the words and of the aspect word according to the preset vector algorithm to obtain a sentiment analysis sentence representation; concatenate the task-independent sentence representation and the sentiment analysis sentence representation to obtain an intermediate vector; input the intermediate vector into the preset first fully connected layer, and input the output of the first fully connected layer into the preset SoftMax classifier to obtain the predicted sentiment polarity; compare the predicted sentiment polarity with the preset standard to calculate the sentiment polarity loss.
  • the preset vector algorithm is used to calculate the latent vectors and the sentiment analysis vector representations of the words and of the aspect word, yielding the task-independent sentence representation and the sentiment analysis sentence representation as intermediate variables that reflect the input sentence within the sentiment analysis model.
  • After processing, the main model's predicted sentiment polarity for the input sentence is obtained as a scalar, which facilitates the subsequent calculation of the sentiment polarity loss.
  • the sentence restoration module includes a supplementary learning decoder
  • obtaining the supplementary learning loss includes the following steps: input the supplementary learning latent vectors into the supplementary learning decoder to reconstruct the input sentence and obtain the predicted words; compare the predicted words with the words in the input sentence to obtain the intermediate result, and calculate the supplementary learning loss according to the preset third algorithm.
  • The predicted words (scalars) are obtained by decoding and reconstructing, through the supplementary learning decoder, the supplementary learning latent vectors encoded by the supplementary learning encoder; expressing the predicted words as scalars makes the subsequent calculation of the supplementary learning loss easier.
  • the preset third algorithm includes the following steps: calculate the negative log-likelihood loss of the word at each position, and sum the negative log-likelihood losses of the words at all positions to obtain the supplementary learning loss.
  • obtaining the discriminator loss includes the following steps: obtain the target output of the discriminator based on the latent vectors and the intermediate result; calculate the target output according to the preset fourth algorithm to obtain the discriminator loss.
  • the adversarial-learning discriminator effectively controls the degree to which the task-independent encoder and the supplementary learning encoder learn the supplementary learning task (the sentence reconstruction task), avoiding their overfitting to the supplementary learning task and ensuring the degree to which the main model fits the main task, the sentiment analysis task.
  • the value of the target output is 0 or 1.
  • the preset fourth algorithm includes the following steps: input the supplementary learning latent vectors into the preset second fully connected layer, and input the output of the second fully connected layer into the preset second SoftMax classifier to obtain a 2-dimensional vector for each word, each dimension of which corresponds to the distribution probability of the target output over 0 and 1; the discriminator loss is obtained from these distribution probabilities. Calculating the discriminator loss from the distribution probabilities of the target output over 0 and 1 makes it possible to judge intuitively, from the value of the discriminator loss, whether the supplementary learning task is overfitting; the process is simple, occupies few computing resources and is computationally very efficient.
  • When calculating the overall loss, control parameters are used to control the contribution of the sentence restoration module and the adversarial-learning discriminator to model training.
  • the control parameters provide adjustable options for the calculation of the overall loss.
  • the optimization method for the natural language model also includes the following steps: after the sentiment analysis model has been optimized through the overall loss, the parameters in the model are updated by a secondary optimization in which the parameters of some modules of the sentiment analysis model are adjusted.
  • Understandably, by performing the secondary optimization update, the secondarily optimized sentiment analysis model can achieve better performance.
  • the sentiment analysis model after secondary optimization is used in the same way as the baseline sentiment analysis model: it only requires sentences and aspect terms as input and does not rely on additional input, so compared with the baseline model it enhances performance without additional usage cost.
  • the secondary optimization includes the following steps: after the parameters of the sentiment analysis model have been optimized through the overall loss, initialize the task-independent encoder, the supplementary learning encoder, the sentiment analysis encoder, the preset vector algorithm, the first and second fully connected layers and the SoftMax classifier with the optimized parameters; calculate the optimized sentiment polarity loss in the same way as in the preliminary optimization; based on the optimized sentiment polarity loss, update the model parameters in the sentiment analysis encoder, the first fully connected layer and the second fully connected layer a second time through the back-propagation algorithm to obtain the final optimized sentiment analysis model.
  • the natural language model is a relationship extraction model
  • the enhancement module is an entity recognition learning module
  • the first encoder is a shared encoder
  • the second encoder is a relationship extraction encoder
  • the target result loss is a relationship type loss
  • the enhancement loss is the entity label loss.
  • An optimization method for a natural language model includes the following steps: obtain the input sentence through the shared encoder 2000 (shared encoder) in the preset main model 200 (relationship extraction, RE), encode the sentence, and output the latent vector of each word in the sentence.
  • the relationship extraction model includes the main model 200, an entity recognition learning module 201 (NER) and a discriminator 202 (discriminator).
  • the main model 200 also includes a relationship extraction encoder 2001 (RE encoder). The latent vectors are input into the relationship extraction encoder 2001, the entity recognition learning module 201 and the discriminator 202 to obtain the relationship type loss L_RE, the entity label loss L_NER and the discriminator loss L_D respectively; the relationship type loss L_RE, the entity label loss L_NER and the discriminator loss L_D are calculated through the preset first algorithm to obtain the overall loss L; the relationship extraction model is preliminarily optimized through the overall loss L.
  • By setting up the entity recognition learning module, the shared encoder 2000 in the main model 200 learns the entities in the text, which enhances the main model 200's ability to model entities and thus improves the relationship extraction model's performance on relationship extraction tasks.
  • obtaining the relationship type loss in the main model 200 includes the following steps: input the latent vectors into the relationship extraction encoder 2001 in the main model 200; the relationship extraction encoder 2001 encodes the latent vectors to obtain the relationship extraction latent vector of each word; process the latent vector and the relationship extraction latent vector of each word through the preset second algorithm to obtain the predicted relationship type; compare the predicted relationship type with the preset standard to obtain the relationship type loss L_RE.
  • In this way, the predicted relationship type of the current main model 200 can be obtained; comparing it with the preset standard yields the relationship type loss L_RE, which characterizes the relationship extraction model's ability to predict relationship types.
  • During the preliminary optimization of the relationship extraction model through the overall loss L, the relationship type loss L_RE can be used to optimize the model's relationship prediction ability, ensuring the reliability of the optimization method.
  • the preset second algorithm includes the following steps: calculate the relationship extraction latent vectors through the preset vector algorithm to obtain the vector representation of the first entity E1, the vector representation of the second entity E2 and the sentence representation of the relationship extraction encoder 2001; at the same time, apply the preset vector algorithm to the latent vectors to obtain the sentence representation of the shared encoder 2000; concatenate the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relationship extraction encoder 2001 and the sentence representation of the shared encoder 2000 to obtain the intermediate vector o; pass the intermediate vector o through the preset first fully connected layer 2002 and then into the first SoftMax classifier 2003 to obtain the predicted relationship type.
  • The vector representations of the first and second entities (E1 and E2 denote the two entities), the sentence representation of the relationship extraction encoder 2001 and the sentence representation of the shared encoder 2000 are obtained by calculating the latent vectors and the relationship extraction latent vectors through the preset vector algorithm.
  • After the intermediate vector o obtained by concatenating the above vector representations is classified and normalized by the first fully connected layer 2002 and the first SoftMax classifier 2003, the relationship type predicted by the main model 200 for the two entities is obtained as a scalar, which facilitates the subsequent calculation of the relationship type loss L_RE.
  • the preset vector algorithm is the MaxPooling algorithm, which takes the element-wise maximum over the vectors at all positions.
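A sketch of this second algorithm for relationship extraction, assuming PyTorch, a hidden size of 128, ten relation types, and entity spans given as slices (all illustrative):

```python
import torch
import torch.nn as nn

d = 128                       # illustrative hidden size
fc_re = nn.Linear(4 * d, 10)  # first fully connected layer 2002; 10 relation types assumed

def max_pool(vectors: torch.Tensor) -> torch.Tensor:
    return vectors.max(dim=0).values  # MaxPooling over positions

def predict_relation(latents, re_latents, e1_span, e2_span):
    """latents / re_latents: (n, d) tensors from the shared encoder 2000 and
    the relationship extraction encoder 2001; e1_span, e2_span: entity slices."""
    v_e1 = max_pool(re_latents[e1_span])    # vector representation of the first entity
    v_e2 = max_pool(re_latents[e2_span])    # vector representation of the second entity
    s_re = max_pool(re_latents)             # sentence rep. of the relationship extraction encoder
    s_shared = max_pool(latents)            # sentence rep. of the shared encoder
    o = torch.cat([v_e1, v_e2, s_re, s_shared], dim=-1)  # intermediate vector o
    return torch.softmax(fc_re(o), dim=-1)  # first SoftMax classifier 2003
```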
  • the entity recognition learning module 201 includes an entity encoder 2010.
  • obtaining the entity label loss in the entity recognition learning module 201 includes the following steps: input the latent vectors into the entity encoder 2010 in the entity recognition learning module 201; the entity encoder 2010 further encodes the latent vectors to obtain the entity recognition latent vector of each word; convert the latent vector and the entity recognition latent vector of each word to obtain the predicted entity recognition label; compare the predicted entity recognition labels of all words with the preset standard to obtain the entity label loss L_NER.
  • The entity encoder 2010 encodes the latent vectors to obtain the entity recognition latent vector of each word, and the latent vector and entity recognition latent vector of each word are converted into a predicted entity recognition label for the approximate position of the word within an entity, expressed as a scalar (a quantity without direction, whose content can be a number or a character); expressing the predicted entity recognition labels as scalars makes the subsequent calculation of the entity recognition loss easier.
  • the conversion process includes the following steps: concatenate the latent vector of each word with its entity recognition latent vector, and feed the concatenated vector into the second fully connected layer 2011 and the second SoftMax classifier 2012 to obtain the predicted entity recognition label. Understandably, classifying the concatenated vectors into predicted entity recognition labels through the second fully connected layer 2011 and the second SoftMax classifier 2012 works in the same way as converting the intermediate vector o into the predicted relationship type in the main model 200; unifying the two makes it easier to establish, optimize and maintain the relationship extraction model.
  • the preset standard is the actual entity labels: the predicted entity recognition labels of all words are compared with the actual entity labels, and the comparison result is calculated through the cross-entropy loss function to obtain the entity label loss L_NER.
  • In this way, the difference between the predicted values and the actual values, that is, the entity label loss L_NER, can be obtained.
  • Optimizing the relationship extraction model through the entity label loss L_NER can enhance the model's understanding of the meaning of entities and improve the model's ability to predict entity labels.
  • the actual entity label of each word is determined by the position of the entity input into the shared encoder 2000.
  • If a word is not part of an entity, its entity label is the first label; if a word is at the beginning of an entity, its entity label is the second label; if a word is in the middle or at the end of an entity, its entity label is the third label.
  • the first label is O
  • the second label is B
  • the third label is I.
  • For example, if the input sentence is "An air force pilot is back" and the two input entities are "air force" and "pilot",
  • the actual entity labels of the words in the input sentence are O B I B O O.
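A small sketch reproducing this labeling rule and the cross-entropy comparison that yields the entity label loss (PyTorch; the helper names are illustrative):

```python
import torch
import torch.nn as nn

LABELS = {"O": 0, "B": 1, "I": 2}   # first, second and third labels

def bio_labels(words, entities):
    """Assign O/B/I labels, given each entity as a list of words."""
    labels = ["O"] * len(words)
    for entity in entities:
        for start in range(len(words) - len(entity) + 1):
            if words[start:start + len(entity)] == entity:
                labels[start] = "B"                      # beginning of the entity
                for k in range(start + 1, start + len(entity)):
                    labels[k] = "I"                      # middle or end of the entity
    return labels

words = ["An", "air", "force", "pilot", "is", "back"]
gold = bio_labels(words, [["air", "force"], ["pilot"]])
print(gold)  # ['O', 'B', 'I', 'B', 'O', 'O']

# Entity label loss: cross-entropy between predicted label scores and the
# actual entity labels (random logits stand in for the output of the
# second fully connected layer 2011).
logits = torch.randn(len(words), len(LABELS))
gold_ids = torch.tensor([LABELS[l] for l in gold])
entity_label_loss = nn.CrossEntropyLoss()(logits, gold_ids)
```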
  • the entity recognition learning module 201 can be trained to find entities from text, thereby enhancing the relationship extraction model's ability to understand entities.
  • obtaining the discriminator loss in the discriminator 202 includes the following steps: the discriminator 202 obtains the result of comparing the predicted entity recognition labels with the preset standard, yielding the target output, whose value is 0 or 1.
  • The latent vectors are fed into the preset third fully connected layer 2020 and the third SoftMax classifier 2021 to obtain a 2-dimensional vector for each word.
  • Each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output over 0 and 1 (for example, the predicted distribution probability on 0 is recorded as P(0)).
  • For each word, the negative log-likelihood of the target output is calculated, and the losses of all words are summed to obtain the discriminator loss L_D: $L_D = -\sum_{t=1}^{n} \log P(y_t)$, where $y_t \in \{0, 1\}$ is the target output at position t.
  • the setting of the discriminator 202 effectively controls the degree to which the shared encoder 2000 learns the entity recognition task and avoids its overfitting to that task. Without the discriminator loss provided by the discriminator 202, the preliminary optimization through the overall loss would preferentially optimize the entity recognition loss, harming the relationship extraction model's performance on the main task, relationship extraction. It can be seen that when the discriminator loss provided by the discriminator 202 takes part in the preliminary optimization through the overall loss, it ensures that the main performance of the relationship extraction model is not incorrectly overridden.
  • an adjustable control parameter ⁇ is introduced, and the control parameter ⁇ is used to control the contribution of the entity recognition learning module 201 and the adversarial learning discriminator 202 to model training.
  • the control parameter ⁇ provides an adjustable option for the calculation of the overall loss L.
  • the optimization method also includes the following steps: after the relationship extraction model has been optimized through the overall loss L, the model parameters of the relationship extraction encoder 2001 and all fully connected layers are updated by a secondary optimization in which the parameters of some modules of the relationship extraction model are adjusted. Understandably, by performing the secondary optimization update, the secondarily optimized relationship extraction model can achieve better performance.
  • the relationship extraction model after secondary optimization is used in the same way as the baseline relationship extraction model: it only requires sentences and entities as input and does not rely on additional input, so compared with the baseline model it enhances performance without additional usage cost.
  • the secondary optimization update includes the following steps: after the parameters of the relationship extraction model have been optimized through the overall loss L, initialize the shared encoder 2000 and the relationship extraction encoder 2001 in the main model 200, together with the first, second and third fully connected layers and the SoftMax classifiers, with the optimized parameters;
  • then update the model parameters of the relationship extraction encoder 2001 and the first, second and third fully connected layers a second time through the back-propagation algorithm to obtain the final optimized relationship extraction model.
  • Without this optimization method, the F score of the relationship extraction model is 77.04; after optimization with this method, the F score of the relationship extraction model is 77.70. It can be seen that introducing the entity recognition learning module 201 enhances the model's ability to model entities and thereby improves the performance of the relationship extraction model on the relationship extraction task.
  • B corresponding to A means that B is associated with A, and B can be determined based on A.
  • determining B based on A does not mean determining B only based on A.
  • B can also be determined based on A and/or other information.
  • the natural language model can be a relationship extraction model.
  • the relationship extraction model includes a main model, an entity recognition learning module and a discriminator.
  • the main model includes a shared encoder and a relationship extraction encoder, and the method includes the following steps: obtain the input sentence through the shared encoder in the main model, encode the sentence, and output the latent vector of each word in the sentence; input the latent vectors into the relationship extraction encoder, the entity recognition learning module and the discriminator to obtain the relationship type loss, the entity label loss and the discriminator loss respectively; calculate the relationship type loss, the entity label loss and the discriminator loss through the preset first algorithm to obtain the overall loss; perform preliminary optimization on the relationship extraction model through the overall loss.
  • By setting up the entity recognition module in the preliminary optimization, the shared encoder in the main model learns the entities in the text, which enhances the main model's ability to model entities and thus improves the relationship extraction model's performance on the relationship extraction task.
  • obtaining the relationship type loss from the main model includes the following steps: input the latent vectors into the relationship extraction encoder in the main model; the relationship extraction encoder encodes them to obtain the relationship extraction latent vector of each word; process the latent vector and the relationship extraction latent vector of each word through the preset second algorithm to obtain the predicted relationship type; compare the predicted relationship type with the preset standard to obtain the relationship type loss.
  • the latent vectors and the encoded relationship extraction latent vectors are processed through the second algorithm to obtain the predicted relationship type of the current main model.
  • the predicted relationship type is compared with the preset standard to obtain the relationship type loss, which characterizes the relationship extraction model's ability to predict relationship types.
  • During the preliminary optimization of the relationship extraction model through the overall loss, the relationship type loss can be used to optimize the model's relationship prediction ability, ensuring the reliability of the optimization method.
  • the preset second algorithm includes the following steps: calculate the relationship extraction latent vectors through the preset vector algorithm to obtain the vector representation of the first entity, the vector representation of the second entity and the sentence representation of the relationship extraction encoder; apply the preset vector algorithm to the latent vectors to obtain the sentence representation of the shared encoder; concatenate the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relationship extraction encoder and the sentence representation of the shared encoder to obtain an intermediate vector; pass the intermediate vector through a fully connected layer and then into the SoftMax classifier to obtain the predicted relationship type.
  • The vector representations obtained in this way serve as intermediate variables that reflect the vector representations of the entities and the sentence at different positions in the relationship extraction model.
  • The above vector representations are concatenated to obtain the intermediate vector, from which the relationship type predicted by the main model for the two entities is obtained as a scalar, facilitating the subsequent calculation of the relationship type loss.
  • the entity recognition learning module includes an entity encoder
  • obtaining the entity label loss in the entity recognition learning module includes the following steps: input the latent vectors into the entity encoder in the entity recognition learning module; the entity encoder encodes them to obtain the entity recognition latent vector of each word; convert the latent vector and the entity recognition latent vector of each word into the predicted entity recognition label; compare the predicted entity recognition labels of all words with the preset standard to obtain the entity label loss.
  • The entity recognition latent vector of each word is obtained by encoding the latent vectors with the entity encoder, and the latent vector and entity recognition latent vector of each word are converted into a predicted entity recognition label (a scalar) for the approximate position of the word within an entity.
  • It can be seen that expressing the predicted entity recognition labels as scalars makes the subsequent calculation of the entity recognition loss easier.
  • the conversion process includes the following steps: concatenate the latent vector of each word with its entity recognition latent vector, and feed the concatenated vector into a fully connected layer and a SoftMax classifier to obtain the predicted entity recognition label.
  • Classifying the concatenated vectors into predicted entity recognition labels through the fully connected layer and SoftMax classifier works in the same way as converting the intermediate vector into the predicted relationship type in the main model; unifying the two makes it easier to establish, optimize and maintain the relationship extraction model.
  • the preset standard is the actual entity labels: the predicted entity recognition labels of all words are compared with the actual entity labels, and the comparison result is calculated through the cross-entropy loss function to obtain the entity label loss.
  • the actual entity label of each word is determined by the position of the entity in the input.
  • If a word is not part of an entity, its entity label is the first label; if a word is at the beginning of an entity, its entity label is the second label; if a word is in the middle or at the end of an entity, its entity label is the third label.
  • obtaining the discriminator loss in the discriminator includes the following steps: the discriminator obtains the result of comparing the predicted entity recognition labels with the preset standard, yielding the target output, whose value is 0 or 1; feed the latent vectors into a fully connected layer and a SoftMax classifier to obtain a 2-dimensional vector for each word, each dimension of which corresponds to the distribution probability of the target output over 0 and 1; the discriminator loss is calculated from these distribution probabilities.
  • the setting of the discriminator effectively controls the learning degree of the shared encoder for the entity recognition task and avoids the overfitting of the shared encoder for the entity recognition task.
  • the discriminator loss provided by the discriminator ensures that, during the preliminary optimization through the overall loss, the main performance of the relationship extraction model is not incorrectly overridden.
  • When calculating the overall loss, control parameters are introduced to control the contribution of the entity recognition learning module and the adversarial-learning discriminator to model training.
  • the control parameters provide adjustable options for the calculation of the overall loss.
  • The optimization method for a natural language model provided by the embodiment of the present application also includes the following steps: after the relationship extraction model has been optimized through the overall loss, the model parameters of the relationship extraction encoder and the fully connected layers are updated by a secondary optimization in which the parameters of some modules of the relationship extraction model are adjusted.
  • the secondarily optimized relationship extraction model can achieve better performance.
  • the relationship extraction model after secondary optimization is used in the same way as the baseline relationship extraction model: it only requires sentences and entities as input and does not rely on additional input, so compared with the baseline model it enhances performance without additional usage cost.
  • the embodiment of the present application also provides an optimization device for a natural language model.
  • the natural language model includes a main model, an enhancement module and a discriminator.
  • the main model includes a first encoder and a second encoder, and the device includes: a first module 610, configured to obtain an input sentence through the first encoder, encode the sentence, and output the latent vector of each word in the sentence; a second module 620, configured to input the latent vectors into the second encoder, the enhancement module and the discriminator to obtain the target result loss, the enhancement loss and the discriminator loss respectively; a third module 630, configured to calculate the target result loss, the enhancement loss and the discriminator loss through a preset first algorithm to obtain the overall loss; and a fourth module 640, configured to perform preliminary optimization on the natural language model through the overall loss.
  • the natural language model is a sentiment analysis model
  • the enhancement module is a sentence restoration module
  • the first encoder is a task-independent encoder
  • the second encoder is a sentiment analysis encoder
  • the main model also includes a supplementary learning encoder
  • the target result loss is a sentiment polarity loss
  • the enhancement loss is a supplementary learning loss
  • the second module 620 is configured to: input the latent vectors into the supplementary learning encoder to obtain the supplementary learning latent vectors, and calculate the vector representation of the aspect word from them through the vector algorithm; input the latent vectors and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment polarity loss; input the supplementary learning latent vectors into the sentence restoration module to obtain the intermediate result and the supplementary learning loss; input the latent vectors and the intermediate result into the discriminator to obtain the discriminator loss.
  • the second module 620 is configured to: input the latent vectors and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment analysis vector representations of the words and of the aspect word; calculate the latent vectors and these sentiment analysis vector representations according to a preset second algorithm to obtain the sentiment polarity loss.
  • the preset second algorithm includes:
  • calculate the latent vectors according to the preset vector algorithm to obtain a task-independent sentence representation; calculate the sentiment analysis vector representations of the words and of the aspect word according to the preset vector algorithm to obtain a sentiment analysis sentence representation; concatenate the task-independent sentence representation and the sentiment analysis sentence representation to obtain an intermediate vector; input the intermediate vector into the preset first fully connected layer, and input the output of the first fully connected layer into the preset SoftMax classifier to obtain the predicted sentiment polarity; compare the predicted sentiment polarity with the preset standard to calculate the sentiment polarity loss.
  • the sentence restoration module includes a supplementary learning decoder
  • the second module 620 is configured as:
  • input the supplementary learning latent vectors into the supplementary learning decoder to reconstruct the input sentence and obtain the predicted words; compare the predicted words with the words in the input sentence to obtain the intermediate result, and calculate the supplementary learning loss according to a preset third algorithm.
  • the preset third algorithm includes: calculating the negative log-likelihood loss of the word at each position, and summing the negative log-likelihood losses of the words at all positions to obtain the supplementary learning loss.
  • the second module 620 is configured to: obtain the target output of the discriminator based on the latent vectors and the intermediate result; calculate the target output according to a preset fourth algorithm to obtain the discriminator loss.
  • the value of the target output is 0 or 1,
  • and the second module 620 is configured to: input the supplementary learning latent vectors into the preset second fully connected layer, input the output of the second fully connected layer into the preset second SoftMax classifier to obtain a 2-dimensional vector for each word, each dimension of which corresponds to the distribution probability of the target output over 0 and 1, and obtain the discriminator loss from these distribution probabilities.
  • adjustable control parameters are introduced, and the control parameters are used to control the contribution of the sentence restoration module and the discriminator of adversarial learning to model training.
  • a fifth module is also included, configured as:
  • perform a secondary optimization update of the parameters in the sentiment analysis model, after the model has been optimized through the overall loss, by adjusting the parameters of some modules in the sentiment analysis model.
  • the fifth module is configured as:
  • after the parameters of the sentiment analysis model have been optimized through the overall loss, initialize the task-independent encoder, the supplementary learning encoder, the sentiment analysis encoder, the preset vector algorithm, the first fully connected layer, the second fully connected layer and the SoftMax classifier with the optimized parameters; calculate the optimized sentiment polarity loss in the same way as in the preliminary optimization; based on the optimized sentiment polarity loss, update the model parameters in the sentiment analysis encoder, the first fully connected layer and the second fully connected layer a second time through the back-propagation algorithm to obtain the final optimized sentiment analysis model.
  • the natural language model is a relationship extraction model
  • the enhancement module is an entity recognition learning module
  • the first encoder is a shared encoder
  • the second encoder is a relationship extraction encoder
  • the The target result loss is a relationship type loss
  • the enhanced loss is an entity label loss
  • the second module 620 is set to:
  • the latent vector is input into the relationship extraction encoder, the entity recognition learning module and the discriminator to obtain the relationship type loss, the entity label loss and the discriminator loss respectively.
  • the relationship type loss is obtained in the following manner:
  • the latent vector is input into the relationship extraction encoder in the main model, and the relationship extraction encoder encodes the latent vector to obtain the relationship extraction latent vector of each word; the latent vector and the relationship extraction latent vector of each word are processed through a preset second algorithm to obtain a predicted relationship type; the predicted relationship type is compared and calculated with a preset standard to obtain the relationship type loss.
  • the preset second algorithm includes:
  • the relationship extraction latent vectors are calculated through the preset vector algorithm to obtain the vector representation of the first entity, the vector representation of the second entity and the sentence representation of the relationship extraction encoder, and the preset vector algorithm is applied to the latent vectors to obtain the sentence representation of the shared encoder; the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relationship extraction encoder and the sentence representation of the shared encoder are concatenated to obtain the intermediate vector; the intermediate vector passes through the preset first fully connected layer and is then sent to the preset first SoftMax classifier to obtain the predicted relationship type.
  • the entity recognition learning module includes an entity encoder, and the entity label loss is obtained as follows:
  • the latent vector is input to the entity encoder in the entity recognition learning module, and the entity encoder encodes the latent vector to obtain the entity recognition latent vector of each word; the latent vector and the entity recognition latent vector of each word are converted to obtain the predicted entity recognition label; the predicted entity recognition labels of all words are compared and calculated with the preset standard to obtain the entity label loss.
  • the second module 620 is configured as:
  • Concatenate the latent vector of each word with the entity recognition latent vector and send the concatenated vector to the preset second fully connected layer and the preset second SoftMax classifier to obtain the predicted entity recognition label.
  • the second module 620 is set to:
  • the actual entity label of each word is determined by the position of the word relative to the entities in the input: if a word is not part of any entity, its actual entity label is the first label; if a word is the beginning of an entity, its actual entity label is the second label; if a word is in the middle or at the end of an entity, its actual entity label is the third label (see the labeling sketch after this list).
  • the discriminator loss is obtained as follows:
  • the discriminator obtains the result of comparing the predicted entity recognition label with the preset standard, and obtains a target output.
  • the target output value is 0 or 1; the latent vector is sent to the preset third fully connected layer and the third SoftMax classifier, and a 2-dimensional vector is obtained for each word, where each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output on 0 and 1; the discriminator loss is calculated through the distribution probability.
  • when calculating the overall loss, adjustable control parameters are introduced; the control parameters are used to control the contribution of the entity recognition learning module and the discriminator of adversarial learning to model training.
  • a sixth module is also included, configured as:
  • the model parameters of the relationship extraction encoder and the fourth fully connected layer are adjusted to perform a secondary optimization update.
  • the device provided by the embodiment of the present application can implement the method steps in the above method embodiment and has the same technical effect.
  • this embodiment of the present application also provides an electronic device, including a processor 710 and a memory 720; when the computer program in the memory 720 is executed by the processor 710, the above-mentioned optimization method for a natural language model is implemented.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the computer program is executed by a processor, the above-mentioned optimization method for a natural language model is implemented.
  • references herein to "one embodiment" or "an embodiment" mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, appearances of "in one embodiment" or "in an embodiment" throughout this text do not necessarily refer to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also understand that the embodiments described herein are optional embodiments, and the actions and modules involved are not necessarily required by this application.
  • the size of the serial numbers of the above-mentioned processes does not necessarily indicate their order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • each block in the flowchart or block diagram may represent a module, segment or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending upon the functionality involved.
  • each block in the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration, may be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by combinations of special-purpose hardware and computer instructions.
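As a concrete illustration of the first/second/third entity-label scheme described in the list above, the following is a minimal sketch in which the hypothetical names 'O', 'B' and 'I' stand in for the first, second and third labels; the function and its signature are illustrative, not part of the application:

```python
def entity_labels(n_words, entity_spans):
    """Assign the actual entity label to each word position.

    entity_spans: list of (start, end) index pairs, end exclusive.
    'O' stands in for the first label (word is not part of any entity),
    'B' for the second label (word begins an entity),
    'I' for the third label (word is in the middle or at the end of an entity).
    """
    labels = ['O'] * n_words
    for start, end in entity_spans:
        labels[start] = 'B'                 # beginning of an entity
        for i in range(start + 1, end):
            labels[i] = 'I'                 # middle or end of an entity
    return labels

# "John Smith works at Acme Corp" with entity spans (0, 2) and (4, 6):
print(entity_labels(6, [(0, 2), (4, 6)]))
# ['B', 'I', 'O', 'O', 'B', 'I']
```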

Abstract

Provided in the present application is an optimization method for a natural language model. The natural language model comprises a main model, an enhancement module and a discriminator, wherein the main model comprises a first encoder and a second encoder. The method comprises: acquiring an input statement by means of a first encoder, coding the statement, and outputting an implicit vector of each term in the statement; inputting the implicit vectors into a second encoder, an enhancement module and a discriminator, so as to respectively obtain a target result loss, an enhancement loss and a discriminator loss; by means of a preset first algorithm, performing calculation on the target result loss, the enhancement loss and the discriminator loss, so as to obtain an overall loss; and preliminarily optimizing a natural language model according to the overall loss.

Description

Optimization method for a natural language model
This application claims priority to the Chinese patent application with application number 202210753408.0 submitted to the China Patent Office on June 29, 2022, and to the Chinese patent application with application number 202210753407.6 submitted to the China Patent Office on June 29, 2022; the entire contents of both applications are incorporated herein by reference.
Technical Field
This application relates to the technical field of natural language processing, for example, to an optimization method for a natural language model.
Background
Natural language processing includes, for example, the aspect-based sentiment analysis (ABSA) task, which aims to predict the sentiment polarity of a specific aspect term, and the relationship extraction task, which aims to extract (predict) the relationship between two given entities in a sentence. Understanding the meaning of aspect terms and of the entities themselves is very important for sentiment prediction and relationship prediction. However, general methods often ignore the modeling of aspect terms and entities themselves, resulting in an insufficient understanding of their meaning.
Summary
This application provides an optimization method for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator, and the main model includes a first encoder and a second encoder. The method includes:
obtaining an input sentence through the first encoder, encoding the sentence, and outputting a latent vector for each word in the sentence;
inputting the latent vectors into the second encoder, the enhancement module and the discriminator to obtain a target result loss, an enhancement loss and a discriminator loss, respectively;
calculating an overall loss from the target result loss, the enhancement loss and the discriminator loss through a preset first algorithm;
performing a preliminary optimization of the natural language model through the overall loss.
In one implementation, the natural language model is a sentiment analysis model, the enhancement module is a sentence restoration module, the first encoder is a task-independent encoder, the second encoder is a sentiment analysis encoder, the main model further includes a supplementary learning encoder, the target result loss is a sentiment polarity loss, and the enhancement loss is a supplementary learning loss;
inputting the latent vectors into the second encoder, the enhancement module and the discriminator to obtain the target result loss, the enhancement loss and the discriminator loss respectively includes:
inputting the latent vectors into the supplementary learning encoder to obtain supplementary learning latent vectors, and calculating the vector representation of the aspect term from the supplementary learning latent vectors through a vector algorithm;
inputting the latent vectors and the vector representation of the aspect term into the sentiment analysis encoder to obtain the sentiment polarity loss;
inputting the supplementary learning latent vectors into the sentence restoration module to obtain an intermediate result and the supplementary learning loss;
inputting the latent vectors and the intermediate result into the discriminator to obtain the discriminator loss.
In one implementation, the natural language model is a relationship extraction model, the enhancement module is an entity recognition learning module, the first encoder is a shared encoder, the second encoder is a relationship extraction encoder, the target result loss is a relationship type loss, and the enhancement loss is an entity label loss;
inputting the latent vectors into the second encoder, the enhancement module and the discriminator to obtain the target result loss, the enhancement loss and the discriminator loss respectively includes:
inputting the latent vectors into the relationship extraction encoder, the entity recognition learning module and the discriminator to obtain the relationship type loss, the entity label loss and the discriminator loss, respectively.
This application also provides an optimization device for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator, and the main model includes a first encoder and a second encoder. The device includes:
a first module, configured to obtain an input sentence through the first encoder, encode the sentence, and output a latent vector for each word in the sentence;
a second module, configured to input the latent vectors into the second encoder, the enhancement module and the discriminator to obtain a target result loss, an enhancement loss and a discriminator loss, respectively;
a third module, configured to calculate an overall loss from the target result loss, the enhancement loss and the discriminator loss through a preset first algorithm;
a fourth module, configured to perform a preliminary optimization of the natural language model through the overall loss.
This application also provides an electronic device, including a processor and a memory; when the computer program in the memory is executed by the processor, the above-mentioned optimization method for a natural language model is implemented.
This application also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the above-mentioned optimization method for a natural language model is implemented.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the steps of an optimization method for a natural language model provided by an embodiment of the present application;
Figure 2 is a schematic diagram of the steps of another optimization method for a natural language model provided by an embodiment of the present application;
Figure 3 is a schematic structural diagram of a sentiment analysis model used in an optimization method for a natural language model provided by an embodiment of the present application;
Figure 4 is a schematic diagram of the logical steps of an optimization method for a natural language model provided by an embodiment of the present application;
Figure 5 is a schematic diagram of the steps of calculating the sentiment polarity loss in an optimization method for a natural language model provided by an embodiment of the present application;
Figure 6 is a schematic diagram of the steps of calculating the supplementary learning loss in an optimization method for a natural language model provided by an embodiment of the present application;
Figure 7 is a schematic diagram of the steps of another optimization method for a natural language model provided by an embodiment of the present application;
Figure 8 is a schematic structural diagram of a relationship extraction model used in an optimization method for a natural language model provided by an embodiment of the present application;
Figure 9 is a schematic diagram of the logical steps of another optimization method for a natural language model provided by an embodiment of the present application;
Figure 10 is a schematic diagram of the steps of calculating the relationship type loss in an optimization method for a natural language model provided by an embodiment of the present application;
Figure 11 is a schematic diagram of the steps of calculating the entity recognition label loss in an optimization method for a natural language model provided by an embodiment of the present application;
Figure 12 is a schematic diagram of the secondary optimization in an optimization method for a natural language model provided by an embodiment of the present application;
Figure 13 is a schematic structural diagram of an optimization device for a natural language model provided by an embodiment of the present application;
Figure 14 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The present application is described below with reference to the accompanying drawings and implementation examples. The specific embodiments described herein merely serve to explain the present application.
As shown in Figure 1, an embodiment of the present application provides an optimization method for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator, and the main model includes a first encoder and a second encoder. The method includes: obtaining an input sentence through the first encoder, encoding the sentence, and outputting a latent vector for each word in the sentence; inputting the latent vectors into the second encoder, the enhancement module and the discriminator to obtain a target result loss, an enhancement loss and a discriminator loss, respectively; calculating an overall loss from the target result loss, the enhancement loss and the discriminator loss through a preset first algorithm; and performing a preliminary optimization of the natural language model through the overall loss.
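The four steps above can be pictured as a single training step. The following is a minimal runnable sketch using stand-in components (a GRU encoder and linear heads, none of which are specified by the application); the point is only the flow of the three losses into one overall loss and one back-propagation step, combined here with the weighted form given later in the text:

```python
import torch
from torch import nn

class ToyModel(nn.Module):
    """Minimal stand-ins for the four components; the GRU/linear choices
    are illustrative assumptions, not the architecture of the application."""
    def __init__(self, dim=16, n_classes=3):
        super().__init__()
        self.first_encoder = nn.GRU(dim, dim, batch_first=True)
        self.second_head = nn.Linear(dim, n_classes)   # second encoder / task head
        self.enhance_head = nn.Linear(dim, dim)        # enhancement module
        self.disc_head = nn.Linear(dim, 2)             # discriminator

def preliminary_step(model, x, y, lam, opt):
    h, _ = model.first_encoder(x)                      # one latent vector per word
    sent = h.max(dim=1).values                         # pooled sentence vector
    target_loss = nn.functional.cross_entropy(model.second_head(sent), y)
    enhance_loss = nn.functional.mse_loss(model.enhance_head(h), x)
    logits = model.disc_head(h).flatten(0, 1)
    disc_target = torch.ones(logits.shape[0], dtype=torch.long)  # toy targets
    disc_loss = nn.functional.cross_entropy(logits, disc_target)
    overall = target_loss + disc_loss * (lam * enhance_loss)  # "first algorithm"
    opt.zero_grad(); overall.backward(); opt.step()    # preliminary optimization
    return overall.item()

model = ToyModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(4, 8, 16)                              # 4 sentences of 8 words
y = torch.randint(0, 3, (4,))
preliminary_step(model, x, y, lam=0.5, opt=opt)
```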
The enhancement module in this embodiment is used to enhance the main model's understanding and modeling of the input objects, thereby enhancing model performance.
In one embodiment, the natural language model is a sentiment analysis model, the enhancement module is a sentence restoration module, the first encoder is a task-independent encoder, the second encoder is a sentiment analysis encoder, the main model further includes a supplementary learning encoder, the target result loss is a sentiment polarity loss, and the enhancement loss is a supplementary learning loss.
Inputting the latent vectors into the second encoder, the enhancement module and the discriminator to obtain the target result loss, the enhancement loss and the discriminator loss respectively includes: inputting the latent vectors into the supplementary learning encoder to obtain supplementary learning latent vectors, and calculating the vector representation of the aspect term from the supplementary learning latent vectors through a vector algorithm; inputting the latent vectors and the vector representation of the aspect term into the sentiment analysis encoder to obtain the sentiment polarity loss; inputting the supplementary learning latent vectors into the sentence restoration module to obtain an intermediate result and the supplementary learning loss; and inputting the latent vectors and the intermediate result into the discriminator to obtain the discriminator loss.
Referring to Figures 2 to 4, the sentiment analysis model includes a main model 100 (the sentiment classifier), a sentence restoration module 101 (re-constructed sentence) and a discriminator 102. The main model 100 includes a task-free encoder 1000, a sentiment analysis encoder 1001 (ABSA encoder) and a supplementary learning encoder 1002 (CL encoder). The method includes the following steps: the input sentence X is obtained through the task-free encoder 1000 and encoded, yielding a latent vector $h_t$ for each word $x_t$ in the sentence; the latent vectors $h_t$ are input into the supplementary learning encoder 1002 to obtain the supplementary learning latent vectors $h_t^{CL}$, and the supplementary learning latent vectors are passed through a vector algorithm to obtain the vector representation $v^A$ of the aspect term (A denotes the aspect term); the latent vectors $h_t$ and the aspect-term representation $v^A$ are input into the sentiment analysis encoder 1001 to obtain the sentiment polarity loss $L_{SA}$; the supplementary learning latent vectors $h_t^{CL}$ are input into the sentence restoration module 101 to obtain an intermediate result and the supplementary learning loss $L_{CL}$; the latent vectors $h_t$ and the intermediate result are input into the discriminator 102 to obtain the discriminator loss $L_D$; the sentiment polarity loss $L_{SA}$, the supplementary learning loss $L_{CL}$ and the discriminator loss $L_D$ are combined by a preset first algorithm into the overall loss $L$; and the sentiment analysis model is preliminarily optimized through the overall loss $L$. By introducing the sentence restoration module 101 into the preliminary optimization, the task-free encoder 1000 and the supplementary learning encoder 1002 learn the aspect terms in the text, which strengthens the main model 100's understanding and modeling of aspect terms and thereby improves the performance of the sentiment analysis model on the sentiment analysis task.
Referring to Figures 3 to 5, in some embodiments obtaining the sentiment polarity loss $L_{SA}$ includes the following steps: the latent vectors $h_t$ and the aspect-term representation $v^A$ are input into the sentiment analysis encoder 1001 to obtain the sentiment analysis vector representation $h_t^{SA}$ of each word and the sentiment analysis vector representation $h_A^{SA}$ of the aspect term; the latent vectors $h_t$, the word representations $h_t^{SA}$ and the aspect-term representation $h_A^{SA}$ are then processed by a preset second algorithm to obtain the sentiment polarity loss $L_{SA}$. The loss $L_{SA}$ characterizes the model's ability to predict sentiment polarity, so that when the sentiment analysis model is preliminarily optimized through the overall loss $L$, its sentiment polarity prediction ability is optimized as well, which ensures the reliability of the optimization method.
In some embodiments, the preset second algorithm includes the following steps: the latent vectors $h_t$ are processed by a preset vector algorithm to obtain a task-independent sentence representation $s$; the word representations $h_t^{SA}$ and the aspect-term representation $h_A^{SA}$ are processed by the preset vector algorithm to obtain a sentiment analysis sentence representation $s^{SA}$; the task-independent sentence representation $s$ and the sentiment analysis sentence representation $s^{SA}$ are concatenated into an intermediate vector; the intermediate vector is input into a preset first fully connected layer 1003, whose output is input into a first SoftMax classifier 1004 to obtain the predicted sentiment polarity $\hat{y}$; and the predicted polarity $\hat{y}$ is compared with a preset standard to compute the sentiment polarity loss $L_{SA}$. The two sentence representations serve as intermediate variables that reflect the input sentence X at different positions within the sentiment analysis model; after concatenation, classification by the first fully connected layer 1003 and normalization by the first SoftMax classifier 1004, the main model 100's predicted sentiment polarity $\hat{y}$ (a scalar) is obtained. Expressing the prediction as a scalar facilitates the subsequent calculation of the sentiment polarity loss $L_{SA}$.
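The head described above (pooling, concatenation, a fully connected layer, then SoftMax) can be sketched as follows. The dimensions, tensor values, and the use of cross-entropy against the preset standard are assumptions for illustration only:

```python
import torch
from torch import nn

dim, n_polarities = 16, 3
h = torch.randn(8, dim)        # latent vectors of 8 words (task-free encoder)
h_sa = torch.randn(9, dim)     # 8 word SA representations + 1 aspect representation

s = h.max(dim=0).values                  # task-independent sentence representation
s_sa = h_sa.max(dim=0).values            # sentiment analysis sentence representation
intermediate = torch.cat([s, s_sa])      # concatenation -> intermediate vector

fc1 = nn.Linear(2 * dim, n_polarities)   # preset first fully connected layer 1003
probs = torch.softmax(fc1(intermediate), dim=-1)   # first SoftMax classifier 1004
predicted_polarity = probs.argmax().item()         # predicted polarity (a scalar)

gold = torch.tensor(1)                   # preset standard (gold polarity label)
l_sa = nn.functional.cross_entropy(fc1(intermediate).unsqueeze(0),
                                   gold.unsqueeze(0))   # sentiment polarity loss
```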
In some embodiments, the preset vector algorithm is the MaxPooling algorithm, which keeps the element-wise maximum over a set of vectors. It is applied to the three representations above as follows:

$$v^A = \mathrm{MaxPooling}\big(\{h_t^{CL} \mid x_t \in A\}\big)$$

$$s = \mathrm{MaxPooling}\big(h_1, \ldots, h_n\big)$$

$$s^{SA} = \mathrm{MaxPooling}\big(h_1^{SA}, \ldots, h_n^{SA}, h_A^{SA}\big)$$
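A small worked example of the element-wise MaxPooling used above (the vector values are made up):

```python
import torch

# Element-wise MaxPooling over a set of word vectors: each output dimension
# keeps the maximum of that dimension across all input vectors.
vectors = torch.tensor([[0.2, -1.0,  3.0],
                        [1.5,  0.0, -2.0],
                        [0.7,  2.0,  0.5]])
pooled = vectors.max(dim=0).values
print(pooled)   # tensor([1.5000, 2.0000, 3.0000])
```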
Referring to Figures 3 and 6, in some embodiments the sentence restoration module 101 includes a supplementary learning decoder 1010 (the specific decoder), and obtaining the supplementary learning loss includes the following steps: the supplementary learning latent vectors $h_t^{CL}$ are input into the supplementary learning decoder 1010, which reconstructs the input sentence X and outputs a predicted word $\hat{x}_t$ for each position; each predicted word $\hat{x}_t$ is compared with the corresponding word $x_t$ of the input sentence X to obtain the intermediate result, and the supplementary learning loss $L_{CL}$ is computed according to a preset third algorithm. The supplementary learning decoder 1010 decodes and reconstructs the supplementary learning latent vectors produced by the supplementary learning encoder 1002, yielding the predicted word $\hat{x}_t$ as a scalar; expressing the predicted word as a scalar makes the subsequent calculation of the supplementary learning loss $L_{CL}$ easier.
In some embodiments, the preset third algorithm includes the following steps: the negative log-likelihood loss of the word at each position is computed, and the negative log-likelihood losses of the words at all positions are summed to obtain the supplementary learning loss. Summing the loss of every word individually reflects more accurately how far the model deviates from the ground truth on the word prediction task, which benefits the subsequent optimization of the sentiment analysis model's word prediction ability. Illustratively, the supplementary learning loss is computed as

$$L_{CL} = -\sum_{t=1}^{n} \log p(\hat{x}_t = x_t)$$

where n is the number of words in the input sentence X.
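A minimal numeric sketch of this third algorithm, assuming a made-up decoder distribution over a three-word vocabulary:

```python
import torch

# Decoder output: one probability distribution over the vocabulary per
# position (values are illustrative, not from the application).
probs = torch.tensor([[0.7, 0.2, 0.1],    # position 1
                      [0.1, 0.8, 0.1],    # position 2
                      [0.3, 0.3, 0.4]])   # position 3
x = torch.tensor([0, 1, 2])               # original word ids of the sentence

# L_CL: negative log-likelihood of the original word, summed over positions.
l_cl = -torch.log(probs[torch.arange(len(x)), x]).sum()
print(l_cl)   # -(log 0.7 + log 0.8 + log 0.4) ≈ 1.4961
```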
Continuing with Figure 3, in some embodiments obtaining the discriminator loss includes the following steps: the target output $\hat{y}_t$ of the discriminator 102 is obtained from the latent vectors $h_t$ and the intermediate result; the target output is then processed by a preset fourth algorithm to obtain the discriminator loss $L_D$. The adversarially trained discriminator 102 effectively controls how strongly the task-free encoder 1000 and the supplementary learning encoder 1002 learn the supplementary task, namely sentence reconstruction, preventing them from overfitting the supplementary task and thereby guaranteeing how well the main model 100 fits the primary task, namely sentiment analysis.
In some embodiments, the target output $\hat{y}_t$ takes the value 0 or 1 and is determined by comparing the predicted word $\hat{x}_t$ with the original word $x_t$:

$$\hat{y}_t = \begin{cases} 1, & \hat{x}_t = x_t \\ 0, & \hat{x}_t \neq x_t \end{cases}$$

The preset fourth algorithm includes the following steps: the supplementary learning latent vectors $h_t^{CL}$ are input into a preset second fully connected layer 1020, whose output is input into a preset second SoftMax classifier 1021; for each word this yields a 2-dimensional vector whose two dimensions correspond to the distribution probabilities of the target output over 0 and 1 (illustratively, the predicted distribution probability of 0 is denoted P(0|X)).
The discriminator loss is obtained from the distribution probabilities. Computing the discriminator loss from the distribution probabilities of the target output over 0 and 1 makes it possible to judge intuitively, from the value of the loss, whether the supplementary learning task is overfitting; the procedure is simple, occupies few computing resources and is highly efficient. In some embodiments, the log-likelihood loss is computed for each word and the losses of all words are summed to obtain the discriminator loss $L_D$:

$$L_D = -\sum_{t=1}^{n} \log P(\hat{y}_t \mid X)$$
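A minimal sketch of the fourth algorithm and this loss, assuming hypothetical layer shapes and an illustrative target-output vector (1 where the reconstruction matched, 0 otherwise):

```python
import torch
from torch import nn

dim, n_words = 16, 8
h_cl = torch.randn(n_words, dim)        # supplementary learning latent vectors

fc2 = nn.Linear(dim, 2)                 # preset second fully connected layer 1020
probs = torch.softmax(fc2(h_cl), dim=-1)   # second SoftMax 1021: P(0|X), P(1|X)

# Target output per word (illustrative values).
y_hat = torch.tensor([1, 1, 0, 1, 0, 1, 1, 1])

# L_D: sum over all words of the negative log-probability of the target output.
l_d = -torch.log(probs[torch.arange(n_words), y_hat]).sum()
```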
In some embodiments, when the overall loss L is calculated, an adjustable control parameter λ is introduced; λ controls how much the sentence restoration module 101 and the adversarially trained discriminator 102 contribute to model training. The control parameter λ provides an adjustable option for the calculation of the overall loss L: when optimizing the sentiment analysis model, adjusting λ controls the emphasis of the optimization. The availability of λ therefore improves the practicality of the optimization method.
The overall loss L is computed by the following formula:

$$L = L_{SA} + L_D \times (\lambda \cdot L_{CL})$$

After the overall loss L is computed, all parameters in the sentiment analysis model are updated through the back-propagation algorithm.
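As a quick numeric illustration with made-up loss values $L_{SA} = 0.8$, $L_{CL} = 0.4$, $L_D = 0.6$ and $\lambda = 0.5$:

$$L = 0.8 + 0.6 \times (0.5 \times 0.4) = 0.8 + 0.12 = 0.92$$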
In some embodiments, the optimization method further includes the following step: after the sentiment analysis model has been optimized through the overall loss L, the parameters of the sentiment analysis model are updated a second time by adjusting the parameters of some of its modules. Through this secondary optimization update, the sentiment analysis model achieves better performance. At the same time, the model after secondary optimization is used in the same way as a baseline sentiment analysis model: it only needs the sentence and the entity as input and does not depend on any extra input, so compared with the baseline model it strengthens performance without incurring additional usage cost.
In some embodiments, the secondary optimization includes the following steps: the task-free encoder 1000, the supplementary learning encoder 1002, the sentiment analysis encoder 1001, the preset vector algorithm, the first and second fully connected layers (1003, 1020) and the SoftMax classifiers are initialized with the parameters obtained by optimizing the sentiment analysis model through the overall loss L; the optimized sentiment polarity loss $L'_{SA}$ is computed in the same way as in the preliminary optimization; and, based on $L'_{SA}$, the model parameters in the sentiment analysis encoder 1001, the first fully connected layer 1003 and the second fully connected layer 1020 are optimized and updated a second time through the back-propagation algorithm, yielding the final optimized sentiment analysis model. Starting from the preliminarily optimized model, only some modules inside the model are optimized in the second stage, because after the preliminary optimization the other modules are already close to their best optimization threshold, and further optimizing them would have little effect; continuing to optimize the entire sentiment analysis model would therefore waste computing resources. Updating only the sentiment analysis encoder 1001 and the two fully connected layers saves computing resources while still achieving a good optimization effect.
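The second stage amounts to freezing every module except the sentiment analysis encoder and the two fully connected layers. A minimal sketch, with hypothetical attribute names for the modules (absa_encoder, fc1, fc2, and so on are illustrative):

```python
import torch
from torch import nn

class SentimentModel(nn.Module):
    """Hypothetical container; in practice, load the preliminarily
    optimized weights instead of random initialization."""
    def __init__(self, dim=16, n_polarities=3):
        super().__init__()
        self.task_free_encoder = nn.GRU(dim, dim, batch_first=True)
        self.cl_encoder = nn.GRU(dim, dim, batch_first=True)
        self.absa_encoder = nn.GRU(dim, dim, batch_first=True)
        self.fc1 = nn.Linear(2 * dim, n_polarities)
        self.fc2 = nn.Linear(dim, 2)

model = SentimentModel()

# Second stage: freeze everything, then update only the sentiment analysis
# encoder and the two fully connected layers.
for p in model.parameters():
    p.requires_grad_(False)
tuned = []
for sub in (model.absa_encoder, model.fc1, model.fc2):
    for p in sub.parameters():
        p.requires_grad_(True)
        tuned.append(p)
optimizer = torch.optim.Adam(tuned, lr=1e-4)
```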
The performance of the sentiment analysis model on the sentiment analysis task is generally expressed by the F score. On the English extraction dataset, the F score of the sentiment analysis model was 77.04 before optimization with this method and 77.70 after optimization. It can be seen that introducing the sentence restoration module 101 strengthens the model's ability to model aspect terms and thus improves the performance of the sentiment analysis model on the sentiment analysis task.
Compared with the related art, the optimization method for a natural language model provided by this application can achieve the following:
1. In the optimization method provided by the embodiments of this application, the natural language model may be a sentiment analysis model comprising a main model, a sentence restoration module and a discriminator, the main model comprising a task-independent encoder, a sentiment analysis encoder and a supplementary learning encoder. The method obtains the input sentence through the task-independent encoder and encodes it into a latent vector for each word; inputs the latent vectors into the supplementary learning encoder to obtain supplementary learning latent vectors, from which the vector representation of the aspect term is computed through a vector algorithm; inputs the latent vectors and the aspect-term representation into the sentiment analysis encoder to obtain the sentiment polarity loss; inputs the supplementary learning latent vectors into the sentence restoration module to obtain an intermediate result and the supplementary learning loss; inputs the latent vectors and the intermediate result into the discriminator to obtain the discriminator loss; combines the three losses into an overall loss through a preset first algorithm; and preliminarily optimizes the sentiment analysis model through the overall loss. Introducing the sentence restoration module into the preliminary optimization lets the task-independent encoder and the supplementary learning encoder learn the aspect terms in the text, strengthening the main model's understanding and modeling of aspect terms and thus improving performance on the sentiment analysis task.
2. Obtaining the sentiment polarity loss includes the following steps: inputting the latent vectors and the aspect-term representation into the sentiment analysis encoder to obtain the sentiment analysis vector representations of the words and of the aspect term, and processing the latent vectors and these representations through a preset second algorithm to obtain the sentiment polarity loss. This loss characterizes the model's ability to predict sentiment polarity, so that the preliminary optimization through the overall loss also optimizes that ability, ensuring the reliability of the optimization method.
3. The preset second algorithm includes the following steps: computing a task-independent sentence representation from the latent vectors through a preset vector algorithm; computing a sentiment analysis sentence representation from the word and aspect-term sentiment analysis representations; concatenating the two sentence representations into an intermediate vector; feeding the intermediate vector through a preset first fully connected layer and a SoftMax classifier to obtain the predicted sentiment polarity; and comparing the prediction with a preset standard to compute the sentiment polarity loss. The two sentence representations are intermediate variables that reflect the input sentence at different positions of the model, and the prediction is obtained as a scalar, which facilitates the subsequent loss calculation.
4. The sentence restoration module includes a supplementary learning decoder, and obtaining the supplementary learning loss includes inputting the supplementary learning latent vectors into the decoder to reconstruct the input sentence and obtain the predicted words, comparing the predicted words with the words of the input sentence to obtain the intermediate result, and computing the supplementary learning loss through a preset third algorithm. The predicted words are obtained as scalars, which makes the later loss calculation easier.
5. The preset third algorithm computes the negative log-likelihood loss of the word at each position and sums these losses over all positions to obtain the supplementary learning loss. Summing the losses word by word reflects more accurately how far the model deviates from the ground truth on word prediction, benefiting the subsequent optimization of the model's word prediction ability.
6. Obtaining the discriminator loss further includes obtaining the target output of the discriminator from the latent vectors and the intermediate result, and computing the discriminator loss from the target output through a preset fourth algorithm. The adversarially trained discriminator controls how strongly the task-independent encoder and the supplementary learning encoder learn the supplementary task of sentence reconstruction, preventing overfitting of that task and guaranteeing how well the main model fits the primary sentiment analysis task.
7. The target output takes the value 0 or 1, and the preset fourth algorithm feeds the supplementary learning latent vectors through a preset second fully connected layer and a second SoftMax classifier, yielding for each word a 2-dimensional vector whose dimensions correspond to the distribution probabilities of the target output over 0 and 1; the discriminator loss is obtained from these probabilities. The value of the discriminator loss shows intuitively whether the supplementary learning task is overfitting; the procedure is simple, uses few computing resources and is highly efficient.
8. When the overall loss is calculated, an adjustable control parameter is introduced to control how much the sentence restoration module and the adversarially trained discriminator contribute to model training. Adjusting the control parameter controls the emphasis of the optimization of the sentiment analysis model, which improves the practicality of the optimization method.
9. The method further includes the following step: after the sentiment analysis model has been optimized through the overall loss, the parameters of some modules are adjusted for a secondary optimization update. The model after secondary optimization achieves better performance and is used in the same way as the baseline model, needing only the sentence and the entity as input without extra inputs, so performance is strengthened without additional usage cost.
10. The secondary optimization initializes the task-independent encoder, the supplementary learning encoder, the sentiment analysis encoder, the preset vector algorithm, the first and second fully connected layers and the SoftMax classifier with the parameters obtained in the preliminary optimization; computes the optimized sentiment polarity loss in the same way as in the preliminary optimization; and, based on that loss, updates only the model parameters of the sentiment analysis encoder and the two fully connected layers through back-propagation to obtain the final optimized sentiment analysis model. Because the other modules are already close to their best optimization threshold after the preliminary optimization, continuing to optimize the entire model would waste computing resources; updating only these modules saves computing resources while still achieving a good optimization effect.
In one embodiment, the natural language model is a relationship extraction model, the enhancement module is an entity recognition learning module, the first encoder is a shared encoder, the second encoder is a relationship extraction encoder, the target result loss is a relationship type loss, and the enhancement loss is an entity label loss.
Inputting the latent vectors into the second encoder, the enhancement module and the discriminator to obtain the target result loss, the enhancement loss and the discriminator loss respectively includes: inputting the latent vectors into the relationship extraction encoder, the entity recognition learning module and the discriminator to obtain the relationship type loss, the entity label loss and the discriminator loss, respectively.
Referring to Figures 7 to 9, an optimization method for a natural language model includes the following steps: the shared encoder 2000 (Shared Encoder) in the preset main model 200 (relation extraction, RE) obtains the input sentence, encodes the sentence, and outputs the hidden vector h_i of each word x_i in the sentence. The relation extraction model includes the main model 200, an entity recognition learning module 201 (NER) and a discriminator 202 (Discriminator); the main model 200 further includes a relation extraction encoder 2001 (RE encoder). The hidden vectors h_i are input into the relation extraction encoder 2001, the entity recognition learning module 201 and the discriminator 202 to obtain the relation type loss L_RE, the entity label loss L_NER and the discriminator loss L_D, respectively; the relation type loss L_RE, the entity label loss L_NER and the discriminator loss L_D are combined through a preset first algorithm to obtain the overall loss L; and the relation extraction model is preliminarily optimized through the overall loss L. By introducing the entity recognition learning module 201 into the preliminary optimization, the shared encoder 2000 in the main model 200 learns the entities in the text, which strengthens the entity modeling ability of the main model 200 and thereby improves the performance of the relation extraction model on the relation extraction task.
Referring to Figures 8 and 10, in some embodiments, the main model 200 obtains the relation type loss through the following steps: the hidden vectors h_i are input into the relation extraction encoder 2001 in the main model 200, and the relation extraction encoder 2001 encodes the hidden vectors h_i to obtain the relation extraction hidden vector h_i^RE of each word; the hidden vector h_i and the relation extraction hidden vector h_i^RE of each word are processed through a preset second algorithm to obtain the predicted relation type ŷ^RE; and the predicted relation type ŷ^RE is compared against a preset standard to obtain the relation type loss L_RE. Processing the hidden vectors h_i and the encoded relation extraction hidden vectors h_i^RE through the second algorithm yields the relation type currently predicted by the main model 200; comparing the predicted relation type ŷ^RE with the preset standard yields the relation type loss L_RE, which characterizes the ability of the relation extraction model to predict relation types. During the preliminary optimization of the relation extraction model through the overall loss L, the relation type loss L_RE optimizes the relation prediction ability of the model, which ensures the reliability of the optimization method.
Referring to Figure 8, in some embodiments, the preset second algorithm includes the following steps: the relation extraction hidden vectors h_i^RE are processed by a preset vector algorithm to obtain the vector representation e_E1 of the first entity, the vector representation e_E2 of the second entity, and the sentence representation s^RE of the relation extraction encoder 2001; the same preset vector algorithm is applied to the hidden vectors h_i to obtain the sentence representation s^S of the shared encoder 2000; the vector representation e_E1 of the first entity, the vector representation e_E2 of the second entity, the sentence representation s^RE of the relation extraction encoder 2001 and the sentence representation s^S of the shared encoder 2000 are concatenated to obtain an intermediate vector o; and the intermediate vector o passes through the preset first fully connected layer 2002 and is then fed into the first SoftMax classifier 2003 to obtain the predicted relation type ŷ^RE. The representations obtained by applying the preset vector algorithm to the hidden vectors and the relation extraction hidden vectors, namely e_E1 and e_E2 (E_1 and E_2 denote the two entities), the sentence representation s^RE of the relation extraction encoder 2001 and the sentence representation s^S of the shared encoder 2000, serve as intermediate variables that reflect how entities and the sentence are represented at different positions in the relation extraction model. The intermediate vector o obtained by concatenating these representations, after the classification and normalization of the first fully connected layer 2002 and the first SoftMax classifier 2003, yields the relation type ŷ^RE (a scalar) predicted by the main model 200 for the two entities; expressing the predicted relation type as a scalar facilitates the subsequent computation of the relation type loss L_RE.
In some embodiments, the preset vector algorithm is the MaxPooling algorithm, applied as follows (where n is the number of words in the sentence):

e_E1 = MaxPooling({h_i^RE : x_i belongs to entity E_1})

e_E2 = MaxPooling({h_i^RE : x_i belongs to entity E_2})

s^RE = MaxPooling({h_1^RE, …, h_n^RE})

s^S = MaxPooling({h_1, …, h_n})
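To make the MaxPooling-based vector algorithm and the subsequent classification concrete, the following minimal PyTorch sketch pools per-word hidden vectors into the representations e_E1, e_E2, s^RE and s^S, concatenates them into the intermediate vector o, and applies a fully connected layer followed by SoftMax. The hidden size, the number of relation types and the entity index arguments are assumptions made for the example, not values from the embodiment.

```python
import torch
import torch.nn as nn

hidden = 768          # assumed hidden size of both encoders
num_relations = 10    # assumed number of relation types

fc = nn.Linear(4 * hidden, num_relations)   # first fully connected layer 2002

def max_pool(vectors):
    # MaxPooling over the word axis: dimension-wise maximum.
    return torch.max(vectors, dim=0).values

def predict_relation(h, h_re, e1_idx, e2_idx):
    # h:    (n, hidden) hidden vectors from the shared encoder
    # h_re: (n, hidden) relation extraction hidden vectors
    # e1_idx / e2_idx: word indices of the two entities (assumed given)
    e1 = max_pool(h_re[e1_idx])        # vector representation of entity E_1
    e2 = max_pool(h_re[e2_idx])        # vector representation of entity E_2
    s_re = max_pool(h_re)              # sentence representation (RE encoder)
    s_sh = max_pool(h)                 # sentence representation (shared encoder)
    o = torch.cat([e1, e2, s_re, s_sh], dim=-1)   # intermediate vector o
    probs = torch.softmax(fc(o), dim=-1)          # first SoftMax classifier 2003
    return torch.argmax(probs)         # predicted relation type (a scalar)
```

Pooling over only the entity's word positions is what lets a single fixed-size vector stand in for a multi-word entity here.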
Referring to Figures 8 and 11, in some embodiments, the entity recognition learning module 201 includes an entity encoder 2010, and the entity recognition learning module 201 obtains the entity label loss through the following steps: the hidden vectors h_i are input into the entity encoder 2010 in the entity recognition learning module 201, and the entity encoder 2010 further encodes the hidden vectors h_i to obtain the entity recognition hidden vector h_i^NER of each word; the hidden vector h_i and the entity recognition hidden vector h_i^NER of each word undergo a conversion process to obtain the predicted entity label ŷ_i^NER; and the predicted entity labels ŷ_i^NER of all words are compared against a preset standard to obtain the entity label loss L_NER. Encoding the hidden vectors h_i with the entity encoder 2010 yields the entity recognition hidden vector h_i^NER of each word, and the conversion process turns the hidden vector h_i and the entity recognition hidden vector h_i^NER of each word into a predicted entity label ŷ_i^NER (a scalar) indicating the approximate position of the word within an entity. A scalar here refers to a quantity without direction, whose content can be a number or a character. Expressing the predicted entity label ŷ_i^NER as a scalar makes the subsequent computation of the entity recognition loss easier.
Referring to Figure 8, in some embodiments, the conversion process includes the following steps: the hidden vector h_i of each word is concatenated with the entity recognition hidden vector h_i^NER, and the concatenated vector is fed into the second fully connected layer 2011 and the second SoftMax classifier 2012 to obtain the predicted entity label ŷ_i^NER. Understandably, classifying the concatenated vector through the second fully connected layer 2011 and the second SoftMax classifier 2012 and converting it into the predicted entity label ŷ_i^NER works in the same way as converting the intermediate vector o into the predicted relation type ŷ^RE in the main model 200; unifying the two procedures makes it easier to build, optimize and maintain the relation extraction model.
In some embodiments, the preset standard is the actual entity label y_i^NER: the predicted entity labels ŷ_i^NER of all words are compared with the actual entity labels y_i^NER, and the comparison is evaluated through a cross-entropy loss function to obtain the entity label loss L_NER. Computing the predicted entity labels against the actual entity labels with the cross-entropy loss function yields the gap between the predicted values and the actual values, i.e., the entity label loss L_NER. Optimizing the relation extraction model through the entity label loss L_NER strengthens the model's understanding of the meaning of entities and improves its ability to predict entity labels.
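A minimal sketch of the per-word conversion and the cross-entropy comparison described above; the hidden size, the three-way label set and all layer names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden = 768      # assumed hidden size
num_labels = 3    # e.g. the O / B / I labels introduced below

fc_ner = nn.Linear(2 * hidden, num_labels)   # second fully connected layer 2011

def entity_label_loss(h, h_ner, gold_labels):
    # h:           (n, hidden) hidden vectors from the shared encoder
    # h_ner:       (n, hidden) entity recognition hidden vectors
    # gold_labels: (n,) actual entity label ids (LongTensor)
    z = torch.cat([h, h_ner], dim=-1)    # per-word concatenation
    logits = fc_ner(z)
    # The SoftMax is folded into F.cross_entropy, which applies
    # log-softmax internally; the result is L_NER.
    return F.cross_entropy(logits, gold_labels)
```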
In some embodiments, the actual entity label of each word is determined by the positions of the entities input into the shared encoder 2000: if a word does not belong to an entity, its entity label is the first label; if a word is the beginning of an entity, its entity label is the second label; and if a word is in the middle or at the end of an entity, its entity label is the third label. The first label is O, the second label is B, and the third label is I. Referring to Figure 8, for example, if the input sentence is "An air force pilot is back" and the two input entities are "air force" and "pilot", then the actual entity labels y_i^NER of the words in the input sentence are O B I B O O. By labeling the words of the input sentence in this way and comparing the predicted entity labels ŷ_i^NER with the actual entity labels y_i^NER, the entity recognition learning module 201 can be trained to find entities in text, which in turn strengthens the relation extraction model's ability to understand entities.
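The labeling scheme of the preceding example can be sketched as follows; whitespace tokenization and word-index entity spans are simplifying assumptions.

```python
def bio_labels(words, entities):
    # entities: list of (start, end) word-index spans, end exclusive
    labels = ["O"] * len(words)          # first label: not in any entity
    for start, end in entities:
        labels[start] = "B"              # second label: beginning of an entity
        for i in range(start + 1, end):
            labels[i] = "I"              # third label: middle or end of an entity
    return labels

# "An air force pilot is back" with entities "air force" and "pilot":
print(bio_labels("An air force pilot is back".split(), [(1, 3), (3, 4)]))
# -> ['O', 'B', 'I', 'B', 'O', 'O']
```

The printed sequence matches the labels O B I B O O given in the example above.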
Continuing to refer to Figure 8, in some embodiments, the discriminator 202 obtains the discriminator loss through the following steps: the discriminator 202 takes the result of comparing the predicted entity label ŷ_i^NER with the preset standard and derives a target output d_i, whose value is 0 or 1, as follows:
d_i = 1 if the predicted entity label ŷ_i^NER matches the actual entity label y_i^NER, and d_i = 0 otherwise.
The hidden vector h_i is fed into the preset third fully connected layer 2020 and the third SoftMax classifier 2021, and a 2-dimensional vector is obtained for each word; each dimension of the 2-dimensional vector corresponds to the probability of the target output being 0 or 1 (for example, the predicted probability of the target output being 0 is written P(0|x_i)). The discriminator loss is then computed from these distribution probabilities.
In some embodiments, the negative log-likelihood loss is computed for each word, and the losses of all words are summed to obtain the discriminator loss L_D, calculated with the following formula:
L_D = -Σ_i log P(d_i | x_i)
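Under the reconstruction above, the discriminator loss can be sketched as a per-word negative log-likelihood over the 2-dimensional output; the layer shape and the integer targets d_i are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden = 768                   # assumed hidden size
fc_d = nn.Linear(hidden, 2)    # third fully connected layer 2020 (2-dim output)

def discriminator_loss(h, targets):
    # h:       (n, hidden) hidden vectors fed to the discriminator
    # targets: (n,) target outputs d_i, each 0 or 1 (LongTensor)
    log_probs = F.log_softmax(fc_d(h), dim=-1)   # log P(0|x_i), log P(1|x_i)
    # Negative log-likelihood of each word's target, summed over all words.
    return F.nll_loss(log_probs, targets, reduction="sum")
```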
The discriminator 202 effectively controls the degree to which the shared encoder 2000 learns the entity recognition task and prevents the shared encoder 2000 from overfitting to it. Without the discriminator loss L_D provided by the discriminator 202, the overall loss would be biased towards optimizing the entity recognition loss during the preliminary optimization, which would harm the performance of the relation extraction model on its main task, relation extraction. The discriminator loss L_D therefore ensures, during the preliminary optimization through the overall loss, that the main performance of the relation extraction model is not erroneously overridden.

In some embodiments, an adjustable control parameter λ is introduced when computing the overall loss; the control parameter λ is used to control how much the entity recognition learning module 201 and the adversarially trained discriminator 202 contribute to model training. The control parameter λ provides an adjustable option for the computation of the overall loss L: when optimizing the relation extraction model, adjusting λ controls the emphasis of the optimization. The availability of this control parameter thus improves the practicality of the optimization method.
The overall loss L is computed by the following formula:

L = L_RE + L_D × (λ·L_NER);

and after the overall loss L is computed, all parameters in the relation extraction model are updated through the back-propagation algorithm.
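Taking the combination formula literally, one training step of the preliminary optimization might look like the following sketch; the three losses are assumed to have been computed as described above, and the optimizer is an assumption.

```python
import torch

def training_step(optimizer, l_re, l_ner, l_d, lam=0.1):
    # l_re, l_ner, l_d: relation type, entity label and discriminator losses;
    # lam is the adjustable control parameter λ (value here is illustrative).
    loss = l_re + l_d * (lam * l_ner)   # overall loss L = L_RE + L_D × (λ·L_NER)
    optimizer.zero_grad()
    loss.backward()    # back-propagation reaches all model parameters
    optimizer.step()
    return loss
```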
Referring to Figures 8 and 12, in some embodiments, the optimization method further includes the following steps: after the relation extraction model has been optimized through the overall loss L, the model parameters of the relation extraction encoder 2001 and of all the fully connected layers are updated in a secondary optimization by adjusting the parameters of only some modules in the relation extraction model. Understandably, through this secondary optimization the relation extraction model achieves better performance. Moreover, the secondarily optimized relation extraction model is used in the same way as the relation extraction baseline model: it requires only the sentence and the entities as input and does not depend on any additional input, so compared with the baseline model it strengthens performance without incurring extra usage cost.

In some embodiments, the secondary optimization includes the following steps: the shared encoder 2000 and the relation extraction encoder 2001 in the main model 200, the first, second and third fully connected layers 2020 and the SoftMax classifiers are initialized with the parameters of the relation extraction model obtained from the optimization through the overall loss L;

the optimized relation type loss L_RE is computed in the same way as in the preliminary optimization; and

based on the optimized relation type loss L_RE, the model parameters of the relation extraction encoder 2001 and of the first, second and third fully connected layers are updated a second time through the back-propagation algorithm to obtain the final optimized relation extraction model.
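Analogously to the sentiment analysis case, this secondary pass can be sketched as re-initializing from the first-stage parameters and unfreezing only the relation extraction encoder and the fully connected layers; the module names re_encoder, fc1, fc2 and fc3 are illustrative assumptions.

```python
import torch

def second_stage(model, first_stage_state):
    # Initialize all modules from the parameters optimized with the overall loss L.
    model.load_state_dict(first_stage_state)
    # Only the RE encoder and the fully connected layers are updated further.
    for p in model.parameters():
        p.requires_grad = False
    for module in (model.re_encoder, model.fc1, model.fc2, model.fc3):
        for p in module.parameters():
            p.requires_grad = True
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-5)
```

The second-stage loop would then back-propagate only the relation type loss L_RE, as stated above.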
The performance of the relation extraction model on the relation extraction task is generally expressed by the F-score. On an English relation extraction dataset, the F-score of the relation extraction model is 77.04 before optimization with this method and 77.70 after optimization with this method. Thus, introducing the entity recognition learning module 201 strengthens the model's ability to model entities and thereby improves the performance of the relation extraction model on the relation extraction task.

In the embodiments provided in this application, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined based on A. It should also be understood, however, that determining B based on A does not mean determining B based on A alone; B can also be determined based on A and/or other information.

Compared with the related art, the optimization method for a natural language model provided in this application can achieve the following:

1. In the optimization method for a natural language model provided in the embodiments of this application, the natural language model can be a relation extraction model that includes a main model, an entity recognition learning module and a discriminator, the main model including a shared encoder and a relation extraction encoder. The method includes the following steps: obtaining the input sentence through the shared encoder in the main model, encoding the sentence, and outputting the hidden vector of each word in the sentence; inputting the hidden vectors into the relation extraction encoder, the entity recognition learning module and the discriminator to obtain the relation type loss, the entity label loss and the discriminator loss, respectively; computing the overall loss from the relation type loss, the entity label loss and the discriminator loss through a preset first algorithm; and preliminarily optimizing the relation extraction model through the overall loss. By introducing the entity recognition module into the preliminary optimization, the shared encoder in the main model learns the entities in the text, which strengthens the main model's entity modeling ability and thereby improves the performance of the relation extraction model on the relation extraction task.

2. In the optimization method provided in the embodiments of this application, the main model obtains the relation type loss through the following steps: inputting the hidden vectors into the relation extraction encoder in the main model, which encodes them to obtain the relation extraction hidden vector of each word; processing the hidden vector and the relation extraction hidden vector of each word through a preset second algorithm to obtain the predicted relation type; and comparing the predicted relation type with a preset standard to obtain the relation type loss. Processing the hidden vectors and the encoded relation extraction hidden vectors through the second algorithm yields the relation type currently predicted by the main model, and comparing it with the preset standard yields the relation type loss, which characterizes the model's ability to predict relation types. During the preliminary optimization through the overall loss, the relation type loss optimizes the relation prediction ability of the relation extraction model, ensuring the reliability of the optimization method.

3. In the optimization method provided in the embodiments of this application, the preset second algorithm includes the following steps: computing the vector representation of the first entity, the vector representation of the second entity and the sentence representation of the relation extraction encoder from the relation extraction hidden vectors through a preset vector algorithm, and applying the same vector algorithm to the hidden vectors to obtain the sentence representation of the shared encoder; concatenating the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder and the sentence representation of the shared encoder to obtain an intermediate vector; and passing the intermediate vector through a fully connected layer into a SoftMax classifier to obtain the predicted relation type. These representations serve as intermediate variables that reflect how entities and the sentence are represented at different positions in the relation extraction model; after concatenation, and after the classification and normalization of the fully connected layer and the SoftMax classifier, the intermediate vector yields the relation type (a scalar) predicted by the main model for the two entities, and expressing the predicted relation type as a scalar facilitates the subsequent computation of the relation type loss.

4. In the optimization method provided in the embodiments of this application, the entity recognition learning module includes an entity encoder, and the entity recognition learning module obtains the entity label loss through the following steps: inputting the hidden vectors into the entity encoder in the entity recognition learning module, which encodes them to obtain the entity recognition hidden vector of each word; converting the hidden vector and the entity recognition hidden vector of each word to obtain the predicted entity label; and comparing the predicted entity labels of all words with a preset standard to obtain the entity label loss. Encoding the hidden vectors with the entity encoder yields the entity recognition hidden vector of each word, and the conversion turns the hidden vector and the entity recognition hidden vector of each word into a predicted entity label (a scalar) indicating the approximate position of the word within an entity. Expressing the predicted entity label as a scalar makes the subsequent computation of the entity recognition loss easier.

5. In the optimization method provided in the embodiments of this application, the conversion process includes the following steps: concatenating the hidden vector of each word with its entity recognition hidden vector, and feeding the concatenated vector into a fully connected layer and a SoftMax classifier to obtain the predicted entity label. Classifying the concatenated vector through the fully connected layer and the SoftMax classifier and converting it into the predicted entity label works in the same way as converting the intermediate vector into the predicted relation type in the main model; unifying the two procedures makes it easier to build, optimize and maintain the relation extraction model.

6. In the optimization method provided in the embodiments of this application, the preset standard is the actual entity label: the predicted entity labels of all words are compared with the actual entity labels, and the comparison is evaluated through a cross-entropy loss function to obtain the entity label loss. Computing the predicted entity labels against the actual entity labels with the cross-entropy loss function yields the gap between the predicted values and the actual values, i.e., the entity label loss; optimizing the relation extraction model through the entity label loss strengthens the model's understanding of the meaning of entities and improves its ability to predict entity labels.

7. In the optimization method provided in the embodiments of this application, the actual entity label of each word is determined by the positions of the entities in the input: if a word does not belong to an entity, its entity label is the first label; if a word is the beginning of an entity, its entity label is the second label; and if a word is in the middle or at the end of an entity, its entity label is the third label. By labeling the words of the sentence in this way, the entity recognition learning module can be trained to find entities in text, which in turn strengthens the relation extraction model's ability to understand entities.

8. In the optimization method provided in the embodiments of this application, the discriminator obtains the discriminator loss through the following steps: the discriminator takes the result of comparing the predicted entity label with the preset standard and derives a target output with a value of 0 or 1; the hidden vector is fed into a fully connected layer and a SoftMax classifier, and a 2-dimensional vector is obtained for each word, each dimension corresponding to the probability of the target output being 0 or 1; and the discriminator loss is computed from these distribution probabilities. The discriminator effectively controls the degree to which the shared encoder learns the entity recognition task and prevents the shared encoder from overfitting to it. Without the discriminator loss provided by the discriminator, the overall loss would be biased towards optimizing the entity recognition loss during the preliminary optimization, which would harm the performance of the relation extraction model on its main task, relation extraction. The discriminator loss therefore ensures, during the preliminary optimization through the overall loss, that the main performance of the relation extraction model is not erroneously overridden.

9. In the optimization method provided in the embodiments of this application, an adjustable control parameter is introduced when computing the overall loss; the control parameter is used to control how much the entity recognition learning module and the adversarially trained discriminator contribute to model training. The control parameter provides an adjustable option for the computation of the overall loss: when optimizing the relation extraction model, adjusting the control parameter controls the emphasis of the optimization. The availability of this control parameter thus improves the practicality of the optimization method.

10. The optimization method for a natural language model provided in the embodiments of this application further includes the following step: after the above relation extraction model has been optimized through the overall loss, the model parameters of the relation extraction encoder and of the fully connected layers are updated in a secondary optimization by adjusting the parameters of only some modules in the relation extraction model. Through this secondary optimization the relation extraction model achieves better performance. Moreover, the secondarily optimized relation extraction model is used in the same way as the relation extraction baseline model: it requires only the sentence and the entities as input and does not depend on any additional input, so compared with the baseline model it strengthens performance without incurring extra usage cost.
As shown in Figure 13, the embodiments of this application further provide an optimization device for a natural language model, wherein the natural language model includes a main model, an enhancement module and a discriminator, and the main model includes a first encoder and a second encoder. The device includes: a first module 610, configured to obtain an input sentence through the first encoder, encode the sentence, and output the hidden vector of each word in the sentence; a second module 620, configured to input the hidden vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, the enhancement loss and the discriminator loss, respectively; a third module 630, configured to compute the overall loss from the target result loss, the enhancement loss and the discriminator loss through a preset first algorithm; and a fourth module 640, configured to preliminarily optimize the natural language model through the overall loss.

In one embodiment, the natural language model is a sentiment analysis model, the enhancement module is a sentence restoration module, the first encoder is a task-independent encoder, the second encoder is a sentiment analysis encoder, the main model further includes a supplementary learning encoder, the target result loss is a sentiment polarity loss, and the enhancement loss is a supplementary learning loss; the second module 620 is configured to:

input the hidden vector into the supplementary learning encoder to obtain a supplementary learning hidden vector, and compute the vector representation of the aspect word from the supplementary learning hidden vector through a vector algorithm; input the hidden vector and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment polarity loss; input the supplementary learning hidden vector into the sentence restoration module to obtain an intermediate result and the supplementary learning loss; and input the hidden vector and the intermediate result into the discriminator to obtain the discriminator loss.

In one embodiment, the second module 620 is configured to:

input the hidden vector and the vector representation of the aspect word into the sentiment analysis encoder to obtain the sentiment analysis vector representations of the words and the sentiment analysis vector representation of the aspect word; and compute the sentiment polarity loss from the hidden vector, the sentiment analysis vector representations of the words and the sentiment analysis vector representation of the aspect word according to a preset second algorithm.

In one embodiment, the preset second algorithm includes:

computing a task-independent sentence representation from the hidden vector according to a preset vector algorithm; computing a sentiment analysis sentence representation from the sentiment analysis vector representations of the words and the sentiment analysis vector representation of the aspect word according to the preset vector algorithm; concatenating the task-independent sentence representation and the sentiment analysis sentence representation to obtain an intermediate vector; inputting the intermediate vector into a preset first fully connected layer, and inputting the output of the first fully connected layer into a preset SoftMax classifier to obtain the predicted sentiment polarity; and comparing the predicted sentiment polarity with a preset standard to obtain the sentiment polarity loss.

In one embodiment, the sentence restoration module includes a supplementary learning decoder, and the second module 620 is configured to:

input the supplementary learning hidden vector into the supplementary learning decoder to reconstruct the input sentence and obtain predicted words; and compare the predicted words with the words in the input sentence to obtain the intermediate result, and compute the supplementary learning loss according to a preset third algorithm.

In one embodiment, the preset third algorithm includes:

computing the negative log-likelihood loss of the word at each position, and summing the negative log-likelihood losses of the words at all positions to obtain the supplementary learning loss.

In one embodiment, the second module 620 is configured to:

derive the target output of the discriminator from the hidden vector and the intermediate result; and compute the discriminator loss from the target output according to a preset fourth algorithm.

In one embodiment, the target output takes the value 0 or 1, and the second module 620 is configured to:

input the supplementary learning hidden vector into a preset second fully connected layer, and input the output of the second fully connected layer into a preset second SoftMax classifier to obtain a 2-dimensional vector for each word, each dimension of the 2-dimensional vector corresponding to the probability of the target output being 0 or 1; and obtain the discriminator loss from these distribution probabilities.

In one embodiment, when computing the overall loss, an adjustable control parameter is introduced; the control parameter is used to control how much the sentence restoration module and the adversarially trained discriminator contribute to model training.

In one embodiment, the device further includes a fifth module, configured to:

after the preliminary optimization of the natural language model through the overall loss, update the parameters of the sentiment analysis model in a secondary optimization by adjusting the parameters of some modules in the sentiment analysis model.

In one embodiment, the fifth module is configured to:

initialize the task-independent encoder, the supplementary learning encoder, the sentiment analysis encoder, the preset vector algorithm, the first fully connected layer, the second fully connected layer and the SoftMax classifier with the parameters of the sentiment analysis model obtained from the optimization through the overall loss; compute the optimized sentiment polarity loss in the same way as in the preliminary optimization; and, based on the optimized sentiment polarity loss, update the model parameters in the sentiment analysis encoder, the first fully connected layer and the second fully connected layer a second time through the back-propagation algorithm to obtain the final optimized sentiment analysis model.

In one embodiment, the natural language model is a relation extraction model, the enhancement module is an entity recognition learning module, the first encoder is a shared encoder, the second encoder is a relation extraction encoder, the target result loss is a relation type loss, and the enhancement loss is an entity label loss; the second module 620 is configured to:

input the hidden vector into the relation extraction encoder, the entity recognition learning module and the discriminator to obtain the relation type loss, the entity label loss and the discriminator loss, respectively.

In one embodiment, the relation type loss is obtained as follows:

the hidden vector is input into the relation extraction encoder in the main model, and the relation extraction encoder encodes the hidden vector to obtain the relation extraction hidden vector of each word; the hidden vector and the relation extraction hidden vector of each word are processed through a preset second algorithm to obtain the predicted relation type; and the predicted relation type is compared with a preset standard to obtain the relation type loss.

In one embodiment, the preset second algorithm includes:

computing the vector representation of the first entity, the vector representation of the second entity and the sentence representation of the relation extraction encoder from the relation extraction hidden vectors through a preset vector algorithm, and applying the preset vector algorithm to the hidden vectors to obtain the sentence representation of the shared encoder; concatenating the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder and the sentence representation of the shared encoder to obtain an intermediate vector; and passing the intermediate vector through a preset first fully connected layer into a preset first SoftMax classifier to obtain the predicted relation type.

In one embodiment, the entity recognition learning module includes an entity encoder, and the entity label loss is obtained as follows:

the hidden vector is input into the entity encoder in the entity recognition learning module, and the entity encoder encodes the hidden vector to obtain the entity recognition hidden vector of each word; the hidden vector and the entity recognition hidden vector of each word are converted to obtain the predicted entity label; and the predicted entity labels of all words are compared with a preset standard to obtain the entity label loss.

In one embodiment, the second module 620 is configured to:

concatenate the hidden vector of each word with its entity recognition hidden vector, and feed the concatenated vector into a preset second fully connected layer and a preset second SoftMax classifier to obtain the predicted entity label.

In one embodiment, the second module 620 is configured to:

compare the predicted entity labels of all words with the actual entity labels, and evaluate the comparison through a cross-entropy loss function to obtain the entity label loss.

In one embodiment, the actual entity label of each word is determined by the positions of the entities in the input: if a word does not belong to an entity, its actual entity label is the first label; if a word is the beginning of an entity, its actual entity label is the second label; and if a word is in the middle or at the end of an entity, its actual entity label is the third label.

In one embodiment, the discriminator loss is obtained as follows:

the discriminator takes the result of comparing the predicted entity label with the preset standard to derive a target output with a value of 0 or 1; the hidden vector is fed into a preset third fully connected layer and a third SoftMax classifier, and a 2-dimensional vector is obtained for each word, each dimension of the 2-dimensional vector corresponding to the probability of the target output being 0 or 1; and the discriminator loss is computed from these distribution probabilities.

In one embodiment, when computing the overall loss, an adjustable control parameter is introduced; the control parameter is used to control how much the entity recognition learning module and the adversarially trained discriminator contribute to model training.

In one embodiment, the device further includes a sixth module, configured to:

after the preliminary optimization of the natural language model through the overall loss, update the model parameters of the relation extraction encoder and of the fourth fully connected layer in a secondary optimization by adjusting the parameters of some modules in the relation extraction model.
The device provided in the embodiments of this application can implement the method steps in the above method embodiments and has the same technical effects.

As shown in Figure 14, the embodiments of this application further provide an electronic device, including a processor 710 and a memory 720; when the computer program in the memory 720 is executed by the processor 710, the above optimization method for a natural language model is implemented.

The embodiments of this application further provide a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the above optimization method for a natural language model is implemented.

Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of this application. The appearances of "in one embodiment" or "in an embodiment" throughout this text therefore do not necessarily refer to the same embodiment. Furthermore, these particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art should also understand that the embodiments described herein are optional embodiments, and the actions and modules involved are not necessarily required by this application.

In the various embodiments of this application, the sequence numbers of the above processes do not imply a necessary order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.

The flowcharts and block diagrams in the accompanying drawings of this application illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in a block may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Claims (24)

  1. 一种用于自然语言模型的优化方法,所述自然语言模型包括主模型、增强模块和判别器,所述主模型包括第一编码器和第二编码器,所述方法包括:An optimization method for a natural language model. The natural language model includes a main model, an enhancement module and a discriminator. The main model includes a first encoder and a second encoder. The method includes:
    通过所述第一编码器获取输入语句,并对所述语句进行编码,输出所述语句中每个词的隐向量;Obtain the input sentence through the first encoder, encode the sentence, and output the hidden vector of each word in the sentence;
    将所述隐向量输入所述第二编码器、所述增强模块和所述判别器,分别得到目标结果损失、增强损失以及判别器损失;Input the latent vector into the second encoder, the enhancement module and the discriminator to obtain the target result loss, enhancement loss and discriminator loss respectively;
    对所述目标结果损失、所述增强损失以及所述判别器损失通过预设的第一算法计算得到整体损失;The overall loss is calculated through a preset first algorithm for the target result loss, the enhancement loss and the discriminator loss;
    通过所述整体损失对所述自然语言模型进行初步优化。The natural language model is initially optimized through the overall loss.
  2. The optimization method according to claim 1, wherein the natural language model is a sentiment analysis model, the enhancement module is a sentence restoration module, the first encoder is a task-independent encoder, the second encoder is a sentiment analysis encoder, the main model further comprises a supplementary learning encoder, the target result loss is a sentiment polarity loss, and the enhancement loss is a supplementary learning loss;
    wherein inputting the hidden vectors into the second encoder, the enhancement module and the discriminator to obtain the target result loss, the enhancement loss and the discriminator loss respectively comprises:
    inputting the hidden vectors into the supplementary learning encoder to obtain supplementary learning hidden vectors, and calculating a vector representation of the aspect term from the supplementary learning hidden vectors through a vector algorithm;
    inputting the hidden vectors and the vector representation of the aspect term into the sentiment analysis encoder to obtain the sentiment polarity loss;
    inputting the supplementary learning hidden vectors into the sentence restoration module to obtain an intermediate result and the supplementary learning loss;
    inputting the hidden vectors and the intermediate result into the discriminator to obtain the discriminator loss.
  3. The optimization method according to claim 2, wherein inputting the hidden vectors and the vector representation of the aspect term into the sentiment analysis encoder to obtain the sentiment polarity loss comprises:
    inputting the hidden vectors and the vector representation of the aspect term into the sentiment analysis encoder to obtain sentiment analysis vector representations of the words and a sentiment analysis vector representation of the aspect term;
    calculating the sentiment polarity loss from the hidden vectors, the sentiment analysis vector representations of the words and the sentiment analysis vector representation of the aspect term according to a preset second algorithm.
  4. The optimization method according to claim 3, wherein the preset second algorithm comprises:
    calculating a task-independent sentence representation from the hidden vectors according to a preset vector algorithm;
    calculating a sentiment analysis sentence representation from the sentiment analysis vector representations of the words and the sentiment analysis vector representation of the aspect term according to the preset vector algorithm;
    concatenating the task-independent sentence representation and the sentiment analysis sentence representation to obtain an intermediate vector;
    inputting the intermediate vector into a preset first fully connected layer, and inputting the output of the first fully connected layer into a preset SoftMax classifier to obtain a predicted sentiment polarity;
    comparing the predicted sentiment polarity with a preset standard to calculate the sentiment polarity loss.
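As a hedged sketch of the claim-4 classification head: the "preset vector algorithm" is not fixed by the claim, so the mean pooling, hidden size and three-way polarity space below are assumptions.

```python
import torch
import torch.nn as nn

class PolarityHead(nn.Module):
    """Concatenate the task-independent and sentiment analysis sentence
    representations, apply the first fully connected layer, then SoftMax
    over polarity classes (positive/negative/neutral assumed)."""
    def __init__(self, dim=768, num_polarities=3):
        super().__init__()
        self.fc = nn.Linear(2 * dim, num_polarities)

    def forward(self, task_free_hidden, sentiment_hidden):
        # "Preset vector algorithm" assumed to be mean pooling over words.
        task_free_sent = task_free_hidden.mean(dim=0)
        sentiment_sent = sentiment_hidden.mean(dim=0)
        intermediate = torch.cat([task_free_sent, sentiment_sent], dim=-1)
        return torch.softmax(self.fc(intermediate), dim=-1)
```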
  5. The optimization method according to claim 2, wherein the sentence restoration module comprises a supplementary learning decoder, and inputting the supplementary learning hidden vectors into the sentence restoration module to obtain the intermediate result and the supplementary learning loss comprises:
    inputting the supplementary learning hidden vectors into the supplementary learning decoder to reconstruct the input sentence and obtain predicted words;
    comparing the predicted words with the words in the input sentence to obtain the intermediate result, and calculating the supplementary learning loss according to a preset third algorithm.
  6. The optimization method according to claim 5, wherein the preset third algorithm comprises:
    calculating a negative log-likelihood loss of the word at each position, and summing the negative log-likelihood losses of the words at all positions to obtain the supplementary learning loss.
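In symbols, writing $p(x_i)$ for the decoder's probability of the gold word $x_i$ at position $i$ of an $n$-word sentence, the claim-6 loss is the standard negative log-likelihood (the notation is ours, not the patent's):

$$\mathcal{L}_{SL} = -\sum_{i=1}^{n} \log p(x_i)$$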
  7. The optimization method according to claim 2, wherein inputting the hidden vectors and the intermediate result into the discriminator to obtain the discriminator loss comprises:
    obtaining a target output of the discriminator based on the hidden vectors and the intermediate result;
    calculating the discriminator loss from the target output according to a preset fourth algorithm.
  8. The optimization method according to claim 7, wherein the target output takes a value of 0 or 1, and calculating the discriminator loss from the target output according to the preset fourth algorithm comprises:
    inputting the supplementary learning hidden vectors into a preset second fully connected layer, and inputting the output of the second fully connected layer into a preset second SoftMax classifier to obtain a 2-dimensional vector for each word, wherein each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output over 0 and 1;
    obtaining the discriminator loss based on the distribution probabilities.
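A sketch of the per-word discriminator head of claim 8; the use of cross-entropy against the 0/1 target output is an assumed concrete choice, as the claim only states that the loss is obtained from the distribution probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordDiscriminator(nn.Module):
    """Second fully connected layer plus a 2-way SoftMax per word;
    each dimension gives the probability of target output 0 or 1."""
    def __init__(self, dim=768):
        super().__init__()
        self.fc = nn.Linear(dim, 2)

    def forward(self, sl_hidden, targets):
        # sl_hidden: (seq_len, dim) supplementary learning hidden vectors
        # targets:   (seq_len,) long tensor of 0/1 target outputs
        logits = self.fc(sl_hidden)              # one 2-dim vector per word
        probs = torch.softmax(logits, dim=-1)    # distribution over {0, 1}
        loss = F.cross_entropy(logits, targets)  # assumed discriminator loss
        return probs, loss
```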
  9. The optimization method according to claim 2, wherein, when calculating the overall loss, adjustable control parameters are introduced, the control parameters being used to control the contributions of the sentence restoration module and the adversarially trained discriminator to model training.
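One natural reading of claim 9 (and of the parallel claim 20) is the weighted overall loss below, with $\lambda_1$ and $\lambda_2$ as the adjustable control parameters; the claims do not fix the combination rule, so this form is an assumption:

$$\mathcal{L} = \mathcal{L}_{\text{polarity}} + \lambda_1 \mathcal{L}_{SL} + \lambda_2 \mathcal{L}_{D}$$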
  10. The optimization method according to claim 2, further comprising, after the preliminary optimization of the natural language model through the overall loss:
    performing a secondary optimization update on the parameters of the sentiment analysis model by adjusting the parameters of some modules in the sentiment analysis model.
  11. The optimization method according to claim 10, wherein performing the secondary optimization on the parameters of the sentiment analysis model by adjusting the parameters of some modules in the sentiment analysis model comprises:
    initializing the task-independent encoder, the supplementary learning encoder, the sentiment analysis encoder, the preset vector algorithm, the first fully connected layer, the second fully connected layer and the SoftMax classifier with the parameters of the sentiment analysis model as optimized through the overall loss;
    calculating an optimized sentiment polarity loss in the same manner as in the preliminary optimization;
    performing, based on the optimized sentiment polarity loss, a secondary optimization update on the model parameters in the sentiment analysis encoder, the first fully connected layer and the second fully connected layer through a back-propagation algorithm, to obtain a finally optimized sentiment analysis model.
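The two-stage schedule of claims 10 and 11 amounts to re-initializing from the stage-one weights and then back-propagating only the sentiment polarity loss into a subset of modules. In the hedged sketch below, the attribute names (`sentiment_encoder`, `fc1`, `fc2`, `polarity_loss`) and the choice of optimizer are illustrative assumptions.

```python
import torch

def secondary_optimization(model, batches, lr=1e-5):
    # Update only the sentiment analysis encoder and the two fully
    # connected layers; everything else keeps its stage-one parameters.
    trainable = (list(model.sentiment_encoder.parameters())
                 + list(model.fc1.parameters())
                 + list(model.fc2.parameters()))
    for p in model.parameters():
        p.requires_grad = False
    for p in trainable:
        p.requires_grad = True

    optimizer = torch.optim.Adam(trainable, lr=lr)
    for sentence, gold_polarity in batches:
        loss = model.polarity_loss(sentence, gold_polarity)  # hypothetical helper
        optimizer.zero_grad()
        loss.backward()    # back-propagation of the polarity loss only
        optimizer.step()
    return model
```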
  12. The optimization method according to claim 1, wherein the natural language model is a relation extraction model, the enhancement module is an entity recognition learning module, the first encoder is a shared encoder, the second encoder is a relation extraction encoder, the target result loss is a relation type loss, and the enhancement loss is an entity label loss;
    wherein inputting the hidden vectors into the second encoder, the enhancement module and the discriminator to obtain the target result loss, the enhancement loss and the discriminator loss respectively comprises:
    inputting the hidden vectors into the relation extraction encoder, the entity recognition learning module and the discriminator to obtain the relation type loss, the entity label loss and the discriminator loss, respectively.
  13. The method according to claim 12, wherein the relation type loss is obtained as follows:
    inputting the hidden vectors into the relation extraction encoder in the main model, the relation extraction encoder encoding the hidden vectors to obtain a relation extraction hidden vector of each word;
    calculating the hidden vector and the relation extraction hidden vector of each word through a preset second algorithm to obtain a predicted relation type;
    comparing the predicted relation type with a preset standard to calculate the relation type loss.
  14. The method according to claim 13, wherein the preset second algorithm comprises:
    calculating the relation extraction hidden vectors through a preset vector algorithm to obtain a vector representation of the first entity, a vector representation of the second entity and a sentence representation of the relation extraction encoder, and applying the preset vector algorithm to the hidden vectors to obtain a sentence representation of the shared encoder;
    concatenating the vector representation of the first entity, the vector representation of the second entity, the sentence representation of the relation extraction encoder and the sentence representation of the shared encoder to obtain an intermediate vector;
    passing the intermediate vector through a preset first fully connected layer and then feeding it into a preset first SoftMax classifier to obtain the predicted relation type.
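Claim 14 mirrors the claim-4 head but concatenates four representations before the fully connected layer; a brief sketch with assumed dimensions and relation count:

```python
import torch
import torch.nn as nn

class RelationTypeHead(nn.Module):
    """Concatenate both entity representations with the two sentence
    representations, then FC + SoftMax over relation types."""
    def __init__(self, dim=768, num_relations=10):
        super().__init__()
        self.fc = nn.Linear(4 * dim, num_relations)

    def forward(self, ent1_vec, ent2_vec, re_sent_vec, shared_sent_vec):
        intermediate = torch.cat(
            [ent1_vec, ent2_vec, re_sent_vec, shared_sent_vec], dim=-1)
        return torch.softmax(self.fc(intermediate), dim=-1)
```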
  15. The method according to claim 12, wherein the entity recognition learning module comprises an entity encoder, and the entity label loss is obtained as follows:
    inputting the hidden vectors into the entity encoder in the entity recognition learning module, the entity encoder encoding the hidden vectors to obtain an entity recognition hidden vector of each word;
    obtaining predicted entity recognition labels by transforming the hidden vector and the entity recognition hidden vector of each word;
    comparing the predicted entity recognition labels of all words with a preset standard to calculate the entity label loss.
  16. The method according to claim 15, wherein obtaining the predicted entity recognition labels by transforming the hidden vector and the entity recognition hidden vector of each word comprises:
    concatenating the hidden vector of each word with its entity recognition hidden vector, and feeding the concatenated vector into a preset second fully connected layer and a preset second SoftMax classifier to obtain the predicted entity recognition labels.
  17. The method according to claim 15, wherein the preset standard is the actual entity labels, and comparing the predicted entity recognition labels of all words with the preset standard to calculate the entity label loss comprises:
    comparing the predicted entity recognition labels of all words with the actual entity labels, and calculating the comparison result through a cross-entropy loss function to obtain the entity label loss.
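For claim 17, the entity label loss reduces to token-level cross-entropy between the predicted label distribution and the actual labels; a minimal sketch (the 3-label space anticipates claim 18 and is an assumption):

```python
import torch
import torch.nn.functional as F

def entity_label_loss(label_logits, gold_labels):
    # label_logits: (seq_len, num_labels) per-word scores
    # gold_labels:  (seq_len,) actual entity label indices
    return F.cross_entropy(label_logits, gold_labels)

# Hypothetical usage: 4 words, 3 labels.
logits = torch.randn(4, 3)
gold = torch.tensor([1, 2, 0, 1])
loss = entity_label_loss(logits, gold)
```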
  18. The method according to claim 17, wherein the actual entity label of each word is determined by the position of the entity in the input: when a word does not belong to an entity, the actual entity label of the word is a first label; when a word is the beginning of an entity, the actual entity label of the word is a second label; and when a word is in the middle or at the end of an entity, the actual entity label of the word is a third label.
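Claim 18 describes what is in effect a BIO-style tagging scheme. A concrete, purely illustrative example, rendering the first/second/third labels as O/B/I:

```python
# Entities: "Steve Jobs" and "Apple" (hypothetical input).
# O = first label (word not in an entity)
# B = second label (word begins an entity)
# I = third label (word in the middle or at the end of an entity)
words  = ["Steve", "Jobs", "founded", "Apple"]
labels = ["B",     "I",    "O",       "B"]
```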
  19. The method according to claim 17, wherein the discriminator loss is obtained as follows:
    the discriminator obtains the result of comparing the predicted entity recognition labels with the preset standard to obtain a target output, the target output taking a value of 0 or 1;
    the hidden vectors are fed into a preset third fully connected layer and a third SoftMax classifier to obtain a 2-dimensional vector for each word, wherein each dimension of the 2-dimensional vector corresponds to the distribution probability of the target output over 0 and 1;
    the discriminator loss is calculated from the distribution probabilities.
  20. The method according to claim 12, wherein, when calculating the overall loss, adjustable control parameters are introduced, the control parameters being used to control the contributions of the entity recognition learning module and the adversarially trained discriminator to model training.
  21. The method according to claim 16, further comprising, after the preliminary optimization of the natural language model through the overall loss:
    performing a secondary optimization update on the model parameters of the relation extraction encoder and the fourth fully connected layer by adjusting the parameters of some modules in the relation extraction model.
  22. An optimization device for a natural language model, the natural language model comprising a main model, an enhancement module and a discriminator, the main model comprising a first encoder and a second encoder, the device comprising:
    a first module, configured to obtain an input sentence through the first encoder, encode the sentence, and output a hidden vector of each word in the sentence;
    a second module, configured to input the hidden vectors into the second encoder, the enhancement module and the discriminator to obtain a target result loss, an enhancement loss and a discriminator loss, respectively;
    a third module, configured to calculate an overall loss from the target result loss, the enhancement loss and the discriminator loss through a preset first algorithm;
    a fourth module, configured to perform preliminary optimization on the natural language model through the overall loss.
  23. An electronic device, comprising a processor and a memory, wherein a computer program in the memory, when executed by the processor, implements the optimization method for a natural language model according to any one of claims 1 to 21.
  24. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the optimization method for a natural language model according to any one of claims 1 to 21.
PCT/CN2022/128623 2022-06-29 2022-10-31 Optimization method for natural language model WO2024000966A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210753408.0 2022-06-29
CN202210753407.6A CN115034228A (en) 2022-06-29 2022-06-29 Optimization method for emotion analysis model
CN202210753408.0A CN114970857A (en) 2022-06-29 2022-06-29 Optimization method for relational extraction model
CN202210753407.6 2022-06-29

Publications (1)

Publication Number Publication Date
WO2024000966A1

Family

ID=89383923

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/128623 WO2024000966A1 (en) 2022-06-29 2022-10-31 Optimization method for natural language model

Country Status (1)

Country Link
WO (1) WO2024000966A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117610562A (en) * 2024-01-23 2024-02-27 中国科学技术大学 Relation extraction method combining combined category grammar and multi-task learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368528A (en) * 2020-03-09 2020-07-03 西南交通大学 Entity relation joint extraction method for medical texts
CN113128229A (en) * 2021-04-14 2021-07-16 河海大学 Chinese entity relation joint extraction method
US20210271822A1 (en) * 2020-02-28 2021-09-02 Vingroup Joint Stock Company Encoder, system and method for metaphor detection in natural language processing
CN114357155A (en) * 2021-11-29 2022-04-15 山东师范大学 Method and system for analyzing aspect emotion facing to natural language
CN114626529A (en) * 2022-02-25 2022-06-14 华南理工大学 Natural language reasoning fine-tuning method, system, device and storage medium
CN114970857A (en) * 2022-06-29 2022-08-30 苏州思萃人工智能研究所有限公司 Optimization method for relational extraction model
CN115034228A (en) * 2022-06-29 2022-09-09 苏州思萃人工智能研究所有限公司 Optimization method for emotion analysis model

Similar Documents

Publication Publication Date Title
WO2022037256A1 (en) Text sentence processing method and device, computer device and storage medium
CN109492113B (en) Entity and relation combined extraction method for software defect knowledge
JP7291183B2 (en) Methods, apparatus, devices, media, and program products for training models
CN111382582A (en) Neural machine translation decoding acceleration method based on non-autoregressive
CN110348016A (en) Text snippet generation method based on sentence association attention mechanism
CN111401084B (en) Method and device for machine translation and computer readable storage medium
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
WO2023160472A1 (en) Model training method and related device
CN110083702B (en) Aspect level text emotion conversion method based on multi-task learning
CN111984791B (en) Attention mechanism-based long text classification method
WO2024000966A1 (en) Optimization method for natural language model
CN115759119B (en) Financial text emotion analysis method, system, medium and equipment
CN115168541A (en) Chapter event extraction method and system based on frame semantic mapping and type perception
CN114970857A (en) Optimization method for relational extraction model
CN115034228A (en) Optimization method for emotion analysis model
CN116702091A (en) Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP
CN113869005A (en) Pre-training model method and system based on sentence similarity
CN113392656A (en) Neural machine translation method fusing push-and-knock network and character coding
CN117036706A (en) Image segmentation method and system based on multi-modal dialogue language model
WO2022228127A1 (en) Element text processing method and apparatus, electronic device, and storage medium
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution
CN115062123A (en) Knowledge base question-answer pair generation method of conversation generation system
CN110059314B (en) Relation extraction method based on reinforcement learning
CN112966524A (en) Chinese sentence semantic matching method and system based on multi-granularity twin network
CN117521656B (en) Chinese text-oriented end-to-end Chinese entity relationship joint extraction method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22949049

Country of ref document: EP

Kind code of ref document: A1