CN112905750A - Generation method and device of optimization model - Google Patents

Generation method and device of optimization model

Info

Publication number
CN112905750A
CN112905750A (application number CN202110289744.XA)
Authority
CN
China
Prior art keywords
similarity
intent
model
score
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110289744.XA
Other languages
Chinese (zh)
Inventor
姜姗
刘升平
梁家恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd, Xiamen Yunzhixin Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202110289744.XA priority Critical patent/CN112905750A/en
Publication of CN112905750A publication Critical patent/CN112905750A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/35: Clustering; Classification
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Abstract

The embodiment of the invention provides a method and a device for generating an optimization model. The method comprises the following steps: acquiring training data; organizing the training data; splicing the user question and the question in the knowledge base into a whole through a first character, and adding a second character at the head of the whole and the first character at its tail to serve as the input; modeling the input using BERT to obtain encoded vectors; selecting the vector corresponding to the second character as the feature vector; inputting the feature vector into a first fully connected layer for text similarity calculation and a second fully connected layer for intent classification, respectively; training the model based on the text similarity score, the intent classification score, and a loss function of the model that combines the first fully connected layer and the second fully connected layer; and selecting the model with the highest F1 value from the trained models as the best model. The scheme improves the text matching performance of the model.

Description

Generation method and device of optimization model
Technical Field
The invention relates to the technical field of model training, in particular to a method and equipment for generating an optimization model.
Background
The text matching task is a core task in a retrieval-based question-answering system: it returns the best answer to a user question by calculating the degree of matching between the user question and the questions in the knowledge base. BERT is a bidirectional language model pre-trained on a large-scale corpus that can be transferred to the text matching task through fine-tuning. At present, fine-tuning a BERT model is the mainstream approach to the text matching task.
However, the text matching model in the current retrieval-based question-answering system takes the user question directly as input and does not make full use of the intent information implied by the user question, which results in insufficient accuracy.
Therefore, there is a need for a better method to solve the problems of the prior art.
Disclosure of Invention
The invention provides a method and a device for generating an optimization model, which can solve the technical problem of insufficient accuracy in the prior art.
The technical scheme for solving the technical problems is as follows:
the embodiment of the invention provides a generation method of an optimization model, which comprises the following steps:
acquiring training data;
organizing the training data into user questions, questions in a knowledge base, intention category labels and similarity category labels of the user questions;
splicing the user question and the question in the knowledge base into a whole through a first character, and adding a second character at the head of the whole and the first character at its tail, respectively, to be used as the input;
modeling using BERT based on the input to obtain a coded vector;
selecting the vector corresponding to the second character as a feature vector;
inputting the feature vector into a first fully connected layer for text similarity calculation and a second fully connected layer for intention classification, respectively, to obtain a text similarity score and an intention classification score;
training the model based on the text similarity score and the intent classification score and a loss function of the model that combines the first fully-connected layer and the second fully-connected layer;
and selecting the model with the highest F1 value from the trained models as the best model.
In a specific embodiment, the text similarity score is determined by the following formula:
y_similarity = Softmax(F_similarity(h_i));
wherein y_similarity is the text similarity score, h_i is the feature vector, and F_similarity is the first fully connected layer used for text similarity calculation.
In a particular embodiment, the intent classification score is determined by the following formula:
y_intent = Softmax(F_intent(h_i));
wherein y_intent is the intent classification score, h_i is the feature vector, and F_intent is the second fully connected layer used for intent classification.
In a specific embodiment, the loss function is:
L = α·L_similarity + (1 - α)·L_intent;
wherein L is the loss of the model combining the first fully connected layer and the second fully connected layer, L_similarity is the loss of the text matching task, L_intent is the loss of the intent classification task, and α (0 ≤ α ≤ 1) is a coefficient parameter controlling the ratio of the two losses.
In a specific embodiment, the loss of the text matching task is:
L_similarity = -(1/N)·Σ_{i=1}^{N} [ y_i·log(y'_i) + (1 - y_i)·log(1 - y'_i) ];
wherein N is the number of samples, i is the i-th sample in the training data (1 ≤ i ≤ N), y_i is the sample similarity category label, and y'_i is the probability of a positive sample predicted by the model.
In a specific embodiment, the loss of the intent classification task is:
L_intent = -Σ_{j=1}^{K} l_j·log(p_j);
wherein K is the total number of sample intent categories, j is the j-th category among the intent categories (1 ≤ j ≤ K), l_j is the sample intent category label, and p_j is the probability computed by the model that the intent category is j.
In a specific embodiment, the network parameters of the model are updated by a back-propagation algorithm.
The embodiment of the present invention further provides a device for generating an optimization model, including:
the acquisition module is used for acquiring training data;
the organizing module is used for organizing the training data into user questions, questions in a knowledge base, intention category labels and similarity category labels of the user questions;
the splicing module is used for splicing the user question and the question in the knowledge base into a whole through a first character, and adding a second character at the head of the whole and the first character at its tail, respectively, to be used as the input;
a vector module for modeling using BERT based on the input to obtain a coded vector;
the selection module is used for selecting the vector corresponding to the second character as a feature vector;
the input module is used for inputting the feature vector into a first fully connected layer for text similarity calculation and a second fully connected layer for intention classification, respectively, to obtain a text similarity score and an intention classification score;
a training module, used for training the model based on the text similarity score, the intent classification score, and a loss function of the model that combines the first fully connected layer and the second fully connected layer;
and the processing module is used for selecting the model with the highest F1 value from the trained models as the best model.
In a specific embodiment, the text similarity score is determined by the following formula:
y_similarity = Softmax(F_similarity(h_i));
wherein y_similarity is the text similarity score, h_i is the feature vector, and F_similarity is the first fully connected layer used for text similarity calculation.
In a particular embodiment, the intent classification score is determined by the following formula:
y_intent = Softmax(F_intent(h_i));
wherein y_intent is the intent classification score, h_i is the feature vector, and F_intent is the second fully connected layer used for intent classification.
The invention has the beneficial effects that:
the embodiment of the invention provides a method and equipment for generating an optimization model, wherein the method comprises the following steps: acquiring training data; organizing the training data into user questions, questions in a knowledge base, intention category labels and similarity category labels of the user questions; splicing the user question types and the question types in the knowledge base into a whole through first characters, and adding second characters and the first characters into the head and the tail of the whole respectively to be used as input; modeling using BERT based on the input to obtain a coded vector; selecting the vector corresponding to the second character as a feature vector; respectively inputting the feature vectors into a first full-connection layer for text similarity calculation and a second full-connection layer for intention classification to obtain a text similarity score and an intention classification score; training the model based on the text similarity score and the intent classification score and a loss function of the model that combines the first fully-connected layer and the second fully-connected layer; and selecting the model with the highest F1 value from the trained models as the best model. In the scheme, the intention classification and the text matching task are combined, so that the model learns the characteristics of different intentions, the comprehension of the model on the input problems of the user can be enhanced, the text matching task is assisted, and the text matching performance of the model is improved.
Drawings
Fig. 1 is a schematic flow chart of a method for generating an optimization model according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a framework of a model in a method for generating an optimization model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an optimization model generation device according to an embodiment of the present invention;
fig. 4 is a frame structure diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Example 1
The embodiment 1 of the invention discloses a method for generating an optimization model, which comprises the following steps as shown in fig. 1:
step 101, acquiring training data;
step 102, organizing the training data into user questions, questions in a knowledge base, intention category labels and similarity category labels of the user questions;
Specifically, the training data is organized in the form (Sentence1_i, Sentence2_i, l_i, y_i), where Sentence1_i denotes a user question, Sentence2_i denotes a question in the knowledge base, l_i denotes the intent category label of the user question, and y_i denotes the similarity category label.
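By way of illustration only (this sketch is not part of the original disclosure), the quadruples described above could be organized as follows in Python; the field names and the toy questions, intent labels and similarity labels are assumptions made for the example:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    sentence1: str          # user question (Sentence1_i)
    sentence2: str          # question in the knowledge base (Sentence2_i)
    intent_label: int       # intent category label l_i of the user question
    similarity_label: int   # similarity category label y_i (1 = match, 0 = no match)

# Toy, made-up training data in the (Sentence1_i, Sentence2_i, l_i, y_i) form:
train_data: List[Sample] = [
    Sample("How do I reset my password?", "How can I change my account password?", 0, 1),
    Sample("How do I reset my password?", "What payment methods are supported?", 1, 0),
]
```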
Step 103, splicing the user question and the question in the knowledge base into a whole through a first character, and adding a second character at the head of the whole and the first character at its tail, respectively, to be used as the input;
Specifically, Sentence1_i and Sentence2_i are spliced with the [SEP] character, a [CLS] character is added at the head and a [SEP] character at the tail, and the result is used as the input.
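As an illustrative sketch of this splicing step, the following Python code uses the HuggingFace transformers tokenizer, which produces exactly the [CLS] Sentence1 [SEP] Sentence2 [SEP] layout described above; the toolkit and the checkpoint name "bert-base-chinese" are assumptions, since the patent does not name a specific implementation:

```python
from transformers import BertTokenizer

# "bert-base-chinese" is an assumed checkpoint; any BERT tokenizer behaves the same way here.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

def encode_pair(sentence1: str, sentence2: str, max_length: int = 128):
    # Passing a text pair yields: [CLS] sentence1 [SEP] sentence2 [SEP]
    return tokenizer(
        sentence1,
        sentence2,
        padding="max_length",
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )

encoded = encode_pair("How do I reset my password?", "How can I change my account password?")
# encoded contains input_ids, token_type_ids and attention_mask for the spliced input.
```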
Step 104, modeling by using BERT based on the input to obtain encoded vectors;
specifically, the input is modeled using BERT to obtain a coded vector.
Step 105, selecting the vector corresponding to the second character as a feature vector;
specific selection [ CLS]The vector corresponding to the character is used as a feature vector hi. Inputting the parameters into a full connection layer with two different parameters respectively, wherein the full connection layer F _ similarity is used for calculating the text similarity
Step 106, inputting the feature vector into a first fully connected layer for text similarity calculation and a second fully connected layer for intention classification, respectively, to obtain a text similarity score and an intention classification score;
in a specific embodiment, the text similarity score is determined by the following formula:
y_similarity = Softmax(F_similarity(h_i));
wherein y_similarity is the text similarity score, h_i is the feature vector, and F_similarity is the first fully connected layer used for text similarity calculation.
Further, the intent classification score is determined by the following formula:
y_intent = Softmax(F_intent(h_i));
wherein y_intent is the intent classification score, h_i is the feature vector, and F_intent is the second fully connected layer used for intent classification.
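A minimal sketch of the model described in steps 104 to 106, assuming PyTorch and the transformers library (the patent does not prescribe a framework): a shared BERT encoder, the [CLS] vector as the feature vector h_i, and two fully connected layers F_similarity and F_intent whose Softmax outputs are the text similarity score and the intent classification score:

```python
import torch
import torch.nn as nn
from transformers import BertModel

class JointMatchIntentModel(nn.Module):
    """Illustrative sketch: BERT encoder with two heads (text matching + intent classification)."""

    def __init__(self, num_intents: int, pretrained: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        hidden = self.bert.config.hidden_size
        self.f_similarity = nn.Linear(hidden, 2)        # first fully connected layer F_similarity
        self.f_intent = nn.Linear(hidden, num_intents)  # second fully connected layer F_intent

    def forward(self, input_ids, attention_mask, token_type_ids):
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        h_i = outputs.last_hidden_state[:, 0]           # vector of the [CLS] character (feature vector h_i)
        similarity_logits = self.f_similarity(h_i)
        intent_logits = self.f_intent(h_i)
        # y_similarity and y_intent are the Softmax scores of the two heads
        y_similarity = torch.softmax(similarity_logits, dim=-1)
        y_intent = torch.softmax(intent_logits, dim=-1)
        return similarity_logits, intent_logits, y_similarity, y_intent
```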
Further, the loss function is:
L = α·L_similarity + (1 - α)·L_intent;
wherein L is the loss of the model combining the first fully connected layer and the second fully connected layer, L_similarity is the loss of the text matching task, L_intent is the loss of the intent classification task, and α (0 ≤ α ≤ 1) is a coefficient parameter controlling the ratio of the two losses.
In a specific embodiment, the loss of the text matching task is:
L_similarity = -(1/N)·Σ_{i=1}^{N} [ y_i·log(y'_i) + (1 - y_i)·log(1 - y'_i) ];
wherein N is the number of samples, i is the i-th sample in the training data (1 ≤ i ≤ N), y_i is the sample similarity category label, and y'_i is the probability of a positive sample predicted by the model.
In a specific embodiment, the loss of the intent classification task is:
L_intent = -Σ_{j=1}^{K} l_j·log(p_j);
wherein K is the total number of sample intent categories, j is the j-th category among the intent categories (1 ≤ j ≤ K), l_j is the sample intent category label, and p_j is the probability computed by the model that the intent category is j.
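The joint loss could be computed as in the following sketch, using cross-entropy losses consistent with the definitions of L_similarity and L_intent above; the default value α = 0.5 is an assumption made for illustration:

```python
import torch
import torch.nn.functional as F

def joint_loss(similarity_logits: torch.Tensor,
               intent_logits: torch.Tensor,
               similarity_labels: torch.Tensor,
               intent_labels: torch.Tensor,
               alpha: float = 0.5) -> torch.Tensor:
    """L = alpha * L_similarity + (1 - alpha) * L_intent, with cross-entropy for both tasks."""
    # cross_entropy applies log-softmax internally, so raw logits are passed in.
    l_similarity = F.cross_entropy(similarity_logits, similarity_labels)
    l_intent = F.cross_entropy(intent_logits, intent_labels)
    return alpha * l_similarity + (1.0 - alpha) * l_intent
```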
Step 107, training the model based on the text similarity score, the intent classification score, and a loss function of the model that combines the first fully connected layer and the second fully connected layer;
Specifically, the model is a BERT joint learning model that combines intent classification with text matching and is constructed with the help of the intent category labels of the user questions, as shown in fig. 2.
Specifically, the network parameters of the model are updated by a back propagation algorithm. After each update, a trained model is obtained and step 108 is performed.
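A sketch of one such update step, reusing the JointMatchIntentModel and joint_loss sketches above; the AdamW optimizer, the learning rate and the number of intents are assumptions, as the patent only states that the parameters are updated by a back propagation algorithm:

```python
import torch

model = JointMatchIntentModel(num_intents=10)                 # num_intents = 10 is an assumed value
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)    # optimizer and learning rate are assumptions

def train_step(batch: dict) -> float:
    model.train()
    optimizer.zero_grad()
    similarity_logits, intent_logits, _, _ = model(
        batch["input_ids"], batch["attention_mask"], batch["token_type_ids"])
    loss = joint_loss(similarity_logits, intent_logits,
                      batch["similarity_labels"], batch["intent_labels"], alpha=0.5)
    loss.backward()     # back propagation of the joint loss
    optimizer.step()    # update of the network parameters
    return loss.item()
```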
Step 108, selecting the model with the highest F1 value from the trained models as the best model.
Specifically, the F1 value is also referred to as the F value or F-measure.
The F1 value is defined per category and combines two concepts: precision and recall. Precision is the proportion of the individuals predicted to belong to a certain category that actually belong to that category. Recall is the ratio of the number of individuals correctly predicted to be of a certain category to the total number of individuals of that category in the data set. The F1 value is the harmonic mean of the two: F1 = 2·precision·recall / (precision + recall).
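The following self-contained sketch shows how the F1 value can be computed from predicted and gold similarity labels and used to pick the best model among trained checkpoints; the checkpoint names and the toy labels are made-up examples:

```python
from typing import List, Tuple

def f1_score(preds: List[int], labels: List[int], positive: int = 1) -> float:
    tp = sum(1 for p, y in zip(preds, labels) if p == positive and y == positive)
    fp = sum(1 for p, y in zip(preds, labels) if p == positive and y != positive)
    fn = sum(1 for p, y in zip(preds, labels) if p != positive and y == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

def select_best(results: List[Tuple[str, List[int], List[int]]]) -> str:
    # results: (checkpoint name, predicted similarity labels, gold similarity labels) per trained model
    return max(results, key=lambda r: f1_score(r[1], r[2]))[0]

# Toy usage with made-up predictions from two checkpoints:
print(select_best([
    ("epoch-1", [1, 0, 1, 1], [1, 0, 0, 1]),
    ("epoch-2", [1, 0, 0, 1], [1, 0, 0, 1]),
]))  # -> "epoch-2"
```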
In this scheme, the intent classification task and the text matching task are combined so that the model learns the features of different intents; this enhances the model's understanding of the user's input question, assists the text matching task, and improves the text matching performance of the model.
Example 2
Embodiment 2 of the present invention further discloses an optimization model generation device, as shown in fig. 3, including:
an obtaining module 201, configured to obtain training data;
an organizing module 202 for organizing the training data into user questions, questions in a knowledge base, intention category labels and similarity category labels of the user questions;
the splicing module 203 is used for splicing the user question and the question in the knowledge base into a whole through a first character, and adding a second character at the head of the whole and the first character at its tail, respectively, to be used as the input;
a vector module 204 for modeling using BERT based on the input to obtain a coded vector;
a selecting module 205, configured to select the vector corresponding to the second character as a feature vector;
an input module 206, configured to input the feature vector into a first fully connected layer for text similarity calculation and a second fully connected layer for intent classification, respectively, so as to obtain a text similarity score and an intent classification score;
a training module 207, configured to train the model based on the text similarity score, the intent classification score, and a loss function of the model that combines the first fully connected layer and the second fully connected layer;
and the processing module 208 is used for selecting the model with the highest F1 value from the trained models as the best model.
In a specific embodiment, the text similarity score is determined by the following formula:
y_similarity = Softmax(F_similarity(h_i));
wherein y_similarity is the text similarity score, h_i is the feature vector, and F_similarity is the first fully connected layer used for text similarity calculation.
In a particular embodiment, the intent classification score is determined by the following formula:
y_intent = Softmax(F_intent(h_i));
wherein y_intent is the intent classification score, h_i is the feature vector, and F_intent is the second fully connected layer used for intent classification.
In a specific embodiment, the loss function is:
L = α·L_similarity + (1 - α)·L_intent;
wherein L is the loss of the model combining the first fully connected layer and the second fully connected layer, L_similarity is the loss of the text matching task, L_intent is the loss of the intent classification task, and α (0 ≤ α ≤ 1) is a coefficient parameter controlling the ratio of the two losses.
In a specific embodiment, the loss of the text matching task is:
L_similarity = -(1/N)·Σ_{i=1}^{N} [ y_i·log(y'_i) + (1 - y_i)·log(1 - y'_i) ];
wherein N is the number of samples, i is the i-th sample in the training data (1 ≤ i ≤ N), y_i is the sample similarity category label, and y'_i is the probability of a positive sample predicted by the model.
In a specific embodiment, the loss of the intent classification task is:
L_intent = -Σ_{j=1}^{K} l_j·log(p_j);
wherein K is the total number of sample intent categories, j is the j-th category among the intent categories (1 ≤ j ≤ K), l_j is the sample intent category label, and p_j is the probability computed by the model that the intent category is j.
In a specific embodiment, the network parameters of the model are updated by a back-propagation algorithm.
Example 3
Embodiment 3 of the present invention also discloses a terminal, as shown in fig. 4, including: a memory and a processor that executes the method of embodiment 1 when running an application in the memory.
The embodiment of the invention provides a method and a device for generating an optimization model. The method comprises the following steps: acquiring training data; organizing the training data into user questions, questions in a knowledge base, intent category labels of the user questions, and similarity category labels; splicing the user question and the question in the knowledge base into a whole through a first character, and adding a second character at the head of the whole and the first character at its tail to serve as the input; modeling the input using BERT to obtain encoded vectors; selecting the vector corresponding to the second character as the feature vector; inputting the feature vector into a first fully connected layer for text similarity calculation and a second fully connected layer for intent classification, respectively, to obtain a text similarity score and an intent classification score; training the model based on the text similarity score, the intent classification score, and a loss function of the model that combines the first fully connected layer and the second fully connected layer; and selecting the model with the highest F1 value from the trained models as the best model. In this scheme, the intent classification task and the text matching task are combined so that the model learns the features of different intents; this enhances the model's understanding of the user's input question, assists the text matching task, and improves the text matching performance of the model.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for generating an optimization model, comprising:
acquiring training data;
organizing the training data into user questions, questions in a knowledge base, intention category labels and similarity category labels of the user questions;
splicing the user question and the question in the knowledge base into a whole through a first character, and adding a second character at the head of the whole and the first character at its tail, respectively, to be used as the input;
modeling using BERT based on the input to obtain a coded vector;
selecting the vector corresponding to the second character as a feature vector;
inputting the feature vector into a first fully connected layer for text similarity calculation and a second fully connected layer for intention classification, respectively, to obtain a text similarity score and an intention classification score;
training the model based on the text similarity score and the intent classification score and a loss function of the model that combines the first fully-connected layer and the second fully-connected layer;
and selecting the model with the highest F1 value from the trained models as the best model.
2. The method of claim 1, wherein the text similarity score is determined by the formula:
y_similarity = Softmax(F_similarity(h_i));
wherein y_similarity is the text similarity score, h_i is the feature vector, and F_similarity is the first fully connected layer used for text similarity calculation.
3. The method of claim 1, wherein the intent classification score is determined by the formula:
y_intent = Softmax(F_intent(h_i));
wherein y_intent is the intent classification score, h_i is the feature vector, and F_intent is the second fully connected layer used for intent classification.
4. The method of claim 1, wherein the loss function is:
L = α·L_similarity + (1 - α)·L_intent;
wherein L is the loss of the model combining the first fully connected layer and the second fully connected layer, L_similarity is the loss of the text matching task, L_intent is the loss of the intent classification task, and α (0 ≤ α ≤ 1) is a coefficient parameter controlling the ratio of the two losses.
5. The method of claim 4, wherein the penalty for the text matching task is:
L_similarity = -(1/N)·Σ_{i=1}^{N} [ y_i·log(y'_i) + (1 - y_i)·log(1 - y'_i) ];
wherein N is the number of samples, i is the i-th sample in the training data (1 ≤ i ≤ N), y_i is the sample similarity category label, and y'_i is the probability of a positive sample predicted by the model.
6. The method of claim 4, wherein the loss of the intent classification task is:
L_intent = -Σ_{j=1}^{K} l_j·log(p_j);
wherein K is the total number of sample intent categories, j is the j-th category among the intent categories (1 ≤ j ≤ K), l_j is the sample intent category label, and p_j is the probability computed by the model that the intent category is j.
7. The method of claim 1, wherein the network parameters of the model are updated by a back-propagation algorithm.
8. An optimization model generation apparatus, comprising:
the acquisition module is used for acquiring training data;
the organizing module is used for organizing the training data into user questions, questions in a knowledge base, intention category labels and similarity category labels of the user questions;
the splicing module is used for splicing the user question and the question in the knowledge base into a whole through a first character, and adding a second character at the head of the whole and the first character at its tail, respectively, to be used as the input;
a vector module for modeling using BERT based on the input to obtain a coded vector;
the selection module is used for selecting the vector corresponding to the second character as a feature vector;
the input module is used for inputting the feature vector into a first fully connected layer for text similarity calculation and a second fully connected layer for intention classification, respectively, to obtain a text similarity score and an intention classification score;
a training module, used for training the model based on the text similarity score, the intent classification score, and a loss function of the model that combines the first fully connected layer and the second fully connected layer;
and the processing module is used for selecting the model with the highest F1 value from the trained models as the best model.
9. The apparatus of claim 8, wherein the text similarity score is determined by the formula:
y_similarity = Softmax(F_similarity(h_i));
wherein y_similarity is the text similarity score, h_i is the feature vector, and F_similarity is the first fully connected layer used for text similarity calculation.
10. The apparatus of claim 8, wherein the intent classification score is determined by the formula:
y_intent = Softmax(F_intent(h_i));
wherein y_intent is the intent classification score, h_i is the feature vector, and F_intent is the second fully connected layer used for intent classification.
CN202110289744.XA 2021-03-16 2021-03-16 Generation method and device of optimization model Pending CN112905750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110289744.XA CN112905750A (en) 2021-03-16 2021-03-16 Generation method and device of optimization model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110289744.XA CN112905750A (en) 2021-03-16 2021-03-16 Generation method and device of optimization model

Publications (1)

Publication Number Publication Date
CN112905750A true CN112905750A (en) 2021-06-04

Family

ID=76105396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110289744.XA Pending CN112905750A (en) 2021-03-16 2021-03-16 Generation method and device of optimization model

Country Status (1)

Country Link
CN (1) CN112905750A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020248471A1 (en) * 2019-06-14 2020-12-17 华南理工大学 Aggregation cross-entropy loss function-based sequence recognition method
CN111538824A (en) * 2020-05-25 2020-08-14 武汉烽火普天信息技术有限公司 BERT-based intelligent question and answer implementation method and system
CN111651603A (en) * 2020-06-04 2020-09-11 上海电力大学 Power industry single-level text classification method and system based on LAV parameter fine adjustment
CN111860669A (en) * 2020-07-27 2020-10-30 平安科技(深圳)有限公司 Training method and device of OCR recognition model and computer equipment
CN112214998A (en) * 2020-11-16 2021-01-12 中国平安财产保险股份有限公司 Method, device, equipment and storage medium for joint identification of intention and entity

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515620A (en) * 2021-07-20 2021-10-19 云知声智能科技股份有限公司 Method and device for sorting technical standard documents of power equipment, electronic equipment and medium
CN114139497A (en) * 2021-12-13 2022-03-04 国家电网有限公司大数据中心 Text abstract extraction method based on BERTSUM model

Similar Documents

Publication Publication Date Title
CN112528676B (en) Document-level event argument extraction method
CN111522839B (en) Deep learning-based natural language query method
CN110457675A (en) Prediction model training method, device, storage medium and computer equipment
CN109325547A (en) Non-motor vehicle image multi-tag classification method, system, equipment and storage medium
CN107729309A (en) A kind of method and device of the Chinese semantic analysis based on deep learning
CN113987209A (en) Natural language processing method and device based on knowledge-guided prefix fine tuning, computing equipment and storage medium
CN110866542B (en) Depth representation learning method based on feature controllable fusion
CN111666406B (en) Short text classification prediction method based on word and label combination of self-attention
CN111898374B (en) Text recognition method, device, storage medium and electronic equipment
CN112256866B (en) Text fine-grained emotion analysis algorithm based on deep learning
CN112015868A (en) Question-answering method based on knowledge graph completion
CN113392209A (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN112115242A (en) Intelligent customer service question-answering system based on naive Bayes classification algorithm
CN112905750A (en) Generation method and device of optimization model
CN114510570A (en) Intention classification method and device based on small sample corpus and computer equipment
CN112988970A (en) Text matching algorithm serving intelligent question-answering system
CN114091406A (en) Intelligent text labeling method and system for knowledge extraction
CN113641809A (en) XLNET-BiGRU-CRF-based intelligent question answering method
CN117056451A (en) New energy automobile complaint text aspect-viewpoint pair extraction method based on context enhancement
CN111767720A (en) Title generation method, computer and readable storage medium
CN111222533B (en) Deep learning visual question-answering method and system based on dependency tree
CN115906846A (en) Document-level named entity identification method based on double-graph hierarchical feature fusion
US20220253630A1 (en) Optimized policy-based active learning for content detection
CN112015760B (en) Automatic question-answering method and device based on candidate answer set reordering and storage medium
CN115223021A (en) Visual question-answering-based fruit tree full-growth period farm work decision-making method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination