CN112765356B

CN112765356B - Training method and system of multi-intention recognition model

Info

Publication number: CN112765356B
Application number: CN202110123802.1A
Authority: CN
Inventors: 刘枭
Original assignee: Sipic Technology Co Ltd
Current assignee: Sipic Technology Co Ltd
Priority date: 2021-01-29
Filing date: 2021-01-29
Publication date: 2022-07-12
Anticipated expiration: 2041-01-29
Also published as: CN112765356A

Abstract

The embodiment of the invention provides a training method of a multi-intention recognition model. The method comprises the following steps: encoding the original labeled training data through an encoder to obtain a sentence vector; determining the probability of true positive, true negative, false negative and false positive of each intention in the sentence vector through a classifier; determining a differentiable soft-f loss function based on a true positive case, a true negative case, a false negative case and a false positive case; and (4) performing back propagation training on the multi-intention recognition model by using a soft-f loss function, and optimizing parameters of a classifier and an encoder until the multi-intention recognition model is trained. The embodiment of the invention also provides a training system of the multi-intention recognition model. The embodiment of the invention modifies the calculation mode of the F1 value to construct a differentiable loss function, which means that the optimization can be carried out by using the back propagation algorithms, thereby greatly simplifying the training process and improving the performance of the identification of the intention field.

Description

Training method and system of multi-intention recognition model

Technical Field

The invention relates to the field of intelligent voice, in particular to a training method and a training system for a multi-intention recognition model.

Background

In a dialogue system, all reasonable intentions in a user statement need to be recognized, for example, "yes, i want to send a courier," two intentions of "confirm" and "send a courier" need to be recognized. In general, the problem of multi-intention recognition can be modeled as a multi-label classification problem, and a plurality of two classification models are trained to recognize the intention in a one-vs-all mode.

A Binary cross entropy loss (Binary cross entropy loss) function, which is the most commonly used loss function for training Binary models, is typically used; a Hinge loss (Hinge loss) function, which is a loss function of a binary model for training maximum compartment classification, such as Support Vector Machines (SVMs) is also used. Compared with cross entropy loss, hinge loss generally brings better model generalization capability, but the output of a model trained by hinge loss has no good probability definition and cannot be used as the confidence of the model on a recognition result. The Focal loss is an improvement on the basis of cross entropy loss, so that the model can focus on difficult samples during training, and the model has better performance on the difficult samples.

In the process of implementing the invention, the inventor finds that at least the following problems exist in the related art:

in a dialogue system, F1 scores are generally used as an index for evaluating a certain intention recognition, and the performance of the entire intention recognition is expressed by a macro average of all intention F1 scores. However, these methods do not directly optimize the objective function of the macro-average correlation of F1 values during the training of the classification model, which leads to the following problems:

1. the trained output of the classification model does not provide an optimal F1 value macroaverage. The output of the model needs to be optimized, and an optimal threshold is found to make the value of F optimal, for example, if the output of the model for a certain class is 0 to 1, a threshold between the optimal threshold and the optimal threshold needs to be found to ensure the optimal performance;

2. the threshold value needs to be searched again each time the model is updated, and the whole process is complicated.

On a validation set, the output of each class of the model is tuned separately, finding an optimal threshold such that the macro-average of the F1 values is highest. The output of a model trained for hinge loss generally has no particularly good way to solve this type of problem, since there is no good probability definition. And because the F1 value is not trivial by definition, it cannot be directly used as a loss function for classification model training.

Disclosure of Invention

The method at least solves the problem that the training method in the prior art is not an objective function for directly optimizing the macro-average correlation of the F1 value and can not be directly used as a loss function for training a classification model.

In a first aspect, an embodiment of the present invention provides a training method for a multi-intent recognition model, including:

encoding the original labeled training data through an encoder to obtain a sentence vector;

determining the probability of true positive, true negative, false negative and false positive of each intention in the sentence vector through a classifier;

determining a differentiable soft-f loss function based on the true positive case, the true negative case, the false negative case, and the false positive case;

and carrying out back propagation training on the multi-intention recognition model by utilizing the soft-f loss function, wherein the back propagation training is used for optimizing the parameters of the classifier and the encoder until the multi-intention recognition model training is completed.

In a second aspect, an embodiment of the present invention provides a training system for a multi-intent recognition model, including:

the coding program module is used for coding the original marked training data through a coder to obtain a sentence vector;

the classification program module is used for determining the probability of true positive, true negative, false negative and false positive of each intention in the sentence vector through the classifier;

a loss function determination program module for determining a differentiable soft-f loss function based on the true positive, the true negative, the false negative, and the false positive;

and the training program module is used for carrying out back propagation training on the multi-intention recognition model by utilizing the soft-f loss function and optimizing the parameters of the classifier and the encoder until the multi-intention recognition model is trained.

In a third aspect, an electronic device is provided, comprising: the training system comprises at least one processor and a memory which is connected with the at least one processor in a communication mode, wherein the memory stores instructions which can be executed by the at least one processor, and the instructions are executed by the at least one processor so as to enable the at least one processor to execute the steps of the training method of the multi-purpose recognition model of any embodiment of the invention.

In a fourth aspect, an embodiment of the present invention provides a storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, implement the steps of the training method for a multi-intent recognition model according to any embodiment of the present invention.

The embodiment of the invention has the beneficial effects that: the calculation mode of the F1 value is modified to construct a differentiable loss function, which means that the back propagation algorithms can be used for optimization, the training process is greatly simplified, and the performance of the identification of the intention field is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flowchart of a training method for a multi-intent recognition model according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a training phase of a training method for a multi-intent recognition model according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating an inference phase of a training method for a multi-intent recognition model according to an embodiment of the present invention;

FIG. 4 is a comparison graph of the performance of a training method for a multi-intent recognition model according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of a training system for a multi-intent recognition model according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a training method for a multi-intent recognition model according to an embodiment of the present invention, which includes the following steps:

s11: encoding the original labeled training data through an encoder to obtain a sentence vector;

s12: determining the probability of true positive, true negative, false negative and false positive of each intention in the sentence vector through a classifier;

s13: determining a differentiable soft-f loss function based on the true positive case, the true negative case, the false negative case, and the false positive case;

and S14, carrying out back propagation training on the multi-intention recognition model by using the soft-f loss function, wherein the back propagation training is used for optimizing the parameters of the classifier and the encoder until the training of the multi-intention recognition model is completed.

In the embodiment, a differentiable loss function is constructed as a loss function for training the binary model by modifying the calculation mode of the F1 value.

For step S11, the implementation prepares the original annotation training labels with multiple intentions, for example, "yes, i want to send a courier", where "yes" and "send a courier" each have a respective intention. Assume multiple statements Q1, Q2, Q3. Wherein Q1 and Q3 contain intention A and Q2 does not contain intention A.

For step S12, the classification results for Q1, Q2, and Q3 using the classifier (classification model) are: q1 has an 80% probability of containing intention A, Q2 has a 10% probability of containing intention A, and Q3 has a 90% probability of containing intention A. Calculating a true positive case TP, a false positive case FP, a false negative case FN and a true negative case TN from the angle of probability:

TP＝1*0.8+1*0.9＝1.7；

FP＝1*0.1＝0.1；

FN＝1*0.2+1*0.1＝0.3；

TN＝1*0.9＝0.9。

for step S13, the differentiable soft-F loss function is determined in consideration of modifying the calculation method of the F1 value, and the relationship between the true case TP and the true negative case TN is additionally considered in the calculation method of the existing F1 value. And then the whole modification is carried out.

As an embodiment, in this embodiment, the soft-f loss function is:

the TP is a true positive case, the TN is a true negative case, the FP is a false positive case, and the FN is a false negative case.

In this embodiment, the formula is extended further:

where X represents the entire data set, X is one sample in X, y_xLabel (0 or 1), p, representing x_xRefers to the probability value (value between 0 and 1) that the prediction sample x is labeled 1.

In step S14, after determining the soft-f loss function in the multi-intent recognition model, the back propagation training is performed, so as to train the classifier and the encoder in the multi-intent recognition model as a whole.

Because soft-F is directly related to the F1 value, training with soft-F as a classification model loss function can obtain better macro-averaging of the F1 value of the final model. The optimal probability threshold of the model output is about 0.5 (as a better mode, the optimal probability threshold can be properly adjusted according to different requirements), searching is not needed, large changes caused by retraining are avoided, and the complexity of the process can be greatly reduced.

In the definition of the soft-f loss function, another case is also considered:

in this way, the loss function is designed only from the perspective of positive examples, and in practice, it is found that for some data sets with the most positive examples, the derivative of the loss function is always positive, which results in that the finally trained classification model predicts all classes as positive examples. To solve this problem, negative case dependent terms are therefore added to the loss function.

It can be seen from this embodiment that, by modifying the calculation mode of the F1 value, a differentiable loss function is constructed, and in the deep learning-based classification system, a back propagation algorithm is relied on to perform gradient descent optimization on the parameters of the classification model, which requires that the loss function we use is differentiable. The F1 value is used as a common index, the index is not differentiable in a common calculation mode, cross entropy and the like are used as loss functions to perform indirect optimization, a differentiable F1 value function is constructed, the F1 value is optimized directly, better performance is brought compared with indirect optimization, the training process is simplified greatly, and the performance of identification of the intention field is improved.

As an embodiment, the encoding, by an encoder, the original annotation training data includes: encoding the original labeling training data through a BERT encoder to obtain a sentence vector; performing label vectorization on the original labeling training data to obtain a vectorized intention label;

calculating a soft-f loss function based on the vectorized intent labels and the probabilities of the intents in the sentence vectors.

In the present embodiment, as shown in fig. 2, a specific flowchart of the training phase is shown.

Tagging annotation data is vectorized, for example, tags of a common intention in the data are three intentions of a, B and C, tags of a sentence Q1 are an intention a and an intention B, and then vectorized is 110, tags of a sentence Q2 are an intention a and an intention C, and then vectorized is 101.

The sentences are encoded into d-dimensional sentence vectors using BERT (bidirectional Encoder retrieval from transformations). The network architecture of the BERT uses a multi-layer Transformer structure, and has the biggest characteristic that the traditional RNN and CNN are abandoned, and the distance between two words at any position is converted into 1 through an Attention mechanism, so that the problem of troublesome long-term dependence in NLP is effectively solved.

The sentence vectors are passed through a classifier formed by a fully-connected network to output the probability of each intention, and the true case TP, the false positive case FP, the false negative case FN and the true negative case TN are not described herein again. And further calculating soft-f loss by using vectorized labels and the probability output by the classifier, and optimizing parameters of the classifier and a sentence vector encoder through a back propagation algorithm, thereby further improving the performance of identification of the intention field.

As an embodiment, the method further comprises:

recognizing a text corresponding to a sentence input by a user, and converting the text into a sentence vector through an encoder in a multi-intention recognition model;

determining probability values of all intentions in the sentence vector through the multi-intention recognition intra-model classifier, and outputting at least one predicted intention with the probability value higher than a preset threshold value.

In this embodiment, as shown in the flowchart of the inference and prediction stage shown in fig. 3, a text to be predicted is input into a sentence vector encoder in a trained multi-intent recognition model, so as to obtain a sentence vector of the text to be predicted.

A classifier in a multi-intent recognition based model determines probability values for each intent in a sentence vector of text to be predicted. For example, if the probability value of "express delivery field" in "i want to send an express delivery" is 0.935, and exceeds the preset 0.5 threshold, the intention of "express delivery field" is output.

Further, the method further comprises:

and checking at least one output predicted intention based on at least one actual intention preset in a text corresponding to the user input sentence, and determining the performance of the multi-intention recognition model based on the checking result.

In this embodiment, the experimental data is derived from a customer service dialogue system in the express delivery field and the financial field. The express delivery field data has 10 intention categories, the training set size is 16112, and the test set size is 4000. The financial field data has 15 intention categories in total, the training data size is 8179, and the test set size is 2000. The base line method and the sentence vector encoder in the method both use BERT, the classifier both use two layers of fully connected neural networks, 0.5 is adopted as a threshold value of all the classification outputs of the classifier, and the macro-average of F1 score is used as an evaluation index, and the result on the test set is shown in figure 4. From experimental results, it can be found that, due to the direct optimization of the F1 value, when the threshold value output by the classifier is not adjusted, relative cross entropy loss is caused, and the performance is obviously improved when soft-F loss is used.

Fig. 5 is a schematic structural diagram of a training system for multiple intention recognition models according to an embodiment of the present invention, which can execute the training method for multiple intention recognition models according to any of the above embodiments and is configured in a terminal.

The training system 10 for the multi-intent recognition model provided by the embodiment comprises: a coding program module 11, a classification program module 12, a loss function determination program module 13 and a training program module 14.

The encoding program module 11 is configured to encode the original labeled training data through an encoder to obtain a sentence vector; the classifier module 12 is configured to determine, through the classifier, probabilities of true positive, true negative, false negative, and false positive of each intention in the sentence vector; the loss function determination program module 13 is used for determining a differentiable soft-f loss function based on the true positive case, the true negative case, the false negative case and the false positive case; the training program module 14 is configured to train the multi-intent recognition model using the soft-f loss function in a back propagation manner, so as to optimize the parameters of the classifier and the encoder until the training of the multi-intent recognition model is completed.

Further, the soft-f loss function is:

Further, the encoded program modules are for: encoding the original labeling training data through a BERT encoder to obtain a sentence vector;

the method further comprises the following steps: the label vectorization program module is used for carrying out label vectorization on the original labeling training data to obtain a vectorized intention label;

a loss function determination program module for calculating a soft-f loss function based on the vectorized intent tags and the probabilities of the intents in the sentence vector.

The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the training method of the multi-intention recognition model in any method embodiment;

as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:

As a non-transitory computer-readable storage medium, it may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in a non-transitory computer readable storage medium, which when executed by a processor, perform a method of training a multi-intent recognition model in any of the method embodiments described above.

The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory located remotely from the processor, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

An embodiment of the present invention further provides an electronic device, which includes: the training system comprises at least one processor and a memory which is connected with the at least one processor in a communication mode, wherein the memory stores instructions which can be executed by the at least one processor, and the instructions are executed by the at least one processor so as to enable the at least one processor to execute the steps of the training method of the multi-purpose recognition model of any embodiment of the invention.

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) mobile communication devices, which are characterized by mobile communication capabilities and are primarily targeted at providing voice and data communications. Such terminals include smart phones, multimedia phones, functional phones, and low-end phones, among others.

(2) The ultra-mobile personal computer equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as tablet computers.

(3) Portable entertainment devices such devices may display and play multimedia content. The devices comprise audio and video players, handheld game consoles, electronic books, intelligent toys and portable vehicle-mounted navigation devices.

(4) Other electronic devices with data processing capabilities.

As used herein, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A training method of a multi-intention recognition model comprises the following steps:

back-propagating training the multi-intent recognition model by using the soft-f loss function for optimizing the parameters of the classifier and the encoder until the multi-intent recognition model training is completed,

wherein the soft-f loss function is:

2. The method of claim 1, wherein said encoding, by an encoder, original annotation training data comprises: encoding the original labeling training data through a BERT encoder to obtain a sentence vector;

the method further comprises the following steps:

performing label vectorization on the original labeling training data to obtain a vectorized intention label;

calculating a soft-f loss function based on the vectorized intent labels and the probabilities of the intents in the sentence vector.

3. The method of claim 1, wherein the method further comprises:

determining probability values of all intentions in the sentence vector through the classifier in the multi-intention recognition model, and outputting at least one predicted intention with the probability value higher than a preset threshold value.

4. The method of claim 3, wherein the method further comprises:

and checking at least one output predicted intention based on at least one preset actual intention in a text corresponding to the user input sentence, and determining the performance of the multi-intention recognition model based on the checking result.

5. A training system for a multi-intent recognition model, comprising:

a training program module for back propagation training of the multi-intent recognition model by using the soft-f loss function, for optimizing the parameters of the classifier and the encoder until the multi-intent recognition model training is completed,

wherein the soft-f loss function is:

6. The system of claim 5, wherein the encoded program modules are to: encoding the original labeling training data through a BERT encoder to obtain a sentence vector;

the system further comprises: the label vectorization program module is used for carrying out label vectorization on the original labeling training data to obtain a vectorized intention label;

7. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-4.

8. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.