CN113887643B - New dialogue intention recognition method based on pseudo tag self-training and source domain retraining - Google Patents

New dialogue intention recognition method based on pseudo tag self-training and source domain retraining

Info

Publication number
CN113887643B
Authority
CN
China
Prior art keywords
model
training
pseudo
self
retraining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111187641.9A
Other languages
Chinese (zh)
Other versions
CN113887643A (en)
Inventor
田锋
安文斌
郑庆华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202111187641.9A priority Critical patent/CN113887643B/en
Publication of CN113887643A publication Critical patent/CN113887643A/en
Application granted granted Critical
Publication of CN113887643B publication Critical patent/CN113887643B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a new dialogue intention recognition method based on pseudo tag self-training and source domain retraining, belonging to the technical field of language processing. The method generates pseudo tags for unlabeled data containing new dialogue intentions and iteratively updates the model parameters with a self-training method, so that recognition accuracy improves continuously. A retraining strategy is also provided so that knowledge migrates better between the source domain and the target domain, improving the expressive capacity of the model. Finally, the outputs of three models are fused for ensemble learning, improving the robustness of the model.

Description

New dialogue intention recognition method based on pseudo tag self-training and source domain retraining
Technical Field
The invention belongs to the technical field of language processing, and particularly relates to a new dialogue intention recognition method based on pseudo tag self-training and source domain retraining.
Background
The core module of an intelligent dialogue system is user intention recognition. New dialogue intention recognition aims to discover newly emerging dialogue intentions on the basis of existing ones, using only a small amount of labeled known-intention data to discover and classify new intentions in a large amount of unlabeled data. Because the data containing new intentions is entirely unlabeled, existing dialogue intention classification models cannot process it, which leads to user intention recognition errors and affects the subsequent responses of the intelligent dialogue system.
In order to solve the above problems, two kinds of methods are mainly adopted in current academia: 1. Methods based on contrastive learning: for example, Lin et al. propose an adaptive clustering model based on contrastive similarity, which selects high-quality data for self-labeling and training through contrastive learning. 2. Methods based on deep clustering: for example, Zhang et al. propose a new dialogue intention discovery model based on deep aligned clustering, which generates pseudo tags for unlabeled data through clustering and alignment operations and then trains the model. The above technical schemes have the following defects. First, the pseudo tags that existing models generate for unlabeled data are coarse-grained, so the models cannot be trained well enough to discover new dialogue intentions. Second, existing models use the labeled data only to initialize the model and fail to fully exploit it during training, which makes knowledge migration difficult. Third, existing models rely only on a clustering model to produce the final prediction and do not fuse other models for ensemble learning, which reduces the robustness of the model.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a new dialog intention recognition method based on pseudo tag self-training and source domain retraining.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
a new dialog intention recognition method based on pseudo tag self-training and source domain retraining comprises the following steps:
1. model training
1) Feature extraction is performed on the input using model Net1 and model Net2 to obtain two vector representations of the i-th input, and the two representations are weighted and combined to obtain the final representation of the i-th input;
the model Net1 and the model Net2 are BERT models with the same structure;
2) The final vector representations of the inputs are clustered with the clustering model Kmeans to obtain, for each sample, a pseudo tag of the category to which it belongs;
a Softmax classifier is applied to the vector representations produced by Net1 and Net2 to perform classification, obtaining two further groups of pseudo tags;
the two vector representations are linearly transformed to obtain the predictive probability distribution vectors of the model for the i-th input;
using a self-training method, the cross-entropy loss between each predictive probability distribution vector and its pseudo tags is computed, the losses are summed over all input samples to obtain the self-training loss values, and these loss values are weighted and combined to obtain the final pseudo tag self-training loss value;
3) Model Net1 and model Net2 are retrained using the labeled data, and the cross-entropy loss function is used to compute the difference between the predictive probability distribution vectors and the true labels, so as to obtain the labeled-training loss value;
4) The pseudo tag self-training loss value and the labeled-training loss value are weighted and combined to obtain the final loss value, and the model parameters are continuously updated through back propagation, so that the difference between the model's predictions and the true values is within a preset range;
5) The dialogue text to be classified is input into the model Kmeans, the model Net1 and the model Net2 respectively for label prediction, the obtained prediction labels are weighted and combined, and the combined result is finally classified with a Softmax classifier.
Further, the specific operation of feature extraction in step 1) is:
For the i-th input text s_i, feature extraction is performed with model Net1 and model Net2 respectively to obtain two vector representations,
where MeanPooling denotes average pooling over the final output of the BERT model, the hidden-layer representations of the input are passed through a linear layer, σ denotes the ReLU activation function, and W_a, b_a are learnable parameters.
Further, the final representation in step 1) is obtained as follows:
the two vector representations of s_i are weighted and combined to obtain the final vector representation of s_i,
where λ_f is a hyper-parameter with a preset value, used to balance the weights of the two vector representations.
Further, the specific operation of generating the pseudo tag in the step 2) is as follows:
The final vector representation of s_i is input into the clustering model Kmeans to obtain the pseudo tag of the corresponding category,
where μ_j is the vector representation of the center of the j-th cluster during clustering, 1(·) denotes the indicator function, and ||·||_2 is the Euclidean distance;
the Softmax classifier is applied to the input vectors extracted by model Net1 and model Net2 to perform classification, obtaining two further groups of pseudo tags to which each sample belongs;
the input vectors are transformed by a linear layer to obtain the predictive probability distribution vectors of model Net1 and model Net2 for the i-th input,
where W_c, b_c, W_m, b_m are all learnable parameters.
Further, the specific operation of obtaining the self-training loss value in step 2) is as follows:
using the self-training method, the cross-entropy loss function is used to compute the difference between each predictive probability distribution vector and its pseudo tags, and the losses are summed over all input samples to obtain the self-training loss values,
where N is the number of samples and exp() is the exponential function;
the loss values are weighted and combined to obtain the final pseudo tag self-training loss value,
where λ_s1 and λ_s2 are both hyper-parameters with preset values.
Further, the specific operation of obtaining the labeled-training loss value in step 3) is as follows:
the cross-entropy loss function is used to compute the difference between the predictive probability distribution vector of model Net1 and the true labels, and the differences are summed over all labeled input samples to obtain a loss value;
the cross-entropy loss function is used to compute the difference between the predictive probability distribution vector of model Net2 and the true labels, and the differences are summed over all labeled input samples to obtain another loss value,
where M is the number of labeled data;
the losses obtained by retraining model Net1 and model Net2 on the labeled data are weighted and combined to obtain the total labeled-training loss value,
where λ_l1 is a hyper-parameter with a preset value.
Further, in step 4), the specific operation of updating the model parameters using back propagation is:
The pseudo tag self-training loss value and the labeled-training loss value are weighted and combined to obtain the overall loss of the model,
where λ_t is a hyper-parameter with a preset value, used to balance the weights of the two losses;
after the overall loss is obtained, the parameters of model Net1 and model Net2 are continuously updated with the back propagation algorithm, so that the difference between the model's predictions and the true values is within a preset range.
Further, the specific operation of predicting the input category when the model is used is as follows:
The dialogue text to be classified is input into the trained model Net1, the trained model Net2 and the clustering model Kmeans respectively to obtain prediction labels y_1, y_2 and y_3, and the three prediction labels are weighted and combined to obtain the final prediction label y_vote:
y_vote = λ_y1·y_1 + λ_y2·y_2 + (1 - λ_y1 - λ_y2)·y_3    (20)
The Softmax classifier is used to classify the prediction label y_vote, and the class with the maximum value y is taken as the final prediction category:
y = max(Softmax(y_vote))    (21)
compared with the prior art, the invention has the following beneficial effects:
according to the new dialogue intention recognition method based on the pseudo tag self-training and the source domain retraining, the pseudo tag is generated for unlabeled data containing the new dialogue intention, and model parameters are iteratively updated by using the self-training method, so that the recognition accuracy is continuously improved; meanwhile, a retraining strategy is provided, so that knowledge can be better migrated between a source domain and a target domain, and the expression capacity of the model is improved; finally, the invention integrates the output of the three models for integrated learning, and improves the robustness of the models.
Drawings
Fig. 1 is a diagram showing the overall network configuration in the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to solve the problems in the prior art, the invention provides a new dialogue intention recognition method based on pseudo tag self-training and source domain retraining, which generates pseudo tags for unlabeled data containing new dialogue intentions and iteratively updates the model parameters with a self-training method, so that recognition accuracy improves continuously; meanwhile, the invention provides a retraining strategy so that knowledge migrates better between the source domain and the target domain, thereby improving the expressive capacity of the model; finally, the invention fuses the outputs of three models for ensemble learning, improving the robustness of the model.
The invention is described in further detail below with reference to the attached drawing figures:
referring to fig. 1, fig. 1 is a schematic diagram of a network model of the present invention, which includes a cluster model Kmeans and two BERT models Net1, net2. The three models can generate different pseudo tags, and the obtained pseudo tags are utilized for self-training, so that the performance of the models is continuously improved. And after training, combining the outputs of the three models to obtain a final predicted value.
1. Model training:
step 1: feature extraction of the input using two structurally identical pre-trained models BERT (model Net1 and model Net 2) to obtain vector representations of the ith input, respectivelyAnd->
Where MeanPooling represents an average pooling of the final output of the BERT model,and->For hidden layer representation of the input, σ represents the ReLU activation function, W a ,b a Is a learnable parameter.
Thereafter, for s i Is represented by two vectors of (a)Weighted combination is carried out to obtain s i Final vector representation of (a)
Wherein lambda is f The super-parameters are set in advance and used for balancing the weights occupied by the two vector representations.
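By way of illustration only, a minimal sketch of this step under a PyTorch/HuggingFace setting is given below; the class name Encoder, the checkpoint bert-base-uncased, the feature dimension and the value of λ_f are assumptions introduced for the example and do not reproduce the patent's formulas.

```python
# Illustrative sketch of Step 1 (names, dimensions and lambda_f are assumed values).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class Encoder(nn.Module):
    """BERT encoder -> mean pooling -> ReLU-activated linear layer (W_a, b_a)."""
    def __init__(self, hidden_size: int = 768, feat_size: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.proj = nn.Linear(hidden_size, feat_size)  # learnable W_a, b_a

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)  # MeanPooling
        return torch.relu(self.proj(pooled))                                # sigma = ReLU

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
net1, net2 = Encoder(), Encoder()   # two structurally identical BERT models
lambda_f = 0.5                      # preset hyper-parameter balancing the two representations

batch = tokenizer(["how do I reset my password"], return_tensors="pt", padding=True)
z1 = net1(batch["input_ids"], batch["attention_mask"])   # representation from Net1
z2 = net2(batch["input_ids"], batch["attention_mask"])   # representation from Net2
z = lambda_f * z1 + (1.0 - lambda_f) * z2                 # assumed form of the weighted combination
```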
Step 2: final vector representation of input using cluster model KmeansClustering to obtain pseudo tags of the category to which each sample belongs>
Will s i Final vector representation of (a)Inputting into a cluster model Kmeans to obtain pseudo tags of corresponding categories
Wherein mu j For the vector representation of the center of the j-th cluster in the clustering process,in order to indicate the function, I.I. | 2 Is the Euclidean distance.
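A possible realization of this step with scikit-learn is sketched below; the number of clusters and the helper name are assumptions, and the cluster index returned by K-means plays the role of the pseudo tag whose nearest center is μ_j.

```python
# Illustrative sketch of Step 2: K-means pseudo tags from the final representations.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_pseudo_labels(features: np.ndarray, n_clusters: int, seed: int = 0):
    """features: (N, d) array of final representations for all samples.
    Returns each sample's cluster assignment (its pseudo tag) and the centers mu_j."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    # Each sample is assigned to the cluster whose center mu_j is nearest in
    # Euclidean distance; KMeans.labels_ stores exactly that assignment.
    return km.labels_, km.cluster_centers_

# Example usage (77 is an assumed total number of known + new intention categories):
# pseudo_k, centers = kmeans_pseudo_labels(all_features, n_clusters=77)
```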
Step 3: vector representation of input using Softmax classifierAnd->Respectively performing classification operation to obtain two other groups of pseudo tags ++>And->
Thereafter, by the pair ofAnd->Performing linear transformation to obtain predictive probability distribution vector of the model on the ith input>And->
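The sketch below shows one way to obtain the classifier pseudo tags and the predictive probability distribution vectors. The patent's formulas are not reproduced in the text above, so using a single linear head per network (W_c, b_c for Net1 and W_m, b_m for Net2) whose Softmax output supplies both the pseudo tag and the distribution is an assumption.

```python
# Illustrative sketch of Step 3 (layer shapes and the shared-head layout are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_size, n_classes = 256, 77                 # assumed dimensions
head1 = nn.Linear(feat_size, n_classes)        # W_c, b_c for Net1
head2 = nn.Linear(feat_size, n_classes)        # W_m, b_m for Net2

def classify(z1: torch.Tensor, z2: torch.Tensor):
    logits1, logits2 = head1(z1), head2(z2)
    p1 = F.softmax(logits1, dim=-1)            # predictive probability distribution of Net1
    p2 = F.softmax(logits2, dim=-1)            # predictive probability distribution of Net2
    pseudo1 = p1.argmax(dim=-1)                # second group of pseudo tags (from Net1)
    pseudo2 = p2.argmax(dim=-1)                # third group of pseudo tags (from Net2)
    return logits1, logits2, p1, p2, pseudo1, pseudo2
```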
Step 4: by a self-training method, respectively calculating prediction probability distribution vectors by using a cross entropy loss functionAnd pseudo tag->Is predictive of the difference of probability distribution vector +.>And pseudo tag->Is a difference of (1) and a predictive probability distribution vectorAnd pseudo tag->And summing all input samples to obtain a loss value +.>
Where N is the number of samples and exp () is an exponential function.
Then, for the loss valueWeighting and combining to obtain final loss value of pseudo tag self-training
Wherein lambda is s1 And lambda (lambda) s2 All are super parameters, and need to be set in advance for balancing the weights occupied by the three losses.
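A sketch of the self-training loss is given below. The exact pairing of probability distributions with pseudo-tag groups and the way λ_s1 and λ_s2 weight the three component losses are not recoverable from the text here, so the pairing and the combination form used are assumptions.

```python
# Illustrative sketch of Step 4 (pairings and weighting form are assumptions).
import torch
import torch.nn.functional as F

def self_training_loss(logits1, logits2, pseudo_k, pseudo1, pseudo2,
                       lambda_s1: float = 0.4, lambda_s2: float = 0.3):
    """logits1/logits2: (N, C) outputs of the two networks;
    pseudo_k, pseudo1, pseudo2: LongTensors of pseudo-tag class indices."""
    # Component 1: both networks against the K-means pseudo tags.
    l_kmeans = F.cross_entropy(logits1, pseudo_k) + F.cross_entropy(logits2, pseudo_k)
    # Component 2: Net1 against the pseudo tags produced by Net2's classifier.
    l_cross1 = F.cross_entropy(logits1, pseudo2)
    # Component 3: Net2 against the pseudo tags produced by Net1's classifier.
    l_cross2 = F.cross_entropy(logits2, pseudo1)
    # Weighted combination of the three losses with lambda_s1 and lambda_s2.
    return (lambda_s1 * l_kmeans
            + lambda_s2 * l_cross1
            + (1.0 - lambda_s1 - lambda_s2) * l_cross2)
```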
Step 5: retraining model Net1 and model Net2 using labeled data, calculating model predictive probability distribution vectors via cross entropy loss functionsAnd->And (3) true label->The difference between them, a loss value of +.>And->
Wherein M is the number of marked data.
Then, model Net1 and model Net2 are retrained with the loss obtained by the labeling dataAnd->Weighting and combining to obtain the total loss value +.>
Wherein lambda is l1 Is super-parameter and needs to be set in advance for balancing the lossAnd->The weight occupied.
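A corresponding sketch of the labeled retraining loss follows; the convex combination with λ_l1 is an assumed form of the weighting described above.

```python
# Illustrative sketch of Step 5 (weighting form is an assumption).
import torch
import torch.nn.functional as F

def labeled_loss(logits1_lab: torch.Tensor, logits2_lab: torch.Tensor,
                 true_labels: torch.Tensor, lambda_l1: float = 0.5):
    """Logits over the M labeled samples only; true_labels: LongTensor of ground-truth classes."""
    l1 = F.cross_entropy(logits1_lab, true_labels)   # Net1 vs. true labels, averaged over M samples
    l2 = F.cross_entropy(logits2_lab, true_labels)   # Net2 vs. true labels
    return lambda_l1 * l1 + (1.0 - lambda_l1) * l2
```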
Step 6: self-training penalty for pseudo tagsAnd->Weighted combination is carried out to obtain a final loss value
Wherein lambda is t Is super-parameter and needs to be set in advance for balancing the lossAnd->The weight occupied.
Step 7: obtaining the integral lossAnd then, continuously updating parameters of the model Net1 and the model Net2 by using a back propagation algorithm, so that the predicted value of the model gradually approaches to the true value.
Step 8: model training
Gradients are updated with an Adam optimizer; the learning rate is set to 0.0001, the first-order momentum parameter of Adam is set to 0.1, the second-order momentum parameter is set to 0.999, and the number of training iterations (epochs) over the dataset is set to 100. The parameters of the pre-trained BERT model are fixed, and the hyper-parameters are selected according to the performance of the model on the validation set.
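The optimizer configuration stated above can be realized as in the sketch below, assuming the Encoder instances net1 and net2 from the Step 1 sketch; freezing only the BERT backbone while keeping the projection and classifier heads trainable is an assumption about which parameters remain trainable.

```python
# Illustrative sketch of Step 8 with the stated hyper-parameters:
# lr = 0.0001, Adam betas = (0.1, 0.999), 100 epochs, pre-trained BERT parameters fixed.
import torch

for model in (net1, net2):
    for p in model.bert.parameters():
        p.requires_grad = False                      # fix the pre-trained BERT parameters

trainable = [p for m in (net1, net2) for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4, betas=(0.1, 0.999))

for epoch in range(100):                             # training iterations (epochs) over the dataset
    # For each batch: recompute pseudo tags, evaluate the losses of Steps 4-6,
    # then back-propagate as in Step 7:
    # optimizer.zero_grad(); loss.backward(); optimizer.step()
    pass
```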
2. Model use
The dialogue text to be classified is input into the trained model Net1, the trained model Net2 and the clustering model Kmeans respectively to obtain prediction labels y_1, y_2 and y_3, and the three prediction labels are weighted and combined to obtain the final prediction label y_vote:
y_vote = λ_y1·y_1 + λ_y2·y_2 + (1 - λ_y1 - λ_y2)·y_3    (20)
The Softmax classifier is used to classify the prediction label y_vote, and the class with the maximum value y is taken as the final prediction category:
y = max(Softmax(y_vote))    (21)
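A sketch of the ensemble prediction of equations (20) and (21) is given below; representing each prediction label as a one-hot vector so that the weighted sum in equation (20) is well defined is an assumption, as are the λ_y1 and λ_y2 values.

```python
# Illustrative sketch of the model-use stage, equations (20)-(21).
import torch
import torch.nn.functional as F

def ensemble_predict(y1: torch.Tensor, y2: torch.Tensor, y3: torch.Tensor,
                     n_classes: int, lambda_y1: float = 0.4, lambda_y2: float = 0.4):
    """y1, y2, y3: LongTensors of predicted class indices from Net1, Net2 and Kmeans."""
    def one_hot(y):
        return F.one_hot(y, num_classes=n_classes).float()
    # Equation (20): weighted vote over the three prediction labels.
    y_vote = (lambda_y1 * one_hot(y1) + lambda_y2 * one_hot(y2)
              + (1.0 - lambda_y1 - lambda_y2) * one_hot(y3))
    # Equation (21): Softmax over the vote, then take the class with the maximum value.
    return torch.argmax(F.softmax(y_vote, dim=-1), dim=-1)
```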
To measure model performance, comparative experiments were conducted on three widely used public intent recognition datasets; the training and test set splits of each dataset, the number of known intention categories and the vocabulary sizes are shown in Table 1. Table 2 shows the results of the comparative experiments against fifteen common models on the metrics accuracy (Acc), ARI and NMI. As the table shows, the model PTRN of the present invention achieves the best results on all metrics of all datasets and improves considerably on the current best methods.
Table 1 statistics of data sets used to measure model performance
Table 2 shows the accuracy (Acc), ARI and NMI values of the comparative model over different data sets, wherein PTRN is the method of the present invention.
Table 2 performance of comparative models on different data sets
The above is only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited by this, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (8)

1. A new dialogue intention recognition method based on pseudo tag self-training and source domain retraining is characterized by comprising the following steps:
1) Feature extraction is performed on the input using model Net1 and model Net2 to obtain two vector representations of the i-th input, and the two representations are weighted and combined to obtain the final representation of the i-th input;
the model Net1 and the model Net2 are BERT models with the same structure;
2) The final vector representations of the inputs are clustered with the clustering model Kmeans to obtain, for each sample, a pseudo tag of the category to which it belongs;
a Softmax classifier is applied to the vector representations produced by Net1 and Net2 to perform classification, obtaining two further groups of pseudo tags to which each sample belongs;
the two vector representations are linearly transformed to obtain the predictive probability distribution vectors of the model for the i-th input;
using a self-training method, the cross-entropy loss between each predictive probability distribution vector and its pseudo tags is computed, the losses are summed over all input samples to obtain the self-training loss values, and these loss values are weighted and combined to obtain the final pseudo tag self-training loss value;
3) Model Net1 and model Net2 are retrained using the labeled data, and the cross-entropy loss function is used to compute the difference between the predictive probability distribution vectors and the true labels, so as to obtain the labeled-training loss value;
4) The pseudo tag self-training loss value and the labeled-training loss value are weighted and combined to obtain the final loss value, and the model parameters are continuously updated through back propagation, so that the difference between the model's predictions and the true values is within a preset range;
5) The dialogue text to be classified is input into the model Kmeans, the model Net1 and the model Net2 respectively for label prediction, the obtained prediction labels are weighted and combined, and the combined result is finally classified with a Softmax classifier.
2. The new dialog intention recognition method based on pseudo tag self-training and source domain retraining of claim 1, wherein the feature extraction in step 1) is specifically performed as follows:
For the i-th input text s_i, feature extraction is performed with model Net1 and model Net2 respectively to obtain two vector representations,
where MeanPooling denotes average pooling over the final output of the BERT model, the hidden-layer representations of the input are passed through a linear layer, σ denotes the ReLU activation function, and W_a, b_a are learnable parameters.
3. The new dialogue intention recognition method based on pseudo tag self-training and source domain retraining of claim 2, wherein the final representation in step 1) is obtained as follows:
the two vector representations of s_i are weighted and combined to obtain the final vector representation of s_i,
where λ_f is a hyper-parameter with a preset value, used to balance the weights of the two vector representations.
4. The new dialog intention recognition method based on pseudo tag self-training and source domain retraining of claim 1, wherein the specific operations of pseudo tag generation of step 2) are:
The final vector representation of s_i is input into the clustering model Kmeans to obtain the pseudo tag of the corresponding category,
where μ_j is the vector representation of the center of the j-th cluster during clustering, 1(·) denotes the indicator function, and ||·||_2 is the Euclidean distance;
the Softmax classifier is applied to the input vectors extracted by model Net1 and model Net2 to perform classification, obtaining two further groups of pseudo tags to which each sample belongs;
the input vectors are transformed by a linear layer to obtain the predictive probability distribution vectors of model Net1 and model Net2 for the i-th input,
where W_c, b_c, W_m, b_m are all learnable parameters.
5. The new dialogue intention recognition method based on pseudo tag self-training and source domain retraining of claim 4, wherein the specific operation of obtaining the self-training loss value in step 2) is as follows:
using the self-training method, the cross-entropy loss function is used to compute the difference between each predictive probability distribution vector and its pseudo tags, and the losses are summed over all input samples to obtain the self-training loss values,
where N is the number of samples and exp() is the exponential function;
the loss values are weighted and combined to obtain the final pseudo tag self-training loss value,
where λ_s1 and λ_s2 are both hyper-parameters with preset values.
6. The new dialogue intention recognition method based on pseudo tag self-training and source domain retraining of claim 1, wherein the specific operation of obtaining the labeled-training loss value in step 3) is as follows:
the cross-entropy loss function is used to compute the difference between the predictive probability distribution vector of model Net1 and the true labels, and the differences are summed over all labeled input samples to obtain a loss value;
the cross-entropy loss function is used to compute the difference between the predictive probability distribution vector of model Net2 and the true labels, and the differences are summed over all labeled input samples to obtain another loss value,
where M is the number of labeled data;
the losses obtained by retraining model Net1 and model Net2 on the labeled data are weighted and combined to obtain the total labeled-training loss value,
where λ_l1 is a hyper-parameter with a preset value.
7. The new dialog intention recognition method based on pseudo tag self-training and source domain retraining of claim 1, wherein in step 4), the specific operations of updating model parameters using back propagation are:
The pseudo tag self-training loss value and the labeled-training loss value are weighted and combined to obtain the overall loss of the model,
where λ_t is a hyper-parameter with a preset value, used to balance the weights of the two losses;
after the overall loss is obtained, the parameters of model Net1 and model Net2 are continuously updated with the back propagation algorithm, so that the difference between the model's predictions and the true values is within a preset range.
8. The new dialog intention recognition method based on pseudo tag self-training and source domain retraining of claim 1, wherein the specific operation of predicting the input class when the model of step 5) is used is:
The dialogue text to be classified is input into the trained model Net1, the trained model Net2 and the clustering model Kmeans respectively to obtain prediction labels y_1, y_2 and y_3, and the three prediction labels are weighted and combined to obtain the final prediction label y_vote:
y_vote = λ_y1·y_1 + λ_y2·y_2 + (1 - λ_y1 - λ_y2)·y_3    (20)
The Softmax classifier is used to classify the prediction label y_vote, and the class with the maximum value y is taken as the final prediction category:
y = max(Softmax(y_vote))    (21).
CN202111187641.9A 2021-10-12 2021-10-12 New dialogue intention recognition method based on pseudo tag self-training and source domain retraining Active CN113887643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111187641.9A CN113887643B (en) 2021-10-12 2021-10-12 New dialogue intention recognition method based on pseudo tag self-training and source domain retraining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111187641.9A CN113887643B (en) 2021-10-12 2021-10-12 New dialogue intention recognition method based on pseudo tag self-training and source domain retraining

Publications (2)

Publication Number Publication Date
CN113887643A CN113887643A (en) 2022-01-04
CN113887643B (en) 2023-07-18

Family

ID=79006290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111187641.9A Active CN113887643B (en) 2021-10-12 2021-10-12 New dialogue intention recognition method based on pseudo tag self-training and source domain retraining

Country Status (1)

Country Link
CN (1) CN113887643B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114170484B (en) * 2022-02-11 2022-05-27 中科视语(北京)科技有限公司 Picture attribute prediction method and device, electronic equipment and storage medium
CN114637848A (en) * 2022-03-15 2022-06-17 美的集团(上海)有限公司 Semantic classification method and device
CN115168593B (en) * 2022-09-05 2022-11-29 深圳爱莫科技有限公司 Intelligent dialogue management method capable of self-learning and processing equipment
CN115512696A (en) * 2022-09-20 2022-12-23 中国第一汽车股份有限公司 Simulation training method and vehicle

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378632A (en) * 2021-04-28 2021-09-10 南京大学 Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10580176B2 (en) * 2018-06-28 2020-03-03 Microsoft Technology Licensing, Llc Visualization of user intent in virtual agent interaction
JP6695004B1 (en) * 2019-04-03 2020-05-20 ガンホー・オンライン・エンターテイメント株式会社 A terminal device that uses the detected intention of the user
CN110298391B (en) * 2019-06-12 2023-05-02 同济大学 Iterative incremental dialogue intention type recognition method based on small sample
CN110377911B (en) * 2019-07-23 2023-07-21 中国工商银行股份有限公司 Method and device for identifying intention under dialog framework
CN113297360B (en) * 2021-04-29 2022-05-27 天津汇智星源信息技术有限公司 Law question-answering method and device based on weak supervised learning and joint learning mechanism

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378632A (en) * 2021-04-28 2021-09-10 南京大学 Unsupervised domain pedestrian re-identification algorithm based on pseudo label optimization

Also Published As

Publication number Publication date
CN113887643A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN113887643B (en) New dialogue intention recognition method based on pseudo tag self-training and source domain retraining
Zhang et al. Discovering new intents with deep aligned clustering
CN106469560B (en) Voice emotion recognition method based on unsupervised domain adaptation
Peng et al. Accelerating minibatch stochastic gradient descent using typicality sampling
CN106709754A (en) Power user grouping method based on text mining
CN103514170B (en) A kind of file classification method and device of speech recognition
CN109165387A (en) A kind of Chinese comment sentiment analysis method based on GRU neural network
CN111626336A (en) Subway fault data classification method based on unbalanced data set
CN112529638B (en) Service demand dynamic prediction method and system based on user classification and deep learning
CN109583635A (en) A kind of short-term load forecasting modeling method towards operational reliability
CN104091038A (en) Method for weighting multiple example studying features based on master space classifying criterion
CN110795736B (en) Malicious android software detection method based on SVM decision tree
CN115577357A (en) Android malicious software detection method based on stacking integration technology
CN116050419B (en) Unsupervised identification method and system oriented to scientific literature knowledge entity
CN116380438A (en) Fault diagnosis method and device, electronic equipment and storage medium
CN115083511A (en) Peripheral gene regulation and control feature extraction method based on graph representation learning and attention
Tang et al. Chinese spam classification based on weighted distributed characteristic
CN115600595A (en) Entity relationship extraction method, system, equipment and readable storage medium
CN114357166A (en) Text classification method based on deep learning
CN113535928A (en) Service discovery method and system of long-term and short-term memory network based on attention mechanism
Sun et al. Analysis of English writing text features based on random forest and Logistic regression classification algorithm
Chun et al. Research on music classification based on MFCC and BP neural network
He et al. Label correlation mixture model: A supervised generative approach to multilabel spoken document categorization
CN111274359A (en) Query recommendation method and system based on improved VHRED and reinforcement learning
CN111310971B (en) Prospect analysis method, device and equipment for O2O business model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant