CN112613297A - Dynamic subject window model-based multi-turn dialog intention recognition method - Google Patents
- Publication number
- CN112613297A (application CN202011500583.6A)
- Authority
- CN
- China
- Prior art keywords
- text
- model
- topic
- theme
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
A multi-turn dialog intention recognition method based on a dynamic topic window model relates to the fields of machine learning and natural language processing. The method first constructs a fixed-size window that slides before and after the text to be predicted, then uses a topic model to decide which context information inside the window to introduce, performing a first, coarse denoising. The introduced text is then encoded with a BERT model to obtain multi-level semantic information; an Attention mechanism extracts the feature vector, and a state value is computed to control the introduced context-related information, performing a second, finer denoising. Finally, several intention classification models are trained and fused by majority voting (the minority obeying the majority) to give the intention recognition result. Being based on deep learning, the method recognizes intentions more accurately and transfers well to intention recognition tasks in other domains.
Description
Technical Field
The invention relates to the field of machine learning and natural language processing, in particular to an intention recognition method under multiple rounds of conversations based on a dynamic theme window model.
Background
Building a human-machine dialog system that interacts with humans automatically in natural language has long been a major challenge for both academic research and commercial applications. From the application point of view, dialog systems fall into two broad categories: task-oriented dialog systems and chat-oriented dialog systems. This invention targets intent recognition in task-oriented human-machine dialog systems. In existing task-oriented systems, dialog text has three characteristics. First, spoken expressions make up a large share of the sentences, and the same concept can be expressed in many ways. Second, short texts dominate: each user turn is relatively brief, sometimes only a few words. Third, the turns of a multi-turn conversation are not independent; the meaning of a single turn is often unclear on its own, and contextual information must be brought in to understand it. More semantic information can be gathered across multiple turns, allowing the questioner's intention to be identified more accurately. In real application scenarios, speech recognition and human expression are both imprecise, which greatly increases the difficulty of the machine understanding the user's intention. Correctly identifying the questioner's intention has therefore been one of the key problems in multi-turn dialog system research.
Early intent recognition methods treated the task as a semantic utterance classification problem and mainly comprised rule-template methods, statistical-feature methods, and machine learning classifiers. Rule-template methods target sentences that follow fixed rules and have very similar structures. They require manually built rule templates and category information, i.e. which keywords correspond to which intents; the questioner's intent is then determined by template matching. A rule-based dialog system cannot support true open-domain dialog: the system fails to recognize anything outside its rules, and the approach is labor-intensive, inefficient, and hard to extend. The statistical-feature approach counts word frequencies against an intention dictionary and takes the intent corresponding to the most frequent word as the questioner's intent; although simple, its recognition quality is poor. Machine learning approaches typically use classifiers such as Naive Bayes, Support Vector Machines, and logistic regression; to handle multiple intents, they commonly train one classifier per intent to form a classification chain and then apply the chain layer by layer.
In recent years, with the continuing development of deep learning, using neural network models for intention recognition in dialog systems has become the mainstream. The main idea is to make the abstract problem concrete: the intention recognition task is converted into an intention classification task and solved with a text classification algorithm.
Disclosure of Invention
The invention aims to solve the high false-detection rate of intention recognition in multi-turn dialog systems, caused by short spoken expressions and wide-ranging content in complex task-oriented dialog scenes. A multi-turn dialog intention recognition method based on a dynamic topic window model is provided. By effectively using context-related contextual information, the method overcomes the difficulties of short spoken expressions and wide-ranging content in complex dialog scenes, identifies the questioner's intention effectively, and materially helps the development of task-oriented intelligent dialog systems.
In order to achieve the purpose, the invention adopts the following technical scheme:
step (1), data preprocessing: removing punctuation, word segmentation and stop words;
step (2), constructing a dynamic theme window model to predict themes around the text to be recognized, and introducing window context information according to the prediction result;
step (3), coding the context information obtained in the step (2) by using a BERT model, extracting a feature vector of the context information related to the context based on an Attention mechanism, and controlling the introduced context information feature related to the context by calculating a state value;
step (4), finally, training different intention classification models according to the extracted feature vectors, fusing all classification models, and giving an intention recognition result by adopting a voting mode;
compared with the prior art, the invention has the beneficial effects that:
1. First, as a deep-learning method, it transfers readily to new domains. For intention recognition tasks in other fields, only the domain-labeled intent data needs to be replaced and the model retrained. Compared with traditional rule-template methods, there is no need to assume that users express themselves in a normative way, the cost of hand-crafting templates is saved, and the method is easier to extend.
2. Second, compared with methods that use only the current text, the topic model controls whether context-related information within a certain range should be introduced. This effectively improves classification performance while reducing the noise that would come from indiscriminately introducing context from the full dialog.
3. Finally, compared with traditional intention recognition methods, the deep-learning method recognizes intentions more accurately, and the multi-model fusion strategy further improves the result.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a block diagram of the intent recognition algorithm of the present invention;
FIG. 3 is a diagram of an LDA topic model used in the present invention;
FIG. 4 is a diagram of an encoding process using the BERT model according to the present invention;
FIG. 5 illustrates an implementation of status signals in a control module according to the present invention.
Detailed Description
In this embodiment, a method for recognizing intention under multiple turns of conversation based on a dynamic topic window model is provided. The data flow diagram is shown in fig. 1. As shown in fig. 2, the intention recognition algorithm framework is divided into three modules: an encoding module, a control module, and a classification module. The method proceeds according to the following steps:
step (1), data preprocessing: removing punctuation, word segmentation and stop words;
step (1.1), first, punctuation and special symbols such as emoticons are removed from the dialog text, and duplicate texts are dropped;
step (1.2), then, the text is segmented with the domain-specific word segmentation tool pkuseg, which reduces the influence of ambiguous words to some extent;
step (1.3), finally, stop words are removed from the segmentation result using a public stop-word list.
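A minimal sketch of the step (1) pipeline follows. It is illustrative only: the patent uses the pkuseg Chinese segmenter trained on a domain corpus, while here a whitespace split stands in so the example is self-contained, and the tiny stop-word set is a placeholder for a public stop-word list.

```python
import re

# Placeholder stop-word set; in practice: a public (Chinese) stop-word list.
STOPWORDS = {"the", "a", "is"}

def preprocess(utterance):
    # 1) strip punctuation and special symbols (emoticons etc.)
    cleaned = re.sub(r"[^\w\s]", "", utterance)
    # 2) segment into tokens (stand-in for pkuseg.pkuseg().cut(cleaned))
    tokens = cleaned.split()
    # 3) drop stop words
    return [t for t in tokens if t.lower() not in STOPWORDS]
```

For real Chinese dialog text, `tokens = cleaned.split()` would be replaced by a call to the trained pkuseg model.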
Step (2), a coding module constructs a dynamic theme window model to predict themes around the text to be recognized;
step (2.1), first, a window covering 2 dialog turns is constructed, and a sliding operation of 1 turn before and after the text to be predicted Q_i is performed, denoted SW(Q_{i-1}A_{i-1}, Q_iA_i), where Q_iA_i is the i-th dialog turn; the window slides at most 3 turns away from the text to be predicted;
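The windowing of step (2.1) can be sketched as below (Python used for illustration; the function and parameter names are not from the patent). A turn is a (question, answer) pair, and the window gathers the turns within `radius` of turn i, never more than `max_dist` turns away.

```python
# Sketch of the size-2 sliding topic window: for the i-th turn (Q_i, A_i),
# gather the turns one step before and after, capped at 3 turns' distance
# from the text to be predicted.
def topic_window(turns, i, radius=1, max_dist=3):
    step = min(radius, max_dist)
    lo = max(0, i - step)
    hi = min(len(turns) - 1, i + step)
    return turns[lo:hi + 1]
```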
step (2.2), secondly, constructing an LDA model;
step (2.2.1), a topic library is established, in which one topic is labeled "no intention", and for every topic a certain number of words that reflect that topic are selected;
step (2.2.2), as shown in fig. 3, the LDA model comprises two sampling steps: for each dialog text, a topic $z_{m,n}$ is drawn from the text's topic distribution, and then a word $w_{m,n}$ is drawn from the word distribution of that topic; this is repeated until every word position of the current dialog text has been traversed. A new dialog text is thus generated whose words come from different topics, and the joint probability distribution of topics and words is obtained as in formula (1):

$$p(\vec{w}_m, \vec{z}_m, \vec{\theta}_m, \Phi \mid \vec{\alpha}, \vec{\beta}) = \prod_{n=1}^{N_m} p(w_{m,n} \mid \vec{\varphi}_{z_{m,n}})\, p(z_{m,n} \mid \vec{\theta}_m) \cdot p(\vec{\theta}_m \mid \vec{\alpha}) \cdot p(\Phi \mid \vec{\beta}) \tag{1}$$

where K is the number of topics, M is the total number of texts, $\vec{\beta}$ is the Dirichlet prior of the multinomial word distribution under each topic, $\vec{\alpha}$ is the Dirichlet prior of the multinomial topic distribution under each text, $\vec{\theta}_m$ and $\vec{\varphi}_k$ are two hidden variables denoting the topic distribution of the m-th document and the word distribution of the k-th topic, and $z_{m,n}$ and $w_{m,n}$ denote a topic and a word respectively.
Step (2.2.3), given the joint probability distribution, Gibbs Sampling is used to sample it. Assuming the word $w_i = t$ has been observed, Bayes' rule yields formula (2):

$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) = \frac{p(\vec{w}, \vec{z})}{p(\vec{w}, \vec{z}_{\neg i})} \propto \frac{p(\vec{w}, \vec{z})}{p(\vec{w}_{\neg i}, \vec{z}_{\neg i})} \tag{2}$$

Combining this with the joint probability distribution gives formula (3):

$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \frac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{n}_{k,\neg i} + \vec{\beta})} \cdot \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{n}_{m,\neg i} + \vec{\alpha})} \tag{3}$$

The Dirichlet parameters are estimated by formulas (4) and (5):

$$\varphi_{k,t} = \frac{n_k^{(t)} + \beta_t}{\sum_{t'=1}^{V} n_k^{(t')} + \beta_{t'}} \tag{4} \qquad \theta_{m,k} = \frac{n_m^{(k)} + \alpha_k}{\sum_{k'=1}^{K} n_m^{(k')} + \alpha_{k'}} \tag{5}$$

Finally, the Gibbs Sampling formula of the LDA model is obtained as formula (6), which determines the topic of the text to be predicted:

$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t'=1}^{V} n_{k,\neg i}^{(t')} + \beta_{t'}} \cdot \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k'=1}^{K} n_{m,\neg i}^{(k')} + \alpha_{k'} - 1} \tag{6}$$
Step (2.3), finally, the operation of step (2.1) is carried out with the constructed LDA model, i.e. every time the window slides, the LDA model makes one prediction. The score of each topic determines which context information under the window must be introduced during encoding. If the proportion of "no intention" topics exceeds 0.5, no context information needs to be introduced.
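The step (2.3) gating decision amounts to a majority test over the window's topic predictions. A sketch (the `no_intent` label name is illustrative):

```python
# Each window slide yields one topic prediction; if more than half of the
# predicted topics are the "no intention" topic, the window's context is
# judged noise and skipped during encoding.
NO_INTENT = "no_intent"

def should_introduce_context(predicted_topics):
    noise = sum(t == NO_INTENT for t in predicted_topics)
    return noise / len(predicted_topics) <= 0.5
```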
Step (3), first, the text obtained in step (2) is encoded with the BERT model, as shown in fig. 4. Each dialog text is split into single characters; Token Embedding, Segment Embedding, and Position Embedding are computed and summed; the sum is fed into BERT and encoded into vectors of a continuous vector space, giving the encoding vector of each dialog turn, as in formula (7):

$$V_i = \mathrm{BERT}(E^{token}_i + E^{segment}_i + E^{position}_i) \tag{7}$$
Second, based on the Attention mechanism, the correlation between the encoding vector $V_H$ of the i-th text to be predicted and each context encoding vector $V_i$ is measured by their inner product, as in formula (8):

$$s_i = V_H \cdot V_i \tag{8}$$

Finally, the set of correlation values is normalized with softmax to obtain an attention distribution whose values lie in a valid probability interval.
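The inner-product scoring and softmax normalization above can be sketched in a few lines (plain Python for illustration):

```python
import math

# Relevance of each context vector V_i to the target encoding V_H is the
# inner product; softmax turns the scores into an attention distribution.
def attention_weights(v_h, context_vectors):
    scores = [sum(a * b for a, b in zip(v_h, v)) for v in context_vectors]
    mx = max(scores)                       # subtract max for stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```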
The control module, as shown in fig. 5, controls the introduced context-related information features by computing a state value S, as in formula (9):

$$S = \sigma(W_{TS} \cdot V_H) \tag{9}$$

where $V_H$ is the context encoding vector of the i-th text to be predicted in the dialog text and $W_{TS}$ is a weight matrix.
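The patent gives the state-value computation only as fig. 5, so the following sketch is an assumed realization: a sigmoid of a learned projection of $V_H$, thresholded to decide whether the context feature is kept. Both `w_ts` and the 0.5 threshold are illustrative assumptions.

```python
import math

# Assumed control-module gate: S = sigmoid(W_TS . V_H); the context feature
# is kept only when S crosses the threshold.
def gate_context(v_h, context_feature, w_ts, threshold=0.5):
    s = 1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(w_ts, v_h))))
    return context_feature if s >= threshold else None
```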
Step (4), the classification module trains different intention classification models from the extracted feature vectors; the selected models should be mutually independent, i.e. the correlation between models should be as small as possible. A classifier is constructed that classifies the encoding vector $V_H$ of the i-th text to be predicted together with the context encoding vector $V_i$. Using sigmoid as the activation function, the feature vector E is obtained as in formula (10):

$$E = \sigma(W \cdot [V_H; V_i] + b) \tag{10}$$

The objective function is binary cross entropy, where n is the total number of samples, x ranges over the samples, $y_i$ is the i-th dimension of the target label vector, and $o_i$ is the i-th dimension of the predicted label vector; both the target label vector y and the predicted label vector o have the size of the label set as their dimension. It is shown in formula (11):

$$L = -\frac{1}{n} \sum_{x} \sum_{i} \left[ y_i \ln o_i + (1 - y_i) \ln(1 - o_i) \right] \tag{11}$$
for the classifier, existing classification algorithms can also be used to implement, for example, textCNN, fastText, etc.
Finally, a multi-model fusion strategy is adopted to further improve the accuracy of intention recognition. Model fusion follows majority voting, the minority obeying the majority. Suppose there are m intents: $m_0$ denotes no intention, $m_1$ the intent numbered 1, $m_2$ the intent numbered 2, ..., $m_i$ the intent numbered i. Using n classifiers, n classification results are obtained, and each classifier is assigned a weight $W_{TC}$ satisfying formula (12):
WTC1+WTC2+WTC3+…+WTCn=1 (12)
The votes for each intention are counted over the n results. Two cases can occur:
only one intention gets the most votes and the intention is taken as the final intention recognition result.
Two or more intentions tie for the most votes. In this case the weighted sum of the classifiers is used to decide. Suppose two intentions $m_i, m_j$ ($0 \le i, j \le m$, $i \ne j$) tie; the sum of the weights of the classifiers voting for each intention is computed, as in formula (13):

$$\mathrm{Score}(m_i) = \sum_{c:\ \text{classifier } c \text{ votes for } m_i} W_{TC_c} \tag{13}$$
and finally, taking the maximum value of the calculated result, taking the intention corresponding to the maximum value as a final intention identification result, and taking the two intentions as final results if the same maximum value exists. And finally, obtaining a final result of the intention recognition.
The above is only a preferred embodiment of the invention, but the scope of the invention is not limited to it; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention, according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.
Claims (6)
1. A multi-turn dialog intention recognition method based on a dynamic theme window model is characterized in that: the method comprises the following steps:
step (1), data preprocessing: first, punctuation is removed from the dialog text in the traffic customer-service domain, then words are segmented with the pkuseg word segmentation tool, and finally stop words are removed;
step (2), a dynamic theme window model is constructed to predict themes around the text to be recognized, and whether window context information is introduced or not is determined according to a prediction result, so that the interference of meaningless information is removed;
step (3), encoding text content by using a BERT model, extracting a feature vector of context-related context information based on an Attention mechanism, and controlling the introduced context-related context information feature by calculating a state value;
step (4), finally, training different intention classification models according to the extracted feature vectors, fusing all classification models, and giving an intention recognition result by adopting a voting mode;
the process of constructing a dynamic theme window model in the step (2) comprises the following steps:
step (2.1), first, a sliding window covering 2 dialog turns is constructed, and a sliding operation of 1 turn before and after the text to be predicted is performed;
step (2.2) and continuing step (2.1), predicting once by using an LDA topic model every time sliding is performed, and obtaining a prediction result of each topic;
step (2.3), whether corresponding context information is introduced in the encoding process is determined by the score of each topic; if the proportion of "no intention" topics exceeds 0.5, no context information is introduced, otherwise it is introduced;
the construction process of the LDA topic model in the step (2.2) comprises the following steps:
step (2.2.1), first, a topic library is established, in which one topic is labeled "no intention", and for every topic a certain number of words reflecting that topic are selected;
step (2.2.2), for each section of dialog text, extracting a theme from the theme distribution, extracting a word from the word distribution corresponding to the extracted theme, repeating the process until each word in the current dialog text is traversed, then generating a new dialog text, and simultaneously obtaining the joint probability distribution of the theme and the word;
and (2.2.3) Sampling the joint probability distribution by using Gibbs Sampling to obtain a Gibbs Sampling formula of the LDA topic model, and finally determining the topic of each section of the dialog text.
2. The method for recognizing intention under multiple turns of conversation based on the dynamic topic window model as claimed in claim 1, wherein using the pkuseg word segmentation tool and removing stop words in step (1) comprises:
firstly, training a word segmentation tool pkuseg by adopting a word segmentation corpus in the traffic field;
then, segmenting words of the dialogue text by using the trained word segmentation model;
finally, stop words are removed using a public stop-word list.
3. The method for recognizing intention under multiple turns of conversation based on the dynamic topic window model as claimed in claim 1, wherein the joint probability distribution of topics and words in step (2.2.2) is shown by formula (1):

$$p(\vec{w}_m, \vec{z}_m, \vec{\theta}_m, \Phi \mid \vec{\alpha}, \vec{\beta}) = \prod_{n=1}^{N_m} p(w_{m,n} \mid \vec{\varphi}_{z_{m,n}})\, p(z_{m,n} \mid \vec{\theta}_m) \cdot p(\vec{\theta}_m \mid \vec{\alpha}) \cdot p(\Phi \mid \vec{\beta}) \tag{1}$$

where K is the number of topics, M is the total number of texts, $\vec{\beta}$ is the Dirichlet prior of the multinomial word distribution under each topic, $\vec{\alpha}$ is the Dirichlet prior of the multinomial topic distribution under each text, $\vec{\theta}_m$ and $\vec{\varphi}_k$ are two hidden variables denoting the topic distribution of the m-th document and the word distribution of the k-th topic, and $z_{m,n}$ and $w_{m,n}$ denote a topic and a word respectively.
4. The method for recognizing intention under multiple turns of conversation based on the dynamic topic window model as claimed in claim 1, wherein the Gibbs Sampling formula of the LDA topic model in step (2.2.3) is expressed by formula (2):

$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t'=1}^{V} n_{k,\neg i}^{(t')} + \beta_{t'}} \cdot \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k'=1}^{K} n_{m,\neg i}^{(k')} + \alpha_{k'} - 1} \tag{2}$$
5. The method for recognizing the intention under multiple turns of conversations based on the dynamic topic window model as claimed in claim 1, wherein: the content of the BERT model coding process and the feature extraction based on the Attention mechanism in the step (3) comprises the following steps:
splitting each dialog text into single characters, and respectively carrying out Token Embedding, Segment Embedding and Position Embedding;
summing the above three embeddings and feeding the sum into BERT (a bidirectional Transformer encoder) to be encoded into vectors of a continuous vector space, giving the encoding vector of each dialog turn, as in formula (3):

$$V_i = \mathrm{BERT}(E^{token}_i + E^{segment}_i + E^{position}_i) \tag{3}$$
second, based on the Attention mechanism, the correlation between the encoding vector $V_H$ of the i-th text to be predicted and each context encoding vector $V_i$ is measured by their inner product, as in formula (4):

$$s_i = V_H \cdot V_i \tag{4}$$
then, normalizing the calculated set of correlation degree values by softmax to obtain an attention distribution probability distribution value which accords with a probability distribution value interval;
finally, the introduced context-related information features are controlled by computing a state value S, as in formula (5):

$$S = \sigma(W_{TS} \cdot V_H) \tag{5}$$

where $V_H$ is the context encoding vector of the i-th text to be predicted in the dialog text and $W_{TS}$ is a weight matrix.
6. The method for recognizing the intention under multiple turns of conversations based on the dynamic topic window model as claimed in claim 1, wherein: the content of training different classification models and performing model fusion in the step (4) comprises:
constructing a classifier that classifies the encoding vector $V_H$ of the i-th text to be predicted together with the context encoding vector $V_i$; using sigmoid as the activation function, the feature vector E is obtained as in formula (6):

$$E = \sigma(W \cdot [V_H; V_i] + b) \tag{6}$$

the objective function is binary cross entropy, where n is the total number of samples, x ranges over the samples, $y_i$ is the i-th dimension of the target label vector, and $o_i$ is the i-th dimension of the predicted label vector; both the target label vector y and the predicted label vector o have the size of the label set as their dimension, as shown in formula (7):

$$L = -\frac{1}{n} \sum_{x} \sum_{i} \left[ y_i \ln o_i + (1 - y_i) \ln(1 - o_i) \right] \tag{7}$$
for the classifier, existing classification algorithms such as TextCNN or fastText can also be used; finally these classification models are fused, and the result of intention classification is given by majority voting, the minority obeying the majority.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011500583.6A CN112613297A (en) | 2020-12-17 | 2020-12-17 | Dynamic subject window model-based multi-turn dialog intention recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112613297A true CN112613297A (en) | 2021-04-06 |
Family
ID=75240425
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011500583.6A Withdrawn CN112613297A (en) | 2020-12-17 | 2020-12-17 | Dynamic subject window model-based multi-turn dialog intention recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112613297A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113554127A (en) * | 2021-09-18 | 2021-10-26 | 南京猫头鹰智能科技有限公司 | Image recognition method, device and medium based on hybrid model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110609891B (en) | Visual dialog generation method based on context awareness graph neural network | |
CN110083831B (en) | Chinese named entity identification method based on BERT-BiGRU-CRF | |
CN108874972B (en) | Multi-turn emotion conversation method based on deep learning | |
CN108597541B (en) | Speech emotion recognition method and system for enhancing anger and happiness recognition | |
CN110647612A (en) | Visual conversation generation method based on double-visual attention network | |
CN111259987B (en) | Method for extracting event main body by multi-model fusion based on BERT | |
CN112417894B (en) | Conversation intention identification method and system based on multi-task learning | |
CN107316654A (en) | Emotion identification method based on DIS NV features | |
CN107797987B (en) | Bi-LSTM-CNN-based mixed corpus named entity identification method | |
CN111753058B (en) | Text viewpoint mining method and system | |
CN111984780A (en) | Multi-intention recognition model training method, multi-intention recognition method and related device | |
CN114153971B (en) | Error correction recognition and classification equipment for Chinese text containing errors | |
CN115292463B (en) | Information extraction-based method for joint multi-intention detection and overlapping slot filling | |
CN113178193A (en) | Chinese self-defined awakening and Internet of things interaction method based on intelligent voice chip | |
CN111125367A (en) | Multi-character relation extraction method based on multi-level attention mechanism | |
KR20200105057A (en) | Apparatus and method for extracting inquiry features for alalysis of inquery sentence | |
CN113223509A (en) | Fuzzy statement identification method and system applied to multi-person mixed scene | |
CN114385802A (en) | Common-emotion conversation generation method integrating theme prediction and emotion inference | |
CN108536781B (en) | Social network emotion focus mining method and system | |
CN112328748A (en) | Method for identifying insurance configuration intention | |
CN111199149A (en) | Intelligent statement clarifying method and system for dialog system | |
CN113239690A (en) | Chinese text intention identification method based on integration of Bert and fully-connected neural network | |
CN116010874A (en) | Emotion recognition method based on deep learning multi-mode deep scale emotion feature fusion | |
Zhao et al. | Knowledge-aware bayesian co-attention for multimodal emotion recognition | |
Atkar et al. | Speech emotion recognition using dialogue emotion decoder and CNN Classifier |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20210406 |