CN112613297A - Dynamic subject window model-based multi-turn dialog intention recognition method - Google Patents
- Publication number
- CN112613297A (application CN202011500583.6A)
- Authority
- CN
- China
- Prior art keywords
- text
- model
- topic
- theme
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
A multi-turn dialog intention recognition method based on a dynamic topic window model relates to the fields of machine learning and natural language processing. The method first constructs a fixed-size window that slides before and after the text to be predicted, then uses a topic model to decide which context information inside the window to introduce, performing a first, coarse denoising. The introduced text is then encoded with a BERT model to obtain multi-level semantic information; an Attention mechanism extracts the feature vector, and a state value is computed to control the introduced context-related information, performing a second, finer denoising. Finally, several intention classification models are trained and fused by majority voting (the minority obeying the majority) to give the intention recognition result. Being based on deep learning, the method recognizes intentions more accurately and transfers well to intention recognition tasks in other domains.
Description
Technical Field
The invention relates to the field of machine learning and natural language processing, in particular to an intention recognition method under multiple rounds of conversations based on a dynamic theme window model.
Background
Building a human-machine dialog system that interacts with humans automatically in natural language has long been a major challenge for both academic research and commercial applications. From the application point of view, dialog systems fall into two broad categories: task-oriented dialog systems and chat-oriented dialog systems. This invention targets intent recognition in task-oriented human-machine dialog systems. In existing task-oriented systems, dialog text has three characteristics. First, spoken expressions make up a large share of the sentences, and the same concept can be expressed in many ways. Second, short texts dominate: each user turn is relatively brief, sometimes only a few words. Third, the turns of a multi-turn conversation are not independent; the meaning of a single turn is often unclear on its own, and contextual information must be brought in to understand it. More semantic information can be gathered across multiple turns, allowing the questioner's intention to be identified more accurately. In real application scenarios, speech recognition and human expression are both imprecise, which greatly increases the difficulty of the machine understanding the user's intention. Correctly identifying the questioner's intention has therefore been one of the key problems in multi-turn dialog system research.
Early intent recognition methods treated the task as a semantic utterance classification problem and mainly comprised rule-template methods, statistical-feature methods, and machine learning classifiers. Rule-template methods target sentences that follow fixed rules and have very similar structures. They require manually built rule templates and category information, i.e. which keywords correspond to which intents; the questioner's intent is then determined by template matching. A rule-based dialog system cannot support true open-domain dialog: the system fails to recognize anything outside its rules, and the approach is labor-intensive, inefficient, and hard to extend. The statistical-feature approach counts word frequencies against an intention dictionary and takes the intent corresponding to the most frequent word as the questioner's intent; although simple, its recognition quality is poor. Machine learning approaches typically use classifiers such as Naive Bayes, Support Vector Machines, and logistic regression; to handle multiple intents, they commonly train one classifier per intent to form a classification chain and then apply the chain layer by layer.
In recent years, with the continuing development of deep learning, using neural network models for intention recognition in dialog systems has become the mainstream. The main idea is to make the abstract problem concrete: the intention recognition task is converted into an intention classification task and solved with a text classification algorithm.
Disclosure of Invention
The invention aims to solve the high false-detection rate of intention recognition in multi-turn dialog systems, caused by short spoken expressions and wide-ranging content in complex task-oriented dialog scenes. A multi-turn dialog intention recognition method based on a dynamic topic window model is provided. By effectively using context-related contextual information, the method overcomes the difficulties of short spoken expressions and wide-ranging content in complex dialog scenes, identifies the questioner's intention effectively, and materially helps the development of task-oriented intelligent dialog systems.
In order to achieve the purpose, the invention adopts the following technical scheme:
step (1), data preprocessing: removing punctuation, word segmentation and stop words;
step (2), constructing a dynamic theme window model to predict themes around the text to be recognized, and introducing window context information according to the prediction result;
step (3), coding the context information obtained in the step (2) by using a BERT model, extracting a feature vector of the context information related to the context based on an Attention mechanism, and controlling the introduced context information feature related to the context by calculating a state value;
step (4), finally, training different intention classification models according to the extracted feature vectors, fusing all classification models, and giving an intention recognition result by adopting a voting mode;
compared with the prior art, the invention has the beneficial effects that:
1. First, as a deep-learning method, it transfers readily to new domains. For intention recognition tasks in other fields, only the domain-labeled intent data needs to be replaced and the model retrained. Compared with traditional rule-template methods, there is no need to assume that users express themselves in a normative way, the cost of hand-crafting templates is saved, and the method is easier to extend.
2. Second, compared with methods that use only the current text, the topic model controls whether context-related information within a certain range should be introduced. This effectively improves classification performance while reducing the noise that would come from indiscriminately introducing context from the full dialog.
3. Finally, compared with traditional intention recognition methods, the deep-learning method recognizes intentions more accurately, and the multi-model fusion strategy further improves the result.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a block diagram of the intent recognition algorithm of the present invention;
FIG. 3 is a diagram of an LDA topic model used in the present invention;
FIG. 4 is a diagram of an encoding process using the BERT model according to the present invention;
FIG. 5 illustrates an implementation of status signals in a control module according to the present invention.
Detailed Description
In this embodiment, a method for recognizing intention under multiple turns of conversation based on a dynamic topic window model is provided. The data flow diagram is shown in fig. 1. As shown in fig. 2, the intention recognition algorithm framework is divided into three modules: an encoding module, a control module, and a classification module. The method proceeds according to the following steps:
step (1), data preprocessing: removing punctuation, word segmentation and stop words;
step (1.1), first, punctuation and special symbols such as emoticons are removed from the dialog text, and duplicate texts are dropped;
step (1.2), then, the text is segmented with the domain-specific word segmentation tool pkuseg, which reduces the influence of ambiguous words to some extent;
step (1.3), finally, stop words are removed from the segmentation result using a public stop-word list.
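A minimal sketch of the step (1) pipeline follows. It is illustrative only: the patent uses the pkuseg Chinese segmenter trained on a domain corpus, while here a whitespace split stands in so the example is self-contained, and the tiny stop-word set is a placeholder for a public stop-word list.

```python
import re

# Placeholder stop-word set; in practice: a public (Chinese) stop-word list.
STOPWORDS = {"the", "a", "is"}

def preprocess(utterance):
    # 1) strip punctuation and special symbols (emoticons etc.)
    cleaned = re.sub(r"[^\w\s]", "", utterance)
    # 2) segment into tokens (stand-in for pkuseg.pkuseg().cut(cleaned))
    tokens = cleaned.split()
    # 3) drop stop words
    return [t for t in tokens if t.lower() not in STOPWORDS]
```

For real Chinese dialog text, `tokens = cleaned.split()` would be replaced by a call to the trained pkuseg model.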
Step (2), a coding module constructs a dynamic theme window model to predict themes around the text to be recognized;
step (2.1), first, a window covering 2 dialog turns is constructed, and a sliding operation of 1 turn before and after the text to be predicted Q_i is performed, denoted SW(Q_{i-1}A_{i-1}, Q_iA_i), where Q_iA_i is the i-th dialog turn; the window slides at most 3 turns away from the text to be predicted;
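The windowing of step (2.1) can be sketched as below (Python used for illustration; the function and parameter names are not from the patent). A turn is a (question, answer) pair, and the window gathers the turns within `radius` of turn i, never more than `max_dist` turns away.

```python
# Sketch of the size-2 sliding topic window: for the i-th turn (Q_i, A_i),
# gather the turns one step before and after, capped at 3 turns' distance
# from the text to be predicted.
def topic_window(turns, i, radius=1, max_dist=3):
    step = min(radius, max_dist)
    lo = max(0, i - step)
    hi = min(len(turns) - 1, i + step)
    return turns[lo:hi + 1]
```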
step (2.2), secondly, constructing an LDA model;
step (2.2.1), a topic library is established, in which one topic is labeled "no intention", and for every topic a certain number of words that reflect that topic are selected;
step (2.2.2), as shown in fig. 3, the LDA model comprises two sampling steps: for each dialog text, a topic $z_{m,n}$ is drawn from the text's topic distribution, and then a word $w_{m,n}$ is drawn from the word distribution of that topic; this is repeated until every word position of the current dialog text has been traversed. A new dialog text is thus generated whose words come from different topics, and the joint probability distribution of topics and words is obtained as in formula (1):

$$p(\vec{w}_m, \vec{z}_m, \vec{\theta}_m, \Phi \mid \vec{\alpha}, \vec{\beta}) = \prod_{n=1}^{N_m} p(w_{m,n} \mid \vec{\varphi}_{z_{m,n}})\, p(z_{m,n} \mid \vec{\theta}_m) \cdot p(\vec{\theta}_m \mid \vec{\alpha}) \cdot p(\Phi \mid \vec{\beta}) \tag{1}$$

where K is the number of topics, M is the total number of texts, $\vec{\beta}$ is the Dirichlet prior of the multinomial word distribution under each topic, $\vec{\alpha}$ is the Dirichlet prior of the multinomial topic distribution under each text, $\vec{\theta}_m$ and $\vec{\varphi}_k$ are two hidden variables denoting the topic distribution of the m-th document and the word distribution of the k-th topic, and $z_{m,n}$ and $w_{m,n}$ denote a topic and a word respectively.
Step (2.2.3), given the joint probability distribution, Gibbs Sampling is used to sample it. Assuming the word $w_i = t$ has been observed, Bayes' rule yields formula (2):

$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) = \frac{p(\vec{w}, \vec{z})}{p(\vec{w}, \vec{z}_{\neg i})} \propto \frac{p(\vec{w}, \vec{z})}{p(\vec{w}_{\neg i}, \vec{z}_{\neg i})} \tag{2}$$

Combining this with the joint probability distribution gives formula (3):

$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \frac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{n}_{k,\neg i} + \vec{\beta})} \cdot \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{n}_{m,\neg i} + \vec{\alpha})} \tag{3}$$

The Dirichlet parameters are estimated by formulas (4) and (5):

$$\varphi_{k,t} = \frac{n_k^{(t)} + \beta_t}{\sum_{t'=1}^{V} n_k^{(t')} + \beta_{t'}} \tag{4} \qquad \theta_{m,k} = \frac{n_m^{(k)} + \alpha_k}{\sum_{k'=1}^{K} n_m^{(k')} + \alpha_{k'}} \tag{5}$$

Finally, the Gibbs Sampling formula of the LDA model is obtained as formula (6), which determines the topic of the text to be predicted:

$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t'=1}^{V} n_{k,\neg i}^{(t')} + \beta_{t'}} \cdot \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k'=1}^{K} n_{m,\neg i}^{(k')} + \alpha_{k'} - 1} \tag{6}$$
Step (2.3), finally, the operation of step (2.1) is carried out with the constructed LDA model, i.e. every time the window slides, the LDA model makes one prediction. The score of each topic determines which context information under the window must be introduced during encoding. If the proportion of "no intention" topics exceeds 0.5, no context information needs to be introduced.
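The step (2.3) gating decision amounts to a majority test over the window's topic predictions. A sketch (the `no_intent` label name is illustrative):

```python
# Each window slide yields one topic prediction; if more than half of the
# predicted topics are the "no intention" topic, the window's context is
# judged noise and skipped during encoding.
NO_INTENT = "no_intent"

def should_introduce_context(predicted_topics):
    noise = sum(t == NO_INTENT for t in predicted_topics)
    return noise / len(predicted_topics) <= 0.5
```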
Step (3), first, the text obtained in step (2) is encoded with the BERT model, as shown in fig. 4. Each dialog text is split into single characters; Token Embedding, Segment Embedding, and Position Embedding are computed and summed; the sum is fed into BERT and encoded into vectors of a continuous vector space, giving the encoding vector of each dialog turn, as in formula (7):

$$V_i = \mathrm{BERT}(E^{token}_i + E^{segment}_i + E^{position}_i) \tag{7}$$
Second, based on the Attention mechanism, the correlation between the encoding vector $V_H$ of the i-th text to be predicted and each context encoding vector $V_i$ is measured by their inner product, as in formula (8):

$$s_i = V_H \cdot V_i \tag{8}$$

Finally, the set of correlation values is normalized with softmax to obtain an attention distribution whose values lie in a valid probability interval.
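The inner-product scoring and softmax normalization above can be sketched in a few lines (plain Python for illustration):

```python
import math

# Relevance of each context vector V_i to the target encoding V_H is the
# inner product; softmax turns the scores into an attention distribution.
def attention_weights(v_h, context_vectors):
    scores = [sum(a * b for a, b in zip(v_h, v)) for v in context_vectors]
    mx = max(scores)                       # subtract max for stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```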
The control module, as shown in fig. 5, controls the introduced context-related information features by computing a state value S, as in formula (9):

$$S = \sigma(W_{TS} \cdot V_H) \tag{9}$$

where $V_H$ is the context encoding vector of the i-th text to be predicted in the dialog text and $W_{TS}$ is a weight matrix.
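The patent gives the state-value computation only as fig. 5, so the following sketch is an assumed realization: a sigmoid of a learned projection of $V_H$, thresholded to decide whether the context feature is kept. Both `w_ts` and the 0.5 threshold are illustrative assumptions.

```python
import math

# Assumed control-module gate: S = sigmoid(W_TS . V_H); the context feature
# is kept only when S crosses the threshold.
def gate_context(v_h, context_feature, w_ts, threshold=0.5):
    s = 1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(w_ts, v_h))))
    return context_feature if s >= threshold else None
```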
Step (4), the classification module trains different intention classification models from the extracted feature vectors; the selected models should be mutually independent, i.e. the correlation between models should be as small as possible. A classifier is constructed that classifies the encoding vector $V_H$ of the i-th text to be predicted together with the context encoding vector $V_i$. Using sigmoid as the activation function, the feature vector E is obtained as in formula (10):

$$E = \sigma(W \cdot [V_H; V_i] + b) \tag{10}$$

The objective function is binary cross entropy, where n is the total number of samples, x ranges over the samples, $y_i$ is the i-th dimension of the target label vector, and $o_i$ is the i-th dimension of the predicted label vector; both the target label vector y and the predicted label vector o have the size of the label set as their dimension. It is shown in formula (11):

$$L = -\frac{1}{n} \sum_{x} \sum_{i} \left[ y_i \ln o_i + (1 - y_i) \ln(1 - o_i) \right] \tag{11}$$
for the classifier, existing classification algorithms can also be used to implement, for example, textCNN, fastText, etc.
Finally, a multi-model fusion strategy is adopted to further improve the accuracy of intention recognition. Model fusion follows majority voting, the minority obeying the majority. Suppose there are m intents: $m_0$ denotes no intention, $m_1$ the intent numbered 1, $m_2$ the intent numbered 2, ..., $m_i$ the intent numbered i. Using n classifiers, n classification results are obtained, and each classifier is assigned a weight $W_{TC}$ satisfying formula (12):
WTC1+WTC2+WTC3+…+WTCn=1 (12)
The votes for each intention are counted over the n results. Two cases can occur:
only one intention gets the most votes and the intention is taken as the final intention recognition result.
Two or more intentions tie for the most votes. In this case the weighted sum of the classifiers is used to decide. Suppose two intentions $m_i, m_j$ ($0 \le i, j \le m$, $i \ne j$) tie; the sum of the weights of the classifiers voting for each intention is computed, as in formula (13):

$$\mathrm{Score}(m_i) = \sum_{c:\ \text{classifier } c \text{ votes for } m_i} W_{TC_c} \tag{13}$$
and finally, taking the maximum value of the calculated result, taking the intention corresponding to the maximum value as a final intention identification result, and taking the two intentions as final results if the same maximum value exists. And finally, obtaining a final result of the intention recognition.
The above is only a preferred embodiment of the invention, but the scope of the invention is not limited to it; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention, according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.
Claims (6)
1. A multi-turn dialog intention recognition method based on a dynamic theme window model is characterized in that: the method comprises the following steps:
step (1), data preprocessing: first, punctuation is removed from the dialog text in the traffic customer-service domain, then words are segmented with the pkuseg word segmentation tool, and finally stop words are removed;
step (2), a dynamic theme window model is constructed to predict themes around the text to be recognized, and whether window context information is introduced or not is determined according to a prediction result, so that the interference of meaningless information is removed;
step (3), encoding text content by using a BERT model, extracting a feature vector of context-related context information based on an Attention mechanism, and controlling the introduced context-related context information feature by calculating a state value;
step (4), finally, training different intention classification models according to the extracted feature vectors, fusing all classification models, and giving an intention recognition result by adopting a voting mode;
the process of constructing a dynamic theme window model in the step (2) comprises the following steps:
step (2.1), first, a sliding window covering 2 dialog turns is constructed, and a sliding operation of 1 turn before and after the text to be predicted is performed;
step (2.2) and continuing step (2.1), predicting once by using an LDA topic model every time sliding is performed, and obtaining a prediction result of each topic;
step (2.3), whether corresponding context information is introduced in the encoding process is determined by the score of each topic; if the proportion of "no intention" topics exceeds 0.5, no context information is introduced, otherwise it is introduced;
the construction process of the LDA topic model in the step (2.2) comprises the following steps:
step (2.2.1), first, a topic library is established, in which one topic is labeled "no intention", and for every topic a certain number of words reflecting that topic are selected;
step (2.2.2), for each section of dialog text, extracting a theme from the theme distribution, extracting a word from the word distribution corresponding to the extracted theme, repeating the process until each word in the current dialog text is traversed, then generating a new dialog text, and simultaneously obtaining the joint probability distribution of the theme and the word;
and (2.2.3) Sampling the joint probability distribution by using Gibbs Sampling to obtain a Gibbs Sampling formula of the LDA topic model, and finally determining the topic of each section of the dialog text.
2. The method for recognizing intention under multiple turns of conversation based on the dynamic topic window model as claimed in claim 1, wherein using the pkuseg word segmentation tool and removing stop words in step (1) comprises:
firstly, training a word segmentation tool pkuseg by adopting a word segmentation corpus in the traffic field;
then, segmenting words of the dialogue text by using the trained word segmentation model;
finally, stop words are removed using a public stop-word list.
3. The method for recognizing intention under multiple turns of conversation based on the dynamic topic window model as claimed in claim 1, wherein the joint probability distribution of topics and words in step (2.2.2) is shown by formula (1):

$$p(\vec{w}_m, \vec{z}_m, \vec{\theta}_m, \Phi \mid \vec{\alpha}, \vec{\beta}) = \prod_{n=1}^{N_m} p(w_{m,n} \mid \vec{\varphi}_{z_{m,n}})\, p(z_{m,n} \mid \vec{\theta}_m) \cdot p(\vec{\theta}_m \mid \vec{\alpha}) \cdot p(\Phi \mid \vec{\beta}) \tag{1}$$

where K is the number of topics, M is the total number of texts, $\vec{\beta}$ is the Dirichlet prior of the multinomial word distribution under each topic, $\vec{\alpha}$ is the Dirichlet prior of the multinomial topic distribution under each text, $\vec{\theta}_m$ and $\vec{\varphi}_k$ are two hidden variables denoting the topic distribution of the m-th document and the word distribution of the k-th topic, and $z_{m,n}$ and $w_{m,n}$ denote a topic and a word respectively.
4. The method for recognizing intention under multiple turns of conversation based on the dynamic topic window model as claimed in claim 1, wherein the Gibbs Sampling formula of the LDA topic model in step (2.2.3) is expressed by formula (2):

$$p(z_i = k \mid \vec{z}_{\neg i}, \vec{w}) \propto \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t'=1}^{V} n_{k,\neg i}^{(t')} + \beta_{t'}} \cdot \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k'=1}^{K} n_{m,\neg i}^{(k')} + \alpha_{k'} - 1} \tag{2}$$
5. The method for recognizing the intention under multiple turns of conversations based on the dynamic topic window model as claimed in claim 1, wherein: the content of the BERT model coding process and the feature extraction based on the Attention mechanism in the step (3) comprises the following steps:
splitting each dialog text into single characters, and respectively carrying out Token Embedding, Segment Embedding and Position Embedding;
summing the above three embeddings and feeding the sum into BERT (a bidirectional Transformer encoder) to be encoded into vectors of a continuous vector space, giving the encoding vector of each dialog turn, as in formula (3):

$$V_i = \mathrm{BERT}(E^{token}_i + E^{segment}_i + E^{position}_i) \tag{3}$$
second, based on the Attention mechanism, the correlation between the encoding vector $V_H$ of the i-th text to be predicted and each context encoding vector $V_i$ is measured by their inner product, as in formula (4):

$$s_i = V_H \cdot V_i \tag{4}$$
then, normalizing the calculated set of correlation degree values by softmax to obtain an attention distribution probability distribution value which accords with a probability distribution value interval;
finally, the introduced context-related information features are controlled by computing a state value S, as in formula (5):

$$S = \sigma(W_{TS} \cdot V_H) \tag{5}$$

where $V_H$ is the context encoding vector of the i-th text to be predicted in the dialog text and $W_{TS}$ is a weight matrix.
6. The method for recognizing the intention under multiple turns of conversations based on the dynamic topic window model as claimed in claim 1, wherein: the content of training different classification models and performing model fusion in the step (4) comprises:
constructing a classifier that classifies the encoding vector $V_H$ of the i-th text to be predicted together with the context encoding vector $V_i$; using sigmoid as the activation function, the feature vector E is obtained as in formula (6):

$$E = \sigma(W \cdot [V_H; V_i] + b) \tag{6}$$

the objective function is binary cross entropy, where n is the total number of samples, x ranges over the samples, $y_i$ is the i-th dimension of the target label vector, and $o_i$ is the i-th dimension of the predicted label vector; both the target label vector y and the predicted label vector o have the size of the label set as their dimension, as shown in formula (7):

$$L = -\frac{1}{n} \sum_{x} \sum_{i} \left[ y_i \ln o_i + (1 - y_i) \ln(1 - o_i) \right] \tag{7}$$
for the classifier, existing classification algorithms such as TextCNN or fastText can also be used; finally these classification models are fused, and the result of intention classification is given by majority voting, the minority obeying the majority.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011500583.6A CN112613297A (en) | 2020-12-17 | 2020-12-17 | Dynamic subject window model-based multi-turn dialog intention recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112613297A true CN112613297A (en) | 2021-04-06 |
Family
ID=75240425
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011500583.6A Withdrawn CN112613297A (en) | 2020-12-17 | 2020-12-17 | Dynamic subject window model-based multi-turn dialog intention recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112613297A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113554127A (en) * | 2021-09-18 | 2021-10-26 | 南京猫头鹰智能科技有限公司 | Image recognition method, device and medium based on hybrid model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110609891B (en) | Visual dialog generation method based on context awareness graph neural network | |
CN110083831B (en) | Chinese named entity identification method based on BERT-BiGRU-CRF | |
CN108874972B (en) | Multi-turn emotion conversation method based on deep learning | |
CN108597541B (en) | Speech emotion recognition method and system for enhancing anger and happiness recognition | |
CN110647612A (en) | Visual conversation generation method based on double-visual attention network | |
CN111259987B (en) | Method for extracting event main body by multi-model fusion based on BERT | |
CN112417894B (en) | Conversation intention identification method and system based on multi-task learning | |
CN107316654A (en) | Emotion identification method based on DIS NV features | |
CN107797987B (en) | Bi-LSTM-CNN-based mixed corpus named entity identification method | |
CN111753058B (en) | Text viewpoint mining method and system | |
CN111984780A (en) | Multi-intention recognition model training method, multi-intention recognition method and related device | |
CN114153971B (en) | Error correction recognition and classification equipment for Chinese text containing errors | |
CN115292463B (en) | Information extraction-based method for joint multi-intention detection and overlapping slot filling | |
CN113178193A (en) | Chinese self-defined awakening and Internet of things interaction method based on intelligent voice chip | |
CN111125367A (en) | Multi-character relation extraction method based on multi-level attention mechanism | |
KR20200105057A (en) | Apparatus and method for extracting inquiry features for alalysis of inquery sentence | |
CN113223509A (en) | Fuzzy statement identification method and system applied to multi-person mixed scene | |
CN114385802A (en) | Common-emotion conversation generation method integrating theme prediction and emotion inference | |
CN108536781B (en) | Social network emotion focus mining method and system | |
CN112328748A (en) | Method for identifying insurance configuration intention | |
CN111199149A (en) | Intelligent statement clarifying method and system for dialog system | |
CN113239690A (en) | Chinese text intention identification method based on integration of Bert and fully-connected neural network | |
CN116010874A (en) | Emotion recognition method based on deep learning multi-mode deep scale emotion feature fusion | |
Zhao et al. | Knowledge-aware bayesian co-attention for multimodal emotion recognition | |
Atkar et al. | Speech emotion recognition using dialogue emotion decoder and CNN Classifier |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20210406 |