CN111984780A

CN111984780A - Multi-intention recognition model training method, multi-intention recognition method and related device

Info

Publication number: CN111984780A
Application number: CN202010951226.5A
Authority: CN
Inventors: 黄石磊; 张剑
Original assignee: Shenzhen Raisound Technology Co ltd
Current assignee: Shenzhen Raisound Technology Co ltd
Priority date: 2020-09-11
Filing date: 2020-09-11
Publication date: 2020-11-24

Abstract

The invention discloses a multi-intent recognition model training method, a multi-intent recognition method and related devices. The training method comprises: obtaining the encoding vector of each dialog text and computing a feature vector that carries its contextual information; judging whether the contextual information needs to be introduced; if so, combining the encoding vector of the dialog text with the context feature vector and inputting the result into a classifier, and otherwise inputting the encoding vector of the dialog text directly into the classifier; and training the classifier to obtain a multi-label classification model that classifies and recognizes the intents of dialog texts. The multi-intent recognition method comprises classifying and recognizing the intents of a dialog text with the trained multi-label classification model. The method transfers easily to new domains, recognizes intents more accurately, and uses contextual information to improve the performance of the classification algorithm.

Description

Multi-intention recognition model training method, multi-intention recognition method and related device

Technical Field

The invention relates to the technical field of computer data processing, and in particular to a multi-intent recognition model training method, a multi-intent recognition method and related devices.

Background

In existing dialogue systems, many utterances are not expressed in standard language: their syntax is simple, most are short sentences or elliptical forms, the content of a dialogue is hard to clarify within a single round, and the intent is usually spread over multiple rounds. Collecting semantic information across multiple rounds allows the questioner's intent to be recognized more accurately. In practical scenarios, both speech recognition and human expression are imprecise, which greatly increases the difficulty for a robot to understand the user's intent. Correctly recognizing the questioner's intent has long been one of the key problems in multi-turn dialogue system research.

Early intent recognition methods treated the task as a semantic utterance classification problem and mainly comprised rule-based methods, methods using statistical features, and methods based on machine learning classification algorithms. Rule-template methods usually target very similar sentences that conform to certain rules; they require manually constructing rule templates and category information, i.e. which keywords correspond to which intents, and then determine the questioner's intent by parsing the utterance against the templates. Statistical-feature methods use an intent dictionary to count word frequencies and take the intent corresponding to the most frequent word as the questioner's intent. Machine-learning methods generally use classifiers such as Naive Bayes and the Support Vector Machine; a common way for them to achieve multi-intent recognition is to train one classifier per intent and apply the classifiers layer by layer, which significantly improves recognition accuracy.

With the development of deep learning, intent recognition in dialogue systems using neural network models has become mainstream. These methods mainly convert intent recognition into an intent classification task, analogous to categorizing texts, and achieve intent classification with a text classification algorithm.

However, these methods have drawbacks. Because utterances in human communication are composed with little deliberation, with simple syntactic structure and frequent short and elliptical sentences, recognizing intents with hand-made rule templates is not applicable; that approach is also labor-intensive, inefficient and hard to extend. Statistical-feature methods are relatively simple but recognize intents poorly. Machine-learning methods improve recognition accuracy, but most of them cannot overcome the sparse-matrix (feature sparsity) problem and depend on large amounts of labeled corpus, so they do not reduce labor cost either.

Most deep-learning text classification algorithms do not make efficient use of the contextual information in dialog text. In a multi-turn dialog system, the questioner's current intent is often linked to the preceding or following rounds of dialog. Therefore, how to use the dialog's contextual information to understand the intent of the current utterance is a key difficulty for dialog systems.

Disclosure of Invention

The invention aims to provide a multi-intent recognition model training method, a multi-intent recognition method and related devices that solve the technical problem of how to use a dialog's contextual information to understand the intent of the current utterance.

In order to achieve the purpose, the invention adopts the following technical scheme:

In a first aspect, a multi-intent recognition model training method is provided, comprising: an encoding step: encoding the training data to obtain an encoding vector for each dialog text in the training data, and computing, for each dialog text, a feature vector carrying its contextual information; a control step: judging, for each dialog text, whether the contextual information needs to be introduced; and a classification training step: for each dialog text, if the control step judges yes, combining the encoding vector of the dialog text with its context feature vector and inputting the result into a classifier, and otherwise inputting the encoding vector of the dialog text directly into the classifier; and training the classifier to obtain a multi-label classification model for classifying and recognizing the intents of dialog texts.

In a possible implementation, the encoding step specifically includes: encoding the training data with a deep learning pre-trained model to obtain an encoding vector V_Q3 for each dialog text in the training data; and, for each dialog text, using a sliding window to obtain the multiple rounds of dialog text that include it, and computing a weighted sum of the encoding vectors of all dialog texts in the window to obtain the context feature vector of the dialog text, denoted V_H.

Further, the weighted summation over the encoding vectors of all dialog texts in the sliding window may include: denoting the encoding vector of each dialog text in the sliding window as V_i; computing the attention probability distribution values according to

$$P_i = \mathrm{softmax}\left(V_{Q3} \cdot V_i\right) = \frac{\exp\left(V_{Q3} \cdot V_i\right)}{\sum_{k=1}^{n} \exp\left(V_{Q3} \cdot V_k\right)}$$

and, with P_i as the weights, computing the weighted sum of the encoding vectors V_i in the sliding window to obtain the context feature vector

$$V_H = \sum_{i=1}^{n} P_i V_i$$

In a possible implementation, the control step specifically includes: for each dialog text, computing a state value from its encoding vector V_Q3 and its context feature vector V_H as

$$S = \mathrm{sigmoid}\left(W_S \left[V_{Q3}, V_H\right]\right)$$

where [V_Q3, V_H] denotes the concatenation of the two vectors and W_S is an empirical parameter; and judging whether the contextual information needs to be introduced according to whether the state value S exceeds a preset value.

In a possible implementation, in the classification training step, combining the encoding vector of the dialog text with its context feature vector and inputting the result into the classifier includes: combining the encoding vector V_Q3 and the context feature vector V_H with the sigmoid function,

$$E = \mathrm{sigmoid}\left(W_E \left[V_{Q3} + S V_H\right]\right)$$

where W_E is an empirical parameter, and inputting the combined feature vector E into the classifier.

In a possible implementation, the method further includes a step of obtaining a score for each intent in the validation set, which specifically includes: testing each dialog text in the validation set with the trained multi-label classification model to output a predicted label vector, and using an objective function to compute a similarity score between the predicted label vector and the target label vector, thereby obtaining a score between 0 and 1 for each intent in the validation set.

In a possible implementation, the method further comprises a step of determining an optimal threshold for the score of each intention in the verification set, which includes in particular:

S1: setting the initial threshold to 0.01, selecting an intent in the validation set, and computing the F1 score of that intent over the whole validation set;

S2: judging whether the threshold lies in (0, 1); if so, updating the intent's threshold as threshold = threshold + 0.01, computing the F1 score of the intent again, comparing it with the previous F1 score, recording the maximum F1 score and its corresponding threshold, and repeating S2; if not, going to step S3;

S3: judging whether all intents have been traversed; if not, moving to the next intent and returning to step S1; otherwise, going to step S4;

S4: finally, selecting, for each intent, the threshold corresponding to its highest F1 score as the optimal threshold for that intent's score.
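Steps S1 to S4 amount to a per-intent grid search over thresholds. The following Python sketch shows one way to implement it. The function names, the data layout (one row of intent scores per validation sample) and the rule that an intent is predicted when its score is greater than or equal to the threshold are assumptions for illustration, not details from the source.

```python
def f1(y_true, y_pred):
    """F1 score of one intent, given gold labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_thresholds(scores, labels, steps=100):
    """Hypothetical per-intent threshold search following S1-S4.

    scores[k][j]: model score of intent j for validation sample k, in [0, 1].
    labels[k][j]: 1 if intent j is a gold intent of sample k, else 0.
    Candidates are 1/steps, 2/steps, ..., (steps-1)/steps, i.e. 0.01 ... 0.99
    for steps=100, all strictly inside (0, 1). Integer stepping avoids
    accumulating floating-point error from repeated "+ 0.01".
    Returns one (threshold, best_f1) pair per intent.
    """
    result = []
    for j in range(len(scores[0])):                 # S3: traverse every intent
        y_true = [row[j] for row in labels]
        col = [row[j] for row in scores]
        best_f1, best_t = -1.0, None
        for k in range(1, steps):                   # S1/S2: start at 0.01, step 0.01
            t = k / steps
            score = f1(y_true, [s >= t for s in col])  # assumption: predict if s >= t
            if score > best_f1:                     # keep the maximum F1 and its threshold
                best_f1, best_t = score, t
        result.append((best_t, best_f1))            # S4: optimal threshold per intent
    return result


scores = [[0.9, 0.2], [0.6, 0.7], [0.3, 0.8], [0.1, 0.4]]
labels = [[1, 0], [1, 1], [0, 1], [0, 0]]
print(best_thresholds(scores, labels))  # [(0.31, 1.0), (0.41, 1.0)]
```

Because the comparison uses a strict `>`, ties keep the smallest threshold that reaches the maximum F1; the source does not say how ties are broken.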

In a second aspect, a multi-intent recognition method is provided, comprising: an encoding step: vector-encoding the current dialog text and computing its context feature vector; a control step: judging whether the contextual information needs to be introduced; and a classification step: if the control step judges yes, combining the encoding vector of the current dialog text with its context feature vector, inputting the result into the multi-label classification model, and classifying and recognizing the intents of the current dialog text.

In a possible implementation, the encoding step specifically includes: encoding the current dialog text with a deep learning pre-trained model and denoting its encoding vector as V_d; using a sliding window to obtain the multiple rounds of text that include the current dialog text; and computing a weighted sum of the encoding vectors of all dialog texts in the window to obtain the context feature vector of the current dialog text, denoted V_h.

Further, the weighted summation over the encoding vectors of all dialog texts in the sliding window may include: denoting the encoding vector of each dialog text in the sliding window as V_j; computing the attention probability distribution values according to

$$P_j = \mathrm{softmax}\left(V_d \cdot V_j\right) = \frac{\exp\left(V_d \cdot V_j\right)}{\sum_{k=1}^{n} \exp\left(V_d \cdot V_k\right)}$$

and, with P_j as the weights, computing the weighted sum of the encoding vectors V_j in the sliding window to obtain the context feature vector

$$V_h = \sum_{j=1}^{n} P_j V_j$$

In a possible implementation, the control step specifically includes: computing a state value from the encoding vector V_d of the current dialog text and its context feature vector V_h as

$$S = \mathrm{sigmoid}\left(W_S \left[V_d, V_h\right]\right)$$

where [V_d, V_h] denotes the concatenation of the two vectors and W_S is an empirical parameter; and judging whether the contextual information needs to be introduced according to whether the state value S exceeds a preset value.

In a possible implementation, in the classification step, combining the encoding vector of the current dialog text with its context feature vector and inputting the result into the multi-label classification model includes: combining the encoding vector V_d and the context feature vector V_h with the sigmoid function,

$$E = \mathrm{sigmoid}\left(W_E \left[V_d + S V_h\right]\right)$$

where W_E is an empirical parameter, and inputting the combined feature vector E into the multi-label classification model.

In a third aspect, a multi-intent recognition model training apparatus is provided, comprising: an encoding module for encoding the training data to obtain the encoding vector of each dialog text in the training data and computing the context feature vector of each dialog text; a control module for judging, for each dialog text, whether the contextual information needs to be introduced; and a classification training module that, for each dialog text, combines its encoding vector with its context feature vector and inputs the result into the classifier if the control module judges yes, and otherwise inputs the encoding vector of the dialog text directly into the classifier, and that trains the classifier to obtain a multi-label classification model for classifying and recognizing the intents of dialog texts.

In a possible implementation, the apparatus further includes a threshold setting module for obtaining the score of each intent in the validation set and determining the optimal threshold for each intent's score.

In a fourth aspect, a multi-intent recognition apparatus is provided, comprising: an encoding module for vector-encoding the current dialog text and computing its context feature vector; a control module for judging whether the contextual information needs to be introduced; and a classification module that, if the control module judges yes, combines the encoding vector of the current dialog text with its context feature vector, inputs the result into the multi-label classification model, and classifies and recognizes the intents of the current dialog text.

In a fifth aspect, a computer device is provided, which includes a processor and a memory, the memory storing a program, the program including computer-executable instructions, when the computer device is running, the processor executing the computer-executable instructions stored in the memory, so as to cause the computer device to execute the multi-intent recognition model training method according to the first aspect.

In a sixth aspect, there is provided a computer device comprising a processor and a memory, the memory having stored therein a program comprising computer-executable instructions, the processor executing the computer-executable instructions stored by the memory when the computer device is running, to cause the computer device to perform the multi-intent recognition method according to the second aspect.

In a seventh aspect, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising computer executable instructions, which when executed by a computer device, cause the computer device to perform the multi-intent recognition model training method of the first aspect.

In an eighth aspect, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising computer executable instructions, which when executed by a computer device, cause the computer device to perform the multi-intent recognition method of the second aspect.

According to the technical scheme, the embodiment of the invention has the following advantages:

1. First, the model is trained with a deep-learning-based method, so it transfers well to new uses. For intent recognition tasks in a different domain, only the labeled intent data of that domain needs to be substituted and the model retrained. Compared with traditional rule-template methods, there is no need to account for how standardized the user's expressions are, the cost of manually building templates is saved, and the method is easier to extend.

2. Compared with traditional intent recognition methods, the deep-learning-based method recognizes intents more accurately.

3. Compared with methods that recognize intents using only the current text, the invention uses a sliding window with a controllable text range to obtain contextual information within a bounded range, which can improve the performance of the classification algorithm while reducing the large noise that introducing context from the full text would bring.

Drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a multi-intent recognition model training method according to an embodiment of the present invention;

FIG. 2 is a block diagram of a multi-intent recognition model training apparatus according to an embodiment of the present invention;

FIG. 3 is a schematic block diagram of a multi-intent recognition model training apparatus according to an embodiment of the present invention;

FIG. 4 is a process diagram of the encoding step in an embodiment of the present invention;

FIG. 5 is a process diagram of the control step in the embodiment of the present invention;

FIG. 6 is a process diagram of the classification step in an embodiment of the present invention;

FIG. 7 is a flow chart of a method for multi-intent recognition according to an embodiment of the present invention;

FIG. 8 is a block diagram of a multi-intent recognition apparatus according to an embodiment of the present invention;

FIG. 9 is a block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," and the like in the description and in the claims, and in the above-described drawings, are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

The following are detailed descriptions of the respective embodiments.

Referring to FIG. 1, an embodiment of the present invention provides a multi-intent recognition model training method that addresses the problems of the prior art by training a multi-label classification model for multi-intent recognition. The training method may comprise an encoding step, a control step and a classification training step.

The encoding step comprises:

101. Obtain dialog texts from a dialogue system as training data, and preprocess the training data.

102. Encode each dialog text in the training data to obtain its encoding vector.

Optionally, the text is encoded with a neural-network-based deep learning pre-trained model, which vectorizes the dialog text. Herein, the resulting encoding vector of each dialog text is denoted V_Q3.

103. A feature vector with context information is computed for each dialog text.

Optionally, a sliding window is used to obtain multiple rounds of text including the current dialog text; the encoding vectors of all dialog texts in the window are obtained and their weighted sum V_H is computed. This weighted sum V_H is the context feature vector of the current dialog text.

The control step comprises:

104. For each dialog text, determine whether the contextual information needs to be introduced.

Specifically, a state value (i.e. a relevance score) can be computed from each dialog text's encoding vector V_Q3 and its context feature vector V_H, and whether the contextual information needs to be introduced is judged according to whether the state value exceeds a preset value: if it does, the context is introduced; if not, it is not. The state value represents how semantically related V_Q3 and V_H are.

The classification training step comprises:

105. Train a classifier to obtain a multi-label classification model for classifying and recognizing the intents of dialog texts.

Specifically, for each dialog text, if the control step judges yes, the context feature vector is introduced: the encoding vector V_Q3 of the dialog text and its context feature vector V_H are combined and input into the classifier; if the control step judges no, the encoding vector V_Q3 of the dialog text is input into the classifier directly. Optionally, a convolutional neural network may be used as the classifier. Training the classifier yields a multi-label classification model that classifies the vector corresponding to an input dialog text by intent and thereby recognizes the intents of the current dialog text.

FIG. 2 is a schematic structural diagram of a multi-intent recognition model training apparatus according to an embodiment of the present invention, and FIG. 3 is a schematic block diagram of the training apparatus in a multi-turn dialog system. The apparatus can be divided into three modules: an encoding module 21, a control module 22 and a classification training module 23.

The encoding module 21 encodes the dialog texts to obtain the encoding vector of each dialog text and computes the context feature vector of each dialog text. It can use an existing deep learning pre-trained model, such as BERT, to map non-numeric dialog text into a space of real-valued vectors; the encoder uses an existing deep neural network architecture, such as the bidirectional Transformer.

The control module 22 determines, for each dialog text, whether the contextual information needs to be introduced; likewise, at prediction time it controls whether context is introduced for the dialog text currently being predicted.

The classification (Classifier) training module 23 trains a classification model for intent classification. The intent recognition problem of a multi-turn dialog should be treated as a multi-label classification problem, so a classifier is trained to obtain a multi-label classification model that classifies and recognizes the intents of dialog texts. During training, for each dialog text, if the control module judges yes, the encoding vector of the dialog text and its context feature vector are combined and input into the classifier; otherwise, the encoding vector of the dialog text is input into the classifier directly.

Next, each step and module will be described in detail with reference to fig. 4 to 6.

1. Coding module

The encoding module mathematically processes dialog text data from a specific domain, such as traffic customer service. Vector encoding can be performed with an existing deep learning pre-trained model such as BERT (Bidirectional Encoder Representations from Transformers), which maps each round of dialog text into a high-dimensional vector space and thereby avoids the loss of semantic information caused by the mappings used in traditional methods. Moreover, the sentence vectors produced by BERT carry multiple layers of semantic information, which can greatly improve classifier performance. This part relies on existing techniques.

As shown in FIG. 4, during encoding each dialog text is split into individual tokens (characters); token embeddings, segment embeddings and position embeddings are computed for them, and their sum is fed into BERT, which encodes it into a vector in a continuous vector space, yielding the encoding vector of each dialog text.
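The input representation described above is an element-wise sum of three embeddings per token. A minimal pure-Python sketch, assuming toy lookup tables: the function name and table contents are hypothetical, and real BERT additionally applies layer normalization and dropout to this sum, which is omitted here.

```python
def input_embeddings(token_ids, segment_ids, tok_table, seg_table, pos_table):
    """Sum token, segment and position embeddings at each position.

    tok_table[t], seg_table[s] and pos_table[p] are equal-length lists of
    numbers (hypothetical toy tables). Returns one summed vector per token,
    i.e. the sequence that would be fed to the encoder.
    """
    if len(token_ids) != len(segment_ids):
        raise ValueError("token_ids and segment_ids must have the same length")
    summed = []
    for pos, (t, s) in enumerate(zip(token_ids, segment_ids)):
        summed.append([a + b + c for a, b, c in
                       zip(tok_table[t], seg_table[s], pos_table[pos])])
    return summed


tok_table = [[1, 0], [0, 1], [2, 2]]
seg_table = [[0, 0], [10, 10]]
pos_table = [[100, 100], [200, 200], [300, 300]]
print(input_embeddings([2, 0, 1], [0, 0, 1], tok_table, seg_table, pos_table))
# [[102, 102], [201, 200], [310, 311]]
```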

Since most users in a dialog system express themselves with short phrases and ellipsis, the content of a dialog is difficult to understand from a single round. The user's intent is therefore understood more accurately by collecting more semantic information from the context of the current dialog. To this end, for each dialog text the invention can use a dynamic sliding window to obtain the dialog information of the rounds before and after it. The window sizes w_before and w_after can be customized (both are integers greater than or equal to 0): for example, using the two rounds preceding the current round as the preceding context and the one following round as the following context means w_before = 2 and w_after = 1. Note that each round of dialog text refers to a group of exchanges in which both parties interact, and may contain two or more dialog texts.
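The window selection can be sketched as a slice over the list of rounds. This is a minimal Python illustration, assuming rounds are held in a list and that the window is simply truncated at the start and end of the conversation; the source does not specify the boundary behavior, and the function name is hypothetical.

```python
def context_window(turns, current, w_before, w_after):
    """Return rounds current - w_before .. current + w_after (inclusive),
    clipped to the conversation. The current round is always included."""
    if w_before < 0 or w_after < 0:
        raise ValueError("w_before and w_after must be integers >= 0")
    if not 0 <= current < len(turns):
        raise IndexError("current round index out of range")
    start = max(0, current - w_before)
    end = min(len(turns), current + w_after + 1)
    return turns[start:end]


turns = ["r0", "r1", "r2", "r3", "r4"]
print(context_window(turns, 3, 2, 1))  # ['r1', 'r2', 'r3', 'r4']
print(context_window(turns, 0, 2, 1))  # ['r0', 'r1']
```

With w_before = 2 and w_after = 1 as in the example above, a round in the middle of the conversation gets four rounds of context (itself included), while rounds near either end get fewer.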

After each dialog text in the sliding window is encoded by BERT, encoding vectors carrying contextual information are obtained. Letting the sliding window contain n dialog texts, the encoding vector of any one of them is denoted V_i = BERT(X_C), i = 1, 2, ..., n.

Some of the dialog history is related to the intent of the current dialog, but much of it is not, so introducing it indiscriminately would bring in a lot of noise; an attention model is therefore introduced to judge relevance. The correlation between V_Q3 of the current dialog text and each context vector V_i is represented by the inner product of the two vectors, and the resulting set of correlation values is normalized by softmax into attention probability distribution values that lie in [0, 1] and sum to 1. The formula is as follows:

$$P_i = \mathrm{softmax}\left(V_{Q3} \cdot V_i\right), \quad i = 1, 2, \ldots, n$$

where the softmax function is:

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{k=1}^{n} e^{z_k}}$$

With P_i as the weights, the encoding vectors V_i in the sliding window are summed to obtain the context feature vector, denoted V_H:

$$V_H = \sum_{i=1}^{n} P_i V_i$$
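The attention pooling above (inner-product scores, softmax, weighted sum) can be sketched in pure Python. Plain lists stand in for the BERT vectors, and the function names are hypothetical; this is an illustration of the formulas, not the patent's implementation.

```python
import math


def softmax(scores):
    """Normalize scores into probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(v_q, window_vectors):
    """Return (V_H, P): the weighted sum of the window's encoding vectors V_i,
    with weights P_i = softmax(V_Q3 . V_i), together with the weights."""
    scores = [sum(q * x for q, x in zip(v_q, v_i)) for v_i in window_vectors]
    p = softmax(scores)
    dim = len(v_q)
    v_h = [sum(p_i * v_i[d] for p_i, v_i in zip(p, window_vectors))
           for d in range(dim)]
    return v_h, p


v_h, p = attention_context([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# p[0] = e / (e + 1), roughly 0.731: the window vector aligned with V_Q3 dominates
```

Subtracting the maximum score before exponentiating leaves the softmax result unchanged mathematically but prevents overflow when inner products of high-dimensional vectors are large.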

2. Control module

In a multi-turn dialog system, recognizing the user's intent sometimes requires reference to contextual information, while in other cases the intent can be determined from the current utterance alone. For utterances whose intent can be determined directly, introducing contextual information amounts to introducing a lot of noise. The control module therefore decides whether to introduce the contextual features by computing a state value, as shown in FIG. 5. The state value is computed as follows (this step is implemented with existing techniques):

$$S = \mathrm{sigmoid}\left(W_S \left[V_{Q3}, V_H\right]\right)$$

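where W_S is an empirical parameter and [V_Q3, V_H] denotes the concatenation of the two vectors. The sigmoid function is:

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$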

The sigmoid function produces a value σ between 0 and 1: when σ is close to 0, no contextual information needs to be referenced; when σ is close to 1, contextual information should be referenced. A preset value is determined and the state value S is compared with it: if S exceeds the preset value, the contextual information needs to be introduced; if not, it does not.
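A minimal Python sketch of the control decision. It assumes W_S is a weight vector whose length equals that of the concatenation [V_Q3, V_H], so that S is a scalar, and uses 0.5 as a hypothetical preset value; the source gives neither the shape of W_S nor the preset value.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def state_value(w_s, v_q, v_h):
    """S = sigmoid(W_S [V_Q3, V_H]), with W_S assumed to be a flat weight vector."""
    concat = list(v_q) + list(v_h)          # [V_Q3, V_H]: concatenation
    if len(w_s) != len(concat):
        raise ValueError("w_s must match the concatenated length")
    return sigmoid(sum(w * x for w, x in zip(w_s, concat)))


def needs_context(s, preset=0.5):
    """Introduce context only when S exceeds the (hypothetical) preset value."""
    return s > preset


s = state_value([1.0, 1.0, 1.0, 1.0], [1.0, 1.0], [1.0, 1.0])  # sigmoid(4), about 0.982
print(needs_context(s))  # True
```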

3. Classification training module

Since the intent recognition problem of a multi-turn dialog can itself be viewed as an intent classification problem, it can be implemented along the lines of multi-label classification algorithms, for which deep learning has achieved excellent performance. (This step can be accomplished with existing techniques.)

As shown in FIG. 6, an existing classification model, such as a convolutional neural network (CNN), is used as the classifier.

If context-dependent information needs to be introduced, the dialog vector V_Q3 and the context feature vector V_H are combined for classification. They are first combined through a sigmoid function to obtain the feature vector E:

$E = \mathrm{sigmoid}(W_E \cdot [V_{Q3} + S V_H])$
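The combination step can be sketched as below. It assumes that "+" inside the brackets is element-wise addition (so V_Q3 and V_H share a dimension d), that W_E is an m × d matrix given as a list of rows, and that sigmoid is applied element-wise; none of these shapes are stated explicitly in the source.

```python
import math
from typing import List, Sequence


def combine(w_e: Sequence[Sequence[float]], v_q: Sequence[float],
            v_h: Sequence[float], s: float) -> List[float]:
    """E = sigmoid(W_E * [V_Q3 + S * V_H]), with sigmoid applied element-wise.

    ASSUMPTIONS: "+" is element-wise addition and W_E is an m x d matrix
    given as a list of rows.
    """
    if len(v_q) != len(v_h):
        raise ValueError("V_Q3 and V_H must have the same dimension")
    mixed = [q + s * h for q, h in zip(v_q, v_h)]  # S scales how much context is mixed in
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, mixed)))) for row in w_e]


identity = [[1.0, 0.0], [0.0, 1.0]]  # toy W_E
e = combine(identity, [0.0, 0.0], [1.0, -1.0], s=1.0)
```

Note that the state value S appears again here: when S is small, E is dominated by V_Q3, so S acts as a soft gate on the context vector as well as driving the hard decision.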

Each piece of dialog data in the training set is processed as described above. Depending on the control module's decision, either the combined feature vector E (if context is introduced) or the coding vector V_Q3 of the dialog text (otherwise) is input into the classifier, and the classifier is trained to obtain a multi-label classification model that classifies and recognizes the intent of the dialog text.

Specifically, the training process applies convolution, max pooling and fully connected operations to the combined feature vector E, and uses sigmoid as the activation function of the classification layer so that each predicted score lies in (0, 1). The result is a predicted label vector O whose dimension equals the size of the label set.
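The forward pass described above (convolution, max pooling, fully connected layer, sigmoid) can be sketched in plain Python. This is only an illustration under stated assumptions: a single input channel, ReLU after the convolution, and global max pooling (one value per kernel); the source does not fix these details, and the weight-update step of training is not shown.

```python
import math
from typing import List, Sequence


def conv1d_relu(x: Sequence[float], kernels: Sequence[Sequence[float]],
                biases: Sequence[float]) -> List[List[float]]:
    """'Valid' 1-D convolution (cross-correlation) of x with each kernel, then ReLU."""
    maps = []
    for kernel, b in zip(kernels, biases):
        width = len(kernel)
        if width > len(x):
            raise ValueError("kernel wider than input")
        maps.append([max(0.0, sum(kernel[j] * x[i + j] for j in range(width)) + b)
                     for i in range(len(x) - width + 1)])
    return maps


def predict_label_vector(e: Sequence[float], kernels: Sequence[Sequence[float]],
                         conv_biases: Sequence[float],
                         fc_weights: Sequence[Sequence[float]],
                         fc_biases: Sequence[float]) -> List[float]:
    """E -> conv -> max pool -> fully connected -> sigmoid, giving the label vector O."""
    pooled = [max(fm) for fm in conv1d_relu(e, kernels, conv_biases)]  # one value per kernel
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(fc_weights, fc_biases)]  # one logit per label
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # each score in (0, 1)


o = predict_label_vector([1.0, 2.0, 3.0], [[1.0, 1.0]], [0.0],
                         [[0.0], [1.0], [-1.0]], [0.0, 0.0, 0.0])  # 3 labels
```

Because each label gets its own sigmoid rather than a shared softmax, several labels can score high at once, which is what makes the output multi-label.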

An objective function can be used to evaluate the model. Optionally, binary cross entropy is used, where n is the total number of samples, x denotes a sample, y_i is the i-th dimension of the target label vector and o_i is the i-th dimension of the predicted label vector; both the target label vector y and the predicted label vector o have a dimension equal to the size of the label set. The objective function scores the agreement between the predicted label vector O and the target label vector Y; in the standard binary cross-entropy form it reads:

$L = -\frac{1}{n} \sum_{x} \sum_{i} \left[ y_i \ln o_i + (1 - y_i) \ln (1 - o_i) \right]$
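The binary cross-entropy objective can be sketched as follows, assuming the standard form averaged over the n samples and summed over the label dimensions. The clipping constant eps is an implementation assumption added only to keep the logarithm finite.

```python
import math
from typing import Sequence


def bce_objective(targets: Sequence[Sequence[float]], preds: Sequence[Sequence[float]],
                  eps: float = 1e-12) -> float:
    """-(1/n) * sum_x sum_i [y_i ln o_i + (1 - y_i) ln(1 - o_i)] over n samples."""
    n = len(targets)
    total = 0.0
    for y, o in zip(targets, preds):
        for y_i, o_i in zip(y, o):
            o_i = min(max(o_i, eps), 1.0 - eps)  # clip so log() never sees 0
            total += y_i * math.log(o_i) + (1.0 - y_i) * math.log(1.0 - o_i)
    return -total / n
```

Lower values mean the predicted label vector agrees more closely with the target label vector.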

4. Setting custom thresholds

Each dialog text in the verification set is tested with the trained multi-label classification model, which outputs a predicted label vector. The objective function described above is then used to calculate a similarity score between the predicted label vector and the target label vector, giving each intent in the verification set a score between 0 and 1.

The invention further provides a scheme for determining the optimal threshold for each intent's score in the verification set, comprising the following steps:

Step S1: set an initial threshold of 0.01, select one intent in the verification set, and calculate the F1 score of that intent over the entire verification set. The F1 score is a statistical measure of the accuracy of a binary classification model that takes both its precision and its recall into account; it is the harmonic mean of precision and recall, with a maximum of 1 and a minimum of 0.

Step S2: check whether the threshold lies in (0, 1). If it does, update it as threshold = threshold + 0.01, calculate the F1 score of the intent, compare it with the previous F1 score, and record the maximum F1 score and its corresponding threshold; otherwise, go to step S3.

Step S3: check whether all intents have been traversed. If not, move to the next intent and repeat steps S1 and S2; otherwise, go to step S4.

Step S4: for each intent, select the threshold corresponding to its highest F1 score as the optimal threshold for that intent's score.
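Steps S1 to S4 can be sketched as a per-intent grid search. Assumptions in this sketch: a sample counts as predicted positive when its score is greater than or equal to the threshold, the grid is 0.01, 0.02, ..., 0.99 generated from integers (to avoid floating-point drift from repeatedly adding 0.01), and ties keep the lowest threshold. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 for one intent; a sample counts as positive when its score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)  # harmonic mean


def best_thresholds(targets: Sequence[Sequence[int]],
                    scores: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Per intent, scan thresholds 0.01..0.99 and keep (threshold, F1) with the best F1."""
    results = []
    for j in range(len(targets[0])):  # S3: traverse every intent
        labels = [t[j] for t in targets]
        column = [s[j] for s in scores]
        best_t, best_f1 = 0.01, -1.0
        for k in range(1, 100):  # S1/S2: 0.01, 0.02, ..., 0.99
            t = k / 100
            f1 = f1_score(labels, column, t)
            if f1 > best_f1:  # strict ">" keeps the lowest threshold on ties
                best_t, best_f1 = t, f1
        results.append((best_t, best_f1))  # S4: optimal threshold for intent j
    return results


targets = [[1, 0], [0, 1], [1, 1], [0, 0]]  # toy verification labels, 2 intents
scores = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.6], [0.1, 0.4]]
```

On this toy data the first intent separates perfectly once the threshold passes 0.3, so the search returns 0.31 for it; the second intent returns 0.41.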

Referring to fig. 7, an embodiment of the present invention also provides a multi-intent recognition method, which uses the multi-label classification model trained as described above to recognize multiple intents in dialog text.

The multi-intent recognition method may include: an encoding step, a control step and a classification step.

The encoding step comprises:

701. Acquire the current dialog text from the dialog system and preprocess the data.

702. Encode the current dialog text to obtain its coding vector.

Optionally, the text is encoded with a neural-network-based pre-trained deep learning model, which vectorizes the dialog text. The coding vector of the current dialog text is denoted V_d.

703. Calculate the feature vector of the current dialog text that carries context-dependent information.

Optionally, a sliding window is used to obtain multiple dialog turns including the current dialog text, and the coding vectors of all dialog texts in the window are weighted and summed. The weighted sum V_h is the feature vector of the current dialog text that carries context-dependent information.

The control step comprises:

704. Determine whether context-dependent information needs to be introduced.

Specifically, a state value can be calculated from the coding vector V_d of the current dialog text and its context feature vector V_h, and whether context-dependent information is introduced is decided by whether the state value exceeds a preset value: if it does, the information is introduced; otherwise it is not.

The classification step comprises:

705. Classify the intent using the multi-label classification model.

Specifically, if the control step decides to introduce context, the coding vector V_d of the current dialog text and its context feature vector V_h are combined and input into the multi-label classification model; otherwise, the coding vector V_d is input into the model directly. The intent of the current dialog text is thereby classified and recognized. The multi-label classification model is trained by the multi-intent recognition model training method described above.

Optionally, in the encoding step, the weighted summation may include: denoting the coding vector of each dialog text in the sliding window as V_j, and calculating the probability values of the attention distribution according to the formula

Optionally, in the control step, the state value may be calculated from the coding vector V_d of the current dialog text and its context feature vector V_h as $S = \mathrm{sigmoid}(W_S \cdot [V_d, V_h])$, where W_S is an empirical parameter; whether context-dependent information needs to be introduced is then decided by whether the state value S exceeds a preset value.

Optionally, in the classification step, combining the coding vector of the current dialog text with its context feature vector and inputting the result into the multi-label classification model may include: combining the coding vector V_d and the context feature vector V_h with the sigmoid function, $E = \mathrm{sigmoid}(W_E \cdot [V_d + S V_h])$, where W_E is an empirical parameter, and inputting the resulting feature vector E into the multi-label classification model.

Referring to fig. 8, an embodiment of the present invention further provides a multi-intent recognition apparatus, including:

the encoding module 81, configured to vector-encode the current dialog text and calculate its feature vector carrying context-dependent information;

the control module 82, configured to determine whether context-dependent information needs to be introduced;

and the classification module 83, configured to, if the control module decides to introduce context, combine the coding vector of the current dialog text with its context feature vector and input the result into the multi-label classification model to classify and recognize the intent of the current dialog text.

The multi-intent recognition model training method, the multi-intent recognition method and the related devices provided by the embodiments of the invention have been described above. As the technical solutions show, the embodiments of the invention have the following advantages:

1. First, being based on deep learning, the method transfers readily to new domains. For intent recognition in a different domain, only the labeled intent data of that domain needs to be replaced and the model retrained. Compared with traditional rule-template methods, there is no need to account for how standardized the users' expressions are, the cost of manually writing templates is saved, and the method is easier to extend.

Referring to fig. 9, an embodiment of the present invention further provides a computer device 90, which includes a processor 91 and a memory 92, where the memory 92 stores a program, and the program includes computer-executable instructions, and when the computer device 90 runs, the processor 91 executes the computer-executable instructions stored in the memory 92, so as to make the computer device 90 execute the multi-intent recognition model training method or the multi-intent recognition method as described above.

An embodiment of the present invention also provides a computer readable storage medium storing one or more programs, the one or more programs comprising computer executable instructions, which when executed by a computer device, cause the computer device to perform a multi-intent recognition model training method or a multi-intent recognition method as described above.

Each of the above embodiments has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; those of ordinary skill in the art will understand that: the technical solutions described in the above embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A multi-intent recognition model training method, characterized by comprising the following steps:

an encoding step: encoding the training data to obtain a coding vector of each dialog text in the training data, and calculating, for each dialog text, a feature vector carrying context-dependent information;

a control step: determining, for each dialog text, whether context-dependent information needs to be introduced;

and a classification training step: for each dialog text, if the control step determines that context-dependent information is to be introduced, combining the coding vector of the dialog text with its context feature vector and inputting the result into a classifier; otherwise, inputting the coding vector of the dialog text directly into the classifier; and training the classifier to obtain a multi-label classification model for classifying and recognizing the intent of dialog text.

2. The method of claim 1,

the encoding step specifically includes: encoding the training data with a pre-trained deep learning model to obtain a coding vector V_Q3 of each dialog text in the training data; and, for each dialog text, acquiring multiple dialog turns including that dialog text with a sliding window, and weighting and summing the coding vectors of all dialog texts in the window to obtain the feature vector of that dialog text carrying context-dependent information, denoted V_H.

3. The method of claim 2,

the control step specifically comprises: for each dialog text, calculating a state value from the coding vector V_Q3 of the dialog text and its context feature vector V_H as $S = \mathrm{sigmoid}(W_S \cdot [V_{Q3}, V_H])$, where W_S is an empirical parameter; and determining whether context-dependent information needs to be introduced according to whether the state value S exceeds a preset value.

4. The method according to claim 2, wherein, in the classification training step, combining the coding vector of the dialog text with its context feature vector and inputting the result into a classifier comprises:

combining the coding vector V_Q3 of the dialog text and its context feature vector V_H with the sigmoid function, $E = \mathrm{sigmoid}(W_E \cdot [V_{Q3} + S V_H])$, where W_E is an empirical parameter, and inputting the resulting feature vector E into the classifier.

5. The method according to any of claims 1-4, further comprising the step of obtaining a score for each intent in the verification set, the step comprising in particular:

testing each dialog text in the verification set with the trained multi-label classification model to output a predicted label vector, and calculating the similarity score between the predicted label vector and the target label vector with an objective function, thereby obtaining a score between 0 and 1 for each intent in the verification set.

6. The method of claim 5, further comprising the step of determining an optimal threshold for the score of each intent in the verification set, the step comprising in particular:

7. A multi-intent recognition method, comprising:

an encoding step: vector-encoding the current dialog text and calculating its feature vector carrying context-dependent information;

a control step: determining whether context-dependent information needs to be introduced;

and a classification step: if the control step determines yes, combining the coding vector of the current dialog text with its context feature vector, inputting the result into a multi-label classification model, and classifying and recognizing the intent of the current dialog text.

8. The method of claim 7,

the encoding step specifically includes: encoding the current dialog text with a pre-trained deep learning model, the coding vector of the current dialog text being denoted V_d; and acquiring multiple dialog turns including the current dialog text with a sliding window, and weighting and summing the coding vectors of all dialog texts in the window to obtain the feature vector of the current dialog text carrying context-dependent information, denoted V_h;

the control step specifically includes: calculating a state value from the coding vector V_d of the current dialog text and its context feature vector V_h as $S = \mathrm{sigmoid}(W_S \cdot [V_d, V_h])$, where W_S is an empirical parameter; and determining whether context-dependent information needs to be introduced according to whether the state value S exceeds a preset value;

in the classification step, combining the coding vector of the current dialog text with its context feature vector and inputting the result into the multi-label classification model includes: combining the coding vector V_d and the context feature vector V_h with the sigmoid function, $E = \mathrm{sigmoid}(W_E \cdot [V_d + S V_h])$, where W_E is an empirical parameter, and inputting the resulting feature vector E into the multi-label classification model.

9. A multi-intent recognition model training device, comprising:

the encoding module is configured to encode the training data to obtain a coding vector of each dialog text in the training data, and to calculate, for each dialog text, a feature vector carrying context-dependent information;

the control module is configured to determine, for each dialog text, whether context-dependent information needs to be introduced;

the classification training module is configured to, for each dialog text, combine its coding vector with its context feature vector and input the result into the classifier if the control module determines yes; otherwise, input the coding vector of the dialog text directly into the classifier; and train the classifier to obtain a multi-label classification model for classifying and recognizing the intent of dialog text.

10. A multi-intent recognition apparatus, comprising:

the encoding module is configured to vector-encode the current dialog text and calculate its feature vector carrying context-dependent information;

the control module is configured to determine whether context-dependent information needs to be introduced;

and the classification module is configured to, if the control module determines yes, combine the coding vector of the current dialog text with its context feature vector and input the result into the multi-label classification model to classify and recognize the intent of the current dialog text.