CN111984780A - Multi-intention recognition model training method, multi-intention recognition method and related device - Google Patents

Multi-intention recognition model training method, multi-intention recognition method and related device

Info

Publication number
CN111984780A
Authority
CN
China
Prior art keywords
vector
dialog text
context
context information
intention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010951226.5A
Other languages
Chinese (zh)
Inventor
黄石磊
张剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Raisound Technology Co ltd
Original Assignee
Shenzhen Raisound Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Raisound Technology Co ltd filed Critical Shenzhen Raisound Technology Co ltd
Priority to CN202010951226.5A priority Critical patent/CN111984780A/en
Publication of CN111984780A publication Critical patent/CN111984780A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-intention recognition model training method, a multi-intention recognition method and a related device. The multi-intention recognition model training method comprises the following steps: obtaining the coding vector of each dialog text, and calculating its feature vector with context information related to the context; judging whether context information related to the context needs to be introduced; if so, combining the coding vector of the dialog text with the feature vector with context information related to the context and inputting the combination into a classifier; otherwise, directly inputting the coding vector of the dialog text into the classifier; and training through the classifier to obtain a multi-label classification model for classifying and identifying the intention of the dialog text. The multi-intention recognition method comprises: classifying and identifying the intention of the dialog text by using the multi-label classification model obtained by training. The method is easier to migrate to new domains, recognizes intentions more accurately, and improves the performance of the classification algorithm by utilizing context information related to the context.

Description

Multi-intention recognition model training method, multi-intention recognition method and related device
Technical Field
The invention relates to the technical field of computer data processing, in particular to a multi-intention recognition model training method, a multi-intention recognition method and a related device.
Background
In existing dialogue systems, many sentences are expressed in non-standard language: the syntactic structure is simple, and most sentences are short or elliptical. The content of a dialogue is therefore difficult to resolve and clarify within a single round, and the intention is usually hidden across multiple rounds. Multiple rounds of dialogue provide more semantic information, so the intention of the questioner can be identified more accurately. In practical application scenarios, both speech recognition and human language expression are imprecise, which greatly increases the difficulty for a robot to understand the user's intention. How to correctly identify the intentions of questioners has long been one of the key points of multi-turn dialog system research.
Early intent recognition methods treated the task as a semantic utterance classification problem, and mainly included rule-based methods, methods using statistical features, and methods based on machine learning classification algorithms. Rule-template-based methods usually target sets of very similar sentences that conform to certain rules. They require the manual construction of rule templates and category information, i.e. which keywords correspond to which intents; the intention of the questioner is then determined by parsing against the rule templates. Statistical-feature-based methods use an intention dictionary to count word frequencies and take the intention corresponding to the most frequent word as the intention of the questioner. Machine-learning-based methods generally use classifiers such as Naive Bayes and the Support Vector Machine; the common way to realize multi-intent recognition with them is to train one classifier for each intent and then apply the classifiers layer by layer, which significantly improves the accuracy of intent recognition.
With the development of deep learning, intention recognition using neural network models is becoming mainstream in dialogue systems. These methods mainly convert the intention recognition task into an intention classification task, similar to a text clustering task, and achieve intention classification by using text classification algorithms.
However, the above methods have drawbacks. Because people have little time to formulate their language during conversation, the syntactic structure is simple and short sentences and elliptical forms are common, so identifying intentions by means of rule templates is not applicable; this approach also has high labor cost, low efficiency and is difficult to extend. The statistical-feature-based methods are relatively simple, but their recognition effect is poor. Although machine-learning-based methods improve the accuracy of intention recognition, most of them cannot solve the sparse matrix problem, rely on a large amount of labeled corpus, and therefore do not reduce the labor cost either.
Text classification algorithms based on deep learning mostly do not make efficient use of the context information in dialog text. In a multi-turn dialog system, the current intention of the questioner is often linked to the preceding or following rounds of dialog. Therefore, how to understand the intention of the current dialog by using the contextual information of the dialog is a difficulty for dialog systems.
Disclosure of Invention
The invention aims to provide a multi-intention recognition model training method, a multi-intention recognition method and a related device, which are used for solving the technical problem of how to understand the intention of the current conversation by using the contextual information of the conversation.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, a multi-intent recognition model training method for multi-intent recognition is provided, and the method includes: an encoding step: encoding the training data to obtain a coding vector of each dialog text in the training data, and calculating, for each dialog text, a feature vector with context information related to the context; a control step: judging, for each dialog text, whether context information related to the context needs to be introduced; and a classification training step: for each dialog text, if the control step judges yes, combining the coding vector of the dialog text with its feature vector with context information related to the context and inputting the combination into a classifier; otherwise, directly inputting the coding vector of the dialog text into the classifier; and training through the classifier to obtain a multi-label classification model for classifying and identifying the intention of the dialog text.
In a possible implementation manner, the encoding step specifically includes: encoding the training data by using a deep learning pre-training model to obtain a coding vector $V_{Q3}$ of each dialog text in the training data; and, for each dialog text, acquiring multiple rounds of dialog texts including the dialog text by using a sliding window, and weighting and summing the coding vectors of all the dialog texts in the sliding window to obtain a feature vector of the dialog text with context information related to the context, denoted $V_H$.
Further, the weighted summation of the coding vectors of all dialog texts in the sliding window may include: denoting the coding vector of each dialog text in the sliding window as $V_i$; calculating the probability distribution value of the attention distribution according to the formula
$$P_i = \mathrm{softmax}(V_{Q3} \cdot V_i)$$
and, with $P_i$ as the weights, performing a weighted summation over the coding vectors $V_i$ in the sliding window to obtain the feature vector $V_H$ with context information related to the context.
In a possible implementation manner, the controlling step specifically includes: for each dialog text, calculating a state value from the coding vector $V_{Q3}$ of the dialog text and its feature vector $V_H$ with context information related to the context, according to the formula $S = \mathrm{sigmoid}(W_S * [V_{Q3}, V_H])$, where $W_S$ is an empirical parameter; and judging whether context information related to the context needs to be introduced according to whether the state value $S$ exceeds a preset value.
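As an illustration only, the state value and the resulting decision can be sketched as follows. The sketch assumes that the bracket denotes concatenation of the two vectors, that $W_S$ is a weight vector of matching length so that $S$ is a scalar, and that the preset value is 0.5; none of these details is fixed by the text, and the names `state_value` and `needs_context` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def state_value(w_s: Vector, v_q: Vector, v_h: Vector) -> float:
    """S = sigmoid(W_S * [V_Q3, V_H]); the bracket is ASSUMED to be concatenation."""
    concat = v_q + v_h
    if len(w_s) != len(concat):
        raise ValueError("W_S must have one weight per concatenated component")
    return sigmoid(sum(w * x for w, x in zip(w_s, concat)))


def needs_context(s: float, preset: float = 0.5) -> bool:
    """Introduce context only when S exceeds the preset value (0.5 is an assumption)."""
    return s > preset


# Toy values, chosen only to exercise the functions.
v_q = [0.2, -0.1]
v_h = [0.4, 0.3]
w_s = [1.0, 1.0, 1.0, 1.0]
s = state_value(w_s, v_q, v_h)
use_context = needs_context(s)
```

With these toy values the weighted sum is 0.8, so $S$ lies above the assumed preset value and the context would be introduced.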
In one possible implementation manner, in the classification training step, combining the coding vector of the dialog text with its feature vector with context information related to the context and inputting the combination into the classifier includes: combining the coding vector $V_{Q3}$ of the dialog text and its feature vector $V_H$ with context information related to the context by using the sigmoid function, $E = \mathrm{sigmoid}(W_E * [V_{Q3} + S V_H])$, where $W_E$ is an empirical parameter; and inputting the feature vector $E$ obtained by the combination into the classifier.
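A minimal sketch of this combination is given below. It assumes that $W_E$ is a weight matrix applied to the element-wise sum $V_{Q3} + S V_H$ and that the sigmoid is applied element-wise; the function name `combine` and the toy values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combine(w_e: Matrix, v_q: Vector, v_h: Vector, s: float) -> Vector:
    """E = sigmoid(W_E * [V_Q3 + S * V_H]), with W_E ASSUMED to be a matrix."""
    mixed = [q + s * h for q, h in zip(v_q, v_h)]
    return [sigmoid(sum(w * x for w, x in zip(row, mixed))) for row in w_e]


# Toy values: identity W_E, so E is just the element-wise sigmoid of the mix.
w_e = [[1.0, 0.0], [0.0, 1.0]]
v_q = [0.0, 1.0]
v_h = [2.0, 0.0]
e = combine(w_e, v_q, v_h, s=0.5)
```

The state value $S$ acts as a scalar gate here: the closer it is to 0, the less the context vector contributes to $E$.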
In a possible implementation, the method further includes a step of obtaining a score for each intention in the verification set, where the step specifically includes: testing each dialog text in the verification set by using the multi-label classification model obtained by training, outputting a predicted label vector, and calculating the similarity score between the predicted label vector and the target label vector by using a target function, thereby obtaining the score of each intention in the verification set, wherein the score is between 0 and 1.
In a possible implementation, the method further comprises a step of determining an optimal threshold for the score of each intention in the verification set, which includes in particular:
S1: setting an initial threshold of 0.01, selecting an intention in the verification set, and calculating the F1 score of the intention on the whole verification set;
S2: judging whether the threshold is within (0, 1); if yes, updating the threshold of the intention as threshold = threshold + 0.01, then calculating the F1 score of the intention, comparing it with the previous F1 score, and recording the maximum F1 score and its corresponding threshold; if not, going to step S3;
S3: judging whether all intentions have been traversed; if not, moving to the next intention and returning to step S1; otherwise, going to step S4;
S4: finally, for each intention, selecting the threshold corresponding to its highest F1 score as the optimal threshold for the score of that intention.
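Steps S1 to S4 amount to a per-intention grid search over thresholds, which can be sketched as follows. The sketch assumes that an intention is predicted when its score is greater than or equal to the threshold and that ties in F1 keep the lowest threshold; the function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Sweep 0.01, 0.02, ..., 0.99 and keep the threshold with the highest F1."""
    best_t, best_f1 = 0.01, -1.0
    for step in range(1, 100):
        t = step / 100
        preds = [1 if s >= t else 0 for s in scores]  # ">=" is an assumption
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def best_thresholds(labels: List[List[int]], scores: List[List[float]]) -> List[float]:
    """One optimal threshold per intention (per column of the label matrix)."""
    n_intents = len(labels[0])
    return [
        best_threshold([row[k] for row in labels], [row[k] for row in scores])[0]
        for k in range(n_intents)
    ]


# Toy verification set: 4 dialog texts, 2 intentions.
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.6], [0.8, 0.7], [0.1, 0.4]]
thresholds = best_thresholds(labels, scores)
```

An integer loop counter is used instead of repeatedly adding 0.01, which avoids accumulating floating-point error over the sweep.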
In a second aspect, a multi-intent recognition method is provided, including: an encoding step: performing vector encoding on the current dialog text, and calculating a feature vector of the current dialog text with context information related to the context; a control step: judging whether context information related to the context needs to be introduced; and a classification step: if the control step judges yes, combining the coding vector of the current dialog text with its feature vector with context information related to the context, inputting the combination into a multi-label classification model, and classifying and identifying the intention of the current dialog text.
In a possible implementation manner, the encoding step specifically includes: encoding the current dialog text by using a deep learning pre-training model, and denoting the coding vector of the current dialog text as $V_d$; and acquiring multiple rounds of text including the current dialog text by using a sliding window, and weighting and summing the coding vectors of all the dialog texts in the sliding window to obtain a feature vector of the current dialog text with context information related to the context, denoted $V_h$.
Further, the weighted summation of the coding vectors of all dialog texts in the sliding window may include: denoting the coding vector of each dialog text in the sliding window as $V_j$; calculating the probability distribution value of the attention distribution according to the formula
$$P_j = \mathrm{softmax}(V_d \cdot V_j)$$
and, with $P_j$ as the weights, performing a weighted summation over the coding vectors $V_j$ in the sliding window to obtain the feature vector $V_h$ with context information related to the context.
In a possible implementation manner, the controlling step specifically includes: calculating a state value from the coding vector $V_d$ of the current dialog text and its feature vector $V_h$ with context information related to the context, according to the formula $S = \mathrm{sigmoid}(W_S * [V_d, V_h])$, where $W_S$ is an empirical parameter; and judging whether context information related to the context needs to be introduced according to whether the state value $S$ exceeds a preset value.
In one possible implementation manner, in the classifying step, combining the coding vector of the current dialog text with its feature vector with context information related to the context and inputting the combination into the multi-label classification model includes: combining the coding vector $V_d$ of the dialog text and its feature vector $V_h$ with context information related to the context by using the sigmoid function, $E = \mathrm{sigmoid}(W_E * [V_d + S V_h])$, where $W_E$ is an empirical parameter; and inputting the feature vector $E$ obtained by the combination into the multi-label classification model.
In a third aspect, a multi-intent recognition model training apparatus is provided, including: an encoding module, configured to encode the training data to obtain a coding vector of each dialog text in the training data, and to calculate, for each dialog text, a feature vector with context information related to the context; a control module, configured to judge, for each dialog text, whether context information related to the context needs to be introduced; and a classification training module, configured to, for each dialog text, combine the coding vector of the dialog text with its feature vector with context information related to the context and input the combination into a classifier if the control module judges yes, and otherwise directly input the coding vector of the dialog text into the classifier; and to train through the classifier to obtain a multi-label classification model for classifying and identifying the intention of the dialog text.
In a possible implementation manner, the apparatus further includes: a threshold setting module, configured to obtain the score of each intention in the verification set and determine the optimal threshold for the score of each intention in the verification set.
In a fourth aspect, a multi-intent recognition apparatus is provided, comprising: an encoding module, configured to perform vector encoding on the current dialog text and calculate a feature vector of the current dialog text with context information related to the context; a control module, configured to judge whether context information related to the context needs to be introduced; and a classification module, configured to, if the control module judges yes, combine the coding vector of the current dialog text with its feature vector with context information related to the context, input the combination into the multi-label classification model, and classify and identify the intention of the current dialog text.
In a fifth aspect, a computer device is provided, which includes a processor and a memory, the memory storing a program, the program including computer-executable instructions, when the computer device is running, the processor executing the computer-executable instructions stored in the memory, so as to cause the computer device to execute the multi-intent recognition model training method according to the first aspect.
In a sixth aspect, there is provided a computer device comprising a processor and a memory, the memory having stored therein a program comprising computer-executable instructions, the processor executing the computer-executable instructions stored by the memory when the computer device is running, to cause the computer device to perform the multi-intent recognition method according to the second aspect.
In a seventh aspect, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising computer executable instructions, which when executed by a computer device, cause the computer device to perform the multi-intent recognition model training method of the first aspect.
In an eighth aspect, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising computer executable instructions, which when executed by a computer device, cause the computer device to perform the multiple intent recognition method of the second aspect.
According to the technical scheme, the embodiment of the invention has the following advantages:
1. Model training is carried out by using a deep-learning-based method, so the model can be migrated and reused more easily. For intention recognition tasks in a different field, only the labeled intention data of that field needs to be swapped in and the model retrained. Compared with the traditional rule-template-based method, there is no need to consider how standardized the user's expression is, the cost of manually making templates is saved, and the method is easier to extend.
2. Compared with the traditional intention identification method, the method based on deep learning has more accurate identification effect.
3. Compared with methods that perform intention recognition using only the current text information, the method uses a sliding window with a controllable text range to obtain context information related to the context within a certain range, which improves the performance of the classification algorithm and reduces the heavy noise that would be caused by introducing context information related to the context from the full text.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments and of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a multi-intent recognition model training method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a training apparatus for multiple intent recognition models according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a multi-intent recognition model training apparatus according to an embodiment of the present invention;
FIG. 4 is a process diagram of the encoding step in an embodiment of the present invention;
FIG. 5 is a process diagram of the control step in the embodiment of the present invention;
FIG. 6 is a process diagram of the classification step in an embodiment of the present invention;
FIG. 7 is a flow chart of a method for multi-intent recognition according to an embodiment of the present invention;
FIG. 8 is a block diagram of a multiple intent recognition apparatus according to an embodiment of the present invention;
FIG. 9 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," and the like in the description and in the claims, and in the above-described drawings, are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The following are detailed descriptions of the respective embodiments.
Referring to fig. 1, an embodiment of the present invention provides a method for training a multi-intent recognition model to solve the problems of the prior art, in which a multi-label classification model for multi-intent recognition is obtained through training. The multi-intention recognition model training method can comprise the following steps: the method comprises an encoding step, a control step and a classification training step.
Wherein the encoding step comprises:
101. obtaining dialogue texts (or called dialogue texts) from a dialogue system as training data, and performing data preprocessing on the training data.
102. And coding each dialog text in the training data to obtain a coding vector.
Optionally, the text is encoded herein by using a neural-network-based deep learning pre-training model, so that the dialog text is vectorized. The coding vector obtained for each dialog text is denoted $V_{Q3}$.
103. A feature vector with context information is computed for each dialog text.
Optionally, a sliding window is used herein to obtain multiple rounds of text including the current dialog text; the coding vectors of all the dialog texts in the sliding window are obtained and their weighted sum $V_H$ is calculated. The weighted sum $V_H$ is the feature vector of the current dialog text with context information related to the context.
The control steps comprise:
104. for each dialog text, it is determined whether context information relevant to the context needs to be introduced.
In particular, a state value (namely, a degree of relevance) can be calculated from the coding vector $V_{Q3}$ of each dialog text and its vector $V_H$ with context information related to the context, and whether context information related to the context needs to be introduced is judged according to whether the state value exceeds a preset value: if it exceeds the preset value, the context information is introduced; otherwise it is not. The state value represents the semantic relevance between $V_{Q3}$ and $V_H$.
The classification training step comprises:
105. and training through a classifier to obtain a multi-label classification model for classifying and identifying the intention of the dialog text.
Specifically, for each dialog text, if the control step judges yes, the feature vector with context information related to the context is introduced, and the coding vector $V_{Q3}$ of the dialog text is combined with its feature vector $V_H$ with context information related to the context and then input into the classifier; if the control step judges no, the coding vector $V_{Q3}$ of the dialog text is input into the classifier directly. Optionally, a convolutional neural network may be used as the classifier. Training through the classifier yields a multi-label classification model for classifying and identifying the intention of the dialog text. The multi-label classification model can be used to perform intention classification on the vector corresponding to an input dialog text and to identify the intention of the current dialog text.
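The routing described above can be sketched as follows. The sketch assumes that the combined vector has already been computed for every dialog text and has the same dimension as the coding vector, so that both branches can feed the same classifier; `build_inputs` and the preset value of 0.5 are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
# One sample: (coding vector V_Q3, combined vector E, state value S).
Sample = Tuple[Vector, Vector, float]


def build_inputs(samples: List[Sample], preset: float = 0.5) -> List[Vector]:
    """Choose, per dialog text, which vector is fed to the classifier."""
    inputs = []
    for v_q, e, s in samples:
        # Control step judges yes: use the combined vector; otherwise use V_Q3 alone.
        inputs.append(list(e) if s > preset else list(v_q))
    return inputs


# Toy values: the first text needs context, the second does not.
samples = [
    ([0.1, 0.2], [0.6, 0.7], 0.9),
    ([0.3, 0.4], [0.5, 0.5], 0.2),
]
inputs = build_inputs(samples)
```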
Fig. 2 is a schematic structural diagram of a multi-intent recognition model training apparatus according to an embodiment of the present invention. Referring to fig. 3, a schematic block diagram of a training apparatus for multi-intent recognition model under a multi-turn dialog system is shown. The device can be divided into three modules, namely an encoding module 21, a control module 22 and a classification training module 23.
An encoding (Encoding) module 21 is configured to encode the dialog texts to obtain a coding vector of each dialog text, and to calculate, for each dialog text, a feature vector with context information related to the context. The encoding module may use an existing engineering technique, namely a deep learning pre-training model such as the BERT model, to map unquantized dialog text into a vector space as real-valued vectors; the encoder adopts an existing deep neural network framework, such as a bidirectional Transformer structure.
A control (Control) module 22 is configured to determine, for each dialog text, whether context information related to the context needs to be introduced. In other words, the module controls whether such context information is introduced for the dialog text currently being predicted.
A classification training (Classifier) module 23 is configured to train a classification model to perform intent classification. The intent recognition problem of a multi-turn dialog should be regarded as a multi-label classification problem. A classifier is trained to obtain a multi-label classification model, which classifies and identifies the intention of the dialog text. When the classifier is trained, for each dialog text, if the control module judges yes, the coding vector of the dialog text and its feature vector with context information related to the context are combined and input into the classifier; otherwise, the coding vector of the dialog text is input into the classifier directly.
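To make the multi-label formulation concrete, the following sketch shows how a dialog text carrying several intentions could be represented as a multi-hot target vector, and how per-intention scores could be turned back into a set of intentions. The intention names, the helper names and the threshold values are all hypothetical and serve only as an example.

```python
from typing import Iterable, List, Sequence

# Hypothetical intention inventory for a customer-service domain.
INTENTS = ["query_route", "report_fault", "complaint"]


def to_multi_hot(active: Iterable[str], intents: Sequence[str] = INTENTS) -> List[int]:
    """Encode the set of intentions of one dialog text as a 0/1 target vector."""
    active_set = set(active)
    return [1 if name in active_set else 0 for name in intents]


def decode(scores: Sequence[float], thresholds: Sequence[float],
           intents: Sequence[str] = INTENTS) -> List[str]:
    """Keep every intention whose score reaches its own threshold."""
    return [name for name, s, t in zip(intents, scores, thresholds) if s >= t]


target = to_multi_hot({"query_route", "complaint"})
predicted = decode([0.9, 0.2, 0.55], [0.5, 0.5, 0.5])
```

Unlike single-label classification, several positions of the target vector may be 1 at once, so each intention is decided independently.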
Next, each step and module will be described in detail with reference to fig. 4 to 6.
1. Coding module
The encoding module numerically processes dialog text data from a specific domain, for example the field of traffic customer service. Vector encoding can be performed by using a deep learning pre-training model such as the BERT (Bidirectional Encoder Representations from Transformers) pre-training model, which maps each round of dialog text into a high-dimensional vector space and thereby avoids the loss of semantic information caused by the mapping in traditional methods. Meanwhile, sentence vectors generated by BERT contain multiple layers of semantic information, which can greatly improve the performance of the classifier. Existing techniques are used here.
As shown in fig. 4, in the encoding process, each dialog text is divided into single words, Token Embedding, Segment Embedding, and Position Embedding are performed respectively, and then the sum is input to BERT and encoded into a vector of a continuous vector space, so as to obtain an encoding vector of each dialog text:
Figure BDA0002676996960000081
Since most users in a dialog system express themselves in short and elliptical sentences, the content of a dialog is difficult to understand within a single round. It is therefore necessary to collect more semantic information from the context of the current dialog in order to understand the intention of the user more accurately. For this purpose, for each dialog text, the invention can use a dynamic sliding window to obtain the dialog information of the rounds before and after the dialog text. The sizes w_before and w_after of the sliding window can be customized (w_before and w_after are non-negative integers). For example, if the two rounds preceding the current round of dialog are used as the preceding context and the one round following it is used as the following context, then w_before is 2 and w_after is 1. It should be noted that each round of dialog text refers to a group of exchanges in which both sides of the dialog interact, and may include two or more dialog texts.
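The window selection can be sketched as follows. For simplicity the sketch treats each list element as one round of dialog and clamps the window at the start and end of the dialog; both choices are assumptions, as is the function name `sliding_window`.

```python
from typing import List


def sliding_window(rounds: List[str], index: int, w_before: int, w_after: int) -> List[str]:
    """Return the rounds from index - w_before to index + w_after, inclusive.

    The window always contains the current round and is clamped at the
    boundaries of the dialog (an assumption; the text does not specify this).
    """
    if w_before < 0 or w_after < 0:
        raise ValueError("w_before and w_after must be non-negative integers")
    if not 0 <= index < len(rounds):
        raise IndexError("index is outside the dialog")
    start = max(0, index - w_before)
    end = min(len(rounds), index + w_after + 1)
    return rounds[start:end]


dialog = ["r0", "r1", "r2", "r3", "r4"]
middle = sliding_window(dialog, index=2, w_before=2, w_after=1)
first = sliding_window(dialog, index=0, w_before=2, w_after=1)
```

With w_before = 2 and w_after = 1, the window for the third round covers four rounds, while the window for the first round is shortened because there is no preceding context.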
After each dialog text in the sliding window is coded by BERT, a coding vector with context information is obtained. The sliding window is recorded with n dialog texts, wherein the code vector of any one dialog text is recorded as Vi=BERT(XC),i=1,2......n。
Some of the historical information is related to the intention of the current conversation, and many of the historical information are unrelated to the intention of the current conversation, so that introduction of indiscriminate information can cause introduction of a lot of noise, and an attention model is required to be introduced for judgment. V of current dialog textQ3V with contextual informationiThe correlation between the two vectors is represented by the inner product of the two vectors, and the finally calculated set of correlation degree values is normalized by softmax to obtain the symbolAnd (4) integrating attention of the probability distribution value interval to distribute probability distribution values. The formula is as follows:
$P_i = \mathrm{softmax}(V_{Q3} \cdot V_i)$
wherein the softmax function is:
$\mathrm{softmax}(x_i) = \dfrac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$
With P_i as the weights, the encoding vectors V_i in the sliding window are weighted and summed, $V_H = \sum_{i=1}^{n} P_i V_i$, to obtain a feature vector with context-related context information, denoted V_H.
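The attention step can be sketched as follows, assuming the encoding vectors are plain lists of floats of equal length. The function names are hypothetical, and subtracting the maximum score inside `softmax` is a standard numerical-stability measure that is not mentioned in the text.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Normalize scores into a probability distribution."""
    peak = max(scores)  # shift by the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_feature(v_q, window_vectors):
    """Return attention weights P_i and the weighted sum V_H of the window vectors."""
    weights = softmax([dot(v_q, v_i) for v_i in window_vectors])
    v_h = [sum(p * v[k] for p, v in zip(weights, window_vectors))
           for k in range(len(v_q))]
    return weights, v_h


v_q = [1.0, 0.0]
window_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, v_h = context_feature(v_q, window_vectors)
print(weights, v_h)
```

In this toy input the first window vector is the most similar to the current dialog vector, so it receives the largest weight and dominates V_H.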
2. Control module
In a multi-turn dialog system, user intent recognition in some cases requires reference to context-related context information, while in other cases the intent can be determined from the current dialog alone. For dialogs whose intent can be determined directly, introducing context-related context information is equivalent to introducing a great deal of noise. The control module therefore decides whether to introduce the context information feature by calculating a state value, as shown in fig. 5. The state value formula is as follows (this step is implemented using existing techniques):
S = sigmoid(W_S * [V_Q3, V_H])
where W_S is an empirical parameter. The sigmoid function is:
$\mathrm{sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$
The sigmoid function generates a value σ between 0 and 1. When σ is close to 0, context-related context information does not need to be referenced; when σ is close to 1, it does. A preset value is determined and the state value S is compared with it: if S exceeds the preset value, it is judged that context-related context information needs to be introduced; otherwise, it is judged that it does not.
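A minimal sketch of the control gate. It assumes that [V_Q3, V_H] denotes concatenation and that W_S is a weight vector of matching length, so that S is a single scalar; the text does not state the shape of W_S. The preset value 0.5 and the function names are also assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def state_value(w_s, v_q, v_h):
    """S = sigmoid(W_S * [V_Q, V_H]), with [ , ] taken as concatenation (assumption)."""
    concat = list(v_q) + list(v_h)
    if len(w_s) != len(concat):
        raise ValueError("W_S must have one weight per concatenated dimension")
    return sigmoid(sum(w * x for w, x in zip(w_s, concat)))


def needs_context(s, preset=0.5):
    """Context information is introduced only when S exceeds the preset value."""
    return s > preset


w_s = [0.5, -0.5, 1.0, 1.0]          # hypothetical weights
s = state_value(w_s, [1.0, 1.0], [2.0, 1.0])
print(s, needs_context(s))
```

Because the comparison is strict, a state value exactly equal to the preset value does not trigger the introduction of context information in this sketch.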
3. Classification training module
Since the intent recognition problem of a multi-turn dialog can itself be viewed as a problem that classifies intent, classification can be implemented in analogy to multi-label classification algorithms. Deep learning has achieved excellent performance in solving the multi-label classification problem. (this step can be accomplished using existing techniques.)
As shown in fig. 6, a classification model is designed on the basis of existing implementations; for example, a Convolutional Neural Network (CNN) can act as the classifier.
If context-related context information needs to be introduced, the dialog vector V_Q3 and the feature vector V_H with context-related context information are combined for classification. First, a sigmoid function is used to combine them into a feature vector E, with the formula:
E = sigmoid(W_E * [V_Q3 + S V_H])
Each piece of dialog data in the training set is processed as described above. Depending on the decision of the control module, if context information is to be introduced, the combined feature vector E is input to the classifier; otherwise, the encoding vector V_Q3 of the dialog text is input to the classifier directly. The classifier is then trained to obtain a multi-label classification model for classifying and recognizing the intention of dialog text.
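The combination and routing can be sketched as below. The sketch assumes that W_E is a square matrix applied to V_Q3 + S·V_H, so that E has the same width as V_Q3 and both branches feed a classifier of fixed input size; the text only calls W_E an empirical parameter. Function names and the preset value are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def combine(w_e, v_q, v_h, s):
    """E = sigmoid(W_E * (V_Q + S * V_H)), applied element-wise after the matrix product."""
    mixed = [q + s * h for q, h in zip(v_q, v_h)]
    return [sigmoid(sum(w * x for w, x in zip(row, mixed))) for row in w_e]


def classifier_input(v_q, v_h, s, w_e, preset=0.5):
    """Route either the combined vector E or the plain encoding vector to the classifier."""
    if s > preset:
        return combine(w_e, v_q, v_h, s)
    return list(v_q)


w_e = [[1.0, 0.0], [0.0, 1.0]]       # hypothetical identity weights
v_q, v_h = [0.0, 1.0], [2.0, -1.0]
with_context = classifier_input(v_q, v_h, 1.0, w_e)
without_context = classifier_input(v_q, v_h, 0.2, w_e)
print(with_context, without_context)
```

Scaling V_H by S before the addition means that, even when context is introduced, its contribution grows with how strongly the control module favors it.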
The training process specifically comprises: performing convolution, max pooling and fully connected operations on the combined feature vector E, and setting the activation function of the classification layer to sigmoid so that each predicted score is limited to the interval (0, 1), thereby obtaining a predicted label vector O whose dimension is the size of the label set.
To evaluate the model, an objective function may be used. Optionally, the objective function may be binary cross-entropy, where n is the total number of samples, x is a sample, y_i is the i-th dimension of the target label vector, and o_i is the i-th dimension of the predicted label vector; the target label vector y and the predicted label vector o are both vectors whose dimension is the size of the label set. The objective function is used to calculate the similarity score between the predicted label vector O and the target label vector Y, with the formula:
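A forward pass through such a classifier might look like the sketch below. It is a toy stand-in, not the patented network: the one-dimensional convolution, the ReLU after the convolution, global max pooling per filter, and all weights are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution followed by ReLU (the ReLU is an assumption)."""
    k = len(kernel)
    return [max(0.0, sum(kernel[j] * x[i + j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]


def predict_labels(e, kernels, fc_weights, fc_bias):
    """Convolution -> global max pooling -> fully connected layer -> sigmoid."""
    pooled = [max(conv1d(e, kernel)) for kernel in kernels]   # one value per filter
    return [sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
            for row, b in zip(fc_weights, fc_bias)]


e = [0.2, 0.9, 0.4, 0.7, 0.1]                        # toy input feature vector
kernels = [[1.0, -1.0], [0.5, 0.5, 0.5]]             # two hypothetical filters
fc_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]   # three labels, two pooled features
fc_bias = [0.0, -1.0, 0.0]
o = predict_labels(e, kernels, fc_weights, fc_bias)
print(o)
```

Using an independent sigmoid per output, rather than a softmax over all outputs, is what allows several intentions to score highly for the same dialog text.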
$L = -\dfrac{1}{n} \sum_{x} \sum_{i} \left[ y_i \log o_i + (1 - y_i) \log (1 - o_i) \right]$
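A minimal sketch of binary cross-entropy in its commonly used form, summed over the label dimensions and averaged over the n samples. Clipping the predictions with `eps` to avoid log(0) is an implementation assumption, as is the function name.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Average over samples of the summed per-label binary cross-entropy."""
    n = len(targets)
    total = 0.0
    for y, o in zip(targets, predictions):
        for y_i, o_i in zip(y, o):
            o_i = min(max(o_i, eps), 1.0 - eps)   # keep log() finite
            total -= y_i * math.log(o_i) + (1 - y_i) * math.log(1.0 - o_i)
    return total / n


uncertain = binary_cross_entropy([[1, 0]], [[0.5, 0.5]])
confident = binary_cross_entropy([[1, 0]], [[0.99, 0.01]])
print(uncertain, confident)
```

The loss shrinks as the predicted label vector approaches the target label vector, which is why it can serve as a measure of how closely the two agree.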
4. Setting of custom thresholds
Each piece of dialog text data in the verification set is tested with the trained multi-label classification model, and a predicted label vector is output. Then, using the objective function described above, the similarity score between the predicted label vector and the target label vector is calculated, thereby obtaining a score between 0 and 1 for each intention in the verification set.
Further, the present invention also provides a scheme for determining an optimal threshold for a score for each intent in a verification set, comprising the steps of:
Step S1: An initial threshold of 0.01 is first set; then one intention in the verification set is selected and its F1 score over the entire verification set is calculated. The F1 score is a statistical measure of the accuracy of a binary classification model that takes both the precision and the recall of the model into account. It can be viewed as the harmonic mean of precision and recall, with a maximum of 1 and a minimum of 0.
Step S2: Judge whether the threshold lies in (0, 1). If so, update the threshold of the intention as threshold = threshold + 0.01, then calculate the F1 score of the intention and compare it with the previous F1 score, recording the maximum F1 score and its corresponding threshold; otherwise, proceed to step S3.
Step S3: Judge whether all intentions have been traversed. If not, move to the next intention and return to step S1, repeating steps S1 and S2; otherwise, proceed to step S4.
Step S4: For each intention, the threshold corresponding to its highest F1 score is finally selected as the optimal threshold for the score of that intention.
Referring to fig. 7, an embodiment of the present invention also provides a multi-intention recognition method. The method uses the multi-label classification model obtained by the training described above to perform multi-intention recognition on dialog text.
The multi-intent recognition method may include: an encoding step, a control step and a classification step.
Wherein the encoding step comprises:
701. Acquire the current dialog text from the dialog system and perform data preprocessing.
702. Encode the current dialog text to obtain an encoding vector.
Optionally, a neural-network-based deep learning pre-trained model is used to encode the text and vectorize the dialog text. Herein, the encoding vector of the current dialog text is denoted V_d.
703. Compute a feature vector with context-related context information for the current dialog text.
Optionally, a sliding window is used to obtain multiple rounds of dialog text around the current dialog text, the encoding vectors of all dialog texts in the sliding window are obtained and summed with weights, and the weighted sum V_h is the feature vector with context-related context information for the current dialog text.
The control steps comprise:
704. Judge whether context-related context information needs to be introduced.
Specifically, a state value can be calculated from the encoding vector V_d of the current dialog text and its feature vector V_h with context-related context information, and whether context-related context information needs to be introduced is judged according to whether the state value exceeds a preset value: if it exceeds the preset value, the context information is introduced; otherwise, it is not.
The classification step comprises:
705. Perform intent classification using the multi-label classification model.
Specifically, if the judgment of the control step is yes, the encoding vector V_d of the current dialog text and its feature vector V_h with context-related context information are combined and input to the multi-label classification model; if the judgment is no, the encoding vector V_d of the dialog text is input to the multi-label classification model directly. In this way the intention of the current dialog text is classified and recognized. The multi-label classification model is obtained by the multi-intention recognition model training method described above.
Optionally, in the encoding step, the weighted summation may include: denoting the encoding vector of each dialog text in the sliding window as V_j, and calculating the attention-distribution probability values according to the formula
$P_j = \mathrm{softmax}(V_d \cdot V_j)$
With P_j as the weights, the encoding vectors V_j in the sliding window are weighted and summed to obtain the feature vector V_h with context-related context information.
Optionally, in the control step, the state value may be calculated from the encoding vector V_d of the current dialog text and its feature vector V_h with context-related context information, with the formula S = sigmoid(W_S * [V_d, V_h]), where W_S is an empirical parameter; whether context-related context information needs to be introduced is judged according to whether the state value S exceeds a preset value.
Optionally, in the classification step, combining the encoding vector of the current dialog text with its feature vector with context-related context information and inputting the combination into the multi-label classification model may include: combining the encoding vector V_d of the dialog text and its feature vector V_h with context-related context information by using the sigmoid function, E = sigmoid(W_E * [V_d + S V_h]), where W_E is an empirical parameter, and inputting the feature vector E obtained by the combination into the multi-label classification model.
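Steps 702 to 705 can be tied together in one sketch. The encoder and the classifier are passed in as callables, and the two `toy_` functions are hypothetical stand-ins so that the example runs without a real pre-trained model. Treating W_S as a vector over the concatenation, treating W_E as a square matrix, and applying per-intention thresholds to the classifier scores are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recognize_intents(turns, index, encode, classify, w_s, w_e, thresholds,
                      w_before=2, w_after=1, preset=0.5):
    """Run steps 702-705 for turns[index]; return (intent ids, state value S)."""
    v_d = encode(turns[index])                                   # 702: encode
    window = turns[max(0, index - w_before): index + w_after + 1]
    vectors = [encode(t) for t in window]                        # 703: context
    weights = softmax([dot(v_d, v_j) for v_j in vectors])
    v_h = [sum(p * v[k] for p, v in zip(weights, vectors))
           for k in range(len(v_d))]
    s = sigmoid(dot(w_s, list(v_d) + list(v_h)))                 # 704: control
    if s > preset:                                               # 705: classify
        mixed = [d + s * h for d, h in zip(v_d, v_h)]
        features = [sigmoid(dot(row, mixed)) for row in w_e]
    else:
        features = list(v_d)
    scores = classify(features)
    intents = [k for k, (o, t) in enumerate(zip(scores, thresholds)) if o >= t]
    return intents, s


def toy_encode(text):
    """Hypothetical encoder: word count and question-mark count as a 2-D vector."""
    return [len(text.split()) / 10.0, float(text.count("?"))]


def toy_classify(features):
    """Hypothetical two-label classifier returning scores in (0, 1)."""
    return [sigmoid(features[0] - features[1]), sigmoid(features[1] - features[0])]


turns = ["hello", "was my card charged twice?", "and how do I pay the fine"]
intents, s = recognize_intents(turns, 1, toy_encode, toy_classify,
                               w_s=[0.5, 0.5, 0.5, 0.5],
                               w_e=[[1.0, 0.0], [0.0, 1.0]],
                               thresholds=[0.5, 0.5])
print(intents, s)
```

Passing the encoder and classifier as arguments keeps the control logic independent of any particular pre-trained model or network architecture.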
Referring to fig. 8, an embodiment of the present invention further provides a multi-intent recognition apparatus, including:
the encoding module 81 is configured to perform vector encoding on the current dialog text, and calculate a feature vector of the current dialog text with context information related to context;
a control module 82 for determining whether context information relating to context needs to be introduced;
and a classification module 83, configured to, if the judgment of the control module is yes, combine the encoding vector of the current dialog text with its feature vector with context-related context information and input the combination into the multi-label classification model, so as to classify and recognize the intention of the current dialog text.
The multi-intent recognition model training method, the multi-intent recognition method and the related device provided by the embodiment of the invention are explained above. According to the technical scheme, the embodiment of the invention has the following advantages:
1. First, the deep-learning-based method transfers more readily to other domains. For intention recognition tasks in different fields, only the labeled intention data of the field needs to be replaced and the model retrained. Compared with the traditional rule-template-based method, there is no need to consider how standardized the user's expressions are, the cost of manually writing templates is saved, and the method is easier to extend.
2. Compared with traditional intention recognition methods, the deep-learning-based method achieves more accurate recognition.
3. Compared with methods that perform intention recognition using only the current text, the invention uses a sliding window with a controllable text range to obtain context-related context information within a certain range, which improves the performance of the classification algorithm while reducing the larger noise that would result from introducing context information from the full text.
Referring to fig. 9, an embodiment of the present invention further provides a computer device 90, which includes a processor 91 and a memory 92, where the memory 92 stores a program, and the program includes computer-executable instructions, and when the computer device 90 runs, the processor 91 executes the computer-executable instructions stored in the memory 92, so as to make the computer device 90 execute the multi-intent recognition model training method or the multi-intent recognition method as described above.
An embodiment of the present invention also provides a computer readable storage medium storing one or more programs, the one or more programs comprising computer executable instructions, which when executed by a computer device, cause the computer device to perform a multi-intent recognition model training method or a multi-intent recognition method as described above.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; those of ordinary skill in the art will understand that: the technical solutions described in the above embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-intention recognition model training method is characterized by comprising the following steps:
an encoding step: encoding the training data to obtain an encoding vector of each dialog text in the training data, and calculating, for each dialog text, a feature vector with context-related context information;
a control step: judging, for each dialog text, whether context-related context information needs to be introduced;
a classification training step: for each dialog text, if the judgment of the control step is yes, combining the encoding vector of the dialog text with its feature vector with context-related context information and inputting the combination into a classifier; otherwise, inputting the encoding vector of the dialog text into the classifier directly; and training the classifier to obtain a multi-label classification model for classifying and recognizing the intention of dialog text.
2. The method of claim 1,
the encoding step specifically includes: encoding the training data with a deep learning pre-trained model to obtain the encoding vector V_Q3 of each dialog text in the training data; and, for each dialog text, acquiring multiple rounds of dialog text including that dialog text by using a sliding window, and weighting and summing the encoding vectors of all dialog texts in the sliding window to obtain the feature vector with context-related context information of the dialog text, denoted V_H.
3. The method of claim 2,
the control step specifically comprises: for each dialog text, calculating a state value from the encoding vector V_Q3 of the dialog text and its feature vector V_H with context-related context information, with the formula S = sigmoid(W_S * [V_Q3, V_H]), where W_S is an empirical parameter; and judging whether context-related context information needs to be introduced according to whether the state value S exceeds a preset value.
4. The method according to claim 2, wherein in the classification training step, the combining the coded vector of the dialog text and the feature vector with context information thereof is input to a classifier, comprising:
combining the encoding vector V_Q3 of the dialog text and its feature vector V_H with context-related context information by using the sigmoid function, E = sigmoid(W_E * [V_Q3 + S V_H]), where W_E is an empirical parameter, and inputting the feature vector E obtained by the combination into the classifier.
5. The method according to any of claims 1-4, further comprising the step of obtaining a score for each intent in the verification set, the step comprising in particular:
testing each dialog text in the verification set by using the multi-label classification model obtained by training, outputting a predicted label vector, and calculating the similarity score between the predicted label vector and the target label vector by using a target function, thereby obtaining the score of each intention in the verification set, wherein the score is between 0 and 1.
6. The method of claim 5, further comprising the step of determining an optimal threshold for the score of each intent in the verification set, the step comprising in particular:
S1: setting an initial threshold of 0.01, selecting an intention in the verification set and calculating the F1 score of the intention over the whole verification set;
S2: judging whether the threshold lies in (0, 1); if so, updating the threshold of the intention as threshold = threshold + 0.01, then calculating the F1 score of the intention and comparing it with the previous F1 score, and recording the maximum F1 score and its corresponding threshold; if not, proceeding to step S3;
S3: judging whether all intentions have been traversed; if not, moving to the next intention and returning to step S1; otherwise, proceeding to step S4;
S4: finally, for each intention, selecting the threshold corresponding to its highest F1 score as the optimal threshold for the score of that intention.
7. A multi-intent recognition method, comprising:
and (3) encoding: vector coding is carried out on the current dialog text, and a feature vector with context information related to context of the current dialog text is calculated;
the control steps are as follows: judging whether context information related to the context needs to be introduced or not;
and (3) classification step: if the judgment of the control step is yes, combining the coding vector of the current dialog text and the characteristic vector with context information related to the context information, inputting the combination into a multi-label classification model, and classifying and identifying the intention of the current dialog text.
8. The method of claim 7,
the encoding step specifically includes: encoding the current dialog text with a deep learning pre-trained model, and denoting the encoding vector of the current dialog text as V_d; and acquiring multiple rounds of dialog text around the current dialog text by using a sliding window, and weighting and summing the encoding vectors of all dialog texts in the sliding window to obtain the feature vector with context-related context information of the current dialog text, denoted V_h;
the control step specifically includes: calculating a state value from the encoding vector V_d of the current dialog text and its feature vector V_h with context-related context information, with the formula S = sigmoid(W_S * [V_d, V_h]), where W_S is an empirical parameter; and judging whether context-related context information needs to be introduced according to whether the state value S exceeds a preset value;
in the classification step, combining the encoding vector of the current dialog text with its feature vector with context-related context information and inputting the combination into the multi-label classification model includes: combining the encoding vector V_d of the dialog text and its feature vector V_h with context-related context information by using the sigmoid function, E = sigmoid(W_E * [V_d + S V_h]), where W_E is an empirical parameter, and inputting the feature vector E obtained by the combination into the multi-label classification model.
9. A multi-intent recognition model training device, comprising:
the encoding module is used for encoding the training data to obtain an encoding vector of each dialog text in the training data and calculating a feature vector with context information related to context of each dialog text;
the control module is used for judging whether context information related to context needs to be introduced or not aiming at each dialog text;
the classification training module is used for, for each dialog text, combining the encoding vector of the dialog text with its feature vector with context-related context information and inputting the combination into the classifier if the judgment of the control module is yes; otherwise, inputting the encoding vector of the dialog text into the classifier directly; and training the classifier to obtain a multi-label classification model for classifying and recognizing the intention of dialog text.
10. A multiple intent recognition apparatus, comprising:
the encoding module is used for carrying out vector encoding on the current dialog text and calculating a feature vector with context information related to context of the current dialog text;
the control module is used for judging whether context information related to context needs to be introduced or not;
and the classification module is used for, if the judgment of the control module is yes, combining the encoding vector of the current dialog text with its feature vector with context-related context information and inputting the combination into the multi-label classification model, so as to classify and recognize the intention of the current dialog text.
CN202010951226.5A 2020-09-11 2020-09-11 Multi-intention recognition model training method, multi-intention recognition method and related device Pending CN111984780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010951226.5A CN111984780A (en) 2020-09-11 2020-09-11 Multi-intention recognition model training method, multi-intention recognition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010951226.5A CN111984780A (en) 2020-09-11 2020-09-11 Multi-intention recognition model training method, multi-intention recognition method and related device

Publications (1)

Publication Number Publication Date
CN111984780A true CN111984780A (en) 2020-11-24

Family

ID=73451041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010951226.5A Pending CN111984780A (en) 2020-09-11 2020-09-11 Multi-intention recognition model training method, multi-intention recognition method and related device

Country Status (1)

Country Link
CN (1) CN111984780A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108829667A (en) * 2018-05-28 2018-11-16 南京柯基数据科技有限公司 It is a kind of based on memory network more wheels dialogue under intension recognizing method
CN111027069A (en) * 2019-11-29 2020-04-17 暨南大学 Malicious software family detection method, storage medium and computing device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541079A (en) * 2020-12-10 2021-03-23 杭州远传新业科技有限公司 Multi-intention recognition method, device, equipment and medium
CN112765332A (en) * 2021-01-05 2021-05-07 西交思创智能科技研究院(西安)有限公司 Intelligent dialog intention recognition method, system, storage medium and application
CN113326373A (en) * 2021-05-19 2021-08-31 武汉大学 WeChat group chat record identification method and system fusing session scene information
CN113326373B (en) * 2021-05-19 2022-08-05 武汉大学 WeChat group chat record identification method and system fusing session scene information
CN113254617A (en) * 2021-06-11 2021-08-13 成都晓多科技有限公司 Message intention identification method and system based on pre-training language model and encoder
CN113254617B (en) * 2021-06-11 2021-10-22 成都晓多科技有限公司 Message intention identification method and system based on pre-training language model and encoder
CN113850078A (en) * 2021-09-29 2021-12-28 平安科技(深圳)有限公司 Multi-intention identification method and device based on machine learning and readable storage medium
CN114817501A (en) * 2022-04-27 2022-07-29 马上消费金融股份有限公司 Data processing method, data processing device, electronic equipment and storage medium
CN118037362A (en) * 2024-04-12 2024-05-14 中国传媒大学 Sequence recommendation method and system based on user multi-intention comparison

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination