CN115238050A - Intelligent dialogue method and device based on text matching and intention recognition fusion processing - Google Patents

Intelligent dialogue method and device based on text matching and intention recognition fusion processing

Info

Publication number
CN115238050A
CN115238050A
Authority
CN
China
Prior art keywords
text
model
intention recognition
intention
text matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210734681.9A
Other languages
Chinese (zh)
Inventor
李卓群
陶焜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aiyisheng Technology Co ltd
Original Assignee
Beijing Aiyisheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aiyisheng Technology Co ltd filed Critical Beijing Aiyisheng Technology Co ltd
Priority to CN202210734681.9A priority Critical patent/CN115238050A/en
Publication of CN115238050A publication Critical patent/CN115238050A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/3331: Query processing
    • G06F 16/334: Query execution
    • G06F 16/3344: Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an intelligent dialogue method and device based on the fused processing of text matching and intention recognition. The method comprises the following steps: setting at least one intention recognition category and acquiring the answer text for the current question; if the current question is a fill-in-the-blank question, performing intention recognition with an intention recognition model and entering the processing logic of the corresponding intention; if the current question is a multiple-choice question, applying the intention recognition model and a text matching model simultaneously in fusion processing, selecting a model from the fused output result as the adapted model, and making the decision with the adapted model. By fusing the text matching model and the intention recognition model, the invention improves semantic understanding accuracy during question answering and greatly improves the intelligence and service experience of the man-machine dialogue system.

Description

Intelligent dialogue method and device based on text matching and intention recognition fusion processing
Technical Field
The invention relates to natural language processing, and in particular to an intelligent dialogue method and device based on the fusion of text matching and intention recognition. It aims to achieve more intelligent semantic understanding and dialogue-logic control in task-oriented man-machine dialogue systems through a fused model of text matching and intention recognition, and is applicable to various robot return-visit dialogue tasks.
Background
Text matching is one of the most basic tasks in the NLP field and is widely applied in tasks such as information retrieval and question answering systems. Traditional text matching methods include judging similarity from vocabulary overlap (e.g., edit distance) and learning relations over manually defined features; such methods have limited representational capability and struggle to exploit the truly informative features in text. Deep learning is now widely applied to the text matching task: compared with traditional methods, a deep text matching model can automatically extract effective features from text by learning from massive data and perform matching through similarity computations of various structures. Deep text matching far outperforms traditional methods in performance.
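The edit-distance similarity mentioned above can be illustrated with a minimal sketch. This is a textbook Levenshtein-distance implementation in plain Python, not code from the patent:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# e.g. edit_distance("kitten", "sitting") yields 3
```

A lower distance is taken as higher lexical similarity; as the background notes, this captures surface overlap only, not meaning.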
Intention recognition is a classification task that maps a sentence to one of a set of preset intentions. Traditional intention recognition methods rely mainly on templates and statistical features and cannot handle irregular text; current deep learning methods extract text representations with, for example, convolutional networks, and achieve much higher performance.
Return visits and questionnaire surveys of target groups are common requirements in many industry scenarios, such as satisfaction-survey return visits in the service industry and hospitals' postoperative rehabilitation follow-ups with patients. Using an AI robot to place calls and conduct follow-ups through man-machine dialogue offers a good balance of user experience, return-visit success rate, and service cost. A robot return-visit dialogue is driven by a preset questionnaire whose questions are generally divided into multiple-choice questions and fill-in-the-blank questions; mapping the user's answer to the correct option of a multiple-choice question is a semantic matching problem. At the same time, users often do not answer according to the preset questions: for example, when the user suddenly says "I'm busy right now, let's talk later" or "I didn't hear that, please say it again", the robot must recognize the intention the user expressed and enter the corresponding processing logic. Since we cannot predict in advance whether each user utterance is a response to a preset option or the expression of a specific intention, semantic understanding in a robot return-visit dialogue cannot simply be cast as a text matching task. Fusing text matching and intention recognition may improve processing accuracy, yet to date there has been no related research.
Disclosure of Invention
In order to solve the above problems, the invention discloses an intelligent dialogue method based on the fusion of text matching and intention recognition, comprising the following steps:
step S1, setting at least one intention recognition category, and acquiring the answer text for the current question;
step S2, if the current question is a fill-in-the-blank question, performing intention recognition using the intention recognition model and outputting the intention recognition result of the answer text; if the intention recognition result does not belong to any intention recognition category, storing and recording the answer text; if the intention recognition result belongs to one intention recognition category, entering the processing logic of the corresponding intention;
if the current question is a multiple-choice question, performing fusion processing with the intention recognition model and the text matching model simultaneously, selecting a model from the fused output result as the adapted model, and using the adapted model to output either the text matching similarity or the intention recognition result of the answer text; according to the output text matching similarity, determining whether the current question has been answered and hence whether to proceed to the next question; or, according to the output intention recognition result, entering the processing logic of the corresponding intention if the result belongs to any intention recognition category.
Optionally, the text matching model includes a first embedded word vector layer, a first Bi-LSTM module, a second Bi-LSTM module, a first pooling layer, a first multilayer perceptron and a first softmax layer connected in sequence, with a second pooling layer and a comparison module further connected in sequence on a branch from the output of the first Bi-LSTM module. The text matching model is trained as follows:
step S211, obtaining labeled answer texts and a preset option text library, wherein the preset option text library comprises matched texts and unmatched texts for the question's answers; the answer text and a matched text form a first text pair, and the answer text and an unmatched text form a second text pair; word vector mapping is performed on the answer text, the matched text and the unmatched text using the pre-trained first embedded word vector layer to obtain the corresponding embedded word vectors;
step S212, passing the embedded word vectors of the answer text, the matched text and the unmatched text through the first Bi-LSTM module to obtain the corresponding first-layer hidden word vectors;
step S213, inputting the first-layer hidden word vectors into the second pooling layer to obtain sentence representations; computing, with the comparison module, the vector similarity of the sentence representations of the first text pair as the positive-pair matching score, and the vector similarity of the sentence representations of the second text pair as the negative-pair matching score; and pre-optimizing the text matching model by gradient backpropagation of a contrastive loss function that increases the positive-pair matching score and decreases the negative-pair matching score;
step S214, performing inter-text interaction on the first-layer hidden word vectors within the first text pair and the second text pair to obtain second-layer hidden word vectors;
step S215, inputting the second-layer hidden word vectors into the second Bi-LSTM module to obtain third-layer hidden word vectors; inputting these into the first pooling layer and applying max pooling to obtain sentence representations; computing, through the first multilayer perceptron, the vector similarities of the sentence representations of the first and second text pairs; inputting them into the first softmax layer to obtain the probability that the answer text belongs to the matched text, together with the matching loss; and reducing the matching loss through continuous iteration to obtain the trained text matching model.
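The contrastive pre-optimization of step S213 can be sketched numerically. The patent does not state the exact form of the contrastive loss; the margin form below, over cosine similarity of toy sentence vectors, is one common assumption:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sentence representations."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def contrastive_loss(answer_vec, match_vec, nonmatch_vec, margin=0.5):
    """Margin loss (assumed form): push the answer/matched-text similarity
    above the answer/unmatched-text similarity by at least `margin`."""
    pos = cosine(answer_vec, match_vec)      # positive-pair matching score
    neg = cosine(answer_vec, nonmatch_vec)   # negative-pair matching score
    return max(0.0, margin - pos + neg)

# A well-separated pair incurs zero loss; minimizing this loss therefore
# raises the positive score and lowers the negative score, as in S213.
loss = contrastive_loss([1, 0], [1, 0.1], [0, 1])
```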
Optionally, the intention recognition model includes a second embedded word vector layer, a third Bi-LSTM module, a third pooling layer, a second multilayer perceptron and a second softmax layer. The intention recognition model is trained as follows:
step S221, inputting the label texts in the intention label library and the answer text into the second embedded word vector layer, the third Bi-LSTM module and the third pooling layer in sequence to obtain the sentence representations of the answer text and the label texts;
step S222, inputting the sentence representation of the answer text into the second multilayer perceptron and the second softmax layer in sequence to obtain the classification result and the classification loss, and computing a similarity loss from the sentence representations of the label text and of the answer text;
step S223, obtaining the trained intention recognition model by continuously iterating and backpropagating the classification loss and the similarity loss.
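The joint objective of steps S222 and S223 (classification loss plus similarity loss) can be sketched as follows. The squared-distance form of the similarity loss and the weighting factor `alpha` are illustrative assumptions, since the patent does not specify them:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, gold):
    """Classification loss: negative log-probability of the gold intent."""
    return -math.log(softmax(logits)[gold])

def similarity_loss(answer_repr, label_repr):
    """Assumed similarity loss: squared distance between the answer's
    sentence representation and that of its intent's label text."""
    return sum((a - b) ** 2 for a, b in zip(answer_repr, label_repr))

def joint_loss(logits, gold, answer_repr, label_repr, alpha=1.0):
    # Both losses are backpropagated together, as in step S223.
    return cross_entropy(logits, gold) + alpha * similarity_loss(answer_repr, label_repr)
```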
Optionally, the performing of fusion processing with the intention recognition model and the text matching model simultaneously, and the selecting of a model from the fused output result as the adapted model, include:
step S21, initializing the scores of the intention recognition model and the text matching model;
step S22, scoring the intention recognition model and the text matching model with the following three methods and selecting the model with the higher score as the adapted model:
method A: obtaining the text matching confidence of the text matching model's output and the intention recognition confidence of the intention recognition model's output, comparing each with its confidence threshold, and outputting the corresponding scores of the two models, wherein if the text matching confidence is greater than the text matching confidence threshold and the intention recognition confidence is less than the intention recognition confidence threshold, the text matching model receives a score sa; otherwise the intention recognition model receives a score sb;
method B: using a third multilayer perceptron, take the sentence representation of the answer text output by the first embedded word vector layer, first Bi-LSTM module and second pooling layer of the text matching model, and the sentence representation output by the second embedded word vector layer, third Bi-LSTM module and third pooling layer of the intention recognition model; compute, for each, a text representation change vector against the sentence representation obtained from a third word vector embedding of the answer text; concatenate the two change vectors into a new feature vector; perform binary classification on this feature vector as to whether the text matching model or the intention recognition model applies; and output the probabilities of applying the text matching model and the intention recognition model;
method C: concatenate the sentence representations of the text matching model and the intention recognition model into a joint representation, use a fourth multilayer perceptron as the adaptation model to perform binary classification between the text matching model and the intention recognition model, and output the assumed prior probabilities that each model applies;
the assumed prior probabilities of method C and the outputs of methods A and B are combined by weighted summation to obtain the final score of the text matching model and the final score of the intention recognition model.
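The weighted summation over the three methods' outputs might look like the following sketch. The equal default weights and the tuple interface are assumptions, since the claim states only that a weighted sum is used:

```python
def fuse_scores(score_a, score_b, prior_c, weights=(1.0, 1.0, 1.0)):
    """Combine per-method (text_match, intent) score pairs by weighted sum.

    score_a, score_b: (text_match, intent) scores from methods A and B
    prior_c:          assumed prior probabilities from method C
    Returns the selected model name and its final score.
    """
    wa, wb, wc = weights
    match = wa * score_a[0] + wb * score_b[0] + wc * prior_c[0]
    intent = wa * score_a[1] + wb * score_b[1] + wc * prior_c[1]
    # The model with the higher final score becomes the adapted model.
    return ("text_matching", match) if match >= intent else ("intent_recognition", intent)
```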
Optionally, in method C, the joint representation and the question text are concatenated and input into the corresponding fifth and sixth multilayer perceptrons to output two vectors; the two vectors are weighted and fused using the assumed prior probabilities; the fused vector is classified to judge whether the text and the question form a question-answer relationship; and the assumed prior probabilities are updated through gradient backpropagation and continuous iteration.
Optionally, inter-text interaction means: for any word vector h in text A, compute the cosine similarity between h and every word vector in text B and normalize the similarities as weights; sum all word vectors of text B under these weights; concatenate the resulting vector to h and project back to the original dimension with an MLP, yielding a new h as the new representation of that word. Applying this operation to every word vector in text A and in text B completes the inter-text interaction.
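The inter-text interaction described above (cosine similarities normalized into weights, a weighted sum over the other text's word vectors, then concatenation) can be sketched in plain Python. The final MLP projection back to the original dimension is omitted here, and softmax is one assumed choice of normalization:

```python
import math

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(a * a for a in v)) or 1.0
    return dot / (nu * nv)

def _softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def align(word_vec, other_text):
    """Soft-align one word vector h of text A against all word vectors of
    text B: cosine similarities -> normalized weights -> weighted sum."""
    weights = _softmax([_cosine(word_vec, w) for w in other_text])
    dim = len(word_vec)
    aligned = [sum(w * vec[d] for w, vec in zip(weights, other_text))
               for d in range(dim)]
    # The patent then projects [h; aligned] back to the original dimension
    # with an MLP; that projection is omitted in this sketch.
    return word_vec + aligned  # concatenation, length 2 * dim
```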
Optionally, when training the text matching model, the median of all similarity scores observed during training is taken as the text matching confidence threshold; and when training the intention recognition model, the median of all probabilities of the predicted intention recognition categories during training is taken as the intention recognition confidence threshold;
wherein sa is the similarity score output by the text matching model multiplied by 0.1, and sb is the predicted intention recognition class probability output by the intention recognition model multiplied by 0.1.
Optionally, the label texts in the intention label library adopt heuristic text labels: by studying the training set data, a heuristic template is derived for each category of unmatched text, the template containing the keywords shared by most data of that category.
The invention also provides an intelligent dialogue device based on the fusion of text matching and intention recognition, comprising:
a question acquisition module for setting at least one intention recognition category and acquiring the answer text for the current question;
a question judging module for, if the current question is a fill-in-the-blank question, performing intention recognition with the intention recognition model and outputting the intention recognition result of the answer text; if the intention recognition result does not belong to any intention recognition category, storing and recording the answer text; if it belongs to one intention recognition category, entering the processing logic of the corresponding intention;
and for, if the current question is a multiple-choice question, performing fusion processing with the intention recognition model and the text matching model simultaneously, selecting a model from the fused output result as the adapted model, using the adapted model to output the text matching similarity or the intention recognition result of the answer text, and determining from the output text matching similarity whether the current question has been answered and hence whether to proceed to the next question, or determining from the output intention recognition result whether it belongs to any intention recognition category and entering the processing logic of the corresponding intention,
wherein the text matching model comprises a first embedded word vector layer, a first Bi-LSTM module, a second Bi-LSTM module, a first pooling layer, a first multilayer perceptron and a first softmax layer connected in sequence, with a second pooling layer and a comparison module further connected in sequence on a branch from the output of the first Bi-LSTM module;
and wherein the intention recognition model includes a second embedded word vector layer, a third Bi-LSTM module, a third pooling layer, a second multilayer perceptron, and a second softmax layer.
Optionally, the performing of fusion processing with the intention recognition model and the text matching model simultaneously, and the selecting of a model from the fused output result as the adapted model, include:
step S21, initializing the scores of the intention recognition model and the text matching model;
step S22, scoring the intention recognition model and the text matching model with the following three methods and selecting the model with the higher score as the adapted model:
method A: obtaining the text matching confidence of the text matching model's output and the intention recognition confidence of the intention recognition model's output, comparing each with its confidence threshold, and outputting the corresponding score increments of the two models, wherein if the text matching confidence is greater than the text matching confidence threshold and the intention recognition confidence is less than the intention recognition confidence threshold, the text matching model's score is increased by sa; otherwise the intention recognition model's score is increased by sb;
method B: using a third multilayer perceptron, take the sentence representation of the answer text output by the first embedded word vector layer, first Bi-LSTM module and second pooling layer of the text matching model, and the sentence representation output by the second embedded word vector layer, third Bi-LSTM module and third pooling layer of the intention recognition model; compute, for each, a text representation change vector against the sentence representation obtained from a third word vector embedding of the answer text; concatenate the two change vectors into a new feature vector; perform binary classification on this feature vector as to whether the text matching model or the intention recognition model applies; and output the probabilities of applying the text matching model and the intention recognition model;
method C: concatenate the sentence representations of the text matching model and the intention recognition model into a joint representation, use a fourth multilayer perceptron as the adaptation model to perform binary classification between the text matching model and the intention recognition model, and output the assumed prior probabilities that each model applies;
the assumed prior probabilities of method C and the outputs of methods A and B are combined by weighted summation to obtain the final score of the text matching model and the final score of the intention recognition model.
The invention solves the problem of fusing text matching and intention recognition in the robot's intelligent return-visit process. For text matching and intention recognition, models with better performance are trained using fine-grained interaction, heuristic methods, and the like. Finally, the text matching model and the intention recognition model are fused to improve semantic understanding accuracy during question answering and greatly improve the intelligence and service experience of the man-machine dialogue system.
Drawings
FIG. 1 is a diagram of the text matching model framework according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the intent recognition model framework of an embodiment of the present invention;
FIG. 3 is a schematic diagram of the fusion model framework according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The intelligent dialogue method based on the text matching and the intention recognition fusion processing comprises the following steps:
step S1, setting an intention identification category and acquiring an answer text for a proposed question.
An intention recognition category is a category against which the answer text is to be judged. In actual data, most answer texts are relevant to the question and only a very small portion are abnormal replies; for example, the intention recognition categories may be "needs human customer service", "contact only on rest days", "contact only on working days", and the like.
Step S2, if the question is a fill-in-the-blank question, i.e. it has no preset reply options, the intention recognition model is used for intention recognition and outputs the intention of the answer text. If the result for every intention recognition category is negative (i.e. the intention of the answer text belongs to no intention recognition category), the answer is recorded directly; if the intention recognition result for some category is positive (i.e. the intention of the answer text belongs to that category), the processing logic of the corresponding intention is entered.
If the question is a multiple-choice question, i.e. it has preset options, the intention recognition model and the text matching model are applied simultaneously in fusion processing, a model is selected from the fused output result as the adapted model, and the adapted model outputs the text matching similarity or the intention of the answer text.
As for fill-in-the-blank versus multiple-choice questions: for example, "May I ask whether you have used other products of this brand?" has no preset options and is a fill-in-the-blank question, with answer texts typically "yes, I have", "no", and the like. "May I ask whether you have used other products of this brand? A means yes, B means no" has preset options and is a multiple-choice question; the answer text should generally be A or B.
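The routing between fill-in-the-blank and multiple-choice handling described in steps S1 and S2 can be sketched as follows. The dictionary question format and the model call signatures are illustrative stand-ins, not the patent's interface:

```python
def handle_answer(question, answer_text, intent_model, fusion_model):
    """Route an answer according to question type (sketch of step S2).

    question:     dict with an "options" key (None for fill-in-the-blank)
    intent_model: callable returning a matched intent category or None
    fusion_model: callable running the fused text-matching/intent pipeline
    """
    if question.get("options"):
        # Multiple-choice: fuse the intent and text matching models.
        return fusion_model(question, answer_text)
    # Fill-in-the-blank: intent recognition only.
    intent = intent_model(answer_text)
    if intent is None:                      # no intent category matched
        return ("record_answer", answer_text)
    return ("intent_logic", intent)         # enter that intent's logic
```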
The fusion processing with the intention recognition model and the text matching model, and the selection of a model from the fused output result as the adapted model, comprise the following steps:
In step S21, the scores of the intention recognition model and the text matching model are initialized, for example each to an initial score of 0.
Step S22, the intention recognition model and the text matching model are scored with the following three methods, and the model with the higher score is selected as the adapted model:
the method A includes the steps that text matching reliability of a text matching model output result and intention recognition reliability of an intention recognition model output result are obtained and compared with corresponding reliability threshold values, if the text matching reliability is larger than the text matching reliability threshold value and the intention recognition reliability is smaller than the intention recognition reliability threshold value, it needs to be stated that each category is intended to be recognized, and the reliability of all the categories is required to be smaller than the corresponding threshold values. The text matching model score plus a certain value sa, otherwise the intent recognition model score plus a certain value sb.
The text matching confidence threshold and the intention recognition confidence threshold are determined as follows: during model training, the confidence of every example in the training set is recorded. The text matching confidence is the similarity score in the text matching task, and the median of all similarity scores observed during training is taken as the text matching confidence threshold. The intention recognition confidence is the probability of the predicted category in the intention recognition task, and the median of all predicted-category probabilities observed during training is taken as the intention recognition confidence threshold.
Wherein sa is the similarity score output by the text matching model multiplied by 0.1, and sb is the predicted-category probability output by the intention recognition model multiplied by 0.1. The predicted category refers to the category the training data actually describes.
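Method A's threshold-and-score rule, together with the median-based thresholds of the previous paragraph and the 0.1 scaling of sa and sb, can be sketched as follows (the tuple return format is an assumption):

```python
from statistics import median

def method_a_score(match_conf, intent_confs, match_threshold, intent_threshold):
    """Method A: credit the text matching model when its confidence exceeds
    its threshold while every intent category's confidence falls below the
    intent threshold; otherwise credit the intention recognition model.
    Returns (sa, sb) score increments."""
    if match_conf > match_threshold and all(c < intent_threshold for c in intent_confs):
        return 0.1 * match_conf, 0.0   # sa: scaled similarity score
    best = max(intent_confs)
    return 0.0, 0.1 * best             # sb: scaled predicted-class probability

# Thresholds are the medians of per-example training confidences:
match_threshold = median([0.42, 0.67, 0.81, 0.55])
```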
Method B: using the third multilayer perceptron, take the sentence representation of the answer text output by the first embedded word vector layer, first Bi-LSTM module and second pooling layer of the text matching model, and the sentence representation output by the second embedded word vector layer, third Bi-LSTM module and third pooling layer of the intention recognition model. For each, compute a text representation change vector against the original sentence representation obtained from a third word vector embedding of the answer text: one change vector between the text matching model's sentence representation and the original representation, and one between the intention recognition model's sentence representation and the original representation. Concatenate the two change vectors into a new feature vector, perform binary classification on it as to whether the text matching model or the intention recognition model applies, and add the output probabilities to the scores of the text matching model and the intention recognition model respectively.
Method C: directly concatenate the sentence representations of the text matching model and the intention recognition model into a joint representation, use the fourth multilayer perceptron to perform binary classification as to whether the adapted model should be the text matching model or the intention recognition model, and take the classification result as the assumed prior probability.
The assumed prior probabilities of method C and the results of methods A and B are combined by weighted summation to obtain the final score of the text matching model and the final score of the intention recognition model; whether the answer text suits the text matching model or the intention recognition model is then determined from the final scores.
Preferably, in method C, the joint representation and the question text are concatenated and input into the corresponding fifth and sixth multilayer perceptrons to output two vectors, which are weighted and fused using the assumed prior probabilities. The fused vector is then classified in a binary fashion to judge whether the text and the question form a question-answer relationship; the assumed prior probabilities become progressively more accurate through gradient backpropagation and iteration.
For the binary classification judging whether the text and the question form a question-answer relationship, the result can specifically be set to 0 (does not form a question-answer relationship with the question) and 1 (forms a question-answer relationship). Because the original task has no such "question-answer relationship" binary classification task, which is newly constructed here, there is no labeled data; the label of the answer text, which can only be [abnormal intention] or [matched item], is therefore reused as the class label of the newly created binary classification task.
And S3, determining whether the current question has been answered according to the text matching similarity output by the text matching model, so as to enter the next question, or performing corresponding processing according to the intention category identified by the intention recognition model.
The following describes the structure and training method of the text matching model and the intention recognition model.
Wherein the text matching model comprises a first embedded word vector layer, a first Bi-LSTM module, a second Bi-LSTM module, a first pooling layer, a first multilayer perceptron and a first softmax layer connected in sequence; preferably, a second pooling layer and a comparison module (e.g. softmax) are further connected in a branch from the output of the first Bi-LSTM module.
The process of training the text matching model is as follows:
Step S211, obtaining an answer text and a preset option text library, wherein the answer text is the text answering the robot's question, and the preset option text library contains the matching texts and unmatched texts for all robot questions; word vector mapping is performed on all texts (including the answer text, the matching texts and the unmatched texts) using an embedded word vector layer obtained through pre-training, yielding hidden-layer word vectors. The pre-trained embedded word vector layer uses the open-source pre-trained model bert-base-kernel, whose pre-training tasks are MLM (masked word prediction) and NSP (next sentence prediction).
The matching text refers to a normal reply to a robot question; for example, for the robot question "may we ask whether you are satisfied with the product", the matching texts are "satisfied" and "not satisfied". An unmatched text may be any sentence other than the matching texts that is unrelated to the expected answer, for example "who are you" or "I do not have time now".
Step S212, the answer text, the matching text and the unmatched text are each passed through the first Bi-LSTM module to obtain first-layer hidden word vectors; this process is called intra-text interaction.
Step S214, the answer text and the matching text form a first text pair, the answer text and the unmatched text form a second text pair, and text-to-text interaction is performed on the first-layer hidden word vectors of each text pair to obtain second-layer hidden word vectors. Text-to-text interaction means that, for two texts A and B, a given word vector h in text A has its cosine similarity computed against all word vectors in text B and normalized; the results serve as weights by which all word vectors in text B are summed; the obtained vector is spliced to h and the original dimensionality is then restored with an MLP, yielding a new h as the new representation of that word vector. The other word vectors in text A and text B are processed in the same way, completing the text-to-text interaction.
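The text-to-text interaction described above can be sketched as follows; a single linear layer stands in for the MLP, and the dimensions and random inputs are assumptions for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def interact(A, B, W, b):
    # For each word vector h in text A: cosine similarity against every word
    # vector in text B, normalized into weights; weighted sum over text B;
    # the result is spliced to h; a linear layer (stand-in for the MLP)
    # restores the original dimensionality.
    out = []
    for h in A:
        weights = softmax(np.array([cosine(h, bv) for bv in B]))
        ctx = (weights[:, None] * B).sum(axis=0)
        out.append(W @ np.concatenate([h, ctx]) + b)
    return np.stack(out)

rng = np.random.default_rng(1)
d = 8                                          # hypothetical hidden size
A, B = rng.normal(size=(5, d)), rng.normal(size=(7, d))
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)
A_new = interact(A, B, W, b)                   # new representations for text A
```

The same call with arguments swapped, `interact(B, A, W, b)`, would update the word vectors of text B, completing the interaction for both texts.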
Step S215, the second-layer hidden word vectors obtained after the text-to-text interaction are input into the second Bi-LSTM module to obtain third-layer hidden word vectors. The third-layer hidden word vectors are input into the first pooling layer and max-pooled to obtain sentence representations; the first multilayer perceptron computes the vector similarity of the sentence representations of the first text pair and of the second text pair, and the probability that the answer text belongs to the matching text, together with the matching loss, is produced by the first softmax layer. The matching loss is reduced through continuous iteration to obtain the text matching model.
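The max pooling that turns hidden word vectors into a sentence representation can be sketched as follows (toy dimensions for illustration; the Bi-LSTM and perceptron stages are omitted):

```python
import numpy as np

def sentence_representation(hidden_word_vectors):
    # Max pooling over the word (time) axis of the hidden word vectors,
    # giving one fixed-size sentence vector, as in step S215.
    return np.max(np.asarray(hidden_word_vectors, dtype=float), axis=0)

# A three-word "text" with 2-dimensional hidden word vectors.
s = sentence_representation([[1.0, 4.0], [3.0, 2.0], [2.0, 0.0]])
```

Each dimension of the sentence vector keeps the strongest activation of that feature across all word positions, which is what makes the representation length-independent.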
Preferably, the method further includes step S213: the first-layer hidden word vectors are also input into the second pooling layer and max-pooled to obtain sentence representations; the comparison module computes the vector similarity of the sentence representations of the first text pair as the positive-example matching score and the vector similarity of the sentence representations of the second text pair as the negative-example matching score; by back-propagating the gradient of a contrastive loss function, the positive-example matching score is increased, the negative-example matching score is decreased, and the text matching model is pre-optimized. Specifically, if the sentence representations of the answer text, the matching text and the unmatched text are E1, E2 and E3 respectively, the positive-negative contrastive loss is -cos(E1, E2)/(cos(E1, E2) + cos(E1, E3)); reducing this loss during gradient back-propagation increases cos(E1, E2) and decreases cos(E1, E3). If there are several unmatched texts, the cosine similarity of each with the answer text is computed and added to the denominator.
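The positive-negative contrastive loss -cos(E1, E2)/(cos(E1, E2) + cos(E1, E3)) can be written out directly; the hand-picked vectors are illustrative and the gradient machinery is omitted:

```python
import numpy as np

def cos_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def contrastive_loss(e1, e2, negatives):
    # -cos(E1, E2) / (cos(E1, E2) + sum of cos(E1, E3) over unmatched texts):
    # driving this loss down raises cos(E1, E2) and lowers each cos(E1, E3).
    pos = cos_sim(e1, e2)
    neg = sum(cos_sim(e1, e3) for e3 in negatives)
    return -pos / (pos + neg)

e1 = np.array([1.0, 0.0])   # answer text sentence representation
e2 = np.array([1.0, 0.0])   # matching text representation (identical: cos = 1)
e3 = np.array([0.0, 1.0])   # unmatched text representation (orthogonal: cos = 0)
loss = contrastive_loss(e1, e2, [e3])
```

With a perfect positive pair and an orthogonal negative, the loss reaches its minimum of -1; as the negative similarity grows, the denominator grows and the loss rises toward 0.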
Wherein the intention recognition model comprises a second embedded word vector layer, a third Bi-LSTM module, a third pooling layer, a second multi-layered perceptron and a second softmax layer.
The training process of the intention recognition model is as follows:
Step S221, inputting the label texts in the intention label library and the answer text in sequence into the second embedded word vector layer, the third Bi-LSTM module and the third pooling layer to obtain sentence representations of the answer text and the label texts.
In actual data, most answers are related to the questions and only a few belong to abnormal replies, so the training data are extremely unbalanced. To reduce the dependence on data scale and balance during model training, heuristic text labels are adopted: a heuristic template corresponding to each category of abnormal answers (that is, unmatched texts) is derived by studying the training set data. The heuristic template should contain the keywords of most of the data in the category, and in practice one or more typical sentences can be manually selected as the heuristic template. For example, one of the intention categories is "need human service", an intention determined by analyzing data such as "are you a human or a robot, I want to speak with a person" and "are you a computer or a real person", so the template for this category can be defined as "are you a human or a machine?", which contains the keywords of the category.
It should be noted here that for the category "need human service", only "are you a human or a machine?" is used as the heuristic template, but for some categories whose data characteristics are not concentrated, several typical sentences may need to be selected as heuristic templates. For example, if the data characteristics of the category "need human service" were not concentrated, several sentences containing the category's keywords could be used as heuristic templates. There may still be data that fits no template, but experiments show that performance improves as long as the template approximates most of the data. When training the model, if the intention of the current data is A (that is, it belongs to class A), then: if the class-A template is a single typical sentence, the Bi-LSTM network is used to obtain the text representation of that template; if the class-A template consists of several typical sentences, the Bi-LSTM network is used to obtain the text representations of all of them, which are then averaged to obtain the template text representation.
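The template-representation rule above (one typical sentence used directly, several averaged) amounts to the following, assuming the sentence vectors have already been produced by the Bi-LSTM network:

```python
import numpy as np

def template_representation(sentence_vectors):
    # One typical sentence: its vector is the template representation.
    # Several typical sentences: their vectors are averaged.
    vecs = np.atleast_2d(np.asarray(sentence_vectors, dtype=float))
    return vecs.mean(axis=0)

single = template_representation([[1.0, 2.0]])            # one sentence
multi = template_representation([[1.0, 3.0], [3.0, 1.0]])  # two sentences, averaged
```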
Step S222, calculating a similarity loss using the sentence representation of the label text and the sentence representation of the answer text; by adjusting the model parameters, the cosine similarity between the label text and the answer text is drawn closer. This loss is called the heuristic loss.
And preferably, the sentence representation of the answer text is further input into the second multilayer perceptron and the second softmax layer to obtain a classification result and a classification loss; adjusting the parameters so that the answer text is classified more accurately lets the cosine similarity between the label text and the answer text be drawn closer more quickly and efficiently.
Step S223, obtaining the trained intention recognition model through continuous iteration and gradient back-propagation of the classification loss and the similarity loss.
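A sketch of the combined training objective; writing the heuristic loss as 1 minus the cosine similarity and using cross-entropy for the classification loss are assumptions, since the text fixes only that the two losses are back-propagated together:

```python
import numpy as np

def cos_sim(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def intent_training_loss(answer_repr, label_repr, class_probs, true_class):
    # Cross-entropy classification loss on the second softmax layer's output.
    class_loss = -np.log(class_probs[true_class])
    # Heuristic (similarity) loss: 1 - cosine similarity, so that minimizing
    # it draws the answer and label-template representations together.
    heuristic_loss = 1.0 - cos_sim(answer_repr, label_repr)
    return class_loss + heuristic_loss

loss = intent_training_loss(
    np.array([1.0, 0.0]),        # answer text sentence representation
    np.array([1.0, 0.0]),        # label (template) sentence representation
    np.array([0.5, 0.5]), 0)     # softmax output and true class index
```

With identical representations the heuristic term vanishes, leaving only the cross-entropy term -log(0.5).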
Note that the qualifiers "first", "second", etc. attached above to the embedded word vector layers, Bi-LSTM modules, pooling layers, multilayer perceptrons and softmax layers are used merely to distinguish expressions and are not intended to indicate differences.
The invention also provides an intelligent dialogue device based on text matching and intention recognition fusion processing, which comprises:
the question acquisition module is used for setting at least one intention identification category and acquiring an answer text of the current question;
the question judging module is used for performing intention recognition by using an intention recognition model if the current question is a blank filling question, outputting an intention recognition result of the answer text, and storing and recording the answer text if the intention recognition result does not belong to any intention recognition category; if the intention identification result belongs to one intention identification category, entering the processing logic of the corresponding intention;
if the current question is a choice question, the intention recognition model and the text matching model are adopted simultaneously for fusion processing; a model is selected from the fusion output result as the adaptation model; the adaptation model outputs the text matching similarity or the intention recognition result of the answer text; whether the current question has been answered is determined according to the output text matching similarity, so as to decide whether to enter the next question, or, if the output intention recognition result of the answer text belongs to an intention recognition category, the processing logic of the corresponding intention is entered,
the text matching model comprises a first embedded word vector layer, a first Bi-LSTM module, a second Bi-LSTM module, a first pooling layer, a first multilayer perceptron and a first softmax layer which are connected in sequence, and the second pooling layer and the comparison module are also connected in sequence from the output branch of the first Bi-LSTM module;
wherein the intent recognition model includes a second embedded word vector layer, a third Bi-LSTM module, a third pooling layer, a second multi-layered perceptron, and a second softmax layer.
Further, the simultaneous adoption of the intention recognition model and the text matching model for fusion processing, with a model selected as the adaptation model according to the fusion output result, includes:
step S21, initializing scores of the intention recognition model and the text matching model;
step S22, adopting the following three methods to perform the binary classification of whether the adaptation model is the text matching model or the intention recognition model, and selecting the model with the higher score as the adaptation model, the three methods comprising:
the method A comprises obtaining the text matching reliability of the text matching model output and the intention recognition reliability of the intention recognition model output, comparing them with their respective reliability thresholds, and outputting the corresponding bonus points of the two models: if the text matching reliability is greater than the text matching reliability threshold and the intention recognition reliability is less than the intention recognition reliability threshold, the text matching model receives a bonus point sa; otherwise, the intention recognition model receives a bonus point sb;
the method B comprises comparing the sentence representation of the answer text produced in the text matching model by the first word vector embedding layer, the first Bi-LSTM module and the second pooling layer, and the sentence representation produced in the intention recognition model by the second word vector embedding layer, the third Bi-LSTM module and the third pooling layer, each with the sentence representation obtained from a third word vector embedding of the answer text, to obtain text representation change vectors; splicing the two text representation change vectors into a new feature vector; using a third multilayer perceptron to perform on the new feature vector the binary classification of whether the adaptation model is the text matching model or the intention recognition model; and outputting the probabilities of applying the text matching model and the intention recognition model;
the method C comprises splicing the sentence representations of the text matching model and the intention recognition model to obtain a joint representation, performing the binary classification on it with a fourth multilayer perceptron, and outputting the assumed prior probabilities that the text matching model and the intention recognition model are suitable;
and processing the assumed prior probability of the method C and the output results of the methods A and B in a weighted summation mode, so as to obtain the final score of the text matching model and the final score of the intention recognition model.
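The three-method fusion can be sketched end-to-end as follows; the sa = sb = 0.1 bonus values follow claim 7, while the equal method weights and the toy probabilities are assumptions:

```python
import numpy as np

def fuse_scores(match_conf, intent_conf, t_match, t_intent,
                p_b, prior_c, sa=0.1, sb=0.1, w=(1.0, 1.0, 1.0)):
    # Method A: compare confidences against their thresholds -> bonus points.
    a = np.zeros(2)
    if match_conf > t_match and intent_conf < t_intent:
        a[0] = sa                # bonus for the text matching model
    else:
        a[1] = sb                # bonus for the intention recognition model
    # Weighted summation of method A bonuses, method B probabilities and
    # method C assumed prior probabilities.
    final = w[0] * a + w[1] * np.asarray(p_b) + w[2] * np.asarray(prior_c)
    return final                 # [text matching score, intention recognition score]

scores = fuse_scores(0.9, 0.2, 0.5, 0.5,       # confidences and thresholds
                     [0.6, 0.4],               # method B probabilities
                     [0.7, 0.3])               # method C assumed prior
winner = "text matching" if scores[0] > scores[1] else "intention recognition"
```

The model with the higher final score is then used as the adaptation model to process the answer text.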
The present invention is capable of other embodiments, and various changes and modifications can be made by one skilled in the art without departing from the spirit and scope of the invention.

Claims (10)

1. An intelligent dialogue method based on text matching and intention recognition fusion processing is characterized by comprising the following steps:
step S1, setting at least one intention identification category, and acquiring an answer text for a current question;
step S2, if the current question is a blank filling question, performing intention recognition by using an intention recognition model, outputting an intention recognition result of the answer text, and if the intention recognition result does not belong to any intention recognition category, storing and recording the answer text; if the intention identification result belongs to one intention identification category, entering the processing logic of the corresponding intention;
if the current question is a choice question, the intention recognition model and the text matching model are adopted simultaneously for fusion processing; a model is selected from the fusion output result as the adaptation model; the adaptation model outputs the text matching similarity or the intention recognition result of the answer text; whether the current question has been answered is determined according to the output text matching similarity, so as to decide whether to enter the next question, or, if the output intention recognition result of the answer text belongs to an intention recognition category, the processing logic of the corresponding intention is entered.
2. The intelligent dialogue method based on text matching and intent recognition fusion processing according to claim 1, wherein the text matching model comprises a first embedded word vector layer, a first Bi-LSTM module, a second Bi-LSTM module, a first pooling layer, a first multi-layer perceptron, and a first softmax layer, which are connected in sequence, and further a second pooling layer, a comparison module, which are connected in sequence from an output branch of the first Bi-LSTM module, and the process of training the text matching model is as follows:
step S211, obtaining an answer text with labels and a preset option text library, wherein the preset option text library comprises a matched text and an unmatched text for the question, the answer text and the matched text form a first text pair, the answer text and the unmatched text form a second text pair, and word vector mapping is performed on the answer text, the matched text and the unmatched text by using a first embedded word vector layer obtained by pre-training to respectively obtain corresponding embedded word vectors;
step S212, the embedded word vectors corresponding to the answer text, the matching text and the unmatched text respectively and independently pass through a first Bi-LSTM module to respectively obtain corresponding first-layer hidden-layer word vectors;
step S213, the first layer of hidden word vectors are input into the second pooling layer to obtain sentence expression, the vector similarity and positive case matching score expressed by the sentences of the first text pair are calculated by the comparison module, the vector similarity and negative case matching score expressed by the sentences of the second text pair are calculated by the comparison module, and the positive case matching score is increased, the negative case matching score is reduced and the text matching model is optimized in advance through gradient return of a comparison loss function;
step S214, performing text interaction on the first layer hidden layer word vectors in the first text pair and the second text pair to obtain a second layer hidden layer word vector;
step S215, inputting the second layer of hidden word vectors into a second Bi-LSTM module to obtain a third layer of hidden word vectors, inputting the third layer of hidden word vectors into a first pooling layer, performing maximum pooling to obtain sentence expression, respectively calculating vector similarity of sentence expression of the first text pair and the second text pair through a first multilayer perceptron, inputting a probability value that the answer text belongs to the matched text and matching loss into a first softmax layer, and reducing the matching loss through continuous iteration to obtain a text matching model.
3. The intelligent dialogue method based on text matching and intent recognition fusion processing of claim 2, wherein the intent recognition model comprises a second embedded word vector layer, a third Bi-LSTM module, a third pooling layer, a second multi-layered perceptron, and a second softmax layer,
the training process of the intention recognition model is as follows:
step S221, inputting the label text and the answer text in the intention label library into the second embedded word vector layer, the third Bi-LSTM module and the third pooling layer in sequence to obtain the sentence representation of the answer text and the label text;
step S222, inputting the sentence representation of the answer text into the second multilayer perceptron and the second softmax layer in sequence to obtain a classification result and a classification loss, and calculating a similarity loss by using the sentence representation of the label text and the sentence representation of the answer text;
step S223, obtaining the trained intention recognition model through continuous iteration based on the gradient pass-back classification loss and the similarity loss.
4. The intelligent dialogue method based on the fusion process of text matching and intention recognition according to claim 3, wherein the fusion process of the intention recognition model and the text matching model is simultaneously adopted, and the model is selected as the adaptive model according to the fusion output result, and the method comprises the following steps:
step S21, initializing scores of the intention recognition model and the text matching model;
step S22, scoring the intention recognition model and the text matching model by adopting the following three methods, and selecting the model with high score as an adaptive model, wherein the three methods comprise:
the method A comprises the steps of obtaining text matching reliability of a text matching model output result and intention recognition reliability of an intention recognition model output result, comparing the text matching reliability and the intention recognition reliability with corresponding reliability thresholds, and outputting corresponding scores of the text matching model and the intention recognition model, wherein if the text matching reliability is greater than the text matching reliability threshold and the intention recognition reliability is less than the intention recognition reliability threshold, the text matching model corresponds to a score sa, otherwise, the intention recognition model corresponds to a score sb;
the method B comprises comparing the sentence representation of the answer text produced in the text matching model by the first word vector embedding layer, the first Bi-LSTM module and the second pooling layer, and the sentence representation produced in the intention recognition model by the second word vector embedding layer, the third Bi-LSTM module and the third pooling layer, each with the sentence representation obtained from a third word vector embedding of the answer text, to obtain text representation change vectors; splicing the two text representation change vectors into a new feature vector; using a third multilayer perceptron to perform on the new feature vector the binary classification of whether the adaptation model is the text matching model or the intention recognition model; and outputting the probabilities of applying the text matching model and the intention recognition model;
the method C comprises splicing the sentence representations of the text matching model and the intention recognition model to obtain a joint representation, using a fourth multilayer perceptron to perform the binary classification of whether the adaptation model is the text matching model or the intention recognition model, and outputting the assumed prior probabilities that the text matching model and the intention recognition model are suitable;
and (4) processing the assumed prior probability of the method C and the output results of the method A and the method B in a weighted summation mode, thereby obtaining the final score of the text matching model and the final score of the intention recognition model.
5. The intelligent dialogue method based on text matching and intent recognition fusion process of claim 4,
in the method C, the joint representation is spliced with the question text and input into the corresponding fifth and sixth multilayer perceptrons, which output two vectors; the two vectors are weighted and fused using the assumed prior probability; a binary classification is performed on the fused vector to judge whether the text and the question form a question-answer relationship; and the assumed prior probability is updated through gradient back-propagation and continuous iteration.
6. The intelligent dialog method based on text matching and intent recognition fusion process of claim 5,
the inter-text interaction means that any word vector h in text A has its cosine similarity computed against all word vectors in text B and normalized to serve as weights; all word vectors in text B are summed according to these weights; the obtained vector is spliced to the word vector h and restored to the original dimension using the MLP, and the new h is taken as the new representation of that word vector; all other word vectors in text A and text B are processed in the same way, completing the inter-text interaction.
7. The intelligent dialogue method based on the text matching and the intention recognition fusion processing of claim 6, wherein during training of a text matching model, taking the median of all similarity scores in the training process as a text matching reliability threshold, and during training of an intention recognition model, taking the median of all probabilities corresponding to intention recognition categories predicted in the training process as an intention recognition reliability threshold;
wherein sa is the similarity score output by the text matching model multiplied by 0.1, and sb is the predicted intention recognition class probability output by the intention recognition model multiplied by 0.1.
8. The intelligent dialogue method based on text matching and intent recognition fusion processing of claim 3, wherein the label texts in the intent label library adopt heuristic text labels, and a heuristic template corresponding to each category of unmatched texts is obtained by studying training set data, wherein the heuristic template comprises keywords of most data of the categories.
9. An intelligent dialogue device based on text matching and intention recognition fusion processing, comprising:
the question acquisition module is used for setting at least one intention identification category and acquiring an answer text for the current question;
the question judging module is used for performing intention recognition by using an intention recognition model if the current question is a blank filling question, outputting an intention recognition result of the answer text, and storing and recording the answer text if the intention recognition result does not belong to any intention recognition category; if the intention identification result belongs to one intention identification category, entering the processing logic of the corresponding intention;
if the current question is a choice question, the intention recognition model and the text matching model are adopted simultaneously for fusion processing; a model is selected from the fusion output result as the adaptation model; the adaptation model outputs the text matching similarity or the intention recognition result of the answer text; whether the current question has been answered is determined according to the output text matching similarity, so as to decide whether to enter the next question, or, if the output intention recognition result of the answer text belongs to an intention recognition category, the processing logic of the corresponding intention is entered,
the text matching model comprises a first embedded word vector layer, a first Bi-LSTM module, a second Bi-LSTM module, a first pooling layer, a first multilayer perceptron and a first softmax layer which are connected in sequence, and the second pooling layer and the comparison module are also connected in sequence from the output branch of the first Bi-LSTM module;
wherein the intent recognition model includes a second embedded word vector layer, a third Bi-LSTM module, a third pooling layer, a second multi-layered perceptron, and a second softmax layer.
10. The intelligent dialogue device based on text matching and intention recognition fusion processing according to claim 9, wherein the simultaneously adopting the intention recognition model and the text matching model fusion processing and selecting the model from the fusion output result as the adaptation model comprises:
step S21, initializing scores of the intention recognition model and the text matching model;
step S22, scoring the intention recognition model and the text matching model by adopting the following three methods, and selecting a model with high score as an adaptive model, wherein the three methods comprise:
the method A comprises obtaining the text matching reliability of the text matching model output and the intention recognition reliability of the intention recognition model output, comparing them with their respective reliability thresholds, and outputting the corresponding bonus points of the two models: if the text matching reliability is greater than the text matching reliability threshold and the intention recognition reliability is less than the intention recognition reliability threshold, the text matching model receives a bonus point sa; otherwise, the intention recognition model receives a bonus point sb;
the method B comprises comparing the sentence representation of the answer text produced in the text matching model by the first word vector embedding layer, the first Bi-LSTM module and the second pooling layer, and the sentence representation produced in the intention recognition model by the second word vector embedding layer, the third Bi-LSTM module and the third pooling layer, each with the sentence representation obtained from a third word vector embedding of the answer text, to obtain text representation change vectors; splicing the two text representation change vectors into a new feature vector; using a third multilayer perceptron to perform on the new feature vector the binary classification of whether the adaptation model is the text matching model or the intention recognition model; and outputting the probabilities of applying the text matching model and the intention recognition model;
the method C comprises splicing the sentence representations of the text matching model and the intention recognition model to obtain a joint representation, using a fourth multilayer perceptron to perform the binary classification of whether the adaptation model is the text matching model or the intention recognition model, and outputting the assumed prior probabilities that the text matching model and the intention recognition model are suitable;
and (4) processing the assumed prior probability of the method C and the output results of the method A and the method B in a weighted summation mode, thereby obtaining the final score of the text matching model and the final score of the intention recognition model.
CN202210734681.9A 2022-06-27 2022-06-27 Intelligent dialogue method and device based on text matching and intention recognition fusion processing Pending CN115238050A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210734681.9A CN115238050A (en) 2022-06-27 2022-06-27 Intelligent dialogue method and device based on text matching and intention recognition fusion processing

Publications (1)

Publication Number Publication Date
CN115238050A true CN115238050A (en) 2022-10-25

Family

ID=83670153



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591901A (en) * 2024-01-17 2024-02-23 合肥中科类脑智能技术有限公司 Insulator breakage detection method and device, storage medium and electronic equipment
CN117591901B (en) * 2024-01-17 2024-05-03 合肥中科类脑智能技术有限公司 Insulator breakage detection method and device, storage medium and electronic equipment

CN117094835A (en) Multi-target group classification method for social media content
CN115688758A (en) Statement intention identification method and device and storage medium
CN115617972A (en) Robot dialogue method, device, electronic equipment and storage medium
WO2023173554A1 (en) Inappropriate agent language identification method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination