CN110209791B - Multi-round dialogue intelligent voice interaction system and device - Google Patents

Info

Publication number
CN110209791B
Authority
CN
China
Prior art keywords
intention
model
semantic
dialogue
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910505280.4A
Other languages
Chinese (zh)
Other versions
CN110209791A (en)
Inventor
张韶峰
冯鑫
王世朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bairong Yunchuang Technology Co ltd
Original Assignee
Bairong Yunchuang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bairong Yunchuang Technology Co ltd
Priority to CN201910505280.4A
Publication of CN110209791A
Application granted
Publication of CN110209791B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A multi-round dialogue intelligent voice interaction system and a device thereof are provided. The system comprises a hybrid semantic understanding module, a semantic understanding adaptive module, and an automatic dialogue management module. Voice input is converted into text by speech recognition and passed to the hybrid semantic understanding module, which understands the user's intention and extracts the corresponding state information. The automatic dialogue management module guides the dialogue flow based on the user's intention and outputs dialogue text, which is converted into voice output to carry out the dialogue. The semantic understanding adaptive module optimizes and retrains the hybrid semantic understanding module. The invention integrates speech recognition, natural language understanding, natural language generation, speech synthesis, dialogue management, and other modules into a complete multi-round dialogue intelligent voice interaction system that is easy to extend, configurable, and applicable to any scenario.

Description

Multi-round dialogue intelligent voice interaction system and device
Technical Field
The invention belongs to the field of computer technology, relates to natural language processing and artificial intelligence, and provides a multi-round dialogue intelligent voice interaction system.
Background
How to make computers understand human language has long been a popular research direction in artificial intelligence and natural language processing, and it is one of the core problems of modern artificial intelligence. Although applications of speech recognition and image recognition are maturing and deep learning for semantic understanding is being studied intensively, few artificial intelligence products can hold a genuine spoken dialogue. Most voice robots on the market are voice assistants that rely on keyword matching; they can recognize and understand simple content, but they struggle to sustain continuous multi-round interaction and often give irrelevant answers.
Several prior art intelligent voice interaction schemes are described below.
(I) First technical scheme: speech recognition (ASR) + text matching
Speech recognition plus text matching is the most traditional way to implement an intelligent voice interaction system. Because it is easy to implement and depends little on data, it is often used in traditional call centers. Text matching usually uses exact matching, such as testing strings for equality, or fuzzy matching, such as regular expressions with wildcards, to extract keywords from the recognized text and dispatch instructions based on those keywords, thereby achieving a basic understanding of spoken language.
However, this scheme has the following problems: 1) speech recognition accuracy depends heavily on the dialogue domain, and training a domain-specific speech recognition system is expensive; 2) spoken dialogue differs from text dialogue, and a single sentence often contains multiple or even contradictory intentions; and 3) as the interaction content grows more complex, the hand-written matching grammars quickly multiply until they are hard to maintain and frequently conflict with one another. The semantic recognition accuracy of such a voice interaction system is therefore severely limited.
(II) Second technical scheme: speech recognition (ASR) + intent recognition + semantic slots
Intent recognition with semantic slot extraction is another common approach to intelligent voice interaction. Dialogue text is collected in advance, then labeled and classified. For example:
"What is the weather in Beijing tomorrow?" is labeled as "tomorrow/TIME Beijing/LOC weather/INT how/B". A classification model is then trained on a large amount of labeled data to recognize intent; common intent recognition models include SVMs and deep learning models such as CNNs and RNNs. Semantic slots are extracted according to the intent; common extraction methods include syntactic analysis, named entity extraction, and sequence-to-sequence models. In the example above, the intent is "check the weather", and the extractable semantic slots are time (TIME) and location (LOC). Intent recognition with slot extraction improves semantic recognition accuracy and the robot's multi-round dialogue capability to some extent. However, this method 1) relies on a large amount of manually labeled data, whose quality and quantity determine the quality of the final interaction, and manual labeling takes a long time and is hard to complete quickly; and 2) supports only one or two simple rounds of interaction, with the interaction content limited by the contents of the semantic slots. The scheme is therefore limited in how quickly it can be extended; it suits only mature business scenarios with well-defined slot values and is hard to adapt to scenarios whose content changes rapidly.
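To make the labeling format in the second scheme concrete, the following is a minimal sketch of reading an intent and its slots out of a labeled sentence. It assumes whitespace-separated token/TAG pairs and the tag names TIME, LOC, and INT from the example; the function name parse_annotation is hypothetical and is not part of any prior-art system described here.

```python
def parse_annotation(tagged: str) -> dict:
    """Split whitespace-separated 'token/TAG' pairs into an intent and slots.

    Assumption: TIME and LOC are slot tags and INT marks the intent word;
    any other tag (such as B) is ignored.
    """
    slots = {}
    intent = None
    for pair in tagged.split():
        token, _, tag = pair.rpartition("/")
        if tag == "INT":
            intent = token
        elif tag in {"TIME", "LOC"}:
            slots[tag] = token
    return {"intent": intent, "slots": slots}


result = parse_annotation("tomorrow/TIME Beijing/LOC weather/INT how/B")
print(result)
```

A real slot extractor would be a trained model rather than a parser of gold labels; the sketch only shows the shape of the data such a model is trained to produce.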
(III) Third technical scheme: speech recognition (ASR) + sequence-to-sequence learning (Seq2Seq) + text-to-speech (TTS)
The combination of speech recognition, a sequence model, and speech synthesis is the so-called end-to-end model. It trains a deep learning Seq2Seq model on the large amount of text data generated on the internet to predict the output text directly from the input text, and then converts that text into voice output through speech synthesis.
Predicting the output text directly from the input text with an end-to-end model appears to improve the robot's interaction capability. However, given the current state of the technology, this mode considers only the current input text and not its semantics, so the robot loses the ability to hold multi-round dialogues and is usable only for casual chat. Moreover, because the interaction content is produced entirely by the model, the interaction is no longer controllable. In most cases the model merely memorizes the training data to some degree and develops no intelligence of its own for judging the dialogue content, so the interaction logic becomes confused and the answers irrelevant. This scheme is therefore better suited to entertainment and is hard to apply in production.
Disclosure of Invention
The problems to be solved by the invention are as follows. Existing intelligent speech recognition technology uses a single recognition mode, performs poorly on spoken language whose grammar is not rigorous, and cannot recognize complex speech content; or it requires a large amount of manual labeling for training, has insufficient adaptive capability, and needs manual intervention to update its data. Some schemes appear to support multi-round dialogue, but in practice they do not recognize the logical relationships between turns and merely stack single-round dialogues. To address these shortcomings of the prior art, the invention provides an intelligent voice dialogue system that supports multi-round dialogue.
The technical scheme of the invention is as follows: a multi-round dialogue intelligent voice interaction system comprises a hybrid semantic understanding module, a semantic understanding adaptive module, and an automatic dialogue management module, wherein voice input is converted into text by speech recognition and input into the hybrid semantic understanding module, which understands the user's intention and extracts the corresponding state information; the automatic dialogue management module guides the dialogue flow based on the user's intention and outputs dialogue text, which is converted into voice output to carry out the dialogue; and the semantic understanding adaptive module optimizes and retrains the hybrid semantic understanding module;
the hybrid semantic understanding module judges dialogue semantics comprehensively through model fusion, combining semantic understanding schemes including text matching, semantic similarity matching, information retrieval, and a multi-intent classification model, wherein text matching is a pre-processing algorithm that preprocesses sentences to obtain the preprocessed dialogue text, and semantic similarity matching, information retrieval, and the multi-intent classification model jointly output the final result through model fusion;
the semantic understanding adaptive module optimizes the existing hybrid semantic understanding models through transfer learning and retraining, the models including a Bi-LSTM language model, a similarity matching model, and a multi-intent classification model;
the automatic dialogue management module implements human-computer interaction control and configurable extension, and determines the interaction instruction to output based on the current dialogue state, the recognized current user intention, and the historical dialogue interaction information; through multi-intent recognition, a single human-machine exchange can trigger multiple interactions and state transitions.
Furthermore, in the hybrid semantic understanding module, text matching is used to establish semantic understanding rules. Semantic similarity matching builds a semantic matching model from an attention-based Bi-LSTM neural network language model and, in combination with the semantic understanding rules, produces a vectorized representation of the input dialogue text; the semantic matching model is fine-tuned by Siamese network training, and a convolutional neural network regression model is then trained on the vectorized representations to predict the semantic similarity between two texts. Information retrieval establishes a standard corpus that maps standard utterances to their intent categories, retrieves from it the utterance with the highest semantic similarity to the dialogue text, and uses that utterance's intent category as the intent classification of the dialogue text, thereby recognizing the intent of the input text. The multi-intent classification model combines the standard corpus data with business data from the application scenario to generate labeled data for multiple intent categories, trains an attention-based Bi-LSTM network as the multi-intent classification model, performs multi-class prediction, and provides intent categories for the standard corpus.
Further, the semantic understanding adaptive module optimizes the models in the hybrid semantic understanding module through the following steps:
1) importing newly added corpus data into the training data of the Bi-LSTM semantic matching model, training the model, and updating its network weights;
2) cleaning and filtering the newly added labeled data for the multi-intent classification model, selecting the best portion of it and merging that portion into the standard corpus, re-predicting the intent of all labeled data, building a supervised ranking model and a corresponding metric monitoring mechanism from the text features of each corpus entry and its performance on the labeled data set, and monitoring the accuracy and recall of each intent as well as changes to the labeled corpus;
3) importing the newly added labeled data into the Bi-LSTM multi-intent classification model, updating the model's network weights, and monitoring its accuracy and recall on a validation set;
4) automatically deploying the updated models of the hybrid semantic understanding module and the updated standard corpus online.
Furthermore, the automatic dialogue management module combines a finite-state machine with reinforcement learning. In each round of dialogue interaction, it combines the current dialogue state, the current user intention, and the historical interaction information with preset dialogue interaction rules and learned interaction strategies to reach a comprehensive judgment, and outputs the interactive operation instruction that the dialogue robot should execute.
The invention also provides a multi-turn dialogue intelligent voice interaction device which is a computer device with a storage medium, wherein a computer program is loaded in the storage medium and is used for realizing the multi-turn dialogue intelligent voice interaction system.
The invention integrates speech recognition, natural language understanding, natural language generation, speech synthesis, dialogue management, and other modules into a complete multi-round dialogue intelligent voice interaction system that is easy to extend, configurable, and applicable to any scenario. In natural language understanding, the invention introduces a hybrid semantic understanding model that, building on in-depth mining of spoken-language patterns and a summary of semantic categories, combines traditional natural language processing techniques with deep neural network algorithms to understand spoken semantics in real time, enabling smooth human-machine communication and a better interaction experience. In addition, to minimize the dependence of model optimization on manual labeling, a dedicated natural language understanding adaptive module updates the model parameters automatically, so that the models can update and optimize themselves.
Drawings
FIG. 1 is a system architecture diagram of an embodiment of the present invention.
FIG. 2 is a schematic diagram of the recognition process according to an embodiment of the present invention.
FIG. 3 is a flow chart of a multi-turn dialog according to an embodiment of the present invention.
Detailed Description
The invention provides a multi-round dialogue intelligent voice interaction system comprising a hybrid semantic understanding module, a semantic understanding adaptive module, and an automatic dialogue management module, wherein voice input is converted into text by speech recognition and input into the hybrid semantic understanding module, which understands the user's intention and extracts the corresponding state information; the automatic dialogue management module guides the dialogue flow based on the user's intention and outputs dialogue text, which is converted into voice output to carry out the dialogue; and the semantic understanding adaptive module optimizes and retrains the hybrid semantic understanding module;
the hybrid semantic understanding module judges dialogue semantics comprehensively through model fusion, combining semantic understanding schemes including text matching, semantic similarity matching, information retrieval, and a multi-intent classification model, wherein text matching is a pre-processing algorithm that preprocesses sentences to obtain the preprocessed dialogue text, and semantic similarity matching, information retrieval, and the multi-intent classification model jointly output the final result through model fusion;
the semantic understanding adaptive module optimizes the existing hybrid semantic understanding models through transfer learning and retraining, the models including a Bi-LSTM language model, a similarity matching model, and a multi-intent classification model;
the automatic dialogue management module implements human-computer interaction control and configurable extension, and determines the interaction instruction to output based on the current dialogue state, the recognized current user intention, and the historical dialogue interaction information; through multi-intent recognition, a single human-machine exchange can trigger multiple interactions and state transitions.
FIG. 1 is a schematic diagram of the system structure of a specific implementation of the invention. The ASR module is the speech recognition module, which transcribes the speech collected on the user side into query text. The NLU module is the semantic understanding module, that is, the hybrid semantic understanding module, which understands the user's intention and extracts the corresponding information. The adaptive module is the semantic understanding adaptive module, which assists the self-updating of the NLU module. The DM module is the automatic dialogue management module, which guides the dialogue flow based on the user's intention. The NLG module is the text generation module, which generates the text to be spoken based on the user's intention and information extracted from the knowledge base. The TTS module is the speech synthesis module, which converts the information to be output into the corresponding speech.
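The module chain described for FIG. 1 can be sketched as a single function that passes one turn through ASR, NLU, DM, NLG, and TTS in order. This is only an illustrative sketch: the function name run_turn, the callable signatures, the intent and action labels, and the lambda stubs are all assumptions standing in for the real modules, which the text does not specify at this level.

```python
from typing import Callable


def run_turn(audio: bytes,
             asr: Callable[[bytes], str],
             nlu: Callable[[str], str],
             dm: Callable[[str], str],
             nlg: Callable[[str], str],
             tts: Callable[[str], bytes]) -> bytes:
    """Run one dialogue turn through the ASR -> NLU -> DM -> NLG -> TTS chain."""
    text = asr(audio)        # speech to query text
    intent = nlu(text)       # query text to user intent
    action = dm(intent)      # intent to interaction instruction
    reply = nlg(action)      # instruction to reply text
    return tts(reply)        # reply text to speech


# Hypothetical stubs: bytes stand in for audio, and fixed tables for the models.
REPLIES = {
    "confirm_repayment": "Yes, you can repay today.",
    "ask_again": "Sorry, could you repeat that?",
}

out = run_turn(
    b"can I repay today",
    asr=lambda a: a.decode("utf-8"),
    nlu=lambda t: "ask_repay_today" if "repay" in t else "unknown",
    dm=lambda i: "confirm_repayment" if i == "ask_repay_today" else "ask_again",
    nlg=lambda a: REPLIES[a],
    tts=lambda s: s.encode("utf-8"),
)
print(out)
```

Passing the modules in as callables keeps each one replaceable, which matches the text's emphasis on an easily extended, configurable system.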
The implementation of the various modules of the present invention is described in detail below.
Hybrid semantic understanding module
The hybrid semantic understanding module uses model fusion: it combines semantic understanding schemes such as text matching, semantic similarity matching, information retrieval, and intent classification to judge dialogue semantics comprehensively, which greatly increases the accuracy and flexibility of semantic understanding. Text matching is used to establish semantic understanding rules. Semantic similarity matching builds a semantic matching model from an attention-based Bi-LSTM neural network language model and, in combination with the semantic understanding rules, produces a vectorized representation of the input dialogue text; the semantic matching model is fine-tuned by Siamese network training, and a convolutional neural network regression model is then trained on the vectorized representations to predict the semantic similarity between two texts. Information retrieval establishes a standard corpus that maps standard utterances to their intent categories, retrieves from it the utterance with the highest semantic similarity to the dialogue text, and uses that utterance's intent category as the intent classification of the dialogue text, thereby recognizing the intent of the input text. The multi-intent classification model combines the standard corpus data with business data from the application scenario to generate labeled data for multiple intent categories, trains an attention-based Bi-LSTM network as the multi-intent classification model, performs multi-class prediction, and provides intent categories for the standard corpus.
1) Text matching
Text matching combines traditional natural language processing techniques, such as keyword extraction, syntactic analysis, and named entity recognition, to group the common words of a business scenario into fine-grained feature words. Semantic understanding rules are then built on this vocabulary classification. One such grammar rule is:
[ today ] [ can ] [ repayment ]
This rule contains three fine-grained feature words: [today], [can], and [repayment]. Each feature word covers its common expressions; for example, the feature word [today] covers "today", "this afternoon", "in a moment", "right away", "a little later", and so on. Feature words can be classified at finer granularity according to the business scenario.
Feature words can be extracted automatically from text by keyword extraction, syntactic analysis, and named entity recognition, or configured manually by designating certain words as feature words and extracting them with string processing algorithms, which makes the approach very flexible. Text matching rules can be configured quickly and flexibly without large amounts of manual labeling, so a semantic understanding model can be built quickly from scratch, which solves the cold-start problem for a new business scenario.
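A rule built from feature words can be sketched as follows. This is a minimal illustration under stated assumptions: the English expressions listed under each feature word, the intent label ask_repay_today, and the choice to match feature words regardless of their order in the sentence are all hypothetical; the text specifies only that each feature word groups a set of common expressions.

```python
import re
from typing import Optional

# Hypothetical feature words, each mapped to the expressions it covers.
FEATURE_WORDS = {
    "today": ["today", "this afternoon", "right away", "in a moment"],
    "can": ["can", "could", "able to"],
    "repayment": ["repay", "pay back", "repayment"],
}

# Hypothetical rule table: an intent fires when all its feature words occur.
RULES = {"ask_repay_today": ["today", "can", "repayment"]}


def find_features(text: str) -> set:
    """Return the feature words whose expressions occur as whole words."""
    lowered = text.lower()
    found = set()
    for name, expressions in FEATURE_WORDS.items():
        for expr in expressions:
            if re.search(r"\b" + re.escape(expr) + r"\b", lowered):
                found.add(name)
                break
    return found


def match_rule(text: str) -> Optional[str]:
    """Return the first intent whose required feature words are all present."""
    found = find_features(text)
    for intent, required in RULES.items():
        if set(required) <= found:
            return intent
    return None


print(match_rule("Can I pay back this afternoon?"))
print(match_rule("I want to cancel my card"))
```

Word-boundary matching keeps "can" from firing inside "cancel". Adding an expression to a feature word extends every rule that uses it, which is what makes configuration fast in a cold-start scenario.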
2) Semantic similarity matching model based on attention mechanism
Chinese spoken dialogue is characterized by loose grammatical structure, varied expression, and prominent core-word meaning. The attention mechanism is a neural network mechanism that, during model training, places more weight on the words that determine the semantics of a text and, to a degree, ignores common filler words, connectives, and the like. To mine the semantic information of short texts as effectively as possible, the invention trains an attention-based Bi-LSTM neural network language model (Attention-based Bi-LSTM Neural Language Model), builds a semantic matching model on it, and produces vectorized representations of the input text. In addition, to overcome the high cost of manually labeled data and the small size of the labeled data set, the invention fine-tunes the Bi-LSTM semantic matching model by Siamese network (Siamese Network) training, so that the model adapts to the business scenario as far as possible and generates word vector representations better suited to it.
In the similarity annotation data, each record consists of two sentences separated by a comma, together with a manually assigned similarity grade from 1 to 5, where 1 means least similar and 5 means fully consistent. Using the vectorized representations from the semantic matching model, the invention trains a convolutional neural network (CNN) regression model to predict the semantic similarity between two texts. When the matching score of two texts exceeds a preset, dynamically adjustable threshold, the two texts are considered to express the same content, that is, they are judged synonymous, and the intent of the input text is recognized on the basis of this semantic similarity.
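The two ideas in this subsection, attention weighting and a thresholded similarity grade, can be illustrated with a small sketch. It does not implement the Bi-LSTM, the Siamese fine-tuning, or the CNN regression model: the word vectors and attention scores are hand-picked toy values, and cosine similarity rescaled to the 1 to 5 range is a stand-in for the learned regressor. All function names and the threshold of 4.0 are assumptions.

```python
import math


def attention_pool(vectors, scores):
    """Softmax the per-word scores and return the weighted sum of the vectors."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_grade(a, b):
    """Stand-in for the CNN regressor: rescale cosine similarity to 1..5."""
    return 1.0 + 4.0 * max(0.0, cosine(a, b))


def is_synonymous(grade, threshold=4.0):
    """Two texts match when the grade exceeds the adjustable threshold."""
    return grade > threshold


# Toy inputs: a filler word with a low score and a core word with a high score.
query = attention_pool([[0.0, 1.0], [1.0, 0.0]], [0.0, 3.0])
paraphrase = attention_pool([[1.0, 0.1], [0.0, 1.0]], [3.0, 0.0])
unrelated = attention_pool([[0.0, 1.0]], [1.0])

print(is_synonymous(similarity_grade(query, paraphrase)))
print(is_synonymous(similarity_grade(query, unrelated)))
```

Because the high-scoring core words dominate each pooled vector, the filler words barely affect the comparison, which is the effect the attention mechanism is described as providing.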
3) Information retrieval
A standard corpus is established that maps standard utterances to their intent categories. The semantic similarity between the dialogue text and each standard utterance is calculated, and the intent category of the utterance with the highest similarity is taken as the intent classification of the dialogue text. While the interaction system is in use, the business data obtained is labeled and added to the standard corpus, which improves the coverage and accuracy of the system's semantic understanding, greatly improves the efficiency of optimizing the voice interaction system, and makes the whole system more intelligent with use.
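Retrieval against a standard corpus can be sketched as a nearest-neighbour lookup. In this sketch the corpus entries, the intent labels, and the threshold of 0.6 are hypothetical, and difflib.SequenceMatcher from the Python standard library is used only as a stand-in for the semantic similarity model described earlier; it measures character overlap, not meaning.

```python
from difflib import SequenceMatcher

# Hypothetical standard corpus: standard utterance -> intent category.
STANDARD_CORPUS = {
    "can i repay today": "ask_repay_today",
    "how much do i owe": "ask_balance",
    "i already paid": "claim_paid",
}


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] based on character overlap."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve_intent(text: str, threshold: float = 0.6):
    """Return (intent, score) for the closest corpus entry, or (None, score)."""
    best = max(STANDARD_CORPUS, key=lambda entry: similarity(text, entry))
    score = similarity(text, best)
    if score < threshold:
        return None, score
    return STANDARD_CORPUS[best], score


intent, score = retrieve_intent("Can I repay today?")
print(intent, round(score, 2))
```

Adding newly labeled business utterances to STANDARD_CORPUS widens coverage without retraining anything, which mirrors the corpus supplementation the paragraph describes.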
4) Multi-intent classification model
Chinese spoken language is highly expressive, and the same intention can be expressed in many ways, yet the standard corpus cannot be extended without limit. The invention therefore combines standard corpus data with business data to generate labeled data for multiple intent categories, trains an attention-based Bi-LSTM network as the multi-intent classification model, performs multi-class prediction, and provides intent categories for the standard corpus, thereby supporting information retrieval. Because a neural network extracts features automatically, the semantic understanding module depends far less on the standard corpus, and the prediction accuracy of the model improves further.
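The text states that semantic similarity matching, information retrieval, and the multi-intent classification model output the final result jointly through model fusion, but it does not say how the outputs are combined. The sketch below assumes one simple possibility, confidence-weighted voting; the function name fuse, the source names, and the weights are all hypothetical.

```python
from collections import defaultdict


def fuse(predictions, weights=None):
    """Pick the intent with the highest weighted confidence across sources.

    predictions maps a source name to an (intent, confidence) pair; a source
    that has no answer reports None as its intent and is skipped.
    """
    weights = weights or {}
    totals = defaultdict(float)
    for source, (intent, confidence) in predictions.items():
        if intent is not None:
            totals[intent] += weights.get(source, 1.0) * confidence
    if not totals:
        return None
    return max(totals, key=totals.get)


predictions = {
    "similarity": ("ask_repay_today", 0.9),
    "retrieval": ("ask_repay_today", 0.8),
    "classifier": ("ask_balance", 0.95),
}
print(fuse(predictions))
```

With equal weights, two moderately confident sources outvote one highly confident source; raising the classifier's weight would shift that balance, so the weights act as a tuning knob for each business scenario.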
Semantic understanding adaptive module
The semantic understanding adaptive module aims to minimize labor cost. Through transfer learning and retraining, it intelligently optimizes the models in the existing hybrid semantic understanding module, including the Bi-LSTM semantic matching model, the similarity matching model, and the multi-intent classification model, and thereby improves semantic understanding in the existing business scenario.
Intent labeling is performed on the input text data, and then: 1) the newly added corpus data is imported into the training data of the Bi-LSTM semantic matching model, the model is trained, and its network weights are updated; 2) the newly added labeled data is cleaned and filtered, the best portion of it is selected and merged into the standard corpus, and the intent of all labeled data is predicted again; from the text features of each corpus entry and its performance on the labeled data set, a supervised ranking model and a corresponding metric monitoring mechanism are built to monitor the accuracy and recall of each intent, changes to the labeled corpus, and so on; 3) the newly added labeled data is imported into the Bi-LSTM multi-intent classification model, the model's network weights are updated, and its accuracy and recall on a validation set are monitored; 4) the updated models and the annotated corpus are deployed online automatically. In this way the semantic understanding adaptive module automatically updates the word vectors, the text semantic similarity model, the standard corpus, the intent classifier, and so on, keeping manual intervention to a minimum.
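The monitoring in steps 2) and 3) and the automatic deployment in step 4) can be sketched as a per-intent metric check that gates deployment. Two assumptions are made here and are not stated in the text: the per-intent "accuracy" is read as precision, and a retrained model is deployed only if no intent gets worse on the validation set. The function names are hypothetical.

```python
def precision_recall(y_true, y_pred, intent):
    """Return (precision, recall) for one intent over paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == intent and p == intent)
    fp = sum(1 for t, p in pairs if t != intent and p == intent)
    fn = sum(1 for t, p in pairs if t == intent and p != intent)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def should_deploy(y_true, old_pred, new_pred):
    """Deploy only if the new model does not regress on any intent."""
    for intent in set(y_true):
        old_p, old_r = precision_recall(y_true, old_pred, intent)
        new_p, new_r = precision_recall(y_true, new_pred, intent)
        if new_p < old_p or new_r < old_r:
            return False
    return True


# Toy validation set with two intents, "a" and "b".
y_true = ["a", "a", "b", "b"]
old_pred = ["a", "b", "b", "b"]   # current model misses one "a"
new_pred = ["a", "a", "b", "b"]   # retrained model gets all four right

print(precision_recall(y_true, old_pred, "a"))
print(should_deploy(y_true, old_pred, new_pred))
```

A stricter or looser gate, for example tolerating small regressions on rare intents, is equally compatible with the description; the point of the sketch is that deployment follows from monitored metrics rather than manual review.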
Automatic dialogue management module
The automatic dialogue management module is an important module for realizing the controllable and rapid configuration expansion of human-computer interaction and is an important guarantee for the smoothness and the nature of the human-computer interaction. The automatic conversation management module is used for comprehensively judging the output interactive instruction based on the current conversation state, the current user intention and historical interactive information; and through multi-purpose recognition, the automatic dialogue management module can realize multiple interaction and state conversion through one-time man-machine dialogue, so that the fluency of man-machine interaction is greatly increased.
The automatic dialogue management module adopts a mode of combining a Finite-State Machine (finish-State Machine) and reinforcement learning, and transmits the State of the current dialogue, the current user intention and the history interactive information into the dialogue management module when each round of interaction is executed. And the dialogue management module is used for outputting interactive operation which the robot should execute, such as executing the next round of interaction, inquiring a knowledge base, interrupting the voice broadcast of the robot or not, executing default intention or not and the like, by combining a preset interaction rule and an interaction strategy obtained through learning.
Meanwhile, in the automatic dialogue management module, the user can define the interactive flow of a plurality of rounds of dialogue, namely, the interactive rule is set, thereby realizing the rapid expansion of configuration.
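A minimal sketch of the rule-driven part of such a dialogue manager is given below. It covers only the finite-state machine with user-defined interaction rules and multi-intention input; the learned (reinforcement learning) strategy is omitted. All state names, intent names and actions are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical user-defined interaction rules:
# (current state, recognized intent) -> (next state, robot action)
RULES = {
    ("verify_identity", "confirm"): ("notify_debt", "say_amount_due"),
    ("verify_identity", "deny"): ("end", "apologize_and_hang_up"),
    ("notify_debt", "promise_to_pay"): ("confirm_date", "ask_payment_date"),
    ("notify_debt", "ask_question"): ("notify_debt", "query_knowledge_base"),
    ("confirm_date", "give_date"): ("end", "thank_and_hang_up"),
}
DEFAULT_ACTION = "repeat_prompt"  # stands in for the default intention


@dataclass
class DialogueManager:
    state: str = "verify_identity"
    history: list = field(default_factory=list)

    def step(self, intents):
        """Consume every intent recognized in one utterance, in order,
        so a single turn can trigger several state transitions."""
        actions = []
        for intent in intents:
            next_state, action = RULES.get(
                (self.state, intent), (self.state, DEFAULT_ACTION)
            )
            self.history.append((self.state, intent, action))
            self.state = next_state
            actions.append(action)
            if self.state == "end":
                break
        return actions


dm = DialogueManager()
first = dm.step(["confirm", "promise_to_pay"])  # two intents in one turn
second = dm.step(["give_date"])
```

Because the rules live in a plain lookup table, a new interaction flow can be configured by editing the table rather than the code, which is one way to read the "rapid expansion of configuration" described above.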
The following describes the implementation of the present invention through a specific implementation scenario.
The implementation of the invention is illustrated using communication in financial scenarios as an example. Unlike other traditional service industries, financial scenarios impose very strict requirements on accuracy and compliance; the scenarios are flexible and changeable, and because different customer groups are served, the requirements on interaction differ. The requirements on the semantic understanding accuracy, interaction flow controllability and scenario expansion flexibility of the intelligent interactive robot are therefore very high, and existing technical solutions have difficulty meeting all of them at once. In view of these communication characteristics, the invention provides a complete multi-round dialogue intelligent voice interaction system. By connecting all of the modules, such as speech recognition, natural language understanding, natural language generation and speech synthesis, it achieves strict control of the dialogue flow, can be quickly configured and extended, copes flexibly with different interaction scenarios, and adapts itself so that its natural language understanding capability improves automatically and manual intervention is minimized.
Credit card collection is taken as an example of a real application scenario of the invention. In this scenario, the accuracy of intention identification from a pure-language perspective (each distinct text is counted once, regardless of how often it occurs) exceeds 85 percent, reaching an industry-leading level. As a voice interaction system for multi-round dialogue, in each round of interaction the user's voice input is first transcribed into text by the ASR module, and the text content is passed to the semantic understanding module; the semantic understanding module processes the input text, extracts features and predicts its intention. The predicted intention is passed to the dialogue management module, which determines the interactive operation the robot should execute next; according to the interactive instruction output by the dialogue management module, operations such as querying a knowledge base (optional), generating text and generating speech are executed; the speech is then transmitted over the network to the client for playback, as shown in fig. 2.
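The per-round flow described above can be sketched as a single function that chains the stages. Every component here is a hypothetical stub passed in as a callable; the function names, the `query_kb` flag and the sample strings are assumptions made for illustration, not interfaces defined by the source.

```python
from typing import Callable, Optional


def run_turn(
    audio: bytes,
    asr: Callable[[bytes], str],
    nlu: Callable[[str], str],
    dm: Callable[[str], dict],
    kb: Callable[[str], Optional[str]],
    nlg: Callable[[dict, Optional[str]], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """One round: speech -> text -> intent -> instruction -> text -> speech."""
    text = asr(audio)                      # speech recognition
    intent = nlu(text)                     # semantic understanding
    instruction = dm(intent)               # dialogue management
    # The knowledge base query is optional and driven by the instruction.
    facts = kb(intent) if instruction.get("query_kb") else None
    reply = nlg(instruction, facts)        # text generation
    return tts(reply)                      # speech synthesis


# Stub components, for demonstration only.
def fake_asr(audio):
    return audio.decode("utf-8")

def fake_nlu(text):
    return "ask_amount" if "how much" in text else "other"

def fake_dm(intent):
    return {"action": "answer", "query_kb": intent == "ask_amount"}

def fake_kb(intent):
    return "the amount due is on your statement"

def fake_nlg(instruction, facts):
    return facts if facts else "Could you repeat that?"

def fake_tts(text):
    return text.encode("utf-8")


stubs = (fake_asr, fake_nlu, fake_dm, fake_kb, fake_nlg, fake_tts)
reply_audio = run_turn(b"how much do I owe", *stubs)
fallback_audio = run_turn(b"hello", *stubs)
```

Passing each stage in as a callable keeps the stages independently replaceable, which mirrors the description of the modules being connected end to end.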
The multi-round dialogue flow of the credit card collection scenario is shown in fig. 3, where nodes with branch selection are main flow nodes, gray nodes are hang-up nodes at the end of the flow, and white nodes are the corresponding intentions. The automatic dialogue management module links the whole human-computer interaction flow together according to intention type, so that the interaction feels seamless to the user.
The invention develops three important modules, namely the hybrid semantic understanding module, the semantic understanding self-adaptive module and the automatic dialogue management module, which together achieve strict control of multi-round dialogue, allow the natural language understanding capability to improve itself, and can be flexibly configured to meet different interaction requirements.
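As an illustration of the hybrid semantic understanding module recited in claim 1, the following sketch shows one possible reading of its structure: rule-based text matching as a preprocessing step, followed by a fusion of retrieval over a standard corpus and a multi-intention classifier. Token-overlap (Jaccard) similarity and a keyword scorer are used here purely as stand-ins for the Bi-LSTM similarity and classification models; the rules, corpus entries, intent names and fusion weights are all hypothetical.

```python
import re

# Hypothetical rewrite rules for the text-matching pre-step.
REWRITE_RULES = [(re.compile(r"\b(paid off|settled|cleared)\b"), "paid")]

# Hypothetical standard corpus: (standard utterance, intent category).
STANDARD_CORPUS = [
    ("i will pay tomorrow", "promise_to_pay"),
    ("i have already paid", "already_paid"),
    ("i cannot pay right now", "unable_to_pay"),
]


def preprocess(text):
    """Text matching as a pre-algorithm: normalize, then strip punctuation."""
    text = text.lower().strip()
    for pattern, replacement in REWRITE_RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"[^\w\s]", "", text)


def jaccard(a, b):
    """Stand-in similarity; the source uses a trained Bi-LSTM model instead."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve(text):
    """Information retrieval: intent of the most similar standard utterance."""
    best_text, best_intent = max(
        STANDARD_CORPUS, key=lambda entry: jaccard(text, entry[0])
    )
    return best_intent, jaccard(text, best_text)


def classify(text):
    """Stand-in for the multi-intention classifier: {intent: score}."""
    scores = {"promise_to_pay": 0.1, "already_paid": 0.1, "unable_to_pay": 0.1}
    if "paid" in text:
        scores["already_paid"] = 0.8
    if "cannot" in text:
        scores["unable_to_pay"] = 0.8
    return scores


def understand(text, w_retrieval=0.5, w_classifier=0.5):
    """Fuse retrieval and classifier scores by a weighted sum."""
    text = preprocess(text)
    scores = {i: w_classifier * p for i, p in classify(text).items()}
    intent, similarity = retrieve(text)
    scores[intent] = scores.get(intent, 0.0) + w_retrieval * similarity
    return max(scores, key=scores.get)


result = understand("I have already settled it.")
```

A weighted sum is only one simple fusion choice; the source states that the components are fused but does not specify the fusion rule.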

Claims (3)

1. A multi-round dialogue intelligent voice interaction system is characterized by comprising a hybrid semantic understanding module, a semantic understanding self-adaptive module and an automatic dialogue management module, wherein voice input is converted into text input through voice recognition and then is input into the hybrid semantic understanding module, user intention is understood and corresponding state information is extracted, the automatic dialogue management module guides a dialogue process based on the user intention, dialogue texts are output and converted into voice output to realize dialogue, and the semantic understanding self-adaptive module is used for optimizing and learning the hybrid semantic understanding module;
the hybrid semantic understanding module comprehensively judges dialogue semantics by adopting a model fusion mode and combining semantic understanding schemes such as text matching, semantic similarity matching, information retrieval, multi-intention classification models and the like, wherein the text matching belongs to a pre-algorithm, sentences are preprocessed to obtain preprocessed dialogue texts, and final results are output together by the semantic similarity matching, the information retrieval and the multi-intention classification model in a fusion mode; the text matching is used for establishing a semantic understanding rule, the semantic similarity matching adopts a Bi-LSTM neural network language model based on an attention mechanism to establish a semantic matching model, an input dialogue text is vectorized and represented by combining the semantic understanding rule, the semantic matching model is fine-tuned by adopting a twin network training mode, and finally the semantic similarity between the two texts is predicted by training a regression model of a convolutional neural network based on the vectorized representation; the information retrieval is implemented through a standard corpus database corresponding to the dialogue text and the intention category of the standard corpus, a corpus with the highest semantic similarity with the dialogue text is retrieved from a standard corpus based on the semantic similarity, and the intention category is used as the intention classification of the dialogue text to realize the intention identification of the input text; the multi-intention classification model combines standard corpus data and service data of application occasions to generate labeled data of multi-intention categories, a Bi-LSTM network based on an attention mechanism is trained to serve as the multi-intention classification model, multi-classification prediction is carried out, and intention categories of the standard corpus are provided;
the semantic understanding self-adapting module optimizes the existing hybrid semantic understanding model in a mode of transfer learning and retraining, wherein the hybrid semantic understanding model comprises a Bi-LSTM semantic matching model, a similarity matching model and a multi-intention classification model, and comprises the following optimization steps:
1) importing the newly added corpus data into training data of a Bi-LSTM semantic matching model, training, and updating the network weight;
2) cleaning and filtering newly added labeled data of the multi-intention classification model, screening out an optimal part of labeled data, mixing the optimal part of labeled data into a standard corpus, carrying out intention prediction on all labeled data again, establishing a supervised sequencing model and a corresponding index monitoring mechanism according to text features of the corpus and the expression of the corpus on a labeled data set, and monitoring the accuracy and recall rate of each intention identification and the corpus change of the labeled corpus;
3) importing the newly added annotation data into a Bi-LSTM multi-intention classification model, updating the network weight of the model, and monitoring the accuracy and the recall rate of the model on a verification set;
4) automatically deploying a model in the hybrid semantic understanding module after online updating and a standard corpus;
the automatic conversation management module is used for realizing human-computer interaction control and expansion configuration, and comprehensively judging an output interaction instruction based on the state of the current conversation, the identified current user intention and historical conversation interaction information; and multiple interactions and state conversion are realized through multi-intention recognition and one man-machine conversation.
2. The multi-round dialogue intelligent voice interaction system as claimed in claim 1, wherein the automatic dialogue management module adopts a mode of combining a finite state machine and reinforcement learning, and when each round of dialogue interaction is executed, carries out comprehensive judgment on the state of the current dialogue, the current user intention and historical interaction information by combining preset dialogue interaction rules and interaction strategies obtained through learning, and outputs an interactive operation instruction to be executed by the dialogue robot.
3. A multi-turn dialog intelligent voice interaction device, characterized in that the device is a computer device having a storage medium, wherein a computer program is loaded on the storage medium, and the computer program is used for implementing the multi-turn dialog intelligent voice interaction system of claim 1 or 2.
CN201910505280.4A 2019-06-12 2019-06-12 Multi-round dialogue intelligent voice interaction system and device Active CN110209791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910505280.4A CN110209791B (en) 2019-06-12 2019-06-12 Multi-round dialogue intelligent voice interaction system and device

Publications (2)

Publication Number Publication Date
CN110209791A (en) 2019-09-06
CN110209791B (en) 2021-03-26

Family

ID=67792172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910505280.4A Active CN110209791B (en) 2019-06-12 2019-06-12 Multi-round dialogue intelligent voice interaction system and device

Country Status (1)

Country Link
CN (1) CN110209791B (en)

Families Citing this family (72)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112550306A (en) * 2019-09-10 2021-03-26 奥迪股份公司 Vehicle driving assistance system, vehicle including the same, and corresponding method and medium
CN110705311B (en) * 2019-09-27 2022-11-25 安徽咪鼠科技有限公司 Semantic understanding accuracy improving method, device and system applied to intelligent voice mouse and storage medium
CN110633475A (en) * 2019-09-27 2019-12-31 安徽咪鼠科技有限公司 Natural language understanding method, device and system based on computer scene and storage medium
CN110737762A (en) * 2019-10-09 2020-01-31 尹曦 old people personal information assistant system based on voice interaction
CN110781685B (en) * 2019-10-18 2022-08-19 四川长虹电器股份有限公司 Method for automatically marking correctness of semantic analysis result based on user feedback
CN110866099B (en) * 2019-10-30 2023-05-09 上海益商网络科技有限公司 Intelligent manager service method and system based on intelligent sound box voice interaction
CN110765270B (en) * 2019-11-04 2022-07-01 思必驰科技股份有限公司 Training method and system of text classification model for spoken language interaction
CN111128147A (en) * 2019-11-18 2020-05-08 云知声智能科技股份有限公司 System and method for terminal equipment to automatically access AI multi-turn conversation capability
CN111078846A (en) * 2019-11-25 2020-04-28 青牛智胜(深圳)科技有限公司 Multi-turn dialog system construction method and system based on business scene
CN112992132A (en) * 2019-12-02 2021-06-18 浙江思考者科技有限公司 AI intelligent voice interaction program bridging one-key application applet
CN110956142A (en) * 2019-12-03 2020-04-03 中国太平洋保险(集团)股份有限公司 Intelligent interactive training system
CN111090726A (en) * 2019-12-04 2020-05-01 中国南方电网有限责任公司 NLP-based electric power industry character customer service interaction method
CN111090730A (en) * 2019-12-05 2020-05-01 中科数智(北京)科技有限公司 Intelligent voice scheduling system and method
CN110942774A (en) * 2019-12-12 2020-03-31 北京声智科技有限公司 Man-machine interaction system, and dialogue method, medium and equipment thereof
CN111091826B (en) * 2019-12-13 2023-09-01 中博信息技术研究院有限公司 Intelligent voice robot system based on deep learning and finite state machine
CN111143525A (en) * 2019-12-17 2020-05-12 广东广信通信服务有限公司 Vehicle information acquisition method and device and intelligent vehicle moving system
CN111158648B (en) * 2019-12-18 2023-04-07 西安电子科技大学 Interactive help system development method based on live-action semantic understanding and platform thereof
CN111091011B (en) * 2019-12-20 2023-07-28 科大讯飞股份有限公司 Domain prediction method, domain prediction device and electronic equipment
CN111104502A (en) * 2019-12-24 2020-05-05 携程计算机技术(上海)有限公司 Dialogue management method, system, electronic device and storage medium for outbound system
CN111128126B (en) * 2019-12-30 2023-04-07 海智讯通(上海)智能科技有限公司 Multi-language intelligent voice conversation method and system
CN111241236B (en) * 2019-12-30 2023-08-22 新大陆数字技术股份有限公司 Task-oriented question-answering method, system, electronic device and readable storage medium
CN111243587A (en) 2020-01-08 2020-06-05 北京松果电子有限公司 Voice interaction method, device, equipment and storage medium
CN111257971A (en) * 2020-01-17 2020-06-09 河北冀云气象技术服务有限责任公司 Meteorological platform with artificial intelligence service ability and learning ability
CN111128175B (en) * 2020-01-19 2021-04-16 大连即时智能科技有限公司 Spoken language dialogue management method and system
CN111314451A (en) * 2020-02-07 2020-06-19 普强时代(珠海横琴)信息技术有限公司 Language processing system based on cloud computing application
CN111324708A (en) * 2020-02-07 2020-06-23 普强时代(珠海横琴)信息技术有限公司 Natural language processing system based on human-computer interaction
CN111312242A (en) * 2020-02-13 2020-06-19 上海凯岸信息科技有限公司 Intelligent voice robot scheme capable of interrupting intention without influencing dialogue management
CN111427996B (en) * 2020-03-02 2023-10-20 云知声智能科技股份有限公司 Method and device for extracting date and time from man-machine interaction text
CN111325037B (en) * 2020-03-05 2022-03-29 苏宁云计算有限公司 Text intention recognition method and device, computer equipment and storage medium
CN111475616B (en) * 2020-03-13 2023-08-22 平安科技(深圳)有限公司 Multi-round dialogue method and device based on dialogue state prediction and computer equipment
CN111428017B (en) * 2020-03-24 2022-12-02 科大讯飞股份有限公司 Human-computer interaction optimization method and related device
CN111460118B (en) * 2020-03-26 2023-10-20 聚好看科技股份有限公司 Artificial intelligence conflict semantic recognition method and device
CN111428483B (en) * 2020-03-31 2022-05-24 华为技术有限公司 Voice interaction method and device and terminal equipment
CN111462752B (en) * 2020-04-01 2023-10-13 北京思特奇信息技术股份有限公司 Attention mechanism, feature embedding and BI-LSTM (business-to-business) based customer intention recognition method
CN111508488A (en) * 2020-04-13 2020-08-07 江苏止芯科技有限公司 Intelligent robot dialogue system
CN111538814B (en) * 2020-04-26 2024-03-08 云知声智能科技股份有限公司 Method for supporting custom standardization by protocol in semantic understanding
CN111611378A (en) * 2020-05-15 2020-09-01 金日泽 Behavior training dialogue control method, behavior training dialogue control system, storage medium, program, and terminal
CN111639223B (en) * 2020-05-26 2024-04-19 广东小天才科技有限公司 Audio generation method of virtual object for spoken language exercise and electronic equipment
CN111833872B (en) * 2020-07-08 2021-04-30 北京声智科技有限公司 Voice control method, device, equipment, system and medium for elevator
CN111858888B (en) * 2020-07-13 2023-05-30 北京航空航天大学 Multi-round dialogue system of check-in scene
CN112017654A (en) * 2020-07-17 2020-12-01 武汉赛思云科技有限公司 Method and system for realizing non-interface office based on human-computer voice interaction
CN112115244B (en) * 2020-08-21 2024-05-03 深圳市欢太科技有限公司 Dialogue interaction method and device, storage medium and electronic equipment
CN112035632A (en) * 2020-08-21 2020-12-04 惠州市德赛西威汽车电子股份有限公司 Preferred distribution method and system suitable for multi-conversation robot collaboration task
CN112199486A (en) * 2020-10-21 2021-01-08 中国电子科技集团公司第十五研究所 Task type multi-turn conversation method and system for office scene
CN112256854A (en) * 2020-11-05 2021-01-22 云南电网有限责任公司 Intelligent AI conversation method and device based on AI natural language understanding
CN112599124A (en) * 2020-11-20 2021-04-02 内蒙古电力(集团)有限责任公司电力调度控制分公司 Voice scheduling method and system for power grid scheduling
CN112750434B (en) * 2020-12-16 2021-10-15 马上消费金融股份有限公司 Method and device for optimizing voice recognition system and electronic equipment
CN112527969B (en) * 2020-12-22 2022-11-15 上海浦东发展银行股份有限公司 Incremental intention clustering method, device, equipment and storage medium
CN112784027B (en) * 2021-01-21 2024-05-14 军事科学院系统工程研究院系统总体研究所 Natural language interaction system and method in intelligent networking
CN112732887A (en) * 2021-01-22 2021-04-30 南京英诺森软件科技有限公司 Processing device and system for multi-turn conversation
CN112818097A (en) * 2021-01-26 2021-05-18 山西三友和智慧信息技术股份有限公司 Off-task training system based on dialog box state tracking model
CN112836030B (en) * 2021-01-29 2023-04-25 成都视海芯图微电子有限公司 Intelligent dialogue system and method
CN112989829B (en) * 2021-02-10 2024-03-08 卡奥斯数字科技(上海)有限公司 Named entity recognition method, device, equipment and storage medium
CN112818107B (en) * 2021-02-24 2023-10-31 中国人民大学 Conversation robot for daily life and chat method thereof
CN112905747A (en) * 2021-03-08 2021-06-04 国能大渡河流域水电开发有限公司 Professional system archive question-answering robot system based on semantic analysis technology
CN113204971B (en) * 2021-03-26 2024-01-26 南京邮电大学 Scene self-adaptive Attention multi-intention recognition method based on deep learning
CN113409631A (en) * 2021-06-18 2021-09-17 上海锡鼎智能科技有限公司 AI auxiliary teaching robot
CN113515616B (en) * 2021-07-12 2024-05-14 中国电子科技集团公司第二十八研究所 Task driving system based on natural language
CN113689851B (en) * 2021-07-27 2024-02-02 国家电网有限公司 Scheduling professional language understanding system and method
CN113687719A (en) * 2021-08-23 2021-11-23 广东电网有限责任公司 Intelligent interaction method and device suitable for voice information
CN113935309A (en) * 2021-09-13 2022-01-14 惠州市德赛西威汽车电子股份有限公司 Skill optimization processing method and system based on semantic platform
CN114036277A (en) * 2021-11-15 2022-02-11 深圳壹账通智能科技有限公司 Dialogue robot route skipping method and device, electronic equipment and medium
CN114171021A (en) * 2021-11-29 2022-03-11 国网江苏省电力有限公司南京供电分公司 Distribution network virtual scheduling system and method based on artificial intelligence
CN114582314B (en) * 2022-02-28 2023-06-23 江苏楷文电信技术有限公司 Man-machine audio-video interaction logic model design method based on ASR
CN114706965B (en) * 2022-03-22 2022-11-11 广州营客信息科技有限公司 AI intelligent customer service system
CN114970559B (en) * 2022-05-18 2024-02-02 马上消费金融股份有限公司 Intelligent response method and device
CN115168593B (en) * 2022-09-05 2022-11-29 深圳爱莫科技有限公司 Intelligent dialogue management method capable of self-learning and processing equipment
CN115794065B (en) * 2022-11-01 2023-11-03 中犇科技有限公司 Visual intelligent programming method based on AI voice interaction
CN115936011B (en) * 2022-12-28 2023-10-20 南京易米云通网络科技有限公司 Multi-intention semantic recognition method in intelligent dialogue
CN117671212A (en) * 2023-12-13 2024-03-08 江苏麦克数字空间营造有限公司 Exhibition hall exhibition system based on meta universe and interaction method thereof
CN117648408B (en) * 2024-01-30 2024-04-30 北京水滴科技集团有限公司 Intelligent question-answering method and device based on large model, electronic equipment and storage medium
CN117807215B (en) * 2024-03-01 2024-05-24 青岛海尔科技有限公司 Statement multi-intention recognition method, device and equipment based on model

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354835A (en) * 2016-08-31 2017-01-25 上海交通大学 Artificial dialogue auxiliary system based on context semantic understanding
CN106407178A (en) * 2016-08-25 2017-02-15 中国科学院计算技术研究所 Session abstract generation method and device
CN107193978A (en) * 2017-05-26 2017-09-22 武汉泰迪智慧科技有限公司 A kind of many wheel automatic chatting dialogue methods and system based on deep learning
CN107346340A (en) * 2017-07-04 2017-11-14 北京奇艺世纪科技有限公司 A kind of user view recognition methods and system
CN107515944A (en) * 2017-08-31 2017-12-26 广东美的制冷设备有限公司 Exchange method, user terminal and storage medium based on artificial intelligence
CN108228764A (en) * 2017-12-27 2018-06-29 神思电子技术股份有限公司 A kind of single-wheel dialogue and the fusion method of more wheel dialogues
CN109241255A (en) * 2018-08-20 2019-01-18 华中师范大学 A kind of intension recognizing method based on deep learning
CN109522393A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Intelligent answer method, apparatus, computer equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11157490B2 (en) * 2017-02-16 2021-10-26 Microsoft Technology Licensing, Llc Conversational virtual assistant
CN108415923B (en) * 2017-10-18 2020-12-11 北京邮电大学 Intelligent man-machine conversation system of closed domain
CN108717439A (en) * 2018-05-16 2018-10-30 哈尔滨理工大学 A kind of Chinese Text Categorization merged based on attention mechanism and characteristic strengthening
CN108829667A (en) * 2018-05-28 2018-11-16 南京柯基数据科技有限公司 It is a kind of based on memory network more wheels dialogue under intension recognizing method
CN108874972B (en) * 2018-06-08 2021-10-19 合肥工业大学 Multi-turn emotion conversation method based on deep learning
CN108874782B (en) * 2018-06-29 2019-04-26 北京寻领科技有限公司 A kind of more wheel dialogue management methods of level attention LSTM and knowledge mapping

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
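A Survey of Intent Recognition Methods in Human-Machine Dialogue Systems (人机对话系统中意图识别方法综述); 刘娇 et al.; Computer Engineering and Applications (《计算机工程与应用》); 20191231; Vol. 55 (No. 12); full text *
Research on User Intent Classification Methods in Human-Machine Dialogue Systems (人机对话系统中用户意图分类方法研究); 黄佳伟; China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 20190115 (No. 1); full text *
Research and Implementation of a Question Answering System Based on a Chinese Knowledge Base (基于中文知识库的问答系统研究与实现); 史梦飞; China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 20190115 (No. 1); Chapters 3-4 *
Design of an Intelligent Question Answering System Based on Deep Learning (基于深度学习的智能问答系统设计); 洪源; China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 20190315 (No. 3); full text *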

Also Published As

Publication number Publication date
CN110209791A (en) 2019-09-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant