CN110706702A - Infinite stage multi-turn dialogue method for speech recognition

Info

Publication number: CN110706702A
Application number: CN201910978335.3A
Authority: CN
Inventors: 杨毅; 李秋标; 吴恭辉; 陈伟德; 吴启庭; 黄永煌
Original assignee: Wuhu Ombo Technology Co Ltd
Current assignee: Wuhu Ombo Technology Co Ltd
Priority date: 2019-10-15
Filing date: 2019-10-15
Publication date: 2020-01-17
Anticipated expiration: 2039-10-15
Also published as: CN110706702B

Abstract

The invention discloses an infinite-level multi-turn dialogue method for speech recognition. A function card model file is formulated whose card types comprise main cards and alternative cards. After voice input is received and a recognition result is obtained, the recognized semantics are matched one by one against the corresponding semantics of the cards. When a card is matched, its card instruction is executed and the alternative cards identified by its next-level dialogue ID are reloaded into the pool to be matched. In the first turn of a dialogue the main cards stored in the pool to be matched are the matching objects; in every later turn the matching objects are the alternative cards reloaded after the previous turn was matched. The dialogue ends when the matched card has no next-level dialogue ID or when an undefined abnormal condition occurs. The number of targets to be matched at any level of a multi-level dialogue therefore depends only on the previous level, so increasing the number of associated dialogue levels does not affect matching efficiency, which guarantees the matching efficiency and reliability of multi-level dialogues with an unlimited number of turns.

Description

Infinite stage multi-turn dialogue method for speech recognition

Technical Field

The invention relates to an infinite-level multi-turn dialogue recognition method.

Background

At present, controlling equipment through speech recognition dialogue is applied more and more widely, and equipment has also begun to adopt this mode for control and automatic answering. In a multi-turn dialogue, an utterance depends not only on its own content: the correct feedback and response must be determined in combination with the content of the preceding dialogue. Relative to a single-turn dialogue, a dialogue that depends on one preceding turn is called a second-level dialogue, and the more preceding and following turns are associated, the higher the number of levels. Most multi-level speech recognition dialogues currently on the market are three-level dialogues, that is, a chain of turns in which each turn is associated with the content of the previous one can contain only three levels. The reason is that existing dialogue systems keep many targets in the pool to be matched when answering or executing a match, and an associated multi-level dialogue additionally requires multiple matches in combination with the previous turns, which greatly reduces matching efficiency and reliability. As the number of associated levels grows, the matching of later turns becomes worse, so current associated multi-level dialogues can only realize a limited number of turns within three levels.

Disclosure of Invention

The invention aims to provide an infinite-level multi-turn dialogue method for speech recognition, so as to solve the problem in the prior art that the number of associated dialogue levels in a multi-turn dialogue is limited because matching involves many associated targets and much associated content, which makes matching inefficient and error-prone.

In the infinite-level multi-turn dialogue method for speech recognition, a function card model file is formulated. The card types of the function card model file comprise main cards and alternative cards, and the content of the file comprises a card type ID, a card ID, the corresponding semantics of the card, a card instruction and a next-level dialogue ID of the card; the card ID of each alternative card is a next-level dialogue ID of a main card or of an alternative card. After voice input is received and analyzed to obtain semantics, the semantics are matched one by one against the corresponding semantics of the cards. When a card is matched, its card instruction is executed, and the alternative cards are reloaded into the pool to be matched according to the card IDs contained in the matched card's next-level dialogue ID. After a card reset the pool to be matched contains only main cards as matching objects; in every turn after the first level, the matching objects are the alternative cards reloaded after the previous turn was matched. The dialogue ends when the matched card has no next-level dialogue ID or when an undefined abnormal condition occurs.
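The card model described above can be pictured as a small record type. The following Python sketch is illustrative only: the field names, the type ID strings and the sample cards are assumptions, since the text lists the five fields but does not fix a file format. The two helper functions check the stated relationship between alternative card IDs and next-level dialogue IDs.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical card type IDs; the source only says the two types exist.
MAIN = "main"
ALTERNATIVE = "alternative"


@dataclass(frozen=True)
class Card:
    card_type: str                    # card type ID
    card_id: str                      # card ID
    semantics: str                    # corresponding semantics of the card
    instruction: str                  # card instruction
    next_ids: Tuple[str, ...] = ()    # next-level dialogue ID (zero or more card IDs)


# Made-up sample content for illustration.
CARDS = [
    Card(MAIN, "M1", "turn on the air conditioner", "ac_on", ("A1", "A2")),
    Card(ALTERNATIVE, "A1", "set it to cooling", "ac_mode_cool", ("A2",)),
    Card(ALTERNATIVE, "A2", "make it 24 degrees", "ac_temp_24"),
]


def dangling_references(cards: List[Card]) -> List[str]:
    """Card IDs named in some next-level dialogue ID that are not alternative cards."""
    alt_ids = {c.card_id for c in cards if c.card_type == ALTERNATIVE}
    referenced = {n for c in cards for n in c.next_ids}
    return sorted(referenced - alt_ids)


def unreferenced_alternatives(cards: List[Card]) -> List[str]:
    """Alternative cards that no next-level dialogue ID points to."""
    alt_ids = {c.card_id for c in cards if c.card_type == ALTERNATIVE}
    referenced = {n for c in cards for n in c.next_ids}
    return sorted(alt_ids - referenced)
```

Both helpers return an empty list for the sample cards, meaning every alternative card can be reached from some main or alternative card.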

Preferably, the system formulates the function card model file according to a set semantic understanding protocol, the function card model file is organized in groups of main cards and alternative cards, and the method specifically comprises the following steps:

step one, after the system starts it is initialized, only the main cards are loaded into the pool to be matched to realize a card reset, and the system then enters an idle state to wait for voice input;

step two, the system receives dialogue voice and obtains semantics through analysis by the speech recognition module;

step three, the semantics are matched one by one against the corresponding semantics of each card in the pool to be matched; if the matching succeeds, step five is executed, otherwise step four is executed;

step four, when the matching is unsuccessful, it is checked whether the situation is a defined abnormal condition; if so, the user is prompted to repeat the dialogue voice and the method returns to step two to analyze the semantics again; if not, step six is executed;

step five, when the semantic matching succeeds, the card instruction of the matched card is executed and it is checked whether the card has a next-level dialogue ID; if so, the corresponding next-level alternative cards are reloaded into the pool to be matched and the system enters the idle state to wait for voice input; if not, the card is a dialogue endpoint and step six is executed;

and step six, the dialogue is ended and the cards are reset, after which the system waits for voice input in the idle state.
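The six steps can be sketched as a small state machine. This is a minimal Python illustration under stated assumptions, not the patented implementation: speech analysis (step two) is assumed to happen elsewhere and to deliver a plain string, matching is simplified to exact string equality, and the card contents, the class name `Dialogue` and the flag `defined_abnormal` are all hypothetical.

```python
# card ID -> (card type, semantics, instruction, next-level card IDs); made-up content
CARDS = {
    "M1": ("main", "play music", "player.start", ["A1", "A2"]),
    "A1": ("alt", "louder", "player.volume_up", ["A1", "A2"]),
    "A2": ("alt", "stop", "player.stop", []),
}


class Dialogue:
    def __init__(self, cards):
        self.cards = cards
        self.executed = []   # instructions executed so far, kept for inspection
        self.reset()         # step one: only main cards are in the pool

    def reset(self):
        """Card reset: the pool to be matched holds main cards only."""
        self.pool = [cid for cid, c in self.cards.items() if c[0] == "main"]

    def handle(self, semantics, defined_abnormal=True):
        """Run steps three to six for one analyzed utterance."""
        for cid in self.pool:                        # step three: one-by-one matching
            _, card_semantics, instruction, next_ids = self.cards[cid]
            if card_semantics == semantics:
                self.executed.append(instruction)    # step five: execute the instruction
                if next_ids:
                    self.pool = list(next_ids)       # reload next-level alternative cards
                    return "waiting"
                self.reset()                         # step six: dialogue endpoint
                return "ended"
        if defined_abnormal:                         # step four: ask the user to repeat
            return "repeat"
        self.reset()                                 # step six: undefined abnormal condition
        return "ended"
```

For example, feeding "play music", "louder" and "stop" in turn moves the pool from the main card to the two alternative cards and back to the main card after the endpoint is reached.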

Preferably, the method maintains a parameter, the abnormal count, which records how many times a defined abnormal condition has occurred and whose initial value is 0. If step four determines that a defined abnormal condition has occurred, the abnormal count is incremented by 1 and then checked: if it is less than 3, the user is prompted to repeat the dialogue voice and the method returns to step two to analyze the semantics again; otherwise step six is executed.
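The counting rule can be written down in a few lines. In this hedged Python sketch the class and method names are hypothetical, and resetting the count together with the cards is an assumption, since the text does not state when the count returns to 0.

```python
MAX_ABNORMAL = 3  # limit stated in the text: prompt only while the count is below 3


class AbnormalCounter:
    def __init__(self):
        self.count = 0  # initial value is 0

    def on_defined_abnormal(self):
        """Increment first, then check, as described for step four."""
        self.count += 1
        if self.count < MAX_ABNORMAL:
            return "prompt_repeat"   # back to step two
        return "end_dialogue"        # step six

    def reset(self):
        """Assumption: the count is cleared whenever the cards are reset."""
        self.count = 0
```

With this rule the user is prompted after the first and second failure, and the third failure ends the dialogue.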

Preferably, the method further comprises the following: after the system executes a card instruction, if the cards have not been reset, a timer starts timing, and the timer is reset when the system receives the next turn of the dialogue. When the elapsed time exceeds a set first threshold, the system determines that an abnormal condition has occurred; if the abnormal count is less than 3, the user is prompted to speak and timing is restarted. The timer is reset and stopped when the cards are reset.
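A possible shape for the timeout handling is sketched below in Python. It rests on several assumptions: a timeout is treated as a defined abnormal condition that increments the abnormal count, the names `TurnTimer` and `on_tick` are hypothetical, and the clock is injectable only so that the behaviour can be checked without waiting.

```python
import time


class TurnTimer:
    def __init__(self, threshold_s, clock=time.monotonic):
        self.threshold_s = threshold_s   # the "first threshold", in seconds
        self.clock = clock
        self.started_at = None           # None means the timer is off

    def start(self):
        """Start or restart timing (after a card instruction, or after a prompt)."""
        self.started_at = self.clock()

    def stop(self):
        """Reset and stop the timer (on card reset)."""
        self.started_at = None

    def expired(self):
        return (self.started_at is not None
                and self.clock() - self.started_at > self.threshold_s)


def on_tick(timer, abnormal_count):
    """Poll the timer; returns (action, updated abnormal count)."""
    if not timer.expired():
        return "wait", abnormal_count
    abnormal_count += 1                  # assumption: a timeout counts as an abnormal event
    if abnormal_count < 3:
        timer.start()                    # prompt the user and time again
        return "prompt_user", abnormal_count
    timer.stop()                         # dialogue ends, cards are reset
    return "end_dialogue", abnormal_count
```

Under these assumptions the user receives two reminders, and a third timeout ends the dialogue.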

Preferably, the next-level dialogue ID of an alternative card may be empty or may contain the card IDs of one or more alternative cards. When alternative cards are reloaded, all card IDs contained in the next-level dialogue ID are read out, and the alternative card corresponding to each card ID is reloaded into the pool to be matched.
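Reloading amounts to reading the card IDs out of the next-level dialogue ID and looking each one up. The Python sketch below assumes, purely for illustration, that the next-level dialogue ID is stored as a comma-separated string; the text does not specify an encoding, and the function names and card contents are hypothetical.

```python
# Made-up cards; "next" holds the next-level dialogue ID as a comma-separated string.
CARDS = {
    "M1": {"semantics": "order coffee", "next": "A1,A2"},
    "A1": {"semantics": "large", "next": "A2"},
    "A2": {"semantics": "no sugar", "next": ""},
}


def parse_next_ids(next_level_id):
    """Read out every card ID contained in a next-level dialogue ID."""
    return [part.strip() for part in next_level_id.split(",") if part.strip()]


def reload_pool(cards, matched_card_id):
    """Return the new pool to be matched after the given card was matched."""
    next_ids = parse_next_ids(cards[matched_card_id]["next"])
    return [cards[card_id] for card_id in next_ids]
```

An empty next-level dialogue ID yields an empty list, which a caller would treat as a dialogue endpoint.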

Preferably, the defined abnormal conditions comprise: the semantics obtained by analysis are normal but cannot be matched with any card; and no dialogue voice is received before the timeout. The undefined abnormal conditions comprise: the received voice cannot be analyzed to obtain semantics; and the semantics contain abnormal characters.
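The two classes of abnormal condition can be separated with a simple classifier. In this Python sketch, `None` stands for voice that could not be analyzed, and "abnormal characters" is interpreted as non-printable characters or the Unicode replacement character; that interpretation is an assumption, because the text does not define the term.

```python
from enum import Enum
from typing import Optional


class Abnormal(Enum):
    NONE = "none"
    DEFINED = "defined"        # prompt the user to repeat or continue
    UNDEFINED = "undefined"    # end the dialogue


def has_abnormal_chars(semantics: str) -> bool:
    """Assumed meaning of 'abnormal characters': non-printable or U+FFFD."""
    return any(not ch.isprintable() or ch == "\ufffd" for ch in semantics)


def classify(semantics: Optional[str], matched: bool, timed_out: bool = False) -> Abnormal:
    if timed_out:
        return Abnormal.DEFINED      # no dialogue voice received before the timeout
    if semantics is None:
        return Abnormal.UNDEFINED    # voice could not be analyzed into semantics
    if has_abnormal_chars(semantics):
        return Abnormal.UNDEFINED    # semantics contain abnormal characters
    if not matched:
        return Abnormal.DEFINED      # normal semantics, but no card matches
    return Abnormal.NONE
```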

The invention has the following advantages:

In the method, a function card model file is formulated for matching against the semantics obtained by speech recognition so as to feed back an execution command. The cards are divided into main cards and alternative cards: the main cards are used for the first-level dialogue (which is also the first turn) that starts a dialogue, and the main cards and alternative cards are placed in different pools to be matched, so that the number of matching targets at the start of a dialogue is greatly reduced and matching efficiency and accuracy are improved. For every later level, each card carries a next-level dialogue ID through which the alternative cards that may follow are looked up and reloaded. This changes the set of targets for the later turn and keeps invalid targets out of the one-by-one matching. Consequently, at every later level the number of targets depends only on the number of card IDs contained in the next-level dialogue ID of the card matched at the previous level. However many levels the dialogue is associated with, increasing the number of associated levels does not significantly affect matching efficiency, which guarantees the matching efficiency and reliability of multi-level dialogues with an unlimited number of turns.

Drawings

FIG. 1 is a flow chart of the infinite-level multi-turn dialogue method for speech recognition according to the present invention.

Detailed Description

The embodiments of the present invention are described in detail below so that those skilled in the art can understand the inventive concept and technical solutions of the present invention more completely, accurately and thoroughly.

As shown in FIG. 1, the present invention provides an infinite-level multi-turn dialogue method for speech recognition. The speech recognition matching system used in the method defines a function card model file according to a set semantic understanding protocol. The card types of the function card model file comprise main cards and alternative cards, and the content of the file comprises a card type ID, a card ID, the corresponding semantics of the card, a card instruction and a next-level dialogue ID of the card. The card ID of each alternative card is a next-level dialogue ID of a main card or of an alternative card, and the main cards and alternative cards are stored in separate groups.

The infinite-stage multi-turn dialogue recognition method specifically comprises the following steps of:

After the system starts, it is first initialized and the cards in the pool to be matched are reset, that is, the pool is reloaded with main cards only, so that in the first turn of a dialogue only the main cards in the pool to be matched serve as matching objects. In every subsequent turn, provided the preceding dialogue has not ended and the cards in the pool to be matched have not been reset, the alternative cards reloaded after the previous turn was matched serve as the matching objects.

The cards are divided into main cards and alternative cards. The main cards are used for the first-level dialogue (which is also the first turn) that starts a dialogue, and the main cards and alternative cards are placed in different pools to be matched, so that the number of matching targets at the start of a dialogue is greatly reduced and matching efficiency and accuracy are improved. At every later level, the number of card targets to be matched depends only on the number of card IDs contained in the next-level dialogue ID of the card matched at the previous level. However many levels the dialogue is associated with, increasing the number of associated levels does not significantly affect matching efficiency, which guarantees the matching efficiency and reliability of multi-level dialogues with an unlimited number of turns.

Furthermore, to prevent the system from entering an endless loop in which the user is prompted again and again to repeat the dialogue, the method maintains a parameter, the abnormal count, which records how many times a defined abnormal condition has occurred and whose initial value is 0. If step four determines that a defined abnormal condition has occurred, the abnormal count is incremented by 1 and then checked: if it is less than 3, the user is prompted to repeat the dialogue voice and the method returns to step two to analyze the semantics again; otherwise step six is executed, that is, the dialogue is ended and the cards are reset.

Therefore, when the received voice is unclear, when voice that cannot be correctly recognized appears, or when the semantic content is beyond the range the system can respond to, the method first lets the user repeat the voice up to 2 times for recognition and matching, and ends the dialogue in time once matching has failed 3 times, which prevents an endless loop of repeated prompts. The user is also given feedback so as to learn that speech recognition matching has failed and can remedy the situation promptly.

The defined abnormal conditions comprise: the semantics obtained by analysis are normal but cannot be matched with any card; and no dialogue voice is received before the timeout. The undefined abnormal conditions comprise: the received voice cannot be analyzed to obtain semantics; and the semantics contain abnormal characters. When normal semantics are obtained but cannot be matched with a card, the utterance is generally part of a normal dialogue, but either its content is beyond the range of commands the system can execute, or the speech was unclear so that the semantic analysis contains a few errors and cannot be matched. When no voice is received, the user has usually not continued for a long time after a previous turn: the user may have left or may not want to continue, but may also simply have forgotten to continue. In these situations the user needs to be prompted to repeat or to continue the dialogue, and the dialogue is ended after repeated failures. When the voice cannot be analyzed to obtain semantics, or the semantics contain abnormal characters, the input may not be normal user speech at all and may be noise; alternatively, the speech recognition matching system itself may have malfunctioned and produced erroneous information that needs to be corrected. In these cases the dialogue should be ended and the problem reported to the user.

The method further comprises the following: after the system executes a card instruction, if the cards have not been reset, a timer starts timing, and the timer is reset when the system receives the next turn of the dialogue. When the elapsed time exceeds a set first threshold, the system determines that an abnormal condition has occurred; if the abnormal count is less than 3, the user is prompted to speak and timing is restarted. The timer is reset and stopped when the cards are reset. Therefore, once a multi-level dialogue has started, the method can promptly remind a user who has forgotten to continue, and when the user has left or does not want to continue, it automatically stops the dialogue service after 2 reminders go unanswered, which reduces the energy consumption of the system and avoids disturbing the user or others with repeated prompts.

The next-level dialogue ID of an alternative card may be empty or may contain the card IDs of one or more alternative cards. When alternative cards are reloaded, all card IDs contained in the next-level dialogue ID are read out, and the alternative card corresponding to each card ID is reloaded into the pool to be matched. The next level of the dialogue therefore offers several choices, which are matched one by one against what the user actually says. The next-level dialogue ID is also allowed to contain the card ID of the alternative card currently being matched or of a card from an earlier level, so that, compared with conventional level-by-level association with multiple matches, the freedom of dialogue matching is greatly increased and efficiency is improved.
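The effect of letting a next-level dialogue ID point back at the card being matched can be illustrated with a short Python simulation. The cards, the exact-match rule and the function name `pool_sizes` are assumptions made for the sketch; it records how many targets are in the pool to be matched at each turn.

```python
# card ID -> (card type, semantics, next-level card IDs); made-up content
CARDS = {
    "M1": ("main", "start quiz", ["A1", "A2"]),
    "A1": ("alt", "next question", ["A1", "A2"]),   # lists its own card ID
    "A2": ("alt", "quit", []),
}


def pool_sizes(utterances):
    """Return the number of matching targets seen at each turn."""
    main_pool = [cid for cid, c in CARDS.items() if c[0] == "main"]
    pool = list(main_pool)
    sizes = []
    for utterance in utterances:
        sizes.append(len(pool))
        hit = next((cid for cid in pool if CARDS[cid][1] == utterance), None)
        if hit is None:
            break                                   # unmatched input; not modelled here
        pool = list(CARDS[hit][2]) or list(main_pool)  # reload, or reset at an endpoint
    return sizes


# A dialogue 1002 turns deep never has more than two targets to match.
sizes = pool_sizes(["start quiz"] + ["next question"] * 1000 + ["quit"])
```

In this run the pool holds one target in the first turn and two in every later turn, regardless of how deep the dialogue goes.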

The invention has been described above by way of example with reference to the accompanying drawings. Obviously, the specific implementation of the invention is not limited to the manner described above; various insubstantial modifications made using the inventive concept and technical solution of the invention, or the direct application of the inventive concept and technical solution to other occasions without modification, all fall within the protection scope of the invention.

Claims

1. An infinite-level multi-turn dialogue method for speech recognition, characterized in that: a function card model file is formulated, the card types of the function card model file comprise main cards and alternative cards, the content of the function card model file comprises a card type ID, a card ID, the corresponding semantics of the card, a card instruction and a next-level dialogue ID of the card, and the card ID of each alternative card is a next-level dialogue ID of a main card or of an alternative card; after voice input is received and analyzed to obtain semantics, the semantics are matched one by one against the corresponding semantics of the cards; when a card is matched, the corresponding card instruction is executed and the alternative cards are reloaded into the pool to be matched according to the card IDs contained in the matched card's next-level dialogue ID; after a card reset the pool to be matched contains only main cards as matching objects, and in every turn after the first level the matching objects are the alternative cards reloaded after the previous turn was matched; and the dialogue ends when the matched card has no next-level dialogue ID or when an undefined abnormal condition occurs.

2. The infinite-level multi-turn dialogue method for speech recognition according to claim 1, characterized in that: the system formulates the function card model file according to a set semantic understanding protocol, the function card model file is organized in groups of main cards and alternative cards, and the method specifically comprises the following steps:

3. The infinite-level multi-turn dialogue method for speech recognition according to claim 2, characterized in that: the method maintains a parameter, the abnormal count, which records how many times a defined abnormal condition has occurred and whose initial value is 0; if step four determines that a defined abnormal condition has occurred, the abnormal count is incremented by 1 and then checked; if the abnormal count is less than 3, the user is prompted to repeat the dialogue voice and the method returns to step two to analyze the semantics again; otherwise step six is executed.

4. The infinite-level multi-turn dialogue method for speech recognition according to claim 3, characterized in that: after the system executes a card instruction, if the cards have not been reset, a timer starts timing, and the timer is reset when the system receives the next turn of the dialogue; when the elapsed time exceeds a set first threshold, the system determines that an abnormal condition has occurred; if the abnormal count is less than 3, the user is prompted to speak and timing is restarted; and the timer is reset and stopped when the cards are reset.

5. The infinite-level multi-turn dialogue method for speech recognition according to claim 4, characterized in that: the next-level dialogue ID of an alternative card may be empty or may contain the card IDs of one or more alternative cards; when alternative cards are reloaded, all card IDs contained in the next-level dialogue ID are read out, and the alternative card corresponding to each card ID is reloaded into the pool to be matched.

6. The infinite-level multi-turn dialogue method for speech recognition according to claim 5, characterized in that: the defined abnormal conditions comprise that the semantics obtained by analysis are normal but cannot be matched with any card, and that no dialogue voice is received before the timeout; the undefined abnormal conditions comprise that the received voice cannot be analyzed to obtain semantics, and that the semantics contain abnormal characters.