CN114064858A - Dialogue processing method and device for dialogue robot, electronic equipment and medium - Google Patents

Dialogue processing method and device for dialogue robot, electronic equipment and medium Download PDF

Info

Publication number
CN114064858A
CN114064858A (application CN202111432736.2A)
Authority
CN
China
Prior art keywords
intention
user
robot
voice output
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111432736.2A
Other languages
Chinese (zh)
Inventor
高乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202111432736.2A priority Critical patent/CN114064858A/en
Publication of CN114064858A publication Critical patent/CN114064858A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Manipulator (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a dialogue processing method, apparatus, medium, and terminal for a dialogue robot. The method includes: acquiring the user's real-time voice data and executing robot voice output according to a preset flow; when a conflict occurs, obtaining text information and the user's voice interval time from the real-time voice data; judging whether the user's intention is the original intention or a new intention based on the in-progress text, the interval time, and the user intention; and, according to the judgment result, executing the robot voice-output strategy and content corresponding to each intention under a preset mapping relation. For dialogue scenes in which the customer interrupts or barges in, the method first judges the conflict type and then recognizes the intention, executing separate processing flows for interruption and barge-in. This avoids ignoring the customer's real request or producing misunderstandings, enables the dialogue robot to cope with situations that frequently occur in real scenes, and greatly improves the robot's conversational ability when challenged by the customer.

Description

Dialogue processing method and device for dialogue robot, electronic equipment and medium
Technical Field
The present invention relates to the field of computer applications, and in particular, to a method and an apparatus for processing a dialog of a dialog robot, an electronic device, and a medium.
Background
With the rapid development of artificial intelligence, dialogue robots are widely used. Typically, a natural language processing system is mounted: when a question is put to the dialogue robot, it captures the input keywords, finds the most suitable answer in a database through an algorithm, and gives the corresponding reply.
However, the logic of human conversation varies widely; many customers do not follow the robot's conversational rules at all and do not speak at regular intervals. In a real scene, a customer may interrupt the robot mid-sentence, for example cutting in one second after the robot starts speaking, or may barge in just as the robot prepares to broadcast, supplementing what was said before. Existing robots cannot cope with these complex situations: they either ignore the customer's real request or misunderstand it, so the conversation cannot continue, the AI robot appears unintelligent in chat interaction, and the user experience degrades.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention provides a method, an apparatus, a medium and a terminal for processing dialog of a dialog robot, so as to solve the above-mentioned technical problems.
The invention provides a dialogue processing method of a dialogue robot, which comprises the following steps:
acquiring real-time voice data of a user, and executing robot voice output according to a preset flow;
when the real-time voice data and the robot voice output conflict in the time dimension, acquiring text information and the user's voice interval time from the real-time voice data, wherein the voice-output conflict types include a first conflict type representing an interruption;
when the conflict is of the first conflict type, executing a first processing flow according to the word count of the in-progress text in the text information, the first processing flow comprising stopping the robot voice output and performing intention recognition after the user stops speaking;
if the recognition result is a non-interrupting intention, continuing the voice output according to the original flow;
and if the recognition result is an interrupting intention, executing a second processing flow, which comprises judging whether the user's intention is the original intention or a new intention according to the in-progress text, the interval time, and the user intention, and, according to the judgment result, executing the robot voice-output strategy and content corresponding to each intention under a preset mapping relation.
In an embodiment of the present invention, the voice-output conflict types further include a second conflict type representing a barge-in (call preemption);
when the conflict is of the second conflict type, the robot is triggered to splice the original content with the user's barge-in content so as to judge the user's intention anew, and to execute the robot voice-output strategy and content corresponding to each intention under the preset mapping relation.
In an embodiment of the present invention, when the conflict is of the first conflict type, the word count of the in-progress text is compared with a preset word-count threshold:
if the word count of the in-progress text is greater than the preset threshold, the robot voice output is stopped, and intention recognition is performed after the user's voice output ends;
if the word count of the in-progress text is less than or equal to the preset threshold, intention recognition is performed directly; if the recognition result is an interrupting intention, the robot voice output is stopped; if the result is a non-interrupting intention, the robot voice output continues.
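The two-way threshold comparison above can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold value, the stub classes, and the keyword-based intent recognizer are assumptions for demonstration only.

```python
# Minimal sketch of first-conflict handling: branch on the word count of the
# user's in-progress text. All names and the threshold value are hypothetical.
WORD_COUNT_THRESHOLD = 5  # preset word-count threshold (assumed value)

class RobotStub:
    """Stand-in for the robot's voice-output channel."""
    def __init__(self):
        self.playing = True

    def stop_output(self):
        self.playing = False

    def continue_output(self):
        self.playing = True

def recognize_intent_stub(text):
    # Toy recognizer; a real system would call the NLP pipeline instead.
    return "interrupt" if "stop" in text else "non-interrupt"

def handle_first_conflict(in_progress_text, robot, recognize_intent):
    if len(in_progress_text.split()) > WORD_COUNT_THRESHOLD:
        # Long utterance: stop output first, recognize after the user finishes.
        robot.stop_output()
        return recognize_intent(in_progress_text)
    # Short utterance: recognize immediately, stop only on an interrupting intent.
    intent = recognize_intent(in_progress_text)
    if intent == "interrupt":
        robot.stop_output()
    else:
        robot.continue_output()
    return intent
```

Note that the two branches differ only in when recognition happens relative to stopping playback, which is the core of the distinction between the two interruption cases described later in the embodiment.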
In an embodiment of the present invention, under the first conflict type, the method further includes:
if the word count of the in-progress text is less than or equal to the preset threshold and the interrupting intention is judged to be of the resume-play class, the interruption is not processed and the voice is played and output normally;
if the word count of the in-progress text is less than or equal to the preset threshold and the interrupting intention is judged to be objection-then-resume, a preset objection script is played; playback stops when the user continues speaking, and after the user is again judged to be in the resume-play class, the original voice is played and output normally;
if the word count of the in-progress text is less than or equal to the preset threshold and the interrupting intention is judged to be an objection, preset objection and auxiliary scripts are played; playback stops when the user continues speaking, the user's intention is judged again, and the voice is replayed and output according to the user's final intention;
and if the word count of the in-progress text is less than or equal to the preset threshold and the interrupting intention is judged to be a rejection, the robot voice output is ended.
In an embodiment of the present invention, under the first conflict type, the method further includes:
if the word count of the in-progress text is greater than the preset threshold and the user continues speaking, playback stops; if the intention of the continued speech is of the resume-play class, the original voice is played and output normally after the user finishes speaking;
if the word count of the in-progress text is greater than the preset threshold and the user continues speaking, playback stops; if the intention is objection-then-resume, a preset transition script is played first after the user finishes speaking, and then the original voice is played normally;
if the word count of the in-progress text is greater than the preset threshold and the user continues speaking, playback stops; if the intention is a rejection, the robot voice output is ended.
In an embodiment of the present invention, a time threshold is preset; when the interval between two consecutive utterances of the user's voice output is smaller than the time threshold, the conflict is judged to be of the second conflict type;
when the second conflict type is judged, the previous intention and the current intention are merged, multi-intention processing is performed in the turn that matched the previous intention, the robot's waiting time is adjusted when barge-ins occur several times in succession, and a preset linking script is played; after the barge-in ends, the script is played according to the merged intention without repeating the content already broadcast before the barge-in.
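The barge-in test and the intent merge described above can be sketched as below. The 3-second threshold and the order-preserving deduplicating merge policy are assumptions for illustration; the patent does not fix either value.

```python
# Hedged sketch of second-conflict (barge-in) detection and intent merging.
TIME_THRESHOLD_S = 3.0  # preset time threshold (assumed value)

def is_barge_in(interval_s):
    """An utterance counts as a barge-in when the gap between the user's two
    consecutive utterances is below the preset time threshold."""
    return interval_s < TIME_THRESHOLD_S

def merge_intents(previous, current):
    """Combine previous and current intents for multi-intent processing in the
    turn that matched the previous intention (deduplicated, order preserved)."""
    merged = list(previous)
    for intent in current:
        if intent not in merged:
            merged.append(intent)
    return merged
```

After the merge, the response would be generated once for the combined intent list, which is what prevents the already-broadcast content from being repeated.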
In an embodiment of the present invention, the intention identification includes:
setting a main natural language processor and a plurality of sub natural language processors, and acquiring the intention of the real-time voice data through the main natural language processor;
dispatching the real-time voice data to the sub natural language processors, wherein the main natural language processor and the sub natural language processors have different query intents;
acquiring the intention recognition results of the plurality of sub natural language processors, and feeding all recognition results back to the main natural language processor;
and evaluating all recognition results according to their confidence degrees, the main natural language processor selecting one recognition result as the final recognized intention according to the evaluation.
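The selection step of this main/sub processor arrangement reduces to picking the highest-confidence result among those fed back. The sketch below assumes each sub-processor returns an `(intent, confidence)` pair; the function name and data shape are illustrative, not the patent's API.

```python
# Sketch of the main processor's evaluation step: choose the best-scoring
# intent among the sub natural language processors' results.
def select_final_intent(sub_results):
    """sub_results: list of (intent, confidence) pairs fed back by the sub
    processors. Returns the highest-confidence intent, or None if empty."""
    if not sub_results:
        return None
    best_intent, _ = max(sub_results, key=lambda pair: pair[1])
    return best_intent
```

A real deployment would likely also apply tie-breaking or a minimum-confidence floor, but the source does not specify these, so they are omitted here.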
The invention also provides a dialogue processing device of a dialogue robot, comprising:
a voice acquisition module for acquiring the user's real-time voice data;
a voice output module for executing robot voice output according to a preset flow;
a processing module comprising a recognition unit and a control unit;
when the real-time voice data conflicts with the robot voice output in the time dimension, the recognition unit acquires text information and the user's voice interval time from the real-time voice data, the voice-output conflict types including a first conflict type representing an interruption;
when the conflict is of the first conflict type, the control unit executes a first processing flow according to the word count of the in-progress text in the text information, the first processing flow comprising stopping the robot voice output, waiting for the user to stop speaking, and performing intention recognition through the recognition unit;
if the recognition result is a non-interrupting intention, the voice output continues according to the original flow;
if the recognition result is an interrupting intention, a second processing flow is executed, which comprises judging whether the user's intention is the original intention or a new intention according to the in-progress text, the interval time, and the user intention, and executing the robot voice-output strategy and content corresponding to each intention under the preset mapping relation according to the judgment result.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method according to any one of the preceding claims when executing the computer program.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in any one of the above.
The invention has the following beneficial effects: for the complex problems in dialogue scenes where a customer interrupts or barges in, the invention first judges the conflict type and then recognizes the intention, executing separate processing flows for interruption and barge-in. The customer's real request is therefore not ignored and no misunderstanding arises; the dialogue robot can handle situations that frequently occur in real scenes, and its conversational ability when challenged by the customer is greatly improved.
In addition, the invention splices the earlier content with the barge-in content and recognizes the intention once more, which greatly improves recognition accuracy and prevents the intention the customer really wants to express from being ignored because of the barge-in. The invention also handles customer interruptions and barge-ins such that, when the robot needs to continue along its flow, it responds smoothly and unobtrusively through sentence cutting and linking scripts, and a newly broadcast script does not repeat the part already broadcast; the robot therefore expresses itself more naturally, closer to a human, improving the user's conversational experience.
Drawings
Fig. 1 is a flow chart illustrating a conversation processing method of a conversation robot according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an intention recognition flow of a conversation processing method of a conversation robot in an embodiment of the present invention.
Fig. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a hardware structure of an electronic device according to another embodiment of the present invention.
Fig. 5 is a schematic diagram of a hardware configuration of a dialogue processing device of the dialogue robot of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present invention, however, it will be apparent to one skilled in the art that embodiments of the present invention may be practiced without these specific details, and in other embodiments, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring embodiments of the present invention.
As shown in fig. 1, the conversation processing method of the conversation robot in the present embodiment includes:
s1, acquiring real-time voice data of a user, and executing robot voice output according to a preset flow;
s2, when real-time voice data and robot voice output conflict in a time dimension, acquiring character information and user voice interval time according to the real-time voice data, wherein the type of voice output conflict comprises a first conflict type used for representing interruption;
s3, when the user is in the first conflict type, executing a first processing flow according to the word number of the process words in the word information, wherein the first processing flow comprises stopping the voice output of the robot, and performing intention identification after the user stops outputting;
s4, if the recognition result is the non-interruption intention, continuing to execute voice output according to the original flow;
and S5, if the recognition result is the intention interruption, executing a second processing flow, wherein the second processing flow comprises the steps of judging whether the intention of the user is the original intention or the new intention according to the process characters, the interval time and the intention of the user, and executing the strategy and the content of the robot voice output corresponding to different intentions according to the preset mapping relation according to the judgment result.
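Steps S1–S5 amount to a small dispatch over (conflict present, recognition result, intent class). The sketch below is a minimal control loop under that reading; the mapping table is a hypothetical example of the "preset mapping relation", and the stubbed classifier stands in for the recognition pipeline.

```python
# Minimal control-loop sketch of steps S1-S5. The mapping table and all
# names are illustrative assumptions, not the patented implementation.
RESPONSE_MAP = {            # preset mapping: intent -> (strategy, content)
    "original": ("resume", "continue original script"),
    "new":      ("switch", "answer the new question"),
}

def dispatch(conflict, recognized, classify_intent):
    """Return the robot's next (strategy, content) per steps S3-S5."""
    if not conflict:
        return ("play", "preset flow")       # S1: no conflict, follow the flow
    if recognized == "non-interrupt":
        return ("play", "preset flow")       # S4: continue as before
    user_intent = classify_intent()          # S5: original vs. new intention
    return RESPONSE_MAP[user_intent]
```

In the embodiment, `classify_intent` would combine the in-progress text, the interval time, and the recognized user intention; here it is a callable so the branching itself can be checked in isolation.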
In step S1 of this embodiment, the user's real-time voice data is first acquired; when no interruption or barge-in occurs, the robot voice output proceeds according to the preset flow. The user's speech may be captured in real time through ASR (Automatic Speech Recognition), which in this embodiment converts the speech into text. Other techniques achieving the same function may of course be substituted, and are not described again here.
In step S2 of this embodiment, when the real-time voice data and the robot voice output conflict in the time dimension, the text information and the user's voice interval time are acquired from the real-time voice data; the voice-output conflict types include a first conflict type representing an interruption. When the user's voice conflicts with the robot's voice output, it is first judged whether the conflict is an interruption and then whether it is a barge-in.
In steps S4 and S5 of the present embodiment, if the recognition result is a non-interrupting intention, the voice output continues as before; if it is an interrupting intention, a second processing flow is executed, which comprises judging whether the user's intention is the original intention or a new intention according to the in-progress text, the interval time, and the user intention, and executing the robot voice-output strategy and content corresponding to each intention under the preset mapping relation according to the judgment result.
In this embodiment, the voice-output conflict types further include a second conflict type representing a barge-in; when the conflict is of the second conflict type, the robot is triggered to splice the original content with the user's barge-in content so as to judge the user's intention anew, and to execute the robot voice-output strategy and content corresponding to each intention under the preset mapping relation.
In this embodiment, whether the first conflict type constitutes an interruption can be judged via a preset word-count threshold. An "intention" here is computer-readable data representing the meaning that a computer system component has recognized in a natural language query; intentions may be pre-classified based on the NLP (natural language processing) recognition result, in this embodiment for example into resume-play, objection-then-resume, objection, and rejection classes. The interruption processing flow in this embodiment may cover two cases:
Interruption case 1: if the word count of the in-progress text is greater than the preset word-count threshold, the robot voice output is stopped, and intention recognition is performed after the user's voice output ends.
Specifically, the sub-cases where the word count is at or below the threshold (interruption case 2) include:
if the word count of the in-progress text is less than or equal to the preset threshold and the interrupting intention is judged to be of the resume-play class, the interruption is not processed and the voice is played and output normally;
if the word count of the in-progress text is less than or equal to the preset threshold and the interrupting intention is judged to be objection-then-resume, a preset objection script is played; playback stops when the user continues speaking, and after the user is again judged to be in the resume-play class, the original voice is played and output normally;
if the word count of the in-progress text is less than or equal to the preset threshold and the interrupting intention is judged to be an objection, preset objection and auxiliary scripts are played; playback stops when the user continues speaking, the user's intention is judged again, and the voice is replayed and output according to the user's final intention;
and if the word count of the in-progress text is less than or equal to the preset threshold and the interrupting intention is judged to be a rejection, the robot voice output is ended.
Interruption case 2: if the word count of the in-progress text is less than or equal to the preset word-count threshold, intention recognition is performed; if the result is an interrupting intention, the robot voice output is stopped; if it is a non-interrupting intention, the robot voice output continues and the broadcast is unaffected.
Specifically, the sub-cases where the word count exceeds the threshold (interruption case 1) include:
if the word count of the in-progress text is greater than the preset threshold and the user continues speaking, playback stops; if the intention of the continued speech is of the resume-play class, the original voice is played and output normally after the user finishes speaking;
if the word count of the in-progress text is greater than the preset threshold and the user continues speaking, playback stops; if the intention is objection-then-resume, a preset transition script is played first after the user finishes speaking, and then the original voice is played normally;
if the word count of the in-progress text is greater than the preset threshold and the user continues speaking, playback stops; if the intention is a rejection, the robot voice output is ended.
in this embodiment, based on the ASR-processed characters, the break logic under different conditions may be preset according to the sub-spitting text:
Case 101 — no interruption, play normally: if the user does not interrupt, the robot plays the voice output normally;
Case 102 — resume-play intent, playback not interrupted: if the number of words spoken by the user is at or below the preset threshold and the intent is judged to be resume-play, the intent is not processed and the robot plays the voice output normally;
Case 103 — objection-then-resume intent, playback interrupted: if the word count is at or below the threshold and the intent is judged objection-then-resume, the robot plays the objection script; when the customer continues speaking, playback stops, and once the customer is again judged to be in the resume-play class, the robot plays the original voice normally;
Case 104 — objection intent: if the word count is at or below the threshold and the intent is judged to be an objection, the robot plays the objection and auxiliary scripts; when the customer continues speaking, playback stops, the intention is judged again, and the robot replays the voice output according to the customer's final intention;
Case 105 — rejection intent: if the word count is at or below the threshold and the intent is judged to be a rejection, the robot voice output ends.
Case 206 — resume-play intent, word count above threshold: if the number of words spoken by the user exceeds the preset threshold, playback stops when the customer continues speaking; if the intent is judged resume-play, the robot plays the original voice normally after the customer finishes speaking;
Case 207 — objection-then-resume intent, word count above threshold: if the number of words spoken exceeds the preset threshold, playback stops when the customer continues speaking; if the intent is judged objection-then-resume, the robot first plays a transition script after the customer finishes, then plays the original voice normally.
Case 208 — rejection intent, word count above threshold: if the number of words spoken exceeds the preset threshold and the intent is judged to be a rejection, the robot voice output ends.
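The eight numbered cases form a decision table over two inputs: whether the word count exceeds the threshold, and the recognized intent class. The sketch below encodes that table directly; the case mapping follows the text above, while the threshold value, intent labels, and action strings are illustrative assumptions.

```python
# Decision-table sketch of cases 101-208: (word count over threshold?, intent)
# -> action. Labels and actions paraphrase the text; names are hypothetical.
ACTIONS = {
    (False, "resume"):           "play normally",                          # 102
    (False, "objection_resume"): "play objection script, then resume",     # 103
    (False, "objection"):        "play objection + auxiliary script, re-recognize",  # 104
    (False, "reject"):           "end output",                             # 105
    (True,  "resume"):           "stop, resume after user finishes",       # 206
    (True,  "objection_resume"): "stop, play transition script, then resume",  # 207
    (True,  "reject"):           "end output",                             # 208
}

def decide(word_count, intent, threshold=5):
    over = word_count > threshold
    # Case 101: no interruption / no matching intent -> keep playing normally.
    return ACTIONS.get((over, intent), "play normally")
```

Encoding the logic as a table rather than nested conditionals makes the symmetry between the at-or-below-threshold cases (10x) and the above-threshold cases (20x) explicit, and makes it easy to add intent classes.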
In this embodiment, across the two interruption cases, case 1 produces an intention-recognition result, while case 2 triggers the robot to stop voice output only when the recognition result is an interrupting intention. After either case, it must further be judged whether the user is barging in. A time threshold is preset; when the interval between two consecutive utterances of the user's voice output is smaller than this threshold, the conflict is judged to be of the second conflict type. On that judgment, the previous intention and the current intention are merged, multi-intention processing is performed in the turn that matched the previous intention, the robot's waiting time is adjusted when barge-ins occur several times in succession, a preset linking script is played, and already-broadcast content is not repeated. Concretely, whether the barge-in rule is met may be judged from the interval between the user's two consecutive utterances: for example, with a preset threshold of 3 seconds, if the interval is below the threshold, the barge-in processing flow is triggered according to the user's intention, comprising merging the previous and current intentions and performing multi-intention processing in the turn matching the previous intention. When several successive barge-ins occur, a preset handling mode may be triggered, such as adjusting the waiting time or adding a linking script (for example, a brief acknowledgement filler).
When the interval between the two utterances is greater than or equal to the time threshold, trunk-branch matching is performed:
if the trunk branch is matched, the core-meaning rule is enabled, and the main and auxiliary scripts have not been fully broadcast, the user's intention is ignored and the linking and auxiliary scripts are broadcast directly;
if the trunk branch is not matched, or it is matched but the core-meaning rule is not enabled and the main and auxiliary scripts have been fully broadcast, matching proceeds normally according to the intention. In this embodiment, the ASR mode must output text word by word; if no new text is output for more than 3 s and no end mark is given, the robot hangs up and records a log. The script branches in this embodiment mainly include a trunk branch, an objection branch, and an ending branch. The trunk branch broadcasts the linking and auxiliary scripts, ignoring the customer's intent. The objection branch replies to an objection, ignores further objection actions after the reply, and broadcasts the linking and auxiliary scripts (the auxiliary scripts still follow the rule of sequential cyclic broadcast). The ending branch performs normal matching according to intention.
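The three-way branch routing above can be sketched as a small selector. This is a reading of the rules, not the patent's implementation: the trigger for the objection branch (an objection intent) is an assumption inferred from the branch's description, and all names are hypothetical.

```python
# Illustrative routing among the trunk, objection, and ending script branches.
def route_branch(matched_trunk, core_rule_on, fully_broadcast, intent=None):
    """Pick a script branch per the rules above.

    matched_trunk:   whether the utterance matched the trunk branch
    core_rule_on:    whether the core-meaning rule is enabled
    fully_broadcast: whether the main and auxiliary scripts finished playing
    intent:          recognized intent (objection trigger is an assumption)
    """
    if matched_trunk and core_rule_on and not fully_broadcast:
        return "trunk"      # ignore intent; broadcast linking + auxiliary scripts
    if intent == "objection":
        return "objection"  # reply to the objection, then linking + auxiliary
    return "ending"         # normal matching according to intention
```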
In this embodiment, after the first processing flow and the second processing flow, when the robot needs to continue the subsequent flow, the script is cut, the preset linking phrase is prepended, and the result is broadcast; parts that have already been broadcast are not broadcast again. In this way the robot joins scripts smoothly and sounds less abrupt, and the re-broadcast script never repeats the previously broadcast part, so the robot's expression during the dialog is more natural.
In this embodiment, when a joined broadcast is needed, an identifier indicating whether the sentence is a "continue broadcast" is added to the TTS request, and the TTS engine splices and returns audio according to the dialog manager's (DM's) continue-broadcast identifier. TTS (Text-To-Speech, speech synthesis) converts text into voice and mainly comes in two types: the splicing method and the parametric method. The splicing method selects the required basic units from speech recorded in advance and concatenates them. Units may be syllables, phonemes, and so on; to improve the continuity of the synthesized speech, diphones are also commonly used as units. Its advantage is higher speech quality; its disadvantage is the large database required — typically tens of hours of finished corpus, at high cost. The parametric method generates speech parameters (including fundamental frequency, formant frequencies, etc.) at every moment from a statistical model and then converts the parameters into a waveform. It is mainly divided into three modules: front end, back end, and vocoder. The front end analyzes the text to determine the pronunciation of each character, the tone of the sentence, the rhythm with which it is read, which places should be emphasized, and so on. Common prosody-related annotations include, but are not limited to, prosodic boundaries, accents, boundary tones, and even emotions. The database requirement is relatively small, and optionally the parametric method may be adopted in this embodiment for the joined broadcast. A splicing operation, which requires a voice library covering all scenes, may be as in Table 1:
TABLE 1 (example splicing operations; the table figures are not reproduced in this text)
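The continue-broadcast splicing described above might look roughly like this; the request format and engine interface are assumptions, and the "voice library" is reduced to a character map purely for illustration:

```python
# Hedged sketch of the "continue broadcast" TTS request and splicing engine.
def build_tts_request(text, continue_broadcast):
    # The DM attaches an identifier marking whether this sentence resumes a
    # previously interrupted broadcast.
    return {"text": text, "continue_broadcast": continue_broadcast}

class SplicingTTSEngine:
    """Toy engine: concatenates pre-recorded unit clips and, when the DM sets
    the continue-broadcast identifier, skips the part already broadcast."""

    def __init__(self, voice_library):
        self.voice_library = voice_library  # unit -> audio clip (here: str)
        self.position = 0                   # characters already broadcast

    def synthesize(self, request):
        text = request["text"]
        if request["continue_broadcast"]:
            text = text[self.position:]     # do not re-broadcast played parts
        self.position += len(text)
        return "".join(self.voice_library.get(ch, ch) for ch in text)
```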
In this embodiment, natural language query intent assignment may be combined in, matching a particular natural language query against intents from multiple intent matchers and pointing it to the appropriate dialog query processor, which further enables the robot to join scripts smoothly without sounding abrupt. For example, for the user's utterance "I'm hungry," the query may be matched against multiple extended natural language processors, each capable of processing the query and generating an intent for it (e.g., order pizza, order coffee). Each extended natural language processor may be independent of the system's main natural language processor, may return at least one intent, and may also return one or more entities for a natural language query. Each extended natural language processor may also produce intents independently, without awareness of the other processors, and may identify the intent of a query using its own form of natural-language matching. A way to disambiguate between multiple intent matchers and dialog query processors may also be provided — possibly receiving user input selections, such as a choice between different intents or different dialog query processors — by using data such as user rankings, user preferences, contextual information, and/or user profiles. As shown in fig. 2, this specifically comprises:
S601, setting a main natural language processor and a plurality of sub natural language processors, and acquiring the intention of the real-time voice data through the main natural language processor;
S602, distributing the real-time voice data to the sub natural language processors, wherein the main natural language processor and the sub natural language processors have different query intentions;
S603, acquiring the intention recognition results of the plurality of sub natural language processors, and feeding all recognition results back to the main natural language processor;
S604, evaluating all recognition results according to their confidence degrees, the main natural language processor selecting one recognition result as the final recognition intention result according to the evaluation.
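Steps S601–S604 can be sketched as a simple fan-out with confidence-based selection; the keyword matching and the fixed 0.9 confidence below are stand-ins for a real NLP model, and all class and method names are illustrative assumptions:

```python
# Minimal sketch of the main/sub natural-language-processor arbitration.
class SubNLP:
    def __init__(self, name, keywords, intent):
        self.name, self.keywords, self.intent = name, keywords, intent

    def parse(self, query):
        # S602/S603: return an (intent, confidence) result, or None if this
        # processor cannot understand the query.
        if any(k in query for k in self.keywords):
            return {"intent": self.intent, "confidence": 0.9, "source": self.name}
        return None

def main_nlp(query, sub_processors):
    # S603: collect every recognition result that came back.
    results = [r for p in sub_processors if (r := p.parse(query)) is not None]
    # S604: evaluate by confidence and select one final recognition result.
    return max(results, key=lambda r: r["confidence"], default=None)
```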
In this embodiment, the main natural language processor may send a query to a large set of component natural language processors, of which only a subset are able to understand the particular query and return a corresponding intent. If the main natural language processor sends the corpus query "I'm hungry" to multiple different sub natural language processors, a sub processor extended for food ordering may be programmed to return an intent for the query, and another for making restaurant reservations may also be programmed to return an intent, while one extended for scheduling taxi rides may not be. Thus, the main natural language processor may send the query to all three extended natural language processors, but it may receive back only a "food order" intent from the food-ordering processor and a "make a reservation" intent from the restaurant-reservation processor. The main natural language processor may then match each intent with the corresponding dialog query processor. For example, the extended natural language processor may be part of the same extension as the corresponding dialog query processor, as indicated to the main natural language processor when the extension is registered. Thus, after receiving an intent from a particular sub natural language processor, the main natural language processor may look up that processor's extension registry entry and find data identifying the corresponding dialog query processor. As another example, along with returning the intent, the sub natural language processor may also return an identifier (e.g., an address) of the dialog query processor that handles the intent.
Such an identifier may be used by the main natural language processor to match the received intent to the matching dialog query processor.
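The registry lookup described above might be sketched as follows; the registry contents and field names are hypothetical:

```python
# Assumed sketch of routing a returned intent to its dialog query processor.
REGISTRY = {
    # extension (sub-processor) name -> dialog query processor id (illustrative)
    "food_order_nlp": "food_order_dialog_processor",
    "restaurant_nlp": "reservation_dialog_processor",
}

def route_intent(intent, registry=REGISTRY):
    # Prefer an identifier returned alongside the intent; otherwise look up
    # the registry entry for the sub-processor that produced the intent.
    if "processor_id" in intent:
        return intent["processor_id"]
    return registry.get(intent["source"])
```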
Accordingly, as shown in fig. 5, the present embodiment further provides a dialogue processing apparatus for a dialogue robot, including:
a voice acquisition module 101 for acquiring real-time voice data of a user;
the voice output module 102 is used for executing robot voice output according to a preset flow;
a processing module 103 comprising an identification unit and a control unit;
when real-time voice data and robot voice output conflict in a time dimension, the recognition unit acquires character information and user voice interval time according to the real-time voice data, wherein the type of voice output conflict comprises a first conflict type used for representing interruption;
when the conflict is of the first conflict type, the control unit executes a first processing flow according to the word number of the process words in the text information, wherein the first processing flow comprises stopping the robot voice output, waiting for the user to stop outputting, and performing intention recognition through the recognition unit;
if the recognition result is the non-interruption intention, continuing to execute voice output according to the original flow;
and if the recognition result is the interrupting intention, executing a second processing flow, wherein the second processing flow comprises judging whether the user intention is the original intention or the new intention according to the process characters, the interval time and the user intention, and executing strategies and contents of robot voice output corresponding to different intentions according to a preset mapping relation according to the judgment result.
In this embodiment, the real-time voice data of the user is first acquired through the voice acquisition module, and in the absence of interruption or barge-in the robot voice output is executed according to the preset flow. The recognition unit may convert the user speech acquired in real time into text through ASR (Automatic Speech Recognition).
In this embodiment, when the real-time voice data conflicts with the robot voice output in the time dimension, the text information and the user voice interval time are acquired from the real-time voice data, and the types of voice output conflicts include a first conflict type representing an interruption. When the user voice and the robot voice output conflict, it is first determined whether the conflict is an interruption, and then whether it is a barge-in.
In the embodiment, if the recognition result is the non-interruption intention, the voice output is continuously executed according to the original flow; and if the recognition result is the interrupting intention, executing a second processing flow, wherein the second processing flow comprises judging whether the user intention is the original intention or the new intention according to the process characters, the interval time and the user intention, and executing strategies and contents of robot voice output corresponding to different intentions according to a preset mapping relation according to the judgment result.
In this embodiment, the types of voice output conflicts further include a second conflict type indicating a barge-in (call preemption); when the conflict is of the second conflict type, the robot is triggered to splice its original content with the user's barge-in content and then play the result, so as to judge the user intention again and execute the strategy and content of the robot voice output corresponding to the different intentions according to the preset mapping relation.
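The splice-and-rejudge step above can be sketched as follows; the function signature and the policy map are illustrative assumptions:

```python
# Illustrative sketch of second-conflict-type handling: splice the user's
# earlier utterance with the barge-in utterance, re-run intent recognition,
# and look up the preset mapping from intent to output strategy.
def rejudge_intent(original_text, barge_in_text, recognize, policy_map):
    spliced = f"{original_text} {barge_in_text}".strip()  # splice the two utterances
    intent = recognize(spliced)                           # judge the intention again
    # Preset mapping relation: intent -> voice-output strategy/content.
    return policy_map.get(intent, policy_map.get("default"))
```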
In this embodiment, whether the first conflict type constitutes an interruption can be judged by presetting a word-number threshold. The "intention" in this embodiment is computer-readable data indicating the meaning that a computer system component has recognized a natural language query as expressing; intentions may be classified in advance based on the NLP (natural language processing) recognition result, for example, in this embodiment, into a continuation class, an objection-continuation class, an objection class, and a rejection class. The interruption processing flow in this embodiment may include two cases:
Interruption case 1: if the word number of the process words (the partial transcript produced so far) is larger than the preset word-number threshold, the robot voice output is stopped and, after the user voice output is finished, intention recognition is performed.
Specifically, interruption case 1 includes:
if the word number of the process words is larger than the preset word-number threshold and the user continues speaking, playing is stopped; if the intention of the continued speech is the continuation class, the original voice output is resumed normally after the user finishes speaking;
if the word number of the process words is larger than the preset word-number threshold and the user continues speaking, playing is stopped; if the intention of the continued speech is the objection-continuation class, a preset transition phrase is played first after the user finishes speaking, and then the original voice output is resumed normally;
if the word number of the process words is larger than the preset word-number threshold and the user continues speaking, playing is stopped; if the intention of the continued speech is rejection, the robot voice output is ended.
Interruption case 2: if the word number of the process words is less than or equal to the preset word-number threshold, intention recognition is performed directly; if the recognition result is an interrupting intention, the robot voice output is stopped, while if the recognition result is a non-interrupting intention, the robot voice output continues and the broadcast is unaffected.
Specifically, interruption case 2 includes:
if the word number of the process words is less than or equal to the preset word-number threshold and the interrupting intention is judged to be the continuation class, the interrupting intention is not processed and the voice output is played normally;
if the word number of the process words is less than or equal to the preset word-number threshold and the interrupting intention is judged to be the objection-continuation class, a preset objection script is played; playing stops if the user continues speaking, and after the user's continued speech is judged to be the continuation class the original voice output is played normally;
if the word number of the process words is less than or equal to the preset word-number threshold and the interrupting intention is judged to be the objection class, a preset objection script and an auxiliary script are played; playing stops if the user continues speaking, the user intention is judged again, and the voice output is replayed according to the user's final intention;
if the word number of the process words is less than or equal to the preset word-number threshold and the interrupting intention is judged to be rejection, the robot voice output is ended.
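The two interruption cases above can be condensed into a small decision routine; the word-number threshold value and the intent labels below are illustrative assumptions taken from this embodiment's description:

```python
# Consolidated sketch of interruption handling by word-count threshold.
WORD_THRESHOLD = 5  # illustrative preset word-number threshold

def handle_interruption(word_count, intent):
    if word_count <= WORD_THRESHOLD:
        # Few words: recognize the intention directly and act on it.
        if intent == "continue":           return "keep_playing"
        if intent == "objection_continue": return "play_objection_then_resume"
        if intent == "objection":          return "play_objection_and_auxiliary"
        if intent == "reject":             return "end_output"
        return "keep_playing"              # non-interrupting intent: keep broadcasting
    # Many words: stop output, wait for the user to finish, then act on intent.
    if intent == "continue":               return "resume_after_user_finishes"
    if intent == "objection_continue":     return "play_transition_then_resume"
    if intent == "reject":                 return "end_output"
    return "stop_and_recognize"
```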
In this embodiment, when a joined broadcast is required, an identifier indicating whether the sentence is a "continue broadcast" is added to the Text-To-Speech (TTS) request, and the TTS engine splices and returns audio according to this identifier. Text is converted into voice by TTS, which mainly comes in two types: the splicing method and the parametric method. The splicing method selects the required basic units from speech recorded in advance and concatenates them; units may be syllables, phonemes, and so on, and diphones are also commonly used as units to improve the continuity of the synthesized speech. Its advantage is higher speech quality; its disadvantage is the large database required — typically tens of hours of finished corpus, at high cost. The parametric method generates speech parameters (including fundamental frequency, formant frequencies, etc.) at every moment from a statistical model and then converts the parameters into a waveform. It is mainly divided into three modules: front end, back end, and vocoder. The front end analyzes the text to determine the pronunciation of each character, the tone of the sentence, the rhythm with which it is read, which places should be emphasized, and so on. Common prosody-related annotations include, but are not limited to, prosodic boundaries, accents, boundary tones, and even emotions. The database requirement is relatively small, and optionally the parametric method may be adopted in this embodiment for the joined broadcast.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements any of the methods in the present embodiments.
The present embodiment further provides an electronic terminal, including: a processor and a memory;
the memory is used for storing computer programs, and the processor is used for executing the computer programs stored by the memory so as to enable the terminal to execute the method in the embodiment.
The computer-readable storage medium in the present embodiment can be understood by those skilled in the art as follows: all or part of the steps for implementing the above method embodiments may be performed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The electronic device provided by the embodiment comprises a processor, a memory, a transceiver, and a communication interface. The memory and the communication interface are connected with the processor and the transceiver to complete mutual communication; the memory stores a computer program, the communication interface is used for communication, and the processor and the transceiver run the computer program so that the electronic terminal executes the steps of the above method.
As shown in fig. 3, the electronic device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Fig. 4 is a hardware structure of an electronic device provided in another embodiment, and the electronic device in this embodiment may include a second processor 1201 and a second memory 1202. The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment. The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory. The electronic device may further include: communication components 1203, power components 1204, multimedia components 1205, audio components 1206, input/output interfaces 1207, and/or sensor components 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
In this embodiment, the memory may include a random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In the above embodiments, unless otherwise specified, the description of common objects by using "first", "second", etc. ordinal numbers only indicate that they refer to different instances of the same object, rather than indicating that the objects being described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner. In the above-described embodiments, reference in the specification to "the embodiment," "an embodiment," "another embodiment," or "other embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments. The various appearances of the phrase "the present embodiment," "one embodiment," or "another embodiment" are not necessarily all referring to the same embodiment. If the specification states a component, feature, structure, or characteristic "may", "might", or "could" be included, that particular component, feature, structure, or characteristic is not necessarily included. If the specification or claim refers to "a" or "an" element, that does not mean there is only one of the element. If the specification or claim refers to "a further" element, that does not preclude there being more than one of the further element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The invention is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In the embodiments described above, although the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory structures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments. The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims.
The foregoing embodiments are merely illustrative of the principles of the present invention and its efficacy, and are not to be construed as limiting the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (10)

1. A dialogue processing method for a dialogue robot, comprising:
acquiring real-time voice data of a user, and executing robot voice output according to a preset flow;
when voice output conflict occurs between the real-time voice data and robot voice output in a time dimension, acquiring text information and user voice interval time according to the real-time voice data, wherein the type of the voice output conflict comprises a first conflict type used for representing interruption;
when the user is in the first conflict type, executing a first processing flow according to the word number of the process words in the word information, wherein the first processing flow comprises stopping the robot voice output, and performing intention identification after the user stops outputting;
if the recognition result is the non-interruption intention, continuing to execute voice output according to the original flow;
and if the recognition result is the interruption intention, executing a second processing flow, wherein the second processing flow comprises judging whether the user intention is an original intention or a new intention according to the process characters, the interval time and the user intention, and executing strategies and contents of robot voice output corresponding to different intentions according to a preset mapping relation according to the judgment result.
2. The dialogue processing method of a dialogue robot according to claim 1, wherein the types of the voice output conflict further comprise a second conflict type for indicating a barge-in;
and when the conflict is of the second conflict type, triggering the robot to splice its original content with the barge-in content of the user and then play the result, so as to judge the intention of the user again, and executing the strategy and the content of the robot voice output corresponding to different intentions according to the preset mapping relation.
3. The dialogue processing method of a dialogue robot according to claim 1, wherein when in the first conflict type, the word number of the process words is compared with a preset word number threshold value:
if the word number of the process characters is larger than a preset word number threshold value, stopping the robot voice output, and after the user voice output is finished, performing intention identification;
if the process character number is less than or equal to a preset character number threshold value, directly performing intention identification; if the recognition result is the interruption intention, stopping the robot voice output; and if the recognition result is the non-interruption intention, continuing the robot voice output.
4. The conversation process method of a conversation robot according to claim 3, further comprising, when in the first conflict type:
if the word number of the process characters is less than or equal to a preset word number threshold value and the interruption intention is judged to be a continuous playing type, the interruption intention of the continuous playing type is not processed, and the voice is normally played and output;
if the word number of the process characters is less than or equal to a preset word number threshold value and the interruption intention is judged to be objected continuous playing, playing a preset objection dialog, stopping playing when the user continues speaking, and normally playing the original voice output after the user's continued speech is judged to be continuous playing;
if the word number of the process characters is less than or equal to a preset word number threshold value and the interruption intention is judged to be objection, playing preset objection dialogues and auxiliary dialogues, stopping playing when the user continues speaking, judging the intention of the user again, and playing the voice again according to the final intention of the user for outputting;
and if the word number of the process characters is less than or equal to the preset word number threshold value and the interruption intention is judged to be refused, ending the robot voice output.
5. The conversation processing method of a conversation robot according to claim 3, further comprising, in the first conflict type:
if the word count of the in-progress text is greater than the preset word-count threshold and the user continues speaking, stopping playback; if the intention of the continued speech is of the continue-playing type, resuming normal playback of the original voice output after the user finishes speaking;
if the word count of the in-progress text is greater than the preset word-count threshold and the user continues speaking, stopping playback; if the intention of the continued speech is of the objection-then-continue type, first playing a preset transition script after the user finishes speaking, and then resuming normal playback of the original voice output;
and if the word count of the in-progress text is greater than the preset word-count threshold and the user continues speaking, stopping playback; if the intention of the continued speech is a refusal, ending the robot voice output.
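Claim 5 mirrors claim 4 for the above-threshold case: playback stops first, and the post-speech action depends on the intent of the user's continued speech. A sketch under the same illustrative labels:

```python
def dispatch_long_interruption(intent):
    """Map the intent of continued speech (above-threshold case) to an action.

    Assumes playback has already stopped; labels are illustrative.
    """
    if intent == "continue_play":
        return "resume_original_audio"
    if intent == "objection_continue":
        # Bridge with a preset transition script before resuming.
        return "play_transition_script_then_resume"
    if intent == "refusal":
        return "end_voice_output"
    return "resume_original_audio"
```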
6. The conversation processing method of a conversation robot according to claim 2, wherein:
a time threshold is preset, and when the interval between two consecutive sentences of the user's voice output is less than the time threshold, the user's voice output is determined to be in a second conflict type;
when the second conflict type is determined, the previous intention and the current intention are merged and handled as multiple intentions within the turn matching the previous intention; when the user barges in several times in succession, the robot's waiting time is adjusted and a preset bridging script is played; after the user's speech is completed, the script corresponding to the merged intention is played without repeating content already played before the interruption.
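The second-conflict detection and intent merging of claim 6 can be sketched as follows; the threshold value, the tuple layout, and the intent names are assumptions for illustration:

```python
TIME_THRESHOLD = 1.5  # preset time threshold in seconds (value is illustrative)

def merge_if_second_conflict(prev, curr):
    """Merge consecutive user intents when their utterances are close in time.

    prev: (intent, end_time) of the previous user utterance.
    curr: (intent, start_time) of the current user utterance.
    Returns the list of intents to handle in the current turn.
    """
    prev_intent, prev_end = prev
    curr_intent, curr_start = curr
    if curr_start - prev_end < TIME_THRESHOLD:
        # Second conflict type: treat both intents as one multi-intent turn.
        return [prev_intent, curr_intent]
    return [curr_intent]
```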
7. The conversation processing method of a conversation robot according to claim 1, wherein the intention recognition comprises:
setting a main natural language processor and a plurality of sub natural language processors, and acquiring the intention of the real-time voice data through the main natural language processor;
dispatching the real-time voice data to the sub natural language processors, wherein the main natural language processor and each sub natural language processor handle different query intentions;
acquiring the intention recognition results of the plurality of sub natural language processors and feeding all recognition results back to the main natural language processor;
and evaluating all recognition results according to their confidence degrees, the main natural language processor selecting one recognition result as the final intention recognition result according to the evaluation.
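The fan-out-and-arbitrate scheme of claim 7 can be sketched as a main processor that dispatches the utterance to several sub-processors and keeps the highest-confidence result. The sub-processor callables here are hypothetical stand-ins, not the patent's interfaces:

```python
def arbitrate(text, sub_processors):
    """Main-processor arbitration over sub natural language processors.

    Each sub-processor is a callable returning (intent, confidence);
    the result with the highest confidence is selected as final.
    """
    results = [proc(text) for proc in sub_processors]
    return max(results, key=lambda result: result[1])
```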
8. A conversation processing apparatus for a conversation robot, comprising:
a voice acquisition module for acquiring real-time voice data of a user;
a voice output module for executing robot voice output according to a preset flow; and
a processing module comprising a recognition unit and a control unit;
wherein, when the real-time voice data conflicts with the robot voice output in the time dimension, the recognition unit acquires text information and the user voice interval time from the real-time voice data, the voice output conflict types comprising a first conflict type representing an interruption;
in the first conflict type, the control unit executes a first processing flow according to the word count of the in-progress text in the text information, the first processing flow comprising stopping the robot voice output, waiting for the user to stop speaking, and performing intention recognition through the recognition unit;
if the recognition result is a non-interruption intention, voice output continues according to the original flow;
and if the recognition result is an interruption intention, the control unit executes a second processing flow comprising judging, according to the in-progress text, the interval time and the user intention, whether the user intention is the original intention or a new intention, and executing the robot voice output strategy and content corresponding to the judged intention according to a preset mapping relation.
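The module layout of claim 8 can be sketched as plain classes; the class names mirror the claim's modules, but the interfaces shown are assumptions for illustration, not the patent's API:

```python
from dataclasses import dataclass


class VoiceAcquisitionModule:
    """Acquires the user's real-time voice data."""
    def capture(self):
        raise NotImplementedError  # e.g. read frames from a microphone stream


class VoiceOutputModule:
    """Executes robot voice output according to a preset flow."""
    def play(self, script):
        raise NotImplementedError  # e.g. hand the script to a TTS engine


@dataclass
class ProcessingModule:
    """Bundles the two units the claim names inside the processing module."""
    recognition_unit: object  # extracts text info and user voice interval time
    control_unit: object      # runs the first/second processing flows
```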
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202111432736.2A 2021-11-29 2021-11-29 Dialogue processing method and device for dialogue robot, electronic equipment and medium Pending CN114064858A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111432736.2A CN114064858A (en) 2021-11-29 2021-11-29 Dialogue processing method and device for dialogue robot, electronic equipment and medium


Publications (1)

Publication Number Publication Date
CN114064858A true CN114064858A (en) 2022-02-18

Family

ID=80277017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111432736.2A Pending CN114064858A (en) 2021-11-29 2021-11-29 Dialogue processing method and device for dialogue robot, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN114064858A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114528822A (en) * 2022-02-25 2022-05-24 平安科技(深圳)有限公司 Conversation process control method, device, server and medium for customer service robot
WO2023159749A1 (en) * 2022-02-25 2023-08-31 平安科技(深圳)有限公司 Dialogue process control method and apparatus of customer service robot, server and medium
CN114528822B (en) * 2022-02-25 2024-02-06 平安科技(深圳)有限公司 Conversation flow control method and device of customer service robot, server and medium
CN116798427A (en) * 2023-06-21 2023-09-22 支付宝(杭州)信息技术有限公司 Man-machine interaction method based on multiple modes and digital man system

Similar Documents

Publication Publication Date Title
US11264030B2 (en) Indicator for voice-based communications
US20220020357A1 (en) On-device learning in a hybrid speech processing system
US10074369B2 (en) Voice-based communications
US10453449B2 (en) Indicator for voice-based communications
US10482885B1 (en) Speaker based anaphora resolution
US8064573B2 (en) Computer generated prompting
US11869495B2 (en) Voice to voice natural language understanding processing
DE112021001064T5 (en) Device-directed utterance recognition
CN114064858A (en) Dialogue processing method and device for dialogue robot, electronic equipment and medium
US11276403B2 (en) Natural language speech processing application selection
US20240144933A1 (en) Voice-controlled communication requests and responses
US20240005923A1 (en) Systems and methods for disambiguating a voice search query
US20240029743A1 (en) Intermediate data for inter-device speech processing
DE112022000504T5 (en) Interactive content delivery
WO2018045154A1 (en) Voice-based communications
US10957313B1 (en) System command processing
DE112021000292T5 (en) VOICE PROCESSING SYSTEM
CN112102807A (en) Speech synthesis method, apparatus, computer device and storage medium
CN113132214A (en) Conversation method, device, server and storage medium
CN112397053B (en) Voice recognition method and device, electronic equipment and readable storage medium
US11763809B1 (en) Access to multiple virtual assistants
Neto et al. The development of a multi-purpose spoken dialogue system.
CN117496973B (en) Method, device, equipment and medium for improving man-machine conversation interaction experience
CN113421549A (en) Speech synthesis method, speech synthesis device, computer equipment and storage medium
JP2022161353A (en) Information output system, server device and information output method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination