EP3714453A1 - Full duplex communication for conversation between chatbot and human - Google Patents
- Publication number
- EP3714453A1 (application EP18830117.0A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- complete
- response
- response message
- speech recognition
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/02—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/34—Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Definitions
- A chatbot may communicate directly with human users in natural human language.
- Chatbots are a typical application scenario of artificial intelligence (AI) techniques.
- Chatbots are often used in apps such as instant messaging, social applications, smartphone personal assistants, and IoT (Internet of Things) intelligent devices. Voice conversation between the chatbot and the user makes it convenient to understand the user's intention and provide the information the user wants, so that on-screen display may be omitted.
- Chatbots in the art still use messages as the construction unit of a voice conversation with a user, and are therefore still far from imitating conversation between human beings.
- A technical solution for full duplex communication in voice conversation between a chatbot and human beings is disclosed. More particularly, the technique replaces the conventional message-centered conversation mode in the art with a full duplex conversation mode.
- The complete expression that a user intends to express may be predicted when an intermediate result of speech recognition is obtained, and response messages may be generated in advance based on the predicted complete expression, so that the generated response message can be output immediately when a response condition is satisfied, e.g., when it is determined that the user has finished a paragraph of speech.
- The latency between the end of a user's voice input and the start of the chatbot's speech output may thereby be minimized.
- FIG. 1 is an exemplary block diagram of a conversation processing device of embodiments of the present disclosure.
- FIG. 2 is a schematic flowchart showing a conversation processing method of embodiments of the present disclosure.
- FIG. 3 is an exemplary block diagram of another conversation processing device of embodiments of the present disclosure.
- FIG. 4 is a schematic block diagram showing an application of thread management of embodiments of the present disclosure.
- FIG. 5 is a schematic block diagram showing a data structure of thread management of embodiments of the present disclosure.
- FIG. 6 is a schematic flowchart showing another conversation processing method of embodiments of the present disclosure.
- FIG. 7 is an exemplary block diagram of still another conversation processing device of embodiments of the present disclosure.
- FIG. 8 is an exemplary block diagram of another conversation processing device of embodiments of the present disclosure.
- FIG. 9 is an exemplary block diagram of an implementation example of a conversation processing device of embodiments of the present disclosure.
- FIG. 10 is a schematic structural block diagram of an electronic apparatus of embodiments of the present disclosure.
- This disclosure is drawn, inter alia, to methods, apparatus, systems, and computer program products related to recommendation in using of mobile devices.
- The term "technique", as cited herein, may refer to, for instance, system(s), method(s), computer-readable instructions, module(s), algorithms, hardware logic (e.g., Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs)), and/or other technique(s) as permitted by the context above and throughout the document.
- The present disclosure involves improvements to the way a chatbot converses with a user.
- A chatbot may mainly use three mutually independent components to construct a conversation with users: a Speech Recognition (SR) module, a Conversation Engine (CE) module, and a Text To Speech (TTS) module.
- A conventional chatbot always plays the role of listener while the user speaks, converting the speech input into text with the speech recognition module. After determining that the user has finished a sentence or a paragraph, the chatbot generates a response message in text form from the converted text, and the text to speech module then produces the speech output.
- In message-based speech interaction, a chatbot cannot fully imitate the human way of thinking while talking. More particularly, for each response the chatbot must wait until the user finishes a whole sentence or paragraph before it starts preparing its answer. This procedure inevitably introduces a perceptible pause into the conversation, so the feeling of conversation between human beings cannot be imitated well.
- When a user holds a voice conversation with a conventional chatbot in the art, the user may feel like taking turns talking with another person over two-way radios. Such a message-centered solution therefore limits the variety and naturalness of voice conversation between a chatbot and human beings.
- A chatbot may instead predict the intention the user wants to express and prepare a response message while it is still listening to the user's speech input.
- During a conversation between the chatbot and human beings, the chatbot may predict the whole or complete expression that a user wants to express based on an intermediate result of speech recognition, and prepare response messages in advance based on the predicted expression. Therefore, when a certain response condition is satisfied, e.g., the user finishes a paragraph of speech and a final result of speech recognition is generated, a response message can be output promptly. With this solution, the feeling of conversation between human beings may be imitated better.
- A pause in speech may occur after one party has completed a paragraph of speech (e.g., one sentence, several sentences, or part of a sentence).
- Such a pause may indicate that one party has finished a piece of intended expression and is waiting for a response from the other party.
- A paragraph of speech from one party and the response to that paragraph from the other party may constitute one turn of conversation.
- The speech recognition module described above is capable of recognizing such a pause and determining that a user has completed a paragraph of speech when the pause is long enough. At that time, the speech recognition result for the whole paragraph may be output. This result is the final result described above.
- The intermediate result described above refers to a result generated by speech recognition processing before the end of a paragraph of speech.
- The conversation processing device 101 in Fig. 1 may be implemented as, or provided in, a small portable (or mobile) electronic device, such as a cell phone, personal digital assistant (PDA), personal media player device, wireless network player device, personal headset device, IoT (Internet of Things) intelligent device, dedicated device, or a combined device containing any of the functions described above.
- The conversation processing device 101 may also be implemented as, or provided in, a personal computer in either a laptop or a non-laptop configuration.
- The conversation processing device 101 may further be implemented as, or provided in, a server on the internet.
- Such a server may be implemented in one or more computer systems (a distributed server) or as a server based on cloud technology.
- Such servers may be connected with user clients via the internet to receive the users' voice input collected by the user clients, and generate response messages by performing conversation processing on that voice input. The generated response messages may then be returned to the user clients to be output to the users.
- The conversation processing device 101 of embodiments of the present disclosure may implement the functions of the chatbot described above.
- The conversation processing device 101 may include: a speech recognition (SR) module 102, a language predicting (LP) module 103, a response message generating module 104, and a response message outputting module 105.
- The speech recognition module 102 may be configured to perform speech recognition on a user speech 110 input by a user 116 to generate an intermediate result 111 of speech recognition. Furthermore, when a paragraph of speech is finished, the speech recognition module 102 may output a final result 114 of speech recognition. More particularly, the speech recognition module 102 may include an Acoustic Model (AM) module 106 and a Language Model (LM) module 107.
- The acoustic model module 106 may be configured to output a speech recognition result in the form of a phonetic symbol sequence.
- The language model module 107 may generate a speech recognition result in the form of text based on the phonetic symbol sequence output by the acoustic model module 106.
- The language predicting module 103 may be configured to predict a whole expression 112 according to the intermediate result 111. More particularly, the intermediate result may be in the form of a phonetic symbol sequence output by the acoustic model module 106 and/or in the form of text output by the language model module 107.
- The response message generating module 104 may be configured to generate a response message 113 according to the whole expression 112.
- The response message generating module 104 may include a Conversation Engine (CE) module 108 and a Text To Speech (TTS) module 109. More particularly, the conversation engine module 108 may be configured to generate the content of a response message according to a predicted whole expression.
- The response messages output by the conversation engine module 108 are generally in the form of text, from which the text to speech module 109 may generate a response message in the form of an audio segment. Furthermore, the response message generating module 104 may also generate a response message according to the final result 114.
- The response message outputting module 105 may be configured to output a response message 113 to a user 116 in response to a response condition being satisfied. More particularly, the response message 113 may be output to the user as machine voice 115, or alternatively displayed as text. For example, in some scenarios a user communicates with a chatbot by speech while the chatbot always responds with text messages (e.g., a response message displayed on a screen). If the technical solutions of embodiments of the present disclosure are applied to such a chatbot, it may respond more quickly than a conventional chatbot operating in the two-way-radio mode, and the user may have the feeling of conversing with a human being.
- The response message outputting module 105 is mainly configured to control the output of the response message 113.
- The response condition cited herein may be the generation of a final result 114 of speech recognition. That is to say, when the speech recognition module 102 produces a final result 114, the outputting of the response message 113 may be triggered.
- The response condition cited above may further require that the predicted whole expression 112 and the final result 114 satisfy a similarity threshold, i.e., that the predicted whole expression 112 is relatively accurate. When this condition is satisfied, the response message prepared in advance may be output. If the predicted whole expression 112 does not satisfy the similarity threshold, the response message generating module 104 may be triggered to generate a response message 113 based on the final result 114, which is then output in the conventional way.
- The conversation processing method may include:
- S201: predicting a complete expression based on an intermediate result of speech recognition. Step S201 may be performed by the speech recognition module 102 and the language predicting module 103 described above.
- S202: generating a response message based on the predicted complete expression. Step S202 may be performed by the response message generating module 104 described above.
- S203: outputting a response message in response to a response condition being satisfied. Step S203 may be performed by the response message outputting module 105 described above.
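Steps S201 to S203, together with the similarity-threshold fallback described for Fig. 1, can be sketched roughly as follows. This is a minimal illustration rather than the disclosed implementation: the function names, the list of known expressions, the threshold value of 0.8, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions, since the source does not specify them.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # assumed value; the source does not give one


def predict_complete_expression(intermediate: str) -> str:
    """Hypothetical stand-in for the language predicting module (S201)."""
    known = ["I want to drink some water", "I want to go home"]
    for candidate in known:
        if candidate.startswith(intermediate):
            return candidate
    return intermediate


def generate_response(expression: str) -> str:
    """Hypothetical stand-in for the response message generating module (S202)."""
    return f"Response to: {expression}"


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def respond(intermediate: str, final: str) -> str:
    predicted = predict_complete_expression(intermediate)  # S201
    prepared = generate_response(predicted)  # S202, before the final result arrives
    # S203: output the prepared response only if the prediction was close enough.
    if similarity(predicted, final) >= SIMILARITY_THRESHOLD:
        return prepared
    # Otherwise fall back to the conventional path based on the final result.
    return generate_response(final)
```

When the prediction matches the final result, the prepared response is returned without further generation work; otherwise the response is generated from the final result as in the conventional flow.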
- The conversation processing method and apparatus of embodiments of the present disclosure may replace the conventional message-centered conversation mode in the art with a full duplex conversation mode.
- The complete expression that a user intends to express may be predicted when speech segment information (an intermediate result of speech recognition) is obtained, and response messages may be generated in advance based on the predicted complete expression, so that the generated response message can be output immediately when a response condition is satisfied, e.g., when it is determined that the user has finished a paragraph of speech.
- The conversation processing device 301 may include: a continuous speech recognition module 302, a language predicting module 303, a response message generating module 304, and a response message outputting module 305.
- The continuous speech recognition module 302 may be configured to perform continuous speech recognition on a user's speech 312 input by a user 320 to generate one or more intermediate results 313 of speech recognition and a final result 314.
- An intermediate result 313 may be output each time a character or word is recognized, and this intermediate result 313 may be the language segment from the start of the user's speech 312 up to the character or word currently recognized.
- For example, if the complete user's speech 312 is "I want to drink some water",
- the intermediate results 313 may be "I", "I want to", and "I want to drink", and
- the final result 314 is "I want to drink some water".
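The relationship between intermediate results and the final result can be illustrated with a short sketch. It assumes, for simplicity, that one intermediate result is emitted after every recognized word (the source example lists only some of these prefixes); the function name `intermediate_results` is hypothetical.

```python
from typing import Iterable, Iterator


def intermediate_results(words: Iterable[str]) -> Iterator[str]:
    """Yield the growing prefix of the utterance after each recognized word."""
    prefix = []
    for word in words:
        prefix.append(word)
        yield " ".join(prefix)


results = list(intermediate_results("I want to drink some water".split()))
final_result = results[-1]  # the last prefix is the whole paragraph of speech
```

Each yielded prefix could be handed to the language predicting module as soon as it is available, well before `final_result` exists.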
- The continuous speech recognition module 302 may include an Acoustic Model (AM) module 306 and a Language Model (LM) module 307.
- The acoustic model module 306 may output a speech recognition result in the form of a phonetic symbol sequence.
- The language model module 307 may generate a speech recognition result in the form of text based on the phonetic symbol sequence output by the acoustic model module 306.
- The acoustic model module 306 and the language model module 307 may output the intermediate results 313 and the final result 314 in the form of a phonetic symbol sequence and text, respectively.
- The language predicting module 303 may be configured to predict one or more complete expressions 315 according to the one or more intermediate results 313.
- The response message generating module 304 may be configured to generate one or more response messages according to the one or more complete expressions 315.
- The response messages may include response messages in the form of text (shown as response text 316 in the figures) and response messages in the form of audio (shown as audio segment 317).
- The response message generating module 304 may include a Conversation Engine (CE) module 308 and a Text To Speech (TTS) module 309.
- The conversation engine module 308 may be configured to generate a response message in the form of text, i.e., a response text 316, according to a predicted complete expression 315. The text to speech module 309 may then generate a response message in the form of audio, i.e., an audio segment 317, according to the response text 316. Furthermore, the response message generating module 304 may also generate a response message according to the final result 314.
- The response message outputting module 305 may be configured to compare the final result 314 with the one or more complete expressions 315 once the final result 314 of speech recognition has been generated. If one or more complete expressions 315 satisfy a similarity threshold, a response message may be selected for output from the response messages corresponding to those complete expressions 315. If none of the predicted complete expressions 315 satisfies the similarity threshold, the response message generating module 304 may be launched to generate a response message based on the final result 314 and that response message is output, as in the conventional technical solution in the art.
- The response message outputting module 305 may include a turn coordinator module 310 and a speech playing module 311. More particularly, when the final result 314 is generated, the turn coordinator module 310 may calculate the similarity between each of a plurality of complete expressions 315 and the final result 314, and select the one or more complete expressions 315 satisfying the similarity threshold. It may then select, according to a preset selection condition, an audio segment 317 generated from one of those complete expressions 315 and send it to the speech playing module 311 for output. If none of the complete expressions 315 is good enough, the turn coordinator module 310 may instead actuate the response message generating module 304 to generate a response message according to the final result 314.
- Such a response message may include a response text 316 and an audio segment 317 generated based on the final result 314 (see the processing path drawn with dashed lines at the bottom of Fig. 3). The audio segment 317 based on the final result 314 may then be sent to the speech playing module 311 for output.
- The speech playing module 311 may be configured to play the audio segment 317 as machine voice 319 once the turn coordinator module 310 has determined which audio segment 317 is to be output. The user 320 thus hears the machine voice 319 as the response to the user's speech 312, and one turn of conversation ends.
- When several complete expressions 315 satisfy the similarity threshold, the turn coordinator module 310 may select among the corresponding response messages using the following conditions:
- Condition 1: the response message corresponding to the complete expression 315 with the highest similarity to the final result 314.
- Condition 2: the response message that is generated earliest. As shown in Fig. 3, after each intermediate result 313 is generated, a series of processing steps is performed to obtain the final response message, and one thread may be established per intermediate result 313 to carry out this processing. However, before the final result 314 is generated, it is uncertain whether each thread will complete its processing and produce a response message in time. If the response message corresponding to a complete expression 315 satisfying the similarity threshold is still being generated, waiting for that thread's result takes time. In order to respond quickly to the user, the response message from whichever of these threads finishes soonest may be selected as output.
- The turn coordinator module 310 may perform the selection according to either one of the above conditions or both of them. For example, the response with the best combined ranking of similarity and generation speed may be selected as output.
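One way to combine similarity and generation speed into a single ranking is sketched below. It is only an illustration under stated assumptions: the `Candidate` fields, the threshold of 0.8, the linear weighting, and the choice to skip responses that are still being generated are all hypothetical and not taken from the source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    complete_expression: str
    similarity: float          # similarity to the final result, in [0, 1]
    ready_at: Optional[float]  # time (s) the response was ready; None if still generating


def select_response(
    candidates: List[Candidate],
    threshold: float = 0.8,          # assumed similarity threshold
    similarity_weight: float = 0.5,  # assumed weighting between the two conditions
) -> Optional[Candidate]:
    # Keep only predictions that pass the threshold and already have a response.
    eligible = [c for c in candidates if c.similarity >= threshold and c.ready_at is not None]
    if not eligible:
        return None  # caller falls back to generating from the final result
    earliest = min(c.ready_at for c in eligible)
    latest = max(c.ready_at for c in eligible)
    span = (latest - earliest) or 1.0  # avoid division by zero

    def score(c: Candidate) -> float:
        speed = 1.0 - (c.ready_at - earliest) / span  # 1.0 for the earliest response
        return similarity_weight * c.similarity + (1.0 - similarity_weight) * speed

    return max(eligible, key=score)
```

Setting `similarity_weight` to 1.0 reduces the ranking to similarity alone, and setting it to 0.0 reduces it to the earliest-generated response alone.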
- When each intermediate result 313 is generated, a series of processing steps may be launched to generate response messages.
- One thread may be established for each intermediate result 313 to perform this subsequent processing.
- A thread management module may be used to manage these threads.
- The thread management module 401 may establish a thread 402 for each intermediate result 313 as the continuous speech recognition module 302 generates them.
- The threads 402 may run in parallel, each calling the language predicting module 303 to predict a complete expression 315 and the response message generating module 304 to generate the response messages.
- Within each thread 402, the conversation engine module 308 may be called to generate a response text 316, and the text to speech module 309 may then be called to generate an audio segment 317 as the output result of the thread 402.
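The per-intermediate-result threading described above could be sketched with Python's standard `concurrent.futures` thread pool. The three stage functions below are hypothetical placeholders for the language predicting, conversation engine, and text to speech modules; the pool size is an arbitrary assumption.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Tuple


def predict(intermediate: str) -> str:
    # Hypothetical predictor: completes one known prefix, otherwise echoes the input.
    if intermediate.endswith("drink"):
        return intermediate + " some water"
    return intermediate


def generate_text(expression: str) -> str:
    # Placeholder for the conversation engine module.
    return f"Response to: {expression}"


def synthesize(text: str) -> bytes:
    # Placeholder for the text to speech module; returns bytes instead of audio.
    return text.encode("utf-8")


def process(intermediate: str) -> Tuple[str, bytes]:
    """Work done by one thread: predict, generate text, then synthesize audio."""
    expression = predict(intermediate)
    audio = synthesize(generate_text(expression))
    return expression, audio


def launch_threads(intermediates: List[str], executor: ThreadPoolExecutor) -> Dict[str, Future]:
    """Submit one task per intermediate result and keep a handle to each."""
    return {item: executor.submit(process, item) for item in intermediates}


with ThreadPoolExecutor(max_workers=4) as executor:
    futures = launch_threads(["I", "I want to", "I want to drink"], executor)
    outputs = {key: future.result() for key, future in futures.items()}
```

Keeping the `Future` handles allows a coordinator to either collect a finished result immediately or wait on a thread that is still working.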
- Fig. 5 is a schematic block diagram 500 showing the data structure of thread management of embodiments of the present disclosure.
- The response message outputting module 305 may select the processing result of one thread according to the result of comparing the final result 314 with the predicted complete expressions 315.
- In the data structure shown in Fig. 5, the thread identification 501 of each thread 402 is stored in association with the complete expression 315 predicted by that thread, recording the mapping between the two. When a complete expression 315 satisfying the required condition is found, the corresponding thread can therefore be located in order to obtain, or wait for, its processing result.
- The thread management module 401 may be further configured to perform dynamic maintenance and management of each thread 402.
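A minimal sketch of such a mapping is shown below, assuming integer thread identifications and a plain dictionary as storage. The class and method names are hypothetical; the source only states that thread identifications and complete expressions are stored in association with each other.

```python
from typing import Dict, List


class ThreadTable:
    """Associates each thread identification with its predicted complete expression."""

    def __init__(self) -> None:
        self._expression_by_thread: Dict[int, str] = {}

    def register(self, thread_id: int, complete_expression: str) -> None:
        self._expression_by_thread[thread_id] = complete_expression

    def threads_for(self, complete_expression: str) -> List[int]:
        """Return the threads whose prediction equals the given expression."""
        return [
            thread_id
            for thread_id, expression in self._expression_by_thread.items()
            if expression == complete_expression
        ]

    def abandon(self, thread_id: int) -> None:
        self._expression_by_thread.pop(thread_id, None)


table = ThreadTable()
table.register(1, "I want to go home")
table.register(2, "I want to drink some water")
table.register(3, "I want to drink some water")
matching = table.threads_for("I want to drink some water")
```

Several threads may predict the same complete expression, so the lookup returns a list of thread identifications rather than a single one.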
- The thread management module 401 may calculate a gain for the complete expression 315 predicted by each thread, and use it to determine whether each thread 402 should be retained or abandoned. More particularly, the gain may reflect the following two indexes.
- The time gap that can be covered by a complete expression.
- The time gap cited herein refers to the interval between the time at which an intermediate result is recognized and the time at which the final result is recognized. The earlier the complete expression is predicted, the more time is available to prepare the response message, and the more valuable that predicted complete expression is.
- The gain may be calculated based on either or both of the above two aspects.
- The thread management module 401 may dynamically manage the threads 402 according to the gain obtained for each thread and the computation resources currently available. At one extreme, if computation resources are abundant, all threads may be retained. At the other extreme, if computation resources are extremely scarce, only the thread with the highest gain may be retained, or all threads may be abandoned. If all threads are abandoned, a response message is generated from the final result when it arrives. Between these two extremes, the thread management module 401 may balance computation resources against conversation efficiency.
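The retention policy can be sketched as a ranking of threads by gain under a resource budget. The sketch below models only the time-gap index, uses an estimated final-result time because the actual time is not known in advance, and expresses the resource limit as a maximum thread count; all of these are simplifying assumptions rather than details from the source.

```python
from typing import Dict, Set


def time_gap_gain(intermediate_time: float, estimated_final_time: float) -> float:
    """Gain from the time gap alone: time left to prepare a response (seconds)."""
    return max(0.0, estimated_final_time - intermediate_time)


def retain_threads(gains: Dict[int, float], max_threads: int) -> Set[int]:
    """Keep the highest-gain threads that fit within the assumed thread budget."""
    if max_threads <= 0:
        return set()  # all threads abandoned; respond from the final result instead
    ranked = sorted(gains, key=lambda thread_id: gains[thread_id], reverse=True)
    return set(ranked[:max_threads])


# Threads started at 0.5 s, 1.8 s, and 1.1 s; final result expected at about 2.0 s.
gains = {
    1: time_gap_gain(0.5, 2.0),
    2: time_gap_gain(1.8, 2.0),
    3: time_gap_gain(1.1, 2.0),
}
kept = retain_threads(gains, max_threads=2)
```

A large budget retains every thread, a budget of one retains only the highest-gain thread, and a budget of zero abandons them all, matching the two extremes described above.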
- Fig. 6 is a schematic flowchart 600 showing another conversation processing method of embodiments of the present disclosure
- the conversation processing method may include the following steps.
- S601 predicting one or more complete expressions based on one or more intermediate results of speech recognition respectively. This step may be performed by the continuous speech recognition module 302 and the language predicting module 303.
- S602 generating one or more response messages based on the predicted one or more complete expressions respectively. This step may be performed by the response message generating module 304.
- S603 comparing the final result with one or more complete expressions in response to generating the final result of speech recognition so as to determine whether or not there are one or more complete expressions satisfying a threshold of similarity. If there are one or more complete expressions satisfying the threshold of similarity, a step of S604 is performed. If there is no complete expression satisfying the threshold of similarity, a step of S605 is performed. This step of S603 may be performed by the response message outputting module 305.
- S604 selecting a response message from one or more response messages corresponding to the one or more complete expressions satisfying the threshold of similarity as output. More particularly, the turn coordinator module 310 may be used to select a plurality of response messages as output according to the conditions for selecting response messages as described above. This step of S604 may be performed by the response message outputting module 305.
- S605 generating the response message based on the final result and outputting the generated response message. This step of S605 may be performed by the response message generating module 304 and the response message outputting module 305.
- steps S601 and S602 may be performed by establishing one or more threads 402, one for the processing of each intermediate result, and in the steps S601 and S602, each thread may be dynamically maintained and managed by calculating the gain of the complete expression predicted by that thread.
- the specifics of the processing on the threads may refer to the above description, and the processing may be performed by the thread management module 401.
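The gain-based thread management can be sketched as follows. The source only says that the gain represents the accuracy of the prediction and/or the time gap that the prediction can cover; treating the gain as the product of the two, and comparing it against a fixed `min_gain`, are assumptions made for illustration. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PredictionThread:
    expression: str    # complete expression predicted by this thread
    accuracy: float    # estimated chance (0..1) that the prediction is right
    time_gap_s: float  # seconds of waiting the prediction could cover


def gain(thread: PredictionThread) -> float:
    # Assumed formula: expected time covered = accuracy * time gap.
    return thread.accuracy * thread.time_gap_s


def prune(threads, min_gain):
    """Retain threads whose gain reaches min_gain and abandon the rest."""
    retained = [t for t in threads if gain(t) >= min_gain]
    abandoned = [t for t in threads if gain(t) < min_gain]
    return retained, abandoned


threads = [
    PredictionThread("tell a story", accuracy=0.2, time_gap_s=1.0),
    PredictionThread("tell a joke", accuracy=0.7, time_gap_s=0.5),
]
retained, abandoned = prune(threads, min_gain=0.3)
```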
- the conversation processing device 701 may include: a turn coordinator (TC) module 702, a rhythm coordinator (RC) module 703, and a speech playing module 704.
- the turn coordinator module 702 may be configured to obtain a plurality of response messages generated according to the user speech input by a user in each turn and write the response messages into a queue 705 in the order in which they are generated.
- the response message as shown in Fig. 7 is an audio segment 706.
- one turn of conversation may be constituted of a paragraph of speech of one party of the conversation and a response to that paragraph of speech of the other party.
- the expression “turn” as used herein does not mean that the speech paragraph of one party must be directly followed by the other party’s response to that speech paragraph.
- one party may pause for a while after finishing a paragraph of speech and then start another paragraph of speech. In such a conversation scenario, the other party may wait until the one party finishes several continuous paragraphs of speech and then respond to each paragraph of speech respectively.
- the conversation processing device may accumulate a plurality of response messages in queue and these response messages in the queue would not be output until a user has finished a series of conversation paragraphs. In such case, it is the time for outputting response message when it is determined that the user has finished a series of conversation paragraphs.
- the conversation processing device mainly concerns the timing for outputting response messages, i.e., whether or not a conversation message should be output when a user finishes a paragraph of speech.
- the conversation processing device may use the same mechanism of detecting the period of pause of the user’s speech to determine the timing for outputting response messages and detecting whether or not a user finishes a paragraph of speech, except that the period of pause to be detected for determining the timing of outputting response messages may be longer.
- the conversation processing device may determine that a user has finished a paragraph of speech when detecting that the period of pause of the user exceeds a time T1.
- the conversation processing device may keep detecting the period of pause of the user, and determine that it is the timing for outputting a response message if the period of pause of the user exceeds a time T2 (T2 > T1).
- the specific settings for the times T1 and T2 may be determined as needed.
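The two-threshold pause detection can be sketched as a small classifier. The values 0.5 s and 1.5 s are hypothetical, since the source leaves both times to be set as needed, and the names `PauseEvent` and `classify_pause` are illustrative only.

```python
from enum import Enum


class PauseEvent(Enum):
    SPEAKING = "speaking"                      # pause too short to mean anything
    PARAGRAPH_FINISHED = "paragraph_finished"  # pause exceeds T1
    OUTPUT_TIMING = "output_timing"            # pause exceeds T2


T1_SECONDS = 0.5  # hypothetical value
T2_SECONDS = 1.5  # hypothetical value; must be greater than T1


def classify_pause(pause_seconds, t1=T1_SECONDS, t2=T2_SECONDS):
    """Map the length of the user's current pause to an event."""
    if not t2 > t1:
        raise ValueError("T2 must be greater than T1")
    if pause_seconds > t2:
        return PauseEvent.OUTPUT_TIMING
    if pause_seconds > t1:
        return PauseEvent.PARAGRAPH_FINISHED
    return PauseEvent.SPEAKING
```

Because the same pause measurement feeds both decisions, one detector can serve both the end-of-paragraph check and the output-timing check.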
- the detection on the period of pause of the user’s speech may be performed by the continuous speech recognition module 302 as shown in Fig. 3. Then the result of the detection may be notified to the rhythm coordinator module 703. Alternatively, the detection on the period of pause of the user’s speech may be performed by the rhythm coordinator module 703.
- the rhythm coordinator module 703 may be configured to detect a timing of outputting a response message. Such detection by the rhythm coordinator module 703 may be done by obtaining a detection result from the continuous speech recognition module 302, or detecting by the rhythm coordinator module 703 itself.
- the rhythm coordinator module 703 may be further configured to perform processing on a plurality of response messages in the queue 705 according to a preset outputting strategy for outputting response messages.
- the processing on a plurality of response messages in the queue 705 may include: one of a processing of outputting in queue and a processing of interrupting outputting, or one of a processing of outputting in queue and a processing of abandoning, or one of a processing of outputting in queue, a processing of interrupting outputting and a processing of abandoning.
- the specific processing of a processing of outputting in queue, a processing of interrupting outputting and a processing of abandoning may be as follows.
- the processing of outputting in queue may include: outputting in an order of sequence of writing into the queue.
- the processing mode of outputting in queue may be a normal processing mode of a conversation processing device. That is to say, in general, the response messages may be output in an order of sequence in the queue. Therefore, the conversation processing device may avoid the outputting of a machine speech from being interrupted by another machine speech.
- the processing of interrupting outputting may include: outputting one or more response messages immediately.
- the processing of interrupting outputting may be performed to interrupt the user while the user is speaking, so that an important and/or urgent response message can be output.
- the audio segment being currently played may be erased or all audio segments before the audio segment just to be played may be deleted.
- the processing of abandoning may include: abandoning one or more response messages in a queue.
- the processing of abandoning may be performed when the response messages written into the queue are over a preset threshold of amount and/or length.
- the audio segments in the queue 705 may be output to the speech playing module 704 for playing under the control of the rhythm coordinator module 703, so as to generate a machine speech 707 to be output to a user 708.
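The three kinds of queue processing can be sketched with a small class. This is a minimal illustration under stated assumptions: the threshold is a count of messages (`max_items`), the oldest message is abandoned first, and an interruption discards everything still waiting. The source does not fix any of these choices, and the class and method names are hypothetical.

```python
from collections import deque


class ResponseQueue:
    def __init__(self, max_items=3):
        self.max_items = max_items  # hypothetical threshold of amount
        self._items = deque()
        self.abandoned = []         # kept only so the behavior is observable

    def write(self, message):
        """Write a response message; abandon the oldest ones over the threshold."""
        self._items.append(message)
        while len(self._items) > self.max_items:
            self.abandoned.append(self._items.popleft())

    def output_in_queue(self):
        """Normal mode: output everything in the order it was written."""
        out = list(self._items)
        self._items.clear()
        return out

    def interrupt(self, message):
        """Output an urgent message immediately, dropping what was waiting."""
        self.abandoned.extend(self._items)
        self._items.clear()
        return [message]


q = ResponseQueue(max_items=2)
for segment in ("a", "b", "c"):
    q.write(segment)
played = q.output_in_queue()
```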
- the response messages obtained by the above turn coordinator module 702 may be the response messages generated by the conversation processing as shown in Fig. 3, or response messages generated in other ways.
- the response messages may be response messages generated by a conventional method in the art.
- the conversation processing device 801 in Fig. 8 may combine the conversation processing device 301 as shown in Fig. 3 and the conversation processing device 701 as shown in Fig. 7.
- Original reference numbers in Fig. 3 and Fig. 7 may be used for the modules with same functions as those in Fig. 3 and Fig. 7.
- the turn coordinator module 801 may combine the functions of the turn coordinator module 310 in Fig. 3 and the turn coordinator module 702 in Fig. 7. Therefore, the turn coordinator module 801 may be labeled as the turn coordinator module 801 in Fig. 8.
- the rhythm coordinator module 703 may be included in the response message outputting module 305.
- after the turn coordinator module 801 determines an audio segment 318 of each turn, the audio segment 318 may be written into a queue 705.
- the rhythm coordinator module 703 may detect a timing for outputting the response message, and perform a control on the outputting of response according to the preset outputting strategy of outputting response messages.
- the specific procedure of the processing may refer to the description on the conversation processing device in Fig. 7.
- Fig. 9 is an exemplary block diagram 900 of an implementation example of a conversation processing device of embodiments of the present disclosure.
- a paragraph of “tell a joke” may be used as an example of a complete paragraph of speech of a user in Fig. 9 so as to explain an example of the conversation processing procedure of embodiments of the present disclosure.
- the axis of X and the axis of Y in Fig. 9 may represent the timeline of the processing performed by a conversation processing device, such as predicting complete expression, preparing response messages, and the timeline of the processing of speech recognition by the conversation processing device.
- the continuous speech recognition module 302 may continuously perform the processing of speech recognition and obtain an intermediate result of “tell” 901, an intermediate result of “tell a” 902, and a final result of “tell a joke” 903. Threads may be established for the intermediate result of “tell” 901 and the intermediate result of “tell a” 902, respectively, so as to start the succeeding processing such as the processing of predicting a complete expression.
- Thread one performs a processing of predicting complete expression 904 on the generated intermediate result of “tell” 901 so as to generate a complete expression of “tell a story” 907 by predicting, and then a processing of generating a response text 910 may be performed to generate the content text of the story of “Long long time ago ...” 912 as a response text.
- Thread two performs a processing of predicting complete expression 905 on the generated intermediate result of “tell a” 902, so as to generate a complete expression of “tell a joke” 908 by predicting, and then a processing of generating a response text 911 may be performed to generate a response text of “Something funny happens today ...” 913, and then a processing of generating an audio segment 915 may be performed.
- the processing of abandoning 914 may be performed on the thread for the complete expression of “tell a story” 907, i.e., the thread one. Then, an audio segment generated by the thread two may be output in response to the generating of the final result of “tell a joke” 909. If the thread two has not finished generating the audio segment when the final result of “tell a joke” 909 is generated, the device may wait for the output result of the thread two.
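The Fig. 9 walkthrough can be sketched with two worker threads. The lookup tables stand in for the language predicting module and the response message generating module, and an exact string match stands in for the similarity comparison; these, and the function names, are simplifying assumptions rather than the method described in the source.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the language predicting module (hypothetical lookup table).
PREDICTIONS = {"tell": "tell a story", "tell a": "tell a joke"}
# Stand-in for the response message generating module (hypothetical).
RESPONSES = {
    "tell a story": "Long long time ago ...",
    "tell a joke": "Something funny happens today ...",
}


def predict_and_respond(intermediate):
    """Work done by one thread: predict, then prepare a response."""
    expression = PREDICTIONS[intermediate]
    return expression, RESPONSES[expression]


def run_turn(intermediates, final_result):
    """Start one thread per intermediate result, then pick the matching one."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(predict_and_respond, text) for text in intermediates]
        # result() blocks, which mirrors waiting for a thread that is not done.
        results = [future.result() for future in futures]
    for expression, response in results:
        if expression == final_result:
            return response  # threads with other expressions are abandoned
    return None  # no prediction matched; a fresh response would be needed


reply = run_turn(["tell", "tell a"], "tell a joke")
```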
- one or more components or modules and one or more steps as shown in Fig. 1 to Fig. 10 may be implemented by software, hardware, or in combination of software and hardware.
- the above component or module and one or more steps may be implemented in system on chip (SoC).
- the SoC may include an integrated circuit chip, which may include one or more of: a processing unit (such as a central processing unit (CPU), a microcontroller, a microprocessing unit, a digital signal processing unit (DSP), or the like), a memory, one or more communication interfaces, and/or other circuits for performing its functions, and optionally embedded firmware.
- the electronic apparatus 1000 may include: a memory 1001 and a processor 1002.
- the memory 1001 may be configured to store programs. In addition to the above programs, the memory 1001 may be configured to store other data to support operations on the electronic apparatus 1000.
- the examples of these data may include instructions of any applications or methods operated on the electronic apparatus 1000, contact data, phone book data, messages, pictures, videos, and the like.
- the memory 1001 may be implemented by any kind of volatile or nonvolatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, disk memory, or optical disk.
- the memory 1001 may be coupled to the processor 1002 and contain instructions stored thereon.
- the instructions may cause the electronic apparatus 1000 to perform operations upon being executed by the processor 1002; the operations may include:
- the above operations may include:
- the predicting one or more complete expressions based on one or more intermediate results of speech recognition, respectively, and generating one or more response messages based on the predicted one or more complete expressions, respectively may include:
- each of the one or more threads may perform the predicting on the complete expressions and the generating of the response messages in parallel.
- the above operations may further include:
- the selecting a response message as output from the one or more response messages corresponding to the one or more complete expressions satisfying a threshold of similarity may include:
- the above operations may include:
- detecting a timing for outputting a response message and performing a processing on the plurality of response messages in the queue according to a preset outputting strategy for outputting response messages, wherein the processing performed on the plurality of response messages in the queue includes: one of a processing of outputting in queue and a processing of interrupting outputting; or one of a processing of outputting in queue and a processing of abandoning; or one of a processing of outputting in queue, a processing of interrupting outputting, and a processing of abandoning.
- the processing of outputting in queue includes: outputting in an order of sequence for writing into the queue.
- the processing of interrupting outputting includes: outputting one or more response messages in the queue immediately.
- the processing of abandoning includes: abandoning one or more response messages in the queue.
- the outputting strategy for outputting response messages may include: performing the processing of interrupting outputting when it is necessary to output an important and/or urgent response message; and performing the processing of abandoning when the response messages written into the queue are over a preset threshold in amount and/or length.
- the electronic apparatus 1000 may further include: a communication unit 1003, a power supply unit 1004, an audio unit 1005, a display unit 1006, a chipset 1007, and other units. Only some of the units are exemplarily shown in Fig. 10, and it is obvious to one skilled in the art that the electronic apparatus 1000 is not limited to including only the units shown in Fig. 10.
- the communication unit 1003 may be configured to facilitate wireless or wired communication between the electronic apparatus 1000 and other apparatuses.
- the electronic apparatus may be connected to wireless network based on communication standard, such as WiFi, 2G, 3G, or their combination.
- the communication unit 1003 may receive radio signal or radio related information from external radio management system via radio channel.
- the communication unit 1003 may further include near field communication (NFC) module for facilitating short-range communication.
- the NFC module may be implemented with radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
- the power supply unit 1004 may be configured to supply power to various units of the electronic device.
- the power supply unit 1004 may include a power supply management system, one or more power supplies, and other units related to power generation, management, and allocation.
- the audio unit 1005 may be configured to output and/or input audio signals.
- the audio unit 1005 may include a microphone (MIC).
- when the electronic apparatus is in an operation mode, such as a calling mode, a recording mode, or a voice recognition mode, the MIC may be configured to receive external audio signals.
- the received audio signals may be further stored in the memory 1001 or sent via the communication unit 1003.
- the audio unit 1005 may further include a speaker configured to output audio signals.
- the display unit 1006 may include a screen, which may include liquid crystal display (LCD) and touch panel (TP). If the screen includes a touch panel, the screen may be implemented as touch screen so as to receive input signal from users.
- the touch panel may include a plurality of touch sensors to sense touching, sliding, and gestures on the touch panel. The touch sensor may not only sense edges of touching or sliding actions, but also sense period and pressure related to the touching or sliding operations.
- the above memory 1001, processor 1002, communication unit 1003, power supply unit 1004, audio unit 1005 and display unit 1006 may be connected with the chipset 1007.
- the chipset 1007 may provide interface between the processor 1002 and other units of the electronic apparatus 1000. Furthermore, the chipset 1007 may provide interface for each unit of the electronic apparatus 1000 to access the memory 1001 and communication interface for accessing among units.
- a method including:
- predicting a complete expression based on an intermediate result in a form of phonetic symbol sequence output by an acoustic model module, and/or predicting a complete expression based on an intermediate result in a form of text output by a language model module.
- a method including:
- the response message corresponds to a complete expression with the highest similarity with respect to the final result
- the selecting a response message as output from the one or more response messages corresponding to the one or more complete expressions satisfying a threshold of similarity includes:
- a device including:
- a speech recognition module configured to perform speech recognition on a speech input by a user to generate an intermediate result of speech recognition
- a language predicting module configured to predict a complete expression according to the intermediate result
- a response message generating module configured to generate a response message according to the complete expression
- a response message outputting module configured to output the response message in response to satisfying a response condition.
- the outputting the response message in response to the satisfying of a response condition includes:
- a device including:
- a continuous speech recognition module configured to perform continuous speech recognition on a user’s speech input by a user to generate one or more intermediate results of speech recognition and a final result;
- a language predicting module configured to predict one or more complete expressions based on the one or more intermediate results of speech recognition
- a response message generating module configured to generate one or more response messages based on the one or more complete expressions
- a response message outputting module configured to compare the final result with the one or more complete expressions in response to the generating the final result of speech recognition and select a response message as output from the one or more response messages corresponding to the one or more complete expressions satisfying a threshold of similarity, if there are one or more complete expressions satisfying the threshold of similarity.
- a thread management module configured to establish one or more threads in response to the one or more intermediate results of speech recognition output by the continuous speech recognition module, each of the one or more threads calls the language predicting module and the response message generating module in parallel to perform the predicting on the complete expressions and the generating of the response messages.
- calculate gain of the complete expressions predicted by the one or more threads and determine whether each thread is to be retained or abandoned according to the calculated gain for each thread, wherein the gain represents an accuracy of the predicting on the complete expression and/or a time gap which can be covered by the complete expression.
- the response message corresponds to a complete expression with the highest similarity with respect to the final result
- An electronic apparatus including:
- a memory coupled to the processing unit and containing instructions stored thereon, the instructions cause the electronic apparatus to perform operations upon being executed by the processing unit, the operations include:
- An electronic apparatus including:
- a memory coupled to the processing unit and containing instructions stored thereon, the instructions cause the electronic apparatus to perform operations upon being executed by the processing unit, the operations include:
- each of the one or more threads performs the predicting on the complete expressions and the generating of the response messages in parallel.
- calculating gain of the complete expressions predicted by the one or more threads and determining whether each thread is to be retained or abandoned according to the calculated gain for each thread, wherein the gain represents an accuracy of the predicting on the complete expression and/or a time gap which can be covered by the complete expression.
- the response message corresponds to a complete expression with the highest similarity with respect to the final result
- processing of outputting in queue includes: outputting in an order of sequence for writing into the queue;
- the processing of interrupting outputting includes: outputting one or more response messages in the queue immediately;
- the processing of abandoning includes: abandoning one or more response messages in the queue.
- a device including:
- a turn coordinator module configured to obtain a plurality of response messages generated with respect to a speech input by a user in each turn and write the response messages into a queue in an order of generating sequence
- a rhythm coordinator module configured to detect a timing for outputting a response message and perform a processing on the plurality of response messages in the queue according to a preset outputting strategy for outputting response messages, the processing performed on the plurality of response messages in the queue includes: one of a processing of outputting in queue and a processing of interrupting outputting; or one of a processing of outputting in queue and a processing of abandoning; or one of a processing of outputting in queue, a processing of interrupting outputting, and a processing of abandoning,
- processing of outputting in queue includes: outputting in an order of sequence for writing into the queue;
- the processing of interrupting outputting includes: outputting one or more response messages in the queue immediately; and
- the processing of abandoning includes: abandoning one or more response messages in the queue.
- An electronic apparatus including:
- a memory coupled to the processing unit and containing instructions stored thereon, the instructions cause the electronic apparatus to perform operations upon being executed by the processing unit, the operations include:
- detecting a timing for outputting a response message and performing a processing on the plurality of response messages in the queue according to a preset outputting strategy for outputting response messages, the processing performed on the plurality of response messages in the queue includes: one of a processing of outputting in queue and a processing of interrupting outputting; or one of a processing of outputting in queue and a processing of abandoning; or one of a processing of outputting in queue, a processing of interrupting outputting, and a processing of abandoning,
- processing of outputting in queue includes: outputting in an order of sequence for writing into the queue;
- the processing of interrupting outputting includes: outputting one or more response messages in the queue immediately; and
- the processing of abandoning includes: abandoning one or more response messages in the queue.
- the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
- Examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Versatile Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
- a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities).
- a typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.
- any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable”, to each other to achieve the desired functionality.
- operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
- references in the specification to “an implementation”, “one implementation”, “some implementations”, or “other implementations” may mean that a particular feature, structure, or characteristic described in connection with one or more implementations may be included in at least some implementations, but not necessarily in all implementations.
- the various appearances of “an implementation”, “one implementation”, or “some implementations” in the preceding description are not necessarily all referring to the same implementations.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Computer Networks & Wireless Communication (AREA)
- Machine Translation (AREA)
- Telephonic Communication Services (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711487228.8A CN109994108B (en) | 2017-12-29 | 2017-12-29 | Full duplex communication techniques for conversational conversations between chat robots and people |
US16/124,077 US10847155B2 (en) | 2017-12-29 | 2018-09-06 | Full duplex communication for conversation between chatbot and human |
PCT/US2018/065314 WO2019133265A1 (en) | 2017-12-29 | 2018-12-13 | Full duplex communication for conversation between chatbot and human |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3714453A1 true EP3714453A1 (en) | 2020-09-30 |
EP3714453B1 EP3714453B1 (en) | 2022-03-16 |
Family
ID=67058377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP18830117.0A Active EP3714453B1 (en) | 2017-12-29 | 2018-12-13 | Full duplex communication for conversation between chatbot and human |
Country Status (4)
Country | Link |
---|---|
US (1) | US10847155B2 (en) |
EP (1) | EP3714453B1 (en) |
CN (1) | CN109994108B (en) |
WO (1) | WO2019133265A1 (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11979360B2 (en) | 2018-10-25 | 2024-05-07 | Microsoft Technology Licensing, Llc | Multi-phrase responding in full duplex voice conversation |
CN110557451B (en) * | 2019-08-30 | 2021-02-05 | 北京百度网讯科技有限公司 | Dialogue interaction processing method and device, electronic equipment and storage medium |
CN112447177B (en) * | 2019-09-04 | 2022-08-23 | 思必驰科技股份有限公司 | Full duplex voice conversation method and system |
KR20210034276A (en) * | 2019-09-20 | 2021-03-30 | 현대자동차주식회사 | Dialogue system, dialogue processing method and electronic apparatus |
US11749265B2 (en) * | 2019-10-04 | 2023-09-05 | Disney Enterprises, Inc. | Techniques for incremental computer-based natural language understanding |
KR20210050901A (en) * | 2019-10-29 | 2021-05-10 | 엘지전자 주식회사 | Voice recognition method and device |
US11146509B2 (en) * | 2019-11-07 | 2021-10-12 | D8AI Inc. | Systems and methods of instant-messaging bot supporting human-machine symbiosis |
KR20210061141A (en) | 2019-11-19 | 2021-05-27 | 삼성전자주식회사 | Method and apparatus for processimg natural languages |
CN111739506B (en) * | 2019-11-21 | 2023-08-04 | 北京汇钧科技有限公司 | Response method, terminal and storage medium |
CN113707128B (en) * | 2020-05-20 | 2023-06-20 | 思必驰科技股份有限公司 | Test method and system for full duplex voice interaction system |
CN111585874A (en) * | 2020-07-17 | 2020-08-25 | 支付宝(杭州)信息技术有限公司 | Method and system for automatically controlling service group |
CN112002315B (en) * | 2020-07-28 | 2023-12-29 | 珠海格力节能环保制冷技术研究中心有限公司 | Voice control method and device, electrical equipment, storage medium and processor |
CN114255757A (en) * | 2020-09-22 | 2022-03-29 | 阿尔卑斯阿尔派株式会社 | Voice information processing device and voice information processing method |
CN112581972A (en) * | 2020-10-22 | 2021-03-30 | 广东美的白色家电技术创新中心有限公司 | Voice interaction method, related device and corresponding relation establishing method |
CN112714058B (en) * | 2020-12-21 | 2023-05-12 | 浙江百应科技有限公司 | Method, system and electronic device for immediately interrupting AI voice |
CN112820290A (en) * | 2020-12-31 | 2021-05-18 | 广东美的制冷设备有限公司 | Household appliance and voice control method, voice device and computer storage medium thereof |
US11881216B2 (en) | 2021-06-08 | 2024-01-23 | Bank Of America Corporation | System and method for conversation agent selection based on processing contextual data from speech |
US11605384B1 (en) | 2021-07-30 | 2023-03-14 | Nvidia Corporation | Duplex communications for conversational AI by dynamically responsive interrupting content |
CN113643696A (en) * | 2021-08-10 | 2021-11-12 | 阿波罗智联(北京)科技有限公司 | Voice processing method, device, equipment, storage medium and program |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8914288B2 (en) * | 2011-09-01 | 2014-12-16 | At&T Intellectual Property I, L.P. | System and method for advanced turn-taking for interactive spoken dialog systems |
US10614799B2 (en) * | 2014-11-26 | 2020-04-07 | Voicebox Technologies Corporation | System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance |
CN105183848A (en) * | 2015-09-07 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Human-computer chatting method and device based on artificial intelligence |
US9922647B2 (en) * | 2016-01-29 | 2018-03-20 | International Business Machines Corporation | Approach to reducing the response time of a speech interface |
Also Published As
Publication number | Publication date |
---|---|
US10847155B2 (en) | 2020-11-24 |
EP3714453B1 (en) | 2022-03-16 |
CN109994108B (en) | 2023-08-29 |
WO2019133265A1 (en) | 2019-07-04 |
US20190206397A1 (en) | 2019-07-04 |
CN109994108A (en) | 2019-07-09 |
Similar Documents
Publication | Title |
---|---|
US10847155B2 (en) | Full duplex communication for conversation between chatbot and human |
JP7037602B2 (en) | Long-distance expansion of digital assistant services |
US11877016B2 (en) | Live comments generating |
CN105378708B (en) | Context aware dialog policy and response generation |
US10452348B2 (en) | Systems and methods for communicating notifications and textual data associated with applications |
US11435980B2 (en) | System for processing user utterance and controlling method thereof |
US20190095050A1 (en) | Application Gateway for Providing Different User Interfaces for Limited Distraction and Non-Limited Distraction Contexts |
KR101703911B1 (en) | Visual confirmation for a recognized voice-initiated action |
KR20200039030A (en) | Far-field extension for digital assistant services |
KR20200133019A (en) | Virtual assistant activation |
JP2017079051A (en) | Zero Latency Digital Assistant |
KR20180109580A (en) | Electronic device for processing user utterance and method for operation thereof |
KR20140082771A (en) | Automatically adapting user interfaces for hands-free interaction |
US11423875B2 (en) | Highly empathetic ITS processing |
JP7166294B2 (en) | Audio processing method, device and storage medium |
CN105139848B (en) | Data transfer device and device |
EP3714383A1 (en) | Characterized chatbot with personality |
KR20150104930A (en) | Method and system of supporting multitasking of speech recognition service in communication device |
US11921782B2 (en) | VideoChat |
Legal Events
Code | Title | Description |
---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
17P | Request for examination filed | Effective date: 20200625 |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the european patent | Extension state: BA ME |
DAV | Request for validation of the european patent (deleted) | |
DAX | Request for extension of the european patent (deleted) | |
RIC1 | Information provided on ipc code assigned before grant | Ipc: G10L 13/00 20060101ALN20210803BHEP Ipc: G10L 15/34 20130101ALN20210803BHEP Ipc: G10L 15/18 20130101ALI20210803BHEP Ipc: G10L 15/28 20130101ALI20210803BHEP Ipc: G10L 15/26 20060101ALI20210803BHEP Ipc: G10L 15/22 20060101AFI20210803BHEP |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: GRANT OF PATENT IS INTENDED |
RIC1 | Information provided on ipc code assigned before grant | Ipc: G10L 13/00 20060101ALN20210916BHEP Ipc: G10L 15/34 20130101ALN20210916BHEP Ipc: G10L 15/18 20130101ALI20210916BHEP Ipc: G10L 15/28 20130101ALI20210916BHEP Ipc: G10L 15/26 20060101ALI20210916BHEP Ipc: G10L 15/22 20060101AFI20210916BHEP |
INTG | Intention to grant announced | Effective date: 20211007 |
RIC1 | Information provided on ipc code assigned before grant | Ipc: G10L 13/00 20060101ALN20210924BHEP Ipc: G10L 15/34 20130101ALN20210924BHEP Ipc: G10L 15/18 20130101ALI20210924BHEP Ipc: G10L 15/28 20130101ALI20210924BHEP Ipc: G10L 15/26 20060101ALI20210924BHEP Ipc: G10L 15/22 20060101AFI20210924BHEP |
RAP3 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
AK | Designated contracting states | Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
REG | Reference to a national code | Ref country code: GB Ref legal event code: FG4D |
REG | Reference to a national code | Ref country code: CH Ref legal event code: EP |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R096 Ref document number: 602018032453 Country of ref document: DE |
REG | Reference to a national code | Ref country code: IE Ref legal event code: FG4D |
REG | Reference to a national code | Ref country code: AT Ref legal event code: REF Ref document number: 1476455 Country of ref document: AT Kind code of ref document: T Effective date: 20220415 |
REG | Reference to a national code | Ref country code: LT Ref legal event code: MG9D |
REG | Reference to a national code | Ref country code: NL Ref legal event code: MP Effective date: 20220316 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220616 Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220616 |
REG | Reference to a national code | Ref country code: AT Ref legal event code: MK05 Ref document number: 1476455 Country of ref document: AT Kind code of ref document: T Effective date: 20220316 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220617 Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220718 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220716 Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R097 Ref document number: 602018032453 Country of ref document: DE |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 |
26N | No opposition filed | Effective date: 20221219 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 |
P01 | Opt-out of the competence of the unified patent court (upc) registered | Effective date: 20230523 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 |
REG | Reference to a national code | Ref country code: CH Ref legal event code: PL |
REG | Reference to a national code | Ref country code: BE Ref legal event code: MM Effective date: 20221231 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20221213 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20221231 Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20221213 Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20221231 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20221231 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB Payment date: 20231121 Year of fee payment: 6 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR Payment date: 20231122 Year of fee payment: 6 Ref country code: DE Payment date: 20231121 Year of fee payment: 6 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20220316 |