CN108877792A - Method, apparatus, electronic device, and computer-readable storage medium for processing a voice dialogue - Google Patents
- Publication number: CN108877792A
- Application number: CN201810541680.6A
- Authority
- CN
- China
- Prior art keywords
- voice
- recognition result
- user
- reply
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
- G10L2015/223—Execution procedure of a spoken command
Abstract
According to example embodiments of the present disclosure, a method, an apparatus, an electronic device, and a computer-readable storage medium for processing a voice dialogue are provided. The method includes, in response to receiving a first voice from a user, providing a first reply to the first voice, where the first reply is generated based on a first recognition result of the first voice. The method further includes receiving a second voice from the user, where the second voice is used to correct or supplement the first recognition result. In addition, the method includes generating and providing a second reply to the user based on the first voice and the second voice, where the second reply better matches the user's intention than the first reply. According to embodiments of the present disclosure, when a speech recognition anomaly prevents a chatbot from accurately recognizing the content of the user's speech, the user can actively correct or supplement it through further voice dialogue, thereby resolving the anomaly in speech recognition.
Description
Technical field
Embodiments of the present disclosure relate generally to the field of artificial intelligence, and more particularly to a method, an apparatus, an electronic device, and a computer-readable storage medium for processing a voice dialogue.
Background

In recent years, the idea of "Conversation as a Platform" has increasingly taken hold, and more and more networked products and applications have begun to adopt conversational human-machine interaction. A chatbot is a computer program or piece of software that interacts with humans through text, voice, pictures, and the like; it can understand what the user says and respond automatically. Chatbots can, to some extent, converse in place of a human and can be integrated into dialogue systems as automated online assistants for scenarios such as casual chat, customer service, and information queries.

Voice dialogue is a common form of human-computer interaction. Compared with text dialogue, voice dialogue also involves processing the speech itself, such as front-end signal processing, speech recognition, and speech synthesis. Because the dialogue system operates on the output of speech recognition, it places high demands on recognition accuracy. Application scenarios of voice dialogue include intelligent voice assistants, smart speakers, in-vehicle navigation, and so on.
Summary of the invention
According to example embodiments of the present disclosure, a method, an apparatus, an electronic device, and a computer-readable storage medium for processing a voice dialogue are provided.

In a first aspect of the present disclosure, a method for processing a voice dialogue is provided. The method includes: in response to receiving a first voice from a user, providing a first reply to the first voice, where the first reply is generated based on a first recognition result of the first voice; receiving a second voice from the user, where the second voice is used to correct or supplement the first recognition result; and, based on the first voice and the second voice, generating and providing a second reply to the user, where the second reply better matches the user's intention than the first reply.

In a second aspect of the present disclosure, an apparatus for processing a voice dialogue is provided. The apparatus includes: a first providing module configured to, in response to receiving a first voice from a user, provide a first reply to the first voice, where the first reply is generated based on a first recognition result of the first voice; a voice receiving module configured to receive a second voice from the user, where the second voice is used to correct or supplement the first recognition result; and a second providing module configured to, based on the first voice and the second voice, generate and provide a second reply to the user, where the second reply better matches the user's intention than the first reply.

In a third aspect of the present disclosure, an electronic device is provided, which includes one or more processors and a storage device for storing one or more programs. The one or more programs, when executed by the one or more processors, cause the electronic device to implement methods or processes according to embodiments of the present disclosure.

In a fourth aspect of the present disclosure, a computer-readable medium is provided on which a computer program is stored; when executed by a processor, the program implements methods or processes according to embodiments of the present disclosure.

It should be understood that the content described in this Summary is not intended to limit the key or essential features of embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the description below.
Brief description of the drawings

The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:

Fig. 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
Fig. 2 shows a flowchart of a method for processing a voice dialogue according to an embodiment of the present disclosure;
Fig. 3 shows a schematic diagram of a process for handling a voice message according to an embodiment of the present disclosure;
Fig. 4 shows a flowchart of a method for resolving text recognition errors through dialogue according to an embodiment of the present disclosure;
Fig. 5 shows a flowchart of a method for resolving digit recognition errors through dialogue according to an embodiment of the present disclosure;
Fig. 6 shows a flowchart of a method for supplementing a recognition result through dialogue according to an embodiment of the present disclosure;
Fig. 7 shows a block diagram of an apparatus for processing a voice dialogue according to an embodiment of the present disclosure; and
Fig. 8 shows a block diagram of an electronic device in which multiple embodiments of the present disclosure can be implemented.
Detailed description

Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and are not intended to limit the scope of protection of the present disclosure.

In the description of embodiments of the present disclosure, the term "include" and its variants should be understood as open-ended inclusion, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The terms "one embodiment" or "an embodiment" should be understood as "at least one embodiment". Other explicit and implicit definitions may also be included below.
In voice dialogue scenarios, background noise or user accents frequently cause speech recognition anomalies (such as recognition errors or failure to recognize at all). To address such anomalies, one line of improvement is to raise the accuracy of speech recognition itself; another is to improve the fault tolerance of semantic understanding. However, even with both improvements, scenarios still arise in which an accurate dialogue is impossible because of a speech recognition anomaly. In general, semantic understanding is performed on the recognition result of the speech, so when the chatbot cannot recognize the user's intention, or recognizes it incorrectly, unexpected consequences may follow.

Embodiments of the present disclosure propose a scheme for processing a voice dialogue. According to embodiments of the present disclosure, when a speech recognition anomaly causes the chatbot to misrecognize, or fail to recognize, the content of the user's speech, the user can actively correct or supplement it through further voice dialogue. A semantic understanding platform according to embodiments of the present disclosure can thus resolve speech recognition anomalies and improve the user experience of the chat process. Some example embodiments of the present disclosure are described in detail below with reference to Figs. 1-8.
Fig. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. In the example environment 100, a user 110 carries on a voice dialogue with a chatbot 120 (also referred to as a "chat engine"). Optionally, the user 110 can converse with the chatbot 120 directly, i.e., locally at the chatbot 120. Alternatively, the user 110 can use a local device (a laptop computer, desktop computer, smartphone, tablet computer, etc.) to carry on the voice dialogue with the chatbot 120 over a network. It should be understood that the chatbot 120 can be deployed in a local electronic device, in the cloud, or in a distributed fashion.
With reference to Fig. 1, the user 110 sends a voice 121 (referred to as a "first voice") to the chatbot 120; the chatbot 120 processes the voice 121 and provides a corresponding reply 122 (referred to as a "first reply") to the user 110. At this point, the first round of dialogue between the user 110 and the chatbot 120 is complete. In some embodiments, the text of the voice 121 can simultaneously be shown on the display of the user device, so that the user clearly understands the current dialogue content.

In embodiments of the present disclosure, when the reply 122 does not satisfy the need of the user 110 (for example, a recognition error in the voice 121 prevents the chatbot 120 from accurately identifying the intention of the user 110), the user can send a further voice 123 (referred to as a "second voice") to the chatbot 120 for correction or supplementation; the chatbot 120 processes the voice 123 and provides a corresponding reply 124 (referred to as a "second reply") to the user 110. According to embodiments of the present disclosure, because the voice 123 corrects or supplements the recognition result of the voice 121, the chatbot 120 can, by combining the voices 121 and 123, identify the intention of the user 110 more accurately. Table 1 below shows an example dialogue in which a voice corrects the recognition result of the first voice.
Table 1

| Recognition result of the user's voice: | Look up the contact information of Wang Feng (王峰) |
| Chatbot's reply: | Looking up the contact information of Wang Feng (王峰), please wait |
| Recognition result of the user's voice: | It's the feng (丰) of "fengshou" (丰收, harvest) |
| Chatbot's reply: | Looking up the contact information of Wang Feng (王丰), please wait |
For example, the recognition result of the voice 121 of the user 110 is "look up the contact information of Wang Feng (王峰)", and the chatbot 120 generates the corresponding reply 122, "looking up the contact information of Wang Feng (王峰), please wait". Since the user 110 originally meant "look up the contact information of Wang Feng (王丰)", the user corrects the recognition result of the voice 121 in the voice 123 by saying "it's the feng (丰) of 'harvest' (丰收)". The chatbot 120 then generates the reply 124, "looking up the contact information of Wang Feng (王丰), please wait", based on the corrected content. In this way, the chatbot 120 can accurately identify the true intention of the user 110.
Fig. 2 shows a flowchart of a method 200 for processing a voice dialogue according to an embodiment of the present disclosure. It should be understood that the method 200 can be executed by the chatbot 120 described above with reference to Fig. 1.

At block 202, in response to receiving a first voice from a user, a first reply to the first voice is provided, where the first reply is generated based on a first recognition result of the first voice. For example, after the chatbot 120 receives the voice 121 from the user 110, it provides the corresponding reply 122 to the user 110. In embodiments of the present disclosure, the first reply fails to accurately identify the user's intention because of a speech recognition anomaly; for example, it may be an erroneous reply, or a prompt such as "could not recognize" asking the user to speak again.

In some embodiments, the reply 122 can be provided to the user 110 by voice only. In other embodiments, to give the user a more intuitive understanding of the recognition result of their speech, the recognition result of the voice 121 and the text of the reply 122 can also be presented visually (for example, through a display device).
At block 204, a second voice is received from the user, where the second voice is used to correct or supplement the first recognition result. For example, because the reply 122 (i.e., the first reply) fails to correctly identify the user's intention, the chatbot 120 further receives from the user 110 the voice 123 for correction or supplementation. That is, when a speech recognition error causes the chatbot 120 to misrecognize, or fail to recognize, the speech content of the user 110, the user 110 can actively clarify through further voice dialogue. For example, the user can actively correct one or more characters and/or digits by voice, or actively supplement by voice.
At block 206, based on the first voice and the second voice, a second reply is generated and provided to the user, where the second reply better matches the user's intention than the first reply. For example, based on the recognition result of the voice 123 and the recognition result of the voice 121, the chatbot 120 provides the reply 124 to the user 110. Because the voice 123 corrects or supplements the recognition result of the voice 121, the chatbot 120 can further understand the intention of the user 110; the reply 124 (i.e., the second reply) therefore matches the intention of the user 110 better than the reply 122 (the first reply), resolving the anomaly in speech recognition and improving the user experience of the chat process.
In some embodiments, if the second reply correctly identifies the user's intention, an action associated with the second reply can be executed. For example, because the second voice corrects or supplements the first recognition result of the first voice, the chatbot can accurately identify the user's intention; the chatbot can therefore execute, or instruct the execution of, an action associated with the second reply, such as making a phone call or starting map navigation. In some embodiments, if no further voice is received from the user within a threshold time after the second reply is provided, it can be assumed by default that the second reply has satisfied the user's intention, and the action associated with the second reply can then be executed directly. It should be understood that the action associated with the second reply can also be executed directly while, before, or after the second reply is generated, regardless of whether the second reply has satisfied the user's intention.
In some embodiments, if the second reply still does not identify the user's intention, a third voice can be received from the user. Next, based at least in part on the third voice, a third reply is provided to the user. For example, although the second reply matches the user's intention better than the first reply, it may still fail to fully satisfy the user's need. In this case, the user can initiate a further voice to continue correcting or supplementing the previous speech recognition result.
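The correct-or-supplement loop of blocks 202-206 can be sketched as a small stateful program. Everything below (the `ChatBot` class, the "it is X" correction pattern, `toy_corrector`) is an illustrative assumption, not the patent's implementation: the sketch only shows how a second voice can revise the recognition result kept from the first.

```python
def shared_prefix(a: str, b: str) -> int:
    """Length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def toy_corrector(prev, new):
    """Treat utterances of the form "it is X" as corrections: replace the
    token of `prev` most similar to X (longest shared prefix).
    Returns None when `new` is not a correction."""
    if not new.startswith("it is "):
        return None
    x = new[len("it is "):]
    tokens = prev.split()
    best = max(range(len(tokens)), key=lambda i: shared_prefix(tokens[i], x))
    if shared_prefix(tokens[best], x) == 0:
        return None
    tokens[best] = x
    return " ".join(tokens)

class ChatBot:
    """Keeps the last recognition result so a later utterance can revise it
    (blocks 202-206 of method 200, sketched)."""
    def __init__(self, corrector):
        self.corrector = corrector
        self.last_result = None

    def on_voice(self, recognized_text):
        if self.last_result is not None:
            merged = self.corrector(self.last_result, recognized_text)
            if merged is not None:          # the second voice corrects the first
                self.last_result = merged
                return f"Reply based on: {merged}"
        self.last_result = recognized_text  # a fresh first voice
        return f"Reply based on: {recognized_text}"
```

With this sketch, `ChatBot(toy_corrector).on_voice("call 13511652271")` answers from the first recognition result, and a follow-up `on_voice("it is 13511052271")` revises the stored number before replying.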
Fig. 3 shows a schematic diagram of a process 300 for handling a voice message according to an embodiment of the present disclosure. It should be understood that the process 300 can be executed by the chatbot 120 described above with reference to Fig. 1, and that the process 300 can be an example implementation of providing a reply based on the received voice as described above with reference to Fig. 2.

At block 302, a voice from the user is input; at block 304, the input voice is converted into text by automatic speech recognition (ASR). At block 306, the text is converted by natural language understanding (NLU) into a representation that the computer can understand. At block 308, the intention and slots in the text are extracted and integrated with the historical dialogue state through dialogue state tracking (DST). At block 310, according to the current dialogue state, the one action that best fits the current state is selected by ranking candidate actions. After the action is performed, at block 312 natural language is produced by natural language generation (NLG), and at block 314 the generated natural language undergoes text-to-speech synthesis (TTS). Then, at block 316, the voice is output to be provided to the user. In the process 300, blocks 302, 304, 314, and 316 relate to speech processing, while blocks 306, 308, 310, and 312 relate to natural language processing; dialogue state tracking and candidate-action ranking together constitute dialogue management, which can generate the action to be executed based on the semantic representation of the speech and the current context, and update the context.
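The stage sequence of process 300 can be sketched as a chain of functions. The stage names follow the figure, but every implementation below is a toy stand-in (real ASR, NLU, and TTS are external systems), so treat this only as an illustration of the data flow between blocks:

```python
def asr(audio):                      # block 304: audio -> text (stubbed)
    return audio                     # pretend the audio is already transcribed

def nlu(text):                       # block 306: text -> intention and slots
    words = text.split()
    return {"intent": words[0], "slots": words[1:]}

def dst(state, frame):               # block 308: merge frame into dialogue state
    merged = dict(state)
    merged["intent"] = frame["intent"]
    merged["slots"] = merged.get("slots", []) + frame["slots"]
    return merged

def policy(state):                   # block 310: pick the best-fitting action
    if state["intent"] == "goto":
        return ("navigate", " ".join(state["slots"]))
    return ("fallback", "")

def nlg(action):                     # block 312: action -> reply text
    kind, arg = action
    if kind == "navigate":
        return f"Looking up the route to {arg}"
    return "Sorry, I did not understand"

def tts(text):                       # block 314: text -> audio (stubbed)
    return text.encode("utf-8")

def one_turn(audio, state):          # blocks 302-316 chained together
    text = asr(audio)
    state = dst(state, nlu(text))
    reply = nlg(policy(state))
    return tts(reply), state
```

Because the toy `dst` keeps accumulating slots across turns, a second call such as `one_turn("goto west gate", state)` extends the earlier destination, which mirrors the supplement behavior described later for method 600.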
In some embodiments, to give the user a better interactive experience, the speech recognition result can be shown on a display device (such as the display of the user device). In this case, the recognition result of the voice and its reply can be presented simultaneously through the display device, so that the user is aware of the recognition result of their speech.
Fig. 4 shows a flowchart of a method 400 for resolving text recognition errors through dialogue according to an embodiment of the present disclosure. It should be understood that the method 400 can be executed by the chatbot 120 described above with reference to Fig. 1; block 402 can be an example implementation of block 204 described above with reference to Fig. 2, and blocks 404 and 406 can be example implementations of block 206 described above with reference to Fig. 2.

At block 402, a second voice for correcting one or more characters in the first recognition result is received, where the first recognition result of the first voice contains a text recognition error. At block 404, the first recognition result is corrected using the second recognition result of the second voice. At block 406, a second reply is presented through the display device based on the corrected first recognition result. For example, Tables 2-3 below show example voice dialogues that correct one or more erroneous characters in the recognition result of the first voice.
Table 2

| Recognition result of the user's voice: | I want to go to Xi'erqi (西二奇) |
| Chatbot's reply: | Sorry, I do not understand what you mean |
| Recognition result of the user's voice: | It's Xi'erqi (西二旗) |
| Chatbot's reply: | Looking up the navigation route to Xi'erqi (西二旗) |
Table 3

| Recognition result of the user's voice: | I want to go to Xi'erqi (习二奇) |
| Chatbot's reply: | Sorry, I do not understand what you mean |
| Recognition result of the user's voice: | It's the xi (西) of "dongxi" (东西, thing) and the qi (旗) of "guoqi" (国旗, national flag) |
| Chatbot's reply: | Looking up the navigation route to Xi'erqi (西二旗) |
In the example of Table 2, the user's second voice corrects the single wrong character, qi (奇), in the recognition result of the first voice; in the example of Table 3, the user's second voice corrects multiple wrong characters, xi (习) and qi (奇). Next, the chatbot provides the second reply based on the corrected speech recognition result; for example, in the examples of Tables 2 and 3, the corrected recognition result is "I want to go to Xi'erqi (西二旗)". It should be understood that although the embodiments of Tables 2-3 correct Chinese text, embodiments of the present disclosure can also be used to correct speech recognition errors in other languages.
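The "character of word" correction pattern in Table 3 could be realized along the following lines. The pinyin table is a tiny illustrative lexicon and the matching rule is an assumption, not the patent's algorithm: the user names the intended character, and the system substitutes it for a same-sounding character in the recognition result.

```python
# Tiny illustrative pronunciation lexicon; a real system would use a full one.
PINYIN = {"西": "xi", "习": "xi", "旗": "qi", "奇": "qi", "二": "er"}

def correct_homophones(result: str, intended_chars: str) -> str:
    """For each character the user spelled out, replace the first character
    of `result` that sounds the same but is written differently."""
    out = list(result)
    for ch in intended_chars:
        target = PINYIN.get(ch)
        if target is None:
            continue
        for i, old in enumerate(out):
            if old != ch and PINYIN.get(old) == target:
                out[i] = ch
                break
    return "".join(out)
```

Under these assumptions, `correct_homophones("我要去习二奇", "西旗")` yields "我要去西二旗", matching the outcome in Table 3.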
Fig. 5 shows a flowchart of a method 500 for resolving digit recognition errors through dialogue according to an embodiment of the present disclosure. It should be understood that the method 500 can be executed by the chatbot 120 described above with reference to Fig. 1; block 502 can be an example implementation of block 204 described above with reference to Fig. 2, and blocks 504 and 506 can be example implementations of block 206 described above with reference to Fig. 2.

At block 502, a second voice for correcting one or more digits in the first recognition result is received, where the first recognition result of the first voice contains a digit recognition error. At block 504, the first recognition result is corrected using one or more digits in the second recognition result of the second voice. At block 506, a second reply is presented through the display device based on the corrected first recognition result. For example, Table 4 below shows an example voice dialogue that corrects a digit error in the recognition result of the first voice.
Table 4

| Recognition result of the user's voice: | Call 13511652271 |
| Chatbot's reply: | Calling 13511652271 |
| Recognition result of the user's voice: | It's 110 |
| Chatbot's reply: | Calling 13511052271 |
In the example of Table 4, the user's second voice corrects the single erroneous digit "6" in the recognition result of the first voice; that is, the sixth digit of the telephone number should be 0 rather than 6. In some embodiments, during the user's digit correction, the digit segment that the user needs to correct can be determined based on the maximum match between the second voice and the recognition result of the first voice. For example, in the example of Table 4, the digits "110" match "116" in the recognition result to the greatest extent, so it can be determined that "110" is intended to correct "116" in the previous round of dialogue. In some embodiments, the user can also correct both erroneous characters and erroneous digits in the speech recognition result at the same time.
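The maximum-match heuristic for digit correction might be sketched as follows; this is an illustrative reading of the idea, not the patent's algorithm. The correction is slid over the recognized number, and the window with the most agreeing digits is replaced:

```python
def correct_digits(original: str, correction: str) -> str:
    """Replace the window of `original` that best matches `correction`
    (maximum number of agreeing digits; leftmost window on ties)."""
    n = len(correction)
    best_pos, best_score = 0, -1
    for i in range(len(original) - n + 1):
        # Count positions where the window agrees with the correction.
        score = sum(a == b for a, b in zip(original[i:i + n], correction))
        if score > best_score:
            best_pos, best_score = i, score
    return original[:best_pos] + correction + original[best_pos + n:]
```

For the Table 4 dialogue, `correct_digits("13511652271", "110")` finds that the window "116" agrees with "110" in two of three positions, more than any other window, and returns "13511052271".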
The inventors of the present application found that some speech recognition errors can only be discovered through a visual display (for example, text recognition errors caused by homophones), while other speech recognition errors can also be discovered by voice, such as one or more misrecognized digits in a telephone number. In some embodiments, it is also possible not to show the recognition result of the first voice and the first reply on a display device, and instead to provide the first reply by voice only. In this case, the first reply can read a set of digits aloud, and the user can, based on it, issue a voice for correcting one or more digits in the set. Next, the chatbot provides, based on the corrected set of digits, a second reply that better matches the user's intention. It should be understood that a scenario in which the recognition result is not displayed can be a scenario without a display (for example, a smart speaker without a screen), or one with a display that is not used to show the speech recognition result (for example, a voice dialogue while a smartphone's screen is off).
Fig. 6 shows a flowchart of a method 600 for supplementing a recognition result through dialogue according to an embodiment of the present disclosure. It should be understood that the method 600 can be executed by the chatbot 120 described above with reference to Fig. 1; block 602 can be an example implementation of block 204 described above with reference to Fig. 2, and blocks 604 and 606 can be example implementations of block 206 described above with reference to Fig. 2.

At block 602, a second voice for supplementing the first recognition result is received. For example, the first recognition result of the first voice does not completely reflect the user's need. At block 604, in response to the content of the second voice semantically supplementing the content of the first voice, the first recognition result and the second recognition result of the second voice are combined to generate a third recognition result.
Whether the contents of two segments of voice have a supplementary relationship can be determined by various means. For example, in some embodiments, it can be determined whether the content of the second voice has a semantically continuous relationship with the content of the first voice. For example, if the contents of the two segments can be parsed together as a whole, it can be assumed that they are semantically continuous, and accordingly it is determined that the content of the second voice is a supplement to the first voice. Alternatively or additionally, in some embodiments, it can be determined whether the probability that the contents of the two segments occur together is greater than a predetermined threshold. For example, if the contents of the two segments co-occur most of the time, it can be determined that the content of the second voice is a supplement to the first voice.
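The two tests just described can be sketched as follows. `KNOWN_PHRASES` and `COOCCUR_PROB` are tiny made-up stand-ins for a real parser and real corpus statistics, so this is only an illustration of the decision logic:

```python
# Toy stand-ins: a real system would query a parser and corpus statistics.
KNOWN_PHRASES = {"Peking University west gate"}
COOCCUR_PROB = {("Peking University", "west gate"): 0.8,
                ("Peking University", "the weather"): 0.1}

def is_supplement(first: str, second: str, threshold: float = 0.5) -> bool:
    """Decide whether `second` semantically supplements `first`."""
    # Test 1: the concatenation parses as one complete, known phrase.
    if f"{first} {second}" in KNOWN_PHRASES:
        return True
    # Test 2: the two contents co-occur with probability above a threshold.
    return COOCCUR_PROB.get((first, second), 0.0) > threshold
```

With these toy tables, "west gate" is judged a supplement to "Peking University", while "the weather" is not, so the latter would start a fresh request rather than extend the earlier one.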
At block 606, a second reply is presented through the display device based on the third recognition result. For example, Table 5 below shows an example dialogue in which the recognition result of the first voice is supplemented.
Table 5

| Recognition result of the user's voice: | I want to go to Peking University |
| Chatbot's reply: | Looking up the navigation route to Peking University |
| Recognition result of the user's voice: | The west gate |
| Chatbot's reply: | Looking up the navigation route to the west gate of Peking University |
In the example of table 5, user's is intended to Peking University west gate, and speech recognition system is receiving " north
Identification is just had begun after capital university ", thus not can recognize that the accurate intention of user.In this case, user can lead to
Natural language dialogue supplemental information is crossed, the recognition result for then combining the two voices generates new recognition result " I will go to Beijing
University west gate " then generates corresponding reply based on new recognition result.In some embodiments, complete for not expressing
Recognition result, user can direct supplemental information, without wait recognition result and parsing result return, do not need to repeat yet
Previous conversation content.
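The supplementation flow of Table 5 can be sketched as follows. The `parse_intent` lookup is a hypothetical stand-in for the system's semantic parser, and simple string concatenation stands in for whatever combination strategy an implementation actually uses.

```python
# Sketch of blocks 604-606: combine two recognition results into a
# third one when the second supplements the first. The intent table
# is a hypothetical stand-in for a real semantic parser.

INTENTS = {
    "I want to go to Peking University west gate": "navigate:PKU west gate",
}


def parse_intent(text: str):
    """Return the parsed intent, or None if the text is incomplete."""
    return INTENTS.get(text)


def combine(first_result: str, second_result: str) -> str:
    """Generate the third recognition result from the two partial ones."""
    return f"{first_result} {second_result}"


first = "I want to go to Peking University"
second = "west gate"
assert parse_intent(first) is None  # the first voice alone is ambiguous
third = combine(first, second)      # third recognition result
print(parse_intent(third))          # navigate:PKU west gate
```

The point the example makes is the one in the description: neither partial result parses on its own, but the combined third result yields the user's precise intention without repeating the earlier dialogue.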
Fig. 7 shows a block diagram of a device 700 for processing a voice dialogue according to an embodiment of the present disclosure. As shown in Fig. 7, the device 700 includes a first providing module 710, a voice receiving module 720, and a second providing module 730. The first providing module 710 is configured to, in response to receiving a first voice from a user, provide a first reply for the first voice, wherein the first reply is generated based on a first recognition result of the first voice. The voice receiving module 720 is configured to receive a second voice from the user, wherein the second voice is used to correct or supplement the first recognition result. The second providing module 730 is configured to generate, based on the first voice and the second voice, a second reply to be provided to the user, wherein the second reply better matches the user's intention than the first reply.
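Under the assumption that each module can be modeled as a simple callable object, the structure of device 700 might be sketched as below. Recognition and reply generation are stubbed out; a real device would invoke an ASR engine and a dialogue policy, so all class and method names here are illustrative.

```python
# Minimal sketch of device 700 and its three modules. The bodies are
# stubs: identity for recognition, string formatting for replies.


class FirstProvideModule:          # module 710
    def __call__(self, first_voice: str) -> str:
        first_result = first_voice  # stub for speech recognition
        return f"reply to: {first_result}"


class VoiceReceiveModule:          # module 720
    def __call__(self, second_voice: str) -> str:
        return second_voice        # second voice corrects/supplements


class SecondProvideModule:         # module 730
    def __call__(self, first_voice: str, second_voice: str) -> str:
        combined = f"{first_voice} {second_voice}"
        return f"reply to: {combined}"


class Device700:
    def __init__(self):
        self.provide_first = FirstProvideModule()
        self.receive = VoiceReceiveModule()
        self.provide_second = SecondProvideModule()


dev = Device700()
print(dev.provide_first("Peking University"))                # first reply
print(dev.provide_second("Peking University", "west gate"))  # second reply
```

The second reply is produced from both voices, mirroring how module 730 refines the intention that module 710 alone could not resolve.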
In some embodiments, the first providing module 710 includes a first presenting module, which is configured to present the first recognition result and the first reply through a display device.
In some embodiments, the second providing module 730 includes a first correcting module, which is configured to use a second recognition result of the second voice to correct a text recognition error in the first recognition result.
In some embodiments, the second providing module 730 includes a second correcting module, which is configured to use one or more digits in a second recognition result of the second voice to correct a digit recognition error in the first recognition result.
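The digit-correction behavior of the second correcting module can be sketched as follows. The alignment strategy (replace the first run of digits wholesale with the digits from the second voice) is an illustrative assumption, since the disclosure does not fix a particular correction algorithm, and the phone numbers are made up.

```python
import re

# Sketch of correcting a digit recognition error in the first
# recognition result with digits taken from the second voice.
# Replacing the first digit run wholesale is an assumption.


def correct_digits(first_result: str, second_result: str) -> str:
    """Replace the digit sequence in first_result with the digits
    the user spoke in the second voice."""
    new_digits = "".join(re.findall(r"\d+", second_result))
    if not new_digits:
        return first_result  # nothing to correct with
    return re.sub(r"\d+", new_digits, first_result, count=1)


first = "call 13802145678"           # misrecognized number
second = "no, it is 13802145679"     # user's spoken correction
print(correct_digits(first, second))  # call 13802145679
```

A production system would more likely align the two digit strings and patch only the differing positions, but the input/output contract is the one the module describes.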
In some embodiments, the second voice is used to supplement the first recognition result, and the second providing module 730 includes: a combining module, configured to, in response to determining that the content of the second voice semantically supplements the content of the first voice, combine the first recognition result with a second recognition result of the second voice to generate a third recognition result; and a second presenting module, configured to present the second reply through the display device based on the third recognition result.
In some embodiments, the first providing module 710 includes a first voice providing module, which is configured to provide the first reply by voice only, wherein the first reply includes a digit sequence from the first recognition result.
In some embodiments, the second voice is used to correct the digit sequence, and the second providing module 730 includes: a third correcting module, configured to use one or more digits in a second recognition result of the second voice to correct the digit sequence; and a second voice providing module, configured to provide the second reply by voice based on the corrected digit sequence.
In some embodiments, the device 700 further includes an action executing module, which is configured to, in response to the second reply having identified the user's intention, execute an action associated with the second reply.
In some embodiments, the device 700 further includes: a third voice receiving module, configured to, in response to the second reply still failing to identify the user's intention, receive a third voice from the user; and a third providing module, configured to provide a third reply to the user based at least in part on the third voice.
It should be appreciated that the first providing module 710, the voice receiving module 720, and the second providing module 730 shown in Fig. 7 can be included in the chat robot 120 described with reference to Fig. 1. Furthermore, it should be understood that the modules shown in Fig. 7 can perform steps or actions in the methods or processes described with reference to embodiments of the present disclosure.
Fig. 8 shows a schematic block diagram of an example device 800 that can be used to implement embodiments of the present disclosure. It should be understood that the device 800 can be used to implement the device 700 for processing a voice dialogue or the chat robot 120 described in the present disclosure. As shown, the device 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 802 or computer program instructions loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A plurality of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard or a mouse; an output unit 807, such as various types of displays and loudspeakers; a storage unit 808, such as a magnetic disk or an optical disc; and a communication unit 809, such as a network card, a modem, or a wireless communication transceiver. The communication unit 809 allows the device 800 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The processing unit 801 performs the methods and processes described above, such as the method 200, the process 300, the method 400, the method 500, and the method 600. For example, in some embodiments, the method 200, the process 300, the method 400, the method 500, and the method 600 can be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded into and/or installed on the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the CPU 801, one or more actions or steps of the method 200, the process 300, the method 400, the method 500, and the method 600 described above can be performed. Alternatively, in other embodiments, the CPU 801 can be configured to perform the method 200, the process 300, the method 400, the method 500, and the method 600 by any other appropriate means (for example, by means of firmware).
The functions described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: field-programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
Program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code can be executed entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium can be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although the actions or steps are depicted in a particular order, this should not be understood as requiring that such actions or steps be performed in the particular order shown or in sequential order, or that all illustrated actions or steps be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Similarly, although the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the present disclosure has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely exemplary forms of implementing the claims.
Claims (20)
1. A method for processing a voice dialogue, comprising:
in response to receiving a first voice from a user, providing a first reply for the first voice, the first reply being generated based on a first recognition result of the first voice;
receiving a second voice from the user, the second voice being used to correct or supplement the first recognition result; and
generating, based on the first voice and the second voice, a second reply to be provided to the user, the second reply better matching an intention of the user than the first reply.
2. The method according to claim 1, wherein providing the first reply for the first voice comprises:
presenting the first recognition result and the first reply through a display device.
3. The method according to claim 2, wherein generating the second reply comprises:
using a second recognition result of the second voice to correct a text recognition error in the first recognition result.
4. The method according to claim 2, wherein generating the second reply comprises:
using one or more digits in a second recognition result of the second voice to correct a digit recognition error in the first recognition result.
5. The method according to claim 2, wherein generating the second reply comprises:
in response to determining that the content of the second voice semantically supplements the content of the first voice, combining the first recognition result with a second recognition result of the second voice to generate a third recognition result; and
presenting the second reply through the display device based on the third recognition result.
6. The method according to claim 1, wherein providing the first reply for the first voice comprises:
providing the first reply by voice, the first reply comprising a digit sequence from the first recognition result.
7. The method according to claim 6, wherein generating the second reply comprises:
correcting the digit sequence using one or more digits in a second recognition result of the second voice; and
providing the second reply by voice based on the corrected digit sequence.
8. The method according to any one of claims 1-7, further comprising:
in response to the second reply having identified the intention of the user, executing an action associated with the second reply.
9. The method according to any one of claims 1-7, further comprising:
in response to the second reply still failing to identify the intention of the user, receiving a third voice from the user; and
providing a third reply to the user based at least in part on the third voice.
10. A device for processing a voice dialogue, comprising:
a first providing module, configured to, in response to receiving a first voice from a user, provide a first reply for the first voice, the first reply being generated based on a first recognition result of the first voice;
a voice receiving module, configured to receive a second voice from the user, the second voice being used to correct or supplement the first recognition result; and
a second providing module, configured to generate, based on the first voice and the second voice, a second reply to be provided to the user, the second reply better matching an intention of the user than the first reply.
11. The device according to claim 10, wherein the first providing module comprises:
a first presenting module, configured to present the first recognition result and the first reply through a display device.
12. The device according to claim 11, wherein the second providing module comprises:
a first correcting module, configured to use a second recognition result of the second voice to correct a text recognition error in the first recognition result.
13. The device according to claim 11, wherein the second providing module comprises:
a second correcting module, configured to use one or more digits in a second recognition result of the second voice to correct a digit recognition error in the first recognition result.
14. The device according to claim 11, wherein the second providing module comprises:
a combining module, configured to, in response to determining that the content of the second voice semantically supplements the content of the first voice, combine the first recognition result with a second recognition result of the second voice to generate a third recognition result; and
a second presenting module, configured to present the second reply through the display device based on the third recognition result.
15. The device according to claim 10, wherein the first providing module comprises:
a first voice providing module, configured to provide the first reply by voice, the first reply comprising a digit sequence from the first recognition result.
16. The device according to claim 15, wherein the second providing module comprises:
a third correcting module, configured to use one or more digits in a second recognition result of the second voice to correct the digit sequence; and
a second voice providing module, configured to provide the second reply by voice based on the corrected digit sequence.
17. The device according to any one of claims 10-16, further comprising:
an action executing module, configured to, in response to the second reply having identified the intention of the user, execute an action associated with the second reply.
18. The device according to any one of claims 10-16, further comprising:
a third voice receiving module, configured to, in response to the second reply still failing to identify the intention of the user, receive a third voice from the user; and
a third providing module, configured to provide a third reply to the user based at least in part on the third voice.
19. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the electronic device to implement the method according to any one of claims 1-9.
20. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810541680.6A CN108877792B (en) | 2018-05-30 | 2018-05-30 | Method, apparatus, electronic device and computer readable storage medium for processing voice conversations |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108877792A true CN108877792A (en) | 2018-11-23 |
CN108877792B CN108877792B (en) | 2023-10-24 |
Family
ID=64335845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810541680.6A Active CN108877792B (en) | 2018-05-30 | 2018-05-30 | Method, apparatus, electronic device and computer readable storage medium for processing voice conversations |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108877792B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012037820A (en) * | 2010-08-11 | 2012-02-23 | Murata Mach Ltd | Voice recognition apparatus, voice recognition apparatus for picking, and voice recognition method |
CN105094315A (en) * | 2015-06-25 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Method and apparatus for smart man-machine chat based on artificial intelligence |
CN105468582A (en) * | 2015-11-18 | 2016-04-06 | 苏州思必驰信息科技有限公司 | Method and device for correcting numeric string based on human-computer interaction |
CN106710592A (en) * | 2016-12-29 | 2017-05-24 | 北京奇虎科技有限公司 | Speech recognition error correction method and speech recognition error correction device used for intelligent hardware equipment |
US9728188B1 (en) * | 2016-06-28 | 2017-08-08 | Amazon Technologies, Inc. | Methods and devices for ignoring similar audio being received by a system |
CN107045496A (en) * | 2017-04-19 | 2017-08-15 | 畅捷通信息技术股份有限公司 | The error correction method and error correction device of text after speech recognition |
CN107305483A (en) * | 2016-04-25 | 2017-10-31 | 北京搜狗科技发展有限公司 | A kind of voice interactive method and device based on semantics recognition |
JP2018004976A (en) * | 2016-07-04 | 2018-01-11 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | Voice interactive method, voice interactive device and voice interactive program |
CN107870977A (en) * | 2016-09-27 | 2018-04-03 | 谷歌公司 | Chat robots output is formed based on User Status |
CN107943914A (en) * | 2017-11-20 | 2018-04-20 | 渡鸦科技(北京)有限责任公司 | Voice information processing method and device |
Non-Patent Citations (1)
Title |
---|
盛世流光中: "苹果Siri对比三星Bixby,语音助手都成精了", 《爱奇艺》, 23 November 2017 (2017-11-23), pages 19 *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109712616B (en) * | 2018-11-29 | 2023-11-14 | 平安科技(深圳)有限公司 | Telephone number error correction method and device based on data processing and computer equipment |
CN109712616A (en) * | 2018-11-29 | 2019-05-03 | 平安科技(深圳)有限公司 | Telephone number error correction method, device and computer equipment based on data processing |
CN109922371A (en) * | 2019-03-11 | 2019-06-21 | 青岛海信电器股份有限公司 | Natural language processing method, equipment and storage medium |
CN109922371B (en) * | 2019-03-11 | 2021-07-09 | 海信视像科技股份有限公司 | Natural language processing method, apparatus and storage medium |
CN110223694B (en) * | 2019-06-26 | 2021-10-15 | 百度在线网络技术(北京)有限公司 | Voice processing method, system and device |
CN110223694A (en) * | 2019-06-26 | 2019-09-10 | 百度在线网络技术(北京)有限公司 | Method of speech processing, system and device |
CN110299152A (en) * | 2019-06-28 | 2019-10-01 | 北京猎户星空科技有限公司 | Interactive output control method, device, electronic equipment and storage medium |
CN110347815A (en) * | 2019-07-11 | 2019-10-18 | 上海蔚来汽车有限公司 | Multi-task processing method and multitasking system in speech dialogue system |
CN110738997B (en) * | 2019-10-25 | 2022-06-17 | 百度在线网络技术(北京)有限公司 | Information correction method and device, electronic equipment and storage medium |
CN110738997A (en) * | 2019-10-25 | 2020-01-31 | 百度在线网络技术(北京)有限公司 | information correction method, device, electronic equipment and storage medium |
CN112002321A (en) * | 2020-08-11 | 2020-11-27 | 海信电子科技(武汉)有限公司 | Display device, server and voice interaction method |
CN112002321B (en) * | 2020-08-11 | 2023-09-19 | 海信电子科技(武汉)有限公司 | Display device, server and voice interaction method |
WO2023040658A1 (en) * | 2021-09-18 | 2023-03-23 | 华为技术有限公司 | Speech interaction method and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||