CN108877792A - Method, apparatus, electronic device and computer-readable storage medium for processing voice dialogue - Google Patents

Method, apparatus, electronic device and computer-readable storage medium for processing voice dialogue Download PDF

Info

Publication number
CN108877792A
Authority
CN
China
Prior art keywords
voice
recognition result
user
reply
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810541680.6A
Other languages
Chinese (zh)
Other versions
CN108877792B (en)
Inventor
王矩
张晶晶
孙珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810541680.6A priority Critical patent/CN108877792B/en
Publication of CN108877792A publication Critical patent/CN108877792A/en
Application granted Critical
Publication of CN108877792B publication Critical patent/CN108877792B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/221 Announcement of recognition results
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

According to example embodiments of the present disclosure, a method, an apparatus, an electronic device and a computer-readable storage medium for processing voice dialogue are provided. The method includes, in response to receiving a first voice from a user, providing a first reply to the first voice, where the first reply is generated based on a first recognition result of the first voice. The method further includes receiving a second voice from the user, where the second voice is used to correct or supplement the first recognition result. The method further includes generating and providing to the user a second reply based on the first voice and the second voice, where the second reply matches the user's intention better than the first reply. According to embodiments of the present disclosure, when a speech-recognition anomaly prevents a chatbot from accurately recognizing the content of the user's speech, the user can use further voice dialogue to actively correct or supplement it, thereby resolving the anomaly in speech recognition.

Description

Method, apparatus, electronic device and computer-readable storage medium for processing voice dialogue
Technical field
Embodiments of the present disclosure relate generally to the field of artificial intelligence, and more particularly to a method, an apparatus, an electronic device and a computer-readable storage medium for processing voice dialogue.
Background art
In recent years, the idea of "Conversation as a Platform" has taken deeper root, and more and more networked products and applications have begun to adopt conversational human-computer interaction. A chatbot is a computer program or piece of software that interacts with humans through text, voice, pictures and the like; it can understand the content the user produces and respond automatically. Chatbots can, to some extent, converse in place of a human and can be integrated into dialogue systems as automatic online assistants for scenarios such as casual chat, customer service and information query.
Voice dialogue is a common form of human-computer interaction. Compared with text dialogue, voice dialogue also involves processing of the speech itself, such as front-end processing, speech recognition and speech synthesis. Because the dialogue system works on the content produced by speech recognition, it places high demands on the accuracy of speech recognition. Application scenarios of voice dialogue may include intelligent voice assistants, smart speakers, in-vehicle navigation and the like.
Summary of the invention
According to example embodiments of the present disclosure, a method, an apparatus, an electronic device and a computer-readable storage medium for processing voice dialogue are provided.
In a first aspect of the present disclosure, a method for processing voice dialogue is provided. The method includes: in response to receiving a first voice from a user, providing a first reply to the first voice, where the first reply is generated based on a first recognition result of the first voice; receiving a second voice from the user, where the second voice is used to correct or supplement the first recognition result; and generating and providing to the user a second reply based on the first voice and the second voice, where the second reply matches the user's intention better than the first reply.
In a second aspect of the present disclosure, an apparatus for processing voice dialogue is provided. The apparatus includes: a first providing module configured to, in response to receiving a first voice from a user, provide a first reply to the first voice, where the first reply is generated based on a first recognition result of the first voice; a voice receiving module configured to receive a second voice from the user, where the second voice is used to correct or supplement the first recognition result; and a second providing module configured to generate and provide to the user a second reply based on the first voice and the second voice, where the second reply matches the user's intention better than the first reply.
In a third aspect of the present disclosure, an electronic device is provided, including one or more processors and a storage apparatus for storing one or more programs. The one or more programs, when executed by the one or more processors, cause the electronic device to implement a method or process according to embodiments of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored. The program, when executed by a processor, implements a method or process according to embodiments of the present disclosure.
It should be understood that the content described in this part of the disclosure is not intended to identify key or essential features of embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.
Brief description of the drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 shows a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented;
Fig. 2 shows a flowchart of a method for processing voice dialogue according to an embodiment of the present disclosure;
Fig. 3 shows a schematic diagram of a process for handling a voice message according to an embodiment of the present disclosure;
Fig. 4 shows a flowchart of a method for resolving a character recognition error through dialogue according to an embodiment of the present disclosure;
Fig. 5 shows a flowchart of a method for resolving a digit recognition error through dialogue according to an embodiment of the present disclosure;
Fig. 6 shows a flowchart of a method for supplementing a recognition result through dialogue according to an embodiment of the present disclosure;
Fig. 7 shows a block diagram of an apparatus for processing voice dialogue according to an embodiment of the present disclosure; and
Fig. 8 shows a block diagram of an electronic device capable of implementing embodiments of the present disclosure.
Detailed description of embodiments
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and are not intended to limit its scope of protection.
In describing embodiments of the present disclosure, the term "include" and its variants should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "an embodiment" should be understood as "at least one embodiment". Other explicit and implicit definitions may also be given below.
In voice-dialogue scenarios, ambient noise or a user's accent frequently causes speech-recognition anomalies (such as recognition errors or failure to recognize at all). To address speech-recognition anomalies, one improvement is to raise the accuracy of speech recognition itself; another is to raise the fault tolerance of semantic understanding. However, even with both improvements, there are still scenarios in which a speech-recognition anomaly makes accurate dialogue impossible. In general, semantic understanding is performed on the recognition result of the speech, so when the chatbot cannot recognize, or wrongly recognizes, the user's intention, unexpected consequences may follow.
Embodiments of the present disclosure propose a scheme for processing voice dialogue. According to embodiments of the present disclosure, when a speech-recognition anomaly causes the chatbot to wrongly recognize, or fail to recognize, the content of the user's speech, the user can actively correct or supplement it through further voice dialogue. A semantic-understanding platform according to embodiments of the present disclosure can therefore resolve speech-recognition anomalies and improve the user experience of the chat. Some example embodiments of the present disclosure are described in detail below with reference to Figs. 1-8.
Fig. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented. In the example environment 100, a user 110 carries out a voice dialogue with a chatbot 120 (also called a "chat engine"). Optionally, the user 110 can converse with the chatbot 120 locally, that is, directly where the chatbot 120 is deployed. Alternatively, the user 110 can use a local device (a laptop, a desktop, a smartphone, a tablet or the like) to carry out the voice dialogue with the chatbot 120 over a network. It should be understood that the chatbot 120 can be deployed on a local electronic device, in the cloud, or in a distributed manner.
With reference to Fig. 1, the user 110 sends a voice 121 (called the "first voice") to the chatbot 120; the chatbot 120 processes the voice 121 and provides a corresponding reply 122 (called the "first reply") to the user 110. At this point, the first round of dialogue between the user 110 and the chatbot 120 is complete. In some embodiments, the text of the voice 121 can simultaneously be shown on the display of the user device, so that the user understands the current conversation content more clearly.
In embodiments of the present disclosure, when the reply 122 does not satisfy the needs of the user 110 (for example, a recognition error in the voice 121 prevents the chatbot 120 from accurately recognizing the intention of the user 110), the user can send a further voice 123 (called the "second voice") to the chatbot 120 for correction or supplement; the chatbot 120 processes the voice 123 and provides a corresponding reply 124 (called the "second reply") to the user 110. According to embodiments of the present disclosure, since the voice 123 corrects or supplements the recognition result of the voice 121, the chatbot 120 can, by combining the voices 121 and 123, recognize the intention of the user 110 more accurately. Table 1 below shows an example dialogue in which a voice corrects the first recognition result.
Table 1
Recognition result of the user's speech: Look up the contact information of Wang Feng
Reply of the chatbot: Looking up the contact information of Wang Feng, please wait
Recognition result of the user's speech: It is the feng of "fengshou" (丰收, good harvest), i.e. the character 丰
Reply of the chatbot: Looking up the contact information of Wang Feng, please wait
For example, the recognition result of the voice 121 of the user 110 is "Look up the contact information of Wang Feng", and the chatbot 120 generates the corresponding reply 122, "Looking up the contact information of Wang Feng, please wait". The user 110 originally meant a Wang Feng whose name is written with a different, homophonous Chinese character, so in the voice 123 the user corrects the recognition result of the voice 121 by citing a common word, 丰收 ("good harvest"), that contains the intended character 丰. The chatbot 120 then generates the reply 124, "Looking up the contact information of Wang Feng, please wait", based on the corrected content. In this way, the chatbot 120 can accurately recognize the true intention of the user 110.
Fig. 2 shows a flowchart of a method 200 for processing voice dialogue according to an embodiment of the present disclosure. It should be understood that the method 200 can be performed by the chatbot 120 described above with reference to Fig. 1.
At block 202, in response to receiving a first voice from a user, a first reply to the first voice is provided, where the first reply is generated based on a first recognition result of the first voice. For example, after the chatbot 120 receives the voice 121 from the user 110, it provides the corresponding reply 122 to the user 110. In embodiments of the present disclosure, the first reply fails to accurately capture the user's intention because of a speech-recognition anomaly; for example, it may be a wrong reply, or a prompt such as "cannot recognize" asking the user to speak again.
In some embodiments, the reply 122 can be provided to the user 110 by voice only. In other embodiments, to give the user a more intuitive view of the recognition result of the speech, the recognition result of the voice 121 and the textual form of the reply 122 can also be presented visually (for example, through a display device).
At block 204, a second voice from the user is received, where the second voice is used to correct or supplement the first recognition result. For example, because the reply 122 (that is, the first reply) fails to correctly identify the user's intention, the chatbot 120 further receives from the user 110 the voice 123 for correction or supplement. That is, when a speech-recognition error causes the chatbot 120 to wrongly recognize, or fail to recognize, the speech content of the user 110, the user 110 can actively clarify through further voice dialogue. For example, the user can actively correct one or more characters and/or digits by voice, or actively supplement by voice.
At block 206, a second reply is generated and provided to the user based on the first voice and the second voice, where the second reply matches the user's intention better than the first reply. For example, the chatbot 120 provides the reply 124 to the user 110 based on the recognition result of the voice 123 together with the recognition result of the voice 121. Because the voice 123 corrects or supplements the recognition result of the voice 121, the chatbot 120 can better understand the intention of the user 110; the reply 124 (that is, the second reply) therefore matches the intention of the user 110 better than the reply 122 (the first reply), resolving the anomaly in speech recognition and improving the user experience of the chat.
In some embodiments, if the second reply correctly identifies the user's intention, an action associated with the second reply can be performed. For example, since the second voice corrects or supplements the first recognition result of the first voice, the chatbot can accurately recognize the user's intention, and can therefore perform, or instruct another component to perform, the action associated with the second reply, such as making a phone call or starting map navigation. In some embodiments, if no further voice is received from the user within a threshold time after the second reply is provided, it can be assumed by default that the second reply has satisfied the user's intention, and the action associated with the second reply can be performed directly. It should be understood that the action associated with the second reply can also be performed directly while, before, or after the second reply is generated, regardless of whether the second reply has satisfied the user's intention.
In some embodiments, if the second reply still fails to identify the user's intention, a third voice from the user can be received. A third reply is then provided to the user based at least in part on the third voice. For example, although the second reply matches the user's intention better than the first reply, it may still not fully satisfy the user's needs. In this case, the user can initiate a further voice to continue correcting or supplementing the previous speech recognition results.
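The two-round flow of blocks 202-206 can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`ChatBot`, `recognize`, `apply_correction`): the trivial recognizer and the caller-supplied correction function stand in for the real ASR and correction logic, which the disclosure does not prescribe at this level of detail.

```python
def recognize(audio: str) -> str:
    # Stand-in for ASR: the "audio" here is already text.
    return audio


class ChatBot:
    """Toy two-round dialogue: a first reply, then a corrected second reply."""

    def __init__(self):
        self.result = None  # most recent recognition result

    def first_reply(self, first_voice: str) -> str:
        # Block 202: reply based on the first recognition result.
        self.result = recognize(first_voice)
        return f"Reply based on: {self.result}"

    def second_reply(self, second_voice: str, apply_correction) -> str:
        # Blocks 204-206: the second voice corrects or supplements
        # the first recognition result before the second reply is produced.
        self.result = apply_correction(self.result, recognize(second_voice))
        return f"Reply based on: {self.result}"
```

With a correction function that swaps the mis-recognized name, the second reply reflects the corrected recognition result while the first does not.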
Fig. 3 shows a schematic diagram of a process 300 for handling a voice message according to an embodiment of the present disclosure. It should be understood that the process 300 can be executed by the chatbot 120 described above with reference to Fig. 1, and that the process 300 can be an example implementation of providing a reply based on a received voice as described above with reference to Fig. 2.
At block 302, a voice from the user is input; at block 304, the input voice is converted into text by automatic speech recognition (ASR). At block 306, the text is converted by natural language understanding (NLU) into a representation that the computer can interpret. At block 308, the intention and word slots in the text are extracted and integrated with the historical dialogue state by dialogue state tracking (DST). At block 310, according to the current dialogue state, the action that best fits the current state is selected by ranking candidate actions. After the action is obtained, natural language is generated (NLG) at block 312, and the generated natural language undergoes speech synthesis (TTS) at block 314. Then, at block 316, the voice is output to the user. In the process 300, blocks 302, 304, 314 and 316 concern speech processing, while blocks 306, 308, 310 and 312 concern natural language processing; dialogue state tracking and candidate-action ranking together constitute dialogue management, which can generate the action to be performed based on the semantic representation of the speech and the current context, and update the context.
In some embodiments, to give the user a better interactive experience, the speech recognition result can be shown on a display device (such as the display of the user device). In this case, the recognition result of the voice and its reply can be presented simultaneously through the display device, so that the user knows how their speech was recognized.
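A hedged sketch of the block 302-316 pipeline follows, with each stage reduced to a plain function. The stage names follow the text (ASR, NLU, DST, action ranking, NLG), but the one-line implementations are placeholders, not the models a real system would use; audio I/O and TTS (blocks 302, 314, 316) are omitted.

```python
def asr(audio: str) -> str:
    # Block 304: speech -> text (placeholder: lowercase the "audio").
    return audio.lower()


def nlu(text: str) -> dict:
    # Block 306: text -> a machine-readable frame of intent and slots.
    return {"intent": "navigate", "destination": text}


def dst(state: dict, frame: dict) -> dict:
    # Block 308: integrate the new frame with the historical dialogue state.
    merged = dict(state)
    merged.update(frame)
    return merged


def rank_actions(state: dict) -> tuple:
    # Block 310: select the candidate action that best fits the state.
    return ("reply", state["destination"])


def nlg(action: tuple) -> str:
    # Block 312: turn the chosen action into natural language.
    return f"Looking up the route to {action[1]}"


def run_turn(audio: str, state: dict):
    # One dialogue turn: ASR -> NLU -> DST -> action ranking -> NLG.
    state = dst(state, nlu(asr(audio)))
    return nlg(rank_actions(state)), state
```

The updated state is returned alongside the reply, mirroring how dialogue management updates the context after each turn.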
Fig. 4 shows a flowchart of a method 400 for resolving a character recognition error through dialogue according to an embodiment of the present disclosure. It should be understood that the method 400 can be performed by the chatbot 120 described above with reference to Fig. 1; block 402 can be an example implementation of block 204 described above with reference to Fig. 2, and blocks 404 and 406 can be example implementations of block 206 described above with reference to Fig. 2.
At block 402, a second voice for correcting one or more characters in the first recognition result is received, where the first recognition result of the first voice contains a character recognition error. At block 404, the first recognition result is corrected using the second recognition result of the second voice. At block 406, the second reply is presented through the display device based on the corrected first recognition result. For example, Tables 2-3 below show example voice dialogues that correct one or more character errors in the first speech recognition result.
Table 2
Recognition result of the user's speech: I want to go to Xi'erqi (mis-recognized as 西二奇)
Reply of the chatbot: Sorry, I do not understand what you mean
Recognition result of the user's speech: It is Xi'erqi (西二旗)
Reply of the chatbot: Looking up the navigation route to Xi'erqi (西二旗)
Table 3
Recognition result of the user's speech: I want to go to Xi'erqi (mis-recognized as 习二奇)
Reply of the chatbot: Sorry, I do not understand what you mean
Recognition result of the user's speech: It is the xi of "dongxi" (东西, thing) and the qi of "guoqi" (国旗, national flag)
Reply of the chatbot: Looking up the navigation route to Xi'erqi (西二旗)
In the example of Table 2, the user's second voice corrects a single wrong character (奇) in the recognition result of the first voice; in the example of Table 3, the user's second voice corrects multiple wrong characters (习 and 奇). The chatbot then provides the second reply based on the corrected speech recognition result; in the examples of Tables 2 and 3, the corrected recognition result is "I want to go to Xi'erqi (西二旗)". It should be understood that although the embodiments of Tables 2-3 correct Chinese text, embodiments of the present disclosure can be used to correct speech recognition errors in other languages.
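The character-level correction of Tables 2-3 amounts to substituting the characters the user names for the mis-recognized ones. The sketch below assumes the understanding stage has already mapped the user's description (for example, "the qi of guoqi") to a {wrong: right} dictionary; that mapping itself is outside the scope of this illustration.

```python
def correct_characters(result: str, corrections: dict) -> str:
    """Replace each mis-recognized character with the one the user named."""
    for wrong, right in corrections.items():
        result = result.replace(wrong, right)
    return result
```

For the Table 2 example this turns 西二奇 into 西二旗; for Table 3, 习二奇 into 西二旗.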
Fig. 5 shows a flowchart of a method 500 for resolving a digit recognition error through dialogue according to an embodiment of the present disclosure. It should be understood that the method 500 can be performed by the chatbot 120 described above with reference to Fig. 1; block 502 can be an example implementation of block 204 described above with reference to Fig. 2, and blocks 504 and 506 can be example implementations of block 206 described above with reference to Fig. 2.
At block 502, a second voice for correcting one or more digits in the first recognition result is received, where the first recognition result of the first voice contains a digit recognition error. At block 504, the first recognition result is corrected using one or more digits in the second recognition result of the second voice. At block 506, the second reply is presented through the display device based on the corrected first recognition result. For example, Table 4 below shows an example voice dialogue that corrects a digit error in the first speech recognition result.
Table 4
Recognition result of the user's speech: Call 13511652271
Reply of the chatbot: Calling 13511652271
Recognition result of the user's speech: It is 110
Reply of the chatbot: Calling 13511052271
In the example of Table 4, the user's second voice corrects a single wrong digit "6" in the recognition result of the first voice; that is, the sixth digit of the telephone number should be 0 rather than 6. In some embodiments, while the user is correcting digits, the digit segment the user needs to correct can be determined based on the maximum match between the second voice and the recognition result of the first voice. For example, in the example of Table 4, the digits "110" match "116" in the recognition result most closely, so it can be determined that "110" is meant to correct "116" from the previous dialogue round. In some embodiments, the user can also correct both wrong characters and wrong digits in the speech recognition result at the same time.
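The maximum-match idea can be sketched directly: slide a window the length of the correction over the recognized number and replace the window with the highest per-position agreement. This is one plausible reading of "maximum match"; the text does not fix the exact scoring, so the tie-breaking rule below (earliest window wins) is an assumption.

```python
def correct_digits(number: str, correction: str) -> str:
    """Replace the segment of `number` that best matches `correction`
    (highest per-position agreement; the earliest segment wins ties)."""
    k = len(correction)
    best_start, best_score = 0, -1
    for i in range(len(number) - k + 1):
        # Count positions where the window agrees with the correction.
        score = sum(a == b for a, b in zip(number[i:i + k], correction))
        if score > best_score:
            best_start, best_score = i, score
    return number[:best_start] + correction + number[best_start + k:]
```

On the Table 4 example, "110" agrees with the segment "116" in two of three positions, more than with any other segment, so the corrected number becomes 13511052271.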
The inventors of the present application found that some speech recognition errors can only be discovered through a visual display (for example, character recognition errors caused by homophones), while other speech recognition errors can also be discovered by voice alone, such as one or more wrongly recognized digits in a telephone number. In some embodiments, it is also possible not to show the recognition result of the first voice and the first reply on a display device, and to provide the first reply by voice only. In this case, the user can, based on a set of digits read aloud in the first reply, issue a voice for correcting one or more digits in the set. The chatbot then provides, based on the corrected set of digits, a second reply that better matches the user's intention. It should be understood that a scenario in which the recognition result is not displayed can be one without a display at all (for example, a smart speaker), or one with a display that is not being used to show the recognition result (for example, a voice dialogue while a smartphone's screen is off).
Fig. 6 shows a flowchart of a method 600 for supplementing a recognition result through dialogue according to an embodiment of the present disclosure. It should be understood that the method 600 can be performed by the chatbot 120 described above with reference to Fig. 1; block 602 can be an example implementation of block 204 described above with reference to Fig. 2, and blocks 604 and 606 can be example implementations of block 206 described above with reference to Fig. 2.
At block 602, a second voice for supplementing the first recognition result is received. For example, the first recognition result of the first voice does not fully reflect the user's needs. At block 604, in response to determining that the content of the second voice semantically supplements the content of the first voice, the first recognition result and the second recognition result of the second voice are combined to generate a third recognition result.
Whether the content of one voice supplements, or in other words completes, the other can be determined by various means. For example, in some embodiments, it can be determined whether the content of the second voice has a continuous semantic relationship with the content of the first voice. If the contents of the two voices can be parsed together as a whole, it can be assumed that they are semantically continuous, and accordingly it is determined that the content of the second voice supplements the first voice. Alternatively or additionally, in some embodiments, it can be determined whether the probability of the two contents occurring together exceeds a predetermined threshold. For example, if the contents of the two voices co-occur most of the time, it can be determined that the content of the second voice supplements the first voice.
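The two heuristics above can be sketched as follows. The whole-parse check and the co-occurrence table are stand-ins: a real system would use its NLU parser and corpus statistics rather than the set and dictionary assumed here, and the threshold value is illustrative.

```python
def is_supplement(first: str, second: str,
                  parseable: set, cooccur: dict,
                  threshold: float = 0.5) -> bool:
    """Heuristic 1: the concatenation parses as a whole.
    Heuristic 2: the pair co-occurs with probability above a threshold."""
    if (first + second) in parseable:
        return True
    return cooccur.get((first, second), 0.0) > threshold
```

Either signal suffices, matching the "alternatively or additionally" framing in the text.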
At block 606, the second reply is presented through the display device based on the third recognition result. For example, Table 5 below shows an example dialogue that supplements the first speech recognition result.
Table 5
Recognition result of the user's speech: I want to go to Peking University
Reply of the chatbot: Looking up the navigation route to Peking University
Recognition result of the user's speech: West Gate
Reply of the chatbot: Looking up the navigation route to the west gate of Peking University
In the example of Table 5, the user's intention is the west gate of Peking University, but the speech recognition system began recognition right after "Peking University" was spoken, so it could not recognize the user's exact intention. In this case, the user can supplement the information through natural language dialogue; the recognition results of the two voices are then combined to generate a new recognition result, "I want to go to the west gate of Peking University", and a corresponding reply is generated based on the new recognition result. In some embodiments, for a recognition result that is not fully expressed, the user can supplement information directly, without waiting for the recognition result and parsing result to be returned, and without repeating the previous dialogue content.
Fig. 7 shows a block diagram of an apparatus 700 for processing voice dialogue according to an embodiment of the present disclosure. As shown in Fig. 7, the apparatus 700 includes a first providing module 710, a voice receiving module 720 and a second providing module 730. The first providing module 710 is configured to, in response to receiving a first voice from a user, provide a first reply to the first voice, where the first reply is generated based on a first recognition result of the first voice. The voice receiving module 720 is configured to receive a second voice from the user, where the second voice is used to correct or supplement the first recognition result. The second providing module 730 is configured to generate and provide to the user a second reply based on the first voice and the second voice, where the second reply matches the user's intention better than the first reply.
In some embodiments, wherein the first offer module 710 includes the first presentation module, the first presentation module is configured It is replied for the first recognition result and first is presented by display equipment.
In some embodiments, the second providing module 730 includes a first correction module configured to use a second recognition result for the second voice to correct a text recognition error in the first recognition result.
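As an assumed illustration of such text correction (not the disclosed implementation): once the erroneous word and its replacement have been extracted from the second voice by an upstream step, the error in the first result can be located by fuzzy matching and replaced:

```python
# Illustrative sketch: correct a text recognition error in the first
# result using a word recognized from the second voice.
import difflib

def correct_text(first_result: str, wrong: str, right: str) -> str:
    """Replace the token in `first_result` closest to `wrong` with `right`."""
    tokens = first_result.split()
    # Fuzzy-match against all tokens; cutoff=0.0 always yields a candidate.
    match = difflib.get_close_matches(wrong, tokens, n=1, cutoff=0.0)
    if match:
        tokens[tokens.index(match[0])] = right
    return " ".join(tokens)

# First result misrecognized "John" as "Jon"; second voice supplies "John".
print(correct_text("call Jon Smith", "John", "John"))
```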
In some embodiments, the second providing module 730 includes a second correction module configured to use one or more digits in a second recognition result for the second voice to correct a digit recognition error in the first recognition result.
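A minimal sketch of digit correction, under the assumption that the second recognition result yields the corrected digit string in full (names are illustrative only):

```python
# Illustrative sketch: substitute the digits heard in the second voice
# for the first digit run in the first recognition result.
import re

def correct_digits(first_result: str, corrected_digits: str) -> str:
    """Replace the first digit string in the first result with the corrected one."""
    return re.sub(r"\d+", corrected_digits, first_result, count=1)

print(correct_digits("send 100 yuan", "1000"))
```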
In some embodiments, the second voice is used to supplement the first recognition result, and the second providing module 730 includes: a combining module configured to, in response to determining that the content of the second voice semantically supplements the content of the first voice, combine the first recognition result and a second recognition result for the second voice to generate a third recognition result; and a second presentation module configured to present, based on the third recognition result, the second reply via the display device.
In some embodiments, the first providing module 710 includes a first voice providing module configured to provide the first reply by voice only, the first reply including a set of digits from the first recognition result.
In some embodiments, the second voice is used to correct the set of digits, and the second providing module 730 includes: a third correction module configured to use one or more digits in a second recognition result for the second voice to correct the set of digits; and a second voice providing module configured to provide the second reply by voice based on the corrected set of digits.
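For illustration, correcting individual positions in a digit set that was read back by voice (e.g. "the fourth digit should be 5") might look like the following; the 1-based position convention and the function name are assumptions:

```python
# Illustrative sketch: apply positional corrections to a digit string
# that was read back to the user by voice.
def correct_digit_set(digits: str, corrections: dict[int, str]) -> str:
    """Apply {1-based position: digit} corrections to a digit string."""
    chars = list(digits)
    for pos, d in corrections.items():
        chars[pos - 1] = d
    return "".join(chars)

# The user corrects the last digit of a misheard phone number.
print(correct_digit_set("13800138001", {11: "0"}))
```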
In some embodiments, the apparatus 700 further includes an action executing module configured to, in response to the second reply having identified the intention of the user, execute an action associated with the second reply.
In some embodiments, the apparatus 700 further includes: a third voice receiving module configured to, in response to the second reply still failing to identify the intention of the user, receive a third voice from the user; and a third providing module configured to provide, based at least in part on the third voice, a third reply to the user.
It should be understood that the first providing module 710, the voice receiving module 720, and the second providing module 730 shown in Fig. 7 may be included in the chatbot 120 described with reference to Fig. 1. Furthermore, it should be understood that the modules shown in Fig. 7 may perform the steps or actions of the methods or processes of the embodiments of the present disclosure.
Fig. 8 shows a schematic block diagram of an example device 800 that can be used to implement embodiments of the present disclosure. It should be understood that the device 800 can be used to implement the apparatus 700 for processing a voice dialogue or the chatbot 120 of the present disclosure. As shown, the device 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 802 or computer program instructions loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806 such as a keyboard or a mouse; an output unit 807 such as various types of displays and speakers; a storage unit 808 such as a magnetic disk or an optical disc; and a communication unit 809 such as a network card, a modem, or a wireless communication transceiver. The communication unit 809 allows the device 800 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The processing unit 801 performs the methods and processes described above, such as the method 200, the process 300, the method 400, the method 500, and the method 600. For example, in some embodiments, the method 200, the process 300, the method 400, the method 500, and the method 600 may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded into and/or installed on the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the CPU 801, one or more actions or steps of the method 200, the process 300, the method 400, the method 500, and the method 600 described above may be performed. Alternatively, in other embodiments, the CPU 801 may be configured to perform the method 200, the process 300, the method 400, the method 500, and the method 600 by any other suitable means (for example, by means of firmware).
The functions described herein may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so forth.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Furthermore, although the actions or steps are depicted in a particular order, this should not be understood as requiring that such actions or steps be performed in the particular order shown or in sequential order, or that all illustrated actions or steps be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable subcombination.
Although the present disclosure has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (20)

1. A method for processing a voice dialogue, comprising:
in response to receiving a first voice from a user, providing a first reply to the first voice, the first reply being generated based on a first recognition result for the first voice;
receiving a second voice from the user, the second voice being used to correct or supplement the first recognition result; and
generating, based on the first voice and the second voice, a second reply to be provided to the user, the second reply matching the intention of the user better than the first reply.
2. The method according to claim 1, wherein providing the first reply to the first voice comprises:
presenting the first recognition result and the first reply via a display device.
3. The method according to claim 2, wherein generating the second reply comprises:
using a second recognition result for the second voice to correct a text recognition error in the first recognition result.
4. The method according to claim 2, wherein generating the second reply comprises:
using one or more digits in a second recognition result for the second voice to correct a digit recognition error in the first recognition result.
5. The method according to claim 2, wherein generating the second reply comprises:
in response to determining that the content of the second voice semantically supplements the content of the first voice, combining the first recognition result and a second recognition result for the second voice to generate a third recognition result; and
presenting, based on the third recognition result, the second reply via the display device.
6. The method according to claim 1, wherein providing the first reply to the first voice comprises:
providing the first reply by voice, the first reply comprising a set of digits from the first recognition result.
7. The method according to claim 6, wherein generating the second reply comprises:
correcting the set of digits using one or more digits in a second recognition result for the second voice; and
providing, based on the corrected set of digits, the second reply by voice.
8. The method according to any one of claims 1-7, further comprising:
in response to the second reply having identified the intention of the user, executing an action associated with the second reply.
9. The method according to any one of claims 1-7, further comprising:
in response to the second reply still failing to identify the intention of the user, receiving a third voice from the user; and
providing, based at least in part on the third voice, a third reply to the user.
10. An apparatus for processing a voice dialogue, comprising:
a first providing module configured to, in response to receiving a first voice from a user, provide a first reply to the first voice, the first reply being generated based on a first recognition result for the first voice;
a voice receiving module configured to receive a second voice from the user, the second voice being used to correct or supplement the first recognition result; and
a second providing module configured to generate, based on the first voice and the second voice, a second reply to be provided to the user, the second reply matching the intention of the user better than the first reply.
11. The apparatus according to claim 10, wherein the first providing module comprises:
a first presentation module configured to present the first recognition result and the first reply via a display device.
12. The apparatus according to claim 11, wherein the second providing module comprises:
a first correction module configured to use a second recognition result for the second voice to correct a text recognition error in the first recognition result.
13. The apparatus according to claim 11, wherein the second providing module comprises:
a second correction module configured to use one or more digits in a second recognition result for the second voice to correct a digit recognition error in the first recognition result.
14. The apparatus according to claim 11, wherein the second providing module comprises:
a combining module configured to, in response to determining that the content of the second voice semantically supplements the content of the first voice, combine the first recognition result and a second recognition result for the second voice to generate a third recognition result; and
a second presentation module configured to present, based on the third recognition result, the second reply via the display device.
15. The apparatus according to claim 10, wherein the first providing module comprises:
a first voice providing module configured to provide the first reply by voice, the first reply comprising a set of digits from the first recognition result.
16. The apparatus according to claim 15, wherein the second providing module comprises:
a third correction module configured to correct the set of digits using one or more digits in a second recognition result for the second voice; and
a second voice providing module configured to provide, by voice only, the second reply based on the corrected set of digits.
17. The apparatus according to any one of claims 10-16, further comprising:
an action executing module configured to, in response to the second reply having identified the intention of the user, execute an action associated with the second reply.
18. The apparatus according to any one of claims 10-16, further comprising:
a third voice receiving module configured to, in response to the second reply still failing to identify the intention of the user, receive a third voice from the user; and
a third providing module configured to provide, based at least in part on the third voice, a third reply to the user.
19. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the method according to any one of claims 1-9.
20. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-9.
CN201810541680.6A 2018-05-30 2018-05-30 Method, apparatus, electronic device and computer readable storage medium for processing voice conversations Active CN108877792B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810541680.6A CN108877792B (en) 2018-05-30 2018-05-30 Method, apparatus, electronic device and computer readable storage medium for processing voice conversations

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810541680.6A CN108877792B (en) 2018-05-30 2018-05-30 Method, apparatus, electronic device and computer readable storage medium for processing voice conversations

Publications (2)

Publication Number Publication Date
CN108877792A true CN108877792A (en) 2018-11-23
CN108877792B CN108877792B (en) 2023-10-24

Family

ID=64335845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810541680.6A Active CN108877792B (en) 2018-05-30 2018-05-30 Method, apparatus, electronic device and computer readable storage medium for processing voice conversations

Country Status (1)

Country Link
CN (1) CN108877792B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012037820A (en) * 2010-08-11 2012-02-23 Murata Mach Ltd Voice recognition apparatus, voice recognition apparatus for picking, and voice recognition method
CN105094315A (en) * 2015-06-25 2015-11-25 百度在线网络技术(北京)有限公司 Method and apparatus for smart man-machine chat based on artificial intelligence
CN105468582A (en) * 2015-11-18 2016-04-06 苏州思必驰信息科技有限公司 Method and device for correcting numeric string based on human-computer interaction
CN106710592A (en) * 2016-12-29 2017-05-24 北京奇虎科技有限公司 Speech recognition error correction method and speech recognition error correction device used for intelligent hardware equipment
US9728188B1 (en) * 2016-06-28 2017-08-08 Amazon Technologies, Inc. Methods and devices for ignoring similar audio being received by a system
CN107045496A (en) * 2017-04-19 2017-08-15 畅捷通信息技术股份有限公司 The error correction method and error correction device of text after speech recognition
CN107305483A (en) * 2016-04-25 2017-10-31 北京搜狗科技发展有限公司 A kind of voice interactive method and device based on semantics recognition
JP2018004976A (en) * 2016-07-04 2018-01-11 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Voice interactive method, voice interactive device and voice interactive program
CN107870977A (en) * 2016-09-27 2018-04-03 谷歌公司 Chat robots output is formed based on User Status
CN107943914A (en) * 2017-11-20 2018-04-20 渡鸦科技(北京)有限责任公司 Voice information processing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
盛世流光中: "苹果Siri对比三星Bixby,语音助手都成精了" [Apple Siri vs. Samsung Bixby: voice assistants have become experts], 《爱奇艺》 [iQIYI], 23 November 2017 (2017-11-23), pages 19 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109712616B (en) * 2018-11-29 2023-11-14 平安科技(深圳)有限公司 Telephone number error correction method and device based on data processing and computer equipment
CN109712616A (en) * 2018-11-29 2019-05-03 平安科技(深圳)有限公司 Telephone number error correction method, device and computer equipment based on data processing
CN109922371A (en) * 2019-03-11 2019-06-21 青岛海信电器股份有限公司 Natural language processing method, equipment and storage medium
CN109922371B (en) * 2019-03-11 2021-07-09 海信视像科技股份有限公司 Natural language processing method, apparatus and storage medium
CN110223694B (en) * 2019-06-26 2021-10-15 百度在线网络技术(北京)有限公司 Voice processing method, system and device
CN110223694A (en) * 2019-06-26 2019-09-10 百度在线网络技术(北京)有限公司 Method of speech processing, system and device
CN110299152A (en) * 2019-06-28 2019-10-01 北京猎户星空科技有限公司 Interactive output control method, device, electronic equipment and storage medium
CN110347815A (en) * 2019-07-11 2019-10-18 上海蔚来汽车有限公司 Multi-task processing method and multitasking system in speech dialogue system
CN110738997B (en) * 2019-10-25 2022-06-17 百度在线网络技术(北京)有限公司 Information correction method and device, electronic equipment and storage medium
CN110738997A (en) * 2019-10-25 2020-01-31 百度在线网络技术(北京)有限公司 information correction method, device, electronic equipment and storage medium
CN112002321A (en) * 2020-08-11 2020-11-27 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
CN112002321B (en) * 2020-08-11 2023-09-19 海信电子科技(武汉)有限公司 Display device, server and voice interaction method
WO2023040658A1 (en) * 2021-09-18 2023-03-23 华为技术有限公司 Speech interaction method and electronic device

Also Published As

Publication number Publication date
CN108877792B (en) 2023-10-24

Similar Documents

Publication Publication Date Title
CN108877792A Method, apparatus, electronic device and computer-readable storage medium for processing voice dialogues
US10460029B2 (en) Reply information recommendation method and apparatus
US20210280190A1 (en) Human-machine interaction
US11217236B2 (en) Method and apparatus for extracting information
CN109002501A Method, apparatus, electronic device and computer-readable storage medium for processing natural language dialogues
CN112926306B (en) Text error correction method, device, equipment and storage medium
CN107430616A (en) The interactive mode of speech polling re-forms
CN107680588B (en) Intelligent voice navigation method, device and storage medium
CN112506359B (en) Method and device for providing candidate long sentences in input method and electronic equipment
CN112307188B (en) Dialog generation method, system, electronic device and readable storage medium
US20220358955A1 (en) Method for detecting voice, method for training, and electronic devices
WO2019035373A1 (en) Information processing device, information processing method, and program
CN112527127B (en) Training method and device for input method long sentence prediction model, electronic equipment and medium
US20190073994A1 (en) Self-correcting computer based name entity pronunciations for speech recognition and synthesis
CN115905497B (en) Method, device, electronic equipment and storage medium for determining reply sentence
US20230153550A1 (en) Machine Translation Method and Apparatus, Device and Storage Medium
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
JP7349523B2 (en) Speech recognition method, speech recognition device, electronic device, storage medium computer program product and computer program
CN114758649B (en) Voice recognition method, device, equipment and medium
CN112669839B (en) Voice interaction method, device, equipment and storage medium
CN113743127B (en) Task type dialogue method, device, electronic equipment and storage medium
CN114722171B (en) Multi-round dialogue processing method and device, electronic equipment and storage medium
CN114860910A (en) Intelligent dialogue method and system
CN112466278B (en) Voice recognition method and device and electronic equipment
US20230196026A1 (en) Method for Evaluating Text Content, and Related Apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant