CN104679472A - Man-machine voice interactive method and device - Google Patents

Man-machine voice interactive method and device

Info

Publication number
CN104679472A
CN104679472A
Authority
CN
China
Prior art keywords
voice
result
speech recognition
terminal
recognition server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510080163.XA
Other languages
Chinese (zh)
Inventor
陈本东
谢文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510080163.XA priority Critical patent/CN104679472A/en
Publication of CN104679472A publication Critical patent/CN104679472A/en
Priority to PCT/CN2015/083207 priority patent/WO2016127550A1/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output

Abstract

The invention provides a man-machine voice interactive method and device. The method comprises the steps that: while a terminal performs voice broadcast of a broadcast result, a speech recognition result sent by a speech recognition server is received; the speech recognition result is sent to a QU server for context understanding, and the context understanding result is received and stored; the intention of the voice input by a user is determined according to the stored context understanding result, and a broadcast result is generated according to the intention; and the broadcast result is sent to the speech recognition server, so that the speech recognition server can send the broadcast result to the terminal for voice broadcast. According to the invention, voice broadcast and the user's voice input are carried out at the same time during man-machine voice interaction, so the recording state and the broadcast state do not need to be repeatedly switched, and multi-round dialogues become more coherent.

Description

Man-machine voice interaction method and device
Technical field
The present invention relates to the field of Internet technologies, and in particular, to a man-machine voice interaction method and device.
Background art
Speech recognition and man-machine voice interaction have a long history. In existing voice-assistant applications (Application; hereinafter referred to as: APP), recording is triggered by a button press; after recording, the machine broadcasts an answer, and while the answer is being broadcast, no recording takes place. That is, existing voice-assistant APPs support only half-duplex operation: while the machine broadcasts, the user stays silent, and while the user speaks, the machine cannot broadcast.
The machine therefore has to switch constantly between the recording state and the broadcast state, frequently requiring user intervention, which is very inconvenient. Some voice-assistant APPs provide an auto-answer mode, in which the machine automatically enters the recording state after broadcasting; however, in this mode the machine sometimes switches automatically and sometimes does not, which instead leaves the user confused.
In summary, the existing man-machine voice interaction mode is very inconvenient to use: every question-and-answer round requires user intervention, the operation is cumbersome, the interaction is unnatural, and the user experience is poor.
Summary of the invention
An object of the present invention is to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a man-machine voice interaction method. With this method, voice broadcast and the user's voice input can proceed simultaneously during man-machine voice interaction, so that repeated switching between the recording state and the broadcast state is no longer needed, full-duplex man-machine communication is achieved, and multi-round dialogues become more coherent.
A second object of the present invention is to propose a man-machine voice interaction device.
To achieve these objects, a man-machine voice interaction method according to an embodiment of the first aspect of the present invention includes: while a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, receiving a speech recognition result sent by the speech recognition server, where the speech recognition result is obtained by the speech recognition server recognizing voice input by a user of the terminal; sending the speech recognition result to a query understanding (QU) server for context understanding, and receiving and storing the context understanding result sent by the QU server; determining, according to the stored context understanding result, the intention of the voice input by the user, and generating a broadcast result according to the intention; and sending the broadcast result to the speech recognition server, so that the speech recognition server sends the broadcast result to the terminal for voice broadcast.
In the man-machine voice interaction method of this embodiment of the present invention, while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, the speech recognition result sent by the speech recognition server can be received, the intention of the voice input by the user is determined according to the speech recognition result, a broadcast result is generated according to the intention and sent to the speech recognition server, and the speech recognition server forwards the broadcast result to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during man-machine voice interaction, repeated switching between the recording state and the broadcast state is avoided, full-duplex man-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a man-machine voice interaction method according to an embodiment of the second aspect of the present invention includes: while a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, receiving voice sent by the terminal, where the voice is input to the terminal by a user of the terminal; recognizing the voice and sending the speech recognition result to a multi-round dialog server, so that the multi-round dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and stores the context understanding result sent by the QU server, determines the intention of the voice input by the user according to the stored context understanding result, and generates a broadcast result according to the intention; and receiving the broadcast result sent by the multi-round dialog server, and sending the broadcast result to the terminal for voice broadcast.
In the man-machine voice interaction method of this embodiment of the present invention, while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, the voice sent by the terminal is received and recognized, and the speech recognition result is sent to a multi-round dialog server, so that the multi-round dialog server determines the intention of the voice input by the user according to the speech recognition result and generates a broadcast result according to the intention; the speech recognition server then receives the broadcast result sent by the multi-round dialog server and sends it to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during man-machine voice interaction, repeated switching between the recording state and the broadcast state is avoided, full-duplex man-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a man-machine voice interaction method according to an embodiment of the third aspect of the present invention includes: while a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, receiving voice input by a user of the terminal; sending the voice input by the user to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to a multi-round dialog server, and the multi-round dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and stores the context understanding result sent by the QU server, determines the intention of the voice input by the user according to the stored context understanding result, and generates a broadcast result according to the intention; and receiving and broadcasting the broadcast result sent by the speech recognition server, where the broadcast result sent by the speech recognition server is sent to the speech recognition server by the multi-round dialog server.
In the man-machine voice interaction method of this embodiment of the present invention, while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, the voice input by the user of the terminal is received and sent to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to a multi-round dialog server, which determines the intention of the voice input by the user according to the speech recognition result and generates a broadcast result according to the intention; the terminal then receives and broadcasts the broadcast result sent by the speech recognition server. Voice broadcast and the user's voice input thus proceed simultaneously during man-machine voice interaction, repeated switching between the recording state and the broadcast state is avoided, full-duplex man-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a man-machine voice interaction device according to an embodiment of the fourth aspect of the present invention includes: a receiving module, configured to receive, while a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, the speech recognition result sent by the speech recognition server, where the speech recognition result is obtained by the speech recognition server recognizing voice input by a user of the terminal, and further configured to receive the context understanding result sent by a query understanding (QU) server after a sending module sends the speech recognition result to the QU server for context understanding; the sending module, configured to send the speech recognition result received by the receiving module to the QU server for context understanding; a storage module, configured to store the context understanding result received by the receiving module; a determining module, configured to determine the intention of the voice input by the user according to the context understanding result stored by the storage module; and a generating module, configured to generate a broadcast result according to the intention determined by the determining module; the sending module is further configured to send the broadcast result generated by the generating module to the speech recognition server, so that the speech recognition server sends the broadcast result to the terminal for voice broadcast.
In the man-machine voice interaction device of this embodiment of the present invention, while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, the receiving module can receive the speech recognition result sent by the speech recognition server, the determining module determines the intention of the voice input by the user according to the speech recognition result, the generating module generates a broadcast result according to the intention determined by the determining module, and the sending module sends the broadcast result to the speech recognition server, which forwards it to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during man-machine voice interaction, repeated switching between the recording state and the broadcast state is avoided, full-duplex man-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a man-machine voice interaction device according to an embodiment of the fifth aspect of the present invention includes: a receiving module, configured to receive, while a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, the voice sent by the terminal, where the voice is input to the terminal by a user of the terminal, and further configured to receive the broadcast result sent by a multi-round dialog server after a sending module sends the speech recognition result to the multi-round dialog server; a recognition module, configured to recognize the voice received by the receiving module; and the sending module, configured to send the speech recognition result obtained by the recognition module to the multi-round dialog server, so that the multi-round dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and stores the context understanding result sent by the QU server, determines the intention of the voice input by the user according to the stored context understanding result, and generates a broadcast result according to the intention; the sending module is further configured to send the broadcast result to the terminal for voice broadcast after the receiving module receives the broadcast result sent by the multi-round dialog server.
In the man-machine voice interaction device of this embodiment of the present invention, while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, the receiving module receives the voice sent by the terminal, the recognition module recognizes the voice, and the sending module sends the speech recognition result to the multi-round dialog server, so that the multi-round dialog server determines the intention of the voice input by the user according to the speech recognition result and generates a broadcast result according to the intention; the receiving module then receives the broadcast result sent by the multi-round dialog server, and the sending module sends it to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during man-machine voice interaction, repeated switching between the recording state and the broadcast state is avoided, full-duplex man-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a man-machine voice interaction device according to an embodiment of the sixth aspect of the present invention includes: a receiving module, configured to receive, while a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, the voice input by a user of the terminal, and further configured to receive the broadcast result sent by the speech recognition server after a sending module sends the voice to the speech recognition server, where the broadcast result sent by the speech recognition server is sent to the speech recognition server by a multi-round dialog server; the sending module, configured to send the voice received by the receiving module to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to the multi-round dialog server, and the multi-round dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and stores the context understanding result sent by the QU server, determines the intention of the voice input by the user according to the stored context understanding result, and generates a broadcast result according to the intention; and a broadcast module, configured to broadcast the broadcast result received by the receiving module.
In the man-machine voice interaction device of this embodiment of the present invention, while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, the receiving module receives the voice input by the user of the terminal, and the sending module sends the voice to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to the multi-round dialog server, which determines the intention of the voice input by the user according to the speech recognition result and generates a broadcast result according to the intention; the receiving module then receives the broadcast result sent by the speech recognition server, and the broadcast module broadcasts it. Voice broadcast and the user's voice input thus proceed simultaneously during man-machine voice interaction, repeated switching between the recording state and the broadcast state is avoided, full-duplex man-machine communication is achieved, and multi-round dialogues become more coherent.
Additional aspects and advantages of the present invention will be set forth in part in the following description, will in part become apparent from the following description, or will be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of an embodiment of the man-machine voice interaction method of the present invention;
Fig. 2 is a flowchart of another embodiment of the man-machine voice interaction method of the present invention;
Fig. 3 is a flowchart of another embodiment of the man-machine voice interaction method of the present invention;
Fig. 4 is a schematic diagram of an embodiment of the connection relationships in the man-machine voice interaction method of the present invention;
Fig. 5 is a schematic structural diagram of an embodiment of the man-machine voice interaction device of the present invention;
Fig. 6 is a schematic structural diagram of another embodiment of the man-machine voice interaction device of the present invention;
Fig. 7 is a schematic structural diagram of another embodiment of the man-machine voice interaction device of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and shall not be construed as limiting the present invention. On the contrary, the embodiments of the present invention cover all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
Fig. 1 is a flowchart of an embodiment of the man-machine voice interaction method of the present invention. As shown in Fig. 1, the man-machine voice interaction method may include:
Step 101: while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, receive the speech recognition result sent by the speech recognition server, where the speech recognition result is obtained by the speech recognition server recognizing the voice input by the user of the terminal.
In this embodiment, while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, the user of the terminal can still continue to input voice. That is, while broadcasting the broadcast result, the terminal keeps receiving the voice input by the user and continuously sends it to the speech recognition server for speech recognition; the speech recognition server continuously sends the speech recognition results to the multi-round dialog server, which continuously receives them. Voice broadcast and the user's voice input therefore proceed simultaneously during man-machine voice interaction, so that repeated switching between the recording state and the broadcast state is no longer needed.
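The full-duplex idea above — playback continuing while the uplink keeps forwarding the user's audio — can be sketched as a toy simulation. This is not the patent's actual protocol; the chunk lists, queue, and function name are illustrative assumptions:

```python
import queue
import threading

def full_duplex_session(user_chunks, broadcast_chunks):
    """Simulate full-duplex interaction: the terminal forwards user audio
    chunks to the recognizer on a worker thread while the main thread
    plays broadcast chunks, so neither side waits for the other."""
    recognized = []        # chunks forwarded to the speech recognition server
    played = []            # chunks played back to the user
    uplink = queue.Queue()
    for c in user_chunks:
        uplink.put(c)
    uplink.put(None)       # end-of-stream marker

    def uplink_worker():
        while True:
            c = uplink.get()
            if c is None:
                break
            recognized.append(c)

    t = threading.Thread(target=uplink_worker)
    t.start()
    for c in broadcast_chunks:   # playback proceeds without pausing recording
        played.append(c)
    t.join()
    return recognized, played
```

In a half-duplex design, the playback loop would have to wait for the uplink to drain first; here the two run concurrently, which is the distinction the embodiment draws.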
Specifically, receiving the speech recognition result sent by the speech recognition server may be: receiving the speech recognition result that reaches a predetermined confidence level, sent by the speech recognition server after the server determines that the obtained speech recognition result reaches the predetermined confidence level. The predetermined confidence level can be set as needed in a specific implementation; this embodiment does not limit its value.
In this embodiment, while the user inputs voice to the terminal, the speech recognition server keeps recognizing the voice sent by the terminal. When the speech recognition server determines that the obtained speech recognition result has reached the predetermined confidence level, it sends that result to the multi-round dialog server, so that the multi-round dialog server performs the subsequent steps 102 to 104, determines the intention of the voice input by the user, generates an effective broadcast result, and sends it to the terminal for voice broadcast. In other words, once the terminal receives a broadcast result, it can interrupt the user's voice input and broadcast the obtained result to the user directly.
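The confidence gate described here can be sketched in a few lines. The 0.8 threshold and the return shape are placeholders — the patent deliberately leaves the predetermined confidence level open:

```python
def forward_if_confident(result_text, confidence, threshold=0.8):
    """Forward a recognition result to the multi-round dialog server only
    once it reaches the predetermined confidence level (threshold value
    is an illustrative placeholder, not fixed by the embodiment)."""
    if confidence >= threshold:
        return {"forward": True, "text": result_text}
    return {"forward": False, "text": None}
```

Partial hypotheses below the threshold are simply held back, so the dialog server only ever sees results the recognizer is willing to commit to.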
Step 102: send the speech recognition result to a query understanding (Query Understand; hereinafter referred to as: QU) server for context understanding, and receive and store the context understanding result sent by the QU server.
Step 103: determine the intention of the voice input by the user according to the stored context understanding result, and generate a broadcast result according to the intention.
In this embodiment, the multi-round dialog server may determine the intention of the voice input by the user from the stored context understanding result and then generate the broadcast result directly according to that intention; alternatively, generating the broadcast result according to the intention may be: obtaining information corresponding to the intention from a resource access server according to the intention, and generating the broadcast result according to the obtained information.
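The second alternative — look up information corresponding to the intention, then build the broadcast text from it — might look like the sketch below. The resource table, intention labels, and strings are all invented for illustration; a real system would query the resource access server over the network:

```python
# Hypothetical in-memory stand-in for the resource access server.
RESOURCES = {
    "weather": "It is sunny in Beijing today.",
    "music": "Now playing your favorite song.",
}

def generate_broadcast_result(intention):
    """Generate a broadcast result for a determined intention by fetching
    the corresponding information, falling back to a default reply."""
    info = RESOURCES.get(intention)
    if info is not None:
        return info
    return "Sorry, I did not understand that."
```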
Step 104: send the broadcast result to the speech recognition server, so that the speech recognition server sends the broadcast result to the terminal for voice broadcast.
In this embodiment, content suitable for recommendation to the user can also be obtained according to the user's profile and current state, and a cloud push service can be triggered; through the cloud push service, the content suitable for recommendation to the user is sent to the terminal, and a dialogue with the terminal is initiated.
That is, in this embodiment the multi-round dialog server has learning capability: according to the user's profile (for example, the user's schedule and/or the songs the user listens to) and the user's current state (for example, current location and/or current dialogue content), it can analyze the user's ideas and wishes, obtain content suitable for recommendation to the user, trigger the cloud push service to send that content to the terminal, and initiate a dialogue with the terminal. The subsequent dialogue proceeds in the same way as described in steps 101 to 104 and is not repeated here.
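The proactive-recommendation flow could be sketched as below. The profile keys, the rules, and the reply strings are purely hypothetical; the patent only names the input categories (schedule, songs, location, dialogue content) and the push-then-dialogue behavior:

```python
def recommend_and_push(profile, state, push):
    """Pick content to proactively recommend from the user's profile and
    current state, then hand it to a cloud-push callback that delivers
    it to the terminal and opens a new dialogue (rules are illustrative)."""
    if "meeting" in profile.get("schedule", []) and state.get("location") == "office":
        content = "You have a meeting soon. Shall I book a room?"
    elif profile.get("songs"):
        content = f"Want to hear {profile['songs'][0]} again?"
    else:
        return None            # nothing worth recommending right now
    push(content)              # cloud push service sends content to the terminal
    return content
```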
In the above embodiment, while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, the speech recognition result sent by the speech recognition server can be received, the intention of the voice input by the user is determined according to the speech recognition result, a broadcast result is generated according to the intention and sent to the speech recognition server, and the speech recognition server sends the broadcast result to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during man-machine voice interaction, repeated switching between the recording state and the broadcast state is avoided, full-duplex man-machine communication is achieved, and multi-round dialogues become more coherent.
Fig. 2 is a flowchart of another embodiment of the man-machine voice interaction method of the present invention. As shown in Fig. 2, the man-machine voice interaction method may include:
Step 201: while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, receive the voice sent by the terminal, where the voice is input to the terminal by the user of the terminal.
In this embodiment, while the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, the speech recognition server can also receive the voice sent by the terminal. That is, during man-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously, so that repeated switching between the recording state and the broadcast state is no longer needed.
Step 202: recognize the voice, and send the speech recognition result to a multi-round dialog server, so that the multi-round dialog server sends the speech recognition result to the QU server for context understanding, receives and stores the context understanding result sent by the QU server, determines the intention of the voice input by the user according to the stored context understanding result, and generates a broadcast result according to the intention.
Specifically, recognizing the voice includes: determining the start and end of each sentence in the voice by a silence detection technique.
In this embodiment, using the silence detection technique, the speech recognition server can segment the voice into sentences, that is, it can determine the start and end of each sentence in the voice.
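A minimal energy-based sketch of such silence detection is shown below, under the assumption that audio arrives as per-frame energy values; the threshold and minimum silence length are illustrative, and real systems use more robust voice-activity detection:

```python
def segment_by_silence(frames, energy_threshold=0.1, min_silence_frames=2):
    """Split a sequence of per-frame energies into (start, end) sentence
    spans (end exclusive), treating runs of low-energy frames as the
    silence that separates sentences."""
    spans, start, silence = [], None, 0
    for i, e in enumerate(frames):
        if e >= energy_threshold:      # speech frame
            if start is None:
                start = i              # sentence begins here
            silence = 0
        elif start is not None:        # silence inside a sentence
            silence += 1
            if silence >= min_silence_frames:
                spans.append((start, i - silence + 1))   # close the sentence
                start, silence = None, 0
    if start is not None:              # flush a sentence still open at the end
        spans.append((start, len(frames) - silence))
    return spans
```

Each returned span gives the recognizer one sentence-sized unit to recognize and forward, which is exactly the cutting the embodiment attributes to silence detection.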
Specifically, sending the speech recognition result to the multi-round dialog server may be: after determining that the obtained speech recognition result reaches a predetermined confidence level, sending the speech recognition result that reaches the predetermined confidence level to the multi-round dialog server. The predetermined confidence level can be set as needed in a specific implementation; this embodiment does not limit its value.
In this embodiment, while the user inputs voice to the terminal, the speech recognition server keeps recognizing the voice sent by the terminal. When the speech recognition server determines that the obtained speech recognition result has reached the predetermined confidence level, it sends that result to the multi-round dialog server, so that the multi-round dialog server, in the manner described in steps 102 to 104 of the embodiment shown in Fig. 1, determines the intention of the voice input by the user, generates an effective broadcast result, and sends it to the terminal for voice broadcast. In other words, once the terminal receives a broadcast result, it can interrupt the user's voice input and broadcast the obtained result to the user directly.
Step 203: receive the broadcast result sent by the multi-round dialog server, and send the broadcast result to the terminal for voice broadcast.
In above-described embodiment, carry out in the process of voice broadcast in terminal to the report result that speech recognition server sends, after the voice that receiving terminal sends, above-mentioned voice are identified, then voice identification result is sent to and take turns dialog server more, so that many wheel dialog servers determine the intention of the voice that user inputs according to upper speech recognition result, and generate report result according to above-mentioned intention, then speech recognition server receives the report result that many wheel dialog servers send, and sends to terminal to carry out voice broadcast above-mentioned report result; Thus can be implemented in the mutual process of man machine language, the phonetic entry of voice broadcast and user is carried out simultaneously, realize not needing in interactive process repeatedly switch recording and report two states, realize the communication mode of man-machine interaction full duplex, and then many wheel dialogues can be made more coherent.
Fig. 3 is a flowchart of another embodiment of the human-machine voice interaction method of the present invention. As shown in Fig. 3, the method can include:
Step 301: while the terminal is broadcasting the broadcast result sent by the speech recognition server, receive the voice input by the user of the terminal.
Specifically, receiving the user's voice during the broadcast can be: while the terminal broadcasts the result sent by the speech recognition server, eliminating the played Text-to-Speech (hereinafter: TTS) audio from the input by echo cancellation, so that only the user's voice is received.
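The echo-cancellation step can be illustrated with a normalized-LMS adaptive filter, a common building block of acoustic echo cancellers: the filter estimates how the played TTS signal leaks into the microphone and subtracts that estimate, leaving mostly the user's voice. This is a toy sketch under assumed parameters (filter length, step size); a real terminal would use a far more elaborate canceller in its audio front end.

```python
def cancel_echo(tts_far, mic_near, taps=8, mu=0.5, eps=1e-6):
    """tts_far: TTS samples being played (far end); mic_near: microphone
    samples (user's voice plus echoed TTS). Returns the echo-reduced signal."""
    w = [0.0] * taps                          # adaptive filter weights
    out = []
    for n in range(len(mic_near)):
        # most recent `taps` far-end samples, newest first
        x = [tts_far[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_est = sum(wk * xk for wk, xk in zip(w, x))
        e = mic_near[n] - echo_est            # error = cleaned sample
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + (mu / norm) * e * xk for wk, xk in zip(w, x)]   # NLMS update
        out.append(e)
    return out
```

Once the filter has converged, the residual on pure-echo input is close to zero, so whatever remains in the output is dominated by the user's own voice.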
In this embodiment, while the terminal is broadcasting the result sent by the speech recognition server, the user can still input voice to the terminal. That is, the user can barge in on the terminal's voice broadcast, or directly give feedback on the content being broadcast and thereby influence what the terminal broadcasts next. Voice broadcast and user voice input thus proceed at the same time, and there is no need to repeatedly switch between the recording state and the broadcasting state during the interaction.
Step 302: send the user's voice to the speech recognition server, so that the speech recognition server recognizes the voice and sends the recognition result to the multi-round dialog server; the multi-round dialog server sends the recognition result to the QU server for context understanding, receives and saves the context-understanding result returned by the QU server, determines the intention of the user's voice from the saved result, and generates a broadcast result according to that intention.
Specifically, sending the user's voice to the speech recognition server can be: sending the user's voice in segments of a predetermined length. The predetermined length can be set as needed in a specific implementation; this embodiment places no limit on its value.
Alternatively, it can be: determining the start and end of each sentence in the user's voice by silence detection, and sending only the recording that contains speech to the speech recognition server.
The user's voice input is sometimes long, often because details are being described, so a predetermined length can be set: whenever the input reaches that length, the segment of that length is sent to the speech recognition server. Alternatively, since the user sometimes pauses while speaking, silence detection can determine the start and end of each sentence so that only the recording containing speech is sent. In either case, the speech recognition server recognizes the voice and sends the recognition result to the multi-round dialog server; the multi-round dialog server sends the result to the QU server for context understanding, receives and saves the context-understanding result, determines the intention of the user's voice from the saved result, and generates a broadcast result according to that intention. The multi-round dialog server then sends the broadcast result to the speech recognition server, which sends it to the terminal; at this point the terminal can interrupt the user's voice input and broadcast the result.
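The fixed-length upload strategy above can be sketched in a few lines; the chunk size of 3200 samples is an illustrative assumption, and the silence-based variant would instead reuse the sentence segmentation described earlier.

```python
def chunk_by_length(samples, predetermined_length=3200):
    """Split the user's buffered voice into fixed-size chunks, each of
    which is uploaded to the speech recognition server as soon as it
    fills (the final chunk may be shorter)."""
    return [samples[i:i + predetermined_length]
            for i in range(0, len(samples), predetermined_length)]
```

With this scheme the terminal never waits for the whole utterance to finish before the server can begin recognizing.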
Step 303: receive and broadcast the result sent by the speech recognition server, where that result was sent to the speech recognition server by the multi-round dialog server.
In the above embodiment, while the terminal is broadcasting the result sent by the speech recognition server, it receives the voice input by its user and sends it to the speech recognition server; the speech recognition server recognizes the voice and sends the recognition result to the multi-round dialog server, which determines the intention of the user's voice from the recognition result and generates a broadcast result according to that intention; the terminal then receives and broadcasts the result sent by the speech recognition server. Voice broadcast and user voice input thus proceed at the same time during the interaction, with no need to repeatedly switch between the recording state and the broadcasting state; this realizes full-duplex human-machine communication and makes multi-round dialogs more coherent.
In the human-machine voice interaction methods provided by the embodiments shown in Fig. 1, Fig. 2 and Fig. 3, the connections among the terminal, the speech recognition server, the multi-round dialog server, the QU server and the resource access server can be as shown in Fig. 4, which is a schematic diagram of an embodiment of these connections.
Referring to Fig. 4: while the terminal is broadcasting the result sent by the speech recognition server, the terminal receives the voice input by its user. In the present invention, the user can still input voice during the broadcast; that is, the user can barge in on the broadcast, or directly give feedback on the content being broadcast. The following two dialog scenarios can thus be realized.
Dialog scenario one: the user interrupts the terminal's voice broadcast
User: I'd like to order food.
Terminal: What would you like?
User: Kung Pao chicken and Beijing roast duck.
Terminal: OK, preparing your order: one Kung Pao chicken—
User: Drop the Kung Pao chicken; change it to diced chicken with green pepper.
Terminal: OK, preparing your order: one diced chicken with green pepper and one Beijing roast duck.
Dialog scenario two: the user gives feedback on the terminal's voice broadcast
User: How is the weather these days?
Terminal: Quite good. Today's weather—
User: Mm.
Terminal (without stopping): Tomorrow's weather—
User: Mm, go on.
Terminal (without stopping): The day after tomorrow's weather—
User: OK.
Terminal: Broadcast finished.
Then the terminal sends the user's voice to the speech recognition server, which recognizes it and sends the recognition result to the multi-round dialog server. The multi-round dialog server sends the recognition result to the QU server for context understanding, receives and saves the context-understanding result, determines the intention of the user's voice from the saved result, and generates a broadcast result according to that intention.
As noted above, the user's voice input is sometimes long, often because details are being described, so a predetermined length can be set and the voice sent in segments of that length; or, since the user sometimes pauses while speaking, silence detection can determine the start and end of each sentence, so that only the recording containing speech is sent to the speech recognition server for recognition. Or, because the speech recognition server continuously recognizes the voice sent by the terminal while the user is speaking, it may forward a recognition result to the multi-round dialog server as soon as that result reaches the predetermined confidence level.
The multi-round dialog server then sends the recognition result to the QU server for context understanding, receives and saves the result, determines the intention of the user's voice from the saved result, and generates a broadcast result according to that intention. The multi-round dialog server sends the broadcast result to the speech recognition server, which sends it to the terminal; at this point the terminal can interrupt the user's voice input and broadcast the result, realizing the following dialog scenario.
Dialog scenario three: the terminal interrupts the user's voice input
User: Where would be a good place to go? It's been so boring lately, I'm thinking—
Terminal (interrupting): I know what you need. Deng Ziqi is giving a concert at the Worker's Stadium tonight, and tickets are currently discounted. Would you like to consider it?
User: Great, place the order.
Terminal: Bought you a ticket for Deng Ziqi's concert tonight at 9; the ticket price is xxx yuan.
In addition, the multi-round dialog server has learning capability: according to the user's profile (for example, the user's schedule and/or the songs the user listens to) and the user's current state (for example, current location and/or current dialog content), it can analyze the user's ideas and wishes and obtain content suitable for recommendation to the user. The multi-round dialog server can then trigger a cloud push service, through which the recommended content is sent to the terminal, and initiate a dialog with the terminal, realizing the following dialog scenario.
Dialog scenario four: recommending taxi information to the user according to the user's schedule
Terminal: You have booked a ticket for 4 this afternoon, and it is now 2 pm. Shall I book a taxi for you?
User: No need, I'll drive myself.
Terminal: Your car is under the driving restriction today.
User: OK, then book me a chauffeured car.
Terminal: OK, one moment please (...). Driver Wang has accepted the order; the license plate number is xxxx, estimated arrival in 3 minutes.
User: Thanks.
In the present invention, while the terminal broadcasts a result, the user can still input voice to the terminal. The terminal sends the voice to the speech recognition server for recognition, the speech recognition server sends the recognition result to the multi-round dialog server, and the multi-round dialog server sends the result to the QU server for context understanding, receives and saves the context-understanding result, determines the intention of the user's voice from the saved result, generates a broadcast result according to that intention, and returns it to the terminal for voice broadcast. The following five states can be realized:
1. The terminal keeps broadcasting; in this state the user's input may be "oh" or "interesting".
2. The terminal stops the current broadcast and ends the current topic; in this state the user's input may be "got it" or "that's enough".
3. The multi-round dialog server connects to the resource access server to open a new topic; in this state the user's input may be "cut in with Beijing's weather".
4. The multi-round dialog server connects to the resource access server to go deeper into the current topic; in this state the user's input may be "Beijing's weather" followed by "Shanghai".
5. The dialog returns to a previous topic; in this state the user's input may be "the joke from before isn't finished". The multi-round dialog server may also inquire actively, in which case the broadcast result received by the terminal may be "The weather broadcast is finished; shall I continue the crosstalk from before?"
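The five states above can be made concrete with a small dispatcher. In the patent, the decision is made by the multi-round dialog server after context understanding; the keyword rules and example phrases below are stand-in assumptions used only to illustrate the five outcomes.

```python
KEEP_BROADCASTING = "keep"      # state 1: backchannel ("oh", "interesting")
STOP_TOPIC = "stop"             # state 2: "got it", "that's enough"
NEW_TOPIC = "new"               # state 3: "cut in with ..."
DEEPEN_TOPIC = "deepen"         # state 4: a follow-up slot like "Shanghai"
RESUME_TOPIC = "resume"         # state 5: "the joke from before isn't finished"

def dispatch(utterance, open_slots=()):
    """Map a recognized user utterance to one of the five states.
    open_slots: slot values the current topic could still accept."""
    if utterance in ("oh", "mm", "interesting"):
        return KEEP_BROADCASTING
    if utterance in ("got it", "that's enough"):
        return STOP_TOPIC
    if utterance.startswith("cut in"):
        return NEW_TOPIC
    if utterance in open_slots:
        return DEEPEN_TOPIC
    if "before" in utterance:
        return RESUME_TOPIC
    return KEEP_BROADCASTING    # default: do not interrupt the broadcast
```

A production system would replace each rule with the intention produced by the QU server, but the five-way branching is the same.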
In summary, the present invention can maintain the dialog and ensure the quality of the chat without any manual intervention by the user (such as pressing buttons).
Fig. 5 is a schematic structural diagram of an embodiment of the human-machine voice interaction device of the present invention. The device in this embodiment can serve as the multi-round dialog server, or as a part of the multi-round dialog server, to realize the flow of the embodiment shown in Fig. 1. As shown in Fig. 5, the device can include: a receiver module 51, a sending module 52, a saving module 53, a determination module 54 and a generation module 55.
The receiver module 51 is configured to receive, while the terminal is broadcasting the result sent by the speech recognition server, the speech recognition result sent by the speech recognition server, where that result is obtained by the speech recognition server recognizing the voice input by the user of the terminal; and, after the sending module 52 has sent the recognition result to the QU server for context understanding, to receive the context-understanding result returned by the QU server.
In this embodiment, the user can continue to input voice while the terminal is broadcasting; that is, during the broadcast the terminal keeps receiving the user's voice and keeps sending it to the speech recognition server for recognition, the speech recognition server keeps sending recognition results to the multi-round dialog server, and the receiver module 51 therefore keeps receiving recognition results. Voice broadcast and user voice input thus proceed at the same time, with no need to repeatedly switch between the recording state and the broadcasting state.
The sending module 52 is configured to send the recognition result received by the receiver module 51 to the QU server for context understanding.
The saving module 53 is configured to save the context-understanding result received by the receiver module 51.
The determination module 54 is configured to determine the intention of the user's voice according to the context-understanding result saved by the saving module 53.
The generation module 55 is configured to generate a broadcast result according to the intention determined by the determination module 54.
The sending module 52 is further configured to send the broadcast result generated by the generation module 55 to the speech recognition server, so that the speech recognition server sends it to the terminal for voice broadcast.
In this embodiment, the generation module 55 is specifically configured to obtain, from the resource access server, information corresponding to the intention determined by the determination module 54, and to generate the broadcast result from the obtained information.
In this embodiment, the receiver module 51 is specifically configured to receive the recognition result that the speech recognition server sends after determining that the result has reached a predetermined confidence level. The predetermined confidence level can be set as needed in a specific implementation; this embodiment places no limit on its value.
In this embodiment, while the user inputs voice to the terminal, the speech recognition server continuously recognizes the voice sent by the terminal. Once the speech recognition server determines that an acquired recognition result has reached the predetermined confidence level, it sends that result to the multi-round dialog server, so that the determination module 54 can determine the intention of the user's voice, the generation module 55 can generate a valid broadcast result, and the sending module 52 can send that result toward the terminal for voice broadcast. In other words, once the terminal receives a broadcast result, it can interrupt the user's voice input and broadcast the result directly.
Further, in this embodiment, the device can also include: an acquisition module 56, configured to obtain content suitable for recommendation to the user according to the user's profile and current state. The sending module 52 is further configured to trigger a cloud push service, send the recommended content to the terminal through the cloud push service, and initiate a dialog with the terminal.
In the above device, while the terminal is broadcasting the result sent by the speech recognition server, the receiver module 51 receives the recognition result sent by the speech recognition server, the determination module 54 determines the intention of the user's voice from the recognition result, the generation module 55 generates a broadcast result according to the determined intention, and the sending module 52 sends the broadcast result to the speech recognition server, which sends it to the terminal for voice broadcast. Voice broadcast and user voice input thus proceed at the same time during the interaction, with no need to repeatedly switch between the recording state and the broadcasting state; this realizes full-duplex human-machine communication and makes multi-round dialogs more coherent.
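The flow through modules 51 to 55 can be sketched as a small class: recognition result in, context understanding, saved context, intention, broadcast result out. The QU call, the intention rule and the resource lookup are placeholder assumptions; only the module flow mirrors the device of Fig. 5.

```python
class MultiRoundDialogDevice:
    def __init__(self, qu_server, resource_server):
        self.qu = qu_server                    # context-understanding service
        self.resources = resource_server       # resource access server lookup
        self.context = []                      # saved context-understanding results

    def on_recognition_result(self, text):
        """Handle one recognition result received from the speech
        recognition server (receiver module 51)."""
        understood = self.qu(text)             # sending module 52 -> QU server
        self.context.append(understood)        # saving module 53
        intention = self.determine_intention() # determination module 54
        return self.generate_broadcast(intention)  # generation module 55

    def determine_intention(self):
        # toy rule: the most recent understood item is the current intention
        return self.context[-1]

    def generate_broadcast(self, intention):
        info = self.resources.get(intention, "no information")
        return f"broadcast: {info}"
```

In the patent, the returned broadcast result would go back through the speech recognition server to the terminal rather than to the caller directly.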
Fig. 6 is a schematic structural diagram of another embodiment of the human-machine voice interaction device of the present invention. The device in this embodiment can serve as the speech recognition server, or as a part of the speech recognition server, to realize the flow of the embodiment shown in Fig. 2. As shown in Fig. 6, the device can include: a receiver module 61, a sending module 62 and a recognition module 63.
The receiver module 61 is configured to receive, while the terminal is broadcasting the result sent by the speech recognition server, the voice sent by the terminal, where the voice is input to the terminal by its user; and, after the sending module 62 has sent the recognition result to the multi-round dialog server, to receive the broadcast result sent by the multi-round dialog server.
In this embodiment, the receiver module 61 can receive the voice sent by the terminal while the terminal is broadcasting; that is, voice broadcast and user voice input proceed at the same time during the interaction, so there is no need to repeatedly switch between the recording state and the broadcasting state.
The recognition module 63 is configured to recognize the voice received by the receiver module 61, and specifically to determine the start and end of each sentence in the voice by silence detection. In this embodiment, by using silence detection, the recognition module 63 can segment the voice into sentences; that is, it can determine the start and end of each sentence in the voice.
The sending module 62 is configured to send the recognition result produced by the recognition module 63 to the multi-round dialog server, so that the multi-round dialog server sends the result to the QU server for context understanding, receives and saves the context-understanding result, determines the intention of the user's voice from the saved result, and generates a broadcast result according to that intention; and, after the receiver module 61 has received the broadcast result from the multi-round dialog server, to send that result to the terminal for voice broadcast.
The sending module 62 is specifically configured to send the recognition result to the multi-round dialog server after determining that the result has reached a predetermined confidence level. The predetermined confidence level can be set as needed in a specific implementation; this embodiment places no limit on its value. While the user inputs voice to the terminal, the recognition module 63 continuously recognizes the voice sent by the terminal; once a recognition result has reached the predetermined confidence level, the sending module 62 sends it to the multi-round dialog server, so that the multi-round dialog server can determine the intention of the user's voice in the manner described in steps 102 to 104 of the embodiment shown in Fig. 1, generate a valid broadcast result, and send it to the terminal for voice broadcast. In other words, once the terminal receives a broadcast result, it can interrupt the user's voice input and broadcast the result directly.
In the above device, while the terminal is broadcasting the result sent by the speech recognition server, the receiver module 61 receives the voice sent by the terminal, the recognition module 63 recognizes it, and the sending module 62 sends the recognition result to the multi-round dialog server, so that the multi-round dialog server determines the intention of the user's voice from the recognition result and generates a broadcast result according to that intention; the receiver module 61 then receives the broadcast result from the multi-round dialog server, and the sending module 62 sends it to the terminal for voice broadcast. Voice broadcast and user voice input thus proceed at the same time during the interaction, with no need to repeatedly switch between the recording state and the broadcasting state; this realizes full-duplex human-machine communication and makes multi-round dialogs more coherent.
Fig. 7 is a schematic structural diagram of another embodiment of the human-machine voice interaction device of the present invention. The device in this embodiment can serve as the terminal, or as a part of the terminal, to realize the flow of the embodiment shown in Fig. 3. As shown in Fig. 7, the device can include: a receiver module 71, a sending module 72 and a broadcast module 73.
The receiver module 71 is configured to receive, while the terminal is broadcasting the result sent by the speech recognition server, the voice input by the user of the terminal; and, after the sending module 72 has sent the voice to the speech recognition server, to receive the broadcast result sent by the speech recognition server, where that result was sent to the speech recognition server by the multi-round dialog server. In this embodiment, the receiver module 71 is specifically configured to eliminate, by echo cancellation, the played TTS audio from the input while the terminal broadcasts the result sent by the speech recognition server, so that only the user's voice is received.
In this embodiment, the user can still input voice while the terminal is broadcasting; that is, the user can barge in on the broadcast, or directly give feedback on the content being broadcast and thereby influence what the terminal broadcasts next. Voice broadcast and user voice input thus proceed at the same time, with no need to repeatedly switch between the recording state and the broadcasting state.
The sending module 72 is configured to send the voice received by the receiver module 71 to the speech recognition server, so that the speech recognition server recognizes the voice and sends the recognition result to the multi-round dialog server; the multi-round dialog server sends the recognition result to the QU server for context understanding, receives and saves the context-understanding result, determines the intention of the user's voice from the saved result, and generates a broadcast result according to that intention.
The broadcast module 73 is configured to broadcast the result received by the receiver module 71.
In one implementation of this embodiment, the sending module 72 is specifically configured to send the user's voice in segments of a predetermined length. The predetermined length can be set as needed in a specific implementation; this embodiment places no limit on its value.
In another implementation of this embodiment, the sending module 72 is specifically configured to determine the start and end of each sentence in the user's voice by silence detection, and to send only the recording that contains speech to the speech recognition server.
The user's voice input is sometimes long, often because details are being described, so a predetermined length can be set: whenever the input reaches that length, the sending module 72 sends the segment of that length to the speech recognition server. Alternatively, since the user sometimes pauses while speaking, silence detection can determine the start and end of each sentence, and only the recording that contains speech is sent. In either case, the speech recognition server recognizes the voice and sends the recognition result to the multi-round dialog server; the multi-round dialog server sends the result to the QU server for context understanding, receives and saves the context-understanding result, determines the intention of the user's voice from the saved result, and generates a broadcast result according to that intention. The multi-round dialog server then sends the broadcast result to the speech recognition server, which sends it to the terminal; at this point the terminal can interrupt the user's voice input and broadcast the result.
In the above device, while the terminal is broadcasting the result sent by the speech recognition server, the receiver module 71 receives the voice input by the user, and the sending module 72 sends it to the speech recognition server, so that the speech recognition server recognizes the voice and sends the recognition result to the multi-round dialog server, which determines the intention of the user's voice from the recognition result and generates a broadcast result accordingly; the receiver module 71 then receives the broadcast result sent by the speech recognition server, and the broadcast module 73 broadcasts it. Voice broadcast and user voice input thus proceed at the same time during the interaction, with no need to repeatedly switch between the recording state and the broadcasting state; this realizes full-duplex human-machine communication and makes multi-round dialogs more coherent.
It should be noted that, in the description of the present invention, the terms "first", "second", and the like are used for descriptive purposes only and should not be interpreted as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise specified, "multiple" means two or more.
Any process or method described in a flowchart or otherwise herein may be understood as representing a module, fragment, or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies known in the art may be used: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those skilled in the art will appreciate that all or part of the steps carried out by the methods of the above embodiments may be completed by hardware instructed by a program; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, may exist separately as physical modules, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example, and the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (22)

1. A human-machine voice interaction method, characterized by comprising:
while a terminal is voice-broadcasting a report result sent by a speech recognition server, receiving a speech recognition result sent by the speech recognition server, the speech recognition result being sent by the speech recognition server after it recognizes speech input by a user of the terminal;
sending the speech recognition result to a query understanding server for context understanding, and receiving and storing the context-understanding result sent by the query understanding server;
determining the intent of the user's speech according to the stored context-understanding result, and generating a report result according to the intent;
sending the report result to the speech recognition server, so that the speech recognition server sends the report result to the terminal for voice broadcasting.
2. The method according to claim 1, characterized in that generating a report result according to the intent comprises:
obtaining, from a resource access server, information corresponding to the intent, and generating the report result according to the obtained information.
3. The method according to claim 1, characterized in that receiving the speech recognition result sent by the speech recognition server comprises:
receiving the speech recognition result that the speech recognition server sends after determining that the obtained speech recognition result reaches a predetermined confidence level.
4. The method according to any one of claims 1-3, characterized by further comprising:
obtaining, according to the user's profile and current state, content suitable for recommendation to the user, triggering a cloud push service, sending the content suitable for recommendation to the user to the terminal through the cloud push service, and initiating a dialog with the terminal.
5. A human-machine voice interaction method, characterized by comprising:
while a terminal is voice-broadcasting a report result sent by a speech recognition server, receiving speech sent by the terminal, the speech having been input to the terminal by a user of the terminal;
recognizing the speech and sending the speech recognition result to a multi-round dialog server, so that the multi-round dialog server sends the speech recognition result to a query understanding server for context understanding, receives and stores the context-understanding result sent by the query understanding server, determines the intent of the user's speech according to the stored context-understanding result, and generates a report result according to the intent;
receiving the report result sent by the multi-round dialog server, and sending the report result to the terminal for voice broadcasting.
6. The method according to claim 5, characterized in that recognizing the speech comprises:
determining the start and end of each sentence in the speech by silence detection.
7. The method according to claim 5 or 6, characterized in that sending the speech recognition result to the multi-round dialog server comprises:
after determining that the obtained speech recognition result reaches a predetermined confidence level, sending the speech recognition result reaching the predetermined confidence level to the multi-round dialog server.
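The confidence gate in claims 3 and 7 amounts to forwarding a recognition hypothesis only once its confidence reaches the predetermined level. A minimal sketch, assuming recognition results arrive as (text, confidence) pairs and using an illustrative threshold of 0.8 (the patent does not fix a value):

```python
def forward_if_confident(results, threshold=0.8):
    """Keep only hypotheses whose confidence reaches the threshold.

    `results` is an iterable of (text, confidence) pairs; pairs below
    the predetermined confidence are withheld rather than forwarded on
    to the downstream dialog server.
    """
    return [(text, conf) for text, conf in results if conf >= threshold]
```

Low-confidence partial hypotheses are thereby never acted on, which keeps the dialog server from generating report results for misrecognized speech.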
8. A human-machine voice interaction method, characterized by comprising:
while a terminal is voice-broadcasting a report result sent by a speech recognition server, receiving speech input by a user of the terminal;
sending the speech input by the user to the speech recognition server, so that the speech recognition server recognizes the speech and sends the speech recognition result to a multi-round dialog server, and the multi-round dialog server sends the speech recognition result to a query understanding server for context understanding, receives and stores the context-understanding result sent by the query understanding server, determines the intent of the user's speech according to the stored context-understanding result, and generates a report result according to the intent;
receiving and broadcasting the report result sent by the speech recognition server, the report result having been sent to the speech recognition server by the multi-round dialog server.
9. The method according to claim 8, characterized in that, while the terminal is voice-broadcasting the report result sent by the speech recognition server, receiving speech input by the user of the terminal comprises:
while the terminal is broadcasting the report result sent by the speech recognition server, eliminating input from the text-to-speech (TTS) audio being played by means of echo cancellation, and receiving only the speech input by the user.
10. The method according to claim 8 or 9, characterized in that sending the speech input by the user to the speech recognition server comprises:
sending speech of a predetermined length input by the user to the speech recognition server.
11. The method according to claim 8 or 9, characterized in that sending the speech input by the user to the speech recognition server comprises:
determining the start and end of each sentence in the speech input by the user by silence detection, and sending only the recordings containing speech to the speech recognition server.
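The echo cancellation of claim 9 — removing the played TTS audio from the microphone signal so that only the user's speech is received — is commonly done with an adaptive filter. The following is a toy normalized-LMS (NLMS) canceller, a stand-in sketch rather than the claimed implementation; the tap count and step size are arbitrary illustrative choices, and production systems use far more elaborate echo cancellers.

```python
def nlms_echo_cancel(reference, microphone, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of `reference`'s echo from `microphone`.

    reference  -- far-end samples (the TTS playback)
    microphone -- near-end samples: echo of the reference (+ user speech)
    Returns the error signal, i.e. the microphone with the echo suppressed.
    """
    w = [0.0] * taps                    # filter weights, adapted per sample
    out = []
    for n in range(len(microphone)):
        # the most recent `taps` reference samples, zero-padded at the start
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = microphone[n] - echo_estimate   # residual: ideally user speech only
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]  # NLMS update
        out.append(e)
    return out
```

When the echo path lies within the filter's span, the residual decays toward zero as the weights converge, leaving only the user's speech in the output.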
12. A human-machine voice interaction device, characterized by comprising:
a receiving module, configured to: while a terminal is voice-broadcasting a report result sent by a speech recognition server, receive a speech recognition result sent by the speech recognition server, the speech recognition result being sent by the speech recognition server after it recognizes speech input by a user of the terminal; and, after a sending module sends the speech recognition result to a query understanding server for context understanding, receive the context-understanding result sent by the query understanding server;
the sending module, configured to send the speech recognition result received by the receiving module to the query understanding server for context understanding;
a storage module, configured to store the context-understanding result received by the receiving module;
a determining module, configured to determine the intent of the user's speech according to the context-understanding result stored by the storage module;
a generating module, configured to generate a report result according to the intent determined by the determining module;
the sending module being further configured to send the report result generated by the generating module to the speech recognition server, so that the speech recognition server sends the report result to the terminal for voice broadcasting.
13. The device according to claim 12, characterized in that:
the generating module is specifically configured to obtain, from a resource access server, information corresponding to the intent determined by the determining module, and to generate the report result according to the obtained information.
14. The device according to claim 12, characterized in that:
the receiving module is specifically configured to receive the speech recognition result that the speech recognition server sends after determining that the obtained speech recognition result reaches a predetermined confidence level.
15. The device according to any one of claims 12-14, characterized by further comprising:
an obtaining module, configured to obtain, according to the user's profile and current state, content suitable for recommendation to the user;
the sending module being further configured to trigger a cloud push service, send the content suitable for recommendation to the user to the terminal through the cloud push service, and initiate a dialog with the terminal.
16. A human-machine voice interaction device, characterized by comprising:
a receiving module, configured to: while a terminal is voice-broadcasting a report result sent by a speech recognition server, receive speech sent by the terminal, the speech having been input to the terminal by a user of the terminal; and, after a sending module sends a speech recognition result to a multi-round dialog server, receive the report result sent by the multi-round dialog server;
a recognition module, configured to recognize the speech received by the receiving module;
the sending module, configured to send the speech recognition result produced by the recognition module to the multi-round dialog server, so that the multi-round dialog server sends the speech recognition result to a query understanding server for context understanding, receives and stores the context-understanding result sent by the query understanding server, determines the intent of the user's speech according to the stored context-understanding result, and generates a report result according to the intent; and, after the receiving module receives the report result sent by the multi-round dialog server, to send the report result to the terminal for voice broadcasting.
17. The device according to claim 16, characterized in that:
the recognition module is specifically configured to determine the start and end of each sentence in the speech by silence detection.
18. The device according to claim 16 or 17, characterized in that:
the sending module is specifically configured to, after determining that the obtained speech recognition result reaches a predetermined confidence level, send the speech recognition result reaching the predetermined confidence level to the multi-round dialog server.
19. A human-machine voice interaction device, characterized by comprising:
a receiving module, configured to: while a terminal is voice-broadcasting a report result sent by a speech recognition server, receive speech input by a user of the terminal; and, after a sending module sends the speech to the speech recognition server, receive the report result sent by the speech recognition server, the report result having been sent to the speech recognition server by a multi-round dialog server;
the sending module, configured to send the speech received by the receiving module to the speech recognition server, so that the speech recognition server recognizes the speech and sends the speech recognition result to the multi-round dialog server, and the multi-round dialog server sends the speech recognition result to a query understanding server for context understanding, receives and stores the context-understanding result sent by the query understanding server, determines the intent of the user's speech according to the stored context-understanding result, and generates a report result according to the intent;
a report module, configured to broadcast the report result received by the receiving module.
20. The device according to claim 19, characterized in that:
the receiving module is specifically configured to, while the terminal is broadcasting the report result sent by the speech recognition server, eliminate input from the text-to-speech (TTS) audio being played by means of echo cancellation, and receive only the speech input by the user.
21. The device according to claim 19 or 20, characterized in that:
the sending module is specifically configured to send speech of a predetermined length input by the user to the speech recognition server.
22. The device according to claim 19 or 20, characterized in that:
the sending module is specifically configured to determine the start and end of each sentence in the speech input by the user by silence detection, and to send only the recordings containing speech to the speech recognition server.
CN201510080163.XA 2015-02-13 2015-02-13 Man-machine voice interactive method and device Pending CN104679472A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510080163.XA CN104679472A (en) 2015-02-13 2015-02-13 Man-machine voice interactive method and device
PCT/CN2015/083207 WO2016127550A1 (en) 2015-02-13 2015-07-02 Method and device for human-machine voice interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510080163.XA CN104679472A (en) 2015-02-13 2015-02-13 Man-machine voice interactive method and device

Publications (1)

Publication Number Publication Date
CN104679472A true CN104679472A (en) 2015-06-03

Family

ID=53314597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510080163.XA Pending CN104679472A (en) 2015-02-13 2015-02-13 Man-machine voice interactive method and device

Country Status (2)

Country Link
CN (1) CN104679472A (en)
WO (1) WO2016127550A1 (en)

Cited By (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105070290A (en) * 2015-07-08 2015-11-18 苏州思必驰信息科技有限公司 Man-machine voice interaction method and system
CN105161097A (en) * 2015-07-23 2015-12-16 百度在线网络技术(北京)有限公司 Voice interaction method and apparatus
WO2016127550A1 (en) * 2015-02-13 2016-08-18 百度在线网络技术(北京)有限公司 Method and device for human-machine voice interaction
CN106095833A * 2016-06-01 2016-11-09 竹间智能科技(上海)有限公司 Human-computer conversation content processing method
CN107799116A * 2016-08-31 2018-03-13 科大讯飞股份有限公司 Multi-round interaction parallel semantic understanding method and apparatus
CN107832439A * 2017-11-16 2018-03-23 百度在线网络技术(北京)有限公司 Multi-round state tracking method, system, and terminal device
CN107943834A * 2017-10-25 2018-04-20 百度在线网络技术(北京)有限公司 Human-machine interaction implementation method, apparatus, device, and storage medium
CN108600511A * 2018-03-22 2018-09-28 上海摩软通讯技术有限公司 Control system and method for an intelligent voice assistant device
CN109145853A * 2018-08-31 2019-01-04 百度在线网络技术(北京)有限公司 Method and apparatus for identifying noise
CN109657091A * 2019-01-02 2019-04-19 百度在线网络技术(北京)有限公司 State rendering method, apparatus, device, and storage medium for a voice interaction device
CN109725798A * 2017-10-25 2019-05-07 腾讯科技(北京)有限公司 Switching method and related apparatus for an autonomous role
CN110364152A (en) * 2019-07-25 2019-10-22 深圳智慧林网络科技有限公司 Voice interactive method, equipment and computer readable storage medium
CN110557451A (en) * 2019-08-30 2019-12-10 北京百度网讯科技有限公司 Dialogue interaction processing method and device, electronic equipment and storage medium
CN110782625A (en) * 2018-12-17 2020-02-11 北京嘀嘀无限科技发展有限公司 Riding safety alarm method and device, electronic equipment and storage medium
CN111292732A (en) * 2018-12-06 2020-06-16 深圳市广和通无线股份有限公司 Audio information processing method and device, computer equipment and storage medium
CN111429896A (en) * 2018-06-01 2020-07-17 苹果公司 Voice interaction for accessing calling functionality of companion device at primary device
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
CN112700767A (en) * 2019-10-21 2021-04-23 苏州思必驰信息科技有限公司 Man-machine conversation interruption method and device
CN112732340A (en) * 2019-10-14 2021-04-30 苏州思必驰信息科技有限公司 Man-machine conversation processing method and device
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US11321116B2 (en) 2012-05-15 2022-05-03 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
WO2022141990A1 (en) * 2020-12-31 2022-07-07 广东美的制冷设备有限公司 Household appliance and voice control method therefor, voice device, and computer storage medium
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US11516537B2 (en) 2014-06-30 2022-11-29 Apple Inc. Intelligent automated assistant for TV user interactions
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US11580990B2 (en) 2017-05-12 2023-02-14 Apple Inc. User-specific acoustic models
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11670289B2 (en) 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11675829B2 (en) 2017-05-16 2023-06-13 Apple Inc. Intelligent automated assistant for media exploration
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US11809483B2 (en) 2015-09-08 2023-11-07 Apple Inc. Intelligent automated assistant for media search and playback
US11809783B2 (en) 2016-06-11 2023-11-07 Apple Inc. Intelligent device arbitration and control
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11853647B2 (en) 2015-12-23 2023-12-26 Apple Inc. Proactive assistance based on dialog communication between devices
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11947873B2 (en) 2015-06-29 2024-04-02 Apple Inc. Virtual assistant for media playback

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108492822A * 2018-02-23 2018-09-04 济南汇通远德科技有限公司 A speech recognition method based on commercial applications
CN108831434A * 2018-05-29 2018-11-16 尹绍华 Voice interaction system and method
CN111916082A (en) * 2020-08-14 2020-11-10 腾讯科技(深圳)有限公司 Voice interaction method and device, computer equipment and storage medium
CN112735423B (en) * 2020-12-14 2024-04-05 美的集团股份有限公司 Voice interaction method and device, electronic equipment and storage medium
CN113257242A (en) * 2021-04-06 2021-08-13 杭州远传新业科技有限公司 Voice broadcast suspension method, device, equipment and medium in self-service voice service
CN113569021B (en) * 2021-06-29 2023-08-04 杭州摸象大数据科技有限公司 Method for classifying users, computer device and readable storage medium
US11605384B1 (en) 2021-07-30 2023-03-14 Nvidia Corporation Duplex communications for conversational AI by dynamically responsive interrupting content

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1591315A (en) * 2003-05-29 2005-03-09 微软公司 Semantic object synchronous understanding for highly interactive interface
CN101178705A * 2007-12-13 2008-05-14 中国电信股份有限公司 Free-speech understanding method and human-machine interactive intelligent system
CN101281745A * 2008-05-23 2008-10-08 深圳市北科瑞声科技有限公司 Vehicle-mounted voice interaction system
CN203055434U * 2012-07-30 2013-07-10 刘强 Home voice interaction terminal based on cloud technology
WO2014040022A2 (en) * 2012-09-10 2014-03-13 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistants
CN104282306A (en) * 2014-09-22 2015-01-14 奇瑞汽车股份有限公司 Vehicle-mounted voice recognition interaction method, terminal and server

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6078964B2 (en) * 2012-03-26 2017-02-15 富士通株式会社 Spoken dialogue system and program
CN103413549B * 2013-07-31 2016-07-06 深圳创维-Rgb电子有限公司 Voice interaction method, system, and interactive terminal
CN103971681A (en) * 2014-04-24 2014-08-06 百度在线网络技术(北京)有限公司 Voice recognition method and system
CN104679472A (en) * 2015-02-13 2015-06-03 百度在线网络技术(北京)有限公司 Man-machine voice interactive method and device


Cited By (101)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11900936B2 (en) 2008-10-02 2024-02-13 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11321116B2 (en) 2012-05-15 2022-05-03 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11636869B2 (en) 2013-02-07 2023-04-25 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US11862186B2 (en) 2013-02-07 2024-01-02 Apple Inc. Voice trigger for a digital assistant
US11557310B2 (en) 2013-02-07 2023-01-17 Apple Inc. Voice trigger for a digital assistant
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US11810562B2 (en) 2014-05-30 2023-11-07 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11670289B2 (en) 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11699448B2 (en) 2014-05-30 2023-07-11 Apple Inc. Intelligent assistant for home automation
US11838579B2 (en) 2014-06-30 2023-12-05 Apple Inc. Intelligent automated assistant for TV user interactions
US11516537B2 (en) 2014-06-30 2022-11-29 Apple Inc. Intelligent automated assistant for TV user interactions
WO2016127550A1 (en) * 2015-02-13 2016-08-18 百度在线网络技术(北京)有限公司 Method and device for human-machine voice interaction
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US11842734B2 (en) 2015-03-08 2023-12-12 Apple Inc. Virtual assistant activation
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
US11947873B2 (en) 2015-06-29 2024-04-02 Apple Inc. Virtual assistant for media playback
CN105070290A (en) * 2015-07-08 2015-11-18 苏州思必驰信息科技有限公司 Man-machine voice interaction method and system
CN105161097A (en) * 2015-07-23 2015-12-16 百度在线网络技术(北京)有限公司 Voice interaction method and apparatus
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US11550542B2 (en) 2015-09-08 2023-01-10 Apple Inc. Zero latency digital assistant
US11954405B2 (en) 2015-09-08 2024-04-09 Apple Inc. Zero latency digital assistant
US11809483B2 (en) 2015-09-08 2023-11-07 Apple Inc. Intelligent automated assistant for media search and playback
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US11809886B2 (en) 2015-11-06 2023-11-07 Apple Inc. Intelligent automated assistant in a messaging environment
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US11853647B2 (en) 2015-12-23 2023-12-26 Apple Inc. Proactive assistance based on dialog communication between devices
CN106095833B (en) * 2016-06-01 2019-04-16 竹间智能科技(上海)有限公司 Human-computer dialogue content processing method
CN106095833A (en) * 2016-06-01 2016-11-09 竹间智能科技(上海)有限公司 Human-computer dialogue content processing method
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11657820B2 (en) 2016-06-10 2023-05-23 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11809783B2 (en) 2016-06-11 2023-11-07 Apple Inc. Intelligent device arbitration and control
US11749275B2 (en) 2016-06-11 2023-09-05 Apple Inc. Application integration with a digital assistant
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
CN107799116A (en) * 2016-08-31 2018-03-13 科大讯飞股份有限公司 Multi-turn parallel interaction semantic understanding method and apparatus
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11580990B2 (en) 2017-05-12 2023-02-14 Apple Inc. User-specific acoustic models
US11837237B2 (en) 2017-05-12 2023-12-05 Apple Inc. User-specific acoustic models
US11862151B2 (en) 2017-05-12 2024-01-02 Apple Inc. Low-latency intelligent automated assistant
US11538469B2 (en) 2017-05-12 2022-12-27 Apple Inc. Low-latency intelligent automated assistant
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US11675829B2 (en) 2017-05-16 2023-06-13 Apple Inc. Intelligent automated assistant for media exploration
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
CN109725798B (en) * 2017-10-25 2021-07-27 腾讯科技(北京)有限公司 Intelligent role switching method and related device
CN107943834A (en) * 2017-10-25 2018-04-20 百度在线网络技术(北京)有限公司 Interactive implementation method, device, equipment and storage medium
CN107943834B (en) * 2017-10-25 2021-06-11 百度在线网络技术(北京)有限公司 Method, device, equipment and storage medium for implementing man-machine conversation
CN109725798A (en) * 2017-10-25 2019-05-07 腾讯科技(北京)有限公司 Intelligent role switching method and related device
US10664755B2 (en) 2017-11-16 2020-05-26 Baidu Online Network Technology (Beijing) Co., Ltd. Searching method and system based on multi-round inputs, and terminal
CN107832439A (en) * 2017-11-16 2018-03-23 百度在线网络技术(北京)有限公司 Multi-turn state tracking method, system and terminal device
CN108600511A (en) * 2018-03-22 2018-09-28 上海摩软通讯技术有限公司 The control system and method for intelligent sound assistant's equipment
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US11907436B2 (en) 2018-05-07 2024-02-20 Apple Inc. Raise to speak
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US11900923B2 (en) 2018-05-07 2024-02-13 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11487364B2 (en) 2018-05-07 2022-11-01 Apple Inc. Raise to speak
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
CN111429896A (en) * 2018-06-01 2020-07-17 苹果公司 Voice interaction for accessing calling functionality of companion device at primary device
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US11360577B2 (en) 2018-06-01 2022-06-14 Apple Inc. Attention aware virtual assistant dismissal
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US11630525B2 (en) 2018-06-01 2023-04-18 Apple Inc. Attention aware virtual assistant dismissal
CN109145853A (en) * 2018-08-31 2019-01-04 百度在线网络技术(北京)有限公司 Method and apparatus for identifying noise
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
CN111292732A (en) * 2018-12-06 2020-06-16 深圳市广和通无线股份有限公司 Audio information processing method and device, computer equipment and storage medium
CN111292732B (en) * 2018-12-06 2023-07-21 深圳市广和通无线股份有限公司 Audio information processing method, device, computer equipment and storage medium
CN110782625A (en) * 2018-12-17 2020-02-11 北京嘀嘀无限科技发展有限公司 Riding safety alarm method and device, electronic equipment and storage medium
CN109657091A (en) * 2019-01-02 2019-04-19 百度在线网络技术(北京)有限公司 Method, apparatus, device and storage medium for rendering state of voice interaction device
US11205431B2 (en) 2019-01-02 2021-12-21 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and device for presenting state of voice interaction device, and storage medium
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
CN110364152A (en) * 2019-07-25 2019-10-22 深圳智慧林网络科技有限公司 Voice interactive method, equipment and computer readable storage medium
CN110364152B (en) * 2019-07-25 2022-04-01 深圳智慧林网络科技有限公司 Voice interaction method, device and computer-readable storage medium
CN110557451A (en) * 2019-08-30 2019-12-10 北京百度网讯科技有限公司 Dialogue interaction processing method and device, electronic equipment and storage medium
US11830483B2 (en) 2019-10-14 2023-11-28 Ai Speech Co., Ltd. Method for processing man-machine dialogues
CN112732340A (en) * 2019-10-14 2021-04-30 苏州思必驰信息科技有限公司 Man-machine conversation processing method and device
CN112700767A (en) * 2019-10-21 2021-04-23 苏州思必驰信息科技有限公司 Man-machine conversation interruption method and device
WO2021077528A1 (en) * 2019-10-21 2021-04-29 苏州思必驰信息科技有限公司 Method for interrupting human-machine conversation
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11924254B2 (en) 2020-05-11 2024-03-05 Apple Inc. Digital assistant hardware abstraction
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11750962B2 (en) 2020-07-21 2023-09-05 Apple Inc. User identification using headphones
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
WO2022141990A1 (en) * 2020-12-31 2022-07-07 广东美的制冷设备有限公司 Household appliance and voice control method therefor, voice device, and computer storage medium

Also Published As

Publication number Publication date
WO2016127550A1 (en) 2016-08-18

Similar Documents

Publication Publication Date Title
CN104679472A (en) Man-machine voice interactive method and device
JP6949149B2 (en) Spoken privilege management for voice assistant systems
CN110442701B (en) Voice conversation processing method and device
EP3721605B1 (en) Streaming radio with personalized content integration
KR101821358B1 (en) Method and system for providing multi-user messenger service
KR102342623B1 (en) Voice and connection platform
CN111049996B (en) Multi-scene voice recognition method and device and intelligent customer service system applying same
CN111429895B (en) Semantic understanding method and device for multi-round interaction and computer storage medium
US20150206534A1 (en) Method of controlling interactive system, method of controlling server, server, and interactive device
CN104318924A (en) Method for realizing voice recognition function
WO2018063922A1 (en) Conversational interactions using superbots
CN106558310A (en) Virtual reality sound control method and device
CN105280183A (en) Voice interaction method and system
CN109147779A (en) Voice data processing method and device
CN116628157A (en) Parameter collection and automatic dialog generation in dialog systems
CN106911812A (en) Session information processing method, server and computer-readable storage medium
WO2021050159A1 (en) Dynamic contextual dialog session extension
CN108962262A (en) Voice data processing method and device
CN111753061B (en) Multi-round dialogue processing method and device, electronic equipment and storage medium
CN108604177A (en) Sequence-dependent data message integration in a voice-activated computer network environment
CN109473104A (en) Speech recognition network delay optimization method and device
CN110223697A (en) Interactive method and system
CN109448694A (en) Method and device for rapid TTS voice synthesis
CN105491126A (en) Service providing method and service providing device based on artificial intelligence
CN108628908B (en) Method, device and electronic equipment for classifying user question-answer boundaries

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150603