CN104679472A - Man-machine voice interactive method and device - Google Patents
- Publication number: CN104679472A (application CN201510080163.XA)
- Authority: CN (China)
- Prior art keywords: voice, result, speech recognition, terminal, recognition server
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
Abstract
The invention provides a human-machine voice interaction method and device. The method comprises the steps of: while a terminal is voice-broadcasting a broadcast result, receiving a speech recognition result sent by a speech recognition server; sending the speech recognition result to a QU server for context understanding, and receiving and storing the result of the context understanding; determining, from the stored context-understanding results, the intention of the voice input by the user, and generating a broadcast result according to that intention; and sending the broadcast result to the speech recognition server, so that the speech recognition server can forward it to the terminal for voice broadcast. According to the invention, voice broadcast and the user's voice input proceed at the same time during human-machine voice interaction, so the interaction does not have to switch repeatedly between a recording state and a broadcast state, and multi-round dialogues become more coherent.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a human-machine voice interaction method and device.
Background technology
Speech recognition and human-machine voice interaction have a long history. In existing voice-assistant applications (Application; hereinafter: APP), recording is triggered by a button; after recording, the machine broadcasts an answer, and while the answer is being broadcast, recording is not possible. In other words, existing voice-assistant APPs support only half-duplex interaction: while the machine is broadcasting, the user must stay silent, and while the user is speaking, the machine cannot broadcast.
The machine therefore has to switch constantly between the recording state and the broadcast state, usually with manual intervention by the user, which is very inconvenient. Some voice-assistant APPs provide an auto-answer mode in which the machine automatically enters the recording state after broadcasting, but under this mode the machine sometimes switches automatically and sometimes does not, which instead leaves the user confused.
In summary, existing human-machine voice interaction is very inconvenient to use: every question-and-answer round requires user intervention, the operation is cumbersome, the interaction feels unnatural, and the user experience is poor.
Summary of the invention
The object of the present invention is to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a human-machine voice interaction method. With this method, voice broadcast and the user's voice input can proceed simultaneously during human-machine voice interaction, so there is no need to switch repeatedly between the recording state and the broadcast state; full-duplex human-machine communication is achieved, and multi-round dialogues become more coherent.
A second object of the present invention is to propose a human-machine voice interaction device.
To achieve these objects, a human-machine voice interaction method according to an embodiment of the first aspect of the present invention comprises: while a terminal is voice-broadcasting a broadcast result sent by a speech recognition server, receiving a speech recognition result sent by the speech recognition server, the speech recognition result being sent after the speech recognition server recognizes voice input by a user of the terminal; sending the speech recognition result to a query understanding (QU) server for context understanding, and receiving and saving the context-understanding result sent by the QU server; determining, according to the saved context-understanding results, the intention of the voice input by the user, and generating a broadcast result according to the intention; and sending the broadcast result to the speech recognition server, so that the speech recognition server forwards it to the terminal for voice broadcast.
In the human-machine voice interaction method of this embodiment of the present invention, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, the speech recognition result sent by the speech recognition server can still be received; the intention of the voice input by the user is determined from that recognition result, a broadcast result is generated according to the intention, and the broadcast result is sent to the speech recognition server, which forwards it to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during human-machine voice interaction; there is no need to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a human-machine voice interaction method according to an embodiment of the second aspect of the present invention comprises: while a terminal is voice-broadcasting a broadcast result sent by a speech recognition server, receiving voice sent by the terminal, the voice having been input to the terminal by a user of the terminal; recognizing the voice and sending the speech recognition result to a multi-round dialog server, so that the multi-round dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and saves the context-understanding result sent by the QU server, determines the intention of the voice input by the user according to the saved context-understanding results, and generates a broadcast result according to the intention; and receiving the broadcast result sent by the multi-round dialog server and sending it to the terminal for voice broadcast.
In the human-machine voice interaction method of this embodiment of the present invention, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, voice sent by the terminal is still received and recognized, and the recognition result is sent to the multi-round dialog server, so that the multi-round dialog server determines the intention of the voice input by the user and generates a broadcast result according to that intention; the speech recognition server then receives the broadcast result from the multi-round dialog server and sends it to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during human-machine voice interaction; there is no need to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a human-machine voice interaction method according to an embodiment of the third aspect of the present invention comprises: while a terminal is voice-broadcasting a broadcast result sent by a speech recognition server, receiving voice input by a user of the terminal; sending the voice input by the user to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to a multi-round dialog server, the multi-round dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and saves the context-understanding result sent by the QU server, determines the intention of the voice input by the user according to the saved context-understanding results, and generates a broadcast result according to the intention; and receiving and broadcasting the broadcast result sent by the speech recognition server, the broadcast result having been sent to the speech recognition server by the multi-round dialog server.
In the human-machine voice interaction method of this embodiment of the present invention, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, the terminal still receives voice input by the user and sends it to the speech recognition server; the speech recognition server recognizes the voice and sends the recognition result to the multi-round dialog server, which determines the intention of the voice input by the user from that recognition result and generates a broadcast result accordingly; the terminal then receives and broadcasts the broadcast result sent by the speech recognition server. Voice broadcast and the user's voice input thus proceed simultaneously during human-machine voice interaction; there is no need to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a human-machine voice interaction device according to an embodiment of the fourth aspect of the present invention comprises: a receiving module, configured to receive, while a terminal is voice-broadcasting a broadcast result sent by a speech recognition server, the speech recognition result sent by the speech recognition server, the speech recognition result being sent after the speech recognition server recognizes voice input by a user of the terminal, and further configured to receive, after a sending module sends the speech recognition result to a query understanding (QU) server for context understanding, the context-understanding result sent by the QU server; the sending module, configured to send the speech recognition result received by the receiving module to the QU server for context understanding; a saving module, configured to save the context-understanding result received by the receiving module; a determining module, configured to determine the intention of the voice input by the user according to the context-understanding results saved by the saving module; and a generating module, configured to generate a broadcast result according to the intention determined by the determining module; the sending module being further configured to send the broadcast result generated by the generating module to the speech recognition server, so that the speech recognition server forwards it to the terminal for voice broadcast.
In the human-machine voice interaction device of this embodiment of the present invention, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, the receiving module can receive the speech recognition result sent by the speech recognition server; the determining module determines the intention of the voice input by the user from that recognition result, the generating module generates a broadcast result according to the intention determined by the determining module, and the sending module sends the broadcast result to the speech recognition server, which forwards it to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during human-machine voice interaction; there is no need to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a human-machine voice interaction device according to an embodiment of the fifth aspect of the present invention comprises: a receiving module, configured to receive, while a terminal is voice-broadcasting a broadcast result sent by a speech recognition server, voice sent by the terminal, the voice having been input to the terminal by a user of the terminal, and further configured to receive, after a sending module sends the speech recognition result to a multi-round dialog server, the broadcast result sent by the multi-round dialog server; a recognizing module, configured to recognize the voice received by the receiving module; and the sending module, configured to send the speech recognition result produced by the recognizing module to the multi-round dialog server, so that the multi-round dialog server sends the recognition result to a query understanding (QU) server for context understanding, receives and saves the context-understanding result sent by the QU server, determines the intention of the voice input by the user according to the saved context-understanding results, and generates a broadcast result according to the intention, the sending module being further configured to send, after the receiving module receives the broadcast result sent by the multi-round dialog server, the broadcast result to the terminal for voice broadcast.
In the human-machine voice interaction device of this embodiment of the present invention, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, the receiving module still receives voice sent by the terminal, the recognizing module recognizes that voice, and the sending module sends the recognition result to the multi-round dialog server, so that the multi-round dialog server determines the intention of the voice input by the user and generates a broadcast result according to that intention; the receiving module then receives the broadcast result sent by the multi-round dialog server, and the sending module sends it to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during human-machine voice interaction; there is no need to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogues become more coherent.
To achieve these objects, a human-machine voice interaction device according to an embodiment of the sixth aspect of the present invention comprises: a receiving module, configured to receive, while a terminal is voice-broadcasting a broadcast result sent by a speech recognition server, voice input by a user of the terminal, and further configured to receive, after a sending module sends the voice to the speech recognition server, the broadcast result sent by the speech recognition server, the broadcast result having been sent to the speech recognition server by a multi-round dialog server; the sending module, configured to send the voice received by the receiving module to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to the multi-round dialog server, the multi-round dialog server sends the recognition result to a query understanding (QU) server for context understanding, receives and saves the context-understanding result sent by the QU server, determines the intention of the voice input by the user according to the saved context-understanding results, and generates a broadcast result according to the intention; and a broadcasting module, configured to broadcast the broadcast result received by the receiving module.
In the human-machine voice interaction device of this embodiment of the present invention, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, the receiving module still receives voice input by the user, and the sending module sends that voice to the speech recognition server, so that the speech recognition server recognizes it and sends the recognition result to the multi-round dialog server, which determines the intention of the voice input by the user and generates a broadcast result according to that intention; the receiving module then receives the broadcast result sent by the speech recognition server, and the broadcasting module broadcasts it. Voice broadcast and the user's voice input thus proceed simultaneously during human-machine voice interaction; there is no need to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogues become more coherent.
Additional aspects and advantages of the present invention will be set forth in part in the following description, will in part become apparent from that description, or will be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of an embodiment of the human-machine voice interaction method of the present invention;
Fig. 2 is a flow chart of another embodiment of the human-machine voice interaction method of the present invention;
Fig. 3 is a flow chart of another embodiment of the human-machine voice interaction method of the present invention;
Fig. 4 is a schematic diagram of an embodiment of the connection relationships in the human-machine voice interaction method of the present invention;
Fig. 5 is a schematic structural diagram of an embodiment of the human-machine voice interaction device of the present invention;
Fig. 6 is a schematic structural diagram of another embodiment of the human-machine voice interaction device of the present invention;
Fig. 7 is a schematic structural diagram of another embodiment of the human-machine voice interaction device of the present invention.
Detailed description of embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended only to explain the present invention and are not to be construed as limiting it. On the contrary, the embodiments of the present invention cover all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
Fig. 1 is a flow chart of an embodiment of the human-machine voice interaction method of the present invention. As shown in Fig. 1, the method may comprise:
Step 101: while a terminal is voice-broadcasting a broadcast result sent by a speech recognition server, receive the speech recognition result sent by the speech recognition server, the speech recognition result being sent after the speech recognition server recognizes voice input by a user of the terminal.
In this embodiment, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, the user of the terminal can still continue to input voice. That is, while broadcasting the broadcast result, the terminal keeps receiving the voice input by the user and continuously sends it to the speech recognition server for speech recognition; the speech recognition server continuously sends the recognition results to the multi-round dialog server, and the multi-round dialog server continuously receives them. Voice broadcast and the user's voice input thus proceed simultaneously during human-machine voice interaction, so there is no need to switch repeatedly between the recording state and the broadcast state.
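The concurrency this paragraph describes (capture continuing while a broadcast plays) can be sketched as two threads that never block each other. This is an illustrative model only, not the patent's implementation; the class and method names (`FullDuplexTerminal`, `run_turn`, and so on) are hypothetical.

```python
import queue
import threading

class FullDuplexTerminal:
    """Toy model of a terminal that records while it broadcasts.

    Unlike a half-duplex assistant, neither loop ever blocks the other:
    captured audio is queued for the speech recognition server even while
    a broadcast result is being played.
    """

    def __init__(self):
        self.upload_queue = queue.Queue()  # audio chunks bound for the ASR server
        self.events = []                   # interleaved log, for demonstration
        self._lock = threading.Lock()

    def _log(self, event):
        with self._lock:
            self.events.append(event)

    def play_broadcast(self, text, chunks=3):
        # Simulated TTS playback of a broadcast result, split into chunks.
        for i in range(chunks):
            self._log(("play", text, i))

    def capture_audio(self, chunks=3):
        # Simulated microphone capture; runs concurrently with playback.
        for i in range(chunks):
            self._log(("capture", i))
            self.upload_queue.put(f"audio-chunk-{i}")

    def run_turn(self, broadcast_text):
        t_play = threading.Thread(target=self.play_broadcast, args=(broadcast_text,))
        t_rec = threading.Thread(target=self.capture_audio)
        t_play.start()
        t_rec.start()
        t_play.join()
        t_rec.join()

terminal = FullDuplexTerminal()
terminal.run_turn("The weather today is sunny.")
print(terminal.upload_queue.qsize())  # 3: all chunks were captured during playback
```

The point of the sketch is that the upload queue fills up while playback is in progress, which is exactly the behaviour a half-duplex assistant cannot exhibit.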
Specifically, receiving the speech recognition result sent by the speech recognition server may comprise: receiving the recognition result that the speech recognition server sends after determining that the recognition result it has obtained reaches a predetermined confidence level. The predetermined confidence level may be set as needed in a specific implementation; this embodiment places no limit on its value.
In this embodiment, while the user is inputting voice to the terminal, the speech recognition server keeps recognizing the voice the terminal sends. When the speech recognition server determines that the recognition result it has obtained reaches the predetermined confidence level, it sends that recognition result to the multi-round dialog server, so that the multi-round dialog server performs the subsequent steps 102 to 104: it determines the intention of the voice input by the user, generates a valid broadcast result, and sends it to the terminal for voice broadcast. In other words, once the terminal receives a broadcast result, it can interrupt the user's voice input and broadcast the obtained result to the user directly.
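The confidence gate described above might be sketched as follows. The threshold value and all names are illustrative assumptions; the patent deliberately leaves the confidence level to the implementation.

```python
CONFIDENCE_THRESHOLD = 0.85  # the "predetermined confidence level"; value is illustrative

def forward_if_confident(partial_results, threshold=CONFIDENCE_THRESHOLD):
    """Return the recognition results the ASR server would forward.

    Each partial result is a (text, confidence) pair; only results whose
    confidence reaches the threshold are sent on to the multi-round dialog
    server, so the dialog side never acts on unreliable hypotheses.
    """
    forwarded = []
    for text, confidence in partial_results:
        if confidence >= threshold:
            forwarded.append(text)
    return forwarded

hypotheses = [
    ("what is the", 0.40),                 # too early, low confidence: withheld
    ("what is the weather", 0.70),         # still below threshold: withheld
    ("what is the weather today", 0.92),   # confident enough: forwarded
]
print(forward_if_confident(hypotheses))  # ['what is the weather today']
```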
Step 102: send the speech recognition result to a Query Understanding (hereinafter: QU) server for context understanding, and receive and save the context-understanding result sent by the QU server.
Step 103: determine the intention of the voice input by the user according to the saved context-understanding results, and generate a broadcast result according to the intention.
In this embodiment, the multi-round dialog server can identify the intention of the voice input by the user from the saved context-understanding results and then generate a broadcast result directly from that intention. Alternatively, generating the broadcast result according to the intention may comprise: obtaining, from a resource access server, information corresponding to the intention, and generating the broadcast result from the obtained information.
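The two generation paths just described can be sketched as follows. The dictionary-backed `RESOURCE_SERVER`, the slot-merging rule in `determine_intent`, and all other names are hypothetical simplifications of the resource access server and the QU results; the patent does not prescribe a representation.

```python
# Hypothetical stand-in for the resource access server described in the text.
RESOURCE_SERVER = {
    ("weather", "Beijing"): "sunny, 25 degrees",
    ("music", "jazz"): "playing a jazz playlist",
}

def determine_intent(context_results):
    """Derive the user's intent from the saved context-understanding results.

    The QU server's output is modelled as a list of dicts; later turns
    override earlier ones, which is one simple way multi-round context
    (e.g. a follow-up that only names a city) can resolve against an
    earlier question.
    """
    intent = {}
    for result in context_results:
        intent.update(result)
    return intent

def generate_broadcast(intent):
    # Try the resource access server first; fall back to a direct answer.
    key = (intent.get("domain"), intent.get("slot"))
    info = RESOURCE_SERVER.get(key)
    if info is not None:
        return f"Here is the {intent['domain']} you asked about: {info}."
    return "Sorry, I could not find that."

context = [
    {"domain": "weather"},  # turn 1: the user asked about the weather
    {"slot": "Beijing"},    # turn 2: the user added the city
]
print(generate_broadcast(determine_intent(context)))
# Here is the weather you asked about: sunny, 25 degrees.
```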
Step 104: send the broadcast result to the speech recognition server, so that the speech recognition server forwards it to the terminal for voice broadcast.
In this embodiment, content suitable for recommendation to the user can also be obtained according to the user's profile and current state; a cloud push service is then triggered, the content is delivered to the terminal through the cloud push service, and a dialog with the terminal is initiated.
That is, in this embodiment the multi-round dialog server has a learning capability: from the user's profile (for example, the user's schedule and/or the songs the user listens to) and the user's current state (for example, the current location and/or the current conversation content), it can analyse the user's ideas and wishes and obtain content suitable for recommendation. The multi-round dialog server can then trigger the cloud push service, deliver the recommended content to the terminal through that service, and initiate a dialog with the terminal. The subsequent dialog proceeds in the same way as described in steps 101 to 104 and is not repeated here.
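The profile-plus-state recommendation and the cloud push could be sketched as below. The matching rules are deliberately trivial, since the patent leaves the analysis method open; every name here (`recommend`, `cloud_push`, the dict keys) is a hypothetical illustration.

```python
def recommend(profile, state):
    """Pick content worth proactively pushing to the user.

    `profile` holds long-term information (schedule, listening history);
    `state` holds the current situation (time, activity, conversation).
    """
    suggestions = []
    if "meeting" in profile.get("schedule", []) and state.get("time") == "morning":
        suggestions.append("Reminder: you have a meeting this morning.")
    for song in profile.get("recent_songs", []):
        if state.get("activity") == "driving":
            suggestions.append(f"Would you like to hear {song} again?")
    return suggestions

def cloud_push(terminal_inbox, content):
    # Stand-in for the cloud push service: deliver content to the terminal,
    # which then opens a new dialog about it.
    terminal_inbox.extend(content)

inbox = []
profile = {"schedule": ["meeting"], "recent_songs": ["Take Five"]}
state = {"time": "morning", "activity": "driving"}
cloud_push(inbox, recommend(profile, state))
print(len(inbox))  # 2: a schedule reminder and a music suggestion were pushed
```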
In the above embodiment, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, the speech recognition result sent by the speech recognition server can still be received; the intention of the voice input by the user is determined from that recognition result, a broadcast result is generated according to the intention, and the broadcast result is sent to the speech recognition server, which forwards it to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during human-machine voice interaction; there is no need to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogues become more coherent.
Fig. 2 is a flow chart of another embodiment of the human-machine voice interaction method of the present invention. As shown in Fig. 2, the method may comprise:
Step 201: while a terminal is voice-broadcasting a broadcast result sent by a speech recognition server, receive voice sent by the terminal, the voice having been input to the terminal by a user of the terminal.
In this embodiment, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, the speech recognition server can still receive voice sent by the terminal. That is, during human-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously, so there is no need to switch repeatedly between the recording state and the broadcast state.
Step 202: recognize the voice and send the speech recognition result to a multi-round dialog server, so that the multi-round dialog server sends the recognition result to the QU server for context understanding, receives and saves the context-understanding result sent by the QU server, determines the intention of the voice input by the user according to the saved context-understanding results, and generates a broadcast result according to the intention.
Specifically, recognizing the voice comprises: determining the start and end of each sentence in the voice by silence detection.
In this embodiment, using silence detection, the speech recognition server can segment the voice into sentences, that is, it can determine where each sentence in the voice starts and ends.
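One minimal form of this silence detection is an energy threshold with a run of consecutive silent frames marking the end of a sentence. Real voice activity detection uses more robust features, but the start/end bookkeeping is similar; the thresholds below are illustrative assumptions, not values from the patent.

```python
SILENCE_THRESHOLD = 0.1   # illustrative per-frame energy threshold
MIN_SILENCE_FRAMES = 2    # this many quiet frames in a row end an utterance

def segment_utterances(frame_energies,
                       threshold=SILENCE_THRESHOLD,
                       min_silence=MIN_SILENCE_FRAMES):
    """Split a stream of per-frame energies into (start, end) utterances.

    A frame counts as speech if its energy exceeds the threshold; an
    utterance ends once `min_silence` consecutive silent frames are seen.
    `end` is exclusive, i.e. the index just past the last speech frame.
    """
    utterances, start, silent_run = [], None, 0
    for i, energy in enumerate(frame_energies):
        if energy > threshold:
            if start is None:
                start = i          # speech onset
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                utterances.append((start, i - min_silence + 1))
                start, silent_run = None, 0
    if start is not None:          # stream ended mid-utterance
        utterances.append((start, len(frame_energies)))
    return utterances

energies = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.8, 0.9, 0.0, 0.0]
print(segment_utterances(energies))  # [(1, 3), (5, 8)]
```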
Specifically, sending the speech recognition result to the multi-round dialog server may comprise: after determining that the obtained recognition result reaches a predetermined confidence level, sending the recognition result that reaches that confidence level to the multi-round dialog server. The predetermined confidence level may be set as needed in a specific implementation; this embodiment places no limit on its value.
In this embodiment, while the user is inputting voice to the terminal, the speech recognition server keeps recognizing the voice the terminal sends. When the speech recognition server determines that the recognition result it has obtained reaches the predetermined confidence level, it sends that recognition result to the multi-round dialog server, so that the multi-round dialog server, in the manner described in steps 102 to 104 of the embodiment of Fig. 1, determines the intention of the voice input by the user, generates a valid broadcast result, and sends it to the terminal for voice broadcast. In other words, once the terminal receives a broadcast result, it can interrupt the user's voice input and broadcast the obtained result to the user directly.
Step 203: receive the broadcast result sent by the multi-round dialog server, and send it to the terminal for voice broadcast.
In the above embodiment, while the terminal is voice-broadcasting a broadcast result sent by the speech recognition server, the speech recognition server receives the voice sent by the terminal, recognizes it, and sends the recognition result to the multi-round dialog server, so that the multi-round dialog server determines the intention of the voice input by the user and generates a broadcast result according to that intention; the speech recognition server then receives the broadcast result sent by the multi-round dialog server and sends it to the terminal for voice broadcast. Voice broadcast and the user's voice input thus proceed simultaneously during human-machine voice interaction; there is no need to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogues become more coherent.
Fig. 3 is a flow chart of another embodiment of the human-machine voice interaction method of the present invention. As shown in Fig. 3, the human-machine voice interaction method may comprise:
Step 301: while the terminal is broadcasting the broadcast result sent by the speech recognition server, receive the voice input by the user of the terminal.
Specifically, receiving the voice input by the user while the terminal is broadcasting the broadcast result sent by the speech recognition server may comprise: while the terminal is broadcasting the broadcast result, eliminating, by an echo cancellation technique, the played Text-to-Speech (hereinafter referred to as TTS) audio from the microphone input, so that only the voice input by the user is received.
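The echo-cancellation step can be illustrated with a minimal normalized-LMS adaptive filter: the known TTS playback signal serves as the reference, and its echo is adaptively subtracted from the microphone signal so that only the user's voice remains. This is a didactic sketch with a single-tap filter and synthetic signals; real acoustic echo cancellers use long multi-tap filters, double-talk detection, and nonlinear processing.

```python
def nlms_echo_cancel(mic, tts_ref, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of tts_ref from mic.
    Single-tap NLMS: models the echo path as one gain coefficient."""
    w = 0.0  # estimated echo-path gain
    out = []
    for d, x in zip(mic, tts_ref):
        e = d - w * x                       # error = mic minus estimated echo
        w += mu * e * x / (x * x + eps)     # NLMS weight update
        out.append(e)
    return out

# Synthetic example: the echo is 0.6x the TTS reference; the user is silent,
# so after adaptation the residual should approach zero (echo removed).
tts = [1.0, -1.0, 1.0, -1.0] * 50
mic = [0.6 * x for x in tts]  # pure echo, no user speech
residual = nlms_echo_cancel(mic, tts)
print(abs(residual[-1]) < 1e-3)  # True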
In this embodiment, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the user can still input voice to the terminal. That is, the user may barge in and interrupt the terminal's voice broadcast, or may directly respond to the content being broadcast and thereby influence what the terminal broadcasts next. Thus, during human-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously, and the interaction no longer needs to switch repeatedly between the recording state and the broadcast state.
Step 302: send the voice input by the user to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to the multi-round dialogue server; the multi-round dialogue server sends the speech recognition result to the QU server for context understanding, receives and saves the result of the context understanding sent by the QU server, determines the intention of the voice input by the user according to the saved result of the context understanding, and generates a broadcast result according to that intention.
Specifically, sending the voice input by the user to the speech recognition server may comprise: sending the voice input by the user to the speech recognition server in segments of a predetermined length. The predetermined length may be set as needed in a specific implementation; this embodiment does not limit its size.
Alternatively, sending the voice input by the user to the speech recognition server may comprise: determining the start and end of each utterance in the voice input by the user through a silence detection technique, and sending only the recording segments that contain speech to the speech recognition server.
Because the voice input by the user is sometimes long and often describes details, a predetermined length may be set: whenever the voice input by the user reaches this length, the voice of that length is sent to the speech recognition server. Alternatively, because the user sometimes pauses while speaking, the start and end of each utterance may be determined by silence detection, and only the recording segments that contain speech are sent to the speech recognition server. The speech recognition server then recognizes the voice and sends the speech recognition result to the multi-round dialogue server, which sends it to the QU server for context understanding, receives and saves the result of the context understanding, determines the intention of the voice input by the user according to the saved result, and generates a broadcast result according to that intention. The multi-round dialogue server then sends the broadcast result to the speech recognition server, which forwards it to the terminal; at that point the terminal may interrupt the user's voice input and broadcast the broadcast result.
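The silence-detection segmentation described above can be sketched as an energy-threshold scan over audio frames: frames whose energy exceeds a threshold are treated as speech, and each contiguous run of speech frames gives the start and end of one utterance. The per-frame energy representation and threshold value are illustrative assumptions; real voice-activity detectors also smooth over short dropouts.

```python
def find_utterances(frame_energies, threshold=0.1):
    """Return (start, end) frame-index pairs for runs of frames whose
    energy exceeds the threshold (end is exclusive)."""
    utterances = []
    start = None
    for i, e in enumerate(frame_energies):
        if e > threshold and start is None:
            start = i                      # speech begins
        elif e <= threshold and start is not None:
            utterances.append((start, i))  # speech ends at silence
            start = None
    if start is not None:                  # speech ran to the end
        utterances.append((start, len(frame_energies)))
    return utterances

# Two utterances separated by a pause; only these segments would be
# sent to the speech recognition server.
energies = [0.0, 0.5, 0.6, 0.0, 0.0, 0.4, 0.3, 0.0]
print(find_utterances(energies))  # [(1, 3), (5, 7)]
```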
Step 303: receive and broadcast the broadcast result sent by the speech recognition server. The broadcast result sent by the speech recognition server is the one that the multi-round dialogue server sent to the speech recognition server.
In the above embodiment, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the terminal receives the voice input by its user and sends it to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to the multi-round dialogue server; the multi-round dialogue server determines the intention of the voice input by the user according to the speech recognition result and generates a broadcast result according to that intention. The terminal then receives and broadcasts the broadcast result sent by the speech recognition server. In this way, during human-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously; the interaction no longer needs to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogue becomes more coherent.
In the human-machine voice interaction methods provided by the embodiments shown in Figs. 1, 2 and 3, the connection relationship among the terminal, the speech recognition server, the multi-round dialogue server, the QU server and the resource access server may be as shown in Fig. 4. Fig. 4 is a schematic diagram of an embodiment of this connection relationship in the human-machine voice interaction method of the present invention.
Referring to Fig. 4, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the terminal receives the voice input by its user. In the present invention, the user can still input voice during the broadcast; that is, the user may barge in and interrupt the terminal's voice broadcast, or may directly respond to the content being broadcast. The following two dialogue scenarios can thus be realized.
Dialogue scenario one: the user interrupts the terminal's voice broadcast
User: I'd like to order food.
Terminal: What would you like?
User: Kung Pao chicken and Beijing roast duck.
Terminal: OK, preparing to place your order: one Kung Pao chicken—
User (interrupting): Drop the Kung Pao chicken; change it to diced chicken with green pepper.
Terminal: OK, preparing to place your order: one diced chicken with green pepper and one Beijing roast duck.
Dialogue scenario two: the user gives feedback on the terminal's voice broadcast
Human: How is the weather for the next few days?
Machine: Fairly good. Today's weather is...
Human: Mm.
Machine (without stopping): Tomorrow's weather is...
Human: Mm, go on.
Machine (without stopping): The day after tomorrow's weather is...
Human: OK.
Machine: End of report.
The terminal then sends the voice input by the user to the speech recognition server; the speech recognition server recognizes the voice and sends the speech recognition result to the multi-round dialogue server; the multi-round dialogue server sends the speech recognition result to the QU server for context understanding, receives and saves the result of the context understanding sent by the QU server, determines the intention of the voice input by the user according to the saved result, and generates a broadcast result according to that intention.
As above, because the voice input by the user is sometimes long and often describes details, a predetermined length may be set, and the voice is sent to the speech recognition server whenever it reaches that length. Alternatively, because the user sometimes pauses while speaking, the start and end of each utterance may be determined by silence detection, and only the recording segments that contain speech are sent to the speech recognition server, which recognizes the voice and sends the speech recognition result to the multi-round dialogue server. Or, because the speech recognition server is continuously recognizing the voice forwarded by the terminal while the user is still speaking, the speech recognition server may forward a speech recognition result to the multi-round dialogue server as soon as it determines that the result has reached the predetermined confidence level.
The multi-round dialogue server then sends the speech recognition result to the QU server for context understanding, receives and saves the result of the context understanding, determines the intention of the voice input by the user according to the saved result, and generates a broadcast result according to that intention. The multi-round dialogue server sends the broadcast result to the speech recognition server, which forwards it to the terminal; at that point the terminal may interrupt the user's voice input and broadcast the broadcast result. The following dialogue scenario can thus be realized.
Dialogue scenario three: the terminal interrupts the user's voice input
User: Where would be a good place to go? It's been so boring lately, and I'm thinking—
Terminal (interrupting): I understand what you need. Deng Ziqi is giving a concert at the Workers' Stadium tonight, and tickets are currently discounted. Would you like to consider it?
User: Great, place the order.
Terminal: I have bought you a ticket for Deng Ziqi's concert at 9 tonight; the ticket price is xxx yuan.
In addition, the multi-round dialogue server has a learning capability. According to the user's profile (for example, the user's schedule and/or the songs the user listens to) and the user's current state (for example, current location and/or current dialogue content), it can analyze the user's ideas and wishes and obtain content suitable for recommendation to the user. The multi-round dialogue server can then trigger a cloud push service, through which the recommended content is sent to the terminal, and a dialogue with the terminal is initiated. The following dialogue scenario can thus be realized.
Dialogue scenario four: recommending taxi information to the user according to the user's schedule
Terminal: You have a booking at 4 this afternoon, and it is now 2 pm. Shall I book a taxi for you?
User: No need, I'll drive myself.
Terminal: Your car is subject to the driving restriction today.
User: OK, then book a chauffeured car for me.
Terminal: OK, just a moment (...). Driver Wang has accepted your order; the license plate number is xxxx, with an estimated arrival in 3 minutes.
User: Thanks.
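The schedule-based proactive push illustrated in scenario four can be sketched as a simple rule: if an upcoming schedule entry starts close enough that travel is needed, the dialogue server prepares a taxi recommendation to push to the terminal. The two-hour lead time, the schedule representation, and the message wording are all assumptions for illustration, not part of the claimed method.

```python
from datetime import datetime, timedelta

def taxi_recommendation(schedule, now, lead=timedelta(hours=2)):
    """Return a push message if a schedule entry starts within `lead`
    of `now`, otherwise None (nothing to recommend yet)."""
    for event_time, description in schedule:
        if now <= event_time <= now + lead:
            return (f"You have {description} at {event_time:%H:%M} and it is "
                    f"now {now:%H:%M}. Shall I book a taxi for you?")
    return None

schedule = [(datetime(2015, 2, 13, 16, 0), "a booking")]
msg = taxi_recommendation(schedule, datetime(2015, 2, 13, 14, 0))
print(msg)  # You have a booking at 16:00 and it is now 14:00. Shall I book a taxi for you?
```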
In the present invention, while the terminal is broadcasting a broadcast result, the user can still input voice to the terminal. The terminal sends the voice to the speech recognition server for recognition; the speech recognition server sends the speech recognition result to the multi-round dialogue server; the multi-round dialogue server sends it to the QU server for context understanding, receives and saves the result of the context understanding sent by the QU server, determines the intention of the voice input by the user according to the saved result, generates a broadcast result according to that intention, and returns it to the terminal for voice broadcast. The following five states can be realized:
1. The terminal continues the voice broadcast; in this state the voice input by the user may be, for example, "oh" or "interesting".
2. The terminal stops the current broadcast and ends the current topic; in this state the voice input by the user may be "got it" or "that's enough".
3. The multi-round dialogue server connects to the resource access server and opens a new topic; in this state the voice input by the user may be "tell me the Beijing weather for a moment".
4. The multi-round dialogue server connects to the resource access server and goes deeper into the current topic; in this state the voice input by the user may be "Beijing weather" followed by "and Shanghai?".
5. The dialogue returns to a previous topic; in this state the voice input by the user may be "finish the joke from before". The multi-round dialogue server may also ask proactively; in that case the broadcast result received by the terminal may be "The weather report is finished; shall I continue the crosstalk from before?".
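The five states above amount to a classification of the utterance heard during broadcast. A keyword-based dispatcher illustrates the decision; the phrase lists are illustrative assumptions, and the actual system described here would rely on the QU server's context understanding rather than keyword matching.

```python
CONTINUE, STOP, NEW_TOPIC, DEEPEN, RETURN = range(1, 6)

def classify_barge_in(utterance, current_topic_slots):
    """Map a user utterance heard during broadcast to one of the
    five dialogue states (keyword sketch, not real understanding)."""
    text = utterance.lower()
    if text in ("oh", "interesting", "mm"):
        return CONTINUE        # back-channel: keep broadcasting
    if text in ("got it", "that's enough"):
        return STOP            # end the current topic
    if "from before" in text or "earlier" in text:
        return RETURN          # go back to a previous topic
    if any(slot in text for slot in current_topic_slots):
        return DEEPEN          # refine the current topic
    return NEW_TOPIC           # otherwise open a new topic

# While the terminal reads a weather report:
slots = ("weather", "beijing", "shanghai")
print(classify_barge_in("mm", slots))             # 1 (keep broadcasting)
print(classify_barge_in("that's enough", slots))  # 2 (stop)
print(classify_barge_in("and shanghai", slots))   # 4 (go deeper)
```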
In summary, the present invention can sustain a dialogue and ensure the chat experience without any manual intervention by the user (such as pressing a button).
Fig. 5 is a schematic structural diagram of an embodiment of the human-machine voice interaction device of the present invention. The human-machine voice interaction device in this embodiment may serve as the multi-round dialogue server, or as a part of the multi-round dialogue server, to implement the flow of the embodiment shown in Fig. 1. As shown in Fig. 5, the device may comprise: a receiving module 51, a sending module 52, a saving module 53, a determining module 54 and a generating module 55.
The receiving module 51 is configured to receive, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the speech recognition result sent by the speech recognition server, where the speech recognition result is sent by the speech recognition server after it recognizes the voice input by the user of the terminal; and, after the sending module 52 sends the speech recognition result to the QU server for context understanding, to receive the result of the context understanding sent by the QU server.
In this embodiment, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the user of the terminal can continue to input voice. That is, while broadcasting the broadcast result, the terminal keeps receiving the voice input by the user and keeps sending it to the speech recognition server for speech recognition; the speech recognition server keeps sending speech recognition results to the multi-round dialogue server; and the receiving module 51 thus keeps receiving speech recognition results sent by the speech recognition server. In this way, during human-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously, and the interaction no longer needs to switch repeatedly between the recording state and the broadcast state.
The sending module 52 is configured to send the speech recognition result received by the receiving module 51 to the QU server for context understanding.
The saving module 53 is configured to save the result of the context understanding received by the receiving module 51.
The determining module 54 is configured to determine the intention of the voice input by the user according to the result of the context understanding saved by the saving module 53.
The generating module 55 is configured to generate a broadcast result according to the intention determined by the determining module 54.
The sending module 52 is further configured to send the broadcast result generated by the generating module 55 to the speech recognition server, so that the speech recognition server sends the broadcast result to the terminal for voice broadcast.
In this embodiment, the generating module 55 is specifically configured to obtain, from the resource access server, information corresponding to the intention determined by the determining module 54, and to generate the broadcast result according to the obtained information.
In this embodiment, the receiving module 51 is specifically configured to receive the speech recognition result that the speech recognition server sends after determining that the obtained speech recognition result has reached a predetermined confidence level. The predetermined confidence level may be set as needed in a specific implementation; this embodiment does not limit its size.
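The generating module's role — querying the resource access server for information matching the intention and then wording a broadcast result — can be sketched as a lookup plus a template. The intention encoding, the in-memory resource table standing in for the resource access server, and the sentence template are illustrative assumptions.

```python
# Stand-in for the resource access server: intention -> information.
RESOURCES = {
    ("weather", "beijing"): "sunny, 5 degrees",
    ("weather", "shanghai"): "cloudy, 8 degrees",
}

def generate_broadcast_result(intention):
    """Fetch the information matching the intention and wrap it in a
    sentence suitable for TTS broadcast; None if nothing is found."""
    domain, slot = intention
    info = RESOURCES.get((domain, slot))
    if info is None:
        return None
    return f"The {slot.title()} {domain} is {info}."

print(generate_broadcast_result(("weather", "beijing")))
# The Beijing weather is sunny, 5 degrees.
```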
In this embodiment, while the user is inputting voice to the terminal, the speech recognition server continuously recognizes the voice forwarded by the terminal. When the speech recognition server determines that an obtained speech recognition result has reached the predetermined confidence level, it sends that result to the multi-round dialogue server, so that the determining module 54 determines the intention of the voice input by the user, the generating module 55 generates a valid broadcast result, and the sending module 52 sends the broadcast result to the terminal for voice broadcast. That is, once the terminal receives a broadcast result, it may interrupt the user's voice input and directly broadcast the received result to the user.
In this embodiment, the human-machine voice interaction device may further comprise an obtaining module 56, configured to obtain content suitable for recommendation to the user according to the user's profile and current state; the sending module 52 is further configured to trigger a cloud push service, send the recommended content to the terminal through the cloud push service, and initiate a dialogue with the terminal.
In the above human-machine voice interaction device, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the receiving module 51 receives the speech recognition result sent by the speech recognition server, the determining module 54 determines the intention of the voice input by the user according to that result, the generating module 55 generates a broadcast result according to the determined intention, and the sending module 52 sends the broadcast result to the speech recognition server, which forwards it to the terminal for voice broadcast. In this way, during human-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously; the interaction no longer needs to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogue becomes more coherent.
Fig. 6 is a schematic structural diagram of another embodiment of the human-machine voice interaction device of the present invention. The human-machine voice interaction device in this embodiment may serve as the speech recognition server, or as a part of the speech recognition server, to implement the flow of the embodiment shown in Fig. 2. As shown in Fig. 6, the device may comprise: a receiving module 61, a sending module 62 and a recognition module 63.
The receiving module 61 is configured to receive, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the voice sent by the terminal, where the voice is input to the terminal by the user of the terminal; and, after the sending module 62 sends the speech recognition result to the multi-round dialogue server, to receive the broadcast result sent by the multi-round dialogue server.
In this embodiment, the receiving module 61 can receive the voice sent by the terminal even while the terminal is broadcasting; that is, during human-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously, and the interaction no longer needs to switch repeatedly between the recording state and the broadcast state.
The recognition module 63 is configured to recognize the voice received by the receiving module 61. Specifically, the recognition module 63 determines the start and end of each utterance in the voice through a silence detection technique; using silence detection, the recognition module 63 can thus segment the voice into sentences.
The sending module 62 is configured to send the speech recognition result produced by the recognition module 63 to the multi-round dialogue server, so that the multi-round dialogue server sends the speech recognition result to the QU server for context understanding, receives and saves the result of the context understanding sent by the QU server, determines the intention of the voice input by the user according to the saved result, and generates a broadcast result according to that intention; and, after the receiving module 61 receives the broadcast result sent by the multi-round dialogue server, to send the broadcast result to the terminal for voice broadcast.
Specifically, the sending module 62 sends a speech recognition result to the multi-round dialogue server only after determining that the result has reached a predetermined confidence level. The predetermined confidence level may be set as needed in a specific implementation; this embodiment does not limit its size. In this embodiment, while the user is inputting voice to the terminal, the recognition module 63 continuously recognizes the voice forwarded by the terminal; when it determines that an obtained speech recognition result has reached the predetermined confidence level, the sending module 62 sends that result to the multi-round dialogue server, so that the multi-round dialogue server, in the manner described in steps 102-104 of the embodiment shown in Fig. 1, determines the intention of the voice input by the user, generates a valid broadcast result, and sends it to the terminal for voice broadcast. That is, once the terminal receives a broadcast result, it may interrupt the user's voice input and directly broadcast the received result to the user.
In the above human-machine voice interaction device, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the receiving module 61 receives the voice sent by the terminal, the recognition module 63 recognizes the voice, and the sending module 62 sends the speech recognition result to the multi-round dialogue server, so that the multi-round dialogue server determines the intention of the voice input by the user according to the speech recognition result and generates a broadcast result according to that intention. The receiving module 61 then receives the broadcast result sent by the multi-round dialogue server, and the sending module 62 sends it to the terminal for voice broadcast. In this way, during human-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously; the interaction no longer needs to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogue becomes more coherent.
Fig. 7 is a schematic structural diagram of another embodiment of the human-machine voice interaction device of the present invention. The human-machine voice interaction device in this embodiment may serve as the terminal, or as a part of the terminal, to implement the flow of the embodiment shown in Fig. 3. As shown in Fig. 7, the device may comprise: a receiving module 71, a sending module 72 and a broadcast module 73.
The receiving module 71 is configured to receive, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the voice input by the user of the terminal; and, after the sending module 72 sends the voice to the speech recognition server, to receive the broadcast result sent by the speech recognition server, where that broadcast result is the one the multi-round dialogue server sent to the speech recognition server. In this embodiment, the receiving module 71 is specifically configured to eliminate, by an echo cancellation technique, the played TTS audio from the microphone input while the terminal is broadcasting, so that only the voice input by the user is received.
In this embodiment, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the user can still input voice to the terminal; that is, the user may barge in and interrupt the terminal's voice broadcast, or may directly respond to the content being broadcast and thereby influence what the terminal broadcasts next. Thus, during human-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously, and the interaction no longer needs to switch repeatedly between the recording state and the broadcast state.
The sending module 72 is configured to send the voice received by the receiving module 71 to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to the multi-round dialogue server; the multi-round dialogue server sends the speech recognition result to the QU server for context understanding, receives and saves the result of the context understanding sent by the QU server, determines the intention of the voice input by the user according to the saved result, and generates a broadcast result according to that intention.
The broadcast module 73 is configured to broadcast the broadcast result received by the receiving module 71.
In one implementation of this embodiment, the sending module 72 is specifically configured to send the voice input by the user to the speech recognition server in segments of a predetermined length. The predetermined length may be set as needed in a specific implementation; this embodiment does not limit its size.
In another implementation of this embodiment, the sending module 72 is specifically configured to determine the start and end of each utterance in the voice input by the user through a silence detection technique, and to send only the recording segments that contain speech to the speech recognition server.
Because the voice input by the user is sometimes long and often describes details, a predetermined length may be set: when the voice input by the user reaches this length, the sending module 72 sends the voice of that length to the speech recognition server. Alternatively, because the user sometimes pauses while speaking, the start and end of each utterance may be determined by silence detection, and only the recording segments that contain speech are sent to the speech recognition server. The speech recognition server then recognizes the voice and sends the speech recognition result to the multi-round dialogue server, which sends it to the QU server for context understanding, receives and saves the result of the context understanding, determines the intention of the voice input by the user according to the saved result, and generates a broadcast result according to that intention. The multi-round dialogue server then sends the broadcast result to the speech recognition server, which forwards it to the terminal; at that point the terminal may interrupt the user's voice input and broadcast the broadcast result.
In the above human-machine voice interaction device, while the terminal is broadcasting the broadcast result sent by the speech recognition server, the receiving module 71 receives the voice input by the user of the terminal, and the sending module 72 sends that voice to the speech recognition server, so that the speech recognition server recognizes the voice and sends the speech recognition result to the multi-round dialogue server; the multi-round dialogue server determines the intention of the voice input by the user according to the speech recognition result and generates a broadcast result according to that intention. The receiving module 71 then receives the broadcast result sent by the speech recognition server, and the broadcast module 73 broadcasts it. In this way, during human-machine voice interaction, voice broadcast and the user's voice input proceed simultaneously; the interaction no longer needs to switch repeatedly between the recording state and the broadcast state, full-duplex human-machine communication is achieved, and multi-round dialogue becomes more coherent.
It should be noted that, in the description of the present invention, the terms "first", "second" and the like are used for descriptive purposes only and shall not be interpreted as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise specified, "multiple" means two or more.
Any process or method described in a flow chart or otherwise described herein may be understood as representing a module, fragment or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention pertain.
It should be understood that the parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one of the following technologies well known in the art, or a combination thereof, can be used: a discrete logic circuit with logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), etc.
Those skilled in the art can understand that all or part of the steps carried by the above method embodiments can be completed by instructing relevant hardware through a program; the program can be stored in a computer-readable storage medium, and when the program is executed, one or a combination of the steps of the method embodiments is performed.
In addition, the functional modules in the embodiments of the present invention can be integrated into one processing module, each module can exist physically on its own, or two or more modules can be integrated into one module. The above integrated module can be implemented in the form of hardware or in the form of a software functional module. When the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
The storage medium mentioned above can be a read-only memory (ROM), a magnetic disk, an optical disc, etc.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", etc. means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described can be combined in a suitable manner in any one or more embodiments or examples.
Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be interpreted as limiting the present invention; those of ordinary skill in the art can change, modify, replace, and vary the above embodiments within the scope of the present invention.
Claims (22)
1. A human-machine voice interaction method, characterized by comprising:
during a process in which a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, receiving a speech recognition result sent by the speech recognition server, the speech recognition result being sent by the speech recognition server after recognizing speech input by a user of the terminal;
sending the speech recognition result to a query understanding (QU) server for context understanding, and receiving and saving the result of the context understanding sent by the QU server;
determining, according to the saved result of the context understanding, the intent of the speech input by the user, and generating a broadcast result according to the intent;
sending the broadcast result to the speech recognition server, so that the speech recognition server sends the broadcast result to the terminal for voice broadcast.
2. The method according to claim 1, characterized in that generating the broadcast result according to the intent comprises:
obtaining, from a resource access server according to the intent, information corresponding to the intent, and generating the broadcast result according to the obtained information.
3. The method according to claim 1, characterized in that receiving the speech recognition result sent by the speech recognition server comprises:
receiving the speech recognition result that the speech recognition server sends after determining that the obtained speech recognition result reaches a predetermined confidence level.
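The confidence gate in claim 3 can be illustrated with a minimal sketch. The `Hypothesis` type and the threshold value `0.8` are assumptions for illustration; the patent only specifies that a result is forwarded once it reaches a predetermined confidence level.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # assumed value for the "predetermined confidence"

@dataclass
class Hypothesis:
    """An illustrative recognition hypothesis: text plus a confidence score."""
    text: str
    confidence: float

def forward_if_confident(hyp: Hypothesis, threshold: float = CONFIDENCE_THRESHOLD):
    """Return the text to forward to the dialog side, or None to keep waiting."""
    return hyp.text if hyp.confidence >= threshold else None
```

Under this sketch, `forward_if_confident(Hypothesis("play music", 0.93))` yields the text, while a low-confidence hypothesis yields `None` and the server waits for more audio.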
4. The method according to any one of claims 1-3, characterized by further comprising:
obtaining, according to a user profile and current state of the user, content suitable for recommendation to the user, triggering a cloud push service, sending the content suitable for recommendation to the user to the terminal through the cloud push service, and initiating a dialogue with the terminal.
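The proactive-recommendation step of claim 4 (profile + current state → recommended content → cloud push → dialog with the terminal) might look like the following toy sketch. All names (`select_recommendation`, `cloud_push`, the profile/state keys) are hypothetical; the cloud push service is stubbed as a function that returns the message that would open a dialog.

```python
def select_recommendation(profile: dict, state: dict) -> str:
    """Pick content to recommend from the user profile and current state (toy rule)."""
    if state.get("activity") == "driving" and "news" in profile.get("interests", []):
        return "morning news briefing"
    return "daily tip"

def cloud_push(terminal_id: str, content: str) -> dict:
    """Stub for the cloud push service: returns the push that initiates a dialog."""
    return {"terminal": terminal_id, "content": content, "opens_dialog": True}

msg = cloud_push(
    "terminal-42",
    select_recommendation({"interests": ["news"]}, {"activity": "driving"}),
)
```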
5. A human-machine voice interaction method, characterized by comprising:
during a process in which a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, receiving speech sent by the terminal, the speech being input to the terminal by a user of the terminal;
recognizing the speech, and sending the speech recognition result to a multi-turn dialog server, so that the multi-turn dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and saves the result of the context understanding sent by the QU server, determines the intent of the speech input by the user according to the saved result of the context understanding, and generates a broadcast result according to the intent;
receiving the broadcast result sent by the multi-turn dialog server, and sending the broadcast result to the terminal for voice broadcast.
6. The method according to claim 5, characterized in that recognizing the speech comprises:
determining the start and end of each utterance in the speech by means of a silence detection technique.
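A toy endpoint detector in the spirit of claim 6's silence detection: an utterance starts at the first frame whose energy exceeds a threshold and ends after a run of quiet frames. The threshold and silence-run length are illustrative assumptions; real voice activity detection is considerably more robust.

```python
def find_utterances(frames, energy_threshold=0.1, min_silence_frames=2):
    """Return (start, end) frame-index pairs (end exclusive) for voiced segments."""
    utterances, start, silence = [], None, 0
    for i, energy in enumerate(frames):
        if energy >= energy_threshold:
            if start is None:
                start = i          # utterance begins at first loud frame
            silence = 0
        elif start is not None:
            silence += 1           # count consecutive quiet frames
            if silence >= min_silence_frames:
                utterances.append((start, i - silence + 1))
                start, silence = None, 0
    if start is not None:          # utterance still open at end of stream
        utterances.append((start, len(frames)))
    return utterances
```

On a frame-energy stream such as `[0, 0, 0.5, 0.6, 0, 0, 0.7, 0, 0]`, this yields two utterances, so only the voiced spans need to be sent on for recognition.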
7. The method according to claim 5 or 6, characterized in that sending the speech recognition result to the multi-turn dialog server comprises:
after determining that the obtained speech recognition result reaches a predetermined confidence level, sending the speech recognition result that reaches the predetermined confidence level to the multi-turn dialog server.
8. A human-machine voice interaction method, characterized by comprising:
during a process in which a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, receiving speech input by a user of the terminal;
sending the speech input by the user to the speech recognition server, so that the speech recognition server recognizes the speech and sends the speech recognition result to a multi-turn dialog server, and the multi-turn dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and saves the result of the context understanding sent by the QU server, determines the intent of the speech input by the user according to the saved result of the context understanding, and generates a broadcast result according to the intent;
receiving and broadcasting the broadcast result sent by the speech recognition server, the broadcast result sent by the speech recognition server being sent to the speech recognition server by the multi-turn dialog server.
9. The method according to claim 8, characterized in that, during the process in which the terminal performs voice broadcast of the broadcast result sent by the speech recognition server, receiving the speech input by the user of the terminal comprises:
during the process in which the terminal broadcasts the broadcast result sent by the speech recognition server, eliminating, by means of an echo cancellation technique, the input from the text-to-speech (TTS) audio being played, and receiving only the speech input by the user.
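The echo-cancellation idea in claim 9 rests on the terminal knowing the TTS signal it is playing, so that signal can be removed from the microphone capture, leaving only the user's speech. The sketch below is a deliberately simplified sample-wise subtraction; real acoustic echo cancellation uses adaptive filtering (e.g. NLMS) to track the room's echo path, which this toy does not attempt.

```python
def cancel_echo(mic, tts_reference, echo_gain=1.0):
    """Subtract the (scaled) played-back TTS reference from the mic capture."""
    return [m - echo_gain * r for m, r in zip(mic, tts_reference)]

tts = [0.2, 0.2, 0.2, 0.2]                 # what the terminal is broadcasting
user = [0.0, 0.5, 0.5, 0.0]                # the user's overlapping speech
mic = [u + t for u, t in zip(user, tts)]   # the microphone hears both
cleaned = cancel_echo(mic, tts)            # ideally recovers only the user
```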
10. The method according to claim 8 or 9, characterized in that sending the speech input by the user to the speech recognition server comprises:
sending the speech input by the user to the speech recognition server in segments of a predetermined length.
11. The method according to claim 8 or 9, characterized in that sending the speech input by the user to the speech recognition server comprises:
determining the start and end of each utterance in the speech input by the user by means of a silence detection technique, and sending only the recording that contains speech to the speech recognition server.
12. A human-machine voice interaction device, characterized by comprising:
a receiving module, configured to receive, during a process in which a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, a speech recognition result sent by the speech recognition server, the speech recognition result being sent by the speech recognition server after recognizing speech input by a user of the terminal; and to receive, after a sending module sends the speech recognition result to a query understanding (QU) server for context understanding, the result of the context understanding sent by the QU server;
the sending module, configured to send the speech recognition result received by the receiving module to the QU server for context understanding;
a saving module, configured to save the result of the context understanding received by the receiving module;
a determining module, configured to determine, according to the result of the context understanding saved by the saving module, the intent of the speech input by the user;
a generating module, configured to generate a broadcast result according to the intent determined by the determining module;
the sending module being further configured to send the broadcast result generated by the generating module to the speech recognition server, so that the speech recognition server sends the broadcast result to the terminal for voice broadcast.
13. The device according to claim 12, characterized in that:
the generating module is specifically configured to obtain, from a resource access server according to the intent determined by the determining module, information corresponding to the intent, and to generate the broadcast result according to the obtained information.
14. The device according to claim 12, characterized in that:
the receiving module is specifically configured to receive the speech recognition result that the speech recognition server sends after determining that the obtained speech recognition result reaches a predetermined confidence level.
15. The device according to any one of claims 12-14, characterized by further comprising:
an obtaining module, configured to obtain, according to a user profile and current state of the user, content suitable for recommendation to the user;
the sending module being further configured to trigger a cloud push service, send the content suitable for recommendation to the user to the terminal through the cloud push service, and initiate a dialogue with the terminal.
16. A human-machine voice interaction device, characterized by comprising:
a receiving module, configured to receive, during a process in which a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, speech sent by the terminal, the speech being input to the terminal by a user of the terminal; and to receive, after a sending module sends a speech recognition result to a multi-turn dialog server, a broadcast result sent by the multi-turn dialog server;
a recognition module, configured to recognize the speech received by the receiving module;
the sending module, configured to send the speech recognition result obtained by the recognition module to the multi-turn dialog server, so that the multi-turn dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and saves the result of the context understanding sent by the QU server, determines the intent of the speech input by the user according to the saved result of the context understanding, and generates a broadcast result according to the intent; and, after the receiving module receives the broadcast result sent by the multi-turn dialog server, to send the broadcast result to the terminal for voice broadcast.
17. The device according to claim 16, characterized in that:
the recognition module is specifically configured to determine the start and end of each utterance in the speech by means of a silence detection technique.
18. The device according to claim 16 or 17, characterized in that:
the sending module is specifically configured to send, after determining that the obtained speech recognition result reaches a predetermined confidence level, the speech recognition result that reaches the predetermined confidence level to the multi-turn dialog server.
19. A human-machine voice interaction device, characterized by comprising:
a receiving module, configured to receive, during a process in which a terminal performs voice broadcast of a broadcast result sent by a speech recognition server, speech input by a user of the terminal; and to receive, after a sending module sends the speech to the speech recognition server, a broadcast result sent by the speech recognition server, the broadcast result sent by the speech recognition server being sent to the speech recognition server by a multi-turn dialog server;
the sending module, configured to send the speech received by the receiving module to the speech recognition server, so that the speech recognition server recognizes the speech and sends the speech recognition result to the multi-turn dialog server, and the multi-turn dialog server sends the speech recognition result to a query understanding (QU) server for context understanding, receives and saves the result of the context understanding sent by the QU server, determines the intent of the speech input by the user according to the saved result of the context understanding, and generates a broadcast result according to the intent;
a broadcast module, configured to broadcast the broadcast result received by the receiving module.
20. The device according to claim 19, characterized in that:
the receiving module is specifically configured to eliminate, during the process in which the terminal broadcasts the broadcast result sent by the speech recognition server, the input from the text-to-speech (TTS) audio being played by means of an echo cancellation technique, and to receive only the speech input by the user.
21. The device according to claim 19 or 20, characterized in that:
the sending module is specifically configured to send the speech input by the user to the speech recognition server in segments of a predetermined length.
22. The device according to claim 19 or 20, characterized in that:
the sending module is specifically configured to determine the start and end of each utterance in the speech input by the user by means of a silence detection technique, and to send only the recording that contains speech to the speech recognition server.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510080163.XA CN104679472A (en) | 2015-02-13 | 2015-02-13 | Man-machine voice interactive method and device |
PCT/CN2015/083207 WO2016127550A1 (en) | 2015-02-13 | 2015-07-02 | Method and device for human-machine voice interaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510080163.XA CN104679472A (en) | 2015-02-13 | 2015-02-13 | Man-machine voice interactive method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104679472A true CN104679472A (en) | 2015-06-03 |
Family
ID=53314597
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510080163.XA Pending CN104679472A (en) | 2015-02-13 | 2015-02-13 | Man-machine voice interactive method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104679472A (en) |
WO (1) | WO2016127550A1 (en) |
Cited By (69)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105070290A (en) * | 2015-07-08 | 2015-11-18 | 苏州思必驰信息科技有限公司 | Man-machine voice interaction method and system |
CN105161097A (en) * | 2015-07-23 | 2015-12-16 | 百度在线网络技术(北京)有限公司 | Voice interaction method and apparatus |
WO2016127550A1 (en) * | 2015-02-13 | 2016-08-18 | 百度在线网络技术(北京)有限公司 | Method and device for human-machine voice interaction |
CN106095833A (en) * | 2016-06-01 | 2016-11-09 | 竹间智能科技(上海)有限公司 | Human computer conversation's content processing method |
CN107799116A (en) * | 2016-08-31 | 2018-03-13 | 科大讯飞股份有限公司 | More wheel interacting parallel semantic understanding method and apparatus |
CN107832439A (en) * | 2017-11-16 | 2018-03-23 | 百度在线网络技术(北京)有限公司 | Method, system and the terminal device of more wheel state trackings |
CN107943834A (en) * | 2017-10-25 | 2018-04-20 | 百度在线网络技术(北京)有限公司 | Interactive implementation method, device, equipment and storage medium |
CN108600511A (en) * | 2018-03-22 | 2018-09-28 | 上海摩软通讯技术有限公司 | The control system and method for intelligent sound assistant's equipment |
CN109145853A (en) * | 2018-08-31 | 2019-01-04 | 百度在线网络技术(北京)有限公司 | The method and apparatus of noise for identification |
CN109657091A (en) * | 2019-01-02 | 2019-04-19 | 百度在线网络技术(北京)有限公司 | State rendering method, device, equipment and the storage medium of interactive voice equipment |
CN109725798A (en) * | 2017-10-25 | 2019-05-07 | 腾讯科技(北京)有限公司 | The switching method and relevant apparatus of Autonomous role |
CN110364152A (en) * | 2019-07-25 | 2019-10-22 | 深圳智慧林网络科技有限公司 | Voice interactive method, equipment and computer readable storage medium |
CN110557451A (en) * | 2019-08-30 | 2019-12-10 | 北京百度网讯科技有限公司 | Dialogue interaction processing method and device, electronic equipment and storage medium |
CN110782625A (en) * | 2018-12-17 | 2020-02-11 | 北京嘀嘀无限科技发展有限公司 | Riding safety alarm method and device, electronic equipment and storage medium |
CN111292732A (en) * | 2018-12-06 | 2020-06-16 | 深圳市广和通无线股份有限公司 | Audio information processing method and device, computer equipment and storage medium |
CN111429896A (en) * | 2018-06-01 | 2020-07-17 | 苹果公司 | Voice interaction for accessing calling functionality of companion device at primary device |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
CN112700767A (en) * | 2019-10-21 | 2021-04-23 | 苏州思必驰信息科技有限公司 | Man-machine conversation interruption method and device |
CN112732340A (en) * | 2019-10-14 | 2021-04-30 | 苏州思必驰信息科技有限公司 | Man-machine conversation processing method and device |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
WO2022141990A1 (en) * | 2020-12-31 | 2022-07-07 | 广东美的制冷设备有限公司 | Household appliance and voice control method therefor, voice device, and computer storage medium |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108492822A (en) * | 2018-02-23 | 2018-09-04 | 济南汇通远德科技有限公司 | A kind of audio recognition method based on commercial Application |
CN108831434A (en) * | 2018-05-29 | 2018-11-16 | 尹绍华 | voice interactive system and method |
CN111916082A (en) * | 2020-08-14 | 2020-11-10 | 腾讯科技(深圳)有限公司 | Voice interaction method and device, computer equipment and storage medium |
CN112735423B (en) * | 2020-12-14 | 2024-04-05 | 美的集团股份有限公司 | Voice interaction method and device, electronic equipment and storage medium |
CN113257242A (en) * | 2021-04-06 | 2021-08-13 | 杭州远传新业科技有限公司 | Voice broadcast suspension method, device, equipment and medium in self-service voice service |
CN113569021B (en) * | 2021-06-29 | 2023-08-04 | 杭州摸象大数据科技有限公司 | Method for classifying users, computer device and readable storage medium |
US11605384B1 (en) | 2021-07-30 | 2023-03-14 | Nvidia Corporation | Duplex communications for conversational AI by dynamically responsive interrupting content |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1591315A (en) * | 2003-05-29 | 2005-03-09 | 微软公司 | Semantic object synchronous understanding for highly interactive interface |
CN101178705A (en) * | 2007-12-13 | 2008-05-14 | 中国电信股份有限公司 | Free-running speech comprehend method and man-machine interactive intelligent system |
CN101281745A (en) * | 2008-05-23 | 2008-10-08 | 深圳市北科瑞声科技有限公司 | Interactive system for vehicle-mounted voice |
CN203055434U (en) * | 2012-07-30 | 2013-07-10 | 刘强 | Family speech interactive terminal based on cloud technique |
WO2014040022A2 (en) * | 2012-09-10 | 2014-03-13 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistants |
CN104282306A (en) * | 2014-09-22 | 2015-01-14 | 奇瑞汽车股份有限公司 | Vehicle-mounted voice recognition interaction method, terminal and server |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6078964B2 (en) * | 2012-03-26 | 2017-02-15 | 富士通株式会社 | Spoken dialogue system and program |
CN103413549B (en) * | 2013-07-31 | 2016-07-06 | 深圳创维-Rgb电子有限公司 | The method of interactive voice, system and interactive terminal |
CN103971681A (en) * | 2014-04-24 | 2014-08-06 | 百度在线网络技术(北京)有限公司 | Voice recognition method and system |
CN104679472A (en) * | 2015-02-13 | 2015-06-03 | 百度在线网络技术(北京)有限公司 | Man-machine voice interactive method and device |
2015
- 2015-02-13 CN CN201510080163.XA patent/CN104679472A/en active Pending
- 2015-07-02 WO PCT/CN2015/083207 patent/WO2016127550A1/en active Application Filing
Cited By (101)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
WO2016127550A1 (en) * | 2015-02-13 | 2016-08-18 | 百度在线网络技术(北京)有限公司 | Method and device for human-machine voice interaction |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
CN105070290A (en) * | 2015-07-08 | 2015-11-18 | AI Speech Co., Ltd. (Suzhou) | Man-machine voice interaction method and system |
CN105161097A (en) * | 2015-07-23 | 2015-12-16 | Baidu Online Network Technology (Beijing) Co., Ltd. | Voice interaction method and apparatus |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
CN106095833B (en) * | 2016-06-01 | 2019-04-16 | Emotibot Technologies (Shanghai) Co., Ltd. | Human-computer dialogue content processing method |
CN106095833A (en) * | 2016-06-01 | 2016-11-09 | Emotibot Technologies (Shanghai) Co., Ltd. | Human-computer dialogue content processing method |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
CN107799116A (en) * | 2016-08-31 | 2018-03-13 | iFLYTEK Co., Ltd. | Multi-round parallel interaction semantic understanding method and apparatus |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
CN109725798B (en) * | 2017-10-25 | 2021-07-27 | Tencent Technology (Beijing) Co., Ltd. | Intelligent role switching method and related device |
CN107943834A (en) * | 2017-10-25 | 2018-04-20 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, device, equipment and storage medium for implementing man-machine conversation |
CN107943834B (en) * | 2017-10-25 | 2021-06-11 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, device, equipment and storage medium for implementing man-machine conversation |
CN109725798A (en) * | 2017-10-25 | 2019-05-07 | Tencent Technology (Beijing) Co., Ltd. | Intelligent role switching method and related device |
US10664755B2 (en) | 2017-11-16 | 2020-05-26 | Baidu Online Network Technology (Beijing) Co., Ltd. | Searching method and system based on multi-round inputs, and terminal |
CN107832439A (en) * | 2017-11-16 | 2018-03-23 | 百度在线网络技术(北京)有限公司 | Method, system and the terminal device of more wheel state trackings |
CN108600511A (en) * | 2018-03-22 | 2018-09-28 | Shanghai Moruan Communication Technology Co., Ltd. | Control system and method for an intelligent voice assistant device |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
CN111429896A (en) * | 2018-06-01 | 2020-07-17 | 苹果公司 | Voice interaction for accessing calling functionality of companion device at primary device |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
CN109145853A (en) * | 2018-08-31 | 2019-01-04 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for identifying noise |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
CN111292732A (en) * | 2018-12-06 | 2020-06-16 | Fibocom Wireless Inc. (Shenzhen) | Audio information processing method and device, computer equipment and storage medium |
CN111292732B (en) * | 2018-12-06 | 2023-07-21 | Fibocom Wireless Inc. (Shenzhen) | Audio information processing method, device, computer equipment and storage medium |
CN110782625A (en) * | 2018-12-17 | 2020-02-11 | Beijing Didi Infinity Technology and Development Co., Ltd. | Riding safety alarm method and device, electronic equipment and storage medium |
CN109657091A (en) * | 2019-01-02 | 2019-04-19 | 百度在线网络技术(北京)有限公司 | State rendering method, device, equipment and the storage medium of interactive voice equipment |
US11205431B2 (en) | 2019-01-02 | 2021-12-21 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus and device for presenting state of voice interaction device, and storage medium |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
CN110364152A (en) * | 2019-07-25 | 2019-10-22 | Shenzhen Zhihuilin Network Technology Co., Ltd. | Voice interaction method, device and computer-readable storage medium |
CN110364152B (en) * | 2019-07-25 | 2022-04-01 | Shenzhen Zhihuilin Network Technology Co., Ltd. | Voice interaction method, device and computer-readable storage medium |
CN110557451A (en) * | 2019-08-30 | 2019-12-10 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Dialogue interaction processing method and device, electronic equipment and storage medium |
US11830483B2 (en) | 2019-10-14 | 2023-11-28 | Ai Speech Co., Ltd. | Method for processing man-machine dialogues |
CN112732340A (en) * | 2019-10-14 | 2021-04-30 | AI Speech Co., Ltd. (Suzhou) | Man-machine conversation processing method and device |
CN112700767A (en) * | 2019-10-21 | 2021-04-23 | AI Speech Co., Ltd. (Suzhou) | Man-machine conversation interruption method and device |
WO2021077528A1 (en) * | 2019-10-21 | 2021-04-29 | AI Speech Co., Ltd. (Suzhou) | Method for interrupting human-machine conversation |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
WO2022141990A1 (en) * | 2020-12-31 | 2022-07-07 | GD Midea Air-Conditioning Equipment Co., Ltd. | Household appliance and voice control method therefor, voice device, and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2016127550A1 (en) | 2016-08-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104679472A (en) | Man-machine voice interactive method and device | |
JP6949149B2 (en) | Spoken privilege management for voice assistant systems | |
CN110442701B (en) | Voice conversation processing method and device | |
EP3721605B1 (en) | Streaming radio with personalized content integration | |
KR101821358B1 (en) | Method and system for providing multi-user messenger service | |
KR102342623B1 (en) | Voice and connection platform | |
CN111049996B (en) | Multi-scene voice recognition method and device and intelligent customer service system applying same | |
CN111429895B (en) | Semantic understanding method and device for multi-round interaction and computer storage medium | |
US20150206534A1 (en) | Method of controlling interactive system, method of controlling server, server, and interactive device | |
CN104318924A (en) | Method for realizing voice recognition function | |
WO2018063922A1 (en) | Conversational interactions using superbots | |
CN106558310A (en) | Virtual reality sound control method and device | |
CN105280183A (en) | Voice interaction method and system | |
CN109147779A (en) | Voice data processing method and device | |
CN116628157A (en) | Parameter collection and automatic dialog generation in dialog systems | |
CN106911812A (en) | Session information processing method, server and computer-readable storage medium | |
WO2021050159A1 (en) | Dynamic contextual dialog session extension | |
CN108962262A (en) | Voice data processing method and device | |
CN111753061B (en) | Multi-round dialogue processing method and device, electronic equipment and storage medium | |
CN108604177A (en) | Sequence-dependent data message consolidation in a voice-activated computer network environment | |
CN109473104A (en) | Speech recognition network delay optimization method and device | |
CN110223697A (en) | Interactive method and system | |
CN109448694A (en) | Method and device for rapidly synthesizing TTS speech | |
CN105491126A (en) | Service providing method and service providing device based on artificial intelligence | |
CN108628908B (en) | Method, device and electronic equipment for classifying user question-answer boundaries |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | | |
PB01 | Publication | | |
C10 | Entry into substantive examination | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150603 | |