CN109903760A

CN109903760A - Voice interaction method, device and storage medium

Info

Publication number: CN109903760A
Application number: CN201910000681.4A
Authority: CN
Inventors: 陈果果; 牛飞; 王芃; 潘向; 胡文波
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2019-01-02
Filing date: 2019-01-02
Publication date: 2019-06-18

Abstract

The present invention provides a voice interaction method, device and storage medium. The method comprises: receiving first audio sent by a peripheral end and sending it to a server; receiving first response audio sent by the server and sending it to the peripheral end, so that the peripheral end plays the first response audio, the first response audio being used to determine the user intent of the user corresponding to the terminal; receiving second audio sent by the peripheral end and sending it to the server, the second audio characterizing the user intent; and receiving second response audio sent by the server and sending it to the peripheral end, so that the peripheral end plays the second response audio, the second response audio being the response audio obtained by the server based on the user intent. The invention realizes multi-round voice interaction between the terminal and the server, enriches the interactive functions of the peripheral end and the terminal, and improves user experience.

Description

Voice interaction method, device and storage medium

Technical field

The present invention relates to the technical field of voice interaction, and in particular to a voice interaction method, device and storage medium.

Background art

Bluetooth is a wireless technology standard that enables short-range data exchange between fixed devices, mobile devices and building personal area networks. After a terminal connects to a Bluetooth device, the terminal can perform operations appropriate to the type of the Bluetooth device; for example, when the Bluetooth device is a Bluetooth speaker, the terminal can play music through it.

In the prior art, the interaction between a terminal and a Bluetooth device is limited to a single function, which does not match the current trend toward intelligent devices and results in a poor user experience.

Summary of the invention

The present invention provides a voice interaction method, device and storage medium that realize multi-round voice interaction between a terminal and a server, enrich the interactive functions of the peripheral end and the terminal, and improve user experience.

A first aspect of the present invention provides a voice interaction method, applied to a terminal, comprising:

receiving first audio sent by a peripheral end and sending it to a server;

receiving first response audio sent by the server and sending it to the peripheral end, so that the peripheral end plays the first response audio, the first response audio being used to determine the user intent of the user corresponding to the terminal;

receiving second audio sent by the peripheral end and sending it to the server, the second audio characterizing the user intent;

receiving second response audio sent by the server and sending it to the peripheral end, so that the peripheral end plays the second response audio, the second response audio being the response audio obtained by the server based on the user intent.
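The four relay steps of the first aspect can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the `Terminal` class, the `fake_server` helper and the callback signatures are hypothetical names introduced here, and audio is represented as plain bytes.

```python
from typing import Callable, List


class Terminal:
    """Hypothetical terminal that relays audio between a peripheral end and a server."""

    def __init__(self,
                 send_to_server: Callable[[bytes], bytes],
                 send_to_peripheral: Callable[[bytes], None]) -> None:
        self._send_to_server = send_to_server
        self._send_to_peripheral = send_to_peripheral

    def relay(self, audio_from_peripheral: bytes) -> bytes:
        """Forward one audio to the server and its response audio to the peripheral end."""
        response = self._send_to_server(audio_from_peripheral)
        self._send_to_peripheral(response)  # the peripheral end plays this audio
        return response


def fake_server() -> Callable[[bytes], bytes]:
    """Stand-in server (assumption): round one asks a question, round two answers."""
    received: List[bytes] = []

    def handle(audio: bytes) -> bytes:
        received.append(audio)
        if len(received) == 1:
            return b"first response: which day?"
        return b"second response for " + received[0] + b" + " + audio

    return handle


played: List[bytes] = []                      # what the peripheral end would play
terminal = Terminal(fake_server(), played.append)
first_response = terminal.relay(b"weather in Beijing")   # first audio
second_response = terminal.relay(b"tomorrow")             # second audio
```

In this sketch the terminal holds no dialogue state of its own; the two rounds differ only in what the stand-in server returns, which mirrors the division of roles described above.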

Optionally, the first response audio is used to request determination of the semantics of the first audio, the semantics of the first audio characterizing the user intent;

the second audio characterizes the semantics of the first audio, and the second response audio is the response audio obtained by the server based on the semantics of the first audio.

Optionally, before receiving the first audio sent by the peripheral end, the method further comprises:

sending a sound pickup instruction to the peripheral end, the sound pickup instruction instructing the peripheral end to start picking up sound;

receiving third audio sent by the peripheral end, wherein if the third audio contains the wake-up information corresponding to the terminal, the terminal enters the awake state.

receiving a start-pickup message sent by the peripheral end, the start-pickup message indicating that the peripheral end is in the awake state and has started picking up sound.

Optionally, after receiving the second audio sent by the peripheral end and sending it to the server, the method further comprises:

receiving a stop-sending message sent by the server, the stop-sending message instructing the terminal to stop sending audio to the server, the stop-sending message being sent by the server when it has not received fourth audio from the terminal within a first preset duration after receiving the second audio;

sending a stop-pickup message to the peripheral end, the stop-pickup message instructing the peripheral end to stop picking up sound.

Optionally, after receiving the second response audio sent by the server and sending it to the peripheral end, the method further comprises:

if fourth audio sent by the peripheral end is not received within a second preset duration, entering the non-awake state and sending a non-awake state message to the peripheral end.
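The two optional timeout behaviours on the terminal side can be illustrated with the following Python sketch. It is a sketch under assumptions only: the `TerminalSession` class, the message strings and the use of explicit timestamps are hypothetical, and the duration value is arbitrary. It models only the case in which no fourth audio arrives.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TerminalSession:
    """Hypothetical terminal-side bookkeeping for the two optional timeouts."""
    second_preset_duration: float            # seconds; the value is an assumption
    awake: bool = True
    to_peripheral: List[str] = field(default_factory=list)
    response_forwarded_at: Optional[float] = None

    def on_stop_sending_message(self) -> None:
        """The server saw no fourth audio within its first preset duration."""
        self.to_peripheral.append("STOP_PICKUP")

    def on_second_response_forwarded(self, now: float) -> None:
        """Start timing once the second response audio went to the peripheral end."""
        self.response_forwarded_at = now

    def on_tick(self, now: float) -> None:
        """Enter the non-awake state if no fourth audio arrived in time."""
        if not self.awake or self.response_forwarded_at is None:
            return
        if now - self.response_forwarded_at >= self.second_preset_duration:
            self.awake = False
            self.to_peripheral.append("NON_AWAKE_STATE")


session = TerminalSession(second_preset_duration=8.0)
session.on_stop_sending_message()
session.on_second_response_forwarded(now=10.0)
session.on_tick(now=15.0)        # 5 s elapsed: still awake
awake_at_15 = session.awake
session.on_tick(now=18.0)        # 8 s elapsed: leaves the awake state
```

Passing timestamps in explicitly keeps the sketch deterministic; a real terminal would presumably use timers.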

A second aspect of the present invention provides a voice interaction method, applied to a peripheral end, comprising:

sending first audio to a terminal, so that the terminal sends the first audio to a server and the server returns first response audio to the terminal according to the first audio, the first response audio being used to determine the user intent of the user corresponding to the terminal;

receiving the first response audio sent by the terminal, and playing the first response audio;

sending second audio to the terminal, so that the terminal sends the second audio to the server and the server returns second response audio to the terminal, the second audio characterizing the user intent;

receiving the second response audio sent by the terminal, and playing the second response audio, the second response audio being the response audio obtained by the server based on the user intent.

Correspondingly, the second audio characterizes the semantics of the first audio, and the second response audio is the response audio obtained by the server based on the semantics of the first audio.

Optionally, before sending the first audio to the terminal, the method further comprises:

receiving a sound pickup instruction sent by the terminal, the sound pickup instruction instructing the peripheral end to start picking up sound;

sending third audio to the terminal, wherein if the third audio contains the wake-up information corresponding to the terminal, the terminal enters the awake state.

Optionally, before sending the first audio to the terminal, the method further comprises:

sending a start-pickup message to the terminal, the start-pickup message notifying the terminal that the peripheral end is in the awake state and has started picking up sound.

Optionally, before sending the start-pickup message to the terminal, the method further comprises:

collecting first wake-up audio of the user and entering the awake state, the first wake-up audio containing the wake-up information corresponding to the peripheral end; or

receiving the user's operation instruction on a wake-up control and entering the awake state, the wake-up control being provided on the peripheral end and used to trigger waking up the peripheral end.

Optionally, after sending the second audio to the terminal, the method further comprises:

receiving the stop-pickup message sent by the terminal;

stopping sound pickup.

Optionally, after stopping sound pickup, the method further comprises:

receiving the non-awake state message sent by the terminal;

if second wake-up audio containing the wake-up information is not received within a time threshold, entering the sleep state.
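The optional wake-up, stop-pickup and sleep steps of the second aspect amount to a small state machine on the peripheral end. The Python sketch below is one possible reading under assumptions: the state names, the `PeripheralEnd` class, the message strings, the wake-up word "hello" and the threshold value are all hypothetical, and recognized text stands in for wake-up audio.

```python
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    SLEEP = auto()
    PICKING_UP = auto()      # awake and collecting audio
    AWAKE_IDLE = auto()      # awake, pickup stopped
    NON_AWAKE = auto()       # terminal reported non-awake; waiting to sleep


class PeripheralEnd:
    def __init__(self, wake_word: str, time_threshold: float) -> None:
        self.wake_word = wake_word
        self.time_threshold = time_threshold   # seconds; the value is an assumption
        self.state = State.SLEEP
        self.to_terminal: List[str] = []
        self._non_awake_since: Optional[float] = None

    def _wake(self) -> None:
        self.state = State.PICKING_UP
        self._non_awake_since = None
        self.to_terminal.append("START_PICKUP")

    def on_transcript(self, text: str) -> None:
        """Wake on audio whose (already recognized) text contains the wake-up word."""
        if self.state is not State.PICKING_UP and self.wake_word in text:
            self._wake()

    def on_wake_control(self) -> None:
        """Wake when the user operates the wake-up control."""
        if self.state is not State.PICKING_UP:
            self._wake()

    def on_stop_pickup_message(self) -> None:
        if self.state is State.PICKING_UP:
            self.state = State.AWAKE_IDLE

    def on_non_awake_state_message(self, now: float) -> None:
        self.state = State.NON_AWAKE
        self._non_awake_since = now

    def on_tick(self, now: float) -> None:
        """Sleep if no wake-up audio arrived within the time threshold."""
        if (self.state is State.NON_AWAKE and self._non_awake_since is not None
                and now - self._non_awake_since >= self.time_threshold):
            self.state = State.SLEEP


peripheral = PeripheralEnd(wake_word="hello", time_threshold=30.0)
peripheral.on_transcript("hello, wake up")
state_after_wake = peripheral.state
peripheral.on_stop_pickup_message()
peripheral.on_non_awake_state_message(now=100.0)
peripheral.on_tick(now=120.0)
state_before_threshold = peripheral.state
peripheral.on_tick(now=130.0)
```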

A third aspect of the present invention provides a voice interaction device, comprising:

a first audio processing module, configured to receive first audio sent by a peripheral end and send it to a server;

a first response audio processing module, configured to receive first response audio sent by the server and send it to the peripheral end, so that the peripheral end plays the first response audio, the first response audio being used to determine the user intent of the user corresponding to the voice interaction device;

a second audio processing module, configured to receive second audio sent by the peripheral end and send it to the server, the second audio characterizing the user intent;

a second response audio processing module, configured to receive second response audio sent by the server and send it to the peripheral end, so that the peripheral end plays the second response audio, the second response audio being the response audio obtained by the server based on the user intent.

Optionally, the first response audio is used to request determination of the semantics of the first audio, the semantics of the first audio characterizing the user intent; the second audio characterizes the semantics of the first audio, and the second response audio is the response audio obtained by the server based on the semantics of the first audio.

Optionally, the device further comprises a sound pickup instruction sending module and a third audio receiving module;

the sound pickup instruction sending module is configured to send a sound pickup instruction to the peripheral end, the sound pickup instruction instructing the peripheral end to start picking up sound;

the third audio receiving module is configured to receive third audio sent by the peripheral end, wherein if the third audio contains the wake-up information corresponding to the voice interaction device, the voice interaction device enters the awake state.

Optionally, the device further comprises a start-pickup message receiving module;

the start-pickup message receiving module is configured to receive a start-pickup message sent by the peripheral end, the start-pickup message indicating that the peripheral end is in the awake state and has started picking up sound.

Optionally, the device further comprises a stop-pickup module;

the stop-pickup module is configured to receive a stop-sending message sent by the server, the stop-sending message instructing the voice interaction device to stop sending audio to the server, the stop-sending message being sent by the server when it has not received fourth audio from the voice interaction device within a first preset duration after receiving the second audio; and to send a stop-pickup message to the peripheral end, the stop-pickup message instructing the peripheral end to stop picking up sound.

Optionally, the device further comprises a non-awake state message sending module;

the non-awake state message sending module is configured to, if fourth audio sent by the peripheral end is not received within a second preset duration, enter the non-awake state and send a non-awake state message to the peripheral end.

A fourth aspect of the present invention provides a voice interaction device, comprising:

a first audio sending module, configured to send first audio to a terminal, so that the terminal sends the first audio to a server and the server returns first response audio to the terminal according to the first audio, the first response audio being used to determine the user intent of the user corresponding to the terminal;

a playing module, configured to receive the first response audio sent by the terminal and play the first response audio;

a second audio sending module, configured to send second audio to the terminal, so that the terminal sends the second audio to the server and the server returns second response audio to the terminal, the second audio characterizing the user intent;

the playing module being further configured to receive the second response audio sent by the terminal and play the second response audio, the second response audio being the response audio obtained by the server based on the user intent.

Optionally, the device further comprises a third audio sending module;

the third audio sending module is configured to receive a sound pickup instruction sent by the terminal, the sound pickup instruction instructing the voice interaction device to start picking up sound, and to send third audio to the terminal, wherein if the third audio contains the wake-up information corresponding to the terminal, the terminal enters the awake state.

Optionally, the device further comprises a start-pickup message module;

the start-pickup message module is configured to send a start-pickup message to the terminal, the start-pickup message notifying the terminal that the voice interaction device is in the awake state and has started picking up sound.

Optionally, the device further comprises a wake-up module;

the wake-up module is configured to collect first wake-up audio of the user and enter the awake state, the first wake-up audio containing the wake-up information corresponding to the voice interaction device; or to receive the user's operation instruction on a wake-up control and enter the awake state, the wake-up control being provided on the voice interaction device and used to trigger waking up the voice interaction device.

Optionally, the device further comprises a stop-pickup module;

the stop-pickup module is configured to receive the stop-pickup message sent by the terminal and to stop sound pickup.

Optionally, the device further comprises a sleep module;

the sleep module is configured to receive the non-awake state message sent by the terminal and, if second wake-up audio containing the wake-up word is not received within a time threshold, to enter the sleep state.

A fifth aspect of the present invention provides a terminal, comprising at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the terminal performs the voice interaction method of the first aspect.

A sixth aspect of the present invention provides a peripheral end, comprising at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the peripheral end performs the voice interaction method of the second aspect.

A seventh aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the voice interaction method of the first aspect.

An eighth aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the voice interaction method of the second aspect.

The present invention provides a voice interaction method, device and storage medium. The method comprises: receiving first audio sent by a peripheral end and sending it to a server; receiving first response audio sent by the server and sending it to the peripheral end, so that the peripheral end plays the first response audio, the first response audio being used to determine the user intent of the user corresponding to the terminal; receiving second audio sent by the peripheral end and sending it to the server, the second audio characterizing the user intent; and receiving second response audio sent by the server and sending it to the peripheral end, so that the peripheral end plays the second response audio, the second response audio being the response audio obtained by the server based on the user intent. The invention realizes multi-round voice interaction between the terminal and the server, enriches the interactive functions of the peripheral end and the terminal, and improves user experience.

Brief description of the drawings

Fig. 1 is a schematic diagram of a scenario to which the voice interaction method provided by the present invention is applicable;

Fig. 2 is a first schematic flowchart of the voice interaction method provided by the present invention;

Fig. 3 is a second schematic flowchart of the voice interaction method provided by the present invention;

Fig. 4 is a third schematic flowchart of the voice interaction method provided by the present invention;

Fig. 5 is a schematic diagram of an interface of the terminal provided by the present invention;

Fig. 6 is a fourth schematic flowchart of the voice interaction method provided by the present invention;

Fig. 7 is a first schematic structural diagram of a voice interaction device provided by the present invention;

Fig. 8 is a second schematic structural diagram of a voice interaction device provided by the present invention;

Fig. 9 is a third schematic structural diagram of a voice interaction device provided by the present invention;

Fig. 10 is a first schematic structural diagram of another voice interaction device provided by the present invention;

Fig. 11 is a second schematic structural diagram of another voice interaction device provided by the present invention;

Fig. 12 is a third schematic structural diagram of another voice interaction device provided by the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Bluetooth peripheral ends in the prior art are varied, for example Bluetooth headsets, Bluetooth speakers, Bluetooth keyboards and sports wristbands. Before use, these Bluetooth peripheral ends need to establish a Bluetooth connection with a terminal. For example, a Bluetooth speaker establishes a Bluetooth connection with a terminal as follows: long-press the power key of the Bluetooth speaker to turn it on, search for the speaker's name on the terminal, and enter the pairing password; the Bluetooth connection is then established.

After the connection is established, the terminal can play songs or other audio on the terminal through the Bluetooth speaker. The audio may be stored in a local folder of the terminal, or may be audio obtained in real time by the terminal interacting with a server. The terminal sends the file to be played to the Bluetooth speaker, and the Bluetooth speaker plays the audio corresponding to the file.

However, in the prior art the interaction between the terminal and the Bluetooth peripheral end is too limited: the peripheral end can only play passively under the terminal's control and cannot interact with the user, so the user experience is poor. Moreover, the devices that can interact with the user in the prior art are smart devices, which must be able to connect to a server themselves, so the deployment cost of the peripheral end is high.

To solve the above problem that the interaction between the terminal and the Bluetooth peripheral end is too limited, and to reduce the deployment cost of the Bluetooth peripheral end while enriching the interaction between the two, the present invention provides a voice interaction method. Fig. 1 is a schematic diagram of a scenario to which the voice interaction method provided by the present invention is applicable. As shown in Fig. 1, the scenario includes a peripheral end, a terminal and a server.

The peripheral end can establish a Bluetooth connection with the terminal. This Bluetooth connection may be the prior-art data communication based on classic Bluetooth, in which the system settings interface of the terminal guides the user to select the designated device and complete pairing. Alternatively, the terminal can establish a smart Bluetooth (DuerOS Mobile Accessories, DMA) connection with the peripheral end. For example, when the terminal wants to establish a DMA connection with the peripheral end, scanning for, pairing with and connecting to the peripheral end can all be completed directly in the interface of an application program on the terminal, without returning to the system settings interface of the terminal for configuration and then going back to the application interface to complete the connection. Correspondingly, in this embodiment, when an ordinary Bluetooth connection is established, the peripheral end is an ordinary Bluetooth device; when the terminal establishes a DMA connection with it, the peripheral end is a DMA device, that is, a device that supports the DMA Bluetooth protocol. When the terminal and the peripheral end establish an ordinary Bluetooth connection, the prior-art Bluetooth connection procedure can be followed; the process by which the terminal and the peripheral end establish a DMA connection is described in the following embodiments.

In the present invention, the terminal and the server may be connected wirelessly or by wire. The terminal may be a mobile device such as a mobile phone, a personal digital assistant (PDA), a tablet computer or a portable device (for example, a portable computer, pocket computer or handheld computer); it may also be a fixed device such as a desktop computer.

The voice interaction method provided by the present invention is described below from the perspective of the interaction among the peripheral end, the terminal and the server. Fig. 2 is a first schematic flowchart of the voice interaction method provided by the present invention. As shown in Fig. 2, the voice interaction method provided by this embodiment may include:

S201, the peripheral end sends first audio to the terminal.

The peripheral end in this embodiment has a sound pickup function. Specifically, the peripheral end may be a vehicle-mounted bracket with a microphone (mic), or a device with a sound pickup function such as a Bluetooth speaker, Bluetooth headset, light-emitting diode (LED) lamp or alarm clock. After the terminal establishes a Bluetooth connection or DMA connection with the peripheral end, when the user has a voice interaction need, for example wants to query the weather or play a song, the user can wake up the peripheral end or the terminal so that the peripheral end and the terminal interact and transmit the collected audio or the response audio. Alternatively, the peripheral end stays in the pickup state at all times and sends audio to the terminal whenever it collects any.

In the first case, the terminal and the peripheral end need to be woken up; the ways of waking them up are briefly described below.

The terminal can be woken up as follows. The terminal has a sound pickup function, and wake-up information is set for the terminal in advance. The wake-up information may be a wake-up word; when the terminal collects audio containing the terminal's wake-up word, it enters the awake state. Alternatively, in this embodiment a wake-up control can be displayed on the display interface of the terminal; after the user selects the wake-up control by clicking or another operation, the terminal starts to collect audio. It is conceivable that the user can configure the terminal, for example so that sound pickup is enabled at a preset time of day and is automatically turned off if no sound is detected within a preset duration from that preset time. After the terminal enters the awake state, it can send a sound pickup instruction to the peripheral end to instruct the peripheral end to start picking up sound; after receiving the instruction, the peripheral end starts picking up sound and sends the first audio to the terminal.

One way of waking up the peripheral end is as follows: a wake-up button is provided on the peripheral end, and the user selects the wake-up button by clicking or another operation. The wake-up button is used to wake up the peripheral end, and after waking up the peripheral end sends a start-pickup message to the terminal.

Another way of waking up the peripheral end is as follows: the peripheral end has preset wake-up information, which may be a wake-up word. When the peripheral end collects the user saying the wake-up word or a sentence containing it, the peripheral end enters the awake state and sends a start-pickup message to the terminal.

Another way of waking up the peripheral end is as follows: a switch button is provided on the peripheral end, and after the user selects the button by clicking or another operation, the peripheral end is turned on. When the peripheral end then collects the user saying the peripheral end's wake-up word or a sentence containing it, the peripheral end enters the awake state and sends a start-pickup message to the terminal.

Yet another way of waking up the peripheral end is as follows: the terminal sends an audio message to the peripheral end to wake it up; when the peripheral end returns a response message to the terminal, namely the start-pickup message, this indicates that the peripheral end is in the awake state. Before sending the audio message to the peripheral end, the terminal also needs to enter the awake state. The terminal may enter the awake state when the user clicks the wake-up control on the terminal interface, or when the terminal collects the user saying the terminal's wake-up word or a sentence containing it.

This embodiment takes waking up the peripheral end as an example of the wake-up action. If the wake-up word of the peripheral end is "Xiaodu", then when the user says "Xiaodu" or the sentence "Xiaodu, wake up", the peripheral end collects the audio, parses it, determines that it contains the peripheral end's wake-up word, and enters the awake state.

Specifically, after the peripheral end enters the awake state, it can send a start-pickup message to the terminal and, after picking up sound, send the first audio to the terminal.

The first audio in this embodiment may be the first segment of audio collected by the peripheral end after it receives the sound pickup instruction, or the first segment of audio collected after the peripheral end is woken up. Specifically, if the peripheral end detects no effective audio within a preset time period after detecting the first segment of audio, it sends that audio to the terminal. In this embodiment, the peripheral end may treat collected audio whose volume exceeds a volume threshold as effective audio. It is conceivable that if, after entering the awake state, the peripheral end detects only one sentence spoken by the user, it uses the audio of that sentence as the first audio.
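The rule for delimiting the first audio (audio above a volume threshold is effective, and the segment ends once no effective audio is detected for a preset period) can be sketched as follows. This is an illustrative sketch under assumptions, not the patented procedure: audio is modelled as a sequence of per-frame volume values, the preset period is counted in frames, and `first_audio_segment` and its parameters are hypothetical names.

```python
from typing import List, Sequence


def first_audio_segment(volumes: Sequence[float],
                        threshold: float,
                        max_silent_frames: int) -> List[float]:
    """Return the frames of the first segment, from the first effective frame
    up to (not including) the run of silence that ends the segment."""
    segment: List[float] = []
    silent_run = 0
    started = False
    for volume in volumes:
        effective = volume > threshold
        if not started:
            if effective:            # the segment starts at the first effective frame
                started = True
                segment.append(volume)
            continue
        segment.append(volume)
        if effective:
            silent_run = 0
        else:
            silent_run += 1
            if silent_run >= max_silent_frames:   # preset period elapsed
                break
    # Drop the trailing silent frames, if any.
    return segment[:len(segment) - silent_run]


frames = [0.0, 0.1, 0.8, 0.9, 0.1, 0.7, 0.0, 0.0, 0.0, 0.9]
segment = first_audio_segment(frames, threshold=0.5, max_silent_frames=3)
```

With these example values the short dip to 0.1 stays inside the segment, while the three consecutive quiet frames end it, so the final 0.9 would belong to a later audio.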

In the present embodiment, the first audio collected is sent to terminal by peripheral hardware end, is waken up in the present embodiment at peripheral hardware end Start radio reception afterwards or after receiving radio reception instruction, the audio collected is effective audio, so that terminal and server is handed over Mutually, the corresponding response audio of the audio is obtained；It can solve peripheral hardware end not to be waken up or when terminal does not indicate the radio reception of peripheral hardware end, The problem of audio collected is sent to terminal by peripheral hardware end, the memory of occupied terminal.

S202, terminal receive the first audio that peripheral hardware end is sent and are sent to server.

In the present embodiment, the first audio is sent to server after the first audio for receiving the transmission of peripheral hardware end by terminal, To obtain the corresponding response audio of first audio.

Illustratively, the first audio is " how is Pekinese's weather ", which is sent to terminal, terminal by peripheral hardware end First audio " how is Pekinese's weather " is further sent to server, to obtain the response data of first audio.

S203, the first response audio that server is sent to terminal, the first response audio is for determining the corresponding use of terminal The user at family is intended to.

In the present embodiment, server can carry out the first audio after receiving the first response audio of terminal transmission Parsing obtains the corresponding response audio of the first audio.Specifically, the process that is parsed to the first audio of server can be with are as follows: Text is converted by the first audio, text is subjected to cutting processing, obtains the corresponding multiple words of the text；Further according to each word The part of speech of language obtains target word, and further according to target word, corresponding user is intended to, and obtains the corresponding response sound of first audio Frequently.

In this embodiment, an NLP tool such as a tokenizer can be used to segment the text corresponding to the first audio into words. If the text corresponding to the first audio is "How is the weather in Beijing", the tokenizer splits it into several words; the resulting words may be "Beijing", "的" (a possessive particle), "weather" and "how".

In this embodiment, optionally, the target words that carry the effective information can be obtained according to the part of speech of the segmented words. For example, quantifiers, adverbs, adjectives and the like are removed from the segmented conversation message, and the words that remain, such as nouns and verbs, are the target words. Removing "how" and "的" from the segmentation result above leaves the target words "Beijing" and "weather". From these target words the server determines that the user intention is "the weather in Beijing".
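
The part-of-speech filtering described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the tokens are tagged by hand instead of by a real tokenizer, and the tag names and the `extract_target_words` helper are hypothetical.

```python
# Hypothetical sketch: keep nouns and verbs as target words and drop the rest.
# The (word, tag) pairs are hand-written stand-ins for tokenizer output.
KEPT_TAGS = {"noun", "verb"}


def extract_target_words(tagged_tokens):
    """Return the words whose part of speech marks them as effective information."""
    return [word for word, tag in tagged_tokens if tag in KEPT_TAGS]


# "How is the weather in Beijing" after segmentation (tags assumed for illustration).
tagged = [
    ("Beijing", "noun"),
    ("的", "particle"),
    ("weather", "noun"),
    ("how", "pronoun"),
]
target_words = extract_target_words(tagged)
```

With these assumed tags, `target_words` holds the two words from which the server derives the intention "the weather in Beijing".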

At this point the server knows part of the user intention, namely obtaining the weather in Beijing, but it cannot determine which day's weather the user wants. When the server cannot fully determine the user intention, it can return a first response audio to the terminal based on the part of the user intention already determined from the first audio. For example, the first response audio that the server sends to the terminal is "Which day's weather in Beijing would you like to know?".
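
The decision between answering and asking a clarifying question can be sketched as below. The notion of named "slots" (`city`, `date`) and the function `first_response_text` are assumptions introduced for illustration; the text only states that the server asks when part of the intention is still unknown.

```python
# Hypothetical sketch: build a clarifying question when required information is missing.
REQUIRED_SLOTS = ("city", "date")  # assumed slots for a weather query


def first_response_text(slots):
    """Return a clarifying question if a required slot is missing, else None."""
    missing = [name for name in REQUIRED_SLOTS if name not in slots]
    if not missing:
        return None  # the intention is complete, so no clarification is needed
    if missing == ["date"]:
        return f"Which day's weather in {slots['city']} would you like to know?"
    return "Could you tell me more about what you need?"
```

In a real system the returned text would still have to be synthesized into the first response audio before being sent to the terminal.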

It is worth noting that the server in this embodiment can obtain the user's intention from the text corresponding to the first audio, the user's mood and the punctuation. For example, if the first audio input by the user is "Play a lyrical song", the server determines that the user's intention is "play a lyrical song" and can recommend the titles of several lyrical songs accordingly, for example by returning the first response audio "Recommending "A", "B", "C" and "D" for you", in order to determine which specific song the user wants to play.

It is worth noting that, when the text corresponding to the first audio contains multiple sentences, the server can first split the text into clauses and then segment each clause into words. According to the user intention corresponding to the target words in each clause, it obtains the response audio for each clause and sends the resulting response audios to the terminal in the order in which the clauses appear in the text.

For example, the text corresponding to the user's first audio is "What is fun to do in Beijing? Where is good-value accommodation?". The server splits the text into two clauses, "What is fun to do in Beijing" and "Where is good-value accommodation". It then obtains the target words of each clause, such as "Beijing", "fun", "place" and "accommodation", "good value", and obtains the response audio for each clause, for example "Fun places in Beijing include the Forbidden City, the Great Wall ..." and "For accommodation in Beijing, you can choose the xx hotel".

It is conceivable that, when the server can determine the user intention of some clauses but not of others, it sends response audio for the clauses whose user intention is determined and, for the clauses whose user intention is not determined, returns a first response audio requesting the information needed to determine the user intention of the first audio.
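
Clause splitting with order-preserving responses can be sketched as follows. The punctuation set, the helper names and the use of text in place of audio are all assumptions made for illustration.

```python
import re


def split_clauses(text):
    """Split text into clauses at sentence-ending punctuation (ASCII or full-width)."""
    parts = re.split(r"[?!.？！。]+", text)
    return [part.strip() for part in parts if part.strip()]


def respond_in_order(text, answer_clause):
    """Answer each clause; the responses keep the clauses' original order."""
    return [answer_clause(clause) for clause in split_clauses(text)]


clauses = split_clauses(
    "What is fun to do in Beijing? Where is good-value accommodation?"
)
```

Here `answer_clause` stands for whatever per-clause processing the server applies (segmentation, target-word extraction, response lookup); it is passed in as a callable so the sketch stays independent of those details.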

S204, the terminal sends the first response audio to the peripheral end.

In this embodiment, to save the terminal's power, or when the user is using the terminal so that it is inconvenient for the terminal to play the response audio, the terminal can, after receiving the first response audio for the first audio, send the first response audio to the peripheral end, which plays it. Specifically, in this implementation the peripheral end may be a peripheral end with an audio playback function, such as a Bluetooth speaker or a sports bracelet.

It is conceivable that the terminal may also play the received first response audio itself.

S205, peripheral hardware end play the first response audio.

S206, peripheral hardware end send the second audio to terminal.

In this embodiment, after the peripheral end plays the first response audio, the user learns the meaning of the first response audio sent by the server. In reply to the first response audio, the user can therefore say a sentence that characterizes the user intention. The peripheral end collects this sentence, which is the second audio, and sends it to the terminal in order to obtain the response audio that answers the first audio.

For example, after the peripheral end plays the first response audio "Which day's weather in Beijing would you like to know?", the second audio it collects is "tomorrow", and it sends this second audio "tomorrow" to the terminal.

S207, the terminal receives the second audio sent by the peripheral end and sends the second audio to the server.

In this embodiment, after receiving the second audio, the terminal sends it to the server in order to obtain the response audio that answers the first audio. For example, it sends the second audio "tomorrow" to the server.

S208, the server sends the second response audio to the terminal.

After receiving the first audio, for example "the weather in Beijing", the server can determine that the user intends to ask about the weather in Beijing. After receiving the second audio "tomorrow", because the second audio characterizes the user intention, the server can determine that the user intention is "tomorrow's weather in Beijing". It is worth noting that, after receiving the second audio, the server can parse it in the same way as described above for the first audio.

In this embodiment, based on the determined user intention of the first audio, the server can return to the user a second response audio about tomorrow's weather in Beijing, such as "Beijing will be sunny tomorrow, with a temperature of 20 degrees", and it sends this second response audio to the terminal.

It is worth noting that, in this embodiment, once the server can determine the semantics of the first audio after obtaining the second audio, it returns the second response audio for the first audio. If, after receiving the second audio, the server still cannot determine the user intention of the first audio, it can continue by sending a second response audio to the terminal that again requests the user intention of the first audio. In other words, the server, the peripheral end and the terminal in this embodiment can interact over multiple rounds until the server obtains the user intention of the first audio and sends the terminal the response audio for that user intention.
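
The multi-round behaviour described above can be sketched as a loop that keeps asking until nothing is missing. Everything in this sketch is hypothetical: text stands in for audio, and the `DialogServer` class, its word lists and its slot names are illustrative only.

```python
# Hypothetical sketch of multi-round interaction. Text stands in for audio.
class DialogServer:
    REQUIRED = ("city", "date")  # assumed information needed for a weather query
    KNOWN_CITIES = {"Beijing"}
    KNOWN_DATES = {"today", "tomorrow"}

    def __init__(self):
        self.slots = {}

    def handle(self, utterance):
        """Update the slots from one utterance; return (response_text, resolved)."""
        for word in utterance.replace("?", "").split():
            if word in self.KNOWN_CITIES:
                self.slots["city"] = word
            elif word in self.KNOWN_DATES:
                self.slots["date"] = word
        missing = [name for name in self.REQUIRED if name not in self.slots]
        if missing:
            # Intention still incomplete: ask for the first missing piece.
            return f"Please tell me the {missing[0]}.", False
        return f"Weather for {self.slots['city']} {self.slots['date']}: (forecast)", True


def run_dialog(server, utterances):
    """Feed utterances one round at a time and stop once the intention is resolved."""
    responses = []
    for utterance in utterances:
        text, resolved = server.handle(utterance)
        responses.append(text)
        if resolved:
            break
    return responses


responses = run_dialog(DialogServer(), ["How is the weather in Beijing?", "tomorrow"])
```

The server object keeps the information gathered in earlier rounds, which is what lets a bare reply such as "tomorrow" complete the intention of the first audio.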

S209, the terminal sends the second response audio to the peripheral end.

In this embodiment, after receiving the second response audio, the terminal can send it to the peripheral end so that the peripheral end plays the second response audio.

It is conceivable that, after receiving the second response audio, the terminal may also play the received second response audio itself.

S210, the peripheral end plays the second response audio; the second response audio is the response audio that the server obtains based on the user intention.

This embodiment uses the peripheral end for sound pickup. Compared with the prior art, in which the terminal interacts directly with the server to obtain response audio, this has two advantages. On the one hand, because the terminal's pickup capability is limited, it may not pick up sound accurately, or may pick it up poorly, when the user is some distance away, whereas a peripheral end such as a vehicle-mounted bracket with a microphone (Mic) picks up sound better. On the other hand, the interaction between the terminal and Bluetooth devices becomes more diverse, which improves the user experience.

The voice interaction method provided in this embodiment includes: receiving the first audio sent by the peripheral end and sending it to the server; receiving the first response audio sent by the server and sending it to the peripheral end so that the peripheral end plays the first response audio, the first response audio being used to determine the user intention of the user corresponding to the terminal; receiving the second audio sent by the peripheral end and sending it to the server, the second audio being used to characterize the user intention; and receiving the second response audio sent by the server and sending it to the peripheral end so that the peripheral end plays the second response audio, the second response audio being the response audio that the server obtains based on the user intention. This embodiment realizes multi-round voice interaction between the terminal and the server, enriches the interactive functions of the peripheral end and the terminal, and improves the user experience.
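
From the terminal's point of view, the method summarized above amounts to relaying audio in both directions. The sketch below shows that relay with stand-ins for the server and the peripheral end; the `Terminal` class, its callables and the byte strings are assumptions for illustration, not an actual Bluetooth or network interface.

```python
# Hypothetical sketch of the terminal acting as a relay between peripheral end and server.
class Terminal:
    def __init__(self, send_to_server, play_on_peripheral):
        self.send_to_server = send_to_server          # callable: audio -> response audio
        self.play_on_peripheral = play_on_peripheral  # callable: response audio -> None

    def on_audio_from_peripheral(self, audio):
        """Forward captured audio to the server and relay the response for playback."""
        response = self.send_to_server(audio)
        self.play_on_peripheral(response)
        return response


# Stand-ins: a dictionary plays the server, a list records what the peripheral end plays.
played = []
fake_server = {b"first audio": b"first response", b"second audio": b"second response"}
terminal = Terminal(fake_server.__getitem__, played.append)
terminal.on_audio_from_peripheral(b"first audio")   # round 1: server asks for details
terminal.on_audio_from_peripheral(b"second audio")  # round 2: server answers
```

Passing the transport functions in as callables keeps the relay logic separate from how the audio actually travels.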

The specific process by which the server obtains the user intention is described below. Fig. 3 is flow diagram 2 of the voice interaction method provided by the present invention. As shown in Fig. 3, the voice interaction method provided in this embodiment includes:

S301, the peripheral end sends the first audio to the terminal.

S302, the terminal receives the first audio sent by the peripheral end and sends it to the server.

S303, the server sends the first response audio to the terminal; the first response audio is used to request the information needed to determine the semantics of the first audio.

In this embodiment, after receiving the first audio sent by the terminal, the server can parse the first audio; for the specific parsing process, refer to the description of S203 in the above embodiment. The first response audio in this embodiment is used to request the information needed to determine the semantics of the first audio. The user's intention includes the semantics, tone, mood and so on of the first audio; the first response audio used to determine the user intention in the above embodiment may also be an active recommendation by the server.

For example, if the text corresponding to the first audio is "How is the weather in Beijing", a tokenizer splits the text into several words; the resulting words may be "Beijing", "的" (a possessive particle), "weather" and "how". In this embodiment, optionally, the target words that carry the effective information can be obtained according to the part of speech of the segmented words. For example, quantifiers, adverbs, adjectives and the like are removed from the segmented conversation message, and the words that remain, such as nouns and verbs, are the target words. Removing "how" and "的" from the segmentation result above leaves the target words "Beijing" and "weather". From these target words the server determines that the semantics of the first audio is "the weather in Beijing".

At this point the server knows that the user wants the weather in Beijing, but it cannot determine which day's weather the user wants. When the server cannot fully determine the semantics of the first audio, it can return a first response audio to the terminal based on the semantics already determined from the first audio. For example, the first response audio that the server sends to the terminal is "Which day's weather in Beijing would you like to know?".

S304, the terminal sends the first response audio to the peripheral end.

S305, the peripheral end plays the first response audio.

S306, the peripheral end sends the second audio to the terminal.

In this embodiment, after the peripheral end plays the first response audio, the user learns the meaning of the first response audio sent by the server. In reply to the first response audio, the user can therefore say a sentence that characterizes the semantics of the first audio. The peripheral end collects this sentence, which is the second audio, and sends it to the terminal in order to obtain the response audio that answers the first audio.

S307, the terminal receives the second audio sent by the peripheral end and sends the second audio to the server.

S308, the server sends the second response audio to the terminal.

After receiving the first audio, for example "the weather in Beijing", the server can determine that the user intends to ask about the weather in Beijing. After receiving the second audio "tomorrow", because the second audio characterizes the semantics of the first audio, the server can determine that the semantics of the first audio is "tomorrow's weather in Beijing". It is worth noting that, after receiving the second audio, the server can parse it in the same way as described above for the first audio.

In this embodiment, based on the determined semantics of the first audio, the server can return to the user a second response audio about tomorrow's weather in Beijing, such as "Beijing will be sunny tomorrow, with a temperature of 30 degrees", and it sends this second response audio to the terminal.

It is worth noting that, in this embodiment, once the server can determine the semantics of the first audio after obtaining the second audio, it returns the second response audio for the first audio. If, after receiving the second audio, the server still cannot determine the semantics of the first audio, it can continue by sending a second response audio to the terminal that again requests the semantics of the first audio. In other words, the server, the peripheral end and the terminal in this embodiment can interact over multiple rounds until the server obtains the semantics of the first audio and sends the terminal the response audio for those semantics.

S309, the terminal sends the second response audio to the peripheral end.

S310, the peripheral end plays the second response audio; the second response audio is the response audio that the server obtains once it has determined the semantics of the first audio.

The voice interaction method provided in this embodiment includes: receiving the first audio sent by the peripheral end and sending the first audio to the server; receiving the first response audio sent by the server, the first response audio being used to request the information needed to determine the semantics of the first audio; sending the first response audio to the peripheral end so that the peripheral end plays the first response audio; receiving the second audio sent by the peripheral end and sending the second audio to the server, the second audio being used to characterize the semantics of the first audio; and receiving the second response audio sent by the server and either playing the second response audio or sending it to the peripheral end so that the peripheral end plays it, the second response audio being the response audio that the server obtains once it has determined the semantics of the first audio. This embodiment realizes multi-round voice interaction between the terminal and the server, enriches the interactive functions of the peripheral end and the terminal, and improves the user experience.

In the present invention, the terminal or the peripheral end needs to be woken up before the above voice interaction is carried out. The following embodiments describe in detail how the terminal or the peripheral end is woken up.

On the basis of the above embodiments, how the terminal is woken up in the voice interaction method provided by the present invention is described below with reference to Fig. 4. Fig. 4 is flow diagram 3 of the voice interaction method provided by the present invention. As shown in Fig. 4, the voice interaction method provided in this embodiment may include:

S401, the terminal establishes a DMA connection with the peripheral end.

In the prior art, a Bluetooth connection between a terminal and a peripheral Bluetooth device is established as follows. The terminal uses the existing Bluetooth scanning mode, namely Bluetooth Low Energy (BLE) scanning, to find connectable Bluetooth devices, and first establishes a BLE connection with the Bluetooth device. After this connection is established, the Bluetooth device returns a response message to the terminal indicating that the terminal can connect to the Bluetooth device over an RFCOMM link that supports the RFCOMM protocol. After receiving the response message, the terminal disconnects the BLE connection with the Bluetooth device and connects to it again over the RFCOMM link. With this prior-art connection approach, the success rate and speed of the RFCOMM connection are affected while the BLE link is in its normal state.

The peripheral end in this embodiment supports the DMA protocol. The way the terminal and the peripheral end establish a DMA connection in this embodiment is briefly as follows: while the terminal is scanning, a peripheral end that supports the DMA protocol sends a broadcast packet to the terminal, and the broadcast packet contains identification information indicating that the peripheral end supports DMA connection. The terminal then connects to the peripheral end directly over the RFCOMM link. This solves the prior-art problem that the success rate and speed of the RFCOMM connection are affected while the BLE link is in its normal state.
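
The difference between the two connection paths can be sketched as a decision on the broadcast packet. The packet layout is not given in the text, so the `BroadcastPacket` class, its `supports_dma` field and the step names below are purely hypothetical; no real Bluetooth API is used.

```python
from dataclasses import dataclass


@dataclass
class BroadcastPacket:
    device_name: str
    supports_dma: bool  # assumed field standing in for the DMA identification information


def connection_plan(packet):
    """Return the ordered link steps the terminal would take for this peripheral end."""
    if packet.supports_dma:
        # DMA path: connect over RFCOMM directly, without a BLE connection first.
        return ["rfcomm_connect"]
    # Prior-art path as described in the text: BLE first, then switch to RFCOMM.
    return ["ble_connect", "ble_disconnect", "rfcomm_connect"]
```

The DMA path skips the intermediate BLE connection entirely, which is the step the text identifies as affecting the RFCOMM connection.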

S402, the terminal sends a sound pickup instruction to the peripheral end; the sound pickup instruction instructs the peripheral end to start sound pickup.

The peripheral end in this embodiment has a sound pickup function. Specifically, the peripheral end may be a device with a sound pickup function such as a vehicle-mounted bracket with a microphone (Mic), a Bluetooth speaker, a Bluetooth headset, or a light-emitting diode (LED) lamp.

After the terminal has established a Bluetooth connection or a DMA connection with the peripheral end, a user who needs voice interaction, for example to query the weather or play a song, can operate the terminal's interface to trigger the terminal to send the sound pickup instruction to the peripheral end. Fig. 5 is a schematic diagram of the terminal interface provided by the present invention. As shown in Fig. 5, after establishing a connection with the peripheral end, the terminal can display on its interface the name of the peripheral end, such as peripheral end A, together with a "start sound pickup" control. The user selects the "start sound pickup" control by clicking or by another operation, which triggers the terminal to send the sound pickup instruction to the peripheral end; the sound pickup instruction instructs the peripheral end to start sound pickup.

S403, the terminal receives the third audio sent by the peripheral end; if the third audio contains the wake-up word corresponding to the terminal, the terminal enters the wake-up state.

In this embodiment, the terminal parses the third audio received from the peripheral end. Specifically, the parsing process may be as follows: the terminal converts the acquired third audio into text using a conversion method from the prior art.

The terminal judges whether the received third audio contains the preset wake-up word. The wake-up word is used to wake up the terminal, specifically to wake up the interaction between the terminal and the server. Correspondingly, the terminal judges whether the wake-up word appears in the text corresponding to the third audio. When the terminal determines that the third audio contains the wake-up word, it enters the wake-up state; that is, the terminal can send to the server the audio that follows the third audio carrying the wake-up word.

For example, if the wake-up word is "Xiaodu", then when "Xiaodu" appears in the text corresponding to the third audio, the terminal determines that the third audio carries the wake-up word "Xiaodu" and enters the wake-up state.
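
The wake-up check on the transcribed third audio can be sketched as a simple substring test. The `TerminalWakeState` class is hypothetical, the speech-to-text step is assumed to have happened already, and the wake-up word is passed in as a parameter because its exact form depends on the product.

```python
# Hypothetical sketch: enter the wake-up state when the transcript contains the wake-up word.
class TerminalWakeState:
    def __init__(self, wake_word):
        self.wake_word = wake_word
        self.awake = False

    def on_third_audio_text(self, transcript):
        """Check one transcript; once awake, the terminal stays awake."""
        if self.wake_word in transcript:
            self.awake = True
        return self.awake


state = TerminalWakeState("Xiaodu")  # wake-up word chosen for illustration
```

A production system would typically run keyword spotting on the audio itself rather than on a full transcript; the substring test only mirrors the text-based check described above.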

S404, the peripheral end sends the first audio to the terminal.

S405, the terminal receives the first audio sent by the peripheral end and sends the first audio to the server.

S406, the server sends the first response audio to the terminal; the first response audio is used to determine the user intention of the user corresponding to the terminal.

Optionally, the first response audio in this embodiment is used to request the information needed to determine the semantics of the first audio.

S407, the terminal sends the first response audio to the peripheral end.

S408, the peripheral end plays the first response audio.

S409, the peripheral end sends the second audio to the terminal.

Optionally, the second audio in this embodiment is used to characterize the semantics of the first audio.

S410, the terminal receives the second audio sent by the peripheral end and sends the second audio to the server.

S411, the terminal receives a stop-sending message sent by the server; the stop-sending message instructs the terminal to stop sending audio to the server.

In this embodiment, a first preset duration is set in the server. After receiving the second audio sent by the terminal, if the server does not receive a fourth audio from the terminal within the first preset duration, it determines that the user has finished speaking, obtains the corresponding response audio according to the second audio, and sends a stop-sending message to the terminal. The stop-sending message instructs the terminal to stop sending audio to the server. Specifically, after receiving the stop-sending message sent by the server, the terminal no longer sends new audio to the server.
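
The server-side timeout can be sketched with explicit timestamps, which keeps the example deterministic. The `EndOfSpeechDetector` class and the 1.5-second value are assumptions; the text does not specify the length of the first preset duration.

```python
# Hypothetical sketch: decide when the server should send the stop-sending message.
class EndOfSpeechDetector:
    def __init__(self, first_preset_duration):
        self.first_preset_duration = first_preset_duration
        self.last_audio_time = None

    def on_audio(self, timestamp):
        """Record the arrival time of the latest audio from the terminal."""
        self.last_audio_time = timestamp

    def should_stop(self, now):
        """True once no new audio has arrived within the first preset duration."""
        if self.last_audio_time is None:
            return False  # nothing received yet, so there is nothing to end
        return now - self.last_audio_time >= self.first_preset_duration


detector = EndOfSpeechDetector(first_preset_duration=1.5)  # seconds, assumed value
detector.on_audio(10.0)  # second audio arrives at t = 10.0 s
```

Calling `on_audio` again when further audio arrives restarts the countdown, so the stop-sending message is only due after a full period of silence.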

S412, the terminal sends a stop sound pickup message to the peripheral end; the stop sound pickup message instructs the peripheral end to stop sound pickup.

The peripheral end in this embodiment is a peripheral end whose sound pickup can be controlled. After receiving the stop-sending message sent by the server, the terminal can send the stop sound pickup message to the peripheral end so that the peripheral end stops sound pickup, which reduces the power consumption of the peripheral end.

S413, the peripheral end stops sound pickup.

S414, the server sends the second response audio to the terminal.

Optionally, the second response audio is the response audio that the server obtains based on the semantics of the first audio.

S415, the terminal sends the second response audio to the peripheral end.

S416, the peripheral end plays the second response audio.

S417, if the terminal does not receive a fourth audio sent by the peripheral end within a second preset duration, the terminal enters the non-wake-up state and sends a non-wake-up state message to the peripheral end.

In this embodiment, a second preset duration is stored in the terminal. If, within the second preset duration after receiving the second response audio sent by the server, the terminal does not receive a fourth audio, that is, new audio, from the peripheral end, it determines that the user has no new voice interaction demand. The terminal then enters the non-wake-up state and also sends a non-wake-up state message to the peripheral end. Because it is the terminal that was woken up in this embodiment, entering the non-wake-up state may mean that the terminal enters a dormant state. Specifically, the non-wake-up state may be an energy-saving mode, which reduces the power consumption of the terminal when there is no voice interaction.

S418, the peripheral end receives the non-wake-up state message sent by the terminal; if it does not collect a second wake-up audio containing the wake-up word within a time threshold, it enters the dormant state.

In this embodiment, after receiving the non-wake-up state message sent by the terminal, the peripheral end determines that the terminal's voice interaction is complete. Specifically, if the peripheral end does not collect a second wake-up audio containing the wake-up word within the time threshold after receiving the non-wake-up state message, it determines that the user has no voice interaction demand and enters the dormant state.
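
The two idle timeouts, the second preset duration on the terminal and the time threshold on the peripheral end, follow the same pattern and can be sketched with one small helper. The `IdleTracker` class, the state names and the numeric values are assumptions for illustration; timestamps are passed in explicitly so the behaviour is deterministic.

```python
# Hypothetical sketch: switch to an idle state after a period without new audio.
class IdleTracker:
    def __init__(self, timeout, active_state, idle_state):
        self.timeout = timeout
        self.active_state = active_state
        self.idle_state = idle_state
        self.state = active_state
        self.idle_since = None

    def start_waiting(self, now):
        """Start the countdown, e.g. when the second response audio arrives."""
        self.idle_since = now

    def on_audio(self, now):
        """New audio arrived: become (or stay) active and restart the countdown."""
        self.state = self.active_state
        self.idle_since = now

    def tick(self, now):
        """Check the countdown and switch to the idle state once it has expired."""
        if self.idle_since is not None and now - self.idle_since >= self.timeout:
            self.state = self.idle_state
        return self.state


terminal = IdleTracker(timeout=8.0, active_state="awake", idle_state="not awake")
peripheral = IdleTracker(timeout=5.0, active_state="awake", idle_state="dormant")

terminal.start_waiting(now=100.0)            # second response audio received
if terminal.tick(now=108.0) == "not awake":  # no fourth audio within the second preset duration
    peripheral.start_waiting(now=108.0)      # peripheral end receives the non-wake-up state message
```

The peripheral end starts its own countdown only when the non-wake-up state message arrives, so it goes dormant later than the terminal leaves the wake-up state.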

For the implementation of S404-S410 and S414-S416 in this embodiment, refer to the description of S301-S307 and S308-S310 in the above embodiment; no limitation is imposed here.

The terminal in this embodiment establishes a DMA connection with the peripheral end, which solves the prior-art problem that the success rate and speed of the RFCOMM connection are affected while the BLE link is in its normal state. After entering the wake-up state, the terminal in this embodiment can carry out multi-round interaction with the server, which enriches the interactive functions of the peripheral end and the terminal and improves the user experience. Further, if the terminal does not receive audio from the peripheral end within the second preset duration, it enters the non-wake-up state, and if the peripheral end does not collect a wake-up audio containing the wake-up word within the time threshold, it enters the dormant state, so that the power consumption of the terminal and the peripheral end is reduced when there is no voice interaction. Furthermore, this embodiment realizes rapid voice interaction: the user can begin voicing a request as soon as the terminal has been woken up, which further improves the user experience.

The method of waking up the peripheral end in the voice interaction method provided by the present invention is described below with reference to Fig. 6. Fig. 6 is flow diagram 4 of the voice interaction method provided by the present invention. As shown in Fig. 6, the voice interaction method provided in this embodiment may include:

S601, the terminal establishes a DMA connection with the peripheral end.

S602, the peripheral end enters the wake-up state.

In this embodiment, the peripheral end can be woken up either by a control or by a wake-up word.

After the terminal and the peripheral end have established the DMA connection, a user who wants voice interaction says the wake-up word of the peripheral end, or a sentence containing the wake-up word, to wake up the peripheral end. The first wake-up audio is audio containing the wake-up word corresponding to the peripheral end; after collecting the first wake-up audio, the peripheral end enters the wake-up state.

For example, if the wake-up word of the peripheral end is "Xiaodu", then when the user says "Xiaodu" or "Xiaodu, wake up", the peripheral end collects this first wake-up audio, determines that the audio contains its wake-up word, and enters the wake-up state.

Alternatively, in this embodiment a wake-up control is provided on the peripheral end; the wake-up control is used to trigger waking up the peripheral end. When the user needs voice interaction, the terminal establishes a DMA connection with the peripheral end. After the DMA connection is established, the user selects the wake-up control by clicking or by another operation, and the peripheral end enters the wake-up state after receiving the user's operation instruction on the wake-up control.

It is worth noting that the wake-up control provided on the peripheral end may be a mechanical button or a wake-up control shown on the display interface of the peripheral end.

S603, the terminal receives a start sound pickup message sent by the peripheral end; the start sound pickup message indicates that the peripheral end is in the wake-up state and has started sound pickup.

S604, the peripheral end sends the first audio to the terminal.

S605, the terminal receives the first audio sent by the peripheral end and sends the first audio to the server.

S606, the server sends the first response audio to the terminal.

S607, the terminal sends the first response audio to the peripheral end.

S608, the peripheral end plays the first response audio.

S609, the peripheral end sends the second audio to the terminal.

S610, the terminal receives the second audio sent by the peripheral end and sends the second audio to the server.

S611, the terminal receives a stop-sending message sent by the server; the stop-sending message instructs the terminal to stop sending audio to the server.

S612, the terminal sends a stop sound pickup message to the peripheral end; the stop sound pickup message instructs the peripheral end to stop sound pickup.

S613, the peripheral end stops sound pickup.

S614, the server sends the second response audio to the terminal.

S615, the terminal sends the second response audio to the peripheral end.

S616, the peripheral end plays the second response audio.

S617, if the terminal does not receive a fourth audio sent by the peripheral end within the second preset duration, the terminal enters the non-wake-up state and sends a non-wake-up state message to the peripheral end.

S618, the peripheral end receives the non-wake-up state message sent by the terminal; if it does not collect a second wake-up audio containing the wake-up word within the time threshold, it enters the dormant state.

For the implementation of S601 and S605-S618 in this embodiment, refer to the description of S401 and S405-S418 in the above embodiment; no limitation is imposed here.

In this embodiment, voice interaction is initiated after the peripheral end enters the wake-up state, which realizes multi-round interaction between the terminal and the server, enriches the interactive functions of the peripheral end and the terminal, and improves the user experience.

Fig. 7 is structural schematic diagram 1 of a voice interaction device provided by the present invention. As shown in Fig. 7, the voice interaction device 700 includes: a first audio processing module 701, a first response audio processing module 702, a second audio processing module 703 and a second response audio processing module 704.

The first audio processing module 701 is configured to receive the first audio sent by the peripheral end and send it to the server.

The first response audio processing module 702 is configured to receive the first response audio sent by the server and send it to the peripheral end, so that the peripheral end plays the first response audio; the first response audio is used to determine the user intention of the user corresponding to the voice interaction device.

The second audio processing module 703 is configured to receive the second audio sent by the peripheral end and send it to the server; the second audio is used to characterize the user intention.

The second response audio processing module 704 is configured to receive the second response audio sent by the server and send it to the peripheral end, so that the peripheral end plays the second response audio; the second response audio is the response audio obtained by the server based on the user intention.

The principle and technical effect of the voice interaction device provided in this embodiment are similar to those of the above voice interaction method, and are not repeated here.
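
The relay role of modules 701-704 can be sketched as follows. This is a simplified illustration under stated assumptions: `TerminalRelay`, `fake_server` and the byte strings are hypothetical names and data, and the server reply is modelled as a synchronous return value, whereas in the embodiments the server sends its response as a separate step.

```python
from typing import Callable, List


class TerminalRelay:
    """Hypothetical terminal that relays audio between a peripheral end and a server."""

    def __init__(
        self,
        send_to_server: Callable[[bytes], bytes],
        send_to_peripheral: Callable[[bytes], None],
    ) -> None:
        self._send_to_server = send_to_server
        self._send_to_peripheral = send_to_peripheral

    def relay(self, audio_from_peripheral: bytes) -> bytes:
        # Audio processing module: forward the peripheral end's audio to the server.
        response = self._send_to_server(audio_from_peripheral)
        # Response audio processing module: forward the reply for playback.
        self._send_to_peripheral(response)
        return response


def fake_server(audio: bytes) -> bytes:
    # Stand-in server: first asks a clarifying question, then answers.
    if audio == b"play a song":
        return b"which song?"
    return b"playing: " + audio


played: List[bytes] = []  # what the peripheral end would play
terminal = TerminalRelay(fake_server, played.append)
terminal.relay(b"play a song")  # round 1: first audio, first response audio
terminal.relay(b"song A")       # round 2: second audio, second response audio
```

The two calls to `relay` correspond to the two rounds of interaction: the first response audio asks the user to state the intention, and the second response audio is produced from that intention.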

Optionally, Fig. 8 is a second structural schematic diagram of a voice interaction device provided by the present invention. As shown in Fig. 8, the voice interaction device 700 includes: a sound pickup instruction sending module 705, a third audio receiving module 706, a start sound pickup message receiving module 707, a stop sound pickup module 708 and a non-wake state message sending module 709.

The sound pickup instruction sending module 705 is configured to send a sound pickup instruction to the peripheral end; the sound pickup instruction instructs the peripheral end to start sound pickup.

The third audio receiving module 706 is configured to receive the third audio sent by the peripheral end; if the third audio contains the wake-up information corresponding to the voice interaction device, the voice interaction device enters the wake-up state.

The start sound pickup message receiving module 707 is configured to receive the start sound pickup message sent by the peripheral end; the start sound pickup message indicates that the peripheral end is in the wake-up state and has started sound pickup.

The stop sound pickup module 708 is configured to receive the stop sending message sent by the server, where the stop sending message instructs the voice interaction device to stop sending audio to the server and is sent by the server when, within a first preset duration after receiving the second audio, it has not received a fourth audio sent by the voice interaction device; and to send a stop sound pickup message to the peripheral end, where the stop sound pickup message instructs the peripheral end to stop sound pickup.

The non-wake state message sending module 709 is configured to enter the non-wake state and send a non-wake state message to the peripheral end if the fourth audio sent by the peripheral end is not received within the second preset duration.
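
The behaviour of the stop sound pickup module 708 can be sketched in two parts: the server-side decision based on the first preset duration, and the terminal's reaction to the stop sending message. The function `server_should_stop`, the class `Terminal`, the message string `STOP_PICKUP` and all numeric values are assumptions made for illustration; the embodiments do not define them.

```python
from typing import List, Optional


def server_should_stop(
    second_audio_at: float,
    fourth_audio_at: Optional[float],
    now: float,
    first_preset_duration: float,
) -> bool:
    """Return True if no fourth audio arrived within the window after the second audio."""
    deadline = second_audio_at + first_preset_duration
    if fourth_audio_at is not None and fourth_audio_at <= deadline:
        return False  # a fourth audio arrived in time, keep the session open
    return now > deadline


class Terminal:
    """Hypothetical terminal reacting to the server's stop sending message."""

    def __init__(self) -> None:
        self.sending_to_server = True
        self.sent_to_peripheral: List[str] = []

    def on_stop_sending(self) -> None:
        # Stop forwarding audio to the server, then tell the peripheral end to stop pickup.
        self.sending_to_server = False
        self.sent_to_peripheral.append("STOP_PICKUP")


terminal = Terminal()
if server_should_stop(second_audio_at=0.0, fourth_audio_at=None,
                      now=4.0, first_preset_duration=3.0):
    terminal.on_stop_sending()
```

In this sketch the window of 3 seconds has elapsed at time 4 without a fourth audio, so the terminal stops sending audio to the server and queues a stop sound pickup message for the peripheral end.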

Optionally, the first response audio is used to request confirmation of the semantics of the first audio, and the semantics of the first audio are used to characterize the user intention; the second audio is used to characterize the semantics of the first audio, and the second response audio is the response audio obtained by the server based on the semantics of the first audio.

Fig. 9 is a third structural schematic diagram of a voice interaction device provided by the present invention. As shown in Fig. 9, the voice interaction device 900 includes: a memory 901 and at least one processor 902.

The memory 901 is configured to store program instructions.

The processor 902 is configured to implement the voice interaction method of this embodiment when the program instructions are executed; for the specific implementation principle, refer to the above embodiments, which is not repeated here.

The voice interaction device 900 may further include an input/output interface 904.

The input/output interface 904 may include an independent output interface and an independent input interface, or may be an integrated interface combining input and output. The output interface is used to output data and the input interface is used to obtain input data; the output data is a general term for what is output in the above method embodiments, and the input data is a general term for what is input in the above method embodiments.

The present invention also provides a readable storage medium in which execution instructions are stored; when at least one processor of the voice interaction device executes the execution instructions, the voice interaction method in the above embodiments is implemented.

The present invention also provides a program product that includes execution instructions stored in a readable storage medium. At least one processor of the voice interaction device can read the execution instructions from the readable storage medium, and the at least one processor executes the execution instructions so that the voice interaction device implements the voice interaction method provided by the above embodiments.

Figure 10 is a first structural schematic diagram of another voice interaction device provided by the present invention; this voice interaction device is a peripheral end. As shown in Figure 10, the voice interaction device 1000 includes: a first audio sending module 1001, a playing module 1002 and a second audio sending module 1003.

The first audio sending module 1001 is configured to send the first audio to the terminal, so that the terminal sends the first audio to the server and the server returns the first response audio to the terminal according to the first audio; the first response audio is used to determine the user intention of the user corresponding to the terminal.

The playing module 1002 is configured to receive the first response audio sent by the terminal and play the first response audio.

The second audio sending module 1003 is configured to send the second audio to the terminal, so that the terminal sends the second audio to the server and the server returns the second response audio to the terminal; the second audio is used to characterize the user intention.

The playing module 1002 is further configured to receive the second response audio sent by the terminal and play the second response audio; the second response audio is the response audio obtained by the server based on the user intention.

Optionally, Figure 11 is a second structural schematic diagram of another voice interaction device provided by the present invention. As shown in Figure 11, the voice interaction device 1000 includes: a third audio sending module 1004, a start sound pickup message module 1005, a wake-up module 1006, a stop sound pickup module 1007 and a sleep module 1008.

The third audio sending module 1004 is configured to receive the sound pickup instruction sent by the terminal, where the sound pickup instruction instructs the voice interaction device to start sound pickup; and to send the third audio to the terminal, where the terminal enters the wake-up state if the third audio contains the wake-up information corresponding to the terminal.

The start sound pickup message module 1005 is configured to send a start sound pickup message to the terminal; the start sound pickup message notifies the terminal that the voice interaction device is in the wake-up state and has started sound pickup.

The wake-up module 1006 is configured to collect a first wake-up audio of the user and enter the wake-up state, where the first wake-up audio contains the wake-up information corresponding to the voice interaction device; or to receive the user's operation instruction on a wake-up control and enter the wake-up state, where the wake-up control is provided on the voice interaction device and is used to trigger waking up the voice interaction device.

The stop sound pickup module 1007 is configured to receive the stop sound pickup message sent by the terminal, and to stop sound pickup.

The sleep module 1008 is configured to receive the non-wake state message sent by the terminal, and to enter the dormant state if a second wake-up audio containing the wake-up word is not received within the time threshold.
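
The state transitions handled by modules 1005-1008 can be summarised as a small state machine. The sketch below is illustrative only: the state names, the class `PeripheralEnd`, the returned string `START_PICKUP` and the 10-second threshold are assumptions and do not appear in the embodiments.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    DORMANT = auto()
    AWAKE = auto()       # awake but not picking up sound
    PICKING_UP = auto()  # awake and picking up sound


class PeripheralEnd:
    """Hypothetical peripheral end with wake-up, stop pickup and sleep handling."""

    def __init__(self, time_threshold: float) -> None:
        self.state = State.DORMANT
        self.time_threshold = time_threshold  # seconds (assumed unit)
        self._sleep_deadline: Optional[float] = None

    def on_wake(self) -> str:
        # Wake-up audio or wake-up control: enter the wake-up state and start pickup.
        self.state = State.PICKING_UP
        self._sleep_deadline = None
        return "START_PICKUP"  # message to be sent to the terminal

    def on_stop_pickup(self) -> None:
        # Stop sound pickup message from the terminal.
        if self.state is State.PICKING_UP:
            self.state = State.AWAKE

    def on_non_wake_state(self, now: float) -> None:
        # Non-wake state message from the terminal: start the sleep countdown.
        self._sleep_deadline = now + self.time_threshold

    def tick(self, now: float) -> None:
        # No second wake-up audio within the threshold: enter the dormant state.
        if self._sleep_deadline is not None and now > self._sleep_deadline:
            self.state = State.DORMANT
            self._sleep_deadline = None


peripheral = PeripheralEnd(time_threshold=10.0)
message = peripheral.on_wake()
peripheral.on_stop_pickup()
peripheral.on_non_wake_state(now=0.0)
peripheral.tick(now=11.0)
```

Calling `on_wake` again before the deadline clears the countdown, which models a second wake-up audio arriving within the time threshold.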

Figure 12 is a third structural schematic diagram of another voice interaction device provided by the present invention. As shown in Figure 12, the voice interaction device 1200 includes: a memory 1201 and at least one processor 1202.

The memory 1201 is configured to store program instructions.

The processor 1202 is configured to implement the voice interaction method of this embodiment when the program instructions are executed; for the specific implementation principle, refer to the above embodiments, which is not repeated here.

The voice interaction device 1200 may further include an input/output interface 1203.

The input/output interface 1203 may include an independent output interface and an independent input interface, or may be an integrated interface combining input and output. The output interface is used to output data and the input interface is used to obtain input data; the output data is a general term for what is output in the above method embodiments, and the input data is a general term for what is input in the above method embodiments.

In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

The above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

In the above embodiments of the network device or terminal device, it should be understood that the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC) or the like. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A voice interaction method, applied to a terminal, characterized by comprising:

receiving a first audio sent by a peripheral end and sending the first audio to a server;

receiving a first response audio sent by the server and sending it to the peripheral end, so that the peripheral end plays the first response audio, the first response audio being used to determine the user intention of the user corresponding to the terminal;

receiving a second audio sent by the peripheral end and sending it to the server, the second audio being used to characterize the user intention;

receiving a second response audio sent by the server and sending it to the peripheral end, so that the peripheral end plays the second response audio, the second response audio being the response audio obtained by the server based on the user intention.

2. The method according to claim 1, characterized in that:

the first response audio is used to request confirmation of the semantics of the first audio, and the semantics of the first audio are used to characterize the user intention;

the second audio is used to characterize the semantics of the first audio, and the second response audio is the response audio obtained by the server based on the semantics of the first audio.

3. The method according to claim 1, characterized in that, before the receiving of the first audio sent by the peripheral end, the method comprises:

receiving a third audio sent by the peripheral end, wherein the terminal enters the wake-up state if the third audio contains the wake-up information corresponding to the terminal.

4. The method according to claim 1, characterized in that, before the receiving of the first audio sent by the peripheral end, the method comprises:

receiving a start sound pickup message sent by the peripheral end, the start sound pickup message indicating that the peripheral end is in the wake-up state and has started sound pickup.

5. The method according to any one of claims 1-4, characterized in that, after the receiving of the second audio sent by the peripheral end and sending it to the server, the method further comprises:

receiving a stop sending message sent by the server, the stop sending message instructing the terminal to stop sending audio to the server, wherein the server sends the stop sending message when, within a first preset duration after receiving the second audio, it has not received a fourth audio sent by the terminal;

sending a stop sound pickup message to the peripheral end, the stop sound pickup message instructing the peripheral end to stop sound pickup.

6. The method according to claim 3 or 4, characterized in that, after the receiving of the second response audio sent by the server and sending it to the peripheral end, the method further comprises:

if a fourth audio sent by the peripheral end is not received within a second preset duration, entering the non-wake state and sending a non-wake state message to the peripheral end.

7. A voice interaction method, applied to a peripheral end, characterized by comprising:

sending a first audio to a terminal, so that the terminal sends the first audio to a server and the server returns a first response audio to the terminal according to the first audio, the first response audio being used to determine the user intention of the user corresponding to the terminal;

receiving the first response audio sent by the terminal, and playing the first response audio;

sending a second audio to the terminal, so that the terminal sends the second audio to the server and the server returns a second response audio to the terminal, the second audio being used to characterize the user intention;

receiving the second response audio sent by the terminal, and playing the second response audio, the second response audio being the response audio obtained by the server based on the user intention.

8. The method according to claim 7, characterized in that:

9. The method according to claim 7, characterized in that, before the sending of the first audio to the terminal, the method further comprises:

sending a third audio to the terminal, wherein the terminal enters the wake-up state if the third audio contains the wake-up information corresponding to the terminal.

10. The method according to claim 7, characterized in that, before the sending of the first audio to the terminal, the method further comprises:

sending a start sound pickup message to the terminal, the start sound pickup message notifying the terminal that the peripheral end is in the wake-up state and has started sound pickup.

11. The method according to claim 10, characterized in that, before the sending of the start sound pickup message to the terminal, the method further comprises:

collecting a first wake-up audio of the user and entering the wake-up state, the first wake-up audio containing the wake-up information corresponding to the peripheral end; or,

receiving the user's operation instruction on a wake-up control and entering the wake-up state, the wake-up control being provided on the peripheral end and being used to trigger waking up the peripheral end.

12. The method according to any one of claims 7-11, characterized in that, after the sending of the second audio to the terminal, the method further comprises:

receiving a stop sound pickup message sent by the terminal;

stopping sound pickup.

13. The method according to claim 12, characterized in that, after the stopping of sound pickup, the method further comprises:

receiving a non-wake state message sent by the terminal;

if a second wake-up audio containing the wake-up information is not received within a time threshold, entering the dormant state.

14. A voice interaction device, characterized by comprising:

a first audio processing module, configured to receive a first audio sent by a peripheral end and send the first audio to a server;

a first response audio processing module, configured to receive a first response audio sent by the server and send it to the peripheral end, so that the peripheral end plays the first response audio, the first response audio being used to determine the user intention of the user corresponding to the voice interaction device;

a second audio processing module, configured to receive a second audio sent by the peripheral end and send it to the server, the second audio being used to characterize the user intention;

a second response audio processing module, configured to receive a second response audio sent by the server and send it to the peripheral end, so that the peripheral end plays the second response audio, the second response audio being the response audio obtained by the server based on the user intention.

15. A voice processing device, characterized by comprising:

a first audio sending module, configured to send a first audio to a terminal, so that the terminal sends the first audio to a server and the server returns a first response audio to the terminal according to the first audio, the first response audio being used to determine the user intention of the user corresponding to the terminal;

a first response audio processing module, configured to receive the first response audio sent by the terminal and play the first response audio;

a second audio sending module, configured to send a second audio to the terminal, so that the terminal sends the second audio to the server and the server returns a second response audio to the terminal, the second audio being used to characterize the user intention;

a second response audio processing module, configured to receive the second response audio sent by the terminal and play the second response audio, the second response audio being the response audio obtained by the server based on the user intention.

16. A terminal, characterized by comprising: at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the terminal performs the method according to any one of claims 1-6.

17. A peripheral end, characterized by comprising: at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the peripheral end performs the method according to any one of claims 7-13.

18. A computer-readable storage medium, characterized in that computer-executable instructions are stored on the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the method according to any one of claims 1-6 is implemented.

19. A computer-readable storage medium, characterized in that computer-executable instructions are stored on the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the method according to any one of claims 7-13 is implemented.