CN109949801A

CN109949801A - Earphone-based voice control method and system for smart home devices

Info

Publication number: CN109949801A
Application number: CN201910022445.2A
Authority: CN
Inventors: 揭东辉; 郎柳
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2019-01-10
Filing date: 2019-01-10
Publication date: 2019-06-28

Abstract

The invention discloses an earphone-based voice control method and system for smart home devices. In the method, an earphone receives a request voice uttered by a user and uploads it to a cloud server; the cloud server receives the request voice, parses it, and issues a corresponding instruction to a smart home central controller; the smart home central controller receives the instruction and operates the smart home device. The invention allows smart home devices to be controlled through an earphone using natural-language interaction. For indoor scenarios, it effectively addresses the problems that arise when several people share one smart home central controller: poor privacy, sound pickup devices that are inconvenient to move, and reduced recognition rates in noisy environments. It also avoids the cost of adding desktop speakers or wall switches for multi-room linkage.

Description

Earphone-based voice control method and system for smart home devices

[technical field]

The present invention relates to computer application technologies, and in particular to an earphone-based voice control method and system for smart home devices.

[background art]

With the continuous development of AI technology, AI has been widely applied in every area of daily life, and in particular in "smart home systems": a sound pickup device receives the user's voice command and sends it to a cloud service for speech recognition, and the control instruction obtained from that recognition is sent to the smart home device, thereby controlling it.

However, the sound pickup device of current "smart home systems" mostly takes the form of a "desktop speaker" or a "wall switch", which is inconvenient to move. In a noisy indoor environment with several people present, the speech recognition rate drops considerably; moreover, the user's commands can be heard by everyone, so privacy is poor. Multi-room linkage is also typically achieved by adding more "desktop speakers" or "wall switches", which is costly.

[summary of the invention]

Various aspects of the present application provide an earphone-based voice control method and system for smart home devices, which enable voice control of smart home devices through an earphone.

One aspect of the present application provides an earphone-based voice control method for smart home devices, comprising:

an earphone receiving a request voice uttered by a user and uploading the request voice to a cloud server;

the cloud server receiving the request voice, parsing it, and issuing a corresponding instruction to a smart home central controller;

the smart home central controller receiving the instruction and operating a smart home device.

In the aspect and any possible implementation described above, an implementation is further provided in which, before the earphone receives the request voice uttered by the user and uploads the request voice to the cloud server:

the earphone enters a wake-up state upon receiving a wake-up word uttered by the user or the user's operation of a key or touch area provided on the earphone.

In the aspect and any possible implementation described above, an implementation is further provided in which the earphone connects to the cloud server over the wireless network provided by a wireless router.

In the aspect and any possible implementation described above, an implementation is further provided in which uploading the request voice to the cloud server comprises:

reporting to the cloud server an event indicating that the earphone has been woken up and has started receiving the user's voice request, with the request voice uploaded along with the event.

In the aspect and any possible implementation described above, an implementation is further provided in which the cloud server receiving the request voice, parsing it, and issuing a corresponding instruction to the smart home central controller comprises:

the cloud server performing speech recognition on the request voice in the event to obtain corresponding request text;

performing intent recognition on the request text to obtain a corresponding intent;

issuing the instruction corresponding to the intent to the smart home central controller.

In the aspect and any possible implementation described above, an implementation is further provided in which the instruction is a smart home protocol instruction, a content playback protocol instruction, or a third-party service protocol instruction.

In the aspect and any possible implementation described above, an implementation is further provided in which the instruction includes a device identifier of the smart home device.

In the aspect and any possible implementation described above, an implementation is further provided in which the smart home central controller receiving the instruction and operating the smart home device comprises:

the smart home central controller controlling the corresponding smart home device according to the device identifier included in the control instruction.

In the aspect and any possible implementation described above, an implementation is further provided in which the method further comprises the following steps:

the smart home central controller returning to the cloud server the state of the smart home device after the operation is completed;

the cloud server returning a processing result to the earphone.

Another aspect of the present application discloses an earphone-based voice control system for smart home devices, comprising an earphone, a cloud server, and a smart home central controller, wherein

the earphone is configured to receive a request voice uttered by a user and upload the request voice to the cloud server;

the cloud server is configured to receive the request voice, parse it, and issue a corresponding instruction to the smart home central controller;

the smart home central controller is configured to receive the instruction and operate a smart home device.

In the aspect and any possible implementation described above, an implementation is further provided in which the earphone is further configured to enter a wake-up state upon receiving a wake-up word uttered by the user or the user's operation of a key or touch area provided on the earphone.

In the aspect and any possible implementation described above, an implementation is further provided in which the earphone is specifically configured to:

In the aspect and any possible implementation described above, an implementation is further provided in which the cloud server is specifically configured to:

perform speech recognition on the request voice in the event to obtain corresponding request text;

In the aspect and any possible implementation described above, an implementation is further provided in which the smart home central controller is specifically configured to:

control the corresponding smart home device according to the device identifier included in the control instruction.

In the aspect and any possible implementation described above, an implementation is further provided in which the smart home central controller is further configured to return to the cloud server the state of the smart home device after the operation is completed;

the cloud server is further configured to return a processing result to the earphone.

Another aspect of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described above when executing the program.

Another aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described above.

As can be seen from the above, the solution of the present invention enables voice control of smart home devices through an earphone.

[description of the drawings]

Fig. 1 is a flowchart of the earphone-based voice control method for smart home devices according to the present invention;

Fig. 2 is a structural diagram of the earphone-based voice control system for smart home devices according to the present invention;

Fig. 3 shows a block diagram of an exemplary computer system/server 012 suitable for implementing embodiments of the present invention.

[detailed description]

To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

Fig. 1 is a flowchart of an embodiment of the earphone-based voice control method for smart home devices according to the present invention. As shown in Fig. 1, the method comprises the following steps:

Step S11: the earphone receives the request voice uttered by the user and uploads the request voice to the cloud server;

Step S12: the cloud server receives the request voice, parses it, and issues a corresponding instruction to the smart home central controller;

Step S13: after receiving the instruction, the smart home central controller operates the smart home device.

In this embodiment, the earphone is a lightweight device, meaning intelligent hardware that runs a lightweight operating system such as FreeRTOS or mbedOS; such devices are low-cost, low-power, and portable. Preferably, the minimum hardware specification of the earphone is an MCU with an ARM Cortex-M3 core, a 120 MHz clock frequency, Flash capacity > 256 KB, and SRAM > 64 KB, with the MCU running FreeRTOS or mbedOS.

The lightweight device can access intelligent voice interaction platform services including, but not limited to, DuerOS. DuerOS is an open conversational artificial intelligence operating system; it provides an open platform, builds a voice AI ecosystem, and supports access by third-party developers. The lightweight device communicates with the cloud server using the DCS (DuerOS Conversational Service) protocol. DCS is the intelligent voice interaction service API that DuerOS opens to the public free of charge, exposing the intelligent voice interaction capabilities of DuerOS to all devices. By communicating with the server through this API, implementing the DCS client logic, and accessing the DuerOS service, the lightweight device gains all the interaction capabilities of DuerOS.

The DCS protocol consists of three parts: instructions, events, and client state. An instruction is issued by the DuerOS cloud server to the device and specifies an operation the device must perform, for example playing a voice prompt, setting an alarm, or playing music. An event is reported by the device to the server to notify it of something that has happened on the device, for example that music playback has started, music playback has finished, an alarm has started ringing, or the device has been woken up and has started receiving the user's voice request. Instructions and events are the basic elements of the DCS protocol: every change on the device is reported to the server through the corresponding event, and the server responds to user requests by issuing instructions to the device. When reporting an event, the device must include its client state information, for example whether music is currently playing and the playback position, whether an alarm is set, and the alarm state. For a user request, the server takes the device's current state into account to decide on a reasonable response and issue the corresponding instruction.

The earphone comprises a sound collection unit, a signal processing unit, a communication unit, a voice output unit, and so on.

Preferably, the earphone is a wireless earphone. Its communication unit either uses a WiFi SoC chip to connect to a wireless router and communicate with the cloud server through the router, or uses a BT (Bluetooth) SoC chip to connect to a mobile terminal and communicate with the cloud server via forwarding by the mobile terminal. The mobile terminal connects to the cloud server through a wireless router or a mobile network.

In a preferred implementation of step S11,

Preferably, when the user initiates a voice request, the earphone is woken up, receives the request voice uttered by the user, and uploads the request voice to the cloud server.

Preferably, the earphone may be in one of the following working states: dormant, waiting to be woken up; playing; or on a call. The user wakes the earphone by saying the wake-up word, touching a key, or other means, putting it into pickup mode so that it receives the request voice uttered by the user. Preferably, after entering pickup mode, the earphone terminates the current playback or call.
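The working states above can be sketched as a small state machine. This is an illustrative Python model only; the class and member names (EarphoneState, Earphone, wake, interrupted) are hypothetical and do not come from the application.

```python
from enum import Enum, auto


class EarphoneState(Enum):
    DORMANT = auto()   # waiting to be woken up
    PLAYING = auto()   # playing audio
    IN_CALL = auto()   # on a call
    PICKUP = auto()    # capturing the user's request voice


class Earphone:
    """Toy model of the working states described above (hypothetical names)."""

    def __init__(self, state=EarphoneState.DORMANT):
        self.state = state
        self.interrupted = None  # activity that was terminated when pickup began

    def wake(self):
        """Wake-up word, key press, or touch: enter pickup mode from any state."""
        if self.state in (EarphoneState.PLAYING, EarphoneState.IN_CALL):
            # Per the text, the current playback or call is terminated.
            self.interrupted = self.state
        self.state = EarphoneState.PICKUP
        return self.state
```

Recording the interrupted activity is a design choice of this sketch, not something the application requires; it merely makes the termination of playback or a call observable.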

Preferably, the signal processing unit of the earphone, for example a low-power DSP, wakes the earphone into the wake-up state upon receiving the wake-up word uttered by the user or the user's operation of a key or touch area provided on the earphone.

Preferably, after being woken up, the earphone enters pickup mode and obtains the request voice uttered by the user.

Preferably, the earphone performs speech detection on the audio data collected after the moment of wake-up, detects the speech segment in it, and takes that speech segment as the request voice uttered by the user.

Preferably, the earphone uploads the request voice to the cloud server, which is DuerOS. Using the DCS protocol, the earphone reports to the cloud server the event indicating that the earphone has been woken up and has started receiving the user's voice request (the ListenStarted event), and uploads the request voice along with it.

Preferably, the ListenStarted event is an HTTP request carrying a multipart message with two parts: the first is the JSON message corresponding to the ListenStarted event, and the second is the binary audio data stream of the voice, whose format is determined by payload.format in the JSON message of the ListenStarted event. The JSON message consists of two parts: client state (clientContext) and event (event).
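As a rough illustration of the two-part request described above, the following Python sketch assembles a multipart body by hand. Only clientContext, event, ListenStarted, and payload.format come from the text; the part names ("metadata", "audio"), the header/name layout inside the event, and the format string are assumptions made for the example and are not taken from the DCS specification.

```python
import json
import uuid


def build_listen_started(audio, audio_format, client_context):
    """Return (content_type, body) for a two-part ListenStarted request.

    Part names and the JSON layout beyond clientContext/event/payload.format
    are hypothetical.
    """
    boundary = uuid.uuid4().hex
    metadata = {
        "clientContext": client_context,              # client state
        "event": {
            "header": {"name": "ListenStarted"},      # assumed layout
            "payload": {"format": audio_format},      # decides the audio format
        },
    }
    parts = [
        # Part 1: the JSON message for the event.
        (b'Content-Disposition: form-data; name="metadata"\r\n'
         b"Content-Type: application/json; charset=UTF-8\r\n\r\n"
         + json.dumps(metadata).encode("utf-8")),
        # Part 2: the binary audio stream.
        (b'Content-Disposition: form-data; name="audio"\r\n'
         b"Content-Type: application/octet-stream\r\n\r\n" + audio),
    ]
    delimiter = b"--" + boundary.encode("ascii")
    body = b"".join(delimiter + b"\r\n" + p + b"\r\n" for p in parts)
    body += delimiter + b"--\r\n"                     # closing delimiter
    return "multipart/form-data; boundary=" + boundary, body
```

A real client would stream the second part rather than hold the whole audio in memory; the sketch builds the body in one piece only to keep the structure visible.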

Preferably, the sound collection unit of the earphone collects audio data, and the signal processing unit of the earphone performs wake-up detection, through the following sub-steps:

Sub-step A: collect audio data and cache it.

The sound collection unit on the earphone, for example a microphone, collects audio data from the earphone's environment for wake-up detection.

Preferably, the microphone may remain in the pickup state at all times (continuously sampling and quantizing audio data), collecting audio data from the earphone's environment for wake-up detection.

Preferably, depending on specific needs, for example to reduce the earphone's power consumption, the earphone may instead collect audio data from its environment at a predetermined period, for example once every 10 ms. The period of this periodic detection may be preset when the earphone leaves the factory or set by the user as needed.

In this embodiment, the audio data can be understood as the information corresponding to any sound in the earphone's environment that the microphone can collect, for example speech from people, including the user, and ambient noise.

Preferably, the sound collection unit on the earphone caches the collected audio data in a ring buffer. Note that a ring buffer (Ring Buffer or Circular Buffer) allows audio data to be read while it is being stored.
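A minimal ring buffer can be sketched as follows. This Python version is illustrative only: it is single-threaded, so it shows the overwrite-oldest behaviour rather than truly concurrent reading and writing, and the class name and methods are hypothetical.

```python
class RingBuffer:
    """Fixed-capacity byte ring buffer that overwrites the oldest data when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._start = 0  # index of the oldest valid byte
        self._size = 0   # number of valid bytes

    def write(self, data):
        for byte in data:
            end = (self._start + self._size) % self._capacity
            self._buf[end] = byte
            if self._size < self._capacity:
                self._size += 1
            else:
                # Full: the oldest byte was just overwritten, so move start on.
                self._start = (self._start + 1) % self._capacity

    def read(self, n):
        """Remove and return up to n of the oldest bytes."""
        n = min(n, self._size)
        out = bytes(self._buf[(self._start + i) % self._capacity] for i in range(n))
        self._start = (self._start + n) % self._capacity
        self._size -= n
        return out

    def __len__(self):
        return self._size
```

On an MCU the same structure would typically be a fixed array with head and tail indices filled by the audio interrupt or DMA; the Python form is only meant to make the indexing explicit.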

Sub-step B: perform speech detection on the collected audio data.

The signal processing unit of the earphone, for example a low-power DSP, includes a speech detection module that performs voice activity detection (VAD) on the audio data collected by the microphone. It can accurately detect where speech segments begin in the audio signal and thereby separate speech segments from non-speech segments (silence or noise).

VAD must run locally on the earphone in real time. Because computing resources are very limited, a threshold-based VAD is generally used; an engineering-optimized classification-based approach may also be used.
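One simple form of threshold-based VAD compares the energy of each audio frame with a fixed threshold. The Python sketch below assumes 16-bit little-endian PCM frames; the threshold of 500 and the two-frame hangover are arbitrary illustrative values, and the function names are hypothetical. The application does not specify which threshold scheme is used.

```python
import math
import struct


def frame_rms(frame):
    """Root-mean-square amplitude of a frame of 16-bit little-endian PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_speech_segments(frames, threshold=500.0, hangover=2):
    """Return (start, end) frame indices, end exclusive, of runs judged to be speech.

    A run ends only after more than `hangover` consecutive quiet frames, so
    short pauses inside an utterance do not split it.
    """
    frames = list(frames)
    segments = []
    start = None
    quiet = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover:
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # Audio ended while still inside a speech run; drop trailing quiet frames.
        segments.append((start, len(frames) - quiet))
    return segments
```

Energy thresholding is cheap enough for a small DSP but is sensitive to the noise floor, which is one reason a classifier-based detector might be preferred where resources allow.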

By performing speech detection on the collected audio data and detecting the speech segments in it, wake-up detection can be run on the speech segments only, which reduces power consumption.

Sub-step C: perform wake-up detection on the detected speech segments.

Preferably, the signal processing unit of the earphone, for example a low-power DSP, has a built-in voice wake-up engine that is always waiting for a voice wake-up command from the user and detects it with a wake-word algorithm.

In everyday use, before giving a voice operation command the user must first utter a wake-up command matching the earphone's wake-up word; for example, the user says "Xiaodu Xiaodu" to wake the earphone and only then gives the operation command.

The voice wake-up engine of the signal processing unit performs wake-up detection on the detected speech segments. Because the target is narrow (only the specified wake-up word needs to be detected), wake-up requires only a small acoustic model and decoding network (they only need to distinguish whether the wake-up word occurred). Acoustic scoring and decoding are therefore fast and take little memory, so they can be completed locally on the earphone in real time.

Preferably, because the user may say the wake-up word and the request voice together, for example "Xiaodu Xiaodu, turn on the kitchen light", the ring buffer can cache about 5 s of audio data, a length chosen with the typical duration of wake-up commands and voice commands in mind. In this way, the request voice can be obtained immediately after the wake-up command is detected; the user does not need to wait for the earphone to wake up and play a prompt tone before uttering the request voice.

The speech detection module of the signal processing unit performs voice activity detection (VAD) on the audio data collected by the microphone. It can accurately detect where speech segments begin in the audio signal, separate speech segments from non-speech segments (silence or noise), and take the speech segment as the request voice.

Preferably, the signal processing unit of the earphone, for example a low-power DSP, integrates a near-field speech noise reduction algorithm and applies it to the request voice, which effectively reduces environmental noise and improves the signal-to-noise ratio of the user's speech.

Because the earphone's microphone is close to the user, complex beamforming and sound source localization are generally unnecessary; signal processing such as noise reduction and echo processing is sufficient to obtain cleaner user speech.

Preferably, for each request voice the earphone generates a unique DialogRequestId that uniquely identifies the dialogue. After the cloud server issues the instruction corresponding to this dialogue to the smart home central controller, its reply to the earphone carries this ID.

To improve the response speed for user requests, the voice request is sent to the cloud server as soon as the user starts speaking, and the voice data stream is uploaded in a streaming manner in real time while the user is speaking, rather than waiting until the user has finished before making the request. Preferably, the voice data stream is uploaded in data blocks (chunks) of 10 milliseconds each.
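The per-request identifier and the 10 ms chunking can be sketched as below. The audio parameters (16 kHz, 16-bit, mono, which gives 320 bytes per 10 ms) are assumptions chosen for illustration; the application does not state the sample format. The function names are hypothetical, and a UUID is just one convenient way to obtain a unique identifier.

```python
import uuid


def new_dialog_request_id():
    """Generate a unique identifier for one dialogue (a UUID is one simple choice)."""
    return str(uuid.uuid4())


def iter_chunks(pcm, sample_rate=16000, sample_width=2, channels=1, chunk_ms=10):
    """Yield successive chunk_ms-long blocks of raw PCM; the last may be shorter."""
    bytes_per_chunk = sample_rate * sample_width * channels * chunk_ms // 1000
    for offset in range(0, len(pcm), bytes_per_chunk):
        yield pcm[offset:offset + bytes_per_chunk]
```

In a streaming upload each yielded block would be written to the open request as soon as it is captured, so recognition can begin before the user has finished speaking.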

Preferably, if the cloud server detects through VAD (Voice Activity Detection, used here to detect whether the user has finished speaking) that the user has finished, it issues a StopListen instruction to the earphone, and the earphone stops uploading the voice data stream immediately upon receiving it. Preferably, the earphone also stops listening and mutes the microphone immediately.

Because the earphone receives the request voice uttered by the user and sends it directly to the cloud server, the drawbacks of the pickup devices of existing smart home central controllers (high cost, poor privacy, inconvenient to move) are avoided. The earphone is easy to carry and has strong environmental noise reduction capability. In a noisy environment it can pick up the user's speech effectively and suppress environmental noise, improving speech recognition accuracy.

In a preferred implementation of step S12,

Preferably, the cloud server receives the request voice sent by the earphone, parses it (including speech recognition and intent recognition), and issues a corresponding instruction to the smart home central controller.

Preferably, to improve the success rate of speech recognition, the cloud server applies noise reduction to the request voice. The noise reduction on the earphone and the noise reduction on the cloud server may be used separately or together, so that high speech recognition accuracy is maintained even when environmental noise is heavy.

The cloud server first performs speech recognition on the request voice. In this embodiment, the request uttered by the user is taken to be a request to control a smart device. For example, if the request voice uttered by the user is "please turn on the kitchen light", the corresponding request text "please turn on the kitchen light" is obtained. Preferably, the request voice is converted into text information by an ASR (Automatic Speech Recognition) technology/engine.

The cloud server performs intent recognition on the request text to obtain the corresponding user intent, for example "turn on the kitchen light".

Preferably, natural language understanding is applied to the text information by an NLP technology/engine, and the user intent is processed.

Preferably, intent recognition is performed on the text information to obtain a target intent and slots; the corresponding instruction is generated according to the target intent and slots.
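To make "target intent and slots" concrete, the following toy Python recognizer maps a few English phrasings to an intent and a device slot using regular expressions. The patterns, intent names, and slot name are hypothetical; a production NLP engine would be far more capable, and nothing here reflects how DuerOS performs intent recognition.

```python
import re

# Hypothetical phrase patterns mapped to intents (illustration only).
_PATTERNS = [
    (re.compile(r"^(?:please\s+)?(?:turn on|open)\s+(?:the\s+)?(?P<device>.+)$", re.I), "TurnOn"),
    (re.compile(r"^(?:please\s+)?(?:turn off|close)\s+(?:the\s+)?(?P<device>.+)$", re.I), "TurnOff"),
]


def recognize_intent(text):
    """Return {"intent": ..., "slots": {"device": ...}}, or None if nothing matches."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, intent in _PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return {"intent": intent, "slots": {"device": match.group("device").lower()}}
    return None
```

For example, recognize_intent("Please turn on the kitchen light.") yields the intent "TurnOn" with the device slot "kitchen light", which is the information needed to generate a control instruction.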

Preferably, the cloud server issues the instruction corresponding to the user intent to the smart home central controller.

Voice requests fall roughly into three categories: audio/video content playback requests, which correspond to content playback protocol instructions; home device control requests, which correspond to smart home protocol instructions; and extended third-party service requests (such as "hailing a taxi" or "booking a hotel"), which correspond to third-party service protocol instructions. The present application takes smart home protocol instructions as an example.

Preferably, the smart home protocol instruction is a control instruction used to control a smart home device, covering: devices that can be turned on and off, controllable lighting devices, temperature-controllable devices, fan-speed-controllable devices, device mode setting, TV channel setting, volume-controllable devices, lockable devices, printing devices, suction-controllable devices, water-volume-controllable devices, charge-controllable devices, direction-controllable devices, height-controllable devices, and so on. Turn-on/turn-off operations include turning a device on, turning it on at a scheduled time, turning it off, turning it off at a scheduled time, and pausing it. For example, when the user wants to turn on a specified device, the cloud server issues a TurnonRequest instruction to the smart home central controller.

Preferably, the smart home protocol instruction may also be a device discovery instruction, which is used to look up the devices available to the user, the available scenes, and device grouping information. There are two such instructions: DiscoverAppliancesRequest, which issues the device lookup request, and DiscoverAppliancesResponse, which returns the devices found.

Preferably, the smart home protocol instruction may also be a query instruction (Query Message), which is mainly used to query information through the smart home device, such as air quality, air humidity, device temperature, and device state.

Preferably, the earphone and the smart home devices are identified by the SSID of the wireless router and thereby determined to belong to the same group of devices. For example, the earphone connects to the cloud server through the wireless router, and the smart home devices connect to the wireless router through the smart home central controller and thus to the cloud server. The earphone can control only the smart home devices under the same wireless router.

Before controlling a smart home device, the user must first discover it. When the user looks up smart home devices, the cloud server sends a DiscoverAppliancesRequest instruction to the smart home central controller; if the central controller finds smart home devices, it returns their information to the cloud server. In this way, when the cloud server later receives the user's request voice, it can look up the corresponding smart home device and issue the corresponding instruction to the smart home central controller. Preferably, the operating flow for discovering smart home devices is similar to that for turning on a smart home device and is not repeated here.

Preferably, the cloud server looks up the device identifier of the smart home device that is under the same wireless router as the earphone, generates the corresponding instruction from the intent and the device identifier, and sends it to the smart home central controller.
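The lookup and instruction generation described above might look like the following Python sketch. The registry layout and every field name (header, payload, name, messageId, applianceId, ssid, device_id) are hypothetical placeholders; the actual Header and Payload fields are those defined in Tables 1 and 2 of the application, which are not reproduced in this text.

```python
import uuid


def build_instruction(instruction_name, device_name, earphone_ssid, registry):
    """Build a control instruction for a device under the same router as the earphone.

    `registry` is a list of dicts describing discovered devices. All field
    names are hypothetical stand-ins for the fields of Tables 1 and 2.
    """
    for device in registry:
        # Only devices under the earphone's wireless router are eligible.
        if device["ssid"] == earphone_ssid and device["name"] == device_name:
            return {
                "header": {"name": instruction_name, "messageId": str(uuid.uuid4())},
                "payload": {"applianceId": device["device_id"]},
            }
    return None  # no matching device under this router
```

Returning None for an unmatched device leaves it to the caller to decide how to tell the user that nothing controllable was found.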

Preferably, the TurnonRequest instruction includes Header information and Payload information, wherein

The Header information is as described in Table 1:

Table 1

The Payload information is as described in Table 2:

Table 2

In a preferred implementation of step S13, after receiving the instruction, the smart home central controller operates the smart home device.

Preferably, the instruction is a control instruction, and the smart home central controller controls the corresponding smart home device according to the device identifier included in the control instruction, for example a TurnonRequest instruction. For example, if the device identifier in the TurnonRequest instruction is that of the kitchen light, the kitchen light is turned on.
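Dispatching by device identifier can be sketched as follows. The Lamp class, the instruction layout, and the applianceId field name are all hypothetical and serve only to show the controller selecting one device out of several; only the TurnonRequest instruction name comes from the text.

```python
class Lamp:
    """Stand-in for a controllable smart home device (hypothetical)."""

    def __init__(self):
        self.is_on = False

    def turn_on(self):
        self.is_on = True


def handle_instruction(instruction, devices):
    """Apply a control instruction to the device named by its identifier.

    `devices` maps device identifiers to device objects.
    """
    device_id = instruction["payload"]["applianceId"]   # assumed field name
    device = devices.get(device_id)
    if device is None:
        raise KeyError("unknown device identifier: %s" % device_id)
    name = instruction["header"]["name"]
    if name == "TurnonRequest":
        device.turn_on()
    else:
        raise ValueError("unsupported instruction: %s" % name)
    return device
```

Only the device whose identifier appears in the instruction changes state; other devices known to the controller are left untouched.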

In another preferred embodiment of the present application, the voice request uttered by the user may be an audio/video content playback request, such as "I want to listen to the news". The earphone sends the request voice to the cloud server; the cloud server parses the user's intent as obtaining a resource from the news skill, and sends the instruction and the resource to the smart home central controller; the smart home central controller receives them and sends the resource information to the corresponding smart home device; the smart home device plays the resource after obtaining it. That is, the instruction further includes the corresponding resource, so that the smart home device can play it.

In this way, the user can control smart home devices by issuing voice requests to the earphone.

The smart home central controller can operate the smart home devices.

Preferably, the smart home devices include large appliances such as smart refrigerators, smart TVs, smart air conditioners, smart washing machines, and smart water heaters; small appliances such as robot vacuum cleaners, set-top boxes, rice cookers, air purifiers, and water purifiers; and home devices such as sockets, door locks, and lamps.

Preferably, this embodiment may further include the following steps:

Step S14: after the operation on the smart home device is completed, the smart home central controller obtains the state of the smart home device and returns it to the cloud server.

Preferably, the state of the smart home device is the device attribute information after the device has performed the operation requested by the smart home central controller; for example, the attribute information of the kitchen light is turnOnState.

Preferably, a response instruction is sent to the cloud server; the response instruction includes the state of the smart home device, for example the device attribute information.

Preferably, the response is a Confirmation instruction; for example, for the TurnonRequest control instruction, the response is a TurnonConfirmation instruction, which includes Header information and Payload information, wherein

The Header information is as described in Table 3:

Table 3

The Payload information is as described in Table 4:

Table 4

Step S15: the cloud server receives the state of the smart home device and returns a processing result to the earphone.

Preferably, after receiving the response instruction, the cloud server parses the state of the smart home device, obtains the processing result of the user's request, and returns it to the earphone.

Preferably, the cloud server stores the processing result corresponding to the response instruction. The processing result is text information, which the cloud server converts into voice information through a TTS (Text To Speech) technology/engine and returns to the user.

Preferably, for example, if the response instruction means "the kitchen light was turned on successfully", the corresponding processing result is the text information "The kitchen light was turned on successfully" or "OK". The text information is converted into voice information by the TTS technology/engine, and the voice information "The kitchen light was turned on successfully" or "OK" is returned to the earphone.

Preferably, the text information can be converted to by voice messaging by tts engine, or in advance by each corresponding formulation The text information of corresponding processing result is converted to voice messaging by tts engine, is pre-stored in voice messaging library, when receiving After the response instruction, corresponding voice messaging is searched from preset voice messaging library.
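The two options in the preceding paragraph (on-demand conversion versus a voice information library rendered in advance) can be sketched as a lookup with a fallback. This is only an illustrative sketch: `make_result_speaker`, `fake_tts`, `RESULT_TEXT` and `VOICE_LIBRARY` are hypothetical names, and `fake_tts` merely tags the text instead of calling a real TTS engine.

```python
from typing import Callable, Dict


def make_result_speaker(
    prerendered: Dict[str, bytes],
    result_text: Dict[str, str],
    synthesize: Callable[[str], bytes],
) -> Callable[[str], bytes]:
    """Return a function that maps a response instruction name to audio bytes."""

    def speak(response_name: str) -> bytes:
        # Second option in the text: audio rendered in advance and stored in a library.
        cached = prerendered.get(response_name)
        if cached is not None:
            return cached
        # First option in the text: convert the stored text on demand.
        # Falling back to "OK" for unknown responses is an assumption of this sketch.
        return synthesize(result_text.get(response_name, "OK"))

    return speak


def fake_tts(text: str) -> bytes:
    """Stand-in for a real TTS engine (hypothetical): tags the text as audio."""
    return b"AUDIO:" + text.encode("utf-8")


RESULT_TEXT = {"TurnonConfirmation": "The kitchen lamp was turned on successfully"}
VOICE_LIBRARY = {name: fake_tts(text) for name, text in RESULT_TEXT.items()}
speak = make_result_speaker(VOICE_LIBRARY, RESULT_TEXT, fake_tts)
```

Rendering in advance trades storage for latency, which may suit a small, fixed set of confirmation phrases.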

Step S16: the earphone receives the processing result sent by the Cloud Server and plays the processing result.

Through the above steps, the execution result of the voice request is fed back to the user wearing the earphone, so that the user knows the execution state of the smart home device and can decide on the next operation.

Through the above embodiment of the present invention, smart home devices can be controlled through an earphone using natural-language interaction. The earphone can serve as an auxiliary control device for the Intelligent household central control. In indoor scenarios, this effectively addresses the poor privacy that results when several people share one Intelligent household central control, the inconvenience of moving the sound pickup device, and the reduced recognition rate caused by a noisy environment. It also avoids the cost of adding desktop speakers or wall switches for multi-room linkage.

Fig. 2 is a structural diagram of an embodiment of the earphone-based smart home device voice control system of the present invention. As shown in Fig. 2, the system includes an earphone 21, a Cloud Server 22 and an Intelligent household central control 23, wherein

The earphone 21 is configured to receive the request voice issued by the user and upload the request voice to the Cloud Server;

The Cloud Server 22 is configured to receive the request voice, parse it, and issue a corresponding instruction to the Intelligent household central control;

The Intelligent household central control 23 is configured to receive the instruction and operate the smart home device 24.

In the present embodiment, the earphone 21 is a lightweight device. A lightweight device is intelligent hardware that runs a lightweight operating system such as FreeRTOS or mbedOS, and is characterized by low cost, low power consumption and portability. Preferably, the minimum hardware specification of the earphone is an MCU with an ARM Cortex-M3 core, a clock frequency of 120 MHz, Flash capacity > 256 KB and SRAM > 64 KB, the MCU running FreeRTOS or mbedOS.

In a preferred implementation of the earphone 21, the earphone 21 is configured to receive the request voice issued by the user and upload the request voice to the Cloud Server.

Preferably, when the user initiates a voice request, the earphone 21 is woken up, receives the request voice issued by the user, and uploads the request voice to the Cloud Server 22.

Preferably, the earphone 21 may be in one of several working states: dormant and waiting to be woken up; playing audio; or in a call. The user wakes the earphone 21 by saying a wake-up word, touching a key, or by other means, so that it enters pickup mode and receives the request voice issued by the user. Preferably, after the earphone 21 enters pickup mode, it ends the current playback or call.

Preferably, the signal processing unit of the earphone 21, such as a low-power DSP, receives the wake-up word issued by the user, or the user's operation on a key or touch area provided on the earphone, wakes the earphone, and enters the wake-up state.

Preferably, after the earphone 21 is woken up, it enters pickup mode and obtains the request voice issued by the user.
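The working states and wake-up triggers described above can be pictured as a small state machine. The following is a minimal sketch, not the firmware of the embodiment: the state names, the trigger strings and the `interrupted` field are assumptions introduced for illustration.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    DORMANT = auto()   # waiting to be woken up
    PLAYING = auto()   # playing audio
    IN_CALL = auto()   # in a call
    PICKUP = auto()    # woken up and recording the request voice


# Hypothetical trigger names for wake-up word, key press and touch area.
WAKE_TRIGGERS = {"wake_word", "key_press", "touch"}


class Earphone:
    def __init__(self, state: State = State.DORMANT) -> None:
        self.state = state
        # Activity that was ended when pickup mode started, if any.
        self.interrupted: Optional[State] = None

    def on_trigger(self, trigger: str) -> State:
        """Enter pickup mode on a wake-up trigger; ignore anything else."""
        if trigger in WAKE_TRIGGERS and self.state is not State.PICKUP:
            if self.state in (State.PLAYING, State.IN_CALL):
                # The text says current playback or the call is ended on pickup.
                self.interrupted = self.state
            self.state = State.PICKUP
        return self.state
```

For example, `Earphone(State.PLAYING).on_trigger("wake_word")` moves the earphone to `State.PICKUP` and records that playback was interrupted.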

Preferably, the earphone 21 performs speech detection on the audio data collected after the wake-up moment, detects the voice segments therein, and takes the voice segments as the request voice issued by the user.

Preferably, the earphone 21 uploads the request voice to the Cloud Server 22, and the Cloud Server 22 is DuerOS. Through the DCS protocol, the earphone 21 reports to the Cloud Server 22 the event that the earphone 21 has been woken up and has started receiving the user's voice request (the ListenStarted event), and uploads the request voice along with it.

Preferably, the sound collection unit of the earphone 21 collects audio data, and the signal processing unit of the earphone 21 performs wake-up detection, which includes the following sub-steps:

Sub-step A: collect audio data and cache it.

The sound collection unit on the earphone 21, such as a microphone, collects audio data in the environment of the earphone 21 for wake-up detection.

Preferably, the microphone can remain in the pickup state at all times (continuously sampling and quantizing the audio data), collecting audio data in the environment of the earphone 21 for wake-up detection.

Preferably, depending on specific needs, for example to reduce the power consumption of the earphone 21, the audio data in the environment of the earphone 21 can instead be collected at a predetermined period, for example every 10 ms. This periodic detection interval can be preset when the earphone 21 leaves the factory, or set by the user according to their own needs.

In the present embodiment, it should be understood that the audio data is information corresponding to any sound the microphone can collect in the environment of the earphone 21, including speech uttered by the user or other people, ambient noise and so on, as long as the microphone can capture it.

Preferably, the sound collection unit on the earphone 21 caches the collected audio data in a circular buffer. It should be noted that a circular buffer (Ring Buffer or Circular Buffer) allows audio data to be read while audio data is being stored.
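As an illustration of the circular buffer mentioned above, the sketch below keeps only the most recent samples and overwrites the oldest ones when full. It is written in Python for readability; firmware on an MCU would more likely be C, and the class and method names here are assumptions. Concurrency between the writer and the reader is deliberately left out.

```python
from typing import Iterable, List, Optional


class RingBuffer:
    """Fixed-capacity buffer that overwrites the oldest samples when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data: List[int] = [0] * capacity
        self._capacity = capacity
        self._write = 0   # index where the next sample is stored
        self._count = 0   # number of valid samples currently held

    def write(self, samples: Iterable[int]) -> None:
        for sample in samples:
            self._data[self._write] = sample
            self._write = (self._write + 1) % self._capacity
            self._count = min(self._count + 1, self._capacity)

    def read_latest(self, n: Optional[int] = None) -> List[int]:
        """Return the newest n samples in chronological order without removing them."""
        n = self._count if n is None else min(n, self._count)
        start = (self._write - n) % self._capacity
        return [self._data[(start + i) % self._capacity] for i in range(n)]
```

Because reads do not consume data, the wake-up detector can inspect recent audio while the microphone keeps writing.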

Sub-step B: perform speech detection on the collected audio data.

The signal processing unit of the earphone 21, such as a low-power DSP, includes a speech detection module for performing voice activity detection (Voice Activity Detection, VAD) on the audio data collected by the microphone. VAD can accurately detect the start position of the voice segments in the audio signal, so as to separate voice segments from non-speech segments (silence or noise).

VAD must be completed locally on the earphone 21 in real time. Because computing resources are very limited, a threshold-based VAD can generally be used; an engineering-optimized classification-based VAD may also be used.
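A threshold-based VAD of the kind mentioned above can be as simple as comparing the mean energy of each frame with a fixed threshold. The sketch below illustrates that idea under stated assumptions: the frame length, the threshold value and the function names are hypothetical, and production VADs usually add smoothing (hangover) and adaptive thresholds.

```python
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_voice_segments(
    samples: Sequence[float], frame_len: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames above the threshold."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    segments: List[Tuple[int, int]] = []
    start = None
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        voiced = frame_energy(frame) > threshold
        if voiced and start is None:
            start = i * frame_len                     # a voice segment begins
        elif not voiced and start is not None:
            segments.append((start, i * frame_len))   # the segment ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments
```

The per-frame cost is one multiply-accumulate per sample, which is why this style of detector fits devices with very limited computing resources.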

Preferably, the signal processing unit of the earphone 21, such as a low-power DSP, has a built-in voice wake-up engine that waits at all times to receive the voice wake-up instruction issued by the user, and detects it using a wake-up word algorithm.

In daily use, before speaking a voice operation instruction, the user must first issue a wake-up instruction using the wake-up word of the earphone 21. For example, the user issues the wake-up instruction "Xiaodu Xiaodu" to wake the earphone 21, and only then can an operation instruction be issued.

The voice wake-up engine of the signal processing unit performs wake-up detection on the detected voice segments. Because there is a single target (only the specified wake-up word needs to be detected), wake-up requires only a small acoustic model and decoding network (which need only distinguish whether the wake-up word occurs). Acoustic scoring and decoding are therefore fast and occupy little memory, and can be completed locally on the earphone 21 in real time.

Preferably, because the user may speak the wake-up word and the request voice together, for example "Xiaodu Xiaodu, open the kitchen lamp", and considering the typical lengths of a wake-up instruction and a voice instruction, the circular buffer can cache about 5 s of audio data. In this way, the request voice issued by the user can be obtained immediately after the wake-up instruction is detected, and the user does not have to wait for the earphone 21 to wake up and play a prompt tone before issuing the request voice.

Preferably, the signal processing unit of the earphone 21, such as a low-power DSP, integrates a noise reduction algorithm for near-field speech and performs noise reduction on the request voice, which can effectively reduce environmental noise and improve the signal-to-noise ratio of the user's voice.

Because the microphone of the earphone 21 is close to the user, complex beamforming and sound source localization are generally not needed; only signal processing operations such as noise reduction and echo processing are required to obtain cleaner user voice information.

Preferably, for each request voice, the earphone 21 generates a unique DialogRequestId that uniquely identifies this dialogue. After the Cloud Server 22 issues the instruction corresponding to this dialogue to the Intelligent household central control 23, its reply to the earphone 21 carries this id.
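One way to picture the per-dialogue identifier is a small tracker on the earphone side that issues an id for each request and matches replies against it. This is a sketch under assumptions: the text does not say how the id is generated, so a random UUID is used here, and `DialogTracker` and the reply key `"dialogRequestId"` are hypothetical.

```python
import uuid
from typing import Dict, Optional


class DialogTracker:
    """Issues one id per request voice and matches cloud replies to it."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}

    def new_request(self, description: str) -> str:
        # Assumption: a random UUID is unique enough to identify a dialogue.
        dialog_request_id = str(uuid.uuid4())
        self._pending[dialog_request_id] = description
        return dialog_request_id

    def match_reply(self, reply: dict) -> Optional[str]:
        """Return the request a reply belongs to, or None if it is unknown or stale."""
        return self._pending.pop(reply.get("dialogRequestId"), None)
```

Matching on the id lets the earphone discard replies that belong to an earlier, abandoned dialogue.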

To improve the response speed for user requests, the voice request is sent to the Cloud Server 22 as soon as the user starts speaking, and the voice data stream is uploaded in real time in a streaming manner while the user is speaking, rather than sending the request after the user has finished. Preferably, the voice data stream is uploaded in a streaming manner with every 10 milliseconds as one data block (chunk).
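The 10-millisecond chunking can be sketched as a generator that regroups incoming audio into fixed-size blocks. The audio format is an assumption of this example (the text does not specify one): with 16 kHz, 16-bit mono PCM, 10 ms corresponds to 16000 × 2 × 10 / 1000 = 320 bytes. The function names are hypothetical.

```python
from typing import Iterable, Iterator


def bytes_per_chunk(sample_rate_hz: int, bytes_per_sample: int, chunk_ms: int = 10) -> int:
    """Size in bytes of one chunk of mono PCM audio lasting chunk_ms milliseconds."""
    return sample_rate_hz * bytes_per_sample * chunk_ms // 1000


def stream_chunks(pcm: Iterable[bytes], chunk_size: int) -> Iterator[bytes]:
    """Regroup arbitrarily sized audio pieces into chunk_size blocks as they arrive."""
    pending = bytearray()
    for piece in pcm:
        pending.extend(piece)
        while len(pending) >= chunk_size:
            yield bytes(pending[:chunk_size])
            del pending[:chunk_size]
    if pending:
        # Final partial chunk once the input ends.
        yield bytes(pending)
```

Each yielded block can be handed to the uploader immediately, so the server starts recognizing speech before the user has finished talking.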

Preferably, if the Cloud Server 22 detects through VAD (Voice Activity Detection, used here to detect whether the user has finished speaking) that the user has finished, it issues a StopListen instruction to the earphone 21, and the earphone 21 stops uploading the voice data stream immediately after receiving the StopListen instruction. Preferably, it also stops listening for voice and mutes the microphone (MIC) immediately.
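The earphone-side reaction to StopListen can be illustrated with a tiny uploader object. Only the instruction name StopListen comes from the text; the class, its fields and the way instructions are delivered are assumptions made for this sketch.

```python
from typing import List


class Uploader:
    """Tracks whether audio chunks are still being uploaded."""

    def __init__(self) -> None:
        self.listening = True
        self.mic_muted = False
        self.sent: List[bytes] = []   # stands in for the network connection

    def on_chunk(self, chunk: bytes) -> bool:
        """Upload a chunk while listening; drop it once listening has stopped."""
        if not self.listening:
            return False
        self.sent.append(chunk)
        return True

    def on_instruction(self, name: str) -> None:
        if name == "StopListen":
            # Stop uploading at once and mute the microphone, as the text describes.
            self.listening = False
            self.mic_muted = True
```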

In a preferred implementation of the Cloud Server 22, the Cloud Server is configured to receive the request voice, parse it, and issue a corresponding instruction to the Intelligent household central control.

Preferably, the Cloud Server 22 receives the request voice sent by the earphone 21, parses it (including speech recognition and intention recognition), and issues a corresponding instruction to the Intelligent household central control.

Preferably, to improve the success rate of speech recognition, the Cloud Server 22 performs noise reduction on the request voice. The noise reduction on the earphone 21 side and the noise reduction on the Cloud Server 22 can be used separately or together, so that high speech recognition accuracy is maintained even when the user is in a noisy environment.

The Cloud Server 22 first performs speech recognition on the request voice. In the present embodiment, the request issued by the user is taken to be a request to control a smart device. For example, if the request voice issued by the user is "please open the kitchen lamp", the corresponding request text "please open the kitchen lamp" is obtained. Preferably, the request voice is converted into text information by an ASR (Automatic Speech Recognition) technology/engine.

The Cloud Server 22 performs intention recognition on the request text and obtains the corresponding user intention, such as "open the kitchen lamp".

Preferably, the Cloud Server 22 issues the instruction corresponding to the user intention to the Intelligent household central control 23.

Preferably, the smart home protocol instruction is a control instruction used to control the smart home device 24. Control instructions cover: turning devices on and off, light-adjustable devices, temperature-adjustable devices, fan-speed-adjustable devices, device mode setting, television channel setting, volume-adjustable devices, lockable devices, printing devices, suction-adjustable devices, water-volume-adjustable devices, charge-controllable devices, direction-adjustable devices, height-adjustable devices, and so on. Turning devices on and off includes operations such as turning on a device, turning on a device at a scheduled time, turning off a device, turning off a device at a scheduled time, and pausing a device. For example, when the user wants to turn on a specified device, the Cloud Server 22 issues a TurnonRequest instruction to the Intelligent household central control.

Preferably, the earphone 21 and the smart home device 24 are identified by the SSID of the wireless router and are thereby determined to belong to the same device group. For example, the earphone 21 is connected to the Cloud Server 22 through the wireless router, and the smart home device 24 is connected to the wireless router through the Intelligent household central control 23 and thus to the Cloud Server 22. The earphone 21 can only control smart home devices under the same wireless router.
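The SSID-based grouping amounts to a filter on the cloud side: only devices registered under the same SSID as the earphone are candidates for control. The sketch below assumes the cloud keeps a simple list of device records; the `Device` fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    device_id: str
    name: str
    ssid: str   # SSID of the wireless router the device sits behind


def controllable_devices(earphone_ssid: str, devices: List[Device]) -> List[Device]:
    """Devices in the same group as the earphone, i.e. with the same router SSID."""
    return [d for d in devices if d.ssid == earphone_ssid]
```

For instance, an earphone on SSID "home-wifi" would see a kitchen lamp registered under "home-wifi" but not a lamp registered under a neighbour's SSID.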

Before controlling a smart home device, the user first needs to discover the device. When the user searches for the smart home device 24, the Cloud Server 22 sends a DiscoverAppliancesRequest instruction to the Intelligent household central control 23; if the Intelligent household central control 23 finds the smart home device 24, it returns the relevant information of the smart home device 24 to the Cloud Server 22. In this way, when the Cloud Server 22 receives the user's request voice, it can look up the corresponding smart home device 24 and issue the corresponding instruction to the Intelligent household central control 23. Preferably, the process of discovering the smart home device 24 is similar to the process of turning on the smart home device 24 and is not repeated here.

Preferably, the Cloud Server 22 looks up the device identifier of the smart home device 24 that is under the same wireless router as the earphone 21, generates the corresponding instruction according to the intention and the device identifier, and sends it to the Intelligent household central control 23.
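Generating an instruction from an intention and a device identifier can be sketched as follows. The contents of Tables 1 and 2 are not reproduced in this text, so every field name below (`header`, `payload`, `name`, `messageId`, `deviceId`) is an assumption made for illustration, as are the intent label `"turn_on"` and the registry layout; only the instruction name TurnonRequest comes from the description.

```python
import uuid
from typing import Dict, Optional

# Hypothetical mapping from a recognized intention to an instruction name.
INTENT_TO_INSTRUCTION = {"turn_on": "TurnonRequest"}


def build_instruction(
    intent: str, device_name: str, registry: Dict[str, str]
) -> Optional[dict]:
    """Build a control instruction, or return None if the intent or device is unknown.

    registry maps device names to device identifiers discovered earlier.
    """
    device_id = registry.get(device_name)
    instruction_name = INTENT_TO_INSTRUCTION.get(intent)
    if device_id is None or instruction_name is None:
        return None
    return {
        # Field names are illustrative, not taken from Tables 1 and 2.
        "header": {"name": instruction_name, "messageId": str(uuid.uuid4())},
        "payload": {"deviceId": device_id},
    }
```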

The Header information is as described in Table 1:

Table 1

The Payload information is as described in Table 2:

Table 2

In a preferred implementation of the Intelligent household central control 23, the Intelligent household central control 23 operates the smart home device 24 after receiving the instruction.

Preferably, the instruction is a control instruction, and the Intelligent household central control 23 controls the corresponding smart home device 24 according to the control instruction, for example according to the device identifier included in a TurnonRequest instruction. For example, if the device identifier in the TurnonRequest instruction is the device identifier of the kitchen lamp, the kitchen lamp is turned on.
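The dispatch by device identifier, together with the confirmation that carries the device state back, can be sketched as below. The instruction names TurnonRequest and TurnonConfirmation and the attribute name turnOnState come from the description; the message layout, the `"ON"` value, the `"Error"` response and the `CentralControl` class are assumptions for illustration.

```python
from typing import Dict


class CentralControl:
    """Keeps device attribute dictionaries keyed by device identifier."""

    def __init__(self, devices: Dict[str, dict]) -> None:
        self.devices = devices

    def handle(self, instruction: dict) -> dict:
        name = instruction["header"]["name"]
        device_id = instruction["payload"]["deviceId"]
        device = self.devices.get(device_id)
        if device is None or name != "TurnonRequest":
            # Hypothetical error response for unknown devices or instructions.
            return {"header": {"name": "Error"}, "payload": {"deviceId": device_id}}
        device["turnOnState"] = "ON"   # operate the device (value is assumed)
        return {
            "header": {"name": "TurnonConfirmation"},
            # The confirmation carries the device state after the operation.
            "payload": {"deviceId": device_id, "attributes": dict(device)},
        }
```

The returned confirmation plays the role of the response instruction that the central control sends back to the cloud after the operation.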

In another preferred embodiment of the present application, the voice request issued by the user can be an audio/video content playback request, such as "I want to listen to the news". The earphone 21 sends the request voice to the Cloud Server 22; the Cloud Server 22 parses the user's intention as obtaining a resource from the news skill and sends the instruction and the resource to the Intelligent household central control 23; the Intelligent household central control 23 receives the request and sends the requested resource information to the corresponding smart home device 24; and the corresponding smart home device 24 plays the resource after obtaining it. That is, the instruction further includes the corresponding resource, so that the smart home device can play the resource.

As a result, the user can control the smart home device 24 by issuing a voice request to the earphone 21.

The Intelligent household central control 23 can manipulate the smart home device 24.

Preferably, the smart home device 24 includes large household appliances such as smart refrigerators, smart televisions, smart air conditioners, smart washing machines and smart water heaters; small household appliances such as sweeping robots, set-top boxes, rice cookers and air purifiers; and home devices such as water purifiers, sockets, door locks and lamps.

Preferably, after the operation on the smart home device 24 is completed, the Intelligent household central control 23 obtains the state of the smart home device 24 and returns it to the Cloud Server 22.

Preferably, the state of the smart home device 24 is the device attribute information after the smart home device 24 has executed the operation requested by the Intelligent household central control 23; for example, the attribute information of the kitchen lamp is turnOnState.

Preferably, the Intelligent household central control 23 sends a response instruction to the Cloud Server 22; the response instruction includes the state of the smart home device 24, such as the device attribute information.

The Header information is as described in Table 3:

Table 3

The Payload information is as described in Table 4:

Table 4

The Cloud Server 22 receives the state of the smart home device 24 and returns a processing result to the earphone 21.

Preferably, after the Cloud Server 22 receives the response instruction returned by the Intelligent household central control 23, it parses the state of the smart home device 24, obtains the processing result of the user's request, and returns it to the earphone 21.

Preferably, for example, if the response instruction means "the kitchen lamp was turned on successfully", the corresponding processing result is the text information "the kitchen lamp was turned on successfully" or "OK". The text information is converted into voice information by the TTS technology/engine, and the voice information "the kitchen lamp was turned on successfully" or the voice information "OK" is returned to the earphone 21.

Preferably, the text information can be converted into voice information by the TTS engine on demand. Alternatively, the text information of the processing result corresponding to each response instruction is converted into voice information by the TTS engine in advance and stored in a voice information library; after the response instruction is received, the corresponding voice information is looked up in the preset voice information library.

The earphone 21 receives the processing result sent by the Cloud Server 22 and plays the processing result.

As can be seen from the above description, the approach of the above embodiments frees the user's hands: the earphone and the terminal can be controlled by voice without manual operation. The number of physical buttons on the earphone can be reduced, reducing its size; the wake-up rate is improved and the false wake-up rate is reduced; and operating convenience and the user experience are improved.

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the terminal and the server described above can be understood by referring to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components can be combined or integrated into another system, or some features can be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

Fig. 3 shows a block diagram of an exemplary computer system/server 012 suitable for implementing embodiments of the present invention. The computer system/server 012 shown in Fig. 3 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.

As shown in Fig. 3, the computer system/server 012 takes the form of a general-purpose computing device. The components of the computer system/server 012 may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 connecting different system components (including the system memory 028 and the processing unit 016).

The bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.

The computer system/server 012 typically includes a variety of computer system readable media. These media may be any available media that can be accessed by the computer system/server 012, including volatile and non-volatile media, and removable and non-removable media.

The system memory 028 may include computer system readable media in the form of volatile memory, such as a random access memory (RAM) 030 and/or a cache memory 032. The computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 034 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 3, commonly called a "hard disk drive"). Although not shown in Fig. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical media) may be provided. In these cases, each drive may be connected to the bus 018 through one or more data media interfaces. The memory 028 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 040 having a set of (at least one) program modules 042 may be stored, for example, in the memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 042 generally perform the functions and/or methods of the embodiments described in the present invention.

The computer system/server 012 may also communicate with one or more external devices 014 (such as a keyboard, a pointing device, a display 024, etc.). In the present invention, the computer system/server 012 communicates with an external radar device; it may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (such as a network card, a modem, etc.) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 022. Furthermore, the computer system/server 012 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 020. As shown in Fig. 3, the network adapter 020 communicates with the other modules of the computer system/server 012 through the bus 018. It should be understood that, although not shown in Fig. 3, other hardware and/or software modules may be used in conjunction with the computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.

The processing unit 016 runs the programs stored in the system memory 028, thereby performing the functions and/or methods of the embodiments described in the present invention.

The above computer program may be provided in a computer storage medium; that is, the computer storage medium is encoded with a computer program which, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations shown in the above embodiments of the present invention.

With the passage of time and the development of technology, the meaning of "medium" has become increasingly broad, and the distribution of computer programs is no longer limited to tangible media; programs can also be downloaded directly from a network, for example. Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.

The program code contained on a computer-readable medium may be transmitted over any suitable medium, including, but not limited to, wireless, wire, optical cable, RF, etc., or any suitable combination of the above.

Computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above can be understood by referring to the corresponding processes in the foregoing method embodiments, and are not repeated here.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A smart home device voice control method based on an earphone, characterized by comprising:

2. The method according to claim 1, characterized in that, before the earphone receives the request voice issued by the user and uploads the request voice to the Cloud Server:

The earphone receives a wake-up word issued by the user, or the user's operation on a key or touch area provided on the earphone, and enters the wake-up state.

3. The method according to claim 1, characterized in that,

The earphone is connected to the Cloud Server through a wireless network provided by a wireless router.

4. The method according to claim 1, characterized in that uploading the request voice to the Cloud Server comprises:

Reporting to the Cloud Server the event that the earphone has been woken up and has started receiving the user's voice request, and uploading the request voice along with it.

5. The method according to claim 4, characterized in that the Cloud Server receiving the request voice, parsing it, and issuing the corresponding instruction to the Intelligent household central control comprises:

6. The method according to claim 1, characterized in that the instruction is a smart home protocol instruction, a content playback protocol instruction or a third-party service protocol instruction.

7. The method according to claim 6, characterized in that the instruction includes the device identifier of the smart home device.

8. The method according to claim 7, characterized in that the Intelligent household central control receiving the instruction and operating the smart home device comprises:

The Intelligent household central control controls the corresponding smart home device according to the device identifier included in the control instruction.

9. The method according to claim 1, characterized in that the method further comprises the following steps:

The Cloud Server returns a processing result to the earphone.

10. A smart home device voice control system based on an earphone, characterized by comprising an earphone, a Cloud Server and an Intelligent household central control, wherein

The Cloud Server is configured to receive the request voice, parse it, and issue a corresponding instruction to the Intelligent household central control;

11. The system according to claim 10, characterized in that the earphone is further configured to receive a wake-up word issued by the user, or the user's operation on a key or touch area provided on the earphone, and enter the wake-up state.

12. The system according to claim 10, characterized in that,

13. The system according to claim 10, characterized in that the earphone is specifically configured to:

14. The system according to claim 13, characterized in that the Cloud Server is specifically configured to:

15. The system according to claim 10, characterized in that the instruction is a smart home protocol instruction, a content playback protocol instruction or a third-party service protocol instruction.

16. The system according to claim 15, characterized in that the instruction includes the device identifier of the smart home device.

17. The system according to claim 16, characterized in that the Intelligent household central control is specifically configured to:

18. The system according to claim 10, characterized in that,

The Intelligent household central control is further configured to return the state of the smart home device after the operation is completed to the Cloud Server;

The Cloud Server is further configured to return a processing result to the earphone.

19. A computer device, including a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 9.

20. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 9.