CN103714815A

CN103714815A - Voice control method and device thereof

Info

Publication number: CN103714815A
Application number: CN201310657278.1A
Authority: CN
Inventors: 何永; 李传丰
Original assignee: Individual
Current assignee: Individual
Priority date: 2013-12-09
Filing date: 2013-12-09
Publication date: 2014-04-09

Abstract

The invention discloses a voice control method and a device thereof. The control method comprises the steps of: (a) receiving audio data in real time; (b) performing head judgment on the received audio data through voice breakpoint detection to obtain valid audio information; (c) judging whether the valid audio information includes wake-up information, and if so, executing step (d), otherwise returning to step (a); (d) performing head-and-tail judgment on the valid audio information through voice breakpoint detection to obtain execution content information; (e) performing semantic parsing to convert the execution content information into standard execution command information; and (f) executing the related command according to the standard execution command information and presenting the execution result to the user. The voice control method and device provide a novel intelligent voice interaction environment and enable the user to use the voice interaction function efficiently and conveniently.

Description

Voice control method and device thereof

Technical field

The present invention relates to the fields of speech and semantic recognition, natural language processing, and intelligent terminal application development, and specifically to a voice control method and a device thereof.

Background art

With the development of voice interaction and intelligent control technology, more and more devices offer speech recognition and can perform operations according to spoken input. At present, known voice control devices interact in one of two ways. In the first, the user manually turns on a speech recognition switch and then speaks the content to be executed. In the second, the user speaks specific wake-up information to start speech recognition and, once the device has woken up, speaks the content to be executed. Both kinds of device have shortcomings: (1) the first way requires manual operation, so voice interaction cannot be fully automatic; (2) in the second way, every voice operation requires the user to first speak the wake-up information and then wait for a set time (several seconds) before the device starts capturing what the user says, which greatly reduces the speed and convenience of intelligent voice interaction.

Therefore, a novel voice control method and device are needed.

Summary of the invention

The object of the present invention is to provide a voice control method and a device thereof that overcome the shortcomings of the prior art and provide a novel intelligent voice interaction environment, so that users can use the voice interaction function more efficiently and conveniently.

To achieve the above object, a voice control method of the present invention comprises the steps of: (a) receiving audio data in real time; (b) performing head judgment on the received audio data by voice breakpoint detection to obtain valid audio information; (c) judging whether the valid audio information includes wake-up information; if it does, proceeding to step (d); otherwise returning to step (a); (d) performing head-and-tail judgment on the valid audio information by voice breakpoint detection to obtain execution content information; (e) performing semantic parsing to convert the execution content information into standard execution command information; and (f) executing the related command according to the standard execution command information and presenting the execution result to the user.

Further, step (c) comprises the following steps:

sending the valid audio information to a local wake-up information database;

matching the valid audio information against the content of the local wake-up information database; when wake-up information is matched, executing step (d); otherwise, executing step (a).

Further, steps (d) and (e) comprise the following steps:

sending the obtained wake-up information and execution content information to a cloud database at the same time;

matching the wake-up information against the content of the cloud database by cloud speech recognition; if a match is found, executing step (e); otherwise, executing step (a).

Further, step (e) comprises the following steps:

converting the obtained execution content information into text-format information;

parsing the text-format information into standard execution command information.

Further, the wake-up information is any one of a character, a word, or a sentence.

To achieve the above object, the present invention also provides a voice control device comprising an audio receiving module, a breakpoint detection module, a wake-up information judging module, an execution content information acquisition module, a conversion module, and an execution module. The audio receiving module receives audio data in real time. The breakpoint detection module is connected to the audio receiving module and performs head judgment on the received audio data by voice breakpoint detection to obtain valid audio information. The wake-up information judging module is connected to the breakpoint detection module and judges whether the valid audio information includes wake-up information; if it does, the execution content information acquisition module is called; otherwise the audio receiving module is called. The execution content information acquisition module is connected to the wake-up information judging module and performs head-and-tail judgment on the valid audio information by voice breakpoint detection to obtain execution content information. The conversion module is connected to the execution content information acquisition module and performs semantic parsing to convert the execution content information into standard execution command information. The execution module is connected to the conversion module, executes the related command according to the standard execution command information, and presents the execution result to the user.

Further, the wake-up information judging module comprises a transmission unit and a matching unit. The transmission unit sends the valid audio information to a local wake-up information database. The matching unit is connected to the transmission unit and matches the valid audio information against the content of the local wake-up information database; when wake-up information is matched, the execution content information acquisition module is called; otherwise, the audio receiving module is called.

Further, the transmission unit also sends the obtained wake-up information and execution content information to a cloud database at the same time. The matching unit also matches the wake-up information against the content of the cloud database by cloud speech recognition; if a match is found, the conversion module is called; otherwise the audio receiving module is called.

Further, the conversion module comprises a converting unit and a parsing unit. The converting unit converts the obtained execution content information into text-format information. The parsing unit is connected to the converting unit and parses the text-format information into standard execution command information.

The advantage of the present invention is that it uses voice breakpoint detection, wake-up information detection, and speech recognition to provide a novel intelligent voice interaction environment, so that the user can use the voice interaction function more efficiently and conveniently and the relevant device can complete the spoken request more quickly.

Brief description of the drawings

Fig. 1 is a flow chart of the steps of the voice control method of the present invention.

Fig. 2 is an architecture diagram of the voice control device of the present invention.

Detailed description

The embodiments of the voice control method and device provided by the present invention are described in detail below with reference to the drawings.

An embodiment of the voice control method of the present invention is given first, with reference to the drawings.

Fig. 1 is a flow chart of the steps of the voice control method of the present invention. Referring to Fig. 1, the voice control method comprises: step S110, receiving audio data in real time; step S120, performing head judgment on the received audio data by voice breakpoint detection to obtain valid audio information; step S130, judging whether the valid audio information includes wake-up information, and if it does, proceeding to step S140, otherwise returning to step S110; step S140, performing head-and-tail judgment on the valid audio information by voice breakpoint detection to obtain execution content information; step S150, performing semantic parsing to convert the execution content information into standard execution command information; and step S160, executing the related command according to the standard execution command information and presenting the execution result to the user.
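The flow of Fig. 1 can be pictured as a simple loop. The following Python sketch is illustrative only: the function name `run_voice_control` and all of its parameters are hypothetical, each stage is passed in as a callable rather than tied to any particular recognizer, and plain strings stand in for audio so that the example stays self-contained.

```python
from typing import Callable, Iterable, List, Optional


def run_voice_control(
    audio_chunks: Iterable[str],
    detect_head: Callable[[str], Optional[str]],
    contains_wake_word: Callable[[str], bool],
    extract_content: Callable[[str], Optional[str]],
    parse_command: Callable[[str], str],
    execute: Callable[[str], str],
) -> List[str]:
    """Hypothetical sketch of steps S110 to S160; every stage is a stand-in."""
    results = []
    for chunk in audio_chunks:                    # S110: receive audio data
        valid = detect_head(chunk)                # S120: head judgment
        if valid is None or not contains_wake_word(valid):
            continue                              # S130 failed: back to S110
        content = extract_content(valid)          # S140: head-and-tail judgment
        if content is None:
            continue
        command = parse_command(content)          # S150: semantic parsing
        results.append(execute(command))          # S160: execute and report
    return results
```

Passing the stages as callables keeps the control flow of Fig. 1 visible without committing to any specific detection or recognition technique.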

Each step is described below with reference to Fig. 1.

Step S110: receive audio data in real time.

The device enters an initialized state and receives audio data (input as speech) in real time, 24 hours a day.

Step S120: perform head judgment on the received audio data by voice breakpoint detection to obtain valid audio information.

In this step, voice breakpoint detection is used to perform head judgment on the received audio data and thereby obtain valid audio information. Head judgment means using voice breakpoint detection to obtain the valid audio information while excluding information produced by noise or by abnormal voice input, which reduces the probability that the target object performs an action because of erroneous audio information.
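The patent does not specify how voice breakpoint detection is implemented. As one possible illustration, the sketch below uses a simple energy threshold: speech is taken to start at the first run of several consecutive frames whose energy exceeds a threshold, so that short noise spikes are ignored. The function names, the threshold, and the minimum run length are all assumptions made for this example.

```python
from typing import List, Optional, Sequence


def find_speech_start(
    frame_energies: Sequence[float],
    threshold: float = 0.1,
    min_frames: int = 3,
) -> Optional[int]:
    """Return the index where a run of at least `min_frames` consecutive
    frames above `threshold` begins, or None if there is no such run."""
    run_start = None
    for i, energy in enumerate(frame_energies):
        if energy > threshold:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_frames:
                return run_start
        else:
            run_start = None  # run broken: an isolated spike counts as noise
    return None


def valid_audio(
    frame_energies: Sequence[float],
    threshold: float = 0.1,
    min_frames: int = 3,
) -> List[float]:
    """Drop everything before the detected speech start (head judgment)."""
    start = find_speech_start(frame_energies, threshold, min_frames)
    return [] if start is None else list(frame_energies[start:])
```

A production system would more likely rely on a trained voice activity detector; the energy threshold is used here only because it makes the idea of discarding leading noise easy to follow.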

Step S130: judge whether the valid audio information includes wake-up information; if it does, proceed to step S140; otherwise return to step S110.

In one embodiment of the present invention, the wake-up information (also called the wake-up word) is preset: it can be a factory default or can be chosen by the user before use. The wake-up information is any one of a character, a word, or a sentence. For example, it can be "newly", "Xiao Ming", "my baby", and so on. Besides Chinese words, the wake-up information can also include words in other languages; no limitation is imposed here. In addition, the wake-up information described herein is the form of address used for the target object in voice input, and the target object can perform the related action according to the received voice content. The wake-up information is explained further below.

In this step, wake-up information detection is used to judge whether the valid audio information includes the preset wake-up information. If it does, the subsequent steps continue; otherwise the method waits again to receive new audio data.

After the valid audio information is judged to include the wake-up information, it is further confirmed whether the wake-up information is located at the beginning, that is, the head, of the valid audio information. If so, the subsequent steps are carried out. Otherwise, for example when the wake-up information appears somewhere in the middle of the valid audio information or at its end, the method waits again to receive new audio data.
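The position check can be illustrated on text. The sketch below assumes, purely for illustration, that a transcript of the valid audio information is already available; the patent itself performs this check on audio, and the function name `wake_word_at_head` is hypothetical.

```python
from typing import Iterable, Optional


def wake_word_at_head(transcript: str, wake_words: Iterable[str]) -> Optional[str]:
    """Return the wake word the transcript starts with, or None.

    A wake word in the middle or at the end of the transcript is rejected.
    """
    text = transcript.strip().lower()
    for word in wake_words:
        if text.startswith(word.lower()):
            return word
    return None
```

For example, "Xiao Zhi, please turn on the bedroom air conditioner" is accepted for the wake word "Xiao Zhi", while "Please ask Xiao Zhi to help" is rejected because the wake word is not at the head.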

In another embodiment of the present invention, step S130 further comprises the following steps:

The valid audio information is matched against the content of the local wake-up information database; when wake-up information is matched, step S140 is executed; otherwise, step S110 is executed.

Matching the valid audio information against the content of the local wake-up information database can be understood as follows: a data model is first built from a large amount of preset data, and the valid audio information is then matched against this data model to determine a similarity. If the similarity reaches a threshold, the valid audio information is considered to include the wake-up information.

Other embodiments of the present invention are not limited to the above approach; the preset wake-up information described above can be used directly to judge whether the valid audio information includes the set wake-up information.
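The similarity-threshold matching described for the local wake-up information database can be sketched as follows. The patent does not say what the data model or the similarity measure is; this example assumes, hypothetically, that each wake word is stored as a fixed-length feature vector and that cosine similarity is compared against a threshold of 0.9.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_wake_word(
    features: Sequence[float],
    templates: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the best-matching wake word whose similarity reaches
    `threshold`, or None when no template is similar enough."""
    best_word, best_score = None, threshold
    for word, template in templates.items():
        score = cosine_similarity(features, template)
        if score >= best_score:
            best_word, best_score = word, score
    return best_word
```

Raising the threshold reduces false wake-ups at the cost of more missed wake-ups; the value used here is arbitrary.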

Step S140: perform head-and-tail judgment on the valid audio information by voice breakpoint detection to obtain execution content information.

In this step, head-and-tail judgment is performed on the valid audio information, again by voice breakpoint detection, to obtain the execution content. Head-and-tail judgment means that voice breakpoint detection determines not only the end position of the wake-up information, which is the start position of the execution content, but also the end position of the execution content. In this way, valid execution content information is obtained.

In the prior art, by contrast, the user first speaks specific wake-up information and then waits a set time (a fixed number of seconds) before the target device automatically starts capturing the user's speech. The voice content may therefore be captured late, so that it deviates from the actual voice input or is incomplete, producing a different execution result. Voice breakpoint detection thus helps ensure that the obtained execution content is correct.
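Head-and-tail judgment can be illustrated with per-frame speech flags. In this sketch, which rests on assumptions not found in the patent, `is_speech` marks each frame as speech or silence, `wake_end` is the index of the first frame after the wake-up information, and the execution content is considered finished once a pause longer than `max_pause` frames is seen.

```python
from typing import Optional, Sequence, Tuple


def content_bounds(
    is_speech: Sequence[bool],
    wake_end: int,
    max_pause: int = 2,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the execution content, with
    `end` exclusive, or None if no speech follows the wake-up information."""
    start = None
    end = None
    silence = 0
    for i in range(wake_end, len(is_speech)):
        if is_speech[i]:
            if start is None:
                start = i          # head: first speech frame after the wake word
            end = i + 1            # tail candidate: just past the latest speech frame
            silence = 0
        elif start is not None:
            silence += 1
            if silence > max_pause:
                break              # long pause: the execution content has ended
    return None if start is None else (start, end)
```

Because the start is found from the audio itself rather than from a fixed delay, speech that begins immediately after the wake word is not cut off.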

In another embodiment of the present invention, steps S140 and S150 further comprise the following steps:

The wake-up information is matched against the content of the cloud database by cloud speech recognition; if wake-up information is matched, step S150 is executed; otherwise step S110 is executed.

The above steps are executed to reduce the probability of false wake-up: cloud speech recognition (a cloud engine) verifies again whether the current wake-up information is valid. If the same wake-up information is matched again, the subsequent steps are carried out. Compared with judging whether the valid audio information includes wake-up information using only the local wake-up information database, this step matches the wake-up information against a data model built from the large amount of complex data held in the cloud database, which can effectively reduce the number of false wake-ups.

Other embodiments of the present invention are not limited to the above approach; other ways of verifying the correctness of the wake-up information can also be used.
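The double verification of the wake-up information (local first, then cloud) amounts to a two-stage cascade. The sketch below is a simplification under stated assumptions: `local_match` and `cloud_match` are hypothetical callables standing in for the local database lookup and the cloud speech recognition engine, and the cloud check is only reached when the local check passes.

```python
from typing import Callable


def verify_wake_up(
    audio: bytes,
    local_match: Callable[[bytes], bool],
    cloud_match: Callable[[bytes], bool],
) -> bool:
    """Accept the wake-up information only if both checks agree."""
    if not local_match(audio):
        return False           # cheap local check failed: keep listening
    return cloud_match(audio)  # second check by the (hypothetical) cloud engine
```

Running the inexpensive local check first avoids a network round trip for most audio that does not contain the wake-up information.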

Step S150: perform semantic parsing to convert the execution content information into standard execution command information.

After the valid audio information is judged to include the wake-up information and the execution content information is obtained, the execution content information is converted into standard execution command information by semantic analysis.

In one embodiment of the present invention, this step may further comprise the following steps:

The obtained execution content information is converted into text-format information.

In other words, speech recognition converts the voice information into recognizable text information (for example, the spoken "Xiao Ming, please open the door" is converted into the text "Xiao Ming, please open the door"), and the related execution command is parsed out of this text information and output in a standard format. The step of converting the obtained execution content information into text-format information is completed in the cloud database, which improves conversion efficiency; in other embodiments, this step can also be completed in a local database. At the same time, natural language processing parses the text-format information into standard execution command information.

Step S160: execute the related command according to the standard execution command information and present the execution result to the user.

The target device (the object addressed by the wake-up information) calls the relevant module according to the standard execution command information to execute the related command, and presents the execution result to the user.

A specific example of the above technical solution is given below with reference to the drawings.

Example 1 takes the user's voice input "Xiao Zhi, please turn on the bedroom air conditioner" as an example.

Step S110: receive audio data in real time.

The target object monitors the audio data of received voice input in real time, 24 hours a day.

Step S120: perform head judgment on the received audio data by voice breakpoint detection to obtain valid audio information.

When the target object receives audio data, it uses voice breakpoint detection to perform head judgment on the received audio data, obtaining the valid audio information "Xiao Zhi, please turn on the bedroom air conditioner" and excluding any noise or abnormal voice input preceding "Xiao Zhi".

Step S130: judge whether the valid audio information includes wake-up information.

The target object sends the received valid audio information to a local wake-up information database.

The valid audio information is matched against the content of the local wake-up information database to detect whether qualifying wake-up information is present. After the wake-up information "Xiao Zhi" is detected, it is further judged whether this wake-up information is located at the head of the valid audio information. Because "Xiao Zhi" is located at the head of the valid audio information, the subsequent steps are carried out; otherwise the target object would wait again to receive new audio data.

Step S140: perform head-and-tail judgment on the valid audio information by voice breakpoint detection to obtain execution content information.

Head-and-tail judgment is performed on the valid audio information "Xiao Zhi, please turn on the bedroom air conditioner", again by voice breakpoint detection. Once the end of the last character of "Xiao Zhi" is detected, the audio that follows is taken as the start position of the execution content. Likewise, voice breakpoint detection determines where the last character of "please turn on the bedroom air conditioner" ends, which is taken as the end position of the execution content. The execution content information ("please turn on the bedroom air conditioner") is thus obtained.

In this embodiment, the target object further sends the wake-up information and the execution content information contained in the valid audio information (here "Xiao Zhi" and "please turn on the bedroom air conditioner") to the cloud database at the same time.

Cloud speech recognition matches the wake-up information "Xiao Zhi" against the content of the cloud database. If it matches, the next operation is carried out; otherwise the target object waits again for new audio data. Matching against both the local wake-up information database and the cloud database provides double verification of the wake-up information and effectively reduces the number of false wake-ups.

In this embodiment, the execution content information is converted into text-format information by the cloud database and speech recognition, which improves conversion efficiency.

Step S150: perform semantic parsing to convert the execution content information into standard execution command information.

In this step, natural language processing parses the text-format information into standard execution command information. That is, natural language processing analyzes the text-format information and identifies its true intent: "please turn on the bedroom air conditioner" means "turn on the air conditioner in the bedroom", which is converted into the standard execution command information "CommandOpen|bedroom|air conditioner". The format of the standard execution command information can be defined as required, as long as it is a fixed format.
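A toy version of this conversion is sketched below. It is a keyword lookup rather than real natural language processing, and the vocabularies are invented for the example: only "CommandOpen" and the "action|location|device" layout come from the text above, while `CommandClose`, the location and device lists, and the function name `to_standard_command` are hypothetical.

```python
from typing import Optional

# Hypothetical vocabularies; a real system would use an NLP model instead.
ACTIONS = {
    "turn on": "CommandOpen",
    "open": "CommandOpen",
    "turn off": "CommandClose",
    "close": "CommandClose",
}
LOCATIONS = ("bedroom", "living room", "kitchen")
DEVICES = ("air conditioner", "light", "door")


def to_standard_command(text: str) -> Optional[str]:
    """Map recognized text to an 'action|location|device' string, or None
    when no known action or device is mentioned."""
    lowered = text.lower()
    action = next((cmd for phrase, cmd in ACTIONS.items() if phrase in lowered), None)
    location = next((loc for loc in LOCATIONS if loc in lowered), None)
    device = next((dev for dev in DEVICES if dev in lowered), None)
    if action is None or device is None:
        return None
    # A missing location is left as an empty field so the format stays fixed.
    return "|".join([action, location or "", device])
```

Keeping three fields in every command, even when the location is absent, reflects the requirement that the standard execution command information have a fixed format.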

Step S160: execute the related command according to the standard execution command information and present the execution result to the user.

According to the standard execution command information "CommandOpen|bedroom|air conditioner", the target object calls the relevant processing module and execution module to jointly carry out the content of the standard execution command information. The execution result is also presented to the user (here, the target object turns on the air conditioner in the bedroom).
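Executing a standard execution command can be pictured as splitting the fixed-format string and dispatching on its first field. In the sketch below, the handler table and the function name `execute_command` are hypothetical; the returned string stands in for the result presented to the user.

```python
from typing import Callable, Dict


def execute_command(
    command: str,
    handlers: Dict[str, Callable[[str, str], str]],
) -> str:
    """Split an 'action|location|device' command and call the matching
    handler; the returned string is the result shown to the user."""
    fields = [part.strip() for part in command.split("|")]
    if len(fields) != 3:
        return "error: malformed command"
    action, location, device = fields
    handler = handlers.get(action)
    if handler is None:
        return f"error: unsupported action {action}"
    return handler(location, device)
```

With a handler registered for "CommandOpen", the command from this example would be routed to that handler along with the location "bedroom" and the device "air conditioner".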

By recognizing the wake-up information and execution content in the user's voice input, the voice control method of the present invention starts the voice control flow, sends the operation command spoken by the user (the execution content) to the target device in a preset manner, and thereby controls the target device.

More importantly, the present invention uses voice breakpoint detection, wake-up information detection, speech recognition, and natural language processing to provide a novel intelligent voice interaction environment. The user does not need to operate the target device manually, which reduces user operations and lets the user use the voice interaction function more efficiently and conveniently.

In addition to the voice control method described above, the present invention also provides a voice control device.

Fig. 2 is an architecture diagram of the voice control device of the present invention. Referring to Fig. 2, the voice control device comprises an audio receiving module M210, a breakpoint detection module M220, a wake-up information judging module M230, an execution content information acquisition module M240, a conversion module M250, and an execution module M260. The audio receiving module M210 receives audio data in real time.

The breakpoint detection module M220 is connected to the audio receiving module M210 and performs head judgment on the received audio data by voice breakpoint detection to obtain valid audio information.

Head judgment means using voice breakpoint detection to obtain the valid audio information while excluding information produced by noise or by abnormal voice input, which reduces the probability that the target object performs an action because of erroneous audio information.

The wake-up information judging module M230 is connected to the breakpoint detection module M220 and judges whether the valid audio information includes wake-up information; if it does, the execution content information acquisition module is called; otherwise the audio receiving module is called.

In an embodiment of the present invention, the wake-up information is preset: it can be a factory default or can be chosen by the user before use. The wake-up information is any one of a character, a word, or a sentence. For example, it can be "newly", "Xiao Ming", "my baby", and so on. Besides Chinese words, the wake-up information can also include words in other languages; no limitation is imposed here. In addition, the wake-up information described herein is the form of address used for the target object in voice input, and the target object can perform the related action according to the received voice content.

In a preferred embodiment, the wake-up information judging module M230 further comprises a transmission unit M231 and a matching unit M233. The transmission unit M231 sends the valid audio information to a local wake-up information database. The matching unit M233 is connected to the transmission unit M231 and matches the valid audio information against the content of the local wake-up information database; when wake-up information is matched, the execution content information acquisition module M240 is called; otherwise, the audio receiving module M210 is called.

In a preferred embodiment, the transmission unit M231 also sends the obtained wake-up information and execution content information to a cloud database at the same time. The matching unit M233 also matches the wake-up information against the content of the cloud database by cloud speech recognition; if wake-up information is matched, the conversion module M250 is called; otherwise the audio receiving module M210 is called. Compared with judging whether the valid audio information includes wake-up information using only the local wake-up information database, matching the wake-up information against a data model built from the large amount of complex data held in the cloud database can effectively reduce the number of false wake-ups.

The execution content information acquisition module M240 is connected to the wake-up information judging module M230 and performs head-and-tail judgment on the valid audio information by voice breakpoint detection to obtain execution content information.

Head-and-tail judgment means that voice breakpoint detection determines not only the end position of the wake-up information, which is the start position of the execution content, but also the end position of the execution content. In this way, valid execution content information is obtained. In the prior art, by contrast, the user first speaks specific wake-up information and then waits a set time (a fixed number of seconds) before the target device automatically starts capturing the user's speech. The voice content may therefore be captured late, so that it deviates from the actual voice input or is incomplete, producing a different execution result. Voice breakpoint detection thus helps ensure that the obtained execution content is correct.

The conversion module M250 is connected to the execution content information acquisition module M240 and performs semantic parsing to convert the execution content information into standard execution command information.

In a preferred embodiment, the conversion module M250 further comprises a converting unit M251 and a parsing unit M253. The converting unit M251 converts the obtained execution content information into text-format information. The converting unit M251 can be located in the cloud database, which improves conversion efficiency; in other embodiments, it can be located in a local database. The parsing unit M253 is connected to the converting unit M251 and parses the text-format information into standard execution command information.

The execution module M260 is connected to the conversion module M250, executes the related command according to the standard execution command information, and presents the execution result to the user.

The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. A voice control method, characterized by comprising the steps of:

(a) receiving audio data in real time;

(b) performing head judgment on the received audio data by voice breakpoint detection to obtain valid audio information;

(c) judging whether the valid audio information includes wake-up information; if it does, proceeding to step (d); otherwise returning to step (a);

(d) performing head-and-tail judgment on the valid audio information by voice breakpoint detection to obtain execution content information;

(e) performing semantic parsing to convert the execution content information into standard execution command information;

(f) executing the related command according to the standard execution command information and presenting the execution result to the user.

2. The voice control method according to claim 1, characterized in that step (c) further comprises the following steps:

3. The voice control method according to claim 2, characterized in that steps (d) and (e) further comprise the following steps:

4. The voice control method according to claim 1, characterized in that step (e) further comprises the following steps:

converting the obtained execution content information into text-format information;

5. The voice control method according to claim 1, characterized in that the wake-up information is any one of a character, a word, or a sentence.

6. A voice control device, characterized by comprising an audio receiving module, a breakpoint detection module, a wake-up information judgment module, an execution content information acquisition module, a conversion module, and an execution module; wherein

the audio receiving module is configured to receive audio data in real time;

the breakpoint detection module is connected to the audio receiving module and is configured to perform head judgment on the received audio data by voice breakpoint detection, to obtain valid audio information;

the wake-up information judgment module is connected to the breakpoint detection module and is configured to judge whether the valid audio information contains wake-up information; if so, the execution content information acquisition module is called; otherwise, the audio receiving module is called;

the execution content information acquisition module is connected to the wake-up information judgment module and is configured to perform head and tail judgment on the valid audio information by voice breakpoint detection, to obtain execution content information;

the conversion module is connected to the execution content information acquisition module and is configured to perform semantic parsing to convert the execution content information into standard execution command information;

the execution module is connected to the conversion module and is configured to execute the relevant command according to the standard execution command information and to display the execution result to the user.

7. The voice control device according to claim 6, characterized in that the wake-up information judgment module further comprises a transmitting unit and a matching unit; the transmitting unit is configured to send the valid audio information to a local wake-up information database; the matching unit is connected to the transmitting unit and is configured to match the valid audio information against the contents of the local wake-up information database; when wake-up information is matched, the execution content information acquisition module is called; otherwise, the audio receiving module is called.

8. The voice control device according to claim 7, characterized in that the transmitting unit is further configured to send the obtained wake-up information and the execution content information to a cloud database simultaneously; the matching unit is further configured to match the wake-up information against the contents of the cloud database by cloud speech recognition; if a match is found, the conversion module is called; otherwise, the audio receiving module is called.

9. The voice control device according to claim 6, characterized in that the conversion module further comprises a converting unit and a parsing unit; the converting unit is configured to convert the obtained execution content information into text-format information; the parsing unit is connected to the converting unit and is configured to parse the text-format information into standard execution command information.

10. The voice control device according to claim 6, characterized in that the wake-up information is any one of a character, a word, or a sentence.
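The control flow of steps (a) to (d) of claim 1 can be illustrated with a simplified sketch. It rests on strong assumptions that are not part of the claims: each audio frame is modeled as an already recognized word, or `None` for silence, so that breakpoint detection reduces to locating silence boundaries; the wake word `"hello"` and the function name `control_loop` are hypothetical. Steps (e) and (f) are left to the caller, which receives each piece of execution content as text.

```python
from typing import Iterable, Iterator, List, Optional

WAKE_WORD = "hello"  # hypothetical wake-up information


def control_loop(frames: Iterable[Optional[str]],
                 wake_word: str = WAKE_WORD) -> Iterator[str]:
    """Yield the execution content that follows each occurrence of the wake word."""
    stream = iter(frames)                 # step (a): frames arrive one at a time
    for frame in stream:
        if frame is None:                 # step (b): still silence, no head found yet
            continue
        if frame != wake_word:            # step (c): no wake-up information, keep listening
            continue
        content: List[str] = []
        for next_frame in stream:         # step (d): head and tail of the command
            if next_frame is None:
                if content:               # silence after speech marks the tail
                    break
                continue                  # pause between wake word and command
            content.append(next_frame)
        if content:
            yield " ".join(content)       # handed on to steps (e) and (f)
```

In this sketch the command is taken from the same stream immediately after the wake word, so the user does not have to wait for a separate listening phase. A real device would add a timeout for the case where the wake word is followed only by silence; the sketch omits this.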