Summary of the invention
Embodiments of the present invention provide a voice wake-up method and apparatus to reduce the power consumption of a terminal in a noisy environment.
In a first aspect, an embodiment of the present invention provides a voice wake-up method, comprising:
Periodically sampling a sound signal, wherein a sampled signal y_i is obtained by sampling at moment t_i, and i is a positive integer;
Calculating the audio power T_i of the sampled signal y_i;
When the audio power T_i is greater than or equal to the first threshold A_0 of moment t_i, performing voice activity detection (VAD);
When the VAD detection fails, VAD detection has also failed n consecutive times before moment t_i, and the difference between a first noise energy S_0 and the first threshold A_0 of moment t_i is greater than a first preset threshold value M_0, generating a second threshold A_1 according to the first noise energy S_0 and using the second threshold A_1 as the first threshold A_0 of moment t_{i+1}, wherein the first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points ys, x is a natural number greater than 1, and n is a positive integer less than i.
With reference to the first aspect, in a first possible implementation of the first aspect, generating the second threshold A_1 according to the first noise energy S_0 comprises:
Using the first noise energy S_0 as the second threshold A_1;
Or using the sum of the first noise energy S_0 and a first preset correction value N_0 as the second threshold A_1;
Or using the product of the first noise energy S_0 and a first preset coefficient a_0 as the second threshold A_1.
With reference to the first aspect, in a second possible implementation of the first aspect, after calculating the audio power T_i of the sampled signal y_i, the method further comprises:
When the audio power T_i is less than the first threshold A_0 of moment t_i, and the differences between the respective first thresholds A_0 of the moments from t_{i-m} to t_i and a second noise energy F_0 are all greater than a second preset threshold value M_1, performing VAD, wherein m is a positive integer less than i;
When the VAD detection succeeds, generating a third threshold A_2 according to the second noise energy F_0 and using the third threshold A_2 as the first threshold A_0 of moment t_{i+1}, wherein the second noise energy F_0 is obtained by decimating the sampled signal y_i at a second decimation rate 1/z and applying fast tracking filtering to the decimated sampling points yf, and z is a natural number greater than x.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, generating the third threshold A_2 according to the second noise energy F_0 comprises:
Using the second noise energy F_0 as the third threshold A_2;
Or using the sum of the second noise energy F_0 and a second preset correction value N_1 as the third threshold A_2;
Or using the product of the second noise energy F_0 and a second preset coefficient a_1 as the third threshold A_2.
With reference to the second or third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, before using the third threshold A_2 as the first threshold A_0 of moment t_{i+1}, the method further comprises:
Recording moment t_i as a threshold-lowering moment;
When the interval between moment t_i and the previous threshold-lowering moment is greater than a preset value T_time, performing the step of using the third threshold A_2 as the first threshold A_0 of moment t_{i+1}; otherwise, not performing that step.
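The interval check in this implementation can be sketched as follows. This is a minimal illustration, not the patented circuit: the function name and parameters are hypothetical, and it assumes that the lowering is applied when no earlier threshold-lowering moment exists and that the recorded moment is updated only when the lowering is actually applied. The text does not state either point.

```python
from typing import Optional, Tuple


def maybe_lower_threshold(
    current: float,
    candidate: float,
    now: float,
    last_lowering: Optional[float],
    min_interval: float,
) -> Tuple[float, Optional[float]]:
    """Return (first threshold for t_{i+1}, last threshold-lowering moment).

    current       -- first threshold A_0 of moment t_i
    candidate     -- lowered threshold (A_2, or A_3 in the seventh implementation)
    now           -- moment t_i
    last_lowering -- previous threshold-lowering moment, or None (assumption)
    min_interval  -- preset value T_time
    """
    if last_lowering is None or now - last_lowering > min_interval:
        return candidate, now          # lowering applied at t_i
    return current, last_lowering      # too soon: keep the current threshold


# Example with hypothetical values: T_time = 30 time units.
print(maybe_lower_threshold(70.0, 55.0, 120.0, 100.0, 30.0))  # too soon
print(maybe_lower_threshold(70.0, 55.0, 131.0, 100.0, 30.0))  # applied
```

Limiting how often the threshold may be lowered keeps it from following brief quiet gaps in an otherwise noisy environment.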
With reference to the first aspect, in a fifth possible implementation of the first aspect, after calculating the audio power T_i of the sampled signal y_i, the method further comprises:
When the audio power T_i is less than the first threshold A_0 of moment t_i, and the difference between the first threshold A_0 of moment t_i and the first noise energy S_0 is greater than a third preset threshold value M_2, generating a fourth threshold A_3 according to the first noise energy S_0 and using the fourth threshold A_3 as the first threshold A_0 of moment t_{i+1}.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, generating the fourth threshold A_3 according to the first noise energy S_0 comprises:
Using the first noise energy S_0 as the fourth threshold A_3;
Or using the sum of the first noise energy S_0 and a third preset correction value N_2 as the fourth threshold A_3;
Or using the product of the first noise energy S_0 and a third preset coefficient a_2 as the fourth threshold A_3.
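The fifth and sixth implementations can be illustrated with a short sketch. The function name is hypothetical, the "difference" is taken here as A_0 - S_0 (an assumption about the sign), and only the additive option A_3 = S_0 + N_2 is shown; with N_2 = 0 it reduces to A_3 = S_0.

```python
def lower_threshold_in_quiet(
    audio_power: float,
    threshold: float,
    slow_noise: float,
    m2: float,
    correction: float = 0.0,
) -> float:
    """Return the first threshold A_0 for moment t_{i+1}.

    audio_power -- T_i
    threshold   -- first threshold A_0 of moment t_i
    slow_noise  -- first noise energy S_0
    m2          -- third preset threshold value M_2
    correction  -- third preset correction value N_2
    """
    # Assumption: the difference is measured as A_0 - S_0.
    if audio_power < threshold and threshold - slow_noise > m2:
        return slow_noise + correction   # fourth threshold A_3
    return threshold


# Hypothetical values: the threshold sits 25 above the steady noise level.
print(lower_threshold_in_quiet(40.0, 70.0, 45.0, m2=10.0, correction=5.0))
```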
With reference to the fifth or sixth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, before using the fourth threshold A_3 as the first threshold A_0 of moment t_{i+1}, the method further comprises:
Recording moment t_i as a threshold-lowering moment;
When the interval between moment t_i and the previous threshold-lowering moment is greater than a preset value T_time, performing the step of using the fourth threshold A_3 as the first threshold A_0 of moment t_{i+1}; otherwise, not performing that step.
In a second aspect, an embodiment of the present invention provides a voice wake-up apparatus, comprising:
A sample rate converter (SRC), configured to periodically sample a sound signal, wherein a sampled signal y_i is obtained by sampling at moment t_i, and i is a positive integer;
A computing circuit, configured to calculate the audio power T_i of the sampled signal y_i;
A threshold decision circuit, configured to determine whether the audio power T_i is greater than or equal to the first threshold A_0 of moment t_i and, when it is, to trigger an interrupt processing circuit to output an interrupt pulse signal to an interrupt control circuit, so that the interrupt control circuit enables a digital signal processor (DSP) or a processor to perform voice activity detection (VAD);
A first decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a first decimation rate 1/x to obtain sampling points ys, wherein x is a natural number greater than 1;
A slow tracking filter (STF), whose input is coupled to the output of the first decimator, configured to apply slow tracking filtering to the decimated sampling points ys to obtain a first noise energy S_0;
A comparator, whose input is coupled to the output of the STF and to the threshold decision circuit, configured to determine whether the difference between the first noise energy S_0 and the first threshold A_0 of moment t_i is greater than a first preset threshold value M_0;
A configurator, configured to, when the VAD detection fails, VAD detection has also failed n consecutive times before moment t_i, and the difference between the first noise energy S_0 and the first threshold A_0 of moment t_i is greater than the first preset threshold value M_0, generate a second threshold A_1 according to the first noise energy S_0 and deliver the second threshold A_1 to the threshold decision circuit as the first threshold A_0 of moment t_{i+1}, wherein n is a positive integer less than i.
With reference to the second aspect, in a first possible implementation of the second aspect, the configurator is specifically configured to:
Use the first noise energy S_0 as the second threshold A_1;
Or use the sum of the first noise energy S_0 and a first preset correction value N_0 as the second threshold A_1;
Or use the product of the first noise energy S_0 and a first preset coefficient a_0 as the second threshold A_1.
With reference to the second aspect, in a second possible implementation of the second aspect, the apparatus further comprises:
A second decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a second decimation rate 1/z to obtain sampling points yf, wherein z is a natural number greater than x;
A fast tracking filter (FTF), whose input is coupled to the output of the second decimator, configured to apply fast tracking filtering to the decimated sampling points yf to obtain a second noise energy F_0;
The comparator, whose input is further coupled to the output of the FTF, is further configured to, when the audio power T_i is less than the first threshold A_0 of moment t_i, determine whether the difference between the first threshold of each moment and the second noise energy F_0 is greater than a second preset threshold value M_1; and, when the differences between the respective first thresholds A_0 of the moments from t_{i-m} to t_i and the second noise energy F_0 are all greater than the second preset threshold value M_1, trigger the interrupt processing circuit to output an interrupt pulse signal to the interrupt control circuit, so that the interrupt control circuit enables the DSP or the processor to perform VAD, wherein m is a positive integer less than i;
The configurator is further configured to, when the VAD detection succeeds, generate a third threshold A_2 according to the second noise energy F_0 and deliver the third threshold A_2 to the threshold decision circuit as the first threshold A_0 of moment t_{i+1}.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the configurator is specifically configured to:
Use the second noise energy F_0 as the third threshold A_2;
Or use the sum of the second noise energy F_0 and a second preset correction value N_1 as the third threshold A_2;
Or use the product of the second noise energy F_0 and a second preset coefficient a_1 as the third threshold A_2.
With reference to the second or third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the configurator is further configured to:
Record moment t_i as a threshold-lowering moment;
When the interval between moment t_i and the previous threshold-lowering moment is greater than a preset value T_time, perform the step of using the third threshold A_2 as the first threshold A_0 of moment t_{i+1}; otherwise, not perform that step.
With reference to the second aspect, in a fifth possible implementation of the second aspect, the configurator is further configured to:
When the audio power T_i is less than the first threshold A_0 of moment t_i, and the difference between the first threshold A_0 of moment t_i and the first noise energy S_0 is greater than a third preset threshold value M_2, generate a fourth threshold A_3 according to the first noise energy S_0 and use the fourth threshold A_3 as the first threshold A_0 of moment t_{i+1}.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the configurator is specifically configured to:
Use the first noise energy S_0 as the fourth threshold A_3;
Or use the sum of the first noise energy S_0 and a third preset correction value N_2 as the fourth threshold A_3;
Or use the product of the first noise energy S_0 and a third preset coefficient a_2 as the fourth threshold A_3.
With reference to the fifth or sixth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the configurator is further configured to:
Record moment t_i as a threshold-lowering moment;
When the interval between moment t_i and the previous threshold-lowering moment is greater than a preset value T_time, perform the step of using the fourth threshold A_3 as the first threshold A_0 of moment t_{i+1}; otherwise, not perform that step.
Embodiments of the present invention provide a voice wake-up method and apparatus. The audio power T_i of the sampled signal y_i sampled at moment t_i is obtained, and VAD is performed when this audio power T_i is greater than or equal to the first threshold A_0 of moment t_i. When the VAD detection fails, VAD detection has also failed n consecutive times before moment t_i, and the difference between the first noise energy S_0 and the first threshold A_0 of moment t_i is greater than the first preset threshold value M_0, the first threshold A_0 is adjusted to obtain the first threshold A_0 of moment t_{i+1}: a second threshold A_1 is generated according to the first noise energy S_0 and used as the first threshold A_0 of moment t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points ys; that is, the first threshold A_0 of moment t_{i+1} is derived from the first noise energy S_0 of moment t_i. In this way, the terminal can adjust the first threshold A_0 of the next moment according to the current ambient noise, so that the first threshold A_0 of each moment matches the environment. This reduces the number of times VAD is performed and thereby lowers the power consumption of the terminal in a noisy environment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Voice wake-up means that, in any situation, a terminal can be activated by a predefined wake-up word and made to run a specific application. This is similar to a user pressing a key to light the screen and activate the phone. The advantage of voice wake-up is that it frees the user's hands.
In the voice wake-up scheme of one smartphone, the standby power consumption of the phone is about 2.2 mA × 3.8 V in a quiet environment and about 5.5 mA × 3.8 V in a noisy environment. The difference in power consumption between the noisy and quiet environments is therefore about 12 mW: (5.5 - 2.2) × 3.8 ≈ 12.5.
According to the power consumption estimation model, average power consumption = quiet power consumption × 70% + noisy power consumption × 30%. Reducing the power consumption in noisy environments therefore deserves attention, and the embodiments of the present invention focus on optimizing power consumption in noisy environments.
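The arithmetic above can be checked with a few lines of code. The currents, the voltage, and the 70%/30% weighting are taken from the text; the variable names are illustrative only.

```python
# Standby current (mA) and supply voltage (V) quoted in the text.
VOLTAGE_V = 3.8
QUIET_MA = 2.2
NOISY_MA = 5.5

quiet_mw = QUIET_MA * VOLTAGE_V          # mA x V = mW
noisy_mw = NOISY_MA * VOLTAGE_V
difference_mw = noisy_mw - quiet_mw

# Estimation model from the text: 70% of the time quiet, 30% noisy.
average_mw = 0.7 * quiet_mw + 0.3 * noisy_mw
noisy_share = 0.3 * noisy_mw / average_mw

print(f"quiet = {quiet_mw:.2f} mW, noisy = {noisy_mw:.2f} mW")
print(f"difference = {difference_mw:.2f} mW, average = {average_mw:.2f} mW")
print(f"noisy share of the average = {noisy_share:.0%}")
```

Under this model the noisy environment accounts for roughly half of the average standby power even though it is weighted at only 30% of the time, which is why the noisy case is the one worth optimizing.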
Embodiments of the present invention provide a method and apparatus for waking up, by voice, a digital signal processor (DSP) in a terminal, so as to reduce the number of times the DSP in the terminal is woken up to perform VAD and thereby reduce the power consumption of the terminal in a noisy environment.
Fig. 1 is a flowchart of Embodiment 1 of the voice wake-up method of the present invention. The method may be performed by a voice wake-up apparatus, which may be implemented in hardware. The voice wake-up apparatus may be integrated in a terminal such as a tablet computer, a smartphone, or a personal digital assistant (PDA). As shown in Fig. 1, the voice wake-up method comprises:
S101: Periodically sample a sound signal, wherein a sampled signal y_i is obtained by sampling at moment t_i, and i is a positive integer.
Similarly, the sampled signal of moment t_{i-1} may be denoted y_{i-1}, the sampled signal of moment t_{i+1} may be denoted y_{i+1}, and so on; these are not enumerated here.
In any embodiment of the present invention, the sound signal may be a signal collected by a sound collection device such as a microphone. A sample rate converter (SRC) periodically samples the sound signal collected by the sound collection device. Alternatively, the collected sound signal may first be passed through a filter, such as a band-pass filter, and then periodically sampled by the SRC; the embodiments of the present invention impose no limitation on this.
S102: Calculate the audio power T_i of the sampled signal y_i.
It should be noted that the audio power of a sampled signal can be calculated as soon as that sampled signal has been obtained. For example, after the sampled signal y_{i-1} is obtained by sampling at moment t_{i-1}, the corresponding audio power T_{i-1} can likewise be calculated.
A person skilled in the art will understand that, because the sampled signal y_i is determined, its audio power T_i can be obtained by calculation.
Specifically, let x(j) denote the amplitude of the j-th sampling point in the sampled signal y_i, so that x(j) × x(j) represents the energy of the sampled signal y_i at the j-th sampling point; j is an integer from 0 to M-1, and M is the total number of sampling points. The coefficient a_j represents the weight of each sampling point, and T_i denotes the audio power of the sampled signal y_i. For example, the formula below is a normalization, expressing the percentage of the overall energy that each sampling point accounts for:
Wherein,
This is only one example of calculating the audio power T_i of the sampled signal y_i, and the embodiments of the present invention are not limited to it. The audio power T_i of the sampled signal y_i may also be obtained by a root mean square (RMS) calculation or in another similar way, for example without normalization.
S103: When the audio power T_i is greater than or equal to the first threshold A_0 of moment t_i, perform VAD.
The VAD may specifically be performed by an element in the terminal such as a DSP or a processor.
S104: When the VAD detection fails, VAD detection has also failed n consecutive times before moment t_i, and the difference between the first noise energy S_0 and the first threshold A_0 of moment t_i is greater than the first preset threshold value M_0, generate a second threshold A_1 according to the first noise energy S_0 and use the second threshold A_1 as the first threshold A_0 of moment t_{i+1}, wherein the first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points ys, x is a natural number greater than 1, and n is a positive integer less than i.
It should be noted that "the VAD detection fails and VAD detection has failed n consecutive times before moment t_i" means that the VAD detection performed at moment t_i fails and the VAD detections performed from moment t_{i-n} to moment t_{i-1} all failed. For example, if n is 2, the VAD detection at moment t_i fails and the VAD detections at the two preceding moments (t_{i-2} and t_{i-1}) failed consecutively. To aid understanding of the technical solution, a VAD detection failure can be illustrated as follows. Suppose the current sound is that of a car engine. Because the audio power of this sound is greater than the first threshold A_0 of the current moment, VAD must be performed; VAD then determines that the sound is not the user's voice, so the VAD detection fails. In other words, if the terminal is in a high-noise environment, the energy of the ambient noise is correspondingly high, and whenever it exceeds the first threshold A_0 of the current moment, VAD has to be started. Because the ambient noise itself is disordered, no valid voice signal can be detected in it, and the VAD detection fails. The first noise energy S_0 represents the energy level of the steady-state noise of the environment in which the terminal is located. The first threshold value M_0 is a preset parameter that can be determined through tuning.
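The bookkeeping described in S103 and S104 can be sketched as a small state update. This is a simplified model under stated assumptions, not the embodiment's implementation: the names are hypothetical, the "difference" is taken as S_0 - A_0, a moment at which VAD is not triggered or succeeds is assumed to break the failure streak, and the second threshold is generated as A_1 = S_0 + N_0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WakeState:
    threshold: float       # first threshold A_0 for the current moment
    failures_before: int   # consecutive VAD failures immediately before it


def step(
    state: WakeState,
    audio_power: float,
    vad_detects_voice: bool,
    slow_noise: float,
    n: int,
    m0: float,
    n0: float = 0.0,
) -> WakeState:
    """Process moment t_i and return the state for moment t_{i+1}.

    vad_detects_voice is only consulted when VAD is actually triggered.
    """
    if audio_power < state.threshold:
        # S103 not satisfied, VAD is not run (assumed to reset the streak).
        return WakeState(state.threshold, 0)
    if vad_detects_voice:
        # VAD succeeded; the terminal would be woken up here.
        return WakeState(state.threshold, 0)
    # VAD failed at t_i.
    threshold = state.threshold
    if state.failures_before >= n and slow_noise - threshold > m0:
        threshold = slow_noise + n0   # second threshold A_1 (assumed form)
    return WakeState(threshold, state.failures_before + 1)


# Engine noise at power 65 keeps triggering VAD against a threshold of 50.
state = WakeState(threshold=50.0, failures_before=0)
for _ in range(3):
    state = step(state, 65.0, False, slow_noise=60.0, n=2, m0=5.0)
    print(state)
```

With n = 2, the threshold stays at 50 for the first two failures and is raised on the third, when two consecutive failures already precede the current one.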
It should also be noted that, in any embodiment of the present invention, "first" and "second" serve only to distinguish instances of the same term. For example, "first" in "first threshold" and "second" in "second threshold" merely name different thresholds and do not imply any order among them.
In practice, noise levels differ across application scenarios. For example, the noise in a quiet environment is about 30 to 35 decibels (dB). For noisy environments, the following figures may serve as a reference: shopping mall, about 60 dB; road, about 70 dB; aircraft cabin, about 70 dB; bus, about 80 dB; subway, about 90 dB; and so on. The noise level at the same place also varies with time; for example, daytime and nighttime noise at the same place may differ by 10 to 15 dB.
Moreover, when a user makes a call or talks in a noisy environment, the user subconsciously speaks louder, which improves the signal-to-noise ratio (SNR) and provides a basis for the feasibility of voice wake-up.
Existing voice wake-up schemes use a uniform noise gate, that is, a preset threshold, and therefore cannot treat quiet and noisy environments differently when the terminal is woken up by voice. If the preset threshold is set too high, voice may be missed; if it is set too low, the processor is woken up frequently, which leads to high power consumption.
In the embodiments of the present invention, the first threshold A_0 of each moment is adjusted in a timely manner.
Specifically, S101 to S103 obtain the audio power T_i of the sampled signal y_i sampled at moment t_i and compare it with the first threshold A_0 of moment t_i. When the audio power T_i is greater than or equal to the first threshold A_0 of moment t_i, VAD is performed: the DSP, processor, or similar element performs VAD and decides, according to the VAD result, whether to wake up the terminal. VAD detection succeeds when the element performing VAD detects the user's voice in the sampled signal y_i, in which case the terminal is woken up. Otherwise VAD detection fails: the element does not detect the user's voice in the sampled signal y_i, and the terminal is not woken up.
In S104, when the difference between the first noise energy S_0 and the first threshold A_0 of moment t_i is greater than the first preset threshold value M_0, the terminal is likely to be in an environment with high background noise. In this case, a second threshold A_1 is generated according to the first noise energy S_0 and used as the first threshold A_0 of moment t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points ys, where x is a natural number greater than 1 and n is a positive integer less than i. In practice, the sampled signal y_i may contain both the user's voice and the ambient noise at moment t_i, or only the ambient noise at moment t_i. The first threshold A_0 of moment t_{i+1} is obtained at moment t_i; it is the first threshold that the terminal uses in S103 and S104 of the voice wake-up method at moment t_{i+1}.
If the voice wake-up at moment t_i is the first voice wake-up, the first threshold A_0 of moment t_i may be a preset value. The preset first threshold A_0 can be regarded as a tuned parameter corresponding to one possible application scenario; for example, a first threshold A_0 preset to 50 dB can be regarded as the background noise gate for a quiet environment. Fig. 2 illustrates the first threshold in a quiet environment and in a noisy environment. As shown in Fig. 2, in a quiet environment the first threshold exceeds the ambient noise by a first preset value, and in a noisy environment it exceeds the ambient noise by a second preset value. In addition, the first threshold in a noisy environment is higher than the first threshold in a quiet environment.
In addition, S103 may alternatively be any of the following:
1) When the difference between the audio power T_i and the audio power T_{i-1} of moment t_{i-1} is greater than or equal to the differential threshold A_00 of moment t_i, perform VAD;
2) When the audio power T_i is greater than or equal to the first threshold A_0 of moment t_i, and the difference between the audio power T_i and the audio power T_{i-1} of moment t_{i-1} is greater than or equal to the differential threshold A_00 of moment t_i, perform VAD;
3) When either of the following is satisfied, perform VAD: the audio power T_i is greater than or equal to the first threshold A_0 of moment t_i, or the difference between the audio power T_i and the audio power T_{i-1} of moment t_{i-1} is greater than or equal to the differential threshold A_00 of moment t_i.
The audio power T_{i-1} of moment t_{i-1} is cached in the terminal; it was obtained at moment t_{i-1} by calculating the audio power of the sampled signal y_{i-1}.
In case 1), the differential threshold A_00 of moment t_i is adjusted in a way similar to the adjustment of the first threshold A_0 of moment t_i. In case 2), both the first threshold A_0 and the differential threshold A_00 of moment t_i are adjusted in that way. In case 3), either the first threshold A_0 or the differential threshold A_00 of moment t_i is adjusted in that way.
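The three alternative trigger conditions differ only in how the level test and the jump test are combined, as the following sketch shows. The function name and the integer `variant` argument are illustrative; `variant` follows the numbering 1), 2), 3) used above.

```python
def should_run_vad(
    variant: int,
    power_now: float,
    power_prev: float,
    level_threshold: float,
    diff_threshold: float,
) -> bool:
    """Decide whether to perform VAD at moment t_i.

    power_now       -- T_i
    power_prev      -- cached T_{i-1}
    level_threshold -- first threshold A_0 of moment t_i
    diff_threshold  -- differential threshold A_00 of moment t_i
    """
    level_ok = power_now >= level_threshold
    jump_ok = (power_now - power_prev) >= diff_threshold
    if variant == 1:
        return jump_ok
    if variant == 2:
        return level_ok and jump_ok
    if variant == 3:
        return level_ok or jump_ok
    raise ValueError("variant must be 1, 2, or 3")


# A sudden rise from 40 to 55 that stays below a level threshold of 60.
for variant in (1, 2, 3):
    print(variant, should_run_vad(variant, 55.0, 40.0, 60.0, 10.0))
```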
In this embodiment of the present invention, the audio power T_i of the sampled signal y_i sampled at moment t_i is obtained, and VAD is performed when this audio power T_i is greater than or equal to the first threshold A_0 of moment t_i. When the VAD detection fails, VAD detection has also failed n consecutive times before moment t_i, and the difference between the first noise energy S_0 and the first threshold A_0 of moment t_i is greater than the first preset threshold value M_0, the first threshold A_0 is adjusted to obtain the first threshold A_0 of moment t_{i+1}: a second threshold A_1 is generated according to the first noise energy S_0 and used as the first threshold A_0 of moment t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points ys; that is, the first threshold A_0 of moment t_{i+1} is derived from the first noise energy S_0 of moment t_i. In this way, the terminal can adjust the first threshold A_0 of the next moment according to the current ambient noise, so that the first threshold A_0 of each moment matches the environment. This reduces the number of times VAD is performed and thereby lowers the power consumption of the terminal in a noisy environment.
In the above embodiment, generating the second threshold A_1 according to the first noise energy S_0 may comprise: using the first noise energy S_0 as the second threshold A_1; or using the sum of the first noise energy S_0 and a first preset correction value N_0 as the second threshold A_1, that is, A_1 = S_0 + N_0; or using the product of the first noise energy S_0 and a first preset coefficient a_0 as the second threshold A_1, that is, A_1 = a_0 × S_0.
If the first correction value N_0 is large, the second threshold A_1 rises quickly above the first noise energy S_0; if N_0 is small, A_1 rises slowly. The rate of increase can be set according to actual requirements. The size of the first correction value N_0 can be set according to the actual scenario, and the embodiments of the present invention impose no limitation on it. Likewise, if the first coefficient a_0 is large, the second threshold A_1 rises quickly above the first noise energy S_0; if a_0 is small, A_1 rises slowly. The size of the first coefficient a_0 can also be set according to the actual scenario, and the embodiments of the present invention impose no limitation on it.
Optionally, the product of the first noise energy S_0 and the first preset coefficient a_0, plus the first preset correction value N_0, may be used as the second threshold A_1: A_1 = a_0 × S_0 + N_0.
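The options for generating the second threshold are all special cases of the combined form A_1 = a_0 × S_0 + N_0, as this short sketch illustrates. The function name and the default values (coefficient 1, correction 0) are illustrative choices, not values given in the text.

```python
def generate_threshold(
    noise_energy: float,
    coefficient: float = 1.0,
    correction: float = 0.0,
) -> float:
    """Combined form: threshold = coefficient * noise_energy + correction."""
    return coefficient * noise_energy + correction


s0 = 60.0  # hypothetical first noise energy S_0
print(generate_threshold(s0))                                   # A_1 = S_0
print(generate_threshold(s0, correction=5.0))                   # A_1 = S_0 + N_0
print(generate_threshold(s0, coefficient=1.5))                  # A_1 = a_0 * S_0
print(generate_threshold(s0, coefficient=1.5, correction=5.0))  # A_1 = a_0 * S_0 + N_0
```

The same helper would apply to the third and fourth thresholds by passing F_0 or S_0 together with the corresponding coefficient and correction value.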
Fig. 3 is a flowchart of Embodiment 2 of the voice wake-up method of the present invention. As shown in Fig. 3, the method may comprise:
S301: Periodically sample a sound signal, wherein a sampled signal y_i is obtained by sampling at moment t_i, and i is a positive integer.
S302: Calculate the audio power T_i of the sampled signal y_i.
S303: When the audio power T_i is less than the first threshold A_0 of moment t_i, and the differences between the respective first thresholds A_0 of the moments from t_{i-m} to t_i and the second noise energy F_0 are all greater than a second preset threshold value M_1, perform VAD, wherein m is a positive integer less than i.
For example, if m = 2, VAD is performed when the audio power T_i is less than the first threshold A_0 of moment t_i, the difference between the first threshold A_0 of moment t_{i-2} and the second noise energy F_0 is greater than the second threshold value M_1, the difference between the first threshold A_0 of moment t_{i-1} and the second noise energy F_0 is greater than the second threshold value M_1, and the difference between the first threshold A_0 of moment t_i and the second noise energy F_0 is greater than the second threshold value M_1.
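The condition in S303 amounts to a check over the last m+1 first thresholds, which can be sketched as follows. The names are hypothetical, the "difference" is taken as A_0 - F_0, and a single value of F_0 is compared against every threshold in the window; the text does not say whether F_0 is re-evaluated at each moment, so that is an assumption.

```python
from typing import Sequence


def should_probe_with_vad(
    audio_power: float,
    thresholds: Sequence[float],
    fast_noise: float,
    m: int,
    m1: float,
) -> bool:
    """Check the S303 condition at moment t_i.

    thresholds -- first thresholds A_0, oldest first, ending with moment t_i
    fast_noise -- second noise energy F_0 (single value, an assumption)
    m1         -- second preset threshold value M_1
    """
    if len(thresholds) < m + 1:
        return False                      # not enough history yet
    window = thresholds[-(m + 1):]        # moments t_{i-m} ... t_i
    if audio_power >= window[-1]:
        return False                      # S103 applies instead
    return all(a0 - fast_noise > m1 for a0 in window)


# m = 2: the thresholds of t_{i-2}, t_{i-1}, and t_i are all checked.
print(should_probe_with_vad(40.0, [70.0, 70.0, 70.0], 50.0, m=2, m1=10.0))
```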
S304: When the VAD detection succeeds, generate a third threshold A_2 according to the second noise energy F_0 and use the third threshold A_2 as the first threshold A_0 of moment t_{i+1}, wherein the second noise energy F_0 is obtained by decimating the sampled signal y_i at a second decimation rate 1/z and applying fast tracking filtering to the decimated sampling points yf, and z is a natural number greater than x.
For details of S301 and S302, refer to the embodiment shown in Fig. 1; they are not repeated here.
As for S303, when the audio energy T_i is less than the first threshold A_0 at moment t_i, a prior-art voice wake-up scheme no longer performs VAD, so the user's voice may go undetected. For example, the first threshold A_0 at moment t_i suits a noisy environment, but the terminal is now in a relatively quiet environment (for example, one with low background noise), so the user's voice in the sampled signal y_i is missed. The embodiment of the present invention changes the first threshold A_0 at moment t_{i+1} through S303 and S304 so that it matches the current environment.
When the difference between the first threshold A_0 and the second noise energy F_0 at each moment from t_{i-m} to t_i is greater than the preset second threshold value M_1, that is, when this situation has occurred m+1 times in total, the terminal is in a quiet environment (one with low background noise) and the current first threshold A_0 is too large and needs to be lowered to match the quiet environment. The second threshold value M_1 is a preset parameter that can be obtained through tuning.
As for S304, a successful VAD indicates that the sampled signal y_i contains the user's voice. To avoid missing this voice, the third threshold A_2 is generated according to the second noise energy F_0 and is used as the first threshold A_0 at moment t_{i+1}. Because the second noise energy F_0 is obtained by decimating the sampled signal y_i at the second decimation rate 1/z and applying fast tracking filtering to the decimated samples yf, it reflects, to some extent, the energy level of the transient noise in the terminal's environment.
In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained at moment t_i is calculated. When the audio energy T_i is less than the first threshold A_0 at moment t_i, and the difference between the first threshold A_0 and the second noise energy F_0 at each moment from t_{i-m} to t_i is greater than the preset second threshold value M_1, VAD is performed. When VAD succeeds, the third threshold A_2 is generated according to the second noise energy F_0 and is used as the first threshold A_0 at moment t_{i+1}. The second noise energy F_0 is obtained by decimating the sampled signal y_i at the second decimation rate 1/z and applying fast tracking filtering to the decimated samples yf; in other words, the first threshold A_0 at moment t_{i+1} is derived from the second noise energy F_0 at moment t_i. The terminal can therefore adjust the first threshold A_0 of the next moment according to the current ambient noise, so that the first threshold A_0 at each moment matches the environment. This reduces the number of VAD runs and thus the terminal's power consumption in a noisy environment, and further avoids missing the user's voice in the sampled signal y_i.
In the above embodiment, generating the third threshold A_2 according to the second noise energy F_0 may specifically comprise: using the second noise energy F_0 as the third threshold A_2; or using the sum of the second noise energy F_0 and a preset second correction N_1 as the third threshold A_2, that is, A_2 = F_0 + N_1; or using the product of the second noise energy F_0 and a preset second coefficient a_1 as the third threshold A_2, that is, A_2 = a_1 × F_0.
If the second correction N_1 is large, the third threshold A_2 is raised quickly on the basis of the second noise energy F_0; if the second correction N_1 is small, the third threshold A_2 is raised slowly on the basis of the second noise energy F_0. How quickly it is raised can be set according to actual requirements. The value of the second correction N_1 can be set according to the actual scenario and is not limited in the embodiments of the present invention. Likewise, if the second coefficient a_1 is large, the third threshold A_2 is raised quickly on the basis of the second noise energy F_0; if the second coefficient a_1 is small, the third threshold A_2 is raised slowly on the basis of the second noise energy F_0. How quickly it is raised can be set according to actual requirements. The value of the second coefficient a_1 can be set according to the actual scenario and is not limited in the embodiments of the present invention.
Optionally, the product of the second noise energy F_0 and the preset second coefficient a_1, plus the preset second correction N_1, may also be used as the third threshold A_2, that is, A_2 = a_1 × F_0 + N_1.
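The four ways of deriving the third threshold A_2 from the second noise energy F_0 are all special cases of A_2 = a_1 × F_0 + N_1. The short Python sketch below shows this; the function name `generate_threshold` and the numeric values are illustrative assumptions, not taken from the source.

```python
def generate_threshold(noise_energy, coefficient=1.0, correction=0.0):
    """Return coefficient * noise_energy + correction.

    coefficient=1 and correction=0 reproduce "use the noise energy itself".
    """
    return coefficient * noise_energy + correction


f0 = 40.0  # hypothetical second noise energy F_0
a2_plain = generate_threshold(f0)                                    # A_2 = F_0
a2_offset = generate_threshold(f0, correction=5.0)                   # A_2 = F_0 + N_1
a2_scaled = generate_threshold(f0, coefficient=1.5)                  # A_2 = a_1 * F_0
a2_both = generate_threshold(f0, coefficient=1.5, correction=5.0)    # A_2 = a_1 * F_0 + N_1
print(a2_plain, a2_offset, a2_scaled, a2_both)
```

The same helper would apply unchanged to the other thresholds, with the first noise energy S_0 and the corresponding coefficient and correction substituted.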
Fig. 4 is a flowchart of embodiment three of the voice wake-up method of the present invention. As shown in Fig. 4, the method may comprise:
S401: Periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at moment t_i, and i is a positive integer.
S402: Calculate the audio energy T_i of the sampled signal y_i.
S403: When the audio energy T_i is less than the first threshold A_0 at moment t_i, and the difference between the first threshold A_0 at moment t_i and the first noise energy S_0 is greater than a preset third threshold value M_2, generate a fourth threshold A_3 according to the first noise energy S_0, and use the fourth threshold A_3 as the first threshold A_0 at moment t_{i+1}.
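S403 reduces to a single comparison per moment, which the following Python sketch illustrates. The function name `lower_first_threshold` and the example numbers are hypothetical, and the fourth threshold is assumed to take the form A_3 = S_0 + N_2, with N_2 = 0 giving A_3 = S_0.

```python
def lower_first_threshold(audio_energy, a0, s0, m2, n2=0.0):
    """S403: return the first threshold A_0 to use at moment t_{i+1}."""
    if audio_energy < a0 and a0 - s0 > m2:
        # The threshold sits far above the slowly tracked noise floor: lower it.
        return s0 + n2  # A_3
    return a0  # condition not met, keep the current threshold


print(lower_first_threshold(audio_energy=5.0, a0=100.0, s0=20.0, m2=30.0))   # 20.0
print(lower_first_threshold(audio_energy=5.0, a0=100.0, s0=90.0, m2=30.0))   # 100.0
```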
For S401 and S402, refer to the embodiment shown in Fig. 1; details are not repeated here.
As for S403, when the audio energy T_i is less than the first threshold A_0 at moment t_i, a prior-art voice wake-up scheme no longer performs VAD, so the user's voice may go undetected. For example, the first threshold A_0 at moment t_i suits a noisy environment, but the terminal is now in a relatively quiet environment, so the user's voice in the sampled signal y_i is missed. The embodiment of the present invention changes the first threshold A_0 at moment t_{i+1} through S403 so that it matches the current environment.
When the difference between the first threshold A_0 at moment t_i and the first noise energy S_0 is greater than the preset third threshold value M_2, that is, when the first threshold A_0 at moment t_i is considerably larger than the first noise energy S_0, the terminal is in a relatively quiet environment, and the first threshold A_0 at moment t_i is too large and needs to be lowered to match the environment. The third threshold value M_2 is a preset parameter that can be obtained through tuning.
Because the first noise energy S_0 is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated samples ys, the first noise energy S_0 reflects the steady-state energy of the environment. Therefore, unlike S303, S403 does not need to check at multiple moments whether the difference between the first threshold A_0 and the first noise energy S_0 is greater than the preset third threshold value M_2. When the difference between the first threshold A_0 at moment t_i and the first noise energy S_0 is greater than the preset third threshold value M_2, this can indicate that the sampled signal y_i contains the user's voice. To avoid missing this voice, the fourth threshold A_3 is generated according to the first noise energy S_0 and is used as the first threshold A_0 at moment t_{i+1}.
In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained at moment t_i is calculated. When the audio energy T_i is less than the first threshold A_0 at moment t_i, and the difference between the first threshold A_0 at moment t_i and the first noise energy S_0 is greater than the preset third threshold value M_2, the fourth threshold A_3 is generated according to the first noise energy S_0 and is used as the first threshold A_0 at moment t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated samples ys; in other words, the first threshold A_0 at moment t_{i+1} is derived from the first noise energy S_0 at moment t_i. The terminal can therefore adjust the first threshold A_0 of the next moment according to the current ambient noise, so that the first threshold A_0 at each moment matches the environment. This reduces the number of VAD runs and thus the terminal's power consumption in a noisy environment, and further avoids missing the user's voice in the sampled signal y_i.
Based on the above embodiment, generating the fourth threshold A_3 according to the first noise energy S_0 may comprise: using the first noise energy S_0 as the fourth threshold A_3; or using the sum of the first noise energy S_0 and a preset third correction N_2 as the fourth threshold A_3, that is, A_3 = S_0 + N_2; or using the product of the first noise energy S_0 and a preset third coefficient a_2 as the fourth threshold A_3, that is, A_3 = a_2 × S_0.
If the third correction N_2 is large, the fourth threshold A_3 is raised quickly on the basis of the first noise energy S_0; if the third correction N_2 is small, the fourth threshold A_3 is raised slowly on the basis of the first noise energy S_0. How quickly it is raised can be set according to actual requirements. The value of the third correction N_2 can be set according to the actual scenario and is not limited in the embodiments of the present invention. Likewise, if the third coefficient a_2 is large, the fourth threshold A_3 is raised quickly on the basis of the first noise energy S_0; if the third coefficient a_2 is small, the fourth threshold A_3 is raised slowly on the basis of the first noise energy S_0. How quickly it is raised can be set according to actual requirements. The value of the third coefficient a_2 can be set according to the actual scenario and is not limited in the embodiments of the present invention.
Optionally, the product of the first noise energy S_0 and the preset third coefficient a_2, plus the preset third correction N_2, may also be used as the fourth threshold A_3, that is, A_3 = a_2 × S_0 + N_2.
It should be added that the second correction N_1 and the third correction N_2 reflect, under different conditions, how far the first threshold A_0 is raised above the noise energy: the first threshold A_0 exceeds the second noise energy F_0 by the second correction N_1, and exceeds the first noise energy S_0 by the third correction N_2. In addition, because the first noise energy S_0 comes from slow tracking filtering and the second noise energy F_0 comes from fast tracking filtering, the third correction N_2 is optionally greater than the second correction N_1, so that the environment is matched quickly.
Further, the embodiment of the present invention may also record the scenarios in which the first threshold changes. A scenario in which the first threshold is raised may be recorded as a raise-threshold moment; a scenario in which the first threshold is lowered may be recorded as a lower-threshold moment.
Specifically, before the third threshold A_2 is used as the first threshold A_0 at moment t_{i+1}, the method may further comprise: recording moment t_i as a lower-threshold moment; and, when the interval between moment t_i and the previous lower-threshold moment is greater than a preset value T_time, performing the step of using the third threshold A_2 as the first threshold A_0 at moment t_{i+1}, and otherwise not performing that step.
Before the fourth threshold A_3 is used as the first threshold A_0 at moment t_{i+1}, the method may further comprise: recording moment t_i as a lower-threshold moment; and, when the interval between moment t_i and the previous lower-threshold moment is greater than the preset value T_time, performing the step of using the fourth threshold A_3 as the first threshold A_0 at moment t_{i+1}, and otherwise not performing that step.
The above two specific implementations prevent ping-pong switching of the first threshold A_0 without affecting the reliability of voice detection, and reduce the probability of missed voice detection.
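The hold-off that prevents ping-pong switching can be sketched as below. This is an illustration only: the class name `LowerThresholdGuard` is hypothetical, time is represented as a plain number, and allowing the very first lowering (when no previous lower-threshold moment exists) is an assumption, since the source does not describe that case.

```python
class LowerThresholdGuard:
    """Applies a lowered threshold only if enough time has passed since the
    previous lower-threshold moment."""

    def __init__(self, t_time):
        self.t_time = t_time                # preset value T_time
        self.previous_lower_moment = None   # no lower-threshold moment yet

    def apply(self, t_i, current_a0, candidate):
        """Return the first threshold A_0 to use at moment t_{i+1}."""
        previous = self.previous_lower_moment
        self.previous_lower_moment = t_i    # record t_i as a lower-threshold moment
        if previous is None or t_i - previous > self.t_time:
            return candidate                # use A_2 or A_3 as the new A_0
        return current_a0                   # too soon: keep the current A_0


guard = LowerThresholdGuard(t_time=10)
print(guard.apply(t_i=0, current_a0=100.0, candidate=40.0))   # 40.0
print(guard.apply(t_i=5, current_a0=40.0, candidate=30.0))    # 40.0
print(guard.apply(t_i=20, current_a0=40.0, candidate=30.0))   # 30.0
```

The second call is rejected because only 5 time units have elapsed since the previous lower-threshold moment; the third is accepted because 15 units have elapsed since the moment recorded by the second call.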
The embodiment of the present invention continuously monitors and tracks the ambient background noise, adaptively adjusts the first threshold A_0 according to the level of the background noise, and adjusts the first threshold A_0 by raising or lowering it slowly, thereby reducing the probability of missed voice detection. In addition, the dynamic adjustment of the first threshold A_0 brings the power consumption in quiet and noisy environments close to each other, which can improve user experience and product competitiveness.
Fig. 5 is a schematic structural diagram of embodiment one of the voice wake-up apparatus of the present invention. The voice wake-up apparatus may be implemented in hardware and may be integrated into terminals such as tablet computers, smartphones and PDAs. As shown in Fig. 5, the voice wake-up apparatus 10 comprises: an SRC 11, a computing circuit 12, a threshold decision circuit 13, a first decimator 14, a slow tracking filter (Slow Tracking Filter, STF for short) 15, a comparator 16, a configurator 17, and an interrupt processing circuit 18.
The SRC 11 is configured to periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at moment t_i, and i is a positive integer. The computing circuit 12 is configured to calculate the audio energy T_i of the sampled signal y_i. The threshold decision circuit 13 is configured to determine whether the audio energy T_i is greater than or equal to the first threshold A_0 at moment t_i and, when it is, to trigger the interrupt processing circuit 18 to output an interrupt pulse signal to the interrupt control circuit 20, which enables the DSP or processor 30 to perform VAD. The input of the first decimator 14 is coupled to the output of the SRC 11; the first decimator 14 is configured to decimate the sampled signal y_i at a first decimation rate 1/x to obtain and output samples ys, where x is a natural number greater than 1. The input of the STF 15 is coupled to the output of the first decimator 14; the STF 15 is configured to apply slow tracking filtering to the decimated samples ys to obtain the first noise energy S_0. The input of the comparator 16 is coupled to the output of the STF 15 and to the threshold decision circuit 13; the comparator 16 is configured to compare whether the difference between the first noise energy S_0 and the first threshold A_0 at moment t_i is greater than a preset first threshold value M_0. The configurator 17 is configured to: when VAD fails, VAD has failed n consecutive times before moment t_i, and the difference between the first noise energy S_0 and the first threshold A_0 at moment t_i is greater than the preset first threshold value M_0, generate a second threshold A_1 according to the first noise energy S_0, and deliver the second threshold A_1 to the threshold decision circuit 13 as the first threshold A_0 at moment t_{i+1}, where n is a positive integer less than i.
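The interplay of the threshold decision circuit 13, the comparator 16 and the configurator 17 can be modeled in software as follows. This is a behavioral sketch, not a description of the hardware: the class name `RaiseThresholdController` and the `run_vad` callable are hypothetical, the second threshold is assumed to be A_1 = S_0 + N_0, and leaving the failure counter untouched when VAD is skipped is an assumption the source does not address.

```python
class RaiseThresholdController:
    """Raises A_0 after repeated VAD failures in a noisy environment."""

    def __init__(self, n, m0, n0=0.0):
        self.n = n                      # consecutive failures required before t_i
        self.m0 = m0                    # preset first threshold value M_0
        self.n0 = n0                    # preset first correction N_0
        self.consecutive_failures = 0

    def step(self, audio_energy, a0, s0, run_vad):
        """Process one moment t_i and return A_0 for moment t_{i+1}."""
        if audio_energy < a0:
            return a0                   # below threshold: VAD is not triggered
        if run_vad():
            self.consecutive_failures = 0
            return a0                   # VAD succeeded: threshold unchanged
        failures_before = self.consecutive_failures
        self.consecutive_failures += 1
        if failures_before >= self.n and s0 - a0 > self.m0:
            return s0 + self.n0         # A_1 becomes the new first threshold
        return a0


ctrl = RaiseThresholdController(n=2, m0=5.0)
a0 = 10.0
for _ in range(3):
    # Loud but speech-free input: VAD always fails in this example.
    a0 = ctrl.step(audio_energy=50.0, a0=a0, s0=30.0, run_vad=lambda: False)
print(a0)  # 30.0
```

The threshold is raised only on the third failed VAD run, after two consecutive failures have already been recorded, so isolated failures do not change A_0.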
With reference to Fig. 5, the configurator 17 configures parameters for the voice wake-up apparatus 10, such as the first threshold A_0 described above. A person skilled in the art will understand that the configurator 17 receives configuration parameters from the terminal and converts them into corresponding control signals for the logic modules in the voice wake-up apparatus 10, where the logic modules include the computing circuit 12, the threshold decision circuit 13, the interrupt processing circuit 18, and so on. The SRC 11 may specifically sample the audio signal by downsampling, for example by converting 32 kilohertz (kHz) data to 16 kHz.
The flow of the sampled signal y_i in Fig. 5 is:
SRC 11 -> computing circuit 12 -> threshold decision circuit 13 -> interrupt processing circuit 18 (optional) -> interrupt control circuit 20 (optional) -> DSP or processor 30 (optional).
When the audio energy T_i is greater than or equal to the first threshold A_0 at moment t_i, the flow of the sampled signal y_i includes the optional parts above; when the audio energy T_i is less than the first threshold A_0 at moment t_i, the flow of the sampled signal y_i does not include the optional parts.
The first decimator 14, the STF 15 and the comparator 16 do not affect normal voice wake-up; they only work together with the configurator 17 to change the first threshold A_0 used in voice wake-up.
The apparatus of this embodiment may be used to execute the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are not repeated here.
In the above embodiment, the configurator 17 may be specifically configured to: use the first noise energy S_0 as the second threshold A_1; or use the sum of the first noise energy S_0 and the preset first correction N_0 as the second threshold A_1, that is, A_1 = S_0 + N_0; or use the product of the first noise energy S_0 and the preset first coefficient a_0 as the second threshold A_1, that is, A_1 = a_0 × S_0; and so on. The embodiments of the present invention do not limit this.
Fig. 6 is a schematic structural diagram of embodiment two of the voice wake-up apparatus of the present invention. The voice wake-up apparatus may be implemented in hardware and may be integrated into terminals such as tablet computers, smartphones and PDAs. As shown in Fig. 6, the voice wake-up apparatus 100 comprises: an SRC 110, a computing circuit 120, a threshold decision circuit 130, a second decimator 140, a fast tracking filter (Fast Tracking Filter, FTF for short) 150, a comparator 160, a configurator 170, and an interrupt processing circuit 180.
The SRC 110 is configured to periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at moment t_i, and i is a positive integer. The computing circuit 120 is configured to calculate the audio energy T_i of the sampled signal y_i. The threshold decision circuit 130 is configured to determine whether the audio energy T_i is greater than or equal to the first threshold A_0 at moment t_i. The input of the second decimator 140 is coupled to the output of the SRC 110; the second decimator 140 is configured to decimate the sampled signal y_i at a second decimation rate 1/z to obtain samples yf, where z is a natural number greater than x. The input of the FTF 150 is coupled to the output of the second decimator 140; the FTF 150 is configured to apply fast tracking filtering to the decimated samples yf to obtain the second noise energy F_0. The input of the comparator 160 is coupled to the output of the FTF 150; the comparator 160 is configured to compare, when the audio energy T_i is less than the first threshold A_0 at moment t_i, whether the difference between the first threshold at each moment and the second noise energy F_0 is greater than the preset second threshold value M_1, and, when the difference between the first threshold A_0 and the second noise energy F_0 at each moment from t_{i-m} to t_i is greater than the preset second threshold value M_1, to trigger the interrupt processing circuit 180 to output an interrupt pulse signal to the interrupt control circuit 200, which enables the DSP or processor 300 to perform VAD, where m is a positive integer less than i. The configurator 170 is configured to: when VAD succeeds, generate the third threshold A_2 according to the second noise energy F_0, and deliver the third threshold A_2 to the threshold decision circuit 130 as the first threshold A_0 at moment t_{i+1}.
The apparatus of this embodiment may be used to execute the technical solution of the method embodiment shown in Fig. 3; its implementation principle and technical effect are similar and are not repeated here.
On the basis of the above embodiment, the configurator 170 may be specifically configured to: use the second noise energy F_0 as the third threshold A_2; or use the sum of the second noise energy F_0 and the preset second correction N_1 as the third threshold A_2, that is, A_2 = F_0 + N_1; or use the product of the second noise energy F_0 and the preset second coefficient a_1 as the third threshold A_2, that is, A_2 = a_1 × F_0; and so on. The embodiments of the present invention do not limit this.
Optionally, the configurator 170 may be further configured to: record moment t_i as a lower-threshold moment; and, when the interval between moment t_i and the previous lower-threshold moment is greater than the preset value T_time, perform the step of using the third threshold A_2 as the first threshold A_0 at moment t_{i+1}, and otherwise not perform that step. This prevents ping-pong switching of the first threshold A_0 without affecting the reliability of voice detection, and reduces the probability of missed voice detection.
With reference to Fig. 5, the configurator 17 may be further configured to: when the audio energy T_i is less than the first threshold A_0 at moment t_i, and the difference between the first threshold A_0 at moment t_i and the first noise energy S_0 is greater than the preset third threshold value M_2, generate the fourth threshold A_3 according to the first noise energy S_0, and use the fourth threshold A_3 as the first threshold A_0 at moment t_{i+1}.
In this case, the apparatus of this embodiment may be used to execute the technical solution of the method embodiment shown in Fig. 4; its implementation principle and technical effect are similar and are not repeated here.
Further, the configurator 17 may be specifically configured to: use the first noise energy S_0 as the fourth threshold A_3; or use the sum of the first noise energy S_0 and the preset third correction N_2 as the fourth threshold A_3, that is, A_3 = S_0 + N_2; or use the product of the first noise energy S_0 and the preset third coefficient a_2 as the fourth threshold A_3, that is, A_3 = a_2 × S_0; and so on. The embodiments of the present invention do not limit this.
Further, the configurator 17 may be further configured to: record moment t_i as a lower-threshold moment; and, when the interval between moment t_i and the previous lower-threshold moment is greater than the preset value T_time, perform the step of using the fourth threshold A_3 as the first threshold A_0 at moment t_{i+1}, and otherwise not perform that step. This prevents ping-pong switching of the first threshold A_0 without affecting the reliability of voice detection, and reduces the probability of missed voice detection.
With reference to Fig. 5 and Fig. 6, the first decimator 14 and the second decimator 140 perform long-period and short-period data decimation, respectively. The STF 15 is a slowly converging filter used to track changes in ambient noise stably. The FTF 150 is a fast-converging filter used to track changes in ambient noise quickly. The STF 15 and the FTF 150 track the energy of the current calculation window and adopt a structure similar to that of the computing circuit 12 or the computing circuit 120. The STF 15 and the FTF 150 differ in filter order and parameters, which are set according to actual tuning. The FTF 150 performs short-period filtering, that is, recent data changes quickly affect the filter output. The STF 15 performs long-period filtering, that is, recent data changes affect the filter output only slightly and slowly.
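The difference between slow and fast tracking can be illustrated with a simple software model. The source leaves the filter order and parameters to tuning, so everything specific here is an assumption: a first-order exponential smoother stands in for both filters, energy is taken as the squared sample value, the names `decimate` and `track_energy` are hypothetical, and the same decimation factor is used on both paths only to isolate the effect of the filter coefficient.

```python
def decimate(samples, factor):
    """Keep every `factor`-th sample, i.e. a decimation rate of 1/factor."""
    return samples[::factor]


def track_energy(samples, alpha, initial=0.0):
    """First-order tracker of sample energy; a larger alpha converges faster."""
    estimate = initial
    for s in samples:
        estimate += alpha * (s * s - estimate)
    return estimate


# Quiet background (energy 1) followed by a sudden rise in noise (energy 9).
signal = [1.0] * 200 + [3.0] * 20
slow = track_energy(decimate(signal, 2), alpha=0.02)  # STF-like behavior
fast = track_energy(decimate(signal, 2), alpha=0.5)   # FTF-like behavior
print(slow < fast)  # True
```

After the jump in noise level, the fast tracker has nearly reached the new energy while the slow tracker has moved only part of the way, which is the behavior the two filters are described as providing.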
Optionally, combining Fig. 6 with Fig. 5 yields the structure shown in Fig. 7. Fig. 7 is a schematic structural diagram of embodiment three of the voice wake-up apparatus of the present invention. As shown in Fig. 7, the voice wake-up apparatus 1000 comprises: the SRC 11, the computing circuit 12, the threshold decision circuit 13, the first decimator 14, the second decimator 140, the STF 15, the FTF 150, the comparator 16, the configurator 17, and the interrupt processing circuit 18.
Here, the threshold decision circuit 13 also has the role and function of the threshold decision circuit 130; the comparator 16 also has the role and function of the comparator 160; the configurator 17 also has the role and function of the configurator 170; and the interrupt processing circuit 18 also has the role and function of the interrupt processing circuit 180. The specific principles are as described in the above embodiments and are not repeated here.
The embodiment of the present invention continuously monitors and tracks the ambient background noise, adaptively adjusts the first threshold A_0 according to the level of the background noise, and adjusts the first threshold A_0 by raising or lowering it slowly, thereby reducing the probability of missed voice detection. In addition, the dynamic adjustment of the first threshold A_0 brings the power consumption in quiet and noisy environments close to each other, which can improve user experience and product competitiveness.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units or modules is only a division by logical function, and other divisions are possible in actual implementation; multiple units or modules may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or modules may be electrical, mechanical, or in another form.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.