CN105261368A

CN105261368A - Voice wake-up method and apparatus

Info

Publication number: CN105261368A
Application number: CN201510549435.6A
Authority: CN
Inventors: 马涛
Original assignee: Huawei Technologies Co Ltd
Current assignee: Guangdong Gaohang Intellectual Property Operation Co ltd; Nanjing Advanced Biomaterials And Process Equipment Research Institute Co ltd
Priority date: 2015-08-31
Filing date: 2015-08-31
Publication date: 2016-01-20
Anticipated expiration: 2035-08-31
Also published as: CN105261368B

Abstract

Embodiments of the invention provide a voice wake-up method and apparatus. The method comprises: periodically sampling an audio signal, where sampling at time t_i yields a sampled signal; calculating the audio energy of the sampled signal; when the audio energy is greater than or equal to the first threshold at time t_i, waking up a digital signal processor (DSP) to perform voice activity detection (VAD); and, when the VAD fails, detection has failed n consecutive times before time t_i, and the difference between a first noise energy and the first threshold at time t_i is greater than a preset first threshold value, generating a second threshold from the first noise energy and using it as the first threshold at time t_{i+1}. The first noise energy is obtained by decimating the sampled signal at a first extraction rate 1/x and applying slow tracking filtering to the extracted sample points. The embodiments reduce the number of VAD runs and thereby lower the power consumption of the terminal in a noisy environment.

Description

Voice wake-up method and apparatus

Technical field

Embodiments of the present invention relate to voice wake-up technology, and in particular to a voice wake-up method and apparatus.

Background

With the development of science and technology, terminals generally have a voice wake-up function: the user wakes the terminal by voice and then controls it with voice commands.

The current voice wake-up scheme uses a microphone activity detection (MAD) circuit and a digital signal processor (DSP) that cooperate in two stages to wake up the terminal. If the MAD circuit detects that the energy of the current audio signal is greater than a preset threshold, it wakes up the DSP to perform voice activity detection (VAD), which identifies whether the audio signal is the user's voice. If it is, the terminal is woken up; if it is not, the DSP wake-up is a mistaken or false wake-up. Specifically, VAD compares the features of the audio signal with the features of the user's voice to determine whether the audio signal is the user's voice.

In this scheme the preset threshold is fixed. When the terminal moves between environments, for example from a quiet environment to a noisy one, mistaken or false wake-ups therefore occur frequently, which makes the power consumption of the terminal in a noisy environment high.

Summary of the invention

Embodiments of the present invention provide a voice wake-up method and apparatus to reduce the power consumption of a terminal in a noisy environment.

In a first aspect, an embodiment of the present invention provides a voice wake-up method, comprising:

Periodically sampling an audio signal, where sampling at time t_i yields a sampled signal y_i, and i is a positive integer;

Calculating the audio energy T_i of the sampled signal y_i;

Performing voice activity detection (VAD) when the audio energy T_i is greater than or equal to the first threshold A₀ at time t_i;

When the VAD fails, detection has failed n consecutive times before time t_i, and the difference between a first noise energy S₀ and the first threshold A₀ at time t_i is greater than a preset first threshold value M₀, generating a second threshold A₁ from the first noise energy S₀ and using the second threshold A₁ as the first threshold A₀ at time t_{i+1}, where the first noise energy S₀ is obtained by decimating the sampled signal y_i at a first extraction rate 1/x and applying slow tracking filtering to the extracted sample points ys, x is a natural number greater than 1, n is a positive integer, and n is less than i.

With reference to the first aspect, in a first possible implementation of the first aspect, generating the second threshold A₁ from the first noise energy S₀ comprises:

Using the first noise energy S₀ as the second threshold A₁;

Or, using the sum of the first noise energy S₀ and a preset first correction N₀ as the second threshold A₁;

Or, using the product of the first noise energy S₀ and a preset first coefficient a₀ as the second threshold A₁.

With reference to the first aspect, in a second possible implementation of the first aspect, after calculating the audio energy T_i of the sampled signal y_i, the method further comprises:

Performing VAD when the audio energy T_i is less than the first threshold A₀ at time t_i and the differences between the respective first thresholds A₀ at each time from t_{i-m} to t_i and a second noise energy F₀ are all greater than a preset second threshold value M₁, where m is a positive integer and m is less than i;

When the VAD succeeds, generating a third threshold A₂ from the second noise energy F₀ and using the third threshold A₂ as the first threshold A₀ at time t_{i+1}, where the second noise energy F₀ is obtained by decimating the sampled signal y_i at a second extraction rate 1/z and applying fast tracking filtering to the extracted sample points yf, and z is a natural number greater than x.

With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, generating the third threshold A₂ from the second noise energy F₀ comprises:

Using the second noise energy F₀ as the third threshold A₂;

Or, using the sum of the second noise energy F₀ and a preset second correction N₁ as the third threshold A₂;

Or, using the product of the second noise energy F₀ and a preset second coefficient a₁ as the third threshold A₂.

With reference to the second or third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, before using the third threshold A₂ as the first threshold A₀ at time t_{i+1}, the method further comprises:

Recording time t_i as a threshold-lowering time;

When the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, performing the step of using the third threshold A₂ as the first threshold A₀ at time t_{i+1}; otherwise, not performing that step.

With reference to the first aspect, in a fifth possible implementation of the first aspect, after calculating the audio energy T_i of the sampled signal y_i, the method further comprises:

When the audio energy T_i is less than the first threshold A₀ at time t_i and the difference between the first threshold A₀ at time t_i and the first noise energy S₀ is greater than a preset third threshold value M₂, generating a fourth threshold A₃ from the first noise energy S₀ and using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}.

With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, generating the fourth threshold A₃ from the first noise energy S₀ comprises:

Using the first noise energy S₀ as the fourth threshold A₃;

Or, using the sum of the first noise energy S₀ and a preset third correction N₂ as the fourth threshold A₃;

Or, using the product of the first noise energy S₀ and a preset third coefficient a₂ as the fourth threshold A₃.

With reference to the fifth or sixth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, before using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}, the method further comprises:

Recording time t_i as a threshold-lowering time;

When the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, performing the step of using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}; otherwise, not performing that step.

In a second aspect, an embodiment of the present invention provides a voice wake-up apparatus, comprising:

A sample rate converter (SRC), configured to periodically sample an audio signal, where sampling at time t_i yields a sampled signal y_i, and i is a positive integer;

A computing circuit, configured to calculate the audio energy T_i of the sampled signal y_i;

A threshold decision circuit, configured to determine whether the audio energy T_i is greater than or equal to the first threshold A₀ at time t_i and, when it is, to trigger an interrupt processing circuit to output an interrupt pulse signal to an interrupt control circuit, which enables a digital signal processor (DSP) or a processor to perform voice activity detection (VAD);

A first decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a first extraction rate 1/x to obtain sample points ys, where x is a natural number greater than 1;

A slow tracking filter (STF), whose input is coupled to the output of the first decimator, configured to apply slow tracking filtering to the extracted sample points ys to obtain a first noise energy S₀;

A comparator, whose input is coupled to the output of the STF and to the threshold decision circuit, configured to compare whether the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than a preset first threshold value M₀;

A configurator, configured to, when the VAD fails, detection has failed n consecutive times before time t_i, and the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first threshold value M₀, generate a second threshold A₁ from the first noise energy S₀ and deliver the second threshold A₁ to the threshold decision circuit as the first threshold A₀ at time t_{i+1}, where n is a positive integer and n is less than i.

With reference to the second aspect, in a first possible implementation of the second aspect, the configurator is specifically configured to:

Use the first noise energy S₀ as the second threshold A₁;

With reference to the second aspect, in a second possible implementation of the second aspect, the apparatus further comprises:

A second decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a second extraction rate 1/z to obtain sample points yf, where z is a natural number greater than x;

A fast tracking filter (FTF), whose input is coupled to the output of the second decimator, configured to apply fast tracking filtering to the extracted sample points yf to obtain a second noise energy F₀;

The comparator, whose input is also coupled to the output of the FTF, is further configured to, when the audio energy T_i is less than the first threshold A₀ at time t_i, compare whether the difference between the first threshold at each time and the second noise energy F₀ is greater than a preset second threshold value M₁; and, when the differences between the respective first thresholds A₀ at each time from t_{i-m} to t_i and the second noise energy F₀ are all greater than the preset second threshold value M₁, trigger the interrupt processing circuit to output an interrupt pulse signal to the interrupt control circuit, which enables the DSP or the processor to perform VAD, where m is a positive integer and m is less than i;

The configurator is further configured to, when the VAD succeeds, generate a third threshold A₂ from the second noise energy F₀ and deliver the third threshold A₂ to the threshold decision circuit as the first threshold A₀ at time t_{i+1}.

With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the configurator is specifically configured to:

Use the second noise energy F₀ as the third threshold A₂;

With reference to the second or third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the configurator is further configured to:

Record time t_i as a threshold-lowering time;

With reference to the second aspect, in a fifth possible implementation of the second aspect, the configurator is further configured to:

With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the configurator is specifically configured to:

Use the first noise energy S₀ as the fourth threshold A₃;

With reference to the fifth or sixth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the configurator is further configured to:

Record time t_i as a threshold-lowering time;

Embodiments of the present invention provide a voice wake-up method and apparatus. The audio energy T_i of the sampled signal y_i sampled at time t_i is obtained, and VAD is performed when T_i is greater than or equal to the first threshold A₀ at time t_i. When the VAD fails, detection has failed n consecutive times before time t_i, and the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first threshold value M₀, the first threshold A₀ is adjusted to obtain the first threshold A₀ at time t_{i+1}: a second threshold A₁ is generated from the first noise energy S₀ and used as the first threshold A₀ at time t_{i+1}. Because the first noise energy S₀ is obtained by decimating the sampled signal y_i at the first extraction rate 1/x and applying slow tracking filtering to the extracted sample points ys, the first threshold A₀ at time t_{i+1} is derived from the first noise energy S₀ at time t_i. The terminal can thus adjust the first threshold A₀ for the next time according to the current environmental noise, so that the first threshold A₀ at each time matches the environment. This reduces the number of VAD runs and lowers the power consumption of the terminal in a noisy environment.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of embodiment one of the voice wake-up method of the present invention;

Fig. 2 is an example diagram of the first threshold of the voice wake-up method of the present invention in different environments;

Fig. 3 is a flowchart of embodiment two of the voice wake-up method of the present invention;

Fig. 4 is a flowchart of embodiment three of the voice wake-up method of the present invention;

Fig. 5 is a schematic structural diagram of embodiment one of the voice wake-up apparatus of the present invention;

Fig. 6 is a schematic structural diagram of embodiment two of the voice wake-up apparatus of the present invention;

Fig. 7 is a schematic structural diagram of embodiment three of the voice wake-up apparatus of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Voice wake-up means that, in any situation, the terminal can be activated by a predefined wake-up word and made to run a specific application. This is similar to a user pressing a key to light the screen and activate a phone. The advantage of voice wake-up is that it frees the user's hands.

In the voice wake-up scheme of one smartphone, the standby power consumption is about 2.2 mA × 3.8 V in a quiet environment and 5.5 mA × 3.8 V in a noisy environment. The difference in power consumption of this smartphone between the noisy and quiet environments is therefore about 12.5 milliwatts: (5.5 - 2.2) mA × 3.8 V = 12.54 mW.

According to the power consumption estimation model, average power consumption = quiet power consumption × 70% + noisy power consumption × 30%. Reducing the power consumption in a noisy environment should therefore be considered, and the embodiments of the present invention focus on power optimization in a noisy environment.
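
The estimation model can be checked with the figures quoted above. The following is a small worked calculation rather than part of the described method; the constant and function names are illustrative only.

```python
# Figures quoted in the text: standby current in each environment and supply voltage.
SUPPLY_VOLTS = 3.8
QUIET_MA = 2.2
NOISY_MA = 5.5

# Weights of the estimation model: 70% quiet, 30% noisy.
QUIET_SHARE = 0.70
NOISY_SHARE = 0.30


def standby_power_mw(current_ma, volts=SUPPLY_VOLTS):
    """Power in milliwatts for a current in milliamperes at the given voltage."""
    return current_ma * volts


quiet_mw = standby_power_mw(QUIET_MA)   # 2.2 mA x 3.8 V
noisy_mw = standby_power_mw(NOISY_MA)   # 5.5 mA x 3.8 V

# average = quiet x 70% + noisy x 30%
average_mw = QUIET_SHARE * quiet_mw + NOISY_SHARE * noisy_mw

# Fraction of the average that comes from the noisy environment.
noisy_fraction = NOISY_SHARE * noisy_mw / average_mw

print(f"quiet {quiet_mw:.2f} mW, noisy {noisy_mw:.2f} mW")
print(f"average {average_mw:.3f} mW, noisy share {noisy_fraction:.1%}")
```

With these numbers the noisy environment accounts for slightly more than half of the average power even though it is weighted at only 30%, which is why the optimization targets the noisy case.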

Embodiments of the present invention provide a method and apparatus for waking up a digital signal processor in a terminal by voice, which reduce the number of times the DSP in the terminal is woken up to perform VAD and thereby lower the power consumption of the terminal in a noisy environment.

Fig. 1 is a flowchart of embodiment one of the voice wake-up method of the present invention. The method can be performed by a voice wake-up apparatus, which can be implemented in hardware. The voice wake-up apparatus can be integrated into a terminal such as a tablet computer, a smartphone, or a personal digital assistant (PDA). As shown in Fig. 1, the voice wake-up method comprises:

S101: Periodically sample an audio signal, where sampling at time t_i yields a sampled signal y_i, and i is a positive integer.

Similarly, the sampled signal at time t_{i-1} can be denoted y_{i-1}, the sampled signal at time t_{i+1} can be denoted y_{i+1}, and so on; these are not enumerated here.

In any embodiment of the present invention, the audio signal may be a signal collected by a sound collection device such as a microphone. A sample rate converter (SRC) periodically samples the audio signal collected by the sound collection device. Alternatively, the collected audio signal is first passed through a filter, such as a band-pass filter, and then periodically sampled by the SRC; the embodiments of the present invention do not limit this.

S102: Calculate the audio energy T_i of the sampled signal y_i.

Note that the audio energy can be calculated as soon as a sampled signal is obtained. For example, after the sampled signal y_{i-1} is obtained at time t_{i-1}, its audio energy T_{i-1} can also be calculated.

A person skilled in the art will understand that, because the sampled signal y_i is determinate, its audio energy T_i can be obtained by calculation.

Specifically, let x(j) denote the amplitude of the j-th sample point in the sampled signal y_i, so that x(j) × x(j) is the energy of the sampled signal y_i at the j-th sample point, where j is an integer from 0 to M - 1 and M is the total number of sample points. The coefficient a_j is the weight of each sample point, and T_i is the audio energy of the sampled signal y_i. For example, the formula below is a normalized calculation in which each weight expresses the share of the overall energy attributed to its sample point:

T_{i} = \sqrt{\sum_{j=0}^{M-1} a_{j} \times x(j) \times x(j)},

where

\sum_{j=0}^{M-1} a_{j} = 1.

This is only an example of calculating the audio energy T_i of the sampled signal y_i and does not limit the embodiments of the present invention. The audio energy T_i can also be obtained by root mean square (RMS) or other similar methods, for example without normalization.
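
The weighted calculation above can be sketched in a few lines. This is an illustrative sketch only: the function name `audio_energy` and the choice of uniform weights when none are given are assumptions, not part of the described embodiments. With uniform weights a_j = 1/M the formula reduces to the ordinary RMS value.

```python
import math


def audio_energy(samples, weights=None):
    """Weighted energy T_i = sqrt(sum_j a_j * x(j) * x(j)) with sum_j a_j = 1.

    samples: amplitudes x(0) .. x(M-1) of one sampled signal y_i.
    weights: coefficients a_j; uniform weights 1/M are assumed when omitted.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("at least one sample point is required")
    if weights is None:
        weights = [1.0 / m] * m  # uniform weights give the plain RMS value
    if len(weights) != m:
        raise ValueError("one weight per sample point is required")
    if not math.isclose(sum(weights), 1.0, rel_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return math.sqrt(sum(a * x * x for a, x in zip(weights, samples)))


# A square wave of amplitude 3 has an RMS value of 3.
print(audio_energy([3.0, -3.0, 3.0, -3.0]))
```

Non-uniform weights let some sample points, for example the most recent ones, contribute more to T_i than others.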

S103: When the audio energy T_i is greater than or equal to the first threshold A₀ at time t_i, perform VAD.

The VAD may be performed by an element in the terminal such as a DSP or a processor.

S104: When the VAD fails, detection has failed n consecutive times before time t_i, and the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first threshold value M₀, generate a second threshold A₁ from the first noise energy S₀ and use the second threshold A₁ as the first threshold A₀ at time t_{i+1}, where the first noise energy S₀ is obtained by decimating the sampled signal y_i at the first extraction rate 1/x and applying slow tracking filtering to the extracted sample points ys, x is a natural number greater than 1, n is a positive integer, and n is less than i.

Note that "the VAD fails and detection has failed n consecutive times before time t_i" means that the VAD performed at time t_i fails and the VAD performed at each time from t_{i-n} to t_{i-1} also failed. For example, if n is 2, the condition means that the VAD performed at time t_i fails and that, before it, the VAD performed at the two consecutive times t_{i-2} and t_{i-1} failed twice in a row. To make the technical solution easier to understand, a VAD failure can be illustrated as follows. Suppose the current sound is that of a car engine. Because the audio energy of this sound is greater than the first threshold A₀ at the current time, VAD must be performed; the VAD determines that the sound is not the user's voice, so the VAD fails. In other words, if the terminal is in a high-noise environment, the noise energy of the environment is correspondingly high. Once that noise energy exceeds the first threshold A₀ at the current time, VAD has to be started; but because environmental noise is disorderly, the VAD cannot detect a valid voice signal in it, so the VAD fails. The first noise energy S₀ represents the energy level of the steady-state noise of the environment in which the terminal is located. The first threshold value M₀ is a preset parameter that can be determined by tuning.
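
The S104 condition can be sketched as follows. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, the history is assumed to hold the VAD outcomes at the n preceding times t_{i-n} to t_{i-1}, and the "difference" is taken as S₀ - A₀ (noise energy above the threshold), which is one reading of the text.

```python
from collections import deque


class VadFailureHistory:
    """Hypothetical helper that remembers the VAD outcomes at t_{i-n} .. t_{i-1}."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be a positive integer")
        self.n = n
        self._outcomes = deque(maxlen=n)  # True = VAD succeeded, False = failed

    def record(self, vad_succeeded):
        """Store the outcome of the VAD run at the current time."""
        self._outcomes.append(bool(vad_succeeded))

    def failed_n_times_before(self):
        """True when the n most recent recorded VAD runs all failed."""
        return len(self._outcomes) == self.n and not any(self._outcomes)


def should_raise_threshold(history, vad_succeeded, s0, a0, m0):
    """S104 condition at time t_i (assumed reading: S0 - A0 > M0)."""
    return (not vad_succeeded
            and history.failed_n_times_before()
            and (s0 - a0) > m0)


# Example with n = 2: two earlier failures, then a failure at t_i.
history = VadFailureHistory(n=2)
history.record(False)  # t_{i-2}
history.record(False)  # t_{i-1}
print(should_raise_threshold(history, vad_succeeded=False, s0=70.0, a0=50.0, m0=10.0))
```

Requiring n earlier failures in addition to the failure at t_i keeps a single loud, short-lived sound from raising the threshold.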

Note also that, in any embodiment of the present invention, "first" and "second" only distinguish items with the same name. For example, the "first" in "first threshold" and the "second" in "second threshold" are merely naming used to tell different thresholds apart and do not indicate an order between them.

In real application scenarios, the noise level differs from scene to scene. For example, noise in a quiet environment is about 30 to 35 decibels (dB). For noisy environments, the following figures can be used as a reference: shopping mall noise is about 60 dB, road noise is about 70 dB, aircraft cabin noise is about 70 dB, bus noise is about 80 dB, and subway noise is about 90 dB. In addition, the noise level at the same place varies with time; for example, daytime and night-time noise at the same place may differ by 10 to 15 dB.

Moreover, when a user makes a call or talks in a noisy environment, the user subconsciously raises the speaking volume, which improves the signal-to-noise ratio (SNR) and makes voice wake-up feasible.

Therefore, a voice wake-up scheme that uses a single uniform noise gate, that is, a preset threshold, cannot treat quiet and noisy environments differently when waking up the terminal by voice. If the preset threshold is set too high, speech is missed; if it is set too low, the processor is woken up frequently, which increases power consumption.

In the embodiments of the present invention, the first threshold A₀ at each time is adjusted as needed.

Specifically, S101 to S103 obtain the audio energy T_i of the sampled signal y_i sampled at time t_i and compare it with the first threshold A₀ at time t_i. When the audio energy T_i is greater than or equal to the first threshold A₀ at time t_i, VAD is performed by the DSP, processor, or similar element, and the VAD result determines whether to wake up the terminal. If the VAD succeeds, that is, the element performing VAD detects the user's voice in the sampled signal y_i, the terminal is woken up; otherwise the VAD fails, that is, the element does not detect the user's voice in the sampled signal y_i, and the terminal is not woken up.

In S104, when the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first threshold value M₀, the terminal may currently be in an environment with high background noise. In that case, a second threshold A₁ is generated from the first noise energy S₀ and used as the first threshold A₀ at time t_{i+1}. The first noise energy S₀ is obtained by decimating the sampled signal y_i at the first extraction rate 1/x and applying slow tracking filtering to the extracted sample points ys, where x is a natural number greater than 1 and n is a positive integer less than i. In practice, the sampled signal y_i may contain both the user's voice and the environmental noise at time t_i, or only the environmental noise at time t_i. The first threshold A₀ for time t_{i+1} is obtained at time t_i; it is the first threshold that the terminal uses in S103 and S104 of the voice wake-up method at time t_{i+1}.
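
The text does not specify the structure of the slow tracking filter. The sketch below therefore rests on an assumption: it models the filter as a one-pole exponential smoother over sample magnitudes, with a hypothetical coefficient `alpha` where a small value means slow tracking. The function names are illustrative as well.

```python
def decimate(samples, x):
    """Keep one sample point in every x (extraction rate 1/x, x > 1)."""
    if x <= 1:
        raise ValueError("x must be a natural number greater than 1")
    return samples[::x]


def track_noise_energy(points, alpha, initial=0.0):
    """Assumed one-pole smoother: level += alpha * (|point| - level).

    A small alpha follows the input slowly and so approximates the
    steady-state noise level; a larger alpha follows it quickly.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = initial
    for p in points:
        level += alpha * (abs(p) - level)
    return level


# Example: a constant-magnitude input decimated by x = 4, then tracked slowly.
signal = [1.0, -1.0] * 2000
ys = decimate(signal, 4)
s0 = track_noise_energy(ys, alpha=0.01)
print(round(s0, 3))
```

Under the same assumption, a fast tracking filter would be the same smoother with a larger `alpha`, applied to the sample points extracted at the second rate 1/z.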

If the voice wake-up at time t_i is the first voice wake-up, the first threshold A₀ at time t_i can be a preset value. The preset first threshold A₀ can be regarded as an optimized parameter that corresponds to one possible application scenario; for example, presetting A₀ to 50 decibels can be regarded as the background noise gate for a quiet environment. Fig. 2 illustrates the first threshold in a quiet environment and in a noisy environment. As shown in Fig. 2, in a quiet environment the first threshold exceeds the environmental noise by a first preset value; in a noisy environment it exceeds the environmental noise by a second preset value. In addition, the first threshold in a noisy environment is higher than that in a quiet environment.

In addition, S103 may instead be: 1) perform VAD when the difference between the audio energy T_i and the audio energy T_{i-1} at time t_{i-1} is greater than or equal to the differential threshold A₀₀ at time t_i; or 2) perform VAD when the audio energy T_i is greater than or equal to the first threshold A₀ at time t_i and the difference between T_i and T_{i-1} is greater than or equal to the differential threshold A₀₀ at time t_i; or 3) perform VAD when either condition is met, that is, when T_i is greater than or equal to the first threshold A₀ at time t_i or the difference between T_i and T_{i-1} is greater than or equal to the differential threshold A₀₀ at time t_i. The audio energy T_{i-1} at time t_{i-1} is cached in the terminal and was obtained by calculating the audio energy of the sampled signal y_{i-1} at time t_{i-1}.

In case 1), the differential threshold A₀₀ at time t_i is adjusted in a manner similar to the adjustment of the first threshold A₀ at time t_i. In case 2), both the first threshold A₀ and the differential threshold A₀₀ at time t_i are adjusted in that manner. In case 3), the first threshold A₀ or the differential threshold A₀₀ at time t_i is adjusted in that manner.
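
The basic S103 trigger and its three variants differ only in how the level test and the difference test are combined. A minimal sketch, in which the function name and the `mode` labels are illustrative assumptions:

```python
def should_run_vad(t_i, t_prev, a0, a00, mode="level"):
    """Decide whether to perform VAD at time t_i.

    t_i, t_prev: audio energies T_i and the cached T_{i-1}.
    a0: first threshold A0 at t_i; a00: differential threshold A00 at t_i.
    mode: "level"  - basic S103: T_i >= A0
          "delta"  - variant 1): T_i - T_{i-1} >= A00
          "both"   - variant 2): both conditions hold
          "either" - variant 3): at least one condition holds
    """
    level_hit = t_i >= a0
    delta_hit = (t_i - t_prev) >= a00
    if mode == "level":
        return level_hit
    if mode == "delta":
        return delta_hit
    if mode == "both":
        return level_hit and delta_hit
    if mode == "either":
        return level_hit or delta_hit
    raise ValueError(f"unknown mode: {mode}")


# A sudden rise from 40 to 55 stays below A0 = 60 but exceeds A00 = 10.
for mode in ("level", "delta", "both", "either"):
    print(mode, should_run_vad(55.0, 40.0, a0=60.0, a00=10.0, mode=mode))
```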

In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i sampled at time t_i is obtained, and VAD is performed when T_i is greater than or equal to the first threshold A₀ at time t_i. When the VAD fails, detection has failed n consecutive times before time t_i, and the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first threshold value M₀, the first threshold A₀ is adjusted to obtain the first threshold A₀ at time t_{i+1}: a second threshold A₁ is generated from the first noise energy S₀ and used as the first threshold A₀ at time t_{i+1}. Because the first noise energy S₀ is obtained by decimating the sampled signal y_i at the first extraction rate 1/x and applying slow tracking filtering to the extracted sample points ys, the first threshold A₀ at time t_{i+1} is derived from the first noise energy S₀ at time t_i. The terminal can thus adjust the first threshold A₀ for the next time according to the current environmental noise, so that the first threshold A₀ at each time matches the environment. This reduces the number of VAD runs and lowers the power consumption of the terminal in a noisy environment.

In the foregoing embodiment, generating the second threshold A₁ according to the first noise energy S₀ may comprise: using the first noise energy S₀ as the second threshold A₁; or using the sum of the first noise energy S₀ and a preset first correction N₀ as the second threshold A₁, that is, A₁ = S₀ + N₀; or using the product of the first noise energy S₀ and a preset first coefficient a₀ as the second threshold A₁, that is, A₁ = a₀ × S₀.

If the first correction N₀ is large, the second threshold A₁ is raised quickly relative to the first noise energy S₀; if the first correction N₀ is small, the second threshold A₁ is raised slowly relative to the first noise energy S₀. The rate of the rise can be set according to actual requirements. The value of the first correction N₀ can be set according to the actual scenario and is not limited in this embodiment of the present invention. Likewise, if the first coefficient a₀ is large, the second threshold A₁ is raised quickly relative to the first noise energy S₀; if the first coefficient a₀ is small, the second threshold A₁ is raised slowly relative to the first noise energy S₀. The rate of the rise can be set according to actual requirements. The value of the first coefficient a₀ can be set according to the actual scenario and is not limited in this embodiment of the present invention.

Optionally, the product of the first noise energy S₀ and the preset first coefficient a₀, plus the preset first correction N₀, may be used as the second threshold A₁, that is, A₁ = a₀ × S₀ + N₀.
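The four ways of generating the second threshold A₁ are special cases of the single affine form A₁ = a₀ × S₀ + N₀. A minimal sketch, in which the function name and default values are hypothetical and chosen only for illustration:

```python
def generate_threshold(noise_energy, coeff=1.0, correction=0.0):
    """Generate a threshold from a noise energy as coeff * noise + correction.

    coeff=1, correction=0  -> threshold equals the noise energy
    coeff=1, correction=N  -> threshold = noise + N
    coeff=a, correction=0  -> threshold = a * noise
    coeff=a, correction=N  -> threshold = a * noise + N
    """
    return coeff * noise_energy + correction


# Example with S0 = 20, a0 = 1.5, N0 = 3 (illustrative values only).
a1 = generate_threshold(20.0, coeff=1.5, correction=3.0)
```

A larger coefficient or correction places the new threshold further above the tracked noise energy, which corresponds to the "raised quickly" case described in the text.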

Fig. 3 is a flowchart of Embodiment 2 of the voice wake-up method of the present invention. As shown in Fig. 3, the method may comprise:

S301: Periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer.

S302: Calculate the audio energy T_i of the sampled signal y_i.

S303: Perform VAD when the audio energy T_i is less than the first threshold A₀ at time t_i and the difference between the first threshold A₀ and the second noise energy F₀ at each time from t_{i-m} to t_i is greater than a preset second threshold value M₁, where m is a positive integer and m is less than i.

For example, if m = 2, VAD is performed when the audio energy T_i is less than the first threshold A₀ at time t_i and the difference between the first threshold A₀ and the second noise energy F₀ is greater than the second threshold value M₁ at each of times t_{i-2}, t_{i-1}, and t_i.

S304: When the VAD succeeds, generate a third threshold A₂ according to the second noise energy F₀, and use the third threshold A₂ as the first threshold A₀ at time t_{i+1}, where the second noise energy F₀ is obtained by extracting samples from the sampled signal y_i at a second extraction rate 1/z and applying fast tracking filtering to the extracted sampling points yf, and z is a natural number greater than x.

For S301 and S302, refer to the description of the embodiment shown in Fig. 1; details are not repeated here.

Regarding S303: when the audio energy T_i is less than the first threshold A₀ at time t_i, the prior-art voice wake-up scheme does not perform VAD, so the user's voice may be missed. For example, the first threshold A₀ at time t_i suits a noisy environment, but the terminal is now in a relatively quiet environment (for example, an environment with low background noise), so the user's voice in the sampled signal y_i is missed. This embodiment of the present invention changes the first threshold A₀ at time t_{i+1} through S303 and S304 so that it matches the current environment.

When the difference between the first threshold A₀ and the second noise energy F₀ at each time from t_{i-m} to t_i is greater than the preset second threshold value M₁, that is, this condition has occurred m+1 times in total, the terminal is in a quiet environment (an environment with low background noise), and the current first threshold A₀ is too large and needs to be lowered to match the quiet environment. The second threshold value M₁ is a preset parameter that can be obtained through tuning.

Regarding S304: a successful VAD indicates that the sampled signal y_i contains the user's voice. To avoid missing the user's voice, the third threshold A₂ is generated according to the second noise energy F₀ and is used as the first threshold A₀ at time t_{i+1}. Because the second noise energy F₀ is obtained by extracting samples from the sampled signal y_i at the second extraction rate 1/z and applying fast tracking filtering to the extracted sampling points yf, it reflects, to some extent, the energy level of transient noise in the environment in which the terminal is located.

In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained by sampling at time t_i is calculated, and VAD is performed when the audio energy T_i is less than the first threshold A₀ at time t_i and the difference between the first threshold A₀ and the second noise energy F₀ at each time from t_{i-m} to t_i is greater than the preset second threshold value M₁. When the VAD succeeds, the third threshold A₂ is generated according to the second noise energy F₀ and is used as the first threshold A₀ at time t_{i+1}. The second noise energy F₀ is obtained by extracting samples from the sampled signal y_i at the second extraction rate 1/z and applying fast tracking filtering to the extracted sampling points yf. That is, the first threshold A₀ at time t_{i+1} is derived from the second noise energy F₀ at time t_i. In this way, the terminal can adjust the first threshold A₀ for the next time according to the current ambient noise, so that the first threshold A₀ at each time matches the environment. While reducing the number of times VAD is performed and lowering the power consumption of the terminal in a noisy environment, this further avoids missing the user's voice in the sampled signal y_i.
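The decision in S303 and the update in S304 can be sketched as follows. This is an illustrative reading, not the implementation of the embodiment: the class and function names are hypothetical, the "difference" is assumed to mean A₀ minus F₀, and the simplest variant A₂ = F₀ is used for the third threshold.

```python
from collections import deque


class QuietEnvironmentDetector:
    """Tracks whether A0 - F0 > M1 held at each of the last m+1 times."""

    def __init__(self, m, m1):
        self.m1 = m1
        # One flag per time from t_{i-m} to t_i; older flags drop out.
        self.flags = deque(maxlen=m + 1)

    def observe(self, a0, f0):
        # Assumption: the "difference" in the text is taken as A0 - F0.
        self.flags.append((a0 - f0) > self.m1)

    def should_run_vad(self, audio_energy, a0):
        window_full = len(self.flags) == self.flags.maxlen
        return audio_energy < a0 and window_full and all(self.flags)


def threshold_after_vad(a0, f0, vad_succeeded):
    """S304: on VAD success use A2 (assumed to equal F0) as the next A0."""
    return f0 if vad_succeeded else a0
```

The bounded deque keeps exactly m+1 observations, so a single time at which the difference does not exceed M₁ blocks the extra VAD until m+1 further qualifying observations have accumulated.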

In the foregoing embodiment, generating the third threshold A₂ according to the second noise energy F₀ may specifically comprise: using the second noise energy F₀ as the third threshold A₂; or using the sum of the second noise energy F₀ and a preset second correction N₁ as the third threshold A₂, that is, A₂ = F₀ + N₁; or using the product of the second noise energy F₀ and a preset second coefficient a₁ as the third threshold A₂, that is, A₂ = a₁ × F₀.

If the second correction N₁ is large, the third threshold A₂ is raised quickly relative to the second noise energy F₀; if the second correction N₁ is small, the third threshold A₂ is raised slowly relative to the second noise energy F₀. The rate of the rise can be set according to actual requirements. The value of the second correction N₁ can be set according to the actual scenario and is not limited in this embodiment of the present invention. Likewise, if the second coefficient a₁ is large, the third threshold A₂ is raised quickly relative to the second noise energy F₀; if the second coefficient a₁ is small, the third threshold A₂ is raised slowly relative to the second noise energy F₀. The rate of the rise can be set according to actual requirements. The value of the second coefficient a₁ can be set according to the actual scenario and is not limited in this embodiment of the present invention.

Optionally, the product of the second noise energy F₀ and the preset second coefficient a₁, plus the preset second correction N₁, may be used as the third threshold A₂, that is, A₂ = a₁ × F₀ + N₁.

Fig. 4 is a flowchart of Embodiment 3 of the voice wake-up method of the present invention. As shown in Fig. 4, the method may comprise:

S401: Periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer.

S402: Calculate the audio energy T_i of the sampled signal y_i.

S403: When the audio energy T_i is less than the first threshold A₀ at time t_i and the difference between the first threshold A₀ at time t_i and the first noise energy S₀ is greater than a preset third threshold value M₂, generate a fourth threshold A₃ according to the first noise energy S₀, and use the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}.

For S401 and S402, refer to the description of the embodiment shown in Fig. 1; details are not repeated here.

Regarding S403: when the audio energy T_i is less than the first threshold A₀ at time t_i, the prior-art voice wake-up scheme does not perform VAD, so the user's voice may be missed. For example, the first threshold A₀ at time t_i suits a noisy environment, but the terminal is now in a relatively quiet environment, so the user's voice in the sampled signal y_i is missed. This embodiment of the present invention changes the first threshold A₀ at time t_{i+1} through S403 so that it matches the current environment.

When the difference between the first threshold A₀ at time t_i and the first noise energy S₀ is greater than the preset third threshold value M₂, that is, the first threshold A₀ at time t_i is considerably larger than the first noise energy S₀, the terminal is in a relatively quiet environment, and the first threshold A₀ at time t_i is too large and needs to be lowered to match the environment. The third threshold value M₂ is a preset parameter that can be obtained through tuning.

Because the first noise energy S₀ is obtained by extracting samples from the sampled signal y_i at the first extraction rate 1/x and applying slow tracking filtering to the extracted sampling points ys, the first noise energy S₀ reflects the stable energy of the environment. Therefore, unlike S303, S403 does not need to check at multiple times whether the difference between the first threshold A₀ and the first noise energy S₀ is greater than the preset third threshold value M₂. When the difference between the first threshold A₀ at time t_i and the first noise energy S₀ is greater than the preset third threshold value M₂, this may indicate that the sampled signal y_i contains the user's voice. To avoid missing the user's voice, the fourth threshold A₃ is generated according to the first noise energy S₀ and is used as the first threshold A₀ at time t_{i+1}.

In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained by sampling at time t_i is calculated, and when the audio energy T_i is less than the first threshold A₀ at time t_i and the difference between the first threshold A₀ at time t_i and the first noise energy S₀ is greater than the preset third threshold value M₂, the fourth threshold A₃ is generated according to the first noise energy S₀ and is used as the first threshold A₀ at time t_{i+1}. The first noise energy S₀ is obtained by extracting samples from the sampled signal y_i at the first extraction rate 1/x and applying slow tracking filtering to the extracted sampling points ys. That is, the first threshold A₀ at time t_{i+1} is derived from the first noise energy S₀ at time t_i. In this way, the terminal can adjust the first threshold A₀ for the next time according to the current ambient noise, so that the first threshold A₀ at each time matches the environment. While reducing the number of times VAD is performed and lowering the power consumption of the terminal in a noisy environment, this further avoids missing the user's voice in the sampled signal y_i.
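S403 needs only a single-time check, because S₀ is already a slowly varying estimate. A minimal sketch, in which the function name is hypothetical, the "difference" is assumed to mean A₀ minus S₀, and the fourth threshold is formed as A₃ = S₀ + N₂ with N₂ defaulting to zero:

```python
def lower_threshold_if_quiet(audio_energy, a0, s0, m2, correction=0.0):
    """Return the first threshold A0 to use at time t_{i+1} (S403).

    audio_energy -- audio energy T_i of the sampled signal y_i
    a0           -- first threshold A0 at time t_i
    s0           -- first noise energy S0 (slow-tracked)
    m2           -- preset third threshold value M2
    correction   -- assumed third correction N2 (0 gives A3 = S0)
    """
    # Assumption: the "difference" in the text is taken as A0 - S0.
    if audio_energy < a0 and (a0 - s0) > m2:
        return s0 + correction  # fourth threshold A3
    return a0
```

No VAD is triggered on this path; the threshold is simply lowered so that subsequent samples are compared against a value suited to the quieter environment.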

Based on the foregoing embodiment, generating the fourth threshold A₃ according to the first noise energy S₀ may comprise: using the first noise energy S₀ as the fourth threshold A₃; or using the sum of the first noise energy S₀ and a preset third correction N₂ as the fourth threshold A₃, that is, A₃ = S₀ + N₂; or using the product of the first noise energy S₀ and a preset third coefficient a₂ as the fourth threshold A₃, that is, A₃ = a₂ × S₀.

If the third correction N₂ is large, the fourth threshold A₃ is raised quickly relative to the first noise energy S₀; if the third correction N₂ is small, the fourth threshold A₃ is raised slowly relative to the first noise energy S₀. The rate of the rise can be set according to actual requirements. The value of the third correction N₂ can be set according to the actual scenario and is not limited in this embodiment of the present invention. Likewise, if the third coefficient a₂ is large, the fourth threshold A₃ is raised quickly relative to the first noise energy S₀; if the third coefficient a₂ is small, the fourth threshold A₃ is raised slowly relative to the first noise energy S₀. The rate of the rise can be set according to actual requirements. The value of the third coefficient a₂ can be set according to the actual scenario and is not limited in this embodiment of the present invention.

Optionally, the product of the first noise energy S₀ and the preset third coefficient a₂, plus the preset third correction N₂, may be used as the fourth threshold A₃, that is, A₃ = a₂ × S₀ + N₂.

It should additionally be noted that the second correction N₁ and the third correction N₂ respectively reflect, under different conditions, the amount by which the first threshold A₀ is raised above the noise energy: the first threshold A₀ exceeds the second noise energy F₀ by the second correction N₁, and exceeds the first noise energy S₀ by the third correction N₂. In addition, because the first noise energy S₀ is obtained by slow tracking filtering and the second noise energy F₀ by fast tracking filtering, the third correction N₂ is optionally greater than the second correction N₁, so as to match the environment quickly.

Further, this embodiment of the present invention may also record the occasions on which the first threshold changes. An occasion on which the first threshold is raised may be recorded as a threshold-raising time; an occasion on which the first threshold is lowered may be recorded as a threshold-lowering time.

Specifically, before the third threshold A₂ is used as the first threshold A₀ at time t_{i+1}, the method may further comprise: recording time t_i as a threshold-lowering time; when the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, performing the step of using the third threshold A₂ as the first threshold A₀ at time t_{i+1}; otherwise, not performing that step.

Before the fourth threshold A₃ is used as the first threshold A₀ at time t_{i+1}, the method may further comprise: recording time t_i as a threshold-lowering time; when the interval between time t_i and the previous threshold-lowering time is greater than the preset value T_time, performing the step of using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}; otherwise, not performing that step.

The two specific implementations above prevent ping-pong switching of the first threshold A₀ without affecting the reliability of voice detection, and reduce the probability of missed voice detection.
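The interval check that prevents ping-pong switching can be sketched as a small guard object. The class name is hypothetical, and two points are assumptions rather than statements of the source: time t_i is recorded on every lowering attempt (following the literal order of the steps), and the first attempt, which has no previous threshold-lowering time, is allowed.

```python
class LoweringGuard:
    """Allows lowering A0 only if enough time has passed since the last attempt."""

    def __init__(self, t_time):
        self.t_time = t_time      # preset value T_time
        self.last_lowering = None  # previous threshold-lowering time

    def try_lower(self, now, a0, candidate):
        """Return the first threshold A0 to use at time t_{i+1}."""
        previous = self.last_lowering
        self.last_lowering = now  # record t_i as a threshold-lowering time
        # Assumption: with no previous lowering time, the change is applied.
        if previous is None or (now - previous) > self.t_time:
            return candidate      # use A2 or A3 as the next A0
        return a0                 # too soon: keep the current threshold
```

An implementation could instead record only the times at which the threshold was actually lowered; the text does not settle this, and the choice affects how long repeated attempts keep the guard closed.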

This embodiment of the present invention continuously monitors and tracks the ambient background noise, adaptively adjusts the first threshold A₀ according to the level of the background noise, and adjusts the first threshold A₀ by slow rises or slow falls, thereby reducing the probability of missed voice detection. In addition, the dynamic adjustment of the first threshold A₀ makes the power consumption in quiet and noisy environments similar, which improves user experience and product competitiveness.

Fig. 5 is a schematic structural diagram of Embodiment 1 of the voice wake-up apparatus of the present invention. The voice wake-up apparatus may be implemented in hardware and may be integrated into a terminal such as a tablet computer, a smartphone, or a PDA. As shown in Fig. 5, the voice wake-up apparatus 10 comprises: an SRC 11, a computing circuit 12, a threshold decision circuit 13, a first decimator 14, a slow tracking filter (Slow Tracking Filter, STF) 15, a comparator 16, a configurator 17, and an interrupt processing circuit 18.

The SRC 11 is configured to periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer. The computing circuit 12 is configured to calculate the audio energy T_i of the sampled signal y_i. The threshold decision circuit 13 is configured to determine whether the audio energy T_i is greater than or equal to the first threshold A₀ at time t_i and, when it is, to trigger the interrupt processing circuit 18 to output an interrupt pulse signal to the interrupt control circuit 20, which enables the DSP or processor 30 to perform VAD. The input of the first decimator 14 is coupled to the output of the SRC 11; the first decimator 14 is configured to extract samples from the sampled signal y_i at the first extraction rate 1/x to obtain and output sampling points ys, where x is a natural number greater than 1. The input of the STF 15 is coupled to the output of the first decimator 14; the STF 15 is configured to apply slow tracking filtering to the extracted sampling points ys to obtain the first noise energy S₀. The input of the comparator 16 is coupled to the output of the STF 15 and to the threshold decision circuit 13; the comparator 16 is configured to compare whether the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first threshold value M₀. The configurator 17 is configured to, when the VAD fails, VAD has failed n consecutive times before time t_i, and the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first threshold value M₀, generate the second threshold A₁ according to the first noise energy S₀ and deliver the second threshold A₁ to the threshold decision circuit 13 as the first threshold A₀ at time t_{i+1}, where n is a positive integer and n is less than i.

Referring to Fig. 5, the configurator 17 configures parameters for the voice wake-up apparatus 10, such as the first threshold A₀ described above. A person skilled in the art will understand that the configurator 17 receives configuration parameters from the terminal and converts them into corresponding control signals for the logic modules in the voice wake-up apparatus 10, where the logic modules include the computing circuit 12, the threshold decision circuit 13, the interrupt processing circuit 18, and so on. The SRC 11 may sample the audio signal by downsampling, for example by converting 32 kilohertz (kHz) data to 16 kHz.

The flow of the sampled signal y_i in Fig. 5 is:

SRC 11 -> computing circuit 12 -> threshold decision circuit 13 -> interrupt processing circuit 18 (optional) -> interrupt control circuit 20 (optional) -> DSP or processor 30 (optional).

When the audio energy T_i is greater than or equal to the first threshold A₀ at time t_i, the flow of the sampled signal y_i includes the optional parts above; when the audio energy T_i is less than the first threshold A₀ at time t_i, the flow of the sampled signal y_i does not include them.
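The conditional signal path can be modeled in a few lines. This sketch is purely illustrative: the function names are hypothetical, and the audio energy is assumed to be the mean of squared sample amplitudes, which may differ from the calculation actually used by the computing circuit 12.

```python
def frame_energy(samples):
    # Assumption: audio energy is the mean of squared amplitudes.
    return sum(s * s for s in samples) / len(samples)


def signal_path(samples, a0):
    """List the blocks the sampled signal passes through for a given A0."""
    path = ["SRC", "computing circuit", "threshold decision circuit"]
    if frame_energy(samples) >= a0:
        # Optional part: only reached when T_i >= A0 at time t_i.
        path += [
            "interrupt processing circuit",
            "interrupt control circuit",
            "DSP or processor",
        ]
    return path
```

Because the DSP or processor appears only in the optional part of the path, raising A₀ in a noisy environment directly reduces how often it is woken.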

The first decimator 14, the STF 15, and the comparator 16 do not affect normal voice wake-up; they only act together with the configurator 17 to change the first threshold A₀ used in voice wake-up.

The apparatus of this embodiment may be used to execute the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are not repeated here.

In the foregoing embodiment, the configurator 17 may be specifically configured to: use the first noise energy S₀ as the second threshold A₁; or use the sum of the first noise energy S₀ and the preset first correction N₀ as the second threshold A₁, that is, A₁ = S₀ + N₀; or use the product of the first noise energy S₀ and the preset first coefficient a₀ as the second threshold A₁, that is, A₁ = a₀ × S₀; and so on. This embodiment of the present invention imposes no limitation in this respect.

Fig. 6 is a schematic structural diagram of Embodiment 2 of the voice wake-up apparatus of the present invention. The voice wake-up apparatus may be implemented in hardware and may be integrated into a terminal such as a tablet computer, a smartphone, or a PDA. As shown in Fig. 6, the voice wake-up apparatus 100 comprises: an SRC 110, a computing circuit 120, a threshold decision circuit 130, a second decimator 140, a fast tracking filter (Fast Tracking Filter, FTF) 150, a comparator 160, a configurator 170, and an interrupt processing circuit 180.

The SRC 110 is configured to periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer. The computing circuit 120 is configured to calculate the audio energy T_i of the sampled signal y_i. The threshold decision circuit 130 is configured to determine whether the audio energy T_i is greater than or equal to the first threshold A₀ at time t_i. The input of the second decimator 140 is coupled to the output of the SRC 110; the second decimator 140 is configured to extract samples from the sampled signal y_i at the second extraction rate 1/z to obtain sampling points yf, where z is a natural number greater than x. The input of the FTF 150 is coupled to the output of the second decimator 140; the FTF 150 is configured to apply fast tracking filtering to the extracted sampling points yf to obtain the second noise energy F₀. The input of the comparator 160 is coupled to the output of the FTF 150; the comparator 160 is configured to, when the audio energy T_i is less than the first threshold A₀ at time t_i, compare whether the difference between the first threshold at each time and the second noise energy F₀ is greater than the preset second threshold value M₁, and, when the difference between the first threshold A₀ and the second noise energy F₀ at each time from t_{i-m} to t_i is greater than the preset second threshold value M₁, to trigger the interrupt processing circuit 180 to output an interrupt pulse signal to the interrupt control circuit 200, which enables the DSP or processor 300 to perform VAD, where m is a positive integer and m is less than i. The configurator 170 is configured to, when the VAD succeeds, generate the third threshold A₂ according to the second noise energy F₀ and deliver the third threshold A₂ to the threshold decision circuit 130 as the first threshold A₀ at time t_{i+1}.

The apparatus of this embodiment may be used to execute the technical solution of the method embodiment shown in Fig. 3; its implementation principle and technical effect are similar and are not repeated here.

On the basis of the foregoing embodiment, the configurator 170 may be specifically configured to: use the second noise energy F₀ as the third threshold A₂; or use the sum of the second noise energy F₀ and the preset second correction N₁ as the third threshold A₂, that is, A₂ = F₀ + N₁; or use the product of the second noise energy F₀ and the preset second coefficient a₁ as the third threshold A₂, that is, A₂ = a₁ × F₀; and so on. This embodiment of the present invention imposes no limitation in this respect.

Optionally, the configurator 170 may further be configured to: record time t_i as a threshold-lowering time; when the interval between time t_i and the previous threshold-lowering time is greater than the preset value T_time, perform the step of using the third threshold A₂ as the first threshold A₀ at time t_{i+1}; otherwise, not perform that step. This prevents ping-pong switching of the first threshold A₀ without affecting the reliability of voice detection, and reduces the probability of missed voice detection.

Referring to Fig. 5, the configurator 17 may further be configured to: when the audio energy T_i is less than the first threshold A₀ at time t_i and the difference between the first threshold A₀ at time t_i and the first noise energy S₀ is greater than the preset third threshold value M₂, generate the fourth threshold A₃ according to the first noise energy S₀, and use the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}.

In this case, the apparatus of this embodiment may be used to execute the technical solution of the method embodiment shown in Fig. 4; its implementation principle and technical effect are similar and are not repeated here.

Further, the configurator 17 may be specifically configured to: use the first noise energy S₀ as the fourth threshold A₃; or use the sum of the first noise energy S₀ and the preset third correction N₂ as the fourth threshold A₃, that is, A₃ = S₀ + N₂; or use the product of the first noise energy S₀ and the preset third coefficient a₂ as the fourth threshold A₃, that is, A₃ = a₂ × S₀; and so on. This embodiment of the present invention imposes no limitation in this respect.

Further, the configurator 17 may also be configured to: record time t_i as a threshold-lowering time; when the interval between time t_i and the previous threshold-lowering time is greater than the preset value T_time, perform the step of using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}; otherwise, not perform that step. This prevents ping-pong switching of the first threshold A₀ without affecting the reliability of voice detection, and reduces the probability of missed voice detection.

Referring to Fig. 5 and Fig. 6, the first decimator 14 and the second decimator 140 perform long-period and short-period data extraction, respectively. The STF 15 is a slowly converging filter used to track changes in ambient noise stably. The FTF 150 is a rapidly converging filter used to track changes in ambient noise quickly. The STF 15 and the FTF 150 track the energy of the current calculation window and adopt a structure similar to that of the computing circuit 12 or the computing circuit 120. The STF 15 and the FTF 150 differ in filter order and parameters, which are set according to actual tuning. The FTF 150 performs short-period filtering, that is, recent data changes quickly affect the filter output. The STF 15 performs long-period filtering, that is, recent data changes affect the filter output only slightly and slowly.
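The contrast between slow and fast tracking can be illustrated with a first-order exponential smoother. This filter form is a substitute chosen for illustration: the text states only that the STF and FTF differ in order and parameters set by tuning, so the structure, the function names, and the smoothing factors below are all assumptions.

```python
def decimate(samples, factor):
    """Keep every factor-th sample (extraction rate 1/factor)."""
    return samples[::factor]


def track_energy(samples, alpha, initial=0.0):
    """First-order exponential tracking of sample energy.

    A small alpha converges slowly (STF-like behavior); a large alpha
    converges quickly (FTF-like behavior). Assumed filter form.
    """
    estimate = initial
    for s in samples:
        estimate += alpha * (s * s - estimate)
    return estimate


# Quiet input followed by a sudden loud burst (illustrative data).
signal = [0.0] * 50 + [1.0] * 10
slow = track_energy(signal, alpha=0.02)  # barely reacts to the burst
fast = track_energy(signal, alpha=0.5)   # follows the burst closely
```

After the ten loud samples the fast estimate is close to 1 while the slow estimate remains well below it, which matches the description of short-period and long-period filtering.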

Optionally, combining Fig. 6 on the basis of Fig. 5 yields the structure shown in Fig. 7. Fig. 7 is a schematic structural diagram of Embodiment 3 of the voice wake-up apparatus of the present invention. As shown in Fig. 7, the voice wake-up apparatus 1000 comprises: the SRC 11, the computing circuit 12, the threshold decision circuit 13, the first decimator 14, the second decimator 140, the STF 15, the FTF 150, the comparator 16, the configurator 17, and the interrupt processing circuit 18.

The threshold decision circuit 13 also has the role and function of the threshold decision circuit 130; the comparator 16 also has the role and function of the comparator 160; the configurator 17 also has the role and function of the configurator 170; and the interrupt processing circuit 18 also has the role and function of the interrupt processing circuit 180. The specific principles are as in the foregoing embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units or modules is merely a division by logical function; other divisions are possible in actual implementation, for example, multiple units or modules may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections between devices or modules through some interfaces, and may be electrical, mechanical, or in other forms.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A voice wake-up method, characterized by comprising:

calculating the audio energy T_i of the sampled signal y_i;

when VAD fails, VAD has failed n consecutive times before the time t_i, and the difference between a first noise energy S₀ and the first threshold A₀ at the time t_i is greater than a preset first threshold value M₀, generating a second threshold A₁ according to the first noise energy S₀, and using the second threshold A₁ as the first threshold A₀ at time t_{i+1}, wherein the first noise energy S₀ is obtained by extracting samples from the sampled signal y_i at a first extraction rate 1/x and applying slow tracking filtering to the extracted sampling points ys, x is a natural number greater than 1, and n is a positive integer less than i.

2. The method according to claim 1, characterized in that the generating of the second threshold A₁ according to the first noise energy S₀ comprises:

using the first noise energy S₀ as the second threshold A₁;

3. The method according to claim 1, characterized in that, after the calculating of the audio energy T_i of the sampled signal y_i, the method further comprises:

4. method according to claim 3, is characterized in that, described according to described second noise energy F ₀generate the 3rd threshold value A ₂, comprising:

By described second noise energy F ₀as described 3rd threshold value A ₂;

5. the method according to claim 3 or 4, is characterized in that, by described 3rd threshold value A ₂as t _i+1the first threshold A in moment ₀before, also comprise:

Record described t _imoment is for reducing the threshold value moment;

6. method according to claim 1, is characterized in that, at the described sampled signal y of described calculating _iaudio power T _iafterwards, also comprise:

7. method according to claim 6, is characterized in that, described according to described first noise energy S ₀generate the 4th threshold value A ₃, comprising:

using the first noise energy S₀ as the fourth threshold A₃;

8. The method according to claim 6 or 7, characterized in that, before the using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}, the method further comprises:

recording the time t_i as a threshold-lowering time;

9. A voice wake-up apparatus, characterized by comprising:

a first extractor, wherein an input of the first extractor is coupled to an output of the SRC, configured to extract from the sampled signal y_i at a first extraction rate 1/x to obtain sample points ys, where x is a natural number greater than 1;

a slow tracking filter (STF), wherein an input of the STF is coupled to an output of the first sampler, configured to perform slow tracking filtering on the extracted sample points ys to obtain the first noise energy S₀;

a comparator, wherein an input of the comparator is coupled to outputs of the first extractor and the threshold decision circuit, configured to compare whether the difference between the first noise energy S₀ and the first threshold A₀ at the time t_i is greater than the preset first threshold value M₀;

10. The apparatus according to claim 9, characterized in that the configurator is specifically configured to:

use the first noise energy S₀ as the second threshold A₁;

11. The apparatus according to claim 9, characterized by further comprising:

a fast tracking filter (FTF), wherein an input of the FTF is coupled to an output of the second extractor, configured to perform fast tracking filtering on the extracted sample points yf to obtain a second noise energy F₀;

the comparator, wherein an input of the comparator is coupled to an output of the FTF, further configured to: when the audio energy T_i is less than the first threshold A₀ at the time t_i, compare whether the difference between the first threshold at each time and the second noise energy F₀ is greater than a preset second threshold value M₁; and, when the difference between the respective first threshold A₀ and the second noise energy F₀ is greater than the preset second threshold value M₁ at every time from t_{i-m} through t_i, trigger the interrupt processing circuit to output an interrupt pulse signal to the interrupt control circuit, so that the interrupt control circuit enables the DSP or the processor to carry out VAD, where m is a positive integer less than i;

12. The apparatus according to claim 11, characterized in that the configurator is specifically configured to:

use the second noise energy F₀ as the third threshold A₂;

13. The apparatus according to claim 11 or 12, characterized in that the configurator is further configured to:

record the time t_i as a threshold-lowering time;

14. The apparatus according to claim 9, characterized in that the configurator is further configured to:

15. The apparatus according to claim 14, characterized in that the configurator is specifically configured to:

use the first noise energy S₀ as the fourth threshold A₃;

16. The apparatus according to claim 14 or 15, characterized in that the configurator is further configured to:

record the time t_i as a threshold-lowering time;
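
As an illustration only, the threshold-raising condition recited in claims 1 and 2 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the `alpha` smoothing coefficient, and the one-pole smoother used as the slow tracking filter are all hypothetical, since the claims do not specify the filter structure. The claims only require that S₀ be obtained by extracting every x-th sample and slow tracking filtering, and that A₁ be generated from S₀ when the three conditions hold.

```python
def slow_tracking_energy(samples, x, alpha=0.01):
    """Estimate the first noise energy S0 (hypothetical helper).

    Keeps every x-th sample (extraction rate 1/x), then smooths the squared
    samples with a one-pole filter. The one-pole form and alpha are
    assumptions; the claims do not define the slow tracking filter.
    """
    s0 = 0.0
    for ys in samples[::x]:
        s0 += alpha * (ys * ys - s0)
    return s0


def next_first_threshold(a0, s0, m0, vad_failed, prior_failures, n):
    """Return the first threshold A0 to use at time t_{i+1} (hypothetical helper).

    a0: first threshold at time t_i
    s0: first noise energy
    m0: preset first threshold value M0
    vad_failed: whether VAD failed at time t_i
    prior_failures: consecutive VAD failures immediately before t_i
    n: required number of consecutive prior failures
    """
    if vad_failed and prior_failures >= n and (s0 - a0) > m0:
        # Option recited in claim 2: use S0 itself as the second threshold A1.
        return s0
    # Otherwise this sketch leaves the threshold unchanged.
    return a0
```

For example, with A₀ = 1.0, S₀ = 5.0, M₀ = 2.0, and three prior consecutive failures where n = 3, the sketch returns 5.0 as the first threshold for t_{i+1}; if S₀ were 2.5, the difference 1.5 would not exceed M₀ and the threshold would stay at 1.0.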