Summary of the invention
Embodiments of the present invention provide a voice wake-up method and device for reducing the power consumption of a terminal in a noisy environment.
In a first aspect, an embodiment of the present invention provides a voice wake-up method, comprising:
Periodically sampling an audio signal, wherein a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer;
Calculating the audio energy T_i of the sampled signal y_i;
Performing voice activity detection (VAD) when the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i;
When the VAD fails, VAD has failed n consecutive times before time t_i, and the difference between a first noise energy S_0 and the first threshold A_0 at time t_i is greater than a preset first threshold value M_0, generating a second threshold A_1 according to the first noise energy S_0 and using the second threshold A_1 as the first threshold A_0 at time t_{i+1}, wherein the first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys, x is a natural number greater than 1, and n is a positive integer less than i.
With reference to the first aspect, in a first possible implementation of the first aspect, generating the second threshold A_1 according to the first noise energy S_0 comprises:
Using the first noise energy S_0 as the second threshold A_1;
Or using the sum of the first noise energy S_0 and a preset first correction amount N_0 as the second threshold A_1;
Or using the product of the first noise energy S_0 and a preset first coefficient a_0 as the second threshold A_1.
With reference to the first aspect, in a second possible implementation of the first aspect, after calculating the audio energy T_i of the sampled signal y_i, the method further comprises:
Performing VAD when the audio energy T_i is less than the first threshold A_0 at time t_i and, at each time from t_{i-m} to t_i, the difference between the respective first threshold A_0 and a second noise energy F_0 is greater than a preset second threshold value M_1, where m is a positive integer less than i;
When the VAD succeeds, generating a third threshold A_2 according to the second noise energy F_0 and using the third threshold A_2 as the first threshold A_0 at time t_{i+1}, wherein the second noise energy F_0 is obtained by decimating the sampled signal y_i at a second decimation rate 1/z and applying fast tracking filtering to the decimated sample points yf, and z is a natural number greater than x.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, generating the third threshold A_2 according to the second noise energy F_0 comprises:
Using the second noise energy F_0 as the third threshold A_2;
Or using the sum of the second noise energy F_0 and a preset second correction amount N_1 as the third threshold A_2;
Or using the product of the second noise energy F_0 and a preset second coefficient a_1 as the third threshold A_2.
With reference to the second or third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, before using the third threshold A_2 as the first threshold A_0 at time t_{i+1}, the method further comprises:
Recording time t_i as a threshold-lowering time;
When the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, performing the step of using the third threshold A_2 as the first threshold A_0 at time t_{i+1}; otherwise, not performing that step.
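The interval check that gates threshold lowering can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the class name `LoweringGuard`, the use of seconds as the time unit, and the decision to allow the very first request are assumptions. Following the wording above, time t_i is recorded on every request, before the interval is compared.

```python
class LoweringGuard:
    """Rate-limits threshold lowering (hypothetical helper)."""

    def __init__(self, t_time):
        self.t_time = t_time      # preset minimum interval T_time
        self.last_time = None     # previous threshold-lowering time

    def allow(self, t_i):
        """Record t_i as a threshold-lowering time and report whether the
        interval since the previous one exceeds T_time."""
        previous = self.last_time
        self.last_time = t_i
        # Assumption: with no previous lowering time, the update is allowed.
        return previous is None or (t_i - previous) > self.t_time


guard = LoweringGuard(t_time=2.0)
decisions = [guard.allow(t) for t in (0.0, 1.0, 4.0)]
```

In this sketch the request at 1.0 is rejected because only 1.0 time unit has passed since the previous recorded time, while the request at 4.0 is accepted.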
With reference to the first aspect, in a fifth possible implementation of the first aspect, after calculating the audio energy T_i of the sampled signal y_i, the method further comprises:
When the audio energy T_i is less than the first threshold A_0 at time t_i and the difference between the first threshold A_0 at time t_i and the first noise energy S_0 is greater than a preset third threshold value M_2, generating a fourth threshold A_3 according to the first noise energy S_0 and using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, generating the fourth threshold A_3 according to the first noise energy S_0 comprises:
Using the first noise energy S_0 as the fourth threshold A_3;
Or using the sum of the first noise energy S_0 and a preset third correction amount N_2 as the fourth threshold A_3;
Or using the product of the first noise energy S_0 and a preset third coefficient a_2 as the fourth threshold A_3.
With reference to the fifth or sixth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, before using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}, the method further comprises:
Recording time t_i as a threshold-lowering time;
When the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, performing the step of using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}; otherwise, not performing that step.
In a second aspect, an embodiment of the present invention provides a voice wake-up device, comprising:
A sample rate converter (SRC), configured to periodically sample an audio signal, wherein a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer;
A computing circuit, configured to calculate the audio energy T_i of the sampled signal y_i;
A threshold decision circuit, configured to determine whether the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i and, when it is, to trigger an interrupt processing circuit to output an interrupt pulse signal to an interrupt control circuit, so that the interrupt control circuit enables a digital signal processor (DSP) or a processor to perform voice activity detection (VAD);
A first decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a first decimation rate 1/x to obtain sample points ys, where x is a natural number greater than 1;
A slow tracking filter (STF), whose input is coupled to the output of the first decimator, configured to apply slow tracking filtering to the decimated sample points ys to obtain a first noise energy S_0;
A comparator, whose input is coupled to the output of the STF and to the threshold decision circuit, configured to determine whether the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than a preset first threshold value M_0;
A configurator, configured to, when the VAD fails, VAD has failed n consecutive times before time t_i, and the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first threshold value M_0, generate a second threshold A_1 according to the first noise energy S_0 and deliver the second threshold A_1 to the threshold decision circuit as the first threshold A_0 at time t_{i+1}, where n is a positive integer less than i.
With reference to the second aspect, in a first possible implementation of the second aspect, the configurator is specifically configured to:
Use the first noise energy S_0 as the second threshold A_1;
Or use the sum of the first noise energy S_0 and a preset first correction amount N_0 as the second threshold A_1;
Or use the product of the first noise energy S_0 and a preset first coefficient a_0 as the second threshold A_1.
With reference to the second aspect, in a second possible implementation of the second aspect, the device further comprises:
A second decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a second decimation rate 1/z to obtain sample points yf, where z is a natural number greater than x;
A fast tracking filter (FTF), whose input is coupled to the output of the second decimator, configured to apply fast tracking filtering to the decimated sample points yf to obtain a second noise energy F_0;
The comparator is also coupled to the output of the FTF and is further configured to, when the audio energy T_i is less than the first threshold A_0 at time t_i, determine whether the difference between the first threshold at each time and the second noise energy F_0 is greater than a preset second threshold value M_1, and, when at each time from t_{i-m} to t_i the difference between the respective first threshold A_0 and the second noise energy F_0 is greater than the preset second threshold value M_1, trigger the interrupt processing circuit to output an interrupt pulse signal to the interrupt control circuit, so that the interrupt control circuit enables the DSP or the processor to perform VAD, where m is a positive integer less than i;
The configurator is further configured to, when the VAD succeeds, generate a third threshold A_2 according to the second noise energy F_0 and deliver the third threshold A_2 to the threshold decision circuit as the first threshold A_0 at time t_{i+1}.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the configurator is specifically configured to:
Use the second noise energy F_0 as the third threshold A_2;
Or use the sum of the second noise energy F_0 and a preset second correction amount N_1 as the third threshold A_2;
Or use the product of the second noise energy F_0 and a preset second coefficient a_1 as the third threshold A_2.
With reference to the second or third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the configurator is further configured to:
Record time t_i as a threshold-lowering time;
When the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, perform the step of using the third threshold A_2 as the first threshold A_0 at time t_{i+1}; otherwise, not perform that step.
With reference to the second aspect, in a fifth possible implementation of the second aspect, the configurator is further configured to:
When the audio energy T_i is less than the first threshold A_0 at time t_i and the difference between the first threshold A_0 at time t_i and the first noise energy S_0 is greater than a preset third threshold value M_2, generate a fourth threshold A_3 according to the first noise energy S_0 and use the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the configurator is specifically configured to:
Use the first noise energy S_0 as the fourth threshold A_3;
Or use the sum of the first noise energy S_0 and a preset third correction amount N_2 as the fourth threshold A_3;
Or use the product of the first noise energy S_0 and a preset third coefficient a_2 as the fourth threshold A_3.
With reference to the fifth or sixth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the configurator is further configured to:
Record time t_i as a threshold-lowering time;
When the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, perform the step of using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}; otherwise, not perform that step.
Embodiments of the present invention provide a voice wake-up method and device. The audio energy T_i of the sampled signal y_i obtained by sampling at time t_i is calculated, and VAD is performed when the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i. When the VAD fails, VAD has failed n consecutive times before time t_i, and the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than a preset first threshold value M_0, the first threshold A_0 is adjusted to obtain the first threshold A_0 at time t_{i+1}: a second threshold A_1 is generated according to the first noise energy S_0 and used as the first threshold A_0 at time t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys. In other words, the first threshold A_0 at time t_{i+1} is derived from the first noise energy S_0 at time t_i. The terminal can therefore adjust the first threshold A_0 for the next time according to the current ambient noise, so that the first threshold A_0 at each time matches the environment. This reduces the number of times VAD is performed and thereby reduces the power consumption of the terminal in a noisy environment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Voice wake-up means that, in any situation, a terminal can be activated by a predefined wake-up word and then execute a specific application. It is similar to a user pressing a key to light up the screen and activate the phone. The advantage of voice wake-up is that it frees the user's hands.
In the voice wake-up scheme of one smartphone, the standby power consumption of the smartphone is about 2.2 mA × 3.8 V in a quiet environment and 5.5 mA × 3.8 V in a noisy environment. The difference in power consumption between the noisy and quiet environments is therefore about 12.5 mW: (5.5 - 2.2) × 3.8 ≈ 12.5.
According to the power consumption estimation model, average power consumption = quiet power consumption × 70% + noisy power consumption × 30%. Reducing the power consumption in a noisy environment should therefore be considered, and the embodiments of the present invention focus on power optimization in a noisy environment.
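The figures above can be checked with a short calculation. The sketch below uses only the currents, the voltage, and the 70%/30% split given in the text; the variable names are illustrative, and `noisy_share` is a derived quantity added here to show how much of the average is attributable to the noisy environment.

```python
VOLTAGE_V = 3.8
QUIET_MA = 2.2
NOISY_MA = 5.5

quiet_mw = QUIET_MA * VOLTAGE_V       # standby power in a quiet environment
noisy_mw = NOISY_MA * VOLTAGE_V       # standby power in a noisy environment
difference_mw = noisy_mw - quiet_mw   # roughly 12.5 mW

# Estimation model from the text: 70% of the time quiet, 30% noisy.
average_mw = 0.7 * quiet_mw + 0.3 * noisy_mw

# Fraction of the average power that comes from the noisy environment.
noisy_share = 0.3 * noisy_mw / average_mw
```

Under these numbers the noisy environment accounts for slightly more than half of the estimated average power even though it is weighted at only 30%, which is consistent with the focus on the noisy case.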
Embodiments of the present invention provide a method and device for waking up a digital signal processor (DSP) in a terminal by voice, so as to reduce the number of times the DSP in the terminal is woken up to perform VAD and thereby reduce the power consumption of the terminal in a noisy environment.
Fig. 1 is a flowchart of Embodiment 1 of the voice wake-up method of the present invention. The method may be performed by a voice wake-up device, which may be implemented in hardware. The voice wake-up device may be integrated in a terminal such as a tablet computer, a smartphone, or a personal digital assistant (PDA). As shown in Fig. 1, the voice wake-up method includes:
S101: Periodically sample an audio signal, wherein a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer.
Similarly, the sampled signal at time t_{i-1} can be denoted y_{i-1}, the sampled signal at time t_{i+1} can be denoted y_{i+1}, and so on.
In any embodiment of the present invention, the audio signal may be a signal collected by a sound collection device such as a microphone. The audio signal collected by the sound collection device is periodically sampled by a sample rate converter (SRC). Alternatively, the collected audio signal is first processed by a filter such as a band-pass filter and then periodically sampled by the SRC; the embodiments of the present invention impose no limitation on this.
S102: Calculate the audio energy T_i of the sampled signal y_i.
It should be noted that the audio energy of a sampled signal can be calculated once the sampled signal has been obtained. For example, after the sampled signal y_{i-1} is obtained by sampling at time t_{i-1}, its audio energy T_{i-1} can also be calculated.
Those skilled in the art will understand that, because the sampled signal y_i is known, its audio energy T_i can be obtained by calculation.
Specifically, let x(j) denote the amplitude of the j-th sample point in the sampled signal y_i, so that x(j) × x(j) is the energy of the sampled signal y_i at the j-th sample point, where j is an integer from 0 to M-1 and M is the total number of sample points. Let the coefficient a_j denote the weight of each sample point, and let T_i denote the audio energy of the sampled signal y_i. For example, the following formula performs a normalization in which each sample point is expressed as a percentage of the overall energy:
[The formula and the definition that follows it are not reproduced in this text.]
This is only one example of calculating the audio energy T_i of the sampled signal y_i, and the embodiments of the present invention are not limited to it. The audio energy T_i can also be obtained by root mean square (RMS) or another similar method, for example without the normalization.
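Because the formula itself is not reproduced in this text, the following is only a sketch of one plausible reading of the definitions above: a weighted sum of the per-sample energies x(j) × x(j) with weights a_j. The function names, the default of uniform weights that sum to 1, and the sample values are assumptions made for illustration. The RMS alternative mentioned in the text is shown alongside.

```python
import math


def weighted_energy(samples, weights=None):
    """Audio energy as a weighted sum of per-sample energies x(j) * x(j)."""
    m = len(samples)
    if weights is None:
        # Assumption: uniform weights a_j = 1/M, which sum to 1.
        weights = [1.0 / m] * m
    return sum(a * x * x for a, x in zip(weights, samples))


def rms_energy(samples):
    """Root-mean-square alternative mentioned in the text."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


frame = [3.0, -4.0, 0.0, 0.0]      # hypothetical sampled signal y_i, M = 4
t_i = weighted_energy(frame)       # mean-square energy with uniform weights
t_i_rms = rms_energy(frame)
```

With uniform weights the weighted sum reduces to the mean-square energy, and the RMS value is its square root; non-uniform weights would emphasize some sample points over others.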
S103: Perform VAD when the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i.
The VAD may be performed by an element in the terminal such as the DSP or a processor.
S104: When the VAD fails, VAD has failed n consecutive times before time t_i, and the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than a preset first threshold value M_0, generate a second threshold A_1 according to the first noise energy S_0 and use the second threshold A_1 as the first threshold A_0 at time t_{i+1}, wherein the first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys, x is a natural number greater than 1, and n is a positive integer less than i.
It should be noted that "the VAD fails and VAD has failed n consecutive times before time t_i" means that the VAD performed at time t_i fails and the VAD performed at each time from t_{i-n} to t_{i-1} also fails. For example, if n is 2, it means that before the VAD performed at time t_i fails, the VAD performed at the two preceding times (t_{i-2} and t_{i-1}) has failed twice in a row. To aid understanding of the technical solution, an example of a VAD failure is as follows. The current sound is that of a car engine. Because the audio energy of this sound is greater than the first threshold A_0 at the current time, VAD must be performed; the VAD, however, determines that the sound is not the user's voice, so the VAD fails. In other words, if the terminal is in a high-noise environment, the noise energy of the ambient noise is relatively high, and whenever it exceeds the first threshold A_0 at the current time, VAD has to be started. Because the ambient noise itself is disordered, no valid voice signal can be detected in it, so the VAD fails. The first noise energy S_0 represents the energy level of the steady-state noise in the environment of the terminal. The first threshold value M_0 is a preset parameter that can be determined by tuning.
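The decision in S104 can be sketched as a small function. This is a minimal sketch under stated assumptions, not the claimed implementation: the function name, the representation of the failure history as a count of consecutive failures immediately before t_i, and the `make_a1` callback are hypothetical, and the difference is taken as S_0 - A_0, which the text implies by associating a large difference with a high-noise environment.

```python
def next_threshold_after_vad_failure(a0, s0, m0, prior_failures, n, make_a1):
    """Return the first threshold A_0 to use at t_{i+1} after VAD failed at t_i.

    a0             -- first threshold A_0 at time t_i
    s0             -- first noise energy S_0 (steady-state noise estimate)
    m0             -- preset first threshold value M_0
    prior_failures -- consecutive VAD failures immediately before t_i
    n              -- required number of prior consecutive failures
    make_a1        -- hypothetical rule that maps S_0 to the second threshold A_1
    """
    # Assumption: "difference" means S_0 - A_0.
    if prior_failures >= n and (s0 - a0) > m0:
        return make_a1(s0)
    return a0


raise_by_five = lambda s0: s0 + 5.0   # one possible rule: A_1 = S_0 + N_0
raised = next_threshold_after_vad_failure(50.0, 70.0, 10.0, 2, 2, raise_by_five)
too_few = next_threshold_after_vad_failure(50.0, 70.0, 10.0, 1, 2, raise_by_five)
small_gap = next_threshold_after_vad_failure(50.0, 55.0, 10.0, 2, 2, raise_by_five)
```

Only the first call changes the threshold; the other two keep A_0 because either the failure history is too short or the noise estimate is not far enough above the threshold.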
It should also be noted that, in any embodiment of the present invention, "first" and "second" are used only to distinguish between instances of the same term. For example, "first" in "first threshold" and "second" in "second threshold" merely name different thresholds and do not indicate any order among them.
In practical application scenarios, the noise level differs from scenario to scenario. For example, noise in a quiet environment is about 30 to 35 decibels (dB). For noisy environments, the following figures can serve as a reference: shopping mall noise about 60 dB, road noise about 70 dB, aircraft cabin noise about 70 dB, bus noise about 80 dB, subway noise about 90 dB, and so on. In addition, the noise level at the same place varies with time; for example, daytime and nighttime noise at the same place may differ by 10 to 15 dB.
Furthermore, when a user makes a call or talks in a noisy environment, the user subconsciously raises the speaking volume, which improves the signal-to-noise ratio (SNR) and provides a basis for the feasibility of voice wake-up.
Therefore, current voice wake-up schemes that use a uniform noise threshold, that is, a preset threshold, cannot treat quiet and noisy environments differently when waking up the terminal by voice. If the preset threshold is set too high, voice is missed; if it is set too low, the processor is woken up frequently, which increases power consumption.
In the embodiments of the present invention, the first threshold A_0 at each time is adjusted as appropriate.
Specifically, S101 to S103 obtain the audio energy T_i of the sampled signal y_i sampled at time t_i and compare it with the first threshold A_0 at time t_i. When the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i, VAD is performed by the DSP, the processor, or the like, and whether to wake up the terminal is decided according to the VAD result. If the VAD succeeds, that is, the element performing VAD (the DSP, the processor, or the like) detects the user's voice in the sampled signal y_i, the terminal is woken up. Otherwise the VAD fails, that is, that element does not detect the user's voice in the sampled signal y_i, and the terminal is not woken up.
In S104, when the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first threshold value M_0, the terminal is probably in an environment with high background noise. In this case, a second threshold A_1 is generated according to the first noise energy S_0 and used as the first threshold A_0 at time t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys, where x is a natural number greater than 1 and n is a positive integer less than i. In practice, the sampled signal y_i may contain both the user's voice and the ambient noise at time t_i, or only the ambient noise at time t_i. The first threshold A_0 for time t_{i+1} is obtained at time t_i; it is the first threshold that the terminal uses when performing S103 and S104 of the voice wake-up method at time t_{i+1}.
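The text does not specify the structure of the slow tracking filter, so the following sketch substitutes a common stand-in: plain decimation followed by first-order exponential smoothing of the per-sample energy. The function names, the smoothing constant `alpha`, the decimation factor of 4, and the test signal are all assumptions made for illustration.

```python
def decimate(samples, factor):
    """Keep one sample out of every `factor` samples (rate 1/factor).

    No anti-aliasing filter is applied in this sketch.
    """
    return samples[::factor]


def track_noise_energy(points, alpha, initial=0.0):
    """First-order smoothing of per-sample energy.

    A small alpha makes the estimate follow changes slowly, which is the
    behaviour assumed here for the slow tracking filter.
    """
    estimate = initial
    for p in points:
        estimate += alpha * (p * p - estimate)
    return estimate


y_i = [2.0] * 64                          # hypothetical sampled signal
ys = decimate(y_i, 4)                     # first decimation rate 1/x with x = 4
s0 = track_noise_energy(ys, alpha=0.05)   # slow estimate of the noise energy
```

Because the smoothing constant is small, the estimate after 16 decimated points is still well below the true per-sample energy of 4.0; it converges only if the noise level persists, which is what makes it a measure of steady-state rather than transient noise.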
If the voice wake-up at time t_i is the first voice wake-up, the first threshold A_0 at time t_i may be preset. The preset first threshold A_0 can be regarded as a tuning parameter that corresponds to one possible application scenario; for example, a first threshold A_0 preset to 50 decibels can be regarded as the ambient-noise threshold for a quiet environment. Fig. 2 shows examples of the first threshold in a quiet environment and in a noisy environment. As shown in Fig. 2, in a quiet environment the first threshold is higher than the ambient noise by a first preset value; in a noisy environment the first threshold is higher than the ambient noise by a second preset value. In addition, the first threshold for the noisy environment is higher than that for the quiet environment.
In addition, S103 may instead be any of the following: 1) perform VAD when the difference between the audio energy T_i and the audio energy T_{i-1} at time t_{i-1} is greater than or equal to the differential threshold A_00 at time t_i; 2) perform VAD when the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i and the difference between the audio energy T_i and the audio energy T_{i-1} at time t_{i-1} is greater than or equal to the differential threshold A_00 at time t_i; or 3) perform VAD when at least one of the following two conditions is met: the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i, or the difference between the audio energy T_i and the audio energy T_{i-1} at time t_{i-1} is greater than or equal to the differential threshold A_00 at time t_i. The audio energy T_{i-1} at time t_{i-1} is cached in the terminal; it was obtained at time t_{i-1} by calculating the audio energy of the sampled signal y_{i-1}.
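The three alternative trigger conditions can be sketched as one function with a mode selector. The function name and the integer `mode` parameter are hypothetical; the comparisons follow the three cases as stated.

```python
def should_run_vad(t_i, t_prev, a0, a00, mode):
    """Decide whether to perform VAD at time t_i.

    t_i, t_prev -- audio energies T_i and the cached T_{i-1}
    a0          -- first threshold A_0 at time t_i
    a00         -- differential threshold A_00 at time t_i
    mode        -- 1: energy jump only; 2: level and jump; 3: level or jump
    """
    level = t_i >= a0
    jump = (t_i - t_prev) >= a00
    if mode == 1:
        return jump
    if mode == 2:
        return level and jump
    if mode == 3:
        return level or jump
    raise ValueError("mode must be 1, 2 or 3")


# A loud but steady signal: above A_0, but only 5 units above the last frame.
steady = [should_run_vad(60.0, 55.0, 55.0, 15.0, mode) for mode in (1, 2, 3)]
```

For the steady signal in the example only mode 3 triggers VAD, since the level condition holds but the energy jump is below the differential threshold.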
In case 1), the differential threshold A_00 at time t_i is adjusted by a method similar to that used to adjust the first threshold A_0 at time t_i. In case 2), both the first threshold A_0 and the differential threshold A_00 at time t_i are adjusted by a similar method. In case 3), either the first threshold A_0 or the differential threshold A_00 at time t_i is adjusted by a similar method.
In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained by sampling at time t_i is calculated, and VAD is performed when the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i. When the VAD fails, VAD has failed n consecutive times before time t_i, and the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first threshold value M_0, the first threshold A_0 is adjusted to obtain the first threshold A_0 at time t_{i+1}: a second threshold A_1 is generated according to the first noise energy S_0 and used as the first threshold A_0 at time t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys. In other words, the first threshold A_0 at time t_{i+1} is derived from the first noise energy S_0 at time t_i. The terminal can therefore adjust the first threshold A_0 for the next time according to the current ambient noise, so that the first threshold A_0 at each time matches the environment. This reduces the number of times VAD is performed and thereby reduces the power consumption of the terminal in a noisy environment.
In the foregoing embodiment, generating the second threshold A_1 according to the first noise energy S_0 may include: using the first noise energy S_0 as the second threshold A_1; or using the sum of the first noise energy S_0 and a preset first correction amount N_0 as the second threshold A_1, that is, A_1 = S_0 + N_0; or using the product of the first noise energy S_0 and a preset first coefficient a_0 as the second threshold A_1, that is, A_1 = a_0 × S_0.
If the first correction amount N_0 is large, the second threshold A_1 rises quickly relative to the first noise energy S_0; if N_0 is small, A_1 rises slowly. The rate of rise can be set according to actual needs, and the value of the first correction amount N_0 can be set according to the actual scenario; the embodiments of the present invention impose no limitation on it. Likewise, if the first coefficient a_0 is large, the second threshold A_1 rises quickly relative to the first noise energy S_0; if a_0 is small, A_1 rises slowly. The rate of rise can again be set according to actual needs, and the value of the first coefficient a_0 can be set according to the actual scenario; the embodiments of the present invention impose no limitation on it.
Optionally, the product of the first noise energy S_0 and the preset first coefficient a_0, plus the preset first correction amount N_0, may be used as the second threshold A_1, that is, A_1 = a_0 × S_0 + N_0.
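All four ways of generating the second threshold are special cases of A_1 = a_0 × S_0 + N_0, which the sketch below makes explicit. The function name and the parameter names `coeff_a0` and `n0` are illustrative, and the numeric values are arbitrary.

```python
def second_threshold(s0, coeff_a0=1.0, n0=0.0):
    """A_1 = a_0 * S_0 + N_0.

    The defaults (a_0 = 1, N_0 = 0) reproduce A_1 = S_0.
    """
    return coeff_a0 * s0 + n0


s0 = 60.0
a1_plain = second_threshold(s0)                         # A_1 = S_0
a1_offset = second_threshold(s0, n0=5.0)                # A_1 = S_0 + N_0
a1_scaled = second_threshold(s0, coeff_a0=1.5)          # A_1 = a_0 * S_0
a1_both = second_threshold(s0, coeff_a0=1.5, n0=5.0)    # A_1 = a_0 * S_0 + N_0
```

A larger `n0` or `coeff_a0` places the new threshold further above the noise estimate, which corresponds to the faster rise described above.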
Fig. 3 is a flowchart of Embodiment 2 of the voice wake-up method of the present invention. As shown in Fig. 3, the method may include:
S301: Periodically sample an audio signal, wherein a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer.
S302: Calculate the audio energy T_i of the sampled signal y_i.
S303: Perform VAD when the audio energy T_i is less than the first threshold A_0 at time t_i and, at each time from t_{i-m} to t_i, the difference between the respective first threshold A_0 and the second noise energy F_0 is greater than a preset second threshold value M_1, where m is a positive integer less than i.
For example, if m = 2, VAD is performed when the audio energy T_i is less than the first threshold A_0 at time t_i and the difference between the first threshold A_0 and the second noise energy F_0 is greater than the second threshold value M_1 at each of times t_{i-2}, t_{i-1}, and t_i.
S304: When the VAD succeeds, generate a third threshold A_2 according to the second noise energy F_0 and use the third threshold A_2 as the first threshold A_0 at time t_{i+1}, wherein the second noise energy F_0 is obtained by decimating the sampled signal y_i at a second decimation rate 1/z and applying fast tracking filtering to the decimated sample points yf, and z is a natural number greater than x.
For details of S301 and S302, refer to the embodiment shown in Fig. 1; they are not repeated here.
For S303: when the audio energy T_i is less than the first threshold A_0 at time t_i, prior-art voice wake-up schemes do not perform VAD, so the user's voice may be missed. For example, the first threshold A_0 at time t_i suits a noisy environment, but the terminal is now in a relatively quiet environment (for example, one with low ambient noise), so the user's voice in the sampled signal y_i is missed. Through S303 and S304, the embodiments of the present invention change the first threshold A_0 at time t_{i+1} so that it matches the current environment.
When the difference between the first threshold A0 and the second noise energy F0 is greater than the preset second threshold value M1 at every moment from t_(i-m) through t_i, that is, when this condition has held m+1 times in a row, the terminal is currently in a quiet environment (one with low background noise). The current first threshold A0 is then too large and needs to be lowered to match the quiet environment. The second threshold value M1 is a preset parameter that can be obtained through debugging.
Regarding S304: a successful VAD indicates that the sampled signal y_i contains the user's voice. To avoid missing the user's voice, the third threshold A2 is generated according to the second noise energy F0 and used as the first threshold A0 at time t_(i+1). Because the second noise energy F0 is obtained by extracting from the sampled signal y_i at the second extraction rate 1/z and applying fast tracking filtering to the extracted sampling points yf, it reflects, to some extent, the energy level of the transient noise in the terminal's environment.
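The decision flow of S303 and S304 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the class name, the `run_vad` callback and the default values of a1 and N1 are hypothetical, and the second noise energy F0 is assumed to be supplied by a fast tracking filter that is not modelled here.

```python
from collections import deque


class FastTrackLowering:
    """Hypothetical sketch of S303/S304: run VAD below the threshold, then lower A0."""

    def __init__(self, a0, m1, m, a1=1.0, n1=0.0):
        self.a0 = a0  # first threshold A0 for the current moment
        self.m1 = m1  # preset second threshold value M1
        self.a1 = a1  # second coefficient a1 (default is an assumption)
        self.n1 = n1  # second correction amount N1 (default is an assumption)
        # Results of "A0 - F0 > M1" for the moments t_(i-m) .. t_i.
        self.gaps = deque(maxlen=m + 1)

    def step(self, energy, f0, run_vad):
        """Process one moment: energy is T_i, f0 is F0, run_vad returns True on success."""
        self.gaps.append(self.a0 - f0 > self.m1)
        if energy >= self.a0:
            return "above-threshold"  # handled by the Fig. 1 flow, not sketched here
        if len(self.gaps) < self.gaps.maxlen or not all(self.gaps):
            return "idle"  # condition has not yet held at all m+1 moments
        if run_vad():
            # Third threshold A2 becomes the first threshold A0 at t_(i+1).
            self.a0 = self.a1 * f0 + self.n1
            return "lowered"
        return "vad-failed"


ctl = FastTrackLowering(a0=100.0, m1=20.0, m=2, n1=5.0)
results = [ctl.step(energy=50.0, f0=30.0, run_vad=lambda: True) for _ in range(3)]
```

With m = 2, the first two calls only accumulate history; the third call sees the condition satisfied at three moments, runs VAD, and lowers the threshold.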
In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained at time t_i is calculated. VAD is performed when the audio energy T_i is less than the first threshold A0 at time t_i and the difference between the first threshold A0 and the second noise energy F0 is greater than the preset second threshold value M1 at every moment from t_(i-m) through t_i. When VAD succeeds, the third threshold A2 is generated according to the second noise energy F0 and used as the first threshold A0 at time t_(i+1). Because the second noise energy F0 is obtained by extracting from the sampled signal y_i at the second extraction rate 1/z and applying fast tracking filtering to the extracted sampling points yf, the first threshold A0 at time t_(i+1) is derived from the second noise energy F0 at time t_i. The terminal can therefore adjust the first threshold A0 of the next moment according to the current background noise, so that the first threshold A0 at each moment matches the environment. This reduces the number of VAD runs, and hence the terminal's power consumption in noisy environments, while also helping to avoid missing the user's voice in the sampled signal y_i.
In the above embodiment, generating the third threshold A2 according to the second noise energy F0 may specifically include: using the second noise energy F0 as the third threshold A2; or using the sum of the second noise energy F0 and a preset second correction amount N1 as the third threshold A2, i.e. A2 = F0 + N1; or using the product of the second noise energy F0 and a preset second coefficient a1 as the third threshold A2, i.e. A2 = a1 × F0.
If the second correction amount N1 is large, the third threshold A2 is raised quickly relative to the second noise energy F0; if N1 is small, A2 is raised slowly. The rate of increase can be set according to actual needs, and the value of N1 can be set according to the actual scenario; the embodiment of the present invention does not limit it. Likewise, if the second coefficient a1 is large, the third threshold A2 is raised quickly relative to the second noise energy F0; if a1 is small, A2 is raised slowly. The value of a1 can also be set according to the actual scenario, and the embodiment of the present invention does not limit it.
Optionally, the product of the second noise energy F0 and the preset second coefficient a1, plus the preset second correction amount N1, may be used as the third threshold A2, i.e. A2 = a1 × F0 + N1.
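The variants for generating the third threshold are all special cases of A2 = a1 × F0 + N1, with a1 = 1 and/or N1 = 0. The helper below illustrates this; the function name and default arguments are hypothetical and the numeric values are arbitrary.

```python
def third_threshold(f0, a1=1.0, n1=0.0):
    """Return A2 = a1 * F0 + N1; the defaults reduce it to A2 = F0."""
    return a1 * f0 + n1


plain = third_threshold(40.0)                       # A2 = F0
with_offset = third_threshold(40.0, n1=6.0)         # A2 = F0 + N1
with_gain = third_threshold(40.0, a1=1.5)           # A2 = a1 * F0
with_both = third_threshold(40.0, a1=1.5, n1=6.0)   # A2 = a1 * F0 + N1
```

Treating the forms as one parameterised formula means a single pair of debug-tuned values (a1, N1) selects the behaviour.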
Fig. 4 is a flowchart of embodiment three of the voice wake-up method of the present invention. As shown in Fig. 4, the method may include:
S401: Periodically sample the audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer.
S402: Calculate the audio energy T_i of the sampled signal y_i.
S403: When the audio energy T_i is less than the first threshold A0 at time t_i, and the difference between the first threshold A0 at time t_i and the first noise energy S0 is greater than a preset third threshold value M2, generate a fourth threshold A3 according to the first noise energy S0, and use the fourth threshold A3 as the first threshold A0 at time t_(i+1).
For a description of S401 and S402, refer to the embodiment shown in Fig. 1; details are not repeated here.
Regarding S403: in prior-art voice wake-up schemes, VAD is no longer performed when the audio energy T_i is less than the first threshold A0 at time t_i, so the user's voice may go undetected. For example, the first threshold A0 at time t_i may suit a noisy environment while the terminal is actually in a relatively quiet environment, so the user's voice in the sampled signal y_i is missed. Through S403, the embodiment of the present invention changes the first threshold A0 at time t_(i+1) so that it matches the current environment.
When the difference between the first threshold A0 at time t_i and the first noise energy S0 is greater than the preset third threshold value M2, that is, when the first threshold A0 at time t_i is considerably larger than the first noise energy S0, the terminal is currently in a relatively quiet environment. The first threshold A0 at time t_i is too large and needs to be lowered to match the environment. The third threshold value M2 is a preset parameter that can be obtained through debugging.
Because the first noise energy S0 is obtained by extracting from the sampled signal y_i at the first extraction rate 1/x and applying slow tracking filtering to the extracted sampling points ys, the first noise energy S0 reflects the steady-state energy of the environment. Therefore, unlike S303, S403 does not need to check at multiple moments whether the difference between the first threshold A0 and the first noise energy S0 is greater than the preset third threshold value M2. When the difference between the first threshold A0 at time t_i and the first noise energy S0 is greater than the preset third threshold value M2, this may indicate that the sampled signal y_i contains the user's voice. To avoid missing the user's voice, the fourth threshold A3 is generated according to the first noise energy S0 and used as the first threshold A0 at time t_(i+1).
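Because S403 looks at a single moment only, it can be sketched as a pure function. The sketch below is illustrative: the function name, the argument names and the default values of a2 and N2 are hypothetical, and the first noise energy S0 is assumed to come from a slow tracking filter that is not modelled here.

```python
def next_first_threshold(energy, a0, s0, m2, a2=1.0, n2=0.0):
    """Return the first threshold A0 for t_(i+1) according to S403.

    energy is T_i, a0 is A0 at t_i, s0 is S0, m2 is the third threshold value M2.
    """
    if energy < a0 and (a0 - s0) > m2:
        # Fourth threshold A3, written in the general form a2 * S0 + N2.
        return a2 * s0 + n2
    return a0  # otherwise the threshold is left unchanged


lowered = next_first_threshold(energy=50.0, a0=100.0, s0=40.0, m2=30.0, n2=10.0)
kept_small_gap = next_first_threshold(energy=50.0, a0=100.0, s0=80.0, m2=30.0)
kept_loud = next_first_threshold(energy=120.0, a0=100.0, s0=40.0, m2=30.0)
```

Only the first call satisfies both conditions, so only there does the threshold drop towards the slow noise estimate.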
In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained at time t_i is calculated. When the audio energy T_i is less than the first threshold A0 at time t_i, and the difference between the first threshold A0 at time t_i and the first noise energy S0 is greater than the preset third threshold value M2, the fourth threshold A3 is generated according to the first noise energy S0 and used as the first threshold A0 at time t_(i+1). Because the first noise energy S0 is obtained by extracting from the sampled signal y_i at the first extraction rate 1/x and applying slow tracking filtering to the extracted sampling points ys, the first threshold A0 at time t_(i+1) is derived from the first noise energy S0 at time t_i. The terminal can therefore adjust the first threshold A0 of the next moment according to the current background noise, so that the first threshold A0 at each moment matches the environment. This reduces the number of VAD runs, and hence the terminal's power consumption in noisy environments, while also helping to avoid missing the user's voice in the sampled signal y_i.
Based on the above embodiment, generating the fourth threshold A3 according to the first noise energy S0 may include: using the first noise energy S0 as the fourth threshold A3; or using the sum of the first noise energy S0 and a preset third correction amount N2 as the fourth threshold A3, i.e. A3 = S0 + N2; or using the product of the first noise energy S0 and a preset third coefficient a2 as the fourth threshold A3, i.e. A3 = a2 × S0.
If the third correction amount N2 is large, the fourth threshold A3 is raised quickly relative to the first noise energy S0; if N2 is small, A3 is raised slowly. The rate of increase can be set according to actual needs, and the value of N2 can be set according to the actual scenario; the embodiment of the present invention does not limit it. Likewise, if the third coefficient a2 is large, the fourth threshold A3 is raised quickly relative to the first noise energy S0; if a2 is small, A3 is raised slowly. The value of a2 can also be set according to the actual scenario, and the embodiment of the present invention does not limit it.
Optionally, the product of the first noise energy S0 and the preset third coefficient a2, plus the preset third correction amount N2, may be used as the fourth threshold A3, i.e. A3 = a2 × S0 + N2.
As a supplementary note, the second correction amount N1 and the third correction amount N2 respectively reflect, under different conditions, the amount by which the first threshold A0 is raised relative to the noise energy: the first threshold A0 exceeds the second noise energy F0 by the second correction amount N1, and exceeds the first noise energy S0 by the third correction amount N2. In addition, because the first noise energy S0 comes from slow tracking filtering and the second noise energy F0 comes from fast tracking filtering, the third correction amount N2 is optionally greater than the second correction amount N1, so that the threshold matches the environment quickly.
Further, the embodiment of the present invention may also record the events in which the first threshold changes. An event that increases the first threshold can be recorded as a raise-threshold moment; an event that decreases the first threshold can be recorded as a lower-threshold moment.
Specifically, before the third threshold A2 is used as the first threshold A0 at time t_(i+1), the method may further include: recording time t_i as a lower-threshold moment; when the time interval between t_i and the previous lower-threshold moment is greater than a preset value T_time, performing the step of using the third threshold A2 as the first threshold A0 at time t_(i+1); otherwise, not performing that step.
Similarly, before the fourth threshold A3 is used as the first threshold A0 at time t_(i+1), the method may further include: recording time t_i as a lower-threshold moment; when the time interval between t_i and the previous lower-threshold moment is greater than the preset value T_time, performing the step of using the fourth threshold A3 as the first threshold A0 at time t_(i+1); otherwise, not performing that step.
Both of these implementations prevent ping-pong switching of the first threshold A0 without affecting the reliability of speech detection, reducing the probability of missed voice detection.
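The interval check that suppresses ping-pong switching can be sketched as a small guard object. The class and method names are hypothetical. Two details are assumptions, because the text does not settle them: the very first lowering is always allowed, and only lowerings that are actually applied are remembered as the previous lower-threshold moment.

```python
class LoweringGuard:
    """Hypothetical guard: allow a lowering only if T_time has elapsed since the last one."""

    def __init__(self, t_time):
        self.t_time = t_time        # preset value T_time, same unit as the timestamps
        self.last_lowering = None   # previous lower-threshold moment, if any

    def allow(self, now):
        """Return True if A0 may be lowered at moment `now`."""
        if self.last_lowering is not None and now - self.last_lowering <= self.t_time:
            return False            # too soon after the previous lowering: skip the step
        self.last_lowering = now    # assumption: only applied lowerings are recorded
        return True


guard = LoweringGuard(t_time=10)
decisions = [guard.allow(t) for t in (0, 5, 10, 11, 15)]
```

The interval must be strictly greater than T_time, so a request exactly T_time after the previous lowering is still rejected.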
The embodiment of the present invention continuously monitors and tracks the environmental background noise, adaptively adjusts the first threshold A0 according to the level of the background noise, and raises or lowers the first threshold A0 gradually, so as to reduce the probability of missed voice detection. In addition, dynamic adjustment of the first threshold A0 brings the power consumption in quiet and noisy environments close together, which can improve user experience and product competitiveness.
Fig. 5 is a schematic structural diagram of embodiment one of the voice wake-up device of the present invention. The voice wake-up device may be implemented in hardware and may be integrated in terminals such as tablet computers, smartphones and PDAs. As shown in Fig. 5, the voice wake-up device 10 includes: an SRC 11, a computing circuit 12, a threshold decision circuit 13, a first extractor 14, a slow tracking filter (STF) 15, a comparator 16, a configurator 17 and an interrupt processing circuit 18.
The SRC 11 is configured to periodically sample the audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer. The computing circuit 12 is configured to calculate the audio energy T_i of the sampled signal y_i. The threshold decision circuit 13 is configured to determine whether the audio energy T_i is greater than or equal to the first threshold A0 at time t_i and, if so, to trigger the interrupt processing circuit 18 to output an interrupt pulse signal to the interrupt control circuit 20, which enables the DSP or processor 30 to perform VAD. The input of the first extractor 14 is coupled to the output of the SRC 11; the first extractor 14 is configured to extract from the sampled signal y_i at the first extraction rate 1/x to obtain and output sampling points ys, where x is a natural number greater than 1. The input of the STF 15 is coupled to the output of the first extractor 14; the STF 15 is configured to apply slow tracking filtering to the extracted sampling points ys to obtain the first noise energy S0. The input of the comparator 16 is coupled to the output of the STF 15 and to the threshold decision circuit 13; the comparator 16 is configured to compare whether the difference between the first noise energy S0 and the first threshold A0 at time t_i is greater than the preset first threshold value M0. The configurator 17 is configured to, when VAD fails, n consecutive detections before time t_i have also failed, and the difference between the first noise energy S0 and the first threshold A0 at time t_i is greater than the preset first threshold value M0, generate the second threshold A1 according to the first noise energy S0 and deliver it to the threshold decision circuit 13 as the first threshold A0 at time t_(i+1), where n is a positive integer and n is less than i.
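The interplay of the threshold decision circuit, comparator and configurator can be modelled in software as below. This is a behavioural sketch only, not a description of the hardware. The class name, the `run_vad` callback and the default coefficient values are hypothetical. It is also assumed that the difference is taken as S0 minus A0, that moments below the threshold leave the failure count untouched, and that the count restarts after the threshold is raised.

```python
class RaiseOnVadFailures:
    """Hypothetical model: raise A0 after repeated VAD failures in sustained noise."""

    def __init__(self, a0, m0, n, coeff=1.0, n0=0.0):
        self.a0 = a0        # first threshold A0 for the current moment
        self.m0 = m0        # preset first threshold value M0
        self.n = n          # required number of earlier consecutive failures
        self.coeff = coeff  # first coefficient a0 (default is an assumption)
        self.n0 = n0        # first correction amount N0 (default is an assumption)
        self.failures = 0   # consecutive VAD failures seen so far

    def step(self, energy, s0, run_vad):
        """Process one moment: energy is T_i, s0 is S0, run_vad returns True on success."""
        if energy < self.a0:
            return "idle"                 # threshold decision circuit does not trigger VAD
        if run_vad():
            self.failures = 0
            return "voice"
        had_n_before = self.failures >= self.n
        self.failures += 1
        if had_n_before and (s0 - self.a0) > self.m0:
            # Second threshold A1 becomes the first threshold A0 at t_(i+1).
            self.a0 = self.coeff * s0 + self.n0
            self.failures = 0             # assumption: counting restarts after a raise
            return "raised"
        return "vad-failed"


ctl = RaiseOnVadFailures(a0=50.0, m0=10.0, n=2, n0=5.0)
results = [ctl.step(energy=60.0, s0=70.0, run_vad=lambda: False) for _ in range(3)]
after_raise = ctl.step(energy=60.0, s0=70.0, run_vad=lambda: False)
```

Once the threshold has moved above the noise level, the same input energy no longer triggers VAD, which is where the power saving in a noisy environment comes from.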
Referring to Fig. 5, the configurator 17 configures parameters for the voice wake-up device 10, such as the first threshold A0 described above. Those skilled in the art will understand that the configurator 17 receives configuration parameters from the terminal and converts them into control signals for the corresponding logic modules in the voice wake-up device 10, where the logic modules include the computing circuit 12, the threshold decision circuit 13, the interrupt processing circuit 18 and so on. The SRC 11 may sample the audio signal by down-sampling, for example converting 32 kilohertz (kHz) data to 16 kHz.
The sampled signal y_i flows through Fig. 5 as follows:
SRC 11 -> computing circuit 12 -> threshold decision circuit 13 -> interrupt processing circuit 18 (optional) -> interrupt control circuit 20 (optional) -> DSP or processor 30 (optional).
When the audio energy T_i is greater than or equal to the first threshold A0 at time t_i, the flow of the sampled signal y_i includes the optional parts above; when the audio energy T_i is less than the first threshold A0 at time t_i, it does not.
The first extractor 14, the STF 15 and the comparator 16 do not affect normal voice wake-up; they only work together with the configurator 17 to change the first threshold A0 used in voice wake-up.
The device of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effects are similar and are not repeated here.
In the above embodiment, the configurator 17 may be specifically configured to: use the first noise energy S0 as the second threshold A1; or use the sum of the first noise energy S0 and the preset first correction amount N0 as the second threshold A1, i.e. A1 = S0 + N0; or use the product of the first noise energy S0 and the preset first coefficient a0 as the second threshold A1, i.e. A1 = a0 × S0; and so on. The embodiment of the present invention is not limited in this respect.
Fig. 6 is a schematic structural diagram of embodiment two of the voice wake-up device of the present invention. The voice wake-up device may be implemented in hardware and may be integrated in terminals such as tablet computers, smartphones and PDAs. As shown in Fig. 6, the voice wake-up device 100 includes: an SRC 110, a computing circuit 120, a threshold decision circuit 130, a second extractor 140, a fast tracking filter (FTF) 150, a comparator 160, a configurator 170 and an interrupt processing circuit 180.
The SRC 110 is configured to periodically sample the audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer. The computing circuit 120 is configured to calculate the audio energy T_i of the sampled signal y_i. The threshold decision circuit 130 is configured to determine whether the audio energy T_i is greater than or equal to the first threshold A0 at time t_i. The input of the second extractor 140 is coupled to the output of the SRC 110; the second extractor 140 is configured to extract from the sampled signal y_i at the second extraction rate 1/z to obtain sampling points yf, where z is a natural number greater than x. The input of the FTF 150 is coupled to the output of the second extractor 140; the FTF 150 is configured to apply fast tracking filtering to the extracted sampling points yf to obtain the second noise energy F0. The input of the comparator 160 is coupled to the output of the FTF 150; the comparator 160 is configured to, when the audio energy T_i is less than the first threshold A0 at time t_i, compare whether the difference between the first threshold at each moment and the second noise energy F0 is greater than the preset second threshold value M1 and, when that difference is greater than the preset second threshold value M1 at every moment from t_(i-m) through t_i, trigger the interrupt processing circuit 180 to output an interrupt pulse signal to the interrupt control circuit 200, which enables the DSP or processor 300 to perform VAD, where m is a positive integer and m is less than i. The configurator 170 is configured to, when VAD succeeds, generate the third threshold A2 according to the second noise energy F0 and deliver it to the threshold decision circuit 130 as the first threshold A0 at time t_(i+1).
The device of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 3; its implementation principle and technical effects are similar and are not repeated here.
On the basis of the above embodiment, the configurator 170 may be specifically configured to: use the second noise energy F0 as the third threshold A2; or use the sum of the second noise energy F0 and the preset second correction amount N1 as the third threshold A2, i.e. A2 = F0 + N1; or use the product of the second noise energy F0 and the preset second coefficient a1 as the third threshold A2, i.e. A2 = a1 × F0; and so on. The embodiment of the present invention is not limited in this respect.
Optionally, the configurator 170 may be further configured to: record time t_i as a lower-threshold moment; when the time interval between t_i and the previous lower-threshold moment is greater than the preset value T_time, perform the step of using the third threshold A2 as the first threshold A0 at time t_(i+1); otherwise, not perform that step. This prevents ping-pong switching of the first threshold A0 without affecting the reliability of speech detection, reducing the probability of missed voice detection.
Referring to Fig. 5, the configurator 17 may be further configured to: when the audio energy T_i is less than the first threshold A0 at time t_i, and the difference between the first threshold A0 at time t_i and the first noise energy S0 is greater than the preset third threshold value M2, generate the fourth threshold A3 according to the first noise energy S0 and use the fourth threshold A3 as the first threshold A0 at time t_(i+1).
In this case, the device of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 4; its implementation principle and technical effects are similar and are not repeated here.
Further, the configurator 17 may be specifically configured to: use the first noise energy S0 as the fourth threshold A3; or use the sum of the first noise energy S0 and the preset third correction amount N2 as the fourth threshold A3, i.e. A3 = S0 + N2; or use the product of the first noise energy S0 and the preset third coefficient a2 as the fourth threshold A3, i.e. A3 = a2 × S0; and so on. The embodiment of the present invention is not limited in this respect.
Further, the configurator 17 may also be configured to: record time t_i as a lower-threshold moment; when the time interval between t_i and the previous lower-threshold moment is greater than the preset value T_time, perform the step of using the fourth threshold A3 as the first threshold A0 at time t_(i+1); otherwise, not perform that step. This prevents ping-pong switching of the first threshold A0 without affecting the reliability of speech detection, reducing the probability of missed voice detection.
Referring to Fig. 5 and Fig. 6, the first extractor 14 and the second extractor 140 perform long-period and short-period data extraction, respectively. The STF 15 is a slowly converging filter used to stably track changes in the environmental noise. The FTF 150 is a rapidly converging filter used to quickly track changes in the environmental noise. The STF 15 and the FTF 150 both track the energy of the current calculation window and use a structure similar to that of the computing circuit 12 or the computing circuit 120. They differ in filter order and parameters, which are set according to actual debugging. The FTF 150 performs short-period filtering, that is, recent data changes quickly affect the filter output. The STF 15 performs long-period filtering, that is, recent data changes affect the filter output only slightly and slowly.
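The contrast between slow and fast tracking can be illustrated with a simple software model. The text does not specify the filter structure, so the sketch below assumes a first-order (one-pole) smoother applied to squared samples; the function name, the smoothing factors and the example values of x and z are all hypothetical.

```python
def track_energy(samples, step, alpha):
    """Keep every `step`-th sample and smooth its squared value with a one-pole filter.

    A small alpha converges slowly (STF-like); a large alpha converges quickly (FTF-like).
    """
    level = 0.0
    for sample in samples[::step]:              # extraction at rate 1/step
        level += alpha * (sample * sample - level)
    return level


# A quiet stretch followed by a short loud burst.
signal = [0.0] * 40 + [1.0] * 8

slow = track_energy(signal, step=2, alpha=0.05)  # assumed x = 2, slow tracking
fast = track_energy(signal, step=4, alpha=0.5)   # assumed z = 4 (z greater than x), fast tracking
```

After the burst, the fast estimate has moved most of the way to the new level while the slow estimate has barely changed, which matches the roles of transient and steady-state noise tracking described above.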
Optionally, combining Fig. 5 with Fig. 6 yields the structure shown in Fig. 7. Fig. 7 is a schematic structural diagram of embodiment three of the voice wake-up device of the present invention. As shown in Fig. 7, the voice wake-up device 1000 includes: the SRC 11, the computing circuit 12, the threshold decision circuit 13, the first extractor 14, the second extractor 140, the STF 15, the FTF 150, the comparator 16, the configurator 17 and the interrupt processing circuit 18.
Here, the threshold decision circuit 13 also provides the functions of the threshold decision circuit 130; the comparator 16 also provides the functions of the comparator 160; the configurator 17 also provides the functions of the configurator 170; and the interrupt processing circuit 18 also provides the functions of the interrupt processing circuit 180. The specific principles are as described in the above embodiments and are not repeated here.
The embodiment of the present invention continuously monitors and tracks the environmental background noise, adaptively adjusts the first threshold A0 according to the level of the background noise, and raises or lowers the first threshold A0 gradually, so as to reduce the probability of missed voice detection. In addition, dynamic adjustment of the first threshold A0 brings the power consumption in quiet and noisy environments close together, which can improve user experience and product competitiveness.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units or modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be electrical, mechanical or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.