CN105261368B

CN105261368B - Voice wake-up method and device

Info

Publication number: CN105261368B
Application number: CN201510549435.6A
Authority: CN
Inventors: Ma Tao (马涛)
Original assignee: Huawei Technologies Co Ltd
Current assignee: Guangdong Gaohang Intellectual Property Operation Co ltd; Nanjing Advanced Biomaterials And Process Equipment Research Institute Co ltd
Priority date: 2015-08-31
Filing date: 2015-08-31
Publication date: 2019-05-21
Anticipated expiration: 2035-08-31
Also published as: CN105261368A

Abstract

Embodiments of the present invention provide a voice wake-up method and device. The method comprises the following steps:
- periodically sampling an audio signal, where a sampled signal is obtained at time t_i;
- calculating the audio power of the sampled signal;
- when the audio power is greater than or equal to the first threshold at time t_i, waking up a DSP to perform voice activity detection (VAD);
- when VAD detection fails, detection has also failed n consecutive times before time t_i, and the difference between a first noise energy and the first threshold at time t_i is greater than a preset first limit value, generating a second threshold from the first noise energy and using it as the first threshold at time t_{i+1}.

The first noise energy is obtained by decimating the sampled signal at a first decimation rate 1/x and applying slow tracking filtering to the decimated sample points. Embodiments of the present invention reduce the number of VAD runs, which reduces terminal power consumption in noisy environments.

Description

Voice wake-up method and device

Technical field

Embodiments of the present invention relate to voice wake-up technology, and in particular to a voice wake-up method and device.

Background art

With the development of technology, terminals commonly provide a voice wake-up function. A user wakes the terminal by voice and then controls it with voice commands.

Current voice wake-up schemes wake the terminal through two-stage cooperation between a microphone activity detection (MAD) circuit and a digital signal processor (DSP). If the energy of the current audio signal detected by the MAD circuit is greater than a preset threshold, the DSP is woken up to perform voice activity detection (VAD). VAD determines whether the audio signal is the user's voice. If it is, the terminal is woken up; if it is not, the DSP wake-up was a false wake-up. Specifically, VAD makes this decision by comparing features of the audio signal with features of the user's voice.

With this scheme, the preset threshold is fixed. When the terminal moves between environments, for example from a quiet environment to a noisy one, missed wake-ups or false wake-ups therefore occur often, and the terminal's power consumption in noisy environments is high.

Summary of the invention

Embodiments of the present invention provide a voice wake-up method and device to reduce the power consumption of a terminal in noisy environments.

According to a first aspect, an embodiment of the present invention provides a voice wake-up method, comprising:

periodically sampling an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer;

calculating the audio power T_i of the sampled signal y_i;

when the audio power T_i is greater than or equal to the first threshold A₀ at time t_i, performing voice activity detection (VAD);

when all three of the following hold, generating a second threshold A₁ according to the first noise energy S₀ and using A₁ as the first threshold A₀ at time t_{i+1}:
- the VAD detection fails;
- detection has also failed n consecutive times before time t_i;
- the difference between the first noise energy S₀ and the first threshold A₀ at time t_i (S₀ − A₀) is greater than a preset first limit value M₀.

The first noise energy S₀ is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys. Here x is a natural number greater than 1, n is a positive integer, and n is less than i.

With reference to the first aspect, in a first possible implementation of the first aspect, generating the second threshold A₁ according to the first noise energy S₀ comprises:

using the first noise energy S₀ as the second threshold A₁;

or using the sum of the first noise energy S₀ and a preset first correction amount N₀ as the second threshold A₁;

or using the product of the first noise energy S₀ and a preset first coefficient a₀ as the second threshold A₁.

With reference to the first aspect, in a second possible implementation of the first aspect, after the audio power T_i of the sampled signal y_i is calculated, the method further comprises:

when the audio power T_i is less than the first threshold A₀ at time t_i, and at every moment from t_{i-m} through t_i the difference between the respective first threshold A₀ and a second noise energy F₀ (A₀ − F₀) is greater than a preset second limit value M₁, performing VAD, where m is a positive integer and m is less than i;

when the VAD detection succeeds, generating a third threshold A₂ according to the second noise energy F₀ and using A₂ as the first threshold A₀ at time t_{i+1}. The second noise energy F₀ is obtained by decimating the sampled signal y_i at a second decimation rate 1/z and applying fast tracking filtering to the decimated sample points yf, where z is a natural number greater than x.

With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, generating the third threshold A₂ according to the second noise energy F₀ comprises:

using the second noise energy F₀ as the third threshold A₂;

or using the sum of the second noise energy F₀ and a preset second correction amount N₁ as the third threshold A₂;

or using the product of the second noise energy F₀ and a preset second coefficient a₁ as the third threshold A₂.

With reference to the second or third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, before the third threshold A₂ is used as the first threshold A₀ at time t_{i+1}, the method further comprises:

recording time t_i as a threshold-reduction moment;

when the time interval between t_i and the previous threshold-reduction moment is greater than a preset value T_time, performing the step of using the third threshold A₂ as the first threshold A₀ at time t_{i+1}; otherwise, skipping that step.

With reference to the first aspect, in a fifth possible implementation of the first aspect, after the audio power T_i of the sampled signal y_i is calculated, the method further comprises:

when the audio power T_i is less than the first threshold A₀ at time t_i, and the difference between the first threshold A₀ at time t_i and the first noise energy S₀ (A₀ − S₀) is greater than a preset third limit value M₂, generating a fourth threshold A₃ according to S₀ and using A₃ as the first threshold A₀ at time t_{i+1}.

With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, generating the fourth threshold A₃ according to the first noise energy S₀ comprises:

using the first noise energy S₀ as the fourth threshold A₃;

or using the sum of the first noise energy S₀ and a preset third correction amount N₂ as the fourth threshold A₃;

or using the product of the first noise energy S₀ and a preset third coefficient a₂ as the fourth threshold A₃.

With reference to the fifth or sixth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, before the fourth threshold A₃ is used as the first threshold A₀ at time t_{i+1}, the method further comprises:

recording time t_i as a threshold-reduction moment;

when the time interval between t_i and the previous threshold-reduction moment is greater than the preset value T_time, performing the step of using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}; otherwise, skipping that step.
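The threshold-reduction path of the fifth implementation, combined with the T_time spacing rule of the fourth and seventh implementations, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the claimed circuit:
- The class and parameter names are hypothetical.
- The three A₃ options are folded into one expression, coef × S₀ + offset. Setting coef = 1 and offset = 0 gives A₃ = S₀.
- The previous threshold-reduction moment is assumed to be updated only when a reduction is actually applied.

```python
from typing import Optional


class ThresholdReducer:
    """Hypothetical sketch: lower the first threshold A0 toward the slow noise S0.

    Assumption: the previous threshold-reduction moment is updated only when
    a reduction is actually applied.
    """

    def __init__(self, m2: float, t_time: float, coef: float = 1.0, offset: float = 0.0):
        self.m2 = m2          # third limit value M2
        self.t_time = t_time  # minimum spacing T_time between two reductions
        self.coef = coef      # third coefficient a2 (1.0 = use S0 unscaled)
        self.offset = offset  # third correction amount N2 (0.0 = no offset)
        self.last_reduction: Optional[float] = None

    def next_threshold(self, t_i: float, power: float, a0: float, s0: float) -> float:
        """Return the first threshold A0 to use at t_{i+1}."""
        if power < a0 and (a0 - s0) > self.m2:
            a3 = self.coef * s0 + self.offset  # fourth threshold A3
            if self.last_reduction is None or (t_i - self.last_reduction) > self.t_time:
                self.last_reduction = t_i  # record t_i as a reduction moment
                return a3
        return a0  # conditions not met: keep the current threshold
```

For example, take M₂ = 10 and T_time = 5:
- next_threshold(0, 40, 60, 45) lowers the threshold from 60 to 45.
- The same call at t = 3 keeps 60, because less than T_time has elapsed.
- The same call at t = 6 lowers it again.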

According to a second aspect, an embodiment of the present invention provides a voice wake-up apparatus, comprising:

a sample rate converter (SRC), configured to periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer;

a computing circuit, configured to calculate the audio power T_i of the sampled signal y_i;

a threshold decision circuit, configured to determine whether the audio power T_i is greater than or equal to the first threshold A₀ at time t_i. If it is, the circuit triggers an interrupt processing circuit to output an interrupt pulse signal to an interrupt control circuit, so that the interrupt control circuit enables a digital signal processor (DSP) or a processor to perform voice activity detection (VAD);

a first decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a first decimation rate 1/x to obtain sample points ys, where x is a natural number greater than 1;

a slow tracking filter (STF), whose input is coupled to the output of the first decimator, configured to apply slow tracking filtering to the decimated sample points ys to obtain a first noise energy S₀;

a comparator, whose inputs are coupled to the output of the STF and to the threshold decision circuit, configured to determine whether the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than a preset first limit value M₀;

a configurator, configured to generate a second threshold A₁ according to the first noise energy S₀ and deliver A₁ to the threshold decision circuit as the first threshold A₀ at time t_{i+1}, when all three of the following hold:
- the VAD detection fails;
- detection has also failed n consecutive times before time t_i;
- the difference between S₀ and the first threshold A₀ at time t_i is greater than the preset first limit value M₀.

Here n is a positive integer and n is less than i.

With reference to the second aspect, in a first possible implementation of the second aspect, the configurator is specifically configured to:

use the first noise energy S₀ as the second threshold A₁;

With reference to the second aspect, in a second possible implementation of the second aspect, the apparatus further comprises:

a second decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a second decimation rate 1/z to obtain sample points yf, where z is a natural number greater than x;

a fast tracking filter (FTF), whose input is coupled to the output of the second decimator, configured to apply fast tracking filtering to the decimated sample points yf to obtain a second noise energy F₀;

the comparator, which is further coupled to the output of the FTF, is also configured to do the following when the audio power T_i is less than the first threshold A₀ at time t_i:
- at each moment, determine whether the difference between the first threshold and the second noise energy F₀ is greater than a preset second limit value M₁;
- when that difference, computed with each moment's own first threshold A₀, exceeds M₁ at every moment from t_{i-m} through t_i, trigger the interrupt processing circuit to output an interrupt pulse signal to the interrupt control circuit, so that the interrupt control circuit enables the DSP or the processor to perform VAD.

Here m is a positive integer and m is less than i;

the configurator is further configured to, when the VAD detection succeeds, generate a third threshold A₂ according to the second noise energy F₀ and deliver A₂ to the threshold decision circuit as the first threshold A₀ at time t_{i+1}.

With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the configurator is specifically configured to:

use the second noise energy F₀ as the third threshold A₂;

With reference to the second or third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the configurator is further configured to:

record time t_i as a threshold-reduction moment;

With reference to the second aspect, in a fifth possible implementation of the second aspect, the configurator is further configured to:

With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the configurator is specifically configured to:

use the first noise energy S₀ as the fourth threshold A₃;

With reference to the fifth or sixth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the configurator is further configured to:

record time t_i as a threshold-reduction moment;

Embodiments of the present invention provide a voice wake-up method and device. The audio power T_i of the sampled signal y_i obtained at time t_i is calculated, and VAD is performed when T_i is greater than or equal to the first threshold A₀ at time t_i.

The first threshold is then adjusted when three conditions hold: VAD detection fails, detection has also failed n consecutive times before time t_i, and the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first limit value M₀. In that case a second threshold A₁ is generated from S₀ and used as the first threshold A₀ at time t_{i+1}.

Because S₀ is obtained by decimating y_i at the first decimation rate 1/x and slow-tracking-filtering the decimated sample points ys, the first threshold at time t_{i+1} is derived from the first noise energy at time t_i. The terminal can therefore adapt the next moment's first threshold to the current ambient noise and keep A₀ matched to the environment at every moment. This reduces the number of VAD runs and hence the terminal's power consumption in noisy environments.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of Embodiment 1 of the voice wake-up method according to the present invention;

Fig. 2 is an example diagram of the first threshold of the voice wake-up method according to the present invention in different environments;

Fig. 3 is a flowchart of Embodiment 2 of the voice wake-up method according to the present invention;

Fig. 4 is a flowchart of Embodiment 3 of the voice wake-up method according to the present invention;

Fig. 5 is a schematic structural diagram of Embodiment 1 of the voice wake-up apparatus according to the present invention;

Fig. 6 is a schematic structural diagram of Embodiment 2 of the voice wake-up apparatus according to the present invention;

Fig. 7 is a schematic structural diagram of Embodiment 3 of the voice wake-up apparatus according to the present invention.

Detailed description of embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

Voice wake-up means that, under any circumstances, the terminal can be activated by a predefined wake-up word and then execute a specific application. This is similar to a user pressing a key to light up the screen and activate the phone. The advantage of voice wake-up is that it frees the user's hands.

In the voice wake-up scheme of one smartphone, the standby power consumption is about 2.2 mA × 3.8 V in a quiet environment and 5.5 mA × 3.8 V in a noisy environment. The power difference between the noisy and quiet environments is therefore (5.5 − 2.2) mA × 3.8 V ≈ 12.5 mW.

According to the power estimation model, average power = quiet power × 70% + noisy power × 30%. Reducing power consumption in noisy environments is therefore worthwhile, and the embodiments of the present invention focus on power optimization in noisy environments.
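The figures above can be checked with a short calculation. The sketch uses only the currents, supply voltage, and 70%/30% weighting given in the text (mA × V gives mW). It shows that the noisy environment accounts for slightly more than half of the average power, even though it is weighted at only 30% of the time. That is why the embodiments target it.

```python
SUPPLY_V = 3.8       # supply voltage in volts
I_QUIET_MA = 2.2     # standby current in a quiet environment, mA
I_NOISY_MA = 5.5     # standby current in a noisy environment, mA
QUIET_SHARE = 0.7    # fraction of time in quiet environments
NOISY_SHARE = 0.3    # fraction of time in noisy environments

p_quiet_mw = I_QUIET_MA * SUPPLY_V             # about 8.36 mW
p_noisy_mw = I_NOISY_MA * SUPPLY_V             # about 20.9 mW
delta_mw = p_noisy_mw - p_quiet_mw             # about 12.5 mW
p_avg_mw = QUIET_SHARE * p_quiet_mw + NOISY_SHARE * p_noisy_mw  # about 12.1 mW
noisy_fraction = NOISY_SHARE * p_noisy_mw / p_avg_mw            # about 0.52

print(f"quiet {p_quiet_mw:.2f} mW, noisy {p_noisy_mw:.2f} mW, "
      f"difference {delta_mw:.2f} mW")
print(f"average {p_avg_mw:.2f} mW, noisy share {noisy_fraction:.0%}")
```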

Embodiments of the present invention provide a method and device for waking up a digital signal processor in a terminal by voice. They reduce the number of times the DSP is woken up to perform VAD, and thereby reduce the terminal's power consumption in noisy environments.

Fig. 1 is a flowchart of Embodiment 1 of the voice wake-up method according to the present invention. The method may be executed by a voice wake-up apparatus, which may be implemented in hardware. The apparatus may be integrated into a terminal such as a tablet computer, a smartphone, or a personal digital assistant (PDA). As shown in Fig. 1, the voice wake-up method includes the following steps.

S101: Periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer.

Similarly, the sampled signal at time t_{i-1} is denoted y_{i-1}, the sampled signal at time t_{i+1} is denoted y_{i+1}, and so on.

In any embodiment of the present invention, the audio signal may be a signal collected by a sound collection device such as a microphone. A sample rate converter (SRC) samples the collected audio signal periodically. Alternatively, the collected signal may first be filtered, for example by a band-pass filter, and then sampled periodically by the SRC. Embodiments of the present invention do not limit this.

S102: Calculate the audio power T_i of the sampled signal y_i.

Note that the audio power of a sampled signal can be calculated as soon as that signal is obtained. For example, after the sampled signal y_{i-1} is obtained at time t_{i-1}, its audio power T_{i-1} can be calculated.

A person skilled in the art will understand that, because the sampled signal y_i is known, its audio power T_i can be obtained by calculation.

Specifically, the quantities are defined as follows:
- x(j) is the amplitude of the j-th sample point of the sampled signal y_i, so x(j) × x(j) is the energy of y_i at the j-th instant;
- j is an integer from 0 to M − 1, and M is the total number of sample points;
- the coefficient a_j is the weight of each sample point;
- T_i is the audio power of y_i.

For example, the audio power can be computed in the following normalized form, in which the weights express the share of each sample point in the overall energy:

$$T_i = \sum_{j=0}^{M-1} a_j \, x(j)^2$$

where a_j is the normalized weight of the j-th sample point.

This is only one example of calculating the audio power T_i of the sampled signal y_i, and embodiments of the present invention do not limit it. T_i may also be obtained by root mean square (RMS) or a similar method, for example one without normalization.
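A minimal Python sketch of S102, assuming the frame y_i is available as a list of sample amplitudes x(j). The exact normalized weights a_j are not spelled out here, so audio_power defaults to uniform weights a_j = 1/M, which gives the mean energy of the frame. That default is an assumption, not the patent's choice. rms_power shows the RMS alternative mentioned above. Both function names are illustrative.

```python
import math
from typing import Optional, Sequence


def audio_power(frame: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Weighted energy T_i = sum_j a_j * x(j)^2 over the M samples of one frame.

    Assumption: without explicit weights, a_j = 1/M (mean energy of the frame).
    """
    m = len(frame)
    if m == 0:
        raise ValueError("empty frame")
    if weights is None:
        weights = [1.0 / m] * m
    if len(weights) != m:
        raise ValueError("one weight per sample is required")
    return sum(a * x * x for a, x in zip(weights, frame))


def rms_power(frame: Sequence[float]) -> float:
    """Alternative measure: root mean square amplitude of the frame."""
    return math.sqrt(audio_power(frame))
```

Passing explicit weights lets a caller emphasize part of the frame. Omitting them reduces T_i to the familiar mean-square value.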

S103: When the audio power T_i is greater than or equal to the first threshold A₀ at time t_i, perform VAD.

VAD may be performed by an element of the terminal such as the DSP or a processor.

S104: Generate a second threshold A₁ according to the first noise energy S₀ and use A₁ as the first threshold A₀ at time t_{i+1}, when the VAD detection fails, detection has also failed n consecutive times before time t_i, and the difference between S₀ and the first threshold A₀ at time t_i is greater than the preset first limit value M₀. S₀ is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys. Here x is a natural number greater than 1, n is a positive integer, and n is less than i.

Note that "the VAD detection fails and detection has also failed n consecutive times before time t_i" means that the VAD performed at time t_i fails, and the VAD performed at each moment from t_{i-n} to t_{i-1} also failed. For example, with n = 2, VAD failed at two consecutive moments (t_{i-2} and t_{i-1}) before the failed VAD at t_i.

To illustrate a VAD failure, suppose the current sound is a car engine. Its audio power exceeds the first threshold A₀ at the current moment, so VAD must be performed. VAD then determines that the sound is not the user's voice, so the detection fails. In general, when the terminal is in a high-noise environment, the energy of the ambient noise is relatively high. Whenever that energy exceeds the current first threshold A₀, VAD has to be started. Because ambient noise is disordered, VAD cannot detect a voice signal in it, and the detection fails. The first noise energy S₀ represents the energy level of the steady-state noise in the terminal's environment. The first limit value M₀ is a preset parameter that can be determined by tuning.
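S104 and the slow noise tracker can be sketched as below. The source does not specify the internals of the slow tracking filter, so this sketch makes two assumptions:
- Decimation is plain decimation: every x-th point is kept, with no anti-aliasing filter.
- Slow tracking is a one-pole smoother on the mean energy of the decimated points. A small smoothing factor alpha gives slow tracking, and a larger one gives fast tracking.

It also assumes the failure streak counts consecutive moments on which VAD ran and failed, so a moment without VAD, or a successful VAD, resets it. The class names are hypothetical, and A₁ = S₀ (the first option of the first implementation) is used.

```python
from typing import List, Optional, Sequence


def decimate(frame: Sequence[float], factor: int) -> List[float]:
    """Keep every `factor`-th sample, i.e. decimation rate 1/factor."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return list(frame[::factor])


class TrackingFilter:
    """Assumed tracker: one-pole smoothing of the decimated frame energy.

    A small alpha tracks slowly (STF); a larger alpha tracks quickly (FTF).
    """

    def __init__(self, factor: int, alpha: float, initial: float = 0.0):
        self.factor = factor
        self.alpha = alpha
        self.level = initial

    def update(self, frame: Sequence[float]) -> float:
        points = decimate(frame, self.factor)
        energy = sum(p * p for p in points) / len(points)
        self.level += self.alpha * (energy - self.level)
        return self.level


class ThresholdRaiser:
    """S104: raise A0 after repeated VAD failures in steady loud noise."""

    def __init__(self, n: int, m0: float):
        self.n = n             # required consecutive failures before t_i
        self.m0 = m0           # first limit value M0
        self.fail_streak = 0   # consecutive failures up to t_{i-1}

    def step(self, vad_ok: Optional[bool], a0: float, s0: float) -> float:
        """vad_ok is None if VAD did not run at t_i. Returns A0 for t_{i+1}."""
        raise_now = (vad_ok is False and self.fail_streak >= self.n
                     and (s0 - a0) > self.m0)
        self.fail_streak = self.fail_streak + 1 if vad_ok is False else 0
        return s0 if raise_now else a0  # A1 = S0
```

With n = 2, the first two failures leave A₀ unchanged. The third consecutive failure, with S₀ − A₀ above M₀, returns S₀ as the new threshold.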

Note also that in any embodiment of the present invention, "first" and "second" only distinguish items of the same kind. For example, "first" in "first threshold" and "second" in "second threshold" merely name different thresholds and do not imply any order between them.

In real application scenarios, noise levels differ from one scenario to another. Noise in a quiet environment is about 30 to 35 decibels (dB). Typical ambient noise levels in noisy environments are about:
- 60 dB in a shopping mall;
- 70 dB on a road;
- 70 dB in an aircraft cabin;
- 80 dB on a bus;
- 90 dB in a subway.

The noise level at the same place also varies over time. For example, daytime and nighttime noise at the same place may differ by 10 to 15 dB.
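Because decibels are logarithmic, the spread quoted above is larger than it looks. The short sketch below is an illustration only, not part of the described method. It converts a level difference in dB to an intensity ratio using ratio = 10^(ΔdB/10). The roughly 60 dB gap between a quiet room and a subway corresponds to a factor of about one million in acoustic intensity, and a 10 dB day/night swing to a factor of ten.

```python
def db_to_power_ratio(delta_db: float) -> float:
    """Convert a level difference in decibels to an intensity (power) ratio."""
    return 10.0 ** (delta_db / 10.0)


QUIET_DB = 30.0   # lower end of the quiet-environment range quoted above
SUBWAY_DB = 90.0  # subway noise level quoted above

spread = db_to_power_ratio(SUBWAY_DB - QUIET_DB)
print(f"{SUBWAY_DB - QUIET_DB:.0f} dB spread = {spread:,.0f}x in power")
```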

Moreover, when users make calls or talk in a noisy environment, they subconsciously raise their speaking volume. This increases the signal-to-noise ratio (SNR) and makes voice wake-up feasible.

Consequently, current voice wake-up schemes that use a single noise threshold (the preset threshold) cannot treat quiet and noisy environments differently when waking the terminal. If the preset threshold is set too high, speech is missed. If it is set too low, the processor is woken up frequently and power consumption is high.

In embodiments of the present invention, the first threshold A₀ is adjusted at appropriate moments.

Specifically, S101 to S103 calculate the audio power T_i of the sampled signal y_i obtained at time t_i and compare it with the first threshold A₀ at time t_i. When T_i is greater than or equal to A₀, the DSP, processor, or similar element performs VAD and uses the result to decide whether to wake up the terminal:
- If VAD succeeds, that is, the element performing VAD detects the user's voice in the sampled signal y_i, the terminal is woken up.
- Otherwise VAD fails, that is, no user voice is detected in y_i, and the terminal is not woken up.

In S104, when the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first limit value M₀, the terminal is probably in an environment with high background noise. In that case, a second threshold A₁ is generated according to S₀ and used as the first threshold A₀ at time t_{i+1}. S₀ is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys. Here x is a natural number greater than 1, and n is a positive integer less than i.

In practice, the sampled signal y_i may contain both the user's voice and ambient noise at time t_i, or only ambient noise. The first threshold A₀ for time t_{i+1} is obtained at time t_i. It is the first threshold the terminal uses in S103 and S104 at time t_{i+1}.

If the voice wake-up at time t_i is the first voice wake-up, the first threshold A₀ at time t_i may be preset. The preset first threshold A₀ can be regarded as a tuned parameter corresponding to one possible application scenario. For example, presetting A₀ to 50 dB can be regarded as setting the ambient-noise threshold for a quiet environment.

Fig. 2 shows example first thresholds in quiet and noisy environments. In a quiet environment, the first threshold is higher than the ambient noise by a first preset value. In a noisy environment, it is higher than the ambient noise by a second preset value. In addition, the first threshold in the noisy environment is higher than that in the quiet environment.

In addition, S103 may alternatively be any of the following:
1) performing VAD when the difference between the audio power T_i and the audio power T_{i-1} at time t_{i-1} is greater than or equal to the differential threshold A₀₀ at time t_i;
2) performing VAD when T_i is greater than or equal to the first threshold A₀ at time t_i and T_i − T_{i-1} is greater than or equal to the differential threshold A₀₀ at time t_i;
3) performing VAD when at least one of T_i ≥ A₀ and T_i − T_{i-1} ≥ A₀₀ holds.

The audio power T_{i-1} is cached in the terminal. It is obtained at time t_{i-1} by calculating the audio power of the sampled signal y_{i-1}.

The thresholds are adjusted in the same way as the first threshold A₀ is adjusted:
- In case 1), the differential threshold A₀₀ at time t_i is adjusted.
- In case 2), both A₀ and A₀₀ at time t_i are adjusted.
- In case 3), A₀ or A₀₀ at time t_i is adjusted.
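The three alternative trigger conditions for S103 can be expressed as a single predicate. In this sketch the names are hypothetical:
- prev_power is the cached audio power T_{i-1};
- a0 is the first threshold A₀;
- a00 is the differential threshold A₀₀.

The differential test reacts to a sudden rise in energy even when the absolute level stays below A₀.

```python
from enum import Enum


class TriggerMode(Enum):
    DIFFERENTIAL = 1  # 1) T_i - T_{i-1} >= A00
    BOTH = 2          # 2) T_i >= A0 and T_i - T_{i-1} >= A00
    EITHER = 3        # 3) T_i >= A0 or T_i - T_{i-1} >= A00


def should_run_vad(mode: TriggerMode, power: float, prev_power: float,
                   a0: float, a00: float) -> bool:
    """Decide whether S103 starts VAD at t_i under the chosen alternative."""
    level_ok = power >= a0                  # absolute energy test
    jump_ok = (power - prev_power) >= a00   # energy-rise (differential) test
    if mode is TriggerMode.DIFFERENTIAL:
        return jump_ok
    if mode is TriggerMode.BOTH:
        return level_ok and jump_ok
    return level_ok or jump_ok
```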

In this embodiment, the audio power T_i of the sampled signal y_i obtained at time t_i is calculated, and VAD is performed when T_i is greater than or equal to the first threshold A₀ at time t_i.

A₀ is then adjusted when three conditions hold: VAD detection fails, detection has also failed n consecutive times before time t_i, and the difference between the first noise energy S₀ and A₀ at time t_i is greater than the preset first limit value M₀. In that case a second threshold A₁ is generated from S₀ and used as the first threshold A₀ at time t_{i+1}.

Because S₀ is obtained by decimating y_i at the first decimation rate 1/x and slow-tracking-filtering the decimated sample points ys, the first threshold at time t_{i+1} is derived from the first noise energy at time t_i. The terminal can therefore adapt the next moment's first threshold to the current ambient noise and keep A₀ matched to the environment at every moment. This reduces the number of VAD runs and the terminal's power consumption in noisy environments.

In the above embodiment, generating the second threshold A₁ according to the first noise energy S₀ may use any of the following options:
- using S₀ as A₁;
- using the sum of S₀ and a preset first correction amount N₀ as A₁, that is, A₁ = S₀ + N₀;
- using the product of S₀ and a preset first coefficient a₀ as A₁, that is, A₁ = a₀ × S₀.

The larger the first correction amount N₀, the more (and the faster) the second threshold A₁ rises above the first noise energy S₀; the smaller N₀, the less it rises. The rate of increase can be set as needed. The size of N₀ can be set according to the actual scenario, and embodiments of the present invention do not limit it. Likewise, a larger first coefficient a₀ raises A₁ more above S₀, and a smaller a₀ raises it less. a₀ can also be set according to the actual scenario without limitation.

Optionally, the product of S₀ and the preset first coefficient a₀, plus the preset first correction amount N₀, may be used as A₁, i.e. A₁ = a₀ × S₀ + N₀.
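The four ways of generating A₁ listed above share one shape. The text later applies the same shape to A₂ (from F₀ with N₁ and a₁) and to A₃ (from S₀ with N₂ and a₂). A minimal sketch follows. The function name and the `mode` strings are hypothetical labels, not terms from the source.

```python
def generate_threshold(noise: float, mode: str = "copy",
                       correction: float = 0.0, coefficient: float = 1.0) -> float:
    """Derive a new first threshold from a noise-energy estimate.

    mode "copy"   -> A = noise
    mode "offset" -> A = noise + correction               (e.g. A1 = S0 + N0)
    mode "scale"  -> A = coefficient * noise              (e.g. A1 = a0 * S0)
    mode "affine" -> A = coefficient * noise + correction (e.g. A1 = a0 * S0 + N0)
    """
    if mode == "copy":
        return noise
    if mode == "offset":
        return noise + correction
    if mode == "scale":
        return coefficient * noise
    if mode == "affine":
        return coefficient * noise + correction
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with S₀ = 10, N₀ = 3 and a₀ = 1.5, the four options give 10, 13, 15 and 18 respectively.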

Fig. 3 is a flowchart of embodiment two of the voice wake-up method of the present invention. As shown in Fig. 3, the method may include:

S301: Periodically sample the audio signal, where the sampled signal y_i is obtained at moment t_i and i is a positive integer.

S302: Calculate the audio energy T_i of the sampled signal y_i.

S303: Perform VAD when both of the following hold: T_i is less than the first threshold A₀ at moment t_i, and at every moment from t_{i-m} through t_i the difference between that moment's first threshold A₀ and the second noise energy F₀ is greater than a preset second limit value M₁. Here m is a positive integer and m is less than i.

For example, if m = 2, VAD is performed when T_i is less than the first threshold A₀ at moment t_i and the difference between A₀ and F₀ is greater than M₁ at each of the moments t_{i-2}, t_{i-1} and t_i.
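The m + 1 moment condition of S303, together with the update of S304, can be sketched with a fixed-length window of differences. This is a minimal sketch under stated assumptions, not the patent's comparator circuit. The class name `LowerRule` and the `vad_detect` callback are hypothetical. The difference A₀ - F₀ is recorded at every moment regardless of T_i, which the text does not state explicitly. The simplest option, A₂ = F₀, is used.

```python
from collections import deque
from typing import Callable, Deque


class LowerRule:
    """Hypothetical model of embodiment two (Fig. 3): lower A0 in quiet surroundings."""

    def __init__(self, m: int, m1: float) -> None:
        self.m1 = m1  # second limit value M1
        # Differences A0 - F0 for moments t_{i-m} .. t_i (m + 1 entries).
        self.history: Deque[float] = deque(maxlen=m + 1)

    def step(self, energy: float, threshold: float, f0: float,
             vad_detect: Callable[[], bool]) -> float:
        """Return the first threshold A0 to use at moment t_{i+1}."""
        self.history.append(threshold - f0)
        window_full = len(self.history) == self.history.maxlen
        quiet = window_full and all(d > self.m1 for d in self.history)
        # `and` short-circuits, so VAD runs only when S303's conditions hold.
        if energy < threshold and quiet and vad_detect():
            return f0  # S304: VAD succeeded, so A2 = F0 (simplest option)
        return threshold
```

With m = 2, as in the example above, the threshold can drop no earlier than the third consecutive quiet moment. A single moment whose margin is at most M₁ keeps the window from qualifying until it slides out.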

S304: When VAD detection succeeds, generate a third threshold A₂ from the second noise energy F₀ and use A₂ as the first threshold A₀ at moment t_{i+1}. The second noise energy F₀ is obtained by decimating the sampled signal y_i at a second decimation rate 1/z and applying fast tracking filtering to the decimated sample points yf, where z is a natural number greater than x.

For details of S301 and S302, refer to the embodiment shown in Fig. 1; they are not repeated here.

Regarding S303: in prior-art voice wake-up schemes, no VAD is performed when T_i is less than the first threshold A₀ at moment t_i, so the user's speech may be missed. For example, the first threshold A₀ at moment t_i may suit a noisy environment while the terminal is actually in a relatively quiet one (for example, an environment with low ambient noise), and the user's speech in y_i then goes undetected. Through S303 and S304, the embodiment of the present invention changes the first threshold A₀ at moment t_{i+1} so that it matches the current environment.

When, at every moment from t_{i-m} through t_i, the difference between the first threshold A₀ and the second noise energy F₀ is greater than the preset second limit value M₁, this condition has held m + 1 consecutive times. That indicates the terminal is now in a quiet environment (low ambient noise) and the current first threshold A₀ is too high, so it must be lowered to match the quiet environment. The second limit value M₁ is a preset parameter that can be obtained by tuning.

In S304, successful VAD detection indicates that y_i contains the user's speech. To avoid missing that speech, a third threshold A₂ is generated from F₀ and used as the first threshold A₀ at moment t_{i+1}. Because F₀ is obtained by decimating y_i at the second decimation rate 1/z and applying fast tracking filtering to the decimated sample points yf, it reflects, to a certain extent, the energy level of the transient noise in the terminal's environment.

The embodiment of the present invention obtains the audio energy T_i of the sampled signal y_i sampled at moment t_i. It performs VAD when T_i is less than the first threshold A₀ at moment t_i and the difference between A₀ and F₀ exceeds M₁ at every moment from t_{i-m} through t_i. When VAD succeeds, a third threshold A₂ is generated from F₀ and used as the first threshold A₀ at moment t_{i+1}. F₀ is obtained by decimating y_i at the second decimation rate 1/z and applying fast tracking filtering to the decimated sample points yf. In other words, the first threshold A₀ at moment t_{i+1} is derived from the second noise energy F₀ at moment t_i. The terminal can therefore adjust the next moment's first threshold A₀ to the current ambient noise and keep the threshold at every moment matched to the environment. In addition to reducing the number of VAD runs and the power consumption in noisy environments, this further avoids missing the user's speech in y_i.

In the above embodiment, generating the third threshold A₂ from the second noise energy F₀ may take any of these forms: F₀ is used directly as A₂; or the sum of F₀ and a preset second correction amount N₁ is used as A₂, i.e. A₂ = F₀ + N₁; or the product of F₀ and a preset second coefficient a₁ is used as A₂, i.e. A₂ = a₁ × F₀.

The larger the second correction amount N₁, the faster A₂ rises above the second noise energy F₀. The smaller N₁, the more slowly it rises. The rate of increase can be set according to actual needs, and the value of N₁ can be set according to the actual scenario; the embodiment of the present invention does not limit it. The second coefficient a₁ works the same way: a larger a₁ makes A₂ rise faster above F₀, and a smaller a₁ makes it rise more slowly. The value of a₁ can likewise be set according to the actual scenario and is not limited by the embodiment of the present invention.

Optionally, the product of F₀ and the preset second coefficient a₁, plus the preset second correction amount N₁, may be used as A₂, i.e. A₂ = a₁ × F₀ + N₁.

Fig. 4 is a flowchart of embodiment three of the voice wake-up method of the present invention. As shown in Fig. 4, the method may include:

S401: Periodically sample the audio signal, where the sampled signal y_i is obtained at moment t_i and i is a positive integer.

S402: Calculate the audio energy T_i of the sampled signal y_i.

S403: When T_i is less than the first threshold A₀ at moment t_i and the difference between A₀ at moment t_i and the first noise energy S₀ is greater than a preset third limit value M₂, generate a fourth threshold A₃ from S₀ and use A₃ as the first threshold A₀ at moment t_{i+1}.

For details of S401 and S402, refer to the embodiment shown in Fig. 1; they are not repeated here.

Regarding S403: in prior-art voice wake-up schemes, no VAD is performed when T_i is less than the first threshold A₀ at moment t_i, so the user's speech may be missed. For example, the first threshold A₀ at moment t_i may suit a noisy environment while the terminal is actually in a relatively quiet one, and the user's speech in y_i then goes undetected. Through S403, the embodiment of the present invention changes the first threshold A₀ at moment t_{i+1} so that it matches the current environment.

When the difference between the first threshold A₀ at moment t_i and the first noise energy S₀ is greater than the preset third limit value M₂, A₀ at moment t_i is considerably larger than S₀. This indicates that the terminal is now in a relatively quiet environment and that A₀ at moment t_i is too high, so it must be lowered to match the environment. The third limit value M₂ is a preset parameter that can be obtained by tuning.

The first noise energy S₀ is obtained by decimating y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys, so S₀ reflects the stable energy of the environment. Therefore, unlike S303, S403 does not need to check over multiple moments whether the difference between A₀ and S₀ exceeds M₂. When the difference between A₀ at moment t_i and S₀ exceeds M₂, this can indicate that y_i contains the user's speech. To avoid missing that speech, a fourth threshold A₃ is generated from S₀ and used as the first threshold A₀ at moment t_{i+1}.
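Because S₀ is a stable estimate, S403 needs only a single-moment check. Below is a minimal sketch of that check. The function name and the default coefficient and correction values are hypothetical. The difference is taken as A₀ - S₀, following the statement that A₀ is larger than S₀.

```python
def lower_on_quiet(energy: float, threshold: float, s0: float, m2: float,
                   a2: float = 1.0, n2: float = 0.0) -> float:
    """Hypothetical model of S403: return the first threshold A0 for moment t_{i+1}.

    One moment suffices because S0 is slowly tracked and therefore stable.
    The new value is A3 = a2 * S0 + N2; the defaults reduce it to A3 = S0.
    """
    if energy < threshold and (threshold - s0) > m2:
        return a2 * s0 + n2
    return threshold
```

For example, with A₀ = 10, S₀ = 4 and M₂ = 3, a quiet sample lowers the threshold to 4. With S₀ = 8 the margin is only 2, so the threshold stays at 10.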

The embodiment of the present invention obtains the audio energy T_i of the sampled signal y_i sampled at moment t_i. When T_i is less than the first threshold A₀ at moment t_i and the difference between A₀ at moment t_i and the first noise energy S₀ is greater than the preset third limit value M₂, it generates a fourth threshold A₃ from S₀ and uses A₃ as the first threshold A₀ at moment t_{i+1}. S₀ is obtained by decimating y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys. In other words, the first threshold A₀ at moment t_{i+1} is derived from the first noise energy S₀ at moment t_i. The terminal can therefore adjust the next moment's first threshold A₀ to the current ambient noise and keep the threshold at every moment matched to the environment. In addition to reducing the number of VAD runs and the power consumption in noisy environments, this further avoids missing the user's speech in y_i.

Based on the above embodiment, generating the fourth threshold A₃ from the first noise energy S₀ may take any of these forms: S₀ is used directly as A₃; or the sum of S₀ and a preset third correction amount N₂ is used as A₃, i.e. A₃ = S₀ + N₂; or the product of S₀ and a preset third coefficient a₂ is used as A₃, i.e. A₃ = a₂ × S₀.

The larger the third correction amount N₂, the faster A₃ rises above the first noise energy S₀. The smaller N₂, the more slowly it rises. The rate of increase can be set according to actual needs, and the value of N₂ can be set according to the actual scenario; the embodiment of the present invention does not limit it. The third coefficient a₂ works the same way: a larger a₂ makes A₃ rise faster above S₀, and a smaller a₂ makes it rise more slowly. The value of a₂ can likewise be set according to the actual scenario and is not limited by the embodiment of the present invention.

Optionally, the product of S₀ and the preset third coefficient a₂, plus the preset third correction amount N₂, may be used as A₃, i.e. A₃ = a₂ × S₀ + N₂.

Note also that the second correction amount N₁ and the third correction amount N₂ each express, under different conditions, how far the first threshold A₀ is raised above the relevant noise energy: A₀ exceeds the second noise energy F₀ by N₁, and exceeds the first noise energy S₀ by N₂. Because S₀ comes from slow tracking filtering and F₀ from fast tracking filtering, N₂ may optionally be greater than N₁, so that the threshold matches the environment quickly.

Further, the embodiment of the present invention may record the scenarios in which the first threshold changes. A scenario that raises the first threshold may be recorded as a threshold-raising moment, and a scenario that lowers it may be recorded as a threshold-lowering moment.

Specifically, before A₂ is used as the first threshold A₀ at moment t_{i+1}, the method may further include: recording moment t_i as a threshold-lowering moment; when the interval between t_i and the previous threshold-lowering moment is greater than a preset value T_time, performing the step of using A₂ as the first threshold A₀ at moment t_{i+1}; otherwise, not performing that step.

Likewise, before A₃ is used as the first threshold A₀ at moment t_{i+1}, the method may further include: recording moment t_i as a threshold-lowering moment; when the interval between t_i and the previous threshold-lowering moment is greater than the preset value T_time, performing the step of using A₃ as the first threshold A₀ at moment t_{i+1}; otherwise, not performing that step.

These two implementations prevent ping-pong switching of the first threshold A₀ without affecting the reliability of speech detection, and so reduce the probability of missed speech detection.
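The T_time guard can be sketched as follows. The sketch reads the text literally: t_i is recorded as a threshold-lowering moment before the interval check, so a candidate that was skipped still counts as the "previous" lowering moment. That reading, allowing the very first candidate (no previous record), and sharing one guard between the A₂ and A₃ paths are all assumptions. The class name `LoweringGate` is hypothetical.

```python
from typing import Optional


class LoweringGate:
    """Hypothetical T_time guard against ping-pong switching of A0."""

    def __init__(self, t_time: float) -> None:
        self.t_time = t_time
        self.last_lowering: Optional[float] = None  # previous threshold-lowering moment

    def allow(self, t_i: float) -> bool:
        """Record t_i as a threshold-lowering moment; report whether to apply A2/A3."""
        previous = self.last_lowering
        self.last_lowering = t_i  # recorded first, as the text describes
        return previous is None or (t_i - previous) > self.t_time
```

With T_time = 5, candidate lowerings at moments 0, 3, 9 and 20 are allowed, rejected, allowed and allowed. The candidate at 9 passes because the skipped candidate at 3 was still recorded.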

The embodiment of the present invention continuously monitors and tracks the ambient background noise, adaptively adjusts the first threshold A₀ to the background noise level, and raises or lowers A₀ slowly, reducing the probability of missed speech detection. In addition, dynamic adjustment of A₀ keeps power consumption in quiet and noisy environments close to each other, which improves user experience and product competitiveness.

Fig. 5 is a schematic structural diagram of embodiment one of the voice wake-up apparatus of the present invention. The apparatus can be implemented in hardware and can be integrated into terminals such as tablet computers, smartphones and PDAs. As shown in Fig. 5, the voice wake-up apparatus 10 includes: an SRC 11, a computing circuit 12, a threshold decision circuit 13, a first decimator 14, a slow tracking filter (Slow Tracking Filter, STF) 15, a comparator 16, a configurator 17 and an interrupt processing circuit 18.

The components work as follows:
- The SRC 11 periodically samples the audio signal and obtains the sampled signal y_i at moment t_i, where i is a positive integer.
- The computing circuit 12 calculates the audio energy T_i of y_i.
- The threshold decision circuit 13 determines whether T_i is greater than or equal to the first threshold A₀ at moment t_i. If it is, the threshold decision circuit 13 triggers the interrupt processing circuit 18 to output an interrupt pulse signal to the interrupt control circuit 20, and the interrupt control circuit 20 enables the DSP or processor 30 to perform VAD.
- The input of the first decimator 14 is coupled to the output of the SRC 11. The first decimator 14 decimates y_i at the first decimation rate 1/x and outputs the resulting sample points ys, where x is a natural number greater than 1.
- The input of the STF 15 is coupled to the output of the first decimator 14. The STF 15 applies slow tracking filtering to the decimated sample points ys to obtain the first noise energy S₀.
- The inputs of the comparator 16 are coupled to the output of the STF 15 and to the threshold decision circuit 13. The comparator 16 checks whether the difference between S₀ and the first threshold A₀ at moment t_i is greater than the preset first limit value M₀.
- The configurator 17 acts when VAD detection fails, the n consecutive detections before moment t_i also failed, and the difference between S₀ and A₀ at moment t_i is greater than M₀. It then generates a second threshold A₁ from S₀, uses A₁ as the first threshold A₀ at moment t_{i+1}, and delivers it to the threshold decision circuit 13. Here n is a positive integer and n is less than i.

Referring to Fig. 5, the configurator 17 configures parameters of the voice wake-up apparatus 10, such as the first threshold A₀. Those skilled in the art will understand that the configurator 17 receives configuration parameters from the terminal and converts them into control signals for the logic modules of the voice wake-up apparatus 10. These logic modules include the computing circuit 12, the threshold decision circuit 13 and the interrupt processing circuit 18. The SRC 11 may sample the audio signal by down-sampling, for example by converting 32 kilohertz (kHz) data to 16 kHz.
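The source does not describe the SRC's filter. As a rough illustration of the 32 kHz to 16 kHz example only, a 2:1 down-sampler can average each pair of samples (a crude low-pass) and emit one value per pair. The function name and the pair-averaging filter are assumptions. A real sample rate converter would use a properly designed anti-aliasing FIR or polyphase filter.

```python
from typing import List, Sequence


def downsample_by_2(samples: Sequence[float]) -> List[float]:
    """Halve the sample rate, e.g. 32 kHz -> 16 kHz (illustrative only).

    Averaging each pair of input samples acts as a crude anti-aliasing
    low-pass filter; emitting one output per pair halves the rate.
    A trailing odd sample is dropped.
    """
    return [(samples[k] + samples[k + 1]) / 2
            for k in range(0, len(samples) - 1, 2)]
```

One second of 32 kHz input (32000 samples) yields 16000 output samples.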

The sampled signal y_i flows through Fig. 5 as follows:

SRC 11 -> computing circuit 12 -> threshold decision circuit 13 -> interrupt processing circuit 18 (optional) -> interrupt control circuit 20 (optional) -> DSP or processor 30 (optional).

When T_i is greater than or equal to the first threshold A₀ at moment t_i, the flow of y_i includes the optional parts above. When T_i is less than A₀ at moment t_i, it does not.

The first decimator 14, the STF 15 and the comparator 16 do not affect normal voice wake-up. Together with the configurator 17, they serve only to change the first threshold A₀ used for voice wake-up.

The apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 1. Its implementation principle and technical effects are similar and are not repeated here.

In the above embodiment, the configurator 17 may specifically: use the first noise energy S₀ as the second threshold A₁; or use the sum of S₀ and the preset first correction amount N₀ as A₁, i.e. A₁ = S₀ + N₀; or use the product of S₀ and the preset first coefficient a₀ as A₁, i.e. A₁ = a₀ × S₀; and so on. The embodiment of the present invention is not limited thereto.

Fig. 6 is a schematic structural diagram of embodiment two of the voice wake-up apparatus of the present invention. The apparatus can be implemented in hardware and can be integrated into terminals such as tablet computers, smartphones and PDAs. As shown in Fig. 6, the voice wake-up apparatus 100 includes: an SRC 110, a computing circuit 120, a threshold decision circuit 130, a second decimator 140, a fast tracking filter (Fast Tracking Filter, FTF) 150, a comparator 160, a configurator 170 and an interrupt processing circuit 180.

The components work as follows:
- The SRC 110 periodically samples the audio signal and obtains the sampled signal y_i at moment t_i, where i is a positive integer.
- The computing circuit 120 calculates the audio energy T_i of y_i.
- The threshold decision circuit 130 determines whether T_i is greater than or equal to the first threshold A₀ at moment t_i.
- The input of the second decimator 140 is coupled to the output of the SRC 110. The second decimator 140 decimates y_i at the second decimation rate 1/z to obtain the sample points yf, where z is a natural number greater than x.
- The input of the FTF 150 is coupled to the output of the second decimator 140. The FTF 150 applies fast tracking filtering to the decimated sample points yf to obtain the second noise energy F₀.
- The input of the comparator 160 is coupled to the output of the FTF 150. When T_i is less than the first threshold A₀ at moment t_i, the comparator 160 checks at each moment whether the difference between that moment's first threshold and F₀ is greater than the preset second limit value M₁. When that difference exceeds M₁ at every moment from t_{i-m} through t_i, the comparator 160 triggers the interrupt processing circuit 180 to output an interrupt pulse signal to the interrupt control circuit 200, and the interrupt control circuit 200 enables the DSP or processor 300 to perform VAD. Here m is a positive integer and m is less than i.
- When VAD detection succeeds, the configurator 170 generates a third threshold A₂ from F₀, uses A₂ as the first threshold A₀ at moment t_{i+1}, and delivers it to the threshold decision circuit 130.

The apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 3. Its implementation principle and technical effects are similar and are not repeated here.

On the basis of the above embodiments, the configurator 170 may specifically: use the second noise energy F₀ as the third threshold A₂; or use the sum of F₀ and the preset second correction amount N₁ as A₂, i.e. A₂ = F₀ + N₁; or use the product of F₀ and the preset second coefficient a₁ as A₂, i.e. A₂ = a₁ × F₀; and so on. The embodiment of the present invention is not limited thereto.

Optionally, the configurator 170 may further record moment t_i as a threshold-lowering moment. When the interval between t_i and the previous threshold-lowering moment is greater than the preset value T_time, it performs the step of using A₂ as the first threshold A₀ at moment t_{i+1}; otherwise it does not. This prevents ping-pong switching of A₀ without affecting the reliability of speech detection and reduces the probability of missed speech detection.

Referring to Fig. 5, the configurator 17 may further act when T_i is less than the first threshold A₀ at moment t_i and the difference between A₀ at moment t_i and the first noise energy S₀ is greater than the preset third limit value M₂. It then generates a fourth threshold A₃ from S₀ and uses A₃ as the first threshold A₀ at moment t_{i+1}.

In this case, the apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 4. Its implementation principle and technical effects are similar and are not repeated here.

Further, the configurator 17 may specifically: use the first noise energy S₀ as the fourth threshold A₃; or use the sum of S₀ and the preset third correction amount N₂ as A₃, i.e. A₃ = S₀ + N₂; or use the product of S₀ and the preset third coefficient a₂ as A₃, i.e. A₃ = a₂ × S₀; and so on. The embodiment of the present invention is not limited thereto.

Further, the configurator 17 may record moment t_i as a threshold-lowering moment. When the interval between t_i and the previous threshold-lowering moment is greater than the preset value T_time, it performs the step of using A₃ as the first threshold A₀ at moment t_{i+1}; otherwise it does not. This prevents ping-pong switching of A₀ without affecting the reliability of speech detection and reduces the probability of missed speech detection.

Referring to Fig. 5 and Fig. 6, the first decimator 14 performs long-period data decimation and the second decimator 140 performs short-period data decimation. The STF 15 is a slowly converging filter that tracks changes in ambient noise stably. The FTF 150 is a quickly converging filter that tracks changes in ambient noise rapidly. Both filters track the energy of the current calculation window and use a structure similar to that of the computing circuit 12 or the computing circuit 120. They differ only in filter order and parameters, which are set according to actual tuning. The FTF 150 performs short-period filtering, so recent changes in the data are quickly reflected in its output. The STF 15 performs long-period filtering, so recent changes affect its output less and more slowly.
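The source states only that the STF and FTF differ in filter order and tuned parameters. To illustrate the slow-versus-fast behaviour, the sketch below substitutes a simple one-pole tracking filter (exponential smoothing) for both filters. The filter form, the smoothing factors, squaring samples to get energy, and the decimation factors 2 and 4 (chosen only to respect z > x) are all assumptions.

```python
from typing import List, Sequence


def tracking_filter(values: Sequence[float], alpha: float) -> List[float]:
    """One-pole tracker e <- e + alpha * (v - e); a small alpha converges slowly."""
    if not values:
        raise ValueError("need at least one value")
    e = values[0]
    out = []
    for v in values:
        e += alpha * (v - e)
        out.append(e)
    return out


def noise_energy(samples: Sequence[float], factor: int, alpha: float) -> float:
    """Decimate at rate 1/factor, square to energy, and return the tracked level."""
    picked = samples[::factor]           # keep one sample in every `factor`
    energies = [s * s for s in picked]   # instantaneous energy of each kept sample
    return tracking_filter(energies, alpha)[-1]


# Ambient noise jumps from amplitude 1 to amplitude 10 halfway through.
signal = [1.0] * 64 + [10.0] * 64
s0 = noise_energy(signal, factor=2, alpha=0.02)  # slow path (STF-like), x = 2
f0 = noise_energy(signal, factor=4, alpha=0.3)   # fast path (FTF-like), z = 4 > x
```

After the jump from energy 1 to energy 100, the fast estimate is already at about 99.7, while the slow estimate has only reached roughly 48. This mirrors the text's contrast between a short-period and a long-period filter.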

Optionally, combining Fig. 6 with Fig. 5 yields the structure shown in Fig. 7. Fig. 7 is a schematic structural diagram of embodiment three of the voice wake-up apparatus of the present invention. As shown in Fig. 7, the voice wake-up apparatus 1000 includes: the SRC 11, the computing circuit 12, the threshold decision circuit 13, the first decimator 14, the second decimator 140, the STF 15, the FTF 150, the comparator 16, the configurator 17 and the interrupt processing circuit 18.

In this structure, the threshold decision circuit 13 also performs the functions of the threshold decision circuit 130, the comparator 16 those of the comparator 160, the configurator 17 those of the configurator 170, and the interrupt processing circuit 18 those of the interrupt processing circuit 180. The specific principles are as described in the above embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units or modules is only a logical functional division, and other divisions are possible in an actual implementation: multiple units or modules may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or modules, and may be electrical, mechanical or of other forms.

Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules. They may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some or all of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A voice wake-up method, characterized by comprising:

calculating the audio energy T_i of the sampled signal y_i;

performing voice activity detection (VAD) when the audio energy T_i is greater than or equal to the first threshold A₀ at the moment t_i;

when VAD detection fails, the n consecutive detections before the moment t_i have failed, and the difference between the first noise energy S₀ and the first threshold A₀ at the moment t_i is greater than a preset first limit value M₀, generating a second threshold A₁ from the first noise energy S₀ and using the second threshold A₁ as the first threshold A₀ at moment t_{i+1}, wherein the first noise energy S₀ is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sample points ys, x is a natural number greater than 1, n is a positive integer, and n is less than i.

2. The method according to claim 1, wherein generating the second threshold A₁ from the first noise energy S₀ comprises:

using the first noise energy S₀ as the second threshold A₁.

3. The method according to claim 1, further comprising, after calculating the audio energy T_i of the sampled signal y_i:

In the audio power T_iLess than the t_iThe first threshold A at moment₀, and from t_i-mMoment is until t_iMoment respective first Threshold value A₀With the second noise energy F₀Difference be both greater than preset second threshold value M₁In the case where, VAD is carried out, m is positive integer And m is less than i；

when VAD detection succeeds, generating a third threshold A₂ from the second noise energy F₀ and using the third threshold A₂ as the first threshold A₀ at moment t_{i+1}, wherein the second noise energy F₀ is obtained by decimating the sampled signal y_i at a second decimation rate 1/z and applying fast tracking filtering to the decimated sample points yf, and z is a natural number greater than x.

4. The method according to claim 3, wherein generating the third threshold A₂ from the second noise energy F₀ comprises:

using the second noise energy F₀ as the third threshold A₂.

5. The method according to claim 3 or 4, further comprising, before using the third threshold A₂ as the first threshold A₀ at moment t_{i+1}:

Record the t_iMoment is to reduce the threshold value moment；

As the t_iMoment and upper one reduces the time interval at threshold value moment greater than preset value T_timeWhen, it executes described by described the Three threshold value As₂As t_i+1The first threshold A at moment₀The step of, otherwise, do not execute described by the third threshold value A₂As t_i+1When The first threshold A at quarter₀The step of.
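The hold-off of claim 5 (and its twin, claim 8) can be read as a rate limiter on threshold decreases. This sketch follows the literal claim order: t_i is recorded as a threshold-lowering moment before the interval check, so a suppressed attempt still restarts the T_time window. Allowing the very first lowering, when no previous moment exists, is an assumption. `LoweringGate` is a hypothetical name.

```python
from typing import Optional


class LoweringGate:
    """Hypothetical rate limiter for claims 5 and 8.

    A threshold decrease computed at t_i is applied only if more than
    t_time has elapsed since the previous threshold-lowering moment.
    """

    def __init__(self, t_time: float):
        self.t_time = t_time
        self.last_lowering: Optional[float] = None

    def apply(self, t_i: float, current: float, proposed: float) -> float:
        prev = self.last_lowering
        self.last_lowering = t_i  # claim 5 records t_i before the check
        # Assumption: with no earlier lowering moment, the decrease is allowed.
        if prev is None or t_i - prev > self.t_time:
            return proposed
        return current
```

With T_time = 10, a decrease at t = 0 is applied, one at t = 5 is blocked, and one at t = 20 is applied again. Under this literal reading, attempts spaced closer than T_time keep pushing the window forward; an implementation that records only applied decreases would behave differently.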

6. The method according to claim 1, wherein after the calculating the audio power T_i of the sampled signal y_i, the method further comprises:

when the audio power T_i is less than the first threshold A₀ at time t_i and the difference between the first threshold A₀ at time t_i and the first noise energy S₀ is greater than a preset third threshold value M₂, generating a fourth threshold A₃ according to the first noise energy S₀ and using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}.

7. The method according to claim 6, wherein the generating a fourth threshold A₃ according to the first noise energy S₀ comprises:

using the first noise energy S₀ as the fourth threshold A₃.

8. The method according to claim 6 or 7, wherein before the using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}, the method further comprises:

recording time t_i as a threshold-lowering moment; and

when the time interval between t_i and the previous threshold-lowering moment is greater than a preset value T_time, performing the step of using the fourth threshold A₃ as the first threshold A₀ at time t_{i+1}; otherwise, skipping that step.

9. A voice wake-up apparatus, comprising:

a sample rate converter (SRC), configured to periodically sample an audio signal, wherein a sampled signal y_i is obtained by sampling at time t_i, i being a positive integer;

a computing circuit, configured to calculate the audio power T_i of the sampled signal y_i;

a threshold decision circuit, configured to determine whether the audio power T_i is greater than or equal to the first threshold A₀ at time t_i and, when it is, to trigger an interrupt processing circuit to output an interrupt pulse signal to an interrupt control circuit, so that the interrupt control circuit enables a processor to perform voice activation detection (VAD);

a first decimator, an input of which is coupled to an output of the SRC, configured to decimate the sampled signal y_i at a first decimation rate 1/x to obtain sample points ys, x being a natural number greater than 1;

a slow tracking filter (STF), an input of which is coupled to an output of the first decimator, configured to perform slow tracking filtering on the decimated sample points ys to obtain a first noise energy S₀;

a comparator, inputs of which are coupled to the output of the first decimator and to the threshold decision circuit, configured to determine whether the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than a preset first threshold value M₀; and

a configurator, configured to: when the VAD detection fails, and, before time t_i, the detection has failed n consecutive times and the difference between the first noise energy S₀ and the first threshold A₀ at time t_i is greater than the preset first threshold value M₀, generate a second threshold A₁ according to the first noise energy S₀ and deliver the second threshold A₁ to the threshold decision circuit as the first threshold A₀ at time t_{i+1}, n being a positive integer less than i.
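The always-on front end of claim 9 (SRC output, computing circuit, threshold decision circuit) can be modeled in software to reason about how often the processor would be woken. This is a sketch under stated assumptions: the audio power T_i is taken to be the instantaneous power y_i², which this section does not define, and the interrupt pulse is stood in for by a hypothetical `raise_interrupt` callback.

```python
from typing import Callable, Iterable, List


def threshold_decision(samples: Iterable[float], a0: float,
                       raise_interrupt: Callable[[int], None]) -> List[float]:
    """Hypothetical model of the computing circuit plus threshold decision circuit.

    For each SRC output y_i, compute the audio power T_i (assumed here to
    be y_i ** 2) and call raise_interrupt(i) when T_i >= A0, standing in
    for the interrupt pulse that lets the processor run VAD.
    """
    powers = []
    for i, y in enumerate(samples, start=1):  # i is a positive integer, as in claim 9
        t_i = y * y
        powers.append(t_i)
        if t_i >= a0:
            raise_interrupt(i)
    return powers


# Usage: with A0 = 1.0, only moments 2 and 3 would wake the processor.
fired: List[int] = []
threshold_decision([0.1, 2.0, -3.0, 0.5], 1.0, fired.append)
```

Counting the entries in `fired` over a recording gives a rough proxy for the number of VAD runs, which is the quantity the adaptive threshold is meant to reduce.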

10. The apparatus according to claim 9, wherein the configurator is specifically configured to:

use the first noise energy S₀ as the second threshold A₁.

11. The apparatus according to claim 9, further comprising:

a second decimator, an input of which is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a second decimation rate 1/z to obtain sample points yf, z being a natural number greater than x;

a fast tracking filter (FTF), an input of which is coupled to an output of the second decimator, configured to perform fast tracking filtering on the decimated sample points yf to obtain a second noise energy F₀;

wherein an input of the comparator is further coupled to an output of the FTF, and the comparator is further configured to: when the audio power T_i is less than the first threshold A₀ at time t_i, determine for each moment whether the difference between the first threshold and the second noise energy F₀ is greater than a preset second threshold value M₁; and when, at every moment from t_{i-m} to t_i, the difference between that moment's first threshold A₀ and the second noise energy F₀ is greater than the preset second threshold value M₁, trigger the interrupt processing circuit to output an interrupt pulse signal to the interrupt control circuit, so that the interrupt control circuit enables the processor to perform VAD, m being a positive integer less than i; and

the configurator is further configured to: when the VAD detection succeeds, generate a third threshold A₂ according to the second noise energy F₀ and deliver the third threshold A₂ to the threshold decision circuit as the first threshold A₀ at time t_{i+1}.
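The two noise paths of claims 9 and 11 differ in decimation factor (x for the STF, z > x for the FTF) and in how quickly the filter follows the input. The claims do not specify the filter structure, so this sketch assumes a one-pole recursive smoother on the power of each kept sample, with a small coefficient for "slow" and a large one for "fast"; `tracked_energy` and all parameter values are hypothetical. It illustrates one plausible reading: after the noise floor drops, F₀ settles near the new level well before S₀ does.

```python
from typing import List, Sequence


def tracked_energy(y: Sequence[float], factor: int, alpha: float,
                   init: float) -> List[float]:
    """Decimate y at rate 1/factor, then smooth the power of each kept sample.

    Assumptions (not from the claims): the tracked quantity is the sample
    power s * s, and the tracking filter is est += alpha * (power - est).
    A small alpha tracks slowly; a large alpha tracks quickly.
    """
    if factor < 2:
        raise ValueError("decimation factor must be a natural number > 1")
    est = init
    history = []
    for s in y[::factor]:  # decimation: keep every factor-th sample
        est += alpha * (s * s - est)
        history.append(est)
    return history


# Hypothetical parameters: slow path x = 2, fast path z = 8 (z > x).
X, Z = 2, 8
ALPHA_SLOW, ALPHA_FAST = 0.01, 0.5

# Noise floor drops from power 4.0 (amplitude 2) to 0.25 (amplitude 0.5).
quiet = [0.5] * 400
s0 = tracked_energy(quiet, X, ALPHA_SLOW, init=4.0)[-1]  # about 0.75
f0 = tracked_energy(quiet, Z, ALPHA_FAST, init=4.0)[-1]  # about 0.25
```

Plain downsampling without an anti-aliasing filter is a simplification here; a production decimator would normally low-pass filter first.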

12. The apparatus according to claim 11, wherein the configurator is specifically configured to:

use the second noise energy F₀ as the third threshold A₂.

13. The apparatus according to claim 11 or 12, wherein the configurator is further configured to:

record time t_i as a threshold-lowering moment;

14. The apparatus according to claim 9, wherein the configurator is further configured to:

15. The apparatus according to claim 14, wherein the configurator is specifically configured to:

use the first noise energy S₀ as the fourth threshold A₃.

16. The apparatus according to claim 14 or 15, wherein the configurator is further configured to:

record time t_i as a threshold-lowering moment;