CN105261368B - Voice wake-up method and device


Publication number
CN105261368B
Authority
CN
China
Prior art keywords
threshold
moment
threshold value
noise energy
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510549435.6A
Other languages
Chinese (zh)
Other versions
CN105261368A (en)
Inventor
马涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Gaohang Intellectual Property Operation Co ltd
Nanjing Advanced Biomaterials And Process Equipment Research Institute Co ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201510549435.6A
Publication of CN105261368A
Application granted
Publication of CN105261368B
Legal status: Expired - Fee Related

Abstract

Embodiments of the present invention provide a voice wake-up method and device. The method includes: periodically sampling an audio signal, where a sampled signal is obtained by sampling at time t_i; calculating the audio energy of the sampled signal; when the audio energy is greater than or equal to the first threshold at time t_i, waking up a DSP to perform voice activity detection (VAD); and, when the VAD detection fails, the n consecutive detections before time t_i also failed, and the difference between a first noise energy and the first threshold at time t_i is greater than a preset first threshold value, generating a second threshold according to the first noise energy and using the second threshold as the first threshold at time t_{i+1}. The first noise energy is obtained by decimating the sampled signal at a first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points. The embodiments can reduce the number of VAD runs and thereby reduce terminal power consumption in noisy environments.

Description

Voice wake-up method and device
Technical field
Embodiments of the present invention relate to voice wake-up technology, and in particular to a voice wake-up method and device.
Background
With the development of technology, terminals generally provide a voice wake-up function: a user wakes up the terminal by voice and then controls it by voice.
Current voice wake-up schemes wake up the terminal through two cooperating stages, a microphone activity detection (Microphone Activity Detection, MAD) circuit and a digital signal processor (Digital Signal Processor, DSP). If the energy of the current audio signal detected by the MAD circuit is greater than a preset threshold, the DSP is woken up to perform voice activity detection (Voice Activity Detection, VAD), which identifies whether the audio signal is the user's voice. If so, the terminal is woken up; if not, the DSP wake-up is an invalid wake-up or a false wake-up. Specifically, VAD compares the features of the audio signal with the features of the user's voice to judge whether the audio signal is the user's voice.
With this scheme, when the terminal moves between environments, for example from a quiet environment to a noisy one, invalid or false wake-ups occur frequently because the preset threshold is fixed, so the terminal's power consumption in noisy environments is high.
Summary of the invention
Embodiments of the present invention provide a voice wake-up method and device to reduce the power consumption of a terminal in noisy environments.
In a first aspect, an embodiment of the present invention provides a voice wake-up method, including:
periodically sampling an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer;
calculating the audio energy T_i of the sampled signal y_i;
performing voice activity detection (VAD) when the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i; and
when the VAD detection fails, the n consecutive VAD detections before time t_i also failed, and the difference between a first noise energy S_0 and the first threshold A_0 at time t_i is greater than a preset first threshold value M_0, generating a second threshold A_1 according to the first noise energy S_0 and using the second threshold A_1 as the first threshold A_0 at time t_{i+1}, where the first noise energy S_0 is obtained by decimating the sampled signal y_i at a first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points ys, x is a natural number greater than 1, n is a positive integer, and n is less than i.
With reference to the first aspect, in a first possible implementation of the first aspect, generating the second threshold A_1 according to the first noise energy S_0 includes:
using the first noise energy S_0 as the second threshold A_1;
or using the sum of the first noise energy S_0 and a preset first correction amount N_0 as the second threshold A_1;
or using the product of the first noise energy S_0 and a preset first coefficient a_0 as the second threshold A_1.
With reference to the first aspect, in a second possible implementation of the first aspect, after calculating the audio energy T_i of the sampled signal y_i, the method further includes:
performing VAD when the audio energy T_i is less than the first threshold A_0 at time t_i and, at every moment from time t_{i-m} to time t_i, the difference between the respective first threshold A_0 and a second noise energy F_0 is greater than a preset second threshold value M_1, where m is a positive integer and m is less than i; and
when the VAD detection succeeds, generating a third threshold A_2 according to the second noise energy F_0 and using the third threshold A_2 as the first threshold A_0 at time t_{i+1}, where the second noise energy F_0 is obtained by decimating the sampled signal y_i at a second decimation rate 1/z and applying fast tracking filtering to the decimated sampling points yf, and z is a natural number greater than x.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, generating the third threshold A_2 according to the second noise energy F_0 includes:
using the second noise energy F_0 as the third threshold A_2;
or using the sum of the second noise energy F_0 and a preset second correction amount N_1 as the third threshold A_2;
or using the product of the second noise energy F_0 and a preset second coefficient a_1 as the third threshold A_2.
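The condition that triggers VAD below the first threshold can be sketched as follows. This is a minimal illustration, not the patented circuit: the function name, the list-based history, and the assumption that one value of A_0 and one value of F_0 are stored per moment are all hypothetical.

```python
from typing import Sequence


def should_probe_vad(energy: float,
                     thresholds: Sequence[float],
                     fast_noise: Sequence[float],
                     m: int,
                     m1: float) -> bool:
    """Decide whether to run VAD although T_i is below the first threshold.

    thresholds[k] and fast_noise[k] hold A_0 and F_0 for consecutive
    moments, with the last entry belonging to t_i (assumed data layout).
    """
    if m < 1 or len(thresholds) < m + 1 or len(fast_noise) < m + 1:
        return False  # not enough history for the window t_{i-m}..t_i
    if energy >= thresholds[-1]:
        return False  # the ordinary T_i >= A_0 path applies instead
    window = zip(thresholds[-(m + 1):], fast_noise[-(m + 1):])
    # Every moment in the window must satisfy A_0 - F_0 > M_1.
    return all(a0 - f0 > m1 for a0, f0 in window)
```

If the function returns True and the subsequent VAD detection succeeds, the third threshold A_2 would then be derived from F_0 by one of the three options listed above.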
With reference to the second or third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, before using the third threshold A_2 as the first threshold A_0 at time t_{i+1}, the method further includes:
recording time t_i as a threshold-lowering time; and
when the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, performing the step of using the third threshold A_2 as the first threshold A_0 at time t_{i+1}; otherwise, not performing that step.
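The interval check above acts as a rate limiter on threshold lowering. The sketch below is one possible reading: the class name is hypothetical, time t_i is recorded on every request (whether or not the lowering is applied), and the very first request is accepted because no previous threshold-lowering time exists. The text does not settle either of those last two points, so both are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdLowerGate:
    """Rate-limits lowering of the first threshold A_0 (hypothetical helper)."""
    t_time: float                            # preset minimum interval T_time
    last_lower_time: Optional[float] = None  # previous threshold-lowering time

    def next_threshold(self, t_i: float, a0: float, candidate: float) -> float:
        """Return the first threshold to use at t_{i+1}."""
        previous = self.last_lower_time
        self.last_lower_time = t_i  # record t_i as a threshold-lowering time
        # Assumption: with no previous lowering time, the request is accepted.
        if previous is None or (t_i - previous) > self.t_time:
            return candidate        # apply the new, lower threshold
        return a0                   # too soon: keep the current threshold


gate = ThresholdLowerGate(t_time=5.0)
first = gate.next_threshold(t_i=0.0, a0=50.0, candidate=40.0)
second = gate.next_threshold(t_i=3.0, a0=first, candidate=35.0)
third = gate.next_threshold(t_i=9.0, a0=second, candidate=35.0)
```

Under these assumptions the second request is rejected because only 3 time units have passed, while the third is accepted because 6 time units separate it from the previously recorded time.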
With reference to the first aspect, in a fifth possible implementation of the first aspect, after calculating the audio energy T_i of the sampled signal y_i, the method further includes:
when the audio energy T_i is less than the first threshold A_0 at time t_i and the difference between the first threshold A_0 at time t_i and the first noise energy S_0 is greater than a preset third threshold value M_2, generating a fourth threshold A_3 according to the first noise energy S_0 and using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, generating the fourth threshold A_3 according to the first noise energy S_0 includes:
using the first noise energy S_0 as the fourth threshold A_3;
or using the sum of the first noise energy S_0 and a preset third correction amount N_2 as the fourth threshold A_3;
or using the product of the first noise energy S_0 and a preset third coefficient a_2 as the fourth threshold A_3.
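As a small illustration of the fifth and sixth implementations, the sketch below lowers the threshold without running VAD when the current threshold sits far above the slowly tracked noise level. The function name is hypothetical, and only the additive option (A_3 = S_0 + N_2) is shown; the other two options would replace the marked line.

```python
def lower_threshold_without_vad(energy: float, a0: float, s0: float,
                                m2: float, n2: float = 0.0) -> float:
    """Return the first threshold A_0 to use at t_{i+1}.

    energy is T_i, a0 is A_0 at t_i, s0 is the first noise energy S_0,
    m2 is the preset third threshold value M_2, and n2 is the preset
    third correction amount N_2.
    """
    if energy < a0 and (a0 - s0) > m2:
        return s0 + n2  # A_3 = S_0 + N_2 (one of the three listed options)
    return a0           # conditions not met: keep the current threshold
```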
With reference to the fifth or sixth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, before using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}, the method further includes:
recording time t_i as a threshold-lowering time; and
when the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, performing the step of using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}; otherwise, not performing that step.
In a second aspect, an embodiment of the present invention provides a voice wake-up device, including:
a sample rate converter (SRC), configured to periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer;
a computing circuit, configured to calculate the audio energy T_i of the sampled signal y_i;
a threshold decision circuit, configured to judge whether the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i and, when it is, to trigger an interrupt processing circuit to output an interrupt pulse signal to an interrupt control circuit, so that the interrupt control circuit enables a digital signal processor (DSP) or a processor to perform voice activity detection (VAD);
a first decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a first decimation rate 1/x to obtain sampling points ys, where x is a natural number greater than 1;
a slow tracking filter (STF), whose input is coupled to the output of the first decimator, configured to apply slow tracking filtering to the decimated sampling points ys to obtain a first noise energy S_0;
a comparator, whose input is coupled to the output of the STF and to the threshold decision circuit, configured to judge whether the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than a preset first threshold value M_0; and
a configurator, configured to, when the VAD detection fails, the n consecutive detections before time t_i also failed, and the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first threshold value M_0, generate a second threshold A_1 according to the first noise energy S_0, use the second threshold A_1 as the first threshold A_0 at time t_{i+1}, and deliver it to the threshold decision circuit, where n is a positive integer and n is less than i.
With reference to the second aspect, in a first possible implementation of the second aspect, the configurator is specifically configured to:
use the first noise energy S_0 as the second threshold A_1;
or use the sum of the first noise energy S_0 and a preset first correction amount N_0 as the second threshold A_1;
or use the product of the first noise energy S_0 and a preset first coefficient a_0 as the second threshold A_1.
With reference to the second aspect, in a second possible implementation of the second aspect, the device further includes:
a second decimator, whose input is coupled to the output of the SRC, configured to decimate the sampled signal y_i at a second decimation rate 1/z to obtain sampling points yf, where z is a natural number greater than x; and
a fast tracking filter (FTF), whose input is coupled to the output of the second decimator, configured to apply fast tracking filtering to the decimated sampling points yf to obtain a second noise energy F_0;
the comparator, whose input is also coupled to the output of the FTF, is further configured to: when the audio energy T_i is less than the first threshold A_0 at time t_i, compare whether the difference between the first threshold at each moment and the second noise energy F_0 is greater than a preset second threshold value M_1; and when, at every moment from time t_{i-m} to time t_i, the difference between the respective first threshold A_0 and the second noise energy F_0 is greater than the preset second threshold value M_1, trigger the interrupt processing circuit to output an interrupt pulse signal to the interrupt control circuit, so that the interrupt control circuit enables the DSP or the processor to perform VAD, where m is a positive integer and m is less than i; and
the configurator is further configured to, when the VAD detection succeeds, generate a third threshold A_2 according to the second noise energy F_0, use the third threshold A_2 as the first threshold A_0 at time t_{i+1}, and deliver it to the threshold decision circuit.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the configurator is specifically configured to:
use the second noise energy F_0 as the third threshold A_2;
or use the sum of the second noise energy F_0 and a preset second correction amount N_1 as the third threshold A_2;
or use the product of the second noise energy F_0 and a preset second coefficient a_1 as the third threshold A_2.
With reference to the second or third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the configurator is further configured to:
record time t_i as a threshold-lowering time; and
when the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, perform the step of using the third threshold A_2 as the first threshold A_0 at time t_{i+1}; otherwise, not perform that step.
With reference to the second aspect, in a fifth possible implementation of the second aspect, the configurator is further configured to:
when the audio energy T_i is less than the first threshold A_0 at time t_i and the difference between the first threshold A_0 at time t_i and the first noise energy S_0 is greater than a preset third threshold value M_2, generate a fourth threshold A_3 according to the first noise energy S_0 and use the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the configurator is specifically configured to:
use the first noise energy S_0 as the fourth threshold A_3;
or use the sum of the first noise energy S_0 and a preset third correction amount N_2 as the fourth threshold A_3;
or use the product of the first noise energy S_0 and a preset third coefficient a_2 as the fourth threshold A_3.
With reference to the fifth or sixth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the configurator is further configured to:
record time t_i as a threshold-lowering time; and
when the interval between time t_i and the previous threshold-lowering time is greater than a preset value T_time, perform the step of using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}; otherwise, not perform that step.
Embodiments of the present invention provide a voice wake-up method and device. The audio energy T_i of the sampled signal y_i obtained by sampling at time t_i is calculated, and VAD is performed when T_i is greater than or equal to the first threshold A_0 at time t_i. When the VAD detection fails, the n consecutive detections before time t_i also failed, and the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first threshold value M_0, the first threshold A_0 is adjusted to obtain the first threshold A_0 for time t_{i+1}: a second threshold A_1 is generated according to the first noise energy S_0 and used as the first threshold A_0 at time t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points ys. In other words, the first threshold A_0 at time t_{i+1} is derived from the first noise energy S_0 at time t_i. The terminal can therefore adjust the first threshold A_0 of the next moment according to the current ambient noise, so that the first threshold A_0 at each moment matches the environment. This reduces the number of VAD runs and thereby reduces terminal power consumption in noisy environments.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of Embodiment 1 of the voice wake-up method of the present invention;
Fig. 2 is an example diagram of the first threshold of the voice wake-up method of the present invention in different environments;
Fig. 3 is a flowchart of Embodiment 2 of the voice wake-up method of the present invention;
Fig. 4 is a flowchart of Embodiment 3 of the voice wake-up method of the present invention;
Fig. 5 is a schematic structural diagram of Embodiment 1 of the voice wake-up device of the present invention;
Fig. 6 is a schematic structural diagram of Embodiment 2 of the voice wake-up device of the present invention;
Fig. 7 is a schematic structural diagram of Embodiment 3 of the voice wake-up device of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Voice wake-up means that, in any situation, a terminal can be activated by a predefined wake-up word and made to execute a specific application. It is similar to a user pressing a key to light up the screen and activate the phone. The advantage of voice wake-up is that it frees the user's hands.
In the voice wake-up scheme of one smartphone, the standby power consumption is about 2.2 mA × 3.8 V in a quiet environment and 5.5 mA × 3.8 V in a noisy environment. The difference in power consumption between the noisy and quiet environments is therefore about 12 mW: (5.5 - 2.2) × 3.8 ≈ 12.5.
According to the power consumption estimation model, average power consumption = quiet power consumption × 70% + noisy power consumption × 30%. Reducing power consumption in noisy environments is therefore worth considering, and the embodiments of the present invention focus on power optimization in noisy environments.
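The arithmetic behind these figures can be checked with a short calculation. The sketch reads the estimation model as quiet power × 70% + noisy power × 30%; the function and variable names are illustrative only.

```python
def standby_power_mw(current_ma: float, voltage_v: float) -> float:
    """Power in milliwatts from current in milliamperes and voltage in volts."""
    return current_ma * voltage_v


def average_power_mw(quiet_mw: float, noisy_mw: float,
                     quiet_share: float = 0.7) -> float:
    """Weighted average: quiet_share of the time quiet, the rest noisy."""
    return quiet_mw * quiet_share + noisy_mw * (1.0 - quiet_share)


quiet_mw = standby_power_mw(2.2, 3.8)    # 8.36 mW in a quiet environment
noisy_mw = standby_power_mw(5.5, 3.8)    # 20.9 mW in a noisy environment
extra_mw = noisy_mw - quiet_mw           # 12.54 mW, the "about 12 mW" gap
avg_mw = average_power_mw(quiet_mw, noisy_mw)  # about 12.12 mW on average
```

With these numbers the noisy 30% of the time contributes about 6.27 mW of the roughly 12.12 mW average, which is about half of it, and this is why the noisy case is the one targeted for optimization.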
Embodiments of the present invention provide a method and device for waking up, by voice, a digital signal processor in a terminal, so as to reduce the number of times the DSP in the terminal is woken up to perform VAD and thereby reduce the terminal's power consumption in noisy environments.
Fig. 1 is a flowchart of Embodiment 1 of the voice wake-up method of the present invention. The method may be performed by a voice wake-up device, which may be implemented in hardware. The voice wake-up device may be integrated in a terminal such as a tablet computer, a smartphone, or a palmtop computer (Personal Digital Assistant, PDA). As shown in Fig. 1, the voice wake-up method includes:
S101: Periodically sample an audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer.
Similarly, the sampled signal at time t_{i-1} may be denoted y_{i-1}, the sampled signal at time t_{i+1} may be denoted y_{i+1}, and so on.
In any embodiment of the present invention, the audio signal may be a signal collected by a sound collection device such as a microphone. A sample rate converter (Sample Rate Converter, SRC) periodically samples the audio signal collected by the sound collection device. Alternatively, the collected audio signal is first processed by a filter such as a band-pass filter and then periodically sampled by the SRC. The embodiments of the present invention do not limit this.
S102: Calculate the audio energy T_i of the sampled signal y_i.
It should be noted that the audio energy of a sampled signal may be calculated as soon as the sampled signal is obtained. For example, after the sampled signal y_{i-1} is obtained by sampling at time t_{i-1}, the corresponding audio energy T_{i-1} may also be calculated.
A person skilled in the art will understand that, because the sampled signal y_i is known, its audio energy T_i can be obtained by calculation.
Specifically, let x(j) denote the amplitude of the j-th sampling point of the sampled signal y_i, so that x(j) × x(j) denotes the energy of y_i at the j-th sampling point, where j is an integer from 0 to M-1 and M is the total number of sampling points; let the coefficient a_j denote the weight of each sampling point; and let T_i denote the audio energy of y_i. For example, the calculation may be normalized, with each weight indicating the share of a sampling point in the overall energy.
(The formula and its accompanying condition are not reproduced in this text.)
This is only one example of calculating the audio energy T_i of the sampled signal y_i, and the embodiments of the present invention do not limit it. T_i may also be obtained by root mean square (Root Mean Square, RMS) or in another similar way, for example without normalization.
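Because the formula itself is missing from this text, the sketch below shows only one reading that is consistent with the stated definitions: T_i as the weighted sum of the squared amplitudes, T_i = sum over j of a_j × x(j) × x(j), with weights that sum to 1. Both the formula and the default of equal weights 1/M are assumptions. An RMS variant is included because the text names it as an alternative.

```python
import math
from typing import Optional, Sequence


def weighted_energy(samples: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Assumed form of T_i: sum of a_j * x(j)^2 over the M sampling points."""
    m = len(samples)
    if m == 0:
        raise ValueError("at least one sampling point is required")
    if weights is None:
        weights = [1.0 / m] * m  # assumption: equal weights that sum to 1
    if len(weights) != m:
        raise ValueError("one weight per sampling point is required")
    return sum(a * x * x for a, x in zip(weights, samples))


def rms(samples: Sequence[float]) -> float:
    """Root mean square of the sampling points."""
    if not samples:
        raise ValueError("at least one sampling point is required")
    return math.sqrt(sum(x * x for x in samples) / len(samples))
```

With equal weights the weighted energy equals the square of the RMS value, so the two measures differ only in scale and either can be compared against a suitably chosen threshold.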
S103: Perform VAD when the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i.
The VAD may be performed by an element in the terminal such as a DSP or a processor.
S104: When the VAD detection fails, the n consecutive detections before time t_i also failed, and the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first threshold value M_0, generate a second threshold A_1 according to the first noise energy S_0 and use the second threshold A_1 as the first threshold A_0 at time t_{i+1}, where the first noise energy S_0 is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points ys, x is a natural number greater than 1, n is a positive integer, and n is less than i.
It should be noted that "the VAD detection fails and the n consecutive detections before time t_i also failed" means that the VAD detection performed at time t_i fails and the VAD detections performed from time t_{i-n} to time t_{i-1} all failed. For example, if n is 2, the VAD detections at the two consecutive moments before the failed detection at time t_i (that is, at times t_{i-2} and t_{i-1}) also failed. To aid understanding, here is an example of a VAD detection failure. Suppose the current sound is an automobile engine. Because its audio energy is greater than the first threshold A_0 at the current moment, VAD must be performed; VAD then determines that the sound is not the user's voice, so the VAD detection fails. In other words, if the terminal is in a high-noise environment, the noise energy of the environment is relatively high. Once it exceeds the first threshold A_0 at the current moment, VAD must be started, but because the ambient noise is disordered, no useful voice signal can be detected in it, so the VAD detection fails. The first noise energy S_0 indicates the energy level of the steady-state noise in the terminal's environment. The first threshold value M_0 is a preset parameter that can be determined by tuning.
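The S104 condition combines a run of VAD failures with a noise-versus-threshold check. A minimal sketch follows; the function name and the list of per-moment VAD results are hypothetical, and the difference is taken as S_0 - A_0.

```python
from typing import Sequence


def should_raise_threshold(vad_results: Sequence[bool], n: int,
                           s0: float, a0: float, m0: float) -> bool:
    """Check the S104 condition at time t_i.

    vad_results holds one entry per moment (True = VAD succeeded), with
    the last entry belonging to t_i. The detection at t_i and the n
    detections before it (t_{i-n}..t_{i-1}) must all have failed, and
    S_0 - A_0 must exceed M_0.
    """
    if n < 1 or len(vad_results) < n + 1:
        return False  # not enough history yet
    recent = vad_results[-(n + 1):]
    all_failed = not any(recent)
    return all_failed and (s0 - a0) > m0
```

For n = 2 this requires three failures in a row (at t_{i-2}, t_{i-1}, and t_i), which matches the worked example in the paragraph above.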
It should also be noted that, in any embodiment of the present invention, "first" and "second" only distinguish instances of the same term. For example, the "first" in "first threshold" and the "second" in "second threshold" are merely names that distinguish different thresholds and do not imply any order among them.
In practice, noise levels differ between application scenarios. For example, noise in a quiet environment is about 30 to 35 decibels (dB); in noisy environments, ambient noise is roughly as follows: shopping mall about 60 dB, road about 70 dB, aircraft cabin about 70 dB, bus about 80 dB, subway about 90 dB, and so on. In addition, the noise level at the same place differs over time; for example, daytime and night-time noise at the same place may differ by 10 to 15 dB.
Furthermore, when a user talks or makes a call in a noisy environment, the user subconsciously speaks louder, which raises the signal-to-noise ratio (Signal-to-Noise Ratio, SNR) and provides the basis that makes voice wake-up feasible.
Therefore, current voice wake-up schemes that use a single noise gate, that is, the preset threshold, cannot treat quiet and noisy environments differently when waking up the terminal by voice. If the preset threshold is set too high, speech is missed; if it is set too low, the processor is woken up frequently, which increases power consumption.
In the embodiments of the present invention, the first threshold A_0 at each moment is adjusted as appropriate.
Specifically, S101 to S103 obtain the audio energy T_i of the sampled signal y_i sampled at time t_i and compare it with the first threshold A_0 at time t_i. When the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i, VAD is performed by the DSP, processor, or similar element, and whether to wake up the terminal is decided according to the VAD result. If the VAD detection succeeds, that is, the element performing VAD detects the user's voice in the sampled signal y_i, the terminal is woken up; otherwise the VAD detection fails, that is, no user voice is detected in y_i, and the terminal is not woken up.
In S104, when the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first threshold value M_0, the terminal is probably in an environment with high background noise. In this case, the second threshold A_1 is generated according to the first noise energy S_0 and used as the first threshold A_0 at time t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sampling points ys, where x is a natural number greater than 1 and n is a positive integer less than i. In practice, the sampled signal y_i may contain both the user's voice and the ambient noise at time t_i, or only the ambient noise at time t_i. The first threshold A_0 for time t_{i+1} is obtained at time t_i; that is, it is the first threshold the terminal uses when executing S103 and S104 of the voice wake-up method at time t_{i+1}.
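The text does not specify the internals of the slow tracking filter. The sketch below therefore assumes a common choice, first-order exponential smoothing of the squared decimated samples, purely to illustrate how decimation followed by slow tracking yields a steady-state noise estimate. The smoothing factor alpha, the function names, and the filter form are all assumptions.

```python
from typing import List, Sequence


def decimate(samples: Sequence[float], x: int) -> List[float]:
    """Keep every x-th sampling point (decimation rate 1/x, x > 1)."""
    if x <= 1:
        raise ValueError("x must be a natural number greater than 1")
    return list(samples[::x])


def track_energy(points: Sequence[float], alpha: float,
                 initial: float = 0.0) -> float:
    """Assumed tracking filter: exponential smoothing of squared samples.

    A small alpha follows the input slowly (steady-state noise); a large
    alpha follows it quickly.
    """
    level = initial
    for value in points:
        level += alpha * (value * value - level)
    return level


ys = decimate([1.0] * 20, 4)            # five decimated sampling points
slow = track_energy(ys, alpha=0.05)     # slow tracker lags behind the step
fast = track_energy(ys, alpha=0.5)      # fast tracker is nearly settled
```

After the same five points the slow estimate is still well below the fast one, which is the behaviour one would expect from a filter meant to capture steady-state noise rather than short bursts.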
If the voice wake-up at time t_i is the first one, the first threshold A_0 at time t_i may be preset. The preset first threshold A_0 can be regarded as a tuning parameter corresponding to one possible application scenario; for example, presetting A_0 to 50 decibels can be regarded as the ambient noise threshold for a quiet environment. Fig. 2 shows examples of the first threshold in a quiet environment and in a noisy environment. As shown in Fig. 2, in a quiet environment the first threshold is higher than the ambient noise by a first preset value; in a noisy environment the first threshold is higher than the ambient noise by a second preset value. In addition, the first threshold in the noisy environment is higher than the first threshold in the quiet environment.
In addition, S103 can be with are as follows: 1) in audio power TiWith ti-1The audio power T at momenti-1Difference be greater than or wait In tiThe differential threshold A at moment00In the case where, carry out VAD;Alternatively, 2) in audio power TiMore than or equal to tiThe first of moment Threshold value A0, and, audio power TiWith ti-1The audio power T at momenti-1Difference be greater than or equal to tiThe differential threshold A at moment00's In the case of, carry out VAD;Alternatively, 3) in audio power TiMore than or equal to tiThe first threshold A at moment0, or, audio power TiWith ti-1The audio power T at momenti-1Difference be greater than or equal to tiThe differential threshold A at moment00, in the case that the two meets one, Carry out VAD.Wherein, ti-1The audio power T at momenti-1It caches in the terminal, in ti-1Moment calculates sampled signal yi-1's Audio power obtains.
In case 1), the difference threshold A_00 at time t_i is adjusted in a manner similar to the adjustment of the first threshold A_0 at time t_i. In case 2), the first threshold A_0 and the difference threshold A_00 at time t_i are both adjusted in a similar manner. In case 3), either the first threshold A_0 or the difference threshold A_00 at time t_i is adjusted in a similar manner.
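The three alternative trigger conditions for S103 can be sketched as follows. This is only an illustrative sketch: the function name `should_run_vad` and the integer `mode` argument are hypothetical and do not appear in the patent.

```python
def should_run_vad(energy, prev_energy, threshold, diff_threshold, mode):
    """Decide whether to trigger VAD at time t_i.

    energy         -- audio energy T_i
    prev_energy    -- cached audio energy T_{i-1}
    threshold      -- first threshold A_0 at time t_i
    diff_threshold -- difference threshold A_00 at time t_i
    mode           -- 1, 2 or 3, matching the three alternatives (hypothetical)
    """
    above = energy >= threshold                        # level condition
    jumped = (energy - prev_energy) >= diff_threshold  # change condition
    if mode == 1:
        return jumped            # alternative 1): change condition only
    if mode == 2:
        return above and jumped  # alternative 2): both conditions
    if mode == 3:
        return above or jumped   # alternative 3): either condition
    raise ValueError("mode must be 1, 2 or 3")
```

Alternative 2) is the strictest and triggers VAD least often; alternative 3) is the most permissive.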
In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained by sampling at time t_i is calculated, and VAD is performed when the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i. When VAD fails, n consecutive detections before time t_i have also failed, and the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first limit value M_0, the first threshold A_0 is adjusted to obtain the first threshold A_0 at time t_{i+1}: a second threshold A_1 is generated from the first noise energy S_0 and used as the first threshold A_0 at time t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sample points y_s. In other words, the first threshold A_0 at time t_{i+1} is derived from the first noise energy S_0 at time t_i. The terminal can therefore adjust the first threshold A_0 for the next moment according to the current ambient noise, so that the first threshold A_0 at each moment matches the environment. This reduces the number of VAD runs and lowers the power consumption of the terminal in noisy environments.
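The threshold-raising rule of this embodiment can be sketched as below. The class and attribute names are hypothetical. The sketch assumes that the "difference" is S_0 minus A_0, that the failure counter resets on a successful detection, and that the simplest generation option (A_1 = S_0) is used; the patent does not fix these details.

```python
class ThresholdRaiser:
    """Raises the first threshold after repeated VAD failures in noise."""

    def __init__(self, threshold, limit, n):
        self.threshold = threshold  # A_0, current first threshold
        self.limit = limit          # M_0, preset first limit value
        self.n = n                  # required earlier consecutive failures
        self.failures = 0           # consecutive VAD failures so far

    def on_vad_result(self, detected, slow_noise):
        """Process the VAD result at t_i; return the threshold for t_{i+1}.

        slow_noise is the first noise energy S_0 from the slow tracking filter.
        """
        if detected:
            self.failures = 0       # assumption: success resets the count
            return self.threshold
        earlier_failures = self.failures
        self.failures += 1
        noisy = slow_noise - self.threshold > self.limit
        if earlier_failures >= self.n and noisy:
            self.threshold = slow_noise  # assumption: A_1 = S_0
        return self.threshold
```

With n = 2, the threshold is raised only on the third consecutive failure, and only if the slow noise estimate exceeds the current threshold by more than M_0.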
In the above embodiment, generating the second threshold A_1 from the first noise energy S_0 may include: using the first noise energy S_0 as the second threshold A_1; or using the sum of the first noise energy S_0 and a preset first correction amount N_0 as the second threshold A_1, that is, A_1 = S_0 + N_0; or using the product of the first noise energy S_0 and a preset first coefficient a_0 as the second threshold A_1, that is, A_1 = a_0 × S_0.
If the first correction amount N_0 is larger, the second threshold A_1 is raised faster relative to the first noise energy S_0; if N_0 is smaller, A_1 is raised more slowly relative to S_0. The rate of increase can be set according to actual needs, and the value of N_0 can be set according to the actual scenario; the embodiment of the present invention does not limit it. Likewise, if the first coefficient a_0 is larger, the second threshold A_1 is raised faster relative to the first noise energy S_0; if a_0 is smaller, A_1 is raised more slowly. The value of a_0 can also be set according to the actual scenario, and the embodiment of the present invention does not limit it.
Optionally, the product of the first noise energy S_0 and the preset first coefficient a_0, plus the preset first correction amount N_0, may also be used as the second threshold A_1, that is, A_1 = a_0 × S_0 + N_0.
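All four ways of generating a threshold from a noise energy are special cases of one affine formula, as the following sketch shows. The function name `generate_threshold` and its default arguments are hypothetical and chosen for illustration only.

```python
def generate_threshold(noise_energy, coefficient=1.0, correction=0.0):
    """Return coefficient * noise_energy + correction.

    coefficient=1, correction=0  -> threshold equals the noise energy
    coefficient=1, correction=N  -> noise energy plus a correction amount
    coefficient=a, correction=0  -> noise energy scaled by a coefficient
    coefficient=a, correction=N  -> the combined form a * S + N
    """
    return coefficient * noise_energy + correction
```

The same helper would apply to the third and fourth thresholds described later, with the second noise energy and the second or third correction amount and coefficient substituted.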
Fig. 3 is a flowchart of embodiment two of the voice wake-up method of the present invention. As shown in Fig. 3, the method may include:
S301: Periodically sample the audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer.
S302: Calculate the audio energy T_i of the sampled signal y_i.
S303: Perform VAD when the audio energy T_i is less than the first threshold A_0 at time t_i and, at every moment from t_{i-m} to t_i, the difference between the respective first threshold A_0 and the second noise energy F_0 is greater than a preset second limit value M_1, where m is a positive integer less than i.
For example, if m = 2, VAD is performed when the audio energy T_i is less than the first threshold A_0 at time t_i, and the difference between the first threshold A_0 and the second noise energy F_0 is greater than the second limit value M_1 at each of the times t_{i-2}, t_{i-1}, and t_i.
S304: When VAD succeeds, generate a third threshold A_2 from the second noise energy F_0 and use the third threshold A_2 as the first threshold A_0 at time t_{i+1}. The second noise energy F_0 is obtained by decimating the sampled signal y_i at a second decimation rate 1/z and applying fast tracking filtering to the decimated sample points y_f, where z is a natural number greater than x.
For details of S301 and S302, refer to the embodiment shown in Fig. 1; they are not repeated here.
Regarding S303: when the audio energy T_i is less than the first threshold A_0 at time t_i, a prior-art voice wake-up scheme does not perform VAD, so the user's voice may go undetected. For example, the first threshold A_0 at time t_i may be suited to a noisy environment while the terminal is actually in a relatively quiet environment (for example, one with low background noise), so that the user's voice in the sampled signal y_i is missed. Through S303 and S304, the embodiment of the present invention changes the first threshold A_0 at time t_{i+1} so that it matches the current environment.
When, at every moment from t_{i-m} to t_i, the difference between the respective first threshold A_0 and the second noise energy F_0 is greater than the preset second limit value M_1, that is, when this condition has held m+1 times in a row, the terminal is in a quiet environment (one with low background noise) and the current first threshold A_0 is too large; it needs to be lowered to match the quiet environment. The second limit value M_1 is a preset parameter that can be obtained by tuning.
In S304, a successful VAD indicates that the sampled signal y_i contains the user's voice. To avoid missing the user's voice, a third threshold A_2 is generated from the second noise energy F_0 and used as the first threshold A_0 at time t_{i+1}. Because the second noise energy F_0 is obtained by decimating the sampled signal y_i at the second decimation rate 1/z and applying fast tracking filtering to the decimated sample points y_f, it reflects, to some extent, the energy level of the transient noise in the terminal's environment.
In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained by sampling at time t_i is calculated, and VAD is performed when the audio energy T_i is less than the first threshold A_0 at time t_i and, at every moment from t_{i-m} to t_i, the difference between the respective first threshold A_0 and the second noise energy F_0 is greater than the preset second limit value M_1. When VAD succeeds, a third threshold A_2 is generated from the second noise energy F_0 and used as the first threshold A_0 at time t_{i+1}. The second noise energy F_0 is obtained by decimating the sampled signal y_i at the second decimation rate 1/z and applying fast tracking filtering to the decimated sample points y_f. In other words, the first threshold A_0 at time t_{i+1} is derived from the second noise energy F_0 at time t_i. The terminal can therefore adjust the first threshold A_0 for the next moment according to the current ambient noise, so that the first threshold A_0 at each moment matches the environment. In this way, while the number of VAD runs and the power consumption of the terminal in noisy environments are reduced, missed detection of the user's voice in the sampled signal y_i is further avoided.
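The decision logic of S303 and S304 can be sketched as below. The names `QuietEnvironmentChecker` and `lower_on_success` are hypothetical, and the sketch assumes the "difference" is A_0 minus F_0 and that the third threshold is formed as F_0 plus a correction amount, which is one of the options the embodiment lists.

```python
from collections import deque


class QuietEnvironmentChecker:
    """Tracks whether A_0 - F_0 > M_1 held at each of the last m+1 moments."""

    def __init__(self, limit, m):
        self.limit = limit                  # M_1, preset second limit value
        self.history = deque(maxlen=m + 1)  # results for t_{i-m} .. t_i

    def should_run_vad(self, energy, threshold, fast_noise):
        """Return True if S303 calls for VAD at the current moment."""
        self.history.append(threshold - fast_noise > self.limit)
        window_full = len(self.history) == self.history.maxlen
        return energy < threshold and window_full and all(self.history)


def lower_on_success(detected, threshold, fast_noise, correction=0.0):
    """S304: on VAD success, return A_2 = F_0 + correction; else keep A_0."""
    return fast_noise + correction if detected else threshold
```

The bounded deque discards results older than t_{i-m}, so a single moment that fails the comparison blocks VAD until m+1 fresh results have all passed.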
In the above embodiment, generating the third threshold A_2 from the second noise energy F_0 may specifically include: using the second noise energy F_0 as the third threshold A_2; or using the sum of the second noise energy F_0 and a preset second correction amount N_1 as the third threshold A_2, that is, A_2 = F_0 + N_1; or using the product of the second noise energy F_0 and a preset second coefficient a_1 as the third threshold A_2, that is, A_2 = a_1 × F_0.
If the second correction amount N_1 is larger, the third threshold A_2 is raised faster relative to the second noise energy F_0; if N_1 is smaller, A_2 is raised more slowly relative to F_0. The rate of increase can be set according to actual needs, and the value of N_1 can be set according to the actual scenario; the embodiment of the present invention does not limit it. Likewise, if the second coefficient a_1 is larger, the third threshold A_2 is raised faster relative to the second noise energy F_0; if a_1 is smaller, A_2 is raised more slowly. The value of a_1 can also be set according to the actual scenario, and the embodiment of the present invention does not limit it.
Optionally, the product of the second noise energy F_0 and the preset second coefficient a_1, plus the preset second correction amount N_1, may also be used as the third threshold A_2, that is, A_2 = a_1 × F_0 + N_1.
Fig. 4 is a flowchart of embodiment three of the voice wake-up method of the present invention. As shown in Fig. 4, the method may include:
S401: Periodically sample the audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer.
S402: Calculate the audio energy T_i of the sampled signal y_i.
S403: When the audio energy T_i is less than the first threshold A_0 at time t_i, and the difference between the first threshold A_0 at time t_i and the first noise energy S_0 is greater than a preset third limit value M_2, generate a fourth threshold A_3 from the first noise energy S_0 and use the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}.
For details of S401 and S402, refer to the embodiment shown in Fig. 1; they are not repeated here.
Regarding S403: when the audio energy T_i is less than the first threshold A_0 at time t_i, a prior-art voice wake-up scheme does not perform VAD, so the user's voice may go undetected. For example, the first threshold A_0 at time t_i may be suited to a noisy environment while the terminal is actually in a relatively quiet environment, so that the user's voice in the sampled signal y_i is missed. Through S403, the embodiment of the present invention changes the first threshold A_0 at time t_{i+1} so that it matches the current environment.
When the difference between the first threshold A_0 at time t_i and the first noise energy S_0 is greater than the preset third limit value M_2, that is, when the first threshold A_0 at time t_i is considerably larger than the first noise energy S_0, the terminal is in a relatively quiet environment and the first threshold A_0 at time t_i is too large; it needs to be lowered to match the environment. The third limit value M_2 is a preset parameter that can be obtained by tuning.
Because the first noise energy S_0 is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sample points y_s, it reflects the steady-state energy of the environment. Therefore, unlike S303, S403 does not need to check at multiple moments whether the difference between the first threshold A_0 and the first noise energy S_0 is greater than the preset third limit value M_2. When the difference between the first threshold A_0 at time t_i and the first noise energy S_0 is greater than the preset third limit value M_2, this can indicate that the sampled signal y_i contains the user's voice. To avoid missing the user's voice, a fourth threshold A_3 is generated from the first noise energy S_0 and used as the first threshold A_0 at time t_{i+1}.
In this embodiment of the present invention, the audio energy T_i of the sampled signal y_i obtained by sampling at time t_i is calculated, and when the audio energy T_i is less than the first threshold A_0 at time t_i and the difference between the first threshold A_0 at time t_i and the first noise energy S_0 is greater than the preset third limit value M_2, a fourth threshold A_3 is generated from the first noise energy S_0 and used as the first threshold A_0 at time t_{i+1}. The first noise energy S_0 is obtained by decimating the sampled signal y_i at the first decimation rate 1/x and applying slow tracking filtering to the decimated sample points y_s. In other words, the first threshold A_0 at time t_{i+1} is derived from the first noise energy S_0 at time t_i. The terminal can therefore adjust the first threshold A_0 for the next moment according to the current ambient noise, so that the first threshold A_0 at each moment matches the environment. In this way, while the number of VAD runs and the power consumption of the terminal in noisy environments are reduced, missed detection of the user's voice in the sampled signal y_i is further avoided.
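Because the slow noise estimate is already stable, S403 needs only a single-moment comparison, which the following sketch illustrates. The function name `next_threshold_s403` is hypothetical, and the fourth threshold is assumed to be S_0 plus a correction amount, one of the options the embodiment lists.

```python
def next_threshold_s403(energy, threshold, slow_noise, limit, correction=0.0):
    """Return the first threshold to use at t_{i+1}.

    energy     -- audio energy T_i
    threshold  -- first threshold A_0 at time t_i
    slow_noise -- first noise energy S_0 from the slow tracking filter
    limit      -- preset third limit value M_2
    correction -- third correction amount (assumed form: A_3 = S_0 + correction)
    """
    if energy < threshold and threshold - slow_noise > limit:
        return slow_noise + correction  # lower the threshold toward the noise
    return threshold                    # otherwise keep the current threshold
```

Unlike the S303 sketch, no history of earlier moments is kept here.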
Based on the above embodiment, generating the fourth threshold A_3 from the first noise energy S_0 may include: using the first noise energy S_0 as the fourth threshold A_3; or using the sum of the first noise energy S_0 and a preset third correction amount N_2 as the fourth threshold A_3, that is, A_3 = S_0 + N_2; or using the product of the first noise energy S_0 and a preset third coefficient a_2 as the fourth threshold A_3, that is, A_3 = a_2 × S_0.
If the third correction amount N_2 is larger, the fourth threshold A_3 is raised faster relative to the first noise energy S_0; if N_2 is smaller, A_3 is raised more slowly relative to S_0. The rate of increase can be set according to actual needs, and the value of N_2 can be set according to the actual scenario; the embodiment of the present invention does not limit it. Likewise, if the third coefficient a_2 is larger, the fourth threshold A_3 is raised faster relative to the first noise energy S_0; if a_2 is smaller, A_3 is raised more slowly. The value of a_2 can also be set according to the actual scenario, and the embodiment of the present invention does not limit it.
Optionally, the product of the first noise energy S_0 and the preset third coefficient a_2, plus the preset third correction amount N_2, may also be used as the fourth threshold A_3, that is, A_3 = a_2 × S_0 + N_2.
As a supplementary note, the second correction amount N_1 and the third correction amount N_2 each reflect, under different conditions, how far the first threshold A_0 is raised above the noise energy: the first threshold A_0 exceeds the second noise energy F_0 by the second correction amount N_1, and exceeds the first noise energy S_0 by the third correction amount N_2. In addition, because the first noise energy S_0 comes from slow tracking filtering and the second noise energy F_0 comes from fast tracking filtering, the third correction amount N_2 is optionally greater than the second correction amount N_1, so that the threshold matches the environment quickly.
Further, the embodiment of the present invention may also record the occasions on which the first threshold changes. An occasion on which the first threshold is increased can be recorded as a threshold-raising moment; an occasion on which the first threshold is reduced can be recorded as a threshold-lowering moment.
Specifically, before the third threshold A_2 is used as the first threshold A_0 at time t_{i+1}, the method may further include: recording time t_i as a threshold-lowering moment; when the time interval between t_i and the previous threshold-lowering moment is greater than a preset value T_time, performing the above step of using the third threshold A_2 as the first threshold A_0 at time t_{i+1}; otherwise, not performing that step.
Before the fourth threshold A_3 is used as the first threshold A_0 at time t_{i+1}, the method may further include: recording time t_i as a threshold-lowering moment; when the time interval between t_i and the previous threshold-lowering moment is greater than the preset value T_time, performing the above step of using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}; otherwise, not performing that step.
The two implementations above prevent ping-pong switching of the first threshold A_0 without affecting the reliability of speech detection, and reduce the probability of missed voice detection.
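The interval check that guards threshold lowering can be sketched as follows. The class name `LoweringGuard` is hypothetical. The sketch assumes that the first lowering is always allowed and that only lowerings actually applied update the stored moment; the patent does not specify either point.

```python
class LoweringGuard:
    """Allows a threshold reduction only if the last one was long enough ago."""

    def __init__(self, min_interval):
        self.min_interval = min_interval  # preset value T_time
        self.last_lowering = None         # previous threshold-lowering moment

    def apply(self, now, current, candidate):
        """Return the threshold for t_{i+1}.

        now       -- current moment t_i
        current   -- first threshold A_0 in use at t_i
        candidate -- proposed lower threshold (A_2 or A_3)
        """
        too_soon = (self.last_lowering is not None
                    and now - self.last_lowering <= self.min_interval)
        if too_soon:
            return current            # skip the reduction to avoid ping-pong
        self.last_lowering = now      # assumption: record applied reductions
        return candidate
```

For example, with T_time = 10, a reduction at time 0 is applied, one requested at time 5 is skipped, and one requested at time 11 is applied.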
The embodiment of the present invention continuously monitors and tracks the ambient background noise, adaptively adjusts the first threshold A_0 according to the level of that noise, and adjusts the first threshold A_0 by slow rises or slow falls so as to reduce the probability of missed voice detection. In addition, the dynamic adjustment of the first threshold A_0 brings the power consumption in quiet and noisy environments close together, which improves user experience and product competitiveness.
Fig. 5 is a schematic structural diagram of embodiment one of the voice wake-up device of the present invention. The voice wake-up device can be implemented in hardware and can be integrated in a terminal such as a tablet computer, a smartphone, or a PDA. As shown in Fig. 5, the voice wake-up device 10 includes: an SRC 11, a computing circuit 12, a threshold decision circuit 13, a first decimator 14, a slow tracking filter (Slow Tracking Filter, STF for short) 15, a comparator 16, a configurator 17, and an interrupt processing circuit 18.
The SRC 11 is configured to periodically sample the audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer. The computing circuit 12 is configured to calculate the audio energy T_i of the sampled signal y_i. The threshold decision circuit 13 is configured to judge whether the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i; when it is, the threshold decision circuit 13 triggers the interrupt processing circuit 18 to output an interrupt pulse signal to the interrupt control circuit 20, and the interrupt control circuit 20 enables the DSP or processor 30 to perform VAD. The input of the first decimator 14 is coupled to the output of the SRC 11; the first decimator 14 is configured to decimate the sampled signal y_i at the first decimation rate 1/x to obtain and output sample points y_s, where x is a natural number greater than 1. The input of the STF 15 is coupled to the output of the first decimator 14; the STF 15 is configured to apply slow tracking filtering to the decimated sample points y_s to obtain the first noise energy S_0. The input of the comparator 16 is coupled to the output of the STF 15 and to the threshold decision circuit 13; the comparator 16 is configured to compare whether the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first limit value M_0. The configurator 17 is configured to, when VAD fails, n consecutive detections before time t_i have also failed, and the difference between the first noise energy S_0 and the first threshold A_0 at time t_i is greater than the preset first limit value M_0, generate the second threshold A_1 from the first noise energy S_0 and deliver it to the threshold decision circuit 13 as the first threshold A_0 at time t_{i+1}, where n is a positive integer less than i.
Referring to Fig. 5, the configurator 17 configures parameters for the voice wake-up device 10, such as the first threshold A_0 described above. Those skilled in the art will understand that the configurator 17 receives configuration parameters from the terminal and converts them into the control signals corresponding to each logic module in the voice wake-up device 10, where the logic modules include the computing circuit 12, the threshold decision circuit 13, the interrupt processing circuit 18, and so on. The SRC 11 may sample the audio signal by down-sampling, for example converting 32 kilohertz (kHz) data to 16 kHz.
The flow of the sampled signal y_i in Fig. 5 is as follows:
SRC 11 -> computing circuit 12 -> threshold decision circuit 13 -> interrupt processing circuit 18 (optional) -> interrupt control circuit 20 (optional) -> DSP or processor 30 (optional).
When the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i, the flow of the sampled signal y_i includes the optional parts above; when the audio energy T_i is less than the first threshold A_0 at time t_i, the flow of the sampled signal y_i does not include them.
The first decimator 14, the STF 15, and the comparator 16 do not affect normal voice wake-up; they are used only, together with the configurator 17, to change the first threshold A_0 used in voice wake-up.
The device of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 1. Its implementation principle and technical effects are similar and are not repeated here.
In the above embodiment, the configurator 17 may be specifically configured to: use the first noise energy S_0 as the second threshold A_1; or use the sum of the first noise energy S_0 and the preset first correction amount N_0 as the second threshold A_1, that is, A_1 = S_0 + N_0; or use the product of the first noise energy S_0 and the preset first coefficient a_0 as the second threshold A_1, that is, A_1 = a_0 × S_0; and so on. The embodiment of the present invention does not limit this.
Fig. 6 is a schematic structural diagram of embodiment two of the voice wake-up device of the present invention. The voice wake-up device can be implemented in hardware and can be integrated in a terminal such as a tablet computer, a smartphone, or a PDA. As shown in Fig. 6, the voice wake-up device 100 includes: an SRC 110, a computing circuit 120, a threshold decision circuit 130, a second decimator 140, a fast tracking filter (Fast Tracking Filter, FTF for short) 150, a comparator 160, a configurator 170, and an interrupt processing circuit 180.
The SRC 110 is configured to periodically sample the audio signal, where a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer. The computing circuit 120 is configured to calculate the audio energy T_i of the sampled signal y_i. The threshold decision circuit 130 is configured to judge whether the audio energy T_i is greater than or equal to the first threshold A_0 at time t_i. The input of the second decimator 140 is coupled to the output of the SRC 110; the second decimator 140 is configured to decimate the sampled signal y_i at the second decimation rate 1/z to obtain sample points y_f, where z is a natural number greater than x. The input of the FTF 150 is coupled to the output of the second decimator 140; the FTF 150 is configured to apply fast tracking filtering to the decimated sample points y_f to obtain the second noise energy F_0. The input of the comparator 160 is coupled to the output of the FTF 150; the comparator 160 is configured to, when the audio energy T_i is less than the first threshold A_0 at time t_i, compare whether the difference between the first threshold at each moment and the second noise energy F_0 is greater than the preset second limit value M_1, and, when the difference between the respective first threshold A_0 and the second noise energy F_0 is greater than the preset second limit value M_1 at every moment from t_{i-m} to t_i, trigger the interrupt processing circuit 180 to output an interrupt pulse signal to the interrupt control circuit 200, which enables the DSP or processor 300 to perform VAD, where m is a positive integer less than i. The configurator 170 is configured to, when VAD succeeds, generate the third threshold A_2 from the second noise energy F_0 and deliver it to the threshold decision circuit 130 as the first threshold A_0 at time t_{i+1}.
The device of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 3. Its implementation principle and technical effects are similar and are not repeated here.
On the basis of the above embodiment, the configurator may be specifically configured to: use the second noise energy F_0 as the third threshold A_2; or use the sum of the second noise energy F_0 and the preset second correction amount N_1 as the third threshold A_2, that is, A_2 = F_0 + N_1; or use the product of the second noise energy F_0 and the preset second coefficient a_1 as the third threshold A_2, that is, A_2 = a_1 × F_0; and so on. The embodiment of the present invention does not limit this.
Optionally, the configurator 170 may be further configured to: record time t_i as a threshold-lowering moment; when the time interval between t_i and the previous threshold-lowering moment is greater than the preset value T_time, perform the above step of using the third threshold A_2 as the first threshold A_0 at time t_{i+1}; otherwise, not perform that step. This prevents ping-pong switching of the first threshold A_0 without affecting the reliability of speech detection, and reduces the probability of missed voice detection.
Referring to Fig. 5, the configurator 17 may be further configured to: when the audio energy T_i is less than the first threshold A_0 at time t_i, and the difference between the first threshold A_0 at time t_i and the first noise energy S_0 is greater than the preset third limit value M_2, generate the fourth threshold A_3 from the first noise energy S_0 and use the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}.
In this case, the device of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 4. Its implementation principle and technical effects are similar and are not repeated here.
Further, the configurator 17 may be specifically configured to: use the first noise energy S_0 as the fourth threshold A_3; or use the sum of the first noise energy S_0 and the preset third correction amount N_2 as the fourth threshold A_3, that is, A_3 = S_0 + N_2; or use the product of the first noise energy S_0 and the preset third coefficient a_2 as the fourth threshold A_3, that is, A_3 = a_2 × S_0; and so on. The embodiment of the present invention does not limit this.
Further, the configurator 17 may be further configured to: record time t_i as a threshold-lowering moment; when the time interval between t_i and the previous threshold-lowering moment is greater than the preset value T_time, perform the above step of using the fourth threshold A_3 as the first threshold A_0 at time t_{i+1}; otherwise, not perform that step. This prevents ping-pong switching of the first threshold A_0 without affecting the reliability of speech detection, and reduces the probability of missed voice detection.
Referring to Fig. 5 and Fig. 6, the first decimator 14 and the second decimator 140 perform long-period and short-period data decimation, respectively. The STF 15 is a slowly converging filter used to track changes in ambient noise stably. The FTF 150 is a quickly converging filter used to track changes in ambient noise quickly. The STF 15 and the FTF 150 track the energy of the current calculation window and use a structure similar to that of the computing circuit 12 or the computing circuit 120. The STF 15 and the FTF 150 differ in filter order and parameters, which are set according to actual tuning. The FTF 150 performs short-period filtering: recent changes in the data quickly affect the filter output. The STF 15 performs long-period filtering: recent changes in the data affect the filter output only slightly and slowly.
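The difference between slow and fast tracking can be illustrated with a small sketch. The patent leaves the filter order and parameters to tuning, so the first-order exponential smoother below, the function name `track_noise`, the smoothing factors, and the decimation factors are all assumptions made for illustration. The sketch also operates on a sequence of per-sample energy values rather than on raw samples.

```python
def track_noise(energies, decimation, alpha):
    """Decimate a sequence of energy values and smooth the result.

    energies   -- sequence of energy values (assumed already computed)
    decimation -- keep every `decimation`-th value (x or z in the text)
    alpha      -- smoothing factor in (0, 1]; small = slow, large = fast
    Returns the final noise estimate, or None for empty input.
    """
    estimate = None
    for value in energies[::decimation]:
        if estimate is None:
            estimate = value                       # initialise on first value
        else:
            estimate += alpha * (value - estimate)  # first-order update
    return estimate


# A step from 30 to 60: quiet background followed by a sudden loud segment.
energies = [30.0] * 40 + [60.0] * 8
slow = track_noise(energies, decimation=2, alpha=0.05)  # stands in for STF
fast = track_noise(energies, decimation=4, alpha=0.5)   # stands in for FTF
```

After the step, the fast estimate has moved most of the way to the new level while the slow estimate has barely changed, which is the behaviour the text attributes to the FTF and the STF respectively.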
Optionally, combining Fig. 6 with Fig. 5 yields the structure shown in Fig. 7. Fig. 7 is a schematic structural diagram of embodiment three of the voice wake-up device of the present invention. As shown in Fig. 7, the voice wake-up device 1000 includes: the SRC 11, the computing circuit 12, the threshold decision circuit 13, the first decimator 14, the second decimator 140, the STF 15, the FTF 150, the comparator 16, the configurator 17, and the interrupt processing circuit 18.
Here, the threshold decision circuit 13 also has the role and functions of the threshold decision circuit 130; the comparator 16 also has the role and functions of the comparator 160; the configurator 17 also has the role and functions of the configurator 170; and the interrupt processing circuit 18 also has the role and functions of the interrupt processing circuit 180. The principles are as described in the above embodiments and are not repeated here.
The embodiment of the present invention continuously monitors and tracks the ambient background noise, adaptively adjusts the first threshold A_0 according to the level of that noise, and adjusts the first threshold A_0 by slow rises or slow falls so as to reduce the probability of missed voice detection. In addition, the dynamic adjustment of the first threshold A_0 brings the power consumption in quiet and noisy environments close together, which improves user experience and product competitiveness.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units or modules is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (16)

1. A voice wake-up method, characterized by comprising:
periodically sampling an audio signal, wherein a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer;
calculating an audio power T_i of the sampled signal y_i;
performing voice activity detection (VAD) when the audio power T_i is greater than or equal to the first threshold A0 at time t_i;
when the VAD detection fails, n consecutive detections before time t_i have failed, and the difference between a first noise energy S0 and the first threshold A0 at time t_i is greater than a preset first threshold value M0, generating a second threshold A1 according to the first noise energy S0, and using the second threshold A1 as the first threshold A0 at time t_(i+1), wherein the first noise energy S0 is obtained by extracting from the sampled signal y_i at a first extraction rate 1/x and applying slow tracking filtering to the extracted sample points ys, x is a natural number greater than 1, and n is a positive integer less than i.
2. The method according to claim 1, characterized in that generating the second threshold A1 according to the first noise energy S0 comprises:
using the first noise energy S0 as the second threshold A1;
or, using the sum of the first noise energy S0 and a preset first correction amount N0 as the second threshold A1;
or, using the product of the first noise energy S0 and a preset first coefficient a0 as the second threshold A1.
3. The method according to claim 1, characterized in that, after calculating the audio power T_i of the sampled signal y_i, the method further comprises:
performing VAD when the audio power T_i is less than the first threshold A0 at time t_i, and the difference between the first threshold A0 and a second noise energy F0 at each moment from time t_(i-m) to time t_i is greater than a preset second threshold value M1, wherein m is a positive integer less than i;
when the VAD detection succeeds, generating a third threshold A2 according to the second noise energy F0, and using the third threshold A2 as the first threshold A0 at time t_(i+1), wherein the second noise energy F0 is obtained by extracting from the sampled signal y_i at a second extraction rate 1/z and applying fast tracking filtering to the extracted sample points yf, and z is a natural number greater than x.
4. The method according to claim 3, characterized in that generating the third threshold A2 according to the second noise energy F0 comprises:
using the second noise energy F0 as the third threshold A2;
or, using the sum of the second noise energy F0 and a preset second correction amount N1 as the third threshold A2;
or, using the product of the second noise energy F0 and a preset second coefficient a1 as the third threshold A2.
5. The method according to claim 3 or 4, characterized in that, before using the third threshold A2 as the first threshold A0 at time t_(i+1), the method further comprises:
recording time t_i as a threshold-lowering moment;
when the time interval between time t_i and the previous threshold-lowering moment is greater than a preset value T_time, performing the step of using the third threshold A2 as the first threshold A0 at time t_(i+1); otherwise, not performing the step of using the third threshold A2 as the first threshold A0 at time t_(i+1).
6. The method according to claim 1, characterized in that, after calculating the audio power T_i of the sampled signal y_i, the method further comprises:
when the audio power T_i is less than the first threshold A0 at time t_i, and the difference between the first threshold A0 at time t_i and the first noise energy S0 is greater than a preset third threshold value M2, generating a fourth threshold A3 according to the first noise energy S0, and using the fourth threshold A3 as the first threshold A0 at time t_(i+1).
7. The method according to claim 6, characterized in that generating the fourth threshold A3 according to the first noise energy S0 comprises:
using the first noise energy S0 as the fourth threshold A3;
or, using the sum of the first noise energy S0 and a preset third correction amount N2 as the fourth threshold A3;
or, using the product of the first noise energy S0 and a preset third coefficient a2 as the fourth threshold A3.
8. The method according to claim 6 or 7, characterized in that, before using the fourth threshold A3 as the first threshold A0 at time t_(i+1), the method further comprises:
recording time t_i as a threshold-lowering moment;
when the time interval between time t_i and the previous threshold-lowering moment is greater than a preset value T_time, performing the step of using the fourth threshold A3 as the first threshold A0 at time t_(i+1); otherwise, not performing the step of using the fourth threshold A3 as the first threshold A0 at time t_(i+1).
9. A voice wake-up device, characterized by comprising:
a sampling rate converter (SRC), configured to periodically sample an audio signal, wherein a sampled signal y_i is obtained by sampling at time t_i, and i is a positive integer;
a computing circuit, configured to calculate an audio power T_i of the sampled signal y_i;
a threshold decision circuit, configured to determine whether the audio power T_i is greater than or equal to the first threshold A0 at time t_i, and, when the audio power T_i is greater than or equal to the first threshold A0 at time t_i, to trigger an interrupt processing circuit to output an interrupt pulse signal to an interrupt control circuit, so that a processor is enabled through the interrupt control circuit to perform voice activity detection (VAD);
a first extractor, whose input terminal is coupled to the output terminal of the SRC, configured to extract from the sampled signal y_i at a first extraction rate 1/x to obtain sample points ys, wherein x is a natural number greater than 1;
a slow tracking filter (STF), whose input terminal is coupled to the output terminal of the first extractor, configured to apply slow tracking filtering to the extracted sample points ys to obtain a first noise energy S0;
a comparator, whose input terminal is coupled to the output terminal of the first extractor and to the threshold decision circuit, configured to compare whether the difference between the first noise energy S0 and the first threshold A0 at time t_i is greater than a preset first threshold value M0;
a configurator, configured to, when the VAD detection fails, n consecutive detections before time t_i have failed, and the difference between the first noise energy S0 and the first threshold A0 at time t_i is greater than the preset first threshold value M0, generate a second threshold A1 according to the first noise energy S0, and deliver the second threshold A1 to the threshold decision circuit as the first threshold A0 at time t_(i+1), wherein n is a positive integer less than i.
10. The device according to claim 9, characterized in that the configurator is specifically configured to:
use the first noise energy S0 as the second threshold A1;
or, use the sum of the first noise energy S0 and a preset first correction amount N0 as the second threshold A1;
or, use the product of the first noise energy S0 and a preset first coefficient a0 as the second threshold A1.
11. The device according to claim 9, characterized by further comprising:
a second extractor, whose input terminal is coupled to the output terminal of the SRC, configured to extract from the sampled signal y_i at a second extraction rate 1/z to obtain sample points yf, wherein z is a natural number greater than x;
a fast tracking filter (FTF), whose input terminal is coupled to the output terminal of the second extractor, configured to apply fast tracking filtering to the extracted sample points yf to obtain a second noise energy F0;
wherein the comparator, whose input terminal is coupled to the output terminal of the FTF, is further configured to, when the audio power T_i is less than the first threshold A0 at time t_i, compare whether the difference between the first threshold at each moment and the second noise energy F0 is greater than a preset second threshold value M1, and, when the difference between the first threshold A0 and the second noise energy F0 at each moment from time t_(i-m) to time t_i is greater than the preset second threshold value M1, to trigger the interrupt processing circuit to output an interrupt pulse signal to the interrupt control circuit, so that the processor is enabled through the interrupt control circuit to perform VAD, wherein m is a positive integer less than i;
and the configurator is further configured to, when the VAD detection succeeds, generate a third threshold A2 according to the second noise energy F0, and deliver the third threshold A2 to the threshold decision circuit as the first threshold A0 at time t_(i+1).
12. The device according to claim 11, characterized in that the configurator is specifically configured to:
use the second noise energy F0 as the third threshold A2;
or, use the sum of the second noise energy F0 and a preset second correction amount N1 as the third threshold A2;
or, use the product of the second noise energy F0 and a preset second coefficient a1 as the third threshold A2.
13. The device according to claim 11 or 12, characterized in that the configurator is further configured to:
record time t_i as a threshold-lowering moment;
when the time interval between time t_i and the previous threshold-lowering moment is greater than a preset value T_time, perform the step of using the third threshold A2 as the first threshold A0 at time t_(i+1); otherwise, not perform the step of using the third threshold A2 as the first threshold A0 at time t_(i+1).
14. The device according to claim 9, characterized in that the configurator is further configured to:
when the audio power T_i is less than the first threshold A0 at time t_i, and the difference between the first threshold A0 at time t_i and the first noise energy S0 is greater than a preset third threshold value M2, generate a fourth threshold A3 according to the first noise energy S0, and use the fourth threshold A3 as the first threshold A0 at time t_(i+1).
15. The device according to claim 14, characterized in that the configurator is specifically configured to:
use the first noise energy S0 as the fourth threshold A3;
or, use the sum of the first noise energy S0 and a preset third correction amount N2 as the fourth threshold A3;
or, use the product of the first noise energy S0 and a preset third coefficient a2 as the fourth threshold A3.
16. The device according to claim 14 or 15, characterized in that the configurator is further configured to:
record time t_i as a threshold-lowering moment;
when the time interval between time t_i and the previous threshold-lowering moment is greater than a preset value T_time, perform the step of using the fourth threshold A3 as the first threshold A0 at time t_(i+1); otherwise, not perform the step of using the fourth threshold A3 as the first threshold A0 at time t_(i+1).
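The threshold-lowering steps in claims 3 to 8 share one pattern: lower the first threshold A0 toward a noise estimate (F0 or S0) only when A0 exceeds that estimate by more than a preset threshold value, and only when enough time has passed since the previous lowering. The sketch below illustrates that pattern under stated assumptions: the function name `lower_threshold` and its parameters are hypothetical, the new threshold is taken as noise plus a correction amount (one of the three options in the claims), and the recorded threshold-lowering moment is updated only when a lowering is actually applied, which the claims leave open. For the claim 3 path, a caller would additionally require that VAD succeeded before calling it.

```python
from typing import Optional, Tuple


def lower_threshold(current: float, noise: float, margin: float,
                    now: float, last_lowered: Optional[float],
                    min_interval: float,
                    correction: float = 0.0) -> Tuple[float, Optional[float]]:
    """Return (new_threshold, new_last_lowered).

    current      -- first threshold A0 at time t_i
    noise        -- noise estimate (F0 for claim 3, S0 for claim 6)
    margin       -- preset threshold value (M1 or M2)
    min_interval -- preset value T_time between two lowerings
    """
    if current - noise <= margin:
        return current, last_lowered  # threshold is not far above the noise
    if last_lowered is not None and now - last_lowered <= min_interval:
        return current, last_lowered  # too soon after the previous lowering
    return noise + correction, now    # apply and record the lowering moment


# Hypothetical usage: A0 = 5.0, S0 = 1.0, M2 = 2.0, T_time = 3.0.
threshold, last = lower_threshold(5.0, 1.0, 2.0, now=10.0,
                                  last_lowered=None, min_interval=3.0)
```

The interval check keeps the threshold from dropping repeatedly in quick succession, in line with the slow-fall behaviour described for A0.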
CN201510549435.6A 2015-08-31 2015-08-31 A kind of voice awakening method and device Expired - Fee Related CN105261368B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510549435.6A CN105261368B (en) 2015-08-31 2015-08-31 A kind of voice awakening method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510549435.6A CN105261368B (en) 2015-08-31 2015-08-31 A kind of voice awakening method and device

Publications (2)

Publication Number Publication Date
CN105261368A CN105261368A (en) 2016-01-20
CN105261368B true CN105261368B (en) 2019-05-21

Family

ID=55101027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510549435.6A Expired - Fee Related CN105261368B (en) 2015-08-31 2015-08-31 A kind of voice awakening method and device

Country Status (1)

Country Link
CN (1) CN105261368B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3443440B1 (en) * 2016-04-11 2022-03-30 Hewlett-Packard Development Company, L.P. Waking computing devices based on ambient noise
CN106131292B (en) * 2016-06-03 2020-06-30 浙江云澎科技有限公司 Terminal wake-up setting method, wake-up method and corresponding system
CN106297777B (en) * 2016-08-11 2019-11-22 广州视源电子科技股份有限公司 A kind of method and apparatus waking up voice service
CN108447472B (en) 2017-02-16 2022-04-05 腾讯科技(深圳)有限公司 Voice wake-up method and device
CN108536412B (en) * 2017-03-06 2021-01-08 北京君正集成电路股份有限公司 Audio data acquisition method and equipment
CN108536413B (en) * 2017-03-06 2021-03-26 北京君正集成电路股份有限公司 Audio data acquisition method and equipment
CN109243431A (en) * 2017-07-04 2019-01-18 阿里巴巴集团控股有限公司 A kind of processing method, control method, recognition methods and its device and electronic equipment
CN109949831B (en) * 2017-12-20 2021-09-24 青岛海尔智能技术研发有限公司 Method and device for voice recognition in intelligent equipment and computer readable storage medium
CN108198558B (en) * 2017-12-28 2021-01-29 电子科技大学 Voice recognition method based on CSI data
CN109119082A (en) * 2018-10-22 2019-01-01 深圳锐越微技术有限公司 Voice wake-up circuit and electronic equipment
CN109473092B (en) * 2018-12-03 2021-11-16 珠海格力电器股份有限公司 Voice endpoint detection method and device
CN111261143B (en) * 2018-12-03 2024-03-22 嘉楠明芯(北京)科技有限公司 Voice wakeup method and device and computer readable storage medium
CN109671426B (en) * 2018-12-06 2021-01-29 珠海格力电器股份有限公司 Voice control method and device, storage medium and air conditioner
CN109360585A (en) 2018-12-19 2019-02-19 晶晨半导体(上海)股份有限公司 A kind of voice-activation detecting method
CN110048979B (en) * 2019-04-17 2021-05-14 电子科技大学 Multi-domain combined trigger device
CN110390934B (en) * 2019-06-25 2022-07-26 华为技术有限公司 Information prompting method and voice interaction terminal
CN110570861B (en) * 2019-09-24 2022-02-25 Oppo广东移动通信有限公司 Method and device for voice wake-up, terminal equipment and readable storage medium
CN111755002B (en) * 2020-06-19 2021-08-10 北京百度网讯科技有限公司 Speech recognition device, electronic apparatus, and speech recognition method
CN111816178A (en) * 2020-07-07 2020-10-23 云知声智能科技股份有限公司 Voice equipment control method, device and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1204766A (en) * 1997-03-25 1999-01-13 皇家菲利浦电子有限公司 Method and device for detecting voice activity
CN1540623A (en) * 2003-11-04 2004-10-27 清华大学 Threshold self-adaptive speech sound detection system
CN101320559A (en) * 2007-06-07 2008-12-10 华为技术有限公司 Sound activation detection apparatus and method
CN102194452A (en) * 2011-04-14 2011-09-21 西安烽火电子科技有限责任公司 Voice activity detection method in complex background noise
CN102314884A (en) * 2011-08-16 2012-01-11 捷思锐科技(北京)有限公司 Voice-activation detecting method and device
CN104216677A (en) * 2013-05-31 2014-12-17 塞瑞斯逻辑公司 Low-power voice gate for device wake-up

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7277853B1 (en) * 2001-03-02 2007-10-02 Mindspeed Technologies, Inc. System and method for a endpoint detection of speech for improved speech recognition in noisy environments
US7231348B1 (en) * 2005-03-24 2007-06-12 Mindspeed Technologies, Inc. Tone detection algorithm for a voice activity detector
US8503686B2 (en) * 2007-05-25 2013-08-06 Aliphcom Vibration sensor and acoustic voice activity detection system (VADS) for use with electronic systems
US9892729B2 (en) * 2013-05-07 2018-02-13 Qualcomm Incorporated Method and apparatus for controlling voice activation
US20140337030A1 (en) * 2013-05-07 2014-11-13 Qualcomm Incorporated Adaptive audio frame processing for keyword detection
US9240182B2 (en) * 2013-09-17 2016-01-19 Qualcomm Incorporated Method and apparatus for adjusting detection threshold for activating voice assistant function
CN104795076B (en) * 2014-01-21 2018-08-14 宁波远志立方能源科技有限公司 A kind of audio method for detecting
CN103888572B (en) * 2014-03-26 2018-08-31 努比亚技术有限公司 A kind of method that mobile terminal and its detection endanger noise

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"强跟踪求积分卡尔曼滤波算法";马丽丽 等;《计算机工程与设计》;20140531;第35卷(第5期);第2页第2段

Also Published As

Publication number Publication date
CN105261368A (en) 2016-01-20

Similar Documents

Publication Publication Date Title
CN105261368B (en) A kind of voice awakening method and device
US10909977B2 (en) Apparatus and method for power efficient signal conditioning for a voice recognition system
CN110244833B (en) Microphone assembly
WO2021139327A1 (en) Audio signal processing method, model training method, and related apparatus
CN104580699B (en) Acoustic control intelligent terminal method and device when a kind of standby
US9721560B2 (en) Cloud based adaptive learning for distributed sensors
TWI474317B (en) Signal processing apparatus and signal processing method
US9443508B2 (en) User programmable voice command recognition based on sparse features
US20150039310A1 (en) Method and Apparatus for Mitigating False Accepts of Trigger Phrases
AU2014243766A1 (en) Speech detection using low power microelectrical mechanical systems sensor
DE112015004522T5 (en) Acoustic device with low power consumption and method of operation
WO2020244257A1 (en) Method and system for voice wake-up, electronic device, and computer-readable storage medium
US11172312B2 (en) Acoustic activity detecting microphone
CN103543814B (en) Signal processing apparatus and signal processing method
CN103901782A (en) Sound control method, electronic device and sound control apparatus
CN110322880A (en) Vehicle-mounted terminal equipment and the method for waking up its multiple interactive voice program
CN106161726A (en) A kind of voice wakes up system and voice awakening method and mobile terminal up
CN106612367A (en) Speech wake method based on microphone and mobile terminal
US10236000B2 (en) Circuit and method for speech recognition
CN115810356A (en) Voice control method, device, storage medium and electronic equipment
CN111128164B (en) Control system for voice acquisition and recognition and implementation method thereof
CN205408096U (en) Digital microphone wind and electronic equipment
CN116705033A (en) System on chip for wireless intelligent audio equipment and wireless processing method
TW202026855A (en) Voice wake-up apparatus and method thereof
CN112885323A (en) Audio information processing method and device and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191211

Address after: 211200 No.1, Xiushan Middle Road, Lishui Economic Development Zone, Nanjing City, Jiangsu Province

Patentee after: NANJING ADVANCED BIOMATERIALS AND PROCESS EQUIPMENT RESEARCH INSTITUTE Co.,Ltd.

Address before: 510000 unit 2414-2416, building, No. five, No. 371, Tianhe District, Guangdong, China

Patentee before: GUANGDONG GAOHANG INTELLECTUAL PROPERTY OPERATION Co.,Ltd.

Effective date of registration: 20191211

Address after: 510000 unit 2414-2416, building, No. five, No. 371, Tianhe District, Guangdong, China

Patentee after: GUANGDONG GAOHANG INTELLECTUAL PROPERTY OPERATION Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190521

Termination date: 20200831
