CN103903634A

CN103903634A - Voice activation detection (VAD), and method and apparatus for the VAD

Info

Publication number: CN103903634A
Application number: CN201210570563.5A
Authority: CN
Inventors: 江东平; 袁浩; 朱长宝
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2012-12-25
Filing date: 2012-12-25
Publication date: 2014-07-02
Anticipated expiration: 2032-12-25
Also published as: CN109119096B; CN112992188A; CN103903634B; CN109119096A; CN112992188B

Abstract

The invention relates to voice activity detection (VAD) and to a method and apparatus for VAD. The method comprises: obtaining the sub-band signals and the spectral amplitude of the current frame; calculating, from the sub-band signals, the frame energy parameter and the spectral centroid feature parameter of the current frame; calculating the signal-to-noise ratio (SNR) parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the SNR sub-band energies; and calculating a VAD decision result from a tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter. The method and apparatus can improve detection accuracy for non-stationary noise (such as office noise) and for music.

Description

Voice activity detection, and method and apparatus for voice activity detection

Technical field

The present invention relates to voice activity detection (VAD) and to methods and apparatus for VAD, including background noise detection, tonality signal detection, correction of the current active-speech hangover frame count in the VAD decision, and adjustment of the SNR threshold in the VAD decision.

Background art

In a normal voice call, each user sometimes speaks and sometimes listens, so inactive-speech periods occur during the call. Under normal circumstances, the total inactive-speech time of the two parties exceeds 50% of their total speech coding time. During an inactive-speech period there is only background noise, which usually carries no useful information. Exploiting this fact, speech and audio signal processing uses a voice activity detection (VAD) algorithm to distinguish active speech from inactive speech and processes the two differently. Many modern speech coding standards, such as AMR and AMR-WB, support VAD. In terms of efficiency, however, the VADs of these coders do not perform well under all typical background noises; under non-stationary noise in particular, their efficiency is low. For music signals, these VADs sometimes produce detection errors, which cause a noticeable quality drop in the corresponding processing algorithms.

Summary of the invention

The technical problem to be solved by the present invention is to provide voice activity detection (VAD) and methods and apparatus for VAD (including background noise detection, tonality signal detection, correction of the current active-speech hangover frame count in the VAD decision, and adjustment of the SNR threshold in the VAD decision), so as to improve the accuracy of VAD.

To solve the above technical problem, the invention provides a voice activity detection (VAD) method, comprising:

Obtaining the sub-band signals and the spectral amplitude of the current frame;

Calculating the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the sub-band signals; calculating the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitude;

Calculating the signal-to-noise ratio (SNR) parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the SNR sub-band energies;

Calculating the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the tonality feature parameter of the current frame;

Calculating the VAD decision result from the tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter.

To solve the above technical problem, the invention provides a voice activity detection (VAD) apparatus, comprising:

A filter bank, for obtaining the sub-band signals of the current frame;

A spectral amplitude calculation unit, for obtaining the spectral amplitude of the current frame;

A feature parameter acquisition unit, for calculating the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the sub-band signals, and for calculating the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitude;

A flag calculation unit, for calculating the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the tonality feature parameter of the current frame;

An SNR calculation unit, for calculating the SNR parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the SNR sub-band energies;

A VAD decision unit, for calculating the VAD decision result from the tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter.

To solve the above technical problem, the invention provides a background noise detection method, comprising:

Obtaining the sub-band signals and the spectral amplitude of the current frame;

Calculating the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter from the sub-band signals, and calculating the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitude;

Performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter and the current frame energy parameter, to decide whether the current frame is background noise.

To solve the above technical problem, the invention provides a background noise detection apparatus, comprising:

A filter bank, for obtaining the sub-band signals of the current frame;

A feature parameter calculation unit, for calculating the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter from the sub-band signals, and for calculating the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitude;

A background noise judging unit, for performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter and the current frame energy parameter, to decide whether the current frame is background noise.

To solve the above technical problem, the invention provides a tonality signal detection method, comprising:

Obtaining the sub-band signals and the spectral amplitude of the current frame;

Calculating the spectral centroid feature parameter and the time-domain stability feature parameter from the sub-band signals, and calculating the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitude;

Deciding whether the current frame is a tonality signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the spectral centroid feature parameter.

To solve the above technical problem, the invention provides a tonality signal detection apparatus, comprising:

A filter bank, for obtaining the sub-band signals of the current frame;

A feature parameter calculation unit, for calculating the spectral centroid feature parameter and the time-domain stability feature parameter from the sub-band signals, and for calculating the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitude;

A tonality signal judging unit, for deciding whether the current frame is a tonality signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the spectral centroid feature parameter.

To solve the above technical problem, the invention provides a method for correcting the current active-speech hangover frame count in the VAD decision, comprising:

Calculating the long-term SNR lt_snr and the average full-band SNR SNR2_lt_ave;

Correcting the current active-speech hangover frame count according to the decision results of several previous frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameter of the current frame and the VAD decision result of the current frame.

To solve the above technical problem, the invention provides an apparatus for correcting the current active-speech hangover frame count in the VAD decision, comprising:

A long-term SNR calculation unit, for calculating the long-term SNR lt_snr;

An average full-band SNR calculation unit, for calculating the average full-band SNR SNR2_lt_ave;

An active-speech hangover frame count correction unit, for correcting the current active-speech hangover frame count according to the decision results of several previous frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameter of the current frame and the VAD decision result of the current frame.

To solve the above technical problem, the invention provides a method for adjusting the SNR threshold in the VAD decision, comprising:

Calculating the spectral centroid feature parameter of the current frame from the sub-band signals;

Calculating the ratio of the average long-term active-speech signal energy to the average long-term background noise energy, both computed at the previous frame, to obtain the long-term SNR lt_snr;

Adjusting the SNR threshold of the VAD decision according to the spectral centroid feature parameter, the long-term SNR, the number of preceding consecutive active-speech frames and the number of preceding consecutive noise frames continuous_noise_num.

To solve the above technical problem, the invention provides an apparatus for adjusting the SNR threshold in the VAD decision, comprising:

A feature parameter acquisition unit, for calculating the spectral centroid feature parameter of the current frame from the sub-band signals;

A long-term SNR calculation unit, for calculating the ratio of the average long-term active-speech signal energy to the average long-term background noise energy, both computed at the previous frame, to obtain the long-term SNR lt_snr;

An SNR threshold adjustment unit, for adjusting the SNR threshold of the VAD decision according to the spectral centroid feature parameter, the long-term SNR, the number of preceding consecutive active-speech frames and the number of preceding consecutive noise frames continuous_noise_num.

The method and apparatus of the invention overcome the shortcomings of existing VAD algorithms: they raise the VAD's detection efficiency for non-stationary noise while also improving the accuracy of music detection, so that speech and audio signal processing algorithms using this VAD achieve better performance.

Brief description of the drawings

Fig. 1 is a schematic diagram of Embodiment 1 of the VAD method of the present invention;

Fig. 2 is a schematic diagram of Embodiment 2 of the VAD method of the present invention;

Fig. 3 is a schematic diagram of the process of obtaining the VAD decision result in Embodiments 1 and 2 of the present invention;

Fig. 4 is a schematic diagram of the module structure of Embodiment 1 of the VAD apparatus of the present invention;

Fig. 5 is a schematic diagram of the module structure of Embodiment 2 of the VAD apparatus of the present invention;

Fig. 6 is a schematic diagram of the module structure of the VAD decision unit in the VAD apparatus of the present invention;

Fig. 7 is a schematic diagram of an embodiment of the background noise detection method of the present invention;

Fig. 8 is a schematic diagram of the module structure of the background noise detection apparatus of the present invention;

Fig. 9 is a schematic diagram of an embodiment of the tonality signal detection method of the present invention;

Fig. 10 is a schematic diagram of the module structure of the tonality signal detection apparatus of the present invention;

Fig. 11 is a schematic diagram of the module structure of the tonality signal judging unit of the tonality signal detection apparatus of the present invention;

Fig. 12 is a schematic diagram of an embodiment of the method for correcting the current active-speech hangover frame count in the VAD decision of the present invention;

Fig. 13 is a schematic diagram of the module structure of the apparatus for correcting the current active-speech hangover frame count in the VAD decision of the present invention;

Fig. 14 is a schematic diagram of an embodiment of the method for adjusting the SNR threshold in the VAD decision of the present invention;

Fig. 15 is a detailed flow chart of adjusting the SNR threshold according to the present invention;

Fig. 16 is a schematic diagram of the module structure of the apparatus for adjusting the SNR threshold in the VAD decision of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another.

Embodiment 1 of the voice activity detection (VAD) method of the present invention, as shown in Fig. 1, comprises:

Step 101: Obtain the sub-band signals and the spectral amplitude of the current frame;

This embodiment is described using an audio stream with a frame length of 20 ms and a sampling rate of 32 kHz as an example, so each frame contains 640 samples. The method of the invention applies equally to other frame lengths and sampling rates.

The time-domain signal of the current frame is input to a filter bank, and sub-band filtering is performed to obtain the filter bank sub-band signals;

This embodiment uses a 40-channel filter bank; the invention applies equally to filter banks with other numbers of channels.

The time-domain signal of the current frame is input to the 40-channel filter bank, and sub-band filtering yields the filter bank sub-band signals X[k, l] of 40 sub-bands at 16 time sample points, 0 ≤ k < 40, 0 ≤ l < 16, where k is the filter bank sub-band index, indicating the sub-band to which a coefficient belongs, and l is the time sample point index within each sub-band. The implementation steps are as follows:

101a: Store the 640 most recent audio signal samples in a data buffer.

101b: Shift the data in the buffer by 40 positions, so that the 40 oldest samples are shifted out, and store 40 new samples at positions 0 to 39.

Multiply the buffered data x by the window coefficients to obtain the array z:

z[n] = x[n] \cdot W_{qmf}[n]; 0 ≤ n < 640

where W_{qmf} denotes the filter bank window coefficients.

The following pseudocode computes the 80-point array u:

for (n = 0; n < 80; n++) {
    u[n] = 0;
    for (j = 0; j < 8; j++) {
        u[n] += z[n + j * 80];
    }
}

The arrays r and i are computed as:

r[n] = u[n] - u[79 - n]; 0 ≤ n < 40

i[n] = u[n] + u[79 - n]; 0 ≤ n < 40

The 40 complex sub-band samples at the l-th time sample point, X[k, l] = R(k) + iI(k), 0 ≤ k < 40, are then computed, where R(k) and I(k) are the real and imaginary parts of the coefficient of the filter bank sub-band signal X at the l-th time sample point:

R(k) = \sum_{n=0}^{39} r(n) \cos\left[\frac{\pi}{40}\left(k + \frac{1}{2}\right) n\right]; 0 ≤ k < 40

I(k) = \sum_{n=0}^{39} i(n) \cos\left[\frac{\pi}{40}\left(k + \frac{1}{2}\right) n\right]; 0 ≤ k < 40

101c: Repeat the computation of 101b until all data of the current frame has been filtered by the filter bank; the final output is the filter bank sub-band signal X[k, l].

101d: After the above computation, the filter bank sub-band signals X[k, l] of 40 sub-bands at 16 time sample points are obtained, 0 ≤ k < 40, 0 ≤ l < 16.
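Steps 101a to 101d can be sketched in Python as follows. This is a minimal illustration of one pass of step 101b (one time sample point), not a reference implementation: the function name `analyze_slot` is hypothetical, the window coefficients W_qmf are not given in the text and must be supplied by the caller, and the buffer is assumed to have been shifted and refilled already.

```python
import math

NUM_BANDS = 40   # filter bank channels used in this embodiment
BUF_LEN = 640    # length of the analysis buffer


def analyze_slot(x, window):
    """Compute the 40 complex sub-band samples X[k, l] for one time sample point.

    x      : buffer holding the 640 most recent samples (already shifted)
    window : 640 window coefficients W_qmf (not listed in the text)
    """
    if len(x) != BUF_LEN or len(window) != BUF_LEN:
        raise ValueError("buffer and window must both hold 640 values")
    # z[n] = x[n] * W_qmf[n]
    z = [xn * wn for xn, wn in zip(x, window)]
    # Fold the 640 windowed values into 80: u[n] = sum over j of z[n + 80 j]
    u = [sum(z[n + 80 * j] for j in range(8)) for n in range(80)]
    # r[n] = u[n] - u[79 - n], i[n] = u[n] + u[79 - n]
    r = [u[n] - u[79 - n] for n in range(NUM_BANDS)]
    i = [u[n] + u[79 - n] for n in range(NUM_BANDS)]
    out = []
    for k in range(NUM_BANDS):
        phase = math.pi / NUM_BANDS * (k + 0.5)
        # Both sums use the cosine kernel, as written in the equations above
        re = sum(r[n] * math.cos(phase * n) for n in range(NUM_BANDS))
        im = sum(i[n] * math.cos(phase * n) for n in range(NUM_BANDS))
        out.append(complex(re, im))
    return out
```

Calling this function 16 times per frame, once after each 40-sample shift of the buffer, would give the 40 × 16 array X[k, l] described in step 101d.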

Time-frequency transformation is applied to the filter bank sub-band signals, and the spectral amplitude is calculated.

The embodiments of the invention can be realized by applying the time-frequency transformation to all or only some of the filter bank sub-bands and calculating the spectral amplitude. The time-frequency transformation may be a DFT, FFT, DCT or DST. This embodiment uses the DFT as an example to describe a concrete implementation. The computation is as follows:

A 16-point DFT is applied to the 16 time sample points of each filter bank sub-band with index 0 to 9, which further improves spectral resolution, and the amplitude at each frequency bin is calculated to obtain the spectral amplitude X_{DFT_AMP}.

The time-frequency transformation is computed as:

X_{DFT}[k, j] = \sum_{l=0}^{15} X[k, l] \cdot e^{-\frac{2\pi i}{16} jl}; 0 ≤ k < 10; 0 ≤ j < 16

The amplitude at each frequency bin is calculated as follows:

First, the energy of the array X_{DFT}[k, j] at each point is calculated:

X_{DFT_POW}[k, j] = (real(X_{DFT}[k, j]))^{2} + (image(X_{DFT}[k, j]))^{2}; 0 ≤ k < 10; 0 ≤ j < 16

where real(X_{DFT}[k, j]) and image(X_{DFT}[k, j]) denote the real and imaginary parts of the spectral coefficient X_{DFT}[k, j].

If k is even, the spectral amplitude at each frequency bin is calculated as:

X_{DFT_AMP}[8 \cdot k + j] = \sqrt{X_{DFT_POW}[k, j] + X_{DFT_POW}[k, 15 - j]}; 0 ≤ k < 10; 0 ≤ j < 8

If k is odd, the spectral amplitude at each frequency bin is calculated as:

X_{DFT_AMP}[8 \cdot k + 7 - j] = \sqrt{X_{DFT_POW}[k, j] + X_{DFT_POW}[k, 15 - j]}; 0 ≤ k < 10; 0 ≤ j < 8

X_{DFT_AMP} is the spectral amplitude after the time-frequency transformation.
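The spectral amplitude computation can be sketched in Python as below. It is an illustrative sketch under the assumptions of this embodiment (a 16-point DFT on sub-bands 0 to 9); the function name `spectral_amplitude` and the nested-list layout of X are hypothetical choices, and a direct DFT is used for clarity rather than speed.

```python
import cmath
import math

NUM_DFT_BANDS = 10   # sub-bands 0..9 are transformed in this embodiment
NUM_SLOTS = 16       # time sample points per sub-band


def spectral_amplitude(X):
    """Return the 80 spectral amplitudes X_DFT_AMP.

    X[k][l] is the complex filter bank sample of sub-band k at time sample l.
    """
    amp = [0.0] * (8 * NUM_DFT_BANDS)
    for k in range(NUM_DFT_BANDS):
        # 16-point DFT along the time index l of sub-band k
        dft = [
            sum(X[k][l] * cmath.exp(-2j * math.pi * j * l / NUM_SLOTS)
                for l in range(NUM_SLOTS))
            for j in range(NUM_SLOTS)
        ]
        # X_DFT_POW[k, j]: squared real part plus squared imaginary part
        power = [v.real ** 2 + v.imag ** 2 for v in dft]
        for j in range(8):
            value = math.sqrt(power[j] + power[15 - j])
            # Even sub-bands keep the bin order, odd sub-bands are mirrored
            index = 8 * k + j if k % 2 == 0 else 8 * k + 7 - j
            amp[index] = value
    return amp
```

The mirrored indexing for odd k follows the two equations above; the sketch does not try to explain why the filter bank requires it.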

Step 102: Calculate the frame energy parameter and the spectral centroid feature parameter of the current frame from the sub-band signals;

The frame energy parameter, the spectral centroid feature parameter and the tonality feature parameter may be obtained by prior-art methods; preferably, each parameter is obtained as follows:

The frame energy parameter is the weighted or direct sum of the sub-band signal energies. Specifically:

a) Calculate the energy of each filter bank sub-band from the filter bank sub-band signals X[k, l]:

E_{sb}[k] = \sum_{l=0}^{15} \left( (real(X[k, l]))^{2} + (image(X[k, l]))^{2} \right); 0 ≤ k < 40

b) Accumulate the energies of some of the perceptually more sensitive filter bank sub-bands, or of all filter bank sub-bands, to obtain the frame energy parameter.

According to psychoacoustic models, the human ear is relatively insensitive to very low frequencies (e.g. below 100 Hz) and to high frequencies (above 20 kHz). With the filter bank sub-bands ordered from low to high frequency, the invention regards the second through the penultimate sub-bands as the main, perceptually more sensitive filter bank sub-bands. Accumulating the energies of some or all of these sub-bands gives frame energy parameter 1:

E_{t1} = \sum_{n=e\_sb\_start}^{e\_sb\_end} E_{sb}[n]

where e_sb_start is the starting sub-band index, with range [0, 6], and e_sb_end is the ending sub-band index, which is greater than 6 and less than the total number of sub-bands.

Frame energy parameter 2 is obtained by adding to frame energy parameter 1 a weighted sum of the energies of some or all of the filter bank sub-bands not used in calculating frame energy parameter 1:

E_{t2} = E_{t1} + e\_scale1 \cdot \sum_{n=0}^{e\_sb\_start-1} E_{sb}[n] + e\_scale2 \cdot \sum_{n=e\_sb\_end+1}^{num\_band} E_{sb}[n]

where e_scale1 and e_scale2 are weighting scale factors, each with range [0, 1], and num_band is the total number of sub-bands.
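The sub-band energies and the two frame energy parameters can be sketched in Python as follows. The function names are hypothetical, and the index and scale values in the usage note are illustrative picks from the stated ranges rather than values given in the text. The sketch assumes the second weighted sum covers every sub-band above e_sb_end that actually exists.

```python
def subband_energies(X):
    """E_sb[k]: energy of sub-band k summed over its time sample points."""
    return [sum(v.real ** 2 + v.imag ** 2 for v in row) for row in X]


def frame_energies(e_sb, e_sb_start, e_sb_end, e_scale1, e_scale2):
    """Return (E_t1, E_t2) for one frame.

    E_t1 sums the perceptually sensitive sub-bands e_sb_start..e_sb_end.
    E_t2 adds weighted energies of the sub-bands left out of E_t1.
    """
    e_t1 = sum(e_sb[e_sb_start:e_sb_end + 1])
    low = sum(e_sb[:e_sb_start])        # sub-bands below e_sb_start
    high = sum(e_sb[e_sb_end + 1:])     # assumed: all sub-bands above e_sb_end
    e_t2 = e_t1 + e_scale1 * low + e_scale2 * high
    return e_t1, e_t2
```

For example, `frame_energies(e_sb, 2, 37, 0.5, 0.5)` treats sub-bands 2 to 37 as the main band and gives the two lowest and two highest sub-bands half weight.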

The spectral centroid feature parameter is obtained either as the ratio of the weighted sum of the filter bank sub-band energies to their direct (unweighted) sum, or by smoothing-filtering other spectral centroid feature parameter values.

The spectral centroid feature parameter can be computed with the following sub-steps:

a: The sub-band intervals used for computing the spectral centroid feature parameters are divided as follows:

b: Using the interval division of sub-step a and the following formula, calculate two spectral centroid feature parameter values: the first-interval spectral centroid feature parameter and the second-interval spectral centroid feature parameter.

sp\_center[k] = \frac{\sum_{n=0}^{spc\_end\_band(k) - spc\_start\_band(k)} (n + 1) \cdot E_{sb}[n + spc\_start\_band(k)] + Delta1}{\sum_{n=0}^{spc\_end\_band(k) - spc\_start\_band(k)} E_{sb}[n + spc\_start\_band(k)] + Delta2}; 0 ≤ k < 2

Delta1 and Delta2 are small offsets, each with range (0, 1), and k is the spectral centroid index.

c: Apply smoothing filtering to the first-interval spectral centroid feature parameter sp_center[0] to obtain the smoothed spectral centroid feature parameter value, i.e. the smoothed value of the first-interval spectral centroid feature parameter:

sp\_center[2] = sp\_center_{-1}[2] \cdot spc\_sm\_scale + sp\_center[0] \cdot (1 - spc\_sm\_scale)

where spc_sm_scale is the smoothing filter scale factor for the spectral centroid parameter, and sp_center_{-1}[2] denotes the smoothed spectral centroid feature parameter value of the previous frame, whose initial value is 1.6.
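The spectral centroid and its smoothing can be sketched in Python as below. The interval table is not reproduced in the text, so the start and end band indices are passed in by the caller; the function names, the offsets and the smoothing factor used in the usage note are hypothetical values chosen only for illustration.

```python
def spectral_centroid(e_sb, start_band, end_band, delta1, delta2):
    """sp_center for one interval [start_band, end_band] of sub-band energies."""
    n_terms = end_band - start_band + 1
    # Weighted sum: the weight (n + 1) grows with the position inside the interval
    num = sum((n + 1) * e_sb[n + start_band] for n in range(n_terms)) + delta1
    # Direct (unweighted) sum of the same energies
    den = sum(e_sb[n + start_band] for n in range(n_terms)) + delta2
    return num / den


def smooth_centroid(prev_smoothed, sp_center0, spc_sm_scale):
    """sp_center[2]: first-order recursive smoothing of sp_center[0]."""
    return prev_smoothed * spc_sm_scale + sp_center0 * (1.0 - spc_sm_scale)
```

With flat energy over ten sub-bands, `spectral_centroid([1.0] * 40, 0, 9, 1e-6, 1e-6)` lies close to 5.5, the middle of the weights 1 to 10. Starting from the initial value 1.6 and an assumed spc_sm_scale of 0.7, `smooth_centroid(1.6, 5.5, 0.7)` moves the smoothed value only part of the way towards 5.5.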

Step 103: Calculate the SNR parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the SNR sub-band energies;

The background noise energy of the previous frame can be obtained by existing methods.

If the current frame is the start frame, the SNR sub-band background noise energy takes a default initial value. The SNR sub-band background noise energy of the previous frame is estimated on the same principle as that of the current frame; for the latter, see step 207 of Embodiment 2 below. The SNR parameter of the current frame can be calculated with an existing SNR calculation method. Preferably, the following method is used:

First, the filter bank sub-bands are regrouped into several SNR sub-bands, with the division indices given in the following table,

Second, the energy of each SNR sub-band of the current frame is calculated according to the SNR sub-band division:

E_{sb2}[n] = \sum_{k=Sub\_Start\_index(n)}^{Sub\_end\_index(n)} E_{sb}[k]; 0 ≤ n < 13

Third, the sub-band average SNR, SNR1, is calculated from the energy of each SNR sub-band of the current frame and the background noise energy of each SNR sub-band of the previous frame:

SNR1 = \frac{1}{num\_band} \sum_{n=0}^{num\_band-1} \log_{2} \frac{E_{sb2}(n)}{E_{sb2\_bg}(n)}

where E_{sb2_bg} is the estimated background noise energy of each SNR sub-band of the previous frame, and num_band is the number of SNR sub-bands. The background noise energy of the SNR sub-bands of the previous frame is obtained on the same principle as the SNR sub-band background noise energy of the current frame; for the latter, see step 207 of Embodiment 2 below;

Finally, the full-band SNR, SNR2, is calculated from the estimated full-band background noise energy of the previous frame and the frame energy parameter of the current frame:

SNR2 = \log_{2} \frac{E_{t1}}{E_{t\_bg}}

where E_{t_bg} is the estimated full-band background noise energy of the previous frame; it is obtained on the same principle as the full-band background noise energy of the current frame, for which see step 207 of Embodiment 2 below;

In this embodiment, the SNR parameter comprises the sub-band average SNR, SNR1, and the full-band SNR, SNR2. The full-band background noise energy and the background noise energy of each sub-band are collectively referred to as the background noise energy.
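The two SNR parameters can be sketched in Python as follows. The SNR sub-band division table is not reproduced in the text, so the start and end index lists are hypothetical inputs, as are the function names; the background noise energies are assumed to be strictly positive estimates carried over from the previous frame.

```python
import math


def snr_subband_energies(e_sb, start_index, end_index):
    """E_sb2[n]: sum of filter bank sub-band energies inside SNR sub-band n."""
    return [sum(e_sb[s:e + 1]) for s, e in zip(start_index, end_index)]


def snr_parameters(e_sb2, e_sb2_bg, e_t1, e_t_bg):
    """Return (SNR1, SNR2) for the current frame.

    e_sb2    : SNR sub-band energies of the current frame
    e_sb2_bg : SNR sub-band background noise energies from the previous frame
    e_t1     : frame energy parameter 1 of the current frame
    e_t_bg   : full-band background noise energy from the previous frame
    """
    num_band = len(e_sb2)
    # SNR1: mean over SNR sub-bands of the log2 energy-to-noise ratio
    snr1 = sum(math.log2(e / bg) for e, bg in zip(e_sb2, e_sb2_bg)) / num_band
    # SNR2: log2 ratio of the full-band frame energy to the full-band noise
    snr2 = math.log2(e_t1 / e_t_bg)
    return snr1, snr2
```

Because both measures use a base-2 logarithm, a frame whose energy is four times the noise estimate in every sub-band gives SNR1 = 2, and a full-band ratio of four gives SNR2 = 2.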

Step 104: Calculate the VAD decision result from the tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter.

Embodiment 2

In Embodiment 2 of the voice activity detection (VAD) method of the invention, the input audio signal is divided into frames and polyphase-filtered to obtain the filter bank sub-band signals; the filter bank sub-band signals are further time-frequency transformed and the spectral amplitude is calculated; signal features are extracted from the filter bank sub-band signals and from the spectral amplitude, respectively, to obtain the feature parameter values. The background noise flag and the tonality flag of the current frame are calculated from the feature parameter values. The SNR parameter of the current frame is calculated from the current frame energy parameter and the background noise energy, and whether the current frame is an active-speech frame is decided from the calculated SNR parameter of the current frame, the VAD decision result of the previous frame and the feature parameters. The background noise flag is corrected according to the active-speech frame decision result to obtain a new background noise flag, and whether to update the background noise is decided according to the new background noise flag. The detailed VAD process is as follows:

As shown in Fig. 2, Embodiment 2 of the method comprises:

Step 201: Obtain the sub-band signals and the spectral amplitude of the current frame;

Step 202: Calculate the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the sub-band signals; calculate the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitude;

The frame energy parameter is the weighted or direct sum of the sub-band signal energies;

The spectral centroid feature parameter is the ratio of the weighted accumulated value of the energies of all or some of the sub-band signals to their unweighted accumulated value;

Specifically,

The spectral centroid feature parameter is calculated from the energy of each filter bank sub-band, either as the ratio of the weighted sum of the filter bank sub-band energies to their direct (unweighted) sum, or by smoothing-filtering other spectral centroid feature parameter values.

sp\_center[k] = \frac{\sum_{n=0}^{spc\_end\_band(k) - spc\_start\_band(k)} (n + 1) \cdot E_{sb}[n + spc\_start\_band(k)] + Delta1}{\sum_{n=0}^{spc\_end\_band(k) - spc\_start\_band(k)} E_{sb}[n + spc\_start\_band(k)] + Delta2}; 0 ≤ k < 2

sp\_center[2] = sp\_center_{-1}[2] \cdot spc\_sm\_scale + sp\_center[0] \cdot (1 - spc\_sm\_scale)

where spc_sm_scale is the smoothing filter scale factor for the spectral centroid parameter, and sp_center_{-1}[2] denotes the smoothed spectral centroid feature parameter value of the previous frame, whose initial value is 1.6.

The time-domain stability characteristic parameter is the ratio of the variance of the amplitude superposition values to the expectation of the squared amplitude superposition values, or this ratio multiplied by a coefficient;

Specifically,

The time-domain stability characteristic parameter is calculated from the frame energy parameters of the most recent frames. In this embodiment, the frame energy parameters of the 40 most recent frames are used. The calculation steps are as follows:

First, calculate the energy amplitudes of the 40 most recent frames, using the following equation:

Amp_{t1}[n] = \sqrt{E_{t2}(n)} + e\_offset;

0 \le n < 40;

where e_offset is an offset whose value range is [0, 0.1].

Next, working back from the current frame to the 40th most recent frame, add the energy amplitudes of each pair of adjacent frames to obtain 20 amplitude superposition values. The equation is as follows:

Amp_{t2}(n) = Amp_{t1}(-2n) + Amp_{t1}(-2n-1); 0 \le n < 20;

where Amp_{t1}(n) with n = 0 is the energy amplitude of the current frame and, for n < 0, Amp_{t1}(n) is the energy amplitude of the frame |n| frames before the current frame.

Finally, calculate the ratio of the variance of the 20 most recent amplitude superposition values to their average energy, obtaining the time-domain stability characteristic parameter ltd_stable_rate0. The equation is as follows:

ltd\_stable\_rate0 = \frac{\sum_{n=0}^{19} \left( Amp_{t2}(n) - \frac{1}{20} \sum_{j=0}^{19} Amp_{t2}(j) \right)^{2}}{\sum_{n=0}^{19} Amp_{t2}(n)^{2} + \Delta};
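The three steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the list layout (index 0 is the current frame, index i is the frame i frames earlier), and the default values of e_offset and delta are assumptions.

```python
import math


def time_domain_stability(frame_energies, e_offset=0.05, delta=1e-4):
    """ltd_stable_rate0 from the frame energies of the 40 most recent frames.

    frame_energies[0] is the current frame, frame_energies[i] is i frames back.
    e_offset must lie in [0, 0.1]; delta is a small assumed constant.
    """
    if len(frame_energies) < 40:
        raise ValueError("need the frame energies of at least 40 frames")
    # Step 1: energy amplitude of each of the 40 most recent frames
    amp_t1 = [math.sqrt(e) + e_offset for e in frame_energies[:40]]
    # Step 2: add adjacent pairs, giving 20 amplitude superposition values
    amp_t2 = [amp_t1[2 * n] + amp_t1[2 * n + 1] for n in range(20)]
    # Step 3: summed squared deviation from the mean over summed squares
    mean = sum(amp_t2) / 20.0
    numerator = sum((a - mean) ** 2 for a in amp_t2)
    denominator = sum(a * a for a in amp_t2) + delta
    return numerator / denominator
```

A stationary signal gives a value close to 0; a signal whose level fluctuates from frame pair to frame pair gives a larger value.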

The spectral flatness characteristic parameter is the ratio of the geometric mean to the arithmetic mean of a number of spectral amplitudes, or this ratio multiplied by a coefficient;

Specifically, the spectral amplitudes X_{DFT_AMP} are divided into several frequency bands, and the spectral flatness of each band of the current frame is calculated to obtain the spectral flatness characteristic parameters of the current frame.

In this embodiment the spectral amplitudes are divided into 3 frequency bands and the spectral flatness of each of these 3 bands is calculated. The implementation steps are as follows:

First, divide X_{DFT_AMP} into 3 frequency bands according to the indices in the following table.

Next, calculate the spectral flatness of each frequency band to obtain the spectral flatness characteristic parameters of the current frame. The equation for each spectral flatness characteristic parameter value of the current frame is as follows:

SMR(k) = \frac{\left( \prod_{n \in Freq\_band(k)} X_{DFT\_AMP}(n) \right)^{1 / (freq\_band\_end(k) - freq\_band\_start(k) + 1)}}{\frac{1}{freq\_band\_end(k) - freq\_band\_start(k) + 1} \sum_{n \in Freq\_band(k)} X_{DFT\_AMP}(n)};

0 \le k < 3

Finally, smooth the spectral flatness characteristic parameters of the current frame to obtain the final spectral flatness characteristic parameters of the current frame.

sSMR(k) = smr\_scale \cdot sSMR_{-1}(k) + (1 - smr\_scale) \cdot SMR(k); 0 \le k < 3

where smr_scale is the smoothing factor, with value range [0.6, 1], and sSMR_{-1}(k) is the value of the k-th spectral flatness characteristic parameter of the previous frame.
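A minimal sketch of the per-band spectral flatness and its smoothing is given below for illustration. The function names and the default smr_scale are assumptions, and the band limits must be supplied by the caller because the band index table is not reproduced in this passage. The sketch also assumes a non-empty band whose amplitudes are not all zero.

```python
import math


def spectral_flatness(x_amp, band_start, band_end):
    """SMR(k): geometric mean over arithmetic mean of one band.

    x_amp      : list of spectral amplitudes X_DFT_AMP
    band_start : freq_band_start(k) (caller-supplied, hypothetical)
    band_end   : freq_band_end(k), inclusive (caller-supplied, hypothetical)
    """
    band = x_amp[band_start:band_end + 1]
    count = len(band)
    geometric_mean = math.prod(band) ** (1.0 / count)
    arithmetic_mean = sum(band) / count
    return geometric_mean / arithmetic_mean


def smooth_flatness(prev_ssmr, smr, smr_scale=0.7):
    """sSMR(k): first-order smoothing, smr_scale in [0.6, 1] (default assumed)."""
    return smr_scale * prev_ssmr + (1.0 - smr_scale) * smr
```

A perfectly flat band gives SMR(k) = 1, and a band dominated by a few peaks gives a value well below 1. For long bands a sum of logarithms would be numerically safer than math.prod.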

The tonality characteristic parameter is obtained by calculating the correlation between the intra-frame spectral difference coefficients of two consecutive frames, or by further smoothing this correlation value.

Specifically, the correlation between the intra-frame spectral difference coefficients of two consecutive frames is calculated as follows:

The tonality characteristic parameter is calculated from the spectral amplitudes, using either all of the spectral amplitudes or some of them.

The calculation steps are as follows:

a. Take the difference between adjacent spectral amplitudes over some (no fewer than 8 spectral coefficients) or all of the spectral amplitudes, and set every difference that is less than 0 to 0, obtaining a set of non-negative spectral difference coefficients.

This embodiment takes the frequency-domain coefficients with position indices 3 to 61 as an example for calculating the tonality characteristic parameter. The detailed procedure is as follows:

Take the difference between adjacent spectral amplitudes from frequency bin 3 to frequency bin 61, using the following equation:

spec\_dif[n-3] = X_{DFT\_AMP}(n+1) - X_{DFT\_AMP}(n); 3 \le n < 62;

Set every element of spec_dif that is less than 0 to zero.

b. Calculate the correlation coefficient between the non-negative spectral difference coefficients of the current frame obtained in step a and the non-negative spectral difference coefficients of the previous frame, obtaining the first tonality characteristic parameter value. The equation is as follows:

tonality\_rate1 = \frac{\sum_{n=0}^{56} spec\_dif[n] \cdot pre\_spec\_dif[n]}{\sqrt{\sum_{n=0}^{56} spec\_dif[n]^{2} \cdot \sum_{n=0}^{56} pre\_spec\_dif[n]^{2}}}

where pre_spec_dif denotes the non-negative spectral difference coefficients of the previous frame.

c. Smooth the first tonality characteristic parameter value to obtain the second tonality characteristic parameter value. The equation is as follows:

tonality\_rate2 = tonal\_scale \cdot tonality\_rate2_{-1} + (1 - tonal\_scale) \cdot tonality\_rate1

where tonal_scale is the tonality characteristic parameter smoothing factor, with value range [0.1, 1], and tonality_rate2_{-1} is the second tonality characteristic parameter value of the previous frame, whose initial value lies in [0, 1].
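Steps a to c can be sketched as below, for illustration only. The function names, the default tonal_scale, and the choice to return 0 when either frame has no positive spectral difference are assumptions; the correlation is summed over n = 0 to 56 exactly as written in the equation for tonality_rate1.

```python
import math


def nonneg_spec_diff(x_amp, first=3, last=61):
    """Step a: spec_dif[n - first] = X(n + 1) - X(n), negatives set to 0.

    x_amp must contain at least last + 2 spectral amplitudes.
    """
    return [max(x_amp[n + 1] - x_amp[n], 0.0) for n in range(first, last + 1)]


def tonality_rate1(spec_dif, pre_spec_dif, count=57):
    """Step b: normalized correlation over the first `count` coefficients."""
    cur = spec_dif[:count]
    pre = pre_spec_dif[:count]
    numerator = sum(c * p for c, p in zip(cur, pre))
    denominator = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in pre))
    # Assumption: treat an all-zero frame as having no tonality
    return numerator / denominator if denominator > 0.0 else 0.0


def tonality_rate2(prev_rate2, rate1, tonal_scale=0.8):
    """Step c: first-order smoothing, tonal_scale in [0.1, 1] (default assumed)."""
    return tonal_scale * prev_rate2 + (1.0 - tonal_scale) * rate1
```

Two consecutive frames with identical spectral peaks give tonality_rate1 = 1, which is the behaviour expected of a stationary tonal signal such as music.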

Step 203: calculate the signal-to-noise ratio (SNR) parameters of the current frame from the background noise energy estimated from the previous frame, the frame energy parameter of the current frame, and the SNR sub-band energies;

Step 204: calculate the initial background noise flag and the tonality flag of the current frame from the frame energy parameter, the spectrum gravity center characteristic parameter, the time-domain stability characteristic parameter, the spectral flatness characteristic parameter, and the tonality characteristic parameter of the current frame;

Step 205: calculate the VAD decision result from the tonality flag, the SNR parameters, the spectrum gravity center characteristic parameter, and the frame energy parameter;

Specifically, the implementation of step 205 is described below with reference to Fig. 3.

It will be understood that the steps preceding the VAD decision of step 205 may be reordered as long as their parameters have no causal dependence on one another; for example, step 204, which obtains the initial background noise flag and the tonality flag, may be performed before the SNR calculation of step 203.

The initial background noise flag of the current frame must be corrected before it is used to calculate the SNR parameters of the next frame, so the operation of obtaining the initial background noise flag of the current frame may also be performed after the VAD decision.

Step 206: correct the initial background noise flag according to the VAD decision result of the current frame, the tonality characteristic parameter, the SNR parameters, the tonality flag, and the time-domain stability characteristic parameter;

If the SNR parameter SNR2 is less than a set threshold SNR2_redec_thr1, SNR1 is less than SNR1_redec_thr1, the VAD flag vad_flag equals 0, the tonality characteristic parameter tonality_rate2 is less than tonality_rate2_thr1, the tonality flag tonality_flag equals 0, and the time-domain stability characteristic parameter lt_stable_rate0 is less than lt_stable_rate0_redec_thr1 (set to 0.1), the background noise flag is set to 1.

Step 207: obtain the background noise energy of the current frame from the corrected background noise flag, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame; the background noise energy of the current frame is used to calculate the SNR parameters of the next frame.

Whether to update the background noise is determined by the background noise flag: if the flag is 1, the background noise is updated according to the ratio of the estimated full-band background noise energy to the energy of the current frame signal. Background noise energy estimation comprises sub-band background noise energy estimation and full-band background noise energy estimation.

a. The sub-band background noise energy is estimated as follows:

E_{sb2\_bg}(k) = E_{sb2\_bg\_pre}(k) \cdot \alpha_{bg\_e} + E_{sb2\_bg}(k) \cdot (1 - \alpha_{bg\_e}); 0 \le k < num\_sb

where num_sb is the number of frequency-domain sub-bands and E_{sb2_bg_pre}(k) is the sub-band background noise energy of the k-th SNR sub-band of the previous frame.

\alpha_{bg\_e} is the background noise update factor; its value is determined by the full-band background noise energy of the previous frame and the frame energy parameter of the current frame, as follows:

If the full-band background noise energy E_{t_bg} of the previous frame is less than the frame energy parameter E_{t1} of the current frame, the factor takes the value 0.96; otherwise it takes the value 0.95.

b. Full-band background noise energy estimation:

If the background noise flag of the current frame is 1, update the background noise energy accumulation value E_{t_sum} and the accumulated background noise frame count N_{Et_counter} as follows:

E_{t\_sum} = E_{t\_sum\_-1} + E_{t1};

N_{Et\_counter} = N_{Et\_counter\_-1} + 1;

where E_{t_sum_-1} is the background noise energy accumulation value of the previous frame and N_{Et_counter_-1} is the accumulated background noise frame count calculated for the previous frame.

c. The full-band background noise energy is the ratio of the background noise energy accumulation value E_{t_sum} to the accumulated frame count N_{Et_counter}:

E_{t\_bg} = \frac{E_{t\_sum}}{N_{Et\_counter}}

Check whether N_{Et_counter} equals 64; if it does, multiply both the background noise energy accumulation value E_{t_sum} and the accumulated frame count N_{Et_counter} by 0.75.

d. Adjust the sub-band background noise energies and the background noise energy accumulation value according to the tonality flag, the frame energy parameter, and the full-band background noise energy, as follows:

If the tonality flag tonality_flag equals 1 and the frame energy parameter E_{t1} is less than the background noise energy characteristic parameter E_{t_bg} multiplied by a gain coefficient gain, then

E_{t\_sum} = E_{t\_sum} \cdot gain + delta; E_{sb2\_bg}(k) = E_{sb2\_bg}(k) \cdot gain + delta;

where the value range of gain is [0.3, 1].
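The update in items a to d can be sketched as follows. This is an illustration under stated assumptions rather than the embodiment itself: the class and function names and the default gain and delta are hypothetical, and the second term of the sub-band update is read here as the SNR sub-band energy of the current frame, because the equation as printed shows the same symbol on both sides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackgroundState:
    e_sb2_bg: List[float]      # sub-band background noise energies
    e_t_sum: float = 0.0       # background noise energy accumulation value
    n_et_counter: int = 0      # accumulated background noise frame count
    e_t_bg: float = 0.0        # full-band background noise energy


def update_background(state, e_sb2, e_t1, background_flag, tonality_flag,
                      gain=0.5, delta=1e-6):
    """One frame of background noise energy estimation (items a to d)."""
    if background_flag == 1:
        # a. sub-band update; alpha depends on the previous full-band estimate
        alpha = 0.96 if state.e_t_bg < e_t1 else 0.95
        state.e_sb2_bg = [prev * alpha + cur * (1.0 - alpha)
                          for prev, cur in zip(state.e_sb2_bg, e_sb2)]
        # b. full-band accumulation
        state.e_t_sum += e_t1
        state.n_et_counter += 1
    # c. full-band estimate, then decay of the accumulators at 64 frames
    if state.n_et_counter > 0:
        state.e_t_bg = state.e_t_sum / state.n_et_counter
    if state.n_et_counter == 64:
        state.e_t_sum *= 0.75
        state.n_et_counter = int(state.n_et_counter * 0.75)
    # d. pull the estimates down when a tonal frame is below the noise floor
    if tonality_flag == 1 and e_t1 < state.e_t_bg * gain:
        state.e_t_sum = state.e_t_sum * gain + delta
        state.e_sb2_bg = [e * gain + delta for e in state.e_sb2_bg]
    return state
```

Scaling the accumulation value and the frame count by the same factor leaves their ratio unchanged while giving later frames more weight in the average.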

In embodiments 1 and 2, the procedure for calculating the VAD decision result from the tonality flag, the SNR parameters, the spectrum gravity center characteristic parameter, and the frame energy parameter is shown in Fig. 3 and comprises the following steps:

Step 301: calculate the long-term SNR lt_snr as the ratio of the long-term average active-speech signal energy to the long-term average background noise energy calculated for the previous frame;

The calculation and definition of the long-term average active-speech signal energy E_{fg} and the long-term average background noise energy E_{bg} are given in step 307. The long-term SNR lt_snr is calculated as follows:

In this formula, the long-term SNR lt_snr is expressed logarithmically.

Step 302: calculate the mean of the full-band SNR SNR2 over the most recent frames to obtain the average full-band SNR SNR2_lt_ave;

The equation is as follows:

SNR2\_lt\_ave = \frac{1}{F\_num} \sum_{n=0}^{F\_num - 1} SNR2(n)

where SNR2(n) is the full-band SNR SNR2 of the frame n frames before the current frame, and F_num is the total number of frames used to calculate the mean, with value range [8, 64].
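As a sketch, the average full-band SNR can be maintained with a fixed-length history of the F_num most recent SNR2 values. The class name, the default f_num, and the choice to average over fewer values until the history is full are assumptions made for illustration.

```python
from collections import deque


class Snr2Average:
    """Running mean of SNR2 over the F_num most recent frames."""

    def __init__(self, f_num=32):
        # f_num must lie in [8, 64] according to the text; 32 is an assumed default
        self.history = deque(maxlen=f_num)

    def update(self, snr2):
        """Add the SNR2 of the current frame and return SNR2_lt_ave."""
        self.history.append(snr2)
        # Assumption: before f_num frames are available, average what exists
        return sum(self.history) / len(self.history)
```

The deque discards the oldest value automatically once f_num values are stored, so each call costs at most f_num additions.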

Step 303: obtain the SNR threshold snr_thr for the VAD decision from the spectrum gravity center characteristic parameter, the long-term SNR lt_snr, the number of preceding consecutive active-speech frames continuous_speech_num, and the number of preceding consecutive noise frames continuous_noise_num;

The implementation steps are as follows:

First, set the initial value of the SNR threshold snr_thr; its range is [0.1, 2], and it is preferably 1.06.

Second, make a first adjustment to snr_thr according to the spectrum gravity center characteristic parameter, as follows: if sp_center[2] is greater than a set threshold spc_vad_dec_thr1, add an offset to snr_thr, preferably 0.05; otherwise, if sp_center[1] is greater than spc_vad_dec_thr2, add an offset to snr_thr, preferably 0.10; otherwise, add an offset to snr_thr, preferably 0.40. The value range of the thresholds spc_vad_dec_thr1 and spc_vad_dec_thr2 is [1.2, 2.5].

Third, make a second adjustment to snr_thr according to continuous_speech_num, continuous_noise_num, the average full-band SNR SNR2_lt_ave, and the long-term SNR lt_snr. If continuous_speech_num is greater than a set threshold cpn_vad_dec_thr1, subtract 0.2 from snr_thr; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr2 and SNR2_lt_ave is greater than an offset plus lt_snr multiplied by the coefficient lt_tsnr_scale, add an offset to snr_thr, preferably 0.1; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr3, add an offset to snr_thr, preferably 0.2; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr4, add an offset to snr_thr, preferably 0.1. The value range of the thresholds cpn_vad_dec_thr1, cpn_vad_dec_thr2, cpn_vad_dec_thr3, and cpn_vad_dec_thr4 is [2, 500], and the value range of the coefficient lt_tsnr_scale is [0, 2]. The invention can also be implemented by skipping this step and proceeding directly to the final step.

Finally, adjust snr_thr once more according to the value of the long-term SNR lt_snr to obtain the SNR threshold snr_thr of the current frame.

The correction equation is as follows:

snr\_thr = snr\_thr + (lt\_tsnr - thr\_offset) \cdot thr\_scale;

where thr_offset is an offset with value range [0.5, 3], and thr_scale is a gain coefficient with value range [0.1, 1].
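The threshold calculation of step 303 can be sketched as a single function. The function name and every default threshold below are hypothetical values chosen inside the stated ranges; snr2_offset stands for the unnamed offset in the second adjustment, for which the text gives no range.

```python
def snr_threshold(sp_center, lt_snr, snr2_lt_ave,
                  continuous_speech_num, continuous_noise_num,
                  snr_thr_init=1.06,
                  spc_vad_dec_thr1=1.8, spc_vad_dec_thr2=1.6,
                  cpn_vad_dec_thr=(8, 64, 32, 16),
                  snr2_offset=1.0, lt_tsnr_scale=0.5,
                  thr_offset=1.5, thr_scale=0.2):
    """Return snr_thr for the current frame (all defaults are assumed values)."""
    snr_thr = snr_thr_init
    # First adjustment: spectrum gravity center
    if sp_center[2] > spc_vad_dec_thr1:
        snr_thr += 0.05
    elif sp_center[1] > spc_vad_dec_thr2:
        snr_thr += 0.10
    else:
        snr_thr += 0.40
    # Second adjustment (optional step): run lengths of speech and noise
    thr1, thr2, thr3, thr4 = cpn_vad_dec_thr
    if continuous_speech_num > thr1:
        snr_thr -= 0.2
    elif (continuous_noise_num > thr2
          and snr2_lt_ave > snr2_offset + lt_snr * lt_tsnr_scale):
        snr_thr += 0.1
    elif continuous_noise_num > thr3:
        snr_thr += 0.2
    elif continuous_noise_num > thr4:
        snr_thr += 0.1
    # Final adjustment: long-term SNR
    return snr_thr + (lt_snr - thr_offset) * thr_scale
```

With these assumed defaults, a long run of speech lowers the threshold so that speech is retained more readily, while a long run of noise raises it.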

Step 304: calculate the initial VAD decision from the VAD decision threshold snr_thr and the SNR parameters SNR1 and SNR2 calculated for the current frame;

The procedure is as follows:

If SNR1 is greater than the decision threshold snr_thr, the current frame is judged to be an active-speech frame. The value of the VAD flag vad_flag indicates whether the current frame is an active-speech frame; in this embodiment, the value 1 indicates an active-speech frame and 0 indicates an inactive frame. Otherwise, the current frame is judged to be an inactive frame and vad_flag is set to 0.

If SNR2 is greater than a set threshold snr2_thr, the current frame is judged to be an active-speech frame and vad_flag is set to 1. The value range of snr2_thr is [1.2, 5.0].
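For illustration, the initial decision of step 304 reduces to two comparisons. The function name and the default snr2_thr are assumptions; the default merely lies inside the stated range [1.2, 5.0].

```python
def initial_vad(snr1, snr2, snr_thr, snr2_thr=2.0):
    """Return vad_flag: 1 for an active-speech frame, 0 for an inactive frame."""
    vad_flag = 1 if snr1 > snr_thr else 0
    # A high full-band SNR forces an active decision regardless of SNR1
    if snr2 > snr2_thr:
        vad_flag = 1
    return vad_flag
```

The second test can only raise the flag, never clear it.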

Step 305: correct the VAD decision result according to the tonality flag, the average full-band SNR SNR2_lt_ave, the spectrum gravity center, and the long-term SNR lt_snr;

The specific steps are as follows:

If the tonality flag indicates that the current frame is a tonal signal, that is, tonality_flag is 1, the current frame is judged to be an active-speech signal and vad_flag is set to 1.

If the average full-band SNR SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr1 plus the long-term SNR lt_snr multiplied by the coefficient lt_tsnr_tscale, the current frame is judged to be an active-speech frame and vad_flag is set to 1.

In this embodiment, the value range of SNR2_lt_ave_t_thr1 is [1, 4], and the value range of lt_tsnr_tscale is [0.1, 0.6].

If the average full-band SNR SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr2, the spectrum gravity center characteristic parameter sp_center[2] is greater than a set threshold sp_center_t_thr1, and the long-term SNR lt_snr is less than a set threshold lt_tsnr_t_thr1, the current frame is judged to be an active-speech frame and vad_flag is set to 1. The value range of SNR2_lt_ave_t_thr2 is [1.0, 2.5], the value range of sp_center_t_thr1 is [2.0, 4.0], and the value range of lt_tsnr_t_thr1 is [2.5, 5.0].

If SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr3, the spectrum gravity center characteristic parameter sp_center[2] is greater than a set threshold sp_center_t_thr2, and the long-term SNR lt_snr is less than a set threshold lt_tsnr_t_thr2, the current frame is judged to be an active-speech frame and vad_flag is set to 1. The value range of SNR2_lt_ave_t_thr3 is [0.8, 2.0], the value range of sp_center_t_thr2 is [2.0, 4.0], and the value range of lt_tsnr_t_thr2 is [2.5, 5.0].

If SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr4, the spectrum gravity center characteristic parameter sp_center[2] is greater than a set threshold sp_center_t_thr3, and the long-term SNR lt_snr is less than a set threshold lt_tsnr_t_thr3, the current frame is judged to be an active-speech frame and vad_flag is set to 1. The value range of SNR2_lt_ave_t_thr4 is [0.6, 2.0], the value range of sp_center_t_thr3 is [3.0, 6.0], and the value range of lt_tsnr_t_thr3 is [2.5, 5.0].
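The corrections of step 305 all set vad_flag to 1 and never clear it, so they can be sketched as a chain of tests. The function name and all default thresholds are hypothetical values picked inside the stated ranges; each tuple in rules holds one (SNR2_lt_ave threshold, sp_center threshold, lt_snr threshold) triple.

```python
def correct_vad(vad_flag, tonality_flag, snr2_lt_ave, sp_center2, lt_snr,
                snr2_lt_ave_t_thr1=2.0, lt_tsnr_tscale=0.3,
                rules=((1.5, 3.0, 3.0), (1.2, 2.5, 3.5), (1.0, 4.0, 4.0))):
    """Return the corrected vad_flag (all default thresholds are assumed)."""
    # A tonal frame is always treated as active
    if tonality_flag == 1:
        return 1
    # High average full-band SNR relative to the long-term SNR
    if snr2_lt_ave > snr2_lt_ave_t_thr1 + lt_snr * lt_tsnr_tscale:
        return 1
    # Three combined tests on SNR2_lt_ave, sp_center[2] and lt_snr
    for snr2_thr, sp_center_thr, lt_snr_thr in rules:
        if (snr2_lt_ave > snr2_thr and sp_center2 > sp_center_thr
                and lt_snr < lt_snr_thr):
            return 1
    return vad_flag
```

A frame already judged active passes through unchanged, since no branch returns 0 on its own.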

Step 306: correct the number of active-speech hangover frames according to the decision results of several preceding frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameters of the current frame, and the VAD decision result of the current frame;

The calculation steps are as follows:

The precondition for correcting the current hangover frame count is that the VAD flag indicates that the current frame is an active-speech frame. If this condition is not met, the value of the current hangover frame count num_speech_hangover is not modified and the procedure goes directly to step 307.

The hangover frame count is corrected as follows:

If the number of preceding consecutive speech frames continuous_speech_num is less than a set threshold continuous_speech_num_thr1 and lt_tsnr is less than a set threshold lt_tsnr_h_thr1, the current hangover frame count num_speech_hangover is set to the minimum number of consecutive active-speech frames minus continuous_speech_num. Otherwise, if SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_thr1 and continuous_speech_num is greater than a set threshold continuous_speech_num_thr2, the value of num_speech_hangover is set according to the magnitude of the long-term SNR lt_tsnr. Otherwise, the value of num_speech_hangover is not modified. In this embodiment the minimum number of consecutive active-speech frames is 8; it may take a value in [6, 20].

The specific steps are as follows:

If the long-term SNR lt_snr is greater than 2.6, num_speech_hangover is set to 3; otherwise, if lt_snr is greater than 1.6, num_speech_hangover is set to 4; otherwise, num_speech_hangover is set to 5.
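The hangover correction of step 306 can be sketched as follows. The function name and the four threshold defaults are hypothetical, since the text gives no ranges for them; the values 2.6, 1.6, 3, 4, 5 and the minimum of 8 consecutive active-speech frames come from the embodiment.

```python
def correct_hangover(num_speech_hangover, vad_flag, continuous_speech_num,
                     lt_snr, snr2_lt_ave,
                     continuous_speech_num_thr1=8, lt_tsnr_h_thr1=2.0,
                     snr2_lt_ave_thr1=1.5, continuous_speech_num_thr2=20,
                     min_continuous_speech=8):
    """Return the corrected num_speech_hangover (thresholds are assumed)."""
    # Precondition: only correct on frames currently judged active
    if vad_flag != 1:
        return num_speech_hangover
    # Short speech run at low long-term SNR: pad up to the minimum run length
    if (continuous_speech_num < continuous_speech_num_thr1
            and lt_snr < lt_tsnr_h_thr1):
        return min_continuous_speech - continuous_speech_num
    # Long speech run at good average SNR: hangover depends on long-term SNR
    if (snr2_lt_ave > snr2_lt_ave_thr1
            and continuous_speech_num > continuous_speech_num_thr2):
        if lt_snr > 2.6:
            return 3
        if lt_snr > 1.6:
            return 4
        return 5
    return num_speech_hangover
```

The higher the long-term SNR, the shorter the hangover, because speech offsets are easier to detect reliably in clean conditions.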

Step 307: add active-speech hangover according to the decision result of the current frame and the hangover frame count num_speech_hangover, obtaining the VAD decision result of the current frame.

The method is as follows:

If the current frame is judged to be inactive (the VAD flag is 0) and the hangover frame count num_speech_hangover is greater than 0, add hangover: set the VAD flag to 1 and decrement num_speech_hangover by 1.

This yields the final VAD decision result of the current frame.
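A minimal sketch of the hangover addition in step 307 is shown below; the function name and the tuple return convention are assumptions.

```python
def apply_hangover(vad_flag, num_speech_hangover):
    """Return (final vad_flag, updated num_speech_hangover)."""
    if vad_flag == 0 and num_speech_hangover > 0:
        # Keep the frame active and consume one hangover frame
        return 1, num_speech_hangover - 1
    return vad_flag, num_speech_hangover
```

For example, with a hangover count of 2, the first two inactive frames after a speech burst are still reported as active and the third is reported as inactive.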

Preferably, after step 304 the method further comprises calculating the long-term average active-speech signal energy E_{fg} according to the initial VAD decision result; and after step 307 the method further comprises calculating the long-term average background noise energy E_{bg} according to the VAD decision result, the calculated values being used for the VAD decision of the next frame.

The long-term average active-speech signal energy E_{fg} is calculated as follows:

a) If the initial VAD decision result indicates that the current frame is an active-speech frame (the VAD flag is 1) and E_{t1} is greater than several times E_{bg} (6 times in this embodiment), update the long-term average active-speech energy accumulation value fg_energy and the long-term average active-speech energy accumulated frame count fg_energy_count: add E_{t1} to fg_energy to obtain the new fg_energy, and add 1 to fg_energy_count to obtain the new fg_energy_count.

b) To ensure that the long-term average active-speech signal energy reflects the most recent active-speech signal energy, if the accumulated frame count equals a set value fg_max_frame_num, multiply both the accumulated frame count and the accumulation value by an attenuation coefficient attenu_coef1. In this embodiment fg_max_frame_num is 512 and attenu_coef1 is 0.75.

c) Divide the accumulation value fg_energy by the accumulated frame count fg_energy_count to obtain the long-term average active-speech signal energy. The equation is as follows:

E_{fg} = \frac{fg\_energy}{fg\_energy\_count}

The long-term average background noise energy E_{bg} is calculated as follows:

Let bg_energy_count be the background noise energy accumulated frame count, which records how many frames of energy are contained in the most recent background noise energy accumulation value, and let bg_energy be the most recent background noise energy accumulation value.

a) If the current frame is judged to be an inactive frame (the VAD flag is 0) and SNR2 is less than 1.0, update the background noise energy accumulation value bg_energy and the accumulated frame count bg_energy_count: add E_{t1} to bg_energy to obtain the new bg_energy, and add 1 to bg_energy_count to obtain the new bg_energy_count.

b) If bg_energy_count equals the maximum frame count used for calculating the long-term average background noise energy, multiply both the accumulated frame count and the accumulation value by an attenuation coefficient attenu_coef2. In this embodiment the maximum frame count is 512 and attenu_coef2 is 0.75.

c) Divide the accumulation value bg_energy by the accumulated frame count bg_energy_count to obtain the long-term average background noise energy. The equation is as follows:

E_{bg} = \frac{bg\_energy}{bg\_energy\_count}
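Because E_{fg} and E_{bg} are maintained in the same way, both can be sketched with one small accumulator class. The class and function names are hypothetical, and returning 0 before any frame has been accumulated is an assumption, since the initial values are not stated in this passage.

```python
class LongTermEnergy:
    """Accumulated energy and frame count with periodic attenuation."""

    def __init__(self, max_frames=512, attenuation=0.75):
        self.energy = 0.0
        self.count = 0
        self.max_frames = max_frames
        self.attenuation = attenuation

    def accumulate(self, e_t1):
        self.energy += e_t1
        self.count += 1
        # Attenuate both accumulators so that recent frames dominate
        if self.count == self.max_frames:
            self.energy *= self.attenuation
            self.count = int(self.count * self.attenuation)

    def average(self):
        # Assumption: report 0 until at least one frame has been accumulated
        return self.energy / self.count if self.count else 0.0


def update_long_term(fg, bg, e_t1, initial_vad_flag, final_vad_flag, snr2):
    """Update E_fg after step 304 and E_bg after step 307; return both."""
    if initial_vad_flag == 1 and e_t1 > 6.0 * bg.average():
        fg.accumulate(e_t1)
    if final_vad_flag == 0 and snr2 < 1.0:
        bg.accumulate(e_t1)
    return fg.average(), bg.average()
```

Attenuating the energy and the count by the same factor leaves the current average unchanged but gives subsequent frames a larger share of it.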

To implement embodiments 1 and 2 of the above voice activation detection method, the present invention further provides embodiment 1 of a voice activation detection (VAD) apparatus. As shown in Fig. 4, the apparatus comprises:

a filter bank, configured to obtain the sub-band signals of the current frame;

a characteristic parameter acquiring unit, configured to calculate the frame energy parameter and the spectrum gravity center characteristic parameter of the current frame from the sub-band signals;

Corresponding to method embodiment 2, the characteristic parameter acquiring unit is further configured to calculate the time-domain stability characteristic parameter from the sub-band signals, and to calculate the spectral flatness characteristic parameter and the tonality characteristic parameter from the spectral amplitudes;

Each characteristic parameter may be obtained by an existing method, or by the following methods:

The spectrum gravity center characteristic parameter is the ratio of the weighted sum to the unweighted sum of the energies of all or some of the sub-band signals, or the value obtained by smoothing this ratio;

The tonality characteristic parameter is obtained by calculating the correlation between the intra-frame spectral difference coefficients of two consecutive frames, or by further smoothing this correlation value.

As shown in Fig. 5, embodiment 2 of the VAD apparatus of the present invention differs from embodiment 1 in that the apparatus further comprises a flag calculation unit and a background noise energy processing unit, wherein:

the flag calculation unit is configured to calculate the tonality flag of the current frame from the frame energy parameter, the spectrum gravity center characteristic parameter, the time-domain stability characteristic parameter, the spectral flatness characteristic parameter, and the tonality characteristic parameter of the current frame;

the background noise energy processing unit comprises:

a flag calculation module, configured to calculate the initial background noise flag of the current frame from the frame energy parameter, the spectrum gravity center characteristic parameter, the time-domain stability characteristic parameter, the spectral flatness characteristic parameter, and the tonality characteristic parameter of the current frame;

a flag correction module, configured to correct the initial background noise flag according to the VAD decision result of the current frame, the tonality characteristic parameter, the SNR parameters, the tonality flag, and the time-domain stability characteristic parameter;

a background noise energy acquiring module, configured to obtain the background noise energy of the current frame from the corrected background noise flag, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame, the background noise energy of the current frame being used to calculate the SNR parameters of the next frame.

Corresponding to method embodiments 1 and 2, as shown in Fig. 6, the VAD decision unit comprises:

a long-term SNR calculation module, configured to calculate the long-term SNR lt_snr as the ratio of the long-term average active-speech signal energy to the long-term average background noise energy calculated for the previous frame;

an average full-band SNR calculation module, configured to calculate the mean of the full-band SNR SNR2 over the most recent frames to obtain the average full-band SNR SNR2_lt_ave;

an SNR threshold calculation module, configured to obtain the SNR threshold snr_thr for the VAD decision from the spectrum gravity center characteristic parameter, the long-term SNR lt_snr, the number of preceding consecutive active-speech frames continuous_speech_num, and the number of preceding consecutive noise frames continuous_noise_num;

an initial VAD decision module, configured to calculate the initial VAD decision from the VAD decision threshold snr_thr and the SNR parameters SNR1 and SNR2 calculated for the current frame;

a VAD result correction module, configured to correct the VAD decision result according to the tonality flag, the average full-band SNR SNR2_lt_ave, the spectrum gravity center, and the long-term SNR lt_snr;

an active-speech hangover frame correction module, configured to correct the number of active-speech hangover frames according to the decision results of several preceding frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameters of the current frame, and the VAD decision result of the current frame;

a VAD decision module, configured to add active-speech hangover according to the decision result of the current frame and the hangover frame count num_speech_hangover, obtaining the VAD decision result of the current frame.

More preferably, the VAD decision unit further comprises an energy calculation module, configured to calculate the long-term average active-speech signal energy E_{fg} according to the initial VAD decision result, and to update the long-term average background noise energy E_{bg} according to the VAD decision result, the updated values being used for the VAD decision of the next frame.

The present invention further provides an embodiment of a background noise detection method. As shown in Fig. 7, the method comprises:

Step 701: obtain the sub-band signals and the spectral amplitudes of the current frame;

Step 702: calculate the frame energy parameter, the spectrum gravity center characteristic parameter, and the time-domain stability characteristic parameter from the sub-band signals, and calculate the spectral flatness characteristic parameter and the tonality characteristic parameter from the spectral amplitudes;

Preferably, the frame energy parameter is the weighted sum or the direct sum of the energies of the sub-band signals.

The spectrum gravity center characteristic parameter is the ratio of the weighted sum to the unweighted sum of the energies of all or some of the sub-band signals, or the value obtained by smoothing this ratio.

The time-domain stability characteristic parameter is the ratio of the variance of the frame energy amplitudes to the expectation of the squared amplitude superposition values, or this ratio multiplied by a coefficient.

The spectral flatness characteristic parameter is the ratio of the geometric mean to the arithmetic mean of a number of spectral amplitudes, or this ratio multiplied by a coefficient.

Specifically, steps 701 and 702 may use the same methods as described above, which are not repeated here.

Step 703: perform background noise detection according to the spectrum gravity center characteristic parameter, the time-domain stability characteristic parameter, the spectral flatness characteristic parameter, the tonality characteristic parameter, and the current frame energy parameter, and determine whether the current frame is background noise.

Preferably, the following arbitrary condition of judgement is set up, and judges that present frame is not noise signal:

Described time domain degree of stability parameter l t_stable_rate0 is greater than the threshold value of a setting;

The smothing filtering value of the first interval spectrum gravity center characteristics parameter value is greater than the threshold value of a setting, and time domain degree of stability characteristic ginseng value is also greater than the threshold value of some settings;

Value after tonality characteristic parameter or its smothing filtering is greater than the threshold value of a setting, and time domain degree of stability characteristic parameter lt_stable_rate0 value is greater than the threshold value of its setting;

Value after the spectrum flatness characteristic parameter of each subband or separately smothing filtering is all less than the threshold value of each self-corresponding setting;

Or, judgment frame energy parameter E _t1value be greater than the threshold value E_thr1 of setting.

Specifically, the current frame is first assumed to be background noise.

This embodiment uses a background noise flag, background_flag, to indicate whether the current frame is background noise. By convention, background_flag is set to 1 if the current frame is judged to be background noise, and to 0 otherwise.

Whether the current frame is a noise signal is detected according to the time-domain stability feature parameter, the spectral centroid feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the current frame energy parameter. If it is not a noise signal, background_flag is set to 0.

The detailed procedure is as follows:

Judge whether the time-domain stability parameter lt_stable_rate0 is greater than a set threshold lt_stable_rate_thr1. If so, the current frame is judged not to be a noise signal and background_flag is set to 0. In this embodiment, lt_stable_rate_thr1 takes a value in [0.8, 1.6];

Judge whether the smoothed spectral centroid feature parameter is greater than a set threshold sp_center_thr1 and the time-domain stability feature parameter is also greater than a set threshold lt_stable_rate_thr2. If so, the current frame is judged not to be a noise signal and background_flag is set to 0. The range of sp_center_thr1 is [1.6, 4]; the range of lt_stable_rate_thr2 is (0, 0.1].

Judge whether the tonality feature parameter tonality_rate2 is greater than a set threshold tonality_rate_thr1 and whether the time-domain stability feature parameter lt_stable_rate0 is greater than a set threshold lt_stable_rate_thr3. If both conditions hold, the current frame is judged not to be background noise and background_flag is set to 0. The range of tonality_rate_thr1 is [0.4, 0.66]. The range of lt_stable_rate_thr3 is [0.06, 0.3].

Judge whether the spectral flatness feature parameter sSMR[0] is less than the set threshold sSMR_thr1, whether sSMR[1] is less than the set threshold sSMR_thr2, and whether sSMR[2] is less than the set threshold sSMR_thr3. If all of these conditions hold, the current frame is judged not to be background noise and background_flag is set to 0. The range of sSMR_thr1, sSMR_thr2, and sSMR_thr3 is [0.88, 0.98]. Judge whether the spectral flatness feature parameter sSMR[0] is less than the set threshold sSMR_thr4, whether sSMR[1] is less than the set threshold sSMR_thr5, and whether sSMR[1] is less than the set threshold sSMR_thr6. If any one of these conditions holds, the current frame is judged not to be background noise and background_flag is set to 0. The range of sSMR_thr4, sSMR_thr5, and sSMR_thr6 is [0.80, 0.92].

Judge whether the frame energy parameter E_t1 is greater than the set threshold E_thr1. If so, the current frame is judged not to be background noise and background_flag is set to 0. E_thr1 is chosen according to the dynamic range of the frame energy parameter.

If none of the above checks finds that the current frame is not background noise, the current frame is background noise.
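The decision procedure above can be sketched as follows. The threshold defaults are hypothetical values picked from inside the ranges given in the text, the value of E_thr1 is a placeholder because the text only says it depends on the dynamic range of the frame energy, and the names NoiseThresholds and detect_background are illustrative, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class NoiseThresholds:
    # Hypothetical values chosen inside the ranges stated in the text.
    lt_stable_rate_thr1: float = 1.2
    sp_center_thr1: float = 2.0
    lt_stable_rate_thr2: float = 0.05
    tonality_rate_thr1: float = 0.5
    lt_stable_rate_thr3: float = 0.1
    ssmr_all: tuple = (0.92, 0.92, 0.92)  # sSMR_thr1..sSMR_thr3
    ssmr_any: tuple = (0.85, 0.85, 0.85)  # sSMR_thr4..sSMR_thr6
    e_thr1: float = 1.0e6  # placeholder; depends on the energy dynamic range


def detect_background(lt_stable_rate0, sp_center_smooth, tonality_rate2,
                      ssmr, frame_energy, thr=None):
    """Return background_flag: 1 for background noise, 0 otherwise."""
    thr = thr or NoiseThresholds()
    not_noise = (
        lt_stable_rate0 > thr.lt_stable_rate_thr1
        or (sp_center_smooth > thr.sp_center_thr1
            and lt_stable_rate0 > thr.lt_stable_rate_thr2)
        or (tonality_rate2 > thr.tonality_rate_thr1
            and lt_stable_rate0 > thr.lt_stable_rate_thr3)
        # All three flatness values below their first set of thresholds.
        or all(s < t for s, t in zip(ssmr, thr.ssmr_all))
        # Any condition of the second set; the text names sSMR[1] for both
        # sSMR_thr5 and sSMR_thr6, which is followed literally here.
        or ssmr[0] < thr.ssmr_any[0]
        or ssmr[1] < thr.ssmr_any[1]
        or ssmr[1] < thr.ssmr_any[2]
        or frame_energy > thr.e_thr1
    )
    # The frame starts out assumed to be noise and is cleared by any hit.
    return 0 if not_noise else 1
```

Because every test can only clear the flag, the order of the checks does not affect the result.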

Corresponding to the above method, the present invention also provides a background noise detection device which, as shown in Figure 8, comprises:

A filter bank, for obtaining the sub-band signals of the current frame;

Preferably, the background noise judging unit judges that the current frame is not a noise signal if any one of the following conditions holds:

The present invention also provides a tonal signal detection method which, as shown in Figure 9, comprises:

Step 901: obtain the sub-band signals and spectral magnitudes of the current frame;

Step 902: calculate the values of the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the sub-band signals, and calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral magnitudes;

Preferably, the spectral centroid feature parameter is the ratio of the weighted sum to the unweighted sum of the energies of all or some of the sub-band signals, or the value obtained by smoothing this ratio; the time-domain stability feature parameter is the ratio of the variance of the amplitude superposition value to the expectation of the squared amplitude superposition value, or this ratio multiplied by a coefficient;

Step 903: judge whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter.

When step 903 judges whether the current frame is a tonal signal, the following operations are performed:

A) Assume that the current frame is a non-tonal signal, and use a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame.

In this embodiment, a tonality_frame value of 1 indicates that the current frame is a tonal frame, and 0 indicates that it is a non-tonal frame;

B) Judge whether the tonality feature parameter tonality_rate1, or its smoothed value tonality_rate2, is greater than the corresponding set threshold tonality_decision_thr1 or tonality_decision_thr2. If either condition holds, go to step C); otherwise go to step D);

Here, the range of tonality_decision_thr1 is [0.5, 0.7], and the range of tonality_rate1 is [0.7, 0.99].

C) If the time-domain stability feature parameter lt_stable_rate0 is less than a set threshold lt_stable_decision_thr1, the spectral centroid feature parameter sp_center[1] is greater than a set threshold spc_decision_thr1, and the spectral flatness feature parameters of the sub-bands are all less than their respective preset thresholds (specifically, sSMR[0] is less than a set threshold sSMF_decision_thr1, or sSMR[1] is less than a set threshold sSMF_decision_thr2, or sSMR[2] is less than a set threshold sSMF_decision_thr3), the current frame is judged to be a tonal frame and tonality_frame is set to 1; otherwise the current frame is judged to be a non-tonal frame and tonality_frame is set to 0. Execution then continues with step D).

Here, the range of lt_stable_decision_thr1 is [0.01, 0.25], the range of spc_decision_thr1 is [1.0, 1.8], the range of sSMF_decision_thr1 is [0.6, 0.9], the range of sSMF_decision_thr2 is [0.6, 0.9], and the range of sSMF_decision_thr3 is [0.7, 0.98].

D) Update the tonality degree feature parameter tonality_degree according to the tonal frame flag tonality_frame. The initial value of tonality_degree is set when the voice activity detection device starts working, and its range is [0, 1]. The tonality degree feature parameter tonality_degree is calculated differently in different cases:

If the current tonal frame flag indicates that the current frame is a tonal frame, tonality_degree is updated using the following equation:

tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;

where tonality_degree_{-1} is the tonality degree feature parameter of the previous frame, whose initial value lies in [0, 1]; td_scale_A is the attenuation coefficient, with range [0, 1]; and td_scale_B is the accumulation coefficient, with range [0, 1].

E) Judge whether the current frame is a tonal signal according to the updated tonality degree feature parameter tonality_degree, and set the value of the tonality flag tonality_flag.

Specifically, if the tonality degree feature parameter tonality_degree is greater than a set threshold, the current frame is judged to be a tonal signal; otherwise, the current frame is judged to be a non-tonal signal.
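Steps A) to E) can be sketched as a single per-frame function. The threshold defaults below are hypothetical values inside the stated ranges (the value for tonality_decision_thr2 is a guess, since its range is not clearly given), the flatness test uses the "or" form of the detailed clause in step C), and the update for a non-tonal frame (pure decay by td_scale_A) is an assumption, because only the tonal-frame equation is given here. The function name detect_tonality is illustrative.

```python
def detect_tonality(tonality_rate1, tonality_rate2, lt_stable_rate0,
                    sp_center1, ssmr, tonality_degree_prev,
                    tonality_decision_thr1=0.6, tonality_decision_thr2=0.86,
                    lt_stable_decision_thr1=0.072, spc_decision_thr1=1.2,
                    ssmf_decision_thr=(0.76, 0.88, 0.96),
                    td_scale_a=0.97, td_scale_b=0.03, degree_thr=0.5):
    """Return (tonality_frame, tonality_degree, tonality_flag)."""
    # A) Start from the assumption that the frame is non-tonal.
    tonality_frame = 0

    # B) Either the raw or the smoothed tonality rate must exceed its threshold.
    if (tonality_rate1 > tonality_decision_thr1
            or tonality_rate2 > tonality_decision_thr2):
        # C) Stable in time, high centroid, and low flatness in some sub-band.
        low_flatness = any(s < t for s, t in zip(ssmr, ssmf_decision_thr))
        if (lt_stable_rate0 < lt_stable_decision_thr1
                and sp_center1 > spc_decision_thr1 and low_flatness):
            tonality_frame = 1

    # D) Update the tonality degree from the previous frame's value.
    if tonality_frame == 1:
        tonality_degree = tonality_degree_prev * td_scale_a + td_scale_b
    else:
        # Assumption: a non-tonal frame only decays the previous value.
        tonality_degree = tonality_degree_prev * td_scale_a

    # E) Threshold the updated degree to obtain the tonality flag.
    tonality_flag = 1 if tonality_degree > degree_thr else 0
    return tonality_frame, tonality_degree, tonality_flag
```

With these illustrative coefficients the degree rises slowly over a run of tonal frames and decays slowly afterwards, so a single outlier frame does not flip tonality_flag.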

Corresponding to the aforementioned tonal signal detection method, the present invention also provides a tonal signal detection device which, as shown in Figure 10, comprises:

A filter bank, for obtaining the sub-band signals of the current frame;

A feature parameter calculation unit, for calculating the values of the current spectral centroid feature parameter and time-domain stability feature parameter from the sub-band signals, and calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral magnitudes;

As described above, the spectral centroid feature parameter is the ratio of the weighted sum to the unweighted sum of the energies of all or some of the sub-band signals, or the value obtained by smoothing this ratio;

As shown in Figure 11, the tonal signal judging unit comprises:

A tonal signal initialization module, for setting the current frame as a non-tonal signal and using a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame;

A tonality feature parameter judging module, for judging whether the tonality feature parameter tonality_rate1, or its smoothed value tonality_rate2, is greater than the corresponding set threshold;

A tonal signal judging module which, when the tonality feature parameter judging module judges yes, judges that the current frame is a tonal frame if the time-domain stability feature parameter is less than a set threshold, the spectral centroid feature parameter is greater than a set threshold, and the spectral flatness feature parameters of the sub-bands are all less than their respective preset thresholds, and judges from the calculated tonality degree feature parameter tonality_degree whether the current frame is a tonal signal; and which, when the tonality feature parameter judging module judges no, judges from the updated tonality degree feature parameter tonality_degree whether the current frame is a tonal signal, and sets the value of the tonality flag tonality_flag;

A tonality degree parameter update module which, when the tonality feature parameter tonality_rate1 and its smoothed value tonality_rate2 are both less than the corresponding set thresholds, updates the tonality degree feature parameter tonality_degree according to the tonal frame flag, where the initial value of tonality_degree is set when the voice activity detection device starts working.

Specifically, if the current tonal frame flag indicates that the current frame is a tonal frame, the tonality degree parameter update module updates tonality_degree using the following equation:

tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;

If the tonality degree feature parameter tonality_degree is greater than a set threshold, the tonal signal judging module judges that the current frame is a tonal signal; otherwise, it judges that the current frame is a non-tonal signal.

Specifically, if tonality_degree is greater than the threshold 0.5, the current frame is judged to be a tonal signal and the tonality flag tonality_flag is set to 1; otherwise, the current frame is judged to be a non-tonal signal and tonality_flag is set to 0. The threshold for the tonal signal decision takes a value in [0.3, 0.7].

The present invention also provides a method for correcting the number of active-speech hangover frames in the VAD decision which, as shown in Figure 12, comprises:

Step 1201: calculate the long-term SNR lt_snr from the sub-band signals;

Specifically, lt_snr is calculated as the ratio of the average long-term active-speech signal energy to the average long-term background noise energy, both computed at the previous frame; lt_snr may be expressed logarithmically.

Step 1202: calculate the average full-band SNR SNR2_lt_ave;

The mean of the full-band SNR SNR2 over the most recent several frames is calculated to obtain the average full-band SNR SNR2_lt_ave;

Step 1203: correct the current number of active-speech hangover frames according to the decision results of several previous frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameters of the current frame, and the VAD decision result of the current frame.

Understandably, the precondition for correcting the current hangover frame count is that the active-speech flag indicates that the current frame is an active-speech frame.

Preferably, when correcting the current hangover frame count: if the number of previous continuous speech frames is less than a set threshold 1 and the long-term SNR lt_snr is less than a set threshold 2, the current hangover frame count is set to the minimum number of continuous active-speech frames minus the number of previous continuous speech frames; otherwise, if the average full-band SNR SNR2_lt_ave is greater than a set threshold 3 and the number of previous continuous speech frames is greater than a set threshold 4, the hangover frame count is set according to the magnitude of the long-term SNR; otherwise the current hangover frame count num_speech_hangover is left unchanged.
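A minimal sketch of steps 1201 to 1203 follows. The text does not give values for thresholds 1 to 4, the minimum number of continuous active-speech frames, or the mapping from long-term SNR to hangover length, so all numbers below are hypothetical, as are the function names and the choice of a base-10 logarithm.

```python
import math


def long_term_snr(e_fg, e_bg):
    """Log ratio of long-term active-speech energy to background energy."""
    # Assumption: base-10 logarithm; the text only says "logarithmic".
    return math.log10(e_fg / e_bg)


def average_fullband_snr(snr2_history):
    """Mean of the full-band SNR (SNR2) over the most recent frames."""
    return sum(snr2_history) / len(snr2_history)


def correct_hangover(vad_flag, num_speech_hangover, continuous_speech_num,
                     lt_snr, snr2_lt_ave, thr1=8, thr2=4.0, thr3=1.5,
                     thr4=20, min_continuous_speech=8):
    """Return the corrected active-speech hangover frame count."""
    # Precondition: only correct when the frame is flagged as active speech.
    if not vad_flag:
        return num_speech_hangover
    if continuous_speech_num < thr1 and lt_snr < thr2:
        # Short speech burst at low SNR: top up to the minimum burst length.
        return min_continuous_speech - continuous_speech_num
    if snr2_lt_ave > thr3 and continuous_speech_num > thr4:
        # Hypothetical mapping: shorter hangover when the long-term SNR is high.
        return 3 if lt_snr > 4.0 else 5
    # Otherwise leave the current value unchanged.
    return num_speech_hangover
```

The default thr1 equals min_continuous_speech so that the first branch never yields a negative count; a different choice would need an explicit clamp.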

Corresponding to the aforementioned method for correcting the number of active-speech hangover frames, the present invention also provides a device for correcting the number of active-speech hangover frames in the VAD decision which, as shown in Figure 13, comprises:

Specifically, the long-term SNR calculation unit calculates the long-term SNR lt_snr as the ratio of the average long-term active-speech signal energy to the average long-term background noise energy, both computed at the previous frame;

Specifically, the average full-band SNR calculation unit calculates the mean of the full-band SNR SNR2 over the most recent several frames to obtain the average full-band SNR SNR2_lt_ave.

As described above, the precondition for correcting the current hangover frame count is that the active-speech flag indicates that the current frame is an active-speech frame.

Preferably, when the hangover frame count correction unit corrects the current hangover frame count: if the number of previous continuous speech frames is less than a set threshold 1 and the long-term SNR lt_snr is less than a set threshold 2, the current hangover frame count is set to the minimum number of continuous active-speech frames minus the number of previous continuous speech frames; otherwise, if the average full-band SNR SNR2_lt_ave is greater than a set threshold 3 and the number of previous continuous speech frames is greater than a set threshold 4, the hangover frame count is set according to the magnitude of the long-term SNR; otherwise the current hangover frame count num_speech_hangover is left unchanged.

The present invention also provides a method for adjusting the SNR threshold in the VAD decision which, as shown in Figure 14, comprises:

Step 1401: calculate the spectral centroid feature parameter of the current frame from the sub-band signals;

Specifically, the spectral centroid feature parameter is the ratio of the weighted sum to the unweighted sum of the energies of all or some of the sub-band signals, or the value obtained by smoothing this ratio.

Step 1402: calculate the long-term SNR lt_snr as the ratio of the average long-term active-speech signal energy to the average long-term background noise energy, both computed at the previous frame;

Step 1403: adjust the SNR threshold of the VAD decision according to the spectral centroid feature parameter, the long-term SNR, the number of previous continuous active-speech frames, and the number of previous continuous noise frames continuous_noise_num.

Specifically, as shown in Figure 15, adjusting the SNR threshold comprises the following steps:

Step 1501: set the initial value of the SNR threshold snr_thr;

Step 1502: adjust the value of snr_thr a first time according to the spectral centroid parameter;

Step 1503: adjust the value of snr_thr a second time according to the number of previous continuous active-speech frames continuous_speech_num, the number of previous continuous noise frames continuous_noise_num, the average full-band SNR SNR2_lt_ave, and the long-term SNR lt_snr;

Step 1504: finally, correct snr_thr once more according to the value of the long-term SNR lt_snr, to obtain the SNR threshold snr_thr of the current frame.
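The four stages of steps 1501 to 1504 can be laid out as a skeleton. Only the staging and the inputs of each stage come from the text; the initial value, every comparison constant, and the size and direction of every adjustment below are hypothetical placeholders, and the function name is illustrative.

```python
def adjust_snr_threshold(sp_center, lt_snr, snr2_lt_ave,
                         continuous_speech_num, continuous_noise_num):
    """Return the SNR threshold snr_thr for the current frame."""
    # Step 1501: initial value (hypothetical).
    snr_thr = 1.0

    # Step 1502: first adjustment from the spectral centroid (hypothetical rule).
    if sp_center > 1.4:
        snr_thr += 0.2

    # Step 1503: second adjustment from the frame counters, the average
    # full-band SNR, and the long-term SNR (hypothetical rules).
    if continuous_speech_num > 20 and snr2_lt_ave > 1.5:
        snr_thr -= 0.2
    elif continuous_noise_num > 20 and lt_snr > 2.0:
        snr_thr += 0.1

    # Step 1504: final correction from the long-term SNR (hypothetical rule).
    snr_thr += 0.1 * lt_snr
    return snr_thr
```

Keeping the stages in one function makes the ordering explicit: each later stage refines the value produced by the earlier ones rather than recomputing it.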

Corresponding to the method for adjustment of aforementioned signal-noise ratio threshold, the present invention also provides the adjusting gear of signal-noise ratio threshold in a kind of VAD judgement, and as shown in figure 16, this adjusting gear comprises:

Preferably, described spectrum gravity center characteristics parameter is the weighted accumulation value of all or part subband signal energy and the ratio of weighted accumulation value not, or this ratio carries out the value that smothing filtering obtains.

Snr computation unit when long, for calculate by former frame average long time activate tone signal energy and the average ratio of ground unrest energy when long, calculate signal to noise ratio (S/N ratio) lt_snr when long;

Particularly, when described signal-noise ratio threshold adjustment unit is adjusted signal-noise ratio threshold, the initial value of signal-noise ratio threshold snr_thr is set; Adjust first the value of signal-noise ratio threshold snr_thr according to spectrum center of gravity parameter; According to activate continuously above sound frame number continuous_speech_num, above continuing noise frame number continuous_noise_num, average entirely with signal to noise ratio (S/N ratio) SNR2_lt_ave and when long signal to noise ratio (S/N ratio) lt_snr adjust the value of snr_thr for bis-times; Finally, during according to length, the value of signal to noise ratio (S/N ratio) lt_snr is finally adjusted signal-noise ratio threshold snr_thr again, obtains the signal-noise ratio threshold snr_thr of present frame.

Many modern speech coding standards, such as AMR and AMR-WB, support VAD. In terms of efficiency, the VAD of these encoders does not perform well under all typical background noises. In particular, under non-stationary noise such as office noise, the VAD efficiency of these encoders is low. For music signals, these VADs sometimes make detection errors, which causes a noticeable quality drop in the corresponding processing algorithms.

The method of the present invention overcomes the shortcomings of existing VAD algorithms: it improves the efficiency of VAD detection under non-stationary noise and at the same time improves the accuracy of music detection, so that speech and audio signal processing algorithms using this VAD can achieve better performance.

The background noise detection method provided by the present invention makes the estimation of background noise more accurate and stable, which helps to improve the accuracy of VAD detection. The tonal signal detection method also provided by the present invention improves the accuracy of tonal music detection. The method for correcting the number of active-speech hangover frames also provided by the present invention allows the VAD algorithm to achieve a better balance between performance and efficiency under different noises and SNRs. The method for adjusting the SNR threshold in the VAD decision also provided by the present invention allows the VAD decision algorithm to reach good accuracy under different SNRs and to further improve efficiency while maintaining quality.

A person of ordinary skill in the art will appreciate that all or some of the steps of the above methods may be completed by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or some of the steps of the above embodiments may be implemented with one or more integrated circuits. Accordingly, each module or unit in the above embodiments may be implemented in the form of hardware or in the form of a software function module. The present invention is not limited to any particular combination of hardware and software.

Claims

1. A voice activity detection (VAD) method, characterized in that the method comprises:

obtaining the sub-band signals and spectral magnitudes of the current frame;

2. The method of claim 1, characterized in that, before or after the VAD decision result is obtained, the method further comprises:

calculating an initial background noise flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;

after the VAD decision result is obtained, the method further comprises: correcting the initial background noise flag according to the VAD decision result of the current frame, the tonality feature parameter, the SNR parameters, the tonality flag, and the time-domain stability feature parameter;

obtaining the sub-band background noise energy and the full-band background noise energy of the current frame according to the corrected background noise flag, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame;

the background noise energy of the current frame being used to calculate the SNR parameters of the next frame.

3. The method of claim 1 or 2, characterized in that:

4. The method of claim 1, characterized in that

the VAD decision result is calculated from the tonality flag, the SNR parameters, the spectral centroid feature parameter, and the frame energy parameter in the following steps:

a. calculating the long-term SNR as the ratio of the average long-term active-speech signal energy to the average long-term background noise energy, both computed at the previous frame;

b. calculating the mean of the full-band SNR SNR2 over the most recent several frames to obtain the average full-band SNR SNR2_lt_ave;

c. obtaining the SNR threshold snr_thr of the VAD decision according to the spectral centroid feature parameter, the long-term SNR lt_snr, the number of previous continuous active-speech frames continuous_speech_num, and the number of previous continuous noise frames continuous_noise_num;

d. calculating an initial VAD decision from the VAD decision threshold snr_thr and the SNR parameters SNR1 and SNR2;

e. correcting the VAD decision result according to the tonality flag, the average full-band SNR SNR2_lt_ave, the spectral centroid, and the long-term SNR lt_snr;

f. correcting the number of active-speech hangover frames according to the decision results of several previous frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameters of the current frame, and the VAD decision result of the current frame;

g. adding active-speech hangover according to the decision result of the current frame and the hangover frame count num_speech_hangover, to obtain the VAD decision result of the current frame.

5. The method of claim 4, characterized in that: after step d, the method further comprises calculating the average long-term active-speech signal energy E_fg according to the initial VAD decision result; and after step g, the method further comprises calculating the average long-term background noise energy E_bg according to the VAD decision result, the calculated values being used for the VAD decision of the next frame.

6. A voice activity detection (VAD) device, characterized in that the device comprises:

a filter bank, for obtaining the sub-band signals of the current frame;

7. The VAD device of claim 6, characterized in that

the device further comprises a background noise energy processing unit, which comprises:

8. The VAD device of claim 6 or 7, characterized in that:

9. The VAD device of claim 6, characterized in that the VAD decision unit comprises:

a long-term SNR calculation module, for calculating the long-term SNR lt_snr as the ratio of the average long-term active-speech signal energy to the average long-term background noise energy, both computed at the previous frame;

a VAD result correction module, for correcting the VAD decision result according to the average full-band SNR SNR2_lt_ave, the spectral centroid, and the long-term SNR lt_snr;

10. The VAD device of claim 9, characterized in that the VAD decision unit further comprises: an energy calculation module, for calculating the average long-term active-speech signal energy E_fg according to the initial VAD decision result, and calculating the average long-term background noise energy E_bg according to the VAD decision result, the calculated values being used for the VAD decision of the next frame.

11. A background noise detection method, characterized in that the method comprises:

obtaining the sub-band signals and spectral magnitudes of the current frame;

12. The method of claim 11, characterized in that:

the time-domain stability feature parameter is the ratio of the variance of the frame energy amplitude to the expectation of the squared amplitude superposition value, or this ratio multiplied by a coefficient;

13. The method of claim 11, characterized in that the current frame is judged not to be a noise signal if any one of the following conditions holds:

14. A background noise detection device, characterized in that the device comprises:

a filter bank, for obtaining the sub-band signals of the current frame;

15. The detection device of claim 14, characterized in that:

16. The detection device of claim 14, characterized in that the background noise judging unit judges that the current frame is not a noise signal if any one of the following conditions holds:

17. A tonal signal detection method, characterized in that the method comprises:

obtaining the sub-band signals and spectral magnitudes of the current frame;

judging whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter.

18. The method of claim 17, characterized in that:

19. The method of claim 17, characterized in that, when judging whether the current frame is a tonal signal, the following operations are performed:

A) assuming that the current frame is a non-tonal signal, and using a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame;

B) judging whether the tonality feature parameter tonality_rate1, or its smoothed value tonality_rate2, is greater than the corresponding set threshold; if either condition holds, going to step C); otherwise going to step D);

C) if the time-domain stability feature parameter is less than a set threshold, the spectral centroid feature parameter is greater than a set threshold, and the spectral flatness feature parameters of the sub-bands are all less than their respective preset thresholds, judging that the current frame is a tonal frame and setting the value of the tonal frame flag; otherwise judging that the current frame is a non-tonal frame and setting the value of the tonal frame flag; and continuing with step D);

D) updating the tonality degree feature parameter tonality_degree according to the tonal frame flag, where the initial value of tonality_degree is set when voice activity detection starts working;

20. The method of claim 17, characterized in that, if the current tonal frame flag indicates that the current frame is a tonal frame, the tonality degree feature parameter tonality_degree is updated using the following equation:

tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;

where tonality_degree_{-1} is the tonality degree feature parameter of the previous frame, whose initial value lies in [0, 1]; td_scale_A is the attenuation coefficient; and td_scale_B is the accumulation coefficient.

21. The method of claim 17, characterized in that:

if the tonality degree feature parameter tonality_degree is greater than a set threshold, the current frame is judged to be a tonal signal; otherwise, the current frame is judged to be a non-tonal signal.

22. 1 kinds of tonality signal supervisory instruments, is characterized in that, this pick-up unit comprises:

Bank of filters, for obtaining the subband signal of present frame;

Calculation of characteristic parameters unit, for calculate the value of spectrum gravity center characteristics parameter, time domain degree of stability characteristic parameter according to subband signal, calculates the value of spectrum flatness characteristic parameter tunefulness characteristic parameter according to spectral magnitude;

Whether tonality signal judging unit, for being tonality signal according to tonality characteristic parameter, time domain degree of stability characteristic parameter, spectrum flatness characteristic parameter, spectrum gravity center characteristics parameter present frame.

23. The detection device as claimed in claim 22, characterized in that:

24. The detection device as claimed in claim 22, characterized in that the tonal signal judging unit comprises:

25. The detection device as claimed in claim 22, characterized in that: if the current tonal frame flag indicates that the current frame is a tonal frame, the tonality degree parameter update module updates the tonality degree feature parameter tonality_degree using the following equation:

tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;

26. The detection device as claimed in claim 22, characterized in that: if the tonality degree feature parameter tonality_degree is greater than a set threshold, the tonal signal judging module determines that the current frame is a tonal signal; otherwise, it determines that the current frame is a non-tonal signal.

27. A method for correcting the current active-sound hold frame count in a VAD decision, characterized in that the method comprises:

obtaining the sub-band signals and spectral amplitudes of the current frame;

calculating the long-term signal-to-noise ratio lt_snr and the average full-band signal-to-noise ratio SNR2_lt_ave from the sub-band signals, and correcting the current active-sound hold frame count according to the decision results of several preceding frames, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, and the VAD decision result of the current frame.

28. The method as claimed in claim 27, characterized in that the long-term signal-to-noise ratio lt_snr is calculated as the ratio of the average long-term active-sound signal energy to the average long-term background noise energy, both computed at the previous frame; and the average full-band signal-to-noise ratio SNR2_lt_ave is obtained by averaging the full-band signal-to-noise ratio SNR2 over the several most recent frames.
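The two quantities in this claim can be sketched in Python as follows. This is only an illustration under stated assumptions: the window length of the running average, the guard value eps, and the class and function names are hypothetical, and the ratio is taken as a plain linear ratio because the claim does not say whether a logarithmic scale is used.

```python
from collections import deque


def long_term_snr(lt_active_energy: float, lt_noise_energy: float,
                  eps: float = 1e-12) -> float:
    """lt_snr as the ratio of the two average long-term energies.

    Both energies are assumed to come from the previous frame's state;
    eps is a hypothetical guard against division by zero.
    """
    return lt_active_energy / max(lt_noise_energy, eps)


class FullBandSnrAverager:
    """Running mean of SNR2 over the most recent frames (SNR2_lt_ave)."""

    def __init__(self, window: int = 32):  # window length is an assumption
        self._history = deque(maxlen=window)

    def update(self, snr2: float) -> float:
        # The deque drops the oldest value once the window is full.
        self._history.append(snr2)
        return sum(self._history) / len(self._history)
```

A caller would invoke `update` once per frame with that frame's SNR2 and use the return value as SNR2_lt_ave.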

29. The method as claimed in claim 27, characterized in that: the precondition for correcting the current active-sound hold frame count is that the active-sound flag indicates that the current frame is an active-sound frame.

30. The method as claimed in claim 27, characterized in that, when correcting the current active-sound hold frame count: if the number of preceding consecutive speech frames is less than a set threshold 1 and the long-term signal-to-noise ratio lt_snr is less than a set threshold 2, the current active-sound hold frame count is set equal to the minimum number of consecutive active-sound frames minus the number of preceding consecutive speech frames; otherwise, if the average full-band signal-to-noise ratio SNR2_lt_ave is greater than a set threshold 3 and the number of preceding consecutive speech frames is greater than a set threshold 4, the value of the active-sound hold frame count is set according to the magnitude of the long-term signal-to-noise ratio; otherwise, the value of the current active-sound hold frame count is not modified.
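The branch structure of this correction, together with the precondition of claim 29, can be sketched in Python as below. All names are hypothetical. The four thresholds and the minimum consecutive active-sound frame count are left as caller-supplied parameters because the claim gives no values, and the mapping from lt_snr to a hold frame count is passed in as a function because the claim says only that the value depends on the size of the long-term signal-to-noise ratio.

```python
from typing import Callable


def correct_hold_frames(
    hold_frames: int,
    continuous_speech_num: int,
    lt_snr: float,
    snr2_lt_ave: float,
    *,
    vad_active: bool,
    thr1: int,
    thr2: float,
    thr3: float,
    thr4: int,
    min_active_frames: int,
    hold_from_snr: Callable[[float], int],  # hypothetical lt_snr -> frames map
) -> int:
    """Return the corrected active-sound hold frame count."""
    if not vad_active:
        # Precondition: only correct when the current frame is active sound.
        return hold_frames
    if continuous_speech_num < thr1 and lt_snr < thr2:
        # Short speech run at low long-term SNR: top up to the minimum run.
        return min_active_frames - continuous_speech_num
    if snr2_lt_ave > thr3 and continuous_speech_num > thr4:
        # Long speech run at good full-band SNR: choose by long-term SNR.
        return hold_from_snr(lt_snr)
    return hold_frames  # no correction
```

Passing the SNR-to-frames mapping as a parameter keeps the sketch faithful to the claim without inventing a specific table.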

31. A device for correcting the current active-sound hold frame count in a VAD decision, characterized in that the correcting device comprises:

32. The correcting device as claimed in claim 31, characterized in that: the long-term signal-to-noise ratio calculation unit calculates the long-term signal-to-noise ratio lt_snr as the ratio of the average long-term active-sound signal energy to the average long-term background noise energy, both computed at the previous frame; and the average full-band signal-to-noise ratio calculation unit averages the full-band signal-to-noise ratio SNR2 over the several most recent frames to obtain the average full-band signal-to-noise ratio SNR2_lt_ave.

33. The correcting device as claimed in claim 31, characterized in that: the precondition for correcting the current active-sound hold frame count is that the active-sound flag indicates that the current frame is an active-sound frame.

34. The correcting device as claimed in claim 31, characterized in that, when the active-sound hold frame count correction unit corrects the current active-sound hold frame count: if the number of preceding consecutive speech frames is less than a set threshold 1 and the long-term signal-to-noise ratio lt_snr is less than a set threshold 2, the current active-sound hold frame count is set equal to the minimum number of consecutive active-sound frames minus the number of preceding consecutive speech frames; otherwise, if the average full-band signal-to-noise ratio SNR2_lt_ave is greater than a set threshold 3 and the number of preceding consecutive speech frames is greater than a set threshold 4, the value of the active-sound hold frame count is set according to the magnitude of the long-term signal-to-noise ratio; otherwise, the value of the current active-sound hold frame count is not modified.

35. A method for adjusting the signal-to-noise ratio threshold in a VAD decision, characterized in that the adjusting method comprises:

obtaining the sub-band signals and spectral amplitudes of the current frame;

calculating the current spectral centroid feature parameter from the sub-band signals;

calculating the long-term signal-to-noise ratio as the ratio of the average long-term active-sound signal energy to the average long-term background noise energy, both computed at the previous frame;

36. The method as claimed in claim 35, characterized in that:

37. The method as claimed in claim 35, characterized in that the step of adjusting the signal-to-noise ratio threshold comprises:

setting the initial value of the signal-to-noise ratio threshold snr_thr;

adjusting the value of the signal-to-noise ratio threshold snr_thr a first time according to the spectral centroid parameter;

adjusting the value of snr_thr a second time according to the number of preceding consecutive active-sound frames continuous_speech_num, the number of preceding consecutive noise frames continuous_noise_num, the average full-band signal-to-noise ratio SNR2_lt_ave and the long-term signal-to-noise ratio lt_snr;

finally correcting the signal-to-noise ratio threshold snr_thr once more according to the value of lt_snr, to obtain the signal-to-noise ratio threshold snr_thr of the current frame.

38. The method as claimed in claim 37, characterized in that:

the average full-band signal-to-noise ratio SNR2_lt_ave is obtained by averaging the full-band signal-to-noise ratio SNR2 over the several most recent frames.

39. A device for adjusting the signal-to-noise ratio threshold in a VAD decision, characterized in that the adjusting device comprises:

a feature parameter acquiring unit, configured to calculate the current spectral centroid feature parameter from the sub-band signals;

a long-term signal-to-noise ratio calculation unit, configured to obtain the long-term signal-to-noise ratio lt_snr as the ratio of the average long-term active-sound signal energy to the average long-term background noise energy, both computed at the previous frame;

40. The adjusting device as claimed in claim 39, characterized in that:

the spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of the energies of all or some of the sub-band signals.
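The ratio defined in this claim can be written out as a short Python sketch. The default weights (the 1-based sub-band index) and the guard value eps are assumptions for illustration; the claim says only that a weighted accumulation is divided by an unweighted one, over all or some of the sub-bands.

```python
from typing import Optional, Sequence


def spectral_centroid(
    subband_energies: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    eps: float = 1e-12,  # hypothetical guard against an all-zero frame
) -> float:
    """Weighted sum of sub-band energies divided by their plain sum."""
    if weights is None:
        # Assumption: weight each sub-band by its 1-based index.
        weights = range(1, len(subband_energies) + 1)
    if len(weights) != len(subband_energies):
        raise ValueError("one weight is needed per sub-band")
    weighted = sum(w * e for w, e in zip(weights, subband_energies))
    unweighted = sum(subband_energies)
    return weighted / max(unweighted, eps)
```

To use only some of the sub-bands, a caller can pass a slice of the energy list with matching weights.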

41. The adjusting device as claimed in claim 39, characterized in that, when adjusting the signal-to-noise ratio threshold, the signal-to-noise ratio threshold adjusting unit: sets the initial value of the signal-to-noise ratio threshold snr_thr; adjusts the value of snr_thr a first time according to the spectral centroid parameter; adjusts the value of snr_thr a second time according to the number of preceding consecutive active-sound frames continuous_speech_num, the number of preceding consecutive noise frames continuous_noise_num, the average full-band signal-to-noise ratio SNR2_lt_ave and the long-term signal-to-noise ratio lt_snr; and finally adjusts snr_thr once more according to the value of the long-term signal-to-noise ratio lt_snr, to obtain the signal-to-noise ratio threshold snr_thr of the current frame.