CN103903634B

CN103903634B - Voice activity detection, and method and apparatus for voice activity detection

Info

Publication number: CN103903634B
Application number: CN201210570563.5A
Authority: CN
Inventors: 江东平; 袁浩; 朱长宝
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2012-12-25
Filing date: 2012-12-25
Publication date: 2018-09-04
Anticipated expiration: 2032-12-25
Also published as: CN109119096B; CN112992188A; CN103903634A; CN109119096A

Abstract

The present invention relates to voice activity detection (VAD) and to a method and apparatus for voice activity detection. The method includes: obtaining the subband signal and the spectral magnitude of the current frame; calculating the values of the frame energy parameter and the spectral centroid feature parameter of the current frame from the subband signal; calculating the signal-to-noise ratio (SNR) parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the SNR subband energies; and calculating the VAD decision result from the tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter. The method and apparatus of the invention can improve detection accuracy for non-stationary noise (such as office noise) and for music.

Description

Voice activity detection, and method and apparatus for voice activity detection

Technical field

The present invention relates to voice activity detection (VAD) and to methods used in VAD (including methods for background noise detection, tonal signal detection, correction of the active-sound hangover frame count of the current frame in the VAD decision, and adjustment of the SNR threshold in the VAD decision), together with the corresponding apparatus.

Background

In a normal voice call the user sometimes speaks and sometimes listens, so inactive periods occur during the call. Under normal circumstances the total inactive time of the two parties exceeds 50% of their total speech coding time. During an inactive period there is only background noise, which usually carries no useful information. Exploiting this fact, audio signal processing uses a voice activity detection (VAD) algorithm to distinguish active sound from inactive sound and processes the two with different methods. Many modern speech coding standards, such as AMR and AMR-WB, support VAD. In terms of efficiency, however, the VADs of these coders do not perform well under all typical background noises. Under non-stationary noise in particular, their VAD efficiency is low. For music signals these VADs sometimes produce detection errors, which cause a noticeable quality drop in the corresponding processing algorithms.

Summary of the invention

The technical problem to be solved by the present invention is to provide voice activity detection (VAD) and methods used in VAD (including methods for background noise detection, tonal signal detection, correction of the current active-sound hangover frame count in the VAD decision, and adjustment of the SNR threshold in the VAD decision), together with the corresponding apparatus, so as to improve the accuracy of VAD.

To solve the above technical problem, the present invention provides a voice activity detection (VAD) method, the method including:

Obtaining the subband signal and the spectral magnitude of the current frame;

Calculating the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the subband signal; calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral magnitude;

Calculating the SNR parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the SNR subband energies;

Calculating the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the tonality feature parameter of the current frame;

Calculating the VAD decision result from the tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter.

To solve the above technical problem, the present invention provides a voice activity detection (VAD) apparatus, the apparatus including:

A filter bank, for obtaining the subband signal of the current frame;

A spectral magnitude calculation unit, for obtaining the spectral magnitude of the current frame;

A feature parameter acquisition unit, for calculating the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the subband signal, and for calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral magnitude;

A flag calculation unit, for calculating the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the tonality feature parameter of the current frame;

An SNR calculation unit, for calculating the SNR parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the SNR subband energies;

A VAD decision unit, for calculating the VAD decision result from the tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter.

To solve the above technical problem, the present invention provides a background noise detection method, the method including:

Obtaining the subband signal and the spectral magnitude of the current frame;

Calculating the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signal, and calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral magnitude;

Performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter and the current frame energy parameter, to judge whether the current frame is background noise.

To solve the above technical problem, the present invention provides a background noise detection apparatus, the apparatus including:

A filter bank, for obtaining the subband signal of the current frame;

A feature parameter calculation unit, for calculating the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signal, and for calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral magnitude;

A background noise judgment unit, for performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter and the current frame energy parameter, to judge whether the current frame is background noise.

To solve the above technical problem, the present invention provides a tonal signal detection method, the method including:

Obtaining the subband signal and the spectral magnitude of the current frame;

Calculating the values of the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signal, and calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral magnitude;

Judging whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the spectral centroid feature parameter.

To solve the above technical problem, the present invention provides a tonal signal detection apparatus, the apparatus including:

A filter bank, for obtaining the subband signal of the current frame;

A feature parameter calculation unit, for calculating the values of the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signal, and for calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral magnitude;

A tonal signal judgment unit, for judging whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the spectral centroid feature parameter.

To solve the above technical problem, the present invention provides a method for correcting the active-sound hangover frame count of the current frame in a VAD decision, the method including:

Calculating the long-term SNR lt_snr and the average full-band SNR SNR2_lt_ave;

Correcting the current active-sound hangover frame count according to the decision results of several preceding frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR of the current frame and the VAD decision result of the current frame.

To solve the above technical problem, the present invention provides an apparatus for correcting the current active-sound hangover frame count in a VAD decision, the correction apparatus including:

A long-term SNR calculation unit, for calculating the long-term SNR lt_snr;

An average full-band SNR calculation unit, for calculating the average full-band SNR SNR2_lt_ave;

An active-sound hangover frame count correction unit, for correcting the current active-sound hangover frame count according to the decision results of several preceding frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameter of the current frame and the VAD decision result of the current frame.

To solve the above technical problem, the present invention provides a method for adjusting the SNR threshold in a VAD decision, the adjustment method including:

Calculating the spectral centroid feature parameter of the current frame from the subband signal;

Calculating the ratio of the average long-term active-sound signal energy to the average long-term background noise energy computed for the previous frame, to obtain the long-term SNR lt_snr;

Adjusting the SNR threshold of the VAD decision according to the spectral centroid feature parameter, the long-term SNR, the number of preceding consecutive active-sound frames and the number of preceding consecutive noise frames continuous_noise_num.

To solve the above technical problem, the present invention provides an apparatus for adjusting the SNR threshold in a VAD decision, the adjustment apparatus including:

A feature parameter acquisition unit, for calculating the spectral centroid feature parameter of the current frame from the subband signal;

A long-term SNR calculation unit, for calculating the ratio of the average long-term active-sound signal energy to the average long-term background noise energy computed for the previous frame, to obtain the long-term SNR lt_snr;

An SNR threshold adjustment unit, for adjusting the SNR threshold of the VAD decision according to the spectral centroid feature parameter, the long-term SNR, the number of preceding consecutive active-sound frames and the number of preceding consecutive noise frames continuous_noise_num.

The method and apparatus of the present invention overcome the shortcomings of existing VAD algorithms: they improve the efficiency of VAD on non-stationary noise while also improving the accuracy of music detection, so that audio signal processing algorithms that use this VAD can achieve better performance.

Brief description of the drawings

Fig. 1 is a schematic diagram of embodiment 1 of the voice activity detection method of the present invention;

Fig. 2 is a schematic diagram of embodiment 2 of the voice activity detection method of the present invention;

Fig. 3 is a schematic diagram of the process of obtaining the VAD decision result in embodiments 1 and 2 of the present invention;

Fig. 4 is a schematic diagram of the module structure of embodiment 1 of the voice activity detection (VAD) apparatus of the present invention;

Fig. 5 is a schematic diagram of the module structure of embodiment 2 of the voice activity detection (VAD) apparatus of the present invention;

Fig. 6 is a schematic diagram of the module structure of the VAD decision unit in the VAD apparatus of the present invention;

Fig. 7 is a schematic diagram of an embodiment of the background noise detection method of the present invention;

Fig. 8 is a schematic diagram of the module structure of the background noise detection apparatus of the present invention;

Fig. 9 is a schematic diagram of an embodiment of the tonal signal detection method of the present invention;

Fig. 10 is a schematic diagram of the module structure of the tonal signal detection apparatus of the present invention;

Fig. 11 is a schematic diagram of the module structure of the tonal signal judgment unit of the tonal signal detection apparatus of the present invention;

Fig. 12 is a schematic diagram of an embodiment of the method for correcting the current active-sound hangover frame count in the VAD decision of the present invention;

Fig. 13 is a schematic diagram of the module structure of the apparatus for correcting the current active-sound hangover frame count in the VAD decision of the present invention;

Fig. 14 is a schematic diagram of an embodiment of the method for adjusting the SNR threshold in the VAD decision of the present invention;

Fig. 15 is a detailed flow diagram of adjusting the SNR threshold according to the present invention;

Fig. 16 is a schematic diagram of the module structure of the apparatus for adjusting the SNR threshold in the VAD decision of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another.

Embodiment 1 of the voice activity detection (VAD, Voice Activity Detection) method of the present invention, as shown in Fig. 1, includes:

Step 101: Obtain the subband signal and the spectral magnitude of the current frame;

This embodiment is described using an audio stream with a frame length of 20 ms and a sampling rate of 32 kHz as an example. The method of the invention applies equally to other frame lengths and sampling rates.

The time-domain signal of the current frame is fed into a filter bank and subband filtering is performed to obtain the filter bank subband signal;

This embodiment uses a 40-channel filter bank; the invention applies equally to filter banks with other numbers of channels.

The time-domain signal of the current frame is fed into the 40-channel filter bank and subband filtering is performed, giving the filter bank subband signal X[k, l], 0 ≤ k < 40, 0 ≤ l < 16, for 40 subbands at 16 time samples, where k is the filter bank subband index, whose value identifies the subband to which a coefficient belongs, and l is the time sample index within each subband. The implementation steps are as follows:

101a: Store the most recent 640 audio signal samples in a data buffer.

101b: Shift the data in the buffer by 40 positions, so that the 40 oldest samples leave the buffer, and store the 40 new samples at positions 0 to 39.

Multiply the data x in the buffer by the window coefficients to obtain the array z:

$$z[n] = x[n] \cdot W_{qmf}[n], \quad 0 \le n < 640$$

where $W_{qmf}$ denotes the filter bank window coefficients.

An 80-point array u is calculated using the following pseudocode:

The arrays r and i are then obtained from the following equations:

$$r[n] = u[n] - u[79-n], \quad 0 \le n < 40$$

$$i[n] = u[n] + u[79-n], \quad 0 \le n < 40$$

101c: Repeat the calculation of 101b until all data of this frame have been filtered by the filter bank; the final output is the filter bank subband signal X[k, l].

101d: After the above calculation is complete, the filter bank subband signal X[k, l], 0 ≤ k < 40, 0 ≤ l < 16, for 16 time samples of 40 subbands is obtained.
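The buffer handling in steps 101a to 101d can be sketched in Python as follows. This is a minimal illustration rather than the patented filter bank: the function names are hypothetical, the storage order of the 40 new samples is an assumption, and the pseudocode that produces the 80-point array u is not reproduced in this text, so `fold_outputs` simply takes u as an input. A 20 ms frame at 32 kHz holds 640 samples, which matches 40 subbands with 16 time samples each.

```python
FRAME_LEN = 640      # 20 ms at 32 kHz
NUM_CHANNELS = 40    # filter bank channels in this embodiment
NUM_SLOTS = FRAME_LEN // NUM_CHANNELS  # 16 time samples per subband


def push_block(buffer, new_samples):
    """Step 101b, first part: shift the buffer by 40 positions, dropping the
    40 oldest samples, and place the new block at positions 0..39.
    Assumption: the new block is stored in arrival order."""
    assert len(buffer) == FRAME_LEN and len(new_samples) == NUM_CHANNELS
    return list(new_samples) + buffer[:FRAME_LEN - NUM_CHANNELS]


def apply_window(buffer, window):
    """z[n] = x[n] * W_qmf[n], 0 <= n < 640."""
    return [x * w for x, w in zip(buffer, window)]


def fold_outputs(u):
    """r[n] = u[n] - u[79 - n], i[n] = u[n] + u[79 - n], 0 <= n < 40.
    The computation of u itself is not given in the text and is not shown."""
    assert len(u) == 2 * NUM_CHANNELS
    r = [u[n] - u[79 - n] for n in range(NUM_CHANNELS)]
    i = [u[n] + u[79 - n] for n in range(NUM_CHANNELS)]
    return r, i
```

Calling `push_block` sixteen times, once per block of 40 input samples, corresponds to the repetition described in 101c.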

A time-frequency transformation is applied to the filter bank subband signal, and the spectral magnitude is calculated.

The embodiments of the invention can be realized by applying the time-frequency transformation to all or only some of the filter bank subbands and then calculating the spectral magnitude. The time-frequency transformation of the invention may be a DFT, FFT, DCT or DST. This embodiment uses the DFT as an example to illustrate a concrete implementation. The calculation proceeds as follows:

A 16-point DFT is applied to the 16 time samples of each filter bank subband with index 0 to 9, which further increases the spectral resolution, and the magnitude at each frequency bin is calculated to obtain the spectral magnitude $X_{DFT\_AMP}$.

The magnitude at each frequency bin is calculated as follows:

First, the energy of the array $X_{DFT}[k][j]$ at each point is calculated:

$$X_{DFT\_POW}[k, j] = \big(\text{real}(X_{DFT}[k, j])\big)^2 + \big(\text{image}(X_{DFT}[k, j])\big)^2, \quad 0 \le k < 10, \; 0 \le j < 16$$

where $\text{real}(X_{DFT}[k, j])$ and $\text{image}(X_{DFT}[k, j])$ denote the real and imaginary parts of the spectral coefficient $X_{DFT}[k, j]$, respectively.

If k is even, the spectral magnitude at each frequency bin is calculated using the following equation:

If k is odd, the spectral magnitude at each frequency bin is calculated using the following equation:

$X_{DFT\_AMP}$ is the spectral magnitude after the time-frequency transformation.
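As a rough illustration of the first part of this calculation, the sketch below applies a plain 16-point DFT to each of subbands 0 to 9 and computes the per-bin power. It assumes the subband samples may be complex and uses hypothetical function names. The even-k and odd-k magnitude equations are not reproduced in this text, so the sketch stops at the power and does not attempt the final magnitude mapping.

```python
import cmath


def dft16(samples):
    """Plain DFT of one subband's time samples (16 points in this embodiment)."""
    n_pts = len(samples)
    return [
        sum(samples[l] * cmath.exp(-2j * cmath.pi * j * l / n_pts)
            for l in range(n_pts))
        for j in range(n_pts)
    ]


def dft_power(subband_signal, num_subbands=10):
    """X_DFT_POW[k][j] = real(X_DFT[k][j])**2 + image(X_DFT[k][j])**2
    for subbands 0 <= k < num_subbands and every DFT bin j."""
    power = []
    for k in range(num_subbands):
        spectrum = dft16(subband_signal[k])
        power.append([c.real ** 2 + c.imag ** 2 for c in spectrum])
    return power
```

A direct DFT is used here only to keep the example dependency-free; an FFT gives the same values.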

Step 102: Calculate the values of the frame energy parameter and the spectral centroid feature parameter of the current frame from the subband signal;

The values of the frame energy parameter, the spectral centroid feature parameter and the tonality feature parameter can be obtained with prior-art methods. Preferably, each parameter is obtained as follows:

The frame energy parameter is the weighted sum or the direct sum of the subband signal energies. Specifically:

a) Calculate the energy of each filter bank subband from the filter bank subband signal X[k, l]; the equation is as follows:

b) Accumulate the energies of some of the perceptually more sensitive filter bank subbands, or of all filter bank subbands, to obtain the frame energy parameter.

According to the psychoacoustic model, the human ear is relatively insensitive to very low frequencies (for example below 100 Hz) and to high frequencies (for example above 20 kHz). With the filter bank subbands ordered from low to high frequency, the invention therefore treats the subbands from the second subband to the second-to-last subband as the main subbands to which hearing is more sensitive. Accumulating the energies of some or all of these subbands gives frame energy parameter 1; the equation is as follows:

where e_sb_start is the start subband index, with value range [0, 6], and e_sb_end is the end subband index, with a value greater than 6 and less than the total number of subbands.

Frame energy parameter 2 is obtained by adding to frame energy parameter 1 the weighted energies of some or all of the filter bank subbands that were not used in calculating frame energy parameter 1; the equation is as follows:

where e_scale1 and e_scale2 are weighting scale factors, each with value range [0, 1], and num_band is the total number of subbands.
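The equations for the subband energy and the two frame energy parameters are not reproduced in this text. The following sketch shows one reading of the description above, under stated assumptions: the subband energy is taken as the sum of squared magnitudes over the time samples, e_sb_end is treated as inclusive, e_scale1 is assumed to weight the subbands below e_sb_start and e_scale2 those above e_sb_end. The function names are hypothetical.

```python
def subband_energies(X):
    """Energy of each filter bank subband: sum of squared magnitudes over its
    time samples (assumed form; the source equation is not reproduced)."""
    return [sum(abs(v) ** 2 for v in row) for row in X]


def frame_energy_1(e_sb, e_sb_start, e_sb_end):
    """Accumulate the perceptually sensitive subbands e_sb_start..e_sb_end
    (end index assumed inclusive)."""
    return sum(e_sb[e_sb_start:e_sb_end + 1])


def frame_energy_2(e_sb, e_sb_start, e_sb_end, e_scale1, e_scale2):
    """Frame energy 1 plus the weighted energy of the subbands it left out.
    Assumption: e_scale1 weights the subbands below e_sb_start and
    e_scale2 weights the subbands above e_sb_end."""
    low = sum(e_sb[:e_sb_start])
    high = sum(e_sb[e_sb_end + 1:])
    base = frame_energy_1(e_sb, e_sb_start, e_sb_end)
    return base + e_scale1 * low + e_scale2 * high
```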

The spectral centroid feature parameter is obtained either as the ratio of the weighted sum of the filter bank subband energies to their direct sum, or by smoothing other spectral centroid feature parameter values.

The spectral centroid feature parameter can be calculated with the following sub-steps:

a: The subband intervals used for calculating the spectral centroid feature parameter are divided as follows:

b: Using the interval division of a and the following formula, two spectral centroid feature parameter values are calculated: the first-interval spectral centroid feature parameter and the second-interval spectral centroid feature parameter.

Delta1 and Delta2 are each a small offset with value range (0, 1), and k is the spectral centroid index.

c: The first-interval spectral centroid feature parameter sp_center[0] is smoothed to obtain the smoothed spectral centroid feature parameter value, that is, the smoothed value of the first-interval spectral centroid feature parameter. The calculation is as follows:

$$\text{sp\_center}[2] = \text{sp\_center}_{-1}[2] \cdot \text{spc\_sm\_scale} + \text{sp\_center}[0] \cdot (1 - \text{spc\_sm\_scale})$$

where spc_sm_scale is the smoothing scale factor for the spectral centroid parameter, and $\text{sp\_center}_{-1}[2]$ denotes the smoothed spectral centroid feature parameter value of the previous frame, with initial value 1.6.
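The interval table and the centroid formula are not reproduced in this text, so the sketch below is only one plausible reading of the description: a weighted sum of subband energies divided by their direct sum, with Delta1 and Delta2 added as small offsets. The weights 1, 2, 3, ... counted from the start of the interval are an assumption, and the function names are hypothetical. The smoothing step follows the recursion given for sp_center[2].

```python
def spectral_centroid(e_sb, start, end, delta1, delta2):
    """Ratio of the weighted sum to the direct sum of subband energies over
    subbands start..end (inclusive). Assumption: the weights are 1, 2, 3, ...
    counted from the first subband of the interval."""
    band = e_sb[start:end + 1]
    weighted = sum((n + 1) * e for n, e in enumerate(band))
    direct = sum(band)
    return (weighted + delta1) / (direct + delta2)


def smooth_centroid(prev_smoothed, sp_center0, spc_sm_scale):
    """sp_center[2] = sp_center_-1[2] * spc_sm_scale
                      + sp_center[0] * (1 - spc_sm_scale)."""
    return prev_smoothed * spc_sm_scale + sp_center0 * (1.0 - spc_sm_scale)
```

For the first frame, `prev_smoothed` would be the initial value 1.6 stated above; the value of spc_sm_scale is not specified in this text.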

Step 103: Calculate the SNR parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame and the SNR subband energies;

The background noise energy of the previous frame can be obtained by existing methods.

If the current frame is the start frame, the SNR subband background noise energies take default initial values. The SNR subband background noise energy of the previous frame is estimated on the same principle as that of the current frame; for the estimation for the current frame, see step 207 of embodiment 2 below. Specifically, the SNR parameter of the current frame may be calculated with an existing SNR calculation method. Preferably, the following method is used:

First, the filter bank subbands are re-divided into several SNR subbands, with the division indices given in the following table:

Second, the subband average SNR, SNR1, is calculated from the energy of each SNR subband of the current frame and the background noise energy of each SNR subband of the previous frame. The equation is as follows:

where $E_{sb2\_bg}$ is the estimated background noise energy of each SNR subband of the previous frame and num_band is the number of SNR subbands. The background noise energy of the SNR subbands of the previous frame is obtained on the same principle as that of the current frame; for the process for the current frame, see step 207 of embodiment 2 below;

Finally, the full-band SNR, SNR2, is calculated from the estimated full-band background noise energy of the previous frame and the frame energy parameter of the current frame:

where $E_{t\_bg}$ is the estimated full-band background noise energy of the previous frame. It is obtained on the same principle as the full-band background noise energy of the current frame; for the process for the current frame, see step 207 of embodiment 2 below;

In this embodiment the SNR parameters include the subband average SNR, SNR1, and the full-band SNR, SNR2. The full-band background noise energy and the background noise energy of each subband are collectively referred to as the background noise energy.

Step 104: Calculate the VAD decision result from the tonality flag, the SNR parameter, the spectral centroid feature parameter and the frame energy parameter.
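The SNR1 and SNR2 equations are not reproduced in this text. The sketch below illustrates the general idea only, assuming a base-2 logarithm of the energy ratio, averaged over the SNR subbands for SNR1; both the logarithm and its base are assumptions, and the function names are hypothetical. Inputs are assumed to be strictly positive.

```python
import math


def subband_average_snr(e_sb2, e_sb2_bg):
    """SNR1 sketch: mean over the SNR subbands of
    log2(current subband energy / previous-frame background noise energy).
    Assumption: the log2 form is not confirmed by the source text."""
    num_band = len(e_sb2)
    total = sum(math.log2(e / bg) for e, bg in zip(e_sb2, e_sb2_bg))
    return total / num_band


def full_band_snr(frame_energy, e_t_bg):
    """SNR2 sketch: log2(current frame energy parameter /
    previous-frame full-band background noise energy)."""
    return math.log2(frame_energy / e_t_bg)
```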

Embodiment 2

In embodiment 2 of the voice activity detection (VAD) method of the present invention, the input audio signal is divided into frames and passed through a polyphase filter bank to obtain the filter bank subband signal; a time-frequency transformation is then applied to the filter bank subband signal and the spectral magnitude is calculated. Signal features are extracted from the filter bank subband signals and from the spectral magnitude to obtain the feature parameter values. The background noise flag and the tonality flag of the current frame are calculated from the feature parameter values. The SNR parameter of the current frame is calculated from the current frame energy parameter value and the background noise energy. Whether the current frame is an active-sound frame is decided from the calculated SNR parameter of the current frame, the VAD decision result of the previous frame and the feature parameters. The background noise flag is corrected according to the active-sound frame decision to obtain a new background noise flag, and whether to update the background noise is decided according to the new flag. The detailed VAD procedure is as follows:

As shown in Fig. 2, embodiment 2 of the method includes:

Step 201: Obtain the subband signal and the spectral magnitude of the current frame;

Step 202: Calculate the values of the current frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signal; calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral magnitude;

The frame energy parameter is the weighted sum or the direct sum of the subband signal energies;

The spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of the energies of all or some of the subband signals;

Specifically,

The spectral centroid feature parameter is calculated from the energy of each filter bank subband: it is either the ratio of the weighted sum of the filter bank subband energies to their direct sum, or the result of smoothing other spectral centroid feature parameter values.

$$\text{sp\_center}[2] = \text{sp\_center}_{-1}[2] \cdot \text{spc\_sm\_scale} + \text{sp\_center}[0] \cdot (1 - \text{spc\_sm\_scale})$$

where spc_sm_scale is the smoothing scale factor for the spectral centroid parameter, and $\text{sp\_center}_{-1}[2]$ denotes the smoothed spectral centroid feature parameter value of the previous frame, with initial value 1.6.

The time-domain stability feature parameter is the ratio of the variance of the amplitude sums to the expected value of the squared amplitude sums, or that ratio multiplied by a coefficient;

Specifically,

The time-domain stability feature parameter is calculated from the frame energy parameters of the most recent several frames. In this embodiment it is calculated from the frame energy parameters of the most recent 40 frames. The calculation steps are:

First, calculate the energy magnitudes of the most recent 40 frames; the equation is as follows:

where e_offset is an offset with value range [0, 0.1].

Second, add the energy magnitudes of adjacent pairs of frames, from the current frame back to the 40th preceding frame, to obtain 20 amplitude sums. The equation is as follows:

$$Amp_{t2}(n) = Amp_{t1}(-2n) + Amp_{t1}(-2n-1), \quad 0 \le n < 20$$

where $Amp_{t1}(n)$ with n = 0 denotes the energy magnitude of the current frame, and with n < 0 denotes the energy magnitude of the frame |n| frames before the current frame.

Finally, the ratio of the variance of the 20 most recent amplitude sums to their average energy gives the time-domain stability feature parameter 1td_stable_rate0. The equation is as follows:
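The final equation is not reproduced in this text. The sketch below follows the verbal definition given above, that is, the variance of the 20 amplitude sums divided by the mean of their squares. The use of the population variance is an assumption, and the function name is hypothetical. The energy magnitudes are taken as input because their defining equation is also missing.

```python
def time_domain_stability(amp_t1):
    """amp_t1[m] is the energy magnitude of the frame m frames back
    (amp_t1[0] is the current frame); 40 values are expected."""
    assert len(amp_t1) == 40
    # Amp_t2(n) = Amp_t1(-2n) + Amp_t1(-2n-1), 0 <= n < 20
    amp_t2 = [amp_t1[2 * n] + amp_t1[2 * n + 1] for n in range(20)]
    mean = sum(amp_t2) / 20
    # Population variance (assumption) and mean of squares ("average energy")
    variance = sum((a - mean) ** 2 for a in amp_t2) / 20
    mean_square = sum(a * a for a in amp_t2) / 20
    return variance / mean_square
```

A perfectly steady signal gives 0, and larger values indicate stronger frame-to-frame fluctuation.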

The spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral magnitudes, or that ratio multiplied by a coefficient;

Specifically, the spectral magnitude $X_{DFT\_AMP}$ is divided into several frequency bands, and the spectral flatness of each band of the current frame is calculated to obtain the spectral flatness feature parameters of the current frame.

This embodiment divides the spectral magnitude into 3 frequency bands and calculates the spectral flatness feature of each; the steps are as follows:

First, divide $X_{DFT\_AMP}$ into 3 frequency bands according to the indices in the following table.

Second, calculate the spectral flatness of each band to obtain the spectral flatness feature parameters of the current frame. The equation for each spectral flatness feature parameter value of the current frame is as follows:

Finally, smooth the spectral flatness feature parameters of the current frame to obtain the final spectral flatness feature parameters of the current frame:

$$sSMR(k) = \text{smr\_scale} \cdot sSMR_{-1}(k) + (1 - \text{smr\_scale}) \cdot SMR(k), \quad 0 \le k < 3$$

where smr_scale is the smoothing factor, with value range [0.6, 1], and $sSMR_{-1}(k)$ is the value of the k-th spectral flatness feature parameter of the previous frame.
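A minimal sketch of the flatness calculation, assuming strictly positive magnitudes and taking the band boundaries as an argument because the band table is not reproduced in this text. The function names and the `bands` parameter are hypothetical; the smoothing step follows the sSMR recursion above.

```python
import math


def spectral_flatness(amplitudes):
    """Geometric mean divided by arithmetic mean of positive magnitudes."""
    n = len(amplitudes)
    geometric = math.exp(sum(math.log(a) for a in amplitudes) / n)
    arithmetic = sum(amplitudes) / n
    return geometric / arithmetic


def smoothed_flatness(x_dft_amp, bands, prev_ssmr, smr_scale):
    """sSMR(k) = smr_scale * sSMR_-1(k) + (1 - smr_scale) * SMR(k) per band.
    `bands` is a list of (start, end) index pairs with end inclusive;
    the actual band limits are not given in this text."""
    result = []
    for k, (start, end) in enumerate(bands):
        smr = spectral_flatness(x_dft_amp[start:end + 1])
        result.append(smr_scale * prev_ssmr[k] + (1.0 - smr_scale) * smr)
    return result
```

A flat spectrum gives a flatness close to 1, while a spectrum dominated by a few peaks gives a value close to 0.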

The tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two consecutive frames, or by further applying smoothing filtering to that correlation value.

Specifically, the correlation value of the intra-frame spectral difference coefficients of two consecutive frames is calculated as follows:

The tonality feature parameter is calculated from the spectral amplitudes; it may be calculated from all of the spectral amplitudes or from part of them.

The calculation steps are as follows:

a. A difference operation is performed between adjacent spectral amplitudes over part (no fewer than 8 spectral coefficients) or all of the spectral amplitudes, and difference results less than 0 are set to 0, yielding a set of non-negative spectral difference coefficients.

This embodiment takes the frequency-bin coefficients with position indices 3 to 61 as an example for calculating the tonality feature parameter. The detailed process is as follows:

A difference operation is performed on the adjacent spectral amplitudes from frequency bin 3 to frequency bin 61. The equation is as follows:

spec_dif[n-3] = X_{DFT_AMP}(n+1) - X_{DFT_AMP}(n); 3 ≤ n < 62;

Values in spec_dif that are less than 0 are set to zero.

b. The correlation coefficient between the non-negative spectral difference coefficients of the current frame, calculated in step a, and the non-negative spectral difference coefficients of the previous frame is computed to obtain the first tonality feature parameter value. The calculation equation is as follows:

where pre_spec_dif denotes the non-negative spectral difference coefficients of the previous frame.

c. A smoothing operation is applied to the first tonality feature parameter value to obtain the second tonality feature parameter value. The calculation equation is as follows:

tonality_rate2 = tonal_scale · tonality_rate2_-1 + (1 - tonal_scale) · tonality_rate1

where tonal_scale is the smoothing factor of the tonality feature parameter, with a value range of [0.1, 1], and tonality_rate2_-1 is the second tonality feature parameter value of the previous frame, whose initial value lies in [0, 1].
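Steps a to c can be sketched as below. The correlation equation of step b is not reproduced in this text, so the sketch assumes a normalised cross-correlation; that choice, the function names and the default tonal_scale of 0.9 are assumptions made for illustration only.

```python
import math

def nonneg_spectral_diff(x_dft_amp, first=3, last=61):
    """Step a: spec_dif[n - first] = X(n + 1) - X(n), negatives set to 0."""
    return [max(x_dft_amp[n + 1] - x_dft_amp[n], 0.0)
            for n in range(first, last + 1)]

def tonality_rate1(spec_dif, pre_spec_dif):
    """Step b (assumed form): normalised correlation of the two vectors."""
    num = sum(a * b for a, b in zip(spec_dif, pre_spec_dif))
    den = math.sqrt(sum(a * a for a in spec_dif) *
                    sum(b * b for b in pre_spec_dif))
    return num / den if den > 0.0 else 0.0

def tonality_rate2(prev_rate2, rate1, tonal_scale=0.9):
    """Step c: first-order smoothing of the first tonality value."""
    return tonal_scale * prev_rate2 + (1.0 - tonal_scale) * rate1
```

With this assumed correlation, a stationary tone produces nearly identical difference vectors in consecutive frames and a first tonality value near 1, whereas noise produces values well below 1.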

Step 203: Calculate the SNR parameters of the current frame from the background noise energy estimated from the previous frame, the frame energy parameter of the current frame and the SNR sub-band energies;

Step 204: Calculate the initial background noise flag and the tonality flag of the current frame from the frame energy parameter, spectral centroid feature parameter, time-domain stability feature parameter, spectral flatness feature parameter and tonality feature parameter of the current frame;

Step 205: Calculate the VAD decision result from the tonality flag, the SNR parameters, the spectral centroid feature parameter and the frame energy parameter;

Specifically, for the implementation of step 205, see the description below in conjunction with Fig. 3.

Understandably, the order of the steps preceding the VAD decision of step 205 may be changed as long as the parameters involved have no causal dependency on one another; for example, step 204, which obtains the initial background noise flag and the tonality flag, may be performed before the SNR calculation of step 203.

The initial background noise flag of the current frame, once corrected, is needed for calculating the SNR parameters of the next frame; the operation of obtaining the initial background noise flag of the current frame may therefore also be performed after the VAD decision.

Step 206: Correct the initial background noise flag according to the VAD decision result of the current frame, the tonality feature parameter, the SNR parameters, the tonality flag and the time-domain stability feature parameter;

If the SNR parameter SNR2 is less than a set threshold SNR2_redec_thr1, SNR1 is less than SNR1_redec_thr1, the VAD flag vad_flag equals 0, the tonality feature parameter tonality_rate2 is less than tonality_rate2_thr1, the tonality flag tonality_flag equals 0, and the time-domain stability feature parameter lt_stable_rate0 is less than lt_stable_rate0_redec_thr1 (set to 0.1), then the background noise flag is assigned the value 1.
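The correction rule of step 206 amounts to a single conjunction of conditions, sketched below. Only the value 0.1 for lt_stable_rate0_redec_thr1 comes from the text; the other entries of EXAMPLE_THR and the function name are hypothetical values chosen for illustration.

```python
# Only lt_stable_rate0_redec_thr1 = 0.1 is stated in the text; the other
# threshold values are hypothetical placeholders.
EXAMPLE_THR = {
    "SNR2_redec_thr1": 1.0,
    "SNR1_redec_thr1": 1.0,
    "tonality_rate2_thr1": 0.5,
    "lt_stable_rate0_redec_thr1": 0.1,
}

def correct_background_flag(background_flag, snr1, snr2, vad_flag,
                            tonality_rate2, tonality_flag,
                            lt_stable_rate0, thr=EXAMPLE_THR):
    """Force the background noise flag to 1 when every condition holds."""
    if (snr2 < thr["SNR2_redec_thr1"]
            and snr1 < thr["SNR1_redec_thr1"]
            and vad_flag == 0
            and tonality_rate2 < thr["tonality_rate2_thr1"]
            and tonality_flag == 0
            and lt_stable_rate0 < thr["lt_stable_rate0_redec_thr1"]):
        return 1
    # Otherwise the initial flag is left unchanged.
    return background_flag
```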

Step 207: Obtain the background noise energy of the current frame from the corrected value of the background noise flag, the frame energy parameter of the current frame and the full-band background noise energy of the previous frame; the background noise energy of the current frame is used for calculating the SNR parameters of the next frame.

Whether to update the background noise is decided by the background noise flag: if the background noise flag is 1, the background noise is updated according to the ratio of the estimated full-band background noise energy to the energy of the current frame signal. Background noise energy estimation comprises sub-band background noise energy estimation and full-band background noise energy estimation.

a. The sub-band background noise energy estimation equation is as follows:

E_{sb2_bg}(k) = E_{sb2_bg_pre}(k) · α_{bg_e} + E_{sb2}(k) · (1 - α_{bg_e}); 0 ≤ k < num_sb

where num_sb is the number of frequency-domain sub-bands, and E_{sb2_bg_pre}(k) denotes the sub-band background noise energy of the k-th SNR sub-band of the previous frame.

α_{bg_e} is the background noise update factor, whose value is determined by the full-band background noise energy of the previous frame and the frame energy parameter of the current frame. The calculation is as follows:

If the full-band background noise energy E_{t_bg} of the previous frame is less than the frame energy parameter E_t1 of the current frame, α_{bg_e} takes the value 0.96; otherwise it takes the value 0.95.

b. Full-band background noise energy estimation:

If the background noise flag of the current frame is 1, the background noise energy accumulated value E_{t_sum} and the background noise energy accumulated frame count N_{Et_counter} are updated. The calculation equations are as follows:

E_{t_sum}=E_{t_sum_-1}+E_t1；

N_{Et_counter}=N_{Et_counter_-1}+1；

where E_{t_sum_-1} is the background noise energy accumulated value of the previous frame, and N_{Et_counter_-1} is the background noise energy accumulated frame count calculated for the previous frame.

c. The full-band background noise energy is obtained as the ratio of the background noise energy accumulated value E_{t_sum} to the accumulated frame count N_{Et_counter}:

E_{t_bg} = E_{t_sum} / N_{Et_counter}

It is then determined whether N_{Et_counter} equals 64; if it does, the background noise energy accumulated value E_{t_sum} and the accumulated frame count N_{Et_counter} are each multiplied by 0.75.

d. The sub-band background noise energy and the background noise energy accumulated value are adjusted according to the tonality flag, the frame energy parameter and the value of the full-band background noise energy. The calculation is as follows:

If the tonality flag tonality_flag equals 1 and the value of the frame energy parameter E_t1 is less than the value of the background noise energy feature parameter E_{t_bg} multiplied by a gain coefficient gain,

then E_{t_sum} = E_{t_sum} · gain + delta; E_{sb2_bg}(k) = E_{sb2_bg}(k) · gain + delta;

where the value range of gain is [0.3, 1].
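Sub-steps a to d of the background noise update can be sketched as one function. Several points are assumptions rather than statements of the text: the second term of the sub-band recursion is taken to be the current frame's SNR sub-band energy (passed as e_sb2), the sub-steps are executed in the order a, b, c, d, the full-band energy computed in c is the one used in d, and delta defaults to 0.0 because no value is given for it. The state dictionary layout and the function name are likewise invented for the example.

```python
def update_background_noise(state, e_t1, e_sb2, background_flag,
                            tonality_flag, gain=0.5, delta=0.0):
    """Return an updated copy of the background noise state.

    state keys: e_sb2_bg (list), e_t_sum, n_counter, e_t_bg.
    """
    s = dict(state, e_sb2_bg=list(state["e_sb2_bg"]))
    if background_flag == 1:
        # a. Sub-band update; alpha depends on the previous full-band energy.
        alpha = 0.96 if s["e_t_bg"] < e_t1 else 0.95
        s["e_sb2_bg"] = [p * alpha + e * (1.0 - alpha)
                         for p, e in zip(s["e_sb2_bg"], e_sb2)]
        # b. Accumulate the frame energy and the frame count.
        s["e_t_sum"] += e_t1
        s["n_counter"] += 1
    # c. Full-band energy is the accumulated value divided by the count.
    if s["n_counter"] > 0:
        s["e_t_bg"] = s["e_t_sum"] / s["n_counter"]
    if s["n_counter"] == 64:
        s["e_t_sum"] *= 0.75
        s["n_counter"] *= 0.75
    # d. Pull the estimate down for low-energy tonal frames.
    if tonality_flag == 1 and e_t1 < s["e_t_bg"] * gain:
        s["e_t_sum"] = s["e_t_sum"] * gain + delta
        s["e_sb2_bg"] = [b * gain + delta for b in s["e_sb2_bg"]]
    return s
```

Scaling both the accumulated value and the count by 0.75 at 64 frames keeps their ratio unchanged while giving newer frames more weight in later updates.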

In embodiment 1 and embodiment 2, the process of calculating the VAD decision result from the tonality flag, the SNR parameters, the spectral centroid feature parameter and the frame energy parameter is shown in Fig. 3 and includes the following steps:

Step 301: Calculate the long-time SNR lt_snr from the ratio of the average long-time active sound signal energy to the average long-time background noise energy, both calculated for the previous frame;

For the calculation and definition of the average long-time active sound signal energy E_fg and the average long-time background noise energy E_bg, see step 307. The equation for the long-time SNR lt_snr is as follows:

In this equation, the long-time SNR lt_snr is expressed logarithmically.

Step 302: Calculate the average of the full-band SNR SNR2 over several most recent frames to obtain the average full-band SNR SNR2_lt_ave;

The calculation equation is as follows:

SNR2(n) denotes the value of the full-band SNR SNR2 of the n-th frame before the current frame, and F_num is the total number of frames used to calculate the average, with a value range of [8, 64].

Step 303: Obtain the SNR threshold snr_thr for the VAD decision from the spectral centroid feature parameter, the long-time SNR lt_snr, the number of preceding continuous active sound frames continuous_speech_num and the number of preceding continuous noise frames continuous_noise_num;

The specific implementation steps are as follows:

First, the initial value of the SNR threshold snr_thr is set within the range [0.1, 2], preferably to 1.06.

Second, the value of the SNR threshold snr_thr is adjusted for the first time according to the spectral centroid feature parameter, as follows: if the value of the spectral centroid feature parameter sp_center[2] is greater than a set threshold spc_vad_dec_thr1, an offset is added to snr_thr, the offset preferably being 0.05; otherwise, if sp_center[1] is greater than spc_vad_dec_thr2, an offset is added to snr_thr, the offset preferably being 0.10; otherwise, an offset is added to snr_thr, the offset preferably being 0.40. The value range of the thresholds spc_vad_dec_thr1 and spc_vad_dec_thr2 is [1.2, 2.5].

Third, the value of snr_thr is adjusted a second time according to the number of preceding continuous active sound frames continuous_speech_num, the number of preceding continuous noise frames continuous_noise_num, the average full-band SNR SNR2_lt_ave and the long-time SNR lt_snr. If continuous_speech_num is greater than a set threshold cpn_vad_dec_thr1, 0.2 is subtracted from snr_thr; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr2 and SNR2_lt_ave is greater than an offset plus the long-time SNR lt_snr multiplied by the coefficient lt_tsnr_scale, an offset is added to snr_thr, the offset preferably being 0.1; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr3, an offset is added to snr_thr, the offset preferably being 0.2; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr4, an offset is added to snr_thr, the offset preferably being 0.1. The value range of the thresholds cpn_vad_dec_thr1, cpn_vad_dec_thr2, cpn_vad_dec_thr3 and cpn_vad_dec_thr4 is [2, 500], and the value range of the coefficient lt_tsnr_scale is [0, 2]. The invention can also be implemented by skipping this step and proceeding directly to the final step.

Finally, the SNR threshold snr_thr is given a final adjustment according to the value of the long-time SNR lt_snr, yielding the SNR threshold snr_thr of the current frame.

The update equation is as follows:

snr_thr = snr_thr + (lt_tsnr - thr_offset) · thr_scale;

where thr_offset is an offset with a value range of [0.5, 3], and thr_scale is a gain coefficient with a value range of [0.1, 1].
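The four stages of step 303 can be sketched as follows. The offsets 0.05, 0.10, 0.40, 0.2 and 0.1 and the initial value 1.06 are the preferred values given in the text; every entry of EXAMPLE_PARAMS other than the initial value is a hypothetical pick inside the stated ranges, snr2_offset is an invented name for the unnamed offset in the second adjustment, and lt_tsnr in the update equation is assumed to be the same quantity as lt_snr.

```python
# Hypothetical parameter choices inside the ranges stated in the text.
EXAMPLE_PARAMS = {
    "snr_thr_init": 1.06,
    "spc_vad_dec_thr1": 1.4, "spc_vad_dec_thr2": 1.3,
    "cpn_vad_dec_thr1": 4, "cpn_vad_dec_thr2": 10,
    "cpn_vad_dec_thr3": 20, "cpn_vad_dec_thr4": 5,
    "snr2_offset": 1.0, "lt_tsnr_scale": 0.5,
    "thr_offset": 1.5, "thr_scale": 0.2,
}

def snr_threshold(sp_center, lt_snr, snr2_lt_ave, continuous_speech_num,
                  continuous_noise_num, p=EXAMPLE_PARAMS):
    """Compute the VAD decision threshold snr_thr for the current frame."""
    snr_thr = p["snr_thr_init"]
    # First adjustment: spectral centroid.
    if sp_center[2] > p["spc_vad_dec_thr1"]:
        snr_thr += 0.05
    elif sp_center[1] > p["spc_vad_dec_thr2"]:
        snr_thr += 0.10
    else:
        snr_thr += 0.40
    # Second adjustment (optional in the text): run lengths and SNRs.
    if continuous_speech_num > p["cpn_vad_dec_thr1"]:
        snr_thr -= 0.2
    elif (continuous_noise_num > p["cpn_vad_dec_thr2"]
          and snr2_lt_ave > p["snr2_offset"] + lt_snr * p["lt_tsnr_scale"]):
        snr_thr += 0.1
    elif continuous_noise_num > p["cpn_vad_dec_thr3"]:
        snr_thr += 0.2
    elif continuous_noise_num > p["cpn_vad_dec_thr4"]:
        snr_thr += 0.1
    # Final adjustment: long-time SNR.
    snr_thr += (lt_snr - p["thr_offset"]) * p["thr_scale"]
    return snr_thr
```

The threshold drops during a run of active frames and rises during a run of noise frames, so the decision is biased towards the recent state.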

Step 304: Calculate the initial VAD decision from the VAD decision threshold snr_thr and the SNR parameters SNR1 and SNR2 calculated for the current frame;

The calculation is as follows:

If SNR1 is greater than the decision threshold snr_thr, the current frame is judged to be an active sound frame. The value of the VAD flag vad_flag indicates whether the current frame is an active sound frame; in this embodiment the value 1 indicates that the current frame is an active sound frame and 0 that it is an inactive sound frame. Otherwise, the current frame is judged to be an inactive sound frame and the VAD flag vad_flag is set to 0.

If SNR2 is greater than a set threshold snr2_thr, the current frame is judged to be an active sound frame and the VAD flag vad_flag is set to 1. The value range of snr2_thr is [1.2, 5.0].
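The initial decision reduces to two comparisons, sketched below. The default snr2_thr of 1.5 is a hypothetical value inside the stated range [1.2, 5.0], and the function name is invented for the example.

```python
def initial_vad(snr1, snr2, snr_thr, snr2_thr=1.5):
    """Initial VAD decision: 1 for an active sound frame, 0 otherwise."""
    vad_flag = 1 if snr1 > snr_thr else 0
    # The full-band SNR can only raise the decision to active, never lower it.
    if snr2 > snr2_thr:
        vad_flag = 1
    return vad_flag
```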

Step 305: Correct the VAD decision result according to the tonality flag, the average full-band SNR SNR2_lt_ave, the spectral centroid and the long-time SNR lt_snr;

The details are as follows:

If the tonality flag indicates that the current frame is a tonal signal, i.e. tonality_flag is 1, the current frame is judged to be an active sound signal and vad_flag is set to 1.

If the average full-band SNR SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr1 plus the long-time SNR lt_snr multiplied by the coefficient lt_tsnr_tscale, the current frame is judged to be an active sound frame and vad_flag is set to 1.

In this embodiment, the value range of SNR2_lt_ave_t_thr1 is [1, 4] and the value range of lt_tsnr_tscale is [0.1, 0.6].

If the average full-band SNR SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr2, the spectral centroid feature parameter sp_center[2] is greater than a set threshold sp_center_t_thr1, and the long-time SNR lt_snr is less than a set threshold lt_tsnr_t_thr1, the current frame is judged to be an active sound frame and vad_flag is set to 1. The value range of SNR2_lt_ave_t_thr2 is [1.0, 2.5], the value range of sp_center_t_thr1 is [2.0, 4.0], and the value range of lt_tsnr_t_thr1 is [2.5, 5.0].

If SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr3, the spectral centroid feature parameter sp_center[2] is greater than a set threshold sp_center_t_thr2, and the long-time SNR lt_snr is less than a set threshold lt_tsnr_t_thr2, the current frame is judged to be an active sound frame and vad_flag is set to 1. The value range of SNR2_lt_ave_t_thr3 is [0.8, 2.0], the value range of sp_center_t_thr2 is [2.0, 4.0], and the value range of lt_tsnr_t_thr2 is [2.5, 5.0].

If SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr4, the spectral centroid feature parameter sp_center[2] is greater than a set threshold sp_center_t_thr3, and the long-time SNR lt_snr is less than a set threshold lt_tsnr_t_thr3, the current frame is judged to be an active sound frame and vad_flag is set to 1. The value range of SNR2_lt_ave_t_thr4 is [0.6, 2.0], the value range of sp_center_t_thr3 is [3.0, 6.0], and the value range of lt_tsnr_t_thr3 is [2.5, 5.0].
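The correction rules of step 305 can each only raise the decision to active, which the following sketch makes explicit. All numeric values in EXAMPLE_CORR are hypothetical picks inside the stated ranges; the "triples" list groups the three threshold sets (SNR2_lt_ave, sp_center[2], lt_snr) of the last three rules, and that grouping, like the function name, is an illustrative choice.

```python
# Hypothetical thresholds chosen inside the ranges stated in the text.
EXAMPLE_CORR = {
    "SNR2_lt_ave_t_thr1": 2.0,
    "lt_tsnr_tscale": 0.3,
    # (SNR2_lt_ave threshold, sp_center[2] threshold, lt_snr threshold)
    "triples": [(1.5, 3.0, 3.0), (1.2, 3.0, 3.0), (1.0, 4.0, 3.0)],
}

def correct_vad(vad_flag, tonality_flag, snr2_lt_ave, sp_center2, lt_snr,
                p=EXAMPLE_CORR):
    """Apply the step 305 rules; each rule can only set vad_flag to 1."""
    if tonality_flag == 1:
        return 1
    if snr2_lt_ave > p["SNR2_lt_ave_t_thr1"] + lt_snr * p["lt_tsnr_tscale"]:
        return 1
    for snr_thr, centroid_thr, lt_thr in p["triples"]:
        if (snr2_lt_ave > snr_thr and sp_center2 > centroid_thr
                and lt_snr < lt_thr):
            return 1
    return vad_flag
```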

Step 306: Correct the active sound hangover frame count according to the decision results of several preceding frames, the long-time SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameters of the current frame and the VAD decision result of the current frame;

The specific calculation steps are as follows:

The precondition for correcting the current active sound hangover frame count is that the active sound flag indicates that the current frame is an active sound frame. If this condition is not met, the value of the current active sound hangover frame count num_speech_hangover is not corrected and the process proceeds directly to step 307.

The active sound hangover frame count is corrected as follows:

If the number of preceding continuous active sound frames continuous_speech_num is less than a set threshold continuous_speech_num_thr1 and lt_tsnr is less than a set threshold lt_tsnr_h_thr1, the current active sound hangover frame count num_speech_hangover is set to the minimum continuous active sound frame count minus continuous_speech_num. Otherwise, if SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_thr1 and continuous_speech_num is greater than a set threshold continuous_speech_num_thr2, the value of num_speech_hangover is set according to the magnitude of the long-time SNR lt_tsnr. Otherwise, the value of num_speech_hangover is not corrected. In this embodiment the minimum continuous active sound frame count is 8; it may take a value in [6, 20].

The details are as follows:

If the long-time SNR lt_snr is greater than 2.6, the value of num_speech_hangover is 3; otherwise, if the long-time SNR lt_snr is greater than 1.6, the value of num_speech_hangover is 4; otherwise, the value of num_speech_hangover is 5.

Step 307: Add active sound hangover according to the decision result of the current frame and the active sound hangover frame count num_speech_hangover, and obtain the VAD decision result of the current frame.

The method is as follows:

If the current frame is judged to be inactive sound, i.e. the active sound flag is 0, and the active sound hangover frame count num_speech_hangover is greater than 0, active sound hangover is added: the active sound flag is set to 1 and the value of num_speech_hangover is decremented by 1.

This yields the final VAD decision result of the current frame.
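Steps 306 and 307 can be sketched as two small functions. The values 8, 2.6, 1.6 and the hangover counts 3, 4 and 5 come from the text; the thresholds in EXAMPLE_HANG are hypothetical because no values are given for them, and lt_snr and lt_tsnr are assumed to be the same quantity.

```python
# Hypothetical thresholds; the text gives no values for these.
EXAMPLE_HANG = {
    "continuous_speech_num_thr1": 8,
    "lt_tsnr_h_thr1": 4.0,
    "SNR2_lt_ave_thr1": 1.5,
    "continuous_speech_num_thr2": 20,
}

def update_hangover(num_speech_hangover, vad_flag, continuous_speech_num,
                    lt_snr, snr2_lt_ave, p=EXAMPLE_HANG,
                    min_active_frames=8):
    """Step 306: correct the hangover count (only on active frames)."""
    if vad_flag != 1:
        return num_speech_hangover
    if (continuous_speech_num < p["continuous_speech_num_thr1"]
            and lt_snr < p["lt_tsnr_h_thr1"]):
        return min_active_frames - continuous_speech_num
    if (snr2_lt_ave > p["SNR2_lt_ave_thr1"]
            and continuous_speech_num > p["continuous_speech_num_thr2"]):
        if lt_snr > 2.6:
            return 3
        if lt_snr > 1.6:
            return 4
        return 5
    return num_speech_hangover

def apply_hangover(vad_flag, num_speech_hangover):
    """Step 307: keep the frame active while hangover frames remain."""
    if vad_flag == 0 and num_speech_hangover > 0:
        return 1, num_speech_hangover - 1
    return vad_flag, num_speech_hangover
```

A cleaner signal (higher long-time SNR) gets a shorter hangover, since the end of speech can be located more reliably.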

Preferably, after step 304 the method further includes calculating the average long-time active sound signal energy E_fg from the initial VAD decision result; after step 307 it further includes calculating the average long-time background noise energy E_bg from the VAD decision result, the calculated value being used for the VAD decision of the next frame.

The average long-time active sound signal energy E_fg is calculated as follows:

a) If the initial VAD decision result indicates that the current frame is an active sound frame, i.e. the value of the VAD flag is 1, and E_t1 is greater than several times E_bg (6 times in this embodiment), the average long-time active sound energy accumulated value fg_energy and the average long-time active sound energy accumulated frame count fg_energy_count are updated. The update adds E_t1 to fg_energy to obtain the new fg_energy, and adds 1 to fg_energy_count to obtain the new fg_energy_count.

b) To ensure that the average long-time active sound signal energy reflects the most recent active sound signal energy, if the average long-time active sound energy accumulated frame count equals a set value fg_max_frame_num, the accumulated frame count and the accumulated value are both multiplied by an attenuation coefficient attenu_coef1. In this embodiment fg_max_frame_num is 512 and attenu_coef1 is 0.75.

c) The average long-time active sound signal energy is obtained by dividing the average long-time active sound energy accumulated value fg_energy by the average long-time active sound energy accumulated frame count. The calculation equation is as follows:

The average long-time background noise energy E_bg is calculated as follows:

Let bg_energy_count be the accumulated frame count of the background noise energy, which records how many frames of energy the most recent background noise energy accumulated value contains, and let bg_energy be the accumulated value of the most recent background noise energy.

a) If the current frame is judged to be an inactive sound frame, i.e. the value of the VAD flag is 0, and SNR2 is less than 1.0, the background noise energy accumulated value bg_energy and the background noise energy accumulated frame count bg_energy_count are updated. The update adds E_t1 to bg_energy to obtain the new bg_energy, and adds 1 to bg_energy_count to obtain the new bg_energy_count.

b) If the background noise energy accumulated frame count bg_energy_count equals the maximum frame count used for the average long-time background noise energy calculation, the accumulated frame count and the accumulated value are both multiplied by an attenuation coefficient attenu_coef2. In this embodiment the maximum frame count for the average long-time background noise energy calculation is 512 and attenu_coef2 equals 0.75.

c) The average long-time background noise energy is obtained by dividing the background noise energy accumulated value bg_energy by the background noise energy accumulated frame count. The calculation equation is as follows:
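Both long-time energies follow the same accumulate, attenuate and divide pattern, which the sketch below factors into one small class. The class and function names are invented for the example, and combining the two updates into a single call is a simplification: in the method, E_fg is updated after step 304 and E_bg after step 307.

```python
class LongTimeEnergy:
    """Running average with attenuation once max_frames is reached."""

    def __init__(self, max_frames=512, attenuation=0.75):
        self.max_frames = max_frames
        self.attenuation = attenuation
        self.energy = 0.0   # accumulated value (fg_energy / bg_energy)
        self.count = 0.0    # accumulated frame count

    def add(self, e_t1):
        self.energy += e_t1
        self.count += 1
        if self.count == self.max_frames:
            # Scale both so the average is kept but old frames weigh less.
            self.energy *= self.attenuation
            self.count *= self.attenuation

    def average(self):
        return self.energy / self.count if self.count > 0 else 0.0

def update_long_time(fg, bg, e_t1, initial_vad_flag, final_vad_flag, snr2,
                     fg_factor=6.0):
    """Update E_fg from the initial decision and E_bg from the final one."""
    if initial_vad_flag == 1 and e_t1 > fg_factor * bg.average():
        fg.add(e_t1)
    if final_vad_flag == 0 and snr2 < 1.0:
        bg.add(e_t1)
```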

To implement embodiments 1 and 2 of the above active sound detection method, the present invention also provides embodiment 1 of an active sound detection (VAD) device. As shown in Fig. 4, the device includes:

A filter bank, configured to obtain the sub-band signals of the current frame;

A feature parameter acquisition unit, configured to calculate the values of the frame energy parameter and the spectral centroid feature parameter of the current frame from the sub-band signals;

Corresponding to method embodiment 2, the feature parameter acquisition unit is further configured to calculate the value of the time-domain stability feature parameter from the sub-band signals, and to calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

Each feature parameter may be obtained by existing methods, or by the following methods:

The spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or part of the sub-band signal energies, or the value obtained by applying smoothing filtering to this ratio;

The tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two consecutive frames, or by further applying smoothing filtering to that correlation value.

As shown in Fig. 5, embodiment 2 of the active sound detection (VAD) device of the invention differs from embodiment 1 in that the device further includes a flag calculation unit and a background noise energy processing unit, wherein:

The flag calculation unit is configured to calculate the tonality flag of the current frame from the frame energy parameter, spectral centroid feature parameter, time-domain stability feature parameter, spectral flatness feature parameter and tonality feature parameter of the current frame;

The background noise energy processing unit includes:

A flag calculation module, configured to calculate the initial background noise flag of the current frame from the frame energy parameter, spectral centroid feature parameter, time-domain stability feature parameter, spectral flatness feature parameter and tonality feature parameter of the current frame;

A flag correction module, configured to correct the initial background noise flag according to the VAD decision result of the current frame, the tonality feature parameter, the SNR parameters, the tonality flag and the time-domain stability feature parameter;

A background noise energy acquisition module, configured to obtain the background noise energy of the current frame from the corrected value of the background noise flag, the frame energy parameter of the current frame and the full-band background noise energy of the previous frame; the background noise energy of the current frame is used for calculating the SNR parameters of the next frame.

Corresponding to method embodiments 1 and 2, as shown in Fig. 6, the VAD decision unit includes:

A long-time SNR calculation module, configured to calculate the long-time SNR lt_snr from the ratio of the average long-time active sound signal energy to the average long-time background noise energy, both calculated for the previous frame;

An average full-band SNR calculation module, configured to calculate the average of the full-band SNR SNR2 over several most recent frames to obtain the average full-band SNR SNR2_lt_ave;

An SNR threshold calculation module, configured to obtain the SNR threshold snr_thr for the VAD decision from the spectral centroid feature parameter, the long-time SNR lt_snr, the number of preceding continuous active sound frames continuous_speech_num and the number of preceding continuous noise frames continuous_noise_num;

An initial VAD decision module, configured to calculate the initial VAD decision from the VAD decision threshold snr_thr and the SNR parameters SNR1 and SNR2 calculated for the current frame;

A VAD result correction module, configured to correct the VAD decision result according to the tonality flag, the average full-band SNR SNR2_lt_ave, the spectral centroid and the long-time SNR lt_snr;

An active sound hangover frame correction module, configured to correct the active sound hangover frame count according to the decision results of several preceding frames, the long-time SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameters of the current frame and the VAD decision result of the current frame;

A VAD decision module, configured to add active sound hangover according to the decision result of the current frame and the active sound hangover frame count num_speech_hangover, and to obtain the VAD decision result of the current frame.

More preferably, the VAD decision unit further includes an energy calculation module, configured to calculate the average long-time active sound signal energy E_fg from the initial VAD decision result, and to update the average long-time background noise energy E_bg from the VAD decision result, the updated value being used for the VAD decision of the next frame.

The present invention also provides an embodiment of a background noise detection method. As shown in Fig. 7, the method includes:

Step 701: Obtain the sub-band signals and spectral amplitudes of the current frame;

Step 702: Calculate the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter from the sub-band signals, and calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

Preferably, the frame energy parameter is the weighted superposition value or the direct superposition value of the sub-band signal energies.

The spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or part of the sub-band signal energies, or the value obtained by applying smoothing filtering to this ratio.

The time-domain stability parameter is the ratio of the variance of the frame energy amplitude to the expectation of the square of the amplitude superposition value, or this ratio multiplied by a coefficient.

The spectral flatness parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, or this ratio multiplied by a coefficient.

Specifically, step 701 and step 702 may use the same methods as described above, which are not repeated here.

Step 703: Perform background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter and the frame energy parameter of the current frame, and judge whether the current frame is background noise.

Preferably, if any one of the following conditions is found to hold, the current frame is judged not to be a noise signal:

The time-domain stability parameter lt_stable_rate0 is greater than a set threshold;

The smoothed value of the first-interval spectral centroid feature parameter is greater than a set threshold, and the time-domain stability feature parameter value is also greater than a set threshold;

The tonality feature parameter, or its value after smoothing filtering, is greater than a set threshold, and the time-domain stability feature parameter lt_stable_rate0 is greater than its set threshold;

The spectral flatness feature parameters of the sub-bands, or their respective values after smoothing filtering, are each less than the corresponding set threshold;

Or, the value of the frame energy parameter E_t1 is greater than a set threshold E_thr1.

Specifically, the current frame is first assumed to be background noise.

This embodiment uses a background noise flag background_flag to indicate whether the current frame is background noise, with the convention that background_flag is set to 1 if the current frame is judged to be background noise and to 0 otherwise.

Whether the current frame is a noise signal is detected from the time-domain stability feature parameter, the spectral centroid feature parameter, the spectral flatness feature parameter, the tonality feature parameter and the frame energy parameter of the current frame. If it is not a noise signal, the background noise flag background_flag is set to 0.

The detailed process is as follows:

Determine whether the time-domain stability parameter lt_stable_rate0 is greater than a set threshold lt_stable_rate_thr1. If so, the current frame is judged not to be a noise signal and background_flag is set to 0. In this embodiment the value range of the threshold lt_stable_rate_thr1 is [0.8, 1.6];

Determine whether the smoothed spectral centroid feature parameter value is greater than a set threshold sp_center_thr1 and the time-domain stability feature parameter value is also greater than a set threshold lt_stable_rate_thr2. If so, the current frame is judged not to be a noise signal and background_flag is set to 0. The value range of sp_center_thr1 is [1.6, 4]; the value range of lt_stable_rate_thr2 is (0, 0.1].

Determine whether the value of the tonality feature parameter tonality_rate2 is greater than a set threshold tonality_rate_thr1 and whether the value of the time-domain stability feature parameter lt_stable_rate0 is greater than a set threshold lt_stable_rate_thr3. If both conditions hold, the current frame is judged not to be background noise and background_flag is assigned the value 0. The value range of the threshold tonality_rate_thr1 is [0.4, 0.66]. The value range of the threshold lt_stable_rate_thr3 is [0.06, 0.3].

Determine whether the value of the spectral flatness feature parameter sSMR[0] is less than a set threshold sSMR_thr1, whether the value of sSMR[1] is less than a set threshold sSMR_thr2, and whether the value of sSMR[2] is less than a set threshold sSMR_thr3. If all of these conditions hold, the current frame is judged not to be background noise and background_flag is assigned the value 0. The value range of the thresholds sSMR_thr1, sSMR_thr2 and sSMR_thr3 is [0.88, 0.98]. Determine whether the value of the spectral flatness feature parameter sSMR[0] is less than a set threshold sSMR_thr4, whether the value of sSMR[1] is less than a set threshold sSMR_thr5, and whether the value of sSMR[2] is less than a set threshold sSMR_thr6. If any one of these conditions holds, the current frame is judged not to be background noise and background_flag is assigned the value 0. The value range of sSMR_thr4, sSMR_thr5 and sSMR_thr6 is [0.80, 0.92].

Judgment frame energy parameter E_t1Value whether be more than setting threshold value E_thr1, if above-mentioned condition set up, sentence Disconnected present frame is not ambient noise.Background_flag is assigned a value of 0.E_thr1 according to the dynamic range of frame energy parameter into Row value.

If it is not ambient noise that present frame, which is not detected, then it represents that present frame is ambient noise.
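The checks in this passage can be sketched as a single rule-based function. This is a minimal illustration, not the patented implementation: it covers only the four checks described immediately above (any earlier checks are omitted), the threshold values are arbitrary picks inside the stated ranges, E_thr1 is a made-up placeholder, the same stability value lt_stable_rate0 is assumed for both stability comparisons, and the "any" flatness check is assumed to compare sSMR[0], sSMR[1] and sSMR[2] against sSMR_thr4 to sSMR_thr6.

```python
# Illustrative thresholds only: chosen inside the ranges stated in the text.
# E_thr1 is a placeholder, because the text gives no numeric range for it.
THR = {
    "sp_center_thr1": 2.0,               # range [1.6, 4]
    "lt_stable_rate_thr2": 0.05,         # range (0, 0.1]
    "tonality_rate_thr1": 0.5,           # range [0.4, 0.66]
    "lt_stable_rate_thr3": 0.1,          # range [0.06, 0.3]
    "sSMR_thr_all": (0.9, 0.9, 0.9),     # sSMR_thr1..3, range [0.88, 0.98]
    "sSMR_thr_any": (0.85, 0.85, 0.85),  # sSMR_thr4..6, range [0.80, 0.92]
    "E_thr1": 1.0e6,                     # hypothetical
}


def background_flag(sp_center_smooth, lt_stable_rate0, tonality_rate2,
                    sSMR, E_t1, thr=THR):
    """Return 1 if no check rules the frame out as background noise, else 0."""
    # Check 1: high smoothed spectral centroid together with high instability.
    centroid_check = (sp_center_smooth > thr["sp_center_thr1"]
                      and lt_stable_rate0 > thr["lt_stable_rate_thr2"])
    # Check 2: high tonality together with high instability.
    tonality_check = (tonality_rate2 > thr["tonality_rate_thr1"]
                      and lt_stable_rate0 > thr["lt_stable_rate_thr3"])
    # Check 3: every flatness value is below its (looser) threshold.
    flat_all = all(s < t for s, t in zip(sSMR, thr["sSMR_thr_all"]))
    # Check 4: at least one flatness value is below its (stricter) threshold.
    flat_any = any(s < t for s, t in zip(sSMR, thr["sSMR_thr_any"]))
    # Check 5: frame energy above the energy threshold.
    energy_check = E_t1 > thr["E_thr1"]

    not_noise = (centroid_check or tonality_check or flat_all
                 or flat_any or energy_check)
    return 0 if not_noise else 1
```

Because each check can only clear the flag, the order of evaluation does not matter; the frame is kept as background noise only when every check fails.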

Corresponding to the above method, the present invention also provides a background noise detection device. As shown in Figure 8, the device includes:

A filter bank, configured to obtain the subband signals of the current frame;

Preferably, the background noise judging unit judges that the current frame is not a noise signal if any of the following conditions holds:

The present invention also provides a tonal signal detection method. As shown in Figure 9, the method includes:

Step 901: Obtain the subband signals and spectral amplitudes of the current frame;

Step 902: Calculate the values of the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the subband signals, and calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

Preferably, the spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio; the time-domain stability feature parameter is the ratio of the variance of the amplitude superposition values to the expectation of the square of the amplitude superposition values, or that ratio multiplied by a coefficient;

Step 903: Judge whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the spectral centroid feature parameter.
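The two subband-derived features defined in step 902 can be illustrated with a short sketch. It is only a reading of the definitions given here, under stated assumptions: the centroid weights (1, 2, ..., N), the first-order smoothing filter and its coefficient alpha are not given in this passage and are hypothetical, and the "expectation" is taken as a plain average over the supplied history of amplitude superposition values.

```python
def spectral_centroid(subband_energy):
    """Ratio of weighted to unweighted accumulated subband energy.

    Assumption: subband k (0-based) gets weight k + 1.
    """
    weighted = sum((k + 1) * e for k, e in enumerate(subband_energy))
    unweighted = sum(subband_energy)
    return weighted / unweighted if unweighted > 0 else 0.0


def smooth(previous, current, alpha=0.7):
    """First-order smoothing filter; alpha is a hypothetical coefficient."""
    return alpha * previous + (1.0 - alpha) * current


def time_domain_stability(amp_sums):
    """Variance of the amplitude superposition values divided by the
    mean of their squares, computed over the given history."""
    n = len(amp_sums)
    mean = sum(amp_sums) / n
    variance = sum((a - mean) ** 2 for a in amp_sums) / n
    mean_square = sum(a * a for a in amp_sums) / n
    return variance / mean_square if mean_square > 0 else 0.0
```

With this definition a perfectly steady amplitude history gives a stability value of 0, and larger values indicate a less stable signal.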

When step 903 judges whether the current frame is a tonal signal, the following operations are performed:

A) Assume that the current frame signal is a non-tonal signal, and use a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame.

In this embodiment, a tonality_frame value of 1 indicates that the current frame is a tonal frame, and 0 indicates that the current frame is a non-tonal frame;

B) Judge whether the tonality feature parameter tonality_rate1 or its smoothed value tonality_rate2 is greater than the corresponding set threshold tonality_decision_thr1 or tonality_decision_thr2. If either condition holds, execute step C); otherwise execute step D);

Wherein, the value range of tonality_decision_thr1 is [0.5, 0.7], and the value range of tonality_rate1 is [0.7, 0.99].

C) If the time-domain stability feature parameter value lt_stable_rate0 is less than a set threshold lt_stable_decision_thr1, the spectral centroid feature parameter value sp_center[1] is greater than a set threshold spc_decision_thr1, and the spectral flatness feature parameter of each subband is less than its corresponding preset threshold (specifically, sSMR[0] is less than a set threshold sSMF_decision_thr1, or sSMR[1] is less than a set threshold sSMF_decision_thr2, or sSMR[2] is less than a set threshold sSMF_decision_thr3), then the current frame is judged to be a tonal frame and the tonal frame flag tonality_frame is set to 1; otherwise the current frame is judged to be a non-tonal frame and tonality_frame is set to 0. Then continue with step D).

Wherein, the value range of the threshold lt_stable_decision_thr1 is [0.01, 0.25], that of spc_decision_thr1 is [1.0, 1.8], that of sSMF_decision_thr1 is [0.6, 0.9], that of sSMF_decision_thr2 is [0.6, 0.9], and that of sSMF_decision_thr3 is [0.7, 0.98].

D) Update the tonality degree feature parameter tonality_degree according to the tonal frame flag tonality_frame. The initial value of tonality_degree is set when the voice activity detection device starts working, and its value range is [0, 1]. tonality_degree is calculated differently in different cases:

If the current tonal frame flag indicates that the current frame is a tonal frame, tonality_degree is updated using the following equation:

tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;

Wherein, tonality_degree_{-1} is the tonality degree feature parameter of the previous frame, and the value range of its initial value is [0, 1]. td_scale_A is the attenuation coefficient, with value range [0, 1]; td_scale_B is the accumulation coefficient, with value range [0, 1].

E) Judge whether the current frame is a tonal signal according to the updated tonality degree feature parameter tonality_degree, and set the value of the tonality flag tonality_flag.

Specifically, if tonality_degree is greater than a set threshold, the current frame is judged to be a tonal signal; otherwise, the current frame is judged to be a non-tonal signal.
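Steps A) to E) can be sketched as follows. This is an illustrative reading, not the definitive procedure: all numeric defaults are arbitrary picks (the decision threshold 0.5 is the example value cited for the device embodiment), the flatness condition in step C) is implemented with the "or" wording, and the update for a non-tonal frame is not given in this passage, so the sketch simply leaves tonality_degree unchanged in that case as a placeholder.

```python
def tonal_frame_flag(tonality_rate1, tonality_rate2, lt_stable_rate0,
                     sp_center1, sSMR,
                     thr1=0.6, thr2=0.8, stable_thr=0.1, spc_thr=1.4,
                     smf_thr=(0.8, 0.8, 0.9)):
    """Steps A) to C): return tonality_frame (1 = tonal frame, 0 = non-tonal)."""
    tonality_frame = 0  # Step A: assume a non-tonal frame.
    # Step B: either the raw or the smoothed tonality value must be high.
    if tonality_rate1 > thr1 or tonality_rate2 > thr2:
        # Step C: stable in time, high centroid, low spectral flatness.
        flat_ok = any(s < t for s, t in zip(sSMR, smf_thr))
        if lt_stable_rate0 < stable_thr and sp_center1 > spc_thr and flat_ok:
            tonality_frame = 1
    return tonality_frame


def update_tonality_degree(prev_degree, tonality_frame,
                           td_scale_A=0.9, td_scale_B=0.1):
    """Step D: recursive update of tonality_degree."""
    if tonality_frame == 1:
        return prev_degree * td_scale_A + td_scale_B
    # Placeholder: the non-tonal update is not specified in this passage.
    return prev_degree


def tonality_flag(tonality_degree, threshold=0.5):
    """Step E: 1 if the frame is judged a tonal signal, else 0."""
    return 1 if tonality_degree > threshold else 0
```

With these hypothetical coefficients and an initial value of 0, tonality_degree after n consecutive tonal frames equals 1 - 0.9^n, so the flag switches to 1 at the seventh consecutive tonal frame. The recursion thus acts as a delay that suppresses isolated tonal frames.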

Corresponding to the foregoing tonal signal detection method, the present invention also provides a tonal signal detection device. As shown in Figure 10, the device includes:

A filter bank, configured to obtain the subband signals of the current frame;

A feature parameter calculation unit, configured to calculate the values of the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the subband signals, and to calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

As described above, the spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio;

As shown in Figure 11, the tonal signal judging unit includes:

A tonal signal initialization module, configured to set the current frame signal as a non-tonal signal and to use a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame;

A tonality feature parameter judging module, configured to judge whether the tonality feature parameter tonality_rate1 or its smoothed value tonality_rate2 is greater than the corresponding set threshold;

A tonal signal judging module, configured to: when the tonality feature parameter judging module judges yes, judge the current frame to be a tonal frame if the time-domain stability feature parameter value is less than a set threshold, the spectral centroid feature parameter value is greater than a set threshold, and the spectral flatness feature parameter of each subband is less than its corresponding preset threshold, and then judge whether the current frame is a tonal signal according to the calculated tonality degree feature parameter tonality_degree; and, when the tonality feature parameter judging module judges no, judge whether the current frame is a tonal signal according to the updated tonality degree feature parameter tonality_degree, and set the value of the tonality flag tonality_flag;

A tonality degree parameter updating module, configured to update the tonality degree feature parameter tonality_degree according to the tonal frame flag when the tonality feature parameter tonality_rate1 and its smoothed value tonality_rate2 are both less than the corresponding set thresholds, wherein the initial value of tonality_degree is set when the voice activity detection device starts working.

Specifically, if the current tonal frame flag indicates that the current frame is a tonal frame, the tonality degree parameter updating module updates the tonality degree feature parameter tonality_degree using the following equation:

tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;

If tonality_degree is greater than a set threshold, the tonal signal judging module judges the current frame to be a tonal signal; otherwise, it judges the current frame to be a non-tonal signal.

Specifically, if tonality_degree is greater than the threshold 0.5, the current frame is judged to be a tonal signal and the tonality flag tonality_flag is set to 1; otherwise, the current frame is judged to be a non-tonal signal and tonality_flag is set to 0. The value range of the tonal signal decision threshold is [0.3, 0.7].

The present invention also provides a method for correcting the active-sound hangover frame count in a VAD decision. As shown in Figure 12, the method includes:

Step 1201: Calculate the long-time signal-to-noise ratio lt_snr from the subband signals;

Specifically, lt_snr is calculated as the ratio of the long-time average active-sound signal energy to the long-time average background noise energy, both obtained at the previous frame; lt_snr may be expressed logarithmically.

Step 1202: Calculate the average full-band signal-to-noise ratio SNR2_lt_ave;

The average of the full-band signal-to-noise ratio SNR2 over the most recent several frames is calculated to obtain the average full-band signal-to-noise ratio SNR2_lt_ave;

Step 1203: Correct the current active-sound hangover frame count according to the decision results of the preceding several frames, the long-time signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameters of the current frame and the VAD decision result of the current frame.

Understandably, the precondition for correcting the current active-sound hangover frame count is that the active-sound flag indicates that the current frame is an active-sound frame.

Preferably, when the current active-sound hangover frame count is corrected: if the number of preceding consecutive speech frames is less than a set threshold 1 and the long-time signal-to-noise ratio lt_snr is less than a set threshold 2, the current hangover frame count is set to the minimum number of consecutive active-sound frames minus the number of preceding consecutive speech frames; otherwise, if the average full-band signal-to-noise ratio SNR2_lt_ave is greater than a set threshold 3 and the number of preceding consecutive speech frames is greater than a set threshold 4, the hangover frame count is set according to the magnitude of the long-time signal-to-noise ratio; otherwise, the value of the current hangover frame count num_speech_hangover is not corrected.
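The correction logic of steps 1201 to 1203 can be sketched as below. This is a hedged illustration under several assumptions: the base-10 logarithm for lt_snr, all threshold values, the minimum number of consecutive active-sound frames and the mapping hangover_from_snr from lt_snr to a hangover length are hypothetical, since the text leaves them open.

```python
import math


def long_time_snr(E_fg, E_bg):
    """lt_snr as the log ratio of long-time average active-sound energy to
    long-time average background noise energy (log base 10 is an assumption)."""
    return math.log10(E_fg / E_bg)


def average_full_band_snr(snr2_history):
    """SNR2_lt_ave: mean of the full-band SNR2 over the most recent frames."""
    return sum(snr2_history) / len(snr2_history)


def correct_hangover(num_speech_hangover, vad_flag, continuous_speech_num,
                     lt_snr, SNR2_lt_ave, thr1, thr2, thr3, thr4,
                     min_active_frames, hangover_from_snr):
    """Return the corrected active-sound hangover frame count."""
    # Precondition: only correct when the current frame is an active-sound frame.
    if not vad_flag:
        return num_speech_hangover
    # Short speech run at low long-time SNR: top the run up to the
    # minimum number of consecutive active-sound frames.
    if continuous_speech_num < thr1 and lt_snr < thr2:
        return min_active_frames - continuous_speech_num
    # Long speech run at high average full-band SNR: choose the hangover
    # from the long-time SNR via a caller-supplied (hypothetical) mapping.
    if SNR2_lt_ave > thr3 and continuous_speech_num > thr4:
        return hangover_from_snr(lt_snr)
    # Otherwise leave the hangover frame count unchanged.
    return num_speech_hangover
```

A caller might pass, for example, `hangover_from_snr=lambda snr: 3 if snr > 2.6 else 5`; that mapping is purely illustrative and not taken from the text.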

Corresponding to the foregoing method for correcting the active-sound hangover frame count, the present invention also provides a device for correcting the active-sound hangover frame count in a VAD decision. As shown in Figure 13, the device includes:

Specifically, the long-time signal-to-noise ratio calculation unit calculates the long-time signal-to-noise ratio lt_snr as the ratio of the long-time average active-sound signal energy to the long-time average background noise energy, both obtained at the previous frame;

Specifically, the average full-band signal-to-noise ratio calculation unit calculates the average of the full-band signal-to-noise ratio SNR2 over the most recent several frames to obtain the average full-band signal-to-noise ratio SNR2_lt_ave.

As described above, the precondition for correcting the current active-sound hangover frame count is that the active-sound flag indicates that the current frame is an active-sound frame.

Preferably, when the active-sound hangover frame count correction unit corrects the current hangover frame count: if the number of preceding consecutive speech frames is less than a set threshold 1 and the long-time signal-to-noise ratio lt_snr is less than a set threshold 2, the current hangover frame count is set to the minimum number of consecutive active-sound frames minus the number of preceding consecutive speech frames; otherwise, if the average full-band signal-to-noise ratio SNR2_lt_ave is greater than a set threshold 3 and the number of preceding consecutive speech frames is greater than a set threshold 4, the hangover frame count is set according to the magnitude of the long-time signal-to-noise ratio; otherwise, the value of the current hangover frame count num_speech_hangover is not corrected.

The present invention also provides a method for adjusting the signal-to-noise ratio threshold in a VAD decision. As shown in Figure 14, the adjustment method includes:

Step 1401: Calculate the spectral centroid feature parameter of the current frame from the subband signals;

Specifically, the spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio.

Step 1402: Calculate the long-time signal-to-noise ratio lt_snr as the ratio of the long-time average active-sound signal energy to the long-time average background noise energy, both obtained at the previous frame;

Step 1403: Adjust the signal-to-noise ratio threshold of the VAD decision according to the spectral centroid feature parameter, the long-time signal-to-noise ratio, the number of preceding consecutive active-sound frames and the number of preceding consecutive noise frames continuous_noise_num.

Specifically, as shown in Figure 15, the steps of adjusting the signal-to-noise ratio threshold include:

Step 1501: Set the initial value of the signal-to-noise ratio threshold snr_thr;

Step 1502: Make a first adjustment to the value of snr_thr according to the spectral centroid parameter;

Step 1503: Make a second adjustment to the value of snr_thr according to the number of preceding consecutive active-sound frames continuous_speech_num, the number of preceding consecutive noise frames continuous_noise_num, the average full-band signal-to-noise ratio SNR2_lt_ave and the long-time signal-to-noise ratio lt_snr;

Step 1504: Make a final correction to snr_thr according to the value of the long-time signal-to-noise ratio lt_snr, to obtain the signal-to-noise ratio threshold snr_thr of the current frame.

Corresponding to the method for adjustment of aforementioned signal-noise ratio threshold, signal-noise ratio threshold in being adjudicated the present invention also provides a kind of VAD Adjusting apparatus, as shown in figure 16, which includes：

Preferably, the spectrum gravity center characteristics parameter is the weighted accumulation value and unweighted of all or part of subband signal energy The ratio of accumulated value or the ratio carry out the value that smothing filtering obtains.

Signal-to-noise ratio computation unit when long, activation sound signal energy peace when being averaged long for being calculated by former frame The ratio of background noise energy when long, signal-to-noise ratio lt_snr when being calculated long；

Specifically, when the signal-noise ratio threshold adjustment unit adjustment signal-noise ratio threshold, setting signal-noise ratio threshold snr_thr's Initial value；Adjust the value of signal-noise ratio threshold snr_thr for the first time according to spectrum center of gravity parameter；Sound frame number is continuously activated according to front Continuous_speech_num, front continuing noise frame number continuous_noise_num, average full band signal-to-noise ratio SNR2_lt_ave and it is long when signal-to-noise ratio lt_snr bis- adjustment snr_thr values；Finally, according to it is long when signal-to-noise ratio lt_snr Value carries out final adjustment to signal-noise ratio threshold snr_thr again, obtains the signal-noise ratio threshold snr_thr of present frame.

Many modern speech coding standards, such as AMR and AMR-WB, support a VAD function. In terms of efficiency, the VAD of these encoders does not achieve good performance under all typical background noises. Under non-stationary noise in particular, such as office noise, the VAD efficiency of these encoders is relatively low. For music signals, these VADs sometimes make detection errors, which causes a noticeable quality drop in the corresponding processing algorithms.

The method of the present invention overcomes the shortcomings of existing VAD algorithms: it improves the detection efficiency of the VAD for non-stationary noise while also improving the accuracy of music detection, so that voice and audio signal processing algorithms using this VAD achieve better performance.

The background noise detection method provided by the present invention makes the estimate of the background noise more accurate and stable, which helps improve the accuracy of VAD detection. The tonal signal detection method also provided by the present invention improves the accuracy of tonal music detection. The method for correcting the active-sound hangover frame count allows the VAD algorithm to strike a better balance between performance and efficiency under different noises and signal-to-noise ratios. The method for adjusting the signal-to-noise ratio threshold in the VAD decision allows the VAD decision algorithm to reach good accuracy under different signal-to-noise ratios, further improving efficiency while quality is ensured.

Those of ordinary skill in the art will appreciate that all or some of the steps of the above methods may be carried out by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular combination of hardware and software.

Claims

1. A voice activity detection (VAD) method, characterized in that the method comprises:

Obtaining the subband signals and spectral amplitudes of the current frame;

Calculating the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the subband signals; calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

Calculating the signal-to-noise ratio parameters of the current frame from the background noise energy estimated at the previous frame, the frame energy parameter of the current frame and the signal-to-noise ratio subband energies;

Calculating the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the tonality feature parameter of the current frame;

Calculating the VAD decision result from the tonality flag, the signal-to-noise ratio parameters, the spectral centroid feature parameter and the frame energy parameter;

Wherein, the frame energy parameter is the weighted superposition value or the direct superposition value of the subband signal energies;

The spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio;

The time-domain stability feature parameter is the ratio of the variance of the energy amplitude superposition values to the expectation of the square of the energy amplitude superposition values, or that ratio multiplied by a coefficient;

The spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of the spectral amplitudes of a plurality of subbands, or that ratio multiplied by a coefficient;

The tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two consecutive frame signals, or by further smoothing that correlation value.

2. The method according to claim 1, characterized in that, before or after the VAD decision result is obtained, the method further comprises:

Calculating an initial background noise flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the tonality feature parameter of the current frame;

After the VAD decision result is obtained, the method further comprises: correcting the initial background noise flag according to the VAD decision result of the current frame, the tonality feature parameter, the signal-to-noise ratio parameters, the tonality flag and the time-domain stability feature parameter;

Obtaining the subband background noise energy and the full-band background noise energy of the current frame from the corrected value of the background noise flag, the frame energy parameter of the current frame and the full-band background noise energy of the previous frame;

The background noise energy of the current frame is used to calculate the signal-to-noise ratio parameters of the next frame.

3. The method according to claim 1, characterized in that

The VAD decision result is calculated from the tonality flag, the signal-to-noise ratio parameters, the spectral centroid feature parameter and the frame energy parameter by the following steps:

a. calculating the long-time signal-to-noise ratio as the ratio of the long-time average active-sound signal energy to the long-time average background noise energy, both obtained at the previous frame;

b. calculating the average of the full-band signal-to-noise ratio SNR2 over the most recent several frames to obtain the average full-band signal-to-noise ratio SNR2_lt_ave;

c. obtaining the signal-to-noise ratio threshold snr_thr of the VAD decision according to the spectral centroid feature parameter, the long-time signal-to-noise ratio lt_snr, the number of preceding consecutive active-sound frames continuous_speech_num and the number of preceding consecutive noise frames continuous_noise_num;

d. calculating an initial VAD decision from the VAD decision threshold snr_thr and the signal-to-noise ratio parameters, wherein the signal-to-noise ratio parameters include the subband average signal-to-noise ratio SNR1 and the full-band signal-to-noise ratio SNR2;

e. correcting the VAD decision result according to the tonality flag, the average full-band signal-to-noise ratio SNR2_lt_ave, the spectral centroid feature parameter and the long-time signal-to-noise ratio lt_snr;

f. correcting the active-sound hangover frame count according to the decision results of the preceding several frames, the long-time signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameters of the current frame and the VAD decision result of the current frame;

g. adding active-sound hangover according to the decision result of the current frame and the active-sound hangover frame count num_speech_hangover, to obtain the VAD decision result of the current frame.

4. The method according to claim 3, characterized in that: after step d, the method further comprises calculating the long-time average active-sound signal energy E_fg according to the initial VAD decision result; after step g, the method further comprises calculating the long-time average background noise energy E_bg according to the VAD decision result, the calculated values being used for the VAD decision of the next frame.

5. A voice activity detection (VAD) device, characterized in that the device comprises:

A filter bank, configured to obtain the subband signals of the current frame;

A feature parameter acquisition unit, configured to calculate the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the subband signals, and to calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

A flag calculation unit, configured to calculate the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the tonality feature parameter of the current frame;

A signal-to-noise ratio calculation unit, configured to calculate the signal-to-noise ratio parameters of the current frame from the background noise energy estimated at the previous frame, the frame energy parameter of the current frame and the signal-to-noise ratio subband energies;

A VAD decision unit, configured to calculate the VAD decision result from the tonality flag, the signal-to-noise ratio parameters, the spectral centroid feature parameter and the frame energy parameter;

6. The voice activity detection (VAD) device according to claim 5, characterized in that

The device further comprises a background noise energy processing unit, which comprises:

A flag calculation module, configured to calculate an initial background noise flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter and the tonality feature parameter of the current frame;

A flag correction module, configured to correct the initial background noise flag according to the VAD decision result of the current frame, the tonality feature parameter, the signal-to-noise ratio parameters, the tonality flag and the time-domain stability feature parameter;

A background noise energy acquisition module, configured to obtain the background noise energy of the current frame from the corrected value of the background noise flag, the frame energy parameter of the current frame and the full-band background noise energy of the previous frame, the background noise energy of the current frame being used to calculate the signal-to-noise ratio parameters of the next frame.

7. The voice activity detection (VAD) device according to claim 5, characterized in that the VAD decision unit comprises:

A long-time signal-to-noise ratio calculation module, configured to calculate the long-time signal-to-noise ratio lt_snr as the ratio of the long-time average active-sound signal energy to the long-time average background noise energy, both obtained at the previous frame;

An average full-band signal-to-noise ratio calculation module, configured to calculate the average of the full-band signal-to-noise ratio SNR2 over the most recent several frames to obtain the average full-band signal-to-noise ratio SNR2_lt_ave;

A signal-to-noise ratio threshold calculation module, configured to obtain the signal-to-noise ratio threshold snr_thr of the VAD decision according to the spectral centroid feature parameter, the long-time signal-to-noise ratio lt_snr, the number of preceding consecutive active-sound frames continuous_speech_num and the number of preceding consecutive noise frames continuous_noise_num;

An initial VAD decision module, configured to calculate an initial VAD decision from the VAD decision threshold snr_thr and the signal-to-noise ratio parameters calculated for the current frame, wherein the signal-to-noise ratio parameters include the subband average signal-to-noise ratio SNR1 and the full-band signal-to-noise ratio SNR2;

A VAD result correction module, configured to correct the VAD decision result according to the tonality flag, the average full-band signal-to-noise ratio SNR2_lt_ave, the spectral centroid feature parameter and the long-time signal-to-noise ratio lt_snr;

An active-sound hangover frame correction module, configured to correct the active-sound hangover frame count according to the decision results of the preceding several frames, the long-time signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio of the current frame and the VAD decision result of the current frame;

A VAD decision module, configured to add active-sound hangover according to the decision result of the current frame and the active-sound hangover frame count num_speech_hangover, to obtain the VAD decision result of the current frame.

8. The voice activity detection (VAD) device according to claim 7, characterized in that the VAD decision unit further comprises: an energy calculation module, configured to calculate the long-time average active-sound signal energy E_fg according to the initial VAD decision result, and to calculate the long-time average background noise energy E_bg according to the VAD decision result, the calculated values being used for the VAD decision of the next frame.

9. A background noise detection method, characterized in that the method comprises:

Obtaining the subband signals and spectral amplitudes of the current frame;

Calculating the values of the frame energy parameter, the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signals, and calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

Performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter and the current frame energy parameter, to judge whether the current frame is background noise;

The spectral flatness parameter is the ratio of the geometric mean to the arithmetic mean of the spectral amplitudes of a plurality of subbands, or that ratio multiplied by a coefficient;

10. The method according to claim 9, characterized in that: if any of the following conditions holds, the current frame is judged not to be a noise signal:

The time-domain stability feature parameter lt_stable_rate0 is greater than a set threshold;

The smoothed value of the first-interval spectral centroid feature parameter is greater than a set threshold, and the time-domain stability feature parameter value is also greater than a set threshold;

The tonality feature parameter or its smoothed value is greater than a set threshold, and the value of the time-domain stability feature parameter lt_stable_rate0 is greater than its set threshold;

The spectral flatness feature parameter of each subband, or its respective smoothed value, is less than the corresponding set threshold;

11. A background noise detection device, characterized in that the device comprises:

a filter bank, configured to obtain the subband signals of the current frame;

a feature parameter calculation unit, configured to calculate the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter from the subband signals, and to calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

a background noise judging unit, configured to perform background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the current frame energy parameter, and to judge whether the current frame is background noise.

12. The detection device of claim 11, characterized in that: the background noise judging unit judges that the current frame is not a noise signal if any of the following conditions holds:

13. A tonal signal detection method, characterized in that the method comprises:

obtaining the subband signals and spectral amplitudes of the current frame;

calculating the values of the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signals, and calculating the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

judging whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter;

wherein the spectral centroid feature parameter is the ratio of the weighted accumulation to the unweighted accumulation of all or some of the subband signal energies, or the value obtained by smoothing that ratio.
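The spectral centroid definition in claim 13 can be sketched as below. This is an illustration under stated assumptions: the claim does not specify the weights, so the 1-based subband index is used here; the guard term `delta` and the first-order smoother with factor `alpha` are likewise assumptions, as are all function names.

```python
def spectral_centroid(subband_energies, weights=None, delta=1e-12):
    """Weighted sum of subband energies divided by their unweighted sum."""
    if weights is None:
        # Assumption: weight each subband by its 1-based index.
        weights = range(1, len(subband_energies) + 1)
    weighted = sum(w * e for w, e in zip(weights, subband_energies))
    unweighted = sum(subband_energies)
    # delta (an assumption) avoids division by zero on silent frames.
    return weighted / (unweighted + delta)


def smooth(previous, current, alpha=0.7):
    """First-order recursive smoothing; alpha is a placeholder value."""
    return alpha * previous + (1.0 - alpha) * current


low = spectral_centroid([4.0, 1.0, 1.0, 1.0])   # energy concentrated in low subbands
high = spectral_centroid([1.0, 1.0, 1.0, 4.0])  # energy concentrated in high subbands
smoothed = smooth(2.0, 3.0)
```

With index weights, the centroid moves toward higher values as energy shifts to higher subbands, so comparing it against a threshold separates low-frequency-dominated frames from the rest.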

14. The method of claim 13, characterized in that: when judging whether the current frame is a tonal signal, the following operations are performed:

A) assuming that the current frame signal is a non-tonal signal, and using a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame;

B) judging whether the tonality feature parameter tonality_rate1 or its smoothed value tonality_rate2 is greater than the corresponding set threshold; if either condition holds, performing step C); otherwise performing step D);

C) if the time-domain stability feature parameter value is less than a set threshold, the spectral centroid feature parameter value is greater than a set threshold, and the spectral flatness feature parameter of each subband is less than its corresponding preset threshold, judging the current frame to be a tonal frame and setting the value of the tonal frame flag; otherwise judging it to be a non-tonal frame and setting the value of the tonal frame flag; then proceeding to step D);

D) updating the tonality degree feature parameter tonality_degree according to the tonal frame flag, wherein the initial value of tonality_degree is set when voice activity detection starts working;

E) judging whether the current frame is a tonal signal according to the updated tonality degree feature parameter tonality_degree, and setting the value of the tonality flag tonality_flag.
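Steps A) to C) of claim 14 amount to a two-stage gate that produces the per-frame flag tonality_frame. The sketch below is illustrative only: the function name `tonal_frame_flag`, the dictionary keys, and all threshold values are hypothetical, since the claim does not give numeric thresholds.

```python
def tonal_frame_flag(tonality_rate1, tonality_rate2, stability, sp_center,
                     flatness_per_band, thr):
    """Return 1 if the frame passes the step B) and step C) tests, else 0."""
    # Step A: start from the assumption that the frame is non-tonal.
    tonality_frame = 0
    # Step B: either the raw or the smoothed tonality value exceeds its threshold.
    if tonality_rate1 > thr["rate1"] or tonality_rate2 > thr["rate2"]:
        # Step C: stability is low, centroid is high, every subband is non-flat.
        if (stability < thr["stability"]
                and sp_center > thr["center"]
                and all(f < t for f, t in zip(flatness_per_band, thr["flatness"]))):
            tonality_frame = 1
    return tonality_frame


# Placeholder thresholds, chosen only to make the example runnable.
THR = {
    "rate1": 0.5,
    "rate2": 0.5,
    "stability": 0.1,
    "center": 1.2,
    "flatness": [0.76, 0.72, 0.76],
}

music_like = tonal_frame_flag(0.7, 0.4, 0.05, 2.0, [0.3, 0.3, 0.3], THR)
weak_tonality = tonal_frame_flag(0.2, 0.2, 0.05, 2.0, [0.3, 0.3, 0.3], THR)
unstable = tonal_frame_flag(0.7, 0.7, 0.30, 2.0, [0.3, 0.3, 0.3], THR)
```

The flag returned here is the input to step D), where it drives the update of tonality_degree; it is not itself the final tonality_flag decision.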

15. The method of claim 14, characterized in that: if the current tonal frame flag indicates that the current frame is a tonal frame, the tonality degree feature parameter tonality_degree is updated using the following equation:

tonality_degree = tonality_degree_-1 · td_scale_A + td_scale_B;

where tonality_degree_-1 is the tonality degree feature parameter of the previous frame, whose initial value lies in the range [0, 1], td_scale_A is the attenuation coefficient, and td_scale_B is the accumulation coefficient.

16. The method of claim 14, characterized in that:

if the tonality degree feature parameter tonality_degree is greater than a set threshold, the current frame is judged to be a tonal signal; otherwise the current frame is judged to be a non-tonal signal.
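The recursive update of claim 15 and the threshold test of claim 16 can be traced over a short run of frames as sketched below. The coefficient values, the decision threshold, and the function names are placeholders. The claims give the update only for tonal frames; the pure decay applied to non-tonal frames here is an assumption made so that the example is complete.

```python
def update_tonality_degree(prev_degree, tonality_frame,
                           td_scale_a=0.97, td_scale_b=0.03):
    """Update tonality_degree from the previous frame's value and the frame flag."""
    if tonality_frame:
        # Claim 15: tonality_degree = tonality_degree_-1 * td_scale_A + td_scale_B
        return prev_degree * td_scale_a + td_scale_b
    # Assumption (not stated in the claims): non-tonal frames only decay.
    return prev_degree * td_scale_a


def track(frame_flags, initial=0.5, threshold=0.5):
    """Run the update over a flag sequence and apply the claim-16 threshold test."""
    degree = initial
    tonality_flags = []
    for flag in frame_flags:
        degree = update_tonality_degree(degree, flag)
        tonality_flags.append(1 if degree > threshold else 0)
    return degree, tonality_flags


rising_degree, rising_flags = track([1, 1, 1])
falling_degree, falling_flags = track([0, 0, 0])
```

For a sustained run of tonal frames the recursion converges to td_scale_B / (1 - td_scale_A), which equals 1 for the placeholder values, so with these choices tonality_degree stays inside [0, 1].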

17. A tonal signal detection device, characterized in that the detection device comprises:

a filter bank, configured to obtain the subband signals of the current frame;

a feature parameter calculation unit, configured to calculate the values of the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signals, and to calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;

a tonal signal judging unit, configured to judge whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter.

18. The detection device of claim 17, characterized in that: the tonal signal judging unit comprises:

a tonal signal judgment module, configured to: when the tonality feature parameter judgment module judges yes, judge the current frame to be a tonal frame if the time-domain stability feature parameter value is less than a set threshold, the spectral centroid feature parameter value is greater than a set threshold, and the spectral flatness feature parameter of each subband is less than its corresponding preset threshold; judge whether the current frame is a tonal signal according to the calculated tonality degree feature parameter tonality_degree; and, when the tonality feature parameter judgment module judges no, judge whether the current frame is a tonal signal according to the updated tonality degree feature parameter tonality_degree, and set the value of the tonality flag tonality_flag.

19. The detection device of claim 18, characterized in that: if the current tonal frame flag indicates that the current frame is a tonal frame, the tonality degree parameter update module updates the tonality degree feature parameter tonality_degree using the following equation:

tonality_degree = tonality_degree_-1 · td_scale_A + td_scale_B.

20. The detection device of claim 18, characterized in that: if the tonality degree feature parameter tonality_degree is greater than a set threshold, the tonal signal judgment module judges the current frame to be a tonal signal; otherwise it judges the current frame to be a non-tonal signal.