CN103903634B - Voice activity detection and method and apparatus for voice activity detection - Google Patents
- Publication number: CN103903634B (application CN201210570563.5A)
- Authority: CN (China)
- Prior art keywords: tonality, frame, value, parameter, signal
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Compression, Expansion, Code Conversion, And Decoders; Measurement Of Mechanical Vibrations Or Ultrasonic Waves; Noise Elimination
Abstract
The present invention relates to voice activity detection (VAD) and to methods and apparatus used in VAD. The method includes: obtaining the subband signals and spectral amplitudes of the current frame; computing the values of the frame energy parameter and the spectral centroid feature parameter of the current frame from the subband signals; computing the signal-to-noise ratio (SNR) parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame, and the SNR subband energies; and computing the VAD decision from the tonality flag, the SNR parameter, the spectral centroid feature parameter, and the frame energy parameter. The method and apparatus of the present invention improve detection accuracy for non-stationary noise (such as office noise) and for music.
Description
Technical field
The present invention relates to voice activity detection (VAD) and to methods used in VAD (including background noise detection, tonality signal detection, correction of the current active-speech hangover frame count in the VAD decision, and adjustment of the SNR threshold in the VAD decision), together with the corresponding apparatus.
Background art
In a normal voice call, each user is sometimes speaking and sometimes listening, so inactive-speech periods occur during the call. Under normal conditions, the combined inactive-speech time of the two parties exceeds 50% of their combined speech-coding time. During an inactive-speech period there is only background noise, which usually carries no useful information. Speech and audio signal processing exploits this fact: a voice activity detection (VAD) algorithm distinguishes active speech from inactive speech, and the two are processed by different methods. Many modern speech coding standards, such as AMR and AMR-WB, support VAD. In terms of efficiency, however, the VAD in these codecs does not perform well under all typical background noises; under non-stationary noise in particular, its efficiency is low. For music signals, these VADs sometimes make detection errors, which causes a noticeable quality drop in the processing algorithms that rely on them.
Summary of the invention
The technical problem to be solved by the present invention is to provide voice activity detection (VAD) and methods used in VAD (including background noise detection, tonality signal detection, correction of the current active-speech hangover frame count in the VAD decision, and adjustment of the SNR threshold in the VAD decision), together with the corresponding apparatus, so as to improve the accuracy of VAD.
To solve the above technical problem, the present invention provides a voice activity detection (VAD) method, which includes:
Obtaining the subband signals and spectral amplitudes of the current frame;
Computing the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter of the current frame from the subband signals; computing the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
Computing the signal-to-noise ratio (SNR) parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame, and the SNR subband energies;
Computing the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;
Computing the VAD decision from the tonality flag, the SNR parameter, the spectral centroid feature parameter, and the frame energy parameter.
To solve the above technical problem, the present invention provides a voice activity detection (VAD) apparatus, which includes:
A filter bank, configured to obtain the subband signals of the current frame;
A spectral amplitude computing unit, configured to obtain the spectral amplitudes of the current frame;
A feature parameter acquiring unit, configured to compute the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter of the current frame from the subband signals, and the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
A flag computing unit, configured to compute the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;
An SNR computing unit, configured to compute the SNR parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame, and the SNR subband energies;
A VAD decision unit, configured to compute the VAD decision from the tonality flag, the SNR parameter, the spectral centroid feature parameter, and the frame energy parameter.
To solve the above technical problem, the present invention provides a background noise detection method, which includes:
Obtaining the subband signals and spectral amplitudes of the current frame;
Computing the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter from the subband signals, and the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
Performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the frame energy parameter of the current frame, to judge whether the current frame is background noise.
To solve the above technical problem, the present invention provides a background noise detection apparatus, which includes:
A filter bank, configured to obtain the subband signals of the current frame;
A spectral amplitude computing unit, configured to obtain the spectral amplitudes of the current frame;
A feature parameter computing unit, configured to compute the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter from the subband signals, and the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
A background noise judging unit, configured to perform background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the frame energy parameter of the current frame, to judge whether the current frame is background noise.
To solve the above technical problem, the present invention provides a tonality signal detection method, which includes:
Obtaining the subband signals and spectral amplitudes of the current frame;
Computing the values of the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signals, and the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
Judging whether the current frame is a tonality signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter.
To solve the above technical problem, the present invention provides a tonality signal detection apparatus, which includes:
A filter bank, configured to obtain the subband signals of the current frame;
A spectral amplitude computing unit, configured to obtain the spectral amplitudes of the current frame;
A feature parameter computing unit, configured to compute the values of the spectral centroid feature parameter and the time-domain stability feature parameter from the subband signals, and the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
A tonality signal judging unit, configured to judge whether the current frame is a tonality signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter.
To solve the above technical problem, the present invention provides a method for correcting the current active-speech hangover frame count in the VAD decision, which includes:
Computing the long-term SNR lt_snr and the average full-band SNR SNR2_lt_ave;
Correcting the current active-speech hangover frame count according to the decisions of several preceding frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR of the current frame, and the VAD decision of the current frame.
To solve the above technical problem, the present invention provides an apparatus for correcting the current active-speech hangover frame count in the VAD decision, which includes:
A long-term SNR computing unit, configured to compute the long-term SNR lt_snr;
An average full-band SNR computing unit, configured to compute the average full-band SNR SNR2_lt_ave;
An active-speech hangover frame count correcting unit, configured to correct the current active-speech hangover frame count according to the decisions of several preceding frames, the long-term SNR lt_snr, the average full-band SNR SNR2_lt_ave, the SNR parameter of the current frame, and the VAD decision of the current frame.
To solve the above technical problem, the present invention provides a method for adjusting the SNR threshold in the VAD decision, which includes:
Computing the spectral centroid feature parameter of the current frame from the subband signals;
Computing the ratio of the average long-term active-speech signal energy to the average long-term background noise energy, both computed for the previous frame, to obtain the long-term SNR lt_snr;
Adjusting the SNR threshold of the VAD decision according to the spectral centroid feature parameter, the long-term SNR, the number of preceding consecutive active-speech frames, and the number of preceding consecutive noise frames continuous_noise_num.
To solve the above technical problem, the present invention provides an apparatus for adjusting the SNR threshold in the VAD decision, which includes:
A feature parameter acquiring unit, configured to compute the spectral centroid feature parameter of the current frame from the subband signals;
A long-term SNR computing unit, configured to compute the ratio of the average long-term active-speech signal energy to the average long-term background noise energy, both computed for the previous frame, to obtain the long-term SNR lt_snr;
An SNR threshold adjusting unit, configured to adjust the SNR threshold of the VAD decision according to the spectral centroid feature parameter, the long-term SNR, the number of preceding consecutive active-speech frames, and the number of preceding consecutive noise frames continuous_noise_num.
The method and apparatus of the present invention overcome the shortcomings of existing VAD algorithms: they improve the efficiency of VAD under non-stationary noise while also improving the accuracy of music detection, so that speech and audio signal processing algorithms that use this VAD achieve better performance.
Description of the drawings
Fig. 1 is a schematic diagram of embodiment 1 of the VAD method of the present invention;
Fig. 2 is a schematic diagram of embodiment 2 of the VAD method of the present invention;
Fig. 3 is a schematic diagram of the process of obtaining the VAD decision in embodiments 1 and 2 of the present invention;
Fig. 4 is a schematic diagram of the module structure of embodiment 1 of the VAD apparatus of the present invention;
Fig. 5 is a schematic diagram of the module structure of embodiment 2 of the VAD apparatus of the present invention;
Fig. 6 is a schematic diagram of the module structure of the VAD decision unit in the VAD apparatus of the present invention;
Fig. 7 is a schematic diagram of an embodiment of the background noise detection method of the present invention;
Fig. 8 is a schematic diagram of the module structure of the background noise detection apparatus of the present invention;
Fig. 9 is a schematic diagram of an embodiment of the tonality signal detection method of the present invention;
Fig. 10 is a schematic diagram of the module structure of the tonality signal detection apparatus of the present invention;
Fig. 11 is a schematic diagram of the module structure of the tonality signal judging unit of the tonality signal detection apparatus of the present invention;
Fig. 12 is a schematic diagram of an embodiment of the method of the present invention for correcting the current active-speech hangover frame count in the VAD decision;
Fig. 13 is a schematic diagram of the module structure of the apparatus of the present invention for correcting the current active-speech hangover frame count in the VAD decision;
Fig. 14 is a schematic diagram of an embodiment of the method of the present invention for adjusting the SNR threshold in the VAD decision;
Fig. 15 is a detailed flowchart of adjusting the SNR threshold according to the present invention;
Fig. 16 is a schematic diagram of the module structure of the apparatus of the present invention for adjusting the SNR threshold in the VAD decision.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another.
Embodiment 1 of the voice activity detection (VAD) method of the present invention, as shown in Fig. 1, includes:
Step 101: Obtain the subband signals and spectral amplitudes of the current frame;
This embodiment is described using an audio stream with a frame length of 20 ms and a sampling rate of 32 kHz as an example. The method of the present invention applies equally under other frame lengths and sampling rates.
The time-domain signal of the current frame is input to a filter bank, which performs subband filtering to obtain the filter bank subband signals;
This embodiment uses a 40-channel filter bank; the present invention applies equally to filter banks with other numbers of channels.
The time-domain signal of the current frame is input to the 40-channel filter bank, and subband filtering yields the filter bank subband signals X[k, l], 0 ≤ k < 40, 0 ≤ l < 16, of 40 subbands at 16 time samples, where k is the filter bank subband index, indicating the subband to which a coefficient belongs, and l is the time sample index within each subband. The implementation steps are as follows:
101a: Store the 640 most recent audio signal samples in a data buffer.
101b: Shift the data in the buffer by 40 positions, so that the 40 oldest samples leave the buffer, and store 40 new samples at positions 0 to 39.
Multiply the buffered data x by the window coefficients to obtain the array z:
$z[n] = x[n] \cdot W_{qmf}[n], \quad 0 \le n < 640$
where W_qmf denotes the filter bank window coefficients.
Compute an 80-point data array u using the following pseudocode,
Compute the arrays r and i using the following equations:
$r[n] = u[n] - u[79-n], \quad 0 \le n < 40$
$i[n] = u[n] + u[79-n], \quad 0 \le n < 40$
101c: Repeat the computation of 101b until all data of the current frame have been filtered by the filter bank; the final output is the filter bank subband signal X[k, l].
101d: After the above computation is complete, the filter bank subband signals X[k, l], 0 ≤ k < 40, 0 ≤ l < 16, of 40 subbands at 16 time samples are obtained.
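The buffer handling in steps 101a and 101b, and the sum and difference arrays r and i, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the ordering of the 40 new samples within positions 0 to 39 is left to the caller, and the computation of u and of X[k, l] from r and i is not shown because the source text does not reproduce it.

```python
def shift_in_samples(buffer, new_samples):
    """Step 101b: drop the oldest samples and store the new ones at positions 0..hop-1."""
    hop = len(new_samples)
    # Existing data moves up by `hop` positions; the oldest `hop` values fall off the end.
    return list(new_samples) + list(buffer[:-hop])


def apply_window(buffer, window):
    """z[n] = x[n] * W_qmf[n] for every buffer position."""
    return [x * w for x, w in zip(buffer, window)]


def split_sum_difference(u):
    """r[n] = u[n] - u[79-n] and i[n] = u[n] + u[79-n] for 0 <= n < 40 (for an 80-point u)."""
    half = len(u) // 2
    last = len(u) - 1
    r = [u[n] - u[last - n] for n in range(half)]
    i = [u[n] + u[last - n] for n in range(half)]
    return r, i
```

With a 20 ms frame at 32 kHz (640 samples) and a hop of 40 samples, `shift_in_samples` would be called 16 times per frame, which matches the 16 time samples per subband.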
A time-frequency transform is then applied to the filter bank subband signals, and the spectral amplitudes are computed.
The embodiments of the present invention can be implemented by applying the time-frequency transform to all or only some of the filter bank subbands and computing the spectral amplitudes. The time-frequency transform of the present invention may be a DFT, FFT, DCT, or DST. This embodiment uses the DFT as an example to describe a concrete implementation. The computation is as follows:
A 16-point DFT is applied to the 16 time samples on each filter bank subband with index 0 to 9, which further increases the spectral resolution, and the amplitude at each frequency bin is computed to obtain the spectral amplitudes X_DFT_AMP.
The amplitude at each frequency bin is computed as follows:
First, compute the energy of the array X_DFT[k][j] at each point:
$X_{DFT\_POW}[k,j] = (\mathrm{real}(X_{DFT}[k,j]))^2 + (\mathrm{image}(X_{DFT}[k,j]))^2, \quad 0 \le k < 10, \ 0 \le j < 16$
where real(X_DFT[k, j]) and image(X_DFT[k, j]) denote the real and imaginary parts of the spectral coefficient X_DFT[k, j], respectively.
If k is even, the spectral amplitude at each frequency bin is computed using the following equation:
If k is odd, the spectral amplitude at each frequency bin is computed using the following equation:
X_DFT_AMP is the spectral amplitude after the time-frequency transform.
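As an illustration of the 16-point DFT and the per-bin energy X_DFT_POW described above, the following sketch uses a plain textbook DFT. The function names are hypothetical, and the mapping from X_DFT_POW to X_DFT_AMP for even and odd k is not included, since those equations are not reproduced in the text.

```python
import cmath


def dft(samples):
    """Plain O(N^2) discrete Fourier transform of a list of (complex) samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * j * t / n) for t in range(n))
        for j in range(n)
    ]


def dft_power(subband_signal, num_subbands=10):
    """X_DFT_POW[k][j] = real(X_DFT[k][j])^2 + image(X_DFT[k][j])^2 for subbands 0..num_subbands-1."""
    power = []
    for k in range(num_subbands):
        spectrum = dft(subband_signal[k])  # 16 time samples of subband k
        power.append([c.real ** 2 + c.imag ** 2 for c in spectrum])
    return power
```

Here `subband_signal[k]` is assumed to hold the 16 complex time samples X[k, 0..15] of subband k; only the first ten subbands are transformed, as in this embodiment.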
Step 102: Compute the values of the frame energy parameter and the spectral centroid feature parameter of the current frame from the subband signals;
The values of the frame energy parameter, the spectral centroid feature parameter, and the tonality feature parameter can be obtained by prior-art methods. Preferably, each parameter is obtained as follows:
The frame energy parameter is the weighted or direct sum of the subband signal energies. Specifically:
a) Compute the energy of each filter bank subband from the filter bank subband signals X[k, l]:
b) Accumulate the energies of some of the perceptually more sensitive filter bank subbands, or of all filter bank subbands, to obtain the frame energy parameter.
According to psychoacoustic models, the human ear is less sensitive to very low frequencies (for example, below 100 Hz) and to high frequencies (for example, above 20 kHz). The present invention regards the filter bank subbands from the second subband to the second-to-last subband, in order of increasing frequency, as the main perceptually sensitive subbands. The energies of some or all of these subbands are accumulated to obtain frame energy parameter 1:
where e_sb_start is the starting subband index, with value range [0, 6], and e_sb_end is the ending subband index, whose value is greater than 6 and less than the total number of subbands.
Frame energy parameter 2 is obtained by adding to frame energy parameter 1 the weighted energies of some or all of the filter bank subbands not used in computing frame energy parameter 1:
where e_scale1 and e_scale2 are weighting scale factors, each with value range [0, 1], and num_band is the total number of subbands.
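One possible reading of the two frame energy parameters is sketched below. Several details are assumptions rather than statements from the text: the subband energy is taken as the sum of squared magnitudes over the time samples, the range from e_sb_start to e_sb_end is treated as inclusive, and e_scale1 and e_scale2 are applied to the unused subbands below and above that range, respectively. All function names are hypothetical.

```python
def subband_energies(X):
    """Energy of each subband, assumed here to be the sum of |X[k][l]|^2 over time samples l."""
    return [sum(abs(v) ** 2 for v in row) for row in X]


def frame_energy_1(e_sb, e_sb_start, e_sb_end):
    """Accumulate the energies of subbands e_sb_start..e_sb_end (inclusive range is an assumption)."""
    return sum(e_sb[e_sb_start:e_sb_end + 1])


def frame_energy_2(e_sb, e_sb_start, e_sb_end, e_scale1, e_scale2):
    """Frame energy 1 plus weighted energies of the subbands it did not use (weighting layout assumed)."""
    low = sum(e_sb[:e_sb_start])        # unused subbands below the range
    high = sum(e_sb[e_sb_end + 1:])     # unused subbands above the range
    return frame_energy_1(e_sb, e_sb_start, e_sb_end) + e_scale1 * low + e_scale2 * high
```

For a 40-subband filter bank, `frame_energy_1(e_sb, 1, 38)` would cover the second through the second-to-last subband, the range the text describes as perceptually most sensitive.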
The spectral centroid feature parameter is obtained as the ratio of the weighted sum of the filter bank subband energies to their direct (unweighted) sum, or by smoothing other spectral centroid feature parameter values.
The spectral centroid feature parameter may be computed using the following sub-steps:
a: The subbands are divided into intervals for the spectral centroid computation as follows:
b: Using the interval division of a and the following formula, compute two spectral centroid feature parameter values: the first-interval spectral centroid feature parameter and the second-interval spectral centroid feature parameter.
Delta1 and Delta2 are each a small offset with value range (0, 1), and k is the spectral centroid index.
c: Smooth the first-interval spectral centroid feature parameter sp_center[0] to obtain the smoothed spectral centroid feature parameter value, that is, the smoothed value of the first-interval spectral centroid feature parameter:
$sp\_center[2] = sp\_center_{-1}[2] \cdot spc\_sm\_scale + sp\_center[0] \cdot (1 - spc\_sm\_scale)$
where spc\_sm\_scale is the smoothing scale factor for the spectral centroid parameter, and $sp\_center_{-1}[2]$ denotes the smoothed spectral centroid feature parameter value of the previous frame; its initial value is 1.6.
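The smoothing in sub-step c is a first-order recursive filter, which can be sketched as follows. The function names and the example value of spc_sm_scale are hypothetical; only the recursion and the initial value 1.6 come from the text.

```python
INITIAL_SMOOTHED_CENTROID = 1.6  # initial value of sp_center[2] given in the text


def smooth_centroid(prev_smoothed, sp_center_0, spc_sm_scale):
    """sp_center[2] = sp_center_prev[2] * spc_sm_scale + sp_center[0] * (1 - spc_sm_scale)."""
    return prev_smoothed * spc_sm_scale + sp_center_0 * (1.0 - spc_sm_scale)


def track_smoothed_centroid(first_interval_centroids, spc_sm_scale):
    """Run the recursion over a sequence of per-frame sp_center[0] values."""
    smoothed = INITIAL_SMOOTHED_CENTROID
    history = []
    for value in first_interval_centroids:
        smoothed = smooth_centroid(smoothed, value, spc_sm_scale)
        history.append(smoothed)
    return history
```

A scale factor close to 1 makes the smoothed value follow the per-frame centroid slowly; a value close to 0 makes it track the current frame almost directly.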
Step 103: Compute the signal-to-noise ratio (SNR) parameter of the current frame from the background noise energy estimated for the previous frame, the frame energy parameter of the current frame, and the SNR subband energies;
The background noise energy of the previous frame can be obtained by existing methods.
If the current frame is the start frame, the SNR subband background noise energies take default initial values. The SNR subband background noise energy of the previous frame is estimated on the same principle as that of the current frame; for the estimation for the current frame, see step 207 in embodiment 2 below. Specifically, the SNR parameter of the current frame can be obtained by existing SNR computation methods. Preferably, the following method is used:
First, regroup the filter bank subbands into several SNR subbands, with the division indices given in the following table,
Second, compute the subband average SNR, SNR1, from the energy of each SNR subband of the current frame and the background noise energy of each SNR subband of the previous frame:
where E_sb2_bg is the estimated background noise energy of each SNR subband of the previous frame, and num_band is the number of SNR subbands. The background noise energy of the SNR subbands of the previous frame is obtained on the same principle as that of the current frame; for the process for the current frame, see step 207 of embodiment 2 below;
Finally, compute the full-band SNR, SNR2, from the estimated full-band background noise energy of the previous frame and the frame energy parameter of the current frame:
where E_t_bg is the estimated full-band background noise energy of the previous frame. It is obtained on the same principle as the full-band background noise energy of the current frame; for the process for the current frame, see step 207 of embodiment 2 below;
In this embodiment, the SNR parameter comprises the subband average SNR SNR1 and the full-band SNR SNR2. The full-band background noise energy and the background noise energy of each subband are collectively referred to as the background noise energy.
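The two SNR quantities can be illustrated with the sketch below. The text does not reproduce the equations, so the use of a base-2 logarithm of the energy ratio, and the plain averaging over SNR subbands, are assumptions made for illustration; the function names are hypothetical.

```python
import math


def subband_average_snr(e_sb2, e_sb2_bg):
    """SNR1 sketch: mean over SNR subbands of log2(current energy / previous-frame noise energy).

    The logarithmic form is an assumption. All energies must be strictly positive.
    """
    ratios = [math.log2(e / bg) for e, bg in zip(e_sb2, e_sb2_bg)]
    return sum(ratios) / len(ratios)


def full_band_snr(frame_energy, e_t_bg):
    """SNR2 sketch: log2(frame energy of current frame / full-band noise energy of previous frame)."""
    return math.log2(frame_energy / e_t_bg)
```

In both functions the noise energies are the estimates carried over from the previous frame, so the SNR of the current frame never depends on a noise update made from the current frame itself.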
Step 104: Compute the VAD decision from the tonality flag, the SNR parameter, the spectral centroid feature parameter, and the frame energy parameter.
Embodiment 2
In embodiment 2 of the voice activity detection (VAD) method of the present invention, the input audio signal is divided into frames and polyphase-filtered to obtain the filter bank subband signals; a time-frequency transform is then applied to the filter bank subband signals and the spectral amplitudes are computed. Signal features are extracted from the filter bank subband signals and from the spectral amplitudes to obtain the feature parameter values. The background noise flag and the tonality flag of the current frame are computed from the feature parameter values. The SNR parameter of the current frame is computed from the frame energy parameter value of the current frame and the background noise energy. Whether the current frame is an active-speech frame is decided from the computed SNR parameter of the current frame, the VAD (Voice Activity Detection) decision of the previous frame, and the feature parameters. The background noise flag is corrected according to the active-speech frame decision to obtain a new background noise flag, which determines whether the background noise is updated. The VAD procedure in detail is as follows:
As shown in Fig. 2, embodiment 2 of the method includes:
Step 201: Obtain the subband signals and spectral amplitudes of the current frame;
Step 202: Compute the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter of the current frame from the subband signals; compute the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
The frame energy parameter is the weighted or direct sum of the subband signal energies;
The spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies;
Specifically,
The spectral centroid feature parameter is computed from the energies of the filter bank subbands. It is obtained as the ratio of the weighted sum of the filter bank subband energies to their direct sum, or by smoothing other spectral centroid feature parameter values.
The spectral centroid feature parameter may be computed using the following sub-steps:
a: The subbands are divided into intervals for the spectral centroid computation as follows:
b: Using the interval division of a and the following formula, compute two spectral centroid feature parameter values: the first-interval spectral centroid feature parameter and the second-interval spectral centroid feature parameter.
Delta1 and Delta2 are each a small offset with value range (0, 1), and k is the spectral centroid index.
c: Smooth the first-interval spectral centroid feature parameter sp_center[0] to obtain the smoothed spectral centroid feature parameter value, that is, the smoothed value of the first-interval spectral centroid feature parameter:
$sp\_center[2] = sp\_center_{-1}[2] \cdot spc\_sm\_scale + sp\_center[0] \cdot (1 - spc\_sm\_scale)$
where spc\_sm\_scale is the smoothing scale factor for the spectral centroid parameter, and $sp\_center_{-1}[2]$ denotes the smoothed spectral centroid feature parameter value of the previous frame; its initial value is 1.6.
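The ratio in sub-step b can be illustrated as an energy-weighted centroid over one subband interval. The formula and the interval table are not reproduced in the text, so the following choices are assumptions made for illustration: weights of 1, 2, 3, ... counted from the start of the interval, Delta1 added to the numerator and Delta2 to the denominator, and interval bounds passed in by the caller. The function name is hypothetical.

```python
def spectral_centroid(e_sb, start, end, delta1=0.5, delta2=0.5):
    """Ratio of the weighted sum of subband energies to their direct sum over [start, end].

    Weights, offset placement, and interval bounds are illustrative assumptions.
    delta1 and delta2 are small offsets in (0, 1) that keep the ratio defined for silent frames.
    """
    band = e_sb[start:end + 1]
    weighted = sum((idx + 1) * energy for idx, energy in enumerate(band))
    direct = sum(band)
    return (weighted + delta1) / (direct + delta2)
```

Energy concentrated in higher subbands yields a larger value, which is what lets the parameter separate signals with different spectral tilt.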
The time-domain stability feature parameter is the ratio of the variance of the amplitude sums to the expectation of the squared amplitude sums, or that ratio multiplied by a coefficient;
Specifically,
The time-domain stability feature parameter is computed from the frame energy parameters of the most recent frames. This embodiment uses the frame energy parameters of the most recent 40 frames. The computation steps are:
First, compute the energy amplitudes of the most recent 40 frames:
where e_offset is an offset with value range [0, 0.1].
Second, add the energy amplitudes of adjacent frame pairs, from the current frame back to the 40th most recent frame, to obtain 20 amplitude sums:
$Amp_{t2}(n) = Amp_{t1}(-2n) + Amp_{t1}(-2n-1), \quad 0 \le n < 20$
where Amp_t1(n) with n = 0 denotes the energy amplitude of the current frame, and Amp_t1(n) with n < 0 denotes the energy amplitude of the frame |n| frames before the current frame.
Finally, compute the ratio of the variance of the 20 most recent amplitude sums to their average energy to obtain time-domain stability feature parameter 1, td_stable_rate0:
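The three steps above can be sketched as follows. The pairing of adjacent frames and the variance-to-mean-square ratio follow the text; the energy amplitude is assumed here to be the square root of the frame energy plus e_offset, because that equation is not reproduced. The function name and the list layout (index 0 is the current frame, index m is m frames back) are hypothetical.

```python
import math


def time_domain_stability(frame_energies, e_offset=0.001):
    """td_stable_rate0 sketch from the 40 most recent frame energies (index 0 = current frame)."""
    # Energy amplitude of each frame; sqrt(energy) + e_offset is an assumption.
    amp_t1 = [math.sqrt(e) + e_offset for e in frame_energies[:40]]
    # Amp_t2(n) = Amp_t1(-2n) + Amp_t1(-2n-1): sums over adjacent frame pairs.
    amp_t2 = [amp_t1[2 * n] + amp_t1[2 * n + 1] for n in range(20)]
    mean = sum(amp_t2) / len(amp_t2)
    variance = sum((a - mean) ** 2 for a in amp_t2) / len(amp_t2)
    mean_square = sum(a * a for a in amp_t2) / len(amp_t2)
    if mean_square == 0.0:
        return 0.0  # all-zero input with e_offset = 0
    return variance / mean_square
```

A steady signal gives a value near 0, while a signal whose energy fluctuates from pair to pair gives a larger value.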
The spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, or that ratio multiplied by a coefficient;
Specifically, the spectral amplitudes X_DFT_AMP are divided into several frequency bands, and the spectral flatness of each band of the current frame is computed to obtain the spectral flatness feature parameters of the current frame.
This embodiment divides the spectral amplitudes into 3 frequency bands and computes the spectral flatness feature of each. The implementation steps are as follows:
First, divide X_DFT_AMP into 3 frequency bands according to the indices in the following table.
Second, compute the spectral flatness of each band to obtain the spectral flatness feature parameters of the current frame. The equation for each spectral flatness feature parameter value of the current frame is as follows:
Finally, apply smoothing filtering to the spectral flatness feature parameters of the current frame to obtain the final spectral flatness feature parameters of the current frame:
sSMR(k) = smr_scale·sSMR_{-1}(k) + (1 - smr_scale)·SMR(k); 0 ≤ k < 3
where smr_scale is the smoothing factor, with value range [0.6, 1], and sSMR_{-1}(k) is the value of the k-th spectral flatness feature parameter of the previous frame.
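A minimal sketch of the per-band flatness and its smoothing, following the geometric-mean over arithmetic-mean definition above. The band index table is not reproduced in this text, so the band boundaries are passed in by the caller and are hypothetical. The eps floor and the default smr_scale of 0.7 (within the stated [0.6, 1]) are also assumptions.

```python
import math


def spectral_flatness(amps, bands, eps=1e-12):
    """Return one flatness value (geometric mean / arithmetic mean) per band.

    bands is a list of (start, end) bin indices, inclusive; the real table is
    not available here, so callers must supply their own (hypothetical) bands.
    """
    values = []
    for start, end in bands:
        seg = [max(a, eps) for a in amps[start:end + 1]]  # floor avoids log(0)
        geometric = math.exp(sum(math.log(a) for a in seg) / len(seg))
        arithmetic = sum(seg) / len(seg)
        values.append(geometric / arithmetic)
    return values


def smooth_flatness(prev_ssmr, smr, smr_scale=0.7):
    """sSMR(k) = smr_scale * sSMR_prev(k) + (1 - smr_scale) * SMR(k)."""
    return [smr_scale * p + (1.0 - smr_scale) * s for p, s in zip(prev_ssmr, smr)]
```

A flat spectrum yields a flatness of 1, and a peaky spectrum yields a value below 1, which is why low flatness in every band argues against noise.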
The tonality feature parameter is obtained by computing the correlation of the intra-frame spectral difference coefficients of two consecutive frames, or by further smoothing that correlation value.
Specifically, the correlation of the intra-frame spectral difference coefficients of two consecutive frames is computed as follows:
The tonality feature parameter is computed from the spectral amplitudes, using either all of the spectral amplitudes or a subset of them.
The calculation steps are as follows:
a. Take the difference between adjacent spectral amplitudes over part (no fewer than 8 spectral coefficients) or all of the spectral amplitudes, and set every difference that is less than 0 to 0, to obtain a set of non-negative spectral difference coefficients.
This embodiment takes the frequency bin coefficients with position indices 3 to 61 as an example for computing the tonality feature parameter. The detailed process is as follows:
Take the difference between adjacent spectral amplitudes from bin 3 to bin 61. The equation is as follows:
spec_dif[n-3] = X_DFT_AMP(n+1) - X_DFT_AMP(n); 3 ≤ n < 62
Set every element of spec_dif that is less than 0 to 0.
b. Compute the correlation coefficient between the non-negative spectral difference coefficients of the current frame obtained in step a and the non-negative spectral difference coefficients of the previous frame, to obtain the first tonality feature parameter value. The equation is as follows:
where pre_spec_dif is the non-negative spectral difference coefficients of the previous frame.
c. Smooth the first tonality feature parameter value to obtain the second tonality feature parameter value. The equation is as follows:
tonality_rate2 = tonal_scale·tonality_rate2_{-1} + (1 - tonal_scale)·tonality_rate1
where tonal_scale is the tonality feature parameter smoothing factor, with value range [0.1, 1], and tonality_rate2_{-1} is the second tonality feature parameter value of the previous frame, whose initial value lies in [0, 1].
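Steps a to c can be sketched as follows. The correlation equation of step b is not reproduced in this text, so the sketch assumes a normalized cross-correlation (sum of products divided by the square root of the product of the energies); that formula, the function names, and the default tonal_scale of 0.9 are assumptions.

```python
import math


def spectral_diff(amps):
    """Step a: non-negative differences of adjacent amplitudes for bins 3..61.

    Needs amps[3] .. amps[62]; returns 59 coefficients.
    """
    return [max(amps[n + 1] - amps[n], 0.0) for n in range(3, 62)]


def tonality_rate1(spec_dif, pre_spec_dif):
    """Step b (ASSUMED formula): normalized correlation of the two coefficient sets."""
    num = sum(a * b for a, b in zip(spec_dif, pre_spec_dif))
    den = math.sqrt(sum(a * a for a in spec_dif) * sum(b * b for b in pre_spec_dif))
    return num / den if den > 0.0 else 0.0


def tonality_rate2(prev_rate2, rate1, tonal_scale=0.9):
    """Step c: smooth the first tonality value with the previous second value."""
    return tonal_scale * prev_rate2 + (1.0 - tonal_scale) * rate1
```

Under this assumed formula, two frames with the same spectral peak pattern give a first tonality value near 1, which matches the intent that stable tonal components produce high correlation.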
Step 203: Compute the signal-to-noise ratio parameters of the current frame from the background noise energy estimated at the previous frame, the frame energy parameter of the current frame, and the signal-to-noise ratio subband energies;
Step 204: Compute the initial background noise flag and the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;
Step 205: Compute the VAD decision result from the tonality flag, the signal-to-noise ratio parameters, the spectral centroid feature parameter, and the frame energy parameter;
Specifically, the implementation of Step 205 is described below with reference to Fig. 3.
It will be understood that the steps preceding the VAD decision of Step 205 can be reordered as long as their parameters do not depend on one another. For example, Step 204, which obtains the initial background noise flag and the tonality flag, can be performed before the signal-to-noise ratio calculation of Step 203.
The initial background noise flag of the current frame is used, after correction, to compute the signal-to-noise ratio parameters of the next frame, so the operation that obtains it can also be performed after the VAD decision.
Step 206: Correct the initial background noise flag according to the VAD decision result of the current frame, the tonality feature parameter, the signal-to-noise ratio parameters, the tonality flag, and the time-domain stability feature parameter;
If the signal-to-noise ratio parameter SNR2 is less than a set threshold SNR2_redec_thr1, SNR1 is less than SNR1_redec_thr1, the VAD flag vad_flag is equal to 0, the tonality feature parameter tonality_rate2 is less than tonality_rate2_thr1, the tonality flag tonality_flag is equal to 0, and the time-domain stability feature parameter lt_stable_rate0 is less than lt_stable_rate0_redec_thr1 (set to 0.1), then the background noise flag is set to 1.
Step 207: Obtain the background noise energy of the current frame from the corrected background noise flag, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame; the background noise energy of the current frame is used to compute the signal-to-noise ratio parameters of the next frame.
Whether to update the background noise is decided by the background noise flag: if the flag is 1, the background noise is updated according to the ratio of the estimated full-band background noise energy to the energy of the current frame signal. Background noise energy estimation comprises subband background noise energy estimation and full-band background noise energy estimation.
a. The subband background noise energy estimation equation is as follows:
E_sb2_bg(k) = E_sb2_bg_pre(k)·α_bg_e + E_sb2_bg(k)·(1 - α_bg_e); 0 ≤ k < num_sb
where num_sb is the number of frequency-domain subbands, and E_sb2_bg_pre(k) is the subband background noise energy of the k-th signal-to-noise ratio subband of the previous frame.
α_bg_e is the background noise update factor. Its value is determined by the full-band background noise energy of the previous frame and the frame energy parameter of the current frame, as follows:
If the full-band background noise energy E_t_bg of the previous frame is less than the frame energy parameter E_t1 of the current frame, α_bg_e takes the value 0.96; otherwise it takes the value 0.95.
b. Full-band background noise energy estimation:
If the background noise flag of the current frame is 1, update the background noise energy accumulated value E_t_sum and the accumulated background noise frame count NE_t_counter. The equations are as follows:
E_t_sum = E_t_sum_-1 + E_t1;
NE_t_counter = NE_t_counter_-1 + 1;
where E_t_sum_-1 is the background noise energy accumulated value of the previous frame, and NE_t_counter_-1 is the accumulated background noise frame count computed at the previous frame.
c. The full-band background noise energy is obtained as the ratio of the background noise energy accumulated value E_t_sum to the accumulated frame count NE_t_counter:
E_t_bg = E_t_sum / NE_t_counter
Then check whether NE_t_counter is equal to 64; if it is, multiply both the accumulated value E_t_sum and the accumulated frame count NE_t_counter by 0.75.
d. Adjust the subband background noise energies and the background noise energy accumulated value according to the tonality flag, the frame energy parameter, and the full-band background noise energy. The process is as follows:
If the tonality flag tonality_flag is equal to 1 and the frame energy parameter E_t1 is less than the background noise energy feature parameter E_t_bg multiplied by a gain coefficient gain,
then E_t_sum = E_t_sum·gain + delta; E_sb2_bg(k) = E_sb2_bg(k)·gain + delta;
where the value range of gain is [0.3, 1].
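Steps a to d of the background noise update can be sketched as one function. Several points are assumptions rather than statements of the text: the second term of the subband recursion is taken to be the current frame's subband energy (named e_sb2 here), delta is not defined in this passage and defaults to 0, gain defaults to 0.5 within the stated [0.3, 1], and the state container and all names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class NoiseState:
    et_sum: float = 0.0          # E_t_sum, accumulated background energy
    net_counter: float = 0.0     # NE_t_counter, accumulated frame count
    et_bg: float = 0.0           # E_t_bg, full-band background noise energy
    esb2_bg: list = field(default_factory=list)  # E_sb2_bg(k), per-subband


def update_background_noise(st, e_t1, e_sb2, background_flag, tonality_flag,
                            gain=0.5, delta=0.0):
    """Update the noise state in place for one frame and return it."""
    if background_flag == 1:
        # a. update factor depends on previous full-band noise vs. frame energy
        alpha = 0.96 if st.et_bg < e_t1 else 0.95
        # ASSUMPTION: the new input to the recursion is the current subband energy.
        st.esb2_bg = [bg * alpha + cur * (1.0 - alpha)
                      for bg, cur in zip(st.esb2_bg, e_sb2)]
        # b. accumulate full-band energy and frame count
        st.et_sum += e_t1
        st.net_counter += 1
    # c. full-band noise energy is the ratio of the two accumulators
    if st.net_counter > 0:
        st.et_bg = st.et_sum / st.net_counter
    if st.net_counter == 64:
        st.et_sum *= 0.75
        st.net_counter *= 0.75
    # d. pull the estimates down when a tonal frame is quieter than the noise floor
    if tonality_flag == 1 and e_t1 < st.et_bg * gain:
        st.et_sum = st.et_sum * gain + delta
        st.esb2_bg = [bg * gain + delta for bg in st.esb2_bg]
    return st
```

Scaling both accumulators by 0.75 at 64 frames leaves their ratio unchanged while giving later frames more weight.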
In Embodiments 1 and 2, the flow for computing the VAD decision result from the tonality flag, the signal-to-noise ratio parameters, the spectral centroid feature parameter, and the frame energy parameter is shown in Fig. 3 and includes the following steps:
Step 301: Compute the long-term signal-to-noise ratio lt_snr from the ratio of the long-term average active sound signal energy to the long-term average background noise energy computed at the previous frame;
The computation and definition of the long-term average active sound signal energy E_fg and the long-term average background noise energy E_bg are given in Step 307. The equation for the long-term signal-to-noise ratio lt_snr is as follows:
In this formula, the long-term signal-to-noise ratio lt_snr is expressed logarithmically.
Step 302: Compute the average of the full-band signal-to-noise ratio SNR2 over the most recent frames to obtain the average full-band signal-to-noise ratio SNR2_lt_ave;
The equation is as follows:
where SNR2(n) is the full-band signal-to-noise ratio SNR2 of the n-th frame before the current frame, and F_num is the total number of frames averaged, with value range [8, 64].
Step 303: Obtain the signal-to-noise ratio threshold snr_thr for the VAD decision from the spectral centroid feature parameter, the long-term signal-to-noise ratio lt_snr, the number of preceding consecutive active sound frames continuous_speech_num, and the number of preceding consecutive noise frames continuous_noise_num;
The implementation steps are as follows:
First, set the initial value of the signal-to-noise ratio threshold snr_thr, in the range [0.1, 2], preferably 1.06.
Second, make a first adjustment to snr_thr according to the spectral centroid feature parameter, as follows: if the value of the spectral centroid feature parameter sp_center[2] is greater than a set threshold spc_vad_dec_thr1, add an offset to snr_thr, preferably 0.05; otherwise, if sp_center[1] is greater than spc_vad_dec_thr2, add an offset to snr_thr, preferably 0.10; otherwise, add an offset to snr_thr, preferably 0.40. The value range of the thresholds spc_vad_dec_thr1 and spc_vad_dec_thr2 is [1.2, 2.5].
Third, make a second adjustment to snr_thr according to the number of preceding consecutive active sound frames continuous_speech_num, the number of preceding consecutive noise frames continuous_noise_num, the average full-band signal-to-noise ratio SNR2_lt_ave, and the long-term signal-to-noise ratio lt_snr. If continuous_speech_num is greater than a set threshold cpn_vad_dec_thr1, subtract 0.2 from snr_thr; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr2 and SNR2_lt_ave is greater than an offset plus lt_snr multiplied by the coefficient lt_tsnr_scale, add an offset to snr_thr, preferably 0.1; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr3, add an offset to snr_thr, preferably 0.2; otherwise, if continuous_noise_num is greater than a set threshold cpn_vad_dec_thr4, add an offset to snr_thr, preferably 0.1. The value range of the thresholds cpn_vad_dec_thr1, cpn_vad_dec_thr2, cpn_vad_dec_thr3 and cpn_vad_dec_thr4 is [2, 500], and the value range of the coefficient lt_tsnr_scale is [0, 2]. The invention can also be implemented by skipping this step and proceeding directly to the final step.
Finally, make a final adjustment to snr_thr according to the value of the long-term signal-to-noise ratio lt_snr to obtain the signal-to-noise ratio threshold snr_thr of the current frame.
The update equation is as follows:
snr_thr = snr_thr + (lt_tsnr - thr_offset)·thr_scale
where thr_offset is an offset with value range [0.5, 3], and thr_scale is a gain coefficient with value range [0.1, 1].
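The threshold adaptation of Step 303 can be sketched as a chain of adjustments. Every threshold value in PARAMS is a hypothetical choice from the ranges stated above. The offset named snr2_bias is not given a value in the text and is assumed. The symbol lt_tsnr in the update equation is treated as the long-term signal-to-noise ratio lt_snr, which is also an assumption.

```python
# Hypothetical parameter choices, each taken from the ranges stated in the text.
PARAMS = {
    "spc_vad_dec_thr1": 1.7, "spc_vad_dec_thr2": 1.7,
    "cpn_vad_dec_thr1": 20, "cpn_vad_dec_thr2": 100,
    "cpn_vad_dec_thr3": 50, "cpn_vad_dec_thr4": 10,
    "lt_tsnr_scale": 0.5, "snr2_bias": 0.5,   # snr2_bias is an assumed offset
    "thr_offset": 1.5, "thr_scale": 0.2,
}


def snr_threshold(sp_center, lt_snr, snr2_lt_ave, continuous_speech_num,
                  continuous_noise_num, p=PARAMS, second_adjust=True):
    """Return snr_thr for the current frame."""
    snr_thr = 1.06  # preferred initial value
    # First adjustment: spectral centroid.
    if sp_center[2] > p["spc_vad_dec_thr1"]:
        snr_thr += 0.05
    elif sp_center[1] > p["spc_vad_dec_thr2"]:
        snr_thr += 0.10
    else:
        snr_thr += 0.40
    # Second adjustment (optional per the text): recent speech / noise run lengths.
    if second_adjust:
        if continuous_speech_num > p["cpn_vad_dec_thr1"]:
            snr_thr -= 0.2
        elif (continuous_noise_num > p["cpn_vad_dec_thr2"]
              and snr2_lt_ave > p["snr2_bias"] + lt_snr * p["lt_tsnr_scale"]):
            snr_thr += 0.1
        elif continuous_noise_num > p["cpn_vad_dec_thr3"]:
            snr_thr += 0.2
        elif continuous_noise_num > p["cpn_vad_dec_thr4"]:
            snr_thr += 0.1
    # Final adjustment: long-term SNR (lt_tsnr in the equation, ASSUMED equal to lt_snr).
    snr_thr += (lt_snr - p["thr_offset"]) * p["thr_scale"]
    return snr_thr
```

With these choices, a long run of speech frames lowers the threshold and a long run of noise frames raises it.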
Step 304: Compute the initial VAD decision from the VAD decision threshold snr_thr and the signal-to-noise ratio parameters SNR1 and SNR2 computed for the current frame;
The process is as follows:
If SNR1 is greater than the decision threshold snr_thr, the current frame is judged to be an active sound frame. The value of the VAD flag vad_flag indicates whether the current frame is an active sound frame: in this embodiment, the value 1 indicates an active sound frame and the value 0 indicates an inactive sound frame. Otherwise, the current frame is judged to be an inactive sound frame and vad_flag is set to 0.
If SNR2 is greater than a set threshold snr2_thr, the current frame is judged to be an active sound frame and vad_flag is set to 1. The value range of snr2_thr is [1.2, 5.0].
Step 305: Correct the VAD decision result according to the tonality flag, the average full-band signal-to-noise ratio SNR2_lt_ave, the spectral centroid, and the long-term signal-to-noise ratio lt_snr;
The details are as follows:
If the tonality flag indicates that the current frame is a tonal signal, i.e., tonality_flag is 1, the current frame is judged to be an active sound signal and vad_flag is set to 1.
If the average full-band signal-to-noise ratio SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr1 plus the long-term signal-to-noise ratio lt_snr multiplied by the coefficient lt_tsnr_tscale, the current frame is judged to be an active sound frame and vad_flag is set to 1.
In this embodiment, the value range of SNR2_lt_ave_t_thr1 is [1, 4] and the value range of lt_tsnr_tscale is [0.1, 0.6].
If SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr2, the spectral centroid feature parameter sp_center[2] is greater than a set threshold sp_center_t_thr1, and lt_snr is less than a set threshold lt_tsnr_t_thr1, the current frame is judged to be an active sound frame and vad_flag is set to 1. The value range of SNR2_lt_ave_t_thr2 is [1.0, 2.5], that of sp_center_t_thr1 is [2.0, 4.0], and that of lt_tsnr_t_thr1 is [2.5, 5.0].
If SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr3, sp_center[2] is greater than a set threshold sp_center_t_thr2, and lt_snr is less than a set threshold lt_tsnr_t_thr2, the current frame is judged to be an active sound frame and vad_flag is set to 1. The value range of SNR2_lt_ave_t_thr3 is [0.8, 2.0], that of sp_center_t_thr2 is [2.0, 4.0], and that of lt_tsnr_t_thr2 is [2.5, 5.0].
If SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_t_thr4, sp_center[2] is greater than a set threshold sp_center_t_thr3, and lt_snr is less than a set threshold lt_tsnr_t_thr3, the current frame is judged to be an active sound frame and vad_flag is set to 1. The value range of SNR2_lt_ave_t_thr4 is [0.6, 2.0], that of sp_center_t_thr3 is [3.0, 6.0], and that of lt_tsnr_t_thr3 is [2.5, 5.0].
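The initial decision of Step 304 and the corrections of Step 305 can be sketched together. All numeric thresholds below are hypothetical picks from the stated ranges, and the function names are illustrative. Each entry of RULES holds (SNR2_lt_ave threshold, sp_center[2] threshold, lt_snr threshold) for one of the three centroid-based rules.

```python
# Hypothetical thresholds chosen inside the ranges given in the text:
# (SNR2_lt_ave_t_thrN, sp_center_t_thrM, lt_tsnr_t_thrM)
RULES = [(1.5, 2.5, 3.0), (1.2, 3.0, 4.0), (1.0, 4.0, 4.5)]


def initial_vad(snr1, snr2, snr_thr, snr2_thr=2.0):
    """Step 304: the frame is active if SNR1 exceeds snr_thr or SNR2 exceeds snr2_thr."""
    vad_flag = 1 if snr1 > snr_thr else 0
    if snr2 > snr2_thr:
        vad_flag = 1
    return vad_flag


def correct_vad(vad_flag, tonality_flag, snr2_lt_ave, sp_center2, lt_snr,
                snr2_lt_ave_t_thr1=2.0, lt_tsnr_tscale=0.3, rules=RULES):
    """Step 305: each rule can only raise vad_flag to 1, never clear it."""
    if tonality_flag == 1:
        return 1
    if snr2_lt_ave > snr2_lt_ave_t_thr1 + lt_snr * lt_tsnr_tscale:
        return 1
    for snr_thr, spc_thr, lt_thr in rules:
        if snr2_lt_ave > snr_thr and sp_center2 > spc_thr and lt_snr < lt_thr:
            return 1
    return vad_flag
```

Because every correction only sets the flag, the order in which the rules are tested does not change the result.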
Step 306: Correct the active sound hangover frame count according to the decision results of the preceding frames, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameters of the current frame, and the VAD decision result of the current frame;
The calculation steps are as follows:
The precondition for correcting the current active sound hangover frame count is that the active sound flag indicates that the current frame is an active sound frame. If this condition is not met, the value of the current hangover frame count num_speech_hangover is not corrected and the flow proceeds directly to Step 307.
The hangover frame count is corrected as follows:
If the number of preceding consecutive speech frames continuous_speech_num is less than a set threshold continuous_speech_num_thr1 and lt_tsnr is less than a set threshold lt_tsnr_h_thr1, the current hangover frame count num_speech_hangover is set to the minimum number of consecutive active sound frames minus continuous_speech_num. Otherwise, if SNR2_lt_ave is greater than a set threshold SNR2_lt_ave_thr1 and continuous_speech_num is greater than a set threshold continuous_speech_num_thr2, the value of num_speech_hangover is set according to the magnitude of the long-term signal-to-noise ratio lt_tsnr. Otherwise, the value of num_speech_hangover is not corrected. In this embodiment the minimum number of consecutive active sound frames is 8; it can take a value in [6, 20].
The details are as follows:
If the long-term signal-to-noise ratio lt_snr is greater than 2.6, num_speech_hangover is 3; otherwise, if lt_snr is greater than 1.6, num_speech_hangover is 4; otherwise, num_speech_hangover is 5.
Step 307: Add active sound hangover according to the decision result of the current frame and the hangover frame count num_speech_hangover to obtain the VAD decision result of the current frame.
The method is:
If the current frame is judged to be inactive sound, i.e., the active sound flag is 0, and the hangover frame count num_speech_hangover is greater than 0, add active sound hangover: set the active sound flag to 1 and decrement num_speech_hangover by 1.
This yields the final VAD decision result of the current frame.
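Steps 306 and 307 can be sketched as two small functions. The thresholds speech_thr1, lt_snr_h_thr1, snr2_thr1 and speech_thr2 stand in for continuous_speech_num_thr1, lt_tsnr_h_thr1, SNR2_lt_ave_thr1 and continuous_speech_num_thr2. Their values are hypothetical because the text gives no ranges for them. The symbols lt_tsnr and lt_snr are assumed to denote the same quantity.

```python
def correct_hangover(num_speech_hangover, vad_flag, continuous_speech_num,
                     lt_snr, snr2_lt_ave, min_continuous=8,
                     speech_thr1=8, lt_snr_h_thr1=4.0,
                     snr2_thr1=1.5, speech_thr2=20):
    """Step 306: return the corrected hangover frame count (thresholds are assumed)."""
    if vad_flag != 1:
        return num_speech_hangover  # precondition not met, leave unchanged
    if continuous_speech_num < speech_thr1 and lt_snr < lt_snr_h_thr1:
        return min_continuous - continuous_speech_num
    if snr2_lt_ave > snr2_thr1 and continuous_speech_num > speech_thr2:
        if lt_snr > 2.6:
            return 3
        if lt_snr > 1.6:
            return 4
        return 5
    return num_speech_hangover


def apply_hangover(vad_flag, num_speech_hangover):
    """Step 307: hold the active decision while hangover frames remain."""
    if vad_flag == 0 and num_speech_hangover > 0:
        return 1, num_speech_hangover - 1
    return vad_flag, num_speech_hangover
```

A typical per-frame use would call correct_hangover with the corrected decision from Step 305 and then pass its result to apply_hangover to obtain the final flag.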
Preferably, after Step 304 the method further includes computing the long-term average active sound signal energy E_fg according to the initial VAD decision result; and after Step 307 it further includes computing the long-term average background noise energy E_bg according to the VAD decision result. The computed values are used for the VAD decision of the next frame.
The long-term average active sound signal energy E_fg is computed as follows:
a) If the initial VAD decision result indicates that the current frame is an active sound frame, i.e., the value of the VAD flag is 1, and E_t1 is greater than several times E_bg (6 times in this embodiment), update the long-term average active sound energy accumulated value fg_energy and the long-term average active sound energy accumulated frame count fg_energy_count. The update adds E_t1 to fg_energy to obtain the new fg_energy, and adds 1 to fg_energy_count to obtain the new fg_energy_count.
b) To ensure that the long-term average active sound signal energy reflects the most recent active sound signal energy, if fg_energy_count is equal to a set value fg_max_frame_num, multiply both the accumulated frame count and the accumulated value by an attenuation coefficient attenu_coef1. In this embodiment fg_max_frame_num is 512 and attenu_coef1 is 0.75.
c) Divide the accumulated value fg_energy by the accumulated frame count fg_energy_count to obtain the long-term average active sound signal energy:
E_fg = fg_energy / fg_energy_count
The long-term average background noise energy E_bg is computed as follows:
Let bg_energy_count be the accumulated frame count of the background noise energy, which records how many frames of energy the most recent background noise energy accumulated value contains, and let bg_energy be the accumulated value of the most recent background noise energy.
a) If the current frame is judged to be an inactive sound frame, i.e., the value of the VAD flag is 0, and SNR2 is less than 1.0, update the background noise energy accumulated value bg_energy and the accumulated frame count bg_energy_count. The update adds E_t1 to bg_energy to obtain the new bg_energy, and adds 1 to bg_energy_count to obtain the new bg_energy_count.
b) If bg_energy_count is equal to the maximum frame count used for computing the long-term average background noise energy, multiply both the accumulated frame count and the accumulated value by the attenuation coefficient attenu_coef2. In this embodiment the maximum frame count is 512 and attenu_coef2 is 0.75.
c) Divide the accumulated value bg_energy by the accumulated frame count bg_energy_count to obtain the long-term average background noise energy:
E_bg = bg_energy / bg_energy_count
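Both long-term averages use the same accumulate, decay and divide pattern, so a single helper can serve for E_fg and E_bg. The class and function names are illustrative. The constants (512 frames, 0.75 decay, the factor 6 and the SNR2 bound of 1.0) are the embodiment values quoted above.

```python
class LongTermEnergy:
    """Running average with periodic decay, usable for E_fg or E_bg."""

    def __init__(self, max_frames=512, decay=0.75):
        self.energy = 0.0   # fg_energy or bg_energy
        self.count = 0.0    # fg_energy_count or bg_energy_count
        self.max_frames = max_frames
        self.decay = decay

    def add(self, e_t1):
        self.energy += e_t1
        self.count += 1
        if self.count == self.max_frames:
            # Scale both accumulators so that recent frames weigh more later on.
            self.energy *= self.decay
            self.count *= self.decay

    def average(self):
        return self.energy / self.count if self.count > 0 else 0.0


def update_fg(fg, initial_vad_flag, e_t1, e_bg, ratio=6.0):
    """Accumulate active sound energy only for clearly active frames."""
    if initial_vad_flag == 1 and e_t1 > ratio * e_bg:
        fg.add(e_t1)


def update_bg(bg, vad_flag, e_t1, snr2):
    """Accumulate background energy only for inactive, low-SNR frames."""
    if vad_flag == 0 and snr2 < 1.0:
        bg.add(e_t1)
```

Multiplying the sum and the count by the same coefficient leaves the current average unchanged, so the decay only affects how strongly future frames move it.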
To implement Embodiments 1 and 2 of the above active sound detection method, the present invention also provides Embodiment 1 of an active sound detection (VAD) device. As shown in Fig. 4, the device includes:
a filter bank, configured to obtain the subband signals of the current frame;
a spectral amplitude calculation unit, configured to obtain the spectral amplitudes of the current frame;
a feature parameter acquisition unit, configured to compute the values of the frame energy parameter and the spectral centroid feature parameter of the current frame from the subband signals;
a signal-to-noise ratio calculation unit, configured to compute the signal-to-noise ratio parameters of the current frame from the background noise energy estimated at the previous frame, the frame energy parameter of the current frame, and the signal-to-noise ratio subband energies;
a VAD decision unit, configured to compute the VAD decision result from the tonality flag, the signal-to-noise ratio parameters, the spectral centroid feature parameter, and the frame energy parameter.
Corresponding to method Embodiment 2, the feature parameter acquisition unit is further configured to compute the value of the time-domain stability feature parameter from the subband signals, and to compute the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
Each feature parameter can be obtained with existing methods, or as follows:
The frame energy parameter is the weighted sum or the direct sum of the subband signal energies;
The spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or part of the subband signal energies, or the value obtained by smoothing that ratio;
The time-domain stability feature parameter is the ratio of the variance of the amplitude superposition values to the expectation of the squared amplitude superposition values, or that ratio multiplied by a coefficient;
The spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, or that ratio multiplied by a coefficient;
The tonality feature parameter is obtained by computing the correlation of the intra-frame spectral difference coefficients of two consecutive frames, or by further smoothing that correlation value.
As shown in Fig. 5, Embodiment 2 of the active sound detection (VAD) device of the invention differs from Embodiment 1 in that the device further includes a flag calculation unit and a background noise energy processing unit, wherein:
the flag calculation unit is configured to compute the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;
the background noise energy processing unit includes:
a flag calculation module, configured to compute the initial background noise flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;
a flag correction module, configured to correct the initial background noise flag according to the VAD decision result of the current frame, the tonality feature parameter, the signal-to-noise ratio parameters, the tonality flag, and the time-domain stability feature parameter;
a background noise energy acquisition module, configured to obtain the background noise energy of the current frame from the corrected background noise flag, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame, the background noise energy of the current frame being used to compute the signal-to-noise ratio parameters of the next frame.
Corresponding to method Embodiments 1 and 2, as shown in Fig. 6, the VAD decision unit includes:
a long-term signal-to-noise ratio calculation module, configured to compute the long-term signal-to-noise ratio lt_snr from the ratio of the long-term average active sound signal energy to the long-term average background noise energy computed at the previous frame;
an average full-band signal-to-noise ratio calculation module, configured to compute the average of the full-band signal-to-noise ratio SNR2 over the most recent frames to obtain the average full-band signal-to-noise ratio SNR2_lt_ave;
a signal-to-noise ratio threshold calculation module, configured to obtain the signal-to-noise ratio threshold snr_thr for the VAD decision from the spectral centroid feature parameter, the long-term signal-to-noise ratio lt_snr, the number of preceding consecutive active sound frames continuous_speech_num, and the number of preceding consecutive noise frames continuous_noise_num;
an initial VAD decision module, configured to compute the initial VAD decision from the VAD decision threshold snr_thr and the signal-to-noise ratio parameters SNR1 and SNR2 computed for the current frame;
a VAD result correction module, configured to correct the VAD decision result according to the tonality flag, the average full-band signal-to-noise ratio SNR2_lt_ave, the spectral centroid, and the long-term signal-to-noise ratio lt_snr;
an active sound hangover frame correction module, configured to correct the active sound hangover frame count according to the decision results of the preceding frames, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio of the current frame, and the VAD decision result of the current frame;
a VAD decision module, configured to add active sound hangover according to the decision result of the current frame and the hangover frame count num_speech_hangover to obtain the VAD decision result of the current frame.
More preferably, the VAD decision unit further includes an energy calculation module, configured to compute the long-term average active sound signal energy E_fg according to the initial VAD decision result, and to update the long-term average background noise energy E_bg according to the VAD decision result, the updated values being used for the VAD decision of the next frame.
The present invention also provides an embodiment of a background noise detection method. As shown in Fig. 7, the method includes:
Step 701: Obtain the subband signals and spectral amplitudes of the current frame;
Step 702: Compute the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter from the subband signals, and compute the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
Preferably, the frame energy parameter is the weighted sum or the direct sum of the subband signal energies.
The spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or part of the subband signal energies, or the value obtained by smoothing that ratio.
The time-domain stability parameter is the ratio of the variance of the frame energy amplitudes to the expectation of the squared amplitude superposition values, or that ratio multiplied by a coefficient.
The spectral flatness parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, or that ratio multiplied by a coefficient.
Specifically, Steps 701 and 702 can use the same methods as described above, which are not repeated here.
Step 703: Perform background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the energy parameter of the current frame, to judge whether the current frame is background noise.
Preferably, if any one of the following conditions holds, the current frame is judged not to be a noise signal:
the time-domain stability parameter lt_stable_rate0 is greater than a set threshold;
the smoothing-filtered value of the first-interval spectral centroid feature parameter value is greater than a set threshold, and the time-domain stability feature parameter value is also greater than a set threshold;
the tonality feature parameter, or its smoothing-filtered value, is greater than a set threshold, and the value of the time-domain stability feature parameter lt_stable_rate0 is greater than its set threshold;
the spectral flatness feature parameter of each subband, or its smoothing-filtered value, is less than its corresponding set threshold;
or, the value of the frame energy parameter E_t1 is greater than a set threshold E_thr1.
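The five conditions above can be sketched as a single check that starts from the assumption "background noise" and clears the flag when any condition holds. The threshold values are hypothetical picks from the ranges the text gives for this embodiment. The exceptions are ssmr_thr, whose range is only partly stated, and e_thr1, for which no value is given, so both are placeholders.

```python
def detect_background(lt_stable_rate0, sp_center2, tonality_rate2, ssmr, e_t1,
                      lt_stable_rate_thr1=1.0, sp_center_thr1=2.0,
                      lt_stable_rate_thr2=0.05, tonality_rate_thr1=0.5,
                      lt_stable_rate_thr3=0.1, ssmr_thr=(0.9, 0.9, 0.9),
                      e_thr1=1.0e6):
    """Return background_flag (1 = background noise, 0 = not noise).

    ssmr_thr and e_thr1 are placeholder values, not taken from the text.
    """
    background_flag = 1  # start from the assumption that the frame is noise
    if lt_stable_rate0 > lt_stable_rate_thr1:
        background_flag = 0
    if sp_center2 > sp_center_thr1 and lt_stable_rate0 > lt_stable_rate_thr2:
        background_flag = 0
    if tonality_rate2 > tonality_rate_thr1 and lt_stable_rate0 > lt_stable_rate_thr3:
        background_flag = 0
    if all(s < t for s, t in zip(ssmr, ssmr_thr)):
        background_flag = 0  # every band is far from flat, so not noise-like
    if e_t1 > e_thr1:
        background_flag = 0
    return background_flag
```

The checks are independent, so they can be evaluated in any order; the flag stays 1 only when none of them fires.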
Specifically, the current frame is first assumed to be background noise.
This embodiment uses a background noise flag background_flag to indicate whether the current frame is background noise. By convention, background_flag is set to 1 if the current frame is judged to be background noise and to 0 otherwise.
Whether the current frame is a noise signal is detected according to the time-domain stability feature parameter, the spectral centroid feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the current frame energy parameter. If it is not a noise signal, the background noise flag background_flag is set to 0.
The detailed process is as follows:
Determine whether the time-domain stability parameter lt_stable_rate0 is greater than a set threshold lt_stable_rate_thr1. If so, the current frame is judged not to be a noise signal and background_flag is set to 0. In this embodiment, the value range of the threshold lt_stable_rate_thr1 is [0.8, 1.6];
Determine whether the smoothed spectral centroid feature parameter value is greater than a set threshold sp_center_thr1 and the time-domain stability feature parameter value is also greater than a set threshold lt_stable_rate_thr2. If so, the current frame is judged not to be a noise signal and background_flag is set to 0. The value range of sp_center_thr1 is [1.6, 4]; the value range of lt_stable_rate_thr2 is (0, 0.1].
Determine whether the tonality feature parameter tonality_rate2 is greater than a set threshold tonality_rate_thr1 and whether the time-domain stability feature parameter lt_stable_rate0 is greater than a set threshold lt_stable_rate_thr3. If both conditions hold, the current frame is judged not to be background noise and background_flag is set to 0. The value range of the threshold tonality_rate_thr1 is [0.4, 0.66]. The value range of the threshold lt_stable_rate_thr3 is [0.06, 0.3].
Determine whether the spectral flatness feature parameter sSMR[0] is less than a set threshold sSMR_thr1, whether sSMR[1] is less than a set threshold sSMR_thr2, and whether sSMR[2] is less than a set threshold sSMR_thr3. If all of these conditions hold, the current frame is judged not to be background noise and background_flag is set to 0. The value range of the thresholds sSMR_thr1, sSMR_thr2, and sSMR_thr3 is [0.88, 0.98]. Determine whether the spectral flatness feature parameter sSMR[0] is less than a set threshold sSMR_thr4, whether sSMR[1] is less than a set threshold sSMR_thr5, and whether sSMR[1] is less than a set threshold sSMR_thr6. If any one of these conditions holds, the current frame is judged not to be background noise and background_flag is set to 0. The value range of sSMR_thr4, sSMR_thr5, and sSMR_thr6 is [0.80, 0.92].
Determine whether the frame energy parameter E_t1 is greater than a set threshold E_thr1. If so, the current frame is judged not to be background noise and background_flag is set to 0. E_thr1 is chosen according to the dynamic range of the frame energy parameter.
If none of the above checks finds that the current frame is not background noise, the current frame is background noise.
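The decision logic of step 703 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `detect_background`, the `NoiseThresholds` container, and every default threshold are assumptions, with each default simply picked from inside the range stated in the text. `e_thr1` is a purely hypothetical value, since the text only says it depends on the dynamic range of the frame energy. The second spectral flatness check (thresholds sSMR_thr4 to sSMR_thr6) is omitted to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class NoiseThresholds:
    """Illustrative thresholds; each default is one value inside the stated range."""
    lt_stable_rate_thr1: float = 1.0    # stated range [0.8, 1.6]
    sp_center_thr1: float = 2.0         # stated range [1.6, 4]
    lt_stable_rate_thr2: float = 0.05   # stated range (0, 0.1]
    tonality_rate_thr1: float = 0.5     # stated range [0.4, 0.66]
    lt_stable_rate_thr3: float = 0.1    # stated range [0.06, 0.3]
    ssmr_thr: Tuple[float, float, float] = (0.9, 0.9, 0.9)  # sSMR_thr1..3, range [0.88, 0.98]
    e_thr1: float = 1.0e6               # hypothetical: the text gives no numeric range


def detect_background(
    lt_stable_rate0: float,
    sp_center_smoothed: float,
    tonality_rate2: float,
    ssmr: Sequence[float],
    frame_energy: float,
    thr: NoiseThresholds = NoiseThresholds(),
) -> int:
    """Return background_flag: 1 if the frame is kept as background noise, else 0."""
    # The frame starts out assumed to be background noise; any rule below vetoes that.
    not_noise = (
        lt_stable_rate0 > thr.lt_stable_rate_thr1
        or (sp_center_smoothed > thr.sp_center_thr1
            and lt_stable_rate0 > thr.lt_stable_rate_thr2)
        or (tonality_rate2 > thr.tonality_rate_thr1
            and lt_stable_rate0 > thr.lt_stable_rate_thr3)
        or all(s < t for s, t in zip(ssmr, thr.ssmr_thr))
        or frame_energy > thr.e_thr1
    )
    return 0 if not_noise else 1
```

For example, a frame with low stability, low tonality, flat spectra, and modest energy keeps `background_flag` at 1, whereas raising `tonality_rate2` above its threshold together with `lt_stable_rate0` above `lt_stable_rate_thr3` clears it.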
Corresponding to the above method, the present invention also provides a background noise detection device. As shown in Figure 8, the device includes:
A filter bank, configured to obtain the subband signals of the current frame;
A spectral amplitude calculation unit, configured to obtain the spectral amplitudes of the current frame;
A feature parameter calculation unit, configured to calculate the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter from the subband signals, and to calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
Preferably, the frame energy parameter is the weighted sum or the direct sum of the subband signal energies.
The spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio.
The time-domain stability parameter is the ratio of the variance of the frame energy amplitude to the expectation of the squared amplitude superposition value, or that ratio multiplied by a coefficient.
The spectral flatness parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, or that ratio multiplied by a coefficient.
A background noise judging unit, configured to perform background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the current frame energy parameter, and to determine whether the current frame is background noise.
Preferably, the background noise judging unit judges the current frame not to be a noise signal if any one of the following conditions holds:
The time-domain stability parameter lt_stable_rate0 is greater than a set threshold;
The smoothed value of the first-interval spectral centroid feature parameter is greater than a set threshold, and the time-domain stability feature parameter value is also greater than a set threshold;
The tonality feature parameter, or its smoothed value, is greater than a set threshold, and the time-domain stability feature parameter lt_stable_rate0 is greater than its set threshold;
The spectral flatness feature parameters of the subbands, or their respective smoothed values, are all less than the corresponding set thresholds;
Or, the frame energy parameter E_t1 is greater than a set threshold E_thr1.
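The feature definitions used by the feature parameter calculation unit (spectral centroid, spectral flatness, time-domain stability) can be illustrated with a short sketch. The function names, the default weights (subband index plus one), and the handling of empty or zero inputs are assumptions made for illustration; the text does not specify the weights, the optional scaling coefficients, or the smoothing filter, so none of those are modelled here.

```python
import math
from typing import Optional, Sequence


def spectral_centroid(subband_energy: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> float:
    """Weighted accumulated energy divided by unweighted accumulated energy."""
    if weights is None:
        # Assumed weights 1, 2, 3, ...; the text does not give the weighting.
        weights = range(1, len(subband_energy) + 1)
    total = sum(subband_energy)
    if total <= 0.0:
        return 0.0
    return sum(w * e for w, e in zip(weights, subband_energy)) / total


def spectral_flatness(amplitudes: Sequence[float]) -> float:
    """Geometric mean of the spectral amplitudes divided by their arithmetic mean."""
    n = len(amplitudes)
    if n == 0 or min(amplitudes) <= 0.0:
        return 0.0
    arithmetic = sum(amplitudes) / n
    geometric = math.exp(sum(math.log(a) for a in amplitudes) / n)
    return geometric / arithmetic


def time_domain_stability(amplitude_sums: Sequence[float]) -> float:
    """Variance of the amplitude superposition values over the expectation of their squares."""
    n = len(amplitude_sums)
    if n == 0:
        return 0.0
    mean = sum(amplitude_sums) / n
    mean_sq = sum(a * a for a in amplitude_sums) / n
    if mean_sq <= 0.0:
        return 0.0
    variance = max(mean_sq - mean * mean, 0.0)
    return variance / mean_sq
```

A perfectly flat spectrum gives a flatness close to 1 and a constant energy envelope gives a stability of 0, which matches the way the thresholds above treat low flatness and high stability values as signs of non-noise content.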
The present invention also provides a tonal signal detection method. As shown in Figure 9, the method includes:
Step 901: Obtain the subband signals and spectral amplitudes of the current frame;
Step 902: Calculate the values of the spectral centroid feature parameter and the time-domain stability feature parameter of the current frame from the subband signals, and calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
Preferably, the spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio; the time-domain stability feature parameter is the ratio of the variance of the amplitude superposition value to the expectation of the squared amplitude superposition value, or that ratio multiplied by a coefficient;
The spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, or that ratio multiplied by a coefficient;
The tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two adjacent frames, or by further smoothing that correlation value.
Step 903: Determine whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter.
When step 903 determines whether the frame is a tonal signal, the following operations are performed:
A) Assume that the current frame signal is a non-tonal signal, and use a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame.
In this embodiment, a tonality_frame value of 1 indicates that the current frame is a tonal frame, and 0 indicates that the current frame is a non-tonal frame;
B) Determine whether the tonality feature parameter tonality_rate1, or its smoothed value tonality_rate2, is greater than the corresponding set threshold tonality_decision_thr1 or tonality_decision_thr2. If either condition holds, perform step C); otherwise, perform step D);
Here, the value range of tonality_decision_thr1 is [0.5, 0.7], and the value range of tonality_rate1 is [0.7, 0.99].
C) If the time-domain stability feature parameter value lt_stable_rate0 is less than a set threshold lt_stable_decision_thr1, the spectral centroid feature parameter value sp_center[1] is greater than a set threshold spc_decision_thr1, and the spectral flatness feature parameter of each subband is less than its corresponding preset threshold (specifically, the spectral flatness feature parameter sSMR[0] is less than a set threshold sSMF_decision_thr1, or sSMR[1] is less than a set threshold sSMF_decision_thr2, or sSMR[2] is less than a set threshold sSMF_decision_thr3), then the current frame is judged to be a tonal frame and the tonal frame flag tonality_frame is set to 1; otherwise, it is judged to be a non-tonal frame and tonality_frame is set to 0. Then continue with step D).
Here, the value range of the threshold lt_stable_decision_thr1 is [0.01, 0.25], that of spc_decision_thr1 is [1.0, 1.8], that of sSMF_decision_thr1 is [0.6, 0.9], that of sSMF_decision_thr2 is [0.6, 0.9], and that of sSMF_decision_thr3 is [0.7, 0.98].
D) Update the tonality degree feature parameter tonality_degree according to the tonal frame flag tonality_frame. The initial value of tonality_degree is set when the VAD device starts working, and its value range is [0, 1]. tonality_degree is calculated differently in different cases:
If the current tonal frame flag indicates that the current frame is a tonal frame, tonality_degree is updated using the following equation:
tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;
where tonality_degree_{-1} is the tonality degree feature parameter of the previous frame, whose initial value lies in [0, 1]; td_scale_A is the attenuation coefficient, with value range [0, 1]; and td_scale_B is the accumulation coefficient, with value range [0, 1].
E) Determine whether the current frame is a tonal signal according to the updated tonality degree feature parameter tonality_degree, and set the value of the tonality flag tonality_flag.
Specifically, if tonality_degree is greater than a set threshold, the current frame is judged to be a tonal signal; otherwise, the current frame is judged to be a non-tonal signal.
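Steps A) to E) can be sketched as three small functions. This is an illustration under stated assumptions, not the embodiment's code: the function names and all default values are hypothetical, with the defaults chosen inside the ranges given above where a range is stated (`rate_thr2` has no clearly stated range, and `td_scale_a`, `td_scale_b` are arbitrary values in [0, 1]). The flatness test is implemented as "every subband below its threshold", which is one reading of step C). Only the tonal-frame update of tonality_degree is shown, because the text gives no formula for the other case.

```python
from typing import Sequence


def tonal_frame_flag(
    tonality_rate1: float,
    tonality_rate2: float,
    lt_stable_rate0: float,
    sp_center1: float,
    ssmr: Sequence[float],
    *,
    rate_thr1: float = 0.6,     # tonality_decision_thr1, stated range [0.5, 0.7]
    rate_thr2: float = 0.8,     # tonality_decision_thr2, hypothetical value
    stable_thr: float = 0.1,    # lt_stable_decision_thr1, stated range [0.01, 0.25]
    spc_thr: float = 1.4,       # spc_decision_thr1, stated range [1.0, 1.8]
    ssmr_thr: Sequence[float] = (0.8, 0.8, 0.85),  # sSMF_decision_thr1..3
) -> int:
    """Steps A) to C): return tonality_frame (1 = tonal frame, 0 = non-tonal frame)."""
    # Step B: neither tonality measure exceeds its threshold, so the flag keeps
    # its initial value of 0 from step A.
    if not (tonality_rate1 > rate_thr1 or tonality_rate2 > rate_thr2):
        return 0
    # Step C: stable envelope, high spectral centroid, and low flatness in every subband.
    flat_ok = all(s < t for s, t in zip(ssmr, ssmr_thr))
    return int(lt_stable_rate0 < stable_thr and sp_center1 > spc_thr and flat_ok)


def update_tonality_degree(prev_degree: float,
                           td_scale_a: float = 0.9,
                           td_scale_b: float = 0.1) -> float:
    """Step D, tonal-frame case: tonality_degree = previous * td_scale_A + td_scale_B."""
    return prev_degree * td_scale_a + td_scale_b


def tonality_flag(tonality_degree: float, decision_thr: float = 0.5) -> int:
    """Step E: 1 if tonality_degree exceeds the decision threshold, else 0."""
    return int(tonality_degree > decision_thr)
```

With these illustrative coefficients, a tonality_degree of 0.4 needs two consecutive tonal frames to cross the 0.5 decision threshold, which shows how the recursion delays the tonal decision until the evidence persists.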
Corresponding to the foregoing tonal signal detection method, the present invention also provides a tonal signal detection device. As shown in Figure 10, the device includes:
A filter bank, configured to obtain the subband signals of the current frame;
A spectral amplitude calculation unit, configured to obtain the spectral amplitudes of the current frame;
A feature parameter calculation unit, configured to calculate the values of the current spectral centroid feature parameter and time-domain stability feature parameter from the subband signals, and to calculate the values of the spectral flatness feature parameter and the tonality feature parameter from the spectral amplitudes;
As described above, the spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio;
The time-domain stability feature parameter is the ratio of the variance of the amplitude superposition value to the expectation of the squared amplitude superposition value, or that ratio multiplied by a coefficient;
The spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of certain spectral amplitudes, or that ratio multiplied by a coefficient;
The tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two adjacent frames, or by further smoothing that correlation value.
A tonal signal judging unit, configured to determine whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter.
As shown in Figure 11, the tonal signal judging unit includes:
A tonal signal initialization module, configured to set the current frame signal as a non-tonal signal and to use a tonal frame flag tonality_frame to indicate whether the current frame is a tonal frame;
A tonality feature parameter judgment module, configured to determine whether the tonality feature parameter tonality_rate1, or its smoothed value tonality_rate2, is greater than the corresponding set threshold;
A tonal signal judgment module, configured to: when the tonality feature parameter judgment module determines yes, judge the current frame to be a tonal frame if the time-domain stability feature parameter value is less than a set threshold, the spectral centroid feature parameter value is greater than a set threshold, and the spectral flatness feature parameter of each subband is less than its corresponding preset threshold, and then determine whether the current frame is a tonal signal according to the calculated tonality degree feature parameter tonality_degree; and, when the tonality feature parameter judgment module determines no, determine whether the current frame is a tonal signal according to the updated tonality degree feature parameter tonality_degree, and set the value of the tonality flag tonality_flag;
A tonality degree parameter update module, configured to update the tonality degree feature parameter tonality_degree according to the tonal frame flag when the tonality feature parameter tonality_rate1 and its smoothed value tonality_rate2 are both less than the corresponding set thresholds, where the initial value of tonality_degree is set when the VAD device starts working.
Specifically, if the current tonal frame flag indicates that the current frame is a tonal frame, the tonality degree parameter update module updates the tonality degree feature parameter tonality_degree using the following equation:
tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;
where tonality_degree_{-1} is the tonality degree feature parameter of the previous frame, whose initial value lies in [0, 1]; td_scale_A is the attenuation coefficient, with value range [0, 1]; and td_scale_B is the accumulation coefficient, with value range [0, 1].
If the tonality degree feature parameter tonality_degree is greater than a set threshold, the tonal signal judgment module judges the current frame to be a tonal signal; otherwise, it judges the current frame to be a non-tonal signal.
Specifically, if tonality_degree is greater than the threshold 0.5, the current frame is judged to be a tonal signal and the tonality flag tonality_flag is set to 1; otherwise, the current frame is judged to be a non-tonal signal and the flag is set to 0. The value range of the tonal signal decision threshold is [0.3, 0.7].
The present invention also provides a method for correcting the active-sound hangover frame count in the VAD decision. As shown in Figure 12, the method includes:
Step 1201: Calculate the long-term signal-to-noise ratio lt_snr from the subband signals;
Specifically, lt_snr is calculated as the ratio of the average long-term active-sound signal energy to the average long-term background noise energy, both computed at the previous frame; lt_snr may be expressed logarithmically.
Step 1202: Calculate the average full-band signal-to-noise ratio SNR2_lt_ave;
The full-band signal-to-noise ratio SNR2 is averaged over the most recent several frames to obtain the average full-band signal-to-noise ratio SNR2_lt_ave;
Step 1203: Correct the current active-sound hangover frame count according to the decision results of the preceding several frames, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameters of the current frame, and the VAD decision result of the current frame.
It should be understood that the precondition for correcting the current active-sound hangover frame count is that the active-sound flag indicates that the current frame is an active-sound frame.
Preferably, when the current active-sound hangover frame count is corrected: if the number of preceding continuous speech frames is less than a set threshold 1 and the long-term signal-to-noise ratio lt_snr is less than a set threshold 2, the current active-sound hangover frame count is set equal to the minimum number of continuous active-sound frames minus the number of preceding continuous speech frames; otherwise, if the average full-band signal-to-noise ratio SNR2_lt_ave is greater than a set threshold 3 and the number of preceding continuous speech frames is greater than a set threshold 4, the hangover frame count is set according to the magnitude of the long-term signal-to-noise ratio; otherwise, the value of the current active-sound hangover frame count num_speech_hangover is left unchanged.
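The three-way correction rule above can be sketched as follows. Everything numeric here is hypothetical: the text does not give values for thresholds 1 to 4, for the minimum number of continuous active-sound frames, or for how the hangover depends on lt_snr, so `thr1` to `thr4`, `min_continuous_speech`, and the `hangover_from_lt_snr` mapping are placeholders chosen only to make the sketch runnable. Expressing lt_snr as 10·log10 of the energy ratio is likewise an assumption; the text only says a logarithmic representation may be used.

```python
import math
from typing import Callable


def long_term_snr_db(e_fg: float, e_bg: float, eps: float = 1e-12) -> float:
    """Ratio of average long-term active-sound energy to background noise energy, in dB (assumed)."""
    return 10.0 * math.log10((e_fg + eps) / (e_bg + eps))


def hangover_from_lt_snr(lt_snr: float) -> int:
    """Hypothetical mapping: a cleaner signal gets a shorter hangover."""
    return 3 if lt_snr > 6.0 else 5


def correct_hangover(
    num_speech_hangover: int,
    continuous_speech_num: int,
    lt_snr: float,
    snr2_lt_ave: float,
    is_active_frame: bool = True,
    *,
    min_continuous_speech: int = 8,  # hypothetical minimum run of active-sound frames
    thr1: int = 8,                   # hypothetical "threshold 1"
    thr2: float = 4.0,               # hypothetical "threshold 2"
    thr3: float = 1.5,               # hypothetical "threshold 3"
    thr4: int = 10,                  # hypothetical "threshold 4"
    snr_to_hangover: Callable[[float], int] = hangover_from_lt_snr,
) -> int:
    """Return the corrected active-sound hangover frame count."""
    # Precondition from the text: only correct when the frame is flagged active.
    if not is_active_frame:
        return num_speech_hangover
    if continuous_speech_num < thr1 and lt_snr < thr2:
        # Short speech run in poor SNR: extend up to the minimum run length.
        return min_continuous_speech - continuous_speech_num
    if snr2_lt_ave > thr3 and continuous_speech_num > thr4:
        # Long speech run in good conditions: choose the hangover from lt_snr.
        return snr_to_hangover(lt_snr)
    return num_speech_hangover
```

Passing the lt_snr-to-hangover mapping as a callable keeps the unspecified part of the rule in one replaceable place rather than baking guessed numbers into the control flow.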
Corresponding to the foregoing method for correcting the active-sound hangover frame count, the present invention also provides a device for correcting the active-sound hangover frame count in the VAD decision. As shown in Figure 13, the device includes:
A long-term signal-to-noise ratio calculation unit, configured to calculate the long-term signal-to-noise ratio lt_snr;
Specifically, the long-term signal-to-noise ratio calculation unit calculates lt_snr as the ratio of the average long-term active-sound signal energy to the average long-term background noise energy, both computed at the previous frame;
An average full-band signal-to-noise ratio calculation unit, configured to calculate the average full-band signal-to-noise ratio SNR2_lt_ave;
Specifically, the average full-band signal-to-noise ratio calculation unit averages the full-band signal-to-noise ratio SNR2 over the most recent several frames to obtain SNR2_lt_ave.
An active-sound hangover frame count correction unit, configured to correct the current active-sound hangover frame count according to the decision results of the preceding several frames, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameters of the current frame, and the VAD decision result of the current frame.
As described above, the precondition for correcting the current active-sound hangover frame count is that the active-sound flag indicates that the current frame is an active-sound frame.
Preferably, when the active-sound hangover frame count correction unit corrects the current hangover frame count: if the number of preceding continuous speech frames is less than a set threshold 1 and the long-term signal-to-noise ratio lt_snr is less than a set threshold 2, the current active-sound hangover frame count is set equal to the minimum number of continuous active-sound frames minus the number of preceding continuous speech frames; otherwise, if the average full-band signal-to-noise ratio SNR2_lt_ave is greater than a set threshold 3 and the number of preceding continuous speech frames is greater than a set threshold 4, the hangover frame count is set according to the magnitude of the long-term signal-to-noise ratio; otherwise, the value of the current active-sound hangover frame count num_speech_hangover is left unchanged.
The present invention also provides a method for adjusting the signal-to-noise ratio threshold in the VAD decision. As shown in Figure 14, the adjustment method includes:
Step 1401: Calculate the spectral centroid feature parameter of the current frame from the subband signals;
Specifically, the spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio.
Step 1402: Calculate the long-term signal-to-noise ratio lt_snr as the ratio of the average long-term active-sound signal energy to the average long-term background noise energy, both computed at the previous frame;
Step 1403: Adjust the signal-to-noise ratio threshold of the VAD decision according to the spectral centroid feature parameter, the long-term signal-to-noise ratio, the number of preceding continuous active-sound frames, and the number of preceding continuous noise frames continuous_noise_num.
Specifically, as shown in Figure 15, the steps for adjusting the signal-to-noise ratio threshold include:
Step 1501: Set the initial value of the signal-to-noise ratio threshold snr_thr;
Step 1502: Make a first adjustment to the value of snr_thr according to the spectral centroid parameter;
Step 1503: Make a second adjustment to the value of snr_thr according to the number of preceding continuous active-sound frames continuous_speech_num, the number of preceding continuous noise frames continuous_noise_num, the average full-band signal-to-noise ratio SNR2_lt_ave, and the long-term signal-to-noise ratio lt_snr;
Step 1504: Make a final correction to snr_thr according to the value of the long-term signal-to-noise ratio lt_snr, to obtain the signal-to-noise ratio threshold snr_thr of the current frame.
Corresponding to the foregoing method for adjusting the signal-to-noise ratio threshold, the present invention also provides a device for adjusting the signal-to-noise ratio threshold in the VAD decision. As shown in Figure 16, the device includes:
A feature parameter acquisition unit, configured to calculate the spectral centroid feature parameter of the current frame from the subband signals;
Preferably, the spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio.
A long-term signal-to-noise ratio calculation unit, configured to calculate the long-term signal-to-noise ratio lt_snr as the ratio of the average long-term active-sound signal energy to the average long-term background noise energy, both computed at the previous frame;
A signal-to-noise ratio threshold adjustment unit, configured to adjust the signal-to-noise ratio threshold of the VAD decision according to the spectral centroid feature parameter, the long-term signal-to-noise ratio, the number of preceding continuous active-sound frames, and the number of preceding continuous noise frames continuous_noise_num.
Specifically, when adjusting the signal-to-noise ratio threshold, the signal-to-noise ratio threshold adjustment unit sets the initial value of snr_thr; makes a first adjustment to snr_thr according to the spectral centroid parameter; makes a second adjustment to snr_thr according to the number of preceding continuous active-sound frames continuous_speech_num, the number of preceding continuous noise frames continuous_noise_num, the average full-band signal-to-noise ratio SNR2_lt_ave, and the long-term signal-to-noise ratio lt_snr; and finally makes a final adjustment to snr_thr according to the value of lt_snr, to obtain the signal-to-noise ratio threshold snr_thr of the current frame.
Many modern speech coding standards, such as AMR and AMR-WB, support a VAD function. In terms of efficiency, the VADs of these encoders cannot achieve good performance under all typical background noises. In particular, under non-stationary noise such as office noise, the VAD efficiency of these encoders is relatively low. For music signals, these VADs sometimes produce false detections, causing a noticeable quality degradation in the corresponding processing algorithms.
The method of the present invention overcomes these shortcomings of existing VAD algorithms: it improves the VAD's detection efficiency for non-stationary noise while also improving the accuracy of music detection, so that audio signal processing algorithms using this VAD achieve better performance.
The background noise detection method provided by the present invention makes the background noise estimate more accurate and stable, which helps improve the accuracy of VAD detection. The tonal signal detection method provided by the present invention improves the accuracy of tonal music detection. The method for correcting the active-sound hangover frame count provided by the present invention allows the VAD algorithm to strike a better balance between performance and efficiency under different noises and signal-to-noise ratios. The method for adjusting the signal-to-noise ratio threshold in the VAD decision provided by the present invention allows the VAD decision algorithm to reach good accuracy under different signal-to-noise ratios, further improving efficiency while maintaining quality.
A person of ordinary skill in the art will appreciate that all or some of the steps in the above methods may be completed by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular combination of hardware and software.
Claims (20)
1. A voice activity detection (VAD) method, characterized in that the method comprises:
obtaining the subband signals and spectral amplitudes of the current frame;
calculating, from the subband signals, the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter of the current frame; and calculating, from the spectral amplitudes, the values of the spectral flatness feature parameter and the tonality feature parameter;
calculating the signal-to-noise ratio parameters of the current frame from the background noise energy estimated at the previous frame, the frame energy parameter of the current frame, and the signal-to-noise ratio subband energies;
calculating the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;
calculating the VAD decision result from the tonality flag, the signal-to-noise ratio parameters, the spectral centroid feature parameter, and the frame energy parameter;
wherein the frame energy parameter is the weighted sum or the direct sum of the subband signal energies;
the spectral centroid feature parameter is the ratio of the weighted accumulated value to the unweighted accumulated value of all or some of the subband signal energies, or the value obtained by smoothing that ratio;
the time-domain stability feature parameter is the ratio of the variance of the energy amplitude superposition value to the expectation of the squared energy amplitude superposition value, or that ratio multiplied by a coefficient;
the spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of the spectral amplitudes of multiple subbands, or that ratio multiplied by a coefficient;
the tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two adjacent frames, or by further smoothing that correlation value.
2. The method of claim 1, characterized in that, before or after the VAD decision result is obtained, the method further comprises:
calculating an initial background noise flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;
after the VAD decision result is obtained, the method further comprises: correcting the initial background noise flag according to the VAD decision result of the current frame, the tonality feature parameter, the signal-to-noise ratio parameters, the tonality flag, and the time-domain stability feature parameter;
obtaining the subband background noise energy and the full-band background noise energy of the current frame from the corrected background noise flag, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame;
the background noise energy of the current frame being used to calculate the signal-to-noise ratio parameters of the next frame.
3. The method of claim 1, characterized in that
the VAD decision result is calculated from the tonality flag, the signal-to-noise ratio parameter, the spectral centroid feature parameter, and the frame energy parameter by the following steps:
a. calculating the long-term signal-to-noise ratio lt_snr as the ratio of the long-term average active-sound signal energy to the long-term average background noise energy, both calculated at the previous frame;
b. averaging the full-band signal-to-noise ratio SNR2 over the most recent several frames to obtain the average full-band signal-to-noise ratio SNR2_lt_ave;
c. obtaining the signal-to-noise ratio threshold snr_thr for the VAD decision from the spectral centroid feature parameter, the long-term signal-to-noise ratio lt_snr, the number of preceding consecutive active-sound frames continuous_speech_num, and the number of preceding consecutive noise frames continuous_noise_num;
d. calculating an initial VAD decision from the VAD decision threshold snr_thr and the signal-to-noise ratio parameter, wherein the signal-to-noise ratio parameter comprises the subband average signal-to-noise ratio SNR1 and the full-band signal-to-noise ratio SNR2;
e. correcting the VAD decision result according to the tonality flag, the average full-band signal-to-noise ratio SNR2_lt_ave, the spectral centroid feature parameter, and the long-term signal-to-noise ratio lt_snr;
f. correcting the active-sound hangover frame count according to the decision results of the preceding several frames, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameter of the current frame, and the VAD decision result of the current frame;
g. adding active-sound hangover according to the decision result of the current frame and the active-sound hangover frame count num_speech_hangover, to obtain the VAD decision result of the current frame.
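Two of these steps can be illustrated in a few lines: the running average of SNR2 over recent frames, and the hangover that extends an active decision for a number of frames. This is a sketch only. The class and function names are hypothetical, and the hangover rule shown (reload the counter on every active frame, count down on inactive frames) is a common scheme assumed here, since the claim does not give the exact rule.

```python
from collections import deque


class FullBandSnrAverager:
    """Average of SNR2 over the most recent `num_frames` frames (SNR2_lt_ave)."""

    def __init__(self, num_frames):
        self._recent = deque(maxlen=num_frames)

    def update(self, snr2):
        self._recent.append(snr2)
        return sum(self._recent) / len(self._recent)


def apply_hangover(raw_decisions, num_speech_hangover):
    """Extend each run of active frames by up to `num_speech_hangover` frames."""
    final = []
    remaining = 0
    for active in raw_decisions:
        if active:
            remaining = num_speech_hangover  # reload the hold counter
            final.append(True)
        elif remaining > 0:
            remaining -= 1                   # spend one hold frame
            final.append(True)
        else:
            final.append(False)
    return final
```

In the claimed method the hangover frame count is itself corrected every frame, so a fuller implementation would recompute `num_speech_hangover` inside the loop rather than treat it as a constant.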
4. The method of claim 3, characterized in that: after step d, the method further comprises calculating the long-term average active-sound signal energy E_fg according to the initial VAD decision result; and after step g, the method further comprises calculating the long-term average background noise energy E_bg according to the VAD decision result, the calculated values being used for the VAD decision of the next frame.
5. A voice activity detection (VAD) device, characterized in that the device comprises:
a filter bank, for obtaining the subband signals of the current frame;
a spectral amplitude calculation unit, for obtaining the spectral amplitudes of the current frame;
a feature parameter acquisition unit, for calculating, from the subband signals, the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter of the current frame, and for calculating, from the spectral amplitudes, the values of the spectral flatness feature parameter and the tonality feature parameter;
a flag calculation unit, for calculating the tonality flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;
a signal-to-noise ratio calculation unit, for calculating the signal-to-noise ratio parameter of the current frame from the background noise energy estimated at the previous frame, the frame energy parameter of the current frame, and the signal-to-noise ratio subband energies;
a VAD decision unit, for calculating the VAD decision result from the tonality flag, the signal-to-noise ratio parameter, the spectral centroid feature parameter, and the frame energy parameter;
wherein the frame energy parameter is the weighted sum or the direct sum of the subband signal energies;
the spectral centroid feature parameter is the ratio of the weighted sum to the unweighted sum of all or some of the subband signal energies, or the value obtained by smoothing that ratio;
the time-domain stability feature parameter is the ratio of the variance of the energy amplitude sums to the expectation of the squares of the energy amplitude sums, or that ratio multiplied by a coefficient;
the spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of the spectral amplitudes of a plurality of subbands, or that ratio multiplied by a coefficient;
the tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two consecutive frames, or by further smoothing that correlation value.
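The ratio-based feature definitions above translate directly into code. The following is a minimal sketch, not the patent's reference implementation: the function names are illustrative, the centroid weights are assumed to be the subband indices 1..N (the claim only says "weighted"), and `eps` is an added guard against division by zero and log of zero.

```python
import math


def frame_energy(subband_energies, weights=None):
    """Direct sum of subband energies, or weighted sum if weights are given."""
    if weights is None:
        return sum(subband_energies)
    return sum(w * e for w, e in zip(weights, subband_energies))


def spectral_centroid(subband_energies, eps=1e-12):
    """Weighted sum over unweighted sum; weights assumed to be indices 1..N."""
    weighted = sum((k + 1) * e for k, e in enumerate(subband_energies))
    return weighted / (sum(subband_energies) + eps)


def time_domain_stability(amplitude_sums, eps=1e-12):
    """Variance of the energy amplitude sums over the mean of their squares."""
    n = len(amplitude_sums)
    mean = sum(amplitude_sums) / n
    variance = sum((x - mean) ** 2 for x in amplitude_sums) / n
    mean_square = sum(x * x for x in amplitude_sums) / n
    return variance / (mean_square + eps)


def spectral_flatness(amplitudes, eps=1e-12):
    """Geometric mean over arithmetic mean of the spectral amplitudes."""
    vals = [a + eps for a in amplitudes]
    geometric = math.exp(sum(math.log(v) for v in vals) / len(vals))
    arithmetic = sum(vals) / len(vals)
    return geometric / arithmetic
```

A flat spectrum yields a flatness near 1, while a spectrum dominated by a single peak yields a value near 0, which is why low flatness is treated as evidence of tonal or speech content.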
6. The VAD device of claim 5, characterized in that
the device further comprises a background noise energy processing unit, which comprises:
a flag calculation module, for calculating the initial background noise flag of the current frame from the frame energy parameter, the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the tonality feature parameter of the current frame;
a flag correction module, for correcting the initial background noise flag according to the VAD decision result of the current frame, the tonality feature parameter, the signal-to-noise ratio parameter, the tonality flag, and the time-domain stability feature parameter;
a background noise energy acquisition module, for obtaining the background noise energy of the current frame from the corrected background noise flag, the frame energy parameter of the current frame, and the full-band background noise energy of the previous frame, the background noise energy of the current frame being used to calculate the signal-to-noise ratio parameter of the next frame.
7. The VAD device of claim 5, characterized in that the VAD decision unit comprises:
a long-term signal-to-noise ratio calculation module, for calculating the long-term signal-to-noise ratio lt_snr as the ratio of the long-term average active-sound signal energy to the long-term average background noise energy, both calculated at the previous frame;
an average full-band signal-to-noise ratio calculation module, for averaging the full-band signal-to-noise ratio SNR2 over the most recent several frames to obtain the average full-band signal-to-noise ratio SNR2_lt_ave;
a signal-to-noise ratio threshold calculation module, for obtaining the signal-to-noise ratio threshold snr_thr for the VAD decision from the spectral centroid feature parameter, the long-term signal-to-noise ratio lt_snr, the number of preceding consecutive active-sound frames continuous_speech_num, and the number of preceding consecutive noise frames continuous_noise_num;
an initial VAD decision module, for calculating an initial VAD decision from the VAD decision threshold snr_thr and the signal-to-noise ratio parameter of the current frame, wherein the signal-to-noise ratio parameter comprises the subband average signal-to-noise ratio SNR1 and the full-band signal-to-noise ratio SNR2;
a VAD result correction module, for correcting the VAD decision result according to the tonality flag, the average full-band signal-to-noise ratio SNR2_lt_ave, the spectral centroid feature parameter, and the long-term signal-to-noise ratio lt_snr;
an active-sound hangover frame correction module, for correcting the active-sound hangover frame count according to the decision results of the preceding several frames, the long-term signal-to-noise ratio lt_snr, the average full-band signal-to-noise ratio SNR2_lt_ave, the signal-to-noise ratio parameter of the current frame, and the VAD decision result of the current frame;
a VAD decision module, for adding active-sound hangover according to the decision result of the current frame and the active-sound hangover frame count num_speech_hangover, to obtain the VAD decision result of the current frame.
8. The VAD device of claim 7, characterized in that the VAD decision unit further comprises: an energy calculation module, for calculating the long-term average active-sound signal energy E_fg according to the initial VAD decision result, and for calculating the long-term average background noise energy E_bg according to the VAD decision result, the calculated values being used for the VAD decision of the next frame.
9. A background noise detection method, characterized in that the method comprises:
obtaining the subband signals and the spectral amplitudes of the current frame;
calculating, from the subband signals, the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter, and calculating, from the spectral amplitudes, the values of the spectral flatness feature parameter and the tonality feature parameter;
performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the current frame energy parameter, to judge whether the current frame is background noise;
wherein the frame energy parameter is the weighted sum or the direct sum of the subband signal energies;
the spectral centroid feature parameter is the ratio of the weighted sum to the unweighted sum of all or some of the subband signal energies, or the value obtained by smoothing that ratio;
the time-domain stability feature parameter is the ratio of the variance of the energy amplitude sums to the expectation of the squares of the energy amplitude sums, or that ratio multiplied by a coefficient;
the spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of the spectral amplitudes of a plurality of subbands, or that ratio multiplied by a coefficient;
the tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two consecutive frames, or by further smoothing that correlation value.
10. The method of claim 9, characterized in that: if any one of the following conditions holds, the current frame is judged not to be a noise signal:
the time-domain stability feature parameter lt_stable_rate0 is greater than a set threshold;
the smoothed value of the first-interval spectral centroid feature parameter is greater than a set threshold, and the time-domain stability feature parameter is also greater than a set threshold;
the tonality feature parameter, or its smoothed value, is greater than a set threshold, and the time-domain stability feature parameter lt_stable_rate0 is greater than its set threshold;
the spectral flatness feature parameters of the subbands, or their respective smoothed values, are all less than their corresponding set thresholds;
or, the frame energy parameter E_t1 is greater than a set threshold E_thr1.
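The rule above is a logical OR over five tests: if any one fires, the frame is not noise. A minimal sketch follows, in which every threshold value is a hypothetical placeholder (the claim only says "a set threshold") and the names `NoiseThresholds` and `is_background_noise` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class NoiseThresholds:
    # All values are placeholders chosen for illustration only.
    stability_high: float = 0.16
    centroid: float = 1.6
    stability_with_centroid: float = 0.1
    tonality: float = 0.6
    stability_with_tonality: float = 0.1
    flatness: tuple = (0.8, 0.78, 0.8)   # one threshold per subband
    e_thr1: float = 1.0e6


def is_background_noise(lt_stable_rate0, centroid_smoothed, tonality_rate,
                        flatness, frame_energy, thr=NoiseThresholds()):
    """Return True only when none of the 'not noise' conditions holds."""
    not_noise = (
        lt_stable_rate0 > thr.stability_high
        or (centroid_smoothed > thr.centroid
            and lt_stable_rate0 > thr.stability_with_centroid)
        or (tonality_rate > thr.tonality
            and lt_stable_rate0 > thr.stability_with_tonality)
        or all(f < t for f, t in zip(flatness, thr.flatness))
        or frame_energy > thr.e_thr1
    )
    return not not_noise
```

Writing the rule as "not noise if any condition holds" biases the detector toward protecting speech and music: a frame is used for noise estimation only when every test agrees it looks like noise.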
11. A background noise detection device, characterized in that the device comprises:
a filter bank, for obtaining the subband signals of the current frame;
a spectral amplitude calculation unit, for obtaining the spectral amplitudes of the current frame;
a feature parameter calculation unit, for calculating, from the subband signals, the values of the frame energy parameter, the spectral centroid feature parameter, and the time-domain stability feature parameter, and for calculating, from the spectral amplitudes, the values of the spectral flatness feature parameter and the tonality feature parameter;
a background noise judging unit, for performing background noise detection according to the spectral centroid feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, the tonality feature parameter, and the current frame energy parameter, to judge whether the current frame is background noise;
wherein the frame energy parameter is the weighted sum or the direct sum of the subband signal energies;
the spectral centroid feature parameter is the ratio of the weighted sum to the unweighted sum of all or some of the subband signal energies, or the value obtained by smoothing that ratio;
the time-domain stability feature parameter is the ratio of the variance of the energy amplitude sums to the expectation of the squares of the energy amplitude sums, or that ratio multiplied by a coefficient;
the spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of the spectral amplitudes of a plurality of subbands, or that ratio multiplied by a coefficient;
the tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two consecutive frames, or by further smoothing that correlation value.
12. The detection device of claim 11, characterized in that: the background noise judging unit judges that the current frame is not a noise signal if any one of the following conditions holds:
the time-domain stability feature parameter lt_stable_rate0 is greater than a set threshold;
the smoothed value of the first-interval spectral centroid feature parameter is greater than a set threshold, and the time-domain stability feature parameter is also greater than a set threshold;
the tonality feature parameter, or its smoothed value, is greater than a set threshold, and the time-domain stability feature parameter lt_stable_rate0 is greater than its set threshold;
the spectral flatness feature parameters of the subbands, or their respective smoothed values, are all less than their corresponding set thresholds;
or, the frame energy parameter E_t1 is greater than a set threshold E_thr1.
13. A tonal signal detection method, characterized in that the method comprises:
obtaining the subband signals and the spectral amplitudes of the current frame;
calculating, from the subband signals, the values of the spectral centroid feature parameter and the time-domain stability feature parameter, and calculating, from the spectral amplitudes, the values of the spectral flatness feature parameter and the tonality feature parameter;
judging whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter;
wherein the spectral centroid feature parameter is the ratio of the weighted sum to the unweighted sum of all or some of the subband signal energies, or the value obtained by smoothing that ratio;
the time-domain stability feature parameter is the ratio of the variance of the energy amplitude sums to the expectation of the squares of the energy amplitude sums, or that ratio multiplied by a coefficient;
the spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of the spectral amplitudes of a plurality of subbands, or that ratio multiplied by a coefficient;
the tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two consecutive frames, or by further smoothing that correlation value.
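The tonality feature can be pictured as asking whether the spectral peaks of the current frame sit where those of the previous frame did. A rough sketch, under assumptions the claim does not confirm: the difference coefficients are taken between adjacent spectral bins with negative values clamped to zero, the correlation is a normalized inner product, and smoothing is first-order recursive. All names and the factor `alpha` are hypothetical.

```python
import math


def spectral_difference(amplitudes):
    """Intra-frame differences between adjacent bins (negatives clamped, assumed)."""
    return [max(amplitudes[i + 1] - amplitudes[i], 0.0)
            for i in range(len(amplitudes) - 1)]


def tonality_rate(prev_amplitudes, cur_amplitudes, eps=1e-12):
    """Normalized correlation of the difference coefficients of two frames."""
    d_prev = spectral_difference(prev_amplitudes)
    d_cur = spectral_difference(cur_amplitudes)
    numerator = sum(p * c for p, c in zip(d_prev, d_cur))
    denominator = math.sqrt(sum(p * p for p in d_prev) *
                            sum(c * c for c in d_cur)) + eps
    return numerator / denominator


def smooth(prev_smoothed, value, alpha=0.9):
    """First-order recursive smoothing of the correlation value."""
    return alpha * prev_smoothed + (1.0 - alpha) * value
```

Stationary tones and sustained musical notes keep their peaks in the same bins from frame to frame, so the correlation stays near 1, whereas noise produces peaks at unrelated positions and the correlation falls toward 0.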
14. The method of claim 13, characterized in that: when judging whether the current frame is a tonal signal, the following operations are performed:
A) assuming that the current frame signal is a non-tonal signal, and using a tonality frame flag tonality_frame to indicate whether the current frame is a tonal frame;
B) judging whether the tonality feature parameter tonality_rate1, or its smoothed value tonality_rate2, is greater than the corresponding set threshold; if either condition holds, performing step C), otherwise performing step D);
C) if the time-domain stability feature parameter is less than a set threshold, the spectral centroid feature parameter is greater than a set threshold, and the spectral flatness feature parameters of the subbands are all less than their corresponding preset thresholds, judging the current frame to be a tonal frame and setting the value of the tonality frame flag; otherwise judging the current frame to be a non-tonal frame and setting the value of the tonality frame flag; then continuing with step D);
D) updating the tonality degree feature parameter tonality_degree according to the tonality frame flag, wherein the initial value of tonality_degree is set when voice activity detection starts working;
E) judging whether the current frame is a tonal signal according to the updated tonality degree feature parameter tonality_degree, and setting the value of the tonality flag tonality_flag.
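Steps A) to E) can be sketched as a single per-frame function. This is illustrative only. All threshold values and coefficients are hypothetical placeholders, and the decay applied on non-tonal frames is an assumption, since the claims give the update formula only for tonal frames.

```python
from dataclasses import dataclass


@dataclass
class TonalityThresholds:
    # Placeholder values; the claims only refer to "set thresholds".
    rate1: float = 0.60
    rate2: float = 0.56
    stability: float = 0.07
    centroid: float = 1.2
    flatness: tuple = (0.76, 0.88, 0.96)   # one threshold per subband
    degree: float = 0.5


def update_tonality(prev_degree, rate1, rate2, stability, centroid, flatness,
                    thr=TonalityThresholds(),
                    td_scale_a=0.97, td_scale_b=0.03, decay=0.98):
    """Return (tonality_degree, tonality_frame, tonality_flag) for one frame."""
    tonality_frame = False                               # step A
    if rate1 > thr.rate1 or rate2 > thr.rate2:           # step B
        tonality_frame = (                               # step C
            stability < thr.stability
            and centroid > thr.centroid
            and all(f < t for f, t in zip(flatness, thr.flatness))
        )
    if tonality_frame:                                   # step D, tonal frame
        degree = prev_degree * td_scale_a + td_scale_b
    else:                                                # step D, assumed decay
        degree = prev_degree * decay
    tonality_flag = degree > thr.degree                  # step E
    return degree, tonality_frame, tonality_flag
```

Because the final flag depends on the accumulated `tonality_degree` rather than on the single-frame test, an isolated tonal-looking frame does not flip the decision, and a brief dropout inside a musical passage does not clear it.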
15. The method of claim 14, characterized in that: if the current tonality frame flag indicates that the current frame is a tonal frame, the tonality degree feature parameter tonality_degree is updated using the following equation:
tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;
wherein tonality_degree_{-1} is the tonality degree feature parameter of the previous frame, whose initial value lies in the range [0, 1], td_scale_A is the attenuation coefficient, and td_scale_B is the accumulation coefficient.
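For intuition, this update is a first-order linear recursion. Assuming the attenuation coefficient satisfies 0 ≤ td_scale_A < 1 (the claim does not state a range), and writing d_n for tonality_degree after n consecutive tonal frames, a for td_scale_A, and b for td_scale_B, unrolling the recursion gives:

```latex
d_n = a\,d_{n-1} + b
    = a^{n} d_0 + b \sum_{k=0}^{n-1} a^{k}
    = a^{n} d_0 + b\,\frac{1 - a^{n}}{1 - a},
\qquad
\lim_{n \to \infty} d_n = \frac{b}{1 - a}.
```

Under that assumption, a sustained tonal input drives tonality_degree toward b / (1 - a) regardless of the initial value, so a decision threshold on tonality_degree would need to lie below that limit for accumulation alone to set the tonality flag.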
16. The method of claim 14, characterized in that:
if the tonality degree feature parameter tonality_degree is greater than a set threshold, the current frame is judged to be a tonal signal; otherwise the current frame is judged to be a non-tonal signal.
17. A tonal signal detection device, characterized in that the detection device comprises:
a filter bank, for obtaining the subband signals of the current frame;
a spectral amplitude calculation unit, for obtaining the spectral amplitudes of the current frame;
a feature parameter calculation unit, for calculating, from the subband signals, the values of the spectral centroid feature parameter and the time-domain stability feature parameter, and for calculating, from the spectral amplitudes, the values of the spectral flatness feature parameter and the tonality feature parameter;
a tonal signal judging unit, for judging whether the current frame is a tonal signal according to the tonality feature parameter, the time-domain stability feature parameter, the spectral flatness feature parameter, and the spectral centroid feature parameter;
wherein the spectral centroid feature parameter is the ratio of the weighted sum to the unweighted sum of all or some of the subband signal energies, or the value obtained by smoothing that ratio;
the time-domain stability feature parameter is the ratio of the variance of the energy amplitude sums to the expectation of the squares of the energy amplitude sums, or that ratio multiplied by a coefficient;
the spectral flatness feature parameter is the ratio of the geometric mean to the arithmetic mean of the spectral amplitudes of a plurality of subbands, or that ratio multiplied by a coefficient;
the tonality feature parameter is obtained by calculating the correlation value of the intra-frame spectral difference coefficients of two consecutive frames, or by further smoothing that correlation value.
18. The detection device of claim 17, characterized in that the tonal signal judging unit comprises:
a tonal signal initialization module, for setting the current frame signal as a non-tonal signal and using a tonality frame flag tonality_frame to indicate whether the current frame is a tonal frame;
a tonality feature parameter judging module, for judging whether the tonality feature parameter tonality_rate1, or its smoothed value tonality_rate2, is greater than the corresponding set threshold;
a tonal signal judging module, for judging the current frame to be a tonal frame when the tonality feature parameter judging module judges yes and the time-domain stability feature parameter is less than a set threshold, the spectral centroid feature parameter is greater than a set threshold, and the spectral flatness feature parameters of the subbands are all less than their corresponding preset thresholds; for judging whether the current frame is a tonal signal according to the calculated tonality degree feature parameter tonality_degree; and, when the tonality feature parameter judging module judges no, for judging whether the current frame is a tonal signal according to the updated tonality degree feature parameter tonality_degree and setting the value of the tonality flag tonality_flag;
a tonality degree parameter update module, for updating the tonality degree feature parameter tonality_degree according to the tonality frame flag when the tonality feature parameter tonality_rate1 and its smoothed value tonality_rate2 are both less than their corresponding set thresholds, wherein the initial value of tonality_degree is set when the voice activity detection device starts working.
19. The detection device of claim 18, characterized in that: if the current tonality frame flag indicates that the current frame is a tonal frame, the tonality degree parameter update module updates the tonality degree feature parameter tonality_degree using the following equation:
tonality_degree = tonality_degree_{-1} · td_scale_A + td_scale_B;
wherein tonality_degree_{-1} is the tonality degree feature parameter of the previous frame, whose initial value lies in the range [0, 1], td_scale_A is the attenuation coefficient, and td_scale_B is the accumulation coefficient.
20. The detection device of claim 18, characterized in that: if the tonality degree feature parameter tonality_degree is greater than a set threshold, the tonal signal judging module judges the current frame to be a tonal signal; otherwise it judges the current frame to be a non-tonal signal.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210570563.5A CN103903634B (en) | 2012-12-25 | 2012-12-25 | The detection of activation sound and the method and apparatus for activating sound detection |
CN202110060370.4A CN112992188A (en) | 2012-12-25 | 2012-12-25 | Method and device for adjusting signal-to-noise ratio threshold in VAD (voice over active) judgment |
CN201810622976.0A CN109119096B (en) | 2012-12-25 | 2012-12-25 | Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210570563.5A CN103903634B (en) | 2012-12-25 | 2012-12-25 | The detection of activation sound and the method and apparatus for activating sound detection |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810622976.0A Division CN109119096B (en) | 2012-12-25 | 2012-12-25 | Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment |
CN202110060370.4A Division CN112992188A (en) | 2012-12-25 | 2012-12-25 | Method and device for adjusting signal-to-noise ratio threshold in VAD (voice over active) judgment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103903634A CN103903634A (en) | 2014-07-02 |
CN103903634B true CN103903634B (en) | 2018-09-04 |
Family
ID=50994913
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110060370.4A Pending CN112992188A (en) | 2012-12-25 | 2012-12-25 | Method and device for adjusting signal-to-noise ratio threshold in VAD (voice over active) judgment |
CN201810622976.0A Active CN109119096B (en) | 2012-12-25 | 2012-12-25 | Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment |
CN201210570563.5A Active CN103903634B (en) | 2012-12-25 | 2012-12-25 | The detection of activation sound and the method and apparatus for activating sound detection |
Country Status (1)
Country | Link |
---|---|
CN (3) | CN112992188A (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105810201B (en) * | 2014-12-31 | 2019-07-02 | 展讯通信(上海)有限公司 | Voice activity detection method and its system |
CN106328169B (en) | 2015-06-26 | 2018-12-11 | 中兴通讯股份有限公司 | A kind of acquisition methods, activation sound detection method and the device of activation sound amendment frame number |
CN107393558B (en) * | 2017-07-14 | 2020-09-11 | 深圳永顺智信息科技有限公司 | Voice activity detection method and device |
CN111724808A (en) * | 2019-03-18 | 2020-09-29 | Oppo广东移动通信有限公司 | Audio signal processing method, device, terminal and storage medium |
EP3800640A4 (en) | 2019-06-21 | 2021-09-29 | Shenzhen Goodix Technology Co., Ltd. | Voice detection method, voice detection device, voice processing chip and electronic apparatus |
CN112634921B (en) * | 2019-10-09 | 2024-02-13 | 北京中关村科金技术有限公司 | Voice processing method, device and storage medium |
CN112669877B (en) * | 2020-09-09 | 2023-09-29 | 珠海市杰理科技股份有限公司 | Noise detection and suppression method and device, terminal equipment, system and chip |
CN113192531B (en) * | 2021-05-28 | 2024-04-16 | 腾讯音乐娱乐科技(深圳)有限公司 | Method, terminal and storage medium for detecting whether audio is pure audio |
CN115862685B (en) * | 2023-02-27 | 2023-09-15 | 全时云商务服务股份有限公司 | Real-time voice activity detection method and device and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6317711B1 (en) * | 1999-02-25 | 2001-11-13 | Ricoh Company, Ltd. | Speech segment detection and word recognition |
CN101379548A (en) * | 2006-02-10 | 2009-03-04 | 艾利森电话股份有限公司 | A voice detector and a method for suppressing sub-bands in a voice detector |
CN102044242A (en) * | 2009-10-15 | 2011-05-04 | 华为技术有限公司 | Method, device and electronic equipment for voice activity detection |
CN102687196A (en) * | 2009-10-08 | 2012-09-19 | 西班牙电信公司 | Method for the detection of speech segments |
CN102741918A (en) * | 2010-12-24 | 2012-10-17 | 华为技术有限公司 | Method and apparatus for voice activity detection |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2153170C (en) * | 1993-11-30 | 2000-12-19 | At&T Corp. | Transmitted noise reduction in communications systems |
FI100840B (en) * | 1995-12-12 | 1998-02-27 | Nokia Mobile Phones Ltd | Noise attenuator and method for attenuating background noise from noisy speech and a mobile station |
US6453291B1 (en) * | 1999-02-04 | 2002-09-17 | Motorola, Inc. | Apparatus and method for voice activity detection in a communication system |
CA2420129A1 (en) * | 2003-02-17 | 2004-08-17 | Catena Networks, Canada, Inc. | A method for robustly detecting voice activity |
US7366658B2 (en) * | 2005-12-09 | 2008-04-29 | Texas Instruments Incorporated | Noise pre-processor for enhanced variable rate speech codec |
KR101151746B1 (en) * | 2006-01-02 | 2012-06-15 | 삼성전자주식회사 | Noise suppressor for audio signal recording and method apparatus |
CN101197130B (en) * | 2006-12-07 | 2011-05-18 | 华为技术有限公司 | Sound activity detecting method and detector thereof |
CN101320559B (en) * | 2007-06-07 | 2011-05-18 | 华为技术有限公司 | Sound activation detection apparatus and method |
CN101236742B (en) * | 2008-03-03 | 2011-08-10 | 中兴通讯股份有限公司 | Music/ non-music real-time detection method and device |
CN102044243B (en) * | 2009-10-15 | 2012-08-29 | 华为技术有限公司 | Method and device for voice activity detection (VAD) and encoder |
WO2011049515A1 (en) * | 2009-10-19 | 2011-04-28 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and voice activity detector for a speech encoder |
CN102194457B (en) * | 2010-03-02 | 2013-02-27 | 中兴通讯股份有限公司 | Audio encoding and decoding method, system and noise level estimation method |
ES2740173T3 (en) * | 2010-12-24 | 2020-02-05 | Huawei Tech Co Ltd | A method and apparatus for performing a voice activity detection |
CN102097095A (en) * | 2010-12-28 | 2011-06-15 | 天津市亚安科技电子有限公司 | Speech endpoint detecting method and device |
CN102074246B (en) * | 2011-01-05 | 2012-12-19 | 瑞声声学科技(深圳)有限公司 | Dual-microphone based speech enhancement device and method |
Non-Patent Citations (1)
Title |
---|
LIS, SOPHIA ANTIPOLIS.Digital cellular telecommunications system (Phase 2+);Voice Activity Detector (VAD) for Adaptive Multi-Rate(AMR) speech traffic channels;General description (GSM 06.94 version 7.1.0 Release 1998);Draft ETSI EN 301 708.《IEEE, LIS, SOPHIA ANTIPOLIS CEDEX, FRANCE, vol. SMG11, no. V7.1.0, July 1999 (1999-07-01)》.1999, * |
Also Published As
Publication number | Publication date |
---|---|
CN109119096B (en) | 2021-01-22 |
CN112992188A (en) | 2021-06-18 |
CN103903634A (en) | 2014-07-02 |
CN109119096A (en) | 2019-01-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103903634B (en) | The detection of activation sound and the method and apparatus for activating sound detection | |
CN104424956B (en) | Activate sound detection method and device | |
CN106328169B (en) | A kind of acquisition methods, activation sound detection method and the device of activation sound amendment frame number | |
US9672841B2 (en) | Voice activity detection method and method used for voice activity detection and apparatus thereof | |
EP1450353B1 (en) | System for suppressing wind noise | |
ES2678415T3 (en) | Apparatus and procedure for processing and audio signal for speech improvement by using a feature extraction | |
CN108766454A (en) | A kind of voice noise suppressing method and device | |
CN105261359B (en) | The noise-canceling system and noise-eliminating method of mobile microphone | |
CN103413547B (en) | A kind of method that room reverberation is eliminated | |
CN103440869A (en) | Audio-reverberation inhibiting device and inhibiting method thereof | |
EP2710590B1 (en) | Super-wideband noise supression | |
CN102347028A (en) | Double-microphone speech enhancer and speech enhancement method thereof | |
JP5157852B2 (en) | Audio signal processing evaluation program and audio signal processing evaluation apparatus | |
CN103578477B (en) | Denoising method and device based on noise estimation | |
CN103544961B (en) | Audio signal processing method and device | |
CN105261375A (en) | Voice activity detection method and apparatus | |
CN104658543A (en) | Method for eliminating indoor reverberation | |
JP2011033717A (en) | Noise suppression device | |
KR20150078510A (en) | Method and system for noise reduction based on spectral and temporal correlations | |
Waqas et al. | Simulation of modified hybrid noise reduction algorithm to enhance the speech quality |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||