
CN1322488C - Method for strengthening sound

Info

Publication number: CN1322488C
Application number: CNB2004100345056A
Authority: CN
Inventors: 余水安
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2004-04-14
Filing date: 2004-04-14
Publication date: 2007-06-20
Anticipated expiration: 2024-04-14
Also published as: CN1684143A

Abstract

The present invention provides a method for speech enhancement. The method comprises the following steps: 1. the signal is divided into frames and given weighted pre-processing, then windowed and transformed to the frequency domain; 2. the frequency-domain signal is divided into subbands and the energy of each subband is calculated; 3. the signal-to-noise ratio (SNR) of each subband is calculated; 4. a noise decision is made on the current frame to judge whether it is noise; 5. the automatic gain control (AGC) gain is calculated from the calculated subband energies; 6. the full-band SNR, or a weighted SNR over part of the subbands, is calculated; 7. the attenuation gain of each subband is estimated from the full-band SNR and the SNR of each subband, and is smoothed using the subband attenuation gains calculated for previous frames; 8. the spectrum is processed with the calculated AGC gain; 9. the noise is updated according to the noise decision; 10. the processed spectrum is transformed to the time domain and weighted to produce the output signal.

Description

A method for speech enhancement

Technical field

The present invention relates to the field of multimedia technology, and specifically to a method for speech enhancement.

Background art

A video conference terminal is an indispensable part of a video conferencing system. In use, however, the terminal is often exposed to background noise, such as air-conditioning noise and fan noise, which noticeably degrades speech quality. In addition, some current encoders use model-based coding; when the background noise is strong and the signal-to-noise ratio (SNR) is low, the quality and performance of model-based coding inevitably degrade.

To improve speech quality, the prior art typically uses one of the following speech enhancement methods:

The first method enhances speech through silence detection. The signal is divided into frames, and silence detection classifies each frame as a noise frame or a speech frame. Noise frames are attenuated, while speech frames are left unattenuated or only partly attenuated, which achieves the enhancement.

The second method is noise cancellation. Its basic principle is to subtract the noise from the noisy speech: the noise is estimated with a microphone array or a noise detection method, and the estimate is subtracted from the speech in the time domain or the frequency domain.

The third method is based on a speech production model. Speech production can be modeled as an excitation source acting on a linear time-varying filter. The excitation is either voiced or unvoiced. Voiced sound is clearly periodic and is modeled by a pulse signal with a given period; unvoiced sound has no clear periodicity and is generally modeled by white noise. The linear time-varying filter usually takes a pole-zero model. Speech is enhanced by estimating the model parameters and applying an analysis-synthesis method.

The fourth method is based on short-time spectrum estimation. Speech is generally a non-stationary process, but it can be treated as approximately stationary over 10-30 ms, so estimating the short-time spectrum of the "clean" speech from the short-time spectrum of the noisy speech achieves enhancement. The noise is also a random process, which can be approximated as white Gaussian noise. Because the human ear is insensitive to the phase of the spectral components of speech, the processing can be confined to the short-time spectral amplitude of the noisy speech.

The fifth method enhances speech on the basis of the masking properties of human hearing, which are widely used in audio coding. To compute the masking threshold of the speech signal, conventional spectral subtraction is first used to approximate the speech signal, and the approximated signal is then used to compute the threshold of a masking model. The mature masking model 1 or masking model 2 of MPEG audio is generally chosen.

Although the above methods enhance speech to some extent and improve its quality, each has the following shortcomings:

Shortcoming of the first method: silence detection leaves little noise in noise frames but considerably more in speech frames. Subjectively, the noise after enhancement is not stationary but fluctuates, and listeners perceive such abrupt changes as even worse.

Shortcoming of the second method: noise cancellation may introduce "musical" noise. After enhancement, some frequency components of the noise have large amplitudes while others have small ones, which is uncomfortable to listen to.

Shortcoming of the third method: an enhancement algorithm based on a speech production model cannot estimate the parameters of the "clean" speech accurately, so it has to resort to compromises, and these reduce the intelligibility of the synthesized speech.

Shortcoming of the fourth method: short-time spectrum estimation does little damage to speech when the local SNR is high. When the SNR is low, however, a large amount of noise is attenuated and the speech is inevitably damaged, so the enhanced speech is heavily distorted and subjectively poor.

Shortcoming of the fifth method: because conventional spectral subtraction is used to approximate the "clean" speech, the computed masking threshold is biased. The masking model also adds considerable computation, which makes the method unsuitable for many applications.

Summary of the invention

The object of the present invention is to provide a system and method for speech enhancement that solve the problem that the prior-art schemes cannot improve speech quality fully and effectively.

To solve this problem, the invention provides the following technical schemes:

A method for speech enhancement, comprising the following steps:

(1) dividing the signal into frames, applying weighted pre-processing, then windowing and transforming to the frequency domain;

(2) dividing the frequency-domain signal into subbands and calculating the energy of each subband;

(3) calculating the signal-to-noise ratio (SNR) of each subband;

(4) making a noise decision on the current frame to judge whether it is noise;

(5) calculating the automatic gain control (AGC) gain from the calculated subband energies;

(6) calculating the full-band SNR from the SNR of each subband, or from the background noise energy and the current frame energy;

(7) estimating the attenuation gain of each subband from the calculated full-band SNR and the SNR of each subband, and smoothing each calculated subband attenuation gain using the subband attenuation gains calculated for previous frames;

(8) processing the spectrum with the calculated AGC gain to achieve automatic gain control, and processing the spectrum of each subband with its calculated attenuation gain to remove noise;

(9) updating the noise according to the noise decision;

(10) transforming the processed spectrum to the time domain and weighting it to produce the output signal.

A method for speech enhancement, comprising the following steps:

(3) calculating the signal-to-noise ratio (SNR) of each subband;

(4) calculating a voice metric and a spectral deviation from the calculated SNR and subband energies, making the noise decision from them, and making a forced-update decision according to the result of the noise decision;

(6) adjusting the SNR of each subband;

(7) calculating the full-band SNR from the SNR of each subband, or from the background noise energy and the current frame energy;

(8) estimating the attenuation gain of each subband from the calculated full-band SNR and the SNR of each subband, and smoothing each calculated subband attenuation gain using the subband attenuation gains calculated for previous frames;

(9) processing the spectrum with the calculated AGC gain to achieve automatic gain control, and processing the spectrum of each subband with its calculated attenuation gain to remove noise;

(10) updating the noise according to the noise decision and the forced-update decision;

(11) transforming the processed spectrum to the time domain and weighting it to produce the output signal.

A method for speech enhancement, comprising the following steps:

(3) calculating the signal-to-noise ratio (SNR) of each subband;

(5) adjusting the estimated SNR of each subband;

(7) estimating the attenuation gain of each subband from the calculated full-band SNR and the SNR of each subband, and smoothing each calculated subband attenuation gain using the subband attenuation gains calculated for previous frames;

(8) processing the spectrum of each subband with its calculated attenuation gain to remove noise;

(9) updating the noise according to the noise decision and the forced-update decision;

A method for speech enhancement, comprising the following steps:

(3) calculating the signal-to-noise ratio (SNR) of each subband;

(4) making a noise decision on the current frame to determine whether it is a noise frame;

(6) processing the spectrum with the calculated AGC gain to achieve automatic gain control;

(7) updating the noise according to the noise decision;

(8) transforming the processed spectrum to the time domain and weighting it to produce the output signal.

A method for speech enhancement, comprising the following steps:

(3) calculating the signal-to-noise ratio (SNR) of each subband;

(5) adjusting the SNR of each subband;

(6) estimating the attenuation gain of each subband from the SNR of each subband, and smoothing each calculated subband attenuation gain using the subband attenuation gains calculated for previous frames;

(7) processing the spectrum of each subband with its calculated attenuation gain to remove noise;

(8) updating the noise according to the noise decision;

(9) transforming the processed spectrum to the time domain and weighting it to produce the output signal.

With the above technical schemes, the system and method for speech enhancement of the present invention have the following advantages:

When the input SNR is high, the denoising attenuation is large; when the input SNR is low, the denoising attenuation is small. A certain amount of attenuation is thus achieved while the damage and distortion caused to the speech are kept as small as possible.

Large variations of the subband attenuation gain are prevented, so the denoising effect is strengthened while the distortion stays very small or imperceptible.

The noise remaining after speech enhancement is stationary, with almost no musical noise.

Combined with automatic gain control, the noise is attenuated further. When speech is present, its intelligibility is improved as far as possible.

Description of drawings

Fig. 1 is a structural diagram of the system used by a specific embodiment of the method of the invention.

Embodiment

Before the speech enhancement system and method of the invention are described in detail, the overall idea is outlined. The invention adaptively reduces the local background noise according to the local signal-to-noise ratio (SNR), improving speech intelligibility without damaging speech quality. By continuously estimating the background noise spectrum of each subband, the subband SNR, and the full-band SNR, it adaptively adjusts the attenuation factor of each subband, greatly reducing noise while preserving speech quality. At the same time, it estimates the speech energy in the frequency domain and applies adaptive gain control to the speech signal, which stabilizes the level of the output speech, further reduces noise, and improves speech quality.

The speech enhancement system first divides the signal into frames. Because most of the background noise energy is concentrated in the low band, the signal is first filtered by a high-pass filter. The filtered signal is transformed to the frequency domain, where each frame is processed by subband to provide two functions, automatic gain control and noise reduction, described as follows:

From the calculated subband energies and the noise update decision flag, the energy of the full band (or of part of the subbands) is calculated, and from it the gain of the automatic gain control, which stabilizes the level of the output speech signal.

From the calculated subband signal energies and the subband noise energies estimated by the background noise estimator, the SNR is calculated and then corrected. The gain of each subband is then calculated and applied to the frequency-domain amplitudes to reduce noise.

In the specific embodiment described below, the input of the speech enhancement system is speech data sampled at 16 kHz with 16-bit precision, and the frame length is 10 ms.

Referring to Fig. 1, a structural diagram of the system used by a specific embodiment of the method of the invention, solid lines denote the data flow and control flow inside the speech enhancement system, and dashed lines denote control flow supplied from outside. The system for improving speech quality comprises:

The high-pass filter receives speech sampled at 16 kHz. Because the background noise in speech generally has most of its energy at low frequencies, the high-pass filter is used to attenuate the low-frequency components. Either an FIR (finite impulse response) filter or an IIR (infinite impulse response) filter may be chosen, and the cutoff frequency may be 100 Hz.

The frequency-domain transform module is connected to the high-pass filter. It receives the speech signal from the high-pass filter and windows it before transforming it to the frequency domain.

Before windowing, the frame buffer is assembled. Its first D samples are data carried over from the previous frame; this overlap is described by:

d(m，n)＝d(m-1，L+n)；0≤n＜D

where L = 160 is the frame length, D = 48 is the overlap length, m is the frame index, and n is the sample index.

The remaining samples of the frame are obtained by weighting the high-pass filtered data S(n), as follows:

d(m，D+n)＝S(n)+ζ×S(n-1)；

where 0 ≤ n < L and ζ = -0.8.

The data d(m, n) are then windowed to give the output g(n):

g(n) = \begin{cases} d(m, n) \sin^{2}(\pi (n + 0.5) / 2D), & 0 \leq n < D \\ d(m, n), & D \leq n < L \\ d(m, n) \sin^{2}(\pi (n - L + D + 0.5) / 2D), & L \leq n < D + L \\ 0, & D + L \leq n < M \end{cases}

where D = 48, L = 160, D + L = 208, and M = 256.
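
The frame assembly, weighting, and windowing described above can be sketched in Python as follows. This is a minimal sketch rather than the patented implementation: the function names `build_frame` and `apply_window` are hypothetical, S(-1) is assumed to be the last filtered sample of the previous frame, and the samples from D + L up to M are assumed to be zero.

```python
import math

L = 160      # frame length (10 ms at 16 kHz)
D = 48       # overlap length
M = 256      # DFT length
ZETA = -0.8  # weighting coefficient

def build_frame(prev_d, s, s_last=0.0):
    """Assemble d(m, .) from the previous buffer d(m-1, .) and L filtered samples s.

    s_last stands for S(-1); taking it from the previous frame is an assumption.
    """
    d = [0.0] * (D + L)
    d[:D] = prev_d[L:L + D]            # d(m, n) = d(m-1, L+n), 0 <= n < D
    prev = s_last
    for n in range(L):
        d[D + n] = s[n] + ZETA * prev  # d(m, D+n) = S(n) + zeta * S(n-1)
        prev = s[n]
    return d

def apply_window(d):
    """Window d(m, .) into g(.), padded with zeros up to M (padding is assumed)."""
    g = [0.0] * M
    for n in range(D + L):
        if n < D:
            w = math.sin(math.pi * (n + 0.5) / (2 * D)) ** 2          # rising slope
        elif n < L:
            w = 1.0                                                   # flat part
        else:
            w = math.sin(math.pi * (n - L + D + 0.5) / (2 * D)) ** 2  # falling slope
        g[n] = d[n] * w
    return g
```

The rising and falling slopes are sin² and cos² of the same argument, so the weights that two consecutive frames apply to a shared overlap sample sum to one.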

The subband energy estimation module is connected to the frequency-domain transform module. It receives the frequency-transformed speech signal from that module and estimates its energy as follows:

G(k) = \frac{2}{M} \sum_{n=0}^{M-1} g(n) e^{-j 2 \pi n k / M}, \quad 0 \leq k < M

where M = 256 is the length of the DFT (discrete Fourier transform).

E_{ch}(m, i) = \max\left\{ E_{\min}, \; \alpha_{ch}(m) E_{ch}(m-1, i) + (1 - \alpha_{ch}(m)) \frac{1}{f_{h}(i) - f_{l}(i) + 1} \sum_{k=f_{l}(i)}^{f_{h}(i)} |G(k)|^{2} \right\}

where 0 ≤ i < N_c; E_min = 0.0625 is the minimum subband energy; α_ch(m) is the subband energy smoothing factor; N_c = 26 is the number of subbands; f_h(i) is the highest spectral bin in subband i; and f_l(i) is the lowest spectral bin in subband i, given by:

f_l = {2, 4, 6, 8, 10, 12, 14, 17, 20, 23, 27, 31, 36, 42, 49, 56, 64, 72, 79, 86, 93, 100, 107, 114, 121}

f_h = {3, 5, 7, 9, 11, 13, 16, 19, 22, 26, 30, 35, 41, 48, 55, 63, 71, 78, 85, 92, 99, 106, 113, 120, 127}

The subband energy smoothing factor α_ch(m) is defined as:

\alpha_{ch}(m) = \begin{cases} 0, & m \leq 1 \\ 0.45, & m > 1 \end{cases}

That is, α_ch(m) is 0 for the first frame (m = 1) and 0.45 for all later frames.
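
The transform and the smoothed subband energy can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names `spectrum` and `subband_energy` are hypothetical, only bins 0 to M/2 - 1 are computed because the band edges never exceed 127, and the number of subbands is taken from the band-edge lists as printed (25 entries each).

```python
import cmath

M = 256          # DFT length
E_MIN = 0.0625   # minimum subband energy
ALPHA_CH = 0.45  # smoothing factor for frames after the first
F_L = [2, 4, 6, 8, 10, 12, 14, 17, 20, 23, 27, 31, 36, 42, 49, 56, 64,
       72, 79, 86, 93, 100, 107, 114, 121]
F_H = [3, 5, 7, 9, 11, 13, 16, 19, 22, 26, 30, 35, 41, 48, 55, 63, 71,
       78, 85, 92, 99, 106, 113, 120, 127]

def spectrum(g):
    """G(k) = (2/M) * sum_n g(n) exp(-j 2 pi n k / M), for bins 0 .. M/2 - 1."""
    return [(2.0 / M) * sum(g[n] * cmath.exp(-2j * cmath.pi * n * k / M)
                            for n in range(M))
            for k in range(M // 2)]

def subband_energy(G, e_prev, m):
    """Smoothed energy E_ch(m, i) of every subband for frame index m (m = 1 is the first frame)."""
    alpha = 0.0 if m <= 1 else ALPHA_CH
    energies = []
    for i, (lo, hi) in enumerate(zip(F_L, F_H)):
        # Mean power over the bins f_l(i) .. f_h(i), inclusive.
        mean_power = sum(abs(G[k]) ** 2 for k in range(lo, hi + 1)) / (hi - lo + 1)
        energies.append(max(E_MIN, alpha * e_prev[i] + (1.0 - alpha) * mean_power))
    return energies
```

With the 2/M scaling, a unit-amplitude cosine centred on a bin yields |G(k)| = 1 in that bin, so subband energies are on the scale of squared signal amplitude.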

The subband SNR estimator is connected to the subband energy estimation module. It receives the speech signal processed by that module and estimates the SNR of each subband as follows:

\sigma_{q}(i) = \max\left\{ 0, \min\left\{ 89, \mathrm{round}\left( 10 \log_{10}\left( \frac{E_{ch}(m, i)}{E_{n}(m, i)} \right) / 0.375 \right) \right\} \right\}

where 0 ≤ i < N_c; E_n(m, i) is the estimated background noise energy of the current frame; and σ_q is limited to the range 0 to 89.
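
As a quick illustration of the quantized SNR index, the following Python sketch evaluates the formula for one subband. The function name `snr_index` is hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) is assumed to be an acceptable stand-in for the rounding in the formula.

```python
import math

def snr_index(e_ch, e_n):
    """Quantized subband SNR: one step is 0.375 dB, limited to 0..89."""
    ratio_db = 10.0 * math.log10(e_ch / e_n)
    return max(0, min(89, round(ratio_db / 0.375)))
```

For example, a subband whose energy is ten times the noise estimate (10 dB) maps to index 27, and the upper limit of 89 corresponds to about 33.4 dB.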

The spectral deviation estimator calculates the difference between the energy of the current frame and the long-term average energy, which serves as one of the conditions of the noise decision. It is implemented as follows:

First, the logarithmic energy of each subband is calculated:

E_{dB}(m, i) = 10 \log_{10}(E_{ch}(m, i)), \quad 0 \leq i < N_{b}

Then the difference between the energy of the current frame and the long-term average energy is calculated:

\Delta_{E}(m) = \sum_{i=0}^{N_{b}-1} \left| E_{dB}(m, i) - \bar{E}_{dB}(m, i) \right|

where \bar{E}_{dB}(m, i) is the long-term energy calculated from earlier data frames. In the first frame it is initialized as:

\bar{E}_{dB}(m) = E_{dB}(m)

For the current frame m, the logarithm of the sum of the subband energies is calculated as:

E_{tot}(m) = 10 \log_{10}\left( \sum_{i=0}^{N_{b}-1} E_{ch}(m, i) \right)

The windowing factor is calculated as:

\alpha(m) = \alpha_{H} - \frac{\alpha_{H} - \alpha_{L}}{E_{H} - E_{L}} (E_{H} - E_{tot}(m))

The calculated windowing factor is then limited:

\alpha(m) = \max\{ \alpha_{L}, \min\{ \alpha_{H}, \alpha(m) \} \}

where E_H and E_L are the upper and lower energy points of E_tot, and the windowing factor is limited to α_L ≤ α(m) ≤ α_H.

The constants are defined as E_H = 50 dB, E_L = 30 dB, α_H = 0.99, α_L = 0.50, and N_b = 16.

For the next frame, the long-term average energy is updated as:

\bar{E}_{dB}(m+1, i) = \alpha(m) \bar{E}_{dB}(m, i) + (1 - \alpha(m)) E_{dB}(m, i)
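
The spectral deviation, the energy-dependent windowing factor, and the long-term average update can be sketched in Python as follows. The function names are hypothetical and the sketch operates on plain lists of per-subband log energies; it is an illustration of the formulas above, not a definitive implementation.

```python
E_H, E_L = 50.0, 30.0        # upper and lower energy points, in dB
ALPHA_H, ALPHA_L = 0.99, 0.50

def spectral_deviation(e_db, e_db_avg):
    """Sum over subbands of |E_dB(m, i) - long-term average|."""
    return sum(abs(cur - avg) for cur, avg in zip(e_db, e_db_avg))

def windowing_factor(e_tot):
    """Linear map of E_tot(m) onto [ALPHA_L, ALPHA_H], then limited to that range."""
    alpha = ALPHA_H - (ALPHA_H - ALPHA_L) / (E_H - E_L) * (E_H - e_tot)
    return max(ALPHA_L, min(ALPHA_H, alpha))

def update_long_term(e_db, e_db_avg, alpha):
    """Long-term average for frame m + 1 from the values of frame m."""
    return [alpha * avg + (1.0 - alpha) * cur for cur, avg in zip(e_db, e_db_avg)]
```

A loud frame (E_tot at or above 50 dB) gives a factor of 0.99, so the long-term average moves slowly; a quiet frame (at or below 30 dB) gives 0.50, so the average tracks the current frame quickly.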

The voice metric estimator is connected to the subband SNR estimator. It receives the speech signal from the subband SNR estimator and estimates its voice metric as follows:

v(m) = \sum_{i=0}^{N_{b}-1} V(\sigma_{q}(i))

where V(k) is the k-th entry of the following 90-element table V:

V＝{2，2，2，2，2，2，2，2，2，2，2，3，3，3，3，3，4，4，4，5，5，5，6，6，7，7，7，8，8，9，9，10，10，11，12，12，13，13，14，15，15，16，17，17，18，19，20，20，21，22，23，24，24，25，26，27，28，28，29，30，31，31，32，33，34，35，36，37，37，38，39，40，41，42，43，44，45，46，47，48，49，50，50，50，50，50，50，50，50，50，50}

The noise update decision module is connected to the voice metric estimator and to the spectral deviation estimator. It uses the results of both estimators to decide whether the current frame is a noise frame:

    update_flag = FALSE
    if (v(m) <= UPDATE_THLD)
    {
        update_flag = TRUE
        update_cnt = 0
    }
    else if (E_tot(m) > NOISE_FLOOR_CHAN && Δ_E(m) < DEV_THLD)
    {
        update_cnt++
        if (update_cnt >= UPDATE_CNT_THLD)
            update_flag = TRUE
    }
    if (update_cnt == last_update_cnt)
        hyster_cnt++
    else
        hyster_cnt = 0
    last_update_cnt = update_cnt
    if (hyster_cnt > HYSTER_CNT_THLD)
        update_cnt = 0
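
The decision logic above can also be written as a small stateful Python class, which makes the role of the counters easier to follow. This is a sketch only: the class name `NoiseUpdateDecision` is hypothetical, the thresholds are passed in by the caller because the text does not state their values, and all counters are assumed to start at zero.

```python
class NoiseUpdateDecision:
    """Per-frame noise update decision with an update counter and hysteresis."""

    def __init__(self, update_thld, noise_floor, dev_thld,
                 update_cnt_thld, hyster_cnt_thld):
        self.update_thld = update_thld
        self.noise_floor = noise_floor
        self.dev_thld = dev_thld
        self.update_cnt_thld = update_cnt_thld
        self.hyster_cnt_thld = hyster_cnt_thld
        self.update_cnt = 0        # assumed initial state
        self.last_update_cnt = 0
        self.hyster_cnt = 0

    def step(self, v, e_tot, delta_e):
        """Return True if the current frame should update the noise estimate."""
        update_flag = False
        if v <= self.update_thld:
            # Low voice metric: treat the frame as noise immediately.
            update_flag = True
            self.update_cnt = 0
        elif e_tot > self.noise_floor and delta_e < self.dev_thld:
            # Stationary spectrum above the noise floor: count towards an update.
            self.update_cnt += 1
            if self.update_cnt >= self.update_cnt_thld:
                update_flag = True
        # Hysteresis: reset the counter if it has stalled for too many frames.
        if self.update_cnt == self.last_update_cnt:
            self.hyster_cnt += 1
        else:
            self.hyster_cnt = 0
        self.last_update_cnt = self.update_cnt
        if self.hyster_cnt > self.hyster_cnt_thld:
            self.update_cnt = 0
        return update_flag
```

One instance would be kept for the whole stream and `step` called once per frame with v(m), E_tot(m), and Δ_E(m).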

The forced update decision module is connected to the noise update decision module and monitors it. If the noise update decision module fails to reach an update state for a long time, for example because the data contain no silent period and the background noise therefore goes without an update, then once a timer count exceeds a set value an update is forced, and the updated background noise takes the minimum energy observed during that period.

The automatic gain control module is connected to the subband energy estimation module and to the frequency-domain transform module. It controls the loudness of the output sound according to the energy of the input sound, adjusting the output level. An external echo cancellation module may additionally provide a flag, echo_flag; if echo_flag is TRUE, the input contains no local speech. The module is implemented as follows:

E_{gain} = 10 \log_{10} \sum_{i=0}^{N_{c}-1} \left( E_{ch}(m, i) \times (f_{H}(i) - f_{L}(i) + 1) \right)

After the energy sum of the full band or of part of the subbands has been calculated, the gain factor of the automatic gain control is calculated.

The pseudocode is as follows:

    if ((update_flag == FALSE) && (echo_flag == FALSE))
    {
        if (first_time == TRUE)
        {
            first_time = FALSE;
            short_dB = E_gain;
            long_dB = E_gain;
        }
        if (first_time == FALSE)
        {
            short_dB = α₁ × short_dB + (1 - α₁) × E_gain
            if (short_dB < long_dB)
                long_dB = α₂ × long_dB + (1 - α₂) × short_dB
            else
                long_dB = α₃ × long_dB + (1 - α₃) × short_dB
            gain_dB = β × gain_dB + (1 - β) × (target_dB - long_dB)
            gain_dB = max{-12, min{0, gain_dB}}
        }
    }

where α₁ is the short-term weighting factor and may be set to 0.8.

α₂ may be set to 0.9 and α₃ to 0.99; these two parameters are the long-term weighting factors.

β, set to 0.8, is the gain smoothing factor. It prevents abrupt gain changes, which would make the speech signal fluctuate after automatic gain control.

Finally, the gain factor of the automatic gain control is limited.

The subband SNR correction module is connected to the voice metric estimator, the noise update decision module, and the subband SNR estimator. It determines which subband SNRs need adjusting and adjusts them, as follows:

    index_cnt = 0
    for (i = N_M to N_c-1 step 1)
    {
        if (σ_q(i) >= INDEX_THLD)
            index_cnt = index_cnt + 1
    }
    if (index_cnt < INDEX_CNT_THLD)
        modify_flag = TRUE
    else
        modify_flag = FALSE

    if (modify_flag == TRUE)
        for (i = 0 to N_c-1 step 1)
            if ((v(m) <= METRIC_THLD) or (σ_q(i) <= SETBACK_THLD))
                σ_q′(i) = 1
            else
                σ_q′(i) = σ_q(i)
    else
        {σ_q′} = {σ_q}

    for (i = 0 to N_c-1 step 1)
        if (σ_q′(i) < σ_th)
            σ_q″(i) = σ_th
        else
            σ_q″(i) = σ_q′(i)

The full-band SNR calculation module is connected to the subband SNR correction module. It first calculates the full-band SNR and limits it:

\sigma_{all} = 0.375 \times \frac{1}{N_{H} - N_{L} + 1} \sum_{i=N_{L}}^{N_{H}} \sigma_{q}''(i)

\sigma_{all} = \max\{ \sigma_{all\_min}, \sigma_{all} \}

where the minimum constant σ_all_min = 6.

The calculated full-band SNR is then smoothed over the long term.
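
The full-band SNR calculation can be sketched in Python as follows. The function name `full_band_snr` is hypothetical, and the band limits N_L and N_H are passed in as parameters because the text does not state their values.

```python
SIGMA_ALL_MIN = 6.0  # lower limit of the full-band SNR

def full_band_snr(sigma_q2, n_l, n_h):
    """Mean of the corrected SNR indices over subbands n_l..n_h, scaled by 0.375 and limited from below."""
    mean_index = sum(sigma_q2[n_l:n_h + 1]) / (n_h - n_l + 1)
    return max(SIGMA_ALL_MIN, 0.375 * mean_index)
```

Multiplying by 0.375 converts the averaged index back to the dB scale used when the subband SNR was quantized, so corrected indices of 40 in every subband give a full-band SNR of 15.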

The subband gain calculation module is connected to the subband SNR correction module and to the full-band SNR calculation module. It calculates the attenuation gain of each subband. To keep the attenuation gain from becoming excessive under different SNR conditions and distorting the speech, the gain calculation is weighted and smoothed. In addition, the subband attenuation gain of the previous frame is weighted into the subband gain of the current frame, which softens abrupt changes of attenuation gain between consecutive frames and avoids the subjective listening discomfort that such changes cause.

The background noise estimator is connected to the subband gain calculation module, the subband SNR estimator, and the spectral deviation estimator. It estimates the energy of the input noise in real time, as follows:

if(signal_number＝＝FALSE)

if(update_flag＝＝TRUE)

E _n(m+1，i)＝max{E _min，α _nE _n(m，i)+(1-α _n)E _ch(m，i)}

else

E _n(m+1，i)＝max{E _min，α _nE _n(m，i)+(1-α _n)noise_temp(m，i)}

In the first four frames, E_n(m, i) is initialized as:

E_n(m, i) = max{E_init, E_ch(m, i)}

where E_init = 0.0625.

The frequency-domain filter is connected to the automatic gain control module. It applies gain control to each subband to reduce noise, as follows:

H(k) = \begin{cases} \gamma_{gain}(i) G(k), & f_{L}(i) \leq k \leq f_{H}(i), \; 0 \leq i < N_{c} \\ G(k), & \text{otherwise} \end{cases}

H(k)＝gain_dB×H(k)

The time-domain transform module is connected to the subband gain calculation module. It receives the speech signal processed in the frequency domain, converts it to a time-domain signal, and outputs it. The transform is as follows:

h(m, n) = \frac{1}{2} \sum_{k=0}^{M-1} H(k) e^{j 2 \pi n k / M}

h'(n) = \begin{cases} h(m, n) + h(m-1, n+L), & 0 \leq n < M - L \\ h(m, n), & M - L \leq n < L \end{cases}
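
The inverse transform and the overlap-add of consecutive frames can be sketched in Python as follows. The function names are hypothetical, the frame and DFT lengths are passed as parameters so that the sketch can be checked on short signals, and taking the real part of the inverse transform is an assumption that holds when the spectrum comes from a real signal.

```python
import cmath

def inverse_transform(H, M):
    """h(n) = (1/2) * sum_k H(k) exp(j 2 pi n k / M); the real part is returned."""
    return [0.5 * sum(H[k] * cmath.exp(2j * cmath.pi * n * k / M)
                      for k in range(M)).real
            for n in range(M)]

def overlap_add(h_cur, h_prev, L, M):
    """Combine the current frame with the tail of the previous one into L output samples."""
    out = []
    for n in range(L):
        if n < M - L:
            out.append(h_cur[n] + h_prev[n + L])  # overlap region
        else:
            out.append(h_cur[n])                  # current frame only
    return out
```

The factor 1/2 pairs with the 2/M scaling of the forward transform, so transforming a signal forward and back without modifying the spectrum returns the original samples.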

The output signal is then post-processed by weighting:

S(n+1)＝h′(n)+ζ _dS(n-1)

With reference to the above system, the method for speech enhancement of the present invention is described in detail below:

First embodiment: a method for speech enhancement, comprising the following steps:

1. dividing the signal into frames, applying weighted pre-processing, then windowing and transforming to the frequency domain;

2. dividing the frequency-domain signal into subbands and calculating the energy of each subband;

3. calculating the SNR of each subband;

4. making a noise decision on the current frame to judge whether it is noise;

5. calculating the automatic gain control (AGC) gain from the calculated subband energies;

6. calculating the full-band SNR from the SNR of each subband, or from the background noise energy and the current frame energy;

7. estimating the attenuation gain of each subband from the calculated full-band SNR and the SNR of each subband, and smoothing each calculated subband attenuation gain using the subband attenuation gains calculated for previous frames;

8. processing the spectrum with the calculated AGC gain to achieve automatic gain control, and processing the spectrum of each subband with its calculated attenuation gain to remove noise;

9. updating the noise according to the noise decision;

10. transforming the processed spectrum to the time domain and weighting it to produce the output signal.

Calculating the AGC gain in step 5 comprises the following steps:

51. calculating the weighted energy of the full band or of part of the subbands;

52. when the current frame is the first non-noise frame, initializing the short-time energy and the long-time energy with the calculated weighted energy of the full band or of part of the subbands;

53. when the current frame is not the first non-noise frame, first updating the short-time energy by weighted smoothing;

54. updating the long-time energy by weighted smoothing using the short-time energy;

55. taking the difference between the target energy threshold and the calculated long-time energy to obtain the automatic attenuation gain;

56. smoothing the automatic attenuation gain calculated for the current frame using the attenuation gain of the previous frame;

57. limiting the calculated automatic attenuation gain.

When the long-time energy is updated by weighted smoothing with the short-time energy in step 54, the first weighting smoothing factor α₁ is used if the short-time energy is less than the long-time energy, and the second weighting smoothing factor α₂ is used if the short-time energy is greater than the long-time energy. In a specific application, α₁ = 0.9 and α₂ = 0.99; alternatively, the first weighting smoothing factor may be set to 0.8.

Calculating the full-band SNR in step 6 comprises the following steps:

61. calculating the full-band SNR of the current frame;

62. limiting the calculated full-band SNR of the current frame;

63. smoothing the final long-term full-band SNR by weighting it with the calculated full-band SNR of the current frame.

The full-band SNR of the current frame in step 61 can be calculated in four ways:

First, by averaging the SNRs of all subbands.

Second, by taking a weighted average of the SNRs of all subbands, where a weight may be 0.

Third, by taking the ratio of the full-band energy to the estimated full-band noise energy.

Fourth, by taking the ratio of the full-band energy obtained by weighting the subband energies to the full-band noise energy obtained by weighting the subband noise energies, where a weight may be 0.

Applying weighted smoothing to the final long-time full-band signal-to-noise ratio with the calculated signal-to-noise ratio of the current frame in step 63 specifically comprises the following steps:

631, first weight the calculated full-band signal-to-noise ratio of the current frame with the short-time full-band signal-to-noise ratio of the previous frame to calculate the short-time full-band signal-to-noise ratio of the current frame; the weighting factor is β, and β = 0.98 may be chosen in a specific application;

632, weight the short-time full-band signal-to-noise ratio with the long-time full-band signal-to-noise ratio of the previous frame to calculate the long-time full-band signal-to-noise ratio of the current frame.

When the long-time full-band signal-to-noise ratio of the current frame is calculated in step 632, the short-time full-band signal-to-noise ratio must be compared with the long-time full-band signal-to-noise ratio of the previous frame: if the short-time full-band signal-to-noise ratio is less than or equal to the long-time full-band signal-to-noise ratio, the first weighting factor β₁ is adopted; otherwise, the second weighting factor β₂ is adopted. In a specific application, the first weighting factor β₁ = 0.995 and the second weighting factor β₂ = 0.99.
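Steps 631 and 632 can be sketched in Python as a two-stage recursive smoother. The factor values come from the text; the names `FullBandSnrState` and `update_full_band_snr` are hypothetical, and it is an assumption that each factor weights the previous value while its complement weights the new one.

```python
from dataclasses import dataclass

BETA = 0.98     # short-time weighting factor (value from the text)
BETA_1 = 0.995  # first long-time weighting factor (value from the text)
BETA_2 = 0.99   # second long-time weighting factor (value from the text)


@dataclass
class FullBandSnrState:
    short_snr: float  # short-time full-band SNR of the previous frame
    long_snr: float   # long-time full-band SNR of the previous frame


def update_full_band_snr(state: FullBandSnrState,
                         frame_snr: float) -> FullBandSnrState:
    """Return the smoothed state after one frame.

    ASSUMPTION: both stages are first-order recursive averages in which
    the factor multiplies the previous value.
    """
    # Step 631: short-time smoothing of the (already limited) frame SNR.
    short_snr = BETA * state.short_snr + (1.0 - BETA) * frame_snr
    # Step 632: choose the factor by comparing with the previous long-time SNR.
    beta_long = BETA_1 if short_snr <= state.long_snr else BETA_2
    long_snr = beta_long * state.long_snr + (1.0 - beta_long) * short_snr
    return FullBandSnrState(short_snr=short_snr, long_snr=long_snr)
```

Under these assumptions the long-time value falls slightly more slowly than it rises, because β₁ is larger than β₂.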

Estimating the attenuation gain of each subband from the calculated full-band signal-to-noise ratio and the signal-to-noise ratio of each subband in step 7 specifically comprises:

71, calculate the initial denoising attenuation gain of each subband from the signal-to-noise ratio of that subband;

73, adjust the initial denoising attenuation gain of each subband using the full-band signal-to-noise ratio to obtain the intermediate denoising attenuation gain;

74, adjust the intermediate denoising attenuation gain of each subband using the subband denoising attenuation gains of a number of preceding frames: when the signal characteristics of the preceding frames change slowly, large rates of change of the intermediate denoising attenuation gain are restricted; when the signal characteristics of the preceding frames change rapidly, the opposite applies.

Second embodiment: a speech enhancement method consisting of the following steps:

3, calculate the signal-to-noise ratio of each subband;

4, calculate the voice metric and the spectral deviation from the calculated signal-to-noise ratios and subband energies, and thereby perform the noise decision; perform the forced-update decision according to the result of the noise decision;

6, adjust the signal-to-noise ratio of each subband;

7, calculate the full-band signal-to-noise ratio from the signal-to-noise ratio of each subband, or from the background noise energy and the current-frame energy;

8, estimate the attenuation gain of each subband from the calculated full-band signal-to-noise ratio and the signal-to-noise ratio of each subband, and smooth the calculated attenuation gain of each subband using the subband attenuation gains calculated for preceding frames;

9, process the spectrum with the calculated automatic gain control gain to achieve automatic gain control, and process the spectrum of each subband with the calculated subband attenuation gain to achieve denoising;

10, update the noise according to the noise decision and the forced-update decision;

11, transform the processed spectrum signal to the time domain and apply weighting processing to obtain the output signal.

The forced-update decision in step 4 specifically consists of the following steps:

41, start a counter; when the update decision finds that the current frame is a non-noise frame, begin the forced-update decision;

42, when the counter is 0, initialize the forced-update noise by assigning it the energy of each subband of the current frame;

43, when the counter is not 0, compare, for each subband, the forced-update noise with the energy of the current frame and take the minimum;

44, increment the counter by 1; when the counter equals a threshold, update the noise with the calculated forced-update noise and at the same time reset the counter to 0.
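Steps 41 to 44 amount to tracking a per-subband minimum over a run of non-noise frames. A minimal Python sketch follows; the class name `ForcedNoiseUpdater` and the threshold value used in the example are hypothetical, and restarting the counter whenever a noise frame is seen is an assumption not stated above.

```python
from typing import List, Sequence


class ForcedNoiseUpdater:
    """Per-subband minimum tracker over consecutive non-noise frames."""

    def __init__(self, num_subbands: int, threshold: int) -> None:
        self.counter = 0            # step 41: the counter
        self.threshold = threshold  # hypothetical value chosen by the caller
        self.forced_noise: List[float] = [0.0] * num_subbands

    def process(self, subband_energy: Sequence[float],
                noise: Sequence[float], is_noise_frame: bool) -> List[float]:
        """Return the noise estimate to use after this frame."""
        if is_noise_frame:
            # ASSUMPTION: a noise frame restarts the forced-update search.
            self.counter = 0
            return list(noise)
        if self.counter == 0:
            # Step 42: initialize with the current subband energies.
            self.forced_noise = list(subband_energy)
        else:
            # Step 43: keep the per-subband minimum.
            self.forced_noise = [min(n, e) for n, e in
                                 zip(self.forced_noise, subband_energy)]
        # Step 44: count the frame; on reaching the threshold, force the update.
        self.counter += 1
        if self.counter == self.threshold:
            self.counter = 0
            return list(self.forced_noise)
        return list(noise)
```

For example, with `threshold=3` and three consecutive non-noise frames with subband energies `[5, 4]`, `[3, 6]` and `[4, 2]`, the third call replaces the noise estimate with the per-subband minimum `[3, 2]`.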

In addition, because the input data may contain electrical echo or acoustic echo, automatic gain control must do more than judge whether a frame is a noise frame: non-noise frames may also include echo frames. An echo-frame decision should therefore be added to the calculation of the automatic gain control gain, specifically:

A, calculate the weighted energy of the full band or of some of the subbands;

B, when the current frame is the first non-noise, non-echo frame, initialize the short-time energy and the long-time energy with the calculated weighted energy of the full band or of some of the subbands;

C, when the current frame is not the first non-noise, non-echo frame, first update the short-time energy by weighted smoothing;

D, update the long-time energy by weighted smoothing with the short-time energy;

E, compute the difference between the long-time energy and the target energy threshold to obtain the automatic attenuation gain;

F, smooth the automatic attenuation gain calculated for the current frame using the attenuation gain of the previous frame;

G, limit the amplitude of the calculated automatic attenuation gain.
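Steps A to G can be sketched in Python as a small stateful estimator. Almost every numeric detail here is an assumption made for illustration: working in dB, the target level, the short-time and gain smoothing factors, the gain limits, the sign convention `target - long` in step E, holding the previous gain on noise or echo frames, and the direction of the comparison that selects α₁ or α₂. The class name `AgcGainEstimator` is hypothetical.

```python
import math
from typing import Optional, Sequence


class AgcGainEstimator:
    """Sketch of the echo-aware AGC gain calculation (steps A to G)."""

    def __init__(self, target_db: float = -20.0, short_factor: float = 0.5,
                 alpha_1: float = 0.9, alpha_2: float = 0.99,
                 gain_factor: float = 0.9,
                 min_gain_db: float = -12.0, max_gain_db: float = 12.0) -> None:
        self.target_db = target_db        # ASSUMED target energy threshold
        self.short_factor = short_factor  # ASSUMED short-time smoothing factor
        self.alpha_1 = alpha_1
        self.alpha_2 = alpha_2
        self.gain_factor = gain_factor    # ASSUMED gain smoothing factor
        self.min_gain_db = min_gain_db    # ASSUMED lower gain limit
        self.max_gain_db = max_gain_db    # ASSUMED upper gain limit
        self.short_db: Optional[float] = None
        self.long_db: Optional[float] = None
        self.gain_db = 0.0

    def update(self, subband_energy: Sequence[float],
               weights: Sequence[float],
               is_noise: bool, is_echo: bool) -> float:
        if is_noise or is_echo:
            # ASSUMPTION: noise and echo frames leave the gain unchanged.
            return self.gain_db
        # Step A: weighted energy of the full band or of selected subbands.
        energy = sum(w * e for w, e in zip(weights, subband_energy))
        energy_db = 10.0 * math.log10(max(energy, 1e-12))
        if self.short_db is None or self.long_db is None:
            # Step B: first non-noise, non-echo frame initializes both energies.
            self.short_db = energy_db
            self.long_db = energy_db
        else:
            # Step C: smooth the short-time energy.
            self.short_db = (self.short_factor * self.short_db
                             + (1.0 - self.short_factor) * energy_db)
            # Step D: smooth the long-time energy (comparison direction ASSUMED).
            alpha = self.alpha_1 if self.short_db > self.long_db else self.alpha_2
            self.long_db = alpha * self.long_db + (1.0 - alpha) * self.short_db
        # Step E: difference to the target (sign convention ASSUMED).
        raw_gain_db = self.target_db - self.long_db
        # Step F: smooth with the previous frame's gain.
        smoothed = (self.gain_factor * self.gain_db
                    + (1.0 - self.gain_factor) * raw_gain_db)
        # Step G: limit the gain.
        self.gain_db = min(max(smoothed, self.min_gain_db), self.max_gain_db)
        return self.gain_db
```

The returned value is a gain in dB; applying it to the spectrum would require converting it to a linear factor, which is outside this sketch.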

Third embodiment: a speech enhancement method consisting of the following steps:

3, calculate the signal-to-noise ratio of each subband;

5, adjust the signal-to-noise ratio of each subband;

7, estimate the attenuation gain of each subband from the calculated full-band signal-to-noise ratio and the signal-to-noise ratio of each subband, and smooth the calculated attenuation gain of each subband using the subband attenuation gains calculated for preceding frames;

8, process the spectrum of each subband with the calculated subband attenuation gain to achieve denoising;

9, update the noise according to the noise decision and the forced-update decision;

Fourth embodiment: a speech enhancement method consisting of the following steps:

3, calculate the signal-to-noise ratio of each subband;

4, perform the noise decision on the current frame to determine whether it is a noise frame;

6, process the spectrum with the calculated automatic gain control gain to achieve automatic gain control;

7, update the noise according to the noise decision;

8, transform the processed spectrum signal to the time domain and apply weighting processing to obtain the output signal.

Fifth embodiment: a speech enhancement method consisting of the following steps:

3, calculate the signal-to-noise ratio of each subband;

4, perform the noise decision on the current frame to determine whether it is a noise frame;

5, adjust the signal-to-noise ratio of each subband;

6, estimate the attenuation gain of each subband from the signal-to-noise ratio of each subband, and smooth the calculated attenuation gain of each subband using the subband attenuation gains calculated for preceding frames;

7, process the spectrum of each subband with the calculated subband attenuation gain to achieve denoising;

8, update the noise according to the noise decision;

9, transform the processed spectrum signal to the time domain and apply weighting processing to obtain the output signal.

Claims

1, A speech enhancement method, characterized in that it comprises the following steps:

(3), calculating the signal-to-noise ratio of each subband;

(4), performing a noise decision on the current frame to judge whether it is noise;

(9), updating the noise according to the noise decision;

2, The method as claimed in claim 1, characterized in that calculating the automatic gain control gain in step (5) specifically comprises the following steps:

(51), calculating the weighted energy of the full band or of some of the subbands;

(52), when the current frame is the first non-noise frame, initializing the short-time energy and the long-time energy with the calculated weighted energy of the full band or of some of the subbands;

(53), when the current frame is not the first non-noise frame, first updating the short-time energy by weighted smoothing;

(54), updating the long-time energy by weighted smoothing with the short-time energy;

(55), computing the difference between the long-time energy and the target energy threshold to obtain the automatic attenuation gain;

(56), smoothing the automatic attenuation gain calculated for the current frame using the attenuation gain of the previous frame;

(57), limiting the amplitude of the calculated automatic attenuation gain.

3, The method as claimed in claim 2, characterized in that, when the long-time energy is updated by weighted smoothing with the short-time energy in step (54), either the first weighting smoothing factor α₁ or the second weighting smoothing factor α₂ is adopted, depending on how the short-time energy compares with the long-time energy.

4, The method as claimed in claim 3, characterized in that the first weighting smoothing factor α₁ = 0.9 and the second weighting smoothing factor α₂ = 0.99.

5, The method as claimed in any one of claims 1 to 4, characterized in that calculating the full-band signal-to-noise ratio in step (6) specifically comprises the following steps:

(61), calculating the full-band signal-to-noise ratio of the current frame;

(62), limiting the amplitude of the calculated full-band signal-to-noise ratio of the current frame;

(63), using the calculated full-band signal-to-noise ratio of the current frame to apply weighted smoothing to the final long-time full-band signal-to-noise ratio.

6, The method as claimed in claim 5, characterized in that the full-band signal-to-noise ratio of the current frame in step (61) is calculated in one of four ways:

first, averaging the signal-to-noise ratios of all subbands;

second, taking a weighted average of the signal-to-noise ratios of all subbands, where a weight may be 0;

third, taking the ratio of the full-band energy to the estimated full-band noise energy;

fourth, taking the ratio of the full-band energy obtained by weighting the subband energies to the full-band noise energy obtained by weighting the subband noise energies, where a weight may be 0.

7, The method as claimed in claim 6, characterized in that applying weighted smoothing to the final long-time full-band signal-to-noise ratio with the calculated signal-to-noise ratio of the current frame in step (63) specifically comprises the following steps:

(631), first weighting the calculated full-band signal-to-noise ratio of the current frame with the short-time full-band signal-to-noise ratio of the previous frame to calculate the short-time full-band signal-to-noise ratio of the current frame;

(632), weighting the short-time full-band signal-to-noise ratio with the long-time full-band signal-to-noise ratio of the previous frame to calculate the long-time full-band signal-to-noise ratio of the current frame.

8, The method as claimed in claim 7, characterized in that, when the long-time full-band signal-to-noise ratio of the current frame is calculated in step (632), the short-time full-band signal-to-noise ratio is compared with the long-time full-band signal-to-noise ratio of the previous frame; if the short-time full-band signal-to-noise ratio is less than or equal to the long-time full-band signal-to-noise ratio, the first weighting factor β₁ is adopted; otherwise, the second weighting factor β₂ is adopted.

9, The method as claimed in claim 8, characterized in that the first weighting factor β₁ = 0.995 and the second weighting factor β₂ = 0.99.

10, The method as claimed in any one of claims 1 to 4, characterized in that estimating the attenuation gain of each subband from the calculated full-band signal-to-noise ratio and the signal-to-noise ratio of each subband in step (7) specifically comprises:

(71), calculating the initial denoising attenuation gain of each subband from the signal-to-noise ratio of that subband;

(72), adjusting the initial denoising attenuation gain of each subband using the full-band signal-to-noise ratio to obtain the intermediate denoising attenuation gain;

(73), adjusting the intermediate denoising attenuation gain of each subband using the subband denoising attenuation gains of a number of preceding frames: when the signal characteristics of the preceding frames change slowly, large rates of change of the intermediate denoising attenuation gain are restricted; when the signal characteristics of the preceding frames change rapidly, the opposite applies.

11, A speech enhancement method, characterized in that it comprises the following steps:

(3), calculating the signal-to-noise ratio of each subband;

(6), adjusting the signal-to-noise ratio of each subband;

12, The method as claimed in claim 11, characterized in that the forced-update decision in step (4) specifically consists of the following steps:

(41), starting a counter, and beginning the forced-update decision when the update decision finds that the current frame is a non-noise frame;

(42), when the counter is 0, initializing the forced-update noise by assigning it the energy of each subband of the current frame;

(43), when the counter is not 0, comparing, for each subband, the forced-update noise with the energy of the current frame and taking the minimum;

(44), incrementing the counter by 1, and, when the counter equals a threshold, updating the noise with the calculated forced-update noise and at the same time resetting the counter to 0.

13, The method as claimed in claim 11, characterized in that the method further comprises a step of calculating the automatic gain control gain with an echo-frame decision, specifically:

A, calculating the weighted energy of the full band or of some of the subbands;

G, limiting the amplitude of the calculated automatic attenuation gain.

14, A speech enhancement method, characterized in that it comprises the following steps:

(3), calculating the signal-to-noise ratio of each subband;

(5), adjusting the estimated signal-to-noise ratio of each subband;

15, A speech enhancement method, characterized in that it comprises the following steps:

(3), calculating the signal-to-noise ratio of each subband;

(7), updating the noise according to the noise decision;

16, A speech enhancement method, characterized in that it comprises the following steps:

(3), calculating the signal-to-noise ratio of each subband;

(5), adjusting the signal-to-noise ratio of each subband;

(8), updating the noise according to the noise decision;