CN102054482B

CN102054482B - Method and device for enhancing voice signal

Info

Publication number: CN102054482B
Application number: CN2009102369170A
Authority: CN
Inventors: 刘霖; 田康
Original assignee: China Mobile Communications Group Co Ltd
Current assignee: China Mobile Communications Group Co Ltd
Priority date: 2009-10-27
Filing date: 2009-10-27
Publication date: 2012-11-28
Anticipated expiration: 2029-10-27
Also published as: CN102054482A

Abstract

An embodiment of the invention discloses a method and a device for enhancing a speech signal. The method comprises: obtaining a noisy speech signal and applying perceptual weighting filtering to it; transforming the perceptually weighted noisy speech signal into the frequency domain, performing spectral subtraction and phase synthesis on it in the frequency domain, and transforming the resulting speech signal back into the time domain; and applying inverse perceptual weighting filtering to the speech signal obtained after spectral subtraction and phase synthesis to obtain an enhanced speech signal. Because the noisy speech signal is perceptually weighted before spectral subtraction, the interference of noise with the speech signal is effectively suppressed, and the enhanced speech signal matches the characteristics of human hearing.

Description

Method and apparatus for speech signal enhancement

Technical field

The present invention relates to the field of communication technology, and in particular to a method and apparatus for speech signal enhancement.

Background art

With the development of 3G (3rd Generation, the third-generation mobile communication system), video telephone services have come into wide use. In addition to basic voice communication, a video telephone service lets each party see the scene in which the other party is speaking, which improves the user experience. During a video call, the camera must capture each party's scene in real time, so both parties have to keep a certain distance from the handset microphone. As a result, a large amount of noise is mixed into the speech signal picked up by the microphone. This noise lowers the signal-to-noise ratio (SNR) of the call signal and degrades the speech quality of the video call.

In the prior art, to reduce the effect of noise on speech quality, the noisy speech signal is transformed into the frequency domain by a Fourier transform and spectral subtraction is applied there: the amplitude spectrum of the noise is subtracted from the amplitude spectrum of the noisy speech to obtain the amplitude spectrum of the clean speech. The principle is as follows.

The noisy speech model is:

y (n) = s (n) + d (n)    Formula (1)

where y(n) denotes the noisy speech, s(n) the clean speech, and d(n) the noise mixed into it.

Taking the Fourier transform of both sides of formula (1) gives:

Y (k) = S (k) + D (k)    Formula (2)

where Y(k), S(k), and D(k) denote the Fourier coefficients of the noisy speech, the clean speech, and the noise, respectively.

Ignoring the phase difference between the noisy speech and the clean speech gives:

| Y (k) | = | S (k) | + | D (k) |    Formula (3)
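As a short check on when formula (3) holds, each coefficient can be written in polar form. The phase symbols \theta_{S}(k) and \theta_{D}(k) below are notation introduced only for this sketch; they do not appear in the original text. The cross term shows that formula (3) is exact when the clean speech and noise coefficients share the same phase (in which case the noisy coefficient shares it as well) and is otherwise an approximation.

```latex
% Squared magnitude of Y(k) = S(k) + D(k), with S(k) = |S(k)| e^{j\theta_{S}(k)}, D(k) = |D(k)| e^{j\theta_{D}(k)}
|Y(k)|^{2} = |S(k)|^{2} + |D(k)|^{2} + 2\,|S(k)|\,|D(k)|\cos\bigl(\theta_{S}(k) - \theta_{D}(k)\bigr)

% Special case \theta_{S}(k) = \theta_{D}(k): the cosine equals 1
|Y(k)|^{2} = \bigl(|S(k)| + |D(k)|\bigr)^{2}
\quad\Longrightarrow\quad
|Y(k)| = |S(k)| + |D(k)|
```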

Because the human ear is insensitive to phase information, the amplitude spectrum of the noise can be subtracted directly from the amplitude spectrum of the noisy speech; the result is the amplitude spectrum of the clean speech and is used as the amplitude spectrum of the enhanced speech. This gives the basic expression:

| \hat{S} (k) | = | Y (k) | - | D (k) |    Formula (4)

In practice, an improved form of spectral subtraction is used more often, as given by formula (5):

| \hat{S} (k) | = {[{| Y (k) |}^{α} - β {| D (k) |}^{α}]}^{1 / α}

Formula (5)

This improved form differs from ordinary spectral subtraction in that it introduces two parameters, α and β, which make the algorithm much more flexible. The system principle of applying spectral subtraction to noisy speech is shown in Fig. 1.
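Formula (5) can be applied bin by bin to a pair of magnitude spectra. The following is a minimal sketch under stated assumptions, not the prior-art implementation itself: the function name `spectral_subtract`, the default parameter values, and the clamping of negative differences to zero (so that the 1/α root stays real) are illustrative choices that the text does not specify.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=1.0):
    """Apply formula (5) to each frequency bin of two equal-length magnitude spectra."""
    enhanced = []
    for y, d in zip(noisy_mag, noise_mag):
        diff = y ** alpha - beta * d ** alpha
        # Assumption: a negative difference is clamped to zero before taking the root.
        enhanced.append(max(diff, 0.0) ** (1.0 / alpha))
    return enhanced


# Hypothetical magnitudes for three frequency bins.
noisy = [3.0, 1.0, 5.0]
noise = [1.0, 2.0, 3.0]

basic = spectral_subtract(noisy, noise, alpha=1.0, beta=1.0)  # plain magnitude subtraction
power = spectral_subtract(noisy, noise, alpha=2.0, beta=1.0)  # subtraction of squared magnitudes
print(basic)
print(power)
```

With α = 1 and β = 1 the function reduces to subtracting the noise magnitude directly; larger β removes more of the noise estimate in every bin.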

However, the prior-art use of spectral subtraction to reduce noise has the following technical defect: when spectral subtraction is applied to noisy speech, the noise spectrum and the speech spectrum cannot be distinguished accurately. The algorithm therefore also attenuates the speech spectrum considerably while reducing noise, which impairs the listener's perception of the normal speech spectrum.

To address this attenuation of the speech signal, many improvements to the existing spectral subtraction algorithm have been proposed that optimize enhancement performance by adjusting the intensity of noise reduction in the subtraction.

Scheme 1: averages are computed from the probabilistic characteristics of the noisy speech spectrum and of the noise spectrum, and are used to control how strongly the noise amplitude is reduced.

Scheme 2: the parameters α = 1, β = 2 of the traditional spectral subtraction algorithm are changed to α = 2, β = 5 to obtain an improved spectral subtraction, and coefficients obtained by training are used to control the noise reduction intensity.

In implementing scheme 1 and scheme 2, the applicant found the following. Scheme 1 has high implementation complexity, and because its control of noise reduction is based on probability distributions and does not take the characteristics of human hearing into account, the audible result is not very satisfactory. Scheme 2 obtains, through a large number of experiments, a pair of α and β values that work well; but because the application environment keeps changing, values that work well in one environment may still fail to give satisfactory noise reduction control when the environment changes substantially. The main problem with these schemes, which optimize enhancement performance by adjusting the noise reduction intensity in spectral subtraction, is therefore the following: the noise spectrum can only be obtained by estimation, and the estimate is not accurate enough, so poorly controlled reduction intensity may remove too much of the speech spectrum along with the noise, and the enhanced speech signal then fails to meet the needs of human hearing.

Summary of the invention

Embodiments of the present invention provide a method and apparatus for speech signal enhancement that obtain an enhanced speech signal while matching the characteristics of human hearing.

An embodiment of the present invention provides a method for speech signal enhancement, comprising:

obtaining a noisy speech signal, and applying perceptual weighting filtering to the noisy speech signal;

transforming the perceptually weighted noisy speech signal into the frequency domain, performing spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transforming the speech signal obtained after spectral subtraction and phase synthesis into the time domain;

applying inverse perceptual weighting filtering to the speech signal obtained after spectral subtraction and phase synthesis to obtain an enhanced speech signal;

wherein performing spectral subtraction and phase synthesis on the noisy speech signal comprises:

applying a low noise reduction intensity in the frequency bands of the noisy speech signal that the human ear perceives easily, and a high noise reduction intensity in the frequency bands that the human ear does not readily perceive.

Preferably, applying perceptual weighting filtering to the noisy speech signal comprises:

raising the signal amplitude in the frequency bands of the noisy speech signal that the human ear perceives easily, and lowering the signal amplitude in the frequency bands that the human ear does not readily perceive.

Preferably, the transfer function used for the perceptual weighting filtering of the noisy speech signal is:

W (z) = \frac{A (z / γ_{1})}{A (z / γ_{2})} = \frac{1 - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} z^{- i}}{1 - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} z^{- i}},

0 < γ_{2} < γ_{1} ≤ 1

where γ_{1} and γ_{2} are weighting factors, α_{i} is a linear prediction coefficient of the speech signal extracted from the speech decoder bitstream, i is the index of the linear prediction coefficient, p is the linear prediction order, and 1 ≤ i ≤ p.

Preferably, after the perceptual weighting filtering, the noisy speech signal is expressed in the time domain as:

y (k) - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} y (k - i) = x (k) - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} x (k - i)

where x(k) denotes the speech signal before perceptual weighting, y(k) denotes the speech signal after perceptual weighting, γ_{1} and γ_{2} are weighting factors, α_{i} is a linear prediction coefficient of the speech signal extracted from the speech decoder bitstream, i is the index of the linear prediction coefficient, p is the linear prediction order, and 1 ≤ i ≤ p.

Preferably, applying inverse perceptual weighting filtering to the speech signal obtained after spectral subtraction and phase synthesis comprises:

in the speech signal obtained after spectral subtraction and phase synthesis, lowering the signal amplitude in the frequency bands that the human ear perceives easily, and raising the signal amplitude in the frequency bands that the human ear does not readily perceive.

An embodiment of the present invention provides an apparatus for speech signal enhancement, comprising:

a perceptual weighting filtering module, configured to apply perceptual weighting filtering to a noisy speech signal after the noisy speech signal is obtained;

a spectral subtraction module, configured to transform the perceptually weighted noisy speech signal into the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transform the speech signal obtained after spectral subtraction and phase synthesis into the time domain;

an inverse perceptual weighting filtering module, configured to apply inverse perceptual weighting filtering to the speech signal obtained after spectral subtraction and phase synthesis to obtain an enhanced speech signal;

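wherein the spectral subtraction module is specifically configured to apply a low noise reduction intensity in the frequency bands of the noisy speech signal that the human ear perceives easily, and a high noise reduction intensity in the frequency bands that the human ear does not readily perceive.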

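Preferably, the perceptual weighting filtering module is specifically configured to raise the signal amplitude in the frequency bands of the noisy speech signal that the human ear perceives easily, and to lower the signal amplitude in the frequency bands that the human ear does not readily perceive.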

Preferably, the transfer function used by the perceptual weighting filtering module for the perceptual weighting filtering of the noisy speech signal is:

W (z) = \frac{A (z / γ_{1})}{A (z / γ_{2})} = \frac{1 - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} z^{- i}}{1 - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} z^{- i}},

0 < γ_{2} < γ_{1} ≤ 1

Preferably, after the perceptual weighting filtering module applies perceptual weighting filtering to the noisy speech signal, the noisy speech signal is expressed in the time domain as:

y (k) - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} y (k - i) = x (k) - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} x (k - i)

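Preferably, the inverse perceptual weighting filtering module is specifically configured to lower the signal amplitude in the frequency bands that the human ear perceives easily, and to raise the signal amplitude in the frequency bands that the human ear does not readily perceive, in the speech signal obtained after spectral subtraction and phase synthesis.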

Compared with the prior art, embodiments of the present invention have the following advantage. Perceptual weighting filtering is applied to the noisy speech signal in the time domain, which raises the signal amplitude in the frequency bands the human ear perceives easily and lowers it in the bands the ear does not readily perceive. Spectral subtraction and phase synthesis are then performed on the filtered noisy speech in the frequency domain, so that noise reduction is weaker in the easily perceived bands and stronger in the bands the ear does not readily perceive. The enhanced speech signal therefore matches the characteristics of human hearing.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the system principle of applying spectral subtraction to noisy speech in the prior art;

Fig. 2 is a schematic flowchart of a method for speech signal enhancement provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the system principle of speech signal enhancement provided by an embodiment of the present invention;

Fig. 4 is a schematic flowchart of a method for speech signal enhancement in an application scenario of the present invention;

Fig. 5 is a schematic structural diagram of an apparatus for speech signal enhancement provided by an embodiment of the present invention.

Detailed description of the embodiments

In embodiments of the present invention, perceptual weighting filtering is applied to the noisy speech signal, which raises the signal amplitude in the frequency bands the human ear perceives easily and lowers it in the bands the ear does not readily perceive. The filtered noisy speech signal is then transformed into the frequency domain by a Fourier transform, and spectral subtraction is applied there, with weaker noise reduction in the easily perceived bands and stronger noise reduction in the bands the ear does not readily perceive. After spectral subtraction, the speech signal is transformed back into the time domain by an inverse Fourier transform, and inverse perceptual weighting filtering then yields the enhanced speech signal.

Fig. 2 is a schematic flowchart of a method for speech signal enhancement provided by an embodiment of the present invention. The method comprises the following steps:

Step 201: obtain a noisy speech signal and apply perceptual weighting filtering to it.

After the call begins, the noisy speech signal is obtained and perceptual weighting filtering is applied to it in the time domain. The filtering raises the signal amplitude in the frequency bands of the noisy speech signal that the human ear perceives easily and lowers it in the bands the ear does not readily perceive.

Step 202: transform the perceptually weighted noisy speech signal into the frequency domain, perform spectral subtraction and phase synthesis on it in the frequency domain, and transform the resulting speech signal into the time domain.

After perceptual weighting filtering, the noisy speech signal is transformed into the frequency domain by a Fourier transform, spectral subtraction and phase synthesis are performed in the frequency domain, and the resulting speech signal is transformed into the time domain by an inverse Fourier transform.

Specifically, the spectral subtraction and phase synthesis apply a low noise reduction intensity in the frequency bands of the noisy speech signal that the human ear perceives easily and a high noise reduction intensity in the bands the ear does not readily perceive.

Step 203: apply inverse perceptual weighting filtering to the speech signal obtained after spectral subtraction and phase synthesis to obtain the enhanced speech signal.

The speech signal obtained after spectral subtraction and phase synthesis is passed through inverse perceptual weighting filtering, which lowers the signal amplitude in the bands the human ear perceives easily and raises it in the bands the ear does not readily perceive, yielding the enhanced speech signal. In the enhanced speech signal, noise reduction is weaker in the easily perceived bands and stronger in the bands the ear does not readily perceive, so the enhancement effect is clear and matches the characteristics of human hearing.

With the method provided by this embodiment, noise reduction is applied with lower intensity in the easily perceived bands, at their raised signal amplitude, and with higher intensity in the bands the ear does not readily perceive, at their lowered signal amplitude, to obtain the enhanced speech signal.

To make the technical solution of the present invention clearer, the perceptual weighting filtering of the noisy speech signal is described in detail below.

In implementing the present invention, the noisy speech signal is perceptually weighted in the time domain before spectral subtraction. The filtered signal is transformed into the frequency domain by a Fourier transform; spectral subtraction and phase synthesis are performed in the frequency domain; the resulting speech signal is transformed into the time domain by an inverse Fourier transform; and inverse perceptual weighting filtering then yields the enhanced speech signal.

Specifically, the perceptual weighting filtering is carried out by constructing a perceptual weighting filter and filtering the noisy speech signal with it, which transforms the speech signal into the perceptually weighted domain.

In embodiments of the present invention, the transfer function of the perceptual weighting filter is:

W (z) = \frac{A (z / γ_{1})}{A (z / γ_{2})} = \frac{1 - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} z^{- i}}{1 - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} z^{- i}},

0 < γ_{2} < γ_{1} ≤ 1    Formula (6)

where γ_{1} and γ_{2} are two weighting factors whose optimal values can be obtained through extensive training; α_{i} is an LP (linear prediction) coefficient of the speech signal, i is the index of the LP coefficient, p is the LP order, and 1 ≤ i ≤ p.

Because most current video telephone systems use hybrid coding based on linear prediction, the LP coefficients can be read from the speech decoder bitstream, which reduces the complexity of implementing the perceptual weighting.

After perceptual weighting filtering, the noisy speech signal is expressed in the time domain as:

y (k) - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} y (k - i) = x (k) - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} x (k - i)

Formula (7)

where x(k) denotes the speech signal before perceptual weighting, y(k) denotes the speech signal after perceptual weighting, γ_{1} and γ_{2} are weighting factors, α_{i} is an LP coefficient of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and 1 ≤ i ≤ p.
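The recursion in formula (7) can be evaluated sample by sample by solving it for y(k). The following is a minimal sketch rather than the patented implementation: the function name `perceptual_weight`, the zero initial conditions, and the example coefficient values are assumptions made for illustration; only the recursion itself is taken from formula (7), exactly as written there, with the γ_{1} terms on the output side and the γ_{2} terms on the input side.

```python
def perceptual_weight(x, lp_coeffs, gamma1, gamma2):
    """Solve formula (7) for y(k):
    y(k) = x(k) - sum_i a_i*gamma2^i*x(k-i) + sum_i a_i*gamma1^i*y(k-i)."""
    p = len(lp_coeffs)
    y = []
    for k in range(len(x)):
        acc = x[k]
        for i in range(1, p + 1):
            if k - i < 0:
                break  # assumption: samples before the start of the signal are zero
            a_i = lp_coeffs[i - 1]
            acc -= a_i * gamma2 ** i * x[k - i]  # input-side terms of formula (7)
            acc += a_i * gamma1 ** i * y[k - i]  # output-side terms of formula (7)
        y.append(acc)
    return y


lp = [0.5]  # hypothetical first-order LP coefficient, for illustration only
weighted = perceptual_weight([1.0, 0.0, 0.0], lp, gamma1=1.0, gamma2=0.5)
print(weighted)  # [1.0, 0.25, 0.125]
```

In a real system the LP coefficients would change from frame to frame as they are read from the decoder bitstream; the sketch keeps them fixed for simplicity.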

After the perceptual weighting filtering described above, the signal amplitude of the noisy speech signal is raised in the frequency bands the human ear perceives easily and lowered in the bands the ear does not readily perceive. In the subsequent spectral subtraction and phase synthesis, noise reduction is therefore weaker in the easily perceived bands and stronger in the bands the ear does not readily perceive.

Specifically, during spectral subtraction, the easily perceived bands are processed at their raised signal amplitude, so the noise reduction there is relatively weak, which reduces the effect of noise reduction on the speech signal in those bands. The bands the ear does not readily perceive are processed at their lowered signal amplitude, so the noise reduction there is relatively strong, which increases noise reduction in those bands and matches the characteristics of human hearing.

After spectral subtraction of the noisy speech, inverse perceptual weighting filtering lowers the signal amplitude in the easily perceived bands and raises it in the bands the ear does not readily perceive, in the speech signal obtained after spectral subtraction and phase synthesis, which completes the enhancement of the noisy speech signal. Inverse perceptual weighting filtering is the inverse of perceptual weighting filtering; the specific algorithm is not repeated here.
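Although the text does not spell out the inverse algorithm, one way to obtain it is to solve formula (7) for x(k) instead of y(k). The following is a minimal sketch under that assumption; the function name `inverse_perceptual_weight`, the zero initial conditions, and the example values are hypothetical and chosen only for illustration.

```python
def inverse_perceptual_weight(y, lp_coeffs, gamma1, gamma2):
    """Solve formula (7) for x(k):
    x(k) = y(k) - sum_i a_i*gamma1^i*y(k-i) + sum_i a_i*gamma2^i*x(k-i)."""
    x = []
    for k in range(len(y)):
        acc = y[k]
        for i, a_i in enumerate(lp_coeffs, start=1):
            if k - i < 0:
                break  # assumption: samples before the start of the signal are zero
            acc -= a_i * gamma1 ** i * y[k - i]
            acc += a_i * gamma2 ** i * x[k - i]
        x.append(acc)
    return x


# Hypothetical weighted samples that satisfy formula (7) for a unit impulse input
# with a single LP coefficient 0.5, gamma1 = 1.0 and gamma2 = 0.5.
restored = inverse_perceptual_weight([1.0, 0.25, 0.125], [0.5], gamma1=1.0, gamma2=0.5)
print(restored)  # [1.0, 0.0, 0.0]
```

Because the same LP coefficients and weighting factors are used in both directions, the two filters cancel exactly when nothing is changed in between; in the enhancement chain, the spectral subtraction between them is what makes the output differ from the input.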

Based on the above principle of perceptual weighting filtering of the noisy speech signal, the present invention provides a system for speech signal enhancement whose principle is shown schematically in Fig. 3. As Fig. 3 shows, when the call begins, the noisy speech signal is obtained and perceptual weighting filtering transforms it into the perceptually weighted domain. The signal is then transformed into the frequency domain by a Fourier transform, spectral subtraction and phase synthesis are performed in the frequency domain, the resulting speech signal is transformed into the time domain by an inverse Fourier transform, and inverse perceptual weighting filtering yields the enhanced speech signal. The inverse perceptual weighting filtering applied after spectral subtraction and phase synthesis is the inverse of the perceptual weighting filtering applied to the noisy speech.

It should be noted that, in embodiments of the present invention, the noise spectrum used in the spectral subtraction is an empirical estimate. The optimal spectral range of the noise spectrum is obtained through extensive training and can be adjusted appropriately according to the spectral intensity of the noisy speech signal, so that the noise in the noisy speech signal is effectively reduced during spectral subtraction and an enhanced speech signal is obtained. Variations of the noise spectrum in different environments are not described further here.

The technical solution of the present invention is described in detail below with reference to a specific application scenario. Fig. 4 is a schematic flowchart of a method for speech signal enhancement in an application scenario of the present invention. The method comprises the following steps:

Step 401: the call begins, and the noisy speech signal is obtained.

When a video call begins, each party can see video of the other party's scene, and both parties keep a certain distance from the telephone microphone so that the video signal can be captured. The speech signal picked up by the microphone therefore contains a large amount of noise in addition to the speech of the two parties. In embodiments of the present invention, this speech signal mixed with a large amount of noise is called the noisy speech signal. The noise lowers the SNR of the call signal and degrades the speech quality of the video call; to improve the SNR, noise reduction is applied to the obtained noisy speech signal. In embodiments of the present invention, the noisy speech model is given by formula (1).

The noisy speech model is:

y (n) = s (n) + d (n)    Formula (1)

It should be noted that reducing the noise in a speech signal by applying perceptual weighting filtering and spectral subtraction to the noisy speech signal, as in embodiments of the present invention, is not limited to video calls; it can also be used to reduce noise in other forms of communication, such as ordinary telephone calls, video telephony, and so on. Such variations of the application scenario are not described further here.

Step 402: obtain the LP coefficients from the decoded bitstream and construct the perceptual weighting filter.

Before spectral subtraction, perceptual weighting filtering is applied to the noisy speech signal. This filtering is realized by constructing a perceptual weighting filter, whose transfer function in the present invention is given by formula (6):

W (z) = \frac{A (z / γ_{1})}{A (z / γ_{2})} = \frac{1 - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} z^{- i}}{1 - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} z^{- i}},

0 < γ_{2} < γ_{1} ≤ 1    Formula (6)

Because most current video telephone systems use hybrid coding based on linear prediction, the LP coefficients can be read directly from the decoded bitstream. The perceptual weighting filter is constructed from the weighting factors γ_{1} and γ_{2} and the LP coefficients obtained from the decoded bitstream.
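Constructing the filter amounts to scaling each LP coefficient by a power of the weighting factor, which gives the polynomials A(z/γ_{1}) and A(z/γ_{2}) of formula (6). The following is a minimal sketch, assuming the LP coefficients have already been read from the bitstream into a plain list; the function names `weighted_lp_terms` and `build_weighting_filter` and the example values are hypothetical.

```python
def weighted_lp_terms(lp_coeffs, gamma):
    """Return [a_1*gamma, a_2*gamma^2, ..., a_p*gamma^p], the scaled terms of formula (6)."""
    return [a_i * gamma ** i for i, a_i in enumerate(lp_coeffs, start=1)]


def build_weighting_filter(lp_coeffs, gamma1, gamma2):
    """Return the coefficients of A(z/gamma1) and A(z/gamma2) in powers of z^-1."""
    if not 0.0 < gamma2 < gamma1 <= 1.0:
        raise ValueError("formula (6) requires 0 < gamma2 < gamma1 <= 1")
    numerator = [1.0] + [-c for c in weighted_lp_terms(lp_coeffs, gamma1)]
    denominator = [1.0] + [-c for c in weighted_lp_terms(lp_coeffs, gamma2)]
    return numerator, denominator


# Hypothetical second-order LP coefficients and weighting factors, for illustration only.
num, den = build_weighting_filter([0.5, -0.25], gamma1=1.0, gamma2=0.5)
print(num)  # [1.0, -0.5, 0.25]
print(den)  # [1.0, -0.25, 0.0625]
```

The range check mirrors the constraint stated with formula (6); the actual values of γ_{1} and γ_{2} are obtained by training according to the text and are not given there.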

Step 403: apply perceptual weighting filtering to the noisy speech signal.

The noisy speech signal y(n) = s(n) + d(n) is filtered by the perceptual weighting filter. The signal is in the time domain both before and after the filtering. After filtering, its time-domain expression is:

y (k) - Σ_{i = 1}^{p} α_{i} γ_{1}^{i} y (k - i) = x (k) - Σ_{i = 1}^{p} α_{i} γ_{2}^{i} x (k - i)

Formula (7)

where x(k) denotes the noisy speech signal before perceptual weighting, y(k) denotes the noisy speech signal after perceptual weighting, γ_{1} and γ_{2} are weighting factors, α_{i} is an LP coefficient of the speech signal extracted from the speech decoder bitstream, i is the index of the LP coefficient, p is the LP order, and 1 ≤ i ≤ p.

Perceptual weighting filtering raises the signal amplitude in the frequency bands of the noisy speech signal that the human ear perceives easily and lowers it in the bands the ear does not readily perceive.

Step 404: apply a Fourier transform to the noisy speech signal to transform it into the frequency domain.

After perceptual weighting filtering, the Fourier transform converts the noisy speech signal from a function of time into a function of frequency, and spectral subtraction is then performed in the frequency domain. The details of transforming the noisy speech signal into the frequency domain by a Fourier transform are not repeated here.

Step 405: perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain.

After perceptual weighting filtering and the Fourier transform, spectral subtraction and phase synthesis are performed on the noisy speech in the frequency domain. At this point, the signal amplitude of the noisy speech signal is raised in the frequency bands the human ear perceives easily and lowered in the bands the ear does not readily perceive.

During spectral subtraction and phase synthesis, a lower noise reduction intensity is used in the easily perceived bands and a higher one in the bands the ear does not readily perceive. Specifically, the easily perceived bands are processed at their raised signal amplitude, so the noise reduction there is relatively weak, which reduces the effect of noise reduction on the speech signal in those bands; the bands the ear does not readily perceive are processed at their lowered signal amplitude, so the noise reduction there is relatively strong, which increases noise reduction in those bands.
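Steps 404 to 406 can be sketched for a single frame as follows. This is an illustration under stated assumptions, not the embodiment's implementation: it assumes, as is usual for spectral subtraction, that phase synthesis means recombining the subtracted magnitude with the phase of the noisy coefficient; it uses a direct DFT instead of a fast transform, basic magnitude subtraction floored at zero, and a noise magnitude estimate supplied by the caller. The function names are hypothetical.

```python
import cmath


def dft(frame):
    """Direct discrete Fourier transform of a real frame (step 404)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT (step 406); only the real part is returned."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def subtract_and_resynthesize(frame, noise_mag):
    """Subtract a per-bin noise magnitude and reuse the noisy phase (step 405).

    noise_mag should be symmetric (noise_mag[k] == noise_mag[n - k]) so that
    the resynthesized frame is real apart from rounding error.
    """
    out = []
    for coeff, d in zip(dft(frame), noise_mag):
        mag = max(abs(coeff) - d, 0.0)      # assumption: negative results are floored at zero
        phase = cmath.phase(coeff)          # phase of the noisy coefficient
        out.append(cmath.rect(mag, phase))  # phase synthesis: new magnitude, noisy phase
    return idft(out)


# Hypothetical 4-sample frame consisting only of a constant (DC) component.
frame = [1.0, 1.0, 1.0, 1.0]
enhanced = subtract_and_resynthesize(frame, noise_mag=[2.0, 0.0, 0.0, 0.0])
print(enhanced)
```

In the method described here, the frame passed to such a routine would already be perceptually weighted, so the same subtraction removes proportionally less from the raised bands and proportionally more from the lowered ones.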

Step 406: transform the speech signal obtained after spectral subtraction into the time domain by an inverse Fourier transform.

After spectral subtraction, the speech signal is transformed into the time domain by an inverse Fourier transform. The inverse Fourier transform is the inverse operation of the Fourier transform; the details of the transformation are not repeated here.

Step 407: apply inverse perceptual weighting filtering to the time-domain signal to obtain the enhanced speech signal.

The speech signal obtained after spectral subtraction and phase synthesis is passed through inverse perceptual weighting filtering, which lowers the signal amplitude in the frequency bands the human ear perceives easily and raises it in the bands the ear does not readily perceive, yielding the enhanced speech signal. Inverse perceptual weighting filtering of the speech signal is the inverse of the perceptual weighting filtering of the noisy speech signal; the specific algorithm is not repeated here.

As shown in Figure 5, which is a schematic structural diagram of a speech signal enhancement device 500 provided by an embodiment of the invention, the device comprises:

a perceptual weighting filtering module 510, configured to perform perceptual weighting filtering on the noisy speech signal after the noisy speech signal is obtained;

a spectral subtraction module 520, connected to the perceptual weighting filtering module 510 and configured to convert the noisy speech signal after perceptual weighting filtering to the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and convert the speech signal after spectral subtraction and phase synthesis to the time domain;

an inverse perceptual weighting filtering module 530, connected to the spectral subtraction module 520 and configured to perform inverse perceptual weighting filtering on the speech signal after spectral subtraction and phase synthesis to obtain the enhanced speech signal.

The perceptual weighting filtering module 510 is specifically configured to: raise the signal amplitude in the frequency bands of the noisy speech signal that are easily perceived by the human ear, and lower the signal amplitude in the frequency bands that are not easily perceived by the human ear.

The transfer function used by the perceptual weighting filtering module 510 to perform perceptual weighting filtering on the noisy speech signal is:

W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}},

0 < \gamma_2 < \gamma_1 \le 1

where γ₁ and γ₂ are weighting factors; α_i is the i-th LP (linear prediction) coefficient of the speech signal, extracted from the speech decoder bitstream; i is the index of the LP coefficient; and p is the order of the LP prediction, with 1 ≤ i ≤ p.
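The numerator and denominator of W(z) can be written out as coefficient arrays, which is the form most filtering routines expect. The sketch below only expands the two polynomials; the function name `weighting_coefficients` and the sample values of α_i, γ₁ and γ₂ are hypothetical and are not values given in the embodiment.

```python
import numpy as np

def weighting_coefficients(alpha, gamma1, gamma2):
    """Return (numerator, denominator) coefficients of W(z) in powers of z^-1."""
    alpha = np.asarray(alpha, dtype=float)
    if not (0.0 < gamma2 < gamma1 <= 1.0):
        raise ValueError("require 0 < gamma2 < gamma1 <= 1")
    i = np.arange(1, alpha.size + 1)
    num = np.concatenate(([1.0], -alpha * gamma1 ** i))  # A(z/gamma1)
    den = np.concatenate(([1.0], -alpha * gamma2 ** i))  # A(z/gamma2)
    return num, den

# Hypothetical 2nd-order LP coefficients, for illustration only.
num, den = weighting_coefficients([1.2, -0.5], gamma1=0.9, gamma2=0.6)
# num is [1, -1.08, 0.405] and den is [1, -0.72, 0.18]
```

Each coefficient α_i is scaled by the i-th power of the weighting factor, so higher-order terms are attenuated more strongly in the denominator (γ₂) than in the numerator (γ₁).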

When the perceptual weighting filtering module 510 performs perceptual weighting filtering on the noisy speech signal, the time-domain expression of the noisy speech signal is:

y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} x(k-i)
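The time-domain expression can be evaluated sample by sample as a recursion. The sketch below reads y(k) as the noisy input and x(k) as the weighted output, which is the reading that matches the transfer function W(z) = A(z/γ₁)/A(z/γ₂); the text here does not state which symbol is the input, so this is an assumption, as are the function name, the zero initial conditions, and the sample values.

```python
import numpy as np

def perceptual_weighting(y, alpha, gamma1, gamma2):
    """Solve the difference equation for x(k), treating y(k) as the input (an assumption)."""
    y = np.asarray(y, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    p = alpha.size
    i = np.arange(1, p + 1)
    b = alpha * gamma1 ** i   # weights on past inputs y(k - i)
    a = alpha * gamma2 ** i   # weights on past outputs x(k - i)
    x = np.zeros_like(y)
    for k in range(y.size):
        acc = y[k]
        # Samples before k = 0 are taken to be zero.
        for j in range(1, min(k, p) + 1):
            acc += -b[j - 1] * y[k - j] + a[j - 1] * x[k - j]
        x[k] = acc
    return x

# Impulse response for hypothetical 2nd-order LP coefficients.
impulse_response = perceptual_weighting([1.0, 0.0, 0.0], [1.2, -0.5], 0.9, 0.6)
# impulse_response is [1.0, -0.36, -0.0342]
```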

The spectral subtraction module 520 is specifically configured to: apply a small noise-reduction strength in the frequency bands of the noisy speech signal that are easily perceived by the human ear, and a large noise-reduction strength in the frequency bands that are not easily perceived by the human ear.

The inverse perceptual weighting filtering module 530 is specifically configured to: lower the signal amplitude in the frequency bands of the speech signal that are easily perceived by the human ear, and raise the signal amplitude in the frequency bands that are not easily perceived by the human ear.
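The chaining of modules 510, 520 and 530 can be sketched as a single function. This is an illustrative composition under several assumptions: the whole signal is treated as one frame, spectral subtraction is plain magnitude subtraction with the noisy phase, and the names `iir_filter` and `enhance`, the default weighting factors, and the LP coefficients are hypothetical.

```python
import numpy as np

def iir_filter(signal, num, den):
    """Direct-form IIR filter; num and den are in powers of z^-1 and den[0] must be 1."""
    signal = np.asarray(signal, dtype=float)
    out = np.zeros_like(signal)
    for k in range(signal.size):
        acc = 0.0
        for j in range(min(k + 1, len(num))):
            acc += num[j] * signal[k - j]
        for j in range(1, min(k + 1, len(den))):
            acc -= den[j] * out[k - j]
        out[k] = acc
    return out

def enhance(noisy, alpha, noise_mag, gamma1=0.9, gamma2=0.6):
    """Weighting filter -> spectral subtraction with phase synthesis -> inverse weighting."""
    alpha = np.asarray(alpha, dtype=float)
    i = np.arange(1, alpha.size + 1)
    num = np.concatenate(([1.0], -alpha * gamma1 ** i))   # A(z/gamma1)
    den = np.concatenate(([1.0], -alpha * gamma2 ** i))   # A(z/gamma2)
    weighted = iir_filter(noisy, num, den)                # role of module 510
    spectrum = np.fft.rfft(weighted)                      # role of module 520
    magnitude = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
    subtracted = np.fft.irfft(magnitude * np.exp(1j * np.angle(spectrum)), n=len(weighted))
    return iir_filter(subtracted, den, num)               # role of module 530

# Hypothetical input: 64 random samples standing in for a noisy speech frame.
noisy = np.random.default_rng(1).standard_normal(64)
passthrough = enhance(noisy, [1.2, -0.5], noise_mag=0.0)
enhanced = enhance(noisy, [1.2, -0.5], noise_mag=0.5)
```

With a zero noise estimate the chain returns its input, since the inverse weighting undoes the weighting; any change in the output therefore comes from the subtraction carried out in the weighted domain.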

With the method and device for enhancing a speech signal provided by the embodiments of the invention, perceptual weighting filtering is performed on the noisy speech signal in the time domain, so that the signal amplitude in the frequency bands easily perceived by the human ear is raised and the signal amplitude in the frequency bands not easily perceived by the human ear is lowered. The easily perceived bands then undergo a smaller noise reduction at the raised signal amplitude, and the bands not easily perceived undergo a larger noise reduction at the lowered signal amplitude, yielding the enhanced speech signal.

From the description of the above embodiments, those skilled in the art can clearly understand that the embodiments of the invention may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of the invention may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) and comprises instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the embodiments of the invention.

Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required to implement the invention.

Those skilled in the art will appreciate that the modules of the device in an embodiment may be distributed in the device as described in the embodiment, or may be changed accordingly and located in one or more devices different from that of the embodiment. The modules of the above embodiments may be merged into a single module or further split into a plurality of submodules.

The sequence numbers of the above embodiments of the invention are for description only and do not indicate the relative merits of the embodiments.

What is disclosed above is merely several specific embodiments of the invention. The embodiments of the invention are not limited thereto, however, and any variation that those skilled in the art can conceive of shall fall within the protection scope of the embodiments of the invention.

Claims

1. A method for enhancing a speech signal, characterized by comprising:

2. The method according to claim 1, characterized in that performing perceptual weighting filtering on the noisy speech signal comprises:

3. The method according to claim 1 or 2, characterized in that the transfer function used to perform perceptual weighting filtering on the noisy speech signal is:

0 < \gamma_2 < \gamma_1 \le 1

4. The method according to claim 1 or 2, characterized in that, when perceptual weighting filtering is performed on the noisy speech signal, the time-domain expression of the noisy speech signal is:

5. The method according to claim 1, characterized in that performing inverse perceptual weighting filtering on the speech signal after spectral subtraction and phase synthesis comprises:

6. A device for enhancing a speech signal, characterized by comprising:

wherein the spectral subtraction module is specifically configured to:

7. The device according to claim 6, characterized in that the perceptual weighting filtering module is specifically configured to:

8. The device according to claim 6 or 7, characterized in that the transfer function used by the perceptual weighting filtering module to perform perceptual weighting filtering on the noisy speech signal is:

0 < \gamma_2 < \gamma_1 \le 1

9. The device according to claim 6 or 7, characterized in that, when the perceptual weighting filtering module performs perceptual weighting filtering on the noisy speech signal, the time-domain expression of the noisy speech signal is:

10. The device according to claim 6, characterized in that the inverse perceptual weighting filtering module is specifically configured to: