CN102054482B - Method and device for enhancing voice signal - Google Patents

Publication number
CN102054482B
Authority
CN
China
Prior art keywords
perception
noisy speech
speech signal
voice signal
weighted filtering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009102369170A
Other languages
Chinese (zh)
Other versions
CN102054482A (en)
Inventor
刘霖
田康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd
Priority to CN2009102369170A
Publication of CN102054482A
Application granted
Publication of CN102054482B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

An embodiment of the invention discloses a method and a device for enhancing a voice signal. The method comprises: obtaining a noisy speech signal and applying perceptual weighting filtering to it; transforming the perceptually weighted noisy speech signal to the frequency domain, performing spectral subtraction and phase synthesis on it in the frequency domain, and transforming the resulting voice signal to the time domain; and applying inverse perceptual weighting filtering to the voice signal obtained after spectral subtraction and phase synthesis to obtain an enhanced voice signal. Because the noisy speech signal is perceptually weighted before spectral subtraction, noise interference in the noisy speech signal is effectively reduced, and the enhanced voice signal suits the characteristics of human hearing.

Description

Method and device for enhancing a voice signal
Technical field
The present invention relates to the field of communication technology, and in particular to a method and device for enhancing a voice signal.
Background art
With the development of 3G (3rd Generation mobile communication systems), video telephony services have come into wide use. In addition to basic voice communication, a video call lets both parties see each other's surroundings, which improves the user experience. During a video call, so that the camera can capture a real-time image of each party, both parties must keep some distance from the handset microphone. As a result, the speech signal picked up by the microphone contains a large amount of noise, which lowers the signal-to-noise ratio (SNR) of the call signal and degrades the speech quality of the video call.
In the prior art, to reduce the effect of noise on speech quality, the noisy speech signal is transformed to the frequency domain by a Fourier transform and spectral subtraction is applied there: the magnitude spectrum of the noise is subtracted from the magnitude spectrum of the noisy speech to obtain the magnitude spectrum of the clean speech. The principle is as follows.
The noisy speech model is:
$$y(n) = s(n) + d(n) \tag{1}$$
where y(n) is the noisy speech, s(n) is the clean speech, and d(n) is the added noise.
Taking the Fourier transform of both sides of formula (1) gives:
$$Y(k) = S(k) + D(k) \tag{2}$$
where Y(k) is the Fourier coefficient of the noisy speech, S(k) is the Fourier coefficient of the clean speech, and D(k) is the Fourier coefficient of the noise.
Ignoring the phase difference between the noisy speech and the clean speech gives:
$$|Y(k)| = |S(k)| + |D(k)| \tag{3}$$
Because the human ear is insensitive to phase information, the magnitude spectrum of the noise can be subtracted directly from the magnitude spectrum of the noisy speech to obtain the magnitude spectrum of the clean speech, which is then used as the magnitude spectrum of the enhanced speech. The basic expression is therefore:
$$|\hat{S}(k)| = |Y(k)| - |D(k)| \tag{4}$$
In practice, an improved form of spectral subtraction is more commonly used. It is given by formula (5):
$$|\hat{S}(k)| = \left[\, |Y(k)|^{\alpha} - \beta\, |D(k)|^{\alpha} \,\right]^{1/\alpha} \tag{5}$$
The improved algorithm differs from ordinary spectral subtraction in that it introduces the two parameters α and β, which give the algorithm considerable flexibility. The system principle of spectral subtraction applied to noisy speech is shown in Fig. 1.
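As a rough illustration of formula (5), the following sketch applies the subtraction bin by bin to lists of magnitudes. The function name, the sample values, and the clamping of negative differences to zero are assumptions added for illustration; the source does not say how negative differences are handled.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, beta=1.0):
    """Formula (5) per bin: |S^| = (|Y|**alpha - beta * |D|**alpha) ** (1/alpha)."""
    cleaned = []
    for y, d in zip(noisy_mag, noise_mag):
        diff = y ** alpha - beta * d ** alpha
        # Assumption (not in the source): clamp negative differences to zero
        # so that the 1/alpha root is always defined.
        cleaned.append(max(diff, 0.0) ** (1.0 / alpha))
    return cleaned


# Hypothetical magnitudes for three frequency bins.
noisy = [4.0, 3.0, 1.0]
noise = [1.0, 1.0, 2.0]

basic = spectral_subtract(noisy, noise)              # alpha = beta = 1
power = spectral_subtract(noisy, noise, alpha=2.0)   # subtraction of squared magnitudes
print(basic, power)
```

With α = β = 1 the function reduces to plain magnitude subtraction. A larger β subtracts more of the noise estimate, which is the knob that the schemes discussed in this section try to tune.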
However, using spectral subtraction to reduce noise has the following technical defect in the prior art: when spectral subtraction is applied to noisy speech, the noise spectrum and the speech spectrum cannot be distinguished accurately. The algorithm therefore also attenuates the speech spectrum considerably while reducing noise, which impairs the listener's perception of the normal speech spectrum.
To address the speech attenuation that occurs when spectral subtraction is used for speech enhancement, many improvements to the existing algorithm have been made. They optimize enhancement performance by adjusting the strength of noise reduction in the subtraction.
Scheme 1: averages are computed from the probability characteristics of the noisy speech spectrum and of the noise spectrum, and are used to control the strength with which the noise magnitude is reduced.
Scheme 2: the values α=1, β=2 of the traditional spectral subtraction algorithm are changed to α=2, β=5 to obtain an improved spectral subtraction method, in which coefficients obtained by training control the strength of noise reduction.
In implementing scheme 1 and scheme 2, the applicant found the following. Scheme 1 is complex to implement, and its noise reduction control is based on probability distributions rather than on the characteristics of human hearing, so the perceived result is not very satisfactory. In scheme 2, a pair of α and β values that work well is obtained through extensive experiments, but because the application environment changes constantly, this approach may work well in one environment and still give unsatisfactory noise reduction control when the environment changes substantially. The main problem with optimizing enhancement performance by adjusting the noise reduction strength in spectral subtraction is therefore this: the noise spectrum can only be estimated, and the estimate is not accurate enough, so poorly controlled attenuation during spectral subtraction may remove too much of the speech spectrum along with the noise, and the enhanced voice signal then fails to meet the needs of human hearing.
Summary of the invention
Embodiments of the present invention provide a method and device for enhancing a voice signal, which produce an enhanced voice signal that also suits the characteristics of human hearing.
An embodiment of the present invention provides a method for enhancing a voice signal, comprising:
obtaining a noisy speech signal and applying perceptual weighting filtering to the noisy speech signal;
transforming the perceptually weighted noisy speech signal to the frequency domain, performing spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transforming the resulting voice signal to the time domain;
applying inverse perceptual weighting filtering to the voice signal obtained after spectral subtraction and phase synthesis, to obtain the enhanced voice signal;
wherein performing spectral subtraction and phase synthesis on the noisy speech signal comprises:
applying a small noise reduction strength in the frequency bands of the noisy speech signal that the human ear perceives easily, and a large noise reduction strength in the frequency bands that the human ear does not perceive easily.
Preferably, applying perceptual weighting filtering to the noisy speech signal comprises:
raising the signal amplitude in the frequency bands of the noisy speech signal that the human ear perceives easily, and lowering the signal amplitude in the frequency bands that the human ear does not perceive easily.
Preferably, the transfer function used for the perceptual weighting filtering of the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1$$
where γ₁ and γ₂ are weighting factors, αᵢ are the linear prediction coefficients of the voice signal extracted from the speech decoder bitstream, i is the index of the linear prediction coefficients, and p is the order of the linear prediction, 1≤i≤p.
Preferably, when perceptual weighting filtering is applied to the noisy speech signal, the time-domain expression of the noisy speech signal is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
where x(k) is the voice signal before perceptual weighting, y(k) is the voice signal after perceptual weighting, γ₁ and γ₂ are weighting factors, αᵢ are the linear prediction coefficients of the voice signal extracted from the speech decoder bitstream, i is the index of the linear prediction coefficients, and p is the order of the linear prediction, 1≤i≤p.
Preferably, applying inverse perceptual weighting filtering to the voice signal obtained after spectral subtraction and phase synthesis comprises:
lowering the signal amplitude in the frequency bands that the human ear perceives easily, and raising the signal amplitude in the frequency bands that the human ear does not perceive easily, in the voice signal obtained after spectral subtraction and phase synthesis.
An embodiment of the present invention provides a device for enhancing a voice signal, comprising:
a perceptual weighting filtering module, configured to apply perceptual weighting filtering to a noisy speech signal after the noisy speech signal is obtained;
a spectral subtraction module, configured to transform the perceptually weighted noisy speech signal to the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transform the resulting voice signal to the time domain;
an inverse perceptual weighting filtering module, configured to apply inverse perceptual weighting filtering to the voice signal obtained after spectral subtraction and phase synthesis, to obtain the enhanced voice signal;
wherein the spectral subtraction module is specifically configured to:
apply a small noise reduction strength in the frequency bands of the noisy speech signal that the human ear perceives easily, and a large noise reduction strength in the frequency bands that the human ear does not perceive easily.
Preferably, the perceptual weighting filtering module is specifically configured to:
raise the signal amplitude in the frequency bands of the noisy speech signal that the human ear perceives easily, and lower the signal amplitude in the frequency bands that the human ear does not perceive easily.
Preferably, the transfer function used by the perceptual weighting filtering module for the perceptual weighting filtering of the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1$$
where γ₁ and γ₂ are weighting factors, αᵢ are the linear prediction coefficients of the voice signal extracted from the speech decoder bitstream, i is the index of the linear prediction coefficients, and p is the order of the linear prediction, 1≤i≤p.
Preferably, when the perceptual weighting filtering module applies perceptual weighting filtering to the noisy speech signal, the time-domain expression of the noisy speech signal is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
where x(k) is the voice signal before perceptual weighting, y(k) is the voice signal after perceptual weighting, γ₁ and γ₂ are weighting factors, αᵢ are the linear prediction coefficients of the voice signal extracted from the speech decoder bitstream, i is the index of the linear prediction coefficients, and p is the order of the linear prediction, 1≤i≤p.
Preferably, the inverse perceptual weighting filtering module is specifically configured to:
lower the signal amplitude in the frequency bands that the human ear perceives easily, and raise the signal amplitude in the frequency bands that the human ear does not perceive easily, in the voice signal obtained after spectral subtraction and phase synthesis.
Compared with the prior art, the embodiments of the invention have the following advantages. Perceptual weighting filtering is applied to the noisy speech signal in the time domain, raising the signal amplitude in the frequency bands that the human ear perceives easily and lowering it in the bands that the ear does not perceive easily. After the weighting, spectral subtraction and phase synthesis are performed in the frequency domain, so that the noise reduction strength is smaller in the easily perceived bands and larger in the bands that are not easily perceived. The enhanced voice signal thus suits the characteristics of human hearing.
Description of drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show merely some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the system principle of spectral subtraction applied to noisy speech in the prior art;
Fig. 2 is a schematic flowchart of a method for enhancing a voice signal provided by an embodiment of the invention;
Fig. 3 is a schematic diagram of the system principle of voice signal enhancement provided by an embodiment of the invention;
Fig. 4 is a schematic flowchart of a method for enhancing a voice signal in an application scenario of the invention;
Fig. 5 is a schematic structural diagram of a device for enhancing a voice signal provided by an embodiment of the invention.
Detailed description of the embodiments
In the embodiments of the invention, perceptual weighting filtering is applied to the noisy speech signal, raising the signal amplitude in the frequency bands that the human ear perceives easily and lowering it in the bands that the ear does not perceive easily. After the perceptual weighting filtering, the noisy speech signal is transformed to the frequency domain by a Fourier transform and spectral subtraction is applied there, with weaker noise reduction in the easily perceived bands and stronger noise reduction in the bands that are not easily perceived. After spectral subtraction, the voice signal is transformed back to the time domain by an inverse Fourier transform and then passed through inverse perceptual weighting filtering to obtain the enhanced voice signal.
Fig. 2 is a schematic flowchart of a method for enhancing a voice signal provided by an embodiment of the invention. The method comprises the following steps:
Step 201: obtain a noisy speech signal and apply perceptual weighting filtering to it.
After the call begins, the noisy speech signal is obtained and perceptual weighting filtering is applied to it in the time domain. The filtering raises the signal amplitude in the frequency bands that the human ear perceives easily and lowers it in the bands that the ear does not perceive easily.
Step 202: transform the perceptually weighted noisy speech signal to the frequency domain, perform spectral subtraction and phase synthesis on it there, and transform the resulting voice signal to the time domain.
After perceptual weighting filtering, the noisy speech signal is transformed to the frequency domain by a Fourier transform, spectral subtraction and phase synthesis are performed in the frequency domain, and the resulting voice signal is transformed to the time domain by an inverse Fourier transform.
Specifically, the spectral subtraction and phase synthesis apply a small noise reduction strength in the frequency bands that the human ear perceives easily and a large noise reduction strength in the bands that the ear does not perceive easily.
Step 203: apply inverse perceptual weighting filtering to the voice signal obtained after spectral subtraction and phase synthesis, to obtain the enhanced voice signal.
The voice signal obtained after spectral subtraction and phase synthesis is passed through inverse perceptual weighting filtering, which lowers the signal amplitude in the frequency bands that the human ear perceives easily and raises it in the bands that the ear does not perceive easily, yielding the enhanced voice signal. In the enhanced voice signal, noise reduction is weaker in the easily perceived bands and stronger in the bands that are not easily perceived, so the enhancement is pronounced and suits the characteristics of human hearing.
With the method provided by the embodiments of the invention, the bands that the ear perceives easily undergo weaker noise reduction at a raised signal amplitude, and the bands that the ear does not perceive easily undergo stronger noise reduction at a lowered signal amplitude, yielding the enhanced voice signal.
For a clearer understanding of the technical solution of the invention, the perceptual weighting filtering of the noisy speech signal is described in detail below.
In implementing the invention, before spectral subtraction, perceptual weighting filtering is applied to the noisy speech signal in the time domain. The weighted signal is transformed to the frequency domain by a Fourier transform, spectral subtraction and phase synthesis are performed there, the result is transformed back to the time domain by an inverse Fourier transform, and inverse perceptual weighting filtering then yields the enhanced voice signal.
Specifically, the perceptual weighting filtering is performed by constructing a perceptual weighting filter and filtering the noisy speech signal with it, which transforms the voice signal into the perceptually weighted domain.
In the embodiments of the invention, the transfer function of the perceptual weighting filter is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1 \tag{6}$$
where γ₁ and γ₂ are two weighting factors whose optimal values can be obtained through extensive training; αᵢ are the LP (linear prediction) coefficients of the voice signal, i is the index of the LP coefficients, and p is the LP order, 1≤i≤p.
Because most current video telephony systems use hybrid coding based on linear prediction, the LP coefficients can be read from the speech decoder bitstream, which reduces the complexity of implementing perceptual weighting.
After perceptual weighting filtering, the time-domain expression of the noisy speech signal is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i) \tag{7}$$
where x(k) is the voice signal before perceptual weighting, y(k) is the voice signal after perceptual weighting, γ₁ and γ₂ are weighting factors, αᵢ are the LP coefficients of the voice signal extracted from the speech decoder bitstream, i is the index of the LP coefficients, and p is the LP order, 1≤i≤p.
After perceptual weighting filtering, the signal amplitude in the frequency bands that the human ear perceives easily is raised and the signal amplitude in the bands that the ear does not perceive easily is lowered. Consequently, during the subsequent spectral subtraction and phase synthesis, noise reduction is weaker in the easily perceived bands and stronger in the bands that are not easily perceived.
Specifically, during spectral subtraction, the easily perceived bands undergo weaker noise reduction at the raised signal amplitude, which lessens the effect on the voice signal when noise is reduced in these bands. Conversely, the bands that the ear does not perceive easily undergo stronger noise reduction at the lowered signal amplitude, which increases the noise reduction in those bands and suits the characteristics of human hearing.
After spectral subtraction of the noisy speech, inverse perceptual weighting filtering lowers the signal amplitude in the easily perceived bands and raises it in the bands that are not easily perceived, in the voice signal obtained after spectral subtraction and phase synthesis, which completes the enhancement of the noisy speech signal. Inverse perceptual weighting filtering is the inverse of perceptual weighting filtering, and its specific algorithm is not repeated here.
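The inverse relationship between the weighting filter and the inverse weighting filter can be sketched as follows. This is a minimal illustration that implements the transfer function of formula (6) directly, so γ₁ weights the numerator (input) terms and γ₂ the denominator (output) terms, and that obtains the inverse filter by swapping numerator and denominator. The LP coefficients, weighting factors, sample values, and function names are hypothetical.

```python
def weighted_coeffs(lpc, gamma):
    """Coefficients of A(z/gamma) = 1 - sum_i a_i * gamma**i * z**-i."""
    return [1.0] + [-a * gamma ** (i + 1) for i, a in enumerate(lpc)]


def iir_filter(b, a, x):
    """Direct-form filter with zero initial state:
    a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


lpc = [1.2, -0.5]        # hypothetical LP coefficients (p = 2)
g1, g2 = 0.9, 0.6        # hypothetical weighting factors, 0 < g2 < g1 <= 1
num = weighted_coeffs(lpc, g1)   # A(z/gamma1)
den = weighted_coeffs(lpc, g2)   # A(z/gamma2)

x = [1.0, 0.5, -0.3, 0.8, 0.0, -0.6]        # hypothetical input samples
weighted = iir_filter(num, den, x)          # apply W(z)
restored = iir_filter(den, num, weighted)   # apply 1/W(z)
print(restored)
```

In the full method the spectral subtraction stage sits between the two filter calls, so the restored signal differs from the input by the noise that was removed.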
Based on the above principle of perceptual weighting filtering of the noisy speech signal, the invention provides a system for voice signal enhancement, whose principle is shown schematically in Fig. 3. As Fig. 3 shows, when the call begins the noisy speech signal is obtained and perceptual weighting filtering transforms it into the perceptually weighted domain. A Fourier transform then takes it to the frequency domain, where spectral subtraction and phase synthesis are performed. The resulting voice signal is transformed to the time domain by an inverse Fourier transform and then passed through inverse perceptual weighting filtering to obtain the enhanced voice signal. The inverse perceptual weighting filtering applied after spectral subtraction and phase synthesis is the inverse of the perceptual weighting filtering applied to the noisy speech.
It should be noted that, in the embodiments of the invention, the noise spectrum used in spectral subtraction is an empirical estimate. The optimal spectral range of the noise spectrum is obtained through extensive training and can be adjusted moderately according to the spectral intensity of the noisy speech signal, so that spectral subtraction effectively removes the noise in the noisy speech signal and yields the enhanced voice signal. Variations of the noise spectrum across environments are not described further here.
The technical solution of the invention is described in detail below with reference to a specific application scenario. Fig. 4 is a schematic flowchart of a method for enhancing a voice signal in an application scenario of the invention. The method comprises the following steps:
Step 401: the call begins and the noisy speech signal is obtained.
When a video call begins, each party can see video of the other's surroundings, and both keep some distance from the handset microphone to make video capture easier. The signal picked up by the microphone therefore contains a large amount of noise in addition to the parties' speech. In the embodiments of the invention, this noise-laden voice signal is called the noisy speech signal. The noise lowers the SNR of the call signal and degrades the speech quality of the video call, so noise reduction is applied to the obtained noisy speech signal to improve the SNR. The noisy speech model in the embodiments of the invention is given by formula (1).
The noisy speech model is:
$$y(n) = s(n) + d(n) \tag{1}$$
where y(n) is the noisy speech, s(n) is the clean speech, and d(n) is the added noise.
It should be noted that reducing the noise in a voice signal by perceptual weighting filtering and spectral subtraction, as in the embodiments of the invention, is not limited to video calls. It can also be used for noise reduction in other forms of communication, such as ordinary telephony and videophones. Such variations of the application scenario are not described further here.
Step 402: obtain the LP parameters from the decoded stream and construct the perceptual weighting filter.
Before spectral subtraction, perceptual weighting filtering is applied to the noisy speech signal by means of a perceptual weighting filter. In the invention, the transfer function of the perceptual weighting filter is given by formula (6):
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1 \tag{6}$$
where γ₁ and γ₂ are two weighting factors whose optimal values can be obtained through extensive training; αᵢ are the LP coefficients of the voice signal, i is the index of the LP coefficients, and p is the LP order, 1≤i≤p.
Because most current video telephony systems use hybrid coding based on linear prediction, the LP coefficients can be read directly from the decoded stream. The perceptual weighting filter is constructed from the weighting factors γ₁ and γ₂ and the LP parameters obtained from the decoded stream.
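To see what a given pair of weighting factors does to the spectrum, the magnitude response of formula (6) can be evaluated on the unit circle. The sketch below is illustrative only: the LP coefficients and the values of γ₁ and γ₂ are hypothetical rather than read from any decoder, and the function names are not taken from the source.

```python
import cmath


def weighted_coeffs(lpc, gamma):
    """Coefficients of A(z/gamma) = 1 - sum_i a_i * gamma**i * z**-i."""
    return [1.0] + [-a * gamma ** (i + 1) for i, a in enumerate(lpc)]


def weighting_response(lpc, gamma1, gamma2, omega):
    """Evaluate |W(e^{j*omega})| for W(z) = A(z/gamma1) / A(z/gamma2)."""
    z_inv = cmath.exp(-1j * omega)
    num = sum(c * z_inv ** k for k, c in enumerate(weighted_coeffs(lpc, gamma1)))
    den = sum(c * z_inv ** k for k, c in enumerate(weighted_coeffs(lpc, gamma2)))
    return abs(num / den)


lpc = [1.2, -0.5]   # hypothetical LP coefficients (p = 2)
# Gain at a few normalized frequencies between 0 and pi (radians per sample).
gains = [weighting_response(lpc, 0.9, 0.6, w) for w in (0.0, 1.0, 2.0, 3.0)]
print(gains)
```

Plotting these gains against frequency for candidate values of γ₁ and γ₂ is one way to check which bands a chosen pair raises and which it lowers before committing to it.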
Step 403, Noisy Speech Signal is carried out perception weighted filtering handle.
Noisy Speech Signal y (n)=s (n)+d (n) is carried out perception weighted filtering through perceptual weighting filter handle, Noisy Speech Signal is when carrying out the perception weighted filtering processing, and its Noisy Speech Signal is the time domain Noisy Speech Signal.Noisy Speech Signal carries out through perceptual weighting filter that its form is a time domain after the filtering, and its expression formula in time domain is:
y ( k ) - Σ i = 1 p α i γ 1 i y ( k - i ) = x ( k ) - Σ i = 1 p α i γ 2 i x ( k - i ) Formula (7)
Wherein, x (k) represents the Noisy Speech Signal before the perceptual weighting, and y (k) representative is through the Noisy Speech Signal after the perceptual weighting, γ 1And γ 2Be weighting factor, α iBe the LP predictive coefficient of the voice signal that from Voice decoder device code stream, extracts, i is the subscript of LP predictive coefficient, and p is the exponent number of LP predictive coefficient, 1≤i≤p.
Perceptual weighting filtering raises the signal amplitude in the frequency bands of the noisy speech signal that the human ear perceives easily, and lowers the signal amplitude in the frequency bands that the human ear does not perceive easily.
Step 404: apply the Fourier transform to the noisy speech signal to transform it into the frequency domain.
After perceptual weighting filtering, the noisy speech signal is converted by the Fourier transform from a time-domain function into a frequency-domain function, and the spectral subtraction algorithm is carried out on the noisy speech signal in the frequency domain. The details of transforming the noisy speech signal into the frequency domain by the Fourier transform are not repeated here.
Step 405: perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain.
After perceptual weighting filtering and the Fourier transform, spectral subtraction and phase synthesis are performed on the noisy speech in the frequency domain. At this point, the signal amplitude in the frequency bands easily perceived by the human ear has been raised, and the signal amplitude in the frequency bands not easily perceived by the human ear has been lowered.
When spectral subtraction and phase synthesis are performed on the noisy speech, a smaller noise reduction strength is used in the frequency bands easily perceived by the human ear, and a larger noise reduction strength is used in the frequency bands not easily perceived. Specifically, the easily perceived bands undergo weaker noise reduction at their raised signal amplitude, which reduces the effect of noise reduction on the speech signal in those bands; the bands not easily perceived undergo stronger noise reduction at their lowered signal amplitude, which increases the noise reduction strength in those bands.
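Steps 404 to 406 can be illustrated for a single frame as sketched below. Several assumptions are made that the text above does not specify: a basic magnitude subtraction with a zero floor stands in for the spectral subtraction rule, the per-bin noise magnitude estimate `noise_mag` is taken as given, phase synthesis is taken to mean reusing the phase of the noisy spectrum, and a naive DFT is used to keep the example self-contained. All names are hypothetical.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of a real or complex frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(frame, noise_mag, floor=0.0):
    """Subtract a noise magnitude estimate per bin, keep the noisy phase,
    and return the time-domain frame."""
    cleaned = []
    for bin_value, noise in zip(dft(frame), noise_mag):
        magnitude = max(abs(bin_value) - noise, floor)  # spectral subtraction
        phase = cmath.phase(bin_value)                  # phase of the noisy bin
        cleaned.append(cmath.rect(magnitude, phase))    # phase synthesis
    return idft(cleaned)


# Hypothetical 4-sample frame (already perceptually weighted) and noise estimate.
frame = [1.0, 0.0, -1.0, 0.0]
noise_mag = [1.0, 1.0, 1.0, 1.0]
enhanced = spectral_subtract(frame, noise_mag)
print(enhanced)
```

Because the frame has already been perceptually weighted, the same subtraction removes a smaller share of the raised bands and a larger share of the lowered bands, which is the band-dependent noise reduction strength described above.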
Step 406: transform the speech signal on which the spectral subtraction algorithm has been completed into the time domain by the inverse Fourier transform.
After spectral subtraction is completed, the noisy speech signal is transformed into the time domain by the inverse Fourier transform. The inverse Fourier transform is the inverse operation of the Fourier transform, and the details of the transformation are not repeated here.
Step 407: perform inverse perceptual weighting filtering on the signal transformed into the time domain to obtain the enhanced speech signal.
The speech signal obtained after spectral subtraction and phase synthesis is passed through inverse perceptual weighting filtering, which lowers the signal amplitude in the frequency bands easily perceived by the human ear and raises the signal amplitude in the frequency bands not easily perceived, yielding the enhanced speech signal. Inverse perceptual weighting filtering of the speech signal is the inverse of the perceptual weighting filtering applied to the noisy speech signal, and the specific algorithm is not repeated here.
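Since the inverse filtering is not spelled out, one way to picture it is to swap the two polynomials of the weighting filter, as in the sketch below. This is an assumption-laden illustration: the polynomial assignment follows formula (7) as written, the samples before k = 0 are taken as zero, and the function names and example values are hypothetical.

```python
def iir_filter(b, a, x):
    """Direct-form filter: a[0]*y(k) = sum_j b[j]*x(k-j) - sum_{j>=1} a[j]*y(k-j)."""
    y = []
    for k in range(len(x)):
        acc = sum(b[j] * x[k - j] for j in range(len(b)) if k - j >= 0)
        acc -= sum(a[j] * y[k - j] for j in range(1, len(a)) if k - j >= 0)
        y.append(acc / a[0])
    return y


def weighted_poly(alpha, gamma):
    """Coefficients of 1 - sum_i alpha_i * gamma**i * z**-i."""
    return [1.0] + [-c * gamma ** i for i, c in enumerate(alpha, start=1)]


# Hypothetical LP coefficients and weighting factors.
alpha, gamma1, gamma2 = [0.5, -0.25], 0.9, 0.6
p1 = weighted_poly(alpha, gamma1)  # polynomial on the y(k) side of formula (7)
p2 = weighted_poly(alpha, gamma2)  # polynomial on the x(k) side of formula (7)

x = [1.0, -0.5, 0.25, 0.0, 0.75, -1.0]
weighted = iir_filter(p2, p1, x)         # perceptual weighting
restored = iir_filter(p1, p2, weighted)  # inverse weighting: polynomials swapped
print(restored)
```

With nothing done between the two filters, the second filter undoes the first, so `restored` matches `x` up to rounding; in the method itself the spectral subtraction sits between the two filtering stages.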
With the speech signal enhancement method provided by the embodiment of the invention, the frequency bands of the noisy speech signal that are easily perceived by the human ear undergo weaker noise reduction at their raised signal amplitude, and the frequency bands not easily perceived by the human ear undergo stronger noise reduction at their lowered signal amplitude, yielding the enhanced speech signal.
Figure 5 is a schematic structural diagram of a speech signal enhancement device 500 provided by an embodiment of the invention. The device comprises:
a perceptual weighting filtering module 510, configured to perform perceptual weighting filtering on a noisy speech signal after the noisy speech signal is obtained;
a spectral subtraction module 520, connected to the perceptual weighting filtering module 510 and configured to transform the perceptually weighted noisy speech signal into the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transform the speech signal obtained after spectral subtraction and phase synthesis into the time domain;
an inverse perceptual weighting filtering module 530, connected to the spectral subtraction module 520 and configured to perform inverse perceptual weighting filtering on the speech signal obtained after spectral subtraction and phase synthesis, to obtain the enhanced speech signal.
The perceptual weighting filtering module 510 is specifically configured to raise the signal amplitude in the frequency bands of the noisy speech signal that are easily perceived by the human ear, and to lower the signal amplitude in the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
The transfer function used by the perceptual weighting filtering module 510 to perform perceptual weighting filtering on the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1$$
where $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the LP coefficients of the speech signal extracted from the speech decoder bitstream, $i$ is the index of the LP coefficient, and $p$ is the order of the LP coefficients, with $1 \le i \le p$.
When the perceptual weighting filtering module 510 performs perceptual weighting filtering on the noisy speech signal, the time-domain expression of the noisy speech signal is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
where $x(k)$ denotes the speech signal before perceptual weighting, $y(k)$ denotes the speech signal after perceptual weighting, $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the LP coefficients of the speech signal extracted from the speech decoder bitstream, $i$ is the index of the LP coefficient, and $p$ is the order of the LP coefficients, with $1 \le i \le p$.
The spectral subtraction module 520 is specifically configured to apply a small noise reduction strength to the frequency bands of the noisy speech signal that are easily perceived by the human ear, and a large noise reduction strength to the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
The inverse perceptual weighting filtering module 530 is specifically configured to lower the signal amplitude in the frequency bands of the speech signal that are easily perceived by the human ear, and to raise the signal amplitude in the frequency bands that are not easily perceived by the human ear.
With the method and device for speech signal enhancement provided by the embodiments of the invention, perceptual weighting filtering is applied to the noisy speech signal in the time domain, raising the signal amplitude in the frequency bands easily perceived by the human ear and lowering the signal amplitude in the frequency bands not easily perceived. The easily perceived bands then undergo weaker noise reduction at their raised signal amplitude, and the bands not easily perceived undergo stronger noise reduction at their lowered signal amplitude, yielding the enhanced speech signal.
From the description of the above embodiments, those skilled in the art can clearly understand that the embodiments of the invention can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the embodiments of the invention can be embodied as a software product. The software product can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) and includes a number of instructions that cause a computer device (such as a personal computer, a server, or a network device) to carry out the methods described in the embodiments of the invention.
Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the modules or processes in the drawings are not necessarily required for implementing the invention.
Those skilled in the art will appreciate that the modules of the device in an embodiment may be distributed in the device as described in the embodiment, or may be changed accordingly and located in one or more devices different from that of the present embodiment. The modules of the above embodiments may be merged into one module, or further split into multiple submodules.
The sequence numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
The above discloses only several specific embodiments of the invention. The embodiments of the invention are not limited thereto, however, and any variation that those skilled in the art can conceive of shall fall within the protection scope of the embodiments of the invention.

Claims (10)

1. A method for enhancing a speech signal, characterized by comprising:
obtaining a noisy speech signal, and performing perceptual weighting filtering on the noisy speech signal;
transforming the perceptually weighted noisy speech signal into the frequency domain, performing spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transforming the speech signal obtained after the spectral subtraction and phase synthesis into the time domain;
performing inverse perceptual weighting filtering on the speech signal obtained after the spectral subtraction and phase synthesis, to obtain an enhanced speech signal;
wherein performing spectral subtraction and phase synthesis on the noisy speech signal comprises:
applying a small noise reduction strength to the frequency bands of the noisy speech signal that are easily perceived by the human ear, and a large noise reduction strength to the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
2. The method according to claim 1, characterized in that performing perceptual weighting filtering on the noisy speech signal comprises:
raising the signal amplitude in the frequency bands of the noisy speech signal that are easily perceived by the human ear, and lowering the signal amplitude in the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
3. The method according to claim 1 or 2, characterized in that the transfer function used to perform perceptual weighting filtering on the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1$$
where $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the linear prediction coefficients of the speech signal extracted from the speech decoder bitstream, $i$ is the index of the linear prediction coefficient, and $p$ is the order of the linear prediction coefficients, with $1 \le i \le p$.
4. The method according to claim 1 or 2, characterized in that, when perceptual weighting filtering is performed on the noisy speech signal, the time-domain expression of the noisy speech signal is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
where $x(k)$ denotes the speech signal before perceptual weighting, $y(k)$ denotes the speech signal after perceptual weighting, $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the linear prediction coefficients of the speech signal extracted from the speech decoder bitstream, $i$ is the index of the linear prediction coefficient, and $p$ is the order of the linear prediction coefficients, with $1 \le i \le p$.
5. the method for claim 1 is characterized in that, the voice signal to said spectral substraction and phase place after synthetic carries out contrary perception weighted filtering to be handled, and comprising:
The frequency band signals amplitude of people Er Yi perception reduces in the voice signal with said spectral substraction and phase place after synthetic, and the frequency band signals amplitude that people's ear is difficult for perception raises.
6. A device for enhancing a speech signal, characterized by comprising:
a perceptual weighting filtering module, configured to perform perceptual weighting filtering on a noisy speech signal after the noisy speech signal is obtained;
a spectral subtraction module, configured to transform the perceptually weighted noisy speech signal into the frequency domain, perform spectral subtraction and phase synthesis on the noisy speech signal in the frequency domain, and transform the speech signal obtained after the spectral subtraction and phase synthesis into the time domain;
an inverse perceptual weighting filtering module, configured to perform inverse perceptual weighting filtering on the speech signal obtained after the spectral subtraction and phase synthesis, to obtain an enhanced speech signal;
wherein the spectral subtraction module is specifically configured to:
apply a small noise reduction strength to the frequency bands of the noisy speech signal that are easily perceived by the human ear, and a large noise reduction strength to the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
7. The device according to claim 6, characterized in that the perceptual weighting filtering module is specifically configured to:
raise the signal amplitude in the frequency bands of the noisy speech signal that are easily perceived by the human ear, and lower the signal amplitude in the frequency bands of the noisy speech signal that are not easily perceived by the human ear.
8. The device according to claim 6 or 7, characterized in that the transfer function used by the perceptual weighting filtering module to perform perceptual weighting filtering on the noisy speech signal is:
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{p} \alpha_i \gamma_1^{i} z^{-i}}{1 - \sum_{i=1}^{p} \alpha_i \gamma_2^{i} z^{-i}}, \quad 0 < \gamma_2 < \gamma_1 \le 1$$
where $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the linear prediction coefficients of the speech signal extracted from the speech decoder bitstream, $i$ is the index of the linear prediction coefficient, and $p$ is the order of the linear prediction coefficients, with $1 \le i \le p$.
9. The device according to claim 6 or 7, characterized in that, when the perceptual weighting filtering module performs perceptual weighting filtering on the noisy speech signal, the time-domain expression of the noisy speech signal is:
$$y(k) - \sum_{i=1}^{p} \alpha_i \gamma_1^{i}\, y(k-i) = x(k) - \sum_{i=1}^{p} \alpha_i \gamma_2^{i}\, x(k-i)$$
where $x(k)$ denotes the speech signal before perceptual weighting, $y(k)$ denotes the speech signal after perceptual weighting, $\gamma_1$ and $\gamma_2$ are weighting factors, $\alpha_i$ are the linear prediction coefficients of the speech signal extracted from the speech decoder bitstream, $i$ is the index of the linear prediction coefficient, and $p$ is the order of the linear prediction coefficients, with $1 \le i \le p$.
10. The device according to claim 6, characterized in that the inverse perceptual weighting filtering module is specifically configured to:
lower the signal amplitude in the frequency bands easily perceived by the human ear in the speech signal obtained after the spectral subtraction and phase synthesis, and raise the signal amplitude in the frequency bands not easily perceived by the human ear.
CN2009102369170A 2009-10-27 2009-10-27 Method and device for enhancing voice signal Expired - Fee Related CN102054482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102369170A CN102054482B (en) 2009-10-27 2009-10-27 Method and device for enhancing voice signal


Publications (2)

Publication Number Publication Date
CN102054482A CN102054482A (en) 2011-05-11
CN102054482B true CN102054482B (en) 2012-11-28

Family

ID=43958736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102369170A Expired - Fee Related CN102054482B (en) 2009-10-27 2009-10-27 Method and device for enhancing voice signal

Country Status (1)

Country Link
CN (1) CN102054482B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104867497A (en) * 2014-02-26 2015-08-26 北京信威通信技术股份有限公司 Voice noise-reducing method
CN113506577A (en) * 2021-06-25 2021-10-15 贵州电网有限责任公司 Method for perfecting voiceprint library based on incremental acquisition of telephone recording

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1496032A (en) * 1999-06-09 2004-05-12 Noise silencer
CN101140758A (en) * 2006-09-06 2008-03-12 华为技术有限公司 Perception weighting filtering wave method and perception weighting filter thereof


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
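Pu Xiaoxiang et al. A speech denoising algorithm based on perceptual filtering (一种基于感知滤波的语音去噪算法). Communications Technology (《通信技术》). 2008, Vol. 41, No. 7, pp. 236-238. *
Huang Yating. Research on cochlear implant speech enhancement based on auditory characteristics (基于听觉特性的电子耳蜗语音增强的研究). China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》). 2008, No. 11, pp. 15-36. *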

Also Published As

Publication number Publication date
CN102054482A (en) 2011-05-11

Similar Documents

Publication Publication Date Title
CN102652336B (en) Speech signal restoration device and speech signal restoration method
CN102576542B (en) Method and device for determining upperband signal from narrowband signal
JP3881943B2 (en) Acoustic encoding apparatus and acoustic encoding method
CN111081268A (en) Phase-correlated shared deep convolutional neural network speech enhancement method
CN101916567B (en) Speech enhancement method applied to dual-microphone system
JP3881946B2 (en) Acoustic encoding apparatus and acoustic encoding method
JP5301471B2 (en) Speech coding system and method
CN1750124A (en) Bandwidth extension of band limited audio signals
KR20070085532A (en) Stereo encoding apparatus, stereo decoding apparatus, and their methods
CN1193344C (en) Speech decoder and method for decoding speech
CN104981870B (en) Sound enhancing devices
US7330813B2 (en) Speech processing apparatus and mobile communication terminal
CN107112027A (en) The bi-directional scaling of gain shape circuit
CN114007157A (en) Intelligent noise reduction communication earphone
CN103413557A (en) Voice signal bandwidth expansion method and device thereof
CN111899750B (en) Speech enhancement algorithm combining cochlear speech features and hopping deep neural network
CN102054482B (en) Method and device for enhancing voice signal
JP2023548707A (en) Speech enhancement methods, devices, equipment and computer programs
CN101960514A (en) Signal analysis/control system and method, signal control device and method, and program
CN113936680B (en) Single-channel voice enhancement method based on multi-scale information perception convolutional neural network
CN1312463C (en) Generation LSF vector
JP3472279B2 (en) Speech coding parameter coding method and apparatus
CN115171709B (en) Speech coding, decoding method, device, computer equipment and storage medium
CN109215635B (en) Broadband voice frequency spectrum gradient characteristic parameter reconstruction method for voice definition enhancement
US20080219473A1 (en) Signal processing method, apparatus and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121128

Termination date: 20211027
