CN104269178A - Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals - Google Patents

Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals

Info

Publication number
CN104269178A
Authority
CN
China
Prior art keywords
noise
voice signal
formula
signal
signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410390560.2A
Other languages
Chinese (zh)
Inventor
张金明
刘宇
陈少卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huadi Computer Group Co Ltd
Original Assignee
Huadi Computer Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huadi Computer Group Co Ltd filed Critical Huadi Computer Group Co Ltd
Priority to CN201410390560.2A priority Critical patent/CN104269178A/en
Publication of CN104269178A publication Critical patent/CN104269178A/en
Pending legal-status Critical Current

Abstract

An embodiment of the invention provides a method and device for performing adaptive spectral subtraction and wavelet packet denoising on voice signals. The method mainly comprises: calculating the power spectrum of an additive model of the voice signal; obtaining a power estimate of the voice signal from that power spectrum and the power estimates of the noisy signal and the noise signal; applying an inverse Fourier transform to the power estimate of the voice signal to obtain an enhanced voice signal; applying wavelet transform processing to the enhanced voice signal using a wavelet transform matrix; and applying threshold processing to the wavelet transform matrix, then inverse-transforming the thresholded wavelet transform matrix to obtain the denoised voice signal. The method and device effectively address background noise in teleconference voice signals and do not introduce other noise such as musical noise.

Description

Method and apparatus for adaptive spectral subtraction and wavelet packet denoising of voice signals
Technical field
The present invention relates to the field of voice processing technology, and in particular to a method and apparatus for performing adaptive spectral subtraction and wavelet packet denoising on voice signals.
Background
Large enterprises in sectors such as the Chinese People's Armed Police, public security, railways, electric power, and petroleum and petrochemicals frequently hold production scheduling meetings and need a high-capacity, fully interactive teleconference system. Such systems contain a large number of heterogeneous voice resource terminals that must communicate with one another through conversion equipment. For example, analog terminals (telephones) in the public communication network, digital trunking systems, satellite networks and wired networks interwork through program-controlled exchanges; VoIP voice terminals interwork through softswitches; ultrashort-wave and shortwave network terminals are converted to the IP system through IP radio gateways and integrated access devices respectively, and then either interwork with terminals such as softswitches and IP phones, or connect to a program-controlled exchange through a trunk interface to interwork with other analog terminals.
In practical use, the noise of these heterogeneous voice resource terminals and conversion devices is usually random, and the sources and types of background noise differ. The noisy signal therefore needs noise elimination processing to remove background noise, improve voice quality, clarity, intelligibility and comfort, and improve the performance of the speech processing system.
The shortcoming of prior-art noise elimination methods for teleconference systems is that a single noise reduction method is applied to all background noise sources and types, so it gives only partial improvement or is effective only for one class of noise. Spectral subtraction, for example, itself introduces "musical noise".
Summary of the invention
Embodiments of the invention provide a method and apparatus for performing adaptive spectral subtraction and wavelet packet denoising on voice signals, so as to effectively eliminate noise in teleconference voice signals.
The invention provides the following scheme:
(corresponding to the claims)
As the technical scheme above shows, the embodiment of the invention establishes an additive model of the voice signal, obtains the power estimate of the voice signal and the enhanced voice signal from the power spectrum of the additive model and the power estimates of the noisy signal and the noise signal, and then obtains the denoised voice signal from the noise power estimated in real time and the wavelet transform matrix. This effectively addresses background noise in teleconference voice signals without introducing other noise such as "musical noise".
Brief description of the drawings
To explain the technical scheme of the embodiments more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described here are only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a structural diagram of a voice resource processing and switching platform provided by Embodiment One of the invention;
Fig. 2 is a schematic diagram of the principle by which a speech enhancement processing board provided by Embodiment One denoises a noisy signal using adaptive spectral subtraction;
Fig. 3 is a flowchart of the method by which a speech enhancement processing board of the embodiment performs adaptive spectral subtraction and wavelet packet denoising on a voice signal;
Fig. 4 is a schematic diagram of five-level wavelet decomposition of a noisy signal provided by Embodiment One;
Fig. 5 is a structural diagram of an apparatus for adaptive spectral subtraction and wavelet packet denoising of voice signals provided by the embodiment, showing the voice signal power estimate acquisition module 51, the enhanced voice signal acquisition module 52, the wavelet transform processing module 53 and the denoised voice signal acquisition module 54.
Detailed description
To aid understanding of the embodiments of the invention, several specific embodiments are explained below with reference to the drawings. The individual embodiments do not limit the embodiments of the invention.
Embodiment One
Fig. 1 shows the structure of a voice resource processing and switching platform provided by the embodiment. The platform mainly performs speech enhancement, audio media stream transcoding, multi-channel mixing, recording, playback and echo cancellation. It can forward code streams, sending front-end audio streams to dispatching desks and storage devices. It has strong streaming media processing capability: a single unit supports concurrent non-blocking scheduling and forwarding of 1440 voice channels, and a single conference supports mixing of 480 parties.
I/O board of the voice resource processing and switching platform: mainly connects the switch to the outside of the hardware platform, and passes external interrupt line information through the front and rear board pins to interconnect with the digital trunk board.
Digital trunk board: under the control of the CPU board, mainly converts trunk information into the H.110 backplane bus format, providing the information source for the functional services of each resource board.
Voice navigation board: processes the voice channel information on the backplane bus that requires voice value-added services such as voice sending and receiving, DTMF digit collection, and FSK (Frequency-Shift Keying) number sending.
Conference bridge board: processes backplane bus data to establish multi-party teleconferences.
CPU board: coordinates the operation of the voice resource boards over the PCI (Peripheral Component Interconnect) bus.
Speech enhancement processing board: mainly uses adaptive spectral subtraction to perform noise reduction and speech enhancement on the noisy signal, improving voice quality, clarity, intelligibility and comfort, and improving the speech performance of the teleconference system.
Fig. 2 shows the principle by which the speech enhancement processing board of this embodiment denoises a noisy signal using adaptive spectral subtraction. Based on this principle, the processing flow of the method by which the board performs adaptive spectral subtraction and wavelet packet denoising on a voice signal is shown in Fig. 3 and comprises the following steps:
Step S310: establish the additive model of the voice signal from the filtered noisy signal, the noise signal and the voice signal.
A high-pass filter is a system that passes high frequencies more readily than low frequencies. It removes unwanted low-frequency components, that is, low-frequency interference, from the signal.
The reference (cutoff) frequency of the high-pass filter in the embodiment can be set to 50 Hz: the filter blocks voice signal components below 50 Hz and passes components above 50 Hz. Filtering the noisy signal with this high-pass filter suppresses 50 Hz power-line interference in the noisy signal.
Because a voice signal is short-time stationary, it is treated as a stationary random signal in short-time spectral amplitude estimation. Let s(m), n(m) and y(m) denote the voice signal, the noise signal and the filtered noisy signal respectively.
Assuming that the noise n(m) is additive noise uncorrelated with the voice s(m), the additive model of the voice signal is:
$$y(m) = s(m) + n(m)$$ (Formula 1)
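The 50 Hz high-pass stage that precedes the additive model can be sketched as follows. This is a minimal illustration only: the patent does not specify the filter design, so the first-order RC-style recursion, the function name `highpass_filter` and the 8 kHz sampling rate are all assumptions. A first-order filter has a gentle slope, so a practical system would probably use a higher-order design.

```python
import numpy as np

def highpass_filter(x, fs, cutoff_hz=50.0):
    """First-order RC-style high-pass filter (assumed design, not from the patent)."""
    x = np.asarray(x, dtype=float)
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)   # time constant for the cutoff
    dt = 1.0 / fs                          # sampling period
    a = rc / (rc + dt)
    y = np.zeros_like(x)
    for i in range(1, len(x)):
        # Standard discrete RC high-pass recursion
        y[i] = a * (y[i - 1] + x[i] - x[i - 1])
    return y

fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs
slow = np.sin(2 * np.pi * 5 * t)           # 5 Hz component, well below the cutoff
tone = np.sin(2 * np.pi * 1000 * t)        # 1 kHz component, well above the cutoff
slow_out = highpass_filter(slow, fs)
tone_out = highpass_filter(tone, fs)
```

The low-frequency component is strongly attenuated while the 1 kHz tone passes almost unchanged, which is the behaviour Step S310 relies on.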
Step S320: calculate the power spectrum of the additive model of the voice signal, and obtain the power estimate of the voice signal from that power spectrum and the power estimates of the noisy signal and the noise signal.
Windowing the signals y(m), s(m) and n(m) gives the signals $y_w(m)$, $s_w(m)$ and $n_w(m)$. $S_s(\omega)$, $S_n(\omega)$ and $S_y(\omega)$ denote the short-time spectra of the voice signal, the noise signal and the filtered noisy signal respectively. The window length sets the frame length used for the Fourier transform.
Then:
$$y_w(m) = s_w(m) + n_w(m)$$ (Formula 2)
Taking the Fourier transform of both sides of Formula 2 gives:
$$Y_w(\omega) = S_w(\omega) + N_w(\omega)$$ (Formula 3)
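The step from Formula 2 to Formula 3 rests on the linearity of windowing and of the Fourier transform. The sketch below checks this numerically for one frame; the Hann window, the 256-sample frame length and the synthetic signals are assumptions chosen for illustration, since the patent does not name a window type.

```python
import numpy as np

def frame_spectrum(x, start, frame_len):
    """Window one frame with a Hann window (assumed) and return its spectrum."""
    window = np.hanning(frame_len)
    frame = np.asarray(x[start:start + frame_len], dtype=float) * window
    return np.fft.rfft(frame)

rng = np.random.default_rng(0)
m = np.arange(1024)
s = np.sin(2 * np.pi * 0.05 * m)           # stand-in for the voice signal s(m)
n = 0.1 * rng.standard_normal(m.size)      # stand-in for the noise signal n(m)
y = s + n                                  # Formula 1

Y_w = frame_spectrum(y, 0, 256)
S_w = frame_spectrum(s, 0, 256)
N_w = frame_spectrum(n, 0, 256)
linear_ok = np.allclose(Y_w, S_w + N_w)    # Formula 3 for this frame
```

In practice only `Y_w` is observable; `S_w` and `N_w` are computed here solely to confirm the identity.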
where ω in Formula 3 denotes frequency.
Taking the power spectrum of both sides of Formula 3 (the power spectrum characterizes the signal in the frequency domain) gives:
$$|Y_w(\omega)|^2 = |S_w(\omega)|^2 + |N_w(\omega)|^2 + S_w(\omega)N_w^*(\omega) + S_w^*(\omega)N_w(\omega)$$ (Formula 4)
where $N_w^*(\omega)$ and $S_w^*(\omega)$ denote the complex conjugates of the noise spectrum and the voice spectrum, and $|N_w(\omega)|^2$ and $|S_w(\omega)|^2$ are the power spectra of the noise signal and the voice signal.
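For reference, Formula 4 follows from multiplying Formula 3 by its own complex conjugate. The short derivation below uses only the symbols already defined.

```latex
\begin{aligned}
|Y_w(\omega)|^2 &= Y_w(\omega)\,Y_w^*(\omega) \\
&= \bigl(S_w(\omega) + N_w(\omega)\bigr)\bigl(S_w^*(\omega) + N_w^*(\omega)\bigr) \\
&= |S_w(\omega)|^2 + |N_w(\omega)|^2 + S_w(\omega)N_w^*(\omega) + S_w^*(\omega)N_w(\omega)
\end{aligned}
```

The two cross terms are complex conjugates of each other, so their sum equals $2\,\mathrm{Re}\{S_w(\omega)N_w^*(\omega)\}$ and is real.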
$|Y_w(\omega)|^2$ denotes the power estimate of the noisy signal. It can be estimated from the noisy signal y(m) by statistical averaging.
The remaining terms must be approximated by their statistical means. Because s(m) and n(m) are independent, their cross statistical mean is 0, so the power estimate $|\hat{S}_w(\omega)|^2$ of the voice signal is computed as:
$$|\hat{S}_w(\omega)|^2 = |Y_w(\omega)|^2 - E\left[|N_w(\omega)|^2\right]$$ (Formula 5)
where, in the statistical-mean sense, $E\left[|N_w(\omega)|^2\right] = |N_w(\omega)|^2 + S_w(\omega)N_w^*(\omega) + S_w^*(\omega)N_w(\omega) = |N_w(\omega)|^2$.
The cross terms vanish because s(m) and n(m) are independent, so their cross statistical mean is 0.
VAD (Voice Activity Detection) is used to estimate the power of the noise signal, $|N_w(\omega)|^2$. The first 50 ms of the input signal, a "silent segment", is used to estimate the noise. In addition, a speaker continually produces pauses in the voice signal while breathing, and these pauses are used to estimate the background noise. After VAD detects a silent segment, the noise power estimate $|N_w(\omega)|^2$ is updated with the following formula:
$$|N_w(\omega)|^2 = \alpha\,|N_{w-1}(\omega)|^2 + (1-\alpha)\,|\hat{S}_{w-1}(\omega)|^2$$ (Formula 6)
where $0 < \alpha < 1$, $|N_{w-1}(\omega)|^2$ is the power estimate of the noise signal in the previous frame, and $|\hat{S}_{w-1}(\omega)|^2$ is the power estimate of the voice signal in the previous frame.
Because the noise is locally stationary, the noise power spectrum before speech can be taken to equal the noise power spectrum during speech, so the "silent frames" preceding speech can be used to estimate the noise.
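The recursive update of Formula 6 is a simple exponential smoothing step and can be sketched as below. The function name `update_noise_power` and the default `alpha=0.9` are assumptions for illustration; the patent only requires 0 < α < 1 and does not describe the VAD decision itself, so the caller is assumed to invoke this only for frames that VAD has marked as silent.

```python
import numpy as np

def update_noise_power(prev_noise_power, prev_speech_power, alpha=0.9):
    """Formula 6: smooth the noise power estimate across frames.

    prev_noise_power  -- |N_{w-1}(w)|^2, previous-frame noise power per bin
    prev_speech_power -- |S^_{w-1}(w)|^2, previous-frame voice power estimate per bin
    alpha             -- smoothing factor, 0 < alpha < 1 (value is an assumption)
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    prev_noise_power = np.asarray(prev_noise_power, dtype=float)
    prev_speech_power = np.asarray(prev_speech_power, dtype=float)
    return alpha * prev_noise_power + (1.0 - alpha) * prev_speech_power

# Two frequency bins, equal weighting of the two terms
updated = update_noise_power([1.0, 2.0], [0.0, 4.0], alpha=0.5)
```

A value of α close to 1 makes the estimate change slowly, which suits the locally stationary noise assumed in the text.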
Formula 5 does not guarantee that the power estimate of the voice signal is non-negative, because the noise estimate contains error. When the power estimate of the noise signal exceeds the power estimate of the noisy signal in some frame, the power estimate of the voice signal in that frame becomes negative. Such negative values can either be made positive by changing their sign or be set directly to zero. The embodiment of the invention adopts the latter approach.
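The subtraction of Formula 5 together with the zeroing of negative values can be sketched in a few lines. The function name `subtract_noise_power` is hypothetical, and the inputs are assumed to be per-bin power values for one frame.

```python
import numpy as np

def subtract_noise_power(noisy_power, noise_power):
    """Formula 5 with negative results set to zero (the approach the embodiment adopts)."""
    diff = np.asarray(noisy_power, dtype=float) - np.asarray(noise_power, dtype=float)
    return np.maximum(diff, 0.0)   # zero out bins where the noise estimate exceeds the input

speech_power = subtract_noise_power([4.0, 1.0, 2.0], [1.0, 3.0, 2.0])
```

Here the second bin would be negative after subtraction and is therefore set to zero.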
Step S330: apply the inverse Fourier transform to the power estimate of the voice signal to obtain the enhanced voice signal.
Once the power estimate of the voice signal has been obtained in the frequency domain from Formula 5, the enhanced voice signal is obtained according to Formula 7:
(Formula 7)
In Formula 7, the phase term denotes the phase of the voice signal.
IFFT denotes the inverse Fourier transform.
Because the human ear is insensitive to phase, the phase of the original noisy signal can be used in Formula 7 to recover the time-domain voice signal. This yields the enhanced voice signal and completes the spectral-subtraction-based speech enhancement process.
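Formula 7 itself is not reproduced in this text, so the following sketch assumes the conventional spectral subtraction reconstruction: take the square root of the voice power estimate as the magnitude, attach the phase of the noisy frame, and apply the inverse FFT. The function name `enhance_frame` and this exact form are assumptions, not the patent's stated formula.

```python
import numpy as np

def enhance_frame(noisy_frame, noise_power):
    """Spectral subtraction for one frame, reusing the noisy phase (assumed form of Formula 7)."""
    noisy_frame = np.asarray(noisy_frame, dtype=float)
    Y = np.fft.rfft(noisy_frame)
    # Formula 5 with negative values zeroed
    speech_power = np.maximum(np.abs(Y) ** 2 - noise_power, 0.0)
    phase = np.angle(Y)                          # phase of the noisy signal
    S_hat = np.sqrt(speech_power) * np.exp(1j * phase)
    return np.fft.irfft(S_hat, n=len(noisy_frame))

rng = np.random.default_rng(1)
frame = rng.standard_normal(256)
n_bins = 256 // 2 + 1
unchanged = enhance_frame(frame, np.zeros(n_bins))       # zero noise estimate
silenced = enhance_frame(frame, np.full(n_bins, 1e12))   # noise estimate dominates
```

With a zero noise estimate the frame is returned unchanged, and with a very large noise estimate every bin is zeroed, which gives two simple sanity checks on the sketch.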
Step S340: apply wavelet transform processing to the enhanced voice signal.
The embodiment of the invention uses the wavelet packet adaptive threshold method to further remove noise from the enhanced voice signal. Voice and noise behave differently under the wavelet transform, so processing the wavelet decomposition coefficients can separate voice from noise. In practical engineering applications, the voice signal usually appears as a low-frequency or relatively stationary signal, whereas the noise signal usually appears as a high-frequency signal.
The enhanced voice signal is treated as the noisy signal and is first wavelet-decomposed, for example into five levels. Fig. 4 shows the principle of the five-level wavelet decomposition of the noisy signal provided by this embodiment:
$$y = cA_5 + cD_1 + cD_2 + cD_3 + cD_4 + cD_5$$
where $cA_i$ is the approximation part of the decomposition and $cD_i$ is the detail part, $i = 1, 2, 3, 4, 5$. The noise is generally contained in $cD_1$, $cD_2$, $cD_3$, $cD_4$ and $cD_5$. Processing the wavelet coefficients with a threshold and then reconstructing the signal achieves denoising.
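A five-level decomposition into one approximation band and five detail bands can be sketched with plain NumPy as follows. The Haar wavelet is an assumption made to keep the example short; the patent does not state which wavelet is used, and the function names are hypothetical. The signal length is assumed to be divisible by 2^5.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform: approximation and detail."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one Haar level."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def wavedec_haar(x, levels=5):
    """Return cA_levels and [cD1, cD2, ..., cD_levels]."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

def waverec_haar(approx, details):
    """Rebuild the signal from cA_levels and the detail bands."""
    for detail in reversed(details):
        approx = haar_inverse_step(approx, detail)
    return approx

rng = np.random.default_rng(2)
y = rng.standard_normal(64)
cA5, cDs = wavedec_haar(y, levels=5)
y_rec = waverec_haar(cA5, cDs)
```

Without any thresholding the reconstruction returns the input, so any denoising effect comes entirely from how the detail coefficients are modified before reconstruction.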
Let s, n and y denote the voice signal, the noise signal and the noisy signal subjected to wavelet decomposition, respectively. The noise signal n is white Gaussian noise with variance $\sigma^2$ and is additive noise uncorrelated with the voice signal s. The additive model of the enhanced voice signal is then:
$$y = s + n$$ (Formula 8)
Applying the wavelet transform to both sides of Formula 8 gives:
$$Y = S + N$$ (Formula 9)
where $S = Ws$, $N = Wn$, and W is the chosen wavelet transform matrix, $$W(j,k) = \int_{R} x(t)\,\overline{\psi_{j,k}(t)}\,dt$$
with $$\psi_{j,k}(t) = \frac{1}{\sqrt{2^{j}}}\,\psi\!\left(\frac{t}{2^{j}} - k\right), \quad j, k \in Z$$
ψ(t) denotes the wavelet function, R the real number field, t time, and x(t) the signal to be transformed.
Step S350: apply threshold processing to the wavelet transform matrix W, then apply the inverse transform $W^{-1}$ to the thresholded wavelet transform matrix to obtain the denoised voice signal.
The best-known threshold form is chosen:
$$\eta_{t_n} = \sigma\sqrt{2\log n}$$ (Formula 10)
In this formula, σ denotes the estimated noise standard deviation of the sub-band and n the sub-band sequence length. Sub-band here refers to the critical-band division of speech within the 0 to 4 kHz frequency range.
Threshold processing can be expressed as $\eta_{t_n}$. It can be shown that, as n tends to infinity, soft thresholding of the wavelet transform matrix with the threshold of Formula 10 removes almost all of the noise in the noisy signal.
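The threshold of Formula 10 and the soft thresholding it is used with can be sketched as below. The natural logarithm and the standard soft-threshold rule (shrink each coefficient toward zero by the threshold) are assumptions, since the text gives only the threshold value; the function names are hypothetical.

```python
import numpy as np

def universal_threshold(sigma, n):
    """Formula 10: sigma * sqrt(2 log n), with the natural log assumed."""
    return sigma * np.sqrt(2.0 * np.log(n))

def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t; magnitudes below t become zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

t_example = universal_threshold(sigma=1.0, n=np.exp(2.0))   # sqrt(2 * 2) = 2
shrunk = soft_threshold([-3.0, -0.5, 0.5, 2.0], t=1.0)
```

Soft thresholding both removes small coefficients, which are assumed to be mostly noise, and reduces the magnitude of the large ones by the same amount, which avoids the discontinuity that hard thresholding introduces.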
Applying the inverse transform $W^{-1}$ to the thresholded wavelet transform matrix gives the denoised voice signal $\hat{s}$:
$$\hat{s} = W^{-1} \cdot \eta_{t_n} \cdot W \cdot d$$ (Formula 11)
In Formula 11, d is the wavelet coefficient; see the wavelet packet decomposition formulas below.
The orthogonal scaling function φ(t) and the wavelet function ψ(t) satisfy the two-scale equations
$$\phi(t) = 2\sum_{k} h_k\,\phi(2t - k),$$ where $h_k = \langle \phi(t), \phi(2t-k) \rangle$, $k \in Z$
$$\psi(t) = 2\sum_{k} g_k\,\psi(2t - k),$$ where $g_k = \langle \psi(t), \psi(2t-k) \rangle$, $k \in Z$
After wavelet packet decomposition, the wavelet coefficients at two successive scales are related by:
$$d_m^{j+1,2n} = \sum_{l} d_l^{j,n}\,\hat{h}_{l-2m}, \qquad d_m^{j+1,2n+1} = \sum_{l} d_l^{j,n}\,\hat{g}_{l-2m}$$
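The recursion between successive scales can be sketched as a filter-and-downsample step that splits every node into a low-pass child (index 2n) and a high-pass child (index 2n+1). The Haar filter pair, the periodic boundary handling and the function names are assumptions for illustration; the patent does not name the filters.

```python
import numpy as np

# Haar analysis filters (assumed): h_hat for the 2n child, g_hat for the 2n+1 child
H_HAT = np.array([1.0, 1.0]) / np.sqrt(2.0)
G_HAT = np.array([1.0, -1.0]) / np.sqrt(2.0)

def wp_split(d, h_hat=H_HAT, g_hat=G_HAT):
    """Split node coefficients d^{j,n} into d^{j+1,2n} and d^{j+1,2n+1}."""
    d = np.asarray(d, dtype=float)
    half = len(d) // 2
    low = np.zeros(half)
    high = np.zeros(half)
    for m in range(half):
        for k in range(len(h_hat)):
            l = (2 * m + k) % len(d)      # l - 2m = k, periodic extension assumed
            low[m] += d[l] * h_hat[k]
            high[m] += d[l] * g_hat[k]
    return low, high

def wp_decompose(x, levels):
    """Full wavelet packet tree: returns the 2**levels leaf nodes in index order."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        children = []
        for node in nodes:
            low, high = wp_split(node)
            children.extend([low, high])
        nodes = children
    return nodes

low, high = wp_split([4.0, 2.0, 6.0, 6.0])
leaves = wp_decompose(np.arange(8.0), levels=2)
```

Unlike the plain wavelet decomposition, which splits only the approximation branch, the packet tree also splits the detail branches, which gives finer frequency resolution in the high bands where the noise is assumed to sit.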
Embodiment Two
This embodiment provides an apparatus for performing adaptive spectral subtraction and wavelet packet denoising on voice signals. Its structure is shown in Fig. 5 and comprises the following modules:
Voice signal power estimate acquisition module 51, configured to calculate the power spectrum of the additive model of the voice signal and to obtain the power estimate of the voice signal from that power spectrum and the power estimates of the noisy signal and the noise signal;
Enhanced voice signal acquisition module 52, configured to apply the inverse Fourier transform to the power estimate of the voice signal to obtain the enhanced voice signal;
Wavelet transform processing module 53, configured to apply wavelet transform processing to the enhanced voice signal using the wavelet transform matrix;
Denoised voice signal acquisition module 54, configured to apply threshold processing to the wavelet transform matrix and the inverse transform to the thresholded wavelet transform matrix to obtain the denoised voice signal.
Further, the voice signal power estimate acquisition module 51 is specifically configured to filter the noisy signal with a high-pass filter;
Let s(m), n(m) and y(m) denote the voice signal, the noise signal and the filtered noisy signal respectively, where the noise n(m) is additive noise uncorrelated with the voice s(m). The additive model of the voice signal is:
$$y(m) = s(m) + n(m)$$ (Formula 1);
Windowing the signals y(m), s(m) and n(m) gives the signals $y_w(m)$, $s_w(m)$ and $n_w(m)$; $S_s(\omega)$, $S_n(\omega)$ and $S_y(\omega)$ denote the short-time spectra of the voice signal, the noise signal and the filtered noisy signal respectively;
$$y_w(m) = s_w(m) + n_w(m)$$ (Formula 2)
Taking the Fourier transform of both sides of Formula 2 gives:
$$Y_w(\omega) = S_w(\omega) + N_w(\omega)$$ (Formula 3)
where ω in Formula 3 denotes frequency.
Taking the power spectrum of both sides of Formula 3 gives:
$$|Y_w(\omega)|^2 = |S_w(\omega)|^2 + |N_w(\omega)|^2 + S_w(\omega)N_w^*(\omega) + S_w^*(\omega)N_w(\omega)$$ (Formula 4)
where $N_w^*(\omega)$ and $S_w^*(\omega)$ denote the complex conjugates of the noise spectrum and the voice spectrum, and $|N_w(\omega)|^2$ and $|S_w(\omega)|^2$ are the power spectra of the noise signal and the voice signal;
The power estimate $|Y_w(\omega)|^2$ of the noisy signal is estimated from the noisy signal y(m) by statistical averaging, and the power estimate $|N_w(\omega)|^2$ of the noise signal is estimated by voice activity detection, so that the power estimate $|\hat{S}_w(\omega)|^2$ of the voice signal is computed as:
$$|\hat{S}_w(\omega)|^2 = |Y_w(\omega)|^2 - E\left[|N_w(\omega)|^2\right]$$ (Formula 5)
where, in the statistical-mean sense, $E\left[|N_w(\omega)|^2\right] = |N_w(\omega)|^2 + S_w(\omega)N_w^*(\omega) + S_w^*(\omega)N_w(\omega) = |N_w(\omega)|^2$, because s(m) and n(m) are independent and their cross statistical mean is therefore 0.
Further, the enhanced voice signal acquisition module 52 is specifically configured to set the enhanced voice signal as
(Formula 7)
In Formula 7, the phase term denotes the phase of the voice signal, and IFFT denotes the inverse Fourier transform.
Further, the wavelet transform processing module 53 is specifically configured to treat the enhanced voice signal as the noisy signal and wavelet-decompose it. Let s, n and y denote the voice signal, the noise signal and the noisy signal subjected to wavelet decomposition, respectively. The additive model of the enhanced voice signal is:
$$y = s + n$$ (Formula 8)
Applying the wavelet transform to both sides of Formula 8 gives:
$$Y = S + N$$ (Formula 9)
where $S = Ws$, $N = Wn$, and W is the chosen wavelet transform matrix, $$W(j,k) = \int_{R} x(t)\,\overline{\psi_{j,k}(t)}\,dt$$
with $$\psi_{j,k}(t) = \frac{1}{\sqrt{2^{j}}}\,\psi\!\left(\frac{t}{2^{j}} - k\right), \quad j, k \in Z$$
ψ(t) denotes the wavelet function, R the real number field, t time, and x(t) the signal to be transformed.
Further, the denoised voice signal acquisition module 54 is specifically configured to choose the threshold form:
$$\eta_{t_n} = \sigma\sqrt{2\log n}$$
In this formula, σ denotes the estimated noise standard deviation of the sub-band and n the sub-band sequence length;
Applying the inverse transform $W^{-1}$ to the thresholded wavelet transform matrix gives the denoised voice signal $\hat{s}$:
$$\hat{s} = W^{-1} \cdot \eta_{t_n} \cdot W \cdot d$$
In Formula 11 above, d is the wavelet coefficient; see the wavelet packet decomposition formulas below.
The orthogonal scaling function φ(t) and the wavelet function ψ(t) satisfy the two-scale equations
$$\phi(t) = 2\sum_{k} h_k\,\phi(2t - k),$$ where $h_k = \langle \phi(t), \phi(2t-k) \rangle$, $k \in Z$
$$\psi(t) = 2\sum_{k} g_k\,\psi(2t - k),$$ where $g_k = \langle \psi(t), \psi(2t-k) \rangle$, $k \in Z$
After wavelet packet decomposition, the wavelet coefficients at two successive scales are related by:
$$d_m^{j+1,2n} = \sum_{l} d_l^{j,n}\,\hat{h}_{l-2m}, \qquad d_m^{j+1,2n+1} = \sum_{l} d_l^{j,n}\,\hat{g}_{l-2m}.$$
The detailed process by which the apparatus of this embodiment performs adaptive spectral subtraction and wavelet packet denoising on a voice signal is similar to the method embodiment above and is not repeated here.
In summary, the embodiment of the invention establishes an additive model of the voice signal, obtains the power estimate of the voice signal and the enhanced voice signal from the power spectrum of the additive model and the power estimates of the noisy signal and the noise signal, and then obtains the denoised voice signal from the noise power estimated in real time and the wavelet transform matrix. This effectively addresses background noise in teleconference voice signals without introducing other noise such as "musical noise".
The embodiment of the invention improves the voice quality of teleconferences, including the clarity, intelligibility and comfort of the voice; improves the performance of speech enhancement processing; extends the available speech enhancement methods and overcomes the shortcomings of a single speech processing method; and, while meeting the requirements of high-capacity, fully interactive teleconferencing, brings the sound quality after speech enhancement closer to that of an on-site meeting.
A person of ordinary skill in the art will understand that each drawing is a schematic diagram of one embodiment, and that the modules or flows in the drawings are not necessarily required to implement the invention.
From the description of the embodiments above, a person skilled in the art can clearly understand that the invention can be implemented by software plus the necessary general-purpose hardware platform. On this understanding, the technical scheme of the invention, or the part of it that contributes over the prior art, can be embodied as a software product. The computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (a personal computer, a server, a network device or the like) to perform the methods described in the embodiments of the invention or in parts of those embodiments.
The embodiments in this specification are described progressively. Identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described briefly because they are substantially similar to the method embodiment; for the relevant parts, see the description of the method embodiment. The apparatus and system embodiments described above are only schematic. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The above is only a preferred embodiment of the invention, and the protection scope of the invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. The protection scope of the invention shall therefore be determined by the protection scope of the claims.

Claims (9)

1. A method for performing adaptive spectral subtraction and wavelet packet denoising on a voice signal, characterized by comprising:
calculating the power spectrum of an additive model of the voice signal, and obtaining the power estimate of the voice signal from the power spectrum of the additive model and the power estimates of the noisy signal and the noise signal;
applying the inverse Fourier transform to the power estimate of the voice signal to obtain an enhanced voice signal;
applying wavelet transform processing to the enhanced voice signal using a wavelet transform matrix;
applying threshold processing to the wavelet transform matrix, and applying the inverse transform to the thresholded wavelet transform matrix to obtain the denoised voice signal.
2. The method for performing adaptive spectral subtraction and wavelet packet denoising on a voice signal according to claim 1, characterized in that calculating the power spectrum of the additive model of the voice signal, and obtaining the power estimate of the voice signal from the power spectrum of the additive model of the voice signal and the power estimates of the noisy signal and the noise signal, comprises:
Performing low-pass filtering on the noisy signal by means of a high-pass filter;
Letting s(m), n(m) and y(m) denote the voice signal, the noise signal and the low-pass-filtered noisy signal respectively, where the noise n(m) is additive noise uncorrelated with the voice s(m), the additive model of the voice signal is established as:
$y(m) = s(m) + n(m)$ (formula 1)
Windowing the signals y(m), s(m) and n(m) respectively yields the signals $y_w(m)$, $s_w(m)$ and $n_w(m)$, and $S_s(\omega)$, $S_n(\omega)$ and $S_y(\omega)$ denote the short-time spectra of the voice signal, the noise signal and the low-pass-filtered noisy signal respectively;
$y_w(m) = s_w(m) + n_w(m)$ (formula 2)
Taking the Fourier transform of both sides of formula 2 gives:
$Y_w(\omega) = S_w(\omega) + N_w(\omega)$ (formula 3)
where ω in formula 3 denotes frequency;
Taking the power spectrum of both sides of formula 3 gives:
$|Y_w(\omega)|^2 = |S_w(\omega)|^2 + |N_w(\omega)|^2 + S_w(\omega) N_w^*(\omega) + S_w^*(\omega) N_w(\omega)$ (formula 4)
where $N_w^*(\omega)$ represents the power spectrum of the noise signal and $S_w^*(\omega)$ represents the power spectrum of the voice signal;
Estimating the power estimate $|Y_w(\omega)|^2$ of the noisy signal from the noisy signal y(m) by a statistical averaging method, estimating the power estimate $|N_w(\omega)|^2$ of the noise signal by a voice activity detection method, and obtaining the power estimate $|\hat{S}_w(\omega)|^2$ of the voice signal by the formula:
$|\hat{S}_w(\omega)|^2 = |Y_w(\omega)|^2 - E[|N_w(\omega)|^2]$ (formula 5)
where $E[|N_w(\omega)|^2] = |N_w(\omega)|^2 + S_w(\omega) N_w^*(\omega) + S_w^*(\omega) N_w(\omega) = |N_w(\omega)|^2$.
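The subtraction in formula 5 can be sketched for a single frame as follows. This is a minimal illustration in Python with NumPy, not the applicant's implementation: the function name `spectral_subtract_frame`, the Hamming window and the clamping of negative differences to zero are assumptions that the claim does not specify.

```python
import numpy as np


def spectral_subtract_frame(y_frame, noise_power):
    """Apply formula 5 to one frame of the noisy signal.

    y_frame     : 1-D array, one frame of y(m)
    noise_power : 1-D array, estimate of E[|N_w(w)|^2] per frequency bin
    Returns the speech power estimate and the phase of the noisy frame.
    """
    window = np.hamming(len(y_frame))      # assumed window; the claim only says "windowing"
    Y = np.fft.rfft(y_frame * window)      # formula 3: spectrum of the windowed frame
    noisy_power = np.abs(Y) ** 2           # |Y_w(w)|^2
    # Formula 5; negative values are clamped to zero (an assumption, since a
    # power estimate cannot be negative and the claim is silent on this case).
    speech_power = np.maximum(noisy_power - noise_power, 0.0)
    return speech_power, np.angle(Y)
```

With a frame of 256 samples, `noise_power` needs 129 entries, one per bin of the one-sided spectrum returned by `np.fft.rfft`.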
3. The method for performing adaptive spectral subtraction and wavelet packet denoising on a voice signal according to claim 2, characterized in that estimating the power estimate $|N_w(\omega)|^2$ of the noise signal by the voice activity detection method comprises:
After the silent segments of the noisy signal have been detected by the voice activity detection method, updating the power estimate $|N_w(\omega)|^2$ of the noise signal using the following formula 6:
$|N_w(\omega)|^2 = \alpha |N_{w-1}(\omega)|^2 + (1 - \alpha) |\hat{S}_{w-1}(\omega)|^2$ (formula 6)
where $0 < \alpha < 1$, $|N_{w-1}(\omega)|^2$ is the power estimate of the noise signal of the previous frame, and $|\hat{S}_{w-1}(\omega)|^2$ is the power estimate of the voice signal of the previous frame.
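The recursive update of formula 6 can be sketched as follows. This is an illustrative Python/NumPy fragment under stated assumptions: the function name `update_noise_power`, the `is_silent` flag standing in for the result of the voice activity detector, and the default `alpha=0.9` are hypothetical, since the claim only requires 0 < α < 1 and does not describe the detector itself.

```python
import numpy as np


def update_noise_power(prev_noise_power, prev_speech_power, is_silent, alpha=0.9):
    """Update the noise power estimate per formula 6.

    prev_noise_power  : |N_{w-1}(w)|^2, noise power estimate of the previous frame
    prev_speech_power : |S^_{w-1}(w)|^2, speech power estimate of the previous frame
    is_silent         : True if the detector marked the current frame as silence
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if not is_silent:
        # Outside silent segments the estimate is carried over unchanged
        # (an assumption; the claim only describes the update after silence is detected).
        return np.asarray(prev_noise_power, dtype=float)
    return alpha * np.asarray(prev_noise_power) + (1.0 - alpha) * np.asarray(prev_speech_power)
```

A value of α close to 1 makes the estimate change slowly from frame to frame, while a smaller α lets it track the most recent frame more closely.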
4. The method for performing adaptive spectral subtraction and wavelet packet denoising on a voice signal according to claim 2 or 3, characterized in that performing the inverse Fourier transform on the power estimate of the voice signal to obtain the enhanced voice signal comprises:
Letting the enhanced voice signal be given by:
(formula 7)
where the phase term in formula 7 represents the phase of the voice signal, and IFFT denotes the inverse Fourier transform.
5. The method for performing adaptive spectral subtraction and wavelet packet denoising on a voice signal according to claim 4, characterized in that performing wavelet transform processing on the enhanced voice signal using the wavelet transform matrix comprises:
Treating the enhanced voice signal as a noisy signal and performing wavelet decomposition on this noisy signal; letting s, n and y denote the voice signal, the noise signal and the wavelet-decomposed noisy signal respectively, the additive model of the enhanced voice signal is obtained:
$y = s + n$ (formula 8)
Applying the wavelet transform to both sides of formula 8 gives:
$Y = S + N$ (formula 9)
where $S = Ws$, $N = Wn$, and W is the chosen wavelet transform matrix, $W(j,k) = \int_{R} x(t)\, \overline{\psi_{j,k}(t)}\, dt$
where $\psi_{j,k}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t}{2^j} - k\right)$, $j, k \in Z$
ψ(t) denotes the wavelet function, R denotes the real number field, t denotes time, and x(t) denotes the signal to be transformed.
6. The method for performing adaptive spectral subtraction and wavelet packet denoising on a voice signal according to claim 5, characterized in that performing threshold processing on the wavelet transform matrix, and applying the inverse transform to the thresholded wavelet transform matrix to obtain the voice signal with the noise eliminated, comprises:
Selecting a threshold of the form:
$\eta_{t_n} = \sigma \sqrt{2 \log n}$
where σ denotes the estimated noise standard deviation of the subband and n denotes the subband sequence length;
Applying the inverse transform $W^{-1}$ to the thresholded wavelet transform matrix to obtain the voice signal $\hat{s}$ with the noise eliminated:
$\hat{s} = W^{-1} \cdot \eta_{t_n} \cdot W \cdot d$.
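The transform, threshold and inverse transform sequence described above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: an orthonormal Haar matrix stands in for the unspecified wavelet transform matrix W, soft thresholding is assumed because the claim does not say whether the threshold is applied in hard or soft form, and the names `haar_matrix` and `wavelet_threshold_denoise` are hypothetical.

```python
import numpy as np


def haar_matrix(n):
    """Build an orthonormal Haar transform matrix W for n a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    W = np.array([[1.0]])
    while W.shape[0] < n:
        m = W.shape[0]
        top = np.kron(W, [[1.0, 1.0]])              # coarser-scale rows
        bottom = np.kron(np.eye(m), [[1.0, -1.0]])  # finest-scale detail rows
        W = np.vstack([top, bottom]) / np.sqrt(2.0)
    return W


def wavelet_threshold_denoise(d, sigma):
    """Compute W^{-1} * eta(W * d) with threshold sigma * sqrt(2 * log n)."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    W = haar_matrix(n)
    coeffs = W @ d                                  # forward transform
    t = sigma * np.sqrt(2.0 * np.log(n))            # threshold from the claim
    # Soft thresholding applied to every coefficient, as the formula is written.
    shrunk = np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)
    return W.T @ shrunk                             # W is orthonormal, so W^{-1} = W^T
```

Because W is orthonormal, a threshold of zero returns the input unchanged, which is a convenient sanity check. Practical denoisers often leave the coarsest approximation coefficient unthresholded; the sketch follows the formula literally instead.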
7. A device for performing adaptive spectral subtraction and wavelet packet denoising on a voice signal, characterized by comprising:
A voice signal power estimate acquisition module, configured to calculate the power spectrum of the additive model of the voice signal, and to obtain the power estimate of the voice signal from the power spectrum of the additive model of the voice signal and the power estimates of the noisy signal and the noise signal;
An enhanced voice signal acquisition module, configured to perform the inverse Fourier transform on the power estimate of the voice signal to obtain the enhanced voice signal;
A wavelet transform processing module, configured to perform wavelet transform processing on the enhanced voice signal using the wavelet transform matrix;
A denoised voice signal acquisition module, configured to perform threshold processing on the wavelet transform matrix and to apply the inverse transform to the thresholded wavelet transform matrix to obtain the voice signal with the noise eliminated.
8. The device for performing adaptive spectral subtraction and wavelet packet denoising on a voice signal according to claim 7, characterized in that:
The voice signal power estimate acquisition module is specifically configured to perform low-pass filtering on the noisy signal by means of a high-pass filter;
Letting s(m), n(m) and y(m) denote the voice signal, the noise signal and the low-pass-filtered noisy signal respectively, where the noise n(m) is additive noise uncorrelated with the voice s(m), the additive model of the voice signal is established as:
$y(m) = s(m) + n(m)$ (formula 1);
Windowing the signals y(m), s(m) and n(m) respectively yields the signals $y_w(m)$, $s_w(m)$ and $n_w(m)$, and $S_s(\omega)$, $S_n(\omega)$ and $S_y(\omega)$ denote the short-time spectra of the voice signal, the noise signal and the low-pass-filtered noisy signal respectively;
$y_w(m) = s_w(m) + n_w(m)$ (formula 2)
Taking the Fourier transform of both sides of formula 2 gives:
$Y_w(\omega) = S_w(\omega) + N_w(\omega)$ (formula 3)
where ω in formula 3 denotes frequency;
Taking the power spectrum of both sides of formula 3 gives:
$|Y_w(\omega)|^2 = |S_w(\omega)|^2 + |N_w(\omega)|^2 + S_w(\omega) N_w^*(\omega) + S_w^*(\omega) N_w(\omega)$ (formula 4)
where $N_w^*(\omega)$ represents the power spectrum of the noise signal and $S_w^*(\omega)$ represents the power spectrum of the voice signal;
Estimating the power estimate $|Y_w(\omega)|^2$ of the noisy signal from the noisy signal y(m) by a statistical averaging method, estimating the power estimate $|N_w(\omega)|^2$ of the noise signal by a voice activity detection method, and obtaining the power estimate $|\hat{S}_w(\omega)|^2$ of the voice signal by the formula:
$|\hat{S}_w(\omega)|^2 = |Y_w(\omega)|^2 - E[|N_w(\omega)|^2]$ (formula 5)
where $E[|N_w(\omega)|^2] = |N_w(\omega)|^2 + S_w(\omega) N_w^*(\omega) + S_w^*(\omega) N_w(\omega) = |N_w(\omega)|^2$. Because s(m) and n(m) are independent, the statistical mean of their cross terms is 0;
The denoised voice signal acquisition module is specifically configured to let the enhanced voice signal be given by:
(formula 7)
where the phase term in formula 7 represents the phase of the voice signal, and IFFT denotes the inverse Fourier transform.
9. The device for performing adaptive spectral subtraction and wavelet packet denoising on a voice signal according to claim 8, characterized in that:
The wavelet transform processing module is specifically configured to treat the enhanced voice signal as a noisy signal and to perform wavelet decomposition on this noisy signal; letting s, n and y denote the voice signal, the noise signal and the wavelet-decomposed noisy signal respectively, the additive model of the enhanced voice signal is obtained:
$y = s + n$ (formula 8)
Applying the wavelet transform to both sides of formula 8 gives:
$Y = S + N$ (formula 9)
where $S = Ws$, $N = Wn$, and W is the chosen wavelet transform matrix, $W(j,k) = \int_{R} x(t)\, \overline{\psi_{j,k}(t)}\, dt$
where $\psi_{j,k}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t}{2^j} - k\right)$, $j, k \in Z$
ψ(t) denotes the wavelet function, R denotes the real number field, t denotes time, and x(t) denotes the signal to be transformed;
The denoised voice signal acquisition module is specifically configured to select a threshold of the form:
$\eta_{t_n} = \sigma \sqrt{2 \log n}$
where σ denotes the estimated noise standard deviation of the subband and n denotes the subband sequence length;
and to apply the inverse transform $W^{-1}$ to the thresholded wavelet transform matrix to obtain the voice signal $\hat{s}$ with the noise eliminated:
$\hat{s} = W^{-1} \cdot \eta_{t_n} \cdot W \cdot d$.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410390560.2A CN104269178A (en) 2014-08-08 2014-08-08 Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410390560.2A CN104269178A (en) 2014-08-08 2014-08-08 Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals

Publications (1)

Publication Number Publication Date
CN104269178A true CN104269178A (en) 2015-01-07

Family

ID=52160693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410390560.2A Pending CN104269178A (en) 2014-08-08 2014-08-08 Method and device for conducting self-adaption spectrum reduction and wavelet packet noise elimination processing on voice signals

Country Status (1)

Country Link
CN (1) CN104269178A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106303357A (en) * 2016-08-30 2017-01-04 福州瑞芯微电子股份有限公司 Video call method and system with far-field speech enhancement
CN106531174A (en) * 2016-11-27 2017-03-22 福州大学 Animal sound recognition method based on wavelet packet decomposition and spectrogram features
CN108198545A (en) * 2017-12-19 2018-06-22 安徽建筑大学 A kind of audio recognition method based on wavelet transformation
CN109643554A (en) * 2018-11-28 2019-04-16 深圳市汇顶科技股份有限公司 Adaptive voice Enhancement Method and electronic equipment
CN110033757A (en) * 2019-04-04 2019-07-19 行知技术有限公司 A kind of voice recognizer
CN110808059A (en) * 2019-10-10 2020-02-18 天津大学 Speech noise reduction method based on spectral subtraction and wavelet transform
CN111110469A (en) * 2019-12-13 2020-05-08 南方医科大学南方医院 Multifunctional nursing bed
CN111226277A (en) * 2017-12-18 2020-06-02 华为技术有限公司 Voice enhancement method and device
CN111641572A (en) * 2020-05-22 2020-09-08 Oppo广东移动通信有限公司 Noise power evaluation method and device and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1133152C (en) * 1999-04-19 2003-12-31 摩托罗拉公司 Noise suppression using external voice activity detection
CN1838839A (en) * 2006-04-07 2006-09-27 清华大学 Loudspeaker simple tone detecting method
CN101320559A (en) * 2007-06-07 2008-12-10 华为技术有限公司 Sound activation detection apparatus and method
CN101625869A (en) * 2009-08-11 2010-01-13 中国人民解放军第四军医大学 Non-air conduction speech enhancement method based on wavelet-packet energy
US20100023327A1 (en) * 2006-11-21 2010-01-28 Iucf-Hyu (Industry-University Cooperation Foundation Hanyang University Method for improving speech signal non-linear overweighting gain in wavelet packet transform domain
CN101976566A (en) * 2010-07-09 2011-02-16 瑞声声学科技(深圳)有限公司 Voice enhancement method and device using same
CN102832908A (en) * 2012-09-20 2012-12-19 西安科技大学 Wavelet transform and variable-step-size LMS (least mean square) adaptive filtering based signal denoising method
CN103594094A (en) * 2012-08-15 2014-02-19 王景芳 Self-adaptive spectral subtraction real-time speech enhancement
CN103903629A (en) * 2012-12-28 2014-07-02 联芯科技有限公司 Noise estimation method and device based on hidden Markov model

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1133152C (en) * 1999-04-19 2003-12-31 摩托罗拉公司 Noise suppression using external voice activity detection
CN1838839A (en) * 2006-04-07 2006-09-27 清华大学 Loudspeaker simple tone detecting method
US20100023327A1 (en) * 2006-11-21 2010-01-28 Iucf-Hyu (Industry-University Cooperation Foundation Hanyang University Method for improving speech signal non-linear overweighting gain in wavelet packet transform domain
CN101320559A (en) * 2007-06-07 2008-12-10 华为技术有限公司 Sound activation detection apparatus and method
CN101625869A (en) * 2009-08-11 2010-01-13 中国人民解放军第四军医大学 Non-air conduction speech enhancement method based on wavelet-packet energy
CN101976566A (en) * 2010-07-09 2011-02-16 瑞声声学科技(深圳)有限公司 Voice enhancement method and device using same
CN103594094A (en) * 2012-08-15 2014-02-19 王景芳 Self-adaptive spectral subtraction real-time speech enhancement
CN102832908A (en) * 2012-09-20 2012-12-19 西安科技大学 Wavelet transform and variable-step-size LMS (least mean square) adaptive filtering based signal denoising method
CN103903629A (en) * 2012-12-28 2014-07-02 联芯科技有限公司 Noise estimation method and device based on hidden Markov model

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
冯玉亮: "《语音增强方法研究》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
孙晋松: "《语音增强算法的研究及改进》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
张洪涛 等: "《谱减法与小波变换联合语音增强研究》", 《湖北工业大学学报》 *
朱建华 等: "《基于谱减法和小波阈值的新型语音降噪算法研究》", 《煤炭技术》 *
田玉静 等: "《Bark子带小波包自适应阈值语音去噪方法》", 《计算机应用》 *
田玉静 等: "《小波包自适应阈值语音降噪新算法》", 《应用声学》 *
赵洋: "《基于正交小波变换和谱减法的语音增强研究》", 《科技创新导报》 *
闵姝君: "《基于自适应谱估计的语音增强算法研究及应用》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
陈国明: "《基于人耳掩蔽效应的语音增强算法研究》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
靳晨升: "《语音增强算法的研究》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
马明: "《基于谱减法和小波阈值分解的联合消噪算法研究》", 《无线互联科技》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106303357A (en) * 2016-08-30 2017-01-04 福州瑞芯微电子股份有限公司 Video call method and system with far-field speech enhancement
CN106303357B (en) * 2016-08-30 2019-11-08 福州瑞芯微电子股份有限公司 Video call method and system with far-field speech enhancement
CN106531174A (en) * 2016-11-27 2017-03-22 福州大学 Animal sound recognition method based on wavelet packet decomposition and spectrogram features
CN111226277A (en) * 2017-12-18 2020-06-02 华为技术有限公司 Voice enhancement method and device
CN108198545A (en) * 2017-12-19 2018-06-22 安徽建筑大学 A kind of audio recognition method based on wavelet transformation
CN108198545B (en) * 2017-12-19 2021-11-02 安徽建筑大学 Speech recognition method based on wavelet transformation
CN109643554A (en) * 2018-11-28 2019-04-16 深圳市汇顶科技股份有限公司 Adaptive voice Enhancement Method and electronic equipment
CN110033757A (en) * 2019-04-04 2019-07-19 行知技术有限公司 A kind of voice recognizer
CN110808059A (en) * 2019-10-10 2020-02-18 天津大学 Speech noise reduction method based on spectral subtraction and wavelet transform
CN111110469A (en) * 2019-12-13 2020-05-08 南方医科大学南方医院 Multifunctional nursing bed
CN111641572A (en) * 2020-05-22 2020-09-08 Oppo广东移动通信有限公司 Noise power evaluation method and device and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150107