CN1212608C - A multichannel speech enhancement method using postfilter - Google Patents

Publication number
CN1212608C
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB031570747A
Other languages
Chinese (zh)
Other versions
CN1523573A (en)
Inventor
杜利民
阎兆立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS

Abstract

The present invention discloses a speech enhancement method that uses a postfilter to enhance multichannel speech signals. The method comprises the following steps: 1. the time delay of the speech signal in each channel is calculated; 2. the channel signals are aligned in the time domain by delay compensation; 3. a beamformer forms a beam from the signals of all channels; 4. the auto-power spectrum of the clean speech signal and the auto-power spectrum of the noisy signal are estimated, and the frequency response function of a Wiener filter is obtained from them, wherein the estimate of the clean speech auto-power spectrum is obtained by subtracting the estimated noise cross-power spectrum from the estimated cross-power spectrum of the noisy signals; 5. the Wiener postfilter filters the output beam of the beamformer, thereby enhancing the speech. Because the invention takes the correlation between the noise in different channels into account, it matches actual conditions more closely, removes noise effectively, especially in the low-frequency band, and improves the speech enhancement result.

Description

A multichannel speech enhancement method using a postfilter
Technical field
The present invention relates to the field of computer speech signal processing and, more particularly, to a multichannel speech enhancement method that uses a postfilter.
Background art
Speech enhancement is a selective signal processing technique. It mainly addresses the problem of extracting a target speech signal that is as clean as possible from speech contaminated in various ways. Its purpose is to improve the perceived quality and intelligibility of speech for applications such as communications, hearing aids, monitoring, and audio-visual conferencing. In addition, as speech recognition technology has developed, very high recognition rates can be reached in quiet environments, but the recognition rate degrades severely in noise. Speech enhancement as a front-end processing stage for speech recognition is therefore a very active and important research direction worldwide.
Depending on the number of microphones used to pick up the speech signal, speech enhancement falls into two types: single-channel and multichannel. A single-channel system needs only one microphone, has low hardware requirements and low algorithmic complexity, but offers limited noise reduction. A multichannel system uses a microphone array; the multichannel signal contains rich spatial and temporal information and leaves more room for performance gains. Microphone-array speech enhancement has therefore been a research focus since the 1990s.
A typical workflow of a multichannel speech enhancement method using a microphone array can be summarized as follows:
1) First, a time delay estimation algorithm (such as the generalized cross-correlation function or an adaptive time delay estimation algorithm) is used to obtain the time delay of the speech signal between channels. Accurate delay estimation is the foundation of multichannel speech enhancement.
2) The channel signals are then aligned in the time domain by delay compensation.
3) A beamformer forms a beam from the signals of all channels.
4) A postfilter (a Wiener filter) filters the output beam of the beamformer to enhance the speech.
In step 4), filtering the beamformer output requires the frequency response function of the Wiener filter.
First, the microphone signals $x_i(t)$ and $x_j(t)$ before delay removal are modeled as the sum of the sound source $s(t)$ and additive noise $n(t)$:
$$x_i(t) = s(t - \tau_i) + n_i(t) \tag{1}$$
$$x_j(t) = s(t - \tau_j) + n_j(t) \tag{2}$$
where $i$ and $j$ are the microphone (channel) indices, and $\tau_i$, $\tau_j$ are the propagation times (that is, the time delays) from the sound source to the microphones.
The frequency response function of the Wiener filter has the form:
$$H(f) = \frac{\phi_{ss}(f)}{\phi_{xx}(f)} \tag{3}$$
where $\phi_{ss}(f)$ is the auto-power spectrum of the ideal clean speech signal $s(t)$ and $\phi_{xx}(f)$ is the auto-power spectrum of the noisy signal $s(t) + n(t)$. The auto-power spectrum of the noisy signal can be computed directly from the measured microphone signal, but the auto-power spectrum of the clean speech signal cannot be known a priori, especially since speech is non-stationary and its power spectrum changes constantly. The key to the Wiener filter is therefore to estimate, as accurately as possible, the power spectrum of the clean speech contained in the noisy signal of each channel, and to derive the Wiener filter frequency response function from it. Zelinski made good use of multichannel information to address this problem. He first assumed:
1. The signal and the background noise are uncorrelated.
2. The noise recorded in different channels is mutually uncorrelated.
3. The noise power spectrum is the same in every channel.
With the correlation between signal and background noise and the cross-correlation between the channel noises neglected, this gives
$$\phi_{x_i x_j}(f) = \phi_{ss}(f) \tag{4}$$
where $\phi_{x_i x_j}(f)$ is the cross-power spectrum of the noisy signals $x_i$ and $x_j$. Substituting (4) into (3) yields the Wiener filter frequency response function. Averaging the spectral densities over all possible microphone pairs gives a more accurate estimate:
$$\hat{H}(f) = \frac{E\left[\mathcal{R}\left\{\sum_{i=0}^{N-2}\sum_{j=i+1}^{N-1}\hat{\phi}_{x_i x_j}\right\}\right]}{E\left[\sum_{i=0}^{N-1}\phi_{x_i x_i}\right]} \tag{5}$$
where $N$ is the number of channels (microphones) and the operator $\mathcal{R}\{\cdot\}$ takes the real part, because a signal auto-power spectrum must be real.
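The Zelinski estimate in (5) can be sketched in a few lines. The sketch below is an illustration under stated assumptions rather than Zelinski's or the patent's code: the name `zelinski_postfilter` and the array layout are hypothetical, the expectation E[·] is approximated by an average over frames, the pair and channel sums are normalized to averages, and the final clipping of the gain to [0, 1] is an added safeguard that (5) does not mention.

```python
import numpy as np

def zelinski_postfilter(X):
    """X: complex spectra of the time-aligned channels, shape (N, K, F) = (channels, frames, bins).

    Returns a real gain per frequency bin, shape (F,).
    """
    n_ch = X.shape[0]
    cross = np.zeros(X.shape[1:])
    n_pairs = 0
    for i in range(n_ch - 1):
        for j in range(i + 1, n_ch):
            # Real part of the cross-power spectrum of one microphone pair.
            cross += np.real(X[i] * np.conj(X[j]))
            n_pairs += 1
    num = cross.mean(axis=0) / n_pairs            # average over frames and pairs
    den = (np.abs(X) ** 2).mean(axis=(0, 1))      # average auto-power spectrum
    # Clipping to [0, 1] is an assumption of this sketch, not part of (5).
    return np.clip(num / np.maximum(den, 1e-12), 0.0, 1.0)
```

With uncorrelated channel noise of the same power as the speech, the gain tends towards 0.5, which matches (3). When the channel noise is correlated, the noise cross-spectrum leaks into the numerator and the gain is overestimated.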
However, this method rests on the assumption that the noise recorded in different channels is uncorrelated. The cross-correlation of the channel noise is negligible only at high frequencies; at low frequencies it is pronounced and cannot be ignored, which limits the practical usefulness of the method. An algorithm suited to the low-frequency case is therefore needed.
Summary of the invention
The object of the present invention is to overcome the shortcoming that existing multichannel speech enhancement methods are suited only to high frequencies, and to provide a multichannel speech enhancement method using a postfilter that takes the cross-correlation of the noise between channels into account.
To achieve this object, the invention provides a speech enhancement method using a postfilter for enhancing multichannel speech signals, comprising the following steps:
1) Calculate the time delay of the speech signal in each channel.
2) Align the channel signals in the time domain by delay compensation.
3) Form a beam from the signals of all channels with a beamformer.
4) Estimate the auto-power spectrum of the clean speech signal and the auto-power spectrum of the noisy signal, and obtain the frequency response function of the Wiener filter.
The auto-power spectrum of the clean speech signal is obtained as follows:
a) Choose any two of the speech channels as a pair;
b) Estimate the cross-power spectrum of the noisy signals and the cross-power spectrum of the noise between the two channels of the pair;
c) Subtract the noise cross-power spectrum estimate from the noisy-signal cross-power spectrum estimate to obtain an inter-channel estimate of the clean speech auto-power spectrum;
d) Perform operations b) and c) for every possible channel pair in a), then average all the resulting inter-channel estimates; the average is used as the clean speech auto-power spectrum estimate in step 4).
The noisy-signal auto-power spectrum is the average of the noisy-signal auto-power spectra of all channels.
5) Filter the output beam of the beamformer with the Wiener postfilter to enhance the speech.
The multichannel speech signal comprises at least two channels of speech.
To reduce computation, the method may be used to enhance only the low-frequency part of the speech signal, with the high-frequency part still processed by an existing speech enhancement method such as the Zelinski algorithm.
Because the invention takes the correlation between the channel noises into account when obtaining the clean speech auto-power spectrum, it matches actual conditions more closely, removes noise effectively, especially in the low-frequency band, and improves the speech enhancement result.
Brief description of the drawings
Fig. 1 shows an example of enhancing a segment of noisy speech: (a) is the original noisy speech, (b) is the result of enhancement with the Zelinski postfilter, and (c) is the result obtained with the method of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawing and a specific embodiment.
Removing the time delays $\tau_i$, $\tau_j$ from the signal model $x_i(t)$, $x_j(t)$ given by (1) and (2) and then taking the Fourier transform gives
$$\hat{X}_i(f) = S(f) + N_i(f)\, e^{j\frac{2\pi}{W} f \tau_i} \tag{6}$$
$$\hat{X}_j(f) = S(f) + N_j(f)\, e^{j\frac{2\pi}{W} f \tau_j} \tag{7}$$
where $\hat{X}_i(f)$ and $\hat{X}_j(f)$ are the Fourier transforms of the delay-compensated signals $x_i(t + \tau_i)$ and $x_j(t + \tau_j)$, the hat (^) indicating that the signal delay has been removed; $S(f)$ is the Fourier transform of the clean signal; $N_i(f)$ and $N_j(f)$ are the Fourier transforms of the noise; and $W$ is the frame length. From (6) and (7), the cross-power spectrum of the noisy signals is
$$\hat{\phi}_{x_i x_j}(f) = \phi_{ss}(f) + \hat{\phi}_{n_i n_j}(f) \tag{8}$$
where
$$\hat{\phi}_{n_i n_j}(f) = \phi_{n_i n_j}(f)\, e^{j\frac{2\pi}{W} f \tau_{ij}} \tag{9}$$
Here $\hat{\phi}_{x_i x_j}(f)$ is the cross-power spectrum of the noisy signals $x_i(t + \tau_i)$ and $x_j(t + \tau_j)$, $\phi_{ss}(f)$ is the auto-power spectrum of the clean signal, and $\phi_{n_i n_j}(f)$ and $\hat{\phi}_{n_i n_j}(f)$ are the noise cross-power spectra before and after delay removal, respectively. $\tau_{ij} = \tau_i - \tau_j$ is the time delay between the signals of channels $i$ and $j$.
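The step from (6) and (7) to (8) and (9) can be written out explicitly. The derivation below assumes that the cross-power spectrum is defined as the expectation of one spectrum times the complex conjugate of the other, and that the speech is uncorrelated with the noise in every channel, so the two mixed speech-noise terms vanish.

```latex
\begin{aligned}
\hat{\phi}_{x_i x_j}(f)
  &= E\!\left[\hat{X}_i(f)\,\hat{X}_j^{*}(f)\right] \\
  &= E\!\left[|S(f)|^{2}\right]
     + E\!\left[N_i(f)\,N_j^{*}(f)\right] e^{j\frac{2\pi}{W} f (\tau_i - \tau_j)} \\
  &\quad + \underbrace{E\!\left[S(f)\,N_j^{*}(f)\right]}_{=\,0} e^{-j\frac{2\pi}{W} f \tau_j}
     + \underbrace{E\!\left[N_i(f)\,S^{*}(f)\right]}_{=\,0} e^{j\frac{2\pi}{W} f \tau_i} \\
  &= \phi_{ss}(f) + \phi_{n_i n_j}(f)\, e^{j\frac{2\pi}{W} f \tau_{ij}} .
\end{aligned}
```

The phase factor on the noise term is what delay compensation introduces: the speech components are brought into phase, while the noise cross-spectrum is rotated by the inter-channel delay.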
As (8) shows, obtaining the clean-signal auto-power spectrum $\phi_{ss}(f)$ first requires an estimate of the noise cross-power spectrum term, which the prior art neglects. Equation (9) shows that the noise cross-power spectrum $\hat{\phi}_{n_i n_j}(f)$ changes as the time delay $\tau_{ij}$ changes, which is also why simple delay-and-sum with Wiener filtering cannot handle a moving sound source. Based on this analysis, the noise cross-power spectrum can be obtained from:
$$\hat{\phi}'_{n_i n_j}(f) = \phi'_{n_i n_j}(f)\, e^{j\frac{2\pi}{W} f \tau_{ij}} \tag{10}$$
where $\hat{\phi}'_{n_i n_j}(f)$ is the noise cross-power spectrum estimate after delay removal and $\phi'_{n_i n_j}(f)$ is the original noise cross-power spectrum estimate, which can be obtained during speech pauses. The prime (′) denotes an estimated value. From (8) and (10), the clean-signal power spectrum estimate is
$$\phi'_{ss}(f) = \hat{\phi}_{x_i x_j}(f) - \phi'_{n_i n_j}(f)\, e^{j\frac{2\pi}{W} f \tau_{ij}} \tag{11}$$
$\phi'_{ss}(f)$ can also be estimated from the auto-power spectrum of the noisy signal. From (1),
$$\phi_{x_i x_i}(f) = \phi_{ss}(f) + \phi_{n_i n_i}(f) \tag{12}$$
and therefore
$$\phi'_{ss}(f) = \phi_{x_i x_i}(f) - \phi'_{n_i n_i}(f) \tag{13}$$
where $\phi'_{n_i n_i}(f)$ is the noise power spectrum estimate. Averaging the $\phi'_{ss}(f)$ values obtained from (11) and (13) over all microphone combinations improves the estimate of the clean-signal auto-power spectrum and gives the Wiener filter estimate
$$\hat{H} = \frac{\mathcal{R}\left\{E\left[\sum_{i=0}^{N-1}\left(\phi_{x_i x_i} - \phi'_{n_i n_i}\right) + \sum_{i=0}^{N-2}\sum_{j=i+1}^{N-1}\left(\hat{\phi}_{x_i x_j}(f) - \phi'_{n_i n_j}(f)\, e^{j\frac{2\pi}{W} f \tau_{ij}}\right)\right]\right\}}{\mathcal{R}\left\{E\left[\sum_{i=0}^{N-1}\phi_{x_i x_i}\right]\right\}} \tag{14}$$
$\mathcal{R}\{\cdot\}$ denotes taking the real part. Because the signal power spectrum $\phi_{ss}(f)$ can only be a non-negative real number, the estimate is also half-wave rectified to remove any negative values that may occur.
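The combination of the auto-spectrum estimates (13) and the cross-spectrum estimates (11) into the filter (14) can be sketched as below. This is a hedged illustration, not the patent's implementation: the name `correlated_noise_postfilter` and the array layout are hypothetical, the spectra are assumed to have been averaged already, `f` is assumed to be the discrete bin index with delays `tau` in samples, and the sums of (14) are normalized to arithmetic means following the text's description of averaging.

```python
import numpy as np

def correlated_noise_postfilter(phi_xx, phi_nn, phi_xhat, phi_n, tau, W):
    """Wiener gain that subtracts estimated noise auto- and cross-spectra.

    phi_xx   : (N, F) real, noisy-signal auto-power spectra
    phi_nn   : (N, F) real, noise auto-power spectrum estimates
    phi_xhat : (N, N, F) complex, cross-spectra of the delay-compensated signals
    phi_n    : (N, N, F) complex, noise cross-spectra estimated before delay removal
    tau      : (N,) per-channel delays in samples; W : frame length
    """
    n_ch, n_bins = phi_xx.shape
    f = np.arange(n_bins)
    # Per-channel estimates from the auto-spectra, as in (13).
    estimates = [phi_xx[i] - phi_nn[i] for i in range(n_ch)]
    # Per-pair estimates from the cross-spectra, as in (11).
    for i in range(n_ch - 1):
        for j in range(i + 1, n_ch):
            phase = np.exp(1j * 2 * np.pi / W * f * (tau[i] - tau[j]))
            estimates.append(np.real(phi_xhat[i, j] - phi_n[i, j] * phase))
    # Average all estimates, then half-wave rectify (negative values set to zero).
    phi_ss = np.maximum(np.mean(estimates, axis=0), 0.0)
    phi_xx_avg = phi_xx.mean(axis=0)
    return phi_ss / np.maximum(phi_xx_avg, 1e-12)
```

Taking the real part of each pair estimate before averaging is equivalent to taking it once over the whole sum, because the real part is linear and the auto-spectrum terms are already real.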
In a practical implementation, all power spectra are updated with the following recursive formula:
$$\phi_{x_i x_j}(k+1, f) = \alpha\, \phi_{x_i x_j}(k, f) + (1 - \alpha)\, X_i(f)\, X_j^{*}(f), \quad 0 < \alpha \le 1 \tag{15}$$
where $x$ denotes either signal or noise; $\phi_{x_i x_j}(k+1, f)$ is the power spectrum estimate for frame $k+1$ and $\phi_{x_i x_j}(k, f)$ is the estimate for frame $k$. $X(f)$ is the Fourier transform of the signal $x(k)$, and $\alpha$ is a number between 0 and 1 that determines how quickly the power spectrum is updated.
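The recursive update (15) amounts to first-order exponential smoothing of the instantaneous cross-spectrum. A minimal sketch follows; the function name `update_cross_spectrum` and the default `alpha=0.9` are assumptions chosen for illustration, since the patent gives no specific value.

```python
import numpy as np

def update_cross_spectrum(phi_prev, Xi, Xj, alpha=0.9):
    """One step of (15): blend the previous estimate with the current frame's X_i X_j^*."""
    return alpha * phi_prev + (1.0 - alpha) * Xi * np.conj(Xj)

# Usage: smooth over a sequence of frames, starting from zero.
frames_i = np.ones((200, 4), dtype=complex) * (1 + 2j)
frames_j = np.ones((200, 4), dtype=complex) * (3 - 1j)
phi = np.zeros(4, dtype=complex)
for Xi, Xj in zip(frames_i, frames_j):
    phi = update_cross_spectrum(phi, Xi, Xj)
```

Passing the same spectrum for both arguments gives the auto-power spectrum. A value of `alpha` close to 1 gives a smooth but slowly adapting estimate, and `alpha = 1` freezes the estimate entirely.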
The cross-correlation of the channel noise is pronounced only at low frequencies and is essentially negligible at high frequencies. To reduce the computational load sensibly, the filter of (14) can therefore be applied only to the part of the signal below 1 kHz, while the high-frequency part is still processed with Zelinski's algorithm, as in (5).
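The band split described above can be sketched as a per-bin selection between two gain curves. The function name `combine_postfilters` is hypothetical, and the hard switch at the crossover is an assumption of this sketch; the patent states only that the region below 1 kHz uses (14) and the rest uses (5).

```python
import numpy as np

def combine_postfilters(H_low, H_high, freqs_hz, crossover_hz=1000.0):
    """Use H_low for bins strictly below the crossover frequency and H_high elsewhere."""
    return np.where(freqs_hz < crossover_hz, H_low, H_high)

# Usage: bin centre frequencies for a 512-point frame sampled at 16 kHz (assumed values).
freqs = np.arange(257) * (16000.0 / 512)
H = combine_postfilters(np.full(257, 0.2), np.full(257, 0.8), freqs)
```

Only the bins below the crossover then need the noise cross-spectrum estimates, which is where the saving in computation comes from.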
Fig. 1 shows the result for a segment of noisy speech: (a) is the original noisy speech, (b) is the result of enhancement with the Zelinski postfilter, and (c) is the result obtained with the method of the present invention. As the figure shows, the Zelinski postfilter cannot effectively remove the low-frequency noise; because this noise lies below 1 kHz, it cannot be removed by high-pass filtering either. The method of the present invention essentially removes the low-frequency noise.

Claims (3)

1. A speech enhancement method using a postfilter, for enhancing a multichannel speech signal comprising at least two channels of speech, the method comprising the following steps:
1) calculating the time delay of the speech signal in each channel;
2) aligning the channel signals in the time domain by delay compensation;
3) forming a beam from the signals of all channels with a beamformer;
4) estimating the auto-power spectrum of the clean speech signal and the auto-power spectrum of the noisy signal, and obtaining the frequency response function of a Wiener filter;
5) filtering the output beam of the beamformer with the Wiener postfilter to enhance the speech;
characterized in that, in step 4), the auto-power spectrum of the clean speech signal is obtained as follows:
a) choosing any two of the speech channels as a pair;
b) estimating the cross-power spectrum of the noisy signals and the cross-power spectrum of the noise between the two channels of the pair;
c) subtracting the noise cross-power spectrum estimate from the noisy-signal cross-power spectrum estimate to obtain an inter-channel estimate of the clean speech auto-power spectrum;
d) performing operations b) and c) for every possible channel pair in a), then averaging all the resulting inter-channel estimates, the average being used as the clean speech auto-power spectrum estimate in step 4).
2. The speech enhancement method using a postfilter according to claim 1, characterized in that the noisy-signal auto-power spectrum in step 4) is the average of the noisy-signal auto-power spectra of all channels.
3. The speech enhancement method using a postfilter according to claim 1 or 2, characterized in that the method is used to enhance only the low-frequency part of the speech signal.
CNB031570747A 2003-09-12 2003-09-12 A multichannel speech enhancement method using postfilter Expired - Fee Related CN1212608C (en)


Publications (2)

Publication Number Publication Date
CN1523573A CN1523573A (en) 2004-08-25
CN1212608C true CN1212608C (en) 2005-07-27

Family

ID=34287137





Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee