CN105165026B - Filter and method for informed spatial filtering using multiple instantaneous direction-of-arrival estimates - Google Patents

Filter and method for informed spatial filtering using multiple instantaneous direction-of-arrival estimates

Info

Publication number
CN105165026B
CN105165026B CN201380073406.6A
Authority
CN
China
Prior art keywords
information
time
noise
frequency band
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201380073406.6A
Other languages
Chinese (zh)
Other versions
CN105165026A (en)
Inventor
Emanuël Habets
Oliver Thiergart
Sebastian Braun
Maja Taseska
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Original Assignee
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV filed Critical Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Publication of CN105165026A publication Critical patent/CN105165026A/en
Application granted granted Critical
Publication of CN105165026B publication Critical patent/CN105165026B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/18Methods or devices for transmitting, conducting or directing sound
    • G10K11/26Sound-focusing or directing, e.g. scanning
    • G10K11/34Sound-focusing or directing, e.g. scanning using electrical steering of transducer arrays, e.g. beam steering
    • G10K11/341Circuits therefor
    • G10K11/346Circuits therefor using phase variation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/02Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback

Abstract

A filter (100) is provided for generating an audio output signal, comprising a plurality of audio output signal samples, based on two or more input microphone signals. The audio output signal and the two or more input microphone signals are represented in a time-frequency domain, wherein each of the plurality of audio output signal samples is assigned to a time-frequency bin (k, n) of a plurality of time-frequency bins (k, n). The filter (100) comprises a weights generator (110) adapted to receive, for each of the plurality of time-frequency bins (k, n), direction-of-arrival information of one or more sound components of one or more sound sources, or position information of one or more sound sources, and adapted to generate weighting information for each of the plurality of time-frequency bins (k, n) depending on the direction-of-arrival information of the one or more sound components of the one or more sound sources of said time-frequency bin (k, n), or depending on the position information of the one or more sound sources of said time-frequency bin (k, n). Moreover, the filter comprises an output signal generator (120) for generating the audio output signal by generating, for each of the plurality of time-frequency bins (k, n), one of the plurality of audio output signal samples which is assigned to said time-frequency bin (k, n), depending on the weighting information of said time-frequency bin (k, n) and depending on an audio input sample, assigned to said time-frequency bin (k, n), of each of the two or more input microphone signals.

Description

Filter and method for informed spatial filtering using multiple instantaneous direction-of-arrival estimates
Technical field
The present invention relates to audio signal processing and, in particular, to a filter and a method for informed spatial filtering using multiple instantaneous direction-of-arrival estimates.
Background technique
The extraction of sound sources under noisy and reverberant conditions is a common task in modern communication systems. Over the past four decades, a large variety of spatial filtering techniques has been proposed to accomplish this task. Existing spatial filters are optimal when the observed signals conform to the underlying signal model and when the information required to compute the filters is accurate. In practice, however, the signal model is often violated, and estimating the required information remains a major challenge.
Existing spatial filters can broadly be classified into linear spatial filters (see, e.g., [1, 2, 3, 4]) and parametric spatial filters (see, e.g., [5, 6, 7, 8]). In general, linear spatial filters require an estimate of the propagation vector(s) or of the second-order statistics (SOS) of the desired source(s), plus the SOS of the interference. Some spatial filters are designed to extract a single reverberant or dereverberated source signal (see, e.g., [9, 10, 11, 12, 13, 14, 15, 16]), while others are designed to extract the sum of two or more reverberant source signals (see, e.g., [17, 18]). The aforementioned methods require prior knowledge of the direction(s) of the one or more desired sources, or periods in which the desired sources are active individually or simultaneously.
A disadvantage of these methods is that they cannot adapt sufficiently quickly to new situations, for example, moving sources or competing talkers that become active while the desired source is active. Parametric spatial filters are typically based on a relatively simple signal model, e.g., that the received signal in the time-frequency domain consists of a single plane wave plus diffuse sound, and are computed from instantaneous estimates of the model parameters. Advantages of parametric spatial filters are a highly flexible directional response, a comparatively strong suppression of diffuse sound and interferers, and the ability to adapt quickly to new situations. However, as shown in [19], the underlying single-plane-wave signal model can easily be violated in practice, which strongly degrades the performance of parametric spatial filters. It should be noted that state-of-the-art parametric spatial filters use all available microphone signals to estimate the model parameters, while only a single microphone signal and a real-valued gain are used to compute the final output signal. An extension that combines the multiple available microphone signals to obtain an enhanced output signal is not straightforward.
Therefore, improved concepts for obtaining a desired spatial response to sound sources would be highly appreciated.
Summary of the invention
Accordingly, the object of the present invention is to provide improved concepts for extracting sound sources. The object of the present invention is solved by a filter according to claim 1, by a method according to claim 11, and by a computer program according to claim 18.
A filter is provided for generating an audio output signal, comprising a plurality of audio output signal samples, based on two or more input microphone signals. The audio output signal and the two or more input microphone signals are represented in a time-frequency domain, wherein each of the plurality of audio output signal samples is assigned to a time-frequency bin of a plurality of time-frequency bins.
The filter comprises a weights generator adapted to receive, for each of the plurality of time-frequency bins, direction-of-arrival information of one or more sound components of one or more sound sources, or position information of one or more sound sources, and adapted to generate weighting information for each of the plurality of time-frequency bins depending on the direction-of-arrival information of the one or more sound components of the one or more sound sources of said time-frequency bin, or depending on the position information of the one or more sound sources of said time-frequency bin.
Moreover, the filter comprises an output signal generator for generating the audio output signal by generating, for each of the plurality of time-frequency bins, one of the plurality of audio output signal samples which is assigned to said time-frequency bin, depending on the weighting information of said time-frequency bin and depending on an audio input sample, assigned to said time-frequency bin, of each of the two or more input microphone signals.
Embodiments provide a spatial filter for obtaining a desired response for at most L sound sources being simultaneously active. The provided spatial filter is obtained by minimizing the diffuse-plus-noise power at the output of the filter, subject to L linear constraints. In contrast to prior-art concepts, the L constraints are based on instantaneous narrowband direction-of-arrival estimates. In addition, novel estimators for the diffuse-to-noise ratio / diffuse power are provided, which exhibit a sufficiently high temporal and spectral resolution to achieve both dereverberation and noise reduction.
According to some embodiments, concepts are provided for obtaining a desired, arbitrary spatial response for at most L sound sources being simultaneously active per time-frequency instant. For this purpose, instantaneous parametric information (IPI) about the acoustic scene is incorporated into the design of the spatial filter, resulting in an "informed spatial filter".
In some embodiments, this informed spatial filter, for example, combines all available microphone signals based on complex weights to provide an enhanced output signal.
According to embodiments, the informed spatial filter may, for example, be realized as a linearly constrained minimum variance (LCMV) spatial filter or as a parametric multichannel Wiener filter.
In some embodiments, the provided informed spatial filter is, for example, obtained by minimizing the diffuse-plus-self-noise power subject to L linear constraints.
In some embodiments, in contrast to the prior art, the L constraints are based on instantaneous direction-of-arrival (DOA) estimates, and the resulting responses to the L DOAs correspond to specific desired directivities.
Moreover, novel estimators for the required signal and noise statistics, e.g., the diffuse-to-noise ratio (DNR), are provided, which exhibit a sufficiently high temporal and spectral resolution, for example, to reduce both reverberation and noise.
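The closed-form solution of such a linearly constrained minimization is standard. The sketch below is a minimal illustration (not the patent's full estimator chain): it computes LCMV weights w that minimize the undesired-signal power w^H Φ_u w subject to the L DOA constraints A^H w = g, where the columns of A are the propagation vectors of the L estimated DOAs and g holds the desired responses.

```python
import numpy as np

def lcmv_weights(phi_u, A, g):
    """Closed-form LCMV: minimize w^H @ phi_u @ w subject to A^H @ w = g.

    phi_u : (M, M) PSD matrix of the undesired (diffuse + noise) signal
    A     : (M, L) constraint matrix (propagation vectors of the L DOAs)
    g     : (L,)   desired responses toward the L DOAs
    """
    phi_inv_A = np.linalg.solve(phi_u, A)                      # phi_u^{-1} A
    return phi_inv_A @ np.linalg.solve(A.conj().T @ phi_inv_A, g)
```

With g = [1, 0], for instance, the first DOA is passed undistorted while the second is nulled; arbitrary desired directivities are obtained by other choices of g.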
Moreover, a method is provided for generating an audio output signal, comprising a plurality of audio output signal samples, based on two or more input microphone signals. The audio output signal and the two or more input microphone signals are represented in a time-frequency domain, wherein each of the plurality of audio output signal samples is assigned to a time-frequency bin of a plurality of time-frequency bins. The method comprises:
Receiving, for each of the plurality of time-frequency bins (k, n), direction-of-arrival information of one or more sound components of one or more sound sources, or position information of one or more sound sources,
Generating weighting information for each of the plurality of time-frequency bins depending on the direction-of-arrival information of the one or more sound components of the one or more sound sources of said time-frequency bin, or depending on the position information of the one or more sound sources of said time-frequency bin, and
Generating the audio output signal by generating, for each of the plurality of time-frequency bins (k, n), one of the plurality of audio output signal samples which is assigned to said time-frequency bin (k, n), depending on the weighting information of said time-frequency bin (k, n) and depending on an audio input sample, assigned to said time-frequency bin (k, n), of each of the two or more input microphone signals.
Moreover, a computer program is provided which, when executed on a computer or signal processor, implements the above-described method.
Detailed description of the invention
In the following, embodiments of the present invention are described in more detail with reference to the figures, in which:
Fig. 1a illustrates a filter according to an embodiment;
Fig. 1b illustrates a possible application scenario of a filter according to an embodiment;
Fig. 2 illustrates a filter according to an embodiment and a plurality of microphones;
Fig. 3 illustrates a weights generator according to an embodiment;
Fig. 4 illustrates the magnitudes of two example responses according to an embodiment;
Fig. 5 illustrates a weights generator according to another embodiment implementing a linearly constrained minimum variance approach;
Fig. 6 illustrates a weights generator according to a further embodiment implementing a parametric multichannel Wiener filter approach;
Fig. 7 illustrates the true and the estimated diffuse-to-noise ratio as a function of time and frequency;
Fig. 8 illustrates the directivity index and the white noise gain of the compared spatial filters;
Fig. 9 illustrates the estimated directions of arrival and the resulting gains; and
Fig. 10 illustrates an example for the case of stereo loudspeaker reproduction.
Specific embodiment
Fig. 1a illustrates a filter 100 for generating an audio output signal, comprising a plurality of audio output signal samples, based on two or more input microphone signals. The audio output signal and the two or more input microphone signals are represented in a time-frequency domain, wherein each of the plurality of audio output signal samples is assigned to a time-frequency bin (k, n) of a plurality of time-frequency bins (k, n).
The filter 100 comprises a weights generator 110 adapted to receive, for each of the plurality of time-frequency bins (k, n), direction-of-arrival information of one or more sound components of one or more sound sources, or position information of one or more sound sources, and adapted to generate weighting information for each of the plurality of time-frequency bins (k, n) depending on the direction-of-arrival information of the one or more sound components of the one or more sound sources of said time-frequency bin (k, n), or depending on the position information of the one or more sound sources of said time-frequency bin (k, n).
Moreover, the filter comprises an output signal generator 120 for generating the audio output signal by generating, for each of the plurality of time-frequency bins (k, n), one of the plurality of audio output signal samples which is assigned to said time-frequency bin (k, n), depending on the weighting information of said time-frequency bin (k, n) and depending on an audio input sample, assigned to said time-frequency bin (k, n), of each of the two or more input microphone signals.
For example, each of the two or more input microphone signals may comprise a plurality of audio input samples, wherein each audio input sample is assigned to one of the time-frequency bins (k, n). The output signal generator 120 may then be adapted to generate the audio output signal sample, of the plurality of audio output signal samples, that is assigned to said time-frequency bin (k, n), depending on the weighting information of said time-frequency bin (k, n) and depending on one of the audio input samples of each of the two or more input microphone signals, namely the audio input sample, of each of the two or more input microphone signals, that is assigned to said time-frequency bin (k, n).
For each audio output signal sample of each time-frequency bin (k, n) to be generated, the weights generator 110 newly generates individual weighting information. The output signal generator 120 then generates the audio output signal sample of the considered time-frequency bin (k, n) based on the weighting information generated for that bin. In other words, new weighting information is computed by the weights generator 110 for each time-frequency bin for which an audio output signal sample is to be generated.
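This per-bin processing can be sketched as follows. Note that the simple delay-and-sum weight used here (and the `steering_vector` helper) is an illustrative stand-in for the patent's optimal weights computation, chosen only to show that fresh weights are computed for every time-frequency bin (k, n):

```python
import numpy as np

def steering_vector(phi, r, kappa):
    # Phase shifts of a plane wave from azimuth phi across a linear array;
    # r holds the distance of each microphone from the reference microphone.
    return np.exp(1j * kappa * np.asarray(r) * np.sin(phi))

def filter_per_bin(X, doa, r, kappas):
    """X: (M, K, N) TF-domain microphone signals, doa: (K, N) azimuth per bin.
    For every bin (k, n), new weights are generated (here: delay-and-sum as a
    stand-in) and applied as Y(k, n) = w^H(k, n) x(k, n)."""
    M, K, N = X.shape
    Y = np.zeros((K, N), dtype=complex)
    for k in range(K):
        for n in range(N):
            a = steering_vector(doa[k, n], r, kappas[k])
            w = a / M                       # fresh weights for this (k, n)
            Y[k, n] = w.conj() @ X[:, k, n]
    return Y
```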
When generating the weighting information, the weights generator 110 is adapted to take information on one or more sound sources into account.
For example, the weights generator 110 may take the position of a first sound source into account. In embodiments, the weights generator may also take the position of a second sound source into account.
Alternatively, for example, a first sound source may emit a first sound wave comprising a first sound component. The first sound wave comprising the first sound component arrives at a microphone, and the weights generator 110 may take the direction of arrival of the first sound component / of the sound wave into account. Thereby, the weights generator 110 takes information on the first sound source into account. Moreover, a second sound source may emit a second sound wave comprising a second sound component. The second sound wave comprising the second sound component arrives at the microphone, and the weights generator 110 may take the direction of arrival of the second sound component / of the second sound wave into account. Thereby, the weights generator 110 also takes information on the second sound source into account.
Fig. 1b illustrates a possible application scenario of a filter 100 according to an embodiment. A first sound wave comprising a first sound component is emitted by a first loudspeaker 121 (a first sound source) and arrives at a first microphone 111. The direction of arrival of the first sound component at the first microphone 111 (= the direction of arrival of the first sound wave) is taken into account. Moreover, a second sound wave comprising a second sound component is emitted by a second loudspeaker 122 (a second sound source) and arrives at the first microphone 111. The weights generator 110 may also take the direction of arrival of the second sound component at the first microphone 111 into account to determine the weighting information. Moreover, the weights generator may also take the directions of arrival of the sound components at other microphones (e.g., at the microphone 112) into account (= the directions of arrival of the sound waves) to determine the weighting information.
It should be noted that a sound source may, for example, be a physical sound source that physically exists in the environment, for example, a loudspeaker, a musical instrument, or a person speaking.
It should be noted, however, that mirror image sources are also sound sources. For example, a sound wave emitted by the loudspeaker 122 may be reflected by a wall 125, and the sound wave then appears to be emitted from a position 123 that differs from the position of the loudspeaker that actually emitted the sound wave. Such a mirror image source 123 is also considered a sound source. The weights generator 110 may be adapted to generate the weighting information depending on direction-of-arrival information relating to a mirror image source, or depending on position information on one, two, or more mirror image sources.
Fig. 2 illustrates a filter 100 according to an embodiment and a plurality of microphones 111, 112, 113, ..., 11n. In the embodiment of Fig. 2, the filter 100 further comprises a filter bank 101. Moreover, in the embodiment of Fig. 2, the weights generator 110 comprises an information computation module 102, a weights computation module 103, and a transfer function selection module 104.
The processing is carried out in the time-frequency domain, where k denotes the frequency index and n denotes the time index. The M time-domain microphone signals x_{1...M}(t) from the microphones 111, 112, 113, ..., 11n are the input to the filter 100 and are transformed into the time-frequency domain by the filter bank 101. The transformed microphone signals are given by the vector:
x(k, n) = [X_1(k, n)  X_2(k, n)  ...  X_M(k, n)]^T
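A filter bank of this kind can be sketched as a simple windowed STFT. The frame length, hop size, and Hann window below are illustrative assumptions, not values prescribed by the patent:

```python
import numpy as np

def filterbank(x, frame_len=512, hop=256):
    """Sketch of filter bank 101: transforms M time-domain microphone signals
    x of shape (M, T) into TF-domain vectors x(k, n) of shape (M, K, N)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    frames = np.stack(
        [x[:, i * hop:i * hop + frame_len] * win for i in range(n_frames)],
        axis=2)                                    # (M, frame_len, N)
    return np.fft.rfft(frames, axis=1)             # (M, K, N), K = frame_len//2 + 1
```

Slicing the result as `X[:, k, n]` then yields exactly the vector x(k, n) above.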
The filter 100 outputs a desired signal Y(k, n) (the audio output signal). The audio output signal (desired signal) Y(k, n) may, for example, represent an enhanced signal for monaural reproduction, a headphone signal for binaural sound reproduction, or loudspeaker signals for spatial sound reproduction with an arbitrary loudspeaker setup.
The desired signal Y(k, n) is generated by the output signal generator 120, for example, as a linear combination of the M microphone signals x(k, n) using the instantaneous complex weights w(k, n) = [W_1(k, n)  W_2(k, n)  ...  W_M(k, n)]^T, for example, according to:
Y(k, n) = w^H(k, n) x(k, n)   (1)
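For a single time-frequency bin, this linear combination is a one-liner; the sketch below merely restates formula (1), and mainly serves to flag that the weight vector must be conjugated (the Hermitian transpose):

```python
import numpy as np

def apply_weights(w, x):
    """Formula (1) for one bin: Y(k, n) = w^H(k, n) x(k, n).
    np.vdot conjugates its first argument, giving the Hermitian inner product."""
    return np.vdot(w, x)
```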
The weights w(k, n) are determined by the weights computation module 103. For each k and each n, the weights w(k, n) are newly determined. In other words, the weights w(k, n) are determined for each time-frequency bin (k, n). More specifically, the weights w(k, n) are, for example, computed based on instantaneous parametric information (IPI) I(k, n) and based on a corresponding desired transfer function G(k, n).
The information computation module 102 is configured to compute the IPI I(k, n) from the microphone signals x(k, n). The IPI describes specific characteristics of the signal and noise components contained in the microphone signals x(k, n) for the given time-frequency instant (k, n).
Fig. 3 illustrates a weights generator 110 according to an embodiment. The weights generator 110 comprises an information computation module 102, a weights computation module 103, and a transfer function selection module 104.
As shown in the example in Fig. 3, the IPI primarily comprises the instantaneous directions of arrival (DOAs) of one or more directional sound components (e.g., plane waves), computed, for example, by a DOA estimation module 201.
As explained below, the DOA information may, for example, be represented as a spatial frequency (e.g., by μ[k | φ(k, n)]), as phase shifts, as time delays between the microphones, as a propagation vector (e.g., by a[k | φ(k, n)]), as an interaural level difference (ILD), as an interaural time difference (ITD), or as angles (e.g., by [azimuth φ(k, n), elevation θ(k, n)]).
Moreover, the IPI I(k, n) may, for example, comprise further information, for example, second-order statistics (SOS) of the signal or noise components.
In embodiments, the weights generator 110 is adapted to generate the weighting information for each of the plurality of time-frequency bins (k, n) depending on statistical information on signal or noise components of the two or more input microphone signals. For example, this statistical information is the second-order statistics referred to here. For example, the statistical information may be a power of a noise component, signal-to-diffuse information, signal-to-noise information, diffuse-to-noise information, a power of a signal component, a power of a diffuse component, or a power spectral density matrix of a signal component or of a noise component of the two or more input microphone signals.
The second-order statistics may be computed by a statistics computation module 205. This second-order-statistics information may, for example, comprise the power of a stationary noise component (e.g., self-noise), the power of a non-stationary noise component (e.g., diffuse noise), the signal-to-diffuse ratio (SDR), the signal-to-noise ratio (SNR), or the diffuse-to-noise ratio (DNR). This information allows the optimal weights w(k, n) to be computed depending on a specific optimization criterion.
A "stationary noise component" / "slowly time-varying noise component" is, for example, a noise component with statistical properties that do not change, or change only slowly, over time.
A "non-stationary noise component" is, for example, a noise component with statistical properties that change quickly over time.
In embodiments, the weights generator 110 is adapted to generate the weighting information for each of the plurality of time-frequency bins (k, n) depending on first noise information representing information on a first noise component of the two or more input microphone signals and depending on second noise information representing information on a second noise component of the two or more input microphone signals.
For example, the first noise component may be a non-stationary noise component, and the first noise information may be information on the non-stationary noise component.
For example, the second noise component may be a stationary / slowly time-varying noise component, and the second noise information may be information on the stationary / slowly time-varying noise component.
In embodiments, the weights generator 110 is configured to generate the first noise information (e.g., the information on the non-stationary / not slowly time-varying noise component) by employing, for example, predefined statistical information (e.g., information on a spatial coherence between the two or more input microphone signals resulting from the non-stationary noise component), and the weights generator 110 is configured to generate the second noise information (e.g., the information on the stationary / slowly time-varying noise component) without employing the statistical information.
Regarding quickly changing noise components, the input microphone signals alone do not provide enough information to determine the information on such a noise component. For example, statistical information is additionally needed to determine the information on a quickly changing noise component.
However, regarding noise components that do not change, or do not change quickly, no statistical information is needed to determine the information on these noise components. Instead, evaluating the microphone signals is sufficient.
It should be noted that the statistical information may be computed employing the estimated DOA information, as shown in Fig. 3. It should further be noted that the IPI may also be provided externally. For example, the DOA of the sound (respectively, the position of the sound source) may be determined by a video camera together with a face recognition algorithm, assuming that human talkers form the sound scene.
The transfer function selection module 104 is configured to provide a transfer function G(k, n). The (potentially complex-valued) transfer function G(k, n) of Fig. 2 and Fig. 3 describes the desired response of the system, given the (e.g., current parametric) IPI I(k, n). For example, G(k, n) may describe an arbitrary pick-up pattern of a desired spatial microphone for signal enhancement in monaural reproduction, a DOA-dependent loudspeaker gain for loudspeaker reproduction, or a head-related transfer function (HRTF) for binaural reproduction.
It should be noted that, in general, the statistics of the recorded sound scene vary quickly across time and frequency. Consequently, the IPI I(k, n) and the corresponding optimal weights w(k, n) are valid only for a specific time-frequency index and are, therefore, recomputed for each k and n. Thus, the system can adapt instantly to the current recording situation.
It should further be noted that the M input microphones may either form a single microphone array, or they may be distributed at different positions to form multiple arrays. Moreover, the IPI I(k, n) may comprise position information instead of DOA information, e.g., the positions of the sound sources in a three-dimensional space. Thereby, spatial filters can be defined which do not merely filter specific directions as desired, but which filter three-dimensional spatial regions of the recording scene.
All explanations provided with respect to DOAs apply equally when position information of the sound sources is available. For example, the position information may be represented by a DOA (an angle) and a distance. When this position representation is employed, the DOA can be obtained immediately from the position information. Alternatively, the position information may, for example, be described by x, y, z coordinates. Then, the DOA can easily be computed based on the position information of the sound source and based on the position of the microphone that records the respective input microphone signal.
In the following, further embodiments are described.
Some embodiments allow spatially selective sound recording with dereverberation and noise reduction. In this context, embodiments are provided that apply spatial filtering for signal enhancement in terms of source extraction, dereverberation, and noise reduction. The aim of such embodiments is to compute a signal Y(k, n) corresponding to the output of a directional microphone with an arbitrary pick-up pattern. This means that directional sound (e.g., a single plane wave) is attenuated or preserved as desired, depending on its DOA, while diffuse sound or microphone self-noise is suppressed. According to embodiments, the provided spatial filter combines the benefits of state-of-the-art spatial filters; in particular, it provides a high directivity index (DI) in situations with high DNR, and a high white noise gain (WNG) otherwise. According to some embodiments, the spatial filter may be subject only to linear constraints, which allows a fast computation of the weights. For example, the transfer function G(k, n) of Fig. 2 and Fig. 3 may represent the desired pick-up pattern of the directional microphone.
In the following, the problem formulation is provided first. Then, embodiments of the weights computation module 103 and of the IPI computation module 102 are provided that realize spatially selective sound recording with dereverberation and noise reduction. Moreover, embodiments of the corresponding transfer function selection module 104 are described.
First, the problem formulation is provided. Consider an array of M omnidirectional microphones located at d_{1...M}. For each (k, n), it is assumed that the sound field is composed of L < M plane waves (directional sound) propagating in an isotropic and spatially homogeneous diffuse sound field. The microphone signals x(k, n) can be written as:
x(k, n) = Σ_{l=1}^{L} x_l(k, n) + x_d(k, n) + x_n(k, n)   (2)
where x_l(k, n) = [X_l(k, n, d_1) ... X_l(k, n, d_M)]^T comprises the microphone signals that are proportional to the sound pressure of the l-th plane wave, x_d(k, n) is the measured non-stationary noise (e.g., diffuse sound), and x_n(k, n) is the stationary / slowly time-varying noise (e.g., microphone self-noise).
Assuming that the three components in formula (2) are mutually uncorrelated, the power spectral density (PSD) matrix of the microphone signals can be written as

Φ(k,n) = E{x(k,n) x^H(k,n)} = Σ_{l=1}^{L} Φ_l(k,n) + Φ_d(k,n) + Φ_n(k,n),  (3)

with

Φ_d(k,n) = φ_d(k,n) Γ_d(k),  (4)
where Φ_n(k,n) is the PSD matrix of the stationary/slowly time-varying noise and φ_d(k,n) is the expected power of the non-stationary noise, which can vary rapidly across time and frequency. The ij-th element γ_ij(k) of the coherence matrix Γ_d(k) is the coherence between microphones i and j resulting from the non-stationary noise. For example, for a spherically isotropic diffuse field, γ_ij(k) = sinc(κ r_ij) [20], with wavenumber κ and r_ij = ||d_j − d_i||. The ij-th element of the coherence matrix Γ_n(k) is the coherence between microphones i and j resulting from the stationary/slowly time-varying noise. For microphone self-noise, Φ_n(k,n) = φ_n(k,n) I, where I is the identity matrix and φ_n(k,n) is the expected power of the self-noise.
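By way of illustration, the coherence matrix Γ_d(k) of a spherically isotropic diffuse field can be computed from the microphone geometry alone via γ_ij(k) = sinc(κ r_ij). The following sketch uses Python/NumPy; the array geometry and frequency are illustrative values, not taken from the text.

```python
import numpy as np

def diffuse_coherence(mic_pos, freq, c=343.0):
    """Coherence matrix of a spherically isotropic diffuse field:
    gamma_ij(k) = sinc(kappa * r_ij), with kappa = 2*pi*f/c."""
    kappa = 2.0 * np.pi * freq / c
    # pairwise distances r_ij = ||d_j - d_i||
    diff = mic_pos[np.newaxis, :, :] - mic_pos[:, np.newaxis, :]
    r = np.linalg.norm(diff, axis=-1)
    # np.sinc(x) = sin(pi x)/(pi x), so pass kappa*r/pi to obtain sin(kr)/(kr)
    return np.sinc(kappa * r / np.pi)

# Example: ULA of M = 4 microphones with 3 cm spacing (as in the experiments)
d = np.array([[0.0, 0, 0], [0.03, 0, 0], [0.06, 0, 0], [0.09, 0, 0]])
Gamma_d = diffuse_coherence(d, freq=1000.0)
```

The matrix is symmetric with unit diagonal, and the coherence decays with increasing microphone distance and frequency.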
The directional sound x_l(k,n) in (2) can be written as

x_l(k,n) = a(k, φ_l) X_l(k,n,d_1),  (5)

where φ_l(k,n) is the azimuth of the DOA of the l-th plane wave (φ = 0 denoting the array broadside direction) and a(k, φ_l) = [a_1(k, φ_l) ... a_M(k, φ_l)]^T is the propagation vector. The i-th element of a,

a_i(k, φ_l) = exp{ j κ r_i sin φ_l },  (6)

describes the phase shift of the l-th plane wave from the first to the i-th microphone. Note that r_i = ||d_i − d_1|| is equal to the distance between the first and the i-th microphone. The angle μ_l = κ sin φ_l is often referred to as the spatial frequency. The DOA of the l-th wave can be represented either by φ_l or by a(k, φ_l).
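The propagation vector of formula (6) can be sketched as follows (Python/NumPy; the DOA, frequency, and array geometry are illustrative values):

```python
import numpy as np

def propagation_vector(phi, r, freq, c=343.0):
    """Propagation vector a(k, phi) of a plane wave for a linear array.
    phi: DOA azimuth in radians (0 = array broadside);
    r: distances r_i = ||d_i - d_1|| of the microphones to the reference."""
    kappa = 2.0 * np.pi * freq / c
    # a_i = exp(j * kappa * r_i * sin(phi)), cf. formula (6)
    return np.exp(1j * kappa * r * np.sin(phi))

r = np.array([0.0, 0.03, 0.06, 0.09])   # ULA, 3 cm spacing
a = propagation_vector(np.deg2rad(30.0), r, freq=2000.0)
```

All elements have unit magnitude; only the inter-microphone phase shift is encoded.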
As outlined above, the aim of embodiments is to filter the microphone signals x(k,n) such that directional sounds arriving from particular spatial regions are attenuated or amplified as desired, while the stationary and non-stationary noise is suppressed. The desired signal can therefore be expressed as

Y(k,n) = Σ_{l=1}^{L} G(k, φ_l) X_l(k,n,d_1),  (7)

where G(k, φ) is an arbitrary (e.g., predefined) real-valued or complex-valued directivity function, which may be frequency dependent.
Fig. 4 relates to a scenario with two arbitrary directivity functions and source positions according to an embodiment. In particular, Fig. 4 shows the magnitudes of two example directivity functions. When the first directivity function is used (see the solid line in Fig. 4), directional sound arriving from outside the depicted spatial window is attenuated by 21 dB, while directional sound from the other directions remains unattenuated. In principle, arbitrary directivities can be designed, even non-smooth ones (see the dashed line in Fig. 4). Moreover, the directivity function can be designed to be time-variant, e.g., in order to extract moving or emerging sound sources once they have been localized.
An estimate of the signal Y(k,n) is obtained by a linear combination of the microphone signals, e.g., via

Ŷ(k,n) = w^H(k,n) x(k,n),  (8)

where w(k,n) is a complex weight vector of length M. The corresponding optimal weight vectors w(k,n) are derived below. In the following, the dependency of the weights w(k,n) on k and n is omitted for brevity.
Two embodiments of the weight computation module 103 of Fig. 2 and Fig. 3 are now described.
From (5) and (7) it follows that w(k,n) must satisfy the linear constraints

w^H(k,n) a(k, φ_l) = G(k, φ_l),  l ∈ {1, ..., L}.  (9)

Moreover, the non-stationary and the stationary/slowly time-varying noise power at the filter output should be reduced as much as possible.
Fig. 5 depicts an embodiment of the present invention applying spatial filtering. In particular, Fig. 5 shows a weight generator 110 according to a further embodiment. Again, the weight generator 110 comprises an information computation module 102, a weight computation module 103, and a transfer function selection module 104.
More specifically, Fig. 5 shows a linearly constrained minimum variance (LCMV) approach. In this embodiment (see Fig. 5), the weights w(k,n) are computed from the IPI I(k,n), which comprises the DOAs of the L plane waves as well as statistical information on the stationary and non-stationary noise. The latter information may comprise the DNR Ψ(k,n), the separate powers φ_n(k,n) and φ_d(k,n) of the two noise components, or the PSD matrices Φ_n and Φ_d of the two noise components.
For example, Φ_d may be regarded as first noise information on a first noise component of the two noise components, and Φ_n may be regarded as second noise information on a second noise component of the two noise components.
For example, the weight generator 110 may be configured to determine the first noise information Φ_d depending on one or more coherences between the first noise components of at least some of the one or more microphone input signals. For instance, the weight generator 110 may be configured to determine the first noise information depending on a coherence matrix Γ_d(k) indicating the coherences resulting from the first noise components of two or more input microphone signals, e.g., by applying formula Φ_d(k,n) = φ_d(k,n) Γ_d(k).
The weights w(k,n) solving the problem in (8) are found by minimizing the sum of the self-noise power (stationary/slowly time-varying noise) and the diffuse sound power (non-stationary noise) at the filter output, i.e.,

w = argmin_w w^H [Φ_d(k,n) + Φ_n(k,n)] w  subject to (9).  (10)
Using (4) and assuming Φ_n(k,n) = φ_n(k,n) I, the optimization problem can be expressed as

w = argmin_w w^H [Ψ(k,n) Γ_d(k) + I] w  subject to (9),  (12)

where

Ψ(k,n) = φ_d(k,n) / φ_n(k,n)  (13)

is the time-varying input DNR at the microphones. Subject to the constraints (9), the solution to (10) and (12) is [21]

w_nd = C^{-1} A [A^H C^{-1} A]^{-1} g,  (15)

with C = Ψ(k,n) Γ_d(k) + I, where A(k,n) = [a(k, φ_1) ... a(k, φ_L)] comprises the DOA information of the L plane waves in terms of the L propagation vectors. The corresponding desired gains are given by

g(k,n) = [G(k, φ_1) ... G(k, φ_L)]^T.  (16)
Embodiments for estimating Ψ(k,n) and the other required IPI are described below.
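A minimal sketch of the weight computation in formula (15) is given below (Python/NumPy). It assumes that the DNR Ψ, the diffuse coherence matrix Γ_d, and the propagation vectors are already available from the estimators described later; all numerical values are illustrative.

```python
import numpy as np

def informed_lcmv_weights(A, g, Gamma_d, psi):
    """Formula (15): w = C^-1 A [A^H C^-1 A]^-1 g with C = psi*Gamma_d + I.
    A: M x L matrix of propagation vectors; g: desired gains G(k, phi_l)."""
    C = psi * Gamma_d + np.eye(A.shape[0])
    CiA = np.linalg.solve(C, A)                        # C^-1 A
    return CiA @ np.linalg.solve(A.conj().T @ CiA, g)  # ... [A^H C^-1 A]^-1 g

# Illustrative setup: M = 4 ULA (3 cm spacing), L = 2 plane waves at +10/-40 deg
kappa = 2 * np.pi * 1500 / 343.0
r = np.array([0.0, 0.03, 0.06, 0.09])
a1 = np.exp(1j * kappa * r * np.sin(np.deg2rad(10.0)))
a2 = np.exp(1j * kappa * r * np.sin(np.deg2rad(-40.0)))
A = np.stack([a1, a2], axis=1)
Gamma_d = np.sinc(kappa * np.abs(r[:, None] - r[None, :]) / np.pi)
g = np.array([1.0, 0.0])      # extract wave 1, cancel wave 2
w = informed_lcmv_weights(A, g, Gamma_d, psi=2.0)
```

The linear constraints (9) are satisfied exactly, i.e., w^H a(k, φ_1) = 1 and w^H a(k, φ_2) = 0, while the weighted sum of diffuse and self-noise power is minimized.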
Other embodiments are based on a parametric multichannel Wiener filter. In this embodiment, as shown in Fig. 6, the IPI further comprises information on the signal statistics, e.g., the signal PSD matrix Φ_s(k,n) comprising the powers of the L plane waves (directional sound). Moreover, optional control parameters λ_1...L(k,n) are considered to control the amount of signal distortion for each of the L plane waves.
Fig. 6 shows an embodiment implementing the spatial filter with a weight generator 110 employing a parametric multichannel Wiener filter. Here, the weight generator 110 comprises an information computation module 102, a weight computation module 103, and a transfer function selection module 104.
The weights w(k,n) are computed via a parametric multichannel Wiener filter approach. The Wiener filter minimizes the residual signal power at the filter output, i.e.,

w = argmin_w E{ |Y(k,n) − Ŷ(k,n)|² }.  (17)

The cost function C(k,n) being minimized can be written as

C(k,n) = [g − A^H(k,n) w]^H Φ_s(k,n) [g − A^H(k,n) w] + w^H Φ_u(k,n) w,  (18)

where Φ_s(k,n) = E{x_s(k,n) x_s^H(k,n)} comprises the PSDs of the directional sounds and x_s(k,n) = [X_1(k,n,d_1) X_2(k,n,d_1) ... X_L(k,n,d_1)]^T comprises the signals proportional to the sound pressures of the L plane waves at the reference microphone. Note that Φ_s(k,n) is a diagonal matrix whose diagonal elements diag{Φ_s(k,n)} = [φ_1(k,n) ... φ_L(k,n)]^T are the powers of the arriving plane waves. A diagonal matrix Λ(k,n) containing the time- and frequency-dependent control parameters diag{Λ} = [λ_1(k,n) λ_2(k,n) ... λ_L(k,n)]^T can be included to control the introduced signal distortion, i.e.,
C_PW(k,n) = [g − A^H(k,n) w]^H Λ(k,n) Φ_s(k,n) [g − A^H(k,n) w] + w^H Φ_u(k,n) w.  (20)
The solution of the minimization problem in (17), with C_PW(k,n) as cost function, is

w = [A Λ(k,n) Φ_s(k,n) A^H + Φ_u]^{-1} A Λ(k,n) Φ_s(k,n) g.  (21)

This is equivalent to

w = Φ_u^{-1} A [Λ^{-1}(k,n) Φ_s^{-1}(k,n) + A^H Φ_u^{-1} A]^{-1} g.  (22)
Note that for Λ^{-1} = 0, the LCMV solution in (14) is obtained, while for Λ^{-1} = I, the multichannel Wiener filter is obtained. Other values of λ_1...L(k,n) allow to control, separately for each source signal, the trade-off between the amount of signal distortion and the amount of residual noise suppression. In general, λ_l(k,n) is therefore defined depending on the available parametric information, i.e.,

λ_l^{-1}(k,n) = f(I(k,n)),  (23)

where f(·) is an arbitrary user-defined function. For example, λ_1...L(k,n) can be selected according to

λ_l^{-1}(k,n) = φ_u(k,n) / [φ_l(k,n) + φ_u(k,n)],  (24)
where φ_l(k,n) is the power of the l-th signal (the l-th plane wave) and φ_u(k,n) = φ_n(k,n) + φ_d(k,n) is the power of the undesired signals (stationary/slowly time-varying noise plus non-stationary noise). The parametric Wiener filter hence depends on statistical information on the signal components of the two or more input microphone signals and, furthermore, on statistical information on the noise components of the two or more input microphone signals.
If source l is strong compared with the noise, λ_l^{-1}(k,n) close to 0 is obtained, meaning that the LCMV solution is obtained (no distortion of the source signal). If the noise is strong compared with the source power, λ_l^{-1}(k,n) close to 1 is obtained, meaning that the parametric Wiener filter is obtained (strong suppression of the noise).
The estimation of Φ_s(k,n) and Φ_u(k,n) is described below.
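A sketch of the parametric multichannel Wiener weights of formula (21) is given below (Python/NumPy). The input matrices are assumed to be available from the estimators described below; all values in the example are illustrative.

```python
import numpy as np

def parametric_mwf_weights(A, g, Phi_s, Phi_u, lam):
    """Formula (21): w = [A Lam Phi_s A^H + Phi_u]^-1 A Lam Phi_s g.
    lam holds lambda_1..L; large lam (i.e., Lam^-1 -> 0) approaches the LCMV
    solution, lam = 1 (Lam^-1 = I) gives the multichannel Wiener filter."""
    LamPhi = np.diag(lam * np.diag(Phi_s))   # Lam Phi_s (both diagonal)
    B = A @ LamPhi @ A.conj().T + Phi_u
    return np.linalg.solve(B, A @ LamPhi @ g)

# Illustrative example: M = 4, L = 2
kappa = 2 * np.pi * 1500 / 343.0
r = np.array([0.0, 0.03, 0.06, 0.09])
A = np.stack([np.exp(1j * kappa * r * np.sin(np.deg2rad(p)))
              for p in (10.0, -40.0)], axis=1)
Phi_s = np.diag([1.0, 0.5])        # plane-wave powers
Phi_u = 0.1 * np.eye(4)            # undesired-signal PSD matrix
g = np.array([1.0, 0.0])
w = parametric_mwf_weights(A, g, Phi_s, Phi_u, lam=np.array([1e6, 1e6]))
```

For very large λ the directional constraints are (nearly) satisfied, which illustrates the stated limit Λ^{-1} → 0 toward the LCMV solution.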
In the following, embodiments of the instantaneous parameter estimation module 102 are described.
Before the weights can be computed, the different IPI need to be estimated. The DOAs of the L plane waves can be obtained in module 201 with well-known narrowband DOA estimators (e.g., ESPRIT [22] or root MUSIC [23]) or with other state-of-the-art estimators. These algorithms can provide, for instance, the azimuths φ_l(k,n), the spatial frequencies μ_l, the phase shifts a_i, or the propagation vectors a(k, φ_l) of one or more waves arriving at the array. DOA estimation is not discussed further here, since it is well known in the art.
In the following, the diffuse-to-noise ratio (DNR) estimation is described, i.e., the estimation of the input DNR Ψ(k,n), which represents a realization of module 202 in Fig. 5. The DNR estimation exploits the DOA information obtained in module 201. To estimate Ψ(k,n), an additional spatial filter can be used which cancels the L plane waves such that only the diffuse sound is captured. The weights of this spatial filter are found, for example, by maximizing the WNG of the array, i.e.,

w_Ψ = argmin_w w^H w,  (26)

subject to

w^H a(k, φ_l) = 0, l ∈ {1, ..., L},  and  w^H a(k, φ_0) = 1.  (27)

The constraint (27) ensures a non-zero weight vector w_Ψ. The propagation vector a(k, φ_0) corresponds to a specific direction φ_0 which is different from the DOAs φ_l of the L plane waves. In the following, φ_0 is chosen as the direction with the maximum distance to all φ_l, i.e.,

φ_0 = argmax_φ min_l |φ − φ_l|.  (28)

Given the weights w_Ψ, the output power of the additional spatial filter is given by

E{ |w_Ψ^H x(k,n)|² } = w_Ψ^H Φ(k,n) w_Ψ.  (29)
The input DNR can now be computed with (13) and (29), i.e.,

Ψ̂(k,n) = [w_Ψ^H Φ(k,n) w_Ψ − φ_n(k,n) w_Ψ^H w_Ψ] / [φ_n(k,n) w_Ψ^H Γ_d(k) w_Ψ].  (30)
The expected power φ_n(k,n) of the microphone self-noise, required for instance to compute (30), can be estimated during silence, assuming that this power is constant or slowly time-varying. Note that, due to the chosen optimization criterion (45), the proposed DNR estimator does not necessarily provide the minimum estimation variance, but it provides correct results.
In the following, the estimation of the non-stationary PSD φ_d(k,n) is discussed, i.e., another realization of module 202 in Fig. 5. The power (PSD) of the non-stationary noise can be estimated with

φ̂_d(k,n) = [w_Ψ^H Φ(k,n) w_Ψ − φ_n(k,n) w_Ψ^H w_Ψ] / [w_Ψ^H Γ_d(k) w_Ψ],  (31)

where w_Ψ is defined in the previous paragraphs. Note that the PSD matrix Φ_n(k,n) of the stationary/slowly time-varying noise can be estimated during silence (i.e., in the absence of the signals and of the non-stationary noise) as

Φ_n(k,n) = E{ x(k,n) x^H(k,n) },  (32)

where the expectation is approximated by averaging over silent frames n. Silent frames can be detected with state-of-the-art methods.
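Formula (32) amounts to averaging the outer products x(k,n) x^H(k,n) over the detected silent frames, as in the following sketch (Python/NumPy; the silence detector itself is assumed to be given):

```python
import numpy as np

def estimate_noise_psd_matrix(X, silent):
    """Approximate Phi_n(k,n) = E{x x^H} per formula (32) by averaging over
    silent frames. X: M x N STFT coefficients of one frequency bin k;
    silent: boolean mask of length N marking the silent frames."""
    Xs = X[:, silent]
    return (Xs @ Xs.conj().T) / Xs.shape[1]

# Illustrative check with synthetic self-noise of unit power
rng = np.random.default_rng(0)
M, N = 4, 4000
X = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
Phi_n = estimate_noise_psd_matrix(X, np.ones(N, dtype=bool))
```

For uncorrelated self-noise the estimate approaches φ_n I, i.e., a scaled identity matrix, as the number of averaged silent frames grows.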
In the following, the estimation of the PSD matrix of the undesired signals is discussed (see module 203). The PSD matrix Φ_u(k,n) of the undesired signals (stationary/slowly time-varying noise plus non-stationary noise) can be obtained by

Φ_u(k,n) = φ_n(k,n) [Ψ(k,n) Γ_d(k) + Γ_n(k)],  (33)

or, more generally, by

Φ_u(k,n) = φ_d(k,n) Γ_d(k) + Φ_n(k,n),  (34)

where Γ_d(k) and Γ_n(k) are available as prior information (see above). As explained above, the DNR Ψ(k,n), the stationary/slowly time-varying noise power φ_n(k,n), and the other required quantities can be computed accordingly. The estimation of Φ_u(k,n) hence exploits the DOA information obtained by module 201.
In the following, the estimation of the signal PSD matrix is described (see module 204). The powers φ_1...L(k,n) of the arriving plane waves, required to compute Φ_s(k,n), can be computed with

φ̂_l(k,n) = w_l^H Φ(k,n) w_l − w_l^H Φ_u(k,n) w_l,  (35)

where the weights w_l suppress all arriving plane waves except the l-th wave, i.e.,

w_l^H a(k, φ_l') = δ_{ll'},  l' ∈ {1, ..., L}.  (36)

For example, w_l = argmin_w w^H w subject to (36). (37) The estimation of Φ_s(k,n) exploits the DOA information obtained in module 201. The required PSD matrix Φ_u(k,n) of the undesired signals can be computed as explained in the previous paragraph.
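The minimum-norm null-steering weights of formulas (36)/(37) admit a closed-form solution, sketched below (Python/NumPy; the DOAs and array geometry are illustrative):

```python
import numpy as np

def null_steering_weights(A, l):
    """Minimum-norm w_l with w_l^H a(k, phi_l') = delta_ll' (formulas 36/37):
    the least-norm solution of A^H w = e_l, i.e., w = A (A^H A)^-1 e_l."""
    L = A.shape[1]
    e = np.zeros(L)
    e[l] = 1.0
    return A @ np.linalg.solve(A.conj().T @ A, e)

kappa = 2 * np.pi * 1500 / 343.0
r = np.array([0.0, 0.03, 0.06, 0.09])
A = np.stack([np.exp(1j * kappa * r * np.sin(np.deg2rad(p)))
              for p in (10.0, -40.0)], axis=1)
w0 = null_steering_weights(A, 0)   # pass wave 1, null wave 2
```

Applying w_0 passes the first plane wave with unit gain while placing a spatial null on the second, so the residual output power isolates φ_1 plus a known undesired-signal contribution, as used in formula (35).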
An embodiment of the transfer function selection module 104 is now described.
In this application, the gains G(k, φ_l) of the corresponding plane waves l can be found from the DOA information φ_l(k,n). Different transfer functions G(k, φ) can be made available to the system, e.g., as user-defined prior information. The gains can also be computed based on the analysis of an image, for instance using the positions of detected faces. Two examples are depicted in Fig. 4. These transfer functions correspond to the desired pick-up patterns of a directional microphone. The transfer function G(k, φ) can be provided, e.g., as a look-up table, i.e., for an estimated φ_l, the corresponding gain G(k, φ_l) is selected from the look-up table. Note that the transfer function can also be defined as a function of the spatial frequency μ_l instead of the azimuth φ_l, i.e., as G(k, μ) instead of G(k, φ). The gains can also be computed based on source position information instead of DOA information.
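A look-up-table realization of the transfer function G(k, φ) can be sketched as follows (Python/NumPy). The concrete spatial window (0 dB inside ±45°, 21 dB attenuation outside) is a hypothetical example in the spirit of Fig. 4, not the exact curve of the figure:

```python
import numpy as np

def make_gain_lookup(phi_grid_deg, gains):
    """Transfer function G(k, phi) as a look-up table: for an estimated DOA,
    the nearest tabulated gain is returned."""
    phi_grid = np.asarray(phi_grid_deg, dtype=float)
    gains = np.asarray(gains, dtype=float)
    def lookup(phi_deg):
        return gains[np.argmin(np.abs(phi_grid - phi_deg))]
    return lookup

# Example table: pass band from -45 to +45 deg, 21 dB attenuation elsewhere
grid = np.arange(-90, 91, 1)
table = np.where(np.abs(grid) <= 45, 1.0, 10.0 ** (-21.0 / 20.0))
G = make_gain_lookup(grid, table)
```

For each estimated DOA, the look-up simply selects the corresponding gain; a table indexed by spatial frequency μ instead of azimuth φ works the same way.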
Experimental results are now provided. The following simulation results illustrate the practical applicability of the embodiments described above. The proposed system is compared with prior-art systems, which are explained first. The experimental setup is then discussed and the results are presented.
First, existing spatial filters are considered. Although the PSD φ_n(k,n) can be estimated during silence, φ_d(k,n) is usually assumed to be unknown and inaccessible. Therefore, two existing spatial filters that can be computed without this knowledge are considered.
The first spatial filter is known as the delay-and-sum beamformer. It minimizes the self-noise power at the filter output, i.e., it maximizes the WNG [1]. The optimal weight vector that minimizes the mean squared error (MSE) between (7) and (8), subject to (9), is then obtained by

w_n = argmin_w w^H w  subject to (9).  (38)

A closed-form solution to (38) exists [1], which allows a fast computation of w_n. Note that this filter does not necessarily provide the maximum DI.
The second spatial filter is known as the robust superdirective (SD) beamformer. It minimizes the diffuse sound power at the filter output, i.e., it maximizes the DI, with a lower bound on the WNG [24]. The lower bound on the WNG increases the robustness against errors in the propagation vectors and limits the amplification of the self-noise [24]. The optimal weight vector that minimizes the MSE between (7) and (8) subject to (9), while satisfying the lower bound on the WNG, is then obtained by

w_d = argmin_w w^H Γ_d(k) w  subject to (9)  (39)

and subject to the quadratic constraint w^H w < β. The parameter β^{-1} defines the minimum WNG and determines the achievable DI of the filter. In practice, it is often difficult to find an optimal trade-off between a sufficient WNG in low-SNR situations and a sufficiently high DI in high-SNR situations. Moreover, solving (39) is time-consuming, since the quadratic constraint leads to a non-convex optimization problem. This is especially problematic here, since the time-varying constraints (9) require the complex weight vector to be recomputed for each k and n.
The experimental setup is now considered. Assuming L = 2 plane waves in the model in (2), a uniform linear array (ULA) of M = 4 microphones with an inter-microphone spacing of 3 cm is used. A shoebox room (7.0 × 5.4 × 2.4 m³, RT60 ≈ 380 ms) is simulated using the source-image method [25, 26], with the two sound sources located at azimuths φ_A and φ_B, respectively (distance 1.75 m, cf. Fig. 4). The signals consist of 0.6 s of silence followed by double talk. White Gaussian noise is added to the microphone signals, resulting in a segmental signal-to-noise ratio (SSNR) of 26 dB. The sound is sampled at 16 kHz and transformed into the time-frequency domain using a 512-point STFT with 50% overlap.
The first directivity function of Fig. 4 is considered, i.e., source A shall be extracted without distortion while the power of source B shall be attenuated by 21 dB. The two spatial filters above and the provided spatial filter are compared. For the robust SD beamformer (39), the minimum WNG is set to −12 dB. For the provided spatial filter (12), the DNR Ψ(k,n) is estimated as explained above. The self-noise power φ_n(k,n) is computed from the silent part at the beginning of the signal. The expectation in (3) is approximated by a recursive temporal average over τ = 50 ms.
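The recursive approximation of the expectation can be sketched as follows (Python/NumPy). The relation α = exp(−R/(f_s τ)) between the averaging factor and the time constant τ is a common first-order recursion and is stated here as an assumption; the frame shift and τ match the values given in the text:

```python
import numpy as np

def recursive_psd(X, alpha):
    """Phi_hat(n) = alpha * Phi_hat(n-1) + (1 - alpha) * x(n) x(n)^H,
    a first-order recursive approximation of the expectation E{x x^H}.
    X: M x N STFT coefficients of one frequency bin; returns one estimate
    per frame."""
    M, N = X.shape
    Phi = np.zeros((M, M), dtype=complex)
    out = []
    for n in range(N):
        x = X[:, n:n + 1]
        Phi = alpha * Phi + (1.0 - alpha) * (x @ x.conj().T)
        out.append(Phi.copy())
    return out

# alpha from time constant tau = 50 ms, hop R = 256 samples at fs = 16 kHz
fs, hop, tau = 16000, 256, 0.05
alpha = np.exp(-hop / (fs * tau))
```

The shorter the time constant, the smaller α and the faster the estimate tracks changes, at the cost of a higher estimation variance.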
In the following, time-invariant directional constraints are considered. For this simulation, a priori knowledge of the two source positions φ_A and φ_B is assumed, and these angles are used in all processing steps. The directional constraints in (9) and (26) therefore do not vary over time.
Fig. 7 shows the true and the estimated DNR Ψ(k,n). The two marked regions indicate the silent and the active parts of the signal, respectively. In particular, Fig. 7 depicts the true and estimated DNR Ψ(k,n) as a function of time and frequency. Due to the reverberant environment, relatively high DNRs are obtained during speech activity. The estimated DNR in Fig. 7(b) possesses a limited temporal resolution due to the incorporated temporal averaging. Nevertheless, the Ψ(k,n) estimates are sufficiently accurate, as shown by the following results.
Fig. 8(a) depicts the mean DI for w_n and w_d (both signal-independent) and for the proposed spatial filter w_nd (signal-dependent). For the proposed spatial filter, the DI is shown for the silent part of the signal and during speech activity [both signal parts are marked in Fig. 7(b)]. During silence, the proposed spatial filter (dashed line w_nd) provides the same low DI as w_n. During speech activity (solid line w_nd), the obtained DI is as high as for the robust SD beamformer (w_d). Fig. 8(b) shows the corresponding WNGs. During silence, the proposed spatial filter (dashed line w_nd) achieves a high WNG, while during signal activity the WNG is low.
Fig. 8: comparison of the DI and WNG of the spatial filters. For w_d, the minimum WNG was set to −12 dB to make the spatial filter robust against microphone self-noise.
In general, Fig. 8 shows that the proposed spatial filter combines the advantages of the two existing spatial filters: during silent parts, a maximum WNG is provided, leading to a minimal self-noise amplification, i.e., to high robustness. During signal activity and high reverberation, where the self-noise is usually masked, a high DI is provided (at the cost of a low WNG), leading to an optimal reduction of the diffuse sound. In this case, even rather small WNGs are tolerable. Note that for higher frequencies (f > 5 kHz), all spatial filters perform nearly identically, since the coherence matrix Γ_d(k) in (39) and (12) is approximately equal to the identity matrix.
In the following, instantaneous directional constraints are considered. For this simulation, no a priori information on φ_A and φ_B is assumed. The DOAs φ_1(k,n) and φ_2(k,n) are estimated with ESPRIT; the constraints (9) therefore vary over time. Only for the robust SD beamformer (w_d), a single time-invariant constraint (9) corresponding to the fixed look direction φ_A is used. This beamformer serves as a reference.
Fig. 9 depicts the estimated DOAs φ_1(k,n) and the resulting gains |G(k, φ_1)|². If a DOA is located inside the spatial window of Fig. 4 (solid line), the arriving plane wave is not attenuated. Otherwise, the power of the wave is attenuated by 21 dB.
Table 1 shows the performance of all spatial filters (* unprocessed). The values in brackets refer to time-invariant directional constraints, the values not in brackets to instantaneous directional constraints. The signals were A-weighted before computing the SIR, SRR, and SSNR.
Table 1
In particular, Table 1 summarizes the overall performance of the spatial filters in terms of signal-to-interference ratio (SIR), signal-to-reverberation ratio (SRR), and SSNR at the filter output. In terms of SIR and SRR (source separation, dereverberation), the proposed approach (w_nd) and the robust SD beamformer (w_d) provide the best performance. However, the SSNR of the proposed w_nd is 6 dB higher than the SSNR of w_d, which represents a clearly audible advantage. The best performance in terms of SSNR is obtained with w_n. In terms of PESQ, w_nd and w_d outperform w_n. Using instantaneous directional constraints instead of time-invariant constraints (values in brackets) mainly reduces the achievable SIR, but provides a fast adaptation in the case of varying source positions. Note that the computation time for all required complex weights per time frame was larger than 80 s for w_d (CVX toolbox [27, 28]) and smaller than 0.08 s for the proposed approach (MATLAB R2012b, MacBook Pro 2008).
In the following, embodiments for spatial sound reproduction are described. The aim of these embodiments is to capture a sound scene, e.g., with a microphone array, and to reproduce the spatial sound with an arbitrary sound reproduction system (e.g., a 5.1 loudspeaker setup or headphone reproduction) such that the original spatial impression is recreated. It is assumed that the sound reproduction system comprises N channels, i.e., N output signals Y(k,n) are computed.
First, the problem formulation is provided. The signal model (see formula (2) above) is considered and a similar problem is formulated. The stationary/slowly time-varying noise corresponds to undesired microphone self-noise, while the non-stationary noise corresponds to desired diffuse sound. The diffuse sound is desired in this application, since reproducing the original spatial impression of the recorded scene is of major importance.
In the following, the directional sound X_l(k,n,d_1) arriving from the corresponding DOA φ_l(k,n) shall be reproduced without distortion. Moreover, the diffuse sound shall be reproduced with the correct energy from all directions, while the microphone self-noise is suppressed. Hence, the desired signal Y(k,n) in (7) is now expressed as

Y_i(k,n) = Σ_{l=1}^{L} G_i(k, φ_l) X_l(k,n,d_1) + G_d(k,n) X_{d,i}(k,n,d),  (40)

where Y_i(k,n) is the signal of the i-th channel (i ∈ {1, ..., N}) of the sound reproduction system, X_{d,i}(k,n,d) is the diffuse sound, measured at an arbitrary point (for example at the first microphone d_1), that is to be reproduced from loudspeaker i, and G_d(k,n) is a gain function for the diffuse sound which ensures the correct diffuse sound power during reproduction. Ideally, the signals X_{d,i}(k,n) have the correct diffuse sound power and are mutually uncorrelated across the channels i.
The transfer functions G_i(k, φ_l) of the directional sound components correspond to DOA-dependent loudspeaker gain functions. An example for the case of stereo loudspeaker reproduction is depicted in Figure 10. If wave l arrives from the far left, then G_1 = 1 and G_2 = 0, i.e., this directional sound is reproduced only from channel i = 1 (left channel) of the playback system. For a wave arriving from the front, G_1 = G_2, i.e., the directional sound is reproduced with equal power through both loudspeakers. If binaural reproduction is desired, G_i(k, φ_l) can correspond to HRTFs.
As described above, the signals Y_i(k,n) are estimated via a linear combination of the microphone signals based on complex weights w_i(k,n), i.e.,

Ŷ_i(k,n) = w_i^H(k,n) x(k,n),  (42)

subject to specific constraints. The constraints and the computation of the weights w_i(k,n) are explained in the next subsection.
In the following, the weight computation module 103 according to corresponding embodiments is considered in the present context; two embodiments of the weight computation module 103 of Fig. 2 are provided. From formulas (5) and (40) it can be concluded that w_i(k,n) must satisfy the linear constraints

w_i^H(k,n) a(k, φ_l) = G_i(k, φ_l),  l ∈ {1, ..., L}.  (43)

Moreover, the diffuse sound power should be preserved. Hence, w_i(k,n) may satisfy the quadratic constraint

w_i^H Γ_d(k) w_i = G_d²(k,n).  (44)

Moreover, the self-noise power at the filter output should be minimized. The optimal weights can therefore be computed as

w_i = argmin_w w^H w,  (45)

subject to formula (43) and formula (44). This leads to a convex optimization problem which can be solved, for example, with well-known numerical methods [29].
For the instantaneous parameter estimation module 102 according to corresponding embodiments, the DOAs of the L plane waves can be obtained with well-known narrowband DOA estimators (e.g., ESPRIT [22] or root MUSIC [23]) or with other state-of-the-art estimators.
The transfer function selection module 104 according to corresponding embodiments is now considered. In this application, the gains G_i(k, φ_l) of channel i for the corresponding directional sounds l are found from the DOA information φ_l(k,n). The transfer functions G_i(k, φ) for the different channels i can be made available to the system, e.g., as user-defined prior information. The gains can also be computed based on the analysis of an image, for instance using the positions of detected faces. The transfer functions G_i(k, φ) are usually provided as look-up tables, i.e., for an estimated φ_l, the corresponding gains G_i(k, φ_l) are selected from the look-up table. Note that the transfer function can also be defined as a function of the spatial frequency μ instead of the azimuth φ, i.e., as G_i(k, μ) instead of G_i(k, φ). Note further that the transfer functions can correspond to HRTFs enabling a binaural sound reproduction; in this case, G_i(k, φ_l) is usually complex-valued. Note also that the gains or transfer functions can be computed based on source position information instead of DOA information.
An example for stereo loudspeaker reproduction is depicted in Figure 10. In particular, Figure 10 shows the gain functions for stereo reproduction.
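A possible realization of such stereo gain functions is a power-preserving panning law. Since the exact curves of Figure 10 are not reproduced here, the sine/cosine law below (with a hypothetical ±90° panning range) is an illustrative assumption only:

```python
import numpy as np

def stereo_gains(phi_deg, phi_max=90.0):
    """Hypothetical power-preserving stereo panning with G1^2 + G2^2 = 1.
    phi = -phi_max maps fully to the left channel (G1 = 1),
    phi = +phi_max fully to the right channel (G2 = 1)."""
    theta = np.clip((phi_deg + phi_max) / (2.0 * phi_max), 0.0, 1.0) * np.pi / 2.0
    return np.cos(theta), np.sin(theta)   # (left gain G1, right gain G2)
```

For a frontal DOA (φ = 0), both channels receive equal gains of √0.5, so the directional sound is reproduced with equal power by the two loudspeakers, consistent with the behavior described above.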
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
The inventive decomposed signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium, e.g., the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium (for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM, or a FLASH memory) having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
Some embodiments according to the invention comprise a non-transitory data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.
Other embodiments comprise a computer program for performing one of the methods described herein, stored on a machine-readable carrier. In other words, an embodiment of the inventive method is therefore a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive method is therefore a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. A further embodiment of the inventive method is therefore a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured to or adapted to perform one of the methods described herein. A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The above-described embodiments are merely illustrative for the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Bibliography
[1]J.Benesty,J.Chen,and Y.Huang,Microphone Array Signal Processing.Berlin,Germany:Springer-Verlag,2008.
[2]S.Doclo,S.Gannot,M.Moonen,and A.Spriet,“Acoustic beamforming for hearing aid applications,”in Handbook on Array Processing and Sensor Networks,S.Haykin and K.Ray Liu,Eds.Wiley,2008,ch.9.
[3]S.Gannot and I.Cohen,“Adaptive beamforming and postfiltering,”in Springer Handbook of Speech Processing,J.Benesty,M.M.Sondhi,and Y.Huang, Eds.Springer-Verlag,2008,ch.47.
[4]J.Benesty,J.Chen,and E.A.P.Habets,Speech Enhancement in the STFT Domain,ser.SpringerBriefs in Electrical and Computer Engineering.Springer- Verlag,2011.

Claims (15)

1. A filter (100) for generating an audio output signal based on two or more input microphone signals, the audio output signal comprising a plurality of audio output signal samples, wherein the audio output signal and the two or more input microphone signals are represented in a time-frequency domain, wherein each of the plurality of audio output signal samples is assigned to a time-frequency bin of a plurality of time-frequency bins, and wherein the filter (100) comprises:
A weight generator (110) adapted to receive, for each of the plurality of time-frequency bins, direction-of-arrival information of one or more sound components of one or more sound sources, or position information of the one or more sound sources, and adapted to generate weighting information for each of the plurality of time-frequency bins depending on the direction-of-arrival information of the one or more sound components of the one or more sound sources of said time-frequency bin, or depending on the position information of the one or more sound sources of said time-frequency bin; wherein the weight generator (110) is adapted to generate the weighting information for each of the plurality of time-frequency bins depending on first noise information and depending on second noise information, the first noise information indicating information on a first coherence matrix of first noise components of the two or more input microphone signals, and the second noise information indicating information on a second coherence matrix of second noise components of the two or more input microphone signals; and
An output signal generator (120) for generating, for each of the plurality of time-frequency bins, one of the plurality of audio output signal samples being assigned to said time-frequency bin, depending on the weighting information of said time-frequency bin and depending on an audio input sample, being assigned to said time-frequency bin, of each of the two or more input microphone signals, so as to generate the audio output signal.
2. The filter (100) according to claim 1, wherein the weight generator (110) is configured to generate the first noise information by employing statistical information, and wherein the weight generator (110) is configured to generate the second noise information without employing said statistical information, wherein the statistical information is predefined.
3. The filter (100) according to claim 1, wherein the weight generator (110) is adapted to generate the weighting information for each of the plurality of time-frequency bins by applying the formula

w_nd(k, n) = Φ_u^{-1}(k, n) A(k, n) [A^H(k, n) Φ_u^{-1}(k, n) A(k, n)]^{-1} g(k, n),
wherein Φ_u = Φ_d + Φ_n,
wherein Φ_d is a first power spectral density matrix of the first noise components of the two or more input microphone signals,
wherein Φ_n is a second power spectral density matrix of the second noise components of the two or more input microphone signals,
wherein A indicates the direction-of-arrival information,
wherein w_nd is a vector indicating the weighting information,
wherein g indicates a gain vector,
wherein G_1(φ(k, n)) is a first real-valued or complex-valued predefined directivity function depending on the direction-of-arrival information, and
wherein G_2(φ(k, n)) is a further real-valued or complex-valued predefined directivity function depending on the direction-of-arrival information,
wherein k indicates a frequency, and wherein n indicates a time.
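Purely as a non-authoritative numerical sketch of the closed-form weight computation recited in claim 3 (assuming NumPy; the function name, argument shapes, and the explicit LCMV form are illustrative assumptions, not the claim's definitive implementation):

```python
import numpy as np

def lcmv_weights(Phi_d, Phi_n, A, g):
    """Compute an LCMV-style weight vector for a single time-frequency bin.

    Phi_d, Phi_n : (M, M) PSD matrices of the first and second noise components
    A            : (M, L) matrix whose columns are propagation vectors for the
                   L estimated directions of arrival
    g            : (L,)  desired gains (e.g. directivity-function values)

    Returns the (M,) weight vector minimizing the undesired-signal power
    w^H (Phi_d + Phi_n) w subject to the linear constraints A^H w = g.
    """
    Phi_u = Phi_d + Phi_n              # total undesired-signal PSD matrix
    B = np.linalg.solve(Phi_u, A)      # Phi_u^{-1} A, without an explicit inverse
    # Closed-form LCMV solution: w = Phi_u^{-1} A (A^H Phi_u^{-1} A)^{-1} g
    return B @ np.linalg.solve(A.conj().T @ B, g)
```

The constraint A^H w = g holds exactly for any positive-definite Φ_u, which is what keeps the directivity-function gains in g transparent to the constrained plane waves.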
4. The filter (100) according to claim 1, wherein the weight generator (110) is configured to determine the first noise information depending on one or more coherences between at least some of the first noise components of the two or more input microphone signals, wherein the one or more coherences are predefined.
5. The filter (100) according to claim 1, wherein the weight generator (110) is configured to determine the first noise information depending on a coherence matrix Γ_d(k), the coherence matrix Γ_d(k) indicating coherences resulting from the first noise components of the two or more input microphone signals, wherein the coherence matrix Γ_d(k) is predefined, and wherein k indicates a frequency.
6. The filter (100) according to claim 5, wherein the weight generator (110) is configured to determine the first noise information by applying the formula

Φ_d(k, n) = φ_d(k, n) Γ_d(k),

wherein Γ_d(k) is the coherence matrix, the coherence matrix being predefined,
wherein Φ_d(k, n) is the first noise information,
wherein φ_d(k, n) is the expected power of the first noise components of the two or more input microphone signals, and
wherein k indicates a frequency, and wherein n indicates a time.
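In practice, a common predefined choice for the coherence matrix Γ_d(k) of claim 6 is the sinc-shaped coherence of a spherically isotropic (diffuse) sound field (Cook et al., 1955). The sketch below is a non-authoritative illustration, assuming NumPy; the helper names and argument shapes are hypothetical:

```python
import numpy as np

def diffuse_coherence(mic_positions, freq_hz, c=343.0):
    """Coherence matrix of a spherically isotropic (diffuse) sound field.

    mic_positions : (M, 3) microphone coordinates in metres
    freq_hz       : centre frequency of the band, in Hz
    Returns the (M, M) matrix with entries sinc(2*f*d_ij/c), where d_ij is
    the distance between microphones i and j.
    """
    pos = np.asarray(mic_positions, dtype=float)
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi*x)/(pi*x), hence the argument 2*f*d/c
    return np.sinc(2.0 * freq_hz * d / c)

def first_noise_psd(phi_d, Gamma_d):
    """Claim 6: Phi_d(k, n) = phi_d(k, n) * Gamma_d(k), expected power times coherence."""
    return phi_d * Gamma_d
```

At d = c/(2f) the off-diagonal coherence passes through its first zero, so widely spaced microphones see nearly incoherent diffuse noise.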
7. The filter (100) according to claim 1, wherein the weight generator (110) is configured to determine the first noise information depending on the second noise information and depending on the direction-of-arrival information.
8. The filter (100) according to claim 1,
wherein the weight generator (110) is configured to generate the weighting information as first weighting information w_Ψ, and
wherein the weight generator (110) is configured to generate the first weighting information by determining second weighting information,
wherein the weight generator (110) is configured to generate the first weighting information w_Ψ by applying a first formula
such that a second formula is satisfied,
wherein φ(k, n) indicates the direction-of-arrival information,
wherein a(k, n) indicates a propagation vector, and
wherein w indicates the second weighting information,
wherein k indicates a frequency, and wherein n indicates a time.
9. The filter (100) according to claim 8, wherein the weight generator (110) is configured to determine the first weighting information depending on the second weighting information and depending on diffuse noise information, or a power of diffuse components, generated depending on the two or more input microphone signals.
10. The filter (100) according to claim 1, wherein the weight generator (110) is configured to determine the weighting information by applying a parametric Wiener filter, wherein the parametric Wiener filter depends on statistical information on a signal component of the two or more input microphone signals, and wherein the parametric Wiener filter depends on statistical information on noise components of the two or more input microphone signals.
11. The filter (100) according to claim 1, wherein the weight generator (110) is configured to determine the weighting information depending on direction-of-arrival information indicating a direction of arrival of one or more plane waves.
12. The filter (100) according to claim 1,
wherein the weight generator (110) comprises a transfer function selection module (104) for providing a predefined transfer function, and
wherein the weight generator (110) is configured to generate the weighting information depending on the direction-of-arrival information and depending on the predefined transfer function.
13. The filter (100) according to claim 12, wherein the transfer function selection module (104) is configured to provide the predefined transfer function such that the predefined transfer function indicates an arbitrary pick-up pattern depending on the direction-of-arrival information, such that the predefined transfer function indicates a loudspeaker gain depending on the direction-of-arrival information, or such that the predefined transfer function indicates a head-related transfer function depending on the direction-of-arrival information.
14. A method for generating an audio output signal based on two or more input microphone signals, the audio output signal comprising a plurality of audio output signal samples, wherein the audio output signal and the two or more input microphone signals are represented in a time-frequency domain, wherein each of the plurality of audio output signal samples is assigned to a time-frequency bin of a plurality of time-frequency bins, and wherein the method comprises:
Receiving, for each of the plurality of time-frequency bins, direction-of-arrival information of one or more sound components of one or more sound sources, or position information of the one or more sound sources;
Generating weighting information for each of the plurality of time-frequency bins depending on the direction-of-arrival information of the one or more sound components of the one or more sound sources of said time-frequency bin, or depending on the position information of the one or more sound sources of said time-frequency bin, wherein the weighting information is generated for each of the plurality of time-frequency bins depending on first noise information and depending on second noise information, the first noise information indicating information on a first coherence matrix of first noise components of the two or more input microphone signals, and the second noise information indicating information on a second coherence matrix of second noise components of the two or more input microphone signals; and
Generating, for each of the plurality of time-frequency bins, one of the plurality of audio output signal samples being assigned to said time-frequency bin, depending on the weighting information of said time-frequency bin and depending on an audio input sample, being assigned to said time-frequency bin, of each of the two or more input microphone signals, so as to generate the audio output signal.
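The final generating step of the method above amounts, per time-frequency bin, to an inner product Z(k, n) = w^H(k, n) y(k, n) between the weight vector and the vector of input microphone samples. A minimal sketch, assuming NumPy; the function name and the (K, N, M) array layout are illustrative assumptions:

```python
import numpy as np

def apply_weights(Y, W):
    """Combine per-bin weights with the multichannel input spectra.

    Y : (K, N, M) complex spectra of the M input microphone signals
    W : (K, N, M) weight vectors, one per time-frequency bin
    Returns the (K, N) output signal Z(k, n) = w(k, n)^H y(k, n).
    """
    # Conjugate the weights and sum over the microphone axis m for every (k, n)
    return np.einsum('knm,knm->kn', np.conj(W), Y)
```

An inverse time-frequency transform of Z would then yield the time-domain audio output signal.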
15. A computer-readable storage medium storing computer-executable code for implementing the method according to claim 14 when the code is executed on a computer or signal processor.
CN201380073406.6A 2012-12-21 2013-11-25 Filter and method for informed spatial filtering using multiple instantaneous direction-of-arrival estimates Active CN105165026B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201261740866P 2012-12-21 2012-12-21
US61/740,866 2012-12-21
EP13169163.6A EP2747451A1 (en) 2012-12-21 2013-05-24 Filter and method for informed spatial filtering using multiple instantaneous direction-of-arrivial estimates
EP13169163.6 2013-05-24
PCT/EP2013/074650 WO2014095250A1 (en) 2012-12-21 2013-11-25 Filter and method for informed spatial filtering using multiple instantaneous direction-of-arrivial estimates

Publications (2)

Publication Number Publication Date
CN105165026A CN105165026A (en) 2015-12-16
CN105165026B true CN105165026B (en) 2019-08-13

Family

ID=48607016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201380073406.6A Active CN105165026B (en) 2012-12-21 2013-11-25 Use the filter and method of the informed space filtering of multiple instantaneous arrival direction estimations

Country Status (8)

Country Link
US (1) US10331396B2 (en)
EP (2) EP2747451A1 (en)
JP (1) JP6196320B2 (en)
CN (1) CN105165026B (en)
BR (1) BR112015014380B1 (en)
ES (1) ES2612528T3 (en)
RU (1) RU2641319C2 (en)
WO (1) WO2014095250A1 (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9788128B2 (en) * 2013-06-14 2017-10-10 Gn Hearing A/S Hearing instrument with off-line speech messages
GB2521649B (en) * 2013-12-27 2018-12-12 Nokia Technologies Oy Method, apparatus, computer program code and storage medium for processing audio signals
EP2975609A1 (en) * 2014-07-15 2016-01-20 Ecole Polytechnique Federale De Lausanne (Epfl) Optimal acoustic rake receiver
US9949041B2 (en) 2014-08-12 2018-04-17 Starkey Laboratories, Inc. Hearing assistance device with beamformer optimized using a priori spatial information
WO2016056410A1 (en) * 2014-10-10 2016-04-14 ソニー株式会社 Sound processing device, method, and program
WO2016093854A1 (en) 2014-12-12 2016-06-16 Nuance Communications, Inc. System and method for speech enhancement using a coherent to diffuse sound ratio
EP3220386A1 (en) 2016-03-18 2017-09-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for harmonic-percussive-residual sound separation using a structure tensor on spectrograms
RU2714579C1 (en) 2016-03-18 2020-02-18 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Apparatus and method of reconstructing phase information using structural tensor on spectrograms
CN106028227B (en) * 2016-07-08 2019-05-24 乐鑫信息科技(上海)股份有限公司 Distributed microphone array and its applicable sonic location system
CN106060743A (en) * 2016-08-03 2016-10-26 上海山景集成电路股份有限公司 Microphone, microphone combination and microphone signal processing method
CN106569773A (en) * 2016-10-31 2017-04-19 努比亚技术有限公司 Terminal and voice interaction processing method
CN106782590B (en) * 2016-12-14 2020-10-09 南京信息工程大学 Microphone array beam forming method based on reverberation environment
GB2563606A (en) 2017-06-20 2018-12-26 Nokia Technologies Oy Spatial audio processing
GB2563952A (en) * 2017-06-29 2019-01-02 Cirrus Logic Int Semiconductor Ltd Speaker identification
EP3460795A1 (en) * 2017-09-21 2019-03-27 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Signal processor and method for providing a processed audio signal reducing noise and reverberation
CN111201784B (en) 2017-10-17 2021-09-07 惠普发展公司,有限责任合伙企业 Communication system, method for communication and video conference system
US10679617B2 (en) * 2017-12-06 2020-06-09 Synaptics Incorporated Voice enhancement in audio signals through modified generalized eigenvalue beamformer
TWI690218B (en) 2018-06-15 2020-04-01 瑞昱半導體股份有限公司 headset
CN110636400B (en) * 2018-06-25 2021-03-16 瑞昱半导体股份有限公司 Earphone set
US11335357B2 (en) * 2018-08-14 2022-05-17 Bose Corporation Playback enhancement in audio systems
CN109839612B (en) * 2018-08-31 2022-03-01 大象声科(深圳)科技有限公司 Sound source direction estimation method and device based on time-frequency masking and deep neural network
CN109286875B (en) * 2018-09-29 2021-01-01 百度在线网络技术(北京)有限公司 Method, apparatus, electronic device and storage medium for directional sound pickup
JP7407580B2 (en) 2018-12-06 2024-01-04 シナプティクス インコーポレイテッド system and method
US20200184994A1 (en) * 2018-12-07 2020-06-11 Nuance Communications, Inc. System and method for acoustic localization of multiple sources using spatial pre-filtering
CN111025233B (en) * 2019-11-13 2023-09-15 阿里巴巴集团控股有限公司 Sound source direction positioning method and device, voice equipment and system
US11064294B1 (en) 2020-01-10 2021-07-13 Synaptics Incorporated Multiple-source tracking and voice activity detections for planar microphone arrays
CN111175693A (en) * 2020-01-19 2020-05-19 河北科技大学 Direction-of-arrival estimation method and direction-of-arrival estimation device
CN112116920B (en) * 2020-08-10 2022-08-05 北京大学 Multi-channel voice separation method with unknown speaker number
CN113203987A (en) * 2021-07-05 2021-08-03 成都启英泰伦科技有限公司 Multi-sound-source direction estimation method based on K-means clustering
US11823707B2 (en) 2022-01-10 2023-11-21 Synaptics Incorporated Sensitivity mode for an audio spotting system
DE202022105574U1 (en) 2022-10-01 2022-10-20 Veerendra Dakulagi A system for classifying multiple signals for direction of arrival estimation

Citations (3)

Publication number Priority date Publication date Assignee Title
CN1565144A (en) * 2001-08-08 2005-01-12 数字信号处理工厂有限公司 Directional audio signal processing using an oversampled filterbank
WO2005004532A1 (en) * 2003-06-30 2005-01-13 Harman Becker Automotive Systems Gmbh Handsfree system for use in a vehicle
CN101288334A (en) * 2005-08-26 2008-10-15 思德普通信公司 Method and apparatus for improving noise discrimination using attenuation factor

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
FR2793629B1 (en) * 1999-05-12 2001-08-03 Matra Nortel Communications METHOD AND DEVICE FOR CANCELING STEREOPHONIC ECHO WITH FILTERING IN THE FREQUENTIAL DOMAIN
US8098844B2 (en) * 2002-02-05 2012-01-17 Mh Acoustics, Llc Dual-microphone spatial noise suppression
CA2399159A1 (en) * 2002-08-16 2004-02-16 Dspfactory Ltd. Convergence improvement for oversampled subband adaptive filters
JP4247037B2 (en) * 2003-01-29 2009-04-02 株式会社東芝 Audio signal processing method, apparatus and program
EP1538867B1 (en) * 2003-06-30 2012-07-18 Nuance Communications, Inc. Handsfree system for use in a vehicle
US20080004729A1 (en) * 2006-06-30 2008-01-03 Nokia Corporation Direct encoding into a directional audio coding format
DE602006006664D1 (en) * 2006-07-10 2009-06-18 Harman Becker Automotive Sys Reduction of background noise in hands-free systems
RS49875B (en) * 2006-10-04 2008-08-07 Micronasnit, System and technique for hands-free voice communication using microphone array
US20080298610A1 (en) * 2007-05-30 2008-12-04 Nokia Corporation Parameter Space Re-Panning for Spatial Audio
US8180062B2 (en) * 2007-05-30 2012-05-15 Nokia Corporation Spatial sound zooming
JP4455614B2 (en) * 2007-06-13 2010-04-21 株式会社東芝 Acoustic signal processing method and apparatus
PL2198632T3 (en) * 2007-10-09 2014-08-29 Koninklijke Philips Nv Method and apparatus for generating a binaural audio signal
US8457328B2 (en) * 2008-04-22 2013-06-04 Nokia Corporation Method, apparatus and computer program product for utilizing spatial information for audio signal enhancement in a distributed network environment
US8724829B2 (en) * 2008-10-24 2014-05-13 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for coherence detection
CN102257401B (en) * 2008-12-16 2014-04-02 皇家飞利浦电子股份有限公司 Estimating a sound source location using particle filtering
EP2375779A3 (en) * 2010-03-31 2012-01-18 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Apparatus and method for measuring a plurality of loudspeakers and microphone array
CN102859591B (en) * 2010-04-12 2015-02-18 瑞典爱立信有限公司 Method and arrangement for noise cancellation in a speech encoder
US8639499B2 (en) * 2010-07-28 2014-01-28 Motorola Solutions, Inc. Formant aided noise cancellation using multiple microphones
AR084091A1 (en) 2010-12-03 2013-04-17 Fraunhofer Ges Forschung ACQUISITION OF SOUND THROUGH THE EXTRACTION OF GEOMETRIC INFORMATION OF ARRIVAL MANAGEMENT ESTIMATES
WO2012158168A1 (en) * 2011-05-18 2012-11-22 Google Inc. Clock drift compensation method and apparatus


Also Published As

Publication number Publication date
WO2014095250A1 (en) 2014-06-26
CN105165026A (en) 2015-12-16
RU2015129784A (en) 2017-01-27
EP2936830B8 (en) 2017-01-25
EP2936830A1 (en) 2015-10-28
ES2612528T3 (en) 2017-05-17
JP6196320B2 (en) 2017-09-13
US20150286459A1 (en) 2015-10-08
RU2641319C2 (en) 2018-01-17
US10331396B2 (en) 2019-06-25
EP2936830B1 (en) 2016-10-05
BR112015014380B1 (en) 2022-10-11
EP2747451A1 (en) 2014-06-25
JP2016506664A (en) 2016-03-03
BR112015014380A2 (en) 2020-01-28

Similar Documents

Publication Publication Date Title
CN105165026B (en) Use the filter and method of the informed space filtering of multiple instantaneous arrival direction estimations
TWI713844B (en) Method and integrated circuit for voice processing
Thiergart et al. An informed parametric spatial filter based on instantaneous direction-of-arrival estimates
Brandstein et al. Microphone arrays: signal processing techniques and applications
EP2868117B1 (en) Systems and methods for surround sound echo reduction
Habets et al. A two-stage beamforming approach for noise reduction and dereverberation
CA2819394C (en) Sound acquisition via the extraction of geometrical information from direction of arrival estimates
JP6703525B2 (en) Method and device for enhancing sound source
US8958572B1 (en) Adaptive noise cancellation for multi-microphone systems
Thiergart et al. An informed LCMV filter based on multiple instantaneous direction-of-arrival estimates
US10250975B1 (en) Adaptive directional audio enhancement and selection
CN104902418A (en) Multi-microphone method for estimation of target and noise spectral variances
Enzner Bayesian inference model for applications of time-varying acoustic system identification
CN111078185A (en) Method and equipment for recording sound
JP2001309483A (en) Sound pickup method and sound pickup device
Thiergart et al. An informed MMSE filter based on multiple instantaneous direction-of-arrival estimates
CN111354368B (en) Method for compensating processed audio signal
Thiergart et al. An acoustical zoom based on informed spatial filtering
As’ad et al. Beamforming designs robust to propagation model estimation errors for binaural hearing aids
Reindl et al. An acoustic front-end for interactive TV incorporating multichannel acoustic echo cancellation and blind signal extraction
Thiergart Flexible Multi-Microphone Acquisition and Processing of Spatial Sound Using Parametric Sound Field Representations
Herzog et al. Direction preserving wind noise reduction of b-format signals
Garre et al. An Acoustic Echo Cancellation System based on Adaptive Algorithm
Habets Towards multi-microphone speech dereverberation using spectral enhancement and statistical reverberation models
KR102573148B1 (en) Perceptually-Transparent Estimation of Two-Channel Spatial Transfer Functions for Sound Correction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant