EP1919251B1

EP1919251B1 - Beamforming weights conditioning for efficient implementations of broadband beamformers

Info

Publication number: EP1919251B1
Application number: EP20060022602
Authority: EP
Inventors: Franck Beaucoup; Michael Tetelbaum
Original assignee: Mitel Networks Corp
Current assignee: Mitel Networks Corp
Priority date: 2006-10-30
Filing date: 2006-10-30
Publication date: 2010-09-01
Anticipated expiration: 2026-10-30
Also published as: DE602006016617D1; EP1919251A1

Description

FIELD OF THE INVENTION

The present invention relates to a beam forming design method and a real-time implementation structure that reduces the computational complexity of broadband beamformers.

BACKGROUND OF THE INVENTION

EP 1 517 581 A2 discloses a method for designing a beam former, and a beam former made in accordance with this method, characterized by a uniform speaker phone response condition, resulting in optimal beam forming directivity under a uniform coupling constraint. According to that document, a finite number of individual beam formers are constrained to have the same response to a loudspeaker signal (as well as the same gain in their respective look directions) without specifying the exact value of their response to this signal. This results in beam former weights that are optimal in the minimum variance sense and satisfy the uniform coupling constraint. The minimum variance condition combines all beam former weights at once, and the uniform coupling constraint is expressed as a finite number of linear constraints on the weights of the individual beam formers, without specifying an arbitrary, a priori value for the actual value of the uniform response.
Sensor array processing (also known as beamforming) consists of combining the signals received by several omni-directional sensors to provide spatial directivity (see H.L.Van Trees, "Optimum array processing (detection, estimation and modulation theory, part IV)," John Wiley and Sons, 2002 for a general presentation). Beamforming has been used for several decades for applications such as radar, sonar, hearing aids and smart antennas for telecommunications. More recently, with the availability of inexpensive signal processing power, microphone arrays have also been used for low-cost desktop products such as end-point speech processing devices for Personal Computer (PC) applications (see M.Brandstein and D.Ward, "Microphone arrays, signal processing techniques and applications," Springer, 2001) and audio conference phones (see M.Tetelbaum and F.Beaucoup, "Design and implementation of a conference phone based on microphone array technology" Proceedings of Global Signal Processing Conference and Expo (GSPx) 2004, San Jose, CA, Sep 2004).
The computational complexity of array processing depends on such factors as the number of sensors (respectively sources), the amount of spatial directivity desired from the array, and the bandwidth of operation compared to its average frequency (narrow-band or broad-band beamforming, as defined in H.L.Van Trees, "Optimum array processing (detection, estimation and modulation theory, part IV)," John Wiley and Sons, 2002 and M.Brandstein and D.Ward, "Microphone arrays, signal processing techniques and applications," Springer, 2001). For cost-sensitive applications and in scenarios where the number of sensors is large, this computational complexity can be a critical issue.
Under the conventional narrow-band assumption, a beamformer can be described as a weighted summation of signals received by an array of sensors. In the simplest case, the signals are only submitted to pure delays, resulting in a beamformer known as the delay-and-sum beamformer (or conventional beamformer). In the more general case of superdirective beamforming, the weights applied to the various channels can use both magnitude and phase information to create more effective spatial filtering. In the traditional complex-domain representation of signals, these weights are therefore complex numbers. In general, these beamforming weights are chosen to optimise a criterion related to the directivity of the beamformer, such as in the popular Minimum Variance Distortionless Response (MVDR) and Linearly Constrained Minimum Variance (LCMV) beamformers (see H.L. Van Trees, "Optimum array processing (detection, estimation and modulation theory, part IV)," John Wiley and Sons, 2002 and M.Brandstein and D.Ward, "Microphone arrays, signal processing techniques and applications," Springer, 2001).
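The weighted-summation view of a narrow-band beamformer can be illustrated with a minimal sketch of the delay-and-sum case. The far-field plane-wave model, the speed of sound, the 6-sensor linear geometry, the 2000 Hz test frequency and all function names are assumptions made for illustration only; they are not taken from the cited references.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air


def steering_vector(positions_m, angle_rad, freq_hz):
    """Far-field phase of a unit plane wave at each sensor of a linear array."""
    return [
        cmath.exp(-2j * math.pi * freq_hz * x * math.cos(angle_rad) / SPEED_OF_SOUND)
        for x in positions_m
    ]


def delay_and_sum_weights(positions_m, look_rad, freq_hz):
    """Pure-delay weights: the look-direction steering vector scaled by 1/M."""
    d = steering_vector(positions_m, look_rad, freq_hz)
    return [dk / len(d) for dk in d]


def response(weights, positions_m, angle_rad, freq_hz):
    """Beamformer output for a unit plane wave: sum over k of conj(w_k) * d_k."""
    d = steering_vector(positions_m, angle_rad, freq_hz)
    return sum(w.conjugate() * dk for w, dk in zip(weights, d))


# Hypothetical geometry: 6 sensors spaced 3 cm apart, steered along the array axis.
positions = [0.03 * k for k in range(6)]
w = delay_and_sum_weights(positions, 0.0, 2000.0)
look_gain = abs(response(w, positions, 0.0, 2000.0))
side_gain = abs(response(w, positions, math.pi / 2, 2000.0))
print(look_gain, side_gain)
```

In this sketch the weights have unit magnitude up to the 1/M scaling, so only phase (delay) information is used; a superdirective design would replace `delay_and_sum_weights` with weights that also vary in magnitude.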
In the case of broadband beamforming, both time-domain and frequency-domain implementations are possible. In the time domain, the weighting operations of the narrow-band case are replaced by temporal filtering on each channel. In terms of computational complexity for the resulting time-domain implementation, shorter filters are more favourable. To design the filters, one approach is to first calculate complex-domain weights at discrete frequencies across the desired frequency range and then fit a time-domain filter to the frequency response consisting of the complex beamforming weights over the whole frequency range for each particular channel (see J.G. Ryan. "Near-field beamforming using microphone arrays", PhD thesis, Carleton University, 1999). The time-domain filters are most commonly Finite Impulse Response (FIR) filters, but Infinite Impulse Response (IIR) filters can also be used. Any traditional filter-fit technique can be used provided that it respects both the magnitude and the phase of the complex weights (see J.G. Ryan. "Near-field beamforming using microphone arrays", PhD thesis, Carleton University, 1999 and L.J.Karam and J.H.McClellan, "Complex Chebyshev Approximation for FIR Filter Design," IEEE Trans. on Circuits and Systems II. March 1995. Pp 207-216.). Another approach is to design the time-domain filters directly in such a way that they minimise some optimisation criterion (e.g. MVDR, LCMV) over the frequency range of operation. Several variants of this strategy are discussed in H.L. Van Trees, "Optimum array processing (detection, estimation and modulation theory, part IV)," John Wiley and Sons, 2002, J.G. Ryan. "Near-field beamforming using microphone arrays", PhD thesis, Carleton University, 1999, and L.C.Godara and M.R.Sayyah Jahromi, "Limitations and capabilities of frequency-domain broadband constrained beamforming schemes," IEEE Trans. Sig. Proc., vol. 47, no. 9, September 1999.
As mentioned above, an alternative to time-domain implementation is implementation in the frequency domain, a technique sometimes referred to as Discrete Fourier Transform (DFT) beamforming (see H.L.Van Trees, "Optimum array processing (detection, estimation and modulation theory, part IV)," John Wiley and Sons, 2002). The principle is to make use of fast convolution in the frequency domain in order to reduce the computational complexity of the beamforming process. Each sensor signal is transformed into the frequency domain with a DFT operation, and narrow-band beamforming is then performed independently on each individual bin in the frequency domain. The resulting frequency-domain beamformer output is then brought back into the time domain with an Inverse DFT (IDFT). This frequency domain technique may be implemented with block processing or with sample processing with the sliding DFT algorithm as explained in H.L.Van Trees, "Optimum array processing (detection, estimation and modulation theory, part IV)," John Wiley and Sons, 2002.
The filtering operation in the frequency-domain can be performed with the traditional techniques of overlap-add or overlap-save (see S.Haykin, "Adaptive filter theory," Prentice Hall, 1996). To minimise the amount of overlap (and thereby reduce the computational complexity) and reduce the algorithmic delay without introducing spatial aliasing, it is desirable that the frequency domain weights correspond to time-domain filters that are as short as possible. To achieve that, short time-domain filters are fitted to the original frequency-domain weights as described for time-domain implementations, followed by use of the frequency response of these time-domain filters to perform narrow-band beamforming on each bin in the frequency domain. In both the time-domain and the frequency-domain implementations, the step of designing short time-domain beamforming filters is an important step towards computational efficiency. The lower the order of the time-domain filters that can be used to achieve the desired performance (i.e. the directionality) of the resulting beamformer, the lower the computational complexity of the final real-time implementation.
With superdirective beamformers such as the MVDR and LCMV beamformers, it is well-known (see M.Brandstein and D.Ward, "Microphone arrays, signal processing techniques and applications," Springer, 2001) that for conventional models of isotropic noise, the frequency response of the beamforming weights on each channel typically exhibits a strong low-pass characteristic; that is, presents a strong peak at low frequencies. In terms of implementation efficiency, this phenomenon can be seen as putting significant stress on the beamforming filter design to obtain a close fit to the complex-domain weights, which can result in high filter orders and therefore higher computational complexity.

SUMMARY OF THE INVENTION

The present invention describes a novel design procedure that produces a more efficient time-domain representation of the frequency-domain weights and therefore results in more efficient real-time implementations, both in the time domain and in the frequency domain.
It is an aspect of the present invention to introduce a new stage, referred to as a conditioning stage, in the traditional channel-based beamforming filter design approach.
More particularly, the conditioning stage is implemented so as to remove from all beamforming channels (by division in the frequency domain, and therefore without affecting the spatial directivity) some common characteristics in their weights' frequency responses. The conditioning stage facilitates the filter fit on each channel, thereby reducing the filter order and consequently the computational complexity of the beamforming structure, whether time-domain or frequency-domain. The resulting change in the beamformer's response to its look direction can be compensated for by a single-channel "conditioning equalisation" filter placed on the output of the beamformer.
According to an aspect of an embodiment, in a channel-based beamforming system in which beamforming weights are predetermined, an improvement of the beamforming design method comprising modifying each beamforming weight in the frequency domain by dividing each beamforming weight by a common characteristic established in the frequency response across an array of sensors prior to fitting each beamforming weight to a filter.
According to another aspect of an embodiment, provided is a weight-conditioning beamformer for providing spatial directivity, said beamformer comprising an array of sensors, corresponding beamforming filters fitted with conditioned beamforming weights, wherein each conditioned beamforming weight is obtained by dividing each beamforming weight by a common characteristic established in the frequency response across an array of sensors prior to fitting each beamforming weight to the filter, and a summer for summing the outputs of said filters.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a time-domain implementation of a broadband beamformer.
Figure 2 is a block diagram of a frequency-domain implementation of a broadband beamformer.
Figure 3 is a process diagram of a broadband beamformer according to an embodiment of the invention.
Figure 4 is a block diagram of the embodiment shown in Figure 3, illustrating a time-domain implementation of a broadband beamformer with weight conditioning.
Figure 5 is a graph showing multi-channel normalised filter-fit quality according to a multi-channel normalised filter-fit error criterion, with and without the weights conditioning step.

DESCRIPTION OF THE EMBODIMENTS

As mentioned above with respect to broadband beamforming, there are two options available for implementation: a time-domain implementation (Figure 1) and frequency-domain implementation (Figure 2).
In the time-domain implementation, one approach for designing the time-domain filters is to first calculate complex-domain weights at discrete frequencies across the desired frequency range, and then fit a time-domain filter to the frequency response consisting of the complex beamforming weights over the whole frequency range for each particular channel. In the real-time implementation shown in Figure 1, each output from the array of sensors 12 is then filtered by the corresponding time-domain filter F_k 14; the resulting outputs are then summed 16 and outputted for subsequent processing.
The alternate approach, frequency-domain implementation 20, also referred to as DFT beamforming, is shown in Figure 2. As described above, DFT beamforming makes use of fast convolution in the frequency domain in order to reduce the computational complexity of the beamforming process from an array of sensors 22. Each sensor signal is transformed into the frequency domain with a DFT operation 24, and narrow-band beamforming 26 is then performed independently on each individual bin in the frequency domain. The resulting frequency-domain beamformer output is then summed 28 and brought back into the time domain with an Inverse DFT 30. In Figure 2, this approach is shown using the Fast Fourier Transform (FFT) algorithm.
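The per-bin processing of Figure 2 can be sketched for a single block as follows. This is only an illustration under stated assumptions: a naive O(n²) DFT stands in for the FFT, the weights multiply the spectra directly (no conjugation convention is implied), and the function names and test signals are hypothetical. A single block processed this way yields circular convolution, which is why overlap-add or overlap-save is needed in a streaming implementation.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT (or inverse DFT) of a short sequence; an FFT would be used in practice."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * b * t / n) for t in range(n))
        for b in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft_beamform_block(channels, bin_weights):
    """Transform each channel, weight each bin independently, sum, and transform back."""
    spectra = [dft(ch) for ch in channels]
    n_bins = len(spectra[0])
    summed = [
        sum(bin_weights[k][b] * spectra[k][b] for k in range(len(channels)))
        for b in range(n_bins)
    ]
    return dft(summed, inverse=True)


# Hypothetical two-channel block: a ramp and a single impulse.
num_samples = 8
x0 = [float(t) for t in range(num_samples)]
x1 = [1.0 if t == 2 else 0.0 for t in range(num_samples)]
bin_weights = [
    [0.5] * num_samples,  # channel 0: flat gain of 0.5
    # channel 1: gain of 0.5 combined with a one-sample (circular) delay
    [0.5 * cmath.exp(-2j * cmath.pi * b / num_samples) for b in range(num_samples)],
]
y = dft_beamform_block([x0, x1], bin_weights)
```

Here the output equals 0.5 times the first channel plus 0.5 times the second channel delayed circularly by one sample, which matches what the equivalent time-domain filters would produce on a periodic signal.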
From a consideration of the prior art, it was determined that the directivity of a beamformer, regardless of the exact indicator that is used to measure it (e.g. directivity index, front-back ratio, signal-to-interference-plus-noise ratio, etc... see H.L.Van Trees, "Optimum array processing (detection, estimation and modulation theory, part IV)," John Wiley and Sons, 2002 or M.Brandstein and D.Ward, "Microphone arrays, signal processing techniques and applications," Springer, 2001) is homogeneous as a function of the beamforming weights or filters. Multiplying all channels by the same complex number at any given frequency does not alter the directivity of the resulting beamformer. For broad-band beamformers, multiplying weights for all channels by the same "frequency response" does not alter the directivity of the resulting beamformer (although it clearly affects its response in the look direction over the frequency range).
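The homogeneity property can be checked numerically with a short sketch. The front-back ratio below is a simplified two-direction version of that indicator, and the weights, the inter-sensor phase step and the common factor are arbitrary illustrative values, not data from the patent.

```python
import cmath
import math


def steering(num_sensors, phase_step):
    """Steering vector of a uniform linear array for a given inter-sensor phase step."""
    return [cmath.exp(-1j * phase_step * k) for k in range(num_sensors)]


def response(weights, d):
    """Narrow-band beamformer response: sum over k of conj(w_k) * d_k."""
    return sum(w.conjugate() * dk for w, dk in zip(weights, d))


def front_back_ratio(weights, phase_step):
    """Power ratio between the responses at 0 and 180 degrees (simplified indicator)."""
    front = response(weights, steering(len(weights), phase_step))
    back = response(weights, steering(len(weights), -phase_step))
    return abs(front) ** 2 / abs(back) ** 2


weights = [0.9 + 0.2j, -0.4 + 0.5j, 0.3 - 0.7j, 0.1 + 0.1j]  # arbitrary values
common = 2.5 * cmath.exp(0.8j)  # the same complex factor applied to every channel
scaled = [common * w for w in weights]

ratio_before = front_back_ratio(weights, 1.1)
ratio_after = front_back_ratio(scaled, 1.1)
look_before = abs(response(weights, steering(4, 1.1)))
look_after = abs(response(scaled, steering(4, 1.1)))
```

The ratio is unchanged because the common factor appears in both numerator and denominator, while the look-direction response magnitude is multiplied by the magnitude of the factor (2.5 in this sketch).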
It was also determined that in many cases and particularly in the case of conventional models of isotropic noise, the beamforming weights coming from such traditional superdirective beamforming design techniques as MVDR and LCMV, tend to present a rather similar frequency response across all channels. This frequency response is typically "low-pass", meaning that the magnitude response of the individual beamforming weights is much larger for low frequencies than for high frequencies. It can be a challenge to fit such weights with low-order FIR filters.
Given the above, it was further determined that all beamforming weights could be divided by some common pattern in their frequency response in order to facilitate the filter fit and produce shorter beamforming filters without compromising on the beamformer's spatial directivity. This division of the beamforming weights defines the underlying principle of the additional "conditioning stage" added to the traditional channel-based beamforming filter design approach. The resulting design process is shown in Figure 3. In particular, frequency-domain beamforming weights are first chosen to optimise a criterion related to the directivity of the beamformer 40, such as in the popular Minimum Variance Distortionless Response (MVDR) and Linearly Constrained Minimum Variance (LCMV) beamformers. The weights are then subjected to conditioning 42, wherein each is divided by some common pattern in their frequency response. Filters are then fitted to the conditioned weights 44. Using the resulting filters, the beamformer is then implemented 46 as a time-domain or frequency-domain beamformer. Note that since the conditioning stage affects the beamformer's frequency response in its look direction, compensation, if required, can be achieved by placing a "conditioning equalisation" filter at the output of the beamformer. This equalisation filter only affects the frequency response of the beamformer and not its directivity.
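The conditioning step 42 and its compensation can be sketched on synthetic data. The weights below are constructed (not computed by MVDR or LCMV) as a shared, hypothetical low-pass shape multiplied by a mild channel-specific phase term; the frequency grid and function names are likewise assumptions for illustration.

```python
import cmath


def condition(weights, pattern):
    """Divide every channel's weights by the common pattern, frequency by frequency."""
    return [[w / p for w, p in zip(channel, pattern)] for channel in weights]


def dynamic_range(channel):
    """Ratio of largest to smallest weight magnitude across frequency."""
    mags = [abs(w) for w in channel]
    return max(mags) / min(mags)


freqs_hz = [300.0 + 500.0 * k for k in range(7)]  # 300 Hz to 3300 Hz
low_pass = [1000.0 / f for f in freqs_hz]  # hypothetical shared low-pass shape
weights = [
    [lp * cmath.exp(-1j * 0.0004 * f * j) for lp, f in zip(low_pass, freqs_hz)]
    for j in range(3)
]

conditioned = condition(weights, low_pass)
# Effect of an ideal equalisation response equal to the pattern: by linearity,
# applying it after the summation is the same as applying it to every channel.
restored = [[c * p for c, p in zip(ch, low_pass)] for ch in conditioned]
```

In this contrived case the conditioned weights are flat in magnitude, so a filter fit only has to capture the channel-specific phase; real designs would typically retain some residual magnitude variation after conditioning.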
In terms of computational complexity, even if the conditioning equalisation filter is needed by the application, it appears that the savings that can be achieved by using shorter filters on each beamforming channel can easily outweigh the addition of this single-channel filter at the output of the beamformer. The analysis has to be carried out for each specific case, taking into account the actual savings per channel, the number of channels, as well as the opportunity to embed the conditioning equalisation filter in some already existing filter if possible.
The present invention is not intended to be restricted to a specific conditioning function. In a preferred embodiment, one of the beamforming channel weights, or a delayed version of it, is used as the conditioning function. One advantage of this conditioning is that one channel becomes a trivial channel (pure delay) that does not need FIR filtering in the final implementation. The resulting beamforming structure with the conditioning equalisation filter is shown in Figure 4 for a time-domain implementation. As shown, each output from the array of sensors 52 is filtered by the corresponding conditioned time-domain filter F_k 54; the resulting outputs are then summed 60 and outputted for subsequent processing. As shown, the summed signal is then subjected to a conditioning equalisation filter 62; it will be appreciated that the conditioning equalisation filter is optional, depending on the particular implementation. Although shown for a time-domain implementation, one skilled in the art can easily derive the corresponding structure for a frequency-domain (DFT) implementation.
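A minimal sketch of the preferred embodiment, in which one channel's weights (optionally combined with a delay) serve as the common pattern, is given below. The weight values, the frequency grid, the 1 ms delay and the sign convention of the delay term are all assumptions chosen for illustration.

```python
import cmath
import math


def condition_on_channel(weights, freqs_hz, ref=0, delay_s=0.0):
    """Divide all channels by the reference channel's weights combined with a delay term.

    The reference channel is left with the response exp(-j*2*pi*f*delay_s),
    that is, a pure delay (sign convention assumed for this sketch).
    """
    pattern = [
        w * cmath.exp(2j * math.pi * f * delay_s)
        for w, f in zip(weights[ref], freqs_hz)
    ]
    conditioned = [[w / p for w, p in zip(ch, pattern)] for ch in weights]
    return conditioned, pattern


freqs_hz = [300.0, 1300.0, 2300.0, 3300.0]
weights = [  # arbitrary illustrative values, larger at low frequencies
    [4.0 - 1.0j, 1.5 - 0.6j, 0.9 - 0.2j, 0.6 - 0.1j],
    [-3.5 + 0.5j, -1.0 + 0.8j, -0.4 + 0.5j, -0.1 + 0.3j],
    [0.5 + 0.5j, 0.5 - 0.2j, 0.5 - 0.3j, 0.5 - 0.2j],
]
conditioned, pattern = condition_on_channel(weights, freqs_hz, ref=0, delay_s=0.001)
```

After conditioning, the reference channel has unit magnitude and linear phase at every frequency, so it can be realised as a delay line rather than an FIR filter; the returned `pattern` is the response that a conditioning equalisation filter would need to approximate if the original look-direction response is to be restored.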
To evaluate the effectiveness of this conditioning on the quality of the filter fit, a "multi-channel normalised filter-fit error" is introduced as a cost function. The expression W_j(v_k), 1≤j≤M, 1≤k≤N denotes the complex-domain beamforming weights of the M channels over the discrete set of frequencies v_k, 1≤k≤N, and [ρ₁ ρ₂ ... ρ_N] the (complex) values of the conditioning function on these same frequencies. These weights are assumed to yield a distortionless beamformer in its look direction (e.g. MVDR or LCMV). As such, the cost function is defined as
$J([ρ_{1} ρ_{2} \dots ρ_{N}]) = \sum_{i=1}^{N} \frac{\sum_{j=1}^{M} |\tilde{F}_{j}([ρ_{1} ρ_{2} \dots ρ_{N}], v_{i}) - W_{j}(v_{i}) ρ_{i}|^{2}}{|ρ_{i}|^{2}},$
where F̃_j([ρ₁ ρ₂ ... ρ_N], v_i) denotes the frequency response at frequency v_i of the filter adjusted (according to a given filter-fit procedure, for instance a simple least-squares fit (see S.Haykin, "Adaptive filter theory," Prentice Hall, 1996)) to the j^th channel weight W_j, dot-multiplied (that is, multiplied element-wise) by the conditioning function [ρ₁ ρ₂ ... ρ_N]. Note that the reason for the normalisation factor |ρ_i|² in the denominator is to prevent the error function from being trivially minimal for an identically zero conditioning function. Another way of seeing this is that if the distortionless response of the beamformer is to be conserved despite the conditioning step, then a conditioning equalisation filter is placed on the output of the beamformer as explained above. The effect of this filter on the squared filter-fit error on each channel is precisely to apply the normalisation factor 1/|ρ_i|² to the squared filter-fit error on each channel.
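The cost function can be evaluated with a short dependency-free sketch. Several choices here are assumptions rather than the patent's procedure: the fit is an unweighted least-squares fit with complex FIR taps solved through the normal equations, the frequencies are expressed in radians per sample, and the weights are synthetic, built as a shared low-pass shape times an exact 2-tap response so that conditioning makes a 2-tap fit exact. The helper names are hypothetical.

```python
import cmath


def solve(a, b):
    """Solve the square complex system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def fir_response(taps, omega):
    """Frequency response of an FIR filter at normalised frequency omega (rad/sample)."""
    return sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps))


def least_squares_fir(target, omegas, num_taps):
    """Complex taps minimising the sum over i of |F(omega_i) - target_i|^2."""
    basis = [[cmath.exp(-1j * w * n) for n in range(num_taps)] for w in omegas]
    gram = [
        [sum(row[p].conjugate() * row[q] for row in basis) for q in range(num_taps)]
        for p in range(num_taps)
    ]
    rhs = [
        sum(row[p].conjugate() * t for row, t in zip(basis, target))
        for p in range(num_taps)
    ]
    return solve(gram, rhs)


def filter_fit_error(weights, rho, omegas, num_taps):
    """J(rho): sum over i and j of |F_j(omega_i) - W_j(omega_i) rho_i|^2 / |rho_i|^2."""
    total = 0.0
    for channel in weights:
        target = [w * r for w, r in zip(channel, rho)]
        taps = least_squares_fir(target, omegas, num_taps)
        for omega, t, r in zip(omegas, target, rho):
            total += abs(fir_response(taps, omega) - t) ** 2 / abs(r) ** 2
    return total


omegas = [0.2 + 0.3 * k for k in range(8)]  # hypothetical frequency grid
common = [1.0 / (0.1 + w * w) for w in omegas]  # hypothetical shared low-pass shape
channel_taps = [[1.0, 0.5], [0.8, -0.3], [-0.6, 0.9]]  # synthetic 2-tap channel terms
weights = [
    [c * fir_response(taps, w) for c, w in zip(common, omegas)]
    for taps in channel_taps
]
j_plain = filter_fit_error(weights, [1.0] * len(omegas), omegas, num_taps=2)
j_conditioned = filter_fit_error(weights, [1.0 / c for c in common], omegas, num_taps=2)
```

Because the example is constructed so that the conditioned targets are exactly 2-tap responses, `j_conditioned` is zero up to rounding while `j_plain` is not; with weights from an actual design the improvement would be partial and would depend on how much of the frequency response is truly common to all channels.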
Figure 5 shows the values of this filter-fit quality criterion as a function of the filter order, with and without the weights conditioning step, for a regularised MVDR beamformer over the frequency range [300Hz, 3300Hz] on a 6-microphone uniform linear array of 15cm in length. For this particular example, the weights conditioning step reduces the length of the beamforming filters by roughly 30% for the same quality of filter fit (and therefore the same directivity).
The present invention describes a beamforming design method and a real-time implementation structure that can significantly reduce the computational complexity of broadband beamformers without affecting their performance in terms of spatial directivity or robustness to uncorrelated noise.
It will be appreciated that, although embodiments of the invention have been described and illustrated in detail, various modifications and changes may be made. While preferred embodiments are described above, some of the features described above can be replaced or even omitted.
Although the above description and figures have been presented in the context of an array of sensors, the invention applies equally to arrays of sources for spatially directive transmitters. In such a case, the time-domain filters fitted with the conditioned weights would be applied to the input signals to the transmitter, as opposed to the output signals of the sensors. The present invention pertains to the design procedure making use of a conditioning function, and is not restricted to a specific conditioning function. Many choices are possible for the conditioning function; the present invention is intended to cover all choices as long as they fit in the framework of the design procedure shown in Figure 3. As a general example, one skilled in the art could search for a function that would be optimal according to some criterion related to the quality of the filter fit, or the directivity cost function used to calculate the frequency-domain beamforming weights. Specifically, the multi-channel normalised filter-fit error function described above could represent such a criterion. For a given filter order, a traditional gradient-based optimisation procedure could be used to determine the vector [ρ₁ ρ₂ ... ρ_N] that minimises this function. Note that because the function is homogeneous, the unknown vector must be normalised one way or another in order to have access to local minima. One way is to fix ρ₁ = 1 and therefore look for a conditioning vector of the form [1 ρ₂ ... ρ_N].
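The homogeneity of the filter-fit error function can be made explicit under one assumption that is not stated in the description, namely that the filter-fit procedure is linear in its target (as an unweighted least-squares fit is). Writing ρ for the vector of values ρ_i of the conditioning function over the N frequencies and c for any non-zero complex constant:

$$\tilde{F}_{j}(cρ, v_{i}) = c\,\tilde{F}_{j}(ρ, v_{i}) \;\Rightarrow\; J(cρ) = \sum_{i=1}^{N} \frac{\sum_{j=1}^{M} |c|^{2}\,|\tilde{F}_{j}(ρ, v_{i}) - W_{j}(v_{i}) ρ_{i}|^{2}}{|c|^{2}\,|ρ_{i}|^{2}} = J(ρ)$$

Under that assumption every minimiser belongs to a one-parameter family of equally good vectors cρ, and fixing ρ₁ = 1 selects a single representative of each family.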
Furthermore, the above discussion relates to the present invention in the context of an audio conferencing environment. However, other applications making use of fixed beamforming in the presence of some kind of isotropic noise (e.g. any audio processing) could potentially benefit from it. When it comes to adaptive beamforming and specifically adaptive interference cancellation, the outcome is less certain because there is no a-priori guarantee that all frequency-domain beamforming weights will indeed present a strong common pattern in their frequency response and therefore benefit from the conditioning stage. Nevertheless, the use of the aforementioned beamforming weights conditioning stage in a real-time scenario would also fall within the scope of the present invention.
Still further alternatives and modifications may occur to those skilled in the art. All such alternatives and modifications are believed to be within the scope of the invention.

Claims

1. A beam forming design method for designing a channel-based beamforming system in which beamforming weights are predetermined, said method comprising:
modifying each beamforming weight in the frequency domain by dividing each beamforming weight by a common characteristic established in the frequency response across an array of sensors prior to fitting each beamforming weight to a filter.
2. The method of claim 1, wherein said common characteristic is one of the beamforming weights on one channel of said channel-based beamforming system.
3. The method of claim 1, wherein said common characteristic is a delayed version of one of the beamforming weights on one channel of said channel-based beamforming system.
4. The method of claim 1, wherein said filter is a finite impulse response filter.
5. A weight-conditioning beamformer for providing spatial directivity, said beamformer comprising:
an array of sensors;

corresponding beamforming filters adapted to be fitted with conditioned beamforming weights, wherein each conditioned beamforming weight is obtained by dividing each beamforming weight by a common characteristic established in the frequency response across an array of sensors prior to fitting each beamforming weight to the filter; and

a summer for summing the outputs of said filters.
6. The beamformer of claim 5, further comprising an equalization filter for correcting the frequency response following summation of the signals.
7. The beamformer of claim 5, wherein said beamforming filter is a finite impulse response filter.