WO2019056372A1

WO2019056372A1 - An adaptive filter

Info

Publication number: WO2019056372A1
Application number: PCT/CN2017/103183
Authority: WO
Inventors: Morgan James Colmer
Original assignee: Global Silicon Limited
Priority date: 2017-09-25
Filing date: 2017-09-25
Publication date: 2019-03-28
Also published as: CN111201712A; CN111201712B

Abstract

The invention provides a method of processing signals by an adaptive filter. The method includes the steps of determining a direct path distance d between a speaker and a microphone. Based on said determined direct path distance d, a number of taps of the adaptive filter having a zero-valued coefficient is calculated. Input signals to the adaptive filter are then processed adaptively on all remaining taps of the adaptive filter not having a zero-valued coefficient. The adaptive filter can be utilized in an AEC system which dynamically adapts to changes in the direct path distance d.

Description

An Adaptive Filter

The invention relates to an adaptive filter and a method of processing signals by an adaptive filter. The invention more particularly relates to a method of dynamically changing an input of an adaptive filter in response to a change in a distance between a microphone and a speaker. The method relates particularly, but not exclusively, to acoustic echo cancellation (AEC).

AEC is frequently used in speakerphone and hands-free telephony equipment to remove echoes. Such echoes may be caused by sounds from the far end of the communication link, e.g. the user’s voice at the remote end, being emitted by a local loudspeaker and being captured by a local microphone in addition to the intended capture of the local sound, e.g. the local user’s voice. This echo path will often be perceived by the user at the remote end as an echo of their own voice over the top of the local user’s voice. It is generally desirable to remove this unwanted echo signal.

As illustrated in Fig. 1, AEC is often implemented with an adaptive filter 10. The adaptive filter 10 attempts to replicate the transfer function of the acoustic environment 12 where the loudspeaker 14 and the microphone 16 are located based upon an error signal. The error signal is the difference between the microphone feedback signal and the output signal of the adaptive filter 10. Once the adaptive filter 10 has adapted to mimic the acoustic environment 12 where the loudspeaker 14 and the microphone 16 are located, only the additional signal of the local voice will be transmitted to the far end receiver 18.
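The arrangement of Fig. 1 can be illustrated with a minimal sketch. The source does not name an adaption algorithm, so the normalised LMS (NLMS) update used here is an assumption, as are the function name `nlms_echo_canceller`, the step size `mu` and the toy echo path.

```python
import random

def nlms_echo_canceller(far_end, mic, num_taps, mu=0.5, eps=1e-8):
    """Return (error signal, final coefficients) for an NLMS echo canceller."""
    w = [0.0] * num_taps   # adaptive tap coefficients
    x = [0.0] * num_taps   # most recent far-end samples, newest first
    errors = []
    for s, m in zip(far_end, mic):
        x = [s] + x[:-1]                               # shift in the new sample
        y = sum(wi * xi for wi, xi in zip(w, x))       # estimated echo
        e = m - y                                      # error signal
        norm = sum(xi * xi for xi in x) + eps          # input power (regularised)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        errors.append(e)
    return errors, w

# Hypothetical toy room: the echo is the far-end signal delayed by 3 samples
# and attenuated to half amplitude. There is no local talker in this example.
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] * 3 + [0.5 * s for s in far[:-3]]
errors, w = nlms_echo_canceller(far, mic, num_taps=8)
```

In this toy case the coefficient at tap 3 approaches 0.5 and the remaining coefficients approach zero, so the error signal decays towards zero once the filter has adapted.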

The adaptive filter 10 often requires considerable processing resources to be practicably implemented. Often a 1024 or 2048 tap adaptive filter 10 is needed. The processing required to implement an adaption algorithm for calculating the adaptive filter tap coefficients scales as a function of the filter length. The length of the adaptive filter 10 should be proportional to a tail length of the combined echo paths 20 within the acoustic environment 12 that need to be cancelled. In general, longer echo paths 20 have lower gains and so a system designer may choose a length of the adaptive filter 10 to match the performance requirements of the AEC system implemented with an N tap adaptive filter.

The feedback signal from the microphone 16 can be considered in terms of its impulse response with the various echo paths 20 representing taps of the adaptive filter 10. The AEC system is essentially trying to match the impulse response of the adaptive filter 10 to the impulse response of the acoustic environment 12. Because of the finite speed of sound in air, the impulse response of the acoustic environment 12 will have a zero response for the duration of time that is equivalent to a time of flight for a direct path between the loudspeaker 14 and the microphone 16. The adaptive filter 10 must have zero-valued coefficients for the sample times, i.e. taps, which represent the time of flight for a direct path between the loudspeaker 14 and the microphone 16. Essentially, there can be no echo paths 20 within a time shorter than a flight time of a direct path 22 of the sound between the loudspeaker 14 and the microphone 16 and thus the taps of the adaptive filter 10 that represent these times do not need to be calculated. When the locations of the loudspeaker 14 and the microphone 16 are fixed with respect to each other, there will generally be a fixed proportion of the initial taps of the adaptive filter 10 which have zero-valued coefficients.

Traditional AEC can be very efficient when the relative locations of the loudspeaker 14 and the microphone 16 are fixed with respect to each other. When, however, the relative positions of the loudspeaker 14 and the microphone 16 are not fixed with respect to each other, the AEC system cannot make any assumptions regarding the number of zero-valued coefficient initial taps in the adaptive filter 10 and thus must attempt to calculate all of them in real time on each occasion. This might be the case when, for example, the local user has a portable microphone and moves around within the local acoustic environment 12, although it is equally the case where one of the loudspeaker 14 or the microphone 16 is moved relative to the other. The processing requirements are further increased due to the need to make the adaptive filter 10 converge faster, because the acoustic environment 12 as seen by the local microphone 16 may now be dynamically changing and thus the adaptive filter 10 needs to adapt faster than the local user is able to move the microphone 16 within that environment 12.

There is therefore a need for an improved method of processing signals in an adaptive filter.

Objects of the Invention

An object of the invention is to mitigate or obviate to some degree one or more problems associated with known methods of processing signals in an adaptive filter.

The above object is met by the combination of features of the main claims; the sub-claims disclose further advantageous embodiments of the invention.

Another object of the invention is to provide a method for changing an input of an adaptive filter in response to a change in a distance between a microphone and a speaker.

Another object of the invention is to provide a method for dynamically changing an input of an adaptive filter in response to dynamic changes in distance between a microphone and a speaker.

One skilled in the art will derive from the following description other objects of the invention. Therefore, the foregoing statements of object are not exhaustive and serve merely to illustrate some of the many objects of the present invention.

In a first main aspect, the invention provides a method of processing signals by an N tap adaptive filter, the method comprising the steps of: determining a direct path distance d between a speaker and a microphone; based on said determined direct path distance d, calculating a number of taps of the adaptive filter having a zero-valued coefficient; and processing input signals adaptively on all remaining taps of the adaptive filter not having a zero-valued coefficient.

In a second aspect, the invention provides a non-transitory computer readable medium storing machine readable code which, when executed by a processor, causes an electronic processing device to implement the steps of the method of the first aspect.

In a third aspect, the invention provides a microphone unit comprising a computer readable medium storing machine readable code which, when executed by a processor of said microphone unit, causes the microphone unit to implement the steps of the method of the first aspect.

In a fourth aspect, the invention provides a sound system comprising a speaker and a microphone unit according to the third aspect.

The summary of the invention does not necessarily disclose all the features essential for defining the invention; the invention may reside in a sub-combination of the disclosed features.

The foregoing and further features of the present invention will be apparent from the following description of preferred embodiments which are provided by way of example only in connection with the accompanying figures, of which:

Figure 1 is a schematic diagram of a known AEC system based on an adaptive filter;

Figure 2 is a block diagram of a system in which the method of the invention can be performed;

Figure 3 is a schematic diagram of an AEC system in accordance with the invention based on an adjustable adaptive filter;

Figure 4 shows the AEC system in accordance with the invention in more detail; and

Figure 5 provides a comparison of the performance of the known AEC system of Fig. 1 with the AEC system in accordance with the invention.

Description of Preferred Embodiments

The following description is of preferred embodiments by way of example only and without limitation to the combination of features necessary for carrying the invention into effect.

Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments, but not other embodiments.

It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input/output interfaces.

The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term "processor" or "controller" should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor ("DSP") hardware, read-only memory ("ROM") for storing software, random access memory ("RAM"), and non-volatile storage.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.

Referring to Fig. 2, shown by way of example only is a sound system 100 in which the improved AEC system of the invention may be implemented. The sound system comprises a speaker unit 110 and a microphone unit 210 separated by a direct path distance d. The speaker unit 110 may comprise only a speaker unit, but preferably comprises a conference call unit 120 or the like including one or more integrated microphones 130 and input means 140 to enable users to operate the conference unit 120 to set up and hold conference calls. The conference unit 120 preferably provides a main speaker module such as a loudspeaker 150. The conference unit 120 is also provided with a processor 160 and a memory 170. The memory 170 stores machine readable instructions which, when executed by the processor 160, cause the conference unit 120 to implement the methods and functions hereinafter described. The conference unit 120 may also be provided with means 180 such as an accelerometer, a magnetometer or the like which senses when the conference unit 120 has been moved. Other means of detecting the position of the conference unit 120 within an acoustic environment, particularly with respect to the microphone unit 210, may be used additionally or alternatively to the movement sensing means 180.

The microphone unit 210 may comprise a portable unit such that it can be held by a user, although this is not essential. In any case, the microphone unit 210 is preferably configured such that it may be placed at any distance d from the conference unit 120 where d is a measure in metres of the direct path distance from the loudspeaker 150 to a microphone module 220 of the microphone unit 210. In use, the distance d may vary over time should a user move the microphone unit 210 and/or move the conference unit 120. The microphone unit 210 is also provided with a processor 230 and a memory 240. The memory 240 stores machine readable instructions which, when executed by the processor 230, cause the microphone unit 210 to implement the methods and functions hereinafter described. The microphone unit 210 may also be provided with means 250 such as an accelerometer, a magnetometer or the like which senses when the microphone unit 210 has been moved. Other means of detecting the position of the microphone unit 210 within the acoustic environment, particularly with respect to the conference unit 120, may be used additionally or alternatively to the movement sensing means 250.

Whilst the conference unit 120 and the microphone unit 210 are shown as comprising separate devices, it will be understood that the method of the invention can be applied to any sound system where the position of a microphone may be adjusted with respect to a speaker even if the microphone and speaker are provided in the same device or apparatus.

The sound system may include a distance measuring device or system 310 which is arranged to determine the value of the direct path distance d between the conference unit 120 and the microphone unit 210. The distance measuring device or system 310 may comprise any one or any combination of: means for cross-correlating the speaker drive signal and the microphone feedback signal to derive d; radio frequency (RF) ranging means; ultrasound ranging means; or a machine vision system. The distance measuring device or system 310 may comprise a stand-alone unit or may be integrated with one or other of the conference unit 120 and the microphone unit 210. The foregoing examples of distance measuring means are provided by way of example only. It will be understood that any known time of flight (ToF) measuring device, system or apparatus may be employed in the implementation of the invention to determine, calculate, derive or measure the direct path distance d between the conference unit 120 and the microphone unit 210.
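Of the ranging options listed above, the cross-correlation approach can be sketched as follows. This is an illustrative sketch only: the function name `estimate_distance`, the search range `max_lag`, the assumed speed of sound of 343 m/s and the synthetic signals are not taken from the source. The sketch simply picks the lag with the largest correlation, which assumes the direct path is the strongest arrival.

```python
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C

def estimate_distance(drive, mic, sample_rate, max_lag):
    """Estimate d from the lag that maximises the drive/mic cross-correlation."""
    usable = len(drive) - max_lag          # samples available at every lag
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(drive[n] * mic[n + lag] for n in range(usable))
        if score > best_score:
            best_lag, best_score = lag, score
    # Convert the delay in samples to a distance in metres.
    return best_lag * SPEED_OF_SOUND / sample_rate

# Hypothetical test signal: white noise delayed by 47 samples at 16 kHz.
random.seed(1)
fs = 16000
true_lag = 47
drive = [random.uniform(-1.0, 1.0) for _ in range(4000)]
mic = [0.0] * true_lag + [0.8 * s for s in drive[:-true_lag]]
d = estimate_distance(drive, mic, fs, max_lag=200)
```

With these synthetic signals the estimate corresponds to a delay of 47 samples, i.e. about 1 m. A practical system would likely need to handle noise, reverberation and a direct path that is not the strongest peak.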

The method of the invention generally relates to a method of processing signals for AEC or the like using an N tap adaptive filter where the inputs, i.e. taps, to the adaptive filter can be modified in response to a change in value of the direct path distance d, where N is an integer ≥ 2. The method can be better understood from Figs. 3 and 4 which each show an adaptive filter 410 in an AEC system 400 in accordance with the invention where AEC is used to remove echoes from signals being communicated to a remote receiver system 500 from the local sound system 100.

If the shortest path length d (time of flight) between the loudspeaker 150 and the microphone 220 is known or can be determined, calculated, derived or measured then a Time of Flight Optimization (ToFO) technique can be implemented to optimize the processing requirements to implement AEC within the adaptive filter 410. When the local microphone 220 is located further away from the local loudspeaker 150, the proportion of zero-valued coefficient taps in the adaptive filter 410 increases and thus the processing requirements will fall as the zero-valued coefficient taps need little or no processing compared to the computational requirements for processing adaptively signals input on remaining taps not having a zero-valued coefficient. The processing bandwidth that has been freed up can potentially be used to dynamically decrease the convergence time or to increase the temporal range of the adaptive filter 410. The section of the adaptive filter 410 that has taps calculated to have zero-valued coefficients can be treated as a FIFO filter from a processing point of view which will consume very little processing overhead.

The method of the invention therefore comprises processing signals by the N tap adaptive filter 410 by firstly determining, calculating, deriving, measuring or otherwise obtaining a direct path distance d between the local loudspeaker 150 and the local microphone 220. Then, based on the determined or obtained direct path distance d, calculating a number of taps of the adaptive filter 410 having a zero-valued coefficient. Consequently, the method involves processing input signals adaptively only on all remaining taps of the adaptive filter 410 not having a zero-valued coefficient. For those taps of the adaptive filter 410 calculated to have a zero-valued coefficient, the method may include processing input signals on these taps as First-In-First-Out (FIFO) filter taps. In other words, the taps determined to each have a zero-valued coefficient can be treated as comprising a FIFO section 410A of a combined FIFO and adaptive filter 410 and all remaining taps not having a zero-valued coefficient can be treated as comprising an adaptive tap section 410B of the combined FIFO and adaptive filter 410.

Preferably, the number of taps of the adaptive filter 410 having a zero-valued coefficient is calculated from the equation:

N₁ = (d / c) × F_S

where:

N₁ is the number of taps of the adaptive filter having a zero-valued coefficient;

d is the direct path distance between the speaker and the microphone;

c is the speed of sound in air; and

F_S is the system sample rate.
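As a worked illustration of the quantities defined above, the following sketch takes the number of zero-valued taps to be the direct path time of flight d / c expressed in samples at the rate F_S. Rounding down to a whole number of taps, the default speed of sound of 343 m/s and the function name `zero_valued_taps` are assumptions made for this example.

```python
import math

def zero_valued_taps(d, fs, c=343.0):
    """Number of leading taps covered by the direct-path time of flight.

    d  : direct path distance in metres
    fs : system sample rate in Hz
    c  : speed of sound in air in m/s (assumed default)
    """
    if d < 0 or fs <= 0 or c <= 0:
        raise ValueError("d must be non-negative; fs and c must be positive")
    # Round down so that no tap which could carry direct-path energy is zeroed.
    return math.floor(d * fs / c)

# Example distances at a 16 kHz sample rate.
for distance in (0.5, 1.0, 5.0):
    print(distance, zero_valued_taps(distance, fs=16000))
```

At 16 kHz each metre of separation corresponds to roughly 46 leading taps that need not be adapted.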

Preferably also, the adaptive filter 410 is treated as a combined FIFO and adaptive filter based on the equation:

N = N₁ + N₂

where:

N is the total number of taps of the combined FIFO and adaptive filter;

N₁ is the number of taps comprising a FIFO section of the combined FIFO and adaptive filter;

N₂ is the number of taps comprising an adaptive section of the combined FIFO and adaptive filter.
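The split into a FIFO section and an adaptive section can be sketched as below. The class name `FifoAdaptiveFilter`, the NLMS update and the toy echo path are assumptions for illustration. The point shown is that the first N₁ delay-line positions only carry samples forward, while the filter output and the coefficient update touch the remaining N₂ taps only.

```python
from collections import deque
import random

class FifoAdaptiveFilter:
    """Delay line of n1 + n2 samples; only the last n2 taps are adaptive."""

    def __init__(self, n1, n2, mu=0.5, eps=1e-8):
        self.n1, self.n2 = n1, n2
        self.mu, self.eps = mu, eps
        # Newest sample at index 0; the oldest sample falls off the right end.
        self.delay = deque([0.0] * (n1 + n2), maxlen=n1 + n2)
        self.w = [0.0] * n2               # coefficients of the adaptive section

    def process(self, far_sample, mic_sample):
        self.delay.appendleft(far_sample)
        x = list(self.delay)[self.n1:]    # skip the FIFO section entirely
        y = sum(wi * xi for wi, xi in zip(self.w, x))
        e = mic_sample - y
        norm = sum(xi * xi for xi in x) + self.eps
        self.w = [wi + self.mu * e * xi / norm for wi, xi in zip(self.w, x)]
        return e

# Hypothetical echo path: direct path at 50 samples, one reflection at 53.
random.seed(2)
far = [random.uniform(-1.0, 1.0) for _ in range(3000)]
mic = [0.0] * len(far)
for n in range(len(far)):
    if n >= 50:
        mic[n] += 0.5 * far[n - 50]
    if n >= 53:
        mic[n] += 0.2 * far[n - 53]

aec = FifoAdaptiveFilter(n1=46, n2=16)
errors = [aec.process(s, m) for s, m in zip(far, mic)]
```

Here the 16 adaptive taps cover delays 46 to 61, so the two echo components appear at adaptive indices 4 and 7, and the per-sample adaptation cost depends on N₂ rather than on N.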

In one embodiment, the method involves determining or obtaining a direct path distance d between the local loudspeaker 150 and the local microphone 220 on only one occasion at the start of operation and adjusting operation of the adaptive filter 410 based on said once only determination of the direct path distance d.

In another embodiment, the method involves determining or obtaining an initial direct path distance d between the local loudspeaker 150 and the local microphone 220 and thereafter only determining or obtaining a new value for the direct path distance d when it is sensed that one or other of the local loudspeaker 150 or the local microphone 220 has been moved, i.e. in response to inputs from one or other of the movement sensing means 180, 250 or any other suitable means.
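The movement-triggered embodiment can be sketched as a small planner that measures d once at start-up and again only when movement is reported. The class name `TapPlanner`, the ranging callback, the stub measurements and the choice to keep at least one adaptive tap are all hypothetical.

```python
import math

class TapPlanner:
    """Re-measure d only at start-up or when movement has been sensed."""

    def __init__(self, measure_distance, total_taps, fs, c=343.0):
        self.measure_distance = measure_distance   # hypothetical ranging callback
        self.total_taps = total_taps
        self.fs = fs
        self.c = c
        self.n1 = None                             # not yet determined

    def update(self, moved):
        """Return (n1, n2), measuring d only when required."""
        if self.n1 is None or moved:
            d = self.measure_distance()
            n1 = math.floor(d * self.fs / self.c)
            self.n1 = min(n1, self.total_taps - 1)  # keep at least one adaptive tap
        return self.n1, self.total_taps - self.n1

# Stub ranging function that records how often it is called.
calls = []
distances = iter([1.0, 3.0])

def fake_ranging():
    d = next(distances)
    calls.append(d)
    return d

planner = TapPlanner(fake_ranging, total_taps=1024, fs=16000)
first = planner.update(moved=False)    # initial determination of d
second = planner.update(moved=False)   # no movement: stored split is reused
third = planner.update(moved=True)     # movement sensed: d is measured again
```

In this usage the ranging function runs twice across three updates, which reflects the intent of avoiding distance measurements while the units are stationary.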

In yet another embodiment, the method involves continuously or periodically determining or obtaining the direct path distance d between the local loudspeaker 150 and the local microphone 220 in order to dynamically adjust the input to the adaptive filter 410 by dynamically adjusting the sizes of the FIFO section 410A and the adaptive filter section 410B. As such, the method may further comprise the step of dynamically adjusting the values of N₁ and N₂ in response to changes in the determined distance d between the local loudspeaker 150 and the local microphone 220.

For example, suppose the AEC processor 160, 230 has enough computational speed (MIPS) to run a large adaptive filter such as a 2048 tap adaptive filter 410. If the local microphone 220 and the local loudspeaker 150 are a long way apart, much of this computational speed is wasted, as the coefficients of the taps of the adaptive filter 410 that represent the direct path time of flight between the local microphone 220 and the local loudspeaker 150 will converge to zero (because there cannot be any echo path which is shorter than the direct path) and, as such, these taps require little or no processing. Treating these taps as the FIFO section 410A means that, in the case of a 2048 tap filter, the whole of the filter is now being used to cancel an even longer echo tail, thus giving better performance.

Preferably, the dynamic adjustment of the adaptive filter 410 is performed periodically, for example every 10 ms.
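A dynamic adjustment step might be sketched as follows. The source does not say what happens to existing coefficients when the section sizes change, so carrying them across by absolute tap index is an assumption of this example, as are the function names and the 16 kHz sample rate.

```python
import math

def split_taps(d, total_taps, fs, c=343.0):
    """Return (n1, n2) for distance d, keeping at least one adaptive tap."""
    n1 = min(total_taps - 1, math.floor(d * fs / c))
    return n1, total_taps - n1

def remap_coefficients(w, old_n1, new_n1, total_taps):
    """Carry adaptive coefficients across a change of n1.

    w holds the coefficients for absolute taps old_n1 .. total_taps - 1.
    Taps that were previously in the FIFO section start from zero.
    """
    new_w = []
    for k in range(new_n1, total_taps):     # absolute tap index
        j = k - old_n1                      # position in the old adaptive section
        new_w.append(w[j] if 0 <= j < len(w) else 0.0)
    return new_w

# At 16 kHz, an adjustment every 10 ms corresponds to one update per 160 samples.
fs = 16000
block_samples = fs * 10 // 1000

# Small example with 8 taps in total and 6 adaptive coefficients (taps 2..7).
w = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
farther = remap_coefficients(w, old_n1=2, new_n1=4, total_taps=8)
closer = remap_coefficients(w, old_n1=2, new_n1=1, total_taps=8)
```

When the microphone moves away, the leading adaptive coefficients are dropped into the FIFO section. When it moves closer, newly exposed taps begin at zero and are adapted from there. Since a change in d also changes the room response, a real system might instead choose to reset some or all coefficients.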

Fig. 5 shows a comparison of the performance of the known AEC system of Fig. 1 with the AEC system 400 in accordance with the invention. The AEC window of the known AEC system can be designed to very efficiently deal with echo cancellation where the distance d between the local loudspeaker and the local microphone is known and is fixed as illustrated by window 610 in part (a) of Fig. 5. However, as seen in parts (b) and (c) of Fig. 5, if the distance d is changed such that the local loudspeaker and the local microphone are moved apart, the known AEC system window 610 progressively fails to cover the part 630a of the acoustic environment signal 630 requiring AEC adaptive processing. In contrast, as shown by window 620, the ability provided by the method of the invention to adjust the adaptive filter 410 enables efficient AEC adaptive processing at any value of distance d.

In the method of the invention, the local microphone 220 and the loudspeaker 150 are preferably linear and thus should not contribute significantly to the system transfer function based upon the error signal.

It is envisaged that the adaptive filter 410 will comprise a finite impulse response (FIR) filter 410.

The method may be implemented at one or both of the conference unit 120 and the microphone unit 210.

The invention also provides a non-transitory computer readable medium 170, 240 storing machine readable code which, when executed by a processor 160, 230, causes an electronic processing device 120, 210 to implement the steps of the method hereinbefore described.

The invention also provides a microphone unit 210 comprising a computer readable medium 240 storing machine readable code which, when executed by a processor 230 of said microphone unit 210, causes the microphone unit 210 to implement the steps of the method hereinbefore described.

The invention also provides a sound system 100 comprising a speaker unit 110 and a microphone unit 210. The sound system may also include a distance measuring unit 310. Preferably, the local microphone 220 and the local loudspeaker 150 are linear and thus do not contribute significantly to the system transfer function based on the error signal.

While the invention has been illustrated and described in detail in the drawings and foregoing description, the same is to be considered as illustrative and not restrictive in character, it being understood that only exemplary embodiments have been shown and described and do not limit the scope of the invention in any manner. It can be appreciated that any of the features described herein may be used with any embodiment. The illustrative embodiments are not exclusive of each other or of other embodiments not recited herein. Accordingly, the invention also provides embodiments that comprise combinations of one or more of the illustrative embodiments described above. Modifications and variations of the invention as herein set forth can be made without departing from the spirit and scope thereof, and, therefore, only such limitations should be imposed as are indicated by the appended claims.

In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word “comprise” or variations such as “comprises” or “comprising” is used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.

It is to be understood that, if any prior art publication is referred to herein, such reference does not constitute an admission that the publication forms a part of the common general knowledge in the art.

Claims

1. A method of processing signals by an N tap adaptive filter, the method comprising the steps of:
obtaining a direct path distance d between a speaker and a microphone;
based on said direct path distance d, calculating a number of taps of the adaptive filter having a zero-valued coefficient; and
processing input signals adaptively on all remaining taps of the adaptive filter not having a zero-valued coefficient.
2. The method of claim 1 further comprising the step of processing input signals on the taps having a zero-valued coefficient as First-In-First-Out (FIFO) filter taps.
3. The method of claim 1 or claim 2 further comprising the step of re-calculating the number of taps of the adaptive filter having a zero-valued coefficient in response to a change in the distance d between the speaker and the microphone.
4. The method of claim 3, wherein, after an initial determination of the direct path distance d, the step of obtaining the direct path distance d between the speaker and the microphone is performed in response to a sensed movement of one of the speaker or the microphone relative to the other.
5. The method of claim 3, wherein the step of obtaining a direct path distance d is performed dynamically.
6. The method of claim 5, wherein the step of obtaining a direct path distance d is performed periodically.
7. The method of any one of the preceding claims, wherein the number of taps of the adaptive filter having a zero-valued coefficient is calculated from the equation:
N₁ = (d / c) × F_S
where:
N₁ is the number of taps of the adaptive filter having a zero-valued coefficient;
d is the direct path distance between the speaker and the microphone;
c is the speed of sound in air; and
F_S is the system sample rate.
8. The method of claim 7, wherein the adaptive filter is treated as a combined FIFO and adaptive filter based on the equation:
N = N₁ + N₂
where:
N is the total number of taps of the combined FIFO and adaptive filter;
N₁ is the number of taps comprising a FIFO section of the combined FIFO and adaptive filter;
N₂ is the number of taps comprising an adaptive section of the combined FIFO and adaptive filter calculated according to claim 7.
9. The method of claim 8 further comprising the step of dynamically adjusting the values of N₁ and N₂ in response to changes in the distance d between the speaker and the microphone.
10. The method of any one of the preceding claims, wherein the adaptive filter comprises a finite impulse response (FIR) filter.
11. The method of any one of the preceding claims, wherein the step of obtaining a direct path distance d between a speaker and a microphone comprises any one or any combination of: cross-correlating the speaker drive signal and the microphone feedback signal to derive d; using a radio frequency (RF) ranging technique between the microphone and the speaker; using an ultrasound ranging technique between the microphone and the speaker; or using a machine vision system.
12. A non-transitory computer readable medium storing machine readable code which, when executed by a processor, causes an electronic processing device to implement the steps of the method of any one of the preceding claims.
13. A microphone unit comprising a computer readable medium storing machine readable code which, when executed by a processor of said microphone unit, causes the microphone unit to implement the steps of the method of any one of claims 1 to 11.
14. A sound system comprising a speaker unit and a microphone unit according to claim 13.