WO1997037514A1

WO1997037514A1 - Apparatus for processing stereophonic signals

Info

Publication number: WO1997037514A1
Application number: PCT/GB1997/000772
Authority: WO
Inventors: Richard Clemow; Fawad Nackvi
Original assignee: Central Research Laboratories Limited
Priority date: 1996-03-30
Filing date: 1997-03-20
Publication date: 1997-10-09
Also published as: EP0890295A1; DE69707847D1; EP0890295B1; DE69707847T2; TW357537B; DK0890295T3; JP2000507762A; GB9606814D0

Abstract

In order to simplify filter construction in a circuit which can be used for crosstalk cancellation in binaural signals to be played through loudspeakers, the circuit includes first and second signal paths for receiving as inputs left and right binaural signals LEFT-IN, RIGHT-IN. The first signal path includes a first summing junction (10) and the second signal path includes a second summing junction (12). The output of summing junction (10) is coupled by a first cross-path (14) to an input of summing junction (12), and the output of summing junction (12) is coupled by a second cross-path (22) to an input of junction (10). The first and second cross-paths include respective first and second filter means (17, 25) which include crosstalk filters (18, 26) each having a transfer function A/S, where A and S represent far-ear and near-ear HRTFs. The outputs of the summing junctions (10, 12) represent output signals LEFT-OUT, RIGHT-OUT, incorporating crosstalk cancellation.

Description

Apparatus for Processing Stereophonic Signals

This invention relates to apparatus for processing stereophonic signals, particularly though not exclusively binaural signals. The invention also relates to stereo expansion apparatus wherein stereophonic signals are processed to produce a greater impression of three dimensions.

The processing of binaural signals to produce highly realistic three-dimensional sound images is well known, see our International Patent Application No. WO 94/22278 (our ref. PQ12529). Binaural technology is based on recordings made using a so-called "artificial head" microphone system, and the recordings are subsequently processed digitally. The use of the artificial head ensures that the natural three-dimensional sound cues, which the brain uses to determine the position of sound sources in three-dimensional space, are incorporated into the stereo recording. Subsequent signal processing of the binaural signals ensures that transaural crosstalk is cancelled (crosstalk occurs when an audio signal intended for one ear of a listener is also received by the other ear) and that the three-dimensional cues are effective on playback of the material through loudspeakers, such that the brain can interpret the cues correctly. Without the processing, the recordings sound tonally incorrect and do not reproduce their three-dimensional attributes through loudspeaker auditioning. For the purposes of the present specification, the term "binaural signals" is intended to mean two-channel or stereophonic signals which include a component representing audio diffraction effects created by an artificial head means positioned between a pair of spaced apart microphones. The artificial head means may be, as is common, a precise model of a human head and torso, with microphones in the ear structures; alternatively it may be something far less precise, for example a block or sheet of wood positioned between a pair of spaced microphones, which nevertheless creates diffraction signals from the source of sound signals; it may even be an electrical synthesis circuit or system which creates and applies such a signal component to stereophonic signals.

There are many applications for binaural reproduction where only a limited amount of processing power is available for processing the digital signals, for example in a personal computer. Known signal processing systems for binaural signals have in general required digital filters of some complexity for crosstalk cancellation, which makes them unsuitable for use in applications where processing power is limited.

Crosstalk-cancellation has been achieved in the prior art using a number of filter architectures. The filters represent various combinations of two basic functions, firstly the transfer function (S) between a first loudspeaker of a pair of loudspeakers and the ear of a listener closer to such first loudspeaker, and secondly a function (A) representing the transfer function from the same first loudspeaker to the far ear of the listener (closer to the other loudspeaker). These functions S and A are termed "head related transfer functions" (HRTFs), and such functions have been measured and are widely published - see for example HL Han, J. Audio Eng. Soc, Jan./Feb. 1994, 42, (1/2), pp.15-36. Of course, precise values of the HRTFs may vary if, instead of measurements on a real human head, the HRTF is derived from measurements or calculations based on a model; if the model chosen is simply a block of wood between the microphones then the transfer function will be much simpler than that of a realistic dummy head; for the purposes of this specification, "head related transfer function" is intended to cover all such functions as measured on a real head or measured or calculated from a model of a human head.

Referring now to Figure 1, this shows one form of filter architecture described in Figure 5 of US-A-3,236,949 to Atal and Schroeder where all the crosstalk-cancellation effects are built into a set of four filters F1, F2, F3, F4. A binaural input has LEFT-IN and RIGHT-IN input signals, filter F1 feeding the LEFT-IN signal to a LEFT-OUT output via a summing junction 2, where the LEFT-IN signal is combined with the RIGHT-IN signal received via filter F2. The RIGHT-IN signal is also fed through filter F4 and combined at summing junction 4 with the LEFT-IN signal received via filter F3, to provide output signal RIGHT-OUT.

Figure 2 shows an alternative architecture as disclosed in GB-A-394,325 to Blumlein and US-A-4,893,342 to Cooper and Bauck where the filters are arranged as SUM and DIFFERENCE filters, with binaural LEFT-IN and RIGHT-IN signals being supplied to both filters via summing junction 3 and subtractor junction 5, and the outputs of the filters being fed to summing junction 6 and subtractor junction 8 to derive output signals LEFT-OUT, RIGHT-OUT. The arrangements of Figures 1 and 2 require filters of some complexity since they build all the crosstalk-cancellation effects into one filter set. If an attempt is made to reduce the complexity of these filters too far, critical detail is lost and the arrangement becomes ineffective.

A third arrangement, shown in Figure 3 and disclosed in our copending application WO 94/22278 (our ref. PQ 12529), also suffers from the same problem of complexity in a Y FILTER, although not in an X FILTER. In Figure 3, a binaural RIGHT-IN signal is combined via FILTER X with the LEFT-IN signal in summing junction 10, the output of junction 10 providing via FILTER Y a LEFT-OUT signal. The RIGHT-IN signal is combined in summing junction 12 with the LEFT-IN signal supplied via FILTER X, and the output of junction 12 is provided via FILTER Y as a RIGHT-OUT signal.

A further filter architecture is shown in Japan Acoustics Institute Collected Lecture Papers May 1976 pages 659, 660 - Figure 6 - A Circuit Of Stereo Sound Image Synthesis - T. Doi and O. Hamada. A schematic diagram is shown in Figure 4, wherein the output of a summing junction 10 is applied through a filter A to the input of a summing junction 12, and through a filter B to provide an output signal LEFT-OUT. The output of summing junction 12 is applied through a filter A to the input of summing junction 10, and through a filter B to provide an output signal RIGHT-OUT. The filters A and B are complex in construction, but notably the filters A in the cross-feed paths comprise low pass filters, as do those of Blumlein (GB-A-394,325).

Summary of the Invention

It is an object of the present invention to provide a circuit incorporating crosstalk cancellation filters for enabling binaural signals to be played over loudspeakers, such that relatively simple filters may be employed.

The present invention provides in a first aspect apparatus for processing binaural signals including first and second signal paths for receiving respectively left and right binaural input signals, the first signal path including a first combining junction and the second signal path including a second combining junction, the output of the first combining junction being coupled by a first cross-path to an input of the second combining junction, and the output of the second combining junction being coupled by a second cross-path to an input of the first combining junction, wherein each of the first and second cross-paths includes crosstalk filter means having a transfer function A/S, where A and S represent respectively far-ear and near-ear HRTFs, as defined above, and wherein the outputs of the first and second combining junctions represent binaural output signals.

Each combining junction will commonly be a summing junction, but may be a subtracting or differencing junction. Since it is normally required to subtract a component of one channel from the other channel in order to compensate for crosstalk, if a summing junction is employed, then the cross-path should provide a signal inversion.

In a further aspect, the invention provides a method of processing binaural signals including providing left and right binaural input signals to respective first and second signal paths, the signal paths providing as outputs left and right binaural output signals, feeding the left binaural output signal via a first cross-path to the second signal path through a first cross-talk filter means and combining the filtered left binaural output signal with the right binaural input signal, and feeding the right binaural output signal via a second cross-path to the first signal path through a second cross-talk filter means and combining the filtered right binaural output signal with the left binaural input signal, each of the first and second crosstalk filter means having a transfer function A/S, where A and S represent respectively far-ear and near-ear HRTFs, as defined above.

Such a circuit architecture permits crosstalk filter means of a particularly simple construction in order to realise the function A/S. It will be appreciated that a simple filter could not be used in the prior art configurations of Figures 1 to 4 because in those arrangements the filters have to deal with multiple cancellation problems whereas in the present invention, since the cross-paths extend between the output of one channel and a combining junction in the other, the multiple cancellation problem does not arise.

It will be appreciated that any electrical creation of binaural signals and HRTFs will inevitably involve some simplification and approximation to reality. In particular whilst as indicated above the binaural signals may be produced by a number of means, it is preferred that the artificial head means include ear structures, in which are located microphones, mounted on either side of a head structure in order to create the various cues necessary for realistic three dimensional sound reproduction in all situations. The combining junction will commonly be a summing junction, but may be a subtracting or differencing junction. Since it is normally required to subtract a component of one channel from the other channel in order to compensate for crosstalk, if a summing junction is employed, then the cross-path should provide a signal inversion.

A particular advantage of the present invention arises in that it is fully compatible with the invention described in our copending International Patent Application No. WO 95/15069 (our ref. PQ 12582); this addressed one problem arising with binaural sound recordings which is that generally a listener has to sit still in a well-defined position relative to the loudspeakers, or the binaural effect is lost. In other words, there is a "sweet spot" of only small dimensions in which the binaural effect is produced. The International Application discloses a mechanism for broadening the "sweet spot" to accommodate head movement, by a mechanism involving a less than complete crosstalk cancellation, the crosstalk being reduced by a factor between 0.95 and 0.5. By inserting such an attenuation factor in the cross-paths, the sweet spot is accordingly broadened. Further by introducing such an attenuation factor, the circuit configuration is made more stable, in that the DC gain of the cross-paths is made less than one.

Brief Description of the Drawings

A preferred embodiment of the invention will now be described with reference to the accompanying drawings, wherein:

Figures 1 to 4 are examples of prior art cross-talk cancellation arrangements;

Figure 5 is a schematic diagram of a preferred embodiment of the invention;

Figures 6 and 7 are graphical representations of a first filter function of a filter for Figure 5 in terms of gain and phase versus frequency, as compared with a theoretically ideal filter function;

Figure 8 is a circuit diagram of a preferred filter for implementing the filter function of Figures 6 and 7;

Figures 9 and 10 are graphical representations of a second filter function of a filter for Figure 5 in terms of gain and phase versus frequency, as compared with a theoretically ideal filter function;

Figure 11 is a circuit diagram of a preferred filter for implementing the filter function of Figures 9 and 10;

Figures 12 and 13 are graphical representations of a third filter function of a filter for Figure 5 in terms of gain and phase versus frequency, as compared with a theoretically ideal filter function;

Figure 14 is a circuit diagram of a preferred filter for implementing the filter function of Figures 12 and 13;

Figures 15 and 16 are graphical representations of a fourth filter function of a filter for Figure 5 in terms of gain and phase versus frequency, as compared with a theoretically ideal filter function; and

Figure 17 is a circuit diagram of a preferred filter for implementing the filter function of Figures 15 and 16.

Description of the Preferred Embodiment

Referring now to Figure 5, in the preferred embodiment of the present invention a LEFT-IN binaural input signal is applied to a first summing junction 10, and a RIGHT-IN binaural input signal is applied to a second summing junction 12. The output of the summing junction 10 is applied to an input of the second summing junction 12 through a first cross-path 14 which includes a filter means 17 comprising a delay 16 and a cross-talk cancellation filter 18. The cross-path 14 also includes a gain control unit 20. The output of the summing junction 12 is applied to an input of the first summing junction 10 through a second cross-path 22 which includes a filter means 25 comprising a delay 24 and a cross-talk cancellation filter 26. The cross-path 22 also includes a gain control unit 28. The outputs of the summing junctions provide output signals LEFT-OUT, RIGHT-OUT.

It may be seen from inspection, where k represents the overall transfer function of each cross-path:

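$$\text{LEFT-OUT} = \text{LEFT-IN} + k \cdot \text{RIGHT-OUT} \tag{i}$$

$$\text{RIGHT-OUT} = \text{RIGHT-IN} + k \cdot \text{LEFT-OUT} \tag{ii}$$

Substituting for RIGHT-OUT in (i):

$$\text{LEFT-OUT} = \text{LEFT-IN} + k \cdot \text{RIGHT-IN} + k^{2} \cdot \text{LEFT-OUT}$$

Hence, by rearrangement:

$$\text{LEFT-OUT} = (1 - k^{2})^{-1}\,(\text{LEFT-IN} + k \cdot \text{RIGHT-IN}) \tag{iii}$$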

With a similar equation for RIGHT-OUT. The filter transfer function for each filter 18, 26 is -A/S, S being the same-side transfer function (from a speaker to the nearest ear), and A the alternate-side transfer function. Delays 16, 24 introduce a time delay τ, which is the time delay difference of the two functions A and S. This results in theoretically perfect crosstalk-cancellation for unity gain cross-coupling paths in the circuit of Figure 5. However, the preferred embodiment includes a gain factor x of slightly less than unity introduced by gain control units 20, 28. The reason for this is as follows. In practice S and A are often measured from an artificial head. Very low frequency measurements are very difficult to make due to the difficulty of generating very low frequency acoustic signals. It is therefore common practice to force artificially the A and S functions to the same gain at zero frequency. This makes the A and S functions behave as if the source is distant. The filter -A/S consequently has a gain of (minus) unity at zero frequency. The arrangement of Figure 5 without the gain control units 20, 28 would therefore have positive feedback of unity at low frequencies and would be unstable. Providing the gain control units 20, 28 with a DC gain factor of less than one avoids this problem. A secondary benefit of providing the gain control units 20, 28 is that the amount of crosstalk-cancellation is reduced, and the benefits disclosed in our International Patent Application WO 95/15069 as discussed above are realised. Hence the output signals may be expressed thus:

$$\text{LEFT-OUT} = (1 - x^{2}A^{2}S^{-2})^{-1}\,(\text{LEFT-IN} - xAS^{-1} \cdot \text{RIGHT-IN}) \tag{iv}$$

With a similar equation for RIGHT-OUT.
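The cross-coupled feedback of equations (i) and (ii) can be illustrated with a minimal time-domain sketch. It is not the filter of the embodiment: as a simplifying assumption, each cross-path is modelled as a frequency-independent gain k (standing in for -x.A/S) followed by a pure delay of a whole number of samples (standing in for τ). The function names and the numerical values are hypothetical.

```python
def crosstalk_network(left_in, right_in, k, delay):
    """Sample-by-sample evaluation of equations (i) and (ii).

    Assumption: each cross-path is a pure gain k plus a delay of `delay`
    samples; a real cross-path would contain the -A/S filter as well.
    """
    if delay < 1:
        raise ValueError("the feedback loop needs at least one sample of delay")
    n = len(left_in)
    left_out = [0.0] * n
    right_out = [0.0] * n
    for i in range(n):
        # Delayed outputs of the opposite channel, scaled by the cross-path gain.
        from_right = k * right_out[i - delay] if i >= delay else 0.0
        from_left = k * left_out[i - delay] if i >= delay else 0.0
        left_out[i] = left_in[i] + from_right    # equation (i)
        right_out[i] = right_in[i] + from_left   # equation (ii)
    return left_out, right_out


def impulse_response(k, delay, length):
    """Outputs when a unit impulse is applied to LEFT-IN only."""
    impulse = [1.0] + [0.0] * (length - 1)
    silence = [0.0] * length
    return crosstalk_network(impulse, silence, k, delay)


# Hypothetical values: gain factor x = 0.9, A/S taken as 1, delay of 3 samples.
damped_left, damped_right = impulse_response(-0.9, 3, 12)
# With x = 1 the loop gain is unity and the echoes never decay.
undamped_left, undamped_right = impulse_response(-1.0, 3, 12)
```

Under these assumptions the impulse bounces between the channels as k, k², k³ and so on, which is the series expansion of the factor (1 - k²)⁻¹ in equation (iii). With |k| below one the terms die away; with |k| equal to one they persist indefinitely, which is consistent with the stability argument given for the gain control units 20, 28.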

Referring now to Figure 6, this is a graphical representation in terms of gain versus frequency of the theoretical value 40 of the function -A/S, derived from measurements on an artificial head, and an approximation function 42 provided by a filter in accordance with the invention. The crosstalk-cancellation reduction factor of between 0.5 and 0.95 is not shown on this graph, but is implemented as the GAIN function 20 and 28 in Figure 5. From Figure 6 it will be seen that each filter 18, 26 has a pronounced dip at around 7 kHz and a pronounced peak at around 9 kHz in order to approximate closely to the theoretical function. It will be understood that it is practically not feasible to implement a filter which reproduces each and every detail of a theoretical A/S function, as it would require a great many filter stages and, further, the details of the function would vary depending on the precise measurement conditions.

Figure 6 also shows a plot of the poles and zeroes of the filter whose response is shown as 42. The approximation may be made as accurate as desired, but for the purposes of this example, a filter with 4 poles and 4 zeroes is shown. Figure 7 is a graphical representation in terms of phase versus frequency of the theoretical -A/S function 40 and of the approximation filter 42. The time delay element of A has been omitted for clarity.

One purpose of the invention is to make the implementation of the invention economical on processing power as described earlier. IIR filters are particularly appropriate and one preferred crosstalk filter is shown in Figure 8 for implementing the approximation curve of Figure 6, consisting of two cascaded second-order IIR sections.

The filter 18, 26 requires 8 multipliers. In Figure 8, the two cascaded second order sections 50, 52 have similar configurations, and in each section, an input signal is passed to a summing junction 54 where summing occurs with an output from summing junction 56. The output of junction 54 is applied to a further summing junction 58 and to two one-sample delay units 60, 62. The output of delay unit 60 is scaled in a multiplier 64 by a coefficient B1/B0 and applied to an input of summing junction 58, and is scaled by a coefficient A1 in a multiplier 66 and applied to an input of summing junction 56. The output of delay unit 62 is scaled by coefficient A2 in a multiplier 68 and applied to an input of summing junction 56, and is scaled by coefficient B2/B0 in a multiplier 70 and applied to a summing junction 72. Summing junction 72 also receives the output signal from summing junction 58, and provides an output signal.

It will be appreciated that the transfer function of each filter section can be represented as follows:

$$\frac{\text{output}}{\text{input}} = \left(1 + z^{-1}\frac{B_1}{B_0} + z^{-2}\frac{B_2}{B_0}\right)\left(1 - z^{-1}A_1 - z^{-2}A_2\right)^{-1}$$

i.e. a second order filter with two poles and two zeros, where z is the well known z-transform variable.
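One second-order section of this kind can be sketched as follows. The structure follows the description of Figure 8 (a feedback sum feeding two one-sample delays, with four coefficient multipliers); the function names and the coefficient values are hypothetical and are not taken from the specification, which gives no numerical coefficients.

```python
def second_order_section(samples, b1, b2, a1, a2):
    """One section with H(z) = (1 + b1 z^-1 + b2 z^-2) / (1 - a1 z^-1 - a2 z^-2).

    b1 and b2 stand for B1/B0 and B2/B0; a1 and a2 stand for A1 and A2.
    """
    w1 = 0.0  # contents of the first one-sample delay
    w2 = 0.0  # contents of the second one-sample delay
    output = []
    for x in samples:
        # Feedback side: the input plus the scaled delay outputs.
        w = x + a1 * w1 + a2 * w2
        # Feed-forward side: the same node plus the other two scaled delay outputs.
        y = w + b1 * w1 + b2 * w2
        output.append(y)
        w2, w1 = w1, w  # advance the delay line by one sample
    return output


def cascade(samples, sections):
    """Apply several sections in series; `sections` is a list of (b1, b2, a1, a2)."""
    for b1, b2, a1, a2 in sections:
        samples = second_order_section(samples, b1, b2, a1, a2)
    return samples


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
impulse = [1.0, 0.0, 0.0, 0.0]
single = second_order_section(impulse, 0.5, 0.25, 0.5, -0.25)
double = cascade(impulse, [(0.5, 0.25, 0.5, -0.25), (0.0, 0.0, 0.0, 0.0)])
```

Each section uses four multiplications per sample, so a cascade of two such sections uses eight. An all-zero coefficient set reduces a section to a straight-through connection, which is why `double` matches `single` in this example.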

The curve 40 in Figure 6 is an example of data derived from measurements on an artificial or human head. It contains some unwanted detail, caused by for example spurious resonances, antiresonances and reflections. For example, the sharp peak at around 9 kHz and the sharp dip at around 16 kHz are probably due to such effects. A good approach is therefore to smooth curve 40 before trying to design a filter to fit it.

Figure 9 shows a graph similar to Figure 6, except that the measured -A/S function 76 has been smoothed but still retains the important characteristics of the function. Curve 78 shows the response of an approximation filter which closely follows the desired response, with an error of less than 2 dB in the range 0 to 15 kHz. Figure 10 shows a graph of phase against frequency of the theoretical -A/S function 76 and the approximation filter 78, and is analogous to Figure 7. A secondary benefit of this smoothing process is that a simpler filter 18, 26 can be designed to fit the curve 78, in line with the objectives of the invention. One implementation of this filter is shown in Figure 11, using 5 multipliers, wherein similar parts to those of Figure 8 have the same reference numerals. The second section 80 of the filter 18, 26 is simplified, having a summing junction 82 receiving as one input the output of stage 50. A delay 84 and a multiplier 86 are coupled between the output and a further input of junction 82. The output of summing junction 82 provides the output of the stage.

Even further reduction in the length of the filter 18, 26 can be accomplished if some approximation of the filter 18, 26 is acceptable. Figures 12 and 13 show an example. In Figure 12, the approximation filter function 90 has an error of less than 5 dB over the full frequency range, with positive and negative errors distributed equally. For many applications, this approximation may be satisfactory. One implementation is shown in Figure 14, using only 2 multipliers, wherein similar parts to those of Figure 8 have the same reference numerals. Thus an input is provided to a summing junction 54 whose output is coupled to one input of summing junction 58, which provides an output signal. The output of junction 54 is also coupled to a delay unit 60, which is coupled to an input of junction 54 by a multiplier 66 providing a coefficient A1 and to an input of junction 58 by a multiplier 64 providing a coefficient B1/B0.
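The two-multiplier arrangement described for Figure 14 amounts to a first-order section with one pole and one zero. The sketch below evaluates its frequency response and its sample-by-sample behaviour. The transfer function (1 + b1 z⁻¹)/(1 - a1 z⁻¹) is inferred from the description by analogy with the second-order section; the function names, sample rate and coefficient values are hypothetical.

```python
import cmath
import math


def first_order_response(freq_hz, sample_rate_hz, b1, a1):
    """Complex response of H(z) = (1 + b1 z^-1) / (1 - a1 z^-1) at one frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    return (1 + b1 * z_inv) / (1 - a1 * z_inv)


def first_order_section(samples, b1, a1):
    """Time-domain form: one delay, one feedback and one feed-forward multiplier."""
    w1 = 0.0  # contents of the single one-sample delay
    output = []
    for x in samples:
        w = x + a1 * w1              # feedback sum
        output.append(w + b1 * w1)   # feed-forward sum
        w1 = w
    return output


def gain_db(response):
    """Magnitude of a complex response in decibels."""
    return 20 * math.log10(abs(response))


# Hypothetical coefficients and sample rate, for illustration only.
dc = first_order_response(0.0, 44100.0, 0.2, 0.5)
nyquist = first_order_response(22050.0, 44100.0, 0.2, 0.5)
```

With these example values the gain falls from 1.2/0.5 at zero frequency to 0.8/1.5 at half the sample rate, so the section behaves as a gentle low-pass shelf. Fitting b1 and a1 to a measured A/S curve is a separate design step that this sketch does not attempt.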

A further level of simplification is possible, using only one multiplier, as shown in Figures 15 and 16. Here, the desired approximation function 100 is only followed accurately up to 6 kHz, and thereafter the response cannot be made to follow the desired curve accurately, but the low frequency region is more important than the higher frequency region. One possible implementation is shown in Figure 17, wherein parts similar to those of Figure 11 are denoted by the same reference numerals. Thus a summing junction 82 receives as one input an input signal INPUT. A delay 84 and a multiplier 86 are coupled between the output and a further input of junction 82. The output of summing junction 82 provides an output signal OUTPUT.
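The single-multiplier arrangement described for Figure 17 can be sketched as a one-pole recursion in which the output is delayed by one sample, scaled, and added back to the input. The coefficient name `c`, its sign convention and the example value are assumptions; the specification does not give a numerical coefficient.

```python
def one_multiplier_filter(samples, c):
    """y[n] = x[n] + c * y[n-1]: one summing junction, one delay, one multiplier."""
    previous_output = 0.0  # contents of the one-sample delay
    output = []
    for x in samples:
        y = x + c * previous_output
        output.append(y)
        previous_output = y
    return output


# Hypothetical coefficient; |c| must be below one for the recursion to decay.
decay = one_multiplier_filter([1.0, 0.0, 0.0, 0.0], 0.5)
```

The impulse response is the geometric sequence 1, c, c², and so on, which gives a first-order low-pass characteristic for positive c. That is in keeping with the statement that only the low frequency region is followed accurately.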

All the filters disclosed above will produce crosstalk cancellation without major artefacts and loss of three-dimensional impression, although the more faithfully the filter follows the ideal A/S, the better is the sound impression. In particular, for higher frequencies above 10 kHz, height cues are present and therefore accurate cancellation is desirable at higher frequencies to create a height impression.

The filters disclosed above were tested by a group of listeners, listening to a binaural music track arranged to rotate a sound image "perfectly" around the listener in the horizontal plane, and applying the cross-talk cancellation filters to determine their effect. For the simpler filters, unless the filter characteristics were carefully optimised as shown, undesirable effects might occur: the rearward, directly-behind-the-head positions may fail, with the source reverting to a frontal position; and the image may start to separate, with e.g. vocals, bass, percussion etc. separating spatially.

Claims

1. Apparatus for processing binaural signals including first and second signal paths for receiving respectively left and right binaural input signals, the first signal path including a first combining junction and the second signal path including a second combining junction, the output of the first combining junction being coupled by a first cross-path to an input of the second combining junction, and the output of the second combining junction being coupled by a second cross-path to an input of the first combining junction, wherein each of the first and second cross-paths includes crosstalk filter means having a transfer function A/S, where A and S represent respectively far-ear and near-ear HRTFs, as defined above, and wherein the outputs of the first and second combining junctions represent binaural output signals.

2. Apparatus according to claim 1, wherein each cross-path has a gain control means which provides a frequency independent attenuation factor of between 0.95 and 0.5.

3. Apparatus according to claim 1 or claim 2, wherein each combining junction is a summing junction, and each cross-path provides a signal inversion.

4. Apparatus according to any one of the preceding claims, wherein each crosstalk filter means includes one or more sections at least one of which comprises a second order infinite impulse response (IIR) filter.

5. Apparatus according to claim 4, wherein each crosstalk filter means includes first and second sections connected in cascade, each section comprising a second order IIR filter.

6. Apparatus according to claim 4 or 5, wherein the or each IIR filter includes a first summing junction for receiving an input signal, coupled via first and second delay elements to a second summing junction for providing an output, wherein feedback and feed-forward paths, including coefficient multipliers, are provided between the delay elements and the summing junctions.

7. A method of processing binaural signals including providing left and right binaural input signals to respective first and second signal paths, the signal paths providing as outputs left and right binaural output signals, feeding the left binaural output signal via a first cross-path to the second signal path through a first cross-talk filter means and combining the filtered left binaural output signal with the right binaural input signal, and feeding the right binaural output signal via a second cross-path to the first signal path through a second cross-talk filter means and combining the filtered right binaural output signal with the left binaural input signal, each of the first and second crosstalk filter means having a transfer function A/S, where A and S represent respectively far-ear and near-ear HRTFs, as defined above.

8. Apparatus for processing binaural signals substantially as herein described with reference to and as shown in figures 5 to 17 of the accompanying drawings.

9. A method of processing binaural signals substantially as herein described with reference to figures 5 to 17 of the accompanying drawings.