CA2058984C - Polyphonic coding - Google Patents

Polyphonic coding

Info

Publication number
CA2058984C
Authority
CA
Canada
Prior art keywords
filter
signal
channel
sum
difference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA002058984A
Other languages
French (fr)
Other versions
CA2058984A1 (en)
Inventor
Christopher Ellis Holt
Edward Munday
Barry Michael George Cheetham
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Telecommunications PLC
Original Assignee
British Telecommunications PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Telecommunications PLC

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H20/00Arrangements for broadcast or for distribution combined with broadcast
    • H04H20/86Arrangements characterised by the broadcast information itself
    • H04H20/88Stereophonic broadcast systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques

Abstract

A polyphonic (e.g. stereo) audioconferencing system, in which input left and right channels are time-aligned by variable delay stages (10a, 10b), controlled by a delay calculator (9) (e.g. by deriving the maximum cross-correlation value), and then summed in an adder (2) and subtracted in a subtracter (3) to form sum and difference signals. The sum signal is transmitted in relatively high quality; the difference signal is reconstructed at the decoder by prediction from the sum signal using an adaptive filter (5). The decoder adaptive filter (5) is configured either by received filter coefficients or, using backwards adaptation, from a received residual signal produced by a corresponding adaptive filter (4) in the coder, or both. Preferably, the adaptive filter (4) is a lattice filter, employing a gradient algorithm for coefficient update. The complexity of the adaptive filter (4) is reduced by pre-whitening, in the encoder, both the sum and difference signals using corresponding whitening filters (14a, 14b) derived from the sum channel.

Description

POLYPHONIC CODING

This invention relates to polyphonic coding techniques, particularly, but not exclusively, for coding speech signals.
It is well known that polyphonic, specifically stereophonic, sound is more perceptually appealing than monophonic sound. Where several sound sources, say within a conference room, are to be transmitted to a second room, polyphonic sound allows a spatial reconstruction of the original sound field with an image of each sound source being perceived at an identifiable point corresponding to its position in the original conference room. This can eliminate confusion and misunderstandings during audio-conference discussions since each participant may be identified both by the sound of his voice and by his perceived position within the conference room.
Inevitably, polyphonic transmissions require an increase in transmission capacity as compared with monophonic transmissions. The conventional approach of transmitting two independent channels, thus doubling the required transmission capacity, imposes an unacceptably high cost penalty in many applications and is not possible in some cases because of the need to use existing channels with fixed transmission capacities.
In stereophonic (i.e. two-channel polyphonic) systems, two microphones (hereinafter referred to as left and right microphones), at different positions, are used to pick up sound generated within a room (for example by a person or persons speaking). The signals picked up by the microphones are in general different. Each microphone signal (referred to hereinafter as xL(t) with Laplace transform XL(s) and xR(t) with Laplace transform XR(s) respectively) may be considered to be the superposition of source signals processed by respective acoustic transfer functions. These transfer functions are strongly affected by the distances between the sound sources and each microphone and also by the acoustic properties of the room. Taking the case of a single source, e.g. a single person speaking at some fixed point within the room, the distances between the source and the left and right microphones give rise to different delays, and there will also be different degrees of attenuation.

WO 90/16136 PCT/GB90/00928

In most practical environments such as conference rooms, the signal reaching each microphone may have travelled via many reflected paths (e.g. from walls or ceilings) as well as directly, producing time spreading, frequency dependent colouration due to resonances and antiresonances, and perhaps discrete echoes.
From the foregoing, in theory, the signal from one microphone may be formally related to that from the other by designating an interchannel transfer function H say; i.e. XR(s) = H(s) XL(s), where s is the complex frequency parameter. This statement is based on an assumption of linearity and time-invariance for the effect of room acoustics on a sound signal as it travels from its source to a microphone. However, in the absence of knowledge as to the nature of H, this statement does no more than postulate a correlation between the two signals. Such a postulation seems inherently sensible, however, at least in the special case of a single sound source, and therefore one way of reducing the bit-rate needed to represent stereo signals should be to reduce the redundancy of one relative to the other (to reduce this correlation) prior to transmission and re-introduce it after reception.

In general, H(s) is not unique and can be signal- and time-dependent. However when the source signals are white and uncorrelated, i.e. when their autocorrelation functions are zero except at t=0 and their cross-correlation functions are zero for all t, H(s) will depend on factors not subject to rapid change, such as room acoustics and the positions of the microphones and sound sources, rather than the nature of the source signals which may be rapidly changing.
To realise such a system in physical form, the fundamental problems of causality and stability must be overcome. Consider for a moment a single source signal which is delayed by dL seconds before reaching the left microphone and by dR seconds before reaching the right microphone (although the point to be made has more general implications). If the source is near to, say, the left microphone, then dL will be smaller than dR. The interchannel transfer function H(s) must delay xL(t) by the difference between the two delays, dR - dL, to produce the right channel xR(t). Since dR - dL is positive, H(s) will be causal. If the signal source is now moved closer to the right microphone than to the left, dR - dL becomes negative and H(s) becomes non-causal; in other words, there is no causal relationship between the right channel and the left channel, but rather the reverse, so the right channel can no longer be predicted from the left channel, since a given event occurs first in the right channel. It will therefore be realised that a simple system in which one fixed channel is always transmitted and the other is reconstructed from it is impossible to realise in a direct sense.
According to a first aspect of the invention, there is provided a polyphonic signal coding apparatus comprising:
- means for receiving at least two input channels from different sources;
- means for producing a sum channel representing the sum of such signals, and for producing at least one difference channel representing a difference therebetween;
- means for periodically generating a plurality of parametric coefficients which, if applied to a plural order predictor filter, would enable the prediction of the difference channel from the sum channel thus filtered; and
- means for outputting data representing the said sum channel and data enabling the reconstruction of the said difference channel therefrom.
In a first embodiment, the difference signal reconstruction data are filter coefficients. In a second embodiment, the residual signal representing the difference between the difference signal and the sum signal when thus filtered is formed at the transmitter, and this is transmitted as the difference signal reconstruction data. In this embodiment, the prediction residual signal may be efficiently encoded to allow a backward adaptation technique to be used at the decoder for deriving the prediction filter coefficients. The residual is also used as an error signal which is added to the prediction filter's output at the decoder to correct for inaccuracies in the prediction of the difference channel from the sum channel. This 'residual only' embodiment is also useful where the left channel, say, is predicted from the right channel (without forming sum and difference signals) - provided suitable measures are taken to ensure causality - to give high quality polyphonic reproduction. In a third embodiment, both are transmitted.
Preferably, the means for generating the filter coefficients is an adaptive filter, advantageously a lattice filter. This type of filter also gives advantages in non-sum and difference polyphonic systems.
In preferred embodiments, variable delay means are disposed in at least one of the input signal paths, and controlled to time align the two signals prior to forming the sum and difference signals so that causal prediction filters of reasonable order can be used.
This aspect of the invention has several important advantages:
(i) The 'sum signal' is fully compatible with monophonic encoding and is unaffected by the polyphonic coding except for the introduction of an imperceptible delay. In the event of loss of stereo, monophonic back-up is thus available.
(ii) The sum signal may be transmitted by conventional low bit-rate coding techniques (e.g. LPC) without modification.
(iii) The encoding technique for the difference signals can be varied to suit the application and the available transmission capacity between the above three embodiments. The type of residual signal and prediction coefficients can also be selected in various different ways, while still conforming to the basic encoding principle.
(iv) Overall, the apparatus encodes polyphonic signals with only a modest increase in bit-rate requirement as compared with monophonic transmission.
(v) The encoding is digital and hence the performance of the apparatus will be predictable, not subject to ageing effects or component drift and easily mass-produced.
A method of calculating approximations to H(s) when the source signals are not white (which, of course, includes all speech or music signals) is proposed in a second aspect of the invention, using the idea of a 'prewhitening filter'.
According to a second aspect of the invention, there is provided a polyphonic signal coding apparatus comprising:
- means for receiving at least two input channels;
- means for filtering each input channel in accordance with a filter approximating the spectral inverse of a first of said channels to produce respective filtered channels, the first said filtered channel thereby being substantially spectrally whitened;
- means for receiving said filtered channels and for periodically generating parametric data for each filtered channel (other than said first), which would enable the prediction of each input channel from said first; and
- means for outputting data representing the first channel, and data representing said parametric data.
This aspect of the invention provides, as above, the advantages of a digital system compatible with existing techniques and simplifies the process of modelling (at the encoder) the required interchannel transfer function.
Broadly corresponding decoding apparatus is also provided according to the invention, as are systems including such encoding and decoding apparatus, particularly in an audioconferencing application, but also in a polyphonic recording application. Other aspects of the invention are as claimed and disclosed herein.
The words "prediction" and "predictor" in this specification include not only prediction of future data from past data, but also estimation of present data of a channel from past and present data of another channel.
The invention will now be illustrated, by way of example only, with reference to the accompanying drawings in which:
- Figure 1 illustrates generally an encoder according to a first aspect of the invention;
- Figure 2 illustrates generally a corresponding decoder;
- Figure 3a illustrates an encoder according to a preferred embodiment of the invention;
- Figure 3b illustrates a corresponding decoder;
- Figures 4a and 4b show respectively a corresponding encoder and decoder according to a second aspect of the invention;
- Figures 5a and 5b illustrate an encoder and a decoder according to a second aspect of the invention;
- Figure 6 illustrates part of an encoder according to a yet further embodiment of the invention.
The embodiments illustrated are restricted to 2 channels (stereo) for ease of presentation, but the invention may be generalised to any number of channels.
One possible way of removing the redundancy between two input signals (or predicting one from the other) would be to connect between the two channels an adaptive predictor filter whose slowly changing parameters are calculated by standard techniques (such as, for example, block cross-correlation analysis or sequential lattice adaptation). In an audioconferencing environment, the two signals will originate from sound sources within a room, and the acoustic transfer function between each source and each microphone will be characterised typically by weak poles (from room resonances) and strong zeros (due to absorption and destructive interference). An all-zero filter could therefore produce a reasonable approximation to the acoustic transfer function between a source and a microphone and such a filter could also be used to predict say the left microphone signal xL(t) from xR(t) when the source is close to the right microphone. However, if the source were now moved away from the right microphone and placed close to the left, the nature of the required filter would be effectively inverted even when delays are introduced to guarantee causality. The filter must now model a transfer function with weak zeros and strong poles - a difficult task for an all-zero filter. Other types of filter are not, in general, inherently stable. The net effect of this is to cause unequal degradation in the reconstructed channel when the source shifts from one microphone to the other. This further makes the simplistic prediction of one channel (say, the left) from the other (say, the right) hard to realise.
In a system according to the first aspect of the invention, better results have been obtained by forming a "sum signal" xS(t) = xL(t) + xR(t) and predicting either a difference signal xD(t) = xL(t) - xR(t) or simply xL(t) or xR(t) using an all-zero adaptive digital filter.
In practice, xR(t) and xL(t) (or xS(t) and xD(t)) will be processed in sampled data form as the digital signals xR[n] and xL[n] (or xS[n] and xD[n]) and it will be more convenient to use the z-transform transfer function H(z) rather than H(s).
Referring to Figure 1, in its essential form the invention comprises a pair of inputs 1a, 1b for receiving a pair of speech signals, e.g. from left and right microphones. The signals at the inputs, xR(t) and xL(t), may be in digital form. It may be convenient at this point to pre-process the signals, e.g. by band limiting. Each signal is then supplied to an adder 2 and a subtracter 3, the output of the adder being the sum signal xS(t) = xR(t) + xL(t), and the output of the subtracter 3 being the difference signal xD(t) = xL(t) - xR(t), i.e. XD(s) = H(s) XS(s). The sum and difference signals are then supplied to filter derivation stage 4, which derives the coefficients of a multi-stage prediction filter which, when driven with the sum signal, will approximate the difference signal. The difference between the approximated difference signal and the actual difference signal, the prediction residual signal, will usually also be produced (although this is not invariably necessary). The sum signal is then encoded (preferably using LPC or sub-band coding), for transmission or storage, along with further data enabling reconstruction of the difference signal. The filter coefficients may be sent, or alternatively (as discussed further below), the residual signal may be transmitted, the difference channel being reconstituted by deriving the filter parameters at the receiver using a backwards adaptive process known in the art; or both may be transmitted.
Although it would be possible to calculate filter parameters directly (using LPC analysis techniques), one simple and effective way of providing the derivation stage 4 is to use an adaptive filter (for example, an adaptive transversal filter) receiving as input the sum channel and modelling the difference channel so as to reduce the prediction residual. Such general techniques of filter adaptation are well-known in the art.
Our initial experiments with this structure have used a transversal FIR filter with coefficient update by an algorithm for minimising the mean square value of the residual, which is simple to implement. The filter coefficients change only slowly because the room acoustic (and hence the interchannel transfer function) is relatively stable.
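The transversal arrangement described above can be sketched as follows. This is a minimal illustration, not the experimental coder itself: it assumes the least-mean-square (LMS) update as the mean-square minimising algorithm, and the function name lms_predict, the filter order and the step size mu are hypothetical choices that the text does not specify.

```python
def lms_predict(sum_ch, diff_ch, order=8, mu=0.01):
    """Adapt an FIR filter so that the filtered sum channel tracks the
    difference channel. Returns (final coefficients, residual signal)."""
    w = [0.0] * order          # filter coefficients (assumed zero start)
    hist = [0.0] * order       # most recent sum samples, newest first
    residual = []
    for s, d in zip(sum_ch, diff_ch):
        hist = [s] + hist[:-1]
        predicted = sum(wi * xi for wi, xi in zip(w, hist))
        e = d - predicted      # prediction residual
        # LMS step: move each coefficient along the negative error gradient
        w = [wi + mu * e * xi for wi, xi in zip(w, hist)]
        residual.append(e)
    return w, residual
```

Driving the function with a white sum signal and a difference signal that is a short FIR-filtered copy of it should make the coefficients settle on that FIR response while the residual decays towards zero.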
Referring to Figure 2, in a corresponding receiver, the sum signal xS(t) is received together with either the filter parameters or the residual signal, or both, for the difference channel, and an adaptive filter 5 corresponding to that for which the parameters were derived at the coder receives as input the sum signal and produces as output the reconstructed difference signal when configured either with the received parameters or with parameters derived by backwards adaptation from the received residual signal. Sum and difference signals are then both fed to an adder 6 and a subtracter 7, which produce as outputs respectively the reconstructed left and right channels at output nodes 8a and 8b.
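The sum and difference matrixing at the coder and its inverse at the decoder can be sketched as below, using the convention xS = xL + xR and xD = xL - xR stated earlier. The function names are hypothetical, and the factor of one half applied at the decoder is an assumption: the text does not say where the scaling is placed.

```python
def encode_sum_diff(left, right):
    """Adder 2 and subtracter 3: form the sum and difference channels."""
    sum_ch = [l + r for l, r in zip(left, right)]
    diff_ch = [l - r for l, r in zip(left, right)]
    return sum_ch, diff_ch


def decode_sum_diff(sum_ch, diff_ch):
    """Adder 6 and subtracter 7: recover left and right.
    The 0.5 scaling is an assumed placement, not taken from the text."""
    left = [0.5 * (s + d) for s, d in zip(sum_ch, diff_ch)]
    right = [0.5 * (s - d) for s, d in zip(sum_ch, diff_ch)]
    return left, right
```

With an exact difference channel the round trip is lossless; in the coder described here the difference channel fed to the decoder stage is the predicted one, so the quality of the recovered channels follows the quality of the prediction.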
Since a high-quality sum signal is sent, the encoder is fully mono-compatible. In the event of loss of stereo information, monophonic back-up is thus available.
As discussed above, one component of the transfer functions HL and HR is a delay component relating to the direct distance between the signal source and each of the microphones, and there is a corresponding delay difference d. There is thus a strong cross-correlation between one channel and the other when delayed by d. The delay d may therefore be estimated by locating the peak of the cross-correlation between the two channels. This method, however, requires considerable processing power.
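A brute-force search for the cross-correlation peak might look like the sketch below. The function name estimate_delay, the block-based formulation and the max_lag search range are assumptions made for illustration; the nested loops also show why the cost grows with both block length and search range.

```python
def estimate_delay(left, right, max_lag):
    """Return the lag d (in samples) that maximises
    sum over n of left[n] * right[n + d].
    A positive result means the right channel lags the left."""
    n = len(left)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:          # ignore samples outside the block
                acc += left[i] * right[j]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag
```

Each call performs roughly (2 * max_lag + 1) * N multiply-accumulates for a block of N samples.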
An alternative method of delay estimation found in papers on sonar research is to use an adaptive filter. The left channel input is delayed by half the filter length and the coefficients are updated using the LMS algorithm to minimise the mean-square error of the output. The transversal filter coefficients will, in theory, become the required cross-correlation coefficients. This may seem like unnecessary repetition of filter coefficient derivation were it not for the property of this delay estimator that the maximum value of the cross-correlation coefficient (at the position of the maximum filter coefficient) is obtained some time before the filter has converged. This method may be improved further because spatial information is also available from the relative amplitudes of the input channels; this could be used to apply a weighting function to the filter coefficients to speed convergence.
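The adaptive-filter delay estimator can be sketched as follows. It is an illustration under stated assumptions: the right channel is taken as the filter input and the delayed left channel as the target, and the function name lms_delay_estimate, the filter length and the step size mu are hypothetical. The amplitude-based weighting mentioned above is not included.

```python
def lms_delay_estimate(left, right, length=32, mu=0.005):
    """Estimate the interchannel delay from the position of the largest
    LMS coefficient. A positive result means right lags left."""
    half = length // 2
    w = [0.0] * length
    hist = [0.0] * length                      # right samples, newest first
    for n in range(len(left)):
        hist = [right[n]] + hist[:-1]
        # Target: left channel delayed by half the filter length, so that
        # both positive and negative delays map to a causal coefficient.
        desired = left[n - half] if n >= half else 0.0
        y = sum(wi * xi for wi, xi in zip(w, hist))
        e = desired - y
        w = [wi + mu * e * xi for wi, xi in zip(w, hist)]
    peak = max(range(length), key=lambda k: abs(w[k]))
    return half - peak
```

Because only the position of the largest coefficient is needed, the loop could in principle be stopped well before the coefficients have settled, which is the property the text points out.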
Referring to Figure 3a, in a preferred embodiment of the invention, the complexity and length of the filter to be calculated is therefore reduced by calculating the required value of d in a delay calculator stage 9 (preferably employing one of the above methods), and then bringing the channels into time alignment by delaying one or other by d using, for example, a pair of variable delays 10a, 10b (although one fixed and one variable delay could be used) controlled by the delay calculator 9. With the major part of the speech information in the channels time aligned, the sum and difference signals are then formed.
Referring to Figure 3b, the delay length d is preferably transmitted to the decoder, so that after reconstructing the difference channel and subsequently the left and right channels, corresponding variable length delay stages 11a, 11b in one or other of the channels can restore the interchannel delay.
In the illustrated structure, the "sum" signal is thus no longer quite the true sum of xL(t) + xR(t); because of the delay d it is xL(t) + xR(t-d). It may therefore be preferred to locate the delays 10a, 10b (and, possibly, the delay calculator) downstream of the adder and subtracter 2 and 3; this gives, for practical purposes, the same benefits of reducing the necessary filter length.
In practice, the delay is generally imperceptible; typically, up to 1.6 ms. Alternatively, a fixed delay, sufficiently long to guarantee causality, may be used, thus removing the need to encode the delay parameter.
In the first embodiment of the invention, as stated above, only the filter parameters are transmitted as difference signal data. With 16 bits per coefficient, this meant that a transmission capacity of 5120 bits/sec is needed for the difference channel (plus 8 bits for the delay parameter). This is well within the capacity of a standard 64 kbit/sec transmission system which allocates 48 kbits/sec to the sum channel (efficiently transmitted by an existing monophonic encoding technique) and offers 16 kbits/sec for other "overhead" data. This mode of the embodiment gives a good signal to noise ratio and the stereo image is present, although it is highly dependent on the accuracy of the algorithm used to adapt the predictive filter. Inaccuracies tend to cause the stereo image to wander during the course of a conference, particularly when the conversation is passed from one speaking person to another at some distance from the first.
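The bit budget quoted above can be checked with a few lines of arithmetic. The rates and the 16 bits per coefficient come from the text; the filter order passed to updates_per_second is purely illustrative, since the text does not state how the 5120 bits/sec divide between filter order and update rate.

```python
BITS_PER_COEFFICIENT = 16      # from the text
DIFFERENCE_RATE_BPS = 5120     # from the text
SUM_RATE_BPS = 48_000          # from the text
OVERHEAD_RATE_BPS = 16_000     # from the text
LINK_RATE_BPS = SUM_RATE_BPS + OVERHEAD_RATE_BPS   # 64 kbit/s in total

# Number of coefficients that can be sent each second
coefficients_per_second = DIFFERENCE_RATE_BPS // BITS_PER_COEFFICIENT

# Overhead capacity left after the coefficients are sent
headroom_bps = OVERHEAD_RATE_BPS - DIFFERENCE_RATE_BPS


def updates_per_second(filter_order):
    """Coefficient-set refresh rate for an assumed (hypothetical) order."""
    return coefficients_per_second / filter_order
```

The difference channel therefore carries 320 coefficients per second; an order-10 filter, for example, could be refreshed 32 times a second under this budget, leaving 10880 bits/sec of the overhead allocation unused.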
Referring to Figure 4a, in a second embodiment of the invention, only the residual signal is transmitted as difference signal data. The sum signal is encoded (12a) using, for example, sub-band coding. It is also locally decoded (13a) to provide a signal equivalent to that at the decoder, for input to adaptive filter 4. The residual difference channel is also encoded (possibly including bandlimiting) by residual coder 12b, and a corresponding local decoder 13b provides the signal minimised to adapt filter 4. The advantage this creates is that inaccuracies in generating the parameters cause an increase in the dynamic range of the residual channel and a corresponding decrease in SNR, but with no loss in stereo image.
Referring to Figure 4b, at the decoder, the analysis filter parameters are recovered from the transmitted residual by using a backwards-adapting replica filter 5 of the adaptive filter 4 at the coder. Decoders 13c, 13d are identical to local decoders 13a, 13b and so the filter 5 receives the same inputs, and thus produces the same parameters, as that of encoder filter 4.
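The backwards-adaptation principle can be illustrated with the sketch below: encoder and decoder run the same update on the same inputs, so the decoder arrives at the same coefficients without receiving them. The LMS update, the function names and the parameters order and mu are assumptions, and the sum and residual coders (12a, 12b) and their decoders are left out, so both ends see unquantised signals.

```python
def _predict(w, hist):
    return sum(wi * xi for wi, xi in zip(w, hist))


def _adapt(w, hist, e, mu):
    return [wi + mu * e * xi for wi, xi in zip(w, hist)]


def encode_residual(sum_ch, diff_ch, order=4, mu=0.05):
    """Encoder filter 4: output the residual to be transmitted."""
    w, hist, residual = [0.0] * order, [0.0] * order, []
    for s, d in zip(sum_ch, diff_ch):
        hist = [s] + hist[:-1]
        e = d - _predict(w, hist)
        residual.append(e)
        w = _adapt(w, hist, e, mu)      # adapts from the residual only
    return residual, w


def decode_residual(sum_ch, residual, order=4, mu=0.05):
    """Decoder filter 5: rebuild the difference channel from sum + residual."""
    w, hist, diff_hat = [0.0] * order, [0.0] * order, []
    for s, e in zip(sum_ch, residual):
        hist = [s] + hist[:-1]
        diff_hat.append(_predict(w, hist) + e)   # prediction plus correction
        w = _adapt(w, hist, e, mu)      # identical update to the encoder
    return diff_hat, w
```

In the coder described in the text, the same agreement is preserved in the presence of quantisation because the encoder adapts from locally decoded signals rather than from the originals.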

In a further embodiment (not shown), both filter parameters and residual signal are transmitted as side-information, overcoming many of the problems with the residual only embodiment because the important stereo information in the first 2 kHz is preserved intact and the relative amplitude information at higher frequencies is largely retained by the filter parameters.
Both the above residual-only and hybrid (i.e. residual plus parameters) embodiments are preferably employed, as described, to predict the difference channel from the sum channel. However, it is found that the same advantages of retaining the stereo image (albeit with a decrease in SNR) are found when the input channels are left and right, rather than sum and difference, provided the problem of causality is overcome in some manner (e.g. by inserting a relatively long fixed delay in one or other path). The scope of the invention therefore encompasses this also.
The parameter-only embodiment described above preferably uses a single adaptive filter 4 to remove redundancy between the sum and difference channels. An effect discovered during testing was a curious 'whispering' effect if the coefficients were not sent at a certain rate, which was far above what should have been necessary to describe changes in the acoustic environment. This was because the adaptive filter, in addition to modelling the room acoustic transfer function, was also trying to perform an LPC analysis of the speech.
This is solved in the second aspect of the invention by whitening the spectra of the input signals to the adaptive filter as shown in Figure 5, so as to reduce the rapidly-changing speech component leaving principally the room acoustic component.
In the second aspect of the invention, the adaptive filter 4 which models the acoustic transfer functions may be the same as before (for example, a lattice filter of order 10). The sum channel is passed through a whitening filter 14a (which may be lattice or a simple transversal structure).
The master whitening filter 14a receives the sum channel and adapts to derive an approximate spectral inverse filter to the sum signal (or, at least, the speech components thereof) by minimising its own output. The output of the filter 14a is therefore substantially white. The parameters derived by the master filter 14a are supplied to the slave whitening filter 14b, which is connected to receive and filter the difference signal.
The output of the slave whitening filter 14b is therefore the difference signal filtered by the inverse of the sum signal, which substantially removes common signal components, reducing the correlation between the two and leaving the output of 14b as consisting primarily of the acoustic response of the room. It thus reduces the dynamic range of the residual considerably.
The effect is to whiten the sum channel and to partially whiten the difference channel without affecting the spectral differences between them as a result of room acoustics, so that the derived coefficients of adaptive filter 4 are model parameters of the room acoustics.
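The master and slave arrangement can be sketched as a pair of prediction-error filters sharing one coefficient set. This is an assumption-laden illustration: a simple transversal structure with an LMS update is used for the master, and the function name whiten_master_slave, the order and the step size mu are hypothetical.

```python
def whiten_master_slave(sum_ch, diff_ch, order=2, mu=0.002):
    """Master filter (14a) adapts to whiten the sum channel; the slave
    (14b) applies the same coefficients to the difference channel.
    Returns (whitened sum, filtered difference)."""
    a = [0.0] * order           # shared predictor coefficients
    s_hist = [0.0] * order      # past sum samples, newest first
    d_hist = [0.0] * order      # past difference samples, newest first
    white_sum, filt_diff = [], []
    for s, d in zip(sum_ch, diff_ch):
        # Prediction-error outputs: input minus prediction from the past
        e_s = s - sum(ak * x for ak, x in zip(a, s_hist))
        e_d = d - sum(ak * x for ak, x in zip(a, d_hist))
        white_sum.append(e_s)
        filt_diff.append(e_d)
        # Only the master adapts, by minimising its own output power
        a = [ak + mu * e_s * x for ak, x in zip(a, s_hist)]
        s_hist = [s] + s_hist[:-1]
        d_hist = [d] + d_hist[:-1]
    return white_sum, filt_diff
```

The two outputs would then take the place of the raw sum and difference channels at the input of adaptive filter 4.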
In one embodiment, the coefficients only are transmitted and the decoder is simply that of Figure 2 (needing no further filters). In this embodiment, of course, residual encoder 12b and decoder 13b are omitted.
An adaptive filter will generally not be long enough to filter out long-term information, such as pitch information in speech, so the sum channel will not be completely "white". However, if a long-term predictor (known in LPC coding) is additionally employed in filters 14a and 14b, then filter 4 could, in principle, be connected to filter the difference channel alone, and thus to model the inverse of the room acoustic.
Since this second aspect of the invention reduces the dynamic range of the residual, it is particularly advantageous to employ this whitening scheme with the residual-only transmission described above. In this case, prior to backwards adaptation at the decoder, it is necessary to filter the residual using the inverse of the whitening filter, or to filter the sum channel using the whitening filter. Either filter can be derived from the sum channel information which is transmitted.
Referring to Figure 5b, in residual-only transmission, an adaptive whitening filter 24a (identical to 14a at the encoder) receives the (decoded) sum channel and adapts to whiten its output. A slave filter 24b (identical to 14b at the encoder) receives the coefficients of 24a. Using the whitened sum channel as its input, and adapting from the (decoded) residual by backwards adaptation, adaptive filter 5 regenerates a filtered signal which is added to the (decoded) residual and the sum is filtered by slave filter 24b to yield the difference channel. The sum and difference channels are then processed (6, 7 not shown) to yield the original left and right channels.
In a further embodiment (not shown), both residual and coefficients are transmitted.
Although this pre-whitening aspect of the invention has been described in relation to the preferre~ em~odiment of the invention using sum and difference channels, it i5 also applicable where the two ch~nn~l~ are ~left' and ' right' ch~n~ls.
For a typical audioconferencing applicationl the residual will have a bandwidth of 8 kHz and must be quantised and transmitted using spare channel capacity of abou~ 16 kbit/s. The whitened residual will be, in ; W O 90/1~136 ~ 3 ~ ~ ~CT/CB90/00928 principle, small in mean square value, but will not be optimally whitened since the copy pre-whitening filker 14b through which the residual passes has coefficients derived to whiten the sum channel and not necessarily the difference channel. Typically, the d~namic range of the ~iltered signal is reduced by 12dB over the unfiltered difference channel. One approach to this residual quantisation problem is to reduce the bandwidth of the residual signal. This allows downsampling to a lower rate, with a consequential increase in bits per sample.
It is well known that most of the spatial information in a stereo signal is contained within the 0-2 kHz band, and therefore reducing the residual bandwidth from 8 kHz to a value in excess of 2 kHz does not affect the perceived stereo image appreciably. Results have shown that reducing the residual bandwidth to 4 kHz (and taking the upper 4 kHz band to be identical to that of the sum channel) produces good quality stereophonic speech when the reduced bandwidth residual is sub-band coded using a standard technique.
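The trade-off between residual bandwidth and bits per sample can be made concrete with a small calculation. The sketch below assumes critical (Nyquist-rate) sampling, i.e. a sample rate of twice the bandwidth, which is an assumption made for the example rather than a figure stated in the text; the helper name `bits_per_sample` is likewise hypothetical.

```python
def bits_per_sample(bit_rate_bps, bandwidth_hz):
    """Average bits available per residual sample at a given bandwidth."""
    # Assumption: the residual is sampled at exactly twice its bandwidth.
    sample_rate_hz = 2 * bandwidth_hz
    return bit_rate_bps / sample_rate_hz


SPARE_CAPACITY_BPS = 16000  # about 16 kbit/s of spare channel capacity

for bandwidth_hz in (8000, 4000, 2000):
    print(bandwidth_hz, bits_per_sample(SPARE_CAPACITY_BPS, bandwidth_hz))
```

Under this assumption the full 8 kHz residual leaves about 1 bit per sample, a 4 kHz residual about 2 bits per sample, and a 2 kHz residual about 4 bits per sample.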
Experiments with various adaptive filters for the filter 4 (and, where applicable, 12) showed that a standard transversal FIR filter was slow to converge.
A faster performance can be obtained by using a lattice structure, with coefficient update using a gradient algorithm based on Burg's method, as shown in Figure 7.
The structure uses a lattice filter 14a to pre-whiten the spectrum of the primary input. The decorrelated backwards residual outputs are then used as inputs to a simple linear combiner which attempts to model the input spectrum of the secondary input. Although the modelling process is the same as with the simple transversal FIR filter, the effect of the lattice filter is to point the error vector in the direction of the optimum LMS residual solution. This speeds convergence considerably.
A lattice filter of order 20 is found effective in practice.
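As an illustration only, the lattice arrangement described above can be sketched in Python as a gradient adaptive lattice followed by a linear combiner. The function names, the step sizes `mu_k` and `mu_w`, the smoothing factor `lam`, the order of 4 and the synthetic test signals are all assumptions chosen for the sketch and are not taken from Figure 7. The reflection-coefficient update is a normalised gradient step towards the Burg estimate, and the combiner uses a normalised LMS update.

```python
import random


def gal_joint_process(primary, secondary, order=4,
                      mu_k=0.01, mu_w=0.05, lam=0.99):
    """Predict `secondary` from `primary` with a lattice plus linear combiner."""
    k = [0.0] * order              # reflection coefficients, one per stage
    w = [0.0] * (order + 1)        # linear-combiner weights
    b_prev = [0.0] * (order + 1)   # backward errors of the previous sample
    e_stage = [1.0] * order        # running input energy of each stage
    p_b = [1.0] * (order + 1)      # running power of each backward error
    residual = []
    for x, d in zip(primary, secondary):
        f = x
        b = [x] + [0.0] * order
        for m in range(order):
            f_in, b_in = f, b_prev[m]
            f = f_in - k[m] * b_in              # forward prediction error
            b[m + 1] = b_in - k[m] * f_in       # backward prediction error
            e_stage[m] = (lam * e_stage[m]
                          + (1 - lam) * (f_in * f_in + b_in * b_in))
            # Normalised gradient step on f^2 + b^2 (Burg-type criterion).
            k[m] += mu_k * (f * b_in + b[m + 1] * f_in) / e_stage[m]
            k[m] = max(-0.999, min(0.999, k[m]))
        # The decorrelated backward errors drive the linear combiner.
        y = sum(wm * bm for wm, bm in zip(w, b))
        e = d - y
        for m in range(order + 1):
            p_b[m] = lam * p_b[m] + (1 - lam) * b[m] * b[m]
            w[m] += mu_w * e * b[m] / p_b[m]
        residual.append(e)
        b_prev = b
    return residual, k, w


def make_test_signals(n=4000, seed=0):
    """Synthetic data: coloured 'sum' signal and an FIR-related 'difference'."""
    rng = random.Random(seed)
    hist = [0.0, 0.0, 0.0]
    xs, ds = [], []
    for _ in range(n):
        x = 0.9 * hist[0] + rng.gauss(0.0, 1.0)
        hist = [x] + hist[:2]
        xs.append(x)
        ds.append(0.5 * hist[0] - 0.3 * hist[1] + 0.2 * hist[2])
    return xs, ds


xs, ds = make_test_signals()
residual, k, w = gal_joint_process(xs, ds)
```

In this sketch the lattice stages adapt only to the primary input, so the combiner sees inputs that are approximately uncorrelated with one another, which is the property credited above with speeding convergence.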
The lattice filter structure is particularly useful as described above, but could also be used in a system in which, instead of forming sum and difference signals, a (suitably delayed) left channel is predicted from the right channel.
Although the embodiments described show a stereophonic system, it will be appreciated that with, for example, quadrophonic systems, the invention is implemented by forming a sum signal and 3 difference signals, and predicting each from the sum signal as above.
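A minimal sketch of one way to form a sum signal and three difference signals from four channels is given below. The text does not specify which differences are taken; the choice here (each difference is the first channel minus one of the others) and the function names are assumptions made for illustration. The prediction of each difference from the sum is not shown.

```python
def encode_quad(channels):
    """Return (sum_signal, differences) for a list of equal-length channels."""
    ref = channels[0]
    total = [sum(samples) for samples in zip(*channels)]
    # Assumed convention: difference i is channel 0 minus channel i.
    diffs = [[r - c for r, c in zip(ref, ch)] for ch in channels[1:]]
    return total, diffs


def decode_quad(total, diffs):
    """Invert encode_quad, recovering every channel."""
    n = len(diffs) + 1
    # sum + (all differences) equals n times channel 0.
    ref = [(t + sum(d)) / n for t, *d in zip(total, *diffs)]
    others = [[r - x for r, x in zip(ref, diff)] for diff in diffs]
    return [ref] + others


quad = [[1.0, 2.0], [0.5, -1.0], [0.25, 4.0], [2.0, 0.0]]
total, diffs = encode_quad(quad)
restored = decode_quad(total, diffs)
```

Any invertible combination of one sum and three differences would serve equally well for the purpose of this example.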
Whilst the invention has been described as applied to a low bit-rate transmission system, e.g. for teleconferencing, it is also useful, for example, for digital storage of music on well known digital record carriers such as Compact Discs, by providing a formatting means for arranging the data in a format suitable for such record carriers.
Conveniently, much or all of the signal processing involved is realised in a single suitably programmed digital signal processing (DSP) chip package; two channel packages are also commercially available. Software to implement adaptive filters, LPC analysis and cross-correlations is well known.


Claims (12)

1. Polyphonic signal coding apparatus comprising:
- means for receiving a first and at least one second channel;
- means for filtering the first and second channel in accordance with a filter approximating the spectral inverse of the first channel to produce respective filtered channels, the first said filtered channel thereby being substantially spectrally whitened;
- means, connected to receive the filtered channels, for periodically generating reconstruction data enabling the formation, from the first channel, of an estimate of the second channel, the generating means being operable to generate a plurality of filter coefficients which, if applied to a plural order predictor filter, would enable the prediction of the second channel from the first channel thus filtered;
- means for outputting data representing the said first channel and the reconstruction data.
2. Apparatus according to claim 1, wherein said filtering means comprises an adaptive, master, filter arranged to filter the first channel so as to produce a whitened output, and a slave filter arranged to filter said second channel, the slave filter being configured so as to have an equivalent response to the adaptive filter of the filtering means.
3. Apparatus according to claim 1 or 2, in which the generating means includes an adaptive filter connected to receive the first filtered channel and produce a predicted second channel therefrom; and means for producing a residual signal representing the difference between the said predicted second channel and the actual second filtered channel, and in which the said reconstruction data comprises data representing the said residual signal.
4. Apparatus according to claim 3 in which the adaptive filter is controlled only by the said residual signal and the said reconstruction data consists of the said residual signal.
5. Apparatus according to claim 1, 2, or 3, wherein the reconstruction data comprises the said filter coefficients.
6. Apparatus according to any one of claims 1 to 5, further comprising:
- input means for receiving input signals; and
- means for producing the said channels therefrom, the first channel being a sum channel representing the sum of such input signals and the second or further channels representing the differences therebetween.
7. Apparatus according to any one of claims 1 to 6, including variable delay means for delaying at least one of the channels, and means for controlling the differential delay applied to the channels so as to increase the correlation upstream of the generating means, the output means being arranged to output also data representing the said differential delay.
8. Apparatus according to claim 6, in which the input means includes variable delay means for delaying at least one of the input signals, and means for controlling the differential delay applied to the signals so as to increase the correlation upstream of the generating means, the output means being arranged to output also data representing the said differential delay.
9. Polyphonic signal decoding apparatus comprising:
- means for receiving data representing a sum signal, and signal reconstruction data; and
- means operable in response to the reconstruction data to modify the sum signal so as to produce at least two output signals, the modifying means comprising:
- a configurable plural order predictor filter for receiving said signal reconstruction data and modifying its coefficients in accordance therewith, the filter being connected to receive the said sum signal and reconstruct therefrom a difference signal;
- an adaptive, master, filter arranged to filter the sum signal in accordance with approximately the spectral inverse of the sum signal so as to produce a whitened output, and a slave filter arranged to filter said difference signal, the slave filter being configured so as to have an equivalent response to the adaptive master filter; and
- means for adding the filtered difference signal to the filtered sum signal, and for subtracting the reconstructed difference signal from the sum signal, so as to produce at least two output signals.
10. Apparatus as claimed in claim 9, in which the difference signal reconstruction data comprises residual signal data and the apparatus includes means to add the residual signal data to the output of the filter to form the reconstructed difference signal.
11. Apparatus as claimed in claim 10, in which the predictor filter is connected to receive the residual signal data and to modify its coefficients in accordance therewith.
12. A method of coding polyphonic input signals comprising:
- producing therefrom a sum signal representing the sum of such signals; and reconstruction data to enable the formation, from the sum signal, of a further one of the input signals;
- producing from the input signals at least one difference signal representing a difference therebetween;
- analysing said sum and difference signals and generating therefrom a plurality of coefficients which, if applied to a multi-stage predictor filter, would enable the prediction of the difference signal from the sum signal thus filtered;
- the coded output comprising the said sum signal and data enabling the reconstruction of the said difference signal therefrom;
characterised by, before said analysis, filtering the sum signal and difference signal in accordance with a filter approximating the spectral inverse of the sum signal, the sum signal thereby being substantially spectrally whitened.
CA002058984A 1989-06-15 1990-06-15 Polyphonic coding Expired - Lifetime CA2058984C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB8913758.2 1989-06-15
GB898913758A GB8913758D0 (en) 1989-06-15 1989-06-15 Polyphonic coding

Publications (2)

Publication Number Publication Date
CA2058984A1 CA2058984A1 (en) 1990-12-16
CA2058984C true CA2058984C (en) 1998-12-01

Family

ID=10658483

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002058984A Expired - Lifetime CA2058984C (en) 1989-06-15 1990-06-15 Polyphonic coding

Country Status (13)

Country Link
EP (1) EP0478615B2 (en)
JP (1) JP2703405B2 (en)
AT (1) ATE121900T1 (en)
AU (1) AU640667B2 (en)
CA (1) CA2058984C (en)
DE (1) DE69018989T3 (en)
DK (1) DK0478615T3 (en)
ES (1) ES2071823T3 (en)
FI (1) FI915873A0 (en)
GB (1) GB8913758D0 (en)
HK (1) HK137196A (en)
NO (1) NO180030C (en)
WO (1) WO1990016136A1 (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1992012607A1 (en) * 1991-01-08 1992-07-23 Dolby Laboratories Licensing Corporation Encoder/decoder for multidimensional sound fields
US5274740A (en) * 1991-01-08 1993-12-28 Dolby Laboratories Licensing Corporation Decoder for variable number of channel presentation of multidimensional sound fields
DE4136825C1 (en) * 1991-11-08 1993-03-18 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung Ev, 8000 Muenchen, De
US5278909A (en) * 1992-06-08 1994-01-11 International Business Machines Corporation System and method for stereo digital audio compression with co-channel steering
EP0608937B1 (en) * 1993-01-27 2000-04-12 Koninklijke Philips Electronics N.V. Audio signal processing arrangement for deriving a centre channel signal and also an audio visual reproduction system comprising such a processing arrangement
DE4320990B4 (en) * 1993-06-05 2004-04-29 Robert Bosch Gmbh Redundancy reduction procedure
US5488665A (en) * 1993-11-23 1996-01-30 At&T Corp. Multi-channel perceptual audio compression system with encoding mode switching among matrixed channels
GB2303516A (en) * 1995-07-20 1997-02-19 Plessey Telecomm Teleconferencing
DE19526366A1 (en) 1995-07-20 1997-01-23 Bosch Gmbh Robert Redundancy reduction method for coding multichannel signals and device for decoding redundancy-reduced multichannel signals
US6016473A (en) * 1998-04-07 2000-01-18 Dolby; Ray M. Low bit-rate spatial coding method and system
DE19829284C2 (en) * 1998-05-15 2000-03-16 Fraunhofer Ges Forschung Method and apparatus for processing a temporal stereo signal and method and apparatus for decoding an audio bit stream encoded using prediction over frequency
SE519552C2 (en) * 1998-09-30 2003-03-11 Ericsson Telefon Ab L M Multichannel signal coding and decoding
SE519976C2 (en) * 2000-09-15 2003-05-06 Ericsson Telefon Ab L M Coding and decoding of signals from multiple channels
SE519981C2 (en) * 2000-09-15 2003-05-06 Ericsson Telefon Ab L M Coding and decoding of signals from multiple channels
SE519985C2 (en) * 2000-09-15 2003-05-06 Ericsson Telefon Ab L M Coding and decoding of signals from multiple channels
FR2821475B1 (en) * 2001-02-23 2003-05-09 France Telecom METHOD AND DEVICE FOR SPECTRALLY RECONSTRUCTING MULTI-CHANNEL SIGNALS, ESPECIALLY STEREOPHONIC SIGNALS
BRPI0517949B1 (en) * 2004-11-04 2019-09-03 Koninklijke Philips Nv conversion device for converting a dominant signal, method of converting a dominant signal, and computer readable non-transient means
JP5285626B2 (en) * 2007-03-01 2013-09-11 ジェリー・マハバブ Speech spatialization and environmental simulation
JPWO2009122757A1 (en) * 2008-04-04 2011-07-28 パナソニック株式会社 Stereo signal conversion apparatus, stereo signal inverse conversion apparatus, and methods thereof
RU2497204C2 (en) 2008-05-23 2013-10-27 Конинклейке Филипс Электроникс Н.В. Parametric stereophonic upmix apparatus, parametric stereophonic decoder, parametric stereophonic downmix apparatus, parametric stereophonic encoder
TR201901336T4 (en) 2010-04-09 2019-02-21 Dolby Int Ab Mdct-based complex predictive stereo coding.
EP2375409A1 (en) * 2010-04-09 2011-10-12 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoder, audio decoder and related methods for processing multi-channel audio signals using complex prediction
ES2953084T3 (en) 2010-04-13 2023-11-08 Fraunhofer Ges Forschung Audio decoder to process stereo audio using a variable prediction direction
UA107771C2 (en) 2011-09-29 2015-02-10 Dolby Int Ab Prediction-based fm stereo radio noise reduction
US9380387B2 (en) 2014-08-01 2016-06-28 Klipsch Group, Inc. Phase independent surround speaker

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU546069B2 (en) * 1981-09-07 1985-08-15 Kahn, Leonard Richard Am stereo distortion correction
JPH0761043B2 (en) * 1986-04-10 1995-06-28 株式会社東芝 Stereo audio transmission storage method
GB8628046D0 (en) * 1986-11-24 1986-12-31 British Telecomm Transmission system

Also Published As

Publication number Publication date
GB8913758D0 (en) 1989-08-02
DE69018989T2 (en) 1995-09-07
CA2058984A1 (en) 1990-12-16
ES2071823T3 (en) 1995-07-01
EP0478615B1 (en) 1995-04-26
NO914947D0 (en) 1991-12-13
FI915873A0 (en) 1991-12-13
WO1990016136A1 (en) 1990-12-27
HK137196A (en) 1996-08-02
AU5837990A (en) 1991-01-08
AU640667B2 (en) 1993-09-02
ATE121900T1 (en) 1995-05-15
DE69018989T3 (en) 1998-11-19
NO180030B (en) 1996-10-21
JPH04506141A (en) 1992-10-22
NO914947L (en) 1992-02-13
EP0478615A1 (en) 1992-04-08
NO180030C (en) 1997-01-29
EP0478615B2 (en) 1998-04-15
JP2703405B2 (en) 1998-01-26
DK0478615T3 (en) 1995-07-17
DE69018989D1 (en) 1995-06-01

Similar Documents

Publication Publication Date Title
CA2058984C (en) Polyphonic coding
US5434948A (en) Polyphonic coding
US5701346A (en) Method of coding a plurality of audio signals
CA2645910C (en) Methods and apparatuses for encoding and decoding object-based audio signals
RU2381571C2 (en) Synthesisation of monophonic sound signal based on encoded multichannel sound signal
EP0563832A1 (en) Stereo audio encoding apparatus and method
JP4236813B2 (en) Frame and basic audio coding with additional filter bank for aliasing suppression
JP4229586B2 (en) Frame and basic audio coding with additional filter bank for aliasing suppression
JP4126680B2 (en) Frame and basic audio coding with additional filter bank for aliasing suppression
JPS61112433A (en) Frequency region voice encoding method and device
EP1952391A1 (en) Method for encoding and decoding multi-channel audio signal and apparatus thereof
JP4126682B2 (en) Frame and basic audio coding with additional filter bank for aliasing suppression
JP2001521308A5 (en)
JP2001521347A5 (en)
US7725324B2 (en) Constrained filter encoding of polyphonic signals
JP4323520B2 (en) Constrained filter coding of polyphonic signals
Minami et al. Stereophonic adpcm voice coding method
Väänänen Inter-Channel Prediction to Prevent Unmasking of Quantization Noise in Beamforming

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry
MKEX Expiry

Effective date: 20100615