WO2011039668A1

WO2011039668A1 - Apparatus for mixing a digital audio

Info

Publication number: WO2011039668A1
Application number: PCT/IB2010/054164
Authority: WO
Inventors: Albertus Cornelis Den Brinker; Erik Gosuinus Petrus Schuijers
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2009-09-29
Filing date: 2010-09-15
Publication date: 2011-04-07
Also published as: TW201138492A

Abstract

The invention proposes an apparatus for mixing a digital audio comprising n input signal components into a mixed digital audio signal comprising p output signal components, where n and p are integers. The proposed apparatus comprises a unit for deriving mixing parameters for subsequent time intervals, a unit for determining an Overall Phase Difference (OPD) information parameter, and a mixing unit for mixing the n input signal components into the p output signal components in response to said OPD parameter information and said mixing parameters. The determining unit of the proposed apparatus is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval.

Description

Apparatus for mixing a digital audio

FIELD OF INVENTION

The invention relates to an apparatus for mixing a digital audio.

DESCRIPTION OF THE PRIOR ART

Down-mixing is an often applied technique. In its simplest form, having a stereo signal which one wants to render over a mono system, one can take the mid signal (left plus right) and feed it to the loudspeaker.

Down-mixing is also applied in Parametric Stereo (PS) coding and its extension to multi-channel coding (e.g., MPEG Surround: MPS). In the PS case, stereo cues (inter-channel level differences, time- or phase-differences and coherence) are determined per time-frequency tile (typically a Bark or ERB band division of the frequency axis) and transmitted to the decoder together with a down-mix signal.

In a Parametric Stereo decoder, the processing is reversed. The down-mix signal is processed (per time-frequency tile) to create two signals based on the stereo cues. This process is known as up-mixing. De-correlation signals are typically added to create the desired measure of (in)coherence.

The stereo cues that are transmitted are the Inter-channel Level Difference (ILD), the Inter-channel Phase Difference (IPD) and the Inter-channel Coherence (ICC). Additionally, an Overall Phase Difference (OPD) may be transmitted or, alternatively, may be calculated in the decoder if the method of down-mixing in the encoder is fixed and known to the decoder (see Breebaart, J., van de Par, S., Kohlrausch, A., and Schuijers, E. (2005). Parametric coding of stereo audio. Eurasip J. Applied Signal Proc. 9, 1305-1322, and granted European patent EP 1595247).

It is well-known that creating the mid signal typically results in somewhat dull signals, i.e., less brightness/high-frequency content. The reason for this is that in typical audio signals, the low-frequencies are time-aligned whereas at high frequencies they are not. Direct summation of the two stereo channels effectively suppresses the non-aligned signal components. This has been appreciated when constructing the down-mixes for PS (or MPS). Textbooks such as e.g. J. Breebaart and C. Faller. Spatial audio processing. Chichester (UK): Wiley, 2007. Pages 76-77, state that phase alignment must be done to ensure best results. However, in practice, the down-mixing is often a so-called passive down-mix (i.e., the mean of left and right signals). There are several reasons for this, like complexity and algorithmic delay (J. Lapierre and R. Lefebvre. On improving parametric stereo audio coding. 120th AES Convention, Paris (F), 20-23 May 2006). But there are also quality issues when creating a phase aligned down-mix. If the inter-channel phase-difference is measured, there is an ambiguity whether to align the phase of the left to the right or vice versa. Also trying to shift the phase of both equally but in opposite directions leads to ambiguity. On top of that, the phase difference is numerically ill-conditioned when the correlation is low. Overall this leads to additional artifacts when creating a down-mix by phase-alignment, most notably modulations on tonal components.

All of these reasons have led to the adoption of a passive or active down-mix. In a passive down-mix, the down-mix signal is the average of left and right. Unfortunately, passive down-mixing also has some problems. One of these (as already noted) is that acoustic energy can get lost (Samsudin, Evelyn Kurniawati, Farook Sattar, Ng Boon Poh, and Sapna George. A subband domain downmixing scheme for parametric stereo encoder. 120th AES Convention, Paris (F), 20-23 May 2006. Convention Paper 6815, or J. Breebaart and C. Faller. Spatial audio processing. Chichester (UK): Wiley, 2007. Pages 76-77). Several methods to compensate have been proposed, such as active down-mixing (rescaling the down-mix) or decoder-side energy compensation (J. Lapierre and R. Lefebvre. On improving parametric stereo audio coding. 120th AES Convention, Paris (F), 20-23 May 2006). The compensation is on a rather global level and does not discriminate between tonal components (where compensation is necessary) and noise. Moreover, in both passive and active down-mixes, out-of-phase components are completely absent in the down-mix signal. For active down-mixes this leads to numerical problems when determining the re-scaling and/or artifacts in the decoded signal.
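By way of illustration only, the energy loss of a passive down-mix can be reproduced with a few lines of Python. The sketch below is not part of the described apparatus; the complex test tone and the helper names (`complex_tone`, `passive_downmix`, `energy`) are assumptions chosen for the example.

```python
import cmath
import math


def complex_tone(phase_offset, n=64, cycles=4):
    """Complex sub-band tone with a given phase offset (hypothetical test signal)."""
    return [cmath.exp(1j * (2 * math.pi * cycles * k / n + phase_offset))
            for k in range(n)]


def passive_downmix(left, right):
    """Passive down-mix: sample-wise mean of the two channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def energy(signal):
    """Sum of squared magnitudes of the samples."""
    return sum(abs(s) ** 2 for s in signal)


left = complex_tone(0.0)
e_in_phase = energy(passive_downmix(left, complex_tone(0.0)))
e_quadrature = energy(passive_downmix(left, complex_tone(math.pi / 2)))
e_out_of_phase = energy(passive_downmix(left, complex_tone(math.pi)))
```

With identical channels the down-mix keeps the full energy of one channel, a 90 degree offset halves it, and an out-of-phase pair cancels almost completely, which is the loss the text refers to.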

SUMMARY OF THE INVENTION

Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above-mentioned disadvantages singly or in any combination. The invention is defined by the independent claims. The dependent claims define advantageous embodiments. One aspect of the invention proposes an apparatus for mixing a digital audio comprising n input signal components into a mixed digital audio signal comprising p output signal components, where n and p are integers, the apparatus comprising:

- a unit for deriving mixing parameters for subsequent time intervals,

- a unit for determining an Overall Phase Difference (OPD) information parameter, and

- a mixing unit for mixing the n input signal components into the p output signal components in response to said OPD parameter information and said mixing parameters,

characterized in that the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval.

The invention targets a better down-mix procedure where the objective is to overcome the problems of the phase-alignment process and thus to be able to pull out-of-phase signal components into the down-mix. The main advantage is the improved quality. A second advantage (dependent on the specific implementation and desired normalization of the down-mix signal) is that the rescaling of the amplitude of the signal need not be undertaken anymore, because the down-mix can be automatically scaled to the coherent energy (since signal and energy cancellation no longer occur).

The invention also targets a blind OPD estimation in the decoder by the same principles underlying the OPD determination in the proposed (down/up-)mixing procedure. This is advantageous because the OPD need not be transmitted, nor does any assumption about the down-mix need to be made.

The blind up-mix is the preferred up-mix method associated with the proposed down-mix. An alternative would be to measure with the new down-mix an OPD in a standard way in the encoder, e.g., by measuring the phase-alignment of the created down-mix with the left (or right) signal. The OPD can then be transmitted. However, this introduces the already noted problem that the correlation between the down-mix and left (or right) signal may be low and thus the OPD inaccurate. This means that the blind up-mix has two advantages over existing methods when dealing with a down-mix created by the proposed method: no bit rate needs to be spent on transmitting OPD information and quality issues due to OPD inaccuracies cannot occur.

The invention is based on the following insights. Creating a phase-aligned down-mix is desired in order to prevent loss of energy. The necessary difference phase is easily measured and is called the IPD (Inter-channel Phase Difference). However, generating the so-called Overall Phase Difference poses a problem. It can be defined by e.g. phase-alignment to the left signal, right signal or mid signal, but in all of these cases signals can be found for which the OPD is ill-defined because the signal to which the alignment takes place is small or even zero.

Therefore, instead of aligning the two signals amongst themselves, it is proposed to align the signals to some third signal which is not derived from the signal pieces which have to be aligned. A very suitable signal is the already created down-mix signal (in particular, that of a previous frame) since in that way smoothness of the down-mix signal itself can be guaranteed. Furthermore, the correlation in the overlap region of two consecutive frames is necessarily high due to the fact that they are generated from the same signal pieces.

This prevents the problems of phase alignment of signals having a low correlation (except for the zero-signal which, fortunately, can be treated in rather arbitrary ways).

In a similar way, an OPD in the up-mix (decoder) can be derived. The basic idea here is that the OPD is used to attain phase alignment between frames; it is in fact not associated with creating a proper stereo image, which is defined by the stereo cues (ILD, IPD and ICC) rather than by the OPD. Existing OPD definitions rely on phase alignment to signals existing in the encoder. The OPD definition proposed here deviates from that, and therefore the phases of the encoded and decoded signals may differ significantly. This, however, is in no way detrimental to the stereo image; the inter-channel phase differences are kept intact by the IPD. In particular, the OPD of the current frame is an updated version of that of the previous frame such that a maximum phase alignment between the decoded signals of consecutive frames is realized.

The essence of the present invention in up- and down-mixing is the phase alignment (OPD) between consecutive frames of the output (up-mixed or down-mixed) signals. In particular, the current output signal(s) are phase-aligned to past output signal(s) instead of being phase-aligned to a signal or signals (or components thereof) in the current frame, as is the current state of the art.

Apparatus as claimed in claim 1, characterized in that the OPD parameter information is a differential OPD parameter.

In a further embodiment, the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the n input signal components in a sub interval lying in said given time interval and/or said previous time interval.

In a further embodiment, the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the m output signal components in a sub interval lying in said given time interval and/or said previous time interval.

In a further embodiment, the time intervals are overlapping time intervals. In a further embodiment, said previous time interval is an immediately preceding time interval.

In a further embodiment, n>m.

In a further embodiment, n=2 and p=1.

In a further embodiment, n<m.

In a further embodiment, n=1 and p=2.

In a further embodiment, for n<m and the apparatus further comprising an input for receiving an OPD parameter value, the determining unit is adapted to re-initialize the OPD parameter information on the basis of the received OPD parameter value.

In a further embodiment, further being adapted to receive an indicator signal indicating the receipt of the OPD parameter value, the determining unit being adapted to reinitialize the OPD parameter information in response to said received indicator signal.

In a further embodiment, the mixing parameters and the OPD parameter value are derived for each of a plurality of frequency bands.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be further elucidated in the figures:

Fig. 1 An example block diagram of an apparatus for mixing digital audio according to the invention;

Fig. 2 An example of time intervals used for derivation of OPD parameter information;

Fig. 3 An example of weighting functions used for derivation of OPD parameter information;

Fig. 4 An example of time intervals where overlapping intervals are smaller than half the length of the time interval;

Fig. 5 A first example down-mixing scheme according to the invention for two input signal components;

Fig. 6 A second example down-mixing scheme according to the invention for two input signal components;

Fig. 7 A detailed second example down-mixing scheme according to the invention for two input signal components;

Fig. 8 An example of OPD calculation at the up-mixing side.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

Stereo or multi-channel encoding and decoding is typically done in sub-bands where complex signals appear; the description of the invention applies to the processing for a single sub-band only. In the notation used across the description of the invention, * denotes a complex conjugate (or transposition and conjugation in the case of vectors and matrices) and $j = \sqrt{-1}$.

A mixing process is a linear mapping from one set of signals to another set. This can be written as:

$$\begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} = H \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$

If p < n the process is called down-mixing; if p > n the process is called up-mixing. The matrix H is usually a signal-dependent matrix which varies at a much lower rate than the signal itself. In audio coding this is exploited by measuring and transmitting the matrix information at a much lower update rate than the signals. In audio coding, the matrix entries are a function of an Interchannel Level Difference (ILD), an Interchannel Phase Difference (IPD) and an Interchannel Coherence (ICC). These three parameters are typically measured in the audio encoder from the signals $x_i$ and transmitted to the decoder such that the mixing process can be reversed. Since the mixing parameters are updated at a lower rate than the signal, the concept of a frame, where a single mixing parameter is associated with a set of time indices, is common. These frames are often overlapping (or may be viewed as such), and output signals from a mixer may be generated in an overlap-add fashion.
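The linear mapping described above can be sketched in Python as a plain matrix-vector product per time index. This is a minimal illustration; the function name `mix` and the example matrices are hypothetical and are not taken from the description.

```python
def mix(H, x):
    """Apply a p-by-n mixing matrix H (a list of rows) to an n-vector x
    of complex sub-band samples and return the p output samples."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


# Hypothetical example matrices: a 1x2 passive down-mix and a 2x1 up-mix.
H_down = [[0.5, 0.5]]
H_up = [[1.0], [1.0]]

x = [1 + 1j, 1 - 1j]      # n = 2 input components at one time index
d = mix(H_down, x)        # p = 1: down-mixing (p < n)
y = mix(H_up, d)          # p = 2: up-mixing (p > n)
```

In a codec the entries of H would be recomputed once per frame from the ILD, IPD and ICC parameters, while `mix` is applied at every time index of the frame.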

The OPD is usually defined as a phase difference between one of the signals $x_i$ and one of the signals $y_l$. The invention deviates from this in order to improve the quality of the mixed signals, and also to define an appropriate OPD in the decoder in the absence of knowledge of the encoding OPD definition.

The invention defines the OPD by requiring that the resulting mix (up- or down-mix) signal is as smooth as possible. In particular, the smoothness is measured from the phase alignment between one or more (consecutive) output signals.

Two consecutive frames and a general idea of how to arrive at an OPD are considered in the description of the current invention. The concept can nevertheless be extended from the current frame and the previous frame to the current frame and a set of past frames. Suppose that two mixing matrices are given, called $H_m$ and $H_{m+1}$, corresponding to the $m$-th and $(m+1)$-st frame. Consider an associated time interval such that, if $K_1$ is the mid of frame $m$ and $K_2$ the mid of frame $m+1$, $K_1 < k \le K_2$ is considered as the overlap region. However, other definitions of a time interval are feasible as well. It is also very common to consider a window defined on such an interval, which is denoted by $w_o(k)$.

Consider the signals $y_l$ on the overlap region that would be created from the signals $x_i$ if the mixing matrix $H_1$ (the mixing matrix of frame $m$) is applied. These signals are denoted by $y_{1,l}(k)$.

Similarly, the signals $y_{2,l}(k)$ are defined for the mixing matrix $H_2$ (the mixing matrix of frame $m+1$). To optimize the smoothness (in other words, to minimize discontinuity of the output signals over the overlap region), the average covariance $c$ can be introduced, which is defined as:

$$c = \sum_{k=K_1+1}^{K_2} \sum_{l=1}^{p} w_o(k)\, y_{1,l}^{*}(k)\, y_{2,l}(k)$$

However, other measures, e.g. an average correlation coefficient, can be used as well. Also, if desired, the window may depend on the index $l$.

It is now noted that $c$ is proportional to $e^{-j\theta_m} e^{j\theta_{m+1}}$, where $\theta_{m+1}$ is the OPD of the $(m+1)$-st frame. In particular:

$$c = e^{-j\theta_m} e^{j\theta_{m+1}} \sum_{k=K_1+1}^{K_2} w_o(k)\, x^{*}(k)\, H_1^{*} H_2\, x(k)$$

where $x(k)$ denotes the vector of signals $x_i(k)$ at instant $k$ and $^{*}$ denotes Hermitian transposition. Setting the phase of $c$ equal to zero, which corresponds to an average phase alignment of the signals, uniquely defines $\theta_{m+1}$ as a function of the input signals $x_i(k)$, the matrices $H_1$ and $H_2$, and the previous OPD $\theta_m$.

If $n = 1$, setting the phase of $c$ to zero uniquely defines $\theta_{m+1}$ as a function of the matrices $H_1$ and $H_2$ and the previous OPD ($\theta_m$).
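The OPD update described above may be sketched as follows. The sketch assumes one particular conjugation convention, namely that the covariance is formed with the conjugate on the output of the earlier frame, so that the new OPD equals the previous OPD minus the phase of the windowed sum. The function names and the treatment of the zero signal (keeping the previous OPD) are assumptions made for the example.

```python
import cmath


def mix(H, x):
    """Apply a p-by-n mixing matrix H (list of rows) to an n-vector x."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


def next_opd(theta_prev, H1, H2, frames, window):
    """Return the OPD of the next frame such that the windowed covariance
    c = exp(j*(theta_next - theta_prev)) * s has zero phase, where
    s = sum_k w(k) * sum_l conj([H1 x(k)]_l) * [H2 x(k)]_l  (assumed convention)."""
    s = 0j
    for w, x in zip(window, frames):
        y1 = mix(H1, x)
        y2 = mix(H2, x)
        s += w * sum(a.conjugate() * b for a, b in zip(y1, y2))
    if abs(s) == 0:
        return theta_prev  # zero signal: the choice is arbitrary (assumption)
    return theta_prev - cmath.phase(s)


# Hypothetical overlap region: the second matrix rotates the down-mix by 90 degrees.
H1 = [[0.5, 0.5]]
H2 = [[0.5j, 0.5j]]
frames = [[1 + 0j, 1 + 0j]] * 4
window = [1.0] * 4

theta_prev = 0.0
theta_next = next_opd(theta_prev, H1, H2, frames, window)
old_out = cmath.exp(1j * theta_prev) * mix(H1, frames[0])[0]
new_out = cmath.exp(1j * theta_next) * mix(H2, frames[0])[0]
```

In this example the rotation introduced by the second matrix is undone by the new OPD, so `old_out` and `new_out` coincide and the overlap-add of the two frames does not cancel.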

A similar principle can be constructed by not only considering a single previous frame ( m ) but also by using more previous frames.

It should be noted that the first OPD (at the beginning of the mixing process) can be chosen arbitrarily. In fact, a standard definition of an OPD could be used, i.e., an OPD based on phase-alignment of one of the mixed signals with any of the input signals.

Alternatively, the OPD calculation according to the invention may be re-initialized at a certain frame with the OPD according to a standard definition.

Alternatively, the OPD parameters may be transmitted. Using phase parameters such as IPD and OPD in the upmix may contribute significantly to the perceived quality in PS (Parametric Stereo) based audio codecs and may in particular substantially improve sound source localization. However, as IPD only describes the relative phase modification between the two output signals and not the phase difference between the downmix and the individual stereo channels, it cannot provide any information on how the phase modification should be distributed across the output stereo channels. The OPD parameter is indicative of the phase offset between the downmix and at least one of the stereo channels and it thus reflects how the phase should be distributed between the channels. The OPD may accordingly be included in the encoded signal by an encoder. However, although the OPD can be transmitted from encoder to decoder using a relatively limited bit budget, this approach does increase the overall data rate for the signal. Therefore, an OPD estimation may be performed at the decoder side such that the OPD value is not included in the encoded signal but is instead calculated by the decoder from the other parameter values.

The OPD may e.g. be calculated from (ref. e.g. Jimmy Lapierre and Roch Lefebvre, "On Improving Parametric Stereo Audio Coding", presented at the 120th Convention of the Audio Engineering Society, 2006 May 20-23, Paris, France, Preprint 6804).

However, such approaches may result in a discontinuity of the OPD value as a function of the IPD value. For example, discontinuities can occur in time (e.g. if the IPD changes in time from just below to just above π). As the processing of segments includes time averaging (e.g. as part of FFT or QMF windowing), this may lead to cancellation of the output signal. In particular, such an approach typically results in audible artifacts perceived as 'clicks' or 'warbling' sounds. Similarly, IPD changes (and thus sign inversions) may occur in the frequency domain between one subband and the next (e.g. if the IPD in one band is just below π and just above π in a neighboring band). This may similarly result in noticeable artifacts.
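The sign inversion around π can be made concrete with a small numerical sketch. The rule used below, which assigns half of the measured IPD to one channel, is a hypothetical stand-in chosen for illustration; it is not the formula of the cited reference.

```python
import cmath
import math


def wrapped_ipd(true_phase):
    """Principal value of a phase difference, as a phase measurement returns it."""
    return cmath.phase(cmath.exp(1j * true_phase))


def half_ipd_rotation(ipd):
    """Hypothetical rule: rotate one channel by half of the measured IPD."""
    return cmath.exp(1j * ipd / 2)


eps = 0.01
rot_below = half_ipd_rotation(wrapped_ipd(math.pi - eps))  # IPD just below pi
rot_above = half_ipd_rotation(wrapped_ipd(math.pi + eps))  # IPD just above pi

# Averaging the two rotations, as happens in an overlap region, nearly cancels.
averaged = (rot_below + rot_above) / 2
```

Although the true phase difference changes by only 0.02 rad, the two derived rotations are almost opposite, so their average is close to zero. This is the cancellation mechanism the paragraph above describes.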

To prevent such artifacts, the encoder may include signaling information in the encoded stereo signal indicating whether an OPD parameter should purely be estimated by the decoder or should be replaced (or compensated) by a phase correction parameter that is included in the encoded stereo signal.

In some embodiments, signaling information may be provided for each segment of the parametrically encoded signal. Thus, the encoded signal may be segmented, typically in the time domain, when being encoded. The presence indication may simply indicate whether there are any phase correction parameter values for the current segment. For example, the presence indication can be a single bit denoting that for the current frame all time frequency blocks can be estimated reliably by the decoder. This may provide a very low data rate overhead (possibly a single bit per segment) and may reduce the complexity and/or resource usage of the decoder. In other embodiments, a more detailed presence indication may be used. Specifically, the presence indication may comprise individual presence indications for a plurality of sets of time frequency blocks of the down-mix. Specifically, each set may correspond to one time frequency block for which individual PS parameters are provided. Further, the sets may cover all time frequency blocks of the signal. Thus, specifically, a single presence indication bit may be included for each parameter time frequency block, indicating whether e.g. the OPD for the block can be purely estimated by the decoder or whether it must take into account a phase correction parameter provided for the block. It will be appreciated that in many embodiments, the phase correction parameter may indeed be provided for each time frequency block that belongs to the second parts of the downmix.

At the decoder side (depending on the signaling information), the OPD information is estimated from the other PS parameters or is decoded from the bit-stream. The latter case may still employ the estimated data, depending on the coding scheme, e.g. by transmitting the difference between the estimated OPD and the OPD derived in the decoder.

Fig. 1 shows a block diagram of an apparatus 500 for mixing n digital input signals, denoted as $x_1, \ldots, x_n$ 501, into m digital output signals, denoted as $y_1, \ldots, y_m$ 502. The apparatus 500 contains a unit 520 for deriving mixing parameters for subsequent time intervals. This derivation may comprise external control data, such as e.g. a matrix with time-varying gains mapping the input channels to the output channels, or it may comprise decoding a digital bit-stream comprising mixing parameters. These mixing parameters may consist directly of a matrix with time-varying gains, but may also consist of a set of spatial image parameters that describe desired relationships of the output channels. The unit 530 determining the Overall Phase Difference (OPD) information parameter for a given time interval advantageously employs the mixing parameters in said given time interval and the mixing parameters in a previous time interval, both indicated in the figure by 503. The mix unit 510 mixes the n input signals 501 into p output signals 502 in response to the OPD information parameter 505 and the mixing parameters 504. This allows creating a phase-aligned mix of the output signals, thereby reducing artifacts known from state-of-the-art methods that do not apply phase alignment of subsequent time intervals.

Several mixing schemes will be subsequently described in more detail wherein either p or n equals 1. For p = 1 it will also be illustrated that it is possible to deviate from the average co-variance measure without deviating from the current invention.

It should be clear from the wording of description of these detailed examples how the signals and blocks match to the wording of the general mixing scheme as described above.

The down-mix (typically per frame) is a weighted sum of the left and right signals. The proposed processing will be described on a frame basis. For this purpose a prototype window $w$ is defined. Furthermore, a shift parameter $U$, also referred to as an update parameter, is defined. Typically, the window has finite support, with its support larger than $U$, such that processing is done in overlapping frames. The $m$-th window is defined as:

$$w_m(k + mU) = w(k)$$

and the windows are amplitude-complementary:

$$\sum_m w_m(k) = 1$$
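The amplitude-complementarity condition can be checked numerically. The sketch below assumes a squared-sine (Hann-type) prototype window with a shift of half its length; the description does not prescribe this particular window, and the function names are illustrative.

```python
import math


def prototype_window(k, length):
    """Squared-sine prototype window w(k), non-zero for 0 <= k < length
    (an assumed choice of prototype)."""
    if 0 <= k < length:
        return math.sin(math.pi * (k + 0.5) / length) ** 2
    return 0.0


def shifted_window(m, k, length, shift):
    """m-th window, defined by w_m(k + m*U) = w(k), i.e. w_m(k) = w(k - m*U)."""
    return prototype_window(k - m * shift, length)


length, U = 16, 8  # window support larger than the shift U: overlapping frames
sums = [sum(shifted_window(m, k, length, U) for m in range(-2, 6))
        for k in range(24)]
```

Every entry of `sums` equals one up to rounding, because at each time index the two overlapping windows contribute a squared sine and a squared cosine of the same angle.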

The left and right frame signals $l_m$ and $r_m$ are then expressed as:

$$l_m(k) = l(k)\, w_m(k),$$

$$r_m(k) = r(k)\, w_m(k).$$

The down-mix in frame $m$ is called $d_m$ and is described as:

$$d_m(k) = A\left(\alpha_l\, l_m(k) + \alpha_r\, r_m(k)\right)$$

In order to create the down-mix, the quantities $\alpha_l$, $\alpha_r$ and $A$ need to be determined. Instead of these three components, only two weights, $A\alpha_l$ and $A\alpha_r$, need to be determined. For a first example down-mix scheme it is, however, for illustration purposes more convenient to treat these parameters separately.

The total down-mix signal is the sum of the frame down-mixes:

$$d(k) = \sum_m d_m(k)$$

According to the invention, the weights $\alpha_l$ and $\alpha_r$ are determined by phase alignment of the signals $l_m$ and $r_m$ with $d_{m-1}$. This is shown in Fig. 5.

In the following, an additional explanation will be given as to how the OPD parameter information can be derived for an apparatus which mixes n input signal components into p output signal components. The signal components can be wideband signal components or narrow band (or sub band) signal components. In the latter case, the signal components are the signal components in one and the same sub band. Time intervals $\ldots, T_i, T_{i+1}, T_{i+2}, \ldots$ are defined as given in Fig. 2. In this example, the time intervals overlap by half of the length of the time intervals. This, however, is not a necessity. The time intervals can also overlap by more or less than half of the length of the time intervals.

In a first step, the signal components of the n input signal components in the time interval $T_i$ are used to derive the mixing matrix $H_i$ and thus the matrix coefficients in the mixing matrix $H_i$.

As an example, consider the case where the n input signals are down-mixed to a single signal (p=1). The matrix H reduces to a row matrix. The phase difference between each channel and the first channel can be measured (IPD parameter) and the entries in the matrix may be chosen such that they compensate for the phase differences. In this way the down-mix becomes a phase-aligned down-mix where no destructive phase cancellation occurs.

In a second step, the signal components of the n input signal components in the time interval $T_{i+1}$ are used to derive the mixing matrix $H_{i+1}$ and thus the matrix coefficients in the mixing matrix $H_{i+1}$.

The first step is carried out in this way in case n > p, i.e. when the apparatus is a down-mixing apparatus. In case of up-mixing (n < p), the matrix coefficients of the matrices $H_i$, and/or the parameters such as ILD, IPD and ICC from which the matrix coefficients can be derived (as derived in the down-mixing apparatus), are supplied to the up-mixing apparatus for each of the subsequent time intervals $\ldots, T_i, \ldots$

In a third step, for the n input signal components in the overlapping time interval $OLT_i$, Overall Phase Difference parameter information is derived by calculating the cross-correlation in the overlap region:

$$c = e^{-j\theta_i} e^{j\theta_{i+1}} \sum_{k=K_1+1}^{K_2} \sum_{l=1}^{p} w_o(k)\, \left[H_i\, x(k)\right]_l^{*} \left[H_{i+1}\, x(k)\right]_l$$

where $[\cdot]_l$ denotes the $l$-th component of a vector, and $\theta_i$ and $\theta_{i+1}$ are the Overall Phase Difference parameters for the time intervals $T_i$ and $T_{i+1}$ respectively. Setting the angle of the cross-correlation to zero yields phase alignment. As $\theta_i$ is already known from the previous time interval $T_i$, $\theta_{i+1}$ is uniquely defined.

This procedure continues for subsequent time intervals.

The mixing matrices so obtained for the subsequent time intervals $\ldots, T_i, T_{i+1}, T_{i+2}, \ldots$ and the Overall Phase Difference information so obtained for the subsequent overlapping time intervals $\ldots, OLT_i, OLT_{i+1}, OLT_{i+2}, \ldots$ are now used in the actual mixing stage to mix the n input signal components into the p output signal components. This is done in the following way.

The matrix parameters (the mixing matrices $\ldots, H_i, H_{i+1}, \ldots$ or their equivalents in the form of the ILD, IPD and ICC parameters per time interval $T_i$) have been obtained either by the above-described method or by extraction from an incoming bitstream. Further, Overall Phase Difference information is derived as explained above and is now available in the form of $\ldots, \theta_i, \theta_{i+1}, \ldots$ for subsequent overlapping time intervals $\ldots, OLT_i, OLT_{i+1}, OLT_{i+2}, \ldots$ The p output signal components in the overlapping time interval $OLT_i$ can now be obtained from the following formula:

$$Y(k) = \exp(j\theta_i)\, A\, H_i\, X(k) + \exp(j\theta_{i+1})\, B\, H_{i+1}\, X(k)$$

where A and B are weighting (or windowing) functions that behave as a function of time (k) as indicated in Fig. 3, where A varies for increasing time in the overlapping time interval $OLT_i$ from a maximum value (e.g. 1 or ½) to a value of zero, whereas B varies for increasing time in the overlapping time interval $OLT_i$ from a value of zero to the same maximum value. $X(k)$ is a vector containing the n input signal components at time k, where k is in the overlapping time interval $OLT_i$.

This is repeated for subsequent overlapping time intervals.
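The mixing formula for one overlapping time interval may be sketched as follows. The linear ramps used for the weighting functions A and B, the function names and the example matrices are assumptions made for illustration; Fig. 3 may show differently shaped functions.

```python
import cmath


def mix(H, x):
    """Apply a p-by-n mixing matrix H (list of rows) to an n-vector x."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


def crossfade_mix(theta_i, H_i, theta_next, H_next, X):
    """Y(k) = exp(j*theta_i) A(k) H_i X(k) + exp(j*theta_next) B(k) H_next X(k),
    with A falling linearly from about 1 to about 0 and B = 1 - A (assumed ramps)."""
    n_samples = len(X)
    out = []
    for k, x in enumerate(X):
        B = (k + 0.5) / n_samples
        A = 1.0 - B
        y_old = mix(H_i, x)
        y_new = mix(H_next, x)
        out.append([cmath.exp(1j * theta_i) * A * a
                    + cmath.exp(1j * theta_next) * B * b
                    for a, b in zip(y_old, y_new)])
    return out


# Hypothetical overlap region in which the second matrix adds a 90 degree rotation.
H_i = [[0.5, 0.5]]
H_next = [[0.5j, 0.5j]]
X = [[1 + 0j, 1 + 0j]] * 4

aligned = crossfade_mix(0.0, H_i, -cmath.pi / 2, H_next, X)  # OPD compensates
unaligned = crossfade_mix(0.0, H_i, 0.0, H_next, X)          # OPD ignored
```

With the compensating OPD the output magnitude stays constant across the overlap, whereas without it the magnitude dips in the middle of the crossfade, which would be heard as a modulation.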

As will be appreciated, the overall phase correction carried out by the exponent term in the formula is updated each time with the next $\theta$ value. In that sense, the $\theta$ values could be considered as differential OPD information. It will be evident that, instead of generating absolute OPD information for subsequent time intervals ($\ldots, \theta_i, \theta_{i+1}, \ldots$), it is equally possible to generate differential OPD information for subsequent time intervals ($\ldots, \theta_i - \theta_{i-1}, \theta_{i+1} - \theta_i, \ldots$).
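The equivalence of absolute and differential OPD information can be illustrated with a short round trip. In this sketch the first value is kept as an absolute starting point, which is an assumption of the example; the function names are illustrative.

```python
def to_differential(thetas):
    """Absolute OPD values -> first value followed by successive differences."""
    return [thetas[0]] + [b - a for a, b in zip(thetas, thetas[1:])]


def from_differential(deltas):
    """Accumulate the differences to recover the absolute OPD values."""
    out, acc = [], 0.0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


thetas = [0.1, 0.4, 0.2, -0.3]          # hypothetical OPD values per interval
deltas = to_differential(thetas)
restored = from_differential(deltas)
```

Since the two representations carry the same information, a mixer may store or transmit whichever is more convenient.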

In case the overlapping intervals are smaller than half the length of the time intervals, see Fig. 4, the processing for deriving the mixing matrix parameters $H_i, H_{i+1}, H_{i+2}, \ldots$ for the subsequent time intervals $T_i, T_{i+1}, T_{i+2}, \ldots$ is the same.

The derivation of the overall phase difference parameter information $\theta_i$ for the subsequent overlapping time intervals $OLT_i$ is done in the same way as described above, based on at least the portions of the signal components X in those overlapping time intervals.

Fig. 5 shows an example down-mix device according to the invention. The signals $l_m$ 101 and $d_{m-1}$ 103 are fed into a block PA 110, which measures a phase alignment and determines the coefficient $\alpha_l$. Similarly, the signals $r_m$ 102 and $d_{m-1}$ 103 are fed into a block PA 120, which measures a phase alignment and determines the coefficient $\alpha_r$. In the simplest case, $\alpha_l$ and $\alpha_r$ only change the phase, because $|\alpha_l| = |\alpha_r| = 1$. The coefficients $\alpha_l$ and $\alpha_r$ are used for amplification of the signals $l_m$ 101 and $r_m$ 102 in gain blocks 130 and 140, respectively. The amplified signals provided by blocks 130 and 140 are fed into an adder 150, which adds both signals and results in a signal $s_m$ 106:

$$s_m(k) = \alpha_l\, l_m(k) + \alpha_r\, r_m(k)$$

Subsequently, the signals $s_m$ 106, $l_m$ 101 and $r_m$ 102 are fed into a unit Ec 160, which determines and applies amplification by a gain $A$:

$$d_m(k) = A\, s_m(k)$$

The operation of the individual blocks 110 and 120 can be as follows. In the PA unit, a second window $w_o$ (overlap window) is defined on the union of the supports of $w_m$ and $w_{m-1}$. Subsequently, the following measure is defined in the respective units:

Alternatively, the cross-correlations can be determined as an alternative measure:

The weights $\alpha_l$ and $\alpha_r$ can be determined from these measures. For example, $\alpha_l$ and $\alpha_r$ can be determined as:

$$\alpha_l = e^{j\theta_l}, \quad \alpha_r = e^{j\theta_r}, \quad \text{with } \theta_l = \angle v_l, \; \theta_r = \angle v_r,$$

which aligns $l_m$ and $r_m$ to the down-mix of the previous frame. Another example of determining $\alpha_l$ and $\alpha_r$ is:

$$\alpha_l = b_l + \rho_l, \quad \alpha_r = b_r + \rho_r$$

where $\rho_l$ and $\rho_r$ denote the cross-correlations mentioned above. The parameters $b_l$ and $b_r$ bias the down-mix. If $b_r = b_l$, it biases towards the sum signal (which is usually qualitatively a good solution). Taking $\rho_l$ and $\rho_r$ (i.e., in contrast to $v_l$ and $v_r$, both amplitude and phase) ensures that a signal with small coherence with the past down-mix is (somewhat) suppressed. This improves the smoothness of the down-mix signal.
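A minimal sketch of the phase-only weighting of Fig. 5 is given below. Because the exact definition of the alignment measure is not reproduced here, the sketch assumes a windowed correlation of the previous down-mix with the conjugate of the current frame signal, so that the phase of the measure is the rotation that brings the frame signal onto the previous down-mix. All function names are illustrative.

```python
import cmath


def alignment_measure(frame, target, window):
    """Assumed measure: v = sum_k w_o(k) * target(k) * conj(frame(k))."""
    return sum(w * t * f.conjugate() for w, t, f in zip(window, target, frame))


def phase_weight(frame, target, window):
    """Unit-magnitude weight exp(j*angle(v)); 1 if the measure vanishes (assumption)."""
    v = alignment_measure(frame, target, window)
    return cmath.exp(1j * cmath.phase(v)) if abs(v) > 0 else 1.0 + 0j


def aligned_sum(l_m, r_m, d_prev, window):
    """s_m(k) = alpha_l * l_m(k) + alpha_r * r_m(k), both aligned to d_prev."""
    a_l = phase_weight(l_m, d_prev, window)
    a_r = phase_weight(r_m, d_prev, window)
    return [a_l * l + a_r * r for l, r in zip(l_m, r_m)]


# Hypothetical overlap region with out-of-phase left and right signals.
window = [1.0] * 4
d_prev = [1 + 0j] * 4
l_m = [1j] * 4
r_m = [-1j] * 4

s_m = aligned_sum(l_m, r_m, d_prev, window)
passive = [l + r for l, r in zip(l_m, r_m)]
```

In this example the plain sum of the two channels is zero, whereas the aligned sum retains both components with the phase of the previous down-mix.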

Having created s_m , it is often desired to control its strength. This can be done by determining the energies

and taking

or

Instead of using d_{m-1} as alignment target, it is also possible to create from d_{m-1} a second signal (e.g., a prediction of the down-mix in the current frame) and align to this signal. It is also possible to use the sum and difference signals (l_m + r_m and l_m - r_m) instead of l_m and r_m themselves.

Alternatively, the left and right signals can first be phase-aligned among themselves and subsequently added. For this preliminary down-mix d_m, an OPD can be calculated such that the current down-mix signal aligns with the past down-mix signal d_{m-1}.

The scheme is shown in Fig. 6. A first unit (IPD) 210 receives the left signal 101 and the right signal 102 of frame m and calculates the IPD of that frame, which is indicated as a signal ipd_m 201. The signal 201 can subsequently be included in an encoded bit stream. On the basis of the measured IPD, the two signals are phase-aligned and output as l_m 203 and r_m 204. The signals 203 and 204 are subsequently added in an adder 220 to form a preliminary down-mix d_m 205. Next, in a unit 230, an OPD is calculated and applied such that the resulting down-mix d_m 207 is aligned with the previous down-mix d_{m-1} 206 in the overlap region.

A more detailed sketch of this down-mixing scheme is provided in Fig. 7. The signals l_m 101 and r_m 102 are fed to an alignment measurement unit P1, indicated as 211, which measures the phase alignment between the left signal 101 and the right signal 102. The output is the ipd 201, which can be included in the bit stream. On the basis of the measured ipd, the right signal 102 is aligned to the left signal 101 by the multiplier 212, which creates signal r_m 204. An adder 220 adds the signals 204 and 101. In the OPD unit 230, the preliminary down-mix d_m 205 is compared with the previous down-mix d_{m-1} 206 in the overlap region in unit P2, indicated as 231. The unit 231 has the same character as the PA units of Fig. 5. The output of unit 231 is a multiplication factor (typically a complex number on the unit circle) 208, which drives a multiplier 232 and transforms the preliminary down-mix 205 into the final down-mix d_m 207, which is phase-aligned with d_{m-1}.
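The two-stage scheme of Fig. 7 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the IPD is taken as the phase of the inner product of the left and right signals, the OPD factor is taken as the phase of a windowed inner product with the previous down-mix, all arrays cover the same sample range with w_o vanishing outside the overlap, and `downmix_frame` is a hypothetical name.

```python
import numpy as np

def downmix_frame(l_m, r_m, d_prev, w_o):
    """Return (ipd, d_m) for one frame of complex sub-band samples."""
    # Unit 211 (P1): inter-channel phase difference of the frame.
    ipd = np.angle(np.sum(l_m * np.conj(r_m)))
    # Multiplier 212: rotate the right signal onto the left signal.
    r_aligned = r_m * np.exp(1j * ipd)
    # Adder 220: preliminary down-mix.
    d_pre = l_m + r_aligned
    # Unit 231 (P2): unit-magnitude factor aligning d_pre with the previous down-mix.
    v = np.sum(w_o * d_prev * np.conj(d_pre))
    opd_factor = np.exp(1j * np.angle(v))
    # Multiplier 232: final down-mix.
    return ipd, opd_factor * d_pre

rng = np.random.default_rng(1)
n = 128
noise = rng.standard_normal(n) + 1j * rng.standard_normal(n)
l_m = rng.standard_normal(n) + 1j * rng.standard_normal(n)
r_m = 0.5 * l_m * np.exp(1j * 2.0) + 0.1 * noise       # partly coherent, out of phase
d_prev = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w_o = np.hanning(n)

ipd, d_m = downmix_frame(l_m, r_m, d_prev, w_o)
energy_out = np.sum(np.abs(d_m) ** 2)
energy_coherent = (np.sum(np.abs(l_m) ** 2) + np.sum(np.abs(r_m) ** 2)
                   + 2 * abs(np.sum(l_m * np.conj(r_m))))
```

Because both multipliers have unit magnitude, the second stage changes only the phase of the down-mix, so `energy_out` is fixed by the first stage alone.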

It should be clear that in this implementation of the inventive idea, the output energy (the energy of d_m) is automatically equal to the coherent input energy, i.e.,

$$E_d = E_l + E_r + 2\left|\sum_k l_m(k)\, r_m^*(k)\right|,$$

provided that both multipliers have magnitude 1 and the phase of the first multiplier corresponds to the IPD.

It is obvious that the down-mix schemes described above can be extended in a straightforward way to create a down-mix signal from three or more signals instead of two.
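The coherent-energy property of the Fig. 7 scheme can be checked in a few lines. The derivation below is a sketch under assumptions the text does not spell out: the energies are plain sums of squared magnitudes over the frame, the IPD is the phase of the inner product of l_m and r_m, and the symbols S, r'_m and θ (the phase of the second multiplier) are introduced here only for illustration.

```latex
\begin{aligned}
S &= \sum_k l_m(k)\, r_m^*(k), \qquad
\mathrm{ipd}_m = \angle S, \qquad
r'_m(k) = e^{j\,\mathrm{ipd}_m}\, r_m(k), \\
E_d &= \sum_k \left| e^{j\theta} \bigl( l_m(k) + r'_m(k) \bigr) \right|^2
     = \sum_k |l_m(k)|^2 + \sum_k |r_m(k)|^2
       + 2\,\mathrm{Re}\!\left( e^{-j\,\mathrm{ipd}_m}\, S \right) \\
    &= E_l + E_r + 2\,|S| .
\end{aligned}
```

The unit-magnitude second multiplier drops out of the magnitude, and the first multiplier turns the cross term into its absolute value, which is why no separate energy correction is needed in this scheme.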

Suppose K₁ is the time index associated with the m-th data set of stereo cues (ILD, IPD, ICC) and K₂ the time index associated with the (m+1)-th set. These are available at the decoder.

Gains g_{l1}, g_{l2}, g_{r1}, g_{r2} and IPD phases φ₁, φ₂ can be derived from the stereo cues. This calculation depends on whether the down-mix has been normalized to the sum energy of the input signal or with another normalization, such as the normalization to the coherent energy. The following terminology is used: an OPD at instant m is called θ_m, and the down-mix signal is called d(k) in the time interval K₁ ≤ k ≤ K₂.

An objective is to calculate θ_{m+1}. This is achieved by requiring that the up-mix signals in the overlap area created with the first parameter set are aligned as well as possible with the up-mix signals created with the second parameter set. Instead of measuring the alignment, as is done in the down-mix schemes, it can be shown that the optimal OPD alignment change can be calculated from the transmitted stereo cues of two consecutive frames.

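As a measure, an average covariance of the two up-mixed signals is used:

$$c = \sum_{k=K_1}^{K_2} w_0(k) \left[\, l_1(k)\, l_2^*(k) + r_1(k)\, r_2^*(k) \,\right],$$

where l_1, r_1 are the left and right up-mix from d using the parameter set of K₁, with a similar definition of l_2, r_2 for the parameter set of K₂. As before, w₀ is a window function in the overlap region. For the signals, the following expressions hold:

$$\begin{aligned}
l_1 &= g_{l1}\, d\, e^{j\theta_m}, & r_1 &= g_{r1}\, d\, e^{j\theta_m} e^{j\varphi_1}, \\
l_2 &= g_{l2}\, d\, e^{j\theta_{m+1}}, & r_2 &= g_{r2}\, d\, e^{j\theta_{m+1}} e^{j\varphi_2},
\end{aligned}$$

where the decorrelation signals that are usually added to l_m and r_m are disregarded.

Substituting this in the definition of c gives

$$c = \sum_{k=K_1}^{K_2} w_0(k)\, |d(k)|^2 \left[ g_{l1} g_{l2}\, e^{j(\theta_m - \theta_{m+1})} + g_{r1} g_{r2}\, e^{j(\theta_m - \theta_{m+1})} e^{j(\varphi_1 - \varphi_2)} \right].$$

With the definition Δ_m = θ_{m+1} - θ_m, c can be expressed as:

$$c = \sum_{k=K_1}^{K_2} w_0(k)\, |d(k)|^2 \left[ g_{l1} g_{l2} + g_{r1} g_{r2}\, e^{j(\varphi_1 - \varphi_2)} \right] e^{-j\Delta_m}$$

Choosing maximum alignment means setting the phase of c equal to zero. This implies for the phase update:

$$\Delta_m = \angle \left( g_{l1} g_{l2} + g_{r1} g_{r2}\, e^{j(\varphi_1 - \varphi_2)} \right)$$

(the sum over the weighted squared d-amplitude does not contribute to the phase). Finally, the following formula holds:

$$e^{j\Delta_m} = \frac{g_{l1} g_{l2} + g_{r1} g_{r2}\, e^{j(\varphi_1 - \varphi_2)}}{\left| g_{l1} g_{l2} + g_{r1} g_{r2}\, e^{j(\varphi_1 - \varphi_2)} \right|}$$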

The processing related to the above calculation is shown in Fig. 8. The unit OPD-D, indicated as 310, receives the stereo cues from the present frame 302 and the stereo cues from the next frame 301. The unit 310 generates the phase update factor in the form of e^{jΔ_m}, indicated as 303. In the OPD-D unit 310, the stereo cues are translated to the gains g and the phases φ, after which the above expression for e^{jΔ_m} can be used. Next, the OPD can be updated by e^{jθ_{m+1}} = e^{jΔ_m} e^{jθ_m}, which is realized using the multiplier 320.
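The decoder-side update of Fig. 8 can be sketched as follows. This is a minimal illustration under assumptions: the gains are real, the up-mix convention is l = g_l d e^{jθ} and r = g_r d e^{jθ} e^{jφ}, the gains and phases have already been derived from the stereo cues, and the function names, the zero-magnitude fallback and the example parameter values are hypothetical.

```python
import numpy as np

def opd_update_factor(g_l1, g_r1, phi_1, g_l2, g_r2, phi_2):
    """Unit OPD-D 310: phase update factor exp(j * Delta_m) from two parameter sets."""
    z = g_l1 * g_l2 + g_r1 * g_r2 * np.exp(1j * (phi_1 - phi_2))
    mag = abs(z)
    if mag == 0.0:
        return 1.0 + 0.0j          # hypothetical fallback: keep the previous OPD
    return z / mag

def track_opd(frames, theta_init=0.0):
    """Accumulate the OPD over frames given as (g_l, g_r, phi) tuples."""
    rot = np.exp(1j * theta_init)
    thetas = [float(np.angle(rot))]
    for (gl1, gr1, p1), (gl2, gr2, p2) in zip(frames, frames[1:]):
        rot = rot * opd_update_factor(gl1, gr1, p1, gl2, gr2, p2)   # multiplier 320
        thetas.append(float(np.angle(rot)))
    return thetas

def upmix(d, g_l, g_r, phi, theta):
    """Assumed up-mix without decorrelation signals."""
    rot = np.exp(1j * theta)
    return g_l * d * rot, g_r * d * rot * np.exp(1j * phi)

# Illustrative parameter sets for three consecutive frames.
frames = [(0.8, 0.6, 0.3), (0.7, 0.7, 1.1), (0.9, 0.4, -0.5)]
thetas = track_opd(frames)
```

The update uses only the parameter sets of two consecutive frames, so no OPD value has to be transmitted; the down-mix signal itself does not enter the calculation.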

The OPD calculation can be initialized in an arbitrary manner, e.g. by taking the first (m = 1) OPD equal to 0.

It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and/or processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate units, processors or controllers may be performed by the same units, processors or controllers. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.

The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.

Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.

Furthermore, although individually listed, a plurality of units, means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus references to "a", "an", "first", "second" etc. do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.

Claims

1. Apparatus for mixing a digital audio comprising n input signal components into a mixed digital audio signal comprising p output signal components, where n and p are integers, the apparatus comprising:

a unit for deriving mixing parameters for subsequent time intervals,

a unit for determining an Overall Phase Difference (OPD) information parameter, and

a mixing unit for mixing the n input signal components into the p output signal components in response to said OPD parameter information and said mixing parameters, characterized in that the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval.

2. Apparatus as claimed in claim 1, characterized in that the OPD parameter information is a differential OPD parameter.

3. Apparatus as claimed in claim 1, characterized in that the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the n input signal components in a sub interval lying in said given time interval and/or said previous time interval.

4. Apparatus as claimed in claim 1, characterized in that the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the p output signal components in a sub interval lying in said given time interval and/or said previous time interval.

5. Apparatus as claimed in claim 1, characterized in that the time intervals are overlapping time intervals.

6. Apparatus as claimed in claim 1 or 5, wherein said previous time interval is an immediately preceding time interval.

7. Apparatus as claimed in claim 1, characterized in that n>p.

8. Apparatus as claimed in claim 7, characterized in that n=2 and p=1.

9. Apparatus as claimed in claim 1, characterized in that n<p.

10. Apparatus as claimed in claim 9, characterized in that n=1 and p=2.

11. Apparatus as claimed in claim 1 or 2, wherein n<p, the apparatus further comprising an input for receiving an OPD parameter value, characterized in that the determining unit is adapted to re-initialize the OPD parameter information on the basis of the received OPD parameter value.

12. Apparatus as claimed in claim 9, further being adapted to receive an indicator signal indicating the receipt of the OPD parameter value, the determining unit being adapted to re-initialize the OPD parameter information in response to said received indicator signal.

13. Apparatus as claimed in any of the preceding claims, wherein the mixing parameters and the OPD parameter value are derived for each of a plurality of frequency bands.

14. Method for mixing a digital audio comprising n input signal components into a mixed digital audio signal comprising p output signal components, where n and p are integers, the method comprising the steps of:

deriving mixing parameters for subsequent time intervals,

determining an Overall Phase Difference (OPD) information parameter, and

mixing the n input signal components into the p output signal components in response to said OPD parameter information and said mixing parameters,

characterized in that the step of determining an Overall Phase Difference (OPD) information parameter is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval.