WO2011039668A1 - Apparatus for mixing a digital audio - Google Patents


Info

Publication number
WO2011039668A1
Authority
WO
WIPO (PCT)
Prior art keywords
opd
time interval
mixing
signal components
mixing parameters
Prior art date
Application number
PCT/IB2010/054164
Other languages
French (fr)
Inventor
Albertus Cornelis Den Brinker
Erik Gosuinus Petrus Schuijers
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2011039668A1

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/02 Systems employing more than two channels, e.g. quadraphonic of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/09 Electronic reduction of distortion of stereophonic sound systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03 Application of parametric coding in stereophonic audio systems

Definitions

  • the invention relates to an apparatus for mixing a digital audio.
  • Down-mixing is an often applied technique.
  • having a stereo signal which one wants to render over a mono system one can take the mid signal (left plus right) and feed it to the loudspeaker.
  • Down-mixing is also applied in Parametric Stereo (PS) coding and its extension of multi-channel coding (e.g., MPEG Surround: MPS).
  • PS Parametric Stereo
  • MPS MPEG Surround
  • stereo cues inter-channel level differences, time- or phase-differences and coherence
  • time- frequency tile typically a Bark or ERB band division of the frequency axis
  • the processing is reversed.
  • the down-mix signal is processed (per time-frequency tile) to create two signals based on the stereo cues. This process is known as up-mixing.
  • De-correlation signals are typically added to create the desired measure of (in)coherence.
  • the stereo cues that are transmitted are the Inter-channel Level Difference (ILD), the Inter-channel Phase Difference (IPD) and the Inter-channel Coherence (ICC). Additionally, an Overall Phase Difference (OPD) may be transmitted or, alternatively, may be calculated in the decoder if the method of down-mixing in the encoder is fixed and known to the decoder (see Breebaart, J., van de Par, S., Kohlrausch, A., and Schuijers, E. (2005). Parametric coding of stereo audio. Eurasip J. Applied Signal Proc. 9, 1305-1322, and granted European patent EP 1595247).
  • ILD Inter-channel Level Difference
  • IPD Inter-channel Phase Difference
  • ICC Inter-channel Coherence
  • OPD Overall Phase Difference
  • the down-mixing is often a so-called passive down-mix (i.e., the mean of left and right signals).
  • passive down-mix i.e., the mean of left and right signals.
  • quality issues when creating a phase aligned down-mix. If the inter-channel phase-difference is measured, there is an ambiguity whether to align the phase of the left to the right or vice versa. Also trying to shift the phase of both equally but in opposite directions leads to ambiguity. On top of that, the phase difference is numerically ill-conditioned when the correlation is low. Overall this leads to additional artifacts when creating a down-mix by phase-alignment, most notably modulations on tonal components.
  • One aspect of the invention proposes an apparatus for mixing a digital audio comprising n input signal components into a mixed digital audio signal comprising p output signal components, where n and p are integers, the apparatus comprising:
  • a mixing unit for mixing the n input signal components into the p output signal components in response to said OPD parameter information and said mixing parameters
  • the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval.
  • the invention targets a better down-mix procedure where the objective is to overcome the problems of phase-alignment process and thus to be able to pull out-of-phase signal components into the down-mix.
  • the main advantage is the improved quality.
  • a second advantage (dependent on the specific implementation and desired normalization of the down- mix signal) is that the rescaling of the amplitude of the signal need not be undertaken anymore, because the down-mix can be automatically scaled to the coherent energy (since signal and energy cancellation no longer occur).
  • the invention also targets a blind OPD estimation in the decoder by the same principles underlying the OPD determination in the proposed (down/up-)mixing procedure. This is advantageous because the OPD need not be transmitted, nor does any assumption about the down-mix need to be made.
  • the blind up-mix is the preferred up-mix method associated with the proposed down-mix.
  • An alternative would be to measure with the new down-mix an OPD in a standard way in the encoder, e.g., by measuring the phase-alignment of the created down-mix with the left (or right) signal. The OPD can then be transmitted.
  • this introduces the already noted problem that the correlation between the down-mix and left (or right) signal may be low and thus the OPD inaccurate.
  • the blind up-mix has two advantages over existing methods when dealing with a down-mix created by the proposed method: no bit rate needs to be spent on transmitting OPD information and quality issues due to OPD
  • the invention is based on the following insights. Creating a phase-aligned down-mix is desired in order to prevent loss of energy. The necessary difference phase is easily measured and is called the IPD (Inter-channel Phase Difference). However, generating the so-called Overall Phase Difference poses a problem. It can be defined by e.g. phase- alignment to the left signal, right signal or mid signal but in all of these cases signals can be determined where the OPD is ill-defined due to the fact that the signal to which the alignment takes place is small or even zero.
  • an OPD in the up-mix can be derived.
  • the basic idea here is that the OPD is used to attain phase alignment between frames; it is in fact not associated with creating a proper stereo image; this latter is defined by the stereo cues (ILD, IPD and ICC) rather than by the OPD.
  • Current OPD definitions rely on phase alignment to signals existing in the encoder. The current OPD definition deviates from that and therefore the phases of the encoded and decoded signal may vary significantly. This however, is in no way detrimental to the stereo image; the inter-channel phase differences are kept intact by the IPD.
  • the OPD of the current frame is an updated version of that of the previous frame such that a maximum phase alignment between the decoded signals of consecutive frames is realized.
  • phase alignment between consecutive frames of the output (up-mixed or down-mixed) signals.
  • the current output signal(s) are phase-aligned to past output signal(s) instead of phase-aligned to a signal or signals (or components thereof) in the current frame as is the current state of the art.
  • the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the n input signal components in a sub interval lying in said given time interval and/or said previous time interval.
  • the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the m output signal components in a sub interval lying in said given time interval and/or said previous time interval.
  • the time intervals are overlapping time intervals.
  • said previous time interval is an immediately preceding time interval.
  • n m
  • n ⁇ m.
  • the determining unit is adapted to re-initialize the OPD parameter information on the basis of the received OPD parameter value.
  • the determining unit being adapted to reinitialize the OPD parameter information in response to said received indicator signal.
  • the mixing parameters and the OPD parameter value are derived for each of a plurality of frequency bands.
  • Fig. 1 An example block diagram of an apparatus for mixing digital audio according to the invention;
  • Fig. 2 An example of time intervals used for derivation of OPD parameter information;
  • Fig. 3 An example of weighting functions used for derivation of OPD parameter information;
  • Fig. 4 An example of time intervals where overlapping intervals are smaller than half the length of the time interval;
  • Fig. 5 A first example down-mixing scheme according to the invention for two input signal components;
  • Fig. 6 A second example down-mixing scheme according to the invention for two input signal components;
  • Fig. 7 A detailed second example down-mixing scheme according to the invention for two input signal components;
  • Fig. 8 An example of OPD calculation at the up-mixing side.
  • Stereo or multi-channel encoding and decoding is typically done in subbands where complex signals appear; the description of the invention applies to the processing for a single sub-band only.
  • a mixing process is a linear mapping from one set of signals to another set. This can be written as:
  • the matrix H is usually a signal-dependent matrix which is varying at a much lower rate than the signal itself. In audio coding this is exploited by measuring and transmitting the matrix information at a much lower update rate than the signals.
  • the matrix entries are a function of an Interchannel Level Difference (ILD), an Interchannel Phase Difference (IPD) and an Interchannel Coherence (ICC). These three parameters are typically measured in the audio encoder from the signals $x_i$ and transmitted to the decoder such that the mixing process can be reversed.
  • the OPD is usually defined as a phase difference between one of the signals $x_i$ and one of the signals $y_l$.
  • the invention deviates from this in order to improve the quality of the mixed signals, and also to define an appropriate OPD in the decoder in absence of knowledge of the encoding OPD definition.
  • the invention defines the OPD by requiring that the resulting mix (up- or down-mix) signal is as smooth as possible.
  • the smoothness is measured from the phase alignment between one or more (consecutive) output signals.
  • the signals y 2J (k) are defined.
  • the average covariance $c$ can be introduced which is defined as:
  • the window may depend on the index $l$.
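  • $c$ is proportional to a phase factor of the form $e^{\pm j(\theta_{m+1} - \theta_m)}$ (the sign depending on which signal is conjugated in $c$), where $\theta_{m+1}$ is the OPD of frame $m + 1$.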
  • $x(k)$ denotes the vector of signals $x_i(k)$ at instant $k$ and $*$ denotes Hermitian transposition. Setting the phase of $c$ equal to zero, which corresponds to an average phase alignment of the signals, uniquely defines $\theta_{m+1}$ as a function of the input signals $x_i(k)$, the matrices $H_1$ and $H_2$, and the previous OPD $\theta_m$.
  • a similar principle can be constructed by not only considering a single previous frame ( m ) but also by using more previous frames.
  • the first OPD (at the beginning of mixing process) can be chosen arbitrarily.
  • a standard definition of an OPD could be used, i.e., an OPD based on phase-alignment of the one of the mixed signals with any of the input signals.
  • the OPD calculation according to the invention may be re-initialized at certain frame with the OPD according to a standard definition.
  • the OPD parameters may be transmitted.
  • phase parameters such as IPD and OPD in the upmix may contribute significantly to the perceived quality in PS (Parametric Stereo) based audio codecs and may in particular substantially improve sound source localization.
  • the OPD parameter is indicative of the phase offset between the downmix and at least one of the stereo channels and it thus reflects how the phase should be distributed between the channels.
  • the OPD may accordingly be included in the encoded signal by an encoder.
  • although the OPD can be transmitted from encoder to decoder using a relatively limited bit budget, this approach does increase the overall data rate for the signal. Therefore, an OPD estimation may be performed at the decoder side such that the OPD value is not included in the encoded signal but is instead calculated by the decoder from the other parameter values.
  • the OPD may e.g. be calculated from (ref. e.g. Jimmy Lapierre and Roch Lefebvre, "On Improving Parametric Stereo Audio Coding", Presented at the 120 th
  • discontinuities can occur in time (e.g. if IPD changes in time from just below to just above $\pi$).
  • time averaging e.g. as part of FFT or QMF windowing
  • this may lead to cancellation of the output signal.
  • audible artifacts perceived as 'clicks' or 'warbling' sounds.
  • the IPD changes (and thus sign inversions) may occur in the frequency domain between one subband and the next (e.g. if IPD in one band is just below $\pi$ and just above $\pi$ in a neighboring band). This may similarly result in noticeable artifacts.
  • the encoder may include signaling information in the encoded stereo signal indicating whether an OPD parameter should purely be estimated by the decoder or should be replaced (or compensated) by a phase correction parameter that is included in the encoded stereo signal.
  • signaling information may be provided for each segment of the parametrically encoded signal.
  • the encoded signal may be segmented in typically the time domain when being encoded.
  • the presence indication may simply indicate whether there are any phase correction parameter values for the current segment.
  • the presence indication can be a single bit denoting that for the current frame all time frequency blocks can be estimated reliably by the decoder. This may provide a very low data rate overhead (possibly a single bit per segment) and may reduce the complexity and/or resource usage of the decoder.
  • a more detailed presence indication may be used.
  • the presence indication may comprise individual presence indications for a plurality of sets of time frequency blocks of the down-mix.
  • each set may correspond to one time frequency block for which individual PS parameters are provided. Further the sets may cover all time frequency blocks of the signal. Thus, specifically a single presence indication bit may be included for each parameter time frequency block indicating whether e.g. the OPD for the block can be purely estimated by the decoder or whether it must take into account a phase correction parameter provided for the block. It will be appreciated that in many embodiments, the phase correction parameter may indeed be provided for each time frequency block that belongs to the second parts of the downmix.
  • the OPD information is estimated from the other PS parameters or is decoded from the bit-stream.
  • the latter case may still employ the estimated data, depending on the coding scheme, e.g. by transmitting the difference between the estimated OPD and the OPD derived in the decoder.
  • Fig. 1 shows a block diagram of an apparatus 500 for mixing n digital input signals, denoted as $x_1, \ldots, x_n$ 501, into m digital output signals, denoted as $y_1, \ldots, y_m$ 502.
  • the apparatus 500 contains a unit for deriving mixing parameters 520 for subsequent time intervals.
  • This derivation may comprise external control data, such as e.g. a matrix with time- varying gains mapping the input channels to the output channels, or it may comprise decoding a digital bit-stream comprising mixing parameters.
  • These mixing parameters may consist directly of a matrix with time-varying gains, but may also consist of a set of spatial image parameters that describe desired relationships of the output channels.
  • the unit 530 determining the Overall Phase Difference (OPD) information parameter for a given time interval advantageously employs mixing parameters in said given time interval and the mixing parameters in a previous time interval, both indicated in the figure by 503.
  • the mix unit 510 mixes the n input signals 501 into p output signals 502 in response to the OPD information parameter 505 and the mixing parameters 504. This allows creating a phase- aligned mix of the output signals, thereby reducing artefacts as known by state-of-the-art methods that do not apply phase alignment of subsequent time intervals.
  • the down-mix (typically per frame) is a weighted sum of the left and right signal.
  • the proposed processing will be described on a frame basis.
  • a prototype window w is defined.
  • a shift parameter U also referred to as an update parameter, is also defined.
  • the window has finite support with its support larger than U such that processing is done in overlapping frames.
  • the signals $\alpha_l$, $\alpha_r$ and $A$ need to be determined. Instead of these three components only two weights, $A_l$ and $A_r$, need to be determined.
  • For a first example down-mix scheme it is, however, for illustration purposes more convenient to treat these parameters separately.
  • the total down-mix signal is the sum of the frame down-mixes
  • the invention is now that the weights $\alpha_l$ and $\alpha_r$ are determined by phase alignment of the signals $l_m$ and $r_m$ with $d_{m-1}$. This is shown in Fig. 5.
  • the signal components can be wideband signal components or narrow band (or sub band) signal components. In the latter case, the signal components are the signal components in one and the same of the sub bands.
  • Time intervals ..., $T_i$, $T_{i+1}$, $T_{i+2}$, ... are defined as given in Fig. 2. In this example, the time intervals overlap by half of the length of the time intervals. This, however, is not a necessity. The time intervals can also overlap by more or less than half of the length of the time intervals.
  • the signal components of the n input signal components in the time interval $T_i$ are used to derive the mixing matrix $H_i$ and thus the matrix coefficients in the mixing matrix $H_i$.
  • the matrix H reduces to a row matrix.
  • the phase difference between each channel and the first channel can be measured (IPD parameter) and the entries in the matrix may be chosen such that they compensate for the phase differences. In this way the down-mix becomes a phase-aligned down-mix where no destructive phase cancellation occurs.
  • the signal components of the n input signal components in the time interval $T_{i+1}$ are used to derive the mixing matrix $H_{i+1}$ and thus the matrix coefficients in the mixing matrix
  • the first step is actually carried out in this way in case n > p, or in case of the apparatus being a down mixing apparatus.
  • the matrix coefficients of the matrices $H_i$ and/or the coefficients such as ILD, IPD and ICC from which the matrixing coefficients can be derived are supplied to the upmixing apparatus, for each of the subsequent time intervals ..., $T_i$,
  • the matrix parameters (the mixing matrices ..., $H_i$, $H_{i+1}$, ... or their equivalents in the form of the ILD, IPD and ICC parameters per time interval $T_i$) have been obtained from either the above described method or extracted from an incoming bitstream. Further, Overall Phase Difference information is derived as explained above and is now available in the form of $\theta_i$, $\theta_{i+1}$, ... for subsequent overlapping time intervals ..., $OLT_i$, $OLT_{i+1}$,
  • $Y(k) = \exp(j\theta_i)\, A\, H_i\, X(k) + \exp(j\theta_{i+1})\, B\, H_{i+1}\, X(k)$
  • A and B are weighting (or windowing) functions that behave as a function of time (k) as indicated in Fig. 3, where A varies for increasing time in the overlapping time interval $OLT_i$ from a maximum value (e.g. 1 or 1/2) to a value of zero, whereas B varies for increasing time in the overlapping time interval $OLT_i$ from a value of zero to the same maximum value.
  • the overall phase correction carried out by the exponent term in the formula is each time updated with the next $\theta$ value.
  • the $\theta$ values could be considered as differential OPD information. It will be evident that, instead of generating absolute OPD information for subsequent time intervals (..., $\theta_i$, $\theta_{i+1}$, ...), it is equally well possible to each time generate differential OPD information for subsequent time intervals (..., $\theta_i - \theta_{i-1}$, $\theta_{i+1} - \theta_i$, ...).
  • the derivation of the overall phase difference parameter information $\theta_i$ for the subsequent overlapping time intervals $OLT_i$ is done in the same way as described above, based on at least the portions of the signal components X in those overlapping time intervals.
  • Fig. 5 shows an example down-mix device according to the invention.
  • the signals $l_m$ 101 and $d_{m-1}$ 103 are fed into a block PA 110 which measures a phase alignment and determines the coefficient $\alpha_l$.
  • $\alpha_r$ is determined.
  • the signals $r_m$ 102 and $d_{m-1}$ 103 are fed into a block PA 120 which measures a phase alignment and determines the coefficient $\alpha_r$.
  • $\alpha_l$ and $\alpha_r$ only change the phase, because $|\alpha_l| = 1$ and $|\alpha_r| = 1$.
  • the coefficients $\alpha_l$ and $\alpha_r$ are used for amplification of signals $l_m$ 101 and $r_m$ 102, respectively, in gain blocks 130 and 140, respectively.
  • a second window w 0 overlap window
  • w m overlap window
  • cross-correlations can be determined as an alternative measure:
  • weights $\alpha_l$ and $\alpha_r$ can be determined from these measures. For example $\alpha_l$ and $\alpha_r$ can be determined as:
  • As alignment target, it is also possible to create from $d_{m-1}$ a second signal (e.g., a prediction) of the down-mix in the current frame and align to this signal. It is also possible to use the sum and difference signals ($l_m + r_m$ and $l_m - r_m$) instead of $l_m$ and $r_m$ themselves.
  • the left and right signals can first be phase-aligned among themselves and subsequently added.
  • an OPD can be calculated such that the current down-mix signal aligns with the past down-mix signal $d_{m-1}$.
  • a first unit (IPD) 210 receives the left signal 101 and right signal 102 of frame $m$, and calculates the IPD of that frame, which is indicated as a signal $ipd_m$ 201. Subsequently the signal 201 can be taken up in an encoded bit stream. On the basis of the measured ipd, the two signals are phase-aligned and output as $l_m$ 203 and $r_m$ 204. The signals 203 and 204 are subsequently added in an adder 220 to form a preliminary down-mix $d_m$ 205. Next, in a unit 230 an OPD is calculated and applied such that the resulting down-mix $d_m$ 207 is aligned with the previous down-mix $d_{m-1}$ 206 in the overlap region.
  • A more detailed sketch of this down-mixing scheme is provided in Fig. 7.
  • the signals $l_m$ 101 and $r_m$ 102 are fed to an alignment measurement unit P1 indicated as 211, which measures the phase alignment between the left signal 101 and right signal 102.
  • the output is the ipd 201, which can be taken up in the bit stream.
  • the right signal 102 is aligned to the left signal 101 by the multiplier 212 that creates signal $r_m$ 204.
  • An adder 220 adds the signals 204 and 101.
  • the preliminary down-mix $d_m$ 205 is compared with the previous down-mix $d_{m-1}$ 206 in unit P2 indicated as
  • the unit 231 has the same character as the units PA from Fig. 5.
  • the output of PA is a multiplication factor (typically a complex number on the unit circle) 208, which drives a multiplier 232 and transforms the preliminary down-mix 205 to the final down-mix $d_m$ 207, which is phase-aligned with $d_{m-1}$.
  • output energy (energy of $d_m$) is automatically equal to the coherent input energy: i.e.,
  • $K_1$ is the time index associated with the $m$-th data set of stereo cues ILD, IPD, ICC and $K_2$ the instant of the $(m+1)$-st set. These are available at the decoder.
  • Gains $g_{l1}$, $g_{l2}$, $g_{r1}$, $g_{r2}$ and ipd phases $\varphi_1$, $\varphi_2$ can be derived from the stereo cues. This calculation depends on whether the down-mix has been normalized to the sum energy of the input signal or another normalization like the normalization to the coherent energy. The following terminology is used: an OPD at instant $m$ is called $\theta_m$, and the down-mix signal is called $d(k)$ in the time interval $K_1 \le k \le K_2$.
  • An objective is to calculate $\theta_{m+1}$. This is achieved by requiring that the up-mix signals in the overlap area created by the first parameter set are aligned as well as possible with the up-mix signals created by using the second parameter set. Instead of measuring the alignment as is done in the down-mix schemes, it can be shown that the optimal OPD alignment change can be calculated from the transmitted stereo cues of two consecutive frames.
  • the processing related to the above calculation is shown in Fig. 8.
  • the unit OPD-D indicated as 310 receives the stereo cues from the present frame 302 and the stereo cues from the next frame 301.
  • the unit 310 generates the phase update factor in the form of $e^{j\Delta_m}$ indicated as 303.
  • the stereo cues are translated to the gains $g$ and the phases $\varphi$ after which the above expression for $e^{j\Delta_m}$ can be used.
  • the invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.
  • the invention may optionally be
  • an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
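
The second example down-mixing scheme listed above (Figs. 6 and 7) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the function names `correlation`, `unit_phasor` and `downmix_frame` are hypothetical, the samples passed in are taken to be complex sub-band samples of the overlap region only, windowing is omitted, and the sign convention of the IPD is chosen for convenience.

```python
import cmath


def correlation(a, b):
    """Return the sum over k of a[k] * conj(b[k])."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def unit_phasor(c):
    """Return c / |c|, or 1 when c is zero and carries no phase information."""
    magnitude = abs(c)
    return c / magnitude if magnitude > 0.0 else 1.0 + 0.0j


def downmix_frame(left, right, prev_downmix):
    """Down-mix one frame and align it with the previous down-mix.

    Returns (ipd, phase_factor, downmix). All names are assumptions
    of this sketch.
    """
    # Step 1 (unit 211 in Fig. 7): measure the IPD of the current frame.
    ipd = cmath.phase(correlation(left, right))
    # Step 2 (multiplier 212, adder 220): align right to left and add.
    rotation = cmath.exp(1j * ipd)
    preliminary = [l + rotation * r for l, r in zip(left, right)]
    # Step 3 (unit 231): unit-circle factor that aligns the preliminary
    # down-mix with the previous down-mix in the overlap region.
    phase_factor = unit_phasor(correlation(prev_downmix, preliminary))
    # Step 4 (multiplier 232): apply the factor to obtain the final down-mix.
    downmix = [phase_factor * d for d in preliminary]
    return ipd, phase_factor, downmix
```

With this choice of factor, the correlation between the previous down-mix and the final down-mix is real and non-negative, which is one way to express that the two are phase-aligned on average.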

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Algebra (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The invention proposes an apparatus for mixing a digital audio comprising n input signal components into a mixed digital audio signal comprising p output signal components, where n and p are integers. The proposed apparatus comprises a unit for deriving mixing parameters for subsequent time intervals, a unit for determining an Overall Phase Difference (OPD) information parameter, and a mixing unit for mixing the n input signal components into the p output signal components in response to said OPD parameter information and said mixing parameters. The determining unit of the proposed apparatus is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval.

Description

Apparatus for mixing a digital audio
FIELD OF INVENTION
The invention relates to an apparatus for mixing a digital audio.
DESCRIPTION OF THE PRIOR ART
Down-mixing is an often applied technique. In its simplest form, having a stereo signal which one wants to render over a mono system, one can take the mid signal (left plus right) and feed it to the loudspeaker.
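
As a minimal illustrative sketch of this simplest form, the mid signal can be computed sample by sample as shown below. The function names `mid_signal` and `energy` and the test tone are hypothetical, and real-valued time-domain samples are assumed.

```python
def mid_signal(left, right):
    """Return the mid signal (left plus right), sample by sample."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    return [l + r for l, r in zip(left, right)]


def energy(signal):
    """Return the sum of squared magnitudes of the samples."""
    return sum(abs(s) ** 2 for s in signal)


# Hypothetical test tone: identical channels add up, inverted channels cancel.
tone = [1.0, 0.0, -1.0, 0.0]
in_phase_mix = mid_signal(tone, tone)
anti_phase_mix = mid_signal(tone, [-s for s in tone])
```

The second call shows the cancellation that motivates phase alignment: a component that is exactly out of phase between the channels leaves no energy in the mid signal.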
Down-mixing is also applied in Parametric Stereo (PS) coding and its extension of multi-channel coding (e.g., MPEG Surround: MPS). In the PS case, stereo cues (inter-channel level differences, time- or phase-differences and coherence) are determined per time-frequency tile (typically a Bark or ERB band division of the frequency axis) and transmitted to the decoder together with a down-mix signal.
In a Parametric Stereo decoder, the processing is reversed. The down-mix signal is processed (per time-frequency tile) to create two signals based on the stereo cues. This process is known as up-mixing. De-correlation signals are typically added to create the desired measure of (in)coherence.
The stereo cues that are transmitted are the Inter-channel Level Difference (ILD), the Inter-channel Phase Difference (IPD) and the Inter-channel Coherence (ICC). Additionally, an Overall Phase Difference (OPD) may be transmitted or, alternatively, may be calculated in the decoder if the method of down-mixing in the encoder is fixed and known to the decoder (see Breebaart, J., van de Par, S., Kohlrausch, A., and Schuijers, E. (2005). Parametric coding of stereo audio. Eurasip J. Applied Signal Proc. 9, 1305-1322, and granted European patent EP 1595247).
It is well-known that creating the mid signal typically results in somewhat dull signals, i.e., less brightness/high-frequency content. The reason for this is that in typical audio signals, the low-frequencies are time-aligned whereas at high frequencies they are not. Direct summation of the two stereo channels effectively suppresses the non-aligned signal components. This has been appreciated when constructing the down-mixes for PS (or MPS). Textbooks such as e.g. J. Breebaart and C. Faller. Spatial audio processing. Chichester (UK): Wiley, 2007. Pages 76-77, state that phase alignment must be done to ensure best results. However, in practice, the down-mixing is often a so-called passive down-mix (i.e., the mean of left and right signals). There are several reasons for this, like complexity and algorithmic delay (J. Lapierre and R. Lefebvre. On improving parametric stereo audio coding. 120th AES Convention, Paris (F), 20-23 May 2006). But there are also quality issues when creating a phase aligned down-mix. If the inter-channel phase-difference is measured, there is an ambiguity whether to align the phase of the left to the right or vice versa. Also trying to shift the phase of both equally but in opposite directions leads to ambiguity. On top of that, the phase difference is numerically ill-conditioned when the correlation is low. Overall this leads to additional artifacts when creating a down-mix by phase-alignment, most notably modulations on tonal components.
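
The measurement of the inter-channel phase difference and the reason it becomes ill-conditioned at low correlation can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the samples are assumed to be complex sub-band samples of one time-frequency tile, and both the sign convention of the IPD and the coherence measure (normalized magnitude of the cross-correlation) are assumptions of the sketch.

```python
import cmath
import math


def inter_channel_correlation(left, right):
    """Return the sum over k of left[k] * conj(right[k])."""
    return sum(l * r.conjugate() for l, r in zip(left, right))


def ipd_and_coherence(left, right):
    """Return (IPD in radians, coherence in [0, 1]) for one tile."""
    c = inter_channel_correlation(left, right)
    e_left = sum(abs(l) ** 2 for l in left)
    e_right = sum(abs(r) ** 2 for r in right)
    norm = math.sqrt(e_left * e_right)
    coherence = abs(c) / norm if norm > 0.0 else 0.0
    # When |c| is small relative to norm, the phase of c is dominated
    # by noise, which is the ill-conditioning noted in the text.
    return cmath.phase(c), coherence


def align_right_to_left(left, right):
    """Rotate the right channel by the IPD and add it to the left channel."""
    ipd, _ = ipd_and_coherence(left, right)
    rotation = cmath.exp(1j * ipd)
    return [l + rotation * r for l, r in zip(left, right)]
```

Aligning right to left is only one of the choices mentioned above; aligning left to right, or rotating both channels by half the IPD in opposite directions, would change the overall phase of the result, which is the ambiguity the text describes.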
All of these reasons have led to the adoption of a passive or active down-mix. In a passive down-mix, the down-mix signal is the average of left and right. Unfortunately, passive down-mixing also has some problems. One of these (as already noted) is that acoustic energy can get lost (Samsudin, Evelyn Kurniawati, Farook Sattar, Ng Boon Poh, and Sapna George. A subband domain downmixing scheme for parametric stereo encoder. 120th AES Convention, Paris (F), 20-23 May 2006. Convention Paper 6815, or J. Breebaart and C. Faller. Spatial audio processing. Chichester (UK): Wiley, 2007. Pages 76-77). Several methods to compensate have been proposed, like active down-mixing (rescaling the down-mix) or decoder-side energy compensation (J. Lapierre and R. Lefebvre. On improving parametric stereo audio coding. 120th AES Convention, Paris (F), 20-23 May 2006). The compensation is on a rather global level and does not discriminate between tonal components (where compensation is necessary) and noise. However, in both passive and active down-mixes, out-of-phase components are completely absent in the down-mix signal. For active down-mixes this leads to numerical problems when determining the re-scaling and/or artifacts in the decoded signal.
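The energy loss of a passive down-mix can be illustrated with a small numerical sketch. It assumes a single complex sub-band tone in which the right channel is a phase-shifted copy of the left channel; the function names and the test tone are illustrative and not taken from the cited references.

```python
import cmath
import math

def passive_downmix(left, right):
    """Passive down-mix: the mean of the left and right samples."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def energy(signal):
    """Sum of squared magnitudes of a complex signal."""
    return sum(abs(s) ** 2 for s in signal)

# Hypothetical test signal: one complex sub-band tone of 64 samples.
n_samples = 64
left = [cmath.exp(1j * 0.3 * k) for k in range(n_samples)]

for ipd in (0.0, math.pi / 2, math.pi):
    # The right channel is the left channel rotated by the inter-channel phase difference.
    right = [l * cmath.exp(1j * ipd) for l in left]
    print(f"ipd={ipd:.2f}  down-mix energy={energy(passive_downmix(left, right)):.3f}")
```

In this sketch the down-mix energy shrinks as the phase difference grows and practically vanishes for out-of-phase channels, which is the loss that rescaling alone cannot repair.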
SUMMARY OF THE INVENTION
Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination. The invention is defined by the independent claims. The dependent claims define advantageous embodiments. One aspect of the invention proposes an apparatus for mixing a digital audio signal comprising n input signal components into a mixed digital audio signal comprising p output signal components, where n and p are integers, the apparatus comprising:
- a unit for deriving mixing parameters for subsequent time intervals,
- a unit for determining an Overall Phase Difference (OPD) information parameter, and
- a mixing unit for mixing the n input signal components into the p output signal components in response to said OPD parameter information and said mixing parameters,
characterized in that the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval.
The invention targets a better down-mix procedure where the objective is to overcome the problems of the phase-alignment process and thus to be able to pull out-of-phase signal components into the down-mix. The main advantage is the improved quality. A second advantage (dependent on the specific implementation and desired normalization of the down-mix signal) is that the rescaling of the amplitude of the signal need not be undertaken anymore, because the down-mix can be automatically scaled to the coherent energy (since signal and energy cancellation no longer occur).
The invention also targets a blind OPD estimation in the decoder by the same principles underlying the OPD determination in the proposed (down/up-)mixing procedure. This is advantageous because the OPD need not be transmitted, nor does any assumption about the down-mix need to be made.
The blind up-mix is the preferred up-mix method associated with the proposed down-mix. An alternative would be to measure with the new down-mix an OPD in a standard way in the encoder, e.g., by measuring the phase-alignment of the created down-mix with the left (or right) signal. The OPD can then be transmitted. However, this introduces the already noted problem that the correlation between the down-mix and left (or right) signal may be low and thus the OPD inaccurate. This means that the blind up-mix has two advantages over existing methods when dealing with a down-mix created by the proposed method: no bit rate needs to be spent on transmitting OPD information and quality issues due to OPD inaccuracies cannot occur.
The invention is based on the following insights. Creating a phase-aligned down-mix is desired in order to prevent loss of energy. The necessary difference phase is easily measured and is called the IPD (Inter-channel Phase Difference). However, generating the so-called Overall Phase Difference poses a problem. It can be defined by e.g. phase-alignment to the left signal, right signal or mid signal but in all of these cases signals can be determined where the OPD is ill-defined due to the fact that the signal to which the alignment takes place is small or even zero.
Therefore, instead of aligning the two signals amongst themselves, it is proposed to align the signals to some third signal which is not derived from the signal pieces which have to be aligned. A very suitable signal is the already created down-mix signal (in particular, that of a previous frame) since in that way smoothness of the down-mix signal itself can be guaranteed. Furthermore, the correlation in the overlap region of two consecutive frames is necessarily high due to the fact that they are generated from the same signal pieces.
This prevents the problems of phase alignment of signals having a low correlation (except for the zero-signal which, fortunately, can be treated in rather arbitrary ways).
In a similar way, an OPD in the up-mix (decoder) can be derived. The basic idea here is that the OPD is used to attain phase alignment between frames; it is in fact not associated with creating a proper stereo image; the latter is defined by the stereo cues (ILD, IPD and ICC) rather than by the OPD. Existing OPD definitions rely on phase alignment to signals existing in the encoder. The OPD definition of the present invention deviates from that and therefore the phases of the encoded and decoded signal may vary significantly. This, however, is in no way detrimental to the stereo image; the inter-channel phase differences are kept intact by the IPD. In particular, the OPD of the current frame is an updated version of that of the previous frame such that a maximum phase alignment between the decoded signals of consecutive frames is realized.
The essence of the present invention in up- and down-mixing is the phase alignment (OPD) between consecutive frames of the output (up-mixed or down-mixed) signals. In particular, the current output signal(s) are phase-aligned to past output signal(s) instead of phase-aligned to a signal or signals (or components thereof) in the current frame as is the current state of the art.
In an embodiment, the OPD parameter information is a differential OPD parameter.
In a further embodiment, the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the n input signal components in a sub interval lying in said given time interval and/or said previous time interval.
In a further embodiment, the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the p output signal components in a sub interval lying in said given time interval and/or said previous time interval.
In a further embodiment, the time intervals are overlapping time intervals. In a further embodiment, said previous time interval is an immediately preceding time interval.
In a further embodiment, n>p.
In a further embodiment, n=2 and p=1.
In a further embodiment, n<p.
In a further embodiment, n=1 and p=2.
In a further embodiment, for n<p and the apparatus further comprising an input for receiving an OPD parameter value, the determining unit is adapted to re-initialize the OPD parameter information on the basis of the received OPD parameter value.
In a further embodiment, the apparatus is further adapted to receive an indicator signal indicating the receipt of the OPD parameter value, the determining unit being adapted to re-initialize the OPD parameter information in response to said received indicator signal.
In a further embodiment, the mixing parameters and the OPD parameter value are derived for each of a plurality of frequency bands.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be further elucidated in the figures:
Fig. 1 An example block diagram of an apparatus for mixing digital audio according to the invention;
Fig. 2 An example of time intervals used for derivation of OPD parameter information;
Fig. 3 An example of weighting functions used for derivation of OPD parameter information;
Fig. 4 An example of time intervals where overlapping intervals are smaller than half the length of the time interval;
Fig. 5 A first example down-mixing scheme according to the invention for two input signal components;
Fig. 6 A second example down-mixing scheme according to the invention for two input signal components;
Fig. 7 A detailed second example down-mixing scheme according to the invention for two input signal components;
Fig. 8 An example of OPD calculation at the up-mixing side.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
Stereo or multi-channel encoding and decoding is typically done in sub-bands where complex-valued signals appear. The description of the invention applies to the processing for a single sub-band only. In the notation used across the description of the invention, * is used to denote a complex conjugate (or transposition and conjugation in case of vectors and matrices) and $j = \sqrt{-1}$.
A mixing process is a linear mapping from one set of signals to another set. This can be written as:
$$\begin{pmatrix} y_1(k) \\ \vdots \\ y_p(k) \end{pmatrix} = H \begin{pmatrix} x_1(k) \\ \vdots \\ x_n(k) \end{pmatrix}$$
If p < n the process is called down-mixing, if p > n the process is called up-mixing. The matrix H is usually a signal-dependent matrix which varies at a much lower rate than the signal itself. In audio coding this is exploited by measuring and transmitting the matrix information at a much lower update rate than the signals. In audio coding, the matrix entries are a function of an Interchannel Level Difference (ILD), an Interchannel Phase Difference (IPD) and an Interchannel Coherence (ICC). These three parameters are typically measured in the audio encoder from the signals $x_i$ and transmitted to the decoder such that the mixing process can be reversed. Since the update of the mixing parameters is at a lower rate than that of the signal, the concept of a frame where a single mixing parameter is associated with a set of time indices is common. These frames are often overlapping (or may be viewed as such), and output signals from a mixer may be generated in an overlap-add fashion.
The OPD is usually defined as a phase difference between one of the signals $x_i$ and one of the signals $y_l$. The invention deviates from this in order to improve the quality of the mixed signals, and also to define an appropriate OPD in the decoder in absence of knowledge of the encoding OPD definition.
The invention defines the OPD by requiring that the resulting mix (up- or down-mix) signal is as smooth as possible. In particular, the smoothness is measured from the phase alignment between one or more (consecutive) output signals.
Two consecutive frames and a general idea of how to arrive at an OPD are considered in the description of the current invention. The concept can nevertheless be extended from the current frame and the previous frame to the current frame and a set of past frames. Suppose that two mixing matrices are given, called $H_m$ and $H_{m+1}$ (written $H_1$ and $H_2$ below), corresponding to the $m$-th and $(m+1)$-st frame. Consider an associated time interval such that if $K_1$ is the mid of frame $m$ and $K_2$ the mid of frame $m+1$, $K_1 < k \le K_2$ is considered as the overlap region. However, other definitions of a time interval are feasible as well. It is also very common to consider a window defined on such an interval, which is denoted by $w_o(k)$.
Consider the signals $y_l$ on the overlap region that would be created from the signals $x_i$ if the mixing matrix $H_1$ is applied. These signals are denoted by $y_{1,l}(k)$. Similarly, the signals $y_{2,l}(k)$ are defined. To optimize the smoothness (in other words, to minimize discontinuity of the output signals over the overlap region), the average covariance $c$ can be introduced, which is defined as:
$$c = \sum_{k=K_1+1}^{K_2} w_o(k) \sum_{l=1}^{p} y_{1,l}^*(k)\, y_{2,l}(k)$$
However, other measures like e.g. an average correlation coefficient can be used as well. Also, if desired, the window may depend on the index $l$.
It is now noted that $c$ is proportional to $e^{j\theta_{m+1}}$ where $\theta_{m+1}$ is the OPD of the $(m+1)$-st frame. In particular:
$$c = e^{j(\theta_{m+1} - \theta_m)} \sum_{k=K_1+1}^{K_2} w_o(k)\, x^*(k)\, H_1^* H_2\, x(k)$$
where $x(k)$ denotes the vector of signals $x_i(k)$ at instant $k$ and * denotes Hermitian transposition. Setting the phase of $c$ equal to zero, which corresponds to an average phase alignment of the signals, uniquely defines $\theta_{m+1}$ as a function of the input signals $x_i(k)$, the matrices $H_1$ and $H_2$ and the previous OPD $\theta_m$.
If n = 1, setting the phase of $c$ to zero uniquely defines $\theta_{m+1}$ as a function of the matrices $H_1$ and $H_2$ and the previous OPD ($\theta_m$).
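The OPD update described above can be sketched in a few lines. This is a minimal illustration, assuming that the output of a frame is the OPD rotation applied to the product of the mixing matrix and the input vector, and that the covariance conjugates the output of the earlier frame; the function names `mix` and `next_opd` are hypothetical.

```python
import cmath

def mix(H, x):
    """Apply a p-by-n mixing matrix H (a list of rows) to an n-vector x."""
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

def next_opd(theta_prev, H1, H2, frames, window):
    """Return the OPD of the next frame so that the average covariance has zero phase.

    frames: input vectors x(k) over the overlap region.
    window: real weights w_o(k), one per input vector.
    """
    s = 0j
    for w, x in zip(window, frames):
        y1 = mix(H1, x)  # output of the earlier frame, without OPD rotation
        y2 = mix(H2, x)  # output of the later frame, without OPD rotation
        s += w * sum(a.conjugate() * b for a, b in zip(y1, y2))
    # c = exp(j * (theta_next - theta_prev)) * s, so a zero phase of c requires
    # theta_next = theta_prev - arg(s). For s == 0 the phase is taken as 0.
    return theta_prev - cmath.phase(s)
```

For a silent overlap region the sum is zero and the sketch simply keeps the previous OPD, which is one of the arbitrary but harmless choices for the zero signal.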
A similar principle can be constructed by not only considering a single previous frame ( m ) but also by using more previous frames.
It should be noted that the first OPD (at the beginning of the mixing process) can be chosen arbitrarily. In fact, a standard definition of an OPD could be used, i.e., an OPD based on phase-alignment of one of the mixed signals with any of the input signals.
Alternatively, the OPD calculation according to the invention may be re-initialized at a certain frame with the OPD according to a standard definition.
Alternatively, the OPD parameters may be transmitted. Using phase parameters such as IPD and OPD in the upmix may contribute significantly to the perceived quality in PS (Parametric Stereo) based audio codecs and may in particular substantially improve sound source localization. However, as IPD only describes the relative phase modification between the two output signals and not the phase difference between the downmix and the individual stereo channels, it cannot provide any information on how the phase modification should be distributed across the output stereo channels. The OPD parameter is indicative of the phase offset between the downmix and at least one of the stereo channels and it thus reflects how the phase should be distributed between the channels. The OPD may accordingly be included in the encoded signal by an encoder. However, although the OPD can be transmitted from encoder to decoder using a relatively limited bit budget, this approach does increase the overall data rate for the signal. Therefore, an OPD estimation may be performed at the decoder side such that the OPD value is not included in the encoded signal but is instead calculated by the decoder from the other parameter values.
The OPD may e.g. be calculated as described in Jimmy Lapierre and Roch Lefebvre, "On Improving Parametric Stereo Audio Coding", presented at the 120th Audio Engineering Society Convention, 2006 May 20-23, Paris, France, Preprint 6804.
However, such approaches may result in a discontinuity for the OPD value as a function of the IPD value. For example, discontinuities can occur in time (e.g. if IPD changes in time from just below to just above π). However, as the processing of segments includes time averaging (e.g. as part of FFT or QMF windowing) this may lead to cancellation of the output signal. In particular, such an approach typically results in audible artifacts perceived as 'clicks' or 'warbling' sounds. Similarly, the IPD changes (and thus sign inversions) may occur in the frequency domain between one subband and the next (e.g. if IPD in one band is just below π and just above π in a neighboring band). This may similarly result in noticeable artifacts.
To prevent such artifacts, the encoder may include signaling information in the encoded stereo signal indicating whether an OPD parameter should purely be estimated by the decoder or should be replaced (or compensated) by a phase correction parameter that is included in the encoded stereo signal.
In some embodiments, signaling information may be provided for each segment of the parametrically encoded signal. Thus, the encoded signal may be segmented, typically in the time domain, when being encoded. The presence indication may simply indicate whether there are any phase correction parameter values for the current segment. For example, the presence indication can be a single bit denoting that for the current frame all time frequency blocks can be estimated reliably by the decoder. This may provide a very low data rate overhead (possibly a single bit per segment) and may reduce the complexity and/or resource usage of the decoder. In other embodiments, a more detailed presence indication may be used. Specifically, the presence indication may comprise individual presence indications for a plurality of sets of time frequency blocks of the down-mix. Specifically, each set may correspond to one time frequency block for which individual PS parameters are provided. Further, the sets may cover all time frequency blocks of the signal. Thus, specifically a single presence indication bit may be included for each parameter time frequency block indicating whether e.g. the OPD for the block can be purely estimated by the decoder or whether it must take into account a phase correction parameter provided for the block. It will be appreciated that in many embodiments, the phase correction parameter may indeed be provided for each time frequency block that belongs to the second parts of the downmix.
At the decoder side (depending on the signaling information), the OPD information is estimated from the other PS parameters or is decoded from the bit-stream. The latter case may still employ the estimated data, depending on the coding scheme, e.g. by transmitting the difference between the estimated OPD and the OPD derived in the decoder.
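The decoder-side choice between a purely estimated OPD and an estimate combined with a transmitted correction can be sketched as follows. The bit semantics (a set bit meaning that a correction is present), the additive form of the correction and the function names are assumptions made for illustration only; no bit-stream syntax is implied.

```python
import math

def wrap_phase(angle):
    """Wrap an angle in radians to the interval (-pi, pi]."""
    wrapped = math.fmod(angle + math.pi, 2 * math.pi)
    if wrapped <= 0:
        wrapped += 2 * math.pi
    return wrapped - math.pi

def decode_opd(estimated_opd, presence_bit, correction=0.0):
    """Return the OPD for one time/frequency block.

    presence_bit False: the OPD is purely estimated by the decoder.
    presence_bit True:  a transmitted phase correction is added to the estimate
                        (assumed here to be coded as a difference).
    """
    if not presence_bit:
        return wrap_phase(estimated_opd)
    return wrap_phase(estimated_opd + correction)
```

Coding the correction as a difference with respect to the estimate keeps the transmitted values small whenever the decoder estimate is already close.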
Fig. 1 shows a block diagram of an apparatus 500 for mixing n digital input signals, denoted as $x_1, \ldots, x_n$ 501, into p digital output signals, denoted as $y_1, \ldots, y_p$ 502. The apparatus 500 contains a unit for deriving mixing parameters 520 for subsequent time intervals. This derivation may comprise external control data, such as e.g. a matrix with time-varying gains mapping the input channels to the output channels, or it may comprise decoding a digital bit-stream comprising mixing parameters. These mixing parameters may consist directly of a matrix with time-varying gains, but may also consist of a set of spatial image parameters that describe desired relationships of the output channels. The unit 530 determining the Overall Phase Difference (OPD) information parameter for a given time interval advantageously employs the mixing parameters in said given time interval and the mixing parameters in a previous time interval, both indicated in the figure by 503. The mix unit 510 mixes the n input signals 501 into p output signals 502 in response to the OPD information parameter 505 and the mixing parameters 504. This allows creating a phase-aligned mix of the output signals, thereby reducing artefacts known from state-of-the-art methods that do not apply phase alignment of subsequent time intervals.
Several mixing schemes will be subsequently described in more detail wherein either p or n equals 1. For p = 1 it will also be illustrated that it is possible to deviate from the average co-variance measure without deviating from the current invention.
It should be clear from the wording of description of these detailed examples how the signals and blocks match to the wording of the general mixing scheme as described above.
The down-mix (typically per frame) is a weighted sum of the left and right signals. The proposed processing will be described on a frame basis. For this purpose a prototype window $w$ is defined. Furthermore, a shift parameter $U$, also referred to as an update parameter, is defined. Typically, the window has finite support with its support larger than $U$ such that processing is done in overlapping frames. The $m$-th window is defined as:
$$w_m(k + mU) = w(k)$$
with amplitude-complementary windows, i.e.:
$$\sum_m w_m(k) = 1$$
The left and right frame signals $l_m$ and $r_m$ are then expressed as:
$$l_m(k) = l(k)\, w_m(k), \qquad r_m(k) = r(k)\, w_m(k).$$
The down-mix in frame $m$ is called $d_m$ and is described as
$$d_m(k) = A \left( \alpha_l\, l_m(k) + \alpha_r\, r_m(k) \right)$$
In order to create the down-mix, the weights $\alpha_l$, $\alpha_r$ and $A$ need to be determined. Instead of these three components, only two weights, $A\alpha_l$ and $A\alpha_r$, need to be determined. For a first example down-mix scheme it is, however, for illustration purposes more convenient to treat these parameters separately.
The total down-mix signal is the sum of the frame down-mixes:
$$d(k) = \sum_m d_m(k)$$
According to the invention, the weights $\alpha_l$ and $\alpha_r$ are determined by phase alignment of the signals $l_m$ and $r_m$ with $d_{m-1}$. This is shown in Fig. 5.
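The amplitude-complementary property of the shifted windows can be checked numerically. The sketch below assumes a squared-sine (periodic Hann) prototype window with a shift of half the window length; this window choice and the function names are illustrative assumptions, since the text does not prescribe a particular window.

```python
import math

def prototype_window(length):
    """Squared-sine prototype window w(k), k = 0 .. length-1 (assumed shape)."""
    return [math.sin(math.pi * k / length) ** 2 for k in range(length)]

def window_sum(k, window, shift):
    """Sum over all frames m of w_m(k), where w_m(k + m*shift) = w(k)."""
    length = len(window)
    total = 0.0
    # Only frames whose support can contain the instant k are visited.
    for m in range((k - length) // shift, k // shift + 1):
        idx = k - m * shift
        if 0 <= idx < length:
            total += window[idx]
    return total

w = prototype_window(16)
print([round(window_sum(k, w, 8), 12) for k in range(0, 24, 4)])
```

With such windows the frame signals add up to the original signal again, so any audible change in the total down-mix stems from the frame-wise weights and not from the windowing itself.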
In the following an additional explanation will be given as to how the OPD parameter information can be derived for an apparatus which mixes n input signal components into p output signal components. The signal components can be wideband signal components or narrow band (or sub band) signal components. In the latter case, the signal components are the signal components in one and the same of the sub bands. Time intervals ..., $T_i$, $T_{i+1}$, $T_{i+2}$, ... are defined as given in Fig. 2. In this example, the time intervals overlap by half of the length of the time intervals. This, however, is not a necessity. The time intervals can also overlap by more or less than half of the length of the time intervals.
In a first step, the signal components of the n input signal components in the time interval $T_i$ are used to derive the mixing matrix $H_i$ and thus the matrix coefficients in the mixing matrix $H_i$.
As an example, consider the case where the n input signals are down-mixed to a single signal (p=l). The matrix H reduces to a row matrix. The phase difference between each channel and the first channel can be measured (IPD parameter) and the entries in the matrix may be chosen such that they compensate for the phase differences. In this way the down-mix becomes a phase-aligned down-mix where no destructive phase cancellation occurs.
In a second step, the signal components of the n input signal components in the time interval $T_{i+1}$ are used to derive the mixing matrix $H_{i+1}$ and thus the matrix coefficients in the mixing matrix $H_{i+1}$.
The first step is actually carried out in this way in case n > p, i.e. in case of the apparatus being a down-mixing apparatus. In case of up-mixing (n < p), the matrix coefficients of the matrices H and/or the coefficients such as ILD, IPD and ICC from which the matrix coefficients can be derived (as derived in the down-mixing apparatus) are supplied to the up-mixing apparatus, for each of the subsequent time intervals ..., $T_i$, $T_{i+1}$, $T_{i+2}$, ....
In a third step, for the n input signal components in the overlapping time interval $OLT_i$, Overall Phase Difference parameter information is derived by calculating the cross-correlation in the overlap region:
$$c = e^{j(\theta_{i+1} - \theta_i)} \sum_{k \in OLT_i} w_o(k)\, x^*(k)\, H_i^* H_{i+1}\, x(k)$$
where $\theta_i$ and $\theta_{i+1}$ are the Overall Phase Difference parameters for the time intervals $T_i$ and $T_{i+1}$ respectively. Setting the angle of the cross-correlation to zero yields phase alignment. As $\theta_i$ is already known from the previous time interval $T_i$, $\theta_{i+1}$ is uniquely defined.
This procedure continues for subsequent time intervals.
The mixing matrices so obtained for the subsequent time intervals ..., $T_i$, $T_{i+1}$, $T_{i+2}$, ... and the Overall Phase Difference information so obtained for the subsequent overlapping time intervals ..., $OLT_i$, $OLT_{i+1}$, $OLT_{i+2}$, ... are now used in the actual mixing stage to mix the n input signal components into the p output signal components. This is done in the following way.
The matrix parameters (the mixing matrices ..., $H_i$, $H_{i+1}$, ... or their equivalents in the form of the ILD, IPD and ICC parameters per time interval $T_i$) have been obtained from either the above described method or extracted from an incoming bitstream. Further, Overall Phase Difference information is derived as explained above and is now available in the form of ..., $\theta_i$, $\theta_{i+1}$, ... for subsequent overlapping time intervals ..., $OLT_i$, $OLT_{i+1}$, $OLT_{i+2}$, .... The p output signal components in the overlapping time interval $OLT_i$ can now be obtained from the following formula:
$$Y(k) = \exp(j\theta_i)\, A\, H_i\, X(k) + \exp(j\theta_{i+1})\, B\, H_{i+1}\, X(k)$$
where A and B are weighting (or windowing) functions that behave as a function of time (k) as indicated in Fig. 3, where A varies for increasing time in the overlapping time interval $OLT_i$ from a maximum value (e.g. 1 or ½) to a value of zero, whereas B varies for increasing time in the overlapping time interval $OLT_i$ from a value of zero to the same maximum value. $X(k)$ is a vector containing the n input signal components at time k, where k is in the overlapping time interval $OLT_i$.
This is repeated for subsequent overlapping time intervals.
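The cross-fade between two consecutive parameter sets in one overlapping time interval can be sketched as below. Linear ramps with a maximum value of 1 are assumed for the weighting functions A and B; the actual shapes in Fig. 3 may differ, and the function name is hypothetical.

```python
import cmath

def mix_overlap(theta_i, theta_next, H_i, H_next, frames):
    """Y(k) = exp(j*theta_i)*A(k)*H_i*X(k) + exp(j*theta_next)*B(k)*H_next*X(k).

    frames: input vectors X(k) over the overlapping time interval.
    H_i, H_next: p-by-n mixing matrices given as lists of rows.
    """
    n = len(frames)
    rot_old = cmath.exp(1j * theta_i)
    rot_new = cmath.exp(1j * theta_next)
    out = []
    for k, x in enumerate(frames):
        b = (k + 1) / n  # B(k): rises linearly to 1 (assumed shape)
        a = 1.0 - b      # A(k): falls linearly to 0 (assumed shape)
        y_old = [rot_old * sum(h * v for h, v in zip(row, x)) for row in H_i]
        y_new = [rot_new * sum(h * v for h, v in zip(row, x)) for row in H_next]
        out.append([a * p + b * q for p, q in zip(y_old, y_new)])
    return out
```

Because A and B add up to their common maximum at every instant, two identical parameter sets reproduce the plain mix, and the OPD values only matter where the parameter sets differ.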
As will be appreciated, the overall phase correction carried out by the exponent term in the formula is updated each time with the next θ value. In that sense, the θ values could be considered as differential OPD information. It will be evident that, instead of generating absolute OPD information for subsequent time intervals (..., $\theta_i$, $\theta_{i+1}$, ...), it is equally well possible to generate each time differential OPD information for subsequent time intervals (..., $\theta_i - \theta_{i-1}$, $\theta_{i+1} - \theta_i$, ...).
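The equivalence of absolute and differential OPD information can be made concrete with a short sketch. Keeping the first value as the starting point of the differential sequence is an assumption of this illustration, as are the function names; phase wrapping is left out for brevity.

```python
from itertools import accumulate

def to_differential(thetas):
    """[t0, t1, t2, ...] -> [t0, t1 - t0, t2 - t1, ...]."""
    if not thetas:
        return []
    return [thetas[0]] + [b - a for a, b in zip(thetas, thetas[1:])]

def to_absolute(deltas):
    """Inverse of to_differential: running sum of the differences."""
    return list(accumulate(deltas))

thetas = [0.0, 0.4, 0.1, 1.3]
deltas = to_differential(thetas)
restored = to_absolute(deltas)
```

Either representation carries the same information as long as the starting value is known or, as in the blind case, may be chosen arbitrarily.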
In case the overlapping intervals are smaller than half the length of the time intervals, see Fig. 4, the processing for deriving the mixing matrix parameters $H_i$, $H_{i+1}$, $H_{i+2}$, ... for the subsequent time intervals $T_i$, $T_{i+1}$, $T_{i+2}$, ... is the same.
The derivation of the overall phase difference parameter information $\theta_i$ for the subsequent overlapping time intervals $OLT_i$ is done in the same way as described above, based on at least the portions of the signal components X in those overlapping time intervals.
Fig. 5 shows an example down-mix device according to the invention. The signals $l_m$ 101 and $d_{m-1}$ 103 are fed into a block PA 110 which measures a phase alignment and determines the coefficient $\alpha_l$. Similarly, the signals $r_m$ 102 and $d_{m-1}$ 103 are fed into a block PA 120 which measures a phase alignment and determines the coefficient $\alpha_r$. In the simplest case, $\alpha_l$ and $\alpha_r$ only change the phase, because $|\alpha_l| = |\alpha_r| = 1$. The coefficients $\alpha_l$ and $\alpha_r$ are used for amplification of signals $l_m$ 101 and $r_m$ 102, respectively, in gain blocks 130 and 140, respectively. The amplified signals provided by blocks 130 and 140 are fed into an adder 150 that adds both signals and results in a signal $s_m$ 106:
$$s_m(k) = \alpha_l\, l_m(k) + \alpha_r\, r_m(k)$$
Subsequently, the signals $s_m$ 106, $l_m$ 101 and $r_m$ 102 are fed into a unit Ec 160 which determines and applies amplification by a gain $A$:
$$d_m(k) = A\, s_m(k).$$
The operation of the individual blocks 110 and 120 can be as follows. In the PA unit, a second window $w_o$ (overlap window) is defined on the union of the supports of $w_m$ and $w_{m-1}$. Subsequently, the following measure is defined in the respective units:
Figure imgf000016_0001
Alternatively, the cross-correlations can be determined as an alternative measure:
Figure imgf000016_0002
The weights $\alpha_l$ and $\alpha_r$ can be determined from these measures. For example, $\alpha_l$ and $\alpha_r$ can be determined as:
$$\alpha_l = e^{j\theta_l}, \qquad \alpha_r = e^{j\theta_r}$$
with
$$\theta_l = \angle v_l, \qquad \theta_r = \angle v_r,$$
which aligns $l_m$ and $r_m$ to the down-mix of the previous frame. Another example of determining $\alpha_l$ and $\alpha_r$ is:
$$\alpha_l = b_l + \rho_l, \qquad \alpha_r = b_r + \rho_r.$$
The parameters $b_l$ and $b_r$ bias the down-mix. If $b_r = b_l$ it biases towards the sum signal (which is usually qualitatively a good solution). Taking $\rho_l$ and $\rho_r$ (i.e., in contrast to $v_l$, both amplitude and phase) ensures that a signal with small coherence with the past down-mix is (somewhat) suppressed. This improves the smoothness of the down-mix signal.
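The phase-only variant of the weights can be sketched as follows. The alignment measure is assumed here to be the windowed sum of the conjugated frame signal times the previous down-mix, so that the weight with the angle of that measure rotates the frame onto the previous down-mix; the exact measure used in the PA units, the fixed gain of 0.5 and the function names are assumptions of this illustration.

```python
import cmath

def alignment_weight(frame, prev_downmix, window):
    """Unit-magnitude weight exp(j*angle(v)) that rotates `frame` into phase
    with `prev_downmix`.

    Assumed measure: v = sum_k w_o(k) * conj(frame(k)) * prev_downmix(k).
    """
    v = sum(w * f.conjugate() * d for w, f, d in zip(window, frame, prev_downmix))
    return cmath.exp(1j * cmath.phase(v))

def frame_downmix(left, right, prev_downmix, window, gain=0.5):
    """d_m(k) = A * (alpha_l * l_m(k) + alpha_r * r_m(k)) with a fixed gain A."""
    a_l = alignment_weight(left, prev_downmix, window)
    a_r = alignment_weight(right, prev_downmix, window)
    return [gain * (a_l * l + a_r * r) for l, r in zip(left, right)]
```

In this sketch two channels that are almost out of phase with each other still add constructively, because each one is rotated towards the same past down-mix rather than towards the other channel.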
Having created sm , it is often desired to control its strength. This can be done by determining the energies
Figure imgf000017_0001
and taking
Figure imgf000017_0002
or
Figure imgf000017_0003
Instead of using $d_{m-1}$ as alignment target, it is also possible to create from $d_{m-1}$ a second signal (e.g., a prediction) of the down-mix in the current frame and align to this signal. It is also possible to use the sum and difference signals ($l_m + r_m$ and $l_m - r_m$) instead of $l_m$ and $r_m$ themselves.
Alternatively, the left and right signals can first be phase-aligned among themselves and subsequently added. For this preliminary down-mix $d_m$, an OPD can be calculated such that the current down-mix signal aligns with the past down-mix signal $d_{m-1}$.
The scheme is shown in Fig. 6. A first unit (IPD) 210 receives the left signal 101 and right signal 102 of frame $m$, and calculates the IPD of that frame, which is indicated as a signal $ipd_m$ 201. Subsequently the signal 201 can be taken up in an encoded bit stream. On the basis of the measured ipd, the two signals are phase aligned and outputted as $l_m$ 203 and $r_m$ 204. The signals 203 and 204 are subsequently added in an adder 220 to form a preliminary down-mix $d_m$ 205. Next, in a unit 230 an OPD is calculated and applied such that the resulting down-mix $d_m$ 207 is aligned with the previous down-mix $d_{m-1}$ 206 in the overlap region.
A more detailed sketch of this down-mixing scheme is provided in Fig. 7. The signals $l_m$ 101 and $r_m$ 102 are fed to an alignment measurement unit P1 indicated as 211, which measures the phase alignment between the left signal 101 and right signal 102. The output is the ipd 201, which can be taken up in the bit stream. On the basis of the measured ipd, the right signal 102 is aligned to the left signal 101 by the multiplier 212 that creates signal $r_m$ 204. An adder 220 adds the signals 204 and 101. In the OPD unit 230, the preliminary down-mix $d_m$ 205 is compared with the previous down-mix $d_{m-1}$ 206 in unit P2 indicated as 231 in the overlap region. The unit 231 has the same character as the units PA from Fig. 5. The output of unit 231 is a multiplication factor (typically a complex number on the unit circle) 208, which drives a multiplier 232 and transforms the preliminary down-mix 205 to the final down-mix $d_m$ 207, which is phase-aligned with $d_{m-1}$.
It should be clear that in this implementation of the inventive idea, the output energy (energy of $d_m$) is automatically equal to the coherent input energy, i.e.,
$$E_d = E_l + E_r + 2 \left| \sum_k l_m(k)\, r_m^*(k) \right|$$
provided that both multipliers have magnitude 1 and the phase of the first multiplier corresponds to the IPD.
It is obvious that the down-mix schemes described above can be extended in a straightforward way to create a down-mix signal from 3 or more signals instead of two.
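The statement that the output energy equals the coherent input energy when the right signal is first rotated by the IPD can be checked numerically. The sketch assumes that the IPD is taken as the angle of the summed product of the left signal and the conjugated right signal; the function names are illustrative.

```python
import cmath

def energy(signal):
    """Sum of squared magnitudes of a complex signal."""
    return sum(abs(s) ** 2 for s in signal)

def aligned_downmix(left, right):
    """Rotate the right frame by the measured IPD and add it to the left frame.

    Returns the preliminary down-mix and the cross term sum_k l(k) * conj(r(k)).
    """
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    rotation = cmath.exp(1j * cmath.phase(cross))  # unit-magnitude multiplier
    return [l + rotation * r for l, r in zip(left, right)], cross
```

A subsequent multiplication by a unit-magnitude OPD factor leaves the energy unchanged, which is why no separate rescaling step is needed in this scheme.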
Suppose $K_1$ is the time index associated with the $m$-th data set of stereo cues ILD, IPD, ICC and $K_2$ the time index associated with the $(m+1)$-st set. These are available at the decoder.
Gains $g_{l1}$, $g_{l2}$, $g_{r1}$, $g_{r2}$ and ipd phases $\varphi_1$, $\varphi_2$ can be derived from the stereo cues. This calculation depends on whether the down-mix has been normalized to the sum energy of the input signal or another normalization like the normalization to the coherent energy. The following terminology is used: an OPD at instant $m$ is called $\theta_m$, and the down-mix signal is called $d(k)$ in the time interval $K_1 \le k \le K_2$.
An objective is to calculate $\theta_{m+1}$. This is achieved by requiring that the up-mix signals in the overlap area created by the first parameter set are aligned as well as possible with the up-mix signals created by using the second parameter set. Instead of measuring the alignment as is done in the down-mix schemes, it can be shown that the optimal OPD alignment change can be calculated from the transmitted stereo cues of two consecutive frames.
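As a measure an average covariance of the two up-mixed signals is used:
$$c = \sum_{k=K_1}^{K_2} w_o(k) \left[ l_1(k)\, l_2^*(k) + r_1(k)\, r_2^*(k) \right]$$
where $l_1$, $r_1$ are the left and right up-mix from $d$ using the parameter set of $K_1$ and a similar definition of $l_2$, $r_2$. As before, $w_o$ is a window function in the overlap region. For the signals, the following expressions hold:
$$l_1 = g_{l1}\, d\, e^{j\theta_m}$$
$$l_2 = g_{l2}\, d\, e^{j\theta_{m+1}}$$
$$r_1 = g_{r1}\, d\, e^{j\theta_m} e^{j\varphi_1}$$
$$r_2 = g_{r2}\, d\, e^{j\theta_{m+1}} e^{j\varphi_2}$$
where the decorrelation signals that are usually added to $l_m$ and $r_m$ are disregarded.
Substituting this in the definition of $c$ gives
$$c = \sum_{k=K_1}^{K_2} w_o(k)\, |d(k)|^2 \left[ g_{l1} g_{l2}\, e^{j(\theta_m - \theta_{m+1})} + g_{r1} g_{r2}\, e^{j(\theta_m - \theta_{m+1})} e^{j(\varphi_1 - \varphi_2)} \right]$$
With the definition $\Delta_m = \theta_{m+1} - \theta_m$, $c$ can be expressed as:
$$c = \sum_{k=K_1}^{K_2} w_o(k)\, |d(k)|^2 \left[ g_{l1} g_{l2} + g_{r1} g_{r2}\, e^{j(\varphi_1 - \varphi_2)} \right] e^{-j\Delta_m}$$
Choosing maximum alignment means setting the phase of $c$ equal to zero. This implies for the phase update:
$$e^{j\Delta_m} = \frac{g_{l1} g_{l2} + g_{r1} g_{r2}\, e^{j(\varphi_1 - \varphi_2)}}{\left| g_{l1} g_{l2} + g_{r1} g_{r2}\, e^{j(\varphi_1 - \varphi_2)} \right|}$$
(the sum over the weighted squared $d$-amplitude does not contribute to the phase). Finally, the following formula holds:
$$\theta_{m+1} = \theta_m + \Delta_m$$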
The processing related to the above calculation is shown in Fig. 8. The unit OPD-D, indicated as 310, receives the stereo cues from the present frame 302 and the stereo cues from the next frame 301. The unit 310 generates the phase update factor in the form of $e^{j\Delta_m}$, indicated as 303. In the OPD-D unit 310, the stereo cues are translated to the gains $g$ and the phases $\varphi$, after which the above expression for $e^{j\Delta_m}$ can be used. Next, the OPD can be updated by $e^{j\theta_{m+1}} = e^{j\Delta_m}\,e^{j\theta_m}$, which is realized using the multiplier 320.
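The OPD update described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patented implementation: each cue set is represented by a hypothetical tuple `(g_l, g_r, phi)` holding the left gain, the right gain and the IPD phase; the right up-mix is assumed to carry the phase OPD minus IPD; the covariance is assumed to be the windowed sum of `l_1 * conj(l_2) + r_1 * conj(r_2)`; and the first OPD is taken as 0. The function names are invented for illustration.

```python
import cmath

def opd_update_factor(cue_1, cue_2):
    """Return the unit-magnitude factor exp(j*Delta_m) for two consecutive cue sets.

    Each cue is a hypothetical tuple (g_l, g_r, phi): left gain, right gain, IPD phase.
    """
    g_l1, g_r1, phi_1 = cue_1
    g_l2, g_r2, phi_2 = cue_2
    z = g_l1 * g_l2 + g_r1 * g_r2 * cmath.exp(1j * (phi_2 - phi_1))
    if abs(z) < 1e-12:
        # Phase undefined; keeping the previous OPD is an assumption of this sketch.
        return 1.0 + 0.0j
    return z / abs(z)

def track_opd(cues, theta_first=0.0):
    """Return exp(j*theta_m) for every cue set, updated by complex multiplication."""
    rotor = cmath.exp(1j * theta_first)
    rotors = [rotor]
    for cue_1, cue_2 in zip(cues, cues[1:]):
        rotor = opd_update_factor(cue_1, cue_2) * rotor
        rotors.append(rotor)
    return rotors

def overlap_covariance(d, w, cue_1, rotor_1, cue_2, rotor_2):
    """Windowed covariance of the two up-mixes (decorrelation signals disregarded)."""
    g_l1, g_r1, phi_1 = cue_1
    g_l2, g_r2, phi_2 = cue_2
    c = 0j
    for d_k, w_k in zip(d, w):
        l_1 = g_l1 * d_k * rotor_1
        l_2 = g_l2 * d_k * rotor_2
        r_1 = g_r1 * d_k * rotor_1 * cmath.exp(-1j * phi_1)  # assumed sign convention
        r_2 = g_r2 * d_k * rotor_2 * cmath.exp(-1j * phi_2)
        c += w_k * (l_1 * l_2.conjugate() + r_1 * r_2.conjugate())
    return c

# Hypothetical cue sets and a hypothetical down-mix segment in the overlap region.
cues = [(1.0, 0.5, 0.2), (0.8, 0.9, 1.0), (0.6, 1.1, -2.5)]
rotors = track_opd(cues)
d = [cmath.rect(1.0 + 0.1 * k, 0.7 * k) for k in range(16)]
w = [k / 15 for k in range(16)]

c = overlap_covariance(d, w, cues[0], rotors[0], cues[1], rotors[1])
print("covariance after alignment:", c)
print("tracked OPD values:", [cmath.phase(z) for z in rotors])
```

Working with the unit-magnitude factor rather than the angle avoids explicit phase wrapping, which mirrors the use of a multiplier for the update. With the chosen update the covariance comes out real and positive, independent of the down-mix samples and the window.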
The OPD calculation can be initialized in an arbitrary manner, e.g. by taking the first ( m = 1 ) OPD equal to 0.
It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and/or processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate units, processors or controllers may be performed by the same units, processor or controllers. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.
The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.
Furthermore, although individually listed, a plurality of units, means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus references to "a", "an", "first", "second" etc. do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.

Claims

1. Apparatus for mixing a digital audio comprising n input signal components into a mixed digital audio signal comprising p output signal components, where n and p are integers, the apparatus comprising:
- a unit for deriving mixing parameters for subsequent time intervals,
- a unit for determining an Overall Phase Difference (OPD) information parameter, and
- a mixing unit for mixing the n input signal components into the p output signal components in response to said OPD parameter information and said mixing parameters,
characterized in that the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval.
2. Apparatus as claimed in claim 1, characterized in that the OPD parameter information is a differential OPD parameter.
3. Apparatus as claimed in claim 1, characterized in that the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the n input signal components in a sub interval lying in said given time interval and/or said previous time interval.
4. Apparatus as claimed in claim 1, characterized in that the determining unit is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval and the p output signal components in a sub interval lying in said given time interval and/or said previous time interval.
5. Apparatus as claimed in claim 1, characterized in that the time intervals are overlapping time intervals.
6. Apparatus as claimed in claim 1 or 5, wherein said previous time interval is an immediately preceding time interval.
7. Apparatus as claimed in claim 1, characterized in that n>p.
8. Apparatus as claimed in claim 7, characterized in that n=2 and p=1.
9. Apparatus as claimed in claim 1, characterized in that n<p.
10. Apparatus as claimed in claim 9, characterized in that n=1 and p=2.
11. Apparatus as claimed in claim 1 or 2, wherein n<p, the apparatus further comprising an input for receiving an OPD parameter value, characterized in that the determining unit is adapted to re-initialize the OPD parameter information on the basis of the received OPD parameter value.
12. Apparatus as claimed in claim 9, further being adapted to receive an indicator signal indicating the receipt of the OPD parameter value, the determining unit being adapted to re-initialize the OPD parameter information in response to said received indicator signal.
13. Apparatus as claimed in any of the preceding claims, wherein the mixing parameters and the OPD parameter value are derived for each of a plurality of frequency bands.
14. Method for mixing a digital audio comprising n input signal components into a mixed digital audio signal comprising p output signal components, where n and p are integers, the method comprising the steps of:
- deriving mixing parameters for subsequent time intervals,
- determining an Overall Phase Difference (OPD) information parameter, and
- mixing the n input signal components into the p output signal components in response to said OPD parameter information and said mixing parameters,
characterized in that the step of determining an Overall Phase Difference (OPD) information parameter is adapted to generate the OPD parameter information for a given time interval from the mixing parameters in said given time interval and the mixing parameters in a previous time interval.
PCT/IB2010/054164 2009-09-29 2010-09-15 Apparatus for mixing a digital audio WO2011039668A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
EP09171625.8 2009-09-29
EP09171625 2009-09-29
EP09180302 2009-12-22
EP09180302.3 2009-12-22

Publications (1)

Publication Number Publication Date
WO2011039668A1 true WO2011039668A1 (en) 2011-04-07

Family

ID=43063640

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2010/054164 WO2011039668A1 (en) 2009-09-29 2010-09-15 Apparatus for mixing a digital audio

Country Status (2)

Country Link
TW (1) TW201138492A (en)
WO (1) WO2011039668A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1595247A1 (en) 2003-02-11 2005-11-16 Koninklijke Philips Electronics N.V. Audio coding
US20090110201A1 (en) * 2007-10-30 2009-04-30 Samsung Electronics Co., Ltd Method, medium, and system encoding/decoding multi-channel signal
EP2169666A1 (en) * 2008-09-25 2010-03-31 Lg Electronics Inc. A method and an apparatus for processing a signal
WO2010097748A1 (en) * 2009-02-27 2010-09-02 Koninklijke Philips Electronics N.V. Parametric stereo encoding and decoding
US20100241436A1 (en) * 2009-03-18 2010-09-23 Samsung Electronics Co., Ltd. Apparatus and method for encoding and decoding multi-channel signal
WO2010115850A1 (en) * 2009-04-08 2010-10-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for upmixing a downmix audio signal using a phase value smoothing

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
BREEBAART J ET AL: "Parametric Coding of Stereo Audio", INTERNET CITATION, 1 June 2005 (2005-06-01), pages 1305 - 1322, XP002514252, ISSN: 1110-8657, Retrieved from the Internet <URL:http://www.jeroenbreebaart.com/papers/jasp/jasp2005.pdf> [retrieved on 20090210] *
BREEBAART, J.; VAN DE PAR, S.; KOHLRAUSCH, A.; SCHUIJERS, E.: "Parametric coding of stereo audio", EURASIP J. APPLIED SIGNAL PROC., vol. 9, 2005, pages 1305 - 1322
J. BREEBAART; C. FALLER: "Spatial audio processing", 2007, WILEY, pages: 76 - 77
J. BREEBAART; C. FALLER: "Spatial audio processing. Chichester", 2007, WILEY, pages: 76 - 77
J. LAPIERRE; R. LEFEBVRE: "On improving parametric stereo audio coding", 120TH AES CONVENTION, 20 May 2006 (2006-05-20)
JIMMY LAPIERRE AND ROCH LEFEBVRE: "On Improving Parametric Stereo Audio Coding", AES CONVENTION PAPER 6804,, 1 May 2006 (2006-05-01), pages 1 - 9, XP009131876 *
JIMMY LAPIERRE; ROCH LEFEBVRE: "On Improving Parametric Stereo Audio Coding", 120TH CONVENTION, 20 May 2006 (2006-05-20)
KIM JUNGHOE ET AL: "Enhanced Stereo Coding with Phase Parameters for MPEG Unified Speech and Audio Coding", AES CONVENTION 127; OCTOBER 2009, AES, 60 EAST 42ND STREET, ROOM 2520 NEW YORK 10165-2520, USA, 1 October 2009 (2009-10-01), XP040509156 *
KIM MIYOUNG, OH EUNMI, SHIM HWAN: "Stereo audio coding improved by phase parameters", 8289, 4 November 2010 (2010-11-04), San Francisco, CA, USA, XP002610240, Retrieved from the Internet <URL:http://www.aes.org/e-lib/inst/download.cfm/15711.pdf?ID=15711> [retrieved on 20101118] *
SAMSUDIN; EVELYN KURNIAWATI; FAROOK SATTAR; NG BOON POH; SAPNA GEORGE: "A subband domain downmixing scheme for parametric stereo encoder", 120TH AES CONVENTION, 20 May 2006 (2006-05-20)
WERNER OOMEN ET AL: "MPEG4-Ext2: CE on Low Complexity parametric stereo", ITU STUDY GROUP 16 - VIDEO CODING EXPERTS GROUP -ISO/IEC MPEG & ITU-T VCEG(ISO/IEC JTC1/SC29/WG11 AND ITU-T SG16 Q6), XX, XX, no. M10366, 2 December 2003 (2003-12-02), XP030039221 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2631906A1 (en) * 2012-02-27 2013-08-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Phase coherence control for harmonic signals in perceptual audio codecs
WO2013127801A1 (en) * 2012-02-27 2013-09-06 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Phase coherence control for harmonic signals in perceptual audio codecs
CN104170009A (en) * 2012-02-27 2014-11-26 弗兰霍菲尔运输应用研究公司 Phase coherence control for harmonic signals in perceptual audio codecs
AU2013225076B2 (en) * 2012-02-27 2016-04-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Phase coherence control for harmonic signals in perceptual audio codecs
CN104170009B (en) * 2012-02-27 2017-02-22 弗劳恩霍夫应用研究促进协会 Phase coherence control for harmonic signals in perceptual audio codecs
RU2612584C2 (en) * 2012-02-27 2017-03-09 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Control over phase coherency for harmonic signals in perceptual audio codecs
US10818304B2 (en) 2012-02-27 2020-10-27 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Phase coherence control for harmonic signals in perceptual audio codecs
JP2016525716A (en) * 2013-07-22 2016-08-25 フラウンホーファーゲゼルシャフト ツール フォルデルング デル アンゲヴァンテン フォルシユング エー.フアー. Suppression of comb filter artifacts in multi-channel downmix using adaptive phase alignment
US10360918B2 (en) 2013-07-22 2019-07-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Reduction of comb filter artifacts in multi-channel downmix with adaptive phase alignment
US10937435B2 (en) 2013-07-22 2021-03-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Reduction of comb filter artifacts in multi-channel downmix with adaptive phase alignment
RU2679254C1 (en) * 2015-02-26 2019-02-06 Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф. Device and method for audio signal processing to obtain a processed audio signal using a target envelope in a temporal area
US10373623B2 (en) 2015-02-26 2019-08-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for processing an audio signal to obtain a processed audio signal using a target time-domain envelope

Also Published As

Publication number Publication date
TW201138492A (en) 2011-11-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10763025

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10763025

Country of ref document: EP

Kind code of ref document: A1