AU2019222947B2 - Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding - Google Patents


Publication number
AU2019222947B2
Authority
AU
Australia
Prior art keywords
signal
stereo
coding mode
frequency band
transform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
AU2019222947A
Other versions
AU2019222947A1 (en
Inventor
Pontus Carlsson
Kristofer Kjoerling
Heiko Purnhagen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby International AB
Original Assignee
Dolby International AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2010225051A external-priority patent/AU2010225051B2/en
Priority claimed from AU2013206557A external-priority patent/AU2013206557B2/en
Application filed by Dolby International AB filed Critical Dolby International AB
Priority to AU2019222947A priority Critical patent/AU2019222947B2/en
Publication of AU2019222947A1 publication Critical patent/AU2019222947A1/en
Application granted granted Critical
Publication of AU2019222947B2 publication Critical patent/AU2019222947B2/en
Priority to AU2021290344A priority patent/AU2021290344B2/en
Priority to AU2022209299A priority patent/AU2022209299A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to audio encoder and decoder systems. An embodiment of the encoder system comprises a downmix stage for generating a downmix signal and a residual signal based on a stereo signal. In addition, the encoder system comprises a parameter determining stage for determining parametric stereo parameters such as an inter-channel intensity difference and an inter-channel cross-correlation. Preferably, the parametric stereo parameters are time- and frequency-variant. Moreover, the encoder system comprises a transform stage. The transform stage generates a pseudo left/right stereo signal by performing a transform based on the downmix signal and the residual signal. The pseudo stereo signal is processed by a perceptual stereo encoder. For stereo encoding, left/right encoding or mid/side encoding is selectable. Preferably, the selection between left/right stereo encoding and mid/side stereo encoding is time- and frequency-variant.

Description

Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding
Related Applications
The present application is a divisional application of Australian patent Application No. 2018200340, which is a divisional of Australian patent No. 2015246158, itself a divisional of Australian patent application No. 2013206557, itself a divisional application of Australian patent application No. 2010225051; the entire contents of all applications are incorporated herein by reference.
Technical Field
The application relates to audio coding, in particular to stereo audio coding combining parametric and waveform based coding techniques.
Background of the Invention
Joint coding of the left (L) and right (R) channels of a stereo signal enables more efficient coding compared to independent coding of L and R. A common approach for joint stereo coding is mid/side (M/S) coding. Here, a mid (M) signal is formed by adding the L and R signals, e.g. the M signal may have the form
M = 1/2(L + R) .
Also, a side (S) signal is formed by subtracting the two channels L and R, e.g. the S signal may have the form
S = 1/2(L - R).
In case of M/S coding, the M and S signals are coded instead of the L and R signals.
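The two formulas above, together with their inverse, can be sketched as follows. This is a minimal illustration on plain lists of samples; the function names `lr_to_ms` and `ms_to_lr` are hypothetical and not taken from any codec API.

```python
def lr_to_ms(left, right):
    """Form mid and side signals: M = 1/2 (L + R), S = 1/2 (L - R)."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid, side):
    """Invert the transform: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

Because the inverse is exact, M/S coding by itself loses nothing; any gain comes from the side signal being cheaper to quantize when L and R are similar.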
In the MPEG (Moving Picture Experts Group) AAC (Advanced Audio Coding) standard (see standard document ISO/IEC 13818-7), L/R stereo coding and M/S stereo coding can be chosen in a time-variant and frequency-variant manner. Thus, the stereo encoder can apply L/R coding for some frequency bands of the stereo signal, whereas M/S coding is used for encoding other frequency bands of the stereo signal (frequency-variant). Moreover, the encoder can switch over time between L/R and M/S coding (time-variant). In MPEG AAC, the stereo encoding is carried out in the frequency domain, more particularly in the MDCT (modified discrete cosine transform) domain. This allows to adaptively choose either L/R or M/S coding in a frequency- and also time-variant manner. The decision between L/R and M/S stereo encoding may be based on evaluating the side signal: when the energy of the side signal is low, M/S stereo encoding is more efficient and should be used. Alternatively, for deciding between both stereo coding schemes, both coding schemes may be tried out and the selection may be based on the resulting quantization efforts, i.e., the observed perceptual entropy.
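The side-energy criterion mentioned above could be sketched per frequency band as follows. This is only an illustration: the function names and the threshold `ratio_threshold` are assumptions made here, not values taken from the AAC standard, and a real encoder may instead compare the perceptual entropy of both schemes.

```python
def band_energy(coeffs):
    """Sum of squared coefficients of one frequency band."""
    return sum(c * c for c in coeffs)


def choose_stereo_mode(left_band, right_band, ratio_threshold=0.1):
    """Return 'MS' when the side energy is low relative to the mid energy,
    otherwise 'LR'. The threshold is an assumed, illustrative value."""
    mid = [0.5 * (l + r) for l, r in zip(left_band, right_band)]
    side = [0.5 * (l - r) for l, r in zip(left_band, right_band)]
    if band_energy(side) < ratio_threshold * band_energy(mid):
        return "MS"
    return "LR"
```

Calling the function once per band and per frame yields a selection that is both frequency-variant and time-variant.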
An alternative approach to joint stereo coding is parametric stereo (PS) coding. Here, the stereo signal is conveyed as a mono downmix signal after encoding the downmix signal with a conventional audio encoder such as an AAC encoder. The downmix signal is a superposition of the L and R channels. The mono downmix signal is conveyed in combination with additional time-variant and frequency-variant PS parameters, such as the inter-channel (i.e. between L and R) intensity difference (IID) and the inter-channel cross-correlation (ICC). In the decoder, based on the decoded downmix signal and the parametric stereo parameters, a stereo signal is reconstructed that approximates the perceptual stereo image of the original stereo signal. For reconstructing, a decorrelated version of the downmix signal is generated by a decorrelator. Such decorrelator may be realized by an appropriate all-pass filter. PS encoding and decoding is described in the paper "Low Complexity Parametric Stereo Coding in MPEG-4", H. Purnhagen, Proc. of the 7th Int. Conference on Digital Audio Effects (DAFx'04), Naples, Italy, October 5-8, 2004, pages 163-168. The disclosure of this document is hereby incorporated by reference.
The MPEG Surround standard (see document ISO/IEC 23003-1) makes use of the concept of PS coding. In an MPEG Surround decoder a plurality of output channels is created based on fewer input channels and control parameters. MPEG Surround decoders and encoders are constructed by cascading parametric stereo modules, which in MPEG Surround are referred to as OTT modules (One-To-Two modules) for the decoder and R-OTT modules (Reverse-One-To-Two modules) for the encoder. An OTT module determines two output channels by means of a single input channel (downmix signal) accompanied by PS parameters. An OTT module corresponds to a PS decoder and an R-OTT module corresponds to a PS encoder. Parametric stereo can be realized by using MPEG Surround with a single OTT module at the decoder side and a single R-OTT module at the encoder side; this is also referred to as "MPEG Surround 2-1-2" mode. The bitstream syntax may differ, but the underlying theory and signal processing are the same. Therefore, in the following all the references to PS also include "MPEG Surround 2-1-2" or MPEG Surround based parametric stereo.
In a PS encoder (e.g. in a MPEG Surround PS encoder) a residual signal (RES) may be determined and transmitted in addition to the downmix signal. Such residual signal indicates the error associated with representing original channels by their downmix and PS parameters. In the decoder the residual signal may be used instead of the decorrelated version of the downmix signal. This allows to better reconstruct the waveforms of the original channels L and R. The use of an additional residual signal is e.g. described in the MPEG Surround standard (see document ISO/IEC 23003-1) and in the paper "MPEG Surround - The ISO/MPEG Standard for Efficient and Compatible Multi-Channel Audio Coding", J. Herre et al., Audio Engineering Convention Paper 7084, 122nd Convention, May 5-8, 2007. The disclosure of both documents, in particular the remarks to the residual signal therein, is herewith incorporated by reference.
PS coding with residual is a more general approach to joint stereo coding than M/S coding: M/S coding performs a signal rotation when transforming L/R signals into M/S signals. Also, PS coding with residual performs a signal rotation when transforming the L/R signals into downmix and residual signals. However, in the latter case the signal rotation is variable and depends on the PS parameters.
Due to the more general approach of PS coding with residual, PS coding with residual allows a more efficient coding of certain types of signals, like a panned mono signal, than M/S coding. Thus, the proposed coder allows to efficiently combine parametric stereo coding techniques with waveform based stereo coding techniques.
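The panned mono case can be illustrated with a plain two-dimensional rotation. Note that this is a deliberately simplified stand-in: the rotation below, its angle convention and the helper names are assumptions made for illustration and are not the actual PS downmix of MPEG Surround, which is derived from the PS parameters. With this convention, a fixed angle of 45 degrees corresponds to M/S coding up to gain and sign.

```python
import math


def rotate(left, right, angle):
    """Rotate (L, R) by `angle`; returns (downmix, residual)."""
    c, s = math.cos(angle), math.sin(angle)
    dmx = [c * l + s * r for l, r in zip(left, right)]
    res = [-s * l + c * r for l, r in zip(left, right)]
    return dmx, res


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


# Panned mono source: the same signal x with different gains in L and R.
x = [1.0, -2.0, 0.5]
left = [0.8 * v for v in x]
right = [0.2 * v for v in x]

# Signal-adaptive angle (variable rotation) versus fixed 45 degrees (M/S-like).
_, res_adaptive = rotate(left, right, math.atan2(0.2, 0.8))
_, res_fixed = rotate(left, right, math.pi / 4)
```

With the adaptive angle the residual vanishes (up to rounding), whereas the fixed rotation leaves energy in the side-like signal that would still have to be coded.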
Often, perceptual stereo encoders, such as an MPEG AAC perceptual stereo encoder, can decide between L/R stereo encoding and M/S stereo encoding, where in the latter case a mid/side signal is generated based on the stereo signal. Such selection may be frequency-variant, i.e. for some frequency bands L/R stereo encoding may be used, whereas for other frequency bands M/S stereo encoding may be used.
In a situation where the L and R channels are basically independent signals, such perceptual stereo encoder would typically not use M/S stereo encoding since in this situation such encoding scheme does not offer any coding gain in comparison to L/R stereo encoding. The encoder would fall back to plain L/R stereo encoding, basically processing L and R independently.
In the same situation, a PS encoder system would create a downmix signal that contains both the L and R channels, which prevents independent processing of the L and R channels. For PS coding with a residual signal, this can imply less efficient coding compared to stereo encoding, where L/R stereo encoding or M/S stereo encoding is adaptively selectable.
Thus, there are situations where a PS coder outperforms a perceptual stereo coder with adaptive selection between L/R stereo encoding and M/S stereo encoding, whereas in other situations the latter coder outperforms the PS coder.
Summary of the Invention
The present application describes an audio encoder system and an encoding method that are based on the idea of combining PS coding using a residual with adaptive L/R or M/S perceptual stereo coding (e.g. AAC perceptual joint stereo coding in the MDCT domain). This allows to combine the advantages of adaptive L/R or M/S stereo coding (e.g. used in MPEG AAC) and the advantages of PS coding with a residual signal (e.g. used in MPEG Surround). Moreover, the application describes a corresponding audio decoder system and a decoding method.
A first aspect of the application relates to an encoder system for encoding a stereo signal to a bitstream signal. According to an embodiment of the encoder system, the encoder system comprises a downmix stage for generating a downmix signal and a residual signal based on the stereo signal. The residual signal may cover all or only a part of the used audio frequency range. In addition, the encoder system comprises a parameter determining stage for determining PS parameters such as an inter-channel intensity difference and an inter-channel cross-correlation. Preferably, the PS parameters are frequency-variant. Such downmix stage and the parameter determining stage are typically part of a PS encoder.
In addition, the encoder system comprises perceptual encoding means downstream of the downmix stage, wherein two encoding schemes are selectable:
- encoding based on a sum of the downmix signal and the residual signal and based on a difference of the downmix signal and the residual signal, or
- encoding based on the downmix signal and based on the residual signal.
It should be noted that in case encoding is based on the downmix signal and the residual signal, the downmix signal and the residual signal may be encoded or signals proportional thereto may be encoded. In case encoding is based on a sum and on a difference, the sum and difference may be encoded or signals proportional thereto may be encoded.
The selection may be frequency-variant (and time-variant), i.e. for a first frequency band it may be selected that the encoding is based on a sum signal and a difference signal, whereas for a second frequency band it may be selected that the encoding is based on the downmix signal and based on the residual signal.
Such encoder system has the advantage that it allows to switch between L/R stereo coding and PS coding with residual (preferably in a frequency-variant manner): If the perceptual encoding means select (for a particular band or for the whole used frequency range) encoding based on downmix and residual signals, the encoding system behaves like a system using standard PS coding with residual. However, if the perceptual encoding means select (for a particular band or for the whole used frequency range) encoding based on a sum signal of the downmix signal and the residual signal and based on a difference signal of the downmix signal and the residual signal, under certain circumstances the sum and difference operations essentially compensate the prior downmix operation (except for a possibly different gain factor) such that the overall system can actually perform L/R encoding of the overall stereo signal or for a frequency band thereof. E.g. such circumstances occur when the L and R channels of the stereo signal are independent and have the same level, as will be explained in detail later on.
Preferably, the adaption of the encoding scheme is time and frequency dependent. Thus, preferably some frequency bands of the stereo signal are encoded by a L/R encoding scheme, whereas other frequency bands of the stereo signal are encoded by a PS coding scheme with residual.
It should be noted that in case the encoding is based on the downmix signal and based on the residual signal as discussed above, the actual signal which is input to the core encoder may be formed by two serial operations on the downmix signal and residual signal which are inverse (except for a possibly different gain factor). E.g. a downmix signal and a residual signal are fed to an M/S to L/R transform stage and then the output of the transform stage is fed to a L/R to M/S transform stage. The resulting signal (which is then used for encoding) corresponds to the downmix signal and the residual signal (except for a possibly different gain factor).
The following embodiment makes use of this idea. According to an embodiment of the encoder system, the encoder system comprises a downmix stage and a parameter determining stage as discussed above. Moreover, the encoder system comprises a transform stage (e.g. as part of the encoding means discussed above). The transform stage generates a pseudo L/R stereo signal by performing a transform of the downmix signal and the residual signal. The transform stage preferably performs a sum and difference transform, where the downmix signal and the residual signals are summed to generate one channel of the pseudo stereo signal (possibly, the sum is also multiplied by a factor) and subtracted from each other to generate the other channel of the pseudo stereo signal (possibly, the difference is also multiplied by a factor). Preferably, a first channel (e.g. the pseudo left channel) of the pseudo stereo signal is proportional to the sum of the downmix and residual signals, where a second channel (e.g. the pseudo right channel) is proportional to the difference of the downmix and residual signals. Thus, the downmix signal DMX and residual signal RES from the PS encoder may be converted into a pseudo stereo signal L_p, R_p according to the following equations:
L_p = g(DMX + RES)
R_p = g(DMX - RES)
In the above equations the gain normalization factor g has e.g. a value of
g = \sqrt{1/2}.
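The sum and difference transform described above, and the observation that a subsequent M/S computation on the pseudo stereo signal recovers the downmix and residual up to a gain, can be sketched as follows. The function names are hypothetical, and the default gain `G` is an assumed example value rather than a prescribed one.

```python
import math

G = math.sqrt(0.5)  # assumed example value for the gain normalization factor g


def to_pseudo_lr(dmx, res, g=G):
    """Pseudo left = g * (DMX + RES), pseudo right = g * (DMX - RES)."""
    lp = [g * (d + r) for d, r in zip(dmx, res)]
    rp = [g * (d - r) for d, r in zip(dmx, res)]
    return lp, rp


def pseudo_ms(lp, rp):
    """Conventional M/S computation, here applied to the pseudo stereo signal."""
    m = [0.5 * (a + b) for a, b in zip(lp, rp)]
    s = [0.5 * (a - b) for a, b in zip(lp, rp)]
    return m, s
```

For any input, `pseudo_ms(*to_pseudo_lr(dmx, res))` returns `g * dmx` and `g * res`, which is why M/S coding of the pseudo stereo signal amounts to coding the downmix and residual signals.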
The pseudo stereo signal is preferably processed by a perceptual stereo encoder (e.g. as part of the encoding means). For encoding, L/R stereo encoding or M/S stereo encoding is selectable. The adaptive L/R or M/S perceptual stereo encoder may be an AAC based encoder. Preferably, the selection between L/R stereo encoding and M/S stereo encoding is frequency-variant; thus, the selection may vary for different frequency bands as discussed above. Also, the selection between L/R encoding and M/S encoding is preferably time-variant. The decision between L/R encoding and M/S encoding is preferably made by the perceptual stereo encoder.
Such perceptual encoder having the option for M/S encoding can internally compute (pseudo) M and S signals (in the time domain or in selected frequency bands) based on the pseudo stereo L/R signal. Such pseudo M and S signals correspond to the downmix and residual signals (except for a possibly different gain factor). Hence, if the perceptual stereo encoder selects M/S encoding, it actually encodes the downmix and residual signals (which correspond to the pseudo M and S signals) as it would be done in a system using standard PS coding with residual.
Moreover, under special circumstances the transform stage essentially compensates the prior downmix operation (except for a possibly different gain factor) such that the overall encoder system can actually perform L/R encoding of the overall stereo signal or for a frequency band thereof (if L/R encoding is selected in the perceptual encoder). This is e.g. the case when the L and R channels of the stereo signal are independent and have the same level, as will be explained in detail later on. Thus, for a given frequency band the pseudo stereo signal essentially corresponds or is proportional to the stereo signal, if - for the frequency band - the left and right channels of the stereo signal are essentially independent and have essentially the same level.
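As a worked check of this statement, assume (purely for illustration) that for independent, equal-level channels the PS downmix in a band reduces to the M/S-like sum and difference, DMX = 1/2(L + R) and RES = 1/2(L - R). Inserting this into a sum and difference transform with gain g gives:

```latex
\begin{aligned}
L_p &= g\,(DMX + RES) = g\left(\tfrac{1}{2}(L + R) + \tfrac{1}{2}(L - R)\right) = g\,L \\
R_p &= g\,(DMX - RES) = g\left(\tfrac{1}{2}(L + R) - \tfrac{1}{2}(L - R)\right) = g\,R
\end{aligned}
```

Under this assumption the pseudo stereo signal equals the original stereo signal up to the gain g, so selecting L/R encoding in the perceptual encoder amounts to L/R encoding of the original band.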
Thus, the encoder system actually allows to switch between L/R stereo coding and PS coding with residual, in order to be able to adapt to the properties of the given stereo input signal. Preferably, the adaption of the encoding scheme is time and frequency dependent. Thus, preferably some frequency bands of the stereo signal are encoded by a L/R encoding scheme, whereas other frequency bands of the stereo signal are encoded by a PS coding scheme with residual. It should be noted that M/S coding is basically a special case of PS coding with residual (since the L/R to M/S transform is a special case of the PS downmix operation) and thus the encoder system may also perform overall M/S coding. Said embodiment having the transform stage downstream of the PS encoder and upstream of the L/R or M/S perceptual stereo encoder has the advantage that a conventional PS encoder and a conventional perceptual encoder can be used. Nevertheless, the PS encoder or the perceptual encoder may be adapted due to the special use here.
The new concept improves the performance of stereo coding by enabling an efficient combination of PS coding and joint stereo coding.
According to an alternative embodiment, the encoding means as discussed above comprise a transform stage for performing a sum and difference transform based on the downmix signal and the residual signal for one or more frequency bands (e.g. for the whole used frequency range or only for one frequency range). The transform may be performed in a frequency domain or in a time domain. The transform stage generates a pseudo left/right stereo signal for the one or more frequency bands. One channel of the pseudo stereo signal corresponds to the sum and the other channel corresponds to the difference.
Thus, in case encoding is based on the sum and difference signals the output of the transform stage may be used for encoding, whereas in case encoding is based on the downmix signal and the residual signal the signals upstream of the encoding stage may be used for encoding. Thus, this embodiment does not use two serial sum and difference transforms on the downmix signal and residual signal, resulting in the downmix signal and residual signal (except for a possibly different gain factor).
When selecting encoding based on the downmix signal and residual signal, parametric stereo encoding of the stereo signal is selected. When selecting encoding based on the sum and difference (i.e. encoding based on the pseudo stereo signal), L/R encoding of the stereo signal is selected.
The transform stage may be a L/R to M/S transform stage as part of a perceptual encoder with adaptive selection between L/R and M/S stereo encoding (possibly the gain factor is different in comparison to a conventional L/R to M/S transform stage). It should be noted that the decision between L/R and M/S stereo encoding should be inverted. Thus, encoding based on the downmix signal and residual signal is selected (i.e. the encoded signal did not pass the transform stage) when the decision means decide M/S perceptual decoding, and encoding based on the pseudo stereo signal as generated by the transform stage is selected (i.e. the encoded signal passed the transform stage) when the decision means decide L/R perceptual decoding.
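The inverted decision in this alternative embodiment can be sketched per frequency band as follows. The function names, the scheme labels and the default gain are hypothetical and only serve to illustrate which signals reach the core coder in each case.

```python
def scheme_for_band(ms_decided):
    """Map the per-band decision of the perceptual coder to the effective scheme.

    ms_decided=True  -> transform stage bypassed, DMX/RES coded (PS with residual)
    ms_decided=False -> transform stage applied, pseudo L/R coded (L/R coding)
    """
    return "PS+residual" if ms_decided else "L/R"


def core_input_for_band(dmx, res, ms_decided, g=1.0):
    """Return the two signals handed to the core coder for one band.
    The gain g is an assumed parameter of the sum/difference transform."""
    if ms_decided:
        return dmx, res  # signals did not pass the transform stage
    pseudo_l = [g * (d + r) for d, r in zip(dmx, res)]
    pseudo_r = [g * (d - r) for d, r in zip(dmx, res)]
    return pseudo_l, pseudo_r  # signals passed the transform stage
```

Compared with the embodiment using two serial transforms, only one sum and difference stage is needed here, at the cost of inverting the meaning of the L/R versus M/S decision.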
The encoder system according to any of the embodiments discussed above may comprise an additional SBR (spectral band replication) encoder. SBR is a form of HFR (High Frequency Reconstruction). An SBR encoder determines side information for the reconstruction of the higher frequency range of the audio signal in the decoder. Only the lower frequency range is encoded by the perceptual encoder, thereby reducing the bitrate. Preferably, the SBR encoder is connected upstream of the PS encoder. Thus, the SBR encoder may be in the stereo domain and generates SBR parameters for a stereo signal. This will be discussed in detail in connection with the drawings.
Preferably, the PS encoder (i.e. the downmix stage and the parameter determining stage) operates in an oversampled frequency domain (also the PS decoder as discussed below preferably operates in an oversampled frequency domain). For time-to-frequency transform e.g. a complex valued hybrid filter bank having a QMF (quadrature mirror filter) and a Nyquist filter may be used upstream of the PS encoder as described in the MPEG Surround standard (see document ISO/IEC 23003-1). This allows for time and frequency adaptive signal processing without audible aliasing artifacts. The adaptive L/R or M/S encoding, on the other hand, is preferably carried out in the critically sampled MDCT domain (e.g. as described in AAC) in order to ensure an efficient quantized signal representation.
The conversion between downmix and residual signals and the pseudo L/R stereo signal may be carried out in the time domain since the PS encoder and the perceptual stereo encoder are typically connected in the time domain anyway. Thus, the transform stage for generating the pseudo L/R signal may operate in the time domain.
In other embodiments as discussed in connection with the drawings, the transform stage operates in an oversampled frequency domain or in a critically sampled MDCT domain.
A second aspect of the application relates to a decoder system for decoding a bitstream signal as generated by the encoder system discussed above.
According to an embodiment of the decoder system, the decoder system comprises perceptual decoding means for decoding based on the bitstream signal. The decoding means are configured to generate by decoding an (internal) first signal and an (internal) second signal and to output a downmix signal and a residual signal. The downmix signal and the residual signal are selectively
- based on the sum of the first signal and of the second signal and based on the difference of the first signal and of the second signal, or
- based on the first signal and based on the second signal.
As discussed above in connection with the encoder system, also here the selection may be frequency-variant or frequency-invariant. Moreover, the system comprises an upmix stage for generating the stereo signal based on the downmix signal and the residual signal, with the upmix operation of the upmix stage being dependent on the one or more parametric stereo parameters. Analogously to the encoder system, the decoder system allows to actually switch between L/R decoding and PS decoding with residual, preferably in a time and frequency variant manner.
According to another embodiment, the decoder system comprises a perceptual stereo decoder (e.g. as part of the decoding means) for decoding the bitstream signal, with the decoder generating a pseudo stereo signal. The perceptual decoder may be an AAC based decoder. For the perceptual stereo decoder, L/R perceptual decoding or M/S perceptual decoding is selectable in a frequency-variant or frequency-invariant manner (the actual selection is preferably controlled by the decision in the encoder which is conveyed as side-information in the bitstream). The decoder selects the decoding scheme based on the encoding scheme used for encoding. The used encoding scheme may be indicated to the decoder by information contained in the received bitstream.
Moreover, a transform stage is provided for generating a downmix signal and a residual signal by performing a transform of the pseudo stereo signal. In other words: The pseudo stereo signal as obtained from the perceptual decoder is converted back to the downmix and residual signals. Such transform is a sum and difference transform: The resulting downmix signal is proportional to the sum of a left channel and a right channel of the pseudo stereo signal. The resulting residual signal is proportional to the difference of the left channel and the right channel of the pseudo stereo signal. Thus, quasi an L/R to M/S transform was carried out. The pseudo stereo signal with the two channels L_p, R_p may be converted to the downmix and residual signals according to the following equations:
DMX = 1/(2g) (L_p + R_p)
RES = 1/(2g) (L_p - R_p)
In the above equations the gain normalization factor g may have e.g. a value of
g = \sqrt{1/2}.
The residual signal RES used in the decoder may cover the whole used audio frequency range or only a part of the used audio frequency range.
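The decoder-side transform, together with a round trip through the corresponding encoder-side sum and difference transform, can be sketched as follows. The function names are hypothetical, and the default gain is an assumed example value; what matters is that both sides use the same g so that the factor 1/(2g) cancels it.

```python
import math

G = math.sqrt(0.5)  # assumed example gain; encoder and decoder must agree on it


def to_pseudo_lr(dmx, res, g=G):
    """Encoder side: pseudo left = g * (DMX + RES), pseudo right = g * (DMX - RES)."""
    lp = [g * (d + r) for d, r in zip(dmx, res)]
    rp = [g * (d - r) for d, r in zip(dmx, res)]
    return lp, rp


def from_pseudo_lr(lp, rp, g=G):
    """Decoder side: DMX = (Lp + Rp) / (2g), RES = (Lp - Rp) / (2g)."""
    k = 1.0 / (2.0 * g)
    dmx = [k * (a + b) for a, b in zip(lp, rp)]
    res = [k * (a - b) for a, b in zip(lp, rp)]
    return dmx, res
```

Ignoring quantization in the perceptual codec, `from_pseudo_lr(*to_pseudo_lr(dmx, res))` returns the original downmix and residual signals, which are then passed to the upmix stage.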
The downmix and residual signals are then processed by an upmix stage of a PS decoder to obtain the final stereo output signal. The upmixing of the downmix and residual signals to the stereo signal is dependent on the received PS parameters.
According to an alternative embodiment, the perceptual decoding means may comprise a sum and difference transform stage for performing a transform based on the first signal and the second signal for one or more frequency bands (e.g. for the whole used frequency range). Thus, the transform stage generates the downmix signal and the residual signal for the case that the downmix signal and the residual signal are based on the sum of the first signal and of the second signal and based on the difference of the first signal and of the second signal. The transform stage may operate in the time domain or in a frequency domain.
As similarly discussed in connection with the encoder system, the transform stage may be a M/S to L/R transform stage as part of a perceptual decoder with adaptive selection between L/R and M/S stereo decoding (possibly the gain factor is different in comparison to a conventional M/S to L/R transform stage). It should be noted that the selection between L/R and M/S stereo decoding should be inverted.
The decoder system according to any of the preceding embodiments may comprise an additional SBR decoder for decoding the side information from the SBR encoder and generating a high frequency component of the audio signal. Preferably, the SBR decoder is located downstream of the PS decoder. This will be discussed in detail in connection with the drawings.
Preferably, the upmix stage operates in an oversampled frequency domain, e.g. a hybrid filter bank as discussed above may be used upstream of the PS decoder. The L/R to M/S transform may be carried out in the time domain since the perceptual decoder and the PS decoder (including the upmix stage) are typically connected in the time domain.
In other embodiments as discussed in connection with the drawings, the L/R to M/S transform is carried out in an oversampled frequency domain (e.g. QMF), or in a critically sampled frequency domain (e.g. MDCT).
A third aspect of the application relates to a method for encoding a stereo signal to a bitstream signal. The method operates analogously to the encoder system discussed above. Thus, the above remarks related to the encoder system are basically also applicable to the encoding method.
A fourth aspect of the invention relates to a method for decoding a bitstream signal including PS parameters to generate a stereo signal. The method operates in the same way as the decoder system discussed above. Thus, the above remarks related to the decoder system are basically also applicable to the decoding method.
A fifth aspect of the invention relates to an encoder system configured for encoding a stereo signal to a bitstream signal, the encoder system comprising: a downmixing means configured for generating a downmix signal and a residual signal based on the stereo signal; a parameter determining means configured for determining one or more parametric stereo parameters; and perceptual encoding means downstream of the downmixing means, wherein the perceptual encoding means are configured for encoding the downmix signal and the residual signal, and wherein the perceptual encoding means are configured for selecting left/right perceptual encoding, or mid/side perceptual encoding.
A sixth aspect of the invention relates to a decoder system configured for decoding a bitstream signal including one or more parametric stereo parameters to a stereo signal, the decoder system comprising: perceptual decoding means configured for decoding based on the bitstream signal, wherein the decoding means are configured to generate a downmix signal and a residual signal, wherein the decoding means are configured to selectively perform left/right perceptual decoding or mid/side perceptual decoding; and upmixing means configured for performing an upmix operation for generating the stereo signal based on the downmix signal and the residual signal, with the upmix operation of the upmixing means being dependent on the one or more parametric stereo parameters.
A seventh aspect of the invention relates to a method for encoding a stereo signal to a bitstream signal, the method comprising: generating a downmix signal and a residual signal based on the stereo signal; determining one or more parametric stereo parameters; perceptual encoding downstream of generating the downmix signal and the residual signal, wherein left/right perceptual encoding, or mid/side perceptual encoding is selectable.
An eighth aspect of the invention relates to a method for decoding a bitstream signal including parametric stereo parameters to a stereo signal, the method comprising: perceptual decoding based on the bitstream signal, wherein perceptual decoding comprises generating a downmix signal and residual signal by selectively performing left/right perceptual decoding or mid/side perceptual decoding; and generating the stereo signal based on the downmix signal and the residual signal by an upmix operation, with the upmix operation being dependent on the parametric stereo parameters.
According to a ninth aspect, there is provided herein a method for encoding a stereo input signal having a left channel and a right channel, the method comprising: selecting either a transform coding mode or a linear predictive coding mode as a selected coding mode; encoding the stereo input signal using the selected coding mode to produce an encoded output signal; and generating a bitstream signal including the encoded output signal, wherein, if the linear predictive coding mode is selected, the encoding comprises: downmixing the stereo input signal to a mono signal, the mono signal being a sum of the left channel and the right channel, estimating parameters for reconstructing the stereo input signal from the mono signal, encoding the mono signal using linear predictive coding to produce an encoded mono signal, and outputting the encoded mono signal and the parameters as the encoded output signal, wherein, if the transform coding mode is selected, the encoding comprises: analyzing the stereo input signal, selecting a stereo coding mode based on the analyzing, the selected stereo coding mode including either a mid/siding stereo coding mode or a left/right stereo coding mode, encoding the stereo input signal using the selected stereo coding mode in a first frequency band to produce an encoded
stereo signal in a first frequency band, downmixing the stereo input signal to a mono signal in a second frequency band, encoding the mono signal in the second frequency band using transform coding to produce an encoded mono signal in the second frequency band, and outputting the encoded stereo signal in the first frequency band and the encoded mono signal in the second frequency band as the encoded output signal.
According to a tenth aspect, there is provided herein a device for encoding a stereo input signal having a left channel and a right channel to produce an encoded output signal, the device comprising: a mode selector for selecting either a transform coding mode or a linear predictive coding mode; a transform encoder for encoding the stereo input signal if the selected coding mode is transform coding; a linear predictive encoder for encoding the stereo input signal if the selected coding mode is linear predictive coding; and a bitstream generator for generating a bitstream signal including the encoded output signal, wherein the linear predictive encoder is configured to: downmix the stereo input signal to a mono signal, the mono signal being a sum of the left channel and the right channel, estimate parameters for reconstructing the stereo input signal from the mono signal, encode the mono signal using linear predictive coding to produce an encoded mono signal, and output the encoded mono signal and the estimated parameters as the encoded output signal, wherein the transform encoder is configured to: analyze the stereo input signal, select a stereo coding mode, the selected stereo coding mode including either a mid/siding stereo coding mode or a left/right stereo coding mode, encode the stereo input signal using the selected stereo coding mode in a first frequency band to produce an encoded stereo signal in the first frequency band, downmix the stereo input signal to a mono signal in a second frequency band, encode the mono signal in the second frequency band using transform coding to produce an encoded mono signal in the second frequency band, and output the encoded stereo signal in the first frequency band and the encoded mono signal in the second frequency band as the encoded output signal.
According to an eleventh aspect, there is provided herein a method for decoding a bitstream signal to produce a decoded output signal having a left channel and a right channel, the method comprising: extracting an encoded audio signal from the bitstream signal, the encoded audio signal generated using a selected coding mode, the selected coding mode including either a transform coding mode or a linear predictive coding mode; decoding the encoded audio signal using the selected coding mode to produce a decoded signal; and outputting the decoded signal as the decoded output signal, wherein, if the
selected coding mode is the linear predictive coding mode, the decoding comprises: receiving a mono signal, the mono signal being a sum of the left channel and the right channel, decoding the mono signal using linear predictive decoding to produce a decoded mono signal, extracting parameters from the bitstream signal for reconstructing a stereo audio signal, reconstructing the stereo audio signal using the decoded mono signal and the parameters to produce a reconstructed stereo audio signal, and outputting the reconstructed stereo audio signal as the decoded signal, wherein, if the selected coding mode is the transform coding mode, the decoding comprises: receiving a stereo signal in a first frequency band, the stereo signal generated using a selected stereo coding mode, the selected stereo coding mode including either mid/siding stereo coding or left/right stereo coding, receiving a mono signal in a second frequency band, decoding the stereo signal in the first frequency band using the selected stereo coding mode to produce a decoded stereo signal in the first frequency band, decoding the mono signal in the second frequency band using transform decoding to produce a decoded mono signal in the second frequency band, and outputting the decoded stereo signal in the first frequency band and the decoded mono signal in the second frequency band as the decoded signal.
According to a twelfth aspect, there is provided herein a device for decoding a bitstream signal to produce a decoded output signal having a left channel and a right channel, the device comprising: a demultiplexer for extracting an encoded audio signal from the bitstream signal, the encoded audio signal generated using a selected coding mode, the selected coding mode including either a transform coding mode or a linear predictive coding mode; a transform decoder for decoding the encoded audio signal if the selected coding mode is the transform coding mode; and a linear predictive decoder for decoding the encoded audio signal if the selected coding mode is the linear predictive coding mode, wherein the linear predictive decoder is configured to: receive a mono signal, the mono signal being a sum of the left channel and the right channel, decode the mono signal using linear predictive decoding to produce a decoded mono signal, extract parameters from the bitstream signal for reconstructing a stereo signal, reconstruct the stereo signal using the decoded mono signal and the parameters to produce a reconstructed stereo audio signal, and output the reconstructed stereo signal as the decoded output signal, wherein the transform decoder is configured to: receive a stereo signal in a first frequency band, the stereo signal generated using a selected stereo coding mode, the selected stereo coding mode including either a mid/siding stereo coding mode or a left/right stereo coding mode, receive a mono signal in a second frequency band, decode the stereo signal in the first frequency band using the selected stereo coding mode to produce a decoded stereo signal in the first frequency band, decode the mono signal in the second frequency band using transform decoding to produce a decoded mono signal in the second frequency band, and output the decoded stereo signal in the first frequency band and the decoded mono signal in the second frequency band as the decoded output signal.
The invention is explained below by way of illustrative examples with reference to the accompanying drawings, wherein
Fig. 1 illustrates an embodiment of an encoder system, where optionally the PS parameters assist the psycho-acoustic control in the perceptual stereo encoder;
Fig. 2 illustrates an embodiment of the PS encoder;
Fig. 3 illustrates an embodiment of a decoder system;
Fig. 4 illustrates a further embodiment of the PS encoder including a detector to deactivate PS encoding if L/R encoding is beneficial;
Fig. 5 illustrates an embodiment of a conventional PS encoder system having an additional SBR encoder for the downmix;
Fig. 6 illustrates an embodiment of an encoder system having an additional SBR encoder for the downmix signal;
Fig. 7 illustrates an embodiment of an encoder system having an additional SBR encoder in the stereo domain;
Figs. 8a-8d illustrate various time-frequency representations of one of the two output channels at the decoder output;
Fig. 9a illustrates an embodiment of the core encoder;
Fig. 9b illustrates an embodiment of an encoder that permits switching between coding in a linear predictive domain (typically for mono signals only) and coding in a transform domain (typically for both mono and stereo signals);
Fig. 10 illustrates an embodiment of an encoder system;
Fig. 11a illustrates a part of an embodiment of an encoder system;
Fig. 11b illustrates an exemplary implementation of the embodiment in Fig. 11a;
Fig. 11c illustrates an alternative to the embodiment in Fig. 11a;
Fig. 12 illustrates an embodiment of an encoder system;
Fig. 13 illustrates an embodiment of the stereo coder as part of the encoder system of Fig. 12;
Fig. 14 illustrates an embodiment of a decoder system for decoding the bitstream signal as generated by the encoder system of Fig. 6;
Fig. 15 illustrates an embodiment of a decoder system for decoding the bitstream signal as generated by the encoder system of Fig. 7;
Fig. 16a illustrates a part of an embodiment of a decoder system;
Fig. 16b illustrates an exemplary implementation of the embodiment in Fig. 16a;
Fig. 16c illustrates an alternative to the embodiment in Fig. 16a;
Fig. 17 illustrates an embodiment of an encoder system; and
Fig. 18 illustrates an embodiment of a decoder system.
Fig. 1 shows an embodiment of an encoder system which combines PS encoding using a residual with adaptive L/R or M/S perceptual stereo encoding. This embodiment is merely illustrative for the principles of the present application. It is understood that modifications and variations of the embodiment will be apparent to others skilled in the art. The encoder system comprises a PS encoder 1 receiving a stereo signal L, R. The PS encoder 1 has a downmix stage for generating downmix DMX and residual RES signals based on the stereo signal L, R. This operation can be described by means of a 2·2 downmix matrix H⁻¹ that converts the L and R signals to the downmix signal DMX and residual signal RES:

$$\begin{pmatrix} DMX \\ RES \end{pmatrix} = H^{-1} \begin{pmatrix} L \\ R \end{pmatrix}$$

Typically, the matrix H⁻¹ is frequency-variant and time-variant, i.e. the elements of the matrix H⁻¹ vary over frequency and vary from time slot to time slot. The matrix H⁻¹ may be updated every frame (e.g. every 21 or 42 ms) and may have a frequency resolution of a plurality of bands, e.g. 28, 20, or 10 bands (named "parameter bands") on a perceptually oriented (Bark-like) frequency scale.
The elements of the matrix H⁻¹ depend on the time- and frequency-variant PS parameters IID (inter-channel intensity difference; also called CLD - channel level difference) and ICC (inter-channel cross-correlation). For determining PS parameters 5, e.g. IID and ICC, the PS encoder 1 comprises a parameter determining stage. An example for computing the matrix elements of the inverse matrix H is given by the following and described in the MPEG Surround specification document ISO/IEC 23003-1, subclause 6.5.3.2, which is hereby incorporated by reference:

$$H = \begin{pmatrix} c_1 \cos(\alpha + \beta) & c_1 \sin(\alpha + \beta) \\ c_2 \cos(-\alpha + \beta) & c_2 \sin(-\alpha + \beta) \end{pmatrix}$$

where

$$c_1 = \sqrt{\frac{10^{CLD/10}}{1 + 10^{CLD/10}}}, \quad \text{and} \quad c_2 = \sqrt{\frac{1}{1 + 10^{CLD/10}}},$$

and where

$$\beta = \arctan\left(\tan(\alpha)\,\frac{c_2 - c_1}{c_2 + c_1}\right), \quad \text{and} \quad \alpha = \tfrac{1}{2}\arccos(\rho),$$

and where ρ = ICC.
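As a rough illustration of the expressions referenced above, the following Python sketch evaluates the four matrix elements from a CLD value in dB and an ICC value for a single parameter band. The function name `upmix_matrix` and the tuple layout are assumptions made for this sketch, and the expressions for c1, c2, α and β are written as understood from the cited subclause; the sketch does not cover any other processing of ISO/IEC 23003-1.

```python
import math

def upmix_matrix(cld_db, icc):
    """Return the 2x2 matrix H as ((h11, h12), (h21, h22)) for one parameter band.

    cld_db: channel level difference in dB.
    icc: inter-channel cross-correlation, assumed to lie in [-1, 1].
    """
    ratio = 10.0 ** (cld_db / 10.0)
    c1 = math.sqrt(ratio / (1.0 + ratio))
    c2 = math.sqrt(1.0 / (1.0 + ratio))
    alpha = 0.5 * math.acos(icc)
    beta = math.atan(math.tan(alpha) * (c2 - c1) / (c2 + c1))
    return (
        (c1 * math.cos(alpha + beta), c1 * math.sin(alpha + beta)),
        (c2 * math.cos(-alpha + beta), c2 * math.sin(-alpha + beta)),
    )

# Fully correlated channels of equal level: alpha = beta = 0, so the second
# column vanishes and both rows carry the factor sqrt(1/2) in the first column.
h = upmix_matrix(0.0, 1.0)
```

By construction, the squared elements of the first row sum to c1² and those of the second row to c2², so the level difference between the two output channels follows the CLD value.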
Moreover, the encoder system comprises a transform stage 2 that converts the downmix signal DMX and residual signal RES from the PS encoder 1 into a pseudo stereo signal Lp, Rp, e.g. according to the following equations:

$$L_p = g\,(DMX + RES)$$

$$R_p = g\,(DMX - RES)$$

In the above equations, the gain normalization factor g has e.g. a value of g = √(1/2). For g = √(1/2), the two equations for the pseudo stereo signal Lp, Rp can be rewritten as:

$$\begin{pmatrix} L_p \\ R_p \end{pmatrix} = \begin{pmatrix} \sqrt{1/2} & \sqrt{1/2} \\ \sqrt{1/2} & -\sqrt{1/2} \end{pmatrix} \begin{pmatrix} DMX \\ RES \end{pmatrix}$$

The pseudo stereo signal Lp, Rp is then fed to a perceptual stereo encoder 3, which adaptively selects either L/R or M/S stereo encoding. M/S encoding is a form of joint stereo coding. L/R encoding may also be based on joint encoding aspects, e.g. bits may be allocated jointly for the L and R channels from a common bit reservoir.
The selection between L/R or M/S stereo encoding is preferably frequency-variant, i.e. some frequency bands may be L/R encoded, whereas other frequency bands may be M/S encoded. An embodiment for implementing the selection between L/R or M/S stereo encoding is described in the document "Sum-Difference Stereo Transform Coding", J. D. Johnston et al., IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 1992, pages 569-572. The discussion of the selection between L/R or M/S stereo encoding therein, in particular sections 5.1 and 5.2, is hereby incorporated by reference.
Based on the pseudo stereo signal Lp, Rp, the perceptual encoder 3 can internally compute (pseudo) mid/side signals Mp, Sp. Such signals basically correspond to the downmix signal DMX and residual signal RES (except for a possibly different gain factor). Hence, if the perceptual encoder 3 selects M/S encoding for a frequency band, the perceptual encoder 3 basically encodes the downmix signal DMX and residual signal RES for that frequency band (except for a possibly different gain factor), as would also be done in a conventional perceptual encoder system using conventional PS coding with residual. The PS parameters 5 and the output bitstream 4 of the perceptual encoder 3 are multiplexed into a single bitstream 6 by a multiplexer 7.
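The relation between the pseudo mid/side signals and DMX, RES can be checked numerically. The sketch below assumes the common mid/side convention Mp = (Lp + Rp)/2 and Sp = (Lp - Rp)/2, which the text does not spell out, and uses the hypothetical helper names `to_pseudo_stereo` and `mid_side`; under that convention the "possibly different gain factor" is simply g.

```python
import math

G = math.sqrt(0.5)  # example value of the gain normalization factor g

def to_pseudo_stereo(dmx, res, g=G):
    """Transform stage 2: Lp = g*(DMX + RES), Rp = g*(DMX - RES), per sample."""
    lp = [g * (d + r) for d, r in zip(dmx, res)]
    rp = [g * (d - r) for d, r in zip(dmx, res)]
    return lp, rp

def mid_side(left, right):
    """Assumed M/S convention: M = (L + R)/2, S = (L - R)/2."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side

dmx = [0.5, -0.25, 1.0]
res = [0.1, 0.0, -0.2]
lp, rp = to_pseudo_stereo(dmx, res)
mp, sp = mid_side(lp, rp)
# mp equals g*DMX and sp equals g*RES up to rounding.
```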
In addition to PS encoding of the stereo signal, the encoder system in Fig. 1 allows L/R coding of the stereo signal, as will be explained in the following. As discussed above, the elements of the downmix matrix H⁻¹ of the encoder (and also of the upmix matrix H used in the decoder) depend on the time- and frequency-variant PS parameters IID (inter-channel intensity difference; also called CLD - channel level difference) and ICC (inter-channel cross-correlation). An example for computing the matrix elements of the upmix matrix H is described above. In case of using residual coding, the right column of the 2·2 upmix matrix H is given as

$$\begin{pmatrix} 1 \\ -1 \end{pmatrix}$$
However, preferably, the right column of the 2·2 matrix H should instead be modified to

$$\begin{pmatrix} \sqrt{1/2} \\ -\sqrt{1/2} \end{pmatrix}$$
The left column is preferably computed as given in the MPEG Surround specification.
Modifying the right column of the upmix matrix H ensures that for IID = 0 dB and ICC = 0 (i.e. the case where for the respective band the stereo channels L and R are independent and have the same level) the following upmix matrix H is obtained for the band:

$$H = \begin{pmatrix} \sqrt{1/2} & \sqrt{1/2} \\ \sqrt{1/2} & -\sqrt{1/2} \end{pmatrix}$$
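Please note that the upmix matrix H and also the downmix matrix H⁻¹ are typically frequency-variant and time-variant. Thus, the values of the matrices are different for different time/frequency tiles (a tile corresponds to the intersection of a particular frequency band and a particular time period). In the above case the downmix matrix H⁻¹ is identical to the upmix matrix H. Thus, for the band the pseudo stereo signal Lp, Rp can be computed by the following equation:

$$\begin{aligned} \begin{pmatrix} L_p \\ R_p \end{pmatrix} &= \begin{pmatrix} \sqrt{1/2} & \sqrt{1/2} \\ \sqrt{1/2} & -\sqrt{1/2} \end{pmatrix} \begin{pmatrix} DMX \\ RES \end{pmatrix} = \begin{pmatrix} \sqrt{1/2} & \sqrt{1/2} \\ \sqrt{1/2} & -\sqrt{1/2} \end{pmatrix} H^{-1} \begin{pmatrix} L \\ R \end{pmatrix} \\ &= \begin{pmatrix} \sqrt{1/2} & \sqrt{1/2} \\ \sqrt{1/2} & -\sqrt{1/2} \end{pmatrix} \begin{pmatrix} \sqrt{1/2} & \sqrt{1/2} \\ \sqrt{1/2} & -\sqrt{1/2} \end{pmatrix} \begin{pmatrix} L \\ R \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} L \\ R \end{pmatrix} = \begin{pmatrix} L \\ R \end{pmatrix} \end{aligned}$$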
Hence, in this case the PS encoding with residual using the downmix matrix H⁻¹, followed by the generation of the pseudo L/R signal in the transform stage 2, corresponds to the unity matrix and does not change the stereo signal for the respective frequency band at all, i.e. Lp = L and Rp = R.
In other words: the transform stage 2 compensates the downmix matrix H⁻¹ such that the pseudo stereo signal Lp, Rp corresponds to the input stereo signal L, R. This allows the original input stereo signal L, R to be encoded by the perceptual encoder 3 for the particular band. When L/R encoding is selected by the perceptual encoder 3 for encoding the particular band, the encoder system behaves like an L/R perceptual encoder for encoding the band of the stereo input signal L, R.
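This compensation can be verified with a few lines of Python. The sketch assumes the special case discussed above (IID = 0 dB and ICC = 0 with the modified right column), for which the downmix matrix and the transform stage 2 with g = √(1/2) are both the matrix with entries ±√(1/2); `matmul2` is a helper introduced only for this illustration.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

s = math.sqrt(0.5)
downmix = ((s, s), (s, -s))    # H^-1 for the band (assumed special case)
transform = ((s, s), (s, -s))  # transform stage 2 with g = sqrt(1/2)

combined = matmul2(transform, downmix)
# combined is the identity matrix up to rounding, so (Lp, Rp) = (L, R).
```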
The encoder system in Fig. 1 allows seamless and adaptive switching between L/R coding and PS coding with residual in a frequency- and time-variant manner. The encoder system avoids discontinuities in the waveform when switching the coding scheme. This prevents artifacts. In order to achieve smooth transitions, linear interpolation may be applied to the elements of the matrix H⁻¹ in the encoder and the matrix H in the decoder for samples between two stereo parameter updates.
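A minimal sketch of such an interpolation is given below. It assumes that a matrix is available at two consecutive parameter updates and that the samples in between are indexed from 0 to `num_samples - 1`; the function name and the choice to reach the new matrix only at the next update are illustrative assumptions, not details taken from the text.

```python
def interpolate_matrices(h_prev, h_next, num_samples):
    """Yield one 2x2 matrix per sample, moving linearly from h_prev towards h_next.

    Sample n uses the weight n / num_samples, so sample 0 equals h_prev and
    h_next itself is reached at the following parameter update.
    """
    for n in range(num_samples):
        w = n / num_samples
        yield tuple(
            tuple((1.0 - w) * p + w * q for p, q in zip(row_prev, row_next))
            for row_prev, row_next in zip(h_prev, h_next)
        )

h_a = ((1.0, 0.0), (0.0, 1.0))
h_b = ((0.5, 0.5), (0.5, -0.5))
steps = list(interpolate_matrices(h_a, h_b, 4))
# steps[0] is h_a; steps[2] lies halfway between h_a and h_b.
```

Because each element changes by a small constant step per sample, the matrix applied to the signal never jumps at a parameter update.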
Fig. 2 shows an embodiment of the PS encoder 1. The PS encoder 1 comprises a downmix stage 8 which generates the downmix signal DMX and residual signal RES based on the stereo signal L, R. Further, the PS encoder 1 comprises a parameter estimating stage 9 for estimating the PS parameters 5 based on the stereo signal L, R.
Fig. 3 illustrates an embodiment of a corresponding decoder system configured to decode the bitstream 6 as generated by the encoder system of Fig. 1. This embodiment is merely illustrative for the principles of the present application. It is understood that modifications and variations of the embodiment will be apparent to others skilled in the art. The decoder system comprises a demultiplexer 10 for separating the PS parameters 5 and the audio bitstream 4 as generated by the perceptual encoder 3. The audio bitstream 4 is fed to a perceptual stereo decoder 11, which can selectively decode an L/R encoded bitstream or an M/S encoded audio bitstream. The operation of the decoder 11 is inverse to the operation of the encoder 3. Analogously to the perceptual encoder 3, the perceptual decoder 11 preferably allows for a frequency-variant and time-variant decoding scheme. Some frequency bands which are L/R encoded by the encoder 3 are L/R decoded by the decoder 11, whereas other frequency bands which are M/S encoded by the encoder 3 are M/S decoded by the decoder 11. The decoder 11 outputs the pseudo stereo signal Lp, Rp which was input to the perceptual encoder 3 before. The pseudo stereo signal Lp, Rp as obtained from the perceptual decoder 11 is converted back to the downmix signal DMX and residual signal RES by an L/R to M/S transform stage 12. The operation of the L/R to M/S transform stage 12 at the decoder side is inverse to the operation of the transform stage 2 at the encoder side. Preferably, the transform stage 12 determines the downmix signal DMX and residual signal RES according to the following equations:

$$DMX = \frac{1}{2g}\,(L_p + R_p)$$

$$RES = \frac{1}{2g}\,(L_p - R_p)$$

In the above equations, the gain normalization factor g is identical to the gain normalization factor g at the encoder side and has e.g. a value of g = √(1/2).
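The two transform stages can be checked against each other with a short round trip. The sketch below applies the encoder-side equations for Lp, Rp and then the decoder-side equations above; the function names are hypothetical, and the perceptual encoder and decoder in between are assumed to be transparent.

```python
import math

def transform_stage_2(dmx, res, g):
    """Encoder side: Lp = g*(DMX + RES), Rp = g*(DMX - RES)."""
    return g * (dmx + res), g * (dmx - res)

def transform_stage_12(lp, rp, g):
    """Decoder side: DMX = (Lp + Rp)/(2g), RES = (Lp - Rp)/(2g)."""
    return (lp + rp) / (2.0 * g), (lp - rp) / (2.0 * g)

g = math.sqrt(0.5)
lp, rp = transform_stage_2(0.8, -0.3, g)
dmx, res = transform_stage_12(lp, rp, g)
# dmx and res match the inputs 0.8 and -0.3 up to rounding.
```

The round trip holds for any non-zero g, provided the same value is used on both sides.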
The downmix signal DMX and residual signal RES are then processed by the PS decoder 13 to obtain the final L and R output signals. The upmix step in the decoding process for PS coding with a residual can be described by means of the 2·2 upmix matrix H that converts the downmix signal DMX and residual signal RES back to the L and R channels:

$$\begin{pmatrix} L \\ R \end{pmatrix} = H \begin{pmatrix} DMX \\ RES \end{pmatrix}$$

The computation of the elements of the upmix matrix H was already discussed above.
The PS encoding and PS decoding process in the PS encoder 1 and the PS decoder 13 is preferably carried out in an oversampled frequency domain. For the time-to-frequency transform, e.g. a complex-valued hybrid filter bank having a QMF (quadrature mirror filter) and a Nyquist filter may be used upstream of the PS encoder, such as the filter bank described in the MPEG Surround standard (see document ISO/IEC 23003-1). The complex QMF representation of the signal is oversampled with factor 2 since it is complex-valued and not real-valued. This allows for time and frequency adaptive signal processing without audible aliasing artifacts. Such a hybrid filter bank typically provides high frequency resolution (narrow band) at low frequencies, while at high frequency, several QMF bands are grouped into a wider band. The paper "Low Complexity Parametric Stereo Coding in MPEG-4", H. Purnhagen, Proc. of the 7th Int. Conference on Digital Audio Effects (DAFx'04), Naples, Italy, October 5-8, 2004, pages 163-168 describes an embodiment of a hybrid filter bank (see section 3.2 and Fig. 4). This disclosure is hereby incorporated by reference. In this document a 48 kHz sampling rate is assumed, with the (nominal) bandwidth of a band from a 64 band QMF bank being 375 Hz. The perceptual Bark frequency scale however asks for a bandwidth of approximately 100 Hz for frequencies below 500 Hz. Hence, the first 3 QMF bands may be split into further, narrower subbands by means of a Nyquist filter bank. The first QMF band may be split into 4 bands (plus two more for negative frequencies), and the 2nd and 3rd QMF bands may be split into two bands each.
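The quoted bandwidth figures follow from simple arithmetic, sketched below. The sketch assumes that the 64 QMF bands uniformly cover the range from 0 Hz to half the sampling rate and that a split QMF band is divided evenly among its positive-frequency subbands; both are simplifications for illustration rather than a description of the actual filter design.

```python
SAMPLE_RATE_HZ = 48_000
NUM_QMF_BANDS = 64

# Nominal width of one QMF band: half the sampling rate shared by 64 bands.
qmf_band_hz = SAMPLE_RATE_HZ / 2 / NUM_QMF_BANDS

# Positive-frequency subbands per QMF band after the additional split:
# 4 for the first band, 2 each for the second and third band.
splits = {0: 4, 1: 2, 2: 2}
sub_band_hz = {band: qmf_band_hz / count for band, count in splits.items()}
# The first QMF band then yields subbands of a little under 100 Hz each.
```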
Preferably, the adaptive L/R or M/S encoding, on the other hand, is carried out in the critically sampled MDCT domain (e.g. as described in AAC) in order to ensure an efficient quantized signal representation. The conversion of the downmix signal DMX and residual signal RES to the pseudo stereo signal Lp, Rp in the transform stage 2 may be carried out in the time domain since the PS encoder 1 and the perceptual encoder 3 may be connected in the time domain anyway. Also in the decoding system, the perceptual stereo decoder 11 and the PS decoder 13 are preferably connected in the time domain. Thus, the conversion of the pseudo stereo signal Lp, Rp to the downmix signal DMX and residual signal RES in the transform stage 12 may also be carried out in the time domain.
An adaptive L/R or M/S stereo coder such as shown as the encoder 3 in Fig. 1 is typically a perceptual audio coder that incorporates a psycho-acoustic model to enable high coding efficiency at low bitrates. An example for such an encoder is an AAC encoder, which employs transform coding in a critically sampled MDCT domain in combination with time- and frequency-variant quantization controlled by using a psycho-acoustic model. Also, the time- and frequency-variant decision between L/R and M/S coding is typically controlled with the help of perceptual entropy measures that are calculated using a psycho-acoustic model.
The perceptual stereo encoder (such as the encoder 3 in Fig. 1) operates on a pseudo L/R stereo signal (see Lp, Rp in Fig. 1). For optimizing the coding efficiency of the stereo encoder (in particular for making the right decision between L/R encoding and M/S encoding) it is advantageous to modify the psycho-acoustic control mechanism (including the control mechanism which decides between L/R and M/S stereo encoding and the control mechanism which controls the time- and frequency-variant quantization) in the perceptual stereo encoder in order to account for the signal modifications (pseudo L/R to DMX and RES conversion, followed by PS decoding) that are applied in the decoder when generating the final stereo output signal L, R. These signal modifications can affect binaural masking phenomena that are exploited in the psycho-acoustic control mechanisms. Therefore, these psycho-acoustic control mechanisms should preferably be adapted accordingly. For this, it can be beneficial if the psycho-acoustic control mechanisms have access not only to the pseudo L/R signal (see Lp, Rp in Fig. 1) but also to the PS parameters (see 5 in Fig. 1) and/or to the original stereo signal L, R. The access of the psycho-acoustic control mechanisms to the PS parameters and to the stereo signal L, R is indicated in Fig. 1 by the dashed lines. Based on this information, e.g. the masking threshold(s) may be adapted.
An alternative approach to optimize psycho-acoustic control is to augment the encoder system with a detector forming a deactivation stage that is able to effectively deactivate PS encoding when appropriate, preferably in a time- and frequency-variant manner. Deactivating PS encoding is e.g. appropriate when L/R stereo coding is expected to be beneficial or when the psycho-acoustic control would have problems encoding the pseudo L/R signal efficiently. PS encoding may be effectively deactivated by setting the downmix matrix H⁻¹ in such a way that the downmix matrix H⁻¹ followed by the transform (see stage 2 in Fig. 1) corresponds to the unity matrix (i.e. to an identity operation) or to the unity matrix times a factor. E.g. PS encoding may be effectively deactivated by forcing the PS parameters IID and/or ICC to IID = 0 dB and ICC = 0. In this case the pseudo stereo signal Lp, Rp corresponds to the stereo signal L, R as discussed above.
Such a detector controlling a PS parameter modification is shown in Fig. 4. Here, the detector 20 receives the PS parameters 5 determined by the parameter estimating stage 9. When the detector does not deactivate the PS encoding, the detector 20 passes the PS parameters through to the downmix stage 8 and to the multiplexer 7, i.e. in this case the PS parameters 5 correspond to the PS parameters 5' fed to the downmix stage. In case the detector detects that PS encoding is disadvantageous and PS encoding should be deactivated (for one or more frequency bands), the detector modifies the affected PS parameters 5 (e.g. sets the PS parameters IID and/or ICC to IID = 0 dB and ICC = 0) and feeds the modified PS parameters 5' to the downmix stage 8. The detector can optionally also consider the left and right signals L, R for deciding on a PS parameter modification (see dashed lines in Fig. 4).
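The pass-through and override behaviour of the detector can be sketched as follows. The per-band dictionary layout, the function name and the boolean `deactivate` flags are assumptions for illustration; how the detector actually reaches its decision is not modelled here.

```python
def modify_ps_parameters(params, deactivate):
    """Return the parameters 5' fed to the downmix stage.

    params: list of per-band dicts with keys "iid_db" and "icc".
    deactivate: list of booleans, True where PS encoding should be switched off.
    """
    modified = []
    for band_params, off in zip(params, deactivate):
        if off:
            # Forcing IID = 0 dB and ICC = 0 effectively deactivates PS coding.
            modified.append({"iid_db": 0.0, "icc": 0.0})
        else:
            # Otherwise the estimated parameters pass through unchanged.
            modified.append(dict(band_params))
    return modified

ps_params = [{"iid_db": 3.0, "icc": 0.9}, {"iid_db": -6.0, "icc": 0.2}]
ps_params_mod = modify_ps_parameters(ps_params, [False, True])
```

Working per band keeps the deactivation frequency-variant; calling the function once per parameter update would make it time-variant as well.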
In the following figures, the term QMF (quadrature mirror filter or filter bank) also includes a QMF subband filter bank in combination with a Nyquist filter bank, i.e. a hybrid filter bank structure. Furthermore, all values in the description below may be frequency dependent, e.g. different downmix and upmix matrices may be extracted for different frequency ranges. Furthermore, the residual coding may only cover part of the used audio frequency range (i.e. the residual signal is only coded for a part of the used audio frequency range). Aspects of downmix as will be outlined below may for some frequency ranges occur in the QMF domain (e.g. according to prior art), while for other frequency ranges only e.g. phase aspects will be dealt with in the complex QMF domain, whereas amplitude transformation is dealt with in the real-valued MDCT domain.
In Fig. 5, a conventional PS encoder system is depicted. Each of the stereo channels L, R is at first analyzed by a complex QMF 30 with M subbands, e.g. a QMF with M = 64 subbands. The subband signals are used to estimate PS parameters 5 and a downmix signal DMX in a PS encoder 31. The downmix signal DMX is used to estimate SBR (Spectral Bandwidth Replication) parameters 33 in an SBR encoder 32. The SBR encoder 32 extracts the SBR parameters 33 representing the spectral envelope of the original high band signal, possibly in combination with noise and tonality measures. As opposed to the PS encoder 31, the SBR encoder 32 does not affect the signal passed on to the core coder 34. The downmix signal DMX of the PS encoder 31 is synthesized using an inverse QMF 35 with N subbands. E.g. a complex QMF with N = 32 may be used, where only the 32 lowest subbands of the 64 subbands used by the PS encoder 31 and the SBR encoder 32 are synthesized. Thus, by using half the number of subbands for the same frame size, a time domain signal of half the bandwidth compared to the input is obtained and passed into the core coder 34. Due to the reduced bandwidth the sampling rate can be reduced to the half (not shown). The core encoder 34 performs perceptual encoding of the mono input signal to generate a bitstream 36. The PS parameters 5 are embedded in the bitstream 36 by a multiplexer (not shown).
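The relation between subband counts, bandwidth and sampling rate can be illustrated numerically. The 48 kHz input rate is an assumed example value, and `core_coder_rate` is a hypothetical helper; the sketch relies only on the subband counts M = 64 and N = 32 given above.

```python
def core_coder_rate(input_rate_hz, analysis_bands, synthesis_bands):
    """Sampling rate after synthesizing only the lowest `synthesis_bands` subbands."""
    return input_rate_hz * synthesis_bands / analysis_bands

input_rate_hz = 48_000                 # assumed example input sampling rate
core_rate_hz = core_coder_rate(input_rate_hz, 64, 32)
core_bandwidth_hz = core_rate_hz / 2   # half of the input bandwidth of 24 kHz
```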
Fig. 6 shows a further embodiment of an encoder system which combines PS coding using a residual with a stereo core coder 48, with the stereo core coder 48 being capable of adaptive L/R or M/S perceptual stereo coding. This embodiment is merely illustrative for the principles of the present application. It is understood that modifications and variations of the embodiment will be apparent to others skilled in the art. The input channels L, R representing the left and right original channels are analyzed by a complex QMF 30, in a similar way as discussed in connection with Fig. 5. In contrast to the PS encoder 31 in Fig. 5, the PS encoder 41 in Fig. 6 does not only output a downmix signal DMX but also outputs a residual signal RES. The downmix signal DMX is used by an SBR encoder 32 to determine SBR parameters 33 of the downmix signal DMX. A fixed DMX/RES to pseudo L/R transform (i.e. an M/S to L/R transform) is applied to the downmix DMX and the residual RES signals in a transform stage 2. The transform stage 2 in Fig. 6 corresponds to the transform stage 2 in Fig. 1. The transform stage 2 creates a "pseudo" left and right channel signal Lp, Rp for the core encoder 48 to operate on. In this embodiment, the inverse L/R to M/S transform is applied in the QMF domain, prior to the subband synthesis by filter banks 35. Preferably, the number N (e.g. N = 32) of subbands for the synthesis corresponds to half the number M (e.g. M = 64) of subbands used for the analysis and the core coder 48 operates at half the sampling rate. It should be noted that there is no restriction to use 64 subband channels for the QMF analysis in the encoder and 32 subbands for the synthesis; other values are possible as well, depending on which sampling rate is desired for the signal received by the core coder 48.
The core stereo encoder 48 performs perceptual encoding of the signal of the filter banks 35 to generate a bitstream signal 46. The PS parameters 5 are embedded in the bitstream signal 46 by a multiplexer (not shown). Optionally, the PS parameters and/or the original L/R input signal may be used by the core encoder 48. Such information indicates to the core encoder 48 how the PS encoder 41 rotated the stereo space. The information may guide the core encoder 48 how to control quantization in a perceptually optimal way. This is indicated in Fig. 6 by the dashed lines.
Fig. 7 illustrates a further embodiment of an encoder system which is similar to the embodiment in Fig. 6. In comparison to the embodiment of Fig. 6, in Fig. 7 the SBR encoder 42 is connected upstream of the PS encoder 41. In Fig. 7 the SBR encoder 42 has been moved prior to the PS encoder 41, thus operating on the left and right channels (here: in the QMF domain), instead of operating on the downmix signal DMX as in Fig. 6.
Due to the re-arrangement of the SBR encoder 42, the PS encoder 41 may be configured to operate not on the full bandwidth of the input signal but e.g. only on the frequency range below the SBR crossover frequency. In Fig. 7, the SBR parameters 43 are in stereo for the SBR range, and the output from the corresponding PS decoder, as will be discussed later on in connection with Fig. 15, produces a stereo source frequency range for the SBR decoder to operate on. This modification, i.e. connecting the SBR encoder module 42 upstream of the PS encoder module 41 in the encoder system and correspondingly placing the SBR decoder module after the PS decoder module in the decoder system (see Fig. 15), has the benefit that the use of a decorrelated signal for generating the stereo output can be reduced. Please note that in case no residual signal exists at all or for a particular frequency band, a decorrelated version of the downmix signal DMX is used instead in the PS decoder. However, a reconstruction based on a decorrelated signal reduces the audio quality. Thus, reducing the use of the decorrelated signal increases the audio quality.
This advantage of the embodiment in Fig. 7 in comparison to the embodiment in Fig. 6 will now be explained in more detail with reference to Figs. 8a to 8d.
In Fig. 8a, a time frequency representation of one of the two output channels L, R (at the decoder side) is visualized. In case of Fig. 8a, an encoder is used where the PS encoding module is placed in front of the SBR encoding module, such as the encoder in Fig. 5 or Fig. 6 (in the decoder the PS decoder is placed after the SBR decoder, see Fig. 14). Moreover, the residual is coded only in a low bandwidth frequency range 50, which is smaller than the frequency range 51 of the core coder. As evident from the spectrogram visualization in Fig. 8a, the frequency range 52 where a decorrelated signal is to be used by the PS decoder covers all of the frequency range apart from the lower frequency range 50 covered by the use of the residual signal. Moreover, the SBR covers a frequency range 53 starting significantly higher than that of the decorrelated signal. Thus, the entire frequency range separates into the following frequency ranges: in the lower frequency range (see range 50 in Fig. 8a), waveform coding is used; in the middle frequency range (see intersection of frequency ranges 51 and 52), waveform coding in combination with a decorrelated signal is used; and in the higher frequency range (see frequency range 53), an SBR regenerated signal which is regenerated from the lower frequencies is used in combination with the decorrelated signal produced by the PS decoder.
In Fig. 8b, a time frequency representation of one of the two output channels L, R (at the decoder side) is visualized for the case when the SBR encoder is connected upstream of the PS encoder in the encoder system (and the SBR decoder is located after the PS decoder in the decoder system). In Fig. 8b a low bitrate scenario is illustrated, with the residual signal bandwidth 60 (where residual coding is performed) being lower than the bandwidth of the core coder 61. Since the SBR decoding process operates on the decoder side after the PS decoder (see Fig. 15), the residual signal used for the low frequencies is also used for the reconstruction of at least a part (see frequency range 64) of the higher frequencies in the SBR range 63.
The advantage becomes even more apparent when operating on intermediate bitrates where the residual signal bandwidth approaches or is equal to the core coder bandwidth. In this case, the time frequency representation of Fig. 8a (where the order of PS encoding and SBR encoding as shown in Fig. 6 is used) results in the time frequency representation shown in Fig. 8c. In Fig. 8c, the residual signal essentially covers the entire lowband range 51 of the core coder; in the SBR frequency range 53 the decorrelated signal is used by the PS decoder. In Fig. 8d, the time frequency representation in case of the preferred order of the encoding/decoding modules (i.e. SBR encoding operating on a stereo signal before PS encoding, as shown in Fig. 7) is visualized. Here, the PS decoding module operates prior to the SBR decoding module in the decoder, as shown in Fig. 15. Thus, the residual signal is part of the low band used for high frequency reconstruction. When the residual signal bandwidth equals that of the mono downmix signal bandwidth, no decorrelated signal information will be needed to decode the output signal (see the full frequency range being hatched in Fig. 8d).
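The difference between the two module orderings can be illustrated with a minimal sketch. The function name, its parameters and the bandwidth values are assumptions chosen for illustration; the sketch only models the band in which the PS decoder itself has to fall back to a decorrelated signal, not how SBR patches the high band.

```python
def decorrelated_band(residual_bw_hz, core_bw_hz, full_bw_hz, sbr_after_ps):
    """Return (start_hz, end_hz) of the band where the PS decoder needs a
    decorrelated signal, or None if the residual covers the whole PS range.

    sbr_after_ps=False models the Fig. 6 / Fig. 14 ordering: the PS decoder
    runs after SBR and therefore on the full output bandwidth.
    sbr_after_ps=True models the Fig. 7 / Fig. 15 ordering: the PS decoder
    runs only on the core coder band and SBR extends the stereo result.
    """
    # The residual cannot extend beyond the core coder bandwidth.
    residual_bw_hz = min(residual_bw_hz, core_bw_hz)
    ps_upper_edge = core_bw_hz if sbr_after_ps else full_bw_hz
    if residual_bw_hz >= ps_upper_edge:
        return None
    return (residual_bw_hz, ps_upper_edge)


# Intermediate bitrate: residual bandwidth equals the core coder bandwidth.
print(decorrelated_band(8000, 8000, 20000, sbr_after_ps=False))
print(decorrelated_band(8000, 8000, 20000, sbr_after_ps=True))
```

With equal residual and core bandwidths, the first call corresponds to the situation of Fig. 8c and the second to that of Fig. 8d.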
In Fig. 9a, an embodiment of the stereo core encoder 48 with adaptively selectable L/R or M/S stereo encoding in the MDCT transform domain is illustrated. Such stereo encoder 48 may be used in Figs. 6 and 7. A mono core encoder 34 as shown in Fig. 5 can be considered as a special case of the stereo core encoder 48 in Fig. 9a, where only a single mono input channel is processed (i.e. where the second input channel, shown as dashed line in Fig. 9a, is not present).
In Fig. 9b, an embodiment of a more generalized encoder is illustrated. For mono signals, encoding can be switched between coding in a linear predictive domain (see block 71) and coding in a transform domain (see block 48). Such type of core coder introduces several coding methods which can adaptively be used dependent upon the characteristics of the input signal. Here, the coder can choose to code the signal using either an AAC style transform coder 48 (available for mono and stereo signals, with adaptively selectable L/R or M/S coding in case of stereo signals) or an AMR-WB+ (Adaptive Multi Rate - WideBand Plus) style core coder 71 (only available for mono signals). The AMR-WB+ core coder 71 evaluates the residual of a linear predictor 72, and in turn also chooses between a transform coding approach of the linear prediction residual or a classic speech coder ACELP (Algebraic Code Excited Linear Prediction) approach for coding the linear prediction residual. For deciding between the AAC style transform coder 48 and the AMR-WB+ style core coder 71, a mode decision stage 73 is used which decides based on the input signal between both coders 48 and 71.
The encoder 48 is a stereo AAC style MDCT based coder. When the mode decision 73 steers the input signal to use MDCT based coding, the mono input signal or the stereo input signals are coded by the AAC based MDCT coder 48. The MDCT coder 48 does an MDCT analysis of the one or two signals in MDCT stages 74. In case of a stereo signal, further, an M/S or L/R decision on a frequency band basis is performed in a stage 75 prior to quantization and coding. L/R stereo encoding or M/S stereo encoding is selectable in a frequency-variant manner. The stage 75 also performs a L/R to M/S transform. If M/S encoding is decided for a particular frequency band, the stage 75 outputs an M/S signal for this frequency band. Otherwise, the stage 75 outputs a L/R signal for this frequency band.
Hence, when the transform coding mode is used, the full efficiency of the stereo coding functionality of the underlying core coder can be used for stereo.
When the mode decision 73 steers the mono signal to the linear predictive domain coder 71, the mono signal is subsequently analyzed by means of linear predictive analysis in block 72. Subsequently, a decision is made on whether to code the LP residual by means of a time-domain ACELP style coder 76 or a TCX style coder 77 (Transform Coded eXcitation) operating in the MDCT domain. The linear predictive domain coder 71 does not have any inherent stereo coding capability. Hence, to allow coding of a stereo signal with the linear predictive domain coder 71, an encoder configuration similar to that shown in Fig. 5 can be used. In this configuration, a PS encoder generates PS parameters 5 and a mono downmix signal DMX, which is then encoded by the linear predictive domain coder.
Fig. 10 illustrates a further embodiment of an encoder system, wherein parts of Fig. 7 and Fig. 9 are combined in a new fashion. The DMX/RES to pseudo L/R block 2, as outlined in Fig. 7, is arranged within the AAC style downmix coder 70 prior to the stereo MDCT analysis 74. This embodiment has the advantage that the DMX/RES to pseudo L/R transform 2 is applied only when the stereo MDCT core coder is used. Hence, when the transform coding mode is used, the full efficiency of the stereo coding functionality of the underlying core coder can be used for stereo coding of the frequency range covered by the residual signal.
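The per-band M/S or L/R decision of a stage such as stage 75 can be sketched as follows. This is only an illustration under stated assumptions: the cost function `bit_cost` is a crude stand-in for whatever criterion a real core coder uses, and the function names, the gain factor 0.5 and the band layout are hypothetical.

```python
import math


def bit_cost(coeffs):
    # Assumed proxy for the coding cost of a band; not the criterion of any
    # particular codec.
    return sum(math.log2(1.0 + abs(x)) for x in coeffs)


def choose_stereo_mode(left, right, band_edges):
    """Pick "MS" or "LR" per band for two lists of MDCT coefficients."""
    decisions = []
    for start, stop in zip(band_edges[:-1], band_edges[1:]):
        l, r = left[start:stop], right[start:stop]
        # L/R to M/S transform (sum and difference, assumed gain 0.5).
        m = [0.5 * (a + b) for a, b in zip(l, r)]
        s = [0.5 * (a - b) for a, b in zip(l, r)]
        cost_lr = bit_cost(l) + bit_cost(r)
        cost_ms = bit_cost(m) + bit_cost(s)
        decisions.append("MS" if cost_ms < cost_lr else "LR")
    return decisions


# Band 0 has identical channels, band 1 has channels that never overlap.
left = [4, 4, 4, 4, 8, 0, 8, 0]
right = [4, 4, 4, 4, 0, 8, 0, 8]
print(choose_stereo_mode(left, right, [0, 4, 8]))
```

In this toy input, the first band favours M/S because the side signal vanishes, while the second band favours L/R.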
While the mode decision 73 in Fig. 9b operates either on the mono input signal or on the input stereo signal, the mode decision 73' in Fig. 10 operates on the downmix signal DMX and the residual signal RES. In case of a mono input signal, the mono signal can directly be used as the DMX signal, the RES signal is set to zero, and the PS parameters can default to IID = 0 dB and ICC = 1.
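The mono special case can be written down in a few lines. The function name and the dictionary keys are hypothetical; only the default values (RES set to zero, IID = 0 dB, ICC = 1) are taken from the text.

```python
def ps_inputs_for_mono(mono):
    """Map a mono input onto the DMX/RES/PS-parameter interface (sketch)."""
    dmx = list(mono)                 # the mono signal is used directly as DMX
    res = [0.0] * len(dmx)           # the residual is set to zero
    ps_params = {"iid_db": 0.0, "icc": 1.0}   # defaults named in the text
    return dmx, res, ps_params


dmx, res, params = ps_inputs_for_mono([0.1, -0.2, 0.3])
```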
When the mode decision 73' steers the downmix signal DMX to the linear predictive domain coder 71, the downmix signal DMX is subsequently analyzed by means of linear predictive analysis in block 72. Subsequently, a decision is made on whether to code the LP residual by means of a time-domain ACELP style coder 76 or a TCX style coder 77 (Transform Coded eXcitation) operating in the MDCT domain. The linear predictive domain coder 71 does not have any inherent stereo coding capability that can be used for coding the residual signal in addition to the downmix signal DMX. Hence, a dedicated residual coder 78 is employed for encoding the residual signal RES when the downmix signal DMX is encoded by the predictive domain coder 71. E.g. such coder 78 may be a mono AAC coder.
It should be noted that the coders 71 and 78 in Fig. 10 may be omitted (in this case the mode decision stage 73' is not necessary anymore).
Fig. 11a illustrates a detail of an alternative further embodiment of an encoder system which achieves the same advantage as the embodiment in Fig. 10. In contrast to the embodiment of Fig. 10, in Fig. 11a the DMX/RES to pseudo L/R transform 2 is placed after the MDCT analysis 74 of the core coder 70, i.e. the transform operates in the MDCT domain. The transform in block 2 is linear and time-invariant and thus can be placed after the MDCT analysis 74. The remaining blocks of Fig. 10 which are not shown in Fig. 11a can be optionally added in the same way in Fig. 11a. The MDCT analysis blocks 74 may be also alternatively placed after the transform block 2.
Fig. 11b illustrates an implementation of the embodiment in Fig. 11a. In Fig. 11b, an exemplary implementation of the stage 75 for selecting between M/S or L/R encoding is shown. The stage 75 comprises a sum and difference transform stage 98 (more precisely a L/R to M/S transform stage) which receives the pseudo stereo signal Lp, Rp. The transform stage 98 generates a pseudo mid/side signal Mp, Sp by performing an L/R to M/S transform. Except for a possible gain factor, the following applies: Mp = DMX and Sp = RES.
The stage 75 decides between L/R or M/S encoding. Based on the decision, either the pseudo stereo signal Lp, Rp or the pseudo mid/side signal Mp, Sp are selected (see selection switch) and encoded in AAC block 97. It should be noted that also two AAC blocks 97 may be used (not shown in Fig. 11b), with the first AAC block 97 assigned to the pseudo stereo signal Lp, Rp and the second AAC block 97 assigned to the pseudo mid/side signal Mp, Sp. In this case, the L/R or M/S selection is performed by selecting either the output of the first AAC block 97 or the output of the second AAC block 97.
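That the pseudo mid/side signal equals DMX/RES up to a gain factor can be checked numerically. The sketch below chains two sum and difference transforms, standing in for blocks 2 and 98; the helper name `sum_diff` and the gain values 1.0 and 0.5 are assumptions chosen so that the overall gain is exactly one.

```python
def sum_diff(a, b, c=1.0):
    """Sum and difference transform with gain c, applied sample by sample."""
    first = [c * (x + y) for x, y in zip(a, b)]
    second = [c * (x - y) for x, y in zip(a, b)]
    return first, second


dmx = [1.0, -2.0, 0.5]
res = [0.25, 0.0, -0.5]

lp, rp = sum_diff(dmx, res, c=1.0)   # DMX/RES to pseudo L/R (block 2)
mp, sp = sum_diff(lp, rp, c=0.5)     # L/R to M/S (stage 98)

print(mp == dmx, sp == res)
```

With other gain choices, Mp and Sp would be scaled copies of DMX and RES rather than exact copies.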
Fig. 11c shows an alternative to the embodiment in Fig. 11a. Here, no explicit transform stage 2 is used. Rather, the transform stage 2 and the stage 75 are combined in a single stage 75'. The downmix signal DMX and the residual signal RES are fed to a sum and difference transform stage 99 (more precisely a DMX/RES to pseudo L/R transform stage) as part of stage 75'. The transform stage 99 generates a pseudo stereo signal Lp, Rp. The DMX/RES to pseudo L/R transform stage 99 in Fig. 11c is similar to the L/R to M/S transform stage 98 in Fig. 11b (except for a possibly different gain factor). Nevertheless, in Fig. 11c the selection between M/S and L/R decoding needs to be inverted in comparison to Fig. 11b. Note that in both Fig. 11b and Fig. 11c, the position of the switch for the L/R or M/S selection is shown in Lp/Rp position, which is the upper one in Fig. 11b and the lower one in Fig. 11c. This visualizes the notion of the inverted meaning of the L/R or M/S selection.
It should be noted that the switch in Figs. 11b and 11c preferably exists individually for each frequency band in the MDCT domain such that the selection between L/R and M/S can be both time- and frequency-variant. In other words: the position of the switch is preferably frequency-variant. The transform stages 98 and 99 may transform the whole used frequency range or may only transform a single frequency band.
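A combined stage of the kind shown in Fig. 11c, with one switch per frequency band, might look as follows. This is a sketch only: the function name, the band layout and the gain `c` are assumptions. It makes the inverted switch meaning visible, since choosing M/S means bypassing the transform.

```python
def stage_75_prime(dmx, res, band_edges, use_ms_per_band, c=1.0):
    """Combined transform and selection stage with a per-band switch."""
    first, second = [], []
    bands = zip(band_edges[:-1], band_edges[1:])
    for (start, stop), use_ms in zip(bands, use_ms_per_band):
        for d, r in zip(dmx[start:stop], res[start:stop]):
            if use_ms:
                # Pseudo M/S equals DMX/RES up to gain, so nothing is applied.
                first.append(d)
                second.append(r)
            else:
                # Pseudo L/R is obtained by the sum and difference transform.
                first.append(c * (d + r))
                second.append(c * (d - r))
    return first, second


out = stage_75_prime([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 1.0, 1.0],
                     band_edges=[0, 2, 4], use_ms_per_band=[True, False])
print(out)
```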
Moreover, it should be noted that all blocks 2, 98 and 99 can be called "sum and difference transform blocks" since all blocks implement a transform matrix in the form of

$$ c \cdot \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} $$

Merely, the gain factor c may be different in the blocks 2, 98, 99.
In Fig. 12, a further embodiment of an encoder system is outlined. It uses an extended set of PS parameters which, in addition to IID and ICC (described above), includes two further parameters IPD (inter channel phase difference, see φ_ipd below) and OPD (overall phase difference, see φ_opd below) that allow to characterize the phase relationship between the two channels L and R of a stereo signal. An example for these phase parameters is given in ISO/IEC 14496-3 subclause 8.6.4.6.3 which is hereby incorporated by reference. When phase parameters are used, the resulting upmix matrix H_COMPLEX (and its inverse H_COMPLEX^{-1}) becomes complex-valued, according to:

$$ H_{COMPLEX} = H_{\varphi} \cdot H, $$

where

$$ H_{\varphi} = \begin{pmatrix} \exp(j\varphi_1) & 0 \\ 0 & \exp(j\varphi_2) \end{pmatrix}, $$

and where the phase angles φ1 and φ2 are derived from the phase parameters φ_ipd and φ_opd.
The stage 80 of the PS encoder which operates in the complex QMF domain only takes care of phase dependencies between the channels L, R. The downmix rotation (i.e. the transformation from the L/R domain to the DMX/RES domain which was described by the matrix H^{-1} above) is taken care of in the MDCT domain as part of the stereo core coder 81. Hence, the phase dependencies between the two channels are extracted in the complex QMF domain, while other, real-valued, waveform dependencies are extracted in the real-valued critically sampled MDCT domain as part of the stereo coding mechanism of the core coder used. This has the advantage that the extraction of linear dependencies between the channels can be tightly integrated in the stereo coding of the core coder (though, to prevent aliasing in the critically sampled MDCT domain, only for the frequency range that is covered by residual coding, possibly minus a "guard band" on the frequency axis).
The phase adjustment stage 80 of the PS encoder in Fig. 12 extracts phase related PS parameters, e.g. the parameters IPD (inter channel phase difference) and OPD (overall phase difference). Hence, the phase adjustment matrix H_φ^{-1} that it produces may be according to the following:

$$ H_{\varphi}^{-1} = \begin{pmatrix} \exp(-j\varphi_1) & 0 \\ 0 & \exp(-j\varphi_2) \end{pmatrix} $$
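A diagonal phase adjustment of this kind rotates each channel by its own angle and is undone by applying the opposite angles. The following sketch checks this for a single complex subband sample per channel; the function name and the angle values are arbitrary assumptions, and no mapping from IPD and OPD to the two angles is implied.

```python
import cmath


def apply_diag_phase(left, right, phi1, phi2):
    """Multiply the channel pair by diag(exp(j*phi1), exp(j*phi2))."""
    return left * cmath.exp(1j * phi1), right * cmath.exp(1j * phi2)


l, r = 0.8 + 0.1j, -0.3 + 0.4j      # one complex QMF sample per channel
phi1, phi2 = 0.7, -0.2              # illustrative angles in radians

# Encoder side: phase adjustment with negated angles.
l_adj, r_adj = apply_diag_phase(l, r, -phi1, -phi2)
# Decoder side: the phase rotation is re-applied.
l_back, r_back = apply_diag_phase(l_adj, r_adj, phi1, phi2)

print(abs(l_back - l) < 1e-12, abs(r_back - r) < 1e-12)
```

The rotation changes only the phase, so the magnitude of each sample is preserved by both steps.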
As discussed before, the downmix rotation part of the PS module is dealt with in the stereo coding module 81 of the core coder in Fig. 12. The stereo coding module 81 operates in the MDCT domain and is shown in Fig. 13. The stereo coding module 81 receives the phase adjusted stereo signal L_φ, R_φ in the MDCT domain. This signal is downmixed in a downmix stage 82 by a downmix rotation matrix H^{-1} which is the real-valued part of a complex downmix matrix H_COMPLEX^{-1} as discussed above, thereby generating the downmix signal DMX and residual signal RES. The downmix operation is followed by the inverse L/R to M/S transform according to the present application (see transform stage 2), thereby generating a pseudo stereo signal Lp, Rp. The pseudo stereo signal Lp, Rp is processed by the stereo coding algorithm (see adaptive M/S or L/R stereo encoder 83), in this particular embodiment a stereo coding mechanism that depending on perceptual entropy criteria decides to code either an L/R representation or an M/S representation of the signal. This decision is preferably time- and frequency-variant.
In Fig. 14 an embodiment of a decoder system is shown which is suitable to decode a bitstream 46 as generated by the encoder system shown in Fig. 6. This embodiment is merely illustrative for the principles of the present application. It is understood that modifications and variations of the embodiment will be apparent to others skilled in the art. A core decoder 90 decodes the bitstream 46 into pseudo left and right channels, which are transformed in the QMF domain by filter banks 91. Subsequently, a fixed pseudo L/R to DMX/RES transform of the resulting pseudo stereo signal Lp, Rp is performed in transform stage 12, thus creating a downmix signal DMX and a residual signal RES. When using SBR coding, these signals are low band signals, e.g. the downmix signal DMX and residual signal RES may only contain audio information for the low frequency band up to approximately 8 kHz. The downmix signal DMX is used by an SBR decoder 93 to reconstruct the high frequency band based on received SBR parameters (not shown). Both the output signal (including the low and reconstructed high frequency bands of the downmix signal DMX) from the SBR decoder 93 and the residual signal RES are input to a PS decoder 94 operating in the QMF domain (in particular in the hybrid QMF+Nyquist filter domain). The downmix signal DMX at the input of the PS decoder 94 also contains audio information in the high frequency band (e.g. up to 20 kHz), whereas the residual signal RES at the input of the PS decoder 94 is a low band signal (e.g. limited up to 8 kHz). Thus, for the high frequency band (e.g. for the band from 8 kHz to 20 kHz), the PS decoder 94 uses a decorrelated version of the downmix signal DMX instead of using the band limited residual signal RES. The decoded signals at the output of the PS decoder 94 are therefore based on a residual signal only up to 8 kHz. After PS decoding, the two output channels of the PS decoder 94 are transformed in the time domain by filter banks 95, thereby generating the output stereo signal L, R.
In Fig. 15 an embodiment of a decoder system is shown which is suitable to decode the bitstream 46 as generated by the encoder system shown in Fig. 7. This embodiment is merely illustrative for the principles of the present application. It is understood that modifications and variations of the embodiment will be apparent to others skilled in the art. The principle operation of the embodiment in Fig. 15 is similar to that of the decoder system outlined in Fig. 14. In contrast to Fig. 14, the SBR decoder 96 in Fig. 15 is located at the output of the PS decoder 94. Moreover, the SBR decoder makes use of SBR parameters (not shown) forming stereo envelope data in contrast to the mono SBR parameters in Fig. 14. The downmix and residual signal at the input of the PS decoder 94 are typically low band signals, e.g. the downmix signal DMX and residual signal RES may contain audio information only for the low frequency band, e.g. up to approximately 8 kHz. Based on the low band downmix signal DMX and residual signal RES, the PS decoder 94 determines a low band stereo signal, e.g. up to approximately 8 kHz. Based on the low band stereo signal and stereo SBR parameters, the SBR decoder 96 reconstructs the high frequency part of the stereo signal. In comparison to the embodiment in Fig. 14, the embodiment in Fig. 15 offers the advantage that no decorrelated signal is needed (see also Fig. 8d) and thus an enhanced audio quality is achieved, whereas in Fig. 14 for the high frequency part a decorrelated signal is needed (see also Fig. 8c), thereby reducing the audio quality.
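The two decoder layouts differ mainly in the order of the SBR and PS stages. The sketch below lists the stage order for both; the function name and the string labels are hypothetical, and the assumption that Fig. 15 shares the front-end and synthesis stages of Fig. 14 is made here for illustration only.

```python
def decoder_stage_order(config):
    """Return the processing order of the decoder stages (sketch)."""
    front = ["core decoder 90", "QMF analysis 91", "pseudo L/R to DMX/RES 12"]
    if config == "fig14":
        # SBR extends the mono downmix before PS creates the stereo signal.
        middle = ["SBR decoder 93 (mono)", "PS decoder 94"]
    elif config == "fig15":
        # PS creates a low band stereo signal that SBR then extends.
        middle = ["PS decoder 94", "SBR decoder 96 (stereo)"]
    else:
        raise ValueError("unknown configuration: " + config)
    return front + middle + ["QMF synthesis 95"]


for name in ("fig14", "fig15"):
    print(name, decoder_stage_order(name))
```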
Fig. 16a shows an embodiment of a decoding system which is inverse to the encoding system shown in Fig. 11a. The incoming bitstream signal is fed to a decoder block 100, which generates a first decoded signal 102 and a second decoded signal 103. At the encoder either M/S coding or L/R coding was selected. This is indicated in the received bitstream. Based on this information, either M/S or L/R is selected in the selection stage 101. In case M/S was selected in the encoder, the first 102 and second 103 signals are converted into a (pseudo) L/R signal. In case L/R was selected in the encoder, the first 102 and second 103 signals may pass the stage 101 without transformation. The pseudo L/R signal Lp, Rp at the output of stage 101 is converted into a DMX/RES signal by the transform stage 12 (this stage quasi performs a L/R to M/S transform). Preferably, the stages 100, 101 and 12 in Fig. 16a operate in the MDCT domain. For transforming the downmix signal DMX and residual signal RES into the time domain, conversion blocks 104 may be used. Thereafter, the resulting signal is fed to a PS decoder (not shown) and optionally to an SBR decoder as shown in Figs. 14 and 15. The blocks 104 may be also alternatively placed before block 12.
Fig. 16b illustrates an implementation of the embodiment in Fig. 16a. In Fig. 16b, an exemplary implementation of the stage 101 for selecting between M/S or L/R decoding is shown. The stage 101 comprises a sum and difference transform stage 105 (M/S to L/R transform) which receives the first 102 and second 103 signals.
Based on the encoding information given in the bitstream, the stage 101 selects either L/R or M/S decoding. When L/R decoding is selected, the output signal of the decoding block 100 is fed to the transform stage 12.
Fig. 16c shows an alternative to the embodiment in Fig. 16a. Here, no explicit transform stage 12 is used. Rather, the transform stage 12 and the stage 101 are merged in a single stage 101'. The first 102 and second 103 signals are fed to a sum and difference transform stage 105' (more precisely a pseudo L/R to DMX/RES transform stage) as part of stage 101'. The transform stage 105' generates a DMX/RES signal. The transform stage 105' in Fig. 16c is similar or identical to the transform stage 105 in Fig. 16b (except for a possibly different gain factor). In Fig. 16c the selection between M/S and L/R decoding needs to be inverted in comparison to Fig. 16b. In Fig. 16c the switch is in the lower position, whereas in Fig. 16b the switch is in the upper position. This visualizes the inversion of the L/R or M/S selection (the selection signal may be simply inverted by an inverter).
It should be noted that the switch in Figs. 16b and 16c preferably exists individually for each frequency band in the MDCT domain such that the selection between L/R and M/S can be both time- and frequency-variant. The transform stages 105 and 105' may transform the whole used frequency range or may only transform a single frequency band.
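The equivalence of the two-stage decoder path (stage 101 followed by stage 12) and the merged stage 101' can be checked per coefficient. In this sketch the function names and the gain factors 1.0 and 0.5 are assumptions, chosen so that both paths produce identical values rather than values that differ by a gain.

```python
def sum_diff(a, b, c):
    """Sum and difference transform of one coefficient pair with gain c."""
    return c * (a + b), c * (a - b)


def decode_two_stage(first, second, ms_selected):
    # Stage 101 as in Fig. 16b: M/S to L/R only if M/S was signalled.
    lp, rp = sum_diff(first, second, 1.0) if ms_selected else (first, second)
    # Stage 12: pseudo L/R to DMX/RES.
    return sum_diff(lp, rp, 0.5)


def decode_merged(first, second, ms_selected):
    # Stage 101' as in Fig. 16c: the switch meaning is inverted, so the
    # transform is applied only if L/R was signalled.
    return (first, second) if ms_selected else sum_diff(first, second, 0.5)


for ms in (True, False):
    print(ms, decode_two_stage(3.0, 1.0, ms), decode_merged(3.0, 1.0, ms))
```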
Fig. 17 shows a further embodiment of an encoding system for coding a stereo signal L, R into a bitstream signal. The encoding system comprises a downmix stage 8 for generating a downmix signal DMX and a residual signal RES based on the stereo signal. Further, the encoding system comprises a parameter determining stage 9 for determining one or more parametric stereo parameters 5. Further, the encoding system comprises means 110 for perceptual encoding downstream of the downmix stage 8. The encoding is selectable:
- encoding based on a sum signal of the downmix signal DMX and the residual signal RES and based on a difference signal of the downmix signal DMX and the residual signal RES, or
- encoding based on the downmix signal DMX and the residual signal RES.
Preferably, the selection is time- and frequency-variant.
The encoding means 110 comprise a sum and difference transform stage 111 which generates the sum and difference signals. Further, the encoding means 110 comprise a selection block 112 for selecting encoding based on the sum and difference signals or based on the downmix signal DMX and the residual signal RES. Furthermore, an encoding block 113 is provided. Alternatively, two encoding blocks 113 may be used, with the first encoding block 113 encoding the DMX and RES signals and the second encoding block 113 encoding the sum and difference signals. In this case the selection 112 is downstream of the two encoding blocks 113.
The sum and difference transform in block 111 is of the form

$$ c \cdot \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} $$

The transform block 111 may correspond to transform block 99 in Fig. 11c.
The output of the perceptual encoder 110 is combined with the parametric stereo parameters 5 in the multiplexer 7 to form the resulting bitstream 6.
In contrast to the structure in Fig. 17, encoding based on the downmix signal DMX and residual signal RES may be realized when encoding a resulting signal which is generated by transforming the downmix signal DMX and residual signal RES by two serial sum and difference transforms as shown in Fig. 11b (see the two transform blocks 2 and 98). The resulting signal after two sum and difference transforms corresponds to the downmix signal DMX and residual signal RES (except for a possible different gain factor).
Fig. 18 shows an embodiment of a decoder system which is inverse to the encoder system in Fig. 17. The decoder system comprises means 120 for perceptual decoding based on the bitstream signal. Before decoding, the PS parameters are separated from the bitstream signal 6 in demultiplexer 10. The decoding means 120 comprise a core decoder 121 which generates a first signal 122 and a second signal 123 (by decoding). The decoding means output a downmix signal DMX and a residual signal RES.
The downmix signal DMX and the residual signal RES are selectively
- based on the sum of the first signal 122 and of the second signal 123 and based on the difference of the first signal 122 and of the second signal 123, or
- based on the first signal 122 and based on the second signal 123.
Preferably, the selection is time- and frequency-variant. The selection is performed in the selection stage 12.
The decoding means 120 comprise a sum and difference transform stage 124 which generates sum and difference signals.
The sum and difference transform in block 124 is of the form

$$ c \cdot \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} $$

The transform block 124 may correspond to transform block 105' in Fig. 16c.
After selection, the DMX and RES signals are fed to an upmix stage 126 for generating the stereo signal L, R based on the downmix signal DMX and the residual signal RES. The upmix operation is dependent on the PS parameters 5.
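The selectable encoding of Fig. 17 and its inverse in Fig. 18 can be sketched for one coefficient pair in one frequency band. The function names and the gain factors (1 at the encoder, 0.5 at the decoder) are assumptions; quantization and the actual perceptual coder are left out, so the round trip is exact.

```python
def encode_band(dmx, res, use_sum_diff):
    """Encoder-side selection: code sum/difference or DMX/RES directly."""
    if use_sum_diff:
        return dmx + res, dmx - res
    return dmx, res


def decode_band(first, second, use_sum_diff):
    """Decoder-side selection, driven by the same flag from the bitstream."""
    if use_sum_diff:
        return 0.5 * (first + second), 0.5 * (first - second)
    return first, second


for flag in (True, False):
    coded = encode_band(1.5, -0.25, flag)
    print(flag, coded, decode_band(coded[0], coded[1], flag))
```

In both switch positions the decoder recovers the original DMX and RES values, which are then passed on to the upmix stage.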
Preferably, in Figs. 17 and 18 the selection is frequency-variant. In Fig. 17, e.g. a time to frequency transform (e.g. by an MDCT or analysis filter bank) may be performed as first step in the perceptual encoding means 110. In Fig. 18, e.g. a frequency to time transform (e.g. by an inverse MDCT or synthesis filter bank) may be performed as the last step in the perceptual decoding means 120.
It should be noted that in the above-described embodiments, the signals, parameters and matrices may be frequency-variant or frequency-invariant and/or time-variant or time-invariant. The described computing steps may be carried out frequency-wise or for the complete audio band.
Moreover, it should be noted that the various sum and difference transforms, i.e. the DMX/RES to pseudo L/R transform, the pseudo L/R to DMX/RES transform, the L/R to M/S transform and the M/S to L/R transform, are all of the form

$$ c \cdot \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} $$

Merely, the gain factor c may be different. Therefore, in principle, each of these transforms may be exchanged by a different transform of these transforms. If the gain is not correct during the encoding processing, this may be compensated in the decoding process. Moreover, when placing two same or two different of the sum and difference transforms in series, the resulting transform corresponds to the identity matrix (possibly multiplied by a gain factor).
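The statement about two sum and difference transforms in series can be verified by multiplying the matrices. The gain factors are written here as c_1 and c_2, symbols introduced only for this illustration:

```latex
c_1 \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
c_2 \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= c_1 c_2 \begin{pmatrix} 1 \cdot 1 + 1 \cdot 1 & 1 \cdot 1 + 1 \cdot (-1) \\ 1 \cdot 1 + (-1) \cdot 1 & 1 \cdot 1 + (-1) \cdot (-1) \end{pmatrix}
= 2 c_1 c_2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
```

The product is therefore the identity matrix scaled by 2 c_1 c_2, and it is exactly the identity when c_1 c_2 = 1/2, for example for c_1 = 1 and c_2 = 1/2.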
In an encoder system comprising both a PS encoder and a SBR encoder, different PS/SBR configurations are possible. In a first configuration, shown in Fig. 6, the SBR encoder 32 is connected downstream of the PS encoder 41. In a second configuration, shown in Fig. 7, the SBR encoder 42 is connected upstream of the PS encoder 41. Depending upon e.g. the desired target bitrate, the properties of the core encoder, and/or one or more various other factors, one of the configurations can be preferred over the other in order to provide best performance. Typically, for lower bitrates, the first configuration can be preferred, while for higher bitrates, the second configuration can be preferred. Hence, it is desirable if an encoder system supports both different configurations to be able to choose a preferred configuration depending upon e.g. desired target bitrate and/or one or more other criteria.
Also in a decoder system comprising both a PS decoder and a SBR decoder, different PS/SBR configurations are possible. In a first configuration, shown in Fig. 14, the SBR decoder 93 is connected upstream of the PS decoder 94. In a second configuration, shown in Fig. 15, the SBR decoder 96 is connected downstream of the PS decoder 94. In order to achieve correct operation, the configuration of the decoder system has to match that of the encoder system. If the encoder is configured according to Fig. 6, then the decoder is correspondingly configured according to Fig. 14. If the encoder is configured according to Fig. 7, then the decoder is correspondingly configured according to Fig. 15. In order to ensure correct operation, the encoder preferably signals to the decoder which PS/SBR configuration was chosen for encoding (and thus which PS/SBR configuration is to be chosen for decoding). Based on this information, the decoder selects the appropriate decoder configuration.
As discussed above, in order to ensure correct decoder operation, there is preferably a mechanism to signal from the encoder to the decoder which configuration is to be used in the decoder. This can be done explicitly (e.g. by means of a dedicated bit or field in the configuration header of the bitstream as discussed below) or implicitly (e.g. by checking whether the SBR data is mono or stereo in case of PS data being present).
As discussed above, to signal the chosen PS/SBR configuration, a dedicated element in the bitstream header of the bitstream conveyed from the encoder to the decoder may be used. Such a bitstream header carries necessary configuration information that is needed to enable the decoder to correctly decode the data in the bitstream. The dedicated element in the bitstream header may be e.g. a one bit flag, a field, or it may be an index pointing to a specific entry in a table that specifies different decoder configurations.
Instead of including in the bitstream header an additional dedicated element for signaling the PS/SBR configuration, information already present in the bitstream may be evaluated at the decoding system for selecting the correct PS/SBR configuration. E.g. the chosen PS/SBR configuration may be derived from bitstream header configuration information for the PS decoder and SBR decoder. This configuration information typically indicates whether the SBR decoder is to be configured for mono operation or stereo operation. If, for example, a PS decoder is enabled and the SBR decoder is configured for mono operation (as indicated in the configuration information), the PS/SBR configuration according to Fig. 14 can be selected. If a PS decoder is enabled and the SBR decoder is configured for stereo operation, the PS/SBR configuration according to Fig. 15 can be selected.
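Both signaling variants can be combined in a small selection routine. This is a sketch under stated assumptions: the function name, the argument names and the polarity of the explicit flag are hypothetical and do not correspond to the syntax of any actual bitstream.

```python
def select_ps_sbr_config(ps_enabled, sbr_stereo, explicit_flag=None):
    """Choose the decoder layout from header information (sketch).

    explicit_flag models a dedicated one bit header element (assumed
    polarity: True selects the Fig. 15 layout). If it is None, the layout
    is derived implicitly from the PS and SBR configuration.
    """
    if explicit_flag is not None:
        return "fig15" if explicit_flag else "fig14"
    if not ps_enabled:
        return None   # no PS decoder, so no PS/SBR ordering to choose
    # PS with mono SBR: SBR before PS. PS with stereo SBR: SBR after PS.
    return "fig15" if sbr_stereo else "fig14"


print(select_ps_sbr_config(ps_enabled=True, sbr_stereo=False))
print(select_ps_sbr_config(ps_enabled=True, sbr_stereo=True))
```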
The above-described embodiments are merely illustrative for the principles of the present application. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, that the scope of the application is not limited by the specific details presented by way of description and explanation of the embodiments herein.
The systems and methods disclosed in the application may be implemented as software, firmware, hardware or a combination thereof. Certain components or all components may be implemented as software running on a digital signal processor or microprocessor, or implemented as hardware and/or as application specific integrated circuits.
Typical devices making use of the disclosed systems and methods are portable audio players, mobile communication devices, set-top boxes, TV sets, AVRs (audio-video receivers), personal computers etc.
The above references to and descriptions of prior proposals or products are not intended to be, and are not to be construed as, statements or admissions of common general knowledge in the art. In particular, the above prior art discussion does not relate to what is commonly or well known by the person skilled in the art, but assists in the understanding of the inventive step of the present invention of which the identification of pertinent prior art proposals is but one part.
Throughout the specification and claims the word "comprise" and its derivatives are intended to have an inclusive rather than exclusive meaning unless the contrary is expressly stated or the context requires otherwise. That is, the word "comprise" and its derivatives will be taken to indicate the inclusion of not only the listed components, steps or features that it directly references, but also other components, steps or features not specifically listed, unless the contrary is expressly stated or the context requires otherwise.

Claims (30)

1. A method for encoding a stereo input signal having a left channel and a right channel, the method comprising: selecting either a transform coding mode or a linear predictive coding mode as a selected coding mode; encoding the stereo input signal using the selected coding mode to produce an encoded output signal; and generating a bitstream signal including the encoded output signal, wherein, if the linear predictive coding mode is selected, the encoding comprises: downmixing the stereo input signal to a mono signal, the mono signal being a sum of the left channel and the right channel, estimating parameters for reconstructing the stereo input signal from the mono signal, encoding the mono signal using linear predictive coding to produce an encoded mono signal, and outputting the encoded mono signal and the parameters as the encoded output signal, wherein, if the transform coding mode is selected, the encoding comprises: analyzing the stereo input signal, selecting a stereo coding mode based on the analyzing, the selected stereo coding mode including either a mid/siding stereo coding mode or a left/right stereo coding mode, encoding the stereo input signal using the selected stereo coding mode in a first frequency band to produce an encoded stereo signal in a first frequency band, downmixing the stereo input signal to a mono signal in a second frequency band, encoding the mono signal in the second frequency band using transform coding to produce an encoded mono signal in the second frequency band, and outputting the encoded stereo signal in the first frequency band and the encoded mono signal in the second frequency band as the encoded output signal.
2. The method of claim 1 wherein the analyzing includes selecting which stereo coding mode would more efficiently code the stereo input signal.
3. The method of claim 1 wherein the analyzing includes applying both mid/side stereo coding and left/right stereo coding and selecting the stereo coding mode based on an estimated entropy for each stereo coding mode.
4. The method of claim 1 wherein the parameters are time and frequency variant.
5. The method of claim 1 wherein the selecting is dependent upon characteristics of the stereo input signal.
6. The method of claim 1 wherein the transform coding includes modified discrete cosine transform (MDCT) coding.
7. The method of claim 1 wherein the transform coding further comprises not encoding one or more subbands and generating side information for reconstruction of the one or more subbands.
8. The method of claim 7 wherein the side information includes a parameter used to determine a spectral envelope of the one or more subbands not encoded.
9. The method of claim 1 wherein the transform coding includes a psychoacoustic model.
10. The method of claim 1 wherein the estimating includes estimating parameters for reconstructing the stereo input signal from the mono signal in a plurality of frequency bands and generating estimated parameters for each of the plurality of frequency bands.
11. The method of claim 1 wherein a bandwidth of the first frequency band and a bandwidth of the second frequency band are determined based at least in part on a desired target bitrate.
12. The method of claim 1 wherein the linear predictive coding mode is selected when the stereo input signal is speech.
13. A non-transitory computer readable medium containing instructions that when executed by a processor perform the method of claim 1.
14. A device for encoding a stereo input signal having a left channel and a right channel to produce an encoded output signal, the device comprising: a mode selector for selecting either a transform coding mode or a linear predictive coding mode; a transform encoder for encoding the stereo input signal if the selected coding mode is transform coding; a linear predictive encoder for encoding the stereo input signal if the selected coding mode is linear predictive coding; and a bitstream generator for generating a bitstream signal including the encoded output signal, wherein the linear predictive encoder is configured to: downmix the stereo input signal to a mono signal, the mono signal being a sum of the left channel and the right channel, estimate parameters for reconstructing the stereo input signal from the mono signal, encode the mono signal using linear predictive coding to produce an encoded mono signal, and output the encoded mono signal and the estimated parameters as the encoded output signal, wherein the transform encoder is configured to: analyze the stereo input signal, select a stereo coding mode, the selected stereo coding mode including either a mid/side stereo coding mode or a left/right stereo coding mode, encode the stereo input signal using the selected stereo coding mode in a first frequency band to produce an encoded stereo signal in the first frequency band, downmix the stereo input signal to a mono signal in a second frequency band, encode the mono signal in the second frequency band using transform coding to produce an encoded mono signal in the second frequency band, and output the encoded stereo signal in the first frequency band and the encoded mono signal in the second frequency band as the encoded output signal.
15. A method for decoding a bitstream signal to produce a decoded output signal having a left channel and a right channel, the method comprising: extracting an encoded audio signal from the bitstream signal, the encoded audio signal generated using a selected coding mode, the selected coding mode including either a transform coding mode or a linear predictive coding mode; decoding the encoded audio signal using the selected coding mode to produce a decoded signal; and outputting the decoded signal as the decoded output signal, wherein, if the selected coding mode is the linear predictive coding mode, the decoding comprises: receiving a mono signal, the mono signal being a sum of the left channel and the right channel, decoding the mono signal using linear predictive decoding to produce a decoded mono signal, extracting parameters from the bitstream signal for reconstructing a stereo audio signal, reconstructing the stereo audio signal using the decoded mono signal and the parameters to produce a reconstructed stereo audio signal, and outputting the reconstructed stereo audio signal as the decoded signal, wherein, if the selected coding mode is the transform coding mode, the decoding comprises: receiving a stereo signal in a first frequency band, the stereo signal generated using a selected stereo coding mode, the selected stereo coding mode including either mid/side stereo coding or left/right stereo coding, receiving a mono signal in a second frequency band, decoding the stereo signal in the first frequency band using the selected stereo coding mode to produce a decoded stereo signal in the first frequency band, decoding the mono signal in the second frequency band using transform decoding to produce a decoded mono signal in the second frequency band, and outputting the decoded stereo signal in the first frequency band and the decoded mono signal in the second frequency band as the decoded signal.
16. The method of claim 15 wherein the parameters are time and frequency variant.
17. The method of claim 15 wherein the transform coding includes modified discrete cosine transform (MDCT) decoding.
18. The method of claim 15 wherein the transform coding further comprises extracting side information from the bitstream signal for reconstruction of one or more subbands not encoded.
19. The method of claim 18 wherein the side information includes a parameter used to determine a spectral envelope of the one or more subbands not encoded.
20. The method of claim 15 wherein the transform coding includes a psychoacoustic model.
21. The method of claim 15 wherein the parameters comprise parameters for each of a plurality of frequency bands.
22. The method of claim 15 wherein a bandwidth of the first frequency band and a bandwidth of the second frequency band are determined based at least in part on a desired target bitrate.
23. A device for decoding a bitstream signal to produce a decoded output signal having a left channel and a right channel, the device comprising: a demultiplexer for extracting an encoded audio signal from the bitstream signal, the encoded audio signal generated using a selected coding mode, the selected coding mode including either a transform coding mode or a linear predictive coding mode; a transform decoder for decoding the encoded audio signal if the selected coding mode is the transform coding mode; and a linear predictive decoder for decoding the encoded audio signal if the selected coding mode is the linear predictive coding mode, wherein the linear predictive decoder is configured to: receive a mono signal, the mono signal being a sum of the left channel and the right channel, decode the mono signal using linear predictive decoding to produce a decoded mono signal, extract parameters from the bitstream signal for reconstructing a stereo signal, reconstruct the stereo signal using the decoded mono signal and the parameters to produce a reconstructed stereo audio signal, and output the reconstructed stereo signal as the decoded output signal, wherein the transform decoder is configured to: receive a stereo signal in a first frequency band, the stereo signal generated using a selected stereo coding mode, the selected stereo coding mode including either a mid/side stereo coding mode or a left/right stereo coding mode, receive a mono signal in a second frequency band, decode the stereo signal in the first frequency band using the selected stereo coding mode to produce a decoded stereo signal in the first frequency band, decode the mono signal in the second frequency band using transform decoding to produce a decoded mono signal in the second frequency band, and output the decoded stereo signal in the first frequency band and the decoded mono signal in the second frequency band as the decoded output signal.
24. The device of claim 23 wherein the parameters are time and frequency variant.
25. The device of claim 23 wherein the transform coding includes modified discrete cosine transform (MDCT) decoding.
26. The device of claim 23 wherein the transform coding further comprises extracting side information from the bitstream signal for reconstruction of one or more subbands not encoded.
27. The device of claim 26 wherein the side information includes a parameter used to determine a spectral envelope of the one or more subbands not encoded.
28. The device of claim 23 wherein the transform coding includes a psychoacoustic model.
29. The device of claim 23 wherein the parameters comprise parameters for each of a plurality of frequency bands.
30. The device of claim 23 wherein a bandwidth of the first frequency band and a bandwidth of the second frequency band are determined based at least in part on a desired target bitrate.
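
By way of illustration only, and without limiting the claims, the transform-mode encoding steps recited in claims 1 and 3 may be sketched as follows. The sketch assumes the spectral coefficients of one frame are already available as plain lists, uses a uniform quantizer with an empirical-entropy bit estimate as a stand-in for the "estimated entropy", and adopts the common convention M = (L + R)/2, S = (L - R)/2. The function names, the `step` and `crossover` parameters, and the returned dictionary layout are all hypothetical and are not taken from the specification.

```python
import math
from collections import Counter


def to_mid_side(left, right):
    """Mid/side transform; the 1/2 scaling is an assumed convention."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def estimated_bits(coeffs, step=1.0):
    """Rough bit estimate: empirical entropy of uniformly quantized values."""
    symbols = [round(c / step) for c in coeffs]
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    # Sum over distinct symbols of count * (-log2 of relative frequency).
    return -sum(k * math.log2(k / n) for k in counts.values())


def select_stereo_mode(left, right, step=1.0):
    """Apply both candidate modes and keep the one with fewer estimated bits."""
    bits_lr = estimated_bits(left, step) + estimated_bits(right, step)
    mid, side = to_mid_side(left, right)
    bits_ms = estimated_bits(mid, step) + estimated_bits(side, step)
    # Ties fall back to left/right coding (an arbitrary choice for this sketch).
    return "MS" if bits_ms < bits_lr else "LR"


def encode_transform_frame(left, right, crossover, step=1.0):
    """Stereo-code bins below `crossover`; downmix the remaining bins to mono."""
    lo_l, lo_r = left[:crossover], right[:crossover]
    mode = select_stereo_mode(lo_l, lo_r, step)
    if mode == "MS":
        ch0, ch1 = to_mid_side(lo_l, lo_r)
    else:
        ch0, ch1 = list(lo_l), list(lo_r)
    # Second frequency band: mono downmix as the sum of left and right.
    mono = [l + r for l, r in zip(left[crossover:], right[crossover:])]
    return {"stereo_mode": mode, "band1": (ch0, ch1), "band2_mono": mono}
```

In this sketch, two identical channels yield an all-zero side signal, so mid/side coding is selected, whereas a frame in which one channel is silent tends to favour left/right coding. A practical encoder would additionally quantize and entropy-code the returned coefficients under a psychoacoustic model, which is omitted here.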
AU2019222947A 2009-03-17 2019-08-30 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding Active AU2019222947B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU2019222947A AU2019222947B2 (en) 2009-03-17 2019-08-30 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding
AU2021290344A AU2021290344B2 (en) 2009-03-17 2021-12-23 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding.
AU2022209299A AU2022209299A1 (en) 2009-03-17 2022-07-28 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US61/160,707 2009-03-17
US61/219,484 2009-06-23
AU2010225051A AU2010225051B2 (en) 2009-03-17 2010-03-05 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding
AU2013206557A AU2013206557B2 (en) 2009-03-17 2013-06-26 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding
AU2015246158A AU2015246158B2 (en) 2009-03-17 2015-10-23 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding.
AU2018200340A AU2018200340B2 (en) 2009-03-17 2018-01-15 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding
AU2019222947A AU2019222947B2 (en) 2009-03-17 2019-08-30 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU2018200340A Division AU2018200340B2 (en) 2009-03-17 2018-01-15 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2021290344A Division AU2021290344B2 (en) 2009-03-17 2021-12-23 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding.

Publications (2)

Publication Number Publication Date
AU2019222947A1 AU2019222947A1 (en) 2019-09-19
AU2019222947B2 AU2019222947B2 (en) 2021-10-14

Family

ID=54602193

Family Applications (3)

Application Number Title Priority Date Filing Date
AU2015246158A Active AU2015246158B2 (en) 2009-03-17 2015-10-23 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding.
AU2018200340A Active AU2018200340B2 (en) 2009-03-17 2018-01-15 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding
AU2019222947A Active AU2019222947B2 (en) 2009-03-17 2019-08-30 Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding

Country Status (1)

Country Link
AU (3) AU2015246158B2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005101370A1 (en) * 2004-04-16 2005-10-27 Coding Technologies Ab Apparatus and method for generating a level parameter and apparatus and method for generating a multi-channel representation
US20090210234A1 (en) * 2008-02-19 2009-08-20 Samsung Electronics Co., Ltd. Apparatus and method of encoding and decoding signals
US20100274557A1 (en) * 2007-11-21 2010-10-28 Hyen-O Oh Method and an apparatus for processing a signal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7573912B2 (en) * 2005-02-22 2009-08-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Near-transparent or transparent multi-channel encoder/decoder scheme

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Grill et al., "Scalable Joint Stereo Coding", 105th AES Convention, Sep. 1998 *

Also Published As

Publication number Publication date
AU2019222947A1 (en) 2019-09-19
AU2018200340B2 (en) 2019-07-11
AU2015246158B2 (en) 2017-10-26
AU2015246158A1 (en) 2015-11-19
AU2018200340A1 (en) 2018-02-08
