WO2009146734A1

WO2009146734A1 - Multi-channel audio coding

Info

Publication number: WO2009146734A1
Application number: PCT/EP2008/056813
Authority: WO
Inventors: Juha OJANPERÄ
Original assignee: Nokia Corporation
Priority date: 2008-06-03
Filing date: 2008-06-03
Publication date: 2009-12-10

Abstract

For supporting a reconstruction of a multi-channel audio signal, correlation status information is evaluated at an encoder side or a decoder side. The correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal. In case the correlation status information indicates that there is no significant correlation between the channels, a value representing a difference between channels in the segment of the multi-channel audio signal is modified.

Description

MuIti-channel audio coding

FIELD OF THE INVENTION

The invention relates to the field of multi-channel audio coding.

BACKGROUND OF THE INVENTION

Audio coding systems are used in particular for transmitting or storing audio signals.

In a basic structure of an audio coding system, which is employed for transmission of audio signals, the audio coding system comprises an encoder at a transmitting side and a decoder at a receiving side. The audio signal that is to be transmitted is provided to the encoder. The encoder is responsible for adapting the incoming audio data rate to a bitrate level at which the bandwidth conditions in the transmission channel are not violated. Ideally, the encoder discards only irrelevant information from the audio signal in this encoding process. The encoded audio signal is then transmitted by the transmitting side of the audio coding system and received at the receiving side of the audio coding system. The decoder at the receiving side reverses the encoding process to obtain a decoded audio signal with little or no audible degradation.

Alternatively, the audio coding system could be employed for archiving audio data. In that case, the encoded audio data provided by the encoder is stored in some storage unit, and the decoder decodes audio data retrieved from this storage unit. In this alternative, it is the target that the encoder achieves a bitrate which is as low as possible, in order to save storage space.

The original audio signal can be a mono audio signal or a multi-channel audio signal containing at least a first and a second channel signal. An example of a multichannel audio signal is a stereo audio signal, which is composed of a left channel signal and a right channel signal. Another example is an audio signal that is used for a surround technology and includes for example two stereo channels, an additional center channel and two surround channels.

Depending on the allowed bitrate, different encoding schemes can be applied to a multi-channel audio signal. The different channels can be encoded for instance independently from each other. But typically, a correlation exists between the different channels of a multi-channel audio signal, and the most advanced coding schemes exploit this correlation to achieve a further reduction in the bitrate.

Examples for reducing the bitrate for an encoded stereo audio signal comprise low bitrate stereo extension methods. In a stereo extension method, the stereo audio signal is encoded as a high bitrate mono signal, which is provided by the encoder together with some side information reserved for a stereo extension. In the decoder, the stereo audio signal is then reconstructed from the high bitrate mono signal in a stereo extension making use of the side information. The side information typically takes only a few kbps of the total bitrate. Parametric multi-channel audio coding methods, such as Binaural Cue Coding (BCC) , enable a high-quality multichannel reproduction with reasonable bit-rate compared to a scenario where all channels are encoded and transmitted separately. The compression of a spatial image is based on generating one or several down-mixed signals together with a set of spatial cues. The decoder uses the received down-mixed signals and the spatial cues to synthesize a set of channels - which can be different from the number of input channels - with spatial properties as described by the received spatial cues.

The spatial cues typically include an inter-channel level difference (ICLD), an inter-channel time difference

(ICTD) and an inter-channel coherence/correlation (ICC) .

ICLD and ICTD aim at describing the signals from the actual audio sources, whereas the ICC aims at enhancing the spatial sensation by introducing a diffuse component of the audio image, including reverberations, ambience, etc. These cues are normally provided for each frequency band separately.

The decoding side typically uses a filter that is controlled by the received ICC cues to recreate a coherence/correlation approximating the coherence/correlation which is present in the input signals .

SUMMARY OF SOME EMBODIMENTS OF THE INVENTION

A method is described, which comprises evaluating correlation status information, wherein the correlation status information indicates whether or not there is a - A -

significant correlation between channels in a segment of a multi-channel audio signal. The method further comprises modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.

Moreover, a first apparatus is described, which comprises a processor. The processor is configured to evaluate correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal. The processor is further configured to modify a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.

The apparatus may comprise for example exclusively the described processor, but it may also comprise additional components. The apparatus could further be for example a module provided for integration into an electronic device, like a processing component, a chip or a circuit implementing the processor, or it could be such a device itself. In the latter case, it could be for instance an electronic device, which comprises in addition an interface configured to receive captured multi-channel audio signals and/or an interface configured to output multi-channel audio signals.

Moreover, a second apparatus is described, which comprises means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multichannel audio signal, and means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.

The means of this apparatus can be implemented in hardware and/or software. They may comprise for instance a processor for executing computer program code for realizing the required functions, a memory storing the program code, or both. Alternatively, they could comprise for instance a circuit that is designed to realize the required functions, for instance implemented in a chipset or a chip, like an integrated circuit. It is to be understood that further or correspondingly adapted means may be comprised for realizing any of the functions that may optionally be implemented in any described embodiment of the first apparatus.

Moreover, a computer readable storage medium is described, in which computer program code is stored. The computer program code realizes the described method when executed by a processor. The computer readable storage medium could be for example a disk or a memory or the like. The computer program code could be stored in the computer readable storage medium in the form of instructions encoding the computer-readable storage medium. It is to be understood that also the computer program code by itself has to be considered an embodiment of the invention. Thus, certain embodiments of the invention provide that information about a correlation status is used as a decision criterion whether to apply a certain modification to a value indicating a difference between different channels of a multi-channel audio signal.

The considered audio signal can be for instance a speech signal, but equally any other kind of audio signal, like a music signal. The considered segment can be for instance a frame of an audio signal, but equally any other kind of segment, like a superframe or a subframe. An audio signal may comprise any number of segments, including one. The described processing can further be performed for example for each of a plurality of frequency bands in the segment of an audio signal, only for selected ones of a plurality of frequency bands, or on the entire frequency range of the segment of the audio signal as a whole. A selection of frequency bands could also differ from one segment to the next.

The multi-channel audio signal may comprise only two channels, for instance the left and right channel of a stereo signal, or any other number of channels, for instance five channels for a surround audio signal. The correlation information status could be derived for instance from an inter-channel correlation (ICC) cue obtained in a binaural cue coding, but it could be obtained in any other manner as well which is suited to indicate whether or not a significant correlation between channels is given. The difference value that may be modified could be for instance an inter-channel level difference (ICLD) cue obtained in a binaural cue coding, but it could equally be any other kind of value that can be modified to increase the decorrelation of the channels if appropriate.

In one embodiment of the described method, the described apparatuses or the described computer program code, the correlation status information is represented by a single bit.

In one embodiment of the described method or the described computer program code, the modifying of a value representing a difference between channels is based on equations using exclusively non-random values. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to realize a corresponding modification.

In one embodiment of the described method or the described computer program code, the modifying of a value representing a difference between channels is based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi-channel audio signal. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to realize a corresponding modification.

In one embodiment of the described method or the described computer program code, the modifying of a value representing a difference between channels is based alternatively or in addition on a value representing a mono audio signal, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates an dissimilar level of a signal in the channels. In this case, a value indicating the amount of the correlation itself might not required so that only the correlation status has to be provided. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to realize a corresponding modification.

In one embodiment of the described method, the modifying of a value representing a difference between channels is based alternatively or in addition on a correlation value indicating the amount of correlation between the channels, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates in contrast a similar level of a signal in the channels. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to realize a corresponding modification. In a corresponding embodiment of the described computer program code, the code may be implemented to realize a corresponding modification.

In one embodiment of the described method, the modifying of a value representing a difference between channels is performed at an encoder side, which generates the correlation status information and the value representing a difference between the channels. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is associated to an encoder side, which generates the correlation status information and the value representing a difference between the channels . In a corresponding embodiment of the described computer program code, the code is code for such an encoder side. In one embodiment of the described method, the modifying of a value representing a difference between channels is performed at a decoder side, which is provided with correlation status information and a value representing a difference between channels generated by an encoder side. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is associated to such a decoder side. In a corresponding embodiment of the described computer program code, the code is code for such a decoder side.

In a variation of this embodiment of the described method, the method comprises obtaining at the decoder side in addition information on frequency bands for which the correlation status information is valid. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to obtain such additional information. In a corresponding embodiment of the described computer program code, the code may be implemented to obtain such additional information.

In one embodiment, a method is an information providing method, comprising the steps of evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels. In a further embodiment, an apparatus is an information providing apparatus comprising processing means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and processing means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.

In one embodiment of the invention, one of the described apparatuses can be seen as an audio signal encoding or decoding apparatus.

It is to be understood that any feature presented for a particular exemplary embodiment may also be used in combination with any other described exemplary embodiment of any category.

Further, it is to be understood that the presentation of the invention in this section is merely exemplary and non-limiting.

Other objects and features of the present invention will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits of the invention, for which reference should be made to the appended claims . It should be further understood that the drawings are not drawn to scale and that they are merely intended to conceptually illustrate the structures and procedures described herein.

BRIEF DESCRIPTION OF THE FIGURES

Fig. 1 is a schematic block diagram of a coding system in which an exemplary embodiment of the invention is implemented;

Fig. 2 is a schematic block diagram presenting functional blocks of an exemplary encoder;

Fig. 3 is a schematic block diagram presenting functional blocks of an exemplary decoder; Fig. 4 is a flow chart illustrating an operation at an encoding side in the system of Figure 1; Fig. 5 is a flow chart illustrating an operation at a decoding side in the system of Figure 1; Fig. 6 is a schematic block diagram of an electronic device in which another exemplary embodiment of the invention is implemented; and Fig. 7 is a flow chart illustrating an operation in the electronic device of Figure 6.

DETAILED DESCRIPTION OF THE INVENTION

Figure 1 is a schematic diagram of an exemplary system which supports a correlation status controlled modification of inter-channel level differences.

The system comprises a first electronic device 110 and a second electronic device 120.

The first electronic device 110 can be for instance a mobile phone, but equally any other device which is to be able to encode audio data for storage or transmission, for example an audio recording device.

The device 110 comprises a processor 112 and, linked to this processor 112, a memory 113, an interface for receiving captured audio data 116, and a transmitter (TX) 117.

The processor 112 is configured to execute implemented computer program code.

The memory 113 stores computer program code 114, which may be retrieved by the processor 112 for execution. The stored program codes 114 comprise code for encoding audio data. It includes code for generating a mono signal, for generating stereo extension cues and for generating inter-channel correlation values and status. The memory 113 may comprise in addition a data storage portion 115.

The processor 112 and the memory 113 could optionally be integrated in a single component, for example on a chip 111.

The interface 116 could be for instance microphones or comprise a socket for connecting microphones.

The transmitter 117 could belong for example to a cellular engine of the device 110 and be configured to transmit data via a cellular communication network to other devices.

The second electronic device 120 can also be for instance a mobile phone, but equally any other device which is able to decode audio data for presentation to a user. The device 120 comprises a processor 122 and, linked to this processor 122, a memory 123, an interface for presenting audio data 126 to a user and a receiver (RX) 127.

The processor 122 is configured to execute implemented computer program code.

The memory 123 stores computer program code 124, which may be retrieved by the processor 122 for execution. The stored program codes 124 comprise code for decoding audio data. It includes code for modifying stereo extension values under control of an inter-channel correlation status, and for reconstructing a multi-channel audio signal. The memory 123 may comprise in addition a data storage portion (not shown) .

The processor 122 and the memory 123 could optionally be integrated in a single component, for example on a chip 121.

The interface 126 could comprise for instance loudspeakers or a socket for connecting loudspeakers.

The receiver 127 could belong for example to a cellular engine of the device 120 and be configured to receive data via a cellular communication network from other devices .

The interfaces 117 and 127 are configured in any case such that they enable device 110 to transmit encoded audio data to device 120, either directly on a wired or wireless link or indirectly via some communication network.

Figure 2 is a high-level block diagram of an encoder implemented by the program code 114 of device 110. It is to be understood that the block diagram could equally represent functional blocks of a hardware implementation of an encoder providing the same functions as the program code 114. The blocks are shown to process stereo data, but it has to be noted that the encoder may be adapted for processing audio signals with more than two channels.

The encoder includes a transform block 201 for transforming the data of a left channel of an audio signal 'L' from the time domain into the frequency domain. The resulting frequency domain signal is denoted 'Lf'. The encoder further includes a transform block 202 for transforming the data of a right channel of an audio signal 'R' from the time domain into the frequency domain. The resulting frequency domain signal is denoted

The encoder moreover includes a mono conversion block 203, which is configured to create a down-mixed signal by converting the stereo signal into a mono signal

M_f =0.5-(L_f +R_f) and to pass the mono signal to a mono encoder, for example to an embedded variable bitrate (EV- VBR) mono encoder 204. The mono conversion block 203 may be further configured to generate a difference signal, for example D_f =0.5-(L_f -R_f) _r from the stereo signal and to pass the difference signal to a stereo encoder 205 to assist the stereo encoding process. Optionally, a different way to create the down-mixed signal and the difference signal can be used, for example one comprising a linear combination of the input channels with possible phase correction.

The mono encoder 204 is configured to encode a received mono signal and to provide a resulting bitstream to a bitstream multiplexer 208. The stereo encoder 205 is configured to generate and encode stereo extension data, including a quantization to obtain a desired bitrate, and to provide a resulting bitstream to the multiplexer 208. Any kind of stereo encoder could be used to this end.

The encoder moreover includes further transformers 206, which are configured to transform the left and right channel signals L and R to the frequency domain, and a correlation encoder 207, which is configured to analyze the left and right channel signals in the frequency domain, to decide which of the spectral bands need decorrelation at a decoder side, and to pass corresponding correlation flags to the multiplexer 208. It is to be understood that in an alternative embodiment instead of employing separate transformers 206, also the output of transformers 201, 202 could be provided to correlation encoder 207.

Finally, the multiplexer 208 is configured to multiplex all received information to create a bitstream for storage or transmission.

Figure 3 is a high-level block diagram of a decoder implemented by the program code 124 of device 120. It is to be understood that the block diagram could equally represent functional blocks of a hardware implementation of a decoder providing the same functions as the program code 124. The blocks are shown to process stereo data, but it has to be noted that the encoder may be adapted for processing audio signals with more than two channels.

The decoder includes a demultiplexer 307, which is configured to demultiplex a bitstream that has been retrieved from a memory or received from another device and to pass the demultiplexed data to a mono decoder, for example an EV-VBR mono decoder 304, to a decorrelation block 306 and to a stereo decoder 305.

The mono decoder 304 is configured to decode a received encoded mono signal.

The stereo decoder 305 is configured to extract and decode stereo extension data from the bitstream, to combine this data with the decoded mono signal to reconstruct a stereo signal, and to output the reconstructed left and right output channels L_f and R_f to inverse transformers 301 and 302. In addition, the stereo decoder 305 is configured to provide the extracted stereo extension values to the decorrelation block 306 before using them in the reconstruction and to receive modified stereo extension values from the decorrelation block 306 for use in the reconstruction.

The decorrelation block 306 is configured to extract correlation flags from the bitstream, to modify the stereo extension values when needed, and to provide the modified values to the stereo decoder 305.

It is to be understood that in a practical implementation, stereo decoder 305 and decorrelation block 306 can be integrated in a single functional block for optimizing the processing.

The inverse transformer 301 is configured to obtain the time domain left channel L by performing a frequency-to- time domain transformation on reconstructed left channel L_f, and the inverse transformer 302 is configured to obtain the time domain right channel R by performing a frequency-to-time domain transformation on reconstructed right channel R_f.

Finally, the regained stereo signal may be provided for presentation to a user or stored for later consumption.

The operation of the encoder implementation of device 110 will be described in more detail with reference to the flow chart of Figure 4.

The operations can be considered to be realized by processor 112 when executing the code for encoding audio data 114 retrieved from memory 113, and equally to be realized by the corresponding functional blocks of the encoder of Figure 2.

When a multi-channel audio signal that is to be stored or transmitted is received by device 110 via audio interface 116, it is forwarded to the processor 112 for encoding. Only for reasons of simplicity, it will be assumed again that the multi-channel audio signal is a stereo signal.

The data of the received multi-channel audio signal is divided into subsequent frames, and the processing of the data that is described in the following is performed on a frame-by-frame basis. The multi-channel audio signal is transformed into the frequency domain (action 401) . The employed transform can be any complex valued transform such as a discrete Fourier transform (DFT) , a quadrature mirror filterbank (QMF) transform, or a combination of a modified discrete cosine transform (MDCT) and a modified discrete sine transform (MDST) . In an exemplary implementation, MDCT is used to obtain the real valued signals whereas MDST is used to obtain the imaginary counterpart for the same input signal.

The left and right channel signals are down-mixed to a mono signal (action 411), and the mono signal is encoded for transmission (action 412) .

For the stereo extension, the frequency range of each frequency domain frame is divided into a plurality of frequency bands .

The left and right channel signals are used for determining multi-channel extension values for each frequency band, including ICLD values (action 421) . The difference signal can be used for enhancing the stereo quality, in particular when higher bitrates are available .

The ICLD values could be determined for each frequency band for example as the logarithm of the power ratio of corresponding subbands from the input signal as follows: id*) = 10 lo_Bl0 - pψRig*h®Ui:)

Offset[i +l]-l

_PLeft{±) = ∑ [L_realf (j)² + L_±magf (jf) j-Offset[i]

Offset[i+l]-l pRightiϊ) = ∑ (R_realfOf + Ri_mag_f(jf) j=Offset{i]

where icld(i) is the ICLD cue for frequency band i of the current frame, where Offset describes the start and end indices for each spectral band, and where L_real , L_j__mag ,

R_reai_f and R±_magf are the complex valued spectral representations of the left and right channels.

The multi-channel extension values are encoded for transmission (action 422) .

Moreover, a correlation measure, such as the inter- channel correlation (ICC) is calculated for each of a plurality of spectral bands (action 431) .

The inter-channel correlation (ICC) can be calculated for example as follows :

lcc_t(i) = 0.3 ^■ iσc_t__]_(!) + 0.7 • a(i), 0 < i < M (1)

where icc_t (i) is the inter-channel correlation in frequency band i of the current frame, where lcc_t_ι contains the ICC values from the previous frame and where M is the number of spectral bands present for each frame. IcC_f-_^1 could be initialized to ¹I¹ (or any other suitable value) at start up. Optionally, the correlation measures in several previous frames could be taken into account for example by generalizing the equation (1} into a weighted sum of the past values:

icc_t(i)=k_j-icc_t__j{i)+k₀-a(i), O≤KM (Ib)

where icc_t-j is the correlation measure in frequency band i of the j:th frame counting backwards from the current frame, and kj is the weight assigned to the correlation measure in frequency band i of the j : th frame counting backwards from the current frame.

Furthermore, the values a(i) can be computed as follows:

pE(i) = p_real(if + Pi_magiif

sbθffset[i + l]-l

PrealOO = ∑ (^Lr_Sal_f (j) ' ^Rreal_f (j) + ^Lim_ag_f (j) ' ^Rimag_f (j)) j = sbθffset[i]

sbθffset[l + l]-l

Pimagi¹) = ∑ (^Limag_f U) " R_rSal_f (j) + ^Lreal_f(j) ' ^Rimag_f(j)) j=sbθf fset[i]

sbθffset[±÷ϊ\-l pLθft(i) = ∑ \^L _realf(j)² + L_±lαagf{jf) j = sbθffset[i]

(2! εbθffset[i + l]-l , . pRight{i) = ∑ ψreal_f(jf + ^R±mag_f(j)²) j = sbθffset[i] where sbOffset describes the start and end indices for each spectral band, and L_realf , L_lmaQf , R_realf and R_lmagf are again the complex valued spectral representations of the left and right channels. The spectral bands that are considered for the ICC related computations can be the same as those considered for the computation of the ICLD values, but they may equally be different. For calculating the ICC, for example Af=14 frequency bands could be selected with the following predetermined start and end indices sbθffset[] for each spectral band:

sbOffsetU = {0, 5, 11, 18, 25, 33, 43, 56, 72, 91, 116, 146, 183, 240, 274}

With an exemplary frequency resolution per spectral bin of 25 Hz, the considered spectral bands for the ICC related computations could then cover 6850 Hz. The total frequency range of the audio signal segment, which be considered in the computation of the ICLD values, could be larger than 6850 Hz, but decorrelation might not be applied to higher frequencies where the impact to subjective quality is lower.

Next, the final ICC value for each band is obtained as follows (action 432) :

JLCC_t (D - 0 < i < M

In the presented exemplary implementation, thus all original ICC values exceeding 0.75 are mapped to a value of '1' indicating that the channel signals do not differ greatly from each other and no decorrelation is needed at the decoder side. The exemplary value 0.75 in equation (3) can be considered as a threshold for a correlation measure value (ICC) indicating significant correlation. The threshold value can be a fixed or adaptive value, and it may be selected for example based on desired performance, based on the application, based on the characteristics of the input signal, etc.

Next, the final ICC values are mapped to flag bits for bitstream multiplexing (action 433) . In an exemplary implementation, respectively two neighboring ICC values are mapped to a single flag bit to further save the side information associated with the signaling bits as follows: i cc_t(i) = 1 or 1O¹ bi t i = 0,2,4, . . . , M icc_t(i + 1) == 1 ( 4 )

T M t otherwise j = 1,2,3, . . . , M / 2

Each flag bit thus provides correlation status information for two frequency bands of a frame indicating that there is a significant correlation between the channels in these frequency bands in the current frame (flag bit = ¹O') or that there is no significant correlation between the channels in these frequency bands in the current frame (flag bit = '1') . Alternative approaches include having one flag bit per frequency band or one flag bit for any arbitrarily selected set of frequency bands.

Finally, the flag bits are provided for transmission or storage as follows:

for (i=0; i < M/2; i++)

Send/store value of icc_flag(i) with 1 bit Optionally, it could be determined in addition whether the signal level in a frequency band is very similar across the channels, that is, whether the ICLD cue is equal to 1 for any frequency band (action 434) . In this case, the final ICC value itself may be provided in addition in the bitstream for enabling a better decorrelation.

To this end, the final ICC values of equation (3) could be quantized for encoding as follows (action 435) :

^lcc _a i_dx(i) = Q{icc_t(i)_f qTbl), 0 < i < M (5)

where qTbl describes a table for quantized ICC values and where the quantization operator Q() returns the table index that minimizes the squared error between the ICC value in question and the quantization table value corresponding to the index. In an exemplary implementation, the table is as follows:

qTbl[] = {0.4, 0.3, 0.2, 0.1}.

As the value of an ICLD cue represents the level difference between the channel signals and does not take into account the phase difference, the threshold value for the high correlation status in equation (3) could be decreased from 0.75 for example to 0.5 to limit the decorrelation only to spectral bands where decorrelation is perceptually most relevant. This has also the advantage that the side information gets simultaneously minimized. The ICC flag could thus be re-mapped in case of an ICLD cue equal to 1 and provided for transmission or storage as follows: for(i=0; i < M/2; i++) { if (icld(2*i) == 1 and icc_t(i) > 0.5) icc_f2ag(i) = O' bit Send value of j-cc_flag(i) with 1 bit }

The quantized indices of the final ICC values could then be provided for transmission or storage as follows:

for(i=0; i < M/2; i++) { if (icc_flag(i) =='1' bit } if (ic2d(2*i) == 1)

Send value of icc_q ±_dxϋ-) with 2 bits

}

It is assumed in this implementation that the mapping between flag bits and frequency bands is known from the context. Optionally, however, information indicating for which frequency band or bands a respective flag applies or any further information could be provided using additional bits {action 436) .

The encoded mono signal, the encoded stereo extension information and the decorrelation information, the latter including ICC flags and optionally encoded ICC values, are multiplexed to a bitstream for transmission via interface 117 or storage in data storage portion 115 (action 441) .

The bitstream can be constructed in such a way that all encoded data belonging to the same frame, i.e. the encoded mono signal, the encoded stereo extension information and the decorrelation information, are included in a single data unit. In another example the encoded mono signal of a frame can be encapsulate in one data unit, while the stereo extension information and the decorrelation information for this frame are combined into another data unit. In yet another example the encoded data of a frame is encapsulate in several data units, each comprising encoded data representing a certain frequency range.

The operation of the decoder implementation of device 120 will be described in more detail with reference to the flow chart of Figure 5.

The operations can be considered to be realized by processor 122 when executing the code for encoding audio data 124 retrieved from memory 123, and equally to be realized by the corresponding functional blocks of the decoder of Figure 3.

When receiving encoded multi-channel audio data via receiver 127 that is to be presented to a user, the data is forwarded to the processor 112 for processing.

The received bitstream is first demultiplexed (action 501) .

The mono signal is extracted from the bitstream and decoded (action 511) .

The extension values, including for example ICLD cues, are equally extracted from the bitstream and decoded (action 521) . Moreover, ICC flags are extracted from the bitstream and expanded to full resolution (action 531) as follows:

for(i=0; i < M/2; i++) { i^ccfiag __dec(^{2 ■} i)=read 1 bit i^ccflag _ _dec(² ' i + I) = ICCf -_lag _ _dec(2 ^■ i)

}

The "read 1 bit", which is used as a respective decoded ICC flag icC_fi_agι corresponds to the icc_fi_ag determined in equation (4) for a respective pair of neighboring frequency bands - optionally modified in case of an ICLD cue equal to 1 as described above. The number of received ICC flags is doubled by associating the same flag Iccfi_ag _dec to two neighboring frequency bands, respectively.

If available, additional associated information is extracted from the bitstream and decoded, for example ICC values or information linking a respective ICC flag and/or value to a respective frequency band. Indices for the ICC values could be read from the bitstream right after the flag bits and converted into ICC values as follows:

for (i=0; i < M/2; i++) { i f ( icc_{fl ag} _ _dec{2 ^■ i) == U ' bit ) if ( icld ( 2*i ) == 1 ) { icc_g (2 • i) = qTbl [ read 2 bits ] icc_g(2 ^■ i + 1) = icc_g(2 ■ i)

} }

The "read 2 bits" correspond to a respective index icCq_i_dχ as defined above in equation (5), while icc_q is the quantized value associated to a particular index icc_q_i_dx in table qTbl. An exemplary table qTbl has already been introduced above. Also the number of obtained quantized ICC values is doubled by associating the same quantized ICC value to two neighboring frequency bands, respectively.

Next, the decoded ICC flag is evaluated for all frequency bands of the current frame (action 532) . That is, it is determined whether it has a value of ¹I' representing a low correlation or a value of '0' representing a significant correlation between the channels.

In case of an ICC flag representing a significant correlation (action 533) , the decoded extension values are not modified (action 534} .

In case of an ICC flag representing a correlation that is not significant (action 533), the decoded extension values are modified (action 536 or 537) .

Both cases can be summarized with the following equation:

(6)

where lcld(i) is the decoded level difference for each frequency band i of a respective frame, as extracted from the bitstream.

The modification summand b(i) may be determined as follows:

b(i) = 0.3 ^■ ice _ CJeC_t-1(I) + 0.7 ^■ ice _ dec_t(i) (7)

where ice _ dec_t is a decoder internal ICC value for the current frame and where ice _ dec_t_ι is a decoder internal

ICC value that contains the decoder ICC value from the previous frame. The decoder internal ICC value for the previous frame ice _ dec_t_₁ may be initialized for example to 1 at start up. The decoder internal ICC value for the current frame ice _ dec_t may be determined as follows:

f^c

( 8 ! icld(if - ⁵ > 2 . 0

otherwise

where iσc_g contains the quantized ICC values for those bands where the corresponding ICLD value is 1, and where MIN returns the minimum of the specified input values.

The adapted scaling using the energy parameter is based on the decoded mono signal and determined as follows: . .. X e(i) > iccGain_t energy(±) = ^A w ^t _r 0 ≤ i < M

otherwise

d(i) = eMonσ(ϊ) ^{■ ■} iccGain_t eMax

eMonod)

" sjbOffsetfi + 1] - sjbθffset[i]

eMax = max_eMoriO(eMojio)

( 9 )

where M_f is the frequency domain signal of the decoded mono signal. sbOffsetf] defines again the offsets for the considered spectral bands, which may be the same as indicated above for the encoding. iccGain_t is an adaptive gain that is initialized to a suitable value, for example to '6' at start up. Then, it is updated for the respective next frame as follows based on the energy (i) computed for the current frame in equation (9) :

iccGain_{t + 1} = MIW(6,O.3 • iccGa±n_t + 0.7 • lccGain_e) (10)

where iccGain_t is the gain value of the current frame, iccGain_t+1 is the gain value for the next frame, and

Equation (9) introduces a time-frequency dependent gain for the decorrelation to improve the perceptual quality. This is especially advantageous for signal frames or other signal segments in which the ICLDs have a relatively flat response.

It is thus checked in accordance with equation (8} whether ICLD is equal to 1 (action 535) . If this is not the case the modification takes place based on the mono signal M_f, the extension values icld(i)_r the frequency band offsets sbOffsetf] and intermediate values from previous frames, namely decoder ICC values icc_dec_t-i (1) and the adaptive gain iccGain_t (action 536) . Otherwise, the modification takes place based on the received extension values icld(i), the quantized ICC values icc_q(±) and decoder ICC values icc_dec_t-_! (i) from the previous frame (action 537) .

It has to be noted that the first option of equation (8) for the cases of icld(i) == 1 is only used in an implementation in which also ICC values are transmitted in case the ICLD values are equal to 1, otherwise the option is simply omitted. Detailed analysis of equation (8) shows, however, that decorrelation performs better with the first option whenever the value of icld(i) is equal to 1. Otherwise, icld(i) of value 1, indicating that the channel signals are similar in a level difference sense, would lead ice dec_t(i) in Equation (8) to zero and thus, no perceptually significant decorrelation contribution could be expected when applying equation (6) .

The original or modified stereo extension values (action 534, 536 or 537) are then used for reconstructing the multi-channel audio signal by up-mixing the decoded mono signal {action 522) .

Finally, the reconstructed multi-channel audio signal is transformed again into the time domain (action 541) and then presented to a user via audio out interface 126.

It has to be noted that it is not required that the ICC flags are transmitted together with the actual audio data._. They could also be transmitted separately from the other data.

Further, as mentioned before, a corresponding decorrelation could also be applied for an audio signal comprising more than two channels. In this case, one of the channels could be selected to be a reference channel, and correlation flags could indicate the correlation between a respective channel and this reference channel. Alternatively, correlation flags could indicate the correlation between any arbitrary pair of channels.

Furthermore, in an embodiment processing an audio signal comprising more than two channels more than one down- mixed signal could be generated and transmitted. In such a case a set of ICLD values, ICC flags, and possibly ICC value may be provided for each down-mixed signal separately.

Moreover, it has to be noted that ICLD cues and inter- channel correlation could also be computed in the time domain instead of the frequency domain. Furthermore, the presented approach could equally be employed for modifying other kinds of values representing a difference between channels than BCC ICLD cues and other inter- channel correlation information than BCC ICC cues.

Another variation of the presented approach may comprise a modified computation of the inter-channel level differences. An exemplary modified computation will be presented in the following for the case of a stereo audio signal .

First, the left and right channel input signals are converted to the frequency domain using a shifted discrete Fourier transform (SDFT) . The resulting complex- valued spectral samples are converted to the energy domain as follows:

Er (i) = f_L ⁽if + f_L. (if, 0 < i < N

where f_L and f_R are the complex valued shifted discrete Fourier transform (SDFT) samples of the left and right channels, respectively, and N is the size of the frame.

Next, the energy level for each spectral subband is calculated according to:

O-TfSeU₁ [i +l]-l e_L(i) = ∑ E_L(i), 0 < 2 < M j = offseti \i\

where offset_λ is a frequency offset table describing the frequency bin offsets for each spectral subband, and where M is the number of spectral subbands present in the region .

The inter-channel level differences can then be determined for different frequency bands in the form of stereo gain values gain{i) as follows:

^vI₁Ci) - t%_{M + gR(1))}

^^*« - i^gR (i) + g_R(i»

o-f-fset₂[i + l]-l ^L 00 = ∑ eχ(i) j = orfset₂[i] offse C₂ [i+ I]-I j=offset2[i]

where offset₂ is the frequency offset table describing the frequency bin offsets for each spectral subband, where K is the number of spectral gain subbands present in the region, and where max ( ) and min() return the maximum and minimum of the specified samples, respectively.

These gain values may then correspond to inter-channel level differences, which are modified whenever an ICC status indicates that there is a low correlation between the channels.

Additional position values may indicate to which channel a respective gain value belongs. The position values may be post-processed to obtain a stable stereo image over time.

Figure 6 is a schematic diagram of an exemplary electronic device which supports a correlation status controlled modification of inter-channel level differences at an encoder side.

The electronic device 610 can be for instance a mobile phone, but equally any other device which is to be able to encode audio data for storage or transmission.

The device 610 comprises a processor 612 and, linked to this processor 612, a memory 613, an interface for receiving audio data 616, and a transmitter (TX) 617.

The processor 612 is configured to execute implemented computer program code .

The memory 613 stores computer program code 614, which may be retrieved by the processor 612 for execution. The stored program code 614 comprises code for encoding audio data. It includes code for generating a mono signal, for generating stereo extension values, for determining inter-channel correlation values and status, for modifying the stereo extension values under control of the ICC status, and for encoding the mono signal and the modified stereo extension values for storage or transmission. The memory 613 may comprise in addition a data storage portion 615.

The processor 612 and the memory 613 could optionally be integrated in a single component, for example on a chip 611. The interface 616 could comprise for instance a plurality of microphones or comprise a socket for connecting microphones .

The transmitter 617 could belong for example to a cellular engine of the device 610 and be configured to transmit data via a cellular communication network to other devices.

The operation of the encoder implementation of device 610 will be described in more detail with reference to the flow chart of Figure 7.

The operations can be considered to be realized by processor 712 when executing the code for encoding audio data 714 retrieved from memory 713 or by a corresponding hardware implementation.

When a multi-channel audio signal that is to be stored or transmitted is received by device 610 via audio interface 616, it is forwarded to the processor 612 for encoding. For reasons of simplicity, it will be assumed again that the multi-channel audio signal is a stereo signal.

The data of the received multi-channel audio signal is divided into subsequent frames, and the processing of the data that is described in the following is performed on a frame-by-frame basis.

The multi-channel audio signal is transformed into the frequency domain (action 701) .

The left and right channel signals are combined to a mono signal (action 711) , and the mono signal is encoded for transmission (action 712) .

For the stereo extension, each frequency domain frame is divided into M frequency bands.

The left and right channel signals are used for determining multi-channel extension values, including ICLD values for each frequency band (action 721) .

Moreover, the inter-channel correlation is calculated for each spectral band in accordance with above equations (1) and (2) (action 731) .

Next, a final ICC value for each band is obtained in accordance with above equation (3) (action 732) .

In case the ICLD cue for the current frame and spectral band is unequal to 1 (action 733) , the extension values are modified based on the mono signal values under control of the ICC status in accordance with above equations (6)-(10) (action 734) . The ICC status could be determined to this end in accordance with above equation (4) . It is to be understood, however, that generating separate ICC flags is only optional, since no flags have to be transmitted in this case.

In case the ICLD cue for the current frame and spectral band is equal to 1 (action 733), the extension values are modified based on the final ICC values under control of the ICC status in accordance with above equations (6) -(8) (action 735) . The ICC status can be determined for this case in accordance with above equation (4) using for example a threshold value of 0.5 instead of 0.75 in equation (3} . A quantization of the ICC values may not be required in this case, since they are not necessarily transmitted. Thus, the quantized values ±cc_q in equation (8) could simply be replaced by the final ICC values icc_t obtained with equation (3) . Alternatively, the quantized values icc_q in equation (8) could be replaced by the ICC values icc_t(i) determined in accordance with equation (1) .

The modified multi-channel extension values are encoded (action 722) .

The encoded mono signal and the encoded modified stereo extension information are multiplexed to a bitstream for transmission or storage (action 741) .

Some decoders may then decode the encoded data in a conventional manner without applying any further decorrelation processing.

Using a correlation/coherence processing in a parametric multi-channel audio coding process may result in an improved user-experience due to enhanced spatial sensation. Some embodiments of the invention allow reducing the correlation between channels that are derived from the mono signal by modifying values representing a difference between channels, for instance values representing a level difference. As a result, correlation between the channels better approximates that of the original stereo signal, thus improving the feeling of spaciousness. Certain embodiments of the invention further allow improving the naturalness and subjective audio quality of a low bit-rate multi-channel audio coding system by using an improved and effective transmission or storage and by processing the correlation/coherence information in a way exploiting the data from previous frames. Some embodiments may also be suited to improve the multi-channel audio quality across a wide range of signals.

Certain embodiments of the invention using information about a correlation status as a decision criterion whether to modify a value representing a difference between channels in the segment of the multi-channel audio signal ensure that the amount of decorrelation processing is reduced compared to an approach in which a decorrelation processing is performed in any case.

If the modification is performed at the decoder, certain embodiments further ensure that the actual correlation values have to be provided at the most when a decorrelation is appropriate. Such embodiments thus enable a particularly low bitrate coding where only limited bits are available for the coding of the correlation information. The lowest amount of data has to be provided if the correlation status information is encoded as a single bit, the association to frequency bands is predetermined, and the actual correlation values are never provided. However, providing an association to frequency bands as additional information may render some embodiments more flexible, since the association may change in this case from segment to segment of the audio signal. Providing the actual correlation values in selected cases may further improve the decorrelation without unduly increasing the amount of required side information. For example, the transmission of ICC values may be limited to a few cases, while otherwise only a one bit status may be transmitted. If the modification is performed at the encoder, certain embodiments ensure that less side information has to be provided to a decoder and that a decoder which does not support decorrelation processing at all could be employed.

Certain embodiments ensure that only deterministic values are used in the modification instead of random numbers. This ensures that the decorrelation procedure can be adapted better to the concrete spatial situation.

It is to be understood that any presented connection is to be understood in a way that the involved components are operationally coupled. Thus, the connections can be direct or indirect with any number or combination of intervening elements, and there may be merely a functional relationship between the components.

Further, any of the mentioned processors could be of any suitable type, for example a computer processor, an application-specific integrated circuit (ASIC), etc. Any of the mentioned memories could be implemented as a single memory or as a combination of a plurality of distinct memories, and may comprise for example a read- only memory, a flash memory or a hard disc drive memory etc. Furthermore, any other hardware components that have been programmed in such a way to carry out the described functions could be employed as well.

Moreover, any of the actions described or illustrated herein may be implemented using executable instructions in a general-purpose or special-purpose processor and stored on a computer-readable storage medium (e.g., disk, memory, or the like) to be executed by such a processor. References to 'computer-readable storage medium' should be understood to encompass specialized circuits such as field-programmable gate arrays, application-specific integrated circuits (ASICs), signal processing devices, and other devices.

The functions illustrated by the combination of processor 122 and memory 123, by the decorrelation block 306 or by the combination of processor 612 and memory 613 can be viewed as means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multichannel audio signal; and as means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.

The program codes 124 or 614 can also be viewed as comprising such means in the form of functional modules.

While there have been shown and described and pointed out fundamental novel features of the invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the form and details of the devices and methods described may be made by those skilled in the art without departing from the spirit of the invention. For example, it is expressly intended that all combinations of those elements and/or method steps which perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Moreover, it should be recognized that structures and/or elements and/or method steps shown and/or described in connection with any disclosed form or embodiment of the invention may be incorporated in any other disclosed or described or suggested form or embodiment as a general matter of design choice. It is the intention, therefore, to be limited only as indicated by the scope of the claims appended hereto. Furthermore, in the claims means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures.

Claims

What is claimed is:

1. A method comprising: evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.

2. The method according to claim 1, wherein the correlation status information is represented by a single bit.

3. The method according to claim 1 or 2, wherein the modifying of a value representing a difference between channels is based on equations using exclusively non-random values.

4. The method according to one of claims 1 to 3, wherein the modifying of a value representing a difference between channels is based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi-channel audio signal.

5. The method according to one of claims 1 to 4, wherein the modifying of a value representing a difference between channels is based on a value representing a mono audio signal, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates an dissimilar level of a signal in the channels.

6. The method according to one of claims 1 to 5, wherein the modifying of a value representing a difference between channels is based on a correlation value indicating the amount of correlation between the channels, in case the value of the level difference indicates a similar level of the channels.

7. The method according to one of claim 1 to 6, wherein the modifying of a value representing a difference between channels is performed at an encoder side, which generates the correlation status information and the value representing a difference between the channels.

8. The method according to one of claim 1 to 6, wherein the modifying of a value representing a difference between channels is performed at a decoder side, which is provided with correlation status information and a value representing a difference between channels generated by an encoder side.

9. The method according to claim 8, comprising obtaining at the decoder side in addition information on frequency bands for which the correlation status information is valid.

10. An apparatus comprising a processor, the processor configured to evaluate correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and the processor configured to modify a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.

11. The apparatus according to claim 10, wherein the correlation status information is represented by a single bit.

12. The apparatus according to claim 10 or 11, wherein the processor is configured to modify a value representing a difference between channels based on equations using exclusively non-random values .

13. The apparatus according to one of claims 10 to 12, wherein the processor is configured to modify a value representing a difference between channels based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi- channel audio signal.

14. The apparatus according to one of claims 10 to 13, wherein the processor is configured to modify a value representing a difference between channels based on a value representing a mono audio signal, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates an dissimilar level of a signal in the channels.

15. The apparatus according to one of claims 10 to 14, wherein the processor is configured to modify a value representing a difference between channels based on a correlation value indicating the amount of correlation between the channels, in case the value of the level difference indicates a similar level of the channels.

16. The apparatus according to one of claim 10 to 15, wherein the processor is configured to modify a value representing a difference between channels at an encoder side and to generate the correlation status information and the value representing a difference between the channels.

17. The apparatus according to one of claim 10 to 15, wherein the processor is configured to modify a value representing a difference between channels at a decoder side, which is provided with correlation status information and a value representing a difference between channels generated by an encoder side.

18. The apparatus according to claim 17, wherein the processor is configured to obtain at the decoder side in addition information on frequency bands for which the correlation status information is valid.

19. An electronic device comprising: an apparatus according to one of claims 10 to 18; and an interface configured to output multi-channel audio signals.

20. An electronic device comprising: an apparatus according to one of claims 10 to 18; and an interface configured to receive captured multi- channel audio signals.

21. A computer program code realizing the following when executed by a processor: evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.

22. The computer program code according to claim 21, wherein the correlation status information is represented by a single bit.

23. The computer program code according to claim 21 or 22, wherein the modifying of a value representing a difference between channels is based on equations using exclusively non-random values.

24. The computer program code according to claim 21 or 23, wherein the modifying of a value representing a difference between channels is based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi-channel audio signal.

25. The computer program code according to one of claim 21 to 24, wherein the computer program code is a computer program code for an encoder side generating the correlation status information and the value representing a difference between the channels.

26. The computer program code according to one of claim 21 to 24, wherein the computer program code is a computer program code for a decoder side which is provided with the correlation status information and the value representing a difference between channels generated by an encoder side.

27. A computer readable storage medium in which computer program code according to one of claims 21 to 26 is stored.

28. An apparatus comprising: means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.