WO2009146734A1 - Multi-channel audio coding - Google Patents

Multi-channel audio coding Download PDF

Info

Publication number
WO2009146734A1
Authority
WO
WIPO (PCT)
Prior art keywords
channels
correlation
difference
value representing
status information
Prior art date
Application number
PCT/EP2008/056813
Other languages
French (fr)
Inventor
Juha OJANPERÄ
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/EP2008/056813 priority Critical patent/WO2009146734A1/en
Publication of WO2009146734A1 publication Critical patent/WO2009146734A1/en

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S3/00Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels

Definitions

  • the invention relates to the field of multi-channel audio coding.
  • Audio coding systems are used in particular for transmitting or storing audio signals.
  • the audio coding system comprises an encoder at a transmitting side and a decoder at a receiving side.
  • the audio signal that is to be transmitted is provided to the encoder.
  • the encoder is responsible for adapting the incoming audio data rate to a bitrate level at which the bandwidth conditions in the transmission channel are not violated. Ideally, the encoder discards only irrelevant information from the audio signal in this encoding process.
  • the encoded audio signal is then transmitted by the transmitting side of the audio coding system and received at the receiving side of the audio coding system.
  • the decoder at the receiving side reverses the encoding process to obtain a decoded audio signal with little or no audible degradation.
  • the audio coding system could be employed for archiving audio data.
  • the encoded audio data provided by the encoder is stored in some storage unit, and the decoder decodes audio data retrieved from this storage unit.
  • the encoder achieves a bitrate which is as low as possible, in order to save storage space.
  • the original audio signal can be a mono audio signal or a multi-channel audio signal containing at least a first and a second channel signal.
  • An example of a multichannel audio signal is a stereo audio signal, which is composed of a left channel signal and a right channel signal.
  • Another example is an audio signal that is used for a surround technology and includes for example two stereo channels, an additional center channel and two surround channels.
  • different encoding schemes can be applied to a multi-channel audio signal.
  • the different channels can be encoded for instance independently from each other. But typically, a correlation exists between the different channels of a multi-channel audio signal, and the most advanced coding schemes exploit this correlation to achieve a further reduction in the bitrate.
  • Examples for reducing the bitrate for an encoded stereo audio signal comprise low bitrate stereo extension methods.
  • the stereo audio signal is encoded as a high bitrate mono signal, which is provided by the encoder together with some side information reserved for a stereo extension.
  • the stereo audio signal is then reconstructed from the high bitrate mono signal in a stereo extension making use of the side information.
  • the side information typically takes only a few kbps of the total bitrate.
  • Parametric multi-channel audio coding methods such as Binaural Cue Coding (BCC) , enable a high-quality multichannel reproduction with reasonable bit-rate compared to a scenario where all channels are encoded and transmitted separately.
  • BCC Binaural Cue Coding
  • the compression of a spatial image is based on generating one or several down-mixed signals together with a set of spatial cues.
  • the decoder uses the received down-mixed signals and the spatial cues to synthesize a set of channels - which can be different from the number of input channels - with spatial properties as described by the received spatial cues.
  • the spatial cues typically include an inter-channel level difference (ICLD), an inter-channel time difference (ICTD) and an inter-channel coherence/correlation (ICC).
  • ICTD inter-channel time difference
  • ICC inter-channel coherence/correlation
  • ICLD and ICTD aim at describing the signals from the actual audio sources, whereas the ICC aims at enhancing the spatial sensation by introducing a diffuse component of the audio image, including reverberations, ambience, etc. These cues are normally provided for each frequency band separately.
  • the decoding side typically uses a filter that is controlled by the received ICC cues to recreate a coherence/correlation approximating the coherence/correlation which is present in the input signals .
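  As a rough illustration of how two of these cues might be derived for one frame (not taken from this document; the band edges, the dB formulation of the level difference, the coherence-magnitude form of the ICC and the use of numpy are assumptions), a per-band ICLD and ICC could be computed along these lines:

      import numpy as np

      def bcc_cues(L, R, band_edges):
          """Per-band level difference (dB) and normalized correlation for one frame.
          L, R: complex spectra of the left/right channel; band_edges: frequency bin offsets."""
          icld, icc = [], []
          for b in range(len(band_edges) - 1):
              lo, hi = band_edges[b], band_edges[b + 1]
              eL = np.sum(np.abs(L[lo:hi]) ** 2) + 1e-12
              eR = np.sum(np.abs(R[lo:hi]) ** 2) + 1e-12
              icld.append(10.0 * np.log10(eL / eR))              # level difference in dB
              cross = np.abs(np.sum(L[lo:hi] * np.conj(R[lo:hi])))
              icc.append(cross / np.sqrt(eL * eR))               # 0..1, 1 = fully coherent
          return np.array(icld), np.array(icc)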
  • a method which comprises evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal.
  • the method further comprises modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
  • a first apparatus which comprises a processor.
  • the processor is configured to evaluate correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal.
  • the processor is further configured to modify a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
  • the apparatus may comprise for example exclusively the described processor, but it may also comprise additional components.
  • the apparatus could further be for example a module provided for integration into an electronic device, like a processing component, a chip or a circuit implementing the processor, or it could be such a device itself. In the latter case, it could be for instance an electronic device, which comprises in addition an interface configured to receive captured multi-channel audio signals and/or an interface configured to output multi-channel audio signals.
  • a second apparatus which comprises means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multichannel audio signal, and means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
  • the means of this apparatus can be implemented in hardware and/or software. They may comprise for instance a processor for executing computer program code for realizing the required functions, a memory storing the program code, or both. Alternatively, they could comprise for instance a circuit that is designed to realize the required functions, for instance implemented in a chipset or a chip, like an integrated circuit. It is to be understood that further or correspondingly adapted means may be comprised for realizing any of the functions that may optionally be implemented in any described embodiment of the first apparatus.
  • a computer readable storage medium in which computer program code is stored.
  • the computer program code realizes the described method when executed by a processor.
  • the computer readable storage medium could be for example a disk or a memory or the like.
  • the computer program code could be stored in the computer readable storage medium in the form of instructions encoding the computer-readable storage medium. It is to be understood that also the computer program code by itself has to be considered an embodiment of the invention.
  • certain embodiments of the invention provide that information about a correlation status is used as a decision criterion whether to apply a certain modification to a value indicating a difference between different channels of a multi-channel audio signal.
  • the considered audio signal can be for instance a speech signal, but equally any other kind of audio signal, like a music signal.
  • the considered segment can be for instance a frame of an audio signal, but equally any other kind of segment, like a superframe or a subframe.
  • An audio signal may comprise any number of segments, including one.
  • the described processing can further be performed for example for each of a plurality of frequency bands in the segment of an audio signal, only for selected ones of a plurality of frequency bands, or on the entire frequency range of the segment of the audio signal as a whole. A selection of frequency bands could also differ from one segment to the next.
  • the multi-channel audio signal may comprise only two channels, for instance the left and right channel of a stereo signal, or any other number of channels, for instance five channels for a surround audio signal.
  • the correlation information status could be derived for instance from an inter-channel correlation (ICC) cue obtained in a binaural cue coding, but it could be obtained in any other manner as well which is suited to indicate whether or not a significant correlation between channels is given.
  • the difference value that may be modified could be for instance an inter-channel level difference (ICLD) cue obtained in a binaural cue coding, but it could equally be any other kind of value that can be modified to increase the decorrelation of the channels if appropriate.
  • ICLD inter-channel level difference
  • the correlation status information is represented by a single bit.
  • the modifying of a value representing a difference between channels is based on equations using exclusively non-random values.
  • the processor or some other means is configured to realize a corresponding modification.
  • the modifying of a value representing a difference between channels is based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi-channel audio signal.
  • the processor or some other means is configured to realize a corresponding modification.
  • the modifying of a value representing a difference between channels is based alternatively or in addition on a value representing a mono audio signal, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates a dissimilar level of a signal in the channels.
  • a value indicating the amount of the correlation itself might not be required in this case, so that only the correlation status has to be provided.
  • the processor or some other means is configured to realize a corresponding modification.
  • the modifying of a value representing a difference between channels is based alternatively or in addition on a correlation value indicating the amount of correlation between the channels, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates in contrast a similar level of a signal in the channels.
  • the processor or some other means is configured to realize a corresponding modification.
  • the code may be implemented to realize a corresponding modification.
  • the modifying of a value representing a difference between channels is performed at an encoder side, which generates the correlation status information and the value representing a difference between the channels.
  • the processor or some other means is associated to an encoder side, which generates the correlation status information and the value representing a difference between the channels .
  • the code is code for such an encoder side.
  • the modifying of a value representing a difference between channels is performed at a decoder side, which is provided with correlation status information and a value representing a difference between channels generated by an encoder side.
  • the processor or some other means is associated to such a decoder side.
  • the code is code for such a decoder side.
  • the method comprises obtaining, at the decoder side, additional information on frequency bands for which the correlation status information is valid.
  • the processor or some other means is configured to obtain such additional information.
  • the code may be implemented to obtain such additional information.
  • a method is an information providing method, comprising the steps of evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
  • an apparatus is an information providing apparatus comprising processing means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and processing means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
  • one of the described apparatuses can be seen as an audio signal encoding or decoding apparatus.
  • Fig. 1 is a schematic block diagram of a coding system in which an exemplary embodiment of the invention is implemented
  • Fig. 2 is a schematic block diagram presenting functional blocks of an exemplary encoder
  • Fig. 3 is a schematic block diagram presenting functional blocks of an exemplary decoder
  • Fig. 4 is a flow chart illustrating an operation at an encoding side in the system of Figure 1
  • Fig. 5 is a flow chart illustrating an operation at a decoding side in the system of Figure 1
  • Fig. 6 is a schematic block diagram of an electronic device in which another exemplary embodiment of the invention is implemented
  • Fig. 7 is a flow chart illustrating an operation in the electronic device of Figure 6.
  • Figure 1 is a schematic diagram of an exemplary system which supports a correlation status controlled modification of inter-channel level differences.
  • the system comprises a first electronic device 110 and a second electronic device 120.
  • the first electronic device 110 can be for instance a mobile phone, but equally any other device which is to be able to encode audio data for storage or transmission, for example an audio recording device.
  • the device 110 comprises a processor 112 and, linked to this processor 112, a memory 113, an interface for receiving captured audio data 116, and a transmitter (TX) 117.
  • TX transmitter
  • the processor 112 is configured to execute implemented computer program code.
  • the memory 113 stores computer program code 114, which may be retrieved by the processor 112 for execution.
  • the stored program codes 114 comprise code for encoding audio data. It includes code for generating a mono signal, for generating stereo extension cues and for generating inter-channel correlation values and status.
  • the memory 113 may comprise in addition a data storage portion 115.
  • the processor 112 and the memory 113 could optionally be integrated in a single component, for example on a chip 111.
  • the interface 116 could comprise for instance microphones or a socket for connecting microphones.
  • the transmitter 117 could belong for example to a cellular engine of the device 110 and be configured to transmit data via a cellular communication network to other devices.
  • the second electronic device 120 can also be for instance a mobile phone, but equally any other device which is able to decode audio data for presentation to a user.
  • the device 120 comprises a processor 122 and, linked to this processor 122, a memory 123, an interface for presenting audio data 126 to a user and a receiver (RX) 127.
  • RX receiver
  • the processor 122 is configured to execute implemented computer program code.
  • the memory 123 stores computer program code 124, which may be retrieved by the processor 122 for execution.
  • the stored program codes 124 comprise code for decoding audio data. It includes code for modifying stereo extension values under control of an inter-channel correlation status, and for reconstructing a multi-channel audio signal.
  • the memory 123 may comprise in addition a data storage portion (not shown) .
  • the processor 122 and the memory 123 could optionally be integrated in a single component, for example on a chip 121.
  • the interface 126 could comprise for instance loudspeakers or a socket for connecting loudspeakers.
  • the receiver 127 could belong for example to a cellular engine of the device 120 and be configured to receive data via a cellular communication network from other devices .
  • the interfaces 117 and 127 are configured in any case such that they enable device 110 to transmit encoded audio data to device 120, either directly on a wired or wireless link or indirectly via some communication network.
  • Figure 2 is a high-level block diagram of an encoder implemented by the program code 114 of device 110. It is to be understood that the block diagram could equally represent functional blocks of a hardware implementation of an encoder providing the same functions as the program code 114. The blocks are shown to process stereo data, but it has to be noted that the encoder may be adapted for processing audio signals with more than two channels.
  • the encoder includes a transform block 201 for transforming the data of a left channel of an audio signal 'L' from the time domain into the frequency domain.
  • the resulting frequency domain signal is denoted 'Lf'.
  • the encoder further includes a transform block 202 for transforming the data of a right channel of an audio signal 'R' from the time domain into the frequency domain.
  • the resulting frequency domain signal is denoted 'Rf'.
  • the encoder moreover includes a mono conversion block 203, which is configured to create a down-mixed signal by converting the stereo signal into a mono signal
  • Mf = 0.5 · (Lf + Rf), and to pass the mono signal to a mono encoder, for example to an embedded variable bitrate (EV-VBR) mono encoder 204.
  • a different way to create the down-mixed signal and the difference signal can be used, for example one comprising a linear combination of the input channels with possible phase correction.
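  A minimal sketch of this passive down-mix (numpy arrays holding the frequency-domain channels are assumed; the side signal Sf = 0.5 · (Lf - Rf) is an assumption, mentioned here only because a difference signal is referred to further below):

      import numpy as np

      def downmix_and_side(L_f, R_f):
          """Passive down-mix Mf = 0.5 * (Lf + Rf) of two frequency-domain channel arrays;
          the side signal Sf = 0.5 * (Lf - Rf) may optionally enhance the stereo quality."""
          L_f, R_f = np.asarray(L_f), np.asarray(R_f)
          return 0.5 * (L_f + R_f), 0.5 * (L_f - R_f)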
  • the mono encoder 204 is configured to encode a received mono signal and to provide a resulting bitstream to a bitstream multiplexer 208.
  • the stereo encoder 205 is configured to generate and encode stereo extension data, including a quantization to obtain a desired bitrate, and to provide a resulting bitstream to the multiplexer 208. Any kind of stereo encoder could be used to this end.
  • the encoder moreover includes further transformers 206, which are configured to transform the left and right channel signals L and R to the frequency domain, and a correlation encoder 207, which is configured to analyze the left and right channel signals in the frequency domain, to decide which of the spectral bands need decorrelation at a decoder side, and to pass corresponding correlation flags to the multiplexer 208. It is to be understood that in an alternative embodiment instead of employing separate transformers 206, also the output of transformers 201, 202 could be provided to correlation encoder 207.
  • the multiplexer 208 is configured to multiplex all received information to create a bitstream for storage or transmission.
  • Figure 3 is a high-level block diagram of a decoder implemented by the program code 124 of device 120. It is to be understood that the block diagram could equally represent functional blocks of a hardware implementation of a decoder providing the same functions as the program code 124. The blocks are shown to process stereo data, but it has to be noted that the decoder may be adapted for processing audio signals with more than two channels.
  • the decoder includes a demultiplexer 307, which is configured to demultiplex a bitstream that has been retrieved from a memory or received from another device and to pass the demultiplexed data to a mono decoder, for example an EV-VBR mono decoder 304, to a decorrelation block 306 and to a stereo decoder 305.
  • the mono decoder 304 is configured to decode a received encoded mono signal.
  • the stereo decoder 305 is configured to extract and decode stereo extension data from the bitstream, to combine this data with the decoded mono signal to reconstruct a stereo signal, and to output the reconstructed left and right output channels L f and R f to inverse transformers 301 and 302.
  • the stereo decoder 305 is configured to provide the extracted stereo extension values to the decorrelation block 306 before using them in the reconstruction and to receive modified stereo extension values from the decorrelation block 306 for use in the reconstruction.
  • the decorrelation block 306 is configured to extract correlation flags from the bitstream, to modify the stereo extension values when needed, and to provide the modified values to the stereo decoder 305.
  • stereo decoder 305 and decorrelation block 306 can be integrated in a single functional block for optimizing the processing.
  • the inverse transformer 301 is configured to obtain the time domain left channel L by performing a frequency-to- time domain transformation on reconstructed left channel L f
  • the inverse transformer 302 is configured to obtain the time domain right channel R by performing a frequency-to-time domain transformation on reconstructed right channel R f .
  • the regained stereo signal may be provided for presentation to a user or stored for later consumption.
  • the operation illustrated in the flow chart of Figure 4 is to be understood to be realized by processor 112 when executing the code for encoding audio data 114 retrieved from memory 113, and equally to be realized by the corresponding functional blocks of the encoder of Figure 2.
  • the data of the received multi-channel audio signal is divided into subsequent frames, and the processing of the data that is described in the following is performed on a frame-by-frame basis.
  • the multi-channel audio signal is transformed into the frequency domain (action 401) .
  • the employed transform can be any complex valued transform such as a discrete Fourier transform (DFT) , a quadrature mirror filterbank (QMF) transform, or a combination of a modified discrete cosine transform (MDCT) and a modified discrete sine transform (MDST) .
  • DFT discrete Fourier transform
  • QMF quadrature mirror filterbank
  • MDCT modified discrete cosine transform
  • MDST modified discrete sine transform
  • MDCT is used to obtain the real valued signals
  • MDST is used to obtain the imaginary counterpart for the same input signal.
  • the left and right channel signals are down-mixed to a mono signal (action 411), and the mono signal is encoded for transmission (action 412) .
  • the frequency range of each frequency domain frame is divided into a plurality of frequency bands .
  • the left and right channel signals are used for determining multi-channel extension values for each frequency band, including ICLD values (action 421) .
  • the difference signal can be used for enhancing the stereo quality, in particular when higher bitrates are available .
  • icld(i) is the ICLD cue for frequency band i of the current frame, where offset describes the start and end indices for each spectral band, and where L_real, L_imag, R_real and R_imag are the real and imaginary parts of the complex valued spectral representations of the left and right channels.
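  The ICLD formula itself is not reproduced in this text. Purely as a hedged sketch (the linear energy-ratio form is an assumption, chosen so that icld(i) = 1 corresponds to equal levels in both channels, which matches the way icld(i) = 1 is interpreted further below):

      import numpy as np

      def icld_per_band(L_real, L_imag, R_real, R_imag, offset):
          """Hypothetical ICLD cue per frequency band as a linear energy ratio
          (a value of 1 means equal levels in the left and right channel)."""
          icld = []
          for i in range(len(offset) - 1):
              lo, hi = offset[i], offset[i + 1]
              eL = np.sum(L_real[lo:hi] ** 2 + L_imag[lo:hi] ** 2) + 1e-12
              eR = np.sum(R_real[lo:hi] ** 2 + R_imag[lo:hi] ** 2) + 1e-12
              icld.append(eL / eR)
          return np.array(icld)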
  • the multi-channel extension values are encoded for transmission (action 422) .
  • a correlation measure such as the inter-channel correlation (ICC) is calculated for each of a plurality of spectral bands (action 431).
  • the inter-channel correlation can be calculated for example as follows :
  • icc_t(i) is the inter-channel correlation in frequency band i of the current frame
  • icc_t-1 contains the ICC values from the previous frame
  • M is the number of spectral bands present for each frame.
  • icc_t-1 could be initialized to '1' (or any other suitable value) at start-up.
  • the correlation measures in several previous frames could be taken into account for example by generalizing equation (1) into a weighted sum of the past values:
  • icc_t-j is the correlation measure in frequency band i of the j:th frame counting backwards from the current frame
  • k_j is the weight assigned to the correlation measure in frequency band i of the j:th frame counting backwards from the current frame.
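  Equations (1) and (2) are not reproduced in this text. The general idea — a per-band correlation measure that is smoothed with the values of one or more previous frames — might be sketched as follows (the normalized cross-correlation and the 0.5/0.5 weights are assumptions):

      import numpy as np

      def icc_instantaneous(L, R, offset):
          """Normalized inter-channel correlation per spectral band for the current frame."""
          icc = np.empty(len(offset) - 1)
          for i in range(len(offset) - 1):
              lo, hi = offset[i], offset[i + 1]
              cross = np.abs(np.sum(L[lo:hi] * np.conj(R[lo:hi])))
              norm = np.sqrt(np.sum(np.abs(L[lo:hi]) ** 2) * np.sum(np.abs(R[lo:hi]) ** 2))
              icc[i] = cross / (norm + 1e-12)
          return icc

      def icc_smoothed(icc_now, icc_prev, w_now=0.5, w_prev=0.5):
          """Combine the current measure with the previous frame's ICC values (spirit of
          equation (1)); equation (2) generalizes this to a weighted sum with weights k_j
          over several previous frames."""
          return w_now * icc_now + w_prev * icc_prev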
  • sbOffset[] = {0, 5, 11, 18, 25, 33, 43, 56, 72, 91, 116, 146, 183, 240, 274}
  • the considered spectral bands for the ICC related computations could then cover 6850 Hz.
  • the total frequency range of the audio signal segment, which may be considered in the computation of the ICLD values, could be larger than 6850 Hz, but decorrelation might not be applied to higher frequencies, where the impact on subjective quality is lower.
  • the exemplary value 0.75 in equation (3) can be considered as a threshold for a correlation measure value (ICC) indicating significant correlation.
  • the threshold value can be a fixed or adaptive value, and it may be selected for example based on desired performance, based on the application, based on the characteristics of the input signal, etc.
  • the final ICC values are mapped to flag bits for bitstream multiplexing (action 433) .
  • Alternative approaches include having one flag bit per frequency band or one flag bit for any arbitrarily selected set of frequency bands.
  • flag bits are provided for transmission or storage as follows:
  • Send/store the value of icc_flag(i) with 1 bit. In addition, it could be determined whether the signal level in a frequency band is very similar across the channels, that is, whether the ICLD cue is equal to 1 for any frequency band (action 434).
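  A sketch of this flag generation (one flag per pair of neighbouring bands is assumed here, as suggested by the decoder-side expansion described further below; the rule that the weaker band of a pair decides is likewise an assumption):

      def icc_flags(icc, threshold=0.75):
          """One flag bit per pair of neighbouring bands: 1 = no significant correlation
          (decorrelation wanted at the decoder), 0 = significant correlation."""
          flags = []
          for i in range(0, len(icc) - 1, 2):
              pair_icc = min(icc[i], icc[i + 1])
              flags.append(1 if pair_icc < threshold else 0)
          return flags

      # Sending/storing then simply writes one bit per entry of the returned list.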
  • the final ICC value itself may be provided in addition in the bitstream for enabling a better decorrelation.
  • qTbl is a table of quantized ICC values, and the quantization operator Q() returns the table index that minimizes the squared error between the ICC value in question and the quantization table value corresponding to the index.
  • the table is as follows:
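  The actual values of the quantization table are not reproduced in this text. Purely as an illustration, a nearest-neighbour (minimum squared error) quantizer Q() over a hypothetical four-entry table (a size of four is suggested only by the 2-bit indices read at the decoder) could look like:

      import numpy as np

      # Placeholder values only; the real qTbl entries are not given in this text.
      qTbl = np.array([0.0, 0.25, 0.5, 0.75])

      def Q(icc_value, table=qTbl):
          """Return the table index that minimizes the squared error to the given ICC value."""
          return int(np.argmin((table - icc_value) ** 2))

      # Example: Q(0.6) returns 2, the index of the closest placeholder entry 0.5.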
  • the threshold value for the high correlation status in equation (3) could be decreased from 0.75, for example to 0.5, to limit the decorrelation to those spectral bands where it is perceptually most relevant. This also has the advantage that the amount of side information is minimized at the same time.
  • the quantized indices of the final ICC values could then be provided for transmission or storage as follows:
  • mapping between flag bits and frequency bands is known from the context.
  • information indicating for which frequency band or bands a respective flag applies, or any further information, could be provided using additional bits (action 436).
  • the encoded mono signal, the encoded stereo extension information and the decorrelation information, the latter including ICC flags and optionally encoded ICC values, are multiplexed to a bitstream for transmission via interface 117 or storage in data storage portion 115 (action 441) .
  • the bitstream can be constructed in such a way that all encoded data belonging to the same frame, i.e. the encoded mono signal, the encoded stereo extension information and the decorrelation information, are included in a single data unit.
  • the encoded mono signal of a frame can be encapsulated in one data unit, while the stereo extension information and the decorrelation information for this frame are combined into another data unit.
  • the encoded data of a frame is encapsulated in several data units, each comprising encoded data representing a certain frequency range.
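  A rough sketch of these per-frame packaging options (the byte-string framing and the two-way switch are simplifications, not the patent's actual bitstream syntax):

      def build_frame_payload(mono_bits: bytes, stereo_ext_bits: bytes, decorr_bits: bytes,
                              single_data_unit: bool = True):
          """Combine the encoded mono signal, the stereo extension information and the
          decorrelation information of one frame into one or two data units."""
          if single_data_unit:
              return [mono_bits + stereo_ext_bits + decorr_bits]
          return [mono_bits, stereo_ext_bits + decorr_bits]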
  • the operation illustrated in the flow chart of Figure 5 is to be understood to be realized by processor 122 when executing the code for decoding audio data 124 retrieved from memory 123, and equally to be realized by the corresponding functional blocks of the decoder of Figure 3.
  • When receiving encoded multi-channel audio data via receiver 127 that is to be presented to a user, the data is forwarded to the processor 122 for processing.
  • the received bitstream is first demultiplexed (action 501) .
  • the mono signal is extracted from the bitstream and decoded (action 511) .
  • the stereo extension values, including for example ICLD cues, are also extracted from the bitstream and decoded.
  • ICC flags are extracted from the bitstream and expanded to full resolution (action 531) as follows:
  • the "read 1 bit", which is used as a respective decoded ICC flag icc_flag_dec, corresponds to the icc_flag determined in equation (4) for a respective pair of neighboring frequency bands - optionally modified in case of an ICLD cue equal to 1 as described above.
  • the number of received ICC flags is doubled by associating the same flag icc_flag_dec with two neighboring frequency bands, respectively.
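  A sketch of this expansion (read_bit stands for any callable returning the next bit from the demultiplexed bitstream; it is a stand-in, not an API from this document):

      def expand_icc_flags(read_bit, num_bands):
          """Read one ICC flag per pair of bands and duplicate it to both bands of the pair."""
          flags_dec = []
          for _ in range(num_bands // 2):
              bit = read_bit()
              flags_dec.extend([bit, bit])    # same flag for two neighbouring frequency bands
          return flags_dec

      # Example with a canned bitstream:
      # bits = iter([1, 0, 1]); expand_icc_flags(lambda: next(bits), 6) -> [1, 1, 0, 0, 1, 1]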
  • ICC values or information linking a respective ICC flag and/or value to a respective frequency band.
  • Indices for the ICC values could be read from the bitstream right after the flag bits and converted into ICC values as follows:
  • the "read 2 bits" correspond to a respective index icc_q_idx as defined above in equation (5), while icc_q is the quantized value associated with a particular index icc_q_idx in table qTbl.
  • An exemplary table qTbl has already been introduced above. Also the number of obtained quantized ICC values is doubled by associating the same quantized ICC value to two neighboring frequency bands, respectively.
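  Correspondingly, reading the 2-bit indices and mapping them back to quantized ICC values might look like this (read_bits is again a stand-in bit reader, and the placeholder qTbl from above would be passed as the table):

      def read_quantized_icc(read_bits, num_values, table):
          """Read a 2-bit index per transmitted ICC value (icc_q_idx in equation (5)),
          look up the quantized value and associate it with two neighbouring bands."""
          icc_q = []
          for _ in range(num_values):
              idx = read_bits(2)
              icc_q.extend([table[idx], table[idx]])
          return icc_q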
  • the decoded ICC flag is evaluated for all frequency bands of the current frame (action 532). That is, it is determined whether it has a value of '1', representing a low correlation, or a value of '0', representing a significant correlation between the channels.
  • icld(i) is the decoded level difference for each frequency band i of a respective frame, as extracted from the bitstream.
  • the modification summand b(i) may be determined as follows:
  • icc_dec_t is a decoder internal ICC value for the current frame and icc_dec_t-1 is a decoder internal ICC value for the previous frame.
  • the decoder internal ICC value for the previous frame icc_dec_t-1 may be initialized for example to 1 at start-up.
  • the decoder internal ICC value for the current frame icc_dec_t may be determined as follows:
  • icc_q contains the quantized ICC values for those bands where the corresponding ICLD value is 1, and MIN returns the minimum of the specified input values.
  • eMax = max(eMono(i))
  • sbOffset defines again the offsets for the considered spectral bands, which may be the same as indicated above for the encoding.
  • iccGain_t is an adaptive gain that is initialized to a suitable value, for example to '6' at start-up. It is then updated for the respective next frame as follows, based on the energy(i) computed for the current frame in equation (9):
  • iccGain_t+1 = MIN(6, 0.3 · iccGain_t + 0.7 · iccGain_e) (10)
  • iccGain_t is the gain value of the current frame
  • iccGain_t+1 is the gain value for the next frame
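  A sketch of this gain recursion as reconstructed above (iccGain_e, the energy-derived gain from equation (9), is taken as an input here because that equation is not reproduced in this text):

      def update_icc_gain(icc_gain_t, icc_gain_e, cap=6.0, w_prev=0.3, w_energy=0.7):
          """iccGain_t+1 = MIN(cap, w_prev * iccGain_t + w_energy * iccGain_e), cf. equation (10)."""
          return min(cap, w_prev * icc_gain_t + w_energy * icc_gain_e)

      # Start-up: icc_gain = 6.0; per frame: icc_gain = update_icc_gain(icc_gain, icc_gain_e)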
  • Equation (9) introduces a time-frequency dependent gain for the decorrelation to improve the perceptual quality. This is especially advantageous for signal frames or other signal segments in which the ICLDs have a relatively flat response.
  • Detailed analysis of equation (8) shows, however, that decorrelation performs better with the first option whenever the value of icld(i) is equal to 1. Otherwise, an icld(i) of value 1, indicating that the channel signals are similar in a level difference sense, would drive icc_dec_t(i) in equation (8) to zero, and thus no perceptually significant decorrelation contribution could be expected when applying equation (6).
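  Since equations (6)-(9) are not reproduced in this text, only the decision structure can be sketched; the additive use of the summand b(i) and the two helper callables below are placeholders for those equations, not the patent's actual formulas:

      def modify_icld(icld, icc_flag_dec, icc_q, mono_based_summand, icc_based_summand):
          """Modify the decoded level differences only where the flag signals low correlation.
          mono_based_summand(i) and icc_based_summand(i, icc_q) stand in for equations (6)-(9)."""
          out = list(icld)
          for i, flag in enumerate(icc_flag_dec):
              if flag != 1:                   # '0': significant correlation, leave icld(i) untouched
                  continue
              if icld[i] == 1:                # similar levels: use the transmitted ICC values
                  b = icc_based_summand(i, icc_q)
              else:                           # dissimilar levels: derive b(i) from the mono signal
                  b = mono_based_summand(i)
              out[i] = icld[i] + b            # assumed additive application of the summand b(i)
          return out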
  • the original or modified stereo extension values (action 534, 536 or 537) are then used for reconstructing the multi-channel audio signal by up-mixing the decoded mono signal (action 522).
  • the reconstructed multi-channel audio signal is transformed again into the time domain (action 541) and then presented to a user via audio out interface 126.
  • the ICC flags are transmitted together with the actual audio data. They could also be transmitted separately from the other data.
  • a corresponding decorrelation could also be applied for an audio signal comprising more than two channels.
  • one of the channels could be selected to be a reference channel, and correlation flags could indicate the correlation between a respective channel and this reference channel.
  • correlation flags could indicate the correlation between any arbitrary pair of channels.
  • when processing an audio signal comprising more than two channels, more than one down-mixed signal could be generated and transmitted.
  • a set of ICLD values, ICC flags, and possibly ICC values may be provided for each down-mixed signal separately.
  • ICLD cues and inter-channel correlation could also be computed in the time domain instead of the frequency domain.
  • the presented approach could equally be employed for modifying other kinds of values representing a difference between channels than BCC ICLD cues, and other inter-channel correlation information than BCC ICC cues.
  • Another variation of the presented approach may comprise a modified computation of the inter-channel level differences.
  • An exemplary modified computation will be presented in the following for the case of a stereo audio signal .
  • the left and right channel input signals are converted to the frequency domain using a shifted discrete Fourier transform (SDFT) .
  • SDFT shifted discrete Fourier transform
  • the resulting complex- valued spectral samples are converted to the energy domain as follows:
  • f_L and f_R are the complex valued shifted discrete Fourier transform (SDFT) samples of the left and right channels, respectively, and N is the size of the frame.
  • SDFT shifted discrete Fourier transform
  • the energy level for each spectral subband is calculated according to:
  • offset_1 is a frequency offset table describing the frequency bin offsets for each spectral subband, and M is the number of spectral subbands present in the region.
  • the inter-channel level differences can then be determined for different frequency bands in the form of stereo gain values gain(i) as follows:
  • offset_2 is the frequency offset table describing the frequency bin offsets for each spectral subband
  • K is the number of spectral gain subbands present in the region
  • max() and min() return the maximum and minimum of the specified samples, respectively.
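  As a hedged sketch of this alternative computation (the SDFT is approximated here by an FFT with a half-bin modulation, and the exact energy and gain formulas are assumptions, since they are not reproduced in this text):

      import numpy as np

      def sdft(x):
          """Shifted DFT, approximated as an FFT of the frame with a half-bin modulation."""
          n = np.arange(len(x))
          return np.fft.fft(x * np.exp(-1j * np.pi * n / len(x)))

      def bin_energies(f):
          """Per-bin energy from complex SDFT samples f; the 1/N normalisation is assumed."""
          return np.abs(f) ** 2 / len(f)

      def subband_energies(e_bins, offset_1):
          """Energy level for each of the M spectral subbands defined by offset_1."""
          return np.array([np.sum(e_bins[offset_1[i]:offset_1[i + 1]])
                           for i in range(len(offset_1) - 1)])

      def stereo_gains(eL_bins, eR_bins, offset_2):
          """Illustrative stereo gain per subband as the ratio of the stronger to the weaker
          channel energy; the patent's actual gain(i) formula is not given in this text."""
          gains = []
          for i in range(len(offset_2) - 1):
              lo, hi = offset_2[i], offset_2[i + 1]
              eL = np.sum(eL_bins[lo:hi]) + 1e-12
              eR = np.sum(eR_bins[lo:hi]) + 1e-12
              gains.append(max(eL, eR) / min(eL, eR))
          return np.array(gains)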
  • gain values may then correspond to inter-channel level differences, which are modified whenever an ICC status indicates that there is a low correlation between the channels.
  • Additional position values may indicate to which channel a respective gain value belongs.
  • the position values may be post-processed to obtain a stable stereo image over time.
  • Figure 6 is a schematic diagram of an exemplary electronic device which supports a correlation status controlled modification of inter-channel level differences at an encoder side.
  • the electronic device 610 can be for instance a mobile phone, but equally any other device which is to be able to encode audio data for storage or transmission.
  • the device 610 comprises a processor 612 and, linked to this processor 612, a memory 613, an interface for receiving audio data 616, and a transmitter (TX) 617.
  • the processor 612 is configured to execute implemented computer program code .
  • the memory 613 stores computer program code 614, which may be retrieved by the processor 612 for execution.
  • the stored program code 614 comprises code for encoding audio data. It includes code for generating a mono signal, for generating stereo extension values, for determining inter-channel correlation values and status, for modifying the stereo extension values under control of the ICC status, and for encoding the mono signal and the modified stereo extension values for storage or transmission.
  • the memory 613 may comprise in addition a data storage portion 615.
  • the processor 612 and the memory 613 could optionally be integrated in a single component, for example on a chip 611.
  • the interface 616 could comprise for instance a plurality of microphones or a socket for connecting microphones.
  • the transmitter 617 could belong for example to a cellular engine of the device 610 and be configured to transmit data via a cellular communication network to other devices.
  • the operation illustrated in the flow chart of Figure 7 is to be understood to be realized by processor 612 when executing the code for encoding audio data 614 retrieved from memory 613, or by a corresponding hardware implementation.
  • when a multi-channel audio signal that is to be stored or transmitted is received by device 610 via audio interface 616, it is forwarded to the processor 612 for encoding.
  • the multi-channel audio signal is a stereo signal.
  • the data of the received multi-channel audio signal is divided into subsequent frames, and the processing of the data that is described in the following is performed on a frame-by-frame basis.
  • the multi-channel audio signal is transformed into the frequency domain (action 701) .
  • the left and right channel signals are combined to a mono signal (action 711) , and the mono signal is encoded for transmission (action 712) .
  • each frequency domain frame is divided into M frequency bands.
  • the left and right channel signals are used for determining multi-channel extension values, including ICLD values for each frequency band (action 721) .
  • inter-channel correlation is calculated for each spectral band in accordance with above equations (1) and (2) (action 731) .
  • the extension values are modified based on the mono signal values under control of the ICC status in accordance with above equations (6)-(10) (action 734) .
  • the ICC status could be determined to this end in accordance with above equation (4) . It is to be understood, however, that generating separate ICC flags is only optional, since no flags have to be transmitted in this case.
  • the extension values are modified based on the final ICC values under control of the ICC status in accordance with above equations (6) -(8) (action 735) .
  • the ICC status can be determined for this case in accordance with above equation (4), using for example a threshold value of 0.5 instead of 0.75 in equation (3).
  • a quantization of the ICC values may not be required in this case, since they are not necessarily transmitted.
  • the quantized values icc_q in equation (8) could simply be replaced by the final ICC values icc_t obtained with equation (3).
  • the quantized values icc_q in equation (8) could be replaced by the ICC values icc_t(i) determined in accordance with equation (1).
  • the modified multi-channel extension values are encoded (action 722) .
  • the encoded mono signal and the encoded modified stereo extension information are multiplexed to a bitstream for transmission or storage (action 741) .
  • Some decoders may then decode the encoded data in a conventional manner without applying any further decorrelation processing.
  • Using a correlation/coherence processing in a parametric multi-channel audio coding process may result in an improved user-experience due to enhanced spatial sensation.
  • Some embodiments of the invention allow reducing the correlation between channels that are derived from the mono signal by modifying values representing a difference between channels, for instance values representing a level difference. As a result, correlation between the channels better approximates that of the original stereo signal, thus improving the feeling of spaciousness.
  • Certain embodiments of the invention further allow improving the naturalness and subjective audio quality of a low bit-rate multi-channel audio coding system by using an improved and effective transmission or storage and by processing the correlation/coherence information in a way exploiting the data from previous frames.
  • Some embodiments may also be suited to improve the multi-channel audio quality across a wide range of signals.
  • Certain embodiments of the invention using information about a correlation status as a decision criterion whether to modify a value representing a difference between channels in the segment of the multi-channel audio signal ensure that the amount of decorrelation processing is reduced compared to an approach in which a decorrelation processing is performed in any case.
  • certain embodiments further ensure that the actual correlation values have to be provided at most in those cases in which a decorrelation is appropriate. Such embodiments thus enable a particularly low bitrate coding where only limited bits are available for the coding of the correlation information.
  • the lowest amount of data has to be provided if the correlation status information is encoded as a single bit, the association to frequency bands is predetermined, and the actual correlation values are never provided.
  • providing an association to frequency bands as additional information may render some embodiments more flexible, since the association may change in this case from segment to segment of the audio signal.
  • Providing the actual correlation values in selected cases may further improve the decorrelation without unduly increasing the amount of required side information.
  • the transmission of ICC values may be limited to a few cases, while otherwise only a one bit status may be transmitted. If the modification is performed at the encoder, certain embodiments ensure that less side information has to be provided to a decoder and that a decoder which does not support decorrelation processing at all could be employed.
  • Certain embodiments ensure that only deterministic values are used in the modification instead of random numbers. This ensures that the decorrelation procedure can be adapted better to the concrete spatial situation.
  • a connection is to be understood in a way that the involved components are operationally coupled.
  • connections can be direct or indirect with any number or combination of intervening elements, and there may be merely a functional relationship between the components.
  • any of the mentioned processors could be of any suitable type, for example a computer processor, an application-specific integrated circuit (ASIC), etc.
  • Any of the mentioned memories could be implemented as a single memory or as a combination of a plurality of distinct memories, and may comprise for example a read- only memory, a flash memory or a hard disc drive memory etc.
  • any other hardware components that have been programmed in such a way to carry out the described functions could be employed as well.
  • any of the actions described or illustrated herein may be implemented using executable instructions in a general-purpose or special-purpose processor and stored on a computer-readable storage medium (e.g., disk, memory, or the like) to be executed by such a processor.
  • a computer-readable storage medium e.g., disk, memory, or the like
  • references to 'computer-readable storage medium' should be understood to encompass specialized circuits such as field-programmable gate arrays, application-specific integrated circuits (ASICs), signal processing devices, and other devices.
  • the functions illustrated by the combination of processor 122 and memory 123, by the decorrelation block 306 or by the combination of processor 612 and memory 613 can be viewed as means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multichannel audio signal; and as means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
  • the program codes 124 or 614 can also be viewed as comprising such means in the form of functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Stereophonic System (AREA)

Abstract

For supporting a reconstruction of a multi-channel audio signal, correlation status information is evaluated at an encoder side or a decoder side. The correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal. In case the correlation status information indicates that there is no significant correlation between the channels, a value representing a difference between channels in the segment of the multi-channel audio signal is modified.

Description

Multi-channel audio coding
FIELD OF THE INVENTION
The invention relates to the field of multi-channel audio coding.
BACKGROUND OF THE INVENTION
Audio coding systems are used in particular for transmitting or storing audio signals.
In a basic structure of an audio coding system, which is employed for transmission of audio signals, the audio coding system comprises an encoder at a transmitting side and a decoder at a receiving side. The audio signal that is to be transmitted is provided to the encoder. The encoder is responsible for adapting the incoming audio data rate to a bitrate level at which the bandwidth conditions in the transmission channel are not violated. Ideally, the encoder discards only irrelevant information from the audio signal in this encoding process. The encoded audio signal is then transmitted by the transmitting side of the audio coding system and received at the receiving side of the audio coding system. The decoder at the receiving side reverses the encoding process to obtain a decoded audio signal with little or no audible degradation.
Alternatively, the audio coding system could be employed for archiving audio data. In that case, the encoded audio data provided by the encoder is stored in some storage unit, and the decoder decodes audio data retrieved from this storage unit. In this alternative, it is the target that the encoder achieves a bitrate which is as low as possible, in order to save storage space.
The original audio signal can be a mono audio signal or a multi-channel audio signal containing at least a first and a second channel signal. An example of a multichannel audio signal is a stereo audio signal, which is composed of a left channel signal and a right channel signal. Another example is an audio signal that is used for a surround technology and includes for example two stereo channels, an additional center channel and two surround channels.
Depending on the allowed bitrate, different encoding schemes can be applied to a multi-channel audio signal. The different channels can be encoded for instance independently from each other. But typically, a correlation exists between the different channels of a multi-channel audio signal, and the most advanced coding schemes exploit this correlation to achieve a further reduction in the bitrate.
Examples for reducing the bitrate for an encoded stereo audio signal comprise low bitrate stereo extension methods. In a stereo extension method, the stereo audio signal is encoded as a high bitrate mono signal, which is provided by the encoder together with some side information reserved for a stereo extension. In the decoder, the stereo audio signal is then reconstructed from the high bitrate mono signal in a stereo extension making use of the side information. The side information typically takes only a few kbps of the total bitrate. Parametric multi-channel audio coding methods, such as Binaural Cue Coding (BCC) , enable a high-quality multichannel reproduction with reasonable bit-rate compared to a scenario where all channels are encoded and transmitted separately. The compression of a spatial image is based on generating one or several down-mixed signals together with a set of spatial cues. The decoder uses the received down-mixed signals and the spatial cues to synthesize a set of channels - which can be different from the number of input channels - with spatial properties as described by the received spatial cues.
The spatial cues typically include an inter-channel level difference (ICLD), an inter-channel time difference
(ICTD) and an inter-channel coherence/correlation (ICC) .
ICLD and ICTD aim at describing the signals from the actual audio sources, whereas the ICC aims at enhancing the spatial sensation by introducing a diffuse component of the audio image, including reverberations, ambience, etc. These cues are normally provided for each frequency band separately.
The decoding side typically uses a filter that is controlled by the received ICC cues to recreate a coherence/correlation approximating the coherence/correlation which is present in the input signals .
SUMMARY OF SOME EMBODIMENTS OF THE INVENTION
A method is described, which comprises evaluating correlation status information, wherein the correlation status information indicates whether or not there is a
significant correlation between channels in a segment of a multi-channel audio signal. The method further comprises modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
Moreover, a first apparatus is described, which comprises a processor. The processor is configured to evaluate correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal. The processor is further configured to modify a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
The apparatus may comprise for example exclusively the described processor, but it may also comprise additional components. The apparatus could further be for example a module provided for integration into an electronic device, like a processing component, a chip or a circuit implementing the processor, or it could be such a device itself. In the latter case, it could be for instance an electronic device, which comprises in addition an interface configured to receive captured multi-channel audio signals and/or an interface configured to output multi-channel audio signals.
Moreover, a second apparatus is described, which comprises means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multichannel audio signal, and means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
The means of this apparatus can be implemented in hardware and/or software. They may comprise for instance a processor for executing computer program code for realizing the required functions, a memory storing the program code, or both. Alternatively, they could comprise for instance a circuit that is designed to realize the required functions, for instance implemented in a chipset or a chip, like an integrated circuit. It is to be understood that further or correspondingly adapted means may be comprised for realizing any of the functions that may optionally be implemented in any described embodiment of the first apparatus.
Moreover, a computer readable storage medium is described, in which computer program code is stored. The computer program code realizes the described method when executed by a processor. The computer readable storage medium could be for example a disk or a memory or the like. The computer program code could be stored in the computer readable storage medium in the form of instructions encoding the computer-readable storage medium. It is to be understood that also the computer program code by itself has to be considered an embodiment of the invention. Thus, certain embodiments of the invention provide that information about a correlation status is used as a decision criterion whether to apply a certain modification to a value indicating a difference between different channels of a multi-channel audio signal.
The considered audio signal can be for instance a speech signal, but equally any other kind of audio signal, like a music signal. The considered segment can be for instance a frame of an audio signal, but equally any other kind of segment, like a superframe or a subframe. An audio signal may comprise any number of segments, including one. The described processing can further be performed for example for each of a plurality of frequency bands in the segment of an audio signal, only for selected ones of a plurality of frequency bands, or on the entire frequency range of the segment of the audio signal as a whole. A selection of frequency bands could also differ from one segment to the next.
The multi-channel audio signal may comprise only two channels, for instance the left and right channel of a stereo signal, or any other number of channels, for instance five channels for a surround audio signal. The correlation information status could be derived for instance from an inter-channel correlation (ICC) cue obtained in a binaural cue coding, but it could be obtained in any other manner as well which is suited to indicate whether or not a significant correlation between channels is given. The difference value that may be modified could be for instance an inter-channel level difference (ICLD) cue obtained in a binaural cue coding, but it could equally be any other kind of value that can be modified to increase the decorrelation of the channels if appropriate.
In one embodiment of the described method, the described apparatuses or the described computer program code, the correlation status information is represented by a single bit.
In one embodiment of the described method or the described computer program code, the modifying of a value representing a difference between channels is based on equations using exclusively non-random values. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to realize a corresponding modification.
In one embodiment of the described method or the described computer program code, the modifying of a value representing a difference between channels is based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi-channel audio signal. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to realize a corresponding modification.
In one embodiment of the described method or the described computer program code, the modifying of a value representing a difference between channels is based alternatively or in addition on a value representing a mono audio signal, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates a dissimilar level of a signal in the channels. In this case, a value indicating the amount of the correlation itself might not be required, so that only the correlation status has to be provided. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to realize a corresponding modification.
In one embodiment of the described method, the modifying of a value representing a difference between channels is based alternatively or in addition on a correlation value indicating the amount of correlation between the channels, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates in contrast a similar level of a signal in the channels. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to realize a corresponding modification. In a corresponding embodiment of the described computer program code, the code may be implemented to realize a corresponding modification.
In one embodiment of the described method, the modifying of a value representing a difference between channels is performed at an encoder side, which generates the correlation status information and the value representing a difference between the channels. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is associated to an encoder side, which generates the correlation status information and the value representing a difference between the channels . In a corresponding embodiment of the described computer program code, the code is code for such an encoder side. In one embodiment of the described method, the modifying of a value representing a difference between channels is performed at a decoder side, which is provided with correlation status information and a value representing a difference between channels generated by an encoder side. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is associated to such a decoder side. In a corresponding embodiment of the described computer program code, the code is code for such a decoder side.
In a variation of this embodiment of the described method, the method comprises obtaining at the decoder side in addition information on frequency bands for which the correlation status information is valid. In a corresponding embodiment of one of the described apparatuses, the processor or some other means is configured to obtain such additional information. In a corresponding embodiment of the described computer program code, the code may be implemented to obtain such additional information.
In one embodiment, a method is an information providing method, comprising the steps of evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels. In a further embodiment, an apparatus is an information providing apparatus comprising processing means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and processing means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
In one embodiment of the invention, one of the described apparatuses can be seen as an audio signal encoding or decoding apparatus.
It is to be understood that any feature presented for a particular exemplary embodiment may also be used in combination with any other described exemplary embodiment of any category.
Further, it is to be understood that the presentation of the invention in this section is merely exemplary and non-limiting.
Other objects and features of the present invention will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for purposes of illustration and not as a definition of the limits of the invention, for which reference should be made to the appended claims . It should be further understood that the drawings are not drawn to scale and that they are merely intended to conceptually illustrate the structures and procedures described herein.
BRIEF DESCRIPTION OF THE FIGURES
Fig. 1 is a schematic block diagram of a coding system in which an exemplary embodiment of the invention is implemented;
Fig. 2 is a schematic block diagram presenting functional blocks of an exemplary encoder;
Fig. 3 is a schematic block diagram presenting functional blocks of an exemplary decoder;
Fig. 4 is a flow chart illustrating an operation at an encoding side in the system of Figure 1;
Fig. 5 is a flow chart illustrating an operation at a decoding side in the system of Figure 1;
Fig. 6 is a schematic block diagram of an electronic device in which another exemplary embodiment of the invention is implemented; and
Fig. 7 is a flow chart illustrating an operation in the electronic device of Figure 6.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 is a schematic diagram of an exemplary system which supports a correlation status controlled modification of inter-channel level differences.
The system comprises a first electronic device 110 and a second electronic device 120.
The first electronic device 110 can be for instance a mobile phone, but equally any other device which is to be able to encode audio data for storage or transmission, for example an audio recording device.
The device 110 comprises a processor 112 and, linked to this processor 112, a memory 113, an interface for receiving captured audio data 116, and a transmitter (TX) 117.
The processor 112 is configured to execute implemented computer program code.
The memory 113 stores computer program code 114, which may be retrieved by the processor 112 for execution. The stored program codes 114 comprise code for encoding audio data. It includes code for generating a mono signal, for generating stereo extension cues and for generating inter-channel correlation values and status. The memory 113 may comprise in addition a data storage portion 115.
The processor 112 and the memory 113 could optionally be integrated in a single component, for example on a chip 111.
The interface 116 could be for instance microphones or comprise a socket for connecting microphones.
The transmitter 117 could belong for example to a cellular engine of the device 110 and be configured to transmit data via a cellular communication network to other devices.
The second electronic device 120 can also be for instance a mobile phone, but equally any other device which is able to decode audio data for presentation to a user. The device 120 comprises a processor 122 and, linked to this processor 122, a memory 123, an interface for presenting audio data 126 to a user and a receiver (RX) 127.
The processor 122 is configured to execute implemented computer program code.
The memory 123 stores computer program code 124, which may be retrieved by the processor 122 for execution. The stored program codes 124 comprise code for decoding audio data. It includes code for modifying stereo extension values under control of an inter-channel correlation status, and for reconstructing a multi-channel audio signal. The memory 123 may comprise in addition a data storage portion (not shown) .
The processor 122 and the memory 123 could optionally be integrated in a single component, for example on a chip 121.
The interface 126 could comprise for instance loudspeakers or a socket for connecting loudspeakers.
The receiver 127 could belong for example to a cellular engine of the device 120 and be configured to receive data via a cellular communication network from other devices .
The interfaces 117 and 127 are configured in any case such that they enable device 110 to transmit encoded audio data to device 120, either directly on a wired or wireless link or indirectly via some communication network.
Figure 2 is a high-level block diagram of an encoder implemented by the program code 114 of device 110. It is to be understood that the block diagram could equally represent functional blocks of a hardware implementation of an encoder providing the same functions as the program code 114. The blocks are shown to process stereo data, but it has to be noted that the encoder may be adapted for processing audio signals with more than two channels.
The encoder includes a transform block 201 for transforming the data of a left channel of an audio signal 'L' from the time domain into the frequency domain. The resulting frequency domain signal is denoted 'Lf'. The encoder further includes a transform block 202 for transforming the data of a right channel of an audio signal 'R' from the time domain into the frequency domain. The resulting frequency domain signal is denoted 'Rf'.
The encoder moreover includes a mono conversion block 203, which is configured to create a down-mixed signal by converting the stereo signal into a mono signal
Mf = 0.5·(Lf + Rf) and to pass the mono signal to a mono encoder, for example to an embedded variable bitrate (EV-VBR) mono encoder 204. The mono conversion block 203 may be further configured to generate a difference signal, for example Df = 0.5·(Lf − Rf), from the stereo signal and to pass the difference signal to a stereo encoder 205 to assist the stereo encoding process. Optionally, a different way to create the down-mixed signal and the difference signal can be used, for example one comprising a linear combination of the input channels with possible phase correction.
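For illustration, a minimal sketch of this down-mix step is given below; the function name downmix_frame, the separate real-valued arrays and the use of float precision are assumptions made for the example and are not prescribed by the described encoder.

    #include <stddef.h>

    /* Down-mix one frame of frequency-domain stereo data into the mono signal
     * Mf = 0.5*(Lf + Rf) and the difference signal Df = 0.5*(Lf - Rf).
     * Each array holds one value per spectral bin; the real and imaginary parts
     * of a complex spectrum can be processed with two separate calls. */
    static void downmix_frame(const float *Lf, const float *Rf,
                              float *Mf, float *Df, size_t num_bins)
    {
        for (size_t j = 0; j < num_bins; j++) {
            Mf[j] = 0.5f * (Lf[j] + Rf[j]);
            Df[j] = 0.5f * (Lf[j] - Rf[j]);
        }
    }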
The mono encoder 204 is configured to encode a received mono signal and to provide a resulting bitstream to a bitstream multiplexer 208. The stereo encoder 205 is configured to generate and encode stereo extension data, including a quantization to obtain a desired bitrate, and to provide a resulting bitstream to the multiplexer 208. Any kind of stereo encoder could be used to this end.
The encoder moreover includes further transformers 206, which are configured to transform the left and right channel signals L and R to the frequency domain, and a correlation encoder 207, which is configured to analyze the left and right channel signals in the frequency domain, to decide which of the spectral bands need decorrelation at a decoder side, and to pass corresponding correlation flags to the multiplexer 208. It is to be understood that in an alternative embodiment instead of employing separate transformers 206, also the output of transformers 201, 202 could be provided to correlation encoder 207.
Finally, the multiplexer 208 is configured to multiplex all received information to create a bitstream for storage or transmission.
Figure 3 is a high-level block diagram of a decoder implemented by the program code 124 of device 120. It is to be understood that the block diagram could equally represent functional blocks of a hardware implementation of a decoder providing the same functions as the program code 124. The blocks are shown to process stereo data, but it has to be noted that the decoder may be adapted for processing audio signals with more than two channels.
The decoder includes a demultiplexer 307, which is configured to demultiplex a bitstream that has been retrieved from a memory or received from another device and to pass the demultiplexed data to a mono decoder, for example an EV-VBR mono decoder 304, to a decorrelation block 306 and to a stereo decoder 305.
The mono decoder 304 is configured to decode a received encoded mono signal.
The stereo decoder 305 is configured to extract and decode stereo extension data from the bitstream, to combine this data with the decoded mono signal to reconstruct a stereo signal, and to output the reconstructed left and right output channels Lf and Rf to inverse transformers 301 and 302. In addition, the stereo decoder 305 is configured to provide the extracted stereo extension values to the decorrelation block 306 before using them in the reconstruction and to receive modified stereo extension values from the decorrelation block 306 for use in the reconstruction.
The decorrelation block 306 is configured to extract correlation flags from the bitstream, to modify the stereo extension values when needed, and to provide the modified values to the stereo decoder 305.
It is to be understood that in a practical implementation, stereo decoder 305 and decorrelation block 306 can be integrated in a single functional block for optimizing the processing.
The inverse transformer 301 is configured to obtain the time domain left channel L by performing a frequency-to- time domain transformation on reconstructed left channel Lf, and the inverse transformer 302 is configured to obtain the time domain right channel R by performing a frequency-to-time domain transformation on reconstructed right channel Rf.
Finally, the regained stereo signal may be provided for presentation to a user or stored for later consumption.
The operation of the encoder implementation of device 110 will be described in more detail with reference to the flow chart of Figure 4.
The operations can be considered to be realized by processor 112 when executing the code for encoding audio data 114 retrieved from memory 113, and equally to be realized by the corresponding functional blocks of the encoder of Figure 2.
When a multi-channel audio signal that is to be stored or transmitted is received by device 110 via audio interface 116, it is forwarded to the processor 112 for encoding. Only for reasons of simplicity, it will be assumed again that the multi-channel audio signal is a stereo signal.
The data of the received multi-channel audio signal is divided into subsequent frames, and the processing of the data that is described in the following is performed on a frame-by-frame basis. The multi-channel audio signal is transformed into the frequency domain (action 401) . The employed transform can be any complex valued transform such as a discrete Fourier transform (DFT) , a quadrature mirror filterbank (QMF) transform, or a combination of a modified discrete cosine transform (MDCT) and a modified discrete sine transform (MDST) . In an exemplary implementation, MDCT is used to obtain the real valued signals whereas MDST is used to obtain the imaginary counterpart for the same input signal.
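As an illustration of how a complex-valued spectrum can be obtained from an MDCT/MDST pair, the following sketch uses one common direct-form definition of the two transforms; the exact transform definition, the windowing and the overlap handling are not specified above, so this is only an assumed reference implementation with illustrative names.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Direct-form MDCT and MDST of one block of 2*N (windowed) time samples.
     * The MDCT output serves as the real part and the MDST output as the
     * imaginary part of an N-bin complex spectral representation.
     * O(N^2) reference version; fast implementations would use an FFT. */
    static void mdct_mdst(const float *x, int N, float *re, float *im)
    {
        for (int k = 0; k < N; k++) {
            double sum_c = 0.0, sum_s = 0.0;
            for (int n = 0; n < 2 * N; n++) {
                double arg = (M_PI / N) * (n + 0.5 + N / 2.0) * (k + 0.5);
                sum_c += x[n] * cos(arg);
                sum_s += x[n] * sin(arg);
            }
            re[k] = (float)sum_c;
            im[k] = (float)sum_s;
        }
    }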
The left and right channel signals are down-mixed to a mono signal (action 411), and the mono signal is encoded for transmission (action 412) .
For the stereo extension, the frequency range of each frequency domain frame is divided into a plurality of frequency bands .
The left and right channel signals are used for determining multi-channel extension values for each frequency band, including ICLD values (action 421) . The difference signal can be used for enhancing the stereo quality, in particular when higher bitrates are available .
The ICLD values could be determined for each frequency band for example as the logarithm of the power ratio of corresponding subbands from the input signal as follows: id*) = 10 loBl0 - pψRig*h®Ui:)
Offset[i +l]-l
PLeft{±) = ∑ [Lrealf (j)2 + L±magf (jf) j-Offset[i]
Offset[i+l]-l pRightiϊ) = ∑ (RrealfOf + Rimagf(jf) j=Offset{i]
where icld(i) is the ICLD cue for frequency band i of the current frame, where Offset describes the start and end indices for each spectral band, and where Lreal , Lj_mag ,
Rreaif and R±magf are the complex valued spectral representations of the left and right channels.
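A sketch of this per-band ICLD computation could look as follows; the function name, the array layout and the small floor value used to avoid division by zero are assumptions made for the example.

    #include <math.h>

    /* Compute icld(i) = 10*log10(pLeft(i)/pRight(i)) for each frequency band,
     * where pLeft and pRight are the band-wise energies of the left and right
     * complex spectra and Offset[] holds the band boundaries. */
    static void compute_icld(const float *Lreal, const float *Limag,
                             const float *Rreal, const float *Rimag,
                             const int *Offset, int num_bands, float *icld)
    {
        const float eps = 1e-12f;  /* avoids log10(0) and division by zero */
        for (int i = 0; i < num_bands; i++) {
            float pLeft = 0.0f, pRight = 0.0f;
            for (int j = Offset[i]; j < Offset[i + 1]; j++) {
                pLeft  += Lreal[j] * Lreal[j] + Limag[j] * Limag[j];
                pRight += Rreal[j] * Rreal[j] + Rimag[j] * Rimag[j];
            }
            icld[i] = 10.0f * log10f((pLeft + eps) / (pRight + eps));
        }
    }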
The multi-channel extension values are encoded for transmission (action 422) .
Moreover, a correlation measure, such as the inter-channel correlation (ICC), is calculated for each of a plurality of spectral bands (action 431).
The inter-channel correlation (ICC) can be calculated for example as follows :
$$icc_t(i) = 0.3 \cdot icc_{t-1}(i) + 0.7 \cdot a(i), \quad 0 \le i < M \qquad (1)$$

where icc_t(i) is the inter-channel correlation in frequency band i of the current frame, where icc_{t-1} contains the ICC values from the previous frame and where M is the number of spectral bands present for each frame. icc_{t-1} could be initialized to 1 (or any other suitable value) at start up. Optionally, the correlation measures in several previous frames could be taken into account, for example by generalizing equation (1) into a weighted sum of the past values:

$$icc_t(i) = \sum_{j \ge 1} k_j \cdot icc_{t-j}(i) + k_0 \cdot a(i), \quad 0 \le i < M \qquad (1b)$$

where icc_{t-j} is the correlation measure in frequency band i of the j:th frame counting backwards from the current frame, and k_j is the weight assigned to the correlation measure in frequency band i of the j:th frame counting backwards from the current frame.
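A sketch of the first-order smoothing of equation (1) is given below; it assumes, as is usual for such recursions, that the smoothed value of the previous frame is used as icc_{t-1}(i). The state array, its in-place update and the initialization to 1 follow the text above, while the function name is illustrative.

    /* Smooth the per-band correlation measure a(i) over time according to
     * equation (1): icc_t(i) = 0.3*icc_{t-1}(i) + 0.7*a(i).
     * icc_state holds the values of the previous frame (initialized to 1.0
     * at start-up) and is updated in place, so after the call it contains
     * the icc_t values of the current frame. */
    static void smooth_icc(const float *a, float *icc_state, int num_bands)
    {
        for (int i = 0; i < num_bands; i++)
            icc_state[i] = 0.3f * icc_state[i] + 0.7f * a[i];
    }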
Furthermore, the values a(i) can be computed as follows:
[Equation image imgf000021_0001: definition of a(i) in terms of pE(i), pLeft(i) and pRight(i)]

$$pE(i) = preal(i)^2 + pimag(i)^2$$

$$preal(i) = \sum_{j=sbOffset[i]}^{sbOffset[i+1]-1}\left(Lreal_f(j) \cdot Rreal_f(j) + Limag_f(j) \cdot Rimag_f(j)\right)$$

$$pimag(i) = \sum_{j=sbOffset[i]}^{sbOffset[i+1]-1}\left(Limag_f(j) \cdot Rreal_f(j) + Lreal_f(j) \cdot Rimag_f(j)\right)$$

$$pLeft(i) = \sum_{j=sbOffset[i]}^{sbOffset[i+1]-1}\left(Lreal_f(j)^2 + Limag_f(j)^2\right)$$

$$pRight(i) = \sum_{j=sbOffset[i]}^{sbOffset[i+1]-1}\left(Rreal_f(j)^2 + Rimag_f(j)^2\right) \qquad (2)$$

where sbOffset describes the start and end indices for each spectral band, and Lreal_f, Limag_f, Rreal_f and Rimag_f are again the complex valued spectral representations of the left and right channels. The spectral bands that are considered for the ICC related computations can be the same as those considered for the computation of the ICLD values, but they may equally be different. For calculating the ICC, for example M = 14 frequency bands could be selected with the following predetermined start and end indices sbOffset[] for each spectral band:
sbOffset[] = {0, 5, 11, 18, 25, 33, 43, 56, 72, 91, 116, 146, 183, 240, 274}
With an exemplary frequency resolution per spectral bin of 25 Hz, the considered spectral bands for the ICC related computations could then cover 6850 Hz. The total frequency range of the audio signal segment, which is considered in the computation of the ICLD values, could be larger than 6850 Hz, but decorrelation might not be applied to higher frequencies where the impact to subjective quality is lower.
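The band-wise sums of equation (2) could be accumulated as in the following sketch; the struct and function names are illustrative, and the final combination of pE(i), pLeft(i) and pRight(i) into a(i) is available in the original only as an equation image and is therefore not reproduced here.

    typedef struct {
        float pE;      /* preal(i)^2 + pimag(i)^2 */
        float pLeft;   /* band energy of the left channel */
        float pRight;  /* band energy of the right channel */
    } band_corr_terms;

    /* Accumulate the cross-spectral and energy terms of equation (2) for
     * spectral band i, using the sbOffset[] table given above. */
    static band_corr_terms corr_terms_for_band(const float *Lreal, const float *Limag,
                                               const float *Rreal, const float *Rimag,
                                               const int *sbOffset, int i)
    {
        float preal = 0.0f, pimag = 0.0f;
        band_corr_terms t = {0.0f, 0.0f, 0.0f};
        for (int j = sbOffset[i]; j < sbOffset[i + 1]; j++) {
            preal    += Lreal[j] * Rreal[j] + Limag[j] * Rimag[j];
            pimag    += Limag[j] * Rreal[j] + Lreal[j] * Rimag[j];
            t.pLeft  += Lreal[j] * Lreal[j] + Limag[j] * Limag[j];
            t.pRight += Rreal[j] * Rreal[j] + Rimag[j] * Rimag[j];
        }
        t.pE = preal * preal + pimag * pimag;
        return t;
    }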
Next, the final ICC value for each band is obtained as follows (action 432) :
$$icc_t(i) = \begin{cases} 1, & icc_t(i) > 0.75 \\ icc_t(i), & \text{otherwise} \end{cases}, \quad 0 \le i < M \qquad (3)$$
In the presented exemplary implementation, thus all original ICC values exceeding 0.75 are mapped to a value of '1' indicating that the channel signals do not differ greatly from each other and no decorrelation is needed at the decoder side. The exemplary value 0.75 in equation (3) can be considered as a threshold for a correlation measure value (ICC) indicating significant correlation. The threshold value can be a fixed or adaptive value, and it may be selected for example based on desired performance, based on the application, based on the characteristics of the input signal, etc.
Next, the final ICC values are mapped to flag bits for bitstream multiplexing (action 433). In an exemplary implementation, respectively two neighboring ICC values are mapped to a single flag bit to further save the side information associated with the signaling bits as follows:

$$iccflag(j) = \begin{cases} \text{'0' bit}, & icc_t(i) = 1 \ \text{or} \ icc_t(i+1) = 1, \quad i = 0, 2, 4, \ldots \\ \text{'1' bit}, & \text{otherwise}, \quad j = 1, 2, 3, \ldots, M/2 \end{cases} \qquad (4)$$
Each flag bit thus provides correlation status information for two frequency bands of a frame indicating that there is a significant correlation between the channels in these frequency bands in the current frame (flag bit = '0') or that there is no significant correlation between the channels in these frequency bands in the current frame (flag bit = '1'). Alternative approaches include having one flag bit per frequency band or one flag bit for any arbitrarily selected set of frequency bands.
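A sketch of this mapping from pairs of final ICC values to flag bits, following the reconstruction of equation (4) above and assuming an even number of bands M; the function name and the 0/1 byte representation of the flag bits are chosen only for illustration.

    /* Map two neighboring final ICC values to one flag bit: a pair gets the
     * '0' bit (significant correlation, no decorrelation needed) when at least
     * one of the two values was mapped to 1 by the thresholding of equation (3),
     * and the '1' bit otherwise. iccflag has M/2 entries. */
    static void map_icc_to_flags(const float *icc_final, int M,
                                 unsigned char *iccflag)
    {
        for (int i = 0; i + 1 < M; i += 2) {
            int significant = (icc_final[i] == 1.0f) || (icc_final[i + 1] == 1.0f);
            iccflag[i / 2] = significant ? 0 : 1;
        }
    }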
Finally, the flag bits are provided for transmission or storage as follows:
for (i=0; i < M/2; i++)
    Send/store value of iccflag(i) with 1 bit

Optionally, it could be determined in addition whether the signal level in a frequency band is very similar across the channels, that is, whether the ICLD cue is equal to 1 for any frequency band (action 434). In this case, the final ICC value itself may be provided in addition in the bitstream for enabling a better decorrelation.
To this end, the final ICC values of equation (3) could be quantized for encoding as follows (action 435) :
$$icc_{q\_idx}(i) = Q(icc_t(i),\ qTbl), \quad 0 \le i < M \qquad (5)$$
where qTbl describes a table for quantized ICC values and where the quantization operator Q() returns the table index that minimizes the squared error between the ICC value in question and the quantization table value corresponding to the index. In an exemplary implementation, the table is as follows:
qTbl[] = {0.4, 0.3, 0.2, 0.1}.
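The quantization operator Q() of equation (5) can be sketched as a nearest-neighbour search over qTbl; the function name is illustrative.

    /* Return the index into qTbl that minimizes the squared error between the
     * given ICC value and the table entry, as required by Q() in equation (5). */
    static int quantize_icc(float icc, const float *qTbl, int table_size)
    {
        int best_idx = 0;
        float best_err = (icc - qTbl[0]) * (icc - qTbl[0]);
        for (int k = 1; k < table_size; k++) {
            float err = (icc - qTbl[k]) * (icc - qTbl[k]);
            if (err < best_err) {
                best_err = err;
                best_idx = k;
            }
        }
        return best_idx;
    }

With the exemplary table above, an ICC value of 0.27 would for instance be mapped to index 1, that is, to the table value 0.3.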
As the value of an ICLD cue represents the level difference between the channel signals and does not take into account the phase difference, the threshold value for the high correlation status in equation (3) could be decreased from 0.75 for example to 0.5 to limit the decorrelation only to spectral bands where decorrelation is perceptually most relevant. This has also the advantage that the side information gets simultaneously minimized. The ICC flag could thus be re-mapped in case of an ICLD cue equal to 1 and provided for transmission or storage as follows:

for (i=0; i < M/2; i++) {
    if (icld(2*i) == 1 and icc_t(i) > 0.5)
        iccflag(i) = '0' bit
    Send value of iccflag(i) with 1 bit
}
The quantized indices of the final ICC values could then be provided for transmission or storage as follows:
for (i=0; i < M/2; i++) {
    if (iccflag(i) == '1' bit)
        if (icld(2*i) == 1)
            Send value of icc_q_idx(i) with 2 bits
}
It is assumed in this implementation that the mapping between flag bits and frequency bands is known from the context. Optionally, however, information indicating for which frequency band or bands a respective flag applies or any further information could be provided using additional bits (action 436).
The encoded mono signal, the encoded stereo extension information and the decorrelation information, the latter including ICC flags and optionally encoded ICC values, are multiplexed to a bitstream for transmission via interface 117 or storage in data storage portion 115 (action 441) .
The bitstream can be constructed in such a way that all encoded data belonging to the same frame, i.e. the encoded mono signal, the encoded stereo extension information and the decorrelation information, are included in a single data unit. In another example the encoded mono signal of a frame can be encapsulated in one data unit, while the stereo extension information and the decorrelation information for this frame are combined into another data unit. In yet another example the encoded data of a frame is encapsulated in several data units, each comprising encoded data representing a certain frequency range.
The operation of the decoder implementation of device 120 will be described in more detail with reference to the flow chart of Figure 5.
The operations can be considered to be realized by processor 122 when executing the code for decoding audio data 124 retrieved from memory 123, and equally to be realized by the corresponding functional blocks of the decoder of Figure 3.
When receiving encoded multi-channel audio data via receiver 127 that is to be presented to a user, the data is forwarded to the processor 122 for processing.
The received bitstream is first demultiplexed (action 501) .
The mono signal is extracted from the bitstream and decoded (action 511) .
The extension values, including for example ICLD cues, are equally extracted from the bitstream and decoded (action 521) . Moreover, ICC flags are extracted from the bitstream and expanded to full resolution (action 531) as follows:
for (i=0; i < M/2; i++) {
    iccflag_dec(2*i) = read 1 bit
    iccflag_dec(2*i + 1) = iccflag_dec(2*i)
}
The "read 1 bit", which is used as a respective decoded ICC flag icCfiagι corresponds to the iccfiag determined in equation (4) for a respective pair of neighboring frequency bands - optionally modified in case of an ICLD cue equal to 1 as described above. The number of received ICC flags is doubled by associating the same flag Iccfiag dec to two neighboring frequency bands, respectively.
If available, additional associated information is extracted from the bitstream and decoded, for example ICC values or information linking a respective ICC flag and/or value to a respective frequency band. Indices for the ICC values could be read from the bitstream right after the flag bits and converted into ICC values as follows:
for (i=0; i < M/2; i++) {
    if (iccflag_dec(2*i) == '1' bit)
        if (icld(2*i) == 1) {
            icc_q(2*i) = qTbl[read 2 bits]
            icc_q(2*i + 1) = icc_q(2*i)
        }
}
The "read 2 bits" correspond to a respective index icCq_idχ as defined above in equation (5), while iccq is the quantized value associated to a particular index iccq_idx in table qTbl. An exemplary table qTbl has already been introduced above. Also the number of obtained quantized ICC values is doubled by associating the same quantized ICC value to two neighboring frequency bands, respectively.
Next, the decoded ICC flag is evaluated for all frequency bands of the current frame (action 532). That is, it is determined whether it has a value of '1' representing a low correlation or a value of '0' representing a significant correlation between the channels.
In case of an ICC flag representing a significant correlation (action 533), the decoded extension values are not modified (action 534).
In case of an ICC flag representing a correlation that is not significant (action 533), the decoded extension values are modified (action 536 or 537) .
Both cases can be summarized with the following equation:
$$icld(i) \leftarrow \begin{cases} icld(i) + b(i), & \text{if no significant correlation is indicated for band } i \\ icld(i), & \text{otherwise} \end{cases} \qquad (6)$$

where icld(i) is the decoded level difference for each frequency band i of a respective frame, as extracted from the bitstream.
The modification summand b(i) may be determined as follows:
$$b(i) = 0.3 \cdot icc\_dec_{t-1}(i) + 0.7 \cdot icc\_dec_t(i) \qquad (7)$$
where icc_dec_t is a decoder internal ICC value for the current frame and where icc_dec_{t-1} is a decoder internal ICC value that contains the decoder ICC value from the previous frame. The decoder internal ICC value for the previous frame icc_dec_{t-1} may be initialized for example to 1 at start up. The decoder internal ICC value for the current frame icc_dec_t may be determined as follows:
[Equation (8): the decoder internal ICC value icc_dec_t(i), given in the original as equation images imgf000029_0001 and imgf000029_0002, is defined piecewise, with one case for icld(i) equal to 1 that is based on the quantized ICC values, and another case otherwise that is based on the energy-dependent scaling of equation (9) below]

where icc_q contains the quantized ICC values for those bands where the corresponding ICLD value is 1, and where MIN returns the minimum of the specified input values.
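Assuming that icc_dec_t(i) has already been determined for each band according to equation (8), the computation of b(i) in equation (7) and the conditional modification of equation (6) could be sketched as follows; the function name, the additive in-place update and the byte representation of the decoded flags are assumptions made for the example.

    /* For each frequency band, compute the modification summand
     * b(i) = 0.3*icc_dec_{t-1}(i) + 0.7*icc_dec_t(i) and, when the decoded
     * flag signals no significant correlation ('1'), add it to the decoded
     * level difference. icc_dec_prev is updated to serve the next frame. */
    static void apply_decorrelation(float *icld, const unsigned char *iccflag_dec,
                                    const float *icc_dec_t, float *icc_dec_prev,
                                    int num_bands)
    {
        for (int i = 0; i < num_bands; i++) {
            float b = 0.3f * icc_dec_prev[i] + 0.7f * icc_dec_t[i];
            if (iccflag_dec[i] == 1)        /* '1' bit: no significant correlation */
                icld[i] += b;               /* modified case of equation (6) */
            icc_dec_prev[i] = icc_dec_t[i]; /* state for the next frame */
        }
    }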
The adapted scaling using the energy parameter is based on the decoded mono signal and determined as follows (equation (9); parts of the equation are available in the original only as equation images):

[Equation images imgf000030_0001 and imgf000030_0002: energy(i) is defined piecewise for 0 ≤ i < M, depending on whether e(i) > iccGain_t]

$$d(i) = eMono(i) \cdot \frac{iccGain_t}{eMax}$$

[Equation image imgf000030_0003: numerator of eMono(i), which is divided by the band width sbOffset[i+1] − sbOffset[i]]

$$eMax = \max_i\, eMono(i) \qquad (9)$$
where Mf is the frequency domain signal of the decoded mono signal. sbOffset[] defines again the offsets for the considered spectral bands, which may be the same as indicated above for the encoding. iccGain_t is an adaptive gain that is initialized to a suitable value, for example to '6' at start up. Then, it is updated for the respective next frame as follows based on the energy(i) computed for the current frame in equation (9):
$$iccGain_{t+1} = MIN\left(6,\; 0.3 \cdot iccGain_t + 0.7 \cdot iccGain_e\right) \qquad (10)$$

where iccGain_t is the gain value of the current frame, iccGain_{t+1} is the gain value for the next frame, and

[Equation image imgf000030_0004: definition of iccGain_e]
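The gain recursion of equation (10) itself is straightforward; since the derivation of iccGain_e is available only as an equation image, it is passed in here as a precomputed value, and the function name is illustrative.

    /* Update the adaptive decorrelation gain for the next frame:
     * iccGain_{t+1} = MIN(6, 0.3*iccGain_t + 0.7*iccGain_e).
     * iccGain_t is initialized to 6 at start-up, as described above. */
    static float update_icc_gain(float iccGain_t, float iccGain_e)
    {
        float next = 0.3f * iccGain_t + 0.7f * iccGain_e;
        return next < 6.0f ? next : 6.0f;
    }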
Equation (9) introduces a time-frequency dependent gain for the decorrelation to improve the perceptual quality. This is especially advantageous for signal frames or other signal segments in which the ICLDs have a relatively flat response.
It is thus checked in accordance with equation (8) whether ICLD is equal to 1 (action 535). If this is not the case the modification takes place based on the mono signal Mf, the extension values icld(i), the frequency band offsets sbOffset[] and intermediate values from previous frames, namely decoder ICC values icc_dec_{t-1}(i) and the adaptive gain iccGain_t (action 536). Otherwise, the modification takes place based on the received extension values icld(i), the quantized ICC values icc_q(i) and decoder ICC values icc_dec_{t-1}(i) from the previous frame (action 537).
It has to be noted that the first option of equation (8) for the cases of icld(i) == 1 is only used in an implementation in which also ICC values are transmitted in case the ICLD values are equal to 1, otherwise the option is simply omitted. Detailed analysis of equation (8) shows, however, that decorrelation performs better with the first option whenever the value of icld(i) is equal to 1. Otherwise, icld(i) of value 1, indicating that the channel signals are similar in a level difference sense, would lead icc_dec_t(i) in equation (8) to zero and thus, no perceptually significant decorrelation contribution could be expected when applying equation (6).
The original or modified stereo extension values (action 534, 536 or 537) are then used for reconstructing the multi-channel audio signal by up-mixing the decoded mono signal (action 522).
Finally, the reconstructed multi-channel audio signal is transformed again into the time domain (action 541) and then presented to a user via audio out interface 126.
It has to be noted that it is not required that the ICC flags are transmitted together with the actual audio data. They could also be transmitted separately from the other data.
Further, as mentioned before, a corresponding decorrelation could also be applied for an audio signal comprising more than two channels. In this case, one of the channels could be selected to be a reference channel, and correlation flags could indicate the correlation between a respective channel and this reference channel. Alternatively, correlation flags could indicate the correlation between any arbitrary pair of channels.
Furthermore, in an embodiment processing an audio signal comprising more than two channels more than one down- mixed signal could be generated and transmitted. In such a case a set of ICLD values, ICC flags, and possibly ICC value may be provided for each down-mixed signal separately.
Moreover, it has to be noted that ICLD cues and inter-channel correlation could also be computed in the time domain instead of the frequency domain. Furthermore, the presented approach could equally be employed for modifying other kinds of values representing a difference between channels than BCC ICLD cues and other inter-channel correlation information than BCC ICC cues.
Another variation of the presented approach may comprise a modified computation of the inter-channel level differences. An exemplary modified computation will be presented in the following for the case of a stereo audio signal .
First, the left and right channel input signals are converted to the frequency domain using a shifted discrete Fourier transform (SDFT) . The resulting complex- valued spectral samples are converted to the energy domain as follows:
$$E_L(i) = \Re\{f_L(i)\}^2 + \Im\{f_L(i)\}^2, \quad 0 \le i < N$$

$$E_R(i) = \Re\{f_R(i)\}^2 + \Im\{f_R(i)\}^2, \quad 0 \le i < N$$

where f_L and f_R are the complex valued shifted discrete Fourier transform (SDFT) samples of the left and right channels, respectively, and N is the size of the frame.
Next, the energy level for each spectral subband is calculated according to:
$$e_L(i) = \sum_{j=offset_1[i]}^{offset_1[i+1]-1} E_L(j), \quad 0 \le i < M$$

$$e_R(i) = \sum_{j=offset_1[i]}^{offset_1[i+1]-1} E_R(j), \quad 0 \le i < M$$

where offset_1 is a frequency offset table describing the frequency bin offsets for each spectral subband, and where M is the number of spectral subbands present in the region.
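The conversion to subband energies in this variant could be sketched as follows, assuming the SDFT has already produced per-bin real and imaginary parts; the same routine would be called once with the left-channel spectrum to obtain e_L(i) and once with the right-channel spectrum to obtain e_R(i). The function and parameter names are illustrative.

    /* Convert per-bin complex SDFT samples to per-bin energies and accumulate
     * them into per-subband energies using the offset1[] table. */
    static void band_energies(const float *re, const float *im,
                              const int *offset1, int num_subbands, float *e)
    {
        for (int i = 0; i < num_subbands; i++) {
            float acc = 0.0f;
            for (int j = offset1[i]; j < offset1[i + 1]; j++)
                acc += re[j] * re[j] + im[j] * im[j];
            e[i] = acc;
        }
    }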
The inter-channel level differences can then be determined for different frequency bands in the form of stereo gain values gain{i) as follows:
[Equation image imgf000034_0001: definition of the stereo gain values gain(i), 0 ≤ i < K, in terms of the subband sums g_L(i) and g_R(i), using the max() and min() operators]

$$g_L(i) = \sum_{j=offset_2[i]}^{offset_2[i+1]-1} e_L(j)$$

$$g_R(i) = \sum_{j=offset_2[i]}^{offset_2[i+1]-1} e_R(j)$$

where offset_2 is the frequency offset table describing the frequency bin offsets for each spectral subband, where K is the number of spectral gain subbands present in the region, and where max() and min() return the maximum and minimum of the specified samples, respectively.
These gain values may then correspond to inter-channel level differences, which are modified whenever an ICC status indicates that there is a low correlation between the channels.
Additional position values may indicate to which channel a respective gain value belongs. The position values may be post-processed to obtain a stable stereo image over time.
Figure 6 is a schematic diagram of an exemplary electronic device which supports a correlation status controlled modification of inter-channel level differences at an encoder side.
The electronic device 610 can be for instance a mobile phone, but equally any other device which is to be able to encode audio data for storage or transmission.
The device 610 comprises a processor 612 and, linked to this processor 612, a memory 613, an interface for receiving audio data 616, and a transmitter (TX) 617.
The processor 612 is configured to execute implemented computer program code .
The memory 613 stores computer program code 614, which may be retrieved by the processor 612 for execution. The stored program code 614 comprises code for encoding audio data. It includes code for generating a mono signal, for generating stereo extension values, for determining inter-channel correlation values and status, for modifying the stereo extension values under control of the ICC status, and for encoding the mono signal and the modified stereo extension values for storage or transmission. The memory 613 may comprise in addition a data storage portion 615.
The processor 612 and the memory 613 could optionally be integrated in a single component, for example on a chip 611. The interface 616 could comprise for instance a plurality of microphones or comprise a socket for connecting microphones .
The transmitter 617 could belong for example to a cellular engine of the device 610 and be configured to transmit data via a cellular communication network to other devices.
The operation of the encoder implementation of device 610 will be described in more detail with reference to the flow chart of Figure 7.
The operations can be considered to be realized by processor 612 when executing the code for encoding audio data 614 retrieved from memory 613 or by a corresponding hardware implementation.
When a multi-channel audio signal that is to be stored or transmitted is received by device 610 via audio interface 616, it is forwarded to the processor 612 for encoding. For reasons of simplicity, it will be assumed again that the multi-channel audio signal is a stereo signal.
The data of the received multi-channel audio signal is divided into subsequent frames, and the processing of the data that is described in the following is performed on a frame-by-frame basis.
The multi-channel audio signal is transformed into the frequency domain (action 701) .
The left and right channel signals are combined to a mono signal (action 711) , and the mono signal is encoded for transmission (action 712) .
For the stereo extension, each frequency domain frame is divided into M frequency bands.
The left and right channel signals are used for determining multi-channel extension values, including ICLD values for each frequency band (action 721) .
Moreover, the inter-channel correlation is calculated for each spectral band in accordance with above equations (1) and (2) (action 731) .
Next, a final ICC value for each band is obtained in accordance with above equation (3) (action 732) .
In case the ICLD cue for the current frame and spectral band is unequal to 1 (action 733) , the extension values are modified based on the mono signal values under control of the ICC status in accordance with above equations (6)-(10) (action 734) . The ICC status could be determined to this end in accordance with above equation (4) . It is to be understood, however, that generating separate ICC flags is only optional, since no flags have to be transmitted in this case.
In case the ICLD cue for the current frame and spectral band is equal to 1 (action 733), the extension values are modified based on the final ICC values under control of the ICC status in accordance with above equations (6)-(8) (action 735). The ICC status can be determined for this case in accordance with above equation (4) using for example a threshold value of 0.5 instead of 0.75 in equation (3). A quantization of the ICC values may not be required in this case, since they are not necessarily transmitted. Thus, the quantized values icc_q in equation (8) could simply be replaced by the final ICC values icc_t obtained with equation (3). Alternatively, the quantized values icc_q in equation (8) could be replaced by the ICC values icc_t(i) determined in accordance with equation (1).
The modified multi-channel extension values are encoded (action 722) .
The encoded mono signal and the encoded modified stereo extension information are multiplexed to a bitstream for transmission or storage (action 741) .
Some decoders may then decode the encoded data in a conventional manner without applying any further decorrelation processing.
Using a correlation/coherence processing in a parametric multi-channel audio coding process may result in an improved user-experience due to enhanced spatial sensation. Some embodiments of the invention allow reducing the correlation between channels that are derived from the mono signal by modifying values representing a difference between channels, for instance values representing a level difference. As a result, correlation between the channels better approximates that of the original stereo signal, thus improving the feeling of spaciousness. Certain embodiments of the invention further allow improving the naturalness and subjective audio quality of a low bit-rate multi-channel audio coding system by using an improved and effective transmission or storage and by processing the correlation/coherence information in a way exploiting the data from previous frames. Some embodiments may also be suited to improve the multi-channel audio quality across a wide range of signals.
Certain embodiments of the invention using information about a correlation status as a decision criterion whether to modify a value representing a difference between channels in the segment of the multi-channel audio signal ensure that the amount of decorrelation processing is reduced compared to an approach in which a decorrelation processing is performed in any case.
If the modification is performed at the decoder, certain embodiments further ensure that the actual correlation values have to be provided at the most when a decorrelation is appropriate. Such embodiments thus enable a particularly low bitrate coding where only limited bits are available for the coding of the correlation information. The lowest amount of data has to be provided if the correlation status information is encoded as a single bit, the association to frequency bands is predetermined, and the actual correlation values are never provided. However, providing an association to frequency bands as additional information may render some embodiments more flexible, since the association may change in this case from segment to segment of the audio signal. Providing the actual correlation values in selected cases may further improve the decorrelation without unduly increasing the amount of required side information. For example, the transmission of ICC values may be limited to a few cases, while otherwise only a one bit status may be transmitted. If the modification is performed at the encoder, certain embodiments ensure that less side information has to be provided to a decoder and that a decoder which does not support decorrelation processing at all could be employed.
Certain embodiments ensure that only deterministic values are used in the modification instead of random numbers. This ensures that the decorrelation procedure can be adapted better to the concrete spatial situation.
It is to be understood that any presented connection is to be understood in a way that the involved components are operationally coupled. Thus, the connections can be direct or indirect with any number or combination of intervening elements, and there may be merely a functional relationship between the components.
Further, any of the mentioned processors could be of any suitable type, for example a computer processor, an application-specific integrated circuit (ASIC), etc. Any of the mentioned memories could be implemented as a single memory or as a combination of a plurality of distinct memories, and may comprise for example a read- only memory, a flash memory or a hard disc drive memory etc. Furthermore, any other hardware components that have been programmed in such a way to carry out the described functions could be employed as well.
Moreover, any of the actions described or illustrated herein may be implemented using executable instructions in a general-purpose or special-purpose processor and stored on a computer-readable storage medium (e.g., disk, memory, or the like) to be executed by such a processor. References to 'computer-readable storage medium' should be understood to encompass specialized circuits such as field-programmable gate arrays, application-specific integrated circuits (ASICs), signal processing devices, and other devices.
The functions illustrated by the combination of processor 122 and memory 123, by the decorrelation block 306 or by the combination of processor 612 and memory 613 can be viewed as means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multichannel audio signal; and as means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
The program codes 124 or 614 can also be viewed as comprising such means in the form of functional modules.
While there have been shown and described and pointed out fundamental novel features of the invention as applied to preferred embodiments thereof, it will be understood that various omissions and substitutions and changes in the form and details of the devices and methods described may be made by those skilled in the art without departing from the spirit of the invention. For example, it is expressly intended that all combinations of those elements and/or method steps which perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Moreover, it should be recognized that structures and/or elements and/or method steps shown and/or described in connection with any disclosed form or embodiment of the invention may be incorporated in any other disclosed or described or suggested form or embodiment as a general matter of design choice. It is the intention, therefore, to be limited only as indicated by the scope of the claims appended hereto. Furthermore, in the claims means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures.

Claims

What is claimed is:
1. A method comprising: evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
2. The method according to claim 1, wherein the correlation status information is represented by a single bit.
3. The method according to claim 1 or 2, wherein the modifying of a value representing a difference between channels is based on equations using exclusively non-random values.
4. The method according to one of claims 1 to 3, wherein the modifying of a value representing a difference between channels is based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi-channel audio signal.
5. The method according to one of claims 1 to 4, wherein the modifying of a value representing a difference between channels is based on a value representing a mono audio signal, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates a dissimilar level of a signal in the channels.
6. The method according to one of claims 1 to 5, wherein the modifying of a value representing a difference between channels is based on a correlation value indicating the amount of correlation between the channels, in case the value of the level difference indicates a similar level of the channels.
7. The method according to one of claim 1 to 6, wherein the modifying of a value representing a difference between channels is performed at an encoder side, which generates the correlation status information and the value representing a difference between the channels.
8. The method according to one of claim 1 to 6, wherein the modifying of a value representing a difference between channels is performed at a decoder side, which is provided with correlation status information and a value representing a difference between channels generated by an encoder side.
9. The method according to claim 8, comprising obtaining at the decoder side in addition information on frequency bands for which the correlation status information is valid.
10. An apparatus comprising a processor, the processor configured to evaluate correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and the processor configured to modify a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
11. The apparatus according to claim 10, wherein the correlation status information is represented by a single bit.
12. The apparatus according to claim 10 or 11, wherein the processor is configured to modify a value representing a difference between channels based on equations using exclusively non-random values .
13. The apparatus according to one of claims 10 to 12, wherein the processor is configured to modify a value representing a difference between channels based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi- channel audio signal.
14. The apparatus according to one of claims 10 to 13, wherein the processor is configured to modify a value representing a difference between channels based on a value representing a mono audio signal, in case the value of the difference between channels in the segment of the multi-channel audio signal indicates a dissimilar level of a signal in the channels.
15. The apparatus according to one of claims 10 to 14, wherein the processor is configured to modify a value representing a difference between channels based on a correlation value indicating the amount of correlation between the channels, in case the value of the level difference indicates a similar level of the channels.
16. The apparatus according to one of claim 10 to 15, wherein the processor is configured to modify a value representing a difference between channels at an encoder side and to generate the correlation status information and the value representing a difference between the channels.
17. The apparatus according to one of claim 10 to 15, wherein the processor is configured to modify a value representing a difference between channels at a decoder side, which is provided with correlation status information and a value representing a difference between channels generated by an encoder side.
18. The apparatus according to claim 17, wherein the processor is configured to obtain at the decoder side in addition information on frequency bands for which the correlation status information is valid.
19. An electronic device comprising: an apparatus according to one of claims 10 to 18; and an interface configured to output multi-channel audio signals.
20. An electronic device comprising: an apparatus according to one of claims 10 to 18; and an interface configured to receive captured multi- channel audio signals.
21. A computer program code realizing the following when executed by a processor: evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
22. The computer program code according to claim 21, wherein the correlation status information is represented by a single bit.
23. The computer program code according to claim 21 or 22, wherein the modifying of a value representing a difference between channels is based on equations using exclusively non-random values.
24. The computer program code according to claim 21 or 23, wherein the modifying of a value representing a difference between channels is based on a value obtained in a modification of a value representing a difference between channels performed for one or more preceding segments of the multi-channel audio signal.
25. The computer program code according to one of claim 21 to 24, wherein the computer program code is a computer program code for an encoder side generating the correlation status information and the value representing a difference between the channels.
26. The computer program code according to one of claim 21 to 24, wherein the computer program code is a computer program code for a decoder side which is provided with the correlation status information and the value representing a difference between channels generated by an encoder side.
27. A computer readable storage medium in which computer program code according to one of claims 21 to 26 is stored.
28. An apparatus comprising: means for evaluating correlation status information, wherein the correlation status information indicates whether or not there is a significant correlation between channels in a segment of a multi-channel audio signal; and means for modifying a value representing a difference between channels in the segment of the multi-channel audio signal in case the correlation status information indicates that there is no significant correlation between the channels.
PCT/EP2008/056813 2008-06-03 2008-06-03 Multi-channel audio coding WO2009146734A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2008/056813 WO2009146734A1 (en) 2008-06-03 2008-06-03 Multi-channel audio coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2008/056813 WO2009146734A1 (en) 2008-06-03 2008-06-03 Multi-channel audio coding

Publications (1)

Publication Number Publication Date
WO2009146734A1 true WO2009146734A1 (en) 2009-12-10

Family

ID=40351784

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/056813 WO2009146734A1 (en) 2008-06-03 2008-06-03 Multi-channel audio coding

Country Status (1)

Country Link
WO (1) WO2009146734A1 (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004030410A1 (en) * 2002-09-26 2004-04-08 Koninklijke Philips Electronics N.V. Method for processing audio signals and audio processing system for applying this method
EP1914722A1 (en) * 2004-03-01 2008-04-23 Dolby Laboratories Licensing Corporation Multichannel audio decoding
EP1814104A1 (en) * 2004-11-30 2007-08-01 Matsushita Electric Industrial Co., Ltd. Stereo encoding apparatus, stereo decoding apparatus, and their methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ENGDEGARD J ET AL: "Synthetic ambience in parametric stereo coding", AUDIO ENGINEERING SOCIETY CONVENTION PAPER, NEW YORK, NY, US, 8 May 2004 (2004-05-08), pages 1 - 12, XP002347433 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112233684A (en) * 2015-03-09 2021-01-15 弗劳恩霍夫应用研究促进协会 Apparatus and method for encoding or decoding multi-channel signal
CN112233684B (en) * 2015-03-09 2024-03-19 弗劳恩霍夫应用研究促进协会 Apparatus and method for encoding or decoding multi-channel signal
WO2022247651A1 (en) * 2021-05-28 2022-12-01 华为技术有限公司 Encoding method and apparatus for multi-channel audio signals

Similar Documents

Publication Publication Date Title
AU2016325879B2 (en) Method and system for decoding left and right channels of a stereo sound signal
US11756556B2 (en) Audio encoding device, method and program, and audio decoding device, method and program
US8046214B2 (en) Low complexity decoder for complex transform coding of multi-channel sound
EP1749296B1 (en) Multichannel audio extension
CN101128866B (en) Optimized fidelity and reduced signaling in multi-channel audio encoding
JP4934427B2 (en) Speech signal decoding apparatus and speech signal encoding apparatus
US9275648B2 (en) Method and apparatus for processing audio signal using spectral data of audio signal
US8170871B2 (en) Signal coding and decoding
US20060013405A1 (en) Multichannel audio data encoding/decoding method and apparatus
US9167367B2 (en) Optimized low-bit rate parametric coding/decoding
JP2022126688A (en) Support for generation of comfort noise
CN103329197A (en) Improved stereo parametric encoding/decoding for channels in phase opposition
DK2697795T3 (en) ADAPTIVE SHARING Gain / FORM OF INSTALLMENTS
KR20180125475A (en) Multi-channel coding
EP3703050B1 (en) Audio encoding method and related product
KR102492791B1 (en) Time-domain stereo coding and decoding method and related product
KR101387808B1 (en) Apparatus for high quality multiple audio object coding and decoding using residual coding with variable bitrate
WO2009146734A1 (en) Multi-channel audio coding
KR20170047361A (en) Method and apparatus for coding or decoding subband configuration data for subband groups
US20210027794A1 (en) Method and system for decoding left and right channels of a stereo sound signal
WO2024052450A1 (en) Encoder and encoding method for discontinuous transmission of parametrically coded independent streams with metadata
WO2024051955A1 (en) Decoder and decoding method for discontinuous transmission of parametrically coded independent streams with metadata
WO2017148526A1 (en) Audio signal encoder, audio signal decoder, method for encoding and method for decoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08760397

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08760397

Country of ref document: EP

Kind code of ref document: A1